What is Linear Regression?

A short explanation of Linear Regression

Linear regression models the relationship between a continuous target variable and one or more predictor variables. For example, you could predict the price of a house from features such as its size in square meters and the crime rate in the neighborhood. A linear regression function takes the form

$$\hat{y}=\hat{\beta_0}+\hat{\beta_1}x_1+\hat{\beta_2}x_2+\dots+\hat{\beta_p}x_p$$

Here y is the target we're trying to predict (house price), the x's are the features or predictors (size, crime), and the β's are the coefficients, or parameters, that we estimate by fitting the model to data. The hats on top of y and the β's indicate that these are estimates rather than the true, unknown values.
With multiple features it is called Multiple Linear Regression and when there's only one feature it is Simple Linear Regression.
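
To make the formula concrete, here is a minimal sketch of how a prediction is computed once the coefficients are known. The function name `predict` and all numbers are made up for illustration; they are not estimates from real housing data.

```python
def predict(intercept, coefficients, features):
    """Return y_hat = b0 + b1*x1 + ... + bp*xp for a single observation."""
    if len(coefficients) != len(features):
        raise ValueError("need exactly one coefficient per feature")
    return intercept + sum(b * x for b, x in zip(coefficients, features))


# Hypothetical estimates: base price, price per square meter,
# and price change per point of a crime index.
b0 = 50_000.0
betas = [2_000.0, -1_500.0]

# A 120 square meter house in a neighborhood with crime index 4.
price = predict(b0, betas, [120.0, 4.0])
print(price)  # 284000.0
```

With a single entry in `betas` this is Simple Linear Regression; with more than one it is Multiple Linear Regression.
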
For Linear Regression, the function does not have to be linear in the predictors as long as it is linear in the parameters. This means you can model an interaction between predictors by, for example, adding the product of two x's as an extra term if this gives a better fit:

$$\hat{y}=\hat{\beta_0}+\hat{\beta_1}x_1+\hat{\beta_2}x_2+\hat{\beta_3}x_1x_2+\dots+\hat{\beta_p}x_p$$
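
One way to see why this is still a linear model: the product x1·x2 can be computed up front and treated as just another feature. The sketch below assumes two predictors; the helper names `add_interaction` and `predict` and the coefficient values are hypothetical.

```python
def add_interaction(features):
    """Map (x1, x2) to (x1, x2, x1*x2) so the product becomes a third feature."""
    x1, x2 = features
    return [x1, x2, x1 * x2]


def predict(intercept, coefficients, features):
    """Plain linear combination: linear in the coefficients."""
    return intercept + sum(b * x for b, x in zip(coefficients, features))


# Made-up coefficients for x1, x2 and the interaction term x1*x2.
b0 = 1.0
betas = [2.0, 3.0, 0.5]

expanded = add_interaction([4.0, 6.0])  # [4.0, 6.0, 24.0]
y_hat = predict(b0, betas, expanded)    # 1 + 8 + 18 + 12
print(y_hat)  # 39.0
```

The prediction is nonlinear in x1 and x2, but each coefficient still enters only as a multiplier of a known number, which is what "linear in the parameters" means.
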

How do you fit a model to data?
You estimate the β's as the values that minimize the sum of squared residuals. A residual is the difference between the actual value y and the estimated y-hat for one observation; the residuals are squared and summed over all n observations. That is, you minimize

$$\displaystyle\sum_{i=1}^n(y_i-\hat{\beta_0}-\hat{\beta_1}x_{i1}-\hat{\beta_2}x_{i2}-\dots-\hat{\beta_p}x_{ip})^2$$
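
As an illustration, the sketch below finds the minimizing β's by solving the normal equations (XᵀX)β = Xᵀy with Gaussian elimination. This is one standard textbook approach, chosen here because it needs only the standard library; it is an assumption of this example rather than something the text prescribes, and numerical libraries typically use more stable decompositions. The data are made up and noise-free.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def fit(rows, targets):
    """Return [b0, b1, ..., bp] that minimize the sum of squared residuals."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(k)]
    return solve(xtx, xty)


def sum_squared_residuals(betas, rows, targets):
    """The quantity being minimized: sum over observations of (y - y_hat)^2."""
    total = 0.0
    for row, y in zip(rows, targets):
        y_hat = betas[0] + sum(b * x for b, x in zip(betas[1:], row))
        total += (y - y_hat) ** 2
    return total


# Made-up data generated exactly from y = 1 + 2*x1 + 3*x2.
rows = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]]
targets = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

betas = fit(rows, targets)
print([round(b, 6) for b in betas])
print(sum_squared_residuals(betas, rows, targets))
```

Because the data contain no noise, the fit should recover coefficients very close to 1, 2 and 3, with a sum of squared residuals near zero. Any other choice of β's gives a larger sum.
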

When should you use Linear Regression?
As a guideline, with any regression problem, try linear regression first and move on to other methods when it is not good enough. This choice should be based on considerations like whether the model under- or overfits the data.
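
A common way to check for under- or overfitting, not spelled out above, is to compare the error on the data used for fitting with the error on held-out data. The sketch below does this for a single feature using the closed-form least-squares solution; the data are made up and deliberately follow a curve that a straight line cannot capture.

```python
def fit_simple(xs, ys):
    """Least-squares intercept and slope for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mse(intercept, slope, xs, ys):
    """Mean squared error of the fitted line on the given points."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data following y = x**2.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [x ** 2 for x in train_x]
test_x = [0.5, 1.5, 2.5, 3.5]  # held out, not used for fitting
test_y = [x ** 2 for x in test_x]

b0, b1 = fit_simple(train_x, train_y)
train_error = mse(b0, b1, train_x, train_y)
test_error = mse(b0, b1, test_x, test_y)
print(train_error, test_error)
```

As a rough heuristic, a held-out error far above the training error suggests overfitting, while a clearly nonzero error of similar size on both, as here, suggests the model is too simple for the data.
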
