Multiple Linear Regression

Multiple linear regression is used to estimate the relationship between two or more independent variables and one dependent variable. You can use multiple linear regression when you want to know:

  1. How strong the relationship is between two or more independent variables and one dependent variable (e.g. how rainfall, temperature, and amount of fertilizer added affect crop growth).
  2. The value of the dependent variable at certain values of the independent variables (e.g. the expected yield of a crop at certain levels of rainfall, temperature, and fertilizer addition).

The formula for a multiple linear regression is:

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \epsilon$$

  • Y = the value of the dependent variable (leaving out ϵ gives the predicted value).
  • β0 = the y-intercept (the value of Y when all independent variables are set to 0).
  • β1x1 = the regression coefficient (β1) of the first independent variable (x1) (a.k.a. the effect that a one-unit increase in that independent variable has on the predicted Y value, holding the other independent variables constant).
  • … = do the same for however many independent variables you are testing.
  • βnxn = the regression coefficient (βn) of the last independent variable (xn).
  • ϵ = model error (a.k.a. how much variation there is in our estimate of Y).
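As a minimal sketch in plain Python, once the coefficients are known the formula above turns one set of independent-variable values into a predicted Y by dropping ϵ. The function name `predict` and the coefficient values are hypothetical, chosen only for illustration.

```python
def predict(intercept, coefficients, values):
    """Return the predicted Y for one observation: b0 + b1*x1 + ... + bn*xn.

    intercept:    b0
    coefficients: [b1, ..., bn], one per independent variable
    values:       [x1, ..., xn], in the same order as the coefficients
    """
    if len(coefficients) != len(values):
        raise ValueError("need exactly one coefficient per independent variable")
    # The error term is omitted: it is what the prediction cannot explain.
    return intercept + sum(b * x for b, x in zip(coefficients, values))


# Hypothetical fitted model with three independent variables.
print(predict(1.0, [2.0, 0.5, -1.0], [3.0, 4.0, 2.0]))  # prints 7.0
```

Each coefficient scales its own variable, so a negative coefficient (here -1.0) lowers the prediction as its variable grows.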

To find the best-fit line for each independent variable, multiple linear regression calculates three things:

  • The regression coefficients that lead to the smallest overall model error.
  • The F statistic of the overall model.
  • The associated p value (how likely it is that an F statistic at least this large would have occurred by chance if the null hypothesis of no relationship between the independent and dependent variables were true).
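The fit described above can be sketched with ordinary least squares, assuming NumPy and SciPy are available. In ordinary least squares the overall-model test is usually an F test comparing explained to unexplained variation. The function name `fit_mlr` and the synthetic data are hypothetical; this is one straightforward way to do it, not the only one.

```python
import numpy as np
from scipy import stats


def fit_mlr(X, y):
    """Fit Y = b0 + b1*x1 + ... + bn*xn by ordinary least squares.

    X: (n_samples, n_features) array of independent variables.
    y: (n_samples,) array of the dependent variable.
    Returns (coefficients with the intercept first, F statistic, p value).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    # Prepend a column of ones so the first coefficient is the intercept b0.
    design = np.column_stack([np.ones(n), X])
    # Coefficients that minimize the sum of squared residuals.
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = residuals @ residuals            # unexplained variation
    ss_tot = ((y - y.mean()) ** 2).sum()      # total variation around the mean
    ss_reg = ss_tot - ss_res                  # variation explained by the model
    df_model, df_resid = k, n - k - 1
    f_stat = (ss_reg / df_model) / (ss_res / df_resid)
    # Chance of an F this large if every slope coefficient were truly zero.
    p_value = stats.f.sf(f_stat, df_model, df_resid)
    return beta, f_stat, p_value


# Synthetic data with three independent variables and made-up true coefficients.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_beta = np.array([1.0, 2.0, -1.0, 0.5])   # b0, b1, b2, b3
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.5, size=200)

beta, f_stat, p_value = fit_mlr(X, y)
print("coefficients:", np.round(beta, 3))
print("F =", round(float(f_stat), 1), " p =", p_value)
```

The estimated coefficients should land close to `true_beta`, and with effects this strong relative to the noise the p value is essentially zero.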

It then calculates the t statistic and p value for each regression coefficient in the model.
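The per-coefficient step can be sketched the same way, again assuming NumPy and SciPy. Each coefficient's standard error comes from the diagonal of σ²(XᵀX)⁻¹, its t statistic is the estimate divided by that standard error, and the p value is a two-sided t test with n − k − 1 degrees of freedom. The function name `coefficient_tests` and the data are hypothetical.

```python
import numpy as np
from scipy import stats


def coefficient_tests(X, y):
    """Per-coefficient t statistics and two-sided p values for an OLS fit.

    Index 0 of every returned array is the intercept b0; index i is bi.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    df_resid = n - k - 1
    # Unbiased estimate of the error variance sigma^2.
    sigma2 = residuals @ residuals / df_resid
    # Covariance matrix of the estimates: sigma^2 * (X^T X)^-1.
    cov = sigma2 * np.linalg.inv(design.T @ design)
    se = np.sqrt(np.diag(cov))
    t_stats = beta / se
    # Two-sided test of H0: coefficient = 0, using the t distribution.
    p_values = 2 * stats.t.sf(np.abs(t_stats), df_resid)
    return beta, se, t_stats, p_values


# Synthetic example: x3 has no real effect on y (its true coefficient is 0).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 1.0 + X @ np.array([2.0, -1.0, 0.0]) + rng.normal(scale=0.5, size=200)

beta, se, t_stats, p_values = coefficient_tests(X, y)
for name, b, t, p in zip(["b0", "b1", "b2", "b3"], beta, t_stats, p_values):
    print(f"{name}: estimate={b:.3f}  t={t:.2f}  p={p:.3g}")
```

b1 and b2 should show large |t| and tiny p values, while b3 will typically have a small |t| because its true effect is zero.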

NOTE: Multiple linear regression assumes the data consist of multiple independent variables (features) and a single dependent variable (target).