When there are many predictor variables, a stepwise selection procedure can be used to build a regression model that retains only the statistically useful predictors. There are two main choices: forward stepwise regression and the backward deletion method.
In Forward Stepwise Regression: start with the single best variable and add more variables one at a time, building the model into a more complex form.
In Backward Deletion (Backward Selection) Regression: put all the variables in the model and reduce it by removing variables until you are left with only significant terms.
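As a brief illustration of the forward idea, the sketch below uses base R's add1() on the built-in mtcars data to see which single variable would be the best first addition to an intercept-only model. The object names null_mod, scope and a are chosen here for illustration only.

```r
# Forward selection, first step: start from an intercept-only model
null_mod <- lm(mpg ~ 1, data = mtcars)

# Candidate terms: every column of mtcars except the response
scope <- reformulate(setdiff(names(mtcars), "mpg"))

# add1() reports the AIC of the model obtained by adding each candidate term
a <- add1(null_mod, scope = scope)
print(a)

# Name of the candidate whose addition gives the lowest AIC
candidates <- a[rownames(a) != "<none>", ]
rownames(candidates)[which.min(candidates$AIC)]
```

The rest of this section follows the opposite route, backward deletion.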
Backward Deletion method
Let us start with a big model and trim it down until we get the best (most statistically significant) regression model. To do this, the drop1() command can be used to examine a linear model and determine the effect of removing each term from the existing model. Complete the following steps to perform a backward deletion. Note that there are different R packages for the Backward and Forward Selection of predictors in the model.
Step 1: To start, create a “full” model (all variables in the model at once). Because it would be tedious to type every variable, one can use a shortcut, the dot notation, which stands for all remaining columns of the data.
mod <- lm(mpg ~ ., data = mtcars)
Step 2: Use the formula() function to see the response and predictor variables used in Step 1.
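A minimal example of this step, refitting the full model so that the snippet runs on its own:

```r
# Full model from Step 1
mod <- lm(mpg ~ ., data = mtcars)

# formula() expands the dot into the explicit list of predictors
f <- formula(mod)
print(f)

# The predictor names can also be pulled out as a character vector
all.vars(f)[-1]
```

The expanded formula is convenient to copy when the model is re-formed by hand in Step 4.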
Step 3: Use the drop1() function to see which term (predictor) should be deleted from the model.
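A short example of the call on the full model; the second call, with test = "F", is an optional addition that also shows an F test for each term:

```r
# Full model from Step 1
mod <- lm(mpg ~ ., data = mtcars)

# One row per term: the AIC the model would have if that term were dropped.
# The <none> row is the current model with nothing removed.
d <- drop1(mod)
print(d)

# Optional: add an F test (and p-value) for each single-term deletion
d_f <- drop1(mod, test = "F")
print(d_f)
```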
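Step 4: Choose the term to remove. Each row of the drop1() output reports the AIC of the model that results from deleting that term, and the <none> row gives the AIC of the current model. Remove the term whose row has the lowest AIC, provided that value is below the <none> AIC. (By default drop1() reports AIC rather than p-values; adding test = "F" also shows a significance test for each term.) Then re-form the model without that variable. The simplest way to do this is to copy the model formula to the clipboard, paste it into a new command and edit out the term you do not want: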
mod1 <- lm(mpg~ ….., data = mtcars)
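As an alternative to copying and editing the formula by hand, the same step can be sketched with base R's update(), selecting the term from the drop1() table programmatically. The names candidates and to_drop are illustrative, and this is one possible way to do it rather than the only one.

```r
# Full model from Step 1 and its single-term deletion table
mod <- lm(mpg ~ ., data = mtcars)
d <- drop1(mod)

# Ignore the <none> row and find the term whose removal gives the lowest AIC
candidates <- d[rownames(d) != "<none>", ]
to_drop <- rownames(candidates)[which.min(candidates$AIC)]

# Refit the model without that term; ". ~ . - term" keeps everything else
mod1 <- update(mod, as.formula(paste(". ~ . -", to_drop)))
formula(mod1)
```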
Step 5: Examine the effect of dropping another term by running the drop1() command once more on the reduced model.
If the row with the lowest AIC again belongs to a variable rather than to <none>, remove that variable, and carry out this process repeatedly until you have a model that you are happy with.
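The repeated drop-and-refit cycle can be automated. The loop below is a minimal sketch that stops when no single deletion lowers the AIC; that stopping rule is an assumption made here, and the variable names are illustrative. Base R's step() function carries out a comparable search and is shown for comparison.

```r
full <- lm(mpg ~ ., data = mtcars)
mod <- full

repeat {
  d <- drop1(mod)
  aic_none <- d["<none>", "AIC"]                # AIC of the current model
  candidates <- d[rownames(d) != "<none>", , drop = FALSE]
  if (nrow(candidates) == 0) break              # nothing left to drop
  best <- which.min(candidates$AIC)
  if (candidates$AIC[best] >= aic_none) break   # no deletion improves AIC
  to_drop <- rownames(candidates)[best]
  mod <- update(mod, as.formula(paste(". ~ . -", to_drop)))
}
formula(mod)

# Built-in alternative: backward search by AIC, with progress output silenced
mod_step <- step(full, direction = "backward", trace = 0)
formula(mod_step)
```

A model chosen this way should still be checked by hand, since an automatic search looks only at the chosen criterion and knows nothing about the subject matter.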