Variance Inflation Factor Continued

Continuing from the previous post, we have seen the variance inflation factors (VIFs) of three variables in a dataset:

   pcFAT   Weight Activity 
3.204397 3.240334 1.026226 

The following is the correlation matrix of the three variables. You can see a strong correlation (about 0.83) between the percentage of fat and weight, while activity is almost uncorrelated with either.

               pcFAT     Weight    Activity
pcFAT     1.00000000  0.8267145 -0.02252742
Weight    0.82671454  1.0000000 -0.10766796
Activity -0.02252742 -0.1076680  1.00000000
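
As a quick cross-check, the VIFs of a set of predictors are the diagonal elements of the inverse of their correlation matrix. The sketch below retypes the matrix printed above by hand (the names vars, R and vif_from_cor are only illustrative) and should closely reproduce the three VIFs quoted at the top, up to rounding of the printed correlations.

```r
# Correlation matrix of the three predictors, copied from the output above
vars <- c("pcFAT", "Weight", "Activity")
R <- matrix(c( 1.00000000,  0.82671454, -0.02252742,
               0.82671454,  1.00000000, -0.10766796,
              -0.02252742, -0.10766796,  1.00000000),
            nrow = 3, byrow = TRUE, dimnames = list(vars, vars))

# The VIF of each predictor is the matching diagonal element of the inverse matrix
vif_from_cor <- diag(solve(R))
print(round(vif_from_cor, 4))
```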

Now we create a new variable, FAT (fat mass in kg), by multiplying fat percentage by weight. First, we rerun the correlation matrix to see how the new variable relates to the others.

library(dplyr)  # provides %>% and select()
MM_data <- M_data %>% select(pcFAT, Weight, Activity, FAT)
cor(MM_data)
               pcFAT     Weight    Activity        FAT
pcFAT     1.00000000  0.8267145 -0.02252742  0.9244814
Weight    0.82671454  1.0000000 -0.10766796  0.9684322
Activity -0.02252742 -0.1076680  1.00000000 -0.0966596
FAT       0.92448136  0.9684322 -0.09665960  1.0000000

FAT, as expected, has a very high correlation with both weight (0.97) and fat percentage (0.92). Now, we refit the model with FAT included and run the VIF function on it.

model <- lm(BMD_FemNeck ~ pcFAT + Weight + Activity + FAT , data=M_data)
VIF(model)
    pcFAT    Weight  Activity       FAT 
14.931555 33.948375  1.053005 75.059251 

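The VIFs of pcFAT, Weight and FAT are now all high (about 15, 34 and 75), well above the commonly used thresholds of 5 or 10, whereas before adding FAT the VIFs of pcFAT and Weight were only about 3.2. Activity stays close to 1. Because FAT is built from pcFAT and Weight, it is almost completely explained by them, and that redundancy inflates all three.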
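
To see where such numbers come from, here is a minimal sketch that computes VIFs by hand: each predictor is regressed on the others, and its VIF is 1 / (1 - R^2) of that auxiliary regression. M_data is not reproduced here, so the data frame sim is simulated, and the helper manual_vif() and the division by 100 (assuming pcFAT is stored as a percentage) are illustrative assumptions rather than the original analysis.

```r
set.seed(1)  # simulated stand-in data; not the original M_data
n <- 200
weight   <- rnorm(n, mean = 70, sd = 12)
pc_fat   <- 10 + 0.3 * weight + rnorm(n, sd = 4)
activity <- rnorm(n)
sim <- data.frame(pcFAT = pc_fat, Weight = weight, Activity = activity)
sim$FAT <- sim$pcFAT * sim$Weight / 100  # fat mass in kg, assuming pcFAT is in percent

# Hypothetical helper: VIF of predictor v = 1 / (1 - R^2),
# where R^2 comes from regressing v on all the other predictors
manual_vif <- function(data) {
  sapply(names(data), function(v) {
    aux <- lm(reformulate(setdiff(names(data), v), response = v), data = data)
    1 / (1 - summary(aux)$r.squared)
  })
}

vif_without_fat <- manual_vif(sim[, c("pcFAT", "Weight", "Activity")])
vif_with_fat    <- manual_vif(sim)
print(round(vif_without_fat, 2))
print(round(vif_with_fat, 2))
```

The simulated values will not match the ones above, but the pattern should be the same: adding a variable derived from two existing predictors pushes up the VIFs of all three, while the unrelated predictor is barely affected.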