In the previous post, we looked at the variance inflation factors (VIFs) of three variables in the dataset:
   pcFAT   Weight Activity
3.204397 3.240334 1.026226
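As a quick refresher, a predictor's VIF is 1 / (1 − R²), where R² comes from regressing that predictor on all the other predictors. The sketch below computes this by hand with base R's lm(). The helper name vif_by_hand and the simulated predictors x1, x2 and x3 are hypothetical, chosen only for illustration; they are not the data used in this post.

```r
# Hypothetical helper: VIF of each column via auxiliary regressions,
# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
# on all the other columns.
vif_by_hand <- function(df) {
  sapply(names(df), function(v) {
    others <- setdiff(names(df), v)
    fit <- lm(reformulate(others, response = v), data = df)
    1 / (1 - summary(fit)$r.squared)
  })
}

# Simulated (hypothetical) predictors, not the data used in this post
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.6)  # built to be correlated with x1
x3 <- rnorm(n)                        # independent of x1 and x2
sim <- data.frame(x1, x2, x3)

vif_by_hand(sim)
```

On this simulated data, x1 and x2, which were built to be correlated, should come out clearly above 1, while the independent x3 stays close to 1.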
The following is the correlation matrix of the three variables. Fat percentage and weight are strongly correlated (r ≈ 0.83), while Activity is almost uncorrelated with either of them.
               pcFAT     Weight    Activity
pcFAT     1.00000000  0.8267145 -0.02252742
Weight    0.82671454  1.0000000 -0.10766796
Activity -0.02252742 -0.1076680  1.00000000
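The three VIFs can in fact be recovered from this correlation matrix alone: for a linear model with an intercept, the VIF of the j-th predictor equals the j-th diagonal element of the inverse of the predictors' correlation matrix. A minimal sketch in base R, typing in the rounded values printed above (so the agreement is close rather than exact to every digit):

```r
# Correlation matrix of pcFAT, Weight and Activity, typed in from the output above
vars3 <- c("pcFAT", "Weight", "Activity")
R3 <- matrix(c( 1.00000000,  0.82671454, -0.02252742,
                0.82671454,  1.00000000, -0.10766796,
               -0.02252742, -0.10766796,  1.00000000),
             nrow = 3, byrow = TRUE, dimnames = list(vars3, vars3))

# Diagonal of the inverse correlation matrix = VIF of each predictor
vif_from_cor <- diag(solve(R3))
round(vif_from_cor, 4)
```

The result agrees with the VIFs reported earlier (about 3.204, 3.240 and 1.026).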
Now we create a new variable, FAT (fat mass in kg), by multiplying the fat percentage by weight. First, we rerun the correlation matrix to see how the new variable relates to the others.
library(dplyr)
M_data <- M_data %>% mutate(FAT = pcFAT * Weight / 100)  # fat mass in kg
MM_data <- M_data %>% select(pcFAT, Weight, Activity, FAT)
cor(MM_data)
               pcFAT     Weight    Activity        FAT
pcFAT     1.00000000  0.8267145 -0.02252742  0.9244814
Weight    0.82671454  1.0000000 -0.10766796  0.9684322
Activity -0.02252742 -0.1076680  1.00000000 -0.0966596
FAT       0.92448136  0.9684322 -0.09665960  1.0000000
As expected, FAT is very highly correlated with both weight (r ≈ 0.97) and fat percentage (r ≈ 0.92). Now we refit the model with FAT added and compute the VIFs.
model <- lm(BMD_FemNeck ~ pcFAT + Weight + Activity + FAT, data = M_data)
VIF(model)
    pcFAT    Weight  Activity       FAT
14.931555 33.948375  1.053005 75.059251
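The VIFs of pcFAT, Weight and FAT are now all high, well above the common rule-of-thumb cut-offs of 5 or 10, while Activity, which is barely correlated with the other variables, stays close to 1.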
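For a model with an intercept, each VIF equals the corresponding diagonal element of the inverse of the predictors' correlation matrix, so the four VIFs above can be reproduced from the four-variable correlation matrix printed earlier. Converting back with R² = 1 − 1/VIF then shows how much of each predictor is explained by the other three. This is a sketch built from the rounded printed correlations, not from the raw data:

```r
# Four-variable correlation matrix, typed in from the cor(MM_data) output above
vars <- c("pcFAT", "Weight", "Activity", "FAT")
R4 <- matrix(c( 1.00000000,  0.82671454, -0.02252742,  0.92448136,
                0.82671454,  1.00000000, -0.10766796,  0.96843220,
               -0.02252742, -0.10766796,  1.00000000, -0.09665960,
                0.92448136,  0.96843220, -0.09665960,  1.00000000),
             nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

# VIFs from the inverse correlation matrix
vif4 <- diag(solve(R4))

# R^2 of each predictor regressed on the other three: R_j^2 = 1 - 1/VIF_j
r2 <- 1 - 1 / vif4
round(cbind(VIF = vif4, R2 = r2), 3)
```

With these values, FAT has an R² of roughly 0.987 against the other three predictors, which is what drives its VIF of about 75.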