Logistic Regression on the Heart Failure Data

Let’s continue with the heart failure data and fit a logistic regression model. First, let’s recall the structure of the data using the str() command.

str(h_data)
'data.frame':	299 obs. of  13 variables:
 $ Age      : num  75 55 65 50 65 90 75 60 65 80 ...
 $ Anaemia  : Factor w/ 2 levels "0","1": 1 1 1 2 2 2 2 2 1 2 ...
 $ Cr_Ph    : int  582 7861 146 111 160 47 246 315 157 123 ...
 $ Diabetes : Factor w/ 2 levels "0","1": 1 1 1 1 2 1 1 2 1 1 ...
 $ Ej_fr    : int  20 38 20 20 20 40 15 60 65 35 ...
 $ BP       : Factor w/ 2 levels "0","1": 2 1 1 1 1 2 1 1 1 2 ...
 $ Platelets: num  26.5 26.3 16.2 21 32.7 ...
 $ Ser_Cr   : num  1.9 1.1 1.3 1.9 2.7 2.1 1.2 1.1 1.5 9.4 ...
 $ Ser_Na   : int  130 136 129 137 116 132 137 131 138 133 ...
 $ Sex      : Factor w/ 2 levels "0","1": 2 2 2 2 1 2 2 2 1 2 ...
 $ Smoking  : Factor w/ 2 levels "0","1": 1 1 2 1 1 2 1 2 1 2 ...
 $ Time     : int  4 6 7 7 8 8 10 10 10 10 ...
 $ Death    : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...

Logistic Regression

mod2 <- glm(Death ~ ., data = h_data, family = 'binomial')
summary(mod2)

Call:
glm(formula = Death ~ ., family = "binomial", data = h_data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.1848  -0.5706  -0.2401   0.4466   2.6668  

Coefficients:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept) 10.1849290  5.6565703   1.801 0.071774 .  
Age          0.0474191  0.0158006   3.001 0.002690 ** 
Anaemia1    -0.0074705  0.3604891  -0.021 0.983467    
Cr_Ph        0.0002222  0.0001779   1.249 0.211684    
Diabetes1    0.1451498  0.3511886   0.413 0.679380    
Ej_fr       -0.0766625  0.0163291  -4.695 2.67e-06 ***
BP1         -0.1026794  0.3587069  -0.286 0.774688    
Platelets   -0.0119962  0.0188906  -0.635 0.525404    
Ser_Cr       0.6660933  0.1814926   3.670 0.000242 ***
Ser_Na      -0.0669811  0.0397351  -1.686 0.091855 .  
Sex1        -0.5336580  0.4139180  -1.289 0.197299    
Smoking1    -0.0134922  0.4126178  -0.033 0.973915    
Time        -0.0210446  0.0030144  -6.981 2.92e-12 ***
---
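With family = 'binomial', glm() uses the logit link by default, so each estimate above is a change in the log-odds of death. Because Death is stored as a factor, glm() treats its first level, "0", as failure, so the model describes the probability that Death is "1". Exponentiating a coefficient gives the odds ratio for a one-unit increase in that predictor, holding the other predictors fixed. The worked values below use the estimates printed above for the four predictors with the smallest p-values:

$$
\log\frac{P(\text{Death}=1 \mid \mathbf{x})}{1 - P(\text{Death}=1 \mid \mathbf{x})}
  = \beta_0 + \beta_1\,\text{Age} + \beta_2\,\text{Anaemia1} + \cdots + \beta_{12}\,\text{Time}
$$

$$
\begin{aligned}
\text{Age:}\quad & e^{0.0474191} \approx 1.049 \\
\text{Ej\_fr:}\quad & e^{-0.0766625} \approx 0.926 \\
\text{Ser\_Cr:}\quad & e^{0.6660933} \approx 1.947 \\
\text{Time:}\quad & e^{-0.0210446} \approx 0.979
\end{aligned}
$$

For example, each additional year of age multiplies the odds of death by about 1.049 (roughly a 4.9% increase), while each one-unit increase in Ej_fr multiplies them by about 0.926 (roughly a 7.4% decrease).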

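Looking at the p-values (Pr(>|z|)) of the regression coefficients, we find that only four predictors are significant at the 5% level: ‘Age’, ‘Ej_fr’, ‘Ser_Cr’ and ‘Time’. Given these four, the remaining eight predictors do not contribute significantly to explaining the response variable, ‘Death’. Therefore, a reduced model containing only those four variables is likely to fit almost as well as the full model, although this is worth checking by comparing the two models directly.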
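The reduction suggested above can be tested rather than assumed. The sketch below defines a hypothetical helper, compare_reduced_model() (not part of the original analysis), that fits both the full model and the four-predictor model. It then compares them with a likelihood-ratio test via anova(..., test = "Chisq") and with AIC. It assumes the data frame has the same column names as h_data, with Death a factor whose levels are "0" and "1".

```r
# Hypothetical helper (not from the original analysis): compares the full
# model with a reduced model that keeps only Age, Ej_fr, Ser_Cr and Time.
# Assumes `data` has the same columns as h_data, with Death a 0/1 factor.
compare_reduced_model <- function(data) {
  # Full model: every other column is used as a predictor
  full <- glm(Death ~ ., data = data, family = "binomial")
  # Reduced model: only the predictors significant at the 5% level
  reduced <- glm(Death ~ Age + Ej_fr + Ser_Cr + Time,
                 data = data, family = "binomial")
  # Likelihood-ratio (chi-squared) test of the dropped coefficients;
  # for h_data this tests the 8 coefficients removed from the full model
  lrt <- anova(reduced, full, test = "Chisq")
  list(full = full,
       reduced = reduced,
       lrt = lrt,
       aic = c(full = AIC(full), reduced = AIC(reduced)))
}
```

On the heart data this would be called as res <- compare_reduced_model(h_data), followed by res$lrt and res$aic. A large p-value in the likelihood-ratio test, together with a lower AIC for the reduced model, would support dropping the other eight predictors. A small p-value would suggest that they still carry useful information jointly.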