Let’s work out a few more matrices to continue with the heart data. First, let’s recall the data using the str() command.
str(h_data)
'data.frame':	299 obs. of  13 variables:
 $ Age      : num  75 55 65 50 65 90 75 60 65 80 ...
 $ Anaemia  : Factor w/ 2 levels "0","1": 1 1 1 2 2 2 2 2 1 2 ...
 $ Cr_Ph    : int  582 7861 146 111 160 47 246 315 157 123 ...
 $ Diabetes : Factor w/ 2 levels "0","1": 1 1 1 1 2 1 1 2 1 1 ...
 $ Ej_fr    : int  20 38 20 20 20 40 15 60 65 35 ...
 $ BP       : Factor w/ 2 levels "0","1": 2 1 1 1 1 2 1 1 1 2 ...
 $ Platelets: num  26.5 26.3 16.2 21 32.7 ...
 $ Ser_Cr   : num  1.9 1.1 1.3 1.9 2.7 2.1 1.2 1.1 1.5 9.4 ...
 $ Ser_Na   : int  130 136 129 137 116 132 137 131 138 133 ...
 $ Sex      : Factor w/ 2 levels "0","1": 2 2 2 2 1 2 2 2 1 2 ...
 $ Smoking  : Factor w/ 2 levels "0","1": 1 1 2 1 1 2 1 2 1 2 ...
 $ Time     : int  4 6 7 7 8 8 10 10 10 10 ...
 $ Death    : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
Logistic Regression
mod2 <- glm(Death ~ ., data = h_data, family = 'binomial')
summary(mod2)
Call:
glm(formula = Death ~ ., family = "binomial", data = h_data)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.1848  -0.5706  -0.2401   0.4466   2.6668

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) 10.1849290  5.6565703   1.801 0.071774 .
Age          0.0474191  0.0158006   3.001 0.002690 **
Anaemia1    -0.0074705  0.3604891  -0.021 0.983467
Cr_Ph        0.0002222  0.0001779   1.249 0.211684
Diabetes1    0.1451498  0.3511886   0.413 0.679380
Ej_fr       -0.0766625  0.0163291  -4.695 2.67e-06 ***
BP1         -0.1026794  0.3587069  -0.286 0.774688
Platelets   -0.0119962  0.0188906  -0.635 0.525404
Ser_Cr       0.6660933  0.1814926   3.670 0.000242 ***
Ser_Na      -0.0669811  0.0397351  -1.686 0.091855 .
Sex1        -0.5336580  0.4139180  -1.289 0.197299
Smoking1    -0.0134922  0.4126178  -0.033 0.973915
Time        -0.0210446  0.0030144  -6.981 2.92e-12 ***
---
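As a quick aid to reading this table: each z value is the estimate divided by its standard error, and Pr(>|z|) is the matching two-sided normal tail probability. The sketch below recomputes both for a few rows, using numbers copied from the output above. The vector names est and se are only illustrative. Exponentiating an estimate gives the odds ratio for a one-unit increase in that predictor, holding the others fixed.

```r
# Estimates and standard errors copied from the summary(mod2) output above.
# 'est' and 'se' are illustrative names, not objects from the original analysis.
est <- c(Age = 0.0474191, Anaemia1 = -0.0074705, Ej_fr = -0.0766625,
         Ser_Cr = 0.6660933, Time = -0.0210446)
se  <- c(Age = 0.0158006, Anaemia1 = 0.3604891, Ej_fr = 0.0163291,
         Ser_Cr = 0.1814926, Time = 0.0030144)

z <- est / se                # Wald z statistic
p <- 2 * pnorm(-abs(z))      # two-sided p-value from the standard normal
odds_ratio <- exp(est)       # multiplicative change in the odds per unit increase

print(signif(cbind(z, p, odds_ratio), 4))
```

For example, the Age estimate of 0.0474 corresponds to an odds ratio of about 1.05, i.e. roughly 5% higher odds of death per additional year of age.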
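Observe the p-values (Pr(>|z|)) for the regression coefficients. Four predictors are significant at the 5% level: ‘Age’, ‘Ej_fr’, ‘Ser_Cr’, and ‘Time’. The remaining predictors all have p-values above 0.05, so they contribute little to explaining the response variable, ‘Death’. Therefore, a much smaller model fitted on only a few of the significant variables, such as ‘Age’ and ‘Ser_Cr’, may already do a reasonable job.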
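One way to check whether a smaller model loses much is to fit it next to the full model and compare the two. Since h_data is not reproduced here, the sketch below uses a small simulated data frame purely to show the mechanics. Everything in it is hypothetical: sim_data, the made-up coefficients, and the irrelevant Noise column. It pulls the p-values out of the summary, fits the reduced model, and compares the fits with a likelihood-ratio test and AIC.

```r
set.seed(1)
n <- 299

# Hypothetical stand-in for h_data: two informative predictors and one noise column.
sim_data <- data.frame(
  Age    = round(runif(n, 40, 95)),
  Ser_Cr = round(rlnorm(n, meanlog = 0.2, sdlog = 0.5), 1),
  Noise  = rnorm(n)
)

# Outcome simulated from a logistic model; these coefficients are arbitrary.
eta <- -5 + 0.05 * sim_data$Age + 0.7 * sim_data$Ser_Cr
sim_data$Death <- factor(rbinom(n, size = 1, prob = plogis(eta)))

# Full model on all predictors, as in glm(Death ~ ., ...)
full <- glm(Death ~ ., data = sim_data, family = "binomial")

# Extract the p-value column from the coefficient table
pvals <- coef(summary(full))[, "Pr(>|z|)"]
print(round(pvals, 4))

# Reduced model on the two predictors of interest
reduced <- glm(Death ~ Age + Ser_Cr, data = sim_data, family = "binomial")

# Likelihood-ratio test (reduced vs. full) and AIC comparison
print(anova(reduced, full, test = "Chisq"))
print(AIC(reduced, full))
```

With the real data, the same calls would use h_data and mod2 in place of sim_data and full. A large p-value in the likelihood-ratio test, or a similar or lower AIC for the reduced model, would suggest the dropped predictors add little.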