Science is built on the foundations of hypothesis testing. The Chi-square test of independence is one of the main statistics for testing hypotheses when the variables are nominal.
The Chi-square test is common in clinical research. The following example comes from a case study published in ‘Biochemia Medica’. The data come from a group of 184 people, half of whom received a vaccine against pneumococcal pneumonia.
Outcome | Un-Vaccinated | Vaccinated |
Contracted pneumococcal pneumonia | 23 | 5 |
Contracted another type of pneumonia | 8 | 10 |
Did not contract pneumonia | 61 | 77 |
1. Marginals
The first step in the Chi-square test is the calculation of the ‘marginals’. Marginal means ‘on the sides’, so we write them in the rightmost column (the row sums) and the bottom row (the column sums).
Outcome | Un-Vaccinated | Vaccinated | Row Sum |
Contracted pneumococcal pneumonia | 23 | 5 | 28 |
Contracted another type of pneumonia | 8 | 10 | 18 |
Did not contract pneumonia | 61 | 77 | 138 |
Column Sum | 92 | 92 | N = 184 |
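The marginals can also be computed directly in R. The following is a minimal sketch; the object name `observed` and the shortened row and column labels are chosen here for illustration and are not part of the original study.

```r
# Observed counts from the table above
# (rows: outcome, columns: vaccination status)
observed <- matrix(c(23, 5,
                     8, 10,
                     61, 77),
                   ncol = 2, byrow = TRUE,
                   dimnames = list(c("Pneumococcal", "Other pneumonia", "No pneumonia"),
                                   c("Un-Vaccinated", "Vaccinated")))

row_sums <- rowSums(observed)   # 28, 18, 138
col_sums <- colSums(observed)   # 92, 92
N <- sum(observed)              # 184

# addmargins() appends the row and column sums to the table
addmargins(observed)
```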
2. Expected values
The Chi-square test requires observed (O) and expected (E) values. It applies the following formula to each cell of the table and sums the results.
$(O - E)^2 / E$
The observed values are the data, and the expected values are estimated from the marginals. They represent the counts we would see if vaccination and outcome were perfectly independent. The expected value at (row i, column j) is RowSum(i) x ColumnSum(j) / N. The per-cell calculation is shown below.
Outcome | Un-Vaccinated: (O-E)^2/E | Vaccinated: (O-E)^2/E |
Contracted pneumococcal pneumonia | (23 - 28*92/184)^2 / (28*92/184) | (5 - 28*92/184)^2 / (28*92/184) |
Contracted another type of pneumonia | (8 - 18*92/184)^2 / (18*92/184) | (10 - 18*92/184)^2 / (18*92/184) |
Did not contract pneumonia | (61 - 138*92/184)^2 / (138*92/184) | (77 - 138*92/184)^2 / (138*92/184) |
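The same arithmetic can be sketched in R. This is one possible way to do it, assuming the counts are held in a matrix named `observed` (a name chosen here for illustration); `outer()` multiplies every row sum by every column sum in one step.

```r
observed <- matrix(c(23, 5, 8, 10, 61, 77), ncol = 2, byrow = TRUE)

# Expected count in cell (i, j) = RowSum(i) * ColumnSum(j) / N
N <- sum(observed)
expected <- outer(rowSums(observed), colSums(observed)) / N

# Per-cell contribution (O - E)^2 / E
contrib <- (observed - expected)^2 / expected

expected            # expected counts under independence
round(contrib, 2)   # contributions, rounded to two decimals
```

Because both column sums are 92, the two expected values in each row are identical (14, 9 and 69).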
3. Test for Independence
Outcome | Un-Vaccinated: (O-E)^2/E | Vaccinated: (O-E)^2/E |
Contracted pneumococcal pneumonia | 5.79 | 5.79 |
Contracted another type of pneumonia | 0.11 | 0.11 |
Did not contract pneumonia | 0.93 | 0.93 |
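The Chi-square statistic is the sum over all six cells, which is 13.649 when computed from the unrounded values.
The p-value is estimated by looking up 13.649 in the Chi-square table at degrees of freedom (df) = (rows - 1) x (columns - 1) = (3 - 1) x (2 - 1) = 2.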
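Instead of a printed table, the lookup can be done with R's Chi-square distribution functions. This is a small sketch; the 5% level used for the critical value is the conventional threshold and is an assumption here, not something stated in the source.

```r
chi_sq <- 13.649
df <- (3 - 1) * (2 - 1)   # (rows - 1) * (columns - 1) = 2

# Upper-tail probability of the Chi-square distribution
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)
p_value

# Critical value at the conventional 5% level (assumed threshold)
qchisq(0.95, df = df)
```

A statistic larger than the critical value leads to rejecting the hypothesis of independence at that level.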
The R code for the whole exercise
# Observed counts, entered row by row (columns: un-vaccinated, vaccinated)
edu_data <- matrix(c(23, 5, 8, 10, 61, 77), ncol = 2, byrow = TRUE)
colnames(edu_data) <- c("No-Vac", "Vac")
rownames(edu_data) <- c("pneumococcal pneumonia", "non-pneumococcal pneumonia", "Stayed healthy")
# Pearson's Chi-square test of independence
chisq.test(edu_data)
# Print the labelled table
edu_data

Pearson's Chi-squared test
data: edu_data
X-squared = 13.649, df = 2, p-value = 0.001087

                           No-Vac Vac
pneumococcal pneumonia         23   5
non-pneumococcal pneumonia      8  10
Stayed healthy                 61  77
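The p-value is well below the conventional 0.05 threshold, so the association between vaccination status and pneumonia outcome is statistically significant. The largest contributions come from the pneumococcal pneumonia row, where the vaccinated group had far fewer cases (5 versus 23). If vaccination and outcome were truly independent, a Chi-square value this large or larger would be expected only about 1.1 times in a thousand.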
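To check the hand calculation against R, the object returned by `chisq.test()` can be inspected directly. The sketch below stores the result in a variable named `res` (a name chosen here for illustration) and reads its documented components.

```r
edu_data <- matrix(c(23, 5, 8, 10, 61, 77), ncol = 2, byrow = TRUE)
res <- chisq.test(edu_data)

res$expected                                      # expected counts under independence
(res$observed - res$expected)^2 / res$expected    # per-cell contributions
unname(res$statistic)                             # Chi-square statistic
unname(res$parameter)                             # degrees of freedom
res$p.value                                       # p-value
```

The expected counts and per-cell contributions should match the tables in steps 2 and 3.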
Reference
The Chi-square test of independence: Biochem Med