Chi-square for Science

Science is built on the foundation of hypothesis testing. The Chi-square test of independence is one of the principal statistics for testing hypotheses when the variables are nominal.

The Chi-square test is widely used in clinical research. Here is an example from a case study published in ‘Biochemia Medica’. The following data come from a group of 184 people, half of whom received a vaccine against pneumococcal pneumonia.

                                       Un-Vaccinated   Vaccinated
Contracted pneumococcal pneumonia                 23            5
Contracted another type of pneumonia               8           10
Did not contract pneumonia                        61           77

1. Marginals

The first step in the Chi-square test is the calculation of the ‘marginals’. As marginal means ‘on the sides’, we write them in the rightmost column (the row sums) and the bottom row (the column sums).

                                       Un-Vaccinated   Vaccinated   Row Sum
Contracted pneumococcal pneumonia                 23            5        28
Contracted another type of pneumonia               8           10        18
Did not contract pneumonia                        61           77       138
Column Sum                                        92           92   N = 184
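The marginals can also be computed in R rather than by hand. The following is a minimal sketch; the object names (`observed`, `row_sums`, `col_sums`, `n_total`, `with_marginals`) and the shortened row labels are illustrative choices, not part of the original study.

```r
# Observed counts (rows: outcome, columns: vaccination status).
# Row and column labels are shortened, illustrative names.
observed <- matrix(c(23, 5, 8, 10, 61, 77), ncol = 2, byrow = TRUE,
                   dimnames = list(c("Pneumococcal", "Other pneumonia", "No pneumonia"),
                                   c("Un-Vaccinated", "Vaccinated")))

row_sums <- rowSums(observed)  # one total per outcome (right-hand margin)
col_sums <- colSums(observed)  # one total per vaccination group (bottom margin)
n_total  <- sum(observed)      # grand total N

# addmargins() appends both margins (labelled "Sum") to the table in one step.
with_marginals <- addmargins(observed)
print(with_marginals)
```

The last row and last column of the printed table should match the "Column Sum" and "Row Sum" entries worked out by hand.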

2. Expected values

The chi-square test requires observed and expected values. It applies the following formula to each element and adds them up.

$$\frac{(O - E)^2}{E}$$

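The observed values (O) are the data. The expected values (E) are estimated from the marginals: they are the counts we would see if vaccination status and outcome were perfectly independent. The expected value at (row i, column j) is RowSum(i) x ColumnSum(j) / N. For example, the expected count of un-vaccinated people who contracted pneumococcal pneumonia is 28 x 92 / 184 = 14. The table below writes out the calculation for every cell.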

                                       Un-Vaccinated                         Vaccinated
                                       (O-E)^2/E                             (O-E)^2/E
Contracted pneumococcal pneumonia      (23 - 28*92/184)^2 / (28*92/184)      (5 - 28*92/184)^2 / (28*92/184)
Contracted another type of pneumonia   (8 - 18*92/184)^2 / (18*92/184)       (10 - 18*92/184)^2 / (18*92/184)
Did not contract pneumonia             (61 - 138*92/184)^2 / (138*92/184)    (77 - 138*92/184)^2 / (138*92/184)
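The expected counts used in these cells can be produced for the whole table at once. This is a sketch under the assumption that the data are held in a plain matrix; `observed` and `expected` are illustrative names. It relies on `outer()`, which multiplies every row sum by every column sum.

```r
# Observed counts (rows: outcome, columns: un-vaccinated, vaccinated).
observed <- matrix(c(23, 5, 8, 10, 61, 77), ncol = 2, byrow = TRUE)

# Expected count at (i, j) = RowSum(i) * ColumnSum(j) / N
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
print(expected)

# chisq.test() stores the same expected counts in its result,
# which gives an independent check of the hand formula.
builtin_expected <- chisq.test(observed)$expected
print(isTRUE(all.equal(expected, builtin_expected, check.attributes = FALSE)))
```

Because both groups contain 92 people, the two columns of expected counts are identical: 14, 9 and 69.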

3. Test for Independence

                                       Un-Vaccinated   Vaccinated
                                       (O-E)^2/E       (O-E)^2/E
Contracted pneumococcal pneumonia      5.79            5.79
Contracted another type of pneumonia   0.11            0.11
Did not contract pneumonia             0.93            0.93

The Chi-square statistic is the sum over all six cells = 13.649
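The per-cell contributions and their sum can be reproduced with a few lines of R. This is a minimal sketch; `observed`, `expected`, `contributions` and `chi_square` are illustrative names.

```r
# Observed counts (rows: outcome, columns: un-vaccinated, vaccinated).
observed <- matrix(c(23, 5, 8, 10, 61, 77), ncol = 2, byrow = TRUE)

# Expected counts under independence, from the marginals.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# (O - E)^2 / E for every cell, then the overall sum.
contributions <- (observed - expected)^2 / expected
chi_square <- sum(contributions)

print(round(contributions, 2))
print(round(chi_square, 3))  # 13.649
```

The first row dominates the total, so most of the departure from independence comes from the pneumococcal pneumonia cases.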

The p-value is estimated by looking up 13.649 in the Chi-square table at degrees of freedom (df) = (rows - 1) x (columns - 1) = (3 - 1) x (2 - 1) = 2.
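Instead of a printed table, the upper-tail probability can be read from R's Chi-square distribution function `pchisq()`. A small sketch, with `chi_square`, `dof` and `p_value` as illustrative names:

```r
chi_square <- 13.649          # test statistic from the sum above
dof <- (3 - 1) * (2 - 1)      # (rows - 1) * (columns - 1) = 2

# Probability of a value at least this large if the variables are independent.
p_value <- pchisq(chi_square, df = dof, lower.tail = FALSE)
print(signif(p_value, 4))     # about 0.001087
```

Setting `lower.tail = FALSE` matters here: the default returns the probability of a smaller value, which is the complement of the p-value.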

The R code for the whole exercise

# Observed counts: rows are outcomes, columns are vaccination status.
# The first column holds the un-vaccinated group (23, 8, 61).
edu_data <- matrix(c(23, 5, 8, 10, 61, 77), ncol = 2, byrow = TRUE)
colnames(edu_data) <- c("No-Vac", "Vac")
rownames(edu_data) <- c("pneumococcal pneumonia", "non-pneumococcal pneumonia", "Stayed healthy")

# Pearson's Chi-square test of independence, then print the table
chisq.test(edu_data)
edu_data

	Pearson's Chi-squared test

data:  edu_data
X-squared = 13.649, df = 2, p-value = 0.001087

                           No-Vac Vac
pneumococcal pneumonia         23   5
non-pneumococcal pneumonia      8  10
Stayed healthy                 61  77

The p-value indicates a statistically significant association between vaccination status and pneumonia outcome. If the two were independent, a Chi-square value this large or larger would arise by pure chance only about 1.1 times in a thousand.

Reference

The Chi-square test of independence: Biochem Med