Another application of the chi-square statistic is the test for independence, which applies to categorical variables. For example, suppose you surveyed a sample of people and wanted to know whether the level of education attained depends on gender. The data collected are shown below.
Education Level | Male | Female |
No Graduation | 7 | 6 |
College | 16 | 13 |
Bachelors | 15 | 16 |
Masters | 11 | 8 |
Test for Independence
Perform a chi-square test for independence. Start by adding the row and column totals to the observed counts.
Education Level | Male Observed | Female Observed | Total |
No Graduation | 7 | 6 | 13 |
College | 16 | 13 | 29 |
Bachelors | 15 | 16 | 31 |
Masters | 11 | 8 | 19 |
Total | 49 | 43 | 92 |
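The totals in the table above can be reproduced in R. This is a minimal sketch; the names `observed` and `observed_totals` are chosen here for illustration and do not come from the original exercise.

```r
# Observed counts: rows are education levels, columns are gender
observed <- matrix(c(7, 6, 16, 13, 15, 16, 11, 8), ncol = 2, byrow = TRUE,
                   dimnames = list(c("No Graduation", "College", "Bachelors", "Masters"),
                                   c("Male", "Female")))

# Append a "Total" column (row sums), then a "Total" row (column sums and grand total)
observed_totals <- rbind(cbind(observed, Total = rowSums(observed)),
                         Total = c(colSums(observed), sum(observed)))
print(observed_totals)
```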
The expected counts under perfect independence are calculated below. The expected value at (row i, column j) is RowSum(i) × ColumnSum(j) / GrandTotal.
Education Level | Male Expected | Female Expected | Total |
No Graduation | 13×49/92 = 6.92 | 13×43/92 = 6.08 | 13 |
College | 29×49/92 = 15.45 | 29×43/92 = 13.55 | 29 |
Bachelors | 31×49/92 = 16.51 | 31×43/92 = 14.49 | 31 |
Masters | 19×49/92 = 10.12 | 19×43/92 = 8.88 | 19 |
Total | 49 | 43 | 92 |
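The expected counts can also be computed in one step as the outer product of the row sums and column sums, divided by the grand total. The sketch below assumes the observed counts are stored in a matrix named `observed` (a name chosen here for illustration).

```r
# Observed counts: rows are education levels, columns are gender
observed <- matrix(c(7, 6, 16, 13, 15, 16, 11, 8), ncol = 2, byrow = TRUE,
                   dimnames = list(c("No Graduation", "College", "Bachelors", "Masters"),
                                   c("Male", "Female")))

# Expected count for cell (i, j) = RowSum(i) * ColumnSum(j) / GrandTotal
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
print(round(expected, 2))
```

The expected counts keep the same row and column totals as the observed counts, which is a quick sanity check on the calculation.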
The chi-square statistic is the sum of (O - E)²/E over all cells, where O is the observed count and E is the expected count:
Education Level | Male (O-E)²/E | Female (O-E)²/E | Chi² |
No Graduation | (7-6.92)²/6.92 = 0.00092 | (6-6.08)²/6.08 = 0.00105 | |
College | (16-15.45)²/15.45 = 0.0196 | (13-13.55)²/13.55 = 0.0223 | |
Bachelors | (15-16.51)²/16.51 = 0.138 | (16-14.49)²/14.49 = 0.157 | |
Masters | (11-10.12)²/10.12 = 0.0765 | (8-8.88)²/8.88 = 0.087 | |
Total | 0.235 | 0.268 | 0.503 |
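The same sum can be sketched in R. Note that this version uses unrounded expected counts, so the result may differ from the hand-calculated 0.503 in the third decimal place. The variable names are illustrative only.

```r
# Observed counts: rows are education levels, columns are gender
observed <- matrix(c(7, 6, 16, 13, 15, 16, 11, 8), ncol = 2, byrow = TRUE,
                   dimnames = list(c("No Graduation", "College", "Bachelors", "Masters"),
                                   c("Male", "Female")))

# Expected counts under independence (not rounded)
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Per-cell contributions (O - E)^2 / E and their sum, the chi-square statistic
contributions <- (observed - expected)^2 / expected
chi_sq <- sum(contributions)

print(round(contributions, 4))
print(chi_sq)
```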
As before, obtain the critical value by passing the 5% significance level (0.05) to the R function qchisq with 3 degrees of freedom, since df = (rows - 1) × (columns - 1) = (4 - 1) × (2 - 1) = 3. The critical value is 7.81. The calculated statistic of 0.503 is lower than 7.81, so the null hypothesis that education level is independent of gender cannot be rejected. The p-value, obtained by passing 0.503 to the R function pchisq, is 0.918.
qchisq(0.05, df = 3, lower.tail = FALSE)   # critical value at the 5% level
pchisq(0.503, df = 3, lower.tail = FALSE)  # p-value for the calculated statistic
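The decision rule can be written out explicitly. This is a minimal sketch that takes the hand-calculated statistic 0.503 as given; names such as `alpha` and `reject_null` are introduced here for illustration.

```r
chi_sq <- 0.503   # hand-calculated statistic from the table
alpha  <- 0.05    # significance level
n_rows <- 4       # education levels
n_cols <- 2       # genders

# Degrees of freedom for a test of independence
df <- (n_rows - 1) * (n_cols - 1)

critical_value <- qchisq(alpha, df, lower.tail = FALSE)
p_value        <- pchisq(chi_sq, df, lower.tail = FALSE)

# Reject the null hypothesis only if the statistic exceeds the critical value
reject_null <- chi_sq > critical_value

print(c(df = df, critical_value = critical_value, p_value = p_value))
print(reject_null)
```

Comparing the statistic with the critical value and comparing the p-value with `alpha` are equivalent ways of reaching the same decision.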
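The R code for the whole exercise is below. chisq.test computes the expected counts without intermediate rounding, so its statistic may differ slightly from the hand-calculated 0.503.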
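# Observed counts, entered row by row: males first, then females
edu_data <- matrix(c(7, 16, 15, 11, 6, 13, 16, 8), ncol = 4, byrow = TRUE)
colnames(edu_data) <- c("No Graduation", "College", "Bachelors", "Masters")
rownames(edu_data) <- c("Male", "Female")
# Pearson's chi-square test of independence on the 2 x 4 table
chisq.test(edu_data)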
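To compare the built-in test with the hand calculation, the individual pieces can be pulled out of the object that chisq.test returns. The sketch below stores the result in a variable named `result` (an illustrative name) and reads its `statistic`, `parameter`, `p.value`, and `expected` components.

```r
# Observed counts: one row per gender, one column per education level
edu_data <- matrix(c(7, 16, 15, 11, 6, 13, 16, 8), ncol = 4, byrow = TRUE,
                   dimnames = list(c("Male", "Female"),
                                   c("No Graduation", "College", "Bachelors", "Masters")))

result <- chisq.test(edu_data)

result$statistic            # chi-square statistic
result$parameter            # degrees of freedom
result$p.value              # p-value
round(result$expected, 2)   # expected counts under independence
```

The expected counts here are the transpose of the expected table shown earlier, because this matrix puts gender in the rows.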