Chi-square for Independence

Another application of the chi-square statistic is testing for independence between categorical variables. For example, suppose you sampled a group of people and wanted to know whether education level depends on gender. The data collected are as follows.

                 Male   Female
No Graduation       7        6
College            16       13
Bachelors          15       16
Masters            11        8

Test for Independence

You perform a chi-square test for independence, starting from the observed counts and their row and column totals.

                 Male Observed   Female Observed   Total
No Graduation                7                 6      13
College                     16                13      29
Bachelors                   15                16      31
Masters                     11                 8      19
Total                       49                43      92
Observed Data

The expected data for a perfectly independent scenario are calculated as below. The expected value at (row i, column j) is RowSum(i) × ColumnSum(j) / GrandTotal.

                 Male Expected         Female Expected       Total
No Graduation    13×49/92 = 6.92       13×43/92 = 6.08          13
College          29×49/92 = 15.45      29×43/92 = 13.55         29
Bachelors        31×49/92 = 16.51      31×43/92 = 14.49         31
Masters          19×49/92 = 10.12      19×43/92 = 8.88          19
Total            49                    43                       92
Expected Data
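The expected counts above can be reproduced in a few lines of R. This is a minimal sketch, not part of the original exercise: the variable names `observed` and `expected` are chosen here for illustration, and the matrix is laid out with education levels as rows to match the tables.

```r
# Observed counts: rows are education levels, columns are genders
observed <- matrix(c( 7,  6,
                     16, 13,
                     15, 16,
                     11,  8),
                   ncol = 2, byrow = TRUE,
                   dimnames = list(c("No Graduation", "College", "Bachelors", "Masters"),
                                   c("Male", "Female")))

# Expected count at (i, j) = RowSum(i) x ColumnSum(j) / GrandTotal
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Round to two decimals to compare with the hand-computed table
round(expected, 2)
```

The `outer` call builds every RowSum(i) × ColumnSum(j) product at once, so no explicit loop over cells is needed.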

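The chi-square statistic is calculated by summing (O-E)^2/E over all cells, where O is the observed count and E is the expected count.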

                 Male (O-E)^2/E                      Female (O-E)^2/E                    Chi2
No Graduation    (7-6.92)^2/6.92 = 0.00092           (6-6.08)^2/6.08 = 0.00105
College          (16-15.45)^2/15.45 = 0.0196         (13-13.55)^2/13.55 = 0.0223
Bachelors        (15-16.51)^2/16.51 = 0.138          (16-14.49)^2/14.49 = 0.157
Masters          (11-10.12)^2/10.12 = 0.0765         (8-8.88)^2/8.88 = 0.087
Total            0.235                               0.268                               0.503
Chi-square
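The cell-by-cell calculation in the table above can be sketched in R as follows. The names `observed`, `expected`, `contributions`, and `chi_sq` are illustrative assumptions. Because the expected counts are not rounded here, the total may differ slightly in the last digit from the hand-computed 0.503.

```r
# Observed counts: rows are education levels, columns are genders
observed <- matrix(c( 7,  6,
                     16, 13,
                     15, 16,
                     11,  8),
                   ncol = 2, byrow = TRUE,
                   dimnames = list(c("No Graduation", "College", "Bachelors", "Masters"),
                                   c("Male", "Female")))

# Expected counts under independence
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Contribution of each cell: (O - E)^2 / E
contributions <- (observed - expected)^2 / expected

# Column totals (Male, Female) and the overall chi-square statistic
colSums(contributions)
chi_sq <- sum(contributions)
chi_sq
```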

As done previously, pass the 5% significance level (0.05) to the R function qchisq with 3 degrees of freedom, since df = (rows - 1) × (columns - 1) = (4 - 1) × (2 - 1) = 3. The result, 7.81, is the critical value. The calculated value of 0.503 is lower than 7.81, so the null hypothesis that education level is independent of gender cannot be rejected. The p-value is obtained by passing 0.503 to the R function pchisq, which gives 0.918.

qchisq(0.05, 3, lower.tail = FALSE)
pchisq(0.503, df=3, lower.tail=FALSE)
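The decision rule can also be written out explicitly. The following is a small sketch under the assumptions of this example (4 education levels, 2 genders, a 5% significance level, and the hand-computed statistic 0.503); the variable names are hypothetical.

```r
# Table dimensions: 4 education levels (rows) by 2 genders (columns)
n_rows <- 4
n_cols <- 2

# Degrees of freedom for a test of independence
df <- (n_rows - 1) * (n_cols - 1)

alpha  <- 0.05   # significance level
chi_sq <- 0.503  # statistic computed by hand above

# Critical value and p-value from the upper tail of the chi-square distribution
critical <- qchisq(alpha, df, lower.tail = FALSE)
p_value  <- pchisq(chi_sq, df, lower.tail = FALSE)

# Reject the null hypothesis only if the statistic exceeds the critical value
reject_null <- chi_sq > critical
reject_null
```

Comparing the statistic with the critical value and comparing the p-value with alpha are equivalent checks, so either one can be used to reach the conclusion.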

The R code for the whole exercise

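# Rows are genders, columns are education levels (the transpose of the tables above);
# the chi-square test gives the same result for either orientation.
edu_data <- matrix(c(7, 16, 15, 11, 6, 13, 16, 8), ncol = 4, byrow = TRUE)
colnames(edu_data) <- c("No Graduation", "College", "Bachelors", "Masters")
rownames(edu_data) <- c("Male", "Female")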

chisq.test(edu_data)
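
To compare the output of chisq.test with the hand calculation, the individual components of the result can be inspected. This is a sketch that rebuilds the same matrix so it runs on its own; `result` is a name chosen here for illustration.

```r
# Same data as above: rows are genders, columns are education levels
edu_data <- matrix(c(7, 16, 15, 11, 6, 13, 16, 8), ncol = 4, byrow = TRUE)

# Store the test result instead of only printing it
result <- chisq.test(edu_data)

result$statistic  # chi-square statistic
result$parameter  # degrees of freedom
result$p.value    # p-value
result$expected   # expected counts under independence
```

The statistic reported here is based on unrounded expected counts, so it may differ slightly in the last digit from the value obtained by hand.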