Test for Independence – Illustration

We have seen how R calculates the chi-squared test for independence. This time, we will calculate it by hand to build an intuition for the computation. Here are the observed values.

          High School   Bachelors   Masters   Ph.D.   Total
Female             60          54        46      41     201
Male               40          44        53      57     194
Total             100          98        99      98     395

Now, the expected values are estimated by assuming independence, which allows us to multiply the marginal probabilities to obtain the joint probabilities.

First cell

The observed frequency of female and high school is 60. If the two variables are independent, the expected frequency is the product of the marginal probabilities (being female and being in the high school group) multiplied by the total: (201/395) x (100/395) x 395. The final multiplication by 395 converts the probability into a frequency. (201/395) x (100/395) x 395 = 50.88. The other cells are calculated in the same way.
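The same arithmetic can be sketched in R for all eight cells at once. This is a minimal sketch that assumes the observed counts are typed in by hand as a matrix; the names `observed`, `row_totals`, `col_totals` and `expected` are illustrative and do not come from the earlier post.

```r
# Observed counts entered by hand (rows: gender, columns: education level)
observed <- matrix(
  c(60, 54, 46, 41,
    40, 44, 53, 57),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Female", "Male"),
                  c("High School", "Bachelors", "Masters", "Ph.D."))
)

n <- sum(observed)                 # grand total, 395
row_totals <- rowSums(observed)    # 201 and 194
col_totals <- colSums(observed)    # 100, 98, 99, 98

# Expected count under independence:
# (row total / n) * (column total / n) * n, for every row/column pair
expected <- outer(row_totals / n, col_totals / n) * n

round(expected, 2)
```

Rounded to two decimals, the first cell prints as 50.89; the 50.88 used in the text is the same value (50.886...) truncated.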

          High School   Bachelors   Masters   Ph.D.   Total
Female          50.88       49.87     50.38   49.87     201
Male            49.11       48.13     48.62   48.13     194
Total             100          98        99      98     395
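
The test statistic adds up, over all eight cells, the squared difference between observed and expected, divided by expected:

chi-squared = sum[(observed - expected)^2 / expected]
= (60 - 50.88)^2/50.88 + (54 - 49.87)^2/49.87 + (46 - 50.38)^2/50.38 + (41 - 49.87)^2/49.87 + (40 - 49.11)^2/49.11 + (44 - 48.13)^2/48.13 + (53 - 48.62)^2/48.62 + (57 - 48.13)^2/48.13
= 8.008746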
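
The sum can also be sketched in R, again assuming the observed counts are entered by hand (the variable names are illustrative). Because this version keeps the expected counts unrounded, the result comes out near 8.006 rather than exactly 8.008746, which is what the two-decimal expected values in the table produce.

```r
# Observed counts (rows: Female, Male; columns: the four education levels)
observed <- matrix(c(60, 54, 46, 41,
                     40, 44, 53, 57),
                   nrow = 2, byrow = TRUE)

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Contribution of each cell, then the chi-squared statistic as their sum
contributions <- (observed - expected)^2 / expected
chi_squared <- sum(contributions)
chi_squared
```

Printing `contributions` shows which cells drive the statistic; here the high school and Ph.D. columns contribute the most.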

To get the p-value, look up 8.008746 in the chi-squared table with degrees of freedom = (rows - 1) x (columns - 1) = (2 - 1) x (4 - 1) = 3.
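
Instead of a printed table, the upper-tail probability can be read from R's `pchisq`. The sketch below assumes the statistic from the hand calculation is typed in directly; `chi_squared`, `df` and `p_value` are illustrative names.

```r
chi_squared <- 8.008746          # statistic from the hand calculation
df <- (2 - 1) * (4 - 1)          # (rows - 1) * (columns - 1) = 3

# p-value: probability of a statistic at least this large under independence
p_value <- pchisq(chi_squared, df = df, lower.tail = FALSE)
p_value

# Critical value at the 5% level, for comparison with a printed table
qchisq(0.95, df = df)
```

The p-value falls between 0.04 and 0.05, and the statistic lies just above the 5% critical value of about 7.815.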