We have seen how R calculates the chi-squared test for independence. This time, we will work through the calculation by hand to build an intuition for what it does. Here are the observed values.
Gender | High School | Bachelors | Masters | Ph.D. | Total |
Female | 60 | 54 | 46 | 41 | 201 |
Male | 40 | 44 | 53 | 57 | 194 |
Total | 100 | 98 | 99 | 98 | 395 |
Now, the expected values are estimated by assuming independence, which allows us to multiply the marginal probabilities to obtain the joint probabilities.
First cell
The observed frequency for female and high school is 60. If the two variables are independent, the expected frequency is the product of the marginal probabilities (being female and being in the high school group), scaled by the sample size: (201/395) x (100/395) x 395 = 50.88. The final multiplication by 395 converts the joint probability into a frequency. The expression simplifies to (row total x column total) / grand total = (201 x 100) / 395. We can calculate the other cells in the same way.
Gender | High School | Bachelors | Masters | Ph.D. | Total |
Female | 50.88 | 49.87 | 50.38 | 49.87 | 201 |
Male | 49.11 | 48.13 | 48.62 | 48.13 | 194 |
Total | 100 | 98 | 99 | 98 | 395 |
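The expected-value table can be reproduced with a few lines of code. The following is a minimal sketch in plain Python (not the R used earlier); the variable names are illustrative assumptions, and only the observed counts come from the table above.

```python
# Observed counts from the table above (rows: Female, Male;
# columns: High School, Bachelors, Masters, Ph.D.).
observed = [
    [60, 54, 46, 41],
    [40, 44, 53, 57],
]

row_totals = [sum(row) for row in observed]        # marginal totals per row
col_totals = [sum(col) for col in zip(*observed)]  # marginal totals per column
grand_total = sum(row_totals)                      # total sample size

# Expected count under independence:
# (row total / N) * (column total / N) * N = row total * column total / N
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

for row in expected:
    print([round(value, 2) for value in row])
```

At full precision the first cell is 50.886, so it prints as 50.89, whereas the table above shows it cut to 50.88. The other cells match the table.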
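chi-squared = sum over all cells of (observed - expected)^2 / expected
= (60 - 50.88)^2/50.88 + (54 - 49.87)^2/49.87 + (46 - 50.38)^2/50.38 + (41 - 49.87)^2/49.87 + (40 - 49.11)^2/49.11 + (44 - 48.13)^2/48.13 + (53 - 48.62)^2/48.62 + (57 - 48.13)^2/48.13
= 8.008746
This figure uses the expected values rounded to two decimals. With unrounded expected values, the statistic is about 8.006.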
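The sum is easy to check in code. Here is a small sketch, again in plain Python with assumed variable names, that evaluates the statistic twice: once with the two-decimal expected values from the table, and once with expected values kept at full precision.

```python
# Observed counts (rows: Female, Male).
observed = [
    [60, 54, 46, 41],
    [40, 44, 53, 57],
]

# Expected counts as shown in the table, rounded to two decimals.
expected_table = [
    [50.88, 49.87, 50.38, 49.87],
    [49.11, 48.13, 48.62, 48.13],
]

def chi_squared(obs, exp):
    """Sum (observed - expected)^2 / expected over every cell."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(obs, exp)
        for o, e in zip(obs_row, exp_row)
    )

# Expected counts at full precision: row total * column total / grand total.
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)
expected_exact = [[r * c / grand_total for c in col_totals] for r in row_totals]

stat_table = chi_squared(observed, expected_table)  # uses the rounded table values
stat_exact = chi_squared(observed, expected_exact)  # uses unrounded expected values

print(round(stat_table, 6), round(stat_exact, 4))
```

The small gap between the two results comes only from rounding the expected values before summing.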
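The degrees of freedom are (rows - 1) x (columns - 1) = (2 - 1) x (4 - 1) = 3. You can look up 8.008746 in the chi-squared table with 3 degrees of freedom to get the p-value. The critical value at the 0.05 level for 3 degrees of freedom is 7.815; since 8.008746 is larger, the p-value is slightly below 0.05.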
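Instead of a printed table, the p-value can also be computed directly. The sketch below assumes the closed-form upper-tail probability of the chi-squared distribution that holds specifically for 3 degrees of freedom, so it needs only Python's standard library. The function name `chi2_sf_df3` is a hypothetical helper, not a library call.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability P(X > x) for a chi-squared variable with
    exactly 3 degrees of freedom (closed form; not valid for other df)."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Degrees of freedom for a 2 x 4 contingency table.
n_rows, n_cols = 2, 4
df = (n_rows - 1) * (n_cols - 1)

statistic = 8.008746  # value obtained from the hand calculation
p_value = chi2_sf_df3(statistic)

print(df, round(p_value, 4))
```

The result is a p-value of roughly 0.046, which agrees with the table lookup: just under the 0.05 threshold.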