The following table lists the weights of 50 randomly sampled 6-year-old boys. Can we test the hypothesis that the weights of 6-year-old boys follow a normal distribution with a mean of 25 and a standard deviation of 2? We will use a chi-squared goodness-of-fit test to answer this question.
28 | 24 | 27 | 24 | 27 |
26 | 25 | 29 | 22 | 24 |
23 | 25 | 21 | 22 | 25 |
26 | 27 | 27 | 26 | 29 |
28 | 27 | 22 | 23 | 21 |
29 | 24 | 23 | 23 | 22 |
25 | 22 | 29 | 28 | 30 |
24 | 28 | 26 | 25 | 25 |
28 | 29 | 26 | 27 | 30 |
22 | 31 | 25 | 24 | 27 |
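To follow along in R, the weights can be stored as a vector and counted into the six ranges 20-21, 22-23, ..., 30-31 used in the test. This is a minimal sketch: the variable names `weights`, `breaks` and `range_labels` are illustrative, and the break points 19.5, 21.5, ..., 31.5 are an assumption chosen so that every whole-number weight falls into exactly one range. Only base R functions (`cut`, `table`, `seq`) are used.

```r
# Weights of the 50 sampled boys, entered row by row from the table
weights <- c(28, 24, 27, 24, 27,
             26, 25, 29, 22, 24,
             23, 25, 21, 22, 25,
             26, 27, 27, 26, 29,
             28, 27, 22, 23, 21,
             29, 24, 23, 23, 22,
             25, 22, 29, 28, 30,
             24, 28, 26, 25, 25,
             28, 29, 26, 27, 30,
             22, 31, 25, 24, 27)

# Assumed break points: (19.5, 21.5], (21.5, 23.5], ..., (29.5, 31.5]
# Each interval contains two whole-number weights, e.g. 20 and 21.
breaks <- seq(19.5, 31.5, by = 2)
range_labels <- c("20-21", "22-23", "24-25", "26-27", "28-29", "30-31")

# Count how many weights fall into each range
observed <- table(cut(weights, breaks = breaks, labels = range_labels))
print(observed)
```

The counts come out as 2, 10, 13, 12, 10 and 3, which sum to 50.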
The hypotheses
The null hypothesis, H0: there is no difference between the observed frequencies and the frequencies expected under a normal distribution with mean = 25 and standard deviation = 2.
The alternative hypothesis, HA: there is a difference between the observed frequencies and the frequencies expected under a normal distribution with mean = 25 and standard deviation = 2.
Estimation of the chi-squared statistic
We divide the data in the previous table into six ranges of equal width (20-21, 22-23, 24-25, 26-27, 28-29 and 30-31) and count the observed frequency in each range. The expected frequency for each range is calculated from the cumulative distribution function (CDF) of the hypothesized normal distribution using the formula
$$E_i = n \times [F(U_i) - F(L_i)]$$
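where $n$ is the sample size (50 boys), $F$ is the CDF of the normal distribution with mean 25 and standard deviation 2, and $U_i$ and $L_i$ are the upper and lower limits of the $i$-th range. $F(U_i)$ and $F(L_i)$ are therefore the values of the CDF at those limits, not the limits themselves.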
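The formula can be sketched in R with `pnorm`, the base R normal CDF. The helper name `expected_freq` is hypothetical (not part of R), and its default `mean = 25` and `sd = 2` simply encode the hypothesized distribution. For the first range, taking L = 20 and U = 21 gives about 0.83, the value listed in the table.

```r
# Hypothetical helper: expected frequency E = n * [F(U) - F(L)],
# where F is the normal CDF with the hypothesized mean and sd
expected_freq <- function(n, lower, upper, mean = 25, sd = 2) {
  n * (pnorm(upper, mean = mean, sd = sd) - pnorm(lower, mean = mean, sd = sd))
}

# First range, with lower limit 20 and upper limit 21
e1 <- expected_freq(n = 50, lower = 20, upper = 21)
round(e1, 2)
```

Because `pnorm` is vectorized, the same function also accepts vectors of lower and upper limits and returns one expected frequency per range.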
Range | Observed Frequency (O) | Expected Frequency (E) | (O-E)²/E |
20-21 | 2 | 0.83 | 1.65 |
22-23 | 10 | 6.65 | 1.69 |
24-25 | 13 | 16.45 | 0.72 |
26-27 | 12 | 16.07 | 1.03 |
28-29 | 10 | 6.21 | 2.31 |
30-31 | 3 | 0.94 | 4.51 |
Chi-squared statistic (sum) | | | 11.92 |
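The last column and its sum can be checked with a few lines of R. This sketch takes the observed and expected frequencies exactly as listed in the table (the expected values are copied, not recomputed); the names `O`, `E` and `contrib` are illustrative.

```r
# Observed and expected frequencies copied from the table
O <- c(2, 10, 13, 12, 10, 3)
E <- c(0.83, 6.65, 16.45, 16.07, 6.21, 0.94)

# Contribution of each range, (O - E)^2 / E
contrib <- (O - E)^2 / E

# The chi-squared statistic is the sum of the contributions
chi2 <- sum(contrib)

round(contrib, 2)
round(chi2, 2)
```

The rounded contributions are 1.65, 1.69, 0.72, 1.03, 2.31 and 4.51, and their sum rounds to 11.92, matching the table.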
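The critical value at the 5% significance level and the p-value are computed with the following R code. Because there are six ranges and the mean and standard deviation are specified in advance rather than estimated from the data, the test has 6 - 1 = 5 degrees of freedom.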
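```r
qchisq(0.05, df = 5, lower.tail = FALSE)   # critical value: upper 5% point of chi-squared with 5 df
pchisq(11.92, df = 5, lower.tail = FALSE)  # p-value: P(chi-squared with 5 df >= 11.92)
```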
The critical value is 11.07 and the p-value is 0.036. Because the chi-squared statistic (11.92) exceeds the critical value (equivalently, p = 0.036 < 0.05), we reject, at the 5% significance level, the null hypothesis that the data follow a normal distribution with a mean of 25 and a standard deviation of 2.