Here is some data on drinking and getting into trouble with the police. Assess the relationship between drinking habits and getting into trouble with the authorities. Does this data provide evidence of an association between drinking and getting into trouble with the police?
Observed | Never | Occasional | Frequent |
Trouble with Police | 60 | 200 | 420 |
No trouble with Police | 4800 | 2700 | 2800 |
The first step is to form the hypotheses. Here is the null hypothesis:
H0 – Drinking habits and getting into trouble with the police are independent.
The alternative hypothesis is:
H1 – Drinking habits and getting into trouble with the police are not independent.
We will use the chi-squared test to test the null hypothesis. It requires the observed data as well as the data expected if the null hypothesis were true. From the observed data, the number of people in each drinking category is:
Drinking habit | Never | Occasional | Frequent | Total |
Count | 4860 | 2900 | 3220 | 10980 |
Share (%) | 44.26 | 26.41 | 29.33 | 100 |
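The category totals and percentages above can be reproduced with a few lines of Python. This is only a sketch using the standard library; the variable names (`observed`, `column_totals`, `shares`) are illustrative choices, not part of the original analysis.

```python
# Observed counts from the first table; columns are Never, Occasional, Frequent.
observed = {
    "Trouble with Police": [60, 200, 420],
    "No trouble with Police": [4800, 2700, 2800],
}
categories = ["Never", "Occasional", "Frequent"]

# Column totals: number of people in each drinking category.
column_totals = [sum(column) for column in zip(*observed.values())]
grand_total = sum(column_totals)

# Share of each drinking category in the whole sample, in percent.
shares = [round(100 * total / grand_total, 2) for total in column_totals]

for name, total, share in zip(categories, column_totals, shares):
    print(f"{name}: {total} people, {share}% of the sample")
print(f"Total: {grand_total}")
```

The same column totals are reused later, because the expected counts under independence are built from them.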
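Under the null hypothesis (independence), drinking habits have no bearing on trouble with the police. The 680 people who got into trouble, and the 10300 who did not, should then each be split across the drinking categories in these same percentages. Multiplying each row total by the category percentages gives the expected numbers we need: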
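Expected | Never | Occasional | Frequent |
Trouble with Police | 301.0 | 179.6 | 199.4 |
No trouble with Police | 4559.0 | 2720.4 | 3020.6 |
If you express either row as percentages of its own row total, you get the same split as in the overall totals (up to rounding):
Drinking habit | Never | Occasional | Frequent |
Share of row (%) | 44.26 | 26.41 | 29.33 |
It's time for the chi-squared statistic, i.e. (observed - expected) squared, divided by expected, summed over all six cells:

$$\chi^2 = \sum_{\text{cells}} \frac{(O - E)^2}{E}$$

$$\chi^2 = \frac{(60 - 301.0)^2}{301.0} + \frac{(200 - 179.6)^2}{179.6} + \frac{(420 - 199.4)^2}{199.4} + \frac{(4800 - 4559.0)^2}{4559.0} + \frac{(2700 - 2720.4)^2}{2720.4} + \frac{(2800 - 3020.6)^2}{3020.6} \approx 468$$

The chi-squared statistic is about 468. The degrees of freedom are the product of one less than the number of rows and one less than the number of columns of the table, i.e. (2-1) x (3-1) = 2. Looking at a chi-squared probability table, 468 lies far out in the right tail of the distribution (the 5% critical value for 2 degrees of freedom is only 5.99), so the p-value is practically zero. A discrepancy this large between observed and expected counts would be extremely unlikely if drinking habits and trouble with the police were independent, so the null hypothesis is rejected: the data provide strong evidence of an association between the two.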
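The whole calculation can be checked with a short standard-library Python sketch. It keeps the expected counts unrounded, so its statistic can differ by a unit or so from a hand calculation that uses rounded expected counts. The p-value line relies on an assumption worth stating: for exactly 2 degrees of freedom the upper-tail probability of the chi-squared distribution has the closed form exp(-x/2), so this shortcut does not carry over to tables of other sizes.

```python
import math

# Observed counts; columns are Never, Occasional, Frequent.
observed = [
    [60, 200, 420],      # Trouble with Police
    [4800, 2700, 2800],  # No trouble with Police
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Pearson chi-squared statistic: sum of (observed - expected)^2 / expected.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) * (columns - 1).
dof = (len(observed) - 1) * (len(observed[0]) - 1)

# Assumption: closed-form upper tail, valid only when dof == 2.
p_value = math.exp(-chi2 / 2)

print(f"chi-squared = {chi2:.1f}, dof = {dof}, p-value = {p_value:.1e}")
```

For tables with other dimensions, a statistics library routine for contingency tables would be the more general choice than the closed-form tail used here.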