Chi-Square Test for the Lefties

Randomness, and the scatter it produces in data, can confuse people interpreting observations. Take this example: studies tell us that 10% of the population is left-handed. You survey 150 randomly selected people and find that 20 are left-handed. Does this contradict the theory, or is it just normal variation? What do we do?

Goodness of fit

You perform a chi-square goodness of fit test on the data.

         Observed (O)    Expected (E)    (O-E)^2/E
Left          20              15           25/15
Right        130             135           25/135
Total        150             150           1.85
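The arithmetic in the table can be reproduced in a few lines of R. This is only a sketch: the variable names (observed, null_probs, expected, contributions, chi_sq) are chosen here for illustration and do not come from any package.

```r
# Observed counts from the survey of 150 people
observed <- c(left = 20, right = 130)

# Proportions claimed by the null hypothesis (10% left-handed)
null_probs <- c(left = 0.1, right = 0.9)

# Expected counts: total sample size times each hypothesised proportion
expected <- sum(observed) * null_probs        # 15 and 135

# Contribution of each category: (O - E)^2 / E
contributions <- (observed - expected)^2 / expected

# The chi-square statistic is the sum of the contributions (about 1.85)
chi_sq <- sum(contributions)
print(chi_sq)
```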

We test the notion (the null hypothesis that 10% of people are left-handed) at a 5% significance level. In other words, we reject it only if the observed data fall among the most extreme 5% of outcomes expected under the null hypothesis. Because the chi-square statistic grows when the observed counts deviate from the expected counts in either direction, the alternative hypothesis here is that the proportion of lefties differs from 10%. So how do you find the critical value at a 0.05 (5%) significance level? The old-fashioned way is a lookup table, where you find the number by matching the degrees of freedom (in this case, df = 1, one less than the number of categories) and the significance level. Instead, we use the following R code to get it.

qchisq(0.05, 1, lower.tail = FALSE) # qchisq(p, df)

The answer is 3.84. In other words, the calculated chi-square value must be greater than 3.84 for us to reject the notion (the null hypothesis). In our case, it is 1.85, which is less than 3.84, so we can’t reject the notion of 10% lefties, even though we saw 20 in 150 (about 13%)!
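The decision rule can also be written out explicitly. The following is a minimal sketch; alpha, critical and reject_null are illustrative names introduced here, and the statistic is recomputed from the two table terms so the snippet runs on its own.

```r
# Chi-square statistic from the table: 25/15 + 25/135 (about 1.85)
chi_sq <- 25 / 15 + 25 / 135

# Significance level and degrees of freedom (two categories, so df = 1)
alpha <- 0.05
df <- 1

# Upper-tail critical value of the chi-square distribution (about 3.84)
critical <- qchisq(alpha, df, lower.tail = FALSE)

# Reject the null hypothesis only if the statistic exceeds the critical value
reject_null <- chi_sq > critical
print(reject_null)
```

Here reject_null is FALSE, which matches the conclusion that 20 lefties in 150 is compatible with the 10% figure.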

p-value

How do we calculate our favourite p-value from this? We plug the chi-square value (1.85) into the pchisq function.

pchisq(1.85, df=1, lower.tail=FALSE)

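The answer is about 0.1738. Since this is greater than 0.05, it agrees with the critical-value comparison: we can’t reject the null hypothesis. Note that pchisq and qchisq are inverses of each other: pchisq converts a chi-square value into a tail probability, and qchisq converts a tail probability back into a chi-square value. In other words,

qchisq(0.1738, 1, lower.tail = FALSE)

gives back approximately 1.85.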
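The inverse relationship can be checked with a round trip that avoids rounding the p-value by hand. This is a sketch with illustrative variable names; the cross-check against pnorm relies on the standard fact that a chi-square variable with one degree of freedom is the square of a standard normal variable.

```r
chi_sq <- 1.85

# Chi-square value -> upper-tail probability (the p-value)
p_value <- pchisq(chi_sq, df = 1, lower.tail = FALSE)

# Upper-tail probability -> chi-square value (should recover 1.85)
round_trip <- qchisq(p_value, df = 1, lower.tail = FALSE)

# Cross-check for df = 1: the same p-value from the two-sided normal tail
p_from_normal <- 2 * pnorm(-sqrt(chi_sq))

print(c(p_value = p_value, round_trip = round_trip, p_from_normal = p_from_normal))
```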

Everything in one step

The following R code does everything from the start, computing the expected counts, the chi-square statistic and the p-value in a single call:

obsfreq <- c(20,130)
nullprobs <- c(0.1,0.9)
chisq.test(obsfreq,p=nullprobs)

The answer will be in the following format:

	Chi-squared test for given probabilities

data:  obsfreq
X-squared = 1.8519, df = 1, p-value = 0.1736
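If you need the individual numbers rather than the printed summary, the object returned by chisq.test can be stored and queried. The sketch below uses the statistic, parameter, p.value and expected components of that object; the name result and the other variable names are chosen here for illustration.

```r
obsfreq <- c(20, 130)
nullprobs <- c(0.1, 0.9)

# Store the test result instead of only printing it
result <- chisq.test(obsfreq, p = nullprobs)

chi_sq   <- unname(result$statistic)   # the X-squared value
df       <- unname(result$parameter)   # degrees of freedom
p_value  <- result$p.value             # the p-value
expected <- unname(result$expected)    # expected counts under the null

# Same decision as before: reject only if the p-value is below 0.05
reject_null <- p_value < 0.05
print(reject_null)
```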