The Trouble with Multiple Testing

Last time we saw the issue with sub-group analysis, one example of multiple hypothesis testing. Here, we illustrate the general problem with multiple testing. Before that, a few recaps.

  1. Hypothesis testing is a statistical procedure for putting assumptions (hypotheses) about a population parameter to the test, based on evidence collected from samples.
  2. The null hypothesis is the default assumption (what we take to be true before seeing the evidence).
  3. Alpha (the significance level) is the evidence threshold: it sets how strong the evidence in your sample must be before an effect is declared statistically significant.
  4. The p-value is the probability of observing a statistic at least as extreme as the one in your sample, assuming the null hypothesis is true.
  5. If p < alpha, the null hypothesis is rejected.
  6. A type I error occurs when the null hypothesis is true but you reject it.
  7. Rejecting the null hypothesis may be called a discovery.

Putting items 4–6 together: when the null hypothesis is true, the probability of a type I error is exactly alpha.
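To make this concrete, here is a minimal simulation sketch (assuming NumPy and SciPy are available; the sample size, seed, and choice of a one-sample t-test are illustrative, not part of the original argument). We repeatedly draw samples from a population where the null hypothesis really is true and count how often the test wrongly rejects.

```python
# Simulate many one-sample t-tests where the null hypothesis is true,
# and check how often we (wrongly) reject at alpha = 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # arbitrary seed for reproducibility
alpha = 0.05
n_simulations = 10_000

rejections = 0
for _ in range(n_simulations):
    # Null is true: the sample really does come from a population with mean 0.
    sample = rng.normal(loc=0.0, scale=1.0, size=30)
    result = stats.ttest_1samp(sample, popmean=0.0)
    if result.pvalue < alpha:
        rejections += 1  # a type I error

print(f"Type I error rate: {rejections / n_simulations:.3f}")  # close to 0.05
```

With enough repetitions, the rejection rate settles near 0.05, which is exactly alpha.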

Assume five independent tests are done at a 5% significance level, and the null hypothesis is true in all of them. What is the probability that at least one of the tests rejects its null hypothesis?

We know the familiar complement rule: P(at least one) = 1 – P(none). Since each independent test avoids a type I error with probability (1 – alpha), P(at least one type I error) = 1 – P(no type I error) = 1 – (1 – alpha)^5.

1 – (1 – 0.05)^5 ≈ 0.226, or 22.6%

So even though every null hypothesis is true, we have a 22.6% chance of rejecting at least one of them, i.e., of making at least one type I error.
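The same effect shows up in simulation. Below is a rough sketch (again assuming NumPy and SciPy; the sample size, seed, and t-test setup are illustrative assumptions): each simulated experiment runs five independent tests with all null hypotheses true, and we count how often at least one of them rejects.

```python
# Run 5 independent t-tests per experiment, all with a true null,
# and count how often at least one of them rejects at alpha = 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)  # arbitrary seed for reproducibility
alpha = 0.05
n_tests = 5
n_experiments = 10_000

at_least_one_rejection = 0
for _ in range(n_experiments):
    p_values = [
        stats.ttest_1samp(rng.normal(size=30), popmean=0.0).pvalue
        for _ in range(n_tests)
    ]
    if min(p_values) < alpha:  # at least one test rejected its (true) null
        at_least_one_rejection += 1

print(f"Analytical: {1 - (1 - alpha) ** n_tests:.3f}")       # 0.226
print(f"Simulated:  {at_least_one_rejection / n_experiments:.3f}")
```

The simulated rate lands near the analytical 22.6%, confirming that running more tests inflates the chance of at least one false discovery.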