p-value Revisited

Hypothesis testing is an essential tool in experimental research, not least in pharmaceutical studies and drug discovery. As a reminder, hypothesis testing asks how probable the observed result (or a more extreme one) would be if chance alone were at work.

The word ‘hypothesis’ points to a default position (‘no effect’, the null hypothesis), and the trial examines whether the intervention (e.g. taking a medicine) has made a difference. In other words, if the experimental results reject the null hypothesis, a discovery has been made.

Then there is the popular p-value approach, which quantifies the evidence and helps decide whether or not to reject the null hypothesis. The experimenter sets a significance level before looking at the p-value. The significance level protects against incorrectly claiming a discovery: it is the probability of rejecting the null hypothesis when it is true (a.k.a. Type I error). The smaller the significance level, the stronger the evidence required. A simple coin-flipping example shows how tough discovery (rejection of the null hypothesis) is. I flipped a coin ten times and got eight heads. Do I have sufficient evidence to conclude that the coin is biased toward heads?

Let’s assume a commonly used significance level of 0.05 (5%). My null hypothesis, naturally, is that the coin is fair (unbiased, with an equal probability of landing heads or tails). We will use the binomial formula to compute the chance of getting eight or more heads with an unbiased coin.

$$
\begin{aligned}
P(H \ge 8) &= P(H = 8) + P(H = 9) + P(H = 10) \\
&= \binom{10}{8}(0.5)^{8}(0.5)^{2} + \binom{10}{9}(0.5)^{9}(0.5)^{1} + \binom{10}{10}(0.5)^{10}(0.5)^{0} \\
&= 0.044 + 0.0098 + 0.00098 \approx 0.055
\end{aligned}
$$

The following R code can do it in one line.

binom.test(8, 10, 0.5, alternative="greater") 
Exact binomial test

data:  8 and 10
number of successes = 8, number of trials = 10, p-value = 0.05469
alternative hypothesis: true probability of success is greater than 0.5
95 percent confidence interval:
 0.4930987 1.0000000
sample estimates:
probability of success 
                   0.8 
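
As a cross-check, the hand calculation can be rebuilt term by term with base R's dbinom and pbinom. This is only a minimal sketch; the variable names (n_flips, p_fair, terms, p_sum, p_tail) are illustrative and not part of the original analysis.

```r
# Cross-check of the hand calculation: P(H >= 8) for 10 flips of a fair coin.
n_flips <- 10    # number of coin flips
p_fair  <- 0.5   # probability of heads under the null hypothesis

# Individual terms P(H = 8), P(H = 9), P(H = 10)
terms <- dbinom(8:10, size = n_flips, prob = p_fair)

# Sum of the three terms
p_sum <- sum(terms)

# Same quantity from the upper tail of the binomial CDF: P(H > 7) = P(H >= 8)
p_tail <- pbinom(7, size = n_flips, prob = p_fair, lower.tail = FALSE)

print(round(terms, 5))
print(p_sum)
print(p_tail)
```

Both p_sum and p_tail should agree with the p-value that binom.test reports for eight heads in ten flips.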

The p-value (0.055) is greater than the significance level (0.05), so we cannot reject the null hypothesis. Even eight heads out of ten flips is not enough evidence to conclude that the coin is biased towards heads. Had you wanted to be stricter and set a tighter significance level of 1%, even 9 heads out of 10 would have failed the test (p-value = 0.01074 > 0.01)!
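
To see how the decision changes with the number of heads, the one-sided p-value can be tabulated against both significance levels. The sketch below assumes the same setup (ten flips, a fair coin under the null hypothesis, a one-sided "greater" alternative); the data frame and its column names are made up for illustration.

```r
# One-sided p-values P(H >= k) for k heads out of 10 flips of a fair coin,
# compared against the 5% and 1% significance levels.
n_flips <- 10
heads   <- 7:10

# pbinom(k - 1, ..., lower.tail = FALSE) gives P(H > k - 1) = P(H >= k)
p_values <- pbinom(heads - 1, size = n_flips, prob = 0.5, lower.tail = FALSE)

result <- data.frame(
  heads          = heads,
  p_value        = round(p_values, 5),
  reject_at_5pct = p_values < 0.05,
  reject_at_1pct = p_values < 0.01
)

print(result)
```

Under these assumptions, nine or ten heads reject the null hypothesis at the 5% level, while only ten heads out of ten reject it at the 1% level.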

Now, you can imagine why the ‘Valley of Death’ exists in clinical research.