Troubles with Sub-Group Analysis

Here is an example from Dr Andrew Vickers’s book, ‘What is a p-value anyway?’, on the trouble investigators get into when they run extra analyses in the hope of reaching statistical significance. A well-known type is the sub-group analysis. Consider the following data comparing a new cancer drug with an old one.

             New.Drug  Old.Drug
Recurred          150       190
Cancer free       850       810

Run Fisher’s exact test on this table, and you get a p-value of 0.02: the difference is statistically significant, and the lower recurrence rate points to the new drug being more effective.

p-value = 0.02016
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.5904410 0.9576516
sample estimates:
odds ratio 
  0.752434

Now split the same patients into two sub-groups:

MEN          New.Drug  Old.Drug
Recurred           80       100
Cancer free       420       400

WOMEN        New.Drug  Old.Drug
Recurred           70        90
Cancer free       430       410

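Run the test for the first sub-group (men) and the p-value is 0.12; for the second (women), it is 0.1. Neither is below 0.05, so the new drug works for people, but not for men or for women! The effect has not vanished; each sub-group simply has half the patients and therefore less power to detect it.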
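The three tests can be reproduced with a few lines of R. This is a minimal sketch: the helper `make_table` and the variable names are my own, and the counts are the ones from the tables above.

```r
# Rows: Recurred, Cancer free; columns: New.Drug, Old.Drug
make_table <- function(recurred, cancer_free) {
  matrix(c(recurred, cancer_free), nrow = 2, byrow = TRUE,
         dimnames = list(c("Recurred", "Cancer free"),
                         c("New.Drug", "Old.Drug")))
}

men   <- make_table(c(80, 100), c(420, 400))
women <- make_table(c(70, 90),  c(430, 410))
all_patients <- men + women   # adds up to the overall table (150/190, 850/810)

# Two-sided Fisher's exact test on each table
p_values <- sapply(list(all = all_patients, men = men, women = women),
                   function(tab) fisher.test(tab)$p.value)
print(round(p_values, 3))
```

Only the combined table should fall below the 0.05 threshold; the two halves share the same direction of effect but not the sample size.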

Reference

What is a p-value anyway? 34 Stories to Help You Actually Understand Statistics:  Andrew Vickers


Hooping with Jordan

Dr Andrew J. Vickers’ famous ‘Hoop story with Jordan’ shows how a p-value from a hypothesis test should be interpreted. The story goes like this:

The other day I shot baskets with Michael Jordan. He shot 7 straight free throws; I hit 3 and missed 4 and then rushed to the sideline, grabbed my laptop and calculated a p-value by Fisher’s exact test.

Andrew Vickers, What is a p-value anyway? 34 Stories to Help You Actually Understand Statistics 

So, what was the p-value? Let’s summarise the results and apply the test using R code.

         Basket  No.Basket
Jordan        7          0
Vickers       3          4
hoop.game <- data.frame("Basket" = c(7, 3), "No Basket" = c(0, 4), row.names = c("Jordan", "Vickers"))
fisher.test(hoop.game)
Fisher's Exact Test for Count Data

data:  hoop.game
p-value = 0.06993
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.8498871       Inf
sample estimates:
odds ratio 
       Inf 

Now, would you take this p-value (0.07) to mean that there is no difference between my basketball skills and those of Michael Jordan? The answer is a firm NO: a p-value above 0.05 only says that this small experiment failed to demonstrate a difference between the two players, not that none exists.
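The p-value from the test can also be rebuilt by hand from the hypergeometric distribution, which is what Fisher's exact test uses once the row and column totals are fixed. The sketch below is one way to do it; the variable names are my own, and the small tolerance factor is my choice to guard against floating-point ties.

```r
# Margins of the table: 10 baskets and 4 misses in total; Jordan took 7 shots.
baskets <- 10
misses <- 4
jordan_shots <- 7

# Possible number of baskets for Jordan given those margins: 3 to 7.
possible <- max(0, jordan_shots - misses):min(jordan_shots, baskets)
prob <- dhyper(possible, m = baskets, n = misses, k = jordan_shots)

observed <- prob[possible == 7]
# Two-sided p-value: add up every table that is as likely as,
# or less likely than, the observed one.
p_value <- sum(prob[prob <= observed * (1 + 1e-7)])
print(p_value)
```

Only two tables qualify here (Jordan 7 and Vickers 3, or the mirror image), which is why so few shots cannot separate the players.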

Reference

What is a p-value anyway? 34 Stories to Help You Actually Understand Statistics:  Andrew Vickers

Low-Fat Diets Flub a Test: NYT


p-value Revisited

Hypothesis testing is an all-important tool in experimental research and, needless to say, in pharmaceutical studies and drug discovery. In case you forgot, a hypothesis test asks how probable the observed result (or a more extreme one) would be if chance alone were at work.

The word used here is ‘hypothesis’, which suggests some default position (‘no effect’ or Null Hypothesis), and the trial aspires to examine whether the intervention (e.g. consumption of medicine) has made a difference. In other words, if the experimental results reject the null hypothesis, a discovery has happened.

Then you have the popular p-value approach, which quantifies the evidence and helps decide whether to reject the null hypothesis or not. The experimenter sets a significance level before looking at the p-value. The significance level gives protection against incorrectly making a discovery: it is the probability of rejecting the default when it is true (a.k.a. Type I Error)! The smaller the level, the stronger the evidence required. A simple coin-flipping example shows how tough discovery (rejection of the null hypothesis) is. I have flipped a coin ten times and got eight heads. Do I have sufficient evidence to conclude that the coin is biased towards heads?

Let’s assume a commonly used significance level of 0.05 (5%). My null hypothesis, naturally, is that the coin is fair (unbiased, with an equal probability of landing heads or tails). We will use the binomial formula to calculate the chance of getting eight or more heads with an unbiased coin.

$$P(H \ge 8) = P(H = 8) + P(H = 9) + P(H = 10) = \binom{10}{8}(0.5)^{8}(0.5)^{2} + \binom{10}{9}(0.5)^{9}(0.5)^{1} + \binom{10}{10}(0.5)^{10}(0.5)^{0} = 0.044 + 0.0098 + 0.00098 = 0.055$$

The following R code can do it in one line.

binom.test(8, 10, 0.5, alternative="greater") 
Exact binomial test

data:  8 and 10
number of successes = 8, number of trials = 10, p-value = 0.05469
alternative hypothesis: true probability of success is greater than 0.5
95 percent confidence interval:
 0.4930987 1.0000000
sample estimates:
probability of success 
                   0.8 

The p-value (0.0547) is greater than the significance level (0.05). So even eight heads out of ten flips is not enough to reject the hypothesis that the coin is fair. Imagine you wanted to be stricter and set a tighter significance level of 1%: then even 9 heads out of 10 would fail the test (p-value = 0.01074 > 0.01)!
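Both tail probabilities can be checked directly by summing binomial terms. This is a small sketch; the helper `p_at_least` is a name I made up for illustration.

```r
# P(at least h heads in n flips of a coin with P(heads) = p)
p_at_least <- function(h, n = 10, p = 0.5) {
  sum(dbinom(h:n, size = n, prob = p))
}

p8 <- p_at_least(8)   # compared against the 5% level
p9 <- p_at_least(9)   # compared against the 1% level

print(c(eight_or_more = p8, nine_or_more = p9))
print(c(reject_at_5pct = p8 < 0.05, reject_at_1pct = p9 < 0.01))
```

The two sums are 56/1024 and 11/1024, so neither result clears its threshold.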

Now, you can imagine why the ‘Valley of Death’ exists in clinical research.


McKinsey Curve – Energy Efficiency

If you have seen the McKinsey curve, and I’m sure you have, one surprise is that a significant portion of the graph shows a negative abatement cost, yet those measures have not been implemented! Simple economics can’t explain that. So why does this resource remain untapped?

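One possible explanation is a lack of information.

A second is the principal-agent problem: the party who would pay for an efficiency measure (a landlord, say) is not always the one who pays the energy bill (the tenant).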

Reference

McKinsey Curve


McKinsey Curve

The McKinsey curve is a global mapping of opportunities to reduce GHG emissions, and it is quite influential among policymakers. These are GHG abatement cost curves estimated for a future period for different countries. The following is an illustration of how they appear (for the actual curves, follow the link in the reference).

For an economist, it is a supply curve: a map of the marginal cost of producing the marginal unit, here the cost of reducing the last ton of greenhouse gas emissions. Each block represents one item: residential lighting, cellulosic biofuel, onshore wind, and coal power plants with CCS, to name a few.

Take one block, say residential lighting: its width represents how much greenhouse gas emissions would fall if we optimized the residential lighting system. The height is how much that would cost households ($/ton CO2). If it is negative, the household gains money.

Most items on the negative side (the left side) are related to energy efficiency and, by the definition of efficient markets, should happen by default, like replacing CFL lamps with LEDs. That they don’t always happen that way is a different matter altogether. But how do you get everything on the list done? From an economist’s standpoint, add a carbon tax larger than the height of the tallest block on the right side. It then becomes cheaper to perform abatement in every sector than to pay the tax.
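The carbon-tax logic can be sketched in a few lines of R. The measure names below come from the text, but every number is invented purely for illustration; none is taken from the McKinsey report.

```r
# Hypothetical abatement measures: all costs and volumes are made up.
measures <- data.frame(
  measure   = c("Residential lighting", "Onshore wind",
                "Cellulosic biofuel", "Coal power plant with CCS"),
  cost      = c(-80, 20, 35, 50),  # block height in $/ton CO2 (invented)
  abatement = c(1, 2, 1, 3)        # block width in Gt CO2 per year (invented)
)

# An abatement curve orders the blocks from cheapest to most expensive.
measures <- measures[order(measures$cost), ]
measures$cumulative <- cumsum(measures$abatement)

# A measure pays off whenever abating costs less than paying the tax.
carbon_tax <- 60   # $/ton CO2, chosen above the tallest block
measures$worth_doing <- measures$cost < carbon_tax
print(measures)
```

With the tax set above the tallest block, every row is flagged as worth doing; lower the tax and the blocks on the right drop out first.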

Reference

McKinsey Curve


The Misuse of Conditional Probabilities

The misuse of conditional probability was at its best (or worst) in the O. J. Simpson murder trial. To summarise the context in one line: in June 1994, the American footballer O. J. Simpson was arrested and charged with the murders of his ex-wife, Nicole Brown Simpson, and her friend, Ron Goldman.

Against the prosecution’s argument that Mr Simpson had a history of violence towards his wife, the defence argued that only 1 in 2500 men who abuse their wives end up murdering them. And the judge seemed to have accepted this conditional probability:

P(Husband murders wife | Husband abuses wife) = 1/2500

But the wife had already been murdered, so the relevant conditional probability is

P(Abusive husband is guilty | The wife is murdered)

This probability is much higher, close to 80%.


The Elevator Paradox

The elevator paradox is an observation reported by physicists Marvin Stern and George Gamow. They noticed that someone waiting for an elevator to go down from one of the top floors (but not the topmost) is more likely to find that the first elevator to stop at the floor is going up.

Imagine the building has 20 floors, and the person who wants to go down has her office on the 19th. The elevator is in constant motion, and it takes 1 second to cover one floor. Let’s write down a hypothetical journey.

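Floor  Up       Down
20     5:00:38
19     5:00:37  4:59:59; 5:00:39
18     5:00:36  5:00:00; 5:00:40
17     5:00:35  5:00:01
16     5:00:34  5:00:02
15     5:00:33  5:00:03
14     5:00:32  5:00:04
13     5:00:31  5:00:05
12     5:00:30  5:00:06
11     5:00:29  5:00:07
10     5:00:28  5:00:08
9      5:00:27  5:00:09
8      5:00:26  5:00:10
7      5:00:25  5:00:11
6      5:00:24  5:00:12
5      5:00:23  5:00:13
4      5:00:22  5:00:14
3      5:00:21  5:00:15
2      5:00:20  5:00:16
1      5:00:19  5:00:17
0      5:00:18  5:00:18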

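Everyone who arrives at floor 19 between 4:59:59 and 5:00:37 first sees the elevator going up (at 5:00:37); only those who arrive between 5:00:37 and 5:00:39 miss it and first see it coming down from floor 20. That is 38 seconds out of a 40-second round trip, so 95% of arrivals see the elevator going up first.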
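The same proportion can be checked with a short calculation and a simulation. This sketch assumes a single elevator that never pauses and a person who arrives at a uniformly random moment; both assumptions, and all variable names, are mine.

```r
top_floor  <- 20   # floors are numbered 0 to 20
wait_floor <- 19

cycle <- 2 * top_floor                        # seconds for a full round trip
down_window <- 2 * (top_floor - wait_floor)   # seconds between the up and down passes

# Exact share of arrival times for which the next pass is an upward one
p_up_exact <- 1 - down_window / cycle

# Simulation: arrival time measured from the last downward pass at the floor
set.seed(1)
arrival <- runif(1e5, min = 0, max = cycle)
# The elevator next passes going up at (cycle - down_window) seconds;
# anyone arriving after that sees it coming down first.
p_up_sim <- mean(arrival < cycle - down_window)

print(c(exact = p_up_exact, simulated = p_up_sim))
```

Changing `wait_floor` shows how the effect fades towards the middle of the building, where the up and down passes are evenly spaced.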


Chuck a Luck Game

Gambling games are fascinating illustrations of human irrationality because their mathematics is so straightforward. We have spent time on roulette wheels several times in the past. Now, it’s the game Chuck-a-Luck.

A player can bet on one of the numbers 1, 2, 3, 4, 5, 6. Three dice are rolled. If the player’s number comes up on one, two or three of the dice, she wins, respectively, one, two or three times her original stake (in addition to getting her wager back); otherwise, she loses the stake.

So what is the house advantage of Chuck-a-Luck?

Imagine the player picks a number from 1 to 6 and places a 1 dollar bet. Let X be the number of dice (out of three) that show her number. The expected value for the casino then becomes,

E(X) = 1 x P(X=0) – 1 x P(X=1) – 2 x P(X=2) – 3 x P(X=3)

E(X) is the casino’s expected gain per dollar staked
P(X=0) = probability that the player’s number appears on none of the three dice
P(X=1) = probability that it appears on exactly one of the three dice
P(X=2) = probability that it appears on exactly two of the three dice
P(X=3) = probability that it appears on all three dice

If you forgot how to calculate an expected value, read this post; it is the sum, over all outcomes, of each payoff x its probability. The probabilities themselves follow from the binomial distribution.

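$$E(X) = 1 \times \left[\binom{3}{0}\left(\frac{1}{6}\right)^{0}\left(\frac{5}{6}\right)^{3}\right] - 1 \times \left[\binom{3}{1}\left(\frac{1}{6}\right)^{1}\left(\frac{5}{6}\right)^{2}\right] - 2 \times \left[\binom{3}{2}\left(\frac{1}{6}\right)^{2}\left(\frac{5}{6}\right)^{1}\right] - 3 \times \left[\binom{3}{3}\left(\frac{1}{6}\right)^{3}\left(\frac{5}{6}\right)^{0}\right]$$

$$E(X) = \left(\frac{5}{6}\right)^{3} - 3\left(\frac{1}{6}\right)\left(\frac{5}{6}\right)^{2} - 2 \times 3\left(\frac{1}{6}\right)^{2}\left(\frac{5}{6}\right) - 3\left(\frac{1}{6}\right)^{3} = \frac{125 - 75 - 30 - 3}{216} = \frac{17}{216}$$

That is 0.0787 or 7.87%; nearly three times the house edge of European-style roulette (2.7%)!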
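The house advantage can be verified in R by weighting the casino's payoff for each outcome with its binomial probability. A minimal sketch, with variable names of my own choosing:

```r
# Casino's gain per dollar staked when 0, 1, 2 or 3 dice show the player's number
payoff_to_casino <- c(1, -1, -2, -3)

# Binomial probabilities of 0..3 matches in three rolls, each with chance 1/6
p_matches <- dbinom(0:3, size = 3, prob = 1/6)

house_edge <- sum(payoff_to_casino * p_matches)
print(house_edge)
```

The result equals 17/216, matching the hand calculation.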

Reference

Fifty Challenging Problems In Probability: Frederick Mosteller


German Tank Problem

The German tank problem is about the maths that helped the Allies in WW2 estimate the number of German tanks (Panthers) from ‘samples’, i.e., the ones captured. In a war, an accurate estimate of the maximum number of tanks on the enemy side helps gauge the size of the threat.

The Allies discovered that the components of the tanks had sequential serial numbers. They then assumed that a captured tank was equally likely to be any one from #1 to #N (the maximum), i.e., a uniform distribution with probability 1/N each. The serial numbers of the captured tanks gave them the samples.

Imagine that at some stage the following five tanks were captured: 15, 47, 79, 28, 39. Arranging them in increasing order, we get 15, 28, 39, 47, 79. We will consider them random draws from the uniform distribution and count the unobserved serial numbers in each gap (below 15, between 15 and 28, and so on), leaving out the observed numbers themselves. The gaps are 14, 12, 10, 7 and 31, and their average is 14.8. Add this correction factor to the maximum number, 79, to get 93.8, or about 94.

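If m is the highest number, k is the number of tanks captured, and N is the (unknown) total number,

$$N = m + \frac{m}{k} - 1$$

With m = 79 and k = 5, this gives N = 79 + 15.8 – 1 = 93.8, the same figure as the gap calculation.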
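Both routes to the estimate, the average gap and the closed-form formula, can be checked in R with the five serial numbers from the example. The variable names are my own.

```r
serials <- c(15, 47, 79, 28, 39)
m <- max(serials)      # highest serial number captured
k <- length(serials)   # number of tanks captured

# Gap method: count the unseen serial numbers below and between the captured ones
gaps <- diff(c(0, sort(serials))) - 1
estimate_gaps <- m + mean(gaps)

# Closed form from the text
estimate_formula <- m + m / k - 1

print(gaps)
print(c(gap_method = estimate_gaps, formula = estimate_formula))
```

The two agree because the gaps always add up to m - k, so their average is (m - k)/k = m/k - 1.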

Reference

German tank problem: Wiki


Blood-Pressure Control: SPRINT Study

The SPRINT study, sponsored by the National Heart, Lung, and Blood Institute, is a landmark work that affirmed the value of keeping systolic pressure at a lower level through intensive treatment. SPRINT stands for Systolic Blood Pressure Intervention Trial; it compared the benefit of treating to a systolic blood-pressure target of < 120 mm Hg with treating to a target of < 140 mm Hg.

SPRINT enrolled 9361 participants aged 50 years or older with high systolic blood pressure (130 to 180 mm Hg), but without diabetes, between 2010 and 2013. It was a randomized, controlled, open-label trial that compared the study outcomes between the standard-treatment group (systolic blood-pressure target < 140 mm Hg) and the intensive-treatment group (systolic blood-pressure target < 120 mm Hg).

A committee of professionals, unaware of the study-group assignments, judged the medical outcomes of the participants. The primary composite outcome was myocardial infarction, other acute coronary syndromes, stroke, heart failure, or death from cardiovascular causes. Secondary outcomes included the individual components of the primary composite outcome, death from any cause, and the composite of the primary outcome or death from any cause.

The results

Key results are summarised below

Outcome                           Intensive treatment  Standard treatment  Hazard ratio  p-value
(participants with event)         (N = 4678)           (N = 4683)
Primary outcome                   243                  319                 0.75          <0.001
Death from cardiovascular causes  37                   65                  0.57          0.005
Myocardial infarction             97                   116                 0.83          0.19
Stroke                            62                   70                  0.89          0.50
Death from any cause              155                  210                 0.73          0.003
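As a rough cross-check on the event counts, one can compute crude risk ratios (the share of participants with an event in the intensive group divided by the share in the standard group). This is only a sketch: a crude risk ratio ignores time to event, so it approximates but does not replace the hazard ratios the trial reported. The data frame and column names are my own.

```r
n_intensive <- 4678
n_standard  <- 4683

events <- data.frame(
  outcome   = c("Primary outcome", "Death from cardiovascular causes",
                "Myocardial infarction", "Stroke", "Death from any cause"),
  intensive = c(243, 37, 97, 62, 155),
  standard  = c(319, 65, 116, 70, 210)
)

# Crude risk ratio: event proportion (intensive) / event proportion (standard)
events$risk_ratio <- (events$intensive / n_intensive) /
                     (events$standard / n_standard)
print(events, digits = 2)
```

Every ratio comes out below 1, i.e., fewer events in the intensive-treatment group for each listed outcome, although the p-values show that not all of these differences are statistically significant.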

Reference

A Randomized Trial of Intensive versus Standard Blood-Pressure Control: NEJM
