Criteria for Confounders

Identifying confounders is a challenge that statisticians encounter all the time. A confounder can create an apparent association between an exposure and an outcome even when the exposure has no causal effect. A (rather silly) example is the notion that carrying matchboxes causes lung cancer. The confounder here is smoking status: smokers are more likely to carry matchboxes, and smokers have a higher chance of getting lung cancer. If this confounder is not identified, one may conclude that carrying matchboxes is the exposure that caused the outcome of lung cancer.

As per Jager et al., a confounding variable must satisfy three criteria: (1) it must be associated with the exposure of interest, (2) it must be associated with the outcome of interest, and (3) it must not be an outcome of the exposure.
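The matchbox example can be checked with a small simulation. The sketch below uses made-up probabilities (30% smokers, and so on); they are assumptions for illustration only, not data from any study. Lung cancer depends only on smoking, yet the crude comparison makes matchboxes look harmful. Comparing within smokers and within non-smokers removes most of that apparent effect.

```r
set.seed(1)
n <- 100000

# Assumed, illustrative probabilities (not taken from any study)
smoker   <- rbinom(n, 1, 0.3)
matchbox <- rbinom(n, 1, ifelse(smoker == 1, 0.7, 0.1))    # depends on smoking
cancer   <- rbinom(n, 1, ifelse(smoker == 1, 0.15, 0.01))  # depends on smoking only

# Risk ratio: risk of the outcome among exposed / risk among unexposed
risk_ratio <- function(exposure, outcome) {
  mean(outcome[exposure == 1]) / mean(outcome[exposure == 0])
}

# Crude comparison ignores the confounder
crude <- risk_ratio(matchbox, cancer)

# Stratified comparison holds smoking status fixed
rr_smokers    <- risk_ratio(matchbox[smoker == 1], cancer[smoker == 1])
rr_nonsmokers <- risk_ratio(matchbox[smoker == 0], cancer[smoker == 0])

print(c(crude = crude, smokers = rr_smokers, nonsmokers = rr_nonsmokers))
```

In this setup, smoking is associated with the exposure (matchboxes) and with the outcome (cancer), and it is not caused by carrying matchboxes, so it meets all three criteria.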


Physical Activity and Health

The March issue of the British Journal of Sports Medicine published the results of a nine-year cohort study on physical activity and its association with influenza and pneumonia.

Before we get into details, note that it is a cohort study of 577 909 US adults. Cohort studies are observational, whereas randomised controlled trials (RCTs) are interventional. Establishing causation from observational studies is problematic.

A key finding of the study has been the association of lowered risk of influenza and pneumonia with aerobic physical activity.

Reference

Webber BJ, et al. Br J Sports Med 2023;0:1–8.


Fisher’s Exact Test

Fisher’s exact test is a statistical significance test that computes an exact p-value for the association between two categorical variables. For example, scientists tagged 50 king penguins in each of three nesting areas (lower, middle, and upper) and counted the numbers that were alive or dead after a year. The following were the results.

| Nesting area | Alive | Dead |
|---|---|---|
| Lower nesting area | 43 | 7 |
| Middle nesting area | 44 | 6 |
| Upper nesting area | 49 | 1 |

Are these differences significant?

penguin.nest <- data.frame("Alive" = c(43, 44, 49), "Dead" = c(7, 6, 1), row.names = c("Lower", "Middle", "Upper"))
fisher.test(penguin.nest)

The p-value is 0.0896; at the 5% significance level, the differences are not significant.


Hypergeometry of counterfeits

A collection of 15 gold coins contains 4 counterfeits. If 2 of them are randomly selected to be sold at the auction, find the probability that

  1. neither of them is a counterfeit
  2. only one of them is a counterfeit
  3. both coins are counterfeits.

This is a hypergeometric probability distribution – picking without replacement. If X is the number of counterfeit coins (hypergeometric random variable),

P(X = 0) = \frac{_{4}C_0 \textrm{ }*\textrm{ } _{11}C_2\textrm{ }}{_{15}C_2}

choose(4,0)*choose(11,2) / choose(15,2)
0.52

P(X = 1) = \frac{_{4}C_1 \textrm{ }*\textrm{ } _{11}C_1\textrm{ }}{_{15}C_2}

choose(4,1)*choose(11,1) / choose(15,2)
0.42

P(X = 2) = \frac{_{4}C_2 \textrm{ }*\textrm{ } _{11}C_0\textrm{ }}{_{15}C_2}

choose(4,2)*choose(11,0) / choose(15,2)
0.06

Or simply, get all three probabilities (X = 0, 1 and 2) at once with dhyper():

dhyper(0:2, 4, 11, 2, log = FALSE)


Coffee Overflow

A coffee machine is set to dispense 195 ml per cup with a standard deviation of 5 ml. Assuming the amount of fill is normally distributed, what is the probability that a 200 ml cup will overflow?

For normal distributions,

P(X \ge 200) = P(z \ge \frac{200-\mu}{\sigma})  = P(z \ge \frac{200-195}{5})

Or you may use this simple R command

1 - pnorm(200, 195, 5)
0.1586553
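The standardisation step in the formula can also be written out explicitly. This is a small sketch; the variable names (mu, sigma, cup, z) are only illustrative.

```r
mu    <- 195  # mean fill in ml
sigma <- 5    # standard deviation in ml
cup   <- 200  # cup capacity in ml

# Convert the cup capacity to a z-score
z <- (cup - mu) / sigma

# Upper-tail probability of the standard normal beyond z
p_overflow <- pnorm(z, lower.tail = FALSE)
print(p_overflow)
```

Here z works out to 1, so the answer is the area of the standard normal curve to the right of one standard deviation.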


Craps Probability – Don’t Pass

Another type of bet in craps is the ‘don’t pass’ bet. Here, the winning opportunities are nearly the opposite of the pass-line bet we have seen before. Not exactly, though: had that been the case, the player would have got the exactly opposite, +1.41% advantage, which is absurd. A player never holds the winning odds in a casino game! The rules are almost the opposite, but getting 12 in the first throw is a push (no win, no loss). Let’s list all the possible outcomes and the payoff table.

  1. The player throws the dice and wins at once if the total for the first throw is 2 or 3.
  2. The player loses if the outcome is 7 or 11.
  3. It’s a push (no win, no loss) if the outcome is 12.
  4. The throws 4, 5, 6, 8, 9 or 10 are called points.
  5. If the first throw is a point, it is repeated until the same number (the point) comes back (player loses) or 7 (player wins).

The probability of winning on point 4 is the joint probability of rolling a 4 on the first throw and then rolling a 7 before another 4 on the throws that follow.

| Dice roll | Payoff | Probability (%) | Return (%) |
|---|---|---|---|
| 7 or 11 (come-out loss) | -1 | 16.67 + 5.56 = 22.23 | -22.23 |
| 2, 3 (come-out win) | 1 | 2.78 + 5.56 = 8.34 | 8.34 |
| 12 (come-out push) | 0 | 2.78 | 0 |
| Point 4 loss | -1 | 8.33*8.33/(8.33+16.67) = 2.78 | -2.78 |
| Point 5 loss | -1 | 11.11*11.11/(11.11+16.67) = 4.44 | -4.44 |
| Point 6 loss | -1 | 13.89*13.89/(13.89+16.67) = 6.31 | -6.31 |
| Point 8 loss | -1 | 13.89*13.89/(13.89+16.67) = 6.31 | -6.31 |
| Point 9 loss | -1 | 11.11*11.11/(11.11+16.67) = 4.44 | -4.44 |
| Point 10 loss | -1 | 8.33*8.33/(8.33+16.67) = 2.78 | -2.78 |
| Point 4 win | 1 | 8.33*16.67/(8.33+16.67) = 5.55 | 5.55 |
| Point 5 win | 1 | 11.11*16.67/(11.11+16.67) = 6.67 | 6.67 |
| Point 6 win | 1 | 13.89*16.67/(13.89+16.67) = 7.58 | 7.58 |
| Point 8 win | 1 | 13.89*16.67/(13.89+16.67) = 7.58 | 7.58 |
| Point 9 win | 1 | 11.11*16.67/(11.11+16.67) = 6.67 | 6.67 |
| Point 10 win | 1 | 8.33*16.67/(8.33+16.67) = 5.55 | 5.55 |
| Overall | | 100 | -1.35 |

So, as usual, the house wins.
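The same bookkeeping can be done in R with exact fractions instead of rounded percentages. This is a minimal sketch of the don't pass calculation described above; the variable names are illustrative. Because it avoids rounding each row to two decimals, the overall return may differ from the table in the last digit (it comes out at roughly -1.36).

```r
# Probability of each total (2 to 12) with two fair dice
p <- c(1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1) / 36
names(p) <- 2:12
points <- c("4", "5", "6", "8", "9", "10")

# Come-out roll: 2 or 3 wins, 7 or 11 loses, 12 is a push
win_come_out  <- p["2"] + p["3"]
lose_come_out <- p["7"] + p["11"]
p_push        <- unname(p["12"])

# After a point: a 7 before the point wins, the point before a 7 loses
win_point  <- sum(p[points] * p["7"] / (p[points] + p["7"]))
lose_point <- sum(p[points] * p[points] / (p[points] + p["7"]))

p_win  <- unname(win_come_out + win_point)
p_lose <- unname(lose_come_out + lose_point)

# Expected return per 100 units staked (payoff +1 for a win, -1 for a loss)
expected_return <- 100 * (p_win - p_lose)
print(round(expected_return, 2))
```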


Craps Probability

Here we continue and determine the probability of winning one of the craps moves, the pass line bet. Let’s summarise the ways of winning (and losing) and the corresponding payoffs.

| Dice roll | Payoff | Probability |
|---|---|---|
| 7 or 11 (come-out win) | 1 | P7 + P11 |
| 2, 3, or 12 (come-out loss) | -1 | P2 + P3 + P12 |
| Point 4 win | 1 | P4*P4/7 |
| Point 5 win | 1 | P5*P5/7 |
| Point 6 win | 1 | P6*P6/7 |
| Point 8 win | 1 | P8*P8/7 |
| Point 9 win | 1 | P9*P9/7 |
| Point 10 win | 1 | P10*P10/7 |
| Point 4 loss | -1 | P4*P7/4 |
| Point 5 loss | -1 | P5*P7/5 |
| Point 6 loss | -1 | P6*P7/6 |
| Point 8 loss | -1 | P8*P7/8 |
| Point 9 loss | -1 | P9*P7/9 |
| Point 10 loss | -1 | P10*P7/10 |

The notations are:
P7 = probability of getting a 7 in a single throw
P4/7 = probability of getting a 4 before a 7 (in the throws that follow a 4 in the first throw); P7/4 = probability of getting a 7 before a 4, etc.

The probability of winning a point 4 is the joint probability of getting a 4 in the first roll and then getting a 4 before a 7 in the rolls that follow. Let’s calculate each of these probabilities using the reference table.

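| Dice roll | Probability | % |
|---|---|---|
| 2 | 1/36 | 2.78 |
| 3 | 2/36 | 5.56 |
| 4 | 3/36 | 8.33 |
| 5 | 4/36 | 11.11 |
| 6 | 5/36 | 13.89 |
| 7 | 6/36 | 16.67 |
| 8 | 5/36 | 13.89 |
| 9 | 4/36 | 11.11 |
| 10 | 3/36 | 8.33 |
| 11 | 2/36 | 5.56 |
| 12 | 1/36 | 2.78 |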

A sample calculation goes like this: the probability of winning a point 4 is P4 (8.33) multiplied by the chance of a 4 out of 4 or 7 (8.33/(8.33 + 16.67)), i.e., 8.33*8.33/(8.33 + 16.67) = 2.78. Similarly, the probability of losing a point 4 is P4 (8.33) multiplied by the chance of a 7 out of 4 or 7 (16.67/(8.33 + 16.67)), i.e., 8.33*16.67/(8.33 + 16.67) = 5.55.

| Dice roll | Payoff | Probability (%) | Return (%) |
|---|---|---|---|
| 7 or 11 (come-out win) | 1 | 16.67 + 5.56 = 22.23 | 22.23 |
| 2, 3, or 12 (come-out loss) | -1 | 2.78 + 5.56 + 2.78 = 11.12 | -11.12 |
| Point 4 win | 1 | 8.33*8.33/(8.33+16.67) = 2.78 | 2.78 |
| Point 5 win | 1 | 11.11*11.11/(11.11+16.67) = 4.44 | 4.44 |
| Point 6 win | 1 | 13.89*13.89/(13.89+16.67) = 6.31 | 6.31 |
| Point 8 win | 1 | 13.89*13.89/(13.89+16.67) = 6.31 | 6.31 |
| Point 9 win | 1 | 11.11*11.11/(11.11+16.67) = 4.44 | 4.44 |
| Point 10 win | 1 | 8.33*8.33/(8.33+16.67) = 2.78 | 2.78 |
| Point 4 loss | -1 | 8.33*16.67/(8.33+16.67) = 5.55 | -5.55 |
| Point 5 loss | -1 | 11.11*16.67/(11.11+16.67) = 6.67 | -6.67 |
| Point 6 loss | -1 | 13.89*16.67/(13.89+16.67) = 7.58 | -7.58 |
| Point 8 loss | -1 | 13.89*16.67/(13.89+16.67) = 7.58 | -7.58 |
| Point 9 loss | -1 | 11.11*16.67/(11.11+16.67) = 6.67 | -6.67 |
| Point 10 loss | -1 | 8.33*16.67/(8.33+16.67) = 5.55 | -5.55 |
| Overall | | 100 | -1.41 |

No surprise, the house wins, with an edge of about 1.41%.
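The pass-line table can be reproduced in a few lines of R. The sketch below follows the same rules (come-out win on 7 or 11, come-out loss on 2, 3 or 12, otherwise the point against the 7) but works with exact fractions rather than rounded percentages; the variable names are illustrative.

```r
# Probability of each total (2 to 12) with two fair dice
p <- c(1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1) / 36
names(p) <- 2:12
points <- c("4", "5", "6", "8", "9", "10")

# Come-out roll
win_come_out  <- p["7"] + p["11"]
lose_come_out <- p["2"] + p["3"] + p["12"]

# After a point: the point before a 7 wins, a 7 before the point loses
win_point  <- sum(p[points] * p[points] / (p[points] + p["7"]))
lose_point <- sum(p[points] * p["7"] / (p[points] + p["7"]))

p_win  <- unname(win_come_out + win_point)
p_lose <- unname(lose_come_out + lose_point)

# Expected return per 100 units staked (payoff +1 for a win, -1 for a loss)
expected_return <- 100 * (p_win - p_lose)
print(round(c(win = 100 * p_win, lose = 100 * p_lose, return = expected_return), 2))
```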


Betting on Craps

Craps is another popular casino game that involves throwing two dice and noting the total. A person can place several bets, and we discuss the probabilities of one such type – the pass bets. But before we get to the rules, look at how the totals distribute.

You can estimate the probability of each sum as the ratio of the number of ways that sum can occur to the total number of outcomes, i.e., 36. Counting those possibilities gives the table below.

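| Dice roll | Probability | % |
|---|---|---|
| 2 | 1/36 | 2.78 |
| 3 | 2/36 | 5.56 |
| 4 | 3/36 | 8.33 |
| 5 | 4/36 | 11.11 |
| 6 | 5/36 | 13.89 |
| 7 | 6/36 | 16.67 |
| 8 | 5/36 | 13.89 |
| 9 | 4/36 | 11.11 |
| 10 | 3/36 | 8.33 |
| 11 | 2/36 | 5.56 |
| 12 | 1/36 | 2.78 |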
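The counts behind this table can be generated by enumerating all 36 outcomes of two dice. This is a small sketch, assuming fair dice; the object names (totals, ways, dice_prob) are illustrative.

```r
# All 36 equally likely outcomes: a 6 x 6 matrix of totals
totals <- outer(1:6, 1:6, "+")

# Number of ways each total (2 to 12) can occur
ways <- table(totals)

# Counts converted to percentages
dice_prob <- data.frame(
  roll    = as.integer(names(ways)),
  ways    = as.integer(ways),
  percent = round(100 * as.integer(ways) / 36, 2)
)
print(dice_prob)
```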

Pass line bet

The rules for the pass-line bet are:
1) The player throws the dice and wins at once if the total for the first throw is 7 or 11.
2) The player loses if the outcome is 2, 3 or 12.
3) The throws 4, 5, 6, 8, 9 or 10 are called points.
4) If the first throw is a point, it is repeated until the same number (the point) comes back (player wins) or 7 (player loses).

Now, what are the chances for the player to win a pass-line bet? That is next.


The trouble with Multiple Testing

Last time we saw the issue with sub-group analysis, an example of multiple-hypothesis testing. Here, we illustrate the problem with multiple hypothesis testing. Before that, a few recaps.

  1. Hypothesis testing is a statistical procedure to put assumptions (hypotheses) about a population parameter to test based on evidence collected from samples.
  2. The Null hypothesis is the default assumption (what we assume is true before evidence)
  3. Alpha (significance level) is the threshold the p-value must fall below for the effect to be called statistically significant
  4. The p-value is the probability of observing a statistic at least as extreme as the one observed, assuming the null hypothesis is true.
  5. if p < alpha, the null hypothesis is rejected
  6. A type I error is when the Null hypothesis is true but you rejected it.
  7. Rejection of the null hypothesis may be called a discovery

Based on item # 6, the probability of type I error is alpha.

Assume five independent tests are done at a 5% significance level, and the null hypothesis is true in each of them. What is the probability that at least one of the tests rejects the null hypothesis?

We know the old formula: at least one = 1 – none. Therefore, at least one type I error = 1 – no type I error = 1 – (1 – alpha)^5.

1 – (1 – 0.05)^5 = 0.226 or 22.6%

So, we have a 22.6% chance of rejecting at least one null hypothesis (and making a type I error).
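The same calculation in R, as a short sketch. The function name fwer is illustrative, and the formula assumes the tests are independent and that every null hypothesis is true.

```r
# Probability of at least one type I error across m independent tests
fwer <- function(m, alpha = 0.05) {
  1 - (1 - alpha)^m
}

# Five tests at the 5% level, as in the example above
five_tests <- fwer(5)
print(round(five_tests, 3))

# How the chance grows with the number of tests
print(round(fwer(c(1, 5, 10, 20)), 3))
```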


False Discovery Rate

I recommend you read the recent post on p-value first. In short, if the investigator rejects the null hypothesis based on evidence, it may be called a discovery. Then what is a false discovery rate (FDR)?

FDR is the proportion of tests in which the null hypothesis is true out of all cases where it is rejected. In probability notation, FDR = P(H0 is true | reject H0).

At first glance, it may resemble the significance level or alpha. But alpha is the probability of rejecting the null hypothesis when it is true; it is P(reject H0 | H0 is true). So, to get the FDR, we need to use Bayes’ theorem.

FDR = P(H0 is true | reject H0) = P(reject H0 | H0 is true) x P(H0 is true) /(P(reject H0 | H0 is true) x P(H0 is true) + P(reject H0 | H0 is not true) x P(H0 is not true))

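P(H_0 \text{ true} \mid \text{reject } H_0) = \frac{P(\text{reject } H_0 \mid H_0 \text{ true}) \, P(H_0 \text{ true})}{P(\text{reject } H_0 \mid H_0 \text{ true}) \, P(H_0 \text{ true}) + P(\text{reject } H_0 \mid H_0 \text{ not true}) \, P(H_0 \text{ not true})}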

The first term, P(reject H0 | H0 is true), as we know, is alpha. The next one, P(H0 is true), is the prior probability for the null hypothesis to be true that we need to find out. P(H0 is not true) = 1 – P(H0 is true). That leaves the last term, P(reject H0 | H0 is not true). We know the chance of not rejecting if H0 is not true is beta (false-negative or type II error). So, P(reject H0 | H0 is not true) = 1 – beta.

Let’s assume alpha = 0.05, the prior probability of the null hypothesis is 0.25, beta = 0.2,

FDR = \frac{0.05 * 0.25}{0.05 * 0.25 + (1-0.2)*(1-0.25)} = 0.02
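The Bayes' theorem expression above translates directly into a small R function. This is only a sketch; the function name fdr and its argument names are illustrative. The second call uses a hypothetical prior of 0.9 to show how strongly the result depends on the prior probability of the null hypothesis.

```r
# FDR = P(H0 is true | reject H0), from alpha, beta and the prior P(H0 is true)
fdr <- function(alpha, beta, prior_null) {
  false_discoveries <- alpha * prior_null              # reject H0 when it is true
  true_discoveries  <- (1 - beta) * (1 - prior_null)   # reject H0 when it is not true
  false_discoveries / (false_discoveries + true_discoveries)
}

# Values from the example above
fdr_example <- fdr(alpha = 0.05, beta = 0.2, prior_null = 0.25)
print(round(fdr_example, 2))

# Hypothetical case: the null hypothesis is far more likely a priori
fdr_high_prior <- fdr(alpha = 0.05, beta = 0.2, prior_null = 0.9)
print(round(fdr_high_prior, 2))
```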
