Waiting for Heads

Last time, we saw the expected waiting times for sequences in coin-flipping games. Today, we build the formulation that explains those observations.

We have already seen how expected values are calculated: an expected value is the average obtained by summing each outcome multiplied by its theoretical probability of occurrence. Let's start with the simplest case in the coin game.

Consider a coin with probability p for heads and q (= 1 - p) for tails. Note that, for a fair coin, p = 1/2.

Expected waiting time for H

You toss a coin once: it lands on H with probability p and on T with probability q = 1 - p. If it is H, the game ends after one flip: the waiting time is 1, and this branch contributes p x 1 to the expectation. If the flip ends up T, you have used one flip and are back where you started, so that branch contributes (1 - p) x (1 + E(H)). The expected waiting time, E(H), is the sum of the two contributions.

\\ E(H) = p*1 + (1-p)*(1 + E(H)) \\ \\ E(H) = p + (1-p) + (1-p)*E(H) \\ \\ p*E(H) = 1 \\ \\ E(H) = \frac{1}{p} = 2 \text{, for a fair coin (p = 1/2)}

This should not come as a surprise: since p is the probability of getting heads (say, 1/2), heads turns up on average once every 1/p (= 2) flips if you flip many times.
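A quick simulation confirms the result. Here is a sketch in R (the helper function `wait_for_H` is mine, not from the earlier posts): flip until the first H and average the waiting times.

```{r}
# Waiting time for the first H: flip a coin with P(H) = p until H appears
# and record how many flips it took; the average should be near 1/p.
wait_for_H <- function(p = 0.5) {
  n <- 0
  repeat {
    n <- n + 1
    if (runif(1) < p) return(n)  # heads with probability p
  }
}

set.seed(1)
mean(replicate(100000, wait_for_H()))  # close to 2 for a fair coin
```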

Expected waiting times for HT

We follow the same logic again. After one flip (the 1 in the equation below), there are two possible starting states: with probability p you are in state H, and with probability q in state T. E(HT|H) denotes the expected number of additional flips to complete HT given the last flip was H, and E(HT|T) the expected number given the last flip was T. For the second flip, you start either from the state H or from the state T.

\\ E(HT) = 1 + p*E(HT|H) + q*E(HT|T) \\ \\ \text{From the state H: } \\ \\ E(HT|H) = p*(1 + E(HT|H)) + q*1 \\ \\ E(HT|H) = \frac{p + q}{1 - p} = \frac{1}{q} = \frac{1}{0.5} = 2 \\ \\ \text{From the state T: } \\ \\ E(HT|T) = p*(1 + E(HT|H)) + q*(1 + E(HT|T)) \\ \\ E(HT|T) = \frac{1 + p*E(HT|H)}{1 - q} = \frac{1 + 0.5*2}{0.5} = 4 \\ \\ E(HT) = 1 + p*2 + q*4 = 1 + 1 + 2 = 4

Expected waiting times for HH

We’ll use a different method here.

\\ \text{If the first toss is a T: } \\ \\ term_1 = q*(1+E(HH)) \text{; you start again, with the same expected time E(HH), after the first T} \\ \\ \text{If the first toss is an H and the two tosses are HT: } \\ \\ term_2 = p*q*(2+E(HH)) \text{; you start again after the second toss, a T} \\ \\ \text{If the first toss is an H and the two tosses are HH, you win in 2 tosses: } \\ \\ term_3 = p*p*2 \\ \\ E(HH) = term_1 + term_2 + term_3 = q*(1+E(HH)) + p*q*(2+E(HH)) + p*p*2 \\ \\ (1 - q - p*q)*E(HH) = q + 2*p*q + 2*p^2 \\ \\ E(HH) = \frac{0.5 + 2*0.5*0.5 + 2*0.5*0.5}{0.25} = \frac{1.5}{0.25} = 6
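Both answers can be checked by brute force. The sketch below (in R; the helper `wait_for` is mine) flips a fair coin until the target pattern first appears and averages the waiting times.

```{r}
# Flip a fair coin until the given two-flip pattern appears;
# return the number of flips needed.
wait_for <- function(pattern) {
  flips <- character(0)
  repeat {
    flips <- c(flips, sample(c("H", "T"), 1))
    n <- length(flips)
    if (n >= 2 && all(flips[(n - 1):n] == pattern)) return(n)
  }
}

set.seed(2)
mean(replicate(20000, wait_for(c("H", "T"))))  # close to 4
mean(replicate(20000, wait_for(c("H", "H"))))  # close to 6
```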


Coin Flip Game

We are back to our favourite topic – coin-flipping. Anna and Ben are playing a game of flipping coins. Each aims for a pattern, and whoever gets their pattern more often wins. Anna chooses head-tail (HT) and Ben head-head (HH). Who do you think will win?

You may assume that since the probability of getting HT or HH in two tosses is the same, i.e., 1 in 4, the chances of winning should be identical. But the game is not about two tosses: it is about many tosses, counting who got their pattern more often. Let's play the game before getting into any theory. The following R code runs the game 10,000 times, with 5,000 flips each time, and counts how often each player's pattern appears.

```{r}
library(stringr)

redo <- 10000
flip <- 5000

# Ben's pattern: count occurrences of "H H" in each run of 5000 flips
streak <- replicate(redo, {
  toss <- sample(c("H", "T"), flip, replace = TRUE, prob = c(1/2, 1/2))
  toss1 <- paste(toss, collapse = " ")
  str_count(toss1, "H H")
})

mean(streak)

# Anna's pattern: count occurrences of "H T"
streak <- replicate(redo, {
  toss <- sample(c("H", "T"), flip, replace = TRUE, prob = c(1/2, 1/2))
  toss1 <- paste(toss, collapse = " ")
  str_count(toss1, "H T")
})

mean(streak)
```

The answer I got was 833.17 for Ben (HH) and 1249.76 for Anna (HT). Divide the number of flips by these counts, and you get the average waiting times for the patterns: 5000/833.17 = 6 and 5000/1249.76 = 4. So, on average, Anna needs to wait four flips, and Ben six, before getting the pattern.

Pattern of three

Let us extend this for 3-coin games. Using the following code, we find the average waiting time for the three patterns – HHT, HTH, and HHH.

```{r}
library(stringr)

redo <- 10000
flip <- 5000

# HHT
streak <- replicate(redo, {
  toss <- sample(c("H", "T"), flip, replace = TRUE, prob = c(1/2, 1/2))
  toss1 <- paste(toss, collapse = " ")
  str_count(toss1, "H H T")
})

flip/mean(streak)

# HTH
streak <- replicate(redo, {
  toss <- sample(c("H", "T"), flip, replace = TRUE, prob = c(1/2, 1/2))
  toss1 <- paste(toss, collapse = " ")
  str_count(toss1, "H T H")
})

flip/mean(streak)

# HHH
streak <- replicate(redo, {
  toss <- sample(c("H", "T"), flip, replace = TRUE, prob = c(1/2, 1/2))
  toss1 <- paste(toss, collapse = " ")
  str_count(toss1, "H H H")
})

flip/mean(streak)
```

The waiting times are 8, 10 and 14 flips, respectively, for HHT, HTH and HHH.
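The longest of these can be checked against theory with the same restart argument used for HH in the previous post (this derivation is mine): condition on where the first failure occurs (first toss T; first two tosses HT; first three tosses HHT) or on an immediate win in three tosses.

\\ E(HHH) = q*(1 + E(HHH)) + p*q*(2 + E(HHH)) + p^2*q*(3 + E(HHH)) + p^3*3 \\ \\ (1 - q - p*q - p^2*q)*E(HHH) = q + 2*p*q + 3*p^2*q + 3*p^3 \\ \\ E(HHH) = \frac{0.5 + 0.5 + 0.375 + 0.375}{0.125} = \frac{1.75}{0.125} = 14 \text{, matching the simulation}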

Chances not identical

We will look at the theoretical treatment in another post. But first, let us try to understand it qualitatively. While the probability of getting either two-coin sequence (or any of the three-coin sequences in the second game) in a given pair of tosses is the same, the game takes different pathways depending on each outcome.

Look at the game from Anna's point of view (she needs HT to win). Imagine she starts with an H. The next flip can be an H or a T. If it is a T, she wins. If she gets an H instead, she doesn't win, but a win is just a toss away: there is a 50% chance of a T on the next flip. In other words, her failure gives her a head start for the next attempt.

On the other hand, Ben also starts with an H. Another head, he wins, but a tail, he needs to start all over again. He must get an H and aim for another H. A 25% chance of that happening after a failure.


Cars No Safer

Calculating the risk numbers for passenger cars is a lot harder than for air travel. First, data gathering is more challenging, thanks to the sheer number of vehicles on the road. The next challenge is estimating the number of car crashes in a year. But let's make an attempt.

As per wiki, globally, about 1.4 billion motor vehicles are in use; a billion of them are cars. We don't know how many journeys those make. Assuming each car is used on an average of 100 days a year, at two trips a day, you get 200 billion trips a year.

We use some shortcuts to estimate the number of crashes involving cars. India, which accounts for 11% of global deaths from road accidents, reports 150,000 fatalities from 450,000 incidents a year. Extending that ratio to the 1.3 million deaths worldwide every year, we estimate about four million incidents. We try another route as well. About 50 million injuries happen every year from vehicles; let's assume half of them involve people travelling in cars (the same ratio as for reported deaths), and the rest involve pedestrians and cyclists. Assuming an average of 3 people inside each car, we can estimate 25/3 = 8.3 million cars involved in incidents.

In the same way, the 1.3 million deaths every year translate to 650,000 involving car travellers. That suggests a maximum of 650,000 fatal incidents (one death per incident) and a minimum of about 200,000 (roughly three deaths per incident). Assume a mid-value of 400,000. Let's compile all these (reported and assumed) numbers into a table.

Item                   Data
# of car trips         200 bln (estimated)
# road incidents       4 – 8 mln (estimated)
# fatal incidents      400,000 (estimated)
# deaths               650,000 (estimated)
# passengers           600 bln (estimated)
average trip length    20 km (estimated)
passenger-km           12,000 bln-km (estimated)

Calculated quantities

Metric                      Data
Incidents per trip          20 – 40 (per million trips)
Fatal incidents per trip    2 (per million trips)
Fatality per trip           3.2 (per million trips)
Fatality per passenger-km   54 (per billion-km)
Fatality per passenger      0.81 (per million passengers)
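The calculated rates are simple ratios of the table entries. A quick sketch in R (the inputs are the estimated values above, so the outputs carry the same uncertainty; the variable names are mine):

```{r}
trips  <- 200e9     # estimated car trips per year
fatal  <- 400000    # assumed fatal incidents per year
deaths <- 650000    # deaths involving car travellers per year
pkm    <- 12000e9   # estimated passenger-km per year

fatal  / trips * 1e6  # fatal incidents per million trips: 2
deaths / trips * 1e6  # fatalities per million trips: ~3.2
deaths / pkm   * 1e9  # fatalities per billion passenger-km: ~54
```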

Now compare these with what we had estimated previously for the air.

Comparison – air travel

Metric                      Data
Incidents per trip          3.13 (per million trips)
Fatal incidents per trip    0.2 (per million trips)
Fatality per trip           14.4 (per million trips)
Fatality per passenger-km   0.06 (per billion-km)
Fatality per passenger      0.13 (per million passengers)

Air travel looks safer on almost every count; the one exception is fatalities per trip, which is higher for air simply because a single flight carries far more passengers than a car.

References

[1] http://www.rvs.uni-bielefeld.de/publications/Reports/probability.html
[2] https://economictimes.indiatimes.com/news/politics-and-nation/india-tops-the-world-with-11-of-global-death-in-road-accidents-world-bank-report/articleshow/80906857.cms
[3] https://en.wikipedia.org/wiki/Aviation_accidents_and_incidents
[4] https://www.who.int/news-room/fact-sheets/detail/road-traffic-injuries
[5] https://www.icao.int/annual-report-2019/Pages/the-world-of-air-transport-in-2019.aspx
[6] https://data.worldbank.org/indicator/IS.AIR.PSGR
[7] https://en.wikipedia.org/wiki/Aviation_safety
[8] https://accidentstats.airbus.com/statistics/fatal-accidents
[9] https://injuryfacts.nsc.org/home-and-community/safety-topics/deaths-by-transportation-mode/
[10] https://www.sciencedaily.com/releases/2020/01/200124124510.htm


Riskier Flights

That flight travel is one of the safer modes of transportation is a foregone conclusion. Yet, there seems to be some confusion about the risk of taking flights versus, say, cars. Therefore, the comparison deserves a reevaluation.

The first question is: what is the right metric to use? Is it the number of fatalities per passenger boarding? Or is it the number of accidents/deaths per boarding? Yet another one is the number of accidents/deaths per passenger-kilometre travelled. Let’s make some (gu)estimates on each of these.

Available data

Item                   Data
# of flights           40 mln (2019)
# aviation incidents   125 (2019)
# fatal accidents      8 (2019)
# aviation deaths      575 (2019)
# passengers           4500 mln (2019)
average trip length    2000 km
passenger-km           9000 bln (2019)

Calculated quantities

Metric                      Data
Incidents per trip          3.13 (per million trips)
Fatal incidents per trip    0.2 (per million trips)
Fatality per trip           14.4 (per million trips)
Fatality per passenger-km   0.06 (per billion-km)
Fatality per passenger      0.13 (per million passengers)

Risk of air travel

In my opinion, the right metric is either the number of incidents per trip or the number of fatal incidents per trip, and that is probably where road and air travel differ. In air travel, the distance covered or the number of hours in the air is not the prime driver of incidents; the riskier parts of a flight are takeoff and landing, each of which happens once per trip, however brief or lengthy the journey.

Comparison with the road

So how does it compare with road travel? That is a bit more complex: the data are hard to come by and require a lot of assumptions. Also, the risk of road travel is not distributed over a journey the way it is for air. We'll visit those in another post.

References

[1] http://www.rvs.uni-bielefeld.de/publications/Reports/probability.html
[2] https://economictimes.indiatimes.com/news/politics-and-nation/india-tops-the-world-with-11-of-global-death-in-road-accidents-world-bank-report/articleshow/80906857.cms
[3] https://en.wikipedia.org/wiki/Aviation_accidents_and_incidents
[4] https://www.who.int/news-room/fact-sheets/detail/road-traffic-injuries
[5] https://www.icao.int/annual-report-2019/Pages/the-world-of-air-transport-in-2019.aspx
[6] https://data.worldbank.org/indicator/IS.AIR.PSGR
[7] https://en.wikipedia.org/wiki/Aviation_safety
[8] https://accidentstats.airbus.com/statistics/fatal-accidents
[9] https://injuryfacts.nsc.org/home-and-community/safety-topics/deaths-by-transportation-mode/
[10] https://www.sciencedaily.com/releases/2020/01/200124124510.htm


Florida and Sibling Stories

We have seen the girl paradox in one of the older posts. Today we do a series of variations of the problem using Bayes’s equation. Sorry, Bayes-Price-Laplace equation! In a town far far away, every household has exactly two children.

The probability of two girls in a family

\\ P(GG) = \frac{1}{4}

The probability of two girls in a family, if you know they have at least one girl.
We use the generalised equation here.

\\ P(GG|1G) = \frac{P(1G|GG)*P(GG)}{P(1G|GG)*P(GG) + P(1G|GB)*P(GB) + P(1G|BG)*P(BG) + P(1G|BB)*P(BB)} \\\\ = \frac{1*\frac{1}{4}}{1*\frac{1}{4} + 1*\frac{1}{4} + 1*\frac{1}{4} + 0*\frac{1}{4}} = \frac{\frac{1}{4}}{\frac{3}{4}} = \frac{1}{3}

I guess you don’t need a lot of explanations. B represents a boy, and G represents a girl. The prior probability of each combination, BB, BG, GB or GG, is (1/4); equally likely.

The probability of two girls in a family, if you know the family has a girl named Florida. Florida is a girl's name; let p be the probability that a girl is named Florida.

\\ P(GG|F) = \frac{P(F|GG)*P(GG)}{P(F|GG)*P(GG) + P(F|GB)*P(GB) + P(F|BG)*P(BG) + P(F|BB)*P(BB)} \\\\ = \frac{[p(1-p)+(1-p)p+p^2]*\frac{1}{4}}{[p(1-p)+(1-p)p+p^2]*\frac{1}{4} + p*\frac{1}{4} + p*\frac{1}{4} + 0*\frac{1}{4}}  = \frac{(2p-p^2)*\frac{1}{4}}{(2p-p^2)*\frac{1}{4} + 2*p*\frac{1}{4}} = \frac{2-p}{4-p}

You may be wondering where that long expression for P(F|GG) comes from. It is the probability that at least one of the two girls is named Florida: p(1-p) (the first girl is Florida and the second is not), (1-p)p (the second girl is Florida and the first is not), and p^2 (both girls are named Florida).

This is interesting. If the probability of a girl being named Florida is 1, i.e., every girl is named Florida, then P(GG|F) = (1/3) = P(GG|1G). If the name is rare, with p close to zero, P(GG|F) approaches (1/2).
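To see how quickly a rare name pushes the answer towards 1/2, plug in an illustrative value, say p = 0.01 (one girl in a hundred named Florida; an assumed number, not a statistic):

\\ P(GG|F) = \frac{2 - 0.01}{4 - 0.01} = \frac{1.99}{3.99} \approx 0.499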


A Laplace Equation Named Bayes

You may be wondering at the title of this post. Well, it is true – it was Laplace who made the Bayes equation. But not the Bayes theorem!

Bayes theorem may have been postulated a few years before Pierre Simon Laplace was born, in 1749. Bayes’ view about probabilities was more conceptual. It was a simple idea of modifying our subjective knowledge with objective information. In more technical language: initial (subjective) belief (guess or prior) + objective data = updated belief. Interestingly, those two words – subjective and belief – made classical statisticians, aka frequentists, mad!

Laplace, unaware of what Bayes had done more than two decades before, had his own ideas about the probability of causes. Eventually, he came up with a theory: the probability of a cause (given an event) is proportional to the probability of the event (given the cause). Note how close he has come to the Bayes formula that we know today.

It took Laplace another eight years or so to learn about Bayes’ idea of a prior, which gave Laplace’s equation the form as we know it. Well, by the name Bayes equation!


When It’s No Longer Rare

Let us end this sequence on Sophie and her cancer screening saga. We applied Bayes' theorem and showed that the probability of having the disease is low, even with a positive test result. But the purpose was not to downplay the importance of diagnostic tests. In fact, it was not about diagnostics at all!

Screening a random person

Earlier, we have used a prior of 1.5% based on what is generally found in the population (corrected for age). And that was the main reason why the conclusion (the posterior) was so low. It was also considered a random event. Sophie had no reason to suspect a condition; she just went for screening.

Is different from Diagnostics 

You cannot consider a person in front of a specialist as random. She is there for a reason – maybe discomfort, symptoms, or a recommendation from the GP after a positive screening result. In other words, the previous prior of 1.5% does not apply in this case; it becomes higher. Based on the specialist's database or gut feeling, imagine the assigned value was 10%. If you substitute 0.1 as the prior in Bayes' formula (keeping the screening test's sensitivity and specificity), you get about 50% as the updated probability.

Typically, a diagnostic test has better specificity. If the specificity goes up from 90% to 95%, the new posterior comes close to 70%. It remains high even if the sensitivity of the equipment drops from, say, 95% to 90%.
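Each of these updates is one line of arithmetic. A minimal sketch in R (the function name `posterior` is mine, not from the earlier posts):

```{r}
# One-step Bayes update: P(D|+) from the prior, sensitivity and specificity.
posterior <- function(prior, sens, spec) {
  sens * prior / (sens * prior + (1 - spec) * (1 - prior))
}

posterior(0.1, 0.95, 0.90)  # specialist's prior: ~0.51
posterior(0.1, 0.95, 0.95)  # better specificity: ~0.68
posterior(0.1, 0.90, 0.95)  # lower sensitivity, still ~0.67
```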


Why Posterior is the New Prior?

So far, we have been accepting the notion that the posterior probability from the Bayes’ equation becomes the prior when you repeat a test or collect more data. Today, we verify that argument. What is the chance of having the disease if two independent tests turned positive? Let’s write down the equation.

\\ P(D|++) = \frac{P(++|D)*P(D)}{P(++|D)*P(D) + P(++|nD)*(1-P(D))}

Since the two tests are independent, we can write P(++|D) as the joint probability P(+|D)*P(+|D). The same is true for the false positives, P(++|nD). Substituting all of them, we get

\\ P(D|++) = \frac{P(+|D)*P(+|D)*P(D)}{P(+|D)*P(+|D)*P(D) + P(+|nD)*P(+|nD)*(1-P(D))}

P(+|D) is the sensitivity, P(+|nD) is 1 – specificity, and P(D) is the assumed prior.

Now, we will go to the original proposition of the posterior becoming the next prior. The probability of having the disease given the second test is also positive is given by

\\ P(D|2nd +) = \frac{P(2nd +|D)*P(D|1st+)}{P(2nd +|D)*P(D|1st+) + P(2nd+|nD)*(1-P(D|1st+))} \\ \\ \text{where, } \\ \\ P(D|1st+) = \frac{P(+|D)*P(D)}{P(+|D)*P(D) + P(+|nD)*(1-P(D))}  \\ \\ \text{since these tests are independent}, P(2nd +|D) = P(+|D) \text{ and } P(2nd+|nD) = P(+|nD) \text{. Substituting, } \\ \\ P(D|2nd +) = \frac{P(+|D)*P(D|1st+)}{P(+|D)*P(D|1st+) + P(+|nD)*(1-P(D|1st+))} \\ \\ =   \frac{P(+|D)* \left[ \frac{P(+|D)*P(D)}{P(+|D)*P(D) + P(+|nD)*(1-P(D))} \right] }{P(+|D)* \left[ \frac{P(+|D)*P(D)}{P(+|D)*P(D) + P(+|nD)*(1-P(D))} \right] + P(+|nD)*\left(1- \left[ \frac{P(+|D)*P(D)}{P(+|D)*P(D) + P(+|nD)*(1-P(D))} \right] \right)} \\ \\ \text{multiplying through by the common denominator and cancelling similar terms,} \\ \\  P(D|2nd +) =  \frac{P(+|D)*P(+|D)*P(D)} {P(+|D)*P(+|D)*P(D) + P(+|nD)*P(+|nD)*(1-P(D))} = P(D|++)

Yes, the posterior is the new prior! Generalising the equation for n independent positive tests,

\\ P(D|+n) = \frac{P(+|D)^n*P(D)}{P(+|D)^n*P(D) + P(+|nD)^n*(1-P(D))}
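We can also confirm numerically that updating one test at a time lands on the same number as the closed form. A sketch in R (the function `posterior_n` is mine; the inputs are a 1.5% prior, 95% sensitivity and a 10% false-positive rate, the values used in the neighbouring posts):

```{r}
# Closed-form posterior after n independent positive tests.
posterior_n <- function(prior, sens, fpr, n) {
  sens^n * prior / (sens^n * prior + fpr^n * (1 - prior))
}

# Update one positive test at a time, posterior becoming the new prior ...
p <- 0.015
for (i in 1:3) p <- 0.95 * p / (0.95 * p + 0.1 * (1 - p))
p                                 # ~0.93

# ... and compare with the closed form: the two agree.
posterior_n(0.015, 0.95, 0.1, 3)
```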


Equation of Life Revisited

I guess you remember the story of Sophie, whom we encountered at the start of our journey with the equation of life. She tested positive during a cancer screening but found, after applying Bayes' principles, that the probability of the illness was only about 12%. There was nothing faulty about the test method, which was pretty accurate, at 95% sensitivity and 90% specificity. Now, how many independent tests does she need to undertake to confirm her illness at 90% probability?

Assume that her second test was positive: The probability for Sophie to have cancer, given that the second test is also positive,

\\ P(C|++) = \frac{P(++|C)*P(C)}{P(++|C)*P(C) + P(++|nC)*P(nC)}  \\ \\ P(C|++) = \frac{0.95*0.126}{0.95*0.126 + 0.1*0.874} = 0.58

The updated probability has become 58% (note that we have used 12.6%, the posterior of the first examination, as the prior, and not the original 1.5%). Applying the equation one more time for a positive (third by now) test, you get

\\ P(C|++) = \frac{0.95*0.58}{0.95*0.58 + 0.1*0.42} = 0.93

So the answer is three tests to get a high level of confidence.

You may recall that the prior probability used in the beginning was 1.5%, based on what she found in American Cancer Society publications. What would have happened if she did not have that information? She still needs a prior; let's use 0.1% instead. Work through the math, and you will find that the probability reaches about 89% by the fourth test, provided all the tests are positive. Therefore, an accurate prior is not that crucial as long as you follow up with more data collection, which is the power of the Bayesian approach.


Another Game Behind Closed Doors

We have seen the Monty Hall problem in an earlier post. This time, instead of 3, we have four doors. There is $1000 behind one door, -$1000 behind another (you lose $1000), and the other two doors have nothing ($0). As in the previous game, you choose one door, and the host then opens one of the remaining doors that contains nothing. You now have the option to change to one of the other closed doors. What will you do?

No Change

In the beginning, before the host reveals a $0 door, the probabilities are P($1000) = 1/4, P($0) = 1/2 and P(-$1000) = 1/4. The expected return is (1/4) x $1000 + (1/2) x $0 + (1/4) x -$1000 = $0. After the clue, if you still don't want to change, this remains the case.

Change

Here, we use solution 2, the argument method, of the Monty Hall problem. Before you get the clue, the chance that you chose the $1000 door is 1/4, so the chance that the prize is behind a door outside your choice is 1 - 1/4 = 3/4. After the clue, that probability of 3/4 sits behind two doors. In other words, if you switch, the chance of getting $1000 is 3/8. By a similar argument, the chance of losing $1000 becomes 3/8, and the chance of $0 is 1/4. The expected return is (3/8) x $1000 + (1/4) x $0 + (3/8) x -$1000 = $0.
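A simulation of the always-switch strategy reproduces these probabilities. Here is a sketch in R (the helper `pick_one` is mine; it guards against R's `sample` treating a single number as a range):

```{r}
pick_one <- function(x) x[sample.int(length(x), 1)]  # safe even for length-1 vectors

set.seed(7)
outcomes <- replicate(100000, {
  doors <- sample(c(1000, -1000, 0, 0))  # shuffle the four prizes
  pick  <- pick_one(1:4)                 # player's initial choice
  # host opens a $0 door other than the player's pick
  open  <- pick_one(which(doors == 0 & seq_along(doors) != pick))
  # switch at random to one of the two remaining closed doors
  doors[pick_one(setdiff(1:4, c(pick, open)))]
})

mean(outcomes == 1000)   # ~3/8
mean(outcomes == -1000)  # ~3/8
mean(outcomes)           # expected return, ~$0
```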

Will you change?

Well, it depends on your risk appetite. Both the chance of winning and the chance of losing have increased, while the expected return remains the same, at zero. In other words, the risk (the spread of outcomes) increases if you switch. If you are risk-averse, stay where you are!
