Consider a group of 10 people who form committees of k members, 2 ≤ k ≤ 8. How many different committees of k members can be formed?
Judgment under Uncertainty: Heuristics and Biases, Tversky and Kahneman
The third story from the Tversky and Kahneman paper is about the role of imaginability in the estimation of probabilities. Consider this group of 10 people who form committees of a minimum of 2 and a maximum of 8 members. To find the number of possible committees, you need to apply what is known as combinations, which is nothing but the binomial coefficient that you have seen earlier, i.e. combinations of n things taken k at a time without repetition.
For 3-member teams, it comes out to be 10C3 or 120. The number of choices rises to a maximum for 5 members (252 combinations) and then decreases symmetrically, since nCk = nC(n–k) (number of 3-member groups = number of 7-member groups and so on).
The following R code uses the function choose(n,k) to evaluate the binomial coefficient and plots the outcome.
# Number of ways to choose a k-member committee from n people
committee <- function(n, k) {
  choose(n, k)
}
sizes <- seq(2, 8)   # committee sizes to evaluate (avoids masking base::diff)
counts <- mapply(committee, k = sizes, n = 10)
plot(x = sizes, y = counts, main = "Number of Ways to Form a Committee", xlab = "Number of Individuals in the Committee", ylab = "Number of Combinations to Form a Committee", col = "blue", ylim = c(0, 400))
It requires number crunching, and mental constructs don’t always help. In a study, when people were asked to guess, the median estimate of the number of 2-member committees was around 70, while that for 8-member committees was about 20. The correct answer is 45 in both cases (10C2 = 10C8). So, a few two-member teams are easy to imagine, whereas 8-member groups are beyond the mind’s capacity to picture.
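As a quick check on those guesses, the exact counts can be computed with the same choose() function. This is only a small sketch: the median estimates are the figures quoted above, and the variable names are mine.

```r
# Exact committee counts versus the median guesses reported in the study
n <- 10
actual <- c(two_member = choose(n, 2), eight_member = choose(n, 8))
guessed <- c(two_member = 70, eight_member = 20)  # median estimates quoted above

# Both sizes give the same count because choosing 2 members
# is the same as choosing the 8 people who are left out
comparison <- rbind(actual, guessed)
print(comparison)
```

The two rows make the asymmetry of the guesses visible: the true counts are equal, while the guesses differ by a factor of more than three.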
A certain town is served by two hospitals. In the larger hospital about 45 babies are born each day, and in the smaller hospital about 15 babies are born each day. As you know, about 50 percent of all babies are boys. However, the exact percentage varies from day to day. Sometimes it may be higher than 50 percent, sometimes lower.
For a period of 1 year, each hospital recorded the days on which more than 60 percent of the babies born were boys. Which hospital do you think recorded more such days?
The larger hospital
The smaller hospital
About the same
Judgment under Uncertainty: Heuristics and Biases, Tversky and Kahneman
If you recall the law of large numbers, you would have guessed the correct answer, i.e. the smaller hospital. As the number of births increases, the proportion of boys on a given day comes closer to the expected 50 percent, so large deviations such as 60 percent become rarer.
If you still doubt it, let’s run a simple Monte Carlo simulation using the following R code:
days <- 365
birth <- 15
boy <- 0.5
boys <- replicate(days, {
prob_birth <- sample(c(0,1), birth, prob = c((1-boy), boy), replace = TRUE)
mean(prob_birth)*100
})
sum(boys > 60)
Run this code 100 times and plot the answers, i.e. the number of days in a year on which more than 60% of the babies were boys:
Now, change the number of births to 45 and re-run the calculations:
What about more than 60% of girls?
Let me end this piece with this one. In which hospital do you expect more days with fewer than 40% boys? No marks for guessing: it is still the smaller hospital.
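The repeated runs for both hospitals can be sketched as below. This is one possible way to do it, not necessarily how the original plots were made: the helper days_above(), the seed and the use of rbinom() in place of sample() are my choices. The last lines compare the simulation with the exact binomial probability of such a day.

```r
# Number of days in a year on which more than `threshold` percent of births are boys
days_above <- function(births, days = 365, boy = 0.5, threshold = 60) {
  boys <- rbinom(days, size = births, prob = boy)   # boys born on each day
  sum(boys * 100 > threshold * births)              # compare counts to avoid rounding issues
}

set.seed(2021)   # arbitrary seed, only for reproducibility
runs <- 100
small <- replicate(runs, days_above(births = 15))
large <- replicate(runs, days_above(births = 45))

# Average number of such days per year in each hospital
simulated <- c(small = mean(small), large = mean(large))
print(simulated)

# Exact daily probability from the binomial distribution, P(boys > 60% of births)
exact <- c(small = pbinom(9, 15, 0.5, lower.tail = FALSE),
           large = pbinom(27, 45, 0.5, lower.tail = FALSE))
print(exact)
```

Calling hist(small) and hist(large) shows the spread across the 100 runs. Changing the comparison to boys * 100 < 40 * births answers the "fewer than 40% boys" question, and by symmetry the smaller hospital comes out ahead again.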
“Steve is very shy and withdrawn, invariably helpful, but with little interest in people, or in the world of reality. A meek and tidy soul, he has a need for order and structure, and a passion for detail.”
Judgment under Uncertainty: Heuristics and Biases, Tversky and Kahneman
Following the clues above, your task is to guess if Steve is a farmer or a librarian.
A significant proportion of people may have guessed Steve was a librarian. Some of those who chose farmer may have done so out of suspicion of the build-up.
Back to Bayes’ics
Remember Bayes’ theorem? If not, read my earlier post, The Equation of life.
P(Lib|D) = P(D|Lib) x P(Lib) / [P(D|Lib) x P(Lib) + P(D|noLib) x P(noLib)]
Let us check the chance for the frequent answer, that Steve was a librarian, to be true (P(Lib|D)). I am ready to support the argument that all librarians fit this stereotype (P(D|Lib) = 1) if that was a concern. It is unlikely to be valid, but I give you that benefit of the doubt. Estimating the prior probability of a librarian in a set of farmers and librarians (P(Lib)) is the task that needs data. Based on the available data in the public domain, in the US, that ratio is 0.026!
P(D|noLib), or the chance of the description fitting a farmer, is tricky, but I assume that at least 10% of the farmer community can be shy and withdrawn men! P(noLib) is nothing but 1 – P(Lib). Substitute all the numbers:
(1 * 0.026)/[(1 * 0.026)+(0.1 * 0.974)] = 0.21.
Even if all the librarians fit your mental stereotype, you are right only about 20% of the time. To paraphrase what the late Hans Rosling used to say: a chimp would do a better job; she picks the correct answer 50% of the time.
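The substitution above is easy to repeat for other assumptions. Here is a minimal sketch: the function name posterior() and the extra values tried for P(D|noLib) are illustrative, while 0.026, 1 and 0.1 are the numbers used in the text.

```r
# Posterior probability of a hypothesis given a description, via Bayes' theorem
posterior <- function(prior, lik_h, lik_not_h) {
  (lik_h * prior) / (lik_h * prior + lik_not_h * (1 - prior))
}

p_lib     <- 0.026   # P(Lib), the prior quoted above
p_d_lib   <- 1       # P(D|Lib), every librarian fits the description
p_d_nolib <- 0.1     # P(D|noLib), assumed share of farmers who fit it

p_lib_d <- posterior(p_lib, p_d_lib, p_d_nolib)
print(round(p_lib_d, 2))

# How the answer moves with the assumed share of farmers who fit the description
sensitivity <- sapply(c(0.05, 0.1, 0.2), function(x) posterior(p_lib, p_d_lib, x))
print(round(sensitivity, 2))
```

The fewer farmers who fit the description, the higher the chance that Steve is a librarian, but the small prior keeps the posterior well below 50% for all three values tried here.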
It’s not about Maths
The message here is not about the math, nor about the research required to get an accurate answer. It is only about being mindful about our biases and how much they can lead to inaccurate perceptions about others.
Let me talk about an article that I want you to read. It is titled ‘Judgment under Uncertainty: Heuristics and Biases’, published in Science in 1974. The paper is about heuristics, the mental shortcuts we use to arrive at decisions, and their inherent problems in real life. And the consequences? They range from implicit biases to gambling addiction, and from stereotyping to micro-inequities.
We have seen in the past few posts how unreliable our intuitions about conditional probabilities can be. The authors give many stories to expose the errors in judgements that we are carrying.
I will go through their stories one by one in the coming days, but first, you read the paper.
I have shown you my estimates on vaccine effectiveness in Kerala in an earlier post. The plot is reproduced below.
You may have noticed that the effectiveness was steadily going upwards. Intrigued, I checked what was going on with the other variables during the same period. One of them was the overall incidence rate (shown below).
Puzzled by those periodic minima in the plot? It is an artefact in the data caused by the lower-than-usual number of tests on Sundays!
Yes, you guessed it right: the next step is to plot the incidence rate against vaccine effectiveness. As expected, it is a straight line!
Correlation or Causation
It was still not easy to know what was happening. The whole observation could be a confounded outcome of something else. We have seen how chance plays its role in containing the disease outbreak (the Swiss cheese post!).
The next logical step is to perform the calculations and check if the trends are making sense. The calculations are:
Let P be the prevalence of the disease (incidence rate multiplied by some number to cover those who can still infect others at a given time). V represents the holes in the vaccination barrier, U is the holes in the unvaccinated (probability = 1), M is the holes in the mask usage, and n is the number of exposures that a person encounters. Here, I’ve stopped at the mask as the only external factor, but you can add more barriers such as safe distance.
The chance of getting infected in n encounters with the virus = (1 – chance of being lucky in n encounters). The chance of getting lucky in n encounters is = nCn x (chance of being lucky once)^n x (chance of being unlucky once)^0 = (chance of being lucky once)^n.
The probability of being lucky once = (1 – chance of getting infected in a single encounter). The chance of getting infected is the joint probability of breaking through multiple barriers. So luck = (1 – P x M x V) for a vaccinated person and (1 – P x M) for an unvaccinated person. Note that V is related to 1 – prior assumed efficacy of the vaccine.
Finally, the effectiveness of the vaccine = { [1 – (1 – P M)^n] – [1 – (1 – P M V)^n] } / {1 – (1 – P M)^n}
You can already see from the expression that the vaccine effectiveness is a function of P, the prevalence. Repeat the calculations for a few incidence rates, and the results are plotted along with the actual data. The dotted line represents the estimation.
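The dependence on prevalence can be explored with a short sketch of the expression above. The values chosen for M, V, n and the grid of P are purely illustrative assumptions, not the ones fitted to the Kerala data, and the function names are mine.

```r
# Chance of at least one infection in n encounters, given the per-encounter chance
infection_chance <- function(p_single, n) {
  1 - (1 - p_single)^n
}

# Effectiveness = (risk of unvaccinated - risk of vaccinated) / risk of unvaccinated
effectiveness <- function(P, M, V, n) {
  unvaccinated <- infection_chance(P * M, n)
  vaccinated   <- infection_chance(P * M * V, n)
  (unvaccinated - vaccinated) / unvaccinated
}

M <- 0.5    # assumed hole in the mask barrier
V <- 0.2    # assumed hole in the vaccine barrier (1 - prior efficacy of 0.8)
n <- 100    # assumed number of exposures
P <- c(0.0005, 0.001, 0.005, 0.01, 0.05)   # illustrative prevalence values

eff <- effectiveness(P, M, V, n)
print(round(eff, 3))
```

At low prevalence the estimate stays close to the assumed efficacy of 1 – V, and it drops as the prevalence (or the number of exposures) goes up.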
Key Takeaways
An apples-to-apples comparison between vaccines requires, among other things, the prevalence of the disease during the period of the efficacy trials. Part of the reason why many of the Covid-19 vaccines are showing modest efficacy levels lies in the extraordinarily high incidence rate of the illness prevailing through the last year and a half. A high prevalence of disease in society means a person, even though vaccinated, would encounter the virus several times, increasing the probability of getting infected. Every such event is an additional test of the efficacy.
As calamities from another variant of Covid-19, the omicron, loom large, be prepared for more confusing news in the coming days. It is also the right time to introduce the word risk. Risk has a specific technical meaning. It is the product of the likelihood of something happening and its consequence.
Risk = likelihood x consequence.
Compare the delta variant with the ones that came earlier. From the data, anecdotal evidence from individuals and evolutionary arguments, it became the public narrative that the consequence of infection with delta was similar to, or even mildly less dangerous than, that of the earlier variants. Did it mean the same for the risk? We can’t say until we know the likelihood. Delta turned out to be more than twice as contagious as the earlier ones. So the overall risk was much higher than before.
The second common argument was about the case fatality rate. The CFR, as it is commonly known, was not high, they argued, forgetting that almost a third of humanity was going to get infected. A small fraction of a large number is still sizeable.
Black Swan Event
An extreme example of risk is the black swan event, a concept introduced by Nassim Nicholas Taleb through his book that carried the same name. These are unpredictable events that have extreme consequences.
Was the Covid pandemic a black swan event? As per the author himself, it was not. People had predicted virus attacks like these, and there were, however hypothetical, opportunities to control the disease at its onset, had the originating country taken a few steps, be it intervention at the start or just being more transparent.
But September 11 was one of them. It was never anticipated, and the consequence was enormous and far-reaching.
Most of the food you eat today, if not all, is genetically modified! By genetic modification, I do not mean that the cultivar has gone through countless Petri dishes while a bunch of scientists injected solutions that would consciously and systematically modify specific parts of its DNA. I mean something much milder: a process called plant breeding, a fundamental practice in agriculture.
Let me go a step further: humans could not (or would not) have made the transition from the Hunter-Gatherer society to the Agrarian without violating the rules of natural selection. We have seen Natural Selection before, and I want to repeat: Nature does not select anything. Nature only offers its playground and lets the living species play random games. Some survive the game; we only get to see the survivors.
Humble Story of Staple Grains
Take wheat, rice and corn, which satisfy more than 50% of the calorie requirements of the world. They all had their beginnings as grasses that bore seeds too small to attract any animals. Wild wheat seeds grew at the top of a stalk that spontaneously shattered and spread them as far as possible, away from public sight, where they quietly germinated. For that reason, they escaped early humans until a single-gene mutation caused a few plants to lose the capacity to shatter. For the wheat plant, this would be detrimental, for the seeds can’t fly to places and germinate. By the way, if I made you think that the plant was doing all these out of intelligence, let me rephrase: plants with such a defect won’t survive for long because of their limited capacity to spread their offspring.
However, such useless mutants were a lottery for humans, as they got control of the entire growth and regrowth of the plants without losing any seeds. Wheat was now in their orchard. Occasionally, the already ‘unnatural’ plant gets another mutation, yielding larger seeds. From the plant’s viewpoint, what happened is a sheer waste of its nutrients; after all, a seed, irrespective of its size, gets a single chance to become the next plant. Humans, on the other hand, love it, and select and grow only the bigger ones.
For centuries we did this process without knowing what we were doing. Now we know the details, so much so that we know what parts of its genetic make-up need to change. And we also know how to change it!
How odds and percentages can sometimes hide the big picture away from our eyes was the topic of an earlier post on Down Syndrome. Today, we continue from where we left off.
The data we analysed were live births from 10 states in the United States. That approach has a few issues. First, it included only 10 out of the 50 states. Second, and perhaps more importantly, the data covered only live births. In other words, there could be a survivorship bias in the data. What if children with Down syndrome born to mothers of different age groups have different chances of survival? Can it turn our analyses and insights upside down? Well, we don’t know, but we will find out.
Updated Data Including Stillbirths
Last time we sampled 10 states, 5600 live births and a total of 4.4 million mothers. Here we widen our net to cover 29 states, 12,946 births (live births and stillbirths) and a population of 9.8 million mothers. The messages are:
Women above 40 have about 12 times higher risk of having babies with Down syndrome than those younger than 35. Yet, 54% of the mothers were 35 years or younger.
Not Done Yet
Is this all before we claim a logically consistent analysis? The answer is an emphatic NO. We still miss a major confounding factor that can potentially lead to a survivorship bias. It is the increased use of prenatal testing and termination of pregnancy for women older than 35. What we see at the end could be biased statistics of the probability distribution. So, the work is not done yet, and we will do more research in another post.
Imagine a crime scene where the investigators were able to collect a bloodstain. The sample was old, the DNA degraded, and the analysts estimated that the recovered profile has a relative frequency of 1 in 1000 in the population. Police found a suspect and got a DNA match. What is the chance that the suspect is guilty?
The prosecutor argues that since the relative frequency of the DNA match is 1 in 1000, the chance for the person to be innocent is 1 in 1000 and deserves maximum punishment. Well, the prosecutor made a wrong argument here. Imagine the city has 100,000 people in it. The test results suggest that there are about 100 people whose DNA can match the sample. So, the suspect is one of 100, and the chance of innocence only based on the DNA test is 99%.
P(INN|DNA) – the chance that the suspect is innocent given the DNA matches
P(DNA|INN) – chance of DNA match if the suspect is innocent = 1/1000
P(CRI) – prior probability that the suspect did the crime = 1/100,000 (like any other citizen)
P(INN) – prior probability that the suspect is innocent = (1 – 1/100,000)
P(DNA|CRI) – chance of DNA match given the suspect did the crime = 1 (100%)
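Plugging these quantities into Bayes’ theorem, P(INN|DNA) = P(DNA|INN) x P(INN) / [P(DNA|INN) x P(INN) + P(DNA|CRI) x P(CRI)], gives the same answer as the head count of 100 matching people. A minimal sketch follows; the variable names are mine and the numbers are the ones listed above.

```r
# Chance of innocence given a DNA match, using the quantities defined above
p_dna_inn <- 1 / 1000       # P(DNA|INN): match by coincidence
p_cri     <- 1 / 100000     # P(CRI): prior, same as any other citizen
p_inn     <- 1 - p_cri      # P(INN)
p_dna_cri <- 1              # P(DNA|CRI): the culprit always matches

p_inn_dna <- (p_dna_inn * p_inn) /
  (p_dna_inn * p_inn + p_dna_cri * p_cri)
print(round(p_inn_dna, 3))
```

The result is about 0.99, which is the 99% chance of innocence obtained earlier by counting the roughly 100 people in the city who would match.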
Does this mean that the suspect is innocent? Not that either. The results only mean that the investigators must collect more evidence to file charges against the suspect.
The COVID-19 pandemic presented us with a live demonstration of science at work, much to the surprise of many who are not regular followers of its history. It gave a ringside view of the current state of the art, yet it created confusion among people whenever they missed consistency in the messaging, theories, or guidelines. The guidance on protective barriers—using masks, safe distancing, and hand washing—was one of them.
Swiss Cheese Model of Safety
The Swiss cheese model provides a picture of how the layered approach of risk management works against hazards. Let us use the model to check the underlying math behind general health advice on COVID-19 protection. I describe it through a simplified probability model.
The probability of someone getting infected by Covid 19 is a joint probability of several independent events. They are the probabilities:
an infected person who can transmit the virus in the vicinity (I)
to get inside a certain distance (D)
to pass through a mask (M)
to pass through the protection due to vaccination (V)
to get the infection after washing hands (H)
to infect the person once the virus is inside the body (S)
Infected person in the vicinity (I): this is equal to the prevalence of the disease (assuming homogeneous mixing of people). Let’s make a simple estimate. These days, the UK reports about 50,000 cases per day in a population of 62 million. It is equivalent to an incidence rate of 0.000806 per day. Assume that an infected person can transmit the virus for ten days, and half of them manage to isolate themselves without passing the virus to others. The prevalence (proportion of people who can transmit the disease at a given moment) is 5 x 0.000806 = 0.00403. Multiply by a factor of 2 to include the asymptomatic and the symptomatic but untested folks too. Prevalence becomes 0.00806 (8 in 1000).
To get inside a certain distance (D): If the person managed to stay outside the 2 m radius from an infected person, there could be zero probability of getting infected, but that is not practically possible every time. Therefore, we assume she manages to stay away 50% of the time, which means D = 0.5.
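The prevalence estimate for I can be retraced step by step. This is only a sketch of the arithmetic described above, with variable names of my own choosing.

```r
# Rough prevalence estimate from daily reported cases
cases_per_day   <- 50000
population      <- 62e6
incidence       <- cases_per_day / population   # daily incidence rate

infectious_days <- 10    # assumed period over which a person can transmit
isolating_share <- 0.5   # assumed share who isolate without infecting others
undercount      <- 2     # factor for asymptomatic and untested cases

prevalence <- incidence * infectious_days * (1 - isolating_share) * undercount
print(signif(incidence, 3))
print(signif(prevalence, 3))
```

The result is about 0.008, or 8 in 1000, which is the value of I used in the scenarios.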
To pass through a mask (M): General purpose masks never offer 100% protection against viruses. So, assume 0.5 or 50% protection.
To pass through the protection from vaccination (V): The published data suggest that vaccination could prevent up to 80% of symptomatic infections. That means the chance of getting infected is 0.2 for the vaccinated.
The last two items, hand washing (H) and susceptibility to getting infected (S), are assumed to play no role in protecting against COVID-19, i.e. H = S = 1. Infection via touching surfaces plays a minor role in transmission, and the latest variants (e.g. Delta) are so infectious that almost everyone gets infected once the virus is inside the body.
Scenario 1: Fully Protected Person Outdoor
Assume a person makes one visit outside in a day. The probability of getting the infection is = I x D x M x V x H x S = 0.008 x 0.5 x 0.5 x 0.2 x 1 x 1 = 0.0004, so the chance of not getting it is 0.9996.
The person makes one visit a day for 30 days (or two visits a day for 15 days!). Her probability of getting infected on one of those days is = 1 – the probability that she escaped on all 30 days. To estimate the survival probability, you need to use the binomial distribution, which gives 30C30 x 0.9996^30 x 0.0004^0 = 0.988. The chance of a fully protected person getting infected in a month outdoors is 1 – 0.988 = 0.012 or 12 in 1000!
Scenario 2: Fully Protected Person Indoor
The distance rule doesn’t work anymore, as the suspended droplets (or aerosols or whatever) are available everywhere. The probability of getting the infection is = I x D x M x V = 0.008 x 1 x 0.5 x 0.2 = 0.0008. This means the chance of not getting it is 0.9992. The 30-day chance is 1 – 0.976 = 0.024 or 24 in a thousand.
Scenario 3: Indoor Unprotected but Vaccinated
I x D x M x V = 0.008 x 1 x 1 x 0.2 = 0.0016. The chance of getting infected in a month = 1 – 0.95 = 0.05 or 5 in a hundred.
Scenario 4: Indoor Unprotected
I x D x M x V = 0.008 x 1 x 1 x 1 = 0.008. The chance of getting infected in a month = 1 – 0.79 = 0.21 or about a 2 in 10 chance.
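The scenarios can be recomputed in one go. Here is a minimal sketch: the function monthly_risk() and the scenario labels are mine, while the barrier values are the ones assumed in the text.

```r
# Chance of at least one infection over `days` daily outings
monthly_risk <- function(I, D, M, V, H = 1, S = 1, days = 30) {
  p_day <- I * D * M * V * H * S   # joint probability of passing every barrier
  1 - (1 - p_day)^days             # 1 - chance of escaping on all days
}

I <- 0.008   # prevalence estimated earlier

scenarios <- c(
  outdoor_protected      = monthly_risk(I, D = 0.5, M = 0.5, V = 0.2),
  indoor_protected       = monthly_risk(I, D = 1,   M = 0.5, V = 0.2),
  indoor_vaccinated_only = monthly_risk(I, D = 1,   M = 1,   V = 0.2),
  indoor_unprotected     = monthly_risk(I, D = 1,   M = 1,   V = 1)
)
print(round(scenarios, 3))
```

Each barrier that is removed roughly doubles the monthly risk, except the vaccine, whose removal multiplies it by about five under these assumptions.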
A bunch of simplifications were made in these calculations. One of them is the complete independence of items, which may not always hold. Some of these can be associated – a person who cares to make a safe distance may be more likely to wear a mask and get vaccinated. Inverse associations are also possible – a vaccinated person may start getting into crowds more often and stop following other safety practices.
Second is the simplification of one outing and one encounter with an ill person. In reality, you may come across more than one infected person. Indoors, the suspended droplets containing the virus act as encounters with multiple individuals.
The case of health workers is different, as the chance of encountering an infected person in a clinic or a medical facility differs from that among the general public. If one in ten people who come to a COVID clinic is infected, the chance of a vaccinated health worker getting infected in a month is about 95% if she wears an ordinary mask and sees 10 patients daily (300 encounters in a month). If she uses a better face cover that offers ten times more protection, the chance becomes about 25% in a month, i.e. one in 4 gets infected even after getting vaccinated.
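The health worker figures can be checked with the same kind of calculation. In this sketch, the count of 300 patient encounters in a month is my assumption (it is the number that reproduces the quoted percentages), and the function name is mine. The other values follow the text: one in ten patients infected, an ordinary mask with M = 0.5, a better one with M = 0.05, and V = 0.2.

```r
# Chance of at least one infection over a number of patient encounters
worker_risk <- function(I, M, V, encounters) {
  1 - (1 - I * M * V)^encounters
}

encounters <- 10 * 30   # assumption: 10 patients a day for 30 days

ordinary_mask <- worker_risk(I = 0.1, M = 0.5,  V = 0.2, encounters = encounters)
better_mask   <- worker_risk(I = 0.1, M = 0.05, V = 0.2, encounters = encounters)
print(round(c(ordinary_mask = ordinary_mask, better_mask = better_mask), 2))
```

With many more encounters per day, the risk with an ordinary mask quickly approaches certainty, which is why the quality of the face cover matters so much in a clinic.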
Bottomline
Despite all these barriers, people will still get infected. Small portions of large numbers are still sizeable numbers, but do not get distracted by them. Use every single protection that is available to you. Those include vaccination, mask use, maintaining distance, and reducing non-essential outdoor trips. They all help to reduce the overall rate of infection.