We all know this: “a bird in the hand is worth two in the bush”. It is a timeless proverb that has cautioned generations against taking risks and, like every other proverb, it is a monument of simplicity that escapes rational scrutiny. Whether or not people believe in this saying, most of us have a blind spot when estimating the time value of money.
That was what Prof. Shane Frederick found in the famous Cognitive Reflection Test (CRT) study that he carried out while at MIT. One of his questions was whether the individual would go for $3400 this month or $3800 next month. The majority of the subjects preferred $3400, passing up growth of more than 11% in a month. Now compare that with the 2% rate that the world’s best investor could give you!
The results say something about patience and the appreciation of rewards at a future date. In that way, it is not so different from the Marshmallow kids!
While my focus was to highlight our attitude towards risk and deferred gratification, I can’t end this piece without quoting the famous 3-item cognitive reflection test. The questions are:
1) A bat and a ball together cost $1.10 in total. The bat costs $1.00 more than the ball. What is the cost of the ball?
2) If it takes 5 machines 5 minutes to make 5 widgets, how long would it take 100 machines to make 100 widgets?
3) In a lake, there is a patch of lily pads. Every day, the patch doubles its size. If it takes 48 days for the patch to cover the entire lake, how long would it take to cover half?
In my opinion, the Monty Hall problem was not about probability. It was about prejudices.
The trouble with reasoning
Logical reasoning has enjoyed an upper hand over experimentation for historical reasons. Reasoners and philosophers commanded respect in society from very early history. It was understandable: science, the way we see it today, was in its infancy, and experimentation and computation techniques did not exist. But we continued that habit even after our ability to experiment, physically or computationally, improved exponentially.
I recently read an article on the Monty Hall problem, and in the end, the author remarked that the topic was still in debate. I wonder who on earth is still wasting their time on something so easy to settle experimentally or by simulation. Make a cutout, collect a few toys, call your child for help, do a few rounds and note down the outcome. There you are: so much for the great philosophical debate.
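For those without a cutout and toys at hand, here is a minimal simulation sketch of the same experiment. The function names, the number of rounds and the seed are my own choices for illustration, and the game is assumed to follow the standard rules (three doors, the host always opens an empty door that the player did not pick).

```python
import random

def play(switch, rng):
    """Play one round with three doors; return True if the player wins the car."""
    car = rng.randrange(3)
    pick = rng.randrange(3)
    # The host opens a door that is neither the player's pick nor the car.
    opened = rng.choice([d for d in range(3) if d != pick and d != car])
    if switch:
        # Move to the only door that is neither picked nor opened.
        pick = next(d for d in range(3) if d != pick and d != opened)
    return pick == car

def win_rate(switch, rounds=100_000, seed=1):
    """Fraction of rounds won when always staying or always switching."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(rounds)) / rounds

stay_rate = win_rate(switch=False)
switch_rate = win_rate(switch=True)
print(f"stay: {stay_rate:.3f}, switch: {switch_rate:.3f}")
```

Staying should win about a third of the time and switching about two thirds; no debate required.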
Thought experiments are thoughts, not experiments!
Thought experiments, if you can do some, are decent starting points to frame actual experiments and not an end in themselves. The trouble with logical reasoning as the primary mode of developing a concept is that it creates an unnecessary but inevitable divide between a minority who can understand and articulate the idea and a large group of others. Evidence that emerges from experiments, on the other hand, is far more convincing when communicated to people. The debate then shifts to the validity and representativeness of the experimental conditions and the interpretation of results.
Monty Hall is relevant
The relevance of the Monty Hall problem is that it reveals the deep-rooted prejudices and sexism that exist in society. The topic should be discussed, but not as an example of budding logical reasoning or the eloquence of mathematical language. If someone doubts the results, which is very ‘logical’, the recommendation should be to conduct experiments or numerical simulations and collect data.
Philosophy, like psychology, has played its role in the grand arena of scientific splendour as the main protagonist. The time has come for them to take the grandpa roles and give the space for experimentation and computation.
Last time we saw the statistics on police violence and death rates. The data may have given you a false impression that most of the homicides in the US are caused by the police. The number of homicides in the US ranged from 15,000 (5 in 100,000) to 24,000 (7.5 in 100,000) annually in the last 20 years. The total number of deaths from police violence was 30,800, but spread over about 40 years: roughly 770 a year, or some 3-5% of the annual homicides.
Here are a few more statistics. About 70% of the murders were caused by the use of firearms, based on the 2019 and 2020 data. Does that mean guns are used mostly to kill others? Well, 60% of gun-related deaths were suicides!
Before we close this gloomy topic: the CDC publishes firearm-related deaths for various states in the US. Alaska tops the rate in 2019 with 24.4 deaths per 100,000 people. Following is a plot of death rates vs the population (7 years of combined data).
At first, it reminded me of an older post on life in a funnel. Is it an artefact of randomness in small population sizes? We need to check further. Let us differentiate states with colour.
Zoom in to the lower 29 states.
It’s a mixed bag!
As is the case with a lot of things in life, there is both randomness and clear pattern. The data fluctuate among the low-population states. But there are also definite clusters among those states.
We continue the earlier post on interpreting data from an asymmetric sample space.
An estimated 9540 non-Hispanic black people died from police violence during 1980-2018, says a study published in The Lancet last year. In the same period, the number of non-Hispanic White people who met the same fate was 15,200. So, whites are more likely to die from police violence in the US. Right?
Yes, if the populations of non-Hispanic blacks and non-Hispanic Whites in the US were equal. But that is not true. As per Wikipedia, the former accounts for 12.3% of the US population and the latter 61.5%. If there were no correlation between death and race, you would expect around 12.3% of deaths for blacks and 61.5% for whites. As that is not apparent from the numbers, we will calculate the odds.
The easier way to do this is to divide 9540 by 12.3 and 15,200 by 61.5 and take the ratio. The numbers are 775.6 : 247.2 = 3.1 : 1. In plain English, a non-Hispanic black person is about 3.1 times as likely to die from police violence as a non-Hispanic White person.
Here we only considered how the mind works while interpreting data when the representation of groups is not symmetric. Studying the reason behind the disparity, whether the behaviour of people of certain races or the reaction of police in response, was not the topic. Statistics rarely tell the cause, but they may suggest a problem that requires a solution.
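The same arithmetic can be written as a few lines of Python. This is only a sketch using the figures quoted above; the dictionary names are mine.

```python
# Figures quoted in the text: deaths during 1980-2018 and population shares in percent.
deaths = {"black": 9540, "white": 15200}
population_share = {"black": 12.3, "white": 61.5}

# Deaths per percentage point of the population, for each group.
per_share = {group: deaths[group] / population_share[group] for group in deaths}

# Relative likelihood of dying from police violence, black versus white.
ratio = per_share["black"] / per_share["white"]
print(round(per_share["black"], 1), round(per_share["white"], 1), round(ratio, 1))
```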
Weather reports are perhaps the most commonly encountered examples of probability in our daily life. For instance, the chance of precipitation for tomorrow is 40%. We know that there is only one chance of tomorrow happening, and only two possibilities – it rains or doesn’t. Then what does this 40% mean?
Let us start with what it is not. 40% rain does not mean it will rain 40% of the time or on 40% of the area!
One interpretation is that it rained on 40 out of 100 past days with weather patterns similar to tomorrow's. This interpretation relates closely to the climatology method of weather prediction, where past weather statistics guide the future. But the weather predictions of today are far more advanced.
These days, weather forecasters run advanced mathematical models that take into account wind velocity, humidity, temperature, pressure, density etc. Even tiny errors in some of these variables can make the prediction off by a mile. Therefore, different models with several modes of sensitivities are solved to get an ensemble of outcomes. In the end, the meteorologist looks at how many of them predicted rain. Suppose 20 out of a total of 50 realisations (model outcomes) predicted rain; the forecast becomes 40%.
Asymmetry causes chaos in our brains; lack of data, helplessness. Start with this news headline.
So what does this mean? The simple answer is: nothing! The percentage quoted in the headline (and the subsequent text) is the share of the unvaccinated among the total deaths. It implicitly suggests that the unvaccinated are about 70:30 more likely to get seriously ill than the vaccinated, which would hold only if the two groups were the same size. That does not give the right picture about the vaccine.
Take a location with 1000 people, 100 deaths and create three scenarios.
Scenario 1

| | Vaccinated | Unvaccinated |
|---|---|---|
| % of population | 90% | 10% |
| Number of people | 900 | 100 |
| Share of deaths | 30% | 70% |
| Number of deaths | 30 | 70 |
| Risk of dying | 30/900 = 0.033 | 70/100 = 0.7 |
| Risk ratio (vaccinated : unvaccinated) | 0.048 | |
Take the second scenario:
Scenario 2

| | Vaccinated | Unvaccinated |
|---|---|---|
| % of population | 50% | 50% |
| Number of people | 500 | 500 |
| Share of deaths | 30% | 70% |
| Number of deaths | 30 | 70 |
| Risk of dying | 30/500 = 0.06 | 70/500 = 0.14 |
| Risk ratio (vaccinated : unvaccinated) | 0.43 | |
A third scenario:
Scenario 3

| | Vaccinated | Unvaccinated |
|---|---|---|
| % of population | 10% | 90% |
| Number of people | 100 | 900 |
| Share of deaths | 30% | 70% |
| Number of deaths | 30 | 70 |
| Risk of dying | 30/100 = 0.3 | 70/900 = 0.078 |
| Risk ratio (vaccinated : unvaccinated) | 3.9 | |
Discussion
Three scenarios using the same breakdown of deaths among vaccinated and unvaccinated tell three different stories. Scenario 1 shows a highly effective vaccine, the second a very modest one, and the third is likely a substance to avoid! If you are not convinced, change the population from 1000 to any other number; you should get the same answer.
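To try other numbers yourself, here is a small sketch of the risk-ratio calculation behind the three scenarios. The function and parameter names are hypothetical; the defaults (1000 people, 100 deaths, a 30:70 split of deaths) are the ones used above.

```python
def risk_ratio(vaccinated_share, population=1000, deaths=100, vaccinated_death_share=0.30):
    """Risk of dying for the vaccinated divided by the risk for the unvaccinated."""
    vaccinated = population * vaccinated_share
    unvaccinated = population - vaccinated
    risk_vaccinated = deaths * vaccinated_death_share / vaccinated
    risk_unvaccinated = deaths * (1 - vaccinated_death_share) / unvaccinated
    return risk_vaccinated / risk_unvaccinated

# Scenarios 1, 2 and 3: 90%, 50% and 10% of the population vaccinated.
ratios = {share: risk_ratio(share) for share in (0.9, 0.5, 0.1)}
for share, ratio in ratios.items():
    print(f"{share:.0%} vaccinated -> risk ratio {ratio:.3f}")
```

Calling `risk_ratio(0.9, population=50_000)` gives the same ratio as Scenario 1, because the population size cancels out of the calculation.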
I agree journalists have a role in bringing information to the public. They also have a duty to provide data that enables the public to understand something. I doubt the news had any such intentions.
Finally
So, what is the big picture in Maharashtra? It’s difficult to say without details. But assuming that most of the deaths occur among adults, and that their vaccination rate (at least one dose) is close to 90%, the vaccine seems to protect as it promised.
Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.
Tversky and Kahneman, Psychological Review (1983)
As part of their study, Tversky and Kahneman gave this problem to 142 UBC undergrads and asked them which of the two alternative options was more probable.
Linda is a bank teller.
Linda is a bank teller and is active in the feminist movement.
What is your answer?
The AND rule
Remember the AND rule? You did not need my post to know the rule; we accepted it happily at school. 1) It appeared logical. 2) Since probability values are always less than or equal to 1, the product of P(A) and some other probability can never be more than P(A), because that other probability is always one or less. 3) Our teacher explained it graphically using Venn diagrams.
None of us had issues with any of these.
Judgements rooted deep
Yet, 85% of the students selected the second option as the more probable!
Resemblance wins over Extensional
The scientists went to another group of students and asked them to choose one statement from the following (far more explicit) options.
Argument 1 Linda is more likely to be a bank teller than a feminist bank teller as every feminist bank teller is a bank teller and some bank tellers are not feminists.
Argument 2 Linda is more likely to be a feminist bank teller than a bank teller because she resembles an active feminist more than she resembles a bank teller.
A majority of the students (65%) chose the second!
There are more examples of conjunction fallacy in our day-to-day lives. Who knows better to exploit this vulnerability of mind than your insurance agent, who can sell you the life insurance that covers deaths from terrorist attacks when you didn’t want to buy the normal one?
Interestingly, this fallacy is not restricted to stories that use rare but appealing words that trigger our imagination. Students were asked to bet on one of three sequences appearing when a six-sided die, with four green faces and two red faces, is rolled 20 times:
1) RGRRR
2) GRGRRR
3) GRRRRR
Students overwhelmingly chose option 2, forgetting that option 1 is contained in option 2: any run of rolls that shows sequence 2 also shows sequence 1, so option 1 is at least as likely.
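In the spirit of experiments over arguments, the bet can be checked by simulation. The sketch below assumes that a bet wins if the chosen sequence appears anywhere in the 20 rolls; the function names, the number of trials and the seed are my own choices.

```python
import random

SEQUENCES = ["RGRRR", "GRGRRR", "GRRRRR"]

def roll_string(rng, n=20):
    """Roll the die n times; four green and two red faces give P(G) = 4/6, P(R) = 2/6."""
    return "".join(rng.choice("GGGGRR") for _ in range(n))

def appearance_rates(trials=50_000, seed=7):
    """Estimate how often each sequence shows up somewhere in 20 rolls."""
    rng = random.Random(seed)
    hits = dict.fromkeys(SEQUENCES, 0)
    for _ in range(trials):
        rolls = roll_string(rng)
        for seq in SEQUENCES:
            if seq in rolls:
                hits[seq] += 1
    return {seq: count / trials for seq, count in hits.items()}

rates = appearance_rates()
for seq, rate in rates.items():
    print(seq, round(rate, 3))
```

Whatever the seed, sequence 1 can never come out behind sequence 2, since every set of rolls containing GRGRRR also contains RGRRR.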
The t-test closely resembles the Z-test; both assume normally distributed data. The t-test is relevant when the population standard deviation is unknown; instead, the sample standard deviation is used. After finding the test statistic, the t-test refers to the t-distribution for significance and p-values instead of the standard normal distribution.
The t-distribution, unlike the standard normal, depends on the sample size and is more spread out for smaller samples. A key term to remember is the degrees of freedom (df) = sample size - 1. A comparison between the two for a sample size of 5 (df = 4) is below.
The difference soon disappears as the number of samples goes beyond a few. The plot below compares a sample size of 50.
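If the plots are not at hand, the two densities can be compared numerically. The following is a sketch that uses the textbook formula for the density of Student's t-distribution; the helper names are mine.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def normal_pdf(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Columns: x, standard normal, t with df = 4 (sample of 5), t with df = 49 (sample of 50).
for x in (0, 2, 3):
    print(x, round(normal_pdf(x), 4), round(t_pdf(x, 4), 4), round(t_pdf(x, 49), 4))
```

The df = 4 curve is lower at the centre and noticeably fatter in the tails, while the df = 49 curve is already close to the normal.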
Coffee Drinking
A researcher studied the coffee drinking habits of a sample of 50 people in her city and found that they drink 14 ml of extra coffee on Mondays (standard deviation of 8.5 ml). Can her results reject the existing average of 10 ml more on Mondays at a 5% significance level?
Let’s set up the null hypothesis: The average extra coffee consumed on Mondays is less than or equal to 10 ml. The alternative hypothesis is: The average extra coffee consumed on Mondays is more than 10 ml. No standard deviation is known for the population; therefore, we take the sample standard deviation and the t-statistic. t = (14 - 10)/(8.5/sqrt(50)) = 3.327. The critical value for the 0.05 significance level in a t-distribution with degrees of freedom (df) = 49 is 1.68 [qt(0.95,49) in R]. Since the t-statistic value (3.327) is greater than the t-critical value (1.68), we reject the null hypothesis. The p-value is 0.000838 [pt(3.327, 49, lower.tail = FALSE) in R].
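For readers without R, here is a rough Python sketch of the same calculation. The standard library has no t-distribution, so the p-value is approximated by numerically integrating the t density with Simpson's rule; that is my own workaround rather than what R's pt does internally, and the helper names are hypothetical. The sample size of 50 is the one implied by sqrt(50) and df = 49 above.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail(t, df, steps=2000):
    """Approximate P(T > t) for t >= 0: Simpson's rule on [0, t], then symmetry."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

sample_mean, null_mean, sample_sd, n = 14, 10, 8.5, 50
t_stat = (sample_mean - null_mean) / (sample_sd / math.sqrt(n))
p_value = upper_tail(t_stat, n - 1)
print(round(t_stat, 3), round(p_value, 6))
```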
The claim on weight reduction
T-tests can be used to validate claims of interventions by taking statistical differences of the same population between two conditions or time points. Company X claims success for its weight loss drug by showing the following data. You’ll test whether there’s any statistical evidence for the claim (at a 5% significance level).
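| Before | After |
|---|---|
| 120 | 114 |
| 94 | 95 |
| 86 | 80 |
| 111 | 116 |
| 99 | 93 |
| 78 | 83 |
| 78 | 74 |
| 96 | 91 |
| 132 | 136 |
| 108 | 109 |
| 94 | 90 |
| 88 | 91 |
| 101 | 100 |
| 93 | 90 |
| 121 | 120 |
| 115 | 110 |
| 102 | 103 |
| 94 | 93 |
| 82 | 81 |
| 84 | 80 |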
The steps are:
1) start with a null hypothesis: the average weight change (after medicine - before medicine) is zero.
2) calculate the weight difference by subtracting before from after (for 20 samples)
3) estimate the mean and standard deviation of the differences
4) population mean (for the null hypothesis) for weight difference is 0.
5) apply the formula for t-statistic
6) compare with critical t-value = -1.73 for 5% significance level
7) estimate the p-value
Difference = After - Before: -6, 1, -6, 5, -6, 5, -4, -5, 4, 1, -4, 3, -1, -3, -1, -5, 1, -1, -1, -4
Mean = -1.35, standard deviation = 3.7, t-value = -1.63, critical t = -1.73 (for 5%), p-value = 0.0597 > 0.05
The test shows insufficient evidence for the drug's effectiveness at the 5% significance level, and therefore, the null hypothesis is not rejected. The above treatment is called a paired t-test.
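The paired t-test steps can be reproduced with a short Python sketch. The before and after values are copied from the table above, and the critical value of -1.73 is taken from the text rather than computed; the variable names are my own.

```python
import math
import statistics

before = [120, 94, 86, 111, 99, 78, 78, 96, 132, 108,
          94, 88, 101, 93, 121, 115, 102, 94, 82, 84]
after = [114, 95, 80, 116, 93, 83, 74, 91, 136, 109,
         90, 91, 100, 90, 120, 110, 103, 93, 81, 80]

# Step 2: difference = after - before, one value per person.
differences = [a - b for a, b in zip(after, before)]

# Step 3: mean and sample standard deviation (n - 1 in the denominator).
mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)

# Step 5: t-statistic against a null-hypothesis mean difference of 0.
t_stat = mean_diff / (sd_diff / math.sqrt(len(differences)))

# Step 6: one-sided 5% critical value for df = 19, as quoted in the text.
T_CRITICAL = -1.73
reject_null = t_stat < T_CRITICAL
print(round(mean_diff, 2), round(sd_diff, 2), round(t_stat, 2), reject_null)
```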
We have seen sample and population statistics in an earlier post. We continue developing from there, but this time using the concepts for hypothesis testing. Suppose we have an established population mean and standard deviation. To that, a new sampling statistic is introduced. The task is to test if the understanding (a.k.a. the null hypothesis) needs an update.
By the way, the resemblance of the equation with the story of constructing the confidence interval is not coincidental. They are both related.
Take an example: A farmer shows sample results (mean = 58 g. from a sample of 40 eggs) and claims eggs from her farm weigh more than the national average. The national average is 54 g. and has a standard deviation of 10 g. How do you test her claim?
First, create the null hypothesis: the farmer’s eggs weigh no more than the national average. The alternative hypothesis is that they are heavier than the national average. We set a 5% significance level.
As per the formula, Z = (58 - 54)/(10/sqrt(40)) = 2.53. Now compare the position of 2.53 in a standard normal distribution, the assumption we made for the Z-statistic. The location of 2.53 is marked as a black dot in the plot below, and the start of the critical region (the upper 5% of the distribution) by the red arrow.
So what is the p-value here? The probability that we observe a z-statistic of 2.53 and above, given the null hypothesis. In other words, p is the area under the curve above the value 2.53. You can get that using the R function pnorm (the function pnorm returns the integral from -infinity to q of the normal distribution, where q is a Z-score) and subtracting it from 1 (the total integral from -infinity to +infinity). The value of p is 1 - pnorm(2.53) = 1 - 0.9943 = 0.0057.
If the null hypothesis were true, there would be only a 0.57% chance of seeing data at least as extreme as the farmer’s. So we reject the null hypothesis and accept the notion that the farmer’s eggs are heavier than the national average.
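The egg example can also be worked out in Python, where NormalDist from the standard statistics module plays the role of pnorm. This is a sketch with variable names of my own choosing.

```python
import math
from statistics import NormalDist

sample_mean, national_mean, national_sd, n = 58, 54, 10, 40

# Z-statistic: distance of the sample mean from the national mean, in standard errors.
z = (sample_mean - national_mean) / (national_sd / math.sqrt(n))

# One-sided p-value: area under the standard normal curve above z.
p_value = 1 - NormalDist().cdf(z)
print(round(z, 2), round(p_value, 4))
```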
The concept of the p-value is one of the more complicated ideas in statistics. As per the renowned writer, cognitive psychologist, and intellectual Steven Pinker, 90% of psychology professors get it wrong! Where did he get this 90%? Well, my default position (the null hypothesis) is that Steven Pinker is a no-nonsense writer. So, I confidently take 90% as my prior. I also find it super confusing (how is it relevant in a statistical setting?).
Two people are playing a game of rolling dice. One person suspects that the die is faulty, as the other (as always!) is getting too many 6s. To test the assumption, they decide to roll it 100 times. The result was 22 sixes. Since the probability of getting a six is (1/6), and the number of rolls was 100, she argues, the expected number of sixes was 16.7. Since they got 22 sixes, the die is defective.
By now, we know that the above argument was wrong. It is not how probability and randomness work. The experiment is equivalent to independent Bernoulli trials with the following distribution of chances for each number of sixes. Let the force of “dbinom” be with you and get the probability distribution. Probabilities for the “dbinom” are (1/6) for success and (5/6) for failure (not a 6).
The proper way
You have to start with a hypothesis. Since statisticians are statisticians and wanted to maintain a scientific temper, they created a concept called the null hypothesis as the default. Here the null hypothesis is that the die is fair and will follow the binomial distribution as shown in the plot above. If you want to prove the die is defective, you need to demonstrate the null hypothesis to be invalid and reject it.
Proving beyond doubt
We have to prove that getting 22 sixes falls within the most extreme 5% of outcomes that a fair die can give in 100 rolls. Why 5%? It is just a convention and is called the significance level. We define the p-value and have to prove that it is smaller than or equal to the significance level to reject the null hypothesis (or prove your point). Otherwise, we fail to reject the null hypothesis (and acknowledge that you are unsuccessful).
Enter p-value
The p-value is the probability of getting numbers at least as extreme as 22. At least as extreme as 22 means: chance of getting 22 + chances of getting anything more extreme than 22! So it is the sum of 0.037 + 0.025 + 0.016 + 0.01 + 0.006 + 0.003 + 0.002 + 0.001 = 0.1 = 10%. The p-value is 10%. This is more than the significance level of 5%, and therefore we can’t reject the null hypothesis that the die is good. No evidence. To repeat, if you run the same 100-roll experiment over and over with a fair die, you may get 22 or more 6s about one time in 10. To prove the die is faulty, that chance must come down to one in 20 (or lower).
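The sum above can be reproduced without R. The sketch below builds the binomial probabilities from math.comb, which is what dbinom computes; the function names are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def upper_tail(k, n, p):
    """Probability of k or more successes in n trials."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

# 100 rolls of a fair die; success means rolling a six.
p_value = upper_tail(22, 100, 1 / 6)
print(round(binom_pmf(22, 100, 1 / 6), 3), round(p_value, 3))
```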
P for Posterior
The significance test through p has a twisted logic. p is the probability of my data (or something more extreme) given the (null) hypothesis. In other words, while you intend to prove your point, the world (or science) wants to compare it with its default, the null hypothesis. The smaller that probability, the stronger your case: you win, and the prior gives way to the posterior. My theory wins because the data collected was unlikely if the null hypothesis is true.
Tailpiece
Going to more extreme values, you will see that the probability of getting 24 or more sixes is less than 5%. So if you throw 24 or more sixes, you are in the critical region, and you can claim the die is faulty.
Typically, a p-value below 0.01 signifies strong evidence, 0.05 – 0.01 is moderate, and 0.05 – 0.1 is weak evidence against the null hypothesis in favour of the alternative. p greater than 0.1 is considered as no evidence against the null hypothesis.
Steven Pinker: Rationality: What It Is, Why It Seems Scarce, Why It Matters