May 2023

The Sailor’s Child problem

May 31, 2023

A sailor sails between two ports. At each port, he stays with a woman, and both women want to have a child with him. The sailor is initially reluctant but changes his mind and tosses a coin to decide: if it’s heads, he will have a child with one woman; if it’s tails, with both. If heads comes up, he will open The Sailor’s Guide to Ports and choose the woman at whichever of the two ports features earlier.

If A is a child of the sailor, what is the probability that A is his only child?

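We’ll turn to Bayes’ theorem to find the answer. Here O means the sailor has only one child (heads), NO means he has two (tails), and C means that A, the child of one particular woman, is born.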

$P(O|C) = \frac{P(C|O) \cdot P(O)}{P(C|O) \cdot P(O) + P(C|NO) \cdot P(NO)}$

$P(O|C) = \frac{(1/2) \cdot (1/2)}{(1/2) \cdot (1/2) + 1 \cdot (1/2)} = \frac{1}{3}$
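As a rough check on the 1/3, the setup can be simulated. This is a minimal sketch under an assumed reading of the problem: A is taken to be the child of the woman at one fixed port (called port 1 here), and the guide is assumed equally likely to list either port first. The variable names are illustrative.

```r
set.seed(2023)
itr <- 100000

# TRUE = heads (one child), FALSE = tails (a child with both women)
heads <- sample(c(TRUE, FALSE), itr, replace = TRUE)

# Port that features earlier in the guide (assumed 50/50 between the two)
earlier_port <- sample(c(1, 2), itr, replace = TRUE)

# A is the child of the woman at port 1: A is born on tails,
# or on heads when port 1 is the one listed earlier
a_born <- !heads | earlier_port == 1

# Among the runs in which A is born, how often is A the only child?
p_only <- mean(heads[a_born])
p_only
```

The estimate should land close to 1/3, in line with the Bayes calculation.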


The Heat of the Momentum

May 30, 2023

Last night (or earlier today, for some), the Miami Heat beat the Boston Celtics to win the NBA Eastern Conference finals, thus qualifying for the ultimate showdown against the Denver Nuggets. The Heat made it in the seventh game after the teams had been tied at 3-3.

In many ways, the matchup has been a nightmare that sent sports analysts and Las Vegas into a complete spin. For those who missed the plot, the Celtics were the pre-series favourites but lost the first three games to the Heat. The Heat then had the momentum to take a fourth win for a sweep, but lost the next three games and handed the momentum back to the Celtics. And the Celtics, not knowing that they had this thing called momentum, lost cheaply to the Heat.

The momentum of sports

Momentum is a term borrowed from physics, where it is defined as the product of mass and velocity, a quantity with both magnitude and direction. Journalists use it to represent some internal force of nature (psychology) that moves entities (sports teams, stock prices) in one direction based on their immediate past performance.

Momentum, like the hot hand or positive and negative energy, is a type of cognitive illusion: an argument often used to explain a complex or random process. While a hot hand may be partially explainable by someone’s mood or form on a given day, this momentum thing is supposed to last over several days. The three-match stretch may appear to you like a sequence, but there is a 45-hour break between one game and the next, and most professional teams recover from such setbacks. Every game becomes a new matchup, unconnected to the previous one, like a coin toss.

One can argue that it was reverse momentum that happened in this series. The fourth match became a must-win for the Celtics. And, as has happened several times in the past two years, they successfully dragged themselves out of the hole, not once but three times. Then it became a must-win for the Heat (well, also for the Celtics), which they successfully executed.


Advantage Dice

May 29, 2023

You play a game in which you throw two 6-sided dice and keep the higher value. Repeat it many times. What is the average of the results?

itr <- 100000   # number of simulated games
play <- replicate(itr, {
  # roll two fair 6-sided dice
  first <- sample(1:6, 1)
  second <- sample(1:6, 1)

  max(first, second)   # keep the higher of the two
})

mean(play)

The answer is about 4.47.

What happens if you play the same game with two 20-sided dice?

itr <- 100000   # number of simulated games
play <- replicate(itr, {
  # roll two fair 20-sided dice
  first <- sample(1:20, 1)
  second <- sample(1:20, 1)

  max(first, second)   # keep the higher of the two
})

mean(play)

You get about 13.83.
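The simulated averages can be checked against the exact value. For two fair n-sided dice, the maximum equals k in 2k - 1 of the n² equally likely outcomes, so the expected maximum is the sum of k(2k - 1)/n² over k = 1, ..., n. A small sketch (`expected_max` is just an illustrative helper name):

```r
# Exact expected value of the higher of two fair n-sided dice
expected_max <- function(n) {
  k <- seq_len(n)
  # P(max = k) = (2k - 1) / n^2
  sum(k * (2 * k - 1)) / n^2
}

expected_max(6)    # 161/36, about 4.47
expected_max(20)   # 13.825
```

Both values agree with the simulation to two decimal places.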


Car with No Rear View

May 28, 2023

Imagine you get a chance to buy a coffee shop. Here is what the owner tells you.
Current sales = $74,000/yr
Shop rent = $30,000/yr
Employee salary = $25,000/yr
Coffee beans = $15,000/yr
Cost of furniture and coffee machine = $45,000

How much are you willing to pay?

Market value

A simple valuation shows the shop can generate $4,000 a year (74,000 - 30,000 - 25,000 - 15,000) after paying for the rent, the salaries and the coffee beans. If you believe the shop will generate the same amount forever, you can apply the simple perpetuity formula, value = annual cash flow / discount rate: 4,000 / 0.1 = 40,000, where 0.1 is a discount rate of 10%. So you should be willing to pay at most $40,000.
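The perpetuity shortcut is the limit of discounting the same cash flow year after year. A minimal sketch, assuming the $4,000 arrives at the end of each year and the 10% rate never changes (variable names are illustrative):

```r
sales <- 74000
costs <- c(rent = 30000, salaries = 25000, beans = 15000)
cash_flow <- sales - sum(costs)   # 4000 per year

rate <- 0.10
perpetuity_value <- cash_flow / rate   # 40000

# Discounting year by year approaches the same number
years <- 1:200
discounted_sum <- sum(cash_flow / (1 + rate)^years)

c(perpetuity = perpetuity_value, sum_200_years = discounted_sum)
```

After 200 years the running sum is already within a fraction of a dollar of the perpetuity value.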

The owner reminds you that she spent $45,000 just a few weeks ago to renovate. Should that change your mind? Sadly for her, it shouldn’t. The cost the owner sunk in the past can’t change the value the shop generates in the future. The buyer can politely reply that investing the same $45,000 in the market at a 10% return would earn $4,500 a year, $500 more than the shop generates. So what the owner spent (the book value) is immaterial to the buyer, who calculates the market value.

Movie or football

Mat bought a movie ticket for $25. Just before he sets off, he gets a phone call from John, who invites him to watch a football match. Mat likes football and John’s company, yet declines the invite because he has already paid for the movie ticket.

The money Mat spent is sunk, and what matters now is what gives him a better time (the movie or football with a friend). But Mat falls for the sunk cost fallacy: the bad feeling about money already spent outweighs a better return in the future.

The concord of failures

The sunk cost fallacy is common in big projects. Companies often hesitate to shut down projects midway, even when they realise that costs are rising and the product won’t deliver any economic benefit. They rationalise that they have invested too much to quit.

Social scientists hypothesise three reasons for this fallacy:

Loss aversion
A desire not to appear wasteful
A way to force oneself to do things that otherwise wouldn’t happen

Psychology of decision making

The sunk cost fallacy is a powerful force that impacts decision-making. The issue with sunk costs is that they are things of the past, yet we pay too much attention to them. It’s the same feeling that keeps you sitting through the whole of a terrible movie, eating everything you ordered even when you are full, or continuing a dysfunctional relationship solely because the couple has spent four years of their lives together.

Reference

Sunk Costs: The Big Misconception About Most Investments: Sprouts


Mean, Median and Bill Gates

May 27, 2023

We have seen that the two most commonly used ways of summarising the centre of variation of observed values are the mean and the median. The mean is the numerical average, and the median is the mid-point.

Andrew Vickers uses the following example to illustrate the need for two parameters and the issue that arises when there are outliers. Seven people with annual incomes of $85,000, $50,000, $60,000, $40,000, $75,000, $100,000 and $45,000 are at a dinner. Bill Gates walks in. What is the new distribution of incomes in the room?

Before Gates

Before Mr Gates walked in, the average income was ($85,000 + $50,000 + $60,000 + $40,000 + $75,000 + $100,000 + $45,000) / 7 = $65,000. To find the median, we first arrange the numbers in ascending order ($40,000, $45,000, $50,000, $60,000, $75,000, $85,000, $100,000) and then locate the midpoint, $60,000, which is the median.

After Gates

The picture changes once Mr Gates enters the room. Let’s assume his annual income (!) is $1 billion (the highest number I could envision). The mean is $1,000,455,000 / 8 = $125 million and a bit. And the median? With eight people, it is the average of the two middle values: ($60,000 + $75,000) / 2 = $67,500.

You might say the median ($67,500) better represents the crowd of upper-middle-class people (and one billionaire). The mean, the so-called average, appears helpless here.

The session cannot be complete without invoking my favourite plot of all – the box plot.

You may have noticed that 7 out of 8 fall below the mean.
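The figures above can be reproduced in a few lines of R. This is only a sketch; the vector names are illustrative, and the $1 billion income is the post’s own assumption.

```r
incomes <- c(85000, 50000, 60000, 40000, 75000, 100000, 45000)
mean(incomes)     # 65000
median(incomes)   # 60000

# Add the assumed $1 billion income
with_gates <- c(incomes, 1e9)
mean(with_gates)     # 125056875
median(with_gates)   # 67500

# How many of the eight incomes fall below the mean?
sum(with_gates < mean(with_gates))   # 7
```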

Reference

What is a p-value anyway? 34 Stories to Help You Actually Understand Statistics: Andrew Vickers


Contingency Table and Mosaic

May 26, 2023

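# Contingency table of sex against survival
table(T_data$Sex, T_data$Survided)

# Mosaic plot of the same table
table2 <- table(T_data$Sex, T_data$Survided)
mosaicplot(table2, main = "Titanic Data",
           sub = "",
           xlab = "Sex",
           ylab = "Survived",
           las = 1,
           color = c("skyblue2","lightgreen"),
           border = "chocolate")

# Fisher's exact test of independence between sex and survival
table2 <- table(T_data$Sex, T_data$Survided)
fisher.test(table2)

# Chi-squared test without the continuity correction
chisq.test(table2, correct = FALSE)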
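The post does not show how `T_data` is loaded. As a self-contained stand-in, the sketch below runs the same steps on R’s built-in `Titanic` table, which may not match the author’s data exactly; the object names are illustrative.

```r
# Collapse the built-in 4-way Titanic table (Class, Sex, Age, Survived)
# to a 2 x 2 table of Sex against Survived
sex_survived <- margin.table(Titanic, c(2, 4))
sex_survived

# Mosaic plot of the collapsed table
mosaicplot(sex_survived, main = "Titanic Data",
           xlab = "Sex", ylab = "Survived",
           las = 1, color = c("skyblue2", "lightgreen"))

# Tests of independence between sex and survival
fisher_res <- fisher.test(sex_survived)
chisq_res <- chisq.test(sex_survived, correct = FALSE)
c(fisher = fisher_res$p.value, chisq = chisq_res$p.value)
```

Both p-values come out far below any usual threshold, so sex and survival are clearly associated in this table.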


Mosaic Plot – Titanic

May 25, 2023

We’ll continue with the Mosaic plot. This time we use one of the popular datasets – the Titanic data.


Simpson’s Paradox – Mosaic Plot

May 24, 2023

We have seen Berkeley data in the previous post and refreshed the concept of Simpson’s paradox. Here we introduce a handy visualisation of such data using mosaic plots.

The following R code generates the mosaic plot for the overall admission. The code requires the ‘vcd’ package.


library(vcd)   # provides mosaic()

mosaic( ~ Gender + Admit, data = berk_data,
       highlighting = "Gender", highlighting_fill = c("pink", "lightblue"),
       direction = c("v","h"))

The narrower pink panel in the admission (top) row suggests a smaller number of admitted females (557) compared to males (1198). The top pink panel being narrower than the bottom pink panel indicates lower admission rates for females (relative to their application rate). The shorter height of the top panels indicates more rejections than admissions.

Once the data is stratified to include the department, the picture changes to the following.

mosaic( ~ Dept  + Gender + Admit, data  = berk_data,
       highlighting = "Gender", highlighting_fill = c("pink", "lightblue"),
       direction = c("v","v","h"))

Most of the pink panels on the top are as wide as or wider than the ones on the bottom, suggesting better admission rates for females. You can check the last table of the previous post and see that the admission rates of departments A and B are more than 50%, and the rest are lower. Lastly, there are far more male applicants in those two departments (compare the width of the blue panel with the pink).


Simpson’s Paradox – Berkeley data

May 23, 2023

We have seen Simpson’s paradox in one of the earlier posts. A famous one was the discrepancy in observed admission rates of men and women from six departments at Berkeley. Here is what the data shows; the dataset is available on GitHub.

Admit	Gender	Dept	Frequency
Admitted	Male	A	512
Rejected	Male	A	313
Admitted	Female	A	89
Rejected	Female	A	19
Admitted	Male	B	353
Rejected	Male	B	207
Admitted	Female	B	17
Rejected	Female	B	8
Admitted	Male	C	120
Rejected	Male	C	205
Admitted	Female	C	202
Rejected	Female	C	391
Admitted	Male	D	138
Rejected	Male	D	279
Admitted	Female	D	131
Rejected	Female	D	244
Admitted	Male	E	53
Rejected	Male	E	138
Admitted	Female	E	94
Rejected	Female	E	299
Admitted	Male	F	22
Rejected	Male	F	351
Admitted	Female	F	24
Rejected	Female	F	317

The paradox

If one considers the university as a whole, here is the summary

Admit	Gender	#
Admitted	Male	1198
Rejected	Male	1493
Admitted	Female	557
Rejected	Female	1278
Total		4526

Proportion of males admitted = 1198 / (1198 + 1493) = 0.45

Proportion of females admitted = 557 / (557 + 1278) = 0.30

There is a difference in success rates for men and women. But what about department-wise ‘discrimination’? Here are the success rates of males and females in each department.

Department	Male	Female
A	0.62	0.82
B	0.63	0.68
C	0.37	0.34
D	0.33	0.35
E	0.28	0.24
F	0.06	0.07

Success rates of females are on par or even higher in most departments (four of the six), and only slightly lower in C and E! Let’s probe further and check where they applied against the success rates.

Department	% Male Applied	% Female Applied	Admission Rate (%)
A	31	6	64
B	21	1	63
C	12	32	35
D	15	20	34
E	7	21	25
F	14	19	6
Total	100	100

Women preferred more competitive departments with lower acceptance rates, whereas more men opted for departments with better acceptance rates.
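The same tables can be rebuilt from R’s built-in `UCBAdmissions` dataset, which carries these counts. A sketch, assuming the built-in table matches the GitHub file used in the post (object names are illustrative):

```r
# Overall admission rate by gender (Admit x Gender table)
overall <- margin.table(UCBAdmissions, c(1, 2))
overall_rate <- overall["Admitted", ] / colSums(overall)
round(overall_rate, 2)

# Admission rate by gender within each department
applicants <- apply(UCBAdmissions, c(2, 3), sum)   # Gender x Dept totals
dept_rate <- UCBAdmissions["Admitted", , ] / applicants
round(t(dept_rate), 2)

# Share of each gender's applications going to each department (%)
applied_share <- 100 * prop.table(applicants, margin = 1)
round(t(applied_share), 0)
```

Comparing the last two outputs shows the pattern described above: a larger share of women applied to the departments with the lowest admission rates.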


Confounding vs Effect Modification

May 22, 2023

We have seen confounders before; a confounder is a factor that is associated with both the exposure and the outcome, thereby misleading investigators into inferring a causal relationship between the two.

For example, smoking is a confounder that misleads people into concluding that drinking can lead to lung cancer. In reality, smokers have a higher tendency to drink, and smokers have a higher tendency to get lung cancer. Until you stratify and look at the impact of drinking on smokers and non-smokers separately, you are unlikely to spot the error.
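A small simulation can make the stratification argument concrete. Everything here is made up for illustration: the probabilities are hypothetical, and lung cancer is generated from smoking alone, so drinking has no effect by construction.

```r
set.seed(42)
n <- 100000

smoker <- rbinom(n, 1, 0.3)
# Smokers are more likely to drink (hypothetical probabilities)
drinker <- rbinom(n, 1, ifelse(smoker == 1, 0.7, 0.3))
# Cancer risk depends on smoking only, not on drinking
cancer <- rbinom(n, 1, ifelse(smoker == 1, 0.10, 0.01))

# Risk ratio of the outcome, exposed vs unexposed
risk_ratio <- function(outcome, exposure) {
  mean(outcome[exposure == 1]) / mean(outcome[exposure == 0])
}

# Crude comparison: drinkers vs non-drinkers, ignoring smoking
crude_rr <- risk_ratio(cancer, drinker)

# Stratified comparison: the same ratio within smokers and non-smokers
rr_smokers <- risk_ratio(cancer[smoker == 1], drinker[smoker == 1])
rr_nonsmokers <- risk_ratio(cancer[smoker == 0], drinker[smoker == 0])

round(c(crude = crude_rr, smokers = rr_smokers, non_smokers = rr_nonsmokers), 2)
```

The crude ratio comes out well above 1, while within each smoking stratum it sits near 1, which is the signature of confounding.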

On the other hand, if the variable changes how the exposure affects the outcome, without itself being associated with the exposure, it is an effect modifier. A simple example is immunisation status: whether an individual is immunised can change that person’s susceptibility to infection once exposed to the virus.
