Car with No Rear View

Imagine you get a chance to buy a coffee shop. Here is what the owner tells you.
The current sales = $ 74,000 /yr
Shop rent = $30,000 /yr
Employee salary = $25,000 /yr
Coffee beans = $15,000 /yr
The cost of furniture and coffee machine = $45,000

How much are you willing to pay?

Market value

A simple valuation shows the shop generates $4,000 a year (74,000 – 30,000 – 25,000 – 15,000) after paying for rent, salaries and coffee beans. If you expect the shop to generate the same amount forever, the perpetuity formula gives its value as 4,000 / 0.1 = 40,000, where 0.1 is a discount rate of 10%. So you should be willing to pay a maximum of $40,000.

The owner reminds you that she spent $45,000 just a few weeks ago to renovate. Should you change your mind? Sadly for her, you shouldn't. A cost sunk in the past can't change the value the shop generates in the future. The buyer politely replies that she could earn $4,500 every year, $500 more than the shop yields, by investing that $45,000 in the market at a 10% return. So what the owner spent (the book value) is immaterial to the buyer, who calculates the market value.
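The arithmetic above can be checked with a few lines of R. This is only a minimal sketch of the perpetuity calculation; the variable names are mine and are not part of the original example.

```r
# Figures quoted by the owner (per year)
sales <- 74000
costs <- c(rent = 30000, salaries = 25000, beans = 15000)

# Yearly cash the shop leaves after running costs
cash_flow <- sales - sum(costs)

# Perpetuity value: the same cash flow forever, discounted at 10%
discount_rate <- 0.10
shop_value <- cash_flow / discount_rate

# What the 45,000 spent by the owner would earn in the market at 10%
market_income <- 45000 * discount_rate

cash_flow      # yearly cash flow of the shop
shop_value     # the most a buyer should pay
market_income  # yearly return on 45,000 invested elsewhere
```

The sunk 45,000 never enters the valuation; it only shows up in the comparison with what the same money would earn elsewhere.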

Movie or football

Mat bought a movie ticket for $25. Just before he sets off, he gets a phone call from John, who invites him to watch a football match. Mat likes football and John's company, yet he declines the invite because he has already paid for the movie ticket.

The money Mat spent is sunk, and what matters now is what gives him a better time (the movie or football with a friend). But Mat falls for the sunk cost fallacy: the bad feeling about money already spent outweighs a better return in the future.

The concord of failures

The sunk cost fallacy is common in big projects. Companies often hesitate to shut down projects midway even when they realise that costs are escalating and the product won't bring any economic benefit. They rationalise that they have invested too much to quit.

Social scientists hypothesise three reasons for this fallacy:

  1. Loss aversion
  2. The desire not to appear wasteful
  3. A way to force oneself to do things that otherwise wouldn't happen

Psychology of decision making

The sunk cost fallacy is a powerful force that impacts decision-making. The issue with sunk costs is that they are things of the past, but we pay too much attention to them. It's the same feeling that keeps you sitting through a terrible movie, eating everything you ordered even when you are full, or continuing a dysfunctional relationship solely because you have spent four years together.

Reference

Sunk Costs: The Big Misconception About Most Investments: Sprouts


Mean, Median and Bill Gates

We have seen that the two most commonly used ways of summarising the centre of variation of observed values are the mean and the median. The mean is the numerical average, and the median is the mid-point.

Andrew Vickers uses the following example to illustrate the need for both measures and the trouble outliers cause. Seven people with annual incomes of $85,000, $50,000, $60,000, $40,000, $75,000, $100,000 and $45,000 are at a dinner. Bill Gates walks in. What is the new distribution of income in the room?

Before Gates

Before Mr Gates walked in, the average salary was ($85,000 + $50,000 + $60,000 + $40,000 + $75,000 + $100,000 + $45,000) / 7 = $65,000. To estimate the median, we first need to arrange the numbers in ascending order, $40,000, $45,000, $50,000, $60,000, $75,000, $85,000, $100,000, locate the midpoint, i.e., $60,000, which is the median.

After Gates

The picture changes once Mr Gates enters the room. Let's assume his annual income (!) is $1 billion (the highest number I could envision). The mean becomes 1,000,455,000 / 8 ≈ $125 million. And the median? With eight people, it is the average of the two middle values: ($60,000 + $75,000) / 2 = $67,500.
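The before-and-after numbers are easy to reproduce in R. This is a minimal sketch using the incomes from the example; the $1 billion figure is the post's assumption, and the variable names are mine.

```r
# Incomes of the seven dinner guests
incomes <- c(85000, 50000, 60000, 40000, 75000, 100000, 45000)

mean(incomes)    # mean before Gates
median(incomes)  # median before Gates

# Add the assumed 1 billion income
with_gates <- c(incomes, 1e9)

mean(with_gates)    # the mean is dragged up by a single value
median(with_gates)  # the median barely moves

# Number of people earning less than the new mean
below_mean <- sum(with_gates < mean(with_gates))
below_mean

# Values a box plot would flag as outliers (beyond 1.5 x IQR from the hinges)
outliers <- boxplot.stats(with_gates)$out
outliers
```

Calling boxplot(with_gates) on the same vector draws the plot itself, with the billionaire as a lone point far above the box.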

You might say the median ($67,500) better represents the crowd of upper-middle-class people (and one billionaire). The mean, the so-called average, appears helpless here.

The session cannot be complete without invoking my favourite plot of all – the box plot.

You may have noticed that 7 out of 8 fall below the mean.

Reference

What is a p-value anyway? 34 Stories to Help You Actually Understand Statistics:  Andrew Vickers


Contingency Table and Mosaic

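# T_data is assumed to be a data frame of Titanic passengers that is already
# loaded, with the columns Sex and Survided (spelled as in this post; the
# usual Titanic data names the column Survived).

# Contingency table of sex against survival
table2 <- table(T_data$Sex, T_data$Survided)
table2

# Mosaic plot of the same table
mosaicplot(table2, main = "Titanic Data",
           sub = "",
           xlab = "Sex",
           ylab = "Survived",
           las = 1,
           color = c("skyblue2","lightgreen"),
           border = "chocolate")

# Fisher's exact test of independence between sex and survival
fisher.test(table2)

# Pearson's chi-squared test without the continuity correction
chisq.test(table2, correct = FALSE)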


Simpson’s Paradox – Mosaic Plot

We have seen Berkeley data in the previous post and refreshed the concept of Simpson’s paradox. Here we introduce a handy visualisation of such data using mosaic plots.

The following R code generates the mosaic plot for the overall admission. The code requires the ‘vcd’ package.


library(vcd)  # provides mosaic()

mosaic( ~ Gender + Admit, data = berk_data,
       highlighting = "Gender", highlighting_fill = c("pink", "lightblue"),
       direction = c("v","h"))

The narrower pink panel in the admitted (top) row reflects the smaller number of admitted females (557) compared to males (1198). The top pink panel is also narrower than the bottom pink panel, which indicates a lower admission rate for females relative to their share of applications. The shorter top panels indicate more rejections than admissions.

Once the data is stratified to include the department, the picture changes to the following.

mosaic( ~ Dept  + Gender + Admit, data  = berk_data,
       highlighting = "Gender", highlighting_fill = c("pink", "lightblue"),
       direction = c("v","v","h"))

Most of the pink panels on the top are as wide as or wider than the ones on the bottom, suggesting admission rates for females that are equal or better. You can check the last table of the previous post and recognise that the admission rates of departments A and B are more than 50%, and the rest are lower. Lastly, far more males applied to those two departments (compare the width of the blue panels with the pink).


Simpson’s Paradox – Berkeley data

We have seen Simpson’s paradox in one of the earlier posts. A famous one was the discrepancy in observed admission rates of men and women from six departments at Berkeley. Here is what the data shows; the dataset is available on GitHub.

Admit      Gender   Dept   Frequency
Admitted   Male     A      512
Rejected   Male     A      313
Admitted   Female   A      89
Rejected   Female   A      19
Admitted   Male     B      353
Rejected   Male     B      207
Admitted   Female   B      17
Rejected   Female   B      8
Admitted   Male     C      120
Rejected   Male     C      205
Admitted   Female   C      202
Rejected   Female   C      391
Admitted   Male     D      138
Rejected   Male     D      279
Admitted   Female   D      131
Rejected   Female   D      244
Admitted   Male     E      53
Rejected   Male     E      138
Admitted   Female   E      94
Rejected   Female   E      299
Admitted   Male     F      22
Rejected   Male     F      351
Admitted   Female   F      24
Rejected   Female   F      317

The paradox

If one considers the university as a whole, here is the summary

Admit      Gender   #
Admitted   Male     1198
Rejected   Male     1493
Admitted   Female   557
Rejected   Female   1278
Total               4526

Proportion of males admitted = 1198 / (1198 + 1493) = 0.45

Proportion of females admitted = 557 / (557 + 1278) = 0.30

There is a difference in success rates for men and women. But what about department-wise ‘discrimination’? Here are the success rates of males and females in each department.

Department   Male   Female
A            0.62   0.82
B            0.63   0.68
C            0.37   0.34
D            0.33   0.35
E            0.28   0.24
F            0.06   0.07

Success rates of females are on par or even higher in four of the six departments, and only slightly lower in C and E! Let's probe further and compare where they applied with the departments' admission rates.

Department   % Male Applied   % Female Applied   Admission Rate (%)
A            31               6                  64
B            21               1                  63
C            12               32                 35
D            15               20                 34
E            7                21                 25
F            14               19                 6
Total        100              100

Women preferred more competitive departments with lower acceptance rates, whereas more men opted for departments with better acceptance rates.
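The three tables above can be rebuilt from the raw counts in a few lines of base R. This is a minimal sketch: the data frame name berk and the way it is typed in here are my own choices for illustration, not the post's original code.

```r
# Raw counts, in the same order as the first table
berk <- data.frame(
  Admit  = rep(c("Admitted", "Rejected"), times = 12),
  Gender = rep(rep(c("Male", "Female"), each = 2), times = 6),
  Dept   = rep(c("A", "B", "C", "D", "E", "F"), each = 4),
  Frequency = c(512, 313,  89,  19,
                353, 207,  17,   8,
                120, 205, 202, 391,
                138, 279, 131, 244,
                 53, 138,  94, 299,
                 22, 351,  24, 317)
)

# University-wide admission rate by gender
by_gender <- xtabs(Frequency ~ Gender + Admit, data = berk)
overall_rate <- prop.table(by_gender, margin = 1)[, "Admitted"]
round(overall_rate, 2)

# Admission rate by gender within each department
by_dept <- xtabs(Frequency ~ Dept + Gender + Admit, data = berk)
dept_rate <- prop.table(by_dept, margin = c(1, 2))[, , "Admitted"]
round(dept_rate, 2)

# Share of each gender's applications going to each department (%)
applied <- xtabs(Frequency ~ Dept + Gender, data = berk)
share_applied <- round(100 * prop.table(applied, margin = 2))
share_applied
```

Reading dept_rate next to share_applied shows the reversal: the gap in the overall rate comes from where people applied, not from how each department treated them.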


Confounding vs Effect Modification

We have seen confounders before: a confounder is a factor associated with both exposure and outcome, thereby misleading investigators into inferring a causal relationship between the two.

For example, smoking is a confounder that misleads people to conclude that drinking can lead to lung cancer. In reality, smokers have a higher tendency to drink, and smokers have a higher tendency to get lung cancer. Until you stratify and find the impact of drinking on smokers and non-smokers, you are unlikely to figure out the error.
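A small numeric illustration of the stratification step may help. The counts below are entirely hypothetical (made up for this sketch, not taken from any study): drinking has no effect on lung cancer within either smoking stratum, yet the crude comparison makes it look harmful.

```r
# Hypothetical cohort: smokers drink more often and carry a higher cancer risk
d <- data.frame(
  smoker  = c("yes", "yes", "no", "no"),
  drinker = c("yes", "no",  "yes", "no"),
  n       = c(800, 200, 200, 800),   # people in each group
  cancer  = c(80,  20,  2,   8)      # lung cancer cases in each group
)

# Crude risk by drinking status, ignoring smoking
crude_risk <- tapply(d$cancer, d$drinker, sum) / tapply(d$n, d$drinker, sum)
crude_rr <- crude_risk[["yes"]] / crude_risk[["no"]]
crude_rr   # well above 1: drinking looks harmful

# Risk ratio of drinking within each smoking stratum
d$risk <- d$cancer / d$n
stratum_rr <- d$risk[d$drinker == "yes"] / d$risk[d$drinker == "no"]
names(stratum_rr) <- c("smokers", "non-smokers")
stratum_rr # 1 in both strata: no effect of drinking once smoking is fixed
```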

On the other hand, if the variable influences the outcome but not the exposure, it is an effect modifier. A simple example: an individual's immunisation status can change the person's susceptibility to infection from a virus.


Collider Bias – The Math

So far, I have addressed the collider-bias phenomenon qualitatively. This time, I will try to show it through numbers. It can look complex, as the illustration involves a lot of arithmetic. The reference material provided at the end is a good read for grasping the concept further.

Imagine a situation where the exposure is obesity, the risk factor is smoking, the outcome is mortality, and the collider is diabetes. If you are confused about what each represents, here is the expected storyline: a research group studies the impact of obesity on mortality in a set of people who have diabetes and comes up with a counterintuitive conclusion (perhaps that obesity decreases mortality)!

Set of information

Total study population = 1000
Smokers = 500
Non-smokers = 500
Obese = 500
Non-obese = 500
Baseline diabetes risk (non-smoking, non-obese) = 4%
Obesity increases diabetes risk by 16% points
Smoking increases diabetes risk by 12% points
Baseline mortality risk (non-smoking, non-obese, non-diabetic) = 5%
Obesity increases mortality risk by 2.5% points
Smoking increases mortality risk by 15% points
Diabetes increases mortality risk by 5% points

Calculations on the total sample

The overall study population splits into four quadrants of 250 people each, by smoking and obesity status.

Now, calculate the number of deaths in each quadrant and group them into obese and non-obese.

Total mortality of NS-NO (non-smoking, non-obese) quadrant
= # of diabetic x diabetic mortality + # non-diabetic x baseline mortality
= 0.04 x 250 x (0.05 + 0.05) + (250 – 0.04 x 250) x 0.05
= 1 + 12 = 13
(note that diabetic mortality = baseline mortality + diabetic increases mortality)

S-NO (smoking, non-obese) quadrant
= # of diabetic x (diabetic mortality + smoking mortality) + # non-diabetic x (Baseline mortality + smoking mortality)
= (0.04 + 0.12) x 250 x (0.05 + 0.05 + 0.15) + (250 – (0.04 + 0.12) x 250) x (0.05 + 0.15)
= 52

S-O (smoking, obese) quadrant
= (0.04 + 0.12 + 0.16) x 250 x (0.05 + 0.05 + 0.15 + 0.025) + (250 – (0.04 + 0.12 + 0.16) x 250) x (0.05 + 0.15 + 0.025)
= 60

NS-O (non-smoking, obese) quadrant
= (0.04 + 0.16) x 250 x (0.05 + 0.05 + 0.025) + (250 – (0.04 + 0.16) x 250) x (0.05 + 0.025)
= 21

Calculations (for the total sample)
Mortality rate with obesity = (60 + 21) / 500 = 16.2%
Mortality rate without obesity = (13 + 52) / 500 = 13%
An increase of 3.2% points
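The quadrant arithmetic for the total sample can be checked with a short R sketch. It uses the risks from the set of information above; the variable names are mine. The text rounds each quadrant to whole deaths, whereas the code keeps the fractions, so the obese rate comes out at about 16.3% instead of 81 / 500.

```r
# Four quadrants of 250 people each, in the order used in the text
quadrant <- c("NS-NO", "S-NO", "S-O", "NS-O")
smoker   <- c(0, 1, 1, 0)
obese    <- c(0, 0, 1, 1)
n_quadrant <- 250

# Risks built from the baselines and the percentage-point increases
p_diabetes   <- 0.04 + 0.12 * smoker + 0.16 * obese
p_death_nond <- 0.05 + 0.15 * smoker + 0.025 * obese  # non-diabetics
p_death_diab <- p_death_nond + 0.05                   # diabetics

# Expected deaths per quadrant: diabetics plus non-diabetics
n_diab <- n_quadrant * p_diabetes
deaths <- n_diab * p_death_diab + (n_quadrant - n_diab) * p_death_nond
names(deaths) <- quadrant
deaths

# Mortality rate with and without obesity in the whole population
rate_obese    <- sum(deaths[obese == 1]) / (2 * n_quadrant)
rate_nonobese <- sum(deaths[obese == 0]) / (2 * n_quadrant)
c(obese = rate_obese, non_obese = rate_nonobese)
```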

Calculations on the sub-sample

Suppose the study stratified the sample and analysed only people who have diabetes. The study sample then holds 180 people: 10 in NS-NO, 40 in S-NO, 80 in S-O and 50 in NS-O.

Do the same exercise as before

NS-NO quadrant
= # of diabetic x diabetic mortality
= 0.04 x 250 x (0.05 + 0.05)
= 1

S-NO quadrant
= # of diabetic x (diabetic mortality + smoking mortality)
= (0.04 + 0.12) x 250 x (0.05 + 0.05 + 0.15)
= 10

S-O quadrant
= (0.04 + 0.12 + 0.16) x 250 x (0.05 + 0.05 + 0.15 + 0.025)
= 22

NS-O quadrant
= (0.04 + 0.16) x 250 x (0.05 + 0.05 + 0.025)
= 6

Calculations (for the sub-sample)
Mortality rate with obesity = (22 + 6) / 130 = 21.5%
Mortality rate without obesity = (1 + 10) / 50 = 22%
A decrease of 0.5% points
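The same check for the diabetic sub-sample, again as a minimal R sketch with my own variable names. Because the code does not round the NS-O quadrant down to 6 deaths, the obese rate comes out slightly higher than in the text, but it still sits below the non-obese rate.

```r
# Quadrants in the order NS-NO, S-NO, S-O, NS-O, 250 people each
smoker <- c(0, 1, 1, 0)
obese  <- c(0, 0, 1, 1)
n_quadrant <- 250

# Number of diabetics in each quadrant
n_diab <- n_quadrant * (0.04 + 0.12 * smoker + 0.16 * obese)
n_diab

# Expected deaths among diabetics only
# (baseline 5% + diabetes 5% + smoking 15% + obesity 2.5%, where they apply)
deaths_diab <- n_diab * (0.05 + 0.05 + 0.15 * smoker + 0.025 * obese)
deaths_diab

# Mortality rate with and without obesity, among diabetics
rate_obese    <- sum(deaths_diab[obese == 1]) / sum(n_diab[obese == 1])
rate_nonobese <- sum(deaths_diab[obese == 0]) / sum(n_diab[obese == 0])
c(obese = rate_obese, non_obese = rate_nonobese)
```

The reversal arises because the non-obese diabetics are mostly smokers (40 of 50), while the obese diabetics include a larger share of non-smokers (50 of 130).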

Reference

Collider Bias in Observational Studies: Dtsch Arztebl Int.


The Obesity Paradox

The obesity paradox is the idea that people who are overweight live longer than normal-weight people. While later studies have found this claim invalid, the notion has stayed in public discourse ever since.

There are many explanations for this odd observation. One of them goes with the parameter of measurement itself – the survival rate after getting cardiovascular disease. Studies found that obese people may get the disease much earlier in life and therefore survive a longer proportion of life with it.

Another one is collider stratification bias, which happens when two variables, e.g., risk factor and outcome, influence a third, namely, the likelihood of being sampled. It works in the following way:

Obese individuals may have developed coronary artery disease (CAD) because they are obese or because of another, stronger condition, e.g., smoking or genetics. In other words, CAD, the collider, is caused by 1) obesity and 2) the (more severe) condition (smoking). In this simple two-cause model, stratifying on CAD means that, among individuals with CAD, obese individuals are less likely to be smokers, and non-obese individuals are more likely to be smokers. Consequently, obesity may appear protective against mortality (the outcome) because its presence indicates the absence of a more harmful risk factor: smoking.
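The two-cause model can be made concrete with a little arithmetic in R. All the probabilities below are hypothetical, chosen only for illustration (they are not taken from the referenced papers): obesity and smoking are independent in the population, and smoking raises the CAD risk more than obesity does.

```r
# Hypothetical population: obesity and smoking are independent, 50% each
grid <- expand.grid(obese = c(0, 1), smoker = c(0, 1))
grid$p_group <- 0.5 * 0.5

# Hypothetical CAD risk: 5% baseline, +10 points if obese, +30 points if smoker
grid$p_cad <- 0.05 + 0.10 * grid$obese + 0.30 * grid$smoker

# Probability of belonging to a group and having CAD
grid$p_joint <- grid$p_group * grid$p_cad

# Share of smokers among CAD patients, by obesity status ("0" = non-obese)
smoker_given_cad <- tapply(grid$p_joint * grid$smoker, grid$obese, sum) /
  tapply(grid$p_joint, grid$obese, sum)
smoker_given_cad
```

In the whole population, half of each group smokes. Among CAD patients, the smoking share is lower in the obese group than in the non-obese group, which is the imbalance that makes obesity look protective.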

References

The ‘obesity paradox’ may not be a paradox at all: International Journal of Obesity

Obesity is bad regardless of the obesity paradox for hypertension and heart disease: J Clin Hypertens

Association of Body Mass Index With Lifetime Risk of Cardiovascular Disease and Compression of Morbidity: JAMA Cardiology


Night Light and Myopia

A well-known case of confounding was the finding that night lighting causes myopia in young children.

In 1999, Quinn et al. published an article in the prestigious journal Nature that reported a strong association between exposure to nighttime light before the age of two years and myopia; it received wide publicity in the media. As axial myopia is caused by excessive eyeball growth during childhood, the researchers rationalised that nighttime lighting in young children could stimulate the condition.

However, multiple studies that repeated the investigation found no association between the exposure (night light) and the outcome (myopia).

Myopic parents

It turned out that the confounder was parental myopia: myopic parents were more likely to keep the lights on at night for better vision. As myopic parents tend to have myopic children, the association became easier to understand.

References

Myopia and ambient lighting at night: Quinn et al.
Continuous ambient lighting and eye growth in primates: Smith et al.
Myopia and night lighting in children in Singapore: Saw et al.
