One-way ANOVA – by Hand

Let’s do the ANOVA step by step. We use the F-statistic to decide whether to reject the null hypothesis, either by comparing it with the critical F-value at a chosen significance level or by computing the p-value from the F-value and comparing that with the significance level.
The definition of the F-statistic is

F = Between groups variance / Within-group variance

Between groups variance

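Here, you estimate the variation of the group means around the global mean. In other words, you determine the mean of each group and the global mean (the mean of all the data, which equals the mean of the group means when the groups are of equal size). Then take the difference between each group mean and the global mean, square it, multiply it by the group’s sample size, add the results up and divide by the degrees of freedom, as you would for a standard variance.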

Recall the previous example (strength of materials by four vendors). So you have four groups, each containing ten samples. First, estimate four means and the global mean. They are:

Vendor              Vendor 1            Vendor 2            Vendor 3             Vendor 4
Mean                11.2                8.94                10.68                8.84
Samples             10                  10                  10                   10
Square for factor   10*(11.2-9.915)^2   10*(8.94-9.915)^2   10*(10.68-9.915)^2   10*(8.84-9.915)^2

Global mean = 9.915
Sum of squares for factor = 43.62
Degrees of freedom (DF) = 4 - 1 = 3

The numerator (mean square of the factor) is calculated by dividing the sum of squares for the factor by the degrees of freedom, i.e., 43.62/3 = 14.54.
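As a quick check, the between-group arithmetic can be reproduced in R. This is only a sketch that uses the rounded means from the table above, so the sum of squares comes out near 43.4 rather than the 43.62 obtained from the unrounded data; the variable names are my own.

```r
# Group means and sample sizes, as rounded in the table above
group_mean <- c(11.2, 8.94, 10.68, 8.84)
n <- c(10, 10, 10, 10)

# Global mean: sample-size-weighted mean of the group means
global_mean <- sum(n * group_mean) / sum(n)

# Sum of squares for the factor and its degrees of freedom
ss_factor <- sum(n * (group_mean - global_mean)^2)
df_factor <- length(group_mean) - 1

# Mean square of the factor (numerator of the F-statistic)
ms_factor <- ss_factor / df_factor
print(c(global_mean = global_mean, ss_factor = ss_factor, ms_factor = ms_factor))
```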

Within-group variance

Here, you add up the variations inside the groups: multiply each group’s variance by its degrees of freedom, add the products up and then divide by the sum of the degrees of freedom of the groups.

Vendor                             Vendor 1     Vendor 2     Vendor 3     Vendor 4
Samples                            10           10           10           10
Degrees of freedom (samples – 1)   9            9            9            9
Within-group squares for error     35.81        79.93        10.94        31.78
(variance x df)                    (3.98 x 9)   (8.88 x 9)   (1.22 x 9)   (3.53 x 9)

Sum of within-group squares for error = 158.466
Total degrees of freedom = 36

The denominator (mean square of error) is calculated by dividing the sum of the within-group squares for error by the total degrees of freedom, i.e., 158.466/36 = 4.402.
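The within-group step can be sketched in the same way. The variances below are the rounded values shown in the table, so the total is about 158.49 instead of 158.466; the variable names are assumptions for illustration.

```r
# Group variances (rounded, from the table above) and sample sizes
group_var <- c(3.98, 8.88, 1.22, 3.53)
n <- c(10, 10, 10, 10)

# Each group contributes variance x (n - 1) to the error sum of squares
df_group <- n - 1
ss_error <- sum(group_var * df_group)
df_error <- sum(df_group)

# Mean square of error (denominator of the F-statistic)
ms_error <- ss_error / df_error
print(c(ss_error = ss_error, df_error = df_error, ms_error = ms_error))
```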

F-statistic = 14.54 / 4.402 = 3.30

The value 3.30 is then compared with the critical F-value corresponding to a set significance level, 0.05 in the present case. You can either look it up in an F-distribution table or use the R function.

qf(0.05, 3, 36, lower.tail = FALSE)

The critical value is 2.87. Since the F-statistic in our case is larger than 2.87, we reject the null hypothesis. The p-value turns out to be 0.031.

pf(3.303, 3, 36, lower.tail = FALSE)
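To convince yourself that the hand calculation is what R does internally, here is a minimal sketch that wraps the steps above in a function and compares the result with aov(). The data frame toy and the function anova_by_hand are hypothetical; the numbers are made up and are not the vendor data.

```r
# Hypothetical toy data: three groups with four values each
toy <- data.frame(
  group = factor(rep(c("a", "b", "c"), each = 4)),
  value = c(5.1, 4.8, 5.6, 5.3, 6.2, 6.0, 6.8, 6.5, 4.9, 5.2, 4.6, 5.0)
)

# One-way ANOVA following the by-hand steps
anova_by_hand <- function(value, group) {
  group <- factor(group)
  n_i <- tapply(value, group, length)   # samples per group
  mean_i <- tapply(value, group, mean)  # group means
  var_i <- tapply(value, group, var)    # group variances
  grand_mean <- mean(value)

  ss_between <- sum(n_i * (mean_i - grand_mean)^2)
  df_between <- nlevels(group) - 1
  ss_within <- sum(var_i * (n_i - 1))
  df_within <- sum(n_i - 1)

  f_value <- (ss_between / df_between) / (ss_within / df_within)
  p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)
  list(f_value = f_value, p_value = p_value, df = c(df_between, df_within))
}

res <- anova_by_hand(toy$value, toy$group)

# Reference values from R's built-in ANOVA
ref <- summary(aov(value ~ group, data = toy))[[1]]
print(c(by_hand = res$f_value, aov = ref[["F value"]][1]))
```

The two F-values should agree up to floating-point error, and the same holds for the p-values.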


One-way ANOVA

We know how to use a 1-sample t-test to perform a hypothesis test on the mean of a single group and a 2-sample t-test to compare two groups. The scope of t-tests ends here, as two is the limit. What happens when you have three groups of data? When there are more than two groups, you use the analysis of variance, or ANOVA. As we have done before, we will do the ANOVA using R.

Comparing four vendors

ANOVA requires an independent variable (categorical factor) and a dependent variable (continuous). The following table shows what I mean. The data used in the analysis are taken from https://statisticsbyjim.com/. The strength of a certain material from four vendors is available, and we want to determine if there is a statistically significant difference between the mean strengths. Here is how a few selected entries (out of 40) appear.

Vendor     Strength
Vendor 1   11.71
Vendor 1   11.981
Vendor 1   8.043
Vendor 2   7.77
Vendor 2   10.74
Vendor 2   10.72
Vendor 3   9.65
Vendor 3   8.79
Vendor 3   10.86
Vendor 4   6.97
Vendor 4   9.16
Vendor 4   8.67

Note that the categorical factor, vendor names (1, 2, 3 and 4), divided the continuous data into four groups. Before performing ANOVA, we plot the data and check how they look.

par(bg = "antiquewhite")
boxplot(AN_data$Strength ~ factor(AN_data$Vendor), xlab = "Vendor", ylab = "Strength of Material")

The null and alternative hypotheses are:

H0 = the four mean strengths are equal
HA = the four mean strengths are not all equal

Doing ANOVA is pretty easy; the following commands will do the job.

sum_aov <- aov(AN_data$Strength ~ factor(AN_data$Vendor))
summary(sum_aov)
                      Df Sum Sq Mean Sq F value Pr(>F)  
factor(AN_data$Vendor)  3  43.62  14.540   3.303 0.0311 *
Residuals              36 158.47   4.402                 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

We test the statistical significance with the F-test. And here is the most important thing: the p-value is 0.03, less than the 0.05 we chose. So the null hypothesis is rejected. We will see what all these numbers mean in the next post.
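The post does not show how AN_data is built, so here is a minimal sketch of the expected layout, one row per measurement, using only the twelve entries listed in the table above. The name AN_subset is my own, and because this is a subset, its F-value and p-value will differ from the 40-sample output shown earlier.

```r
# Twelve of the forty measurements, as listed in the table above
AN_subset <- data.frame(
  Vendor = factor(rep(c("Vendor 1", "Vendor 2", "Vendor 3", "Vendor 4"), each = 3)),
  Strength = c(11.71, 11.981, 8.043,
               7.77, 10.74, 10.72,
               9.65, 8.79, 10.86,
               6.97, 9.16, 8.67)
)

# One-way ANOVA on the subset
fit <- aov(Strength ~ Vendor, data = AN_subset)
tab <- summary(fit)[[1]]
print(tab)

# Cross-check: the classical one-way test with equal variances gives the same F
check <- oneway.test(Strength ~ Vendor, data = AN_subset, var.equal = TRUE)
print(check$statistic)
```

Passing the data frame through the data argument keeps the formula short; the row label in the summary then reads Vendor instead of factor(AN_data$Vendor).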

Hypothesis Testing: An Intuitive Guide: Jim Frost


Based on a Lancet Study …

In this post, we discuss an article that otherwise requires no special mention in this space. Yet, we discuss it today, perhaps as an illustration of 1) the diverse objectives that scientific researchers set for their work and 2) how the ever-imaginative media, and subsequently the public, could interpret the messages. Before we examine the motivation or the results, we need to understand something about the study’s publication status.

Preprints with The Lancet 

The work is a non-peer-reviewed preprint and, therefore, not a published article in The Lancet, at least for now. The SSRN page, the repository at which it appeared, further states that it is not even necessarily under review with a Lancet journal. So, a preprint with The Lancet is not equivalent to a publication by The Lancet.

The motivation

You may read it from the title: Randomised clinical trials of COVID-19 vaccines: do adenovirus-vector vaccines have beneficial non-specific effects? It is a review paper, and the investigators specifically wanted to understand the impact of COVID-19 vaccines on non-COVID diseases, which, I think, is a valid reason for the research. By the way, you have every right to ask why COVID-19 vaccines should impact accidents and suicides!

Motivated YouTubers

The following line from the abstract turned out to be the key attraction for the YouTuber scientist. It reads: “For overall mortality, with 74,193 participants and 61 deaths (mRNA:31; placebo:30), the relative risk (RR) for the two mRNA vaccines compared with placebo was 1.03”. Now, ignore the first three words, “For overall mortality”, add The Lancet, and you get a good title and guaranteed clicks!

The results

First, the results from mRNA vaccines (Pfizer and Moderna):

Cause of death                Deaths/total,   Deaths/total,   Relative
                              vaccine group   placebo group   risk (RR)
Overall mortality             31/37110        30/37083        1.03
Covid-19 mortality            2/37110         5/37083         0.4
CVD mortality                 16/37110        11/37083        1.45
Other non-Covid-19 mortality  11/37110        12/37083        0.92
Accidents                     2/37110         2/37083         1.00
Non-accidents, non-Covid-19   27/37110        23/37083        1.17

In my opinion, the key messages from the table are:
1) The number of deaths due to Covid-19 is too small to make any meaningful inference
2) The deaths due to other causes show no clear trends upon vaccination
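The relative risks in the mRNA table are simply the ratio of the two death rates, which is easy to verify in R. The sketch below also adds an approximate 95% confidence interval using the usual normal approximation on the log scale; that interval is my addition, not something quoted from the paper, and the variable names are my own.

```r
# Deaths in the mRNA trials, from the table above
deaths_vaccine <- c(overall = 31, covid = 2, cvd = 16,
                    other_non_covid = 11, accidents = 2,
                    non_accident_non_covid = 27)
deaths_placebo <- c(30, 5, 11, 12, 2, 23)
n_vaccine <- 37110
n_placebo <- 37083

# Relative risk: death rate in the vaccine group / death rate in the placebo group
rr <- (deaths_vaccine / n_vaccine) / (deaths_placebo / n_placebo)

# Approximate 95% interval (assumption: normal approximation for log(RR))
se_log_rr <- sqrt(1 / deaths_vaccine - 1 / n_vaccine +
                  1 / deaths_placebo - 1 / n_placebo)
ci_lower <- exp(log(rr) - qnorm(0.975) * se_log_rr)
ci_upper <- exp(log(rr) + qnorm(0.975) * se_log_rr)

print(round(cbind(rr, ci_lower, ci_upper), 2))
```

With only two and five Covid-19 deaths, the interval around 0.4 is wide and includes 1, which is the first message above put in numbers.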

Results from adenovirus-vector vaccines (several studies combined):

Cause of death                Deaths/total,   Deaths/total,   Crude RR (from
                              vaccine group   placebo group   the counts shown)
Overall mortality             16/72138        30/50026        0.37
Covid-19 mortality            2/72138         8/50026         0.17
CVD mortality                 0/72138         5/50026         0.00
Other non-Covid-19 mortality  8/72138         11/50026        0.50
Accidents                     6/72138         6/50026         0.69
Non-accidents, non-Covid-19   8/72138         16/50026        0.35

My messages are:
A chance accumulation of non-Covid-19-related deaths in the placebo group (five of them cardiovascular) gives an edge to the vaccine group and, therefore, appears to “save” people immunised with adenovirus-vector vaccines from dying from other causes, including accidents, in some countries! With so few deaths, the statistical significance of these differences is dubious.

Lessons learned

1) Be extremely careful before accepting commentaries about scientific work (including this post)
2) As much as possible, find out and read the original paper after being enlightened by YouTube teachers.

Randomised clinical trials of COVID-19 vaccines: do adenovirus-vector vaccines have beneficial non-specific effects?: Benn et al.


Risks vs Benefit – mRNA Against CoVid-19

You may read this post as the continuation of the one I made last year. Evaluate the risk caused by an action by comparing it with situations without that action. That is the core of the risk-benefit trade-off in decision-making. A third factor is missing in the equation, namely, the cost.

A new study published in The Lancet is the basis for this post. The report compiles the incidence of myocarditis and pericarditis, two well-known side effects linked to the mRNA vaccines against COVID-19. The data covered four health claim databases in the US and more than 15 million individuals.

The results

First, the overall summary: the data from four Data Partners (DP) indicate 411 events among the 15 million studied individuals who received the vaccine. The details provided by each DP are:

Data Partner   Total        Observed myocarditis or    Expected events (E)   O/E
(DP)           vaccinated   pericarditis events (O)    (based on 2019)
DP1            6,245,406    154                        N/A
DP2            2,169,398    64                         24.96                 2.56
DP3            3,573,097    94                         40.08                 2.35
DP4            3,160,468    99                         44.61                 2.22

I don’t think you will demand a chi-squared test to get convinced that the two mRNA vaccines have an adverse effect on heart health. Age-wise split of the data gives further insights into the story.
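If you do want a formal check, here is a minimal sketch in R. Instead of a chi-squared test, it uses an exact Poisson test of each observed count against its expected count; that choice, and the variable names, are my assumptions. DP1 is left out because it has no expected value.

```r
# Observed and expected events for the Data Partners that report both
observed <- c(DP2 = 64, DP3 = 94, DP4 = 99)
expected <- c(DP2 = 24.96, DP3 = 40.08, DP4 = 44.61)

# Observed-to-expected ratio, as in the last column of the table
oe_ratio <- observed / expected

# One-sided exact Poisson test: is the observed count larger than expected?
p_values <- mapply(
  function(o, e) poisson.test(o, T = e, r = 1, alternative = "greater")$p.value,
  observed, expected
)

print(round(oe_ratio, 2))
print(p_values)
```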

Age group   Observed   Total        Incidence rate   Expected rate
            events     vaccinated   (per 100,000)    (per 100,000)
18-25       153        1,972,410    7.76             0.99
26-35       62         2,587,814    2.40             0.95
36-45       63         3,226,022    1.95             1.11
46-55       62         3,597,292    1.72             1.3
56-64       71         3,764,831    1.89             1.63

The relative risk is much higher for younger – 18 to 35 – age groups. But the absolute risk of the event is still in the single digits per hundred thousand. And this is where we should look at the risk-benefit-cost trade-off of decision-making.
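The distinction between relative and absolute risk is easy to see once both are computed side by side. A short sketch from the age-wise table, with variable names of my own choosing:

```r
# Age-wise data from the table above
age_group <- c("18-25", "26-35", "36-45", "46-55", "56-64")
events <- c(153, 62, 63, 62, 71)
vaccinated <- c(1972410, 2587814, 3226022, 3597292, 3764831)
expected_rate <- c(0.99, 0.95, 1.11, 1.3, 1.63)  # per 100,000

# Observed incidence per 100,000 vaccinated
rate <- events / vaccinated * 1e5
names(rate) <- age_group

# Relative measure: observed rate / expected rate
rate_ratio <- rate / expected_rate

# Absolute measure: extra events per 100,000 vaccinated
excess_per_100k <- rate - expected_rate

print(round(cbind(rate, rate_ratio, excess_per_100k), 2))
```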

The risk

First and foremost, don’t assume all those 411 individuals died from myocarditis or pericarditis; more than 99% recover. To know that, you need to read another study, published in December 2021, that reported the total number of deaths as just 8! So, there is a risk, but the absolute value is low. The awareness of the risk should alert the recipients that any discomfort after the vaccination warrants a medical checkup.

The benefit

It would be a crime to forget the unimaginable calamity that the disease has brought to the US, with more than a million people dying from it. A significant portion of those deaths happened before the introduction of the vaccines, and even after, the casualties fell disproportionately on the unvaccinated.

The cost

At least in this case, the cost is a non-factor. The vaccine price, be it one dollar or 10 dollars, is way lower than the cost of the alternatives: buying medicines, hospitalisation or death.

Managing trade-off

Different countries manage this trade-off differently. Since the risk of complications due to COVID-19 is much lower for children and the youth, some allocate a lower priority to the younger age groups or assign a different vaccine. However, it is recognised that avoiding their vaccination altogether, due to their low-risk status, is also not an answer to the problem. It can elevate the prevalence of illness in the system and jeopardise the elders with extra exposure to the virus.

References

Risk of myocarditis and pericarditis after the COVID-19 mRNA vaccination in the USA: The Lancet

Myocarditis after COVID-19 mRNA vaccination: Nature Reviews Cardiology

How to Compare COVID Deaths for Vaccinated and Unvaccinated People: Scientific American


The Responsibility Bias

It is a commonly observed phenomenon in which people claim more credit for their contributions to collective activities than they deserve. Examples are partners each claiming more than 50% of the credit in a marriage, award-winning personalities not giving enough credit to their collaborators, etc.

The person I see every day

Responsibility bias does not necessarily emerge out of the evilness of an individual. However, it is exacerbated by their ego – too much focus on themselves. Understandably, the quantity of information that a person has on herself is more than what she has on other people. And if she fails to recognise that fundamental disparity, she is likely to make the mistake of overlooking the contributions of others.

Perspective thinking

Noticing and acknowledging the contribution of others requires deliberate effort. One of the techniques is to deliberately consider the members in the group as individuals, not just the ‘rest of the group’.

Caruso and Bazerman at Harvard observed this phenomenon in their investigations on perspective-taking among academic collaborators. They selected articles with three to six authors from five journals and sent questionnaires to the authors asking about their experience with the author group.

The questionnaire came in two versions: 1) self-focused, in which the recipients were asked to state their own contribution (in percentages), and 2) other-focused, in which the recipients were first asked to write down the names of their co-authors and then to state everyone’s contributions, including their own. In addition, the participants were asked two questions: 1) how much they enjoyed the work and 2) if they were willing to collaborate on a future publication.

As predicted by the investigators, on average, the self-focused group allocated a higher responsibility to themselves than the other-focused group did.

References

The costs and benefits of undoing egocentric responsibility assessments in groups: Caruso and Bazerman
Give and Take: Adam Grant


The Ultimatum Game – The Kahneman Experiment

In yet another Kahneman experiment, the team tried to play the ultimatum game with a group of psychology and business administration students. If you forgot what the game was, here is the description.

The game

Experiment 1

In their experiment, player A was paired with player B at random. There were several pairs. Each pair got $10 to divide between the two, as proposed by one of them. If player A proposed a division and it was acceptable to player B, the payoffs were made accordingly. If the proposed division was unacceptable to player B, neither got anything.

Surprisingly, and contrary to the standard game-theory prediction, the researchers found that the majority (75%) of the participants offered an equal split. Some of the proposals were also rejected.

Experiment 2

The experiment had two parts. The first part was the ultimatum game with a few differences: the subjects had only two ways to divide $20, 18:2 or 10:10, and the receiver had no option to reject. In the second part, each participant was matched with two others. She then got a chance either to split $12 evenly with a person who had given away only $2 in the first part (an unfair one), if one of the matched two happened to be such a person, or to split $10 evenly with a person who had split evenly in the first part (a fair one).

76% of the people split evenly in the first part of the experiment. In the second part, there was a clear preference (74%) to punish the unfair allocators, even though that meant a $1 cost to the participant making the choice.


The Ultimatum Game

Adam Grant, in his best-selling book Give and Take, describes the behavioural characteristics of three types of humans based on their attitudes towards other people – takers, matchers and givers. According to the author, takers give away (money, service or information) when the benefits to themselves are far more than the personal costs that come with the transfer. Givers, on the other extreme, relish the value to others more than the personal cost to themselves. Naturally, the matchers are in between – strictly reciprocating.

Grant refers to a paper published by Kahneman et al. in 1986 based on the ultimatum game, a well-known concept in game theory. Today, we will look at the game. We’ll discuss the study results another day.

The game

We will illustrate the concept through a 100-dollar game. Player 1 (donor) gets 100 dollars, and she can offer any amount X, from 0 to 100, to player 2 (receiver). If player 2 accepts, she gets X, and player 1 takes the rest (100 – X). If player 2 rejects the offer, no one gets anything.

Rationality vs sense of fairness

If the receiver were rational, her actions would be governed by her self-interest, as expected by economic theories, and she would take whatever was offered. After all, something is better than nothing. But this doesn’t always happen. There is a limit to the offer below which the receiver may feel the donor is being unjust and reject it.
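The rules of the game fit in a few lines of R. In this sketch, the function ultimatum_payoff and its min_acceptable argument (the smallest offer the receiver is willing to accept) are hypothetical names for illustration; a purely self-interested receiver corresponds to a threshold of zero.

```r
# Payoffs of one round of the ultimatum game
# offer: amount X proposed by the donor
# min_acceptable: hypothetical fairness threshold of the receiver
ultimatum_payoff <- function(offer, min_acceptable, pot = 100) {
  stopifnot(offer >= 0, offer <= pot)
  if (offer >= min_acceptable) {
    c(donor = pot - offer, receiver = offer)  # offer accepted
  } else {
    c(donor = 0, receiver = 0)                # offer rejected
  }
}

# A purely self-interested receiver accepts even a tiny offer
rational <- ultimatum_payoff(offer = 1, min_acceptable = 0)

# A receiver with a sense of fairness rejects a low offer
fairness <- ultimatum_payoff(offer = 10, min_acceptable = 30)

print(rational)
print(fairness)
```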

Further Reading

Give and Take: Adam Grant
Ultimatum Games: William Spaniel


Newcomb’s Paradox

The paradox was created by William Newcomb and was first published by Robert Nozick in 1969.

Imagine there is a being that has the superpower to predict your choices with high accuracy, and you know that. There are two boxes, A and B. You know that box A contains 1000 dollars and box B carries either one million dollars or nothing. You have two choices: 1) take what is inside both boxes or 2) take only what is in box B. Further, it is common knowledge that:
1) If the being predicts that you will take both boxes, it will not add anything to box B
2) If the being predicts that you will take only box B, it will add a million dollars to it.

I guess you remember the definition of common knowledge: you know that he knows that you know stuff!

What will you choose?

There are two possible arguments leading to two different decisions.
1) You know the being will read your mind and put nothing in B if you choose both boxes and add a million if only B is chosen. So select option 2 (take only box B).
2) The being has already made its decision (after reading your mind), so what is in the boxes can no longer change, and the best you can do is to select option 1 (take both boxes).

In polls conducted to understand people’s preferences, the responses often split close to 50:50; there are takers for both options. But why is that?

Dominance principle

Let’s first write down the payoff matrix.

                 The Being predicts   The Being predicts
                 you take B           you take both
You take Box B   1 million            0
You take both    1 million + 1000     1000

The dominance principle states that if one strategy is better whatever the other side does, the rational decision is to choose it. In this case, that is taking both boxes.

Here is a thought experiment to explain this perspective. Imagine the other side of the box is transparent, and your friend is standing on that side. She can see the amount inside. Although she can’t tell you anything, what would she be hoping for? Well, if she sees that the being had put a million in box B, you would be better off taking that box and the one that carries 1000. If she finds the being did not add anything, she would still like you to take both boxes to win the guaranteed 1000.
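The dominance argument amounts to a row-by-row comparison of the payoff matrix, which can be sketched in R as follows (the object names are my own):

```r
# Payoff matrix: rows are your choices, columns are the being's predictions
payoff <- matrix(
  c(1e6,        0,
    1e6 + 1000, 1000),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    choice = c("take B", "take both"),
    prediction = c("predicts B", "predicts both")
  )
)

# "take both" dominates if it pays more under every prediction
dominates <- all(payoff["take both", ] > payoff["take B", ])
print(payoff)
print(dominates)
```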

Expected value theory

While the expected utility theory is better suited to describe situations like these, I have gone for the expected value theory as I find it easier to explain. We estimate the expected value of each action by multiplying the value of each outcome by its probability and adding the products. Imagine you trust the being to be 90% accurate. The following two calculations give you the value of each decision, and you choose the one with the highest value.

You take B      0.9 x 1,000,000 + 0.1 x 0 = 900,000
You take both   0.9 x 1000 + 0.1 x 1,001,000 = 101,000

Therefore, you select only box B.
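The 90% figure is only an example, so it is worth seeing how the answer depends on the accuracy you assign to the being. A minimal sketch, with the accuracy p as the only input and function names of my own:

```r
# Expected values as a function of the being's accuracy p
ev_take_b <- function(p) p * 1e6 + (1 - p) * 0
ev_take_both <- function(p) p * 1000 + (1 - p) * (1e6 + 1000)

# The two values at 90% accuracy, as in the calculation above
print(c(take_b = ev_take_b(0.9), take_both = ev_take_both(0.9)))

# Accuracy at which the two choices have the same expected value
breakeven <- uniroot(function(p) ev_take_b(p) - ev_take_both(p),
                     interval = c(0, 1))$root
print(breakeven)
```

Under expected value, taking only box B wins as soon as the being is a little better than a coin toss (just above 50%).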

Newcomb’s Problem and Two Principles of Choice: Robert Nozick
Newcomb’s Paradox – What Would You Choose?: Smart by Design


Ambiguity Aversion

Ellsberg and Allais paradoxes have one thing in common – both reflect our ambiguity aversion. Given the opportunity to choose between a ‘sure thing’ and an uncertain one, people tend to pick the former. It is the behavioural characteristic that governs your decision-making when the probability of one outcome is known and that of the other is unknown: a feeling that tells you an uncertain outcome is a negative outcome.

In the case of the Ellsberg paradox, people are happy to bet on the red ball when they know the risk (33% chance) against the ambiguity surrounding the black and yellow. The same people had no issue dumping the mathematically identical option (red) when they knew there was a 60% chance of getting 100 if they went for one of the others.

In the case of the Allais paradox, it was the fear imposed by a 1% chance of getting nothing. To get a feel for that fear, take the case of a vaccine that can give a 10% chance of 5-year protection, an 89% chance of 1-year protection and a 1% chance of no protection, or worse, a 1 in a million probability of death! If that was placed side by side with another one that guarantees 1-year protection to all, without any known side effects, guess what I would go for.


Allais Paradox

You have two choices: A) a lottery that guarantees $ 1M vs B) one where you have a 10% chance of winning $ 5M, an 89% chance of $ 1M and a 1% chance of nothing. Which one will you choose? Written in a different format:

A   $ 1M (1)
B   $ 5M (0.1); $ 1M (0.89); $ 0 (0.01)

Having chosen one of the above two, you have another pair to choose from: C) a lottery with an 11% chance of $ 1M and an 89% chance of nothing vs D) a 10% chance of winning $ 5M and a 90% chance of nothing.

C   $ 1M (0.11); $ 0M (0.89)
D   $ 5M (0.1); $ 0 (0.9)

Allais (1953) argued that most people preferred A and D. What is wrong with that?

Expected Value

If the person had followed the expected value theory, she would have chosen B and D:

A) $ 1M x 1 = $ 1M
B) $ 5M x 0.1 + $ 1M x 0.89 + $ 0 x 0.01 = $ 1.39 M
C) $ 1M x 0.11 + $ 0M x 0.89 = $ 0.11 M
D) $ 5M x 0.1 + $ 0 x 0.9 = $ 0.5 M
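The four expected values can be reproduced in R with a small sketch. Prizes are in millions of dollars, and the list layout is just one convenient way to hold the lotteries.

```r
# The four lotteries: prizes in $ millions with their probabilities
lotteries <- list(
  A = list(prize = c(1),       prob = c(1)),
  B = list(prize = c(5, 1, 0), prob = c(0.1, 0.89, 0.01)),
  C = list(prize = c(1, 0),    prob = c(0.11, 0.89)),
  D = list(prize = c(5, 0),    prob = c(0.1, 0.9))
)

# Expected value: sum of prize x probability
expected_value <- sapply(lotteries, function(l) sum(l$prize * l$prob))
print(expected_value)
```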

Expected Utility

Since the person chose A over B, clearly, it was not the expected value but an expected utility that governed her. Mathematically,

U($ 1 M) > U($ 5 M) x 0.1 + U($ 1 M) x 0.89 + U($ 0M) x 0.01

Now, collect the U($ 1 M) on one side, add U($ 0M) x 0.89 on both sides, and simplify.

U($ 1 M) – U($ 1 M) x 0.89 > U($ 5 M) x 0.1 + U($ 0M) x 0.01
U($ 1 M) x 0.11 > U($ 5 M) x 0.1 + U($ 0M) x 0.01
U($ 1 M) x 0.11 + U($ 0M) x 0.89 > U($ 5 M) x 0.1 + U($ 0M) x 0.01 + U($ 0M) x 0.89
U($ 1 M) x 0.11 + U($ 0M) x 0.89 > U($ 5 M) x 0.1 + U($ 0M) x 0.9

Pay attention to the last inequality. What are you seeing here? The term on the left side is the expected utility corresponding to option C, and the one on the right side is that of option D. In other words, if A is preferred to B, then C must be preferred to D. But that is violated by the common choice of A and D.
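The algebra above holds for any utility function, which can be checked numerically. In this sketch, the three utility functions (linear, square root and logarithmic) are arbitrary examples of my own, not something from Allais; for each one, the expected-utility gap between A and B equals the gap between C and D.

```r
# The four lotteries: prizes in $ millions with their probabilities
lotteries <- list(
  A = list(prize = c(1),       prob = c(1)),
  B = list(prize = c(5, 1, 0), prob = c(0.1, 0.89, 0.01)),
  C = list(prize = c(1, 0),    prob = c(0.11, 0.89)),
  D = list(prize = c(5, 0),    prob = c(0.1, 0.9))
)

# Expected utility of a lottery under utility function u
expected_utility <- function(l, u) sum(u(l$prize) * l$prob)

# Example utility functions (assumptions for illustration)
utilities <- list(linear = identity, root = sqrt, logarithmic = log1p)

# For each utility function, compare EU(A) - EU(B) with EU(C) - EU(D)
gaps <- sapply(utilities, function(u) {
  eu <- sapply(lotteries, expected_utility, u = u)
  c(A_minus_B = eu[["A"]] - eu[["B"]], C_minus_D = eu[["C"]] - eu[["D"]])
})
print(gaps)
```

Because the two rows are always identical, no utility function can rank A above B and, at the same time, D above C.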
