July 2022

The Unexpected Hanging Paradox

A judge tells a prisoner that he will be hanged at noon on one weekday of the following week, but that the day will come as a surprise. The prisoner goes back to his cell and reasons as follows.

If the executioner does not appear by Thursday noon, he cannot be hanged at all, because the last day, Friday, would no longer be a surprise. Having eliminated Friday, he extends the same logic to Thursday, which has now become the final possible day. Continuing backwards, he concludes that he will not be executed, as none of the days could come as a surprise.

The prisoner is happy and confident that he will not be hanged, only to be led, to his complete surprise, to the gallows on Wednesday at noon. The judge stands correct; it was a surprise to the convict. So what was wrong with his logic?


First Instinct Fallacy

Our fundamental instinct to resist change shows up clearly in the first-instinct fallacy of answering multiple-choice questions. Studies have time and again favoured rechecking and updating the initial ‘gut feel’ as a test-taking strategy. One such example is the study conducted by Kruger et al. in 2005 [1].

Following the eraser

The study followed the eraser marks on 1561 exam papers for a psychology course at UIUC. The researchers found 3291 answer changes and categorised them into three groups: wrong to right, right to wrong and wrong to wrong. Here is the breakdown:

Answer change     Number    %
Wrong to right    1690      51
Right to wrong    838       25
Wrong to wrong    763       23

An important statistic is that about 79% of the students changed their answers. It is significant because, when asked separately, 75% of the students believed that the original choices were more likely to be correct in situations of uncertainty.

Switching to the wrong hurts

The fear or shame of having switched from a right answer to a wrong one overwhelms the misery of failing by sticking to an incorrect one, even though the data show the advantage that second thoughts bring. In a subsequent study, the team asked 23 University of Illinois students which of two outcomes would hurt more: 1) you switched from a correct answer to a wrong one, or 2) you considered the eventually correct answer but did not move away from your initial instinct. The majority of respondents said that people in the first situation would regret it more than those in the second.

[1] Counterfactual thinking and the first instinct fallacy: Justin Kruger, Derrick Wirtz, Dale T Miller
[2] Our first instinct is far too often wrong: FT


The q-q Plot: The Method

Making a q-q plot is easy in R: get the data, type a single line of command, and you’re ready. I will show you the process and the output using the famous mtcars dataset, which is built into R. For example, if you want to know whether qsec, the 1/4 mile time of the 32 automobiles, follows a normal distribution, you type:

qqnorm(mtcars$qsec)
qqline(mtcars$qsec)

These functions are from the stats library. The first line makes the (scatter) plot of your sample quantiles, and the second one (qqline()) draws the line representing the theoretical distribution, which makes it easier to evaluate whether the points deviate from the reference.

You can see that the qsec data is more or less normally distributed, something you may also see from the histogram of the data below.
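
If you want to draw that histogram yourself, a one-line sketch (base R, same dataset):

hist(mtcars$qsec, main = "", xlab = "1/4 mile time (s)")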

Now take a variable that is right-skewed, such as hp, and look at its q-q plot.
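
For hp, the same two functions apply; a minimal sketch:

qqnorm(mtcars$hp)
qqline(mtcars$hp)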


The q-q Plot: The Codes

It took quite some time and some online help to get some of the R codes that generated the analysis in the last post. So let me share them with you.

The Data

The data was generated using the rnorm function, a random number generator for the normal distribution. The following lines of code generate 10,000 random data points from a normal distribution with a mean of 67 and a standard deviation of 2.5, put them in a data frame, and plot the histogram.

theo_dat <- rnorm(10000, mean = 67, sd = 2.5)
height_data <- data.frame(Height = theo_dat)
par(bg = "antiquewhite")

hist(height_data$Height, main = "", xlab = "Height (inch)", ylab = "Frequency", col = c(rep("grey", 8), "brown", "blue", "red", "green"), freq = TRUE, ylim = c(0, 2000))
abline(h = 2000, lty = 2)
abline(h = 1500, lty = 2)
abline(h = 1000, lty = 2)
abline(h = 500, lty = 2)

Horizontal lines are drawn to mark a few reference frequencies, and the option freq is turned on (TRUE), which is the default. The density plot, on the other hand, may be obtained by either of two options: freq = FALSE or prob = TRUE.

par(bg = "antiquewhite")
hist(height_data$Height, main = "", xlab = "Height (inch)", ylab = "Density", col = c(rep("grey", 8), "brown", "blue", "red", "green"), prob = TRUE, ylim = c(0, 0.2))
abline(h = 0.2, lty = 2)
abline(h = 0.15, lty = 2)
abline(h = 0.1, lty = 2)
abline(h = 0.05, lty = 2)

The quantile function (stats library) gives quantiles at specified intervals, in our case every 5%.

quantile(height_data$Height, probs = seq(0,1,0.05))
      0%       5%      10%      15%      20%      25%      30%
   57.88    62.87    63.80    64.37    64.87    65.31    65.69
     35%      40%      45%      50%      55%      60%      65%
66.05697    66.37    66.69    66.98    67.29    67.62    67.97
     70%      75%      80%      85%      90%      95%     100%
   68.31    69.10    69.56    68.70    70.15    71.06    77.55

A few things are attempted below:
1) the spacing (breaks) of the histogram bins is set as per the quantiles
2) rainbow colours are given to the bins
3) two X-axes are drawn, one 2.5 lines below the other

vec <- rainbow(9)[1:10]
vic <- rev(vec)
mic <- c(vec,vic)
lab <- c("0%", "5%", "10%", "15%", "20%", "25%", "30%", "35%", "40%", "45%", "50%", "55%", "60%", "65%", "70%", "75%", "80%", "85%", "90%", "95%", "100%" )

par(bg = "antiquewhite")
hist(height_data$Height, breaks = quantile(height_data$Height, probs = seq(0,1,0.05)), col = mic, xaxt = "n", main = "", xlab = "", ylab = "Density")
axis(1, at = quantile(height_data$Height, probs = seq(0,1,0.05)) , labels=lab)
axis(1, at = quantile(height_data$Height, probs = seq(0,1,0.05)), line = 2.5)

Note that the 10th bin (from the left) came out with no colour: we selected only 9 colours from the rainbow but took 10 array elements for the colour vector, so the extra element is NA.
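
If you want every bin coloured, one possible fix is to take all the colours from the rainbow directly, for example:

vec <- rainbow(10)        # ten distinct colours, no NA padding
mic <- c(vec, rev(vec))   # 20 colours for the 20 quantile bins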


The q-q Plot: Episode 1

The Quantile-Quantile (q-q) plot is a technique for checking how well your data matches a given distribution. Quantiles are cut points that divide a probability distribution into pieces of equal probability. We have seen percentiles before, Px, the value below which x per cent of the data lies, and quartiles, which divide the distribution into four equal parts of 25% each (first, second, third and fourth quartiles).

Distribution

Imagine you collected 10,000 samples of a parameter, say the heights of 10,000 adult males, and made a histogram.

You can see that the X-axis describes the height (in inches). Each bin (bar) is one inch wide. This is how you read the plot: say, the red bin starts at 67, ends at 68, and has a height of 1500 on the frequency axis. That means about 1500 individuals (out of the 10,000) are 67 to 68 inches tall. Similarly, from the brown bucket, there are about 1300 males between 65 and 66 inches.

The same graph may be represented with density on the Y-axis instead of frequency.

Using density, we rescale the frequency so that the total area under the curve becomes one, and each bin then gives the probability of occurrence. For example,

1 x 0.16 + 1 x 0.15 + 1 x 0.125 + 1 x 0.13 + 1 x 0.1 + 1 x 0.1 + 1 x 0.06 + 1 x 0.06 + 1 x 0.025 + 1 x 0.025 + ... = 1

Plot against the quantiles

So far, we have used equal-width intervals of the parameter (height) on the X-axis. Now we will change that and take percentage intervals (a type of quantile) instead. Using 5% intervals, the height values corresponding to each 5% of the data are tabulated below:

      0%       5%      10%      15%      20%      25%      30%
   57.88    62.87    63.80    64.37    64.87    65.31    65.69
     35%      40%      45%      50%      55%      60%      65%
66.05697    66.37    66.69    66.98    67.29    67.62    67.97
     70%      75%      80%      85%      90%      95%     100%
   68.31    69.10    69.56    68.70    70.15    71.06    77.55

Key observations: 1) the gap from 0% to 5% may be just 5% of the probability, but on the height scale it occupies a length of almost 5 inches (62.87 – 57.88); 2) there is an equal probability of observing a value in each group. If you plot the density against these quantiles, we get this:

To understand the second point, equal probability groups, let me add another scale to the X-axis:

Now find the area of any block, e.g. the leftmost red one = (62.87 – 57.88) x 0.0093 = 0.046. The next, brown, one = (63.80 – 62.87) x 0.053 = 0.049. Finally, one of the white boxes = (66.98 – 66.69) x 0.17 = 0.049. Each block has an area of about 0.05, i.e. a 5% probability; that is what equal-probability groups means.

Compare actual with theory

In a q-q plot, you collect the values of various quantiles of your data and plot them against the theoretical quantiles of a specified (normal, chi-squared, etc.) distribution. Since it is theory versus actual, if the two match perfectly, you should get a straight diagonal line.
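
Here is a minimal manual sketch of that idea, assuming the height_data frame and the normal parameters (mean 67, sd 2.5) from the earlier post:

p <- seq(0.05, 0.95, 0.05)                            # quantile levels
sample_q <- quantile(height_data$Height, probs = p)   # quantiles of the data
theory_q <- qnorm(p, mean = 67, sd = 2.5)             # theoretical normal quantiles
plot(theory_q, sample_q, xlab = "Theoretical quantiles", ylab = "Sample quantiles")
abline(0, 1)                                          # the diagonal of perfect agreement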

Before we close

We will do the exercise another time. I will also show you the various R codes used in this post.


The Ultimatum Game – The Game Theory Version

We have seen what behavioural scientists observed when carrying out the ultimatum game on their subjects. The ultimatum game also has an economic side, theorised by game theorists for the rational decision-maker. A representation of the game is below.

Unlike the simultaneous games we have seen before, where we used payoff matrices, this is a sequential game, i.e. the second person moves only after the first one has made her move. The first type is a normal-form game and is static. The one shown in the tree above is an example of an extensive-form game.

The game

Player A has ten dollars that she splits between herself and player B. In the game design, A makes the proposal, and B can accept or reject it. If B accepts the offer, both players get the money as per the division proposed by A. If B refuses, no one gets anything.

Backward induction

Although player A starts the game by splitting the 10 dollars between herself and player B, her decision is influenced by what she assumes about B’s decision (accept/reject). In other words, A needs to begin from the end and work backwards. Suppose player A makes an unfair 9-1 split in her own favour. B can accept the 1 dollar or get nothing by rejecting. Since one is better than zero, B will probably take the offer. If A makes a fair 5-5 split, B will also accept the 5. That means B will take the offer no matter what A proposes, so player A may as well choose the unfair path. This is a Nash equilibrium.
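
A toy sketch in R of this backward-induction logic (assuming whole-dollar splits and a player B who accepts anything better than nothing):

offers_to_B <- 0:10                    # possible amounts offered to B
b_accepts   <- offers_to_B > 0         # a rational B takes any positive amount (1 > 0)
a_payoff    <- ifelse(b_accepts, 10 - offers_to_B, 0)
offers_to_B[which.max(a_payoff)]       # A's best offer: 1 dollar, keeping 9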

What happens if player B threatens to reject an unfair offer? The threat may not be explicit; it could just be a feeling in A’s mind. Either way, if player A believes it, she makes a fair division. And this is what Kahneman learned from his experiments. In game-theory language, the threat from B is known as a non-credible (incredible) threat, as it makes no economic sense to refuse even the unfair offer (because 1 > 0)!

References

Games in the Normal Form: Nolan McCarty and Adam Meirowitz

Extensive Form Games: Nolan McCarty and Adam Meirowitz


Tukey’s Method Continued

Here are the sampling results of a product from four suppliers, A, B, C and D (Data courtesy: https://statisticsbyjim.com/).

A      B      C      D
40     37.9   36     38
36.9   26.2   39.4   40.8
33.4   24.9   36.3   45.9
42.3   30.3   29.5   40.4
39.1   32.6   34.9   39.9
34.7   37.5   39.8   41.4

Hypotheses

H0 – All means are equal
HA – Not all means are equal

Input the data

library(tibble)
PO_data <- read.csv("./Anova_Tukey.csv")
as_tibble(PO_data)

Leads to the output (first ten entries)

Material Strength
<chr>     <dbl>
B	  37.9			
C	  36.0			
D	  38.0			
A	  40.0			
A	  36.9			
C	  39.4			
A	  33.4			
B	  26.2			
B	  24.9			
B	  30.3

Plot the data

par(bg = "antiquewhite")
colors = c("red","blue","green", "yellow")
boxplot(PO_data$Strength ~ factor(PO_data$Material), xlab = "Supplier", ylab = "Material Data", col = colors)

F-test for ANOVA

str.aov <- aov(Strength ~ factor(Material), data = PO_data)
summary(str.aov)

Output:

                 Df Sum Sq Mean Sq F value Pr(>F)   
factor(Material)  3  281.7    93.9   6.018 0.0043 **
Residuals        20  312.1    15.6                  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Reject null hypothesis

We reject the null hypothesis because the p-value, 0.0043, is below 0.05 (the chosen significance level). The F-value is 6.018. Another way of reaching the same conclusion is to find the critical F-value for the degrees of freedom, df1 = 3 and df2 = 20.

qf(0.05, 3,20, lower.tail=FALSE)
pf(6.018, 3, 20, lower.tail = FALSE)

Lead to

3.098391      # F-critical
0.004296141   # p-value for F = 6.018

Tukey’s test for multiple comparisons of means

TukeyHSD(str.aov)
Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = Strength ~ factor(Material), data = PO_data)

$`factor(Material)`
         diff        lwr        upr     p adj
B-A -6.166667 -12.549922  0.2165887 0.0606073
C-A -1.750000  -8.133255  4.6332553 0.8681473
D-A  3.333333  -3.049922  9.7165887 0.4778932
C-B  4.416667  -1.966589 10.7999220 0.2449843
D-B  9.500000   3.116745 15.8832553 0.0024804
D-C  5.083333  -1.299922 11.4665887 0.1495298

Interpreting pair-wise differences

You can see that the D-B difference, 9.5, is statistically significant, with an adjusted p-value of 0.0025. And, as expected, the 95% confidence interval for D-B does not include 0 (i.e. no difference between D and B).
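
If you prefer to see these intervals graphically, the TukeyHSD object comes with a plot method:

plot(TukeyHSD(str.aov))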

By the way, this message, that the blue box (supplier B) and the yellow box (supplier D) differ, was already apparent if you had paid attention to the box plot we made at the beginning.

Reference

Hypothesis Testing: An Intuitive Guide: Jim Frost


Tukey’s Method: Who Made the Difference?

In the previous ANOVA exercises, we found that the data suggested rejecting the null hypothesis. To remind you of the two hypotheses:
H0 – All means are equal
HA – Not all means are equal
So, we at least rejected the proposition that all means are equal, because the p-value was lower than the chosen significance level of 0.05 (or, equivalently, the F-value was beyond the critical F-value corresponding to the 0.05 level). But we have no idea which pairs of means differ significantly.

The Tukey method creates confidence intervals for all pair-wise differences while controlling the family error rate at whatever level we specify.

Family error rate

You know what an error rate is: it is the probability of rejecting the null hypothesis (because the p-value came out below the significance level) when the null hypothesis is actually correct. At a significance level of 0.05, there is a 5% chance of getting your outcome when the null hypothesis is correct. This situation is called a false positive.

The p-value we obtained for the material testing problem was 0.03, but it was for the entire family of four vendor groups (each with ten samples). This is the experiment-wise, or family-wise, error rate. Since our significance level for the F-test was kept at 0.05, we can regard the family error rate as 5%.

Four groups, six comparisons

Since we had four groups (factors) of samples, each representing one vendor, we have six possible comparisons. They are:

#   Comparison
1   Vendor 2 - Vendor 1
2   Vendor 3 - Vendor 1
3   Vendor 4 - Vendor 1
4   Vendor 3 - Vendor 2
5   Vendor 4 - Vendor 2
6   Vendor 4 - Vendor 3

The family-wise error rate is the grand union of all the pair-wise error rates. If the pair-wise error is alpha, the family-wise error = 1 – (1 – alpha)^C, where C is the number of comparisons. If you substitute alpha = 0.05 and C = 6, you get a family-wise error of about 0.26. Obviously, 26% is too high an error rate.

The Tukey method preserves the family-wise error rate at whatever we specify, say 0.05, and therefore the pair-wise error rate comes down to about 0.0085.
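
The arithmetic in R, as a quick check:

1 - (1 - 0.05)^6        # family-wise error for six comparisons: ~0.265
1 - (1 - 0.05)^(1/6)    # pair-wise alpha that keeps the family error at 5%: ~0.0085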

Keeping all these points in mind, let’s perform Tukey’s method on our dataset using R.

TukeyHSD(res.aov)

Which leads to the following output:

Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = Strength ~ factor(Sample), data = AN_data)

$`factor(Sample)`
                         diff        lwr       upr     p adj
Vendor 2-Vendor 1 -2.26479763 -4.7917948 0.2621995 0.0924842
Vendor 3-Vendor 1 -0.51997359 -3.0469707 2.0070236 0.9448076
Vendor 4-Vendor 1 -2.36456760 -4.8915647 0.1624295 0.0736423
Vendor 3-Vendor 2  1.74482404 -0.7821731 4.2718212 0.2632257
Vendor 4-Vendor 2 -0.09976996 -2.6267671 2.4272272 0.9995613
Vendor 4-Vendor 3 -1.84459400 -4.3715911 0.6824031 0.2197059

In the next post, we will do a complete exercise of ANOVA including the Post Hoc test.

Reference

Hypothesis Testing: An Intuitive Guide: Jim Frost


F – Statistics

We have seen how the F-statistic works to test the hypotheses in one-way ANOVA. We also know the definition of F as a ratio between two variances. Variance measures how spread out the data is around the mean, estimated as the sum of squared deviations divided by the degrees of freedom. If you forgot: take the square root of the variance and you get the standard deviation.

F = Between groups variance / Within-group variance

F-tests use F-distribution

Recall how we used the t-distribution or the binomial distribution to determine probabilities under the assumption that the null hypothesis was true. The F-distribution, too, has a characteristic shape and is based on two parameters – the degrees of freedom df1 and df2, used in the numerator and the denominator, respectively.

In the case of the material strength problem we have been working on in the past two posts (four groups with ten samples each, leading to df1 = 4 – 1 = 3 and df2 = 4 x (10 – 1) = 36), the F-distribution appears in the following form.
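
If you want to draw this density yourself, a minimal sketch using the df() function from the stats library:

x <- seq(0, 6, 0.01)
plot(x, df(x, df1 = 3, df2 = 36), type = "l", xlab = "F", ylab = "Density")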

One way to understand the above plot is to imagine repeating the sampling several times (keeping four vendors and taking ten samples each, so that df1 and df2 remain the same) while the null hypothesis is true. You calculate the F-value each time. Finally, if you plot the frequency of those F-values, you get a plot similar to the one above.
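
Here is a rough simulation sketch of that idea: four groups of ten drawn from the same normal distribution (so the null hypothesis is true and df1 = 3, df2 = 36), with the F-value collected each time.

set.seed(1)
f_vals <- replicate(5000, {
  dat <- data.frame(y = rnorm(40), g = factor(rep(1:4, each = 10)))
  anova(lm(y ~ g, data = dat))$"F value"[1]    # the F-value for this sample
})
hist(f_vals, breaks = 50, freq = FALSE, main = "", xlab = "F")
curve(df(x, 3, 36), add = TRUE)                # the theoretical F(3, 36) density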


One-way ANOVA – by Hand

Let’s do the ANOVA step by step. We use the F-statistic to accept or reject the null hypothesis by comparing it with the critical F-value. Once you get the F-value, you can also compute the corresponding p-value and compare it with the significance level.
The definition of the F-statistic is

F = Between groups variance / Within-group variance

Between groups variance

Here, you estimate the variation of the group statistic from the global statistic. In other words, you determine the mean of each group and the global mean (the mean of all the data, or the mean of the group means). Then take each group mean’s difference from the global mean, square it, multiply by the group size, add everything up and divide by the degrees of freedom, just as you would for an ordinary variance.

Recall the previous example (the strength of materials from four vendors). So you have four groups, each containing ten samples. First, estimate the four group means and the global mean. They are:

Vendor              Vendor 1              Vendor 2              Vendor 3               Vendor 4
Mean                11.2                  8.94                  10.68                  8.84
Samples             10                    10                    10                     10
Square for factor   10 x (11.2-9.915)^2   10 x (8.94-9.915)^2   10 x (10.68-9.915)^2   10 x (8.84-9.915)^2

Global mean (= 9.915)
Sum of squares for factor (= 43.62)
Degrees of freedom (DF = 4 - 1 = 3)

The numerator (the mean square for factor) is calculated by dividing the sum of squares for factor by the degrees of freedom, i.e. 43.62/3 = 14.54.

Within-group variance

Here, you add up all the variation inside the groups and then divide by the sum of the degrees of freedom of each group.

Vendor                            Vendor 1     Vendor 2     Vendor 3     Vendor 4
Samples                           10           10           10           10
Degrees of freedom (samples - 1)  9            9            9            9
Within-group squares for error    35.81        79.93        10.94        31.78
(variance x df)                   (3.98 x 9)   (8.88 x 9)   (1.22 x 9)   (3.53 x 9)

Sum of within-group squares for error (= 158.466)
Total degrees of freedom (= 36)

The denominator (the mean square for error) is calculated by dividing the sum of the within-group squares for error by the total degrees of freedom, i.e. 158.466/36 = 4.402.

F-statistic = 14.54 / 4.402 = 3.30
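
The same arithmetic in R, just as a check:

ms_factor <- 43.62 / 3          # mean square for factor = 14.54
ms_error  <- 158.466 / 36       # mean square for error = 4.402
ms_factor / ms_error            # F = 3.30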

This 3.30 is then compared with the critical F-value corresponding to the chosen significance level, 0.05 in the present case. You can either look it up in an F-distribution table or use the R function:

qf(0.05, 3,36, lower.tail=FALSE)

The critical value is 2.87. Since the F-statistic in our case is larger than 2.87, we reject the null hypothesis. The p-value turns out to be 0.031.

pf(3.303, 3, 36, lower.tail = FALSE)
