A judge tells a condemned prisoner that he will be hanged at noon on one weekday next week, but that the day of the execution will come as a surprise. The prisoner goes back to his cell and reasons as follows.
If the executioner has not appeared by Thursday noon, he cannot be hanged on the last day, Friday, because it would no longer be a surprise. Having eliminated Friday, he applies the same logic to Thursday, which has now become the last possible day, and so on backwards through the week. Finally, he concludes that he will not be executed at all, since no day can come as a surprise.
The prisoner is happy and confident that he will not be hanged, only to be hanged, to his complete surprise, at noon on Wednesday. The judge was right: the execution did come as a surprise to the convict. So what was wrong with his logic?
Our instinctive resistance to change shows up clearly in the first-instinct fallacy: the belief that, when answering multiple-choice questions, you should stick with your first answer. However, studies have repeatedly favoured rechecking and updating the initial ‘gut feel’ as a test-taking strategy. One such example is the study by Kruger et al. (2005).
Following the eraser
The study followed the eraser marks on 1,561 exam papers from a psychology course at the University of Illinois at Urbana-Champaign (UIUC). Based on the 3,291 answer changes they found, the researchers sorted the changes into three categories: wrong to right, right to wrong, and wrong to wrong. Here is what they found:
| Answer change | Number | % |
|---|---|---|
| Wrong to right | 1690 | 51 |
| Right to wrong | 838 | 25 |
| Wrong to wrong | 763 | 23 |
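The percentages in the table follow directly from the counts. A minimal R sketch, where the vector name `changes` and its element names are our own labels rather than anything from the study:

```r
# Counts of answer changes reported by Kruger et al. (2005)
changes <- c(wrong_to_right = 1690, right_to_wrong = 838, wrong_to_wrong = 763)

total <- sum(changes)                  # all answer changes found on the papers
pct <- round(100 * changes / total)    # share of each type of change, in per cent
print(pct)

# Net effect of changing answers: answers gained minus answers lost
net_gain <- changes[["wrong_to_right"]] - changes[["right_to_wrong"]]
print(net_gain)
```

Wrong-to-right changes outnumber right-to-wrong ones by roughly two to one, a net gain of 852 correct answers across the papers.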
An important statistic is that about 79% of the students changed their answers. It is significant because, when asked separately, 75% of the students believed that the original choices were more likely to be correct in situations of uncertainty.
Switching to a wrong answer hurts
The fear or shame of switching from a right answer to a wrong one outweighs the misery of failing by sticking with an incorrect answer, even though the data showed the advantages that second thoughts bring. In a follow-up study, the team asked 23 University of Illinois students which of two outcomes would hurt more: 1) switching from a correct answer to a wrong one, or 2) sticking with the initial instinct after having considered the answer that eventually turned out to be correct. Most respondents said the first situation would cause them more regret than the second.
Making a q-q plot in R is easy: get the data and type a one-line command. I will show you the process and the output using the famous mtcars dataset, which is built into R. For example, to check whether qsec (the 1/4-mile time) of the 32 automobiles follows a normal distribution, type:
qqnorm(mtcars$qsec)
qqline(mtcars$qsec)
Both functions come from the stats package. The first line, qqnorm(), makes the scatter plot of your sample quantiles against the theoretical quantiles of a normal distribution; the second, qqline(), adds a reference line (by default through the first and third quartiles), which makes it easier to judge whether the points deviate from it.
You can see that the qsec data are more or less normally distributed, which you can also see from the histogram of the data below.
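Besides eyeballing the plot, a rough numeric check (our addition, not part of the original walkthrough) is the correlation between the coordinates of the q-q plot: the closer it is to 1, the closer the points lie to a straight line. qqnorm() returns those coordinates without drawing anything when called with plot.it = FALSE. The histogram call at the end is a plain sketch of the figure mentioned above.

```r
# Coordinates of the normal q-q plot without drawing it:
# y = the sample values, x = the normal quantile matched to each of them
qq_qsec <- qqnorm(mtcars$qsec, plot.it = FALSE)

# Correlation of the points; values near 1 suggest a straight line (approximate normality)
r_qsec <- cor(qq_qsec$x, qq_qsec$y)
print(round(r_qsec, 3))

# Histogram of the same variable
hist(mtcars$qsec, main = "qsec (1/4 mile time)", xlab = "Seconds")
```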
Now take a variable that is right-skewed, such as hp (gross horsepower), and look at its q-q plot:
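The same two calls work for hp. As a rough numeric companion to the plot (our addition), comparing the mean with the median shows the pull of hp's long right tail: the mean lies well above the median.

```r
# Normal q-q plot for horsepower; skewed data bend away from the reference line
qqnorm(mtcars$hp, main = "Normal Q-Q plot: hp")
qqline(mtcars$hp)

# A quick symptom of right skew: the long upper tail pulls the mean above the median
hp_mean <- mean(mtcars$hp)
hp_median <- median(mtcars$hp)
print(c(mean = hp_mean, median = hp_median))
```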
It took quite some time and online help to get some of the R code for the analysis in the last post working, so let me share it with you.
The Data
The data were generated with the rnorm function, which draws random numbers from a normal distribution. The following lines of code generate 10,000 random data points from a normal distribution with a mean of 67 and a standard deviation of 2.5, add them to a data frame, and plot the histogram.
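The original lines of code are not reproduced in this post, so here is a minimal sketch of what the paragraph describes. The seed and the names `heights` and `height` are our assumptions, and the exact counts will differ slightly from the plots shown here.

```r
set.seed(2024)  # hypothetical seed, only to make the sketch reproducible

# 10,000 draws from a normal distribution with mean 67 and standard deviation 2.5
heights <- data.frame(height = rnorm(10000, mean = 67, sd = 2.5))

# Frequency histogram with one-inch bins spanning the whole range of the data
bins <- seq(floor(min(heights$height)), ceiling(max(heights$height)), by = 1)
h <- hist(heights$height, breaks = bins,
          main = "Heights of 10,000 adult males (simulated)",
          xlab = "Height (inches)")
```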
Horizontal lines are drawn to mark a few reference frequencies, and the option freq is set to TRUE, which is the default for equally spaced bins. The density plot, on the other hand, may be obtained with either of two options: freq = FALSE or prob = TRUE.
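A sketch of the two variants on simulated data (seed ours). The reference levels 500, 1000 and 1500 are illustrative choices, not necessarily the ones used for the original figure; `prob = TRUE` works because R partially matches it to hist()'s `probability` argument.

```r
set.seed(2024)  # hypothetical seed
height <- rnorm(10000, mean = 67, sd = 2.5)
bins <- seq(floor(min(height)), ceiling(max(height)), by = 1)

# Frequency version (freq = TRUE, the default for equal-width bins)
h <- hist(height, breaks = bins, freq = TRUE,
          xlab = "Height (inches)", main = "Frequency")
abline(h = c(500, 1000, 1500), lty = 2, col = "grey40")  # illustrative reference levels

# Density version: freq = FALSE (prob = TRUE gives the same plot)
hist(height, breaks = bins, freq = FALSE,
     xlab = "Height (inches)", main = "Density")
```

The density heights are simply the counts divided by (number of points × bin width).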
A few things are attempted below:
1) the spacing (breaks) of the histogram bins is set by quantiles;
2) the bins are given rainbow colours;
3) two X-axes are drawn, one 2.5 units below the other.
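A sketch of the three tweaks, again on simulated data with a seed of our choosing. quantile() supplies the 21 edges of the 5% bins, rainbow() one colour per bin, and axis() with line = 2.5 places the second axis 2.5 margin lines below the first; the margin sizes are a guess that may need adjusting.

```r
set.seed(2024)  # hypothetical seed
height <- rnorm(10000, mean = 67, sd = 2.5)

# 1) Bin edges at every 5% quantile: 21 edges, 20 bins
q_breaks <- quantile(height, probs = seq(0, 1, by = 0.05))

# 2) One rainbow colour per bin
bin_cols <- rainbow(length(q_breaks) - 1)

op <- par(mar = c(7, 4, 3, 1))  # extra bottom margin for two axes
h <- hist(height, breaks = q_breaks, col = bin_cols, freq = FALSE,
          axes = FALSE, xlab = "", main = "Bins at 5% quantiles")
axis(2)

# 3) First axis: height at each bin edge; second axis, 2.5 lines lower: the percentages
axis(1, at = q_breaks, labels = round(q_breaks, 1), cex.axis = 0.7)
axis(1, at = q_breaks, labels = names(q_breaks), line = 2.5, cex.axis = 0.7)
par(op)
```

Because the breaks are quantiles, every bin holds about 500 of the 10,000 points, even though the bins have very different widths.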
The quantile-quantile (q-q) plot is a technique for checking how well your data agree with a given distribution. Quantiles are the cut points that divide a probability distribution into intervals of equal probability. We have seen percentiles before: Px is the value below which x per cent of the data lie. We have also seen quartiles, the three cut points that divide the distribution into four equal parts of 25% each.
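In R, quantile() computes sample quantiles (its default is type 7, one of several definitions) and qnorm() gives the theoretical quantiles of a normal distribution. A tiny example on the numbers 1 to 9, chosen only because the answers are easy to check by hand:

```r
# Quartiles of a small, easy-to-check sample (R's default quantile type 7)
x <- 1:9
sample_quartiles <- quantile(x, probs = c(0.25, 0.50, 0.75))
print(sample_quartiles)

# Theoretical quartiles of the standard normal distribution
normal_quartiles <- qnorm(c(0.25, 0.50, 0.75))
print(round(normal_quartiles, 4))

# A percentile is just a quantile at probability x/100, e.g. P90
p90 <- quantile(x, probs = 0.90)
print(p90)
```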
Distribution
Imagine you collected 10,000 data points for a parameter, say the heights of 10,000 adult males, and made a histogram.
The X-axis shows the height (in inches), and each bin (bar) is one inch wide. This is how you interpret the plot: the red bin starts at 67, ends at 68, and has a frequency of about 1500, meaning that about 1500 of the 10,000 individuals are 67 to 68 inches tall. Similarly, the brown bin shows that there are about 1300 males between 65 and 66 inches.
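Because these heights come from a normal distribution with mean 67 and standard deviation 2.5, the bin counts read off the plot can be checked against that model with pnorm(). The helper `expected_count` is ours, introduced only for this illustration:

```r
n <- 10000
mu <- 67
sigma <- 2.5

# Expected number of individuals between heights a and b under the normal model
expected_count <- function(a, b) n * (pnorm(b, mu, sigma) - pnorm(a, mu, sigma))

red_bin   <- expected_count(67, 68)   # the red bin in the text
brown_bin <- expected_count(65, 66)   # the brown bin in the text
print(round(c(red = red_bin, brown = brown_bin)))
```

The model predicts about 1554 and 1327 individuals, in line with the roughly 1500 and 1300 read off the plot.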
The same graph may be represented with density on the Y-axis instead of frequency.
With density on the Y-axis, the frequencies are rescaled so that the total area under the histogram is one, and the area of each bin gives the probability of a value falling in it. With one-inch bins, the areas add up as follows:
1 × 0.16 + 1 × 0.15 + 1 × 0.125 + 1 × 0.13 + 1 × 0.1 + 1 × 0.1 + 1 × 0.06 + 1 × 0.06 + 1 × 0.025 + 1 × 0.025 + ... = 1
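The same identity can be checked numerically: hist() stores the bin edges in `breaks` and the bar heights in `density`, so width times density summed over all bars should give one. A sketch on simulated data (seed ours):

```r
set.seed(2024)  # hypothetical seed
height <- rnorm(10000, mean = 67, sd = 2.5)
bins <- seq(floor(min(height)), ceiling(max(height)), by = 1)

# Density histogram: bar height = count / (n * bin width)
h <- hist(height, breaks = bins, freq = FALSE,
          xlab = "Height (inches)", main = "Density")

# Area of each bar = width x density = probability of that bin
bar_areas <- diff(h$breaks) * h$density
total_area <- sum(bar_areas)
print(total_area)  # should be 1, up to floating-point rounding
```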
Plot against the quantiles
So far, the X-axis has been divided into equal intervals of the parameter (height). We will now change that and use percentage intervals (a type of quantile) instead. With 5% intervals, the height values at each 5% step are tabulated below:
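| Percentile | Height (inches) |
|---|---|
| 0% | 57.88 |
| 5% | 62.87 |
| 10% | 63.80 |
| 15% | 64.37 |
| 20% | 64.87 |
| 25% | 65.31 |
| 30% | 65.69 |
| 35% | 66.06 |
| 40% | 66.37 |
| 45% | 66.69 |
| 50% | 66.98 |
| 55% | 67.29 |
| 60% | 67.62 |
| 65% | 67.97 |
| 70% | 68.31 |
| 75% | 68.70 |
| 80% | 69.10 |
| 85% | 69.56 |
| 90% | 70.15 |
| 95% | 71.06 |
| 100% | 77.55 |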
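A table like this comes from quantile() with probs = seq(0, 1, 0.05). The sketch below, on simulated data with a seed of our choosing (so the numbers will not match the table exactly), puts the sample quantiles next to the theoretical ones from qnorm():

```r
set.seed(2024)  # hypothetical seed; values will differ slightly from the table
height <- rnorm(10000, mean = 67, sd = 2.5)

probs <- seq(0, 1, by = 0.05)
empirical   <- quantile(height, probs = probs)      # sample quantiles, as in the table
theoretical <- qnorm(probs, mean = 67, sd = 2.5)   # model quantiles (-Inf and Inf at 0% and 100%)

print(round(cbind(empirical, theoretical), 2))
```

Quantiles can only increase as the percentage increases, and the theoretical 0% and 100% quantiles of a normal distribution are infinite, which is why only the sample has finite end points.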
Two observations are key: 1) the interval from 0% to 5% may cover only 5% of the probability, but on the height scale it spans almost 5 inches (62.87 − 57.88 = 4.99); 2) there is an equal probability of observing a value in each group. If you plot the density over these percentile bins, you get this:
To understand the second point, equal probability groups, let me add another scale to the X-axis:
Now find the area of any block. For example, the red block on the left: (62.87 − 57.88) × 0.0093 = 0.046. The next, brown, block: (63.80 − 62.87) × 0.053 = 0.049. One of the white blocks: (66.98 − 66.69) × 0.17 = 0.049. Each block has an area of about 0.05, so each group holds about 5% of the probability: these are the equal-probability groups.
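The block areas can be computed for all twenty bins at once. With quantile breaks, each area is simply the share of points in that bin, so every value should sit at about 0.05 (simulated data, seed ours):

```r
set.seed(2024)  # hypothetical seed
height <- rnorm(10000, mean = 67, sd = 2.5)
q_breaks <- quantile(height, probs = seq(0, 1, by = 0.05))

# Histogram object only (no plot): bins bounded by the 5% quantiles
h <- hist(height, breaks = q_breaks, plot = FALSE)

# Area of each block = bin width x density = share of the data in that bin
block_areas <- diff(h$breaks) * h$density
print(round(block_areas, 3))
```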
Compare actual with theory
In a q-q plot, you compute the values of various quantiles of your data and plot them against the corresponding theoretical quantiles of a specified distribution (normal, chi-squared, etc.). Since it is theory vs actual, if the two match perfectly, the points fall on a straight diagonal line.
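A bare-bones q-q plot can be built by hand from the two sets of quantiles. This sketch assumes the reference distribution is the normal with the known mean 67 and standard deviation 2.5, so a perfect match lies on y = x; against a standard normal, as qqnorm() uses, the line would instead have a slope equal to the standard deviation and an intercept equal to the mean.

```r
set.seed(2024)  # hypothetical seed
height <- rnorm(10000, mean = 67, sd = 2.5)

# Quantiles of the data and of the reference distribution at the same probabilities
probs <- seq(0.01, 0.99, by = 0.01)
sample_q <- quantile(height, probs = probs)
theory_q <- qnorm(probs, mean = 67, sd = 2.5)

# Theory vs actual: a perfect match puts every point on the line y = x
plot(theory_q, sample_q,
     xlab = "Theoretical quantiles (inches)", ylab = "Sample quantiles (inches)")
abline(a = 0, b = 1, col = "red")

fit_r <- cor(theory_q, sample_q)  # closeness to a straight line
```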
Before we close
We will do the exercise another time, when I will also show you the R code used in this post.
We have seen what behavioural scientists observed when they ran the ultimatum game with their subjects. The ultimatum game also has an economic side, theorised by game theorists for a rational decision-maker. A representation of the game is below.
Unlike the simultaneous games we saw before, where we used payoff matrices, this is a sequential game, i.e. the second player moves only after the first has made her move. Simultaneous games are represented in normal form, which is static. The game shown in the tree above is an example of an extensive-form game.
The game
Player A has ten dollars, which she splits between herself and player B. A makes the proposal, and B can accept or reject it. If B accepts, both players get the money as divided by A. If B refuses, neither gets anything.
Backward induction
Although player A starts the game by splitting 10 dollars between herself and player B, her decision is influenced by what she expects B to do (accept or reject). In other words, A needs to start from the end and work backwards. Suppose player A makes an unfair 9-1 split in her favour. B can accept the 1 dollar or get nothing by rejecting. Since one is better than zero, a rational B will accept the offer. If A makes a fair split, B will also accept the 5. That means B will take any offer above zero, whatever the split. So player A may choose the unfair path. This is a (subgame-perfect) Nash equilibrium.
What happens if player B threatens to reject an unfair offer? The threat need not be explicit; it could just be a feeling in A’s mind. In either case, if player A believes it, she makes a fair division. And this is what Kahneman learned from his experiments. In game-theory language, B’s threat is known as an incredible (non-credible) threat, as it makes no economic sense to refuse even the unfair offer (1 > 0)!
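Backward induction can be sketched in a few lines: write down B's rule at the end of the tree, then let A pick the offer that is best given that rule. Whole-dollar offers and the `min_accept` threshold are our modelling assumptions: a strictly rational B accepts any positive amount (min_accept = 1), while a B whose threat A believes is modelled as rejecting anything below 5.

```r
# B's decision at the end of the tree: accept if the offer meets B's threshold
b_accepts <- function(offer, min_accept) offer >= min_accept

# A reasons backwards: anticipate B's response to every offer, then keep the best one
best_offer <- function(pot = 10, min_accept = 1) {
  offers <- 0:pot                                     # whole-dollar amounts A could give B
  a_payoff <- ifelse(b_accepts(offers, min_accept), pot - offers, 0)
  offers[which.max(a_payoff)]                         # smallest offer maximising A's payoff
}

rational_offer <- best_offer(min_accept = 1)  # B takes anything above zero
threat_offer   <- best_offer(min_accept = 5)  # A believes B rejects anything below 5
print(c(rational = rational_offer, with_threat = threat_offer))
```

With the rational B, A keeps 9 and offers 1; once A believes the threat, her best offer jumps to 5.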
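# One-way ANOVA: Strength explained by Material (treated as a factor); PO_data is this post's data frame
str.aov <- aov(Strength ~ factor(Material), data = PO_data)
summary(str.aov)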
Output:
Df Sum Sq Mean Sq F value Pr(>F)
factor(Material) 3 281.7 93.9 6.018 0.0043 **
Residuals 20 312.1 15.6
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Reject null hypothesis
We reject (at least) the null hypothesis that all means are equal, because the p-value (0.0043) is below the chosen significance level of 0.05. The F value is 6.018. Another way of reaching the same conclusion is to find the critical F value for the degrees of freedom df1 = 3 and df2 = 20 and compare it with the F value.
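The critical-value route can be sketched with qf() and pf() from the stats package, using the degrees of freedom and the F value from the output above:

```r
alpha <- 0.05
df1 <- 3    # numerator: number of groups - 1
df2 <- 20   # denominator: residual degrees of freedom from the ANOVA table

f_crit <- qf(alpha, df1, df2, lower.tail = FALSE)  # critical F value at the 0.05 level
f_obs  <- 6.018                                    # F value from the aov() output
p_val  <- pf(f_obs, df1, df2, lower.tail = FALSE)  # upper-tail probability beyond f_obs

print(round(c(critical = f_crit, observed = f_obs, p = p_val), 4))
```

Since the observed F is well beyond the critical value of about 3.10, both routes lead to rejecting the null hypothesis.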
You can see that the D-B difference, 9.5, is statistically significant, with an adjusted p-value of 0.0022. And, as expected, the 95% confidence interval for D-B does not include 0 (the value corresponding to no difference between D and B).
By the way, the difference between the blue and yellow boxes would already have been apparent had you paid attention to the box plots we made at the beginning.
In the previous ANOVA exercises, we found that the data suggested rejecting the null hypothesis. To remind you, the two hypotheses are:
H0: all means are equal
HA: not all means are equal
So we rejected (at least) the proposition that all means are equal, because the p-value was lower than the chosen significance level of 0.05 (or, equivalently, the F value exceeded the critical F value at the 0.05 level). But we have no idea which pairs of means differ significantly.
Tukey's method creates confidence intervals for all pairwise differences while controlling the family-wise error rate at whatever level we specify.
Family error rate
You already know what an error rate is: the probability of rejecting the null hypothesis (because the p-value fell below the significance level) when the null hypothesis is in fact true. At a significance level of 0.05, there is a 5% chance of rejecting a true null hypothesis. Such an outcome is called a false positive.
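A quick simulation (ours, not from the original post) makes the 5% concrete: run many two-sample t-tests where both samples come from the same distribution, so every rejection is a false positive. The sample sizes and the seed are arbitrary choices.

```r
set.seed(1)  # hypothetical seed
n_tests <- 5000

# Each test compares two samples drawn from the SAME normal distribution, so H0 is true
p_values <- replicate(n_tests, t.test(rnorm(10), rnorm(10), var.equal = TRUE)$p.value)

# Share of tests that (wrongly) reject H0 at the 0.05 level: the false-positive rate
false_positive_rate <- mean(p_values < 0.05)
print(false_positive_rate)
```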
The p-value we obtained for the material-testing problem, 0.03, applied to the entire family of four vendor groups (each with ten samples). The error rate that covers the whole family of comparisons is called the experiment-wise or family-wise error rate. Since we kept the significance level of the F-test at 0.05, we can regard the family-wise error rate as 5%.
Four groups, six comparisons
Since we had four groups of samples (the four levels of the factor), each representing one vendor, there are six possible pairwise comparisons:
| # | Comparison |
|---|---|
| 1 | Vendor 2-Vendor 1 |
| 2 | Vendor 3-Vendor 1 |
| 3 | Vendor 4-Vendor 1 |
| 4 | Vendor 3-Vendor 2 |
| 5 | Vendor 4-Vendor 2 |
| 6 | Vendor 4-Vendor 3 |
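The six pairs can be enumerated rather than listed by hand, for example with base R's combn(); choose() gives the count directly. The vendor labels are just strings that mirror the table:

```r
vendors <- paste("Vendor", 1:4)

# All unordered pairs of the four vendors, one pair per column
vendor_pairs <- combn(vendors, 2)
comparisons <- paste(vendor_pairs[2, ], vendor_pairs[1, ], sep = "-")  # e.g. "Vendor 2-Vendor 1"
print(comparisons)

n_comparisons <- choose(4, 2)  # 4! / (2! * 2!)
```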
The family-wise error rate is the probability of making at least one false positive across all the pairwise comparisons. If each pairwise comparison has error rate $\alpha$ and the $C$ comparisons are independent, the family-wise error rate is $1 - (1 - \alpha)^C$. Substituting $\alpha = 0.05$ and $C = 6$ gives a family-wise error rate of about 0.26. Obviously, 26% is far too high a significance level.
Tukey's method holds the family-wise error rate at the level we specify, say 0.05; the pairwise error rates therefore have to be much smaller, about 0.0085.
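Both numbers are easy to reproduce. The sketch uses the independence formula for the family-wise rate and the Šidák-type inversion 1 − (1 − 0.05)^(1/C) for the per-comparison rate; Tukey's method itself works through the studentized range distribution, so this is only an approximation of what it achieves.

```r
alpha <- 0.05      # error rate per comparison
n_comparisons <- 6 # C: pairwise comparisons among four groups

# Probability of at least one false positive across C independent comparisons
family_wise <- 1 - (1 - alpha)^n_comparisons
print(round(family_wise, 3))

# Per-comparison rate that would hold the family-wise rate at 0.05 (Sidak-type)
per_comparison <- 1 - (1 - 0.05)^(1 / n_comparisons)
print(round(per_comparison, 4))
```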
With all these points in mind, let's perform Tukey's method on our dataset using R.
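The material-strength data are not reproduced here, so the sketch below uses R's built-in InsectSprays data as a stand-in, keeping four of its groups (sprays A to D) to mirror the four vendors. TukeyHSD() from the stats package takes a fitted aov model and returns, for each pairwise difference, the estimate, the confidence limits and the adjusted p-value.

```r
# Stand-in data: four groups (sprays A-D) from the built-in InsectSprays dataset
sprays4 <- droplevels(subset(InsectSprays, spray %in% c("A", "B", "C", "D")))

# One-way ANOVA, then Tukey's honest significant differences at a 95% family-wise level
fit <- aov(count ~ spray, data = sprays4)
tukey <- TukeyHSD(fit, conf.level = 0.95)
print(tukey)

# One row per pair: diff (difference in means), lwr/upr (confidence limits), p adj
tukey_table <- tukey$spray
```

An interval that excludes 0 marks a pair whose means differ at the chosen family-wise level.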
We have seen how the F statistic is used to test the hypotheses in one-way ANOVA. We also know the definition of F as a ratio of two variances. A variance measures how spread out the data are around the mean and is estimated as the sum of squared deviations divided by the degrees of freedom. In case you forgot, the square root of the variance is the standard deviation.
F = Between groups variance / Within-group variance
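The variance definition can be checked against R's built-in var() and sd() on a small made-up sample:

```r
x <- c(4, 8, 6, 5, 7)  # hypothetical sample

n <- length(x)
manual_var <- sum((x - mean(x))^2) / (n - 1)  # sum of squared deviations / degrees of freedom
manual_sd  <- sqrt(manual_var)                # standard deviation = square root of variance

print(c(manual = manual_var, builtin = var(x)))
```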
F-tests use F-distribution
Recall how we used the t-distribution or the binomial distribution to determine probabilities assuming the null hypothesis was true. The F-distribution, too, has a characteristic shape, determined by two parameters: the degrees of freedom of the numerator (df1) and of the denominator (df2).
For the material-strength problem we have been working on in the past two posts (four groups with ten samples each, giving df1 = 4 − 1 = 3 and df2 = 4 × (10 − 1) = 36), the F-distribution takes the following form.
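The curve itself can be drawn from the F density function df() in the stats package; the plotting range 0 to 6 is our choice.

```r
df1 <- 3   # numerator df: 4 groups - 1
df2 <- 36  # denominator df: 4 x (10 - 1)

# Probability density of the F-distribution with these degrees of freedom
curve(df(x, df1, df2), from = 0, to = 6, n = 500,
      xlab = "F", ylab = "Density", main = "F-distribution, df1 = 3, df2 = 36")

# Sanity check: like any density, it integrates to 1
area <- integrate(function(f) df(f, df1, df2), lower = 0, upper = Inf)$value
```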
One way to understand the above plot is to imagine repeating the sampling many times (keeping four vendors and taking ten samples from each, so that df1 and df2 stay the same) while the null hypothesis is true. You calculate the F value each time. If you finally plot the frequencies of those F values, you get a plot similar to the one above.
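That thought experiment can be run directly. The sketch simulates 5,000 experiments with four vendors and ten samples each, all drawn from one normal distribution (the mean 10, standard deviation 2 and seed are arbitrary), computes F from scratch each time with a helper `f_stat` of our own, and overlays the theoretical F(3, 36) density.

```r
set.seed(42)         # hypothetical seed
groups <- gl(4, 10)  # four vendors, ten samples each

# One-way ANOVA F statistic computed from scratch
f_stat <- function(y, g) {
  k <- nlevels(g)
  n <- length(y)
  group_means <- tapply(y, g, mean)
  group_sizes <- tapply(y, g, length)
  ss_between <- sum(group_sizes * (group_means - mean(y))^2)
  ss_within  <- sum((y - group_means[as.integer(g)])^2)
  (ss_between / (k - 1)) / (ss_within / (n - k))
}

# Repeat the experiment many times with H0 true (all vendors share one mean)
f_values <- replicate(5000, f_stat(rnorm(40, mean = 10, sd = 2), groups))

hist(f_values, breaks = 60, freq = FALSE, main = "Simulated F values under H0", xlab = "F")
curve(df(x, 3, 36), add = TRUE, col = "red", lwd = 2)  # theoretical F(3, 36) density

# Share of simulated F values beyond the 0.05 critical value
exceed_rate <- mean(f_values > qf(0.95, 3, 36))
```

About 5% of the simulated F values land beyond the 0.05 critical value, which is what the significance level promises when the null hypothesis is true.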
Let’s do the ANOVA step by step. We use the F statistic to accept or reject the null hypothesis by comparing it with the critical F value for the chosen significance level. Once you have the F value, you can also calculate the p-value. The definition of the F statistic is
F = Between groups variance / Within-group variance
Between groups variance
Here, you estimate how much the group statistics vary around the global statistic. In other words, you determine the mean of each group and the global mean (the mean of all data, which equals the mean of the group means when the groups are the same size). Then take the differences, square them, weight each by its group size, add them up, and divide by the degrees of freedom, as you would for an ordinary variance.
Recall the previous example (strength of materials from four vendors): there are four groups, each containing ten samples. First, estimate the four group means and the global mean:
| Vendor | Mean | Samples | Square for factor |
|---|---|---|---|
| Vendor 1 | 11.2 | 10 | 10 × (11.2 − 9.915)² |
| Vendor 2 | 8.94 | 10 | 10 × (8.94 − 9.915)² |
| Vendor 3 | 10.68 | 10 | 10 × (10.68 − 9.915)² |
| Vendor 4 | 8.84 | 10 | 10 × (8.84 − 9.915)² |

Global mean = 9.915
Sum of squares for factor = 43.62
Degrees of freedom (DF) = 4 − 1 = 3
The numerator (mean square for factor) is calculated by dividing the sum of squares for factor by its degrees of freedom, i.e., 43.62/3 = 14.54.
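The between-group step can be sketched from the table's summary values. With the rounded means shown, the sum of squares comes out at about 43.43 rather than 43.62; the small gap presumably comes from rounding the means, so treat the printed values as approximate.

```r
group_means <- c(11.2, 8.94, 10.68, 8.84)  # vendor means from the table (rounded)
n_per_group <- 10
k <- length(group_means)

global_mean <- mean(group_means)  # equal group sizes: mean of means = global mean
ss_factor <- sum(n_per_group * (group_means - global_mean)^2)  # sum of squares for factor
ms_factor <- ss_factor / (k - 1)                               # mean square for factor

print(round(c(global_mean = global_mean, ss_factor = ss_factor, ms_factor = ms_factor), 3))
```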
Within-group variance
Here, you compute the sum of squared deviations within each group, add them up, and then divide by the sum of the groups' degrees of freedom.
| Vendor | Samples | Degrees of freedom (samples − 1) | Within-group squares for error (variance × df) |
|---|---|---|---|
| Vendor 1 | 10 | 9 | 35.81 (3.98 × 9) |
| Vendor 2 | 10 | 9 | 79.93 (8.88 × 9) |
| Vendor 3 | 10 | 9 | 10.94 (1.22 × 9) |
| Vendor 4 | 10 | 9 | 31.78 (3.53 × 9) |

Sum of within-group squares for error = 158.466
Total degrees of freedom = 36
The denominator (mean square for error) is calculated by dividing the sum of within-group squares for error by the total degrees of freedom, i.e., 158.466/36 = 4.402.
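The within-group step, sketched from the per-vendor sums of squares in the table. For raw data, each group's sum of squares would be var(x) * (length(x) - 1), which is the variance × df shown in the table.

```r
ss_within_groups <- c(35.81, 79.93, 10.94, 31.78)  # within-group sums of squares from the table
df_groups <- rep(10 - 1, 4)                        # each vendor: 10 samples - 1

ss_error <- sum(ss_within_groups)  # sum of squares for error
df_error <- sum(df_groups)         # 4 x 9
ms_error <- ss_error / df_error    # mean square for error

print(round(c(ss_error = ss_error, df_error = df_error, ms_error = ms_error), 3))
```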
F statistic = 14.54 / 4.402 = 3.30
The 3.30 is then compared with the critical F value for the chosen significance level, 0.05 in the present case. You can either look it up in an F-distribution table or use the R function qf():
qf(0.05, 3,36, lower.tail=FALSE)
The critical value is 2.87. Since our F statistic is larger than 2.87, we reject the null hypothesis. The p-value turns out to be 0.031.
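The final step, sketched with qf() and pf() from the stats package, starting from the two mean squares computed above:

```r
ms_factor <- 14.54  # mean square for factor (numerator)
ms_error  <- 4.402  # mean square for error (denominator)
df1 <- 3
df2 <- 36

f_stat <- ms_factor / ms_error                      # about 3.30
f_crit <- qf(0.05, df1, df2, lower.tail = FALSE)    # about 2.87
p_val  <- pf(f_stat, df1, df2, lower.tail = FALSE)  # upper-tail area beyond the observed F

decision <- if (f_stat > f_crit) "reject H0" else "fail to reject H0"
print(decision)
```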