Let’s do the ANOVA step by step. We use the F-statistic to decide whether to reject the null hypothesis by comparing it with the critical F-value. Once you have the F-value, you can also calculate the p-value and compare it with the chosen significance level.
The F-statistic is defined as
F = Between-group variance / Within-group variance
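Written out in symbols, the ratio above can be sketched as follows. The notation is an assumption introduced here for illustration, not taken from the text: k is the number of groups, n_i the size of group i, N the total number of samples, \bar{x}_i and s_i^2 the mean and variance of group i, and \bar{x} the global mean.

```latex
F = \frac{MS_{\text{factor}}}{MS_{\text{error}}}
  = \frac{SS_{\text{factor}} / (k - 1)}{SS_{\text{error}} / (N - k)},
\qquad
SS_{\text{factor}} = \sum_{i=1}^{k} n_i \left(\bar{x}_i - \bar{x}\right)^2,
\qquad
SS_{\text{error}} = \sum_{i=1}^{k} (n_i - 1)\, s_i^2
```

For the vendor example, k = 4 and N = 40, which gives the 3 and 36 degrees of freedom used in the rest of the walkthrough.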
Between-group variance
Here, you estimate how much the group means vary around the global mean. In other words, you determine the mean of each group and the global mean (the mean of all data, which equals the mean of the group means when the groups are the same size). Then take the difference between each group mean and the global mean, square it, multiply by the group's sample size, add up the results, and divide by the degrees of freedom, as you would for a standard variance.
Recall the previous example (strength of materials from four vendors). You have four groups, each containing ten samples. First, compute the four group means and the global mean:
Vendor | Vendor 1 | Vendor 2 | Vendor 3 | Vendor 4 |
Mean | 11.2 | 8.94 | 10.68 | 8.84 |
Samples | 10 | 10 | 10 | 10 |
Global mean (= 9.915) | ||||
Square for factor | 10*(11.2-9.915)^2 | 10*(8.94-9.915)^2 | 10*(10.68-9.915)^2 | 10*(8.84-9.915)^2 |
Sum Square for factor (= 43.62) | ||||
Degrees of freedom (DF = 4 - 1 = 3) | ||||
The numerator (mean square of the factor) is calculated by dividing the factor sum of squares by the degrees of freedom, i.e., 43.62/3 = 14.54.
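The between-group calculation can be reproduced in R with a minimal sketch like the one below. The variable names are illustrative, and the inputs are the rounded means from the table. With those rounded values the sum of squares comes out near 43.43 rather than 43.62, presumably because the figures in the text were computed from unrounded data.

```r
# Group means and sizes as shown in the table (rounded values).
group_means <- c(11.2, 8.94, 10.68, 8.84)
n_per_group <- c(10, 10, 10, 10)

# Global mean: size-weighted mean of the group means.
global_mean <- sum(n_per_group * group_means) / sum(n_per_group)

# Factor sum of squares: group size times squared deviation, summed over groups.
ss_factor <- sum(n_per_group * (group_means - global_mean)^2)

# Degrees of freedom: number of groups minus one.
df_factor <- length(group_means) - 1

# Mean square of the factor (the numerator of F).
ms_factor <- ss_factor / df_factor

print(c(global_mean = global_mean, ss_factor = ss_factor, ms_factor = ms_factor))
```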
Within-group variance
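Here, you add up the variation inside each group: sum the squared deviations of every sample from its own group mean (equivalently, each group's variance times its degrees of freedom), then divide by the sum of the degrees of freedom of all groups.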
Vendor | Vendor 1 | Vendor 2 | Vendor 3 | Vendor 4 |
Samples | 10 | 10 | 10 | 10 |
Degrees of Freedom (sample – 1) | 9 | 9 | 9 | 9 |
Within group Squares for error (variance x df) | 35.81 (3.98 x 9) | 79.93 (8.88 x 9) | 10.94 (1.22 x 9) | 31.78 (3.53 x 9) |
Sum Within group Squares for error (= 158.466) | ||||
Total Degrees of Freedom (= 36) | ||||
The denominator (mean square of the error) is calculated by dividing the within-group sum of squares for error by the total degrees of freedom, i.e., 158.466/36 = 4.402.
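The within-group step can be sketched in R in the same way. The variable names are illustrative, and the inputs are the rounded group variances from the table, so the sum of squares lands at about 158.49 instead of the 158.466 quoted above. The mean square still rounds to roughly 4.40.

```r
# Group variances and sizes as shown in the table (rounded values).
group_vars  <- c(3.98, 8.88, 1.22, 3.53)
n_per_group <- c(10, 10, 10, 10)

# Degrees of freedom per group: samples minus one.
df_group <- n_per_group - 1

# Within-group sum of squares: variance times degrees of freedom, summed over groups.
ss_error <- sum(group_vars * df_group)

# Total error degrees of freedom across all groups.
df_error <- sum(df_group)

# Mean square of the error (the denominator of F).
ms_error <- ss_error / df_error

print(c(ss_error = ss_error, df_error = df_error, ms_error = ms_error))
```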
F-statistic = 14.54 / 4.402 = 3.30
The value 3.30 is then compared with the critical F-value for the chosen significance level, 0.05 in the present case. You can either look it up in an F-distribution table or use the R function qf:
qf(0.05, 3, 36, lower.tail = FALSE)
The critical value is 2.87. Since the F-statistic in our case (3.30) is larger than 2.87, we reject the null hypothesis. The p-value turns out to be 0.031, which is below the 0.05 significance level and leads to the same conclusion:
pf(3.303, 3, 36, lower.tail = FALSE)
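Putting the last steps together, the decision can be sketched in R as below. The mean squares are the values quoted in the text, and the variable names are illustrative. The critical-value rule and the p-value rule always agree, which the final line checks.

```r
# Mean squares and degrees of freedom quoted in the text.
ms_factor <- 14.54
ms_error  <- 4.402
df_factor <- 3
df_error  <- 36
alpha     <- 0.05   # chosen significance level

# F-statistic: ratio of the two mean squares.
f_stat <- ms_factor / ms_error

# Critical value: upper-tail quantile of the F distribution at alpha.
f_crit <- qf(alpha, df_factor, df_error, lower.tail = FALSE)

# p-value: upper-tail probability of observing an F at least this large.
p_value <- pf(f_stat, df_factor, df_error, lower.tail = FALSE)

# Reject the null hypothesis when F exceeds the critical value.
reject_null <- f_stat > f_crit

print(c(f_stat = f_stat, f_crit = f_crit, p_value = p_value))
print(reject_null == (p_value < alpha))
```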