Non-Parametric ANOVA – Kruskal–Wallis test

Here are five months of air quality data on ozone concentration (R's built-in airquality dataset). The task is to test whether any month's ozone readings differ significantly from those of the other months.

The first thing to do is to graph the monthly variation of ozone in a summary plot; a boxplot is one good choice.

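library(ggpubr)

# Load the built-in air quality data and keep a working copy
data("airquality")
AQ_data <- airquality

# Boxplot of ozone by month; rows with missing Ozone are dropped with a warning
ggboxplot(AQ_data, x = "Month", y = "Ozone",
          color = "Month",
          palette = c("#00AFBB", "#E7B800", "#a0AF00", "#17B800", "#20AFBB"),
          ylab = "Ozone", xlab = "Month") +
  theme(legend.position = "none")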

Getting quantitative

Let’s do a hypothesis test. A few quick Shapiro–Wilk tests suggest that only month 7 follows a normal distribution, so we will use a non-parametric test. The Kruskal–Wallis test is one such test.
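
The normality check mentioned above is not shown in the original, so here is a minimal sketch of how it might be run, one Shapiro–Wilk test per month. The names aq and shapiro_p are introduced here for illustration only.

```r
# Keep only rows where Ozone was actually recorded
aq <- airquality[!is.na(airquality$Ozone), ]

# Run shapiro.test() on each month's ozone readings and collect the p-values
shapiro_p <- sapply(split(aq$Ozone, aq$Month),
                    function(x) shapiro.test(x)$p.value)

# One p-value per month, named "5" to "9"
round(shapiro_p, 4)
```

A small p-value argues against normality for that month, which is what motivates the rank-based test that follows.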

kruskal.test(Ozone ~ Month, data = airquality)
	Kruskal-Wallis rank sum test

data:  Ozone by Month
Kruskal-Wallis chi-squared = 29.267, df = 4, p-value = 6.901e-06
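
For readers curious where the chi-squared value comes from, the statistic can be rebuilt from the ranks. This is a sketch of the standard tie-corrected Kruskal–Wallis formula, not something the original post shows; the variable names (aq, H, H_corr and so on) are chosen here for illustration.

```r
# Drop rows with missing Ozone, as the formula interface of kruskal.test() does
aq <- airquality[!is.na(airquality$Ozone), ]

r   <- rank(aq$Ozone)                 # mid-ranks over the pooled sample
n   <- nrow(aq)                       # total number of observations
n_i <- tapply(r, aq$Month, length)    # observations per month
R_i <- tapply(r, aq$Month, sum)       # rank sum per month

# Uncorrected H statistic
H <- 12 / (n * (n + 1)) * sum(R_i^2 / n_i) - 3 * (n + 1)

# Correction for tied ozone values
tie_counts <- table(aq$Ozone)
H_corr <- H / (1 - sum(tie_counts^3 - tie_counts) / (n^3 - n))

# p-value from the chi-squared approximation with (groups - 1) degrees of freedom
p_value <- pchisq(H_corr, df = length(n_i) - 1, lower.tail = FALSE)

c(H = H_corr, p = p_value)
```

The corrected statistic and p-value should agree with the kruskal.test() output above.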

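Yes, the months do not all behave alike: the p-value is far below 0.05, so we reject the hypothesis that ozone has the same distribution in every month. To see which months differ, we can follow up with a pairwise Wilcoxon rank-sum test.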

pairwise.wilcox.test(AQ_data$Ozone, AQ_data$Month)
	Pairwise comparisons using Wilcoxon rank sum test with continuity correction 

data:  AQ_data$Ozone and AQ_data$Month 

  5      6      7      8     
6 0.5775 -      -      -     
7 0.0003 0.0848 -      -     
8 0.0011 0.1295 1.0000 -     
9 0.4744 1.0000 0.0060 0.0227

P value adjustment method: holm 
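
The "holm" line means the ten pairwise p-values were adjusted for multiple comparisons. As a rough sketch of what that step does, assuming the default arguments used above, the adjustment can be reproduced from the unadjusted p-values; the names raw, holm_by_hand and holm_builtin are illustrative.

```r
x <- airquality$Ozone
g <- airquality$Month

# Unadjusted pairwise p-values (tied values may trigger warnings about exact p-values)
raw <- pairwise.wilcox.test(x, g, p.adjust.method = "none")$p.value

# Apply Holm's correction to the filled lower triangle of the matrix
lower <- lower.tri(raw, diag = TRUE)
holm_by_hand <- raw
holm_by_hand[lower] <- p.adjust(raw[lower], method = "holm")

# Compare with the function's own default (Holm) adjustment
holm_builtin <- pairwise.wilcox.test(x, g)$p.value
all.equal(holm_by_hand, holm_builtin)
```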

The conclusion: at the 5% level, after Holm adjustment, significant differences are seen for:
Month 5 vs Month 7 (p = 0.0003) and Month 8 (p = 0.0011)
Month 9 vs Month 7 (p = 0.0060) and Month 8 (p = 0.0227)
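
Rather than reading the table by eye, the significant pairs can be pulled out of the p-value matrix. This is a small sketch assuming a 0.05 cut-off; the objects pw, p_mat and sig are names introduced here, not part of the original analysis.

```r
# Holm-adjusted pairwise p-values (lower triangle filled, the rest NA)
pw    <- pairwise.wilcox.test(airquality$Ozone, airquality$Month)
p_mat <- pw$p.value

# Row/column positions of the cells below the cut-off; NA cells are ignored
idx <- which(p_mat < 0.05, arr.ind = TRUE)

# Translate positions back to month labels
sig <- data.frame(
  month_a = colnames(p_mat)[idx[, "col"]],
  month_b = rownames(p_mat)[idx[, "row"]],
  p_adj   = p_mat[idx]
)
sig
```

Each row of sig is one pair of months whose ozone distributions differ at the chosen level.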