Birthday Problem – Data

We saw the birthday problem earlier: in a group of 23 people, there is roughly a 50% chance that at least two of them share a birthday. Here is a test against real data. We use the birth dates of players from the recently concluded Women’s World Cup, where each squad has 23 players. The data is available in the reference.
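As a quick refresher on where the 50% comes from, here is a minimal sketch of the calculation. It assumes 365 equally likely birthdays and ignores 29 February; the variable names are only for illustration.

```r
# Probability that at least two people in a group of n share a birthday,
# assuming 365 equally likely days (leap days ignored).
n <- 23

# Chance that all n birthdays are different: 365/365 * 364/365 * ... * 343/365
p_no_match <- prod((365 - 0:(n - 1)) / 365)

# Complement: at least one shared birthday
p_match <- 1 - p_no_match
print(round(p_match, 4))  # 0.5073
```

So the figure is slightly above one half. Base R's stats package also has pbirthday(), and pbirthday(23) should give the same number.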

The following R code arranges the dates of birth of the 736 players into 32 teams of 23, one column per team.

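library(tibble)  # provides as_tibble()

F_data <- read.csv("D:/Misc/DataData/Footer1.csv")
# matrix() fills column by column, so this assumes the rows are ordered by team, 23 players each
F_data <- as.data.frame(matrix(F_data$DOB, nrow = 23))
names(F_data) <- paste0("TEAM", 1:ncol(F_data))
as_tibble(F_data)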

The next step converts each date of birth to a month-day format, dropping the year.

F_data1 <- F_data
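for (i in 1:ncol(F_data)) {
  # Parse the day/month/year strings into Date objects
  F_data1[,i] <- as.Date(F_data[,i], format = "%d/ %m/ %Y")
  # Keep only month and day, since the birth year does not matter here
  F_data1[,i] <- format(F_data1[,i], format="%m-%d")
}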
as_tibble(F_data1)

The final piece of code checks whether any date is duplicated within each team and counts the teams where that happens.

match1 <- rep(0, ncol(F_data1))
for (i in 1:ncol(F_data1)) {
  # 1 if at least two players in team i share a month-day, else 0
  match1[i] <- any(duplicated(F_data1[,i]))
}

match1
# 0 0 1 1 0 1 1 1 0 1 1 1 0 1 0 0 1 0 0 1 1 1 0 0 0 1 0 0 0 1 1 1

sum(match1)
# 17

A 23-member group has roughly a 50% chance of containing a shared birthday, so in a 32-team competition we expect about 16 teams with a match. In reality, it turned out to be 17; not bad, eh?
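To put a rough number on "not bad", here is a small sketch that treats the 32 squads in the data as independent trials, each with the textbook match probability for 23 people. Independence, uniform birthdays, and the variable names are assumptions made for illustration, not something the data guarantees.

```r
# Match probability for one 23-player squad (365 equally likely days assumed)
p_match <- 1 - prod((365 - 0:22) / 365)

n_teams <- 32

# Expected number of squads with a shared birthday: about 16.2
expected_teams <- n_teams * p_match

# Binomial standard deviation of that count: about 2.8
sd_teams <- sqrt(n_teams * p_match * (1 - p_match))

# Probability of seeing 17 or more such squads under this model
p_at_least_17 <- 1 - pbinom(16, size = n_teams, prob = p_match)

print(round(c(expected = expected_teams, sd = sd_teams), 2))
print(p_at_least_17)
```

Under these assumptions, the observed 17 sits well within one standard deviation of the expected count.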

Reference

Squad List: women’s world cup