You may recall the boy or girl paradox and the Bayesian explanation from the earlier two posts. Today we will perform experiments with thousands of families and count. Or simulate using the Monte Carlo method written in R. First, what is the paradox?
1) Mr X has two children. The first one is a girl, and what is the probability that both the children are girls?
Martin Gardner (1959)
2) Mr Y has two children. At least one of them is a girl, and what is the probability that both the children are girls?
We will use the replicate() function to do simulations. Before we go through the problems, let’s calculate the probability of having two girls if no other information is given.
itr <- 1000000
girl_count <- replicate(itr, {
child <- sample(c("B","G"), size = 2, replace = TRUE, prob = c(1/2,1/2))
child <- paste(child,collapse =" ")
ifelse (child == "G G", "2G", "n2G")
})
length(girl_count[girl_count == "2G"]) / length(girl_count[girl_count == "2G" | girl_count == "n2G"])
The code yields 0.25 as the answer. To explain what we did: assign “B” or “G” to two children at random using the sample() function and count the items “G G” among all the possible outcomes. Now, item 1 of the two.
itr <- 1000000
girl_count <- replicate(itr, {
child <- sample(c("B","G"), size = 2, replace = TRUE, prob = c(1/2,1/2))
child <- paste(child,collapse =" ")
if(substr(child, 1, 1) == "G"){
ifelse (child == "G G", "2G", "1G")
}
})
length(girl_count[girl_count == "2G"]) / length(girl_count[girl_count == "2G" | girl_count == "1G"])
The answer is 0.5. The key difference here is that we count for two girls for only the situations where the first one is already a girl. It is achieved by searching the first member of the substring and proceeding further only if it is a “G”. Note: using the function substr() requires the library(stringr). Finally, the contentious one – item 2.
itr <- 1000000
girl_count <- replicate(itr, {
child <- sample(c("B","G"), size = 2, replace = TRUE, prob = c(1/2,1/2))
child <- paste(child,collapse =" ")
if(grepl("G", child, fixed = TRUE)){
ifelse (child == "G G", "2G", "1G")
}
})
length(girl_count[girl_count == "2G"]) / length(girl_count[girl_count == "2G" | girl_count == "1G"])
The answer, and no surprise, is 0.33. We have used the grepl() command to find a G somewhere as the trigger to further count or not.
Tailpiece
For those who suspect my previous post about throwing dice for a double six was similar to the last one: You are right; change B and G to the sides of a die, run the simulation, and you get what you solved analytically.
itr <- 1000000
dice_count <- replicate(itr, {
dice <- sample(c("1","2", "3", "4", "5", "6"), size = 2, replace = TRUE, prob = c(1/6, 1/6, 1/6, 1/6, 1/6, 1/6))
dice <- paste(dice,collapse =" ")
if(grepl("6", dice, fixed = TRUE)){
ifelse (dice == "6 6", "2-6", "1-6")
}
})
length(dice_count[dice_count == "2-6"]) / length(dice_count[dice_count == "2-6" | dice_count == "1-6"])