Confusion Matrix – The Prevalence Problem

So, do you think the machine learning algorithm developed in the previous post is useful for predicting the sex of a person from their height? In other words, what is the precision of the method?

Precision is the probability that the person is female, given that the prediction was female:

P(Y = 1 | \hat{Y} = 1)
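As a minimal sketch, precision can be computed directly from true labels and predictions by looking only at the cases predicted female. The encoding (1 = female, 0 = male) and the toy labels below are hypothetical and are not the data from the previous post.

```python
# Minimal sketch: precision P(Y = 1 | Y_hat = 1) from labels and predictions.
# Assumed encoding (hypothetical): 1 = female, 0 = male.

def precision(y_true, y_pred):
    """Fraction of cases predicted female (y_pred == 1) that are actually female."""
    predicted_female = [t for t, p in zip(y_true, y_pred) if p == 1]
    if not predicted_female:
        raise ValueError("no positive predictions; precision is undefined")
    return sum(predicted_female) / len(predicted_female)


# Hypothetical toy labels, for illustration only.
y_true = [1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]
print(precision(y_true, y_pred))  # 4 predicted female, 2 of them female -> 0.5
```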

Applying Bayes’ theorem, where P(\hat{Y} = 1 | Y = 1) is the sensitivity:

P(Y = 1 | \hat{Y} = 1) = P( \hat{Y} = 1 | Y = 1) \frac{P(Y = 1)}{P(\hat{Y} = 1)}
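The identity can be checked numerically on any set of labels. The sketch below estimates each term from the same hypothetical toy labels (1 = female) and compares the directly computed precision with sensitivity × P(Y = 1) / P(\hat{Y} = 1); the helper name `bayes_terms` is made up for illustration.

```python
# Sketch: estimate every term of Bayes' theorem from labels and predictions.
# Hypothetical helper and toy data; encoding assumed 1 = female, 0 = male.

def bayes_terms(y_true, y_pred):
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    positives = sum(y_true)             # number of actual females
    predicted_positives = sum(y_pred)   # number predicted female
    sensitivity = tp / positives               # P(Y_hat = 1 | Y = 1)
    prevalence = positives / n                 # P(Y = 1)
    predicted_rate = predicted_positives / n   # P(Y_hat = 1)
    precision = tp / predicted_positives       # P(Y = 1 | Y_hat = 1)
    return sensitivity, prevalence, predicted_rate, precision


y_true = [1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1]
sens, prev, pred_rate, prec = bayes_terms(y_true, y_pred)
via_bayes = sens * prev / pred_rate
print(prec, via_bayes)  # both sides of the identity agree
```

Because every term is estimated from the same confusion matrix, the two sides agree exactly (up to floating-point rounding); the identity is only as good as the estimates fed into it.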

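Note that P(Y = 1) is the prior probability of females in the population where the algorithm is applied, not in the dataset; it is likely to be close to 0.5. By contrast, we know the prevalence of females in the dataset is 0.23, which we use here for P(\hat{Y} = 1). With these values, the ratio P(Y = 1)/P(\hat{Y} = 1), actual versus dataset, is 0.5/0.23 ≈ 2.17, so the precision would be about 2.17 times the sensitivity (and precision cannot exceed 1).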
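One way to make the dependence on prevalence explicit is to expand the denominator with the law of total probability, P(\hat{Y} = 1) = sensitivity · P(Y = 1) + (1 − specificity) · (1 − P(Y = 1)). The sketch below assumes that sensitivity and specificity carry over unchanged when the prevalence changes (an assumption, not something the post establishes), and the values 0.4 and 0.9 are hypothetical.

```python
# Sketch: precision as a function of prevalence, ASSUMING sensitivity and
# specificity stay fixed when the prevalence changes.

def precision_at_prevalence(sensitivity, specificity, prevalence):
    true_pos = sensitivity * prevalence                # P(Y_hat = 1, Y = 1)
    false_pos = (1 - specificity) * (1 - prevalence)   # P(Y_hat = 1, Y = 0)
    # Bayes' theorem with P(Y_hat = 1) = true_pos + false_pos
    return true_pos / (true_pos + false_pos)


# Hypothetical sensitivity and specificity, for illustration only.
sens, spec = 0.4, 0.9
for prevalence in (0.23, 0.5):
    print(prevalence, round(precision_at_prevalence(sens, spec, prevalence), 3))
```

With these made-up rates the same classifier has a different precision at a prevalence of 0.23 than at 0.5, which is the prevalence problem in miniature: precision measured on one dataset does not transfer to a population with a different mix.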