Rare Disease Revisited

Remember when we discussed the application of Bayes’ theorem to quantify the predictive values of medical tests? It gives the probability that a person has a (rare) disease, given she is tested positive. Today, we verify the solution using a simple R code by sampling a million people!

Using the familiar notations, we write down Bayes’ formula.

P(D|+) = \frac{P(+|D) P(D) }{P(+|D) P(D) + P(+|NoD) P(NoD)}

Let’s assign probabilities to each of the parameters.

1) The test shows positive 90% of the time on patients with the disease (high sensitivity); P(+|D)
2) The test shows negative 95% of the time on healthy patients (high specificity); P(-|noD)
3) The disease is present in 1% of the community (low prevalence ) P(D)

Note that specificity = P(-|noD), whereas what we want is P(+|NoD), which is 1 – P(-|noD). Substituting all values,

\\ P(D|+) = \frac{P(+|D) P(D) }{P(+|D) P(D) + P(+|NoD) P(NoD)} \\ \\ P(D|+) = \frac{Sensitivity *  Prevalence}{Sensitivity *  Prevalence + (1-Specificity)*(1- Prevalence)} \\ \\ P(D|+) = \frac{0.9*0.01}{0.9*0.01 + 0.05*0.99} = 0.15

Let’s develop a code that simulates the testing of a million people.

Step 1: A million people, with 1% (random) having the disease. 1 = disease, 0 = no disease.

disease <- sample(c(0,1), size=1e6, replace=TRUE, prob=c(0.99,0.01))

Step 2: Create an empty vector with a million slots.

test <- rep(NA, 1e6)

Step 3: Fill the disease columns (disease = 1) with random assignment of test results; 90% with 1 and 10% with 0.
Fill the nondisease columns (disease = 0) with random assignment of test results; 95% with 0 and 0.05% with 1.

test[disease==1] <- sample(c(0,1), size=sum(disease==1), replace=TRUE, prob=c(0.1, 0.9))
test[disease==0] <- sample(c(0,1), size=sum(disease==0), replace=TRUE, prob=c(0.95,0.05))

Now estimate the average number of people with disease = 1 AND test = 1.

mean(disease[test==1]==1)

Putting everything together,

disease <- sample(c(0,1), size=1e6, replace=TRUE, prob=c(0.99,0.01))
test <- rep(NA, 1e6)
test[disease==1] <- sample(c(0,1), size=sum(disease==1), replace=TRUE, prob=c(0.1, 0.9))
test[disease==0] <- sample(c(0,1), size=sum(disease==0), replace=TRUE, prob=c(0.95,0.05))
mean(disease[test==1]==1)