Data & Statistics

Poissonous Rain Drops

Raindrops are falling at a rate of 20 drops per inch per minute. What is the probability that no drop falls inside a 5-inch area for 3 seconds?

Since we assume the raindrops fall at random, we model this as a Poisson process. The Poisson probability needs two quantities:
1) average rate of success: lambda
2) number of successes: s

Since we want zero successes (s = 0) in a 3-second interval on a 5-inch area, we convert the rate to that area and that interval. A minute contains 60/3 = 20 such intervals.
lambda = 20 drops per inch per minute
= 20 x 5 = 100 drops per 5-inch area per minute
= 100/20 = 5 drops per 5-inch area per 3 seconds

P(X = s) = \frac{e^{-\lambda}\lambda^s}{s!}

P(X = 0) = \frac{e^{-5}5^0}{0!} = e^{-5} \approx 0.0067

OR

dpois(0, 100/20)
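
As a cross-check, the rate conversion and the zero-count probability can be reproduced step by step. This is only a minimal sketch; the variable names (rate_per_inch_min, area, seconds) are illustrative and not part of the original calculation.

```r
# Inputs taken from the problem statement (names are illustrative)
rate_per_inch_min <- 20   # drops per inch per minute
area <- 5                 # inches
seconds <- 3              # length of the observation window

# Expected number of drops on the area during the window
lambda <- rate_per_inch_min * area * seconds / 60

# Probability of zero drops: closed form and the built-in Poisson mass function
p_formula <- exp(-lambda) * lambda^0 / factorial(0)
p_dpois <- dpois(0, lambda)

print(c(lambda = lambda, formula = p_formula, dpois = p_dpois))
```

Both routes should agree, since dpois evaluates the same probability mass function.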


Probability of a Norepeatword

If one must make a random Norepeatword from the 26 letters of the alphabet, what is the probability of picking a word with all 26 letters? A Norepeatword is an assemblage of any number of letters (1 to 26) such that no letter is repeated. Let’s do the problem step by step.

  1. Number of 26-letter words: there are 26! words possible from 26 letters.
  2. Number of k-letter words (k = 1 to 26): It has two parts: A) how many k-letter words are possible from 26 and B) how many rearrangements are possible for each.
    A) 26Ck collections can be made from 26 letters with k letters.
    B) For each collection, k! rearrangements are possible.
    Therefore, the number of k-letter Norepeatwords is 26Ck x k!, and the total is obtained by summing from k = 1 to k = 26.

\textrm{\# Norepeatwords} = \sum\limits_{k = 1}^{26} {}_{26}C_k \cdot k! = \sum\limits_{k = 1}^{26} \frac{26!}{k!(26-k)!} \cdot k!

The required probability is:

P = \frac{26!}{\sum\limits_{k = 1}^{26} \frac{26!}{k!(26-k)!} \cdot k!} = \frac{26!}{\sum\limits_{k = 1}^{26} \frac{26!}{(26-k)!}} = \frac{1}{\frac{1}{25!} + \frac{1}{24!} + \cdots + \frac{1}{1!} + 1}

The denominator is the first 26 terms of the famous Taylor series expansion of e^x at x = 1.
e^x = 1 + x + x^2/2! + …

The omitted terms (1/26! onwards) are negligible, so P ≈ 1/e ≈ 0.37
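
A quick numerical check of the ratio against 1/e, as a sketch only. The names words_k, p_count and p_series are illustrative; the counts are held as double-precision numbers, so the comparison is accurate to roughly 15 significant digits rather than exact.

```r
k <- 1:26

# Number of k-letter Norepeatwords: choose k letters, then arrange them
words_k <- choose(26, k) * factorial(k)

# Probability of a 26-letter word among all Norepeatwords
p_count <- factorial(26) / sum(words_k)

# Same probability from the simplified form 1 / (1/0! + 1/1! + ... + 1/25!)
p_series <- 1 / sum(1 / factorial(0:25))

print(c(count = p_count, series = p_series, one_over_e = exp(-1)))
```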


All Three Girls

A family has six children – three boys and three girls. What is the probability that the three oldest children are all girls?

Let’s label the girls 1, 2, and 3 and the boys 4, 5, and 6. The required probability is:
(number of birth orders in which 1, 2, and 3 occupy the first three places, in any order) / (number of all birth orders)
There are 3! ways to order the girls in the first three places and 3! ways to order the boys in the last three, so
3! x 3! / 6! = 36/720 = 1/20 = 0.05

Let’s use brute force and perform an R simulation.

itr <- 1000000

# Each trial draws the three oldest children from the six labelled children
# and records 1 if they are exactly the girls (1, 2, 3), otherwise 0
girl <- replicate(itr, {
   birth <- sample(1:6, size = 3, replace = FALSE)
   all_girl <- c(1, 2, 3)
   as.integer(all(all_girl %in% birth))
})
mean(girl)
0.050108
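
Because there are only 20 possible sets of three oldest children, the simulation can also be compared with a full enumeration. The following is a sketch using base R's combn; the names oldest, is_all_girls and p_exact are illustrative.

```r
# All possible sets of three oldest children out of six (one set per column)
oldest <- combn(6, 3)

# TRUE for the sets made up only of the girls, labelled 1, 2, 3
is_all_girls <- apply(oldest, 2, function(set) all(set %in% c(1, 2, 3)))

# Each set is equally likely, so the probability is the share of TRUE columns
p_exact <- mean(is_all_girls)
print(p_exact)
```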


Flipping for Winners

Sixty-four teams are playing a knockout tournament, and you have to predict the winners of each game. You get 1 point for correctly predicting the first-round winners, 2 for the second, 4 for the third, 8 for the fourth, 16 for the semi-finals and 32 for the final. If you flip coins to choose the winners, what is the expected number of points for predicting all the matches?

Round 1

The expected value of an event is the probability of occurrence x the payout. For a first-round match, the probability of predicting a single winner is (1/2), and the payout is 1.
The expected value for a single first round match = (1/2) x 1 = 1/2
Since there are 32 matches, the total expected value = 32 x 1/2 = 16

Round 2

To predict a second-round winner correctly, you need two coin flips to come out right (first round and second round), so the probability is (1/2) x (1/2). The payoff is 2. The expected value is (1/2) x (1/2) x 2 = 1/2. For all the 16 matches in this round, it is 16 x 1/2 = 8.

Round 3

The probability is (1/2) x (1/2) x (1/2). The payoff is 4. The expected value is (1/2) x (1/2) x (1/2) x 4 = 1/2. For the 8 matches in this round, the total is 8 x 1/2 = 4.

The next 3 rounds

The expected value per match stays at 1/2, so the next three rounds (with 4, 2, and 1 matches) contribute 4 x 1/2 = 2, 2 x 1/2 = 1, and 1 x 1/2 = 1/2, respectively.

Adding all values: 16 + 8 + 4 + 2 + 1 + 1/2 = 31.5.
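
The round-by-round arithmetic can be tabulated in a few lines. This is a sketch under the same assumption as the text: a pick for round r counts only if the chosen team has won all r of its coin flips. The vector names are illustrative.

```r
rounds <- 1:6
matches <- 64 / 2^rounds        # 32, 16, 8, 4, 2, 1 matches per round
points <- 2^(rounds - 1)        # 1, 2, 4, 8, 16, 32 points per correct pick
p_correct <- (1 / 2)^rounds     # the picked team must survive every flip so far

# Expected points contributed by each round, then the grand total
expected <- matches * points * p_correct
print(expected)
print(sum(expected))
```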


A Car at a Junction

The probability of a car passing a junction in a 20-minute window is 0.9. What is the chance that a car passes the junction in a 5-minute window?

Let’s divide the 20-minute duration into four intervals of 5 minutes each. Let p be the probability of a car passing the junction in 5 minutes. Then the chance of no car in 5 minutes is (1 – p). The probability of no car in 20 minutes is 1 – 0.9 = 0.1, and it is the joint probability of four such 5-minute events happening one after another. Assuming the four intervals are independent, the probabilities multiply, giving (1 – p)^4.

0.1 = (1 – p)^4
1 – p = (0.1)^(1/4)
p = 1 – (0.1)^(1/4) = 1 – 0.56 = 0.44
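
The same algebra in R, as a small sketch that also plugs the answer back in to recover the 20-minute probability. The names p20, n_intervals, p5 and p20_check are illustrative, and the intervals are assumed to be independent.

```r
p20 <- 0.9                  # probability of a car in 20 minutes
n_intervals <- 20 / 5       # number of 5-minute intervals in the window

# Solve (1 - p5)^n_intervals = 1 - p20 for p5
p5 <- 1 - (1 - p20)^(1 / n_intervals)

# Sanity check: four independent 5-minute intervals should give back 0.9
p20_check <- 1 - (1 - p5)^n_intervals

print(round(c(p5 = p5, p20_check = p20_check), 4))
```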


Stem and Leaf Plots

A stem and leaf plot is a table of values. The arrangement can give a picture of the distribution of values. Here is the stem and leaf plot of 20 values, produced with the following R commands.

numbers <- c(19, 37, 5, 12, 15, 32, 27, 35, 23, 22, 28, 34, 31, 12, 48, 31, 31, 28, 43, 3)
stem(numbers)
  The decimal point is 1 digit(s) to the right of the |

  0 | 35
  1 | 2259
  2 | 23788
  3 | 1112457
  4 | 38

There are five stems, 0, 1, 2, 3, and 4, listed one over another. The column on the right contains the leaves arranged from left to right.

How to read:
Take the first stem, 0. There are two numbers on the right, 3 and 5. That means the set has values of 3 (03) and 5 (05).
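
For two-digit data like this, the stems and leaves are simply the tens and units digits. The sketch below rebuilds the table by hand with integer division and remainder; it illustrates the idea and is not how stem() works internally. The names sorted_numbers and leaves are illustrative.

```r
numbers <- c(19, 37, 5, 12, 15, 32, 27, 35, 23, 22, 28, 34, 31, 12, 48, 31, 31, 28, 43, 3)
sorted_numbers <- sort(numbers)

# Stem = tens digit, leaf = units digit; group the leaves by their stem
leaves <- split(sorted_numbers %% 10, sorted_numbers %/% 10)

# Print one row per stem, in the same "stem | leaves" layout
for (stem_value in names(leaves)) {
  cat(stem_value, "|", paste(leaves[[stem_value]], collapse = ""), "\n")
}
```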


Hardy-Weinberg – Applied

Here is a problem to illustrate how the Hardy-Weinberg Principle is applied.

In a population of 130,000 special mice, green fur is dominant over orange. If there are 300 orange mice in the population of 130,000, find the following:
Frequency of green allele
Frequency of orange allele
Frequency of each genotype

p + q = 1
p^2 + 2pq + q^2 = 1

We must start with orange, as it is the recessive trait. From the dominant (green) colour alone, it is impossible to say whether a mouse is homozygous dominant or heterozygous, whereas only homozygous recessive mice have orange fur.

q^2 = 300/130,000 = 0.0023
q = sqrt(q^2) = 0.048
p = 1 – q = 0.952

Genotype Frequencies:
Homozygous dominant = p^2 = 0.906
Heterozygous = 2pq = 2 x 0.952 x 0.048 = 0.091
Homozygous recessive = q^2 = 0.0023
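
The calculation can be carried through at full precision to confirm that the three genotype frequencies add up to 1. This is a sketch; the names n_total, n_orange and genotypes are illustrative.

```r
n_total <- 130000    # population size
n_orange <- 300      # orange mice, all homozygous recessive

q2 <- n_orange / n_total   # frequency of the homozygous recessive genotype
q <- sqrt(q2)              # frequency of the orange allele
p <- 1 - q                 # frequency of the green allele

genotypes <- c(homozygous_dominant = p^2,
               heterozygous = 2 * p * q,
               homozygous_recessive = q2)

print(round(c(p = p, q = q), 3))
print(round(genotypes, 4))
print(sum(genotypes))
```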

Reference

The Hardy-Weinberg Principle: Watch your Ps and Qs: ThePenguinProf


Hardy-Weinberg – Continued

We have seen the terms allele, dominant, recessive, genotype, phenotype, etc. Consider a population of five individuals, each carrying a pair of genes that dictates hair colour: the red gene gives red hair and the black gene gives black hair. Red is recessive and black is dominant, which means:

red-red = red hair
red-black = black hair
black-black = black hair

So we have two people with red hair (homozygous recessive) and three with black hair (two heterozygous and one homozygous dominant).

Hardy-Weinberg rule

The rule states that allele and genotype frequencies in a population will remain constant in the absence of other evolutionary processes such as migration, mutation, selection, etc.

To estimate the allele frequency, we create the gene pool, the aggregate of all alleles of the population. Our gene pool has ten alleles, with six reds and four blacks.

Let p represent the allele frequency of the dominant trait and q that of the recessive.
p = 4/10 = 0.4
q = 6/10 = 0.6
p + q = constant = 1

To estimate genotype frequency, consider the following. If we take one random gene from the pool, what is the probability that it is red? It will be q. What is the chance of picking two genes to form a homozygous recessive? It will be q x q = q^2. Similarly, the chance of pulling out a black followed by another black is p^2, and that of a black and a red is 2pq (black followed by red OR red followed by black; the two are added because of the OR rule).

The three genotype frequencies add up to p^2 + 2pq + q^2, which must equal 1. To summarise, the following are the two governing equations.

p + q = 1
p^2 + 2pq + q^2 = 1
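
The gene-pool counting for the five-person example can be sketched as follows. The list genotype_pairs encodes the two red-red, two red-black and one black-black individuals described above; the variable names are illustrative.

```r
# Gene pairs of the five individuals in the example
genotype_pairs <- list(c("red", "red"), c("red", "red"),
                       c("red", "black"), c("red", "black"),
                       c("black", "black"))

# The gene pool is the aggregate of all ten alleles
gene_pool <- unlist(genotype_pairs)

p <- mean(gene_pool == "black")   # dominant allele frequency
q <- mean(gene_pool == "red")     # recessive allele frequency

# Genotype frequencies predicted by the two governing equations
expected <- c(black_black = p^2, red_black = 2 * p * q, red_red = q^2)

print(c(p = p, q = q))
print(expected)
```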


Hardy-Weinberg Principle

We have two copies of every gene; each copy is called an allele. If both alleles are the same, the pair is homozygous. On the other hand, if they are different, it is heterozygous. The terms dominant and recessive describe two different inheritance patterns of traits.

Imagine hair colour as a trait. The dominant is black, and the recessive is red. Note this doesn’t mean black hair dominates or anything like that. It only means:

black allele + black allele = black trait
black allele + red allele = black trait
red allele + red allele = red trait

Simply put, you need both the recessive alleles to get a recessive trait, whereas one dominant allele is sufficient to get the dominant trait.

Also, the allele pairs (black, black), (black, red) and (red, red) are all genotypes. On the other hand, the traits, black hair and red hair, are two phenotypes. If a genotype is what your genes are, then a phenotype is what you look like!

Before Hardy and Weinberg, people used to get puzzled by the fact that the population did not end up having only the dominant traits. Their principle (developed independently) states that frequencies of alleles and genotypes in a population will remain constant over time in the absence of other evolutionary influences.


Chance of DMD

A woman who has a family history of Duchenne muscular dystrophy (DMD) gets tested for the presence of the disease gene using a test (creatine phosphokinase, CPK) that has a sensitivity of 67% and a specificity of 95%. It is known that her brother has the condition. What is the probability that she is a carrier of the condition, and further, what is the chance that her son will have the disease, if she tests negative on the CPK test?

We need the conditional probability that she is a carrier, given the negative test result. We will use Bayes’ theorem to estimate it.

P(C|-ve) = \frac{P(-ve|C) P(C)}{P(-ve|C) P(C) + P(-ve|nC) P(nC)}

P(C|-ve) denotes the probability that she is a carrier (C), given she tested negative for the condition (-ve).

P(-ve|C) is the chance of a negative result, given the person is a carrier. We know the chance of a +ve result if the person carries the gene. It is the sensitivity and is 67%. This means if the person carries the gene, there is a 67% chance of getting a positive and a 33% chance of getting a negative result. Therefore, P(-ve|C) = 0.33.

P(-ve|nC) is the chance of a negative result, given the person is NOT a carrier. It is nothing but specificity, and it is 95%. Therefore, P(-ve|nC) = 0.95.

This leaves the final two parameters: the prior probabilities that she carries or does not carry the disease gene (P(C) and P(nC)). DMD happens because of a mutated gene on the X chromosome. In our case, the woman can get that X chromosome from her father or mother. Since her brother has the condition, and he could get his X only from his mother, the mother is certainly a carrier of the mutated gene. Since the daughter inherited one of her mother's two Xs, P(C) = 0.5, which means P(nC) is 0.5.

Applying Bayes’ theorem,

P(C|-ve) = \frac{0.33*0.5}{0.33*0.5 + 0.95*0.5} = 0.26

If she is a carrier (there is a 26% chance), she passes the mutated X chromosome to a son with a 50% chance. This implies that the probability that her son gets the disease is 0.26 x 0.5 = 0.13 or 13%.
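
The Bayes update and the follow-on risk for a son can be written out as a short sketch. The variable names are illustrative; the inputs are the sensitivity, specificity and prior stated in the text.

```r
sensitivity <- 0.67     # P(+ve | carrier)
specificity <- 0.95     # P(-ve | not carrier)
prior_carrier <- 0.5    # P(C), from the mother being a carrier

p_neg_given_carrier <- 1 - sensitivity   # P(-ve | C)
p_neg_given_noncarrier <- specificity    # P(-ve | nC)

# Bayes' theorem: P(C | -ve)
numerator <- p_neg_given_carrier * prior_carrier
posterior_carrier <- numerator /
  (numerator + p_neg_given_noncarrier * (1 - prior_carrier))

# A carrier passes the mutated X to a son with probability 0.5
p_son_affected <- posterior_carrier * 0.5

print(round(c(posterior_carrier = posterior_carrier,
              p_son_affected = p_son_affected), 3))
```

Carrying the unrounded posterior through gives the same 13% for the son after rounding.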

Reference

Bayesian Analysis and Risk Assessment in Genetic Counseling and Testing: Journal of Molecular Diagnostics, Vol. 6, No. 1, February 2004
