Data & Statistics

Likelihood Ratio and Posterior Odds

We know how the updated (posterior) disease probability is related to the prevalence (prior) via Bayes’ relationship.

\text{Posterior} = \frac{Sensitivity *  Prior}{Sensitivity *  Prior + (1-Specificity)*(1- Prior)}

Here, the ‘posterior’ and ‘prior’ are probability values. A probability P is converted to the corresponding odds using the following formula,

\text{Odds} = \frac{P}{1-P}

Using this definition, the posterior odds are:

\\ O_{post}= \frac{Posterior}{1-Posterior} \\ \\ = \frac{\frac{Sensitivity *  Prior}{Sensitivity *  Prior + (1-Specificity)*(1- Prior)}}{1 - \frac{Sensitivity *  Prior}{Sensitivity *  Prior + (1-Specificity)*(1- Prior)}} \\ \\ = \frac{Sensitivity *  Prior} {(1-Specificity)*(1- Prior)} = \frac{Sensitivity} {(1-Specificity)}\frac{Prior}{(1- Prior)}

Notice the two terms: the first term, Sensitivity / (1 – Specificity), is the likelihood ratio (LR), and the second term, Prior / (1 – Prior), is the prior odds. Therefore,

Posterior odds = LR x Prior odds
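As a quick numerical check of the relationship above, the following sketch computes the posterior both directly from Bayes' formula and via the likelihood ratio. The sensitivity, specificity and prior used here are hypothetical values chosen only for illustration, and the function names are my own.

```python
def odds(p):
    """Convert a probability to odds, P / (1 - P)."""
    return p / (1.0 - p)


def prob(o):
    """Convert odds back to a probability, O / (1 + O)."""
    return o / (1.0 + o)


def posterior_bayes(sensitivity, specificity, prior):
    """Posterior probability from Bayes' relationship."""
    numerator = sensitivity * prior
    return numerator / (numerator + (1 - specificity) * (1 - prior))


def posterior_odds(sensitivity, specificity, prior):
    """Posterior odds as likelihood ratio times prior odds."""
    likelihood_ratio = sensitivity / (1 - specificity)
    return likelihood_ratio * odds(prior)


# Hypothetical test characteristics and prevalence (not from the post).
sens, spec, prior = 0.9, 0.8, 0.1

direct = posterior_bayes(sens, spec, prior)
via_odds = prob(posterior_odds(sens, spec, prior))
print(f"direct: {direct:.4f}, via odds: {via_odds:.4f}")
```

Both routes should give the same posterior, which is the point of the derivation.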

Example

A new diagnostic tool yielded the following results.

  • A total of 1,000 individuals took the test.
  • 435 individuals had positive results, and 565 were negative.
  • Out of the 435 positive, 381 of them had the disease.
  • Out of the 565 negative, 549 did not have the disease.

What is the positive likelihood ratio of the test method?

From the data, true positives (TP) are 381. Then 435 – 381 = 54 must be false positives (FP).
Similarly, the true negatives (TN) are 549. 565 – 549 = 16 must be false negatives (FN).

Sensitivity = TP/(TP + FN) = 381/(381+16) = 0.96
Specificity = TN/(TN+FP) = 549 / (549 + 54) = 0.91

The likelihood ratio, therefore, is,
0.96 / (1 – 0.91) = 10.7
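The arithmetic of this example can be reproduced with a few lines of Python. This is only a sketch using the counts given above; the variable names are my own.

```python
# Counts taken from the example above.
positives, negatives = 435, 565
tp = 381                 # positive test, disease present
tn = 549                 # negative test, disease absent
fp = positives - tp      # positive test, disease absent
fn = negatives - tn      # negative test, disease present

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
lr_plus = sensitivity / (1 - specificity)

print(f"FP = {fp}, FN = {fn}")
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
print(f"LR+ = {lr_plus:.1f}")
```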


Likelihood Ratio

The likelihood ratio compares the probability of a given test result in people with the disease (D) to the probability of the same result in people without the disease (D-). For a positive result,

P(+ve | D) / P(+ve | D-) = [TP/(TP+FN)] / [FP/(FP+TN)]
LR+ = Sensitivity / (1 – Specificity)

This is the positive likelihood ratio (LR+).

In the same way, there is a negative likelihood ratio (LR-),
P(-ve | D) / P(-ve | D-) = [FN/(TP+FN)] / [TN/(FP+TN)]
LR- = (1 – Sensitivity) / Specificity

Note that both these ratios don’t depend on the prevalence of the disease but on the measurement techniques. A likelihood ratio of close to 1 means that the particular test has little influence on determining whether the patient has the suspected condition or not. Likelihood ratios > 10 and < 0.1 are considered to provide robust evidence for and against the diagnoses, respectively.
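A minimal sketch of both ratios, together with the rule of thumb above, might look like this. The function names are hypothetical, and the counts (TP = 381, FP = 54, TN = 549, FN = 16) are borrowed from the diagnostic-tool example in the post on posterior odds.

```python
def likelihood_ratios(tp, fp, tn, fn):
    """Return (LR+, LR-) from the four cells of a confusion matrix."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_plus = sensitivity / (1 - specificity)
    lr_minus = (1 - sensitivity) / specificity
    return lr_plus, lr_minus


def interpret(lr):
    """Apply the > 10 and < 0.1 rule of thumb to a likelihood ratio."""
    if lr > 10:
        return "robust evidence for the diagnosis"
    if lr < 0.1:
        return "robust evidence against the diagnosis"
    return "less decisive evidence"


lr_plus, lr_minus = likelihood_ratios(tp=381, fp=54, tn=549, fn=16)
print(f"LR+ = {lr_plus:.1f}: {interpret(lr_plus)}")
print(f"LR- = {lr_minus:.3f}: {interpret(lr_minus)}")
```

Note that no prevalence enters the calculation; only the test's sensitivity and specificity do.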


Two-Proportions Z-Test

A survey revealed the following information on the prevalence of eye disease. Check if the difference in prevalence is statistically significant.

Residence   Eye disease: Yes   Eye disease: No   Total
Rural       24                 276               300
Urban       15                 485               500

z = \frac{p_1 - p_2}{\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}}

n1: sample size of population 1 = 300
n2: sample size of population 2 = 500
p1: sample proportion for population 1 = 24/300
p2: sample proportion for population 2 = 15/500

z = \frac{0.08 - 0.03}{\sqrt{0.08(1-0.08)/300 + 0.03(1-0.03)/500}} =  2.87

The critical z is 1.96 for a two-sided test at the 5% significance level (95% confidence).
Since z = 2.87 > 1.96, the difference in the prevalence of eye disease between the rural and urban populations is statistically significant.
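The same test can be sketched in Python with only the standard library. It uses the unpooled standard error from the formula above; the two-sided p-value, which the post does not report, is added here as an extra and comes from the normal distribution via `math.erfc`.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for two proportions using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se


def two_sided_p(z):
    """Two-sided p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# Rural: 24 of 300 with eye disease; urban: 15 of 500.
z = two_proportion_z(24, 300, 15, 500)
p_value = two_sided_p(z)
print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")
```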


Enigmatic Possibilities

The Enigma machine was an electromechanical device used by the Germans in World War II to mechanise encryption. The device was about the size of a typewriter and had two sets of letters: a keyboard for input and a lampboard for output. The message was encrypted letter by letter.

The Enigma machine was a large circuit. It had the following components.

  1. Rotors 1, 2, and 3. They connected the criss-cross wires from one letter to another. These three rotors were selected from a total of five.
  2. The reflector connected 26 letters into 13 pairs.
  3. The plugboard connected some letters into pairs, and some were left unconnected. In one version, it connected 20 letters into ten pairs and left six unpaired.

So what are the total possibilities?

1) 3 chosen from 5 (and order matters) => 5!/2! = 60.
2) Three rotors, each with 26 possible starting letters => 26 x 26 x 26 = 17,576 possibilities.
3) 10 pairs from 26 possible letters => 26!/(6! x 10! x 2^10) = 150,738,274,937,250. The 10! appears because the order of the ten pairs does not matter, and the 2^10 because a pair AB is indistinguishable from BA, for each of the 10 pairs.

Multiply all three, and you get the number of possible ways to set the Enigma machine! That equals 1.589626e+20.
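The three factors can be checked with exact integer arithmetic. This sketch uses `math.perm` (available from Python 3.8); the variable names are my own.

```python
import math

# 1) Ordered choice of 3 rotors out of 5: 5!/2!
rotor_orders = math.perm(5, 3)

# 2) Starting letter of each of the three rotors.
rotor_positions = 26 ** 3

# 3) Ten unordered pairs on the plugboard, six letters left unpaired.
plugboard = math.factorial(26) // (
    math.factorial(6) * math.factorial(10) * 2 ** 10
)

total = rotor_orders * rotor_positions * plugboard
print(f"{rotor_orders} x {rotor_positions:,} x {plugboard:,} = {total:,}")
```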

158,962,555,217,826,360,000 (Enigma Machine): Numberphile


Pascal’s Wager

Think about this game. There is a 1 in 1000 chance of winning a prize of $1 billion. The price of the ticket is $1. Will you take the gamble? It is definitely a good deal to buy the ticket. You only risk a dollar but get a chance to win a billion (an expected value of a million dollars).
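The expected value quoted for this game is simple to verify. The sketch below uses exact fractions to avoid rounding; the net figure after subtracting the ticket price is my own addition.

```python
from fractions import Fraction

p_win = Fraction(1, 1000)      # chance of winning
prize = 1_000_000_000          # prize, in dollars
ticket = 1                     # ticket price, in dollars

expected_prize = p_win * prize          # expected winnings per ticket
expected_net = expected_prize - ticket  # after paying for the ticket

print(f"expected prize: ${int(expected_prize):,}")
print(f"expected net gain: ${int(expected_net):,}")
```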

Pascal used a similar argument to state that belief in god was a better deal than not doing so. He argues:
Proposition 1: God exists
Proposition 2: God doesn’t exist
If god exists and you believe, the payoff is infinitely good
If god exists and you don’t believe, the payoff is much worse
On the other hand, if god doesn’t exist, regardless of whether you believe in it or not, the payoffs (positive and negative) are finite. So, he argues, believing is a better deal.

                  God exists (G)    God does not exist (¬G)
Belief (B)        infinite gain     finite loss
Disbelief (¬B)    infinite loss     finite gain

Based on the payoff matrix, there is only one rational (!) decision: choose B.

Pascal’s wager: Wiki
PHILOSOPHY – Religion: Pascal’s Wager: Wireless Philosophy


The Probability of Steroid Team

A country has two teams of weightlifters; in one, 80% use steroids regularly, and in the other, only 20% use them. The head coach flips a coin and selects the team for the international meet. At the venue, if one lifter was selected at random for the drug test and found positive, what is the probability that the team is the steroid one?

We will use the base form of Bayes’ theorem – the relationship between conditional and joint probabilities.

P(S|T) = P(S & T) / P(T)
S – it is the steroid team
T – the tested member used steroids
C – it is the clean team

P(S & T) = P(S) x P(T|S) = 0.5 (coin toss) x 0.8 (chance of using steroids, given he is from the steroid team) = 0.4
P(T) = P(S) x P(T|S) + P(C) x P(T|C) = 0.5 x 0.8 + 0.5 x 0.2 = 0.5
P(S|T) = 0.4/0.5 = 0.8

The probability that the team is the steroid one is 80%
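The calculation above translates directly into code. This is a small sketch with variable names of my own choosing.

```python
p_s = 0.5            # P(S): coin toss picks the steroid team
p_c = 0.5            # P(C): coin toss picks the clean team
p_t_given_s = 0.8    # P(T|S): positive test, given the steroid team
p_t_given_c = 0.2    # P(T|C): positive test, given the clean team

joint_s_and_t = p_s * p_t_given_s          # P(S & T)
p_t = joint_s_and_t + p_c * p_t_given_c    # P(T), total probability
posterior = joint_s_and_t / p_t            # P(S|T)

print(f"P(S|T) = {posterior:.2f}")
```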


Martin and Big Fish

The story of Martin and Big Fish, taken from ‘An Introduction to Probability and Inductive Logic’ by Hacking, is about risk and insurance.

Martin sells clothes on the streets. His daily sales are typically about $300, and his costs are $100. Since he is not registered as a vendor at that location, he gets tickets from the authorities for illegal sales. The fine is $100, and he estimates that he is fined about two times in his 5-day week.

The daily expected value of his work is:
(2/5) x (300 – 100 -100) + (3/5) x (300 – 100) = $160.

Now, Big Fish finds Martin and offers him a stall at a daily rent of $50. Martin’s new return becomes 300 – 100 – 50 = $150. Should he agree?

It is a trade-off for Martin; his expected profit comes down by $10 a day, but he runs no risk now. It is possible that the number of raids increases in future. The same can happen with the fine amount. By paying the additional $10, he replaces the risk with certainty.
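Martin's two options can be compared in a few lines. The sketch below uses exact fractions, and the variable names are illustrative only.

```python
from fractions import Fraction

sales, cost, fine, rent = 300, 100, 100, 50
p_fine = Fraction(2, 5)   # fined on about two of five working days

# Street selling: a risky daily profit.
expected_street = (
    p_fine * (sales - cost - fine) + (1 - p_fine) * (sales - cost)
)

# Renting the stall from Big Fish: a certain daily profit.
certain_stall = sales - cost - rent

# What Martin gives up each day, on average, to remove the risk.
price_of_certainty = expected_street - certain_stall

print(f"street: ${expected_street}, stall: ${certain_stall}")
print(f"price of certainty: ${price_of_certainty} per day")
```

Changing `p_fine` or `fine` shows how quickly the stall becomes the better deal if raids or fines increase.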

Reference

An Introduction to Probability and Inductive Logic by Ian Hacking


The Principle of Insufficient Reason

The principle of insufficient reason, also known as the principle of indifference, states that if you have a bunch of theories and no reason to prefer one of them, they all get the same prior probability.

1) What is the probability that the trillionth digit of pi is 5? Well, until you do the calculations, the prior probability is 1 in 10.

2) Andy knows his friend Becky will arrive at the City airport between 9:00 and 10:00. Five airlines land between these timings: airlines A and B at terminal 1, C and D at terminal 2, and E at terminal 3. What should Andy do? With no reason to prefer any airline, each gets a prior of 1/5. He can eliminate terminal 3 (the lowest probability, 1/5) and then toss a coin to decide between terminals 1 and 2 (equal prior probabilities of 2/5 each).
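The airport example amounts to giving each airline the same prior and summing by terminal. A minimal sketch, with a hypothetical dictionary layout:

```python
from collections import defaultdict
from fractions import Fraction

# Which terminal each airline uses, as described above.
airline_terminal = {"A": 1, "B": 1, "C": 2, "D": 2, "E": 3}

# Principle of indifference: every airline gets the same prior.
prior = Fraction(1, len(airline_terminal))

terminal_prob = defaultdict(Fraction)   # Fraction() starts at 0
for airline, terminal in airline_terminal.items():
    terminal_prob[terminal] += prior

for terminal in sorted(terminal_prob):
    print(f"terminal {terminal}: {terminal_prob[terminal]}")
```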


Electricity Production – Power – Energy Gap

We have seen the carbon intensity of the various national electric grids in the previous post. India is one of the countries with a reasonable growth of renewables – 40% installed power of non-fossil fuel-based electricity – yet with one of the higher carbon intensities in the group with 632 gCO2/kWh. We use that example to explain the difference between power and energy.

Power vs Energy

Power, measured in W, kW, MW, etc., is the capacity of the generator to deliver electric energy. Energy, measured in Wh, kWh, MWh, etc., is what the machine actually delivers to do work. For example, if a one MW system runs for one hour, it produces 1 MWh of energy. In other words, a 1 MW system delivers 8.76 GWh of energy a year if it works full-time (1 x 24 x 365 = 8,760 MWh). But if the same generator works only 10% of the time, it produces 876 MWh.
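The conversion from power to annual energy is just power multiplied by running hours. A small sketch, with a hypothetical function name:

```python
HOURS_PER_YEAR = 24 * 365   # 8,760 hours in a non-leap year


def annual_energy_mwh(power_mw, fraction_of_time=1.0):
    """Energy (MWh) delivered in a year by a plant of the given power (MW)."""
    return power_mw * HOURS_PER_YEAR * fraction_of_time


full_time = annual_energy_mwh(1)         # runs all year
part_time = annual_energy_mwh(1, 0.10)   # runs 10% of the time

print(f"full-time: {full_time:,.0f} MWh ({full_time / 1000:.2f} GWh)")
print(f"10% of the time: {part_time:,.0f} MWh")
```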

Capacity factor

We have encountered it before. It is the actual amount of energy obtained (in MWh) in an average hour of the year if you install a one MW plant. You can get it by dividing the actual electricity output by the maximum possible output.
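As a sketch of this definition, the capacity factor is the actual annual energy divided by what the plant would produce running at full power all year. The plant size and output below are hypothetical numbers, not Indian grid data.

```python
HOURS_PER_YEAR = 24 * 365


def capacity_factor(energy_mwh, installed_mw, hours=HOURS_PER_YEAR):
    """Actual energy divided by the maximum possible energy."""
    return energy_mwh / (installed_mw * hours)


# Hypothetical plant: 100 MW installed, 219,000 MWh produced in a year.
cf = capacity_factor(energy_mwh=219_000, installed_mw=100)
print(f"capacity factor = {cf:.0%}")
```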

Let’s look at India’s electricity production (excluding utility and captive Power).

And the installed power,

You can see the issue: the share of non-fossil-fuel-based sources in the installed power is in the 40s (per cent), whereas their share of the energy delivered is only in the 20s. The capacity factors are estimated by dividing the energy produced by the energy that the installed power would deliver if the generator ran 24 hours a day throughout the year.

Note the low capacity factor for the gas generators. It is not an inherent problem of gas turbines but is likely due to controlled production as a flexible means to manage the peak load requirements.

Reference

CO2 Emissions in 2022: IEA
Electricity production: Enerdata
Carbon Dioxide Emissions From Electricity: world-nuclear.org
Greenhouse gas emissions: Our World in Data
Electricity Mix: Our World in Data
Electricity sector in India: Wiki
Renewable energy in India: Wiki


Electricity Production – Power and Energy

The global emissions of CO2, which is about three-quarters of all greenhouse gases, stood at 36.8 Gt in 2022. A third of the CO2 comes from power production. Reduction of CO2 intensity, therefore, is crucial for a few reasons. First, it reduces the present emissions. More importantly, a cleaner grid catalyses future decarbonisation of other industries via electrification.

The carbon intensity of electric grids, expressed as grams of CO2 per kWh of electricity produced, is presented below.

You can see in the plot that the global average is ca. 436.34 gCO2/kWh. Coupling that with 28,528 terawatt-hours (TWh) of electricity production in 2022, you get 436.34 (gCO2/kWh) * 28,528 (TWh) / 1e6 = 12.45 Gt CO2.
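The unit conversion behind that figure is easy to get wrong, so here is a sketch that spells it out step by step. The two input numbers are the ones quoted above; the variable names are my own.

```python
intensity_g_per_kwh = 436.34   # global average carbon intensity
production_twh = 28_528        # global electricity production, 2022

KWH_PER_TWH = 1e9              # 1 TWh = 1e9 kWh
GRAMS_PER_GT = 1e15            # 1 Gt = 1e9 tonnes = 1e15 g

production_kwh = production_twh * KWH_PER_TWH
emissions_g = intensity_g_per_kwh * production_kwh
emissions_gt = emissions_g / GRAMS_PER_GT

print(f"CO2 from electricity: {emissions_gt:.2f} Gt")
```

The 1e9 and 1e15 factors combine into the single division by 1e6 used in the text.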

There are two commonly used units for the power production of an area – energy produced and the installed power. And they often cause some confusion. That is next.

Reference

CO2 Emissions in 2022: IEA
Electricity production: Enerdata
Carbon Dioxide Emissions From Electricity: world-nuclear.org
Greenhouse gas emissions: Our World in Data
Electricity Mix: Our World in Data
Electricity sector in India: Wiki
Renewable energy in India: Wiki
