Risk ratio and Odds Ratio

What is the risk of you getting lost in the barrage of jargon used by statisticians? What are the odds of the earlier statement being true? Risks, odds and their corresponding ratios are terms used by statisticians to mesmerise non-statisticians.

Risk is probability, p

In medical studies, the word risk means probability. For example, if one person has cancer in a population of 1000 people, we say the risk of cancer in that society is 1/1000, or 0.001. For coin flipping, our favourite hobby, the risk of getting a head is 1/2 = 0.5, and for rolling a die, the risk of getting a 3 is 1/6, or 0.167. You may call it the absolute risk, because you will soon see something that is not absolute (called relative), so be prepared.

Odds are p/(1-p)

Odds are the probability of an event occurring divided by the probability of the event not occurring. Odds are the bettors' favourite. The odds of cancer in the earlier fictitious society are (1/1000)/(999/1000) = 1/999 ≈ 0.001. The number appears similar to the risk, which is only a coincidence due to the small value of the probability. For coin tossing, the odds of heads are 0.5/0.5 = 1, and for the die, 0.167/0.833 = 0.2. Conversely, the odds of getting anything but a 3 are (5/6)/(1/6) = 0.833/0.167 = 5.

Titanic survivors

Sex   | Died | Survived | Risk
Men   | 1364 | 367      | 1364/(1364+367) = 0.79
Women | 126  | 344      | 126/(126+344) = 0.27

Sex   | Death | Survival | Odds
Men   | 0.79  | 1-0.79   | 0.79/(1-0.79) = 3.76
Women | 0.27  | 1-0.27   | 0.27/(1-0.27) = 0.37

Risk shuttles between 0 and 1; odds, on the other hand, run from 0 to infinity. When the risk moves above 0.5, the odds cross 1.

Now, the ratios, RR and OR

Risk Ratio (RR) is the same as Relative Risk (RR). If the risk of cancer in one group is 0.002 and in another is 0.001, then RR = 0.002/0.001 = 2. The RR of losing a die roll (not getting a 3) relative to losing a coin toss is (5/6)/(1/2) = 1.7. In the Titanic example, the RR of death for men relative to women is 0.79/0.27 = 2.93.

The Odds Ratio (OR) is the ratio of odds. The odds ratio for the Titanic is 3.76/0.37 = 10.16.
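The arithmetic above is easy to verify with a short script. Here is a minimal Python sketch using the raw Titanic counts from the table; note that working from the unrounded risks gives 2.94 and 10.15, very close to the 2.93 and 10.16 obtained from the rounded values.

```python
def risk(events, total):
    """Risk (probability): events / total."""
    return events / total

def odds(p):
    """Odds: p / (1 - p)."""
    return p / (1 - p)

# Titanic counts: (died, survived)
men_died, men_survived = 1364, 367
women_died, women_survived = 126, 344

risk_men = risk(men_died, men_died + men_survived)          # ~0.79
risk_women = risk(women_died, women_died + women_survived)  # ~0.27

rr = risk_men / risk_women                # risk ratio, ~2.94
or_ = odds(risk_men) / odds(risk_women)   # odds ratio, ~10.15

print(f"RR = {rr:.2f}, OR = {or_:.2f}")
```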


When Mother Became Nature

Sexual selection is a topic that has invoked some controversy among evolutionary biologists. Darwin characterised sexual selection as differences in the ability to produce offspring; natural selection, on the other hand, is about the struggle for existence.

Sexual selection is a combination of many factors. It could be a male-male struggle to reach a female, females snubbing males with certain features, or simply mating with certain males leading to weaker or no offspring.

mtDNA and NRY know it all

Whatever the precise reason, complex DNA analysis and computation have now established that, historically, more females and fewer males have participated in the development of the human race. In other words, throughout human history (leaving aside modern times, when females started moving with their partners), fewer men participated in reproduction, although there is no reason to believe that the numbers of men and women in the population were different. The imbalance reached an extreme around 8000 years ago, when the female-to-male effective population ratio was about 17!

References

Lippold et al., Human paternal and maternal demographic histories: insights from high-resolution Y chromosome and mtDNA sequences, Investigative Genetics 2014, 5:13

A recent bottleneck of Y chromosome diversity coincides with a global change in culture, Genome Res. 2015 Apr; 25(4): 459-466


What the Eyewitness saw

We have seen earlier that much of the evidence, depending on its nature, gives only moderate separation between the probability distributions of the guilty and the innocent. Eyewitness evidence plays a pivotal role in the trial process. The pioneering work of Elizabeth Loftus reveals a lot about the fallibility of memory and the malleability of the brain by misinformation.

It’s in the wording

The first experiment is about people's ability to estimate. Participants were asked to guess the height of a basketball player. One group was asked, "How tall was the player?" and the other, "How short was the player?" The 'tall' group estimated a higher number on average than the 'short' group; the difference between the two estimates was about 15 mm!

In the second experiment, 100 students were shown a video involving motor accidents and then asked a few questions, of which 6 were test questions: three about events that happened in the movie and three about events that did not. Half of the subjects got questions framed using 'a', such as "Did you see a …?"; the other half were asked using 'the', such as "Did you see the …?". Far more people responded yes to the 'the' questions than to the 'a' questions, irrespective of whether those events happened in the movie or not.

The role of presupposition

It works by asking one question followed by a second one. The purpose of the first question is to plant a seed in the participant's mind that influences the answer to the subsequent one. Forty undergraduates at the University of Washington were shown a 3-minute video taken from the film "Diary of a Student Revolution". At the end, they were given a questionnaire with 19 filler questions and one key question. Half of the people got the question, "Was the leader of the four demonstrators a male?" and the other half, "Was the leader of the twelve demonstrators a male?". A week later, the subjects returned to answer 12 questions, among which one key question was, "How many demonstrators did you see in the movie?". The people who had been asked about "twelve" gave an average answer of 8.85, whereas the "four" group gave 6.4.

And the result?

The results make descriptions by witnesses one of the least reliable forms of evidence for separating the guilty from the innocent. Do you remember the d' of 0.8 from the earlier post?

Loftus, E. F., Cognitive Psychology 7, 560-572

Elizabeth Loftus: Wiki


Justice and the Use of Prior Beliefs

The last two posts ended on rather pessimistic notes about the possibility of establishing justice in a complex world of overlapping pieces of evidence. We end the series with one last technique, the beta parameter, and check whether it offers better hope of overcoming some inherent issues in separating signal from noise.

Beta comes from signal detection theory, and it is the ratio of likelihoods, i.e. P(xi|G)/P(xi|I). P(xi|G) is the probability of the evidence given that the person is guilty, and P(xi|I), given that she is innocent.

Let us start from Bayes’ rule,

\\ P(G|x_i) = \frac{P(x_i|G)*P(G)}{P(x_i|G)*P(G) + P(x_i|I)*P(I)} \\ \\  P(I|x_i) = \frac{P(x_i|I)*P(I)}{P(x_i|I)*P(I) + P(x_i|G)*P(G)} \\ \\  \frac{P(G|x_i)}{P(I|x_i)} = \frac{P(x_i|G)*P(G)}{P(x_i|I)*P(I)} \text{, or} \\ \\ \frac{P(G|x_i)*P(I)}{P(I|x_i)*P(G)} = \frac{P(x_i|G)}{P(x_i|I)} = \beta

So, beta equals the posterior odds of guilt multiplied by the prior odds of innocence.

For a situation with a likelihood ratio of 1, if the prior belief, P(G), is lower, the jury is less likely to raise a false alarm. Graphically, this means moving the vertical line to the right and achieving higher accuracy in preventing false alarms (at the expense of more misses).
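This criterion shift can be sketched numerically. The snippet below assumes the two equal-variance normal curves used elsewhere in this series (innocent ~ N(5, 1.52), guilty ~ N(10, 1.52)); for such Gaussians, the evidence level where the posterior odds of guilt equal 1 has a closed form, and lowering P(G) pushes it to the right.

```python
from math import log, erf, sqrt

MU_I, MU_G, SIGMA = 5.0, 10.0, 1.52  # assumed from the series' plots

def criterion(p_guilty):
    """Evidence level x where the posterior odds of guilt equal 1.
    Solves f(x|G)/f(x|I) = beta = P(I)/P(G) for equal-variance normals."""
    beta = (1 - p_guilty) / p_guilty
    return (MU_G + MU_I) / 2 + SIGMA**2 * log(beta) / (MU_G - MU_I)

def norm_cdf(x, mu, sigma):
    """Cumulative normal probability via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

for p_g in (0.5, 0.1):
    c = criterion(p_g)
    false_alarm = 1 - norm_cdf(c, MU_I, SIGMA)  # innocent beyond the criterion
    miss = norm_cdf(c, MU_G, SIGMA)             # guilty below the criterion
    print(f"P(G)={p_g}: criterion={c:.2f}, FA={false_alarm:.3f}, miss={miss:.3f}")
```

Dropping the prior P(G) from 0.5 to 0.1 moves the criterion from 7.5 to about 8.5, cutting false alarms from 5% to about 1% while misses grow, exactly the trade-off described above.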

The sad truth is that none of these techniques is helping to reduce the overall errors in judgement.

Do juries meet our expectations?: Arkes and Mellers


Elusive Justice

In the last post, we saw the probability distributions of evidence for the guilty and the innocent and their overlapping nature. It offered a simple picture, but the message demonstrated the difficulty of establishing justice for all. Today, we go a step deeper by invoking signal detection theory, a framework that helps detect signals amid noise.

One technique to raise the level of justice (reduce misses and false alarms) is to increase the distance between the two distribution curves. Let's look at it quantitatively: imagine you want no more than 5% misses and 5% false alarms; then the separation required between the two curves should look like the following picture.

The red dotted line passes through the 95th quantile of the green curve (calculated by the R formula qnorm(0.95, 5, 1.52), where 5 is the mean and 1.52 is the standard deviation). You already know the meaning of 95%: it is 1.64 standard deviations away from the mean (equivalent to a one-sided confidence interval). The line should also match the left 5% of the blue curve (qnorm(0.05, 10, 1.52)). One way to quantify the separation is to estimate the distance between the two distributions as a multiple of the standard deviation of the innocent.

d' = \frac{\mu_g - \mu_i}{\sigma_i} \\ \\ \mu_g \text{ = mean of guilty, } \mu_i \text{ = mean of innocent, and } \sigma_i \text{ = standard deviation of innocent}

For the above plot, the separation is 3.3 standard deviations. If you wanted a more just system with a maximum of 1% errors, you would need a detachment of 4.7.
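The 3.3 and 4.7 can be reproduced with Python's standard library; statistics.NormalDist is used here in place of R's qnorm, and the means (5 and 10) and standard deviation (1.52) are the ones assumed in the plot.

```python
from statistics import NormalDist

innocent = NormalDist(mu=5, sigma=1.52)
guilty = NormalDist(mu=10, sigma=1.52)

# Criterion at the 95th quantile of the innocent curve (R: qnorm(0.95, 5, 1.52));
# it also sits at the 5th quantile of the guilty curve (R: qnorm(0.05, 10, 1.52)).
criterion = innocent.inv_cdf(0.95)       # ~7.50

# Required separation, in units of the innocent sd, for at most 5% / 1% errors:
d_5pct = 2 * NormalDist().inv_cdf(0.95)  # ~3.29
d_1pct = 2 * NormalDist().inv_cdf(0.99)  # ~4.65

print(f"criterion = {criterion:.2f}, d'(5%) = {d_5pct:.1f}, d'(1%) = {d_1pct:.1f}")
```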

Are 3.3 and 4.7 realistic?

Unfortunately, the answer is no! Look at some indicators that experimentalists have found. For spotting lies, the d' values obtained by studies stand at 0! d' ranged from 0.6 to 0.9 for the polygraph test; the best-in-class gave close to 2.5, but that was exceptional.

Then come eyewitnesses: researchers did a meta-analysis of 120 studies and found d’ values of 0.8 on facial identification. Following is how d’ = 0.8 appears.

Medical tests typically have higher d' values but still fall short of the 3.3 required (CT scans 3.0, mammograms 1.3, etc.), suggesting that a 5% error level is a difficult target to achieve. What more can we do? We will see next.

Do juries meet our expectations?: Arkes and Mellers

Rationality: Steven Pinker


Justice for All

We all like to see justice done in every trial. What is justice? In simple language, it is the conviction of the guilty and the acquittal of the innocent. In court, jurors encounter facts, testimonies, and arguments for and against the defendant. The number and variety of pieces of evidence make them resemble random, independent events and turn the hypothesis in front of the jury (that the accused is guilty or not) into a distribution!

Overlapping Evidence

To illustrate the complexity of decision making, see the following two overlapping distributions.

You can imagine four possibilities formed by such overlapping curves. The right-hand tail of the innocent line (green) that enters the guilty (blue) region leads to the conviction of the innocent. The opposite, acquittal of the guilty, happens for the left-hand tail of the guilty line that enters the not-guilty region. The dotted line at the crossover of the two curves represents the default decision position, or the point of optimal justice. At that junction, false alarms and misses occur with equal frequency.
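As a numerical sketch (assuming, for illustration, two equal-variance normal curves like the N(5, 1.52) and N(10, 1.52) used elsewhere in this series), the crossover of two equal-variance curves sits midway between the means, and the false-alarm and miss frequencies there are indeed equal:

```python
from statistics import NormalDist

innocent = NormalDist(mu=5, sigma=1.52)   # assumed illustrative parameters
guilty = NormalDist(mu=10, sigma=1.52)

# For equal variances, the two pdfs cross midway between the means.
crossover = (innocent.mean + guilty.mean) / 2   # 7.5

false_alarm = 1 - innocent.cdf(crossover)  # innocent judged guilty
miss = guilty.cdf(crossover)               # guilty acquitted

print(f"crossover = {crossover}, FA = {false_alarm:.3f}, miss = {miss:.3f}")
```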

10 guilty for 1 innocent

If the judge believes in Blackstone's formulation (better that ten guilty persons escape than that one innocent suffer), she will move her position to the right, as in the following plot.

The jury is willing to miss more of the guilty to ensure fewer innocents are convicted. The opposite happens if there is a zero-tolerance policy towards the guilty; real-world examples are many, especially when it comes to crimes against the national interest.

Errors and mitigations

So, what can jurors do to reduce the number of errors? We will look at more theoretical treatments and suggestions in the next post.

Do juries meet our expectations?: Arkes and Mellers


Binomial-Beta

Shaquille's story continues. Last time, we made assumptions about each other's feelings (hypothesis is the polished word!) about Shaq's chances of entering the White House. Those assumptions were arbitrary: 0.9 and 0.2 for the probability of success, p. What about other values? There are infinitely many of them between 0 and 1. This time, we make no such assumptions and take as many as possible using a continuous hypothesis generator.

For that purpose, we will use the beta distribution function. Why beta? There is a reason for that, but we will know only towards the end. For now, we focus on what it can do for us more than what beta is. The beta distribution function can give a wide variety of probabilities for the entire range of p values. See three typical types (beta pdf uses two characteristic parameters, alpha and beta):

And three variety types:

The first reason to choose the beta distribution function (don't get confused; this is not the beta function, which is a different beast) is that it takes the range of hypotheses (p) we are after as its input, i.e. 0 to 1.

What should Shaq and his friend choose?

Remember, Shaq and his friend have strong, opposing views about getting inside the White House, and they are betting! So, which shape of the beta distribution should they choose? They are unlikely to use the consensus-driven types, the curves that bulge in the middle. Who would bet if both opinions converged on a single idea? So, they chose alpha = 0.5 and beta = 0.5 as their prior hypothesis.

Bayes’s to help after the misadventure

We know what happened: Shaq failed to get inside without a prior appointment, so the outcome was zero successes in the first attempt. Now, we predict what happens if he tries again (without actually doing it and finding out). That is where Bayesian inference comes in handy. It is also the second reason for choosing the beta distribution function. We write down Bayes' rule: posterior = likelihood x prior / sum of all possibilities. In the world of continuous functions, integrals replace sums.

\\ beta(\alpha_{posterior}, \beta_{posterior} | data) = \frac{likelihood * beta(\alpha_{prior}, \beta_{prior}) }{\int likelihood * beta(\alpha_{prior}, \beta_{prior})} \\ \\ \text{likelihood is binomial distribution function, as seen in the previous post} \\\\ \text{likelihood} = f(s; n,p) = \binom{n}{s} p^s (1-p)^{n-s}

Then a miracle happens: a complicated equation applied over the prior beta function results in a posterior beta function with a few minor modifications. The posterior is:

\\ beta(\alpha_{posterior}, \beta_{posterior}) = beta(\alpha_{prior} + s, \beta_{prior} + n - s) \\ \\ \text{in the present case} \\ \\ beta(\alpha_{prior}, \beta_{prior}) = beta(0.5, 0.5) \\ \\ beta(\alpha_{posterior}, \beta_{posterior}) = beta(0.5 + 0, 0.5 + 1 - 0) = beta(0.5, 1.5) \\ \\
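The conjugate update is simple enough to express in a few lines of Python. This sketch only covers the parameter update and the resulting distribution means (plotting the full pdf would need scipy):

```python
def beta_update(alpha_prior, beta_prior, successes, trials):
    """Conjugate beta-binomial update: successes add to alpha, failures to beta."""
    return alpha_prior + successes, beta_prior + trials - successes

def beta_mean(alpha, beta):
    """Mean of a beta(alpha, beta) distribution: alpha / (alpha + beta)."""
    return alpha / (alpha + beta)

# Shaq's bet: prior beta(0.5, 0.5), one attempt (n = 1), zero successes (s = 0)
a_post, b_post = beta_update(0.5, 0.5, successes=0, trials=1)

print(a_post, b_post)              # posterior parameters: 0.5 1.5
print(beta_mean(0.5, 0.5))         # prior mean: 0.5
print(beta_mean(a_post, b_post))   # posterior mean: 0.25
```

The posterior mean dropping from 0.5 to 0.25 is the quantitative version of the betting odds turning against Shaq.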

New betting scheme

You know why s was zero and n was one: Shaq made one attempt (n) and had no success (s)! How does the new scenario, beta(0.5, 1.5), look? Here is its shape:

Putting together

The updated chance of Shaq getting inside the White House has come down after incorporating the information that he failed in his first attempt.

Shaq Denied Entrance: Washington Post

Bayesian Statistics for Beginners: Donovan and Mickey


Shaq Goes to the White House

Shaquille O'Neal, popularly known as Shaq, is a basketball player and four-time NBA champion. He once bet with his friend about getting into the White House without prior permission. The wager was 1000 push-ups.

What are Shaq's chances of success?

We can use the binomial probability mass function (PMF) to estimate Shaq's chances.

f(s; n,p) = \binom{n}{s} p^s (1-p)^{n-s}

Where n is the number of trials, s is the number of successes, and p is the probability of success.

Since Shaq was confident enough to bet, one can imagine that he would have assigned a higher value of p (say 0.9) to himself, whereas his friend would think lower (say 0.2). If Shaq has one chance to show up, called a Bernoulli trial, the probabilities associated with those two views are:

\\ f(1; 1,0.9) = \binom{1}{1} 0.9^1 (1-0.9)^{1-1} = 0.9 \text { and} \\ \\    f(1; 1,0.2) = \binom{1}{1} 0.2^1 (1-0.2)^{1-1} = 0.2

If Shaq makes three attempts, the probabilities, as per his view, are:

\\ f(s; 3,0.9) = \binom{3}{s} 0.9^s (1-0.9)^{3-s} \\ \\ \text{for s = 0, no success} \\ f(0; 3,0.9) = \binom{3}{0} 0.9^0 (1-0.9)^{3-0} = 0.001 \\ \\ \text{for s = 1, 1 success} \\ f(1; 3,0.9) = \binom{3}{1} 0.9^1 (1-0.9)^{3-1} = 0.027 \\ \\ \text{for s = 2, 2 successes} \\ f(2; 3,0.9) = \binom{3}{2} 0.9^2 (1-0.9)^{3-2} = 0.243 \\ \\ \text{for s = 3, 3 successes} \\ f(3; 3,0.9) = \binom{3}{3} 0.9^3 (1-0.9)^{3-3} = 0.729
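These numbers are easy to check with a few lines of Python; math.comb supplies the binomial coefficient.

```python
from math import comb

def binom_pmf(s, n, p):
    """Binomial PMF: probability of s successes in n trials, success probability p."""
    return comb(n, s) * p**s * (1 - p)**(n - s)

# Shaq's view (p = 0.9) over three attempts
for s in range(4):
    print(s, round(binom_pmf(s, 3, 0.9), 3))  # 0.001, 0.027, 0.243, 0.729

# The friend's view of a single attempt (p = 0.2): a Bernoulli trial
print(binom_pmf(1, 1, 0.2))  # 0.2
```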

The probability densities for Shaq and his friend, in graphical format, are:

Was Shaq successful?

No, he was not. He tried once but was stopped at the gate by security! What would have happened had he made another attempt? More about that story and the use of Bayesian inference in the next post.

Shaq Denied Entrance: Washington Post


Has UK Weathered the Storm?

This post follows up on the one made in December. Covid's latest variant, omicron, was a storm that wreaked havoc across the world. The full extent of the calamity is not yet known, but there is a widespread feeling that it was milder.

The UK, one of the countries that has captured and shared Covid data from day 1, provides answers to some of those questions. Here is an update on what has happened since last time. Note that the death counts for the last few days are incomplete.

Critical parameters such as hospitalisations and deaths have been rescaled – both x and y axes – to coincide with the reported cases before the vaccine.


Who Wrote paper No. 54?

The Federalist papers were published anonymously in 1787-88 by Alexander Hamilton, John Jay and James Madison. Of the 77 essays, it is generally agreed that Jay wrote 5, Hamilton 43 and Madison 14. The remaining papers were either written jointly by Hamilton and Madison or by one of the two (not Jay). The problem was solved by Mosteller and Wallace using a Bayesian approach built on the Poisson distribution.

We will go through their approach using a simple Bayes' rule. Consider paper no. 54. The starting point was the style of writing. Both Hamilton and Madison used similar styles, so it was difficult to get an answer that easily. The authors then looked at the usage of specific words, such as by, from, to, while, whilst, war, etc. We take one such word: upon. The frequency distribution of upon, collected from a set of papers published by Hamilton and Madison (including ones outside The Federalist), is given below.

Rate per 1000 | Hamilton | Madison
0             | 0        | 41
(0,1]         | 1        | 7
(1,2]         | 10       | 2
(2,3]         | 11       | 0
(3,4]         | 11       | 0
(4,5]         | 10       | 0
(5,6]         | 3        | 0
(6,7]         | 1        | 0
(7,8]         | 1        | 0
Total         | 48       | 50

Bayesian Problem

Let's formulate the problem using upon as the tag word.
In paper no. 54, the word upon appears 2 times in 2004 words, so the rate of upon is 0.99 per 1000 words, falling in the (0,1] bin.
P(Hamilton|upon = 0.99) = P(upon = 0.99|Hamilton) * P(Hamilton) / [P(upon = 0.99|Hamilton) * P(Hamilton) + P(upon = 0.99|Madison) * P(Madison)]

P(upon = 0.99|Hamilton) = 1/48, based on the frequency table
P(upon = 0.99|Madison) = 7/50, based on the frequency table
P(Hamilton) could be 43/77 based on the known authorship data, but we take 0.5
P(Madison) could be 14/77 based on the known authorship data, but we take 0.5

P(Hamilton|upon = 0.99) = (1/48 * 0.5)/(1/48 * 0.5 + 7/50 * 0.5) = 0.13, or 13%. Naturally, P(Madison|upon = 0.99) = 1 - P(Hamilton|upon = 0.99) = 0.87, or 87%.
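The calculation translates directly into Python; the likelihoods are the counts from the frequency table above.

```python
def posterior_hamilton(like_h, like_m, prior_h=0.5, prior_m=0.5):
    """Bayes' rule for two competing authors, given the likelihoods of the
    observed 'upon' rate under each author."""
    return (like_h * prior_h) / (like_h * prior_h + like_m * prior_m)

# The rate of 'upon' in paper no. 54 falls in the (0,1] bin:
# 1 of Hamilton's 48 papers and 7 of Madison's 50 papers fall there.
p_h = posterior_hamilton(1/48, 7/50)

print(f"P(Hamilton|upon) = {p_h:.2f}, P(Madison|upon) = {1 - p_h:.2f}")
```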

The Federalist Papers: No. 54

Inference in an Authorship Problem: Mosteller and Wallace, Journal of the American Statistical Association, 58 (302), 275-309
