Decision Making

Justice and the Use of Prior Beliefs

The last two posts ended on rather pessimistic notes about the possibility of establishing justice in a complex world of overlapping evidence. We end the series with one last technique, the beta parameter, and check whether it offers better hope of overcoming the inherent difficulty of separating signal from noise.

Beta comes from signal detection theory; it is the ratio of likelihoods, P(xi|G)/P(xi|I), where P(xi|G) is the probability of the evidence given the person is guilty, and P(xi|I) the probability given she is innocent.

Let us start from Bayes’ rule,

P(G|x_i) = \frac{P(x_i|G)\,P(G)}{P(x_i|G)\,P(G) + P(x_i|I)\,P(I)} \\ \\ P(I|x_i) = \frac{P(x_i|I)\,P(I)}{P(x_i|I)\,P(I) + P(x_i|G)\,P(G)} \\ \\ \frac{P(G|x_i)}{P(I|x_i)} = \frac{P(x_i|G)\,P(G)}{P(x_i|I)\,P(I)} \quad \text{or} \quad \frac{P(G|x_i)\,P(I)}{P(I|x_i)\,P(G)} = \frac{P(x_i|G)}{P(x_i|I)} = \beta

So, beta equals the posterior odds of guilt multiplied by the prior odds of innocence.

At a likelihood ratio of 1, a lower prior belief of guilt, P(G), makes the jury less likely to raise a false alarm. Graphically, this means moving the vertical criterion line to the right and achieving higher accuracy in preventing false alarms (at the expense of more misses).
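A small sketch of how beta behaves at the criterion. The normal evidence distributions below, with means 5 (innocent) and 10 (guilty) and standard deviation 1.52, are illustrative values borrowed from the plots in this series, not data:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def beta(criterion, mu_g=10, mu_i=5, sigma=1.52):
    """Likelihood ratio P(x|G) / P(x|I) evaluated at the decision criterion."""
    return normal_pdf(criterion, mu_g, sigma) / normal_pdf(criterion, mu_i, sigma)

# At the crossover point the ratio is 1; moving the criterion to the right
# raises it, which is what a lower prior of guilt demands: beta = P(I)/P(G).
print(round(beta(7.5), 2))  # 1.0
print(round(beta(8.5), 2))
```

Shifting the line right thus trades misses for fewer false alarms, exactly as the plot suggests.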

The sad truth is that none of these techniques helps reduce the overall errors in judgement.

Do juries meet our expectations?: Arkes and Mellers

Justice and the Use of Prior Beliefs Read More »

Elusive Justice

In the last post, we saw perceived culpability as overlapping probability distributions of evidence. It was a simple picture, but the message demonstrated the difficulty of establishing justice for all. Today, we go a step deeper by invoking signal detection theory, a framework for separating signals from noise.

One technique to raise the level of justice (reduce misses and false alarms) is to increase the distance between the two distribution curves. Let's look at it quantitatively: if you want no more than 5% misses and 5% false alarms, the separation required between the two curves looks like the following picture.

The red dotted line passes through the 95th percentile of the green curve (calculated by the R formula qnorm(0.95, 5, 1.52), where 5 is the mean and 1.52 the standard deviation). You already know what 95% means – it is 1.64 standard deviations from the mean (equivalent to a one-sided confidence interval). The same line must also cut off the left 5% of the blue curve (qnorm(0.05, 10, 1.52)). One way to quantify the separation is to measure the distance between the two distributions in units of the standard deviation of the innocent.

d' = \frac{\mu_g - \mu_i}{\sigma_i} \\ \\ \mu_g \text{ = mean of guilty, } \mu_i \text{ = mean of innocent, and } \sigma_i \text{ = standard deviation of innocent}

For the above plot, the separation is 3.3 standard deviations. If you wanted a more just system with at most 1% errors, you would need a separation of 4.7.
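These numbers can be checked with Python's standard library; statistics.NormalDist plays the role of R's qnorm here, and the means 5 and 10 with sd 1.52 are the illustrative values from the plot:

```python
from statistics import NormalDist

# The criterion line from the plot: the 95th percentile of the innocent
# curve and the 5th percentile of the guilty curve meet at the same point.
print(round(NormalDist(5, 1.52).inv_cdf(0.95), 1))   # 7.5
print(round(NormalDist(10, 1.52).inv_cdf(0.05), 1))  # 7.5

def required_dprime(miss_rate, false_alarm_rate):
    """Separation, in sd units, that keeps both tail errors at their targets."""
    z = NormalDist()  # standard normal
    return z.inv_cdf(1 - false_alarm_rate) - z.inv_cdf(miss_rate)

print(round(required_dprime(0.05, 0.05), 1))  # 3.3
print(round(required_dprime(0.01, 0.01), 1))  # 4.7
```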

Are 3.3 and 4.7 realistic?

Unfortunately, the answer is no! Look at the indicators experimentalists have found. For spotting lies, studies put d' at 0! For the polygraph test, d' ranged from 0.6 to 0.9; the best-in-class came close to 2.5, but that was exceptional.

Then come eyewitnesses: a meta-analysis of 120 studies found d' values of around 0.8 for facial identification. The following shows how d' = 0.8 appears.

Medical tests typically have higher d' values, but most still fall short of 3 (CT scans about 3.0, mammograms 1.3, etc.), suggesting that a 5% error level is a difficult target to achieve. What more can we do? We will see next.

Do juries meet our expectations?: Arkes and Mellers

Rationality: Steven Pinker

Elusive Justice Read More »

Justice for All

We all like to see justice done in every trial. What is justice? In simple language, it is the conviction of the guilty and the acquittal of the innocent. In court, jurors encounter facts, testimonies, and arguments for and against the defendant. The number and variety of pieces of evidence make them resemble random, independent events, and turn the hypothesis in front of the judge (that the accused is guilty or not) into a distribution!

Overlapping Evidence

To illustrate the complexity of decision making, see the following two distributions (I chose a uniform distribution as an example).

You can imagine four possibilities formed by such overlapping curves. The right-hand tail of the innocent curve (green) that enters the guilty (blue) region leads to the conviction of the innocent. The opposite – acquittal of the guilty – happens in the left-hand tail of the guilty curve that enters the innocent region. The dotted line at the crossover of the two curves represents the default decision criterion, the point of optimal justice: at that junction, false alarms and misses occur with equal frequency.

10 guilty for 1 innocent

If the judge believes in Blackstone’s formulation, she will move her position to the right, as in the following plot.

The jury is willing to miss more of the guilty to ensure fewer innocents are convicted. The opposite happens under a zero-tolerance policy towards the guilty; real-world examples abound, especially for crimes against national interest.
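The criterion shift can be made concrete with a short sketch. The normal curves below (means 5 and 10, sd 1.52) are illustrative parameter values, not data from any real trial:

```python
from statistics import NormalDist

innocent = NormalDist(5, 1.52)   # evidence scores for the innocent
guilty = NormalDist(10, 1.52)    # evidence scores for the guilty

def error_rates(criterion):
    """False-alarm and miss rates for a given decision criterion."""
    false_alarm = 1 - innocent.cdf(criterion)  # innocent convicted
    miss = guilty.cdf(criterion)               # guilty acquitted
    return false_alarm, miss

# Neutral criterion at the crossover: equal error rates
print([round(r, 3) for r in error_rates(7.5)])  # [0.05, 0.05]

# Blackstone-style shift to the right: fewer false alarms, more misses
print([round(r, 3) for r in error_rates(9.0)])
```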

Errors and mitigations

So, what can jurors do to reduce the number of errors? We will look at more theoretical treatments and suggestions in the next post.

Do juries meet our expectations?: Arkes and Mellers

Justice for All Read More »

Who Wrote paper No. 54?

The Federalist papers were published anonymously in 1787-88 by Alexander Hamilton, John Jay and James Madison. Of the 77 essays, it is generally agreed that Jay wrote 5, Hamilton 43 and Madison 14. The remaining papers were written either jointly by Hamilton and Madison or by one of the two (not Jay). Mosteller and Wallace solved the problem using the Bayesian approach, with the Poisson distribution.

We will go through their approach using simple Bayes' rule. Consider paper no. 54. The starting point was writing style, but Hamilton and Madison wrote in similar styles, so that alone could not settle the question. The authors then looked at the usage of specific words, such as by, from, to, while, whilst, war, etc. Take one such word: upon. The frequency distribution of upon, collected from a set of papers known to be by Hamilton or Madison (including ones outside The Federalist), is given below.

Rate / 1000    Hamilton    Madison
0              0           41
(0,1]          1           7
(1,2]          10          2
(2,3]          11          0
(3,4]          11          0
(4,5]          10          0
(5,6]          3           0
(6,7]          1           0
(7,8]          1           0
Total          48          50

Bayesian Problem

Let's formulate the problem using upon as the tag word. In paper 54, the word upon occurs 2 times in 2004 words, so the upon rate is 0.99 per 1000 words, which falls in the (0,1] bin.

P(Hamilton|upon = 0.99) = P(upon = 0.99|Hamilton) * P(Hamilton) / [P(upon = 0.99|Hamilton) * P(Hamilton) + P(upon = 0.99|Madison) * P(Madison)]

P(upon = 0.99|Hamilton) = 1/48, from the frequency table
P(upon = 0.99|Madison) = 7/50, from the frequency table
P(Hamilton) could be 43/77 based on the known authorship data, but we take 0.5
P(Madison) could be 14/77 for the same reason, but we take 0.5

P(Hamilton|upon = 0.99) = (1/48 * 0.5)/(1/48 * 0.5 + 7/50 * 0.5) = 0.13, or 13%. Naturally, P(Madison|upon = 0.99) = 1 – P(Hamilton|upon = 0.99) = 87%.
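The computation above can be sketched in a few lines of Python, with the counts taken straight from the frequency table and equal priors as in the post:

```python
# The (0,1] rate bin holds 1 of Hamilton's 48 papers and 7 of Madison's 50
p_upon_h = 1 / 48
p_upon_m = 7 / 50
prior_h = prior_m = 0.5   # uninformative priors

# Bayes' rule: posterior probability that Hamilton wrote paper 54
posterior_h = (p_upon_h * prior_h) / (p_upon_h * prior_h + p_upon_m * prior_m)
print(round(posterior_h, 2))      # 0.13
print(round(1 - posterior_h, 2))  # 0.87 -> Madison
```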

The Federalist Papers: No. 54

Inference in an Authorship Problem: Mosteller and Wallace, Journal of the American Statistical Association, 58 (302), 275-309

Who Wrote paper No. 54? Read More »

The Probability of 100 posts

Reaching one hundred days is a milestone. The general focus of my posts over these days, and likely in the near future, has been understanding risks, differentiating them from perceived risks, and making decisions under uncertainty using prior information, also known as Bayesian thinking. It started from the misery of seeing how the responsible agents of society – journalists, political leaders, or whoever influences the public – ignore critical thinking and present a distorted view of reality.

Probability and evolution

A subplot closely associated with probability and randomness is evolutionary biology. It is hard to comprehend, yet the truth about life is that it moved from one juncture to another through chance. Evolution looks misleadingly improbable if you view the probability in hindsight. Changes happen through random errors, but one at a time, so that every step in the process is a high-probability event. Indeed, the probability of getting an error at one specified location among the body's 30 trillion cell divisions is close to zero, but the probability of getting one somewhere unspecified is close to one. In other words, the designer has deeper trouble explaining a move than a drunken gambler!

Biases and fallacies

Next up are our biases and fallacies. The title of this post already suggests two of them – survivorship bias and the fallacy of hindsight. The probability of delivering an uninterrupted string of 100 articles in 100 days is small, and I would never have chosen the present title had I missed a post on one of those days. Now that it has happened (luck, effort or something else), I can claim I accomplished a low-probability event. As long as I have the power to change the title until the last minute, I am fine. But scheduling a post today, for 100 days from now, with a caption of 200 days, is risky and therefore not a wise thing to do if you are risk-averse.

Determinism or probability

Why does the subject of probability matter so much when we can understand physical processes deterministically? We grew up in a deterministic world, one that taught us about actions and reactions, causes and effects. However, we also deal with situations where the outcomes are uncertain, which is the realm of probability: the impact of lifestyle on health, the growth of wealth in the market, the action of viruses on people with varying levels of immunity, the possibility of earthquakes, droughts – the list is endless. You could argue that the complexity of the variables and the gaps in our understanding demand stochastic reasoning.

Updating of knowledge and Bayesian thinking

However imperfect it may be, Bayesian thinking is second nature to us. You are watching an NBA game in its last seconds, and your team is trailing by a point. Your team gets two free throws. What is in your mind when Steph Curry steps up for them? Contrast that with Ben Simmons. That is intuitive Bayesian thinking. It is no surprise that you are more at ease with Curry's 91% free-throw success rate than with Simmons' 60%. You may not remember the numbers, but you know them by gut feeling.

Yet, being rational requires constant training. Your innate Bayesian is too vulnerable to biases and fallacies. You miss base rates, confuse the probability of the hypothesis given the evidence with that of the evidence given the hypothesis, or overestimate the prior due to recency bias. Surviving the sea of probability is hard, fighting the winds of lies and the sirens of misinformation. So what do you prefer: put wax in your ears, tie yourself tight to the mast, or gain the rational power to listen to the music and persist in the voyage?

The Probability of 100 posts Read More »

The Heuristics Fight Back

This post supports heuristics as a practical tool for decision making. The proponents of heuristics score a few points when it comes to firefighting situations or dealing with a world of randomness, where deep subject knowledge offers no added advantage. Yet heuristics suffer from shortcomings that prevent them from pulling the world out of its biases and fallacies. I will end this article with some observations on the book by Gigerenzer et al.

What Are Heuristics?

A heuristic is a method of solving practical problems without extensive rational analysis. While it is not random guesswork, it is closer to an educated guess. It draws on the characteristic evolutionary traits of our species in responding to external stimuli. One popular book close to this description is Blink by Malcolm Gladwell.

Claims of Heuristics

Proponents of heuristics claim superior decision quality from a minimum amount of information. They frequently use adjectives such as simple, practical and minimalist to describe the technique. In the rest of the post, I will focus on the Gigerenzer et al. book, Simple Heuristics That Make Us Smart.

Toolkits and Workflows

The book opens with an example of managing heart attack victims with the help of a simple checklist. What you see in the list are a few simple questions, noting the patient's systolic blood pressure, age, and heartbeat. Follow this short and sweet classification hierarchy, and you have made a decision.

The book conveniently ignores how those few vital parameters ended up on the list and what history-matching (not the so-called experience) was done to select them. In other words, the checklist is not something the specialist made up on the spot, however appealing that thought may be. Nor does it ignore quantitative information, as the authors appear to claim: each question in the checklist encodes prior knowledge (data), the pillar of Bayesian thinking. It is like presenting pre-cooked meals as an invention that replaces a complex cooking process. The simplicity is an illusion for the customer; someone (or a machine) still had to cook somewhere.
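The checklist is essentially a fast-and-frugal decision tree. Here is a sketch of its logic in Python; the cues and thresholds below follow the commonly quoted version of the heart-attack triage tree and should be treated as illustrative, not as clinical guidance:

```python
def classify_risk(systolic_bp, age, sinus_tachycardia):
    """Fast-and-frugal tree: each cue alone can settle the decision."""
    if systolic_bp <= 91:      # first cue: low blood pressure -> high risk
        return "high risk"
    if age <= 62.5:            # second cue: young enough -> low risk
        return "low risk"
    # third cue decides the remaining cases
    return "high risk" if sinus_tachycardia else "low risk"

print(classify_risk(85, 70, False))   # high risk
print(classify_risk(120, 55, True))   # low risk
print(classify_risk(120, 70, True))   # high risk
```

Note that the three cues and their cut-offs are exactly the kind of distilled prior data the paragraph above describes: someone had to fit them to outcomes before the tree could look this simple.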

Appeal to authority and Appeal to emotions

It is no coincidence that the authors fall into the trap that has been the main criticism of heuristics – that they cannot avoid logical fallacies. The first page alone gives two examples, starting with a quote attributed to Isaac Newton:

Truth is ever to be found in simplicity, and not in the multiplicity and confusion of things.

Then comes the opening sentence:

A man is rushed to a hospital in the throes of a heart attack. The doctor needs to decide quickly whether the victim should be treated as a low-risk or a high-risk patient.

Simple Heuristics That Make Us Smart, Gerd Gigerenzer, Peter M. Todd, and the ABC Research Group

The subject did not require either of the two quoted passages to prove its point. Heuristics are a practical recipe for decision making.

Final verdict 

The book's proposal to replace the first revolution (computing probabilities, etc.) with the second (an adaptive toolbox of fast and frugal heuristics) is rejected! It is pure short-sightedness. Instead of abandoning probability, the better proposal is to promote heuristics as a practical tool; working side by side with its older sister is no shame.

The quote, "Truth is ever to be found in simplicity…", is utter nonsense; it sounds nice but is far from the truth!

The Heuristics Fight Back Read More »

The problem with Experience

Imagine this: a peaceful city of a million inhabitants wakes up to news of a murder. Its residents, unused to such events, are naturally shocked and see it as a failure of the establishment. The administration recruits a highly specialised cop from another city notorious for crime. That place was in such bad shape that it had 10,000 people and 100 criminals, and the officer had plenty of experience catching them.

The cop knows the strength of facial recognition systems, having had a high success rate with them in his old job. He attributes that success to the high accuracy of the system: a 1% false-positive rate and a 100% true-positive rate (if you are a criminal, you are caught). Is recruiting the top cop a good strategy?

The answer depends on how much of his previous experience the cop is willing to unlearn as he faces the reality of the new city. Let's look at the problem with his background mathematically, using Bayes' theorem.

Violent city: prevalence of criminals P(C) = 100/10000 = 0.01; P(+ve|C) = 1; P(+ve|nC) = 0.01; P(nC) = 1 – P(C). The chance that a person is a criminal, given a facial recognition match, is P(C|+ve) = (1 x 0.01) / [1 x 0.01 + 0.01 x 0.99] = 0.5 = 50%; he was right half the time.

Peaceful city: prevalence of criminals P(C) = 100/1000000 = 0.0001; P(+ve|C) = 1; P(+ve|nC) = 0.01; P(nC) = 1 – P(C). P(C|+ve) = (1 x 0.0001) / [1 x 0.0001 + 0.01 x 0.9999] = 0.01 = 1%.
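The two calculations side by side, as a small Python sketch using the numbers above:

```python
def p_criminal_given_match(prevalence, sensitivity=1.0, false_positive=0.01):
    """Posterior P(C|+ve) via Bayes' rule for a facial recognition match."""
    p_match = sensitivity * prevalence + false_positive * (1 - prevalence)
    return sensitivity * prevalence / p_match

print(round(p_criminal_given_match(100 / 10_000), 2))     # 0.5  (violent city)
print(round(p_criminal_given_match(100 / 1_000_000), 2))  # 0.01 (peaceful city)
```

The same tool, at the same accuracy, drops from a coin flip to a 1-in-100 chance of being right once the base rate of criminals collapses.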

The right level of experience

If the officer leans on his experience and starts using the facial recognition system to check people at random, expect him to catch 99 innocents for every potential criminal. Instead, he should use the tool as supporting evidence against those already caught for other suspicious activities.

Now replace murder with drunk driving, and facial recognition with a breath analyser. The results will be the same as long as the tool is employed for random checks – a lot of innocents get penalised.

The problem with Experience Read More »

The Fallacy of the Inverse

Let us start from where we ended yesterday, the P -> Q problem. Remember the truth table?

P        Q        P -> Q
true     true     true
true     false    false
false    true     true
false    false    true

The way to read the truth table is:

If P is true, Q has to be true. It is called direct reasoning. Q false with P true is a violation of the rule, and therefore if Q is false, P has to be false (indirect reasoning). Finally, if P is not true, Q can be true or false. 

Look again at the earlier rain problem: if it rains, I will take an umbrella. The only impossible combination is rain and no umbrella. The statement "it doesn't rain now, therefore I should not have an umbrella" [1] is not valid; I can carry an umbrella whether it rains or not. The statement "I am carrying an umbrella, therefore it is raining" [2] is also wrong.

Statements [1] and [2] mark two widespread logical errors. [1] is called the fallacy of the inverse. An example: if it's a dog, it has a tail. The fallacy is concluding that if it is not a dog, it cannot have a tail. But what about a cat?

[2] is called the fallacy of the converse. An example: Catholic priests are men; he is a man and, therefore, must be a Catholic priest.

From the examples so far, the inverse and converse errors seem easy to spot and escape, until you reach more complex situations, such as the equation of life: the Bayesian way of interpreting evidence. Take our famous example. Suppose the probability that a test is positive given the person has the disease (the sensitivity of the equipment), P(+|D), is 95%. We know from earlier discussions that this is just one variable in the equation; we need more data to estimate our ultimate quest, P(D|+), the probability of having the disease given a positive test. Yet most people jump to the conclusion that the probability of having the disease is 95%. Re-phrase it in the P -> Q format: if the person has the disease, there is a 95% chance the instrument tests positive (P -> Q). The public presumes the converse – the device has tested positive, therefore the person has a 95% chance of having the disease – which is utter nonsense for a rare disease (low prior).
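A hedged sketch of how far apart P(+|D) and P(D|+) can be. The post gives only the 95% sensitivity; the 5% false-positive rate and the priors below are assumptions for illustration:

```python
def p_disease_given_positive(prior, sensitivity=0.95, false_positive=0.05):
    """P(D|+) via Bayes' rule; equating it with P(+|D) is the converse fallacy."""
    p_positive = sensitivity * prior + false_positive * (1 - prior)
    return sensitivity * prior / p_positive

# The rarer the disease, the further the posterior falls from 95%
for prior in (0.001, 0.01, 0.1):
    print(prior, round(p_disease_given_positive(prior), 3))
```

For a 1-in-1000 disease, the posterior is under 2%, nowhere near the 95% that the converse fallacy suggests.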

Rules of Inference

The Fallacy of the Inverse Read More »

Wason’s trials with logic

Suppose you are shown four cards on a table, each with a number on one side and a letter on the other. The faces showing are D, K, 3 and 6. There is a rule that says: if D is on one side, the other side of the card is 3. Which two cards do you need to turn over to verify the rule?

Wason ran this study in the 1960s with psychology and statistics students and found that a large percentage suggested the cards D and 3 as the leading choices to inspect. But it is pointless to turn over 3: the rule restricts what is behind D, not what 3 can have on its other side. The right answer is D and 6. Problems such as this belong to formal logic, as they are set up in conditional forms (AND, OR, NOT, IF, etc.).

According to Wason, these rules have the form: if P then Q (P -> Q). P is called the antecedent, and Q is the consequent. The only violation of the rule is P together with not Q (P and !Q). The other combinations – not P with Q, and not P with not Q – are not violations.
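A small enumeration confirms Wason's answer: a card is worth turning only if some hidden face could make it violate "D implies 3":

```python
def can_falsify(visible, hidden_options):
    """True if some hidden face would make the card violate 'D implies 3'."""
    def violates(pair):
        return "D" in pair and "3" not in pair
    return any(violates((visible, h)) for h in hidden_options)

letters, numbers = ["D", "K"], ["3", "6"]
# Each visible face, with the possible faces on the hidden side
cards = {"D": numbers, "K": numbers, "3": letters, "6": letters}

print([face for face, hidden in cards.items() if can_falsify(face, hidden)])
# ['D', '6']
```

The 3 card drops out because whatever letter is behind it, the rule survives, which is exactly why turning it over teaches you nothing.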

To put some meaning into P and Q, consider this one: if it rains (P), I will take an umbrella (Q) (P -> Q), end of statement. So what can follow from this? If it doesn't rain (!P), I may or may not take an umbrella, and the rule still holds (!P with Q and !P with !Q are both possible). But if it rains, I cannot be without an umbrella (P with !Q is impossible), because I already made that statement. The truth table summarises all the arguments:

P (rain)    Q (umbrella)    P -> Q
True        True            True
True        False           False
False       True            True
False       False           True

Interestingly, such rules are easier to follow in real life. Imagine a simple instruction to the cashier at the shop counter: allow alcohol only if the person is above 18. The employee knows whom to watch out for – people carrying alcohol to the checkout, and those who appear under 18. Not the man holding apple juice. This is ecological logic, or logic in the real world.

Reasoning about a rule: Wason

Rationality: Steven Pinker

Wason’s trials with logic Read More »

Myopic Discounting

The general preference for short-term rewards over deferred payoffs is well known. We have already seen it in the tests done by Prof. Frederick: subjects choosing between $100 now and $140 a year later, a 30-minute massage in 2 weeks and a 45-minute massage in November, $3400 this month and $3800 next month; the list is endless.

Individuals who go for immediate rewards undervalue prizes achievable in the future. Put differently, they discount the value of the future payoff heavily (if it's money, at a rate higher than what is practically achievable in the market) and demand ever larger benefits before they will wait. The phenomenon is known as temporal discounting; people with high temporal discounting exhibit myopic discounting.
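As a sketch of the arithmetic, simple exponential discounting shows what turning down $140 in a year for $100 today implies; the 40% figure below is the break-even rate implied by that pair, not data from any study:

```python
def present_value(amount, delay_years, annual_discount_rate):
    """Exponential discounting: what a delayed reward is worth today."""
    return amount / (1 + annual_discount_rate) ** delay_years

# Someone indifferent between $100 now and $140 in a year discounts at 40%
# a year; strictly preferring the $100 implies an even steeper, myopic rate.
print(round(present_value(140, 1, 0.40), 2))  # 100.0
# At a market-like 10% the delayed reward is clearly worth waiting for
print(round(present_value(140, 1, 0.10), 2))  # 127.27
```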

A study published in The Journal of Neuroscience (2010) used subjects with lesions in the prefrontal cortex to establish the brain's role in discounting behaviour. Participants included 16 people with brain damage, documented by MRI and CT images, and 20 healthy controls.

The subjects were given various temporal discounting tasks involving fictitious incentives: money, food, vouchers, etc. The results showed a remarkable difference between the healthy subjects and the others. The steeper discounting by people with damage to the medial orbitofrontal cortex (mOFC) suggests the importance of the mOFC in weighing future outcomes during decision making.

Sellitto et al., The Journal of Neuroscience, December 8, 2010, 30(49): 16429–16436.

Myopic Discounting Read More »