April 2024

The Poisson Cars and Binomial Hires

A car hire firm typically receives an average of 3 hiring requests per day. What is the probability that it gets at most two hiring requests on exactly 3 days in a week?

The first part of the problem (getting at most 2 requests in a day) can be solved using the Poisson probability model. It involves a random variable, X, that takes non-negative integer values, and all we need is its expected (average) value, lambda. The probability is expressed as:

P(X = s) = \frac{e^{-\lambda}\lambda^s}{s!}

Now, substitute lambda = 3 and s for at most 2 requests, i.e., the chance of 0 requests + 1 request + 2 requests.

P(X \le 2) = \frac{e^{-3} 3^0}{0!} + \frac{e^{-3} 3^1}{1!} + \frac{e^{-3} 3^2}{2!} = e^{-3} + 3e^{-3} + \frac{9}{2}e^{-3} = 0.423

This can be easily estimated using the R code:

ppois(2, 3, lower.tail = TRUE)

This is the daily probability of at most 2 car hire requests. To estimate the probability of exactly 3 such days in a week, we use the binomial model.

P(X = 3) = \binom{7}{3} p^3 (1-p)^{7-3} = \binom{7}{3} (0.423)^3 (1-0.423)^{4} = 0.294

Or, in R:

dbinom(3, 7, prob = ppois(2, 3, lower.tail = TRUE))


Law of Truly Large Numbers

In their paper ‘Methods for Studying Coincidences,’ Diaconis and Mosteller propose the law of truly large numbers, which states that, with a large enough number of independent samples, almost any outrageous event is bound to occur!

Imagine an event that happens with a probability of 0.1%, or 0.001. The chance that it doesn’t happen in a single trial is therefore 0.999. If you carry out 100 independent trials, the probability of it never occurring is 0.999^100 ≈ 0.905. In other words, there is a 1 − 0.905 ≈ 0.095, or roughly 10%, chance of occurrence. The following is the plot of this from 1 to 10,000 trials.

You can see that beyond, say, 5000 independent trials, this rare event is all but certain to occur at least once (the probability exceeds 99%).
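The curve is easy to reproduce with a couple of lines of R (a minimal sketch; the helper name `p_at_least_once` is ours):

```r
# Probability that an event with chance 0.001 occurs at least once in n trials
p_at_least_once <- function(n, p = 0.001) 1 - (1 - p)^n

p_at_least_once(100)   # ~0.095, roughly a 10% chance
p_at_least_once(5000)  # ~0.993, all but certain
```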


Confidence Interval vs Credible Interval

A confidence interval is the frequentist’s way of communicating the range of values within which the actual (population) parameter sits. A 90% confidence interval implies that if you take 20 random samples from the same target population with the same sample size, on average 18 of the resulting confidence intervals will cover the true population mean. In the frequentist’s view, the parameter is fixed; it is the interval that varies from sample to sample.

The Bayesian, on the other hand, does not insist on a fixed parameter and is happy to treat it as an unknown quantity. She assigns a probability distribution to it, and a range of values of that (posterior) distribution forms the credible interval. A 90% credible interval is a range that covers 90% of the area of the posterior distribution.

For example, in the following posterior distribution, there is a 90% plausibility that the parameter lies between 0.9 and 11.2; the shaded area = 0.9.
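As a sketch of how such an interval is computed, assume (purely for illustration; this is not the distribution plotted above) a Beta(12, 4) posterior. The 90% equal-tailed credible interval is bounded by the 5th and 95th percentiles:

```r
# 90% equal-tailed credible interval for an assumed Beta(12, 4) posterior
lower <- qbeta(0.05, 12, 4)
upper <- qbeta(0.95, 12, 4)
c(lower, upper)

# 90% of the posterior area lies between the two bounds, by construction
pbeta(upper, 12, 4) - pbeta(lower, 12, 4)  # 0.9
```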


Lewis Carroll’s Pillow Problem

Here is one of Lewis Carroll’s Pillow Problems (problem #5):

A bag contains a counter, known to be either white or black. A white counter is put in, the bag is shaken, and a counter is drawn out, which proves to be white. What is now the chance of drawing a white counter?

We will use Bayes’ theorem to get the required probability.

\\ P(\text{W other}|\text{W taken}) = \frac{P(\text{W taken}|\text{W other}) \, P(\text{W other})}{P(\text{W taken}|\text{W other}) \, P(\text{W other}) + P(\text{W taken}|\text{B other}) \, P(\text{B other})} \\ = \frac{1 \times 1/2}{1 \times 1/2 + 1/2 \times 1/2}

= (1/2)/(3/4) = 2/3 ≈ 0.67
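The answer can also be checked by simulation. This R sketch conditions on the event that the first counter drawn is white:

```r
# Simulate Lewis Carroll's pillow problem
set.seed(7)
itr <- 100000
results <- replicate(itr, {
  other <- sample(c("W", "B"), 1)    # the unknown counter already in the bag
  bag <- sample(c(other, "W"))       # add a white counter and shake
  # keep only trials where the first draw is white; record the remaining counter
  if (bag[1] == "W") bag[2] else NA
})
mean(results == "W", na.rm = TRUE)   # ~2/3
```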


Bayesian Persuasion

Persuasion is the act by which one person (the sender) tries to convince another (the receiver) to decide in the sender’s favour. Suppose the receiver is a judge and the sender is the prosecutor. The prosecutor aims to make the judge convict 100% of the defendants. But the judge knows that only a third of the defendants are guilty. Can the prosecutor persuade the judge to get more than 33% of the decisions in her favour? If the judge is rational, what should the prosecutor’s strategy be?

Suppose the prosecutor has the research report and knows the truth. She can follow one of three strategies.

Strategy 1: Always guilty

The prosecutor reports that the defendant is guilty 100% of the time, irrespective of what happened. In this process, the prosecutor loses credibility, and the judge resorts to the prior probability of a person being guilty, which is 33%. The result? Always acquit the defendant. The prosecutor’s incentive is 0. 

Strategy 2: Full information

The prosecutor keeps it simple: report what the research finds. That makes her credibility 100%, and the judge will follow the report, convicting 33% and acquitting 67% of the defendants. The prosecutor’s incentive is 0.33.

Strategy 3: Noisy information

Here, when the research finds the defendant guilty, always report guilty. When it suggests the defendant is innocent, report guilty slightly less than 50% of the time and innocent the rest of the time. Let these fractions be 3/7 for guilty and 4/7 for innocent.

From the judge’s perspective, if she sees an ‘innocent’ report from the prosecutor, she will acquit the defendant. The proportion of the time this happens is (2/3) x (4/7) = 8/21, or about 38%. Remember, 2/3 of the defendants are innocent! On the other hand, she will apply Bayes’ rule if she sees a guilty report. The probability that the defendant is guilty, given that the prosecutor provided a guilty report, P(g|G-R), is

P(g|G-R) = P(G-R|g) x P(g) / [P(G-R|g) x P(g) + P(G-R|i) x P(i)]
= 1 x (1/3) /[1 x (1/3) + (3/7) (2/3)]
= (1/3)/(13/21) = 0.54

The judge will convict the defendant since the probability is > 50%. So, the overall conviction rate is 1 − 8/21 = 13/21, or about 62%. The prosecutor’s incentive is 0.62.
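The arithmetic is easy to verify in R; the exact conviction rate works out to 13/21:

```r
# Noisy-information strategy, checked numerically
p_guilty <- 1/3              # prior: a third of defendants are guilty
p_report_g_innocent <- 3/7   # chance of a 'guilty' report for an innocent defendant

# Probability that the prosecutor issues a 'guilty' report
p_report_g <- 1 * p_guilty + p_report_g_innocent * (1 - p_guilty)
p_report_g                   # 13/21, ~0.62: every such report ends in conviction

# Judge's posterior of guilt after a 'guilty' report (Bayes' rule)
1 * p_guilty / p_report_g    # 7/13, ~0.54 > 0.5, so the judge convicts
```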

Conclusion

So, persuasion is the act of exploiting the sender’s information edge to influence the receiver’s decision-making. As long as the sender mixes up the flow of information to the judge, she can maximise the decisions in her favour, in this case raising the conviction rate from 33% to about 62%.

Emir Kamenica and Matthew Gentzkow, “Bayesian Persuasion,” American Economic Review 101 (October 2011): 2590–2615.


Response Bias

This type of bias is common in surveys, where individuals’ answers tend to be inaccurate or unrepresentative of the population. It can significantly impact the research; we will look at some common types here.

Voluntary response bias

The people who respond to a survey differ from the general population because of their personal experience. A typical example is star ratings, in which people with extreme experiences, either highly satisfied or highly dissatisfied, tend to respond more often than those with an average experience.

Social response bias

Also known as the social desirability bias, this bias occurs when individuals choose to respond in a way that makes them look good in front of others. In the end, good behaviour is overreported, and bad behaviour is underreported. 

Non-response bias

This occurs when the people who participate in the survey are systematically different from those who don’t. A telephone survey, say via landline, is an example: it reaches only the people who are at home during the calling hours.


The Rating Problem

Here is the rating summary of a product:
Good – 40%
Average – 10%
Poor – 50%
Looking at this summary, how do you know which view represents the actual quality of the product?

Can we conclude that the probability of the product being good equals 0.4, average 0.1, and poor 0.5? Although that is what we want from a rating system, we must realise that these numbers may not represent the absolute (marginal) probabilities of quality but conditional probabilities, e.g., the probability that the product is good given that a person has rated it. In other words,

P(Good|Rated) = 0.4
P(Average|Rated) = 0.1
P(Poor|Rated) = 0.5

From this information, we can estimate the actual probabilities, P(Good), P(Average) and P(Poor) using Bayes’ theorem. 

P(Good|Rated) = P(Rated|Good) x P(Good) / P(Rated)
P(Average|Rated) = P(Rated|Average) x P(Average) / P(Rated)
P(Poor|Rated) = P(Rated|Poor) x P(Poor) / P(Rated)
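As a sketch, suppose (hypothetically; these response rates are not given above) we knew how likely each group is to leave a rating, i.e., P(Rated|quality). Bayes’ theorem then recovers the actual probabilities:

```r
# Recovering P(quality) from rating shares. The response rates
# P(Rated|quality) below are assumed values, purely for illustration.
share       <- c(Good = 0.4, Average = 0.1, Poor = 0.5)   # observed P(quality|Rated)
p_rate_give <- c(Good = 0.2, Average = 0.05, Poor = 0.4)  # assumed P(Rated|quality)

# Bayes' theorem rearranged: P(quality) is proportional to
# P(quality|Rated) / P(Rated|quality); normalise so the three sum to 1
w <- share / p_rate_give
p_quality <- w / sum(w)
round(p_quality, 2)  # Good 0.38, Average 0.38, Poor 0.24
```

With these assumed response rates, the large share of poor ratings mostly reflects dissatisfied customers rating more often, not a mostly poor product.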


P-Hacking

P-hacking is an often malicious practice in which the analysis is chosen based on what makes the p-value significant. Before going into detail, let’s recall the definition of the p-value: it is the probability of observing an effect at least as extreme as the one seen, assuming the null hypothesis is true. In other words, if we choose 5% as the critical p-value for rejecting a null hypothesis, about 1 in 20 tests will produce a spectacular finding even when there is none.

So what happens if a researcher carries out several tests and reports only the one with ‘shock value’, without mentioning the other, non-significant tests? That is an example of p-hacking.
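A quick R simulation makes the point: run many t-tests on pure noise, where no real effect exists, and the fraction of ‘significant’ results settles near the 5% cutoff, about one false discovery per 20 tests.

```r
# 10,000 t-tests comparing two samples of pure noise
set.seed(11)
p_values <- replicate(10000, t.test(rnorm(30), rnorm(30))$p.value)

# Fraction of tests declared 'significant' at the 5% level
mean(p_values < 0.05)  # close to 0.05 even though no effect exists
```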

References

P-Hacking: Crash Course Statistics: CrashCourse
Data dredging: Wiki
The method that can “prove” almost anything: TED-Ed


Confusion of the Inverse

What is the safest place to be if you drive a car, closer to home or far away?

Take this statistic: 77.1 per cent of accidents happen within 10 miles of drivers’ homes. You can do a Google search on the topic and read several proposed reasons, ranging from overconfidence to distraction. So, you conclude that driving closer to home is dangerous.

However, the above statistic is useless if you seek a safe place to drive, because what you wanted was the probability of an accident given that you are near or far from home, P(accident|closer to home). What you got instead was the probability that you are closer to home given that you have had an accident, P(closer to home|accident). Look at the following two scenarios. Note that P(closer to home|accident) = 77% in both cases.

Scenario 1: More drive closer to home

              Home   Away   Total
Accident        77     23     100
No Accident   1000    200    1200

Here, out of the 1300 people, 1077 drive closer to home.
P(accident|closer to home) = 77/1077 = 0.07
P(accident|far from home) = 23/223 = 0.10
Home is safer. 

Scenario 2: More drive far from home

              Home   Away   Total
Accident        77     23     100
No Accident   1000    500    1500

Here, out of the 1600 people, 1077 drive closer to home.
P(accident|closer to home) = 77/1077 = 0.07
P(accident|far from home) = 23/523 = 0.04
Home is worse.
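Both scenarios can be tabulated in R to compute the conditional probabilities directly:

```r
# The two scenarios; P(accident|location) differs between them even
# though P(closer to home|accident) is 77% in both
scenario_1 <- rbind(accident = c(home = 77, away = 23),
                    no_accident = c(home = 1000, away = 200))
scenario_2 <- rbind(accident = c(home = 77, away = 23),
                    no_accident = c(home = 1000, away = 500))

# P(accident|location) = accidents in that column / column total
p_accident <- function(tab) tab["accident", ] / colSums(tab)
round(p_accident(scenario_1), 2)  # home 0.07, away 0.10: home is safer
round(p_accident(scenario_2), 2)  # home 0.07, away 0.04: home is worse
```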

This is known as the confusion of the inverse, a common misinterpretation of conditional probability. The statistic only sampled people who had been in accidents. Not convinced? What would you conclude from the following? Tens of millions of people have died in motor accidents over the last 50 years, while only 19 people have died during space travel. Does that make space a safer place to travel than Earth?


Annie’s Table Game – The End Game

In the last exercise, we found that the frequentist solution heavily underpredicted Becky’s chances of winning the table game. This time, we will see how much that depended on the sample size. So, they continued the game but played 80 matches in total—Annie winning 50 to Becky’s 30. What are Becky’s chances of winning the next three games?

Frequentist solution 

This is no different from the previous post: based on the results, with 30 wins in 80 matches, Becky’s probability of winning a game is 3/8. That means the probability of Becky winning the next three games is (3/8)^3 = 0.053.

Bayesian solution 

Run the R program we developed last time:

library(DescTools)
x <- seq(0, 1, 0.01)
# Posterior probability that Becky wins the next three games: the integral of
# x^3 times the likelihood, divided by the integral of the likelihood
AUC(x, choose(80, 30) * x^33 * (1 - x)^50, from = min(x, na.rm = TRUE), to = max(x, na.rm = TRUE)) /
AUC(x, choose(80, 30) * x^30 * (1 - x)^50, from = min(x, na.rm = TRUE), to = max(x, na.rm = TRUE))
# 0.057

And the Simulations

itr <- 1000000
beck_win <- replicate(itr, {
  # Draw a candidate win probability for Becky from the uniform prior
  beck_pro <- runif(1)
  # Likelihood of the observed 30 wins in 80 matches at this probability
  p_30_80 <- dbinom(30, 80, beck_pro)

  # Accept the candidate with probability proportional to the likelihood
  if (runif(1) < p_30_80) {
    # Becky wins the next three games with probability beck_pro^3
    if (runif(1) < beck_pro^3) "Becky" else "Annie"
  } else {
    "no"  # candidate rejected; discard this draw
  }
})

sum(beck_win == "Becky") / (sum(beck_win == "Becky") + sum(beck_win == "Annie"))
# 0.058

Well, when the number of data points is large, the frequentist answer approaches the Bayesian (simulated) one.
