July 2023

Negative Binomial Distribution

A fair coin is tossed repeatedly. What is the chance of getting the 3rd head on the 10th toss?

Note the difference here: the question is not asking for the probability of getting three heads in 10 tosses, which can be found using a binomial distribution. It asks for the probability that the third head arrives exactly on the 10th toss, and that belongs to the negative binomial distribution.

Let each trial have a probability of success p (and of failure q = 1−p). We continue the sequence of trials until s successes occur.

The probability of observing the s-th success after f failures (i.e., the s-th success falls on the (s+f)-th trial) is

\binom{s+f-1}{f} * p^s * q^f

For the present problem (s = 3, f = 7, p = 0.5):

\frac{9!}{2! 7!} * (0.5)^3 * (1-0.5)^7 = 0.035

or, in R:

toss_number <- 10
success <- 3
failure <- toss_number - success
dnbinom(failure, size = success, prob = 0.5)
# 0.03515625 (about 0.035)

Here is the distribution of probabilities of success at each milestone.
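The distribution can be reproduced with a few lines of R. This is a minimal sketch, assuming the same fair coin and third head as above; the upper limit of 20 tosses and the variable names are arbitrary choices for illustration.

```r
# Probability that the 3rd head arrives exactly on toss n, for n = 3, ..., 20
r <- 3          # required number of successes (heads)
p <- 0.5        # fair coin
n <- r:20       # toss on which the r-th head could land (upper limit is arbitrary)

# dnbinom() takes the number of failures before the r-th success as its first argument
prob <- dnbinom(n - r, size = r, prob = p)
names(prob) <- n
round(prob, 4)

# Bar chart of the distribution
barplot(prob, xlab = "Toss on which the 3rd head occurs", ylab = "Probability")
```

The entry for toss 10 is the 0.035 calculated earlier.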

Binomial and negative binomial

The key difference between the two is: in the binomial distribution, the number of trials is fixed and the number of successes is a random variable. Whereas in the negative binomial distribution, the opposite is true, viz. the number of successes is fixed and the number of trials is a random variable.
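A short sketch of the contrast for the coin example, using base R's dbinom() and dnbinom(); the variable names are illustrative. Since the third head has to land on the last toss, only a fraction k/n of the "3 heads in 10 tosses" arrangements qualify, which links the two probabilities.

```r
n <- 10   # number of tosses
k <- 3    # number of heads
p <- 0.5  # fair coin

# Binomial: trials fixed at 10, exactly 3 heads anywhere among them
p_binom <- dbinom(k, size = n, prob = p)

# Negative binomial: successes fixed at 3, the 3rd head lands exactly on toss 10
p_nbinom <- dnbinom(n - k, size = k, prob = p)

c(binomial = p_binom, negative_binomial = p_nbinom)

# The last toss must be a head, so p_nbinom = (k / n) * p_binom
all.equal(p_nbinom, (k / n) * p_binom)
```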


The Big “But” Fallacy

The big “but” fallacy involves starting with a generally accepted statement, only to negate it at the end with a “but”. An example is: “Yes, it is wrong to hurt animals, but this time it was different (as I was hungry!)”.

The fallacy is closely related to what is known as “special pleading”. Here, the “but” grants a “special” exception from generally accepted rules or ethics.


A pair of Aces from Four Cards

There are four cards – ace of spades, ace of clubs, ten of spades and seven of clubs.

A♠; A♣; 10♠; 7♣

If I draw two cards at random, what is the probability that I get two aces, given that:
1. At least one of them is an ace
2. One card is the ace of spades

The problem can be solved in different ways, but we choose, as usual, Bayes’ rule.

At least one of them is an ace

P(AA|A_{At1}) = \frac{P(A_{At1}|AA) * P(AA)}{P(A_{At1}|AA) * P(AA) + P(A_{At1}|AA^n) * P(AA^n)}

Here, AA is the event of drawing two aces, AA^n is its complement (not two aces), and A_{At1} is the event that at least one card is an ace. Estimating each term:
Probability of at least one ace, given two aces, P(A_{At1}|AA) = 1
Probability of picking two aces, P(AA) = 1/6 (there are six ways of choosing two cards out of four, and only one of them is the pair of aces)
Probability of at least one ace, given NOT two aces, P(A_{At1}|AA^n) = 4/5 (of the five remaining pairs, only 10♠ with 7♣ has no ace)
Probability of NOT picking two aces, P(AA^n) = 5/6 [P(AA) + P(AA^n) = 1]
Substituting the values,

P(AA|A_{At1}) = \frac{1/6}{1/6 + (4/5)*(5/6)} = \frac{1}{1+4} = \frac{1}{5}

One card is an ace of spades

P(AA|A_{Asp}) = \frac{P(A_{Asp}|AA) * P(AA)}{P(A_{Asp}|AA) * P(AA) + P(A_{Asp}|AA^n) * P(AA^n)}

Here, A_{Asp} is the event that one of the cards is the ace of spades, and AA^n is the event of NOT drawing two aces.
Probability of the ace of spades, given two aces, P(A_{Asp}|AA) = 1
Probability of picking two aces, P(AA) = 1/6 (there are six ways of choosing two cards out of four, and only one of them is the pair of aces)
Probability of the ace of spades, given NOT two aces, P(A_{Asp}|AA^n) = 2/5 (of the five remaining pairs, A♠ with 10♠ and A♠ with 7♣ contain it)
Probability of NOT picking two aces, P(AA^n) = 5/6 [P(AA) + P(AA^n) = 1]
Substituting the values,

P(AA|A_{Asp}) = \frac{1/6}{1/6 + (2/5)*(5/6)} = \frac{1}{1+2} = \frac{1}{3}

R Simulation

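library(stringr)  # for str_detect()

cards <- c("Ace of Spades", "Ace of Clubs", "Ten of Spades", "Seven of Clubs")

itr <- 100000

# Label each simulated two-card draw:
#   "A" = the ace of spades is present and both cards are aces
#   "B" = the ace of spades is present, but not two aces
#   "C" = no ace of spades
shuff <- replicate(itr, {
  draw <- sample(cards, 2, replace = FALSE)

  if (any(str_detect(draw, "Ace of Spades"))) {
    if (all(str_detect(draw, "Ace"))) "A" else "B"
  } else {
    "C"
  }
})

# P(two aces | one card is the ace of spades); should be close to 1/3
sum(shuff == "A") / (sum(shuff == "A") + sum(shuff == "B"))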
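With only six possible hands, both answers can also be checked by exact enumeration instead of simulation. The following is a sketch in base R; the helper names (n_aces, has_spade_ace) are illustrative.

```r
cards <- c("Ace of Spades", "Ace of Clubs", "Ten of Spades", "Seven of Clubs")

# The six equally likely two-card hands
pairs <- combn(cards, 2, simplify = FALSE)

# Number of aces in each hand, and whether the hand holds the ace of spades
n_aces        <- sapply(pairs, function(h) sum(grepl("^Ace", h)))
has_spade_ace <- sapply(pairs, function(h) "Ace of Spades" %in% h)

# P(two aces | at least one ace) = 1/5
p_at_least_one <- sum(n_aces == 2) / sum(n_aces >= 1)

# P(two aces | one card is the ace of spades) = 1/3
p_ace_spades <- sum(n_aces == 2 & has_spade_ace) / sum(has_spade_ace)

c(at_least_one_ace = p_at_least_one, ace_of_spades = p_ace_spades)
```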


Cards from a Deck

If you draw two cards from a well-shuffled deck of cards, what is the probability that you get the ace of hearts and a black card?

There are two different ways this can happen.

  1. An ace of hearts followed by a black card
  2. A black card followed by an ace of hearts

The probability for 1) is (1/52) x (26/51) and for 2) is (26/52) x (1/51). Add them up: (2 x 26)/(51 x 52) = 1/51

If you want to verify the results, you may shuffle the deck a million times and count:

library(stringr)  # for str_detect()

# Build the 52-card deck as strings such as "Ace Hearts" or "Ten Spades"
suits <- c("Diamonds", "Spades", "Hearts", "Clubs")
face <- c("Jack", "Queen", "King")
numb <- c("Deuce", "Three", "Four", "Five", "Six", "Seven", "Eight", "Nine", "Ten")

face_card <- expand.grid(Face = face, Suit = suits)
face_card <- paste(face_card$Face, face_card$Suit)

numb_card <- expand.grid(Numb = numb, Suit = suits)
numb_card <- paste(numb_card$Numb, numb_card$Suit)

Aces <- paste("Ace", suits)

deck <- c(Aces, numb_card, face_card)
itr <- 1000000

# For each simulated draw of two cards, record 1 if it holds the ace of hearts
# together with a black card (spade or club), and 0 otherwise
shuff <- replicate(itr, {
  draw <- sample(deck, 2, replace = FALSE)
  as.integer("Ace Hearts" %in% draw && any(str_detect(draw, "Spades|Clubs")))
})

# Fraction of successful draws; should be close to 1/51 (about 0.0196)
mean(shuff)
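As a cross-check that avoids sampling noise, one could also enumerate all 1326 unordered two-card hands and count the favourable ones. This is a sketch in base R; the deck is rebuilt here with outer() so that the block is self-contained, and is_black is an illustrative helper.

```r
suits <- c("Diamonds", "Spades", "Hearts", "Clubs")
ranks <- c("Ace", "Deuce", "Three", "Four", "Five", "Six", "Seven",
           "Eight", "Nine", "Ten", "Jack", "Queen", "King")

# All 52 cards as strings such as "Ace Hearts"
deck <- as.vector(outer(ranks, suits, paste))

# Every unordered pair of cards: a 2 x 1326 character matrix
hands <- combn(deck, 2)

is_black <- function(x) grepl("Spades|Clubs", x)

# A hand counts if one card is the ace of hearts and the other is black
hit <- (hands[1, ] == "Ace Hearts" & is_black(hands[2, ])) |
       (hands[2, ] == "Ace Hearts" & is_black(hands[1, ]))

sum(hit)    # 26 favourable hands
mean(hit)   # 26 / 1326 = 1/51
```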


The Climate Data – Nasa Power

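library(nasapower)  # get_power()
library(openair)    # windRose()

# Daily temperature (T2M), wind speed (WS10M) and wind direction (WD10M)
# for a single point given as c(longitude, latitude)
NS_data <- get_power(
  community = "re",
  lonlat = c(1.6780, 56.5187),
  pars = c("T2M", "WS10M", "WD10M"),
  dates = c("2021-01-01", "2021-03-31"),
  temporal_api = "daily")

# Keep wind speed and direction, renamed to the column names windRose() expects
Wind_rose <- NS_data[, c("WS10M", "WD10M")]
colnames(Wind_rose) <- c("ws", "wd")

windRose(Wind_rose, paddle = FALSE, breaks = c(1, 5, 10, 15, 20),
         col = c("#4f4f4f", "#0a7cb9", "#f9be00", "#ff7f2f"))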

References

The Power Project: NASA


Drinking and Police

Here is some data on drinking and getting into trouble with the police. Assess the relationship between drinking habits and getting into trouble with the authorities. Does this data provide evidence of an association between drinking and getting into trouble with the police?

                         Never   Occasional   Frequent
Trouble with Police         60          200        420
No trouble with Police    4800         2700       2800
Observation table

The first step is to form the hypothesis. Here is the null hypothesis:

H0 – Drinking habits and getting into trouble with the police are independent.

The alternative is

H1 – Drinking habits and getting into trouble with the police are not independent.

We will use the chi-squared test to test the null hypothesis. It requires the observed data as well as the data expected under the null hypothesis. From the data, the number of people belonging to each of the drinking categories is:

      Never   Occasional   Frequent    Total
#      4860         2900       3220    10980
%     44.26        26.41      29.33      100

So, under ‘normal’ conditions (conditions of independence), one would expect the 680 people who got into trouble with the police, and the 10,300 who did not, to be split across the drinking categories in these same percentages. That gives the expected numbers we need.

                         Never   Occasional   Frequent
Trouble with Police        301          178        200
No trouble with Police    4559         2720       3020
Expectation table

If you add up each column of the expectation table and express the sums as percentages, you get the same split as in the totals.

      Never   Occasional   Frequent
%     44.26        26.41      29.33

It’s time for the chi-squared test, i.e. (observed – expected)^2 / expected, summed over all the cells.

(60 – 301)^2 / 301 + (200 – 178)^2 / 178 + (420 – 200)^2 / 200 + (4800 – 4559)^2 / 4559 + (2700 – 2720)^2 / 2720 + (2800 – 3020)^2 / 3020 = 467

The chi-squared statistic is 467. The degrees of freedom are the product of (number of rows – 1) and (number of columns – 1), i.e. (2-1) x (3-1) = 2. Looking at the probability table, you can find that 467 is far out in the right tail of the distribution, with a probability (p-value) of almost zero. So a table like this would be extremely unlikely to arise by chance if drinking and trouble with the police were independent, and the null hypothesis is rejected.
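The hand calculation can be cross-checked with base R's chisq.test(). This is a sketch using the observation table above; the object names are illustrative. Because R works with unrounded expected counts (for example, about 180 rather than 178 for the Occasional group in trouble), the statistic it reports is close to, but not exactly, the 467 obtained by hand.

```r
observed <- matrix(c(  60,  200,  420,
                     4800, 2700, 2800),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(c("Trouble", "No trouble"),
                                   c("Never", "Occasional", "Frequent")))

test <- chisq.test(observed)

round(test$expected, 1)   # expected counts under independence
test$statistic            # chi-squared statistic
test$parameter            # degrees of freedom
test$p.value              # probability of a statistic at least this large under H0
```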


Asymmetry of Information – Market for Lemon

We have seen information asymmetry before, and it is a source of market failure. Why? Because it violates a fundamental theorem of welfare economics, i.e., “the competitive market will maximise the total social welfare”. In the event of a market failure, the group as a whole is worse off; we have seen one example in the past, i.e., externality.

Insurance is a good example of market failure due to information asymmetry. Another one is “the market for lemons,” which we’ll see in the end.

We saw the owner-mechanic case, where the seller holds superior information. Insurance is a transaction where the opposite happens: the seller lacks information about the buyer (here, the buyer’s health condition). Suppose a target group of customers is 90% healthy and 10% sick. For a healthy customer, there is a 10% chance of incurring a $10,000 charge next year and a 90% chance of incurring none. For an unhealthy customer, there is a 50% chance of incurring $10,000 and a 50% chance of none. So, the expected values (of cost) are:

Healthy: 0.9 x 0 + 0.1 x 10,000 = 1000
Unhealthy: 0.5 x 0 + 0.5 x 10,000 = 5000

If everyone buys health insurance, the expected cost to the insurance company is:

0.9 x 1000 + 0.1 x 5000 = 1400.

Taking a profit of 100 per person, it sets a premium of $1500 for health insurance. Now, what happens in reality?

All the sick will buy the insurance, but among the healthy, only the risk-averse will. The other healthy customers will look at their expected cost (1000) and feel discouraged by a premium that costs 500 more. Suppose there are 1000 people in total, and 50% of the healthy are risk-averse (and so buy the insurance). Then, the revenue of the insurance company is

0.9 (proportion of healthy) x 0.5 (proportion of risk-averse) x 1000 (total people) x 1500 (premium) + 0.1 (proportion of unhealthy) x 1000 (total people) x 1500 (premium) = 0.9 x 0.5 x 1000 x 1500 + 0.1 x 1000 x 1500 = 825,000.

And the cost,

0.9 (proportion of healthy) x 0.5 (proportion of risk-averse) x 1000 (total people) x 1000 (expected cost on healthy) + 0.1 (proportion of unhealthy) x 1000 (total people) x 5000 (expected cost on unhealthy) = 0.9 x 0.5 x 1000 x 1000 + 0.1 x 1000 x 5000 = 950,000.

The company loses money (125,000 in this example) due to what is known as adverse selection. What happens if the company raises the premium? Well, it will discourage even more healthy customers from buying the insurance, and the company will lose more money.
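The arithmetic of the example can be laid out in a few lines of R. This is a sketch using the numbers from the text; the variable names are illustrative, and buy_rate encodes the assumption that all the sick and half of the healthy buy the policy.

```r
n_people <- 1000
share    <- c(healthy = 0.9, sick = 0.1)   # population split
p_claim  <- c(healthy = 0.1, sick = 0.5)   # chance of a 10,000 charge
claim    <- 10000
premium  <- 1500
buy_rate <- c(healthy = 0.5, sick = 1)     # who actually buys the insurance

expected_cost <- p_claim * claim                   # 1000 (healthy), 5000 (sick)
pooled_cost   <- sum(share * expected_cost)        # 1400 if everyone buys

buyers  <- n_people * share * buy_rate             # 450 healthy, 100 sick
revenue <- sum(buyers) * premium                   # 825,000
cost    <- sum(buyers * expected_cost)             # 950,000
profit  <- revenue - cost                          # -125,000

c(revenue = revenue, cost = cost, profit = profit)
```

Changing premium or buy_rate in this sketch shows how the result depends on how many healthy customers stay in the pool.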

Market for Lemons

The problem of lemons is an example from the used-car market. A lemon is a poorly performing product. Since buyers can’t tell the difference between a lemon and a good car (a plum), they are only willing to pay a price corresponding to an average-performing car. Seeing this, the owners of the best plums will exit the market, further compounding the miseries of the buyers (and the sellers alike).


Asymmetry of Information – Signaling

Here is another manifestation of information asymmetry. How does a car maker that is new to a market convince customers of its quality? Here, the car manufacturer knows much more about the product than the customer.

This is what Hyundai did in the US as it moved from a phase of making average-quality cars to making better ones. It offered its customers a 10-year / 100,000-mile warranty. This is called a signal, which is an expensive action that reveals information.

A certificate of higher education—even better, from a top university—is a powerful signal to the hiring manager. Whether the degree subject is directly applicable to the job or not, the hiring company sees the certificate as evidence of the candidate’s quality, a signal offered by the employee to the employer.


Asymmetry of Information – Moral Hazard

We have seen it before; information asymmetry leads to what is known as a principal-agent problem.

Take the popular example of the conflict between the car owner and the mechanic. You, a car owner, want to have the vehicle checked for its annual maintenance. Under normal circumstances, unless the owner knows all about car mechanics, the mechanic knows more about repairing the car.

This (asymmetric) information is the whole point of going to the workshop and is exactly what is expected from a mechanic, but it can also develop into a principal-agent problem, with the car owner as the principal and the mechanic as the agent.

You suspect that the mechanic will use the information to exploit you by selling unnecessary parts and services. This can happen because the incentives of the two parties (the principal and the agent) are not the same and possibly conflict. The owner wants to repair the car at a minimum cost, and the agent wants to maximise his return. In the end, a moral hazard is created. A moral hazard is an adverse behaviour that is encouraged by the situation.

Solutions to Moral Hazard

The easiest way is for the owner to gain more information. It may come from taking a ‘second opinion’ from another mechanic (who may have a different incentive) or an auto consultant (who may not even have an incentive).

The second is to reduce the incentive that the agent has. An example is the rating system, preferably at a neutral site, that can deter the agent from ripping the customer off.


Portfolio Theory – Normal Distribution

For all its simplicity, portfolio theory still describes the value of grouping securities, preferably ones uncorrelated with each other, for more predictable returns. The statistical parameters, mean and standard deviation, representing the expected return and risk, respectively, also suggest an underlying probability distribution. Despite all the criticism around the usage of the normal distribution (symmetric bell curve), we still utilise it to explain the portfolio concept.

In the previous post, we saw two stocks, 1 and 2, with two different expected returns (12 and 6) and risks (6 and 3). If the returns followed a normal distribution, they would look like the following plot.

Here, the blue curve represents the one with a higher expected return and higher volatility. The red one is more conservative. The combined set (1:1) for a correlation coefficient of 0 (uncorrelated) behaves in the following way.

The advantage of using a standard distribution (normal, in this case) is that it enables us to estimate various probabilities. E.g., the chance of ending up with a return of zero or below for the blue curve (the aggressive one) is 2.3%, which is the same as for the conservative one (red), since in both cases zero lies two standard deviations below the mean. On the other hand, for the joint distribution (green curve), it is just 0.4%.
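These probabilities can be reproduced with pnorm(). The sketch below assumes normally distributed returns, equal weights (1:1) and a correlation coefficient of 0, as in the text; the variable names are illustrative, and the standard two-asset formula is used for the portfolio standard deviation.

```r
mu    <- c(stock1 = 12, stock2 = 6)   # expected returns
sigma <- c(stock1 = 6,  stock2 = 3)   # risks (standard deviations)
w     <- c(0.5, 0.5)                  # 1:1 portfolio weights
rho   <- 0                            # correlation between the two stocks

# Portfolio mean and standard deviation for two assets
mu_p    <- sum(w * mu)
sigma_p <- sqrt(sum((w * sigma)^2) + 2 * rho * prod(w * sigma))

# Chance of a return of zero or below
p_single <- pnorm(0, mean = mu, sd = sigma)      # about 0.023 for each stock
p_joint  <- pnorm(0, mean = mu_p, sd = sigma_p)  # about 0.004 for the portfolio

round(c(p_single, portfolio = p_joint), 4)
```

Setting rho to a positive value in this sketch shows how the benefit shrinks as the two stocks become more correlated.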
