Card Game – Optimal Decisions

Here is a game of cards. A and B have two cards each, one green and one red, and each shows one card. Here is the representation of the rules, with scores written as A – B:
  • A green, B green: A wins 5 – 0
  • A green, B red: A loses 2 – 3
  • A red, B green: A loses 0 – 5
  • A red, B red: A wins 5 – 0

Looking carefully at the rules, you can conclude that the game is in A’s favour. But can A guarantee the maximum score, and how?

Here is the payoff matrix in the game theory format. Each cell is (A's score, B's score).

              B: green   B: red
A: green      (5, 0)     (2, 3)
A: red        (0, 5)     (5, 0)

Before getting into the proper formulation, let’s check what happens if A plays only green. A might get a few wins early on, but once B figures it out, she will play only red and win by 1 (2-3). On the other hand, if A plays only red, B will play only green and win by 5 (0-5).

A mixes up

Let A mix up her play, showing green with probability PAG (and red with probability 1 – PAG). If she aims to give B no incentive to prefer either green or red, she must make B's two expected payoffs equal:
The payoff for B showing green = Payoff for B showing red
0 x PAG + 5 x (1-PAG) = 3 x PAG + 0 x (1-PAG)
5 – 5 PAG = 3PAG
PAG = 5/8 = 0.625

B mixes up

Naturally, B may respond by mixing her game too, showing green with probability PBG. Using the same argument from B’s standpoint, she makes A's two expected payoffs equal:
The payoff for A showing green = Payoff for A showing red
5 x PBG + 2 x (1-PBG) = 0 x PBG + 5 x (1-PBG)
5PBG + 2 – 2PBG = 5 – 5PBG
PBG = 3/8 = 0.375
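The two indifference conditions can also be solved numerically. The following is a minimal sketch, assuming the payoffs listed in the rules; the names payoff_A, payoff_B, b_gap and a_gap are made up for this illustration.

```r
# Payoffs from the rules: rows = A's card, columns = B's card
cards <- c("green", "red")
payoff_A <- matrix(c(5, 2,
                     0, 5), nrow = 2, byrow = TRUE,
                   dimnames = list(A = cards, B = cards))
payoff_B <- matrix(c(0, 3,
                     5, 0), nrow = 2, byrow = TRUE,
                   dimnames = list(A = cards, B = cards))

# B's payoff for green minus her payoff for red, when A shows green with probability p
b_gap <- function(p) {
  mix <- c(p, 1 - p)
  sum(mix * payoff_B[, "green"]) - sum(mix * payoff_B[, "red"])
}

# A's payoff for green minus her payoff for red, when B shows green with probability q
a_gap <- function(q) {
  mix <- c(q, 1 - q)
  sum(mix * payoff_A["green", ]) - sum(mix * payoff_A["red", ])
}

# The indifference points are the roots of the two gaps
p_ag <- uniroot(b_gap, c(0, 1), tol = 1e-10)$root
p_bg <- uniroot(a_gap, c(0, 1), tol = 1e-10)$root
c(PAG = p_ag, PBG = p_bg)
```

Both gaps are straight lines in the probability, so the root finder is overkill here, but the same pattern carries over if the payoffs are changed.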

Equilibrium outcome

At these rates (PAG, PBG), the expected outcome for A is:
(5/8)(3/8)(5) + (5/8)(5/8)(2) + (3/8)(3/8)(0) + (3/8)(5/8)5 = 3.125

And the expected outcome for B is:
(5/8)(3/8)(0) + (5/8)(5/8)(3) + (3/8)(3/8)(5) + (3/8)(5/8)0 = 1.875
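The expected outcomes are just probability-weighted sums over the four cells of the payoff matrix, so they can be checked with a matrix product. This is a sketch with illustrative names (a_mix, b_mix), using the payoffs from the rules.

```r
cards <- c("green", "red")
payoff_A <- matrix(c(5, 2,
                     0, 5), nrow = 2, byrow = TRUE,
                   dimnames = list(A = cards, B = cards))
payoff_B <- matrix(c(0, 3,
                     5, 0), nrow = 2, byrow = TRUE,
                   dimnames = list(A = cards, B = cards))

a_mix <- c(green = 5/8, red = 3/8)   # A's mix (PAG, 1 - PAG)
b_mix <- c(green = 3/8, red = 5/8)   # B's mix (PBG, 1 - PBG)

# Expected scores when both players mix
exp_A <- drop(a_mix %*% payoff_A %*% b_mix)
exp_B <- drop(a_mix %*% payoff_B %*% b_mix)

# A's expected score against each pure choice of B (green, red)
A_vs_pure_B <- drop(a_mix %*% payoff_A)

c(A = exp_A, B = exp_B)
A_vs_pure_B
```

A_vs_pure_B has the same value in both positions, which is the sense in which A's mix guarantees her expected score whatever B does.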


Defective Shirts

This one is taken from a lecture by Eddie Woo, available online. Two friends, Annie and Becky, make shirts. Annie makes 20 shirts a day and has a defect rate of 2%. Becky is 50% faster (30 shirts a day) but has twice the defect rate (4%). If they sell 30 shirts per day, what is the probability that the daily pack (received by the buyer) has two or fewer defects?

The first step is to calculate the defect probability.

P(D) = Proportion of shirts made by Annie x Defect rate of Annie + Proportion of shirts made by Becky x Defect rate of Becky
P(D) = (20/50)(2/100) + (30/50)(4/100) = 0.032.

Now, we apply the binomial formula to calculate the probability of having two or fewer defects in the pack of 30 shirts.

The probability of s successes in n trials, where q = 1 - p, is
P(s) = {}^{n}C_{s} \, p^{s} \, q^{n-s}

{}^{30}C_{0} \, 0.032^{0} (1 - 0.032)^{30} + {}^{30}C_{1} \, 0.032^{1} (1 - 0.032)^{29} + {}^{30}C_{2} \, 0.032^{2} (1 - 0.032)^{28} = 0.93 or 93%

The R code is

pbinom(2, 30, 0.032)
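As a cross-check, the whole calculation can be rebuilt step by step. The sketch below uses made-up variable names (made, defect, p_defect) and assumes, as the text does, that every shirt in the pack is defective independently with the blended rate.

```r
# Daily output of each friend and their defect rates
made   <- c(Annie = 20, Becky = 30)       # Becky is 50% faster
defect <- c(Annie = 0.02, Becky = 0.04)   # Becky has twice the defect rate

# Blended defect probability, weighted by each friend's share of the output
p_defect <- sum(made / sum(made) * defect)

# P(0, 1 or 2 defects in a pack of 30): term by term, and via the cumulative function
terms <- dbinom(0:2, size = 30, prob = p_defect)
p_sum <- sum(terms)
p_cdf <- pbinom(2, size = 30, prob = p_defect)

round(c(p_defect = p_defect, p_sum = p_sum, p_cdf = p_cdf), 3)
```

dbinom gives the individual terms of the sum and pbinom adds them up, so the last two numbers should agree.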

Binomial Probability: Eddie Woo


Coin-Toss Game – Misleading Averages

We have seen how a seemingly profitable game, because of its positive expected value and high average compounding rate, still leads to losses for the average player.

For example, in a simulation of 10000 people betting up to 50 times starting with $100, the number of them ending up with less than the initial amount is:

# Bets    # that lost money (out of 10,000)
20        5805
30        7057
50        7560
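The simulated counts can be compared with an exact binomial calculation. The sketch below assumes the game is the one from the Asymmetric Coin-Toss Game post (a fair coin, wealth multiplied by 1.8 on heads and by 0.5 on tails, everything bet each time); prob_loss is a helper name invented for this example.

```r
# Probability of ending below the starting amount after n_bet all-in bets
prob_loss <- function(n_bet, up = 1.8, down = 0.5, p_heads = 0.5) {
  heads <- 0:n_bet
  # final wealth as a multiple of the starting wealth, for each possible number of heads
  final_multiple <- up^heads * down^(n_bet - heads)
  sum(dbinom(heads[final_multiple < 1], size = n_bet, prob = p_heads))
}

bets <- c(20, 30, 50)
data.frame(bets = bets,
           expected_losers = round(10000 * sapply(bets, prob_loss)))
```

Under these assumptions the expected counts should land near the simulated ones, with the gap being ordinary sampling noise.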

The original game had an expected gain of 15% per toss, with fixed probabilities of winning and losing. It assumed the individual bet all the money she had accumulated so far on each subsequent toss. Here, we vary the amount per bet as a fraction of the total and then calculate the optimum fraction that yields a profit for the median player. But first, the misleading average wealth after 50 wagers, starting with $100.

It’s absurd to see how the gain rises exponentially. Then, we look at the median outcome for the individuals.

We notice there is an optimum strategy, betting somewhere close to 40% of the money each time, that gives a modest but nonetheless positive return. Two such situations are illustrated based on how ten people performed after 25 games: the first batch bet 100% of their money each time and the second batch 30%.
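One way to locate that optimum without simulation is to maximise the average log-growth per bet, since the median player sees about as many heads as tails. This is only a sketch: it assumes the +80% / -50% fair-coin game from the Asymmetric Coin-Toss Game post, and log_growth is a name chosen for this example.

```r
# Betting a fraction f of current wealth multiplies it by
# (1 + 0.8 f) on heads and (1 - 0.5 f) on tails.
log_growth <- function(f) 0.5 * log(1 + 0.8 * f) + 0.5 * log(1 - 0.5 * f)

# Fraction that maximises the growth rate of the median player
best  <- optimize(log_growth, interval = c(0, 1), maximum = TRUE)
f_opt <- best$maximum

# Median wealth after 50 bets (25 heads, 25 tails), starting with $100
median_wealth_50 <- 100 * exp(50 * log_growth(f_opt))

c(f_opt = f_opt, median_wealth_50 = median_wealth_50)
```

Setting the derivative of log_growth to zero by hand gives f = 0.375, consistent with an optimum close to 40%; at f = 1 the growth rate is negative, which is the all-in case.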

Note that the Y-axis is in the log scale to resolve the variation better over a wide range.


Asymmetric Coin-Toss Game

Here is another coin-tossing game. If it lands on heads, your money increases by 80%, and if it’s tails, it reduces by 50%. The question is: will you play this game?

As always, let’s estimate the expected value of this game.
E = 0.8 x 0.5 – 0.5 x 0.5 = 0.4 – 0.25 = 0.15 or a net gain of 15%. In other words, if I start with $100 and bet everything each time, 20 times, on average, I end up getting 100 x 1.15^20 = $1637.

Let 10,000 people play this game and collect the average.

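n_people <- 10000   # number of players
n_bet <- 20         # bets per player

# rows = bet number, columns = players
wealth <- matrix(NA_real_, nrow = n_bet, ncol = n_people)

for (m in 1:n_people) {
  money <- 100   # every player starts with $100
  for (n in 1:n_bet) {
    # heads: gain 80% of the current money; tails: lose 50% of it
    toss <- sample(c(0.8 * money, -0.5 * money), size = 1, prob = c(1/2, 1/2))
    money <- money + toss
    wealth[n, m] <- money
  }
}

# filling a matrix and converting once is much faster than writing into a data frame cell by cell
gain <- as.data.frame(wealth)
gain$AVG <- apply(wealth, 1, mean, na.rm = TRUE)   # average across players after each bet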
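The same experiment can be run without explicit loops. The following is an alternative sketch, not the code used for the numbers in this post: it draws every toss at once as a multiplier (1.8 or 0.5) and builds each player's wealth with a cumulative product. The seed and the names mult, wealth and avg_by_bet are arbitrary choices for this example.

```r
set.seed(42)        # arbitrary seed, only to make the run repeatable
n_people <- 10000
n_bet <- 20

# Each toss multiplies the current money by 1.8 (heads) or 0.5 (tails)
mult <- matrix(sample(c(1.8, 0.5), n_bet * n_people, replace = TRUE),
               nrow = n_bet, ncol = n_people)

# Cumulative product down each column: rows = bet number, columns = players
wealth <- 100 * apply(mult, 2, cumprod)

# Average across the players after each bet
avg_by_bet <- rowMeans(wealth)
round(avg_by_bet[c(1, 10, 20)], 1)
```

The average after the last bet swings a lot from run to run, because a handful of very lucky players dominate it.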

Yes, we get $1694.8 as the average of 10,000 people at the end of 20 rounds.

So, a no-brainer, huh? But before we make a decision, what is the median return from the game? In other words, what does the typical person get? It’s $34.86!

# median across the players only (columns 1 to n_people), leaving out the AVG column
gain$MED <- apply(gain[, 1:n_people], 1, median, na.rm = TRUE)

So, the typical (median) player systematically loses money.
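Both numbers also have closed forms, which makes the gap between mean and median easy to see. The sketch below assumes a fair coin and that the median player gets exactly 10 heads and 10 tails in 20 tosses; the variable names are illustrative.

```r
start <- 100
n_bet <- 20

# Mean: each toss multiplies the expected wealth by 0.5 * 1.8 + 0.5 * 0.5 = 1.15
mean_wealth <- start * (0.5 * 1.8 + 0.5 * 0.5)^n_bet

# Median: 10 heads and 10 tails, so the money is multiplied by (1.8 * 0.5) ten times
median_wealth <- start * (1.8 * 0.5)^(n_bet / 2)

round(c(mean = mean_wealth, median = median_wealth), 2)
```

Every heads-tails pair multiplies the money by 0.9, so the median shrinks even though the mean grows by 15% per toss.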


Expected Value of Spin

There is a wheel with three values of 0, 1 and 2. Each occupies an equal area of the wheel. You play the game by spinning the wheel. If it lands on 1 or 2, you get the money and spin again. When it lands on 0, the game ends. What is the average value expected from this game?

There is a (1/3) chance that a spin lands on 0 and the game ends. There is a (2/3) chance that you get 1 or 2, which are worth (1 + 2)/2 = 3/2 on average. So the expected value of the first spin is (2/3) x (3/2). The second spin happens only if the first one did not end the game, so its expected value is (2/3) x (2/3) x (3/2), and so on. The total expected value is the sum of the following series.

(3/2) x [(2/3) + (2/3)^2 + (2/3)^3 + …]. The sum in the square bracket is a geometric series equal to (2/3) / (1 – 2/3) = 2; the product becomes (3/2) x (2) = 3.
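A quick simulation can confirm the series. This is a sketch with an arbitrary seed and a helper called play_once, written for this example: it spins until the wheel lands on 0 and adds up the winnings.

```r
set.seed(1)   # arbitrary seed, only to make the run repeatable

# Play one game: keep spinning and collecting until the wheel shows 0
play_once <- function() {
  total <- 0
  repeat {
    spin <- sample(0:2, size = 1)   # the three values are equally likely
    if (spin == 0) break
    total <- total + spin
  }
  total
}

sim_mean <- mean(replicate(100000, play_once()))

# The series from the text, truncated after 200 terms
k <- 1:200
series_sum <- (3/2) * sum((2/3)^k)

c(simulated = sim_mean, series = series_sum)
```

The simulated average should sit close to 3, with the usual sampling wobble in the second decimal place.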


Mtcars Dataset – Pair Plots

We continue with the mtcars dataset to illustrate a few more correlation plots – this time, the pair plots.

library(GGally)
ggpairs(car_data[,1:7])
  • The main diagonal shows the distribution of each variable
  • The upper triangle shows the correlation coefficients
  • The lower triangle shows scatter plots of each pair of variables
library(psych)
pairs.panels(car_data, lm = TRUE)


Pearson vs Spearman Correlations

We have seen Pearson’s correlation coefficient earlier. There is a nonparametric alternative to it: Spearman’s correlation coefficient.

Pearson’s is the choice when you have continuous data for a pair of variables and the relationship follows a straight line. Spearman’s is the choice when you have a pair of continuous variables whose relationship is not linear (but is still monotonic), or a pair of ordinal variables. Another difference is that Spearman’s correlates the ranks of the variables, whereas Pearson’s uses the values themselves.
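A small made-up example shows the difference. Here y always increases with x, but along a curve rather than a straight line; the data are invented purely for illustration.

```r
x <- 1:20
y <- exp(x / 2)   # strictly increasing, but far from a straight line

pearson  <- cor(x, y, method = "pearson")
spearman <- cor(x, y, method = "spearman")

round(c(pearson = pearson, spearman = spearman), 3)
```

Spearman's coefficient reaches 1 because the ordering of y matches the ordering of x exactly, while Pearson's stays below 1 because the points do not fall on a line.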

Rank of variables

A rank shows the position of a value when the values are arranged in ascending order. The following is an example of a vector, x, and its ranks.

Variable   Rank
10         3
2          1
34         5
21         4
5          2
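In R, the same ranks come from the rank function. A minimal sketch with the five values from the table:

```r
x <- c(10, 2, 34, 21, 5)
ranks <- rank(x)   # position of each value in ascending order

data.frame(Variable = x, Rank = ranks)
```

When values are tied, rank assigns them the average of the positions they share by default.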

Let’s apply each of the correlation coefficients to the mtcars dataset.

Pearson Method

cor.test(car_data$mpg, car_data$hp, method = "pearson")
	Pearson's product-moment correlation

data:  car_data$mpg and car_data$hp
t = -6.7424, df = 30, p-value = 1.788e-07
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.8852686 -0.5860994
sample estimates:
       cor 
-0.7761684 

Spearman Method

cor.test(car_data$mpg, car_data$hp, method = "spearman")
	Spearman's rank correlation rho

data:  car_data$mpg and car_data$hp
S = 10337, p-value = 5.086e-12
alternative hypothesis: true rho is not equal to 0
sample estimates:
       rho 
-0.8946646 

Spearman via Pearson!

cor.test(rank(car_data$mpg), rank(car_data$hp), method = "pearson")
	Pearson's product-moment correlation

data:  rank(car_data$mpg) and rank(car_data$hp)
t = -10.969, df = 30, p-value = 5.086e-12
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.9477078 -0.7935207
sample estimates:
       cor 
-0.8946646 


mtcars Dataset – Correlation Plots

In exploratory data analyses, you may want to check the correlations – the strength and the direction – in one go. And a correlation matrix can give that snapshot. Following is the R code to get the matrix.

library(corrplot)
corrplot(corr = cor(car_data), method = 'number')
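The numbers behind the plot can be inspected directly, without any plotting package. This sketch assumes car_data is simply a copy of R's built-in mtcars data frame, which is how it appears to be used in these posts.

```r
car_data <- mtcars   # assumption: car_data is a copy of the built-in dataset

# Pairwise Pearson correlations of all 11 variables, rounded for reading
corr_mat <- round(cor(car_data), 2)

# Correlations of mpg with a few other variables
corr_mat["mpg", c("cyl", "hp", "wt")]
```

corr_mat is an ordinary symmetric matrix with ones on the diagonal, which is what corrplot receives through its corr argument.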

As we discussed in the previous posts, a higher positive number (blue) denotes a stronger positive correlation between the variables (pairwise), and the negative (red) indicates the opposite.

Let’s work on various other ways of visualising the same using R.

As a colour map

corrplot(corr = cor(car_data), method = 'color')

As pie charts, with the labels on the diagonal

corrplot(corr = cor(car_data), method = 'pie', tl.pos = 'd')

Having mixed visualisations for upper and lower triangles

corrplot(cor(car_data), type = 'upper', method = 'pie', tl.pos = "d")
corrplot(cor(car_data), type = 'lower', method = 'number', add = TRUE, tl.pos = "n", diag = FALSE)


mtcars Dataset – Correlation Coefficient

We have seen a couple of plots showing relationships between variables in the 'mtcars' dataset.

Statisticians use single numbers to quantify the strength and direction of the relationship. One of them is the correlation coefficient which quantifies linear relationships. Before going into correlation coefficients, let’s first learn the covariance between two variables.

Covariance

The sample covariance between two variables, based on N observations of each, is

Cov(x,y) = \frac{1}{N-1}\sum\limits^{N}_{i = 1} (x_i - \bar{x}) * (y_i - \bar{y})

You see N − 1 in the denominator rather than N when the population mean is not known and is replaced by the sample mean (X bar).

sum((car_data$mpg - mean(car_data$mpg))*(car_data$cyl - mean(car_data$cyl))) / 31

Or simply,

cov(car_data$mpg, car_data$cyl)
-9.17

Correlation coefficient

The Pearson correlation coefficient is the covariance of the two variables divided by the product of their standard deviations.

r_{x,y} = \frac{Cov(x,y)}{s_xs_y}

cov(car_data$mpg, car_data$cyl) /(sd(car_data$mpg)*sd(car_data$cyl))

Or use the built-in cor function (part of base R's stats package, so no extra package is needed)

cor(car_data$mpg, car_data$cyl)
-0.85
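To tie the two formulas together, here is a short sketch that computes both quantities by hand and compares them with the built-in functions. It assumes car_data is a copy of the built-in mtcars data frame, and it uses the number of observations instead of the hard-coded 31.

```r
car_data <- mtcars   # assumption: car_data is a copy of the built-in dataset
x <- car_data$mpg
y <- car_data$cyl
n <- length(x)       # 32 cars, so N - 1 = 31

# Sample covariance and Pearson correlation from their definitions
cov_manual <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)
r_manual   <- cov_manual / (sd(x) * sd(y))

# Compare with the built-in functions
all.equal(cov_manual, cov(x, y))
all.equal(r_manual, cor(x, y))
round(c(covariance = cov_manual, correlation = r_manual), 2)
```

Dividing by the two standard deviations removes the units, which is why the correlation is bounded between -1 and +1 while the covariance is not.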

The greater the absolute value of the correlation coefficient, the stronger the relationship. The maximum absolute value is 1 (r = +1 or -1), which represents a perfectly linear relationship. A positive value means that when one variable increases, the other tends to increase as well. On the other hand, a negative value suggests that when one variable increases, the other tends to decrease.

In exploratory analyses, however, you may want to know the relationships between several variables in one go. That is the topic for the next post.

