Data & Statistics

The sum of Dice – 5 before 7

The game is to roll a pair of dice until a sum of 5 or 7 is reached. Then, what is the probability that a sum of 5 comes before a sum of 7?

We use a few handy R commands to answer the problem.

1. expand.grid: gives all combinations of vectors – in our case, two dice.

rolls <- expand.grid(seq(1,6), seq(1,6))

2. do.call: executes a function on a list of arguments – e.g., to apply + (sum) to the earlier grid.

sum_rolls <- do.call(`+`, rolls)
2  3  4  5  6  7  3  4  5  6  7  8  4  5  6  7  8  9  5  6  7  8  9 10  6  7  8  9 10 11  7  8  9 10 11 12

3. table: tabulates each categorical value (2 to 12) with its frequency.

tab_rolls <- table(sum_rolls)
sum_rolls
 2  3  4  5  6  7  8  9 10 11 12 
 1  2  3  4  5  6  5  4  3  2  1 

The table shows that the sum 5 occurs 4 times and the sum 7 occurs 6 times (out of the 36 possibilities). So, the probability of getting a five before a seven (as the sum) is 4/(4+6) = 40%.

Combining the steps into two lines:

dice_roll <- as.data.frame.table(prop.table(table(do.call(`+`, expand.grid(seq(1,6), seq(1,6))))))
dice_roll$Freq[which(dice_roll$Var1 == 5)] / (dice_roll$Freq[which(dice_roll$Var1 == 5)] + dice_roll$Freq[which(dice_roll$Var1 == 7)])
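As a sanity check, the 40% can also be approached by simulation. The sketch below (not from the original post) repeatedly rolls a pair of dice until the sum is 5 or 7:

```r
# Roll a pair of dice until the sum is 5 or 7; record whether 5 came first
set.seed(1)
five_first <- replicate(100000, {
  repeat {
    s <- sum(sample(1:6, 2, replace = TRUE))
    if (s == 5 || s == 7) break
  }
  s == 5
})
mean(five_first)  # close to 0.4
```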


Another Dice Puzzle

A regular die has six sides, and the probability of getting a six is 1/6. If the die is modified so that the number 6 now appears half as often as usual (its frequency is halved), what is the probability of rolling a 6?


The question implies that the probability of getting 6 is half that of the other probabilities (1 to 5). Let p6 be the probability of rolling a six. Let p be the probability of rolling any other side.

For the die:
5 x p + p6 = 1
given p6 = p/2
5 x p + p/2 = 1
11p = 2
p = 2/11
p6 = p/2 = 1/11

The probability of rolling a six is not 1/12 but 1/11.
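The little algebra above can be verified numerically (a quick check, not in the original post):

```r
# 5*p + p/2 = 1  =>  11p = 2
p  <- 2/11   # probability of each of the faces 1 to 5
p6 <- p / 2  # probability of the halved face 6
stopifnot(all.equal(5 * p + p6, 1))  # probabilities sum to one
p6  # 1/11
```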


Proximity in a Meeting

Amy and Becky are two friends invited to a 5-member committee meeting. What is the probability that they get seats next to each other if:
1. the member seats are arranged in a round table
2. the member seats are arranged in a row

Seated around a roundtable

Given that Amy is seated as denoted in the figure, Becky can sit next to her if she gets one of the two seats marked as Becky. With a random seating assignment, that happens for 2 of the 4 remaining seats, so the probability of the two sitting side by side is 2/4 = 50%.

Seated in a row

As we did before, let’s place Amy first.
At the ends – two such places (E)
In the middle – three such places (M)

Only one seat is adjacent to an E position, while two seats are adjacent to each M position. A series of AND and OR rules gives the overall probability.

(Amy on E1 OR E2) AND (Becky on the single adjacent seat) OR (Amy on M2 OR M3 OR M4) AND (Becky on one of the two adjacent seats).
(1/5+1/5) x (1/4) + (1/5+1/5+1/5) x (1/4 + 1/4)
2/20 + 6/20 = 8/20 = 40%
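Both answers can be checked with a quick simulation of random seatings (a sketch under the stated setup; seat 1 and seat 5 are taken to touch at the round table):

```r
# Probability that persons 1 (Amy) and 2 (Becky) sit next to each other
set.seed(1)
prob_adjacent <- function(circular, reps = 100000) {
  mean(replicate(reps, {
    seats <- sample(5)  # seats[i] = person in seat i
    d <- abs(which(seats == 1) - which(seats == 2))
    if (circular) d == 1 || d == 4 else d == 1
  }))
}
prob_adjacent(circular = TRUE)   # ~0.5 (round table)
prob_adjacent(circular = FALSE)  # ~0.4 (row)
```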

Reference

How To Solve Bloomberg’s Random Seating Interview Question: MindYourDecisions


Multiple Regression – Non-Linear Transformations

If you recall one of the previous posts relating ‘lstat’ with ‘medv’, we noticed the apparent curvature of the relationship and how slightly awkward the fit line sat on top of the scatter plot.

This seeming ‘non-linearity’ prompts us to extend the linear model to a non-linear one. We add a quadratic term by squaring the predictor (‘lstat’), wrapped in the identity function I().

fit5 <- lm(medv ~ lstat + I(lstat^2), Boston)

The results are presented below:

Call:
lm(formula = medv ~ lstat + I(lstat^2), data = Boston)

Residuals:
     Min       1Q   Median       3Q      Max 
-15.2834  -3.8313  -0.5295   2.3095  25.4148 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 42.862007   0.872084   49.15   <2e-16 ***
lstat       -2.332821   0.123803  -18.84   <2e-16 ***
I(lstat^2)   0.043547   0.003745   11.63   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.524 on 503 degrees of freedom
Multiple R-squared:  0.6407,	Adjusted R-squared:  0.6393 
F-statistic: 448.5 on 2 and 503 DF,  p-value: < 2.2e-16

Now, let’s compare the fit results.

plot(medv~lstat, Boston)
points(Boston$lstat, fitted(fit5), col = "red", pch = 8)
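Whether the quadratic term improves the fit can also be tested formally with anova(). A sketch, assuming the Boston data is available (it ships with MASS as well as ISLR2):

```r
library(MASS)  # provides the Boston data set

fit1 <- lm(medv ~ lstat, Boston)               # linear fit
fit5 <- lm(medv ~ lstat + I(lstat^2), Boston)  # quadratic fit
anova(fit1, fit5)  # a near-zero p-value favours the quadratic model
```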


Multiple Regression Continued

Continuing from the previous post, we will add all the predictors to the regression this time. The notation is the response variable, then ~, followed by ‘.’ [dot].

fit3 <- lm(medv~., Boston)
summary(fit3)
Call:
lm(formula = medv ~ ., data = Boston)

Residuals:
     Min       1Q   Median       3Q      Max 
-15.1304  -2.7673  -0.5814   1.9414  26.2526 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  41.617270   4.936039   8.431 3.79e-16 ***
crim         -0.121389   0.033000  -3.678 0.000261 ***
zn            0.046963   0.013879   3.384 0.000772 ***
indus         0.013468   0.062145   0.217 0.828520    
chas          2.839993   0.870007   3.264 0.001173 ** 
nox         -18.758022   3.851355  -4.870 1.50e-06 ***
rm            3.658119   0.420246   8.705  < 2e-16 ***
age           0.003611   0.013329   0.271 0.786595    
dis          -1.490754   0.201623  -7.394 6.17e-13 ***
rad           0.289405   0.066908   4.325 1.84e-05 ***
tax          -0.012682   0.003801  -3.337 0.000912 ***
ptratio      -0.937533   0.132206  -7.091 4.63e-12 ***
lstat        -0.552019   0.050659 -10.897  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.798 on 493 degrees of freedom
Multiple R-squared:  0.7343,	Adjusted R-squared:  0.7278 
F-statistic: 113.5 on 12 and 493 DF,  p-value: < 2.2e-16

The obvious questions that come here are:

  1. Was this a helpful exercise?
  2. Which variables are better at predicting the response?

To answer the first question, we look at two parameters in the results, the R-squared and F values. Both compare the variance around the model fit with the variance around the mean. An R-squared of 0.7343 and an F-statistic of 113.5 are good enough to justify this exercise.

The clues to the second question are also in the results (the stars at the end of the coefficient rows). They show that two variables, ‘indus’ and ‘age’, have very low significance (high p-values). So, we remove these two least useful predictors and reevaluate using the ‘update’ command.

fit4 <- update(fit3, ~.-age-indus)
summary(fit4)
Call:
lm(formula = medv ~ crim + zn + chas + nox + rm + dis + rad + 
    tax + ptratio + lstat, data = Boston)

Residuals:
     Min       1Q   Median       3Q      Max 
-15.1814  -2.7625  -0.6243   1.8448  26.3920 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  41.451747   4.903283   8.454 3.18e-16 ***
crim         -0.121665   0.032919  -3.696 0.000244 ***
zn            0.046191   0.013673   3.378 0.000787 ***
chas          2.871873   0.862591   3.329 0.000935 ***
nox         -18.262427   3.565247  -5.122 4.33e-07 ***
rm            3.672957   0.409127   8.978  < 2e-16 ***
dis          -1.515951   0.187675  -8.078 5.08e-15 ***
rad           0.283932   0.063945   4.440 1.11e-05 ***
tax          -0.012292   0.003407  -3.608 0.000340 ***
ptratio      -0.930961   0.130423  -7.138 3.39e-12 ***
lstat        -0.546509   0.047442 -11.519  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.789 on 495 degrees of freedom
Multiple R-squared:  0.7342,	Adjusted R-squared:  0.7289 
F-statistic: 136.8 on 10 and 495 DF,  p-value: < 2.2e-16

An introduction to Statistical Learning: James, Witten, Hastie, Tibshirani, Taylor


Multiple Linear Regression in R

We have seen linear regression before as a simple method in supervised learning. It is of two types – simple and multiple linear regressions. In simple, we write the dependent variable (also known as the quantitative response) Y as a function of the independent variable (predictor) X.

Y \approx f(X)

Y \approx \beta_0 + \beta_1 X

Beta0 and beta1 are the model coefficients or parameters. You may realise they are the intercept and slope of a line, respectively. An example of a simple linear regression is a function predicting a person’s weight (response) from her height (single predictor).

Multiple linear regression

Here, we have more than one predictor. The general form for p predictors is:

Y \approx \beta_0 + \beta_1 X_1 + \beta_2 X_2 + ... + \beta_p X_p

We use the Boston data set from the library ISLR2 for the R exercise. The library contains ‘medv’ (median house value) for 506 census tracts in Boston. We will predict ‘medv’ using 12 predictors such as ‘rm’ (average number of rooms per house), ‘age’ (proportion of owner-occupied units built before 1940), and ‘lstat’ (per cent of households with low socioeconomic status).

First, we do the simple linear regression with one predictor, ‘lstat’.

fit1 <- lm(medv~lstat, Boston)
summary(fit1)
Call:
lm(formula = medv ~ lstat, data = Boston)

Residuals:
    Min      1Q  Median      3Q     Max 
-15.168  -3.990  -1.318   2.034  24.500 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 34.55384    0.56263   61.41   <2e-16 ***
lstat       -0.95005    0.03873  -24.53   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6.216 on 504 degrees of freedom
Multiple R-squared:  0.5441,	Adjusted R-squared:  0.5432 
F-statistic: 601.6 on 1 and 504 DF,  p-value: < 2.2e-16

Here is the plot with points representing the data and the line representing the model fit.

fit1 <- lm(medv~lstat, Boston)
plot(medv~lstat, Boston)
abline(fit1, col="red")

We will add another predictor, age, to the regression table.

fit2 <- lm(medv~lstat+age, Boston)
summary(fit2)
Call:
lm(formula = medv ~ lstat + age, data = Boston)

Residuals:
    Min      1Q  Median      3Q     Max 
-15.981  -3.978  -1.283   1.968  23.158 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 33.22276    0.73085  45.458  < 2e-16 ***
lstat       -1.03207    0.04819 -21.416  < 2e-16 ***
age          0.03454    0.01223   2.826  0.00491 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6.173 on 503 degrees of freedom
Multiple R-squared:  0.5513,	Adjusted R-squared:  0.5495 
F-statistic:   309 on 2 and 503 DF,  p-value: < 2.2e-16

beta_0 is 33.22, beta_1 is -1.03 and beta_2 is 0.034. All three are significant, as evidenced by their low p-values (Pr(>|t|)). We will add more variables next.

An introduction to Statistical Learning: James, Witten, Hastie, Tibshirani, Taylor


Braess’s Paradox

Another counterintuitive phenomenon, similar to the Downs–Thomson paradox, is Braess’s paradox: adding more roads to an existing network can slow down the overall traffic. As before, we will see the mathematical explanation first.

Suppose there are two routes to city B from city A, as shown in the picture – the top and bottom roads. The first part is narrow on the top road, followed by a broad highway. The situation is the opposite for the bottom road.

The highways take a fixed 15 minutes regardless of the number of cars, whereas, for the narrower roads, the travel time depends on the number of vehicles – N/100 minutes, where N is the number of cars.

If there are 1000 cars in the city, the system will reach the Nash equilibrium with the cars equally divided (in the longer term), i.e., 500 on each road. Therefore, each car takes
500/100 + 15 = 20 mins

Imagine a new interconnection built by the city to reduce traffic congestion. The travel time on the connection section is 1 minute. Let’s look at all scenarios.

Scenario 1: A car starts from the bottom highway, takes the connection, and moves to the top highway. Total time = 15 + 1 + 15 = 31 mins.
Scenario 2: One car starts from the top road, takes the connection, and moves to the bottom road, while the others follow the old paths (not using the connection). Total time = 500/100 + 1 + 501/100 ≈ 11 mins.

Scenario 2 seems a no-brainer to the car driver. But this news invariably reaches everyone, and soon, everyone starts taking the narrow paths! In other words, the narrow route is the dominant strategy. The total travel time now becomes:
1000/100 + 1 + 1000/100 = 21 min. This is more than in the old state, when no connection existed.
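The travel-time arithmetic can be laid out in a few lines of R (a sketch, using the 15-minute highway and N/100 narrow-road times from above):

```r
narrow  <- function(n) n / 100  # narrow-road time in minutes for n cars
highway <- 15                   # fixed highway time in minutes

no_link   <- narrow(500) + highway            # 500 cars per route: 20 min
with_link <- narrow(1000) + 1 + narrow(1000)  # everyone on narrow-link-narrow: 21 min
c(no_link, with_link)
```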

So, the condition without the connection road (or closing down the connection road) seems a better choice. And this is Braess’s paradox suggesting that a more complex system may not necessarily be a better choice.


Downs–Thomson paradox

Here is a game theory puzzle for you. The government proposes to build a highway between the two cities, A and B, that reduces the burden of the existing freeway. Here is the rationale behind the proposal.

There are two modes of travelling from A to B: 1) take the train, which takes 20 minutes, or 2) drive on the freeway, which takes (10 + n/5) minutes, where n is the total number of cars on the road. Since the train is public transit, it doesn’t matter how many people take it – it always takes 20 minutes to reach the destination. But if the new highway operates, the travel time becomes (5 + n/5) minutes. Note that the old freeway stops functioning once the new road is available.

What is your view on building the highway as a solution to reduce travel time, or are there alternate ideas to meet the objective?

The existing case

Suppose there are 100 commuters. Each can take the train and reach B in 20 minutes. That gives a few people the advantage of taking cars and arriving earlier – until the car travel time matches the train’s:
20 = 10 + n/5
n = 50
Beyond 50 commuters, car travel will take longer than the train; therefore, 50 is an equilibrium number in the longer term.

The new case

The new equilibrium is estimated as follows:
20 = 5 + n/5
n = 75
In other words, more people will take the new route, but the travel time remains the same.
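Both equilibria can be found numerically with uniroot(), solving for the n at which car travel time equals the 20-minute train time (a sketch of the arithmetic above):

```r
train    <- 20                      # minutes, independent of ridership
old_road <- function(n) 10 + n / 5  # minutes with n cars on the old freeway
new_road <- function(n) 5 + n / 5   # minutes with n cars on the new highway

# Equilibrium: car travel time equals the train time
n_old <- uniroot(function(n) old_road(n) - train, c(0, 100))$root  # 50
n_new <- uniroot(function(n) new_road(n) - train, c(0, 100))$root  # 75
c(n_old, n_new)
```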

The paradox

This is a simplified game-theory explanation of what is known as the Downs–Thomson paradox: the equilibrium speed of car traffic on a road is set by the speed of the comparable public-transport journey, the next best alternative.

Alternatives

On the other hand, if modifications are possible to reduce the commute time of the train, then overall travel time (for both railways and roads) can be reduced.

References

Downs–Thomson paradox: Wiki

The Problem with Faster Highways: William Spaniel


The Auction Game

Amy and Becky are in an auction for a painting. The rules of the auction are:

  1. Bidders must submit one secret bid.
  2. The minimum bid is $10, and bids go in increments of $10.
  3. Whoever bids the highest wins and will be charged an amount equal to the bid amount.
  4. In the event of a tie, the auction house will choose a winner based on a lot with equal probabilities.

Amy thinks the fair value of the painting is $56, and Becky thinks it’s $52. These valuations are also known to both. What should be the optimal bid?

The payoff matrix is:

                  Becky
              Bid $40   Bid $50
Amy  Bid $40   8, 6      0, 2
     Bid $50   6, 0      3, 1

If Amy bids $40 and Becky $50, Becky wins the painting, and her payoff is the value she attributes ($52) minus the payment ($50) = $2. The loser neither gains nor loses anything.
If Amy bids $50 and Becky $40, Amy gets it, and her payoff will be the value (56) – what she pays (50) = $6.
If both bid at $40, Amy can get a net gain of 16 (56 – 40) at a probability of 0.5. That implies the payoff for Amy = $8. For Becky, it’s $12 x 0.5 = $6.
If both bid for $50, Amy’s payoff = $6 x 0.5 = $3 and Becky’s = $2 x 0.5 = $1

At first glance, it seems obvious that the equilibrium is where both are bidding for $40. If Amy thinks Becky will bid $40, Amy’s best response is to bid the same as her payoff is $8. The same is the case for Becky.
But if Amy expects Becky to go for $50, then Amy must also bid $50. Becky will also go along the same logic.
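The best responses can be read off the payoff matrices programmatically (a sketch; rows are Amy’s bids, columns Becky’s):

```r
bids <- c("40", "50")
amy   <- matrix(c(8, 0,
                  6, 3), nrow = 2, byrow = TRUE,
                dimnames = list(Amy = bids, Becky = bids))
becky <- matrix(c(6, 2,
                  0, 1), nrow = 2, byrow = TRUE,
                dimnames = list(Amy = bids, Becky = bids))

apply(amy, 2, which.max)    # Amy's best response to each Becky bid
apply(becky, 1, which.max)  # Becky's best response to each Amy bid
# Each best response matches the other's bid: ($40, $40) and ($50, $50) are both equilibria
```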


Drunken Man Continues

The person in our previous two posts drinks every other day. As we have seen before, when he comes back, he must find the house key in a bunch of ten similar ones. As he can’t tell which key is correct, he tries them one at a time, setting aside each failure, when sober. When drunk, he proceeds the same way but can’t distinguish which keys he has tried before.

What is the probability that he was inebriated on the day he unlocked the door on his seventh attempt?

It is a conditional probability problem, and the ask is:
The probability that he is drunk, given he got the right key on the seventh attempt or P(D|7). We can use Bayes’ rule, that is,

P(D|7) = \frac{P(7|D)*P(D)}{P(7|D)*P(D) + P(7|S)*P(S)}

The terms are:
P(7|D) – Probability he opens on his 7th try, given he is drunk
P(D) – Prior probability that he is drunk on the day
P(7|S) – Probability he opens on his 7th try, given he is sober
P(S) – Prior probability that he is sober on the day

P(7|D) = (9/10)^6 x (1/10) = 0.053
P(7|S) = (9/10)x(8/9)x(7/8)x(6/7)x(5/6)x(4/5)x(1/4) = 1/10 = 0.1
P(D) = 1/2
P(S) = 1/2

P(D|7) = \frac{0.053 * 0.5}{0.053 * 0.5 + 0.1 * 0.5} = 0.35

There is a 35% probability that he is drunk.
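The Bayes computation translates directly into R (a check of the numbers above):

```r
# P(drunk | door opened on the 7th attempt) via Bayes' rule
p7_drunk <- (9/10)^6 * (1/10)  # drunk: tries keys with replacement
p7_sober <- 1/10               # sober: without replacement, every position is equally likely
prior    <- 1/2                # drinks every other day

posterior <- p7_drunk * prior / (p7_drunk * prior + p7_sober * prior)
round(posterior, 2)  # 0.35
```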
