The rules of roulette appear complex, with so many types of bets and payoffs. We have seen the basic odds of roulette in an older post, and this time we spend time demystifying the complexity. First, look at the wheel (American roulette).
And the layout on which the player places the bet is:
Now, the various possible bets and their payoffs.
Bet | Numbers covered | Payoff
The bet covers a single number | 1 | 35 to 1
Split: a bet on two adjacent numbers on the layout | 2 | 17 to 1
A bet on a column of 3 numbers (e.g. 12, 11, 10) | 3 | 11 to 1
Any 2 x 2 block of 4 numbers (e.g. 32, 35, 31, 34) | 4 | 8 to 1
Basket: the five-number combination of 00, 0, 3, 2, 1 | 5 | 6 to 1
Double Street: 2 adjacent columns of the layout | 6 | 5 to 1
A dozen: 1-12, 13-24 or 25-36, by placing a bet on one of the 3 locations of the layout | 12 | 2 to 1
Odd/even, red/black, low (1-18), high (19-36) | 18 | 1 to 1
Now, forget everything and let’s find out how payoffs are made, and what the expected values are.
Expected Value, E
The expected value of a random variable is a weighted average. In other words, you take the value of each outcome, multiply it by its probability of occurring and sum over all the outcomes. Imagine a coin-tossing game – you get one dollar for a head and lose one for a tail. The outcomes, heads and tails, each have a chance of 1 in 2 (0.5). So, the expected value = P(H) x V(H) + P(T) x V(T) = (1/2) x (1) + (1/2) x (-1) = 0. Or, if you play the game over and over, you are expected to gain (or lose) nothing, but you should play for a long time to see that outcome. I have used V to denote value.
Another example: you roll a 6-sided die. You get 6 dollars if the die lands on 3, and lose 1 dollar for everything else. The expected value (if you play long enough) is E = (1/6)(-1) + (1/6)(-1) + (1/6)(6) + (1/6)(-1) + (1/6)(-1) + (1/6)(-1) = (6/6) – (5/6) = 1/6. So, keep playing.
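The two examples above can be checked with a few lines of R. This is only a minimal sketch; the helper name expected_value is made up for illustration and is not a built-in function.

```r
# Expected value as a probability-weighted sum of payoffs.
# expected_value is a hypothetical helper, not part of base R.
expected_value <- function(prob, value) {
  stopifnot(length(prob) == length(value), abs(sum(prob) - 1) < 1e-9)
  sum(prob * value)
}

# Coin toss: +1 dollar for a head, -1 dollar for a tail.
ev_coin <- expected_value(c(0.5, 0.5), c(1, -1))

# Die: +6 dollars if it lands on 3, -1 dollar for any other face.
ev_die <- expected_value(rep(1/6, 6), c(-1, -1, 6, -1, -1, -1))

print(ev_coin)
print(ev_die)
```

The coin game comes out at zero and the die game at one-sixth of a dollar per roll, matching the hand calculation.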
We have seen how it works in Casino games. We will formalise it this time. Look carefully at the last two columns of the bet-payoff table, and you can make a formula payoff = (36 – numbers covered ) / numbers covered. The formula holds good except for Basket, where the answer is (36-5)/5 = 6.2, but the casino rounds it off to 6 (benefits who?).
Let N be the number of pockets on the wheel (38 for American and 37 for European), and n be the number covered. The chance of getting one number from 38 possibilities is (1/38). The probability of getting one out of two numbers (such as a split) is (1/38) + (1/38) = 2/38 – remember the addition rule of mutually exclusive events? So the generalised formula for getting one number in a bet that covers n numbers is n/N. You win the payoff (36 - n)/n with probability n/N and lose your 1 unit stake with probability (N - n)/N, and the expected value is
E = (n/N) x (36 - n)/n + ((N - n)/N) x (-1) = (36 - n)/N - (N - n)/N = (36 - N)/N
There is something special about the final equation – that it is independent of the numbers covered but depends only on the number of pockets on the roulette wheel. In other words, if the game has a smart payoff structure given by a formula (36 – numbers covered ) / numbers covered, you get a bet-independent payoff (or a constant payoff).
House wins, always
We will plug in numbers and find out the advantage – you already know it’s a house advantage for any N more than 36. So for the American, it is (36-38)/38 = – 0.0526 or 5.26%; for the European, it is (36-37)/37 = – 0.027 or 2.7%. The Basket doesn’t exactly fit the rule, and its house advantage is higher at 7.89%.
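As a quick numerical check of the house advantage, here is a small R sketch. The function name roulette_ev is an assumption for illustration; the payoffs follow the (36 – numbers covered) / numbers covered rule from the text, with the Basket payoff of 6 passed in by hand.

```r
# Expected value per 1 unit staked, for a bet covering n numbers
# on a wheel with N pockets. roulette_ev is a hypothetical helper.
roulette_ev <- function(n, N, payoff = (36 - n) / n) {
  p_win <- n / N
  p_win * payoff + (1 - p_win) * (-1)
}

covered <- c(1, 2, 3, 4, 6, 12, 18)   # bets that follow the payoff formula

ev_american <- roulette_ev(covered, N = 38)
ev_european <- roulette_ev(covered, N = 37)
ev_basket   <- roulette_ev(5, N = 38, payoff = 6)   # rounded-down payoff

print(round(ev_american, 4))
print(round(ev_european, 4))
print(round(ev_basket, 4))
```

Every formula-following bet gives the same value on a given wheel, and only the Basket stands apart.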
Our results indicate that the sex ratio at conception is unbiased, the proportion of males increases during the first trimester, and total female mortality during pregnancy exceeds total male mortality; these are fundamental insights into early human development.
Orzack et al. (2015), Proceedings of the National Academy of Sciences
This post follows an old newspaper report – about the falling female/male ratio at birth in Kerala, a state in India that boasts its high female to male ratio in the population. The news suspected selective foeticide as the reason for this, a familiar allegation against many rich states of India. Let us start with the data (the data in 2021 is incomplete):
What happens in the rest of the world?
As per the data put together by the World Health Organization (WHO), the male-to-female ratio at birth in several parts of the world ranges between 104 and 106 males per 100 females, with a few high-profile outliers such as China (113), India (110), Pakistan (109) and Vietnam (112).
What does science tell?
Orzack et al. published a thorough research paper in 2015 on this topic. The team has collected data starting with 3-6 days old embryos and all the way to live births and mapped out the whole trajectory – from conception to childbirth.
The Sex Ratio (SR) is defined here as the number of male children divided by the total; SR = 0.5 means an unbiased state, > 0.5 biased for males. The SR at conception is the Primary Sex Ratio (PSR).
The analysis of data from Assisted Reproductive Technology (ART) suggested that the PSR (the sex ratio at conception) was close to unbiased, at 0.502 (95% confidence interval between 0.499 and 0.505). The sex ratio becomes slightly female-biased within a week or two because more male embryos are chromosomally abnormal and die. It changes to 0.511 by weeks 6-12 (first trimester) and 0.559 by week 20 (second trimester). These findings are consistent with the observed higher net female mortality during the first and second trimesters. The SR then starts decreasing due to higher male mortality in the third trimester. Add up all these dynamics and you get the final SR of 0.51, or 105 males per 100 females, at birth.
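The text uses two scales: the SR (male share of the total) and males per 100 females. A small R sketch of the conversion between them, with function names made up for illustration:

```r
# Convert between the sex ratio (males / total) and males per 100 females.
# Both function names are hypothetical helpers.
sr_to_males_per_100 <- function(sr) 100 * sr / (1 - sr)
males_per_100_to_sr <- function(m) m / (m + 100)

sr_at_105 <- males_per_100_to_sr(105)     # SR implied by 105 males per 100 females
unbiased  <- sr_to_males_per_100(0.5)     # an unbiased SR on the other scale

print(round(sr_at_105, 3))
print(unbiased)
```

So 105 males per 100 females corresponds to an SR of a little over 0.51.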
So was there a concern?
The short answer to the initial question (Kerala) is a NO. Look at the data in the last ten years. The plot below shows the number of males per 100 females, and the red dotted line represents 105.
On the other hand, a glance at the yearly death data suggests a bias for males over females.
One can never prove the absence of selective foeticide against girl children. But the overall data doesn’t show any ‘abnormal’ features. It is equally impressive to know that females eventually gain back control in the final population figures due to their higher life expectancy.
The rule of compounding works in the following manner. Your money is on the y-axis, and the number of years you have stayed invested is on the x-axis. Here, you invested 1 dollar and fetched average yearly returns of 12%.
Read the conditions
Each number is important. Read the conditions regarding the fees to enter and exit a scheme. Do you want to know the price you pay for not doing it? Read the next section.
If you allow 2% to go, you lose 60%
You have a product that can give a 12% annual return from two sources: 1) takes no expense ratio and 2) takes a 2% expense ratio. Take the one with no expense ratio. In India, this means buying direct mutual funds and not regular ones. Look what happens to an investment worth 12% (red diamonds) and the one with 2% subtracted.
At the end of the 50th year, your one dollar is worth 289, yet you get 117! Where did the rest go?
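The two end values can be reproduced in R. This is a sketch under one assumption: the 2% expense ratio is treated as a straight subtraction from the 12% annual return, as the text describes. The function name future_value is made up for illustration.

```r
# Value of an investment after compounding at a fixed annual rate.
# future_value is a hypothetical helper.
future_value <- function(principal, rate, years) principal * (1 + rate)^years

gross <- future_value(1, 0.12, 50)          # no expense ratio
net   <- future_value(1, 0.12 - 0.02, 50)   # ASSUMPTION: 2% simply subtracted
lost_share <- 1 - net / gross               # fraction of the final value given up

print(round(gross))
print(round(net))
print(round(100 * lost_share, 1))
```

The lost share comes out just under 60%, which is where the section heading gets its number.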
Trust the plots above
or remember the formula of compounding: final value = initial investment x (1 + annual return)^(number of years)
Most financial advisors are just agents
who have conflicts of interest.
In summary, do the scheme of your choice, not the agent’s. Remember the rule of compounding.
Testing programs are not about machines but the people behind them.
We get into the calculations straight away. The equations that we made last time are:
Before we go further, let me show the output of 8 scenarios obtained by varying sensitivity and prevalence.
Case # | Chance of Disease for +ve (%) | Missed in 10000 tests
Chance of Disease for +ve = probability that a person is infected given her test result is positive. Missed in 10000 tests = the number of infected people showing negative results in every 10,000 tests.
Note that I fixed specificity in those calculations. The leading test methods of Covid19, RT-PCR and rapid Antigen are both known to have exceptionally low false-positive rates or specificities of close to 100%.
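To see how such a scenario table can be produced, here is a rough R sketch. The inputs are assumptions, not the values behind the original table: specificity is fixed at 0.98, and the four sensitivities and two prevalences are chosen only to give eight illustrative cases. The function names are hypothetical.

```r
# ASSUMED inputs: not the original table's values.
specificity <- 0.98
scenarios <- expand.grid(sensitivity = c(0.65, 0.75, 0.85, 0.95),
                         prevalence  = c(0.001, 0.01))

# Chance of disease given a positive test (Bayes' theorem).
ppv <- function(sens, spec, prev) {
  sens * prev / (sens * prev + (1 - spec) * (1 - prev))
}

# Infected people who test negative in every 10,000 tests.
missed_per_10000 <- function(sens, prev) 10000 * prev * (1 - sens)

scenarios$ppv_percent <- round(100 * ppv(scenarios$sensitivity, specificity,
                                         scenarios$prevalence), 1)
scenarios$missed <- round(missed_per_10000(scenarios$sensitivity,
                                           scenarios$prevalence), 1)
print(scenarios)
```

With these assumed numbers, the most sensitive test at 1% prevalence misses more infected people per 10,000 tests than the least sensitive one at 0.1%.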
Now the results.
Before the Spread
It is when the prevalence of the disease was at 0.001 or 0.1%. While it is pretty disheartening to know that 95% of the people who tested positive and isolated did not have the disease, you can argue that it was a small sacrifice one made for society! The scenarios of low prevalence also seem to offer a comparative advantage for carrying out random tests using more expensive, higher-sensitivity tests. Those are also occasions of extensive quarantine rules for the incoming crowd.
After the Spread
Once the disease has displayed its monstrous feat in the community, the focus must change from prevention to mitigation. The priority of the public health system shifts to providing quality care to the infected people, and the removal of highly infectious people comes next. Devoting more effort to testing a large population using time-consuming and expensive methods is no longer practical for medical staff, who are now required for patient care. And by now, even the most accurate test releases more infected people into the population than the least sensitive method did when the infection rate was a tenth of what it is now.
Working Smart
A community spread also signals that it is time to switch the mode of operation. The problem is massive, and the resources are limited. An ideal situation to intervene and innovate. But first, we need to understand the root cause of the varied sensitivity and estimate the risk of leaving out the false negatives.
Reason for Low Sensitivity
The sensitivity of Covid tests is spread all over the place – from 40% to 100%. It is true for RT-PCR, even truer for rapid (antigen) tests. The reasons for an ultimate false-negative test may lie with a lower viral load of the infected person, the improper sample (swab) collection, the poor quality of the kit used, inadequate extraction of the sample at the laboratory, a substandard detector of the instrument, or all of them. You can add them up, but in the end, what matters is the concentration of viral particles in the detection chamber.
Both techniques require a minimum concentration of viral particles in the test solution. Imagine a sample that contains lower than the critical concentration. RT PCR manages this shortfall by amplifying the material in the lab, cycle by cycle, each doubling the count. That defines the cycle threshold (CT) as the number of amplification cycles required for the fluorescent signal to cross the detection threshold.
Suppose the solution requires a million particles per ml of the solution (that appears in front of the fluorescent detector), and you get there by running the cycle 21 times. You get a signal, you confirm positive and report CT = 21. If the concentration at that moment was just 100, you don’t get a response, and you continue the amplification step until you reach CT = 35 (100 x 2^(35 – 21) = 100 x 2^14, which is > 1 million). The machine then detects the signal, and you report a positive at CT = 35. However, this process can’t go on forever; depending on the protocols, the CT has a cut-off of 35 to 40.
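The doubling arithmetic in this example can be sketched in R. The function name cycles_needed is made up for illustration, and the sketch assumes a perfect doubling at every cycle, as the text does.

```r
# Number of extra doubling cycles needed to lift a concentration
# from `current` to at least `target`. Hypothetical helper; assumes
# an ideal doubling per cycle.
cycles_needed <- function(current, target) ceiling(log2(target / current))

extra_cycles <- cycles_needed(100, 1e6)   # starting from 100 at cycle 21
ct <- 21 + extra_cycles
reached <- 100 * 2^extra_cycles           # concentration at that CT

print(extra_cycles)
print(ct)
print(reached)
```

Thirteen extra cycles would still fall short of a million, which is why the example needs fourteen.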
On the other hand, Antigen tests detect the presence of viral protein, and they have no means to amplify the quantity. After all, it is a quick point-of-care test. A direct comparison with the PCR family does not make much sense, as the two techniques work on different principles. But reports suggest sensitivities of > 90% for antigen tests for CT = 28 and lower. You can spare a thought for the irony that an Antigen test is sensitive enough to detect a virus that the PCR machine would have taken 28 rounds of amplification to find. But that is not the point. If you have the facility to amplify, why not use it?
The Risk of Leaving out the Infected
It is a subject of immense debate. Some scientists argue that the objectives of the testing program should be to detect and isolate the infectious and not every infected. While this makes sense in principle, there is a vital flaw in the argument. There is an underlying assumption that the person with too few counts to detect is always on the right side of the infection timeline – in the post-infectious phase. In reality, the person who got the negative test in a rapid screening can also be in the incubation period and become infectious in a few days. The proponents point to the shape of the infection curve, which is skewed to the right: fewer days to incubate to a sizeable viral quantity and more time on the right. Another suggestion is to test more frequently so that the person who was missed due to a lower count comes back for the test a day or two later and is then caught.
How to Increase Sensitivity
There are a bunch of activities the system can do. The first in the list is to tighten the quality control or prevent all the loss mechanisms from the time of sampling till detection. That is training and procedures. The second is to change the strategy from an analytical regime to a clinical one – from random screening to targeted testing. For example, if the qualified medical professional identifies patients with flu-like symptoms, the probability of catching a high-concentration sample increases. Once that sample goes to the testing device for the antigen, you either find the suspect (covid) or not (flu), but it was not due to any lack of virus from the swab. If the health practitioner still suspects covid, she may recommend an RT PCR, but it is no longer a random decision.
In Summary
We are in the middle of a pandemic. The old ways of prevention are no longer practical. Covid diagnostics started as a clinical challenge, but somewhere along the journey, that shifted more to analytics. While test-kit manufacturers, laboratories, data scientists and the public are all valuable players to maximise the output, the lead must go back to trained medical professionals. A triage system, based on experience in identifying symptoms and suggesting follow-up actions, is a strategy worth the effort to stop this deluge of cases.
We have seen the definitions. We will see their applications in diagnosis. As we have seen, both Sensitivity and Specificity are probabilities, and the diagnostic process’s job is to bring certainty to the presence of a disease from the data. And the tool we use is Bayes’ theorem. So let’s get started.
We tailor the Bayes’ theorem for our screening test. First, the chance of being infected after the person was diagnosed with a positive test. Epidemiologists call it positive predictive value or, in our language, the posterior probability.
Positive Predictive Value (PPV)
P(Inf|+) = P(+|Inf) x P(Inf) / [P(+|Inf) x P(Inf) + P(+|NoInf) x P(NoInf)]
Looking at the equation carefully, we can see the following. P(+|Inf) is the true positive or the sensitivity, and P(+|NoInf) is the false positive or (1 – Specificity). It leaves two unknown variables – P(Inf) and P(NoInf). P(Inf) is the prevalence of the disease in the community, and P(NoInf) is 1 – P(Inf).
And we’re done! Let’s apply the equation for a person who tested COVID-19 positive as part of a random sampling campaign in a city with a population of 100,000 and 100 ill people. The word random is a valuable description to remember; you will see the reason in a future post. Assume a sensitivity of 85% (yes, for your RT-PCR!) and a specificity of 98%.
Chance of Infection = 0.85 x 0.001 /(0.85 x 0.001 + 0.02 x 0.999) = 0.04. The instrument was of good quality, the health worker was skilled, and the system was honest (three deadly assumptions to make), yet she had only a 4% chance of infection.
Negative Predictive Value (NPV)
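Now, quickly jump to the opposite: what is the chance that someone who tested negative is infected and escapes the diagnostic web of the community?
P(Inf|-) = P(-|Inf) x P(Inf) / [P(-|Inf) x P(Inf) + P(-|NoInf) x P(NoInf)] = 0.15 x 0.001 / (0.15 x 0.001 + 0.98 x 0.999) = 0.00015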
There is a 99.98% certainty of no illness or a 0.02% chance of accidentally escaping the realm of the health protocol.
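Both calculations can be reproduced with a short R sketch. The function name posterior_infected is an assumption for illustration; the inputs are the ones used in the text (sensitivity 85%, specificity 98%, 100 ill people in 100,000).

```r
# Chance of being infected given a test result, via Bayes' theorem.
# posterior_infected is a hypothetical helper.
posterior_infected <- function(test_positive, sens, spec, prev) {
  if (test_positive) {
    like_inf <- sens        # P(+|Inf)
    like_no  <- 1 - spec    # P(+|NoInf)
  } else {
    like_inf <- 1 - sens    # P(-|Inf)
    like_no  <- spec        # P(-|NoInf)
  }
  like_inf * prev / (like_inf * prev + like_no * (1 - prev))
}

prev <- 100 / 100000
p_inf_given_pos <- posterior_infected(TRUE,  0.85, 0.98, prev)   # PPV
p_inf_given_neg <- posterior_infected(FALSE, 0.85, 0.98, prev)
npv <- 1 - p_inf_given_neg

print(round(100 * p_inf_given_pos, 1))
print(round(100 * p_inf_given_neg, 3))
print(round(100 * npv, 2))
```

The first number is the roughly 4% chance of infection after a positive test; the second is the small chance that a negative result hides an infection.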
What These Mean
In the first example (PPV), a 4% chance of infection means relief to the person eventually, but there is the pain of the mandatory ‘isolation’ as the system treats her as infected.
The second one (NPV) is the opposite; for the individual, 0.02% is low; therefore, a test with medium sensitivity is quite acceptable. For the system, which wants to trace and isolate every single infected person, this means, that for every 10,000 people sampled randomly, there is a chance to send out two infected individuals into the society.
We have made a set of assumptions regarding sensitivity, specificity and prevalence. And the output is related to those. We will discuss the reasons behind these assumptions, the cost-risk-value tradeoffs, and the tricks to manage traps of diagnostics. But next time. Ciao.
Screening tests such as PCR are typically employed to test the likelihood of microbial pathogens in the body. Test results are estimates of probability and are evaluated by trained medical professionals to confirm the illness or to recommend any follow-up actions. Two terms that we have extensively used in the last two years have been the sensitivity and specificity of covid tests.
Sensitivity: Positive Among Infected, P(+|Inf)
Sensitivity is a conditional probability. It is not the ability of the machine to pick ill people from the population, although it could be related. But it is:
A test’s ability to correctly identify the infected from a group of people who are all infected.
P(+|Inf) – the probability of getting a positive result given the person was infected.
A test has a sensitivity of 0.8 (80%) if it can correctly identify 80% of people who have the disease. However, it wrongly assigns 20% with negative results.
Specificity: Negative Among Healthy, P(-|NoInf)
A test’s ability to correctly identify the healthy from a group of people who are not infected.
P(-|NoInf) – the probability of getting a negative result given the person was not infected.
A test with 90% specificity correctly identifies 90% of the healthy and wrongly gives out positive results to the remaining 10%.
Final Remarks
We’ll stop here but will continue in another post. Sensitivity = P(+|Inf) = 1 – P(-|Inf). If you are infected, a test can either give a positive or a negative result (mutually exclusive probabilities). In other words, you are either true positive or false negative.
Specificity = P(-|NoInf) = 1 – P(+|NoInf). If you are healthy, a test can either give a negative or a positive test result – a true negative or a false positive.
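As a small illustration of these definitions, here is an R sketch that computes both quantities from counts. The counts are hypothetical, picked only to match the 80% and 90% examples above.

```r
# HYPOTHETICAL counts from testing 100 infected and 100 healthy people.
true_pos  <- 80   # infected, tested positive
false_neg <- 20   # infected, tested negative
true_neg  <- 90   # healthy, tested negative
false_pos <- 10   # healthy, tested positive

sensitivity <- true_pos / (true_pos + false_neg)   # P(+|Inf)
specificity <- true_neg / (true_neg + false_pos)   # P(-|NoInf)

false_negative_rate <- 1 - sensitivity             # P(-|Inf)
false_positive_rate <- 1 - specificity             # P(+|NoInf)

print(c(sensitivity, specificity, false_negative_rate, false_positive_rate))
```

Note that each quantity is computed within its own group, the infected or the healthy, which is why prevalence does not appear here.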
Does a positive result from the screening test prove the person is infected? No, you need to know the prevalence to proceed further. We’ll see why we developed these equations and how we could use them to evaluate test results correctly.
Last time we set the objective: i.e. to find the posterior distribution of the expected value, from a Poisson distributed set of variables using a Gamma distribution of the mean as the prior information.
Caution: Math Ahead!
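So we have a function and a prior. We will obtain the posterior using Bayes’ theorem. The function (likelihood) for n Poisson observations x1, ..., xn is f(x|mu) = product over i of [exp(-mu) x mu^(xi) / xi!], and the prior Gamma(a, b) is g(mu) = b^a x mu^(a - 1) x exp(-b x mu) / Gamma function of a. Then,
posterior(mu|x) = f(x|mu) x g(mu) / integral of [f(x|mu) x g(mu)] over mu
The integral in the denominator will be a constant. Therefore,
posterior(mu|x) is proportional to mu^(Sum of xi) x exp(-n x mu) x mu^(a - 1) x exp(-b x mu) = mu^(a + Sum of xi - 1) x exp(-(b + n) x mu)
Look at the above equation carefully. Don’t you see the resemblance with a Gamma p.d.f, sans the constant? The posterior is Gamma(a + Sum of xi, b + n).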
End Game
So if you know a prior gamma, you can get a posterior gamma based on the above equations. Recall the table from the previous post. The Sum of xi is 42000 and n is 7. Assume Gamma(6000,1) as a prior. This leads to a posterior of Gamma(48000,8). Mean = 48000/8 and variance = 48000/8^2. The standard error becomes the square root of variance divided by the square root of n.
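The update rule is short enough to write down in R. This is a minimal sketch; the function name gamma_poisson_update is made up for illustration, and the numbers are the ones quoted in the text.

```r
# Posterior Gamma parameters for Poisson data with a Gamma(a, b) prior:
# a_post = a + sum of observations, b_post = b + number of observations.
# gamma_poisson_update is a hypothetical helper.
gamma_poisson_update <- function(a, b, sum_x, n) {
  list(a = a + sum_x, b = b + n)
}

post <- gamma_poisson_update(a = 6000, b = 1, sum_x = 42000, n = 7)

post_mean <- post$a / post$b      # a/b
post_var  <- post$a / post$b^2    # a/b^2
post_sd   <- sqrt(post_var)

print(unlist(post))
print(c(mean = post_mean, variance = post_var, sd = post_sd))
```

With this prior, the posterior mean stays at 6000 because the prior mean and the data average agree.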
Expanding the Prior Landscape
Naturally, you may be wondering why I chose a prior that has a mean of 6000, or where I got that distribution from etc. And these are valid arguments. The prior was arbitrarily chosen to perform the calculations. In reality, you can get it from several sources – from similar shops in the town, scenarios created for worst (or best) case situations and so on. Rule number one in the scientific process is to challenge, and two is to experiment. So, we run a few cases and see what happens.
Imagine you come up with a prior of Gamma(8000,2). What does this mean? A distribution with a mean of 4000 and a variance of 2000 (standard deviation 44). [Recall mean = a/b; variance = a/b^2]. The original distribution (Poisson) remains the same because it is your data.
Take another, Gamma(8000,1). A distribution with a mean of 8000 and a variance of 8000 (standard deviation 89).
Yes, the updated distributions do change positions, but they still hang around the original (from own data) probability density created by the Poisson function.
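To put numbers on how far the posterior moves, here is an R sketch that applies the same update to the three priors discussed. The 95% intervals from qgamma are an addition for illustration and do not appear in the original calculations.

```r
# Data summary from the text: seven days, 42000 visitors in total.
sum_x <- 42000
n <- 7

# The three priors discussed: Gamma(6000,1), Gamma(8000,2), Gamma(8000,1).
prior_a <- c(6000, 8000, 8000)
prior_b <- c(1, 2, 1)

post_a <- prior_a + sum_x
post_b <- prior_b + n

result <- data.frame(
  prior      = paste0("Gamma(", prior_a, ",", prior_b, ")"),
  prior_mean = prior_a / prior_b,
  post_mean  = post_a / post_b,
  # ILLUSTRATIVE addition: central 95% interval of each posterior
  lower_95   = qgamma(0.025, shape = post_a, rate = post_b),
  upper_95   = qgamma(0.975, shape = post_a, rate = post_b)
)
print(result)
```

Prior means of 4000 and 8000 pull the posterior mean to roughly 5556 and 6250, both still close to the data average of 6000.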
You may have noticed the power of Bayesian inference. The prior information can change expectations on the future yet retain the core elements.
Do you remember the shopping mall example? The one which attracts about 6000 customers a day? Now your task is to establish an expected value, the number of customers in a given day, and a confidence interval around it. You have the customer visits from the previous week as a reference.
Number of Visitors
The simplest way is: find out the mean, assume a distribution, and calculate the standard error. Let’s do that first. Since the number of visitors is a count, and we think their arrivals are random and independent (are they?), we choose to use the Poisson distribution. The average of all those numbers is 6000, so it is X ~ Poisson(mu = 6000).
In English, it meant: for fetching the distribution of counts at a given average (mu), we decided to use a Poisson distribution with a parameter mu.
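The advantage of using the Poisson is that we can now get the variance easily. For Poisson, the mean and variance are both the same, equal to mu = 6000. Therefore, the standard deviation is the square root of 6000, about 77.5, and the standard error of the mean over the 7 days is the square root of (6000/7), about 29.3.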
Bayesian Statistics
By now, you may have sensed that the best way to capture the uncertainties of customer visits is to consider the average too as a variable. After all, the present mean (6000) is just from a week’s data. Since the average is no longer limited to integers but can also take fractional values, we go for a continuous distribution such as the Gamma distribution to represent it. In other words, a distribution of mu is my prior knowledge of the average. And our objective is to get the updated mu or the posterior. So we are finally in the Bayesian space for distributions, or Bayesian statistics.
In Summary
You use the prior knowledge of the expected value (or average) through a Gamma distribution and apply it to the variable defined by a Poisson distribution. No marks for guessing: the posterior will be a Gamma! We will complete the exercise in the next post.
Yet another type of distribution – the Gamma distribution. It is an example of a continuous distribution. i.e. the data (or the random variable) can take any values within its range. Look at a variable like the weight of people. The values it can take vary, from its lower to upper bound, through infinite micrograms in between. Whereas the distributions we have seen so far (binomial and Poisson) had to restrict themselves to counts or tries of integer values.
As we did earlier for Poisson and Binomial, we plot the actual distribution of the random variable, probability density function and cumulative distribution function. Take a set of fictitious data from 200 Dutch adults for their heights.
The R function that creates random variables is rgamma. Besides the number of draws n, it takes two parameters, the shape a and the rate b – rgamma(n, a, b). One interesting thing about these two parameters is that the expectation (mean) of the distribution is (a/b), and the variance is (a/b^2). Similarly, dgamma gives the PDF of the distribution.
Gamma distribution is used for modelling systems whose outcomes are positive. The distribution is not symmetric. For the example we created, the mean comes out to be 670.15/3.65 = 183.6 and standard deviation = square root (670.15/3.65^2) = 7.1
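A short R sketch of the height example follows. The parameters a = 670.15 and b = 3.65 come from the text; the seed value is arbitrary and the simulated heights are fictitious, so the sample statistics will only be close to the theoretical ones.

```r
set.seed(1)   # arbitrary seed, only for reproducibility

a <- 670.15   # shape
b <- 3.65     # rate

# 200 fictitious heights (cm) drawn from Gamma(a, b)
heights <- rgamma(200, shape = a, rate = b)

theo_mean <- a / b            # expectation a/b
theo_sd   <- sqrt(a / b^2)    # square root of the variance a/b^2

print(round(c(theo_mean, theo_sd), 1))
print(round(c(mean(heights), sd(heights)), 1))

# Density (PDF) at the theoretical mean, from dgamma
print(dgamma(theo_mean, shape = a, rate = b))
```

Passing shape and rate by name avoids mixing up the rate with the scale, which rgamma also accepts.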
There is a reason why I have introduced Gamma distribution immediately after the Poisson. That is for another post!
Height of Dutch Children from 1955 to 2009: Nature
Take the example of this shopping mall that attracts about 6000 customers daily, between 10 AM and 8 PM. The shop manager wants to know the probability of 50 customers visiting the shop between 12:00 and 12:05 next Monday. How do you do it?
One way is to divide the time into several small intervals and do Bernoulli (binomial) trials at each interval using an average probability of someone arriving during that interval based on historical data. How do you divide the time – into hours, minutes or seconds? It seems a very laborious process.
Instead of dividing time into compartments and running Bernoulli trials for each of those intervals, what about taking the time-averaged visitors and estimating expected numbers for the given interval? This method of collecting timestamps instead of recording counts at regular intervals is the strength of the Poisson (/ˈpwɑːsɒn/) distribution. It is still a discrete distribution, for the outcome is still a count, but its time dimension is a continuum.
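The link between the two approaches can be seen numerically. In this R sketch, the five-minute window is cut into an assumed number of sub-intervals (the three slot counts are arbitrary choices), and the average of 50 arrivals per five minutes follows from 6000 customers over 10 hours.

```r
lambda <- 50                       # average arrivals in a 5 minute window

# ASSUMED slicings of the window into ever smaller sub-intervals
slots <- c(300, 3000, 300000)

# Binomial: one Bernoulli trial per sub-interval, with p = lambda / slots
p_binom <- dbinom(50, size = slots, prob = lambda / slots)

# Poisson: no slicing needed
p_pois <- dpois(50, lambda)

print(data.frame(slots, p_binom, p_pois))
```

As the slices get finer, the binomial answer settles on the Poisson one, so the laborious slicing is not needed.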
We do the same process that we did last time. Following are the event, PMF and CDF of the Poisson process.
The R code required to generate the above plots is below. Please take special note of the three special functions – rpois, dpois and ppois.
# Simulate Poisson arrivals and plot the outcomes, the PMF and the CDF
trial <- 100            # number of simulated intervals
xxx <- seq(0, trial)    # arrival counts to evaluate, starting from 0
lambda <- 10            # expected number of arrivals per interval
par(bg = "antiquewhite1", mfrow = c(1,3))
# rpois: random draws from the Poisson distribution
plot(rpois(trial, lambda), xlim = c(0,100), ylim = c(0,25), xlab="Arrival #", ylab="Count", col = "red", cex = 1, pch = 5, type = "p", bg=23, main="Poisson Outcomes")
grid(nx = 10, ny = 9)
# dpois: probability of exactly x arrivals (PMF)
plot(xxx, dpois(xxx, lambda), xlim = c(0,20), ylim = c(0,1), xlab="Number of Arrivals", ylab="Probability of Arrivals", col = "red", cex = 1, pch = 5, type = "p", bg=23, main="Poisson PMF")
grid(nx = 10, ny = 9)
# ppois: probability of x arrivals or fewer (CDF)
plot(xxx, ppois(xxx, lambda, lower.tail=TRUE), xlim = c(0,20), ylim = c(0,1), xlab="Number of Arrivals", ylab="Cumulative Probability of Arrivals", col = "red", cex = 1, pch = 2, type = "p", bg=23, main="Poisson CDF")
grid(nx = 10, ny = 9)
Now to answer the manager’s question. The shop receives 6000 customers over its 10 opening hours, i.e. an average of 50 customers every 5 minutes. It implies a Poisson function with an expected value (lambda) of 50. So what is the chance of exactly 50 people arriving in a 5 min interval on a future day? It is dpois(50, lambda) = 5.6%
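The manager’s number, and a couple of related ones, can be obtained as below. The "at most" and "at least" variants are additions for illustration; the question in the text only asks for exactly 50.

```r
# 6000 customers over 10 hours, in 5 minute windows
windows_per_day <- 10 * 60 / 5
lambda <- 6000 / windows_per_day            # expected arrivals per window

p_exactly_50  <- dpois(50, lambda)          # exactly 50 arrivals
p_at_most_50  <- ppois(50, lambda)          # 50 or fewer (illustrative extra)
p_at_least_50 <- 1 - ppois(49, lambda)      # 50 or more (illustrative extra)

print(lambda)
print(round(100 * p_exactly_50, 1))
print(round(c(p_at_most_50, p_at_least_50), 3))
```

Even at the most likely count, the chance of hitting exactly 50 is small, because the probability is spread over many neighbouring counts.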