Decision Making

Viterbi algorithm – R Program

The steps we built in the previous post can be carried out with the following R code. Note that you need to install the HMM library for this.

library(HMM)
# startProbs: day-1 priors of H and F from the stationary distribution (4/7, 3/7)
hmm <- initHMM(c("H","F"), c("NOR","COL", "DZY"), startProbs = c(4/7, 3/7),
	transProbs = matrix(c(0.7, 0.4, 0.3, 0.6), nrow = 2),
	emissionProbs = matrix(c(0.5, 0.1, 0.4, 0.3, 0.1, 0.6), nrow = 2))

observations <- c("NOR","COL","DZY")

vPath <- viterbi(hmm,observations)

vPath 
"H" "H" "F"


Viterbi algorithm – The Solution

A doctor wants to diagnose whether a patient has a fever or is healthy. The patient can explain the conditions in three options: “Normal,” “Cold,” or “Dizzy.” The doctor has statistical data on health and how patients feel. If a patient comes to the doctor and reports “normal” on the first day, “cold” on the second, and “dizzy” on the third, how is the diagnosis done?

The probability of H today, given it was H the previous day = 0.7
The probability of F today, given it was H the previous day = 0.3
The probability of F today, given it was F the previous day = 0.6
The probability of H today, given it was F the previous day = 0.4
The probability of appearing N, given H, P(N|H) = 0.5
The probability of appearing C, given H, P(C|H) = 0.4
The probability of appearing D, given H, P(D|H) = 0.1
The probability of appearing N, given F, P(N|F) = 0.1
The probability of appearing C, given F, P(C|F) = 0.3
The probability of appearing D, given F, P(D|F) = 0.6

The Viterbi steps are:
1) Estimate the prior probabilities of being healthy (H) or having a fever (F).
2) Calculate the posterior (the numerator) of day 1 using Bayes.
3) Compare the two posteriors (H vs F) and find the more likely condition.
4) Use it as the prior for the next day and repeat steps 2 and 3.

Day 1: “Normal”

Step 1: Estimate the prior probabilities. Solving H = 0.7 H + 0.4 F together with H + F = 1 (exactly as we did for the sunny day in the earlier post) gives P(H) = 4/7 ≈ 0.57 and P(F) = 3/7 ≈ 0.43.

Step 2: Calculate the posterior (the numerator) of day 1 using Bayes.
P(H|N) = P(N|H) x P(H) = 0.5 x 0.57 ≈ 0.29
P(F|N) = P(N|F) x P(F) = 0.1 x 0.43 ≈ 0.04
You can see the pattern already: one factor is the emission probability (0.5), and the other is the prior (0.57) obtained from the transition probabilities.

Step 3: Compare the posteriors
P(H|N) > P(F|N)

Day 2: “Cold”

Now, we move to the next day. Note that we can't discard the branch that scored lower on the first day, as it can still contribute to the next day.

Part 1: We start with Healthy (day 1) – Healthy (day 2) by multiplying the transition probability (0.7), the emission probability (0.4) and the contribution from the earlier step (0.29): 0.7 x 0.4 x 0.29 ≈ 0.08.
That is not the only way to arrive at "Healthy". It can also come from the "Fever" of the previous day. The same math gives 0.4 x 0.4 x 0.04 = 0.0064.

So, there are two ways to arrive at H on the second day. One has a probability of 0.08 (start – H – H), and the other 0.0064 (start – F – H). The first is greater than the second, so retain start – H – H.

0.08 is the maximum probability we carry now for H to fight against F on day 2.

Part 2: Do the same analysis for the maximum probability for F on day 2.
Two arrows lead to "Fever". The first gives a probability of 0.6 x 0.3 x 0.04 = 0.0072, and the second, 0.3 x 0.3 x 0.29 ≈ 0.026. Retain 0.026 as the probability for "Fever" as it is higher than 0.0072.

Final part: compare H vs F. 0.08 > 0.026, so H wins on day 2 as well.

Day 3: “Dizzy”

H-H: 0.7 x 0.1 x 0.08 = 0.0056
F-H: 0.4 x 0.1 x 0.026 ≈ 0.001
F-F: 0.6 x 0.6 x 0.026 ≈ 0.0094
H-F: 0.3 x 0.6 x 0.08 = 0.0144

The best path ending in H has probability 0.0056 (H-H), while the best path ending in F has 0.0144 (H-F). Since 0.0144 > 0.0056, day 3 goes to Fever.

Given the patient’s feedback, the assessment would have resulted in Healthy on days 1 and 2 but Fever on day 3.
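
The same bookkeeping can be scripted without any package. Below is a minimal base-R sketch of the recursion (the names A, E, prior and so on are chosen here for illustration); it uses the stationary priors 4/7 and 3/7 and reproduces the path H, H, F:

# Transition (A) and emission (E) probabilities; rows are the hidden states H and F
A <- matrix(c(0.7, 0.3,
              0.4, 0.6), nrow = 2, byrow = TRUE,
            dimnames = list(c("H", "F"), c("H", "F")))
E <- matrix(c(0.5, 0.4, 0.1,
              0.1, 0.3, 0.6), nrow = 2, byrow = TRUE,
            dimnames = list(c("H", "F"), c("NOR", "COL", "DZY")))
prior <- c(H = 4/7, F = 3/7)    # day-1 priors from the stationary distribution
obs <- c("NOR", "COL", "DZY")
states <- c("H", "F")

v <- matrix(NA, 2, length(obs), dimnames = list(states, obs))           # best probabilities
back <- matrix(NA_integer_, 2, length(obs), dimnames = list(states, obs))
v[, 1] <- prior * E[, obs[1]]   # day 1: prior x emission (0.29, 0.04)
for (t in 2:length(obs)) {
  for (s in states) {
    cand <- v[, t - 1] * A[, s]       # the two ways of arriving at state s
    back[s, t] <- which.max(cand)     # remember the better branch
    v[s, t] <- max(cand) * E[s, obs[t]]
  }
}

# Backtrack from the best final state to recover the path
path <- character(length(obs))
path[length(obs)] <- states[which.max(v[, length(obs)])]
for (t in (length(obs) - 1):1) path[t] <- states[back[path[t + 1], t + 1]]
path   # "H" "H" "F"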

References

Viterbi algorithm: Wiki

The Viterbi Algorithm: ritvikmath


Viterbi algorithm

A doctor wants to diagnose whether a patient has a fever or is healthy. The patient can explain the conditions in three options: “Normal,” “Cold,” or “Dizzy.” The doctor has statistical data on health and how patients feel. If a patient comes to the doctor and reports “normal” on the first day, “cold” on the second, and “dizzy” on the third, how is the diagnosis done?

Before we get to the solution, let’s recognise this as a hidden Markov process.
The hidden (latent) variables are "Fever" and "Healthy." They have transition probabilities.
The observed variables are: “Normal,” “Cold,” and “Dizzy.” They have emission probabilities.

The Viterbi algorithm recursively finds the most likely sequence of hidden states using the maximum a posteriori probability. As the earlier post shows, the Bayes equation estimates the posterior probability. Since the objective is only to find the maximum a posteriori, Viterbi requires just its numerator. In our case,

P(H|N) = P(N|H) x P(H)

H represents “Healthy”, and N denotes “Normal”.

The steps are:
1) Estimate the prior probabilities of being healthy (H) or having a fever (F).
2) Calculate the posterior (the numerator) of day 1 using Bayes.
3) Compare the two posteriors (H vs F) and find the more likely condition.
4) Use it as the prior for the next day and repeat steps 2 and 3.
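
In symbols, these steps form the recursion below, where \pi is the prior, A the transition probabilities, B the emission probabilities and o_t the observation on day t (this notation is introduced here only for compactness):

V_1(s) = \pi_s \, B_s(o_1), \qquad V_t(s) = \max_{s'} \left[ V_{t-1}(s') \, A_{s' s} \right] B_s(o_t)

The most likely sequence is recovered by remembering the maximising s' at each step and backtracking from the state with the largest V on the final day.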

Here is the hidden Markov tree, and we will see the calculations next.


Hidden Markov Model – Return of Bayes!

Having seen people's behaviour (whether they wear a raincoat or not), here is the question: given that you have seen someone wearing a normal coat, what is the probability that it is a sunny day? For reference, here is the hidden Markov tree.

In the usual math shorthand, we need to find P(S|NC). We can use the Bayes theorem here.
P(S|NC) = P(NC|S) x P(S) /[P(NC|S) x P(S) + P(NC|R) x P(R)]

P(NC|S) = probability of wearing a normal coat given it’s sunny. We know it is 0.9
P(NC|R) = probability of wearing a normal coat given it’s rainy. It is 0.4
P(S) = prior probability of today being sunny, i.e., without any clues from the dressing.
P(R) = 1 – P(S)

The probability of today being sunny

That can be estimated from the transition probabilities as follows. Here is the sunny part of the tree.

S = 0.7 S + 0.4 R

On the other hand,

R = 0.6 R + 0.3 S

To solve the equations, we need one more: S + R = 1. They lead to,

P(S) = 4/7 and P(R) = 3/7.

Back to Bayes

P(NC|S) = 0.9
P(NC|R) = 0.4
P(S) = 4/7
P(R) = 3/7
P(S|NC) = 0.9 x (4/7) /[0.9 x (4/7) + 0.4 x (3/7)]
= (3.6/7) / [(3.6/7) + (1.2/7)] = 3.6/4.8 = 0.75
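
The whole calculation fits in a few lines of R, a small check using the numbers above:

# Stationary probability of a sunny day from the transition probabilities:
# solving S = 0.7 S + 0.4 R together with S + R = 1 gives S = 0.4 / (0.3 + 0.4)
pS <- 0.4 / (0.3 + 0.4)     # 4/7
pR <- 1 - pS                # 3/7

# Bayes: P(S | NC)
pNC_S <- 0.9                # P(normal coat | sunny)
pNC_R <- 0.4                # P(normal coat | rainy)
pNC_S * pS / (pNC_S * pS + pNC_R * pR)
0.75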


Hidden Markov Model

We have seen how states move from one to another and the transition probabilities. A Markov chain models a sequence in which the probability of the next (n+1) event depends only on the state of the previous (n) event. That is an ordinary Markov chain, where we can see the states. Imagine we can't directly observe the states but only some indications of them. That is a hidden Markov model.

An often-used example is observing people’s attire and estimating if it’s sunny or rainy.

If it rains today, there is a 60% chance it will rain the next day and a 40% chance it will be sunny.
If it's sunny today, there is a 70% chance it will be sunny tomorrow and a 30% chance it will be rainy.

But you can't observe that. All you see is whether people wear raincoats or normal clothes.
If it rains, there is a 60% chance you will see people wearing raincoats and a 40% chance of normal clothes.
If it's sunny, there is a 90% chance you will see people wearing normal clothes and a 10% chance of raincoats.
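
These numbers are all we need to set up the model in R. Here is a sketch using the HMM package (the same package used in the later Viterbi post); the state and symbol names are chosen here for illustration:

library(HMM)
weather <- initHMM(c("Sunny", "Rainy"), c("Normal", "Raincoat"),
                   transProbs = matrix(c(0.7, 0.4,    # to Sunny, from Sunny and Rainy
                                         0.3, 0.6),   # to Rainy, from Sunny and Rainy
                                       nrow = 2),
                   emissionProbs = matrix(c(0.9, 0.4, # normal clothes, given Sunny and Rainy
                                            0.1, 0.6),# raincoat, given Sunny and Rainy
                                          nrow = 2))
simHMM(weather, 7)   # simulate a week of (hidden) weather and (observed) clothing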


Waiting for Heads – Four Times

What is the average number of tosses required for a coin to give four consecutive heads?

We will solve this as a Markov chain problem. We start at stage 0. At stage 0, two possibilities exist when flipping a coin: H or T. An H counts as one step in the right direction, whereas a T puts you back at stage 0. From H, the next flip leads to HH or back to 0 with equal probability, and so on up to HHH. So, the transition matrix among the transient states 0, H, HH and HHH (columns give the current state; the absorbing state HHHH is left out) is:

# Q: transitions among the transient states 0, H, HH, HHH (columns are the "from" states)
Q <- matrix(c(0.5, 0.5, 0.0, 0.0,
               0.5, 0.0, 0.5, 0.0, 
               0.5, 0.0, 0.0, 0.5, 
               0.5, 0.0, 0.0, 0.0), nrow = 4)

I <- matrix(c(1, 0, 0, 0,
               0, 1, 0, 0, 
               0, 0, 1, 0,
               0, 0, 0, 1), nrow = 4)
I_Q <- I - Q
I_Q_INV <- solve(I_Q)   # the fundamental matrix (I - Q)^(-1)

# Column sums of the fundamental matrix give the expected flips from each state
NN <- c(1, 1, 1, 1)
NN%*%I_Q_INV
30   28   24   16

It takes an average of 30 flips to go from 0 to HHHH.
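
The same construction works for any number of consecutive heads. Here is a short sketch (the function name is chosen here for illustration) that rebuilds Q for n heads and returns the expected number of flips; the results match the known closed form 2^(n+1) – 2:

expected_flips <- function(n) {
  # transient states 0, H, HH, ..., H^(n-1); columns are the "from" states
  Q <- matrix(0, n, n)
  Q[1, ] <- 0.5                                  # a tail sends every state back to 0
  for (k in seq_len(n - 1)) Q[k + 1, k] <- 0.5   # a head advances one state
  colSums(solve(diag(n) - Q))[1]                 # expected flips starting from state 0
}
sapply(1:4, expected_flips)
2  6 14 30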


Waiting for Heads – Markov Chains

Some time ago, we analytically estimated the expected waiting times of sequences HT and HH from coin-flipping games. Today, we calculate them using Markov chains.

Waiting time for HT

First, the notation: the states are Start, H (the last flip was heads), T (the last flip was tails), and the absorbing state HT. In the transition matrix, columns represent the current state and rows the next state:

\begin{bmatrix} 0.0 & 0.0 & 0.0  & 0.0\\ 0.5 & 0.5 & 0.5  & 0.0\\ 0.5 & 0.0 & 0.5  & 0.0\\ 0.0 & 0.5 & 0.0  & 1.0 \end{bmatrix}

Get Q, I – Q and its inverse 

AA <- matrix(c(0.0, 0.5, 0.5,
               0.0, 0.5, 0.0,
               0.0, 0.5, 0.5), nrow = 3)

II <- matrix(c(1, 0, 0,
               0, 1, 0,
               0, 0, 1), nrow = 3)
BB <- II - AA

CC <- solve(BB)

NN <- c(1, 1, 1)
NN%*%CC
 4    2    4

The average waiting time from 0 to HT is the first element in the vector, i.e., 4.

Waiting time for HH

The states are the same as before, with HH now the absorbing state:

\begin{bmatrix} 0.0 & 0.0 & 0.0  & 0.0\\ 0.5 & 0.0 & 0.5  & 0.0\\ 0.5 & 0.5 & 0.5  & 0.0\\ 0.0 & 0.5 & 0.0  & 1.0 \end{bmatrix}

AA <- matrix(c(0.0, 0.5, 0.5,
               0.0, 0.0, 0.5,
               0.0, 0.5, 0.5), nrow = 3)

II <- matrix(c(1, 0, 0,
               0, 1, 0,
               0, 0, 1), nrow = 3)
BB <- II - AA

CC <- solve(BB)

NN <- c(1, 1, 1)
NN%*%CC
 6    4    6

The waiting time is 6.
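
A quick simulation (added here as a sanity check, with an arbitrary helper name) confirms both numbers:

# Estimate the expected waiting time for a two-flip pattern by simulation
wait_for <- function(first, second, n_sims = 10000) {
  mean(replicate(n_sims, {
    n <- 1
    prev <- sample(c("H", "T"), 1)
    repeat {
      cur <- sample(c("H", "T"), 1)
      n <- n + 1
      if (prev == first && cur == second) break
      prev <- cur
    }
    n
  }))
}
set.seed(1)
wait_for("H", "T")   # close to 4
wait_for("H", "H")   # close to 6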


Visiting Markov Friends – The Eigenvector Problem

We have seen two ways of finding the end-stage distribution of cities in the example of Amy's random walk to her friends. Method #1 is to multiply the transition matrix by itself n times and finally multiply by the initial-state distribution. Method #2 is to solve the equation P X* = X*.

In the matrix form,

P =  \begin{bmatrix} 0.0 & 0.33 & 0.0 & 0.25 & 0.0\\ 0.5 & 0.0 & 0.5 & 0.25 & 0.0\\ 0.0 & 0.33 & 0.0 & 0.25 & 0.0\\ 0.5 & 0.33 & 0.5 & 0.0 & 1.0\\ 0.0 & 0.0 & 0.0 & 0.25 & 0.0 \end{bmatrix}

There is a third method: find the first eigenvector of the transition matrix and normalise it (i.e., divide it by the sum of its elements so that the total distribution is 1). We can do this in R, which has a built-in function, eigen(), for computing eigenvalues and eigenvectors.

pM <- matrix(c(0.0, 0.5,  0.0, 0.50, 0.0, 
               0.333, 0.0,  0.333, 0.333, 0.0,
               0.0, 0.5, 0.0, 0.5,  0.0,
               0.25, 0.25,  0.25, 0.0, 0.25,
               0.0, 0.0,  0.0, 1.0, 0.0), nrow = 5)

eigenvec <- eigen(pM)$vectors
first_eigenvec <- eigenvec[,1]   # eigenvector for the largest eigenvalue (1)

first_eigenvec/sum(first_eigenvec)
0.16663714 0.25003386 0.16663714 0.33333681 0.08335504
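
As a cross-check with Method #1, repeatedly applying the transition matrix to any starting distribution converges to the same vector; a small sketch using the pM defined above:

x <- c(1, 0, 0, 0, 0)           # start with Amy certainly in city 1
for (i in 1:100) x <- pM %*% x  # Method #1: apply the transition matrix repeatedly
as.vector(x)                    # approaches the same stationary distribution as above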


Markovian Mouse and Absorbing Chains

Remember the cat-and-mouse game? We calculated the expected time for the mouse at door #4 to reach the cat. As we know the Markovian technique, we will solve the problem using absorbing chains. Here is the situation with arrows representing the mouse's stage-by-stage movement.

We follow the steps described in the previous post.
Step 1: Construct the transition probability matrix
Step 2: Isolate I and make Q
Step 3: Subtract Q from the Identity matrix
Step 4: Calculate the inverse of (I – Q)

In step 1, we re-arrange the columns so that the identity matrix sits in a corner.

The Q matrix, read off from the five transient doors, is

Q =  \begin{bmatrix} 0.0 & 0.5 & 0.0 & 0.0 & 0.0\\ 0.5 & 0.0 & 0.5 & 0.0 & 0.0\\ 0.0 & 0.5 & 0.0 & 0.5 & 0.0\\ 0.0 & 0.0 & 0.5 & 0.0 & 0.5\\ 0.0 & 0.0 & 0.0 & 0.5 & 0.0 \end{bmatrix}

And the rest of the calculations:

AA <- matrix(c(0, 0.5, 0, 0, 0,
               0.5, 0, 0.5, 0, 0,
               0, 0.5, 0, 0.5, 0,
               0, 0, 0.5, 0, 0.5,
               0, 0, 0, 0.5, 0), nrow = 5)

II <- matrix(c(1, 0, 0, 0, 0,
               0, 1, 0, 0, 0,
               0, 0, 1, 0, 0,
               0, 0, 0, 1, 0,
               0, 0, 0, 0, 1), nrow = 5)
BB <- II - AA

CC <- solve(BB)

\begin{bmatrix} 1.67  & 1.33 & 1.0  & 0.67 & 0.33\\ 1.33 & 2.67 & 2.0  & 1.33 & 0.67\\ 1.0  & 2.0 & 3.0 & 2.0 & 1.0\\ 0.67 & 1.33 & 2.0  & 2.67 & 1.33\\ 0.33  & 0.67 & 1.0 & 1.33 & 1.67\end{bmatrix}

Add the elements of column 3, which represents door 4. It takes an average of 9 days for the mouse to be caught.
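
The column sums can also be read off directly from the fundamental matrix CC computed above:

colSums(CC)
5 8 9 8 5

The third entry confirms the 9 days starting from door 4.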

The Cats And Random Mouse Riddle: MindYourDecisions


Absorbing chains – The Lawfirm

We have seen the concept of absorbing Markov chains. Here is a problem illustrating the idea.

A law firm has three types of employees: junior lawyers, senior lawyers, and partners. During a year:
There is a 0.15 probability that a junior gets promoted to senior
There is a 0.05 probability that a junior leaves the firm
There is a 0.20 probability that a senior gets promoted to partner
There is a 0.1 probability that a senior leaves the firm
There is a 0.05 probability that a partner leaves the firm

What is the average number of years a junior stays in the company?

Step 1: Construct the transition probability matrix

J stands for junior, S for senior, P for partner, L for leaving as a junior or senior, and PL for leaving as a partner

\begin{bmatrix} 0.8  & 0.0 & 0.0  & 0.0 & 0.0\\ 0.15 & 0.7 & 0.0  & 0.0 & 0.0\\ 0.0  & 0.2 & 0.95 & 0.0 & 0.0\\ 0.05 & 0.1 & 0.0  & 1.0 & 0.0\\ 0.0  & 0.0 & 0.05 & 0.0 & 1.0 \end{bmatrix}

Step 2: Isolate I and make Q

If required, re-arrange the transition matrix so that the identity matrix (corresponding to the absorbing states) sits in a corner.

After removing the columns and rows corresponding to the identity matrix, you are left with Q.

Q =  \begin{bmatrix} 0.8  & 0.0 & 0.0\\ 0.15 & 0.7 & 0.0\\ 0.0  & 0.2 & 0.95 \end{bmatrix}

Step 3: Subtract Q from the Identity matrix

I - Q =  \begin{bmatrix} 1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix} - \begin{bmatrix} 0.8  & 0.0 & 0.0\\ 0.15 & 0.7 & 0.0\\ 0.0  & 0.2 & 0.95 \end{bmatrix}

Step 4: Calculate the inverse of (I – Q)

The whole process so far is

Q <- matrix(c(0.8, 0.15, 0,
               0.0, 0.7, 0.2,
               0, 0.0, 0.95), nrow = 3)

I <- matrix(c(1, 0, 0, 
               0, 1, 0, 
               0, 0, 1), nrow = 3)
I_Q <- I - Q

I_I_Q <- solve(I_Q)   # the fundamental matrix (I - Q)^(-1)
I_I_Q
5.0   0.000000    0
2.5   3.333333    0
10.0 13.333333   20

Add the numbers in each column to determine the average number of stages (years) for each category. For example, a junior stays an average of 5 + 2.5 + 10 = 17.5 years, a senior remains an average of 3.3 + 13.3 ≈ 16.7 years, and a partner persists an average of 20 years before leaving.
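
The same totals come straight from the column sums of the fundamental matrix:

colSums(I_I_Q)
17.50000 16.66667 20.00000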
