Hidden Markov Models (HMM) Explained - Heuristic Analyst

This post builds on the previous post about Markov chains. Now I will explain what Hiddem Markov Models are and how to calculate the most likely sequence of Markov chain states given a observable variable sequence.

Content of this post:

What is a HMM
HMM explained
Calculate the probability of a state sequence given observable variable states
Finding the most likely sequence with Brute Force

What is a HMM:

A Hidden Markov Model is a Markov chain where we cannot observe the state of the Markov chain itself but only other, observable variables, which depend on the – hidden – Markov chain.

HMM Explained:

I will use the Markov chain of the post where I explain Markov chains:

Lets create a Hidden Markov model from this chain by adding observable states of a variable (you can add as many observable variables you like). I will use John’s face as a observable variable. Either he is happy or sad, all because of the current state (cloudy, rainy or sunny) – with a certain probability at each state:

Like in the previous post about Markov chains we can also write down the probabilites of each transition from a current to a future state in a transition matrix, and we can write down the probabilites of each observable variable of each state. The observable probabilities are also known as emission probabilities, here written down in the emission matrix:

I will say it again: A Hidden Markov model a Markov chain where the current state is not observable. What is observable are other, Markov chain dependent variables, like in this example John’s mood!

Calculate the probability of a state sequence given observable variable states:

Lets say we observe John three times today: In the morning he is happy, afternoon he is happy too but in the evening he is sad. Now a question might occur: Which sequence of states did John most likely experience?

Because we cannot just read it from the table we need do the following: We need to calculate the probability for each possible state combination and select the one with the maximum probability – generally speaking we will calculate the joined probability of the observable mood sequence Y and the weather sequence X. Lets do it for one specific sequence:

If the following notation applies:

Y₀= sad John, Y₁ = happy John, X₀ = cloudy, X₁= rainy, X₂= sunny

then the probability of the sequences:

John’s mood = happy, happy, sad
Weather = sunny, cloudy, rainy

can be written as:

P(Y=Y₁, Y₁, Y₀ | X= X₂, X₀, X₁)

We can break the porbability calculation P(Y=Y₁, Y₁, Y₀ | X= X₂, X₀, X₁) down:

1. state:

Calculate the stationary probability of sunny ( because no transition occurring – calculate it with left eigenvector) – P( X₂ ) ≈ 0.24
Get the probability of being happy given a sunny weather – P( Y₁ | X₂ ) = 0.8

2. state:

Get the transition probability of sunny to cloudy – P( X₀ | X₂ ) = 0.4
Get the probability of being happy given a cloudy weather – P( Y₁ | X₀ ) = 0.15

3. state:

Get the transition probability of cloudy to rainy – P( X₁ | X₀ ) = 0.6
Get the probability of being sad given a sunny weather – P( Y₀ | X₁ ) = 0.7

Now we can compute the sequence/path by calculating the product:
P(Y=Y₁, Y₁, Y₀ | X= X₂, X₀, X₁) = 0.24*0.8*0.4*0.15*0.6*0.7=0.0048384

Now we need to comupte the probability of each possible weather sequence and select the sequence with the maximum probability to get the most likely weather sequence!

Finding the most likely sequence with: Brute Force

I will implement the same transition matrix probabilities and dependent observable probabilities (John’s mood) as above. The observable sequence, where we want to comput the most likely Markov chain state sequence from, is the following: “happy”, “happy”, “sad”

In the code below I copied the code from this previous post about Markov chains to calculate the left eigenvectors which represent the stationary probability distribution of the Markov chain states. We need this equilibrium for the first state, since it is the start of the sequence and does not transition from another state (hence we use the stationary probability of the state).

Python code – brute force most likely sequence:

###########################################
# The first code-part (between "#####...") is from a previous post to find the stationary probability of states
import numpy as np
from scipy import linalg
# all states in one list
states = ["cloudy","rainy", "sunny"]
# create transition matrix A - same order as in "states"
# cloudy, rainy, sunny
A = [[0.1, 0.6, 0.3],   # cloudy
     [0.7, 0.1, 0.2],   # rainy
     [0.4, 0.4, 0.2]]   # sunny

# convert matrix to numpy matrix
A = np.array(A)

# calculate eigenvalues and eigenvectors
eigenvalues, eigenvectors = linalg.eig(A, left=True, right=False)

# probability distribution vector should add up to one - same as lambda of eigenvector equation is 1
# -> divide each vector element by the sum of each vector's element
eigenvectors = eigenvectors/eigenvectors.sum()

# transpose eigenvectors matrix to get a vector easily out of the matrix
eigenvectors = eigenvectors.transpose()

# loop through each vector and see if all elements of vector are greater or equal to zero
# (because there are no negative probabilities)
# save valid eigenvectors in "equilibriums
equilibriums = []
for i in range(len(eigenvectors)):
     isGreaterEqualZero = True
     for j in range(len(eigenvectors[i])):
          if eigenvectors[i][j] <= 0:
               isGreaterEqualZero = False
     if isGreaterEqualZero == True:
          equilibriums.append([])
          for j in range(len(eigenvectors[i])):
               equilibriums[-1].append(eigenvectors[i][j])

###########################################
# which sequence is observable of the observable variable?
# This will be the sequence we will compute the most likely Markov chain states sequence from
sequence_of_observable_variable = ["happy", "happy", "sad"]

# all observable states in one list
observable_states = ["happy", "sad"]
# Create happy/sad face probabilities dependent on the weather
# cloudy, rainy, sunny
B = [[0.15, 0.05, 0.8],  # happy face
     [0.2, 0.7, 0.1]]    # sad face

# standard library for combination/permutation/product of observable states
import itertools
# create permutation list of states list (each combination of the states given length of the sequence n)
# combination returned in tuples or triples or ... depends on amount of possible states
products = list(itertools.product(states, repeat=len(sequence_of_observable_variable)))

maximum_sequence = () # tuple or triple or ... depends on amount of possible states
# starting probability - will be compared to and changed to maximum probability during the iterations
maximum_probability = 0

for product in products:
     for i in range(len(product)):
          if i == 0:
               # first Markov chain state probability is did not transition from another state -> stationary probability
               # I think there might be several eigenvectors/equilibriums but I do not know whether so
               # and if so, i do not know which one to take to start, so I just take the first eigenvector/equilibrium
               probability = equilibriums[0][states.index(product[i])]
          else:
               # state probability is transition probability from before
               current_state = states.index(product[i])
               previous_state = states.index(product[i-1])
               # multiply with transition probability from step i-1 to i
               probability *= A[previous_state][current_state]
          probability *= B[observable_states.index(sequence_of_observable_variable[i])][states.index(product[i])]
     if probability > maximum_probability:
          maximum_probability = probability
          maximum_sequence = product

print("Maximizing sequence:", maximum_sequence)
print("Sequence's probability:", maximum_probability)

# Output
Maximizing sequence: ('sunny', 'sunny', 'rainy')
Sequence's probability: 0.008575214723926384

We could also easily change the sequence to e.g. “sad”, “happy”, “sad”, “sad”, “happy”, “sad”, “happy” by changing the line 43 in the code:

sequence_of_observable_variable = ["sad", "happy", "sad", "sad", "happy", "sad", "happy"]

# Output
Maximizing sequence: ('rainy', 'sunny', 'rainy', 'cloudy', 'sunny', 'rainy', 'sunny')
Sequence's probability: 1.7376287411042944e-05 = 0.000017376287411042944

This was my take on explaining the basis of Hidden Markov Models. Next we will tackle some algorithms relating to HMM’s in order to find the best sequence and other stuff efficiently. Of course the code I used here can be found on my github: www.github.com/Heuristic-Analyst/…
Cheers!