Markov Chains Explained

Hope you are doing well! Today I will explain “Markov Chains” as a prerequisite for a future post.

Content of this post:

  • Short description of Markov chains – what they are
  • Detailed explanation of Markov chains – using a weather example
  • How to calculate the state probabilities
    1. random walk
    2. equilibrium
    3. eigenvector approach – with brief summary of eigenvectors and eigenvalues
  • Nice-to-know property of Markov chains
  • Hint about the future post

Short description:

“A Markov chain or Markov process is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event.”

https://en.wikipedia.org/wiki/Markov_chain

This is the best short description of it, which is why I quote it! The important part of the quote is “… the probability of each event depends only on the state attained in the previous event”. Read the sentence until you understand it. Now let’s explain Markov chains in detail!
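Written as a formula (with Xₙ denoting the state at step n), the Markov property reads:

p(Xₙ₊₁ | Xₙ, Xₙ₋₁, …, X₁) = p(Xₙ₊₁ | Xₙ)

In words: to predict tomorrow, only today matters – not the whole history before it.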

Detailed explanation of Markov chains:

Let’s suppose there are 3 states the weather can be in – cloudy, rainy and sunny. Each of these states can lead to another state or back to the same one as the future state. The diagram would look like this:

diagram

This transition, from the present state to the future state, has a probability (let’s assume we know the numbers):

diagram

This already is a Markov chain. In the diagram you can see the sunny state. It can lead to another sunny state with a probability of 20%, to a cloudy state (p=40%) and to a rainy state (p=40%).

diagram – each state can lead to future states – probability adds up to 1 (colored for each state)

We can also separate the probabilities from the diagram into a table for a more formal way of writing and understanding them:

Markov chain transition probabilities (row = present state, column = future state):

           cloudy   rainy   sunny
  cloudy    0.1      0.6     0.3
  rainy     0.7      0.1     0.2
  sunny     0.4      0.4     0.2
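As a quick sanity check (a minimal sketch using numpy), we can verify that each row of the table is a valid probability distribution, i.e. adds up to 1:

import numpy as np

# transition matrix - rows/columns in the order cloudy, rainy, sunny
A = np.array([[0.1, 0.6, 0.3],
              [0.7, 0.1, 0.2],
              [0.4, 0.4, 0.2]])

# each row holds the probabilities of all possible next states, so it must sum to 1
print(A.sum(axis=1))  # [1. 1. 1.]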

How to calculate the state probabilities:

Random walk approach:

Now we might want to know how probable each state is. Our first attempt to answer this question is a random walk. I will write code that generates a random walk through the states with the given probabilities (from the diagram). Then we just need to divide the number of occurrences of each state by the total number of states generated in the random walk, just like in this example:

Random walk code with n = 100,000 – written in Python:

# see here how to sample a random number from a random distribution with Python:
#https://www.adamsmith.haus/python/answers/how-to-sample-a-random-number-from-a-probability-distribution-in-python
import random
# all states in one list
states = ["cloudy","rainy", "sunny"]
# create transition matrix - same order as in "states"
transition_matrix = [[0.1, 0.6, 0.3],   # cloudy
                     [0.7, 0.1, 0.2],   # rainy
                     [0.4, 0.4, 0.2]]   # sunny
# will hold every random walk
random_walk = []
# begin with a random state - each state gets the same starting probability of 1/3
current_state = random.choices(states, [1/3, 1/3, 1/3])[0]
# run the random walk 100,000 times
for i in range(100000):
    # get index of current state in states list
    state_index = states.index(current_state)
    # random selection of a state with the distribution in the transition matrix - current state = new state
    current_state = random.choices(states, transition_matrix[state_index])[0]
    # add current state, which is the new generated state, to the random walk list
    random_walk.append(current_state)

occurrences = [random_walk.count("cloudy"), random_walk.count("rainy"), random_walk.count("sunny")]
print("Verify n:", occurrences[0]+occurrences[1]+occurrences[2])
print("Probabilities:", occurrences[0]/100000, occurrences[1]/100000,occurrences[2]/100000)
print(random_walk)

# Output:
Verify n: 100000
Probabilities: 0.3932 0.36791 0.23889
['sunny', 'cloudy', 'rainy', 'cloudy', 'sunny', 'cloudy', 'sunny', 'sunny', 'rainy', 'rainy', 'cloudy', 'sunny',...]

Calculating the probabilities for each state with the random walk, we got these numbers:

  • p(cloudy) = 0.3932
  • p(rainy) = 0.36791
  • p(sunny) = 0.23889

Equilibrium approach:

Now let’s get to the juicy stuff – not approximating the probability of each state, but calculating it!

For this we will use the transition matrix (the same as in the code). Now suppose we start with a random state, e.g. rainy. Then the state distribution for today, a rainy day, is 100% rainy, 0% cloudy and 0% sunny.

Because of that, the row vector (today’s state – cloudy, rainy, sunny) will look like this:

πᵢ = [0, 1, 0]

Side note: this equilibrium is often also referred to as the stationary distribution – denoted as π

Let’s call the transition matrix “A”. To calculate the probability of each state – which can also be seen as the equilibrium state – we multiply πᵢ with A and get the row vector πᵢ₊₁ (next day/state):

πᵢ x A = πᵢ₊₁
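For our example this first step is easy to calculate by hand: πᵢ = [0, 1, 0] simply picks out the “rainy” row of A, so

πᵢ x A = [0, 1, 0] x A = [0.7, 0.1, 0.2] = πᵢ₊₁

meaning that after a rainy day there is a 70% chance of clouds, a 10% chance of rain and a 20% chance of sun.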

To get the equilibrium:

  • We calculate the next state i+2 via πᵢ₊₁ x A = πᵢ₊₂
  • We will do this until πₙ = πₙ₊₁ → equilibrium found

Equilibrium Python code – using NumPy:

import numpy as np
# all states in one list
states = ["cloudy","rainy", "sunny"]
# create transition matrix A - same order as in "states"
A = [[0.1, 0.6, 0.3],   # cloudy
     [0.7, 0.1, 0.2],   # rainy
     [0.4, 0.4, 0.2]]   # sunny

# start with a random state, lets say rainy
pi = [0, 1, 0]
pi_previous = pi
# convert matrices to numpy matrices
A = np.array(A)
pi = np.array(pi)
pi_previous = np.array(pi_previous)

while True:
    # dot product pi x A -> state distribution of the next day
    pi = pi.dot(A)
    # if pi == pi_previous -> equilibrium found (pi will not change anymore)
    if (pi == pi_previous).all():
        break
    else:
        pi_previous = pi
print(states)
print(pi)
# Output:
['cloudy', 'rainy', 'sunny']
[0.39263804 0.36809816 0.2392638 ]

We now can compare these results with the results from the random walk:

  • deviation(cloudy): abs(0.39263804 / 0.3932 – 1) = 0.1429%
  • deviation(rainy): abs(0.36809816 / 0.36791 – 1) = 0.0511%
  • deviation(sunny): abs(0.2392638 / 0.23889 – 1) = 0.1565%

As we can see, the random walk got very close to the actual probabilities!
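As a quick sanity check (a minimal sketch, reusing pi and A from the equilibrium code above), we can verify that multiplying the found equilibrium with A really does not change it anymore:

# pi and A as computed in the equilibrium code above
print(pi)         # [0.39263804 0.36809816 0.2392638 ]
print(pi.dot(A))  # the same vector again - pi no longer changes

This is exactly the equation π x A = π that the next section builds on.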

Eigenvector approach:

Now we can go one step further and look at the equation π x A = π.

We calculated π with our code above by finding the equilibrium. If you look at the equation, you might see some similarity with another famous equation: the eigenvector equation.

Brief summary of eigenvectors and eigenvalues:

Let’s say we have 2 vectors – they can be combined into one matrix:

Example – 2 vectors and resulting combination of them (matrix)

If we now have another vector we could multiply it with the matrix:

Example – vector transformation by multiplying it with a matrix
Visualization of example above

As you can see in the example above, multiplying a matrix A with a vector vᵢ is nothing other than transforming the vector vᵢ into another vector vⱼ. The transformation can change the vector’s length and rotate it.

If a vector vᵢ is transformed into a vector vₑ and vₑ stays on the same span (meaning on the same line) as vᵢ, then vᵢ – as well as vₑ and every other vector on the same span – is an eigenvector.

The factor by which the vector vᵢ gets stretched/squished is called the eigenvalue.

The eigenvector equation is the following: Av = λv (read the important note below)

  • A is the linear transformation matrix
  • v is a vector (eigenvector)
  • λ is a factor, the eigenvalue

The meaning behind the equation is that a linearly transformed eigenvector (A x v) is equal to the same eigenvector times a factor – which is exactly what I wrote before!
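A tiny numerical example (my own, not from the diagrams above) to make this concrete:

import numpy as np

# a simple transformation matrix: stretches the x-axis by 2 and the y-axis by 3
A = np.array([[2.0, 0.0],
              [0.0, 3.0]])

# v stays on its own span under the transformation -> v is an eigenvector
v = np.array([1.0, 0.0])
print(A.dot(v))  # [2. 0.] = 2 * v -> eigenvalue lambda = 2

# w leaves its span under the transformation -> w is not an eigenvector
w = np.array([1.0, 1.0])
print(A.dot(w))  # [2. 3.] - not a multiple of w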

Very important note:
There are 2 kinds of eigenvectors: left and right eigenvectors.
The difference lies in the calculation:
– A right eigenvector is calculated by the eigenvector equation Av = λv
→ the vector v is a column vector
– A left eigenvector is calculated by the eigenvector equation vA = λv
→ the vector v is a row vector
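A handy way to remember the difference: transposing vA = λv gives Aᵀvᵀ = λvᵀ, so a left eigenvector of A (a row vector) is nothing else than a right eigenvector of the transposed matrix Aᵀ (as a column vector).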

Connection of Markov chain equilibrium equation and Eigenvector equation:

So we have the Markov chain equilibrium equation π x A = π and the left eigenvector equation v x A = λ x v.

Comparing the two equations shows that π is exactly a left eigenvector of A with the eigenvalue λ = 1. And as we have seen before, π denotes a probability distribution, which is why the sum of π’s elements must add up to 1 – this fixes the scaling of the eigenvector.

With this information we can now easily calculate the equilibrium (or equilibria):
We calculate each left eigenvector of the Markov chain’s transition matrix with eigenvalue 1 and voilà, we have calculated the probability distribution of the states! Let’s code that:

Left eigenvector of transition matrix:

import numpy as np
from scipy import linalg
# all states in one list
states = ["cloudy","rainy", "sunny"]
# create transition matrix A - same order as in "states"
A = [[0.1, 0.6, 0.3],   # cloudy
     [0.7, 0.1, 0.2],   # rainy
     [0.4, 0.4, 0.2]]   # sunny

# convert matrix to numpy matrix
A = np.array(A)

# calculate eigenvalues and eigenvectors
eigenvalues, eigenvectors = linalg.eig(A, left=True, right=False)

# transpose the eigenvectors matrix so that each row is one eigenvector
eigenvectors = eigenvectors.transpose()

# the probability distribution vector must add up to one - same as lambda of the eigenvector equation being 1
# -> keep only the eigenvectors belonging to eigenvalue 1, scale each one so its
#    elements sum to 1, and check that no element is negative
#    (because there are no negative probabilities)
# save valid eigenvectors in "equilibriums"
equilibriums = []
for i in range(len(eigenvectors)):
    # only eigenvectors with eigenvalue 1 can be stationary distributions
    if not np.isclose(eigenvalues[i].real, 1):
        continue
    # scale the eigenvector so its elements sum to 1
    # (this also fixes the arbitrary sign scipy may return)
    vector = eigenvectors[i].real / eigenvectors[i].real.sum()
    if (vector >= 0).all():
        equilibriums.append(vector.tolist())

print("Every eigenvector with lambda 1 and elements greater or equal zero:")
print(equilibriums)
# Output
Every eigenvector with lambda 1 and elements greater or equal zero:
[[0.39263803680981585, 0.3680981595092024, 0.2392638036809816]]

We can now compare this eigenvector (there could also be more than one eigenvector, meaning multiple equilibria, depending on the starting state) to the previously calculated equilibrium:

  • Before we got this equilibrium: [0.39263804, 0.36809816, 0.2392638]
  • Now we got: [0.39263803680981585, 0.3680981595092024, 0.2392638036809816]

As you can see, we calculated the same numbers as with the equilibrium approach – the earlier output was just rounded to 8 decimal places.

Nice-to-know property of Markov chains:

Let’s work with the same transition matrix A as before:

Transition matrix A:

    [[0.1, 0.6, 0.3],   # cloudy
     [0.7, 0.1, 0.2],   # rainy
     [0.4, 0.4, 0.2]]   # sunny

Now the following question might interest you: What is the probability of getting from state “cloudy” to state “sunny” in 1 step?

To answer this you would look at the “cloudy” row of the matrix A (the first row) and then at the 3rd column (“sunny”) (recap Markov chains above if this is unclear). For n=1 step, the probability of starting at “cloudy” and ending up at “sunny” is p=0.3.

But what if we want to know the probability of starting at “cloudy” and ending up at “sunny” when we take 2 steps?

There are several paths that lead us from “cloudy” to “sunny”. I will now write down each path with its probability calculation:

  1. “cloudy” → “cloudy” → “sunny”: 0.1*0.3=0.03
  2. “cloudy” → “rainy” → “sunny”: 0.6*0.2=0.12
  3. “cloudy” → “sunny” → “sunny”: 0.3*0.2=0.06

For n=2 steps the probability of starting at “cloudy” and ending up at “sunny” is p=0.03+0.12+0.06=0.21.

Now what do you notice here? You just computed the dot product of 2 vectors – the first row of the matrix A (as a row vector) with the last column of the matrix A (as a column vector):

[0.1, 0.6, 0.3] x [0.3, 0.2, 0.2]ᵀ = 0.1*0.3 + 0.6*0.2 + 0.3*0.2 = 0.21
(first row of A times last column of A)

Why is that? If you start your first step at “cloudy”, you can end up at every state with a certain probability. This is represented by the first vector. From there you need to end up at “sunny”, so you multiply that vector with the probability vector of ending up at “sunny”, represented by the second vector.

If you want to know every probability – starting and ending at a specific state after n steps – you just need to multiply the matrix by itself n times:

Aⁿ

Once you have calculated Aⁿ you can just read off the probabilities – in the example above we can see that for n=2 the “cloudy” to “sunny” transition probability matches our hand calculation (p=21%). You can do this for as many steps as you like! A small numpy sketch of this is shown below.
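In Python this could look like the following (a minimal sketch; np.linalg.matrix_power multiplies A by itself n times):

import numpy as np

# transition matrix - rows/columns in the order cloudy, rainy, sunny
A = np.array([[0.1, 0.6, 0.3],
              [0.7, 0.1, 0.2],
              [0.4, 0.4, 0.2]])

# A^2: entry [row, column] = probability of going from "row" state to "column" state in 2 steps
A2 = np.linalg.matrix_power(A, 2)
print(A2[0, 2])  # "cloudy" -> "sunny" in 2 steps: 0.21 (up to floating-point rounding)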

Hint about a future post:

In a future post I would like to build a Markov model which should be able to detect market regimes.

I hope you understood everything.
As always, all the code from this post can be found directly on my GitHub: github.com/Heuristic-Analyst/…
Cheers!
