Markowitz Efficient frontier – Portfolio optimization from scratch with Python - Heuristic Analyst

Hello people of finance and tech lovers, today I would like to create a Python code which will be able to construct a mean-variance optimal stock portfolio. The theory is based on Harry Markowitz’s work. Lets begin with the fun!

Structure of this post:

Metrics: How to calculate the portfolio return and volatility (I hope everyone knows what returns and volatility/standard deviation are)
Theory: Some theory behind Harry Markowitz efficient frontier
Code: Code the portfolio optimization script

Metrics

Portfolio return: In a given portfolio we can easily calculate the portfolio return by multiplying each asset return with each weight of the portfolio and then adding them up:

Sidenote: if, for whatever reason, you would like to work with log returns later, then please adjust the math accordingly

Annualize returns: For comparison reasons we will annualize and compound the returns. We will work with daily data. Lets say we got 180 days of daily stock returns of an asset. Stocks trade about 252 days per year. If we want to calculate the annualized compounded return with these numbers we would calculate it the following way:

Annulaized portfolio volatility: Unlike returns we cannot just add up the standard deviations of assets in a portfolio. Because of intercorrelations between the assets we calculate the volatility (standard deviation of the portfolio) as follows:

First we calculate the covariance matrix of our returns
Then we annualize them by multiplying every covariance in the matrix by an annulization factor (in our example with (252/180))
Then we perform these matrix multiplications where we transpose the weights of the portfolio, calculate the product with the covariance matrix and then multiply it again with the weight vector
Last step is to take the square root of the variance to get the standard deviation/volatility

Theory

Harry Markowitz, the father of modern portfolio theory, showed in his paper “Portfolio Selection” (1952) for the first time that risk of a portfolio of assets is not linear. These shown inter-relationships of risk can then be used to construct a portfolio which will achive the same return as another portfolio but with less volatility. What is done is to calculate the optimal weights to achive a optimal portfolio with the given assets.

We will construct our portfolio using a convex optimizer called “CVXPY” (Python library). It will optimize our objective under certain constraints. In the Python code below I will minimize the risk by minimize the variance, which will automatically minimize the standard deviation/volatility. The constraints are that the sum of the weights should add up to 1 (100%) and that we will only go long (weightsᵢ >= 0). Below you can find these and some other ideas of objectives and constraints under which you can optimize your portfolio:

Objectives:

Minimize risk:

Maximize return:

maximize sharp ratio:

Constraints:

Long portfolio – sum of weights should add up to 1:

Long portfolio – weights:

Short portfolio – sum of weights should add up to -1:

Short portfolio – weights:

Market neutral portfolio:

Long and short portfolio:

Sidenote: It should be noted that this method is taught in most universities about portfolio theory, which is good but not enough. Estimation of covariance matrices (and their inverses) are often very unstable. Estimates of returns are also hard to make and are often made from their historical values, despite that past performance is not a reliable indicator of future performance. This means that optimizing the portfolio for its performance (weights.Transposed*returns) will not be a good reliable method. Because of these problems a 1/N portfolio, this means that every asset in a portfolio is has the same weight (1/N), will often outperform such kind of portfolio.

Code

Now lets implement everything in Python: (github.com/Heuristic-Analyst/…)

1. I will start with the covariance matrix. Look at this post on how to calculate a covariance between two assets. I just copied the code I already wrote for the previous post and adjusted it to calculate covariances for every combination of 2 pairs of assets in a list, because I will hand over a 2D list with every asset returns (lists) bundled in a list. So a “returns”-list can look like this: [[1,031, 0.998,…], [1,014, 0.978,…], [1,023, 0.983,…]]. The code will be the following:

def calc_covariancematrix(returns: list):
    covarianzmatrix = [[0 for j in range(len(returns))] for i in range(len(returns))]
    for i in range(len(returns)):
        for j in range(len(returns)):
            if j >= i:
                covarianzmatrix[i][j] = covariance(returns[i], returns[j], "sample")
                covarianzmatrix[j][i] = covarianzmatrix[i][j]
    return covarianzmatrix

def covariance(x_values: list, y_values: list, population_or_sample: str):
    x_mean = 0
    y_mean = 0
    sum_x_y_diff = 0
    n = 0

    for i in range(len(x_values)):
        x_mean += x_values[i]
        y_mean += y_values[i]
        n += 1
    x_mean /= n
    y_mean /= n

    for i in range(len(x_values)):
        sum_x_y_diff += (x_values[i] - x_mean) * (y_values[i] - y_mean)

    if population_or_sample == "population":
        return sum_x_y_diff / n
    elif population_or_sample == "sample":
        return sum_x_y_diff / (n - 1)

Sidenote: Covariance(A,B) = Covariance(B,A) and that the length of lists which are being calculated on must be the same length

2. Next we will get the data with a public library called “pandas-datareader” which comes in handy to gather daily stock data from yahoo or other sources. Because these are dataframes and I only need the close prices of the assets I will convert them into lists. Also I will save the names of the assets (in the right order) in a seperate list:

import pandas_datareader.data as web
import datetime

if __name__ == '__main__':
    start = datetime.datetime(2018,1,1)
    end = datetime.datetime(2020, 12, 31)

    cnp = web.DataReader("CNP", "yahoo", start, end).Close.tolist()
    wmt = web.DataReader("WMT", "yahoo", start, end).Close.tolist()
    f = web.DataReader("F", "yahoo", start, end).Close.tolist()
    ge = web.DataReader("GE", "yahoo", start, end).Close.tolist()
    telsa = web.DataReader("TSLA", "yahoo", start, end).Close.tolist()

    price_symbols = ["CNP", "WMT","F", "GE", "TSLA"]
    price_data = [cnp, wmt, f, ge, tesla]

3. Now lets imagine we calculated some weights and the covariance matrix. With them we can calculate the portfolio volatility like I described before. I will just use pure Python to do it from scratch:

from math import sqrt


def calc_portfolio_volatility(portfolio_weights: list, covariance_matrix: list):
    # volatility = sqrt(w.transposed() * covar-matrix * w)
    interimResultMatrix1 = [0 for i in range(len(portfolio_weights))]
    for i in range(len(covariance_matrix)):
        for j in range(len(portfolio_weights)):
            interimResultMatrix1[i] += portfolio_weights[j] * covariance_matrix[j][i]
    volatility = 0
    for i in range(len(interimResultMatrix1)):
        volatility += interimResultMatrix1[i] * portfolio_weights[i]
    volatility = sqrt(volatility)
    return volatility

4. Now you will see how I implemented the optimization of minimizing the volatilty function (risk) with the public convex optimization library “CVXPY”. Here we need to create the variables (weights vector with length of number of assets) which we will optimize. Then create our risk function (not portfolio volatility/standard deviation but variance, because CVXPY can not handle it). Then we create a list with the constraints, that we only go long on assets (weight of asset i > 0) and that the sum of the weights must equal 1 (100% asset allocation of our money). After that we optimize/minimize the risk function under our constraints and then we are finished with this step:

import cvxpy as cp


# For min volatility
def optimize_portfolio_weights_min_volatility(len_prices_data: int, covariance_matrix: list):
    # portfolio volatility = sqrt(w.transposed * covar-matrix * w) = sqrt(variance)
    # We will minimize variance (volatility^2)

    # Generate weight-variables and save them in "weights" -> will be optimized values
    weights = cp.Variable(len_prices_data)

    # Create objective "risk" function to be minimized:
    # See here: https://www.cvxpy.org/examples/basic/quadratic_program.html
    # that (1/2)*x.transposed*P*x+q.transposed*x is equal to (1/2)*cp.quad_form(x, P) + q.T @ x
    # This means that w.transposed() * covar-matrix * w (our risk measure "variance")
    #   is equal to cp.quad_form(weights, covariance_matrix)
    objective = cp.quad_form(weights, covariance_matrix)

    # Create constraints list: sum of weights should be 1 and only long assets (every weight > 0)
    constraint = [sum(weights) == 1]
    for i in range(len_prices_data):
        constraint.append(weights[i] >= 0)

    # Solve formulated problem with cvxpy library
    problem = cp.Problem(cp.Minimize(objective), constraint)
    problem.solve()
    # Save optimized weight-values in "optimized_weights"
    optimized_weights = []
    for i in range(len(weights.value)):
        optimized_weights.append(weights.value[i])
    # Return solution
    return optimized_weights

5. In this step i will write the main function which will get the asset prices, asset lables (symbols), amount of days they are trading in one year, frequency of asset data per day and the risk free rate (like a U.S. 10 Year Treasury: right now round about 0.03=3%). With these parameters we will calculate normal returns of each day for each asset, then calculate each annualized, compounded asset return, after which we will calculate the annualized covariance matrix (from step 1). Then we will run the optimization function we have already written in step 4 (“optimize_portfolio_weights_min_volatility()”). After that we will weight each asset return to calculate the portfolio return, then calculate the portfolio volatility with our written function “calc_portfolio_volatility()” in step 3 and after this we will also calculate the sharp ratio (the ratio of return (minus risk free rate) and volatility). As a final step we will format everything in a pretty way to return our solution:

import Covarianzmatrix
import PortfolioVolatility


def PortfolioOptimizationConvexOpt(prices_data: list, price_symbols: list, frequency_of_data_per_day: float, frequency_of_data_per_year: float, risk_free_rate: float):
    # Calculate returns for each asset - start with last one to affect the next return
    # (If I would start to calculate the first return I would change it and not be able to calculate the next return)
    returns = [[prices_data[i][j]/prices_data[i][j-1] for j in range(1, len(prices_data[i]))] for i in range(len(prices_data))]

    # Calculate each annualized asset return: First multiply returns (or for log returns: sum up)
    # and then return^(annualized compound factor)-1
    # if daily returns: frequency_of_data_per_day = 1; if stock data: frequency_of_data_per_year = 252
    # -> compound factor is 252/1 if only one day data is given to annualize it -> if for example 300 days worth
    # of data then: (252/1)/300 to annualize the returns
    assetReturns = [1 for i in range(len(returns))]
    for i in range(len(returns)):
        for j in range(len(returns[i])):
            assetReturns[i] *= (returns[i][j])
        assetReturns[i] = assetReturns[i] ** ((frequency_of_data_per_year/frequency_of_data_per_day) / len(returns[i])) - 1

    # Calculate covariance matrix with pure python (2d, square matrix; length = amount of assets)
    # Price data must be unweighted, because scaling affects the covariance calculation
    covarianceMatrix = Covarianzmatrix.calc_covariancematrix(returns)
    for i in range(len(covarianceMatrix)):
        for j in range(len(covarianceMatrix[i])):
            covarianceMatrix[i][j] *= (frequency_of_data_per_year/frequency_of_data_per_day)

    # optimize portfolio weights by minimizing volatility
    optimized_weights_min_volatility = PortfolioVolatility.optimize_portfolio_weights_min_volatility(len(returns), covarianceMatrix)

    # Calculate annualized portfolio return - just sum up the weighted returns
    portfolioReturn = 0
    for i in range(len(assetReturns)):
        portfolioReturn += assetReturns[i]*optimized_weights_min_volatility[i]

    # Calculate volatility sigma with the matrix multiplication formula:
    # sigma = volatility = square root( transposed(weights) * covariance matrix * weights)
    volatility = PortfolioVolatility.calc_portfolio_volatility(optimized_weights_min_volatility, covarianceMatrix)

    # Calculate the sharp ratio
    sharpRatio = (portfolioReturn-risk_free_rate) / volatility

    # Assignment of asset names to weights
    optimized_weights = {}
    for i in range(len(optimized_weights_min_volatility)):
        optimized_weights[price_symbols[i]] = optimized_weights_min_volatility[i]

    # Create dict which will contain the portfolio return, volatility, sharp ratio and weights for optimized portfolio
    portfolioMetrics = {"Portfolio return": portfolioReturn, "Volatility": volatility, "Sharp ratio": sharpRatio, "Weights": optimized_weights}

    # Return the results at the end
    return portfolioMetrics

Final step: Optimize our portfolio:

import pandas_datareader.data as web
import datetime
import PortfolioOptimizationConvexOpt

if __name__ == '__main__':
    start = datetime.datetime(2018,1,1)
    end = datetime.datetime(2020, 12, 31)

    CNP = web.DataReader("CNP", "yahoo", start, end).Close.tolist()
    WMT = web.DataReader("WMT", "yahoo", start, end).Close.tolist()
    F = web.DataReader("F", "yahoo", start, end).Close.tolist()
    GE = web.DataReader("GE", "yahoo", start, end).Close.tolist()
    TSLA = web.DataReader("TSLA", "yahoo", start, end).Close.tolist()

    price_symbols = ["CNP", "WMT","F", "GE", "TSLA"]
    price_data = [CNP, WMT, F, GE, TSLA]

    portfolioMetricsCVXOPT = PortfolioOptimizationConvexOpt.PortfolioOptimizationConvexOpt(price_data, price_symbols, 1, 252, 0.03)
    print(portfolioMetricsCVXOPT)

Output:

{'Portfolio return': 0.0839471406763497, 
 'Volatility': 0.21908650398113083, 
 'Sharp ratio': 0.2462367133349117, 
 'Weights': {'CNP': 0.10246322843605614, 
             'WMT': 0.6988277812970941, 
             'F': 0.1547298490937759, 
             'GE': 0.027893160907784275, 
             'TSLA': 0.016085980265289592}}

This was my take on the portfolio optimization (minimal volatility – objective can be changed by you (e.g. maximize return with a given volatility, maximize sharp ratio). As always can the aggregated code be found here on my Github: github.com/Heuristic-Analyst/…