I just finished writing my latest book, Algorithmic Trading with Python. While writing the chapter on performance metrics, I was consistently surprised by the simplicity of the pandas code. If you, as a developer, resolve to work only with datetime-indexed pd.Series objects, the resulting code is clean and easy to read.
Simulating Data
For those unfamiliar with pandas, the term datetime-indexed means that each floating-point value of the series is labeled with a corresponding pd.Timestamp, and that these labels are in chronological order (together they form a pd.DatetimeIndex). They effectively become the array indices of any pd.Series or pd.DataFrame you end up working with.
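As a minimal sketch of what that looks like in practice, the snippet below builds a three-day series and reads values back by date. The dates and prices are made up purely for illustration.

```python
import pandas as pd

# Hypothetical three-day example; the dates and values are invented for illustration.
index = pd.date_range("2010-01-01", periods=3, freq="D")
example = pd.Series([100.0, 101.5, 99.8], index=index)

# Each value is addressed by its timestamp rather than by an integer position.
first_value = example.loc[pd.Timestamp("2010-01-01")]

# Label-based slices on a datetime index include both endpoints.
last_two = example.loc["2010-01-02":"2010-01-03"]

print(type(example.index).__name__)
print(first_value)
print(last_two)
```

Positional access with .iloc still works on such a series, which is what the functions later in this post rely on.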
If you want some simulated data to work with for this article, try the following.
    import numpy as np
    import pandas as pd

    # Ten years of daily timestamps, so the series gets a true datetime index
    date_index = pd.date_range(start='2010-01-01', periods=3650, freq='D')

    # Random walk: each price is the previous one times (1 + a small normal return)
    price = initial_price = 100
    prices = []
    for _ in range(3650):
        price *= (1 + np.random.normal(loc=0.0001, scale=0.005))
        prices.append(price)

    series = pd.Series(prices, index=date_index)
Calculating CAGR
CAGR (compound annual growth rate) is the constant annual rate of return that, compounded over the specified time frame, would produce the observed total return.
    def calculate_percent_return(series: pd.Series) -> float:
        # Total return over the whole series, e.g. 0.25 for a 25% gain
        return series.iloc[-1] / series.iloc[0] - 1

    def get_years_past(series: pd.Series) -> float:
        # Elapsed time between the first and last index entries, in years
        start_date = series.index[0]
        end_date = series.index[-1]
        return (end_date - start_date).days / 365.25

    def calculate_cagr(series: pd.Series) -> float:
        # Annualize the total growth factor over the elapsed years
        start_price = series.iloc[0]
        end_price = series.iloc[-1]
        value_factor = end_price / start_price
        years_past = get_years_past(series)
        return (value_factor ** (1 / years_past)) - 1

    print(calculate_cagr(series))
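One way to sanity-check the formula is to feed it a case with a known answer. The sketch below restates the CAGR calculation in a single hypothetical helper, cagr, and applies it to a made-up series that doubles over roughly ten years, where the result should be close to 2 ** (1 / 10) - 1.

```python
import pandas as pd

def cagr(series: pd.Series) -> float:
    # Same formula as calculate_cagr, restated so this check runs on its own.
    years = (series.index[-1] - series.index[0]).days / 365.25
    return (series.iloc[-1] / series.iloc[0]) ** (1 / years) - 1

# Hypothetical series: the price doubles from 100 to 200 over roughly ten years.
doubling = pd.Series(
    [100.0, 200.0],
    index=pd.to_datetime(["2010-01-01", "2020-01-01"]),
)

result = cagr(doubling)
expected = 2 ** (1 / 10) - 1  # a little over 7% per year

print(result, expected)
```

The two numbers differ slightly because the 365.25-day year does not line up exactly with calendar years.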
Calculating Annualized Volatility
Volatility in finance is typically assumed to be the annualized standard deviation of log returns. It is computed as follows.
    def calculate_log_return_series(series: pd.Series) -> pd.Series:
        # Log of each price over the previous price; the first entry is NaN
        shifted_series = series.shift(1, axis=0)
        return pd.Series(np.log(series / shifted_series))

    def calculate_annualized_volatility(return_series: pd.Series) -> float:
        # Scale the per-period standard deviation by sqrt(periods per year)
        years_past = get_years_past(return_series)
        entries_per_year = return_series.shape[0] / years_past
        return return_series.std() * np.sqrt(entries_per_year)

    return_series = calculate_log_return_series(series)
    print(calculate_annualized_volatility(return_series))
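To see that the square-root scaling behaves as expected, the sketch below generates daily log returns with a known standard deviation and compares the annualized estimate with the value that standard deviation implies. The 1% daily sigma and the random seed are arbitrary assumptions, and the steps are inlined so the check runs on its own.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily for reproducibility

# Hypothetical input: 3650 daily log returns with a known standard deviation of 1%.
daily_sigma = 0.01
log_returns = rng.normal(loc=0.0, scale=daily_sigma, size=3650)
index = pd.date_range("2010-01-01", periods=3650, freq="D")
prices = pd.Series(100 * np.exp(np.cumsum(log_returns)), index=index)

# Same steps as the volatility functions, inlined here.
returns = np.log(prices / prices.shift(1))
years = (returns.index[-1] - returns.index[0]).days / 365.25
entries_per_year = returns.shape[0] / years
annualized = returns.std() * np.sqrt(entries_per_year)

print(annualized)                     # estimate from the simulated data
print(daily_sigma * np.sqrt(365.25))  # value implied by the known daily sigma
```

Because entries_per_year is derived from the index itself, the same function should work unchanged on series that only have trading-day entries, where the factor comes out near 252 rather than 365.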
Calculating MACD
The MACD oscillator is a popular indicator based on the difference between two moving averages of different lengths.
    def calculate_simple_moving_average(series: pd.Series, n: int=20) -> pd.Series:
        # Mean of the trailing n values; the first n - 1 entries are NaN
        return series.rolling(n).mean()

    def calculate_macd_oscillator(series: pd.Series, n1: int=5, n2: int=34) -> pd.Series:
        # Fast (n1) moving average minus slow (n2) moving average
        return calculate_simple_moving_average(series, n1) - \
            calculate_simple_moving_average(series, n2)

    print(calculate_macd_oscillator(series))
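One detail worth knowing when using the output: the slow average needs n2 observations before it produces a value, so the oscillator starts with a run of NaN entries. The sketch below illustrates this on a hypothetical, perfectly linear price series, where the fast average sits a constant distance above the slow one.

```python
import numpy as np
import pandas as pd

# Hypothetical input: 50 days of prices rising by exactly 1 per day.
index = pd.date_range("2010-01-01", periods=50, freq="D")
prices = pd.Series(np.arange(100.0, 150.0), index=index)

n1, n2 = 5, 34
macd = prices.rolling(n1).mean() - prices.rolling(n2).mean()

# The slow average needs n2 observations, so the first n2 - 1 entries are NaN.
print(macd.isna().sum())

# dropna() keeps only the dates where both averages are defined.
print(macd.dropna().head())
```

In a steady uptrend like this one the oscillator stays positive, since the shorter average tracks recent (higher) prices more closely than the longer one.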
Calculating Bollinger Bands
Bollinger Bands are another popular indicator. They consist of three bands: the middle band is a simple moving average, and the upper and lower bands sit two rolling standard deviations above and below it.
    def calculate_simple_moving_sample_stdev(series: pd.Series, n: int=20) -> pd.Series:
        # Sample standard deviation of the trailing n values
        return series.rolling(n).std()

    def calculate_bollinger_bands(series: pd.Series, n: int=20):
        sma = calculate_simple_moving_average(series, n)
        stdev = calculate_simple_moving_sample_stdev(series, n)
        lower = sma - 2 * stdev
        middle = sma
        upper = sma + 2 * stdev  # two standard deviations above the average
        return lower, middle, upper

    print(calculate_bollinger_bands(series))
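Since the three bands share one index, it can be convenient to collect them in a single pd.DataFrame. The sketch below does that for a hypothetical random-walk series, assuming the conventional definition in which the upper band is the moving average plus two standard deviations, and checks that the bands come out in the expected order.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)  # seed chosen arbitrarily for reproducibility

# Hypothetical input: 250 days of a simple additive random walk around 100.
index = pd.date_range("2010-01-01", periods=250, freq="D")
prices = pd.Series(100 + np.cumsum(rng.normal(0, 1, size=250)), index=index)

n = 20
sma = prices.rolling(n).mean()
stdev = prices.rolling(n).std()

# One column per band; dropna() removes the first n - 1 warm-up rows.
bands = pd.DataFrame({
    "lower": sma - 2 * stdev,
    "middle": sma,
    "upper": sma + 2 * stdev,
}).dropna()

# The bands should always be ordered lower <= middle <= upper.
ordered = ((bands["lower"] <= bands["middle"]) & (bands["middle"] <= bands["upper"])).all()

# Fraction of days on which the price closes outside the bands.
aligned = prices.loc[bands.index]
outside = (aligned > bands["upper"]) | (aligned < bands["lower"])

print(ordered)
print(outside.mean())
```

A check like the ordering test is cheap to keep around, because a flipped sign in either band shows up immediately.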
Conclusion
I hope this post provides some inspiration. In recent years I have been impressed by how pandas increasingly caters directly to quants and financial analysts. If you want more detail, check out my latest book, Algorithmic Trading with Python.
Looks like your code has been horribly mangled. I can make out < for MACD, but what is the last for Bollinger Bands?
Thanks for pointing that out, Geoff. HTML entities were eating some of my code. I fixed it.