Realtime

Peter-Clark Algorithm: n = 366

In a previous post I applied the Peter-Clark algorithm for causal inference to a sample of 20 stocks from the S&P 500. In the conclusion, I said that I wanted to run the same code but for more stocks.

I'll be using only complete stock data from 2017-01-01 onwards. This results in 366 companies and 2,012 rows.

import pandas as pd
df = pd.read_csv("stock_data_2017_onwards.csv")
df.head()
         Date          A       AAPL       ABBV        ABT
0  2017-01-03  43.743877  26.891962  44.265285  33.788067
1  2017-01-04  44.317841  26.861860  44.889439  34.056301
2  2017-01-05  43.790916  26.998468  45.229893  34.350471
3  2017-01-06  45.155258  27.299448  45.244080  35.284950
4  2017-01-09  45.296417  27.549496  45.541965  35.250347
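For anyone wanting to reproduce the "complete data only" filter, here is a minimal sketch of one way it could be done with pandas. It is not the script used to build the CSV above: the function name `keep_complete_columns` and the toy tickers are made up for illustration, and it assumes the prices are in a DataFrame indexed by date with one column per ticker.

```python
import pandas as pd


def keep_complete_columns(prices: pd.DataFrame, start: str = "2017-01-01") -> pd.DataFrame:
    """Keep rows on or after `start`, then drop every ticker with a missing price."""
    recent = prices.loc[prices.index >= pd.Timestamp(start)]
    return recent.dropna(axis="columns", how="any")


# Toy data (hypothetical tickers): BBB only starts trading in 2017,
# CCC has a gap inside the 2017 window.
prices = pd.DataFrame(
    {
        "AAA": [1.0, 1.1, 1.2, 1.3],
        "BBB": [None, None, 2.0, 2.1],
        "CCC": [3.0, 3.1, None, 3.3],
    },
    index=pd.to_datetime(["2016-12-29", "2016-12-30", "2017-01-03", "2017-01-04"]),
)

complete = keep_complete_columns(prices)
print(complete.columns.tolist(), len(complete))
```

Filtering by date first matters: a ticker that only has gaps before the start date is kept, while one with a gap inside the window is dropped.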

The problem with the PC algorithm is that, in the worst case, its computation time grows exponentially with the number of variables (here, companies). To run it, I set up a Google Cloud VM instance with an e2-highmem-4 machine type. The first attempt used a machine with less memory, and the process was terminated after 8 hours because it ran out of memory. I used the following code to generate the final DOT file.

from causallearn.search.ConstraintBased.PC import pc
from causallearn.utils.GraphUtils import GraphUtils
import pandas as pd

# Load the prices and index by date so that only the ticker columns remain
no_na_2017_onwards = pd.read_csv("stock_data_2017_onwards.csv")
no_na_2017_onwards = no_na_2017_onwards.set_index("Date")

# Standardise each ticker to zero mean and unit variance
data_normalised = (no_na_2017_onwards - no_na_2017_onwards.mean()) / no_na_2017_onwards.std()
data_array = data_normalised.values

# Run PC with a significance level of 0.01 for the conditional independence tests
pc_result = pc(data_array, alpha=0.01)
pc_result.draw_pydot_graph(labels=no_na_2017_onwards.columns)

# Export the learned graph as a raw DOT file, labelling the nodes by ticker
pyd = GraphUtils.to_pydot(pc_result.G, labels=no_na_2017_onwards.columns)
pyd.write_raw("output_raw.dot")
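To get a feel for why the run needed so much time and memory, the sketch below computes a crude upper bound on the number of conditional independence tests. It assumes every pair of variables is tested against every conditioning set up to a given size drawn from the remaining variables. The real PC algorithm only conditions on current neighbours and prunes edges as it goes, so it usually performs far fewer tests. The function name `worst_case_tests` is hypothetical.

```python
from math import comb


def worst_case_tests(n_vars: int, max_cond_size: int) -> int:
    """Crude upper bound: every unordered pair of variables, tested against
    every conditioning set of size 0..max_cond_size taken from the other
    n_vars - 2 variables."""
    pairs = comb(n_vars, 2)
    sets_per_pair = sum(comb(n_vars - 2, k) for k in range(max_cond_size + 1))
    return pairs * sets_per_pair


# Compare the 20-stock sample with the 366-stock run
for n in (20, 366):
    for depth in (0, 1, 2, 3):
        print(f"n={n:>3}  max conditioning size={depth}  bound={worst_case_tests(n, depth):,}")
```

With 20 variables and conditioning sets of at most two variables the bound is 32,680 tests. Each extra level of conditioning multiplies the count by roughly the number of variables, and the growth only becomes exponential when the conditioning sets are allowed to grow with the graph.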

The graph can be accessed here. I won't explain the process, but we also styled the nodes by industry. The best way to do this with Graphviz is to style the nodes directly in the DOT file. You can find that version here.
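As a rough idea of what styling nodes directly in the DOT file can look like, here is a minimal sketch that appends a fill colour to every node line. It is not the script used for the styled graph above. It assumes node lines of the form `0 [label=AAPL];`, which may not match the exported file exactly, and the function name, sector mapping and colours are all made up for illustration.

```python
import re

# Matches label=AAPL or label="BRK.B" inside a DOT attribute list
NODE_LABEL = re.compile(r'label="?([A-Za-z0-9.\-]+)"?')


def style_nodes(dot_text, sector_of, colour_of, default="#cccccc"):
    """Append style=filled and a sector colour to every node line."""
    styled = []
    for line in dot_text.splitlines():
        match = NODE_LABEL.search(line)
        is_edge = "->" in line or "--" in line
        if match and not is_edge and line.rstrip().endswith("];"):
            sector = sector_of.get(match.group(1))
            colour = colour_of.get(sector, default)
            head = line.rstrip()[:-2]  # drop the closing "];"
            line = f'{head}, style=filled, fillcolor="{colour}"];'
        styled.append(line)
    return "\n".join(styled)


# Toy input (hypothetical sector mapping and colours)
dot = 'digraph {\n0 [label=AAPL];\n1 [label="BRK.B"];\n0 -> 1 [dir=both];\n}'
print(style_nodes(dot, {"AAPL": "Tech"}, {"Tech": "#1f77b4"}))
```

Tickers without a known sector fall back to the default grey, and edge lines are left untouched.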

Finally, we also performed Louvain clustering to find communities in the graph. You can find that one here.
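For readers who want to try community detection themselves, the sketch below runs Louvain with networkx on a toy edge list. Using networkx is my assumption here, and it is not necessarily the library used for the graph above. The sketch also ignores edge direction, whereas the PC output is partially directed. The function name and the node names are made up.

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities


def find_communities(edges, seed=0):
    """Run Louvain on an undirected graph built from `edges` and return
    the communities as sorted lists of node names."""
    graph = nx.Graph()
    graph.add_edges_from(edges)
    communities = louvain_communities(graph, seed=seed)
    return sorted(sorted(c) for c in communities)


# Toy graph: two fully connected groups joined by a single edge
tech = ["TECH1", "TECH2", "TECH3", "TECH4"]
banks = ["BANK1", "BANK2", "BANK3", "BANK4"]
edges = [
    (a, b)
    for group in (tech, banks)
    for i, a in enumerate(group)
    for b in group[i + 1:]
]
edges.append(("TECH1", "BANK1"))

communities = find_communities(edges)
print(communities)
```

Fixing the seed keeps the result reproducible, since Louvain visits the nodes in random order.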

The next step is to see whether this is an effective way to design a portfolio.