Peter-Clark Algorithm: n = 366
In a previous post I applied the Peter-Clark (PC) algorithm for causal inference to a sample of 20 stocks from the S&P 500. In the conclusion, I said I wanted to run the same code on a larger set of stocks.
I'll use only stocks with complete price data from 2017-01-01 onwards. This leaves 366 companies and 2,012 rows (trading days).
import pandas as pd

# Load the filtered price data and preview the first few rows
df = pd.read_csv("stock_data_2017_onwards.csv")
df.head()
| | Date | A | AAPL | ABBV | ABT |
|---|---|---|---|---|---|
| 0 | 2017-01-03 | 43.743877 | 26.891962 | 44.265285 | 33.788067 |
| 1 | 2017-01-04 | 44.317841 | 26.861860 | 44.889439 | 34.056301 |
| 2 | 2017-01-05 | 43.790916 | 26.998468 | 45.229893 | 34.350471 |
| 3 | 2017-01-06 | 45.155258 | 27.299448 | 45.244080 | 35.284950 |
| 4 | 2017-01-09 | 45.296417 | 27.549496 | 45.541965 | 35.250347 |
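For reference, here is a minimal sketch of how a filtered file like this could be produced from a wider price panel. The input file name full_prices.csv is a placeholder for illustration, not something used in the original pipeline.

import pandas as pd

# Hypothetical wide price panel with a Date column and one column per ticker
full = pd.read_csv("full_prices.csv", parse_dates=["Date"])
full = full[full["Date"] >= "2017-01-01"]

# Keep only tickers with a complete price history over the period
complete = full.dropna(axis=1)
complete.to_csv("stock_data_2017_onwards.csv", index=False)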
The problem with the PC algorithm is that its computation time grows exponentially with the number of companies, so 366 stocks is a much heavier job than 20. To run it, I set up a Google Cloud VM instance with an e2-highmem-4 machine type. The first time I ran it, I used a machine with less memory and the process was killed after 8 hours because it ran out of memory. I used the following code to generate the final dot file.
from causallearn.search.ConstraintBased.PC import pc
from causallearn.utils.GraphUtils import GraphUtils
import pandas as pd

# Load the filtered prices and index by date
no_na_2017_onwards = pd.read_csv("stock_data_2017_onwards.csv")
no_na_2017_onwards = no_na_2017_onwards.set_index("Date")

# Standardise each column (z-scores) before running PC
data_normalised = (no_na_2017_onwards - no_na_2017_onwards.mean()) / no_na_2017_onwards.std()
data_array = data_normalised.values

# Run the PC algorithm with a significance level of 0.01
pc_result = pc(data_array, alpha=0.01)

# Draw the graph and write it out as a dot file
pc_result.draw_pydot_graph(labels=no_na_2017_onwards.columns)
pyd = GraphUtils.to_pydot(pc_result.G, labels=no_na_2017_onwards.columns)
pyd.write_raw("output_raw.dot")
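As a quick sanity check (not part of the script above), the saved dot file can be rendered to an image with pydot, assuming Graphviz is installed:

import pydot

# Load the saved dot file and render it to a PNG for a quick visual check
(graph,) = pydot.graph_from_dot_file("output_raw.dot")
graph.write_png("pc_graph_366.png")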
The graph can be accessed here. I won't explain the process in detail, but I also styled the nodes by industry; the best way to do this with Graphviz is to style the nodes directly in the dot file. You can find that version here.
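As a rough sketch of that idea (the ticker-to-colour mapping below is a made-up placeholder, not the industry classification used for the published graph), the nodes can be restyled by loading the dot file with pydot and setting a fill colour per ticker:

import pydot

# Placeholder mapping from ticker to a colour; the real version maps each
# of the 366 tickers to its industry
industry_colours = {"AAPL": "lightblue", "ABBV": "lightgreen", "ABT": "lightgreen"}

(graph,) = pydot.graph_from_dot_file("output_raw.dot")
for node in graph.get_nodes():
    name = node.get_name().strip('"')  # assumes nodes are named by ticker
    if name in industry_colours:
        node.set_style("filled")
        node.set_fillcolor(industry_colours[name])
graph.write_raw("output_styled_by_industry.dot")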
Finally, I also performed Louvain clustering to find communities in the graph. You can find that one here.
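One way to reproduce that step, assuming networkx 2.8+ (which ships a Louvain implementation) and treating the PC output as an undirected graph:

import networkx as nx
from networkx.algorithms.community import louvain_communities
from networkx.drawing.nx_pydot import read_dot

# Read the PC output back in and drop edge directions, since Louvain
# operates on undirected graphs
g = nx.Graph(read_dot("output_raw.dot"))

communities = louvain_communities(g, seed=0)
for i, community in enumerate(communities):
    print(i, sorted(community))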
The next step is to see whether this is an effective way to design a portfolio.