Peter-Clark Algorithm: n = 366
In a previous post I applied the Peter-Clark (PC) algorithm for causal inference to a sample of 20 stocks from the S&P 500. In the conclusion, I said I wanted to run the same code on a larger set of stocks.
I'll use only stocks with complete price data from 2017-01-01 onwards. This leaves 366 companies and 2,012 rows (trading days).
import pandas as pd

# Load the filtered price data and preview the first few rows
df = pd.read_csv("stock_data_2017_onwards.csv")
df.head()
| | Date | A | AAPL | ABBV | ABT |
|---|---|---|---|---|---|
| 0 | 2017-01-03 | 43.743877 | 26.891962 | 44.265285 | 33.788067 |
| 1 | 2017-01-04 | 44.317841 | 26.861860 | 44.889439 | 34.056301 |
| 2 | 2017-01-05 | 43.790916 | 26.998468 | 45.229893 | 34.350471 |
| 3 | 2017-01-06 | 45.155258 | 27.299448 | 45.244080 | 35.284950 |
| 4 | 2017-01-09 | 45.296417 | 27.549496 | 45.541965 | 35.250347 |
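For reference, here is a minimal sketch of how a filtered file like this could be produced from a wider price panel. The input file name full_prices.csv is a placeholder for illustration, not something used in the original pipeline.

import pandas as pd

# Hypothetical wide price panel with a Date column and one column per ticker
full = pd.read_csv("full_prices.csv", parse_dates=["Date"])
full = full[full["Date"] >= "2017-01-01"]

# Keep only tickers with a complete price history over the period
complete = full.dropna(axis=1)
complete.to_csv("stock_data_2017_onwards.csv", index=False)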
The problem with the PC algorithm is that its computation time grows exponentially with the number of companies, so 366 stocks is a much heavier job than 20. To run it, I set up a Google Cloud VM instance with an e2-highmem-4 machine type. The first time I ran it, I used a machine with less memory and the process was killed after 8 hours because it ran out of memory. I used the following code to generate the final dot file.
from causallearn.search.ConstraintBased.PC import pc
from causallearn.utils.GraphUtils import GraphUtils
import pandas as pd

# Load the filtered prices and index by date
no_na_2017_onwards = pd.read_csv("stock_data_2017_onwards.csv")
no_na_2017_onwards = no_na_2017_onwards.set_index("Date")

# Standardise each column (z-scores) before running PC
data_normalised = (no_na_2017_onwards - no_na_2017_onwards.mean()) / no_na_2017_onwards.std()
data_array = data_normalised.values

# Run the PC algorithm with a significance level of 0.01
pc_result = pc(data_array, alpha=0.01)

# Draw the graph and write it out as a dot file
pc_result.draw_pydot_graph(labels=no_na_2017_onwards.columns)
pyd = GraphUtils.to_pydot(pc_result.G, labels=no_na_2017_onwards.columns)
pyd.write_raw("output_raw.dot")
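As a quick sanity check (not part of the script above), the saved dot file can be rendered to an image with pydot, assuming Graphviz is installed:

import pydot

# Load the saved dot file and render it to a PNG for a quick visual check
(graph,) = pydot.graph_from_dot_file("output_raw.dot")
graph.write_png("pc_graph_366.png")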
The graph can be accessed here. I won't explain the process in detail, but I also styled the nodes by industry; the best way to do this with Graphviz is to style the nodes directly in the dot file. You can find that version here.
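As a rough sketch of that idea (the ticker-to-colour mapping below is a made-up placeholder, not the industry classification used for the published graph), the nodes can be restyled by loading the dot file with pydot and setting a fill colour per ticker:

import pydot

# Placeholder mapping from ticker to a colour; the real version maps each
# of the 366 tickers to its industry
industry_colours = {"AAPL": "lightblue", "ABBV": "lightgreen", "ABT": "lightgreen"}

(graph,) = pydot.graph_from_dot_file("output_raw.dot")
for node in graph.get_nodes():
    name = node.get_name().strip('"')  # assumes nodes are named by ticker
    if name in industry_colours:
        node.set_style("filled")
        node.set_fillcolor(industry_colours[name])
graph.write_raw("output_styled_by_industry.dot")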
Finally, I also performed Louvain clustering to find communities in the graph. You can find that one here.
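One way to reproduce that step, assuming networkx 2.8+ (which ships a Louvain implementation) and treating the PC output as an undirected graph:

import networkx as nx
from networkx.algorithms.community import louvain_communities
from networkx.drawing.nx_pydot import read_dot

# Read the PC output back in and drop edge directions, since Louvain
# operates on undirected graphs
g = nx.Graph(read_dot("output_raw.dot"))

communities = louvain_communities(g, seed=0)
for i, community in enumerate(communities):
    print(i, sorted(community))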
The next step is to see whether this is an effective way to design a portfolio.