
GDELT + Stock Data: Sentiment Score Backtesting

We've got our data, but it's showing no correlation between variables we thought were correlated. I want to backtest it before completely giving up on this strategy. I'm gonna use the backtesting.py package in Python to do this. It's pretty simple to use, but it has some limitations if we ever want to try more complex strategies. I'm using the following strategy:

from collections import Counter
from backtesting import Strategy, Backtest
from gdeltdoc import GdeltDoc, Filters
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
import numpy as np
from scipy.stats import pearsonr

START_DATE = "2017-01-01"
END_DATE = "2024-12-30"
TICKER = "SPY"
SEARCH_TERM = "economy"

# Set up GDELT filters
f = Filters(
    keyword=SEARCH_TERM,
    start_date=START_DATE,
    end_date=END_DATE
)

gd = GdeltDoc()
articles = gd.article_search(f)
timeline = gd.timeline_search("timelinetone", f)
timeline

def get_tone(df):
    # Map each trading day in df to GDELT's average tone for that calendar day
    # (averaged if the timeline has several rows for one day).
    # Days with no GDELT data get 0, which next() treats as "no signal".
    tone_by_date = timeline.groupby(timeline["datetime"].dt.date)["Average Tone"].mean()
    dates = pd.Series(df.index.date, index=df.index)
    return dates.map(tone_by_date).fillna(0.0)

# Daily OHLCV bars for TICKER
price_data = yf.download(TICKER, start=START_DATE, end=END_DATE, multi_level_index=False)

class SentimentGDELT(Strategy):
    buy_threshold = 0.5
    close_threshold = -0.5

    def init(self):
        # Register the daily tone series as an indicator so it is plotted
        # and revealed to next() one bar at a time
        self.tone = self.I(get_tone, self.data.df)

    def next(self):
        tone = self.tone[-1]  # tone for the current bar
        if tone != 0:  # 0 means no GDELT data for this day
            if tone < self.buy_threshold:
                self.buy()
            elif tone > self.close_threshold:
                self.position.close()

bt = Backtest(price_data, SentimentGDELT, cash=10_000, commission=.002)
stats = bt.run()
bt.plot()
stats
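The entry/exit rule in `next()` is easier to reason about in isolation. Below is a plain-Python mirror of that logic; `sentiment_signal` is a hypothetical helper written for this illustration, not part of backtesting.py. With `buy_threshold=0.5` and `close_threshold=-0.5`, any non-zero tone below 0.5 triggers a buy, and the `elif` is only reached when the tone is at least 0.5, which is always above -0.5. So at these settings `close_threshold` never changes which action is taken.

```python
def sentiment_signal(tone, buy_threshold=0.5, close_threshold=-0.5):
    """Hypothetical helper mirroring SentimentGDELT.next(): the action for one bar."""
    if tone != 0:  # 0 means "no GDELT data for this day"
        if tone < buy_threshold:
            return "buy"
        elif tone > close_threshold:
            return "close"
    return "hold"


# Sweep a few tone values to see which branch each one hits
TONES = [-3.0, -1.0, -0.5, 0.0, 0.25, 0.5, 1.0, 2.0]
for tone in TONES:
    print(f"tone={tone:+.2f} -> {sentiment_signal(tone)}")

# With buy_threshold=0.5, the elif only sees tone >= 0.5, which is always
# greater than any close_threshold below 0.5, so these all match the default
for close_threshold in (-2.0, -0.5, 0.0, 0.49):
    same = all(sentiment_signal(t, 0.5, close_threshold) == sentiment_signal(t) for t in TONES)
    print(f"close_threshold={close_threshold:+.2f}: same actions as default -> {same}")
```

A `close_threshold` at or above `buy_threshold` would be needed for the band between the two to mean "hold". If the intent was to buy on positive sentiment instead, the two comparisons would need to be flipped.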

Running the backtest produced the following plot:

[Figure: SentimentGDELT backtest plot]

Not too bad... at first sight. Looking at the statistics tells another story.

Start                     2017-01-03 00:00:00
End                       2024-12-27 00:00:00
Duration                   2915 days 00:00:00
Exposure Time [%]                   87.661692
Equity Final [$]                 20945.559185
Equity Peak [$]                  22627.686388
Return [%]                         109.455592
Buy & Hold Return [%]              201.593354
Return (Ann.) [%]                    9.712549
Volatility (Ann.) [%]               19.511283
Sharpe Ratio                         0.497791
Sortino Ratio                        0.762928
Calmar Ratio                         0.289356
Max. Drawdown [%]                  -33.566098
Avg. Drawdown [%]                   -1.684377
Max. Drawdown Duration     1137 days 00:00:00
Avg. Drawdown Duration       25 days 00:00:00
# Trades                                   64
Win Rate [%]                           53.125
Best Trade [%]                      119.20113
Worst Trade [%]                    -16.862316
Avg. Trade [%]                        1.12551
Max. Trade Duration        1756 days 00:00:00
Avg. Trade Duration          43 days 00:00:00
Profit Factor                        3.550265
Expectancy [%]                       1.791041
SQN                                  0.877532
_strategy                      SentimentGDELT
_equity_curve                             ...
_trades                       Size  EntryB...
dtype: object
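A few rows of that table can be reproduced from each other, which helps when reading it. This sketch reflects my understanding of how backtesting.py derives them, not its actual code: total return is final equity over starting cash, the Sharpe Ratio is Return (Ann.) divided by Volatility (Ann.) assuming a zero risk-free rate, and the Calmar Ratio is Return (Ann.) divided by the absolute max drawdown. The results line up with the printed values, which is consistent with that reading.

```python
# Figures copied from the backtest output above
CASH = 10_000
EQUITY_FINAL = 20945.559185   # Equity Final [$]
RETURN_ANN = 9.712549         # Return (Ann.) [%]
VOLATILITY_ANN = 19.511283    # Volatility (Ann.) [%]
MAX_DRAWDOWN = -33.566098     # Max. Drawdown [%]

# Total return: final equity relative to starting cash
total_return_pct = (EQUITY_FINAL / CASH - 1) * 100

# Sharpe, assuming a zero risk-free rate: annual return per unit of annual volatility
sharpe = RETURN_ANN / VOLATILITY_ANN

# Calmar: annual return per unit of worst peak-to-trough loss
calmar = RETURN_ANN / abs(MAX_DRAWDOWN)

print(f"Return [%]    {total_return_pct:.6f}")
print(f"Sharpe Ratio  {sharpe:.6f}")
print(f"Calmar Ratio  {calmar:.6f}")
```

A Sharpe of about 0.5 means the strategy earned roughly half a percentage point of annual return for every point of annual volatility it took on.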

A recurring theme in my analysis is that I have yet to devise a strategy that outperforms the market itself. You can see that in the "Buy & Hold Return [%]" row, which shows the return you would have gotten by simply holding the stock over the backtest period: about 202%, compared with about 109% for this strategy.
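To put the two returns from the stats table in dollar and per-year terms, here is a small sketch using the $10,000 starting cash and the 2915-day duration. The `cagr` helper is a hypothetical calendar-day compound growth rate, a rough approximation rather than the exact annualization backtesting.py uses for Return (Ann.). As I understand it, backtesting.py computes Buy & Hold Return [%] from the first and last Close without commission, so it is a slightly generous benchmark; with a gap this size that doesn't change the conclusion.

```python
CASH = 10_000
DAYS = 2915                    # Duration from the stats table
STRATEGY_RETURN = 109.455592   # Return [%]
BUY_HOLD_RETURN = 201.593354   # Buy & Hold Return [%]


def final_equity(return_pct: float) -> float:
    """Ending balance for a given total return on the starting cash."""
    return CASH * (1 + return_pct / 100)


def cagr(return_pct: float, days: int) -> float:
    """Calendar-day compound annual growth rate in %. A rough approximation,
    not the exact annualization backtesting.py uses for Return (Ann.)."""
    return ((1 + return_pct / 100) ** (365 / days) - 1) * 100


gap_dollars = final_equity(BUY_HOLD_RETURN) - final_equity(STRATEGY_RETURN)
gap_points = BUY_HOLD_RETURN - STRATEGY_RETURN

print(f"Strategy:    ${final_equity(STRATEGY_RETURN):,.2f}  (~{cagr(STRATEGY_RETURN, DAYS):.1f}%/yr)")
print(f"Buy & hold:  ${final_equity(BUY_HOLD_RETURN):,.2f}  (~{cagr(BUY_HOLD_RETURN, DAYS):.1f}%/yr)")
print(f"Shortfall:   ${gap_dollars:,.2f}  ({gap_points:.1f} percentage points)")
```

On this rough per-year basis, buy and hold compounds at roughly 15% a year against roughly 10% for the strategy.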

Alas, back to the drawing board.