This is part two of my series of articles on my dissertation from the University of York. While the first article introduced the basic concepts required for this study, the second article describes the experimentation process and approaches taken. Check out my GitHub if you want to skip to looking at the code.
This study seeks to measure the impact of certain external factors when predicting the direction of the FTSE index. These external factors are macroeconomic parameters and closing prices from other stock market indices. It is important to highlight that this study does not predict the actual closing price, but only whether the index will rise or fall on a daily basis. The aim of this study was not to find a state-of-the-art technique, or to build a trading tool, but rather to only study the impact of data. Therefore, predicting the direction was deemed enough for measuring this impact. Additionally, I only had four months, so I also had to limit the scope of my study somehow!
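Since the study predicts direction rather than price, each day needs an up/down label derived from consecutive closes. A minimal sketch of that labelling (my own illustration, with `direction_labels` a hypothetical helper name):

```python
import numpy as np

# Label each day as 1 (up) or 0 (down/flat) by comparing
# consecutive closing prices. Illustrative only.
def direction_labels(closes):
    closes = np.asarray(closes, dtype=float)
    return (np.diff(closes) > 0).astype(int)

print(direction_labels([100.0, 101.5, 101.0, 103.2]))  # [1 0 1]
```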
The study consisted of the following experiments:
- Predict the FTSE index by incorporating FTSE data with closing prices from other stock market indices. (Hypothesis 1)
- Predict the FTSE index by incorporating FTSE data with macroeconomic parameters. (Hypothesis 2)
- Predict the FTSE index by incorporating FTSE data with both closing prices from other stock market indices and macroeconomic parameters. (Hypothesis 3)
To determine whether an improvement in prediction accuracy was achieved, the above experiments were compared with two base cases:
- Always predict the most common direction. (Base Case 1)
  - In the chosen dataset period (1st January 2013 to 1st January 2018), the most frequent direction was up.
- Predict the FTSE index by using the FTSE data only. (Base Case 2)
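Base Case 1 is simple enough to sketch directly. Assuming labels are encoded as 1 = up and 0 = down (an encoding I'm choosing for illustration), the baseline just repeats the majority class from the training period:

```python
import numpy as np

# Base Case 1 sketch: always predict the most common direction
# observed in training. Labels: 1 = up, 0 = down (assumed encoding).
def majority_baseline(train_labels, n_test):
    majority = int(np.mean(train_labels) >= 0.5)
    return np.full(n_test, majority)

train = [1, 1, 0, 1, 0, 1]            # "up" is most frequent here
preds = majority_baseline(train, 4)
print(preds)  # [1 1 1 1]
```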
For all three hypotheses, the null hypothesis was that the method in question makes no significant difference in prediction accuracy compared to the corresponding base cases.
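The dissertation's exact significance test isn't described here, but one common way to compare two classifiers on the same test set is McNemar's exact test, which looks only at the disagreements between them. A minimal sketch, purely as an illustration:

```python
from math import comb

# Exact McNemar test (illustrative; not necessarily the test used
# in the dissertation). Compares two classifiers on one test set.
def mcnemar_p(b, c):
    """b = days model A is right and model B wrong; c = the reverse.
    Returns the two-sided exact p-value under H0: b and c equally likely."""
    n = b + c
    k = min(b, c)
    p = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * p)

print(round(mcnemar_p(12, 3), 4))  # 0.0352
```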
Throughout this study two types of architectures were used:
- Feedforward networks
- A two-staged ensemble approach with a feedforward network in each stage.
The second architecture was used for incorporating macroeconomic parameters, i.e. for hypotheses 2 and 3.
It was decided to use only one hidden layer for each feedforward network. Although this decision was driven by time restrictions, the Universal Approximation Theorem states that a network with a single hidden layer can approximate a wide class of functions, given enough hidden nodes and suitable parameters. Therefore, for each feedforward network, an exercise was performed to find the best combination of learning rate and number of hidden nodes.
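That tuning exercise amounts to a small grid search. A sketch of what it might look like, with `train_and_evaluate` standing in for training one network and returning validation accuracy (the grid values and the stub scoring function are my assumptions):

```python
from itertools import product

# Placeholder for "train a one-hidden-layer network with these
# hyperparameters and return its validation accuracy".
def train_and_evaluate(lr, n_hidden):
    return 0.5 + 0.01 * n_hidden / (1 + lr)   # stub score for illustration

learning_rates = [0.001, 0.01, 0.1]
hidden_sizes = [5, 10, 20]

# Try every (learning rate, hidden nodes) pair and keep the best.
best = max(product(learning_rates, hidden_sizes),
           key=lambda cfg: train_and_evaluate(*cfg))
print(best)  # (0.001, 20)
```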
Additionally, all feedforward networks shared the following configuration:
- Weights were initialised with Xavier random initialisation, a technique designed to keep the scale of gradients approximately the same across all layers.
- A dropout rate of 20% was applied to the input data to reduce overfitting during training.
- Training involved minimising a cross-entropy cost function using the Adam optimiser.
- A rectified linear unit (ReLU) was used as the activation function.
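To make the configuration above concrete, here is a minimal NumPy sketch of a forward pass with Xavier-initialised weights, 20% input dropout, a ReLU hidden layer and a softmax output over the two directions. This is my own illustration, not the dissertation's code, and it omits the Adam/cross-entropy training loop:

```python
import numpy as np

rng = np.random.default_rng(0)

# Xavier (Glorot) uniform initialisation: keeps gradient scale
# roughly constant across layers.
def xavier(n_in, n_out):
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

def forward(x, W1, W2, training=True):
    if training:                        # 20% input dropout
        mask = rng.random(x.shape) >= 0.2
        x = x * mask / 0.8              # inverted-dropout rescaling
    h = np.maximum(0.0, x @ W1)         # ReLU hidden layer
    logits = h @ W2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)  # softmax over up/down

W1, W2 = xavier(5, 10), xavier(10, 2)
probs = forward(rng.normal(size=(4, 5)), W1, W2)
print(probs.shape)  # (4, 2)
```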
Base Case 2 and Hypothesis 1 Architecture
Base case 2 consisted of only using the FTSE data to predict the direction. This was done by taking the last 5 closing prices of the index and feeding them into the feedforward network as shown above.
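The windowing for Base Case 2 can be sketched as follows: each training sample is the last 5 closing prices, and the target is whether the next close is higher. The helper name and the direction encoding are my own illustration:

```python
import numpy as np

# Build (last 5 closes -> next-day direction) samples for Base Case 2.
def make_windows(closes, window=5):
    closes = np.asarray(closes, dtype=float)
    X = np.array([closes[i:i + window]
                  for i in range(len(closes) - window)])
    # 1 if the close after each window is higher than the window's last close
    y = (closes[window:] > closes[window - 1:-1]).astype(int)
    return X, y

closes = [100, 101, 102, 101, 103, 104, 102]
X, y = make_windows(closes)
print(X.shape, y)  # (2, 5) [1 0]
```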
Hypothesis 1 used the same architecture, the only difference being that it had 21 inputs. These 21 inputs correspond to the last 3 closing prices from 7 different stock market indices: FTSE 100, S&P 500, DAX, CAC 40, EURO STOXX 50, NIKKEI 225 and HANG SENG. A table describing these indices and their regions can be found in my first article.
The first article also shows that Asian markets close right when the London stock market opens. Therefore, for NIKKEI and HANG SENG, the last 3 closing prices included the closing price of the same day that we’re trying to predict (at time t). For the rest of the indices, the last 3 closing prices started from the previous day (at time t – 1).
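That time alignment is easy to get wrong, so here is how the 21-input row for day t might be assembled. The dictionary layout and the helper name are my assumptions; the alignment rule (Asian indices include day t, the rest stop at t − 1) is from the article:

```python
# Asian markets close before London opens, so their closes for day t
# are already known when predicting day t.
ASIAN = {"NIKKEI 225", "HANG SENG"}
INDICES = ["FTSE 100", "S&P 500", "DAX", "CAC 40",
           "EURO STOXX 50", "NIKKEI 225", "HANG SENG"]

def feature_row(closes, t):
    """closes: dict mapping index name -> list of daily closes."""
    row = []
    for name in INDICES:
        end = t + 1 if name in ASIAN else t   # include day t only for Asia
        row.extend(closes[name][end - 3:end])
    return row  # 7 indices x 3 closes = 21 inputs

closes = {name: list(range(10)) for name in INDICES}
print(len(feature_row(closes, 5)))  # 21
```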
Hypothesis 2 and Hypothesis 3 Architecture
Both these experiments incorporated macroeconomic parameters through a two-stage process. The first stage consisted of training several models using the architectures from the previous section. Each model was trained and tested over 5 years of data, with each model's window shifted by 6 months relative to the previous one. As a result, I ended up with 16 models that together spanned 13 years of data (1st January 2000 to 1st January 2013). For hypothesis 2, these models followed the architecture of the second base case, while for hypothesis 3 they followed the architecture of hypothesis 1. The trained models were persisted and used again in the second stage.
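One way to generate those shifted training windows might look like the sketch below. The exact window arithmetic (month-start boundaries, where the final window ends) is my assumption, not taken from the dissertation:

```python
from datetime import date

# Generate 16 training windows, each 5 years long, each starting
# 6 months after the previous one, beginning 1 Jan 2000.
def shifted_windows(n_models=16, span_years=5, shift_months=6):
    windows = []
    for k in range(n_models):
        months = shift_months * k                 # offset from Jan 2000
        sy, sm = 2000 + months // 12, months % 12 + 1
        windows.append((date(sy, sm, 1), date(sy + span_years, sm, 1)))
    return windows

wins = shifted_windows()
print(wins[0][0], wins[-1][1])  # 2000-01-01 2012-07-01
```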
The second stage used another feedforward network, but this time the inputs included the outputs of the pre-trained models from the first stage, together with lagged inputs from 5 macroeconomic parameters: inflation, balance of trade, GDP, unemployment rate and interest rate. Definitions of these parameters can be found in my first article. The lagged inputs were the values of the last 3 quarters. The closing prices from the last 3 days of the FTSE index were also fed into the second-stage feedforward network.
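Putting the pieces together, the second-stage input vector might be assembled like this. The function name, the ordering of the concatenated parts and the example values are my own illustration:

```python
import numpy as np

# Assemble the stage-two input vector: 16 stage-one model outputs,
# 3 lagged quarterly values for each of 5 macro parameters, and the
# last 3 FTSE closes -> 16 + 15 + 3 = 34 inputs.
def stage_two_inputs(model_outputs, macro_lags, ftse_closes):
    assert len(model_outputs) == 16
    assert len(macro_lags) == 5
    assert all(len(v) == 3 for v in macro_lags.values())
    assert len(ftse_closes) == 3
    macro = [x for name in sorted(macro_lags) for x in macro_lags[name]]
    return np.concatenate([model_outputs, macro, ftse_closes])

macros = {"inflation": [2.1, 2.3, 2.4],
          "balance_of_trade": [-1.0, -0.8, -1.2],
          "gdp": [0.4, 0.5, 0.3],
          "unemployment": [4.8, 4.7, 4.6],
          "interest_rate": [0.5, 0.5, 0.75]}
x = stage_two_inputs(np.ones(16), macros, [7100.0, 7150.0, 7120.0])
print(x.shape)  # (34,)
```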
Initially, macroeconomic parameters were incorporated into a simple feedforward network, similar to the approach used for hypothesis 1. However, early results indicated that this was not going to perform well, so the two-stage approach was adopted instead. The main idea behind it was that the macroeconomic parameters would help choose which predictive model from the first stage to use. Matthew Butler, a former PhD student at the University of York, notes that:
Models constructed from historical information will wax and wane with the evolving market.
This suggests that predictive models lose their relevance over time, but may become relevant again in the future when a similar pattern in the data reappears. By training several models over different, shifted periods of time, we hope that at least one of them becomes relevant again when testing on new data.
The figure below summarises the two-stage process in a flow diagram.
I would love to hear your feedback on these approaches. What approach would you have taken?