Volatility-inspired $\sigma$-LSTM cell

German Rodikov, Scuola Normale Superiore, Pisa, Italy
Nino Antulov-Fantulin, ETH Zürich, Switzerland
Aisot Technologies AG, Zürich, Switzerland

Abstract

Volatility models of price fluctuations are well studied in the econometrics literature, with more than 50 years of theoretical and empirical findings. Recent advances in neural networks (NN) and deep learning have naturally offered novel econometric modeling tools. However, there is still a lack of explainability and stylized knowledge about volatility modeling with neural networks, even though the use of stylized facts could help improve the performance of NNs on the volatility prediction task. In this paper, we investigate how knowledge about the “physics” of the volatility process can be used as an inductive bias to design or constrain the cell state of a long short-term memory (LSTM) network for volatility forecasting. We introduce a new type of $\sigma$-LSTM cell with a stochastic processing layer, design its learning mechanism, and show good out-of-sample forecasting performance.


I Introduction

The structure of the noise (or errors) in regression models is usually subtle and is often taken as an ansatz that determines which mathematical framework is used. For example, consider the linear regression model $\mathbf{y}=\mathbf{X}\beta+\epsilon$, where $\mathbf{y}\in\mathbb{R}^{n}$ is the response variable, $\beta\in\mathbb{R}^{K}$ are the unobservable parameters, $\mathbf{X}\in\mathbb{R}^{n\times K}$ is the non-random matrix of explanatory variables, and $\epsilon$ is the noise or error. When the errors $\epsilon_{i}$ have zero mean, are homoscedastic, i.e. $\mathrm{Var}[\epsilon_{i}]=\sigma^{2}$, and are serially uncorrelated, i.e. $\mathrm{Cov}[\epsilon_{i},\epsilon_{j}]=0$ for $i\neq j$, the Gauss–Markov theorem [1] states that the ordinary least squares estimator has the lowest sampling variance within the class of linear unbiased estimators. In econometrics, special interpretation and attention are given to the error structure [2].

In this paper, we focus on the volatility of asset returns, which are well known to be heteroscedastic. Volatility is associated with the risk and the amplitude of price fluctuations.

Some models characterize volatility from a conditional-process perspective, for example the autoregressive conditional heteroskedasticity (ARCH) model [3] and the generalized autoregressive conditional heteroskedasticity (GARCH) model [4]. The idea of the conditional approach is to let the conditional variance vary over time while the unconditional variance remains constant.

On the other hand, owing to the wide availability of high-frequency financial data, researchers have recently focused on using realized volatility (RV) to build forecasting models. For example, the Heterogeneous Autoregressive model of Realized Volatility (HAR-RV) [5] is widely used in the literature because of its consistently good predictive performance and simple estimation by ordinary least squares.

Recurrent NN units such as the LSTM [6], GRU [7], and SRU [8] have demonstrated strong performance on time-series forecasting tasks. In particular, the LSTM architecture has been shown to perform well both in general and on the volatility prediction task specifically [9, 10].

The gated recurrent unit (GRU) [7] is a streamlined variant of the LSTM architecture that eliminates the separate cell state, which simplifies the fitting process. In addition, the Statistical Recurrent Unit (SRU) [8] infers long-term dependencies from data using only simple moving averages of summary statistics, and builds multiple proxies of the past through simple linear combinations of these averages.

In this work, we investigate how NNs can learn to capture the temporal structure of realized volatility, and in particular how the structure of long- and short-term volatility effects can be built into the LSTM. To this end, we introduce a modified LSTM cell, which we call the $\sigma$-LSTM. We extend the equation system of the LSTM cell so that it reflects an inductive bias given by the GARCH-like structure and the HAR-RV effects of volatility, which makes it easier to learn volatility by maximum likelihood estimation.

Recently, several studies [11, 12] have explored the heteroskedasticity of returns with recurrent NN architectures, but not by means of a modified LSTM cell. In [11], the authors propose the RECH model, in which the constant $\omega$ of the GARCH process is modeled by an RNN.

In [12], the authors combine the stochastic volatility (SV) model with the SRU. The idea is that the SRU captures the long-term memory effects and the serial dependence of volatility. However, in the resulting SR-SV model the SRU models the deterministic dynamics of the hidden states.

We investigate the original LSTM cell, propose the $\sigma$-LSTM with a likelihood-based loss function for realized volatility forecasting, and compare its predictive ability with the widely used HAR-RV and GARCH(1,1) models.

The remainder of the paper is organized as follows. Section II provides the mathematical motivation and a formal definition of the $\sigma$-LSTM cell. Section III describes the experiments and their results. Finally, Section IV concludes.

II Methodology

One important class of econometric models is the GARCH family [3, 4]:

r_{t}=\mu_{t}+\sigma_{t}\varepsilon_{t},\text{ }\sigma_{t}\varepsilon_{t}\sim\mathcal{N}(0,\sigma_{t}^{2})

(1)

where the conditional variance [4, 3] has the autoregressive structure:

\sigma_{t}^{2}=\omega+\alpha\,r_{t-1}^{2}+\beta\,\sigma_{t-1}^{2}.

(2)
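To make the recursion in Eq. (2) concrete, the following is a minimal Python sketch of the GARCH(1,1) conditional-variance filter. It assumes $\mu_t = 0$, so that $r_{t-1}^2$ is the squared innovation exactly as written in Eq. (2), and an initial variance `sigma2_0` supplied by the user; the function names and parameter values are illustrative, not taken from the paper.

```python
def garch11_variance(returns, omega, alpha, beta, sigma2_0):
    """Filter GARCH(1,1) conditional variances following Eq. (2).

    Assumes mu_t = 0, so r_{t-1}^2 plays the role of the squared innovation.
    Given returns r_0..r_{T-1}, returns [sigma_1^2, ..., sigma_T^2].
    """
    if omega <= 0 or alpha < 0 or beta < 0:
        raise ValueError("need omega > 0 and alpha, beta >= 0 for a positive variance")
    sigma2_prev = sigma2_0
    variances = []
    for r_prev in returns:
        sigma2_t = omega + alpha * r_prev ** 2 + beta * sigma2_prev
        variances.append(sigma2_t)
        sigma2_prev = sigma2_t
    return variances


def garch11_long_run_variance(omega, alpha, beta):
    """Unconditional variance omega / (1 - alpha - beta), defined when alpha + beta < 1."""
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be < 1 for covariance stationarity")
    return omega / (1.0 - alpha - beta)


# Usage with illustrative parameters
print(garch11_variance([0.0, 0.1], omega=0.1, alpha=0.1, beta=0.8, sigma2_0=1.0))
print(garch11_long_run_variance(omega=0.1, alpha=0.1, beta=0.8))
```

The long-run variance illustrates the point made in the Introduction: the conditional variance moves with past shocks, while the unconditional variance stays fixed at $\omega/(1-\alpha-\beta)$.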

A number of extensions with different functional forms (see Table 1) have been proposed, such as eGARCH [13], cGARCH [14], TGARCH [15], GJR-GARCH [16], and others.

Table 1: GARCH family

eGARCH

\ln({\sigma}_{t}^{2})=\omega+\alpha\left[\left|\frac{{\varepsilon}_{t-1}}{{\sigma}_{t-1}}\right|-E\left|\frac{{\varepsilon}_{t-1}}{{\sigma}_{t-1}}\right|\right]+\delta\frac{{\varepsilon}_{t-1}}{{\sigma}_{t-1}}+\beta\ln({\sigma}_{t-1}^{2})

cGARCH

{\sigma}_{t}^{2}=q_{t}+\alpha({\varepsilon}^{2}_{t-1}-q_{t-1})+\beta({\sigma}_{t-1}^{2}-q_{t-1})

q_{t}=\omega+\rho q_{t-1}+\theta({\varepsilon}^{2}_{t-1}-{\sigma}^{2}_{t-1})

GJR-GARCH

{\sigma}_{t}^{2}=\omega+\left(\alpha+\gamma I_{t-1}\right)\varepsilon_{t-1}^{2}+\beta\sigma_{t-1}^{2}

I_{t-1}=\begin{cases}0 & \text{if } r_{t-1}\geq\mu\\ 1 & \text{if } r_{t-1}<\mu\end{cases}

TGARCH

\sigma_{t}=\omega+\alpha\varepsilon_{t-1}+\beta\sigma_{t-1}+\phi\varepsilon_{t-1}1_{\left[\varepsilon_{t-1}<0\right]}
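The asymmetry in the GJR-GARCH row of Table 1 can be illustrated with a one-step update. The sketch below is a minimal Python version, assuming the innovation is $\varepsilon_{t-1} = r_{t-1} - \mu$ with a constant mean `mu`; the function name and parameter values are hypothetical.

```python
def gjr_garch_step(r_prev, sigma2_prev, omega, alpha, gamma, beta, mu=0.0):
    """One GJR-GARCH variance update (Table 1).

    Assumes the innovation is eps_{t-1} = r_{t-1} - mu, and the indicator
    I_{t-1} = 1 when r_{t-1} < mu (a negative shock), 0 otherwise.
    """
    eps_prev = r_prev - mu
    indicator = 1.0 if r_prev < mu else 0.0
    return omega + (alpha + gamma * indicator) * eps_prev ** 2 + beta * sigma2_prev


# A negative shock raises next-period variance more than a positive one of equal size
up = gjr_garch_step(+1.0, 1.0, omega=0.01, alpha=0.05, gamma=0.1, beta=0.9)
down = gjr_garch_step(-1.0, 1.0, omega=0.01, alpha=0.05, gamma=0.1, beta=0.9)
print(up, down)
```

With $\gamma > 0$, negative shocks receive the larger coefficient $\alpha+\gamma$, which is how GJR-GARCH encodes the leverage effect that plain GARCH(1,1) cannot represent.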

The Heterogeneous Autoregressive model of Realized Volatility (HAR-RV) introduced in [5] assumes that agents in financial markets differ in their perception of volatility depending on their investment horizons, which can be divided into short-term, medium-term, and long-term. Heterogeneous structures in financial markets are based on the heterogeneous market hypothesis presented in [17]: participants make decisions over different time horizons and therefore perceive and respond to different types of volatility. The memory of each component decays with a particular time constant.

The HAR-RV model is an additive cascade of partial volatilities generated at different time horizons, each with an autoregressive structure [5]. The HAR-RV approach provides a stable and accurate estimate of realized volatility [18] at three different horizons, given by the cascade in Eq. (3), where $RV_{t}^{(d)}$, $RV_{t}^{(w)}$, and $RV_{t}^{(m)}$ are the daily, weekly, and monthly observed realized volatilities, respectively.

\begin{cases}\tilde{\sigma}_{t+1m}^{(m)}=c^{(m)}+\phi^{(m)}RV_{t}^{(m)}+\tilde{\omega}_{t+1m}^{(m)}\\ \tilde{\sigma}_{t+1w}^{(w)}=c^{(w)}+\phi^{(w)}RV_{t}^{(w)}+\gamma^{(w)}\mathbb{E}_{t}\left[\tilde{\sigma}_{t+1m}^{(m)}\right]+\tilde{\omega}_{t+1w}^{(w)}\\ \tilde{\sigma}_{t+1d}^{(d)}=c^{(d)}+\phi^{(d)}RV_{t}^{(d)}+\gamma^{(d)}\mathbb{E}_{t}\left[\tilde{\sigma}_{t+1w}^{(w)}\right]+\tilde{\omega}_{t+1d}^{(d)}\end{cases}

(3)

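where $c^{(m)}$ is a constant, $\tilde{\omega}_{t+1m}^{(m)}$ is a zero-mean innovation for the monthly aggregation (the innovations are assumed to be contemporaneously and serially independent), $\phi$ is the weight of a particular cascade component, and $\gamma^{(w)}$, $\gamma^{(d)}$ weight the conditional expectation of the next-longer horizon.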
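In practice, substituting the expectations in Eq. (3) gives a reduced form in which the next-day RV is regressed on daily, weekly, and monthly RV averages. The sketch below builds these regressors in Python, assuming the convention of [5] that the weekly and monthly components are averages of the last 5 and 22 daily RVs. The coefficients `c, b_d, b_w, b_m` are hypothetical placeholders; in practice they would be fitted, for example by ordinary least squares.

```python
def har_regressors(daily_rv, t):
    """Return (RV_t^(d), RV_t^(w), RV_t^(m)) at index t.

    Assumes weekly = mean of the last 5 daily RVs and monthly = mean of the
    last 22 daily RVs, both windows ending at day t (inclusive).
    """
    if t < 21:
        raise ValueError("need at least 22 daily observations up to index t")
    rv_d = daily_rv[t]
    rv_w = sum(daily_rv[t - 4 : t + 1]) / 5.0
    rv_m = sum(daily_rv[t - 21 : t + 1]) / 22.0
    return rv_d, rv_w, rv_m


def har_forecast(daily_rv, t, c, b_d, b_w, b_m):
    """One-step-ahead forecast of RV_{t+1} from the reduced form
    RV_{t+1} = c + b_d * RV_t^(d) + b_w * RV_t^(w) + b_m * RV_t^(m)."""
    rv_d, rv_w, rv_m = har_regressors(daily_rv, t)
    return c + b_d * rv_d + b_w * rv_w + b_m * rv_m


# Usage with a toy RV series and hypothetical coefficients
toy_rv = [float(k) for k in range(1, 23)]
print(har_regressors(toy_rv, 21), har_forecast(toy_rv, 21, 0.1, 0.4, 0.3, 0.2))
```

Because the weekly and monthly terms are overlapping averages, each daily RV observation keeps influencing the forecast for up to 22 days, which is how HAR-RV approximates long memory with a simple linear model.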

Motivated by the GARCH structure and by the long- and short-term volatility effects of the HAR-RV model, we propose to model asset returns as $r_{t}=\phi(r_{t-1},\dots,r_{t-p})$, where $\phi(\cdot)$ is a differentiable non-linear function. In particular, we use a modified long short-term memory (LSTM) cell [6] designed to capture long- and short-term volatility. The inputs to our modified LSTM cell are the returns themselves, $x_{t}=r_{t}$, and the outputs are $\hat{r}_{t}$ and $\hat{\sigma}^{2}_{t}$. The cell has a hidden representation $h_{t}$ that serves as short-term memory and a cell state $C_{t}$ that serves as the long-term volatility memory component. The update rules of the $\sigma$-LSTM are the following, where $\sigma(\cdot)$ in Eqs. (4) and (5) denotes the logistic sigmoid function (not the volatility):

f_{t}=\sigma\left(W_{f}\cdot\left[h_{t-1},x_{t}\right]+b_{f}\right),

(4)

i_{t}=\sigma\left(W_{i}\cdot\left[h_{t-1},x_{t}\right]+b_{i}\right),

(5)

\tilde{C}_{t}=\tanh\left(W_{C}\cdot\left[h_{t-1},x_{t}\right]+b_{C}\right),

(6)

C_{t}=f_{t}*C_{t-1}+i_{t}*\tilde{C}_{t},

(7)

o_{t}\sim\mathcal{N}\left(0,W_{o}\left[C_{t}^{2}\right]\right),

(8)

h_{t}=o_{t}*\phi\left(C_{t}\right).

(9)

Finally, the output return $\hat{r}_{t}$ and the estimated variance $\hat{\sigma}^{2}_{t}$ are:

\hat{r}_{t}=W_{h}\cdot h_{t}

(10)

\hat{\sigma}^{2}_{t}={\langle C_{t}\rangle}^{2},

(11)

where $\langle\cdot\rangle$ is the mean operator and both $\hat{r}_{t}$ and $\hat{\sigma}^{2}_{t}$ are scalar values.
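The update rules in Eqs. (4) to (11) can be sketched as a single forward step in plain Python. This is a minimal illustration, not the authors' implementation, and it rests on several assumptions: $\phi$ in Eq. (9) is taken to be $\tanh$, as in the standard LSTM; $W_o[C_t^2]$ in Eq. (8) is read as an element-wise non-negative weight `w_o` times $C_t^2$; the mean operator in Eq. (11) is taken over the components of $C_t$; and the Gaussian draw uses the reparameterization $o_t = \sqrt{w_o}\,|C_t|\,\epsilon_t$ with $\epsilon_t \sim \mathcal{N}(0,1)$, which has the same distribution and keeps the step differentiable with respect to the weights. The parameter-dictionary layout and helper names are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def affine(W, b, v):
    """Compute W . v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]


def sigma_lstm_step(x_t, h_prev, c_prev, p, rng):
    """One forward step of the sigma-LSTM cell, Eqs. (4)-(11).

    Hidden size H = len(h_prev). `p` is a hypothetical parameter dict with
    W_f, W_i, W_C of shape H x (H + 1), biases b_f, b_i, b_C of length H,
    non-negative output-noise weights w_o of length H, and readout W_h of length H.
    """
    z = list(h_prev) + [x_t]                                         # [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(p["W_f"], p["b_f"], z)]          # Eq. (4)
    i = [sigmoid(a) for a in affine(p["W_i"], p["b_i"], z)]          # Eq. (5)
    c_tilde = [math.tanh(a) for a in affine(p["W_C"], p["b_C"], z)]  # Eq. (6)
    c = [fk * ck + ik * tk for fk, ck, ik, tk in zip(f, c_prev, i, c_tilde)]  # Eq. (7)
    # Eq. (8): o_t ~ N(0, w_o * C_t^2), drawn as sqrt(w_o) * |C_t| * eps, eps ~ N(0, 1)
    o = [math.sqrt(wk) * abs(ck) * rng.gauss(0.0, 1.0) for wk, ck in zip(p["w_o"], c)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]                 # Eq. (9), phi = tanh
    r_hat = sum(wk * hk for wk, hk in zip(p["W_h"], h))              # Eq. (10)
    sigma2_hat = (sum(c) / len(c)) ** 2                              # Eq. (11)
    return h, c, r_hat, sigma2_hat


def zero_matrix(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


# Usage: hidden size 2, all gate weights zero (illustrative values only)
H = 2
params = {
    "W_f": zero_matrix(H, H + 1), "W_i": zero_matrix(H, H + 1), "W_C": zero_matrix(H, H + 1),
    "b_f": [0.0] * H, "b_i": [0.0] * H, "b_C": [0.0] * H,
    "w_o": [0.5] * H, "W_h": [1.0] * H,
}
h, c, r_hat, sigma2_hat = sigma_lstm_step(0.01, [0.0] * H, [0.2, -0.4], params, random.Random(0))
```

Because the output gate is random, repeated calls with the same inputs give different $h_t$ and $\hat{r}_t$, whereas $C_t$ and $\hat{\sigma}^2_t$ at a given step depend only on $x_t$, $h_{t-1}$, and $C_{t-1}$.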

As a custom loss, we use the Gaussian log-likelihood of the observed returns given the estimated variances (up to an additive constant and a factor of 1/2), which is maximized during training:

\mathcal{L}=\sum_{t=1}^{m}\left[-\ln\left(\hat{\sigma}_{t}^{2}\right)-\frac{r_{t}^{2}}{\hat{\sigma}_{t}^{2}}\right].

(12)
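For reference, the objective in Eq. (12) can be computed directly. The sketch below assumes zero-mean returns, as in the Gaussian likelihood behind Eq. (12), and also provides the negated value so it can be minimized by a standard gradient-descent optimizer; both function names are hypothetical.

```python
import math


def sigma_lstm_objective(returns, sigma2_hat):
    """Eq. (12): sum_t [ -ln(sigma2_hat_t) - r_t^2 / sigma2_hat_t ].

    Equals twice the zero-mean Gaussian log-likelihood up to an additive
    constant, so maximizing it is maximum likelihood estimation.
    """
    if len(returns) != len(sigma2_hat):
        raise ValueError("returns and variances must have the same length")
    total = 0.0
    for r, s2 in zip(returns, sigma2_hat):
        if s2 <= 0:
            raise ValueError("estimated variances must be positive")
        total += -math.log(s2) - r ** 2 / s2
    return total


def sigma_lstm_loss(returns, sigma2_hat):
    """Negated objective, suitable for minimization."""
    return -sigma_lstm_objective(returns, sigma2_hat)
```

Setting the derivative of a single term with respect to $\hat{\sigma}_t^2$ to zero, $-1/\hat{\sigma}_t^2 + r_t^2/\hat{\sigma}_t^4 = 0$, shows that each term is maximized at $\hat{\sigma}_t^2 = r_t^2$, so the objective rewards variance estimates that track squared returns while the logarithm penalizes inflating the variance.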

III Experiments & Results

Table 2: Description of the data

Type of asset	Name	Price points	RV points
Index	S&P500	2 821 368	3803
Stock	Apple Inc.	2 466 466	3803
Cryptocurrency	Bitcoin-USD	3 613 769	3375

This study investigates how well the proposed $\sigma$-LSTM estimates and predicts realized volatility across different market structures, namely a stock, an index, and a cryptocurrency. We consider Apple Inc. stock, the S&P 500 index, and Bitcoin-USD (Table 2).

We calculate daily RV from minute-level price observations. Returns are calculated from daily close prices. Following standard practice, we divide each dataset into three parts: training, validation, and test. The validation and test samples each contain 200 points.
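A minimal sketch of this data preparation follows. It assumes that daily RV is computed as the sum of squared one-minute log returns within each day (the realized variance; its square root would give realized volatility) and that the split is chronological, with the last 200 points used for testing and the 200 points before them for validation. The function names are hypothetical.

```python
import math


def daily_realized_variance(minute_prices):
    """Sum of squared log returns over one day's minute-level prices."""
    if len(minute_prices) < 2:
        raise ValueError("need at least two price observations")
    return sum(
        math.log(p1 / p0) ** 2 for p0, p1 in zip(minute_prices, minute_prices[1:])
    )


def chronological_split(series, n_val=200, n_test=200):
    """Split a time series into train / validation / test without shuffling."""
    if len(series) <= n_val + n_test:
        raise ValueError("series too short for the requested split")
    train = series[: -(n_val + n_test)]
    val = series[-(n_val + n_test) : -n_test]
    test = series[-n_test:]
    return train, val, test
```

Keeping the split chronological ensures that the model is only ever evaluated on days that come after the data it was trained and tuned on, which matches the one-step-ahead forecasting setting.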

The mean squared error (MSE) is the average squared difference between the predictions and the target. Squaring prevents positive and negative deviations from cancelling each other out and penalizes large errors more heavily. The root mean squared error, $\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{t=1}^{n}(y_{t}-\hat{y}_{t})^{2}}$, is the square root of the MSE; taking the square root puts the error on the same scale as the target.
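As a worked illustration of the two metrics, the short Python functions below compute the MSE and RMSE for equal-length lists of targets and predictions.

```python
import math


def mse(y_true, y_pred):
    """Mean of squared differences between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Square root of the MSE, expressed in the units of the target."""
    return math.sqrt(mse(y_true, y_pred))


# Errors of -2 and +2 do not cancel: MSE = 4.0, RMSE = 2.0
print(mse([1.0, 3.0], [3.0, 1.0]), rmse([1.0, 3.0], [3.0, 1.0]))
```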

Finding the best configuration of a neural network requires multiple experiments with different hyperparameters [19]. For each training run we record the results on the validation set and select the most promising hyperparameters according to the validation RMSE. Normalizing the inputs is highly recommended before training RNNs and can make training more efficient; we rescale the input data to the range [0, 1] with min-max scaling.
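The min-max scaling step can be sketched as follows. The sketch assumes, as a common precaution not stated in the text, that the minimum and maximum are computed on the training portion only and then reused for the validation and test data, so that no test information leaks into the scaling; the helper names are hypothetical.

```python
def fit_min_max(train):
    """Return (min, max) of the training data."""
    lo, hi = min(train), max(train)
    if hi == lo:
        raise ValueError("constant training data cannot be min-max scaled")
    return lo, hi


def min_max_transform(values, lo, hi):
    """Map values linearly so that lo -> 0 and hi -> 1."""
    return [(v - lo) / (hi - lo) for v in values]


def min_max_inverse(scaled, lo, hi):
    """Undo the scaling, e.g. to report RMSE in the original units."""
    return [s * (hi - lo) + lo for s in scaled]
```

Under this convention, validation or test values outside the training range map outside [0, 1], which is expected and harmless for the recurrent cell.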

In our study, we ask the following question: can the proposed $\sigma$-LSTM cell, with its GARCH-like structure and the long- and short-term volatility effects of the HAR-RV model, capture long- and short-term volatility dynamics?

We evaluate one-step-ahead predictions with the RMSE on the 200 out-of-sample test points (Table 3). The $\sigma$-LSTM achieves the lowest out-of-sample RMSE on the S&P 500 index and Apple Inc. stock data sets, although on Apple Inc. stock its margin over HAR-RV is only 0.00001.

Table 3: S&P 500, Apple Inc. stock and Bitcoin-USD out-of-sample tests of forecasting accuracy

Data Set	Model	RMSE
S&P 500 Index	GARCH (1,1)	0.00405
	HAR-RV	0.00359
	LSTM	0.00805
	$\sigma$-LSTM	0.00351
Apple Inc. stock	GARCH (1,1)	0.00648
	HAR-RV	0.00561
	LSTM	0.00752
	$\sigma$-LSTM	0.00560
Bitcoin-USD	GARCH (1,1)	0.01641
	HAR-RV	0.01537
	LSTM	0.02286
	$\sigma$-LSTM	0.01542

For the cryptocurrency, however, the prediction error of HAR-RV is at the same level as that of the $\sigma$-LSTM (0.01537 versus 0.01542). In our experiments, the cell state $C_{t}$ of the original LSTM does not yield competitive volatility forecasts: the plain LSTM has the highest RMSE on all three data sets.
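To quantify the margins in Table 3, the relative RMSE reduction of the $\sigma$-LSTM with respect to HAR-RV can be computed as $(\mathrm{RMSE}_{\text{HAR-RV}} - \mathrm{RMSE}_{\sigma\text{-LSTM}})/\mathrm{RMSE}_{\text{HAR-RV}}$. The snippet below uses only the values reported in Table 3; a negative value means that HAR-RV has the lower error.

```python
# RMSE values copied from Table 3: (HAR-RV, sigma-LSTM)
table3 = {
    "S&P 500 Index": (0.00359, 0.00351),
    "Apple Inc. stock": (0.00561, 0.00560),
    "Bitcoin-USD": (0.01537, 0.01542),
}


def relative_reduction(baseline, model):
    """Fractional RMSE reduction of `model` relative to `baseline`."""
    return (baseline - model) / baseline


for name, (har, sigma_lstm) in table3.items():
    print(f"{name}: {100 * relative_reduction(har, sigma_lstm):+.2f}%")
```

This gives roughly +2.2% for the S&P 500, +0.2% for Apple Inc., and -0.3% for Bitcoin-USD, which makes the comparison with HAR-RV explicit.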

IV Conclusion

This work introduces a $\sigma$-LSTM cell to investigate whether stylized facts, i.e. a “physics-informed” inductive bias [20] given by the GARCH and HAR-RV volatility structures, can help improve the performance of NNs on the volatility prediction task. Rather than using the recurrent LSTM unit as a black box, we design sub-components that represent a long- and short-term volatility memory and a stochastic part.

We also train the $\sigma$-LSTM with a likelihood-based loss function. Our results show that the $\sigma$-LSTM can outperform well-known models in this field, such as the strong HAR-RV baseline and the regular LSTM cell, on the S&P 500 and Apple Inc. data sets, while performing on par with HAR-RV on Bitcoin-USD. In future work, we will investigate more advanced loss functions that could improve convergence speed and accuracy.

References

[1] D. S. Huang, Regression and Econometric Methods, QA 278.2. H82 (1970).
[2] R. S. Tsay, Analysis of Financial Time Series (John Wiley & Sons, 2005).
[3] R. F. Engle, Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation, Econometrica: Journal of the Econometric Society, 987 (1982).
[4] T. Bollerslev, Generalized autoregressive conditional heteroskedasticity, Journal of Econometrics 31, 307 (1986).
[5] F. Corsi, A simple approximate long-memory model of realized volatility, Journal of Financial Econometrics 7, 174 (2009).
[6] S. Hochreiter and J. Schmidhuber, Long short-term memory, Neural Computation 9, 1735 (1997).
[7] K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, Learning phrase representations using RNN encoder-decoder for statistical machine translation, arXiv preprint arXiv:1406.1078 (2014).
[8] J. B. Oliva, B. Póczos, and J. Schneider, The statistical recurrent unit, in International Conference on Machine Learning (PMLR, 2017) pp. 2671–2680.
[9] A. Bucci, Realized volatility forecasting with neural networks, Journal of Financial Econometrics 18, 502 (2020).
[10] G. Rodikov and N. Antulov-Fantulin, Can LSTM outperform volatility-econometric models?, arXiv preprint arXiv:2202.11581 (2022).
[11] T.-N. Nguyen, M.-N. Tran, and R. Kohn, Recurrent conditional heteroskedasticity, arXiv preprint arXiv:2010.13061 (2020).
[12] T.-N. Nguyen, M.-N. Tran, D. Gunawan, and R. Kohn, A statistical recurrent stochastic volatility model for stock markets, Journal of Business & Economic Statistics, 1 (2022).
[13] D. B. Nelson, Conditional heteroskedasticity in asset returns: A new approach, Econometrica: Journal of the Econometric Society, 347 (1991).
[14] G. Lee and R. Engle, A permanent and transitory component model of stock return volatility, Cointegration, Causality and Forecasting: A Festschrift in Honor of Clive W.J. Granger, 475 (1999).
[15] J.-M. Zakoian, Threshold heteroskedastic models, Journal of Economic Dynamics and Control 18, 931 (1994).
[16] L. R. Glosten, R. Jagannathan, and D. E. Runkle, On the relation between the expected value and the volatility of the nominal excess return on stocks, The Journal of Finance 48, 1779 (1993).
[17] U. A. Müller, M. M. Dacorogna, R. D. Davé, O. V. Pictet, R. B. Olsen, and J. R. Ward, Fractals and intrinsic time: A challenge to econometricians, Unpublished manuscript, Olsen & Associates, Zürich, 130 (1993).
[18] F. Corsi, F. Audrino, and R. Renò, HAR modeling for realized volatility forecasting (2012).
[19] G. Panchal, A. Ganatra, Y. Kosta, and D. Panchal, Searching most efficient neural network architecture using Akaike’s information criterion (AIC), International Journal of Computer Applications 1, 41 (2010).
[20] G. E. Karniadakis, I. G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, and L. Yang, Physics-informed machine learning, Nature Reviews Physics 3, 422 (2021).