
Exponential Kernels with Latency in Hawkes Processes: Applications in Finance

Marcos Costa Santos Carreira École Polytechnique, CMAP - PhD under the Quantitative Regulation chair
(Jan-2021)
Abstract

The Tick library [tick] allows researchers in market microstructure to simulate and learn Hawkes processes from high-frequency data, with optimized parametric and non-parametric learners. One challenge, however, is to take into account the correct causality of order book events in the presence of latency: one order book event can only influence another if the time difference between them (measured by the central order book timestamps) is greater than the minimum amount of time needed for the first event to (i) be published in the order book, (ii) reach the trader responsible for the second event, (iii) influence that trader's decision (processing time) and (iv) for the second event to reach the order book and be processed. For this we can use exponential kernels shifted to the right by the latency. We derive the expression for the log-likelihood to be minimized for the 1-D and the multidimensional cases, and test this method on simulated and real data. On real data we find that, although not all decays are the same, the latency itself determines most of the decays. We also show how the decays are related to the latency. Code is available on GitHub at https://github.com/MarcosCarreira/Hawkes-With-Latency.

Keywords: Hawkes processes; Kernel estimations; High-frequency; Order book dynamics; Market microstructure; Python

1 Introduction

It is well known that markets function in response both to external (or exogenous) information - e.g. news - and internal (endogenous) information, like the behavior of market participants and the patterns of price movements. The systematization of markets (synthesized by the Central Limit Order Book) led to two important developments on these responses: order book information is now available in an organized and timely form (which enables the study of intraday patterns in historical data) and algorithms can be deployed to process all kinds of information fast enough to trade (even before the information is received or processed by all the market participants).

That combination increases the importance of understanding how endogenous trading interacts with itself: activity in the order book leads to activity from fast traders, which leads to more activity, and so on. While the arrival of orders in an order book due to exogenous information can be described as a Poisson process, the interaction of the agents corresponds to a self-exciting process, which can be described by Hawkes processes. The importance of Hawkes processes in finance has been described quite well in [Bacry et al 2015], and recent work by [Euch et al 2016] has established the connection between Hawkes processes, stylized market microstructure facts and rough volatility. More recently, [Bacry et al 2019] improved the Queue-Reactive model of [Huang et al 2015] by adding Hawkes components to the arrival rates.

We start by focusing on [Bacry et al 2015], which applies Hawkes process learners, as implemented in the Tick library [tick], to futures. There are two limitations in these applications: for parametric learners the decay(s) must be given and be the same for all kernels, and influences are assumed to be instantaneous. In this short paper we show how to code a parametric learner for an exponential kernel (or kernels, in the multivariate case) in which we learn the decay and take a given latency into account. We won't revisit the mathematical details of the known results; for these, we refer the reader to [Toke 2011] and [Abergel et al 2016]. One of the goals is to show a working Python script accelerated by [Numba]; it can be optimized further for improved performance. The other goal is to understand what the learned parameters of the kernels mean given the trading dynamics of the BUND futures.

2 One-dimensional Hawkes processes

2.1 Definition

As explained in [Toke 2011], a Hawkes process with an exponential kernel has its intensity described by:

\lambda\left(t\right)=\lambda_{0}\left(t\right)+\sum_{t_{i}<t}\sum_{j=1}^{P}\left[\alpha_{j}\cdot\exp\left(-\beta_{j}\cdot\left(t-t_{i}\right)\right)\right]   (1)

For P = 1 and λ0(t) = λ0:

\lambda\left(t\right)=\lambda_{0}+\sum_{t_{i}<t}\left[\alpha\cdot\exp\left(-\beta\cdot\left(t-t_{i}\right)\right)\right]   (2)

With stationarity condition:

\frac{\alpha}{\beta}<1   (3)

And the unconditional expected value of the intensity:

E\left[\lambda\left(t\right)\right]=\frac{\lambda_{0}}{1-\frac{\alpha}{\beta}}   (4)
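As a quick check with the parameters used in the simulations of Section 2.3 (λ0 = 1.20, α = 0.60, β = 0.80), the branching ratio and the expected intensity are:

\frac{\alpha}{\beta}=\frac{0.60}{0.80}=0.75<1\qquad E\left[\lambda\left(t\right)\right]=\frac{1.20}{1-0.75}=4.8\ \text{events per unit of time}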

2.2 Maximum-likelihood estimation

Following [Toke 2011]:

LL=\int_{0}^{T}\left(1-\lambda\left(s\right)\right)ds+\int_{0}^{T}\ln\left(\lambda\left(s\right)\right)dN\left(s\right)   (5)

Which is computed as:

LL=t_{N}-\int_{0}^{T}\lambda\left(s\right)ds+\sum_{i=1}^{N}\ln\left[\lambda_{0}\left(t_{i}\right)+\sum_{j=1}^{P}\sum_{k=1}^{i-1}\alpha_{j}\cdot\exp\left(-\beta_{j}\cdot\left(t_{i}-t_{k}\right)\right)\right]   (6)

Defining the function:

R_{j}\left(i\right)=\sum_{k=1}^{i-1}\exp\left(-\beta_{j}\cdot\left(t_{i}-t_{k}\right)\right)   (7)

The recursion observed by [Ogata et al 1981] is:

R_{j}\left(i\right)=\exp\left(-\beta_{j}\cdot\left(t_{i}-t_{i-1}\right)\right)\cdot\left(1+R_{j}\left(i-1\right)\right)   (8)

And therefore the log-likelihood is:

LL=t_{N}-\int_{0}^{T}\lambda_{0}\left(s\right)ds-\sum_{i=1}^{N}\sum_{j=1}^{P}\left[\frac{\alpha_{j}}{\beta_{j}}\left(1-\exp\left(-\beta_{j}\cdot\left(t_{N}-t_{i}\right)\right)\right)\right]
+\sum_{i=1}^{N}\ln\left[\lambda_{0}\left(t_{i}\right)+\sum_{j=1}^{P}\alpha_{j}\cdot R_{j}\left(i\right)\right]   (9)

For P = 1 and λ0(t) = λ0:

R\left(i\right)=\exp\left(-\beta\cdot\left(t_{i}-t_{i-1}\right)\right)\cdot\left(1+R\left(i-1\right)\right)   (10)

And:

LL=t_{N}-\lambda_{0}\cdot T-\sum_{i=1}^{N}\left[\frac{\alpha}{\beta}\left(1-\exp\left(-\beta\cdot\left(t_{N}-t_{i}\right)\right)\right)\right]
+\sum_{i=1}^{N}\ln\left[\lambda_{0}+\alpha\cdot R\left(i\right)\right]   (11)

2.3 Simulation and learning

Using tick we simulate 100 paths for different end times (100, 1000, 10000), both for the builtin ExpKernel and for a user-defined TimeFunction that mirrors an ExpKernel:

  • Initialize {λ0, α, β}

    Define the kernel function f(α, β, t)

    Define the support s

    Build the TimeFunction with:

    • n + 1 steps for the time interval [0, s]

      n + 1 evaluations of f at these points

      an interpolation mode (e.g. TimeFunction.InterConstRight)

    Build a kernel with HawkesKernelTimeFunc

    Define the builtin HawkesKernelExp with parameters {α/β, β}

    Define the number of simulations N_S

    Define the end time T

    Build the SimuHawkes object with λ0, a kernel and T

    Build the SimuHawkesMulti object with SimuHawkes and N_S

    Simulate SimuHawkesMulti

Algorithm 1: Definitions of Exponential Kernels (builtin and custom) and simulation
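As an illustration, the sketch below builds both kernels and runs the simulations with tick. The parameter values are the ones used later (λ0 = 1.20, α = 0.60, β = 0.80); the support, grid size and seed are arbitrary illustrative choices, and the exact class and argument names should be checked against the installed tick version.

```python
import numpy as np
from tick.base import TimeFunction
from tick.hawkes import (SimuHawkes, SimuHawkesMulti,
                         HawkesKernelExp, HawkesKernelTimeFunc)

lambda0, alpha, beta = 1.20, 0.60, 0.80   # baseline, kernel scale, decay
support, n_points = 10.0, 1000            # kernel support and grid size (illustrative)

# Custom kernel: sample alpha * exp(-beta * t) on a grid and wrap it in a TimeFunction
t_grid = np.linspace(0.0, support, n_points + 1)
y_grid = alpha * np.exp(-beta * t_grid)
tf = TimeFunction((t_grid, y_grid), inter_mode=TimeFunction.InterConstRight)
kernel_custom = HawkesKernelTimeFunc(tf)

# Builtin kernel: HawkesKernelExp takes the integrated intensity alpha/beta and the decay beta
kernel_builtin = HawkesKernelExp(alpha / beta, beta)

def simulate(kernel, end_time, n_simulations=100, seed=1039):
    hawkes = SimuHawkes(kernels=[[kernel]], baseline=[lambda0],
                        end_time=end_time, verbose=False, seed=seed)
    multi = SimuHawkesMulti(hawkes, n_simulations=n_simulations)
    multi.simulate()
    return multi.timestamps   # list (simulations) of lists (nodes) of arrays

paths_builtin = simulate(kernel_builtin, end_time=1000)
paths_custom = simulate(kernel_custom, end_time=1000)
```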

We use minimize from [scipy] with method 'SLSQP'; as pointed out in [Kisel 2017], it can handle both bounds (ensuring positivity of the parameters) and constraints (ensuring the stationarity condition). The time differences should be calculated once, at the start of the optimization process, and passed to the likelihood function instead of being recalculated inside it.

  • def ll(θ, ts, Δts, δts):

    • {λ0, α, β} = θ (parameters are passed as one argument for the optimization)

      ts is one of the SimuHawkesMulti.timestamps

      Δts is the series ts_i − ts_{i−1}

      δts is the series ts_N − ts_{i−1}

      Define R as an array of zeros with the same size as ts

      Keep its first element R_1 = 0

      Calculate R_i = exp(−β · Δts_{i−1}) · (1 + R_{i−1}) recursively for i > 1

      Return:

      −[ ts_N · (1 − λ0) − (α/β) · Σ_{δts} (1 − exp(−β · δts)) + Σ_i log(λ0 + α · R_i) ]

    ----------------------------------------------------------------------------

    Minimize ll(θ, ts, Δts, δts) with:

    • bounds on λ0, α, β (all must be positive) and

      the constraint α < β

Algorithm 2: Negative Log-Likelihood and its minimization with scipy's minimize
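A minimal Python sketch of this procedure follows; the function and variable names (neg_log_lik, fit_hawkes, etc.) are illustrative rather than taken from the repository.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(theta, ts, dts, delta_ts):
    """Negative log-likelihood of a 1-D exponential Hawkes process.

    ts       : array of event times
    dts      : ts[i] - ts[i-1] (length len(ts) - 1)
    delta_ts : ts[-1] - ts[i] for every event i
    """
    lambda0, alpha, beta = theta
    # Ogata's recursion: R[i] = exp(-beta * (t_i - t_{i-1})) * (1 + R[i-1]), with R[0] = 0
    r = np.zeros(len(ts))
    for i in range(1, len(ts)):
        r[i] = np.exp(-beta * dts[i - 1]) * (1.0 + r[i - 1])
    ll = (ts[-1] * (1.0 - lambda0)
          - (alpha / beta) * np.sum(1.0 - np.exp(-beta * delta_ts))
          + np.sum(np.log(lambda0 + alpha * r)))
    return -ll

def fit_hawkes(ts, theta0=(1.0, 0.5, 1.0)):
    dts = np.diff(ts)
    delta_ts = ts[-1] - ts
    return minimize(neg_log_lik, x0=np.array(theta0),
                    args=(ts, dts, delta_ts),
                    method="SLSQP",
                    bounds=[(1e-6, None)] * 3,                       # positivity
                    constraints=[{"type": "ineq",
                                  "fun": lambda th: th[2] - th[1]}])  # stationarity: beta - alpha >= 0

# Example with one simulated path (node 0 of the first simulation):
# res = fit_hawkes(paths_builtin[0][0]); print(res.x)
```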

2.4 Kernels with latency

For P = 1 and λ0(t) = λ0, with latency τ:

\lambda\left(t\right)=\lambda_{0}+\sum_{t_{i}<t-\tau}\left[\alpha\cdot\exp\left(-\beta\cdot\left(t-\tau-t_{i}\right)\right)\right]   (12)

We can use the same (exponential) function and shift it to the right by the latency:

  • Initialize {λ0, α, β}

    Define the kernel function f(α, β, t)

    Define the support s

    Build the TimeFunction with:

    • n + 1 steps for the time interval [0, s]

      n + 1 evaluations of f at these points

      an interpolation mode (e.g. TimeFunction.InterConstRight)

      If latency τ > 0:

      • shift [0, s] and f([0, s]) to the right by τ

        add zeros on the interval [0, τ]

    Build a kernel with HawkesKernelTimeFunc

    Define the number of simulations N_S

    Define the end time T

    Build the SimuHawkes object with λ0, the kernel and T

    Build the SimuHawkesMulti object with SimuHawkes and N_S

    Simulate SimuHawkesMulti

Algorithm 3: Exponential Kernel with latency
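The shift itself is a small modification of the custom kernel sketched above; a sketch, assuming the same tick classes as before (the function name and grid choices are illustrative):

```python
import numpy as np
from tick.base import TimeFunction
from tick.hawkes import HawkesKernelTimeFunc

def latency_kernel(alpha, beta, tau, support=10.0, n_points=2000):
    """Kernel alpha * exp(-beta * (t - tau)) for t >= tau, and 0 on [0, tau)."""
    t_grid = np.linspace(0.0, tau + support, n_points + 1)
    y_grid = np.where(t_grid >= tau, alpha * np.exp(-beta * (t_grid - tau)), 0.0)
    tf = TimeFunction((t_grid, y_grid), inter_mode=TimeFunction.InterConstRight)
    return HawkesKernelTimeFunc(tf)

# kernel = latency_kernel(0.60, 0.80, tau=2.0)  # then simulate as in the previous sketch
```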

The new likelihoods are defined (for P = 1 and λ0(t) = λ0) as:

R_{\tau}\left(i\right)=R_{\tau}\left(i-1\right)\cdot\exp\left(-\beta\cdot\left(t_{i}-t_{i-1}\right)\right)   (13)
+\begin{cases}\sum_{t_{k}}\exp\left(-\beta\cdot\left(t_{i}-\tau-t_{k}\right)\right)&t_{i-1}-\tau\leq t_{k}<t_{i}-\tau\\0&\text{otherwise}\end{cases}   (14)

With R_τ(1) = 0, and:

LL_{\tau}=t_{N}-\lambda_{0}\cdot t_{N}-\sum_{t_{i}<t_{N}-\tau}\left[\frac{\alpha}{\beta}\left(1-\exp\left(-\beta\cdot\left(t_{N}-\tau-t_{i}\right)\right)\right)\right]
+\sum_{i=1}^{N}\ln\left[\lambda_{0}+\alpha\cdot R_{\tau}\left(i\right)\right]   (15)

Even with latency τ = 0, for each t_i we now pass the array of events in [t_{i−1} − τ, t_i − τ), which for τ = 0 contains only t_{i−1}. But this is enough to slow down the optimization, as observed while testing that the algorithm returns the same results as the previous one for latency zero, and as seen in Table 1.

  • def lllat(θ, t, Δt, δt, τt):

    • {λ0, α, β} = θ (parameters are passed as one argument for the optimization)

      t is one of the SimuHawkesMulti.timestamps

      Δt is the series t_i − t_{i−1} (excluding t_1)

      δt is the series t_N − τ − t_{i−1}

      τt is the series of arrays t_i − τ − t_k for all t_k such that t_{i−1} − τ ≤ t_k < t_i − τ

      Define R as an array of zeros with the same size as t

      Keep its first element R_1 = 0

      Calculate R_i = exp(−β · Δt_{i−1}) · R_{i−1} + Σ_{τt_i} exp(−β · τt_i) recursively for i > 1

      Return:

      −[ t_N · (1 − λ0) − (α/β) · Σ_{δt} (1 − exp(−β · δt)) + Σ_i log(λ0 + α · R_i) ]

    ----------------------------------------------------------------------------

    Minimize lllat(θ, t, Δt, δt, τt) with:

    • bounds on λ0, α, β (all must be positive) and

      the constraint α < β

Algorithm 4: Log-likelihood with latency
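A sketch of the corresponding Python function is below; the precomputation of the per-event arrays follows the definitions above, and the names are again illustrative (a [Numba]-compiled loop would follow the same structure).

```python
import numpy as np
from scipy.optimize import minimize

def precompute_latency_arrays(ts, tau):
    """For each event i > 0, collect t_i - tau - t_k for t_k in [t_{i-1} - tau, t_i - tau)."""
    dts = np.diff(ts)
    delta_ts = ts[-1] - tau - ts
    tau_ts = []
    for i in range(1, len(ts)):
        lo, hi = ts[i - 1] - tau, ts[i] - tau
        mask = (ts >= lo) & (ts < hi)
        tau_ts.append(ts[i] - tau - ts[mask])
    return dts, delta_ts, tau_ts

def neg_log_lik_latency(theta, ts, dts, delta_ts, tau_ts):
    lambda0, alpha, beta = theta
    r = np.zeros(len(ts))
    for i in range(1, len(ts)):
        # decay the previous state and add the newly "visible" (latency-shifted) events
        r[i] = np.exp(-beta * dts[i - 1]) * r[i - 1] + np.sum(np.exp(-beta * tau_ts[i - 1]))
    # only events with t_i < t_N - tau contribute to the compensator term
    pos = delta_ts > 0
    ll = (ts[-1] * (1.0 - lambda0)
          - (alpha / beta) * np.sum(1.0 - np.exp(-beta * delta_ts[pos]))
          + np.sum(np.log(lambda0 + alpha * r)))
    return -ll

def fit_hawkes_latency(ts, tau, theta0=(1.0, 0.5, 1.0)):
    dts, delta_ts, tau_ts = precompute_latency_arrays(ts, tau)
    return minimize(neg_log_lik_latency, x0=np.array(theta0),
                    args=(ts, dts, delta_ts, tau_ts), method="SLSQP",
                    bounds=[(1e-6, None)] * 3,
                    constraints=[{"type": "ineq",
                                  "fun": lambda th: th[2] - th[1]}])
```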

We obtain the following results for 100 paths (parameters λ0 = 1.20, α = 0.60 and β = 0.80), with the total running times in seconds (on a MacBook Pro 16-inch 2019, 2.4 GHz 8-core Intel Core i9, 64 GB 2667 MHz DDR4 RAM):

End Time   Kernel    Algorithm   runtime (s)   λ0     α      β
100        Builtin   ll          0.535         1.40   0.59   0.87
100        Builtin   lllat       1.49          1.40   0.59   0.87
100        Custom    ll          0.521         1.41   0.63   0.88
100        Custom    lllat       1.6           1.41   0.63   0.88
1000       Builtin   ll          0.992         1.22   0.60   0.80
1000       Builtin   lllat       11.3          1.22   0.60   0.80
1000       Custom    ll          1.13          1.23   0.63   0.81
1000       Custom    lllat       12.8          1.23   0.63   0.81
10000      Builtin   ll          6.02          1.20   0.60   0.80
10000      Builtin   lllat       143           1.20   0.60   0.80
10000      Custom    ll          7.29          1.20   0.62   0.80
10000      Custom    lllat       181           1.20   0.62   0.80
Table 1: Learning results and times

With latency τ = 2, we find that we can still recover the parameters of the kernel quite well (still 100 paths):

End Time   runtime (s)   λ0     α      β
100        1.76          1.41   0.59   0.81
1000       15.6          1.26   0.62   0.80
10000      223           1.21   0.61   0.79
Table 2: Learning results and times for latency τ = 2

3 Multidimensional Hawkes

3.1 No latency

For P = 1 and λ0(t) = λ0:

\lambda^{m}\left(t^{m}\right)=\lambda_{0}^{m}+\sum_{n=1}^{M}\sum_{t_{i}^{n}<t^{m}}\left[\alpha^{m,n}\cdot\exp\left(-\beta^{m,n}\cdot\left(t^{m}-t_{i}^{n}\right)\right)\right]   (16)

With the recursion now:

R^{m,n}\left(i\right)=R^{m,n}\left(i-1\right)\cdot\exp\left(-\beta^{m,n}\cdot\left(t_{i}^{m}-t_{i-1}^{m}\right)\right)+\sum_{t_{i-1}^{m}\leq t_{k}^{n}<t_{i}^{m}}\exp\left(-\beta^{m,n}\cdot\left(t_{i}^{m}-t_{k}^{n}\right)\right)   (17)

And log-likelihood for each node m:

LL^{m}=t_{N}^{m}\cdot\left(1-\lambda_{0}^{m}\right)-\sum_{n=1}^{M}\sum_{t_{i}^{n}<t_{N}^{m}}\left[\frac{\alpha^{m,n}}{\beta^{m,n}}\left(1-\exp\left(-\beta^{m,n}\cdot\left(t_{N}^{m}-t_{i}^{n}\right)\right)\right)\right]
+\sum_{t_{i}^{n}<t_{N}^{m}}\ln\left[\lambda_{0}^{m}+\sum_{n=1}^{M}\alpha^{m,n}\cdot R^{m,n}\left(i\right)\right]   (18)

3.2 Latency

For P = 1 and λ0(t) = λ0:

\lambda^{m}\left(t^{m}\right)=\lambda_{0}^{m}+\sum_{n=1}^{M}\sum_{t_{i}^{n}<t^{m}-\tau}\left[\alpha^{m,n}\cdot\exp\left(-\beta^{m,n}\cdot\left(t^{m}-\tau-t_{i}^{n}\right)\right)\right]   (19)

With the recursion now:

R^{m,n}\left(i\right)=R^{m,n}\left(i-1\right)\cdot\exp\left(-\beta^{m,n}\cdot\left(t_{i}^{m}-t_{i-1}^{m}\right)\right)+\sum_{t_{i-1}^{m}-\tau\leq t_{k}^{n}<t_{i}^{m}-\tau}\exp\left(-\beta^{m,n}\cdot\left(t_{i}^{m}-\tau-t_{k}^{n}\right)\right)   (20)

And log-likelihood for each node m:

LL^{m}=t_{N}^{m}\cdot\left(1-\lambda_{0}^{m}\right)-\sum_{n=1}^{M}\sum_{t_{i}^{n}<t_{N}^{m}-\tau}\left[\frac{\alpha^{m,n}}{\beta^{m,n}}\left(1-\exp\left(-\beta^{m,n}\cdot\left(t_{N}^{m}-\tau-t_{i}^{n}\right)\right)\right)\right]
+\sum_{t_{i}^{n}<t_{N}^{m}-\tau}\ln\left[\lambda_{0}^{m}+\sum_{n=1}^{M}\alpha^{m,n}\cdot R^{m,n}\left(i\right)\right]   (21)

Now there is a choice of how to efficiently select the t_k^n in the recursion: either find the corresponding arrays (which could be empty) for each {t_{i−1}^m, t_i^m} pair with numpy.extract, or find the appropriate t_i^m for each t_k^n using numpy.searchsorted and build a dictionary; we chose the latter.

To optimize this further, all the arrays are padded so that all accesses are on numpy.ndarrays (more details in the code; see also [numpy]).
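A sketch of the searchsorted-based assignment follows (the names are illustrative): for each exciting series n it groups the shifted differences t_i^m − τ − t_k^n by the index i of the excited series m.

```python
import numpy as np

def latency_groups(ts_m, ts_n, tau):
    """Group t_i^m - tau - t_k^n by i, for t_k^n in [t_{i-1}^m - tau, t_i^m - tau).

    Returns a dict {i: array of shifted time differences}, with i >= 1.
    """
    # i is such that t_{i-1}^m <= t_k^n + tau < t_i^m,
    # i.e. the right-sided insertion position of t_k^n + tau in ts_m
    idx = np.searchsorted(ts_m, ts_n + tau, side="right")
    groups = {}
    for i in range(1, len(ts_m)):
        mask = idx == i
        if np.any(mask):
            groups[i] = ts_m[i] - tau - ts_n[mask]
    return groups
```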

  • def lllatm(θ, m, nts, tNm, Δt, δt, τt):

    • {λ0_1, …, λ0_M, α^{1,1}, …, α^{1,M}, …, α^{M,M}, β^{1,1}, …, β^{1,M}, …, β^{M,M}} = θ

      (parameters are passed as one argument for the optimization)

      m is the index of the time series in the timestamps object

      nts is the array of lengths of all the time series of the timestamps object

      tNm is the last timestamp of each of the time series of the timestamps object

      Δt is the collection of series t_i^m − t_{i−1}^m for all 1 ≤ m ≤ M, including t_1^m

      δt is the collection of collections of series t_N^m − τ − t_{i−1}^n, for all 1 ≤ m ≤ M and 1 ≤ n ≤ M

      τt is the collection of collections of series of arrays t_i^m − τ − t_k^n, for all t_k^n such that t_{i−1}^m − τ ≤ t_k^n < t_i^m − τ, for all 1 ≤ m ≤ M and 1 ≤ n ≤ M

      For each n:

      • Define R^{m,n} as an array of zeros with the same size as t^m

        Keep its first element R_1^{m,n} = 0

        Calculate R_i^{m,n} = exp(−β^{m,n} · Δt_i^m) · R_{i−1}^{m,n} + Σ_{τt_i} exp(−β^{m,n} · τt_i^{m,n}) recursively for i > 1

      Return:

      −[ t_N^m · (1 − λ0^m) − Σ_n Σ_{δt} (α^{m,n}/β^{m,n}) · (1 − exp(−β^{m,n} · δt^{m,n})) + Σ_i log(λ0^m + Σ_n α^{m,n} · R_i^{m,n}) ]

    ----------------------------------------------------------------------------

    Minimize lllatm(θ, m, nts, tNm, Δt, δt, τt) with:

    • bounds on λ0, α, β (all must be positive)

      no constraints for a single m

Algorithm 5: Log-likelihood with latency - multidimensional for one time series m

A slower optimization can be done for more than one time series in parallel, with the advantage of enforcing symmetries on the coefficients by re-defining the input θ with the same decays for blocks (e.g. for [Bacry et al 2016], instead of 16 different decays we can use 4 decays, as described later in the paper); the algorithm below shows the case where we optimize over all the time series, but it can be modified to run on bid-ask pairs of time series.

  • def lllatall(θ, nts, tNm, Δt, δt, τt):

    • {λ0_1, …, λ0_M, α^{1,1}, …, α^{1,M}, …, α^{M,M}, β^{1,1}, …, β^{1,M}, …, β^{M,M}} = θ

      (parameters are passed as one argument for the optimization)

      nts is the array of lengths of all the time series of the timestamps object

      tNm is the last timestamp of each of the time series of the timestamps object

      Δt is the collection of series t_i^m − t_{i−1}^m for all 1 ≤ m ≤ M, including t_1^m

      δt is the collection of collections of series t_N^m − τ − t_{i−1}^n, for all 1 ≤ m ≤ M and 1 ≤ n ≤ M

      τt is the collection of collections of series of arrays t_i^m − τ − t_k^n, for all t_k^n such that t_{i−1}^m − τ ≤ t_k^n < t_i^m − τ, for all 1 ≤ m ≤ M and 1 ≤ n ≤ M

      For each m:

      • For each n:

        • Define R^{m,n} as an array of zeros with the same size as t^m

          Keep its first element R_1^{m,n} = 0

          Calculate R_i^{m,n} = exp(−β^{m,n} · Δt_i^m) · R_{i−1}^{m,n} + Σ_{τt_i} exp(−β^{m,n} · τt_i^{m,n}) recursively for i > 1

        Compute:

        LL^m = t_N^m · (1 − λ0^m) − Σ_n Σ_{δt} (α^{m,n}/β^{m,n}) · (1 − exp(−β^{m,n} · δt^{m,n})) + Σ_i log(λ0^m + Σ_n α^{m,n} · R_i^{m,n})

      Return −Σ_m LL^m

    ----------------------------------------------------------------------------

    Minimize lllatall(θ, nts, tNm, Δt, δt, τt) with:

    • bounds on λ0, α, β (all must be positive)

      inequality constraints: stationarity condition

      equality constraints: symmetries on coefficients

Algorithm 6: Log-likelihood with latency - multidimensional for all time series
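One way to enforce the block symmetries mentioned above is to optimize over a reduced parameter vector and expand it into the full {λ0, α, β} set before evaluating the log-likelihood; a sketch for a two-dimensional (bid/ask) block with one shared decay follows (the packing convention and the wrapped neg_ll_full function are illustrative assumptions):

```python
import numpy as np

def expand_symmetric_2d(theta_reduced):
    """Expand a reduced parameter vector for a symmetric bid/ask pair.

    theta_reduced = (lambda0, alpha_diag, alpha_anti, beta_block):
      - the two baselines are equal,
      - alpha[0,0] = alpha[1,1] = alpha_diag, alpha[0,1] = alpha[1,0] = alpha_anti,
      - all four decays equal beta_block.
    Returns (baselines, alphas, betas) as numpy arrays.
    """
    lambda0, a_diag, a_anti, b_block = theta_reduced
    baselines = np.array([lambda0, lambda0])
    alphas = np.array([[a_diag, a_anti],
                       [a_anti, a_diag]])
    betas = np.full((2, 2), b_block)
    return baselines, alphas, betas

# The reduced negative log-likelihood then wraps the full one, e.g.:
# def neg_ll_reduced(theta_reduced, *data):
#     baselines, alphas, betas = expand_symmetric_2d(theta_reduced)
#     return neg_ll_full(baselines, alphas, betas, *data)
```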

3.3 Simulations and learning

With latency τ = 2 in the two-dimensional case we get good results (100 paths, method 'SLSQP'):

End Time    runtime (s)   λ0(1)   λ0(2)   α(1,1)   α(1,2)   α(2,1)   α(2,2)   β(1,1)   β(1,2)   β(2,1)   β(2,2)
Simulated   -             0.6     0.2     0.5      0.7      0.9      0.3      1.4      1.8      2.2      1.0
100         7.1           0.68    0.28    0.55     0.73     1.03     0.37     1.87     1.93     2.44     2.65
1000        67            0.63    0.21    0.52     0.75     0.98     0.31     1.44     1.78     2.18     1.05
10000       804           0.60    0.20    0.52     0.75     0.97     0.31     1.37     1.76     2.12     1.00
Table 3: Learning results and times for latency τ = 2

4 Results on real data

4.1 Estimation of Slowly Decreasing Hawkes Kernels: Application to High Frequency Order Book Dynamics

Here one must consider the latency of 250 μs (as reported in [Eurex 2017]):

Figure 1: Eurex Latency

We have data available for 20 days (Apr-2014) of events P and T, from 8h to 22h (14 hours on each day), as described at https://github.com/X-DataInitiative/tick-datasets/tree/master/hawkes/bund. Because some days have late starts, and because late-night liquidity is not any better, we consider events only from 10h to 18h, which represent more than 75% of all events (daily counts are available on GitHub).

To get a better idea of the time scales involved, we calculate the descriptive statistics of each time series:

                       P_u    P_d    T_a     T_b
n (events, mean)       4569   4558   11027   12811
Δt 10th pct (μs)       196    196    147     229
Δt 15th pct (μs)       241    242    317     471
Δt 25th pct (μs)       342    340    1890    11756
Δt 50th pct (ms)       14.1   13.0   159.7   183.2
Δt 75th pct (s)        2.5    2.4    1.9     1.5
Δt 90th pct (s)        16.9   17.1   7.5     6.3
Δt mean (s)            6.3    6.3    2.6     2.2
Table 4: Statistics for the time intervals within each series

We can see that between 10% and 15% of the events within each time series happen less than 250 μs after the previous event, and therefore should not be considered as being caused by that previous event. We can also see that the average time interval is considerably higher than the median, which means that the distribution has fatter tails than an exponential distribution.
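These statistics can be reproduced with a few lines of numpy; a sketch, assuming one array of event times (in seconds) per series:

```python
import numpy as np

def interval_stats(ts):
    """Descriptive statistics of the inter-event times of one series."""
    dt = np.diff(ts)
    percentiles = np.percentile(dt, [10, 15, 25, 50, 75, 90])
    return {"n": len(ts), "percentiles": percentiles, "mean": dt.mean()}

# e.g. stats = {name: interval_stats(ts) for name, ts in series.items()}
```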

We run optimizations for the sum of (Bid + Ask) log-likelihoods where the kernel intensities are the same within diagonals, α(Ta→Pu) = α(Tb→Pd) and α(Tb→Pu) = α(Ta→Pd), and the kernel decays are the same within blocks, β(Ta→Pu) = β(Tb→Pd) = β(Tb→Pu) = β(Ta→Pd); so the total LL is optimized separately as LL_opt = LL_opt(Pu+Pd) + LL_opt(Ta+Tb). We also ran exploratory optimizations with completely independent intensities and decays, and tested different optimizer algorithms, in order to narrow the bounds for the parameters. The code (including initial values and bounds) is on the GitHub page, together with the spreadsheet with the results of the optimization (using the Differential Evolution, Powell and L-BFGS-B methods from [scipy]; Powell was the best-performing method). The parameters for T→T were the most unstable (charts in the Appendix).

         P_u    P_d    T_a     T_b
λ0       4.0%   4.0%   14.3%   16.1%
R        25%    25%    38%     37%

         P_u→P_u   P_d→P_u   T_a→P_u   T_b→P_u
α        737       488       401       14.5
β        3113      3113      3125      3125
α/β      23.7%     15.7%     12.8%     0.46%

         P_u→T_a   P_d→T_a   T_a→T_a        T_b→T_a
α        23.6      316       3.8 (median)   0.27 (median)
β        1122      1122      7.8 (median)   7.8 (median)
α/β      2.1%      28.3%     48.0%          3.1%

Table 5: Learned parameters

We calculate the exogeneity ratio R between the number of exogenous events and the total number of events n for each event type i and day d as:

R_{i,d}=\frac{\left(\lambda_{0}\right)_{i,d}}{\left(n\right)_{i,d}/\left(8\cdot 3600\right)}   (22)

These ratios are much higher than those found in [Bacry et al 2016], which can be explained by the reassignment of events occurring within the latency window. Even so, we find it reasonable to expect that a good part of the trades are exogenous (i.e. caused by reasons other than the short-term dynamics of the order book). Most interestingly, events that change the mid-price (trades, cancels and replenishments) are (i) faster and (ii) more endogenous than trades that do not change the mid-price. Note how the first chart of Figure 2 is similar to the first chart of Figure 7 in [Bacry et al 2016] after approximately 250 μs.

Figure 2: Learned Exponential Kernels

It is not by chance that the decay of the T→P and P→P kernels is about 3000; we can plot the first chart again, changing the x-axis to units of latency:

Figure 3: P, T → P Kernels in units of latency

Fast participants are expected to act quickly on (i) sharp imbalances, by either canceling or consuming the remaining queue (Ta→Pu), and (ii) recent gaps, by filling them in (P→P), and to react to the reactions as quickly as before, so the influence of the first event decays fast. We expect the pattern of Figure 3 to hold in different markets and different years (e.g. BUND futures in 2012, when the latency was 1250 μs instead of 250 μs).

The main differences with [Bacry et al 2016] are on the anti-diagonals. For P→P we find that diagonals are stronger than anti-diagonals (even on a large-tick asset such as the BUND future); to investigate this further, a better breakdown of P into Cancels and Trades would help. For P→T our method is not able to learn negative intensities because of the log-likelihood, but none of the values reached the lower bound, and the average intensity was about 2%. Although a change in the mid-price will make limit orders with the same sign miss the fill and become diagonals in the P→P sub-matrix, market orders would not be affected; more importantly, the speed and intensity of the P→P kernels ensure that gaps close quickly. We think that the introduction of the latency is enough to simplify the model so that negative intensities are not necessary.

We have not tested a Sum of Exponentials model, but it might be interesting to fix the decays for the first exponential kernel at “known” values and see what we can find.

5 Conclusions

We showed (with formulas and code) how to simulate and estimate exponential kernels of Hawkes processes with latency, and interpreted the results of applying this method to a (limited) set of BUND futures data. We showed how some features of this model (the P→P kernels and parameter symmetries) might be quite universal across markets. This approach can also be extended to two related securities that trade on different markets, by using different latencies on the cross-market kernels without changing the model.

We would like to thank Mathieu Rosenbaum (CMAP) for the guidance and Sebastian Neusuess (Eurex) for the latency data.

References

  • [tick] Bacry, E., Bompaire, M., Gaïffas, S. and Poulsen, S., (2017), “tick: a Python library for statistical learning, with a particular emphasis on time-dependent modeling”, available at https://arxiv.org/abs/1707.03003
  • [numpy] Harris, C. R. et al (2020), "Array programming with NumPy", Nature, 585, 357-362
  • [scipy] Virtanen, P. et al (2020) “SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python”, Nature Methods, 17(3), 261-272
  • [Numba] Lam, S. K., Pitrou, A. and Seibert, S. (2015), "Numba: a LLVM-based Python JIT compiler", LLVM '15: Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, Article No. 7, pp. 1-6
  • [Eurex 2017] Eurex, “Insights into trading system dynamics”, presentation
  • [Abergel et al 2016] Abergel, F., Anane, M., Chakraborti, A., Jedidi, A. and Toke, I. M. (2016) “Limit Order Books”, Cambridge University Press, ISBN 978-1-107-16398-0
  • [Bacry et al 2015] Bacry, E., Mastromatteo, I. and Muzy, J.-F., (2015), "Hawkes processes in finance", Market Microstructure and Liquidity 1(1)
  • [Bacry et al 2016] Bacry, E., Jaisson, T. and Muzy, J. F., (2016), “Estimation of Slowly Decreasing Hawkes Kernels: Application to High Frequency Order Book Dynamics”, Quantitative Finance 16(8), 1179-1201
  • [Bacry et al 2019] Bacry, E., Rambaldi, M., Muzy, J. F. and Wu, P. (2019), "Queue-reactive Hawkes models for the order flow", Working Papers hal-02409073, HAL, available at https://arxiv.org/abs/1901.08938
  • [Euch et al 2016] Euch, O. E., Fukasawa, M. and Rosenbaum, M., (2018) “The microstructural foundations of leverage effect and rough volatility”, Finance and Stochastics 22, 241-280
  • [Huang et al 2015] Huang, W., Lehalle, C.A. and Rosenbaum, M. (2015) “Simulating and Analyzing Order Book Data: The Queue-Reactive Model”, Journal of the American Statistical Association, 110(509):107–122
  • [Kisel 2017] Kisel, R. (2017): “Quoting behavior of a market-maker under different exchange fee structures”, Master’s thesis
  • [Ogata et al 1981] Ogata, Y., (1981) “On Lewis’ simulation method for point processes”, IEEE Transactions on Information Theory 27(1), 23-31
  • [Toke 2011] Toke, I. M., (2011), “An Introduction to Hawkes Processes with Applications to Finance”, BNP Paribas Chair Meeting, available at http://lamp.ecp.fr/MAS/fiQuant/ioane_files/HawkesCourseSlides.pdf

Appendix A

The charts and tables for the daily estimates are below. The only real concern is the instability of the parameters of the T→T kernels, although the intensities α/β are stable. The upper bound for β(T→T) was hit on day 16.

Figure 4: Daily estimates of parameters
Day   α(Ta→Ta)   α(Tb→Ta)   β(T→T)
1 2.73 0.14 4.91
2 2.80 0.18 4.88
3 3.81 0.33 8.57
4 7.12 0.62 15.64
5 36.12 1.94 106.44
6 2.96 0.17 5.15
7 4.68 0.31 10.81
8 4.19 0.21 7.75
9 7.78 0.67 18.73
10 2.55 0.29 4.68
11 22.68 1.22 73.79
12 2.28 0.15 4.13
13 3.33 0.24 6.14
14 4.04 0.27 8.46
15 4.75 0.35 10.15
16 117.41 2.28 500.00
17 3.85 0.27 7.93
18 3.63 0.22 7.54
19 3.24 0.15 6.22
20 1.76 0.04 2.68
Table 6: Daily estimates of parameters for TTT\rightarrow T kernels