
Exponential Kernels with Latency in Hawkes Processes: Applications in Finance

Marcos Costa Santos Carreira École Polytechnique, CMAP - PhD under the Quantitative Regulation chair
(Jan-2021)
Abstract

The Tick library [tick] allows researchers in market microstructure to simulate and learn Hawkes processes from high-frequency data, with optimized parametric and non-parametric learners. One challenge, however, is to take into account the correct causality of order book events in the presence of latency: one order book event can only influence another if the time difference between them (measured by the central order book timestamps) is greater than the minimum amount of time needed for the first event to (i) be published in the order book, (ii) reach the trader responsible for the second event, (iii) influence that trader's decision (processing time) and (iv) for the second event to reach the order book and be processed. For this we can use exponential kernels shifted to the right by the latency. We derive the expression for the log-likelihood to be minimized for the 1-D and the multidimensional cases, and test this method on simulated and real data. On real data we find that, although not all decays are the same, the latency itself determines most of the decays. We also show how the decays are related to the latency. Code is available on GitHub at https://github.com/MarcosCarreira/Hawkes-With-Latency.

Keywords: Hawkes processes; Kernel estimations; High-frequency; Order book dynamics; Market microstructure; Python

1 Introduction

It is well known that markets function in response both to external (or exogenous) information - e.g. news - and internal (endogenous) information, like the behavior of market participants and the patterns of price movements. The systematization of markets (synthesized by the Central Limit Order Book) led to two important developments on these responses: order book information is now available in an organized and timely form (which enables the study of intraday patterns in historical data) and algorithms can be deployed to process all kinds of information fast enough to trade (even before the information is received or processed by all the market participants).

That combination increases the importance of understanding how endogenous trading interacts with itself: activity in the order book leads to activity from fast traders, which leads to more activity, and so on. While the arrival of orders in an order book due to exogenous information can be described as a Poisson process, the interaction of the agents corresponds to a self-exciting process, which can be described by Hawkes processes. The importance of Hawkes processes in finance has been described quite well in [Bacry et al 2015], and recent work by [Euch et al 2016] has established the connection between Hawkes processes, stylized market microstructure facts and rough volatility. More recently, [Bacry et al 2019] improved the Queue-Reactive model of [Huang et al 2015] by adding Hawkes components to the arrival rates.

We start by focusing on [Bacry et al 2015], which applies Hawkes process learners, as implemented in the Tick library [tick], to futures. There are two limitations in these applications: for parametric learners the decay(s) must be given and be the same for all kernels, and influences are assumed to be instantaneous. In this short paper we show how to code a parametric learner for an exponential kernel (or kernels, in the multivariate case) in which we learn the decay and take a given latency into account. We won't revisit the mathematical details of the known results; for these, we refer the reader to [Toke 2011] and [Abergel et al 2016]. One of the goals is to show a working Python script accelerated by [Numba]; it can be optimized further for improved performance. The other goal is to understand what the learned parameters of the kernels mean given the trading dynamics of the BUND futures.

2 One-dimensional Hawkes processes

2.1 Definition

As explained in [Toke 2011], a Hawkes process with an exponential kernel has its intensity described by:

\lambda\left(t\right)=\lambda_{0}\left(t\right)+\sum_{t_{i}<t}\sum_{j=1}^{P}\left[\alpha_{j}\cdot\exp\left(-\beta_{j}\cdot\left(t-t_{i}\right)\right)\right]   (1)

For P = 1 and λ0(t) = λ0:

\lambda\left(t\right)=\lambda_{0}+\sum_{t_{i}<t}\left[\alpha\cdot\exp\left(-\beta\cdot\left(t-t_{i}\right)\right)\right]   (2)

With stationarity condition:

\frac{\alpha}{\beta}<1   (3)

And the unconditional expected value of the intensity:

E\left[\lambda\left(t\right)\right]=\frac{\lambda_{0}}{1-\frac{\alpha}{\beta}}   (4)
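As a quick check with the parameters used in the simulations of Section 2.3 (λ0 = 1.20, α = 0.60, β = 0.80), the branching ratio and the expected intensity are:

\frac{\alpha}{\beta}=\frac{0.60}{0.80}=0.75<1\qquad E\left[\lambda\left(t\right)\right]=\frac{1.20}{1-0.75}=4.8\ \text{events per unit of time}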

2.2 Maximum-likelihood estimation

Following [Toke 2011]:

LL=\int_{0}^{T}\left(1-\lambda\left(s\right)\right)ds+\int_{0}^{T}\ln\left(\lambda\left(s\right)\right)dN\left(s\right)   (5)

Which is computed as:

LL=t_{N}-\int_{0}^{T}\lambda\left(s\right)ds+\sum_{i=1}^{N}\ln\left[\lambda_{0}\left(t_{i}\right)+\sum_{j=1}^{P}\sum_{k=1}^{i-1}\alpha_{j}\cdot\exp\left(-\beta_{j}\cdot\left(t_{i}-t_{k}\right)\right)\right]   (6)

Defining the function:

R_{j}\left(i\right)=\sum_{k=1}^{i-1}\exp\left(-\beta_{j}\cdot\left(t_{i}-t_{k}\right)\right)   (7)

The recursion observed by [Ogata et al 1981] is:

R_{j}\left(i\right)=\exp\left(-\beta_{j}\cdot\left(t_{i}-t_{i-1}\right)\right)\cdot\left(1+R_{j}\left(i-1\right)\right)   (8)

And therefore the log-likelihood is:

LL=t_{N}-\int_{0}^{T}\lambda_{0}\left(s\right)ds-\sum_{i=1}^{N}\sum_{j=1}^{P}\left[\frac{\alpha_{j}}{\beta_{j}}\left(1-\exp\left(-\beta_{j}\cdot\left(t_{N}-t_{i}\right)\right)\right)\right]
+\sum_{i=1}^{N}\ln\left[\lambda_{0}\left(t_{i}\right)+\sum_{j=1}^{P}\alpha_{j}\cdot R_{j}\left(i\right)\right]   (9)

For P = 1 and λ0(t) = λ0:

R\left(i\right)=\exp\left(-\beta\cdot\left(t_{i}-t_{i-1}\right)\right)\cdot\left(1+R\left(i-1\right)\right)   (10)

And:

LL=t_{N}-\lambda_{0}\cdot T-\sum_{i=1}^{N}\left[\frac{\alpha}{\beta}\left(1-\exp\left(-\beta\cdot\left(t_{N}-t_{i}\right)\right)\right)\right]
+\sum_{i=1}^{N}\ln\left[\lambda_{0}+\alpha\cdot R\left(i\right)\right]   (11)

2.3 Simulation and learning

Using tick we simulate 100 paths for different end times (100, 1000, 10000), both for the builtin ExpKernel and for a user-defined TimeFunction that mirrors an ExpKernel:

  • Initialize {λ0, α, β}

    Define the kernel function f(α, β, t)

    Define the support s

    Build the TimeFunction with:

    • n + 1 steps for the time interval [0, s]

      n + 1 evaluations of f at these points

      an interpolation mode (e.g. TimeFunction.InterConstRight)

    Build a kernel with HawkesKernelTimeFunc

    Define the builtin HawkesKernelExp with parameters {α/β, β}

    Define the number of simulations N_S

    Define the end time T

    Build the SimuHawkes object with λ0, a kernel and T

    Build the SimuHawkesMulti object with SimuHawkes and N_S

    Simulate SimuHawkesMulti

Algorithm 1: Definitions of Exponential Kernels (builtin and custom) and simulation
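As an illustration, the sketch below builds both kernels and runs the simulations with tick. The parameter values are the ones used later (λ0 = 1.20, α = 0.60, β = 0.80); the support, grid size and seed are arbitrary illustrative choices, and the exact class and argument names should be checked against the installed tick version.

```python
import numpy as np
from tick.base import TimeFunction
from tick.hawkes import (SimuHawkes, SimuHawkesMulti,
                         HawkesKernelExp, HawkesKernelTimeFunc)

lambda0, alpha, beta = 1.20, 0.60, 0.80   # baseline, kernel scale, decay
support, n_points = 10.0, 1000            # kernel support and grid size (illustrative)

# Custom kernel: sample alpha * exp(-beta * t) on a grid and wrap it in a TimeFunction
t_grid = np.linspace(0.0, support, n_points + 1)
y_grid = alpha * np.exp(-beta * t_grid)
tf = TimeFunction((t_grid, y_grid), inter_mode=TimeFunction.InterConstRight)
kernel_custom = HawkesKernelTimeFunc(tf)

# Builtin kernel: HawkesKernelExp takes the integrated intensity alpha/beta and the decay beta
kernel_builtin = HawkesKernelExp(alpha / beta, beta)

def simulate(kernel, end_time, n_simulations=100, seed=1039):
    hawkes = SimuHawkes(kernels=[[kernel]], baseline=[lambda0],
                        end_time=end_time, verbose=False, seed=seed)
    multi = SimuHawkesMulti(hawkes, n_simulations=n_simulations)
    multi.simulate()
    return multi.timestamps   # list (simulations) of lists (nodes) of arrays

paths_builtin = simulate(kernel_builtin, end_time=1000)
paths_custom = simulate(kernel_custom, end_time=1000)
```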

We use minimize from [scipy] with method 'SLSQP'; as pointed out in [Kisel 2017], it can handle both bounds (ensuring positivity of the parameters) and constraints (ensuring the stationarity condition). The time differences should be calculated once, at the start of the optimization process, and passed to the likelihood function instead of being recalculated inside it.

  • def ll(θ, ts, Δts, δts):

    • {λ0, α, β} = θ (parameters are passed as one argument for the optimization)

      ts is one of the SimuHawkesMulti.timestamps

      Δts is the series ts_i − ts_{i−1}

      δts is the series ts_N − ts_{i−1}

      Define R as an array of zeros with the same size as ts

      Keep its first element R_1 = 0

      Calculate R_i = exp(−β · Δts_{i−1}) · (1 + R_{i−1}) recursively for i > 1

      Return:

      −[ ts_N · (1 − λ0) − (α/β) · Σ_{δts} (1 − exp(−β · δts)) + Σ_i log(λ0 + α · R_i) ]

    ----------------------------------------------------------------------------

    Minimize ll(θ, ts, Δts, δts) with:

    • bounds on λ0, α, β (all must be positive) and

      the constraint α < β

Algorithm 2: Negative Log-Likelihood and its minimization with scipy's minimize
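A minimal Python sketch of this procedure follows; the function and variable names (neg_log_lik, fit_hawkes, etc.) are illustrative rather than taken from the repository.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(theta, ts, dts, delta_ts):
    """Negative log-likelihood of a 1-D exponential Hawkes process.

    ts       : array of event times
    dts      : ts[i] - ts[i-1] (length len(ts) - 1)
    delta_ts : ts[-1] - ts[i] for every event i
    """
    lambda0, alpha, beta = theta
    # Ogata's recursion: R[i] = exp(-beta * (t_i - t_{i-1})) * (1 + R[i-1]), with R[0] = 0
    r = np.zeros(len(ts))
    for i in range(1, len(ts)):
        r[i] = np.exp(-beta * dts[i - 1]) * (1.0 + r[i - 1])
    ll = (ts[-1] * (1.0 - lambda0)
          - (alpha / beta) * np.sum(1.0 - np.exp(-beta * delta_ts))
          + np.sum(np.log(lambda0 + alpha * r)))
    return -ll

def fit_hawkes(ts, theta0=(1.0, 0.5, 1.0)):
    dts = np.diff(ts)
    delta_ts = ts[-1] - ts
    return minimize(neg_log_lik, x0=np.array(theta0),
                    args=(ts, dts, delta_ts),
                    method="SLSQP",
                    bounds=[(1e-6, None)] * 3,                       # positivity
                    constraints=[{"type": "ineq",
                                  "fun": lambda th: th[2] - th[1]}])  # stationarity: beta - alpha >= 0

# Example with one simulated path (node 0 of the first simulation):
# res = fit_hawkes(paths_builtin[0][0]); print(res.x)
```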

2.4 Kernels with latency

For P = 1 and λ0(t) = λ0, with latency τ:

\lambda\left(t\right)=\lambda_{0}+\sum_{t_{i}<t-\tau}\left[\alpha\cdot\exp\left(-\beta\cdot\left(t-\tau-t_{i}\right)\right)\right]   (12)

We can use the same (exponential) function and shift it to the right by the latency:

  • Initialize {λ0, α, β}

    Define the kernel function f(α, β, t)

    Define the support s

    Build the TimeFunction with:

    • n + 1 steps for the time interval [0, s]

      n + 1 evaluations of f at these points

      an interpolation mode (e.g. TimeFunction.InterConstRight)

      If latency τ > 0:

      • shift [0, s] and f([0, s]) to the right by τ

        add zeros on the interval [0, τ]

    Build a kernel with HawkesKernelTimeFunc

    Define the number of simulations N_S

    Define the end time T

    Build the SimuHawkes object with λ0, the kernel and T

    Build the SimuHawkesMulti object with SimuHawkes and N_S

    Simulate SimuHawkesMulti

Algorithm 3: Exponential Kernel with latency
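The shift itself is a small modification of the custom kernel sketched above; a sketch, assuming the same tick classes as before (the function name and grid choices are illustrative):

```python
import numpy as np
from tick.base import TimeFunction
from tick.hawkes import HawkesKernelTimeFunc

def latency_kernel(alpha, beta, tau, support=10.0, n_points=2000):
    """Kernel alpha * exp(-beta * (t - tau)) for t >= tau, and 0 on [0, tau)."""
    t_grid = np.linspace(0.0, tau + support, n_points + 1)
    y_grid = np.where(t_grid >= tau, alpha * np.exp(-beta * (t_grid - tau)), 0.0)
    tf = TimeFunction((t_grid, y_grid), inter_mode=TimeFunction.InterConstRight)
    return HawkesKernelTimeFunc(tf)

# kernel = latency_kernel(0.60, 0.80, tau=2.0)  # then simulate as in the previous sketch
```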

The new likelihoods are defined (for P = 1 and λ0(t) = λ0) as:

R_{\tau}\left(i\right)=R_{\tau}\left(i-1\right)\cdot\exp\left(-\beta\cdot\left(t_{i}-t_{i-1}\right)\right)   (13)
+\begin{cases}\sum_{t_{k}}\exp\left(-\beta\cdot\left(t_{i}-\tau-t_{k}\right)\right)&t_{i-1}-\tau\leq t_{k}<t_{i}-\tau\\0&\text{otherwise}\end{cases}   (14)

With R_τ(1) = 0, and:

LL_{\tau}=t_{N}-\lambda_{0}\cdot t_{N}-\sum_{t_{i}<t_{N}-\tau}\left[\frac{\alpha}{\beta}\left(1-\exp\left(-\beta\cdot\left(t_{N}-\tau-t_{i}\right)\right)\right)\right]
+\sum_{i=1}^{N}\ln\left[\lambda_{0}+\alpha\cdot R_{\tau}\left(i\right)\right]   (15)

Even with latency τ = 0, for each t_i we now pass the array of events in [t_{i−1} − τ, t_i − τ), which for τ = 0 contains only t_{i−1}. But this is enough to slow down the optimization, as observed while testing that the algorithm returns the same results as the previous one for latency zero, and as seen in Table 1.

  • def lllat(θ, t, Δt, δt, τt):

    • {λ0, α, β} = θ (parameters are passed as one argument for the optimization)

      t is one of the SimuHawkesMulti.timestamps

      Δt is the series t_i − t_{i−1} (excluding t_1)

      δt is the series t_N − τ − t_{i−1}

      τt is the series of arrays t_i − τ − t_k for all t_k such that t_{i−1} − τ ≤ t_k < t_i − τ

      Define R as an array of zeros with the same size as t

      Keep its first element R_1 = 0

      Calculate R_i = exp(−β · Δt_{i−1}) · R_{i−1} + Σ_{τt_i} exp(−β · τt_i) recursively for i > 1

      Return:

      −[ t_N · (1 − λ0) − (α/β) · Σ_{δt} (1 − exp(−β · δt)) + Σ_i log(λ0 + α · R_i) ]

    ----------------------------------------------------------------------------

    Minimize lllat(θ, t, Δt, δt, τt) with:

    • bounds on λ0, α, β (all must be positive) and

      the constraint α < β

Algorithm 4: Log-likelihood with latency
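A sketch of the corresponding Python function is below; the precomputation of the per-event arrays follows the definitions above, and the names are again illustrative (a [Numba]-compiled loop would follow the same structure).

```python
import numpy as np
from scipy.optimize import minimize

def precompute_latency_arrays(ts, tau):
    """For each event i > 0, collect t_i - tau - t_k for t_k in [t_{i-1} - tau, t_i - tau)."""
    dts = np.diff(ts)
    delta_ts = ts[-1] - tau - ts
    tau_ts = []
    for i in range(1, len(ts)):
        lo, hi = ts[i - 1] - tau, ts[i] - tau
        mask = (ts >= lo) & (ts < hi)
        tau_ts.append(ts[i] - tau - ts[mask])
    return dts, delta_ts, tau_ts

def neg_log_lik_latency(theta, ts, dts, delta_ts, tau_ts):
    lambda0, alpha, beta = theta
    r = np.zeros(len(ts))
    for i in range(1, len(ts)):
        # decay the previous state and add the newly "visible" (latency-shifted) events
        r[i] = np.exp(-beta * dts[i - 1]) * r[i - 1] + np.sum(np.exp(-beta * tau_ts[i - 1]))
    # only events with t_i < t_N - tau contribute to the compensator term
    pos = delta_ts > 0
    ll = (ts[-1] * (1.0 - lambda0)
          - (alpha / beta) * np.sum(1.0 - np.exp(-beta * delta_ts[pos]))
          + np.sum(np.log(lambda0 + alpha * r)))
    return -ll

def fit_hawkes_latency(ts, tau, theta0=(1.0, 0.5, 1.0)):
    dts, delta_ts, tau_ts = precompute_latency_arrays(ts, tau)
    return minimize(neg_log_lik_latency, x0=np.array(theta0),
                    args=(ts, dts, delta_ts, tau_ts), method="SLSQP",
                    bounds=[(1e-6, None)] * 3,
                    constraints=[{"type": "ineq",
                                  "fun": lambda th: th[2] - th[1]}])
```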

We obtain the following results for 100 paths (parameters λ0 = 1.20, α = 0.60 and β = 0.80), with the total running times in seconds (on a MacBook Pro 16-inch 2019, 2.4 GHz 8-core Intel Core i9, 64 GB 2667 MHz DDR4 RAM):

End Time   Kernel    Algorithm   runtime (s)   λ0     α      β
100        Builtin   ll          0.535         1.40   0.59   0.87
100        Builtin   lllat       1.49          1.40   0.59   0.87
100        Custom    ll          0.521         1.41   0.63   0.88
100        Custom    lllat       1.6           1.41   0.63   0.88
1000       Builtin   ll          0.992         1.22   0.60   0.80
1000       Builtin   lllat       11.3          1.22   0.60   0.80
1000       Custom    ll          1.13          1.23   0.63   0.81
1000       Custom    lllat       12.8          1.23   0.63   0.81
10000      Builtin   ll          6.02          1.20   0.60   0.80
10000      Builtin   lllat       143           1.20   0.60   0.80
10000      Custom    ll          7.29          1.20   0.62   0.80
10000      Custom    lllat       181           1.20   0.62   0.80
Table 1: Learning results and times

With latency τ = 2, we find that we can still recover the parameters of the kernel quite well (still 100 paths):

End Time   runtime (s)   λ0     α      β
100        1.76          1.41   0.59   0.81
1000       15.6          1.26   0.62   0.80
10000      223           1.21   0.61   0.79
Table 2: Learning results and times for latency τ = 2

3 Multidimensional Hawkes

3.1 No latency

For P = 1 and λ0(t) = λ0:

\lambda^{m}\left(t^{m}\right)=\lambda_{0}^{m}+\sum_{n=1}^{M}\sum_{t_{i}^{n}<t^{m}}\left[\alpha^{m,n}\cdot\exp\left(-\beta^{m,n}\cdot\left(t^{m}-t_{i}^{n}\right)\right)\right]   (16)

With the recursion now:

R^{m,n}\left(i\right)=R^{m,n}\left(i-1\right)\cdot\exp\left(-\beta^{m,n}\cdot\left(t_{i}^{m}-t_{i-1}^{m}\right)\right)+\sum_{t_{i-1}^{m}\leq t_{k}^{n}<t_{i}^{m}}\exp\left(-\beta^{m,n}\cdot\left(t_{i}^{m}-t_{k}^{n}\right)\right)   (17)

And log-likelihood for each node m:

LL^{m}=t_{N}^{m}\cdot\left(1-\lambda_{0}^{m}\right)-\sum_{n=1}^{M}\sum_{t_{i}^{n}<t_{N}^{m}}\left[\frac{\alpha^{m,n}}{\beta^{m,n}}\left(1-\exp\left(-\beta^{m,n}\cdot\left(t_{N}^{m}-t_{i}^{n}\right)\right)\right)\right]
+\sum_{t_{i}^{n}<t_{N}^{m}}\ln\left[\lambda_{0}^{m}+\sum_{n=1}^{M}\alpha^{m,n}\cdot R^{m,n}\left(i\right)\right]   (18)

3.2 Latency

For P = 1 and λ0(t) = λ0:

\lambda^{m}\left(t^{m}\right)=\lambda_{0}^{m}+\sum_{n=1}^{M}\sum_{t_{i}^{n}<t^{m}-\tau}\left[\alpha^{m,n}\cdot\exp\left(-\beta^{m,n}\cdot\left(t^{m}-\tau-t_{i}^{n}\right)\right)\right]   (19)

With the recursion now:

R^{m,n}\left(i\right)=R^{m,n}\left(i-1\right)\cdot\exp\left(-\beta^{m,n}\cdot\left(t_{i}^{m}-t_{i-1}^{m}\right)\right)+\sum_{t_{i-1}^{m}-\tau\leq t_{k}^{n}<t_{i}^{m}-\tau}\exp\left(-\beta^{m,n}\cdot\left(t_{i}^{m}-\tau-t_{k}^{n}\right)\right)   (20)

And log-likelihood for each node m:

LL^{m}=t_{N}^{m}\cdot\left(1-\lambda_{0}^{m}\right)-\sum_{n=1}^{M}\sum_{t_{i}^{n}<t_{N}^{m}-\tau}\left[\frac{\alpha^{m,n}}{\beta^{m,n}}\left(1-\exp\left(-\beta^{m,n}\cdot\left(t_{N}^{m}-\tau-t_{i}^{n}\right)\right)\right)\right]
+\sum_{t_{i}^{n}<t_{N}^{m}-\tau}\ln\left[\lambda_{0}^{m}+\sum_{n=1}^{M}\alpha^{m,n}\cdot R^{m,n}\left(i\right)\right]   (21)

Now there is a choice of how to efficiently select the t_k^n in the recursion: either find the corresponding arrays (which could be empty) for each {t_{i−1}^m, t_i^m} pair with numpy.extract, or find the appropriate t_i^m for each t_k^n using numpy.searchsorted and build a dictionary; we chose the latter.

To optimize this further, all the arrays are padded so that all accesses are on numpy.ndarrays (more details in the code; see also [numpy]).
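A sketch of the searchsorted-based assignment follows (the names are illustrative): for each exciting series n it groups the shifted differences t_i^m − τ − t_k^n by the index i of the excited series m.

```python
import numpy as np

def latency_groups(ts_m, ts_n, tau):
    """Group t_i^m - tau - t_k^n by i, for t_k^n in [t_{i-1}^m - tau, t_i^m - tau).

    Returns a dict {i: array of shifted time differences}, with i >= 1.
    """
    # i is such that t_{i-1}^m <= t_k^n + tau < t_i^m,
    # i.e. the right-sided insertion position of t_k^n + tau in ts_m
    idx = np.searchsorted(ts_m, ts_n + tau, side="right")
    groups = {}
    for i in range(1, len(ts_m)):
        mask = idx == i
        if np.any(mask):
            groups[i] = ts_m[i] - tau - ts_n[mask]
    return groups
```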

  • def lllatm(θ, m, nts, tNm, Δt, δt, τt):

    • {λ0_1, …, λ0_M, α^{1,1}, …, α^{1,M}, …, α^{M,M}, β^{1,1}, …, β^{1,M}, …, β^{M,M}} = θ

      (parameters are passed as one argument for the optimization)

      m is the index of the time series in the timestamps object

      nts is the array of lengths of all the time series of the timestamps object

      tNm is the last timestamp of each of the time series of the timestamps object

      Δt is the collection of series t_i^m − t_{i−1}^m for all 1 ≤ m ≤ M, including t_1^m

      δt is the collection of collections of series t_N^m − τ − t_{i−1}^n, for all 1 ≤ m ≤ M and 1 ≤ n ≤ M

      τt is the collection of collections of series of arrays t_i^m − τ − t_k^n, for all t_k^n such that t_{i−1}^m − τ ≤ t_k^n < t_i^m − τ, for all 1 ≤ m ≤ M and 1 ≤ n ≤ M

      For each n:

      • Define R^{m,n} as an array of zeros with the same size as t^m

        Keep its first element R_1^{m,n} = 0

        Calculate R_i^{m,n} = exp(−β^{m,n} · Δt_i^m) · R_{i−1}^{m,n} + Σ_{τt_i} exp(−β^{m,n} · τt_i^{m,n}) recursively for i > 1

      Return:

      −[ t_N^m · (1 − λ0^m) − Σ_n Σ_{δt} (α^{m,n}/β^{m,n}) · (1 − exp(−β^{m,n} · δt^{m,n})) + Σ_i log(λ0^m + Σ_n α^{m,n} · R_i^{m,n}) ]

    ----------------------------------------------------------------------------

    Minimize lllatm(θ, m, nts, tNm, Δt, δt, τt) with:

    • bounds on λ0, α, β (all must be positive)

      no constraints for a single m

Algorithm 5: Log-likelihood with latency - multidimensional for one time series m

A slower optimization can be done for more than one time series in parallel, with the advantage of enforcing symmetries on the coefficients by re-defining the input θ with the same decays for blocks (e.g. for [Bacry et al 2016], instead of 16 different decays we can use 4 decays, as described later in the paper); the algorithm below shows the case where we optimize over all the time series, but it can be modified to run on bid-ask pairs of time series.

  • def lllatall(θ, nts, tNm, Δt, δt, τt):

    • {λ0_1, …, λ0_M, α^{1,1}, …, α^{1,M}, …, α^{M,M}, β^{1,1}, …, β^{1,M}, …, β^{M,M}} = θ

      (parameters are passed as one argument for the optimization)

      nts is the array of lengths of all the time series of the timestamps object

      tNm is the last timestamp of each of the time series of the timestamps object

      Δt is the collection of series t_i^m − t_{i−1}^m for all 1 ≤ m ≤ M, including t_1^m

      δt is the collection of collections of series t_N^m − τ − t_{i−1}^n, for all 1 ≤ m ≤ M and 1 ≤ n ≤ M

      τt is the collection of collections of series of arrays t_i^m − τ − t_k^n, for all t_k^n such that t_{i−1}^m − τ ≤ t_k^n < t_i^m − τ, for all 1 ≤ m ≤ M and 1 ≤ n ≤ M

      For each m:

      • For each n:

        • Define R^{m,n} as an array of zeros with the same size as t^m

          Keep its first element R_1^{m,n} = 0

          Calculate R_i^{m,n} = exp(−β^{m,n} · Δt_i^m) · R_{i−1}^{m,n} + Σ_{τt_i} exp(−β^{m,n} · τt_i^{m,n}) recursively for i > 1

        Compute:

        LL^m = t_N^m · (1 − λ0^m) − Σ_n Σ_{δt} (α^{m,n}/β^{m,n}) · (1 − exp(−β^{m,n} · δt^{m,n})) + Σ_i log(λ0^m + Σ_n α^{m,n} · R_i^{m,n})

      Return −Σ_m LL^m

    ----------------------------------------------------------------------------

    Minimize lllatall(θ, nts, tNm, Δt, δt, τt) with:

    • bounds on λ0, α, β (all must be positive)

      inequality constraints: stationarity condition

      equality constraints: symmetries on coefficients

Algorithm 6: Log-likelihood with latency - multidimensional for all time series
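One way to enforce the block symmetries mentioned above is to optimize over a reduced parameter vector and expand it into the full {λ0, α, β} set before evaluating the log-likelihood; a sketch for a two-dimensional (bid/ask) block with one shared decay follows (the packing convention and the wrapped neg_ll_full function are illustrative assumptions):

```python
import numpy as np

def expand_symmetric_2d(theta_reduced):
    """Expand a reduced parameter vector for a symmetric bid/ask pair.

    theta_reduced = (lambda0, alpha_diag, alpha_anti, beta_block):
      - the two baselines are equal,
      - alpha[0,0] = alpha[1,1] = alpha_diag, alpha[0,1] = alpha[1,0] = alpha_anti,
      - all four decays equal beta_block.
    Returns (baselines, alphas, betas) as numpy arrays.
    """
    lambda0, a_diag, a_anti, b_block = theta_reduced
    baselines = np.array([lambda0, lambda0])
    alphas = np.array([[a_diag, a_anti],
                       [a_anti, a_diag]])
    betas = np.full((2, 2), b_block)
    return baselines, alphas, betas

# The reduced negative log-likelihood then wraps the full one, e.g.:
# def neg_ll_reduced(theta_reduced, *data):
#     baselines, alphas, betas = expand_symmetric_2d(theta_reduced)
#     return neg_ll_full(baselines, alphas, betas, *data)
```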

3.3 Simulations and learning

With latency τ = 2 in the two-dimensional case we get good results (100 paths, method 'SLSQP'):

End Time    runtime (s)   λ0(1)   λ0(2)   α(1,1)   α(1,2)   α(2,1)   α(2,2)   β(1,1)   β(1,2)   β(2,1)   β(2,2)
Simulated   -             0.6     0.2     0.5      0.7      0.9      0.3      1.4      1.8      2.2      1.0
100         7.1           0.68    0.28    0.55     0.73     1.03     0.37     1.87     1.93     2.44     2.65
1000        67            0.63    0.21    0.52     0.75     0.98     0.31     1.44     1.78     2.18     1.05
10000       804           0.60    0.20    0.52     0.75     0.97     0.31     1.37     1.76     2.12     1.00
Table 3: Learning results and times for latency τ = 2

4 Results on real data

4.1 Estimation of Slowly Decreasing Hawkes Kernels: Application to High Frequency Order Book Dynamics

Here one must consider the latency of 250 μs (as reported in [Eurex 2017]):

Figure 1: Eurex Latency

We have data available for 20 days (Apr-2014) of events P and T, from 8h to 22h (14 hours on each day), as described at https://github.com/X-DataInitiative/tick-datasets/tree/master/hawkes/bund. Because some days have late starts, and because late-night liquidity is not any better, we consider events only from 10h to 18h, which represent more than 75% of all events (daily counts are available on GitHub).

To get a better idea of the time scales involved, we calculate the descriptive statistics of each time series:

                       P_u    P_d    T_a     T_b
n (events, mean)       4569   4558   11027   12811
Δt 10th pct (μs)       196    196    147     229
Δt 15th pct (μs)       241    242    317     471
Δt 25th pct (μs)       342    340    1890    11756
Δt 50th pct (ms)       14.1   13.0   159.7   183.2
Δt 75th pct (s)        2.5    2.4    1.9     1.5
Δt 90th pct (s)        16.9   17.1   7.5     6.3
Δt mean (s)            6.3    6.3    2.6     2.2
Table 4: Statistics for the time intervals within each series

We can see that between 10% and 15% of the events within each time series happen less than 250 μs after the previous event, and therefore should not be considered as being caused by that previous event. We can also see that the average time interval is considerably higher than the median, which means that the distribution has fatter tails than an exponential distribution.
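These statistics can be reproduced with a few lines of numpy; a sketch, assuming one array of event times (in seconds) per series:

```python
import numpy as np

def interval_stats(ts):
    """Descriptive statistics of the inter-event times of one series."""
    dt = np.diff(ts)
    percentiles = np.percentile(dt, [10, 15, 25, 50, 75, 90])
    return {"n": len(ts), "percentiles": percentiles, "mean": dt.mean()}

# e.g. stats = {name: interval_stats(ts) for name, ts in series.items()}
```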

We run optimizations for the sum of (Bid + Ask) log-likelihoods where the kernel intensities are the same within diagonals, α(Ta→Pu) = α(Tb→Pd) and α(Tb→Pu) = α(Ta→Pd), and the kernel decays are the same within blocks, β(Ta→Pu) = β(Tb→Pd) = β(Tb→Pu) = β(Ta→Pd); so the total LL is optimized separately as LL_opt = LL_opt(Pu+Pd) + LL_opt(Ta+Tb). We also ran exploratory optimizations with completely independent intensities and decays, and tested different optimizer algorithms, in order to narrow the bounds for the parameters. The code (including initial values and bounds) is on the GitHub page, together with the spreadsheet with the results of the optimization (using the Differential Evolution, Powell and L-BFGS-B methods from [scipy]; Powell was the best-performing method). The parameters for T→T were the most unstable (charts in the Appendix).

         P_u    P_d    T_a     T_b
λ0       4.0%   4.0%   14.3%   16.1%
R        25%    25%    38%     37%

         P_u→P_u   P_d→P_u   T_a→P_u   T_b→P_u
α        737       488       401       14.5
β        3113      3113      3125      3125
α/β      23.7%     15.7%     12.8%     0.46%

         P_u→T_a   P_d→T_a   T_a→T_a        T_b→T_a
α        23.6      316       3.8 (median)   0.27 (median)
β        1122      1122      7.8 (median)   7.8 (median)
α/β      2.1%      28.3%     48.0%          3.1%

Table 5: Learned parameters

We calculate the exogeneity ratio R between the number of exogenous events and the total number of events n for each event type i and day d as:

R_{i,d}=\frac{\left(\lambda_{0}\right)_{i,d}}{\left(n\right)_{i,d}/\left(8\cdot 3600\right)}   (22)

These ratios are much higher than those found in [Bacry et al 2016], which can be explained by the reassignment of events occurring within the latency window. Even so, we find it reasonable to expect that a good part of the trades are exogenous (i.e. caused by reasons other than the short-term dynamics of the order book). Most interestingly, events that change the mid-price (trades, cancels and replenishments) are (i) faster and (ii) more endogenous than trades that do not change the mid-price. Note how the first chart of Figure 2 is similar to the first chart of Figure 7 in [Bacry et al 2016] after approximately 250 μs.

Figure 2: Learned Exponential Kernels

It is not by chance that the decay of the T→P and P→P kernels is about 3000; we can plot the first chart again, changing the x-axis to units of latency:

Figure 3: P, T → P Kernels in units of latency

Fast participants are expected to act quickly on (i) sharp imbalances, by either canceling or consuming the remaining queue (Ta→Pu), and (ii) recent gaps, by filling them in (P→P), and to react to the reactions as quickly as before, so the influence of the first event decays fast. We expect the pattern of Figure 3 to hold in different markets and different years (e.g. BUND futures in 2012, when the latency was 1250 μs instead of 250 μs).

The main differences with [Bacry et al 2016] are on the anti-diagonals. For P→P we find that diagonals are stronger than anti-diagonals (even on a large-tick asset such as the BUND future); to investigate this further, a better breakdown of P into Cancels and Trades would help. For P→T our method is not able to learn negative intensities because of the log-likelihood, but none of the values reached the lower bound, and the average intensity was about 2%. Although a change in the mid-price will make limit orders with the same sign miss the fill and become diagonals in the P→P sub-matrix, market orders would not be affected; more importantly, the speed and intensity of the P→P kernels ensure that gaps close quickly. We think that the introduction of the latency is enough to simplify the model so that negative intensities are not necessary.

We have not tested a Sum of Exponentials model, but it might be interesting to fix the decays for the first exponential kernel at “known” values and see what we can find.

5 Conclusions

We showed (with formulas and code) how to simulate and estimate exponential kernels of Hawkes processes with latency, and interpreted the results of applying this method to a (limited) set of BUND futures data. We showed how some features of this model (the P→P kernels and parameter symmetries) might be quite universal across markets. This approach can also be extended to two related securities that trade on different markets, by using different latencies on the cross-market kernels without changing the model.

We would like to thank Mathieu Rosenbaum (CMAP) for the guidance and Sebastian Neusuess (Eurex) for the latency data.

References

  • [tick] Bacry, E., Bompaire, M., Gaïffas, S. and Poulsen, S., (2017), “tick: a Python library for statistical learning, with a particular emphasis on time-dependent modeling”, available at https://arxiv.org/abs/1707.03003
  • [numpy] Harris, C. R. et al (2020), "Array programming with NumPy", Nature, 585, 357-362
  • [scipy] Virtanen, P. et al (2020) “SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python”, Nature Methods, 17(3), 261-272
  • [Numba] Lam, S. K., Pitrou, A. and Seibert, S. (2015), "Numba: a LLVM-based Python JIT compiler", LLVM '15: Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, Article No. 7, pp. 1-6
  • [Eurex 2017] Eurex, “Insights into trading system dynamics”, presentation
  • [Abergel et al 2016] Abergel, F., Anane, M., Chakraborti, A., Jedidi, A. and Toke, I. M. (2016) “Limit Order Books”, Cambridge University Press, ISBN 978-1-107-16398-0
  • [Bacry et al 2015] Bacry, E., Mastromatteo, I. and Muzy, J.-F., (2015), "Hawkes processes in finance", Market Microstructure and Liquidity 1(1)
  • [Bacry et al 2016] Bacry, E., Jaisson, T. and Muzy, J. F., (2016), “Estimation of Slowly Decreasing Hawkes Kernels: Application to High Frequency Order Book Dynamics”, Quantitative Finance 16(8), 1179-1201
  • [Bacry et al 2019] Bacry, E., Rambaldi, M., Muzy, J. F. and Wu, P. (2019), "Queue-reactive Hawkes models for the order flow", Working Papers hal-02409073, HAL, available at https://arxiv.org/abs/1901.08938
  • [Euch et al 2016] Euch, O. E., Fukasawa, M. and Rosenbaum, M., (2018) “The microstructural foundations of leverage effect and rough volatility”, Finance and Stochastics 22, 241-280
  • [Huang et al 2015] Huang, W., Lehalle, C.A. and Rosenbaum, M. (2015) “Simulating and Analyzing Order Book Data: The Queue-Reactive Model”, Journal of the American Statistical Association, 110(509):107–122
  • [Kisel 2017] Kisel, R. (2017): “Quoting behavior of a market-maker under different exchange fee structures”, Master’s thesis
  • [Ogata et al 1981] Ogata, Y., (1981) “On Lewis’ simulation method for point processes”, IEEE Transactions on Information Theory 27(1), 23-31
  • [Toke 2011] Toke, I. M., (2011), “An Introduction to Hawkes Processes with Applications to Finance”, BNP Paribas Chair Meeting, available at http://lamp.ecp.fr/MAS/fiQuant/ioane_files/HawkesCourseSlides.pdf

Appendix A

The charts and tables for the daily estimates are below. The only real concern is the instability of the parameters of the T→T kernels, although the intensities α/β are stable. The upper bound for β(T→T) was hit on day 16.

Figure 4: Daily estimates of parameters
Day   α(Ta→Ta)   α(Tb→Ta)   β(T→T)
1 2.73 0.14 4.91
2 2.80 0.18 4.88
3 3.81 0.33 8.57
4 7.12 0.62 15.64
5 36.12 1.94 106.44
6 2.96 0.17 5.15
7 4.68 0.31 10.81
8 4.19 0.21 7.75
9 7.78 0.67 18.73
10 2.55 0.29 4.68
11 22.68 1.22 73.79
12 2.28 0.15 4.13
13 3.33 0.24 6.14
14 4.04 0.27 8.46
15 4.75 0.35 10.15
16 117.41 2.28 500.00
17 3.85 0.27 7.93
18 3.63 0.22 7.54
19 3.24 0.15 6.22
20 1.76 0.04 2.68
Table 6: Daily estimates of parameters for TTT\rightarrow T kernels