This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

The statistical physics of discovering exogenous and endogenous factors in a chain of events

Shinsuke Koyama skoyama@ism.ac.jp The Institute of Statistical Mathematics, Tokyo 190-8562, Japan    Shigeru Shinomoto shinomoto.shigeru.6e@kyoto-u.ac.jp Department of Physics, Kyoto University, Kyoto 606-8502, Japan Brain Information Communication Research Laboratory Group, ATR Institute International, Kyoto 619-0288, Japan
Abstract

Event occurrence is not only subject to the environmental changes, but is also facilitated by the events that have occurred in a system. Here, we develop a method for estimating such extrinsic and intrinsic factors from a single series of event-occurrence times. The analysis is performed using a model that combines the inhomogeneous Poisson process and the Hawkes process, which represent exogenous fluctuations and endogenous chain-reaction mechanisms, respectively. The model is fit to a given dataset by minimizing the free energy, for which statistical physics and a path-integral method are utilized. Because the process of event occurrence is stochastic, parameter estimation is inevitably accompanied by errors, and it can ultimately occur that exogenous and endogenous factors cannot be captured even with the best estimator. We obtained four regimes categorized according to whether respective factors are detected. By applying the analytical method to real time series of debate in a social-networking service, we have observed that the estimated exogenous and endogenous factors are close to the first comments and the follow-up comments, respectively. This method is general and applicable to a variety of data, and we have provided an application program, by which anyone can analyze any series of event times.

I Introduction

It is widely seen that events occur one after another in a chain by self-excitation mechanisms, as in the communication of diseases Hethcote2000 ; Keeling2011 ; Vespignani2012 ; Brockmann2013 ; Pastorsatorras2015 , crimes and conflict Goffman1964 ; Kitsak2010 ; Mohler2011 ; Lewis2012 , seismic dynamics ogata1988 ; helmstetter2003 , scientometrics michael2012 , tweets CraneSornette2008 ; Lerman2010 ; Romero2011 , financial events AtSahalia2010 ; Hardiman2013 ; bacry2015 ; hawkes2018 , and neuronal firing Pernice2011 ; ReynaudBouret2014 ; Kass2018 ; GerhardDegerTruccolo2017 . Generally, event occurrence is subject not only to such intrinsic self-excitation mechanisms, but also environmental changes that take place independently of events that have occurred in a system. For instance, contagious diseases may spread from one individual to another, but the occurrence rate may also be subject to environmental factors such as temperature change Lipp2002 . When considering how to facilitate or attenuate occurrence activity, we need to know the potential contributions of endogenous and exogenous factors Menezes2004 ; lazer2018science ; Omi2017 . When the government reviews possible intervention in the economy, it must be able to project the degree of the uncontrollable chain reaction causing economic fluctuations such as falling stock prices, and hopefully to calculate the possible impact of external intervention.

It has been suggested that the temporal profile of changes in the tweeting rate may provide information as to whether such changes have been caused endogenously or exogenously CraneSornette2008 , but opinions have been divided over the issue as for whether gross activity contains enough information. Several works have addressed the estimation problem by analyzing precise time series of event occurrence using machine-learning algorithms Lerman2010 ; Romero2011 ; zaman2014 ; cheng2014 ; Dow2013 ; petrovic2011rt ; Aoki2016 ; Cattuto2007 ; Rybski2009 . Recently, we have developed an analytical method based on a generalized linear model (GLM) equipped with a self-exciting mechanism and analyzed a real time series of tweets to determine the contributions of endogenous and exogenous factors Fujita2018 . One problem with the GLM is its strong nonlinearity, such that previous events multiply each other and extrinsic and intrinsic factors are mixed up as a result.

In the present study, we consider analyzing a series of occurrence times via the “inhomogeneous Hawkes process,” a linear superposition of the inhomogeneous Poisson and Hawkes processes hawkes1971 , which represent exogenous fluctuations and endogenous chain-reaction mechanisms, respectively. We devise the “Hawkes decoder,” a method of fitting the inhomogeneous Hawkes process to a given dataset. We also develop a path-integral method for computing free energy analytically. Given an event-generation process, the path-integral method can determine the parameters of the model representing the degree of an internal chain reaction and the impact of environmental changes.

This estimation is inevitably accompanied by errors, because the event-generation process itself is stochastic. The estimation errors can be assessed by applying the decoder to synthetic data generated by simulating the inhomogeneous Hawkes process and comparing the estimated and original parameters. It may occur that exogenous fluctuation and/or endogenous self-excitation are not captured, even by a Hawkes decoder that can represent the original process. Using a path-integral method, we construct phase diagrams in which the results are categorized into four qualitatively different regimes according to whether or not their respective factors are detected. Then, we apply the Hawkes decoder to a time series of debate in a social-networking service (SNS) to estimate the relative contributions of exogenous and endogenous factors. We also examine the model’s capability of predicting the amount of follow-up comments induced by first comments in the SNS. We also provide an application program, by which anyone can analyze any sequence of event times.

II Methods

II.1 Generating a series of events with the inhomogeneous Hawkes process

We define the inhomogeneous Hawkes process in terms of the intensity or the occurrence rate λ(t)\lambda(t) given by

λ(t)=ν(t)+αti<tϕ(tti),\lambda(t)=\nu(t)+\alpha\sum_{t_{i}<t}\phi(t-t_{i}), (1)

where the first term ν(t)\nu(t) on the right-hand-side represents the inhomogeneous Poisson process such that events are derived from a time-varying occurrence rate given exogenously, whereas the second term represents a self-excitation effect in terms of the Hawkes process such that the occurrence of each event modifies the probability of future events endogenously (Fig. 1 (a)). Here, α\alpha is the reproduction ratio representing the average number of events induced by a single event, tit_{i} is the occurrence time of a past (iith) event, and ϕ(t)\phi(t) is a kernel representing the time course of the self-excitation, satisfying the causality ϕ(t)=0\phi(t)=0 for t<0t<0 and the normalization 0ϕ(t)𝑑t=1\int_{0}^{\infty}\phi(t)dt=1.

Refer to caption
Figure 1: (a) Schematic representation of the inhomogeneous Hawkes process. Exogenous or extrinsic stimulus is represented by the time-dependent rate ν(t)\nu(t) of the inhomogeneous Poisson process, while endogenous or intrinsic self-excitation is represented by the (homogeneous) Hawkes process such that the occurrence of each event facilitates the occurrence of events by adding the occurrence rate αϕ(t)\alpha\phi(t). (b) A basic procedure for executing the inhomogeneous Hawkes process. Divide the time axis into small time intervals of width δt\delta t and generate an event if a uniform random number x[0,1]x\in[0,1] is smaller than λ(t)δt\lambda(t)\delta t, and no event otherwise.

A basic procedure for putting the rate-model process into practice is to divide the time axis into small time intervals of δt\delta t, and then to repeat the Bernoulli process in which an event is either present (with a probability of λ(t)δt1\lambda(t)\delta t\ll 1) or absent (with a probability of 1λ(t)δt1-\lambda(t)\delta t) at each time bin (Fig. 1(b)). The probability of having no event for the first NN intervals and finally having an event at the N+1N+1st interval is given by i=1N(1λ(iδt)δt)λ(nδt)δt\prod_{i=1}^{N}(1-\lambda(i\delta t)\delta t)\lambda(n\delta t)\delta t. By taking the limit of the infinitesimal interval, the probability density of having an inter-event interval tnδtt\equiv n\delta t is given by

limδt0λ(t)i=1t/δt(1λ(iδt)δt)=λ(t)exp(0tλ(t)𝑑t).\lim_{\delta t\to 0}\lambda(t)\prod_{i=1}^{t/\delta t}\left(1-\lambda(i\delta t)\delta t\right)=\lambda(t)\exp{\left(-\int_{0}^{t}\lambda(t^{\prime})dt^{\prime}\right)}. (2)

Accordingly, the probability that events occur at times {ti}{t1,,tn}\{t_{i}\}\equiv\{t_{1},\ldots,t_{n}\} in a period [0,T][0,T] is obtained as Snyder1991 ; Daley2002

p({ti})=(i=1nλ(ti))exp(0Tλ(t)𝑑t).p(\{t_{i}\})=\left(\prod_{i=1}^{n}\lambda(t_{i})\right)\exp\left(-\int_{0}^{T}\lambda(t)dt\right). (3)

II.2 The Hawkes decoder: a method of estimating endogenous and exogenous factors

We wish to estimate the exogenous stimulus ν(t)\nu(t) and the degree of self-excitation α\alpha that have contributed to generating a given series of events that occurred at times {ti}\{t_{i}\} in a period of [0,T][0,T]. The estimation can be done by the Empirical Bayes method, in which the parameter and hyperparameter are determined by maximizing the marginal likelihood as follows.

Here, we assume that the external input ν(t)\nu(t) is modulated slowly in time. This may be represented by a prior distribution that penalizes the large gradient:

pγ({ν(t)})=1Z(γ)exp(12γ20T(dν(t)dt)2𝑑t),p_{\gamma}(\{\nu(t)\})=\frac{1}{Z(\gamma)}\exp\left(-\frac{1}{2\gamma^{2}}\int_{0}^{T}\left(\frac{d\nu(t)}{dt}\right)^{2}dt\right), (4)

where γ\gamma is a hyperparameter that specifies the smoothness or flatness of ν(t)\nu(t) and Z(γ)Z(\gamma) is the normalization constant

Z(γ)=exp(12γ20T(dν(t)dt)2𝑑t)D{ν(t)},Z(\gamma)=\int\exp\left(-\frac{1}{2\gamma^{2}}\int_{0}^{T}\left(\frac{d\nu(t)}{dt}\right)^{2}dt\right)D\{\nu(t)\}, (5)

where D{ν(t)}D\{\nu(t)\} represents functional integration over all possible paths of external inputs ν(t)\nu(t). Given a time series of events {ti}\{t_{i}\}, the posterior distribution of {ν(t)}\{\nu(t)\} is obtained through Bayes’ theorem as

pα,γ({ν(t)}|{ti})\displaystyle p_{\alpha,\gamma}(\{\nu(t)\}|\{t_{i}\}) =\displaystyle= pα({ti}|{ν(t)})pγ({ν(t)})pα,γ({ti}),\displaystyle\frac{p_{\alpha}(\{t_{i}\}|\{\nu(t)\})p_{\gamma}(\{\nu(t)\})}{p_{\alpha,\gamma}(\{t_{i}\})}, (6)

where pα({ti}|{ν(t)})p_{\alpha}(\{t_{i}\}|\{\nu(t)\}) is the conditional probability that events occur at times {ti}\{t_{i}\}, which is given by Eq. (3). Here we have denoted the self-exciting parameter α\alpha and the external input ν(t)\nu(t) explicitly as conditions, because they are used to specify the underlying rate λ(t)\lambda(t) in Eq. (1).

The denominator pα,γ({ti})p_{\alpha,\gamma}(\{t_{i}\}) is the marginal likelihood obtained by the marginalization integral:

pα,γ({ti})=pα({ti}|{ν(t)})pγ({ν(t)})D{ν(t)}.p_{\alpha,\gamma}(\{t_{i}\})=\int p_{\alpha}(\{t_{i}\}|\{\nu(t)\})p_{\gamma}(\{\nu(t)\})D\{\nu(t)\}. (7)

The parameter α\alpha and hyperparameter γ\gamma are determined by maximizing the marginal likelihood:

{α^,γ^}=argmaxα,γpα,γ({ti}).\{\hat{\alpha},\hat{\gamma}\}=\arg\max_{\alpha,\gamma}p_{\alpha,\gamma}(\{t_{i}\}). (8)

With the selected parameter and hyperparameter, the likely external input is obtained as the maximum a posteriori (MAP) estimate, such that the posterior distribution (6) is maximized:

{ν^(t)}=argmax{ν(t)}pα^,γ^({ν(t)}|{ti}).\{\hat{\nu}(t)\}=\arg\max_{\{\nu(t)\}}p_{\hat{\alpha},\hat{\gamma}}(\{\nu(t)\}|\{t_{i}\}). (9)

With the estimated α^\hat{\alpha}, {ν^(t)}\{\hat{\nu}(t)\}, and the given series of occurrence times {ti}\{t_{i}\}, we can estimate the underlying rate as

λ^(t)=ν^(t)+α^ti<tϕ(tti).\hat{\lambda}(t)=\hat{\nu}(t)+\hat{\alpha}\sum_{t_{i}<t}\phi(t-t_{i}). (10)

The estimated exogenous rate {ν^(t)}\{\hat{\nu}(t)\} depends largely upon the selected hyperparameter γ^\hat{\gamma}. With a large γ^\hat{\gamma}, the estimated rate fluctuates largely in time. If the exogenous fluctuation is small, it may occur that the estimated γ^\hat{\gamma} vanishes, in which case the estimated rate ν^(t)\hat{\nu}(t) becomes constant. Another parameter α^\hat{\alpha} represents the estimated level of self-excitation, and it might also occur that the reproduction ratio α^\hat{\alpha} vanishes. Thus, there are four regimes categorized according to the combination of whether γ^=0\hat{\gamma}=0 (homogeneous) or γ^0\hat{\gamma}\neq 0 (inhomogeneous) and whether α^=0\hat{\alpha}=0 (self-excitation undetected) or α^0\hat{\alpha}\neq 0 (self-exciting detected). The nicknames of the four regimes are given in Table 1. The method for selecting the parameter and hyperparameter {α^,γ^}\{\hat{\alpha},\hat{\gamma}\} and estimating a time-dependent exogenous stimulus ν^(t)\hat{\nu}(t) is explained in full detail in Appendix A. A ready-to-use version of the web application, the source code, and examplary data sets are available at our website: http://www.ton.scphys.kyoto-u.ac.jp/%7Eshino/hawkes

α^\hat{\alpha} γ^\hat{\gamma} interpretation
0 0   “Poisson”
0   finite   “Exo”
  finite 0   “Endo”
  finite   finite   “Exo+Endo”
Table 1: Four regimes categorized according to the selected parameter α\alpha and hyperparameter γ\gamma. “Poisson”: Neither endogenous self-excitation nor exogenous fluctuation is detected, {α^=0,γ^=0}\{\hat{\alpha}=0,\hat{\gamma}=0\}, and accordingly the process is considered to be close to the homogeneous Poisson process. “Exo”: Only temporal variation of the exogenous origin is detected, {α^=0,γ^0}\{\hat{\alpha}=0,\hat{\gamma}\neq 0\}, and the process is close to the inhomogeneous Poisson process. “Endo”: Only endogenous self-excitation is detected, {α^0,γ^=0}\{\hat{\alpha}\neq 0,\hat{\gamma}=0\}. “Exo+Endo”: Both exogenous and endogenous factors are detected, {α^0,γ^0}\{\hat{\alpha}\neq 0,\hat{\gamma}\neq 0\}.

II.3 path-integral estimation of the free energy

In the previous section, we showed the procedure for fitting the inhomogeneous Hawkes process to a given series of occurrence times. Nevertheless, the parameter and hyperparameter {α^,γ^}\{\hat{\alpha},\hat{\gamma}\} can be determined even without a series of events, if the event-generating process is given. In this section, we perform this estimation analytically using the path-integral method.

The marginalization in Eq. (7) can be represented as a path-integral:

pα,γ({ti})=1Z(γ)exp(S[ν(t)])D{ν(t)}.\displaystyle p_{\alpha,\gamma}(\{t_{i}\})=\frac{1}{Z(\gamma)}\int\exp(-S[\nu(t)])D\{\nu(t)\}. (11)

By representing the action functional S[ν(t)]=log(pα({ti}|{ν(t)})pγ({ν(t)}))logZ(γ)S[\nu(t)]=-\log(p_{\alpha}(\{t_{i}\}|\{\nu(t)\})p_{\gamma}(\{\nu(t)\}))-\log Z(\gamma) as an integral,

S[ν(t)]=0TL(ν˙,ν)𝑑t,S[\nu(t)]=\int_{0}^{T}L(\dot{\nu},\nu)dt, (12)

a classical orbit that satisfies the Euler-Lagrange equation

t(Lν˙)Lν=0\frac{\partial}{\partial t}\left(\frac{\partial L}{\partial\dot{\nu}}\right)-\frac{\partial L}{\partial\nu}=0 (13)

is the MAP solution {ν^(t)}\{\hat{\nu}(t)\}, given by Eq. (9).

The path-integral in Eq. (11) can be performed analytically by expanding the action functional up to the quadratic order in the deviation from the classical orbit:

S[ν(t)]S[ν^(t)]+12δ2Sν^(t)[δν(t)],S[\nu(t)]\approx S[\hat{\nu}(t)]+\frac{1}{2}\delta^{2}S_{\hat{\nu}(t)}[\delta\nu(t)], (14)

where δ2Sν^(t)[δν(t)]\delta^{2}S_{\hat{\nu}(t)}[\delta\nu(t)] denotes an action integral of the deviation around the classical orbit ν^(t)\hat{\nu}(t), defined as

0T(2Lν˙2|ν^(t)δν˙2+22Lν˙ν|ν^(t)δν˙δν+2Lν2|ν^(t)δν2)𝑑t.\displaystyle\int_{0}^{T}\left(\left.\frac{\partial^{2}L}{\partial\dot{\nu}^{2}}\right|_{\hat{\nu}(t)}\delta\dot{\nu}^{2}+\left.2\frac{\partial^{2}L}{\partial\dot{\nu}\partial\nu}\right|_{\hat{\nu}(t)}\delta\dot{\nu}\delta\nu+\left.\frac{\partial^{2}L}{\partial\nu^{2}}\right|_{\hat{\nu}(t)}\delta\nu^{2}\right)dt.

The marginal likelihood is obtained as

pα,γ({ti})ReS[ν^(t)],p_{\alpha,\gamma}(\{t_{i}\})\approx Re^{-S[\hat{\nu}(t)]}, (15)

where

R=1Z(γ)e12δ2Sν^(t)[δν(t)]D{δν(t)}R=\frac{1}{Z(\gamma)}\int e^{-\frac{1}{2}\delta^{2}S_{\hat{\nu}(t)}[\delta\nu(t)]}D\{\delta\nu(t)\} (16)

represents a “quantum effect.” The “free energy” is the negative logarithm of the marginal likelihood:

F(α,γ)\displaystyle F(\alpha,\gamma) =\displaystyle= 1Tlogpα,γ({ti})\displaystyle-\frac{1}{T}\log p_{\alpha,\gamma}(\{t_{i}\}) (17)
=\displaystyle= 1TlogR+1TS[ν^(t)].\displaystyle-\frac{1}{T}\log R+\frac{1}{T}S[\hat{\nu}(t)].

By decomposing the original exogenous input into a mean and fluctuation and expanding the action integral up to the quadratic order of the fluctuation, the free energy for a long series of events can be obtained analytically in terms of the mean input and the autocorrelation of fluctuation. The derivation of the free energy and the obtained formula are summarized in Appendix B.

III Results

III.1 Analyzing synthetic data

Event-generation model

We shall examine the Hawkes decoder by applying it to a series of events generated by the inhomogeneous Hawkes process of a given reproduction ratio α\alpha^{*} and exogenous input {ν(t)}\{\nu^{*}(t)\}:

λ(t)=ν(t)+αti<tϕ(tti),\lambda^{*}(t)=\nu^{*}(t)+\alpha^{*}\sum_{t_{i}<t}\phi^{*}(t-t_{i}), (18)

where the self-excitation kernel is given as ϕ(t)=1/τsexp(t/τs)\phi^{*}(t)=1/\tau^{*}_{s}\exp(-t/\tau^{*}_{s}) for t>0t>0 and ϕ(t)=0\phi^{*}(t)=0, otherwise. In particular, we examine the case where ν(t)\nu^{*}(t) is fluctuating according to the Ornstein-Uhlenbeck process (OUP) (Fig. 2):

dν(t)dt=ν(t)μτe+σ2τeξ(t),\frac{d\nu^{*}(t)}{dt}=-\frac{\nu^{*}(t)-\mu^{*}}{\tau^{*}_{e}}+\sigma^{*}\sqrt{\frac{2}{\tau_{e}^{*}}}\xi(t), (19)

where ξ(t)\xi(t) is a Gaussian white noise satisfying ξ(t)ξ(t+u)=δ(u)\langle\xi(t)\xi(t+u)\rangle=\delta(u). Accordingly, the OUP is characterized by an autocorrelation function with a time constant τe\tau_{e}^{*}:

(ν(t)μ)(ν(t+u)μ)=σ2e|u|/τe.\langle(\nu^{*}(t)-\mu^{*})(\nu^{*}(t+u)-\mu^{*})\rangle={\sigma^{*}}^{2}e^{-|u|/\tau_{e}^{*}}. (20)

The inhomogeneous Poisson process with the fluctuating rate Eq. (19) is also called the doubly stochastic Poisson process. Thus, the current event-generation is a linear combination of the Hawkes process in Eq. (18) and the doubly stochastic Poisson process characterized by the autocorrelation Eq. (20). The important dimensionless parameters of the model are the strength of endogenous self-excitation α\alpha^{*} and the exogenous fluctuation σ2τe/μ{\sigma^{*}}^{2}\tau_{e}^{*}/\mu^{*}.

Refer to caption
Figure 2: Parameters of an inhomogeneous Hawkes process generating synthetic data. (a) The self-excitation kernel ϕ(t)=1/τsexp(t/τs)\phi^{*}(t)=1/\tau^{*}_{s}\exp(-t/\tau^{*}_{s}). (b) Exogenous input ν(t)\nu^{*}(t) generated by the Ornstein-Uhlenbeck process (OUP) of given parameters μ\mu^{*}, σ\sigma^{*}, and τe\tau^{*}_{e}.
Refer to caption
Figure 3: Phase diagrams of the “Hawkes decoder” and the “Bayesian rate estimator” for the doubly stochastic inhomogeneous Hawkes process characterized by an amplitude of the self-excitation α\alpha^{*} and exogenous fluctuation σ2τe/μ{\sigma^{*}}^{2}\tau_{e}^{*}/\mu^{*}. (a) The Hawkes decoder. The parameters α\alpha and γ\gamma are selected by minimizing the free energy F(α,γ)F(\alpha,\gamma). The “Poisson,” “Endo,” and “Exo+Endo” regimes are defined by {α^=0,γ^=0}\{\hat{\alpha}=0,\hat{\gamma}=0\}, {α^0,γ^=0}\{\hat{\alpha}\neq 0,\hat{\gamma}=0\}, and {α^0,γ^0}\{\hat{\alpha}\neq 0,\hat{\gamma}\neq 0\}, respectively. The phase boundaries are computed for the case of τeτs\tau_{e}^{*}\gg\tau^{*}_{s}. (b) The sample event series, the original rate λ(t)\lambda^{*}(t), and the rate λ^(t)\hat{\lambda}(t) estimated by the Hawkes decoder. (c) The Bayesian rate estimator. The parameter α\alpha is chosen to be 0 and the hyperparameter γ\gamma is selected by minimizing the free energy Fp(γ)=F(α=0,γ)F_{\rm p}(\gamma)=F(\alpha=0,\gamma). The “Exo” regime is defined by {α^=0,γ^0}\{\hat{\alpha}=0,\hat{\gamma}\neq 0\}. (d) The rate ν^(t)\hat{\nu}(t) was estimated by the Bayesian rate estimator for the same event series analyzed in (b).
Refer to caption
Figure 4: Phase diagrams of the Hawkes decoder assuming the kernels of various timescales. The timescale of the decoder’s kernel τs\tau_{s} is (a) τsτs\tau_{s}\gg\tau_{s}^{*}, (b) τs>τs\tau_{s}>\tau_{s}^{*}, (c) τs=τs\tau_{s}=\tau_{s}^{*}, (d) τs<τs\tau_{s}<\tau_{s}^{*}, (e) τsτs\tau_{s}\ll\tau_{s}^{*}.
Refer to caption
Figure 5: Phase diagrams of the Hawkes decoder with different timescales of the exogenous fluctuation τe\tau_{e}. The timescale of the exogenous fluctuation τe\tau_{e}^{*} is (a) τeτs\tau_{e}^{*}\gg\tau_{s}^{*}, (b) τe>τs\tau_{e}^{*}>\tau_{s}^{*}, (c) τe=τs\tau_{e}^{*}=\tau_{s}^{*}, (d) τe<τs\tau_{e}^{*}<\tau_{s}^{*}, (e) τeτs\tau_{e}^{*}\ll\tau_{s}^{*}.
Refer to caption
Figure 6: Transformation of the phase diagram of the Hawkes decoder, according to the change in the timescales of the decoder’s kernel, τs\tau_{s}, the original self-excitation kernel, τs\tau^{*}_{s}, and the exogenous fluctuation, τe\tau^{*}_{e}.

The Hawkes decoder with a correct self-excitation kernel

First, we analyze the synthetic data using the Hawkes decoder. We consider here the case in which the model uses the correct shape of the original excitation kernel, ϕ(t)=ϕ(t)\phi(t)=\phi^{*}(t). The strength of self-excitation α\alpha and the hyperparameter γ\gamma for estimating the external fluctuation are selected by minimizing the free energy, F(α,γ)F(\alpha,\gamma), Eq. (17).

Figure 3 (a) depicts a phase diagram of the Hawkes decoder in a plane spanned by the parameters of an event-generation model, the reproduction ratio of the endogenous self-excitation α\alpha^{*}, and the strength of exogenous fluctuation σ2τe/μ{\sigma^{*}}^{2}\tau_{e}^{*}/\mu^{*}, for the case of τeτs\tau_{e}^{*}\gg\tau^{*}_{s}. We obtained three regimes: {α^=0,γ^=0}\{\hat{\alpha}=0,\hat{\gamma}=0\} (“Poisson”), {α^0,γ^=0}\{\hat{\alpha}\neq 0,\hat{\gamma}=0\} (“Endo”), and {α^0,γ^0}\{\hat{\alpha}\neq 0,\hat{\gamma}\neq 0\} (“Exo+Endo”). In Fig. 3 (b), we show sample event series derived from the inhomogeneous Hawkes process and the rates λ^(t)\hat{\lambda}(t) estimated via the Empirical Bayes method.

The yy-axis (α=0\alpha^{*}=0) in Fig. 3 (a) represents the situation by which events are generated by the inhomogeneous Poisson process in which the rate ν(t)\nu^{*}(t) fluctuates with a standard deviation σ\sigma^{*} around the mean rate μ\mu^{*} in a timescale of τe\tau_{e}^{*}. Because the event-generation process is stochastic in nature, the profile of the underlying rate ν(t)\nu^{*}(t) cannot be captured precisely from a series of occurrence times, and it might even occur that the model abandons to capture a fluctuation if the amplitude of the original rate fluctuation is too small. This γ^=0\hat{\gamma}=0 occurs if estimating a fluctuating rate (with γ0\gamma\neq 0) is likely to produces a larger error than indicating a constant rate (with γ=0\gamma=0). We found that the model suggests a constant rate if the original rate fluctuation is small (Appendix C.1):

σ2τe/μ<12(1α).{\sigma^{*}}^{2}\tau_{e}^{*}/\mu^{*}<\frac{1}{2(1-\alpha^{*})}. (21)

We have considered minimizing the mean integrated squared error between the underlying rate and the histogram of a given series of event times and discovered that the optimal bin size may diverge when the underlying fluctuation is small Koyama2004 . For the doubly stochastic Poisson process Eq. (19), the condition in which the bin size diverges is identical to inequality (21) with α=0\alpha^{*}=0. This implies that the condition for the rate fluctuation being unknowable is independent of the estimation methods.

The Bayesian rate estimator

Now we analyze the identical dataset using the “Bayesian rate estimator,” which estimates the fluctuating rate ν(t)\nu(t) by fitting the inhomogeneous Poisson process to the data. In this case, we compute the marginal likelihood by setting α=0\alpha=0 in Eq. (7), and obtain the free energy as

Fp(γ)=F(α=0,γ).\displaystyle F_{\rm p}(\gamma)=F(\alpha=0,\gamma). (22)

In this case, the process of estimating {ν(t)}\{\nu(t)\} is identical to the rate estimation method that we have developed previously Koyama2007 .

Figure 3 (c) depicts a phase diagram of the Bayesian rate estimator for the same data examined with the Hawkes decoder in Fig. 3 (a). Here we obtain two different regimes categorized according to whether γ^=0\hat{\gamma}=0 (“Poisson”) or γ^0\hat{\gamma}\neq 0 (“Exo”).

The xx-axis of the diagram (σ2τe/μ=0{\sigma^{*}}^{2}\tau_{e}^{*}/\mu^{*}=0) in Fig. 3 (c) represents the data generated by the (homogeneous) Hawkes process. Even if the exogenous fluctuation is absent, an apparent occurrence of events may exhibit the large fluctuation due to the self-excitation mechanism. We found that the Bayesian rate estimator may suggest a fluctuating rate ν^(t)\hat{\nu}(t) or γ^0\hat{\gamma}\neq 0 if the original self-excitation is greater than the critical value (Appendix C.2):

α>αc=11/20.2929.\alpha^{*}>\alpha_{c}=1-1/\sqrt{2}\approx 0.2929. (23)

The critical reproduction ratio αc\alpha_{c} is identical to that obtained for the principled histogram method, such that the selected bin size diverges above this reproduction ratio αc\alpha_{c} Onaga2014 ; Onaga2016 .

As shown in Fig. 3 (c), the Bayesian rate estimator cannot capture the rate fluctuation if the endogenous self-excitation α\alpha^{*} or exogenous fluctuation σ2τe/μ{\sigma^{*}}^{2}\tau_{e}^{*}/\mu^{*} is small. In Fig. 3 (d), we also show the rates ν^(t)\hat{\nu}(t) estimated by the Bayesian rate estimator for the same series of events analyzed by the Hawkes decoder in Fig. 3 (b).

The Hawkes decoder with an incorrect self-excitation kernel

Even if event occurrence is facilitated by the self-excitation mechanism, the self-excitation information is generally not available. We consider the case in which the Hawkes decoder assumes a kernel that is different from that of the original process. Here in particular, we analyze the case in which the timescale τs\tau_{s} of the self-exciting kernel ϕ(t)=1/τsexp(t/τs)\phi(t)=1/\tau_{s}\exp(-t/\tau_{s}) is different from the timescale τs\tau^{*}_{s} of the original kernel ϕ(t)=1/τsexp(t/τs)\phi^{*}(t)=1/\tau^{*}_{s}\exp(-t/\tau^{*}_{s}), and see how the phase diagram is modified by choosing an incorrect self-excitation kernel (here we examine the case of τeτs\tau_{e}^{*}\gg\tau^{*}_{s}).

Figure 4 (c) represents the phase diagram obtained with a correct kernel, τs=τs\tau_{s}=\tau^{*}_{s}. If the timescale of the decoder’s kernel τs\tau_{s} is longer than the original, τs>τs\tau_{s}>\tau^{*}_{s}, the “Exo+Endo” regime reduces and the “Exo” regime emerges from the right-hand side of the phase diagram (Fig. 4 (b)). If the timescale of the decoder’s kernel is even longer, the “Exo” regime dominates, and the self-excitation is not detected. In the limit of τsτs\tau_{s}\gg\tau^{*}_{s}, its phase diagram is identical to that of the Bayesian rate estimator (Fig. 4 (a)).

In the case where the timescale of the decoder’s kernel is shorter than the original, τs<τs\tau_{s}<\tau^{*}_{s}, the Hawkes regime expands (Fig. 4 (d)) from the regime obtained with a correct kernel (Fig. 4 (c)). In this case, the reproduction ratio α\alpha is estimated to be smaller than the original (α^<α\hat{\alpha}<\alpha^{*}). In the limit of τsτs\tau_{s}\ll\tau^{*}_{s}, the estimated reproduction ratio becomes infinitesimal, and accordingly the “Endo” and “Exo+Endo” regimes turn into “Poisson” and “Exo” regimes, respectively (Appendix C.3). In this limit, the phase diagram is also identical to that of the Bayesian rate estimator (Fig. 4 (e)).

The case of slow self-excitation

So far we have examined the cases in which an environmental factor changes slowly compared to the self-excitation (τeτs\tau_{e}^{*}\gg\tau^{*}_{s}), and shown that the Hawkes decoder can estimate the contributions of exogenous and endogenous factors (Fig. 5 (a)). If the timescale of exogenous fluctuation τe\tau_{e}^{*} is comparable or even shorter than that of self-excitation τs\tau^{*}_{s}, however, it is difficult to separate the exogenous and endogenous contributions to the data.

If the timescale of the external fluctuation τe\tau_{e}^{*} is relatively close to that of the self-exciation τs\tau^{*}_{s}, the “Exo+Endo” regime decreases (Fig. 5 (b)). If the timescale of the extrinsic fluctuation is comparable to that of self-excitation τe=τs\tau_{e}^{*}=\tau^{*}_{s}, we may not discriminate between the exogenous and endogenous self-excitation components and obtain only the “Endo” regime (Fig. 5 (c)). Contrary, if the timescale of the external fluctuation τe\tau_{e}^{*} is shorter than that of the self-exciation τs\tau^{*}_{s}, the decoder may yield the “Exo” regime (Figs. 5 (d) and (e)).

Figure 6 depicts the full view of the transition of the phase diagram, consisting of a variety of situations.

III.2 Analyzing real-world data

Refer to caption
Figure 7: Analysis of real-world data. (a) The number of comments containing the keyword “coronavirus” submitted to the Reddit forum “r/worldnews” for a period between January 19th and February 19th, 2020. (b) Estimating the exogenous factor representing environmental changes from the entire event series. Submission activities for two time series of 1 week (between Jan 29 and Feb 5 and between Feb 7 and Feb 14). The blue and red lines indicate the rate of all submissions and the first comments followed by the original posts, respectively. The yellow shaded areas represent the 95% range of exogenous contribution estimated by the Hawkes decoder. (c) Estimating the endogenous self-excitation causing chains of events, given the environmental changes. The entire occurrence rate λ(t)\lambda(t) was obtained by simulating the inhomogeneous Hawkes process, by considering the occurrence rate of first comments as the exogenous input ν(t)\nu(t). The reproduction ratio α\alpha and the timescale of the self-excitation τs\tau_{s} are obtained by fitting the Hawkes decoder for the previous dataset. The yellow shaded areas represent the 95% range of estimated entire occurrence rate.

Time series of comments submitted to an SNS

Finally, we analyze real-world data, namely the time series of comments talking about a given subject on a particular Reddit forum, either in original posts or in comments upon them. Data were collected through the public API for the keyword “coronavirus” in the subreddit “r/worldnews” for a period between January 19th and February 19th, 2020. The dataset contains 223,545 comments in response to 5,341 submissions. Figure 7 (a) displays the changes in commenting activity over a month, indicating a strong burst of comments occurring at many times, presumably induced by the news.

When considering topic-related content, the first comments on original posts might be interpreted as a direct consequence of exogenous influences, because they were likely to be induced by real-world events. The other follow-up comments may be considered as endogenous activity that was induced within a forum.

We applied the Hawkes decoder to the time series of times of all the comments (whether first comments or follow-up comments) to estimate the exogenous and endogenous factors that have worked to generate events. Due to the large size of the observation window, we selected several 1-week datasets from the full time series. When applying to the Empirical Bayes method, we also selected the timescale of the self-exciting kernel so that the likelihood is maximized. By analyzing several 1-week time series, the timescale of the kernel was selected in a range between 1,300 and 3,000 sec, implying that commenting may typically be done in about half an hour. This is in contrast to our previous GLM analysis of the tweets that contain a hashtag related to “bitcoin,” in which case we used a kernel of the timescale 6060 seconds or 1 min. This short response-time is presumably because clicking the retweet button can be done quickly, in contrast to the submission in Reddit, for which one needs some time to organize a comment.

The results of the decoding analysis for two segments are shown in Fig. 7 (b). The rates of total submissions and the original posts and direct comments were shown using 30-min binning. The exogenous rate estimated by the Hawkes decoder was superimposed upon the figure. Here we plotted the 95% range of ν(t)\nu(t) estimated from the posterior distribution pα^,γ^({ν(t)}|{ti})p_{\hat{\alpha},\hat{\gamma}}(\{\nu(t)\}|\{t_{i}\}) computed by Eq. (6) with the parameter α^\hat{\alpha} and hyperparameter γ^\hat{\gamma} determined by Eq. (8). We observe that the estimated exogenous component ν(t)\nu(t) exhibits good agreement with the rate of first comments. The reproduction ratio α^\hat{\alpha} was estimated as 0.840.84 and 0.810.81, and the timescale τ\tau was selected as 1,644 and 2,456 sec for these two segments.

Predicting chain-reactions induced by external influence

While the Hawkes decoder can discover exogenous and endogenous factors in a single event series, its validity cannot be proven rigorously as long as the information about these origins is unavailable. However, it may be possible to use a model for predicting chain-reactions that are indirectly induced by environmental changes.

We analyzed the commenting data in the Reddit. By considering the occurrence rate of first comments as the external influence ν(t)\nu(t), we simulated the Hawkes process to estimate the entire occurrence rate λ(t)\lambda(t). To do this, we estimated the reproduction ratio α\alpha and the timescale of the self-excitation kernel τs\tau_{s} by fitting the Hawkes decoder to the data of the preceding one week. With the series of events generated by this procedure, we constructed a time histogram of the occurrence rate with a bin size of 30 min. By repeating this procedure 1,000 times for the same ν(t)\nu(t), we obtained 1,000 different time histograms, with which we can obtain the distribution of the number of comments.

Figure 7 (c) depicts two examples of commenting activity for 1 week, with the distribution of occurrence rates λ(t)\lambda(t) predicted from the first comments. The rate of all comments that have occurred in practice is plotted on the distribution, exhibiting good predictive performance.

IV Discussion

The Hawkes process has attracted attention as a mathematical model that may describe the self-excitation mechanisms generating a chain of events, and there have been many attempts to fit the model to given event times.

Firstly, the time dependency of self-excitation has been pursued by fitting the original (homogeneous) Hawkes process to a given series of event-occurrence times. The estimation was performed by maximizing the likelihood using various analytical techniques, either parametrically Ogata1978 ; Ozaki1979 ; Veen2008 ; Rasmussen2013 ; Fonseca2014 ; Chen2018 or nonparametrically Lewis2011 ; Bacry2012 ; Bacry2016a ; Bacry2016b ; Patricia2010 ; Hansen2015 ; zhou13 ; xuc16 ; Kirchner2016 ; Kirchner2017 (an exhaustive review is given in bacry2015 ).

Secondly, potential environmental changes were taken into account by introducing temporal fluctuation to the background rate. There are several attempts to estimate the background rate nonparametrically, including an Expectation-Maximization (EM) algorithm Lewis2011 , kernel-density estimation and differential-entropy minimization chen_hall_2016 , and local-maximum-likelihood estimation Clinet2018 . There have also been methods for estimating the background rate parametrically chen_hall_2013 ; kobayashi2016 ; Omi2017 .

Our analytical method can be categorized into the latter group of studies; we have developed a method of estimating both exogenous and endogenous factors by fitting a linear superposition of the inhomogeneous Poisson process and the Hawkes process to an observed sequence of event times. With synthetic data generated by the inhomogeneous Hawkes process, we have confirmed that the method works properly for estimating the original parameters.

Here in particular, we have found that there are cases in which even the best decoding method cannot capture the extrinsic fluctuation and/or intrinsic self-excitation. Regarding the extrinsic fluctuation, a principled estimator may assume a constant environment if the extrinsic fluctuation is too small: even though the decoding method can estimate the fluctuating rate from a given dataset, the estimation error may become larger than that obtained by assuming a constant rate. Regarding the intrinsic self-excitation, the model cannot separate the self-excitation from environmental fluctuation if the timescale of excitation is similar to or larger than that of environmental fluctuation. We have summarized the undetectability conditions up into phase diagrams categorizing four regimes according to whether or not the exogenous fluctuation and endogenous self-excitation are respectively detected.

We also devised the Hawkes decoder to estimate the exogenous and endogenous factors from a given series of events. By applying it to real time series of debate on Reddit, we have observed that the first comments and the follow-up comments map closely to the estimated exogenous and endogenous reactions, respectively.

While the Hawkes decoder can estimate the contributions of exogenous and endogenous factors in a single event series, it is often the case that information about the origin is unavailable. In such a case, the action of dividing the underlying causes into exogenous and endogenous categories might be regarded as a subjective interpretation. However, the estimation of respective contributions might be useful if it correctly predicts the impact of external factors upon the resulting occurrence. For the real time series of debate on Reddit, we considered the occurrence rate of first comments as the exogenous influence, and simulated the Hawkes process to “predict” the number of follow-up comments. We confirmed that the prediction exhibited a good agreement with the number of follow-up comments that occur in practice.

Our method is general and applicable to a variety of data. We have provided an application program, with which anyone can analyze any series of event times.

Appendix A Numerical method

We reformulate the Hawkes decoder model, defined by Eqs. (3) and (4), as a state-space model, based on which a numerical method for selecting the parameter and hyperparameter {α^,γ^}\{\hat{\alpha},\hat{\gamma}\} and a time-dependent exogenous stimulus ν^(t)\hat{\nu}(t) are developed.

A.1 State-space representation

We use the exponential function ϕ(t)=(1/τs)exp(t/τs)\phi(t)=(1/\tau_{s})\exp(-t/\tau_{s}) for the self-excitation kernel. From Eqs. (1) and (2), the probability density of the inter-event interval tititt_{i}-t_{i-t} is expressed as

(ν+(ti)+ατsRi)\displaystyle\left(\nu^{+}(t_{i})+\frac{\alpha}{\tau_{s}}R_{i}\right) (24)
×exp(ti1tiν+(t)𝑑tαRi(e(titi1)/τs1)),\displaystyle\times\exp\left(-\int_{t_{i-1}}^{t_{i}}\nu^{+}(t)dt-\alpha R_{i}(e^{(t_{i}-t_{i-1})/\tau_{s}}-1)\right),

where ν+(t)=max(0,ν(t))\nu^{+}(t)=\max(0,\nu(t)), which ensures non-negativity of the exogenous stimulus, and RiR_{i} is computed for i=2,,ni=2,\ldots,n as

Ri=(1+Ri1)e(titi1)/τs,R_{i}=(1+R_{i-1})e^{-(t_{i}-t_{i-1})/\tau_{s}}, (25)

with an initial value R1R_{1}. We approximate the exogenous stimulus as being piecewise constant,

ν(t)νi,ti1<tti,\nu(t)\approx\nu_{i},\quad t_{i-1}<t\leq t_{i}, (26)

which is valid under the condition that the timescale of the exogenous fluctuation is sufficiently large enough compared with the mean inter-event interval. Under this approximation, and letting yi=titi1y_{i}=t_{i}-t_{i-1} be the iith inter-event interval, Eq. (24) becomes

pα(yi|νi)\displaystyle p_{\alpha}(y_{i}|\nu_{i}) =\displaystyle= (νi++ατsRi)\displaystyle\left(\nu_{i}^{+}+\frac{\alpha}{\tau_{s}}R_{i}\right) (27)
×exp(yiνi+αRi(eyi/τs1)),\displaystyle\times\exp(-y_{i}\nu_{i}^{+}-\alpha R_{i}(e^{y_{i}/\tau_{s}}-1)),

which we consider the conditional density of yiy_{i} given νi\nu_{i}.

The transition density of νi\nu_{i} associated with the prior distribution (4) is given by

pγ(νi|νi1)=1πγ2(titi2)exp((νiνi1)2γ2(titi2)).p_{\gamma}(\nu_{i}|\nu_{i-1})=\frac{1}{\sqrt{\pi\gamma^{2}(t_{i}-t_{i-2})}}\exp\left(-\frac{(\nu_{i}-\nu_{i-1})^{2}}{\gamma^{2}(t_{i}-t_{i-2})}\right). (28)

Combined with an initial density p(ν1)p(\nu_{1}), Eqs. (27) and (28) define a state-space model West1997 ; Durbin2001 , for which the empirical Bayes method can be implemented by the recursive Bayesian algorithm.

A.2 Recursive Bayesian algorithm

For notational simplicity, let Yi={y1,,yi}Y_{i}=\{y_{1},\ldots,y_{i}\} be the observations up to time tit_{i}. By the Bayes’ theorem, the posterior distribution of νi\nu_{i}, given the observations up to the current time, is expressed as

pα,γ(νi|Yi)=pα(yi|νi)pα,γ(νi|Yi1)pα(yi|νi)pα,γ(νi|Yi1)𝑑νi,p_{\alpha,\gamma}(\nu_{i}|Y_{i})=\frac{p_{\alpha}(y_{i}|\nu_{i})p_{\alpha,\gamma}(\nu_{i}|Y_{i-1})}{\int p_{\alpha}(y_{i}|\nu_{i})p_{\alpha,\gamma}(\nu_{i}|Y_{i-1})d\nu_{i}}, (29)

where pα,γ(νi|Yi1)p_{\alpha,\gamma}(\nu_{i}|Y_{i-1}) on the right-hand-side is computed using the posterior distribution from the last iteration, pα,γ(νi1|Yi1)p_{\alpha,\gamma}(\nu_{i-1}|Y_{i-1}), as

pα,γ(νi|Yi1)=pγ(νi|νi1)pα,γ(νi1|Yi1)𝑑νi1.p_{\alpha,\gamma}(\nu_{i}|Y_{i-1})=\int p_{\gamma}(\nu_{i}|\nu_{i-1})p_{\alpha,\gamma}(\nu_{i-1}|Y_{i-1})d\nu_{i-1}. (30)

Thus, starting with the initial distribution pα,γ(ν1|Y0)=p(ν1)p_{\alpha,\gamma}(\nu_{1}|Y_{0})=p(\nu_{1}), the posterior distributions (29) and (30) are recursively computed for i=1,,ni=1,\ldots,n. Once we obtain these distributions, the posterior distribution of νi\nu_{i}, given the whole observation YnY_{n}, is computed using the following recursive equation,

pα,γ(νi|Yn)\displaystyle p_{\alpha,\gamma}(\nu_{i}|Y_{n}) =\displaystyle= pα,γ(νi+1|Yn)pγ(νi+1|νi)pα,γ(νi+1|Yi)𝑑νi+1\displaystyle\int\frac{p_{\alpha,\gamma}(\nu_{i+1}|Y_{n})p_{\gamma}(\nu_{i+1}|\nu_{i})}{p_{\alpha,\gamma}(\nu_{i+1}|Y_{i})}d\nu_{i+1} (31)
×pα,γ(νi|Yi),\displaystyle\times p_{\alpha,\gamma}(\nu_{i}|Y_{i}),

for i=n1,,1i=n-1,\ldots,1 in backward. We obtain the MAP estimate of the exogenous stimulus {ν^i}\{\hat{\nu}_{i}\}, such that pα,γ(νi|Yn)p_{\alpha,\gamma}(\nu_{i}|Y_{n}) (for i=1,,ni=1,\ldots,n) is maximized.

For the state-space model (27) and (28), we introduce a Gaussian approximation in the posterior distribution (29) at each point in time, providing a simple algorithm that is computationally tractable Koyama2010JASA ; Koyama2010JCNS ; Koyama2010AISM . Let νi1|i1\nu_{i-1|i-1} and qi1|i1q_{i-1|i-1} be the (approximate) mean and variance for pα,γ(νi1|Yi1)p_{\alpha,\gamma}(\nu_{i-1}|Y_{i-1}). Under a Gaussian approximation in pα,γ(νi1|Yi1)p_{\alpha,\gamma}(\nu_{i-1}|Y_{i-1}), the posterior distribution (30) is also a Gaussian whose mean and variance are, respectively, computed as

νi|i1\displaystyle\nu_{i|i-1} =\displaystyle= νi1|i1,\displaystyle\nu_{i-1|i-1}, (32)
qi|i1\displaystyle q_{i|i-1} =\displaystyle= qi1|i1+γ2(titi2)/2.\displaystyle q_{i-1|i-1}+\gamma^{2}(t_{i}-t_{i-2})/2. (33)

To make a Gaussian approximation in Eq. (29), let l(νi)=logpα(Ti|νi)pα,γ(νi|T1:i1)l(\nu_{i})=\log p_{\alpha}(T_{i}|\nu_{i})p_{\alpha,\gamma}(\nu_{i}|T_{1:i-1}) be the log posterior distribution (from which we omit the normalization constant). Expanding the log posterior distribution about its maximum point up to the second-order yields l(νi)l(νi|i)+l¨(νi|i)(νiνi|i)2/2l(\nu_{i})\approx l(\nu_{i|i})+\ddot{l}(\nu_{i|i})(\nu_{i}-\nu_{i|i})^{2}/2, where νi|i\nu_{i|i} is obtained as a root of the equation l˙(νi)=0\dot{l}(\nu_{i})=0,

νi|i\displaystyle\nu_{i|i} =\displaystyle= (νi|i1α/τsRiyiqi|i1\displaystyle\big{(}\nu_{i|i-1}-\alpha/\tau_{s}R_{i}-y_{i}q_{i|i-1} (34)
+{(νi|i1α/τsRiyiqi|i1)2\displaystyle+\{(\nu_{i|i-1}-\alpha/\tau_{s}R_{i}-y_{i}q_{i|i-1})^{2}
+4(qi|i1+α/τsRiνi|i1\displaystyle+4(q_{i|i-1}+\alpha/\tau_{s}R_{i}\nu_{i|i-1}
α/τsRiyiqi|i1)}1/2)/2.\displaystyle-\alpha/\tau_{s}R_{i}y_{i}q_{i|i-1})\}^{1/2}\big{)}/2.

Thus the posterior distribution (29) is approximated to a Gaussian with mean νi|i\nu_{i|i} and variance:

qi|i\displaystyle q_{i|i} =\displaystyle= l¨(νi|i)1\displaystyle-\ddot{l}(\nu_{i|i})^{-1} (35)
=\displaystyle= qi|i1(νi|i+α/τsRi)2qi|i1+(νi|i+α/τsRi)2.\displaystyle\frac{q_{i|i-1}(\nu_{i|i}+\alpha/\tau_{s}R_{i})^{2}}{q_{i|i-1}+(\nu_{i|i}+\alpha/\tau_{s}R_{i})^{2}}.

The Gaussian approximations in pα,γ(νi|Yi)p_{\alpha,\gamma}(\nu_{i}|Y_{i}) and pα,γ(νi|Yi1)p_{\alpha,\gamma}(\nu_{i}|Y_{i-1}) result in a Gaussian for Eq. (31) as well. Let νi|n\nu_{i|n} and qi|nq_{i|n} be the mean and variance of pα,γ(νi|Yn)p_{\alpha,\gamma}(\nu_{i}|Y_{n}). We then obtain the recursive equation corresponding to Eq. (31):

νi|n\displaystyle\nu_{i|n} =\displaystyle= νi|i+(νi+1|nνi+1|i)qi|i/qi+1|i,\displaystyle\nu_{i|i}+(\nu_{i+1|n}-\nu_{i+1|i})q_{i|i}/q_{i+1|i}, (36)
qi|n\displaystyle q_{i|n} =\displaystyle= qi|i+(qi+1|nqi+1|i)(qi|i/qi+1|i)2.\displaystyle q_{i|i}+(q_{i+1|n}-q_{i+1|i})(q_{i|i}/q_{i+1|i})^{2}. (37)

Eqs. (32)–(37) comprise the recursive estimation of {νi}\{\nu_{i}\}. The algorithm is summarized as follows.

  1. 1.

    Set initial values ν1|0\nu_{1|0}, q1|0q_{1|0} and R1R_{1}.

  2. 2.

    Compute Eqs. (25) and (32)–(35) for i=1,,ni=1,\ldots,n in forward.

  3. 3.

    Starting with νn|n\nu_{n|n} and qn|nq_{n|n}, which are obtained at the last iteration in the step 2, compute Eqs. (36) and (37) for i=n1,,1i=n-1,\ldots,1 in backward.

The resulting {νi|n}\{\nu_{i|n}\} provides the MAP estimate of the exogenous rate. Note that we may use a diffuse (noninformative) prior for the initial values (q1|0q_{1|0}\to\infty), which results in ν1|1=1/y1α/τsR1\nu_{1|1}=1/y_{1}-\alpha/\tau_{s}R_{1} and q1|1=1/y12q_{1|1}=1/y_{1}^{2}, leaving out the dependency of the initial values upon the estimation. In our analysis, we used R1=τs/y¯R_{1}=\tau_{s}/\overline{y}, where y¯=i=1myi/m\overline{y}=\sum_{i=1}^{m}y_{i}/m is an average of the observations over a given range (we have chosen m=100m=100), to remove the initial non-stationary part of the estimation.

To select the parameter and hyperparameter {α^,γ^}\{\hat{\alpha},\hat{\gamma}\}, we consider a factorization of the marginal likelihood,

pα,γ(Yn)\displaystyle p_{\alpha,\gamma}(Y_{n}) =\displaystyle= i=1npα,γ(yi|Yi1)\displaystyle\prod_{i=1}^{n}p_{\alpha,\gamma}(y_{i}|Y_{i-1}) (38)
=\displaystyle= i=1n𝑑νipα(yi|νi)pα,γ(νi|Yi1).\displaystyle\prod_{i=1}^{n}\int_{-\infty}^{\infty}d\nu_{i}p_{\alpha}(y_{i}|\nu_{i})p_{\alpha,\gamma}(\nu_{i}|Y_{i-1}).

Since pα,γ(νi|Yi1)p_{\alpha,\gamma}(\nu_{i}|Y_{i-1}) on the right-hand-side is approximated by a Gaussian with mean νi|i1\nu_{i|i-1} and variance qi|i1q_{i|i-1}, the integral may be approximated by the Gauss-Hermite quadrature,

𝑑νipα(yi|νi)pα,γ(νi|Yi1)\displaystyle\int_{-\infty}^{\infty}d\nu_{i}p_{\alpha}(y_{i}|\nu_{i})p_{\alpha,\gamma}(\nu_{i}|Y_{i-1}) (39)
=\displaystyle= 12πqi|i1𝑑νipα(yi|νi)exp((νiνi|i1)22qi|i1)\displaystyle\frac{1}{\sqrt{2\pi q_{i|i-1}}}\int_{-\infty}^{\infty}d\nu_{i}p_{\alpha}(y_{i}|\nu_{i})\exp\left(-\frac{(\nu_{i}-\nu_{i|i-1})^{2}}{2q_{i|i-1}}\right)
=\displaystyle= 1π𝑑xpα(yi|2qi|i1x+νi|i1)ex2\displaystyle\frac{1}{\sqrt{\pi}}\int_{-\infty}^{\infty}dxp_{\alpha}(y_{i}|\sqrt{2q_{i|i-1}}x+\nu_{i|i-1})e^{-x^{2}}
\displaystyle\approx 1πl=1mwlpα(yi|2qi|i1xl+νi|i1),\displaystyle\frac{1}{\sqrt{\pi}}\sum_{l=1}^{m}w_{l}p_{\alpha}(y_{i}|\sqrt{2q_{i|i-1}}x_{l}+\nu_{i|i-1}),

where the weights {wl}\{w_{l}\} and evaluation points {xl}\{x_{l}\} are chosen according to a quadrature rule Press1992 . The parameter and hyperparameter {α^,γ^}\{\hat{\alpha},\hat{\gamma}\} are selected by maximizing Eq. (38) numerically. It should be noted that the time constant of the self-excitation kernel, τ^s\hat{\tau}_{s}, may also be selected by maximizing Eq. (38) with respect to τs\tau_{s}, as well.

Appendix B Derivation of the free energy

B.1 Representation of intensity

First, we represent the intensity (18) of the inhomogeneous Hawkes process in terms of the mean behavior and the fluctuations. We consider decomposing a series of events into the mean and fluctuation as

iδ(tti)=λ(t)+ξ(t),\sum_{i}\delta(t-t_{i})=\lambda^{*}(t)+\xi(t), (40)

where ξ(t)\xi(t) is the white noise (the “derivative” of the martingale Bacry2012 ), whose ensemble characteristics satisfy ξ(t)=0\langle\xi(t)\rangle=0 and ξ(t)ξ(t)=λ(t)δ(tt)\langle\xi(t)\xi(t^{\prime})\rangle=\langle\lambda^{*}(t)\rangle\delta(t-t^{\prime}). Using this decomposition, Eq. (18) can be represented as

λ(t)\displaystyle\lambda^{*}(t) =\displaystyle= ν(t)+α0tϕ(tu)λ(u)𝑑u\displaystyle\nu^{*}(t)+\alpha^{*}\int_{0}^{t}\phi^{*}(t-u)\lambda^{*}(u)du (41)
+α0tϕ(tu)ξ(u)𝑑u.\displaystyle+\alpha^{*}\int_{0}^{t}\phi^{*}(t-u)\xi(u)du.

By applying the Laplace transformation, this equation is solved as

λ(t)\displaystyle\lambda^{*}(t) =\displaystyle= ν(t)+0tΨ(tu)ν(u)𝑑u\displaystyle\nu^{*}(t)+\int_{0}^{t}\Psi^{*}(t-u)\nu^{*}(u)du (42)
+0tΨ(tu)ξ(u)𝑑u,\displaystyle+\int_{0}^{t}\Psi^{*}(t-u)\xi(u)du,

where Ψ(t)\Psi^{*}(t) is an “effective self-exciting kernel” whose Laplace transform is given by

Ψ^(s)=αϕ^(s)1αϕ^(s),\widehat{\Psi}^{*}(s)=\frac{\alpha^{*}\widehat{\phi}^{*}(s)}{1-\alpha^{*}\widehat{\phi}^{*}(s)}, (43)

where ϕ^(s)\widehat{\phi}^{*}(s) is the Laplace transform of the self-excitation kernel ϕ(t)\phi^{*}(t). The first and second terms on the right-hand-side of Eq. (42) represent the average behavior and the third term represents the fluctuation around the averaged behavior. Representing the original exogenous input (19) as ν(t)=μ+σζ(t)\nu^{*}(t)=\mu^{*}+\sigma^{*}\zeta(t), where ζ(t)\zeta(t) is the normalized fluctuation whose autocorrelation function is given by ζ(t)ζ(t+u)=exp(|u|/τe)\langle\zeta(t)\zeta(t+u)\rangle=\exp(-|u|/\tau^{*}_{e}), Eq. (42) is expressed as

λ(t)=Λ+σζ(t)+σZ(t)+Ξ(t),\lambda^{*}(t)=\Lambda^{*}+\sigma^{*}\zeta(t)+\sigma^{*}Z^{*}(t)+\Xi^{*}(t), (44)

where

Λ\displaystyle\Lambda^{*} =\displaystyle= λ(t)=μ1α,\displaystyle\langle\lambda^{*}(t)\rangle=\frac{\mu^{*}}{1-\alpha^{*}}, (45)
Z(t)\displaystyle Z^{*}(t) =\displaystyle= 0tΨ(tu)ζ(u)𝑑u,\displaystyle\int_{0}^{t}\Psi^{*}(t-u)\zeta(u)du, (46)
Ξ(t)\displaystyle\Xi^{*}(t) =\displaystyle= 0tΨ(tu)ξ(u)𝑑u.\displaystyle\int_{0}^{t}\Psi^{*}(t-u)\xi(u)du. (47)

In the same manner, by decomposing the exogenous rate in the decoder into the mean and fluctuation, ν(t)=(1α)Λ+x(t)\nu(t)=(1-\alpha)\Lambda^{*}+x(t), the decoder’s intensity (1) is represented as

λ(t)=Λ+x(t)+σZ(t)+Ξ(t),\lambda(t)=\Lambda^{*}+x(t)+\sigma^{*}Z(t)+\Xi(t), (48)

where

Z(t)\displaystyle Z(t) =\displaystyle= 0tΨ(tu)ζ(u)𝑑u,\displaystyle\int_{0}^{t}\Psi(t-u)\zeta(u)du, (49)
Ξ(t)\displaystyle\Xi(t) =\displaystyle= 0tΨ(tu)ξ(u)𝑑u.\displaystyle\int_{0}^{t}\Psi(t-u)\xi(u)du. (50)

Here, Ψ(t)\Psi(t) represents the effective self-exciting kernel of the decoder, whose Laplace transform is given by

Ψ^(s)=αϕ^(s)1αϕ^(s),\displaystyle\widehat{\Psi}(s)=\frac{\alpha\widehat{\phi}(s)}{1-\alpha^{*}\widehat{\phi}^{*}(s)}, (51)

where ϕ^(s)\widehat{\phi}(s) is the Laplace transform of the self-excitation kernel ϕ(t)\phi(t) of the decoder.

The path-integral for the marginal likelihood (11) is carried out by changing the variable from {ν(t)}\{\nu(t)\} to {x(t)}\{x(t)\}:

pα,γ({ti})=1Z(γ)exp(0TL(x˙,x)𝑑t)D{x(t)},p_{\alpha,\gamma}(\{t_{i}\})=\frac{1}{Z(\gamma)}\int\exp\left(-\int_{0}^{T}L(\dot{x},x)dt\right)D\{x(t)\}, (52)

where the Lagrangian is expressed using Eqs. (40), (44) and (48) as

L(x˙,x)\displaystyle L(\dot{x},x) =\displaystyle= 12γ2x˙2+Λ+x(t)+σZ(t)+Ξ(t)\displaystyle\frac{1}{2\gamma^{2}}\dot{x}^{2}+\Lambda^{*}+x(t)+\sigma^{*}Z(t)+\Xi(t) (53)
log(Λ+x(t)+σZ(t)+Ξ(t))\displaystyle-\log(\Lambda^{*}+x(t)+\sigma^{*}Z(t)+\Xi(t))
×(Λ+σζ(t)+σZ(t)+Ξ(t)\displaystyle\times(\Lambda^{*}+\sigma^{*}\zeta(t)+\sigma^{*}Z^{*}(t)+\Xi^{*}(t)
+ξ(t)).\displaystyle+\xi(t)).

Using this Lagrangian, the path-integral (52) is evaluated as Eq. (15). We derive the contributions of the MAP solution and the “quantum effect” to the path-integral below.

B.2 Contribution of the MAP solution

Expanding the Lagrangian Eq. (53) in terms of x(t)x(t), and ignoring o(σ2/μ)o({\sigma^{*}}^{2}/\mu^{*}) and the irrelevant terms yields

L(x˙,x)\displaystyle L(\dot{x},x) (54)
\displaystyle\approx x˙22γ2+x22Λ\displaystyle\frac{\dot{x}^{2}}{2\gamma^{2}}+\frac{x^{2}}{2\Lambda^{*}}
σζ(t)+σZ(t)σZ(t)+Ξ(t)Ξ(t)+ξ(t)Λx\displaystyle-\frac{\sigma^{*}\zeta(t)+\sigma^{*}Z^{*}(t)-\sigma^{*}Z(t)+\Xi^{*}(t)-\Xi(t)+\xi(t)}{\Lambda^{*}}x
+σ2(2Z(t)Z(t)Z(t)2)+2Ξ(t)Ξ(t)Ξ(t)22Λ\displaystyle+\frac{{\sigma^{*}}^{2}(2Z^{*}(t)Z(t)-Z(t)^{2})+2\Xi^{*}(t)\Xi(t)-\Xi(t)^{2}}{2\Lambda^{*}}
σ2ΛZ(t)ζ(t),\displaystyle-\frac{{\sigma^{*}}^{2}}{\Lambda^{*}}Z(t)\zeta(t),

for which a solution of the Euler-Lagrange equation (13) is obtained as

x^(t)\displaystyle\hat{x}(t) =\displaystyle= γ2Λ0TdueγΛ|tu|(σζ(u)+σZ(u)\displaystyle\frac{\gamma}{2\sqrt{\Lambda^{*}}}\int_{0}^{T}due^{-\frac{\gamma}{\sqrt{\Lambda^{*}}}|t-u|}(\sigma^{*}\zeta(u)+\sigma^{*}Z^{*}(u) (55)
σZ(u)+Ξ(u)Ξ(u)+ξ(u)).\displaystyle-\sigma^{*}Z(u)+\Xi^{*}(u)-\Xi(u)+\xi(u)).

For the Lagrangian (54), the action integral (12) is given as

S[x^(t)]\displaystyle S[\hat{x}(t)] (56)
=\displaystyle= 0Tdt(12γ2ddt(x^(t)x^˙(t))\displaystyle\int_{0}^{T}dt\bigg{(}\frac{1}{2\gamma^{2}}\frac{d}{dt}(\hat{x}(t)\dot{\hat{x}}(t))
σ(ζ(t)+Z(t)Z(t))+Ξ(t)Ξ(t)+ξ(t)2Λx^(t)\displaystyle-\frac{\sigma^{*}(\zeta(t)+Z^{*}(t)-Z(t))+\Xi^{*}(t)-\Xi(t)+\xi(t)}{2\Lambda^{*}}\hat{x}(t)
σ2(2Z(t)Z(t)Z(t)2)+2Ξ(t)Ξ(t)Ξ(t)22Λ\displaystyle-\frac{{\sigma^{*}}^{2}(2Z^{*}(t)Z(t)-Z(t)^{2})+2\Xi^{*}(t)\Xi(t)-\Xi(t)^{2}}{2\Lambda^{*}}
σ2ΛZ(t)ζ(t)).\displaystyle-\frac{{\sigma^{*}}^{2}}{\Lambda^{*}}Z(t)\zeta(t)\bigg{)}.

Here, the first term on the right-hand-side is a boundary effect, which is negligible in comparison to the bulk contribution given the long event series T1T\gg 1. Substituting Eq. (55) into this formula and assuming ergodicity, we obtain the contribution of the MAP solution as

S[x^(t)]\displaystyle S[\hat{x}(t)] (57)
=\displaystyle= γ4ΛΛ0T𝑑t0T𝑑ueγΛ|tu|\displaystyle-\frac{\gamma}{4\Lambda^{*}\sqrt{\Lambda^{*}}}\int_{0}^{T}dt\int_{0}^{T}due^{-\frac{\gamma}{\sqrt{\Lambda^{*}}}|t-u|}
×{ξ(t)ξ(u)+2Ξ(t)ξ(u)2Ξ(t)ξ(u)\displaystyle\times\big{\{}\langle\xi(t)\xi(u)\rangle+2\langle\Xi^{*}(t)\xi(u)\rangle-2\langle\Xi(t)\xi(u)\rangle
+Ξ(t)Ξ(u)2Ξ(t)Ξ(u)+Ξ(t)Ξ(u)\displaystyle+\langle\Xi^{*}(t)\Xi^{*}(u)\rangle-2\langle\Xi^{*}(t)\Xi(u)\rangle+\langle\Xi(t)\Xi(u)\rangle
+σ2(ζ(t)ζ(u)+2Z(t)ζ(u)2Z(t)ζ(u)\displaystyle+{\sigma^{*}}^{2}(\langle\zeta(t)\zeta(u)\rangle+2\langle Z^{*}(t)\zeta(u)\rangle-2\langle Z(t)\zeta(u)\rangle
+Z(t)Z(u)2Z(t)Z(u)+Z(t)Z(u))}\displaystyle+\langle Z^{*}(t)Z^{*}(u)\rangle-2\langle Z^{*}(t)Z(u)\rangle+\langle Z(t)Z(u)\rangle)\big{\}}
12Λ(20T𝑑tΞ(t)Ξ(t)0T𝑑tΞ(t)2)\displaystyle{}-\frac{1}{2\Lambda^{*}}\left(2\int_{0}^{T}dt\langle\Xi^{*}(t)\Xi(t)\rangle-\int_{0}^{T}dt\langle\Xi(t)^{2}\rangle\right)
σ22Λ(20T𝑑tZ(t)Z(t)0T𝑑tZ(t)2)\displaystyle{}-\frac{{\sigma^{*}}^{2}}{2\Lambda^{*}}\left(2\int_{0}^{T}dt\langle Z^{*}(t)Z(t)\rangle-\int_{0}^{T}dt\langle Z(t)^{2}\rangle\right)
σ2Λ0T𝑑tZ(t)ζ(t).\displaystyle{}-\frac{{\sigma^{*}}^{2}}{\Lambda^{*}}\int_{0}^{T}dt\langle Z(t)\zeta(t)\rangle.

B.3 Contribution of the quantum effect

The quantum effect (16) is given by the ratio of functional determinants of the second order differential operators associated with the Lagrangian (54) Coleman1988 ; Kleinert2009 ,

R=[det(1γ2t2+1Λ)det(1γ2t2)]12.R=\left[\frac{\det\big{(}-\frac{1}{\gamma^{2}}\partial^{2}_{t}+\frac{1}{\Lambda^{*}}\big{)}}{\det\big{(}-\frac{1}{\gamma^{2}}\partial^{2}_{t}\big{)}}\right]^{-\frac{1}{2}}. (58)

The Gelfand–Yaglom method allows us to calculate the functional determinants by an initial-value problem for the corresponding differential operator Gelfand1960 ,

(1/γ2)ψ¨1(t)+ψ1(t)/Λ\displaystyle-(1/\gamma^{2})\ddot{\psi}_{1}(t)+\psi_{1}(t)/\Lambda^{*} =\displaystyle= 0,ψ1(0)=0,ψ˙1(0)=1,\displaystyle 0,\quad\psi_{1}(0)=0,\ \dot{\psi}_{1}(0)=1,
(1/γ2)ψ¨2(t)\displaystyle-(1/\gamma^{2})\ddot{\psi}_{2}(t) =\displaystyle= 0,ψ2(0)=0,ψ˙2(0)=1.\displaystyle 0,\quad\psi_{2}(0)=0,\ \dot{\psi}_{2}(0)=1.

Then, the Gelfand–Yaglom reads

[det(1γ2t2+1Λ)det(1γ2t2)]12=[ψ1(T)ψ2(T)]12.\left[\frac{\det\big{(}-\frac{1}{\gamma^{2}}\partial^{2}_{t}+\frac{1}{\Lambda^{*}}\big{)}}{\det\big{(}-\frac{1}{\gamma^{2}}\partial^{2}_{t}\big{)}}\right]^{-\frac{1}{2}}=\left[\frac{\psi_{1}(T)}{\psi_{2}(T)}\right]^{-\frac{1}{2}}. (59)

By solving the differential equations, the quantum contribution is obtained as

R=(Λ2γT)12exp(γT2Λ).R=\left(\frac{\sqrt{\Lambda^{*}}}{2\gamma T}\right)^{-\frac{1}{2}}\exp\left(-\frac{\gamma T}{2\sqrt{\Lambda^{*}}}\right). (60)

B.4 Formula for free energy

The free-energy formula is obtained by substituting Eqs. (57) and (60) into Eq. (17) as,

F(α,γ)\displaystyle F(\alpha,\gamma) (61)
=\displaystyle= γ4Λσ22Λρ~(0)Ψ~(0)Ψ~(0)2\displaystyle\frac{\gamma}{4\sqrt{\Lambda^{*}}}-\frac{{\sigma^{*}}^{2}}{2\Lambda^{*}}\widetilde{\rho}(0)-\frac{\widetilde{\Psi}^{*}(0)-\widetilde{\Psi}(0)}{2}
12(0dt(2Ψ(t)Ψ(t))Ψ(t)\displaystyle-\frac{1}{2}\bigg{(}\int_{0}^{\infty}dt(2\Psi^{*}(t)-\Psi(t))\Psi(t)
+0dt(Ψ(t)Ψ(t))(Ψ~(t)Ψ~(t)))\displaystyle+\int_{0}^{\infty}dt(\Psi^{*}(t)-\Psi(t))(\widetilde{\Psi}^{*}(t)-\widetilde{\Psi}(t))\bigg{)}
σ2Λ(0𝑑tΨ(t)ρ(t)+0𝑑t(Ψ(t)Ψ(t))ρ~(t))\displaystyle-\frac{{\sigma^{*}}^{2}}{\Lambda^{*}}\bigg{(}\int_{0}^{\infty}dt\Psi(t)\rho(t)+\int_{0}^{\infty}dt(\Psi^{*}(t)-\Psi(t))\widetilde{\rho}(t)\bigg{)}
σ22Λ(0dt0du(2Ψ(t)Ψ(t))Ψ(u)ρ(tu)\displaystyle-\frac{{\sigma^{*}}^{2}}{2\Lambda^{*}}\bigg{(}\int_{0}^{\infty}dt\int_{0}^{\infty}du(2\Psi^{*}(t)-\Psi(t))\Psi(u)\rho(t-u)
+0𝑑t0𝑑u(Ψ(t)Ψ(t))(Ψ(u)Ψ(u))\displaystyle+\int_{0}^{\infty}dt\int_{0}^{\infty}du(\Psi^{*}(t)-\Psi(t))(\Psi^{*}(u)-\Psi(u))
×ρ~(tu)),\displaystyle\times\widetilde{\rho}(t-u)\bigg{)},

where ρ(u)=ζ(t)ζ(t+u)\rho(u)=\langle\zeta(t)\zeta(t+u)\rangle is the normalized autocorrelation function of the original exogenous input, and

ρ~(t)\displaystyle\widetilde{\rho}(t) =\displaystyle= γ2Λ𝑑ueγΛ|u|ρ(t+u),\displaystyle\frac{\gamma}{2\sqrt{\Lambda^{*}}}\int_{-\infty}^{\infty}due^{-\frac{\gamma}{\sqrt{\Lambda^{*}}}|u|}\rho(t+u), (62)
Ψ~(t)\displaystyle\widetilde{\Psi}(t) =\displaystyle= γΛ0𝑑ueγΛuΨ(t+u),\displaystyle\frac{\gamma}{\sqrt{\Lambda^{*}}}\int_{0}^{\infty}due^{-\frac{\gamma}{\sqrt{\Lambda^{*}}}u}\Psi(t+u), (63)
Ψ~(t)\displaystyle\widetilde{\Psi}^{*}(t) =\displaystyle= γΛ0𝑑ueγΛuΨ(t+u).\displaystyle\frac{\gamma}{\sqrt{\Lambda^{*}}}\int_{0}^{\infty}due^{-\frac{\gamma}{\sqrt{\Lambda^{*}}}u}\Psi^{*}(t+u). (64)

For the exponential kernels ϕ(t)=1/τsexp(t/τs)\phi^{*}(t)=1/\tau^{*}_{s}\exp(-t/\tau^{*}_{s}) and ϕ(t)=1/τsexp(t/τs)\phi(t)=1/\tau_{s}\exp(-t/\tau_{s}), the effective self-excitation kernels are, respectively, obtained as

Ψ(t)=α/τse(1α)t/τs,\Psi^{*}(t)=\alpha^{*}/\tau^{*}_{s}e^{-(1-\alpha^{*})t/\tau^{*}_{s}}, (65)

and

Ψ(t)\displaystyle\Psi(t) =\displaystyle= αατs(1α)τse(1α)t/τs\displaystyle\frac{\alpha\alpha^{*}}{\tau_{s}^{*}-(1-\alpha^{*})\tau_{s}}e^{-(1-\alpha^{*})t/\tau^{*}_{s}} (66)
+α(τs/τs1)τs(1α)τset/τs,\displaystyle+\frac{\alpha(\tau^{*}_{s}/\tau_{s}-1)}{\tau^{*}_{s}-(1-\alpha^{*})\tau_{s}}e^{-t/\tau_{s}},

for τsτs/(1α)\tau_{s}\neq\tau^{*}_{s}/(1-\alpha^{*}), and

Ψ(t)=α(1α)(1+αt/τs)/τse(1α)t/τs,\Psi(t)=\alpha(1-\alpha^{*})(1+\alpha^{*}t/\tau^{*}_{s})/\tau^{*}_{s}e^{-(1-\alpha^{*})t/\tau^{*}_{s}}, (67)

for τs=τs/(1α)\tau_{s}=\tau^{*}_{s}/(1-\alpha^{*}). Substituting Eqs. (65) and (66) and the autocorrelation function of the OUP, ρ(u)=exp(|u|/τe)\rho(u)=\exp(-|u|/\tau^{*}_{e}), into Eq. (61) yields the free energy for the Hawkes decoder with the exponential kernels. By scaling the hyperparameter γγτe/Λ\gamma\leftarrow\gamma\tau_{e}^{*}/\sqrt{\Lambda^{*}} and using the relative time constants, βs=τe/τs\beta_{s}=\tau_{e}^{*}/\tau_{s} and βs=τe/τs\beta^{*}_{s}=\tau_{e}^{*}/\tau^{*}_{s}, the dimensionless free energy, FτeFF\leftarrow\tau_{e}^{*}F, is expressed as

F(α,γ)\displaystyle F(\alpha,\gamma) (68)
=\displaystyle= γ4σ2τeμ(1α)γ2(γ+1)γ2{c0c1(α)γ+βαc2(α)γ+βs}\displaystyle\frac{\gamma}{4}-\frac{{\sigma^{*}}^{2}\tau^{*}_{e}}{\mu^{*}}\frac{(1-\alpha^{*})\gamma}{2(\gamma+1)}-\frac{\gamma}{2}\bigg{\{}\frac{c_{0}-c_{1}(\alpha)}{\gamma+\beta_{\alpha}^{*}}-\frac{c_{2}(\alpha)}{\gamma+\beta_{s}}\bigg{\}}
12{(c0c1(α))2γ2βα(γ+βα)(c0c1(α))c2(α)γ(βα+βs)(γ+βs)\displaystyle-\frac{1}{2}\bigg{\{}\frac{(c_{0}-c_{1}(\alpha))^{2}\gamma}{2\beta_{\alpha}^{*}(\gamma+\beta_{\alpha}^{*})}-\frac{(c_{0}-c_{1}(\alpha))c_{2}(\alpha)\gamma}{(\beta_{\alpha}^{*}+\beta_{s})(\gamma+\beta_{s})}
(c0c1(α))c2(α)γ(βα+βs)(γ+βα)+c2(α)2γ2βs(γ+βs)c2(α)22βs\displaystyle-\frac{(c_{0}-c_{1}(\alpha))c_{2}(\alpha)\gamma}{(\beta_{\alpha}^{*}+\beta_{s})(\gamma+\beta_{\alpha}^{*})}+\frac{c_{2}(\alpha)^{2}\gamma}{2\beta_{s}(\gamma+\beta_{s})}-\frac{c_{2}(\alpha)^{2}}{2\beta_{s}}
+(2c0c1(α))c1(α)2βα+2(c0c1(α))c2(α)βα+βs}\displaystyle+\frac{(2c_{0}-c_{1}(\alpha))c_{1}(\alpha)}{2\beta_{\alpha}^{*}}+\frac{2(c_{0}-c_{1}(\alpha))c_{2}(\alpha)}{\beta_{\alpha}^{*}+\beta_{s}}\bigg{\}}
σ2τeμ(1α){(c0c1(α))(γ+βα+1)γ(βα+1)(γ+βα)(γ+1)\displaystyle-\frac{{\sigma^{*}}^{2}\tau^{*}_{e}}{\mu^{*}}(1-\alpha^{*})\bigg{\{}\frac{(c_{0}-c_{1}(\alpha))(\gamma+\beta_{\alpha}^{*}+1)\gamma}{(\beta_{\alpha}^{*}+1)(\gamma+\beta_{\alpha}^{*})(\gamma+1)}
c2(α)(γ+βs+1)γ(βs+1)(γ+βs)(γ+1)+c1(α)βα+1+c2(α)βs+1}\displaystyle-\frac{c_{2}(\alpha)(\gamma+\beta_{s}+1)\gamma}{(\beta_{s}+1)(\gamma+\beta_{s})(\gamma+1)}+\frac{c_{1}(\alpha)}{\beta_{\alpha}^{*}+1}+\frac{c_{2}(\alpha)}{\beta_{s}+1}\bigg{\}}
σ2τeμ(1α)2{(c0c1(α))2(γ+βα+1)γβα(βα+1)(γ+βα)(γ+1)\displaystyle-\frac{{\sigma^{*}}^{2}\tau^{*}_{e}}{\mu^{*}}\frac{(1-\alpha^{*})}{2}\bigg{\{}\frac{(c_{0}-c_{1}(\alpha))^{2}(\gamma+\beta_{\alpha}^{*}+1)\gamma}{\beta_{\alpha}^{*}(\beta_{\alpha}^{*}+1)(\gamma+\beta_{\alpha}^{*})(\gamma+1)}
2(c0c1(α))c2(α)(γ+βα+1)γ(βα+βs)(βα+1)(γ+βα)(γ+1)\displaystyle-\frac{2(c_{0}-c_{1}(\alpha))c_{2}(\alpha)(\gamma+\beta_{\alpha}^{*}+1)\gamma}{(\beta_{\alpha}^{*}+\beta_{s})(\beta_{\alpha}^{*}+1)(\gamma+\beta_{\alpha}^{*})(\gamma+1)}
2(c0c1(α))c2(α)(γ+βs+1)γ(βα+βs)(βs+1)(γ+βs)(γ+1)\displaystyle-\frac{2(c_{0}-c_{1}(\alpha))c_{2}(\alpha)(\gamma+\beta_{s}+1)\gamma}{(\beta_{\alpha}^{*}+\beta_{s})(\beta_{s}+1)(\gamma+\beta_{s})(\gamma+1)}
+c2(α)2(γ+βs+1)γβs(βs+1)(γ+βs)(γ+1)c2(α)2βs(βs+1)\displaystyle+\frac{c_{2}(\alpha)^{2}(\gamma+\beta_{s}+1)\gamma}{\beta_{s}(\beta_{s}+1)(\gamma+\beta_{s})(\gamma+1)}-\frac{c_{2}(\alpha)^{2}}{\beta_{s}(\beta_{s}+1)}
+(2c0c1(α))c1(α)βα(βα+1)+2(c0c1(α))c2(α)(βα+βs)(βs+1)\displaystyle+\frac{(2c_{0}-c_{1}(\alpha))c_{1}(\alpha)}{\beta_{\alpha}^{*}(\beta_{\alpha}^{*}+1)}+\frac{2(c_{0}-c_{1}(\alpha))c_{2}(\alpha)}{(\beta_{\alpha}^{*}+\beta_{s})(\beta_{s}+1)}
+2(c0c1(α))c2(α)(βα+βs)(βα+1)},\displaystyle+\frac{2(c_{0}-c_{1}(\alpha))c_{2}(\alpha)}{(\beta_{\alpha}^{*}+\beta_{s})(\beta_{\alpha}^{*}+1)}\bigg{\}},

where βα=(1α)βs\beta_{\alpha}^{*}=(1-\alpha^{*})\beta^{*}_{s}, c0=αβsc_{0}=\alpha^{*}\beta^{*}_{s}, c1(α)=ααβsβs/(βsβα)c_{1}(\alpha)=\alpha\alpha^{*}\beta_{s}\beta^{*}_{s}/(\beta_{s}-\beta^{*}_{\alpha}) and c2(α)=αβs(βsβs)/(βsβα)c_{2}(\alpha)=\alpha\beta_{s}(\beta_{s}-\beta^{*}_{s})/(\beta_{s}-\beta^{*}_{\alpha}). Note that the free energy (68) is fully characterized by the four dimensionless parameters: the strengths of endogenous self-excitation α\alpha^{*} and the exogenous fluctuations σ2τe/μ{\sigma^{*}}^{2}\tau^{*}_{e}/\mu^{*}, and the time constants of the self-excitation kernels relative to that of original exogenous input, τs/τe(=βs1)\tau^{*}_{s}/\tau^{*}_{e}(={\beta^{*}_{s}}^{-1}) and τs/τe(=βs1)\tau_{s}/\tau^{*}_{e}(=\beta_{s}^{-1}).

Appendix C Analysis of free energy

C.1 The Hawkes decoder with a correct self-excitation kernel

Here, we analyze the free energy (68) in the case where the model uses the correct shape of the original self-excitation kernel, τs=τs\tau_{s}=\tau^{*}_{s}. In particular, we consider the case of slow exogenous fluctuation, τeτs\tau^{*}_{e}\gg\tau^{*}_{s}. The free energy (68) is asymptotically evaluated as

F(α,γ)\displaystyle F(\alpha,\gamma) (69)
=\displaystyle= α(2αα)4(1α)τeτs+{γ4σ2τeμ(1α)γ2(γ+1)\displaystyle-\frac{\alpha(2\alpha^{*}-\alpha)}{4(1-\alpha^{*})}\frac{\tau^{*}_{e}}{\tau^{*}_{s}}+\bigg{\{}\frac{\gamma}{4}-\frac{{\sigma^{*}}^{2}\tau^{*}_{e}}{\mu^{*}}\frac{(1-\alpha^{*})\gamma}{2(\gamma+1)}
(αα)γ2(1α)(αα)2γ4(1α)2+σ2τeμ(αγ+α)γ+1\displaystyle-\frac{(\alpha^{*}-\alpha)\gamma}{2(1-\alpha^{*})}-\frac{(\alpha^{*}-\alpha)^{2}\gamma}{4(1-\alpha^{*})^{2}}+\frac{{\sigma^{*}}^{2}\tau^{*}_{e}}{\mu^{*}}\frac{(\alpha^{*}\gamma+\alpha)}{\gamma+1}
σ2τeμ(α2γ+2ααα2)2(1α)(γ+1)}+O(τs/τe),\displaystyle-\frac{{\sigma^{*}}^{2}\tau^{*}_{e}}{\mu^{*}}\frac{({\alpha^{*}}^{2}\gamma+2\alpha\alpha^{*}-\alpha^{2})}{2(1-\alpha^{*})(\gamma+1)}\bigg{\}}+O(\tau^{*}_{s}/\tau^{*}_{e}),

from which we obtain α^=α+O(τs/τe)\hat{\alpha}=\alpha^{*}+O(\tau^{*}_{s}/\tau^{*}_{e}), i.e., the reproduction ratio can be estimated with a potential small error of the order of τs/τe\tau^{*}_{s}/\tau^{*}_{e}. In the limit of τs/τe1\tau^{*}_{s}/\tau^{*}_{e}\ll 1, the free energy evaluated at α=α^\alpha=\hat{\alpha} is given as a function of γ\gamma as

F(α^,γ)=γ4σ2τeμ(1α)γ2(γ+1),F(\hat{\alpha},\gamma)=\frac{\gamma}{4}-\frac{{\sigma^{*}}^{2}\tau^{*}_{e}}{\mu^{*}}\frac{(1-\alpha^{*})\gamma}{2(\gamma+1)}, (70)

which has a minimum at γ=0\gamma=0 if

σ2τe/μ<12(1α).{\sigma^{*}}^{2}\tau^{*}_{e}/\mu^{*}<\frac{1}{2(1-\alpha^{*})}. (71)

C.2 The Bayesian rate estimator

The free energy for the Bayesian rate estimator is obtained by setting α=0\alpha=0 in Eq. (68) as

Fp(γ)=F(α=0,γ)\displaystyle F_{\rm p}(\gamma)=F(\alpha=0,\gamma) (72)
=\displaystyle= γ4α(2α)βsγ4(1α)(γ+(1α)βs)\displaystyle\frac{\gamma}{4}-\frac{\alpha^{*}(2-\alpha^{*})\beta^{*}_{s}\gamma}{4(1-\alpha^{*})(\gamma+(1-\alpha^{*})\beta^{*}_{s})}
σ2τeμα(2α)βs(1+γ+(1α)βs)γ2(γ+(1α)βs)(1+(1α)βs)(γ+1)\displaystyle-\frac{{\sigma^{*}}^{2}\tau^{*}_{e}}{\mu^{*}}\frac{\alpha^{*}(2-\alpha^{*})\beta^{*}_{s}(1+\gamma+(1-\alpha^{*})\beta^{*}_{s})\gamma}{2(\gamma+(1-\alpha^{*})\beta^{*}_{s})(1+(1-\alpha^{*})\beta^{*}_{s})(\gamma+1)}
σ2τeμ(1α)γ2(γ+1).\displaystyle-\frac{{\sigma^{*}}^{2}\tau^{*}_{e}}{\mu^{*}}\frac{(1-\alpha^{*})\gamma}{2(\gamma+1)}.

The conditions under which the free energy (72) has a minimum at γ0\gamma\neq 0 are analytically derived into two particular cases. For the case of α=0\alpha^{*}=0, in which data is generated by the (inhomogeneous) Poisson process, Eq. (72) becomes

Fp(γ)=γ4σ2τeμγ2(γ+1),F_{\rm p}(\gamma)=\frac{\gamma}{4}-\frac{{\sigma^{*}}^{2}\tau^{*}_{e}}{\mu^{*}}\frac{\gamma}{2(\gamma+1)}, (73)

which corresponds to the free energy derived in Koyama2007 . The condition for γ^0\hat{\gamma}\neq 0 is then obtained as σ2τe/μ>1/2{\sigma^{*}}^{2}\tau^{*}_{e}/\mu^{*}>1/2.

For the other case, in which the data is generated by the homogenous Hawkes process (σ2τe/μ=0{\sigma^{*}}^{2}\tau^{*}_{e}/\mu^{*}=0), the free energy (72) becomes

Fp(γ)\displaystyle F_{\rm p}(\gamma) =\displaystyle= γ((1α)γ+(2α24α+1)βs)4(1α)(γ+(1α)βs),\displaystyle\frac{\gamma((1-\alpha^{*})\gamma+(2{\alpha^{*}}^{2}-4\alpha^{*}+1)\beta^{*}_{s})}{4(1-\alpha^{*})(\gamma+(1-\alpha^{*})\beta^{*}_{s})}, (74)

from which the condition for γ^0\hat{\gamma}\neq 0 is derived as α>αc=11/2\alpha^{*}>\alpha_{c}=1-1/\sqrt{2}.

C.3 The Hawkes decoder with an incorrect self-excitation kernel

We analyze the free energy for the Hawkes decoder with an incorrect self-excitation kernel (τsτs\tau_{s}\neq\tau^{*}_{s}) in the limit of τs/τe1\tau_{s}/\tau^{*}_{e}\ll 1 while τs/τe\tau^{*}_{s}/\tau^{*}_{e} remains finite (which includes the cases of τeτs\tau^{*}_{e}\gg\tau^{*}_{s} and τsτs\tau_{s}\ll\tau^{*}_{s}). The free energy (68) is asymptotically expanded with respect to τs/τe\tau_{s}/\tau^{*}_{e} as

F(α,γ)=α24τeτs+Fp(α,γ)+O(τs/τe),F(\alpha,\gamma)=\frac{\alpha^{2}}{4}\frac{\tau^{*}_{e}}{\tau_{s}}+F_{\rm p}(\alpha,\gamma)+O(\tau_{s}/\tau^{*}_{e}), (75)

where Fp(α,γ)F_{\rm p}(\alpha,\gamma) represents a collection of constant terms that satisfy Fp(α=0,γ)=Fp(γ)F_{\rm p}(\alpha=0,\gamma)=F_{\rm p}(\gamma). From Eq. (75) we obtain α^=O(τs/τe)\hat{\alpha}=O(\tau_{s}/\tau^{*}_{e}), i.e., the estimated reproduction ratio becomes infinitesimal. In this limit, the free energy is identical to that of the Bayesian rate decoder, Fp(γ)F_{\rm p}(\gamma).

ACKNOWLEDGMENTS

We thank Hidetaka Manabe for his technical assistance in developing a web-application program. S.S. is supported by JSPS KAKENHI Grant number 26280007, JST CREST Grant Number JPMJCR1304, and the New Energy and Industrial Technology Development Organization (NEDO).

References

  • (1) H. W. Hethcote. The mathematics of infectious diseases. SIAM review, Vol. 42, No. 4, pp. 599–653, 2000.
  • (2) M. J. Keeling and P. Rohani. Modeling infectious diseases in humans and animals. Princeton University Press, 2011.
  • (3) A. Vespignani. Modelling dynamical processes in complex socio-technical systems. Nature physics, Vol. 8, No. 1, p. 32, 2012.
  • (4) D. Brockmann and D. Helbing. The hidden geometry of complex, network-driven contagion phenomena. Science, Vol. 342, No. 6164, pp. 1337–1342, 2013.
  • (5) R. Pastor-Satorras, C. Castellano, P. Van Mieghem, and A. Vespignani. Epidemic processes in complex networks. Reviews of Modern Physics, Vol. 87, No. 3, p. 925, 2015.
  • (6) W. Goffman and V. A. Newill. Generalization of epidemic theory. Nature, Vol. 204, No. 4955, pp. 225–228, 1964.
  • (7) M. Kitsak, L. K. Gallos, S. Havlin, F. Liljeros, L. Muchnik, H. E. Stanley, and H. A. Makse. Identification of influential spreaders in complex networks. Nature physics, Vol. 6, No. 11, p. 888, 2010.
  • (8) G. O. Mohler, M. B. Short, P. J. Brantingham, F. P. Schoenberg, and G. E. Tita. Self-exciting point process modeling of crime. Journal of the American Statistical Association, Vol. 106, No. 493, pp. 100–108, 2011.
  • (9) E. Lewis, G. Mohler, P. J. Brantingham, and A. L. Bertozzi. Self-exciting point process models of civilian deaths in iraq. Security Journal, Vol. 25, No. 3, pp. 244–264, Jul 2012.
  • (10) Y. Ogata. Statistical models for earthquake occurrences and residual analysis for point processes. Journal of the American Statistical Association, Vol. 83, No. 401, pp. 9–27, 1988.
  • (11) A. Helmstetter and D. Sornette. Predictability in the epidemic-type aftershock sequence model of interacting triggered seismicity. Journal of Geophysical Research: Solid Earth, Vol. 108, No. B10, p. 2482, 2003.
  • (12) M. Golosovsky and S. Solomon. Stochastic dynamical model of a growing citation network based on a self-exciting point process. Physical Review Letters, Vol. 109, p. 098701, Aug 2012.
  • (13) R. Crane and D. Sornette. Robust dynamic classes revealed by measuring the response function of a social system. Proceedings of the National Academy of Sciences, Vol. 105, No. 41, pp. 15649–15653, 2008.
  • (14) K. Lerman and R. Ghosh. Information contagion: An empirical study of the spread of news on digg and twitter social networks. ICWSM, Vol. 10, pp. 90–97, 2010.
  • (15) D. M. Romero, B. Meeder, and J. Kleinberg. Differences in the mechanics of information diffusion across topics: idioms, political hashtags, and complex contagion on twitter. pp. 695–704. ACM, 2011.
  • (16) Y. Aït-Sahalia, J. Cacho-Diaz, and R. J. A. Laeven. Modeling financial contagion using mutually exciting jump processes. Technical report, 2010.
  • (17) S. J. Hardiman, N. Bercot, and J. P. Bouchaud. Critical reflexivity in financial markets: a Hawkes process analysis. The European Physical Journal B, Vol. 86, No. 10, p. 442, 2013.
  • (18) E. Bacry, I. Mastromatteo, and J. F. Muzy. Hawkes processes in finance. Market Microstructure and Liquidity, Vol. 1, p. 1550005, 2015.
  • (19) A. G. Hawkes. Hawkes processes and their applications to finance: a review. Quantitative Finance, Vol. 18, No. 2, pp. 193–198, 2018.
  • (20) V. Pernice, B. Staude, S. Cardanobile, and S. Rotter. How structure determines correlations in neuronal networks. PLoS Computational Biology, Vol. 7, No. 5, p. e1002059, 2011.
  • (21) P. Reynaud-Bouret, V. Rivoirard, F. Grammont, and C. Tuleau-Malot. Goodness-of-fit tests and nonparametric adaptive estimation for spike train analysis. The Journal of Mathematical Neuroscience, Vol. 4, No. 1, p. 3, 2014.
  • (22) R. E. Kass, S. Amari, K. Arai, E. N. Brown, C. O. Diekman, M. Diesmann, B. Doiron, U. T. Eden, A. L. Fairhall, G. M. Fiddyment, T. Fukai, S. Grün, M. T. Harrison, M. Helias, H. Nakahara, J. Teramae, P. J. Thomas, M. Reimers, J. Rodu, H. G. Rotstein, E. Shea-Brown, H. Shimazaki, S. Shinomoto, B. M. Yu, and M. A. Kramer. Computational neuroscience: Mathematical and statistical perspectives. Annual Review of Statistics and Its Application, Vol. 5, No. 1, pp. 183–214, 2018.
  • (23) F. Gerhard, M. Deger, and W. Truccolo. On the stability and dynamics of stochastic spiking neuron models: Nonlinear Hawkes process and point process GLMs. PLoS Computational Biology, Vol. 13, No. 2, p. e1005390, 2017.
  • (24) E. K. Lipp, A. Huq, and R. R. Colwell. Effects of global climate on infectious disease: the cholera model. Clinical Microbiology Reviews, Vol. 15, No. 4, pp. 757–770, 2002.
  • (25) M. Argollo de Menezes and A. L. Barabási. Fluctuations in network dynamics. Physical Review Letters, Vol. 92, No. 2, p. 028701, 2004.
  • (26) D. M. J. Lazer, M. A. Baum, Y. Benkler, A. J. Berinsky, K. M. Greenhill, F. Menczer, M. J. Metzger, B. Nyhan, G. Pennycook, D. Rothschild, et al. The science of fake news. Science, Vol. 359, No. 6380, pp. 1094–1096, 2018.
  • (27) T. Omi, Y. Hirata, and K. Aihara. Hawkes process model with a time-dependent background rate and its application to high-frequency financial data. Physical Review E, Vol. 96, No. 1, p. 012303, 2017.
  • (28) T. Zaman, E. B. Fox, and E. T. Bradlow. A Bayesian approach for predicting the popularity of tweets. The Annals of Applied Statistics, Vol. 8, No. 3, pp. 1583–1611, 2014.
  • (29) J. Cheng, L. Adamic, P. A. Dow, J. M. Kleinberg, and J. Leskovec. Can cascades be predicted? In WWW’ 14, pp. 925–936, 2014.
  • (30) P. A. Dow, L. A. Adamic, and A. Friggeri. The anatomy of large facebook cascades. In ICWSM’ 13, pp. 145–154, 2013.
  • (31) S. Petrovic, M. Osborne, and V. Lavrenko. Rt to win! predicting message propagation in twitter. In ICWSM’ 11, pp. 586–589, 2011.
  • (32) T. Aoki, T. Takaguchi, R. Kobayashi, and R. Lambiotte. Input-output relationship in social communications characterized by spike train analysis. Physical Review E, Vol. 94, No. 4, p. 042313, 2016.
  • (33) C. Cattuto, V. Loreto, and L. Pietronero. Semiotic dynamics and collaborative tagging. Proceedings of the National Academy of Sciences, Vol. 104, No. 5, pp. 1461–1464, 2007.
  • (34) D. Rybski, S. V. Buldyrev, S. Havlin, F. Liljeros, and H. A. Makse. Scaling laws of human interaction activity. Proceedings of the National Academy of Sciences, Vol. 106, No. 31, pp. 12640–12645, 2009.
  • (35) K. Fujita, A. Medvedev, S. Koyama, R. Lambiotte, and S. Shinomoto. Identifying exogenous and endogenous activity in social media. Phys. Rev. E, Vol. 98, p. 052304, Nov 2018.
  • (36) A. G. Hawkes. Spectra of some self-exciting and mutually exciting point processes. Biometrika, Vol. 58, No. 1, pp. 83–90, 1971.
  • (37) D. L. Snyder and M. I. Miller. Random Point Processes in Time and Space. Springer, 2nd edition, 1991.
  • (38) D. Daley and D. Vere-Jones. An Introduction to the Theory of Point Processes, Vol. 1. Springer, 2nd edition, 2002.
  • (39) S. Koyama and S. Shinomoto. Histogram bin width selection for time-dependent poisson processes. Journal of Physics A: Mathematical and General, Vol. 37, No. 29, pp. 7255–7265, jul 2004.
  • (40) S. Koyama, T. Shimokawa, and S. Shinomoto. Phase transitions in the estimation of event rate: a path integral analysis. Journal of Physics A: Mathematical and Theoretical, Vol. 40, No. 20, pp. F383–F390, apr 2007.
  • (41) T. Onaga and S. Shinomoto. Bursting transition in a linear self-exciting point process. Phys. Rev. E, Vol. 89, p. 042817, Apr 2014.
  • (42) T. Onaga and S. Shinomoto. Emergence of event cascades in inhomogeneous networks. Scientific Reports, Vol. 6, pp. 33321 EP –, Sep 2016. Article.
  • (43) Y. Ogata. The asymptotic behavior of maximum likelihood estimators for stationary point processes. Annals of the Institute of Statistical Mathematics, Vol. 30, No. 2, pp. 243–261, 1978.
  • (44) T. Ozaki. Maximum likelihood estimation of Hawkes’ self-exciting point processes. Annals of the Institute of Statistical Mathematics, Vol. 31, No. 1, pp. 145–155, 1979.
  • (45) A. Veen and F. P. Schoenberg. Estimation of space-time branching process models in seismology using an EM-type algorithm. Journal of the American Statistical Association, Vol. 103, No. 482, pp. 614–624, 2008.
  • (46) J. G. Rasmussen. Bayesian inference for Hawkes processes. Methodology and Computing in Applied Probability, Vol. 15, No. 3, pp. 623–642, Sep 2013.
  • (47) J. Da Fonseca and R. Zaatour. Hawkes process: Fast calibration, application to trade clustering, and diffusive limit. Journal of Futures Markets, Vol. 34, No. 6, pp. 548–579, 2014.
  • (48) J. Chen, A. G. Hawkes, E. Scalas, and M. Trinh. Performance of information criteria for selection of Hawkes process models of financial data. Quantitative Finance, Vol. 18, No. 2, pp. 225–235, 2018.
  • (49) E. Lewis and G. Mohler. A nonparametric EM algorithm for multiscale Hawkes processes. preprint, 2011.
  • (50) E. Bacry, K. Dayri, and J. F. Muzy. Non-parametric kernel estimation for symmetric Hawkes processes. application to high frequency financial data. The European Physical Journal B, Vol. 85, p. 157, 2012.
  • (51) E. Bacry and J. F. Muzy. First- and second-order statistics characterization of Hawkes processes and non-parametric estimation. IEEE Transactions on Information Theory, Vol. 62, No. 4, pp. 2184–2202, April 2016.
  • (52) E. Bacry, T. Jaisson, and J. F. Muzy. Estimation of slowly decreasing Hawkes kernels: application to high-frequency order book dynamics. Quantitative Finance, Vol. 16, No. 8, pp. 1179–1201, 2016.
  • (53) P. Reynaud-Bouret and S. Schbath. Adaptive estimation for Hawkes processes; application to genome analysis. The Annals of Statistics, Vol. 38, No. 5, pp. 2781–2822, 2010.
  • (54) N. R. Hansen, P. Reynaud-Bouret, and V. Rivoirard. Lasso and probabilistic inequalities for multivariate point processes. Bernoulli, Vol. 21, No. 1, pp. 83–143, 2015.
  • (55) K. Zhou, H. Zha, and L. Song. Learning triggering kernels for multi-dimensional Hawkes processes. In Sanjoy Dasgupta and David McAllester, editors, Proceedings of the 30th International Conference on Machine Learning, Vol. 28 of Proceedings of Machine Learning Research, pp. 1301–1309, Atlanta, Georgia, USA, 17–19 Jun 2013. PMLR.
  • (56) H. Xu, M. Farajtabar, and H. Zha. Learning granger causality for Hawkes processes. In Maria Florina Balcan and Kilian Q. Weinberger, editors, Proceedings of The 33rd International Conference on Machine Learning, Vol. 48 of Proceedings of Machine Learning Research, pp. 1717–1726, New York, New York, USA, 20–22 Jun 2016. PMLR.
  • (57) M. Kirchner. Hawkes and INAR(\infty) processes. Stochastic Processes and their Applications, Vol. 126, No. 8, pp. 2494–2525, 2016.
  • (58) M. Kirchner. An estimation procedure for the Hawkes process. Quantitative Finance, Vol. 17, No. 4, pp. 571–595, 2017.
  • (59) F. Chen and P. Hall. Nonparametric estimation for self-exciting point processes–a parsimonious approach. Journal of Computational and Graphical Statistics, Vol. 25, No. 1, pp. 209–224, 2016.
  • (60) S. Clinet and Y. Potiron. Statistical inference for the doubly stochastic self-exciting process. Bernoulli, Vol. 24, No. 4B, pp. 3469–3493, 2018.
  • (61) F. Chen and P. Hall. Inference for a nonstationary self-exciting point process with an application in ultra-high frequency financial data modeling. Journal of Applied Probability, Vol. 50, No. 4, pp. 1006–1024, 2013.
  • (62) R. Kobayashi and R. Lambiotte. Tideh: Time-dependent Hawkes process for predicting retweet dynamics. In ICWSM’ 2016, pp. 191–200, 2016.
  • (63) M. West and J. Harrison. Bayesian Forecasting and Dynamic Models. Springer, 2nd edition, 1997.
  • (64) J. Durbin and S. Koopman. Time series analysis by state space methods. Oxford University Press, 2001.
  • (65) S. Koyama, L. C. Perez-Bolde, C. R. Shalizi, and R. E. Kass. Approximate methods for state-space models. Journal of American Statistical Association, Vol. 105, pp. 170–180, 2010.
  • (66) S. Koyama and L. Paninski. Efficient computation of the maximum a posteriori path and parameter estimation in integrate-and-fire and more general state-space models. Journal of Computational Neuroscience, Vol. 29, pp. 89–105, 2010.
  • (67) S. Koyama, U. T. Eden, E. N. Brown, and R. E. Kass. Bayesian decoding of neural spike trains. Annals of the Institute of Statistical Mathematics, Vol. 62, pp. 37–59, 2010.
  • (68) W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, 2nd edition, 1992.
  • (69) S. Coleman. Aspects of Symmetry, chapter 7. Cambridge University Press, 1988.
  • (70) H. Kleinert. Path Integrals in Quantum Mechanics, Statistics, Polymer Physics, and Financial Markets. World Scientific Publishing Company, 5th edition, 2009.
  • (71) I. M. Gelfand and A. M. Yaglom. Integration in functional spaces and applications in quantum physics. Journal of Mathematical Physics, Vol. 1, pp. 48–69, 1960.