
Simultaneous clustering and estimation of additive shape invariant models for recurrent event data

Zitong Zhang (zztzhang@ucdavis.edu), Department of Statistics, University of California, Davis, and Shizhe Chen (szdchen@ucdavis.edu), Department of Statistics, University of California, Davis
Abstract

Technological advancements have enabled the recording of spiking activities from large neuron ensembles, presenting an exciting yet challenging opportunity for statistical analysis. This project considers the challenges arising from a common type of neuroscience experiment, where randomized interventions are applied over the course of each trial. The objective is to identify groups of neurons with unique stimulation responses and to estimate these responses. The observed data, however, comprise superpositions of neural responses to all stimuli, further complicated by varying response latencies across neurons. We introduce a novel additive shape invariant model that simultaneously accommodates multiple clusters, additive components, and unknown time shifts. We establish conditions for the identifiability of model parameters, offering guidance for the design of future experiments. We examine the properties of the proposed algorithm through simulation studies, and apply the proposed method to neural data collected in mice.

1 Introduction

Recent technological advancements have greatly improved our ability to record neural activities with high spatiotemporal resolution and over large volumes. Recording devices such as Neuropixels (Jun et al., 2017; Steinmetz et al., 2018, 2021) enable scientists to record neurons across multiple brain areas while subjects engage in behavioral tasks. The growing amount of neural data facilitates the exploration of how neural firing patterns encode information, but also presents challenges to existing statistical methods. In this paper, we consider the problem of identifying subgroups of neurons that share similar responses to stimuli. In a typical experiment, a subject is exposed to a series of stimuli (e.g., visual cue, auditory cue, reward) over the course of a trial, which means that the recorded spike train is a superposition of neural responses to these stimuli (Benucci et al., 2009; Capilla et al., 2011; Orhan and Ma, 2015). It is challenging to disentangle the response to each stimulus given the stochastic nature of neural activities. Moreover, even if two neurons share the same response to a stimulus, they might respond at different times due to differences in their response latencies (Oram et al., 2002; Levakova et al., 2015; Lee et al., 2020), making the estimation of shared neural responses more challenging. Lastly, even within the same area of the brain, we cannot assume that neurons share the same responses (Molyneaux et al., 2007; Lake et al., 2016; Cembrowski and Spruston, 2019). In light of these intricacies, the primary objective of this study is to develop statistical methods for recurrent events that simultaneously address three key tasks: (i) decompose the neural firing activities into their constituent components, (ii) align neural firing patterns across neurons, and (iii) cluster neurons based on their similar firing patterns. As a concrete example, we consider the representative experiment by Steinmetz et al. (2019), where mice were exposed to a series of visual and auditory stimuli, and thousands of neurons were recorded throughout the experiment. Figure 1 displays four neurons selected by our method that exhibit distinct firing patterns in response to the stimuli.

[Figure 1 about here. Panels: Neurons 1-4; x-axis: time from visual stimulus onset (s); y-axis: trials; markers: visual and auditory stimulus onsets.]
Figure 1: Activities of four example neurons from Steinmetz et al. (2019). The four neurons are all from the midbrain region of the mouse brain, and their spikes are shown as black dots. Each panel corresponds to a single neuron, where the x-axis represents time since visual stimulus onset and the y-axis represents experiment trials. Trials are aligned by the visual stimulus onset time, shown as orange dots, and ordered by the auditory stimulus onset time, shown as blue dots.

However, existing methods are inadequate for the challenges at hand. Shape invariant models (Beath, 2007; Bigot and Gadat, 2010; Vimond, 2010; Bigot et al., 2013; Bigot and Gendre, 2013; Bontemps and Gadat, 2014) have been studied extensively to align a collection of curves with identical means up to unknown time shifts. The combination of shape invariant models and mixture models has led to the development of techniques for simultaneously aligning and clustering curves (Chudova et al., 2003; Gaffney and Smyth, 2004; Liu and Yang, 2009; Lu and Lou, 2019). In a related vein, Sangalli et al. (2010) propose an approach to simultaneously align and cluster curves that share a common shape but are subject to non-linear distortions in temporal alignment. Nonetheless, all of these methods are limited in their capacity to decompose curves comprising multiple components.

A class of problems related to decomposing superimposed curves has been studied in the context of functional principal component analysis (Yao et al., 2005; Morris and Carroll, 2006; Di et al., 2009; Crainiceanu et al., 2009). Functional principal component analysis aims to deconstruct functional data into orthogonal components that effectively account for the variation within the dataset. Researchers have applied functional principal component analysis to point processes through the modeling of intensity functions (Wu et al., 2013; Xu et al., 2020). Furthermore, functional principal component analysis models have been combined with mixture models to simultaneously cluster and decompose curves (Chiou and Li, 2007; Bouveyron and Jacques, 2011; Jacques and Preda, 2013; Yin et al., 2021). However, the orthogonal components derived from these methods may not match the components of our primary interest. Indeed, the congruence between the outcomes of these methods and our proposed approach is limited to specific scenarios (see Section 2.3). Moreover, unknown time shifts have not been studied in functional principal component analysis.

In this paper, we propose an additive shape invariant mixture model capable of simultaneously decomposing, aligning, and clustering recurrent event data, such as neural firings. The rest of this paper is organized as follows. In Section 2, we introduce an additive shape invariant model for simultaneous decomposition and alignment. Section 3 expands upon the additive shape invariant model by integrating simultaneous clustering. In Section 4, we present a computationally efficient algorithm. The performance of the proposed method is assessed through simulation experiments in Section 5. In Section 6, we use the proposed method to study neural firing patterns in mice. Finally, we discuss potential future research directions in Section 7.

2 Additive shape invariant model

2.1 Model

Consider a set of recurrent events from repeated measurements of $n$ subjects, $\mathcal{O}\equiv\{\{t_{i,r,j}\}_{j=1,\ldots,N_{i,r}(T)}: 0<t_{i,r,1}<\cdots<t_{i,r,N_{i,r}(T)}<T,\ i=1,\ldots,n,\ r=1,\ldots,R\}$, where $t_{i,r,j}$ denotes the time of the $j$-th event of the $i$-th subject in the $r$-th observation, $T$ denotes the duration of each observation, and $N_{i,r}(T)$ denotes the total number of events associated with the $i$-th subject and the $r$-th observation. For $i\in[n]$ and $r\in[R]$, we adopt the definition of a counting process (see, e.g., Daley and Vere-Jones, 2003) as $N_{i,r}(t)\equiv\sum_{j=1}^{N_{i,r}(T)}\mathbf{1}(t_{i,r,j}\leq t)$ for $t\in[0,T]$. We use the intensity function, i.e., $\lambda_{i,r}(t)\equiv\mathbb{E}\{\mathrm{d}N_{i,r}(t)/\mathrm{d}t\}$, to characterize each counting process.

We start with a model that tackles the first two of the aforementioned tasks: decomposition and alignment. To be specific, we assume that the true underlying intensities are formed through the superposition of multiple components, and these intensity components exhibit uniformity across subjects, differing only in temporal shifts. We arrive at the following model

\lambda_{i,r}(t) = a + \sum_{m\in[M]} S^{v_{i,m}+w^*_{r,m}} f_m(t),    (1)

where $a\in[0,\infty)$ represents the baseline intensity, $M\in\mathbb{N}^{+}$ is the number of components, $v_{i,m}\in[0,V]$ represents the subject-specific time shift associated with the $i$-th subject and the $m$-th component, $w^*_{r,m}\in[0,W]$ represents the known observation-specific time shift of the $m$-th component in the $r$-th observation, $S^{v}$ is the shift operator defined as $S^{v}x(t)\equiv x(t-v)$, and $f_m\in\mathcal{F}$ represents the $m$-th intensity component. Here $V,W\in(0,T)$, and $\mathcal{F}\equiv\{f\in L^2(\mathbb{R}): f(t)=0 \text{ for } t\in\mathbb{R}\setminus(0,T_0)\}$ with $T_0\in(0,T)$. For further clarity, a graphical representation of the model in (1) is provided in Figure 2.

[Figure 2 about here. Panel (a): components $f_1$ and $f_2$ on $[0,T]$; panel (b): shifted components $S^{v_{i,m}+w^*_{r,m}}f_m$ atop the baseline intensity $a$; panel (c): the resulting intensity $\lambda_{i,r}$ and one realization of the events $t_{i,r,j}$.]
Figure 2: Graphical representation of the additive shape invariant model with two components. Panel (a) shows the intensity components shared across subjects and observations. Panel (b) shows the intensity components associated with subject $i$ and observation $r$, where the $m$-th component is shifted by $v_{i,m}+w^*_{r,m}$. Panel (c) shows the expected intensity of subject $i$ in observation $r$, and one realization of the point process $\{t_{i,r,j}: j=1,\ldots,N_{i,r}(T)\}$.

To provide a concrete example for the model in (1), consider the experiment in Steinmetz et al. (2019), where $N_{i,r}(t)$ represents the recorded spike train of the $i$-th neuron in the $r$-th trial, and $\lambda_{i,r}(t)$ denotes the corresponding firing rate (see, for instance, Figure 1). The term $a$ and the components $\{f_m(\cdot): m=1,2\}$ can be interpreted as the spontaneous firing rate and the neural responses elicited by the visual gratings and the auditory cue, respectively. The subject-specific time shift $v_{i,m}$ corresponds to the response latency of the $i$-th neuron in response to the $m$-th stimulus. The observation-specific time shift $w^*_{r,m}$ corresponds to the occurrence time of the $m$-th stimulus in the $r$-th trial, which is randomly generated by the experimenters.
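To make the data-generating mechanism in (1) concrete, the sketch below simulates event times for a single subject-observation pair by thinning a homogeneous Poisson process. It is an illustration only: the component shape, parameter values, and function names are our own choices, not part of the model specification.

import numpy as np

rng = np.random.default_rng(0)

def f1(t):
    # an illustrative intensity component supported on (0, 0.5)
    return np.where((t > 0) & (t < 0.5), 2.0 - 2.0 * np.cos(4.0 * np.pi * t), 0.0)

def intensity(t, a, shifts, components):
    # lambda_{i,r}(t) = a + sum_m f_m(t - (v_{i,m} + w*_{r,m}))
    return a + sum(f(t - u) for f, u in zip(components, shifts))

def simulate_events(T, a, shifts, components, lam_max):
    # thinning: propose from a rate-lam_max Poisson process,
    # accept each candidate with probability lambda(t) / lam_max
    cand = np.sort(rng.uniform(0.0, T, rng.poisson(lam_max * T)))
    keep = rng.uniform(0.0, lam_max, cand.size) < intensity(cand, a, shifts, components)
    return cand[keep]

# one (i, r) pair with M = 1 and total shift v_{i,1} + w*_{r,1} = 0.7
events = simulate_events(T=2.5, a=20.0, shifts=[0.7], components=[f1], lam_max=25.0)

Here lam_max must dominate the intensity everywhere; with the baseline 20 and a component peaking at 4, any value of at least 24 works.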

2.2 Identifiability

We denote the collection of model parameters as $\boldsymbol{\theta}_0\equiv(a,\mathbf{f},\mathbf{v})\in\Theta_0$, where $\mathbf{f}\equiv(f_m)_{m\in[M]}$, $\mathbf{v}\equiv(v_{i,m})_{i\in[n],m\in[M]}$, and $\Theta_0\equiv\{(a,\mathbf{f},\mathbf{v}): a\in[0,\infty), \mathbf{f}\in\mathcal{F}^M, \mathbf{v}\in[0,V]^{n\times M}\}$. In addition, for any $N_{i,r}(t)$, we denote its conditional intensity given the observation-specific time shifts as $\lambda_{\boldsymbol{\theta}_0,i}(t,\mathbf{w}^*)\equiv\mathbb{E}_{\boldsymbol{\theta}_0}\{\mathrm{d}N_{i,r}(t)/\mathrm{d}t\mid \mathbf{w}^*_r=\mathbf{w}^*\}$, where $\mathbf{w}^*_r\equiv(w^*_{r,m})_{m\in[M]}$. We further denote $\boldsymbol{\lambda}_{\boldsymbol{\theta}_0}(t,\mathbf{w}^*)\equiv(\lambda_{\boldsymbol{\theta}_0,i}(t,\mathbf{w}^*))_{i\in[n]}$. In this context, identifiability pertains to the injectivity of the mapping $\boldsymbol{\theta}_0\mapsto\boldsymbol{\lambda}_{\boldsymbol{\theta}_0}(\cdot,\cdot)$ for $\boldsymbol{\theta}_0\in\Theta_0$.

With these notations, we can formally present the identifiability results in Proposition 1.

Proposition 1.

Suppose that the following assumptions hold.

(A1) $T\geq T_0+V+W$.

(A2) The matrix $\mathbb{E}[\overline{\boldsymbol{\eta}^*(\xi)}\,{\boldsymbol{\eta}^*(\xi)}^{\top}]$ is invertible for $\xi\in\mathbb{R}\setminus\{0\}$, where $\boldsymbol{\eta}^*(\xi)\equiv(\eta_m)_{m\in[M]}$, $\eta_m\equiv\exp\{-\mathrm{j}2\pi\xi w^*_m\}$, $\mathrm{j}$ denotes the imaginary unit, and $\overline{\boldsymbol{\eta}^*(\xi)}$ denotes the complex conjugate of $\boldsymbol{\eta}^*(\xi)$.

Let $(a^*,\mathbf{f}^*,\mathbf{v}^*)\in\Theta_0$ denote the true parameters. Then the following statements hold.

(P1) For $m\in[M]$, the intensity component $f^*_m$ is identifiable up to a time shift.

(P2) For $m\in[M]$ such that $f^*_m$ is non-zero on a set of positive measure, the subject-specific time shifts $\{v^*_{i,m}: i\in[n]\}$ are identifiable up to an additive constant.

(P3) The baseline intensity $a^*$ is identifiable.

Assumption A1 posits that the observation duration is sufficiently long to avoid censoring. Given A1, the model in (1) can be formulated as a linear regression model in the frequency domain, where the response variables are determined by the $N_{i,r}$'s, the explanatory variables depend on the $w^*_{r,m}$'s, and the regression coefficients are functions of the $f_m$'s and the $v_{i,m}$'s. Within the context of linear regression, the significance of Assumption A2 becomes straightforward: it effectively assumes no collinearity among the explanatory variables. We note that Assumption A2 is satisfied if either (i) the observation-specific time shifts are randomized, i.e., $\{w^*_m: m=1,2,\ldots,M\}$ are independent random variables with non-zero variance, or (ii) the gaps between the observation-specific time shifts are randomized, i.e., $w^*_m = w^*_{m-1}+\delta_{m-1}$ for $m=2,\ldots,M$, where $\{\delta_m: m\in[M-1]\}$ are independent random variables with non-zero variance. A detailed proof of Proposition 1 is presented in Section A of the Supplementary Materials.
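Assumption A2 can be checked numerically for a candidate randomization scheme before running an experiment. The sketch below is a Monte Carlo illustration under an assumed uniform randomization of the $w^*_m$'s (scheme (i) above, with illustrative ranges): it approximates $\mathbb{E}[\overline{\boldsymbol{\eta}^*(\xi)}\,{\boldsymbol{\eta}^*(\xi)}^{\top}]$ and reports its smallest eigenvalue, which should stay bounded away from zero.

import numpy as np

rng = np.random.default_rng(1)
M, n_mc = 2, 100_000
# scheme (i): independent uniform observation-specific time shifts
w = np.column_stack([rng.uniform(0.0, 0.1, n_mc),
                     rng.uniform(0.8, 0.9, n_mc)])   # draws of (w*_1, ..., w*_M)

for xi in [0.5, 1.0, 2.0]:
    eta = np.exp(-2j * np.pi * xi * w)               # rows are draws of eta*(xi)
    G = (eta.conj()[:, :, None] * eta[:, None, :]).mean(axis=0)  # E[conj(eta) eta^T]
    print(xi, np.linalg.eigvalsh(G).min())           # > 0 indicates invertibility

A smallest eigenvalue near zero at some frequency would flag a design under which the intensity components cannot be separated.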

2.3 Connections with existing models

In special scenarios, the model presented in (1) has connections with three distinct branches of research. Firstly, in the special case where $M=1$ and $R=1$, the model in (1) simplifies as follows:

\lambda_{i,1}(t) = a + S^{v_{i,1}+w^*_{1,1}} f_1(t) \equiv S^{v_{i,1}} g(t),    (2)

where $g(t)\equiv a+S^{w^*_{1,1}}f_1(t)$. This model has been investigated in the realm of shape invariant models (see, e.g., Beath, 2007; Bigot and Gadat, 2010; Vimond, 2010; Bigot et al., 2013; Bigot and Gendre, 2013; Bontemps and Gadat, 2014). Consequently, the proposed model can be seen as a generalization of shape invariant models that incorporates additive components. Various assumptions have been proposed to ensure the identifiability of shape invariant models, including the availability of repeated observations (Bigot and Gendre, 2013), partial knowledge of the time shifts (Bigot and Gadat, 2010; Bigot et al., 2013), or knowledge of the noise distribution (Bontemps and Gadat, 2014). For the proposed additive shape invariant model, where $M$ is allowed to exceed two, a combination of repeated observations and partial knowledge of the time shifts suffices to ensure model identifiability, as elucidated in Proposition 1.

Secondly, in the scenario where the intensity components $\{S^{v_{i,m}+w^*_{r,m}}f_m(t): m\in[M]\}$ have non-overlapping supports for all $i\in[n]$ and $r\in[R]$, the model in (1) can be expressed as:

\lambda_{i,r}(t) = a + \sum_{m\in[M]} S^{v_{i,m}+w^*_{r,m}} f_m(t) \equiv g\{h_{i,r}(t)\},    (3)

where $g(s)\equiv a+\sum_{m\in[M]}f_m(s)$ and $h_{i,r}(t)\equiv\sum_{m\in[M]}\{t-(v_{i,m}+w^*_{r,m})\}\times\mathbf{1}[f_m\{t-(v_{i,m}+w^*_{r,m})\}\neq 0]$. The reformulated model in (3) means that the $\lambda_{i,r}$'s are identical subject to unknown time warping functions (i.e., the $h_{i,r}(t)$'s). The task of estimating time warping functions has been explored in the domain of curve registration (Kneip and Gasser, 1992; Ramsay and Li, 1998; James, 2007; Telesca and Inoue, 2008; Cheng et al., 2016).

Lastly, in the scenario where the variances of the $v_{i,m}$'s and the $w^*_{r,m}$'s are both close to zero, the model in (1) can be approximated via a Taylor expansion as:

\lambda_{i,r}(t) \approx \mu(t) + \sum_{m\in[M]} \zeta_{i,r,m}\, \psi_m(t),    (4)

where $\mu(t)\equiv a+\sum_{m\in[M]}f_m(t-\mathbb{E}u_{i,r,m})$, $u_{i,r,m}\equiv v_{i,m}+w^*_{r,m}$, $\zeta_{i,r,m}\equiv-(u_{i,r,m}-\mathbb{E}u_{i,r,m})\|Df_m(t-\mathbb{E}u_{i,r,m})\|_t$, $Df_m$ denotes the first-order derivative of $f_m$, and $\psi_m(t)\equiv Df_m(t-\mathbb{E}u_{i,r,m})\|Df_m(t-\mathbb{E}u_{i,r,m})\|_t^{-1}$. We defer the derivation of this approximation to Section C of the Supplementary Materials. When the functions $\{S^{\mathbb{E}u_{i,r,m}}f_m(t): m\in[M]\}$ exhibit non-overlapping supports, the approximate model in (4) corresponds to the models of functional principal component analysis (FPCA) (Yao et al., 2005; Morris and Carroll, 2006; Di et al., 2009; Crainiceanu et al., 2009; Xu et al., 2020).

2.4 Estimation

We consider the case where the $N_{i,r}(t)$'s are Poisson processes. To effectively estimate the parameters in (1), we exploit the two sources of stochasticity of the Poisson processes. Firstly, the number of events over $[0,T]$ for any Poisson process is a random variable that follows a Poisson distribution; in other words, $N_{i,r}(T)\sim\mathrm{Poisson}(\Lambda_{i,r})$, where $\Lambda_{i,r}\equiv\int_0^T\lambda_{i,r}(t)\,\mathrm{d}t$ represents the expected event count. When Assumption A1 holds, we can derive from the model in (1) that $\Lambda_{i,r}=aT+\sum_{m\in[M]}\int_0^T f_m(t)\,\mathrm{d}t$, denoted as $\Lambda$ for simplicity. Secondly, conditioning on $N_{i,r}(T)$, the event times $\{t_{i,r,j}: j=1,\ldots,N_{i,r}(T)\}$ can be regarded as independent and identically distributed random variables. The probability density function of these event times, or the event time distribution, is characterized by $\lambda_{i,r}(t)\Lambda^{-1}=a\Lambda^{-1}+\sum_{m\in[M]}S^{v_{i,m}+w^*_{r,m}}f_m(t)\Lambda^{-1}$.

Accordingly, we estimate $a,\mathbf{f},\mathbf{v}$ through the following reparameterization. Letting $a'\equiv a\Lambda^{-1}$ and $\mathbf{f}'\equiv(f_m\Lambda^{-1})_{m\in[M]}$, we have the following optimization problem:

\hat{a}',\hat{\mathbf{f}}',\hat{\mathbf{v}} \equiv \underset{a',\mathbf{f}',\mathbf{v}}{\arg\min}\ L_1(a',\mathbf{f}',\mathbf{v}) \equiv \underset{a',\mathbf{f}',\mathbf{v}}{\arg\min} \sum_{i\in[n],r\in[R]} \beta_{i,r}\, \frac{1}{T} \left\| \frac{y_{i,r}(t)}{N_{i,r}(T)} - \Big\{ a' + \sum_{m\in[M]} S^{v_{i,m}+w^*_{r,m}} f'_m(t) \Big\} \right\|_t^2,    (5)

where $\|\cdot\|_t$ denotes the $L^2$-norm with respect to $t$, $\beta_{i,r}\equiv N_{i,r}(T)$, $y_{i,r}(t)\equiv\{N_{i,r}(t+\Delta t)-N_{i,r}(t)\}\Delta t^{-1}$, and $\Delta t$ represents an infinitesimally small value. In (5), the objective function $L_1(a',\mathbf{f}',\mathbf{v})$ measures the discrepancy between the empirical and estimated distributions of event times, where $y_{i,r}(t)N_{i,r}(T)^{-1}$ and $\{a'+\sum_{m\in[M]}S^{v_{i,m}+w^*_{r,m}}f'_m(t)\}$ serve as the empirical and estimated distributions of $\{t_{i,r,j}: j=1,\ldots,N_{i,r}(T)\}$, respectively. The term $\beta_{i,r}$ serves as the weight of the counting process $N_{i,r}(t)$; for instance, setting $\beta_{i,r}=N_{i,r}(T)$ assigns equal weight to every event. Finally, we estimate $\Lambda$ using the empirical mean

\hat{\Lambda} \equiv (nR)^{-1} \sum_{i\in[n],r\in[R]} N_{i,r}(T).    (6)

The parameters $a,\mathbf{f}$ can then be recovered from $\hat{\Lambda},\hat{a}',\hat{\mathbf{f}}'$ via $\hat{a}\equiv\hat{a}'\hat{\Lambda}$ and $\hat{\mathbf{f}}\equiv\hat{\mathbf{f}}'\hat{\Lambda}$.
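As a small illustration of the two pieces of stochasticity exploited above, the sketch below computes the empirical mean $\hat{\Lambda}$ in (6) and a binned approximation of the empirical event time distribution $y_{i,r}(t)N_{i,r}(T)^{-1}$ entering $L_1$ in (5); the finite bin width stands in for the infinitesimal $\Delta t$, and the data layout is our own assumption.

import numpy as np

def Lambda_hat(counts):
    # eq. (6): empirical mean of the N_{i,r}(T)'s over all (i, r)
    return float(np.mean(counts))

def empirical_density(event_times, T, n_bins=100):
    # binned y_{i,r}(t) / N_{i,r}(T): a histogram that integrates to one on [0, T]
    hist, _ = np.histogram(event_times, bins=n_bins, range=(0.0, T))
    dt = T / n_bins
    return hist / (len(event_times) * dt)

With these ingredients, $L_1$ is a weighted sum of squared $L^2$ distances between such histograms and the shifted model densities.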

3 Additive shape invariant mixture model

3.1 Model

We extend the model in (1) to simultaneously perform decomposition, alignment, and clustering. Assume that the $n$ subjects can be classified into $K$ distinct clusters, that is, $[n]=\cup_{k=1}^{K}\mathcal{C}_k$, where $\mathcal{C}_1,\ldots,\mathcal{C}_K$ represent mutually exclusive subsets of $[n]$. These clusters are delineated based on the similarity of intensity components across subjects. Specifically, we introduce the following model:

\lambda_{i,r}(t) = a_{z_i} + \sum_{m\in[M]} S^{v_{i,m}+w^*_{r,m}} f_{z_i,m}(t),    (7)

where $z_i\in[K]$ represents the cluster membership of subject $i$, such that $z_i=k$ if $i\in\mathcal{C}_k$. We refer to the model in (7) as the additive shape invariant mixture model, or ASIMM for short. Conditioning on each cluster, the additive shape invariant mixture model in (7) simplifies to the additive shape invariant model in (1). Similar to the connection between the model in (1) and FPCA, the additive shape invariant mixture model in (7) has a close connection with clustering methods based on FPCA (Chiou and Li, 2007; Bouveyron and Jacques, 2011; Jacques and Preda, 2013; Yin et al., 2021).

In the context of neural data (Steinmetz et al., 2019), the $\mathcal{C}_k$'s can be interpreted as functional groups of neurons, wherein neurons exhibit similar firing patterns. The additive shape invariant mixture model in (7) enables us to simultaneously identify functional groups of neurons (i.e., the $\mathcal{C}_k$'s), discern representative neural firing patterns (i.e., the $f_{k,m}$'s), and estimate individual neural response latencies (i.e., the $v_{i,m}$'s). It is worth emphasizing that the applicability of the proposed method extends beyond neural data analysis. For instance, the additive shape invariant mixture model in (7) can be employed to analyze recurrent consumer activity in response to advertising (Xu et al., 2014; Zadeh and Sharda, 2014; Tanaka et al., 2016; Bues et al., 2017), or to study hospital admission rates following the implementation of societal disease prevention policies (Barone-Adesi et al., 2006; Sims et al., 2010; Klevens et al., 2016; Evans et al., 2021). Additionally, the proposed model has potential applications to diverse datasets (Tang and Li, 2023; Schoenberg, 2023; Dempsey, 2023; Ganggang Xu and Guan, 2024; Djorno and Crawford, 2024).

3.2 Identifiability

We denote the collection of unknown parameters in (7) as $\mathbf{z}\equiv(z_i)_{i\in[n]}$, $\mathbf{a}\equiv(a_k)_{k\in[K]}$, $\mathbf{f}\equiv(f_{k,m})_{k\in[K],m\in[M]}$, and $\mathbf{v}\equiv(v_{i,m})_{i\in[n],m\in[M]}$. Let $(\mathbf{z}^*,\boldsymbol{a}^*,\mathbf{f}^*,\mathbf{v}^*)\in\Theta_1$ denote the true parameters, where $\Theta_1\equiv\{(\mathbf{z},\boldsymbol{a},\mathbf{f},\mathbf{v}): \mathbf{z}\in[K]^n, \boldsymbol{a}\in[0,\infty)^K, \mathbf{f}\in\mathcal{F}^{K\times M}, \mathbf{v}\in[0,V]^{n\times M}\}$. When conditioning on $\mathbf{z}^*$, Proposition 1 establishes the identifiability of $\boldsymbol{a}^*,\mathbf{f}^*,\mathbf{v}^*$. However, to ensure the identifiability of $\mathbf{z}^*$, an additional assumption regarding the separability of clusters is required. The formal statement of model identifiability is provided in Proposition 2.

Proposition 2.

Suppose that both Assumptions A1 and A2 hold, and further assume that

(A3) For any $k,k'\in[K]$ with $k\neq k'$, there exists $m_0\in[M]$ such that for any $x\in\mathbb{R}$, the set $\{t\in\mathbb{R}: S^x f^*_{k,m_0}(t)\neq f^*_{k',m_0}(t)\}$ has positive measure.

Then the following statements hold.

(P4) The cluster memberships $\mathbf{z}^*$ are identifiable up to a permutation of cluster labels.

(P5) The baseline values $\boldsymbol{a}^*$ are identifiable up to a permutation of cluster labels.

(P6) The response components $\mathbf{f}^*$ are identifiable up to a permutation of cluster labels and time shifts.

(P7) For $k\in[K]$ and $m\in[M]$ such that the set $\{t: f^*_{k,m}(t)\neq 0\}$ is of positive measure, the time shifts $(v^*_{i,m})_{i\in\mathcal{C}^*_k}$ are identifiable up to a constant independent of $i$.

Assumption A3 mandates that each cluster exhibits at least one signature intensity component that is unique to that cluster. Statement P4 stems directly from Assumption A3. Statements P5, P6, and P7 can be derived by applying Proposition 1 to each individual cluster. A detailed proof of Proposition 2 is presented in Section B of the Supplementary Materials.

3.3 Estimation

We estimate the parameters in (7) by generalizing the optimization approach in Section 2.4 to incorporate the cluster structure. For any $k\in[K]$ and $m\in[M]$, let $\Lambda_k\equiv a_kT+\sum_{m\in[M]}\int_0^T f_{k,m}(t)\,\mathrm{d}t$, $a'_k\equiv a_k\Lambda_k^{-1}$, and $f'_{k,m}\equiv f_{k,m}\Lambda_k^{-1}$. Denoting $\mathbf{a}'\equiv(a'_k)_{k\in[K]}$, $\mathbf{f}'\equiv(f'_{k,m})_{k\in[K],m\in[M]}$, and $\boldsymbol{\Lambda}\equiv(\Lambda_k)_{k\in[K]}$, we propose the following optimization problem:

\hat{\mathbf{z}},\hat{\mathbf{a}}',\hat{\mathbf{f}}',\hat{\mathbf{v}},\hat{\boldsymbol{\Lambda}} \equiv \underset{\mathbf{z},\mathbf{a}',\mathbf{f}',\mathbf{v},\boldsymbol{\Lambda}}{\arg\min} \left\{ L_1(\mathbf{z},\mathbf{a}',\mathbf{f}',\mathbf{v}) + \gamma\, L_2(\mathbf{z},\boldsymbol{\Lambda}) \right\},    (8)

where

L_1(\mathbf{z},\mathbf{a}',\mathbf{f}',\mathbf{v}) \equiv \sum_{i\in[n],r\in[R]} \beta_{i,r}\, \frac{1}{T} \left\| \frac{y_{i,r}(t)}{N_{i,r}(T)} - \Big\{ a'_{z_i} + \sum_{m\in[M]} S^{v_{i,m}+w^*_{r,m}} f'_{z_i,m}(t) \Big\} \right\|_t^2,    (9)

L_2(\mathbf{z},\boldsymbol{\Lambda}) \equiv \sum_{i\in[n],r\in[R]} \big| N_{i,r}(T) - \Lambda_{z_i} \big|^2,    (10)

and $\gamma\in(0,\infty)$ is a tuning parameter. In essence, $L_1$ and $L_2$ assess the within-cluster variances of the event time distributions and the event counts, respectively. When the number of clusters reduces to one (i.e., $K=1$), the definition of $L_1$ in (9) is identical to that in (5). The tuning parameter $\gamma$ modulates the relative importance of $L_2$ compared to $L_1$ in the optimization with respect to $\mathbf{z}$. When $\gamma$ is sufficiently small, the estimator $\hat{\mathbf{z}}$ defined in (8) is predominantly determined by $L_1$, resulting in a potentially suboptimal value of $L_2$. Conversely, when $\gamma$ is sufficiently large, the dominance shifts towards $L_2$, resulting in a $\hat{\mathbf{z}}$ that achieves the minimum value of $L_2$, while $L_1$ may be relegated to suboptimal values. After solving the optimization problem in (8), the estimates of $\mathbf{a}$ and $\mathbf{f}$ can be recovered via $\hat{a}_k\equiv\hat{a}'_k\hat{\Lambda}_k$ and $\hat{f}_{k,m}\equiv\hat{f}'_{k,m}\hat{\Lambda}_k$ for $k\in[K]$, $m\in[M]$.

The optimization problem in (8) involves two tuning parameters, $\gamma$ and $K$. To determine them, we employ a heuristic method. We first establish a preliminary estimate of $K$ using simple methods, such as applying the k-means algorithm to the $N_{i,r}(T)$'s and selecting $K$ using the elbow method (Thorndike, 1953). Given this preliminary estimate of $K$, we choose the largest $\gamma$ before a significant increase in $L_1$ is observed. We provide simulation experiments to justify this heuristic and to demonstrate the robustness of the selected $\gamma$ to changes in the preliminary choice of $K$ in Section F.1 of the Supplementary Material. Finally, conditioning on the selected $\gamma$, we refine the value of $K$ by identifying the elbow point on the curve of the overall objective function in (8) against $K$.
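A minimal version of the preliminary step of this heuristic is sketched below. It assumes scikit-learn is available and uses the largest curvature of the inertia curve as the elbow rule, which is one common variant of the elbow method rather than the only one.

import numpy as np
from sklearn.cluster import KMeans

def preliminary_K(counts, K_max=8, seed=0):
    # counts: (n, R) array of N_{i,r}(T); cluster subjects by their count profiles
    inertia = np.array([KMeans(n_clusters=K, n_init=10, random_state=seed)
                        .fit(counts).inertia_ for K in range(1, K_max + 1)])
    # elbow point: the K with the largest second difference of the inertia curve
    curvature = inertia[:-2] - 2.0 * inertia[1:-1] + inertia[2:]
    return int(np.argmax(curvature)) + 2   # +2 maps the index back to K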

4 Algorithm

We now present an algorithm for solving the optimization problem in (8). This optimization problem aims to minimize the within-cluster variances pertaining to event time distributions and event counts. To this end, we propose an algorithm resembling the k-means algorithm, alternating between a centering step and a clustering step, presented as follows.

(centering step)  \hat{\mathbf{a}}',\hat{\mathbf{f}}' = \underset{\mathbf{a}',\mathbf{f}'}{\arg\min}\ L_1(\hat{\mathbf{z}},\mathbf{a}',\mathbf{f}',\hat{\mathbf{v}}), \quad \hat{\boldsymbol{\Lambda}} = \underset{\boldsymbol{\Lambda}}{\arg\min}\ L_2(\hat{\mathbf{z}},\boldsymbol{\Lambda}),    (11)

(clustering step)  \hat{\mathbf{z}},\hat{\mathbf{v}} = \underset{\mathbf{z},\mathbf{v}}{\arg\min}\ \{ L_1(\mathbf{z},\hat{\mathbf{a}}',\hat{\mathbf{f}}',\mathbf{v}) + \gamma\, L_2(\mathbf{z},\hat{\boldsymbol{\Lambda}}) \}.    (12)

In the centering step (11), the estimators of $\mathbf{a}'$, $\mathbf{f}'$, and $\boldsymbol{\Lambda}$ are updated conditional on the values of $\hat{\mathbf{z}}$ and $\hat{\mathbf{v}}$. In the clustering step (12), the estimators of $\mathbf{z}$ and $\mathbf{v}$ are updated conditional on the values of $\hat{\mathbf{a}}'$, $\hat{\mathbf{f}}'$, and $\hat{\boldsymbol{\Lambda}}$. This alternating scheme facilitates a closed-form solution in the centering step and an effective optimization process in the clustering step.

4.1 The centering step

The centering step (11) involves two optimization problems. To solve the first optimization problem, we formulate it in the frequency domain as follows:

\hat{\mathbf{a}}',\hat{\boldsymbol{\phi}}' = \underset{\mathbf{a}',\boldsymbol{\phi}'}{\arg\min} \sum_{i\in[n],r\in[R]} \beta_{i,r} \sum_{l\in\mathbb{Z}} \left| \frac{\eta_{i,r,l}}{N_{i,r}(T)} - \Big\{ a'_{\hat{z}_i}\mathbf{1}(l=0) + \sum_{m\in[M]} \exp\big\{ -\mathrm{j}2\pi l(\hat{v}_{i,m}+w^*_{r,m})T^{-1} \big\} \phi'_{\hat{z}_i,m,l} \Big\} \right|^2,    (13)

where $\boldsymbol{\phi}'\equiv(\phi'_{k,m,l})_{k\in[K],m\in[M],l\in\mathbb{Z}}$, $\{\phi'_{k,m,l}: l\in\mathbb{Z}\}$ denotes the Fourier coefficients of $f'_{k,m}(t)$, $\{\eta_{i,r,l}: l\in\mathbb{Z}\}$ denotes the Fourier coefficients of $y_{i,r}(t)$, and $\mathrm{j}$ denotes the imaginary unit. Notably, the multiplicative term $\exp(-\mathrm{j}2\pi l[\hat{v}_{i,m}+w^*_{r,m}]T^{-1})$ serves as the frequency-domain counterpart of the shift operator $S^{\hat{v}_{i,m}+w^*_{r,m}}$. In essence, the Fourier transformation converts the shift operators into multiplications. As a result, the objective function becomes more tractable than its counterpart in the original domain. Indeed, an analytical solution for $\hat{\boldsymbol{\phi}}'$ can be derived. Let $\boldsymbol{\phi}'_{k,*,l}\equiv(\phi'_{k,m,l})_{m\in[M]}$ for any $k\in[K]$ and $l\in\mathbb{Z}$. For $l\neq 0$, the solution to (13) with respect to $\boldsymbol{\phi}'_{k,*,l}$ can be expressed as follows:

\hat{\boldsymbol{\phi}}'_{k,*,l} = \left( \overline{\mathbf{E}_{k,l}}^{\top} \mathbf{B}_k\, \mathbf{E}_{k,l} \right)^{-1} \left( \overline{\mathbf{E}_{k,l}}^{\top} \mathbf{B}_k\, \mathbf{h}_{k,l} \right), \quad \text{for } l\neq 0,    (14)

\hat{\boldsymbol{\phi}}'_{k,*,0} = -\sum_{|l|\leq\ell_0,\, l\neq 0} \hat{\boldsymbol{\phi}}'_{k,*,l}, \quad \text{for } l=0.    (15)

Here, $\mathbf{E}_{k,l}$ is defined as

\mathbf{E}_{k,l} \equiv \big[ \exp\{ -\mathrm{j}2\pi l(\hat{v}_{i,m}+w^*_{r,m})T^{-1} \} \big]_{(i,r)\in\hat{\mathcal{C}}_k\times[R],\, m\in[M]},    (16)

where $\hat{\mathcal{C}}_k\equiv\{i\in[n]: \hat{z}_i=k\}$, $\overline{\mathbf{E}_{k,l}}$ denotes the complex conjugate of $\mathbf{E}_{k,l}$, $\mathbf{B}_k$ is a diagonal matrix of $(\beta_{i,r})_{(i,r)\in\hat{\mathcal{C}}_k\times[R]}$, $\mathbf{h}_{k,l}\equiv(\eta_{i,r,l}N_{i,r}(T)^{-1})_{(i,r)\in\hat{\mathcal{C}}_k\times[R]}$, and $\ell_0\in\mathbb{N}$ is a truncation parameter introduced to ensure the numerical feasibility of computing $\hat{\boldsymbol{\phi}}'_{k,*,0}$. For $l\neq 0$, the objective function concerning $\boldsymbol{\phi}'_{k,*,l}$ in (13) is a weighted sum of squares, hence the estimate in (14) is the familiar weighted least squares estimator. For $l=0$, the estimate in (15) is obtained by exploiting the definition of $\mathcal{F}$. The detailed derivations of (14) and (15) can be found in Section D of the Supplementary Material.
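For a fixed cluster $k$ and frequency $l\neq 0$, the closed form in (14) is an ordinary weighted least squares solve. A sketch follows; the array shapes are assumptions matching the definitions above.

import numpy as np

def phi_hat_l(E, beta, h):
    # E: (|C_k| * R, M) matrix from eq. (16), rows indexed by (i, r) pairs
    # beta: weights beta_{i,r}; h: Fourier coefficients eta_{i,r,l} / N_{i,r}(T)
    A = E.conj().T @ (beta[:, None] * E)   # E-bar^T B E
    b = E.conj().T @ (beta * h)            # E-bar^T B h
    return np.linalg.solve(A, b)           # phi-hat'_{k,*,l}, eq. (14)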

Upon obtaining $\hat{\boldsymbol{\phi}}'$, the solution to the first optimization problem in (11) can be derived as follows. For $k\in[K]$ and $m\in[M]$,

\hat{a}'_k = T^{-1} - \sum_{m\in[M]} \hat{\phi}'_{k,m,0},    (17)

\hat{f}'_{k,m}(t) = \sum_{|l|\leq\ell_0} \hat{\phi}'_{k,m,l} \exp(\mathrm{j}2\pi l t T^{-1}),    (18)

where (17) is obtained by substituting $\hat{\boldsymbol{\phi}}'$ into (13), and (18) follows from the inverse Fourier transformation.

The second optimization problem in (11) aims to minimize the within-cluster variance of event counts given the cluster memberships. The solution to this optimization problem is straightforward: for any $k\in[K]$,

\hat{\Lambda}_k = \underset{\Lambda_k}{\arg\min} \sum_{i\in\hat{\mathcal{C}}_k,\, r\in[R]} \big| N_{i,r}(T) - \Lambda_k \big|^2 = \big( |\hat{\mathcal{C}}_k|\, R \big)^{-1} \sum_{i\in\hat{\mathcal{C}}_k,\, r\in[R]} N_{i,r}(T),    (19)

where $|\hat{\mathcal{C}}_k|$ denotes the cardinality of the set $\hat{\mathcal{C}}_k$. In summary, the solution to the centering step is encapsulated by equations (17), (18), and (19).

4.2 The clustering step

The optimization problem in (12) can be scaled down to the subject level, allowing for the independent estimation of the parameters associated with each subject. For any subject $i\in[n]$, let $\mathbf{v}_i\equiv(v_{i,m})_{m\in[M]}$ denote its associated time shifts. In (12), the parameters $z_i$ and $\mathbf{v}_i$ are estimated through the following sub-problem:

\hat{z}_i,\hat{\mathbf{v}}_i = \underset{z_i,\mathbf{v}_i}{\arg\min}\ \{ L_{1,i}(z_i,\hat{\mathbf{a}}',\hat{\mathbf{f}}',\mathbf{v}_i) + \gamma\, L_{2,i}(z_i,\hat{\boldsymbol{\Lambda}}) \},    (20)

where L1,iL_{1,i} and L2,iL_{2,i} are defined as

L_{1,i}(z_i,\hat{\mathbf{a}}',\hat{\mathbf{f}}',\mathbf{v}_i) \equiv \sum_{r\in[R]} \beta_{i,r}\, \frac{1}{T} \left\| \frac{y_{i,r}(t)}{N_{i,r}(T)} - \Big\{ \hat{a}'_{z_i} + \sum_{m\in[M]} S^{v_{i,m}+w^*_{r,m}} \hat{f}'_{z_i,m}(t) \Big\} \right\|_t^2,    (21)

L_{2,i}(z_i,\hat{\boldsymbol{\Lambda}}) \equiv \sum_{r\in[R]} \big| N_{i,r}(T) - \hat{\Lambda}_{z_i} \big|^2.    (22)

The parameter dimension for the problem in (20) is significantly reduced compared to the original optimization problem in (12). Consequently, solving the problem in (20) is computationally more efficient than addressing the original problem stated in (12).

To solve the optimization problem in (20), we employ the following procedure:

\tilde{\mathbf{v}}_{i|k} = \underset{\mathbf{v}_i}{\arg\min}\ L_{1,i}(k,\hat{\mathbf{a}}',\hat{\mathbf{f}}',\mathbf{v}_i), \quad \text{for } k\in[K],    (23)

\hat{z}_i = \underset{z_i\in[K]}{\arg\min}\ \{ L_{1,i}(z_i,\hat{\mathbf{a}}',\hat{\mathbf{f}}',\tilde{\mathbf{v}}_{i|z_i}) + \gamma\, L_{2,i}(z_i,\hat{\boldsymbol{\Lambda}}) \},    (24)

\hat{\mathbf{v}}_i = \tilde{\mathbf{v}}_{i|\hat{z}_i}.    (25)

In the first step (23), we determine the optimal time shifts for each potential cluster membership $k\in[K]$. The optimization problem in this step can be solved in the frequency domain using Newton's method (see Section E of the Supplementary Material). In the second step (24), we evaluate the objective function for each possible cluster membership, leveraging the optimal time shifts corresponding to that particular cluster, and designate the cluster with the minimal objective function value as the estimated cluster membership. In the last step (25), we retain the optimal time shifts for the estimated cluster membership. In summary, the solution to the clustering step is encapsulated by (23), (24), and (25).
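For prototyping, the per-subject sub-problem (20) can also be solved with a coarse grid search over the time shifts in place of the frequency-domain Newton updates used in the paper. The sketch below does so for the $M=1$ case on a common discretization; the array layouts and the helper model_density are our own assumptions.

import numpy as np

def assign_subject(y, counts, model_density, Lambda_hat, gamma, V, step, K):
    # y: (R, G) per-trial empirical densities y_{i,r}/N_{i,r} on a grid of G bins
    # counts: (R,) event counts N_{i,r}(T)
    # model_density(k, v): (R, G) discretized a'_k + f'_{k,1}(t - v - w*_{r,1})
    best = (np.inf, None, None)
    for k in range(K):
        for v in np.arange(0.0, V, step):   # grid over the scalar shift
            # bin-wise mean squared difference equals the discretized (1/T)||.||^2
            L1 = np.sum(counts * np.mean((y - model_density(k, v)) ** 2, axis=1))
            L2 = np.sum((counts - Lambda_hat[k]) ** 2)
            if L1 + gamma * L2 < best[0]:
                best = (L1 + gamma * L2, k, v)
    return best[1], best[2]                 # (z_hat_i, v_hat_i)

Grid search trades the fast local convergence of Newton's method for robustness to the non-convexity of $L_{1,i}$ in the shift.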

4.3 Overall estimation procedure

The objective function in (8) is non-convex, which makes an appropriate initialization scheme and convergence criterion crucial. Our proposed initialization scheme is described in Remark 1, and the overall estimation procedure is summarized in Algorithm 1.

Input: $\{N_{i,r}(t): i\in[n], r\in[R]\}$, $K$, $\gamma$, $\ell_0$
Initialize $\hat{\mathbf{v}}^{(0)}, \hat{\mathbf{z}}^{(0)}$ via (26) and (27); let $s=0$ and $L^{(0)}=\infty$;
while not stop do
       Update $\hat{\mathbf{a}}'^{(s+1)}$, $\hat{\mathbf{f}}'^{(s+1)}$, $\hat{\boldsymbol{\Lambda}}^{(s+1)}$ via (17)-(19), given $(\hat{\mathbf{z}}^{(s)}, \hat{\mathbf{v}}^{(s)})$;
       Update $\hat{\mathbf{z}}^{(s+1)}$, $\hat{\mathbf{v}}^{(s+1)}$ via (23)-(25), given $(\hat{\mathbf{a}}'^{(s+1)}, \hat{\mathbf{f}}'^{(s+1)}, \hat{\boldsymbol{\Lambda}}^{(s+1)})$;
       Evaluate the loss function: $L^{(s+1)} \equiv L_1(\hat{\mathbf{z}}^{(s+1)}, \hat{\mathbf{a}}'^{(s+1)}, \hat{\mathbf{f}}'^{(s+1)}, \hat{\mathbf{v}}^{(s+1)}) + \gamma\, L_2(\hat{\mathbf{z}}^{(s+1)}, \hat{\boldsymbol{\Lambda}}^{(s+1)})$;
       Evaluate the stopping criterion: $\{L^{(s)} - L^{(s+1)}\}/L^{(s+1)} \leq \epsilon$;
       $s = s + 1$;
end while
Output: $\hat{\mathbf{z}}^{(s)}, \hat{\mathbf{a}}'^{(s)}, \hat{\mathbf{f}}'^{(s)}, \hat{\mathbf{v}}^{(s)}, \hat{\boldsymbol{\Lambda}}^{(s)}$.
Algorithm 1 Iterative algorithm for ASIMM
Remark 1.

Initialization. The subject-specific time shifts are initialized based on the earliest event occurrence following each stimulus. Specifically, the value of $\hat{\mathbf{v}}^{(0)}$ is defined as follows. For $i\in[n]$, $m\in[M]$,

\hat{v}^{(0)}_{i,m} \equiv \min\{ t_{i,r,j} - w^*_{r,m} : t_{i,r,j} > w^*_{r,m},\ r\in[R],\ j\in[N_{i,r}(T)] \}.    (26)

The cluster memberships are initialized based on adjusted event times that roughly align the point processes using the initial subject-specific time shifts in (26). These adjusted event times are calculated by shifting the event times associated with each stimulus to an anchor point for that stimulus. To be specific, we shift the event times as $\tilde{t}_{i,r,j}\equiv t_{i,r,j}-\hat{u}^{(0)}_{i,r,m}+\min_{r'\in[R]}w^*_{r',m}$ for each event $j$ with $t_{i,r,j}\in[\hat{u}^{(0)}_{i,r,m},\hat{u}^{(0)}_{i,r,m+1}]$. Here, $\hat{u}^{(0)}_{i,r,m}\equiv\hat{v}^{(0)}_{i,m}+w^*_{r,m}$ denotes the total time shift associated with stimulus $m$ for $m\in[M]$, $\hat{u}^{(0)}_{i,r,M+1}\equiv T$, and $\min_{r'\in[R]}w^*_{r',m}$ represents the anchor point of stimulus $m$. Subsequently, cluster memberships are initialized by applying the k-means algorithm to the adjusted event times:

\hat{\mathbf{z}}^{(0)} \equiv \underset{\mathbf{z}}{\arg\min} \sum_{k\in[K]} \sum_{i,j:\, z_i=z_j=k} \left\| y'_i(t) - y'_j(t) \right\|^2,    (27)

where $y'_i(t)$ denotes the empirical distribution of $\{\tilde{t}_{i,r,j}: r\in[R], j\in[N_{i,r}(T)]\}$. The efficacy of the proposed initialization approach is illustrated in Figure 10 of the Supplementary Material, where it is shown to outperform random initialization with multiple restarts.
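The shift initialization in (26) translates directly into code. In the sketch below, the nested-list layout events[i][r] (a sorted array of the event times of $N_{i,r}$) is an assumed data structure.

import numpy as np

def init_shifts(events, w_star, n, R, M):
    # v-hat^(0)_{i,m}: smallest post-onset gap t_{i,r,j} - w*_{r,m} over all trials
    v0 = np.full((n, M), np.inf)
    for i in range(n):
        for r in range(R):
            for m in range(M):
                later = events[i][r][events[i][r] > w_star[r, m]]
                if later.size > 0:
                    v0[i, m] = min(v0[i, m], later[0] - w_star[r, m])
    return v0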

5 Simulation

5.1 Simulation experiment design

We assess the performance of the proposed method in three synthetic experiments. In the first experiment, we aim to explore the intensity decomposition performance. To this end, we generate Poisson processes $N_{i,r}$'s whose intensities follow the additive shape invariant model in (1). Specifically, we set $T=2.5$, $M=2$, $v_{i,1}\sim\mathrm{Unif}(0,1/64)$, $v_{i,2}\sim\mathrm{Unif}(0,1/16)$, $w^*_{r,1}\sim\mathrm{Unif}(0,\tau)$, and $w^*_{r,2}\sim\mathrm{Unif}(0.8,0.8+\tau)$. These parameter values remain unchanged across all subsequent experiments. In addition, we set $a=20$, and set $\mathbf{f}$ as

f_1(t) = 70\times\big[ \{2-2\cos(4\pi[t-0.4])\}\times\mathbf{1}(t\in[0.4,0.9]) \big] \equiv 70\times q_1(t),
f_2(t) = 70\times\big[ \{2-2\cos(2\pi|2t|^{1/2})\}\times\mathbf{1}(t\in[0,0.5]) \big] \equiv 70\times q_2(t).    (28)

Notably, $q_1(t)$ and $q_2(t)$ capture the event time distributions of the two components, and both integrate to unity. We tune the signal strength in the synthetic data by altering the values of $R$, $n$, and $\tau$. Intuitively, $R$ and $n$ serve as sample sizes and hence are positively associated with signal strength, while $\tau$ governs the variance of the $w^*_{r,m}$'s and thus the identifiability of the intensity components.
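The two templates in (28) translate directly into code. The sketch below defines $q_1$ and $q_2$ and numerically confirms that each integrates to one, as a consistency check on the simulation design rather than part of the estimation pipeline.

import numpy as np

def q1(t):
    # raised-cosine bump supported on [0.4, 0.9], cf. eq. (28)
    return (2.0 - 2.0 * np.cos(4.0 * np.pi * (t - 0.4))) * ((t >= 0.4) & (t <= 0.9))

def q2(t):
    # skewed bump supported on [0, 0.5], cf. eq. (28)
    return (2.0 - 2.0 * np.cos(2.0 * np.pi * np.sqrt(np.abs(2.0 * t)))) * ((t >= 0.0) & (t <= 0.5))

t = np.linspace(0.0, 2.5, 100_001)
print(np.trapz(q1(t), t), np.trapz(q2(t), t))   # both approximately 1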

In the second experiment, we compare the clustering performance of the proposed ASIMM (7) with two relevant methods: kCFC, introduced in Chiou and Li (2007), and k-mean alignment, introduced in Sangalli et al. (2010). We do not apply the method by Yin et al. (2021) since it considers a multiplicative model, whereas our model assumes additive components. We generate Poisson processes using the model in (7). In particular, we set the true cluster memberships by sequentially assigning $n=40$ subjects to $K=4$ clusters of equal size, that is, $z_i=\lceil(i/n)K\rceil$, where $\lceil\cdot\rceil$ denotes the ceiling function. In addition, we set $a_k=20$, and set the $f_{k,m}$'s as shown in Table 1. In Table 1, we introduce $\rho\in(0,1)$ to control the distinctiveness of the clusters. Consider the mean intensity associated with the $k$-th cluster. For $i\in\mathcal{C}_k$ and $r\in[R]$, we can derive from (7) that $\mathbb{E}\lambda_{i,r}(t)=a_k+\sum_{m\in[M]}(p_m\star f_{k,m})(t)$, where $p_m$ denotes the probability density function of $v_{i,m}+w^*_{r,m}$, and $\star$ denotes the convolution operator, i.e., $(p_m\star f_{k,m})(t)=\int_0^{T_0}p_m(t-x)f_{k,m}(x)\,\mathrm{d}x$. When $\rho=0$, the shapes of the $\sum_{m\in[M]}(p_m\star f_{k,m})(t)$'s are identical across clusters, whereas when $\rho=1$, they exhibit substantial distinctions across clusters. In essence, as $\rho$ increases, the clusters become more separable.

Table 1: True values of $\{f_{k,m}(t): k\in[K], m\in[M]\}$ in Scenario 2. The parameter $\rho$ controls the distinctiveness across clusters, whose value is altered in the experiment. The functions $q_1(t)$ and $q_2(t)$ are defined in (28), $h_1(x)\equiv|\max(x,0)|^{1/2}$, and $h_2(x)\equiv 1+\min(x,0)$.

$k=1$:  $m=1$: $52.5\times q_1(t)$;  $m=2$: $52.5\times q_2(t)$
$k=2$:  $m=1$: $60\times[1-h_1(2\rho-1)]\times q_1(t) + 48\times h_2(2\rho-1)\times q_2(2[t-0.8])$;  $m=2$: $60\times[1+h_1(2\rho-1)]\times q_2(t) - 48\times h_2(2\rho-1)\times q_2(2t)$
$k=3$:  $m=1$: $67.5\times(1+0.5\rho)\times q_1(t)$;  $m=2$: $67.5\times(1-0.5\rho)\times q_2(t)$
$k=4$:  $m=1$: $75\times(1+\rho)\times q_1(t)$;  $m=2$: $75\times(1-\rho)\times q_2(t)$

The third experiment is a continuation of the second, focusing on the clustering performance of the proposed method. We manipulate the signal strength by varying the values of $R$, $n$, and $\tau$, in order to investigate their impact on the cluster estimation.

5.2 Intensity estimation performance

In the first experiment, we investigate the effect of varying $R$, $n$, and $\tau$ on the intensity estimation performance of the proposed method. When $K=1$, the value of $\gamma$ does not affect the estimation result and hence can be set to zero. We let $\ell_0=10$ and $\epsilon=0.005$ in all experiments. A sensitivity analysis concerning $\ell_0$ demonstrates the robustness of the estimation results to changes in $\ell_0$, as detailed in Section F.2 of the Supplementary Material.

We evaluate the intensity estimation performance via the mean integrated squared error (MISE), defined as:

\text{MISE} \equiv \frac{1}{M} \sum_{m\in[M]} d\left\{ \hat{f}'_m(t), \frac{f^*_m(t)}{\Lambda^*} \right\},    (29)

where $d\{f_1,f_2\}\equiv\min_{v\in[-T,T]}\|S^v f_1 - f_2\|^2$. It is worth noting that the definition of the MISE considers $\hat{\mathbf{f}}'$ rather than $\hat{\mathbf{f}}$. We exclude $\hat{\Lambda}$ from the evaluation criterion since its performance, as a sample mean, is well understood.
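The shift-minimized distance $d$ in (29) can be approximated on a grid. The sketch below assumes both functions are sampled on a common grid over $[0,T]$ and applies shifts circularly, which is harmless when the supports stay away from the window boundary.

import numpy as np

def d_shift_min(f1_grid, f2_grid, dt):
    # min over grid shifts v of || S^v f1 - f2 ||^2, with circular shifts
    n = len(f1_grid)
    errors = [dt * np.sum((np.roll(f1_grid, s) - f2_grid) ** 2) for s in range(n)]
    return min(errors)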

The intensity estimation performance is shown in Figure 3. Firstly, the MISE rapidly improves as $R$ increases, because each subject is then associated with more samples while the number of unknown parameters remains constant. Secondly, a decrease in the MISE is observed as $\tau$ increases. This is because when $\tau$ is small, there is a potential non-identifiability issue due to the limited sample size; as $\tau$ increases, the variance of the $w^*_{r,m}$'s increases, thereby alleviating this issue. Thirdly, the MISE exhibits a decreasing trend as $n$ increases, since $n$ serves as the sample size for the estimation of the intensity components. It is noteworthy, however, that the MISE decreases more slowly in response to an increase in $n$ than to an increase in $R$, because an increment in $n$ leads to a proportional increase in the number of unknown subject-specific time shifts (i.e., the $v_{i,m}$'s). Consequently, when the algorithm is provided with the true values of the $v_{i,m}$'s, the MISE shows a significant reduction.

[Figure 3 about here. Panel (a): $\log$(MISE) against the number of observations $R$ for $\tau=0.1, 0.2, 0.3$; panel (b): $\log$(MISE) against the number of subjects $n$ for unknown versus known $\mathbf{v}$.]
Figure 3: Intensity estimation performance in Experiment 1 with 5000 replicates. Synthetic data are generated with varying $R$, $n$, and $\tau$. The MISE is shown on the log scale for better visualization, where smaller values indicate better performance. Panel (a) shows the performance of intensity estimation with varying values of $R$ and $\tau$. Panel (b) shows the performance of intensity estimation with varying $n$. The curve labeled "Unknown $\mathbf{v}$" shows results when the algorithm is not provided with the true value of $\mathbf{v}$, while the curve labeled "Known $\mathbf{v}$" depicts results when it is.

5.3 Comparison with relevant methods

In the second experiment, we compare the clustering performance of the proposed ASIMM (7) with kCFC (Chiou and Li, 2007) and k-mean alignment (Sangalli et al., 2010). We apply the proposed method with $K=4$ and $\gamma=0.01$, where the selection of $\gamma$ follows the procedure outlined in Section 3.3. We apply kCFC using the implementation provided in the R package fdapace (Gajardo et al., 2021), specifically the function "kCFC", with the desired number of clusters set to 4, the maximum number of principal components set to 2, the design type specified as "dense", and the maximum number of iterations set to 30. We employ k-mean alignment using the implementation available in the R package fdasrvf (Tucker, 2023), specifically the function "kmeans_align", with the desired number of clusters set to 4, the maximum number of iterations set to 50, and the minimum number of curves per cluster set to 2. Both kCFC and k-mean alignment are applied to the empirical intensities aggregated across observations, expressed as $y_i(t)\equiv\{N_i(t+\Delta t)-N_i(t)\}\Delta t^{-1}$, where $N_i(t)\equiv R^{-1}\sum_{r\in[R]}N_{i,r}(t)$.

We evaluate the cluster estimation performance via the Adjusted Rand Index (ARI) (Hubert and Arabie, 1985). Let $\boldsymbol{\mathcal{C}}^*\equiv\{\mathcal{C}^*_k: k\in[K]\}$ and $\hat{\boldsymbol{\mathcal{C}}}\equiv\{\hat{\mathcal{C}}_{k'}: k'\in[K']\}$ denote the set of true clusters and the set of estimated clusters, where $K$ and $K'$ are the true and specified numbers of clusters, respectively. The ARI is formally defined as

\mathrm{ARI} \equiv \frac{ \sum_{k,k'} \binom{d_{k,k'}}{2} - \big[ \sum_k \binom{b_k}{2} \sum_{k'} \binom{c_{k'}}{2} \big] \binom{n}{2}^{-1} }{ \frac{1}{2} \big[ \sum_k \binom{b_k}{2} + \sum_{k'} \binom{c_{k'}}{2} \big] - \big[ \sum_k \binom{b_k}{2} \sum_{k'} \binom{c_{k'}}{2} \big] \binom{n}{2}^{-1} },    (30)

where $b_k\equiv|\mathcal{C}^*_k|$, $c_{k'}\equiv|\hat{\mathcal{C}}_{k'}|$, and $d_{k,k'}\equiv|\mathcal{C}^*_k\cap\hat{\mathcal{C}}_{k'}|$ for $k\in[K]$ and $k'\in[K']$. The ARI quantifies the similarity between $\boldsymbol{\mathcal{C}}^*$ and $\hat{\boldsymbol{\mathcal{C}}}$. For instance, when $\boldsymbol{\mathcal{C}}^*$ and $\hat{\boldsymbol{\mathcal{C}}}$ are equivalent up to a permutation of cluster labels, the ARI is equal to one. Conversely, when $\hat{\boldsymbol{\mathcal{C}}}$ is entirely random, the ARI has a mean value of zero.
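Equation (30) can be transcribed directly from the contingency counts $d_{k,k'}$; the sketch below is a plain transcription, equivalent to sklearn.metrics.adjusted_rand_score.

import numpy as np
from math import comb

def ari(z_true, z_hat):
    # contingency counts d_{k,k'} with row sums b_k and column sums c_{k'}
    ks, kps = np.unique(z_true), np.unique(z_hat)
    d = np.array([[np.sum((z_true == k) & (z_hat == kp)) for kp in kps] for k in ks])
    b, c, n = d.sum(axis=1), d.sum(axis=0), int(d.sum())
    sum_d = sum(comb(int(x), 2) for x in d.ravel())
    sum_b = sum(comb(int(x), 2) for x in b)
    sum_c = sum(comb(int(x), 2) for x in c)
    expected = sum_b * sum_c / comb(n, 2)
    return (sum_d - expected) / (0.5 * (sum_b + sum_c) - expected)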

Figure 4 shows the clustering performance of the three considered methods. Across all depicted scenarios, the proposed method consistently outperforms both kCFC and k-mean alignment. The superiority of the proposed method becomes especially clear when the values of $R$ and $\rho$ are large. This is expected since the proposed method excels in handling data with additive intensity components and time shifts, whereas kCFC and k-mean alignment were not devised for the setting in this experiment. In essence, the proposed method serves as a valuable complement to existing approaches for functional clustering.

[Figure 4 about here. Panel (a): 1-ARI against the number of observations $R$; panel (b): 1-ARI against the cluster separability $\rho$; methods: ASIMM, kCFC, k-mean alignment.]
Figure 4: Clustering performance in Experiment 2 with 5000 replicates of our proposal in orange, kCFC by Chiou and Li (2007) in green, and k-mean alignment by Sangalli et al. (2010) in blue. Synthetic data are generated with $n=40$, $\tau=0.1$, and varying $R$ and $\rho$. In panel (a), the value of $\rho$ is fixed at $\rho=0.5$. In panel (b), the value of $R$ is fixed at $R=2$.

5.4 Cluster estimation performance

In the third experiment, we investigate the effect of $R$, $n$, and $\tau$ on the cluster estimation performance of the proposed method. We apply the proposed method with the same set of tuning parameters as in the second experiment.

The clustering performance is displayed in Figure 5. The clustering performance evidently improves as $R$, $\tau$, and $n$ increase. This is because $R$, $\tau$, and $n$ aid the estimation of the intensity components, as demonstrated in the first experiment, which in turn improves the cluster estimation. Furthermore, $R$ serves as the sample size for the cluster memberships, thereby contributing to improved clustering performance. However, the impact of increasing $n$ on the clustering performance is only marginal, because as $n$ increases, the number of unknown cluster memberships also increases; that is, $n$ does not serve as the sample size for the cluster memberships.

[Figure 5 about here. Panel (a): 1-ARI against the number of observations $R$ for $\tau=0.1, 0.2, 0.3$; panel (b): 1-ARI against the number of subjects $n$ for unknown versus known $\mathbf{v}$.]
Figure 5: Clustering performance in Experiment 3 with 5000 replicates. Synthetic data are generated with varying $R$, $n$, and $\tau$. Panel (a) shows the clustering performance as $R$ and $\tau$ increase, with $n$ fixed at $n=40$. Panel (b) shows the clustering performance as $n$ increases, with $R$ and $\tau$ fixed at $R=2$ and $\tau=0.1$. The curve labeled "Unknown $\mathbf{v}$" shows results when the algorithm is not provided with the true value of $\mathbf{v}$, while the curve labeled "Known $\mathbf{v}$" depicts results when it is.

6 Real data application

We consider the neural data recorded during visual discrimination tasks from Steinmetz et al. (2019). In each trial, the mouse encountered a sequential presentation of two stimuli. The first stimulus comprised visual gratings of varying contrasts displayed on two screens, one to the left and one to the right of the mouse. The second stimulus was an auditory tone cue set off after a randomized delay of 0.4 to 0.8 seconds after the onset of the first stimulus. The mouse could rotate a wheel which, after the auditory cue, would move the visual gratings. When one contrast was higher than the other, the mouse succeeded and gained rewards if the visual grating of higher contrast was moved to the center screen. The complete criteria for success are detailed in Table 2 of the Supplementary Material. Throughout the experiment, researchers simultaneously recorded the firing activities of hundreds of neurons in the left hemisphere of the mouse's brain using Neuropixels (Jun et al., 2017; Steinmetz et al., 2018, 2021). We aim to identify groups of neurons with distinct responses to the two stimuli using our proposed method.

Following the notation in (7), we index trials by $r\in\{1,\ldots,R\}$ and neurons by $i\in\{1,\ldots,n\}$. Consequently, the firing activity of neuron $i$ in trial $r$ is $N_{i,r}(t)$, $t\in[0,T]$, where $T=3.5$ (seconds). We set $m=1$ for the visual grating and $m=2$ for the auditory tone cue, and denote the onset times of the two stimuli in trial $r$ by $w^{*}_{r,1}$ and $w^{*}_{r,2}$. Neurons may exhibit firing latencies in response to each stimulus, denoted $v_{i,m}$ for $m\in\{1,2\}$ and $i\in\{1,\ldots,n\}$. To demonstrate the proposed method, we focus on the $R=102$ experimental trials where the left visual grating was of higher contrast and the mouse successfully gained a reward. We study $n=225$ neurons in the midbrain region, after removing neurons with fewer than one spike per trial on average. We apply the proposed algorithm with $K=3$, $\gamma=10^{-4}$, $\ell_{0}=10$ and $\epsilon=0.005$. The method identifies three clusters of neurons with distinct responses to the two stimuli, shown in Figure 6. The first column of Figure 6 contains the refined intensity components. To better understand the role of each cluster, the second and third columns of Figure 6 illustrate the average firing rates from the training trials and from trials that were not used to fit the model. More details of this analysis can be found in Section G of the Supplementary Material.

Recalling that the first stimulus comprises the visual gratings and the second stimulus is the auditory cue, we make the following observations.

  1. Cluster 1.

There appears to be only one non-zero component in Cluster 1. It is immediately clear from Figure 6(1c) that the firing rates of neurons in Cluster 1 are highly in sync with the wheel velocity. When the firing rates of neurons in Cluster 1 are aligned by movement onset in Figure 6(1b), we see that the firing rates share almost the same trajectory across conditions, but their peaks depend on the choice the mouse made. In particular, the firing rate has a higher peak when the mouse chose the left visual grating. This preference, also known as laterality, is likely due to the fact that these neurons are located in the left hemisphere. We hypothesize that neurons in Cluster 1 are responsible for executing the turning of the wheel. The crucial role of the midbrain in coordinating movement has been identified in prior studies (see, e.g., Boecker et al., 2008; Coddington and Dudman, 2019; Inagaki et al., 2022).

  2. Cluster 2.

The first intensity component exhibits a decrease-then-increase pattern after stimulus onset, and the second component shows a sharp increase after the auditory cue, as shown in Figure 6(2a). Figure 6(2b) shows that the decrease-then-increase pattern is common in trials where the mouse chose the left visual grating; there is, however, no period of suppression in trials where the mouse chose the right visual grating. When the firing rates are aligned by feedback delivery time, it is clear that they peaked almost immediately after reward delivery regardless of the choice, and, in the absence of rewards, they remained stationary. We hypothesize that neurons in Cluster 2 respond to perceptions of stimuli (e.g., visual gratings, rewards), with their activities further regulated by laterality. Similar firing patterns of midbrain neurons have been identified in prior studies (see, e.g., Coddington and Dudman, 2018; Steinmetz et al., 2019).

  3. Cluster 3.

The estimated first component exhibits suppressed activity throughout the trial, and the second component shows increased activity after the auditory cue in Figure 6(3a). Neural responses in Cluster 3 resemble those in Cluster 2 in that both peaked after reward delivery and both show a preference for contralateral choices (i.e., choosing the right visual grating), as shown in Figures 6(3b) and 6(3c). Unlike Cluster 2, where rewards trigger an immediate response, in Cluster 3 there is a one-second delay between reward delivery and the peak of the firing rates in Figure 6(3c). It is possible that neurons in Cluster 3 are involved in initializing the next trial, when the mouse needs to hold the wheel stationary after reward delivery. Furthermore, Figure 6(3b) shows that the suppression of activity began after movement onset in trials with ipsilateral choices, whereas activity increased before movement onset in trials with contralateral choices.

[Figure 6 panels: rows correspond to Clusters 1–3; column 1 plots the estimated intensities $\hat{a}_{k}+\hat{f}_{k,1}$ and $\hat{a}_{k}+\hat{f}_{k,2}$ (spikes/s) against time from visual stimulus onset; columns 2 and 3 plot average firing intensities against time from movement onset, visual stimulus onset, or feedback delivery; panel (1c) overlays standardized intensity and wheel velocity.]
Figure 6: Estimation of intensity components and average firing intensities under various conditions. Each row corresponds to one estimated cluster. The first column presents the estimated intensity components. The second and third columns display the average neural firing intensities from the training trials and from trials not used to fit the model. The shaded area represents the mean firing intensity plus or minus two standard errors of the mean. The legend in panel (1b) reads “scenario, choice”; for instance, “R,L” denotes trials where the right grating was of higher contrast and the mouse chose to move the left grating to the center screen, resulting in a failure in that trial. Panel (1c) illustrates the average firing intensity and the wheel velocity, both standardized to range from 0 to 1 for alignment. The condition “L,R” is omitted from this figure since it contains only three trials; its firing intensity can be found in Figure 14 of the Supplementary Material.
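For readers who wish to reproduce displays of this kind, the sketch below bins spike times into per-trial firing rates and forms the mean curve with a $\pm 2$ standard-error band; the bin width, trial data, and function name are illustrative choices, not the paper's actual preprocessing:

import numpy as np

def mean_rate_with_band(spike_times_per_trial, T=3.5, bin_width=0.05):
    # Average firing rate (spikes/s) across trials with a +/- 2 SEM band.
    # spike_times_per_trial: list of 1-D arrays, one array of spike times
    # (in seconds, within [0, T]) per trial.
    edges = np.arange(0.0, T + bin_width, bin_width)
    # Per-trial rates: spike counts per bin divided by the bin width.
    rates = np.stack([
        np.histogram(spikes, bins=edges)[0] / bin_width
        for spikes in spike_times_per_trial
    ])
    mean = rates.mean(axis=0)
    sem = rates.std(axis=0, ddof=1) / np.sqrt(rates.shape[0])
    centers = edges[:-1] + bin_width / 2
    return centers, mean, mean - 2 * sem, mean + 2 * sem

# Example with simulated spikes over 5 trials (illustrative data only).
rng = np.random.default_rng(0)
trials = [np.sort(rng.uniform(0.0, 3.5, rng.poisson(20))) for _ in range(5)]
t, m, lo, hi = mean_rate_with_band(trials)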

7 Discussion

In this article, we tackle the problem of simultaneously decomposing, aligning, and clustering recurrent event data. Several directions remain for future work. First, the proposed method assumes that each subject exhibits a consistent intensity across observations, which may not hold when subjects respond stochastically; for instance, neurons may exhibit periods of reduced responsiveness during tasks, a phenomenon known as local sleep (see, e.g., Krueger et al., 2008; Vyazovskiy et al., 2011; Vyazovskiy and Harris, 2013). To address this limitation, the proposed model can be generalized by incorporating a hidden Markov model for the responsive status (Tokdar et al., 2010; Escola et al., 2011; Mews et al., 2023). Second, the proposed method only considers observations from the same experimental condition. A natural extension is to allow observations from different experimental conditions; for instance, it would be beneficial to learn neural functions from firing patterns across various visual conditions and feedback types. To achieve this, a cluster structure over observations can be incorporated via bi-clustering models (Madeira and Oliveira, 2004; Slimen et al., 2018; Galvani et al., 2021). Third, the theoretical properties of the proposed estimators have not been investigated. Previous studies have established the consistency of k-means clustering (Pollard, 1981; Sun et al., 2012) and of clustering based on FPCA (Yin et al., 2021); in addition, theoretical properties of simultaneous registration and clustering models have been established by Tang et al. (2023). Building on these frameworks (Pollard, 1981; Yin et al., 2021; Tang et al., 2023), it may be possible to incorporate the theory of shape invariant models (Bigot et al., 2013; Bigot and Gendre, 2013) to analyze the theoretical properties of the proposed method.

References

  • Barone-Adesi et al. [2006] Francesco Barone-Adesi, Loredana Vizzini, Franco Merletti, and Lorenzo Richiardi. Short-term effects of Italian smoking regulation on rates of hospital admission for acute myocardial infarction. European heart journal, 27(20):2468–2472, 2006.
  • Beath [2007] Ken J Beath. Infant growth modelling using a shape invariant model with random effects. Statistics in medicine, 26(12):2547–2564, 2007.
  • Benedicks [1985] Michael Benedicks. On Fourier transforms of functions supported on sets of finite Lebesgue measure. Journal of Mathematical Analysis and Applications, 106(1):180–183, 1985. ISSN 0022-247X. doi: https://doi.org/10.1016/0022-247X(85)90140-4. URL https://www.sciencedirect.com/science/article/pii/0022247X85901404.
  • Benucci et al. [2009] Andrea Benucci, Dario L Ringach, and Matteo Carandini. Coding of stimulus sequences by population responses in visual cortex. Nature neuroscience, 12(10):1317–1324, 2009.
  • Bigot and Gadat [2010] Jérémie Bigot and Sébastien Gadat. A deconvolution approach to estimation of a common shape in a shifted curves model. The Annals of Statistics, 38(4):2422–2464, 2010. doi: 10.1214/10-AOS800.
  • Bigot and Gendre [2013] Jérémie Bigot and Xavier Gendre. Minimax properties of Fréchet means of discretely sampled curves. The Annals of Statistics, 41(2):923–956, 2013. doi: 10.1214/13-AOS1104.
  • Bigot et al. [2013] Jérémie Bigot, Sébastien Gadat, Thierry Klein, and Clément Marteau. Intensity estimation of non-homogeneous Poisson processes from shifted trajectories. Electronic Journal of Statistics, 7:881–931, 2013. ISSN 1935-7524. doi: 10.1214/13-EJS794.
  • Boecker et al. [2008] Henning Boecker, Jakob Jankowski, P Ditter, and Lukas Scheef. A role of the basal ganglia and midbrain nuclei for initiation of motor sequences. Neuroimage, 39(3):1356–1369, 2008.
  • Bontemps and Gadat [2014] Dominique Bontemps and Sébastien Gadat. Bayesian methods for the Shape Invariant Model. Electronic Journal of Statistics, 8:1522–1568, 2014. ISSN 1935-7524. doi: 10.1214/14-EJS933.
  • Bouveyron and Jacques [2011] Charles Bouveyron and Julien Jacques. Model-based clustering of time series in group-specific functional subspaces. Advances in Data Analysis and Classification, 5:281–300, 2011.
  • Bues et al. [2017] Mirja Bues, Michael Steiner, Marcel Stafflage, and Manfred Krafft. How mobile in-store advertising influences purchase intention: Value drivers and mediating effects from a consumer perspective. Psychology & Marketing, 34(2):157–174, 2017.
  • Capilla et al. [2011] Almudena Capilla, Paula Pazo-Alvarez, Alvaro Darriba, Pablo Campo, and Joachim Gross. Steady-state visual evoked potentials can be explained by temporal superposition of transient event-related responses. PloS one, 6(1):e14543, 2011.
  • Cembrowski and Spruston [2019] Mark S Cembrowski and Nelson Spruston. Heterogeneity within classical cell types is the rule: lessons from hippocampal pyramidal neurons. Nature Reviews Neuroscience, 20(4):193–204, 2019.
  • Cheng et al. [2016] Wen Cheng, Ian L Dryden, and Xianzheng Huang. Bayesian registration of functions and curves. 2016.
  • Chiou and Li [2007] Jeng-Min Chiou and Pai-Ling Li. Functional clustering and identifying substructures of longitudinal data. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(4):679–699, 2007. ISSN 1467-9868. doi: 10.1111/j.1467-9868.2007.00605.x.
  • Chudova et al. [2003] Darya Chudova, Scott Gaffney, Eric Mjolsness, and Padhraic Smyth. Translation-invariant mixture models for curve clustering. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’03, pages 79–88, New York, NY, USA, August 2003. Association for Computing Machinery. ISBN 978-1-58113-737-8. doi: 10.1145/956750.956763.
  • Coddington and Dudman [2018] Luke T Coddington and Joshua T Dudman. The timing of action determines reward prediction signals in identified midbrain dopamine neurons. Nature neuroscience, 21(11):1563–1573, 2018.
  • Coddington and Dudman [2019] Luke T Coddington and Joshua T Dudman. Learning from action: reconsidering movement signaling in midbrain dopamine neuron activity. Neuron, 104(1):63–77, 2019.
  • Crainiceanu et al. [2009] Ciprian M Crainiceanu, Ana-Maria Staicu, and Chong-Zhi Di. Generalized multilevel functional regression. Journal of the American Statistical Association, 104(488):1550–1561, 2009.
  • Daley and Vere-Jones [2003] Daryl J. Daley and David Vere-Jones. An Introduction to the Theory of Point Processes: Volume I: Elementary Theory and Methods. Probability and Its Applications. Springer, New York, second edition, 2003. ISBN 978-0-387-95541-4. doi: 10.1007/b97277.
  • Dempsey [2023] Walter Dempsey. Recurrent event analysis in the presence of real-time high frequency data via random subsampling. Journal of Computational and Graphical Statistics, 0(0):1–13, 2023. doi: 10.1080/10618600.2023.2276114. URL https://doi.org/10.1080/10618600.2023.2276114.
  • Di et al. [2009] Chong-Zhi Di, Ciprian M Crainiceanu, Brian S Caffo, and Naresh M Punjabi. Multilevel functional principal component analysis. The annals of applied statistics, 3(1):458, 2009.
  • Djorno and Crawford [2024] Alexandra Djorno and Forrest W Crawford. Mutually exciting point processes for crowdfunding platform dynamics. arXiv preprint arXiv:2402.15433, 2024.
  • Escola et al. [2011] Sean Escola, Alfredo Fontanini, Don Katz, and Liam Paninski. Hidden Markov models for the stimulus-response relationships of multistate neural systems. Neural computation, 23(5):1071–1132, 2011.
  • Evans et al. [2021] Stephanie Evans, Emily Agnew, Emilia Vynnycky, James Stimson, Alex Bhattacharya, Christopher Rooney, Ben Warne, and Julie Robotham. The impact of testing and infection prevention and control strategies on within-hospital transmission dynamics of COVID-19 in English hospitals. Philosophical Transactions of the Royal Society B, 376(1829):20200268, 2021.
  • Gaffney and Smyth [2004] Scott Gaffney and Padhraic Smyth. Joint probabilistic curve clustering and alignment. In L. Saul, Y. Weiss, and L. Bottou, editors, Advances in Neural Information Processing Systems, volume 17. MIT Press, 2004.
  • Gajardo et al. [2021] Alvaro Gajardo, Satarupa Bhattacharjee, Cody Carroll, Yaqing Chen, Xiongtao Dai, Jianing Fan, Pantelis Z. Hadjipantelis, Kyunghee Han, Hao Ji, Changbo Zhu, Hans-Georg Müller, and Jane-Ling Wang. fdapace: Functional Data Analysis and Empirical Dynamics, 2021. URL https://CRAN.R-project.org/package=fdapace. R package version 0.5.8.
  • Galvani et al. [2021] Marta Galvani, Agostino Torti, Alessandra Menafoglio, and Simone Vantini. Funcc: A new bi-clustering algorithm for functional data with misalignment. Computational Statistics & Data Analysis, 160:107219, 2021.
  • Xu et al. [2024] Ganggang Xu, Jingfei Zhang, Yehua Li, and Yongtao Guan. Bias-correction and test for mark-point dependence with replicated marked point processes. Journal of the American Statistical Association, 119(545):217–231, 2024. doi: 10.1080/01621459.2022.2106234. URL https://doi.org/10.1080/01621459.2022.2106234.
  • Hubert and Arabie [1985] Lawrence Hubert and Phipps Arabie. Comparing partitions. Journal of Classification, 2(1):193–218, December 1985. ISSN 01764268. doi: 10.1007/BF01908075.
  • Inagaki et al. [2022] Hidehiko K Inagaki, Susu Chen, Margreet C Ridder, Pankaj Sah, Nuo Li, Zidan Yang, Hana Hasanbegovic, Zhenyu Gao, Charles R Gerfen, and Karel Svoboda. A midbrain-thalamus-cortex circuit reorganizes cortical dynamics to initiate movement. Cell, 185(6):1065–1081, 2022.
  • Jacques and Preda [2013] Julien Jacques and Cristian Preda. Funclust: A curves clustering method using functional random variables density approximation. Neurocomputing, 112:164–171, 2013.
  • James [2007] Gareth M James. Curve alignment by moments. 2007.
  • Jun et al. [2017] James J Jun, Nicholas A Steinmetz, Joshua H Siegle, Daniel J Denman, Marius Bauza, Brian Barbarits, Albert K Lee, Costas A Anastassiou, Alexandru Andrei, Çağatay Aydın, et al. Fully integrated silicon probes for high-density recording of neural activity. Nature, 551(7679):232–236, 2017.
  • Klevens et al. [2016] Joanne Klevens, Feijun Luo, Likang Xu, Cora Peterson, and Natasha E Latzman. Paid family leave’s effect on hospital admissions for pediatric abusive head trauma. Injury prevention, 2016.
  • Kneip and Gasser [1992] Alois Kneip and Theo Gasser. Statistical tools to analyze data representing a sample of curves. The Annals of Statistics, pages 1266–1305, 1992.
  • Krueger et al. [2008] James M Krueger, David M Rector, Sandip Roy, Hans PA Van Dongen, Gregory Belenky, and Jaak Panksepp. Sleep as a fundamental property of neuronal assemblies. Nature Reviews Neuroscience, 9(12):910–919, 2008.
  • Lake et al. [2016] Blue B Lake, Rizi Ai, Gwendolyn E Kaeser, Neeraj S Salathia, Yun C Yung, Rui Liu, Andre Wildberg, Derek Gao, Ho-Lim Fung, Song Chen, et al. Neuronal subtypes and diversity revealed by single-nucleus RNA sequencing of the human brain. Science, 352(6293):1586–1590, 2016.
  • Lee et al. [2020] Joonyeol Lee, Timothy R Darlington, and Stephen G Lisberger. The neural basis for response latency in a sensory-motor behavior. Cerebral Cortex, 30(5):3055–3073, 2020.
  • Levakova et al. [2015] Marie Levakova, Massimiliano Tamborrino, Susanne Ditlevsen, and Petr Lansky. A review of the methods for neuronal response latency estimation. Biosystems, 136:23–34, 2015.
  • Liu and Yang [2009] Xueli Liu and Mark C.K. Yang. Simultaneous curve registration and clustering for functional data. Computational Statistics & Data Analysis, 53(4):1361–1376, February 2009. ISSN 01679473. doi: 10.1016/j.csda.2008.11.019.
  • Lu and Lou [2019] Zihang Lu and Wendy Lou. Shape invariant mixture model for clustering non-linear longitudinal growth trajectories. Statistical Methods in Medical Research, 28(12):3769–3784, December 2019. ISSN 0962-2802. doi: 10.1177/0962280218815301.
  • Madeira and Oliveira [2004] Sara C Madeira and Arlindo L Oliveira. Biclustering algorithms for biological data analysis: a survey. IEEE/ACM transactions on computational biology and bioinformatics, 1(1):24–45, 2004.
  • Mews et al. [2023] Sina Mews, Bastian Surmann, Lena Hasemann, and Svenja Elkenkamp. Markov-modulated marked Poisson processes for modeling disease dynamics based on medical claims data. Statistics in Medicine, 2023.
  • Molyneaux et al. [2007] Bradley J Molyneaux, Paola Arlotta, Joao RL Menezes, and Jeffrey D Macklis. Neuronal subtype specification in the cerebral cortex. Nature reviews neuroscience, 8(6):427–437, 2007.
  • Morris and Carroll [2006] Jeffrey S Morris and Raymond J Carroll. Wavelet-based functional mixed models. Journal of the Royal Statistical Society Series B: Statistical Methodology, 68(2):179–199, 2006.
  • Oram et al. [2002] Michael W Oram, Dengke Xiao, Barbara Dritschel, and Kevin R Payne. The temporal resolution of neural codes: does response latency have a unique role? Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 357(1424):987–1001, 2002.
  • Orhan and Ma [2015] A Emin Orhan and Wei Ji Ma. Neural population coding of multiple stimuli. Journal of Neuroscience, 35(9):3825–3841, 2015.
  • Pollard [1981] David Pollard. Strong consistency of k-means clustering. The Annals of Statistics, pages 135–140, 1981.
  • Ramsay and Li [1998] James O Ramsay and Xiaochun Li. Curve registration. Journal of the Royal Statistical Society Series B: Statistical Methodology, 60(2):351–363, 1998.
  • Sangalli et al. [2010] Laura M. Sangalli, Piercesare Secchi, Simone Vantini, and Valeria Vitelli. K-mean alignment for curve clustering. Computational Statistics & Data Analysis, 54(5):1219–1233, May 2010. ISSN 01679473. doi: 10.1016/j.csda.2009.12.008.
  • Schoenberg [2023] Frederic Schoenberg. Estimating COVID-19 transmission time using Hawkes point processes. The Annals of Applied Statistics, 17(4):3349–3362, 2023.
  • Sims et al. [2010] Michelle Sims, Roy Maxwell, Linda Bauld, and Anna Gilmore. Short term impact of smoke-free legislation in England: retrospective analysis of hospital admissions for myocardial infarction. BMJ, 340, 2010.
  • Slimen et al. [2018] Yosra Ben Slimen, Sylvain Allio, and Julien Jacques. Model-based co-clustering for functional data. Neurocomputing, 291:97–108, 2018.
  • Steinmetz et al. [2018] Nicholas A Steinmetz, Christof Koch, Kenneth D Harris, and Matteo Carandini. Challenges and opportunities for large-scale electrophysiology with neuropixels probes. Current opinion in neurobiology, 50:92–100, 2018.
  • Steinmetz et al. [2019] Nicholas A. Steinmetz, Peter Zatka-Haas, Matteo Carandini, and Kenneth D. Harris. Distributed coding of choice, action and engagement across the mouse brain. Nature, 576(7786):266–273, December 2019. ISSN 0028-0836, 1476-4687. doi: 10.1038/s41586-019-1787-x.
  • Steinmetz et al. [2021] Nicholas A Steinmetz, Cagatay Aydin, Anna Lebedeva, Michael Okun, Marius Pachitariu, Marius Bauza, Maxime Beau, Jai Bhagat, Claudia Böhm, Martijn Broux, et al. Neuropixels 2.0: A miniaturized high-density probe for stable, long-term brain recordings. Science, 372(6539):eabf4588, 2021.
  • Sun et al. [2012] Wei Sun, Junhui Wang, and Yixin Fang. Regularized k-means clustering of high-dimensional data and its asymptotic consistency. 2012.
  • Tanaka et al. [2016] Yusuke Tanaka, Takeshi Kurashima, Yasuhiro Fujiwara, Tomoharu Iwata, and Hiroshi Sawada. Inferring latent triggers of purchases with consideration of social effects and media advertisements. In Proceedings of the ninth ACM international conference on web search and data mining, pages 543–552, 2016.
  • Tang et al. [2023] Lin Tang, Pengcheng Zeng, Jian Qing Shi, and Won-Seok Kim. Model-based joint curve registration and classification. Journal of Applied Statistics, 50(5):1178–1198, 2023.
  • Tang and Li [2023] Xiwei Tang and Lexin Li. Multivariate temporal point process regression. Journal of the American Statistical Association, 118(542):830–845, 2023. doi: 10.1080/01621459.2021.1955690. URL https://doi.org/10.1080/01621459.2021.1955690. PMID: 37519438.
  • Telesca and Inoue [2008] Donatello Telesca and Lurdes Y T Inoue. Bayesian hierarchical curve registration. Journal of the American Statistical Association, 103(481):328–339, 2008.
  • Thorndike [1953] Robert L Thorndike. Who belongs in the family? Psychometrika, 18(4):267–276, 1953.
  • Tokdar et al. [2010] Surya Tokdar, Peiyi Xi, Ryan C Kelly, and Robert E Kass. Detection of bursts in extracellular spike trains using hidden semi-Markov point process models. Journal of computational neuroscience, 29:203–212, 2010.
  • Tucker [2023] J. Derek Tucker. fdasrvf: Elastic Functional Data Analysis, 2023. URL https://CRAN.R-project.org/package=fdasrvf. R package version 2.0.2.
  • Vimond [2010] Myriam Vimond. Efficient estimation for a subclass of shape invariant models. 2010.
  • Vyazovskiy and Harris [2013] Vladyslav V Vyazovskiy and Kenneth D Harris. Sleep and the single neuron: the role of global slow oscillations in individual cell rest. Nature Reviews Neuroscience, 14(6):443–451, 2013.
  • Vyazovskiy et al. [2011] Vladyslav V Vyazovskiy, Umberto Olcese, Erin C Hanlon, Yuval Nir, Chiara Cirelli, and Giulio Tononi. Local sleep in awake rats. Nature, 472(7344):443–447, 2011.
  • Wu et al. [2013] Shuang Wu, Hans-Georg Müller, and Zhen Zhang. Functional data analysis for point processes with rare events. Statistica Sinica, pages 1–23, 2013.
  • Xu et al. [2020] Ganggang Xu, Ming Wang, Jiangze Bian, Hui Huang, Timothy R Burch, Sandro C Andrade, Jingfei Zhang, and Yongtao Guan. Semi-parametric learning of structured temporal point processes. The Journal of Machine Learning Research, 21(1):7851–7889, 2020.
  • Xu et al. [2014] Lizhen Xu, Jason A Duan, and Andrew Whinston. Path to purchase: A mutually exciting point process model for online advertising and conversion. Management Science, 60(6):1392–1412, 2014.
  • Yao et al. [2005] Fang Yao, Hans-Georg Müller, and Jane-Ling Wang. Functional Data Analysis for Sparse Longitudinal Data. Journal of the American Statistical Association, 100(470):577–590, June 2005. ISSN 0162-1459. doi: 10.1198/016214504000001745.
  • Yin et al. [2021] Lihao Yin, Ganggang Xu, Huiyan Sang, and Yongtao Guan. Row-clustering of a point process-valued matrix. Advances in Neural Information Processing Systems, 34:20028–20039, 2021.
  • Zadeh and Sharda [2014] Amir Hassan Zadeh and Ramesh Sharda. Modeling brand post popularity dynamics in online social networks. Decision Support Systems, 65:59–68, 2014.

Supplementary Materials

Appendix A Proof of Proposition 1

Suppose $(a,\mathbf{f},\mathbf{v})\in\Theta_{0}$ are such that for all $t\in[0,T]$, $\mathbf{w}^{*}\in[0,W]^{M}$, $i\in[n]$ the following equality holds:

\[
a^{*}+\sum_{m\in[M]}S^{v^{*}_{i,m}+w^{*}_{m}}f^{*}_{m}(t)=a+\sum_{m\in[M]}S^{v_{i,m}+w^{*}_{m}}f_{m}(t). \tag{31}
\]

Noting that $T_{0}+V+W\leq T$ by Assumption A1, the equation in (31) can be formulated in the frequency domain as follows: for $\xi\neq 0$, $\mathbf{w}^{*}\in[0,W]^{M}$, $i\in[n]$,

\[
\sum_{m\in[M]}\exp\left\{-\operatorname{j}2\pi\xi(v^{*}_{i,m}+w^{*}_{m})\right\}\phi^{*}_{m}(\xi)=\sum_{m\in[M]}\exp\left\{-\operatorname{j}2\pi\xi(v_{i,m}+w^{*}_{m})\right\}\phi_{m}(\xi), \tag{32}
\]

where $\phi^{*}_{m}(\xi)$ and $\phi_{m}(\xi)$ are the Fourier coefficients of $f^{*}_{m}(t)$ and $f_{m}(t)$ at frequency $\xi$.

We first show an intermediate result that $\exp\left\{-\operatorname{j}2\pi\xi(v^{*}_{i,m}-v_{i,m})\right\}\phi^{*}_{m}(\xi)=\phi_{m}(\xi)$. By employing matrix notation, (32) can be written as, for $\xi\neq 0$, $\mathbf{w}^{*}\in[0,W]^{M}$, $i\in[n]$,

\[
\boldsymbol{\eta}^{*}(\xi)^{\top}\boldsymbol{\psi}^{*}_{i}(\xi)=\boldsymbol{\eta}^{*}(\xi)^{\top}\boldsymbol{\psi}_{i}(\xi), \tag{33}
\]

where

\[
\boldsymbol{\eta}^{*}(\xi)\equiv\left[\exp\left\{-\operatorname{j}2\pi\xi w^{*}_{1}\right\}~\cdots~\exp\left\{-\operatorname{j}2\pi\xi w^{*}_{M}\right\}\right]^{\top}, \tag{34}
\]
\[
\boldsymbol{\psi}^{*}_{i}(\xi)\equiv\left[\exp\left\{-\operatorname{j}2\pi\xi v^{*}_{i,1}\right\}\phi^{*}_{1}(\xi)~\cdots~\exp\left\{-\operatorname{j}2\pi\xi v^{*}_{i,M}\right\}\phi^{*}_{M}(\xi)\right]^{\top}, \tag{35}
\]
\[
\boldsymbol{\psi}_{i}(\xi)\equiv\left[\exp\left\{-\operatorname{j}2\pi\xi v_{i,1}\right\}\phi_{1}(\xi)~\cdots~\exp\left\{-\operatorname{j}2\pi\xi v_{i,M}\right\}\phi_{M}(\xi)\right]^{\top}. \tag{36}
\]

From Assumption A2, we know that $\mathbb{E}[\overline{\boldsymbol{\eta}^{*}(\xi)}\,\boldsymbol{\eta}^{*}(\xi)^{\top}]$ is invertible for $\xi\in\mathbb{R}\setminus\{0\}$. Therefore, we know from (33) that, for $\xi\neq 0$, $i\in[n]$,

\[
\boldsymbol{\psi}^{*}_{i}(\xi)=\boldsymbol{\psi}_{i}(\xi). \tag{37}
\]

Substituting the definitions of $\boldsymbol{\psi}^{*}_{i}(\xi)$ and $\boldsymbol{\psi}_{i}(\xi)$ in (35) and (36) into (37), we obtain that, for $\xi\neq 0$, $i\in[n]$, $m\in[M]$,

\[
\exp\left\{-\operatorname{j}2\pi\xi(v^{*}_{i,m}-v_{i,m})\right\}\phi^{*}_{m}(\xi)=\phi_{m}(\xi). \tag{38}
\]

Now we show that, for $m\in[M]$ such that the set $\{t:f^{*}_{m}(t)\neq 0\}$ has positive measure, the subject-specific time shifts are identifiable up to a constant, i.e., $v^{*}_{i,m}-v_{i,m}=c_{m}$ for $i\in[n]$, where $c_{m}$ is a constant independent of $i$. Based on (38), it can be derived that for any $\xi\neq 0$, $m\in[M]$, $i,i^{\prime}\in[n]$,

\[
\exp\left\{-\operatorname{j}2\pi\xi(v^{*}_{i,m}-v_{i,m})\right\}\phi^{*}_{m}(\xi)=\exp\left\{-\operatorname{j}2\pi\xi(v^{*}_{i^{\prime},m}-v_{i^{\prime},m})\right\}\phi^{*}_{m}(\xi). \tag{39}
\]

Consider the function $g(t)\equiv S^{v^{*}_{i,m}-v_{i,m}}f^{*}_{m}(t)-S^{v^{*}_{i^{\prime},m}-v_{i^{\prime},m}}f^{*}_{m}(t)$. From (39) we know that the Fourier coefficient of $g(t)$ is zero at every frequency $\xi\neq 0$. Moreover, from the definition of the set $\mathcal{F}$, we know that $g(t)$ has bounded support. Applying Lemma 3 to $g(t)$, we deduce that $g(t)=0$ almost everywhere. Consequently, for any $i,i^{\prime}\in[n]$ and $m\in[M]$,

\[
S^{v^{*}_{i,m}-v_{i,m}}f^{*}_{m}(t)-S^{v^{*}_{i^{\prime},m}-v_{i^{\prime},m}}f^{*}_{m}(t)=0\quad\text{a.e.,} \tag{40}
\]

or equivalently,

\[
S^{(v^{*}_{i,m}-v_{i,m})-(v^{*}_{i^{\prime},m}-v_{i^{\prime},m})}f^{*}_{m}(t)=f^{*}_{m}(t)\quad\text{a.e.} \tag{41}
\]

The equation in (41) implies that, for $m\in[M]$ such that the set $\{t:f^{*}_{m}(t)\neq 0\}$ has positive measure, $(v^{*}_{i,m}-v_{i,m})-(v^{*}_{i^{\prime},m}-v_{i^{\prime},m})=0$ for any $i,i^{\prime}\in[n]$. Consequently, for $i\in[n]$,

\[
v^{*}_{i,m}-v_{i,m}=c_{m}, \tag{42}
\]

where $c_{m}$ is a constant independent of $i$. In other words, Statement P2 is proved.

Next we show that, for $m\in[M]$, $S^{c_{m}}f^{*}_{m}(t)=f_{m}(t)$ for almost every $t\in\mathbb{R}$. Plugging (42) into (38), we obtain that, for $m\in[M]$ such that the set $\{t:f^{*}_{m}(t)\neq 0\}$ has positive measure and $\xi\neq 0$,

\[
\exp\left\{-\operatorname{j}2\pi\xi c_{m}\right\}\phi^{*}_{m}(\xi)=\phi_{m}(\xi). \tag{43}
\]

Since (43) holds for all $\xi\neq 0$ but imposes no constraint at $\xi=0$, the zero-frequency components of the two sides may differ; applying the inverse Fourier transformation to (43) thus yields

\[
S^{c_{m}}f^{*}_{m}(t)=f_{m}(t)+c\quad\text{a.e.,} \tag{44}
\]

where $c\in\mathbb{R}$ is a constant. From the definition of $\mathcal{F}$, we know that

\[
f^{*}_{m}(t)=f_{m}(t)=0,\quad\text{for }t\in\mathbb{R}\setminus[0,T_{0}]. \tag{45}
\]

Combining (44) and (45), we derive that $c=0$. Inserting this value of $c$ into (44) yields that, for $m\in[M]$ such that the set $\{t:f^{*}_{m}(t)\neq 0\}$ has positive measure,

\[
S^{c_{m}}f^{*}_{m}(t)=f_{m}(t)\quad\text{a.e.} \tag{46}
\]

Notably, for $m\in[M]$ such that $f^{*}_{m}=0$ almost everywhere, (46) holds trivially. Thus (46) holds for all $m\in[M]$; in other words, Statement P1 is proved.

Finally, substituting (46) and (42) into (31), we derive that,

\[
a^{*}=a. \tag{47}
\]

In other words, Statement P3 is proved. $\square$

Appendix B Proof of Proposition 2

Suppose there exist $(\mathbf{z},\boldsymbol{a},\mathbf{f},\mathbf{v})\in\Theta_{1}$ such that for all $t\in[0,T]$, $\mathbf{w}^{*}\in[0,W]^{M}$, $i\in[n]$, the following equation holds:

\[
a^{*}_{z^{*}_{i}}+\sum_{m\in[M]}S^{v^{*}_{i,m}+w^{*}_{m}}f^{*}_{z^{*}_{i},m}(t)=a_{z_{i}}+\sum_{m\in[M]}S^{v_{i,m}+w^{*}_{m}}f_{z_{i},m}(t). \tag{48}
\]

Using $T_{0}+V+W\leq T$ from Assumption A1, (48) can be formulated in the frequency domain as follows: for $\xi\neq 0$, $\mathbf{w}^{*}\in[0,W]^{M}$, $i\in[n]$,

\[
\sum_{m\in[M]}\exp\left\{-\operatorname{j}2\pi\xi(v^{*}_{i,m}+w^{*}_{m})\right\}\phi^{*}_{z^{*}_{i},m}(\xi)=\sum_{m\in[M]}\exp\left\{-\operatorname{j}2\pi\xi(v_{i,m}+w^{*}_{m})\right\}\phi_{z_{i},m}(\xi), \tag{49}
\]

where $\phi^{*}_{k,m}(\xi)$ and $\phi_{k,m}(\xi)$ are the Fourier coefficients of $f^{*}_{k,m}$ and $f_{k,m}$ at frequency $\xi$.

We first show an intermediate result that $\exp\left\{-\operatorname{j}2\pi\xi(v^{*}_{i,m}-v_{i,m})\right\}\phi^{*}_{z^{*}_{i},m}(\xi)=\phi_{z_{i},m}(\xi)$. By employing matrix notation, (49) can be written as

\[
\boldsymbol{\eta}^{*}(\xi)^{\top}\boldsymbol{\psi}^{*}_{i}(\xi)=\boldsymbol{\eta}^{*}(\xi)^{\top}\boldsymbol{\psi}_{i}(\xi), \tag{50}
\]

where

\[
\boldsymbol{\eta}^{*}(\xi)\equiv\left[\exp\left\{-\operatorname{j}2\pi\xi w^{*}_{1}\right\}~\cdots~\exp\left\{-\operatorname{j}2\pi\xi w^{*}_{M}\right\}\right]^{\top}, \tag{51}
\]
\[
\boldsymbol{\psi}^{*}_{i}(\xi)\equiv\left[\exp\left\{-\operatorname{j}2\pi\xi v^{*}_{i,1}\right\}\phi^{*}_{z^{*}_{i},1}(\xi)~\cdots~\exp\left\{-\operatorname{j}2\pi\xi v^{*}_{i,M}\right\}\phi^{*}_{z^{*}_{i},M}(\xi)\right]^{\top}, \tag{52}
\]
\[
\boldsymbol{\psi}_{i}(\xi)\equiv\left[\exp\left\{-\operatorname{j}2\pi\xi v_{i,1}\right\}\phi_{z_{i},1}(\xi)~\cdots~\exp\left\{-\operatorname{j}2\pi\xi v_{i,M}\right\}\phi_{z_{i},M}(\xi)\right]^{\top}. \tag{53}
\]

From Assumption A2 we know that $\mathbb{E}[\overline{\boldsymbol{\eta}^{*}(\xi)}\,\boldsymbol{\eta}^{*}(\xi)^{\top}]$ is invertible for $\xi\in\mathbb{R}\setminus\{0\}$. Therefore, we have that, for $\xi\neq 0$,

\[
\boldsymbol{\psi}^{*}_{i}(\xi)=\boldsymbol{\psi}_{i}(\xi). \tag{54}
\]

Substituting the definitions of $\boldsymbol{\psi}^{*}_{i}(\xi)$ and $\boldsymbol{\psi}_{i}(\xi)$ in (52) and (53) into (54), we obtain that, for $\xi\neq 0$, $i\in[n]$, $m\in[M]$,

\[
\exp\left\{-\operatorname{j}2\pi\xi(v^{*}_{i,m}-v_{i,m})\right\}\phi^{*}_{z^{*}_{i},m}(\xi)=\phi_{z_{i},m}(\xi). \tag{55}
\]

Now we show that the cluster memberships are identifiable up to a permutation of cluster labels, i.e., $z_{i}=\sigma(z^{*}_{i})$ where $\sigma:[K]\to[K]$ is a permutation of $[K]$. To achieve this, we prove the following two statements: (i) $z_{i}=z_{i^{\prime}}\Rightarrow z^{*}_{i}=z^{*}_{i^{\prime}}$; and (ii) $z_{i}\neq z_{i^{\prime}}\Rightarrow z^{*}_{i}\neq z^{*}_{i^{\prime}}$. First, based on (55), we can derive that for $i,i^{\prime}\in[n]$ such that $z_{i}=z_{i^{\prime}}$, $\xi\neq 0$, and $m\in[M]$,

\[
\exp\left\{-\operatorname{j}2\pi\xi(v^{*}_{i,m}-v_{i,m})\right\}\phi^{*}_{z^{*}_{i},m}(\xi)=\exp\left\{-\operatorname{j}2\pi\xi(v^{*}_{i^{\prime},m}-v_{i^{\prime},m})\right\}\phi^{*}_{z^{*}_{i^{\prime}},m}(\xi). \tag{56}
\]

Consider the function $g(t)\equiv S^{v^{*}_{i,m}-v_{i,m}}f^{*}_{z^{*}_{i},m}(t)-S^{v^{*}_{i^{\prime},m}-v_{i^{\prime},m}}f^{*}_{z^{*}_{i^{\prime}},m}(t)$. From (56) we know that the Fourier transform of $g(t)$ is zero for $\xi\neq 0$. Moreover, from the definition of $\mathcal{F}$ we know that $g(t)$ has bounded support. Applying Lemma 3 to $g(t)$, we deduce that $g(t)=0$ almost everywhere. Consequently, for any $i,i^{\prime}\in[n]$ with $z_{i}=z_{i^{\prime}}$ and $m\in[M]$, the following equation holds:

\[
S^{v^{*}_{i,m}-v_{i,m}}f^{*}_{z^{*}_{i},m}(t)-S^{v^{*}_{i^{\prime},m}-v_{i^{\prime},m}}f^{*}_{z^{*}_{i^{\prime}},m}(t)=0\quad\text{a.e.} \tag{57}
\]

According to Assumption A3, if $z^{*}_{i}\neq z^{*}_{i^{\prime}}$, then there exists $m_{0}\in[M]$ such that for any $x\in\mathbb{R}$, the set $\{t\in\mathbb{R}:S^{x}f^{*}_{z^{*}_{i},m_{0}}(t)\neq f^{*}_{z^{*}_{i^{\prime}},m_{0}}(t)\}$ has positive measure. Therefore, in order for (57) to hold, we must have $z^{*}_{i}=z^{*}_{i^{\prime}}$. In other words,

\[
z_{i}=z_{i^{\prime}}\Rightarrow z^{*}_{i}=z^{*}_{i^{\prime}}. \tag{58}
\]

Second, suppose there exist $i_{0},i_{1}$ such that $z_{i_{0}}\neq z_{i_{1}}$ and $z^{*}_{i_{0}}=z^{*}_{i_{1}}$. Given that there are $K$ non-empty clusters by definition, it is always possible to find indices $i_{2},\ldots,i_{K}$ such that $z^{*}_{i_{1}},\ldots,z^{*}_{i_{K}}$ are pairwise distinct. Based on (58), it follows that $z_{i_{1}},\ldots,z_{i_{K}}$ are also pairwise distinct. Since $z_{i_{0}}\neq z_{i_{1}}$ by the definitions of $i_{0}$ and $i_{1}$, there must exist $k\in[K]\setminus\{1\}$ such that $z_{i_{0}}=z_{i_{k}}$, which, according to (58), implies $z^{*}_{i_{0}}=z^{*}_{i_{k}}$. Thus, by the definitions of $i_{0}$ and $i_{1}$, we have $z^{*}_{i_{0}}=z^{*}_{i_{1}}=z^{*}_{i_{k}}$. This contradicts the definition of $i_{2},\ldots,i_{K}$, which asserts that $z^{*}_{i_{1}}$ is different from $z^{*}_{i_{2}},\ldots,z^{*}_{i_{K}}$. Consequently, such $i_{0}$ and $i_{1}$ cannot exist. Hence we know that, for all $i,i^{\prime}\in[n]$,

\[
z_{i}\neq z_{i^{\prime}}\Rightarrow z^{*}_{i}\neq z^{*}_{i^{\prime}}. \tag{59}
\]

Combining (59) and (58), we obtain that $z^{*}_{i}=z^{*}_{i^{\prime}}$ if and only if $z_{i}=z_{i^{\prime}}$. This further implies that there exists a permutation of $[K]$, denoted by $\sigma:[K]\to[K]$, such that for $i\in[n]$,

\[
z_{i}=\sigma(z^{*}_{i}). \tag{60}
\]

In other words, Statement P4 is proved.

Now we show that, for $k\in[K]$ and $m\in[M]$ such that the set $\{t:f^{*}_{k,m}(t)\neq 0\}$ has positive measure, the subject-specific time shifts are identifiable up to a constant, i.e., $v^{*}_{i,m}-v_{i,m}=c_{k,m}$ for $i\in\{i:z^{*}_{i}=k\}$, where $c_{k,m}$ is a constant independent of $i$. Plugging (60) into (57), we have that, for $i,i^{\prime}\in[n]$ such that $z^{*}_{i}=z^{*}_{i^{\prime}}=k$ and $m\in[M]$,

\[
S^{v^{*}_{i,m}-v_{i,m}}f^{*}_{k,m}(t)=S^{v^{*}_{i^{\prime},m}-v_{i^{\prime},m}}f^{*}_{k,m}(t)\quad\text{a.e.} \tag{61}
\]

From the definition of $\mathcal{F}$ we know that $f^{*}_{k,m}(t)$ has bounded support for $k\in[K]$ and $m\in[M]$. Therefore, (61) implies that, for $k\in[K]$ and $m\in[M]$ such that the set $\{t:f^{*}_{k,m}(t)\neq 0\}$ has positive measure, and $i,i^{\prime}\in[n]$ such that $z^{*}_{i}=z^{*}_{i^{\prime}}$,

\[
v^{*}_{i,m}-v^{*}_{i^{\prime},m}=v_{i,m}-v_{i^{\prime},m}. \tag{62}
\]

It follows from (62) that

\[
v^{*}_{i,m}-v_{i,m}=c_{k,m}, \tag{63}
\]

where $c_{k,m}\in\mathbb{R}$ is a constant independent of $i$. In other words, Statement P7 is proved.

Next we show that $S^{c_{k,m}}f^{*}_{k,m}(t)=f_{\sigma(k),m}(t)$ for almost every $t\in\mathbb{R}$. Plugging (63) and (60) into (55), we obtain that, for $k\in[K]$ and $m\in[M]$ such that the set $\{t:f^{*}_{k,m}(t)\neq 0\}$ has positive measure and $\xi\neq 0$,

\[
\exp\left\{-\operatorname{j}2\pi\xi c_{k,m}\right\}\phi^{*}_{k,m}(\xi)=\phi_{\sigma(k),m}(\xi). \tag{64}
\]

Since (64) imposes no constraint at $\xi=0$, applying the inverse Fourier transformation to (64) yields

\[
S^{c_{k,m}}f^{*}_{k,m}(t)=f_{\sigma(k),m}(t)+c\quad\text{a.e.,} \tag{65}
\]

where $c\in\mathbb{R}$ is a constant. From the definition of $\mathcal{F}$, we know that

\[
f^{*}_{k,m}(t)=f_{k,m}(t)=0,\quad\text{for }t\in\mathbb{R}\setminus[0,T_{0}]. \tag{66}
\]

Combining (65) and (66), we derive that $c=0$. Inserting this value of $c$ into (65) yields that, for $k\in[K]$ and $m\in[M]$ such that the set $\{t:f^{*}_{k,m}(t)\neq 0\}$ has positive measure,

\[
S^{c_{k,m}}f^{*}_{k,m}(t)=f_{\sigma(k),m}(t)\quad\text{a.e.} \tag{67}
\]

Notably, for $k\in[K]$ and $m\in[M]$ such that $f^{*}_{k,m}=0$ almost everywhere, (67) holds trivially. In other words, (67) holds for all $k\in[K]$ and $m\in[M]$. Therefore, Statement P6 is proved.

Finally, substituting (67), (63) and (60) into (48), we derive that, for $k\in[K]$,

\[
a^{*}_{k}=a_{\sigma(k)}. \tag{68}
\]

In other words, Statement P5 is proved. $\square$

Appendix C Connection between the additive shape invariant model and the functional principal component analysis (FPCA)

When the variances of the $v_{i,m}$'s and $w^{*}_{r,m}$'s are both close to zero, the proposed model in (1) can be approximated using a Taylor expansion:

\[
\begin{split}
\lambda_{i,r}(t)&=a+\sum_{m\in[M]}S^{u_{i,r,m}}f_{m}(t)\\
&=a+\sum_{m\in[M]}f_{m}\left(\{t-\mathbb{E}u_{i,r,m}\}-\{u_{i,r,m}-\mathbb{E}u_{i,r,m}\}\right)\\
&\approx a+\sum_{m\in[M]}\left\{f_{m}(t-\mathbb{E}u_{i,r,m})-(u_{i,r,m}-\mathbb{E}u_{i,r,m})Df_{m}(t-\mathbb{E}u_{i,r,m})\right\}\\
&=a+\sum_{m\in[M]}f_{m}(t-\mathbb{E}u_{i,r,m})-\sum_{m\in[M]}(u_{i,r,m}-\mathbb{E}u_{i,r,m})Df_{m}(t-\mathbb{E}u_{i,r,m}),
\end{split} \tag{69}
\]

where $u_{i,r,m}\equiv v_{i,m}+w^{*}_{r,m}$, and $Df_{m}$ denotes the first-order derivative of $f_{m}$. In (69), the first equality follows from the definition of $u_{i,r,m}$, the second equality follows from the definition of the shift operator, and the approximation in the third line follows from the Taylor expansion. To elucidate the connection between the proposed additive shape invariant model and FPCA, we define a new set of parameters:

\[
\mu(t)\equiv a+\sum_{m\in[M]}f_{m}(t-\mathbb{E}u_{i,r,m}), \tag{70}
\]
\[
\zeta_{i,r,m}\equiv-(u_{i,r,m}-\mathbb{E}u_{i,r,m})\left\|Df_{m}(t-\mathbb{E}u_{i,r,m})\right\|_{t}, \tag{71}
\]
\[
\psi_{m}(t)\equiv Df_{m}(t-\mathbb{E}u_{i,r,m})\left\|Df_{m}(t-\mathbb{E}u_{i,r,m})\right\|_{t}^{-1}. \tag{72}
\]

By the definitions of $\zeta_{i,r,m}$ and $\psi_{m}(t)$ in (71) and (72), we know that $\mathbb{E}\zeta_{i,r,m}=0$ and $\|\psi_{m}(t)\|_{t}=1$. Using the new parameters in (70), (71) and (72), the model approximation in (69) can be expressed as follows:

\[
\lambda_{i,r}(t)\approx\mu(t)+\sum_{m\in[M]}\zeta_{i,r,m}\,\psi_{m}(t). \tag{73}
\]

When $\{f_{m}(t-\mathbb{E}u_{i,r,m}):m\in[M]\}$ have non-overlapping supports, the $\psi_{m}(t)$'s are mutually orthogonal. Consequently, the approximate model in (73) is an FPCA model, where $\psi_{1}(t),\ldots,\psi_{M}(t)$ correspond to the first $M$ eigenfunctions, and $\zeta_{i,r,1},\ldots,\zeta_{i,r,M}$ correspond to the principal component scores associated with these eigenfunctions.
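The first-order approximation underlying (69) and (73) can be checked numerically when the shift variance is small. The sketch below uses an illustrative Gaussian bump for $f_{m}$ with $M=1$ and hypothetical parameter values (none of which are taken from the paper), and compares a shifted curve with its expansion around the mean shift:

import numpy as np

# Smooth bump playing the role of f_m, with derivative Df_m.
f = lambda t: np.exp(-0.5 * ((t - 1.0) / 0.2) ** 2)
Df = lambda t: -(t - 1.0) / 0.2**2 * f(t)

t = np.linspace(0.0, 3.5, 701)
u_mean, u = 0.5, 0.53    # E[u_{i,r,m}] and one realized shift
a = 1.0                  # baseline intensity

exact = a + f(t - u)     # shifted curve S^{u} f
# First-order expansion from (69).
approx = a + f(t - u_mean) - (u - u_mean) * Df(t - u_mean)

# The discrepancy is small when |u - E[u]| is small.
print(np.max(np.abs(exact - approx)))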

Appendix D Detailed derivation of solutions in (14) and (15)

The first optimization problem in the centering step (11) can be elaborated as follows:

\[
\begin{split}
\hat{\mathbf{a}}^{\prime},\hat{\mathbf{f}}^{\prime}&=\underset{\mathbf{a}^{\prime},\mathbf{f}^{\prime}}{\arg\min}~L_{1}(\hat{\mathbf{z}},\mathbf{a}^{\prime},\mathbf{f}^{\prime},\hat{\mathbf{v}})\\
&=\underset{\mathbf{a}^{\prime},\mathbf{f}^{\prime}}{\arg\min}\sum_{i\in[n],r\in[R]}\beta_{i,r}~\frac{1}{T}\Bigg\|\frac{y_{i,r}(t)}{N_{i,r}(T)}-\bigg\{a^{\prime}_{\hat{z}_{i}}+\sum_{m\in[M]}S^{\hat{v}_{i,m}+w^{*}_{r,m}}f^{\prime}_{\hat{z}_{i},m}(t)\bigg\}\Bigg\|_{t}^{2}.
\end{split} \tag{74}
\]

Using Parseval's theorem, the optimization problem in (74) can be formulated as follows:

\[
\begin{split}
\hat{\mathbf{a}}^{\prime},\hat{\boldsymbol{\phi}}^{\prime}&=\underset{\mathbf{a}^{\prime},\boldsymbol{\phi}^{\prime}}{\arg\min}\sum_{i\in[n],r\in[R]}\beta_{i,r}~\sum_{l\in\mathbb{Z}}\Bigg|\frac{\eta_{i,r,l}}{N_{i,r}(T)}-\bigg\{a^{\prime}_{\hat{z}_{i}}\mathbf{1}(l=0)\\
&\qquad+\sum_{m\in[M]}\exp\big\{-\operatorname{j}2\pi l(\hat{v}_{i,m}+w^{*}_{r,m})T^{-1}\big\}\phi^{\prime}_{\hat{z}_{i},m,l}\bigg\}\Bigg|^{2},
\end{split} \tag{75}
\]

where $\boldsymbol{\phi}^{\prime}\equiv(\phi^{\prime}_{k,m,l})_{k\in[K],m\in[M],l\in\mathbb{Z}}$, $\{\phi^{\prime}_{\hat{z}_{i},m,l}:l\in\mathbb{Z}\}$ denotes the Fourier coefficients of $f^{\prime}_{\hat{z}_{i},m}(t)$, $\{\eta_{i,r,l}:l\in\mathbb{Z}\}$ denotes the Fourier coefficients of $y_{i,r}(t)$, and $\operatorname{j}$ denotes the imaginary unit.
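The coefficients $\eta_{i,r,l}$ can be computed directly from the event times if $y_{i,r}(t)$ is represented as a sum of unit point masses at the observed events; the exact definition of $y_{i,r}$ appears in the main text and is not repeated here, so this representation is an assumption of the following sketch. Under it, $\eta_{i,r,l}=T^{-1}\sum_{s}\exp(-\operatorname{j}2\pi l t_{s}T^{-1})$, which is consistent with $\eta_{i,r,0}=T^{-1}N_{i,r}(T)$ used later in (79):

import numpy as np

def fourier_coeffs(event_times, T, l0):
    # eta_{i,r,l} for |l| <= l0, treating y_{i,r}(t) as a sum of unit
    # point masses at the event times (an assumption of this sketch).
    out = {}
    for l in range(-l0, l0 + 1):
        out[l] = np.exp(-2j * np.pi * l * np.asarray(event_times) / T).sum() / T
    return out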

The optimization problem in (75) can be solved by breaking it into smaller independent problems. For $k\in[K]$ and $l\in\mathbb{Z}$, let $\boldsymbol{\phi}^{\prime}_{k,*,l}\equiv(\phi^{\prime}_{k,m,l})_{m\in[M]}$. For $l\neq 0$, the objective function associated with $\boldsymbol{\phi}^{\prime}_{k,*,l}$ is essentially a weighted sum of squares:

\[
\hat{\boldsymbol{\phi}}^{\prime}_{k,*,l}=\underset{\boldsymbol{\phi}^{\prime}_{k,*,l}}{\arg\min}~(\mathbf{h}_{k,l}-\mathbf{E}_{k,l}\boldsymbol{\phi}^{\prime}_{k,*,l})^{\top}\mathbf{B}_{k}~\overline{(\mathbf{h}_{k,l}-\mathbf{E}_{k,l}\boldsymbol{\phi}^{\prime}_{k,*,l})}, \tag{76}
\]

where $\mathbf{h}_{k,l}\equiv(\eta_{i,r,l}N_{i,r}(T)^{-1})_{(i,r)\in\hat{\mathcal{C}}_{k}\times[R]}$, $\hat{\mathcal{C}}_{k}\equiv\{i\in[n]:\hat{z}_{i}=k\}$, $\mathbf{E}_{k,l}\equiv[\exp\{-\operatorname{j}2\pi l(\hat{v}_{i,m}+w^{*}_{r,m})T^{-1}\}]_{(i,r)\in\hat{\mathcal{C}}_{k}\times[R],\,m\in[M]}$, $\mathbf{B}_{k}$ is the diagonal matrix of $(\beta_{i,r})_{(i,r)\in\hat{\mathcal{C}}_{k}\times[R]}$, and $\overline{z}$ denotes the complex conjugate of $z$ for any $z\in\mathbb{C}$. As a result, the solution to (76) is:

\[
\hat{\boldsymbol{\phi}}^{\prime}_{k,*,l}=\left(\overline{\mathbf{E}_{k,l}}^{\top}\mathbf{B}_{k}\,\mathbf{E}_{k,l}\right)^{-1}\left(\overline{\mathbf{E}_{k,l}}^{\top}\mathbf{B}_{k}\,\mathbf{h}_{k,l}\right),\quad\text{for }l\neq 0. \tag{77}
\]

For $l=0$, the parameter $\boldsymbol{\phi}^{\prime}_{k,*,0}$ can be estimated by exploiting the definition of $\mathcal{F}$, from which we know that $f_{k,m}(0)=0$ for $k\in[K]$, $m\in[M]$. Consequently, $f^{\prime}_{k,m}(0)=f_{k,m}(0)\Lambda_{k}^{-1}=0$. In addition, using the inverse Fourier transformation, we can derive that $\sum_{l\in\mathbb{Z}}\phi^{\prime}_{k,m,l}=f^{\prime}_{k,m}(0)$. Hence $\sum_{l\in\mathbb{Z}}\phi^{\prime}_{k,m,l}=0$, which leads to the estimate of $\boldsymbol{\phi}^{\prime}_{k,*,0}$ as

\[
\hat{\boldsymbol{\phi}}^{\prime}_{k,*,0}=-\sum_{|l|\leq\ell_{0},\,l\neq 0}\hat{\boldsymbol{\phi}}^{\prime}_{k,*,l}, \tag{78}
\]

where $\ell_{0}$ is the truncation frequency that makes the computation of $\hat{\boldsymbol{\phi}}^{\prime}_{k,*,0}$ numerically feasible.

Based on $\hat{\boldsymbol{\phi}}^{\prime}$, it is straightforward to obtain an estimate of $\mathbf{a}^{\prime}$ as follows:

\[
\begin{split}
\hat{a}_{k}^{\prime}&=\underset{a_{k}^{\prime}}{\arg\min}\sum_{i\in\hat{\mathcal{C}}_{k},r\in[R]}\beta_{i,r}~\Bigg|\frac{\eta_{i,r,0}}{N_{i,r}(T)}-\bigg\{a^{\prime}_{k}+\sum_{m\in[M]}\hat{\phi}^{\prime}_{k,m,0}\bigg\}\Bigg|^{2}\\
&=\Bigg[\sum_{i\in\hat{\mathcal{C}}_{k},r\in[R]}\beta_{i,r}\bigg\{\frac{\eta_{i,r,0}}{N_{i,r}(T)}-\sum_{m\in[M]}\hat{\phi}^{\prime}_{k,m,0}\bigg\}\Bigg]\Bigg[\sum_{i\in\hat{\mathcal{C}}_{k},r\in[R]}\beta_{i,r}\Bigg]^{-1}\\
&=T^{-1}-\sum_{m\in[M]}\hat{\phi}^{\prime}_{k,m,0},
\end{split} \tag{79}
\]

where in the third equality we use $\eta_{i,r,0}=T^{-1}\int_{0}^{T}y_{i,r}(t)\,\mathrm{d}t=T^{-1}N_{i,r}(T)$. Moreover, using the inverse Fourier transform, the estimate of $\mathbf{f}^{\prime}$ can be derived as follows:

\[
\hat{f}^{\prime}_{k,m}(t)=\sum_{|l|\leq\ell_{0}}\hat{\phi}^{\prime}_{k,m,l}\exp(\operatorname{j}2\pi ltT^{-1}). \tag{80}
\]
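A minimal numerical sketch of (77), (78) and (80) for one cluster $k$ is given below; it assumes the coefficients $\eta_{i,r,l}$, weights $\beta_{i,r}$, counts $N_{i,r}(T)$, and shifts $\hat{v}_{i,m}+w^{*}_{r,m}$ have already been arranged into arrays, and all variable names are placeholders:

import numpy as np

def estimate_phi_l(h, shifts, beta, T, l):
    # Weighted least squares (77) for one cluster k and one frequency l != 0.
    # h:      complex vector eta_{i,r,l} / N_{i,r}(T), one entry per (i, r)
    # shifts: array of shape (n_k * R, M) with entries v_hat_{i,m} + w*_{r,m}
    # beta:   nonnegative weights beta_{i,r}, one entry per (i, r)
    E = np.exp(-2j * np.pi * l * np.asarray(shifts) / T)   # matrix E_{k,l}
    EHB = E.conj().T * np.asarray(beta)                    # E^H B, no diag needed
    return np.linalg.solve(EHB @ E, EHB @ h)               # phi_hat'_{k,*,l}

def reconstruct_f(phi, T, t, l0):
    # Inverse Fourier reconstruction (80), with the l = 0 coefficients
    # obtained from the constraint (78); phi maps l to a length-M vector.
    ls = [l for l in range(-l0, l0 + 1) if l != 0]
    phi0 = -sum(phi[l] for l in ls)                        # equation (78)
    f = np.outer(np.ones_like(t, dtype=complex), phi0)
    for l in ls:
        f += np.outer(np.exp(2j * np.pi * l * np.asarray(t) / T), phi[l])
    return f.real                                          # columns: f_hat'_{k,m}(t)

Since $y_{i,r}$ is real-valued, $\eta_{i,r,-l}=\overline{\eta_{i,r,l}}$, so in practice one only needs to solve (77) for $l=1,\ldots,\ell_{0}$ and set $\hat{\boldsymbol{\phi}}^{\prime}_{k,*,-l}=\overline{\hat{\boldsymbol{\phi}}^{\prime}_{k,*,l}}$.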

Appendix E Implementation of Newton’s method in the clustering step

We employ Newton's method to solve the optimization problem in (23) of the main text. The objective function in (23) can be formulated in the frequency domain:

\[
L_{1,i}(k,\hat{\mathbf{a}}^{\prime},\hat{\mathbf{f}}^{\prime},\mathbf{v}_{i})=\sum_{r\in[R]}\beta_{i,r}\sum_{l\in\mathbb{Z}}L_{1,i,l}(\mathbf{v}_{i}), \tag{81}
\]

where $L_{1,i,l}(\mathbf{v}_{i})$ is defined as

\[
L_{1,i,l}(\mathbf{v}_{i})\equiv\Bigg|\frac{\eta_{i,r,l}}{N_{i,r}(T)}-\bigg\{\hat{a}^{\prime}_{k}\mathbf{1}(l=0)+\sum_{m\in[M]}\exp\big\{-\operatorname{j}2\pi l(v_{i,m}+w^{*}_{r,m})T^{-1}\big\}\hat{\phi}^{\prime}_{k,m,l}\bigg\}\Bigg|^{2}. \tag{82}
\]

The definition in (82) shows that $L_{1,i,l}(\mathbf{v}_{i})$ is constant with respect to $\mathbf{v}_{i}$ when $l=0$. Moreover, since $\hat{f}_{k,m}$ is calculated using the truncated Fourier series (see (18) of the main text), it follows that $\hat{\phi}^{\prime}_{k,m,l}=0$ for $|l|>\ell_{0}$. As a result, $L_{1,i,l}(\mathbf{v}_{i})$ is constant with respect to $\mathbf{v}_{i}$ for $|l|>\ell_{0}$. Therefore, the optimization problem in (23) can be formulated in the frequency domain as

\[
\begin{split}
\tilde{\mathbf{v}}_{i|k}&=\underset{\mathbf{v}_{i}}{\arg\min}\sum_{r\in[R]}\beta_{i,r}\sum_{|l|\leq\ell_{0},\,l\neq 0}L_{1,i,l}(\mathbf{v}_{i})\\
&=\underset{\mathbf{v}_{i}}{\arg\min}\sum_{r\in[R]}\beta_{i,r}\sum_{|l|\leq\ell_{0},\,l\neq 0}\Bigg|\frac{\eta_{i,r,l}}{N_{i,r}(T)}-\sum_{m\in[M]}\exp\big\{-\operatorname{j}2\pi l(v_{i,m}+w^{*}_{r,m})T^{-1}\big\}\hat{\phi}^{\prime}_{k,m,l}\Bigg|^{2}\\
&\equiv\underset{\mathbf{v}_{i}}{\arg\min}~Q_{i}(\mathbf{v}_{i}),
\end{split} \tag{83}
\]

where the first equality follows from the fact that $L_{1,i,l}(\mathbf{v}_{i})$ is constant with respect to $\mathbf{v}_{i}$ for $l=0$ or $|l|>\ell_{0}$, and the second equality follows from the definition of $L_{1,i,l}(\mathbf{v}_{i})$ in (82).

We solve the optimization problem in (83) using Newton's method. Specifically, the estimate of $\mathbf{v}_{i}$ is updated iteratively via

\[
\hat{\mathbf{v}}_{i}\leftarrow\hat{\mathbf{v}}_{i}-\mathrm{trunc}\{[\nabla^{2}Q_{i}(\hat{\mathbf{v}}_{i})]^{-1}\nabla Q_{i}(\hat{\mathbf{v}}_{i})\}, \tag{84}
\]

where $\mathrm{trunc}\{x\}$ is the function defined as

\[
\mathrm{trunc}\{x\}\equiv\begin{cases}-T/10,&\mathrm{if}~x<-T/10,\\ x,&\mathrm{if}~x\in[-T/10,T/10],\\ T/10,&\mathrm{if}~x>T/10,\end{cases} \tag{85}
\]

$\nabla^{2}Q_{i}(\hat{\mathbf{v}}_{i})\equiv(\partial^{2}Q_{i}(\mathbf{v}_{i})/\partial v_{i,m_{1}}\partial v_{i,m_{2}})_{(m_{1},m_{2})\in[M]^{2}}$ denotes the Hessian matrix, and $\nabla Q_{i}(\hat{\mathbf{v}}_{i})\equiv(\partial Q_{i}(\mathbf{v}_{i})/\partial v_{i,m})_{m\in[M]}$ denotes the gradient. The gradient can be calculated through the following partial derivatives: for $m\in[M]$,

\[
\begin{split}
\frac{\partial Q_{i}(\mathbf{v}_{i})}{\partial v_{i,m}}&=-2\sum_{r\in[R]}\beta_{i,r}\sum_{|l|\leq\ell_{0},\,l\neq 0}\operatorname{Re}\Bigg(\bigg[(-\operatorname{j}2\pi lT^{-1})\exp\big\{-\operatorname{j}2\pi l(v_{i,m}+w^{*}_{r,m})T^{-1}\big\}\hat{\phi}^{\prime}_{k,m,l}\bigg]\\
&\qquad\times\overline{\bigg[\frac{\eta_{i,r,l}}{N_{i,r}(T)}-\sum_{m^{\prime}\in[M]\setminus\{m\}}\exp\big\{-\operatorname{j}2\pi l(v_{i,m^{\prime}}+w^{*}_{r,m^{\prime}})T^{-1}\big\}\hat{\phi}^{\prime}_{k,m^{\prime},l}\bigg]}\Bigg). \tag{86}
\end{split}
\]

The Hessian matrix can be calculated through the following second-order partial derivatives: for $m\in[M]$,

\[
\begin{split}
\frac{\partial^{2}Q_{i}(\mathbf{v}_{i})}{\partial v_{i,m}^{2}}&=-2\sum_{r\in[R]}\beta_{i,r}\sum_{|l|\leq\ell_{0},\,l\neq 0}\operatorname{Re}\Bigg(\bigg[(-\operatorname{j}2\pi lT^{-1})^{2}\exp\big\{-\operatorname{j}2\pi l(v_{i,m}+w^{*}_{r,m})T^{-1}\big\}\hat{\phi}^{\prime}_{k,m,l}\bigg]\\
&\qquad\times\overline{\bigg[\frac{\eta_{i,r,l}}{N_{i,r}(T)}-\sum_{m^{\prime}\in[M]\setminus\{m\}}\exp\big\{-\operatorname{j}2\pi l(v_{i,m^{\prime}}+w^{*}_{r,m^{\prime}})T^{-1}\big\}\hat{\phi}^{\prime}_{k,m^{\prime},l}\bigg]}\Bigg), \tag{87}
\end{split}
\]

while for $m_{1},m_{2}\in[M]$ such that $m_{1}\neq m_{2}$,

\[
\begin{split}
\frac{\partial^{2}Q_{i}(\mathbf{v}_{i})}{\partial v_{i,m_{1}}\partial v_{i,m_{2}}}&=2\sum_{r\in[R]}\beta_{i,r}\sum_{|l|\leq\ell_{0},\,l\neq 0}\operatorname{Re}\Bigg(\bigg[(-\operatorname{j}2\pi lT^{-1})\exp\big\{-\operatorname{j}2\pi l(v_{i,m_{1}}+w^{*}_{r,m_{1}})T^{-1}\big\}\hat{\phi}^{\prime}_{k,m_{1},l}\bigg]\\
&\qquad\times\overline{\bigg[(-\operatorname{j}2\pi lT^{-1})\exp\big\{-\operatorname{j}2\pi l(v_{i,m_{2}}+w^{*}_{r,m_{2}})T^{-1}\big\}\hat{\phi}^{\prime}_{k,m_{2},l}\bigg]}\Bigg). \tag{88}
\end{split}
\]
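As a minimal sketch of the update in (84)–(85), the following assumes that the gradient (86) and Hessian (87)–(88) have been wrapped into callables grad_Q and hess_Q (hypothetical names; both take the current time-shift vector):

import numpy as np

def trunc(x, T):
    # Elementwise truncation of the Newton step, as in (85).
    return np.clip(x, -T / 10.0, T / 10.0)

def newton_time_shifts(v0, grad_Q, hess_Q, T, max_iter=50, tol=1e-8):
    # Iterate the truncated Newton update (84):
    #   v <- v - trunc{ H(v)^{-1} g(v) }.
    v = np.array(v0, dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(hess_Q(v), grad_Q(v))  # H^{-1} g
        v = v - trunc(step, T)
        if np.max(np.abs(step)) < tol:
            break
    return v

The truncation caps each coordinate of the step at $T/10$, which guards against overshooting when the Hessian is poorly conditioned far from the optimum.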

Appendix F Additional simulation results

F.1 Heuristic selection method for γ\gamma

Figure 7 shows the trends of $L_{1}$ and $L_{2}$ as $\gamma$ changes, where the candidate range of $\gamma$ is discussed in Section I. From Figure 7, we see consistent trends of $L_{1}$ and $L_{2}$ across the designated numbers of clusters $\hat{K}$, indicating that the choice of $\gamma$ by the proposed heuristic method is insensitive to changes in $\hat{K}$.

Figure 8 shows the clustering performance as $\gamma$ changes. The clustering performance improves slightly as $\gamma$ increases from $10^{-4}$ to $10^{-2}$, because the within-cluster heterogeneity of the distribution of event times remains unchanged (see Figure 7(1a)) while the within-cluster heterogeneity of event counts decreases (see Figure 7(1b)). Moreover, Figure 8 shows that once $\gamma$ exceeds $0.01$, the clustering performance deteriorates rapidly, because the estimated clusters have increasing within-cluster heterogeneity of the distribution of event times beyond that point (see Figure 7(1a)). Therefore, by choosing the largest $\gamma$ before a significant increase in $L_{1}$ is observed, we are able to achieve good clustering performance.
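A minimal sketch of this selection rule, assuming $L_{1}$ has already been evaluated on an increasing grid of candidate $\gamma$ values (the relative threshold below is an illustrative choice, not a value from the paper):

def select_gamma(gammas, L1_values, rel_jump=0.05):
    # Pick the largest gamma before L1 increases significantly.
    # gammas:    candidate values sorted in increasing order
    # L1_values: corresponding values of the L1 criterion
    # rel_jump:  relative increase in L1 treated as "significant"
    base = L1_values[0]
    chosen = gammas[0]
    for g, l1 in zip(gammas, L1_values):
        if l1 > base * (1 + rel_jump):
            break
        chosen = g
    return chosen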

[Figure 7 panels: (a) $L_{1}$ and (b) $L_{2}$ versus $\log_{10}(\gamma)$ for $\hat{K}=3,4,5,6$.]
Figure 7: Trends of $L_{1}$ and $L_{2}$ as $\gamma$ changes, averaged across 5000 replicates. Synthetic data are generated with $K=4$, $n=40$, $R=3$, $\tau=0.3$, $\rho=0.5$. In the legend, $\hat{K}$ denotes the designated number of clusters given as input to the algorithm. The trends of $L_{1}$ and $L_{2}$ are consistent across the designated numbers of clusters.
[Figure 8: 1-ARI versus $\log_{10}(\gamma)$ for $\hat{K}=3,4,5,6$.]
Figure 8: Clustering performance as $\gamma$ changes, averaged across 5000 replicates. Synthetic data are generated with $K=4$, $n=40$, $R=3$, $\tau=0.3$, $\rho=0.5$. In the legend, $\hat{K}$ denotes the designated number of clusters given as input to the algorithm.

F.2 Sensitivity analysis with respect to $\ell_{0}$

Figure 9 shows how the performance of clustering and intensity estimation changes with different values of $\ell_{0}$. Both criteria decrease as $\ell_{0}$ increases from 2 to 8, because higher values of $\ell_{0}$ allow the estimated intensities to capture more information in the distribution of event times. As $\ell_{0}$ increases further, both $1-\mathrm{ARI}$ and MISE stabilize, indicating that the estimation results are not significantly affected by further changes in $\ell_{0}$, as long as $\ell_{0}$ is large enough to capture the signal in the distribution of event times; for the synthetic data in Figure 9, for instance, $\ell_{0}=8$ is large enough.
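The role of $\ell_{0}$ as a frequency-truncation level can be illustrated with a small sketch (ours, not the paper's estimator): reconstructing a smooth bump-shaped component from its Fourier frequencies up to $\ell_{0}$ shows the integrated squared error dropping and then stabilizing once $\ell_{0}$ covers the signal's bandwidth.

```python
import numpy as np

T = 3.5
grid = np.linspace(0, T, 1000, endpoint=False)
f = np.exp(-0.5 * ((grid - 1.2) / 0.25) ** 2)   # a smooth bump-shaped signal

coef = np.fft.fft(f) / grid.size                # Fourier coefficients c_l
for l0 in [2, 4, 8, 16]:
    keep = np.zeros_like(coef)
    keep[: l0 + 1] = coef[: l0 + 1]             # frequencies l = 0, ..., l0
    keep[-l0:] = coef[-l0:]                     # frequencies l = -l0, ..., -1
    f_trunc = np.fft.ifft(keep * grid.size).real
    ise = np.mean((f - f_trunc) ** 2) * T       # integrated squared error
    print(f"l0 = {l0:2d}, ISE = {ise:.5f}")     # decreases, then stabilizes
```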

[Figure 9 here: panels (a) $1-\mathrm{ARI}$ and (b) MISE versus $\ell_{0}$.]
Figure 9: Sensitivity analysis with respect to $\ell_{0}$, averaged over 5000 replicates. Synthetic data are generated with $K=4$, $n=40$, $R=3$, $\tau=0.3$, $\rho=0.5$. Panels (a) and (b) show that the performance of clustering and intensity component estimation becomes stable when $\ell_{0}\geq 8$.

F.3 Effectiveness of the proposed initialization scheme

Figure 10 shows the efficacy of the proposed initialization scheme compared to a random initialization scheme. In the random initialization scheme, initial time shifts are drawn from $\mathrm{Unif}(0,\hat{v}^{(0)}_{i,m})$, and initial cluster memberships are randomly assigned. The results in Figure 10 demonstrate that the proposed initialization method yields better estimation than the random initialization with multiple restarts. Moreover, further increasing the number of restarts yields only diminishing improvements in performance. Overall, the proposed initialization scheme is computationally efficient and achieves promising estimation performance.

[Figure 10 here: panels (a) $1-\mathrm{ARI}$ and (b) MISE versus the number of observations $R$, with curves for our proposal and for random initialization with no, 3, and 5 restarts.]
Figure 10: Performance of the proposed initialization scheme, averaged over 5000 replicates. Synthetic data are generated with $K=4$, $n=40$, $\tau=0.3$, $\rho=0.5$, and varying $R$. In the legend, “our proposal” stands for the proposed initialization scheme, and “random” stands for the random initialization scheme with restarts. For the random initialization scheme, time shifts are initialized using $\mathrm{Unif}(0,\hat{v}^{(0)}_{i,m})$, and cluster memberships are initialized randomly. The best result among restarts is selected according to the smallest objective function value. Panels (a) and (b) show that the proposed initialization scheme leads to better estimation than the random initialization scheme with 5 restarts.

Appendix G Supplement for real data analysis

G.1 Data preprocessing

Experimental trials

Table 2 presents the feedback types under different experimental conditions, along with the number of trials for each condition. The trials where the left visual grating had higher contrast and the feedback was a reward are analyzed using our proposed method. We refer to these trials as training trials. The rest of the trials, which are not used to fit the proposed model, are employed to explore the roles of the identified clusters under different tasks.

Table 2: Experimental conditions and total numbers of trials. In the “scenario” column, “L” and “R” denote higher contrast in the left and right gratings, respectively. The “choice” column shows “L” for moving the left grating towards the center and “R” for moving the right grating. The “feedback” column shows “1” for reward and “-1” for penalty.
Scenario Choice Feedback # trials
L L 1 102
R R 1 81
R L -1 8
L R -1 3

Selection of time window

Due to the design of the experiment, the duration between stimulus onset and feedback delivery varies across trials. In order to create comparable samples, we analyze a 3.5-second time window whose center is positioned at the midpoint between 0.1 seconds before the visual stimulus onset and 2 seconds after the feedback delivery. To be more specific, for observation $r\in\{1,\cdots,R\}$, the start and end times of observation $r$ are set as

\mathrm{ObsStartTime}_{r}\equiv\{(\mathrm{VisTime}_{r}-0.1)+(\mathrm{FeedTime}_{r}+2)\}/2-3.5/2,   (89)
\mathrm{ObsEndTime}_{r}\equiv\{(\mathrm{VisTime}_{r}-0.1)+(\mathrm{FeedTime}_{r}+2)\}/2+3.5/2,   (90)

where $\mathrm{ObsStartTime}_{r}$ denotes the start time of the time window for trial $r$, $\mathrm{VisTime}_{r}$ denotes the visual stimulus onset time of trial $r$, $\mathrm{FeedTime}_{r}$ denotes the feedback delivery time of trial $r$, and $\mathrm{ObsEndTime}_{r}$ denotes the end time of the time window for trial $r$. In cases where the time window starts within 1 second after the feedback delivery time of the previous trial, or ends after the visual stimulus onset time of the next trial, we augment the spike trains as detailed in the next section.
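A minimal sketch of the window construction in (89) and (90) follows; the onset and feedback times below are hypothetical and only illustrate the arithmetic.

```python
import numpy as np

# A T = 3.5 s window centered at the midpoint between 0.1 s before
# visual stimulus onset and 2 s after feedback delivery, as in (89)-(90).
T = 3.5
vis_time = np.array([10.3, 17.9, 25.2])     # VisTime_r (s), hypothetical
feed_time = np.array([11.1, 19.0, 26.1])    # FeedTime_r (s), hypothetical

center = ((vis_time - 0.1) + (feed_time + 2.0)) / 2.0
obs_start = center - T / 2.0                # ObsStartTime_r in (89)
obs_end = center + T / 2.0                  # ObsEndTime_r in (90)
```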

Here we analyze the firing activity within a $T=3.5$ second time window for the following reasons. Figure 11 displays the average neural firing rate after feedback delivery over all training trials. It is evident that the average firing rate stabilizes around 2 seconds after feedback delivery. Moreover, among the 102 training trials, the maximum duration between visual stimulus onset and feedback delivery is 1.4 seconds. Therefore, by setting $T=3.5$, we ensure that each observation includes at least 0.1 seconds before visual stimulus onset and 2 seconds after feedback delivery.

[Figure 11 here.]
Figure 11: Average neural firing intensities over neurons and trials. Each gray curve represents an average firing intensity of the midbrain region in a single training trial. All trials are aligned based on the feedback delivery time. The red curve represents the average firing intensity across all training trials. It is observed that the average firing intensity stabilizes approximately 2 seconds after feedback delivery.

Augmentation of spike trains

For trials with $\mathrm{ObsStartTime}_{r}<\mathrm{ConcluTime}_{r-1}$, where $\mathrm{ConcluTime}_{r-1}$ denotes the trial conclusion time (i.e., 1 second after the feedback delivery time) of trial $r-1$, we augment the spike trains to impute the missing data. Figure 12(a) shows that, for such trials, the average intensity remains stable within 0.4 seconds after the conclusion time of the previous trial. Therefore, we augment spikes by making shifted copies of the spikes within 0.4 seconds after $\mathrm{ConcluTime}_{r-1}$. We append the following artificial spikes to the firing activity of neuron $i\in[n]$ in observation $r\in\{r:\mathrm{ObsStartTime}_{r}<\mathrm{ConcluTime}_{r-1}\}$:

\big\{t_{i,r,j}-0.4\times k:\; t_{i,r,j}\in[\mathrm{ConcluTime}_{r-1},\,\mathrm{ConcluTime}_{r-1}+0.4],\; t_{i,r,j}-0.4\times k\in[\mathrm{ObsStartTime}_{r},\,\mathrm{ConcluTime}_{r-1}],\; k\in\mathbb{N}\big\}.   (91)

Additionally, we augment the spike trains for trials with $\mathrm{ObsEndTime}_{r}>\mathrm{VisTime}_{r+1}$. Figure 12(b) shows that, for such trials, the average intensity remains stable within 0.4 seconds before the visual stimulus onset of the next trial. Therefore, we append the following spikes to the firing activity of neuron $i\in[n]$ in observation $r\in\{r:\mathrm{ObsEndTime}_{r}>\mathrm{VisTime}_{r+1}\}$:

\big\{t_{i,r,j}+0.4\times k:\; t_{i,r,j}\in[\mathrm{VisTime}_{r+1}-0.4,\,\mathrm{VisTime}_{r+1}],\; t_{i,r,j}+0.4\times k\in[\mathrm{VisTime}_{r+1},\,\mathrm{ObsEndTime}_{r}],\; k\in\mathbb{N}\big\}.   (92)
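The padding rule in (91) can be implemented in a few lines; the sketch below is ours (the function name and array layout are illustrative), and the rule in (92) is symmetric, shifting the pre-onset template forward instead of backward.

```python
import numpy as np

def augment_before_window(spikes, obs_start, conclu_prev, period=0.4):
    """Pad [obs_start, conclu_prev] with backward-shifted copies of the
    spikes observed within `period` seconds after conclu_prev, as in (91).
    `spikes`: 1-D array of event times for one neuron in one trial."""
    template = spikes[(spikes >= conclu_prev) & (spikes <= conclu_prev + period)]
    pieces, k = [], 1
    # Shift the template back by period * k until it falls before the window.
    while conclu_prev - period * k >= obs_start - period:
        shifted = template - period * k
        pieces.append(shifted[(shifted >= obs_start) & (shifted <= conclu_prev)])
        k += 1
    return np.sort(np.concatenate(pieces)) if pieces else np.array([])
```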
[Figure 12 here: panels (a) and (b).]
Figure 12: Neural firing patterns after the previous trial’s conclusion or before the next trial’s visual stimulus onset. Each gray curve represents an average firing intensity of the midbrain region in one trial. The red curves represent the averages of the gray curves. Panel (a) shows the average neural firing intensities after the conclusion of the previous trials for the trial set $\{r:\mathrm{ObsStartTime}_{r}<\mathrm{ConcluTime}_{r-1}\}$. The trials are aligned by the conclusion times of the previous trials. Panel (b) shows the average neural firing intensities prior to the visual stimulus onsets of the next trials for the trial set $\{r:\mathrm{ObsEndTime}_{r}>\mathrm{VisTime}_{r+1}\}$. The trials are aligned by the visual stimulus onset times of the next trials.

G.2 Tuning parameter selection

We choose the values of $\gamma$ and $K$ using the heuristic method proposed in Section 3.3. Figure 13 presents the results of the heuristic method that inform our choice of $\gamma$ and $K$. Firstly, we establish a preliminary estimate of $K$ by applying the k-means algorithm to the $N_{i}(T)$’s, where $N_{i}(T)\equiv R^{-1}\sum_{r\in[R]}N_{i,r}(T)$, and selecting $K$ using the elbow method. Figure 13(a) illustrates that the within-cluster variance shows only marginal reduction as the number of clusters exceeds 3. Therefore, we set $K=3$ as a preliminary estimate of $K$. Secondly, given the preliminary estimate of $K$, we choose the largest $\gamma$ before observing a significant increase in $L_{1}$. Figure 13(b) shows that the value of $L_{1}$ exhibits a significant upward trend once $\gamma$ exceeds $10^{-4}$. Therefore, we set $\gamma=10^{-4}$. Finally, with $\gamma$ fixed at $10^{-4}$, we refine the value of $K$ by identifying the elbow point on the curve of the overall objective function (i.e., $L_{1}+\gamma L_{2}$) against $K$. Figure 13(c) suggests that $K=3$ is a feasible choice.
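A minimal sketch of the preliminary choice of $K$ follows (ours, on hypothetical data): run k-means on the trial-averaged event counts and look for the elbow of the within-cluster sum of squares, as in Figure 13(a).

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical (n x R) event counts N_{i,r}(T) from three latent groups.
rng = np.random.default_rng(0)
rates = rng.choice([20.0, 50.0, 90.0], size=200)
counts = rng.poisson(lam=rates[:, None], size=(200, 30))
N_bar = counts.mean(axis=1, keepdims=True)   # N_i(T) = R^{-1} sum_r N_{i,r}(T)

wss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(N_bar).inertia_
       for k in range(1, 8)]
# Pick K where the decrease in wss flattens (the elbow), here K = 3.
```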

[Figure 13 here: panels (a) within-cluster sum of squares versus the number of clusters, (b) $L_{1}$ versus $\log_{10}(\gamma)$, and (c) $L_{1}+\gamma L_{2}$ versus the number of clusters.]
Figure 13: Results of the heuristic method for the selection of $\gamma$ and $K$. Panel (a) shows the within-cluster variance of the $N_{i}(T)$’s obtained from the k-means algorithm against the number of clusters, where $N_{i}(T)\equiv R^{-1}\sum_{r\in[R]}N_{i,r}(T)$. Using the elbow method, we set $K=3$ as a preliminary estimate of $K$. Panel (b) shows the values of $L_{1}$ as $\gamma$ increases. The upward trend of $L_{1}$ when $\gamma$ exceeds $10^{-4}$ suggests the choice $\gamma=10^{-4}$. Panel (c) shows the overall objective function values obtained from various numbers of clusters. The curve suggests a diminishing decrease in the objective function value when $K$ exceeds 3.

G.3 Intensity component refinement

We refine the estimated intensity components by employing the proposed additive shape invariant model in (1) on each estimated cluster. For Cluster 1, we apply Algorithm 1 to the spike trains of neurons in Cluster 1, denoted as $\mathcal{N}_{1}\equiv\{N_{i,r}(t):i\in\hat{\mathcal{C}}_{1},r\in[R]\}$. We set $M=1$ and set the observation-specific time shifts $w^{*}_{r,1}$ to the visual stimulus onset times. The rest of the parameters are set as follows: $K=1$, $\gamma=0$, $\ell_{0}=10$, $\epsilon=0.005$. The algorithm is applied with 20 restarts, where each restart involves distinct initial values for the subject-specific time shifts. These initial values are obtained by jittering the proposed initial subject-specific time shifts as follows: for $i\in\hat{\mathcal{C}}_{1}$, $m=1$ and $x\in\{1,\cdots,20\}$,

\tilde{v}_{i,m,x}^{(0)}\equiv\hat{v}_{i,m}^{(0)}+\varepsilon_{i,m,x},\quad \varepsilon_{i,m,x}\sim\mathrm{Unif}(-T/50,\,T/50),   (93)

where $\tilde{v}_{i,m,x}^{(0)}$ represents the initial subject-specific time shift associated with subject $i$ and stimulus $m$ in restart $x$, and $\hat{v}_{i,m}^{(0)}$ represents the proposed initial subject-specific time shift defined in (26). The best result for the intensity components among the 20 restarts is selected based on the smallest value of the objective function.
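A sketch of the jittered restarts in (93) is given below; `fit_asimm`, standing in for one run of Algorithm 1, is hypothetical.

```python
import numpy as np

# Each of the 20 restarts perturbs the proposed initial time shifts by
# Unif(-T/50, T/50) noise, as in (93); the restart attaining the smallest
# objective value is kept.
T, n_restarts = 3.5, 20
rng = np.random.default_rng(1)

def jittered_init(v0_hat):
    """v0_hat: proposed initial time shifts, shape (n_neurons, M)."""
    return v0_hat + rng.uniform(-T / 50, T / 50, size=v0_hat.shape)

# fits = [fit_asimm(init=jittered_init(v0_hat)) for _ in range(n_restarts)]
# best = min(fits, key=lambda fit: fit.objective)
```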

For Clusters 2 and 3, Algorithm 1 is applied to $\mathcal{N}_{k}\equiv\{N_{i,r}(t):i\in\hat{\mathcal{C}}_{k},r\in[R]\}$, $k\in\{2,3\}$, in the same manner as for $\mathcal{N}_{1}$. The only difference lies in the specification of $M=2$ for Clusters 2 and 3, where $m=1$ corresponds to the visual stimulus and $m=2$ corresponds to the auditory tone cue.

G.4 Supplementary results of neural data analysis

Neural firing intensity in condition “L,R”

Figure 14 displays the average firing patterns of the three clusters in four different experimental conditions. From the figure we can see that there is high uncertainty in condition “L,R”, since there are only 3 trials. However, the firing patterns in the “L,R” condition generally support our hypothesis regarding the roles of the clusters. For instance, in “L,R” trials, the firing rates of neurons in Cluster 1 show an upward trend before movement onset, which is consistent with our hypothesis that Cluster 1 neurons are responsible for executing the turning of the wheel. Moreover, the firing rates of neurons in Cluster 2 in “L,R” trials remain stationary after feedback delivery, aligning with our hypothesis that Cluster 2 neurons might respond to perceptions of stimuli such as rewards.

[Figure 14 here: firing intensities (spikes/s) of Cluster 1 (aligned to movement onset), Cluster 2 (aligned to visual stimulus onset), and Cluster 3 (aligned to movement onset), with curves for conditions “L,L”, “R,L”, “R,R”, and “L,R”.]
Figure 14: Firing patterns of the three clusters in four different experimental conditions. The lines represent the mean firing intensities. The shaded regions represent the mean firing intensities plus or minus two standard errors of the mean. The legends represent “scenario, choice”; for instance, “L,R” represents the trials where the left grating was of higher contrast and the mouse chose to move the right grating. In the “L,R” condition, there is high uncertainty due to the limited number of trials.

G.5 Results obtained using kCFC

For comparison, we apply the kCFC [Chiou and Li, 2007] to the neural data. Specifically, we apply the kCFC to the aggregated spike trains across the training trials with $K=3$, and set the remaining parameter values consistent with those specified in Section 5.3 of the main text. Table 3 shows the association between the clusters obtained from the kCFC and the clusters obtained from the proposed ASIMM. From the table we see that the clusters from the kCFC are mixtures of the clusters from the ASIMM. In particular, for Clusters 1 and 2 identified by the kCFC, approximately half of the neurons in these clusters are not grouped together in the ASIMM’s results.

Figure 15 shows the firing patterns of the clusters from the kCFC. We observe that the firing patterns bear some resemblance to those obtained from the ASIMM. However, Clusters 1 and 2 from the kCFC do not exhibit laterality. This lack of laterality is likely because Clusters 1 and 2 are mixtures of clusters from the ASIMM. For instance, the firing intensity of Cluster 1 from the ASIMM has a higher peak when the mouse chose the left visual grating (see Figure 6(1b)), whereas the firing intensity of Cluster 3 from the ASIMM has a higher peak when the mouse chose the right grating (see Figure 6(3b)). Consequently, when Clusters 1 and 3 from our proposed method are mixed, their individual lateral biases counteract each other, resulting in the absence of clear laterality.

Table 3: Contingency table of the clusters from the proposed ASIMM and the clusters from the kCFC.
kCFC \ ASIMM    Cluster 1   Cluster 2   Cluster 3
Cluster 1       60          2           71
Cluster 2       3           14          10
Cluster 3       0           7           58
[Figure 15 here: three columns of panels (1a)–(3b) for the three kCFC clusters, showing intensities (spikes/s) across conditions “L,L”, “R,L”, “R,R”, and, in panel (1b), the standardized intensity overlaid with the wheel velocity.]
Figure 15: Firing patterns of the clusters from the kCFC. The three columns correspond to the three estimated clusters. The shaded areas represent the mean firing intensities plus or minus two standard errors of the mean. The legend in panel (1a) represents “scenario, choice”; for instance, “R,L” represents the trials where the right grating was of higher contrast and the mouse chose to move the left grating. Panel (1b) illustrates the average firing intensity and wheel velocity, where both are standardized to range from 0 to 1. The firing intensities of Clusters 1 and 2 do not exhibit laterality.

Appendix H Auxiliary Lemmas

Lemma 1.

Let $\eta_{m}\equiv\exp\{-\operatorname{j}2\pi\xi w^{*}_{m}\}$ and $\boldsymbol{\eta}\equiv(\eta_{m})_{m\in[M]}$, where $\xi\in\mathbb{R}\setminus\{0\}$. If $\{w^{*}_{m}\in\mathbb{R}:m\in[M]\}$ are independent random variables, and among them at least $M-1$ variables have non-zero variance, i.e., $|\{m\in[M]:\mathrm{var}(w^{*}_{m})>0\}|\geq M-1$, then the matrix $\mathbb{E}[\bar{\boldsymbol{\eta}}\boldsymbol{\eta}^{\top}]$ is invertible.

Proof.

Without loss of generality, we assume that $\mathrm{var}(w^{*}_{m})>0$ for $m=2,\ldots,M$. Suppose $\mathbb{E}[\bar{\boldsymbol{\eta}}\boldsymbol{\eta}^{\top}]$ is not invertible. Then there exists $\mathbf{x}\in\mathbb{C}^{M}\setminus\{\mathbf{0}\}$ such that $\mathbb{E}[\bar{\boldsymbol{\eta}}\boldsymbol{\eta}^{\top}]\mathbf{x}=\mathbf{0}$. Thus we can derive that

\mathbb{E}[\bar{\mathbf{x}}^{\top}\bar{\boldsymbol{\eta}}\boldsymbol{\eta}^{\top}\mathbf{x}]=0.   (94)

Moreover, we have

\mathbb{E}[\bar{\mathbf{x}}^{\top}\bar{\boldsymbol{\eta}}\boldsymbol{\eta}^{\top}\mathbf{x}] = \mathbb{E}\Big[\Big|\sum_{m\in[M]}\eta_{m}x_{m}\Big|^{2}\Big] \geq \mathrm{var}\Big(\sum_{m\in[M]}\eta_{m}x_{m}\Big) = \sum_{m\in[M]}|x_{m}|^{2}\,\mathrm{var}(\eta_{m}) \geq 0,   (95)

where the second equality follows from the assumption that $\{w^{*}_{m}:m\in[M]\}$ are independent. In order for (94) and (95) to hold simultaneously, we must have $|x_{m}|^{2}\,\mathrm{var}(\eta_{m})=0$ for all $m\in[M]$. From the assumption that $\mathrm{var}(w^{*}_{m})>0$ for $m=2,\ldots,M$, we obtain that $\mathrm{var}(\eta_{m})>0$ for $m=2,\ldots,M$. Thus, $x_{m}=0$ for $m=2,\ldots,M$. Since $\mathbf{x}\neq\mathbf{0}$ by definition, we deduce that $x_{1}\neq 0$. Hence, in order for $|x_{1}|^{2}\,\mathrm{var}(\eta_{1})=0$ to hold, we must have

\mathrm{var}(\eta_{1})=0.   (96)

Additionally, in order for (94) and (95) to hold simultaneously, we must have

\mathbb{E}\Big[\Big|\sum_{m\in[M]}\eta_{m}x_{m}\Big|^{2}\Big]=\mathrm{var}\Big(\sum_{m\in[M]}\eta_{m}x_{m}\Big),

or equivalently,

\Big|\mathbb{E}\Big(\sum_{m\in[M]}\eta_{m}x_{m}\Big)\Big|^{2}=0.   (97)

Plugging $x_{m}=0$ for $m=2,\ldots,M$ into (97), we obtain that

\mathbb{E}\Big(\sum_{m\in[M]}\eta_{m}x_{m}\Big)=\mathbb{E}(\eta_{1})x_{1}.   (98)

Combining (97), (98), and the fact that $x_{1}\neq 0$, we deduce that

\mathbb{E}[\eta_{1}]=0.   (99)

Combining (96) and (99), we derive that $\eta_{1}=0$ almost surely. However, $|\eta_{1}|=1$ by definition, so $\eta_{1}=0$ is impossible. Therefore, $\mathbb{E}[\bar{\boldsymbol{\eta}}\boldsymbol{\eta}^{\top}]$ must be invertible. ∎
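For intuition about why the variance condition is necessary, consider the following small example, added here for illustration: take $M=2$ and let both $w^{*}_{1}$ and $w^{*}_{2}$ be deterministic, so that $\boldsymbol{\eta}$ is deterministic. Then

\mathbb{E}[\bar{\boldsymbol{\eta}}\boldsymbol{\eta}^{\top}] = \begin{pmatrix} 1 & \bar{\eta}_{1}\eta_{2} \\ \bar{\eta}_{2}\eta_{1} & 1 \end{pmatrix}, \qquad \det\mathbb{E}[\bar{\boldsymbol{\eta}}\boldsymbol{\eta}^{\top}] = 1-|\bar{\eta}_{1}\eta_{2}|^{2} = 0,

since $|\eta_{1}|=|\eta_{2}|=1$. The matrix is singular, and it is randomness in at least $M-1$ of the shifts that breaks this rank deficiency.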

Lemma 2.

Let $\eta_{m}\equiv\exp\{-\operatorname{j}2\pi\xi w^{*}_{m}\}$ and $\boldsymbol{\eta}\equiv(\eta_{m})_{m\in[M]}$, where $\xi\in\mathbb{R}\setminus\{0\}$. If $w^{*}_{m}=w^{*}_{m-1}+\delta_{m-1}$ for $m=2,\ldots,M$, where $\{\delta_{m}\in\mathbb{R}:m\in[M-1]\}$ are independent random variables with non-zero variance, then the matrix $\mathbb{E}[\bar{\boldsymbol{\eta}}\boldsymbol{\eta}^{\top}]$ is invertible.

Proof.

Suppose $\mathbb{E}[\bar{\boldsymbol{\eta}}\boldsymbol{\eta}^{\top}]$ is not invertible. Then there exists $\mathbf{x}\in\mathbb{C}^{M}\setminus\{\mathbf{0}\}$ such that $\mathbb{E}[\bar{\boldsymbol{\eta}}\boldsymbol{\eta}^{\top}]\mathbf{x}=\mathbf{0}$. Thus we can derive that

\mathbb{E}[\bar{\mathbf{x}}^{\top}\bar{\boldsymbol{\eta}}\boldsymbol{\eta}^{\top}\mathbf{x}]=0.   (100)

On the other hand, we have

\mathbb{E}[\bar{\mathbf{x}}^{\top}\bar{\boldsymbol{\eta}}\boldsymbol{\eta}^{\top}\mathbf{x}] = \mathbb{E}\Big|\sum_{m\in[M]}\eta_{m}x_{m}\Big|^{2} = \mathbb{E}\Big|\sum_{m\in[M]}\exp\{-\operatorname{j}2\pi\xi w^{*}_{m}\}x_{m}\Big|^{2} = \mathbb{E}\Big|\exp\{-\operatorname{j}2\pi\xi w^{*}_{1}\}\Big[x_{1}+\sum_{m=2}^{M}\exp\{-\operatorname{j}2\pi\xi(\delta_{1}+\cdots+\delta_{m-1})\}x_{m}\Big]\Big|^{2} = \mathbb{E}\Big|x_{1}+\sum_{m=2}^{M}\exp\{-\operatorname{j}2\pi\xi(\delta_{1}+\cdots+\delta_{m-1})\}x_{m}\Big|^{2},   (101)

where the second equality follows from the definition of $\eta_{m}$, the third equality follows from the assumption that $w^{*}_{m}=w^{*}_{m-1}+\delta_{m-1}$, and the last equality follows from the fact that $|\exp\{-\operatorname{j}\theta\}|=1$ for any $\theta\in\mathbb{R}$. Define $A_{m}\equiv x_{m}+\sum_{m^{\prime}=m+1}^{M}\exp\{-\operatorname{j}2\pi\xi(\delta_{m}+\cdots+\delta_{m^{\prime}-1})\}x_{m^{\prime}}$ for $m\in[M-1]$, and $A_{M}\equiv x_{M}$. Combining (100) and (101), we have $\mathbb{E}|A_{1}|^{2}=0$, which further implies that

A_{1}=0.   (102)

From the definition of the $A_{m}$’s, we know that $A_{m}=x_{m}+\exp\{-\operatorname{j}2\pi\xi\delta_{m}\}A_{m+1}$. Thus (102) implies that $x_{1}+\exp\{-\operatorname{j}2\pi\xi\delta_{1}\}A_{2}=0$, or equivalently,

A_{2}=-x_{1}\exp\{\operatorname{j}2\pi\xi\delta_{1}\}.   (103)

From the definition of $A_{2}$, we know that $A_{2}$ is a function of $\{\delta_{2},\ldots,\delta_{M-1}\}$, which are independent of $\delta_{1}$ by our assumption. Furthermore, $\delta_{1}$ has positive variance by our assumption. Therefore, in order for (103) to hold, it must hold that $x_{1}=0$, or equivalently,

A_{2}=0.   (104)

Similarly, we can derive that $A_{3}=\cdots=A_{M}=0$. Therefore, using the definition of the $A_{m}$’s, we know that

x_{m}=A_{m}-\exp\{-\operatorname{j}2\pi\xi\delta_{m}\}A_{m+1}=0, \quad m=1,\cdots,M-1,   (105)
x_{M}=A_{M}=0, \quad m=M.   (106)

Equations (105) and (106) contradict the definition of $\mathbf{x}$, which asserts $\mathbf{x}\neq\mathbf{0}$. Therefore, $\mathbb{E}[\bar{\boldsymbol{\eta}}\boldsymbol{\eta}^{\top}]$ must be invertible. ∎

Lemma 3.

Let $f\in L^{1}(\mathbb{R})$ and let $\hat{f}$ be its Fourier transform, and define $A\equiv\{x\in\mathbb{R}:f(x)\neq 0\}$ and $B\equiv\{\xi\in\mathbb{R}:\hat{f}(\xi)\neq 0\}$. Then

m(A)<\infty \text{ and } m(B)<\infty \;\Rightarrow\; f=0 \text{ a.e.},   (107)

where $m$ denotes the Lebesgue measure.

The proof of Lemma 3 can be found in Benedicks [1985].

Appendix I Reasonable range of $\gamma$

To find a reasonable range for $\gamma$, we explore the magnitudes of $\mathbb{E}[L_{1}(\mathbf{z}^{*},{\mathbf{a}^{*}}^{\prime},{\mathbf{f}^{*}}^{\prime},\mathbf{v}^{*})]$ and $\mathbb{E}[L_{2}(\mathbf{z}^{*},\boldsymbol{\Lambda}^{*})]$, where $\mathbf{z}^{*},{\mathbf{a}^{*}}^{\prime},{\mathbf{f}^{*}}^{\prime},\mathbf{v}^{*},\boldsymbol{\Lambda}^{*}$ denote the true parameters. These magnitudes can be approximated as follows:

\mathbb{E}L_{1}(\mathbf{z}^{*},{\mathbf{a}^{*}}^{\prime},{\mathbf{f}^{*}}^{\prime},\mathbf{v}^{*})\lessapprox (nR)(T\Delta t)^{-1},   (108)
\mathbb{E}L_{2}(\mathbf{z}^{*},\boldsymbol{\Lambda}^{*})\approx \sum_{i\in[n],r\in[R]}N_{i,r}(T).   (109)

The derivation of (108) and (109) is provided later in this section. Combining (108) and (109), we obtain that

\frac{\mathbb{E}L_{1}(\mathbf{z}^{*},{\mathbf{a}^{*}}^{\prime},{\mathbf{f}^{*}}^{\prime},\mathbf{v}^{*})}{\mathbb{E}L_{2}(\mathbf{z}^{*},\boldsymbol{\Lambda}^{*})} \lessapprox (nR)(T\Delta t)^{-1}\Big\{\sum_{i\in[n],r\in[R]}N_{i,r}(T)\Big\}^{-1} \equiv \gamma_{0}.   (110)

Consequently, we suggest exploring $\gamma$ in the range $[10^{-5}\times\gamma_{0},\,10\times\gamma_{0}]$.
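A minimal sketch of (110) follows: compute $\gamma_{0}$ and the suggested candidate range. The counts (the $N_{i,r}(T)$'s) and the bin width below are hypothetical.

```python
import numpy as np

n, R, T, dt = 40, 3, 3.5, 0.01   # dt is an assumed bin width Delta t
counts = np.random.default_rng(2).poisson(lam=50, size=(n, R))

# gamma_0 = nR (T * dt)^{-1} / sum_{i,r} N_{i,r}(T), as in (110)
gamma0 = (n * R) / (T * dt) / counts.sum()
gamma_grid = gamma0 * 10.0 ** np.arange(-5.0, 1.5, 0.5)   # candidate values
```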

Derivation of (108)

From the definition of $L_{1}$ in (9) of the main text, we know that

\mathbb{E}L_{1}(\mathbf{z}^{*},{\mathbf{a}^{*}}^{\prime},{\mathbf{f}^{*}}^{\prime},\mathbf{v}^{*}) = \sum_{i\in[n],r\in[R]}\mathbb{E}\left(\frac{N_{i,r}(T)}{T}\left\|\frac{y_{i,r}(t)}{N_{i,r}(T)}-\frac{\lambda^{*}_{i,r}(t)}{\Lambda^{*}_{i,r}(T)}\right\|_{t}^{2}\right),   (111)

where $\lambda^{*}_{i,r}(t)\equiv a^{*}_{z^{*}_{i}}+\sum_{m\in[M]}S^{v^{*}_{i,m}+w^{*}_{r,m}}f^{*}_{z^{*}_{i},m}(t)$, and $\Lambda^{*}_{i,r}(T)\equiv\int_{0}^{T}\lambda^{*}_{i,r}(t)\,\mathrm{d}t$. Therefore, it suffices to show that

\mathbb{E}\left(\frac{N_{i,r}(T)}{T}\left\|\frac{y_{i,r}(t)}{N_{i,r}(T)}-\frac{\lambda^{*}_{i,r}(t)}{\Lambda^{*}_{i,r}(T)}\right\|_{t}^{2}\right)\lessapprox (T\Delta t)^{-1}.   (112)

To this end, we consider the following conditional expectation:

\mathbb{E}\left(\frac{N_{i,r}(T)}{T}\left\|\frac{y_{i,r}(t)}{N_{i,r}(T)}-\frac{\lambda^{*}_{i,r}(t)}{\Lambda^{*}_{i,r}(T)}\right\|_{t}^{2}\,\Big|\,N_{i,r}(T)\right) = \mathbb{E}\left(\frac{N_{i,r}(T)}{T}\left\|\frac{N_{i,r}(t+\Delta t)-N_{i,r}(t)}{N_{i,r}(T)\Delta t}-\frac{\lambda^{*}_{i,r}(t)}{\Lambda^{*}_{i,r}(T)}\right\|_{t}^{2}\,\Big|\,N_{i,r}(T)\right) = \mathbb{E}\left(\frac{N_{i,r}(T)}{T}\left\|\frac{\sum_{j=1}^{N_{i,r}(T)}\mathbf{1}(t<t_{i,r,j}\leq t+\Delta t)}{N_{i,r}(T)\Delta t}-\frac{\lambda^{*}_{i,r}(t)}{\Lambda^{*}_{i,r}(T)}\right\|_{t}^{2}\,\Big|\,N_{i,r}(T)\right) = \frac{N_{i,r}(T)}{T}\int_{0}^{T}\mathbb{E}\left(\left|\frac{\sum_{j=1}^{N_{i,r}(T)}\mathbf{1}(t<t_{i,r,j}\leq t+\Delta t)}{N_{i,r}(T)\Delta t}-\frac{\lambda^{*}_{i,r}(t)}{\Lambda^{*}_{i,r}(T)}\right|^{2}\,\Big|\,N_{i,r}(T)\right)\mathrm{d}t \equiv \frac{N_{i,r}(T)}{T}\int_{0}^{T}\mathbb{E}\left(|X(t)-Y(t)|^{2}\,\big|\,N_{i,r}(T)\right)\mathrm{d}t,   (113)

where the first equality follows from the definition of $y_{i,r}(t)$, the second equality follows from the definition of $N_{i,r}(t)$, and the third equality follows from the definition of the $L^{2}$-norm $\|\cdot\|_{t}$. By the definitions of $X(t)$ and $Y(t)$, we know that

\mathbb{E}[X(t)\mid N_{i,r}(T)] = \mathbb{E}\left(\frac{\sum_{j=1}^{N_{i,r}(T)}\mathbf{1}(t<t_{i,r,j}\leq t+\Delta t)}{N_{i,r}(T)\Delta t}\,\Big|\,N_{i,r}(T)\right) = \mathbb{E}\left(\frac{\mathbf{1}(t<t_{i,r,1}\leq t+\Delta t)}{\Delta t}\,\Big|\,N_{i,r}(T)\right) = \frac{\mathbb{P}\big(t<t_{i,r,1}\leq t+\Delta t\mid N_{i,r}(T)\big)}{\Delta t} \approx \frac{\lambda^{*}_{i,r}(t)}{\Lambda^{*}_{i,r}(T)} = Y(t),   (114)

where the first equality follows from the definition of $X(t)$. The second equality holds because, for Poisson processes, the event times $\{t_{i,r,j}:j=1,\cdots,N_{i,r}(T)\}$ can be treated as i.i.d. samples given the total event count $N_{i,r}(T)$, with common density $\lambda^{*}_{i,r}(t)/\Lambda^{*}_{i,r}(T)$ on $[0,T]$. The approximation in the fourth line follows from the definitions of $\lambda^{*}_{i,r}(t)$ and $\Lambda^{*}_{i,r}(T)$. Therefore, we have

\mathbb{E}\left(|X(t)-Y(t)|^{2}\mid N_{i,r}(T)\right) = \mathrm{var}\left(X(t)\mid N_{i,r}(T)\right) = \frac{1}{|\Delta t|^{2}}\,\frac{\mathrm{var}\{\mathbf{1}(t<t_{i,r,j}\leq t+\Delta t)\mid N_{i,r}(T)\}}{N_{i,r}(T)} = \frac{1}{|\Delta t|^{2}}\,\frac{\mathbb{P}\{t<t_{i,r,j}\leq t+\Delta t\mid N_{i,r}(T)\}\big[1-\mathbb{P}\{t<t_{i,r,j}\leq t+\Delta t\mid N_{i,r}(T)\}\big]}{N_{i,r}(T)} \approx \frac{1}{|\Delta t|^{2}}\,\frac{1}{N_{i,r}(T)}\left(\frac{\lambda^{*}_{i,r}(t)}{\Lambda^{*}_{i,r}(T)}\Delta t\right)\left(1-\frac{\lambda^{*}_{i,r}(t)}{\Lambda^{*}_{i,r}(T)}\Delta t\right),   (115)

where the first equality follows from (114), the second equality follows from the definition of $X(t)$ and the conditional independence of the event times, the third equality is the variance of a Bernoulli random variable, and the approximation in the last line follows from the definitions of $\lambda^{*}_{i,r}(t)$ and $\Lambda^{*}_{i,r}(T)$.

Plugging (115) into (113), we can derive that

\mathbb{E}\left(\frac{N_{i,r}(T)}{T}\left\|\frac{y_{i,r}(t)}{N_{i,r}(T)}-\frac{\lambda^{*}_{i,r}(t)}{\Lambda^{*}_{i,r}(T)}\right\|^{2}\,\Big|\,N_{i,r}(T)\right) \approx \frac{N_{i,r}(T)}{T}\int_{0}^{T}\frac{1}{|\Delta t|^{2}}\,\frac{1}{N_{i,r}(T)}\left(\frac{\lambda^{*}_{i,r}(t)}{\Lambda^{*}_{i,r}(T)}\Delta t\right)\left(1-\frac{\lambda^{*}_{i,r}(t)}{\Lambda^{*}_{i,r}(T)}\Delta t\right)\mathrm{d}t = \frac{1}{T}\left(\frac{1}{\Delta t}-\left\|\frac{\lambda^{*}_{i,r}(t)}{\Lambda^{*}_{i,r}(T)}\right\|^{2}\right) \lessapprox (T\Delta t)^{-1}.   (116)

Finally, by taking the expectation of (116), we obtain

\mathbb{E}\left(\frac{N_{i,r}(T)}{T}\left\|\frac{y_{i,r}(t)}{N_{i,r}(T)}-\frac{\lambda^{*}_{i,r}(t)}{\Lambda^{*}_{i,r}(T)}\right\|_{t}^{2}\right) = \mathbb{E}\left[\mathbb{E}\left(\frac{N_{i,r}(T)}{T}\left\|\frac{y_{i,r}(t)}{N_{i,r}(T)}-\frac{\lambda^{*}_{i,r}(t)}{\Lambda^{*}_{i,r}(T)}\right\|_{t}^{2}\,\Big|\,N_{i,r}(T)\right)\right] \lessapprox (T\Delta t)^{-1}.   (117)

Derivation of (109)

Employing the definition of $L_{2}$ in (10) of the main text, along with the variance expression for the Poisson distribution, we can derive that

\mathbb{E}[L_{2}(\mathbf{z}^{*},\boldsymbol{\Lambda}^{*})] = \mathbb{E}\left[\sum_{i\in[n],r\in[R]}\big|N_{i,r}(T)-\Lambda^{*}_{i,r}(T)\big|^{2}\right] = \sum_{i\in[n],r\in[R]}\Lambda^{*}_{i,r}(T) \approx \sum_{i\in[n],r\in[R]}N_{i,r}(T).   (118)
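A quick Monte Carlo check of (109) can be done in a few lines: for Poisson counts, $\mathbb{E}|N_{i,r}(T)-\Lambda^{*}_{i,r}(T)|^{2}=\Lambda^{*}_{i,r}(T)$, so $\mathbb{E}[L_{2}]$ matches the total expected (hence roughly the observed) event count. The rates in the sketch below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
Lam = rng.uniform(20.0, 80.0, size=(40, 3))         # Lambda*_{i,r}(T)
N = rng.poisson(lam=Lam, size=(5000, 40, 3))        # 5000 replicates

lhs = np.mean(np.sum((N - Lam) ** 2, axis=(1, 2)))  # estimate of E[L2]
rhs = Lam.sum()                                     # approx. sum of N_{i,r}(T)
print(lhs, rhs)                                     # the two agree closely
```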