
A statistical framework for analyzing activity pattern from GPS data

Haoyang Wu, Yen-Chi Chen, Adrian Dobra
Contact: yenchic@uw.edu. The authors gratefully acknowledge the support of NSF DMS-2310578, DMS-2141808, and NIH U24-AG072122.
Abstract

We introduce a novel statistical framework for analyzing the GPS data of a single individual. Our approach models daily GPS observations as noisy measurements of an underlying random trajectory, enabling the definition of meaningful concepts such as the average GPS density function. We propose estimators for this density function and establish their asymptotic properties. To study human activity patterns using GPS data, we develop a simple movement model based on mixture models for generating random trajectories. Building on this framework, we introduce several analytical tools to explore activity spaces and mobility patterns. We demonstrate the effectiveness of our approach through applications to both simulated and real-world GPS data, uncovering insightful mobility trends.


Keywords: GPS, nonparametric statistics, kernel density estimation, random trajectory, activity space, mobility pattern

1 Introduction

The rapid advancement of Global Positioning System (GPS) technology has revolutionized various domains, ranging from transportation and urban planning to environmental studies and public health (Papoutsis et al., 2021; Yang et al., 2022; Mao et al., 2023). GPS data provide high-resolution spatial and temporal information, enabling precise tracking of movement and location patterns. With the increasing availability of GPS-equipped devices, such as smartphones, vehicles, and wearable devices, vast amounts of trajectory data are generated every day.

In scientific research, GPS data play a crucial role in understanding human mobility, tracing activity spaces, and analyzing travel behaviors (Barnett and Onnela, 2020; Andrade and Gama, 2020; Elmasri et al., 2023). Researchers leverage GPS data to study commuting patterns, urban dynamics, wildlife tracking, and disaster response. By examining spatial-temporal GPS records, scientists gain insights into the mobility patterns of individuals and the dynamics of population movements. Despite the rich information embedded in GPS data, little is known about how to properly analyze these data from a statistical perspective.

While various methods exist for processing GPS data, a comprehensive statistical framework for analysis is still lacking. Traditional approaches often rely on heuristic methods or machine learning techniques (Newson and Krumm, 2009; Early and Sykulski, 2020; Chabert-Liddell et al., 2023; Burzacchi et al., 2024) that may lack rigorous statistical foundations. A major challenge in modeling GPS data is that records cannot be viewed as independently and identically distributed (IID) random variables, so a conventional statistical model does not work for GPS records. The absence of a reliable statistical framework hinders cross-disciplinary research and limits the scientific potential of these data.

The objective of this paper is to introduce a general statistical framework for analyzing GPS data. We introduce the concept of random trajectories and model GPS records as noisy measurements of points on a trajectory. Specifically, we assume that on each day, the individual has a true trajectory that is generated from an IID process. Within each day, the GPS data are locations on the trajectory, with additive measurement noise, recorded at particular timestamps. The independence across different days allows us to use daily-average information to infer the overall patterns of the individual.

1.1 Related work

Our procedure for estimating activity utilizes a weighted kernel density estimator (KDE; Chacón and Duong, 2018; Wasserman, 2006; Silverman, 2018; Scott, 2015). The weights assigned to each observation account for the timestamp distribution while adjusting for the cyclic nature of time, establishing connections to directional statistics (Mardia and Jupp, 2009; Meilán-Vila et al., 2021; Pewsey and García-Portugués, 2021; Ley and Verdebout, 2017).

Our analysis of activity space builds on the concept of density level sets (Chacón, 2015; Tsybakov, 1997; Cadre, 2006; Chen et al., 2017) and level sets indexed by probability (Polonik, 1997; Chen and Dobra, 2020; Chen, 2019). Additionally, the random trajectory model introduced in this paper is related to topological and functional data analysis (Chazal and Michel, 2021; Wang et al., 2016; Wasserman, 2018).

Recent statistical research on GPS data primarily focuses on recovering latent trajectories (Early and Sykulski, 2020; Duong, 2024; Huang et al., 2023; Burzacchi et al., 2024) or analyzing transportation patterns (Mao et al., 2023; Elmasri et al., 2023). In contrast, our work aims to develop a statistical framework for identifying an individual’s activity space and anchor locations (Chen and Dobra, 2020). Activity spaces are the spatial areas an individual visits during their activities of daily living. Anchor locations represent mobility hubs for an individual: they are locations where the individual spends a significant amount of time and that serve as origins and destinations for many of the routes they follow (Schönfelder and Axhausen, 2004).

1.2 Outline

This paper is organized as follows. In Section 2, we formally introduce our statistical framework for modeling GPS data. This framework describes the underlying statistical model for how GPS data are generated. Section 3 describes estimators of the key parameters of interest introduced in Section 2. Section 4 presents the associated asymptotic theory. To study human activity patterns, we introduce a simple movement model in Section 5 and present some statistical analysis based on this model. We apply our methodology to real GPS data in Section 6 and explain the key insights about human mobility gained from our analysis. All code and data are available at https://github.com/wuhy0902/Mobility_analysis.

2 Statistical framework of GPS data

Figure 1: An example of real GPS data from an example individual (without timestamps). The four panels show the GPS records of this individual on four different days. The first three panels show weekdays, whereas the last panel shows a weekend day. This is part of the GPS data that we analyze in Section 6.

We consider GPS data recorded from a single individual over several non-overlapping time periods of the same length. In what follows we will consider a time period to be one day, but our results extend to any other similar time periods that can be longer (e.g., weeks, months, weekdays, weekends) or shorter (e.g., daytime hours between 7:00am and 5:00pm).

The GPS data can be represented as random variables

$$(X_{i,j},t_{i,j})\in\mathbb{R}^{2}\times[0,1],$$

where $i=1,\ldots,n$ denotes the day on which the observation was recorded and $j=1,\ldots,m_{i}$ indexes the sequence of observations for the $i$-th day. Here $X_{i,j}$ stands for the spatial location of this GPS observation (latitude and longitude), and $t_{i,j}$ represents the time when the observation was recorded. Note that for each day $i$, $t_{i,1}<t_{i,2}<\ldots<t_{i,m_{i}}$. Here $m_{i}$ is the total number of observations on day $i$. We assume that the timestamps $t_{i,j}$ are non-random (similar to a fixed design). Figure 1 provides an example of real GPS data recorded from an individual on four different days (ignoring timestamps).

A challenge in modeling GPS data is that observations within the same day are dependent, since each observation can be viewed as a realization of the individual’s trajectory at a discrete time point (with some measurement error). Thus, it is a non-trivial task to formulate an adequate statistical model that accurately captures the nature of GPS data.

2.1 Latent trajectory model for GPS records

To construct a statistical framework for GPS records, we consider a random trajectory model as follows. For the $i$-th day, there exists a continuous mapping $S_{i}:[0,1]\rightarrow\mathbb{R}^{2}$ such that $S_{i}(t)$ is the actual location of the individual on day $i$ at time $t$. We call $S_{i}$ the (latent) trajectory of the $i$-th day.

The observed data are $X_{i,j}=S_{i}(t_{i,j})+\epsilon_{i,j}$, where the $\epsilon_{i,j}$ are independent noise variables (independent across both $i$ and $j$) with a bounded PDF, often assumed to be Gaussian. We also write $X_{R,i,j}=S_{i}(t_{i,j})$ to denote the actual location of the individual at time $t_{i,j}$.

We assume that the latent trajectories $S_{1},\ldots,S_{n}\sim P_{S}$, where $P_{S}$ is a distribution over random trajectories. Namely, on each day, the individual follows a random trajectory drawn from $P_{S}$, independent of all other trajectories. In Section 5, we provide a realistic example of how to generate a random trajectory.
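To make this data-generating process concrete, the following minimal sketch simulates GPS records under the model: each day draws a latent trajectory, and observations are the trajectory evaluated at fixed timestamps plus Gaussian noise. The toy home-to-office trajectory, the even-spacing timestamps, and the noise level are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
HOME, OFFICE = np.array([0.0, 0.0]), np.array([1.0, 1.0])

def random_trajectory():
    """Draw one latent trajectory S_i: stay home, commute to the office,
    work, commute back. The random departure/return times play the role
    of the temporal randomness in the model."""
    t_leave = rng.uniform(0.30, 0.40)
    t_arrive, t_back = t_leave + 0.05, rng.uniform(0.65, 0.75)
    t_home = t_back + 0.05

    def S(t):
        t = np.atleast_1d(t)
        out = np.empty((t.size, 2))
        for k, u in enumerate(t):
            if u < t_leave or u >= t_home:
                out[k] = HOME
            elif u < t_arrive:   # moving home -> office at constant speed
                out[k] = HOME + (u - t_leave) / 0.05 * (OFFICE - HOME)
            elif u < t_back:     # stationary at the office
                out[k] = OFFICE
            else:                # moving office -> home
                out[k] = OFFICE + (u - t_back) / 0.05 * (HOME - OFFICE)
        return out
    return S

# GPS records X_{i,j} = S_i(t_{i,j}) + eps_{i,j} at fixed timestamps.
n, m, sigma = 30, 200, 0.02                  # illustrative sizes and noise
t = (2 * np.arange(1, m + 1) - 1) / (2 * m)  # even-spacing design
X = np.stack([random_trajectory()(t) + sigma * rng.standard_normal((m, 2))
              for _ in range(n)])            # shape (n, m, 2)
```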

Figure 2 presents an example of the latent trajectory model. The first column presents a map of important locations, such as home and office, and the corresponding road networks inside this region. The second column shows the random trajectories of two days. The actual trajectory is shown as black lines, and the red arrows indicate the direction of the trajectory. Note that the individual might remain at the same location for an extended period of time, as indicated by the solid black dots in the second column. The third column presents the observed locations of the GPS records. We note that GPS records mostly follow the individual’s trajectory but may occasionally deviate from it due to measurement errors.

Figure 2: An example of the data-generating process for GPS records. The first column shows the map of some important locations and the roads visited by an individual during their activities of daily living. The second column shows two possible trajectories. Black lines represent the true trajectory followed by the individual. Red arrows display movement along these trajectories. The third column shows the observed GPS data for each day.

2.2 Activity patterns

The individual’s activity pattern is characterized by the distribution of the random trajectory $S(t)$. Let $A\subset\mathbb{R}^{2}$ be a spatial region. The probability $P(S(t)\in A)$ is the probability that the individual is in the region $A$ at time $t$. The manner in which the distribution of $S(t)$ changes over time provides us with information about the dynamics of the most likely locations visited by this individual. Let $U\sim{\sf Uniform}[0,1]$ be a random variable independent of $S(t)$. The probability

$$P(S(U)\in A)=\int_{0}^{1}P(S(t)\in A)\,dt$$

can be interpreted as the proportion of time that the individual spends in the region $A$. Thus, the distribution of $S(U)$ tells us the overall activity patterns of the individual.

2.3 GPS densities

The distributions of $S(t)$ or $S(U)$ are difficult to estimate from the GPS data because of the measurement error $\epsilon$. This is known as the deconvolution problem and is a notoriously difficult task. Thus, our idea is to use the distributions of $X_{t}=S(t)+\epsilon$ and $X^{*}=S(U)+\epsilon$ as surrogates.

Naively, one may consider the marginal PDF of $X_{i,j}$,

$$f_{X}(x)=\lim_{r\rightarrow 0}\frac{P(X_{i,j}\in B(x,r))}{\pi r^{2}}, \qquad (1)$$

as an estimate of the distribution of $X^{*}$. This geometric approach (Simon, 2014) is a convenient way to define a density in this case. However, this PDF is NOT the PDF of $X^{*}$ because it depends on the distribution of the timestamps $t_{i,j}$. Even if the individual’s latent trajectory remains the same, changing the distribution of the timestamps $\{t_{i,j}:j=1,\ldots,m_{i}\}$ will change $f_{X}$.

Average GPS density. To solve this problem, we consider a time-uniform density of the observation. Let $U\sim{\sf Uniform}[0,1]$ be a uniform random variable over the time interval $[0,1]$. Recall that

$$X^{*}=S(U)+\epsilon,$$

where $S\sim P_{S}$ and $\epsilon\sim P_{\epsilon}$ are independent random quantities. $X^{*}$ can be viewed as the random latent position $S(U)$ corrupted by the measurement error $\epsilon$. The average behavior of $X^{*}$ describes the expectation of a random trajectory (with measurement error) in a way that is invariant to the timestamp distribution. With this, we define the average GPS density as the PDF of $X^{*}$:

$$f_{GPS}(x)=\lim_{r\rightarrow 0}\frac{P(X^{*}\in B(x,r))}{\pi r^{2}}. \qquad (2)$$

The quantity $f_{GPS}$ describes the expectation of a random trajectory, accounts for the measurement error, and is invariant to the distribution of timestamps. What influences $f_{GPS}$ is the distribution of the latent trajectory $P_{S}$ and the measurement errors (which are generally small). Thus, $f_{GPS}(x)$ is a good quantity for characterizing the activity pattern based on GPS records.

Interval-specific GPS density. Sometimes we may be interested in the activity pattern that takes place within a specific time interval, e.g., the morning or evening rush hours. Let $A\subset[0,1]$ be an interval of interest. The activity patterns within $A$ can be described by the following interval-specific GPS density:

$$f_{GPS,A}(x)=\lim_{r\rightarrow 0}\frac{P(X^{*}_{A}\in B(x,r))}{\pi r^{2}},\qquad X^{*}_{A}=S(U_{A})+\epsilon,\qquad U_{A}\sim{\sf Uniform}(A), \qquad (3)$$

where ${\sf Uniform}(A)$ is the uniform distribution over the interval $A\subset[0,1]$.

Conditional GPS density. When the interval $A$ shrinks to a specific time point $t$, we obtain the conditional GPS density:

$$f_{GPS}(x|t)=\lim_{r\rightarrow 0}\frac{P(X^{*}_{t}\in B(x,r))}{\pi r^{2}},\qquad X^{*}_{t}=S(t)+\epsilon. \qquad (4)$$

The conditional GPS density can be used to predict the individual’s location at time $t$ when we have no information about the individual’s latent trajectory. The conditional GPS density and the previous two densities are linked via the following proposition.

Proposition 1

The above three GPS densities are linked by the following equalities:

$$f_{GPS}(x)=\int_{0}^{1}f_{GPS}(x|t)\,dt,\qquad f_{GPS,A}(x)=\int_{A}f_{GPS}(x|t)\,dt.$$

2.4 Identification of activity patterns from GPS densities

Even in the presence of measurement errors, the GPS densities still provide useful bounds on the distribution of $S(U)$. For simplicity, we assume that the measurement error $\epsilon\sim N(0,\sigma^{2}\mathbb{I}_{2})$; namely, the measurement error is an isotropic Gaussian with variance $\sigma^{2}$. Let $F_{\chi^{2}_{2}}(t)$ be the CDF of the $\chi^{2}_{2}$ distribution.

Proposition 2 (Anchor point identification)

Let $U\sim{\sf Uniform}[0,1]$ be a uniform random variable. Suppose $a\in\mathbb{R}^{2}$ is a location such that $P(S(U)=a)\geq\rho_{a}$. Then $f_{GPS}(a)\geq\frac{\rho_{a}}{2\pi\sigma^{2}}$ and $\int_{B(a,r)}f_{GPS}(x)\,dx\geq\rho_{a}\cdot F_{\chi^{2}_{2}}\left(\frac{r^{2}}{\sigma^{2}}\right)$.

Proposition 2 shows that if the location $a$ carries probability mass, the average GPS density $f_{GPS}$ will also be non-trivially large there. A location with a high probability mass of $S(U)$ means that, on average, the individual spends a substantial amount of time staying there. Such a location is generally of interest and is called an anchor location if the individual spends a lot of time there on a daily basis. The home and workplace are good examples of anchor locations.
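As a quick numerical illustration of Proposition 2 (with made-up values: an anchor visited a third of the day, $\sigma=0.02$, $r=0.05$ in the same spatial units), the bound can be evaluated directly; here $r^{2}/\sigma^{2}=6.25$ and $F_{\chi^{2}_{2}}(6.25)\approx 0.96$, so the ball $B(a,r)$ must carry at least about $0.96\,\rho_{a}$ of the GPS probability.

```python
import math
from scipy.stats import chi2

rho_a, sigma, r = 8 / 24, 0.02, 0.05            # hypothetical values
print(rho_a / (2 * math.pi * sigma**2))         # lower bound on f_GPS(a)
print(rho_a * chi2.cdf(r**2 / sigma**2, df=2))  # lower bound on mass of B(a, r)
```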

Proposition 2 can be generalized to a region. Let $C\oplus r=\{x\in\mathbb{R}^{2}:d(x,C)\leq r\}$ be the $r$-neighborhood of the set $C$. Then we have the following result.

Proposition 3 (High activity region)

Let $U\sim{\sf Uniform}[0,1]$ be a uniform random variable. Suppose $C\subset\mathbb{R}^{2}$ is a region with $P(S(U)\in C)\geq\rho_{C}$. Then $\int_{C\oplus r}f_{GPS}(x)\,dx\geq\rho_{C}\cdot F_{\chi^{2}_{2}}\left(\frac{r^{2}}{\sigma^{2}}\right)$.

Proposition 3 shows that if the random trajectories from $P_{S}$ spend some time in the region $C$, the average GPS density will also be non-negligible around $C$. The CDF of a $\chi^{2}_{2}$ distribution appears because the measurement error is an isotropic two-dimensional Gaussian. If the measurement error has a different distribution, this quantity has to be modified.

Finally, a region with a high average GPS probability implies more activity in its neighborhood.

Proposition 4 (GPS density to activity)

Let $U\sim{\sf Uniform}[0,1]$ be a uniform random variable. Suppose $C\subset\mathbb{R}^{2}$ is a region with $\int_{C}f_{GPS}(x)\,dx\geq G_{C}$. Then

$$P(S(U)\in C\oplus r)\geq\frac{G_{C}-\left(1-F_{\chi^{2}_{2}}\left(\frac{r^{2}}{\sigma^{2}}\right)\right)}{F_{\chi^{2}_{2}}\left(\frac{r^{2}}{\sigma^{2}}\right)}=1-\frac{1-G_{C}}{F_{\chi^{2}_{2}}\left(\frac{r^{2}}{\sigma^{2}}\right)}$$

for any $r>0$.

The lower bound in Proposition 4 is meaningful only if $F_{\chi^{2}_{2}}\left(\frac{r^{2}}{\sigma^{2}}\right)\geq 1-G_{C}$. This proposition shows that if the density $f_{GPS}(x)$ is high, we expect $S(U)$ to place high probability around $x$.

Due to Propositions 2, 3, and 4, the average density $f_{GPS}$ captures the high-density regions of the latent trajectory $S(t)$. As a result, inference on $f_{GPS}$ provides us with valuable information about the individual’s actual activities.

3 Estimating the GPS densities

In this section, we study the problem of estimating the GPS densities. Recall that our data are $(X_{i,j},t_{i,j})\in\mathbb{R}^{2}\times[0,1]$, where $i=1,\ldots,n$ indexes the day and $j=1,\ldots,m_{i}$ indexes the observation times within each day. Our method is based on the idea of kernel smoothing. We will show that, after properly reweighting the observations according to their timestamps, a 2D kernel density estimator (KDE) provides a consistent estimator of the GPS densities.

Naively, one may drop the timestamps and apply kernel density estimation, which leads to

$$\widehat{f}_{naive}(x)=\frac{1}{N_{n}h^{2}}\sum_{i=1}^{n}\sum_{j=1}^{m_{i}}K\left(\frac{X_{i,j}-x}{h}\right), \qquad (5)$$

where $h>0$ is the smoothing bandwidth, $K(\cdot)$ is a kernel function such as the Gaussian, and $N_{n}=\sum_{i=1}^{n}m_{i}$ is the total number of observations. One can easily see that $\widehat{f}_{naive}(x)$ is a consistent estimator of $f_{X}$ (the marginal density of $X$), not of $f_{GPS}$, so this estimator is in general inconsistent.

3.1 Time-weighted estimator

To estimate $f_{GPS}$, we consider a very simple estimator that weights each observation by its time difference. For observation $(X_{i,j},t_{i,j})$, we use the two mid-points between it and its neighboring timestamps: $\frac{t_{i,j-1}+t_{i,j}}{2}$ and $\frac{t_{i,j}+t_{i,j+1}}{2}$. The time difference between these two mid-points is $W_{i,j}=\frac{t_{i,j+1}-t_{i,j-1}}{2}$, which will be used as the weight of observation $X_{i,j}$.

For the first timestamp $t_{i,1}$ and the last timestamp $t_{i,m_{i}}$, we use the fact that $t=0$ and $t=1$ represent the same time of day. So we set $t_{i,0}=t_{i,m_{i}}-1$, as if the last timestamp occurred before the first one, and $t_{i,m_{i}+1}=1+t_{i,1}$, as if the first timestamp occurred after the last one. This construction also gives $\sum_{j=1}^{m_{i}}W_{i,j}=1$. We then define the time-weighted estimator as

$$\widehat{f}_{w}(x)=\frac{1}{nh^{2}}\sum_{i=1}^{n}\sum_{j=1}^{m_{i}}W_{i,j}\cdot K\left(\frac{X_{i,j}-x}{h}\right). \qquad (6)$$

This is essentially a weighted 2D KDE, with the weight $W_{i,j}$ representing the amount of time around $t_{i,j}$. Although the estimator simply weights each observation by time, it is a consistent estimator of $f_{GPS}$ (Theorem 5).
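A minimal sketch of the time-weighted estimator, assuming a Gaussian kernel and per-day arrays (the function names are ours, not from the paper’s code repository):

```python
def time_weights(t):
    """Cyclic mid-point weights W_{i,j} for one day's sorted timestamps
    t in [0,1]; they sum to one by construction."""
    t_prev = np.concatenate(([t[-1] - 1.0], t[:-1]))  # t_{i,0} = t_{i,m_i} - 1
    t_next = np.concatenate((t[1:], [t[0] + 1.0]))    # t_{i,m_i+1} = 1 + t_{i,1}
    return (t_next - t_prev) / 2

def f_w(x, X, t_list, h):
    """Time-weighted KDE of equation (6) with a Gaussian kernel.
    x: (2,) query point; X: list/array of (m_i, 2) daily locations;
    t_list: list of (m_i,) daily timestamp arrays."""
    total = 0.0
    for Xi, ti in zip(X, t_list):
        W = time_weights(ti)
        d2 = np.sum((Xi - x) ** 2, axis=1)
        total += np.sum(W * np.exp(-d2 / (2 * h**2)) / (2 * np.pi))
    return total / (len(X) * h**2)
```

For the simulated data above, `f_w(np.array([0.0, 0.0]), X, [t] * n, h=0.05)` evaluates the density near the (simulated) home location, which should be among the largest values on the map.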

If we want to estimate the interval-specific GPS density, we can restrict to the observations within the time interval of interest and adjust the weights accordingly.

3.2 Conditional GPS density estimator

To estimate the conditional GPS density, we can simply use the conditional kernel density estimator (KDE):

$$\widehat{f}(x|t)=\frac{\widehat{f}(x,t)}{\widehat{f}(t)}=\frac{\sum_{i=1}^{n}\frac{1}{m_{i}}\sum_{j=1}^{m_{i}}K\left(\frac{X_{i,j}-x}{h_{X}}\right)K_{T}\left(\frac{d_{T}(t_{i,j},t)}{h_{T}}\right)}{h_{X}^{2}\cdot\sum_{i^{\prime}=1}^{n}\frac{1}{m_{i^{\prime}}}\sum_{j^{\prime}=1}^{m_{i^{\prime}}}K_{T}\left(\frac{d_{T}(t_{i^{\prime},j^{\prime}},t)}{h_{T}}\right)}, \qquad (7)$$

where $h_{X}>0$ and $h_{T}>0$ are the smoothing bandwidths, $K_{T}$ is a kernel function for time, and

$$d_{T}(t_{1},t_{2})=\min\left\{|t_{1}-t_{2}|,\,1-|t_{1}-t_{2}|\right\}$$

measures the time difference while correcting for the fact that $t=0$ and $t=1$ represent the same time. The interval $[0,1]$ is not simply an interval: its two endpoints $t=0$ and $t=1$ are identified, giving it a cyclic structure, and $d_{T}$ measures distance in this cyclic time. Alternatively, one may map the time $[0,1]$ to an angle in $[0,2\pi]$, as on a clock, and use a directional kernel to preserve the cyclic structure of time (Nguyen et al., 2023; Mardia and Jupp, 2009; Meilán-Vila et al., 2021).

While this estimator seems similar to the naive estimator $\widehat{f}_{naive}$, it turns out to be consistent for $f_{GPS}(x|t)$. Moreover, the conditional KDE can be used to estimate the average GPS density by integrating over $t$:

$$\widehat{f}_{c}(x)=\int_{0}^{1}\widehat{f}(x|t)\,dt=\frac{1}{nh^{2}}\sum_{i=1}^{n}\sum_{j=1}^{m_{i}}\widetilde{W}_{i,j}\cdot K\left(\frac{X_{i,j}-x}{h}\right), \qquad (8)$$

$$\widetilde{W}_{i,j}=n\cdot\int_{0}^{1}\frac{K_{T}\left(\frac{d_{T}(t_{i,j},t)}{h_{T}}\right)}{\sum_{i^{\prime}=1}^{n}\frac{1}{m_{i^{\prime}}}\sum_{j^{\prime}=1}^{m_{i^{\prime}}}K_{T}\left(\frac{d_{T}(t_{i^{\prime},j^{\prime}},t)}{h_{T}}\right)}\,dt.$$

Similar to $\widehat{f}_{w}$, the estimator $\widehat{f}_{c}$ is also a weighted 2D KDE. This integrated conditional KDE $\widehat{f}_{c}$ is a consistent estimator of $f_{GPS}$. Note that we can convert $\widehat{f}(x|t)$ into an interval-specific GPS density estimator by integrating over the interval of interest. In Appendix A.1, we discuss numerical techniques for quickly computing the estimator $\widehat{f}_{c}$.
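A sketch of the conditional estimator in equation (7), assuming for simplicity a boxcar time kernel $K_{T}(u)=\mathbf{1}\{|u|\leq 1\}$ together with the Gaussian spatial kernel used above:

```python
def d_T(t1, t2):
    """Cyclic time distance on [0,1], identifying t=0 and t=1."""
    d = np.abs(t1 - t2)
    return np.minimum(d, 1.0 - d)

def f_cond(x, t, X, t_list, hX, hT):
    """Conditional KDE of equation (7): Gaussian spatial kernel,
    boxcar time kernel K_T(u) = 1{|u| <= 1}. hT must be large enough
    that some timestamps fall within distance hT of t."""
    num, den = 0.0, 0.0
    for Xi, ti in zip(X, t_list):
        KT = (d_T(ti, t) <= hT).astype(float)   # time-kernel weights
        d2 = np.sum((Xi - x) ** 2, axis=1)
        KX = np.exp(-d2 / (2 * hX**2)) / (2 * np.pi)
        num += np.sum(KX * KT) / len(ti)
        den += np.sum(KT) / len(ti)
    return num / (hX**2 * den)
```

Numerically integrating `f_cond` over a grid of `t` values approximates $\widehat{f}_{c}(x)$ of equation (8).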

Remark (daily average estimator). An alternative estimator of the conditional GPS density averages daily conditional KDEs. Let

$$\widetilde{f}_{i}(x|t)=\frac{\sum_{j=1}^{m_{i}}K\left(\frac{X_{i,j}-x}{h_{X}}\right)K_{T}\left(\frac{d_{T}(t_{i,j},t)}{h_{T}}\right)}{h_{X}^{2}\cdot\sum_{j^{\prime}=1}^{m_{i}}K_{T}\left(\frac{d_{T}(t_{i,j^{\prime}},t)}{h_{T}}\right)}$$

be a conditional KDE for day $i$. One can see that when $m_{i}\rightarrow\infty$ and $h_{X},h_{T}$ properly shrink to 0, this is a consistent estimator of the conditional GPS density on day $i$. We can then average over all days to obtain an estimator of $f_{GPS}(x|t)$: $\widetilde{f}(x|t)=\frac{1}{n}\sum_{i=1}^{n}\widetilde{f}_{i}(x|t)$. This estimator is also consistent, but our experiments show that its performance is not as good as that of $\widehat{f}(x|t)$.

4 Asymptotic theory

In this section, we study the asymptotic theory of these estimators. Recall that our data consist of the collections $(X_{i,j},t_{i,j})$ for $i=1,\ldots,n$ and $j=1,\ldots,m_{i}$ such that $X_{i,j}=S_{i}(t_{i,j})+\epsilon_{i,j}$, where $S_{1},\ldots,S_{n}\sim P_{S}$ are IID random trajectories and the $\epsilon_{i,j}$ are IID mean-zero noises. We assume a fixed-design scenario where the timestamps $\{t_{i,j}\}$ are non-random. All relevant assumptions are given in Appendix C.

For simplicity, we assume that the number of observations per day $m_{i}$ is fixed and non-random. Since the random trajectories are IID across days, the observations on day $i_{1}$ are independent of the observations on day $i_{2}$ for $i_{1}\neq i_{2}$.

Special case: even-spacing design. When comparing our results to a conventional nonparametric estimation rate, we will consider an even-spacing design such that $m_{i}=m$ and $t_{i,j}=\frac{2j-1}{2m}$. Namely, all timestamps are evenly spaced on the interval $[0,1]$. Under this scenario we rewrite our convergence rates in terms of $n,m,h$, making them easy to compare with conventional rates. The even-spacing design also allows us to investigate the effect of the smoothing bandwidth more clearly.

4.1 Time-weighted estimator

We first derive the convergence rate of the time-weighted estimator $\widehat{f}_{w}(x)$.

Theorem 5

Under Assumptions 1, 2, and 3, for any $x\in\mathbb{R}^{2}$, we have

$$\widehat{f}_{w}(x)-f_{GPS}(x)=O(h^{2})+O\left(\frac{1}{nh^{3}}\sum_{i=1}^{n}\sum_{j=0}^{m_{i}}(t_{i,j+1}-t_{i,j})^{2}\right)+O_{P}\left(\frac{1}{nh}\sqrt{\sum_{i=1}^{n}\sum_{j=1}^{m_{i}}W_{i,j}^{2}}\right)+O_{P}\left(n^{-\frac{1}{2}}\right)$$

as $h\rightarrow 0$, $\max_{i,j}W_{i,j}\rightarrow 0$, and $n\rightarrow\infty$.

Theorem 5 describes the convergence rate of the time-weighted estimator. The first term $O(h^{2})$ is the conventional smoothing bias. The second term $O\left(\frac{1}{nh^{3}}\sum_{i=1}^{n}\sum_{j=0}^{m_{i}}(t_{i,j+1}-t_{i,j})^{2}\right)$ is a bias due to the gaps between timestamps. The third term $O_{P}\left(\frac{1}{nh}\sqrt{\sum_{i=1}^{n}\sum_{j=1}^{m_{i}}W_{i,j}^{2}}\right)$ is a stochastic error due to fluctuations in the weights $W_{i,j}=\frac{t_{i,j+1}-t_{i,j-1}}{2}$. The last term $O_{P}\left(n^{-\frac{1}{2}}\right)$ is the usual rate from averaging over $n$ independent days’ data.

While Theorem 5 gives a pointwise convergence rate, it also implies a rate for the mean integrated squared error (MISE):

$$\begin{aligned}\int\mathbb{E}\left[\left(\widehat{f}_{w}(x)-f_{GPS}(x)\right)^{2}\right]dx&=O(h^{4})+O\left(\frac{1}{n^{2}h^{6}}\left(\sum_{i=1}^{n}\sum_{j=1}^{m_{i}}(t_{i,j+1}-t_{i,j})^{2}\right)^{2}\right)\\&\quad+O\left(\frac{1}{n^{2}h^{2}}\sum_{i=1}^{n}\sum_{j=1}^{m_{i}}W_{i,j}^{2}\right)+O\left(n^{-1}\right)\end{aligned} \qquad (9)$$

when the support of $X$ is a compact set. Under the even-spacing design, we show the connection between this convergence rate and the rate of usual nonparametric estimators.

4.1.1 Even-spacing design and bandwidth selection

To make a meaningful comparison with the conventional KDE rate, we consider the even-spacing design with $m_{i}=m$. In this case,

$$t_{i,j+1}-t_{i,j}=\frac{1}{m},\qquad W_{i,j}=\frac{1}{m}.$$

Thus,

$$\begin{aligned}\widehat{f}_{w}(x)-f_{GPS}(x)&=O(h^{2})+O\left(\frac{1}{mh^{3}}\right)+O_{P}\left(\sqrt{\frac{1}{nmh^{2}}}\right)+O_{P}\left(n^{-\frac{1}{2}}\right)\\&=O(h^{2})+O\left(\frac{1}{mh^{3}}\right)+O_{P}\left(\sqrt{\frac{1}{nmh^{2}}}\right).\end{aligned} \qquad (10)$$

In the case of MISE, equation (10) becomes

$$\int\mathbb{E}\left[\left(\widehat{f}_{w}(x)-f_{GPS}(x)\right)^{2}\right]dx=O(h^{4})+O\left(\frac{1}{m^{2}h^{6}}\right)+O\left(\frac{1}{nmh^{2}}\right). \qquad (11)$$

This is a reasonable rate. The first term $O(h^{4})$ is the conventional smoothing bias of a KDE. The second term $O\left(\frac{1}{m^{2}h^{6}}\right)$ is the bias due to the ‘resolution’ of each day’s observations. The last term $O\left(\frac{1}{nmh^{2}}\right)$ is the variance (stochastic error) of a 2D KDE with effective sample size $nm$: we average over $n$ days with $m$ observations each, so the total number of random elements is $nm$.

Choosing the optimal bandwidth is a non-trivial problem because $h=h_{n,m}$ and there are three terms in equation (11). We denote

$$(B)=O\left(\frac{1}{m^{2}h^{6}}\right),\qquad(S)=O\left(\frac{1}{nmh^{2}}\right),$$

so that the term (B) represents the temporal resolution bias and the term (S) represents the variance.

Proposition 6

The optimal smoothing bandwidth for equation (11) is

$$h^{*}_{n,m}\asymp\begin{cases}m^{-1/5},&\mbox{if }m^{1/5}=o(n)\\(nm)^{-1/6},&\mbox{if }n=o(m^{1/5})\end{cases}$$

and the optimal rate under equation (11) is:

$$\int\mathbb{E}\left[\left(\widehat{f}_{w}(x)-f_{GPS}(x)\right)^{2}\right]dx=\begin{cases}O\left(m^{-4/5}\right),&\mbox{if }m^{1/5}=o(n)\\O\left((nm)^{-2/3}\right),&\mbox{if }n=o(m^{1/5})\end{cases}$$

Proposition 6 shows that the critical regime balancing the two errors is $n\asymp m^{1/5}$. When $m^{1/5}=o(n)$, the temporal resolution bias (B) is larger than the stochastic error (S), and we obtain a pointwise convergence rate of $O_{P}\left(m^{-2/5}\right)$, which is dominated by the resolution bias. On the other hand, $n=o(m^{1/5})$ is the case where (S) dominates (B), so the convergence rate is $O_{P}\left((nm)^{-1/3}\right)$, the stochastic error.
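In practice, Proposition 6 suggests the following rule of thumb (up to constants, which the asymptotics do not pin down and which must be calibrated in practice):

```python
def h_opt(n, m):
    """Rate-optimal bandwidth from Proposition 6, constants omitted.
    Resolution bias dominates when m**(1/5) = o(n)."""
    return m ** (-1 / 5) if n > m ** (1 / 5) else (n * m) ** (-1 / 6)
```

For instance, with $n=30$ days and $m=500$ observations per day, `h_opt(30, 500)` returns roughly $0.29$ (the $m^{-1/5}$ regime, since $m^{1/5}\approx 3.5 < n$).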

4.2 Conditional GPS density estimator

In this section, we study the convergence of the conditional GPS density estimator $\widehat{f}(x|t)$.

Theorem 7

Under Assumptions 1 to 4, for any $(x,t)\in\mathbb{R}^{2}\times[0,1]$, we have

$$\begin{aligned}\widehat{f}(x|t)-f(x|t)=\;&O(h_{X}^{2})+O\left(\sum_{i=1}^{n}\alpha_{i}(t)\sum_{j=1}^{m_{i}}\frac{K_{T}\left(\frac{d_{T}(t_{i,j},t)}{h_{T}}\right)}{\sum_{j^{\prime}=1}^{m_{i}}K_{T}\left(\frac{d_{T}(t_{i,j^{\prime}},t)}{h_{T}}\right)}d_{T}(t_{i,j},t)\right)\\&+O_{P}\left(\frac{1}{h_{X}}\sqrt{\sum_{i=1}^{n}\alpha_{i}(t)^{2}\sum_{j=1}^{m_{i}}\left[\frac{K_{T}\left(\frac{d_{T}(t_{i,j},t)}{h_{T}}\right)}{\sum_{j^{\prime}=1}^{m_{i}}K_{T}\left(\frac{d_{T}(t_{i,j^{\prime}},t)}{h_{T}}\right)}\right]^{2}}\right)+O_{P}\left(\sqrt{\sum_{i=1}^{n}\alpha_{i}(t)^{2}}\right),\end{aligned}$$

where $\alpha_{i}(t)$ is the kernel weight of day $i$ at time $t$:

$$\alpha_{i}(t)=\frac{\widehat{p}_{T,i}(t)}{\sum_{i^{\prime}=1}^{n}\widehat{p}_{T,i^{\prime}}(t)}=\frac{\frac{1}{m_{i}}\sum_{j=1}^{m_{i}}K_{T}\left(\frac{d_{T}(t_{i,j},t)}{h_{T}}\right)}{\sum_{i^{\prime}=1}^{n}\frac{1}{m_{i^{\prime}}}\sum_{j^{\prime}=1}^{m_{i^{\prime}}}K_{T}\left(\frac{d_{T}(t_{i^{\prime},j^{\prime}},t)}{h_{T}}\right)}.$$

The corresponding MISE rate is the square of the rate in Theorem 7. Each term in Theorem 7 corresponds to a term in Theorem 5. The first term is the smoothing bias. The second term is the resolution bias, arising because the observed $t_{i,j}$ may be away from the time point of interest $t$; the linear term $d_{T}(t_{i,j},t)$ appears because $\|S_{i}(t_{i,j})-S_{i}(t)\|=O(d_{T}(t_{i,j},t))$ when the velocity of the trajectory is bounded from above. The third term is the stochastic error of the estimator. The last term is the stochastic error due to the different weights of each day; this quantity corresponds to the term $O_{P}(n^{-1/2})$ in Theorem 5.

4.2.1 Even-spacing design and bandwidth selection

We now investigate the even-spacing scenario, which provides insights into the convergence rate and the bandwidth selection problem. When $m_{i}=m$ for all days and the timestamps are evenly spaced, we immediately obtain $\alpha_{i}(t)=\frac{1}{n}$ for every $i=1,\ldots,n$.

Moreover, the summation $\sum_{j^{\prime}=1}^{m}K_{T}\left(\frac{d_{T}(t_{i,j^{\prime}},t)}{h_{T}}\right)$ is associated with the KDE of the time $t$:

$$\widehat{p}_{T,i}(t)=\frac{1}{mh_{T}}\sum_{j^{\prime}=1}^{m}K_{T}\left(\frac{d_{T}(t_{i,j^{\prime}},t)}{h_{T}}\right)=p_{T,i}(t)+\Delta_{i,1}(t)=1+\Delta_{i,1}(t),$$

where $\Delta_{i,1}(t)=O(h_{T}^{2})+O\left(\sqrt{\frac{1}{mh_{T}}}\right)$ is the approximation error. This implies that

$$\sum_{j^{\prime}=1}^{m}K_{T}\left(\frac{d_{T}(t_{i,j^{\prime}},t)}{h_{T}}\right)=m_{i}\cdot h_{T}\,(1+\Delta_{i,1}(t)). \qquad (12)$$

Similarly, for the summation in the numerator of the second term in Theorem 7,

$$\begin{aligned}\sum_{j=1}^{m_{i}}K_{T}\left(\frac{d_{T}(t_{i,j},t)}{h_{T}}\right)d_{T}(t_{i,j},t)&=m_{i}\cdot\frac{1}{m_{i}}\sum_{j=1}^{m_{i}}K_{T}\left(\frac{d_{T}(t_{i,j},t)}{h_{T}}\right)d_{T}(t_{i,j},t)\\&=m_{i}\cdot\mathbb{E}\left(K_{T}\left(\frac{U-t}{h_{T}}\right)|U-t|\right)+O\left(\sqrt{\frac{1}{mh_{T}}}\right)\\&=c_{K}\cdot m_{i}\cdot h^{2}_{T}+\Delta_{i,2}(t),\end{aligned} \qquad (13)$$

where $U$ is a uniform random variable over $[0,1]$, $c_{K}=\int|t|K_{T}(t)\,dt$ is a constant, and $\Delta_{i,2}(t)=O(h_{T}^{2})+O\left(\sqrt{\frac{1}{mh_{T}}}\right)$ is another approximation error. Note that in the second equality of equation (13), we replace $d_{T}(t_{i,j},t)$ with $|t_{i,j}-t|$: when $h_{T}\rightarrow 0$, only those $t_{i,j}$ with $d_{T}(t_{i,j},t)\approx 0$ matter, and in that case $d_{T}(t_{i,j},t)=|t_{i,j}-t|$.

Now, putting equations (12) and (13) into Theorem 7, we obtain

$$\widehat{f}(x|t)-f(x|t)=O(h_{X}^{2})+O(h_{T})+O\left(\sqrt{\frac{1}{mh_{T}}}\right)+O_{P}\left(\sqrt{\frac{1}{nmh_{X}^{2}}}\right)+O_{P}\left(n^{-1/2}\right)$$

and the corresponding MISE rate is

$$\int\mathbb{E}\left[\left(\widehat{f}(x|t)-f_{GPS}(x|t)\right)^{2}\right]dx=O(h_{X}^{4})+O(h_{T}^{2})+O\left(\frac{1}{mh_{T}}\right)+O\left(\frac{1}{nmh_{X}^{2}}\right)+O\left(n^{-1}\right). \qquad (14)$$

The first term is the conventional smoothing bias. The second term $O(h_{T}^{2})$ is the resolution bias. In contrast to the time-weighted estimator, this resolution bias takes a different form: observations with high weights lie within an $O(h_{T})$ neighborhood of $t$ in time, and in the worst case the actual location on the trajectory can differ by the same order, leading to this rate. The third term is the bias from approximating a uniform integral with a dense even-spacing grid. The fourth term is the conventional variance of the estimator, and the last is the variance due to the $n$ distinct days.

Equation (14) also provides the convergence rate of the estimator $\widehat{f}_{c}(x)$ because $\widehat{f}_{c}(x)=\int_{0}^{1}\widehat{f}(x|t)\,dt$ is a finite integral. So the MISE convergence rate of $\widehat{f}_{c}$ is the same as in equation (14).

The bandwidth selection under equation (14) is straightforward. The optimal choice will be of the rate

$$h^{*}_{T}=m^{-1/3},\qquad h^{*}_{X}=(nm)^{-1/6}, \qquad (15)$$

which corresponds to the convergence rate

$$\int\mathbb{E}\left[\left(\widehat{f}(x|t)-f_{GPS}(x|t)\right)^{2}\right]dx=O\left((nm)^{-2/3}\right)+O\left(m^{-2/3}\right).$$

One can clearly see that the convergence rate is dominated by the second term $O\left(m^{-2/3}\right)$, which comes from the resolution bias $O(h^{2}_{T})$ and the approximation bias $O(\frac{1}{mh_{T}})$ within each day. Since neither quantity is a stochastic error, increasing the number of days $n$ does not reduce them.

Under the optimal smoothing bandwidth, we also have

$$\int\mathbb{E}\left[\left(\widehat{f}_{c}(x)-f_{GPS}(x)\right)^{2}\right]dx=O\left((nm)^{-2/3}\right)+O\left(m^{-2/3}\right)=O\left(m^{-2/3}\right).$$

Comparing this to the time-weighted estimator in Proposition 6, the two rates are the same when the number of days $n$ is not much larger than the number of observations per day $m$, which is the typical case. When $m^{1/5}=o(n)$, $\widehat{f}_{w}(x)$ has a MISE rate of $O\left(m^{-4/5}\right)$, slightly faster than $\widehat{f}_{c}(x)$. This is because the time-weighted estimator does not require estimating a conditional density, so there is no smoothing over time, making it more efficient.

5 Simple movement model

All the above analysis applies to GPS data recorded not only from human beings but also from other living creatures whose spatial trajectories are studied in the literature (e.g., elk, whales, turtles, fish). However, human activity patterns are very different from the rest because they are generally very regular (Hägerstrand, 1963, 1970). To tailor our analysis to humans, we study a special class of random trajectories that we call the simple movement model (SMM). In spite of its simplicity, it generates trajectories similar to the actual trajectory of a person. Our simulation studies are all based on this model.

Let $\mathcal{A}=\{a_{1},\ldots,a_{K}\}\subset\mathbb{R}^{2}$ be a collection of locations that the individual visits very often. The set $\mathcal{A}$ includes the so-called anchor locations, such as the home and office, as well as other places such as restaurants and grocery stores. Let $B$ be a random element of the form

$$B=(A_{1},R_{1},A_{2},R_{2},\ldots,A_{k},R_{k},A_{k+1})$$

such that $A_{1},\ldots,A_{k+1}\in\mathcal{A}$ are the locations where the individual stays stationary. The vector $B$ is called an action vector since it describes the actions taken in a single day. The quantity $R_{j}$ is a road/path connecting $A_{j}$ to $A_{j+1}$, describing the route the individual takes to move from location $A_{j}$ to $A_{j+1}$.

Figure 3: An example of the SMM. The top left panel shows the possible places that the individual visits regularly, as well as the road networks connecting these locations. The other five panels show the possible states that the action vector can take. More details of the actions can be found in Table 1 in the appendix.

Figure 3 provides an example of the SMM. In this case, there are five possible patterns that the action vector $B$ can take. The first pattern is the following action vector:

$$B=b_{1}=\{\texttt{home},R_{1,1},\texttt{office},R_{1,2},\texttt{home},R_{1,3},\texttt{home}\}.$$

The road $R_{1,1}$ goes from home directly to the office and $R_{1,2}$ is a road from the office back home. The road $R_{1,3}$ is a loop on which the individual goes to the park in the top region and then returns home without staying. Pattern 3 is the simplest one, as it involves only a trip to the local grocery store:

$$B=b_{3}=\{\texttt{home},R_{3,1},\texttt{supermarket},R_{3,2},\texttt{home}\}.$$

The randomness of the action vector $B$ reflects the fact that the individual may follow different activity patterns on different days. Also, even if all the locations $A_{j}$ are the same, the individual may take different roads to move between them. In this construction, we assume that the set of possible outcomes of $B$ is finite, so the distribution of $B$ can be described by a probability mass function.

The action vector $B$ only describes the actions of the individual (staying at a location or moving from one to another); it does not include any information about time. Thus, we introduce another random variable $Z$ conditioned on $B$ such that $Z=(Z_{1},\ldots,Z_{2k+1})$ is a random partition of the interval $[0,1]$: for $j=1,\ldots,k+1$, $Z_{2j-1}$ is the amount of time that the individual spends at $A_{j}$, while $Z_{2j}$ is the amount of time that the individual is on the road $R_{j}$. One can think of $Z$ as an element of the $(2k+1)$-simplex. When the individual is on the road, we assume that the individual moves between locations with constant velocity. One may use a Dirichlet distribution or a truncated Gaussian distribution to generate $Z$.

To illustrate the idea, we consider again the example in Figure 3. When $B=b_{1}$, there are a total of 7 elements in the action vector, so $Z|B=b_{1}$ lies in the 7-simplex. Suppose $Z=(\frac{9}{24},\frac{1}{24},\frac{8}{24},\frac{1}{24},\frac{2}{24},\frac{1}{24},\frac{2}{24})$. This means that the individual leaves home at 9 am and spends 1 hour on the road to the office. The individual then spends 8 hours in the office and drives home for 1 hour. After a 2-hour stay at home, the individual takes a 1-hour round trip to the park and stays at home for the rest of the day. Note that $Z|B=b_{3}$ lies in the 5-simplex, since pattern 3 has only 5 possible actions.

Once we have randomly generated $B$ and $Z$, we can easily obtain the corresponding random trajectory $S(t)$. Since this model considers a simplified movement of an individual, we call it the simple movement model. Although simplified, it does capture the primary activities of an individual that can be recorded by GPS data. Under the SMM, the distribution $P_{S}$ can be characterized by the PMF of $B$ and the PDF of $Z|B$. Since the action vector $B$ has only a finite number of possible outcomes, the implied trajectories can be viewed as generated from a mixture model in which the action vector $B$ acts as the component indicator.
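A minimal sketch of the SMM generator, assuming straight-line roads, a Dirichlet distribution for $Z$, and made-up anchor coordinates and action-vector probabilities (none of these choices come from the paper):

```python
# Hypothetical anchor locations and two action vectors; roads are
# straight segments between consecutive stops for simplicity.
LOCS = {"home": np.array([0.0, 0.0]),
        "office": np.array([1.0, 1.0]),
        "market": np.array([0.5, -0.5])}
ACTIONS = [["home", "office", "home"],   # pattern b1 (work day)
           ["home", "market", "home"]]   # pattern b3 (errand day)
PROBS = [0.7, 0.3]                       # illustrative PMF of B

def smm_trajectory():
    """Draw (B, Z) and return the trajectory S(t) as a callable."""
    stops = ACTIONS[rng.choice(len(ACTIONS), p=PROBS)]
    k = len(stops)                       # k stays and k-1 road segments
    # Z: random partition of [0,1] into 2k-1 pieces via a Dirichlet;
    # larger weights on stay segments give long stays, short road times.
    alpha = np.where(np.arange(2 * k - 1) % 2 == 0, 8.0, 1.0)
    Z = rng.dirichlet(alpha)
    edges = np.concatenate(([0.0], np.cumsum(Z)))

    def S(t):
        seg = min(np.searchsorted(edges, t, side="right") - 1, 2 * k - 2)
        if seg % 2 == 0:                 # stationary at stops[seg // 2]
            return LOCS[stops[seg // 2]]
        a, b = LOCS[stops[seg // 2]], LOCS[stops[seg // 2 + 1]]
        frac = (t - edges[seg]) / Z[seg]  # constant-velocity travel
        return a + frac * (b - a)
    return S
```

Feeding such trajectories through the noise model of Section 2.1 reproduces the mixture structure: each day’s density is concentrated at the stops of its action vector, with thin ridges along the roads.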

5.1 Recovering anchor locations

In the SMM, we may define the anchor locations as $\mathcal{A}_{\lambda}=\{a\in\mathcal{A}:P(S(U)=a)>\lambda\}$ for a given threshold $\lambda$. The threshold $\lambda$ is the proportion of the day that the individual spends at the location $a$. For instance, the threshold $\lambda=\frac{8}{24}$ should recover the home location, as we expect people to spend more than 8 hours per day at home; $\lambda=\frac{4}{24}$ may correspond to both the home and the workplace.

Suppose that the measurement noise is isotropic Gaussian with a variance $\sigma^{2}$ that is known to us. Proposition 2 shows that for any $a$ with $P(S(U)=a)\geq\rho_{a}$, we have $f_{GPS}(a)\geq\frac{\rho_{a}}{2\pi\sigma^{2}}$. This implies that any anchor location $a\in\mathcal{A}_{\lambda}$ satisfies $f_{GPS}(a)\geq\frac{\lambda}{2\pi\sigma^{2}}$.

Let $\widehat{f}(x)$ be an estimated GPS density, obtained either from the time-weighted approach or from the conditional GPS method. We may use the estimated upper level set

$$\widehat{\mathcal{L}}_{\lambda}=\left\{x:\widehat{f}(x)\geq\frac{\lambda}{2\pi\sigma^{2}}\right\} \qquad (16)$$

as an estimator of the possible locations in $\mathcal{A}_{\lambda}$. As a result, the level sets of $\widehat{f}(x)$ provide useful information about the anchor locations.

In addition to the level sets, the local modes of $\widehat{f}(x)$ can be used to estimate the anchor locations. Under the SMM, anchor locations are where $S(U)$ has probability mass. On a regular road $R_{i}$, there is no probability mass (only a 1D probability density along the road). This property implies that the GPS density $f_{GPS}(x)$ has local modes around each anchor point (when the measurement noise is not too large). Thus, the local modes within $\widehat{\mathcal{L}}_{\lambda}$ can be used to recover the anchor locations:

$$\widehat{\mathcal{M}}_{\lambda}=\{x\in\widehat{\mathcal{L}}_{\lambda}:\nabla\widehat{f}(x)=0,\ \lambda_{1}(\nabla\nabla\widehat{f}(x))<0\},$$

where $\lambda_{1}(M)$ is the largest eigenvalue of the matrix $M$ and $\nabla\nabla f(x)$ is the Hessian matrix of $f(x)$.
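A grid-based sketch of equation (16) and the mode estimator, where grid-local maxima stand in for the exact gradient/Hessian conditions (a practical surrogate, not the paper’s implementation); `f_hat` is any density estimate such as `f_w` above:

```python
from scipy.ndimage import maximum_filter

def anchor_candidates(f_hat, xs, ys, lam, sigma):
    """Evaluate f_hat on a grid, threshold at lam / (2*pi*sigma^2)
    as in equation (16), and keep grid-local maxima inside the set."""
    gx, gy = np.meshgrid(xs, ys, indexing="ij")
    F = np.array([[f_hat(np.array([a, b])) for b in ys] for a in xs])
    level = lam / (2 * np.pi * sigma**2)
    in_set = F >= level                       # estimated level set
    is_mode = F == maximum_filter(F, size=3)  # 3x3 local maxima
    keep = in_set & is_mode
    return np.column_stack([gx[keep], gy[keep]])
```

For instance, `anchor_candidates(lambda x: f_w(x, X, [t]*n, 0.05), np.linspace(-0.2, 1.2, 60), np.linspace(-0.2, 1.2, 60), lam=8/24, sigma=0.02)` should return points near the simulated home.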

5.2 Detecting activity spaces

The activity space is the region where the individual spends most of their time on a regular day. This region is unique to every individual and can provide key information, such as how the built environment influences the individual (Sherman et al., 2005). We show how GPS data can be used to determine an individual’s activity space.

Suppose we know the true GPS density $f_{GPS}(x)$. A simple approach to quantifying the activity space is via the level set indexed by probability (Polonik, 1997), also known as density ranking (Chen and Dobra, 2020; Chen, 2019):

$$Q_{\rho}=\min_{\lambda}\left\{\mathcal{L}^{*}_{\lambda}:\int_{x\in\mathcal{L}^{*}_{\lambda}}f_{GPS}(x)\,dx\geq\rho\right\},\qquad\mathcal{L}^{*}_{\lambda}=\left\{x:f_{GPS}(x)\geq\lambda\right\}. \qquad (17)$$

Because the level sets $\mathcal{L}^{*}_{\lambda}$ are nested as $\lambda$ varies, $Q_{\rho}$ can be recovered by varying $\lambda$ until exactly a $\rho$ amount of probability lies within the level set.

The set $Q_{\rho}$ has a nice interpretation: it is the smallest region in which the individual spends at least a proportion $\rho$ of their time. As a concrete example, $Q_{0.9}$ is where the individual spends 90% of their time on average. Thus, $Q_{\rho}$ can serve as a statistical definition of the activity space. Interestingly, $Q_{\rho}$ can also be interpreted as a prediction region for the individual’s location.

5.2.1 Estimating the activity space

Estimating $Q_{\rho}$ from the data can be done easily via a plug-in approach. A straightforward idea is to use

$$\widehat{Q}^{*}_{\rho}=\min_{\lambda}\left\{\widehat{\mathcal{L}}^{*}_{\lambda}:\int_{x\in\widehat{\mathcal{L}}^{*}_{\lambda}}\widehat{f}(x)\,dx\geq\rho\right\},\qquad\widehat{\mathcal{L}}^{*}_{\lambda}=\left\{x:\widehat{f}(x)\geq\lambda\right\}.$$

While this estimator is valid, it may be expensive to compute because of the integral involved.

Here is an alternative approach to estimating $Q_{\rho}$ that avoids evaluating the integral. First, note that $f_{GPS}(x)\,dx=dF_{GPS}(x)$, where $F_{GPS}(x)=P(S(U)+\epsilon\leq x)$ is the CDF corresponding to $f_{GPS}$. When the timestamps are evenly spaced, $F_{GPS}$ can be estimated by the empirical distribution of the $X_{i,j}$. When the timestamps are not evenly spaced, we can weight each observation following the idea of the time-weighted estimator. In more detail, a time-weighted estimator of $F_{GPS}$ is

$$\widehat{F}_{w}(x)=\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{m_{i}}W^{\dagger}_{i,j}I(X_{i,j}\leq x),$$

where $W^{\dagger}_{i,j}=\frac{t_{i,j+1}-t_{i,j-1}}{2T_{i}}$ if we are using $\widehat{f}_{w}$, and

$$W^{\dagger}_{i,j}=\int_{0}^{1}\frac{K_{T}\left(\frac{d_{T}(t_{i,j},t)}{h_{T}}\right)}{\sum_{i^{\prime}=1}^{n}\frac{1}{m_{i^{\prime}}}\sum_{j^{\prime}=1}^{m_{i^{\prime}}}K_{T}\left(\frac{d_{T}(t_{i^{\prime},j^{\prime}},t)}{h_{T}}\right)}\,dt$$

if we are using $\widehat{f}_{c}$. $\widehat{F}_{w}(x)$ can be viewed as a time-weighted EDF. A feature of $\widehat{F}_{w}(x)$ is that, when integrating over any region $A$, we have

$$\int I(x\in A)\,d\widehat{F}_{w}(x)=\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{m_{i}}W^{\dagger}_{i,j}I(X_{i,j}\in A).$$

This is the time-weighted proportion of the $X_{i,j}$ inside the region $A$. With this, we obtain the estimator

$$\widehat{Q}_{\rho}=\min_{\lambda}\left\{\widehat{\mathcal{L}}^{*}_{\lambda}:\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{m_{i}}W^{\dagger}_{i,j}I(X_{i,j}\in\widehat{\mathcal{L}}^{*}_{\lambda})\geq\rho\right\},\qquad\widehat{\mathcal{L}}^{*}_{\lambda}=\left\{x:\widehat{f}(x)\geq\lambda\right\}; \qquad (18)$$

note that $\widehat{f}$ can be either $\widehat{f}_{c}$ or $\widehat{f}_{w}$. We provide a fast numerical method for computing $\widehat{Q}_{\rho}$ in Appendix A.2.
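A sketch of equation (18) that reuses `time_weights` and `f_w` from Section 3.1. Rather than scanning $\lambda$ directly, it reads the threshold off the time-weighted quantile of the estimated density values at the observations; this brute-force version has quadratic cost in the sample size (the paper’s Appendix A.2 gives a faster method).

```python
def activity_space_level(X, t_list, h, rho):
    """Return the density threshold lambda defining Q_hat_rho: the
    level whose upper level set captures a time-weighted proportion
    rho of the observations."""
    dens, wts = [], []
    for Xi, ti in zip(X, t_list):
        dens.append([f_w(x, X, t_list, h) for x in Xi])
        wts.append(time_weights(ti))
    dens = np.concatenate(dens)
    wts = np.concatenate(wts) / len(X)       # weights now sum to one
    order = np.argsort(dens)[::-1]           # scan from high density down
    cum = np.cumsum(wts[order])
    idx = min(np.searchsorted(cum, rho), len(cum) - 1)
    return dens[order][idx]

# Usage: lam = activity_space_level(X, [t]*n, h=0.05, rho=0.9);
# the estimated activity space is then {x : f_w(x, ...) >= lam}.
```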

5.3 Clustering of trajectories

In the SMM, the action vector $B$ is assumed to take a finite number of possible values. While the actual timing of each movement may vary a bit according to the random vector $Z$, resulting in slightly different trajectories for the same $B$, the main structure of the trajectory remains similar. This implies that the individual’s trajectories can be clustered into groups according to the action vector $B$.

While we do not directly observe $B$, we may use the estimated GPS density of each day as a noisy representation of $B$ and cluster days accordingly.

Here we describe a simple approach to clustering trajectories. Let

$$\widehat{f}_{c,i}(x)=\frac{1}{nh^{2}}\sum_{j=1}^{m_{i}}\widetilde{W}_{i,j}\cdot K\left(\frac{X_{i,j}-x}{h}\right)$$

be the estimated time-weighted GPS density of day $i$, where $\widetilde{W}_{i,j}$ is defined in equation (8). While we could use the pairwise distance $\int(\widehat{f}_{c,a}(x)-\widehat{f}_{c,b}(x))^{2}dx$ between days as a distance measure, we recommend a scaled log-density distance, i.e.,

$$D_{a,b}=\int\left[\log(\widehat{f}_{c,a}(x)+\xi)-\log(\widehat{f}_{c,b}(x)+\xi)\right]^{2}dx, \qquad (19)$$

for some small constant $\xi$ that stabilizes the log-density. Our empirical studies find that equation (19) works much better in practice because the GPS density is often highly skewed: when the individual is not moving, the density is highly concentrated at a single location.

With the distance matrix $D$, we can perform cluster analysis based on hierarchical clustering or spectral clustering (Von Luxburg, 2007) to obtain a cluster label for every day. We may also detect outlier activity by examining the distance matrix.
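A sketch of this pipeline, approximating the integral in equation (19) on a rectangular grid and applying single-linkage hierarchical clustering (the grid, `xi`, and the cluster count are illustrative; `day_densities` is a list of per-day density callables):

```python
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_days(day_densities, xs, ys, xi=1e-3, n_clusters=4):
    """Compute D_{a,b} of equation (19) on a grid of evenly spaced
    points, then cluster the days by single linkage."""
    grid = [np.array([a, b]) for a in xs for b in ys]
    logF = np.array([[np.log(f(x) + xi) for x in grid]
                     for f in day_densities])
    cell = (xs[1] - xs[0]) * (ys[1] - ys[0])          # grid cell area
    diff = logF[:, None, :] - logF[None, :, :]
    D = cell * np.sum(diff**2, axis=2)                # pairwise D_{a,b}
    m = len(day_densities)
    Z = linkage(D[np.triu_indices(m, k=1)], method="single")
    return fcluster(Z, t=n_clusters, criterion="maxclust")
```

The returned labels play the role of the $G_{i}$ introduced in the next subsection.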

5.3.1 Recovering dynamics after clustering

Once we have clustered the data, we obtain a label $G_{i}$ for every day, indicating which cluster the day belongs to. In the SMM, trajectories in the same cluster are likely to come from the same action vector $B$. If we focus on the data within a cluster, we may perform inference on anchor locations or the activity space (Sections 5.1 and 5.2) specifically for the corresponding action vector.

Under the SMM, trajectories in the same cluster are likely to share the same action vector $B$ (but have different temporal distributions $Z$). We therefore expect the conditional densities of days from the same cluster to be similar, and the average behavior of these conditional densities provides useful information about the latent action vector.

Let $G_{i}\in\{1,\ldots,M\}$ for $i=1,\ldots,n$ be the cluster labels. For a cluster $g\in\{1,\ldots,M\}$, we compute the cluster conditional GPS density

$$\widehat{f}_{g}(x|t)=\frac{\sum_{i:G_{i}=g}\frac{1}{m_{i}}\sum_{j=1}^{m_{i}}K\left(\frac{X_{i,j}-x}{h_{X}}\right)K_{T}\left(\frac{d_{T}(t_{i,j},t)}{h_{T}}\right)}{h_{X}^{2}\sum_{i^{\prime}:G_{i^{\prime}}=g}\frac{1}{m_{i^{\prime}}}\sum_{j^{\prime}=1}^{m_{i^{\prime}}}K_{T}\left(\frac{d_{T}(t_{i^{\prime},j^{\prime}},t)}{h_{T}}\right)}. \qquad (20)$$

The way the conditional distribution $\widehat{f}_{g}(x|t)$ changes over time $t$ informs us about the movement of the individual under this action vector.

Estimating the conditional center. Since trajectories in the same cluster are similar, we expect the cluster conditional GPS density $\widehat{f}_{g}(x|t)$ to be unimodal, with the mean of this density providing a good representation of its center. Thus, a simple way to track how this distribution moves over time is to study the movement of its mean location, which can be written as

$$\widehat{\mu}_{g}(t)=\frac{\sum_{i:G_{i}=g}\frac{1}{m_{i}}\sum_{j=1}^{m_{i}}X_{i,j}K_{T}\left(\frac{d_{T}(t_{i,j},t)}{h_{T}}\right)}{\sum_{i^{\prime}:G_{i^{\prime}}=g}\frac{1}{m_{i^{\prime}}}\sum_{j^{\prime}=1}^{m_{i^{\prime}}}K_{T}\left(\frac{d_{T}(t_{i^{\prime},j^{\prime}},t)}{h_{T}}\right)}\in\mathbb{R}^{2}. \qquad (21)$$

Note that this is essentially a kernel regression with two ‘outcome variables’ (the coordinates). For two time points $t_{1}<t_{2}$, the difference in the center $\widehat{\mu}_{g}(t_{2})-\widehat{\mu}_{g}(t_{1})$ can be interpreted as the average movement of the distribution $\widehat{f}_{g}(x|t)$ within the time interval $[t_{1},t_{2}]$. Thus, the function $\widehat{\mu}_{g}(t)$ describes the overall dynamics of the trajectories. Note that we may also use local polynomial regression (Fan, 2018) to estimate the velocity.
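A sketch of the conditional center in equation (21), reusing `d_T` from the Section 3.2 sketch and the same boxcar time kernel (again an assumption of ours, not the paper’s choice of kernel):

```python
def mu_g(t, X, t_list, labels, g, hT):
    """Conditional center mu_hat_g(t) of equation (21) for cluster g:
    a kernel regression of location on cyclic time over cluster days."""
    num, den = np.zeros(2), 0.0
    for Xi, ti, lab in zip(X, t_list, labels):
        if lab != g:
            continue
        KT = (d_T(ti, t) <= hT).astype(float)
        num += Xi.T @ KT / len(ti)
        den += KT.sum() / len(ti)
    return num / den   # undefined if no observations fall near time t
```

Evaluating `mu_g` on a grid of `t` values traces the average daily path of the cluster; differencing consecutive centers approximates the velocity.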

6 Real data analysis

We analyze GPS data recorded for a single individual from a longitudinal study of exposure to a cognitive-impairing substance. Participants in this study were recruited from several cities in a major metropolitan area in the US. The survey data collected comprise one month of GPS tracking with a GPS-enabled smartphone carried by each study participant. The study protocol required the smartphones to record point locations (latitude, longitude, timestamp) on the spatiotemporal trajectories followed by study participants every minute, using commercially available software. The study participant whose GPS locations are used to illustrate our proposed methodology was selected at random from the study cohort, which comprises more than 150 individuals with complete survey data. In order to prevent any disclosure of private, sensitive information related to this study participant, the latitude and longitude coordinates have been shifted and scaled, and the timestamps have been mapped to the unit interval.

Figure 4: Summary information for the example individual. The first two panels display the distribution of timestamps on weekdays versus weekends. The right panel displays the average GPS log-density.
Figure 5: Activity space of the example individual. The first panel shows the activity space across all days, the second panel uses only weekday data, and the third panel focuses on weekends. The 90% region covers 90% of the time that the individual spends in a day (on average).

The GPS data for the selected example individual consist of a total of 14,972 observations recorded over $n=30$ days, averaging approximately $m=500$ observations per day. Although the study protocol specified that a new GPS observation should be recorded every minute, the actual recorded timestamps are not regularly distributed over the entire observation window, as illustrated in the first two panels of Figure 4. Consequently, a naive estimation approach would be biased, so we should use either $\widehat{f}_{w}$ or $\widehat{f}_{c}$ for density estimation. We choose $\widehat{f}_{c}$ because it allows for conditional density estimation at any given time. The smoothing bandwidths are set to $h_{X}=0.005$ (longitude/latitude degrees) and $h_{T}=0.02$ (approximately 30 minutes). The third panel in Figure 4 displays the GPS log-density $\log(\widehat{f}_{c}(x)+1)$. Several high-density regions emerge, suggesting potential anchor locations for the individual.

Activity space analysis. Figure 5 presents the individual's activity space under three scenarios: all days, weekdays, and weekends. The contour region Q_{\rho} represents the area where the individual spends a proportion \rho of their time. For instance, Q_{0.99} (green) covers 99% of their time. More than 50% of the time is spent at home, since this includes nighttime hours. On weekdays (middle panel), 90% of the activities (orange contour) are concentrated at home or the office. To further analyze mobility, we separate the data into weekdays and weekends and examine hourly density variations.

Analysis of weekday patterns. Figure 6 depicts GPS density distributions across four weekday time windows: 1-2 AM (sleep time), 7-8 AM (morning rush hour), 12-1 PM (noon), and 5-6 PM (evening rush hour). The top row shows the logarithm of the hourly GPS density to mitigate skewness. The bottom row presents the activity space contours for 99% (green), 90% (orange), 70% (blue), and 50% (red) of the time. From Figure 6, we identify five likely anchor locations: one main living place (the highest-density region in the first column), two nearby locations the person may also visit at night (the two high-density regions at the bottom of the first column), and two office locations (identified in the third panel as high-density regions during midday). A further cluster analysis shows that there is a main living location (corresponding to the highest-density region at night) and an alternative living place close to a park in which the example individual sometimes walks. The second and fourth columns capture rush-hour mobility, illustrating connectivity between anchor locations.

Figure 6: The hourly GPS log-density and activity space of the example individual during weekdays. The top row displays the logarithm of the GPS density and the bottom row shows the corresponding activity space covering 99% (green), 90% (orange), 70% (blue), and 50% (red) of average time. We display the interval-specific GPS density of the individual during 1-2 AM (sleep time), 7-8 AM (morning rush hour), 12-1 PM (workplace/school), and 5-6 PM (evening rush hour).
Figure 7: The hourly GPS log-density and activity space of the example individual during the weekends.

Analysis of weekend patterns. Figure 7 shows the GPS density during weekends, revealing distinct mobility patterns compared to weekdays. Notably, the office locations (the bottom two spots in the third column of Figure 6) show no density. Additional high-density spots appear, likely representing leisure locations (e.g., parks, malls, or beaches). The fourth column indicates a frequent weekend location near home, which is a plaza containing restaurants and shops. Comparing Figures 6 and 7, we observe a more dispersed mobility pattern on weekends, with multiple high-density regions, whereas weekdays exhibit a concentrated movement pattern consistent with regular commuting behavior.

Figure 8: Cluster analysis of the weekday data. The first panel shows the dendrogram under single-linkage clustering. We use the threshold indicated by the red horizontal line to form 4 clusters. The third and fourth clusters each contain a single day, so we ignore them and focus on the first two clusters. The middle and right panels display the average GPS density within cluster 1 and cluster 2 and the corresponding conditional centers at different time points.

Cluster analysis on weekday mobility. Given the structured weekday patterns, we conduct a cluster analysis to further investigate mobility trends. The results are presented in Figure 8. For each day, we use the same smoothing bandwidths (h_{X} and h_{T}) and compute pairwise distances via the integrated squared difference between log-densities:

D_{ij}=\int\left[\log(\widehat{f}_{c,i}(x)+1)-\log(\widehat{f}_{c,j}(x)+1)\right]^{2}dx.

Applying single-linkage hierarchical clustering, the dendrogram (first panel of Figure 8) identifies four clusters, separated by the red horizontal line. Cluster 3 contains Day 11 exclusively. All GPS records on Day 11 are located at home, which corresponds to a wildfire event in the individual's region that forced them to remain indoors throughout the day; we therefore classify it as an outlier. Cluster 4 also consists of a single day (Day 20). On this day, the individual made an additional stop when returning home from the office, visiting an entertainment venue that was recorded only once in the entire dataset.
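For illustration, the distance matrix and the single-linkage dendrogram take only a few lines of R. This is a minimal sketch under the assumption that the log-densities \log(\widehat{f}_{c,i}+1) have been evaluated on a common spatial grid and stored as the rows of a hypothetical matrix logf, with cell denoting the area of one grid cell.

D  <- as.matrix(dist(logf))^2 * cell          # Riemann sum of the squared-difference integral
hc <- hclust(as.dist(D), method = "single")   # single-linkage hierarchical clustering
plot(hc)                                      # dendrogram (first panel of Figure 8)
cl <- cutree(hc, k = 4)                       # cut at the threshold to form 4 clusters
table(cl)                                     # clusters 3 and 4 are singleton days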

We focus on clusters 1 and 2, comparing their average GPS densities and conditional centers at six time points. Although both clusters represent commuting patterns, they exhibit distinct anchor locations around the workplace and different evening behaviors. In the lower part of the second and third panels of Figure 8, Cluster 1 shows two high-density regions in the evening, primarily at home and near the office, whereas Cluster 2 contains additional high-density areas representing an alternative place to stay overnight (to the right of the main office location) and a nearby park. In particular, on days in Cluster 2, the individual sometimes stays in the alternative home overnight after leaving the office rather than returning to the primary home, or visits the park in the evening (e.g., for a walk) after staying in the alternative home for about 2 hours, returning to the alternative home afterwards. By contrast, in Cluster 1, the individual tends to commute directly between home and office without these additional stops.

To highlight these differences, Figure 9 compares log-density estimates for four specific time intervals: 7-8 AM (morning rush hour), 5-6 PM (evening rush hour), 7-8 PM (post-work activities), and 9-10 PM (late evening). The top row (Cluster 1) exhibits only two high-density regions in the evening, while the bottom row (Cluster 2) shows four (home, office, park, and the alternative apartment). Cluster 2's pattern thus reflects extended or alternative routines after work, including returning to the alternative home or visiting the park near the workplace. The clustering analysis uncovers two distinct weekday mobility patterns. Cluster 1: the individual commutes directly between home and office. Cluster 2: after work, the individual either returns to the primary home or goes to the alternative home (sometimes with a visit to the park). This method effectively reveals hidden mobility structures in the data, demonstrating the utility of GPS density estimation in understanding individual movement behavior.

Figure 9: A comparison of the log-density between the two clusters at four specific time intervals: 7-8 AM, 5-6 PM, 7-8 PM, and 9-10 PM. In the morning rush hour, the two clusters show a similar pattern. In the evening (second through last columns), the two clusters show significantly different patterns.

7 Discussion and future directions

In this paper, we introduced a statistical framework for analyzing GPS data. The general framework presented in Section 2 applies to GPS data collected from various entities, extending beyond human mobility. For studying human activity patterns, we developed the SMM in Section 5. Despite its simplicity, the SMM effectively captures major movement patterns within a single day and enables meaningful analyses. Our framework lays the foundation for numerous future research directions.

Population study. While this paper primarily focuses on analyzing the activity patterns of a single individual, extending our framework to a population-level analysis remains an open challenge. One key difficulty lies in the variation of anchor locations across individuals, as each person has distinct home and workplace locations. To accommodate population-level studies, modifications to the SMM are necessary to generalize anchor locations and account for heterogeneous mobility patterns. Moreover, a population-level study introduces an additional asymptotic regime in which the number of individuals increases. Together with the two asymptotic regimes of the current framework (the number of days and the temporal resolution), we encounter a scenario with three asymptotic regimes, and the convergence rates and choices of tuning parameters will require adjustments.

Connections to functional data. Our framework transforms GPS data into a density map, establishing interesting connections with functional data analysis (Ramsay and Silverman, 2002; Wang et al., 2016). In particular, the daily average GPS density \widehat{f}_{GPS,i} for an individual can be viewed as independent and identically distributed (IID) 2D functional data under the assumption that timestamps are generated in an IID fashion. Existing functional data analysis techniques may therefore be applicable, offering new perspectives on GPS-based modeling. Notably, the methods introduced in Section 5 suggest potential links between structural assumptions in the SMM and those in functional data analysis. In addition, when the sampling frequency is high, the two coordinates in the GPS data can be separately viewed as two functional datasets with a common but irregular spacing (Gervini, 2009). Under this point of view, we may directly apply methods from functional data analysis to handle GPS data. This alternative perspective may lead to a different framework for analyzing GPS data; further exploration of these connections is an important research direction.

Limiting regime of measurement errors. Our current analysis considers asymptotic limits where the number of days n\rightarrow\infty and the temporal resolution |t_{i,j}-t_{i,j+1}|\rightarrow 0. Another intriguing limit involves the measurement noise level, where \mathbb{E}(\|\epsilon\|^{2})\rightarrow 0, reflecting advancements in GPS technology. However, as measurement errors vanish, the density function f_{GPS} becomes ill-defined. In this noiseless scenario, the distribution of a random trajectory point X^{*}=S(U) consists of a mixture of point masses (at anchor locations) and 1D distributions representing movement along trajectories. Addressing the challenges posed by singular measures (Chen and Dobra, 2020; Chen, 2019) in this regime is a crucial area for future investigation.

Alternative random trajectory models. While the SMM proposed in Section 5 provides practical utility and meaningful insights, it does not precisely model an individual’s movement. Specifically, the SMM assumes a constant velocity when an individual is in motion, leading to a step-function velocity profile with abrupt jumps at transitions between movement and stationary states. This assumption deviates from realistic human mobility patterns. Developing more sophisticated models that accurately capture velocity dynamics and movement variability remains an open problem for future research.

Topological data analysis. Topological data analysis (TDA; Wasserman, 2018; Chazal and Michel, 2021) is a collection of techniques for studying the shape of features in data. A common application of TDA is to study the morphology of a probability density function. TDA can be applied to GPS data in two possible ways. The first is to analyze the morphology of GPS density functions, which allows us to further understand the activity space. The second is to investigate the latent trajectories that generated the GPS data; the common and persistent features of trajectories inform us about important characteristics of the individual's activity patterns.

References

  • Andrade, T. and Gama, J. (2020). Identifying points of interest and similar individuals from raw GPS data. In Mobility Internet of Things 2018: Mobility IoT, pages 293–305. Springer.
  • Barnett, I. and Onnela, J.-P. (2020). Inferring mobility measures from GPS traces with missing data. Biostatistics, 21(2):e98–e112.
  • Burzacchi, A., Bellanger, L., Gall, K. L., Stamm, A., and Vantini, S. (2024). Generating synthetic functional data for privacy-preserving GPS trajectories. arXiv preprint arXiv:2410.12514.
  • Cadre, B. (2006). Kernel estimation of density level sets. Journal of Multivariate Analysis, 97(4):999–1023.
  • Chabert-Liddell, S.-C., Bez, N., Gloaguen, P., Donnet, S., and Mahévas, S. (2023). Auto-encoding GPS data to reveal individual and collective behaviour.
  • Chacón, J. E. (2015). A population background for nonparametric density-based clustering. Statistical Science, 30(4):518–532.
  • Chacón, J. E. and Duong, T. (2018). Multivariate Kernel Smoothing and Its Applications. Chapman and Hall/CRC.
  • Chacón, J. E., Duong, T., and Wand, M. P. (2011). Asymptotics for general multivariate kernel density derivative estimators. Statistica Sinica, 21(2):807–840.
  • Chazal, F. and Michel, B. (2021). An introduction to topological data analysis: fundamental and practical aspects for data scientists. Frontiers in Artificial Intelligence, 4:667963.
  • Chen, Y.-C. (2019). Generalized cluster trees and singular measures. The Annals of Statistics, 47(4):2174–2203.
  • Chen, Y.-C. and Dobra, A. (2020). Measuring human activity spaces from GPS data with density ranking and summary curves. The Annals of Applied Statistics, 14(1):409–432.
  • Chen, Y.-C., Genovese, C. R., and Wasserman, L. (2017). Density level sets: Asymptotics, inference, and visualization. Journal of the American Statistical Association, 112(520):1684–1696.
  • Duong, T. (2024). Using iterated local alignment to aggregate trajectory data into a traffic flow map. arXiv preprint arXiv:2406.17500.
  • Early, J. J. and Sykulski, A. M. (2020). Smoothing and interpolating noisy GPS data with smoothing splines. Journal of Atmospheric and Oceanic Technology, 37(3):449–465.
  • Elmasri, M., Labbe, A., Larocque, D., and Charlin, L. (2023). Predictive inference for travel time on transportation networks. The Annals of Applied Statistics, 17(4):2796–2820.
  • Fan, J. (2018). Local Polynomial Modelling and Its Applications: Monographs on Statistics and Applied Probability 66. Routledge.
  • Gervini, D. (2009). Detecting and handling outlying trajectories in irregularly sampled functional datasets. The Annals of Applied Statistics, pages 1758–1775.
  • Hägerstrand, T. (1963). Geographic measurements of migration. In Sutter, J., editor, Human Displacements: Measurement Methodological Aspects. Monaco.
  • Hägerstrand, T. (1970). What about people in regional science? Papers of the Regional Science Association, 24:7–21.
  • Huang, Y., Abdelhalim, A., Stewart, A., Zhao, J., and Koutsopoulos, H. (2023). Reconstructing transit vehicle trajectory using high-resolution GPS data. In 2023 IEEE 26th International Conference on Intelligent Transportation Systems (ITSC), pages 5247–5253. IEEE.
  • Ley, C. and Verdebout, T. (2017). Modern Directional Statistics. Chapman and Hall/CRC.
  • Mao, J., Liu, L., Huang, H., Lu, W., Yang, K., Tang, T., and Shi, H. (2023). On the temporal-spatial analysis of estimating urban traffic patterns via GPS trace data of car-hailing vehicles. arXiv preprint arXiv:2306.07456.
  • Mardia, K. V. and Jupp, P. E. (2009). Directional Statistics. John Wiley & Sons.
  • Meilán-Vila, A., Francisco-Fernández, M., Crujeiras, R. M., and Panzera, A. (2021). Nonparametric multiple regression estimation for circular response. TEST, 30(3):650–672.
  • Newson, P. and Krumm, J. (2009). Hidden Markov map matching through noise and sparseness. In 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL GIS 2009), November 4–6, Seattle, WA, pages 336–343.
  • Nguyen, T. D., Pham Ngoc, T. M., and Rivoirard, V. (2023). Adaptive warped kernel estimation for nonparametric regression with circular responses. Electronic Journal of Statistics, 17(2):4011–4048.
  • Papoutsis, P., Fennia, S., Bridon, C., and Duong, T. (2021). Relaxing door-to-door matching reduces passenger waiting times: A workflow for the analysis of driver GPS traces in a stochastic carpooling service. Transportation Engineering, 4:100061.
  • Pewsey, A. and García-Portugués, E. (2021). Recent advances in directional statistics. TEST, 30(1):1–58.
  • Polonik, W. (1997). Minimum volume sets and generalized quantile processes. Stochastic Processes and their Applications, 69(1):1–24.
  • Ramsay, J. O. and Silverman, B. W. (2002). Applied Functional Data Analysis: Methods and Case Studies. Springer.
  • Schönfelder, S. and Axhausen, K. W. (2004). On the variability of human activity spaces. In Koll-Schretzenmayr, M., Keiner, M., and Nussbaumer, G., editors, The Real and Virtual Worlds of Spatial Planning, pages 237–262. Springer Verlag, Berlin, Heidelberg.
  • Scott, D. W. (2015). Multivariate Density Estimation: Theory, Practice, and Visualization. John Wiley & Sons.
  • Sherman, J. E., Spencer, J., Preisser, J. S., Gesler, W. M., and Arcury, T. A. (2005). A suite of methods for representing activity space in a healthcare accessibility study. International Journal of Health Geographics, 4:24.
  • Silverman, B. W. (2018). Density Estimation for Statistics and Data Analysis. Routledge.
  • Simon, L. (2014). Introduction to Geometric Measure Theory. Cambridge University Press.
  • Tsybakov, A. B. (1997). On nonparametric estimation of density level sets. The Annals of Statistics, 25(3):948–969.
  • Von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17:395–416.
  • Wang, J.-L., Chiou, J.-M., and Müller, H.-G. (2016). Functional data analysis. Annual Review of Statistics and Its Application, 3(1):257–295.
  • Wasserman, L. (2006). All of Nonparametric Statistics. Springer Science & Business Media.
  • Wasserman, L. (2018). Topological data analysis. Annual Review of Statistics and Its Application, 5(1):501–532.
  • Yang, Y., Jia, B., Yan, X.-Y., Li, J., Yang, Z., and Gao, Z. (2022). Identifying intercity freight trip ends of heavy trucks from GPS data. Transportation Research Part E: Logistics and Transportation Review, 157:102590.

SUPPLEMENTARY MATERIAL

Numerical techniques (Section A):

We provide some useful numerical techniques for computing our estimators.

Simulations (Section B):

We provide a simulation study to investigate the performance of our method.

Assumptions (Section C):

All technical assumptions for the asymptotic theory are included in this section.

Proofs (Section D):

The proofs of the theoretical results are given in this section.

Data and R scripts:

See the online supplementary file.

Appendix A Numerical techniques

A.1 Conditional estimator as a weighted estimator

Recall that the estimator \widehat{f}_{c}(x) in equation (8) is

\widehat{f}_{c}(x)=\int_{0}^{1}\widehat{f}(x|t)dt=\frac{1}{nh_{X}^{2}}\sum_{i=1}^{n}\sum_{j=1}^{m_{i}}\widetilde{W}_{i,j}\cdot K\left(\frac{X_{i,j}-x}{h_{X}}\right),
\widetilde{W}_{i,j}=\frac{n}{m_{i}}\int_{0}^{1}\frac{K_{T}\left(\frac{d_{T}(t_{i,j},t)}{h_{T}}\right)}{\sum_{i^{\prime}=1}^{n}\frac{1}{m_{i^{\prime}}}\sum_{j^{\prime}=1}^{m_{i^{\prime}}}K_{T}\left(\frac{d_{T}(t_{i^{\prime},j^{\prime}},t)}{h_{T}}\right)}dt

and the conditional PDF in equation (7),

\widehat{f}(x|t)=\frac{\sum_{i=1}^{n}\frac{1}{m_{i}}\sum_{j=1}^{m_{i}}K\left(\frac{X_{i,j}-x}{h_{X}}\right)K_{T}\left(\frac{d_{T}(t_{i,j},t)}{h_{T}}\right)}{h_{X}^{2}\cdot\sum_{i^{\prime}=1}^{n}\frac{1}{m_{i^{\prime}}}\sum_{j^{\prime}=1}^{m_{i^{\prime}}}K_{T}\left(\frac{d_{T}(t_{i^{\prime},j^{\prime}},t)}{h_{T}}\right)}=\frac{1}{nh_{X}^{2}}\sum_{i=1}^{n}\sum_{j=1}^{m_{i}}\omega_{i,j}(t)\cdot K\left(\frac{X_{i,j}-x}{h_{X}}\right),

where

\omega_{i,j}(t)=\frac{n}{m_{i}}\cdot\frac{K_{T}\left(\frac{d_{T}(t_{i,j},t)}{h_{T}}\right)}{\sum_{i^{\prime}=1}^{n}\frac{1}{m_{i^{\prime}}}\sum_{j^{\prime}=1}^{m_{i^{\prime}}}K_{T}\left(\frac{d_{T}(t_{i^{\prime},j^{\prime}},t)}{h_{T}}\right)}

is a local weight at time t. Thus, both estimators are essentially weighted 2D KDEs with different weights, which allows us to quickly compute them using existing kernel smoothing libraries such as the ks library in R.

Moreover, the weights are connected by the integral

\widetilde{W}_{i,j}=\int_{0}^{1}\omega_{i,j}(t)dt.

If we are interested only in a time interval A\subset[0,1], we just need to adjust the weight to

\widetilde{W}_{A,i,j}=\int_{A}\omega_{i,j}(t)dt

and rerun the weighted 2D KDE. In practice, we approximate the integral numerically on a dense grid of [0,1]; computing the weights \widetilde{W}_{A,i,j} then only requires the grid points inside the interval A, as the following sketch illustrates.
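The sketch below is a minimal R implementation of this recipe, not the exact code behind the paper's results. It assumes d_{T} is the circular distance on [0,1], K_{T} is Gaussian, and that X (an N\times 2 location matrix), tt (timestamps), day (the day index of each row), m (observations per day), n, h_{X}, and h_{T} are available; ks rescales the weights to sum to the sample size.

library(ks)

dT <- function(s, t) pmin(abs(s - t), 1 - abs(s - t))   # circular time distance (assumed)
KT <- function(u) dnorm(u)                              # Gaussian temporal kernel (assumed)

tgrid <- seq(0, 1, length.out = 512)                    # dense grid for the time integral
inA   <- tgrid >= 8/24 & tgrid <= 10/24                 # e.g., A = 8-10 AM

Kmat  <- KT(outer(tt, tgrid, dT) / hT) / m[day]         # (1/m_i) K_T terms, an N x 512 matrix
den   <- colSums(Kmat)                                  # denominator at each grid time
omega <- n * sweep(Kmat, 2, den, "/")                   # local weights omega_{i,j}(t) on the grid
W_A   <- rowSums(omega[, inA]) * (tgrid[2] - tgrid[1])  # numerical integral of omega over A

fhat_A <- kde(x = X, H = diag(hX^2, 2), w = W_A)        # weighted 2D KDE
plot(fhat_A)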

A.2 A fast algorithm for the activity space

Recall that the \rho-activity space is

\widehat{Q}_{\rho}=\min_{\lambda}\left\{\widehat{\mathcal{L}}^{*}_{\lambda}:\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{m_{i}}W^{\dagger}_{i,j}I(X_{i,j}\in\widehat{\mathcal{L}}^{*}_{\lambda})\geq\rho\right\},\qquad\widehat{\mathcal{L}}^{*}_{\lambda}=\left\{x:\widehat{f}(x)\geq\lambda\right\}.

Numerically, the minimization in \widehat{Q}_{\rho} is not needed. The time-weighted EDF \widehat{F}_{w}(x) is a distribution function that places probability mass \frac{1}{n}W^{\dagger}_{i,j} on X_{i,j}. Therefore, the summation \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{m_{i}}W^{\dagger}_{i,j}I(X_{i,j}\in\widehat{\mathcal{L}}^{*}_{\lambda}) changes only when varying \lambda includes or excludes an observation X_{i,j}. Using this insight, let \widehat{p}_{i,j}=\widehat{f}(X_{i,j}) be the estimated density at X_{i,j}. We then order all observations (across every day) according to the estimated density evaluated at each point,

\widehat{p}_{(1)}<\widehat{p}_{(2)}<\cdots<\widehat{p}_{(N_{n})},

and let (X_{(1)},W^{\dagger}_{(1)}),\ldots,(X_{(N_{n})},W^{\dagger}_{(N_{n})}) be the corresponding locations and weights from (X_{i,j},W^{\dagger}_{i,j}). Note that N_{n}=\sum_{i=1}^{n}m_{i} is the total number of observations.

In this way, if we choose \lambda=p_{(k)}, the level set \widehat{\mathcal{L}}^{*}_{p_{(k)}} will cover all observations with indices (k),(k+1),\ldots,(N_{n}), so the proportion \frac{1}{n}\left(W^{\dagger}_{(k)}+W^{\dagger}_{(k+1)}+\cdots+W^{\dagger}_{(N_{n})}\right) is covered inside \widehat{\mathcal{L}}^{*}_{p_{(k)}}. Note that the division by n accounts for the summation over n days. As a result, for a given proportion \rho, we only need to find the largest index k^{*}_{\rho} such that the upper cumulative sum of the weights is above \rho:

k^{*}_{\rho}=\max\left\{k:\frac{1}{n}\sum_{\ell=k}^{N_{n}}W^{\dagger}_{(\ell)}\geq\rho\right\}. (22)

With this, the choice \lambda=p_{(k^{*}_{\rho})} leads to the set

\widehat{\mathcal{L}}^{*}_{p_{(k^{*}_{\rho})}}=\widehat{Q}_{\rho}. (23)
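In R, this ordering argument takes only a few lines. The following is a minimal sketch assuming p holds the estimated densities \widehat{p}_{i,j} at all N_{n} observations, W the matching time weights W^{\dagger}_{i,j}, n the number of days, and fhat a function evaluating the fitted density (all names are ours).

ord <- order(p)                        # sort observations by estimated density
Wup <- rev(cumsum(rev(W[ord]))) / n    # upper cumulative sums (1/n) sum_{l >= k} W_(l)
k   <- max(which(Wup >= rho))          # k*_rho as in equation (22)
lam <- p[ord][k]                       # density threshold for the rho-activity space
inQ <- function(x) fhat(x) >= lam      # membership in Q_rho, as in equation (23)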

Appendix B Simulation

B.1 Design of simulations

Pattern   | Probability | Expected duration (hours) | \eta | q    | Actions
Pattern 1 | 15/28       | 9    | 0.15 | 0.5  | At home
          |             | 0.5  | 0.08 | 0.25 | Home to office
          |             | 8    | 0.15 | 0.5  | At office
          |             | 0.6  | 0.08 | 0.25 | Office to home
          |             | 2    | 0.2  | 0.6  | At home
          |             | 1    | 0.06 | 0.15 | Home to park to home
          |             | 2.9  |      |      | At home
Pattern 2 | 5/28        | 8.5  | 0.15 | 0.5  | At home
          |             | 0.5  | 0.08 | 0.25 | Home to office
          |             | 8    | 0.15 | 0.5  | At office
          |             | 0.75 | 0.08 | 0.25 | Office to restaurant
          |             | 1    | 0.08 | 0.25 | At restaurant
          |             | 0.4  | 0.08 | 0.25 | Restaurant to home
          |             | 0.95 | 0.1  | 0.3  | At home
          |             | 1    | 0.06 | 0.15 | Home to park to home
          |             | 2.9  |      |      | At home
Pattern 3 | 4/28        | 11   | 0.3  | 1    | At home
          |             | 0.75 | 0.15 | 0.45 | Home to supermarket
          |             | 2.5  | 0.3  | 1    | At supermarket
          |             | 0.75 | 0.15 | 0.45 | Supermarket to home
          |             | 9    |      |      | At home
Pattern 4 | 1/28        | 10   | 0.3  | 1    | At home
          |             | 0.8  | 0.3  | 0.7  | Home to beach
          |             | 5.7  | 0.35 | 1    | At beach
          |             | 0.8  | 0.3  | 0.7  | Beach to home
          |             | 6.7  |      |      | At home
Pattern 5 | 3/28        | 24   |      |      | At home
Table 1: The SMM used for simulation, comprising a total of five daily activity patterns. Each location's duration follows a truncated normal distribution with mean (expected duration), standard deviation \eta, and half-width q. Times shown are in hours. The final row of each pattern (where \eta and q are blank) absorbs any remaining time up to 24 hours.

To evaluate the performance of the proposed activity density estimators, we construct a simplified scenario involving a single fictitious individual residing in a small-scale world. This world contains five anchor points (home, office, restaurant, beach, and supermarket) connected by several routes (Figure 3, first panel).

Activity patterns.

We design five distinct daily activity patterns for this individual, illustrated in the 2nd to 6th panels of Figure 3. Specific details, including the order of visits, the expected time spent at each location, and the probability of selecting each pattern, are summarized in Table 1. Each day, the individual randomly chooses one of these five patterns.

For each location in a chosen pattern, the duration is drawn from a truncated normal distribution:

z_{i}\mid b\sim\mathcal{N}\bigl{(}\mu_{b,i},\,\eta_{b,i}^{2}\bigr{)}\quad\text{truncated to}\quad\bigl{(}\mu_{b,i}-q_{b,i},\;\mu_{b,i}+q_{b,i}\bigr{)},

where \mu_{b,i} and \eta_{b,i} are the mean and standard deviation, and q_{b,i}\in[0,1] sets a reasonable range for the duration. The duration of the final location in each pattern is whatever time remains in the 24-hour day once all previous locations have been visited.
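Such draws are easy to generate by inverting the normal CDF; the following base-R sketch is one way to do it (the function name is ours, not from the paper's scripts).

# Draw n durations from N(mu, eta^2) truncated to (mu - q, mu + q),
# via the inverse-CDF method.
rtrunc_norm <- function(n, mu, eta, q) {
  lo <- pnorm(mu - q, mean = mu, sd = eta)   # CDF at the lower truncation point
  hi <- pnorm(mu + q, mean = mu, sd = eta)   # CDF at the upper truncation point
  qnorm(runif(n, lo, hi), mean = mu, sd = eta)
}

# Example: the first stop of Pattern 1 in Table 1.
rtrunc_norm(1, mu = 9, eta = 0.15, q = 0.5)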

There are two reasons we use a truncated Gaussian distribution for the duration at each position:

  • First, controlling variance: by adjusting \eta_{b,i}, we can reflect realistic scenarios; for example, weekday departures from home tend to be less variable than weekend departures for a beach visit. Hence, we set the variance of the duration of the first position higher on weekends.

  • Second, ensuring a reasonable time distribution: truncation avoids unrealistic visit durations for some places (for example, a restaurant stay should not exceed typical operating hours).

Our ultimate goal is to estimate f_{GPS} and f_{GPS,A} over the time interval A=\bigl{[}\tfrac{8}{24},\tfrac{10}{24}\bigr{]}, which corresponds to the morning rush hour between 8 AM and 10 AM.

Timestamp generation. For simplicity, we assume that every day has an equal number of observations, i.e., m_{i}=m, and we choose (n,m) as

n\in\{7,30,90\},\qquad m\in\{159,479,1439\}.

The choices of n reflect one week, one month, and three months of GPS observation data, while the choices of m correspond to one observation every 9 minutes, 3 minutes, or 1 minute on average.

To investigate estimator performance under both regular and irregular frequencies of observations, we generate timestamps in two different ways:

  1. Even-spacing: In each simulated day, the m timestamps are evenly spaced, so the GPS observations are recorded every \tfrac{1440}{m+1} minutes.

  2. Realistic: To generate non-uniform timestamps that align with realistic recording, we use the real GPS data described in Section 6 in the following procedure (a sketch of step (c) is given after this list):

    (a) Randomly select one individual from the real dataset.

    (b) For each simulated day, randomly choose one day from that individual.

    (c) If the chosen day has fewer than m real timestamps, generate the additional timestamps by sampling from a Gaussian kernel density estimate of that day's timestamp distribution (using Silverman's rule of thumb for the bandwidth). If the chosen day has more than m real timestamps, randomly remove timestamps until m remain, with each observation equally likely to be removed.

    This process yields m timestamps per simulated day and preserves the day-to-day skew and clustering patterns of timestamps found in the real data, whose timestamp distributions can be highly skewed. Figure 4 illustrates an example of a highly skewed timestamp distribution for one individual.
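A minimal R sketch of step (c), assuming ts holds the selected day's timestamps on [0,1] (the helper name is ours): sampling from a Gaussian KDE amounts to resampling the data and adding Gaussian noise with standard deviation equal to the bandwidth; we additionally clip padded draws to [0,1].

make_timestamps <- function(ts, m) {
  if (length(ts) < m) {
    bw    <- bw.nrd0(ts)                       # Silverman's rule-of-thumb bandwidth
    extra <- sample(ts, m - length(ts), replace = TRUE) +
             rnorm(m - length(ts), sd = bw)    # one draw from the Gaussian KDE per point
    ts <- c(ts, pmin(pmax(extra, 0), 1))       # keep padded draws inside [0,1] (our choice)
  } else if (length(ts) > m) {
    ts <- ts[sample(seq_along(ts), m)]         # uniform random thinning
  }
  sort(ts)
}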

Measurement errors. We introduce measurement errors by drawing from a Gaussian noise distribution with \sigma=0.1 or \sigma=0.2. The corresponding \log(f_{GPS}+\xi) is shown in Figure 10, where \xi is set to 0.0001. Note that we use a log-scale because the anchor locations (home and office) have a very high f_{GPS}, which would otherwise dominate the scale and obscure the overall pattern. The top panels display \log(f_{GPS}+\xi), while the bottom panels focus on the density during the morning rush hour (8 AM to 10 AM).

Compared with the real-data analysis in Section 6, here we choose a smaller \xi. This adjustment reflects the scale of the coordinates: density values are lower when the data span a larger spatial region. The smaller \xi ensures that low-frequency or low-duration regions, such as the route between home and the beach, remain distinguishable from truly unvisited areas in both visualization and downstream analyses such as clustering. As a result, in the subsequent clustering analyses, the reduced \xi value helps separate visited from unvisited locations more clearly, thereby improving the distinctiveness among different mobility patterns.

Figure 10: The log-density maps under measurement error levels \sigma=0.1 (left column) and \sigma=0.2 (right column), showing \log(f_{GPS}+\xi) in the top row and \log(f_{GPS,A}+\xi) in the bottom row, where A represents the interval from 8 AM to 10 AM. In each panel, \xi is set to 0.0001.

Bandwidth selection. After preliminary experiments using the mean integrated squared error (MISE) as a performance measure, we found the following reference rule for the spatial and temporal smoothing bandwidths:

h_{X}=0.065\,\biggl{(}\frac{\sqrt{s_{x,1}^{2}+s_{x,2}^{2}}}{\sum_{i=1}^{n}m_{i}}\biggr{)}^{\tfrac{1}{6}},\qquad h_{T}=0.05\,\biggl{(}\frac{n}{\sum_{i=1}^{n}m_{i}}\biggr{)}^{\tfrac{1}{3}},

where

s_{x,l}=\sqrt{\sum_{i=1}^{n}\sum_{j=1}^{m_{i}}w_{i,j}\bigl{(}x_{i,j,l}-\mu_{l}\bigr{)}^{2}},\quad l=1,2,\qquad w_{i,j}=\frac{t_{i,j+1}-t_{i,j-1}}{2n},\qquad\mu_{l}=\sum_{i=1}^{n}\sum_{j=1}^{m_{i}}w_{i,j}\,x_{i,j,l}.

Here, (x_{i,j,1},x_{i,j,2}) denotes the spatial coordinates of the j-th observation on day i, and t_{i,j} is its (normalized) timestamp within day i. The weights w_{i,j} ensure proper normalization over the total number of days n and allow for variable sampling rates within a day.
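For concreteness, the reference rule can be computed as follows. This is a minimal sketch assuming X is the N\times 2 matrix of all locations, w the vector of time weights w_{i,j} (summing to one across all days), n the number of days, and N=\sum_{i}m_{i}.

mu <- colSums(X * w)                        # weighted spatial means (mu_1, mu_2)
sx <- sqrt(colSums(w * sweep(X, 2, mu)^2))  # weighted sds s_{x,1}, s_{x,2}
hX <- 0.065 * (sqrt(sum(sx^2)) / N)^(1/6)   # spatial bandwidth
hT <- 0.05  * (n / N)^(1/3)                 # temporal bandwidth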

Compared with standard bandwidth heuristics such as Silverman's rule of thumb for multivariate kernels (Chacón et al., 2011), our scale constants are noticeably smaller. This discrepancy arises because Silverman's rule is often derived under assumptions of (approximately) unimodal or smooth data distributions. In contrast, our simulated environment exhibits strongly localized activity peaks due to prolonged stays at a small number of anchor points (e.g., home and office). As a result, the density near these anchor points is extremely high compared with other regions. A smaller bandwidth helps avoid oversmoothing these sharp peaks, thereby reducing bias and improving the local accuracy of the estimated density. From a MISE perspective, capturing these density spikes precisely is critical, since errors around densely occupied points can contribute disproportionately to the integrated error. Consequently, a smaller scale constant not only accommodates the multimodal nature of the data but also minimizes the MISE by enhancing resolution in areas with concentrated activity.

B.2 Results on the MISE

Estimating the average GPS density. For each combination of (n,m), we report the mean integrated squared error (MISE) over a 121\times 99 grid, where each cell has area 0.2\times 0.2, averaged over 100 repetitions. The results for estimating f_{GPS} appear in Tables 2 and 3.

Even-spacing |            \sigma=0.2            |            \sigma=0.1
n  | m    | \widehat{f}_{w} | \widehat{f}_{c} | \widehat{f}_{naive} | \widehat{f}_{w} | \widehat{f}_{c} | \widehat{f}_{naive}
7  | 159  | 0.0839 | 0.0812 | 0.0839 | 0.8571 | 0.8482 | 0.8571
7  | 479  | 0.0619 | 0.0599 | 0.0619 | 0.6426 | 0.6353 | 0.6426
7  | 1439 | 0.0456 | 0.0438 | 0.0456 | 0.4442 | 0.4375 | 0.4442
30 | 159  | 0.0473 | 0.0450 | 0.0473 | 0.5967 | 0.5874 | 0.5967
30 | 479  | 0.0359 | 0.0341 | 0.0359 | 0.4321 | 0.4249 | 0.4321
30 | 1439 | 0.0209 | 0.0197 | 0.0209 | 0.2680 | 0.2623 | 0.2680
90 | 159  | 0.0368 | 0.0344 | 0.0368 | 0.4592 | 0.4495 | 0.4592
90 | 479  | 0.0225 | 0.0210 | 0.0225 | 0.3109 | 0.3041 | 0.3109
90 | 1439 | 0.0113 | 0.0105 | 0.0113 | 0.1729 | 0.1675 | 0.1729
Table 2: The MISE of the three methods for estimating the daily density under evenly spaced timestamps.
Realistic    |            \sigma=0.2            |            \sigma=0.1
n  | m    | \widehat{f}_{w} | \widehat{f}_{c} | \widehat{f}_{naive} | \widehat{f}_{w} | \widehat{f}_{c} | \widehat{f}_{naive}
7  | 159  | 0.1130 | 0.0844 | 0.1457 | 0.9321 | 0.8747 | 1.0424
7  | 479  | 0.0970 | 0.0622 | 0.1192 | 0.7135 | 0.6473 | 0.8190
7  | 1439 | 0.0925 | 0.0489 | 0.1017 | 0.5616 | 0.4736 | 0.6420
30 | 159  | 0.0708 | 0.0472 | 0.0934 | 0.6512 | 0.5873 | 0.7415
30 | 479  | 0.0575 | 0.0325 | 0.0787 | 0.4859 | 0.4143 | 0.5773
30 | 1439 | 0.0462 | 0.0219 | 0.0642 | 0.3490 | 0.2801 | 0.4359
90 | 159  | 0.0562 | 0.0337 | 0.0778 | 0.5228 | 0.4529 | 0.6108
90 | 479  | 0.0415 | 0.0224 | 0.0641 | 0.3734 | 0.3119 | 0.4677
90 | 1439 | 0.0286 | 0.0106 | 0.0432 | 0.2358 | 0.1751 | 0.3172
Table 3: The MISE of the three methods for estimating the daily density under realistic timestamp distributions.

In Table 2 (even spacing), we see that all methods work well, with the conditional method \widehat{f}_{c} performing slightly better than the others, and the weighted method \widehat{f}_{w} having almost the same MISE as the naive method \widehat{f}_{naive}. Because timestamps are recorded at uniform intervals, each observation naturally represents a comparable portion of the day, which reduces the penalty from ignoring time differences. A similar conclusion holds if timestamps are generated uniformly on [0,1].

In contrast, when timestamps are drawn from the real dataset, the results change significantly. Real-world GPS records are often non-uniform in time, so the naive estimator, which assumes each data point is equally representative, suffers from a pronounced bias. This effect is especially evident in the last row of Table 3, where \widehat{f}_{naive} exhibits a substantially higher MISE than either of the proposed estimators. Notably, both \widehat{f}_{w} and \widehat{f}_{c} handle irregular sampling much more effectively, resulting in significantly lower MISE.

Estimating the interval-specific GPS density. In addition to the average GPS densities, we also consider estimating the GPS density during the morning rush hour A=[\frac{8}{24},\frac{10}{24}] (8-10 AM). Tables 4 and 5 present the results for this interval-specific density.

The performance trends mirror those observed for the daily averages (Table 4): all three estimators remain consistent, and the conditional method \widehat{f}_{c} has a slight edge over the other two, while the weighted and naive methods show comparable MISE values.

Table 5 shows the MISE of the three estimators when timestamps are generated from the realistic distribution. As with the daily density estimation, the real (irregular) distribution of timestamps reveals a clear advantage for \widehat{f}_{c}. The naive approach again shows higher MISE, although in this specific morning interval the difference between \widehat{f}_{w} and \widehat{f}_{naive} is smaller than in the full-day scenario. This arises because the distribution of the real timestamps between 8 AM and 10 AM is not as severely skewed as the distribution over the whole day, so the advantage of \widehat{f}_{w} is somewhat diminished.

Overall, the results confirm that both f^w\widehat{f}_{w} and f^c\widehat{f}_{c} more robustly handle uneven timestamp distributions compared to the naive approach, and that f^c\widehat{f}_{c} typically offers the best performance in terms of MISE across a variety of sampling settings.

Even-spacing |            \sigma=0.2            |            \sigma=0.1
n  | m    | \widehat{f}_{w} | \widehat{f}_{c} | \widehat{f}_{naive} | \widehat{f}_{w} | \widehat{f}_{c} | \widehat{f}_{naive}
7  | 159  | 0.1802 | 0.1212 | 0.1794 | 1.2959 | 0.8483 | 1.2935
7  | 479  | 0.1287 | 0.0917 | 0.1281 | 0.9944 | 0.6345 | 0.9922
7  | 1439 | 0.1184 | 0.0789 | 0.1184 | 0.8261 | 0.4762 | 0.8253
30 | 159  | 0.0971 | 0.0676 | 0.0964 | 0.8640 | 0.5967 | 0.8615
30 | 479  | 0.0787 | 0.0548 | 0.0783 | 0.6974 | 0.4360 | 0.6955
30 | 1439 | 0.0498 | 0.0333 | 0.0497 | 0.4488 | 0.2693 | 0.4482
90 | 159  | 0.0777 | 0.0545 | 0.0772 | 0.7083 | 0.4752 | 0.7058
90 | 479  | 0.0490 | 0.0341 | 0.0489 | 0.4616 | 0.3144 | 0.4600
90 | 1439 | 0.0380 | 0.0204 | 0.0379 | 0.3093 | 0.1817 | 0.3089
Table 4: The MISE of the three methods for estimating the density during the rush hour (8-10 AM) under evenly spaced timestamps.
Realistic    |            \sigma=0.2            |            \sigma=0.1
n  | m    | \widehat{f}_{w} | \widehat{f}_{c} | \widehat{f}_{naive} | \widehat{f}_{w} | \widehat{f}_{c} | \widehat{f}_{naive}
7  | 159  | 0.1489 | 0.1099 | 0.1522 | 1.0886 | 0.8160 | 1.1085
7  | 479  | 0.1064 | 0.0812 | 0.1123 | 0.8289 | 0.5820 | 0.8495
7  | 1439 | 0.0909 | 0.0749 | 0.1006 | 0.6751 | 0.4664 | 0.7054
30 | 159  | 0.0894 | 0.0650 | 0.0902 | 0.7808 | 0.5702 | 0.7857
30 | 479  | 0.0637 | 0.0463 | 0.0651 | 0.5714 | 0.3954 | 0.5776
30 | 1439 | 0.0542 | 0.0424 | 0.0573 | 0.4103 | 0.3121 | 0.4225
90 | 159  | 0.0616 | 0.0482 | 0.0627 | 0.5631 | 0.4508 | 0.5680
90 | 479  | 0.0460 | 0.0357 | 0.0486 | 0.3906 | 0.3136 | 0.4013
90 | 1439 | 0.0286 | 0.0162 | 0.0304 | 0.2301 | 0.1587 | 0.2373
Table 5: The MISE of the three methods for estimating the density during the rush hour (8-10 AM) under the realistic timestamp distribution.

B.3 Applications based on the simple movement model

In our simulation setup, Patterns 1 and 2 correspond to typical weekday schedules (containing the office as an anchor point), whereas Patterns 3, 4, and 5 represent potential weekend routines (not containing the office as an anchor point). We separately analyze the weekday and weekend data using the framework of Section 5. Throughout this subsection, we fix the measurement error at \sigma=0.2 and assume evenly spaced timestamps with 479 GPS observations per day. We generate a total of 90 simulated days under these conditions.

B.3.1 Anchor location recovery

From Table 1, we identify three anchor locations for weekdays (home, office, restaurant) and another set of three for weekends (home, supermarket, beach). Across the entire 90 days, these yield five unique anchor points overall.

Using the method described in Section 5.1 with \lambda=0.0055, we recover all anchor locations accurately, whether we consider:

1. All 90 days combined,

2. Weekdays only (Patterns 1 and 2), or

3. Weekends only (Patterns 3, 4, and 5).

Figure 11 compares the true anchor points with the detected ones in each of these scenarios. Under a suitably chosen threshold, and assuming sufficient time spent at each anchor point, the technique in Section 5.1 locates the simulated individual’s anchor sites with high accuracy.

Figure 11: Detected anchor points versus true anchor points for the simulated individual. Left: All days combined. Center: Weekdays only. Right: Weekends only.

B.3.2 Analysis of activities

Figure 12 depicts the contour region Q_{\rho}, which encloses the spatial area where the individual spends a proportion \rho of their time. We show contours for \rho\in\{0.999,0.99,0.90,0.70,0.50\} in three scenarios:

  1. All days (left panel): Over 90 days, 90% of the individual's time (orange contour) is concentrated at home and the office, consistent with Patterns 1 and 2 being selected more frequently.

  2. Weekdays (middle panel): On weekdays alone, the same 90% region again focuses heavily on home and the office, indicating that most weekday hours are spent at these two anchor locations.

  3. Weekends (right panel): On weekends (Patterns 3 to 5), more than 90% of the time is spent at home, reflecting fewer trips away from the house (with occasional visits to the supermarket or beach).

Figure 12: Activity space of the simulated individual for all days (left), weekdays (middle), and weekends (right). Each panel shows contour regions for 99.9% (gray), 99% (green), 90% (orange), 70% (blue), and 50% (red) of total time spent.

B.3.3 Clustering of trajectories

Clustering of days. A key aspect of our study is to cluster the simulated individual's weekdays (Patterns 1 and 2) and weekend days (Patterns 3, 4, and 5) based on their GPS trajectories. To do so, we analyze the weekday and weekend observations separately.

For each day, we compute the smoothed conditional density \widehat{f}_{c} (using the bandwidths h_{X} and h_{T} determined earlier) and measure pairwise distances via the integrated squared difference between log-densities:

D_{ij}=\int\Bigl{[}\log\bigl{(}\widehat{f}_{c,i}(x)+\xi\bigr{)}-\log\bigl{(}\widehat{f}_{c,j}(x)+\xi\bigr{)}\Bigr{]}^{2}dx,

where \widehat{f}_{c,i} is the estimated conditional density for day i, and \widehat{f}_{c,j} for day j. As in the log-density visualization in Figure 10, \xi is set to 0.0001.

Figure 13 further illustrates the structure of the clusters by showing the re-ordered distance matrices for weekdays (left panel) and weekends (right panel). In both cases, we see a pronounced block structure, indicating clear separations between the different activity patterns measured based on our proposed density estimation method.

Figure 13: Re-ordered distance matrices by hierarchical clustering for weekday observations (left) and weekend observations (right). Each block corresponds to one of the simulated patterns, reflecting clear clustering boundaries.

We perform single-linkage hierarchical clustering, producing dendrograms of weekday and weekend data (Figure 14). After comparing with the known activity pattern for each day, we find that hierarchical clustering achieves 100% accurate day-level classifications, perfectly matching the original activity patterns used to generate the data.

Figure 14: Dendrograms for weekday observations (left) and weekend observations (right). Each branch corresponds to a distinct activity pattern.

Conditional density on clustered days. To further illustrate how the discovered clusters differ in their spatiotemporal patterns, we compare conditional densities at specific times of day.

Figure 15 shows the estimated conditional density at 6:00 PM for each weekday cluster (i.e., Patterns 1 and 2). The densities differ markedly: in Pattern 1, the individual leaves the office and goes directly home, whereas in Pattern 2, they stop at a restaurant (top-right anchor) before returning home.

Figure 15: Estimated conditional density at 6:00 PM for the two weekday clusters (Patterns 1 and 2). The cluster in the left panel shows a high density around home, while the right panel indicates a high density near the restaurant.

Figure 16 displays the conditional density at 12:00 PM for each weekend cluster. In Pattern 3, the individual is often near the supermarket around noon; in Pattern 4, they might be found at the beach; and in Pattern 5, they remain at home throughout the day.

Figure 16: Estimated conditional density at 12:00 PM for the weekend clusters (Patterns 3, 4, 5). From left to right: frequent midday location near the supermarket, near the beach, and at home.

Centers on each cluster. Given these day-level clusters, we also estimate a cluster-specific center \widehat{\mu}_{g}(t) representing the average location of each cluster at time t. Figure 17 shows the computed centers at six time points (from 6:00 AM to 9:00 PM) for each cluster. The first (top-left) panel shows the combined data (all days), and the remaining panels correspond to each of the five clusters.

Figure 17: Estimated centers at six times of day (6:00 AM to 9:00 PM) across all days (top-left panel) and for each of the five activity clusters (subsequent panels).

On weekdays, for four of the six time points (6:00 AM, 12:00 PM, 3:00 PM, and 9:00 PM), both clusters have roughly the same center (reflecting time spent at home or in the office). However, at 9:00 AM, the Cluster 1 center is closer to home, whereas the Cluster 2 center is closer to the office, consistent with the earlier weekday-morning departure under Pattern 2 in Table 1. At 6:00 PM, Cluster 1 centers near home, while Cluster 2 centers near the restaurant, mirroring the aforementioned difference in post-work behavior.

Each weekend pattern displays a distinct midday center, aligning with the anchor points for each pattern on weekends: supermarket for Pattern 3, beach for Pattern 4, and home for Pattern 5.

Appendix C Assumptions

Assumption 1 (Properties of the kernel function)

The 2-dimensional kernel function K(x) satisfies the following properties:

(a) K(x)\geq 0 for any x\in\mathbb{R}^{2};

(b) K(x) is symmetric, i.e., for any x_{1},x_{2}\in\mathbb{R}^{2} satisfying \|x_{1}\|_{2}=\|x_{2}\|_{2}, we have K(x_{1})=K(x_{2});

(c) \int K(x)dx=1;

(d) \int x^{T}xK(x)dx<\infty;

(e) K has the Lipschitz property: there exists a constant L_{K} such that, for any x_{1},x_{2}\in\mathbb{R}^{2},

|K(x_{1})-K(x_{2})|\leq L_{K}\|x_{1}-x_{2}\|.

Assumption 1 collects standard conditions on kernel functions (Wasserman, 2006). The only additional requirement is the Lipschitz condition (e), which is a very mild condition.

Assumption 2

Any random trajectory S(t) is Lipschitz, i.e., there exists a \bar{v}<\infty such that \|S(t_{1})-S(t_{2})\|\leq\bar{v}d_{T}(t_{1},t_{2}) for any t_{1},t_{2}.

The Lipschitz condition in Assumption 2 allows for the case where the individual suddenly starts moving. It also implies that the velocity of the individual is finite (bounded by the Lipschitz constant).

Assumption 3

The density of the measurement error f_{\epsilon} is bounded in the sense that \|f_{\epsilon}(\eta)\|, \|\nabla f_{\epsilon}(\eta)\|, and \|\nabla\nabla f_{\epsilon}(\eta)\|_{2} are uniformly bounded by a constant M_{\epsilon} for every \eta\in\mathbb{R}^{2}.

Assumption 1 is a common assumption for kernel density estimation, and most standard kernel functions satisfy it. When the distribution of the measurement error is fixed, Assumption 3 is automatically satisfied. By convolution theory, Assumption 3 implies that the GPS density f_{GPS} has a bounded second derivative.

Assumption 4

The kernel function K_{T} is a symmetric PDF with \int t^{2}K_{T}(t)dt<\infty and is Lipschitz.

Assumption 4 concerns the kernel function on time. It will be used when we analyze the asymptotic properties of the conditional estimator.

Appendix D Proofs

D.1 Proof of Proposition 1

Proof. By the definition of f_{GPS} in (1),

f_{GPS}(x)=\lim_{r\rightarrow 0}\frac{P(X^{*}\in B(x,r))}{\pi r^{2}}=\lim_{r\rightarrow 0}\frac{P(S(U)+\epsilon\in B(x,r))}{\pi r^{2}}.

Since U\sim{\sf Uniform}([0,1]), we have

\lim_{r\rightarrow 0}\frac{P(S(U)+\epsilon\in B(x,r))}{\pi r^{2}}=\lim_{r\rightarrow 0}\frac{\int P(S(U)+\epsilon\in B(x,r)|U=t)dt}{\pi r^{2}}=\lim_{r\rightarrow 0}\frac{\int_{t\in[0,1]}P(S(t)+\epsilon\in B(x,r))dt}{\pi r^{2}}=\lim_{r\rightarrow 0}\frac{\int_{t\in[0,1]}P(X^{*}_{t}\in B(x,r))dt}{\pi r^{2}}.

Then, by the dominated convergence theorem,

\lim_{r\rightarrow 0}\frac{\int_{t\in[0,1]}P(X^{*}_{t}\in B(x,r))dt}{\pi r^{2}}=\int_{t\in[0,1]}\lim_{r\rightarrow 0}\frac{P(X^{*}_{t}\in B(x,r))}{\pi r^{2}}dt=\int_{0}^{1}f_{GPS}(x|t)dt.

By the definition of f_{GPS,A} in (3),

f_{GPS,A}(x)=\lim_{r\rightarrow 0}\frac{P(X^{*}_{A}\in B(x,r))}{\pi r^{2}}=\lim_{r\rightarrow 0}\frac{P(S(U_{A})+\epsilon\in B(x,r))}{\pi r^{2}}.

Since U_{A}\sim{\sf Uniform}(A), by the same reasoning,

\lim_{r\rightarrow 0}\frac{P(S(U_{A})+\epsilon\in B(x,r))}{\pi r^{2}}=\lim_{r\rightarrow 0}\frac{\int_{t\in A}P(S(t)+\epsilon\in B(x,r))\frac{1}{|A|}dt}{\pi r^{2}}=\lim_{r\rightarrow 0}\frac{1}{|A|}\frac{\int_{t\in A}P(X^{*}_{t}\in B(x,r))dt}{\pi r^{2}}.

Similarly, by the dominated convergence theorem, we have

\lim_{r\rightarrow 0}\frac{1}{|A|}\frac{\int_{t\in A}P(X^{*}_{t}\in B(x,r))dt}{\pi r^{2}}=\int_{t\in A}\lim_{r\rightarrow 0}\frac{1}{|A|}\frac{P(X^{*}_{t}\in B(x,r))}{\pi r^{2}}dt=\frac{1}{|A|}\int_{A}f_{GPS}(x|t)dt.

\Box

D.2 Proof of Proposition 2

Proof. By the definition of f_{GPS} in (1),

f_{GPS}(a)=\lim_{r\rightarrow 0}\frac{P(X^{*}\in B(a,r))}{\pi r^{2}}=\lim_{r\rightarrow 0}\frac{P(S(U)+\epsilon\in B(a,r))}{\pi r^{2}}.

Since U\sim{\sf Uniform}([0,1]),

P(S(U)+\epsilon\in B(a,r))\geq P(S(U)+\epsilon\in B(a,r),S(U)=a)=P(S(U)+\epsilon\in B(a,r)|S(U)=a)P(S(U)=a)=P(S(U)=a)P(\epsilon\in B(0,r))\geq\rho_{a}P(\epsilon\in B(0,r)).

Note that

\lim_{r\rightarrow 0}\frac{P(\epsilon\in B(y,r))}{\pi r^{2}}=\frac{1}{2\pi\sigma^{2}}\exp\left\{-\frac{y^{T}y}{2\sigma^{2}}\right\}

for any y\in\mathbb{R}^{2}. Hence, \lim_{r\rightarrow 0}\frac{P(\epsilon\in B(0,r))}{\pi r^{2}}=\frac{1}{2\pi\sigma^{2}}. Then,

\lim_{r\rightarrow 0}\frac{P(S(U)+\epsilon\in B(a,r))}{\pi r^{2}}\geq\frac{\rho_{a}}{2\pi\sigma^{2}}.

For the regular GPS density f_{GPS}(x), we have

\int_{B(a,r)}f_{GPS}(x)dx=P(S(U)+\epsilon\in B(a,r))\geq P(S(U)=a,S(U)+\epsilon\in B(a,r)).

Moreover, the right-hand side of the above inequality can be further bounded by

P(S(U)=a,S(U)+\epsilon\in B(a,r))=P(S(U)+\epsilon\in B(a,r)|S(U)=a)P(S(U)=a)\geq\rho_{a}P(\epsilon\in B(0,r)|S(U)=a)=\rho_{a}F_{\chi^{2}_{2}}\left(\frac{r^{2}}{\sigma^{2}}\right).

Note that we use the fact that \epsilon is an isotropic Gaussian with variance \sigma^{2}, so

P(\epsilon\in B(0,r))=P(\|\epsilon\|^{2}\leq r^{2})=P\left(\frac{\|\epsilon\|^{2}}{\sigma^{2}}\leq\frac{r^{2}}{\sigma^{2}}\right)=F_{\chi^{2}_{2}}\left(\frac{r^{2}}{\sigma^{2}}\right).

\Box

D.3 Proof of Proposition 3

Proof. We can write \int_{C\oplus r}f_{GPS}(x)dx as

\int_{C\oplus r}f_{GPS}(x)dx=P(S(U)+\epsilon\in C\oplus r)\geq P(S(U)\in C,S(U)+\epsilon\in C\oplus r)\geq P(S(U)\in C,S(U)+\epsilon\in B(S(U),r))=P(S(U)\in C,\epsilon\in B(0,r))=P(S(U)\in C)P(\epsilon\in B(0,r))\geq\rho_{C}F_{\chi^{2}_{2}}\left(\frac{r^{2}}{\sigma^{2}}\right).

\Box

D.4 Proof of Proposition 4

Proof. A straightforward calculation shows that

G_{C}\leq\int_{C}f_{GPS}(x)dx=P(S(U)+\epsilon\in C)=P(S(U)+\epsilon\in C,\|\epsilon\|\leq r)+P(S(U)+\epsilon\in C,\|\epsilon\|>r)\leq P(S(U)+\epsilon\in C,\|\epsilon\|\leq r)+P(\|\epsilon\|>r)\leq P(S(U)+\epsilon\in C\,|\,\|\epsilon\|\leq r)P(\|\epsilon\|\leq r)+1-F_{\chi^{2}_{2}}\left(\frac{r^{2}}{\sigma^{2}}\right)\leq P(S(U)+\epsilon\in C\,|\,\|\epsilon\|\leq r)F_{\chi^{2}_{2}}\left(\frac{r^{2}}{\sigma^{2}}\right)+1-F_{\chi^{2}_{2}}\left(\frac{r^{2}}{\sigma^{2}}\right).

Given \|\epsilon\|\leq r, a necessary condition for S(U)+\epsilon\in C is that S(U)\in C\oplus r. Thus,

P(S(U)+\epsilon\in C\,|\,\|\epsilon\|\leq r)\leq P(S(U)\in C\oplus r).

Plugging this into the long inequality shows that

G_{C}\leq P(S(U)\in C\oplus r)F_{\chi^{2}_{2}}\left(\frac{r^{2}}{\sigma^{2}}\right)+1-F_{\chi^{2}_{2}}\left(\frac{r^{2}}{\sigma^{2}}\right),

which is the desired result after rearrangements.

\Box

D.5 Proof of Theorem 5

Let \widehat{f}_{w,i}(x) be the estimated density from the i-th day's observations, i.e.,

\widehat{f}_{w,i}(x)=\frac{1}{h^{2}}\sum_{j=1}^{m_{i}}\frac{(t_{i,j+1}-t_{i,j-1})}{2}K\left(\frac{x-X_{i,j}}{h}\right).

Since the latent trajectories across different days are IID, these estimators are IID across days whenever the timestamps \{t_{i,j}:j=1,\ldots,m_{i}\} are the same for every i. Since the time-weighted estimator can be written as the mean over days, i.e.,

\widehat{f}_{w}(x)=\frac{1}{n}\sum_{i=1}^{n}\widehat{f}_{w,i}(x),

we can investigate the bias of \widehat{f}_{w} by investigating the bias of \widehat{f}_{w,i} versus f_{GPS}. The following lemma will be useful for this purpose.

Lemma 8 (Riemannian integration approximation of GPS density)

Assume assumptions  1 to 3. For any i{1,,n}i\in\{1,...,n\} and a trajectory s𝒮s\in\mathcal{S}, there are constants Cv¯,MϵC_{\bar{v}},M_{\epsilon} such that Cv¯C_{\bar{v}} depends on the maximal velocity of ss and MϵM_{\epsilon} depends on the distribution of ϵ\epsilon with

\begin{align}
\mathbb{E}\left[\widehat{f}_{w,i}(x)\mid S_{i}=s\right]-\mathbb{E}\left[\frac{1}{h^{2}}K\left(\frac{x-X^{*}}{h}\right)\,\Big|\,S_{i}=s\right]\leq C_{\bar{v}}\left(\frac{1}{h^{3}}\sum_{j=0}^{m_{i}}(t_{i,j+1}-t_{i,j})^{2}\right)\tag{24}
\end{align}

for every $x\in\mathbb{R}^{2}$.

Proof [Proof of Lemma 8]. Since $S_{1},\ldots,S_{n}\sim_{\text{i.i.d.}}P_{S}$, without loss of generality we only need to derive (24) for $i=1$. For $i=1$, we can decompose the left-hand side of (24) into the following two terms:

\begin{align}
&\mathbb{E}\left[\widehat{f}_{w,1}(x)\mid S_{1}=s\right]-\mathbb{E}\left[\frac{1}{h^{2}}K\left(\frac{x-X^{*}}{h}\right)\,\Big|\,S_{1}=s\right]\tag{25}\\
&=\underbrace{\frac{1}{h^{2}}\sum_{j=1}^{m_{1}}\frac{(t_{1,j+1}-t_{1,j-1})}{2}\int K\left(\frac{x-s(t_{1,j})+\epsilon_{1,j}}{h}\right)dF_{\epsilon}(\epsilon_{1,j})}_{(I)}\nonumber\\
&\quad-\underbrace{\frac{1}{h^{2}}\int\int_{u\in[0,1]}K\left(\frac{x-s(u)+\nu}{h}\right)du\,dF_{\epsilon}(\nu)}_{(II)}.\nonumber
\end{align}

To simplify (25), we write $I_{h}(t_{1,j};s,x)=\int K\left(\frac{x-s(t_{1,j})+\epsilon_{1,j}}{h}\right)dF_{\epsilon}(\epsilon_{1,j})$, i.e., the expectation of $K\left(\frac{x-X_{1,j}}{h}\right)$ at time $t_{1,j}$ given the latent trajectory $S_{1}=s$.

Term (I): By $t_{1,j+1}-t_{1,j-1}=(t_{1,j+1}-t_{1,j})+(t_{1,j}-t_{1,j-1})=\int_{t_{1,j}}^{t_{1,j+1}}du+\int_{t_{1,j-1}}^{t_{1,j}}du$, term (I) in (25) can be written as

\begin{align*}
&\frac{1}{h^{2}}\sum_{j=1}^{m_{1}}\frac{(t_{1,j+1}-t_{1,j-1})}{2}I_{h}(t_{1,j};s,x)\\
&=\frac{1}{h^{2}}\frac{(t_{1,1}-t_{1,0})}{2}I_{h}(t_{1,1};s,x)+\frac{1}{h^{2}}\frac{(t_{1,m_{1}+1}-t_{1,m_{1}})}{2}I_{h}(t_{1,m_{1}};s,x)+\frac{1}{h^{2}}\sum_{j=1}^{m_{1}-1}\frac{(t_{1,j+1}-t_{1,j})}{2}I_{h}(t_{1,j};s,x).
\end{align*}

For the weights of the first and last observations, note that

\[
\frac{(t_{1,1}-t_{1,0})}{2}=\frac{(t_{1,m_{1}+1}-t_{1,m_{1}})}{2}=\frac{1-(t_{1,m_{1}}-t_{1,1})}{2}=\frac{1}{2}\left[\int_{0}^{t_{1,1}}1\,du+\int_{t_{1,m_{1}}}^{1}1\,du\right].
\]

We have

\begin{align}
&\frac{1}{h^{2}}\sum_{j=1}^{m_{1}}\frac{(t_{1,j+1}-t_{1,j-1})}{2}I_{h}(t_{1,j};s,x)\nonumber\\
&=\frac{1}{h^{2}}\sum_{j=1}^{m_{1}-1}\int_{t_{1,j}}^{t_{1,j+1}}I_{h}(t_{1,j};s,x)\,du+\frac{1}{2h^{2}}\left[\int_{0}^{t_{1,1}}I_{h}(t_{1,1};s,x)\,du+\int_{t_{1,m_{1}}}^{1}I_{h}(t_{1,1};s,x)\,du\right]\nonumber\\
&\quad+\frac{1}{2h^{2}}\left[\int_{0}^{t_{1,1}}I_{h}(t_{1,m_{1}};s,x)\,du+\int_{t_{1,m_{1}}}^{1}I_{h}(t_{1,m_{1}};s,x)\,du\right].\tag{26}
\end{align}

Term (II): Note that $I_{h}(u;s,x)=\int K\left(\frac{x-s(u)+\nu}{h}\right)dF_{\epsilon}(\nu)$; then term (II) in (25) can be decomposed as

\begin{align}
&\frac{1}{h^{2}}\int\int_{u\in[0,1]}K\left(\frac{x-s(u)+\nu}{h}\right)du\,dF_{\epsilon}(\nu)\nonumber\\
&=\frac{1}{h^{2}}\int_{u\in[0,1]}I_{h}(u;s,x)\,du\nonumber\\
&=\frac{1}{h^{2}}\left[\int_{0}^{t_{1,1}}I_{h}(u;s,x)\,du+\int_{t_{1,1}}^{t_{1,m_{1}}}I_{h}(u;s,x)\,du+\int_{t_{1,m_{1}}}^{1}I_{h}(u;s,x)\,du\right]\nonumber\\
&=\frac{1}{h^{2}}\sum_{j=1}^{m_{1}-1}\int_{t_{1,j}}^{t_{1,j+1}}I_{h}(u;s,x)\,du+\frac{1}{h^{2}}\left[\int_{0}^{t_{1,1}}I_{h}(u;s,x)\,du+\int_{t_{1,m_{1}}}^{1}I_{h}(u;s,x)\,du\right].\tag{27}
\end{align}

Namely, we decompose (II) over the intervals $[t_{1,j},t_{1,j+1}]$ so that it can be compared with term (I) interval by interval.

Combining (26) and (27), (25) can be written as:

\begin{align}
&\mathbb{E}\left[\widehat{f}_{w,1}(x)\mid S_{1}=s\right]-\mathbb{E}\left[\frac{1}{h^{2}}K\left(\frac{x-X^{*}}{h}\right)\,\Big|\,S_{1}=s\right]\nonumber\\
&=\underbrace{\frac{1}{h^{2}}\sum_{j=1}^{m_{1}-1}\int_{t_{1,j}}^{t_{1,j+1}}\left[I_{h}(t_{1,j};s,x)-I_{h}(u;s,x)\right]du}_{(III)}\nonumber\\
&\quad+\underbrace{\frac{1}{2h^{2}}\left[\int_{0}^{t_{1,1}}\left(I_{h}(t_{1,1};s,x)-I_{h}(u;s,x)\right)du+\int_{t_{1,m_{1}}}^{1}\left(I_{h}(t_{1,m_{1}};s,x)-I_{h}(u;s,x)\right)du\right]}_{(IV)}\nonumber\\
&\quad+\underbrace{\frac{1}{2h^{2}}\left[\int_{0}^{t_{1,1}}\left(I_{h}(t_{1,m_{1}};s,x)-I_{h}(u;s,x)\right)du+\int_{t_{1,m_{1}}}^{1}\left(I_{h}(t_{1,1};s,x)-I_{h}(u;s,x)\right)du\right]}_{(V)}\tag{28}
\end{align}

On the RHS of (28), term (III) is the error from approximating the time integral by the GPS observations, while terms (IV) and (V) arise from the circular weights assigned to the first and last observations of each day.

Before deriving the convergence rates of terms (III), (IV), and (V) in (28), note that, by Assumption 2 (velocity assumption) and Assumption 1 (Lipschitz property of the kernel function $K$), for every $j\in\{1,\ldots,m_{1}\}$ and $u\in[0,1]$, the difference $\left|K\left(\frac{x-s(t_{1,j})+\epsilon_{1,j}}{h}\right)-K\left(\frac{x-s(u)+\epsilon_{1,j}}{h}\right)\right|$ appearing in the analysis below can be bounded as follows:

\begin{align}
\left|K\left(\frac{x-s(t_{1,j})+\epsilon_{1,j}}{h}\right)-K\left(\frac{x-s(u)+\epsilon_{1,j}}{h}\right)\right|\leq L_{K}\frac{\left\|s(t_{1,j})-s(u)\right\|}{h}\leq\frac{L_{K}\bar{v}}{h}\,d_{T}(t_{1,j},u),\tag{29}
\end{align}

where $d_{T}(t_{1,j},u)=\min\left\{|t_{1,j}-u|,\,1-|t_{1,j}-u|\right\}$ is the circular distance on $[0,1]$.

Term (III): For the convergence rate of term (III) in (28), we find that

\begin{align}
&\sum_{j=1}^{m_{1}-1}\left[\int_{t_{1,j}}^{t_{1,j+1}}\left(I_{h}(u;s,x)-I_{h}(t_{1,j};s,x)\right)du\right]\tag{30}\\
&=\sum_{j=1}^{m_{1}-1}\int_{t_{1,j}}^{t_{1,j+1}}\int K\left(\frac{x-s(u)+\epsilon_{1,j}}{h}\right)dF_{\epsilon}(\epsilon_{1,j})\,du-\sum_{j=1}^{m_{1}-1}\int_{t_{1,j}}^{t_{1,j+1}}\int K\left(\frac{x-s(t_{1,j})+\epsilon_{1,j}}{h}\right)dF_{\epsilon}(\epsilon_{1,j})\,du\nonumber\\
&\leq\sum_{j=1}^{m_{1}-1}\int_{t_{1,j}}^{t_{1,j+1}}\int\left|K\left(\frac{x-s(t_{1,j})+\epsilon_{1,j}}{h}\right)-K\left(\frac{x-s(u)+\epsilon_{1,j}}{h}\right)\right|dF_{\epsilon}(\epsilon_{1,j})\,du.\nonumber
\end{align}

In (30), for every $j\in\{1,\ldots,m_{1}-1\}$ and $u\in[t_{1,j},t_{1,j+1}]$ in the corresponding integral of the summation, we have $d_{T}(u,t_{1,j})\leq t_{1,j+1}-t_{1,j}$. Hence, by applying (29), term (III) in (28) can be bounded by

\begin{align}
&\frac{1}{h^{2}}\sum_{j=1}^{m_{1}-1}\left[\int_{t_{1,j}}^{t_{1,j+1}}\left(I_{h}(u;s,x)-I_{h}(t_{1,j};s,x)\right)du\right]\nonumber\\
&\leq\frac{1}{h^{3}}\sum_{j=1}^{m_{1}-1}\int_{t_{1,j}}^{t_{1,j+1}}\int L_{K}\bar{v}\,d_{T}(t_{1,j},u)\,dF_{\epsilon}(\epsilon_{1,j})\,du\quad\quad\text{[by (29)]}\nonumber\\
&\leq\frac{1}{2h^{3}}\sum_{j=1}^{m_{1}-1}\left(t_{1,j+1}-t_{1,j}\right)^{2}L_{K}\bar{v}\quad\quad\text{[by }d_{T}(u,t_{1,j})\leq t_{1,j+1}-t_{1,j}]\nonumber\\
&\leq C_{\bar{v}}\,\frac{\sum_{j=1}^{m_{1}-1}\left(t_{1,j+1}-t_{1,j}\right)^{2}}{h^{3}},\tag{31}
\end{align}

for some constant $C_{\bar{v}}$.

Term (IV): For term (IV), we have

\begin{align}
&\frac{1}{2h^{2}}\left[\int_{0}^{t_{1,1}}\left(I_{h}(t_{1,1};s,x)-I_{h}(u;s,x)\right)du+\int_{t_{1,m_{1}}}^{1}\left(I_{h}(t_{1,m_{1}};s,x)-I_{h}(u;s,x)\right)du\right]\nonumber\\
&=\frac{1}{2h^{2}}\int_{0}^{t_{1,1}}\int\left[K\left(\frac{x-s(t_{1,1})+\epsilon_{1,1}}{h}\right)-K\left(\frac{x-s(u)+\epsilon_{1,1}}{h}\right)\right]dF_{\epsilon}(\epsilon_{1,1})\,du\nonumber\\
&\quad+\frac{1}{2h^{2}}\int_{t_{1,m_{1}}}^{1}\int\left[K\left(\frac{x-s(t_{1,m_{1}})+\epsilon_{1,m_{1}}}{h}\right)-K\left(\frac{x-s(u)+\epsilon_{1,m_{1}}}{h}\right)\right]dF_{\epsilon}(\epsilon_{1,m_{1}})\,du.\tag{32}
\end{align}

Since the two parts of (32) have the same form, we focus on the first term on the RHS of (32). Recall that $t_{1,0}=t_{1,m_{1}}-1$; then, for $u\in[0,t_{1,1}]$, $d_{T}(t_{1,1},u)\leq t_{1,1}\leq t_{1,1}-t_{1,0}$. Thus,

\begin{align}
&\frac{1}{2h^{2}}\int_{0}^{t_{1,1}}\int\left[K\left(\frac{x-s(t_{1,1})+\epsilon_{1,1}}{h}\right)-K\left(\frac{x-s(u)+\epsilon_{1,1}}{h}\right)\right]dF_{\epsilon}(\epsilon_{1,1})\,du\nonumber\\
&\leq\frac{1}{2h^{2}}\int_{0}^{t_{1,1}}\int\left|K\left(\frac{x-s(t_{1,1})+\epsilon_{1,1}}{h}\right)-K\left(\frac{x-s(u)+\epsilon_{1,1}}{h}\right)\right|dF_{\epsilon}(\epsilon_{1,1})\,du\nonumber\\
&\leq\frac{1}{2h^{3}}\int_{0}^{t_{1,1}}\int L_{K}\bar{v}\,d_{T}(t_{1,1},u)\,dF_{\epsilon}(\epsilon_{1,1})\,du\quad\quad\text{[by (29)]}\nonumber\\
&\leq C_{\bar{v}}\,\frac{\left(t_{1,1}-t_{1,0}\right)^{2}}{2h^{3}}.\tag{33}
\end{align}

Similarly, for the second term on the RHS of (32), we have

\begin{align}
\frac{1}{2h^{2}}\int_{t_{1,m_{1}}}^{1}\int\left[K\left(\frac{x-s(t_{1,m_{1}})+\epsilon_{1,m_{1}}}{h}\right)-K\left(\frac{x-s(u)+\epsilon_{1,m_{1}}}{h}\right)\right]dF_{\epsilon}(\epsilon_{1,m_{1}})\,du\leq C_{\bar{v}}\,\frac{\left(t_{1,m_{1}+1}-t_{1,m_{1}}\right)^{2}}{2h^{3}}.\tag{34}
\end{align}

Term (V): For term (V), we have

\begin{align}
&\frac{1}{2h^{2}}\left[\int_{0}^{t_{1,1}}\left(I_{h}(t_{1,m_{1}};s,x)-I_{h}(u;s,x)\right)du+\int_{t_{1,m_{1}}}^{1}\left(I_{h}(t_{1,1};s,x)-I_{h}(u;s,x)\right)du\right]\nonumber\\
&=\frac{1}{2h^{2}}\int_{0}^{t_{1,1}}\int\left[K\left(\frac{x-s(t_{1,m_{1}})+\epsilon_{1,m_{1}}}{h}\right)-K\left(\frac{x-s(u)+\epsilon_{1,m_{1}}}{h}\right)\right]dF_{\epsilon}(\epsilon_{1,m_{1}})\,du\nonumber\\
&\quad+\frac{1}{2h^{2}}\int_{t_{1,m_{1}}}^{1}\int\left[K\left(\frac{x-s(t_{1,1})+\epsilon_{1,1}}{h}\right)-K\left(\frac{x-s(u)+\epsilon_{1,1}}{h}\right)\right]dF_{\epsilon}(\epsilon_{1,1})\,du.\tag{35}
\end{align}

As in the derivation for term (IV), the two parts on the RHS of (35) have the same form, so we focus on the first. For $u\in[0,t_{1,1}]$, $d_{T}(t_{1,m_{1}},u)\leq 1-(t_{1,m_{1}}-t_{1,1})=t_{1,m_{1}+1}-t_{1,m_{1}}$. Thus,

\begin{align}
&\frac{1}{2h^{2}}\int_{0}^{t_{1,1}}\int\left[K\left(\frac{x-s(t_{1,m_{1}})+\epsilon_{1,m_{1}}}{h}\right)-K\left(\frac{x-s(u)+\epsilon_{1,m_{1}}}{h}\right)\right]dF_{\epsilon}(\epsilon_{1,m_{1}})\,du\nonumber\\
&\leq\frac{1}{2h^{2}}\int_{0}^{t_{1,1}}\int\left|K\left(\frac{x-s(t_{1,m_{1}})+\epsilon_{1,m_{1}}}{h}\right)-K\left(\frac{x-s(u)+\epsilon_{1,m_{1}}}{h}\right)\right|dF_{\epsilon}(\epsilon_{1,m_{1}})\,du\nonumber\\
&\leq\frac{1}{2h^{3}}\int_{0}^{t_{1,1}}\int L_{K}\bar{v}\,d_{T}(t_{1,m_{1}},u)\,dF_{\epsilon}(\epsilon_{1,m_{1}})\,du\quad\quad\text{[by (29)]}\nonumber\\
&\leq C_{\bar{v}}\,\frac{\left(t_{1,m_{1}+1}-t_{1,m_{1}}\right)^{2}}{2h^{3}}.\tag{36}
\end{align}

Similarly,

\begin{align}
\frac{1}{2h^{2}}\int_{t_{1,m_{1}}}^{1}\int\left[K\left(\frac{x-s(t_{1,1})+\epsilon_{1,1}}{h}\right)-K\left(\frac{x-s(u)+\epsilon_{1,1}}{h}\right)\right]dF_{\epsilon}(\epsilon_{1,1})\,du\leq C_{\bar{v}}\,\frac{\left(t_{1,1}-t_{1,0}\right)^{2}}{2h^{3}}.\tag{37}
\end{align}

Hence, combining (31), (33), (34), (36), and (37), for every $x\in\mathcal{W}$ there is a constant $C_{\bar{v}}$ such that

\begin{align*}
\mathbb{E}\left[\widehat{f}_{w,i}(x)\mid S_{i}=s\right]-\mathbb{E}\left[\frac{1}{h^{2}}K\left(\frac{x-X^{*}}{h}\right)\,\Big|\,S_{i}=s\right]\leq C_{\bar{v}}\left(\frac{1}{h^{3}}\sum_{j=0}^{m_{i}}(t_{i,j+1}-t_{i,j})^{2}\right).
\end{align*}

\Box
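Lemma 8's Riemann-sum error can also be seen numerically. The sketch below is an illustration under simplifying assumptions, not part of the proof: it drops the measurement noise (which the lemma integrates out identically on both sides) and compares the circularly weighted kernel sum against the time integral for a toy smooth trajectory; all names are illustrative.

# Gap between the weighted kernel sum and the time integral for a toy trajectory.
import numpy as np

def gauss2(z2):                       # 2D Gaussian kernel K evaluated at ||z||^2
    return np.exp(-z2 / 2) / (2 * np.pi)

s = lambda u: np.stack([np.cos(2 * np.pi * u), np.sin(2 * np.pi * u)], axis=-1)
x, h = np.array([1.0, 0.0]), 0.3      # query point and bandwidth (illustrative)

def riemann_gap(m, seed=1):
    t = np.sort(np.random.default_rng(seed).uniform(size=m))        # m timestamps
    w = (np.concatenate((t[1:], [t[0] + 1])) -
         np.concatenate(([t[-1] - 1], t[:-1]))) / 2                 # circular weights
    approx = np.sum(w * gauss2(np.sum((x - s(t))**2, -1) / h**2)) / h**2
    u = (np.arange(200_000) + 0.5) / 200_000                        # midpoint grid
    exact = np.mean(gauss2(np.sum((x - s(u))**2, -1) / h**2)) / h**2
    return abs(approx - exact)

for m in (50, 200, 800):
    print(m, riemann_gap(m))   # gap shrinks as the mesh sum_j (t_{j+1}-t_j)^2 -> 0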

Proof [Proof of Theorem 5]. For any $x$, $f_{GPS}(x)-\widehat{f}_{w}(x)$ can be decomposed into the following three terms:

\begin{align*}
f_{GPS}(x)-\widehat{f}_{w}(x)=\underbrace{f_{GPS}(x)-\mathbb{E}\left[\frac{1}{h^{2}}K\left(\frac{x-X^{*}}{h}\right)\right]}_{(I)}+\underbrace{\mathbb{E}\left[\frac{1}{h^{2}}K\left(\frac{x-X^{*}}{h}\right)\right]-\mathbb{E}[\widehat{f}_{w}(x)]}_{(II)}+\underbrace{\mathbb{E}[\widehat{f}_{w}(x)]-\widehat{f}_{w}(x)}_{(III)}.
\end{align*}

Term (I): $f_{GPS}(x)-\mathbb{E}\left[\frac{1}{h^{2}}K\left(\frac{x-X^{*}}{h}\right)\right]$. For any $x\in\mathbb{R}^{2}$, we have

\begin{align*}
\mathbb{E}\left[\frac{1}{h^{2}}K\left(\frac{x-X^{*}}{h}\right)\right]-f_{GPS}(x)
&=\frac{1}{h^{2}}\int_{y\in\mathbb{R}^{2}}K\left(\frac{x-y}{h}\right)f_{GPS}(y)\,dy-f_{GPS}(x)\\
&=\int_{z\in\mathbb{R}^{2}}K(z)\,f_{GPS}(x-hz)\,dz-f_{GPS}(x),
\end{align*}

which is the negative of term (I). By Assumption 3, $f_{GPS}$ is twice differentiable, so

\begin{align*}
\int_{z\in\mathbb{R}^{2}}K(z)\,f_{GPS}(x-hz)\,dz-f_{GPS}(x)=&\int_{z\in\mathbb{R}^{2}}K(z)\Big[f_{GPS}(x)-hz^{T}\nabla f_{GPS}(x)\\
&+\frac{h^{2}}{2}z^{T}\nabla\nabla f_{GPS}(x)z\Big]dz-f_{GPS}(x)+o(h^{2}).
\end{align*}

By Assumptions 1 and 3, $\int_{z\in\mathbb{R}^{2}}K(z)\,dz=1$, $\int_{z\in\mathbb{R}^{2}}zK(z)\,dz=0$, and $\nabla\nabla f_{GPS}$ is bounded. Hence,

\begin{align}
\int_{z\in\mathbb{R}^{2}}K(z)\,f_{GPS}(x-hz)\,dz-f_{GPS}(x)=O(h^{2}),\tag{38}
\end{align}
so term (I) $=O(h^{2})$ as well.

Term (II): $\mathbb{E}\left[\frac{1}{h^{2}}K\left(\frac{x-X^{*}}{h}\right)\right]-\mathbb{E}[\widehat{f}_{w}(x)]$. Recall that $\widehat{f}_{w,i}(x)$ is the estimated density contributed by the observations of the $i$-th day, i.e.,

\[
\widehat{f}_{w,i}(x)=\frac{1}{h^{2}}\sum_{j=1}^{m_{i}}\frac{(t_{i,j+1}-t_{i,j-1})}{2}K\left(\frac{x-X_{i,j}}{h}\right).
\]

Then, we can decompose the expectation of the density estimator:

\begin{align*}
\mathbb{E}\left[\widehat{f}_{w}(x)\right]
=&\frac{1}{2nh^{2}}\sum_{i=1}^{n}\mathbb{E}\left[\sum_{j=1}^{m_{i}}(t_{i,j+1}-t_{i,j-1})K\left(\frac{x-X_{i,j}}{h}\right)\right]\\
=&\frac{1}{2nh^{2}}\sum_{i=1}^{n}\mathbb{E}\left\{\mathbb{E}\left[\sum_{j=1}^{m_{i}}(t_{i,j+1}-t_{i,j-1})K\left(\frac{x-X_{i,j}}{h}\right)\,\Big|\,S_{i}\right]\right\}\\
=&\frac{1}{2nh^{2}}\sum_{i=1}^{n}\sum_{j=1}^{m_{i}}(t_{i,j+1}-t_{i,j-1})\int_{s\in\mathcal{S}}\int K\left(\frac{x-s(t_{i,j})+\epsilon_{i,j}}{h}\right)dF_{\epsilon}(\epsilon_{i,j})\,dF_{S}(s)\\
=&\frac{1}{n}\sum_{i=1}^{n}\int_{s\in\mathcal{S}}\mathbb{E}\left[\widehat{f}_{w,i}(x)\mid S_{i}=s\right]dF_{S}(s).
\end{align*}

Note that

\[
\mathbb{E}\left[\frac{1}{h^{2}}K\left(\frac{x-X^{*}}{h}\right)\right]=\int_{s\in\mathcal{S}}\mathbb{E}\left[\frac{1}{h^{2}}K\left(\frac{x-X^{*}}{h}\right)\,\Big|\,S=s\right]dF_{S}(s).
\]

Hence, term (II) can be written as

\begin{align*}
\mathbb{E}[\widehat{f}_{w}(x)]-\mathbb{E}\left[\frac{1}{h^{2}}K\left(\frac{x-X^{*}}{h}\right)\right]=\frac{1}{n}\sum_{i=1}^{n}\int_{s\in\mathcal{S}}\left\{\mathbb{E}\left[\widehat{f}_{w,i}(x)\mid S_{i}=s\right]-\mathbb{E}\left[\frac{1}{h^{2}}K\left(\frac{x-X^{*}}{h}\right)\,\Big|\,S=s\right]\right\}dF_{S}(s).
\end{align*}

Then, by Lemma 8, for any $i\in\{1,\ldots,n\}$ and $s\in\mathcal{S}$, there is a uniform constant $C_{\bar{v}}$ such that

\begin{align*}
\mathbb{E}\left[\widehat{f}_{w,i}(x)\mid S_{i}=s\right]-\mathbb{E}\left[\frac{1}{h^{2}}K\left(\frac{x-X^{*}}{h}\right)\,\Big|\,S=s\right]\leq C_{\bar{v}}\left(\frac{1}{h^{3}}\sum_{j=0}^{m_{i}}(t_{i,j+1}-t_{i,j})^{2}\right)
\end{align*}

for every $x\in\mathcal{W}$. Hence, averaging over the $n$ days,

\begin{align*}
\mathbb{E}\left[\widehat{f}_{w}(x)\right]-\mathbb{E}\left[\frac{1}{h^{2}}K\left(\frac{x-X^{*}}{h}\right)\right]=O\left(\frac{1}{nh^{3}}\sum_{i=1}^{n}\sum_{j=0}^{m_{i}}(t_{i,j+1}-t_{i,j})^{2}\right).
\end{align*}

Term (III): Term (III) is the random fluctuation of $\widehat{f}_{w}(x)$ around its mean, which is controlled by the variance of $\widehat{f}_{w}(x)$:

\begin{align*}
\text{Var}[\widehat{f}_{w}(x)]=\text{Var}\left[\frac{1}{nh^{2}}\sum_{i=1}^{n}\sum_{j=1}^{m_{i}}W_{i,j}K\left(\frac{x-S_{i}(t_{i,j})+\epsilon_{i,j}}{h}\right)\right]=\frac{1}{n^{2}h^{4}}\sum_{i=1}^{n}\text{Var}\left[\sum_{j=1}^{m_{i}}W_{i,j}K\left(\frac{x-S_{i}(t_{i,j})+\epsilon_{i,j}}{h}\right)\right],
\end{align*}
where $W_{i,j}=\frac{(t_{i,j+1}-t_{i,j-1})}{2}$, and the second equality follows

by the independence among $\{S_{1},\ldots,S_{n}\}$. To simplify, we only need to consider $i=1$. Then we calculate

\begin{align}
&\text{Var}\left[\sum_{j=1}^{m_{1}}W_{1,j}K\left(\frac{x-S_{1}(t_{1,j})+\epsilon_{1,j}}{h}\right)\right]\tag{39}\\
&=\mathbb{E}\left\{\text{Var}\left[\sum_{j=1}^{m_{1}}W_{1,j}K\left(\frac{x-S_{1}(t_{1,j})+\epsilon_{1,j}}{h}\right)\Big|S_{1}\right]\right\}+\text{Var}\left\{\mathbb{E}\left[\sum_{j=1}^{m_{1}}W_{1,j}K\left(\frac{x-S_{1}(t_{1,j})+\epsilon_{1,j}}{h}\right)\Big|S_{1}\right]\right\}.\nonumber
\end{align}

For the first term on the RHS of (39), note that $\epsilon_{1,1},\ldots,\epsilon_{1,m_{1}}$ are IID, so

\begin{align*}
&\mathbb{E}\left\{\text{Var}\left[\sum_{j=1}^{m_{1}}W_{1,j}K\left(\frac{x-S_{1}(t_{1,j})+\epsilon_{1,j}}{h}\right)\Big|S_{1}\right]\right\}\\
&=\mathbb{E}\left\{\sum_{j=1}^{m_{1}}W_{1,j}^{2}\,\text{Var}\left[K\left(\frac{x-S_{1}(t_{1,j})+\epsilon_{1,j}}{h}\right)\Big|S_{1}\right]\right\}\\
&\leq\mathbb{E}\left\{\sum_{j=1}^{m_{1}}W_{1,j}^{2}\,\mathbb{E}\left[K\left(\frac{x-S_{1}(t_{1,j})+\epsilon_{1,j}}{h}\right)^{2}\Big|S_{1}\right]\right\}.
\end{align*}

Note that, for any $s\in\mathcal{S}$, $\mathbb{E}\left[K\left(\frac{x-s(t_{1,j})+\epsilon_{1,j}}{h}\right)^{2}\Big|S_{1}=s\right]$ depends only on the measurement error, which is independent across timestamps. Then,

\begin{align*}
&\mathbb{E}\left\{\sum_{j=1}^{m_{1}}W_{1,j}^{2}\,\mathbb{E}\left[K\left(\frac{x-S_{1}(t_{1,j})+\epsilon_{1,j}}{h}\right)^{2}\Big|S_{1}\right]\right\}\\
&=\mathbb{E}\left\{\sum_{j=1}^{m_{1}}W_{1,j}^{2}\int K\left(\frac{x-S_{1}(t_{1,j})+\epsilon_{1,j}}{h}\right)^{2}f_{\epsilon}(\epsilon_{1,j})\,d\epsilon_{1,j}\right\}\\
&=\int_{s}\sum_{j=1}^{m_{1}}W_{1,j}^{2}\int K\left(\frac{x-s(t_{1,j})+\epsilon_{1,j}}{h}\right)^{2}f_{\epsilon}(\epsilon_{1,j})\,d\epsilon_{1,j}\,dF_{S}(s)\\
&=h^{2}\sum_{j=1}^{m_{1}}W_{1,j}^{2}\int_{s}\int K(u_{1,j})^{2}\,f_{\epsilon}(u_{1,j}h-x+s(t_{1,j}))\,du_{1,j}\,dF_{S}(s).
\end{align*}

By Assumptions 1 and 3, $f_{\epsilon}$ and $\int_{x\in\mathbb{R}^{2}}K(x)^{2}\,dx$ are bounded, so

\begin{align}
h^{2}\sum_{j=1}^{m_{1}}W_{1,j}^{2}\int_{s}\int K(u_{1,j})^{2}\,f_{\epsilon}(u_{1,j}h-x+s(t_{1,j}))\,du_{1,j}\,dF_{S}(s)=O\left(h^{2}\sum_{j=1}^{m_{1}}W_{1,j}^{2}\right).\tag{40}
\end{align}

For the second term on the RHS of (39), by

\begin{align}
\text{Var}\left\{\mathbb{E}\left[\sum_{j=1}^{m_{1}}W_{1,j}K\left(\frac{x-S_{1}(t_{1,j})+\epsilon_{1,j}}{h}\right)\Big|S_{1}\right]\right\}\leq\mathbb{E}\left\{\left\{\mathbb{E}\left[\sum_{j=1}^{m_{1}}W_{1,j}K\left(\frac{x-S_{1}(t_{1,j})+\epsilon_{1,j}}{h}\right)\Big|S_{1}\right]\right\}^{2}\right\},\tag{41}
\end{align}

we first consider the inner conditional expectation in (41). For any $s\in\mathcal{S}$:

\begin{align*}
&\mathbb{E}\left[\sum_{j=1}^{m_{1}}W_{1,j}K\left(\frac{x-S_{1}(t_{1,j})+\epsilon_{1,j}}{h}\right)\Big|S_{1}=s\right]\\
&=\sum_{j=1}^{m_{1}}W_{1,j}\int K\left(\frac{x-s(t_{1,j})+\epsilon_{1,j}}{h}\right)f_{\epsilon}(\epsilon_{1,j})\,d\epsilon_{1,j}\\
&=h^{2}\sum_{j=1}^{m_{1}}W_{1,j}\int K(u_{1,j})\,f_{\epsilon}(u_{1,j}h-x+s(t_{1,j}))\,du_{1,j}.
\end{align*}

For any $s\in\mathcal{S}$, $\int K(u_{1,j})\,f_{\epsilon}(u_{1,j}h-x+s(t_{1,j}))\,du_{1,j}$ is upper bounded by $\max_{y}f_{\epsilon}(y)$ since $\int K=1$, so the RHS of (41) can be upper bounded by

\begin{align*}
&\mathbb{E}\left\{\left\{\mathbb{E}\left[\sum_{j=1}^{m_{1}}W_{1,j}K\left(\frac{x-S_{1}(t_{1,j})+\epsilon_{1,j}}{h}\right)\Big|S_{1}\right]\right\}^{2}\right\}\\
&=\mathbb{E}\left\{\left[h^{2}\sum_{j=1}^{m_{1}}W_{1,j}\int K(u_{1,j})\,f_{\epsilon}(u_{1,j}h-x+S_{1}(t_{1,j}))\,du_{1,j}\right]^{2}\right\}\\
&\leq h^{4}\,\mathbb{E}\left\{\left[\sum_{j=1}^{m_{1}}W_{1,j}\max_{y}f_{\epsilon}(y)\right]^{2}\right\}\leq h^{4}\max_{y}f_{\epsilon}(y)^{2}.
\end{align*}

By Assumption 3, $f_{\epsilon}$ is uniformly bounded by some constant; here we also used $\sum_{j=1}^{m_{1}}W_{1,j}=1$. Hence,

\begin{align}
\text{Var}\left\{\mathbb{E}\left[\sum_{j=1}^{m_{1}}W_{1,j}K\left(\frac{x-S_{1}(t_{1,j})+\epsilon_{1,j}}{h}\right)\Big|S_{1}\right]\right\}=O(h^{4}).\tag{42}
\end{align}

By (40) and (42), we have

\begin{align*}
\text{Var}[\widehat{f}_{w}(x)]=\frac{1}{n^{2}h^{4}}\sum_{i=1}^{n}\text{Var}\left[\sum_{j=1}^{m_{i}}W_{i,j}K\left(\frac{x-S_{i}(t_{i,j})+\epsilon_{i,j}}{h}\right)\right]=O\left(\frac{1}{n^{2}h^{2}}\sum_{i=1}^{n}\sum_{j=1}^{m_{i}}W_{i,j}^{2}\right)+O\left(\frac{1}{n}\right).
\end{align*}

Hence, the convergence rate of Term (III) is

\begin{align*}
\mathbb{E}[\widehat{f}_{w}(x)]-\widehat{f}_{w}(x)=O_{p}\left(\frac{1}{nh}\sqrt{\sum_{i=1}^{n}\sum_{j=1}^{m_{i}}W_{i,j}^{2}}\right)+O_{p}\left(n^{-\frac{1}{2}}\right).
\end{align*}

Combining the convergence rates of the three terms derived above, we have

\begin{align*}
\widehat{f}_{w}(x)-f_{GPS}(x)=O(h^{2})+O\left(\frac{1}{nh^{3}}\sum_{i=1}^{n}\sum_{j=0}^{m_{i}}(t_{i,j+1}-t_{i,j})^{2}\right)+O_{p}\left(\frac{1}{nh}\sqrt{\sum_{i=1}^{n}\sum_{j=1}^{m_{i}}W_{i,j}^{2}}\right)+O_{p}\left(n^{-\frac{1}{2}}\right).
\end{align*}

\Box

D.6 Proof of Proposition 6

Proof. From equation (11), we have

\begin{align*}
\frac{(S)}{(B)}=o(1)&\Longleftrightarrow\frac{m^{2}h^{6}}{nmh^{2}}=o(1)\\
&\Longleftrightarrow\frac{mh^{4}}{n}=o(1)\\
&\Longleftrightarrow mh^{4}=o(n).
\end{align*}

Since the temporal-resolution bias (B) is then the dominating term, we choose $h$ to balance the smoothing bias $O(h^{2})$ against (B), which gives $h\asymp m^{-1/5}$ whenever $mh^{4}=o(n)$. Substituting $h\asymp m^{-1/5}$ back into $mh^{4}=o(n)$ yields the condition $m^{1/5}=o(n)$.

On the other hand, (S) dominates (B) when $n=o(mh^{4})$; in this case, balancing the smoothing bias $O(h^{2})$ against (S) gives the optimal smoothing bandwidth $h\asymp(nm)^{-1/6}$. Substituting this into $n=o(mh^{4})$, we obtain the condition $n=o(m^{1/5})$.

The convergence rates are obtained by plugging in these optimal smoothing bandwidths.

Note that at the critical point $n\asymp m^{1/5}$, we have $m^{-1/5}\asymp(nm)^{-1/6}$, so the two bandwidth choices coincide.

\Box
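A small numerical check of these regimes is given below. This is a sketch under simplifying assumptions, not part of the proof: it assumes equally spaced timestamps and uses the rate proxies bias $\asymp h^{2}$, $(B)\asymp 1/(mh^{3})$, and $(S)\asymp 1/(h\sqrt{nm})$ implied by Theorem 5; all values are illustrative.

# Grid-minimize the total-error proxy and compare with the two candidate bandwidths.
import numpy as np

def best_h(n, m):
    h = np.logspace(-4, 0, 4000)
    err = h**2 + 1.0 / (m * h**3) + 1.0 / (h * np.sqrt(n * m))
    return h[np.argmin(err)]

m = 10**6
for n in (2, round(m**0.2), 10**4):   # below, at, and above the critical n ~ m^{1/5}
    print(n, best_h(n, m), m**(-0.2), (n * m)**(-1 / 6))
# The minimizer tracks (n m)^{-1/6} for n << m^{1/5} and m^{-1/5} for n >> m^{1/5};
# at n ~ m^{1/5} the two coincide.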

D.7 Proof of Theorem 7

We use $\widehat{f}_{i}(x|t)$ to denote the $i$-th day's contribution to $\widehat{f}(x|t)$, i.e.,

\[
\widehat{f}_{i}(x|t)=\frac{1}{h_{X}^{2}}\,\frac{\sum_{j=1}^{m_{i}}K\left(\frac{d_{T}(t_{i,j},t)}{h_{T}}\right)K\left(\frac{X_{i,j}-x}{h_{X}}\right)}{\sum_{j^{\prime}=1}^{m_{i}}K\left(\frac{d_{T}(t_{i,j^{\prime}},t)}{h_{T}}\right)}.
\]

Recall that $X_{i,j}=S_{i}(t_{i,j})+\epsilon_{i,j}$. For any $s\in\mathcal{S}$, we define $\widehat{f}_{i}(x|t;s)$ as follows:

\[
\widehat{f}_{i}(x|t;s)=\frac{1}{h_{X}^{2}}\,\frac{\sum_{j=1}^{m_{i}}K\left(\frac{d_{T}(t_{i,j},t)}{h_{T}}\right)K\left(\frac{s(t_{i,j})+\epsilon_{i,j}-x}{h_{X}}\right)}{\sum_{j^{\prime}=1}^{m_{i}}K\left(\frac{d_{T}(t_{i,j^{\prime}},t)}{h_{T}}\right)}.
\]

For any $s\in\mathcal{S}$, we define $f_{GPS}(x|t;s)$ as

\begin{align}
f_{GPS}(x|t;s)=\lim_{r\rightarrow 0}\frac{P(s(t)+\epsilon\in B(x,r))}{\pi r^{2}}=f_{\epsilon}(x-s(t)).\tag{43}
\end{align}

With the above definitions, $\widehat{f}_{i}(x|t)$ and $f_{GPS}(x|t)$ can be written as

\begin{align}
\widehat{f}_{i}(x|t)=\int_{s\in\mathcal{S}}\widehat{f}_{i}(x|t;s)\,dF_{S}(s),\quad f_{GPS}(x|t)=\int_{s\in\mathcal{S}}f_{GPS}(x|t;s)\,dF_{S}(s).\tag{44}
\end{align}

Our analysis begins with a lemma controlling the error for a given trajectory $s$.

Lemma 9

Under Assumptions 2 to 4, for any $i\in\{1,\ldots,n\}$ and $s\in\mathcal{S}$, we have

\begin{align}
\mathbb{E}\left(\widehat{f}_{i}(x|t;s)\right)-f_{GPS}(x|t;s)=O(h_{X}^{2})+O\left(\sum_{j=1}^{m_{i}}\frac{K\left(\frac{d_{T}(t_{i,j},t)}{h_{T}}\right)}{\sum_{j^{\prime}=1}^{m_{i}}K\left(\frac{d_{T}(t_{i,j^{\prime}},t)}{h_{T}}\right)}f_{\epsilon}(x-s(t_{i,j}))-f_{\epsilon}(x-s(t))\right),\tag{45}
\end{align}

and

\begin{align}
\text{Var}\left(\widehat{f}_{i}(x|t;s)\right)\leq\frac{M_{\epsilon}}{h_{X}^{2}}\sum_{j=1}^{m_{i}}\left[\frac{K\left(\frac{d_{T}(t_{i,j},t)}{h_{T}}\right)}{\sum_{j^{\prime}=1}^{m_{i}}K\left(\frac{d_{T}(t_{i,j^{\prime}},t)}{h_{T}}\right)}\right]^{2}\int_{x\in\mathbb{R}^{2}}K^{2}(x)\,dx,\tag{46}
\end{align}

for any $t\in[0,1]$ and $x\in\mathcal{W}$, as $h_{X}\rightarrow 0$ and $m_{i}\rightarrow\infty$ for any $i\in\{1,\ldots,n\}$.

Proof. By the independence among $S_{1},\ldots,S_{n}$, we may take $i=1$ without loss of generality.

Bias: We first derive the bias of $\widehat{f}_{1}(x|t;s)$. For any $s\in\mathcal{S}$, by the change of variables $y_{1,j}=\frac{s(t_{1,j})+\epsilon_{1,j}-x}{h_{X}}$, $j=1,\ldots,m_{1}$, and (43), we have

\begin{align*}
&\mathbb{E}\left(\widehat{f}_{1}(x|t;s)\right)-f_{GPS}(x|t;s)\\
&=\mathbb{E}\left[\frac{1}{h_{X}^{2}}\frac{\sum_{j=1}^{m_{1}}K\left(\frac{d_{T}(t_{1,j},t)}{h_{T}}\right)K\left(\frac{s(t_{1,j})+\epsilon_{1,j}-x}{h_{X}}\right)}{\sum_{j^{\prime}=1}^{m_{1}}K\left(\frac{d_{T}(t_{1,j^{\prime}},t)}{h_{T}}\right)}\right]-f_{\epsilon}(x-s(t))\\
&=\frac{1}{h_{X}^{2}}\sum_{j=1}^{m_{1}}\frac{K\left(\frac{d_{T}(t_{1,j},t)}{h_{T}}\right)}{\sum_{j^{\prime}=1}^{m_{1}}K\left(\frac{d_{T}(t_{1,j^{\prime}},t)}{h_{T}}\right)}\int_{\epsilon_{1,j}\in\mathbb{R}^{2}}K\left(\frac{s(t_{1,j})+\epsilon_{1,j}-x}{h_{X}}\right)f_{\epsilon}(\epsilon_{1,j})\,d\epsilon_{1,j}-f_{\epsilon}(x-s(t))\\
&=\sum_{j=1}^{m_{1}}\frac{K\left(\frac{d_{T}(t_{1,j},t)}{h_{T}}\right)}{\sum_{j^{\prime}=1}^{m_{1}}K\left(\frac{d_{T}(t_{1,j^{\prime}},t)}{h_{T}}\right)}\int_{y_{1,j}\in\mathbb{R}^{2}}K(y_{1,j})\,f_{\epsilon}(h_{X}y_{1,j}+x-s(t_{1,j}))\,dy_{1,j}-f_{\epsilon}(x-s(t)).
\end{align*}

Then, by Assumption 3, we apply a Taylor expansion to $f_{\epsilon}(h_{X}y_{1,j}+x-s(t_{1,j}))$:

\begin{align*}
&\sum_{j=1}^{m_{1}}\frac{K\left(\frac{d_{T}(t_{1,j},t)}{h_{T}}\right)}{\sum_{j^{\prime}=1}^{m_{1}}K\left(\frac{d_{T}(t_{1,j^{\prime}},t)}{h_{T}}\right)}\int_{y_{1,j}\in\mathbb{R}^{2}}K(y_{1,j})\,f_{\epsilon}(h_{X}y_{1,j}+x-s(t_{1,j}))\,dy_{1,j}-f_{\epsilon}(x-s(t))\\
&=\sum_{j=1}^{m_{1}}\frac{K\left(\frac{d_{T}(t_{1,j},t)}{h_{T}}\right)}{\sum_{j^{\prime}=1}^{m_{1}}K\left(\frac{d_{T}(t_{1,j^{\prime}},t)}{h_{T}}\right)}\int_{y_{1,j}\in\mathbb{R}^{2}}K(y_{1,j})\Big[f_{\epsilon}(x-s(t_{1,j}))+h_{X}\nabla f_{\epsilon}(x-s(t_{1,j}))^{T}y_{1,j}\\
&\qquad+h_{X}^{2}\,y_{1,j}^{T}\nabla\nabla f_{\epsilon}(x-s(t_{1,j}))\,y_{1,j}-f_{\epsilon}(x-s(t))\Big]dy_{1,j}+o(h_{X}^{2}).
\end{align*}

By Assumption 4, $\int_{y\in\mathbb{R}^{2}}yK(y)\,dy=0$, so

\begin{align*}
&\sum_{j=1}^{m_{1}}\frac{K\left(\frac{d_{T}(t_{1,j},t)}{h_{T}}\right)}{\sum_{j^{\prime}=1}^{m_{1}}K\left(\frac{d_{T}(t_{1,j^{\prime}},t)}{h_{T}}\right)}\int_{y_{1,j}\in\mathbb{R}^{2}}K(y_{1,j})\Big[f_{\epsilon}(x-s(t_{1,j}))+h_{X}\nabla f_{\epsilon}(x-s(t_{1,j}))^{T}y_{1,j}\\
&\qquad+h_{X}^{2}\,y_{1,j}^{T}\nabla\nabla f_{\epsilon}(x-s(t_{1,j}))\,y_{1,j}-f_{\epsilon}(x-s(t))\Big]dy_{1,j}+o(h_{X}^{2})\\
&=\sum_{j=1}^{m_{1}}\frac{K\left(\frac{d_{T}(t_{1,j},t)}{h_{T}}\right)}{\sum_{j^{\prime}=1}^{m_{1}}K\left(\frac{d_{T}(t_{1,j^{\prime}},t)}{h_{T}}\right)}\int_{y_{1,j}\in\mathbb{R}^{2}}K(y_{1,j})\Big[f_{\epsilon}(x-s(t_{1,j}))-f_{\epsilon}(x-s(t))\\
&\qquad+h_{X}^{2}\,y_{1,j}^{T}\nabla\nabla f_{\epsilon}(x-s(t_{1,j}))\,y_{1,j}\Big]dy_{1,j}+o(h_{X}^{2})\\
&\leq M_{\epsilon}h_{X}^{2}+\sum_{j=1}^{m_{1}}\frac{K\left(\frac{d_{T}(t_{1,j},t)}{h_{T}}\right)}{\sum_{j^{\prime}=1}^{m_{1}}K\left(\frac{d_{T}(t_{1,j^{\prime}},t)}{h_{T}}\right)}f_{\epsilon}(x-s(t_{1,j}))-f_{\epsilon}(x-s(t)).
\end{align*}

Hence,

\begin{align*}
\mathbb{E}\,\widehat{f}_{i}(x|t;s)-f_{GPS}(x|t;s)=O(h_{X}^{2})+O\left(\sum_{j=1}^{m_{i}}\frac{K\left(\frac{d_{T}(t_{i,j},t)}{h_{T}}\right)}{\sum_{j^{\prime}=1}^{m_{i}}K\left(\frac{d_{T}(t_{i,j^{\prime}},t)}{h_{T}}\right)}f_{\epsilon}(x-s(t_{i,j}))-f_{\epsilon}(x-s(t))\right)
\end{align*}

for any $i\in\{1,\ldots,n\}$.

Variance: As for the variance, for any $s\in\mathcal{S}$,

\begin{align}
\text{Var}\left(\widehat{f}_{1}(x|t;s)\right)&=\text{Var}\left(\frac{1}{h_{X}^{2}}\frac{\sum_{j=1}^{m_{1}}K\left(\frac{d_{T}(t_{1,j},t)}{h_{T}}\right)K\left(\frac{s(t_{1,j})+\epsilon_{1,j}-x}{h_{X}}\right)}{\sum_{j^{\prime}=1}^{m_{1}}K\left(\frac{d_{T}(t_{1,j^{\prime}},t)}{h_{T}}\right)}\right)\nonumber\\
&=\frac{1}{h_{X}^{4}}\sum_{j=1}^{m_{1}}\left[\frac{K\left(\frac{d_{T}(t_{1,j},t)}{h_{T}}\right)}{\sum_{j^{\prime}=1}^{m_{1}}K\left(\frac{d_{T}(t_{1,j^{\prime}},t)}{h_{T}}\right)}\right]^{2}\text{Var}\left(K\left(\frac{s(t_{1,j})+\epsilon_{1,j}-x}{h_{X}}\right)\right),\tag{47}
\end{align}
where the second equality uses the independence of $\epsilon_{1,1},\ldots,\epsilon_{1,m_{1}}$ across timestamps.

In (47), by Assumption 3, $f_{\epsilon}$ is upper bounded by some constant $M_{\epsilon}$; then, for every $j\in\{1,\ldots,m_{1}\}$,

\begin{align}
\text{Var}\left(K\left(\frac{s(t_{1,j})+\epsilon_{1,j}-x}{h_{X}}\right)\right)&\leq\mathbb{E}_{\epsilon_{1,j}}\left[K\left(\frac{s(t_{1,j})+\epsilon_{1,j}-x}{h_{X}}\right)^{2}\right]\nonumber\\
&=\int_{\epsilon_{1,j}\in\mathbb{R}^{2}}K\left(\frac{s(t_{1,j})+\epsilon_{1,j}-x}{h_{X}}\right)^{2}f_{\epsilon}(\epsilon_{1,j})\,d\epsilon_{1,j}\nonumber\\
&=h_{X}^{2}\int_{y_{1,j}\in\mathbb{R}^{2}}K(y_{1,j})^{2}\,f_{\epsilon}(h_{X}y_{1,j}+x-s(t_{1,j}))\,dy_{1,j}\nonumber\\
&\leq M_{\epsilon}h_{X}^{2}\int_{x\in\mathbb{R}^{2}}K^{2}(x)\,dx,\tag{48}
\end{align}

for any $s\in\mathcal{S}$, where the last inequality follows from Assumption 3 ($f_{\epsilon}$ is bounded). Plugging (48) into (47), we obtain

\begin{align*}
\text{Var}\left(\widehat{f}_{i}(x|t;s)\right)\leq\frac{M_{\epsilon}}{h_{X}^{2}}\sum_{j=1}^{m_{i}}\left[\frac{K\left(\frac{d_{T}(t_{i,j},t)}{h_{T}}\right)}{\sum_{j^{\prime}=1}^{m_{i}}K\left(\frac{d_{T}(t_{i,j^{\prime}},t)}{h_{T}}\right)}\right]^{2}\int_{x\in\mathbb{R}^{2}}K^{2}(x)\,dx
\end{align*}

for any $s\in\mathcal{S}$. \Box

Now we formally state our proof of Theorem 7.

Proof. Recall that $\widehat{f}_{i}(x|t)$ denotes the $i$-th day's contribution to $\widehat{f}(x|t)$, i.e.,

\[
\widehat{f}_{i}(x|t)=\frac{1}{h_{X}^{2}}\,\frac{\sum_{j=1}^{m_{i}}K\left(\frac{d_{T}(t_{i,j},t)}{h_{T}}\right)K\left(\frac{X_{i,j}-x}{h_{X}}\right)}{\sum_{j^{\prime}=1}^{m_{i}}K\left(\frac{d_{T}(t_{i,j^{\prime}},t)}{h_{T}}\right)}.
\]

Then, $\widehat{f}(x|t)$ can be written as

\begin{align*}
\widehat{f}(x|t)=\frac{1}{h_{X}^{2}}\,\frac{\sum_{i=1}^{n}\frac{1}{m_{i}}\sum_{j=1}^{m_{i}}K\left(\frac{d_{T}(t_{i,j},t)}{h_{T}}\right)K\left(\frac{X_{i,j}-x}{h_{X}}\right)}{\sum_{i=1}^{n}\frac{1}{m_{i}}\sum_{j^{\prime}=1}^{m_{i}}K\left(\frac{d_{T}(t_{i,j^{\prime}},t)}{h_{T}}\right)}=\sum_{i=1}^{n}\alpha_{i}(t)\,\widehat{f}_{i}(x|t),
\end{align*>

where $\alpha_{i}(t)=\frac{\frac{1}{m_{i}}\sum_{j=1}^{m_{i}}K\left(\frac{d_{T}(t_{i,j},t)}{h_{T}}\right)}{\sum_{i^{\prime}=1}^{n}\frac{1}{m_{i^{\prime}}}\sum_{j^{\prime}=1}^{m_{i^{\prime}}}K\left(\frac{d_{T}(t_{i^{\prime},j^{\prime}},t)}{h_{T}}\right)}$ is the kernel-sum weight of day $i$. Note that $\sum_{i=1}^{n}\alpha_{i}(t)=1$ for every $t\in[0,1]$.
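For concreteness, here is a minimal sketch of the weights $\alpha_{i}(t)$ and the pooled estimator $\widehat{f}(x|t)$ from the display above (Gaussian kernels assumed; all names illustrative, not the paper's implementation):

# Day weights alpha_i(t) and pooled conditional estimator f_hat(x|t).
import numpy as np

K = lambda z: np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)             # temporal kernel
K2 = lambda z2: np.exp(-z2 / 2) / (2 * np.pi)                    # spatial kernel at ||z||^2
d_T = lambda a, b: np.minimum(np.abs(a - b), 1 - np.abs(a - b))  # circular distance

def f_hat_conditional(x, t, days, hT, hX):
    """days: list of (times_i, points_i) pairs; returns (f_hat(x|t), alpha(t))."""
    day_sums, day_fis = [], []
    for times, pts in days:
        kt = K(d_T(times, t) / hT)
        day_sums.append(kt.mean())                               # (1/m_i) sum_j K(d_T/h_T)
        kx = K2(np.sum((pts - x)**2, axis=1) / hX**2)
        day_fis.append(np.sum(kt * kx) / (np.sum(kt) * hX**2))   # f_hat_i(x|t)
    alpha = np.array(day_sums) / np.sum(day_sums)                # weights sum to 1
    return float(alpha @ np.array(day_fis)), alpha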

Bias: The expectation of $\widehat{f}(x|t)$ is

\[
\mathbb{E}\left(\widehat{f}(x|t)\right)=\sum_{i=1}^{n}\alpha_{i}(t)\,\mathbb{E}\left[\widehat{f}_{i}(x|t)\right],
\]

where, for any $i\in\{1,\ldots,n\}$:

\[
\mathbb{E}\left[\widehat{f}_{i}(x|t)\right]=\int_{s\in\mathcal{S}}\mathbb{E}\left[\widehat{f}_{i}(x|t)\,\big|\,S_{i}=s\right]dF_{S}(s).
\]

And $f_{GPS}(x|t)$ can be written as

\[
f_{GPS}(x|t)=\int_{s\in\mathcal{S}}f_{GPS}(x|t;s)\,dF_{S}(s).
\]

By (45) in Lemma 9, for any $s\in\mathcal{S}$,

\begin{align*}
\mathbb{E}\left[\widehat{f}_{i}(x|t;s)\right]-f_{GPS}(x|t;s)=O(h_{X}^{2})+O\left(\sum_{j=1}^{m_{i}}\frac{K\left(\frac{d_{T}(t_{i,j},t)}{h_{T}}\right)}{\sum_{j^{\prime}=1}^{m_{i}}K\left(\frac{d_{T}(t_{i,j^{\prime}},t)}{h_{T}}\right)}f_{\epsilon}(x-s(t_{i,j}))-f_{\epsilon}(x-s(t))\right).
\end{align*}

So,

\begin{align*}
&\mathbb{E}\left[\widehat{f}(x|t)\right]-f_{GPS}(x|t)\\
&=\sum_{i=1}^{n}\alpha_{i}(t)\int_{s\in\mathcal{S}}\left\{\mathbb{E}\left[\widehat{f}_{i}(x|t)\,\big|\,S_{i}=s\right]-f_{GPS}(x|t;s)\right\}dF_{S}(s)\\
&=O(h_{X}^{2})+O\left(\sum_{i=1}^{n}\alpha_{i}(t)\int_{s\in\mathcal{S}}\left[\sum_{j=1}^{m_{i}}\frac{K\left(\frac{d_{T}(t_{i,j},t)}{h_{T}}\right)}{\sum_{j^{\prime}=1}^{m_{i}}K\left(\frac{d_{T}(t_{i,j^{\prime}},t)}{h_{T}}\right)}f_{\epsilon}(x-s(t_{i,j}))-f_{\epsilon}(x-s(t))\right]dF_{S}(s)\right).
\end{align*}

Furthermore, by Assumptions 2 and 3, there exist constants $L_{\epsilon}$ and $\bar{v}$ such that

\[
f_{\epsilon}(x-s(t_{i,j}))-f_{\epsilon}(x-s(t))\leq L_{\epsilon}\left\|s(t_{i,j})-s(t)\right\|\leq L_{\epsilon}\bar{v}\,d_{T}(t_{i,j},t)
\]

holds for any $t_{i,j}\in[0,1]$, $s\in\mathcal{S}$, and $x\in\mathcal{W}$. Then,

\begin{align}
\mathbb{E}\,\widehat{f}(x|t)-f_{GPS}(x|t)=O(h_{X}^{2})+O\left(\sum_{i=1}^{n}\alpha_{i}(t)\sum_{j=1}^{m_{i}}\frac{K\left(\frac{d_{T}(t_{i,j},t)}{h_{T}}\right)}{\sum_{j^{\prime}=1}^{m_{i}}K\left(\frac{d_{T}(t_{i,j^{\prime}},t)}{h_{T}}\right)}d_{T}(t_{i,j},t)\right).\tag{49}
\end{align}

Variance: By the independence of mobility across days,

\begin{align*}
\text{Var}\left[\widehat{f}(x|t)\right]=\text{Var}\left[\sum_{i=1}^{n}\alpha_{i}(t)\widehat{f}_{i}(x|t)\right]=\sum_{i=1}^{n}\alpha_{i}(t)^{2}\,\text{Var}\left[\widehat{f}_{i}(x|t)\right].
\end{align*}

By the law of total variance, $\operatorname{Var}(Y)=\mathbb{E}(\operatorname{Var}(Y\mid X))+\operatorname{Var}(\mathbb{E}(Y\mid X))$, for every $i\in\{1,\ldots,n\}$ we have

\begin{align}
\text{Var}\left[\widehat{f}_{i}(x|t)\right]=\mathbb{E}\left\{\text{Var}\left[\widehat{f}_{i}(x|t)\mid S_{i}\right]\right\}+\text{Var}\left\{\mathbb{E}\left[\widehat{f}_{i}(x|t)\mid S_{i}\right]\right\}.\tag{50}
\end{align}

WLOG, we can simply let $i=1$.

For the term $\mathbb{E}\left\{\text{Var}\left[\widehat{f}_{1}(x|t)\mid S_{1}\right]\right\}$ on the RHS of (50): for any $s\in\mathcal{S}$, by (46) in Lemma 9, we have

\begin{align*}
\text{Var}\left[\widehat{f}_{1}(x|t)\mid S_{1}=s\right]=\text{Var}\left[\widehat{f}_{1}(x|t;s)\right]\leq\frac{M_{\epsilon}}{h_{X}^{2}}\sum_{j=1}^{m_{1}}\left[\frac{K\left(\frac{d_{T}(t_{1,j},t)}{h_{T}}\right)}{\sum_{j^{\prime}=1}^{m_{1}}K\left(\frac{d_{T}(t_{1,j^{\prime}},t)}{h_{T}}\right)}\right]^{2}\int_{x\in\mathbb{R}^{2}}K^{2}(x)\,dx.
\end{align*}

So, by Assumption 4 (boundedness of $\int_{x\in\mathbb{R}^{2}}K^{2}(x)\,dx$), we have

\begin{align}
\sum_{i=1}^{n}\alpha_{i}(t)^{2}\,\mathbb{E}\left\{\text{Var}\left[\widehat{f}_{i}(x|t)\mid S_{i}\right]\right\}&=\sum_{i=1}^{n}\alpha_{i}(t)^{2}\int_{s\in\mathcal{S}}\text{Var}\left[\widehat{f}_{i}(x|t;s)\right]dF_{S}(s)\tag{51}\\
&=O\left(\frac{1}{h_{X}^{2}}\sum_{i=1}^{n}\alpha_{i}(t)^{2}\sum_{j=1}^{m_{i}}\left[\frac{K\left(\frac{d_{T}(t_{i,j},t)}{h_{T}}\right)}{\sum_{j^{\prime}=1}^{m_{i}}K\left(\frac{d_{T}(t_{i,j^{\prime}},t)}{h_{T}}\right)}\right]^{2}\right).\nonumber
\end{align}

For the term $\text{Var}\left\{\mathbb{E}\left[\widehat{f}_{1}(x|t)\mid S_{1}\right]\right\}$ on the RHS of (50): for any $s\in\mathcal{S}$, since $f_{\epsilon}$ is bounded by some constant $M_{\epsilon}$ (Assumption 3), we have

\begin{align*}
\mathbb{E}\left[\widehat{f}_{1}(x|t)\mid S_{1}=s\right]
&=\frac{1}{h_{X}^{2}}\sum_{j=1}^{m_{1}}\frac{K\left(\frac{d_{T}(t_{1,j},t)}{h_{T}}\right)}{\sum_{j^{\prime}=1}^{m_{1}}K\left(\frac{d_{T}(t_{1,j^{\prime}},t)}{h_{T}}\right)}\int_{\epsilon_{1,j}\in\mathbb{R}^{2}}K\left(\frac{s(t_{1,j})+\epsilon_{1,j}-x}{h_{X}}\right)f_{\epsilon}(\epsilon_{1,j})\,d\epsilon_{1,j}\\
&=\sum_{j=1}^{m_{1}}\frac{K\left(\frac{d_{T}(t_{1,j},t)}{h_{T}}\right)}{\sum_{j^{\prime}=1}^{m_{1}}K\left(\frac{d_{T}(t_{1,j^{\prime}},t)}{h_{T}}\right)}\int_{y_{1,j}\in\mathbb{R}^{2}}K(y_{1,j})\,f_{\epsilon}(h_{X}y_{1,j}+x-s(t_{1,j}))\,dy_{1,j}\\
&\leq M_{\epsilon}.
\end{align*}

Hence,

\begin{align}
\sum_{i=1}^{n}\alpha_{i}(t)^{2}\,\text{Var}\left\{\mathbb{E}\left[\widehat{f}_{i}(x|t)\mid S_{i}\right]\right\}=O\left(\sum_{i=1}^{n}\alpha_{i}(t)^{2}\right).\tag{52}
\end{align}

Combining (51) and (52), we have

\begin{align}
\widehat{f}(x|t)-\mathbb{E}\left(\widehat{f}(x|t)\right)=O_{p}\left(\sqrt{\sum_{i=1}^{n}\alpha_{i}(t)^{2}}\right)+O_{p}\left(\frac{1}{h_{X}}\sqrt{\sum_{i=1}^{n}\alpha_{i}(t)^{2}\sum_{j=1}^{m_{i}}\left[\frac{K\left(\frac{d_{T}(t_{i,j},t)}{h_{T}}\right)}{\sum_{j^{\prime}=1}^{m_{i}}K\left(\frac{d_{T}(t_{i,j^{\prime}},t)}{h_{T}}\right)}\right]^{2}}\right).\tag{53}
\end{align}

Combining (49) and (53), we get

\begin{align*}
\widehat{f}(x|t)-f_{GPS}(x|t)=&\,O(h_{X}^{2})+O\left(\sum_{i=1}^{n}\alpha_{i}(t)\sum_{j=1}^{m_{i}}\frac{K\left(\frac{d_{T}(t_{i,j},t)}{h_{T}}\right)}{\sum_{j^{\prime}=1}^{m_{i}}K\left(\frac{d_{T}(t_{i,j^{\prime}},t)}{h_{T}}\right)}d_{T}(t_{i,j},t)\right)+O_{p}\left(\sqrt{\sum_{i=1}^{n}\alpha_{i}(t)^{2}}\right)\\
&+O_{p}\left(\frac{1}{h_{X}}\sqrt{\sum_{i=1}^{n}\alpha_{i}(t)^{2}\sum_{j=1}^{m_{i}}\left[\frac{K\left(\frac{d_{T}(t_{i,j},t)}{h_{T}}\right)}{\sum_{j^{\prime}=1}^{m_{i}}K\left(\frac{d_{T}(t_{i,j^{\prime}},t)}{h_{T}}\right)}\right]^{2}}\right).
\end{align*}

\Box