
Alexander G. Tartakovsky, AGT StatConsult, Los Angeles, California, USA, e-mail: alexg.tartakovsky@gmail.com
Valentin Spivak, MIPT, Russia, e-mail: valyas.spivak@gmail.com

Quickest Changepoint Detection in General Multistream Stochastic Models: Recent Results, Applications and Future Challenges

Alexander G Tartakovsky and Valentin Spivak
Abstract

Modern information systems generate large volumes of data with anomalies that occur at unknown points in time and have to be detected quickly and reliably with low false alarm rates. The paper develops a general theory of quickest multistream detection in non-i.i.d. stochastic models when a change may occur in a set of multiple data streams. The first part of the paper focuses on the asymptotic quickest detection theory. Nearly optimal pointwise detection strategies that minimize the expected detection delay are proposed and analyzed when the false alarm rate is low. The general theory is illustrated in several examples. In the second part, we discuss challenging applications associated with the rapid detection of new COVID waves and the appearance of near-Earth space objects. Finally, we discuss certain open problems and future challenges.

1 Introduction

The problem of changepoint detection in multiple data streams (sensors, populations, or multichannel systems) arises in numerous applications that include but are not limited to the medical sphere (detection of an epidemic present in only a fraction of hospitals chang ; frisen-sqa09 ; bock ; tsui-iiet12 ); environmental monitoring (detection of the presence of hazardous materials or intruders fie ; mad ); military defense (detection of an unknown number of targets by multichannel sensor systems Bakutetal-book63 ; Tartakovsky&Brown-IEEEAES08 ); near-Earth space informatics (detection of debris and satellites with telescopes BerenkovTarKol_EnT2020 ; KolessaTartakovskyetal-IEEEAES2020 ; TartakovskyetalIEEESP2021 ); cyber security (detection of attacks in computer networks szor ; Tartakovsky-Cybersecurity14 ; Tartakovskyetal-SM06 ; Tartakovskyetal-IEEESP06 ); detection of malicious activity in social networks Raghavanetal-AoAS2013 ; Raghavanetal-IEEECSS2014 , to name a few.

In many change detection applications, the pre-change (baseline or in-control) distribution of the observed data is known, but the post-change (out-of-control) distribution is not completely known. As discussed in (Tartakovsky_book2020, Ch. 3 and 6), there are three conventional approaches in this case: (i) select a representative value of the post-change parameter and apply efficient detection rules tuned to this value, such as the Shiryaev rule, the Shiryaev–Roberts rule, or CUSUM; (ii) select a mixing measure over the parameter space and apply mixture-type rules; (iii) estimate the parameter and apply adaptive schemes. In Chapters 4 and 6 of Tartakovsky_book2020, the single-stream case was considered. In this paper, we consider a more general case where the change occurs in multiple data streams and the number and location of the affected data streams are unknown.

To be more specific, suppose there are $N$ data streams observed sequentially in time, subject to a change at an unknown point in time $\nu \geq 0$, so that the data up to time $\nu$ are generated by one stochastic model and from time $\nu+1$ onward by another model. The change in distributions may occur in a subset of streams of size at most $K$, $1 \leq K \leq N$, where $K$ is the assumed maximal number of streams that can be affected, which can be substantially smaller than $N$. A sequential detection procedure is a stopping time $T$ with respect to the observed sequence. A false alarm is raised when the detection is declared before the change occurs. One wants to detect the change with as small a delay as possible while controlling the false alarm rate.

We consider a fairly general stochastic model assuming that the observations may be dependent and non-identically distributed (non-i.i.d.) before and after the change and that streams may be mutually dependent.

In the case of i.i.d. observations (in the pre-change and post-change modes, with different distributions), this problem was considered in felsokIEEEIT2016 ; Mei-B2010 ; TartakovskyIEEECDC05 ; TNB_book2014 ; Tartakovskyetal-SM06 ; Xie&Siegmund-AS13. Specifically, in the case of a known post-change parameter and $K = 1$ (i.e., when only one stream can be affected, but it is unknown which one), Tartakovsky TartakovskyIEEECDC05 proposed a multi-chart CUSUM procedure that raises an alarm when one of the partial CUSUM statistics exceeds a threshold. This procedure is very simple, but it is not optimal and performs poorly when many data streams are affected. To avoid this drawback, Mei Mei-B2010 suggested a SUM-CUSUM rule based on the sum of the CUSUM statistics in the streams and evaluated its first-order performance, showing that this detection scheme is first-order asymptotically minimax, i.e., it minimizes the maximal expected delay to detection as the average run length to false alarm approaches infinity. Fellouris and Sokolov felsokIEEEIT2016 suggested more efficient generalized and mixture-based CUSUM rules that are second-order minimax. Xie and Siegmund Xie&Siegmund-AS13 considered a particular Gaussian model with an unknown post-change mean. They suggested a rule that combines mixture likelihood ratios, which incorporate an assumption about the proportion of affected data streams, with the generalized CUSUM statistics in the streams, and then adds up the resulting local statistics. They also performed a detailed asymptotic analysis of the proposed detection procedure in terms of the average run length to false alarm and the expected delay, as well as Monte Carlo simulations. Chan Chan-AS2017 studied a version of the mixture likelihood ratio rule for detecting a change in the mean of a normal population assuming independence of the data streams, established its asymptotic optimality in a minimax setting, and examined the dependence of the operating characteristics on the fraction of affected streams.

In the present paper, we consider a Bayesian problem with a general prior distribution of the change point and multiple data streams with an unknown pattern, i.e., when the size and location of the affected streams are unknown. It is assumed that the observations can be dependent and non-identically distributed within data streams and even across the streams. Furthermore, in contrast to most previous publications, which considered asymptotically stationary models, we consider models that are substantially non-stationary, even asymptotically. We address two scenarios: one where the pre- and post-change distributions are completely known, and one with parametric uncertainty, where the post-change distribution is known up to an unknown parameter. We introduce mixture detection procedures that mix the Shiryaev–Roberts-type statistic over the distributions of the unknown pattern and the unknown post-change parameter (in the case of prior uncertainty). The resulting statistics are then compared to appropriate thresholds.

The paper is organized as follows. In Section 2, we present a general theory for very general stochastic models, providing sufficient conditions under which the suggested detection procedures are first-order asymptotically optimal. In Section 3, we provide illustrative examples. In Section 4, we evaluate the performance of proposed mixture detection procedures using Monte Carlo simulations for the non-stationary Gaussian model. In Section 5, theoretical results are applied to rapid detection of the COVID-19 outbreak in Australia based on monitoring the percentage of infections in the total population as well as to rapid detection and extraction of faint near-Earth space objects with telescopes. Section 6 concludes the paper with several remarks, including future research challenges.

2 Asymptotic Theory of Multistream Quickest Change Detection for General Non-i.i.d. Models

In this section, we develop the quickest detection theory in the general multistream non-i.i.d. scenario. We design mixture-based change detection procedures which are nearly optimal in the class of change detection procedures with the prespecified average (weighted) probability of false alarm when this probability is small, assuming that the change point is random with a given prior distribution.

2.1 A General Multistream Model and Basic Notations

We begin with a preliminary description of the scenario of interest and general notation.

Consider the multistream scenario where the observations $\mathbf{X} = (X(1), \ldots, X(N))$ are sequentially acquired in $N$ streams, i.e., in the $i$-th stream one observes a sequence $X(i) = \{X_n(i)\}_{n \geq 1}$, where $i \in \mathcal{N} := \{1, \ldots, N\}$. The observations are subject to a change at an unknown time $\nu \in \{0, 1, 2, \dots\}$, so that $X_1(i), \dots, X_\nu(i)$ are generated by one stochastic model and $X_{\nu+1}(i), X_{\nu+2}(i), \dots$ by another model when the change occurs in the $i$-th stream. The change in distributions happens in a subset of streams $\mathsf{B} \subseteq \{1, \dots, N\}$ of cardinality $1 \leq |\mathsf{B}| \leq K \leq N$, where $K$ is the assumed maximal number of streams that can be affected, which can be, and often is, substantially smaller than $N$. A sequential detection rule is a stopping time $T$ with respect to the observed sequence $\{\mathbf{X}_n\}_{n \geq 1}$, $\mathbf{X}_n = (X_n(1), \dots, X_n(N))$, i.e., $T$ is an integer-valued random variable such that the event $\{T = n\}$ belongs to the sigma-algebra $\mathscr{F}_n = \sigma(\mathbf{X}_1, \dots, \mathbf{X}_n)$ generated by the observations $\mathbf{X}_1, \dots, \mathbf{X}_n$.

The observations may have a very general stochastic structure. Specifically, let $\mathbf{X}^n(i) = (X_1(i), \dots, X_n(i))$ denote the sample of size $n$ in the $i$-th stream and let $\{f_{\theta_i,n}(X_n(i) \mid \mathbf{X}^{n-1}(i))\}_{n \geq 1}$, $\theta_i \in \Theta_i$, be a parametric family of conditional densities of $X_n(i)$ given $\mathbf{X}^{n-1}(i)$. When $\nu = \infty$ (there is no change), the parameter $\theta_i$ equals the known value $\theta_{i,0}$, i.e., $f_{\theta_i,n}(X_n(i) \mid \mathbf{X}^{n-1}(i)) = f_{\theta_{i,0},n}(X_n(i) \mid \mathbf{X}^{n-1}(i))$ for all $n \geq 1$; when $\nu = k < \infty$, then $\theta_i = \theta_{i,1} \neq \theta_{i,0}$, i.e., $f_{\theta_i,n}(X_n(i) \mid \mathbf{X}^{n-1}(i)) = f_{\theta_{i,0},n}(X_n(i) \mid \mathbf{X}^{n-1}(i))$ for $n \leq k$ and $f_{\theta_i,n}(X_n(i) \mid \mathbf{X}^{n-1}(i)) = f_{\theta_{i,1},n}(X_n(i) \mid \mathbf{X}^{n-1}(i))$ for $n > k$. Not only the change point $\nu$ but also the subset $\mathsf{B}$, its size $|\mathsf{B}|$, and the post-change parameters $\theta_{i,1}$ are unknown. Further details, with a certain change of notation, are given below.

Let $\mathsf{P}_\infty$ denote the probability measure corresponding to the sequence of observations $\{\mathbf{X}_n\}_{n \geq 1}$ from all $N$ streams when there is never a change ($\nu = \infty$) in any of the components and, for $k = 0, 1, \dots$ and $\mathsf{B} \in \mathscr{P}$, let $\mathsf{P}_{k,\mathsf{B}}$ denote the measure corresponding to the sequence $\{\mathbf{X}_n\}_{n \geq 1}$ when $\nu = k < \infty$ and the change occurs in a subset $\mathsf{B}$ of the class $\mathscr{P}$ (i.e., $X_{\nu+1}(i)$, $i \in \mathsf{B}$, is the first post-change observation). By $\mathsf{H}_\infty : \nu = \infty$ we denote the hypothesis that the change never occurs and by $\mathsf{H}_{k,\mathsf{B}}$ the hypothesis that the change occurs at time $0 \leq k < \infty$ in the subset of streams $\mathsf{B} \in \mathscr{P}$. The class $\mathscr{P}$ is a class of subsets of $\mathcal{N}$ that incorporates available prior information regarding the subset $\mathsf{B}$ where the change may occur. For example, in applications it is frequently known that at most $K$ streams can be affected, in which case $\mathscr{P} = \mathscr{P}_K = \{\mathsf{B} \subset \mathcal{N} : 1 \leq |\mathsf{B}| \leq K\}$. Hereafter $|\mathsf{B}|$ denotes the size of a subset $\mathsf{B}$ (the number of affected streams under $\mathsf{H}_{k,\mathsf{B}}$) and $|\mathscr{P}|$ denotes the size of the class $\mathscr{P}$ (the number of possible alternatives in $\mathscr{P}$). Note that $|\mathscr{P}|$ takes its maximum value when there is no prior information regarding the subset of affected streams, i.e., when $\mathscr{P} = \mathscr{P}_N$, in which case $|\mathscr{P}| = 2^N - 1$.

The problem is to detect the change as soon as possible after it occurs regardless of the subset $\mathsf{B}$, i.e., we are interested in detecting the event $\cup_{\mathsf{B} \in \mathscr{P}} \mathsf{H}_{k,\mathsf{B}}$ that the change has occurred in some subset, but not in identifying the subset of streams where it occurred.

Write $\mathbf{X}^n(i) = (X_1(i), \dots, X_n(i))$ for the concatenation of the first $n$ observations from the $i$-th data stream and $\mathbf{X}^n = (\mathbf{X}_1, \dots, \mathbf{X}_n)$ for the concatenation of the first $n$ observations from all $N$ data streams. Let $\{g(\mathbf{X}_n \mid \mathbf{X}^{n-1})\}_{n \in \mathbb{Z}_+}$ and $\{f_{\mathsf{B}}(\mathbf{X}_n \mid \mathbf{X}^{n-1})\}_{n \in \mathbb{Z}_+}$ be sequences of conditional densities of $\mathbf{X}_n$ given $\mathbf{X}^{n-1}$, which may depend on $n$, i.e., $g = g_n$ and $f_{\mathsf{B}} = f_{\mathsf{B},n}$; we omit the subscript $n$ for brevity. For the general non-i.i.d. changepoint model, the joint density $p(\mathbf{X}^n \mid \mathsf{H}_{k,\mathsf{B}})$ under hypothesis $\mathsf{H}_{k,\mathsf{B}}$ can be written as follows:

$$
p(\mathbf{X}^n \mid \mathsf{H}_{k,\mathsf{B}}) =
\begin{cases}
\prod_{t=1}^{n} g(\mathbf{X}_t \mid \mathbf{X}^{t-1}) & \text{for}~~\nu = k \geq n, \\
\prod_{t=1}^{k} g(\mathbf{X}_t \mid \mathbf{X}^{t-1}) \times \prod_{t=k+1}^{n} f_{\mathsf{B}}(\mathbf{X}_t \mid \mathbf{X}^{t-1}) & \text{for}~~\nu = k < n,
\end{cases}
\tag{1}
$$

where $\mathsf{B} \in \mathscr{P}$. Therefore, $g(\mathbf{X}_n \mid \mathbf{X}^{n-1})$ is the pre-change conditional density and $f_{\mathsf{B}}(\mathbf{X}_n \mid \mathbf{X}^{n-1})$ is the post-change conditional density given that the change occurs in the subset $\mathsf{B}$.

In most practical applications, the post-change distribution is not completely known: it depends on an unknown (generally multidimensional) parameter $\theta \in \Theta$, so the model (1) may be treated only as a benchmark for the more practical case where the post-change densities $f_{\mathsf{B}}(\mathbf{X}_t \mid \mathbf{X}^{t-1})$ are replaced by $f_{\mathsf{B},\theta}(\mathbf{X}_t \mid \mathbf{X}^{t-1})$, i.e.,

$$
p(\mathbf{X}^n \mid \mathsf{H}_{k,\mathsf{B}}, \theta) = \prod_{t=1}^{k} g(\mathbf{X}_t \mid \mathbf{X}^{t-1}) \times \prod_{t=k+1}^{n} f_{\mathsf{B},\theta}(\mathbf{X}_t \mid \mathbf{X}^{t-1}) \quad \text{for}~~\nu = k < n. \tag{2}
$$

Notice that the probabilistic models given by (1) and (2) are very general and do not assume that the data streams are mutually independent.
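To make models (1) and (2) concrete, here is a minimal simulation sketch (ours, not from the paper) for the simplest special case: mutually independent Gaussian streams in which the change adds a mean shift $\theta$ to the affected subset $\mathsf{B}$. All names and parameter values are hypothetical illustrations.

```python
import numpy as np

def simulate_multistream(N=10, n=300, nu=150, affected=(2, 5), theta=0.7, seed=0):
    """Simulate model (2) in the special case of mutually independent
    N(0,1) streams: the streams in `affected` (the subset B) acquire
    mean `theta` starting from observation nu+1.
    Rows index time, columns index streams."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, N))    # pre-change: i.i.d. N(0,1) everywhere
    X[nu:, list(affected)] += theta    # post-change: N(theta,1) in streams of B
    return X
```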

2.2 Optimality Criterion

In the sequel, we assume that the change point $\nu$ is a random variable independent of the observations with prior distribution $\pi_k = \mathsf{P}(\nu = k)$, $k = 0, 1, 2, \dots$, with $\pi_k > 0$ for $k \in \{0, 1, 2, \dots\} = \mathbb{Z}_+$. The change point may also take negative values, but the detailed structure of the distribution $\mathsf{P}(\nu = k)$ for $k = -1, -2, \dots$ is not important: only the total probability $\pi_{-1} = \mathsf{P}(\nu \leq -1)$ of the change being in effect before the observations become available matters.

Let $\mathsf{E}_{k,\mathsf{B},\theta}$ and $\mathsf{E}_\infty$ denote expectations under $\mathsf{P}_{k,\mathsf{B},\theta}$ and $\mathsf{P}_\infty$, respectively, where $\mathsf{P}_{k,\mathsf{B},\theta}$ corresponds to model (2) with an unknown parameter $\theta \in \Theta$. Define the probability measure on the Borel $\sigma$-algebra $\mathscr{B}$ in $\mathbb{R}^\infty \times \mathbb{N}$ as

$$
\mathsf{P}^\pi_{\mathsf{B},\theta}(\mathcal{A} \times \mathcal{K}) = \sum_{k \in \mathcal{K}} \pi_k \, \mathsf{P}_{k,\mathsf{B},\theta}(\mathcal{A}), \quad \mathcal{A} \in \mathscr{B}(\mathbb{R}^\infty), ~~ \mathcal{K} \subseteq \mathbb{N}.
$$

Under the measure $\mathsf{P}^\pi_{\mathsf{B},\theta}$, the change point $\nu$ has distribution $\pi = \{\pi_k\}$ and the model for the observations is of the form (2), i.e., given $\nu = k$, $\mathbf{X}_t$ has conditional density $g(\mathbf{X}_t \mid \mathbf{X}^{t-1})$ if $t \leq k$ and conditional density $f_{\mathsf{B},\theta}(\mathbf{X}_t \mid \mathbf{X}^{t-1})$ if $t > k$ and the change occurs in the subset $\mathsf{B}$ with parameter $\theta$. Let $\mathsf{E}^\pi_{\mathsf{B},\theta}$ denote expectation under $\mathsf{P}^\pi_{\mathsf{B},\theta}$.

For the prior distribution of the change point $\pi = \{\pi_k\}_{k \geq -1}$, introduce the average (weighted) probability of false alarm associated with the change detection procedure $T$:

$$
\mathsf{PFA}_\pi(T) = \mathsf{P}^\pi_{\mathsf{B},\theta}(T \leq \nu) = \sum_{k=0}^{\infty} \pi_k \mathsf{P}_\infty(T \leq k), \tag{3}
$$

which corresponds to the risk due to a false alarm. Note that here we took into account that $\mathsf{P}_{k,\mathsf{B},\theta}(T \leq k) = \mathsf{P}_\infty(T \leq k)$, since the event $\{T \leq k\}$ depends only on the observations $\mathbf{X}_1, \dots, \mathbf{X}_k$ generated by the pre-change probability measure $\mathsf{P}_\infty$ (recall that by our convention $\mathbf{X}_k$ is the last pre-change observation if $\nu = k$).

For $\nu = k \in \mathbb{Z}_+$, $\mathsf{B} \in \mathscr{P}$, and $\theta \in \Theta$, the risk associated with the detection delay is measured by the conditional expected delay to detection

$$
\mathsf{EDD}_{k,\mathsf{B},\theta}(T) = \mathsf{E}_{k,\mathsf{B},\theta}\left[T - k \mid T > k\right]. \tag{4}
$$

Note that if the change occurs before the observations become available, i.e., $k \in \{-1, -2, \dots\}$, then $\mathsf{EDD}_{k,\mathsf{B},\theta}(T) = \mathsf{E}_{k,\mathsf{B},\theta}[T] \equiv \mathsf{E}_{0,\mathsf{B},\theta}[T]$ since $T \geq 0$ with probability one.

Next, for the prior distribution $\pi$ and $\alpha \in (0,1)$, define the Bayesian class of changepoint detection procedures with the weighted probability of false alarm $\mathsf{PFA}_\pi(T) = \mathsf{P}^\pi_{\mathsf{B},\theta}(T \leq \nu)$ not greater than a prescribed number $\alpha$:

$$
\mathbb{C}_\pi(\alpha) = \{T \in \mathcal{M} : \mathsf{PFA}_\pi(T) \leq \alpha\} = \left\{T \in \mathcal{M} : \sum_{k=0}^{\infty} \pi_k \mathsf{P}_\infty(T \leq k) \leq \alpha\right\}, \tag{5}
$$

where $\mathcal{M}$ stands for the totality of Markov times.

In this section, we are interested in the uniform Bayesian constrained optimization criterion

$$
\inf_{\{T : \mathsf{PFA}_\pi(T) \leq \alpha\}} \mathsf{EDD}_{k,\mathsf{B},\theta}(T) \quad \text{for all}~k \in \mathbb{Z}_+,~\mathsf{B} \in \mathscr{P}~\text{and}~\theta \in \Theta.
$$

However, this problem is intractable for arbitrary values of $\alpha \in (0,1)$. For this reason, we consider the following first-order asymptotic problem as the given PFA $\alpha$ approaches zero: find a change detection procedure $T^*$ that minimizes the expected detection delay $\mathsf{EDD}_{k,\mathsf{B},\theta}(T)$ asymptotically to first order as $\alpha \to 0$, uniformly over all possible values of $k \in \mathbb{Z}_+$, subsets $\mathsf{B} \in \mathscr{P}$, and $\theta \in \Theta$. That is, our goal is to design a detection procedure $T^*$ such that, as $\alpha \to 0$,

$$
\inf_{T \in \mathbb{C}_\pi(\alpha)} \mathsf{EDD}_{k,\mathsf{B},\theta}(T) = \mathsf{EDD}_{k,\mathsf{B},\theta}(T^*)(1 + o(1)) ~~ \forall~\mathsf{B} \in \mathscr{P},~\theta \in \Theta,~\nu = k \in \mathbb{Z}_+, \tag{6}
$$

where $\mathbb{C}_\pi(\alpha)$ is the class, defined in (5), of detection procedures for which the PFA does not exceed a prescribed number $\alpha \in (0,1)$, and $o(1) \to 0$ as $\alpha \to 0$.

2.3 Multistream Changepoint Detection Procedures

We begin by considering the most general scenario where the observations across streams are dependent.

Parametric Prior Uncertainty

Let $\mathcal{L}_{\mathsf{B},\theta}(n) = f_{\mathsf{B},\theta}(\mathbf{X}_n \mid \mathbf{X}^{n-1}) / g(\mathbf{X}_n \mid \mathbf{X}^{n-1})$. Note that in the general non-i.i.d. case the statistic $\mathcal{L}_{\mathsf{B},\theta}(n) = \mathcal{L}^{(k)}_{\mathsf{B},\theta}(n)$ depends on the change point $\nu = k$, since the post-change density $f_{\mathsf{B},\theta}(\mathbf{X}_n \mid \mathbf{X}^{n-1}) = f^{(k)}_{\mathsf{B},\theta}(\mathbf{X}_n \mid \mathbf{X}^{n-1})$ may depend on $k$. The likelihood ratio (LR) of the hypothesis $\mathsf{H}_{k,\mathsf{B}}$ that the change occurs at $\nu = k$ in the subset of streams $\mathsf{B}$ against the no-change hypothesis $\mathsf{H}_\infty$, based on the sample $\mathbf{X}^n = (\mathbf{X}_1, \dots, \mathbf{X}_n)$, is given by the product

$$
LR_{\mathsf{B},\theta}(k, n) = \prod_{t=k+1}^{n} \mathcal{L}_{\mathsf{B},\theta}(t), \quad n > k,
$$

and we set $LR_{\mathsf{B},\theta}(k, n) = 1$ for $n \leq k$.

For $\mathsf{B} \in \mathscr{P}$ and $\theta \in \Theta$, define the generalized Shiryaev–Roberts (SR) statistic

$$
R_{\mathsf{B},\theta}(n) = r\, LR_{\mathsf{B},\theta}(0, n) + \sum_{k=0}^{n-1} LR_{\mathsf{B},\theta}(k, n), \quad n \geq 1, \tag{7}
$$

with the initial condition (a non-negative head-start) $R_{\mathsf{B},\theta}(0) = r$, $r \geq 0$.
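As an aside (our sketch, not part of the general theory), when the LR increments $\mathcal{L}_{\mathsf{B},\theta}(t)$ do not depend on the change point $k$, as in the i.i.d. case, the statistic (7) obeys the well-known recursion $R_{\mathsf{B},\theta}(n) = \left(1 + R_{\mathsf{B},\theta}(n-1)\right)\mathcal{L}_{\mathsf{B},\theta}(n)$ with $R_{\mathsf{B},\theta}(0) = r$, which can be computed as follows:

```python
import numpy as np

def sr_statistic(llr_inc, r=0.0):
    """Generalized SR statistic (7) for a fixed pattern B and parameter theta.

    llr_inc[t-1] = log L_{B,theta}(t), t = 1..n.  The recursion
    R(n) = (1 + R(n-1)) * L(n) reproduces (7) only when the increments
    do not depend on the change point k; otherwise the sum over k in (7)
    must be accumulated explicitly.  Returns the array (R(1), ..., R(n))."""
    R, prev = np.empty(len(llr_inc)), r
    for t, z in enumerate(llr_inc):
        prev = (1.0 + prev) * np.exp(z)
        R[t] = prev
    return R
```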

Now, introduce the probability mass function (mixing measure)

$$
\mathbf{p} = \left\{p_{\mathsf{B}},\ \mathsf{B} \in \mathscr{P}\right\}, \quad p_{\mathsf{B}} > 0 ~~ \forall~\mathsf{B} \in \mathscr{P}, \quad \sum_{\mathsf{B} \in \mathscr{P}} p_{\mathsf{B}} = 1, \tag{8}
$$

where $p_{\mathsf{B}}$ is the prior probability of the change being in effect in the set of streams $\mathsf{B}$, and also the mixing probability measure

$$
\mathbf{W} = \{W(\theta),\ \theta \in \Theta\}, \quad \int_\Theta \mathrm{d}W(\theta) = 1. \tag{9}
$$

For a fixed value of $\theta$, introduce the mixture statistic

$$
R_{\mathbf{p},\theta}(n) = \sum_{\mathsf{B} \in \mathscr{P}} p_{\mathsf{B}} R_{\mathsf{B},\theta}(n) = r\, \Lambda_{\mathbf{p},\theta}(0, n) + \sum_{k=0}^{n-1} \Lambda_{\mathbf{p},\theta}(k, n), ~~ n \geq 1, ~~ R_{\mathbf{p},\theta}(0) = r, \tag{10}
$$

where

$$
\Lambda_{\mathbf{p},\theta}(k, n) = \sum_{\mathsf{B} \in \mathscr{P}} p_{\mathsf{B}} LR_{\mathsf{B},\theta}(k, n) \tag{11}
$$

is the mixture LR.

As discussed in Tartakovsky_book2020 ; TNB_book2014, when the parameter $\theta$ is unknown there are two main conventional approaches: either to estimate $\theta$ (say, by maximization) or to average (mix) over $\theta$. Using the mixing measure (prior distribution) $\mathbf{W} = \{W(\theta), \theta \in \Theta\}$ given in (9), define the double LR-mixture

$$
\Lambda_{\mathbf{p},W}(k, n) = \int_\Theta \sum_{\mathsf{B} \in \mathscr{P}} p_{\mathsf{B}} LR_{\mathsf{B},\theta}(k, n)\, \mathrm{d}W(\theta) = \int_\Theta \Lambda_{\mathbf{p},\theta}(k, n)\, \mathrm{d}W(\theta), \quad k < n, \tag{12}
$$

and the double-mixture statistic

$$
R_{\mathbf{p},W}(n) = \int_\Theta \sum_{\mathsf{B} \in \mathscr{P}} p_{\mathsf{B}} R_{\mathsf{B},\theta}(n)\, \mathrm{d}W(\theta) = r\, \Lambda_{\mathbf{p},W}(0, n) + \sum_{k=0}^{n-1} \Lambda_{\mathbf{p},W}(k, n), ~~ n \geq 1, ~~ R_{\mathbf{p},W}(0) = r \tag{13}
$$

(with a non-negative head-start $r$).

The corresponding double-mixture LR-based detection procedure is given by the stopping rule that is the first time $n \geq 1$ at which the statistic $R_{\mathbf{p},W}(n)$ exceeds the level $A > 0$:

$$
T_A^{\mathbf{p},W} = \inf\left\{n \geq 1 : R_{\mathbf{p},W}(n) \geq A\right\}. \tag{14}
$$
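For illustration, the following minimal sketch (ours) implements the stopping rule (14) for the independent Gaussian example above, with $\mathscr{P} = \mathscr{P}_K$, a uniform pattern prior $p_{\mathsf{B}} = 1/|\mathscr{P}_K|$, and $W$ approximated by weights `w` on a finite grid `theta_grid`. The per-$(\mathsf{B},\theta)$ SR recursion is valid here because the LR increments do not depend on the change point.

```python
import numpy as np
from itertools import combinations

def double_mixture_sr_stop(X, theta_grid, w, K=1, A=1e4, r=0.0):
    """Double-mixture SR rule (14) for independent N(0,1) streams whose
    post-change mean is theta (a sketch; W is discretized as weights w)."""
    n, N = X.shape
    patterns = [B for m in range(1, K + 1) for B in combinations(range(N), m)]
    p_B = 1.0 / len(patterns)                         # uniform mixing over patterns
    R = np.full((len(patterns), len(theta_grid)), r)  # R_{B,theta}(0) = r
    for t in range(n):
        # per-stream, per-theta Gaussian LR increments exp(theta*x - theta^2/2)
        lr = np.exp(np.outer(X[t], theta_grid) - 0.5 * theta_grid ** 2)
        for b, B in enumerate(patterns):
            R[b] = (1.0 + R[b]) * lr[list(B)].prod(axis=0)
        if p_B * (R.sum(axis=0) @ w) >= A:            # statistic (13) crosses A
            return t + 1                              # alarm at time n = t + 1
    return None                                       # no alarm within the sample
```

Note that $|\mathscr{P}_K|$ grows rapidly with $K$, so for large $N$ one would exploit the product structure across independent streams rather than enumerate patterns.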

The main result of our theory is that this changepoint detection procedure is first-order asymptotically optimal under certain very general conditions if the threshold $A = A_\alpha$ is adequately selected, i.e., the asymptotic equality (6) holds with $T^* = T_A^{\mathbf{p},W}$.

Known Parameters of the Post-Change Distribution

If the value of the post-change parameter $\theta$ is known, or a putative value representing a nominal change is of special interest, then it is reasonable to turn the double-mixture procedure $T_A^{\mathbf{p},W}$ into the single-mixture procedure $T_A^{\mathbf{p},\theta}$ by taking the degenerate weight function $W(\vartheta)$ concentrated at $\vartheta = \theta$, i.e.,

$$
T_A^{\mathbf{p},\theta} = \inf\left\{n \geq 1 : R_{\mathbf{p},\theta}(n) \geq A\right\}, \tag{15}
$$

and ask whether it has first-order asymptotic optimality properties at the point $\theta$, i.e., whether under certain conditions the asymptotic formula (6) holds for $T^* = T_A^{\mathbf{p},\theta}$.

2.4 Asymptotic Optimality of Mixture-Based Detection Procedures

Basic Conditions

While we consider a general prior and a very general stochastic model for the observations in streams and between streams, to study asymptotic optimality properties we still need to impose certain constraints on the prior distribution $\pi = \{\pi_k\}$ and on the general stochastic model (1) that guarantee asymptotic stability of the detection statistics as the sample size increases.

Regarding the prior distribution, we will assume that the following condition holds:

$$
\lim_{n\to\infty} \frac{1}{n}\left|\log \mathsf{P}(\nu > n)\right| = \beta ~~ \text{for some}~\beta \geq 0.
$$

However, this condition, while quite general, does not cover the case where $\beta$ is positive but may go to zero. Indeed, distributions with an exponential right tail that satisfy this condition with $\beta > 0$ do not converge, as $\beta \to 0$, to heavy-tailed distributions with $\beta = 0$. For this reason, assertions for heavy-tailed distributions with $\beta = 0$ cannot be recovered by letting $\beta \to 0$ at an arbitrary rate; the rate must be matched with the PFA constraint $\alpha$, $\alpha \to 0$. Hence, in what follows we consider the scenario where the prior distribution $\pi^\alpha = \{\pi_k^\alpha\}$ depends on the PFA constraint $\alpha$ and modify the above condition as follows.

$\mathbf{CP}_1$. For some $\beta_\alpha \geq 0$ such that $\beta_\alpha \to 0$ as $\alpha \to 0$,

$$
\lim_{n\to\infty} \frac{1}{n}\left|\log \sum_{k=n+1}^{\infty} \pi_k^\alpha\right| = \beta_\alpha.
$$
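A standard example satisfying $\mathbf{CP}_1$ is the geometric prior $\pi_k^\alpha = \rho_\alpha(1-\rho_\alpha)^k$, $k \in \mathbb{Z}_+$, with parameter $\rho_\alpha \in (0,1)$: here

$$
\sum_{k=n+1}^{\infty}\pi_k^\alpha = (1-\rho_\alpha)^{n+1}, \qquad \frac{1}{n}\left|\log\sum_{k=n+1}^{\infty}\pi_k^\alpha\right| \xrightarrow[n\to\infty]{} |\log(1-\rho_\alpha)| = \beta_\alpha,
$$

so condition $\mathbf{CP}_1$ holds with $\beta_\alpha \to 0$ whenever $\rho_\alpha \to 0$ as $\alpha \to 0$.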

For $\mathsf{B} \in \mathscr{P}$ and $\theta \in \Theta$, introduce the log-likelihood ratio (LLR) process between the hypotheses $\mathsf{H}_{k,\mathsf{B},\theta}$ ($k = 0, 1, \dots$) and $\mathsf{H}_\infty$:

$$
\lambda_{\mathsf{B},\theta}(k, n) = \sum_{t=k+1}^{n} \log\frac{f_{\mathsf{B},\theta}(\mathbf{X}_t \mid \mathbf{X}^{t-1})}{g(\mathbf{X}_t \mid \mathbf{X}^{t-1})}, \quad n > k
$$

($\lambda_{\mathsf{B},\theta}(k, n) = 0$ for $n \leq k$).

Let $\psi : \mathbb{R}_+ \to \mathbb{R}_+$ be an increasing and continuous function and denote by $\Psi$ the inverse of $\psi$, which is also increasing and continuous. We also assume that $\lim_{x\to\infty}\psi(x) = \infty$, so that $\Psi(x)$ is properly defined on the entire positive real line.

In the rest of the paper, we assume that the strong law of large numbers (SLLN) holds for the LLR with rate $\psi(n)$, that is, $\lambda_{\mathsf{B},\theta}(k, k+n)/\psi(n)$ converges almost surely (a.s.) under the probability measure $\mathsf{P}_{k,\mathsf{B},\theta}$ to a finite positive number $I_{\mathsf{B},\theta}$:

$$
\frac{1}{\psi(n)}\lambda_{\mathsf{B},\theta}(k, k+n) \xrightarrow[n\to\infty]{\mathsf{P}_{k,\mathsf{B},\theta}\text{-a.s.}} I_{\mathsf{B},\theta} ~~ \text{for all}~k \in \mathbb{Z}_+,~\mathsf{B} \in \mathscr{P},~\theta \in \Theta. \tag{16}
$$

Condition (16) is the first main assumption regarding the general stochastic model for observed data.

Notice that if the data in streams and across the streams are independent (but non-stationary), then

$$
I_{\mathsf{B},\theta} = \lim_{n\to\infty} \frac{1}{\psi(n)} \sum_{i \in \mathsf{B}} \sum_{t=k+1}^{k+n} \mathsf{E}_{k,i,\theta_i}\left[\log\frac{f_{i,\theta_i}(X_t(i))}{g_{i,t}(X_t(i))}\right],
$$

where $g_{i,t}(X_t(i))$ and $f_{i,\theta_i}(X_t(i))$ are the pre- and post-change densities of the $t$-th observation in the $i$-th stream, respectively. In other words, the number $I_{\mathsf{B},\theta}$ can be interpreted as the limiting local Kullback–Leibler divergence.
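For instance, in the simple asymptotically stationary case of independent streams with $g_{i,t} = \mathcal{N}(0,1)$ and $f_{i,\theta_i} = \mathcal{N}(\theta_i, 1)$ (an illustrative special case of ours), each post-change observation in stream $i$ contributes the Kullback–Leibler divergence $\theta_i^2/2$, so that

$$
\psi(n) = n, \qquad I_{\mathsf{B},\theta} = \sum_{i \in \mathsf{B}}\frac{\theta_i^2}{2}.
$$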

If the function $\psi(n)$ is non-linear, we say that the model is non-stationary (even asymptotically). However, in many applications the observations are non-identically distributed but $\psi(n) = n$, in which case we say that the non-i.i.d. model is asymptotically stationary.

In Subsection 2.4, we establish the asymptotic lower bound (as $\alpha \to 0$) for the minimal value of the expected detection delay $\mathsf{EDD}_{k,\mathsf{B},\theta}(T)$ in the class $\mathbb{C}_\pi(\alpha)$ whenever the almost sure convergence condition (16) holds. To obtain the lower bound, it suffices to require the following right-tail condition:

$$
\lim_{L\to\infty} \mathsf{P}_{k,\mathsf{B},\theta}\left\{\frac{1}{\psi(L)} \max_{1 \leq n \leq L} \lambda_{\mathsf{B},\theta}(k, k+n) \geq (1+\varepsilon) I_{\mathsf{B},\theta}\right\} = 0 ~~ \text{for all}~\varepsilon > 0.
$$

This condition holds whenever the SLLN (16) takes place. However, the SLLN is not sufficient to show that this lower bound is attained by the mixture detection procedures (14) and (15). To this end, the SLLN has to be strengthened to a complete convergence version.

Definition 1

We say that the process $\{Y_{k,n}, n \geq 1\}$, $k \in \mathbb{Z}_+$, converges to a random variable $Y$ uniformly completely under the measure $\mathsf{P}_k$ if

$$
\sum_{n=1}^{\infty} \sup_{k \in \mathbb{Z}_+} \mathsf{P}_k\left\{\left|Y_{k,n} - Y\right| > \varepsilon\right\} < \infty ~~ \text{for all}~\varepsilon > 0.
$$

The proof of the upper bound presented in Subsection 2.4 shows that if, along with the SLLN (16), the following left-tail condition is satisfied:

$$
\sum_{n=1}^{\infty} \sup_{k \in \mathbb{Z}_+} \mathsf{P}_{k,\mathsf{B},\theta}\left\{\frac{1}{\psi(n)}\lambda_{\mathsf{B},\theta}(k, k+n) < I_{\mathsf{B},\theta} - \varepsilon\right\} < \infty \quad \text{for all}~\varepsilon > 0, \tag{17}
$$

where the "information number" $I_{\mathsf{B},\theta}$ is positive and finite, then the detection procedure (15) is asymptotically optimal for the fixed (prespecified) post-change parameter $\theta$.

Obviously, both conditions (16) and (17) are satisfied whenever there exist positive finite numbers $I_{\mathsf{B},\theta}$ ($\mathsf{B} \in \mathscr{P}$, $\theta \in \Theta$) such that the normalized LLR $\lambda_{\mathsf{B},\theta}(k, k+n)/\psi(n) \to I_{\mathsf{B},\theta}$ uniformly completely under $\mathsf{P}_{k,\mathsf{B},\theta}$. This condition is therefore sufficient for the asymptotic optimality of the detection procedure (15) when the post-change parameter $\theta$ is known. In the case of an unknown post-change parameter, the left-tail condition is similar to, but more sophisticated than, condition (17). The details will be given in Subsection 2.4.

Heuristic Argument

We begin with a heuristic argument that yields approximations for the expected detection delay when the threshold in the mixture detection procedure $T_A^{\mathbf{p},W}$ is large. For simplicity, the head-start of the detection statistic $R_{\mathbf{p},W}(n)$ is set to zero, $r = 0$. This argument also explains why condition $\mathbf{CP}_1$ on the prior distribution is imposed.

Assume first that the change occurs at $\nu = k \leq 0$. It is easy to see that the logarithm of the statistic $R_{\mathbf{p},W}(n)$ can be written as

$$
\log R_{\mathbf{p},W}(n) = \lambda_{\mathbf{p},W}(0, n) + Y_n,
$$

where $\lambda_{\mathbf{p},W}(0, n) = \log \Lambda_{\mathbf{p},W}(0, n)$ and

$$
Y_n = \log\left[1 + \sum_{t=1}^{n-1} \frac{\Lambda_{\mathbf{p},W}(t, n)}{\Lambda_{\mathbf{p},W}(0, n)}\right].
$$

By the SLLN (16), $\lambda_{\mathbf{p},W}(0, n)/\psi(n)$ converges almost surely as $n \to \infty$ under $\mathsf{P}_{0,\mathsf{B},\theta}$ to $I_{\mathsf{B},\theta}$, so we can expect that for large $n$

$$
\lambda_{\mathbf{p},W}(0, n) \approx I_{\mathsf{B},\theta}\, \psi(n) + \xi_n, \tag{18}
$$

where $\xi_n/\psi(n)$ converges to 0. Also, the $Y_n$, $n \geq 1$, are "slowly changing" and converge to a random variable $Y_\infty$. Thus, ignoring the overshoot of $\log R_{\mathbf{p},W}(T_A^{\mathbf{p},W})$ over $\log A$, we obtain

$$
\log A \approx \log R_{\mathbf{p},W}(T_A^{\mathbf{p},W}) \approx I_{\mathsf{B},\theta}\, \psi\left(T_A^{\mathbf{p},W}\right) + \xi_{T_A^{\mathbf{p},W}} + Y_\infty. \tag{19}
$$

If $\psi(n)$ increases sufficiently fast, at least not slower than $n$, then the last two terms can be ignored, and hence

$$
T_A^{\mathbf{p},W} \approx \Psi\left(\frac{\log A}{I_{\mathsf{B},\theta}}\right).
$$

Taking expectations yields

$$
\mathsf{E}_{0,\mathsf{B},\theta}[T_A^{\mathbf{p},W}] \approx \Psi\left(\frac{\log A}{I_{\mathsf{B},\theta}}\right).
$$

A similar argument leads to the following approximate formula (for large $A$) for the expected delay $\mathsf{E}_{k,\mathsf{B},\theta}[(T_A^{\mathbf{p},W} - k)^+]$ when the change occurs at $\nu = k$:

$$
\mathsf{E}_{k,\mathsf{B},\theta}[(T_A^{\mathbf{p},W} - k)^+] \approx \Psi\left(\frac{\log A}{I_{\mathsf{B},\theta}}\right), \quad k \in \mathbb{Z}_+. \tag{20}
$$
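As a hypothetical numerical illustration of (20): in the asymptotically stationary case $\psi(n) = n$, so that $\Psi(x) = x$, taking $I_{\mathsf{B},\theta} = 0.5$ and $A = 10^4$ gives an expected delay of about $\log(10^4)/0.5 \approx 18.4$ observations, whereas in the super-linear case $\psi(n) = n^2$, $\Psi(x) = \sqrt{x}$, the same numbers give only $\sqrt{18.4} \approx 4.3$ observations.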

Next, in Lemma 1 (see Subsection 2.4) it is established that the probability of false alarm of the procedure $T_A^{\mathbf{p},W}$ satisfies the inequality

$$
\mathsf{PFA}_\pi(T_A^{\mathbf{p},W}) \leq \bar{\nu}_A / A,
$$

where $\bar{\nu}_A = \sum_{k=1}^{\infty} k\pi_k^A$ is the mean of the prior distribution $\pi^A$. If we assume that $\bar{\nu}_A/A \to 0$ as $A \to \infty$, then this inequality, together with the approximate equality (20), yields the following approximation for the expected delay to detection:

$$
\mathsf{E}_{k,\mathsf{B},\theta}[T_A^{\mathbf{p},W} - k \mid T_A^{\mathbf{p},W} > k] \approx \frac{\mathsf{E}_{k,\mathsf{B},\theta}[(T_A^{\mathbf{p},W} - k)^+]}{1 - \mathsf{PFA}_\pi(T_A^{\mathbf{p},W})} \approx \Psi\left(\frac{\log A}{I_{\mathsf{B},\theta}}\right). \tag{21}
$$

In subsequent sections, this approximation is justified rigorously.

There are two key points we would like to address. The first is that if we impose condition $\mathbf{CP}_1$ on the prior distribution of the change point with a $\beta > 0$ that does not depend on $\alpha$ and does not converge to 0, then the lower bound for the expected delay to detection in the class $\mathbb{C}_\pi(\alpha)$ has the form

$$
\inf_{T \in \mathbb{C}_\pi(\alpha)} \mathsf{EDD}_{k,\mathsf{B},\theta}(T) \geq \Psi\left(\frac{|\log\alpha|}{I_{\mathsf{B},\theta} + \beta}\right)(1 + o(1)) \quad \text{as}~\alpha \to 0. \tag{22}
$$

In this case, this lower bound is not attained by the procedure $T_A^{\mathbf{p},W}$, since if we take $A = A_\alpha$ with $\log A_\alpha \sim |\log\alpha|$, then it follows from (21) that

$$
\mathsf{E}_{k,\mathsf{B},\theta}[T_A^{\mathbf{p},W} - k \mid T_A^{\mathbf{p},W} > k] \sim \Psi\left(\frac{|\log\alpha|}{I_{\mathsf{B},\theta}}\right) > \Psi\left(\frac{|\log\alpha|}{I_{\mathsf{B},\theta} + \beta}\right).
$$

Hereafter we use the conventional notation $y_a \sim z_a$ as $a \to a_0$ if $\lim_{a\to a_0}(y_a/z_a) = 1$. This behavior can be expected, since the detection statistic $R_{\mathbf{p},W}(n)$ is based on the uniform prior on the positive half-line. However, if $\beta = \beta_\alpha \to 0$, as we assumed in condition $\mathbf{CP}_1$, then the lower bound is

$$
\Psi\left(\frac{|\log\alpha|}{I_{\mathsf{B},\theta}}\right)(1 + o(1)),
$$

and it is attained by $T_A^{\mathbf{p},W}$.

Yet another key point is that to obtain the lower bound (22) with $\beta = \beta_\alpha \to 0$ but $\beta_\alpha > 0$, as shown in the proof of Theorem 2.1, we need the function $\psi(x)$ to be either linear or super-linear. Indeed, if we now define the statistic

$$
S^\pi_{\mathbf{p},W}(n) = \frac{1}{\mathsf{P}(\nu \geq n)}\left[\pi_{-1}\Lambda_{\mathbf{p},W}(0, n) + \sum_{k=0}^{n-1} \pi_k \Lambda_{\mathbf{p},W}(k, n)\right]
$$

and the corresponding detection procedure

$$
\widetilde{T}_A^{\mathbf{p},W} = \inf\left\{n \geq 1 : S^\pi_{\mathbf{p},W}(n) \geq A\right\},
$$

then instead of (19) we have

$$
\log A \approx \log S^\pi_{\mathbf{p},W}(\widetilde{T}_A^{\mathbf{p},W}) \approx \beta\, \widetilde{T}_A^{\mathbf{p},W} + I_{\mathsf{B},\theta}\, \psi\left(\widetilde{T}_A^{\mathbf{p},W}\right) + \xi_{\widetilde{T}_A^{\mathbf{p},W}} + Y_\infty.
$$

So if $\psi(x)$ is sub-linear, $\psi(x) < x$, i.e., $\Psi(n) \gg n$ for large $n$, we expect the prior distribution to contribute much more than the observations, so that the term $\beta\,\widetilde{T}_A^{\mathbf{p},W}$ dominates and

$$
\mathsf{E}_{k,\mathsf{B},\theta}[\widetilde{T}_A^{\mathbf{p},W} - k \mid \widetilde{T}_A^{\mathbf{p},W} > k] \approx \frac{\log A}{\beta}
$$

as long as $\beta > 0$, i.e., the prior distribution has an exponential right tail.

Despite the simplicity of the basic ideas and of the approximate calculations, the rigorous proofs of these results are rather tedious.

Asymptotic Lower Bound for Expected Detection Delay

To establish the asymptotic optimality of changepoint detection procedures, we first obtain, under the a.s. convergence condition (16), the asymptotic lower bound for the expected detection delay $\mathsf{EDD}_{k,\mathsf{B},\theta}(T) = \mathsf{E}_{k,\mathsf{B},\theta}[T - k \mid T > k]$ of any detection rule $T$ from the class $\mathbb{C}_\pi(\alpha)$. In the following subsections, we show that under certain additional conditions associated with the complete convergence of the LLR, these bounds are attained by the mixture procedures $T_A^{\mathbf{p},W}$ and $T_A^{\mathbf{p},\theta}$.

For $\varepsilon \in (0,1)$ and $\delta > 0$, define

$$
N_\alpha = N_\alpha(\varepsilon, \delta, \mathsf{B}, \theta) = \Psi\left(\frac{(1-\varepsilon)|\log\alpha|}{I_{\mathsf{B},\theta} + \beta_\alpha + \delta}\right). \tag{23}
$$

The following theorem specifies the asymptotic lower bound for the expected detection delay.

Theorem 2.1

Let the prior distribution of the change point satisfy condition $\mathbf{CP}_1$ and assume that for some positive finite numbers $I_{\mathsf{B},\theta}$ ($\mathsf{B} \in \mathscr{P}$, $\theta \in \Theta$) the a.s. convergence condition (16) holds. Suppose, in addition, that the function $\psi(x)$ increases not slower than $x$, i.e.,

$$
\psi(x) \geq x, \quad x > 0. \tag{24}
$$

Then for all $0 < \varepsilon < 1$ and $\delta > 0$

$$
\lim_{\alpha\to 0} \sup_{T \in \mathbb{C}_\pi(\alpha)} \mathsf{P}_{k,\mathsf{B},\theta}\left\{k < T \leq k + N_\alpha(\varepsilon, \delta, \mathsf{B}, \theta)\right\} = 0 ~~ \text{for all}~\mathsf{B} \in \mathscr{P},~\theta \in \Theta, \tag{25}
$$

and, as a result, for all $\nu = k \in \mathbb{Z}_+$, $\mathsf{B} \in \mathscr{P}$, and $\theta \in \Theta$,

$$
\inf_{T \in \mathbb{C}_\pi(\alpha)} \mathsf{EDD}_{k,\mathsf{B},\theta}(T) \geq \Psi\left(\frac{|\log\alpha|}{I_{\mathsf{B},\theta}}\right)(1 + o(1)) ~~ \text{as}~\alpha \to 0. \tag{26}
$$
Proof

In the asymptotically stationary case $\psi(n) = n$, the methodology of the proof is analogous to that used by Tartakovsky TartakovskyIEEEIT2017 ; TartakovskyIEEEIT2019 in the proofs of the lower bounds in the single-stream change detection problem, under slightly different assumptions on the prior distribution. The generalization of the proof to the substantially non-stationary case involves several additional technical details, so we present a complete proof.

To begin, consider an arbitrary stopping time $T \in \mathbb{C}_\pi(\alpha)$ and note that, by Markov's inequality,

$$
\mathsf{EDD}_{k,\mathsf{B},\theta}(T) \geq \mathsf{E}_{k,\mathsf{B},\theta}\left[(T-k)^+\right] \geq N_\alpha\, \mathsf{P}_{k,\mathsf{B},\theta}\left\{(T-k)^+ > N_\alpha\right\} = N_\alpha\, \mathsf{P}_{k,\mathsf{B},\theta}\left\{T > k + N_\alpha\right\}.
$$

If assertion (25) holds, then

$$
\lim_{\alpha\to 0} \inf_{T \in \mathbb{C}_\pi(\alpha)} \mathsf{P}_{k,\mathsf{B},\theta}\left\{T > k + N_\alpha\right\} = 1. \tag{27}
$$

Indeed,

$$
\mathsf{P}_{k,\mathsf{B},\theta}\left\{T > k + N_\alpha\right\} = \mathsf{P}_\infty\left\{T > k\right\} - \mathsf{P}_{k,\mathsf{B},\theta}\left\{k < T \leq k + N_\alpha\right\},
$$

where we used the identity $\mathsf{P}_{k,\mathsf{B},\theta}(T > k) = \mathsf{P}_\infty(T > k)$. Next, since for any change detection procedure $T \in \mathbb{C}_\pi(\alpha)$

$$
\alpha \geq \sum_{i=k}^{\infty} \pi_i^\alpha \mathsf{P}_\infty(T \leq i) \geq \mathsf{P}_\infty(T \leq k) \sum_{i=k}^{\infty} \pi_i^\alpha,
$$

it follows that

$$
\inf_{T \in \mathbb{C}_\pi(\alpha)} \mathsf{P}_\infty(T > k) \geq 1 - \alpha/\Pi_{k-1}^\alpha, \quad k \in \mathbb{Z}_+, \tag{28}
$$

where $\Pi_\ell^\alpha = \mathsf{P}(\nu > \ell) = \sum_{k=\ell+1}^{\infty} \pi_k^\alpha$. Hence, we obtain

$$
\inf_{T \in \mathbb{C}_\pi(\alpha)} \mathsf{P}_{k,\mathsf{B},\theta}\left\{T > k + N_\alpha\right\} \geq 1 - \alpha/\Pi_{k-1}^\alpha - \sup_{T \in \mathbb{C}_\pi(\alpha)} \mathsf{P}_{k,\mathsf{B},\theta}(k < T \leq k + N_\alpha).
$$

This inequality yields (27), and we obtain the asymptotic inequality

$$
\mathsf{EDD}_{k,\mathsf{B},\theta}(T) \geq N_\alpha(1 + o(1)) ~~ \text{as}~\alpha \to 0,
$$

which holds for arbitrary values of $\varepsilon \in (0,1)$ and $\delta > 0$. By our assumption, the function $N_\alpha = N_\alpha(\varepsilon, \delta, \mathsf{B}, \theta)$ is continuous in $\varepsilon$ and $\delta$, so we may take the limit as $\delta, \varepsilon \to 0$, which implies inequality (26). Consequently, to show the validity of the asymptotic inequality (26), it suffices to prove equality (25).

Let

$$
\lambda^*_{\mathsf{B},\theta}(t) = \log\frac{f_{\mathsf{B},\theta}(\mathbf{X}_t \mid \mathbf{X}^{t-1})}{g(\mathbf{X}_t \mid \mathbf{X}^{t-1})}
$$

and notice that for $n > k$

$$
\frac{\mathrm{d}\mathsf{P}_\infty^{(n)}}{\mathrm{d}\mathsf{P}_{k,\mathsf{B},\theta}^{(n)}} = \frac{1}{LR_{\mathsf{B},\theta}(k, n)} = \exp\left\{-\sum_{t=k+1}^{n} \lambda^*_{\mathsf{B},\theta}(t)\right\},
$$

where hereafter $\mathsf{P}^{(n)}$ denotes the restriction of the measure $\mathsf{P}$ to the sigma-algebra $\mathscr{F}_n$. Therefore, changing the measure $\mathsf{P}_\infty \to \mathsf{P}_{k,\mathsf{B},\theta}$ and using Wald's likelihood ratio identity, we obtain for any $C > 0$:

$$
\begin{aligned}
\mathsf{P}_\infty\left\{k < T \leq k + N_\alpha\right\} &= \mathsf{E}_{k,\mathsf{B},\theta}\left[\mathbb{1}_{\{0 < T-k \leq N_\alpha\}}\, \frac{\mathrm{d}\mathsf{P}_\infty^{(T)}}{\mathrm{d}\mathsf{P}_{k,\mathsf{B},\theta}^{(T)}}\right] \\
&= \mathsf{E}_{k,\mathsf{B},\theta}\left[\mathbb{1}_{\{0 < T-k \leq N_\alpha\}}\, e^{-\sum_{t=k+1}^{T}\lambda^*_{\mathsf{B},\theta}(t)}\right] \\
&\geq \mathsf{E}_{k,\mathsf{B},\theta}\left[\mathbb{1}_{\{0 < T-k \leq N_\alpha,\ \sum_{t=k+1}^{T}\lambda^*_{\mathsf{B},\theta}(t) < C\}}\, e^{-\sum_{t=k+1}^{T}\lambda^*_{\mathsf{B},\theta}(t)}\right] \\
&\geq e^{-C}\, \mathsf{P}_{k,\mathsf{B},\theta}\left\{0 < T-k \leq N_\alpha,\ \max_{0 < n-k \leq N_\alpha}\sum_{t=k+1}^{n}\lambda^*_{\mathsf{B},\theta}(t) < C\right\} \\
&\geq e^{-C}\left[\mathsf{P}_{k,\mathsf{B},\theta}\left\{0 < T-k \leq N_\alpha\right\} - \mathsf{P}_{k,\mathsf{B},\theta}\left\{\max_{0 \leq n < N_\alpha}\sum_{t=k+1}^{k+n+1}\lambda^*_{\mathsf{B},\theta}(t) \geq C\right\}\right],
\end{aligned}
$$

where the last inequality follows from the fact that $\Pr(\mathcal{A} \cap \mathcal{B}) \geq \Pr(\mathcal{A}) - \Pr(\mathcal{B}^c)$ for any events $\mathcal{A}$ and $\mathcal{B}$, where $\mathcal{B}^c$ is the complement of $\mathcal{B}$. Setting

$$
C = \psi(N_\alpha) I_{\mathsf{B},\theta}(1+\varepsilon) = (1+\varepsilon)K_\alpha, \quad K_\alpha = \frac{(1-\varepsilon)|\log\alpha|\, I_{\mathsf{B},\theta}}{I_{\mathsf{B},\theta} + \beta_\alpha + \delta}
$$

yields

$$
\mathsf{P}_{k,\mathsf{B},\theta}\left\{k < T \leq k + N_\alpha\right\} \leq \kappa_{k,\alpha}(T) + \sup_{k \geq 0}\gamma_{k,\alpha}, \tag{29}
$$

where

$$
\kappa_{k,\alpha}(T) = \kappa_{k,\alpha}(\varepsilon, \delta, \theta, \mathsf{B}, T) = e^{(1+\varepsilon)K_\alpha}\, \mathsf{P}_\infty\left\{0 < T-k \leq N_\alpha\right\}
$$

and

$$
\gamma_{k,\alpha} = \gamma_{k,\alpha}(\varepsilon, \mathsf{B}, \theta) = \mathsf{P}_{k,\mathsf{B},\theta}\left\{\max_{0 \leq n < N_\alpha}\sum_{t=k+1}^{k+n+1}\lambda^*_{\mathsf{B},\theta}(t) \geq (1+\varepsilon) I_{\mathsf{B},\theta}\, \psi(N_\alpha)\right\}.
$$

By the a.s. convergence condition (16),

$$
\lim_{\alpha\to 0}\sup_{k \geq 0}\gamma_{k,\alpha} = 0. \tag{30}
$$

To evaluate $\kappa_{k,\alpha}(T)$, it suffices to note that, by condition $\mathbf{CP}_1$, for all sufficiently small $\alpha$ there exists a small $\delta$ such that

$$
\frac{|\log\Pi^\alpha_{k-1+N_\alpha}|}{k-1+N_\alpha} \leq \beta_\alpha + \delta,
$$

and to use inequality (28), which yields that for all sufficiently small $\alpha$

$$
\begin{aligned}
\kappa_{k,\alpha}(T) &\leq e^{(1+\varepsilon)K_\alpha}\, \mathsf{P}_\infty(T \leq k + N_\alpha) \leq \alpha\, e^{(1+\varepsilon)K_\alpha}/\Pi^\alpha_{k-1+N_\alpha} \\
&\leq \exp\left\{(1+\varepsilon)K_\alpha - |\log\alpha| + (k-1+N_\alpha)\frac{|\log\Pi^\alpha_{k-1+N_\alpha}|}{k-1+N_\alpha}\right\} \\
&\leq \exp\left\{(1+\varepsilon)K_\alpha - |\log\alpha| + (k-1+N_\alpha)(\beta_\alpha + \delta)\right\}.
\end{aligned}
$$

Since by condition (24),

$$
N_\alpha \leq K_\alpha/I_{\mathsf{B},\theta} = \frac{(1-\varepsilon)|\log\alpha|}{I_{\mathsf{B},\theta} + \beta_\alpha + \delta},
$$

it follows that

$$
\begin{aligned}
\kappa_{k,\alpha}(T) &\leq \exp\left\{-\frac{I_{\mathsf{B},\theta}\varepsilon^2 + (\beta_\alpha + \delta)\varepsilon}{I_{\mathsf{B},\theta} + \beta_\alpha + \delta}|\log\alpha| + (\beta_\alpha + \delta)k\right\} \\
&\leq \exp\left\{-\varepsilon^2|\log\alpha| + (\beta_\alpha + \delta)k\right\} := \overline{\kappa}_{k,\alpha}(\varepsilon, \delta)
\end{aligned}
$$

for all $\varepsilon \in (0,1)$ and every small $\delta$. Hence, for all $\varepsilon \in (0,1)$ and sufficiently small $\delta$,

$$
\kappa_{k,\alpha}(T) \leq \overline{\kappa}_{k,\alpha}(\varepsilon, \delta), \tag{31}
$$

where $\overline{\kappa}_{k,\alpha}(\varepsilon, \delta)$ does not depend on the stopping time $T$ and goes to 0 as $\alpha \to 0$ for any fixed $k \in \mathbb{Z}_+$, which implies that

$$
\sup_{T \in \mathbb{C}_\pi(\alpha)} \kappa_{k,\alpha}(T) \to 0 \quad \text{as}~\alpha \to 0~~\text{for every fixed}~k \in \mathbb{Z}_+.
$$

Combining (29) with (30) and (31) yields equality (25). The proof of the lower bound (26) is complete. ∎

Probabilities of False Alarm of Mixture Change Detection Procedures

An important question is how to select the threshold $A$ in the change detection procedures $T_A^{\mathbf{p},\theta}$ and $T_A^{\mathbf{p},W}$ defined in (15) and (14), respectively, so as to embed them into the class $\mathbb{C}_\pi(\alpha)$.

The following lemma provides the upper bounds for the PFA of these procedures.

Lemma 1

For all $A > 0$ and any prior distribution $\pi = \{\pi_k\}$ of the change point $\nu$ with finite mean $\bar{\nu} = \sum_{k=1}^{\infty} k\pi_k$, the weighted probabilities of false alarm of the detection procedures $T_A^{\mathbf{p},\theta}$ and $T_A^{\mathbf{p},W}$ satisfy the inequalities

$$
\mathsf{PFA}_\pi(T_A^{\mathbf{p},\theta}) \leq \frac{rb + \bar{\nu}}{A}, \tag{32}
$$

$$
\mathsf{PFA}_\pi(T_A^{\mathbf{p},W}) \leq \frac{rb + \bar{\nu}}{A}, \tag{33}
$$

where $b = \sum_{k=1}^{\infty}\pi_k$. Hence, if $A = A_\alpha = (rb + \bar{\nu})/\alpha$, then $\mathsf{PFA}_\pi(T_{A_\alpha}^{\mathbf{p},\theta}) \leq \alpha$ and $\mathsf{PFA}_\pi(T_{A_\alpha}^{\mathbf{p},W}) \leq \alpha$, i.e., $T_{A_\alpha}^{\mathbf{p},\theta} \in \mathbb{C}_\pi(\alpha)$ and $T_{A_\alpha}^{\mathbf{p},W} \in \mathbb{C}_\pi(\alpha)$.
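In practice, Lemma 1 yields a simple, simulation-free (though generally conservative) threshold choice. A minimal sketch (ours), using a truncated geometric prior purely as an assumed example:

```python
import numpy as np

def threshold_from_pfa(alpha, pi, r=0.0):
    """Threshold A = (r*b + nu_bar)/alpha from Lemma 1, which guarantees
    PFA_pi(T_A) <= alpha.  `pi[k]` approximates P(nu = k), k = 0, 1, ..."""
    k = np.arange(len(pi))
    b = pi[1:].sum()                 # b = sum_{k >= 1} pi_k
    nu_bar = (k * pi).sum()          # mean of the prior distribution
    return (r * b + nu_bar) / alpha

# Example: geometric prior with rho = 0.01 (truncated deep in the tail)
rho = 0.01
pi = rho * (1 - rho) ** np.arange(3000)
A = threshold_from_pfa(1e-3, pi)     # about ((1 - rho)/rho)/alpha ~ 9.9e4
```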

Proof

We have

$$
\mathsf{E}_\infty[R_{\mathbf{p},\theta}(n) \mid \mathscr{F}_{n-1}] = \sum_{\mathsf{B} \in \mathscr{P}} p_{\mathsf{B}} + \sum_{\mathsf{B} \in \mathscr{P}} p_{\mathsf{B}} R_{\mathsf{B},\theta}(n-1) = 1 + R_{\mathbf{p},\theta}(n-1),
$$

and hence the statistic $\{R_{\mathbf{p},\theta}(n)\}_{n \in \mathbb{Z}_+}$ is a non-negative $(\mathsf{P}_\infty, \mathscr{F}_n)$-submartingale with mean

$$
\mathsf{E}_\infty[R_{\mathbf{p},\theta}(n)] = r + n, \quad n \in \mathbb{Z}_+.
$$

By Doob's maximal submartingale inequality, for any $k \geq 1$,

$$
\mathsf{P}_\infty(T_A^{\mathbf{p},\theta} \leq k) = \mathsf{P}_\infty\left\{\max_{0 \leq n \leq k} R_{\mathbf{p},\theta}(n) \geq A\right\} \leq \frac{\mathsf{E}_\infty[R_{\mathbf{p},\theta}(k)]}{A} = \frac{r + k}{A}, \tag{34}
$$

which implies

$$
\sum_{k=1}^{\infty}\pi_k \mathsf{P}_\infty(T_A^{\mathbf{p},\theta} \leq k) \leq \frac{r\sum_{k=1}^{\infty}\pi_k + \sum_{k=1}^{\infty}k\pi_k}{A},
$$

and inequality (32) follows.

To prove inequality (33), it suffices to note that the double-mixture statistic $\{R_{\mathbf{p},W}(n)\}_{n \geq 1}$ is also a $(\mathsf{P}_\infty, \mathscr{F}_n)$-submartingale with mean $\mathsf{E}_\infty[R_{\mathbf{p},W}(n)] = r + n$ since

$$
\begin{aligned}
\mathsf{E}_\infty[R_{\mathbf{p},W}(n) \mid \mathscr{F}_{n-1}] &= \mathsf{E}_\infty\left[\int_\Theta R_{\mathbf{p},\theta}(n)\, \mathrm{d}W(\theta) \,\Big|\, \mathscr{F}_{n-1}\right] = \int_\Theta \mathsf{E}_\infty\left[R_{\mathbf{p},\theta}(n) \mid \mathscr{F}_{n-1}\right] \mathrm{d}W(\theta) \\
&= 1 + \int_\Theta R_{\mathbf{p},\theta}(n-1)\, \mathrm{d}W(\theta) = 1 + R_{\mathbf{p},W}(n-1).
\end{aligned}
$$

Applying Doob’s submartingale inequality yields

$$
\mathsf{P}_\infty(T_A^{\mathbf{p},W} \leq k) \leq (r + k)/A, \quad k = 1, 2, \dots, \tag{35}
$$

which implies (33), and the proof is complete. ∎

Asymptotic Optimality of the Mixture Detection Procedure $T_A^{\mathbf{p},\theta}$ for a Known Post-change Parameter

We now proceed with establishing the asymptotic optimality properties of the mixture detection procedure $T_A^{\mathbf{p},\theta}$ in the class $\mathbb{C}_\pi(\alpha)$ as $\alpha \to 0$, assuming that the post-change parameter $\theta$ is prespecified (known or selected).

Throughout this subsection, we assume that there exist positive finite numbers $I_{\mathsf{B},\theta}$ ($\mathsf{B} \in \mathscr{P}$, $\theta \in \Theta$) such that the normalized LLR $\lambda_{\mathsf{B},\theta}(k, k+n)/\psi(n)$ converges as $n \to \infty$ to $I_{\mathsf{B},\theta}$ uniformly completely under the probability measure $\mathsf{P}_{k,\mathsf{B},\theta}$, i.e.,

$$
\sum_{n=1}^{\infty}\sup_{k \in \mathbb{Z}_+}\mathsf{P}_{k,\mathsf{B},\theta}\left\{\left|\frac{\lambda_{\mathsf{B},\theta}(k, k+n)}{\psi(n)} - I_{\mathsf{B},\theta}\right| > \varepsilon\right\} < \infty \quad \text{for all}~\varepsilon > 0. \tag{36}
$$
Asymptotic Operating Characteristics of the Detection Procedure $T_A^{\mathbf{p},\theta}$ for Large Threshold Values

The following theorem provides the asymptotic operating characteristics of the mixture detection procedure $T_A^{\mathbf{p},\theta}$ for large values of the threshold $A$. Since no constraints are imposed on the false alarm rate (FAR) – neither on the PFA nor on any other FAR measure – this result is universal and can be used in a variety of optimization problems. We need to impose some constraints on the behavior of the head-start $r_A$, which may depend on $A$ and approach infinity as $A \to \infty$, in such a way that

$$
\lim_{A\to\infty}\left(r_A\, A^{-\varepsilon}\right) = 0 \quad \text{for any}~\varepsilon > 0. \tag{37}
$$

We also need to assume that the function $\psi(x)$ increases not too slowly, at least faster than logarithmically. Specifically, we assume the following condition on the inverse function $\Psi(x)$:

$$
\lim_{x\to\infty}\frac{\log\Psi(x)}{x} = 0. \tag{38}
$$
Theorem 2.2

Suppose that conditions (37) and (38) hold and there exist positive finite numbers $I_{\mathsf{B},\theta}$, $\mathsf{B} \in \mathscr{P}$, $\theta \in \Theta$, such that the uniform complete convergence condition (36) is satisfied. Then the following asymptotic (as $A \to \infty$) approximation holds:

$$
\mathsf{EDD}_{k,\mathsf{B},\theta}(T_A^{\mathbf{p},\theta}) = \Psi\left(\frac{\log A}{I_{\mathsf{B},\theta}}\right)(1 + o(1)) \quad \text{for all}~k \in \mathbb{Z}_+,~\mathsf{B} \in \mathscr{P}. \tag{39}
$$
Proof

For $\varepsilon \in (0,1)$, let $N_A = N_A(\varepsilon, \mathsf{B}, \theta) = \Psi\left((1-\varepsilon)(\log A)/I_{\mathsf{B},\theta}\right)$. Inequality (34) implies that for all $k \in \mathbb{Z}_+$

$$
\mathsf{P}_{k,\mathsf{B},\theta}(T_A^{\mathbf{p},\theta} > k) = \mathsf{P}_\infty(T_A^{\mathbf{p},\theta} > k) \geq 1 - \frac{k + r_A}{A}.
$$

Hence, using Markov’s inequality, we obtain

𝖤𝖣𝖣k,𝖡,θ(TA𝐩,θ)\displaystyle{\mathsf{EDD}}_{k,{\mathsf{B}},\theta}(T_{A}^{{\mathbf{p}},\theta}) 𝖤k,𝖡,θ[(TA𝐩,θk)+]\displaystyle\geq{\mathsf{E}}_{k,{\mathsf{B}},\theta}\left[(T_{A}^{{\mathbf{p}},\theta}-k)^{+}\right]
NA𝖯k,𝖡,θ(TA𝐩,θk>NA)\displaystyle\geq N_{A}\,{\mathsf{P}}_{k,{\mathsf{B}},\theta}(T_{A}^{{\mathbf{p}},\theta}-k>N_{A})
NA[𝖯k,𝖡,θ(TA𝐩,θ>k)𝖯k,𝖡,θ(k<TA𝐩,θ<k+NA)]\displaystyle\geq N_{A}\left[{\mathsf{P}}_{k,{\mathsf{B}},\theta}(T_{A}^{{\mathbf{p}},\theta}>k)-{\mathsf{P}}_{k,{\mathsf{B}},\theta}(k<T_{A}^{{\mathbf{p}},\theta}<k+N_{A})\right]
NA[1rA+kA𝖯k,𝖡,θ(k<TA𝐩,θ<k+NA)].\displaystyle\geq N_{A}\left[1-\frac{r_{A}+k}{A}-{\mathsf{P}}_{k,{\mathsf{B}},\theta}(k<T_{A}^{{\mathbf{p}},\theta}<k+N_{A})\right]. (40)

Similarly to (29)

𝖯k,𝖡,θ{k<TA𝐩,θk+NA}κk,A(TA𝐩,θ)+γk,A,{\mathsf{P}}_{k,{\mathsf{B}},\theta}\left\{k<T_{A}^{{\mathbf{p}},\theta}\leq k+N_{A}\right\}\leq\kappa_{k,A}(T_{A}^{{\mathbf{p}},\theta})+\gamma_{k,A}, (41)

where

κk,A(TA𝐩,θ)=e(1ε2)logA𝖯{0<TA𝐩,θkNA}\kappa_{k,A}(T_{A}^{{\mathbf{p}},\theta})=e^{(1-\varepsilon^{2})\log A}{\mathsf{P}}_{\infty}\left\{0<T_{A}^{{\mathbf{p}},\theta}-k\leq N_{A}\right\}

and

γk,A=𝖯k,𝖡,θ{max0n<NAλ𝖡,θ(k,k+n)(1+ε)I𝖡,θψ(NA)}.\gamma_{k,A}={\mathsf{P}}_{k,{\mathsf{B}},\theta}\left\{\max_{0\leq n<N_{A}}\lambda_{{\mathsf{B}},\theta}(k,k+n)\geq(1+\varepsilon)I_{{\mathsf{B}},\theta}\psi(N_{A})\right\}.

Using inequality (34), we obtain

𝖯(0<TA𝐩,θk<NA)𝖯(TA𝐩,θ<k+NA)(k+rA+NA)/A,{\mathsf{P}}_{\infty}\left(0<T_{A}^{{\mathbf{p}},\theta}-k<N_{A}\right)\leq{\mathsf{P}}_{\infty}\left(T_{A}^{{\mathbf{p}},\theta}<k+N_{A}\right)\leq(k+r_{A}+N_{A})/A,

and consequently,

κk,A(TA𝐩,θ)k+rA+Ψ((1ε)I𝖡,θ1logA)Aε2.\kappa_{k,A}(T_{A}^{{\mathbf{p}},\theta})\leq\frac{k+r_{A}+\Psi((1-\varepsilon)I_{{\mathsf{B}},\theta}^{-1}\log A)}{A^{\varepsilon^{2}}}. (42)

Since, by conditions (37) and (38), for any ε>0\varepsilon>0

\frac{r_{A}+\Psi((1-\varepsilon)I_{{\mathsf{B}},\theta}^{-1}\log A)}{A^{\varepsilon^{2}}}\to 0\quad\text{as}~{}A\to\infty,

it follows from (42) that κk,A(TA𝐩,θ)0\kappa_{k,A}(T_{A}^{{\mathbf{p}},\theta})\to 0 as AA\to\infty for any k+k\in\mathbb{Z}_{+}. Also, γk,A0\gamma_{k,A}\to 0 as AA\to\infty by condition (36). Therefore,

𝖯k,𝖡,θ(0<TA𝐩,θk<NA)0asA{\mathsf{P}}_{k,{\mathsf{B}},\theta}\left(0<T_{A}^{{\mathbf{p}},\theta}-k<N_{A}\right)\to 0\quad\text{as}~{}A\to\infty

for any fixed kk. It follows from (40) that as AA\to\infty

𝖤𝖣𝖣k,𝖡,θ(TA𝐩,θ)Ψ((1ε)logAI𝖡,θ)(1+o(1)),{\mathsf{EDD}}_{k,{\mathsf{B}},\theta}(T_{A}^{{\mathbf{p}},\theta})\geq\Psi\left(\frac{(1-\varepsilon)\log A}{I_{{\mathsf{B}},\theta}}\right)(1+o(1)),

where ε\varepsilon can be arbitrarily small. This yields the asymptotic lower bound (for any fixed k+k\in\mathbb{Z}_{+}, 𝖡𝒫{\mathsf{B}}\in{\mathscr{P}})

lim infA𝖤𝖣𝖣k,𝖡,θ(TA𝐩,θ)logA1I𝖡,θ.\liminf_{A\to\infty}\frac{{\mathsf{EDD}}_{k,{\mathsf{B}},\theta}(T_{A}^{{\mathbf{p}},\theta})}{\log A}\geq\frac{1}{I_{{\mathsf{B}},\theta}}. (43)

To prove (39) it suffices to show that this bound is attained by TA𝐩,θT_{A}^{{\mathbf{p}},\theta}, i.e.,

lim supA𝖤𝖣𝖣k,𝖡,θ(TA𝐩,θ)logA1I𝖡,θ.\limsup_{A\to\infty}\frac{{\mathsf{EDD}}_{k,{\mathsf{B}},\theta}(T_{A}^{{\mathbf{p}},\theta})}{\log A}\leq\frac{1}{I_{{\mathsf{B}},\theta}}. (44)

For 0<ε<I𝖡,θ0<\varepsilon<I_{{\mathsf{B}},\theta}, define

MA=MA(ε,𝖡,θ)=1+Ψ(logAI𝖡,θε).M_{A}=M_{A}(\varepsilon,{\mathsf{B}},\theta)=1+\left\lfloor\Psi\left(\frac{\log A}{I_{{\mathsf{B}},\theta}-\varepsilon}\right)\right\rfloor. (45)

Recall the definitions of the mixture SR statistic R𝐩,θ(n)R_{{\mathbf{p}},\theta}(n) and mixture LR Λ𝐩,θ(n)\Lambda_{{\mathbf{p}},\theta}(n) given in (10) and (11), respectively. Obviously, for any n1n\geq 1 and 𝖡𝒫{\mathsf{B}}\in{\mathscr{P}},

\log R_{{\mathbf{p}},\theta}(k+n)\geq\log\Lambda_{{\mathbf{p}},\theta}(k,k+n)\geq\lambda_{{\mathsf{B}},\theta}(k,k+n)+\log p_{\mathsf{B}},

and we have

𝖯k,𝖡,θ(TA𝐩,θk>n)𝖯k,𝖡,θ{1ψ(n)logR𝐩,θ(k+n)<1ψ(n)logA}\displaystyle{\mathsf{P}}_{k,{\mathsf{B}},\theta}\left(T_{A}^{{\mathbf{p}},\theta}-k>n\right)\leq{\mathsf{P}}_{k,{\mathsf{B}},\theta}\left\{\frac{1}{\psi(n)}\log R_{{\mathbf{p}},\theta}(k+n)<\frac{1}{\psi(n)}\log A\right\}
𝖯k,𝖡,θ{1ψ(n)λ𝖡,θ(k,k+n)<1ψ(n)(logA+|logp𝖡|)},\displaystyle\leq{\mathsf{P}}_{k,{\mathsf{B}},\theta}\left\{\frac{1}{\psi(n)}\lambda_{{\mathsf{B}},\theta}(k,k+n)<\frac{1}{\psi(n)}\left(\log A+|\log p_{\mathsf{B}}|\right)\right\},

where for any nMAn\geq M_{A} the last probability does not exceed the probability

𝖯k,𝖡,θ{1ψ(n)λ𝖡,θ(k,k+n)<I𝖡,θε+|logp𝖡|/ψ(MA)}.{\mathsf{P}}_{k,{\mathsf{B}},\theta}\left\{\frac{1}{\psi(n)}\lambda_{{\mathsf{B}},\theta}(k,k+n)<I_{{\mathsf{B}},\theta}-\varepsilon+|\log p_{\mathsf{B}}|/\psi(M_{A})\right\}.

Therefore, for I𝖡,θ>ε>0I_{{\mathsf{B}},\theta}>\varepsilon>0, all nMAn\geq M_{A} and all sufficiently large AA such that |logp𝖡|/ψ(MA)<ε/2|\log p_{\mathsf{B}}|/\psi(M_{A})<\varepsilon/2, we have

𝖯k,𝖡,θ(TA𝐩,θk>n)\displaystyle{\mathsf{P}}_{k,{\mathsf{B}},\theta}\left(T_{A}^{{\mathbf{p}},\theta}-k>n\right) 𝖯k,𝖡,θ{1ψ(n)λ𝖡,θ(k,k+n)<I𝖡,θε/2}.\displaystyle\leq{\mathsf{P}}_{k,{\mathsf{B}},\theta}\left\{\frac{1}{\psi(n)}\lambda_{{\mathsf{B}},\theta}(k,k+n)<I_{{\mathsf{B}},\theta}-\varepsilon/2\right\}. (46)

Now, by Lemma A1 in Tartakovsky (Tartakovsky_book2020, , page 239), for any 1\ell\geq 1, k+k\in\mathbb{Z}_{+}, and 𝖡𝒫{\mathsf{B}}\in{\mathscr{P}},

𝖤k,𝖡,θ[(TA𝐩,θk)+]MA+ 21n=MAn1𝖯k,𝖡,θ(TA𝐩,θ>n),\displaystyle{\mathsf{E}}_{k,{\mathsf{B}},\theta}\left[(T_{A}^{{\mathbf{p}},\theta}-k)^{+}\right]^{\ell}\leq M_{A}^{\ell}+\ell\,2^{\ell-1}\sum_{n=M_{A}}^{\infty}n^{\ell-1}{\mathsf{P}}_{k,{\mathsf{B}},\theta}\left(T_{A}^{{\mathbf{p}},\theta}>n\right), (47)

which along with inequality (46) yields

𝖤k,𝖡,θ[(TA𝐩,θk)+]\displaystyle{\mathsf{E}}_{k,{\mathsf{B}},\theta}\left[\left(T_{A}^{{\mathbf{p}},\theta}-k\right)^{+}\right] 1+Ψ(logAI𝖡,θε)\displaystyle\leq 1+\left\lfloor\Psi\left(\frac{\log A}{I_{{\mathsf{B}},\theta}-\varepsilon}\right)\right\rfloor
+n=MA𝖯k,𝖡,θ{1ψ(n)λ𝖡,θ(k,k+n)<I𝖡,θε/2}.\displaystyle+\sum_{n=M_{A}}^{\infty}{\mathsf{P}}_{k,{\mathsf{B}},\theta}\left\{\frac{1}{\psi(n)}\lambda_{{\mathsf{B}},\theta}(k,k+n)<I_{{\mathsf{B}},\theta}-\varepsilon/2\right\}. (48)

Write

ΥA(ε,𝖡,θ)=n=MAsupk+𝖯k,𝖡,θ{1ψ(n)λ𝖡,θ(k,k+n)<I𝖡,θε}.\Upsilon_{A}(\varepsilon,{\mathsf{B}},\theta)=\sum_{n=M_{A}}^{\infty}\sup_{k\in\mathbb{Z}_{+}}{\mathsf{P}}_{k,{\mathsf{B}},\theta}\left\{\frac{1}{\psi(n)}\lambda_{{\mathsf{B}},\theta}(k,k+n)<I_{{\mathsf{B}},\theta}-\varepsilon\right\}.

Using (48) and the inequality 𝖯(TA𝐩,θ>k)>1(rA+k)/A{\mathsf{P}}_{\infty}(T_{A}^{{\mathbf{p}},\theta}>k)>1-(r_{A}+k)/A (see (34)), we obtain

𝖤𝖣𝖣k,𝖡,θ(TA𝐩,θ)\displaystyle{\mathsf{EDD}}_{k,{\mathsf{B}},\theta}(T_{A}^{{\mathbf{p}},\theta}) =𝖤k,𝖡,θ[(TA𝐩,θk)+]𝖯(TA𝐩,θ>k)\displaystyle=\frac{{\mathsf{E}}_{k,{\mathsf{B}},\theta}\left[\left(T_{A}^{{\mathbf{p}},\theta}-k\right)^{+}\right]}{{\mathsf{P}}_{\infty}(T_{A}^{{\mathbf{p}},\theta}>k)}
1+Ψ(logAI𝖡,θε)+ΥA(ε/2,𝖡,θ)1(rA+k)/A.\displaystyle\leq\frac{1+\left\lfloor\Psi\left(\frac{\log A}{I_{{\mathsf{B}},\theta}-\varepsilon}\right)\right\rfloor+\Upsilon_{A}(\varepsilon/2,{\mathsf{B}},\theta)}{1-(r_{A}+k)/A}. (49)

Since, due to condition (37), rA/A0r_{A}/A\to 0 and, by the complete convergence condition (36), limAΥA(ε/2,𝖡,θ)=0\lim_{A\to\infty}\Upsilon_{A}(\varepsilon/2,{\mathsf{B}},\theta)=0 for all ε>0\varepsilon>0 and 𝖡𝒫{\mathsf{B}}\in{\mathscr{P}}, inequality (49) implies the asymptotic inequality

𝖤𝖣𝖣k,𝖡,θ(TA𝐩,θ)(logAI𝖡,θε)(1+o(1))asA.{\mathsf{EDD}}_{k,{\mathsf{B}},\theta}(T_{A}^{{\mathbf{p}},\theta})\leq\left(\frac{\log A}{I_{{\mathsf{B}},\theta}-\varepsilon}\right)(1+o(1))\quad\text{as}~{}A\to\infty.

Since ε\varepsilon can be arbitrarily small, the asymptotic upper bound (44) follows and the proof of the asymptotic approximation (39) is complete. ∎

2.4(b) Asymptotic Optimality

Since we assume that the prior distribution may depend on the prescribed PFA α\alpha, the mean ν¯α=k=0kπkα\bar{\nu}_{\alpha}=\sum_{k=0}^{\infty}k\,\pi_{k}^{\alpha} depends on α\alpha. We also suppose that the head-start r=rαr=r_{\alpha} may depend on α\alpha and may go to infinity as α0\alpha\to 0. We further assume that rαr_{\alpha} and ν¯α\bar{\nu}_{\alpha} approach infinity at such a rate that

limα0log(rα+ν¯α)|logα|=0.\lim_{\alpha\to 0}\frac{{\log(r_{\alpha}+\bar{\nu}_{\alpha})}}{|\log\alpha|}=0. (50)

The following theorem establishes the first-order asymptotic (as α0\alpha\to 0) optimality of the mixture detection procedure TA𝐩,θT_{A}^{{\mathbf{p}},\theta} for the fixed value of θ\theta in the general non-i.i.d. case. In this theorem, we assume that ψ(x)\psi(x) satisfies condition (24) in contrast to Theorem 2.2 where this function can be practically arbitrary.

Theorem 2.3

Let the prior distribution of the change point satisfy condition 𝐂𝐏1{\mathbf{CP}}_{1} and let the mean \bar{\nu}_{\alpha} of the prior distribution and the head-start rαr_{\alpha} of the statistic R𝐩,θ(n)R_{{\mathbf{p}},\theta}(n) go to infinity at such a rate that condition (50) holds. Assume that for some 0<I𝖡,θ<0<I_{{\mathsf{B}},\theta}<\infty (𝖡𝒫{\mathsf{B}}\in{\mathscr{P}}) the uniform complete convergence condition (36) holds. If the threshold A=AαA=A_{\alpha} is so selected that 𝖯𝖥𝖠π(TAα𝐩,θ)α{\mathsf{PFA}}_{\pi}(T_{A_{\alpha}}^{{\mathbf{p}},\theta})\leq\alpha and logAα|logα|\log A_{\alpha}\sim|\log\alpha| as α0\alpha\to 0, in particular if Aα=(rα+ν¯α)/αA_{\alpha}=(r_{\alpha}+\bar{\nu}_{\alpha})/\alpha, then for all k+k\in\mathbb{Z}_{+} and 𝖡𝒫{\mathsf{B}}\in{\mathscr{P}} as α0\alpha\to 0

𝖤𝖣𝖣k,𝖡,θ(TAα𝐩,θ)\displaystyle{\mathsf{EDD}}_{k,{\mathsf{B}},\theta}(T_{A_{\alpha}}^{{\mathbf{p}},\theta}) =Ψ(|logα|I𝖡,θ)(1+o(1));\displaystyle=\Psi\left(\frac{|\log\alpha|}{I_{{\mathsf{B}},\theta}}\right)(1+o(1)); (51)
infTπ(α)𝖤𝖣𝖣k,𝖡,θ(T)\displaystyle\inf_{T\in{\mathbb{C}_{\pi}}(\alpha)}{\mathsf{EDD}}_{k,{\mathsf{B}},\theta}(T) =𝖤𝖣𝖣k,𝖡,θ(TAα𝐩,θ)(1+o(1)).\displaystyle={\mathsf{EDD}}_{k,{\mathsf{B}},\theta}(T_{A_{\alpha}}^{{\mathbf{p}},\theta})(1+o(1)). (52)

That is, the detection procedure TAα𝐩,θT_{A_{\alpha}}^{{\mathbf{p}},\theta} is first-order asymptotically optimal as α0\alpha\to 0 in class π(α){\mathbb{C}_{\pi}}(\alpha), minimizing average detection delay 𝖤𝖣𝖣k,𝖡,θ(T){\mathsf{EDD}}_{k,{\mathsf{B}},\theta}(T) uniformly for all k+k\in\mathbb{Z}_{+} and 𝖡𝒫{\mathsf{B}}\in{\mathscr{P}}.

Proof

If threshold AαA_{\alpha} is so selected that logAα|logα|\log A_{\alpha}\sim|\log\alpha| as α0\alpha\to 0, then asymptotic approximation (51) follows from Theorem 2.2. This asymptotic approximation coincides with the asymptotic lower bound (26) in Theorem 2.1 if condition (24) on ψ(x)\psi(x) holds. Hence, the lower bound is attained by TAα𝐩,θT^{{\mathbf{p}},\theta}_{A_{\alpha}}, proving (52) and the assertion of the theorem.

In particular, if the threshold is selected as Aα=(rα+ν¯α)/αA_{\alpha}=(r_{\alpha}+\bar{\nu}_{\alpha})/\alpha, then by Lemma 1 𝖯𝖥𝖠π(TAα𝐩,θ)α{\mathsf{PFA}}_{\pi}(T_{A_{\alpha}}^{{\mathbf{p}},\theta})\leq\alpha and logAαlogα\log A_{\alpha}\sim-\log\alpha due to condition (50). ∎

Remark 1

If the prior distribution π\pi of the change point does not depend on α\alpha and condition 𝐂𝐏1{\mathbf{CP}}_{1} is satisfied with β0\beta\geq 0, then the assertion of Theorem 2.3 holds if, and only if, β=0\beta=0, i.e., for heavy-tailed prior distributions, but not for priors with exponential right tail with β>0\beta>0. This follows from the fact that for β>0\beta>0 the lower bound has the form

\inf_{T\in{\mathbb{C}_{\pi}}(\alpha)}{\mathsf{EDD}}_{k,{\mathsf{B}},\theta}(T)\geq\Psi\left(\frac{|\log\alpha|}{I_{{\mathsf{B}},\theta}+\beta}\right)(1+o(1))\quad\text{as}~{}\alpha\to 0.

Asymptotic Optimality of the Double-Mixture Procedure TA𝐩,WT_{A}^{{\mathbf{p}},W} for Unknown Post-change Parameter

Consider now the case where the post-change parameter θΘ\theta\in\Theta is unknown. The goal is to show that the double-mixture detection procedure TA𝐩,WT_{A}^{{\mathbf{p}},W} defined in (14) is asymptotically optimal to first order.

Recall that in the case of the known parameter θ\theta, the sufficient condition for asymptotic optimality (imposed on the data model) is the uniform complete version of the SLLN (36). The proofs of Theorem 2.1 and Theorem 2.2 show that to establish asymptotic optimality of procedure TA𝐩,θT_{A}^{{\mathbf{p}},\theta} it suffices to require the following two (right-tail and left-tail) conditions: for all ε>0\varepsilon>0, k+k\in\mathbb{Z}_{+} and 𝖡𝒫{\mathsf{B}}\in{\mathscr{P}}

limL𝖯k,𝖡,θ{1ψ(L)max0n<Lλ𝖡,θ(k,k+n)(1+ε)I𝖡,θ}=0\lim_{L\to\infty}{\mathsf{P}}_{k,{\mathsf{B}},\theta}\left\{\frac{1}{\psi(L)}\max_{0\leq n<L}\lambda_{{\mathsf{B}},\theta}(k,k+n)\geq(1+\varepsilon)I_{{\mathsf{B}},\theta}\right\}=0 (53)

and

n=1𝖯k,𝖡,θ{1ψ(n)λ𝖡,θ(k,k+n)<I𝖡,θε}<.\sum_{n=1}^{\infty}{\mathsf{P}}_{k,{\mathsf{B}},\theta}\left\{\frac{1}{\psi(n)}\lambda_{{\mathsf{B}},\theta}(k,k+n)<I_{{\mathsf{B}},\theta}-\varepsilon\right\}<\infty. (54)

Note that the SLLN (16) guarantees the right-tail condition (53), while the complete convergence version (36) guarantees both conditions (53) and (54).

If the post-change parameter θ\theta is unknown, then to obtain the upper bound for the expected detection delay the left-tail condition (54) has to be modified as follows. For δ>0\delta>0, define Γδ,θ={ϑΘ:|ϑθ|<δ}\Gamma_{\delta,\theta}=\{\vartheta\in\Theta\,:\,|\vartheta-\theta|<\delta\} and assume that there exist positive and finite numbers I𝖡,θI_{{\mathsf{B}},\theta} (𝖡𝒫{\mathsf{B}}\in{\mathscr{P}}, θΘ\theta\in\Theta) such that for any ε>0\varepsilon>0 and for all 𝖡𝒫{\mathsf{B}}\in{\mathscr{P}} and θΘ\theta\in\Theta

limδ0n=1supk+𝖯k,𝖡,θ{1ψ(n)infϑΓδ,θλ𝖡,ϑ(k,k+n)<I𝖡,θε}<.\lim_{\delta\to 0}\sum_{n=1}^{\infty}\sup_{k\in\mathbb{Z}_{+}}{\mathsf{P}}_{k,{\mathsf{B}},\theta}\left\{\frac{1}{\psi(n)}\inf_{\vartheta\in\Gamma_{\delta,\theta}}\lambda_{{\mathsf{B}},\vartheta}(k,k+n)<I_{{\mathsf{B}},\theta}-\varepsilon\right\}<\infty. (55)

We ignore parameter values of WW-measure zero, i.e., we will always tacitly assume that

W(Γδ,θ)>0for all θΘ and δ>0.W(\Gamma_{\delta,\theta})>0~{}~{}\text{for all $\theta\in\Theta$ and $\delta>0$}.

The following theorem generalizes Theorem 2.2 to the case of the unknown post-change parameter.

Theorem 2.4

Suppose that conditions (37) and (38) hold and there exist positive and finite numbers I𝖡,θI_{{\mathsf{B}},\theta}, 𝖡𝒫{\mathsf{B}}\in{\mathscr{P}}, θΘ\theta\in\Theta, such that the right-tail and left-tail conditions (53) and (55) are satisfied. Then the following asymptotic approximation holds as AA\to\infty:

𝖤𝖣𝖣k,𝖡,θ(TA𝐩,W)=Ψ(logAI𝖡,θ)(1+o(1))for allk+,𝖡𝒫,θΘ.{\mathsf{EDD}}_{k,{\mathsf{B}},\theta}(T_{A}^{{\mathbf{p}},W})=\Psi\left(\frac{\log A}{I_{{\mathsf{B}},\theta}}\right)(1+o(1))\quad\text{for all}~{}k\in\mathbb{Z}_{+},~{}{\mathsf{B}}\in{\mathscr{P}},\theta\in\Theta. (56)
Proof

The proof of the asymptotic lower bound (for any fixed k+k\in\mathbb{Z}_{+}, 𝖡𝒫{\mathsf{B}}\in{\mathscr{P}} and θΘ\theta\in\Theta)

𝖤𝖣𝖣k,𝖡,θ(TA𝐩,W)Ψ(logAI𝖡,θ)(1+o(1))asA{\mathsf{EDD}}_{k,{\mathsf{B}},\theta}(T_{A}^{{\mathbf{p}},W})\geq\Psi\left(\frac{\log A}{I_{{\mathsf{B}},\theta}}\right)(1+o(1))\quad\text{as}~{}A\to\infty (57)

is essentially identical to that used to establish the lower bound (43) for the expected delay to detection 𝖤𝖣𝖣k,𝖡,θ(TA𝐩,θ){\mathsf{EDD}}_{k,{\mathsf{B}},\theta}(T_{A}^{{\mathbf{p}},\theta}) in the proof of Theorem 2.2, so it is omitted.

To prove asymptotic approximation (56) it suffices to show that the lower bound (57) is attained by TA𝐩,WT_{A}^{{\mathbf{p}},W}, i.e.,

lim supA𝖤𝖣𝖣k,𝖡,θ(TA𝐩,W)logA1I𝖡,θ.\limsup_{A\to\infty}\frac{{\mathsf{EDD}}_{k,{\mathsf{B}},\theta}(T_{A}^{{\mathbf{p}},W})}{\log A}\leq\frac{1}{I_{{\mathsf{B}},\theta}}. (58)

Let MAM_{A} be as in (45). Since for any n1n\geq 1,

logR𝐩,W(k+n)logΛ𝐩,W(k,k+n)infϑΓδ,θλ𝖡,ϑ(k,k+n)+logW(Γδ,θ)+logp𝖡,\log R_{{\mathbf{p}},W}(k+n)\geq\log\Lambda_{{\mathbf{p}},W}(k,k+n)\geq\inf_{\vartheta\in\Gamma_{\delta,\theta}}\lambda_{{\mathsf{B}},\vartheta}(k,k+n)+\log W(\Gamma_{\delta,\theta})+\log p_{\mathsf{B}},

using essentially the same argument as in the proof of Theorem 2.2 that led to inequality (46), we obtain that for all nMAn\geq M_{A}

𝖯k,𝖡,θ(TA𝐩,W>n)\displaystyle{\mathsf{P}}_{k,{\mathsf{B}},\theta}\left(T_{A}^{{\mathbf{p}},W}>n\right) 𝖯k,𝖡,θ{1ψ(n)infϑΓδ,θλ𝖡,ϑ(k,k+n)<I𝖡,θε\displaystyle\leq{\mathsf{P}}_{k,{\mathsf{B}},\theta}\Bigg{\{}\frac{1}{\psi(n)}\inf_{\vartheta\in\Gamma_{\delta,\theta}}\lambda_{{\mathsf{B}},\vartheta}(k,k+n)<I_{{\mathsf{B}},\theta}-\varepsilon
1ψ(MA)(logp𝖡+logW(Γδ,θ))}.\displaystyle-\frac{1}{\psi(M_{A})}\left(\log p_{\mathsf{B}}+\log W(\Gamma_{\delta,\theta})\right)\Bigg{\}}.

Hence, for all nMAn\geq M_{A} and all sufficiently large AA such that |log[p𝖡W(Γδ,θ)]|/ψ(MA)<ε/2|\log[p_{\mathsf{B}}W(\Gamma_{\delta,\theta})]|/\psi(M_{A})<\varepsilon/2,

𝖯k,𝖡,θ(TA𝐩,Wk>n)\displaystyle{\mathsf{P}}_{k,{\mathsf{B}},\theta}\left(T_{A}^{{\mathbf{p}},W}-k>n\right) 𝖯k,𝖡,θ(1ψ(n)infϑΓδ,θλ𝖡,ϑ(k,k+n)<I𝖡,θε2).\displaystyle\leq{\mathsf{P}}_{k,{\mathsf{B}},\theta}\left(\frac{1}{\psi(n)}\inf_{\vartheta\in\Gamma_{\delta,\theta}}\lambda_{{\mathsf{B}},\vartheta}(k,k+n)<I_{{\mathsf{B}},\theta}-\frac{\varepsilon}{2}\right). (59)

Lemma A1 in Tartakovsky (Tartakovsky_book2020, , page 239) yields the inequality (for any k+k\in\mathbb{Z}_{+}, 𝖡𝒫{\mathsf{B}}\in{\mathscr{P}}, and θΘ\theta\in\Theta)

𝖤k,𝖡,θ[(TA𝐩,Wk)+]MA+n=MA𝖯k,𝖡,θ(TA𝐩,W>n).\displaystyle{\mathsf{E}}_{k,{\mathsf{B}},\theta}\left[(T_{A}^{{\mathbf{p}},W}-k)^{+}\right]\leq M_{A}+\sum_{n=M_{A}}^{\infty}{\mathsf{P}}_{k,{\mathsf{B}},\theta}\left(T_{A}^{{\mathbf{p}},W}>n\right). (60)

Using (60) and (59), we obtain

{\mathsf{E}}_{k,{\mathsf{B}},\theta}\left[\left(T_{A}^{{\mathbf{p}},W}-k\right)^{+}\right]\leq 1+\left\lfloor\Psi\left(\frac{\log A}{I_{{\mathsf{B}},\theta}-\varepsilon}\right)\right\rfloor+\Upsilon_{A}(\varepsilon/2,{\mathsf{B}},\theta), (61)

where

ΥA(ε,𝖡,θ)=n=MAsupk+𝖯k,𝖡,θ{1ψ(n)infϑΓδ,θλ𝖡,ϑ(k,k+n)<I𝖡,θε}.\Upsilon_{A}(\varepsilon,{\mathsf{B}},\theta)=\sum_{n=M_{A}}^{\infty}\sup_{k\in\mathbb{Z}_{+}}{\mathsf{P}}_{k,{\mathsf{B}},\theta}\left\{\frac{1}{\psi(n)}\inf_{\vartheta\in\Gamma_{\delta,\theta}}\lambda_{{\mathsf{B}},\vartheta}(k,k+n)<I_{{\mathsf{B}},\theta}-\varepsilon\right\}.

Next, inequality (61), together with the inequality 𝖯(TA𝐩,W>k)>1(rA+k)/A{\mathsf{P}}_{\infty}(T_{A}^{{\mathbf{p}},W}>k)>1-(r_{A}+k)/A (see (35)), implies

{\mathsf{EDD}}_{k,{\mathsf{B}},\theta}(T_{A}^{{\mathbf{p}},W})\leq\frac{1+\left\lfloor\Psi\left(\frac{\log A}{I_{{\mathsf{B}},\theta}-\varepsilon}\right)\right\rfloor+\Upsilon_{A}(\varepsilon/2,{\mathsf{B}},\theta)}{1-(r_{A}+k)/A}. (62)

Since rA/A0r_{A}/A\to 0 (see condition (37)) and, by the left-tail condition (55), ΥA(ε/2,𝖡,θ)0\Upsilon_{A}(\varepsilon/2,{\mathsf{B}},\theta)\to 0 as AA\to\infty for all ε>0\varepsilon>0, 𝖡𝒫{\mathsf{B}}\in{\mathscr{P}}, and θΘ\theta\in\Theta, inequality (62) implies the asymptotic inequality

𝖤𝖣𝖣k,𝖡,θ(TA𝐩,W)(logAI𝖡,θε)(1+o(1))asA,{\mathsf{EDD}}_{k,{\mathsf{B}},\theta}(T_{A}^{{\mathbf{p}},W})\leq\left(\frac{\log A}{I_{{\mathsf{B}},\theta}-\varepsilon}\right)(1+o(1))\quad\text{as}~{}A\to\infty,

where ε\varepsilon can be taken arbitrarily small, so that the asymptotic upper bound (58) follows and the proof of the asymptotic approximation (56) is complete. ∎

Using asymptotic approximation (56) in Theorem 2.4 and the lower bound (26) in Theorem 2.1, it is easy to prove that the mixture procedure TA𝐩,WT_{A}^{{\mathbf{p}},W} is asymptotically optimal to first order as α0\alpha\to 0 in class π(α){\mathbb{C}_{\pi}}(\alpha). We tacitly assume that ψ(x)\psi(x) is either a linear or super-linear function (see (24)).

Theorem 2.5

Let the prior distribution of the change point satisfy condition 𝐂𝐏1{\mathbf{CP}}_{1}. Assume also that the mean value ν¯=ν¯α\bar{\nu}=\bar{\nu}_{\alpha} of the prior distribution and the head-start r=rαr=r_{\alpha} of the statistic R𝐩,W(n)R_{{\mathbf{p}},W}(n) approach infinity as α0\alpha\to 0 at such a rate that condition (50) holds. Suppose further that there exist numbers 0<I𝖡,θ<0<I_{{\mathsf{B}},\theta}<\infty (𝖡𝒫,θΘ{\mathsf{B}}\in{\mathscr{P}},\theta\in\Theta) such that conditions (53) and (55) are satisfied. If the threshold AαA_{\alpha} is so selected that 𝖯𝖥𝖠π(TAα𝐩,W)α{\mathsf{PFA}}_{\pi}(T_{A_{\alpha}}^{{\mathbf{p}},W})\leq\alpha and logAα|logα|\log A_{\alpha}\sim|\log\alpha| as α0\alpha\to 0, in particular if Aα=(rα+ν¯α)/αA_{\alpha}=(r_{\alpha}+\bar{\nu}_{\alpha})/\alpha, then for all k+k\in\mathbb{Z}_{+}, 𝖡𝒫{\mathsf{B}}\in{\mathscr{P}} and θΘ\theta\in\Theta as α0\alpha\to 0

𝖤𝖣𝖣k,𝖡,θ(TAα𝐩,W)\displaystyle{\mathsf{EDD}}_{k,{\mathsf{B}},\theta}(T_{A_{\alpha}}^{{\mathbf{p}},W}) =Ψ(|logα|I𝖡,θ)(1+o(1));\displaystyle=\Psi\left(\frac{|\log\alpha|}{I_{{\mathsf{B}},\theta}}\right)(1+o(1)); (63)
infTπ(α)𝖤𝖣𝖣k,𝖡,θ(T)\displaystyle\inf_{T\in{\mathbb{C}_{\pi}}(\alpha)}{\mathsf{EDD}}_{k,{\mathsf{B}},\theta}(T) =𝖤𝖣𝖣k,𝖡,θ(TAα𝐩,W)(1+o(1)),\displaystyle={\mathsf{EDD}}_{k,{\mathsf{B}},\theta}(T_{A_{\alpha}}^{{\mathbf{p}},W})(1+o(1)), (64)

that is, the procedure TAα𝐩,WT_{A_{\alpha}}^{{\mathbf{p}},W} is first-order asymptotically optimal as α0\alpha\to 0 in class π(α){\mathbb{C}_{\pi}}(\alpha).

Proof

If AαA_{\alpha} is so selected that logAα|logα|\log A_{\alpha}\sim|\log\alpha| as α0\alpha\to 0, then asymptotic approximation (63) follows from asymptotic approximation (56) in Theorem 2.4. Since this approximation is the same as the asymptotic lower bound (26) in Theorem 2.1, this shows that the lower bound is attained by the detection rule TAα𝐩,WT_{A_{\alpha}}^{{\mathbf{p}},W}, so (64) follows and the proof is complete.

If, in particular, Aα=(rα+ν¯α)/αA_{\alpha}=(r_{\alpha}+\bar{\nu}_{\alpha})/\alpha, then logAα|logα|\log A_{\alpha}\sim|\log\alpha| and by Lemma 1 TAα𝐩,Wπ(α)T_{A_{\alpha}}^{{\mathbf{p}},W}\in{\mathbb{C}_{\pi}}(\alpha). ∎

Remark 2

The assertion of Theorem 2.5 holds when ν¯<\bar{\nu}<\infty and rr do not depend on α\alpha and the condition 𝐂𝐏1{\mathbf{CP}}_{1} is satisfied with β0\beta\equiv 0, i.e., for priors π\pi with heavy tails. This could be expected since the detection statistic R𝐩,W(n)R_{{\mathbf{p}},W}(n) uses an improper uniform prior distribution of the change point on the whole positive line.

The Case of Independent Streams

Notice that so far we have considered a very general stochastic model in which not only may the observations within streams be dependent and non-identically distributed, but the streams themselves may be mutually dependent. In this very general case, computing the statistic R𝐩,W(n)R_{{\mathbf{p}},W}(n) is problematic even when the per-stream statistics can be computed. Consider now the still quite general scenario where the data streams are mutually independent (but have a general statistical structure), which is of special interest for many applications and makes the computational problem much more manageable.

2.4(a) Computational Issues

In the case where the data across streams are independent, the model has the form

p(𝐗n|Hk,𝖡,𝜽𝖡)=t=1ni=1Ngi(Xt(i)|𝐗t1(i))forν=kn,p(𝐗n|Hk,𝖡,𝜽𝖡)=t=1ki=1Ngi(Xt(i)|𝐗t1(i))×t=k+1ni𝖡fi,θi(Xt(i)|𝐗t1(i))i𝖡gi(Xt(i)|𝐗t1(i))forν=k<n,\begin{split}&p({\mathbf{X}}^{n}|H_{k,{\mathsf{B}}},{\bm{\theta}}_{\mathsf{B}})=\prod_{t=1}^{n}\prod_{i=1}^{N}g_{i}(X_{t}(i)|{\mathbf{X}}^{t-1}(i))\quad\text{for}~{}\nu=k\geq n,\\ &p({\mathbf{X}}^{n}|H_{k,{\mathsf{B}}},{\bm{\theta}}_{\mathsf{B}})=\prod_{t=1}^{k}\prod_{i=1}^{N}g_{i}(X_{t}(i)|{\mathbf{X}}^{t-1}(i))\times\\ &\quad\prod_{t=k+1}^{n}\prod_{i\in{\mathsf{B}}}f_{i,\theta_{i}}(X_{t}(i)|{\mathbf{X}}^{t-1}(i))\prod_{i\notin{\mathsf{B}}}g_{i}(X_{t}(i)|{\mathbf{X}}^{t-1}(i))\quad\text{for}~{}\nu=k<n,\end{split} (65)

where gi(Xt(i)|𝐗t1(i))g_{i}(X_{t}(i)|{\mathbf{X}}^{t-1}(i)) and fi,θi(Xt(i)|𝐗t1(i))f_{i,\theta_{i}}(X_{t}(i)|{\mathbf{X}}^{t-1}(i)) are conditional pre- and post-change densities in the ii-th data stream, respectively, θiΘi\theta_{i}\in\Theta_{i} is the unknown post-change parameter (generally multidimensional) in the ii-th stream (i{1,,N}=𝒩i\in\{1,\dots,N\}={\mathcal{N}}), and 𝜽𝖡=(θi,i𝖡){\bm{\theta}}_{\mathsf{B}}=(\theta_{i},i\in{\mathsf{B}}) is the vector of parameters in the set 𝖡{\mathsf{B}}. So the LR processes are

LR𝖡,𝜽𝖡(k,n)=i𝖡LRi,θi(k,n),LRi,θi(k,n)=t=k+1ni,θi(t),n>k,LR_{{\mathsf{B}},{\bm{\theta}}_{\mathsf{B}}}(k,n)=\prod_{i\in{\mathsf{B}}}LR_{i,\theta_{i}}(k,n),\quad LR_{i,\theta_{i}}(k,n)=\prod_{t=k+1}^{n}{\mathcal{L}}_{i,\theta_{i}}(t),\quad n>k, (66)

where i,θi(t)=fi,θi(Xt(i)|𝐗t1(i))/gi(Xt(i)|𝐗t1(i)){\mathcal{L}}_{i,\theta_{i}}(t)=f_{i,\theta_{i}}(X_{t}(i)|{\mathbf{X}}^{t-1}(i))/g_{i}(X_{t}(i)|{\mathbf{X}}^{t-1}(i)).

Recall that 𝒫K={𝖡:1|𝖡|K}{\mathscr{P}}_{K}=\{{\mathsf{B}}:1\leq|{\mathsf{B}}|\leq K\} is the subclass of 𝒫{\mathscr{P}} for which the cardinality |𝖡||{\mathsf{B}}| of the sets where the change may occur does not exceed KNK\leq N streams, and denote by 𝒫K={𝖡:|𝖡|=K}{\mathscr{P}}_{K}^{\star}=\{{\mathsf{B}}:|{\mathsf{B}}|=K\} the subclass of 𝒫{\mathscr{P}} for which the change occurs in exactly KK streams.

Assume, in addition, that the mixing measure is such that

p𝖡=C(𝒫K)i𝖡pi,C(𝒫K)=(𝖡𝒫Ki𝖡pi)1.p_{\mathsf{B}}=C({\mathscr{P}}_{K})\prod_{i\in{\mathsf{B}}}p_{i},\quad C({\mathscr{P}}_{K})=\left(\sum_{{\mathsf{B}}\in{\mathscr{P}}_{K}}\prod_{i\in{\mathsf{B}}}p_{i}\right)^{-1}.

Then the mixture LR is

\Lambda_{{\mathbf{p}},{\bm{\theta}}}(k,n)=C({\mathscr{P}}_{K})\sum_{i=1}^{K}\sum_{{\mathsf{B}}\in{\mathscr{P}}_{i}^{\star}}\prod_{j\in{\mathsf{B}}}p_{j}\,LR_{j,\theta_{j}}(k,n),

and its computational complexity is polynomial in the number of data streams. Moreover, in the special, most difficult case of K=NK=N and pj=pp_{j}=p, we obtain

Λ𝐩,𝜽(k,n)=C(𝒫N)[i=1N(1+pLRi,θi(k,n))1],\Lambda_{{\mathbf{p}},{\bm{\theta}}}(k,n)=C({\mathscr{P}}_{N})\left[\prod_{i=1}^{N}\left(1+pLR_{i,\theta_{i}}(k,n)\right)-1\right], (67)

so its computational complexity is only O(N)O(N). The representation (67) corresponds to the case when each stream is affected independently with probability p/(1+p)p/(1+p), an assumption made in Xie&Siegmund-AS13 .
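For illustration, the following minimal Python sketch (with our own, purely illustrative names) evaluates the mixture LR (67) in O(N)O(N) operations, assuming the per-stream likelihood ratios LRi,θi(k,n)LR_{i,\theta_{i}}(k,n) in (66) have already been computed:

```python
import numpy as np

def mixture_lr(lr, p):
    """O(N) evaluation of the mixture likelihood ratio (67) for K = N and
    equal stream weights p_j = p; lr[i] holds LR_{i,theta_i}(k, n)."""
    N = len(lr)
    # C(P_N)^{-1} = sum over nonempty subsets B of prod_{i in B} p = (1 + p)^N - 1
    c = 1.0 / ((1.0 + p) ** N - 1.0)
    return c * (np.prod(1.0 + p * np.asarray(lr)) - 1.0)
```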

2.4(b) Asymptotic Optimality of Detection Procedures

Since the data are independent across streams, for an assumed value of the change point ν=k\nu=k, stream i𝒩i\in{\mathcal{N}}, and the post-change parameter value in the ii-th stream θi\theta_{i}, the LLR of observations accumulated by time k+nk+n is given by

\lambda_{i,\theta_{i}}(k,k+n)=\sum_{t=k+1}^{k+n}\log\frac{f_{i,\theta_{i}}(X_{t}(i)|{\mathbf{X}}^{t-1}(i))}{g_{i}(X_{t}(i)|{\mathbf{X}}^{t-1}(i))},\quad n\geq 1.

Define Γδ,θi={ϑΘi:|ϑθi|<δ}\Gamma_{\delta,\theta_{i}}=\{\vartheta\in\Theta_{i}\,:\,|\vartheta-\theta_{i}|<\delta\},

γk,L(i)(ε,θi)\displaystyle\gamma_{k,L}^{(i)}(\varepsilon,\theta_{i}) =𝖯k,i,θi{1ψ(L)max1nLλi,θi(k,k+n)(1+ε)Ii,θi},\displaystyle={\mathsf{P}}_{k,i,\theta_{i}}\left\{\frac{1}{\psi(L)}\max_{1\leq n\leq L}\lambda_{i,\theta_{i}}(k,k+n)\geq(1+\varepsilon)I_{i,\theta_{i}}\right\},
Υ(i)(ε,θi)\displaystyle\Upsilon^{(i)}(\varepsilon,\theta_{i}) =limδ0n=1supk+𝖯k,i,θi{1ψ(n)infϑΓδ,θiλi,ϑ(k,k+n)<Ii,θiε},\displaystyle=\lim_{\delta\to 0}\sum_{n=1}^{\infty}\sup_{k\in\mathbb{Z}_{+}}{\mathsf{P}}_{k,i,\theta_{i}}\left\{\frac{1}{\psi(n)}\inf_{\vartheta\in\Gamma_{\delta,\theta_{i}}}\lambda_{i,\vartheta}(k,k+n)<I_{i,\theta_{i}}-\varepsilon\right\},

and assume that the following right-tail and left-tail conditions are satisfied for local LLR statistics in data streams: There exist positive and finite numbers Ii,θiI_{i,\theta_{i}}, θiΘi\theta_{i}\in\Theta_{i}, i𝒩i\in{\mathcal{N}}, such that for any ε>0\varepsilon>0

limLγk,L(i)(ε,θi)=0for allk+,θiΘi,i𝒩;\lim_{L\to\infty}\gamma_{k,L}^{(i)}(\varepsilon,\theta_{i})=0\quad\text{for all}~{}k\in\mathbb{Z}_{+},~{}\theta_{i}\in\Theta_{i},~{}i\in{\mathcal{N}}; (68)

and

Υ(i)(ε,θi)<for allθiΘi,i𝒩.\Upsilon^{(i)}(\varepsilon,\theta_{i})<\infty\quad\text{for all}~{}\theta_{i}\in\Theta_{i},~{}i\in{\mathcal{N}}. (69)

We also assume that W(Γδ,θi)>0W(\Gamma_{\delta,\theta_{i}})>0 for all δ>0\delta>0 and i𝒩i\in{\mathcal{N}}.

Let I𝖡,𝜽𝖡=i𝖡Ii,θiI_{{\mathsf{B}},{\bm{\theta}}_{{\mathsf{B}}}}=\sum_{i\in{\mathsf{B}}}I_{i,\theta_{i}}. Recall that

γk,L(ε,𝖡,θ)=𝖯k,𝖡,θ{1ψ(L)max1nLλ𝖡,θ(k,k+n)(1+ε)I𝖡,θ}.\gamma_{k,L}(\varepsilon,{\mathsf{B}},\theta)={\mathsf{P}}_{k,{\mathsf{B}},\theta}\left\{\frac{1}{\psi(L)}\max_{1\leq n\leq L}\lambda_{{\mathsf{B}},\theta}(k,k+n)\geq(1+\varepsilon)I_{{\mathsf{B}},\theta}\right\}.

Since the LLR process λ𝖡,𝜽𝖡(k,k+n)\lambda_{{\mathsf{B}},{\bm{\theta}}_{\mathsf{B}}}(k,k+n) is the sum of independent local LLRs, λ𝖡,𝜽𝖡(k,k+n)=i𝖡λi,θi(k,k+n)\lambda_{{\mathsf{B}},{\bm{\theta}}_{\mathsf{B}}}(k,k+n)=\sum_{i\in{\mathsf{B}}}\lambda_{i,\theta_{i}}(k,k+n) (see (66)), it is easy to see that

γk,L(ε,𝖡,𝜽𝖡)i𝖡γk,L(i)(ε,θi),\gamma_{k,L}(\varepsilon,{\mathsf{B}},{\bm{\theta}}_{\mathsf{B}})\leq\sum_{i\in{\mathsf{B}}}\gamma_{k,L}^{(i)}(\varepsilon,\theta_{i}),

so that local conditions (68) imply global right-tail condition (53). This is true, in particular, if λi,θi(k,k+n)/ψ(n)\lambda_{i,\theta_{i}}(k,k+n)/\psi(n) converge 𝖯k,i,θi{\mathsf{P}}_{k,i,\theta_{i}}-a.s. to Ii,θiI_{i,\theta_{i}}, i=1,,Ni=1,\dots,N, in which case the SLLN for the global LLR (16) holds with I𝖡,𝜽𝖡=i𝖡Ii,θiI_{{\mathsf{B}},{\bm{\theta}}_{{\mathsf{B}}}}=\sum_{i\in{\mathsf{B}}}I_{i,\theta_{i}}. Also,

\Upsilon(\varepsilon,{\mathsf{B}},{\bm{\theta}}_{\mathsf{B}})\leq\sum_{i\in{\mathsf{B}}}\Upsilon^{(i)}(\varepsilon,\theta_{i}),

which shows that local left-tail conditions (69) imply global left-tail condition (55).

Theorem 2.5 implies the following results on asymptotic properties of the mixture procedure TA𝐩,WT_{A}^{{\mathbf{p}},W}.

Corollary 1

Assume that for some positive and finite numbers Ii,θiI_{i,\theta_{i}}, θiΘi\theta_{i}\in\Theta_{i}, i=1,,Ni=1,\dots,N, the right-tail and left-tail conditions (68) and (69) for local data streams are satisfied. If the threshold AαA_{\alpha} is so selected that 𝖯𝖥𝖠π(TAα𝐩,W)α{\mathsf{PFA}}_{\pi}(T_{A_{\alpha}}^{{\mathbf{p}},W})\leq\alpha and logAα|logα|\log A_{\alpha}\sim|\log\alpha| as α0\alpha\to 0, in particular if Aα=(rα+ν¯α)/αA_{\alpha}=(r_{\alpha}+\bar{\nu}_{\alpha})/\alpha, and if conditions (24) and (50) are satisfied, then asymptotic formulas (63) and (64) hold with I𝖡,θ=I𝖡,𝛉𝖡=i𝖡Ii,θiI_{{\mathsf{B}},\theta}=I_{{\mathsf{B}},{\bm{\theta}}_{\mathsf{B}}}=\sum_{i\in{\mathsf{B}}}I_{i,\theta_{i}}. Therefore, the detection procedure TAα𝐩,WT_{A_{\alpha}}^{{\mathbf{p}},W} is first-order asymptotically optimal as α0\alpha\to 0 in class π(α){\mathbb{C}_{\pi}}(\alpha).

Remark 3

The assertions of Corollary 1 also hold for different distribution functions WiW_{i}, i𝒩i\in{\mathcal{N}} in streams if we assume that Wi(Γδ,θi)>0W_{i}(\Gamma_{\delta,\theta_{i}})>0 for all δ>0\delta>0 and i𝒩i\in{\mathcal{N}}. A modification in the proof is trivial.

Remark 4

In the case where the post-change parameter is known, an essentially similar corollary follows from Theorem 2.3 if we require the uniform complete version of the SLLN for all streams i𝒩i\in{\mathcal{N}}:

n=1supk+𝖯k,i,θi{|λi,θi(k,k+n)ψ(n)Ii,θi|>ε}<for allε>0.\sum_{n=1}^{\infty}\sup_{k\in\mathbb{Z}_{+}}{\mathsf{P}}_{k,i,\theta_{i}}\left\{\left|\frac{\lambda_{i,\theta_{i}}(k,k+n)}{\psi(n)}-I_{i,\theta_{i}}\right|>\varepsilon\right\}<\infty\quad\text{for all}~{}\varepsilon>0. (70)

3 Examples

3.1 Detection of Changes in the Mean Values of Multistream Autoregressive Non-stationary Processes

This example, which concerns detecting changes in the unknown mean values of multistream non-stationary AR(pp) processes, has a concrete practical application in addition to several other applications in mathematical statistics. Specifically, it arises in multi-channel sensor systems (such as radars and electro-optic imaging systems) where it is required to detect an unknown number of randomly appearing signals from objects in clutter and sensor noise (cf., e.g., Bakutetal-book63 ; Tartakovsky&Brown-IEEEAES08 ; TNB_book2014 ).

Observations in the ii-th channel have the form

Xn(i)=θiSn(i)𝟙{n>ν}+ξn(i),n1,i=1,,N,X_{n}(i)=\theta_{i}\,S_{n}(i)\mathbbm{1}_{\{n>\nu\}}+\xi_{n}(i),\quad n\geq 1,~{}i=1,\dots,N, (71)

where θiSn(i)\theta_{i}\,S_{n}(i) are deterministic signals with unknown “amplitudes” θi>0\theta_{i}>0 that appear at an unknown time ν\nu in additive noises ξn(i)\xi_{n}(i) in an NN-channel system. Assume that noise processes {ξn(i)}n+\{\xi_{n}(i)\}_{n\in\mathbb{Z}_{+}} (i=1,,Ni=1,\dots,N) are mutually independent pp-th order Gaussian autoregressive processes AR(p)(p) that obey recursions

ξn(i)=j=1pρi,jξnj(i)+wn(i),n1,\xi_{n}(i)=\sum_{j=1}^{p}\rho_{i,j}\xi_{n-j}(i)+w_{n}(i),\quad n\geq 1, (72)

where {wn(i)}n1\{w_{n}(i)\}_{n\geq 1}, i=1,,Ni=1,\dots,N, are mutually independent i.i.d. normal 𝒩(0,σi2){\mathcal{N}}(0,\sigma_{i}^{2}) sequences (σi>0\sigma_{i}>0), so the observations in channels X_{n}(1),\dots,X_{n}(N) are independent of each other. For simplicity, let us set zero initial conditions ξ1p(i)=ξ2p(i)==ξ0(i)=0\xi_{1-p}(i)=\xi_{2-p}(i)=\cdots=\xi_{0}(i)=0. The coefficients ρi,1,,ρi,p\rho_{i,1},\dots,\rho_{i,p} and variances σi2\sigma_{i}^{2} are known and all roots of the equation zpρi,1zp1ρi,p=0z^{p}-\rho_{i,1}z^{p-1}-\cdots-\rho_{i,p}=0 are in the interior of the unit circle, so that the AR(pp) processes are stable.

For n1n\geq 1, define the pnp_{n}-th order residuals

S~n(i)=Sn(i)j=1pnρi,jSnj(i),X~n(i)=Xn(i)j=1pnρi,jXnj(i),\widetilde{S}_{n}(i)=S_{n}(i)-\sum_{j=1}^{p_{n}}\rho_{i,j}S_{n-j}(i),\quad\widetilde{X}_{n}(i)=X_{n}(i)-\sum_{j=1}^{p_{n}}\rho_{i,j}X_{n-j}(i),

where pn=pp_{n}=p if n>pn>p and pn=np_{n}=n if npn\leq p. It is easily shown that the conditional pre-change and post-change densities in the ii-th channel are

gi(Xn(i)|𝐗n1(i))\displaystyle g_{i}(X_{n}(i)|{\mathbf{X}}^{n-1}(i)) =f0,i(Xn(i)|𝐗n1(i))=12πσi2exp{X~n(i)22σi2},\displaystyle=f_{0,i}(X_{n}(i)|{\mathbf{X}}^{n-1}(i))=\frac{1}{\sqrt{2\pi\sigma_{i}^{2}}}\exp\left\{-\frac{\widetilde{X}_{n}(i)^{2}}{2\sigma_{i}^{2}}\right\},
fi,θi(Xn(i)|𝐗n1(i))\displaystyle f_{i,\theta_{i}}(X_{n}(i)|{\mathbf{X}}^{n-1}(i)) =12πσi2exp{(X~n(i)θiS~n(i))22σi2},θi(0,),\displaystyle=\frac{1}{\sqrt{2\pi\sigma_{i}^{2}}}\exp\left\{-\frac{(\widetilde{X}_{n}(i)-\theta_{i}\widetilde{S}_{n}(i))^{2}}{2\sigma_{i}^{2}}\right\},\quad\theta_{i}\in(0,\infty),

and that for all k+k\in\mathbb{Z}_{+} and n1n\geq 1 the LLR in the ii-th channel has the form

λi,θi(k,k+n)=θiσi2t=k+1k+nS~t(i)X~t(i)θi2t=k+1k+nS~t(i)22σi2.\lambda_{i,\theta_{i}}(k,k+n)=\frac{\theta_{i}}{\sigma_{i}^{2}}\sum_{t=k+1}^{k+n}\widetilde{S}_{t}(i)\widetilde{X}_{t}(i)-\frac{\theta_{i}^{2}\sum_{t=k+1}^{k+n}\widetilde{S}_{t}(i)^{2}}{2\sigma_{i}^{2}}.

Since under the measure 𝖯k,i,ϑi{\mathsf{P}}_{k,i,\vartheta_{i}} the random variables {X~n(i)}nk+1\{\widetilde{X}_{n}(i)\}_{n\geq k+1} are independent Gaussian random variables {\mathcal{N}}(\vartheta_{i}\widetilde{S}_{n}(i),\sigma_{i}^{2}), the LLR λi,θi(k,k+n)\lambda_{i,\theta_{i}}(k,k+n) is, under 𝖯k,i,ϑi{\mathsf{P}}_{k,i,\vartheta_{i}}, a Gaussian process (with independent but non-identically distributed increments) with mean and variance

𝖤k,i,ϑi[λi,θi(k,k+n)]=2θiϑiθi22σi2t=k+1k+nS~t(i)2,𝖵𝖺𝗋k,i,ϑi[λi,θi(k,k+n)]=θi2σi2t=k+1k+nS~t(i)2.\begin{split}{\mathsf{E}}_{k,i,\vartheta_{i}}[\lambda_{i,\theta_{i}}(k,k+n)]&=\frac{2\theta_{i}\vartheta_{i}-\theta_{i}^{2}}{2\sigma_{i}^{2}}\sum_{t=k+1}^{k+n}\widetilde{S}_{t}(i)^{2},\\ \mathsf{Var}_{k,i,\vartheta_{i}}[\lambda_{i,\theta_{i}}(k,k+n)]&=\frac{\theta_{i}^{2}}{\sigma_{i}^{2}}\sum_{t=k+1}^{k+n}\widetilde{S}_{t}(i)^{2}.\end{split} (73)

Assume that

limn1ψ(n)supk+t=k+1k+nS~t(i)2=Qi,\lim_{n\to\infty}\frac{1}{\psi(n)}\sup_{k\in\mathbb{Z}_{+}}\sum_{t=k+1}^{k+n}\widetilde{S}_{t}(i)^{2}=Q_{i},

where 0<Qi<0<Q_{i}<\infty. In a variety of signal processing applications this condition holds with ψ(n)=n\psi(n)=n, e.g., in radar applications where the signals \theta_{i}S_{n}(i) are sequences of harmonic pulses. In some applications, such as detection, recognition, and tracking of objects on ballistic trajectories that can be approximated by polynomials of order m=23m=2-3, the function ψ(n)=nm\psi(n)=n^{m}, m>1m>1. Then for all k+k\in\mathbb{Z}_{+} and θi(0,)\theta_{i}\in(0,\infty)

1ψ(n)λi,θi(k,k+n)n𝖯k,i,θia.s.θi2Qi2σi2=Ii,θi,\frac{1}{\psi(n)}\lambda_{i,\theta_{i}}(k,k+n)\xrightarrow[n\to\infty]{{\mathsf{P}}_{k,i,\theta_{i}}-\text{a.s.}}\frac{\theta_{i}^{2}Q_{i}}{2\sigma_{i}^{2}}=I_{i,\theta_{i}},

so that the SLLN (16) takes place and, therefore, the right-tail condition (68) holds. Furthermore, since the second moment of the LLR is finite, it can be shown that the uniform complete convergence condition (70) as well as the left-tail condition (69) also hold. Thus, by Corollary 1, the mixture detection procedure TAα𝐩,WT_{A_{\alpha}}^{{\mathbf{p}},W} minimizes as α0\alpha\to 0 the expected delay to detection, and asymptotic formulas (63) and (64) hold with

I𝖡,θ=I𝖡,𝜽𝖡=i𝖡θi2Qi2σi2.I_{{\mathsf{B}},\theta}=I_{{\mathsf{B}},{\bm{\theta}}_{\mathsf{B}}}=\sum_{i\in{\mathsf{B}}}\frac{\theta_{i}^{2}Q_{i}}{2\sigma_{i}^{2}}.
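As an illustration, a minimal Python sketch (illustrative names; not the implementation used in the simulations of Section 4) of the LLR λi,θi(k,k+n)\lambda_{i,\theta_{i}}(k,k+n) in a single channel of model (71)–(72); zero initial conditions are assumed, so residual terms with time indices below 1 vanish:

```python
import numpy as np

def llr_channel(X, S, rho, sigma, theta, k, n):
    """LLR lambda_{i,theta}(k, k+n) for one channel of model (71)-(72).

    X[t-1] = X_t(i) and S[t-1] = S_t(i) (0-based arrays for 1-based time),
    rho = (rho_{i,1}, ..., rho_{i,p}), sigma = sigma_i, theta = theta_i.
    """
    def residual(Z, t):
        # p_t-th order residual; zero initial conditions make terms with
        # time indices below 1 vanish
        return Z[t - 1] - sum(r * Z[t - 1 - j] for j, r in enumerate(rho, 1) if t - j >= 1)
    llr = 0.0
    for t in range(k + 1, k + n + 1):
        S_t, X_t = residual(S, t), residual(X, t)
        llr += (theta / sigma**2) * S_t * X_t - (theta * S_t) ** 2 / (2 * sigma**2)
    return llr
```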

3.2 Change Detection in the Spectrum of the AR(pp) Multistream Process

Consider the problem of detecting a change of the correlation coefficients in the pp-th order AR process, which in the ii-th stream satisfies the recursion

Xn(i)=j=1pρi,j(n)Xnj(i)+wn(i),n1,X_{n}(i)=\sum_{j=1}^{p}\rho^{(n)}_{i,j}\,X_{n-j}(i)+w_{n}(i)\,,\quad n\geq 1, (74)

where

ρi,(n)=θi,𝟙{nν}+θi,𝟙{n>ν}\rho^{(n)}_{i,\ell}=\theta^{*}_{i,\ell}\mathbbm{1}_{\{n\leq\nu\}}+\theta_{i,\ell}\mathbbm{1}_{\{n>\nu\}}

and {wn(i)}n1\{w_{n}(i)\}_{n\geq 1} are i.i.d. (mutually independent) Gaussian random variables with 𝖤[w1(i)]=0{\mathsf{E}}[w_{1}(i)]=0, 𝖤[w12(i)]=1{\mathsf{E}}[w^{2}_{1}(i)]=1. Additional notation:

θi=(θi,1,,θi,p),θi=(θi,1,,θi,p),𝐗in1,np=(Xn1(i),,Xnp(i)).\theta^{*}_{i}=(\theta^{*}_{i,1},\ldots,\theta^{*}_{i,p})^{\top},~{}~{}\theta_{i}=(\theta_{i,1},\ldots,\theta_{i,p})^{\top},~{}~{}{\mathbf{X}}_{i}^{n-1,n-p}=(X_{n-1}(i),\ldots,X_{n-p}(i)).

(\top denotes transpose).

It is easy to see that the pre-change and post-change conditional densities gi(Xi,n|𝐗in1)=gi(Xn(i)|𝐗in1,np)g_{i}(X_{i,n}|{\mathbf{X}}_{i}^{n-1})=g_{i}(X_{n}(i)|{\mathbf{X}}_{i}^{n-1,n-p}) and fi,θi(Xn(i)|𝐗in1(i))=fi,θi(Xn(i)|𝐗in1,np)f_{i,\theta_{i}}(X_{n}(i)|{\mathbf{X}}_{i}^{n-1}(i))=f_{i,\theta_{i}}(X_{n}(i)|{\mathbf{X}}_{i}^{n-1,n-p}) are given by

\begin{split}g_{i}(X_{n}(i)|{\mathbf{X}}_{i}^{n-1,n-p})&=\frac{1}{\sqrt{2\pi}}\,\exp\left\{-\dfrac{(\eta^{*}_{i}(X_{n}(i),{\mathbf{X}}_{i}^{n-1,n-p}))^{2}}{2}\right\},\\ f_{i,\theta_{i}}(X_{n}(i)|{\mathbf{X}}_{i}^{n-1,n-p})&=\frac{1}{\sqrt{2\pi}}\,\exp\left\{-\dfrac{(\eta_{\theta_{i},i}(X_{n}(i),{\mathbf{X}}_{i}^{n-1,n-p}))^{2}}{2}\right\},\end{split} (75)

where ηi(y,x)=y(θi)x\eta^{*}_{i}(y,x)=y-(\theta^{*}_{i})^{\top}x and ηθi,i(y,x)=y(θi)x\eta_{\theta_{i},i}(y,x)=y-(\theta_{i})^{\top}x (yy\in{\mathbb{R}}, xpx\in{\mathbb{R}}^{p}). Therefore, for any θip\theta_{i}\in{\mathbb{R}}^{p}, the LLR is

λi(k,k+n)=t=k+1k+nλi(t)\lambda_{i}(k,k+n)=\sum_{t=k+1}^{k+n}\lambda_{i}^{*}(t)

where

λi(t)=logfi,θi(Xt(i)|𝐗it1,tp)gi(Xt(i)|𝐗it1,tp)=Xt(i)(θiθi)𝐗it1,tp+((θi)𝐗it1,tp)2(θi𝐗it1,tp)22.\begin{split}\lambda_{i}^{*}(t)&=\log\frac{f_{i,\theta_{i}}(X_{t}(i)|{\mathbf{X}}_{i}^{t-1,t-p})}{g_{i}(X_{t}(i)|{\mathbf{X}}_{i}^{t-1,t-p})}\\ &=X_{t}(i)(\theta_{i}-\theta^{*}_{i})^{\top}{\mathbf{X}}_{i}^{t-1,t-p}+\frac{((\theta^{*}_{i})^{\top}{\mathbf{X}}_{i}^{t-1,t-p})^{2}-(\theta_{i}^{\top}{\mathbf{X}}_{i}^{t-1,t-p})^{2}}{2}.\end{split} (76)

The process (74) is not Markov, but the pp-dimensional processes

Φi,n=(Xi,n,,Xi,np+1)p,i=1,,N\Phi_{i,n}=(X_{i,n},\ldots,X_{i,n-p+1})^{\top}\in{\mathbb{R}}^{p},~{}~{}i=1,\dots,N

are Markov. Now, for any ϑ=(ϑ1,,ϑp)p\vartheta=(\vartheta_{1},\ldots,\vartheta_{p})\in{\mathbb{R}}^{p}, define the matrix

\Lambda(\vartheta)=\begin{pmatrix}\vartheta_{1}&\vartheta_{2}&\dots&\vartheta_{p}\\ 1&0&\dots&0\\ \vdots&\ddots&\ddots&\vdots\\ 0&\dots&1&0\end{pmatrix}\,.

Note that

Φi,n={Λ(θi)Φi,n1+w~i,nfornν,Λ(θi)Φi,n1+w~i,nforn>ν,\Phi_{i,n}=\begin{cases}\Lambda(\theta_{i}^{*})\Phi_{i,n-1}+\widetilde{w}_{i,n}&\mbox{for}~{}n\leq\nu,\\ \Lambda(\theta_{i})\Phi_{i,n-1}+\widetilde{w}_{i,n}&\mbox{for}~{}n>\nu,\end{cases} (77)

where w~i,n=(wi,n,0,,0)p\widetilde{w}_{i,n}=(w_{i,n},0,\ldots,0)^{\top}\in{\mathbb{R}}^{p}. Obviously,

𝖤[w~nw~n]=B=(1000).{\mathsf{E}}[\widetilde{w}_{n}\,\widetilde{w}^{\top}_{n}]=B=\begin{pmatrix}1&\dots&0\\ \vdots&\ddots&\vdots\\ 0&\dots&0\end{pmatrix}.

Assume that all eigenvalues of the matrices Λ(θi)\Lambda(\theta^{*}_{i}) are less than 11 in absolute value and that θi\theta_{i} belongs to the set

Θist={θip:max1jp|𝐞j(Λ(θi))|<1}{θi},\Theta^{st}_{i}=\{\theta_{i}\in{\mathbb{R}}^{p}\,:\,\max_{1\leq j\leq p}\,|{\mathbf{e}}_{j}(\Lambda(\theta_{i}))|<1\}\setminus\,\{\theta^{*}_{i}\}, (78)

where 𝐞j(Λ){\mathbf{e}}_{j}(\Lambda) is the jj-th eigenvalue of the matrix Λ\Lambda. Using (77) it can be shown that in this case the processes {Φi,n}n>ν\{\Phi_{i,n}\}_{n>\nu} (i=1,,Ni=1,\dots,N) are ergodic with stationary normal distributions 𝒩(0,Fi(θi)){\mathcal{N}}(0,{\mathrm{F}}_{i}(\theta_{i})), where

Fi(θi)=n0(Λ(θi))nB(Λ(θi))n.{\mathrm{F}}_{i}(\theta_{i})=\sum_{n\geq 0}(\Lambda(\theta_{i}))^{n}B(\Lambda^{\top}(\theta_{i}))^{n}.

Taking into account that max1iNsupt1𝖤|Xt(i)|m<\max_{1\leq i\leq N}\,\sup_{t\geq 1}\,{\mathsf{E}}_{\infty}|X_{t}(i)|^{m}<\infty for any m>0m>0 and using techniques developed in PerTar-JMVA2019 , it can be shown that the uniform complete convergence conditions (70) as well as the left-tail conditions (69) hold. Therefore, Corollary 1 implies that the mixture detection procedure TAα𝐩,WT_{A_{\alpha}}^{{\mathbf{p}},W} minimizes as α0\alpha\to 0 the expected delay to detection for any compact subset of Θ=Θ1st××ΘNst\Theta=\Theta^{st}_{1}\times\cdots\times\Theta^{st}_{N}, and asymptotic formulas (63) and (64) hold with

I𝖡,θ=I𝖡,𝜽𝖡=12i𝖡(θiθi)Fi(θi)(θiθi).I_{{\mathsf{B}},\theta}=I_{{\mathsf{B}},{\bm{\theta}}_{\mathsf{B}}}=\frac{1}{2}\sum_{i\in{\mathsf{B}}}(\theta_{i}-\theta_{i}^{*})^{\top}{\mathrm{F}}_{i}(\theta_{i})(\theta_{i}-\theta_{i}^{*}).

In particular, in the purely Markov scalar case where p=1p=1 in (74),

I𝖡,θ=I𝖡,𝜽𝖡=i𝖡(θiθi)22(1θi2).I_{{\mathsf{B}},\theta}=I_{{\mathsf{B}},{\bm{\theta}}_{\mathsf{B}}}=\sum_{i\in{\mathsf{B}}}\frac{(\theta_{i}-\theta^{*}_{i})^{2}}{2(1-\theta_{i}^{2})}.
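For a numerical illustration, the stationary covariance Fi(θi){\mathrm{F}}_{i}(\theta_{i}) solves the discrete Lyapunov equation F=Λ(θi)FΛ(θi)+B{\mathrm{F}}=\Lambda(\theta_{i}){\mathrm{F}}\Lambda^{\top}(\theta_{i})+B, so the per-stream terms of I𝖡,θI_{{\mathsf{B}},\theta} can be computed as in the following sketch (assuming SciPy is available; the function names are ours):

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def companion(theta):
    """Companion matrix Lambda(theta) of the AR(p) recursion (74)."""
    p = len(theta)
    L = np.zeros((p, p))
    L[0, :] = theta
    L[1:, :-1] = np.eye(p - 1)   # ones on the subdiagonal
    return L

def kl_number(theta_pre, theta_post):
    """Per-stream term (theta - theta*)^T F(theta) (theta - theta*) / 2."""
    p = len(theta_post)
    B = np.zeros((p, p)); B[0, 0] = 1.0
    F = solve_discrete_lyapunov(companion(theta_post), B)  # F = Lam F Lam^T + B
    d = np.asarray(theta_post, float) - np.asarray(theta_pre, float)
    return 0.5 * d @ F @ d

# scalar case p = 1 reproduces (theta - theta*)^2 / (2 (1 - theta^2))
assert np.isclose(kl_number([0.0], [0.5]), 0.5**2 / (2 * (1 - 0.5**2)))
```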

4 Monte Carlo

In this section, we perform MC simulations for the example considered in Subsection 3.1, assuming for simplicity (and with a very minor loss of generality) that the noise processes ξn(i)=wn(i)\xi_{n}(i)=w_{n}(i) in (71) are i.i.d. Gaussian with mean zero and variances σi2\sigma_{i}^{2}, which is equivalent to setting p=0p=0 in (72). We also assume that Sn(i)=n1.1S_{n}(i)=n^{1.1}, σi2=4\sigma_{i}^{2}=4, θi=0\theta_{i}=0 pre-change, and θi=θ=0.1\theta_{i}=\theta=0.1 in the post-change mode when the change occurs in the ii-th stream.

We suppose that the change in each channel occurs independently with probability qq, so the probability of the event {M=m}\{M=m\} that mm streams (out of NN) are affected is

𝖯(M=m)=N!m!(Nm)!qm(1q)Nm.{\mathsf{P}}(M=m)=\frac{N!}{m!(N-m)!}\,q^{m}\,(1-q)^{N-m}.

The change occurs at time ν\nu according to the geometric prior distribution

{\mathsf{P}}(\nu=k)=\varrho(1-\varrho)^{k},\quad k=0,1,\dots

with a parameter ϱ(0,1)\varrho\in(0,1).

In MC simulations, we set N=10N=10 for the total number of streams, ϱ=0.1\varrho=0.1, q=1/N=0.1q=1/N=0.1, and consider three cases: (a) the change occurs in a single stream with θ=0.1\theta=0.1, (b) the change occurs in two streams with θ=0.1\theta=0.1 in each, and (c) the change occurs in three streams with θ=0.1\theta=0.1 in each.

For each MC run, using equations (66) and (67), we compute the statistics (10) and (13) and obtain the stopping times for the cases where the parameter of the post-change distribution is known (15) and where it is unknown (14). We take the discrete uniform prior W(θ)W(\theta) on [0.1,0.3][0.1,0.3] with step 0.010.01. The number of Monte Carlo runs is 10610^{6}, which ensures very high accuracy of the estimated characteristics.
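To make the simulation concrete, the following schematic Python sketch organizes one MC run of the stopping time (15) for this setup (names and the time horizon are illustrative; the head-start and the mixture over W(θ)W(\theta) needed for rule (14) are omitted for brevity, and the stream weight is chosen so that each stream is affected with probability q=p/(1+p)q=p/(1+p), cf. (67)):

```python
import numpy as np

rng = np.random.default_rng(0)
N, theta, sigma, q = 10, 0.1, 2.0, 0.1
p_w = q / (1 - q)                # stream weight: p_w / (1 + p_w) = q, cf. (67)

def run_once(A, nu, affected, n_max=200):
    """One MC run of the stopping time (15) for the example of Subsection 3.1
    with p = 0 (i.i.d. Gaussian noise) and S_n(i) = n^{1.1}."""
    S = np.arange(1, n_max + 1) ** 1.1
    X = sigma * rng.standard_normal((n_max, N))
    for i in affected:                        # the signal appears for n > nu
        X[nu:, i] += theta * S[nu:]
    # cumulative per-stream LLRs: lam[n, i] = sum of the first n LLR increments
    incr = (theta / sigma**2) * S[:, None] * X - (theta * S[:, None]) ** 2 / (2 * sigma**2)
    lam = np.vstack([np.zeros(N), np.cumsum(incr, axis=0)])
    c = 1.0 / ((1.0 + p_w) ** N - 1.0)
    for n in range(1, n_max + 1):
        lr = np.exp(lam[n] - lam[:n])         # LR_i(k, n), k = 0, ..., n-1, via (66)
        R = np.sum(c * (np.prod(1.0 + p_w * lr, axis=1) - 1.0))   # SR mixture statistic (10) via (67)
        if R >= A:
            return n                          # alarm time, e.g. run_once(90.0, 5, [0, 3])
    return n_max
```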

The results are shown in Table 1 and Fig. 1. In Fig. 1, 𝖯𝖥𝖠{\mathsf{PFA}} (x-axis) is presented in logarithmic scale; the marks 2,3,,92,3,\dots,9 in the figure indicate multiples of the preceding decade mark (0.0001, 0.001, or 0.01). It is seen that the detection algorithm performs well even at such a low signal-to-noise ratio. As expected, when the parameter of the post-change distribution is not known, the algorithm performs only slightly worse than when it is known. When the change occurs in multiple streams, the expected detection delays 𝖤𝖣𝖣𝖡2,θ(TA𝐩,W){\mathsf{EDD}}_{{\mathsf{B}}_{2},\theta}(T_{A}^{{\mathbf{p}},W}) and 𝖤𝖣𝖣𝖡3,θ(TA𝐩,W){\mathsf{EDD}}_{{\mathsf{B}}_{3},\theta}(T_{A}^{{\mathbf{p}},W}) decrease compared to 𝖤𝖣𝖣𝖡1,θ(TA𝐩,W){\mathsf{EDD}}_{{\mathsf{B}}_{1},\theta}(T_{A}^{{\mathbf{p}},W}) in the case (a), and also 𝖤𝖣𝖣𝖡3,θ(TA𝐩,W){\mathsf{EDD}}_{{\mathsf{B}}_{3},\theta}(T_{A}^{{\mathbf{p}},W}) becomes smaller than 𝖤𝖣𝖣𝖡2,θ(TA𝐩,W){\mathsf{EDD}}_{{\mathsf{B}}_{2},\theta}(T_{A}^{{\mathbf{p}},W}), as expected from theoretical results (see, e.g., (56)). Here 𝖤𝖣𝖣𝖡m,θ(TA𝐩,W){\mathsf{EDD}}_{{\mathsf{B}}_{m},\theta}(T_{A}^{{\mathbf{p}},W}) denotes the expected detection delay when the change occurs in mm streams (m=1,2,3m=1,2,3).

It is also interesting to compare the MC estimates of the expected detection delay with theoretical asymptotic approximations given by (56). In our example, these approximations reduce to

𝖤𝖣𝖣¯𝖡m,θ(logAmI𝖡m,θ)1/3.2\overline{{\mathsf{EDD}}}_{{\mathsf{B}}_{m},\theta}\approx\left(\frac{\log A_{m}}{I_{{\mathsf{B}}_{m},\theta}}\right)^{1/3.2}

with I𝖡m,θ=m(θ2/6.4σ2)I_{{\mathsf{B}}_{m},\theta}=m(\theta^{2}/6.4\sigma^{2}), m=1,2,3m=1,2,3. Note that thresholds AmA_{m} are different for different mm to guarantee the same PFA. The data in Table 1 show that the first-order asymptotic approximation is not too bad but not especially accurate.

Table 1: Operating characteristics of the mixture detection procedures for the number of streams N=10N=10 with parameters Sn(i)=n1.1S_{n}(i)=n^{1.1} and θ1=θ2=θ3=θ=0.1\theta_{1}=\theta_{2}=\theta_{3}=\theta=0.1. The number of MC runs is 10610^{6}.
𝖯𝖥𝖠π(T){\mathsf{PFA}}_{\pi}(T) 10110^{-1} 51025\cdot\!10^{-2} 10210^{-2} 51035\cdot\!10^{-3} 10310^{-3} 51045\cdot\!10^{-4}
𝖤𝖣𝖣𝖡1,θ(TA𝐩,θ){\mathsf{EDD}}_{{\mathsf{B}}_{1},\theta}(T_{A}^{{\mathbf{p}},\theta}) 10.77 11.28 12.76 13.34 14.68 15.22
𝖤𝖣𝖣𝖡1,θ(TA𝐩,W){\mathsf{EDD}}_{{\mathsf{B}}_{1},\theta}(T_{A}^{{\mathbf{p}},W}) 11.33 12.07 13.32 13.93 15.55 16.02
𝖤𝖣𝖣¯𝖡1,θ\overline{{\mathsf{EDD}}}_{{\mathsf{B}}_{1},\theta} 15.99 16.92 18.84 19.50 21.05 21.63
𝖤𝖣𝖣𝖡2,θ(TA𝐩,θ){\mathsf{EDD}}_{{\mathsf{B}}_{2},\theta}(T_{A}^{{\mathbf{p}},\theta}) 8.77 9.33 10.50 10.88 12.00 12.46
𝖤𝖣𝖣𝖡2,θ(TA𝐩,W){\mathsf{EDD}}_{{\mathsf{B}}_{2},\theta}(T_{A}^{{\mathbf{p}},W}) 8.90 9.52 10.82 11.30 12.34 12.81
𝖤𝖣𝖣¯𝖡2,θ\overline{{\mathsf{EDD}}}_{{\mathsf{B}}_{2},\theta} 11.78 12.66 14.43 15.02 16.38 16.88
𝖤𝖣𝖣𝖡3,θ(TA𝐩,θ){\mathsf{EDD}}_{{\mathsf{B}}_{3},\theta}(T_{A}^{{\mathbf{p}},\theta}) 8.01 8.34 9.33 9.79 10.77 11.12
𝖤𝖣𝖣𝖡3,θ(TA𝐩,W){\mathsf{EDD}}_{{\mathsf{B}}_{3},\theta}(T_{A}^{{\mathbf{p}},W}) 8.05 8.47 9.55 9.99 10.96 11.39
𝖤𝖣𝖣¯𝖡3,θ\overline{{\mathsf{EDD}}}_{{\mathsf{B}}_{3},\theta} 7.90 9.20 11.35 12.00 13.45 13.96
Figure 1: Operating characteristics of the detection procedures for the total number of streams N=10N=10 and for the number of affected streams m=1,2,3m=1,2,3: 𝖤𝖣𝖣𝖡m,θ(TA𝐩,θ){\mathsf{EDD}}_{{\mathsf{B}}_{m},\theta}(T_{A}^{{\mathbf{p}},\theta}) vs log𝖯𝖥𝖠π\log{\mathsf{PFA}}_{\pi} and 𝖤𝖣𝖣𝖡m,θ(TA𝐩,W){\mathsf{EDD}}_{{\mathsf{B}}_{m},\theta}(T_{A}^{{\mathbf{p}},W}) vs log𝖯𝖥𝖠π\log{\mathsf{PFA}}_{\pi}. The number of MC runs is 10610^{6}. The x-axis is in logarithmic scale.

5 Applications

In this section, we consider two different applications – rapid detection of new COVID-19 epidemic waves and extraction of tracks of low-observable near-Earth space objects in optical images obtained by telescopes.

5.1 Application to COVID Detection

We consider the problem of detecting the emergence of new COVID-19 epidemic waves in Australia based on real data. Note that in pergamenchtchikov2022minimax an algorithm for joint disorder detection and identification was used not only to detect the start of COVID-19 in Italy but also to identify the region where the outbreak occurred. Here we do not consider identification of the region in which the outbreak occurs; we only need to decide on the occurrence of an epidemic in the whole country based on data from various regions. Also, in contrast to pergamenchtchikov2022minimax, where a stationary Markov model was used, we now propose a substantially different non-stationary model, taking into account that a new wave of COVID typically spreads faster than linearly. This fact has recently been noticed in LiangTarVeerIEEEIT2023 .

We selected data on COVID in Australia, which has 8 regions: Australian Capital Territory, New South Wales, Northern Territory, Queensland, South Australia, Tasmania, Victoria, Western Australia. We propose to use the non-stationary model (71) considered in Subsection 3.1 with super-linear functions Sn(i)=CinγS_{n}(i)=C_{i}\,n^{\gamma}, γ>1\gamma>1. As observations, we use the percentage of infections in the total population. We study the COVID outbreak in Australia recorded in the winter of 2021-2022. The data are taken from the World Health Organization and presented in Fig. 2.

Figure 2: Distribution of COVID-19 in Australia by region.

Statistical analysis shows that the model with parameters θ1=0.1\theta_{1}=0.1 and θ2=0.08\theta_{2}=0.08 for New South Wales and Victoria, respectively, describes the beginning of the pandemic outbreak well. We also selected Sn(i)=n1.127S_{n}(i)=n^{1.127} and ξn(i)=wn(i)𝒩(0,σi2)\xi_{n}(i)=w_{n}(i)\sim{\mathcal{N}}(0,\sigma_{i}^{2}) with σi=2.4\sigma_{i}=2.4 in (71). We apply the 8-stream double-mixture detection algorithm discussed above for the case where the parameter of the post-change distribution is unknown. We used the discrete uniform prior W(θ)W(\theta) on [0.08,,0.11][0.08,\dots,0.11] with step 0.010.01. The proposed change detection algorithm, with a threshold selected so that the probability of false detection does not exceed 0.01 (the average detection delay is about 10), decides on the outbreak of the epidemic on January 9, 2022 (see Fig. 3).

The plots in Fig. 3 illustrate the behavior of the change detection statistic R𝐩,W(n)R_{{\mathbf{p}},W}(n) defined in (13). The COVID wave is detected at the moment when the statistic R𝐩,W(n)R_{{\mathbf{p}},W}(n) crosses the threshold.

Figure 3: Behavior of the change detection statistic R𝐩,W(n)R_{{\mathbf{p}},W}(n) over time.

The obtained results complement the existing work on the application of changepoint detection algorithms to epidemic detection problems (see, e.g., Baron2004 ; LiangTarVeerIEEEIT2023 ; pergamenchtchikov2022minimax ; Baron2013 ). The results show that the proposed mixture-based change detection algorithm can be useful for governments in deciding whether to impose a total lockdown across the whole country, irrespective of region, in the early stages of a pandemic.

5.2 Application to Detection and Extraction of Faint Space Objects

The problem of rapid detection and extraction of streaks of low-observable space objects with unknown orbits in optical images captured with ground-based telescopes is a challenge for Space Informatics. A typical image (digital frame) with a low-contrast streak with a per-pixel signal-to-noise ratio (SNR) of approximately 1 is shown in Fig. 4.

Figure 4: Digital image with a low-SNR streak. The red rectangle marks the streak position.

Since the distribution of observations changes abruptly when the streak starts and ends, the problem of object streak extraction can be regarded as a changepoint detection problem in 2-D space (but not in time, since we consider a single image).

In realistic situations, the problem is aggravated by the presence of stars and background that produce strong clutter, and special image preprocessing for clutter removal has to be implemented (see, e.g., Tartakovsky&Brown-IEEEAES08 ). We assume that after appropriate preprocessing, the clutter-removed input frame contains Gaussian noise independent from pixel to pixel. For the sake of simplicity, we also consider a scenario where only one streak may be present in the image. We further assume that the satellite has a linear and uniform motion in the frame. The satellite streak is given by the vector 𝐘=(x0,y0,x1,y1)\mathbf{Y}=(x_{0},y_{0},x_{1},y_{1}), where (x0,y0)(x_{0},y_{0}) corresponds to the start point and (x1,y1)(x_{1},y_{1}) corresponds to the endpoint. We consider the following model of the observation Xi,jX_{i,j} in pixel (i,j)(i,j) of the 2-D frame:

Xi,j=θSi,j(𝐘)+ξi,j,X_{i,j}=\theta S_{i,j}(\mathbf{Y})+\xi_{i,j}, (79)

where θ\theta is an unknown signal intensity from the object, {Si,j(𝐘)}\{S_{i,j}(\mathbf{Y})\} are values of the model profile of the streak that are calculated beforehand assuming the point spread function (PSF) is Gaussian with a certain effective width, which is shown in Fig. 5; and ξi,j𝒩(0,σ2)\xi_{i,j}\sim{\mathcal{N}}(0,\sigma^{2}) is Gaussian noise after preprocessing with zero mean and known (estimated empirically) local variance σ2\sigma^{2}. Thus, the observation Xi,jX_{i,j} has normal pre-change distribution g(Xi,j)𝒩(0,σ2)g(X_{i,j})\sim{\mathcal{N}}(0,\sigma^{2}) when the streak does not cover pixel (i,j)(i,j) and normal post-change distribution f(Xi,j)𝒩(θSi,j(𝐘),σ2)f(X_{i,j})\sim{\mathcal{N}}(\theta S_{i,j}(\mathbf{Y}),\sigma^{2}) when the streak covers the pixel (i,j)(i,j).

Figure 5: Model profile of the streak.

The problem is to detect the streak with minimal delay or to make a decision that there is no streak in the frame.

We consider only intra-frame detection of faint streaks of subequatorial satellites with unknown orbits with telescopes mounted at the equator. In this case, a signal from a satellite (with unknown intensity, start, and end points) is located almost vertically in a small area at the center of the frame, as shown in Fig. 6.

Figure 6: Search area ΩS\Omega_{S} is rectangular. The dotted line shows one of the possible directions. White rectangle – streak, red rectangle – sliding window.

Let ΩS\Omega_{S} denote the streak search area. We define NN different directions inside ΩS\Omega_{S} and let d{1,,N}d\in\{1,\dots,N\} denote a certain direction. Since we assume that at most one satellite streak may appear in the frame, there is no need to mix the SR statistics over the streak location; it suffices to maximize over it. However, since the signal intensity θ\theta is not known, we would still need to mix over the distribution of θ\theta, which is a drawback.

A reasonable way to avoid this averaging is to use the so-called Finite Moving Average (FMA) statistics, as suggested in TartakovskyetalIEEESP2021 . Specifically, let Md(n),n1M_{d}(n),n\geq 1, stand for a 2-D sliding rectangular window which contains certain pixels (i,j)(i,j) at each step nn in the direction dd. The window Md(n)M_{d}(n) has a fixed length of LdL_{d} pixels and a fixed width of KdK_{d} pixels (the choice of these parameters depends on the expected SNR and the PSF effective width). For a given direction dd (d=1,,Nd=1,\dots,N), the FMA statistic is defined as

VMd(n)(n)=(i,j)Md(n)Si,j(d)Xi,j(d),V_{M_{d}(n)}(n)=\sum_{(i,j)\in M_{d}(n)}S_{i,j}^{(d)}X_{i,j}^{(d)},

where {Si,j(d)}\{S_{i,j}^{(d)}\} are values of the Gaussian model profile in the direction dd and Xi,j(d)X_{i,j}^{(d)} are the observed data in the direction dd. The profile location is given by the vector 𝐘d\mathbf{Y}_{d} in the direction dd. Then the multistream FMA detection procedure (here “streams” are not streams per se but rather the data Xi,j(d)X_{i,j}^{(d)} in different directions d{1,,N}d\in\{1,\dots,N\} in the search area ΩS\Omega_{S}) is defined as

Th=inf{n1:max1dNVMd(n)(n)h}.T_{h}=\inf\left\{n\geq 1:\max_{1\leq d\leq N}V_{M_{d}(n)}(n)\geq h\right\}.

For the Gaussian model, the FMA procedure is invariant to the unknown signal intensity θ\theta, which is a big advantage over the SR-type versions.
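As a rough illustration, the following schematic sketch scans for nearly vertical streaks, treating each K-pixel-wide column strip of the clutter-removed frame as a direction (the window geometry and names are illustrative and not the implementation of TartakovskyetalIEEESP2021 ):

```python
import numpy as np

def fma_scan(frame, profile, h):
    """Slide an L x K window down every K-pixel-wide column strip of the
    frame and compare max_d V_{M_d(n)}(n) with the threshold h.

    frame   : 2-D array of clutter-removed pixel values X_{ij}
    profile : L x K model profile S_{ij} of the streak (cf. Fig. 5)
    """
    H, W = frame.shape
    L, K = profile.shape
    for n in range(H - L + 1):                            # window position (step n)
        v = [np.sum(profile * frame[n:n + L, d:d + K])    # V_{M_d(n)}(n)
             for d in range(W - K + 1)]                   # direction d ~ column strip
        if max(v) >= h:
            return n, int(np.argmax(v))                   # first crossing and its direction
    return None                                           # no streak detected
```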

Figure 7: Behavior of the FMA statistic $V_{M_d(n)}(n)$ along the correct direction.

Fig. 7 shows the typical behavior of the statistic $V_{M_d(n)}(n)$ along the direction containing the streak in the case of a very low SNR $=0.9$. In this case, the streak is detected with start and end coordinates at points 47 and 117, respectively, while the true values are 40 and 110, so the precision is 7 pixels. Experiments show that when sliding the 2-D window in various directions inside $\Omega_S$ and then comparing the largest value $\max_d V_{M_d(n)}(n)$ to a threshold, we typically determine the approximate position of the streak with an accuracy of 5–10 pixels. Therefore, the proposed FMA version of the change detection algorithm turns out to be efficient: it allows us to rapidly determine a localization area that contains the streak with high probability.

6 Concluding Remarks and Future Challenges

6.1 Remarks

1. As we already discussed, for general non-i.i.d. models the SR-type statistics cannot be computed recursively even in separate streams, so the computational complexity and memory requirements of the mixture detection procedures $T_A^{\mathbf{p},\theta}$ and $T_A^{\mathbf{p},W}$, especially in the asymptotically non-stationary case, can be quite high. To avoid this drawback, it is reasonable to use either one-stage delayed adaptive procedures or window-limited versions of the mixture detection procedures, where the summation over potential change points $\nu=k$ is restricted to a sliding window of a fixed size $\ell$. In the window-limited version of $T_A^{\mathbf{p},W}$, the statistic $R_{\mathbf{p},W}(n)$ is replaced by the window-limited statistic

\widehat{R}_{\mathbf{p},W}(n)=\sum_{k=n-(\ell+1)}^{n-1}\Lambda_{\mathbf{p},W}(k,n)\quad\text{for}~n>\ell;\qquad\widehat{R}_{\mathbf{p},W}(n)=R_{\mathbf{p},W}(n)\quad\text{for}~n\leq\ell.

Following the guidelines of Lai LaiIEEE98 and the techniques developed by Tartakovsky (Tartakovsky_book2020, Sec 3.10, pages 116-122) for the single-stream scenario, it can be shown that the window-limited version of the SR mixture also has first-order asymptotic optimality properties as long as the window size $\ell(A)$ approaches infinity as $A\to\infty$ at the rate $\ell(A)/\log A\to\infty$. Since the thresholds $A=A_\alpha$ in the detection procedures should be selected in such a way that $\log A_\alpha\sim|\log\alpha|$ as $\alpha\to 0$, the window size $\ell(\alpha)$ should satisfy $\lim_{\alpha\to 0}[\ell(\alpha)/|\log\alpha|]=\infty$.
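As an illustration of the window-limited computation, the sketch below evaluates $\widehat{R}_{\mathbf{p},W}(n)$ (on the log scale, for numerical stability) in the simplest special case: a single Gaussian stream with zero pre-change mean, a discrete prior $W$ on a grid of post-change means, and summation over the last $\ell$ candidate change points. The general multistream statistic with mixing over $\mathbf{p}$ is analogous but heavier; everything here is an assumption-laden sketch, not the paper's implementation.

```python
import numpy as np
from scipy.special import logsumexp

def window_limited_sr_mixture(x, thetas, weights, ell, sigma=1.0):
    """Window-limited SR mixture statistic, single Gaussian stream.

    Returns log \hat{R}_W(n) for n = 1..len(x), where
    Lambda_W(k, n) = sum_j w_j * exp(sum_{t=k+1}^n lambda_{theta_j}(t)),
    lambda_theta(t) = (theta * x_t - theta^2 / 2) / sigma^2,
    and k ranges over the last ell candidate change points only.
    Compare the result with log A to run the detection procedure.
    """
    thetas = np.asarray(thetas, float)
    log_w = np.log(np.asarray(weights, float))
    # llr[j, t] = LLR increment for theta_j at time t+1; cumulative sums.
    llr = (np.outer(thetas, x) - 0.5 * np.square(thetas)[:, None]) / sigma**2
    cum = np.concatenate([np.zeros((len(thetas), 1)),
                          np.cumsum(llr, axis=1)], axis=1)
    log_r = np.empty(len(x))
    for n in range(1, len(x) + 1):
        ks = np.arange(max(0, n - ell), n)           # window of change points
        # log Lambda_W(k, n), mixed over the theta grid via logsumexp.
        log_lam = logsumexp(log_w[:, None] + cum[:, n][:, None] - cum[:, ks],
                            axis=0)
        log_r[n - 1] = logsumexp(log_lam)            # log sum_k Lambda_W(k, n)
    return log_r
```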

2. Similar asymptotic optimality results can be obtained for the mixture CUSUM procedure based on thresholding the sum of generalized LR statistics

W(n)=\sum_{i=1}^{N}\max_{0\leq k<n}\sup_{\theta_i\in\Theta_i}\lambda_{i,\theta_i}(k,n)

and for the corresponding window-limited version.
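For intuition, in the Gaussian mean-shift case with $\theta_i\in\mathbb{R}$ the supremum over $\theta_i$ is available in closed form, $\sup_{\theta_i}\lambda_{i,\theta_i}(k,n)=S_{k,n}^2(i)/(2(n-k)\sigma^2)$ with $S_{k,n}(i)=\sum_{t=k+1}^{n}X_t(i)$, so $W(n)$ can be computed directly, as in the following sketch (a brute-force evaluation over all $k<n$; the window-limited version restricts $k$ to the last $\ell$ points):

```python
import numpy as np

def glr_mixture_cusum(x, sigma=1.0):
    """Sum-of-GLR statistic W(n), sketched for N independent Gaussian
    streams with an unknown mean shift theta_i in R after the change.

    x : array of shape (N, T), one row per stream.
    Returns W(n) for n = 1..T; an alarm is raised when W(n) crosses a
    threshold chosen for the target false alarm rate.
    """
    n_streams, t_max = x.shape
    cum = np.concatenate([np.zeros((n_streams, 1)),
                          np.cumsum(x, axis=1)], axis=1)
    w = np.empty(t_max)
    for n in range(1, t_max + 1):
        ks = np.arange(n)                            # candidate k < n
        s_kn = cum[:, n][:, None] - cum[:, ks]       # partial sums S_{k,n}(i)
        glr = s_kn**2 / (2.0 * (n - ks) * sigma**2)  # closed-form sup over theta
        w[n - 1] = np.sum(glr.max(axis=1))           # max over k, sum over streams
    return w
```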

3. We also conjecture that asymptotic optimality properties hold for the multistream Finite Moving Average (FMA) detection procedure given by the stopping time

T_h=\inf\left\{n\geq 1:\sum_{i=1}^{N}\sup_{\theta_i\in\Theta_i}\sum_{k=\max(1,n-\ell+1)}^{n}\lambda^{*}_{i,\theta_i}(k)\geq h\right\},

where $\lambda^{*}_{i,\theta_i}(k)=\log[f_{i,\theta_i}(X_k(i)|\mathbf{X}^{k-1}(i))/g_i(X_k(i)|\mathbf{X}^{k-1}(i))]$. In cases where the LLR $\lambda^{*}_{i,\theta_i}(k)$ is a monotone function of some statistic, the FMA procedure can be appropriately modified in such a way that it is invariant to the unknown parameters $\theta_i$. This is a great advantage over the corresponding SR-based and CUSUM-based procedures. See also the discussion in Subsection 5.2.
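The invariance is transparent in the Gaussian case with an unknown positive mean shift: the supremum of the windowed LLR sum reduces to $\max(0,S_n(i))^2/(2\ell\sigma^2)$, where $S_n(i)$ is the sum of the last $\ell$ observations in stream $i$, so the rule depends on the data only through windowed sums that do not involve $\theta_i$. A minimal sketch under these assumptions:

```python
import numpy as np

def fma_stopping_time(x, ell, h, sigma=1.0):
    """Window-limited multistream FMA rule, sketched for N Gaussian streams
    with an unknown positive mean shift theta_i after the change.

    sup_{theta >= 0} of the windowed LLR sum equals
    max(0, S_n(i))^2 / (2 * win * sigma^2), where S_n(i) is the sum of the
    last `win` observations in stream i (win < ell while n < ell).
    x : array of shape (N, T).  Returns the alarm time or None.
    """
    cum = np.concatenate([np.zeros((x.shape[0], 1)),
                          np.cumsum(x, axis=1)], axis=1)
    for n in range(1, x.shape[1] + 1):
        lo = max(0, n - ell)
        win = n - lo                                 # effective window length
        s_n = cum[:, n] - cum[:, lo]                 # windowed sums per stream
        stat = np.sum(np.maximum(s_n, 0.0) ** 2) / (2.0 * win * sigma**2)
        if stat >= h:
            return n                                 # stopping time T_h
    return None                                      # no alarm in the sample
```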

4. The results can be easily generalized to the case where the change points $\nu_i$ are different for different streams $i=1,\dots,N$.

5. The Bayesian-type results of this paper can be used to establish asymptotic optimality properties of the detection procedures $T_A^{\mathbf{p},\theta}$ and $T_A^{\mathbf{p},W}$ in a non-Bayesian setting. In particular, the optimization problem can be solved in the class of procedures with an upper-bounded maximal local probability of false alarm,

\sup_{m\geq 0}\mathsf{P}_{\infty}\left(T\leq m+k\,|\,T>m\right)\leq\alpha,

both in pointwise and minimax settings, by embedding into the Bayesian class, similar to what was done by Pergamenchtchikov and Tartakovsky PerTar-JMVA2019 for the case of a single stream.
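In practice, this maximal local PFA can be assessed by Monte Carlo: simulate pre-change trajectories, record the alarm times, and estimate $\mathsf{P}_\infty(T\leq m+k\,|\,T>m)$ on a grid of $m$ values. The sketch below does this for an arbitrary stopping rule; the path generator, grid, and parameters in the usage line are hypothetical.

```python
import numpy as np

def local_pfa(stopping_rule, sample_path, k, m_grid, n_runs=10_000, rng=None):
    """Monte Carlo estimate of P_inf(T <= m + k | T > m) over a grid of m.

    sample_path(rng) draws one pre-change (P_inf) trajectory;
    stopping_rule(x) returns the alarm time on it, or None if no alarm.
    The max over the grid approximates the sup over m >= 0 in the text.
    """
    rng = np.random.default_rng() if rng is None else rng
    stops = []
    for _ in range(n_runs):
        t = stopping_rule(sample_path(rng))
        stops.append(np.inf if t is None else t)
    stops = np.asarray(stops)
    est = {}
    for m in m_grid:
        survived = stops > m                         # paths with T > m
        if survived.any():
            est[m] = np.mean(stops[survived] <= m + k)
    return est

# Hypothetical usage with the FMA rule sketched above:
# pfa = local_pfa(lambda x: fma_stopping_time(x, ell=20, h=8.0),
#                 lambda rng: rng.standard_normal((5, 2000)),
#                 k=50, m_grid=range(0, 1500, 100))
```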

6.2 Future Challenges

1. The results of the MC simulations in Section 4 show that the first-order approximations to the EDD and PFA are typically not especially accurate, so higher-order approximations are in order. However, it is not feasible to obtain such approximations in the general non-i.i.d. case considered in this paper. Higher-order approximations to the expected detection delay and the probability of false alarm for i.i.d. models, assuming that the observations are independent within streams and also independent across streams, can be derived based on renewal and nonlinear renewal theories. This important problem will be considered in the future.

2. The results of this paper cover the scenario where the number of streams $N$ is fixed, so that $\log(N/\alpha)\sim\log(1/\alpha)$ for small $\alpha$. In Big Data problems, $N$ may be very large and go to infinity. Such problems require different approaches and different detection procedures. This challenging problem will be considered in the future.

3. For detecting transient (or intermittent) changes of unknown duration (like the object streaks in Subsection 5.2), it is often more reasonable to consider not quickest detection criteria but reliable detection criteria that require maximization of the detection probability in a fixed time (or space) window (see, e.g., BerenkovTarKol_EnT2020 ; TartakovskyetalIEEESP2021 ; Tartakovsky_book2020 and references therein). While certain interesting asymptotic results for single-stream scenarios and i.i.d. data models exist Nikiforov+et+al:2012 ; Nikiforov+et+al:2017 ; Nikiforov+et+al:2023 ; SokolovSpivakTartakSQA2023 ; TartakovskyetalIEEESP2021 ; Tartakovsky_book2020 , to the best of our knowledge this problem has never been considered in the multistream setting and for non-i.i.d. models.

Acknowledgements.
We are grateful to a referee for a comprehensive review and useful comments.

References

  • (1) Bakut, P.A., Bolshakov, I.A., Gerasimov, B.M., Kuriksha, A.A., Repin, V.G., Tartakovsky, G.P., Shirokov, V.V.: Statistical Radar Theory, vol. 1 (G. P. Tartakovsky, Editor). Sovetskoe Radio, Moscow, USSR (1963). In Russian
  • (2) Baron, M.: Early detection of epidemics as a sequential change-point problem. In: V. Antonov, C. Huber, M. Nikulin, V. Polischook (eds.) Longevity, Aging and Degradation Models in Reliability, Public Health, Medicine and Biology, vol. 2, pp. 31–43. St. Petersburg (2004)
  • (3) Berenkov, N.R., Tartakovsky, A.G., Kolessa, A.E.: Reliable detection of dynamic anomalies with application to extracting faint space object streaks from digital frames. In: 2020 International Conference on Engineering and Telecommunication (EnT-MIPT 2020). Dolgoprudny, Russia (2020)
  • (4) Chan, H.P.: Optimal sequential detection in multi-stream data. Annals of Statistics 45(6), 2736–2763 (2017)
  • (5) Chang, F.K.: Structural health monitoring: Promises and challenges. In: Proceedings of the 30th Annual Review of Progress in Quantitative NDE (QNDE), Green Bay, WI, USA. American Institute of Physics (2003)
  • (6) Fellouris, G., Sokolov, G.: Second-order asymptotic optimality in multichannel sequential detection. IEEE Transactions on Information Theory 62(6), 3662–3675 (2016). DOI 10.1109/TIT.2016.2549042
  • (7) Fienberg, S.E., Shmueli, G.: Statistical issues and challenges associated with rapid detection of bio-terrorist attacks. Statistics in Medicine 24(4), 513–529 (2005)
  • (8) Frisén, M.: Optimal sequential surveillance for finance, public health, and other areas (with discussion). Sequential Analysis 28(3), 310–393 (2009)
  • (9) Guépié, B.K., Fillatre, L., Nikiforov, I.: Sequential detection of transient changes. Sequential Analysis 31(4), 528–547 (2012). DOI 10.1080/07474946.2012.719443
  • (10) Guépié, B.K., Fillatre, L., Nikiforov, I.: Detecting a suddenly arriving dynamic profile of finite duration. IEEE Transactions on Information Theory 63(5), 3039–3052 (2017). DOI 10.1109/TIT.2017.2679057
  • (11) Kolessa, A., Tartakovsky, A., Ivanov, A., Radchenko, V.: Nonlinear estimation and decision-making methods in short track identification and orbit determination problem. IEEE Transactions on Aerospace and Electronic Systems 56(1), 301–312 (2020). DOI 10.1109/TAES.2019.2911760
  • (12) Lai, T.L.: Information bounds and quick detection of parameter changes in stochastic systems. IEEE Transactions on Information Theory 44(7), 2917–2929 (1998)
  • (13) Liang, Y., Tartakovsky, A.G., Veeravalli, V.V.: Quickest change detection with non-stationary post-change observations. IEEE Transactions on Information Theory 69(5), 3400–3414 (2023). DOI 10.1109/TIT.2022.3230583
  • (14) Mana, F.E., Guépié, B.K., Nikiforov, I.: Sequential detection of an arbitrary transient change profile by the FMA test. Sequential Analysis pp. 1–21 (2023). DOI 10.1080/07474946.2023.2171056
  • (15) Mei, Y.: Efficient scalable schemes for monitoring a large number of data streams. Biometrika 97(2), 419–433 (2010)
  • (16) Pergamenchtchikov, S., Tartakovsky, A.G.: Asymptotically optimal pointwise and minimax change-point detection for general stochastic models with a composite post-change hypothesis. Journal of Multivariate Analysis 174(11), 1–20 (2019)
  • (17) Pergamenchtchikov, S.M., Tartakovsky, A.G., Spivak, V.S.: Minimax and pointwise sequential changepoint detection and identification for general stochastic models. Journal of Multivariate Analysis 190, 1–22 (2022)
  • (18) Raghavan, V., Galstyan, A., Tartakovsky, A.G.: Hidden Markov models for the activity profile of terrorist groups. Annals of Applied Statistics 7, 2402–2430 (2013)
  • (19) Raghavan, V., Steeg, G.V., Galstyan, A., Tartakovsky, A.G.: Modeling temporal activity patterns in dynamic social networks. IEEE Transactions on Computational Social Systems 1, 89–107 (2014)
  • (20) Rolka, H., Burkom, H., Cooper, G.F., Kulldorff, M., Madigan, D., Wong, W.K.: Issues in applied statistics for public health bioterrorism surveillance using multiple data streams: research needs. Statistics in Medicine 26(8), 1834–1856 (2007)
  • (21) Sokolov, G., Spivak, V.S., Tartakovsky, A.G.: Detecting an intermittent change of unknown duration. Sequential Analysis (2023, to be published)
  • (22) Sonesson, C., Bock, D.: A review and discussion of prospective statistical surveillance in public health. Journal of the Royal Statistical Society A 166, 5–21 (2003)
  • (23) Szor, P.: The Art of Computer Virus Research and Defense. Addison-Wesley Professional, Upper Saddle River, NJ, USA (2005)
  • (24) Tartakovsky, A., Berenkov, N., Kolessa, A., Nikiforov, I.: Optimal sequential detection of signals with unknown appearance and disappearance points in time. IEEE Transactions on Signal Processing 69, 2653–2662 (2021). DOI 10.1109/TSP.2021.3071016
  • (25) Tartakovsky, A.G.: Asymptotic performance of a multichart CUSUM test under false alarm probability constraint. In: Proceedings of the 44th IEEE Conference Decision and Control and European Control Conference (CDC-ECC’05), Seville, SP, pp. 320–325. IEEE, Omnipress CD-ROM (2005)
  • (26) Tartakovsky, A.G.: Rapid detection of attacks in computer networks by quickest changepoint detection methods. In: N. Adams, N. Heard (eds.) Data Analysis for Network Cyber-Security, pp. 33–70. Imperial College Press, London, UK (2014)
  • (27) Tartakovsky, A.G.: On asymptotic optimality in sequential changepoint detection: Non-iid case. IEEE Transactions on Information Theory 63(6), 3433–3450 (2017). DOI 10.1109/TIT.2017.2683496
  • (28) Tartakovsky, A.G.: Asymptotic optimality of mixture rules for detecting changes in general stochastic models. IEEE Transactions on Information Theory 65(3), 1413–1429 (2019). DOI 10.1109/TIT.2018.2876863
  • (29) Tartakovsky, A.G.: Sequential Change Detection and Hypothesis Testing: General Non-i.i.d. Stochastic Models and Asymptotically Optimal Rules. Monographs on Statistics and Applied Probability 165. Chapman & Hall/CRC Press, Taylor & Francis Group, Boca Raton, London, New York (2020)
  • (30) Tartakovsky, A.G., Brown, J.: Adaptive spatial-temporal filtering methods for clutter removal and target tracking. IEEE Transactions on Aerospace and Electronic Systems 44(4), 1522–1537 (2008)
  • (31) Tartakovsky, A.G., Nikiforov, I.V., Basseville, M.: Sequential Analysis: Hypothesis Testing and Changepoint Detection. Monographs on Statistics and Applied Probability 136. Chapman & Hall/CRC Press, Taylor & Francis Group, Boca Raton, London, New York (2015)
  • (32) Tartakovsky, A.G., Rozovskii, B.L., Blažek, R.B., Kim, H.: Detection of intrusions in information systems by sequential change-point methods. Statistical Methodology 3(3), 252–293 (2006)
  • (33) Tartakovsky, A.G., Rozovskii, B.L., Blažek, R.B., Kim, H.: A novel approach to detection of intrusions in computer networks via adaptive sequential and batch-sequential change-point detection methods. IEEE Transactions on Signal Processing 54(9), 3372–3382 (2006)
  • (34) Tsui, K., Chiu, W., Gierlich, P., Goldsman, D., Liu, X., Maschek, T.: A review of healthcare, public health, and syndromic surveillance. Quality Engineering 20(4), 435–450 (2008)
  • (35) Xie, Y., Siegmund, D.: Sequential multi-sensor change-point detection. Annals of Statistics 41(2), 670–692 (2013)
  • (36) Yu, X., Baron, M., Choudhary, P.K.: Change-point detection in binomial thinning processes, with applications in epidemiology. Sequential Analysis 32(3), 350–367 (2013)