Cluster detection in networks using percolation

Ery Arias-Castro (eariasca@ucsd.edu), Department of Mathematics, University of California, San Diego, CA 92093-0112, USA.
Geoffrey R. Grimmett (G.R.Grimmett@statslab.cam.ac.uk), Statistical Laboratory, Centre for Mathematical Sciences, University of Cambridge, Wilberforce Road, Cambridge CB3 0WB, UK.
(2013; 6 2011; 10 2011)
Abstract

We consider the task of detecting a salient cluster in a sensor network, that is, an undirected graph with a random variable attached to each node. Motivated by recent research in environmental statistics and the drive to compete with the reigning scan statistic, we explore alternatives based on the percolative properties of the network. The first method is based on the size of the largest connected component after removing the nodes in the network with a value below a given threshold. The second method is the upper level set scan test introduced by Patil and Taillie [Statist. Sci. 18 (2003) 457–465]. We establish the performance of these methods in an asymptotic decision-theoretic framework in which the network size increases. These tests have two advantages over the more conventional scan statistic: they do not require previous information about cluster shape, and they are computationally more feasible. We make abundant use of percolation theory to derive our theoretical results, and complement our theory with some numerical experiments.

keywords: cluster detection, connected components, largest open cluster within a box, multiple hypothesis testing, percolation, scan statistic, surveillance, upper level set scan statistic
doi: 10.3150/11-BEJ412
volume: 19, issue: 2

1 Introduction

We consider the problem of cluster detection in a network. The network is modeled as a graph, and we assume that a random variable is observed at each node. The task is then to detect a cluster, that is, a connected subset of nodes with values that are larger than usual. There are a multitude of applications for which this model is relevant; examples include detection of hazardous materials (Hills [25]) and target tracking (Li et al. [35]) in sensor networks (Culler, Estrin and Srivastava [12]), and detection of disease outbreaks (Heffernan et al. [24]; Rotz and Hughes [49]; Wagner et al. [53]). Pixels in digital images are also sensors, and thus many other applications are found in the rich literature on image processing, for example, road tracking (Geman and Jedynak [20]) and fire prevention using satellite imagery (Pozo, Olmo and Alados-Arboledas [47]), and the detection of cancerous tumors in medical imaging (McInerney and Terzopoulos [36]).

After specifying a distributional model for the observations at the nodes and a class of clusters to be detected, the generalized likelihood ratio (GLR) test is the first method that comes to mind. Indeed, this is by far the most popular method in practice, and as such, is given different names in different fields. The likelihood ratio is known as the scan statistic in spatial statistics (Kulldorff [29, 30]) and the corresponding test as the method of matched filters in engineering (Jain, Zhong and Dubuisson-Jolly [27]; McInerney and Terzopoulos [36]). Here we use the former, where scanning a given cluster $K$ means computing the likelihood ratio for the simple alternative where $K$ is the anomalous cluster. Various forms of scan statistic have been proposed, differing mainly by the assumptions made on the shape of the clusters. Most methods assume that the clusters are in some parametric family (e.g., circular (Kulldorff and Nagarwalla [33]), elliptical (Hobolth, Pedersen and Jensen [26]; Kulldorff et al. [32])) or, more generally, deformable templates (Jain, Zhong and Dubuisson-Jolly [27]). Sometimes no explicit shape is assumed, leading to nonparametric models (Duczmal and Assunção [16]; Kulldorff, Fang and Walsh [31]; Tango and Takahashi [51]).

We consider two alternative nonparametric methods, both based on the percolative properties of the network, that is, based on the connected components of the graph after removing the nodes with values below a given threshold. The simplest is based on the size of the largest connected component after thresholding – the threshold is the only parameter of this method. If the graph is a one-dimensional lattice, then after thresholding, this corresponds to the test based on the longest run (Balakrishnan and Koutras [4]), which Chen and Huo [9] adapt for path detection in a thin band. This test has been studied in a similar context in a series of papers (Davies, Langovoy and Wittich [14]; Langovoy and Wittich [34]) under the name of maximum cluster test. (The authors were not aware of this unpublished line of work until M. Langovoy contacted them in the final stages of manuscript preparation.) The idea behind this method is simple. When an anomalous cluster is indeed present, the values at the nodes belonging to this cluster are larger than usual and thus more likely to survive the threshold, and because these nodes are also likely to clump together – because the cluster is connected in the graph – the size of the statistic will be (stochastically) larger than when no anomalous cluster is present.

More sophisticated, and also parameter-free, is the method based on the upper level set scan statistic of Patil and Taillie [41], subsequently developed in the context of ecological and environmental applications (Patil, Joshi and Koli [38]; Patil and Taillie [42]; Patil et al. [37, 39]). It is the result of scanning over the connected components of the graph after thresholding, which is repeated at all thresholds. This method obviously is closely related to the scan statistic. It can be seen as attempting to approximate the scan statistic over all possible connected components of the graph by restricting the class of subsets to be scanned to those surviving a threshold. Our results indicate that this method is in fact more closely related to the previous one (based on the size of the largest connected component at a given threshold), and in some sense provides a way to automatically choose the threshold.

These two percolation-based methods have two significant advantages over the scan statistic. First, they do not need to be provided with the shape of the clusters to be detected. Thus they are valuable in settings with less previous spatial information. The second advantage is computational. The scan statistic tends to be computationally demanding, even in parametric settings, or even outright intractable, particularly in nonparametric settings. In contrast, these two methods are computationally feasible, and their implementation is fairly straightforward, even for irregular networks. On the other hand, the scan statistic often relies on the fast Fourier transform in the square lattice to scan clusters of known shape over all locations in that network.

In terms of detection performance, we compare these percolation-based methods to the scan statistic in a standard asymptotic decision theoretic framework where the network is a square lattice of growing size and the variables at the nodes are assumed i.i.d. for nodes inside (resp., outside) the anomalous cluster. The performance of the scan statistic in such a framework is well understood and known to be (near-) optimal, which makes it the gold standard in detection (Arias-Castro, Candès and Durand [1]; Arias-Castro, Donoho and Huo [3]; Perone Pacifico et al. [45]; Walther [54]). We find that these two methods are suboptimal for the detection of hypercubes, an emblematic parametric class, but are near-optimal for the detection of self-avoiding paths, an emblematic nonparametric class. The main weakness of these percolation-based methods is that when the per-node signal-to-noise ratio is weak, the connected components after thresholding are heavily influenced by the whimsical behavior of the values at the nodes. The scan statistic is very effective in such situations. Although this rationale seems to apply particularly well in the case of self-avoiding paths, what makes these methods competitive in this case is that the problem of detecting such objects is intrinsically very hard.

The study of the connected components after thresholding is intrinsically connected to percolation theory (Grimmett [21]), an important branch in probability theory. In fact, when the node values are i.i.d. – which is the case when no anomalous cluster is present – the only dependence on the distribution at the nodes is the probability of surviving the threshold, and after thresholding, the network is a site percolation model. (We introduce and discuss these notions in detail later in the article.) Our contribution is a careful analysis of these two nonparametric methods using percolation theory (Grimmett [21]) in a substantial way, thus applying percolation theory in a sophisticated fashion to shed light on an important problem in statistics.

The rest of the paper is organized as follows. In Section 2 we formally introduce the framework and state some fundamental detection bounds. In Section 3 we describe the standard scan statistic and present some results on its performance, showing that it is essentially optimal. In Section 4, we consider the size of the largest connected component after thresholding. In Section 5, we consider the upper level set scan statistic. We briefly discuss implementation issues and present some numerical experiments in Section 6. Finally, Section 7 is a discussion section where, in particular, we mention extensions. We provide proofs in the Appendix.

2 Mathematical framework and fundamental detection bounds

For concreteness, and also for its relevance to signal and image processing, we model the network as a finite subgrid of the regular square lattice in dimension $d$, denoted $\mathbb{V}_{m}:=\{1,\ldots,m\}^{d}$. Our analysis is asymptotic in the sense that the network is assumed to be large, that is, $m\to\infty$. To each node $v\in\mathbb{V}_{m}$, we attach a random variable, $X_{v}$. For example, in the context of a sensor network, the nodes represent the sensors and the variables represent the information that they transmit. The random variables $\{X_{v}\colon v\in\mathbb{V}_{m}\}$ are assumed to be independent with common distribution in a certain one-parameter exponential family $\{F_{\theta}\colon \theta\in[0,\theta_{\infty})\}$, defined as follows. Let $\theta_{\infty}>0$, let $F_{0}$ be a distribution function with finite non-zero variance $\sigma_{0}^{2}$, and assume that the moment-generating function $\varphi(\theta):=\int\mathrm{e}^{x\theta}\,\mathrm{d}F_{0}(x)$ is finite for $\theta\in[0,\theta_{\infty})$. Then $F_{\theta}$ is the distribution function with density $f_{\theta}(x)=\exp(\theta x-\log\varphi(\theta))$ with respect to $F_{0}$. We assume further regularity of $F_{0}$ at later points in this paper. Note that our results apply to other distributional models as well, as discussed in Section 7.

Examples of such a family $\{F_{\theta}\colon \theta\in[0,\theta_{\infty})\}$ include the following:

  • Bernoulli model: $F_{\theta}=\operatorname{Ber}(p_{\theta})$, $p_{\theta}:=\operatorname{logit}^{-1}(\theta+\theta_{0})$, relevant in sensor arrays where each sensor transmits one bit (i.e., makes a binary decision)

  • Poisson model: $F_{\theta}=\operatorname{Poi}(\theta+\theta_{0})$, popular with count data, for example, arising in infectious disease surveillance systems

  • Exponential model: $F_{\theta}=\operatorname{Exp}(\theta_{0}-\theta)$ (e.g., to model response times)

  • Normal location model: $F_{\theta}=\mathcal{N}(\theta+\theta_{0},1)$, standard in signal and image processing, where noise is often assumed to be Gaussian.

Let $\mathcal{K}_{m}$ be a class of clusters, with a cluster defined as a subset of nodes connected in the graph. Under the null hypothesis, all of the variables at the nodes have distribution $F_{0}$, that is,

$$\mathbb{H}^{m}_{0}\colon\ X_{v}\sim F_{0}\qquad\forall v\in\mathbb{V}_{m}.$$

Under the particular alternative where $K\in\mathcal{K}_{m}$ is anomalous, the variables indexed by $K$ have distribution $F_{\theta_{m}}$ for some $\theta_{m}>0$, that is,

$$\mathbb{H}^{m}_{1,K}\colon\ X_{v}\sim F_{\theta_{m}}\quad\forall v\in K;\qquad X_{v}\sim F_{0}\quad\forall v\notin K.$$

We are interested in the situation where the anomalous cluster $K$ is unknown, namely in testing $\mathbb{H}^{m}_{0}$ against $\mathbb{H}^{m}_{1}:=\bigcup_{K\in\mathcal{K}_{m}}\mathbb{H}^{m}_{1,K}$. We illustrate the setting in Figure 1 in the context of the two-dimensional square grid.

Figure 1: This figure illustrates the setting in dimension $d=2$ for a beta model where $F_{0}=\operatorname{Unif}(0,1)$ and $F_{\theta}=\operatorname{Beta}(\theta+1,1)$, $\theta\geq 0$. (Left) An instance of the null hypothesis. (Middle) An instance of an alternative with a square cluster. (Right) An instance of an alternative with a path.
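The two hypotheses above can be made concrete with a small simulation. The following is a minimal sketch, assuming the normal location model with $\theta_{0}=0$ (so $F_{\theta}=\mathcal{N}(\theta,1)$), dimension $d=2$, and 0-based node indices; the function names `simulate_grid` and `square_cluster` are illustrative and do not come from the paper.

```python
import random

def simulate_grid(m, cluster=None, theta=0.0, seed=None):
    """Draw X_v on an m x m grid under the normal location model.

    Nodes in `cluster` (a set of (i, j) pairs) are N(theta, 1); all other
    nodes are N(0, 1). With cluster=None this is a draw from the null.
    """
    rng = random.Random(seed)
    cluster = cluster or set()
    return [[rng.gauss(theta if (i, j) in cluster else 0.0, 1.0)
             for j in range(m)]
            for i in range(m)]

def square_cluster(top, left, side):
    """Set of nodes of the side x side square with top-left corner (top, left)."""
    return {(i, j)
            for i in range(top, top + side)
            for j in range(left, left + side)}

# One draw from the null and one from an alternative with a 4 x 4 square.
null_sample = simulate_grid(20, seed=0)
alt_sample = simulate_grid(20, cluster=square_cluster(5, 5, 4), theta=1.5, seed=0)
```

Passing a set of nodes forming a self-avoiding path instead of a square gives an instance of the path alternative.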

Let $\mathcal{K}_{m}$ denote a cluster class for $\mathbb{V}_{m}$. As usual, a test $T$ is a function of the data, $T=T(X_{v}\colon v\in\mathbb{V}_{m})$, that takes values in $\{0,1\}$, with $T=1$ corresponding to a rejection, meaning a decision in favor of $\mathbb{H}^{m}_{1}$. For a test $T$, we define its worst-case risk as the sum of its probability of type I error and its probability of type II error, the latter maximized over the anomalous clusters in the class:

$$\gamma_{m}(T)=\mathbb{P}(T=1\mid\mathbb{H}^{m}_{0})+\max_{K\in\mathcal{K}_{m}}\mathbb{P}(T=0\mid\mathbb{H}^{m}_{1,K}).$$

A method is formally defined as a sequence of tests $(T_{m})$ for testing $\mathbb{H}^{m}_{0}$ versus $\mathbb{H}^{m}_{1}$. We say that a method $(T_{m})$ is asymptotically powerless if

$$\liminf_{m\to\infty}\gamma_{m}(T_{m})\geq 1.$$

This amounts to saying that as the size of the network increases, the method $(T_{m})$ is not substantially better than random guessing. Conversely, a method $(T_{m})$ is asymptotically powerful if

$$\lim_{m\to\infty}\gamma_{m}(T_{m})=0.$$

The minimax risk is defined as $\gamma_{m}^{*}:=\inf_{T}\gamma_{m}(T)$, and we say that a method $(T_{m})$ is (asymptotically) optimal if $\gamma_{m}(T_{m})\to 0$ whenever $\gamma_{m}^{*}\to 0$. Everything else fixed, the latter depends on the behavior of $\theta_{m}$ when $m$ becomes large. We say that $(T_{m})$ is optimal up to a multiplicative constant $C\geq 1$ if $\gamma_{m}(T_{m})\to 0$ under $C\theta_{m}$ whenever $\gamma_{m}^{*}\to 0$ under $\theta_{m}$. We say that $(T_{m})$ is near-optimal if the same is true with $C$ replaced by $C_{m}\to\infty$ with $\log C_{m}=\mathrm{o}(\log\theta_{m})$. (This occurs here only when $\theta_{m}\to 0$ polynomially fast and $C_{m}\to\infty$ poly-logarithmically fast.)

We focus on situations where the clusters in the class $\mathcal{K}_{m}$ are of the same size, increasing with $m$ but negligible compared with the size of the entire network. We do so for the sake of simplicity; more general results could be obtained as in Arias-Castro, Candès and Durand [1], Arias-Castro, Donoho and Huo [3], Perone Pacifico et al. [45], Walther [54] without additional difficulty. Assuming a large anomalous cluster allows us to state general results applying to a wide range of one-parameter exponential families (via the central limit theorem). In addition, note that on the one hand, reliably detecting a cluster of bounded size is impossible in the Bernoulli model or any other model where $F_{0}$ has finite support, whereas on the other hand, detecting a cluster of size comparable to that of the entire network is in some sense trivial, given that the simple test based on the total sum $\sum_{v\in\mathbb{V}_{m}}X_{v}$ is optimal up to a multiplicative constant.

We consider two emblematic classes of clusters, in some sense at the opposite extremes:

  • Hypercube detection. Let $\mathcal{K}_{m}$ denote the class of hypercubes within $\mathbb{V}_{m}$ of side length $[m^{\alpha}]$ with $0<\alpha<1$. This class is parametric, with the location of the hypercube the only parameter.

  • Path detection. Let $\mathcal{K}_{m}$ denote the class of loopless paths within $\mathbb{V}_{m}$ of length $[m^{\alpha}]$ with $0<\alpha<1$. This class is nonparametric, in the sense that its cardinality is exponential in the length of the paths.

See Figure 1 for an illustration. (Note that a hypercube of side length $k$ may be seen as a loopless path of length $k^{d}$.) Although we obtain results for both, our main focus is on the setting of hypercube detection, which is relevant to a wider range of applications, in fact any situation where the task is to detect a shape that is not filamentary. The situation exemplified in the setting of path detection may be relevant in target tracking from video, or the detection of cracks in materials in non-destructive testing. Note that the two settings coincide in dimension one.

We state fundamental detection bounds for each setting. The following result is standard (see, e.g., Arias-Castro, Candès and Durand [1]; Arias-Castro, Donoho and Huo [3]). Remember that $\sigma_{0}^{2}$ denotes the variance of $F_{0}$.

Lemma 1

In hypercube detection, all methods are asymptotically powerless if

$$\limsup_{m\to\infty}(\log m)^{-1/2}m^{d\alpha/2}\theta_{m}<\sigma_{0}\sqrt{2d(1-\alpha)}.$$

In fact, the conclusions of Lemma 1 apply for a wide variety of parametric classes, such as discs, a popular model in disease outbreak detection (Kulldorff and Nagarwalla [33]), as well as to nonparametric classes of blob-like clusters (see Arias-Castro, Candès and Durand [1]; Arias-Castro, Donoho and Huo [3]).
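To get a feel for the scale of the boundary in Lemma 1, one can evaluate it at finite $m$ by treating the asymptotic inequality as an equality. This is only a rough numerical illustration, not part of the analysis: the lemma is a statement about limits, and the function name `hypercube_boundary` and the default `sigma0=1.0` are assumptions made here for the example.

```python
import math

def hypercube_boundary(m, d, alpha, sigma0=1.0):
    """Value of theta at which
    (log m)^(-1/2) * m^(d*alpha/2) * theta == sigma0 * sqrt(2*d*(1 - alpha)),
    i.e. the finite-m reading of the boundary in Lemma 1.
    """
    return sigma0 * math.sqrt(2 * d * (1 - alpha) * math.log(m)) / m ** (d * alpha / 2)

# The boundary shrinks polynomially in m (up to the log factor).
for m in (100, 1000, 10000):
    print(m, hypercube_boundary(m, d=2, alpha=0.5))
```

For fixed $d$ and $\alpha$, the printed values decrease in $m$, which reflects the fact that larger anomalous hypercubes can be detected at smaller per-node signal.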

The following result is taken from Arias-Castro et al. [2].

Lemma 2

In path detection, all methods are asymptotically powerless if $\lim_{m\to\infty}\theta_{m}(\log m)(\log\log m)^{1/2}=0$ in dimension $d=2$, and the same is true in dimension $d\geq 3$ if $\limsup_{m\to\infty}\theta_{m}<\theta_{*}$, where $\theta_{*}>0$ depends only on $d$.

In dimension $d\geq 4$, $\theta_{*}$ may be taken to be the unique solution to

$$\rho\varphi(2\theta)-\varphi(\theta)^{2}=0,$$

where $\rho$ is the return probability of a symmetric random walk in dimension $d$.

3 The scan statistic

For a subset of nodes $K\subset\mathbb{V}$, let $|K|$ denote its size and define

$$\bar{X}_{K}=\frac{1}{|K|}\sum_{v\in K}X_{v}.$$

Given a cluster class $\mathcal{K}$, we define the (simple) scan statistic as

$$\max_{K\in\mathcal{K}}\sqrt{|K|}\,(\bar{X}_{K}-\mu_{0}),\tag{1}$$

where $\mu_{0}$ is the mean of $F_{0}$. If $\mu_{0}$ is not available, we may use the grand mean $\bar{X}_{\mathbb{V}_{m}}$ instead. In Appendix B, we derive this form of the scan statistic as an approximation to the scan statistic of Kulldorff [29], which is, strictly speaking, the GLR and arguably the most popular version, particularly in spatial statistics. We use this simpler form to streamline our theoretical analysis.
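As an illustration of (1), the following sketch computes the simple scan statistic over all axis-aligned squares of a fixed side length in dimension $d=2$. It is a minimal example under stated assumptions: the data are given as a square list of rows, $\mu_{0}$ is known, and window sums are obtained from a two-dimensional prefix-sum table (chosen here for simplicity; it is not the fast Fourier transform approach mentioned in the introduction). The name `scan_statistic_squares` is hypothetical.

```python
def scan_statistic_squares(x, side, mu0=0.0):
    """Simple scan statistic (1) over all side x side squares of a 2D grid.

    x is a square list of rows. Returns
    max over squares K of sqrt(|K|) * (mean of x over K - mu0).
    """
    m = len(x)
    # prefix[i][j] holds the sum of x[a][b] over a < i and b < j.
    prefix = [[0.0] * (m + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(m):
            prefix[i + 1][j + 1] = (x[i][j] + prefix[i][j + 1]
                                    + prefix[i + 1][j] - prefix[i][j])
    size = side * side
    best = float("-inf")
    for i in range(m - side + 1):
        for j in range(m - side + 1):
            # Sum over the square with top-left corner (i, j), by inclusion-exclusion.
            total = (prefix[i + side][j + side] - prefix[i][j + side]
                     - prefix[i + side][j] + prefix[i][j])
            best = max(best, (total / size - mu0) * size ** 0.5)
    return best

x = [[0, 0, 0],
     [0, 1, 1],
     [0, 1, 1]]
print(scan_statistic_squares(x, side=2))  # best square has mean 1 and size 4
```

Each window sum costs a constant number of operations once the table is built, so the whole scan over one side length is linear in the number of nodes.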

The test that rejects for large values of the scan statistic (1), which we call the scan test, is near-optimal in a wide range of settings (Arias-Castro, Candès and Durand [1]; Arias-Castro, Donoho and Huo [3]; Walther [54]). In particular, in the context of a class of hypercubes, and in fact many other parametric classes, this test is asymptotically optimal to the exact multiplicative constant.

Lemma 3

In hypercube detection, the scan test is asymptotically powerful if

$$\liminf_{m\to\infty}(\log m)^{-1/2}m^{d\alpha/2}\theta_{m}>\sigma_{0}\sqrt{2d(1-\alpha)}.$$

In the context of a class of paths, the following result states that the scan test detects if $\theta_{m}$ is bounded away from 0 and sufficiently large. Note that this does not match the order of magnitude of the lower bound given in dimension $d=2$. Let $\Lambda(\theta)=\log\varphi(\theta)$ and $\Lambda^{*}(x)=\sup_{\theta\geq 0}[\theta x-\Lambda(\theta)]$. ($\Lambda^{*}$ is the rate function of $F_{0}$ when $x\geq\mu_{0}$.) The following result is established in Arias-Castro et al. [2].

Lemma 4

In path detection, the scan test is asymptotically powerful if

$$\liminf_{m\to\infty}\theta_{m}>\theta_{*}:=(\Lambda^{*}\circ\Lambda^{\prime})^{-1}(\log(2d)).$$

4 Size of the largest open cluster

We study the test based on the size of the largest connected component after thresholding the values at the nodes. This test was independently considered in a series of papers (Davies, Langovoy and Wittich [14]; Langovoy and Wittich [34]). Our results are seen to sharpen and elaborate on these results. In particular, we study this test under all three regimes (subcritical, supercritical, and critical).

Adapting terminology from percolation theory (Grimmett [21]), for a threshold $t\in\mathbb{R}$, we say that a subset $K\subset\mathbb{V}$ is open (at threshold $t$) if $X_{v}>t$ for all $v\in K$. Let $S_{m}(t)$ (resp., $S_{K}(t)$) denote the size of the largest open cluster within $\mathbb{V}_{m}$ (resp., within $K$). The analysis of the test based on $S_{m}(t)$, which we call the largest open cluster (LOC) test, boils down to bounding the size of $S_{m}(t)$ from above, under $\mathbb{H}^{m}_{0}$, and, because $S_{m}(t)\geq S_{K}(t)$, bounding the size of $S_{K}(t)$ from below, under $\mathbb{H}^{m}_{1,K}$. Define $\xi_{v}(t)=\mathbf{I}\{X_{v}>t\}$, which is Bernoulli with parameter $p_{\theta}(t):=\mathbb{P}_{\theta}(X_{v}>t)$. The process $(\xi_{v}(t)\colon v\in\mathbb{V}_{m})$ is a site percolation model (Grimmett [21]). In general, consider a process $(\xi_{v}\colon v\in\mathbb{V}_{m})$ i.i.d. Bernoulli with parameter $p$, and let $S_{m}$ denote the size of the largest open cluster within $\mathbb{V}_{m}$. In dimension $d=1$, this process may be seen as a sequence of coin tosses, and $S_{m}$ viewed as the longest heads run in that sequence. In this context, the Erdős–Rényi Law (Erdős and Rényi [17]) says that

$$\frac{S_{m}}{\log m}\to\frac{1}{\log(1/p)},\qquad\mbox{almost surely}.\tag{2}$$
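The one-dimensional limit (2) is easy to probe by simulation. The sketch below, with illustrative function names `longest_run` and `run_ratio`, tosses $m$ independent coins with heads probability $p$ and compares $S_{m}/\log m$ with $1/\log(1/p)$. Since the convergence is logarithmic in $m$, only rough agreement should be expected at moderate $m$.

```python
import math
import random

def longest_run(bits):
    """Length of the longest run of truthy values in a sequence."""
    best = cur = 0
    for b in bits:
        cur = cur + 1 if b else 0
        best = max(best, cur)
    return best

def run_ratio(m, p, seed=None):
    """Return (S_m / log m, 1 / log(1/p)) for m i.i.d. Bernoulli(p) tosses."""
    rng = random.Random(seed)
    bits = [rng.random() < p for _ in range(m)]
    return longest_run(bits) / math.log(m), 1.0 / math.log(1.0 / p)

observed, limit = run_ratio(100_000, 0.5, seed=0)
print(observed, limit)
```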

In higher dimensions $d\geq 2$, the situation is much more involved. Let $p_{c}$ denote the critical probability for site percolation in $\mathbb{Z}^{d}$, defined as the supremum over all $p\in(0,1)$ such that the size of the open cluster at the origin, denoted by $S$, is finite with probability 1. (The dependency in $d$ is left implicit.) We consider the subcritical ($p_{0}(t)<p_{c}$), supercritical ($p_{0}(t)>p_{c}$), and near-critical ($p_{0}(t)\approx p_{c}$) cases separately.
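In any dimension, the statistic $S_{m}(t)$ itself is straightforward to compute. The following is a minimal sketch for $d=2$, assuming the data are stored as a list of rows and that neighbours are the four nearest lattice sites; it labels open clusters by breadth-first search and returns the size of the largest one. The function name `largest_open_cluster` is hypothetical.

```python
from collections import deque

def largest_open_cluster(x, t):
    """Size of the largest connected set of nodes with x[i][j] > t
    on a 2D grid with nearest-neighbour (4-neighbour) connectivity."""
    rows, cols = len(x), len(x[0])
    seen = [[False] * cols for _ in range(rows)]
    best = 0
    for i in range(rows):
        for j in range(cols):
            if x[i][j] <= t or seen[i][j]:
                continue
            # Explore the open cluster containing (i, j).
            seen[i][j] = True
            queue = deque([(i, j)])
            size = 0
            while queue:
                a, b = queue.popleft()
                size += 1
                for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    u, v = a + da, b + db
                    if (0 <= u < rows and 0 <= v < cols
                            and not seen[u][v] and x[u][v] > t):
                        seen[u][v] = True
                        queue.append((u, v))
            best = max(best, size)
    return best

x = [[1, 1, 0],
     [0, 1, 0],
     [1, 0, 1]]
print(largest_open_cluster(x, 0.5))  # the cluster {(0,0), (0,1), (1,1)} has size 3
```

Every node is visited once, so the cost is linear in the number of nodes, in line with the computational advantage claimed for the percolation-based methods.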

4.1 Subcritical percolation

In the subcritical case, where $t$ is such that $p_{0}(t)<p_{c}$, we are able to obtain precise, rigorous results on the performance of the test based on $S_{m}(t)$ in terms of the function $\zeta_{p}$, implicitly defined as

$$\zeta_{p}:=-\lim_{k\to\infty}\frac{1}{k}\log\mathbb{P}(S\geq k)=-\lim_{k\to\infty}\frac{1}{k}\log\mathbb{P}(S=k)\tag{3}$$

(see Grimmett [21], Section 6.3). Again, the dependency in $d$ is left implicit. As a function of $p\in(0,p_{c})$, $\zeta_{p}$ is continuous and strictly decreasing, with limits $\infty$ at $p=0$ and $0$ at $p=p_{c}$ (see Lemma 8), whereas $\zeta_{p}=0$ for $p\geq p_{c}$. In the Appendix, we include a proof that

$$\frac{S_{m}}{\log m}\to\frac{d}{\zeta_{p}},\qquad\mbox{in probability}\tag{4}$$

for subcritical $p<p_{c}$.

The convergence result in (4) may be used to bound $S_{m}(t)$ under the null by taking $p=p_{0}(t)$. Under the alternative, if we consider a class of hypercubes, then (4) also may be used to bound $S_{K}(t)$, because $K$ is a scaled version of $\mathbb{V}_{m}$.

Theorem 1

In hypercube detection, the test based on $S_{m}(t)$, with $t$ fixed such that $0<p_{0}(t)<p_{c}$, is asymptotically powerful if $\liminf_{m\to\infty}\theta_{m}>\theta_{*}(t)$, and asymptotically powerless if $\limsup_{m\to\infty}\theta_{m}<\theta_{*}(t)$, where $\theta_{*}(t)$ is the unique solution to $\zeta_{p_{\theta}(t)}=\alpha\zeta_{p_{0}(t)}$.

Note that when $t$ is fixed, $\zeta_{p_{\theta}(t)}$ as a function of $\theta$ is continuous and strictly decreasing, by the fact that $p_{\theta}(t)$ is continuous and strictly increasing in $\theta$ (Brown [7], Cor. 2.6, 2.22) and $\zeta_{p}$ is continuous and strictly decreasing in $p$ (Lemma 8). Therefore, $\theta_{*}(t)$ in the theorem is well defined.
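The function $\zeta_{p}$ has no closed form in general, but the threshold $\theta_{*}(t)$ of Theorem 1 can be worked out explicitly in a special case. The sketch below assumes dimension $d=1$, where comparing (2) with (4) gives $\zeta_{p}=\log(1/p)$ (and every $t$ with $0<p_{0}(t)<1$ is subcritical), together with the normal location model with $\theta_{0}=0$, so that $p_{\theta}(t)=1-\Phi(t-\theta)$. The equation $\zeta_{p_{\theta}(t)}=\alpha\zeta_{p_{0}(t)}$ then reads $p_{\theta}(t)=p_{0}(t)^{\alpha}$, which can be inverted directly. The function name `theta_star_1d` is illustrative.

```python
from statistics import NormalDist

def theta_star_1d(t, alpha):
    """Solve p_theta(t) = p_0(t)**alpha for theta in the N(theta, 1) model,
    using zeta_p = log(1/p), which holds in dimension d = 1 only."""
    std = NormalDist()
    p0 = 1.0 - std.cdf(t)      # p_0(t) = P_0(X > t)
    target = p0 ** alpha       # required survival probability p_theta(t)
    # 1 - Phi(t - theta) = target  <=>  theta = t - Phi^{-1}(1 - target)
    return t - std.inv_cdf(1.0 - target)

for alpha in (0.25, 0.5, 0.75):
    print(alpha, theta_star_1d(1.0, alpha))
```

Smaller values of $\alpha$ (shorter anomalous intervals) give larger thresholds, and in all cases $\theta_{*}(t)$ is a positive constant that does not shrink with $m$.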

If instead, we consider a class of paths, then (2) may be used to bound $S_{K}(t)$, because $K$ is a scaled version of the lattice in dimension 1. In congruence with (2), we define $\zeta^{1}_{p}=\log(1/p)$.

Theorem 2

In path detection, the test based on $S_{m}(t)$, with $t$ fixed such that $0<p_{0}(t)<p_{c}$, is asymptotically powerful if $\liminf_{m\to\infty}\theta_{m}>\theta_{*}^{+}(t)$, and asymptotically powerless if $\limsup_{m\to\infty}\theta_{m}<\theta_{*}^{-}(t)$, where $\theta_{*}^{+}(t)$ (resp., $\theta_{*}^{-}(t)$) is the unique solution to $d\zeta^{1}_{p_{\theta}(t)}=\alpha\zeta_{p_{0}(t)}$ (resp., $d\zeta_{p_{\theta}(t)}=\alpha\zeta_{p_{0}(t)}$).

Note that in dimension $d\geq 2$, the result is not sharp, because we always have $\theta_{*}^{+}(t)>\theta_{*}^{-}(t)$. We believe that sharper forms of this result may be substantially more involved, and for this reason we have not pursued this.

Qualitatively, the message is that for both hypercube detection and path detection, the subcritical LOC test requires that $\theta_{m}$ be larger than a constant to be effective. Compared with the scan statistic, this makes it grossly suboptimal when detecting hypercubes and comparable (up to a multiplicative constant in $\theta_{m}$) when detecting self-avoiding paths.

What if we let $t=t_{m}\to\infty$, so that $p_{0}(t_{m})\to 0$? Then the test based on $S_{m}(t_{m})$ is powerless under some additional conditions on $F_{0}$. For $b,C\geq 0$, consider the following class of approximately exponential power ($\operatorname{AEP}$) distributions, sometimes called Subbotin distributions:

$$\operatorname{AEP}(b,C)=\{F\colon\ x^{-b}\log\bar{F}(x)\to-C,\ x\to\infty\}.$$

($\bar{F}(x):=1-F(x)$ is the survival function of $X\sim F$.) For example, $\operatorname{Exp}(\lambda)\in\operatorname{AEP}(1,\lambda)$ and $\mathcal{N}(\mu,\sigma^{2})\in\operatorname{AEP}(2,1/(2\sigma^{2}))$, whereas $\operatorname{Poi}(\lambda)$ behaves roughly as a distribution in $\operatorname{AEP}(1,C)$.
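The two exact memberships above can be checked directly from the definition. The short verification below is added for illustration; the Gaussian case relies on the standard tail (Mills ratio) asymptotic for the normal survival function.

```latex
% Exponential: the survival function is exactly log-linear.
\bar{F}(x) = \mathrm{e}^{-\lambda x}
\quad\Longrightarrow\quad
x^{-1}\log\bar{F}(x) = -\lambda \quad (x>0),
\qquad\text{so } \operatorname{Exp}(\lambda)\in\operatorname{AEP}(1,\lambda).

% Normal: use the tail asymptotic as x -> infinity.
\bar{F}(x) \sim \frac{\sigma}{(x-\mu)\sqrt{2\pi}}
\exp\!\Bigl(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\Bigr)
\quad\Longrightarrow\quad
x^{-2}\log\bar{F}(x) \to -\frac{1}{2\sigma^{2}},
\qquad\text{so } \mathcal{N}(\mu,\sigma^{2})\in\operatorname{AEP}\bigl(2,1/(2\sigma^{2})\bigr).
```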

Proposition 1

Assume that $F_{0}\in\operatorname{AEP}(b,C)$ for some $b>1$ and $C>0$. In hypercube detection, the test based on $S_{m}(t)$ is asymptotically powerless when $t=t_{m}\to\infty$, unless $\theta_{m}\to\infty$.

4.2 Supercritical percolation

Here we consider the supercritical regime, where $p_{0}(t)>p_{c}$. (Note that necessarily $d\geq 2$, because $p_{c}=1$ in dimension 1.) In this setting, too, the size of the largest cluster is well understood. Let $\Theta_{p}$ be the probability that the open cluster at the origin is infinite, and note that $\Theta_{p}>0$ for $p>p_{c}$, by the definition of $p_{c}$. We have with probability 1 that

$$\frac{S_{m}}{|\mathbb{V}_{m}|}\to\Theta_{p}$$

(see Falconer and Grimmett [18], Lemma 2 and proof; Penrose and Pisztora [44], Theorem 4; Pisztora [46]). In fact (with probability $1-\mathrm{o}(1)$), the largest open cluster within $\mathbb{V}_{m}$ is unique, and the foregoing statement says that it occupies a non-negligible fraction of $\mathbb{V}_{m}$. With a supercritical choice of threshold, the LOC test is powerless for any $\theta$ if the anomalous cluster is too small, specifically if $\alpha<1/2$ in the setting of hypercube detection. Indeed, we have the following result.

Theorem 3

In hypercube detection, the test based on $S_{m}(t)$, with $t$ fixed such that $p_{c}<p_{0}(t)<1$, is asymptotically powerful if $\alpha\geq 1/2$ and $\lim_{m\to\infty}\theta_{m}m^{(\alpha-1/2)d}=\infty$, and asymptotically powerless if $\alpha<1/2$ or if $\lim_{m\to\infty}\theta_{m}m^{(\alpha-1/2)d}=0$.

Thus, for the detection of small clusters, a supercritical LOC test is potentially worthless, whereas for larger clusters it improves substantially on the performance of a subcritical LOC test, although it is still suboptimal compared with the scan statistic. (Indeed, comparing the exponents when $\alpha\geq 1/2$, we have $(\alpha-1/2)d<\alpha d/2$, because $\alpha<1$.) We mention that in the context of path detection, the same arguments show that the LOC test for any choice of supercritical threshold is asymptotically powerless.

4.3 Critical percolation

If our goal is to choose a threshold $t$ so as to maximize the difference in size for the largest open cluster under the null and under an alternative, then we are necessarily in the neighborhood of the percolation phase transition, which is to say that $|p-p_{c}|$ is small. (Again, here we assume $d\geq 2$.) The percolation model is not fully understood in the critical regime, which poses a serious obstacle to a rigorous statistical analysis. (See Grimmett [21], Chapter 9, for a general discussion of this percolation regime.) We base our discussion on the work of Borgs et al. [6]. Let $\pi_{m}(p)$ denote the probability that the open cluster at the origin reaches outside the box $[-m,m]^{d}$, and let $\xi(p)$ denote the correlation length, defined as

$$\frac{1}{\xi(p)}:=-\lim_{m\to\infty}\frac{1}{m}\log\pi_{m}(p).$$

Note that, with $\xi$ thus defined, $\xi(p)<\infty$ if and only if $p<p_{c}$. The critical exponent for (subcritical) correlation length is postulated to be

$$\nu:=-\lim_{p\nearrow p_{c}}\frac{\log\xi(p)}{\log|p-p_{c}|}.$$

It is not known whether the limit exists for all dimensions, but it is known that $0<\nu<\infty$ whenever it exists. It is shown in Borgs et al. [6] that, subject to the existence of this limit together with other scaling assumptions, when $p=p_{m}$ varies with $m$,

$$S_{m}\asymp_{\mathrm{P}}\begin{cases}\log m, & \text{if, for some }\nu^{\prime}>\nu,\ m^{1/\nu^{\prime}}(p_{m}-p_{c})\to-\infty,\\ m^{d}, & \text{if, for some }\nu^{\prime}>\nu,\ m^{1/\nu^{\prime}}(p_{m}-p_{c})\to\infty,\end{cases}\tag{5}$$

where $X_{m}\asymp_{\mathrm{P}}Y_{m}$ means that there exists a constant $C\in(0,\infty)$ such that $C^{-1}\leq X_{m}/Y_{m}\leq C$ in probability. The scaling assumptions of Borgs et al. [6] are believed to hold if and only if the number $d$ of dimensions satisfies $2\leq d\leq 6$, and they are proved for $d=2$. The work of Borgs et al. [6] was directed at bond percolation only, but similar results are expected for site percolation.

It is known that $\nu=4/3$ for site percolation on the triangular lattice (see Smirnov and Werner [50]), and it is believed that this holds for percolation on any two-dimensional lattice. As described in Grimmett [21], Section 10.4, it is believed that $\nu=1/2$ for $d\geq 6$, and this has been proved for $d\geq 19$ and for the so-called “spread-out model” in $7$ and more dimensions (Hara, van der Hofstad and Slade [23]).

Subject to the assumption that (5) holds, we establish the power of the test based on Sm(t)S_{m}(t) when choosing t=tmt=t_{m} near criticality. We assume that there exists tct_{c} such that p0(tc)=pcp_{0}(t_{c})=p_{c}, and that p0(t)p_{0}(t) is a continuous function of tt in a neighborhood of tct_{c}.

Theorem 4

Let tmtct_{m}\geq t_{c} be such that pcp0(tm)m1/νp_{c}-p_{0}(t_{m})\asymp m^{-1/\nu^{\prime}} for some ν>ν\nu^{\prime}>\nu. In hypercube detection, assuming that (5) holds, the test based on Sm(tm)S_{m}(t_{m}) is asymptotically powerful if lim infmθmmα/ν\liminf_{m\to\infty}\theta_{m}m^{\alpha/\nu^{\prime}} is sufficiently large.

Compared with a subcritical choice of threshold, which requires that $\theta_{m}$ be bounded away from 0 for the test to have any power (as seen in Theorem 1), a near-critical choice of threshold allows the test to detect at polynomially small $\theta_{m}$. In particular, with a proper choice of threshold, the test is powerful for $\theta_{m}$ of order $m^{-\alpha/\nu^{\prime}}$ with $\nu^{\prime}>\nu$. Note that, by Lemma 1, all methods are asymptotically powerless if $\theta_{m}$ is of order $m^{-d\alpha/2}$, implying that $\alpha/\nu\leq d\alpha/2$. We thus obtain the inequality $\nu\geq 2/d$. This may be compared with the scaling relation (Grimmett [21], Equation (9.23)) stating that $d\nu=2-a$, where $a$ ($<0$) is the percolation critical exponent for the number of clusters per vertex. It is believed that $a=-\frac{2}{3}$ when $d=2$ and $a=-1$ when $d\geq 6$. Compared with the performance at supercriticality, the test at near-criticality (with a proper choice of threshold) is superior if $(\alpha-\frac{1}{2})d<\alpha/\nu$, which, by the scaling relation $d\nu=2-a$, is equivalent to $\alpha<(1-a/2)/(1-a)$. For example, with $a=-\frac{2}{3}$, the near-critical LOC test is superior when $\alpha<\frac{4}{5}$.
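The arithmetic behind this comparison can be checked with exact fractions. The following is a minimal sketch; the function names `nu_from_scaling` and `alpha_crossover` are ours and purely illustrative, and the code only evaluates the scaling relation $d\nu=2-a$ and the bound $(1-a/2)/(1-a)$ quoted above.

```python
from fractions import Fraction


def nu_from_scaling(a, d):
    """Correlation-length exponent implied by the scaling relation d * nu = 2 - a."""
    return (2 - Fraction(a)) / d


def alpha_crossover(a):
    """Bound (1 - a/2) / (1 - a): the near-critical test is superior for alpha below it."""
    a = Fraction(a)
    return (1 - a / 2) / (1 - a)


# Conjectured value of the exponent a in dimension d = 2.
a_2d = Fraction(-2, 3)
print(nu_from_scaling(a_2d, 2))  # 4/3
print(alpha_crossover(a_2d))     # 4/5
```

With $a=-1$ and $d=6$ the same functions return $\nu=1/2$, in agreement with the value believed to hold for $d\geq 6$.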

5 The upper level set scan statistic

For a threshold tt, let 𝒬m(t)\mathcal{Q}_{m}^{(t)} denote the (random) class of clusters within 𝕍m\mathbb{V}_{m} open at tt, and let 𝒬m=t𝒬m(t)\mathcal{Q}_{m}^{*}=\bigcup_{t}\mathcal{Q}_{m}^{(t)}, which is also random. Patil and Taillie [41] suggested scanning the clusters in 𝒬m\mathcal{Q}_{m}^{*}. To facilitate a rigorous mathematical analysis of its performance, we consider the upper level set (𝑈𝐿𝑆\operatorname{ULS}) scan at a given threshold tt, and use the simple scan described in Section 3. Specifically, in correspondence with (1), we define the (simple) 𝑈𝐿𝑆\operatorname{ULS} scan statistic at threshold tt as

\[U_{m}(t,k_{m})=\max\bigl\{\sqrt{|K|}(\bar{X}_{K}-\mu_{0|t})\colon\ K\in\mathcal{Q}_{m}^{(t)},|K|\geq k_{m}\bigr\},\tag{6}\]

where $\mu_{0|t}$ (resp., $\sigma^{2}_{0|t}$) is the mean (resp., variance) of $X_{v}|X_{v}>t$ when $X_{v}\sim F_{0}$, and $(k_{m})$ is a non-decreasing sequence of positive integers. The $\operatorname{ULS}$ scan statistic of Patil and Taillie [41] corresponds (in its simple form) to

\[\operatorname{ULS}_{m}=\max_{t\in\mathbb{R}}\frac{U_{m}(t,1)}{\sigma_{0|t}}.\tag{7}\]

If μ0|t\mu_{0|t} and/or σ0|t2\sigma_{0|t}^{2} are not available, we may use their empirical versions based on the XvX_{v} that survive the threshold tt. We restrict the scan to clusters of size at least kmk_{m} to increase power, because the behavior of Um(t)U_{m}(t) is, as we show later, completely driven by the smallest open clusters that are scanned, at least when tt is subcritical. We present the rest of our discussion in terms of subcritical, supercritical, and near-critical choices of threshold. We then conclude with a result on the performance of the 𝑈𝐿𝑆\operatorname{ULS} scan test across all thresholds.

5.1 Subcritical threshold

We start by describing the behavior of Um(t,km)U_{m}(t,k_{m}) under the null. Let Fθ|tF_{\theta|t} denote the distribution of Xv|Xv>tX_{v}|X_{v}>t under FθF_{\theta}, and let μθ|t\mu_{\theta|t} and Λθ|t\Lambda^{*}_{\theta|t} denote its mean and rate function, respectively. Also, when 0<β<1/ζpθ(t)0<\beta<1/\zeta_{p_{\theta}(t)}, or β=0\beta=0 and F0𝐴𝐸𝑃(b,C)F_{0}\in\operatorname{AEP}(b,C) for some b2b\geq 2 and C>0C>0, let γθ|t(β):=γ(Fθ|t,μ0|t,ζpθ(t),β)\gamma_{\theta|t}(\beta):=\gamma(F_{\theta|t},\mu_{0|t},\zeta_{p_{\theta}(t)},\beta), where γ\gamma is the function defined in Lemma 16. Note that γθ|t(β)\gamma_{\theta|t}(\beta) can be computed explicitly in some cases, like the normal location model, and γθ|t(β)(μθ|tμ0|t)2/ζpθ(t)\gamma_{\theta|t}(\beta)\sim(\mu_{\theta|t}-\mu_{0|t})^{2}/\zeta_{p_{\theta}(t)} when θθc(t)\theta\nearrow\theta_{c}(t), defined (when it exists) as the solution to pθ(t)=pcp_{\theta}(t)=p_{c}.

Lemma 5

Assume that θ0\theta\geq 0 and tt is fixed such that 0<pθ(t)<pc0<p_{\theta}(t)<p_{c} and that km/logmdβk_{m}/\log m\to d\beta for some β0\beta\geq 0. Then, under FθF_{\theta} on 𝕍m\mathbb{V}_{m}, the following holds in probability:

  1. If $\beta>1/\zeta_{p_{\theta}(t)}$, then $U_{m}(t,k_{m})=0$ for $m$ large enough.

  2. If $0<\beta<1/\zeta_{p_{\theta}(t)}$, then

    \[(\log m)^{-1/2}U_{m}(t,k_{m})\to(d\gamma_{\theta|t}(\beta))^{1/2}.\]

  3. If $\beta=0$ and $F_{0}\in\operatorname{AEP}(b,C)$ for some $b\geq 1$ and $C>0$, then

    (a) If $b\geq 2$, the convergence in Part 2 applies;

    (b) If $b<2$,

      \[k_{m}^{1/b-1/2}(\log m)^{-1/b}U_{m}(t,k_{m})\to(d/C)^{1/b}.\]

In the last case, where β=0\beta=0, the behavior of Um(t)U_{m}(t) is influenced by the very large deviations of Fθ|tkF_{\theta|t}^{*k} for kkmk\geq k_{m}. (The symbol * denotes convolution.) We choose to state a result for 𝐴𝐸𝑃\operatorname{AEP} distributions, for which the very large deviations resemble the large deviations.

Based on Lemma 5, we establish the performance of the 𝑈𝐿𝑆\operatorname{ULS} scan statistic. We start by arguing that choosing kmk_{m} such that km/logm0k_{m}/\log m\to 0 leads to a test that may potentially have less power than the test based on the largest cluster after thresholding. Indeed, the behavior of the 𝑈𝐿𝑆\operatorname{ULS} scan statistic does not depend on θ\theta as long as θ<θc(t)\theta<\theta_{c}(t).

Proposition 2

Assume that F0𝐴𝐸𝑃(b,C)F_{0}\in\operatorname{AEP}(b,C) for some b(1,2)b\in(1,2) and C>0C>0. In hypercube detection, the test based on Um(t,km)U_{m}(t,k_{m}), with tt fixed such that 0<p0(t)<pc0<p_{0}(t)<p_{c} and km/logm0k_{m}/\log m\to 0, is asymptotically powerless if lim supmθm<θc(t)\limsup_{m\to\infty}\theta_{m}<\theta_{c}(t).

For example, in the setting just described with d=1d=1, the 𝑈𝐿𝑆\operatorname{ULS} scan test has (asymptotically) no power unless θm\theta_{m}\to\infty, whereas the test based on the size of the largest cluster after thresholding is, by Theorem 1, asymptotically powerful if lim infmθm\liminf_{m\to\infty}\theta_{m} is large enough. We therefore choose a sequence kmk_{m} comparable in magnitude to logm\log m and state the performance of the 𝑈𝐿𝑆\operatorname{ULS} scan test in this case.

Theorem 5

In hypercube detection, the test based on Um(t,km)U_{m}(t,k_{m}), with tt fixed such that 0<p0(t)<pc0<p_{0}(t)<p_{c} and km/logmdβk_{m}/\log m\to d\beta with 0<β<1/ζp0(t)0<\beta<1/\zeta_{p_{0}(t)}, is asymptotically powerful if lim infmθm>θ(t)\liminf_{m\to\infty}\theta_{m}>\theta_{*}(t) and asymptotically powerless if lim supmθm<θ(t)\limsup_{m\to\infty}\theta_{m}<\theta_{*}(t), where θ(t)\theta_{*}(t) is the unique solution to αγθ|t(β)=γ0|t(β)\alpha\gamma_{\theta|t}(\beta)=\gamma_{0|t}(\beta).

Note that θ(t)\theta_{*}(t) is well defined by Lemma 17 and that θ(t)<θc\theta_{*}(t)<\theta_{c} as long as α>0\alpha>0. In any case, the test based on Um(t,km)U_{m}(t,k_{m}) with a subcritical threshold tt is, in the setting of hypercube detection, asymptotically powerless when θm0\theta_{m}\to 0, just like the LOC test. In essence, the two tests are qualitatively comparable in this setting. This is also true in the context of path detection. Let γθ|t1(β)\gamma^{1}_{\theta|t}(\beta) denote γθ|t(β)\gamma_{\theta|t}(\beta) in dimension 1.

Theorem 6

In path detection, the test based on Um(t,km)U_{m}(t,k_{m}), with tt fixed such that 0<p0(t)<pc0<p_{0}(t)<p_{c} and km/logmdβk_{m}/\log m\to d\beta with 0<β<1/ζp0(t)0<\beta<1/\zeta_{p_{0}(t)}, is asymptotically powerful if lim infmθm>θ+(t)\liminf_{m\to\infty}\theta_{m}>\theta_{*}^{+}(t), and asymptotically powerless if lim supmθm<θ(t)\limsup_{m\to\infty}\theta_{m}<\theta_{*}^{-}(t), where θ+(t)\theta_{*}^{+}(t) (resp., θ(t)\theta_{*}^{-}(t)) is the unique solution to αγθ|t1(β)=γ0|t(β)\alpha\gamma^{1}_{\theta|t}(\beta)=\gamma_{0|t}(\beta) (resp., αγθ|t(β)=γ0|t(β)\alpha\gamma_{\theta|t}(\beta)=\gamma_{0|t}(\beta)).

As in Theorem 2, the result is not as sharp.

Qualitatively, we see that the performance of the subcritical $\operatorname{ULS}$ scan test is comparable to that of the LOC test for both hypercube detection and path detection.

5.2 Supercritical threshold

Here we consider the choice of a supercritical threshold, where tt is fixed such that p0(t)>pcp_{0}(t)>p_{c}. We already saw in Section 4.2 that the largest open cluster is unique and occupies a non-negligible fraction of the entire network. This is actually true both under the null and under an alternative. The 𝑈𝐿𝑆\operatorname{ULS} scan test based solely on the largest open cluster is comparable to the test based on the grand mean after thresholding. In turn, assuming tt is fixed, this test is asymptotically powerful when m(α1/2)dθmm^{(\alpha-1/2)d}\theta_{m}\to\infty, and asymptotically powerless if α1/2\alpha\leq 1/2 and θm\theta_{m} is bounded. (This is easily seen using Chebyshev’s inequality.) This is comparable to the LOC test at supercriticality.

In general, the 𝑈𝐿𝑆\operatorname{ULS} scan statistic includes other (smaller) open clusters. The story of the second-largest cluster of supercritical percolation in a box is not yet complete, and for this reason the behavior of the 𝑈𝐿𝑆\operatorname{ULS} scan statistic remains incompletely understood. The difficulty arises from the possibility that the second-largest cluster in 𝕍m\mathbb{V}_{m} might lie at its boundary. Whether or not this occurs depends on the outcome of a calculation (yet to be done) of energy/entropy type involving so-called “droplets” near the boundary of 𝕍m\mathbb{V}_{m} (see, e.g., Bodineau, Ioffe and Velenik [5]). To simplify the discussion, we finesse this problem by working where necessary on 𝕍m\mathbb{V}_{m} with toroidal boundary conditions. That is, whenever we make statements concerning supercritical percolation on the graph 𝕍m\mathbb{V}_{m}, we may add edges connecting sites on its boundary as follows: when d=2d=2, for k=1,2,,mk=1,2,\ldots,m, an additional edge is placed between site (1,k)(1,k) and site (m,k)(m,k), and similarly between (k,1)(k,1) and (k,m)(k,m).
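The toroidal boundary condition described above can be illustrated in dimension $d=2$ as follows. This is a minimal sketch, assuming sites are indexed by pairs $(i,j)$ with $1\leq i,j\leq m$ as in the text; the function name `torus_neighbors` is ours.

```python
def torus_neighbors(i, j, m):
    """Neighbours of site (i, j), 1-indexed, on the m x m lattice with wrap-around edges.

    The extra edges join (1, k) with (m, k) and (k, 1) with (k, m).
    """
    def wrap(k):
        # Map 0 to m and m + 1 to 1; leave 1..m unchanged.
        return (k - 1) % m + 1

    return {
        (wrap(i + 1), j),
        (wrap(i - 1), j),
        (i, wrap(j + 1)),
        (i, wrap(j - 1)),
    }


# A site on the first row is joined to the matching site on the last row.
print(sorted(torus_neighbors(1, 3, 5)))
```

With free boundary conditions one would instead discard any neighbour whose coordinate falls outside $1,\ldots,m$.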

In proving exact asymptotics for test statistics under the null, we assume toroidal boundary conditions. Our results on asymptotic power do not require such exact results but require only orders of magnitude, which do not need the toroidal assumption. We emphasize that similar results are expected to hold with “free” (i.e., without the extra edges) rather than toroidal boundary conditions. Once the percolation picture is better understood, such results will follow in the same manner as those presented in this paper. Our results for the torus are also valid if instead we discount open clusters that touch the boundary of 𝕍m\mathbb{V}_{m}. Details of this are omitted, and the proofs are essentially the same.

When working on the torus, the second-largest cluster is controlled through the following calculation. Cerf [8] proved that the limit

\[\delta_{p}:=-\lim_{k\to\infty}k^{-(d-1)/d}\log\mathbb{P}(\infty>S\geq k)=-\lim_{k\to\infty}k^{-(d-1)/d}\log\mathbb{P}(S=k),\tag{8}\]

exists, with 0<δp<0<\delta_{p}<\infty for all fixed p(pc,1)p\in(p_{c},1). The dependency on dd is left implicit.

A result similar to Lemma 5 holds with δp\delta_{p} playing the role of ζp\zeta_{p} and the exponent of logm\log m changed in places. It turns out that we need this result only when θ=0\theta=0. For β>0\beta>0 and a supercritical tt, let γ0|t(β):=γ(F0|t,μ0|t,0,β)\gamma_{0|t}(\beta):=\gamma(F_{0|t},\mu_{0|t},0,\beta), defined in Lemma 16.

Lemma 6

Assume that tt is fixed such that pc<p0(t)<1p_{c}<p_{0}(t)<1 and that km/logmdβk_{m}/\log m\to d\beta and km(d1)/d/logmdβk_{m}^{(d-1)/d}/\log m\to d\beta^{\prime} for some 0β,β0\leq\beta,\beta^{\prime}\leq\infty. Then, under the null, the following holds in probability on the torus 𝕍m\mathbb{V}_{m}:

  1. If $\beta^{\prime}>1/\delta_{p_{0}(t)}$, then $U_{m}(t,k_{m})=\mathrm{O}(1)$.

  2. If $0\leq\beta^{\prime}<1/\delta_{p_{0}(t)}$ and $\beta=\infty$, then

    \[(\log m)^{-1/2}U_{m}(t,k_{m})\to\sigma_{0|t}\bigl[2d\bigl(1-\beta^{\prime}\delta_{p_{0}(t)}\bigr)\bigr]^{1/2},\]

    where $\sigma_{0|t}^{2}:=\operatorname{Var}(F_{0|t})$.

  3. If $\beta<\infty$, then the conclusions of Lemma 5 apply. (Note that $\zeta_{p_{0}(t)}=0$.)

Based on Lemma 6, we obtain the following result on the performance of the $\operatorname{ULS}$ scan test at supercriticality. As before, we restrict ourselves to the case where $U_{m}(t,k_{m})$ is of order $(\log m)^{1/2}$. We also choose to state a simple result instead of a more precise result with multiple subcases. This result holds irrespective of the type of boundary condition assumed on $\mathbb{V}_{m}$.

Theorem 7

In hypercube detection, the test based on Um(t,km)U_{m}(t,k_{m}), with tt fixed such that pc<p0(t)<1p_{c}<p_{0}(t)<1 and lim infkm/logm>0\liminf k_{m}/\log m>0 and lim supkm(d1)/d/logm<αd/δp0(t)\limsup k_{m}^{(d-1)/d}/\log m<\alpha d/\delta_{p_{0}(t)}, is asymptotically powerful (resp., powerless) if

\[\theta_{m}\bigl[m^{(\alpha-1/2)d}+(\log m)^{d/(2d-2)}\bigr](\log m)^{-1/2}\to\infty\qquad\text{(resp., $\to 0$)}.\]

We also mention that the equivalent of Theorem 6 holds here as well.

The improvement of the supercritical 𝑈𝐿𝑆\operatorname{ULS} scan test compared with the supercritical LOC test is a weaker requirement on θm\theta_{m} by a logarithmic factor. Thus, this test’s performance is still much worse than that of the scan statistic when detecting hypercubes.

5.3 Critical threshold

If we choose a threshold as described in Section 4.3, and if (5) is true, then the power of the $\operatorname{ULS}$ scan statistic is greatly improved, as in the case of the LOC test. In fact, it can be proven that Theorem 4 remains valid with $S_{m}(t_{m})$ replaced with $U_{m}(t_{m},k_{m})$, as long as $k_{m}=\mathrm{o}(m^{\alpha d})$ so that the largest open cluster under the alternative is scanned. This boils down to showing that under the null, the $\operatorname{ULS}$ scan statistic is at most a power of $\log m$, which we do in Lemma 7 below. However, the $\operatorname{ULS}$ scan test does not seem to offer any substantial gain in power over the LOC test, given that $\theta_{m}$ is still required to be large enough to change the regime of the percolation process within an alternative $K$ from subcritical to supercritical. That said, actually proving this would require information on the smaller open clusters near criticality, which is scarce and very difficult to obtain (see Borgs et al. [6] for some partial results and postulates).

5.4 Across all thresholds

Finally, we discuss the (simple) 𝑈𝐿𝑆\operatorname{ULS} scan test across all thresholds, as suggested in Patil and Taillie [41]. To take advantage of a phase transition near criticality, we assume, as in Section 4.3, that there exists tct_{c} such that p0(tc)=pcp_{0}(t_{c})=p_{c} and that p0(t)p_{0}(t) is a continuous function of tt in a neighborhood of tct_{c}. We also assume that (5) holds. In Proposition 2, we showed that scanning small clusters may lead to a decrease in power. For this reason, and also to facilitate the analysis, we limit ourselves to clusters of size at least kmk_{m}; that is, we consider the test based on

\[\operatorname{ULS}_{m}(k_{m})=\max_{t\in\mathbb{R}}\frac{U_{m}(t,k_{m})}{\sigma_{0|t}},\tag{9}\]

where, for definiteness, Um(t,km)U_{m}(t,k_{m}) is calculated on the torus 𝕍m\mathbb{V}_{m} when t<tct<t_{c}.

Let Γθ(β)=inftγθ|t(β)/σ0|t2\Gamma_{\theta}(\beta)=\inf_{t}\gamma_{\theta|t}(\beta)/\sigma_{0|t}^{2}, where, in congruence with Sections 5.1 and 5.2,

\[\gamma_{\theta|t}(\beta)=\begin{cases}\gamma\bigl(F_{\theta|t},\mu_{0|t},\zeta_{p_{\theta}(t)},\beta\bigr),&t>t_{c},\\ \gamma(F_{\theta|t},\mu_{0|t},0,\beta),&t<t_{c},\end{cases}\]

with γ\gamma being the function defined in Lemma 16. We first establish the behavior of 𝑈𝐿𝑆m(km)\operatorname{ULS}_{m}(k_{m}) under the null.

Lemma 7

Let km=βlogmk_{m}=\beta\log m where β>0\beta>0, and let tβt_{\beta} be such that d/βζp0(tβ)<d/\beta\leq\zeta_{p_{0}(t_{\beta})}<\infty. Define η(β):=sup{σ0|t/σ0|s:sttβ}\eta(\beta):=\sup\{\sigma_{0|t}/\sigma_{0|s}\colon\ s\leq t\leq t_{\beta}\}. With probability tending to 1, under F0F_{0},

\[\limsup_{m\to\infty}(\log m)^{-1/2}\operatorname{ULS}_{m}(k_{m})\leq\eta(\beta)(d\Gamma_{\theta}(\beta))^{1/2}.\]

If in addition, either σ0|t\sigma_{0|t} is non-decreasing in tt or F0F_{0} has no atoms on (,tβ](-\infty,t_{\beta}], then, in probability under F0F_{0},

\[(\log m)^{-1/2}\operatorname{ULS}_{m}(k_{m})\to(d\Gamma_{\theta}(\beta))^{1/2}.\]

In fact, a result as precise as Lemma 7 is superfluous, given the behavior of the 𝑈𝐿𝑆\operatorname{ULS} scan statistic under the alternative at supercriticality and near-criticality, which is polynomial in mm. The next theorem does not require the use of toroidal boundary conditions.

Theorem 8

In hypercube detection and assuming that (5) holds, the test based on 𝑈𝐿𝑆m(km)\operatorname{ULS}_{m}(k_{m}), with km=[βlogm]k_{m}=[\beta\log m] for some β>0\beta>0, is asymptotically powerful if θmmλ\theta_{m}m^{\lambda}\to\infty, for some 0<λ<α/ν0<\lambda<\alpha/\nu satisfying λ<(α1/2)d\lambda<(\alpha-1/2)d if α>1/2\alpha>1/2.

Thus, scanning all thresholds elicits the best performance of the LOC tests. Nevertheless, the overall test is still suboptimal when detecting hypercubes compared with the scan statistic. We mention in passing that the same result holds for the simpler test that scans only the largest open cluster at each threshold.

6 Implementation and numerical experiments

The scan test has been shown to be near-optimal in a wide variety of settings, differing in terms of both network structure and cluster class (Arias-Castro, Candès and Durand [1]; Arias-Castro, Donoho and Huo [3]). It is computationally demanding, however. For the simple situation of detecting a hypercube whose size is known, the scan statistic can be computed in $\mathrm{O}(N\log N)$ flops, where $N:=m^{d}$ is the network size. If one scans over all possible hypercubes, then computing the scan statistic requires $\mathrm{O}(N^{2}\log N)$ flops. For nonparametric shapes, the computational cost is even higher; in fact, for the problem of detecting a loopless path, computing the scan statistic corresponds to the reward-budget problem of DasGupta et al. [13], shown there to be NP-hard. Because the scan statistic is so computationally burdensome, the cluster class is most often taken to be parametric in practice, even though the underlying clusters may take a much wider range of shapes. For instance, discs are the prevalent shape used in disease outbreak detection (Kulldorff and Nagarwalla [33]), with variants such as ellipses (Hobolth, Pedersen and Jensen [26]; Kulldorff et al. [32]). For a wide range of parametric shapes, Arias-Castro, Donoho and Huo [3] recommended a multiscale approximation to the scan statistic. Efforts to move beyond parametric models include tree-based approaches (Kulldorff, Fang and Walsh [31]), simulated annealing (Duczmal and Assunção [16]) and an exhaustive search among arbitrarily shaped clusters of small size (Tango and Takahashi [51]).
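To make the known-size case concrete in dimension $d=2$, the following minimal sketch scans all $\ell\times\ell$ squares. It uses a summed-area table, which is our own choice for illustration and not necessarily the method behind the flop count quoted above. The function name `square_scan` and the default null mean `mu0=0.0` are assumptions of the sketch; the statistic is normalized as $\sqrt{|K|}(\bar{X}_{K}-\mu_{0})$ with $|K|=\ell^{2}$.

```python
def square_scan(values, ell, mu0=0.0):
    """Max over ell x ell squares K of sqrt(|K|) * (mean over K - mu0)."""
    rows, cols = len(values), len(values[0])
    # Summed-area table with a zero border: sat[i][j] = sum of values[:i][:j].
    sat = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            sat[i + 1][j + 1] = values[i][j] + sat[i][j + 1] + sat[i + 1][j] - sat[i][j]
    best_total = float("-inf")
    for i in range(rows - ell + 1):
        for j in range(cols - ell + 1):
            # Sum over the square with top-left corner (i, j), in constant time.
            total = sat[i + ell][j + ell] - sat[i][j + ell] - sat[i + ell][j] + sat[i][j]
            best_total = max(best_total, total)
    # sqrt(ell^2) * (total / ell^2 - mu0) = (total - ell^2 * mu0) / ell
    return (best_total - ell * ell * mu0) / ell
```

Scanning over all side lengths amounts to calling this routine once per value of $\ell$ and normalizing each result as shown.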

The LOC test does not assume any parametric form for the anomalous cluster, and in that sense is nonparametric. Its computational complexity at a given threshold is of order the number of nodes plus the number of edges in the network (Cormen et al. [10]), and so of order O(N)\mathrm{O}(N) flops for the square lattice.
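The linear-time computation can be sketched as a breadth-first search over the thresholded lattice. This is a minimal illustration in dimension $d=2$ with free boundary conditions, not the routine used in our experiments; the function name `largest_open_cluster` is ours, and we assume that a site is open when its value strictly exceeds the threshold.

```python
from collections import deque


def largest_open_cluster(values, t):
    """Size of the largest connected set of sites with value > t (4-neighbour lattice)."""
    rows, cols = len(values), len(values[0])
    seen = [[False] * cols for _ in range(rows)]
    best = 0
    for i in range(rows):
        for j in range(cols):
            if seen[i][j] or values[i][j] <= t:
                continue
            # Breadth-first search over the open cluster containing (i, j).
            seen[i][j] = True
            queue = deque([(i, j)])
            size = 0
            while queue:
                x, y = queue.popleft()
                size += 1
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < rows and 0 <= ny < cols
                            and not seen[nx][ny] and values[nx][ny] > t):
                        seen[nx][ny] = True
                        queue.append((nx, ny))
            best = max(best, size)
    return best
```

Each site is enqueued at most once and each edge is examined at most twice, which gives the nodes-plus-edges cost stated above.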

The 𝑈𝐿𝑆\operatorname{ULS} scan statistic is nonparametric as well. Computing Um(t,km)U_{m}(t,k_{m}) requires determining 𝒬m(t)\mathcal{Q}_{m}^{(t)}, which takes O(N)\mathrm{O}(N) flops, and then scanning over 𝒬m(t)\mathcal{Q}_{m}^{(t)}. Because the clusters in 𝒬m(t)\mathcal{Q}_{m}^{(t)} do not intersect, scanning over them takes order O(N)\mathrm{O}(N) flops. Therefore, computing 𝑈𝐿𝑆m\operatorname{ULS}_{m} can be done in O(MN)\mathrm{O}(M\cdot N) flops, where MM is the number of distinct values at the nodes. Patil and Taillie [42] argued that this can be done faster by using the tree structure of 𝒬m\mathcal{Q}_{m}^{*}, where the root is the entire network 𝕍m\mathbb{V}_{m} and a cluster K𝒦m(tj)K\in\mathcal{K}_{m}(t_{j}) is the parent of any cluster L𝒦m(tj+1)L\in\mathcal{K}_{m}(t_{j+1}) such that LKL\subset K, where t1<<tMt_{1}<\cdots<t_{M} denote the distinct values at the nodes.
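The single-threshold computation can be sketched along the same lines: find the open clusters, then evaluate $\sqrt{|K|}(\bar{X}_{K}-\mu_{0|t})$ on each cluster of size at least $k_{m}$. The sketch below works in dimension $d=2$ with free boundary conditions. The function names are ours, the default for `mu0` assumes a standard normal null (so that $\mu_{0|t}=\phi(t)/\bar{\Phi}(t)$), and returning 0 when no cluster is large enough is a convention we adopt for illustration.

```python
from collections import deque
from math import sqrt
from statistics import NormalDist


def truncated_normal_mean(t):
    """Mean of X given X > t when X ~ N(0, 1): phi(t) / (1 - Phi(t))."""
    z = NormalDist()
    return z.pdf(t) / (1.0 - z.cdf(t))


def uls_scan_at_threshold(values, t, k_min=1, mu0=None):
    """Max of sqrt(|K|) * (mean(K) - mu0) over open clusters K with |K| >= k_min."""
    if mu0 is None:
        mu0 = truncated_normal_mean(t)  # assumes a standard normal null
    rows, cols = len(values), len(values[0])
    seen = [[False] * cols for _ in range(rows)]
    best = None
    for i in range(rows):
        for j in range(cols):
            if seen[i][j] or values[i][j] <= t:
                continue
            # Breadth-first search, accumulating cluster size and sum.
            seen[i][j] = True
            queue = deque([(i, j)])
            size, total = 0, 0.0
            while queue:
                x, y = queue.popleft()
                size += 1
                total += values[x][y]
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < rows and 0 <= ny < cols
                            and not seen[nx][ny] and values[nx][ny] > t):
                        seen[nx][ny] = True
                        queue.append((nx, ny))
            if size >= k_min:
                stat = sqrt(size) * (total / size - mu0)
                best = stat if best is None else max(best, stat)
    # Convention for this sketch: 0 when no open cluster reaches size k_min.
    return 0.0 if best is None else best
```

Repeating this over the $M$ distinct node values and dividing by $\sigma_{0|t}$ at each threshold gives the $\mathrm{O}(M\cdot N)$ computation mentioned above.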

We complement our theoretical analysis with some small-scale numerical experiments. Specifically, we explore the power properties of the LOC test of Section 4 and the $\operatorname{ULS}$ scan test of Section 5 in the context of detecting a hypercube in the two-dimensional square lattice. Patil, Modarres and Patankar [40] are developing sophisticated software implementing the $\operatorname{ULS}$ scan statistic for use in real-life situations, with more recent variations in Patil, Joshi and Koli [38]. However, this software is not yet available, so we implemented our own (basic) routines.

We used the statistical software R (R Core Team [48]) with the package igraph (Csardi [11]). Our (basic) implementation of the $\operatorname{ULS}$ scan statistic for a given threshold is much slower than both the scan statistic with a given mask and the LOC statistic, especially when there is no constraint on the size of the open clusters to be scanned, that is, when $k_{m}=1$. In all of our experiments, we chose the square lattice in dimension $d=2$ with side length $m=500$ for a total of 250,000 nodes, and we considered three alternatives: squares of side length $\ell\in\{10,50,100\}$, corresponding roughly to $\alpha\in\{0.4,0.6,0.7\}$ (since $\alpha=\log\ell/\log m$). The squares were fixed away from the boundary of the lattice, given that the methods are essentially location-independent. (This is rigorously true of the scan statistic.) We assessed the performance of a method in a given situation by estimating its risk, which we define as the sum of the probabilities of type I and type II errors optimized over all rejection regions.
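The risk estimate can be sketched from two samples of the test statistic, one simulated under the null and one under the alternative. In this minimal sketch we restrict attention to one-sided rejection regions of the form $\{T\geq c\}$, which is an assumption of the illustration rather than a statement about our actual routines; the function name `empirical_risk` is ours.

```python
def empirical_risk(null_stats, alt_stats):
    """Estimate min over cutoffs c of P0(T >= c) + P1(T < c) from simulated statistics."""
    n0, n1 = len(null_stats), len(alt_stats)
    # Only observed values (and "never reject") can change the error counts.
    cutoffs = sorted(set(null_stats) | set(alt_stats))
    cutoffs.append(float("inf"))
    best = 1.0  # always rejecting gives type I error 1 and type II error 0
    for c in cutoffs:
        type1 = sum(s >= c for s in null_stats) / n0
        type2 = sum(s < c for s in alt_stats) / n1
        best = min(best, type1 + type2)
    return best
```

A risk of 0 means the two samples are perfectly separated by some cutoff, and a risk of 1 means no cutoff does better than a trivial test.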

We first ran some experiments to quickly assess the power of the scan test. We found that the test agrees very well with the theory (i.e., Lemma 3), which we already knew from previous experience. Specifically, we assumed a normal location model and simulated 100 realizations of the null and each of the three alternatives with θ{j/:j=1,3,5,7,9}\theta\in\{j/\ell\colon\ j=1,3,5,7,9\} (see Figure 2).

Figure 2: The risk of the scan test against each of the three alternatives. The xx-axis is θ\theta, and the yy-axis is the estimated risk based on 100 replicates.
Figure 3: The risk of the LOC test against each of the three alternatives. The xx-axis is the percolation probability qq on the anomalous cluster, and the yy-axis is the estimated risk based on 1000 replicates. Each curve corresponds to a different percolation probability pp.

Next, we performed some larger experiments to assess the power of the LOC test. We simply assumed a site percolation model with probability p{0.05,0.10,,0.90,0.95}p\in\{0.05,0.10,\ldots,0.90,0.95\}. Note that pcp_{c} is not known for site percolation in the square lattice, although pc0.593p_{c}\approx 0.593 from extensive numerical experiments (Feng, Deng and Blöte [19]). We simulated the null and each of the three alternatives with q{0.05,0.10,,0.90,0.95},q>p,q\in\{0.05,0.10,\ldots,0.90,0.95\},q>p, within the anomalous cluster. We replicated each situation 1000 times. The risk curves are shown in Figure 3. The test seems to behave similarly above and below criticality. At near-criticality, the test is rather erratic. However, when the size of the anomalous cluster is large enough, =100\ell=100, the risk curve is steepest just under pcp_{c}, at p=0.55p=0.55 in our experiments, with full power against q0.65q\geq 0.65. Figure 4 shows boxplots of the test statistic for the case where =100\ell=100 and p=0.40p=0.40 (subcritical), p=0.55p=0.55 (near-critical), and p=0.70p=0.70 (supercritical).

If we were to use this test in the context of a normal location model, then the correspondence would be t=Φ¯1(p)t=\bar{\Phi}^{-1}(p) (the threshold) and θ=tΦ¯1(q)\theta=t-\bar{\Phi}^{-1}(q), where Φ¯\bar{\Phi} denotes the normal survival distribution function. Figure 5 plots the risk curves in this context for p{0.40,0.50,0.55,0.60,0.70}p\in\{0.40,0.50,0.55,0.60,0.70\}. In particular, the test at near-criticality with t=Φ¯1(0.55)=0.126t=\bar{\Phi}^{-1}(0.55)=-0.126 has full power against the alternative with =100\ell=100 and θ=0.26\theta=0.26.
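The correspondence between $(p,q)$ and $(t,\theta)$ can be evaluated numerically. The following minimal sketch relies on Python's standard `statistics.NormalDist`; the helper names `survival_inverse` and `threshold_and_shift` are ours.

```python
from statistics import NormalDist


def survival_inverse(p):
    """Inverse of the standard normal survival function: the t with P(Z > t) = p."""
    return NormalDist().inv_cdf(1.0 - p)


def threshold_and_shift(p, q):
    """Threshold t and location shift theta giving open probabilities p (null) and q (cluster)."""
    t = survival_inverse(p)
    theta = t - survival_inverse(q)
    return t, theta


t, theta = threshold_and_shift(0.55, 0.65)
print(round(t, 3), round(theta, 2))  # -0.126 0.26
```

The pair $(p,q)=(0.55,0.65)$ reproduces the threshold and signal amplitude quoted above for the near-critical test.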

Figure 4: The size of the largest open cluster in $\log_{10}$ scale ($y$-axis) versus the percolation probability $q$, for the alternative $\ell=100$ and $p\in\{0.40,0.55,0.70\}$ (from left to right). Each boxplot represents 1000 replicates.
Figure 5: The risk of the LOC test in the context of a normal location model. The xx-axis is θ\theta, and the yy-axis is the estimated risk based on 1000 replicates. Each curve corresponds to a different threshold tt. The solid (—), dashed (- -), dotted (\cdots), dot-dashed (-\cdot-) and long-dashed (– –) curves correspond to p=0.40,0.50,0.55,0.60 and 0.70p=0.40,0.50,0.55,0.60\mbox{ and }0.70, respectively.

Finally, we experimented with the 𝑈𝐿𝑆\operatorname{ULS} scan test. To limit the size of our simulations, we considered alternatives with θ=Φ1(q)\theta=\Phi^{-1}(q) with q{0.55,0.6,0.65,0.70,0.80,0.90}q\in\{0.55,0.6,0.65,0.70,0.80,0.90\} and chose t=Φ1(p)t=\Phi^{-1}(p) with p{0.40,0.50,0.55,0.60,0.70}p\in\{0.40,0.50,0.55,0.60,0.70\} as thresholds. We restricted scanning to open clusters of size not smaller than 1/101/10 of the size of largest open cluster, essentially falling in the regime of Part 2 of Lemma 5, and also making the computation much faster. We used 200 replicates. We again see that the risk curve is sharpest near criticality when the size of the anomalous cluster is sufficiently large, here for 50\ell\geq 50. Compared with the LOC test, the 𝑈𝐿𝑆\operatorname{ULS} scan test has more power at large θ\theta when the cluster is small =10\ell=10 (as predicted) and, more interestingly, slightly more power when the cluster is larger. Compared with the scan statistic, which knows the size and shape of the anomalous cluster, the 𝑈𝐿𝑆\operatorname{ULS} scan test with the best choice of threshold (corresponding to p=0.55p=0.55) requires approximately threefold greater signal amplitude.

Figure 6: The risk of the 𝑈𝐿𝑆\operatorname{ULS} scan test against each of the three alternatives. On the xx-axis is θ\theta, and on the yy-axis is the estimated risk based on 200 replicates. Each curve corresponds to a different threshold tt. The solid (—), dashed (- -), dotted (\cdots), dot-dashed (-\cdot-) and long-dashed (– –) curves correspond to p=0.40,0.50,0.55,0.60 and 0.70p=0.40,0.50,0.55,0.60\mbox{ and }0.70, respectively.

7 Discussion

The contribution of this paper is a rigorous mathematical analysis of the performance of the LOC test independent of, and more extensively than Davies, Langovoy and Wittich [14] and Langovoy and Wittich [34], and of the 𝑈𝐿𝑆\operatorname{ULS} scan test, both nonparametric and computationally tractable methods. We made abundant use of percolation theory to establish these results. We compared the power of these tests with that of the scan statistic, which is known to be near-optimal in a wide array of settings. Although these tests are comparable in power with the scan statistic for the detection of a path, they may be substantially less powerful for the detection of a hypercube. Note, however, that the scan statistic is provided with knowledge about the shape and size of the anomalous cluster. In theory, we argued that this was the case based on some heuristics and conjectures from percolation theory. Numerically, this appears to be the case when the anomalous cluster is large enough. In our experiments, the 𝑈𝐿𝑆\operatorname{ULS} scan test was slightly more powerful than the LOC test, and required a θ\theta three to four times larger than the scan statistic, which has the advantage of knowing the shape and size of the cluster. This result is promising, and further numerical experiments are needed to evaluate the power of these tests in truly nonparametric settings, because they do not require previous information about cluster shape, and are computationally more feasible in general.

Our theoretical results generalize to other networks that resemble the lattice, with a different critical percolation probability $p_{c}$ and different functions $\zeta_{p}$ and $\delta_{p}$. In particular, we used the self-similarity property of the square lattice and the fact that it has polynomial growth. Our results also generalize to other cluster classes; in the setting of the square lattice, they extend immediately to any class of clusters that includes a hypercube of comparable size (e.g., the class $\mathcal{K}_{m}$ of clusters $K$ of size $|K|=[m^{\alpha}]^{d}$), such that there is a hypercube $K_{0}\subset K$ with $|K_{0}|/|K|\geq\omega_{m}$, where $\omega_{m}\to 0$ more slowly than any negative power of $m$. In addition, the class might contain clusters of different sizes, although in that case the worst-case risk would be driven by the smallest clusters. Implementation of the scan statistic may be much more demanding in this case. The main results of Section 4 require only that $F_{\theta}(t)$ be twice differentiable in $(t,\theta)$, with $\partial_{\theta}F_{\theta}(t)<0$ for all $(t,\theta)$, which is the case, for example, for location models and scale models if $F_{0}$ is twice differentiable with a strictly positive first derivative. With some additional work, we can also obtain results for classes of “thin” clusters as defined in Arias-Castro, Candès and Durand [1]. The key is to understand the percolation behavior within and near such clusters. Some results are available for slabs (Grimmett [21], Theorem 7.2) and more general subgraphs of lattices including “wedges,” and these appear to be transferable to other “curved” slabs.

Appendix A: Proofs

Notation

We write $f_{m}\sim g_{m}$ as $m\to\infty$ if $f_{m}/g_{m}\to 1$. Similarly, we use $\mathrm{O}(\cdot)$ and $\mathrm{o}(\cdot)$ and write $f_{m}\asymp g_{m}$ as $m\to\infty$ if $f_{m}=\mathrm{O}(g_{m})$ and vice versa. We also use their random counterparts, $\sim_{\mathrm{P}}$, $\asymp_{\mathrm{P}}$, $\mathrm{O}_{\mathrm{P}}(\cdot)$, and $\mathrm{o}_{\mathrm{P}}(\cdot)$. For example, $Z_{m}=\mathrm{o}_{\mathrm{P}}(k_{m})$ means that $Z_{m}/k_{m}\to 0$ in probability, and $Z_{m}=\mathrm{O}_{\mathrm{P}}(k_{m})$ means that $Z_{m}/k_{m}$ is bounded in probability, which is to say that $\mathbb{P}(|Z_{m}|\geq k_{m}l_{m})\to 0$ as $m\to\infty$ for any $l_{m}$ satisfying $l_{m}\to\infty$. We use $1\{A\}$ to denote the indicator function of the set $A$. The maximum of $k$ and $\ell$ is denoted by $k\vee\ell$.

A.1 On the size of percolation clusters

Here we state and prove some results on the sizes of percolation clusters in d\mathbb{Z}^{d}. We start by proving some properties of ζp\zeta_{p}. Recall that SS denotes the size of the open cluster at the origin. Besides the limit in (3), the following bound holds for p<pcp<p_{c} and all k1k\geq 1:

\[
\mathbb{P}_p(S \geq k) \leq (1-p)^2 \frac{k\,\mathrm{e}^{-k\zeta_p}}{(1-\mathrm{e}^{-\zeta_p})^2}, \tag{1}
\]

by Grimmett [21], Equation (6.80), adapted to site percolation.
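As a sanity check on the shape of this bound, consider the one-dimensional lattice, where everything is explicit: the open cluster at the origin is an interval, so $\mathbb{P}_p(S = k) = k p^k (1-p)^2$, and $\zeta_p = \log(1/p)$ (the one-dimensional value used in the proof of Theorem 2). The sketch below is only an illustration under that assumption; $d = 1$ is not the setting of the lemmas that follow, and the function names are ours. It compares the exact tail with the right-hand side of the bound, which in this case reduces to $k p^k$.

```python
import math

def tail_exact_1d(p: float, k: int) -> float:
    """P_p(S >= k) for site percolation on Z (d = 1), in closed form.

    Obtained by summing P_p(S = j) = j * p**j * (1 - p)**2 over j >= k.
    """
    return k * p**k - (k - 1) * p**(k + 1)

def tail_series_1d(p: float, k: int, terms: int = 5000) -> float:
    """Same tail by truncated direct summation (used only as a cross-check)."""
    return sum(j * p**j * (1 - p)**2 for j in range(k, k + terms))

def tail_bound(p: float, k: int, zeta: float) -> float:
    """Right-hand side of the bound: (1-p)^2 k e^{-k zeta} / (1 - e^{-zeta})^2."""
    return (1 - p)**2 * k * math.exp(-k * zeta) / (1 - math.exp(-zeta))**2

if __name__ == "__main__":
    p = 0.3
    zeta = math.log(1 / p)  # decay rate in dimension one (assumption of this sketch)
    for k in (1, 2, 5, 10, 20):
        print(k, tail_exact_1d(p, k), tail_bound(p, k, zeta))
```

In this toy case the bound is attained at $k = 1$ and is loose by the factor $k/(k-(k-1)p)$ otherwise, which is consistent with the exponential rate $\zeta_p$ being the only quantity that matters in the arguments below.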

Lemma 8

The function ζp\zeta_{p} defined in (3) is continuous and strictly decreasing over (0,pc](0,p_{c}], and satisfies limp0ζp=\lim_{p\to 0}\zeta_{p}=\infty and limppcζp=0\lim_{p\to p_{c}}\zeta_{p}=0.

Proof.

Let 0p<p10\leq p<p^{\prime}\leq 1. By coupling p\mathbb{P}_{p} and p\mathbb{P}_{p^{\prime}} in the usual way,

p(S=k)(p/p)kp(S=k),\mathbb{P}_{p}(S=k)\geq(p/p^{\prime})^{k}\mathbb{P}_{p^{\prime}}(S=k),

so that ζpζp+log(p/p)\zeta_{p}\leq\zeta_{p^{\prime}}+\log(p^{\prime}/p). Applying Grimmett [21], Theorem 2.38, to the event {Sk}\{S\geq k\}, we find that, as in the proof of Grimmett [21], Equation (6.16), ζp/logpζp/logp\zeta_{p}/\log p\leq\zeta_{p^{\prime}}/\log p^{\prime}. In summary,

ζp(1log(1/p)log(1/p))ζpζplog(p/p).\zeta_{p}\biggl{(}1-\frac{\log(1/p^{\prime})}{\log(1/p)}\biggr{)}\leq\zeta_{p}-\zeta_{p^{\prime}}\leq\log(p^{\prime}/p). (2)

Therefore, ζp\zeta_{p} is continuous and strictly decreasing on (0,pc)(0,p_{c}). Moreover, by fixing p(0,pc)p^{\prime}\in(0,p_{c}) and letting p0p\to 0, we have

ζpζplog(1/p)log(1/p).\zeta_{p}\geq\zeta_{p^{\prime}}\frac{\log(1/p)}{\log(1/p^{\prime})}\to\infty.

Finally, by Grimmett [21], Equations (6.83), (6.56), ζp0=ζpc\zeta_{p}\to 0=\zeta_{p_{c}} as ppcp\uparrow p_{c}. ∎

Next, we prove (4). We do this by standard means, and the claim may be strengthened (see also Grimmett [22]; Hofstad and Redig [52]).

Lemma 9

Consider site percolation on d\mathbb{Z}^{d} with parameter p<pcp<p_{c}, and let SmS_{m} denote the size of the largest open cluster within 𝕍m\mathbb{V}_{m}. Then (4) holds, namely

Smlogmdζp,in probability.\frac{S_{m}}{\log m}\to\frac{d}{\zeta_{p}},\qquad\mbox{in probability}.
Proof.

Fix 0<ε<1/20<\varepsilon<1/2. Let SvS^{v} be the size of the open cluster at a node vdv\in\mathbb{Z}^{d}, which has the same distribution as SS. We start with the upper bound. By the union bound,

(Smk)v𝕍m(Svk)=|𝕍m|(Sk).\mathbb{P}(S_{m}\geq k)\leq\sum_{v\in\mathbb{V}_{m}}\mathbb{P}(S^{v}\geq k)=|\mathbb{V}_{m}|\cdot\mathbb{P}(S\geq k). (3)

Thus, using (3), for km(ε):=(1+ε)(d/ζp)logmk_{m}(\varepsilon):=(1+\varepsilon)(d/\zeta_{p})\log m and mm large enough,

(Smkm(ε))mdexp((1ε/2)ζpkm(ε))mεd/4,\mathbb{P}\bigl{(}S_{m}\geq k_{m}(\varepsilon)\bigr{)}\leq m^{d}\exp\bigl{(}-(1-\varepsilon/2)\zeta_{p}k_{m}(\varepsilon)\bigr{)}\leq m^{-\varepsilon d/4},

and the term on the right-hand side converges to 0.

For the lower bound, consider N=md/(logm)2dN=\lceil m^{d}/(\log m)^{2d}\rceil nodes v1,,vN𝕍mv_{1},\ldots,v_{N}\in\mathbb{V}_{m} separated from each other and the boundary of 𝕍m\mathbb{V}_{m} by at least 12(logm)2\frac{1}{2}(\log m)^{2}. Let km(ε):=(1ε)(d/ζp)logmk_{m}(\varepsilon):=(1-\varepsilon)(d/\zeta_{p})\log m. For sufficiently large mm, the events Ei:={|Svi|km(ε)}E_{i}:=\{|S^{v_{i}}|\leq k_{m}(\varepsilon)\} are independent. Therefore, using (3), for large mm,

\[
\begin{aligned}
\mathbb{P}\bigl(S_m \leq k_m(\varepsilon)\bigr)
&\leq \bigl(1 - \mathbb{P}\bigl(S \geq k_m(\varepsilon)\bigr)\bigr)^{N} \\
&\leq \bigl(1 - \exp\bigl(-(1+\varepsilon/2)\zeta_p k_m(\varepsilon)\bigr)\bigr)^{N} \\
&\leq \exp\bigl(-m^{\varepsilon d/2}/(\log m)^{2d}\bigr),
\end{aligned}
\]

and the last term on the right-hand side tends to 0 as mm\to\infty. ∎
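The logarithmic growth of $S_m$ can be observed numerically. The following is a minimal sketch, assuming $d = 2$, nearest-neighbour connectivity, and a parameter $p = 0.3$ that lies well below the numerically estimated site-percolation threshold of the square lattice (about 0.59); the function names and the choice of seed are ours. It reports $S_m/\log m$ for a few box sizes, without attempting to compute $\zeta_p$.

```python
import math
import random
from collections import deque

def largest_open_cluster(open_sites):
    """Size of the largest connected component of open sites in an m x m grid.

    open_sites is a list of lists of truthy/falsy values; neighbours are the
    four nearest-neighbour sites of the square lattice (d = 2).
    """
    m = len(open_sites)
    seen = [[False] * m for _ in range(m)]
    best = 0
    for i in range(m):
        for j in range(m):
            if not open_sites[i][j] or seen[i][j]:
                continue
            # Breadth-first search over the open cluster containing (i, j).
            size = 0
            queue = deque([(i, j)])
            seen[i][j] = True
            while queue:
                a, b = queue.popleft()
                size += 1
                for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    x, y = a + da, b + db
                    if 0 <= x < m and 0 <= y < m and open_sites[x][y] and not seen[x][y]:
                        seen[x][y] = True
                        queue.append((x, y))
            best = max(best, size)
    return best

def simulate_ratio(m, p, seed=0):
    """Return S_m / log(m) for one site-percolation sample with parameter p."""
    rng = random.Random(seed)
    grid = [[rng.random() < p for _ in range(m)] for _ in range(m)]
    return largest_open_cluster(grid) / math.log(m)

if __name__ == "__main__":
    for m in (50, 100, 200, 400):
        print(m, round(simulate_ratio(m, p=0.3), 3))
```

A single sample per box size is noisy; averaging over several seeds gives a steadier picture of the ratio.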

The following result describes the behavior of the size of the open cluster at the origin when $p$ is small. It may be made more precise, but we do not pursue this here.

Lemma 10

There exists c>0c>0 depending only on dd such that, for p(0,(2c)1)p\in(0,(2c)^{-1}),

\[
p^k \leq \mathbb{P}_p(S \geq k) \leq 2(cp)^k \qquad \forall k \geq 1.
\]
Proof.

An animal is a connected subgraph of $\mathbb{Z}^d$ containing the origin. The lower bound comes from considering the probability that any given animal of size $k$ is open. For the upper bound, by the union bound, we have $\mathbb{P}_p(S = k) \leq |\mathcal{A}_k| p^k$, where $\mathcal{A}_k$ is the set of animals with $k$ vertices. There is a constant $c > 0$ such that $|\mathcal{A}_k| \leq c^k$, so that

\[
\mathbb{P}_p(S \geq k) \leq \sum_{\ell \geq k} c^{\ell} p^{\ell} = \frac{(cp)^k}{1 - cp} \leq 2(cp)^k,
\]

when $cp < \frac{1}{2}$, because then $1/(1 - cp) < 2$. ∎

We next present a result on the number of open clusters of a given size that is valid for all p(0,1)p\in(0,1).

Lemma 11

Consider site percolation on d\mathbb{Z}^{d} with parameter pp, and let Nm(k)N_{m}(k) denote the number of open clusters of size kk within 𝕍m\mathbb{V}_{m}. Then, for k1k\geq 1,

\[
\frac{(m-2k)^d}{k}\,\mathbb{P}(S = k) \leq \mathbb{E}(N_m(k)) \leq \frac{m^d}{k}\,\mathbb{P}(\infty > S \geq k).
\]

In addition, for k,1k,\ell\geq 1,

|𝐶𝑜𝑣(Nm(k),Nm())|3d+1(k+)d𝔼(Nm(k)).\bigl{|}\operatorname{Cov}(N_{m}(k),N_{m}(\ell))\bigr{|}\leq 3^{d+1}(k+\ell)^{d}\mathbb{E}(N_{m}(k\vee\ell)).

Thus, for k1k\geq 1,

𝑉𝑎𝑟(Nm(k))6d+1kd𝔼(Nm(k)).\operatorname{Var}(N_{m}(k))\leq 6^{d+1}k^{d}\mathbb{E}(N_{m}(k)).
Proof.

Let SmvS^{v}_{m} be the size of the open cluster at vv within the box 𝕍m\mathbb{V}_{m}. Then

Nm(k)=v𝕍mXv(k),N_{m}(k)=\sum_{v\in\mathbb{V}_{m}}X^{v}(k), (5)

where Xv(k)=k11{Smv=k}X^{v}(k)=k^{-1}1\{S^{v}_{m}=k\}. We immediately have

𝔼(Nm(k))v𝕍m1k(>Svk)=|𝕍m|k(>Sk).\mathbb{E}(N_{m}(k))\leq\sum_{v\in\mathbb{V}_{m}}\frac{1}{k}\mathbb{P}(\infty>S^{v}\geq k)=\frac{|\mathbb{V}_{m}|}{k}\mathbb{P}(\infty>S\geq k).

For the lower bound, we count only nodes away from the boundary, obtaining

𝔼(Nm(k))|𝕍m(k)|1k(S=k),\mathbb{E}(N_{m}(k))\geq|\mathbb{V}_{m}(k)|\frac{1}{k}\mathbb{P}(S=k),

where 𝕍m(k):={k,,mk}d\mathbb{V}_{m}(k):=\{k,\ldots,m-k\}^{d}.

We turn now to the covariances. By (5),

\[
\begin{aligned}
\operatorname{Cov}(N_m(k), N_m(\ell))
&= \sum_{v,w \in \mathbb{V}_m} \operatorname{Cov}(X^v(k), X^w(\ell)) \\
&= \sum_{\substack{v,w \in \mathbb{V}_m \\ \|v-w\| \leq k+\ell}} \operatorname{Cov}(X^v(k), X^w(\ell)),
\end{aligned}
\]

because $X^v(k)$ and $X^w(\ell)$ are independent if $\|v-w\| > k+\ell$, where $\|\cdot\|$ denotes the $\ell^{\infty}$-norm. Now,

\[
\begin{aligned}
\bigl|\operatorname{Cov}(X^v(k), X^w(\ell))\bigr|
&= \bigl|\mathbb{E}\bigl(X^w(\ell) \mid X^v(k) = k^{-1}\bigr) - \mathbb{E}(X^w(\ell))\bigr|\,\mathbb{E}(X^v(k)) \\
&\leq \frac{1}{\ell}\,\mathbb{E}(X^v(k)),
\end{aligned}
\]

so that

|𝐶𝑜𝑣(Nm(k),Nm())|1(2k+2+1)d𝔼(Nm(k)),\bigl{|}\operatorname{Cov}(N_{m}(k),N_{m}(\ell))\bigr{|}\leq\frac{1}{\ell}(2k+2\ell+1)^{d}\mathbb{E}(N_{m}(k)),

and the second claim of the lemma follows. ∎
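The representation (5), which writes the number of clusters of size $k$ as a sum over sites of $k^{-1} 1\{S^v_m = k\}$, can be checked directly on a small configuration. The sketch below is illustrative only: it assumes $d = 2$ with nearest-neighbour connectivity, and the helper names are ours. Exact rational arithmetic is used so that the two counts can be compared with equality.

```python
from fractions import Fraction

def open_clusters(open_sites):
    """Return the open clusters of an m x m grid as a list of sets of sites."""
    m = len(open_sites)
    seen, clusters = set(), []
    for i in range(m):
        for j in range(m):
            if not open_sites[i][j] or (i, j) in seen:
                continue
            # Depth-first search collecting the cluster of (i, j).
            stack, cluster = [(i, j)], {(i, j)}
            while stack:
                a, b = stack.pop()
                for x, y in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                    if 0 <= x < m and 0 <= y < m and open_sites[x][y] and (x, y) not in cluster:
                        cluster.add((x, y))
                        stack.append((x, y))
            seen |= cluster
            clusters.append(cluster)
    return clusters

def count_direct(clusters, k):
    """N_m(k): number of clusters with exactly k sites."""
    return sum(1 for c in clusters if len(c) == k)

def count_via_sites(clusters, k):
    """Sum over sites v of (1/k) * 1{S_m^v = k}, as in (5)."""
    size_at = {v: len(c) for c in clusters for v in c}
    return sum(Fraction(1, k) for s in size_at.values() if s == k)

if __name__ == "__main__":
    grid = [
        [1, 1, 0, 1],
        [0, 0, 0, 1],
        [1, 0, 1, 0],
        [1, 0, 0, 0],
    ]
    clusters = open_clusters(grid)
    for k in (1, 2, 3):
        print(k, count_direct(clusters, k), count_via_sites(clusters, k))
```

Each cluster of size $k$ contributes $k$ sites, each weighted by $1/k$, which is why the two counts agree.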

We now describe some properties of the open clusters within 𝕍m\mathbb{V}_{m} in the supercritical regime. In this regime, it is known that, with probability 1, there is a unique infinite open cluster in d\mathbb{Z}^{d}, denoted by QQ_{\infty} (see, e.g., Grimmett [21], Section 8.2). With high probability, the largest open cluster within 𝕍m\mathbb{V}_{m} is a subgraph of this infinite open cluster. Next, we present some additional information on its size, SmS_{m}.

Lemma 12

Suppose that p>pcp>p_{c}. There is a constant C>0C>0 such that, with probability at least 1exp(Cmd1)1-\exp(-Cm^{d-1}), there is a unique largest open cluster within 𝕍m\mathbb{V}_{m}, and it is a subgraph of QQ_{\infty}. Moreover, as mm\to\infty, its size SmS_{m} satisfies

Sm𝔼(Sm)𝑉𝑎𝑟(Sm)𝒩(0,1),in distribution,\frac{S_{m}-\mathbb{E}(S_{m})}{\sqrt{\operatorname{Var}(S_{m})}}\to\mathcal{N}(0,1),\qquad\mbox{in distribution},

with 𝔼(Sm)Θp|𝕍m|\mathbb{E}(S_{m})\sim\Theta_{p}|\mathbb{V}_{m}| and 𝑉𝑎𝑟(Sm)σ2|𝕍m|\operatorname{Var}(S_{m})\sim\sigma^{2}|\mathbb{V}_{m}| for some σ2>0\sigma^{2}>0 depending on (d,p)(d,p).

Proof.

For the first part and the limiting behavior of 𝔼(Sm)\mathbb{E}(S_{m}) as mm\to\infty, see the discussion of Penrose and Pisztora [44], Theorems 4 and 6, and the beginning of this Appendix. For the weak limit and the limit size of the variance of SmS_{m}, see, for example, Penrose [43], Theorem 3.2. ∎

We next describe some properties of the smaller open clusters. Let Sm(2)S_{m}^{(2)} be the size of the largest open cluster of d\mathbb{Z}^{d} that is contained entirely within 𝕍m\mathbb{V}_{m}.

Lemma 13

Suppose that p>pcp>p_{c}. There exists a positive constant δp\delta_{p} such that

Sm(2)(logm)d/(d1)(dδp)d/(d1),in probability.\frac{S_{m}^{(2)}}{(\log m)^{d/(d-1)}}\to\biggl{(}\frac{d}{\delta_{p}}\biggr{)}^{d/(d-1)},\qquad\mbox{in probability}.

For any $c > 0$, there exist constants $\sigma_i = \sigma_i(p,c) > 0$, $i = 1, 2$, such that the following holds: with probability tending to 1, there exist at least $\sigma_1 m^d \exp[-\sigma_2 (\log m)^{(d-1)/d}]$ open clusters of size $[c \log m]$ of $\mathbb{Z}^d$ lying within $\mathbb{V}_m$.

Our results on exact asymptotics in the supercritical phase concern $\mathbb{V}_m$ with toroidal boundary conditions. One effect of removing the boundary from $\mathbb{V}_m$ is that the asymptotics of the largest cluster coincide with those of $S_m$, and likewise those of the second-largest cluster coincide with those of $S_m^{(2)}$. In the proof of Theorem 7, we need an upper bound on the size of the second-largest cluster inside a box with “free” boundary conditions. We do not explore this in detail here, because it relies on extensions of arguments of Kesten and Zhang [28] (see also Grimmett [21], Proof of Theorem 8.65), which have not yet been fully explored in the literature. Instead, we note that the second-largest open cluster in a supercritical percolation model on $\mathbb{V}_m$ with free boundary conditions has size of order $\mathrm{O}_{\mathrm{P}}((\log m)^{d/(d-1)})$.

Proof of Lemma 13.

It was proven by Cerf [8] that the limit

δp:=limkk(d1)/dlog(S=k)\delta_{p}:=-\lim_{k\to\infty}k^{-(d-1)/d}\log\mathbb{P}(S=k) (6)

exists and is strictly positive and finite when pc<p<1p_{c}<p<1. It is elementary that δp\delta_{p} thus defined is equal to that of (8) (see also Grimmett [21], Section 8.6). The first part of the lemma follows by the same proof as used in Lemma 9.

As in the proof of Lemma 11, the mean number μm\mu_{m} of clusters of size k:=[clogm]k:=[c\log m] satisfies

mdclogmexp(δ1(clogm)(d1)/d)μmmd[clogm]exp(δ2(clogm)(d1)/d)\frac{m^{d}}{c\log m}\exp\bigl{(}-\delta^{1}(c\log m)^{(d-1)/d}\bigr{)}\leq\mu_{m}\leq\frac{m^{d}}{[c\log m]}\exp\bigl{(}-\delta^{2}(c\log m)^{(d-1)/d}\bigr{)}

for positive constants δi\delta^{i}. The number of such clusters has variance no larger than CkdμmCk^{d}\mu_{m} for some C<C<\infty. The claim follows by Chebyshev’s inequality.

A.2 Some distributional properties

Here we present some results for 𝐴𝐸𝑃\operatorname{AEP} and exponential families of distributions. Our first result is on the size of the maximum of an i.i.d. sample from an 𝐴𝐸𝑃\operatorname{AEP} distribution.

Lemma 14

Let F𝐴𝐸𝑃(b,C)F\in\operatorname{AEP}(b,C) for some b>0b>0 and C>0C>0. Then, for X1,,Xni.i.d.FX_{1},\ldots,X_{n}\stackrel{{\scriptstyle\mathrm{i.i.d.}}}{{\sim}}F,

max(X1,,Xn)(logn)1/bC1/b,in probability.\frac{\max(X_{1},\ldots,X_{n})}{(\log n)^{1/b}}\to C^{-1/b},\qquad\mbox{in probability}.
Proof.

Fix ε(0,1)\varepsilon\in(0,1) and define xn(ε)=((1ε)(logn)/C)1/bx_{n}(\varepsilon)=((1-\varepsilon)(\log n)/C)^{1/b}. For nn large enough, we have, by independence,

\[
\begin{aligned}
\mathbb{P}\bigl(\max(X_1,\ldots,X_n) \leq x_n(\varepsilon)\bigr)
&\leq \bigl(1 - \bar{F}(x_n(\varepsilon))\bigr)^n \\
&\leq \bigl(1 - \exp\bigl(-(1+\varepsilon) C x_n(\varepsilon)^b\bigr)\bigr)^n \\
&\leq \exp(-n^{\varepsilon^2}) \to 0.
\end{aligned}
\]

Now redefine xn(ε)=((1+ε)(logn)/C)1/bx_{n}(\varepsilon)=((1+\varepsilon)(\log n)/C)^{1/b}. For nn large enough, we have, by the union bound,

\[
\begin{aligned}
\mathbb{P}\bigl(\max(X_1,\ldots,X_n) \geq x_n(\varepsilon)\bigr)
&\leq n \bar{F}(x_n(\varepsilon)) \\
&\leq n \exp\bigl(-(1-\varepsilon/3) C x_n(\varepsilon)^b\bigr) \\
&\leq n^{-\varepsilon/3} \to 0.
\end{aligned}
\]
∎
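For a concrete instance, take the tail $\bar{F}(x) = \exp(-C x^b)$ for $x \geq 0$, which satisfies $\log \bar{F}(x) \sim -C x^b$. The median of the maximum is then available in closed form, so the normalisation in the lemma can be checked without simulation. The sketch below is an illustration under that specific choice of $F$ (our assumption, as are the function names); it shows how slowly the ratio to $((\log n)/C)^{1/b}$ approaches 1.

```python
import math

def median_of_max(n: int, b: float, C: float) -> float:
    """Median of max(X_1..X_n) when P(X > x) = exp(-C x^b) for x >= 0.

    Solves (1 - exp(-C x^b))^n = 1/2 exactly; expm1 avoids cancellation
    in 1 - 2^{-1/n} when n is large.
    """
    q = -math.expm1(-math.log(2.0) / n)  # q = 1 - 2^{-1/n}
    return (-math.log(q) / C) ** (1.0 / b)

def normalised_median(n: int, b: float, C: float) -> float:
    """median_of_max divided by the predicted scale ((log n) / C)^{1/b}."""
    return median_of_max(n, b, C) / (math.log(n) / C) ** (1.0 / b)

if __name__ == "__main__":
    for exponent in (2, 4, 8, 12):
        n = 10 ** exponent
        print(n, normalised_median(n, b=1.0, C=1.0), normalised_median(n, b=2.0, C=0.5))
```

The ratio exceeds 1 by a term of order $1/\log n$, which matches the fact that the lemma gives only a first-order limit.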

We next describe the behavior at infinity of the logarithmic moment-generating function and rate function of an 𝐴𝐸𝑃\operatorname{AEP} distribution.

Lemma 15

Let F𝐴𝐸𝑃(b,C)F\in\operatorname{AEP}(b,C) for some b1b\geq 1 and C>0C>0, with logarithmic moment-generating function Λ\Lambda and rate function Λ\Lambda^{*}. Then, as θ\theta\to\infty,

\begin{align}
\theta^{-b/(b-1)} \Lambda(\theta) &\to C(b-1)(Cb)^{-b/(b-1)}, \qquad b > 1; \tag{7} \\
\bigl(\log\bigl(1/(C-\theta)\bigr)\bigr)^{-1} \Lambda(\theta) &\to 1, \qquad b = 1; \tag{8}
\end{align}

and, as xx\to\infty,

xbΛ(x)C.x^{-b}\Lambda^{*}(x)\to C. (9)
Proof.

Let φ\varphi be the moment-generating function of FF. We focus on the upper bound in (7) – obtaining the bound in (8) is analogous – and deduce the lower bound in (9). Let b>1b>1, C/2<A<CC/2<A<C, and let x1>0x_{1}>0 be such that F¯(x)exp(Axb)\bar{F}(x)\leq\exp(-Ax^{b}) for all x>x1x>x_{1}. We start from the following bound:

φ(θ)=θexp(θx)F¯(x)dxexp(θx1)+x1θexp(θxAxb)dx.\varphi(\theta)=\int_{-\infty}^{\infty}\theta\exp(\theta x)\bar{F}(x)\,\mathrm{d}x\leq\exp(\theta x_{1})+\int_{x_{1}}^{\infty}\theta\exp(\theta x-Ax^{b})\,\mathrm{d}x.

We again divide the integral into xx2x\leq x_{2} and x>x2x>x_{2}, where x2:=(2θ/A)1/(b1)x_{2}:=(2\theta/A)^{1/(b-1)}. For xx2x\leq x_{2}, we bound exp(θxAxb)\exp(\theta x-Ax^{b}) by its maximum over (0,)(0,\infty). For x>x2x>x_{2}, exp(θxAxb)exp((C/4)xb)\exp(\theta x-Ax^{b})\leq\exp(-(C/4)x^{b}). Letting B=A(b1)(Ab)b/(b1)B=A(b-1)(Ab)^{-b/(b-1)}, and assuming that θ\theta is large enough such that x2>x1x_{2}>x_{1}, we get

x1θexp(θxAxb)dx(x2x1)θexp(Bθb/(b1))+θx2exp((C/4)xb)dx.\int_{x_{1}}^{\infty}\theta\exp(\theta x-Ax^{b})\,\mathrm{d}x\leq(x_{2}-x_{1})\theta\exp\bigl{(}B\theta^{b/(b-1)}\bigr{)}+\theta\int_{x_{2}}^{\infty}\exp\bigl{(}-(C/4)x^{b}\bigr{)}\,\mathrm{d}x.

Thus, when θ\theta\to\infty,

φ(θ)=O(θb/(b1))exp(Bθb/(b1)).\varphi(\theta)=\mathrm{O}\bigl{(}\theta^{b/(b-1)}\bigr{)}\exp\bigl{(}B\theta^{b/(b-1)}\bigr{)}. (10)

Taking logs and letting θ\theta\to\infty, we get

lim supθθb/(b1)Λ(θ)A(b1)(Ab)b/(b1).\limsup_{\theta\to\infty}\theta^{-b/(b-1)}\Lambda(\theta)\leq A(b-1)(Ab)^{-b/(b-1)}.

Then letting AA tend to CC, we obtain the upper bound in (7).

Now, for xx exceeding the mean of FF, Λ(x)=supθ0(θxΛ(θ))\Lambda^{*}(x)=\sup_{\theta\geq 0}(\theta x-\Lambda(\theta)), and starting from (10), we obtain

Λ(x)supθ0(θxBθb/(b1))log2=Axblog2.\Lambda^{*}(x)\geq\sup_{\theta\geq 0}\bigl{(}\theta x-B\theta^{b/(b-1)}\bigr{)}-\log 2=Ax^{b}-\log 2.

Therefore,

\[
\liminf_{x \to \infty} x^{-b} \Lambda^{*}(x) \geq A.
\]

Then, letting AA tend to CC, we obtain the lower bound in (9). ∎
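The identity $\sup_{\theta \geq 0}(\theta x - B\theta^{b/(b-1)}) = A x^b$ used in the last step, with $B = A(b-1)(Ab)^{-b/(b-1)}$, is a Legendre-transform computation whose maximiser is $\theta^{*} = Ab\,x^{b-1}$. A quick numerical check is sketched below; the function names and the grid search are ours and serve only to confirm the algebra for a few parameter values.

```python
def B_const(A: float, b: float) -> float:
    """B = A (b - 1) (A b)^{-b/(b-1)}, as defined in the proof (b > 1)."""
    return A * (b - 1) * (A * b) ** (-b / (b - 1))

def objective(theta: float, x: float, A: float, b: float) -> float:
    """theta * x - B * theta^{b/(b-1)}."""
    return theta * x - B_const(A, b) * theta ** (b / (b - 1))

def sup_closed_form(x: float, A: float, b: float) -> float:
    """Value of the objective at the stationary point theta* = A b x^{b-1}."""
    theta_star = A * b * x ** (b - 1)
    return objective(theta_star, x, A, b)

def sup_grid(x: float, A: float, b: float, steps: int = 20000) -> float:
    """Crude maximum over an evenly spaced grid on [0, 2 theta*]."""
    theta_star = A * b * x ** (b - 1)
    return max(objective(2 * theta_star * i / steps, x, A, b) for i in range(steps + 1))

if __name__ == "__main__":
    for x, A, b in ((2.0, 0.5, 2.0), (1.5, 1.0, 3.0), (3.0, 0.8, 1.5)):
        print(x, A, b, sup_closed_form(x, A, b), sup_grid(x, A, b), A * x ** b)
```

Because the objective is concave in $\theta$ for $b > 1$, the stationary point is the global maximiser, so the grid value can only fall short of the closed form.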

We now define γ\gamma, first appearing in Section 5.1. Our function γ\gamma depends on certain quantities listed in the following lemma. It also depends on the quantity ζ\zeta, which we take as that defined in (3). It is only through its dependence on ζ\zeta that γ\gamma is affected by the geometry of 𝕍m\mathbb{V}_{m}.

Lemma 16

Consider a distribution FF on the real line, possibly discrete but not a point mass, with finite mean μ\mu and finite moment-generating function at some positive θ>0\theta>0, and let Λ\Lambda^{*} denote its rate function. Let νμ\nu\leq\mu, and fix β,ζ[0,)\beta,\zeta\in[0,\infty).

  1. Assume that $\zeta \neq 0$. If $0 < \beta < 1/\zeta$, or $\beta = 0$ and $F \in \operatorname{AEP}(b,C)$ for some $b \geq 2$ and $C > 0$, then there is a unique solution $\gamma = \gamma(F,\nu,\zeta,\beta)$ to the following equation:

    \[
    \inf_{\beta < s < 1/\zeta} \bigl[ s \Lambda^{*}\bigl(\nu + \sqrt{\gamma/s}\bigr) + s\zeta \bigr] = 1.
    \]

  2. Assume that $\zeta = 0$. The foregoing holds as long as $\nu = \mu$ (and with $1/\zeta$ interpreted as $\infty$).

Proof.

Let M=sup{x:Λ(x)<}M=\sup\{x\colon\ \Lambda^{*}(x)<\infty\}. Because FF is not a point mass, μ<M\mu<M\leq\infty. Define

G(s,γ)=sΛ(ν+γ/s)+sζ.G(s,\gamma)=s\Lambda^{*}\bigl{(}\nu+\sqrt{\gamma/s}\bigr{)}+s\zeta.

Note that $G(s,\gamma)$ is finite (resp., infinite) if $\gamma/s < (M-\nu)^2$ (resp., $\gamma/s > (M-\nu)^2$). In addition, $G(s,\gamma)$ and its derivatives are continuous wherever $G$ is finite, and thus are uniformly continuous on any compact subset of $[0,\infty)^2$ on which $G$ is finite. Furthermore, $G(s,\gamma)$ is strictly increasing in $\gamma$ on the interval $(0, s(M-\nu)^2)$. Let

Lβ(γ)=infβ<s<1/ζG(s,γ).L_{\beta}(\gamma)=\inf_{\beta<s<1/\zeta}G(s,\gamma). (11)

Thus Lβ(γ)L_{\beta}(\gamma) is finite if γζ<(Mν)2\gamma\zeta<(M-\nu)^{2}, and infinite when << is replaced by >>. Furthermore, for γ<(Mν)2/ζ\gamma<(M-\nu)^{2}/\zeta, the infimum is achieved at some value sγs_{\gamma} of ss in a neighborhood where G(s,γ)<G(s,\gamma)<\infty.

Assume first that β>0\beta>0. It may be seen that Lβ(γ)L_{\beta}(\gamma) is continuous and strictly increasing in γ\gamma on the interval [0,(Mν)2/ζ)[0,(M-\nu)^{2}/\zeta). Let 0γ<γ<(Mν)2/ζ0\leq\gamma<\gamma^{\prime}<(M-\nu)^{2}/\zeta. Then

0Lβ(γ)Lβ(γ)G(sγ,γ)G(sγ,γ),0\leq L_{\beta}(\gamma^{\prime})-L_{\beta}(\gamma)\leq G(s_{\gamma},\gamma^{\prime})-G(s_{\gamma},\gamma), (12)

and continuity follows from the properties of GG noted earlier. Similarly,

Lβ(γ)Lβ(γ)G(sγ,γ)G(sγ,γ)L_{\beta}(\gamma^{\prime})-L_{\beta}(\gamma)\geq G(s_{\gamma^{\prime}},\gamma^{\prime})-G(s_{\gamma^{\prime}},\gamma) (13)

and strict monotonicity follows similarly.

It suffices to prove that $L_\beta(\gamma)$ takes values $< 1$ and finite values $> 1$. The first claim follows from the fact that, with $\gamma = \beta(\mu-\nu)^2$,

Lβ(γ)G(β,γ)=βζ<1.L_{\beta}(\gamma)\leq G(\beta,\gamma)=\beta\zeta<1.

We now turn to the second claim, and make use of two general properties of rate functions that follow from Dembo and Zeitouni [15], Equation (2.2.10), Lemma 2.2.20. It is standard that Λ(μ+x)12(x/σ)2\Lambda^{*}(\mu+x)\sim\frac{1}{2}(x/\sigma)^{2} as x0x\downarrow 0, where σ2>0\sigma^{2}>0 is the variance of FF. Therefore,

T(0,M) such that Λ(μ+x)14(x/σ)2 when 0xT.\exists T\in(0,M)\mbox{ such that }\Lambda^{*}(\mu+x)\geq{\textstyle\frac{1}{4}}(x/\sigma)^{2}\mbox{ when }0\leq x\leq T. (14)

With TT thus chosen, by convexity,

A>0 such that Λ(μ+x)Ax when xT.\exists A>0\mbox{ such that }\Lambda^{*}(\mu+x)\geq Ax\mbox{ when }x\geq T. (15)

Assume first that ζ>0\zeta>0 and M=M=\infty. By (15), for sufficiently large γ\gamma,

>Lβ(γ)infβ<s<1/ζ[sA(νμ+γ/s)+sζ]A(β(νμ)+γβ)>1.\infty>L_{\beta}(\gamma)\geq\inf_{\beta<s<1/\zeta}\bigl{[}sA\bigl{(}\nu-\mu+\sqrt{\gamma/s}\bigr{)}+s\zeta\bigr{]}\geq A\bigl{(}\beta(\nu-\mu)+\sqrt{\gamma\beta}\bigr{)}>1.

Suppose next that ζ>0\zeta>0 and M<M<\infty. Let 0<γ<(Mν)2/ζ0<\gamma<(M-\nu)^{2}/\zeta. Because Λ(ν+γ/s)=\Lambda^{*}(\nu+\sqrt{\gamma/s})=\infty if s<γ/(Mν)2=:β0(γ)s<\gamma/(M-\nu)^{2}=:\beta_{0}(\gamma),

\[
\begin{aligned}
\infty &> L_\beta(\gamma) \geq \beta_0 \inf_{\beta_0 < s < 1/\zeta} \Lambda^{*}\bigl(\nu + \sqrt{\gamma/s}\bigr) + \beta_0 \zeta \\
&= \beta_0 \Lambda^{*}\bigl(\nu + \sqrt{\gamma\zeta}\bigr) + \beta_0 \zeta.
\end{aligned}
\]

The limit of this, as γ(Mν)2/ζ\gamma\uparrow(M-\nu)^{2}/\zeta, is strictly greater than 11.

Now let ζ=0\zeta=0 and ν=μ\nu=\mu, and note that Lβ(γ)<L_{\beta}(\gamma)<\infty for all γ0\gamma\geq 0. Suppose that MM\leq\infty and γ>0\gamma>0. By dividing the infimum in (11) according to whether or not γ/s<T\sqrt{\gamma/s}<T, we find that

\[
\begin{aligned}
\infty &> L_\beta(\gamma) \geq \min\Bigl\{ \inf_{\beta < s < \gamma/T^2} s \Lambda^{*}\bigl(\mu + \sqrt{\gamma/s}\bigr),\ \inf_{s > \gamma/T^2} s \Lambda^{*}\bigl(\mu + \sqrt{\gamma/s}\bigr) \Bigr\} \\
&\geq \min\biggl\{ A\sqrt{\gamma\beta},\ \frac{1}{4}\gamma/\sigma^2 \biggr\},
\end{aligned}
\]

by (14)–(15). This diverges as γ\gamma\to\infty.

When $\beta = 0$, some of the arguments fail, because $G(s,\gamma)$ might not be continuous at $(0,0)$. Assume that $F \in \operatorname{AEP}(b,C)$ for some $b \geq 2$ and $C > 0$. Note that $M = \infty$ by Lemma 15. If $b = 2$, $G(s,\gamma) \to C\gamma$ when $\gamma > 0$ is fixed and $s \to 0$, by Lemma 15, and taking this limit as an extension at $s = 0$, the same arguments used in the case $\beta > 0$ apply. If $b > 2$, we need slightly different arguments. As before, let $s_\gamma$ be a minimizer of $G(s,\gamma)$. We have that $s_\gamma$ is well defined for all $\gamma$ and strictly positive, because $G$ is uniformly continuous on any compact subset of $(0,1/\zeta] \times [0,\infty)$ and $G(s,\gamma) \sim C\gamma^{b/2} s^{1-b/2} \to \infty$ when $s \to 0$. Thus we may proceed as before in (12)–(13), obtaining that $L_0(\gamma)$ is strictly increasing and continuous. As before, we turn to proving that $L_0$ takes values $< 1$ and finite values $> 1$. First, with $\gamma = (\mu-\nu)^2/(2\zeta)$ and $s = 1/(2\zeta)$,

L0(γ)G(s,γ)=γζ/(μν)2=1/2<1.L_{0}(\gamma)\leq G(s,\gamma)=\gamma\zeta/(\mu-\nu)^{2}=1/2<1.

Next, showing that L0L_{0} takes finite values above 1 is done exactly as before, except that (14) is replaced by

G(s,γ)Cs1b/2γb/2Cζb/21γb/2,γG(s,\gamma)\sim Cs^{1-b/2}\gamma^{b/2}\geq C\zeta^{b/2-1}\gamma^{b/2},\qquad\gamma\to\infty

by Lemma 15. ∎

The following result describes the variations of γ\gamma (defined in Lemma 16) with the parameter of an exponential family.

Lemma 17

Consider a natural exponential family of distributions (Fθ,θ0)(F_{\theta},\theta\geq 0) and let μθ\mu_{\theta} and Λθ\Lambda_{\theta}^{*} denote the mean and the rate function of FθF_{\theta}, respectively. Let ζθ\zeta_{\theta} be a continuous and decreasing function of θ\theta. Then, for any fixed 0<β<1/ζ00<\beta<1/\zeta_{0}, γθ:=γ(Fθ,μ0,ζθ,β)\gamma_{\theta}:=\gamma(F_{\theta},\mu_{0},\zeta_{\theta},\beta) is continuous and strictly increasing in θ\theta. Moreover, if ζθ0\zeta_{\theta}\to 0 when θθc\theta\to\theta_{c}, then γθ\gamma_{\theta}\to\infty when θθc\theta\to\theta_{c}.

Proof.

First, note that μθμ0\mu_{\theta}\geq\mu_{0} (Brown [7], Cor. 2.22) so that γθ\gamma_{\theta} is well-defined. That γθ\gamma_{\theta} is strictly increasing comes from the fact that both ζθ\zeta_{\theta} and Λθ(a)\Lambda_{\theta}^{*}(a) (a>μθa>\mu_{\theta} fixed) are decreasing. The latter can be seen from

Λθ(a)=limk1klogθ(X¯ka),\Lambda_{\theta}^{*}(a)=-\lim_{k\to\infty}\frac{1}{k}\log\mathbb{P}_{\theta}(\bar{X}_{k}\geq a),

where $\bar{X}_k$ is the average of a sample of size $k$ from $F_\theta$ (Brown [7], Cor. 2.22), and the fact that the distribution of $\bar{X}_k$ as $\theta$ varies forms a natural exponential family with parameter $k\theta$. That $\gamma_\theta$ is continuous comes from the continuity of $\zeta_\theta$ and $\Lambda_\theta^{*}(a)$ (in $(\theta,a)$).

For the behavior near θc\theta_{c}, note that Λθ(a)=0\Lambda_{\theta}^{*}(a)=0 for aμθa\leq\mu_{\theta}, so that G(1/(2ζθ),γ)=1/2G(1/(2\zeta_{\theta}),\gamma)=1/2 for any γ(μθμ0)2/(2ζθ)\gamma\leq(\mu_{\theta}-\mu_{0})^{2}/(2\zeta_{\theta}). Combine this with the fact that μθ\mu_{\theta} is strictly increasing in θ\theta to see that γθ\gamma_{\theta} is of order at least 1/ζθ1/\zeta_{\theta}. In fact, it is easy to see that γθ(μθμ0)2/ζθ\gamma_{\theta}\sim(\mu_{\theta}-\mu_{0})^{2}/\zeta_{\theta} when θθc\theta\nearrow\theta_{c}. ∎

A.3 Main proofs

A.3.1 Proof of Theorem 1

By monotonicity, it is sufficient to assume that θm=θ\theta_{m}=\theta for all mm. Fix tt and, for short, let p=p0(t)p=p_{0}(t) and p=pθ(t)p^{\prime}=p_{\theta}(t). First, assume that θ>θ\theta>\theta_{*}, so that ζp<αζp\zeta_{p^{\prime}}<\alpha\zeta_{p}. Fix BB such that 1/ζp<B<α/ζp1/\zeta_{p}<B<\alpha/\zeta_{p^{\prime}} and consider the test with rejection region {Sm(t)dBlogm}\{S_{m}(t)\geq dB\log m\}. Under 0m\mathbb{H}^{m}_{0}, we have Sm(t)=(1+oP(1))(d/ζp)logmS_{m}(t)=(1+\mathrm{o}_{\mathrm{P}}(1))(d/\zeta_{p})\log m by (4), so that (Sm(t)dBlogm)0\mathbb{P}(S_{m}(t)\geq dB\log m)\to 0. Under 1,Km\mathbb{H}^{m}_{1,K}, Sm(t)SK(t)=(1+oP(1))(αd/ζp)logmS_{m}(t)\geq S_{K}(t)=(1+\mathrm{o}_{\mathrm{P}}(1))(\alpha d/\zeta_{p^{\prime}})\log m, so that (Sm(t)dBlogm)1\mathbb{P}(S_{m}(t)\geq dB\log m)\to 1. Thus this test is asymptotically powerful.

Now assume that $\theta < \theta_{*}$, so that $\zeta_{p'} > \alpha\zeta_p$ and there is $B$ such that $\alpha/\zeta_{p'} < B < 1/\zeta_p$. Let $K^c = \mathbb{V}_m \setminus K$. It is sufficient to show that under both $\mathbb{H}^m_0$ and $\mathbb{H}^m_{1,K}$, $S_m(t) = S_{K^c}(t)$ with probability tending to 1, so that the values at the nodes in $K$ have no influence on $S_m(t)$. Indeed, let $J$ be a hypercube within $\mathbb{V}_m$ of side length $[m/3]$ which does not intersect $K$. Then $S_{K^c}(t) \geq S_J(t)$, and the distribution of $S_J(t)$ is the same under both $\mathbb{H}^m_0$ and $\mathbb{H}^m_{1,K}$. In addition, $\mathbb{P}(S_J(t) \geq dB\log m) \to 1$ by (4). Now, let $L$ be the set of nodes within (supnorm) distance $(\log m)^2$ from $K$, so that $L$ is a hypercube of side length $[m^{\alpha}] + [2(\log m)^2]$ containing $K$ in its interior. On the event $\{S_m(t) \leq (\log m)^2\}$, $S_m(t) \neq S_{K^c}(t)$ only when $S_L(t) > S_{K^c}(t)$. The distribution of $S_L(t)$ under the null is stochastically bounded by its distribution under $\mathbb{H}^m_{1,K}$, which is itself bounded by its distribution under $\mathbb{H}^m_{1,L}$. Even under the latter, $\mathbb{P}(S_L(t) \geq dB\log m) \to 0$ by (4). We then conclude the proof using the fact that $\mathbb{P}(S_m(t) \leq (\log m)^2) \to 1$, again by (4).
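The rejection rule $\{S_m(t) \geq dB\log m\}$ from the first part of the proof can be sketched in a few lines. This is a minimal illustration, not an implementation from the paper: it assumes $d = 2$, readings stored as an $m \times m$ list of lists, and a constant $B$ supplied by the user (in the proof, $B$ is chosen between $1/\zeta_p$ and $\alpha/\zeta_{p'}$, quantities that are not computed here). The function names are ours.

```python
import math

def largest_component_above(values, t):
    """Size of the largest connected set of grid nodes whose reading exceeds t."""
    m = len(values)
    seen = set()
    best = 0
    for i in range(m):
        for j in range(m):
            if values[i][j] <= t or (i, j) in seen:
                continue
            # Depth-first search over the component of (i, j) above the threshold.
            stack = [(i, j)]
            seen.add((i, j))
            size = 0
            while stack:
                a, b = stack.pop()
                size += 1
                for x, y in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                    if 0 <= x < m and 0 <= y < m and values[x][y] > t and (x, y) not in seen:
                        seen.add((x, y))
                        stack.append((x, y))
            best = max(best, size)
    return best

def rejects_null(values, t, B, d=2):
    """Reject when S_m(t) >= d * B * log(m); B is a user-supplied constant."""
    m = len(values)
    return largest_component_above(values, t) >= d * B * math.log(m)

if __name__ == "__main__":
    quiet = [[0.0] * 6 for _ in range(6)]
    hot = [row[:] for row in quiet]
    for i in range(1, 4):
        for j in range(1, 4):
            hot[i][j] = 2.0  # a 3 x 3 block of elevated readings
    print(rejects_null(quiet, t=1.0, B=1.0), rejects_null(hot, t=1.0, B=1.0))
```

The cost is linear in the number of nodes, which is the computational advantage over scanning a class of candidate clusters.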

A.3.2 Proof of Theorem 2

Here we use the notation and follow the arguments of Section A.3.1. In addition, let $\zeta^1_{p'} = \log(1/p')$, that is, the function $\zeta$ in dimension one. When $\theta > \theta_{*}^{+}$, we consider $1/\zeta_p < B < \alpha/(d\zeta^1_{p'})$. Under $\mathbb{H}^m_0$, we still have $S_m(t) = (1+\mathrm{o}_{\mathrm{P}}(1))(d/\zeta_p)\log m$. Under $\mathbb{H}^m_{1,K}$, $S_m(t) \geq S_K(t) = (1+\mathrm{o}_{\mathrm{P}}(1))(\alpha/\zeta^1_{p'})\log m$, because $K$ is isomorphic to a subinterval of the one-dimensional lattice. We conclude as before that the test with rejection region $\{S_m(t) \geq dB\log m\}$ is asymptotically powerful.

When θ<θ\theta<\theta_{*}^{-}, we consider α/dζp<B<1/ζp\alpha/d\zeta_{p^{\prime}}<B<1/\zeta_{p}. As before, let LL be the set of nodes within (supnorm) distance (logm)2(\log m)^{2} from KK, so that LL is now a band. As before, it suffices to prove that (SL(t)dBlogm)0\mathbb{P}(S_{L}(t)\geq dB\log m)\to 0 under 1,Lm\mathbb{H}^{m}_{1,L}. Although (4) cannot be applied, because LL is not isomorphic to a square lattice, its proof via the union bound and (3) applies. Indeed, fix η>0\eta>0 small enough that (1η)ζpdB>α(1-\eta)\zeta_{p^{\prime}}dB>\alpha. Then, for mm large enough, we have

\[
\begin{aligned}
\mathbb{P}\bigl(S_L(t) \geq dB\log m\bigr)
&\leq |L| \cdot \mathbb{P}(S \geq dB\log m) \\
&\leq \mathrm{O}\bigl(m^{\alpha}(\log m)^{2(d-1)}\bigr) \exp\bigl(-(1-\eta)\zeta_{p'} dB\log m\bigr) \\
&= \mathrm{O}(\log m)^{2(d-1)} \exp\bigl(\bigl(\alpha - (1-\eta)\zeta_{p'} dB\bigr)\log m\bigr) \to 0.
\end{aligned}
\]

A.3.3 Proof of Proposition 1

Let $k_m(\varepsilon) = (1-\varepsilon) d\log(m)/\log(1/p_0(t_m))$ with $\varepsilon > 0$ fixed. We first show that $S_m(t_m) \geq k_m(\varepsilon)$ with probability tending to 1 under $\mathbb{H}^m_0$. We use the notation and arguments provided in the proof of Lemma 9. As in the lower bound established there,

\[
\begin{aligned}
\mathbb{P}\bigl(S_m(t_m) < k_m(\varepsilon)\bigr)
&\leq \bigl(1 - \mathbb{P}\bigl(S \geq k_m(\varepsilon)\bigr)\bigr)^{N} \\
&\leq \bigl(1 - p_0(t_m)^{k_m(\varepsilon)}\bigr)^{N} \\
&\leq \exp\bigl(-m^{\varepsilon d}/(\log m)^{2d}\bigr) \to 0,
\end{aligned}
\]

where the second inequality holds for mm large enough by Lemma 10.

Assume that $\theta_m \leq \theta < \infty$ for all $m$. Proceeding as in Section A.3.1 and using the slightly larger region $L$, it is sufficient to show that, for $\varepsilon$ small enough, $S_L(t_m) \leq k_m(\varepsilon)$ with probability tending to 1 when $X_v \sim F_\theta$ for all $v \in L$. Using the union bound and the fact that $|L| = \mathrm{O}(m)^{\alpha d}$, we have

(SL(tm)km(ε))|L|(Skm(ε))O(m)αd(cpθ(tm))km(ε),\mathbb{P}\bigl{(}S_{L}(t_{m})\geq k_{m}(\varepsilon)\bigr{)}\leq|L|\cdot\mathbb{P}\bigl{(}S\geq k_{m}(\varepsilon)\bigr{)}\leq\mathrm{O}(m)^{\alpha d}(cp_{\theta}(t_{m}))^{k_{m}(\varepsilon)}, (17)

where the last inequality is due to Lemma 10 (and cc is the constant that appears there). Through integration by parts, for θ>0\theta>0 and ε(0,1)\varepsilon\in(0,1) fixed, we have pθ(t)p0((1ε)t)p_{\theta}(t)\leq p_{0}((1-\varepsilon)t) for sufficiently large tt. Indeed, for tt large enough,

\[
\begin{aligned}
p_\theta(t)
&= \exp\bigl(\theta t - \Lambda(\theta)\bigr) p_0(t) + \int_t^{\infty} \theta \exp\bigl(\theta x - \Lambda(\theta)\bigr) p_0(x)\,\mathrm{d}x \\
&\leq \exp\bigl(\theta t - \Lambda(\theta) - C(1-\varepsilon/3)^b t^b\bigr) + \int_t^{\infty} \theta \exp\bigl(\theta x - \Lambda(\theta) - C(1-\varepsilon/3)^b x^b\bigr)\,\mathrm{d}x \\
&\leq \exp\bigl(-C(1-\varepsilon/2)^b t^b\bigr) \\
&\leq p_0\bigl((1-\varepsilon)t\bigr),
\end{aligned}
\]

where we used the fact that b>1b>1 in line 3 and the fact that logp0(t)Ctb\log p_{0}(t)\sim-Ct^{b} as tt\to\infty (because F0𝐴𝐸𝑃(b,C)F_{0}\in\operatorname{AEP}(b,C)) in lines 2 and 4. The last property also implies that p0((1ε)t)p0(t)(1ε)b+1p_{0}((1-\varepsilon)t)\leq p_{0}(t)^{(1-\varepsilon)^{b+1}} for large tt. Thus, for mm large enough, pθ(tm)p0(tm)(1ε)b+1p_{\theta}(t_{m})\leq p_{0}(t_{m})^{(1-\varepsilon)^{b+1}}, so that taking logs in (17), we get

log(SL(tm)km(ε))O(1)+(dlogm)(α+O(logp0(tm))1(1ε)b+2),\log\mathbb{P}\bigl{(}S_{L}(t_{m})\geq k_{m}(\varepsilon)\bigr{)}\leq\mathrm{O}(1)+(d\log m)\bigl{(}\alpha+\mathrm{O}(\log p_{0}(t_{m}))^{-1}-(1-\varepsilon)^{b+2}\bigr{)}\to-\infty,

when ε<1α1/(b+2)\varepsilon<1-\alpha^{1/(b+2)}. (Remember that α<1\alpha<1 and that p0(tm)0p_{0}(t_{m})\to 0, so the middle term is small.)

.3.4 Proof of Theorem 3

Let $\mathbb{E}_{\theta}$ denote the expectation of $X_{v}$ under $F_{\theta}$. By Lemma 12, under the null,

\[
\frac{S_{m}(t)-\mathbb{E}_{0}(S_{m}(t))}{\sqrt{\operatorname{Var}_{0}(S_{m}(t))}}\to\mathcal{N}(0,1), \tag{18}
\]

with $\operatorname{Var}_{0}(S_{m}(t))$ of order $m^{d}$. Write $p:=p_{0}(t)$ and $p^{\prime}:=p_{\theta_{m}}(t)$.

We consider the alternative with anomalous cluster $K$ as a two-stage percolation process, where the first stage is percolation on $\mathbb{V}_{m}$ with probability $p$, as under the null, and the second stage is percolation on the closed nodes within $K$, that is, $K\setminus\{v\colon X_{v}>t\}$, with (conditional) probability $(p^{\prime}-p)/(1-p)$. An open cluster at the first stage is called small if it is not a largest open cluster.
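The two-stage construction can be sketched in a few lines of Python. This is only an illustration under the stated coupling: the function names and the use of the standard `random` module are assumptions made here, not part of the paper.

```python
import random


def second_stage_probability(p: float, p_prime: float) -> float:
    """Conditional probability of opening a node of K that is closed after stage one."""
    if not (0.0 <= p < 1.0 and p <= p_prime <= 1.0):
        raise ValueError("need 0 <= p < 1 and p <= p_prime <= 1")
    return (p_prime - p) / (1.0 - p)


def two_stage_open(in_cluster: bool, p: float, p_prime: float, rng: random.Random) -> bool:
    """Sample the final state of one node: stage one everywhere, stage two only inside K."""
    open_first = rng.random() < p
    if open_first or not in_cluster:
        return open_first
    return rng.random() < second_stage_probability(p, p_prime)
```

A node of $K$ ends up open with probability $p+(1-p)(p^{\prime}-p)/(1-p)=p^{\prime}$, while a node outside $K$ stays open with probability $p$, so the construction reproduces the marginals of the alternative.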

We may assume, except where noted below, that $\theta_{m}\to 0$. Because

\[
\frac{\partial}{\partial\theta}\log p_{\theta}(t)=\mathbb{E}_{\theta}(X_{v}\mid X_{v}>t)-\mathbb{E}_{\theta}(X_{v}),
\]

which is positive at $\theta=0$ by choice of $t$, there exists $c\in(0,\infty)$ such that

\[
p^{\prime}-p\sim c\theta_{m}\qquad\mbox{as }m\to\infty. \tag{19}
\]

Let $\Delta_{m}\geq 0$ be the difference between the sizes of the largest clusters under the null and the alternative. For $x\in K$, let $F_{x}$ be the sum of the sizes of all small clusters of the entire lattice that contain some neighbor of $x$. Note that $\Delta_{m}\leq\sum_{x\in D}(1+F_{x})$, where $D$ is the set of $x\in K$ that are closed at the first stage and open at the second stage. Therefore, $\Delta_{m}$ has expectation bounded above by

\[
\mathbb{E}(\Delta_{m})\leq\biggl(\frac{p^{\prime}-p}{1-p}\biggr)|K|(1+2d\mu_{p}), \tag{20}
\]

where $\mu_{p}<\infty$ is the mean size of a finite open cluster in the infinite lattice.

By (19) and the foregoing, $\mathbb{E}(\Delta_{m})\leq C\theta_{m}m^{\alpha d}$ for some $C<\infty$. By Markov’s inequality, $\Delta_{m}=\mathrm{O}_{\mathrm{P}}(\theta_{m}m^{\alpha d})$.

Thus, if $\theta_{m}m^{(\alpha-1/2)d}\to 0$, then $\Delta_{m}/\sqrt{\operatorname{Var}_{0}(S_{m}(t))}\to 0$, implying that the same central limit law as (18) holds under the alternative, so that the test based on the largest open cluster is asymptotically powerless. We also must consider the case where $\theta_{m}\not\to 0$, for which a similar argument is valid.

Now assume that $\alpha\geq 1/2$ and $\theta_{m}m^{(\alpha-1/2)d}\to\infty$. By Grimmett [21], Theorem 8.99, and standard properties of the largest cluster in a box (to be found in, e.g., Falconer and Grimmett [18]), with probability tending to 1, the largest open cluster increases in size by at least $C_{1}(p^{\prime}-p)|K|$ for some $C_{1}=C_{1}(p)>0$. By (19), this has order $\theta_{m}m^{\alpha d}$. Because

\[
\frac{\theta_{m}m^{\alpha d}}{\sqrt{\operatorname{Var}_{0}(S_{m}(t))}}\sim C_{2}\theta_{m}m^{(\alpha-1/2)d}\to\infty
\]

for some $C_{2}=C_{2}(p)>0$, the test based on the largest open cluster is asymptotically powerful.
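For concreteness, the statistic analyzed in this proof, the size of the largest open cluster at threshold $t$, can be computed by a breadth-first search. The sketch below assumes a two-dimensional box with free boundary and nearest-neighbor (four-neighbor) connectivity; the function name `largest_open_cluster` is hypothetical.

```python
from collections import deque


def largest_open_cluster(values, t):
    """Size of the largest 4-connected cluster of grid cells whose value exceeds t."""
    rows = len(values)
    cols = len(values[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    best = 0
    for i in range(rows):
        for j in range(cols):
            if seen[i][j] or values[i][j] <= t:
                continue  # closed node, or already assigned to a cluster
            seen[i][j] = True
            queue = deque([(i, j)])
            size = 0
            while queue:
                a, b = queue.popleft()
                size += 1
                for u, v in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                    if 0 <= u < rows and 0 <= v < cols and not seen[u][v] and values[u][v] > t:
                        seen[u][v] = True
                        queue.append((u, v))
            best = max(best, size)
    return best
```

Each node is visited once, so the cost is linear in the number of nodes, which reflects the computational advantage over scanning a large class of candidate clusters.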

.3.5 Proof of Theorem 4

We may assume without loss of generality that $\theta_{m}\to 0$ as $m\to\infty$. By (5) and the assumption on $t_{m}$, we have that $S_{m}(t_{m})\asymp_{\mathrm{P}}\log m$ under the null. Now $p_{\theta}(t)$ is infinitely differentiable in $\theta$, with each derivative continuous in $t$ and with

\[
\frac{\partial p_{\theta}(t)}{\partial\theta}\Big|_{\theta=0}=p_{0}(t)\bigl[\mathbb{E}_{0}(X_{v}\mid X_{v}>t)-\mathbb{E}_{0}(X_{v})\bigr]\geq\frac{p_{c}}{2}\bigl[\mathbb{E}_{0}(X_{v}\mid X_{v}>t_{c})-\mathbb{E}_{0}(X_{v})\bigr]>0,
\]

uniformly for $t$ in a neighborhood of $t_{c}$. Therefore, there exists $C>0$ such that

\[
\frac{\partial p_{\theta}(t)}{\partial\theta}\geq 1/C\quad\mbox{and}\quad\biggl|\frac{\partial^{2}p_{\theta}(t)}{\partial\theta^{2}}\biggr|\leq C
\]

for $(\theta,t)$ in some neighborhood of $(0,t_{c})$. Thus,

\[
p_{\theta}(t)-p_{0}(t)\geq\theta/C-C^{2}\theta^{2}/2\geq\theta/(2C),
\]

on such a neighborhood. Let $A$ and $B$ be such that $p_{c}-p_{0}(t_{m})\leq Am^{-\alpha/\nu^{\prime}}$ and $\theta_{m}\geq Bm^{-\alpha/\nu^{\prime}}$, and assume that $B>2AC$, based on the statement of the theorem. Because $\theta_{m}\to 0$ and $t_{m}\to t_{c}$,

\[
m^{\alpha/\nu^{\prime\prime}}\bigl(p_{\theta_{m}}(t_{m})-p_{c}\bigr)\geq m^{\alpha/\nu^{\prime\prime}}\biggl[\frac{\theta_{m}}{2C}+\bigl(p_{0}(t_{m})-p_{c}\bigr)\biggr]\geq\biggl[\frac{B}{2C}-A\biggr]m^{\alpha(1/\nu^{\prime\prime}-1/\nu^{\prime})}\to\infty
\]

for $\nu^{\prime\prime}<\nu^{\prime}$ and sufficiently large $m$. By (5) applied to $K\in\mathcal{K}_{m}$, it follows that $S_{K}(t_{m})\asymp_{\mathrm{P}}m^{\alpha d}$ under the alternative. Consequently, the test with rejection region $\{S_{m}(t_{m})\geq(\log m)^{2}\}$ is asymptotically powerful.
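For completeness, the second inequality in the Taylor bound used in this proof can be checked directly. The only assumption made here is that $\theta>0$ is small, which holds for $\theta=\theta_{m}$ and $m$ large because $\theta_{m}\to 0$.

```latex
\[
\frac{\theta}{C}-\frac{C^{2}\theta^{2}}{2}\geq\frac{\theta}{2C}
\iff
\frac{\theta}{2C}\geq\frac{C^{2}\theta^{2}}{2}
\iff
\theta\leq C^{-3}
\qquad(\theta>0).
\]
```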

.3.6 Proof of Lemma 5

Part 1. This follows immediately from Lemma 9.

Therefore, we focus on the remaining two parts. We use the abbreviated notation $F:=F_{\theta|t}$, $\Lambda^{*}:=\Lambda^{*}_{\theta|t}$, $\mu:=\mu_{\theta|t}$, $\zeta:=\zeta_{p_{\theta}(t)}$, $\gamma:=\gamma_{\theta|t}(\beta)$, $U_{m}:=U_{m}(t,k_{m})$, and write $\nu:=\mu_{0|t}$. Let $Y_{k}=X_{k}-\nu$. As in Lemma 11, let $N_{m}(k)$ denote the number of open clusters of size $k$ within $\mathbb{V}_{m}$, and define

\[
G_{k}(x)=\mathbb{P}(k^{1/2}\bar{Y}_{k}\leq x),
\]

where $\bar{Y}_{k}=\bar{X}_{k}-\nu$ and $\bar{X}_{k}$ is the average of an i.i.d. sample of size $k$ from $F$. By the independence of $\bar{Y}_{K}$ and $\bar{Y}_{L}$ for $K,L\in\mathcal{Q}_{m}^{(t)}$ distinct, we have

\[
\mathbb{P}(U_{m}\leq x)=\mathbb{E}\biggl(\prod_{k\geq k_{m}}G_{k}(x)^{N_{m}(k)}\biggr)=\mathbb{E}(\exp[-R_{m}(x)]),
\]

where, with $\bar{G}_{k}:=1-G_{k}$,

\[
R_{m}(x):=-\sum_{k\geq k_{m}}N_{m}(k)\log\bigl(1-\bar{G}_{k}(x)\bigr).
\]

Thus, we turn to bounding $R_{m}(x)$.
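The identity between the product and $\exp[-R_{m}(x)]$ holds for each realization of the cluster counts, before the expectation is taken. The sketch below evaluates both sides for given counts. The dictionary `cluster_counts` (mapping a size $k$ to $N_{m}(k)$) and the callable `cdf` (playing the role of $G_{k}$) are hypothetical inputs chosen for illustration; the normal distribution function in the usage note is an assumption, not the paper's $G_{k}$.

```python
import math


def r_statistic(cluster_counts, x, k_min, cdf):
    """R_m(x) = -sum over k >= k_min of N_m(k) * log(1 - Gbar_k(x)), where 1 - Gbar_k = G_k."""
    total = 0.0
    for k, count in cluster_counts.items():
        if k >= k_min:
            total -= count * math.log(cdf(k, x))
    return total


def conditional_cdf_product(cluster_counts, x, k_min, cdf):
    """Product over k >= k_min of G_k(x)**N_m(k), the conditional probability of U_m <= x."""
    prod = 1.0
    for k, count in cluster_counts.items():
        if k >= k_min:
            prod *= cdf(k, x) ** count
    return prod
```

For example, with `cdf = lambda k, x: 0.5 * (1 + math.erf(x / math.sqrt(2)))`, the value `math.exp(-r_statistic(counts, x, k_min, cdf))` agrees with `conditional_cdf_product(counts, x, k_min, cdf)` up to rounding.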

Part 2. Define $x_{m}=\sqrt{\gamma d\log m}$ and fix $\varepsilon>0$. For the lower bound, let $\ell_{m}$ be the closest integer to $ad\log m$ between $k_{m}$ and $(d/\zeta)\log m$, where

\[
a=\mathop{\arg\min}_{\beta<s<1/\zeta}\bigl[s\Lambda^{*}\bigl(\nu+\sqrt{\gamma/s}\bigr)+s\zeta\bigr]. \tag{21}
\]

We have

\[
R_{m}\bigl((1-\varepsilon)x_{m}\bigr)\geq T_{m}:=N_{m}(\ell_{m})\bar{G}_{\ell_{m}}\bigl((1-\varepsilon)x_{m}\bigr),
\]

and we show that for $\varepsilon$ fixed, $T_{m}\to\infty$ in probability. Fix $\eta>0$. On the one hand, we use Lemma 11 and (3) to get

\[
\mathbb{E}(N_{m}(\ell_{m}))\geq\frac{(m-2\ell_{m})^{d}}{\ell_{m}}\mathbb{P}(S=\ell_{m})\geq m^{d}\exp\bigl(-(1+\eta)\zeta\ell_{m}\bigr)
\]

for $m$ large enough. On the other hand, we use Cramér’s theorem (Dembo and Zeitouni [15], Theorem 2.2.3) to get

\begin{align*}
\bar{G}_{\ell_{m}}\bigl((1-\varepsilon)x_{m}\bigr) &\geq \mathbb{P}\bigl(\bar{Y}_{\ell_{m}}\geq(1-\varepsilon/2)\sqrt{\gamma/a}\bigr) \\
&\geq \exp\bigl(-(1+\eta)\ell_{m}\Lambda^{*}\bigl[\nu+(1-\varepsilon/2)\sqrt{\gamma/a}\bigr]\bigr)
\end{align*}

for $m$ large enough. By the definition of $\gamma$, $a\Lambda^{*}[\nu+\sqrt{\gamma/a}]+a\zeta=1$, and thus for $\varepsilon$ small enough,

\[
a\zeta+a\Lambda^{*}\bigl[\nu+(1-\varepsilon/2)\sqrt{\gamma/a}\bigr]<1,
\]

by strict monotonicity, as in the proof of Lemma 16. Thus, for $\eta$ small enough,

\[
\ell_{m}\zeta+\ell_{m}\Lambda^{*}\bigl[\nu+(1-\varepsilon/2)\sqrt{\gamma/a}\bigr]\leq(1-\eta)d\log m.
\]

It follows that

\[
\mathbb{E}(T_{m})\geq m^{\eta^{2}d}.
\]

To bound the corresponding variance, we use Lemma 11 to obtain

\[
\operatorname{Var}(T_{m})\leq\mathrm{O}(\log m)^{d}\,\mathbb{E}(T_{m}),
\]

and it follows by Chebyshev’s inequality that indeed $T_{m}\to\infty$ in probability.
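The Chebyshev step can be spelled out as follows. This is a routine expansion of the argument, using only the bounds on $\mathbb{E}(T_{m})$ and $\operatorname{Var}(T_{m})$ stated above.

```latex
\[
\mathbb{P}\bigl(T_{m}\leq\tfrac{1}{2}\mathbb{E}(T_{m})\bigr)
\leq\mathbb{P}\bigl(|T_{m}-\mathbb{E}(T_{m})|\geq\tfrac{1}{2}\mathbb{E}(T_{m})\bigr)
\leq\frac{4\operatorname{Var}(T_{m})}{\mathbb{E}(T_{m})^{2}}
\leq\frac{\mathrm{O}(\log m)^{d}}{\mathbb{E}(T_{m})}
\leq\mathrm{O}(\log m)^{d}\,m^{-\eta^{2}d}\to 0.
\]
```

Because $\mathbb{E}(T_{m})\geq m^{\eta^{2}d}\to\infty$, this shows that $T_{m}$ exceeds any fixed level with probability tending to 1.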

Because $T_{m}\geq 0$, $\exp(-T_{m})\to 0$ in $L^{1}$, and thus

\[
\mathbb{P}\bigl(U_{m}\leq(1-\varepsilon)x_{m}\bigr)\to 0.
\]

We next show that $\mathbb{E}(R_{m}((1+\varepsilon)x_{m}))\to 0$, which will imply the claim of Part 2. Fix $\eta>0$. We have that

\[
R_{m}\bigl((1+\varepsilon)x_{m}\bigr)\leq T_{m}+2Z_{m}, \tag{22}
\]

where

\[
T_{m}:=2\sum_{k=k_{m}}^{k_{m}^{(\eta)}}N_{m}(k)\bar{G}_{k}\bigl((1+\varepsilon)x_{m}\bigr)
\]

and $Z_{m}$ is the number of clusters of size exceeding $k_{m}^{(\eta)}:=[(1+\eta)(d/\zeta)\log m]$. We first note that, as in the proof of Lemma 11, for large $m$,

\[
\mathbb{E}(Z_{m})\leq m^{d}\exp\bigl(-\tfrac{1}{2}\zeta k_{m}^{(\eta)}\bigr)\to 0. \tag{23}
\]

We next turn to $T_{m}$, and show that for $\varepsilon$ fixed and $\eta$ small enough, $\mathbb{E}(T_{m})\to 0$. On the one hand, we use Lemma 11 and (3) to get

\[
\mathbb{E}(N_{m}(k))\leq m^{d}\exp\bigl(-(1-\eta)\zeta k\bigr)
\]

for $m$ large enough. On the other hand, by Chernoff’s bound,

\[
\bar{G}_{k}\bigl((1+\varepsilon)x_{m}\bigr)\leq\exp\bigl(-k\Lambda^{*}\bigl[\nu+(1+\varepsilon)x_{m}/\sqrt{k}\bigr]\bigr).
\]

Taken together, we obtain

\begin{align*}
\mathbb{E}(T_{m}) &\leq 2\sum_{k=k_{m}}^{k_{m}^{(\eta)}}m^{d}\exp\bigl(-(1-\eta)\bigl[k\zeta+k\Lambda^{*}\bigl(\nu+(1+\varepsilon)x_{m}/\sqrt{k}\bigr)\bigr]\bigr) \\
&\leq \mathrm{O}(\log m)\exp\Bigl(d\log m-(1-\eta)\min_{k_{m}\leq k\leq k_{m}^{(\eta)}}\bigl[k\zeta+k\Lambda^{*}\bigl(\nu+(1+\varepsilon)x_{m}/\sqrt{k}\bigr)\bigr]\Bigr) \\
&\leq \mathrm{O}(\log m)\exp\bigl(\bigl(1-(1-\eta)A\bigr)d\log m\bigr),
\end{align*}

where

\[
A:=\inf_{\beta<a<(1+\eta)/\zeta}\bigl[a\Lambda^{*}\bigl(\nu+(1+\varepsilon)\sqrt{\gamma/a}\bigr)+a\zeta\bigr]. \tag{24}
\]

As in the proof of Lemma 16, $A=A(\varepsilon,\eta)$ is continuous in $(\varepsilon,\eta)$ and strictly increasing in $\varepsilon$. Because $A(0,0)=1$ by definition of $\gamma$, for $\varepsilon$ fixed, $-h:=1-(1-\eta)A(\varepsilon,\eta)<0$ for $\eta$ small enough, in which case $\mathbb{E}(T_{m})\leq m^{-hd/2}\to 0$ as $m$ increases.

By (22)–(23), we have that $\mathbb{E}(R_{m}((1+\varepsilon)x_{m}))\to 0$. By Jensen’s inequality,

\[
\mathbb{P}\bigl(U_{m}\leq(1+\varepsilon)x_{m}\bigr)\geq\exp\bigl(-\mathbb{E}\bigl(R_{m}\bigl((1+\varepsilon)x_{m}\bigr)\bigr)\bigr)\to 1,
\]

and the proof of this part is complete.

Part 3. We build on the arguments provided so far, which apply essentially unchanged, except in two places. In the lower bound, instead of Cramér’s theorem, we use

\[
\bar{G}_{k}(x)\geq\bar{F}\bigl(x/\sqrt{k}\bigr)^{k},
\]

combined with the asymptotic behavior for $\bar{F}$. In the upper bound, $A$ defined in (24) is evaluated differently when $b<2$.

Part 3(a). When $b>2$, we have $a>0$ in (21) (with $\beta=0$), because

\[
h(s):=s\Lambda^{*}\bigl(\nu+\sqrt{\gamma/s}\bigr)+s\zeta\asymp s^{1-b/2}\to\infty
\]

for $\gamma$ fixed and $s\to 0$, by Lemma 15. When $b=2$, we take $a$ small enough if the minimum is at $a=0$. Then the other arguments in Part 2 apply unchanged.

Part 3(b). By the same calculations, $a=0$ in (21), because $h(s)>0$ for all $s>0$, and $h(s)\asymp s^{1-b/2}\to 0$ when $s\to 0$, because $b<2$. This would make $A=0$ in (24) for any $\varepsilon>0$, making the arguments for the upper bound collapse. Instead, redefine $x_{m}=((d/C)\log m)^{1/b}k_{m}^{1/2-1/b}$, so that $Ck_{m}^{1-b/2}x_{m}^{b}=d\log m$. Because $x_{m}/\sqrt{k}\to\infty$ uniformly over $k\leq k_{m}^{(\eta)}$, for $\eta>0$ fixed, we have

\[
k\zeta+k\Lambda^{*}\bigl(\nu+(1+\varepsilon)x_{m}/\sqrt{k}\bigr)\geq k\zeta+(1-\eta)Ck^{1-b/2}(1+\varepsilon)^{b}x_{m}^{b}
\]

for $m$ large enough, by Lemma 15. Then the term on the right-hand side takes its minimum over $k_{m}\leq k\leq k_{m}^{(\eta)}$ at $k=k_{m}$, and from here, the remaining arguments apply.
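The claim that the minimum is attained at $k=k_{m}$ follows from monotonicity. In the short check below, $\varphi$ is a symbol introduced here only for illustration, $k$ is treated as a continuous variable, and $0<\eta<1$ is assumed.

```latex
\[
\varphi(k):=k\zeta+(1-\eta)C(1+\varepsilon)^{b}x_{m}^{b}\,k^{1-b/2},
\qquad
\varphi^{\prime}(k)=\zeta+(1-\eta)C(1+\varepsilon)^{b}x_{m}^{b}\Bigl(1-\frac{b}{2}\Bigr)k^{-b/2}>0
\quad\text{for }k>0,
\]
```

because $b<2$ and $\zeta\geq 0$. Hence $\varphi$ is increasing on $[k_{m},k_{m}^{(\eta)}]$ and its minimum there is $\varphi(k_{m})$.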

.3.7 Proof of Proposition 2

Assume, for simplicity, that $\theta_{m}=\theta<\theta_{c}$ for all $m$. The key point is that $F_{\theta|t}\in\operatorname{AEP}(b,C)$. Indeed, we have $\bar{F}_{\theta|t}(x)=\bar{F}_{\theta}(x)/\bar{F}_{\theta}(t)$, where the denominator is constant in $x$ and, integrating by parts,

\[
\bar{F}_{\theta}(x)=\exp\bigl(\theta x-\Lambda(\theta)\bigr)\bar{F}_{0}(x)+\int_{x}^{\infty}\theta\exp\bigl(\theta y-\Lambda(\theta)\bigr)\bar{F}_{0}(y)\,\mathrm{d}y.
\]

From here, we reason as in the proof of Proposition 1, using the fact that $\log\bar{F}_{0}(y)\sim-Cy^{b}$ when $y\to\infty$, with $b>1$. Thus $F_{\theta|t}$ and $F_{0|t}$ have the same (first-order) asymptotics, and so nothing distinguishes the asymptotic behavior of $U_{m}$ under the null and under an alternative. In detail, we proceed as in Section .3.1, with the enlarged hypercube $L$, and show that in probability under $\mathbb{H}^{m}_{1,L}$,

\[
\limsup_{m\to\infty}k_{m}^{1/b-1/2}(\log m)^{-1/b}U_{L}<(d/C)^{1/b},
\]

where $U_{L}$ is the $\operatorname{ULS}$ scan statistic restricted to open clusters within $L$. Because $L$ is a scaled version of $\mathbb{V}_{m}$, $F_{\theta|t}\in\operatorname{AEP}(b,C)$ and $p_{\theta}(t)<p_{c}$, Lemma 5 applies to yield

\[
k_{m}^{1/b-1/2}(\alpha\log m)^{-1/b}U_{L}\to(d/C)^{1/b}.
\]

We then conclude with the fact that $\alpha<1$.
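The final step can be made explicit by rescaling the limit given by Lemma 5. Nothing beyond the two displays above is used.

```latex
\[
k_{m}^{1/b-1/2}(\log m)^{-1/b}U_{L}
=\alpha^{1/b}\,k_{m}^{1/b-1/2}(\alpha\log m)^{-1/b}U_{L}
\to\alpha^{1/b}(d/C)^{1/b}<(d/C)^{1/b},
\]
```

where the strict inequality holds because $0<\alpha<1$ and $b>0$.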

.3.8 Proof of Theorem 5 and Theorem 6

The proof of Theorem 5 is parallel to that of Theorem 1 in Section .3.1, but using Lemma 5 in place of Lemma 9. Note that we use the fact that for $t$ and $\beta>0$ fixed, $\gamma_{\theta|t}(\beta)$ is continuous and strictly increasing in $\theta$. This comes from Lemma 17 and the fact that when $t$ is fixed, $F_{\theta|t}$ is also a natural exponential family with parameter $\theta$. Similarly, the proof of Theorem 6 is parallel to that of Theorem 2 in Section .3.2. Further details are omitted.

.3.9 Proof of Lemma 6

The proof is parallel to that of Lemma 5. In particular, we use the notation introduced there and only note where the arguments differ (although never substantially).

Part 1. In this case, by Lemma 12 and Lemma 13, there is only one open cluster with size $k_{m}$ or larger, and the result follows from, for example, Chebyshev’s inequality.

Part 2. Define $x_{m}=\sqrt{2\sigma^{2}d(1-\delta\beta^{\prime})\log m}$ and fix $\varepsilon>0$. For the lower bound, we have

\[
R_{m}\bigl((1-\varepsilon)x_{m}\bigr)\geq T_{m}:=N_{m}(k_{m})\bar{G}_{k_{m}}\bigl((1-\varepsilon)x_{m}\bigr).
\]

Fix $\eta>0$. By Lemma 11 (still valid) and (8),

\[
\mathbb{E}(N_{m}(k_{m}))\geq m^{d}\exp\bigl(-(1+\eta)\delta k_{m}^{(d-1)/d}\bigr)
\]

for $m$ large enough. By Cramér’s theorem and the fact that $\Lambda^{*}(x)\sim x^{2}/(2\sigma^{2})$ when $x$ is small,

\begin{align*}
\bar{G}_{k_{m}}\bigl((1-\varepsilon)x_{m}\bigr) &\geq \exp\bigl(-(1+\eta)k_{m}\Lambda^{*}\bigl[(1-\varepsilon)x_{m}/\sqrt{k_{m}}\bigr]\bigr) \\
&\geq \exp\bigl(-(1+\eta)(1-\varepsilon/2)x_{m}^{2}/(2\sigma^{2})\bigr)
\end{align*}

for $m$ large enough. Thus,

\[
\mathbb{E}(T_{m})\geq\exp\bigl(d\log m-(1+\eta)\bigl(\delta k_{m}^{(d-1)/d}+(1-\varepsilon/2)x_{m}^{2}/(2\sigma^{2})\bigr)\bigr)\geq m^{\varepsilon d(1-\delta\beta^{\prime})/4}
\]

for $m$ large enough and $\eta$ small enough. For the variance, we use Lemma 11 to get

\[
\operatorname{Var}(T_{m})\leq\mathrm{O}(\log m)^{d^{2}/(d-1)}\,\mathbb{E}(T_{m}).
\]

We then conclude by Chebyshev’s inequality.

We now show that $R_{m}((1+\varepsilon)x_{m})\to 0$ in probability. Equation (22) holds with $k_{m}^{(\eta)}:=[(1+\eta)(d/\delta)\log m]^{d/(d-1)}$. As before,

\[
\mathbb{E}(Z_{m})\leq m^{d}\exp\bigl\{-\tfrac{1}{2}\delta\bigl(k_{m}^{(\eta)}\bigr)^{(d-1)/d}\bigr\}\to 0\qquad\mbox{as }m\to\infty.
\]

By Lemma 11 and (8),

\[
\mathbb{E}(N_{m}(k))\leq m^{d}\exp\bigl(-(1-\eta)\delta k^{(d-1)/d}\bigr)
\]

for $m$ large enough. The absence of a boundary to $\mathbb{V}_{m}$ is being used here. The tail behavior of percolation clusters near the boundary of a box is not yet fully understood (see the remark in Section 5.2). By Chernoff’s bound and the behavior of $\Lambda^{*}$ near the origin,

\[
\bar{G}_{k}\bigl((1+\varepsilon)x_{m}\bigr)\leq\exp\bigl(-(1+\varepsilon)x_{m}^{2}/(2\sigma^{2})\bigr)
\]

for any $k\geq k_{m}$. Thus,

\begin{align*}
\mathbb{E}(T_{m}) &\leq 2\sum_{k=k_{m}}^{k_{m}^{(\eta)}}m^{d}\exp\bigl(-(1-\eta)\delta k^{(d-1)/d}-(1+\varepsilon)x_{m}^{2}/(2\sigma^{2})\bigr) \\
&\leq \mathrm{O}(\log m)^{d/(d-1)}m^{-\varepsilon d(1-\delta\beta^{\prime})/4}
\end{align*}

for $m$ large enough and $\eta$ small enough.

Part 3. This part is even more similar to what we did in the proof of Lemma 5. The behavior of $U_{m}$ is driven by the open clusters of size of order $\log m$, with the only difference being that the term in $k^{(d-1)/d}$ from the bounds on $N_{m}(k)$ is negligible. Details are omitted.

.3.10 Proof of Theorem 7

Without loss of generality, we assume that $\theta_{m}$ is bounded. By Lemma 6 and our assumptions on $k_{m}$, under the null, $U_{m}:=U_{m}(t,k_{m})\sim_{\mathrm{P}}A(\log m)^{1/2}$ for a finite constant $A>0$. We now consider the alternative, where the anomalous cluster is $K$.

The contribution of the largest open cluster, $Q_{m}$, is

\begin{align*}
\sqrt{|Q_{m}|}(\bar{X}_{Q_{m}}-\mu_{0|t}) &= \frac{|Q_{m}\cap K|}{\sqrt{|Q_{m}|}}(\bar{X}_{Q_{m}\cap K}-\mu_{\theta_{m}|t})+\frac{|Q_{m}\cap K^{c}|}{\sqrt{|Q_{m}|}}(\bar{X}_{Q_{m}\cap K^{c}}-\mu_{0|t}) \\
&\quad+\frac{|Q_{m}\cap K|}{\sqrt{|Q_{m}|}}(\mu_{\theta_{m}|t}-\mu_{0|t}).
\end{align*}

On the right-hand side, the first term is of order $\mathrm{o}_{\mathrm{P}}(1)$, and the second term is of order $\mathrm{O}_{\mathrm{P}}(1)$, by Chebyshev’s inequality and the fact that, with probability tending to 1, $|Q_{m}\cap K|\asymp|K|$ and $|Q_{m}|\asymp|\mathbb{V}_{m}|$, by Lemma 12. The last term is of (exact) order $\mathrm{O}(\theta_{m}m^{(\alpha-1/2)d})$, by the fact that $\mu_{\theta|t}$ is differentiable at $\theta=0$ with derivative equal to $\sigma_{0|t}^{2}>0$. Therefore, the $\operatorname{ULS}$ scan test is asymptotically powerful when $\liminf\theta_{m}m^{(\alpha-1/2)d}(\log m)^{-1/2}$ is large enough. (Note that this requires $\alpha>1/2$.) If instead $\theta_{m}m^{(\alpha-1/2)d}(\log m)^{-1/2}\to 0$, then the scan over $Q_{m}$ may be ignored, and we need to consider smaller clusters.
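The three-term decomposition is an algebraic identity, which the following sketch verifies numerically. The function name and arguments are hypothetical: `x_in` and `x_out` stand for the observations of $Q_{m}$ inside and outside $K$ (both assumed non-empty), and `mu_theta`, `mu_null` stand for $\mu_{\theta_{m}|t}$ and $\mu_{0|t}$.

```python
import math


def contribution_terms(x_in, x_out, mu_theta, mu_null):
    """Split sqrt(|Q|) * (mean(Q) - mu_null) into the three terms of the decomposition."""
    n_in, n_out = len(x_in), len(x_out)
    root = math.sqrt(n_in + n_out)
    mean_in = sum(x_in) / n_in
    mean_out = sum(x_out) / n_out
    fluctuation_in = n_in / root * (mean_in - mu_theta)    # centered part inside K
    fluctuation_out = n_out / root * (mean_out - mu_null)  # centered part outside K
    shift = n_in / root * (mu_theta - mu_null)             # signal carried by K
    return fluctuation_in, fluctuation_out, shift
```

The sum of the three returned values equals $\sqrt{|Q_{m}|}(\bar{X}_{Q_{m}}-\mu_{0|t})$, and only the last one depends on the size of the shift $\mu_{\theta_{m}|t}-\mu_{0|t}$.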

By Lemma 13 and the upper bound on $k_{m}$, the second-largest cluster entirely within $K$ is scanned and its contribution is of order $\mathrm{O}(\theta_{m}(\log m)^{d/(2d-2)})$, by the same arguments that established the contribution of the largest open cluster. Thus, the $\operatorname{ULS}$ scan test is asymptotically powerful when $\liminf\theta_{m}(\log m)^{d/(2d-2)-1/2}$ is large enough. If instead $\theta_{m}(\log m)^{d/(2d-2)-1/2}\to 0$, the test is asymptotically powerless. Indeed, let $L$ be the set of nodes within distance $(\log m)^{3}$ from $K$, and let $U_{L}$ be the result of scanning the open clusters of size at least $k_{m}$ and entirely within $L$. As argued in the proof of Proposition 2, this time using Lemma 13, it is sufficient to show that $U_{L}\leq A(\log m)^{1/2}$ with probability tending to 1 under $\mathbb{H}_{1,L}^{m}$. For any open cluster $Q$ entirely within $L$,

\[
\sqrt{|Q|}(\bar{X}_{Q}-\mu_{0|t})=\sqrt{|Q|}(\bar{X}_{Q}-\mu_{\theta_{m}|t})+\sqrt{|Q|}(\mu_{\theta_{m}|t}-\mu_{0|t}),
\]

so that

\[
U_{L}\leq\max_{Q}\sqrt{|Q|}(\bar{X}_{Q}-\mu_{\theta_{m}|t})+\mathrm{o}_{\mathrm{P}}(1),
\]

where the maximum is over open clusters of size at least $k_{m}$ and entirely within $L$, and the second term is $\mathrm{o}_{\mathrm{P}}(1)$ by Lemma 13 and the size of $\theta_{m}$. Although $\theta_{m}\to 0$ varies, this maximum may be handled exactly as in Lemma 6, so that it is $\sim_{\mathrm{P}}A(\alpha\log m)^{1/2}$, and we conclude.

.3.11 Proof of Lemma 7

We prove only the more refined part. We use abbreviated notation as before; in particular, we omit the subscript 0, using $F_{t}=F_{0|t}$, $\sigma_{t}=\sigma_{0|t}$, and so on. The lower bound is obtained via $\operatorname{ULS}_{m}\geq U_{m}(t^{*})/\sigma_{t^{*}}$, where $t^{*}$ defines $\Gamma(\beta)$, and applying Lemma 5 or Lemma 6 to $U_{m}(t^{*})$ depending on whether $t^{*}>t_{c}$ or $t^{*}<t_{c}$. For simplicity, we assume that $t^{*}\neq t_{c}$. If $t^{*}=t_{c}$, then we consider a nearby threshold and argue by continuity. For the upper bound, we prove that $\mathbb{P}(\operatorname{ULS}_{m}\geq x_{m})\to 0$, where $x_{m}:=\sqrt{g\log m}$ and $g>G:=(d\Gamma(\beta))^{1/2}$.

As $t$ increases, clusters are created and then destroyed in the coupled percolation processes. Suppose the removal at time $t$ from the percolation process of vertex $v$ creates some cluster $Q_{t}(w)$ at some neighbor $w$ of $v$. If $\operatorname{ULS}_{m}\geq x_{m}$, there must exist a vertex $v$ and a neighbor $w$ such that the cluster formed at $w$ at time $X_{v}$ contributes at some future time $t^{\prime}>X_{v}$ an amount at least $x_{m}$ to $\operatorname{ULS}_{m}$. By conditioning on $v$, $X_{v}$, and $w$, one obtains that

\[
\mathbb{P}(\operatorname{ULS}_{m}\geq x_{m})\leq\mathrm{o}(1)+\int_{-\infty}^{t_{\beta}}\mathbb{P}\biggl(\bigcup_{v\in\mathbb{V}_{m}}\bigcup_{w\in\partial v}\Omega_{t}(w)\biggr)\,\mathrm{d}F(t), \tag{25}
\]

where the $\mathrm{o}(1)$ term covers the probability that the cluster at time $-\infty$, namely $\mathbb{V}_{m}$, determines $\operatorname{ULS}_{m}$, or that a cluster at threshold $t>t_{\beta}$ is of size at least $k_{m}:=\beta\log m$; $\partial v$ is the neighbor set of $v$; and $\Omega_{t}(w)$ is the event that:

  1. $k:=|Q_{t}(w)|$ satisfies $k\geq\beta\log m$,

  2. there exists a time $t^{\prime}\geq t$ such that $Q_{t}(w)$ still exists at time $t^{\prime}$, and

  3. $Y_{t}(k)-\mathbb{E}(Y_{t^{\prime}}(k))\geq x_{m}\sigma_{t^{\prime}}\sqrt{k}$, where $Y_{t}(k)$ is the sum of a $k$-sample from $F_{t}$.

Assume (briefly) that $\sigma_{t}$ is non-decreasing, and note that $\mu_{t}$ is automatically non-decreasing. Then as in the proofs of Lemmas 5 and 6, and using similar notation,

\begin{align*}
\sum_{v\in\mathbb{V}_{m}}\sum_{w\in\partial v}\mathbb{P}(\Omega_{t}(w)) &\leq \sum_{v\in\mathbb{V}_{m}}\sum_{w\in\partial v}\mathbb{P}\bigl(k:=|Q_{t}(w)|\geq\beta\log m,\ Y_{t}(k)-\mathbb{E}(Y_{t}(k))\geq x_{m}\sigma_{t}\sqrt{k}\bigr) \\
&\leq 2d\,\mathbb{E}(R_{t}(x_{m})),\qquad R_{t}(x):=\sum_{k\geq k_{m}}N_{t}(k)\bar{G}_{t}(k,x),
\end{align*}

where $N_{t}(k)$ is the number of $t$-open clusters of size $k$ and

\[
\bar{G}_{t}(k,x)=\mathbb{P}\bigl(Y_{t}(k)-\mathbb{E}(Y_{t}(k))\geq x\sigma_{t}\sqrt{k}\bigr).
\]

Therefore, by (25),

\begin{align*}
\mathbb{P}(\operatorname{ULS}_{m}\geq x_{m}) &\leq \mathrm{o}(1)+2d\biggl(\int_{-\infty}^{t_{c}-h}+\int_{t_{c}+h}^{t_{\beta}}\mathbb{E}(R_{t}(x_{m}))\,\mathrm{d}F(t)\biggr) \\
&\quad+F(t_{c}+h)-F(t_{c}-h)
\end{align*}

for any $h>0$. We bound $\mathbb{E}(R_{t}(x_{m}))$ as we did in the proofs of Lemmas 5 and 6. Explicitly, when $t_{c}+h\leq t\leq t_{\beta}$, we use Lemma 11 and (1) to get

\begin{align*}
\mathbb{E}(N_{t}(k)) &\leq \bigl(1-p(t)\bigr)^{2}\frac{k\mathrm{e}^{-k\zeta_{p(t)}}}{(1-\mathrm{e}^{-\zeta_{p(t)}})^{2}} \\
&\leq C(h,\beta)k\exp\bigl(-k\zeta_{p(t_{c}+h)}\bigr),\qquad C(h,\beta):=\frac{(1-p(t_{\beta}))^{2}}{(1-\mathrm{e}^{-\zeta_{p(t_{c}+h)}})^{2}}.
\end{align*}

We use Chernoff’s bound on $\bar{G}_{t}(k,x)$ to obtain

\[
\mathbb{E}(R_{t}(x_{m}))\leq C(h,\beta)(k_{m,t}^{h})^{2}\exp\bigl((1-A_{t})d\log m\bigr)+\exp\bigl(-hd\log(m)/2\bigr),
\]

where $k_{m,t}^{h}:=(1+h)(d/\zeta_{p(t)})\log m$,

\[
A_{t}:=\inf_{\beta<s<(1+h)/\zeta_{p(t)}}\bigl[s\Lambda_{t}^{*}\bigl(\mu+\sqrt{g/s}\bigr)+s\zeta_{p(t)}\bigr],
\]

as in (24), and the last term is the probability that there is a $t$-open cluster of size exceeding $k_{m,t}^{h}$. Note that $A_{t}>1$ for all $t_{c}+h\leq t\leq t_{\beta}$ because $g>G$. By continuity of $A_{t}$ on this compact interval, $A_{+}:=\inf\{A_{t}\colon\ t_{c}+h\leq t\leq t_{\beta}\}>1$. Hence, we have the following bound for all $t_{c}+h\leq t\leq t_{\beta}$,

\[
\mathbb{E}(R_{t}(x_{m}))\leq C(h,\beta)\bigl[(1+h)\bigl(d/\zeta_{p(t_{c}+h)}\bigr)\log m\bigr]^{2}m^{-(A_{+}-1)d}+\exp\bigl(-hd\log(m)/2\bigr).
\]

When $t\leq t_{c}-h$, we simply use the fact that

\[
\sum_{k}\mathbb{E}(N_{t}(k))\leq|\mathbb{V}_{m}|=m^{d},
\]

and bound $\bar{G}_{t}(k,x)$ in the same way. We get

\[
\mathbb{E}(R_{t}(x_{m}))\leq\exp\bigl((1-A_{t})d\log m\bigr),
\]

where

\[
A_{t}:=\inf_{\beta<s}s\Lambda_{t}^{*}\bigl(\mu+\sqrt{g/s}\bigr).
\]

Again, $A_{t}>1$ for $t<t_{c}-h$ and $A_{t}\to A_{-\infty}>1$ as $t\to-\infty$. Hence, by continuity of $A_{t}$, $A_{-}:=\inf\{A_{t}\colon\ t<t_{c}-h\}>1$, so that

\[
\mathbb{E}(R_{t}(x_{m}))\leq m^{-(A_{-}-1)d},
\]

valid for all $t<t_{c}-h$. Hence, the two integrals in the bound on $\mathbb{P}(\operatorname{ULS}_{m}\geq x_{m})$ obtained from (25) tend to zero as $m\to\infty$. We then let $h\to 0$ so that $F(t_{c}+h)-F(t_{c}-h)\to 0$, because $F$ is continuous at $t_{c}$.

Assume now that $F$ has no atoms on $(-\infty,t_{\beta}]$. Then $\sigma_{t}$ is continuous on $(-\infty,t_{\beta}]$, and in fact is uniformly continuous because $\sigma_{t}\to\sigma$ when $t\to-\infty$. Moreover, because $\sigma_{t}$ is positive on that interval ($\sigma_{t}=0$ would imply that $F_{t}$ is a point mass), $\underline{\sigma}:=\min\{\sigma_{t}\colon\ t\leq t_{\beta}\}>0$. Because $g>G$, we can find $c>0$ such that $g^{\prime}:=g(1-c)^{2}>G$, and also $\eta>0$ such that

\[
|\sigma_{s}-\sigma_{t}|\leq c\underline{\sigma},\qquad\mbox{if }|s-t|\leq\eta,\ s,t\leq t_{\beta}. \tag{27}
\]

Let $x_{m}^{\prime}=\sqrt{g^{\prime}\log m}$. We say that a cluster $Q$ scores at time $s$ if it exists at time $s$ and in addition

\[
|Q|\geq\beta\log m,\qquad\sum_{v\in Q}X_{v}\geq|Q|\mu_{s}+x_{m}\sigma_{s}\sqrt{|Q|}.
\]

Without loss of generality, assume that $t_{c}$ is not an integer multiple of $\eta$. Fix two neighbors $v,w\in\mathbb{V}_{m}$, and a time $t\leq t_{\beta}$. If $\Omega_{t}(w)$ occurs then either:

  (a) $Q_{t}(w)$ scores at some time $s\in[t,n_{t}\eta]$, where $n_{t}\in\mathbb{Z}$ satisfies $(n_{t}-1)\eta\leq t<n_{t}\eta$, or

  (b) there exists $n\geq n_{t}$ and $s\in[n\eta,(n+1)\eta)$ such that $Q_{n\eta}(w)$ scores at time $s$.

The latter possibility arises when $Q_{t}(w)$ scores at some time $s$ not belonging to the interval $[t,n_{t}\eta)$. Writing $[n\eta,(n+1)\eta)$ for the interval containing $s$, $Q_{t}(w)$ must exist at the start of this interval, which is to say that $Q_{t}(w)=Q_{n\eta}(w)$.
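The grid index $n_{t}$ used in this discretization is simply the index of the first multiple of $\eta$ strictly above $t$. A minimal sketch follows; the function name is hypothetical, and the computation uses floating-point arithmetic, so values of $t$ extremely close to a multiple of $\eta$ may be rounded.

```python
import math


def grid_index(t: float, eta: float) -> int:
    """Integer n_t such that (n_t - 1) * eta <= t < n_t * eta."""
    if eta <= 0.0:
        raise ValueError("eta must be positive")
    return math.floor(t / eta) + 1
```

For instance, with $\eta=0.5$ and $t=1.2$, the index is $3$, since $1.0\leq 1.2<1.5$.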

The probability of (a) is no larger than

\[
\mathbb{P}\bigl(k:=|Q_{t}(w)|\geq\beta\log m,\ \exists s\in[t,n_{t}\eta]\colon\ Y_{t}(k)/k\geq\mu_{s}+x_{m}\sigma_{s}/\sqrt{k}\bigr). \tag{28}
\]

By (27) and the fact that $\mu_{s}$ is non-decreasing,

\[
\mu_{s}+\frac{x_{m}\sigma_{s}}{\sqrt{k}}\geq\mu_{t}+\frac{x_{m}^{\prime}\sigma_{t}}{\sqrt{k}}, \tag{29}
\]

so that (28) is no greater than

\[
\mathbb{P}\bigl(k:=|Q_{t}(w)|\geq\beta\log m,\ Y_{t}(k)/k\geq\mu_{t}+x_{m}^{\prime}\sigma_{t}/\sqrt{k}\bigr). \tag{30}
\]

Arguing similarly, part (b) has probability no greater than

\[
\sum_{t/\eta<n<t_{\beta}/\eta}\mathbb{P}\bigl(k:=|Q_{t}(w)|\geq\beta\log m,\ Y_{n\eta}(k)/k\geq\mu_{n\eta}+x_{m}^{\prime}\sigma_{n\eta}/\sqrt{k}\bigr). \tag{31}
\]

We divide the integral in (25) as follows

\[
\int_{-\infty}^{t_{\beta}}=\int_{-\infty}^{-1/h}+\int_{-1/h}^{t_{c}-h}+\int_{t_{c}-h}^{t_{c}+h}+\int_{t_{c}+h}^{t_{\beta}}.
\]

The first integral is bounded by $F(-1/h)$ and the third integral by $F(t_{c}+h)-F(t_{c}-h)$, both terms vanishing as $h\to 0$. For the second and fourth integrals, we do exactly as before, separately for (30) and (31). For the latter, the sum has at most $(t_{\beta}+1/h)/\eta+1$ terms in the second integral and at most $(t_{\beta}-t_{c}-h)/\eta+1$ terms in the fourth integral.
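To make the object of this proof concrete, the sketch below computes a version of the upper level set scan statistic on a small grid. It is only an illustration under several assumptions that are not taken from the paper: a standard normal null distribution (so that $\mu_{t}$ and $\sigma_{t}$ are the moments of a truncated normal), four-neighbor connectivity on a two-dimensional box with free boundary, and thresholds restricted to $-\infty$ and the observed values. All function names are hypothetical.

```python
import math
from collections import deque


def truncated_normal_moments(t):
    """Mean and standard deviation of a standard normal conditioned on exceeding t."""
    if t == -math.inf:
        return 0.0, 1.0
    tail = 0.5 * math.erfc(t / math.sqrt(2.0))
    density = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    mean = density / tail
    return mean, math.sqrt(1.0 + t * mean - mean * mean)


def open_clusters(values, t):
    """Return the 4-connected clusters of cells with value > t, each as a list of values."""
    rows, cols = len(values), len(values[0])
    seen = [[False] * cols for _ in range(rows)]
    clusters = []
    for i in range(rows):
        for j in range(cols):
            if seen[i][j] or values[i][j] <= t:
                continue
            seen[i][j] = True
            queue, members = deque([(i, j)]), []
            while queue:
                a, b = queue.popleft()
                members.append(values[a][b])
                for u, v in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                    if 0 <= u < rows and 0 <= v < cols and not seen[u][v] and values[u][v] > t:
                        seen[u][v] = True
                        queue.append((u, v))
            clusters.append(members)
    return clusters


def uls_scan(values, k_min):
    """Maximum of sqrt(|Q|) * (mean(Q) - mu_t) / sigma_t over thresholds t and t-open clusters Q."""
    distinct = sorted({x for row in values for x in row})
    thresholds = [-math.inf] + distinct[:-1]  # the largest value leaves no open node
    best = -math.inf
    for t in thresholds:
        mu, sigma = truncated_normal_moments(t)
        for cluster in open_clusters(values, t):
            k = len(cluster)
            if k >= k_min:
                best = max(best, math.sqrt(k) * (sum(cluster) / k - mu) / sigma)
    return best
```

Recomputing the clusters from scratch at every threshold keeps the sketch short; an implementation that tracks how clusters split as $t$ increases, in the spirit of the coupled percolation processes described above, would avoid the repeated searches.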

.3.12 Proof of Theorem 8

By Lemma 7, $\operatorname{ULS}_{m}(k_{m})$ is of order at most $\sqrt{\log m}$ under the null. Now consider the alternative with anomalous cluster $K$. If $0<(\alpha-1/2)d<\alpha/\nu$, consider the contribution of the largest open cluster at supercritical threshold $t$ and reason as in the proof of Theorem 7. Otherwise, consider the contribution of the largest open cluster at a threshold $t_{m}$ such that $p_{c}-p_{0}(t_{m})\asymp m^{-\lambda/\alpha}$. As in Theorem 4, the largest open cluster will be comparable in size to, and occupy a substantial portion of, $K$. Reasoning again as in the proof of Theorem 7, the contribution is of order $m^{\alpha d/2}\theta_{m}\geq m^{\alpha/\nu}\theta_{m}\geq m^{\alpha/\nu-\lambda}$, which increases as a positive power of $m$.

Appendix B: The scan statistic as the GLR

We show that the simple scan statistic defined in (1) approximates the scan statistic of Kulldorff [29], which is strictly speaking the GLR, defined as follows. The log-likelihood under $\mathbb{H}^{m}_{1,K}$ is given by

\[\operatorname{loglik}(K,\theta,\theta_{0}):=|K|\bigl(\theta\bar{X}_{K}-\log\varphi(\theta)\bigr)+|K^{c}|\bigl(\theta_{0}\bar{X}_{K^{c}}-\log\varphi(\theta_{0})\bigr).\]

Assuming $\theta$ and $\theta_{0}$ are both unknown, the log GLR is defined as

\[\max_{K\in\mathcal{K}_{m}}\sup_{\theta>\theta_{0}}\operatorname{loglik}(K,\theta,\theta_{0})-\sup_{\theta_{0}}\operatorname{loglik}(\mathbb{V}_{m},\theta_{0},\theta_{0}),\]

which is equal to

\[\max_{K\in\mathcal{K}_{m}}\bigl[|K|\Lambda^{*}(\bar{X}_{K})+|K^{c}|\Lambda^{*}(\bar{X}_{K^{c}})-|\mathbb{V}_{m}|\Lambda^{*}(\bar{X}_{\mathbb{V}_{m}})\bigr]_{+}. \tag{B.1}\]

(The subscript + denotes the positive part.)
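The passage from the log GLR to (B.1) can be spelled out as below. This is a sketch under two assumptions that are not stated in this appendix: that $\Lambda^{*}(x):=\sup_{\theta}\{\theta x-\log\varphi(\theta)\}$ is the Legendre transform of $\log\varphi$, and that the mean $(\log\varphi)'(\theta)$ is increasing in $\theta$, as in a regular one-parameter exponential family.

```latex
% Sketch; assumes \Lambda^*(x) = \sup_\theta \{\theta x - \log\varphi(\theta)\}.
\begin{align*}
% Alternative, with the constraint \theta > \theta_0 inactive: the suprema decouple.
\sup_{\theta,\theta_0}\operatorname{loglik}(K,\theta,\theta_0)
  &= |K|\sup_{\theta}\bigl(\theta\bar{X}_{K}-\log\varphi(\theta)\bigr)
   + |K^{c}|\sup_{\theta_0}\bigl(\theta_0\bar{X}_{K^{c}}-\log\varphi(\theta_0)\bigr) \\
  &= |K|\Lambda^{*}(\bar{X}_{K}) + |K^{c}|\Lambda^{*}(\bar{X}_{K^{c}}), \\
% Null: K = \mathbb{V}_m, so K^c is empty and one parameter fits all nodes.
\sup_{\theta_0}\operatorname{loglik}(\mathbb{V}_{m},\theta_0,\theta_0)
  &= |\mathbb{V}_{m}|\sup_{\theta_0}\bigl(\theta_0\bar{X}_{\mathbb{V}_{m}}-\log\varphi(\theta_0)\bigr)
   = |\mathbb{V}_{m}|\Lambda^{*}(\bar{X}_{\mathbb{V}_{m}}).
\end{align*}
```

Subtracting the null term from the alternative term gives the bracket in (B.1). When $\bar{X}_{K}<\bar{X}_{K^{c}}$, the unconstrained maximizers violate $\theta>\theta_{0}$; by concavity of the log-likelihood, the constrained supremum is then approached on the boundary $\theta=\theta_{0}$, where it coincides with the null term, so that such a cluster contributes zero. This is the role played by the positive part.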

Under the normal location model, $\Lambda^{*}(x)=x^{2}/2$ and (B.1) is equal, up to the constant factor $1/2$, to

\[\max_{K\in\mathcal{K}_{m}}\frac{|\mathbb{V}_{m}||K|}{|\mathbb{V}_{m}|-|K|}(\bar{X}_{K}-\bar{X}_{\mathbb{V}_{m}})_{+}^{2}.\]

(We used the fact that $\bar{X}_{K}\geq\bar{X}_{K^{c}}\Leftrightarrow\bar{X}_{K}\geq\bar{X}_{\mathbb{V}_{m}}$.) If $k_{m}^{+}:=\max\{|K|\colon K\in\mathcal{K}_{m}\}$ satisfies $k_{m}^{+}/|\mathbb{V}_{m}|\to 0$, which is the case in our examples, the fraction above is equal to $|K|(1+\mathrm{O}(k^{+}_{m}/|\mathbb{V}_{m}|))$. Moreover, knowing that there is always a cluster $K$ such that $\bar{X}_{K}\geq\bar{X}_{\mathbb{V}_{m}}$, we get that the square root of (B.1) is, up to a constant factor, approximately equal to

\[\max_{K\in\mathcal{K}_{m}}\sqrt{|K|}(\bar{X}_{K}-\bar{X}_{\mathbb{V}_{m}}), \tag{B.2}\]

which is the version of (1) when $\mu_{0}$ is unknown. (Note that $\bar{X}_{\mathbb{V}_{m}}=\mu_{0}+\mathrm{O}(|\mathbb{V}_{m}|)^{-1/2}$, by the central limit theorem, so that (B.2) is within $\mathrm{O}(k^{+}_{m}/|\mathbb{V}_{m}|)^{1/2}$ of (1).) This approximation is actually valid more generally, at least in a way that suffices for the asymptotic analysis that we perform in this work. Indeed, with $\sigma_{0}^{2}=\operatorname{Var}_{0}(X_{v})$, we have $\Lambda^{*}(x)=(x-\mu_{0})^{2}/(2\sigma_{0}^{2})+\mathrm{O}(x-\mu_{0})^{3}$ in the neighborhood of $\mu_{0}$. Assuming that $k_{m}^{-}:=\min\{|K|\colon K\in\mathcal{K}_{m}\}$ satisfies $k_{m}^{-}\to\infty$, which is the case in our examples, the approximation of the square root of (B.1) by (B.2) is valid under the null, because $\bar{X}_{K}=\mu_{0}+\mathrm{O}(k_{m}^{-})^{-1/2}$ and $\bar{X}_{K^{c}},\bar{X}_{\mathbb{V}_{m}}=\mu_{0}+\mathrm{O}(|\mathbb{V}_{m}|)^{-1/2}$, by the central limit theorem and the fact that $k_{m}^{-}\to\infty$ and $k_{m}^{+}/|\mathbb{V}_{m}|\to 0$. The same applies under the alternative if $\theta_{m}\to 0$, so that $\mu_{\theta_{m}}:=\mathbb{E}_{\theta_{m}}(X_{v})\to\mu_{0}$, and therefore so does $\bar{X}_{K}$ for any $K\in\mathcal{K}_{m}$. When $\theta_{m}$ is bounded away from 0, the two statistics, the square root of (B.1) and (B.2), are both of order $\sqrt{|K|}$, where $K$ denotes the cluster under the alternative (or in the case of the $\operatorname{ULS}$ scan, the largest open cluster within the anomalous cluster). Taken together, these findings are sufficient to allow us to conclude that the tests based on (B.1) and (1) behave similarly.
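The computations of this appendix can be checked numerically. The following Python sketch does so under illustrative assumptions that are not part of the paper: a one-dimensional lattice of 2000 standard normal observations, a single hypothetical interval cluster `xs[lo:hi]`, a fixed seed, and function names chosen here for readability. With $\Lambda^{*}(x)=x^{2}/2$, the bracket in (B.1) equals one half of the displayed fraction formula (a constant that does not affect the maximizing cluster), and the square root of that fraction formula differs from the summand of (B.2) by the factor $\sqrt{|\mathbb{V}_{m}|/(|\mathbb{V}_{m}|-|K|)}$. The last function uses the Poisson rate function, a standard example chosen here, to illustrate the quadratic expansion of $\Lambda^{*}$ near $\mu_{0}$.

```python
import math
import random


def rate_normal(x):
    """Rate function Lambda*(x) = x^2 / 2 of the standard normal null."""
    return 0.5 * x * x


def means(xs, lo, hi):
    """Return (|V|, |K|, mean over K, mean over K^c, mean over V) for K = xs[lo:hi]."""
    n, k = len(xs), hi - lo
    s_in, s_all = sum(xs[lo:hi]), sum(xs)
    return n, k, s_in / k, (s_all - s_in) / (n - k), s_all / n


def glr_bracket(xs, lo, hi, rate=rate_normal):
    """Bracket of (B.1) for one candidate cluster K, before the max over K."""
    n, k, m_in, m_out, m_all = means(xs, lo, hi)
    return k * rate(m_in) + (n - k) * rate(m_out) - n * rate(m_all)


def fraction_form(xs, lo, hi):
    """|V||K| / (|V| - |K|) * (mean_K - mean_V)^2, the displayed fraction formula."""
    n, k, m_in, _, m_all = means(xs, lo, hi)
    return n * k / (n - k) * (m_in - m_all) ** 2


def simple_scan_term(xs, lo, hi):
    """sqrt(|K|) * (mean_K - mean_V), the quantity maximized in (B.2)."""
    n, k, m_in, _, m_all = means(xs, lo, hi)
    return math.sqrt(k) * (m_in - m_all)


def rate_poisson(x, mu0):
    """Rate function of a Poisson(mu0) null: x log(x / mu0) - x + mu0, for x > 0."""
    return x * math.log(x / mu0) - x + mu0


rng = random.Random(0)  # fixed seed; the data are illustrative only
xs = [rng.gauss(0.0, 1.0) for _ in range(2000)]
lo, hi = 500, 520  # hypothetical interval cluster with |K| = 20

# Bracket of (B.1) versus one half of the fraction formula (should agree).
print(glr_bracket(xs, lo, hi), 0.5 * fraction_form(xs, lo, hi))
# Square root of the fraction formula versus the (B.2) summand (ratio near 1).
print(math.sqrt(fraction_form(xs, lo, hi)), abs(simple_scan_term(xs, lo, hi)))
# Poisson rate function versus its quadratic approximation near mu0 = 4.
print(rate_poisson(4.1, 4.0), 0.1 ** 2 / (2 * 4.0))
```

Scanning over all intervals instead of a single one only requires wrapping these functions in a loop over `(lo, hi)`; the single-cluster version is kept here so that each identity can be read off directly.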

Acknowledgements

The authors thank Ganapati Patil for providing additional references on the upper level set scan statistic, and Mikhail Langovoy for alerting the authors to his work on the LOC test. They are grateful to Thierry Bodineau and Raphaël Cerf for their remarks on the second-largest cluster in supercritical percolation, and to anonymous referees for helping them improve the presentation of the paper. EAC was partially supported by grants from the National Science Foundation (DMS-06-03890) and the Office of Naval Research (N00014-09-1-0258), as well as a Hellman Fellowship. GRG was partially supported by the Engineering and Physical Sciences Research Council under Grant EP/103372X/1.

References

  • [1] Arias-Castro, E., Candès, E.J. & Durand, A. (2011). Detection of an anomalous cluster in a network. Ann. Statist. 39 278–304. doi:10.1214/10-AOS839. MR2797847.
  • [2] Arias-Castro, E., Candès, E.J., Helgason, H. & Zeitouni, O. (2008). Searching for a trail of evidence in a maze. Ann. Statist. 36 1726–1757. doi:10.1214/07-AOS526. MR2435454.
  • [3] Arias-Castro, E., Donoho, D.L. & Huo, X. (2005). Near-optimal detection of geometric objects by fast multiscale methods. IEEE Trans. Inform. Theory 51 2402–2425. doi:10.1109/TIT.2005.850056. MR2246369.
  • [4] Balakrishnan, N. & Koutras, M.V. (2002). Runs and Scans with Applications. Wiley Series in Probability and Statistics. New York: Wiley-Interscience. MR1882476.
  • [5] Bodineau, T., Ioffe, D. & Velenik, Y. (2001). Winterbottom construction for finite range ferromagnetic models: An $\mathbb{L}_{1}$-approach. J. Stat. Phys. 105 93–131. doi:10.1023/A:1012277926007. MR1861201.
  • [6] Borgs, C., Chayes, J.T., Kesten, H. & Spencer, J. (2001). The birth of the infinite cluster: Finite-size scaling in percolation. Comm. Math. Phys. 224 153–204. Dedicated to Joel L. Lebowitz. doi:10.1007/s002200100521. MR1868996.
  • [7] Brown, L.D. (1986). Fundamentals of Statistical Exponential Families with Applications in Statistical Decision Theory. Institute of Mathematical Statistics Lecture Notes-Monograph Series 9. Hayward, CA: IMS. MR0882001.
  • [8] Cerf, R. (2006). The Wulff Crystal in Ising and Percolation Models. Lecture Notes in Math. 1878. Berlin: Springer. Lectures from the 34th Summer School on Probability Theory held in Saint-Flour, July 6–24, 2004, with a foreword by Jean Picard. MR2241754.
  • [9] Chen, J. & Huo, X. (2006). Distribution of the length of the longest significance run on a Bernoulli net and its applications. J. Amer. Statist. Assoc. 101 321–331. doi:10.1198/016214505000000574. MR2268049.
  • [10] Cormen, T.H., Leiserson, C.E., Rivest, R.L. & Stein, C. (2009). Introduction to Algorithms, 3rd ed. Cambridge, MA: MIT Press. MR2572804.
  • [11] Csardi, G. The igraph library. Available at http://igraph.sourceforge.net.
  • [12] Culler, D., Estrin, D. & Srivastava, M. (2004). Overview of sensor networks. IEEE Computer 37 41–49.
  • [13] DasGupta, B., Hespanha, J.P., Riehl, J. & Sontag, E. (2006). Honey-pot constrained searching with local sensory information. Nonlinear Anal. 65 1773–1793. doi:10.1016/j.na.2005.10.049. MR2252129.
  • [14] Davies, P.L., Langovoy, M. & Wittich, O. (2010). Detection of objects in noisy images based on percolation theory. Unpublished manuscript.
  • [15] Dembo, A. & Zeitouni, O. (2010). Large Deviations Techniques and Applications. Stochastic Modelling and Applied Probability 38. Berlin: Springer. Corrected reprint of the second (1998) edition. doi:10.1007/978-3-642-03311-7. MR2571413.
  • [16] Duczmal, L. & Assunção, R. (2004). A simulated annealing strategy for the detection of arbitrarily shaped spatial clusters. Comput. Statist. Data Anal. 45 269–286. doi:10.1016/S0167-9473(02)00302-X. MR2045632.
  • [17] Erdős, P. & Rényi, A. (1970). On a new law of large numbers. J. Analyse Math. 23 103–111. MR0272026.
  • [18] Falconer, K.J. & Grimmett, G.R. (1992). On the geometry of random Cantor sets and fractal percolation. J. Theoret. Probab. 5 465–485. doi:10.1007/BF01060430. MR1176432.
  • [19] Feng, X., Deng, Y. & Blöte, H.W.J. (2008). Percolation transitions in two dimensions. Phys. Rev. E 78 031136.
  • [20] Geman, D. & Jedynak, B. (1996). An active testing model for tracking roads in satellite images. IEEE Trans. Pattern Anal. Mach. Intell. 18 1–14.
  • [21] Grimmett, G. (1999). Percolation, 2nd ed. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences] 321. Berlin: Springer. MR1707339.
  • [22] Grimmett, G.R. (1985). The largest components in a random lattice. Studia Sci. Math. Hungar. 20 325–331. MR0886037.
  • [23] Hara, T., van der Hofstad, R. & Slade, G. (2003). Critical two-point functions and the lace expansion for spread-out high-dimensional percolation and related models. Ann. Probab. 31 349–408. doi:10.1214/aop/1046294314. MR1959796.
  • [24] Heffernan, R., Mostashari, F., Das, D., Karpati, A., Kulldorff, M. & Weiss, D. (2004). Syndromic surveillance in public health practice, New York City. Emerging Infect. Dis. 10 858–864. PMID:15200820.
  • [25] Hills, R. (2001). Sensing for danger. Science & Technology Review 11–17.
  • [26] Hobolth, A., Pedersen, J. & Jensen, E.B.V. (2002). A deformable template model, with special reference to elliptical templates. J. Math. Imaging Vision 17 131–137. Special issue on statistics of shapes and textures. doi:10.1023/A:1020681419750. MR1950865.
  • [27] Jain, A.K., Zhong, Y. & Dubuisson-Jolly, M.P. (1998). Deformable template models: A review. Signal Processing 71 109–129.
  • [28] Kesten, H. & Zhang, Y. (1990). The probability of a large finite cluster in supercritical Bernoulli percolation. Ann. Probab. 18 537–555. MR1055419.
  • [29] Kulldorff, M. (1997). A spatial scan statistic. Comm. Statist. Theory Methods 26 1481–1496. doi:10.1080/03610929708831995. MR1456844.
  • [30] Kulldorff, M. (2001). Prospective time periodic geographical disease surveillance using a scan statistic. J. Roy. Statist. Soc. Ser. A 164 61–72. doi:10.1111/1467-985X.00186. MR1819022.
  • [31] Kulldorff, M., Fang, Z. & Walsh, S.J. (2003). A tree-based scan statistic for database disease surveillance. Biometrics 59 323–331. doi:10.1111/1541-0420.00039. MR1987399.
  • [32] Kulldorff, M., Huang, L., Pickle, L. & Duczmal, L. (2006). An elliptic spatial scan statistic. Stat. Med. 25 3929–3943. doi:10.1002/sim.2490. MR2297401.
  • [33] Kulldorff, M. & Nagarwalla, N. (1995). Spatial disease clusters: Detection and inference. Stat. Med. 14 799–810. PMID:7644860.
  • [34] Langovoy, M. & Wittich, O. (2011). Multiple testing, uncertainty and realistic pictures. Technical report, EURANDOM.
  • [35] Li, D., Wong, K.D., Hu, Y.H. & Sayeed, A.M. (2002). Detection, classification, and tracking of targets. Signal Processing Magazine, IEEE 19 17–29.
  • [36] McInerney, T. & Terzopoulos, D. (1996). Deformable models in medical image analysis: A survey. Med. Image Anal. 1 91–108. PMID:9873923.
  • [37] Patil, G.P., Balbus, J., Biging, G., Jaja, J., Myers, W.L. & Taillie, C. (2004). Multiscale advanced raster map analysis system: Definition, design and development. Environ. Ecol. Stat. 11 113–138. doi:10.1023/B:EEST.0000027205.77490.8c. MR2086391.
  • [38] Patil, G.P., Joshi, S.W. & Koli, R.E. (2010). PULSE, progressive upper level set scan statistic for geospatial hotspot detection. Environ. Ecol. Stat. 17 149–182. doi:10.1007/s10651-010-0140-1. MR2725778.
  • [39] Patil, G.P., Modarres, R., Myers, W.L. & Patankar, P. (2006). Spatially constrained clustering and upper level set scan hotspot detection in surveillance geoinformatics. Environ. Ecol. Stat. 13 365–377. doi:10.1007/s10651-006-0017-5. MR2297368.
  • [40] Patil, G.P., Modarres, R. & Patankar, P. (2005). The ULS software, version 1.0. Center for Statistical Ecology and Environmental Statistics, Dept. Statistics, Pennsylvania State Univ.
  • [41] Patil, G.P. & Taillie, C. (2003). Geographic and network surveillance via scan statistics for critical area detection. Statist. Sci. 18 457–465. doi:10.1214/ss/1081443229. MR2109372.
  • [42] Patil, G.P. & Taillie, C. (2004). Upper level set scan statistic for detecting arbitrarily shaped hotspots. Environ. Ecol. Stat. 11 183–197. doi:10.1023/B:EEST.0000027208.48919.7e. MR2086394.
  • [43] Penrose, M.D. (2001). A central limit theorem with applications to percolation, epidemics and Boolean models. Ann. Probab. 29 1515–1546. doi:10.1214/aop/1015345760. MR1880230.
  • [44] Penrose, M.D. & Pisztora, A. (1996). Large deviations for discrete and continuous percolation. Adv. in Appl. Probab. 28 29–52. doi:10.2307/1427912. MR1372330.
  • [45] Perone Pacifico, M., Genovese, C., Verdinelli, I. & Wasserman, L. (2004). False discovery control for random fields. J. Amer. Statist. Assoc. 99 1002–1014. doi:10.1198/0162145000001655. MR2109490.
  • [46] Pisztora, A. (1996). Surface order large deviations for Ising, Potts and percolation models. Probab. Theory Related Fields 104 427–466. doi:10.1007/BF01198161. MR1384040.
  • [47] Pozo, D., Olmo, F.J. & Alados-Arboledas, L. (1997). Fire detection and growth monitoring using a multitemporal technique on AVHRR mid-infrared and thermal channels. Remote Sensing of Environment 60 111–120.
  • [48] R Core Team. The R project for statistical computing. Available at http://www.r-project.org.
  • [49] Rotz, L.D. & Hughes, J.M. (2004). Advances in detecting and responding to threats from bioterrorism and emerging infectious disease. Nat. Med. 10 S130–S136. doi:10.1038/nm1152. PMID:15577931.
  • [50] Smirnov, S. & Werner, W. (2001). Critical exponents for two-dimensional percolation. Math. Res. Lett. 8 729–744. MR1879816.
  • [51] Tango, T. & Takahashi, K. (2005). A flexibly shaped spatial scan statistic for detecting clusters. Int. J. Health Geogr. 4 11. doi:10.1186/1476-072X-4-11. PMID:15904524.
  • [52] van der Hofstad, R. & Redig, F. (2006). Maximal clusters in non-critical percolation and related models. J. Stat. Phys. 122 671–703. doi:10.1007/s10955-005-8012-z. MR2213947.
  • [53] Wagner, M.M., Tsui, F.C., Espino, J.U., Dato, V.M., Sittig, D.F., Caruana, R.A., McGinnis, L.F., Deerfield, D.W., Druzdzel, M.J. & Fridsma, D.B. (2001). The emerging science of very early detection of disease outbreaks. J. Public Health Manag. Pract. 7 51–59. PMID:11710168.
  • [54] Walther, G. (2010). Optimal and fast detection of spatial clusters with scan statistics. Ann. Statist. 38 1010–1033. doi:10.1214/09-AOS732. MR2604703.