
An approximation to peak detection power using Gaussian random field theory

Yu Zhao1    Dan Cheng3    Armin Schwartzman1,2
(1Division of Biostatistics,
Herbert Wertheim School of Public Health and Human Longevity Science,
University of California San Diego, 9500 Gilman Dr., La Jolla, CA 92093, USA
2Halıcıoğlu Data Science Institute,
University of California San Diego, 9500 Gilman Dr., La Jolla, CA 92093, USA
3School of Mathematical and Statistical Sciences,
Arizona State University, 900 S. Palm Walk, Tempe, AZ 85281, USA
)
Abstract

We study power approximation formulas for peak detection using Gaussian random field theory. The approximation, based on the expected number of local maxima above the threshold $u$, $\mathbb{E}[M_u]$, is proved to work well under three asymptotic scenarios: small domain, large threshold, and sharp signal. An adjusted version of $\mathbb{E}[M_u]$ is also proposed to improve accuracy when the expected number of local maxima $\mathbb{E}[M_{-\infty}]$ exceeds 1.

Cheng and Schwartzman (2018) developed explicit formulas for $\mathbb{E}[M_u]$ for smooth isotropic Gaussian random fields with zero mean. In this paper, these formulas are extended to allow for rotationally symmetric mean functions, making them suitable for power calculations. We also apply our formulas to 2D and 3D simulated datasets; the 3D simulation is based on a group analysis of fMRI data from the Human Connectome Project, in order to measure performance in a realistic setting.

Key words: Power calculations, peak detection, Gaussian random field, image analysis.

1 Introduction

Detection of peaks (local maxima) is an important topic in image analysis. For example, a fundamental goal in fMRI analysis is to identify the local hotspots of brain activity (see, for example, Genovese et al., 2002 and Heller et al., 2006), which are typically captured by peaks in the fMRI signal. The detection of such peaks can be posed as a statistical testing problem intended to test whether the underlying signal has a peak at a given location. This is challenging because such tests are conducted only at locations of observed peaks, which depend on the data. Therefore, the height distribution of the observed peak is conditional on a peak being observed at that location. This is a nonstandard problem. Solutions exist using random field theory (RFT). RFT is a statistical framework that can be used to perform topological inference and modeling. RFT-based peak detection has been studied in Cheng and Schwartzman (2017) and Schwartzman and Telschow (2019), which provide the peak height distribution for isotropic noise under the complete null hypothesis of no signal anywhere.

In general, for any statistical testing problem, accurate power calculations help researchers decide the minimum sample size required for an informative test, and thus reduce cost. Power calculation formulas exist for common univariate tests, such as z-tests and t-tests. However, particular challenges arise when we perform power calculations in peak detection settings. Due to the nature of imaging data, the number and location of the signal peaks are unknown. Moreover, the power is affected by other spatial aspects of the problem, such as the shape of the peak function and the spatial autocorrelation of the noise. Given these difficulties, deriving a power formula for peak detection requires some extra effort.

A formal definition of power in peak detection is necessary to perform power calculations. In Cheng and Schwartzman (2017) and Durnez et al. (2016), the authors explored approaches to control the false discovery rate (FDR). For the entire domain, average peakwise power, i.e. power averaged over all non-null voxels, is a natural choice for these approaches. For a local domain where a single peak exists, the power can be defined as the probability of successfully detecting that peak. Following this idea, we describe the null and alternative hypotheses and the definition of detection power. We do so informally here for didactic purposes and present formal definitions in Section 2.

Consider a local domain where a single peak may exist, and consider the hypotheses

$$\begin{aligned}
H_0&:\ \text{``the signal is equal to 0 in the local domain''}\quad\text{vs.}\\
H_1&:\ \text{``the signal has at least one positive peak in the local domain.''}
\end{aligned}$$

Suppose we observe a random field to be used as a test statistic at every location, typically as the result of statistical modeling of the data. For a fixed threshold $u$, the existence of observed peaks with height greater than $u$ would lead to rejecting the null hypothesis. Therefore, we define the type I error and power as the probability that there exists at least one local maximum above $u$ under $H_0$ and $H_1$, respectively:

$$\begin{aligned}
\text{Type I error:}&\quad\mathbb{P}\{\exists\text{ a peak in the local domain with height}>u\text{ when }H_0\text{ is true}\}\\
\text{Power:}&\quad\mathbb{P}\{\exists\text{ a peak in the local domain with height}>u\text{ when }H_1\text{ is true}\}
\end{aligned}$$ (1)

Formulas for the type I error have been developed for stationary fields in 1D and isotropic fields in 2D and 3D (Cheng and Schwartzman, 2015, Cheng and Schwartzman, 2017 and Cheng and Schwartzman, 2018). However, there is no formula to calculate power. In order to get an appropriate estimate of power, we need to know the peak height distribution for non-centered random fields (those whose mean function is not 0). Generally speaking, it is very difficult to calculate the peak height distribution, especially when the random field has non-zero mean. Durnez et al. (2016) suggest using a Gaussian distribution to describe the non-null peaks and a truncated Gaussian distribution to approximate the overshoot distribution. This approach is easy to implement but not very accurate, because the peak height distribution is in reality always skewed and not close to any Gaussian distribution.

In this article, we propose to approximate the probability of an observed peak exceeding the detection threshold $u$ by calculating the expected number of peaks above the threshold $u$. We show that the approximation, which is also an upper bound, works well under certain scenarios. For the entire domain, we can approximate the average peakwise power by taking the arithmetic mean of the approximation proposed in this paper over non-null voxels.

The proposed approximation makes the problem more tractable, but in general it does not have an explicit form. In order to make it applicable in practice, we further simplify the formula under the isotropy assumption and derive its explicit form in 1D, 2D, and 3D. The explicit results are validated through 2D and 3D computer simulations carried out in MATLAB. The simulation covers multiple scenarios obtained by modifying the parameters used to generate the data, and the performance of the power approximation and its conservative adjustment under these scenarios is discussed.

Finally, to assess the real-data performance of our power approximation method, we apply it to a 3D simulation induced by a real brain imaging dataset, where the parameters are estimated from the Human Connectome Project (Van Essen et al., 2012) fMRI data. By testing the method in a realistic setting, we also demonstrate how effect size and other parameters affect the power.

The paper is organized as follows. We first present in Section 2 the problem setup and theoretical results in certain scenarios. In Section 3, we derive the explicit formulas under isotropy. Simulations in 2D are conducted in Section 4. Details regarding how to apply our formula in an application setting are discussed in Section 5. The methodology is applied to a 3D real dataset in Section 6.

2 Power Approximation

2.1 Setup

Let $Y(s)=\sigma(s)Z(s)+\mu(s)$, where $Z=\{Z(s),s\in D\}$, representing the noise, is a centered (zero-mean) smooth unit-variance Gaussian random field on an $N$-dimensional non-empty domain $D\subset\mathbb{R}^N$, $\sigma(s)$ is the standard deviation of the noise, and $\mu(s)$ is the mean function. Let $X(s)=Y(s)/\sigma(s)=Z(s)+\theta(s)$, where the ratio $\theta(s)=\mu(s)/\sigma(s)$ is the standardized mean function, which we assume to be $C^2$. Here $C^3$ is a sufficient smoothness condition for $Z$, as clarified in Assumption 1 below.

Let

$$\begin{split}X_i(s)&=\frac{\partial X(s)}{\partial s_i},\quad\nabla X(s)=(X_1(s),\ldots,X_N(s)),\\ X_{ij}(s)&=\frac{\partial^2X(s)}{\partial s_i\partial s_j},\quad\nabla^2X(s)=(X_{ij}(s))_{1\leq i,j\leq N},\\ Z_i(s)&=\frac{\partial Z(s)}{\partial s_i},\quad\nabla Z(s)=(Z_1(s),\ldots,Z_N(s)),\\ Z_{ij}(s)&=\frac{\partial^2Z(s)}{\partial s_i\partial s_j},\quad\nabla^2Z(s)=(Z_{ij}(s))_{1\leq i,j\leq N}.\end{split}$$

We will make use of the following assumptions:

Assumption 1.

$Z\in C^2(D)$ almost surely and its second derivatives satisfy the mean-square Hölder condition: for any $s_0\in D$, there exist positive constants $L$, $\eta$ and $\delta$ such that

$$\mathbb{E}(Z_{ij}(s)-Z_{ij}(t))^2\leq L^2\|s-t\|^{2\eta},\quad\forall\, t,s\in U_{s_0}(\delta),\ i,j=1,\ldots,N,$$

where $U_{s_0}(\delta)=s_0\oplus(-\delta/2,\delta/2)^N$ is the $N$-dimensional open cube of side length $\delta$ centered at $s_0$. This condition is satisfied, for example, if $Z$ is $C^3(D)$.

Assumption 2.

For every pair $(t,s)\in D\times D$ with $s\neq t$, the Gaussian random vector

$$(Z(s),\,\nabla Z(s),\,Z_{ij}(s),\,Z(t),\,\nabla Z(t),\,Z_{ij}(t),\ 1\leq i\leq j\leq N)$$

is non-degenerate, i.e. its covariance matrix has full rank.

2.2 Peak detection

Following the notation in the problem setup, the null and alternative hypotheses can be written as:

$$\begin{aligned}
H_0&:\ \mu(s)=0\ \text{for all}\ s\in D\quad\text{vs.}\\
H_1&:\ \mu(s)>0,\ \nabla\mu(s)=0,\ \nabla^2\mu(s)\prec 0\ \text{for some}\ s\in D
\end{aligned}$$

The mean function $\mu(s)$ is not directly observed, so the hypothesis is tested based on the peak height of $X(s)$. For a peak detection procedure that aims to test this hypothesis, a threshold $u$ for the peak height of $X(s)$ needs to be set in advance. If a local maximum with height greater than $u$ is observed, we reject the null hypothesis due to the strong evidence against it. The probability that a peak of $X$ exceeds $u$,

$$\mathbb{P}\left(\exists\ s\in D\ \text{s.t.}\ X(s)>u\ \middle|\ \nabla X(s)=0\text{ and }\nabla^2X(s)\prec 0\right),$$ (2)

is the type I error under $H_0$ and the power under $H_1$. The threshold $u$ can be obtained from the peak height distribution under $H_0$. A formula for the peak height distribution of smooth isotropic Gaussian random fields was derived in Cheng and Schwartzman (2018), and it can also be obtained as a special case of the formulas presented in this paper. Usually, $u$ is set to a quantile of the null distribution of peak height so as to maintain the nominal type I error $\alpha$. More details about selecting the threshold are discussed in the real data example. Selecting $u$ is not the main focus of this paper, and our method can be applied to any choice of $u$.
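For concreteness, once peak heights under $H_0$ can be simulated (or their null distribution evaluated), the threshold can be taken as a null quantile. A minimal Python sketch (our illustration, not part of the paper's method):

```python
import numpy as np

def null_threshold(null_peak_heights, alpha=0.05):
    # u = (1 - alpha) quantile of peak heights under H0, so that a peak
    # exceeding u has peakwise type I error alpha.
    return float(np.quantile(null_peak_heights, 1.0 - alpha))
```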

2.3 Power approximation

Let $M_u$ be the number of local maxima of the random field $X$ above $u$ over the local domain $D$. The power defined in (2) can be represented as $\mathbb{P}[M_u\geq 1]$. We call this the power function, seen as a function of the threshold $u$. Note that

$$\mathbb{P}[M_u\geq 1]=\sum_{k=1}^\infty\mathbb{P}[M_u=k]\leq\sum_{k=1}^\infty k\,\mathbb{P}[M_u=k]=\mathbb{E}[M_u].$$ (3)

On the other hand,

$$\mathbb{E}[M_u]-\mathbb{P}[M_u\geq 1]=\sum_{k=2}^\infty(k-1)\mathbb{P}[M_u=k]\leq\frac12\sum_{k=2}^\infty k(k-1)\mathbb{P}[M_u=k]=\frac12\mathbb{E}[M_u(M_u-1)].$$ (4)

Thus, we have

$$\mathbb{E}[M_u]-\frac12\mathbb{E}[M_u(M_u-1)]\leq\mathbb{P}[M_u\geq 1]\leq\mathbb{E}[M_u].$$ (5)

This inequality tells us that for any fixed $u$, the power is bounded within an interval of length $\mathbb{E}[M_u(M_u-1)]/2$. Thus, $\mathbb{E}[M_u]$ is a good approximation of the power if one of the two conditions below is satisfied:

  1. The factorial moment $\mathbb{E}[M_u(M_u-1)]$ converges to 0 and $\mathbb{E}[M_u]$ does not.

  2. They both converge to 0 and $\mathbb{E}[M_u(M_u-1)]$ converges faster than $\mathbb{E}[M_u]$.

The convergence above refers to conditions on the signal and noise parameters. In the rest of this section, we introduce four interesting results. The first result can be useful for simplifying the power function and the other three results give different scenarios where one of the conditions above holds.

2.4 Adjusted $\mathbb{E}[M_u]$

We have provided justification through (5) for using $\mathbb{E}[M_u]$ to approximate power. However, $\mathbb{E}[M_u]$ alone might not be sufficient for power approximation, since it only gives an upper bound. Also, unlike power, $\mathbb{E}[M_u]$ sometimes exceeds 1. To correct for this, we define the adjusted $\mathbb{E}[M_u]$ as

$$\mathbb{E}[M_u]_{\rm adj}=\mathbb{E}[M_u]/\max(1,\mathbb{E}[M_{-\infty}]).$$ (6)

The adjusted $\mathbb{E}[M_u]$ is the same as $\mathbb{E}[M_u]$ when the expected number of local maxima $\mathbb{E}[M_{-\infty}]$ is less than or equal to 1. When $\mathbb{E}[M_{-\infty}]$ is greater than 1, we divide $\mathbb{E}[M_u]$ by $\mathbb{E}[M_{-\infty}]$ to make sure the result never exceeds 1. The adjusted $\mathbb{E}[M_u]$ is more conservative, and we conjecture that it is a lower bound on the power when there exists at least one local maximum in the domain $D$. In applications, a conservative estimate is often preferred so that the test is guaranteed to have enough power. Combining $\mathbb{E}[M_u]$ and $\mathbb{E}[M_u]/\mathbb{E}[M_{-\infty}]$, we can get an approximate range for the true power. We will compare $\mathbb{E}[M_u]$ and the adjusted $\mathbb{E}[M_u]$ in simulation studies.
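In code, the adjustment (6) is immediate; a one-line sketch (function name ours):

```python
def e_mu_adj(e_mu_u, e_m_neg_inf):
    # Adjusted E[M_u] from (6): divide by E[M_{-inf}] when it exceeds 1.
    return e_mu_u / max(1.0, e_m_neg_inf)
```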

2.5 Height equivariance

Our first result does not concern the approximation (5) yet, but it offers a simplification of the power function and $\mathbb{E}[M_u]$ that will be used later. The proposition below states that the power function and $\mathbb{E}[M_u]$ for peak detection are translation equivariant with respect to peak height.

Proposition 1.

Let $\theta(s)=h(s)+\theta_0$ be a peak signal with height $\theta_0$, where $h(s)$ is a unimodal mean function with maximum equal to 0 at $s_0$ in $D$. Then the power function for peak detection and $\mathbb{E}[M_u]$ can be written in the form $F(u-\theta_0)$, where $F(u)$ is the power function or $\mathbb{E}[M_u]$ at $\theta_0=0$.

Proof.

Let $\tilde\theta(s)=\theta(s)-\theta_0=h(s)$ and let $\tilde M_u$ be the number of local maxima of the random field $\tilde X(s)=Z(s)+\tilde\theta(s)$ above $u$ over $D$. Considering the definition of power, we have

$$F(u-\theta_0)=\mathbb{P}[\tilde M_{u-\theta_0}\geq 1]=\mathbb{P}[M_u\geq 1].$$

Given that $\mathbb{E}[\tilde M_{u-\theta_0}]=\mathbb{E}[M_u]$, it is also straightforward to show that $\mathbb{E}[M_u]$ is translation equivariant with respect to $\theta_0$. ∎

Next, we give three scenarios where the equality in (5) can be achieved asymptotically: small domain size, large threshold, and sharp signal.

2.6 Small domain

If the size of the local domain $D$ where a single peak exists is small enough, it can be shown that equality in (5) is achieved asymptotically.

Theorem 1.

Consider a local domain $D_\epsilon=U(s_0,\epsilon)$ for any fixed $s_0\in D$, where $U(s_0,\epsilon)=s_0\oplus(-\epsilon/2,\epsilon/2)^N$ is the $N$-dimensional open cube of side $\epsilon$ centered at $s_0$. For sufficiently small $\epsilon$ and fixed threshold $u$,

$$\mathbb{P}[M_u\geq 1]=\mathbb{E}[M_u](1-o(1))=\mathbb{E}[M_u]_{\rm adj}(1-o(1)).$$ (7)
Proof.

The proof is based on the proof of Lemma 3 in Piterbarg (1996) and Lemma 4.1 in Cheng and Schwartzman (2015).

$$\begin{split}\mathbb{E}[M_u(M_u-1)]=&\int_{D_\epsilon}\int_{D_\epsilon}\int_u^\infty\int_u^\infty\mathbb{E}\left[|\det\nabla^2X(s)||\det\nabla^2X(t)|\,\middle|\,\begin{matrix}X(s)=x_1,\ X(t)=x_2\\ \nabla X(s)=\nabla X(t)=0\end{matrix}\right]\\ &\qquad\times P_{X(s),X(t),\nabla X(s),\nabla X(t)}(x_1,x_2,0,0)\,dx_1\,dx_2\,ds\,dt.\end{split}$$ (8)

Let

$$\mathbb{E}_1(s,t)=\mathbb{E}\left[|\det\nabla^2X(s)||\det\nabla^2X(t)|\,\middle|\,\begin{matrix}X(s)=x\\ \nabla X(s)=\nabla X(t)=0\end{matrix}\right].$$

Replacing one of the integration limits in (8) by $-\infty$, we have

$$\mathbb{E}[M_u(M_u-1)]\leq\int_{D_\epsilon}\int_{D_\epsilon}P_{\nabla X(s),\nabla X(t)}(0,0)\,ds\,dt\int_u^\infty\mathbb{E}_1(s,t)\,P_{X(s)}\left(x\,\middle|\,\nabla X(s)=\nabla X(t)=0\right)dx.$$

Then we can take the Taylor expansion

$$\nabla X(t)=\nabla X(s)+\nabla^2X(s)(t-s)+\|t-s\|^{1+\alpha}Y_{s,t},$$

where $Y_{s,t}=(Y_{s,t}^1,\ldots,Y_{s,t}^N)^T$ is a Gaussian vector field. Note that the determinant of $\nabla^2X(s)$ is equal to the determinant of

$$\begin{pmatrix}1&-(t_1-s_1)&\ldots&-(t_N-s_N)\\ 0&&&\\ \vdots&&\nabla^2X(s)&\\ 0&&&\end{pmatrix}.$$ (9)

For $i=2,\ldots,N+1$, multiply the $i$th column of this matrix by $(t_{i-1}-s_{i-1})/\|t-s\|^2$, take the sum of all such columns, and add the result to the first column. Since $\nabla X(s)=\nabla X(t)=0$, the expansion above gives $\nabla^2X(s)(t-s)=-\|t-s\|^{1+\alpha}Y_{s,t}$, and we obtain the matrix below, which has the same determinant as (9):

$$\begin{pmatrix}0&-(t_1-s_1)&\ldots&-(t_N-s_N)\\ -\|t-s\|^{-1+\alpha}Y_{s,t}^1&&&\\ \vdots&&\nabla^2X(s)&\\ -\|t-s\|^{-1+\alpha}Y_{s,t}^N&&&\end{pmatrix}$$

Let $r=\max_{1\leq i\leq N}|t_i-s_i|$ and

$$A_{s,t}=\begin{pmatrix}0&-(t_1-s_1)/r&\ldots&-(t_N-s_N)/r\\ Y_{s,t}^1&&&\\ \vdots&&\nabla^2X(s)&\\ Y_{s,t}^N&&&\end{pmatrix}.$$

So we have

$$\mathbb{E}_1(s,t)\leq\|t-s\|^\alpha\,\mathbb{E}_2(s,t),$$

where

$$\mathbb{E}_2(s,t)=\mathbb{E}\left[|\det A_{s,t}||\det\nabla^2X(t)|\,\middle|\,\begin{matrix}X(s)=x,\ \nabla X(s)=0\\ \nabla^2X(s)(t-s)=-\|t-s\|^{1+\alpha}Y_{s,t}\end{matrix}\right].$$

Using the inequality of arithmetic and geometric means, we can bound the determinants:

$$\begin{split}|\det\nabla^2X(t)|&\leq N^{2N-2}\sum_{i,j}|X_{ij}(t)|^N,\\ |\det A_{s,t}|&\leq(N+1)^{2N}\sum_{i,j}|a_{ij}|^{N+1},\end{split}$$

where $a_{ij}$ is the $(i,j)$ entry of $A_{s,t}$. Applying the inequality again,

$$|\det\nabla^2X(t)||\det A_{s,t}|\leq\frac12 N^{2N-2}(N+1)^{2N+1}\left(\sum_{i,j}|X_{ij}(t)|^{2N}+\sum_{i,j}|a_{ij}|^{2N+2}\right).$$

For any Gaussian variable $X$ and integer $N\geq 0$, the following inequality holds:

$$\mathbb{E}[X^{2N}]\leq 2^{2N}\left(\mathbb{E}[X]^{2N}+C_N\,{\rm Var}(X)^N\right),$$

where $C_N$ is a constant depending on $N$. Next, we can focus on the conditional expectations and conditional variances of $X_{ij}(t)$ and $Y_{s,t}$.

By Assumptions 1 and 2, and the fact that the conditional variance of a Gaussian variable is less than or equal to its unconditional variance, we can conclude that the conditional variances of $X_{ij}(t)$ and $Y_{s,t}$ are bounded above by some constant.

Summarizing the results above,

$$\sup_{s,t\in D_\epsilon,\,s\neq t}|\mathbb{E}_2(s,t)|\leq C_1$$

for some constant $C_1>0$, and

$$\mathbb{E}_1(s,t)\leq\|t-s\|^\alpha\,\mathbb{E}_2(s,t)\leq C_1\|t-s\|^\alpha.$$

Combining the results above, for a fixed threshold $u$,

$$\begin{aligned}
&\int_u^\infty\mathbb{E}_1(s,t)\,P_{X(s)}\left(x\,\middle|\,\nabla X(s)=\nabla X(t)=0\right)dx\\
&\leq C_1\|t-s\|^\alpha\int_u^\infty P_{X(s)}\left(x\,\middle|\,\nabla X(s)=\nabla X(t)=0\right)dx\\
&=C_1\|t-s\|^\alpha\int_u^\infty\exp(-(Ax-B)^2)\,dx\quad\text{for some constants }A,\ B\\
&=C_2\|t-s\|^\alpha
\end{aligned}$$

for some constant $C_2>0$.

Next, by the proof of Lemma 4.1 in Cheng and Schwartzman (2015),

$$p_{\nabla X(s),\nabla X(t)}(0,0)\leq C_3\|t-s\|^{-N}$$

for some constant $C_3>0$.

Therefore, there exists $C_4>0$ such that

$$\mathbb{E}[M_u(M_u-1)]\leq C_4\int_{D_\epsilon}\int_{D_\epsilon}\frac{1}{\|t-s\|^{N-\alpha}}\,dt\,ds=o(\epsilon^N).$$

For $\mathbb{E}[M_u]$, by the Kac-Rice formula in Adler and Taylor (2007),

$$\mathbb{E}[M_u]=\int_{D_\epsilon}p_{\nabla X(s)}(0)\,\mathbb{E}\left[|\det\nabla^2X(s)|\mathbbm{1}_{\{\nabla^2X(s)\prec 0\}}\mathbbm{1}_{\{X(s)>u\}}\,\middle|\,\nabla X(s)=0\right]ds.$$

Denote the integrand by $g(s)$. The function $g(s)$ is continuous and positive over the compact domain $D_\epsilon$. Thus $\inf_{s\in D_\epsilon}g(s)\geq g_0>0$, implying

$$\mathbb{E}[M_u]\geq g_0\,\epsilon^N.$$

Then (7) is an immediate consequence of (5).

For $\mathbb{E}[M_{-\infty}]$, by the Kac-Rice formula,

$$\mathbb{E}[M_{-\infty}]=\int_{D_\epsilon}p_{\nabla X(s)}(0)\,\mathbb{E}\left[|\det\nabla^2X(s)|\mathbbm{1}_{\{\nabla^2X(s)\prec 0\}}\,\middle|\,\nabla X(s)=0\right]ds.$$

The integrand is also continuous and positive over the compact domain $D_\epsilon$, indicating that $\mathbb{E}[M_{-\infty}]=o(1)$ for small $\epsilon$. Thus we have

$$\mathbb{E}[M_u]_{\rm adj}=\mathbb{E}[M_u]/\max(1,\mathbb{E}[M_{-\infty}])=\mathbb{E}[M_u]/\max(1,o(1))=\mathbb{E}[M_u]$$

for sufficiently small $\epsilon$. ∎

2.7 Large threshold

For a large threshold $u$, the following asymptotic result shows that the power can be precisely approximated by $\mathbb{E}[M_u]$.

Theorem 2.

For any fixed domain $D$, as $u\to\infty$,

$$\mathbb{P}[M_u\geq 1]=\mathbb{E}[M_u](1-o(e^{-\alpha u^2})),$$ (10)

where the error term $o(e^{-\alpha u^2})$ is non-negative and $\alpha>0$ is some constant.

Proof.

By Lemma 3 of Piterbarg (1996), as $u\to\infty$, the factorial moment is super-exponentially small. That is, there exists $\alpha>0$ such that

$$\mathbb{E}[M_u(M_u-1)]=o\left(e^{-\frac{u^2}{2}-\alpha u^2}\right).$$

Also,

$$\mathbb{E}[M_u]\geq\mathbb{P}[M_u\geq 1]\geq\mathbb{P}\left[\sup_{s\in D}X(s)\geq u\right]=O\left(e^{-\frac{u^2}{2}}\right).$$

Thus, the factorial moment decays exponentially faster than $\mathbb{E}[M_u]$. The result is an immediate consequence of (5). ∎

Notice that the threshold $u$ does not affect the value of $\mathbb{E}[M_{-\infty}]$, which is part of the adjusted $\mathbb{E}[M_u]$. By (10),

$$\mathbb{P}[M_u\geq 1]=\mathbb{E}[M_u]_{\rm adj}\,(1-o(e^{-\alpha u^2}))\max(1,\mathbb{E}[M_{-\infty}]).$$

If $\mathbb{E}[M_{-\infty}]>1$, the adjusted $\mathbb{E}[M_u]$ might be overly conservative for a large threshold $u$. Therefore, we only recommend $\mathbb{E}[M_u]$ for this scenario.

2.8 Sharp signal

The following theorem provides an asymptotic power approximation when the signal is sharp. Interestingly, while the power function is generally non-Gaussian, it becomes closer to Gaussian as the signal peaks become sharper.

Theorem 3.

Let $\theta(s)=ah(s)+\theta_0$, where $h(s)$ is a unimodal mean function with maximum equal to 0 at $s_0$, $a>0$, and $\theta_0$ represents the height. For any fixed threshold $u$, as $a\to\infty$,

$$\mathbb{P}[M_u\geq 1]=\mathbb{E}[M_u]+o(1)=\mathbb{E}[M_u]_{\rm adj}+o(1)=\Phi(\theta_0-u)(1+o(1)),$$ (11)

where $\Phi(x)$ is the CDF of the standard Gaussian distribution.

Proof.

By Lemma A.1 of Cheng and Schwartzman (2017), as $a\to\infty$,

$$\mathbb{P}(M_{-\infty}=1)\geq 1-O(\exp(-ca^2)),$$

where $c>0$ is some constant. Therefore $M_{-\infty}\overset{p}{\to}1$.

Since $M_u\leq M_{-\infty}$ and both take only non-negative integer values, $|M_u(M_u-1)|$ and $|M_{-\infty}(M_{-\infty}-1)|$ are bounded above by $|M(M-1)|$, where $M$ is the number of critical points of the random field $X$. Applying the Kac-Rice formula,

$$\mathbb{E}[M(M-1)]=\int_D\int_D\mathbb{E}\left[|\det\nabla^2X(s)||\det\nabla^2X(t)|\,\middle|\,\nabla X(s)=\nabla X(t)=0\right]P_{\nabla X(s),\nabla X(t)}(0,0)\,ds\,dt.$$

Denote the integrand by $g(s,t,a)$. The function $g(s,t,a)$ is continuous and positive over the compact domain $D$, and $M(M-1)\overset{p}{\to}0$ as $a\to\infty$. Thus there exists $g_0>0$ such that $\mathbb{E}[M(M-1)]\leq g_0$. Then, by the dominated convergence theorem,

$$\mathbb{E}[M_u(M_u-1)]\to 0$$

as $a\to\infty$. Since $M_{-\infty}\overset{p}{\to}1$, the adjusted $\mathbb{E}[M_u]$ satisfies

$$\mathbb{E}[M_u]_{\rm adj}=\mathbb{E}[M_u]/\max(1,\mathbb{E}[M_{-\infty}])=\mathbb{E}[M_u](1+o(1))=\mathbb{E}[M_u]+o(1).$$

To calculate $\mathbb{E}[M_u]$, apply the Kac-Rice formula:

$$\begin{aligned}\mathbb{E}[M_u]=&\int_D p_{\nabla X(s)}(0)\,\mathbb{E}\left[|\det\nabla^2X(s)|\mathbbm{1}_{\{\nabla^2X(s)\prec 0\}}\mathbbm{1}_{\{X(s)>u\}}\,\middle|\,\nabla X(s)=0\right]ds\\ =&\int_D\frac{1}{(2\pi)^{N/2}\sqrt{\det(\Lambda)}}\exp\left(-a^2(\nabla h(s))^T\Lambda^{-1}\nabla h(s)/2\right)\\ &\qquad\times\mathbb{E}\left[|\det(\nabla^2Z(s)+a\nabla^2h(s))|\mathbbm{1}_{\{\nabla^2X(s)\prec 0\}}\mathbbm{1}_{\{X(s)>u\}}\,\middle|\,\nabla X(s)=0\right]ds,\end{aligned}$$ (12)

where $\Lambda$ is the covariance matrix of $\nabla Z(s)$. Let $f(s)=(\nabla h(s))^T\Lambda^{-1}\nabla h(s)/2$, which attains its minimum 0 only at $s_0$. Similar to the proof of A.4 in Cheng and Schwartzman (2017), as $a\to\infty$, (12) can be approximated by applying Laplace's method:

$$\begin{aligned}\mathbb{E}[M_u]=&\frac{\det(a\nabla^2h(s_0))}{(2\pi)^{N/2}\sqrt{\det(\Lambda)}}\left(\frac{(2\pi)^N\det(\Lambda)}{a^{2N}\det(\nabla^2h(s_0))}\right)^{1/2}\Phi(\theta_0-u)+O(a^{-2})\\ =&\ \Phi(\theta_0-u)+O(a^{-2}).\end{aligned}$$

This finishes the proof. ∎

3 Explicit formulas

We have shown that the power for peak detection can be approximated by the expected number of local maxima above $u$, $\mathbb{E}[M_u]$, under certain scenarios such as a small domain or a large threshold. Although we can apply the Kac-Rice formula to calculate $\mathbb{E}[M_u]$, it remains difficult to evaluate explicitly for $N>1$ without making further assumptions. In this section, we focus on computing $\mathbb{E}[M_u]$ and show that a general formula can be obtained if the noise field is isotropic. Furthermore, explicit formulas for $N=1,2,3$ are derived for application purposes.

3.1 Isotropic Gaussian fields

Suppose $Z$ is a zero-mean unit-variance isotropic random field. We can write the covariance function of $Z$ as $\mathbb{E}\{Z(s)Z(t)\}=\rho(\|s-t\|^2)$ for an appropriate function $\rho(\cdot):[0,\infty)\rightarrow\mathbb{R}$. Denote

$$\rho'=\rho'(0),\quad\rho''=\rho''(0),\quad\kappa=-\rho'/\sqrt{\rho''},$$ (13)

where $\rho'$ and $\rho''$ are the first and second derivatives of the function $\rho$ at 0, respectively.

The following lemma comes from Cheng and Schwartzman (2018).

Lemma 1.

For each $s\in\mathbb{R}^N$ and $i,j,k,l\in\{1,\ldots,N\}$,

$$\begin{split}\mathbb{E}\{Z_i(s)Z(s)\}&=\mathbb{E}\{Z_i(s)Z_{jk}(s)\}=0,\\ \mathbb{E}\{Z_i(s)Z_j(s)\}&=-\mathbb{E}\{Z_{ij}(s)Z(s)\}=-2\rho'\delta_{ij},\\ \mathbb{E}\{Z_{ij}(s)Z_{kl}(s)\}&=4\rho''(\delta_{ij}\delta_{kl}+\delta_{ik}\delta_{jl}+\delta_{il}\delta_{jk}),\end{split}$$

where $\rho'$ and $\rho''$ are defined in (13) and $\delta_{ij}$ is the Kronecker delta function.

In particular, it follows from Lemma 1 that ${\rm Var}(Z_i(s))=-2\rho'$ and ${\rm Var}(Z_{ii}(s))=12\rho''$ for any $i\in\{1,\ldots,N\}$, implying $\rho'<0$ and $\rho''>0$, and hence $\kappa>0$.

We can use theoretical results on Gaussian Orthogonally Invariant (GOI) matrices to make the calculation of $\mathbb{E}[M_u]$ easier. GOI matrices were first introduced in Schwartzman et al. (2008), and used for the first time in the context of random fields in Cheng and Schwartzman (2018). This is a class of Gaussian random matrices that are invariant under orthogonal transformations, and they are useful for computing the expected number of critical points of isotropic Gaussian fields. We call an $N\times N$ random matrix $G=(G_{ij})_{1\leq i,j\leq N}$ GOI with covariance parameter $c$, denoted by ${\rm GOI}(c)$, if it is symmetric and all entries are centered Gaussian variables such that

$$\mathbb{E}[G_{ij}G_{kl}]=\frac12(\delta_{ik}\delta_{jl}+\delta_{il}\delta_{jk})+c\,\delta_{ij}\delta_{kl}.$$ (14)
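As an aside, the covariance structure (14) can be realized constructively when $c\geq 0$: take a GOE-type symmetric matrix (off-diagonal variance $1/2$, diagonal variance $1$) and add a shared Gaussian multiple of the identity. A minimal Python sketch (ours, for illustration; it assumes $c\geq 0$):

```python
import numpy as np

def sample_goi(N, c, rng=None):
    # One draw of a GOI(c) matrix: symmetric Gaussian with
    # E[G_ij G_kl] = (d_ik d_jl + d_il d_jk)/2 + c d_ij d_kl  (assumes c >= 0).
    rng = np.random.default_rng() if rng is None else rng
    A = rng.normal(scale=np.sqrt(0.5), size=(N, N))
    G = (A + A.T) / np.sqrt(2.0)                 # off-diagonal var 1/2, diagonal var 1
    G += np.sqrt(c) * rng.normal() * np.eye(N)   # shared term adds c to all diagonal covariances
    return G
```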

The following lemma is Lemma 3.4 from Cheng and Schwartzman (2018).

Lemma 2.

Let the assumptions in Lemma 1 hold. Let $\widetilde G$ and $G$ be ${\rm GOI}(1/2)$ and ${\rm GOI}((1-\kappa^2)/2)$ matrices, respectively, and let $I_N$ denote the $N\times N$ identity matrix.

(i) The distribution of $\nabla^2Z(s)$ is the same as that of $\sqrt{8\rho''}\,\widetilde G$.

(ii) The distribution of $(\nabla^2Z(s)\,|\,Z(s)=z)$ is the same as that of $\sqrt{8\rho''}\big[G-\big(\kappa z/\sqrt2\big)I_N\big]$.

Lemma 2 gives the distribution and conditional distribution of the Hessian matrix of the centered random field $Z(s)$. Next, we establish the corresponding result for the non-centered random field $X(s)=Z(s)+\theta(s)$.

Lemma 3.

Let $\widetilde G$ and $G$ be ${\rm GOI}(1/2)$ and ${\rm GOI}((1-\kappa^2)/2)$ matrices, respectively.

(i) The distribution of $\nabla^2X(s)$ is the same as that of

$$\sqrt{8\rho''}\,\widetilde G+\nabla^2\theta(s).$$

(ii) The distribution of $(\nabla^2X(s)\,|\,X(s)=x)$ is the same as that of

$$\sqrt{8\rho''}\left[G-\frac{\kappa(x-\theta(s))}{\sqrt2}I_N\right]+\nabla^2\theta(s).$$

Proof.

Part (i) is a direct consequence of Lemma 2. For part (ii), note that $(\nabla^2X(s)\,|\,X(s)=x)$ is equivalent to $(\nabla^2Z(s)\,|\,Z(s)=x-\theta(s))+\nabla^2\theta(s)$, and the result follows immediately from Lemma 2. ∎

3.2 General formula under isotropy

Theorem 4.

Let $X(s)=Z(s)+\theta(s)$, where $Z(s)$ is a smooth zero-mean unit-variance isotropic Gaussian random field satisfying Assumptions 1 and 2, and let $\theta(s)$ be a smooth $C^3$ mean function such that $\nabla^2\theta(s)$ is a non-singular matrix with ordered eigenvalues $\theta''_1(s)\leq\cdots\leq\theta''_N(s)$ at all critical points $s$. Then for any domain $D$,

$$\mathbb{E}[M_u]=\bigg(\frac{2\rho''}{-\pi\rho'}\bigg)^{N/2}\int_D e^{\frac{\|\nabla\theta(s)\|^2}{4\rho'}}\int_u^\infty\phi(x-\theta(s))\,\mathbb{E}\left[|\det({\rm Matrix}(s))|\mathbbm{1}_{\{{\rm Matrix}(s)\prec 0\}}\right]dx\,ds,$$ (15)

where $\phi(x)$ is the PDF of the standard Gaussian distribution, ${\rm Matrix}(s)=G-\kappa(x-\theta(s))I_N/\sqrt2+{\rm diag}\{\theta''_1(s),\ldots,\theta''_N(s)\}/\sqrt{8\rho''}$, $G$ is a ${\rm GOI}((1-\kappa^2)/2)$ matrix as in Lemma 3, and $\mathbbm{1}_{\{\cdot\}}$ denotes the indicator function.

Proof.

By the Kac-Rice formula

$$\begin{aligned}\mathbb{E}[M_u]&=\int_D p_{\nabla X(s)}(0)\,\mathbb{E}\left[|\det\nabla^2X(s)|\mathbbm{1}_{\{\nabla^2X(s)\prec 0\}}\mathbbm{1}_{\{X(s)>u\}}\,\middle|\,\nabla X(s)=0\right]ds\\ &=\int_D p_{\nabla Z(s)+\nabla\theta(s)}(0)\,\mathbb{E}\left[|\det\nabla^2X(s)|\mathbbm{1}_{\{\nabla^2X(s)\prec 0\}}\mathbbm{1}_{\{X(s)>u\}}\,\middle|\,\nabla X(s)=0\right]ds\\ &=\int_D\frac{1}{(2\pi)^{N/2}(-2\rho')^{N/2}}e^{\frac{\|\nabla\theta(s)\|^2}{4\rho'}}\,\mathbb{E}\left[|\det\nabla^2X(s)|\mathbbm{1}_{\{\nabla^2X(s)\prec 0\}}\mathbbm{1}_{\{X(s)>u\}}\,\middle|\,\nabla X(s)=0\right]ds\\ &=\int_D\frac{(8\rho'')^{N/2}}{(2\pi)^{N/2}(-2\rho')^{N/2}}e^{\frac{\|\nabla\theta(s)\|^2}{4\rho'}}\int_u^\infty\phi(x-\theta(s))\,\mathbb{E}\left[|\det({\rm Matrix}(s))|\mathbbm{1}_{\{{\rm Matrix}(s)\prec 0\}}\right]dx\,ds\\ &=\int_D\bigg(\frac{2\rho''}{-\pi\rho'}\bigg)^{N/2}e^{\frac{\|\nabla\theta(s)\|^2}{4\rho'}}\int_u^\infty\phi(x-\theta(s))\,\mathbb{E}\left[|\det({\rm Matrix}(s))|\mathbbm{1}_{\{{\rm Matrix}(s)\prec 0\}}\right]dx\,ds\\ &=\bigg(\frac{2\rho''}{-\pi\rho'}\bigg)^{N/2}\int_D e^{\frac{\|\nabla\theta(s)\|^2}{4\rho'}}\int_u^\infty\phi(x-\theta(s))\,\mathbb{E}\left[|\det({\rm Matrix}(s))|\mathbbm{1}_{\{{\rm Matrix}(s)\prec 0\}}\right]dx\,ds.\end{aligned}$$

Next, we show the derivation from the third to the fourth line in the equation above. Since we assume $\nabla^2\theta(s)$ is a non-singular matrix at all critical points, there exists an orthonormal matrix, denoted by $A(s)$, such that $A(s)^T\nabla^2\theta(s)A(s)={\rm diag}\{\theta''_1(s),\theta''_2(s),\ldots,\theta''_N(s)\}$, where $\theta''_1(s)\leq\cdots\leq\theta''_N(s)$ are the ordered eigenvalues of $\nabla^2\theta(s)$. On the other hand, GOI matrices are invariant under orthonormal transformations. By Lemma 3, the conditional expectation $\mathbb{E}[|\det(\nabla^2X(s))|\mathbbm{1}_{\{\nabla^2X(s)\prec 0\}}\,|\,X(s)=x]$ is therefore

$$\begin{aligned}&=\mathbb{E}\left[\left|\det\left(\sqrt{8\rho''}\left[G-\frac{\kappa(x-\theta(s))}{\sqrt2}I_N\right]+\nabla^2\theta(s)\right)\right|\mathbbm{1}_{\{{\rm Matrix}(s)\prec 0\}}\right]\\ &=\mathbb{E}\left[\left|\det\left(\sqrt{8\rho''}\left[G-\frac{\kappa(x-\theta(s))}{\sqrt2}I_N\right]+A(s)^T\nabla^2\theta(s)A(s)\right)\right|\mathbbm{1}_{\{{\rm Matrix}(s)\prec 0\}}\right]\\ &=(\sqrt{8\rho''})^N\,\mathbb{E}\left[\left|\det\left(G-\frac{\kappa(x-\theta(s))}{\sqrt2}I_N+A(s)^T\nabla^2\theta(s)A(s)/\sqrt{8\rho''}\right)\right|\mathbbm{1}_{\{{\rm Matrix}(s)\prec 0\}}\right]\\ &=(\sqrt{8\rho''})^N\,\mathbb{E}\left[\left|\det\left(G-\frac{\kappa(x-\theta(s))}{\sqrt2}I_N+{\rm diag}\{\theta''_1(s),\theta''_2(s),\ldots,\theta''_N(s)\}/\sqrt{8\rho''}\right)\right|\mathbbm{1}_{\{{\rm Matrix}(s)\prec 0\}}\right].\end{aligned}$$ (16)

∎

The expression (15) can be simplified further if we assume in addition that the mean function $\theta(s)$ is a rotationally symmetric paraboloid centered at $s_0$. In this case, the Hessian of $\theta(s)$ is the identity matrix multiplied by a constant, i.e.

$$\theta''=\theta''_1(s)=\theta''_2(s)=\cdots=\theta''_N(s).$$

Then we can write the mean function as $\theta(s)=\theta_0+\theta''\|s-s_0\|^2/2$. Define

$$\eta=\frac{\theta''}{2\kappa\sqrt{\rho''}}=\frac{\theta''}{-2\rho'}=\frac{\theta''}{{\rm Var}(Z_1(s))}$$ (17)

and

$$H(\tilde x)=\mathbb{E}_{{\rm GOI}((1-\kappa^2)/2)}^N\left[\prod_{j=1}^N\left|\lambda_j-\frac{\kappa\tilde x}{\sqrt2}\right|\mathbbm{1}_{\{\lambda_N<\frac{\kappa\tilde x}{\sqrt2}\}}\right].$$ (18)

Then $\mathbb{E}[M_u]$ can be simplified as

$$\mathbb{E}[M_u]=\bigg(\frac{2\rho''}{-\pi\rho'}\bigg)^{N/2}\int_D e^{\frac{\theta''^2\|s-s_0\|^2}{4\rho'}}\int_{\tilde u(s)}^\infty\phi(\tilde x+\eta)\,H(\tilde x)\,d\tilde x\,ds,$$ (19)

where we make the change of variable $\tilde x=x-\theta(s)-\eta$ and set $\tilde u(s)=u-\theta(s)-\eta$. Note that the parameter $\kappa$ depends on the correlation structure of $Z(s)$.
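For intuition, the expectation (18) can be approximated by Monte Carlo using the `sample_goi` sketch above: draw ${\rm GOI}((1-\kappa^2)/2)$ matrices, take their eigenvalues, and average the integrand. A sketch (ours; it assumes $\kappa\leq 1$ so that the covariance parameter $(1-\kappa^2)/2$ is non-negative, as the additive construction requires):

```python
import numpy as np

def H_mc(x_tilde, kappa, N, n_draws=20000, rng=None):
    # Monte Carlo version of (18): average of prod_j |lambda_j - k x/sqrt(2)|
    # over draws whose largest eigenvalue is below k x/sqrt(2).
    rng = np.random.default_rng(0) if rng is None else rng
    b = kappa * x_tilde / np.sqrt(2.0)
    c = (1.0 - kappa**2) / 2.0                           # GOI covariance parameter
    total = 0.0
    for _ in range(n_draws):
        lam = np.linalg.eigvalsh(sample_goi(N, c, rng))  # ascending eigenvalues
        if lam[-1] < b:
            total += np.prod(np.abs(lam - b))
    return total / n_draws
```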

3.3 Explicit formulas in 1D, 2D and 3D

In (19), a general formula for $\mathbb{E}[M_u]$ under isotropy was derived. To make the formula easier to apply in practice, we present the following results for computing it in 1D, 2D, and 3D. When $N=1$, the derivation is simple enough that we do not need additional assumptions on the mean function $\theta(s)$ beyond those in Theorem 4, and it follows directly from the Kac-Rice formula. When $N=2$ and $3$, we assume the mean function $\theta(s)$ is a rotationally symmetric paraboloid centered at $s_0$. $\mathbb{E}[M_u]$ is calculated by first obtaining explicit formulas for $H(\tilde x)$, and then plugging $H$ into (19).

Proposition 2.

Let $N=1$ and $X(s)=Z(s)+\theta(s)$, where $Z(s)$ is a smooth zero-mean unit-variance Gaussian process and $\theta(s)$ is a smooth mean function. Assume additionally that $Z(s)$ is stationary. Then

$$\mathbb{E}[M_u]=\int_D\frac{\sqrt{-2\rho'(3-\kappa^2)}}{\kappa}\,\phi\left(\frac{\theta'(s)}{\sqrt{-2\rho'}}\right)\int_u^\infty\phi(x-\theta(s))\,\psi\left(\frac{\kappa[x-\theta(s)-\eta(s)]}{\sqrt{3-\kappa^2}}\right)dx\,ds,$$ (20)

where the function $\psi$ is defined as

$$\psi(x)=\int_{-\infty}^x\Phi(y)\,dy=\phi(x)+x\,\Phi(x),\quad x\in\mathbb{R}.$$
Proof.

Since we assume that $Z(s)$ is stationary, $Z'(s)$ is independent of $Z(s)$ and $Z''(s)$, and $\rho'=-{\rm Var}(Z'(s))/2=\mathbb{E}[Z(s)Z''(s)]/2$ and $\rho''={\rm Var}(Z''(s))/12$ do not depend on $s$. Therefore,

$${\rm Var}(X(s))=1,\quad{\rm Var}(X'(s))=-{\rm Cov}[X(s),X''(s)]=-2\rho'\quad{\rm and}\quad{\rm Var}(X''(s))=12\rho''.$$

Note that, by the formula for conditional Gaussian distributions,

$$X''(s)\,\big|\,X(s)=x\ \sim\ N\!\left(\theta''(s)+2\rho'(x-\theta(s)),\ 12\rho''-4\rho'^2\right).$$

By the Kac-Rice formula

$$\begin{aligned}\mathbb{E}[M_u]&=\int_D p_{X'(s)}(0)\,\mathbb{E}[|X''(s)|\mathbbm{1}_{\{X(s)>u\}}\mathbbm{1}_{\{X''(s)<0\}}\,|\,X'(s)=0]\,ds\\ &=\int_D p_{X'(s)}(0)\int_u^\infty\phi(x-\theta(s))\int_{-\infty}^0(-x'')\frac{1}{\sqrt{12\rho''-4\rho'^2}}\,\phi\left[\frac{x''-\theta''(s)-2\rho'(x-\theta(s))}{\sqrt{12\rho''-4\rho'^2}}\right]dx''\,dx\,ds\\ &=\int_D p_{X'(s)}(0)\,\sqrt{12\rho''-4\rho'^2}\int_u^\infty\phi(x-\theta(s))\,\psi\left(\frac{-2\rho'(x-\theta(s))-\theta''(s)}{\sqrt{12\rho''-4\rho'^2}}\right)dx\,ds\\ &=\int_D\frac{\sqrt{12\rho''-4\rho'^2}}{\sqrt{-2\rho'}}\,\phi\left(\frac{\theta'(s)}{\sqrt{-2\rho'}}\right)\int_u^\infty\phi(x-\theta(s))\,\psi\left(\frac{-2\rho'(x-\theta(s))-\theta''(s)}{\sqrt{12\rho''-4\rho'^2}}\right)dx\,ds.\end{aligned}$$

The step from the second to the third line uses the fact that

$$\int_{-\infty}^0(-x)\frac1b\,\phi\left(\frac{x+a}{b}\right)dx=\int_{-\infty}^0\Phi\left(\frac{x+a}{b}\right)dx=b\int_{-\infty}^{a/b}\Phi(y)\,dy=b\,\psi\left(\frac ab\right).$$

Recalling the parameters $\kappa$ from (13) and $\eta$ from (17), we can rewrite $\mathbb{E}[M_u]$ as

$$\int_D\frac{\sqrt{-2\rho'(3-\kappa^2)}}{\kappa}\,\phi\left(\frac{\theta'(s)}{\sqrt{-2\rho'}}\right)\int_u^\infty\phi(x-\theta(s))\,\psi\left(\frac{\kappa[x-\theta(s)-\eta(s)]}{\sqrt{3-\kappa^2}}\right)dx\,ds.$$

This finishes the proof. ∎

Note that when $N=1$,

$$H(\tilde x)=\phi\left(\frac{\kappa\tilde x}{\sqrt{3-\kappa^2}}\right)+\frac{\kappa\tilde x}{\sqrt{3-\kappa^2}}\,\Phi\left(\frac{\kappa\tilde x}{\sqrt{3-\kappa^2}}\right)=\psi\left(\frac{\kappa\tilde x}{\sqrt{3-\kappa^2}}\right).$$ (21)
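As an illustration, (20) is straightforward to evaluate numerically once $\psi$ is coded. A Python sketch (ours; the function names and the nested quadrature are our assumptions, not the paper's implementation):

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def psi(x):
    # psi(x) = phi(x) + x * Phi(x), the antiderivative of the standard normal CDF
    return norm.pdf(x) + x * norm.cdf(x)

def e_mu_1d(u, theta, dtheta, ddtheta, rho1, rho2, a, b):
    # Numerical evaluation of (20) on D = [a, b].
    # rho1 = rho'(0) < 0 and rho2 = rho''(0) > 0, as in (13).
    kappa = -rho1 / np.sqrt(rho2)
    def inner(s):
        eta_s = ddtheta(s) / (-2.0 * rho1)                 # eta(s), cf. (17)
        pref = np.sqrt(-2.0 * rho1 * (3.0 - kappa**2)) / kappa \
               * norm.pdf(dtheta(s) / np.sqrt(-2.0 * rho1))
        val, _ = quad(lambda x: norm.pdf(x - theta(s))
                      * psi(kappa * (x - theta(s) - eta_s) / np.sqrt(3.0 - kappa**2)),
                      u, np.inf)
        return pref * val
    val, _ = quad(inner, a, b)
    return val
```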

We need the following lemmas to calculate $H(\tilde x)$ explicitly when $N=2$ and $N=3$. They follow from direct calculation using integration by parts.

Lemma 4.

Let $N=2$. For constants $a>-\frac12$ and $b\in\mathbb{R}$,

$$\begin{split}&\int_{\mathbb{R}^2}\exp\bigg\{-\frac12\sum_{i=1}^2\lambda_i^2-\frac a2\Big(\sum_{i=1}^2\lambda_i\Big)^2\bigg\}\left(\prod_{i=1}^2|\lambda_i-b|\right)|\lambda_1-\lambda_2|\,\mathbbm{1}_{\{\lambda_1<\lambda_2<b\}}\,d\lambda_1\,d\lambda_2\\ &=\frac{\sqrt{2\pi}}{\sqrt{1+a}}e^{-\frac{1+2a}{2(1+a)}b^2}\Phi\left(\frac{1+2a}{\sqrt{1+a}}b\right)+\left(2b^2-\frac{1+4a}{1+2a}\right)\frac{\sqrt\pi}{\sqrt{1+2a}}\Phi\big(\sqrt{2(1+2a)}\,b\big)+\frac{b}{1+2a}e^{-(1+2a)b^2}.\end{split}$$ (22)
Lemma 5.

Let $N=3$. For constants $a>0$ and $b\in\mathbb{R}$,

$$\begin{aligned}&\int_{\mathbb{R}^3}\exp\bigg\{-\frac12\sum_{i=1}^3\lambda_i^2+\frac{a}{2(2+3a)}\Big(\sum_{i=1}^3\lambda_i\Big)^2\bigg\}\left(\prod_{i=1}^3|\lambda_i-b|\right)\prod_{1\leq i<j\leq 3}|\lambda_i-\lambda_j|\,\mathbbm{1}_{\{\lambda_1<\lambda_2<\lambda_3<b\}}\,d\lambda_1\,d\lambda_2\,d\lambda_3\\ &=\left[\frac{a^3+6a^2+12a+24}{2(a+2)^2}b^2+\frac{2a^3+3a^2+6a}{4(a+2)}+\frac32\right]\frac{1}{\sqrt{\pi(a+2)}}e^{-\frac{b^2}{a+2}}\Phi\left(\frac{2\sqrt2\,b}{\sqrt{(a+2)(3a+2)}}\right)\\ &\quad+\left[\frac{a+1}{2}b^2+\frac{a^2-a}{2}-1\right]\frac{1}{\sqrt{\pi(a+1)}}e^{-\frac{b^2}{a+1}}\Phi\left(\frac{\sqrt2\,b}{\sqrt{(a+1)(3a+2)}}\right)\\ &\quad+\left(a+6+\frac{3a^3+12a^2+28a}{2(a+2)}\right)\frac{b}{2\pi(a+2)\sqrt{3a+2}}\,e^{-\frac{3b^2}{3a+2}}\\ &\quad+b\left[b^2+\frac{3(a-1)}{2}\right]\left[\Phi_{\Sigma_1}(0,b)+\Phi_{\Sigma_2}(0,b)\right],\end{aligned}$$

where

$$\Sigma_1=\begin{pmatrix}\frac32&-1\\ -1&\frac{a+2}{2}\end{pmatrix},\quad\Sigma_2=\begin{pmatrix}\frac32&-\frac12\\ -\frac12&\frac{a+1}{2}\end{pmatrix}.$$
Proposition 3.

Let $N=2$ and let the assumptions in Theorem 4 hold. Then the function $H$ defined in (18) can be written explicitly as

$$H(\tilde x)=\frac{\sqrt{2\pi}}{\sqrt{3-\kappa^2}}\,\phi\left(\frac{\kappa\tilde x}{\sqrt{3-\kappa^2}}\right)\Phi\left(\frac{\kappa\tilde x}{\sqrt{(2-\kappa^2)(3-\kappa^2)}}\right)+\frac{\kappa^2}{2}(\tilde x^2-1)\,\Phi\left(\frac{\kappa\tilde x}{\sqrt{2-\kappa^2}}\right)+\frac{\kappa\sqrt{2-\kappa^2}\,\tilde x}{2}\,\phi\left(\frac{\kappa\tilde x}{\sqrt{2-\kappa^2}}\right).$$ (23)
Proof.

By Lemma 4 above with $a=(\kappa^2-1)/(4-2\kappa^2)$ and $b=\kappa\tilde x/\sqrt2$, and Lemma 2.2 in Cheng and Schwartzman (2018),

$$\begin{aligned}&\mathbb{E}_{{\rm GOI}((1-\kappa^2)/2)}^N\left[\prod_{j=1}^N\left|\lambda_j-\frac{\kappa\tilde x}{\sqrt2}\right|\mathbbm{1}_{\{\lambda_N<\frac{\kappa\tilde x}{\sqrt2}\}}\right]\\ &=\frac{1}{2\sqrt{\pi(2-\kappa^2)}}\int_{\mathbb{R}^2}\exp\bigg\{-\frac12(\lambda_1^2+\lambda_2^2)+\frac{1-\kappa^2}{4(2-\kappa^2)}(\lambda_1+\lambda_2)^2\bigg\}(\lambda_2-\lambda_1)\left|\lambda_1-\frac{\kappa\tilde x}{\sqrt2}\right|\left|\lambda_2-\frac{\kappa\tilde x}{\sqrt2}\right|\mathbbm{1}_{\{\lambda_1<\lambda_2<\frac{\kappa\tilde x}{\sqrt2}\}}\,d\lambda_1\,d\lambda_2\\ &=\frac{1}{\sqrt{3-\kappa^2}}e^{-\frac{\kappa^2\tilde x^2}{2(3-\kappa^2)}}\Phi\left(\frac{\kappa\tilde x}{\sqrt{(2-\kappa^2)(3-\kappa^2)}}\right)+\frac{\kappa^2}{2}(\tilde x^2-1)\,\Phi\left(\frac{\kappa\tilde x}{\sqrt{2-\kappa^2}}\right)+\frac{\kappa\sqrt{2-\kappa^2}\,\tilde x}{2\sqrt{2\pi}}e^{-\frac{\kappa^2\tilde x^2}{2(2-\kappa^2)}}.\end{aligned}$$ (24)

This simplifies to (23). ∎
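For reference, (23) translates directly into code; this sketch (ours) can be cross-checked against the Monte Carlo estimator `H_mc` above with $N=2$:

```python
import numpy as np
from scipy.stats import norm

def H_2d(x_tilde, kappa):
    # Closed-form H(x~) for N = 2, term by term as in (23).
    k, x = kappa, x_tilde
    t1 = np.sqrt(2.0 * np.pi) / np.sqrt(3.0 - k**2) \
         * norm.pdf(k * x / np.sqrt(3.0 - k**2)) \
         * norm.cdf(k * x / np.sqrt((2.0 - k**2) * (3.0 - k**2)))
    t2 = 0.5 * k**2 * (x**2 - 1.0) * norm.cdf(k * x / np.sqrt(2.0 - k**2))
    t3 = 0.5 * k * np.sqrt(2.0 - k**2) * x * norm.pdf(k * x / np.sqrt(2.0 - k**2))
    return t1 + t2 + t3
```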

Proposition 4.

Let $N=3$ and let the assumptions in Theorem 4 hold. Then the function $H$ defined in (18) can be written explicitly as

$$\begin{aligned}H(\tilde x)&=\Bigg[\frac{\kappa^2\left[(1-\kappa^2)^3+6(1-\kappa^2)^2+12(1-\kappa^2)+24\right]}{4(3-\kappa^2)^2}\tilde x^2\\ &\qquad+\frac{2(1-\kappa^2)^3+3(1-\kappa^2)^2+6(1-\kappa^2)}{4(3-\kappa^2)}+\frac32\Bigg]\frac{\phi\big(\frac{\kappa\tilde x}{\sqrt{3-\kappa^2}}\big)}{\sqrt{\pi(3-\kappa^2)}}\,\Phi\left(\frac{2\kappa\tilde x}{\sqrt{(3-\kappa^2)(5-3\kappa^2)}}\right)\\ &\quad+\left[\frac{\kappa^2(2-\kappa^2)}{4}\tilde x^2-\frac{\kappa^2(1-\kappa^2)}{2}-1\right]\frac{\phi\big(\frac{\kappa\tilde x}{\sqrt{2-\kappa^2}}\big)}{\sqrt{\pi(2-\kappa^2)}}\,\Phi\left(\frac{\kappa\tilde x}{\sqrt{(2-\kappa^2)(5-3\kappa^2)}}\right)\\ &\quad+\left[7-\kappa^2+\frac{(1-\kappa^2)\left[3(1-\kappa^2)^2+12(1-\kappa^2)+28\right]}{2(3-\kappa^2)}\right]\frac{\kappa\tilde x\,\phi\big(\sqrt{\frac{3}{5-3\kappa^2}}\,\kappa\tilde x\big)}{2\sqrt2\,\pi(3-\kappa^2)\sqrt{5-3\kappa^2}}\\ &\quad+\frac{\kappa^3}{2\sqrt2}\,\tilde x(\tilde x^2-3)\left[\Phi_{\Sigma_1}(0,\kappa\tilde x/\sqrt2)+\Phi_{\Sigma_2}(0,\kappa\tilde x/\sqrt2)\right],\end{aligned}$$

where

$$\Sigma_1=\begin{pmatrix}\frac32&-1\\ -1&\frac{3-\kappa^2}{2}\end{pmatrix},\quad\Sigma_2=\begin{pmatrix}\frac32&-\frac12\\ -\frac12&\frac{2-\kappa^2}{2}\end{pmatrix}.$$
Proof.

This is a direct result of Lemma 5 above with $a=1-\kappa^2$ and $b=\kappa\tilde x/\sqrt2$. ∎

Note that for $N=2$ and $N=3$, we still need to compute an integral over the domain (see (19)) to get $\mathbb{E}[M_u]$. Although we cannot derive an explicit form for the entire formula, it can be evaluated in applications with the help of numerical algorithms.
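As one way to carry out this numerical step, (19) for $N=2$ on a disk reduces to a double quadrature in polar coordinates. A sketch (ours; it reuses `H_2d` above and assumes the paraboloid mean $\theta(s)=\theta_0+\theta''\|s-s_0\|^2/2$ with $\theta''<0$):

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

def e_mu_2d(u, theta0, theta2, rho1, rho2, R):
    # Evaluate (19) for N = 2 over a disk of radius R centered at s0,
    # with theta(s) = theta0 + theta2 * ||s - s0||^2 / 2  (theta2 = theta'').
    kappa = -rho1 / np.sqrt(rho2)
    eta = theta2 / (-2.0 * rho1)                 # eta as in (17)
    pref = 2.0 * rho2 / (-np.pi * rho1)          # (2 rho'' / (-pi rho'))^{N/2} with N = 2
    def radial(r):
        theta_s = theta0 + theta2 * r**2 / 2.0
        u_t = u - theta_s - eta                  # u~(s)
        val, _ = quad(lambda xt: norm.pdf(xt + eta) * H_2d(xt, kappa), u_t, np.inf)
        return np.exp(theta2**2 * r**2 / (4.0 * rho1)) * val * r
    val, _ = quad(radial, 0.0, R)
    return pref * 2.0 * np.pi * val              # polar coordinates: ds = 2*pi*r dr
```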

3.4 Isotropic unimodal mean function

We have calculated the explicit formulas assuming the mean function is a concave paraboloid, which is a strong assumption. However, in the general setting where the unimodal mean function is rotationally symmetric of any shape, we can apply a multivariate Taylor expansion at the peak and use the second-order approximation to estimate power. For example, suppose the shape of the mean function is proportional to a rotationally symmetric Gaussian density:

$$\theta(s)=\theta_0\exp\left(-\frac{\|s-s_0\|^2}{2\xi^2}\right),$$ (25)

where $s_0$ is the center of the mean function and $\xi$ is the signal bandwidth. The Taylor expansion at the center is

$$\theta(s)=\theta_0-\frac{\theta_0}{2\xi^2}\|s-s_0\|^2+o(\|s-s_0\|^2).$$ (26)

When the domain size is small, we neglect the remainder term and use the quadratic approximation as the mean function. With a quadratic mean function, it becomes convenient to compute $\mathbb{E}[M_u]$. We will evaluate the performance of this approach for different domain sizes in the simulation study.
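In code, the second-order approximation (26) amounts to replacing the Gaussian bump by a paraboloid with curvature $\theta''=-\theta_0/\xi^2$; a one-line sketch (ours) that feeds directly into a routine such as `e_mu_2d` above:

```python
def quadratic_approx_params(theta0, xi):
    # (26): theta(s) ~ theta0 + theta2 * ||s - s0||^2 / 2 with theta2 = -theta0 / xi^2.
    return theta0, -theta0 / xi**2   # (peak height, curvature theta'')
```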

4 Simulations

In Section 2 above, we discussed power approximation under different scenarios. We showed that the factorial moment $\mathbb{E}[M_u(M_u-1)]$ decays faster than $\mathbb{E}[M_u]$ under some circumstances, so that we can use $\mathbb{E}[M_u]$ or the adjusted $\mathbb{E}[M_u]$ to approximate power. In this section, a simulation study is conducted to validate each scenario as well as to visualize the power function, $\mathbb{E}[M_u]$, and the adjusted $\mathbb{E}[M_u]$. The simulation also gives a better sense of how to apply them to real data.

4.1 Paraboloidal mean function

We generate $B=100{,}000$ centered, unit-variance, smooth isotropic 2D Gaussian random fields over a grid of size $50\times50$ pixels as $Z(s)$, each field obtained as the convolution of white Gaussian noise with a Gaussian kernel of spatial standard deviation 5, and normalized to standard deviation $\sigma=1$ (a code sketch of this construction is given after Table 1). For the mean function $\mu(s)$, we use a concave paraboloid centered at $s_0=(25,25)$. The equation of the paraboloid is

$$\theta(s)=\theta_0-\frac{\|s-s_0\|^2}{2\xi^2},$$ (27)

where $\xi$ controls the sharpness of the mean function: the smaller $\xi$, the sharper the paraboloid. $\theta_0$ controls the height of the signal. To maintain the rotationally symmetric property of $\theta(s)$, we only consider circles centered at $s_0$ as the domain $D$. The size of $D$ is measured by its radius ${\rm Rad}(D)$. The default value of each parameter is listed in Table 1 and used unless otherwise specified.

Parameter | Default value
${\rm Rad}(D)$ | 10
$\xi$ | 7
$\theta_0$ | 3
Table 1: 2D simulation: default value of each parameter.
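A minimal sketch of this noise-generation step (ours, in Python rather than the MATLAB used in the paper; the rescaling uses the squared-kernel mass, and boundary effects of the convolution are ignored):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def simulate_noise_field(size=50, kernel_sd=5.0, rng=None):
    # Z(s): white Gaussian noise smoothed by a Gaussian kernel (sd = 5 pixels),
    # rescaled so each pixel has approximately unit variance.
    rng = np.random.default_rng() if rng is None else rng
    # pointwise variance after smoothing = sum of squared kernel weights
    delta = np.zeros((size, size))
    delta[size // 2, size // 2] = 1.0
    kernel = gaussian_filter(delta, sigma=kernel_sd)
    scale = np.sqrt((kernel ** 2).sum())
    z = gaussian_filter(rng.standard_normal((size, size)), sigma=kernel_sd)
    return z / scale
```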

The first two panels of Figure 1 display an instance of $\theta(s)$ and $Z(s)$, respectively. The third panel displays the resulting sum $X(s)$, computed from the signal-plus-noise model.

In the simulation, we validate and visualize the scenarios presented above, and check the effect of different choices of parameters on the power function, $\mathbb{E}[M_u]$ and the adjusted $\mathbb{E}[M_u]$. Four different scenarios are considered, as discussed in Section 2:

  1. Height equivariance

  2. Small domain size ${\rm Rad}(D)$

  3. Large threshold $u$

  4. Sharp signal (small $\xi$)

Figure 1: 2D simulation: a single instance of $\theta(s)$, $Z(s)$ and the resulting $X(s)$. (a) Mean function $\theta(s)$; (b) Noise $Z(s)$; (c) Data $X(s)$.

For each simulated random field, we record the heights of its local maxima, if at least one exists, and then for any threshold $u$ we calculate the empirical estimates of the detection power (1) and of $\mathbb{E}[M_u]$:

$$\begin{aligned}\hat{\mathbb{P}}[M_u\geq 1]&=\frac1B\sum_{i=1}^B\mathbbm{1}(\exists\text{ a peak in }D\text{ with height}>u\text{ for the }i\text{th simulated sample}),\quad(28)\\ \hat{\mathbb{E}}[M_u]&=\frac1B\sum_{i=1}^B\#\{\text{peaks in }D\text{ with height}>u\text{ for the }i\text{th simulated sample}\}.\quad(29)\end{aligned}$$
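A sketch of these two estimators (ours; it finds interior local maxima with a 3×3 maximum filter, a choice we assume rather than one specified in the paper):

```python
import numpy as np
from scipy.ndimage import maximum_filter

def peak_heights(x):
    # Heights of interior local maxima of a 2D field (8-neighbour rule).
    interior = np.zeros_like(x, dtype=bool)
    interior[1:-1, 1:-1] = True
    is_peak = (x == maximum_filter(x, size=3)) & interior
    return x[is_peak]

def empirical_estimates(fields, u):
    # Monte Carlo estimates (28) and (29) over B simulated samples.
    n_detect, n_peaks = 0, 0
    for x in fields:
        h = peak_heights(x)
        n_detect += bool(np.any(h > u))
        n_peaks += int(np.sum(h > u))
    B = len(fields)
    return n_detect / B, n_peaks / B   # P_hat[M_u >= 1], E_hat[M_u]
```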

Figure 2 displays the power, $\mathbb{E}[M_u]$ and adjusted $\mathbb{E}[M_u]$ curves under the four scenarios. The first panel validates scenario 1 (height equivariance): as stipulated by Proposition 1, the power, $\mathbb{E}[M_u]$ and adjusted $\mathbb{E}[M_u]$ curves are parallel for different signal heights $\theta_0$ when the other parameters remain the same. In the second panel, both the $\mathbb{E}[M_u]$ and adjusted $\mathbb{E}[M_u]$ curves are close to the power curve under scenario 2 (small domain), which indicates that for a smaller domain (quantified by ${\rm Rad}(D)$), using $\mathbb{E}[M_u]$ and adjusted $\mathbb{E}[M_u]$ to approximate power becomes more accurate, as stipulated by Theorem 1. We can also see in all three panels that the $\mathbb{E}[M_u]$ curve converges to the power curve as $u$ increases, as stipulated by Theorem 2. The adjusted $\mathbb{E}[M_u]$ curve also converges to the power curve, but at a slower rate than $\mathbb{E}[M_u]$. The third panel shows that the power, $\mathbb{E}[M_u]$ and adjusted $\mathbb{E}[M_u]$ curves all converge to the Gaussian CDF for a sharp signal (small $\xi$), as stipulated by Theorem 3.

Figure 2: 2D simulation: power approximation using 𝔼[Mu] under four different scenarios (scenario 3 is displayed in all three panels) when the mean function is quadratic. (a) Height equivariance; (b) small domain size; (c) sharp signal.

4.2 Constant mean function

When the mean function θ(s) is constant, i.e., it does not depend on the location s, X(s) reduces to a centered isotropic Gaussian random field. Within the context of this paper, θ(s) = 0 can be seen as the null hypothesis, under which the power function becomes the probability of a Type I error. We use the peak height distribution under θ(s) = 0 (Cheng and Schwartzman, 2018) to choose the threshold so that the test attains the nominal Type I error level. The simulation results for θ(s) = 0 are displayed in Figure 3.

The Type I error approximation when the mean function is 0 behaves similarly to what we found for a quadratic mean function (scenario 4 is omitted since the shape parameter does not exist when the mean function is constant). The conclusion is that, for large u (which is required anyway to control the Type I error) or a small domain, the Type I error approximation is accurate.

Figure 3: 2D simulation: Type I error approximation using 𝔼[Mu]. (a) Height equivariance; (b) small domain size.

4.3 Gaussian mean function

The simulation results under a Gaussian mean function are displayed in Figure 4. For scenarios 2 and 3, the results are consistent with those under a quadratic mean. For scenario 1, since θ0 controls both the signal height and the sharpness, the power, 𝔼[Mu], and adjusted 𝔼[Mu] are no longer equivariant in θ0. For scenario 4, comparing Figure 4(c) with Figure 4(d) shows that, as the signal becomes sharper, the power, 𝔼[Mu], and adjusted 𝔼[Mu] curves converge to the Gaussian CDF only when the domain size (quantified by Rad(D)) is small. Otherwise, the limiting curve is a mixture of the Gaussian CDF and 𝔼[Mu] under a constant mean. This is due to the shape of the Gaussian density, which decays to 0 as the domain expands.

In conclusion, for a Gaussian mean function, we recommend applying our method to approximate power only when the domain size is small.

Figure 4: 2D simulation: power approximation using 𝔼[Mu] when the mean function is Gaussian. (a) Height equivariance; (b) small domain size; (c) sharp signal (Rad(D) = 3); (d) sharp signal (Rad(D) = 20).

5 Estimation from data

To use our power approximation formula in real peak detection problems, we need to estimate the spatial covariance function of the noise as well as the mean function from the data. In this section, we describe the 3D application setting and how to estimate these two functions. Consider an imaging dataset with n subjects, and let Yi(s) represent the signal plus noise for subject i:

Y_{i}(s) = \mu(s) + \sigma(s)\,\varepsilon_{i}(s)

where s = (s1, s2, s3)′ ∈ ℝ³, and the signal μ(s), standard deviation σ(s), and noise εi(s) are assumed to be smooth C³ functions. Computing the standardized mean over all n subjects yields the standardized random field

X(s) = \bar{Y}(s)/\mathrm{SE}(\bar{Y}(s)) = \sqrt{n}\,\mu(s)/\sigma(s) + \sqrt{n}\,\bar{\varepsilon}(s) \qquad (30)

This standardized random field X(s) has constant standard deviation 1. We treat √n μ(s)/σ(s) as the new signal and √n ε̄(s) as the new noise of the standardized field, and write θ(s) = √n μ(s)/σ(s) and Z(s) = √n ε̄(s). We propose the following methods to estimate the new signal and noise.
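A minimal sketch of this standardization, assuming the subject images are stacked along the first axis of a NumPy array (the function name is ours):

```python
import numpy as np

def standardize(Y):
    """Y has shape (n, ...) holding the images Y_i(s) for n subjects.
    Returns X(s) = Ybar(s) / SE(Ybar(s)) as in (30)."""
    n = Y.shape[0]
    ybar = Y.mean(axis=0)
    se = Y.std(axis=0, ddof=1) / np.sqrt(n)   # voxelwise standard error
    return ybar / se
```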

5.1 Estimation of the noise spatial covariance function

We consider the noise Z(s)Z(s) to be constructed by convolving Gaussian white noise with a kernel:

Z(s) = \int_{\mathbb{R}^{N}} K(t - s)\, dB(t) \qquad (31)

where K(·) is an N-dimensional kernel function and dB(t) is Gaussian white noise. We assume the kernel is rotationally symmetric so that the noise Z(s) is isotropic. Under model (31), the noise can be simulated once the kernel function has been estimated from the data.

It can be shown that the autocorrelation of Z(s)Z(s) is the convolution of the kernel with itself:

\mathrm{Cor}(Z(s), Z(s')) = \int_{\mathbb{R}^{N}} K(t-s)K(t-s')\,dt = \int_{\mathbb{R}^{N}} K(t-(s-s'))K(t)\,dt.

By the convolution theorem, convolution in the original domain corresponds to point-wise multiplication in the Fourier domain. The kernel function can therefore be estimated empirically as follows (a code sketch is given after the list):

  1. Determine a location s0 of interest (e.g., the center of the peak), and calculate the empirical correlation vectors between Y(s0) and Y(s), where s lies on the three orthogonal axes centered at s0 and belongs to a subdomain of interest.

  2. Average the three estimated correlation vectors (forcing the noise to be isotropic) and apply the Fourier transform.

  3. Take the square root of the Fourier coefficients; the estimated kernel function is then obtained by the inverse Fourier transform.
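A hedged 1D sketch of steps 2-3, assuming NumPy's FFT routines; clipping negative Fourier coefficients and normalizing the kernel are our own choices to keep the sketch numerically stable.

```python
import numpy as np

def estimate_kernel(corr):
    """Recover a 1D kernel profile from a symmetrized correlation vector
    (the output of step 2), centered at the middle of the array."""
    f = np.fft.rfft(np.fft.ifftshift(corr))     # step 2: Fourier transform
    f = np.sqrt(np.clip(f.real, 0.0, None))     # step 3: sqrt of coefficients
    k = np.fft.fftshift(np.fft.irfft(f, n=corr.size))
    return k / k.sum()                          # normalization (our choice)
```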

5.2 Estimation of the mean function

Our explicit formulas are derived assuming the Hessian of the mean function is a constant multiple of the identity matrix. We therefore seek a rotationally symmetric paraboloid θ̂(s) that best represents the mean function:

\theta(s) = \beta_{0} + \beta_{1}\|s\|^{2} + (\beta_{2}, \beta_{3}, \beta_{4}) \cdot s \qquad (32)

where the dot denotes the vector inner product in ℝ³. All quadratic terms share the same coefficient because of the rotational symmetry. To estimate (32), we fit a linear regression using all values of X(s) within the subdomain as the outcome.
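A minimal sketch of this least-squares fit, assuming NumPy (the helper name is ours):

```python
import numpy as np

def fit_paraboloid(x, coords):
    """Least-squares fit of model (32).
    x: field values X(s) in the subdomain, shape (m,);
    coords: the m corresponding 3D locations s, shape (m, 3)."""
    sq = np.sum(coords ** 2, axis=1)                     # ||s||^2
    A = np.column_stack([np.ones_like(sq), sq, coords])  # design matrix
    beta, *_ = np.linalg.lstsq(A, x, rcond=None)
    return beta                                          # (b0, b1, b2, b3, b4)
```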

6 A 3D real data example

As an application, we illustrate the methods in a group analysis of fMRI data from the Human Connectome Project (HCP) (Van Essen et al., 2012). The data are used here to obtain realistic 3D signal and noise parameters for 3D simulations and to evaluate the performance of our power approximation formulas. The dataset contains fMRI brain scans of 80 subjects; for each subject, the fMRI image has size 91×109×91 voxels. The mean and standard deviation of the data are displayed in Figures 5(a) and 5(b).

Figure 5: HCP data: mean, standard deviation of the data, and standardized mean of the smoothed data (transverse slice at the peak of the image along the third dimension). (a) Mean of the data; (b) standard deviation of the data; (c) standardized mean of the smoothed data. The blue box represents the subdomain of the peak and the red box represents the subdomain used to estimate the noise spatial covariance function.

6.1 Data preprocessing

Gaussian kernel smoothing is applied to the dataset to make the mean function unimodal around the peak and to increase the signal-to-noise ratio. The standard deviation of the smoothing kernel in this example is 1 voxel, which corresponds to a full width at half maximum (FWHM) of about 2.355 voxels. Figure 5(b) makes clear that the standard deviation of the noise is not constant across locations, so we use the transformation described in Section 5 to standardize the smoothed data before analyzing it. The standardized mean of the smoothed data, X(s), is displayed in Figure 5(c).

6.2 Estimation of the autocorrelation and mean functions

After standardizing the data, the next step is to estimate the mean and kernel functions using the methods described in Section 5. A 15×15×15 subdomain (the red box in Figure 5(c)) around the peak is used to estimate the kernel. Since we assume the noise is isotropic, the correlation around the peak should be strictly symmetric along each dimension; this is not always the case in real data. To address this, for each of the three dimensions we save the correlation Cor(X(s), X(s0)) around the peak s0 as a vector, create a second vector by flipping the first, and take the mean of the two, which is guaranteed to be symmetric. The empirical correlation after this symmetrization and the corresponding estimated kernel are displayed in Figure 6.

Figure 6: The empirical correlation after symmetrization and the estimated kernel from a subdomain of the HCP data.
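A one-line sketch of this symmetrization for a 1D correlation profile, assuming NumPy:

```python
import numpy as np

def symmetrize(corr):
    """Average a correlation profile with its mirror image so the result
    is exactly symmetric about the central (peak) entry."""
    return 0.5 * (corr + corr[::-1])
```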

We consider two approaches to estimating the mean function: nonparametric and parametric. The nonparametric estimate is the voxelwise average over subjects,

\hat{\theta}(s) = \frac{1}{n}\sum_{i=1}^{n} X_{i}(s) = \bar{X}(s). \qquad (33)

The parametric estimate is obtained by fitting the linear regression model (32), using all observed values of X(s) within a subdomain of size 6×6×6 (the blue box in Figure 5(c)) as the outcome and the corresponding location variables ‖s‖² and s as covariates. The least squares estimate of the mean is

\hat{\theta}(s) = 13.03 - 0.26\|s\|^{2} + (0.20, 0.11, 0.39) \cdot s \qquad (34)

We will compare the simulated power and 𝔼[Mu] when the mean function is estimated by the nonparametric approach (33) versus the parametric approach (34).

6.3 3D Simulation induced by data

We have carried out several simulation studies in a well-controlled 2D setting where the formulas are expected to work well, but ultimately we want to apply the formulas to real-life data, which are more complicated. Moreover, fMRI images are inherently 3D. For these reasons, a 3D simulation study induced by real data is needed to validate the performance of the formulas in a more realistic setting.

In the previous two subsections we studied the signal and noise of the HCP data. For the simulation, we generate 3D images using the estimated mean and kernel functions. The noise field is generated by convolving the estimated kernel (displayed in Figure 6) with Gaussian white noise; for each simulation setting, 10,000 such noise fields are generated.
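A minimal sketch of this noise generator, assuming SciPy's FFT-based convolution; rescaling to unit variance by the empirical standard deviation is our choice.

```python
import numpy as np
from scipy.signal import fftconvolve

def simulate_noise(kernel, shape, rng):
    """One isotropic 3D noise field: the estimated kernel convolved with
    Gaussian white noise, as in model (31), rescaled to unit variance."""
    w = rng.standard_normal(shape)
    z = fftconvolve(w, kernel, mode="same")
    return z / z.std()
```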

The signal in the standardized data is very strong (see Figure 5(c)). For illustrative purposes, we weaken the signal by scaling down the estimated mean function (34) while maintaining its shape. Signal strength is measured by effect size, and the amount of scaling is determined by the desired effect size. In a traditional t-test or z-test, Cohen's d values of 0.2, 0.5, and 0.8 (corresponding to the 0.58, 0.69, and 0.79 quantiles of the standard Gaussian distribution) are considered small, medium, and large effect sizes (Cohen, 1988). The peak height distribution under the null hypothesis (zero mean function), displayed in Figure 7, does not follow a Gaussian distribution. We therefore take the 0.58, 0.69, and 0.79 quantiles of the null distribution minus its mean as the small (0.16), medium (0.40), and large (0.65) effect sizes (the black dash-dot lines in Figure 7). For simplicity, we treat the peak height of the mean function as the effect size in this simulation, although this is not the most accurate definition of effect size in the peak detection setting; more details are discussed in Section 7.2. The threshold u for peak detection is chosen as the 0.99 quantile of the peak height distribution under the null (≈3.42), following Cheng and Schwartzman (2017) (the red dashed line in Figure 7).

Figure 7: 3D simulation induced by data: simulated peak height distribution under the null (zero mean) with the different levels of effect size and the threshold u.

As in the 2D simulation, the search domain D is a sphere centered at the true peak, and we use the radius of D to control the domain size. The signal sharpness is fixed since it is estimated from the data. The empirical power and 𝔼[Mu] are computed using (28) and (29).

To derive the explicit formulas for 𝔼[Mu], we assume the mean function is a rotationally symmetric paraboloid, and this assumption may introduce some bias in applications. Figure 8 compares the simulated power and 𝔼[Mu] when the mean function is estimated nonparametrically (33) versus parametrically (34); the quadratic approximation has only a small impact on power and 𝔼[Mu] in this example. Figure 9 validates our theoretical formula (19) for 𝔼[Mu] as well as the adjusted 𝔼[Mu]: the theoretical curves closely mirror the empirical ones. The figure also shows that the power approximation using 𝔼[Mu] is accurate for large u, as stipulated by Theorem 2. Power curves for three effect sizes, comparing large and small domain sizes, are displayed in Figure 10. The approximation works well for both small and large sample sizes, and 𝔼[Mu]adj provides a conservative determination of the sample size when 𝔼[Mu] exceeds 1. The power approximation using 𝔼[Mu] also improves as the domain size shrinks, as stipulated by Theorem 7.

Figure 8: 3D simulation induced by data (Rad(D) = 3, medium effect size): simulated power and 𝔼[Mu] when the mean function is obtained from raw data versus the quadratic estimate (34). Here 𝔼[Mu] and adjusted 𝔼[Mu] coincide since 𝔼[M−∞] < 1.

Figure 9: 3D simulation induced by data (Rad(D) = 6, medium effect size): simulated versus theoretical 𝔼[Mu] and adjusted 𝔼[Mu].

Figure 10: 3D simulation induced by data: power curves when the signal has small, medium, and large effect size, comparing large and small domain sizes. Panels (a)-(c): small, medium, and large effect size with a large domain; panels (d)-(f): the same effect sizes with a small domain.

7 Discussion

7.1 Explicit formulas and approximations

Calculating the power of peak detection (1) has been a difficult problem in random field theory due to the lack of a formula that computes it directly. In this paper, we have discussed the rationale for using 𝔼[Mu] and 𝔼[Mu]adj to approximate peak detection power under different scenarios, and we have derived formulas to compute 𝔼[Mu] assuming isotropy. Isotropy allows us to use the GOI matrix (Cheng and Schwartzman, 2018) as a tool to calculate 𝔼[Mu] via the Kac-Rice formula.

We also derived explicit formulas for H(x̃) (defined in (18)) for N = 1, 2, 3, assuming the mean function is a paraboloid. Computing H(x̃) involves the probability density function of the eigenvalues of GOI matrices; details can be found in the proofs of Propositions 2, 3, and 4. 𝔼[Mu] is then calculated by plugging H(x̃) into (19). The integral in (19), however, cannot be evaluated explicitly; in practice, one may evaluate it numerically. For higher dimensions (N > 3), an explicit form of H(x̃) remains out of reach because, as Propositions 2, 3, and 4 suggest, the integration becomes rapidly more complicated as N grows.

7.2 Effect size

We want to emphasize that the power depends on both the signal strength parameter θ0 and the shape parameter η. A traditional z-test or t-test of the single null hypothesis that the mean equals 0 has detection power depending on a single parameter, the effect size. Here, in contrast, the test is conditional on the point being a local maximum. Applying a simple z-test or t-test, one would reject the null hypothesis whenever the peak height θ0 exceeds a pre-specified threshold; this is inaccurate because the peak height does not follow a Gaussian or t distribution. To address this, the threshold can be determined from the null distribution of the peak height (Cheng and Schwartzman, 2018), controlling the Type I error at the nominal level. Even then, a power calculation based only on peak height is biased, since the true effect size depends on both the signal height and the curvature: the height of the peak affects the likelihood of exceeding the threshold, while the curvature affects the likelihood that such a peak exists in the domain. It follows that a sharp, high peak is easier to detect than a flat, low one, yielding larger detection power.

For an interpretation of the parameter η = θ′′/(−2ρ′), we consider two types of mean function: paraboloid and Gaussian. Suppose the noise results from the convolution of white noise with a Gaussian kernel with spatial std. dev. ν, giving the covariance function ρ(r) = exp(−r/(2ν²)) specified in Section 3.1; this is the same noise as simulated in Section 4. When the mean function is a paraboloid, consider θ(s) = −‖s‖²/(2ξ²) + θ0 as in (27). Then θ′′ = −1/ξ² and ρ′ = −1/(2ν²), yielding η = θ′′/(−2ρ′) = −ν²/ξ². Thus η is a shape parameter representing the relative sharpness of the mean function with respect to the curvature of the noise. When the mean function is Gaussian, consider θ(s) = a exp(−‖s‖²/(2τ²)); this expression arises, for example, when the signal results from the convolution of a delta function with a Gaussian kernel with spatial std. dev. τ. Then θ′′ = −a/τ² and ρ′ = −1/(2ν²), yielding η = θ′′/(−2ρ′) = −aν²/τ². Thus η is the height of the signal a, scaled by the ratio of the spatial extents of the noise and signal filters. In both cases, the parameter η, and hence the power, are invariant under isotropic scaling of the domain, much like the peak height distribution under the null hypothesis (Cheng and Schwartzman, 2020).
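As a concrete check of the paraboloid case (our own arithmetic, using the kernel std. dev. ν = 5 from the 2D simulation of Section 4 and the default ξ = 7 from Table 1): η = −ν²/ξ² = −25/49 ≈ −0.51.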

Figure 11: 2D simulation: power and 𝔼[Mu] for different θ0 and η (u = 3.92 and Rad(D) = 10). (a) Power; (b) 𝔼[Mu].

Figure 11 illustrates how θ0 and η affect power and 𝔼[Mu] in the 2D simulation of Section 4. As explained above, θ0 and η together determine the effect size. Although deriving an explicit expression for the effect size as a function of θ0 and η is difficult, we can show roughly how the two parameters relate to power. θ0, which can be viewed as a signal-to-noise ratio (SNR), plays the major role: with η held fixed, power increases monotonically in θ0, while with θ0 held fixed, power decreases monotonically in η. In this simulation, fitting a linear model of power on θ0 and η shows the impact of θ0 to be about 10 times stronger than that of η. The figure also shows that the effect of η on power is stronger for large θ0 than for small θ0.

7.3 Application to data

To use our formula to calculate power in practice, one needs to assume the peak has a certain form, such as a paraboloid or Gaussian. When such assumptions are not plausible, the resulting power estimates may be inaccurate.

Regarding the conjecture that 𝔼[Mu]adj is a lower bound for the power when at least one local maximum exists in the domain D, a general proof remains difficult, but the real data example suggests it holds in practice. In a real-life problem, one can consider both 𝔼[Mu] and 𝔼[Mu]adj to better understand the required sample size: we suggest using 𝔼[Mu] to approximate power when the sample size is small and 𝔼[Mu]adj when the sample size is large. 𝔼[Mu]adj also gives a more conservative power estimate than 𝔼[Mu], which helps guarantee that a future study is sufficiently powered. Because of its difficulty, we leave further study of 𝔼[Mu]adj for future work.

8 Acknowledgments

Y.Z., D.C. and A.S. were partially supported by NIH grant R01EB026859 and NSF grant 1811659. Data were provided in part by the Human Connectome Project, WU-Minn Consortium (Principal Investigators: David Van Essen and Kamil Ugurbil; 1U54MH091657) funded by the 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research; and by the McDonnell Center for Systems Neuroscience at Washington University.

9 Appendix

References

  • Adler and Taylor (2007) R. J. Adler, J. E. Taylor, Random Fields and Geometry, New York: Springer, 2007.
  • Cheng and Schwartzman (2015) D. Cheng, A. Schwartzman, Distribution of the height of local maxima of Gaussian random fields, Extremes 18 (2015) 213–240.
  • Cheng and Schwartzman (2017) D. Cheng, A. Schwartzman, Multiple testing of local maxima for detection of peaks in random fields, Annals of Statistics 45(2) (2017) 529–556.
  • Cheng and Schwartzman (2018) D. Cheng, A. Schwartzman, Expected number and height distribution of critical points of smooth isotropic Gaussian random fields, Bernoulli 24(4B) (2018) 3422–3446.
  • Cheng and Schwartzman (2020) D. Cheng, A. Schwartzman, On critical points of Gaussian random fields under diffeomorphic transformations, Statistics & Probability Letters 158 (2020) 108672.
  • Cohen (1988) J. Cohen, Statistical Power Analysis for the Behavioral Sciences, Routledge, 1988.
  • Durnez et al. (2016) J. Durnez, J. Degryse, B. Moerkerke, R. Seurinck, V. Sochat, R. A. Poldrack, T. E. Nichols, Power and sample size calculations for fMRI studies based on the prevalence of active peaks (2016).
  • Genovese et al. (2002) C. R. Genovese, N. A. Lazar, T. Nichols, Thresholding of statistical maps in functional neuroimaging using the false discovery rate, NeuroImage 15 (2002) 870–878.
  • Heller et al. (2006) R. Heller, D. Stanley, D. Yekutieli, N. Rubin, Y. Benjamini, Cluster-based analysis of fMRI data, NeuroImage 33 (2006) 599–608.
  • Piterbarg (1996) V. I. Piterbarg, Rice's method for large excursions of Gaussian random fields, Technical Report No. 478, Center for Stochastic Processes, Univ. North Carolina (1996).
  • Schwartzman et al. (2008) A. Schwartzman, W. Mascarenhas, J. Taylor, Inference for eigenvalues and eigenvectors of Gaussian symmetric matrices, Annals of Statistics 36(6) (2008) 2886–2919.
  • Schwartzman and Telschow (2019) A. Schwartzman, F. Telschow, Peak p-values and false discovery rate inference in neuroimaging, NeuroImage 197 (2019) 402–413.
  • Van Essen et al. (2012) D. C. Van Essen, K. Ugurbil, E. Auerbach, D. Barch, T. E. J. Behrens, R. Bucholz, A. Chang, L. Chen, M. Corbetta, S. W. Curtiss, S. Della Penna, D. Feinberg, M. F. Glasser, N. Harel, A. C. Heath, L. Larson-Prior, D. Marcus, G. Michalareas, S. Moeller, R. Oostenveld, S. E. Petersen, F. Prior, B. L. Schlaggar, S. M. Smith, A. Z. Snyder, J. Xu, E. Yacoub, WU-Minn HCP Consortium, The Human Connectome Project: a data acquisition perspective, NeuroImage 62 (2012) 2222–2231.