
Asymptotic Uncertainty of False Discovery Proportion

Meng Mei{}^{\text{a}}, Tao Yu{}^{\text{b}}, and Yuan Jiang{}^{\text{a}}

{}^{\text{a}} Department of Statistics, Oregon State University
{}^{\text{b}} Department of Statistics and Data Science, National University of Singapore

Yuan Jiang is the corresponding author. Meng Mei and Yuan Jiang’s research is supported in part by National Institutes of Health grant R01 GM126549. Tao Yu’s research is supported in part by the Singapore Ministry of Education Academic Research Tier 1 Funds: R-155-000-202-114.
Abstract

Multiple testing has been a popular topic in statistical research. Although a vast body of work exists, controlling false discoveries remains a challenging task when the corresponding test statistics are dependent. Various methods have been proposed to estimate the false discovery proportion (FDP) under arbitrary dependence among the test statistics. One of the main ideas is to reduce arbitrary dependence to weak dependence and then to establish theoretically the strong consistency of the FDP and the false discovery rate (FDR) under weak dependence. As a consequence, FDPs share the same asymptotic limit in the framework of weak dependence. We observe, however, that the asymptotic variance of the FDP may rely heavily on the dependence structure of the corresponding test statistics even when they are only weakly dependent; quantifying this variability is of great practical value, as it can serve as an indicator of the quality of the FDP estimate from the given data. As far as we are aware, research in this respect is still limited in the literature. In this paper, we first derive the asymptotic expansion of FDP under mild regularity conditions and then examine, both theoretically and numerically, how the asymptotic variance of FDP varies under different dependence structures. Based on these observations, we recommend that a multiple testing study performed with an FDP procedure report both the mean and the variance estimates of FDP to enrich the study outcome.

Keywords: Asymptotic expansion; Asymptotic variance; Multiple testing; Weak dependence.

1 Introduction

Multiple hypothesis testing has been a popular topic in statistical research; it has wide applications in many scientific areas, such as biology, medicine, genetics, neuroscience, and finance. Consider a multiple testing problem where we need to test p hypotheses simultaneously; denote by p_{0} and p_{1} the number of true null and false null hypotheses, respectively. The testing outcomes can be summarized in Table 1,

Table 1: Summary of possible outcomes from multiple hypothesis testing
            Not rejected   Rejected   Total
True Null   U              V          p_{0}
False Null  T              S          p_{1}
Total       p-R            R          p

where V and R denote the number of hypotheses that are wrongly rejected (i.e., the number of type I errors made) and rejected (i.e., the number of rejections made) out of the p hypotheses, respectively.

Conventional methods propose to control the familywise error rate (FWER): \text{FWER}=P(V>0) (Bonferroni, 1936; Šidák, 1967; Holm, 1979; Simes, 1986; Holland and Copenhaver, 1987; Hochberg, 1988; Rom, 1990), or the generalized familywise error rate (gFWER): \text{gFWER}=P(V>k), \text{ for } k\geq 1 (Dudoit et al., 2004; Pollard and van der Laan, 2004; Lehmann and Romano, 2012). These methods control the probability of zero or a limited number of false positives; therefore, they are usually conservative, especially when the number of hypotheses p is large. We observe that large p is common in practice. For example, in a genome-wide association study, hundreds of thousands or even millions of single nucleotide polymorphisms (SNPs) of each subject and the associated disease status are measured. It is of interest to find which SNPs are associated with the disease status. A common approach is to conduct a hypothesis test for each SNP; this results in a huge number of hypothesis tests.

When p is large, the false discovery proportion (FDP) and its statistical characteristics have been introduced as alternative criteria. The false discovery rate (FDR) has been the most widely studied; it was introduced by Benjamini and Hochberg (1995) and was proven to be a reasonable criterion; in particular,

\mathrm{FDP}=\frac{V}{R},\qquad\mathrm{FDR}=E(\mathrm{FDP}),

where \mathrm{FDP}=\mathrm{FDR}=0 if R=0. A list of FDR research can be found in Benjamini and Hochberg (1995), Benjamini et al. (2001), Storey (2002), Storey and Tibshirani (2003), Storey et al. (2004), Ferreira and Zwinderman (2006), Clarke et al. (2009), and the references therein. We observe that most of these methods are founded on the assumption that the test statistics are independent or satisfy some restrictive dependence structure.

Because of the complicated nature of practical data, the resultant test statistics may follow an arbitrary dependence structure. The effects of dependence on FDR have been studied extensively, e.g., by Benjamini et al. (2001), Finner and Roters (2002), Owen (2005), Sarkar (2006), Efron (2007), among others. Recently, new methods for the control and/or estimation of FDR have been proposed to allow complex and general dependence structures or even incorporate them in the testing procedure. Examples include the local index of significance (LIS) testing procedure for dependence modeled by a hidden Markov model (Sun and Cai, 2009), the pointwise and clusterwise analysis for spatial dependence (Sun et al., 2015), and the principal factor approximation (PFA) method for arbitrary dependence (Fan et al., 2012; Fan and Han, 2017; Fan et al., 2019). In particular, Fan et al. (2012) studied FDP and FDR under arbitrary dependence with a two-step procedure. They first proposed a PFA method to reduce arbitrary dependence to weak dependence, and then established theoretically the strong consistency of the FDP and FDR. Therefore, their method is valid for test statistics under arbitrary dependence; under weak dependence, both FDP and FDR converge to the same asymptotic limit.

In addition to FDR, another popular multiple testing criterion is called false discovery exceedance (FDX, also referred to as TPPFP in the literature), which is defined as the tail probability of the event that FDP exceeds a threshold q\in(0,1):

\text{FDX}=P(\mathrm{FDP}>q).

Compared to FDR, which is only a single-value summary, FDX has the potential to fully describe the uncertainty and/or distribution of FDP as the threshold value q varies. However, the literature has not fully characterized how the dependence among the test statistics impacts FDX, as opposed to FDR. In particular, some works on FDX assume independence among tests (Genovese and Wasserman, 2004; Guo and Romano, 2007; Ge and Li, 2012); more importantly, most works focus on developing procedures that control FDX as a single-value characteristic of FDP rather than investigating how dependence affects the uncertainty and/or distribution of FDP (Korn et al., 2004; van der Laan et al., 2004; Lehmann and Romano, 2005; Genovese and Wasserman, 2006; Delattre and Roquain, 2015; Hemerik et al., 2019; Döhler and Roquain, 2020; Basu et al., 2021). Thus, these works still lack the valuable information about how FDP varies from study to study.

As far as we are aware, most of the above works have focused on proposing estimates of FDP and/or its related statistical characteristics. On the one hand, studying these estimates is sufficient to ensure that the corresponding FDP and/or its statistical characteristics are under control. On the other hand, we observe that the asymptotic uncertainty of the FDP may rely heavily on the dependence structure of the corresponding test statistics; quantifying this variability is of great practical value, as it can serve as an indicator of the quality of the FDP estimate and benefit research on FDR and FDX methods. More specifically, with the FDP variance estimate available, one can conclude more confidently how well the FDP is controlled when controlling its corresponding mean at a given level. It may also benefit the development of methods for estimating and controlling FDX. Research on the uncertainty of FDP has high potential impact, but it is still limited in the community. For example, Ge and Li (2012) and Delattre and Roquain (2011) derived the asymptotic distribution of FDP under independence and Gaussian equi-correlation dependence, respectively; Delattre and Roquain (2016) established the asymptotic distribution of \mathrm{FDP} under some assumptions on the dependence structure of the test statistics and on the condition that the expected values of the test statistics under the alternatives are equal.

In this paper, our focus is to provide a rigorous theoretical investigation of the asymptotic uncertainty of FDP in the framework where the test statistics are weakly dependent. In particular, we derive a theoretical expansion of the FDP as a linear combination of the tests, based on which we obtain its asymptotic variance. We also examine, both theoretically and numerically, how this variance varies under different dependence structures among the test statistics. One interesting observation is that the FDP based on weakly dependent test statistics converges to the same limit as that based on independent test statistics (Fan et al., 2012); however, the asymptotic variance of the former is almost always significantly greater than that of the latter, both theoretically and numerically. With these observations, we recommend that a multiple testing study performed with an FDP procedure report both the mean and the variance estimates of FDP to enrich the study outcome. Finally, with a real GWAS dataset, we demonstrate how our findings may impact real studies.

2 Asymptotic Limit of FDP Under Weak Dependence

Consider a genome-wide association study with n subjects; for each subject, the disease status and p SNPs are measured. The data can be represented by \mathbf{Y}=(Y_{1},\ldots,Y_{n})^{T} and an n\times p matrix \mathbf{X}=(X_{ij})_{i=1,\ldots,n;j=1,\ldots,p}, where Y_{i} denotes the disease status of subject i, and the ith row of \mathbf{X} collects its associated SNPs. When the Y_{i}’s are measured on a continuous scale, the association between the jth SNP and the disease status can be modelled by the marginal linear regression model:

Y_{i}=\alpha_{j}+\beta_{j}X_{ij}+\epsilon_{ij},\quad i=1,\ldots,n, (1)

where \alpha_{j} and \beta_{j} are regression parameters and \epsilon_{ij} are random errors. The estimates \hat{\beta}_{j},\ j=1,\ldots,p and the corresponding test statistics Z_{j},\ j=1,\ldots,p for

H_{0j}:\beta_{j}=0\quad\text{versus}\quad H_{1j}:\beta_{j}\neq 0,\quad j=1,\ldots,p, (2)

can be derived with a standard least squares procedure. Proposition 1 in Fan et al. (2012) verified that \hat{\beta}_{j},\ j=1,\ldots,p are correlated and derived their joint distribution under appropriate regularity conditions. As a consequence, the test statistics Z_{j},\ j=1,\ldots,p are also correlated. In particular, they derived that conditioning on \mathbf{X},

(Z_{1},\ldots,Z_{p})^{T}\sim N((\mu_{1},\ldots,\mu_{p})^{T},\boldsymbol{\Sigma}), (3)

where (\mu_{1},\ldots,\mu_{p})^{T} and \boldsymbol{\Sigma}=(\sigma_{ij})_{i,j=1,\ldots,p} are the appropriate population mean and variance matrix for (Z_{1},\ldots,Z_{p})^{T}. Furthermore, the testing problem (2) becomes equivalent to

H_{0j}:\mu_{j}=0\quad\text{versus}\quad H_{1j}:\mu_{j}\neq 0,\quad j=1,\ldots,p. (4)

Therefore, we shall work on the multiple testing problem under the setup (3) and (4) hereafter. Let \mathcal{H}_{0}=\{j:H_{0j}\text{ is true}\} and \mathcal{H}_{1}=\{j:H_{0j}\text{ is false}\} respectively be the sets of indices for the true nulls and the true alternatives, and let p_{0} and p_{1} denote their cardinalities. For a given threshold t\in(0,1), the jth test is given by t_{j}=\mathrm{1}(|Z_{j}|>|z_{t/2}|), where t_{j}=1 indicates that the null hypothesis is rejected, and t_{j}=0 otherwise; here z_{t/2} denotes the (t/2)th quantile of the standard normal distribution. Define V(t)=\sum_{j\in\mathcal{H}_{0}}t_{j}, S(t)=\sum_{j\in\mathcal{H}_{1}}t_{j}, and R(t)=\sum_{j=1}^{p}t_{j}; they are respectively the number of false discoveries, the number of correct discoveries, and the total number of discoveries. As a consequence, we have \mathrm{FDP}(t)=V(t)/R(t).
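
To make this setup concrete, the following is a minimal sketch (in Python, assuming numpy and scipy are available) of how V(t), R(t), and \mathrm{FDP}(t) can be computed from a vector of z-statistics when the set of true nulls is known, as it is in simulations; the variable names are ours and are not part of the original procedure.

```python
# Sketch: computing V(t), R(t), and FDP(t) from z-statistics when the set of
# true nulls is known (a simulation-only convenience); names are ours.
import numpy as np
from scipy.stats import norm

def fdp(z, is_null, t):
    z_cut = norm.ppf(t / 2)                 # z_{t/2}, the (t/2)th normal quantile
    reject = np.abs(z) > abs(z_cut)         # t_j = 1(|Z_j| > |z_{t/2}|)
    R = reject.sum()                        # total discoveries R(t)
    V = reject[is_null].sum()               # false discoveries V(t)
    return V / R if R > 0 else 0.0          # FDP(t), with the convention 0/0 = 0

# Example: p = 2000 independent tests with 100 signals of strength 3.
rng = np.random.default_rng(0)
p, p1, t = 2000, 100, 0.02
means = np.zeros(p); means[:p1] = 3.0
z = rng.normal(means, 1.0)
print(fdp(z, means == 0, t))
```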

Fan et al. (2012) studied the estimation of \mathrm{FDP}(t) under an arbitrary dependence structure of \boldsymbol{\Sigma}. One milestone finding is that if \boldsymbol{\Sigma} follows the weak dependence structure:

\sum_{i,j}|\sigma_{ij}|=O\left(p^{2-\delta}\right)\text{ for some }\delta>0, (5)

then,

\lim_{p\to\infty}\left[\mathrm{FDP}(t)-\frac{p_{0}t}{\sum_{i=1}^{p}\left\{\Phi(z_{t/2}+\mu_{i})+\Phi(z_{t/2}-\mu_{i})\right\}}\right]=0, (6)

almost surely, where \Phi(\cdot) denotes the cumulative distribution function of the standard normal distribution. This indicates that under the aforementioned weak dependence assumption, \mathrm{FDP}(t) converges to a limit that does not depend on the covariance structure of the test statistics. It is noteworthy that this limit holds not only for \mathrm{FDP}(t), but also for \mathrm{FDR}(t).
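
As a small illustration of (6), the following sketch evaluates this limit from a mean vector and a threshold; identifying the true nulls by \mu_{i}=0 is, again, a simulation-only convenience.

```python
# Sketch: the asymptotic limit of FDP(t) in (6), given the mean vector and t.
# True nulls are taken as the entries of `means` equal to zero.
import numpy as np
from scipy.stats import norm

def fdp_limit(means, t):
    z_cut = norm.ppf(t / 2)                                   # z_{t/2}
    p0 = np.sum(means == 0)
    denom = np.sum(norm.cdf(z_cut + means) + norm.cdf(z_cut - means))
    return p0 * t / denom

# For a true null (mu_i = 0) each summand equals 2*Phi(z_{t/2}) = t, so if all
# hypotheses are null the limit is p0*t / (p0*t) = 1, as it should be.
```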

We observe that this conclusion does not hold for the asymptotic variance of \mathrm{FDP}(t); the asymptotic variance of \mathrm{FDP}(t) varies under different covariance structures among the test statistics even when they are weakly dependent. In this article, we shall derive the asymptotic variance of \mathrm{FDP}(t) theoretically and study numerically how it varies under different dependence structures of the test statistics.

3 Asymptotic Variance of FDP

In this section, we study the asymptotic variance of \mathrm{FDP}(t) in the framework of Section 2. We first establish the asymptotic expansion of \mathrm{FDP}(t) and its asymptotic variance under a weak dependence structure. Then, we show that under mild conditions, this asymptotic variance under dependence is higher than that under independence. This implies that under weak dependence, although the asymptotic mean of \mathrm{FDP}(t) remains the same, its asymptotic variance can be inflated. Due to space limitations and for presentational continuity, we relegate the technical details to the supplementary materials.

3.1 Asymptotic expansion and variance of FDP

We first derive the asymptotic expansion of the FDP in the framework of weak dependence. We introduce the following notation. For any t\in[0,1], we denote \xi_{j}=\Phi(z_{t/2}+\mu_{j})+\Phi(z_{t/2}-\mu_{j}) and \bar{\xi}=\frac{1}{p_{1}}\sum_{j\in\mathcal{H}_{1}}\xi_{j}. Consider the function

H(\mu)=\phi(|z_{t/2}|+\mu)(|z_{t/2}|+\mu)+\phi(|z_{t/2}|-\mu)(|z_{t/2}|-\mu);

there exists a unique root of this function for \mu\in(|z_{t/2}|,|z_{t/2}|+1), denoted by \mu_{t}. Let C_{t}^{\max}=\sup_{\mu\in(-\mu_{t},\mu_{t})}H(\mu).
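
These two quantities enter Condition (7) below and can be computed numerically. The following is a minimal sketch, with \phi the standard normal density; it assumes that H changes sign on (|z_{t/2}|,|z_{t/2}|+1), which holds for the thresholds t considered in this paper, and it approximates the supremum over (-\mu_{t},\mu_{t}) on a grid.

```python
# Sketch: solving for mu_t (the root of H on (|z_{t/2}|, |z_{t/2}|+1)) and
# evaluating C_t^max on a grid; phi is the standard normal density.
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def H(mu, t):
    c = abs(norm.ppf(t / 2))                                  # |z_{t/2}|
    return norm.pdf(c + mu) * (c + mu) + norm.pdf(c - mu) * (c - mu)

def mu_t_and_Cmax(t, grid_size=2001):
    c = abs(norm.ppf(t / 2))
    mu_t = brentq(H, c + 1e-8, c + 1.0, args=(t,))            # unique root in (c, c+1)
    grid = np.linspace(-mu_t, mu_t, grid_size)
    C_max = H(grid, t).max()                                  # H is even in mu
    return mu_t, C_max

print(mu_t_and_Cmax(0.02))                                    # e.g., t = 0.02
```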

Theorem 1.

Suppose (Z_{1},\ldots,Z_{p})^{T}\sim N((\mu_{1},\ldots,\mu_{p})^{T},\boldsymbol{\Sigma}) with unit variances, i.e., \sigma_{jj}=1 for j=1,\ldots,p. Assume that (Z_{1},\ldots,Z_{p})^{T} are weakly dependent as defined in (5), that \limsup_{p\to\infty}p_{0}t/(p_{1}\bar{\xi})<1, and that for sufficiently large p and a universal constant C>0,

\sum_{i\neq j;i,j\in\mathcal{H}_{0}}\sigma_{ij}^{2}\geq\frac{C_{t}^{\max}}{\phi(z_{t/2})|z_{t/2}|}\sum_{i\in\mathcal{H}_{1},j\in\mathcal{H}_{0},\mu_{i}\in[-\mu_{t},\mu_{t}]}\sigma_{ij}^{2}, (7)
\sum_{i\neq j;i,j\in\mathcal{H}_{0}}\sigma_{ij}^{2}+p\geq C\sum_{i\neq j;i,j\in\mathcal{H}_{1}}\sigma_{ij}^{2}, (8)
\sum_{i\neq j}\sigma_{ij}^{4}=o\left(\sum_{i\neq j;i,j\in\mathcal{H}_{0}}\sigma_{ij}^{2}+p_{0}\right). (9)

We have the following asymptotic expansion of \mathrm{FDP}(t):

\mathrm{FDP}(t)=\frac{E(\bar{V})}{E(\bar{R})}+m(\bar{V},\bar{R})+r(\bar{V},\bar{R}), (10)

where \bar{V}=V(t)/p, \bar{R}=R(t)/p,

m(\bar{V},\bar{R})=\frac{\bar{V}}{E(\bar{R})}-\frac{E(\bar{V})}{\{E(\bar{R})\}^{2}}\bar{R},

and the remainder term r(\bar{V},\bar{R}) satisfies E\{r^{2}(\bar{V},\bar{R})\}=o[\mathrm{Var}\{m(\bar{V},\bar{R})\}].

Remark 1.

Theorem 1 establishes the asymptotic expansion of \mathrm{FDP}(t). It decomposes \mathrm{FDP}(t) into three parts: the asymptotic mean E(\bar{V})/E(\bar{R}), the stochastic term m(\bar{V},\bar{R}) that has mean zero, and an asymptotically negligible remainder term r(\bar{V},\bar{R}). The asymptotic mean E(\bar{V})/E(\bar{R}) is a constant, which reinforces the conclusion given by (6); the stochastic term m(\bar{V},\bar{R}) is a linear combination of the tests t_{j},\ j=1,\ldots,p, which gives the asymptotic variance of \mathrm{FDP}(t).
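
As an illustration of this decomposition, the following sketch evaluates, for a single simulated draw, the leading mean E(\bar{V})/E(\bar{R}), the stochastic term m(\bar{V},\bar{R}), and the implied remainder; the exact Gaussian expectations E(\bar{V})=p_{0}t/p and E(\bar{R})=\sum_{j}\xi_{j}/p are used, and identifying the true nulls by \mu_{j}=0 is a simulation-only convenience.

```python
# Sketch: evaluating the three terms of expansion (10) for one draw of z.
# `means` holds (mu_1, ..., mu_p); true nulls are the entries equal to zero.
import numpy as np
from scipy.stats import norm

def expansion_terms(z, means, t):
    p = len(z)
    null = (means == 0)
    zq = norm.ppf(t / 2)                              # z_{t/2}
    reject = np.abs(z) > abs(zq)                      # t_j
    Vbar, Rbar = reject[null].sum() / p, reject.sum() / p
    xi = norm.cdf(zq + means) + norm.cdf(zq - means)  # E(t_j)
    EV, ER = null.sum() * t / p, xi.sum() / p         # E(V-bar), E(R-bar)
    mean_term = EV / ER                               # asymptotic mean
    m = Vbar / ER - EV / ER ** 2 * Rbar               # stochastic term
    r = (Vbar / Rbar if Rbar > 0 else 0.0) - mean_term - m   # remainder
    return mean_term, m, r
```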

Remark 2.

With the weak dependence assumption (5), the other technical conditions in Theorem 1, i.e., Conditions (7)–(9), are mild as long as \boldsymbol{\Sigma} does not behave too extremely; the intuition is as follows. The number of summed terms on the left hand side of (7) is about p_{0}^{2}; in contrast, that on the right hand side is about p_{0}p_{1}. If p_{0}\gg p_{1}, the number of summed terms on the left hand side of (7) is much larger than that on the right. Therefore, this condition is satisfied unless the magnitudes of \sigma_{ij} for i\in\mathcal{H}_{0},\ j\in\mathcal{H}_{1} are generally much bigger than those for i,j\in\mathcal{H}_{0}. Similar arguments apply to Condition (8). For Condition (9), the summation on the left hand side is over the fourth power of \sigma_{ij}, whereas that on the right is over the second power; therefore, it is easily satisfied when most of the \sigma_{ij} approach 0 as p\to\infty.

Remark 3.

The development of Theorem 1 is theoretically challenging. The expansion (10) can be observed from a standard Taylor expansion; however, the derivation of E\{r^{2}(\bar{V},\bar{R})\}=o[\mathrm{Var}\{m(\bar{V},\bar{R})\}] is technically involved. More specifically, two features jointly introduce challenges in our developments: (1) the leading component of E\{r^{2}(\bar{V},\bar{R})\} is the summation of fourth moments of all the tests; we observe that to compare its asymptotic properties with \mathrm{Var}\{m(\bar{V},\bar{R})\}, a fourth order Taylor expansion of each moment term is further needed; (2) the dependence structure of the tests significantly complicates our study of the asymptotic upper bound of E\{r^{2}(\bar{V},\bar{R})\} and the lower bound of \mathrm{Var}\{m(\bar{V},\bar{R})\}.

Following the above asymptotic expansion of \mathrm{FDP}(t) in Theorem 1, Corollary 1 below presents its asymptotic variance.

Corollary 1.

Under the conditions of Theorem 1, we have

\lim_{p\to\infty}\dfrac{\mathrm{Var}\left\{\mathrm{FDP}(t)\right\}}{V_{1}(t)+V_{2}(t)}=1, (11)

where

V_{1}(t)=\dfrac{p_{1}^{2}\bar{\xi}^{2}}{(p_{0}t+p_{1}\bar{\xi})^{4}}\times p_{0}t(1-t)+\dfrac{p_{0}^{2}t^{2}}{(p_{0}t+p_{1}\bar{\xi})^{4}}\times p_{1}\bar{\xi}(1-\bar{\xi}), (12)

V_{2}(t)=\dfrac{2p_{1}^{2}\bar{\xi}^{2}}{(p_{0}t+p_{1}\bar{\xi})^{4}}\sum_{\begin{subarray}{c}i<j\\ i,j\in\mathcal{H}_{0}\end{subarray}}\mathrm{Cov}(t_{i},t_{j})-\dfrac{2p_{0}p_{1}t\bar{\xi}}{(p_{0}t+p_{1}\bar{\xi})^{4}}\sum_{\begin{subarray}{c}i\in\mathcal{H}_{0}\\ j\in\mathcal{H}_{1}\end{subarray}}\mathrm{Cov}(t_{i},t_{j})+\dfrac{2p_{0}^{2}t^{2}}{(p_{0}t+p_{1}\bar{\xi})^{4}}\sum_{\begin{subarray}{c}i<j\\ i,j\in\mathcal{H}_{1}\end{subarray}}\mathrm{Cov}(t_{i},t_{j}), (13)

with

\mathrm{Cov}(t_{i},t_{j})=\int_{|z_{i}|>|z_{t/2}|}\int_{|z_{j}|>|z_{t/2}|}\phi(z_{i},z_{j};\mu_{i},\mu_{j},1,1,\sigma_{ij})\,dz_{i}\,dz_{j}-[\Phi(z_{t/2}+\mu_{i})+\Phi(z_{t/2}-\mu_{i})][\Phi(z_{t/2}+\mu_{j})+\Phi(z_{t/2}-\mu_{j})], (14)

in which \phi(z_{i},z_{j};\mu_{i},\mu_{j},\sigma_{i}^{2},\sigma_{j}^{2},\sigma_{ij}) is the probability density function of the bivariate normal distribution with mean (\mu_{i},\mu_{j})^{T} and covariance matrix \left(\begin{matrix}\sigma_{i}^{2}&\sigma_{ij}\\ \sigma_{ij}&\sigma_{j}^{2}\end{matrix}\right).

Corollary 1 provides an explicit formula for the asymptotic variance of FDP under weak dependence. In the special case where the test statistics are independent, V_{2}(t)=0 and the asymptotic variance of FDP reduces to V_{1}(t). Therefore, the “additional variance” of FDP due to the dependence among the test statistics is introduced by V_{2}(t). However, the sign of V_{2}(t) is not conclusive based on (13). This leads to an interesting question: under which scenarios is V_{2}(t)>0, so that the variance of \mathrm{FDP} is inflated by the dependence among the test statistics? To answer this question, we need to investigate the signs of the covariances between pairs of tests, i.e., \mathrm{Cov}(t_{i},t_{j}), which appear in the formula of V_{2}(t). We answer this question in the subsequent subsections.
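
The quantities in Corollary 1 can be evaluated numerically. Below is a hedged sketch, assuming the true nulls are the coordinates with \mu_{j}=0 and that \boldsymbol{\Sigma} is available: \mathrm{Cov}(t_{i},t_{j}) in (14) is obtained from bivariate normal rectangle probabilities, and V_{1}(t) and V_{2}(t) follow (12)–(13). The double loop is only for illustration; for p in the thousands one would restrict it to pairs with nonnegligible \sigma_{ij}.

```python
# Sketch implementing Corollary 1: Cov(t_i, t_j) from (14) and the variance
# pieces V_1(t) and V_2(t) from (12)-(13); true nulls are taken as mu_j = 0.
import numpy as np
from scipy.stats import norm, multivariate_normal

def cov_tests(mu_i, mu_j, sig_ij, t):
    # Cov(t_i, t_j): joint two-sided rejection probability minus E(t_i)E(t_j).
    c = abs(norm.ppf(t / 2))                                  # |z_{t/2}|
    mean, cov = [mu_i, mu_j], [[1.0, sig_ij], [sig_ij, 1.0]]
    F = lambda x, y: multivariate_normal.cdf([x, y], mean=mean, cov=cov)
    p_box = F(c, c) - F(c, -c) - F(-c, c) + F(-c, -c)         # P(|Z_i|<=c, |Z_j|<=c)
    xi_i = norm.cdf(-c + mu_i) + norm.cdf(-c - mu_i)          # E(t_i)
    xi_j = norm.cdf(-c + mu_j) + norm.cdf(-c - mu_j)          # E(t_j)
    p_joint = 1 - (1 - xi_i) - (1 - xi_j) + p_box             # P(|Z_i|>c, |Z_j|>c)
    return p_joint - xi_i * xi_j

def V1_V2(mu, Sigma, t):
    mu, Sigma = np.asarray(mu, dtype=float), np.asarray(Sigma, dtype=float)
    null = (mu == 0)
    p0, p1 = int(null.sum()), int((~null).sum())
    zq = norm.ppf(t / 2)
    xi = norm.cdf(zq + mu) + norm.cdf(zq - mu)
    xibar = xi[~null].mean()                                  # requires p1 >= 1
    d = (p0 * t + p1 * xibar) ** 4
    V1 = ((p1 * xibar) ** 2 / d * p0 * t * (1 - t)
          + (p0 * t) ** 2 / d * p1 * xibar * (1 - xibar))
    V2 = 0.0
    for i in range(len(mu)):
        for j in range(i + 1, len(mu)):
            if Sigma[i, j] == 0:
                continue
            cij = cov_tests(mu[i], mu[j], Sigma[i, j], t)
            if null[i] and null[j]:
                V2 += 2 * (p1 * xibar) ** 2 / d * cij
            elif (not null[i]) and (not null[j]):
                V2 += 2 * (p0 * t) ** 2 / d * cij
            else:
                V2 -= 2 * p0 * p1 * t * xibar / d * cij
    return V1, V2
```

As a quick check, cov_tests(0.0, 0.0, 0.5, 0.02) is positive, in line with Part (a) of Theorem 2 below.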

3.2 Signs of the covariance between any pair of tests

In the subsection above, we explained that the asymptotic variance of \mathrm{FDP}(t) depends on V_{2}(t), which in turn depends on \mathrm{Cov}(t_{i},t_{j}). In this subsection, we study how the signs of \mathrm{Cov}(t_{i},t_{j}) are affected by the dependence among the test statistics. Interestingly, we observe that the signs of \mathrm{Cov}(t_{i},t_{j}) are largely determined by the memberships of tests i and j in \mathcal{H}_{0} and \mathcal{H}_{1}. We have the following theorem.

Theorem 2.

Suppose (Z_{1},\ldots,Z_{p})^{T}\sim N((\mu_{1},\ldots,\mu_{p})^{T},\boldsymbol{\Sigma}) with unit variances, i.e., \sigma_{jj}=1 for j=1,\ldots,p. Assume \sigma_{ij}\neq 0 for the index pair (i,j), i\neq j. Then we have:

  • (a) for i,j\in\mathcal{H}_{0}, \mathrm{Cov}(t_{i},t_{j})>0;

  • (b) for i\in\mathcal{H}_{0}, j\in\mathcal{H}_{1}, \mathrm{Cov}(t_{i},t_{j})<0 if |\mu_{j}|>2|z_{t/2}| and t<2\{1-\Phi(\sqrt{\log(3)/2})\};

  • (c) for i,j\in\mathcal{H}_{1}, \mathrm{Cov}(t_{i},t_{j})>0 if |\sigma_{ij}|\leq|\mu|_{\min}/(|\mu|_{\max}+z_{t/2}) and \mathrm{sign}(\sigma_{ij})=\mathrm{sign}(\mu_{i}\mu_{j}), where |\mu|_{\min}=\min(|\mu_{i}|,|\mu_{j}|) and |\mu|_{\max}=\max(|\mu_{i}|,|\mu_{j}|).

The motivation of the above theorem is to understand how the variance of FDP is affected by the dependence structure among the test statistics; however, these results are interesting in their own right and have theoretical and/or practical value. The theorem implies that the sign of the covariance of any pair (t_{i},t_{j}), i\neq j, is largely determined by the memberships of tests i and j in \mathcal{H}_{0} and \mathcal{H}_{1}, and relies little on the sign of \sigma_{ij}; it holds for both weakly dependent and strongly dependent test statistics. We expect that this theorem may benefit other multiple testing related research whenever there is a need to quantify the dependence between t_{i} and t_{j}. Next, we discuss the implications of Parts (a)–(c). In the discussion below, we assume that Z_{i} and Z_{j} are dependent, if not stated otherwise.

Part (a) states that t_{i} and t_{j} are positively correlated provided that both corresponding tests belong to \mathcal{H}_{0}. The intuition is that both tests are two-sided tests and the corresponding test statistics have zero expectations. Therefore, |Z_{i}| and |Z_{j}| are positively correlated, which further leads to the positive correlation between t_{i} and t_{j}.

Part (b) indicates that when one test, say i, belongs to \mathcal{H}_{0}, and the other test, say j, belongs to \mathcal{H}_{1}, then t_{i} and t_{j} are negatively correlated provided that t is a reasonable threshold, i.e., t<2\{1-\Phi(\sqrt{\log(3)/2})\}\approx 0.46, and that the signal of j is not too weak, i.e., |\mu_{j}|>2|z_{t/2}|. Note that both conditions are typically satisfied in practical problems. The intuition for this conclusion is as follows. Let Z_{j}^{*}=Z_{j}-\mu_{j}; then if we consider Z_{j}^{*} as the test statistic, its corresponding test belongs to \mathcal{H}_{0}, and its correlations with the other test statistics remain unchanged. The corresponding test t_{j} in the notation of Z_{j}^{*} has the form:

t_{j}=\mathrm{1}(|Z_{j}|>|z_{t/2}|)=\mathrm{1}\{Z^{*}_{j}\in(-\infty,-|z_{t/2}|-\mu_{j})\cup(|z_{t/2}|-\mu_{j},\infty)\},

where we assume that \mu_{j}>0 for presentational convenience; this leads to

1-t_{j}=\mathrm{1}\left\{Z^{*}_{j}\in(-|z_{t/2}|-\mu_{j},|z_{t/2}|-\mu_{j})\right\},

and implies that the acceptance region of t_{j}, expressed in terms of Z_{j}^{*}, is contained in the rejection region \{|Z_{j}^{*}|>|z_{t/2}|\} given that the signal \mu_{j}>2|z_{t/2}|. By the argument in Part (a), 1-t_{j} is then positively correlated with t_{i}, and therefore t_{j} and t_{i} are negatively correlated.

Part (c) gives sufficient conditions under which t_{i} and t_{j} are positively correlated when i,j\in\mathcal{H}_{1}. This is the only case where \sigma_{ij} affects the sign of the correlation between the corresponding tests. These conditions are also intuitively met in real applications. In particular, the first condition |\sigma_{ij}|\leq|\mu|_{\min}/(|\mu|_{\max}+z_{t/2}) essentially requires that the magnitude of \sigma_{ij} does not exceed a ratio of the minimum to the maximum signal. In the special practical scenario where the two signals are reasonably strong and on a similar scale, this ratio is close to 1 and the condition is met. The second condition \mathrm{sign}(\sigma_{ij})=\mathrm{sign}(\mu_{i}\mu_{j}) requires that the sign of \sigma_{ij} coincides with the signs of the corresponding signals: the test statistics are positively correlated if the signals have the same signs, and they are negatively correlated otherwise. More specifically, in the GWAS example, Part (c) implies that the tests for the significance of the ith and jth SNPs are positively correlated if their individual effects on the disease status are both positive or both negative.

3.3 Inflation of the FDP variance under dependence

In Section 3.1, we discussed that V_{2}(t) accounts for the difference in the FDP variance across different dependence structures of the test statistics. The signs of its components are further discussed in Section 3.2. From these discussions, we conclude that the variance of FDP is inflated given that Parts (a)–(c) in Theorem 2 all hold. We summarize this result in the following theorem.

Theorem 3.

Suppose (Z_{1},\ldots,Z_{p})^{T}\sim N((\mu_{1},\ldots,\mu_{p})^{T},\boldsymbol{\Sigma}) with \sigma_{jj}=1 for j=1,\ldots,p. For any i,j\in\mathcal{H}_{1}, i\neq j, assume that |\mu_{i}|\geq 2|z_{t/2}|>2\sqrt{\log(3)/2}, |\sigma_{ij}|\leq|\mu|_{\min}/(|\mu|_{\max}+z_{t/2}), and \mathrm{sign}(\sigma_{ij})=\mathrm{sign}(\mu_{i}\mu_{j}), where |\mu|_{\min}=\min(|\mu_{i}|,|\mu_{j}|) and |\mu|_{\max}=\max(|\mu_{i}|,|\mu_{j}|). Then, V_{2}(t)>0.

Theorem 3 implies that the variance of FDP is inflated if the test statistics satisfy the given assumptions on their dependence structure. On the one hand, as we have discussed in Section 3.2, these are mild conditions for practical data. On the other hand, they are only one set of sufficient conditions that lead to the inflation of the FDP variance. In our numerical studies, we experiment with some settings that violate the conditions in Theorem 3, and still observe the inflation of the variance of FDP; please refer to Section 5 for details. Such scenarios are not theoretically justified, but may often be practically observed.

4 Estimation of the Asymptotic Variance of FDP

In Section 3.1, we established the asymptotic variance of FDP in the framework of weak dependence. We observe that this variance depends on the parameters (\mu_{1},\ldots,\mu_{p})^{T} and \boldsymbol{\Sigma}, which need to be estimated so that the variance can be quantified in practice. Throughout, we assume that \boldsymbol{\Sigma} is known or can be estimated from other resources (see Section 2) and estimate only (\mu_{1},\ldots,\mu_{p})^{T}.

One direct approach is to estimate (\mu_{1},\ldots,\mu_{p})^{T} by their corresponding test statistics (Z_{1},\ldots,Z_{p})^{T}. However, in practical data, we often have p_{0}\gg p_{1}; this implies that most \mu_{j}’s are zeros. Estimating these “zeros” using the corresponding observed Z_{j}’s tends to diminish the efficiency of the method, as they can be viewed as noise. In our numerical studies, we consider the following estimation approach.

  • (1)

    Find an estimate \hat{\pi}_{0} for \pi_{0}=p_{0}/p, the proportion of tests that belong to \mathcal{H}_{0} among the total number of tests.

    Estimation methods for \pi_{0} are available in the literature. For example, Storey and Tibshirani (2003) proposed \hat{\pi}_{0}(\lambda)=\sum_{i=1}^{p}1(P_{i}>\lambda)/\{p(1-\lambda)\} with a tuning parameter \lambda\in(0,1); the basic idea of this estimator is to treat p-values greater than \lambda as coming from \mathcal{H}_{0}. In this framework, smoothed and bootstrap estimators of \pi_{0} were proposed by Storey and Tibshirani (2003) and Storey et al. (2004), respectively. Additionally, Langaas et al. (2005) proposed a nonparametric maximum likelihood estimator of \pi_{0}. More recently, Wang et al. (2010) proposed a “SLIM” estimator of \pi_{0}; their estimator accounts for the dependence among the test statistics. We shall adopt these estimators in our numerical studies and compare their performance in the estimation of the FDP variance.

  • (2)

    Estimate p_{1} by \hat{p}_{1}=p(1-\hat{\pi}_{0}). We then collect the hypotheses with the top \hat{p}_{1} statistic values to form \widehat{\mathcal{H}}_{1}, and the rest are assigned to \widehat{\mathcal{H}}_{0}.

  • (3)

    Set \hat{\mu}_{j}=0 for j\in\widehat{\mathcal{H}}_{0} and \hat{\mu}_{j}=Z_{j} for j\in\widehat{\mathcal{H}}_{1}; then use them to estimate the variance of \mathrm{FDP}(t) based on Corollary 1 (a code sketch of this procedure follows the list).
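
The sketch below strings the three steps together, using Storey’s \hat{\pi}_{0}(\lambda) with \lambda=0.5 as one concrete (assumed) choice of the \pi_{0} estimator; it reuses the V1_V2 function from the Corollary 1 sketch, takes the top \hat{p}_{1} hypotheses by absolute statistic value, and assumes at least one hypothesis is assigned to \widehat{\mathcal{H}}_{1}.

```python
# Sketch of the estimation procedure in Section 4 (steps 1-3).
# Assumes the V1_V2 function from the Corollary 1 sketch is in scope.
import numpy as np
from scipy.stats import norm

def estimate_fdp_sd(z, Sigma, t, lam=0.5):
    p = len(z)
    pvals = 2 * norm.sf(np.abs(z))                               # two-sided p-values
    pi0_hat = min(1.0, np.sum(pvals > lam) / (p * (1 - lam)))    # step (1): Storey-type pi0
    p1_hat = int(round(p * (1 - pi0_hat)))                       # step (2)
    mu_hat = np.zeros(p)
    if p1_hat > 0:
        top = np.argsort(np.abs(z))[-p1_hat:]                    # top |Z_j| form H1-hat
        mu_hat[top] = z[top]                                     # step (3)
    V1, V2 = V1_V2(mu_hat, Sigma, t)                             # Corollary 1 formula
    return np.sqrt(V1 + V2)
```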

5 Simulation Studies

We conduct simulation studies to validate our developments in Sections 3 and 4. In Section 5.1, we use various numerical examples to illustrate how the asymptotic variance of FDP differs across dependence structures among the test statistics. In Section 5.2, we illustrate how the choice among the methods for estimating \pi_{0} reviewed in Section 4 affects the estimation of the variance of FDP.

5.1 Variance of FDP under varied dependence structures

Throughout, we generate (Z_{1},\ldots,Z_{p})^{T}\sim N((\mu_{1},\ldots,\mu_{p})^{T},\boldsymbol{\Sigma}) with p=2000. We examine three choices of p_{1}: 50, 100, and 200, where the indices of the tests in \mathcal{H}_{1} are randomly sampled from \{1,\ldots,p\}; for j\in\mathcal{H}_{1}, we set \mu_{j}=2z_{t/2}. We consider three possible values of t: 0.005, 0.02, and 0.05. The choice of \boldsymbol{\Sigma} determines the dependence structure among the test statistics. We use methods similar to those in Fan et al. (2012) to generate it; the details are given below.

  • (1)

    We generate a random sample \mathbf{X}_{1},\ldots,\mathbf{X}_{400}, which are iid copies of \mathbf{X}=(X_{1},\ldots,X_{p})^{T}. We consider seven models for the joint distribution of \mathbf{X}.

    • [M1] X_{1},\ldots,X_{p} are generated as iid N(0,1) random variables.

    • [M2] Let \mathbf{X}\sim N_{p}(0,\mathbf{\Lambda}), where \mathbf{\Lambda} has diagonal elements 1 and off-diagonal elements 1/2.

    • [M3] Let \{X_{k}\}_{k=1}^{1900} be iid N(0,1) random variables, and X_{k}=\sum_{l=1}^{10}X_{l}(-1)^{l+1}/5+\sqrt{1-\dfrac{10}{25}}\varepsilon_{k} for k=1901,\ldots,2000, with \{\varepsilon_{k}\}_{k=1901}^{2000} being N(0,1) random variables.

    • [M4] Let \{X_{k}\}_{k=1}^{2000} be iid Cauchy random variables with location parameter 0 and scale parameter 1.

    • [M5] Let X_{j}=\rho_{j}^{(1)}W^{(1)}+\rho_{j}^{(2)}W^{(2)}+\rho_{j}^{(3)}W^{(3)}+H_{j}, where W^{(1)}\sim N(-2,1), W^{(2)}\sim N(1,1), W^{(3)}\sim N(4,1), \rho_{j}^{(1)}, \rho_{j}^{(2)}, \rho_{j}^{(3)}\sim U(-1,1), H_{j}\sim N(0,1); and all these random variables are independent.

    • [M6] Let X_{j}=\rho_{j}^{(1)}W^{(1)}+\rho_{j}^{(2)}W^{(2)}+H_{j}, where W^{(1)},W^{(2)}\sim N(0,1), \rho_{j}^{(1)},\rho_{j}^{(2)}\sim U(-1,1), H_{j}\sim N(0,1), and all these random variables are independent.

    • [M7] Let X_{j}=\mathrm{sin}\left(\rho_{j}^{(1)}W^{(1)}\right)+\mathrm{sign}\left(\rho_{j}^{(2)}\right)\exp\left(|\rho_{j}^{(2)}|W^{(2)}\right)+H_{j}, where W^{(1)},W^{(2)}\sim N(0,1), \rho_{j}^{(1)},\rho_{j}^{(2)}\sim U(-1,1), H_{j}\sim N(0,1), and all these random variables are independent.

  • (2)

    We compute \boldsymbol{\Sigma}^{\text{initial}} as the sample correlation matrix based on \mathbf{X}_{1},\ldots,\mathbf{X}_{400}. Note that for some of the scenarios above, the components of \mathbf{X} are strongly dependent.

  • (3)

    We adopt the PFA method in Fan et al. (2012) to generate a weakly dependent \boldsymbol{\Sigma} based on \boldsymbol{\Sigma}^{\text{initial}}. When applying the PFA method, we used the criterion p^{-2}\sum_{1\leq i,j\leq p}|\sigma_{ij}|<0.05 numerically (a sketch of this step is given after this list).
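
The following is a hedged sketch of this factor-removal step under our reading of the PFA idea: subtract the top principal factors of the initial correlation matrix, re-standardize to unit diagonal, and stop once the criterion p^{-2}\sum_{i,j}|\sigma_{ij}|<0.05 is met. The exact construction in Fan et al. (2012) may differ in its details.

```python
# Sketch: reducing an initial correlation matrix to a weakly dependent one by
# removing top principal factors and re-standardizing (our reading of the PFA
# step; not necessarily identical to Fan et al., 2012).
import numpy as np

def weaken_dependence(Sigma0, tol=0.05):
    p = Sigma0.shape[0]
    vals, vecs = np.linalg.eigh(Sigma0)          # eigenvalues in ascending order
    vals, vecs = vals[::-1], vecs[:, ::-1]       # reorder to descending
    for k in range(p):
        # residual matrix after removing the top-k principal factors
        A = Sigma0 - (vecs[:, :k] * vals[:k]) @ vecs[:, :k].T
        d = np.sqrt(np.clip(np.diag(A), 1e-12, None))
        Sigma = A / np.outer(d, d)               # re-standardize to unit diagonal
        if np.abs(Sigma).sum() / p ** 2 < tol:   # weak dependence criterion
            return Sigma, k                      # k factors removed
    return Sigma, p - 1                          # criterion not met; best effort
```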

With these setups, for M1, the test statistics are independent, whereas for the other models, the test statistics are weakly dependent with varied dependence structures. For each combination of p_{1}, t, and model above, we conduct the \mathrm{FDP} analysis and evaluate the corresponding sample standard deviation of the \mathrm{FDP}s over 1000 repetitions (named “Emp” in Table 2); we also compute the corresponding asymptotic standard deviation of \mathrm{FDP} given in Corollary 1 with \boldsymbol{\mu} and \boldsymbol{\Sigma} replaced by their simulated values (named “Asym” in Table 2). The results are summarized in Table 2.

Table 2: Comparison of the standard deviation (\times 100) of FDPs
t p_{1} Method M1 M2 M3 M4 M5 M6 M7
50 Asym 4.37 6.11 5.13 6.91 6.11 6.12 6.15
Emp 4.43 6.08 5.16 6.60 5.73 6.20 5.90
0.005 100 Asym 2.57 3.57 3.01 4.03 3.57 3.57 3.59
Emp 2.65 3.56 2.94 3.85 3.68 3.66 3.52
200 Asym 1.37 1.88 1.59 2.12 1.88 1.89 1.89
Emp 1.33 1.83 1.60 2.06 1.86 1.79 1.95
50 Asym 3.92 6.74 5.23 7.12 6.75 6.75 6.80
Emp 3.90 6.64 5.35 7.30 6.91 6.90 6.84
0.02 100 Asym 3.23 5.50 4.28 5.81 5.51 5.51 5.55
Emp 3.20 5.50 4.44 5.66 5.39 5.58 5.35
200 Asym 2.15 3.61 2.82 3.80 3.61 3.61 3.64
Emp 2.14 3.61 2.84 3.70 3.44 3.51 3.40
50 Asym 2.25 4.33 3.25 4.42 4.34 4.34 4.37
Emp 2.27 4.54 3.25 4.47 4.58 4.45 4.54
0.05 100 Asym 2.53 4.87 3.65 4.97 4.88 4.88 4.92
Emp 2.54 4.70 3.60 5.04 4.80 4.92 5.02
200 Asym 2.23 4.24 3.19 4.32 4.24 4.24 4.27
Emp 2.25 4.25 3.22 4.30 4.09 4.17 4.22

From this table, we observe that in all cases, the asymptotic standard deviation is very close to the sample standard deviation of the \mathrm{FDP}s over 1000 repetitions. This validates our conclusion in Corollary 1 and indicates that this asymptotic variance approximates its corresponding true value well in practice. Furthermore, for all the p_{1} and t combinations, the standard deviations of the FDP from M1 are significantly smaller than those from the other models. This implies that the variances of FDP are inflated when dependence is present among the test statistics. Note that for these examples, the conditions in Theorem 3 are not necessarily guaranteed; this reinforces our discussion in Section 3.3 and demonstrates that the inflation of the FDP variance may occur in a wide range of real applications. Among M2-M6, we observe that M3 leads to the smallest inflation; this could be because this model is close to the independent case, as the majority of the X_{k}’s are independent, i.e., those for k=1,\ldots,1900.

5.2 Estimation of the asymptotic variance for FDP

In this subsection, we assess the performance of the method in Section 4 in estimating the asymptotic variance of FDP. As noted in Section 4, our method relies on \pi_{0}, for which estimation methods are available in the literature. We incorporate these methods into our procedure and compare their performance. In particular, we consider: (1) the smoothed estimate by Storey and Tibshirani (2003), named “Smoothed”; (2) the bootstrap estimate by Storey et al. (2004), named “Bootstrap”; (3) the nonparametric maximum likelihood estimate by Langaas et al. (2005), named “Langaas”; and (4) the SLIM estimate by Wang et al. (2010), named “SLIM”. As a reference, we also include for each case the asymptotic standard deviation of FDP based on the true values of the parameters, i.e., \boldsymbol{\mu} and \boldsymbol{\Sigma}, named “True”.

We adopt the methods in Section 5.1 to generate the test statistics, and apply the aforementioned methods to estimate the asymptotic variances of the FDPs. We only present the results for Model M4 in Section 5.1; the results for the other models show similar patterns and are omitted due to space limitations. The means and standard deviations of the estimated asymptotic standard deviations of the FDPs over 1000 replicates are displayed in Table 3. From this table, we observe that the Langaas and SLIM methods perform reasonably well in the estimation of the asymptotic standard deviation of the FDPs; in particular, these methods result in estimates close to the true asymptotic standard deviation based on Corollary 1 and lead to smaller standard deviations. We observe that this is because, in the estimation of \pi_{0}, these methods are more robust to the dependence among the test statistics, whereas the Smoothed and Bootstrap methods work well only when the test statistics are independent. We provide the average estimated number of identified alternative hypotheses, i.e., p_{1}=p(1-\pi_{0}), and the corresponding standard deviations over 1000 replicates in Table 4; from this table, we observe that the performance in the estimation of p_{1} is in the order of

\mbox{Bootstrap}<\mbox{Smoothed}<\mbox{Langaas}<\mbox{SLIM},

which is consistent with our observation in Table 3.

Table 3: Mean (\times 1000) and standard deviation (\times 1000, in parentheses) of the estimated asymptotic standard deviation of FDPs over 1000 replicates.
t p_{1} True Smoothed Bootstrap Langaas SLIM
50 6.91 3.46 (3.12) 5.83 (2.70) 6.00 (1.02) 7.31 (1.52)
0.005 100 4.03 3.13 (2.82) 4.01 (1.43) 3.68 (0.42) 4.23 (0.60)
200 2.12 2.65 (2.45) 2.06 (0.37) 2.02 (0.14) 2.18 (0.16)
50 7.12 3.57 (2.92) 5.81 (1.79) 6.70 (0.62) 6.97 (0.43)
0.02 100 5.81 3.56 (2.48) 5.39 (1.02) 5.45 (0.54) 5.90 (0.40)
200 3.80 3.46 (1.84) 3.63 (0.57) 3.64 (0.29) 3.92 (0.22)
50 4.42 2.87 (2.35) 4.24 (1.50) 4.74 (0.40) 4.32 (0.69)
0.05 100 4.97 3.04 (2.13) 4.71 (0.49) 4.91 (0.16) 4.95 (0.09)
200 4.32 3.51 (1.52) 4.16 (0.37) 4.23 (0.27) 4.41 (0.12)
Table 4: Mean and standard deviation (in parentheses) of the estimated number of alternative hypotheses over 1000 replicates.
t p_{1} Smoothed Bootstrap Langaas SLIM
50 103 (122) 107 (80) 90 (53) 55 (20)
0.005 100 136 (135) 153 (86) 138 (51) 102 (21)
200 214 (152) 251 (77) 239 (49) 202 (20)
50 103 (122) 110 (89) 90 (55) 55 (21)
0.02 100 133 (136) 152 (85) 137 (51) 103 (21)
200 205 (154) 251 (80) 237 (50) 202 (20)
50 94 (117) 108 (85) 90 (58) 53 (21)
0.05 100 129 (135) 150 (84) 136 (54) 100 (20)
200 207 (147) 252 (80) 234 (56) 197 (20)

6 Real Data Analysis

We apply our method to a genome-wide association study (GWAS); more specifically, an eQTL mapping study. The data consist of 210 subjects from the International HapMap Project, including three groups: (1) 90 Asian subjects, consisting of 45 Japanese from Tokyo and 45 Han Chinese from Beijing, named “Grp1”; (2) 60 Utah residents with ancestry from northern and western Europe, named “Grp2”; and (3) 60 Nigerians from Ibadan, named “Grp3”. More details of the data can be found at the web link ftp://ftp.sanger.ac.uk/pub/genevar/; see also Bradic et al. (2011). We adopt existing methods to construct test statistics for testing the association between the expression level of the gene CCT8 and the SNP genotypes for each group of subjects (Bradic et al., 2011; Fan et al., 2012).

The SNP genotypes have three possibilities: \text{SNP}=0 means no polymorphism, \text{SNP}=1 indicates one nucleotide has a polymorphism, and \text{SNP}=2 implies both nucleotides have polymorphisms. Adopting the strategy in Bradic et al. (2011), we transform the SNP data into two dummy variables, namely (d_{1},d_{2}):

  • (d_{1},d_{2})=(0,0) for \text{SNP}=0;

  • (d_{1},d_{2})=(1,0) for \text{SNP}=1;

  • (d_{1},d_{2})=(0,1) for \text{SNP}=2.

For subject i, denote by Y_{i} and (d_{1,i,j},d_{2,i,j}) the logarithm-transformed expression level of gene CCT8 and the aforementioned dummy variables of the jth SNP, respectively. We consider two sets of marginal linear regression models:

\mbox{Set 1:}\qquad Y_{i}=\alpha_{1,j}+\beta_{1,j}d_{1,i,j}+\epsilon_{1,i,j},\quad\mbox{for }j=1,\ldots,p;
\mbox{Set 2:}\qquad Y_{i}=\alpha_{2,j}+\beta_{2,j}d_{2,i,j}+\epsilon_{2,i,j},\quad\mbox{for }j=1,\ldots,p.

Here we impute the missing SNP data with 0, and exclude SNPs that are identical to one another across all subjects.
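
As a hedged illustration of this construction, the sketch below recodes the SNPs into the dummy pair (d_{1},d_{2}) and computes a marginal least squares test statistic for each dummy variable and each SNP; the names expr and snps are placeholders rather than the original data objects, and the actual pipeline additionally removes duplicated SNP columns and applies the dependence adjustment described next.

```python
# Sketch: dummy coding of SNP genotypes and marginal regression statistics.
# `expr` is the length-n log expression of CCT8; `snps` is an n x p matrix
# coded 0/1/2 with NaN for missing genotypes (placeholder names, assumptions).
import numpy as np
from scipy import stats

def marginal_z(expr, snps):
    snps = np.nan_to_num(snps, nan=0.0)            # impute missing SNPs by 0
    z_list = []
    for d in (snps == 1, snps == 2):               # dummy d1, then dummy d2
        for j in range(d.shape[1]):
            x = d[:, j].astype(float)
            if x.std() == 0:                       # skip degenerate columns
                continue
            res = stats.linregress(x, expr)        # marginal least squares fit
            z_list.append(res.slope / res.stderr)  # t-type statistic for the slope
    return np.array(z_list)
```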

For each group of the data, we combine the aforementioned two sets of models and follow the procedure in Fan et al. (2012) to generate the test statistics; this leads to 2p test statistics. In detail, we use the procedure described in Section 2 to establish an initial set of test statistics and their corresponding covariance matrix. We then use the dependence-adjusted procedure in Fan et al. (2012) to remove the top ten principal factors from the covariance matrix, and take a standardization step to generate a new set of test statistics with unit marginal variances; this helps reduce the dependence among the test statistics.

With the resultant test statistics for each group, we estimate the asymptotic limits and standard deviations of \mathrm{FDP}(t) based on the following setups:

  • For each group of test statistics, we consider two thresholds of the p-value, which are chosen such that the estimates of the \mathrm{FDP}(t) limits are reasonable values.

  • The asymptotic limits are estimated from (6), denoted by \widehat{\text{FDP}}(t).

  • The asymptotic standard deviations are computed based on the methods in Section 4. As observed in Section 5.2, the standard deviation estimates based on the \pi_{0} estimates from the “Langaas” and “SLIM” methods outperform the others; therefore, we present only the results from these two methods. Furthermore, to assess how dependence may affect these standard deviation estimates, we report the estimates under both the assumption that the test statistics are dependent and the assumption that they are independent. Therefore, for each group of subjects, we have four estimates, respectively denoted by S_{\widehat{\text{FDP}}(t)}^{\text{Langaas}}, S_{\widehat{\text{FDP}}(t),\text{Ind}}^{\text{Langaas}}, S_{\widehat{\text{FDP}}(t)}^{\text{SLIM}}, and S_{\widehat{\text{FDP}}(t),\text{Ind}}^{\text{SLIM}}.

We summarize our estimation results in Table 5. From this table, we observe that the assumption on whether dependence is present among the test statistics significantly affects the estimated asymptotic standard deviation of \mathrm{FDP}(t). The ratios S_{\widehat{\text{FDP}}(t)}^{\text{Langaas}}/S_{\widehat{\text{FDP}}(t),\text{Ind}}^{\text{Langaas}} and S_{\widehat{\text{FDP}}(t)}^{\text{SLIM}}/S_{\widehat{\text{FDP}}(t),\text{Ind}}^{\text{SLIM}} range from 4 to 7 folds. Moreover, the ratios of the FDP standard deviation estimates to the limit values \widehat{\text{FDP}}(t) are between 0.25 and 0.82, and thus the uncertainty of FDP is nonnegligible. These observations reinforce the statement that in an FDP related inference, it may not be sufficient to consider only the limit value of FDP (or equivalently, FDR); we suggest that it is important to correctly account for the dependence, as ignoring it might lead to an overly optimistic conclusion of how well the FDP is controlled. For example, when t=0.005 in Grp1, the 95th percentile (say, two standard deviations above the mean) of \mathrm{FDP}(t) is estimated as 0.315+0.153\times 2=0.621 under dependence; in contrast, if we assume that the test statistics are independent, the corresponding estimate is 0.315+0.031\times 2=0.377. Therefore, ignoring the dependence leads to an overly optimistic belief of how well the upper bound of FDP is controlled; such a nonignorable difference may significantly change the statistical interpretation of the results in practice.

Table 5: Estimation results for the GWAS study.
Population #SNP t \widehat{\text{FDP}}(t) S_{\widehat{\text{FDP}}(t)}^{\text{Langaas}} S_{\widehat{\text{FDP}}(t),\text{Ind}}^{\text{Langaas}} S_{\widehat{\text{FDP}}(t)}^{\text{SLIM}} S_{\widehat{\text{FDP}}(t),\text{Ind}}^{\text{SLIM}}
Grp1 73 0.005 0.315 0.153 0.031 0.259 0.052
105 0.01 0.377 0.166 0.031 0.244 0.045
Grp2 166 0.01 0.199 0.107 0.018 0.148 0.025
199 0.02 0.441 0.125 0.019 0.167 0.025
Grp3 89 0.002 0.147 0.069 0.017 0.092 0.022
135 0.005 0.371 0.092 0.019 0.124 0.025

7 Discussion

In this article, we have established theoretical results for the asymptotic variance of \mathrm{FDP} in the framework where the test statistics are weakly dependent. We observe that under mild conditions, the asymptotic variance of \mathrm{FDP} under dependence is usually greater than that under independence; this is also demonstrated by our numerical studies: even under weak dependence, the variances of the FDPs can be numerically much larger than those under independence. It is thus crucial to take the dependence into account in the estimation of the variance of the FDP and to report it together with its mean. Furthermore, we have observed in the real data example that the standard deviation estimate of \mathrm{FDP} under the dependence assumption is significantly different from (usually greater than) that under the independence assumption. In practice, we suggest including the dependence in the estimation by default, unless there is strong evidence otherwise. This will help us better understand how well the FDP is controlled and avoid overly optimistic conclusions.

In this article, we have studied the asymptotic uncertainty of FDP in the same framework as Fan et al. (2012). Under the assumption that the test statistics are weakly dependent, we obtain the asymptotic expansion of \mathrm{FDP}(t) and establish the explicit formula for computing the asymptotic variance of \mathrm{FDP}(t). Within this framework, we observe many interesting open problems that can be studied in the future. We list several as examples; there are many others.

First, our proposed method for estimating the variance of \mathrm{FDP}(t) in Section 4 relies on the estimation of \pi_{0}; there, we adopted existing methods in the literature. From the simulation studies in Section 5.2, we observe that the performance of the variance estimate for \mathrm{FDP}(t) relies heavily on that of \pi_{0}; although the Langaas and SLIM methods perform reasonably well, we believe that they are not yet optimal, as none of these \pi_{0} estimation methods has sufficiently accounted for the dependence among the test statistics. Therefore, we expect that improved \pi_{0} estimation methods that better accommodate the dependence among the test statistics would be of particular interest, and they can be incorporated into our estimation method in Section 4.

Second, we have imposed some weak dependence assumptions in our development; this may limit the method in some applications. Practically, one possible solution is not to use the original test statistics but the dependence-adjusted ones, as they are more powerful and often weakly dependent (Friguet and Causeur, 2011; Fan et al., 2012); yet there are no theoretical guarantees. Therefore, it is challenging, but would be of interest, to study the asymptotic variance of \mathrm{FDP}(t) under arbitrary dependence.

Third, we assume in this article that the test statistics are normally distributed. It would be of great practical and theoretical value to generalize our study to accommodate multiple testing problems where the test statistics follow other popular distributions, e.g., t, F, and chi-square distributions. We refer to Zhuo et al. (2020) as an example of recent work on dependent t-tests and leave the relevant investigation to future work.

SUPPLEMENTARY MATERIALS

The supplementary materials contain technical details for the theorems in Section 3.

References

  • Basu et al. (2021) Basu, P., Fu, L., Saretto, A., and Sun, W. (2021), “Empirical bayes control of the false discovery exceedance,” arXiv preprint arXiv:2111.03885.
  • Benjamini and Hochberg (1995) Benjamini, Y. and Hochberg, Y. (1995), “Controlling the false discovery rate: a practical and powerful approach to multiple testing,” Journal of the Royal Statistical Society: Series B (Methodological), 57, 289–300.
  • Benjamini et al. (2001) Benjamini, Y., Yekutieli, D., et al. (2001), “The control of the false discovery rate in multiple testing under dependency,” The Annals of Statistics, 29, 1165–1188.
  • Bonferroni (1936) Bonferroni, C. (1936), “Teoria statistica delle classi e calcolo delle probabilita,” Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commericiali di Firenze, 8, 3–62.
  • Bradic et al. (2011) Bradic, J., Fan, J., and Wang, W. (2011), “Penalized composite quasi-likelihood for ultrahigh dimensional variable selection,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73, 325–349.
  • Clarke et al. (2009) Clarke, S., Hall, P., et al. (2009), “Robustness of multiple testing procedures against dependence,” The Annals of Statistics, 37, 332–358.
  • Delattre and Roquain (2011) Delattre, S. and Roquain, E. (2011), “On the false discovery proportion convergence under Gaussian equi-correlation,” Statistics & Probability Letters, 81, 111–115.
  • Delattre and Roquain (2015) — (2015), “New procedures controlling the false discovery proportion via Romano–Wolf’s heuristic,” The Annals of Statistics, 43, 1141–1177.
  • Delattre and Roquain (2016) — (2016), “On empirical distribution function of high-dimensional Gaussian vector components with an application to multiple testing,” Bernoulli, 22, 302–324.
  • Döhler and Roquain (2020) Döhler, S. and Roquain, E. (2020), “Controlling the false discovery exceedance for heterogeneous tests,” Electronic Journal of Statistics, 14, 4244–4272.
  • Dudoit et al. (2004) Dudoit, S., van der Laan, M. J., and Pollard, K. S. (2004), “Multiple testing. Part I. Single-step procedures for control of general type I error rates,” Statistical Applications in Genetics and Molecular Biology, 3, 1–69.
  • Efron (2007) Efron, B. (2007), “Correlation and large-scale simultaneous significance testing,” Journal of the American Statistical Association, 102, 93–103.
  • Fan and Han (2017) Fan, J. and Han, X. (2017), “Estimation of the false discovery proportion with unknown dependence,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79, 1143–1164.
  • Fan et al. (2012) Fan, J., Han, X., and Gu, W. (2012), “Estimating false discovery proportion under arbitrary covariance dependence,” Journal of the American Statistical Association, 107, 1019–1035.
  • Fan et al. (2019) Fan, J., Ke, Y., Sun, Q., and Zhou, W.-X. (2019), “FarmTest: Factor-adjusted robust multiple testing with approximate false discovery control,” Journal of the American Statistical Association, 1–29.
  • Ferreira and Zwinderman (2006) Ferreira, J. and Zwinderman, A. (2006), “On the Benjamini–Hochberg method,” The Annals of Statistics, 34, 1827–1849.
  • Finner and Roters (2002) Finner, H. and Roters, M. (2002), “Multiple hypotheses testing and expected number of type I. errors,” The Annals of Statistics, 30, 220–238.
  • Friguet and Causeur (2011) Friguet, C. and Causeur, D. (2011), “Estimation of the proportion of true null hypotheses in high-dimensional data under dependence,” Computational statistics & data analysis, 55, 2665–2676.
  • Ge and Li (2012) Ge, Y. and Li, X. (2012), “Control of the false discovery proportion for independently tested null hypotheses,” Journal of Probability and Statistics, 2012.
  • Genovese and Wasserman (2004) Genovese, C. and Wasserman, L. (2004), “A stochastic process approach to false discovery control,” The Annals of Statistics, 32, 1035–1061.
  • Genovese and Wasserman (2006) Genovese, C. R. and Wasserman, L. (2006), “Exceedance control of the false discovery proportion,” Journal of the American Statistical Association, 101, 1408–1417.
  • Guo and Romano (2007) Guo, W. and Romano, J. (2007), “A generalized Sidak-Holm procedure and control of generalized error rates under independence,” Statistical applications in genetics and molecular biology, 6.
  • Hemerik et al. (2019) Hemerik, J., Solari, A., and Goeman, J. J. (2019), “Permutation-based simultaneous confidence bounds for the false discovery proportion,” Biometrika, 106, 635–649.
  • Hochberg (1988) Hochberg, Y. (1988), “A sharper Bonferroni procedure for multiple tests of significance,” Biometrika, 75, 800–802.
  • Holland and Copenhaver (1987) Holland, B. S. and Copenhaver, M. D. (1987), “An improved sequentially rejective Bonferroni test procedure,” Biometrics, 417–423.
  • Holm (1979) Holm, S. (1979), “A simple sequentially rejective multiple test procedure,” Scandinavian journal of statistics, 65–70.
  • Korn et al. (2004) Korn, E. L., Troendle, J. F., McShane, L. M., and Simon, R. (2004), “Controlling the number of false discoveries: application to high-dimensional genomic data,” Journal of Statistical Planning and Inference, 124, 379–398.
  • Langaas et al. (2005) Langaas, M., Lindqvist, B. H., and Ferkingstad, E. (2005), “Estimating the proportion of true null hypotheses, with application to DNA microarray data,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67, 555–572.
  • Lehmann and Romano (2005) Lehmann, E. and Romano, J. P. (2005), “Generalizations of the familywise error rate,” The Annals of Statistics, 33, 1138–1154.
  • Lehmann and Romano (2012) Lehmann, E. L. and Romano, J. P. (2012), “Generalizations of the familywise error rate,” in Selected Works of EL Lehmann, Springer, pp. 719–735.
  • Owen (2005) Owen, A. B. (2005), “Variance of the number of false discoveries,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67, 411–426.
  • Pollard and van der Laan (2004) Pollard, K. S. and van der Laan, M. J. (2004), “Choice of a null distribution in resampling-based multiple testing,” Journal of Statistical Planning and Inference, 125, 85–100.
  • Rom (1990) Rom, D. M. (1990), “A sequentially rejective test procedure based on a modified Bonferroni inequality,” Biometrika, 77, 663–665.
  • Sarkar (2006) Sarkar, S. K. (2006), “False discovery and false nondiscovery rates in single-step multiple testing procedures,” The Annals of Statistics, 34, 394–415.
  • Šidák (1967) Šidák, Z. (1967), “Rectangular confidence regions for the means of multivariate normal distributions,” Journal of the American Statistical Association, 62, 626–633.
  • Simes (1986) Simes, R. J. (1986), “An improved Bonferroni procedure for multiple tests of significance,” Biometrika, 73, 751–754.
  • Storey (2002) Storey, J. D. (2002), “A direct approach to false discovery rates,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64, 479–498.
  • Storey et al. (2004) Storey, J. D., Taylor, J. E., and Siegmund, D. (2004), “Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 66, 187–205.
  • Storey and Tibshirani (2003) Storey, J. D. and Tibshirani, R. (2003), “Statistical significance for genomewide studies,” Proceedings of the National Academy of Sciences, 100, 9440–9445.
  • Sun and Cai (2009) Sun, W. and Cai, T. (2009), “Large-scale multiple testing under dependence,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71, 393–424.
  • Sun et al. (2015) Sun, W., Reich, B. J., Cai, T. T., Guindani, M., and Schwartzman, A. (2015), “False discovery control in large-scale spatial multiple testing,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 77, 59–83.
  • van der Laan et al. (2004) van der Laan, M. J., Dudoit, S., and Pollard, K. S. (2004), “Augmentation procedures for control of the generalized family-wise error rate and tail probabilities for the proportion of false positives,” Statistical applications in genetics and molecular biology, 3, 1–25.
  • Wang et al. (2010) Wang, H.-Q., Tuominen, L. K., and Tsai, C.-J. (2010), “SLIM: a sliding linear model for estimating the proportion of true null hypotheses in datasets with dependence structures,” Bioinformatics, 27, 225–231.
  • Zhuo et al. (2020) Zhuo, B., Jiang, D., and Di, Y. (2020), “Test-statistic correlation and data-row correlation,” Statistics & Probability Letters, 167, 108903.