
Asymptotic Uncertainty of False Discovery Proportion

Meng Mei{}^{\text{a}}, Tao Yu{}^{\text{b}}, and Yuan Jiang{}^{\text{a}}

{}^{\text{a}} Department of Statistics, Oregon State University
{}^{\text{b}} Department of Statistics and Data Science, National University of Singapore

Yuan Jiang is the corresponding author. Meng Mei and Yuan Jiang’s research is supported in part by National Institutes of Health grant R01 GM126549. Tao Yu’s research is supported in part by the Singapore Ministry of Education Academic Research Tier 1 Funds: R-155-000-202-114.
Abstract

Multiple testing has been a popular topic in statistical research. Although a vast body of work exists, controlling false discoveries remains a challenging task when the corresponding test statistics are dependent. Various methods have been proposed to estimate the false discovery proportion (FDP) under arbitrary dependence among the test statistics. One of the main ideas is to reduce arbitrary dependence to weak dependence and then to establish theoretically the strong consistency of the FDP and the false discovery rate (FDR) under weak dependence. As a consequence, FDPs share the same asymptotic limit in the framework of weak dependence. We observe, however, that the asymptotic variance of the FDP may rely heavily on the dependence structure of the corresponding test statistics even when they are only weakly dependent; quantifying this variability is of great practical value, as it can serve as an indicator of the quality of the FDP estimate from the given data. As far as we are aware, research in this respect is still limited in the literature. In this paper, we first derive the asymptotic expansion of FDP under mild regularity conditions and then examine, both theoretically and numerically, how the asymptotic variance of FDP varies under different dependence structures. Based on these observations, we recommend that a multiple testing study performed with an FDP procedure report both the mean and the variance estimates of FDP to enrich the study outcome.

Keywords: Asymptotic expansion; Asymptotic variance; Multiple testing; Weak dependence.

1 Introduction

Multiple hypothesis testing has been a popular topic in statistical research; it has wide applications in many scientific areas, such as biology, medicine, genetics, neuroscience, and finance. Consider a multiple testing problem where we need to test p hypotheses simultaneously; denote by p_{0} and p_{1} the number of true null and false null hypotheses, respectively. The testing outcomes can be summarized in Table 1,

Table 1: Summary of possible outcomes from multiple hypothesis testing
            Not rejected   Rejected   Total
True Null   U              V          p_{0}
False Null  T              S          p_{1}
Total       p-R            R          p

where V and R denote the number of hypotheses that are wrongly rejected (i.e., the number of type I errors made) and rejected (i.e., the number of rejections made) out of the p hypotheses, respectively.

Conventional methods propose to control the familywise error rate (FWER): \text{FWER}=P(V>0) (Bonferroni, 1936; Šidák, 1967; Holm, 1979; Simes, 1986; Holland and Copenhaver, 1987; Hochberg, 1988; Rom, 1990), or the generalized familywise error rate (gFWER): \text{gFWER}=P(V>k), \text{ for } k\geq 1 (Dudoit et al., 2004; Pollard and van der Laan, 2004; Lehmann and Romano, 2012). These methods control the probability of zero or a limited number of false positives; therefore, they are usually conservative, especially when the number of hypotheses p is large. We observe that large p is common in practice. For example, in a genome-wide association study, hundreds of thousands or even millions of single nucleotide polymorphisms (SNPs) of each subject and the associated disease status are measured. It is of interest to find which SNPs are associated with the disease status. A common approach is to conduct a hypothesis test for each SNP; this results in a huge number of hypothesis tests.

When p is large, the false discovery proportion (FDP) and its statistical characteristics have been introduced as alternative criteria. The false discovery rate (FDR) has been the most widely studied; it was introduced by Benjamini and Hochberg (1995) and was proven to be a reasonable criterion; in particular,

\mathrm{FDP}=\frac{V}{R},\qquad\mathrm{FDR}=E(\mathrm{FDP}),

where \mathrm{FDP}=\mathrm{FDR}=0 if R=0. A list of FDR research can be found in Benjamini and Hochberg (1995), Benjamini et al. (2001), Storey (2002), Storey and Tibshirani (2003), Storey et al. (2004), Ferreira and Zwinderman (2006), Clarke et al. (2009), and the references therein. We observe that most of these methods are founded on the assumption that the test statistics are independent or satisfy some restrictive dependence structure.

Because of the complicated nature of practical data, the resultant test statistics may follow an arbitrary dependence structure. The effects of dependence on FDR have been studied extensively, e.g., by Benjamini et al. (2001), Finner and Roters (2002), Owen (2005), Sarkar (2006), Efron (2007), among others. Recently, new methods for the control and/or estimation of FDR have been proposed to allow complex and general dependence structures or even incorporate them in the testing procedure. Examples include the local index of significance (LIS) testing procedure for dependence modeled by a hidden Markov model (Sun and Cai, 2009), the pointwise and clusterwise analysis for spatial dependence (Sun et al., 2015), and the principal factor approximation (PFA) method for arbitrary dependence (Fan et al., 2012; Fan and Han, 2017; Fan et al., 2019). In particular, Fan et al. (2012) studied FDP and FDR under arbitrary dependence with a two-step procedure. They first proposed a PFA method to reduce arbitrary dependence to weak dependence, and then established theoretically the strong consistency of the FDP and FDR. Therefore, their method is valid for test statistics under arbitrary dependence; under weak dependence, both FDP and FDR converge to the same asymptotic limit.

In addition to FDR, another popular multiple testing criterion is called false discovery exceedance (FDX, also referred to as TPPFP in the literature), which is defined as the tail probability of the event that FDP exceeds a threshold q\in(0,1):

\text{FDX}=P(\mathrm{FDP}>q).

Compared to FDR, which is only a single-value summary, FDX has the potential to fully describe the uncertainty and/or distribution of FDP as the threshold value q varies. However, the literature has not fully characterized how the dependence among the test statistics impacts FDX, as opposed to FDR. In particular, some works on FDX assume independence among tests (Genovese and Wasserman, 2004; Guo and Romano, 2007; Ge and Li, 2012); more importantly, most works focus on developing procedures that control FDX as a single-value characteristic of FDP rather than investigating how dependence affects the uncertainty and/or distribution of FDP (Korn et al., 2004; van der Laan et al., 2004; Lehmann and Romano, 2005; Genovese and Wasserman, 2006; Delattre and Roquain, 2015; Hemerik et al., 2019; Döhler and Roquain, 2020; Basu et al., 2021). Thus, these works still lack the valuable information about how FDP varies from study to study.

As far as we are aware, most of the above works have focused on proposing estimates of FDP and/or its related statistical characteristics. On the one hand, studying these estimates is sufficient to ensure that the corresponding FDP and/or its statistical characteristics are under control. On the other hand, we observe that the asymptotic uncertainty of the FDP may rely heavily on the dependence structure of the corresponding test statistics; quantifying this variability is of great practical value, as it can serve as an indicator of the quality of the FDP estimate and benefit research on FDR and FDX methods. More specifically, with the FDP variance estimate available, one can conclude more confidently how well the FDP is controlled when controlling its corresponding mean at a given level. It may also benefit the development of methods for estimating and controlling FDX. Research on the uncertainty of FDP has high potential impact, but it is still limited in the community. For example, Ge and Li (2012) and Delattre and Roquain (2011) derived the asymptotic distribution of FDP under independence and Gaussian equi-correlation dependence, respectively; Delattre and Roquain (2016) established the asymptotic distribution of \mathrm{FDP} under some assumptions on the dependence structure of the test statistics and on the condition that the expected values of the test statistics under the alternatives are equal.

In this paper, our focus is to provide a rigorous theoretical investigation of the asymptotic uncertainty of FDP in the framework where the test statistics are weakly dependent. In particular, we derive a theoretical expansion of the FDP as a linear combination of the tests, based on which we obtain its asymptotic variance. We also examine, both theoretically and numerically, how this variance varies under different dependence structures among the test statistics. One interesting observation is that the FDP based on weakly dependent test statistics converges to the same limit as that based on independent test statistics (Fan et al., 2012); however, the asymptotic variance of the former is almost always significantly greater than that of the latter, both theoretically and numerically. With these observations, we recommend that a multiple testing study performed with an FDP procedure report both the mean and the variance estimates of FDP to enrich the study outcome. Finally, with a real GWAS dataset, we demonstrate how our findings may impact real studies.

2 Asymptotic Limit of FDP Under Weak Dependence

Consider a genome-wide association study with n subjects; for each subject, the disease status and p SNPs are measured. The data can be represented by \mathbf{Y}=(Y_{1},\ldots,Y_{n})^{T} and an n\times p matrix \mathbf{X}=(X_{ij})_{i=1,\ldots,n;j=1,\ldots,p}, where Y_{i} denotes the disease status of subject i, and the ith row of \mathbf{X} collects its associated SNPs. When the Y_{i}’s are measured on a continuous scale, the association between the jth SNP and the disease status can be modelled by the marginal linear regression model:

Y_{i}=\alpha_{j}+\beta_{j}X_{ij}+\epsilon_{ij},\quad i=1,\ldots,n, (1)

where \alpha_{j} and \beta_{j} are regression parameters and \epsilon_{ij} are random errors. The estimates \hat{\beta}_{j},\ j=1,\ldots,p and the corresponding test statistics Z_{j},\ j=1,\ldots,p for

H_{0j}:\beta_{j}=0\quad\text{versus}\quad H_{1j}:\beta_{j}\neq 0,\quad j=1,\ldots,p, (2)

can be derived with a standard least squares procedure. Proposition 1 in Fan et al. (2012) verified that \hat{\beta}_{j},\ j=1,\ldots,p are correlated and derived their joint distribution under appropriate regularity conditions. As a consequence, the test statistics Z_{j},\ j=1,\ldots,p are also correlated. In particular, they derived that conditioning on \mathbf{X},

(Z_{1},\ldots,Z_{p})^{T}\sim N((\mu_{1},\ldots,\mu_{p})^{T},\boldsymbol{\Sigma}), (3)

where (\mu_{1},\ldots,\mu_{p})^{T} and \boldsymbol{\Sigma}=(\sigma_{ij})_{i,j=1,\ldots,p} are the appropriate population mean and variance matrix for (Z_{1},\ldots,Z_{p})^{T}. Furthermore, the testing problem (2) becomes equivalent to

H_{0j}:\mu_{j}=0\quad\text{versus}\quad H_{1j}:\mu_{j}\neq 0,\quad j=1,\ldots,p. (4)

Therefore, we shall work on the multiple testing problem under the setup (3) and (4) hereafter. Let \mathcal{H}_{0}=\{j:H_{0j}\text{ is true}\} and \mathcal{H}_{1}=\{j:H_{0j}\text{ is false}\} respectively be the sets of indices for the true nulls and the true alternatives, and let p_{0} and p_{1} denote their cardinalities. For a given threshold t\in(0,1), the jth test is given by t_{j}=\mathrm{1}(|Z_{j}|>|z_{t/2}|), where t_{j}=1 indicates that the null hypothesis is rejected, and t_{j}=0 otherwise; here z_{t/2} denotes the (t/2)th quantile of the standard normal distribution. Define V(t)=\sum_{j\in\mathcal{H}_{0}}t_{j}, S(t)=\sum_{j\in\mathcal{H}_{1}}t_{j}, and R(t)=\sum_{j=1}^{p}t_{j}; they are respectively the number of false discoveries, the number of correct discoveries, and the total number of discoveries. As a consequence, we have \mathrm{FDP}(t)=V(t)/R(t).
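
To make this setup concrete, the following is a minimal sketch (in Python, assuming numpy and scipy are available) of how V(t), R(t), and \mathrm{FDP}(t) can be computed from a vector of z-statistics when the set of true nulls is known, as it is in simulations; the variable names are ours and are not part of the original procedure.

```python
# Sketch: computing V(t), R(t), and FDP(t) from z-statistics when the set of
# true nulls is known (a simulation-only convenience); names are ours.
import numpy as np
from scipy.stats import norm

def fdp(z, is_null, t):
    z_cut = norm.ppf(t / 2)                 # z_{t/2}, the (t/2)th normal quantile
    reject = np.abs(z) > abs(z_cut)         # t_j = 1(|Z_j| > |z_{t/2}|)
    R = reject.sum()                        # total discoveries R(t)
    V = reject[is_null].sum()               # false discoveries V(t)
    return V / R if R > 0 else 0.0          # FDP(t), with the convention 0/0 = 0

# Example: p = 2000 independent tests with 100 signals of strength 3.
rng = np.random.default_rng(0)
p, p1, t = 2000, 100, 0.02
means = np.zeros(p); means[:p1] = 3.0
z = rng.normal(means, 1.0)
print(fdp(z, means == 0, t))
```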

Fan et al. (2012) studied the estimation of \mathrm{FDP}(t) under an arbitrary dependence structure of \boldsymbol{\Sigma}. One milestone finding is that if \boldsymbol{\Sigma} follows the weak dependence structure:

\sum_{i,j}|\sigma_{ij}|=O\left(p^{2-\delta}\right)\text{ for some }\delta>0, (5)

then,

\lim_{p\to\infty}\left[\mathrm{FDP}(t)-\frac{p_{0}t}{\sum_{i=1}^{p}\left\{\Phi(z_{t/2}+\mu_{i})+\Phi(z_{t/2}-\mu_{i})\right\}}\right]=0, (6)

almost surely, where \Phi(\cdot) denotes the cumulative distribution function of the standard normal distribution. This indicates that under the aforementioned weak dependence assumption, \mathrm{FDP}(t) converges to a limit that does not depend on the covariance structure of the test statistics. It is noteworthy that this limit holds not only for \mathrm{FDP}(t), but also for \mathrm{FDR}(t).
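
As a small illustration of (6), the following sketch evaluates this limit from a mean vector and a threshold; identifying the true nulls by \mu_{i}=0 is, again, a simulation-only convenience.

```python
# Sketch: the asymptotic limit of FDP(t) in (6), given the mean vector and t.
# True nulls are taken as the entries of `means` equal to zero.
import numpy as np
from scipy.stats import norm

def fdp_limit(means, t):
    z_cut = norm.ppf(t / 2)                                   # z_{t/2}
    p0 = np.sum(means == 0)
    denom = np.sum(norm.cdf(z_cut + means) + norm.cdf(z_cut - means))
    return p0 * t / denom

# For a true null (mu_i = 0) each summand equals 2*Phi(z_{t/2}) = t, so if all
# hypotheses are null the limit is p0*t / (p0*t) = 1, as it should be.
```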

We observe that this conclusion does not hold for the asymptotic variance of \mathrm{FDP}(t); the asymptotic variance of \mathrm{FDP}(t) varies under different covariance structures among the test statistics even when they are weakly dependent. In this article, we shall derive the asymptotic variance of \mathrm{FDP}(t) theoretically and study numerically how it varies under different dependence structures of the test statistics.

3 Asymptotic Variance of FDP

In this section, we study the asymptotic variance of \mathrm{FDP}(t) in the framework of Section 2. We first establish the asymptotic expansion of \mathrm{FDP}(t) and its asymptotic variance under a weak dependence structure. Then, we show that under mild conditions, this asymptotic variance under dependence is higher than that under independence. This implies that under weak dependence, although the asymptotic mean of \mathrm{FDP}(t) remains the same, its asymptotic variance can be inflated. Due to space limitations and for presentational continuity, we relegate the technical details to the supplementary materials.

3.1 Asymptotic expansion and variance of FDP

We first derive the asymptotic expansion of the FDP in the framework of weak dependence. We introduce the following notation. For any t\in[0,1], we denote \xi_{j}=\Phi(z_{t/2}+\mu_{j})+\Phi(z_{t/2}-\mu_{j}) and \bar{\xi}=\frac{1}{p_{1}}\sum_{j\in\mathcal{H}_{1}}\xi_{j}. Consider the function

H(\mu)=\phi(|z_{t/2}|+\mu)(|z_{t/2}|+\mu)+\phi(|z_{t/2}|-\mu)(|z_{t/2}|-\mu);

there exists a unique root of this function for \mu\in(|z_{t/2}|,|z_{t/2}|+1), denoted by \mu_{t}. Let C_{t}^{\max}=\sup_{\mu\in(-\mu_{t},\mu_{t})}H(\mu).
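
These two quantities enter Condition (7) below and can be computed numerically. The following is a minimal sketch, with \phi the standard normal density; it assumes that H changes sign on (|z_{t/2}|,|z_{t/2}|+1), which holds for the thresholds t considered in this paper, and it approximates the supremum over (-\mu_{t},\mu_{t}) on a grid.

```python
# Sketch: solving for mu_t (the root of H on (|z_{t/2}|, |z_{t/2}|+1)) and
# evaluating C_t^max on a grid; phi is the standard normal density.
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def H(mu, t):
    c = abs(norm.ppf(t / 2))                                  # |z_{t/2}|
    return norm.pdf(c + mu) * (c + mu) + norm.pdf(c - mu) * (c - mu)

def mu_t_and_Cmax(t, grid_size=2001):
    c = abs(norm.ppf(t / 2))
    mu_t = brentq(H, c + 1e-8, c + 1.0, args=(t,))            # unique root in (c, c+1)
    grid = np.linspace(-mu_t, mu_t, grid_size)
    C_max = H(grid, t).max()                                  # H is even in mu
    return mu_t, C_max

print(mu_t_and_Cmax(0.02))                                    # e.g., t = 0.02
```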

Theorem 1.

Suppose (Z_{1},\ldots,Z_{p})^{T}\sim N((\mu_{1},\ldots,\mu_{p})^{T},\boldsymbol{\Sigma}) with unit variances, i.e., \sigma_{jj}=1 for j=1,\ldots,p. Assume that (Z_{1},\ldots,Z_{p})^{T} are weakly dependent as defined in (5), that \limsup_{p\to\infty}p_{0}t/(p_{1}\bar{\xi})<1, and that for sufficiently large p and a universal constant C>0,

\sum_{i\neq j;i,j\in\mathcal{H}_{0}}\sigma_{ij}^{2}\geq\frac{C_{t}^{\max}}{\phi(z_{t/2})|z_{t/2}|}\sum_{i\in\mathcal{H}_{1},j\in\mathcal{H}_{0},\mu_{i}\in[-\mu_{t},\mu_{t}]}\sigma_{ij}^{2}, (7)
\sum_{i\neq j;i,j\in\mathcal{H}_{0}}\sigma_{ij}^{2}+p\geq C\sum_{i\neq j;i,j\in\mathcal{H}_{1}}\sigma_{ij}^{2}, (8)
\sum_{i\neq j}\sigma_{ij}^{4}=o\left(\sum_{i\neq j;i,j\in\mathcal{H}_{0}}\sigma_{ij}^{2}+p_{0}\right). (9)

We have the following asymptotic expansion of \mathrm{FDP}(t):

\mathrm{FDP}(t)=\frac{E(\bar{V})}{E(\bar{R})}+m(\bar{V},\bar{R})+r(\bar{V},\bar{R}), (10)

where \bar{V}=V(t)/p, \bar{R}=R(t)/p,

m(\bar{V},\bar{R})=\frac{\bar{V}}{E(\bar{R})}-\frac{E(\bar{V})}{\{E(\bar{R})\}^{2}}\bar{R},

and the remainder term r(\bar{V},\bar{R}) satisfies E\{r^{2}(\bar{V},\bar{R})\}=o[\mathrm{Var}\{m(\bar{V},\bar{R})\}].

Remark 1.

Theorem 1 establishes the asymptotic expansion of \mathrm{FDP}(t). It decomposes \mathrm{FDP}(t) into three parts: the asymptotic mean E(\bar{V})/E(\bar{R}), the stochastic term m(\bar{V},\bar{R}) that has mean zero, and an asymptotically negligible remainder term r(\bar{V},\bar{R}). The asymptotic mean E(\bar{V})/E(\bar{R}) is a constant, which reinforces the conclusion given by (6); the stochastic term m(\bar{V},\bar{R}) is a linear combination of the tests t_{j},\ j=1,\ldots,p, which gives the asymptotic variance of \mathrm{FDP}(t).
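
As an illustration of this decomposition, the following sketch evaluates, for a single simulated draw, the leading mean E(\bar{V})/E(\bar{R}), the stochastic term m(\bar{V},\bar{R}), and the implied remainder; the exact Gaussian expectations E(\bar{V})=p_{0}t/p and E(\bar{R})=\sum_{j}\xi_{j}/p are used, and identifying the true nulls by \mu_{j}=0 is a simulation-only convenience.

```python
# Sketch: evaluating the three terms of expansion (10) for one draw of z.
# `means` holds (mu_1, ..., mu_p); true nulls are the entries equal to zero.
import numpy as np
from scipy.stats import norm

def expansion_terms(z, means, t):
    p = len(z)
    null = (means == 0)
    zq = norm.ppf(t / 2)                              # z_{t/2}
    reject = np.abs(z) > abs(zq)                      # t_j
    Vbar, Rbar = reject[null].sum() / p, reject.sum() / p
    xi = norm.cdf(zq + means) + norm.cdf(zq - means)  # E(t_j)
    EV, ER = null.sum() * t / p, xi.sum() / p         # E(V-bar), E(R-bar)
    mean_term = EV / ER                               # asymptotic mean
    m = Vbar / ER - EV / ER ** 2 * Rbar               # stochastic term
    r = (Vbar / Rbar if Rbar > 0 else 0.0) - mean_term - m   # remainder
    return mean_term, m, r
```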

Remark 2.

With the weak dependence assumption (5), the other technical conditions in Theorem 1, i.e., Conditions (7)–(9), are mild as long as \boldsymbol{\Sigma} does not behave too extremely; the intuition is as follows. The number of summed terms on the left hand side of (7) is about p_{0}^{2}; in contrast, that on the right hand side is about p_{0}p_{1}. If p_{0}\gg p_{1}, the number of summed terms on the left hand side of (7) is much larger than that on the right. Therefore, this condition is satisfied unless the magnitudes of \sigma_{ij} for i\in\mathcal{H}_{0},\ j\in\mathcal{H}_{1} are generally much bigger than those for i,j\in\mathcal{H}_{0}. Similar arguments apply to Condition (8). For Condition (9), the summation on the left hand side is over the fourth power of \sigma_{ij}, whereas that on the right is over the second power; therefore, it is easily satisfied when most of the \sigma_{ij} approach 0 as p\to\infty.

Remark 3.

The development of Theorem 1 is theoretically challenging. The expansion (10) can be observed from a standard Taylor expansion; however, the derivation of E\{r^{2}(\bar{V},\bar{R})\}=o[\mathrm{Var}\{m(\bar{V},\bar{R})\}] is technically involved. More specifically, two features jointly introduce challenges in our developments: (1) the leading component of E\{r^{2}(\bar{V},\bar{R})\} is the summation of fourth moments of all the tests; we observe that to compare its asymptotic properties with \mathrm{Var}\{m(\bar{V},\bar{R})\}, a fourth order Taylor expansion of each moment term is further needed; (2) the dependence structure of the tests significantly complicates our study of the asymptotic upper bound of E\{r^{2}(\bar{V},\bar{R})\} and the lower bound of \mathrm{Var}\{m(\bar{V},\bar{R})\}.

Following the above asymptotic expansion of \mathrm{FDP}(t) in Theorem 1, Corollary 1 below presents its asymptotic variance.

Corollary 1.

Under the conditions of Theorem 1, we have

\lim_{p\to\infty}\dfrac{\mathrm{Var}\left\{\mathrm{FDP}(t)\right\}}{V_{1}(t)+V_{2}(t)}=1, (11)

where

V_{1}(t)=\dfrac{p_{1}^{2}\bar{\xi}^{2}}{(p_{0}t+p_{1}\bar{\xi})^{4}}\times p_{0}t(1-t)+\dfrac{p_{0}^{2}t^{2}}{(p_{0}t+p_{1}\bar{\xi})^{4}}\times p_{1}\bar{\xi}(1-\bar{\xi}), (12)

V_{2}(t)=\dfrac{2p_{1}^{2}\bar{\xi}^{2}}{(p_{0}t+p_{1}\bar{\xi})^{4}}\sum_{\begin{subarray}{c}i<j\\ i,j\in\mathcal{H}_{0}\end{subarray}}\mathrm{Cov}(t_{i},t_{j})-\dfrac{2p_{0}p_{1}t\bar{\xi}}{(p_{0}t+p_{1}\bar{\xi})^{4}}\sum_{\begin{subarray}{c}i\in\mathcal{H}_{0}\\ j\in\mathcal{H}_{1}\end{subarray}}\mathrm{Cov}(t_{i},t_{j})+\dfrac{2p_{0}^{2}t^{2}}{(p_{0}t+p_{1}\bar{\xi})^{4}}\sum_{\begin{subarray}{c}i<j\\ i,j\in\mathcal{H}_{1}\end{subarray}}\mathrm{Cov}(t_{i},t_{j}), (13)

with

\mathrm{Cov}(t_{i},t_{j})=\int_{|z_{i}|>|z_{t/2}|}\int_{|z_{j}|>|z_{t/2}|}\phi(z_{i},z_{j};\mu_{i},\mu_{j},1,1,\sigma_{ij})\,dz_{i}\,dz_{j}-[\Phi(z_{t/2}+\mu_{i})+\Phi(z_{t/2}-\mu_{i})][\Phi(z_{t/2}+\mu_{j})+\Phi(z_{t/2}-\mu_{j})], (14)

in which \phi(z_{i},z_{j};\mu_{i},\mu_{j},\sigma_{i}^{2},\sigma_{j}^{2},\sigma_{ij}) is the probability density function of the bivariate normal distribution with mean (\mu_{i},\mu_{j})^{T} and covariance matrix \left(\begin{matrix}\sigma_{i}^{2}&\sigma_{ij}\\ \sigma_{ij}&\sigma_{j}^{2}\end{matrix}\right).

Corollary 1 provides an explicit formula for the asymptotic variance of FDP under weak dependence. In the special case where the test statistics are independent, V_{2}(t)=0 and the asymptotic variance of FDP reduces to V_{1}(t). Therefore, the “additional variance” of FDP due to the dependence among the test statistics is introduced by V_{2}(t). However, the sign of V_{2}(t) is not conclusive based on (13). This leads to an interesting question: under which scenarios is V_{2}(t)>0, so that the variance of \mathrm{FDP} is inflated by the dependence among the test statistics? To answer this question, we need to investigate the signs of the covariances between pairs of tests, i.e., \mathrm{Cov}(t_{i},t_{j}), which appear in the formula of V_{2}(t). We answer this question in the subsequent subsections.
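
The quantities in Corollary 1 can be evaluated numerically. Below is a hedged sketch, assuming the true nulls are the coordinates with \mu_{j}=0 and that \boldsymbol{\Sigma} is available: \mathrm{Cov}(t_{i},t_{j}) in (14) is obtained from bivariate normal rectangle probabilities, and V_{1}(t) and V_{2}(t) follow (12)–(13). The double loop is only for illustration; for p in the thousands one would restrict it to pairs with nonnegligible \sigma_{ij}.

```python
# Sketch implementing Corollary 1: Cov(t_i, t_j) from (14) and the variance
# pieces V_1(t) and V_2(t) from (12)-(13); true nulls are taken as mu_j = 0.
import numpy as np
from scipy.stats import norm, multivariate_normal

def cov_tests(mu_i, mu_j, sig_ij, t):
    # Cov(t_i, t_j): joint two-sided rejection probability minus E(t_i)E(t_j).
    c = abs(norm.ppf(t / 2))                                  # |z_{t/2}|
    mean, cov = [mu_i, mu_j], [[1.0, sig_ij], [sig_ij, 1.0]]
    F = lambda x, y: multivariate_normal.cdf([x, y], mean=mean, cov=cov)
    p_box = F(c, c) - F(c, -c) - F(-c, c) + F(-c, -c)         # P(|Z_i|<=c, |Z_j|<=c)
    xi_i = norm.cdf(-c + mu_i) + norm.cdf(-c - mu_i)          # E(t_i)
    xi_j = norm.cdf(-c + mu_j) + norm.cdf(-c - mu_j)          # E(t_j)
    p_joint = 1 - (1 - xi_i) - (1 - xi_j) + p_box             # P(|Z_i|>c, |Z_j|>c)
    return p_joint - xi_i * xi_j

def V1_V2(mu, Sigma, t):
    mu, Sigma = np.asarray(mu, dtype=float), np.asarray(Sigma, dtype=float)
    null = (mu == 0)
    p0, p1 = int(null.sum()), int((~null).sum())
    zq = norm.ppf(t / 2)
    xi = norm.cdf(zq + mu) + norm.cdf(zq - mu)
    xibar = xi[~null].mean()                                  # requires p1 >= 1
    d = (p0 * t + p1 * xibar) ** 4
    V1 = ((p1 * xibar) ** 2 / d * p0 * t * (1 - t)
          + (p0 * t) ** 2 / d * p1 * xibar * (1 - xibar))
    V2 = 0.0
    for i in range(len(mu)):
        for j in range(i + 1, len(mu)):
            if Sigma[i, j] == 0:
                continue
            cij = cov_tests(mu[i], mu[j], Sigma[i, j], t)
            if null[i] and null[j]:
                V2 += 2 * (p1 * xibar) ** 2 / d * cij
            elif (not null[i]) and (not null[j]):
                V2 += 2 * (p0 * t) ** 2 / d * cij
            else:
                V2 -= 2 * p0 * p1 * t * xibar / d * cij
    return V1, V2
```

As a quick check, cov_tests(0.0, 0.0, 0.5, 0.02) is positive, in line with Part (a) of Theorem 2 below.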

3.2 Signs of the covariance between any pair of tests

In the subsection above, we explained that the asymptotic variance of \mathrm{FDP}(t) depends on V_{2}(t), which in turn depends on \mathrm{Cov}(t_{i},t_{j}). In this subsection, we study how the signs of \mathrm{Cov}(t_{i},t_{j}) are affected by the dependence among the test statistics. Interestingly, we observe that the signs of \mathrm{Cov}(t_{i},t_{j}) are largely determined by the memberships of tests i and j in \mathcal{H}_{0} and \mathcal{H}_{1}. We have the following theorem.

Theorem 2.

Suppose (Z_{1},\ldots,Z_{p})^{T}\sim N((\mu_{1},\ldots,\mu_{p})^{T},\boldsymbol{\Sigma}) with unit variances, i.e., \sigma_{jj}=1 for j=1,\ldots,p. Assume \sigma_{ij}\neq 0 for the index pair (i,j), i\neq j. Then we have:

  • (a) for i,j\in\mathcal{H}_{0}, \mathrm{Cov}(t_{i},t_{j})>0;

  • (b) for i\in\mathcal{H}_{0}, j\in\mathcal{H}_{1}, \mathrm{Cov}(t_{i},t_{j})<0 if |\mu_{j}|>2|z_{t/2}| and t<2\{1-\Phi(\sqrt{\log(3)/2})\};

  • (c) for i,j\in\mathcal{H}_{1}, \mathrm{Cov}(t_{i},t_{j})>0 if |\sigma_{ij}|\leq|\mu|_{\min}/(|\mu|_{\max}+z_{t/2}) and \mathrm{sign}(\sigma_{ij})=\mathrm{sign}(\mu_{i}\mu_{j}), where |\mu|_{\min}=\min(|\mu_{i}|,|\mu_{j}|) and |\mu|_{\max}=\max(|\mu_{i}|,|\mu_{j}|).

The motivation of the above theorem is to understand how the variance of FDP is affected by the dependence structure among the test statistics; however, these results are interesting in their own right and have theoretical and/or practical value. The theorem implies that the sign of the covariance of any pair (t_{i},t_{j}), i\neq j, is largely determined by the memberships of tests i and j in \mathcal{H}_{0} and \mathcal{H}_{1}, and relies little on the sign of \sigma_{ij}; it holds for both weakly dependent and strongly dependent test statistics. We expect that this theorem may benefit other multiple testing related research whenever there is a need to quantify the dependence between t_{i} and t_{j}. Next, we discuss the implications of Parts (a)–(c). In the discussion below, we assume that Z_{i} and Z_{j} are dependent, if not stated otherwise.

Part (a) states that t_{i} and t_{j} are positively correlated provided that both corresponding tests belong to \mathcal{H}_{0}. The intuition is that both tests are two-sided tests and the corresponding test statistics have zero expectations. Therefore, |Z_{i}| and |Z_{j}| are positively correlated, which further leads to the positive correlation between t_{i} and t_{j}.

Part (b) indicates that when one test, say i, belongs to \mathcal{H}_{0}, and the other test, say j, belongs to \mathcal{H}_{1}, then t_{i} and t_{j} are negatively correlated provided that t is a reasonable threshold, i.e., t<2\{1-\Phi(\sqrt{\log(3)/2})\}\approx 0.46, and that the signal of j is not too weak, i.e., |\mu_{j}|>2|z_{t/2}|. Note that both conditions are typically satisfied in practical problems. The intuition for this conclusion is as follows. Let Z_{j}^{*}=Z_{j}-\mu_{j}; then if we consider Z_{j}^{*} as the test statistic, its corresponding test belongs to \mathcal{H}_{0}, and its correlations with the other test statistics remain unchanged. The corresponding test t_{j} in the notation of Z_{j}^{*} has the form:

t_{j}=\mathrm{1}(|Z_{j}|>|z_{t/2}|)=\mathrm{1}\{Z^{*}_{j}\in(-\infty,-|z_{t/2}|-\mu_{j})\cup(|z_{t/2}|-\mu_{j},\infty)\},

where we assume that \mu_{j}>0 for presentational convenience; this leads to

1-t_{j}=\mathrm{1}\left\{Z^{*}_{j}\in(-|z_{t/2}|-\mu_{j},|z_{t/2}|-\mu_{j})\right\},

and implies that the acceptance region of t_{j}, expressed in terms of Z_{j}^{*}, is contained in the rejection region \{|Z_{j}^{*}|>|z_{t/2}|\} given that the signal \mu_{j}>2|z_{t/2}|. By the argument in Part (a), 1-t_{j} is then positively correlated with t_{i}, and therefore t_{j} and t_{i} are negatively correlated.

Part (c) gives sufficient conditions under which t_{i} and t_{j} are positively correlated when i,j\in\mathcal{H}_{1}. This is the only case where \sigma_{ij} affects the sign of the correlation between the corresponding tests. These conditions are also intuitively met in real applications. In particular, the first condition |\sigma_{ij}|\leq|\mu|_{\min}/(|\mu|_{\max}+z_{t/2}) essentially requires that the magnitude of \sigma_{ij} does not exceed a ratio of the minimum to the maximum signal. In the special practical scenario where the two signals are reasonably strong and on a similar scale, this ratio is close to 1 and the condition is met. The second condition \mathrm{sign}(\sigma_{ij})=\mathrm{sign}(\mu_{i}\mu_{j}) requires that the sign of \sigma_{ij} coincides with the signs of the corresponding signals: the test statistics are positively correlated if the signals have the same signs, and they are negatively correlated otherwise. More specifically, in the GWAS example, Part (c) implies that the tests for the significance of the ith and jth SNPs are positively correlated if their individual effects on the disease status are both positive or both negative.

3.3 Inflation of the FDP variance under dependence

In Section 3.1, we discussed that V_{2}(t) accounts for the difference in the FDP variance across different dependence structures of the test statistics. The signs of its components are further discussed in Section 3.2. From these discussions, we conclude that the variance of FDP is inflated given that Parts (a)–(c) in Theorem 2 all hold. We summarize this result in the following theorem.

Theorem 3.

Suppose (Z_{1},\ldots,Z_{p})^{T}\sim N((\mu_{1},\ldots,\mu_{p})^{T},\boldsymbol{\Sigma}) with \sigma_{jj}=1 for j=1,\ldots,p. For any i,j\in\mathcal{H}_{1}, i\neq j, assume that |\mu_{i}|\geq 2|z_{t/2}|>2\sqrt{\log(3)/2}, |\sigma_{ij}|\leq|\mu|_{\min}/(|\mu|_{\max}+z_{t/2}), and \mathrm{sign}(\sigma_{ij})=\mathrm{sign}(\mu_{i}\mu_{j}), where |\mu|_{\min}=\min(|\mu_{i}|,|\mu_{j}|) and |\mu|_{\max}=\max(|\mu_{i}|,|\mu_{j}|). Then, V_{2}(t)>0.

Theorem 3 implies that the variance of FDP is inflated if the test statistics satisfy the given assumptions on their dependence structure. On the one hand, as we have discussed in Section 3.2, these are mild conditions for practical data. On the other hand, they are only one set of sufficient conditions that lead to the inflation of the FDP variance. In our numerical studies, we experiment with some settings that violate the conditions in Theorem 3, and still observe the inflation of the variance of FDP; please refer to Section 5 for details. Such scenarios are not theoretically justified, but may often be practically observed.

4 Estimation of the Asymptotic Variance of FDP

In Section 3.1, we established the asymptotic variance of FDP in the framework of weak dependence. We observe that this variance depends on the parameters (\mu_{1},\ldots,\mu_{p})^{T} and \boldsymbol{\Sigma}, which need to be estimated so that the variance can be quantified in practice. Throughout, we assume that \boldsymbol{\Sigma} is known or can be estimated from other resources (see Section 2) and estimate only (\mu_{1},\ldots,\mu_{p})^{T}.

One direct approach is to estimate (\mu_{1},\ldots,\mu_{p})^{T} by their corresponding test statistics (Z_{1},\ldots,Z_{p})^{T}. However, in practical data, we often have p_{0}\gg p_{1}; this implies that most \mu_{j}’s are zeros. Estimating these “zeros” using the corresponding observed Z_{j}’s tends to diminish the efficiency of the method, as they can be viewed as noise. In our numerical studies, we consider the following estimation approach.

  • (1)

    Find an estimate \hat{\pi}_{0} for \pi_{0}=p_{0}/p, the proportion of tests that belong to \mathcal{H}_{0} among the total number of tests.

    Estimation methods for \pi_{0} are available in the literature. For example, Storey and Tibshirani (2003) proposed \hat{\pi}_{0}(\lambda)=\sum_{i=1}^{p}1(P_{i}>\lambda)/\{p(1-\lambda)\} with a tuning parameter \lambda\in(0,1); the basic idea of this estimator is to treat p-values greater than \lambda as coming from \mathcal{H}_{0}. In this framework, smoothed and bootstrap estimators of \pi_{0} were proposed by Storey and Tibshirani (2003) and Storey et al. (2004), respectively. Additionally, Langaas et al. (2005) proposed a nonparametric maximum likelihood estimator of \pi_{0}. More recently, Wang et al. (2010) proposed a “SLIM” estimator of \pi_{0}; their estimator accounts for the dependence among the test statistics. We shall adopt these estimators in our numerical studies and compare their performance in the estimation of the FDP variance.

  • (2)

    Estimate p_{1} by \hat{p}_{1}=p(1-\hat{\pi}_{0}). We then collect the hypotheses with the top \hat{p}_{1} statistic values to form \widehat{\mathcal{H}}_{1}, and the rest are assigned to \widehat{\mathcal{H}}_{0}.

  • (3)

    Set \hat{\mu}_{j}=0 for j\in\widehat{\mathcal{H}}_{0} and \hat{\mu}_{j}=Z_{j} for j\in\widehat{\mathcal{H}}_{1}; then use them to estimate the variance of \mathrm{FDP}(t) based on Corollary 1 (a code sketch of this procedure follows the list).
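
The sketch below strings the three steps together, using Storey’s \hat{\pi}_{0}(\lambda) with \lambda=0.5 as one concrete (assumed) choice of the \pi_{0} estimator; it reuses the V1_V2 function from the Corollary 1 sketch, takes the top \hat{p}_{1} hypotheses by absolute statistic value, and assumes at least one hypothesis is assigned to \widehat{\mathcal{H}}_{1}.

```python
# Sketch of the estimation procedure in Section 4 (steps 1-3).
# Assumes the V1_V2 function from the Corollary 1 sketch is in scope.
import numpy as np
from scipy.stats import norm

def estimate_fdp_sd(z, Sigma, t, lam=0.5):
    p = len(z)
    pvals = 2 * norm.sf(np.abs(z))                               # two-sided p-values
    pi0_hat = min(1.0, np.sum(pvals > lam) / (p * (1 - lam)))    # step (1): Storey-type pi0
    p1_hat = int(round(p * (1 - pi0_hat)))                       # step (2)
    mu_hat = np.zeros(p)
    if p1_hat > 0:
        top = np.argsort(np.abs(z))[-p1_hat:]                    # top |Z_j| form H1-hat
        mu_hat[top] = z[top]                                     # step (3)
    V1, V2 = V1_V2(mu_hat, Sigma, t)                             # Corollary 1 formula
    return np.sqrt(V1 + V2)
```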

5 Simulation Studies

We conduct simulation studies to validate our developments in Sections 3 and 4. In Section 5.1, we use various numerical examples to illustrate how the asymptotic variance of FDP differs across dependence structures among the test statistics. In Section 5.2, we illustrate how the choice among the methods for estimating \pi_{0} reviewed in Section 4 affects the estimation of the variance of FDP.

5.1 Variance of FDP under varied dependence structures

Throughout, we generate (Z_{1},\ldots,Z_{p})^{T}\sim N((\mu_{1},\ldots,\mu_{p})^{T},\boldsymbol{\Sigma}) with p=2000. We examine three choices of p_{1}: 50, 100, and 200, where the indices of the tests in \mathcal{H}_{1} are randomly sampled from \{1,\ldots,p\}; for j\in\mathcal{H}_{1}, we set \mu_{j}=2z_{t/2}. We consider three possible values of t: 0.005, 0.02, and 0.05. The choice of \boldsymbol{\Sigma} determines the dependence structure among the test statistics. We use methods similar to those in Fan et al. (2012) to generate it; the details are given below.

  • (1)

    We generate a random sample \mathbf{X}_{1},\ldots,\mathbf{X}_{400}, which are iid copies of \mathbf{X}=(X_{1},\ldots,X_{p})^{T}. We consider seven models for the joint distribution of \mathbf{X}.

    • [M1] X_{1},\ldots,X_{p} are generated as iid N(0,1) random variables.

    • [M2] Let \mathbf{X}\sim N_{p}(0,\mathbf{\Lambda}), where \mathbf{\Lambda} has diagonal elements 1 and off-diagonal elements 1/2.

    • [M3] Let \{X_{k}\}_{k=1}^{1900} be iid N(0,1) random variables, and X_{k}=\sum_{l=1}^{10}X_{l}(-1)^{l+1}/5+\sqrt{1-\dfrac{10}{25}}\varepsilon_{k} for k=1901,\ldots,2000, with \{\varepsilon_{k}\}_{k=1901}^{2000} being N(0,1) random variables.

    • [M4] Let \{X_{k}\}_{k=1}^{2000} be iid Cauchy random variables with location parameter 0 and scale parameter 1.

    • [M5] Let X_{j}=\rho_{j}^{(1)}W^{(1)}+\rho_{j}^{(2)}W^{(2)}+\rho_{j}^{(3)}W^{(3)}+H_{j}, where W^{(1)}\sim N(-2,1), W^{(2)}\sim N(1,1), W^{(3)}\sim N(4,1), \rho_{j}^{(1)}, \rho_{j}^{(2)}, \rho_{j}^{(3)}\sim U(-1,1), H_{j}\sim N(0,1); and all these random variables are independent.

    • [M6] Let X_{j}=\rho_{j}^{(1)}W^{(1)}+\rho_{j}^{(2)}W^{(2)}+H_{j}, where W^{(1)},W^{(2)}\sim N(0,1), \rho_{j}^{(1)},\rho_{j}^{(2)}\sim U(-1,1), H_{j}\sim N(0,1), and all these random variables are independent.

    • [M7] Let X_{j}=\mathrm{sin}\left(\rho_{j}^{(1)}W^{(1)}\right)+\mathrm{sign}\left(\rho_{j}^{(2)}\right)\exp\left(|\rho_{j}^{(2)}|W^{(2)}\right)+H_{j}, where W^{(1)},W^{(2)}\sim N(0,1), \rho_{j}^{(1)},\rho_{j}^{(2)}\sim U(-1,1), H_{j}\sim N(0,1), and all these random variables are independent.

  • (2)

    We compute \boldsymbol{\Sigma}^{\text{initial}} as the sample correlation matrix based on \mathbf{X}_{1},\ldots,\mathbf{X}_{400}. Note that for some of the scenarios above, the components of \mathbf{X} are strongly dependent.

  • (3)

    We adopt the PFA method in Fan et al. (2012) to generate a weakly dependent \boldsymbol{\Sigma} based on \boldsymbol{\Sigma}^{\text{initial}}. When applying the PFA method, we used the criterion p^{-2}\sum_{1\leq i,j\leq p}|\sigma_{ij}|<0.05 numerically (a sketch of this step is given after this list).
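
The following is a hedged sketch of this factor-removal step under our reading of the PFA idea: subtract the top principal factors of the initial correlation matrix, re-standardize to unit diagonal, and stop once the criterion p^{-2}\sum_{i,j}|\sigma_{ij}|<0.05 is met. The exact construction in Fan et al. (2012) may differ in its details.

```python
# Sketch: reducing an initial correlation matrix to a weakly dependent one by
# removing top principal factors and re-standardizing (our reading of the PFA
# step; not necessarily identical to Fan et al., 2012).
import numpy as np

def weaken_dependence(Sigma0, tol=0.05):
    p = Sigma0.shape[0]
    vals, vecs = np.linalg.eigh(Sigma0)          # eigenvalues in ascending order
    vals, vecs = vals[::-1], vecs[:, ::-1]       # reorder to descending
    for k in range(p):
        # residual matrix after removing the top-k principal factors
        A = Sigma0 - (vecs[:, :k] * vals[:k]) @ vecs[:, :k].T
        d = np.sqrt(np.clip(np.diag(A), 1e-12, None))
        Sigma = A / np.outer(d, d)               # re-standardize to unit diagonal
        if np.abs(Sigma).sum() / p ** 2 < tol:   # weak dependence criterion
            return Sigma, k                      # k factors removed
    return Sigma, p - 1                          # criterion not met; best effort
```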

With these setups, for M1, the test statistics are independent, whereas for the other models, the test statistics are weakly dependent with varied dependence structures. For each combination of p_{1}, t, and model above, we conduct the \mathrm{FDP} analysis and evaluate the corresponding sample standard deviation of the \mathrm{FDP}s over 1000 repetitions (named “Emp” in Table 2); we also compute the corresponding asymptotic standard deviation of \mathrm{FDP} given in Corollary 1 with \boldsymbol{\mu} and \boldsymbol{\Sigma} replaced by their simulated values (named “Asym” in Table 2). The results are summarized in Table 2.

Table 2: Comparison of the standard deviation (\times 100) of FDPs
t p_{1} Method M1 M2 M3 M4 M5 M6 M7
50 Asym 4.37 6.11 5.13 6.91 6.11 6.12 6.15
Emp 4.43 6.08 5.16 6.60 5.73 6.20 5.90
0.005 100 Asym 2.57 3.57 3.01 4.03 3.57 3.57 3.59
Emp 2.65 3.56 2.94 3.85 3.68 3.66 3.52
200 Asym 1.37 1.88 1.59 2.12 1.88 1.89 1.89
Emp 1.33 1.83 1.60 2.06 1.86 1.79 1.95
50 Asym 3.92 6.74 5.23 7.12 6.75 6.75 6.80
Emp 3.90 6.64 5.35 7.30 6.91 6.90 6.84
0.02 100 Asym 3.23 5.50 4.28 5.81 5.51 5.51 5.55
Emp 3.20 5.50 4.44 5.66 5.39 5.58 5.35
200 Asym 2.15 3.61 2.82 3.80 3.61 3.61 3.64
Emp 2.14 3.61 2.84 3.70 3.44 3.51 3.40
50 Asym 2.25 4.33 3.25 4.42 4.34 4.34 4.37
Emp 2.27 4.54 3.25 4.47 4.58 4.45 4.54
0.05 100 Asym 2.53 4.87 3.65 4.97 4.88 4.88 4.92
Emp 2.54 4.70 3.60 5.04 4.80 4.92 5.02
200 Asym 2.23 4.24 3.19 4.32 4.24 4.24 4.27
Emp 2.25 4.25 3.22 4.30 4.09 4.17 4.22

From this table, we observe that in all cases, the asymptotic standard deviation is very close to the sample standard deviation of the \mathrm{FDP}s over 1000 repetitions. This validates our conclusion in Corollary 1 and indicates that this asymptotic variance approximates its corresponding true value well in practice. Furthermore, for all the p_{1} and t combinations, the standard deviations of the FDP from M1 are significantly smaller than those from the other models. This implies that the variances of FDP are inflated when dependence is present among the test statistics. Note that for these examples, the conditions in Theorem 3 are not necessarily guaranteed; this reinforces our discussion in Section 3.3 and demonstrates that the inflation of the FDP variance may occur in a wide range of real applications. Among M2-M6, we observe that M3 leads to the smallest inflation; this could be because this model is close to the independent case, as the majority of the X_{k}’s are independent, i.e., those for k=1,\ldots,1900.

5.2 Estimation of the asymptotic variance for FDP

In this subsection, we assess the performance of the method in Section 4 in estimating the asymptotic variance of FDP. As noted in Section 4, our method relies on \pi_{0}, for which estimation methods are available in the literature. We incorporate these methods into our procedure and compare their performance. In particular, we consider: (1) the smoothed estimate by Storey and Tibshirani (2003), named “Smoothed”; (2) the bootstrap estimate by Storey et al. (2004), named “Bootstrap”; (3) the nonparametric maximum likelihood estimate by Langaas et al. (2005), named “Langaas”; and (4) the SLIM estimate by Wang et al. (2010), named “SLIM”. As a reference, we also include for each case the asymptotic standard deviation of FDP based on the true values of the parameters, i.e., \boldsymbol{\mu} and \boldsymbol{\Sigma}, named “True”.

We adopt the methods in Section 5.1 to generate the test statistics, and apply the aforementioned methods to estimate the asymptotic variances of the FDPs. We only present the results for Model M4 in Section 5.1; the results for the other models show similar patterns and are omitted due to space limitations. The means and standard deviations of the estimated asymptotic standard deviations of the FDPs over 1000 replicates are displayed in Table 3. From this table, we observe that the Langaas and SLIM methods perform reasonably well in the estimation of the asymptotic standard deviation of the FDPs; in particular, these methods result in estimates close to the true asymptotic standard deviation based on Corollary 1 and lead to smaller standard deviations. We observe that this is because, in the estimation of \pi_{0}, these methods are more robust to the dependence among the test statistics, whereas the Smoothed and Bootstrap methods work well only when the test statistics are independent. We provide the average estimated number of identified alternative hypotheses, i.e., p_{1}=p(1-\pi_{0}), and the corresponding standard deviations over 1000 replicates in Table 4; from this table, we observe that the performance in the estimation of p_{1} is in the order of

\mbox{Bootstrap}<\mbox{Smoothed}<\mbox{Langaas}<\mbox{SLIM},

which is consistent with our observation in Table 3.

Table 3: Mean (\times 1000) and standard deviation (\times 1000, in parentheses) of the estimated asymptotic standard deviation of FDPs over 1000 replicates.
t p_{1} True Smoothed Bootstrap Langaas SLIM
50 6.91 3.46 (3.12) 5.83 (2.70) 6.00 (1.02) 7.31 (1.52)
0.005 100 4.03 3.13 (2.82) 4.01 (1.43) 3.68 (0.42) 4.23 (0.60)
200 2.12 2.65 (2.45) 2.06 (0.37) 2.02 (0.14) 2.18 (0.16)
50 7.12 3.57 (2.92) 5.81 (1.79) 6.70 (0.62) 6.97 (0.43)
0.02 100 5.81 3.56 (2.48) 5.39 (1.02) 5.45 (0.54) 5.90 (0.40)
200 3.80 3.46 (1.84) 3.63 (0.57) 3.64 (0.29) 3.92 (0.22)
50 4.42 2.87 (2.35) 4.24 (1.50) 4.74 (0.40) 4.32 (0.69)
0.05 100 4.97 3.04 (2.13) 4.71 (0.49) 4.91 (0.16) 4.95 (0.09)
200 4.32 3.51 (1.52) 4.16 (0.37) 4.23 (0.27) 4.41 (0.12)
Table 4: Mean and standard deviation (in parentheses) of the estimated number of alternative hypotheses over 1000 replicates.
t p_{1} Smoothed Bootstrap Langaas SLIM
50 103 (122) 107 (80) 90 (53) 55 (20)
0.005 100 136 (135) 153 (86) 138 (51) 102 (21)
200 214 (152) 251 (77) 239 (49) 202 (20)
50 103 (122) 110 (89) 90 (55) 55 (21)
0.02 100 133 (136) 152 (85) 137 (51) 103 (21)
200 205 (154) 251 (80) 237 (50) 202 (20)
50 94 (117) 108 (85) 90 (58) 53 (21)
0.05 100 129 (135) 150 (84) 136 (54) 100 (20)
200 207 (147) 252 (80) 234 (56) 197 (20)

6 Real Data Analysis

We apply our method to a genome-wide association study (GWAS); more specifically, an eQTL mapping study. The data consist of 210 subjects from the International HapMap Project, including three groups: (1) 90 Asian subjects, consisting of 45 Japanese from Tokyo and 45 Han Chinese from Beijing, named “Grp1”; (2) 60 Utah residents with ancestry from northern and western Europe, named “Grp2”; and (3) 60 Nigerians from Ibadan, named “Grp3”. More details of the data can be found at the web link ftp://ftp.sanger.ac.uk/pub/genevar/; see also Bradic et al. (2011). We adopt existing methods to construct test statistics for testing the association between the expression level of the gene CCT8 and the SNP genotypes for each group of subjects (Bradic et al., 2011; Fan et al., 2012).

The SNP genotypes have three possibilities: \text{SNP}=0 means no polymorphism, \text{SNP}=1 indicates one nucleotide has a polymorphism, and \text{SNP}=2 implies both nucleotides have polymorphisms. Adopting the strategy in Bradic et al. (2011), we transform the SNP data into two dummy variables, namely (d_{1},d_{2}):

  • (d_{1},d_{2})=(0,0) for \text{SNP}=0;

  • (d_{1},d_{2})=(1,0) for \text{SNP}=1;

  • (d_{1},d_{2})=(0,1) for \text{SNP}=2.

For subject i, denote by Y_{i} and (d_{1,i,j},d_{2,i,j}) the logarithm-transformed expression level of gene CCT8 and the aforementioned dummy variables of the jth SNP, respectively. We consider two sets of marginal linear regression models:

\mbox{Set 1:}\qquad Y_{i}=\alpha_{1,j}+\beta_{1,j}d_{1,i,j}+\epsilon_{1,i,j},\quad\mbox{for }j=1,\ldots,p;
\mbox{Set 2:}\qquad Y_{i}=\alpha_{2,j}+\beta_{2,j}d_{2,i,j}+\epsilon_{2,i,j},\quad\mbox{for }j=1,\ldots,p.

Here we impute the missing SNP data with 0, and exclude SNPs that are identical to one another across all subjects.
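
As a hedged illustration of this construction, the sketch below recodes the SNPs into the dummy pair (d_{1},d_{2}) and computes a marginal least squares test statistic for each dummy variable and each SNP; the names expr and snps are placeholders rather than the original data objects, and the actual pipeline additionally removes duplicated SNP columns and applies the dependence adjustment described next.

```python
# Sketch: dummy coding of SNP genotypes and marginal regression statistics.
# `expr` is the length-n log expression of CCT8; `snps` is an n x p matrix
# coded 0/1/2 with NaN for missing genotypes (placeholder names, assumptions).
import numpy as np
from scipy import stats

def marginal_z(expr, snps):
    snps = np.nan_to_num(snps, nan=0.0)            # impute missing SNPs by 0
    z_list = []
    for d in (snps == 1, snps == 2):               # dummy d1, then dummy d2
        for j in range(d.shape[1]):
            x = d[:, j].astype(float)
            if x.std() == 0:                       # skip degenerate columns
                continue
            res = stats.linregress(x, expr)        # marginal least squares fit
            z_list.append(res.slope / res.stderr)  # t-type statistic for the slope
    return np.array(z_list)
```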

For each group of the data, we combine the aforementioned two sets of models and follow the procedure in Fan et al. (2012) to generate the test statistics; this leads to 2p test statistics. In detail, we use the procedure described in Section 2 to establish an initial set of test statistics and their corresponding covariance matrix. We then use the dependence-adjusted procedure in Fan et al. (2012) to remove the top ten principal factors from the covariance matrix, and take a standardization step to generate a new set of test statistics with unit marginal variances; this helps reduce the dependence among the test statistics.

With the resultant test statistics for each group, we estimate the asymptotic limits and standard deviations of \mathrm{FDP}(t) based on the following setups:

  • For each group of test statistics, we consider two thresholds of the p-value, which are chosen such that the estimates of the \mathrm{FDP}(t) limits are reasonable values.

  • The asymptotic limits are estimated from (6), denoted by \widehat{\text{FDP}}(t).

  • The asymptotic standard deviations are computed based on the methods in Section 4. As observed in Section 5.2, the standard deviation estimates based on the \pi_{0} estimates from the “Langaas” and “SLIM” methods outperform the others; therefore, we present only the results from these two methods. Furthermore, to assess how dependence may affect these standard deviation estimates, we report the estimates under both the assumption that the test statistics are dependent and the assumption that they are independent. Therefore, for each group of subjects, we have four estimates, respectively denoted by S_{\widehat{\text{FDP}}(t)}^{\text{Langaas}}, S_{\widehat{\text{FDP}}(t),\text{Ind}}^{\text{Langaas}}, S_{\widehat{\text{FDP}}(t)}^{\text{SLIM}}, and S_{\widehat{\text{FDP}}(t),\text{Ind}}^{\text{SLIM}}.

We summarize our estimation results in Table 5. From this table, we observe that the assumption on whether dependence is present among the test statistics significantly affects the estimated asymptotic standard deviation of \mathrm{FDP}(t). The ratios S_{\widehat{\text{FDP}}(t)}^{\text{Langaas}}/S_{\widehat{\text{FDP}}(t),\text{Ind}}^{\text{Langaas}} and S_{\widehat{\text{FDP}}(t)}^{\text{SLIM}}/S_{\widehat{\text{FDP}}(t),\text{Ind}}^{\text{SLIM}} range from 4 to 7 folds. Moreover, the ratios of the FDP standard deviation estimates to the limit values \widehat{\text{FDP}}(t) are between 0.25 and 0.82, and thus the uncertainty of FDP is nonnegligible. These observations reinforce the statement that in an FDP related inference, it may not be sufficient to consider only the limit value of FDP (or equivalently, FDR); we suggest that it is important to correctly account for the dependence, as ignoring it might lead to an overly optimistic conclusion of how well the FDP is controlled. For example, when t=0.005 in Grp1, the 95th percentile (say, two standard deviations above the mean) of \mathrm{FDP}(t) is estimated as 0.315+0.153\times 2=0.621 under dependence; in contrast, if we assume that the test statistics are independent, the corresponding estimate is 0.315+0.031\times 2=0.377. Therefore, ignoring the dependence leads to an overly optimistic belief of how well the upper bound of FDP is controlled; such a nonignorable difference may significantly change the statistical interpretation of the results in practice.

Table 5: Estimation results for the GWAS study.
Population #SNP t \widehat{\text{FDP}}(t) S_{\widehat{\text{FDP}}(t)}^{\text{Langaas}} S_{\widehat{\text{FDP}}(t),\text{Ind}}^{\text{Langaas}} S_{\widehat{\text{FDP}}(t)}^{\text{SLIM}} S_{\widehat{\text{FDP}}(t),\text{Ind}}^{\text{SLIM}}
Grp1 73 0.005 0.315 0.153 0.031 0.259 0.052
105 0.01 0.377 0.166 0.031 0.244 0.045
Grp2 166 0.01 0.199 0.107 0.018 0.148 0.025
199 0.02 0.441 0.125 0.019 0.167 0.025
Grp3 89 0.002 0.147 0.069 0.017 0.092 0.022
135 0.005 0.371 0.092 0.019 0.124 0.025

7 Discussion

In this article, we have established theoretical results for the asymptotic variance of \mathrm{FDP} in the framework where the test statistics are weakly dependent. We observe that under mild conditions, the asymptotic variance of \mathrm{FDP} under dependence is usually greater than that under independence; this is also demonstrated by our numerical studies: even under weak dependence, the variances of the FDPs can be numerically much larger than those under independence. It is thus crucial to take the dependence into account in the estimation of the variance of the FDP and to report it together with its mean. Furthermore, we have observed in the real data example that the standard deviation estimate of \mathrm{FDP} under the dependence assumption is significantly different from (usually greater than) that under the independence assumption. In practice, we suggest including the dependence in the estimation by default, unless there is strong evidence otherwise. This will help us better understand how well the FDP is controlled and avoid overly optimistic conclusions.

In this article, we have studied the asymptotic uncertainty of FDP in the same framework as Fan et al. (2012). Under the assumption that the test statistics are weakly dependent, we obtain the asymptotic expansion of \mathrm{FDP}(t) and establish the explicit formula for computing the asymptotic variance of \mathrm{FDP}(t). Within this framework, we observe many interesting open problems that can be studied in the future. We list several as examples; there are many others.

First, our proposed method for estimating the variance of \mathrm{FDP}(t) in Section 4 relies on the estimation of \pi_{0}; there, we adopted existing methods in the literature. From the simulation studies in Section 5.2, we observe that the performance of the variance estimate for \mathrm{FDP}(t) relies heavily on that of \pi_{0}; although the Langaas and SLIM methods perform reasonably well, we believe that they are not yet optimal, as none of these \pi_{0} estimation methods has sufficiently accounted for the dependence among the test statistics. Therefore, we expect that improved \pi_{0} estimation methods that better accommodate the dependence among the test statistics would be of particular interest, and they can be incorporated into our estimation method in Section 4.

Second, we have imposed some weak dependence assumptions in our development; this may limit the method in some applications. Practically, one possible solution is not to use the original test statistics but the dependence-adjusted ones, as they are more powerful and often weakly dependent (Friguet and Causeur, 2011; Fan et al., 2012); yet there are no theoretical guarantees. Therefore, it is challenging, but would be of interest, to study the asymptotic variance of \mathrm{FDP}(t) under arbitrary dependence.

Third, we assume in this article that the test statistics are normally distributed. It would be of great practical and theoretical value to generalize our study to accommodate multiple testing problems where the test statistics follow other popular distributions, e.g., t, F, and chi-square distributions. We refer to Zhuo et al. (2020) as an example of recent work on dependent t-tests and leave the relevant investigation to future work.

SUPPLEMENTARY MATERIALS

The supplementary materials contain technical details for the theorems in Section 3.

References

  • Basu et al. (2021) Basu, P., Fu, L., Saretto, A., and Sun, W. (2021), “Empirical bayes control of the false discovery exceedance,” arXiv preprint arXiv:2111.03885.
  • Benjamini and Hochberg (1995) Benjamini, Y. and Hochberg, Y. (1995), “Controlling the false discovery rate: a practical and powerful approach to multiple testing,” Journal of the Royal Statistical Society: Series B (Methodological), 57, 289–300.
  • Benjamini et al. (2001) Benjamini, Y., Yekutieli, D., et al. (2001), “The control of the false discovery rate in multiple testing under dependency,” The Annals of Statistics, 29, 1165–1188.
  • Bonferroni (1936) Bonferroni, C. (1936), “Teoria statistica delle classi e calcolo delle probabilita,” Pubblicazioni del R Istituto Superiore di Scienze Economiche e Commericiali di Firenze, 8, 3–62.
  • Bradic et al. (2011) Bradic, J., Fan, J., and Wang, W. (2011), “Penalized composite quasi-likelihood for ultrahigh dimensional variable selection,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73, 325–349.
  • Clarke et al. (2009) Clarke, S., Hall, P., et al. (2009), “Robustness of multiple testing procedures against dependence,” The Annals of Statistics, 37, 332–358.
  • Delattre and Roquain (2011) Delattre, S. and Roquain, E. (2011), “On the false discovery proportion convergence under Gaussian equi-correlation,” Statistics & Probability Letters, 81, 111–115.
  • Delattre and Roquain (2015) — (2015), “New procedures controlling the false discovery proportion via Romano–Wolf’s heuristic,” The Annals of Statistics, 43, 1141–1177.
  • Delattre and Roquain (2016) — (2016), “On empirical distribution function of high-dimensional Gaussian vector components with an application to multiple testing,” Bernoulli, 22, 302–324.
  • Döhler and Roquain (2020) Döhler, S. and Roquain, E. (2020), “Controlling the false discovery exceedance for heterogeneous tests,” Electronic Journal of Statistics, 14, 4244–4272.
  • Dudoit et al. (2004) Dudoit, S., van der Laan, M. J., and Pollard, K. S. (2004), “Multiple testing. Part I. Single-step procedures for control of general type I error rates,” Statistical Applications in Genetics and Molecular Biology, 3, 1–69.
  • Efron (2007) Efron, B. (2007), “Correlation and large-scale simultaneous significance testing,” Journal of the American Statistical Association, 102, 93–103.
  • Fan and Han (2017) Fan, J. and Han, X. (2017), “Estimation of the false discovery proportion with unknown dependence,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79, 1143–1164.
  • Fan et al. (2012) Fan, J., Han, X., and Gu, W. (2012), “Estimating false discovery proportion under arbitrary covariance dependence,” Journal of the American Statistical Association, 107, 1019–1035.
  • Fan et al. (2019) Fan, J., Ke, Y., Sun, Q., and Zhou, W.-X. (2019), “FarmTest: Factor-adjusted robust multiple testing with approximate false discovery control,” Journal of the American Statistical Association, 1–29.
  • Ferreira and Zwinderman (2006) Ferreira, J. and Zwinderman, A. (2006), “On the Benjamini–Hochberg method,” The Annals of Statistics, 34, 1827–1849.
  • Finner and Roters (2002) Finner, H. and Roters, M. (2002), “Multiple hypotheses testing and expected number of type I. errors,” The Annals of Statistics, 30, 220–238.
  • Friguet and Causeur (2011) Friguet, C. and Causeur, D. (2011), “Estimation of the proportion of true null hypotheses in high-dimensional data under dependence,” Computational statistics & data analysis, 55, 2665–2676.
  • Ge and Li (2012) Ge, Y. and Li, X. (2012), “Control of the false discovery proportion for independently tested null hypotheses,” Journal of Probability and Statistics, 2012.
  • Genovese and Wasserman (2004) Genovese, C. and Wasserman, L. (2004), “A stochastic process approach to false discovery control,” The Annals of Statistics, 32, 1035–1061.
  • Genovese and Wasserman (2006) Genovese, C. R. and Wasserman, L. (2006), “Exceedance control of the false discovery proportion,” Journal of the American Statistical Association, 101, 1408–1417.
  • Guo and Romano (2007) Guo, W. and Romano, J. (2007), “A generalized Sidak-Holm procedure and control of generalized error rates under independence,” Statistical applications in genetics and molecular biology, 6.
  • Hemerik et al. (2019) Hemerik, J., Solari, A., and Goeman, J. J. (2019), “Permutation-based simultaneous confidence bounds for the false discovery proportion,” Biometrika, 106, 635–649.
  • Hochberg (1988) Hochberg, Y. (1988), “A sharper Bonferroni procedure for multiple tests of significance,” Biometrika, 75, 800–802.
  • Holland and Copenhaver (1987) Holland, B. S. and Copenhaver, M. D. (1987), “An improved sequentially rejective Bonferroni test procedure,” Biometrics, 417–423.
  • Holm (1979) Holm, S. (1979), “A simple sequentially rejective multiple test procedure,” Scandinavian journal of statistics, 65–70.
  • Korn et al. (2004) Korn, E. L., Troendle, J. F., McShane, L. M., and Simon, R. (2004), “Controlling the number of false discoveries: application to high-dimensional genomic data,” Journal of Statistical Planning and Inference, 124, 379–398.
  • Langaas et al. (2005) Langaas, M., Lindqvist, B. H., and Ferkingstad, E. (2005), “Estimating the proportion of true null hypotheses, with application to DNA microarray data,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67, 555–572.
  • Lehmann and Romano (2005) Lehmann, E. and Romano, J. P. (2005), “Generalizations of the familywise error rate,” The Annals of Statistics, 33, 1138–1154.
  • Lehmann and Romano (2012) Lehmann, E. L. and Romano, J. P. (2012), “Generalizations of the familywise error rate,” in Selected Works of EL Lehmann, Springer, pp. 719–735.
  • Owen (2005) Owen, A. B. (2005), “Variance of the number of false discoveries,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67, 411–426.
  • Pollard and van der Laan (2004) Pollard, K. S. and van der Laan, M. J. (2004), “Choice of a null distribution in resampling-based multiple testing,” Journal of Statistical Planning and Inference, 125, 85–100.
  • Rom (1990) Rom, D. M. (1990), “A sequentially rejective test procedure based on a modified Bonferroni inequality,” Biometrika, 77, 663–665.
  • Sarkar (2006) Sarkar, S. K. (2006), “False discovery and false nondiscovery rates in single-step multiple testing procedures,” The Annals of Statistics, 34, 394–415.
  • Šidák (1967) Šidák, Z. (1967), “Rectangular confidence regions for the means of multivariate normal distributions,” Journal of the American Statistical Association, 62, 626–633.
  • Simes (1986) Simes, R. J. (1986), “An improved Bonferroni procedure for multiple tests of significance,” Biometrika, 73, 751–754.
  • Storey (2002) Storey, J. D. (2002), “A direct approach to false discovery rates,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64, 479–498.
  • Storey et al. (2004) Storey, J. D., Taylor, J. E., and Siegmund, D. (2004), “Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 66, 187–205.
  • Storey and Tibshirani (2003) Storey, J. D. and Tibshirani, R. (2003), “Statistical significance for genomewide studies,” Proceedings of the National Academy of Sciences, 100, 9440–9445.
  • Sun and Cai (2009) Sun, W. and Cai, T. (2009), “Large-scale multiple testing under dependence,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71, 393–424.
  • Sun et al. (2015) Sun, W., Reich, B. J., Cai, T. T., Guindani, M., and Schwartzman, A. (2015), “False discovery control in large-scale spatial multiple testing,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 77, 59–83.
  • van der Laan et al. (2004) van der Laan, M. J., Dudoit, S., and Pollard, K. S. (2004), “Augmentation procedures for control of the generalized family-wise error rate and tail probabilities for the proportion of false positives,” Statistical applications in genetics and molecular biology, 3, 1–25.
  • Wang et al. (2010) Wang, H.-Q., Tuominen, L. K., and Tsai, C.-J. (2010), “SLIM: a sliding linear model for estimating the proportion of true null hypotheses in datasets with dependence structures,” Bioinformatics, 27, 225–231.
  • Zhuo et al. (2020) Zhuo, B., Jiang, D., and Di, Y. (2020), “Test-statistic correlation and data-row correlation,” Statistics & Probability Letters, 167, 108903.