Sure Screening for Transelliptical Graphical Models

Yuxiang Xie, Chengchun Shi, Rui Song

(January 15, 2016)

Abstract

We propose a sure screening approach for recovering the structure of a transelliptical graphical model in the high dimensional setting. We estimate the partial correlation graph by thresholding the elements of a Kendall's tau-based estimator of the correlation matrix. Under a simple assumption on the relationship between the correlation and partial correlation graphs, we show that with high probability, the estimated edge set contains the true edge set, and the size of the estimated edge set is controlled. We develop a threshold value that allows for control of the expected false positive rate. In simulation and on an equities data set, we show that transelliptical graphical sure screening performs quite competitively with more computationally demanding techniques for graph estimation.
Some key words: High Dimensionality; Kendall’s tau; Partial Correlation; Sparsity; Undirected Graph

1 Introduction

Consider a random vector $X=\left(X_{1},\ldots,X_{p}\right)^{T}$ and an undirected graph $G=\left(\mathcal{V},\mathcal{E}\right)$, where $\mathcal{V}=\left\{1,\dots,p\right\}$ is the set of nodes and $\mathcal{E}$ is the set of edges describing the conditional dependence relationships among $X_{1},\ldots,X_{p}$. A pair $\left(j,j^{\prime}\right)$ is contained in the edge set $\mathcal{E}$ if and only if $X_{j}$ is conditionally dependent on $X_{j^{\prime}}$, given all remaining variables $X_{\mathcal{V}\backslash\{j,j^{\prime}\}}=\left\{X_{k};k\in\mathcal{V}\backslash\left\{j,j^{\prime}\right\}\right\}$.

In recent years, many methods have been developed to recover the structure of graphical models in the high dimensional setting. Many authors have studied the Gaussian graphical model, in which conditional dependence is encoded by the sparsity pattern of the inverse covariance matrix (Yuan and Lin, 2007; Friedman et al., 2008; Rothman et al., 2008; Ravikumar et al., 2009, and references therein). Liu et al. (2009, 2012a) introduced the nonparanormal distribution, which results from univariate monotonic transformations of the Gaussian distribution, and showed that the structural properties of the inverse covariance matrix of the Gaussian distribution carry over to the corresponding nonparanormal distribution. Ravikumar et al. (2010) and Anandkumar et al. (2012) considered recovering the structure of an Ising graphical model. Yang et al. (2012) studied a class of graphical models in which the node-wise conditional distributions arise from the exponential family. Moreover, Yang et al. (2014) and Chen et al. (2015) considered the problem of structure recovery for mixed graphical models.

Luo et al. (2015) proposed a computationally efficient screening approach for Gaussian graphical models, which they called graphical sure screening (GRASS). They estimated an edge between the $j$th and $j^{\prime}$th nodes if the absolute sample correlation between the $j$th and $j^{\prime}$th features exceeds some threshold $\gamma$. Under certain simple assumptions, the $j$th node’s estimated neighborhood then contains the true neighborhood with very high probability. GRASS requires only $\mathcal{O}(p^{2})$ operations, while most other existing methods for estimating the graph require $\mathcal{O}(p^{3})$ computations (Friedman et al., 2008).

However, GRASS requires that the data follow a multivariate normal distribution. In many settings, reliance on exact normality is not desirable (Liu et al., 2009, 2012a,b). In this paper, we propose transelliptical GRASS, an extension of the GRASS procedure to the transelliptical graphical model family introduced by Liu et al. (2012b). We show that under a simple set of assumptions, the desirable statistical properties of GRASS are shared by transelliptical GRASS. Because we relax multivariate normality, however, the graph that we estimate encodes partial correlation, rather than conditional dependence, among the latent variables.

The rest of this paper is organized as follows. In Section 2, we provide some background on transelliptical graphical models and present a useful property of Kendall’s tau statistic. In Section 3, we establish the theoretical properties of transelliptical GRASS, which include the sure screening property, size control of the selected edge set, and control of the expected false positive rate. Furthermore, we provide a choice of the threshold value that leads to these desirable properties. Simulation studies are presented in Section 4, and an application to an equities data set is shown in Section 5. We close with a discussion in Section 6.

2 Preliminaries

The transelliptical distribution (Liu et al., 2012b) is a generalization of the nonparanormal distribution (Liu et al., 2009, 2012a). The transelliptical distribution extends the elliptical distribution in much the same way that the nonparanormal extends the normal distribution. We first provide the definition of the elliptical distribution.

Definition 1.

Let $\mu\in\mathbb{R}^{p}$ and $\Sigma\in\mathbb{R}^{p\times p}$ with $\mathrm{rank}\left(\Sigma\right)=q\leq p$ . A $p$ -dimensional random vector $X$ has an elliptical distribution, denoted by $X\sim\mbox{EC}_{p}\left(\mu,\Sigma,r\right)$ , if it has a stochastic representation

\displaystyle X\stackrel{{\scriptstyle d}}{{=}}\mu+rAU,

(2.1)

where $U$ is a random vector uniformly distributed on the unit sphere in $\mathbb{R}^{q}$ , $r\geq 0$ is a scalar random variable independent of $U$ , and $A\in\mathbb{R}^{p\times q}$ satisfies $\Sigma=AA^{T}$ .

In Definition 1, $X\stackrel{{\scriptstyle d}}{{=}}Y$ indicates that $X$ and $Y$ have the same distribution. Many multivariate distributions, such as the multivariate normal and the multivariate t-distribution, belong to the elliptical distribution family.

From now on, we assume that $A$ in (2.1) is a full rank $p\times p$ matrix so that $\Sigma$ is positive definite. In addition, assume that $\Sigma$ has unit diagonal elements. We let $\rho_{jj^{\prime}}$ denote the $\left(j,j^{\prime}\right)$ th entry of $\Sigma$ . We also assume that the scalar random variable $r$ in (2.1) has density $g(\cdot)$ and ${\mathbb{E}}(r^{2})<\infty$ .
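To make the stochastic representation (2.1) concrete, the following Python/NumPy sketch draws from $\mbox{EC}_{p}(\mu,\Sigma,r)$ under the full-rank assumption above. Taking $A$ to be the Cholesky factor of $\Sigma$ is our choice (any $A$ with $AA^{T}=\Sigma$ works), and the function names are hypothetical. Choosing $r^{2}\sim\chi^{2}_{p}$ recovers $N_{p}(\mu,\Sigma)$, while $r=\|Z\|/\sqrt{W/d}$ with $Z\sim N_{p}(0,I)$ and $W\sim\chi^{2}_{d}$ independent gives a multivariate $t$-distribution with $d$ degrees of freedom.

```python
import numpy as np

def sample_elliptical(n, mu, Sigma, radius_sampler, rng):
    """Draw n rows of X = mu + r A U as in (2.1), with A A^T = Sigma."""
    mu = np.asarray(mu, dtype=float)
    p = mu.size
    A = np.linalg.cholesky(np.asarray(Sigma, dtype=float))  # full-rank A with A A^T = Sigma
    G = rng.standard_normal((n, p))
    U = G / np.linalg.norm(G, axis=1, keepdims=True)         # uniform on the unit sphere
    r = radius_sampler(n, p, rng)                            # radii drawn independently of U
    return mu + (r[:, None] * U) @ A.T

def gaussian_radius(n, p, rng):
    # r^2 ~ chi^2_p, which makes X ~ N_p(mu, Sigma)
    return np.sqrt(rng.chisquare(p, size=n))

def t_radius(n, p, rng, df=5):
    # r = ||Z|| / sqrt(W / df) with ||Z||^2 ~ chi^2_p and W ~ chi^2_df: multivariate t
    return np.sqrt(rng.chisquare(p, size=n) / (rng.chisquare(df, size=n) / df))
```

Both choices of radius give correlation matrix $\Sigma$; they differ only in tail behavior, which is exactly the freedom that the elliptical family adds to the normal.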

Definition 2.

A continuous random vector $X=\left(X_{1},\ldots,X_{p}\right)^{T}$ follows a $p$ -dimensional transelliptical distribution, denoted by $\mbox{TE}_{p}\left(\Sigma,r;f_{1},\dots,f_{p}\right)$ , if there exist monotone univariate functions $\left\{f_{j}\right\}_{j=1}^{p}$ and a nonnegative random variable $r$ satisfying $\Pr(r=0)=0$ , such that $Z\equiv f(X)\equiv\left(f_{1}\left(X_{1}\right),\ldots,f_{p}\left(X_{p}\right)\right)^{T}\sim\mbox{EC}_{p}\left(0,\Sigma,r\right)$ . We refer to $Z=\left(Z_{1},\ldots,Z_{p}\right)^{T}=\left(f_{1}\left(X_{1}\right),\ldots,f_{p}\left(X_{p}\right)\right)^{T}$ as the latent variables of $X=\left(X_{1},\ldots,X_{p}\right)^{T}$ .

Remark 1.

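A random vector $X=\left(X_{1},\ldots,X_{p}\right)^{T}$ follows a nonparanormal distribution (Liu et al., 2009, 2012a) if there exist monotone univariate functions $\left\{f_{j}\right\}_{j=1}^{p}$ such that $Z=f(X)=\left(f_{1}\left(X_{1}\right),\ldots,f_{p}\left(X_{p}\right)\right)^{T}\sim N_{p}({0},{\Sigma})$. Since the multivariate normal distribution belongs to the elliptical family, every nonparanormal distribution is transelliptical. The inclusion is strict: monotone transformations of a multivariate $t$-distribution, for example, are transelliptical but not nonparanormal. Therefore, the transelliptical distribution is a strict extension of the nonparanormal distribution.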

Given a transelliptical distribution $\mbox{TE}_{p}\left(\Sigma,r;f_{1},\dots,f_{p}\right)$, we can define an undirected graph $G=\left(\mathcal{V},\mathcal{E}\right)$, where $\mathcal{V}=\left\{1,\ldots,p\right\}$, and $\left(j,j^{\prime}\right)\in\mathcal{E}$ if and only if $\left(\Sigma^{-1}\right)_{jj^{\prime}}\neq 0$; that is, if and only if the latent variables $Z_{j}=f_{j}(X_{j})$ and $Z_{j^{\prime}}=f_{j^{\prime}}(X_{j^{\prime}})$ are partially correlated (Liu et al., 2012b). In the special case of a nonparanormal distribution, a zero entry of the precision matrix $\Sigma^{-1}$ further implies conditional independence between the corresponding pair of random variables.

Next, we present the definition and some theoretical properties of Kendall’s tau statistic.

Definition 3.

Given $n$ independent draws from $X=(X_{1},\ldots,X_{p})^{T}$ , such that $X_{ij}$ is the value of the $j$ th variable in the $i$ th observation, the population-level Kendall’s tau statistic between $X_{j}$ and $X_{j^{\prime}}$ is defined as

\tau_{jj^{\prime}}={\mathbb{E}}\left[\mathrm{sign}\left(\left(X_{1j}-X_{2j}\right)\left(X_{1j^{\prime}}-X_{2j^{\prime}}\right)\right)\right].

(2.2)

The sample estimator of Kendall’s tau is defined as

\widehat{\tau}_{jj^{\prime}}=\frac{2}{n(n-1)}\sum_{1\leq i<i^{\prime}\leq n}\mathrm{sign}\left(\left(X_{ij}-X_{i^{\prime}j}\right)\left(X_{ij^{\prime}}-X_{i^{\prime}j^{\prime}}\right)\right).

(2.3)
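The estimator (2.3) can be computed directly from its definition in $O(n^{2})$ time per pair of variables. The sketch below (Python/NumPy; the function name is hypothetical) does exactly that. For data without ties it should agree with `scipy.stats.kendalltau`, whose default tau-b variant coincides with (2.3) when there are no ties.

```python
import numpy as np
from scipy.stats import kendalltau

def kendall_tau_hat(x, y):
    """Sample Kendall's tau (2.3) for two length-n samples x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # Entry (i, i') is sign((x_i - x_i')(y_i - y_i')); the diagonal is zero.
    s = np.sign(x[:, None] - x[None, :]) * np.sign(y[:, None] - y[None, :])
    i, i_prime = np.triu_indices(n, k=1)      # all pairs with i < i'
    return 2.0 * s[i, i_prime].sum() / (n * (n - 1))

rng = np.random.default_rng(1)
x = rng.standard_normal(60)
y = x + rng.standard_normal(60)
print(kendall_tau_hat(x, y), kendalltau(x, y)[0])  # the two values should agree up to rounding
```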

Now suppose that we have $n$ independent draws from $X=(X_{1},\ldots,X_{p})^{T}\sim\mbox{TE}_{p}(\Sigma,r;f_{1},\dots,f_{p})$ , such that $X_{ij}$ is the value of the $j$ th variable in the $i$ th observation. Then there is a simple connection between the population-level Kendall’s tau statistic $\tau_{jj^{\prime}}$ and $\rho_{jj^{\prime}}$ .

Lemma 1.

(Liu et al., 2012b) For $X=\left(X_{1},\ldots,X_{p}\right)^{T}\sim\mbox{TE}_{p}(\Sigma,r;f_{1},\dots,f_{p})$, we have $\rho_{jj^{\prime}}=\sin(\frac{\pi}{2}\tau_{jj^{\prime}})$.

Lemma 1 motivated Liu et al. (2012b) to estimate $\rho_{jj^{\prime}}$ using

\widehat{S}^{\tau}_{jj^{\prime}}=\begin{cases}\sin\left(\frac{\pi}{2}\widehat{\tau}_{jj^{\prime}}\right)&\text{if $j\neq j^{\prime}$}\\ 1&\text{if $j=j^{\prime}$}\end{cases}.

(2.4)
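A sketch of the matrix estimator (2.4) follows, assuming continuous data so that SciPy's tau-b equals (2.3). Because $\widehat{\tau}_{jj^{\prime}}$ depends only on ranks, $\widehat{S}^{\tau}$ is unchanged when each column is passed through a strictly increasing function, which is what makes it suitable for transelliptical data. The double loop is written for clarity rather than speed, and the function name is our own.

```python
import numpy as np
from scipy.stats import kendalltau

def kendall_correlation_matrix(X):
    """Estimator (2.4): sin(pi/2 * tau_hat) off the diagonal, 1 on the diagonal."""
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    S = np.eye(p)
    for j in range(p):
        for k in range(j + 1, p):
            tau = kendalltau(X[:, j], X[:, k])[0]   # tau-b, equal to (2.3) without ties
            S[j, k] = S[k, j] = np.sin(np.pi / 2 * tau)
    return S
```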

3 Transelliptical Graphical Sure Screening

3.1 Proposed Approach

Suppose that we have $n$ independent draws from $X=(X_{1},\ldots,X_{p})^{T}\sim\mbox{TE}_{p}(\Sigma,r;f_{1},\dots,f_{p})$ . We define $\mathcal{E}\equiv\left\{\left(j,j^{\prime}\right):j<j^{\prime},\left(\Sigma^{-1}\right)_{jj^{\prime}}\neq 0\right\}$ to be the true edge set, and $\mathcal{E}_{j}\equiv\left\{j^{\prime}:j^{\prime}\neq j,\left(\Sigma^{-1}\right)_{jj^{\prime}}\neq 0\right\}$ to be the true neighborhood for the $j$ th node. We propose to estimate $\mathcal{E}$ and $\mathcal{E}_{j}$ as follows,

\widehat{\mathcal{E}}_{\gamma_{n}}=\left\{\left(j,j^{\prime}\right):j<j^{\prime},\left|\widehat{S}^{\tau}_{jj^{\prime}}\right|>\gamma_{n,jj^{\prime}}\right\}

(3.1)

and

\widehat{\mathcal{E}}_{j,\gamma_{n}}=\left\{j^{\prime}:j^{\prime}\neq j,\left|\widehat{S}^{\tau}_{jj^{\prime}}\right|>\gamma_{n,jj^{\prime}}\right\},

(3.2)

where $\gamma_{n,jj^{\prime}}>0$ is some threshold value that we will specify in the following sections, and $\widehat{S}^{\tau}_{jj^{\prime}}$ is defined in (2.4). We refer to $\widehat{\mathcal{E}}_{\gamma_{n}}$ and $\widehat{\mathcal{E}}_{j,\gamma_{n}}$ as the transelliptical graphical sure screening (transelliptical GRASS) estimators.
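The estimators (3.1) and (3.2) amount to elementwise thresholding of $\widehat{S}^{\tau}$. In the minimal sketch below, nodes are indexed from 0 and `gamma` may be a scalar or a $p\times p$ array of pair-specific thresholds $\gamma_{n,jj^{\prime}}$; both conventions and the function names are assumptions made for illustration.

```python
import numpy as np

def grass_edges(S, gamma):
    """Estimated edge set (3.1): pairs (j, k), j < k, with |S_jk| > gamma_jk."""
    S = np.asarray(S, dtype=float)
    p = S.shape[0]
    gamma = np.broadcast_to(np.asarray(gamma, dtype=float), (p, p))
    j, k = np.triu_indices(p, k=1)
    keep = np.abs(S[j, k]) > gamma[j, k]
    return set(zip(j[keep].tolist(), k[keep].tolist()))

def grass_neighborhood(S, gamma, j):
    """Estimated neighborhood (3.2) of node j."""
    S = np.asarray(S, dtype=float)
    p = S.shape[0]
    gamma = np.broadcast_to(np.asarray(gamma, dtype=float), (p, p))
    return {k for k in range(p) if k != j and abs(S[j, k]) > gamma[j, k]}
```

Since only the $p(p-1)/2$ off-diagonal entries are compared with a threshold, the screening step itself costs $O(p^{2})$ once $\widehat{S}^{\tau}$ is available.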

3.2 Theoretical Properties

We now present some theoretical properties of transelliptical GRASS. Proofs are in the Appendix.

Assumption 1.

For some constant $C_{1}>0$ and $0<\kappa<1/2$ ,

\min_{\left(j,j^{\prime}\right)\in\mathcal{E}}\left|\rho_{jj^{\prime}}\right|\geq C_{1}n^{-\kappa}.

Assumption 1 requires that the elements in the edge set $\mathcal{E}$ correspond to sufficiently large values in the correlation matrix. We next present the sure screening property in Theorem 1.

Theorem 1.

Suppose that Assumption 1 holds, and that $\log(p)=C_{3}n^{\xi}$ for some constants $C_{3}>0$ and $\xi\in(0,1-2\kappa)$ . Let $\gamma_{n,jj^{\prime}}\equiv\gamma_{n}=\frac{2}{3}C_{1}n^{-\kappa}$ . Then there exist constants $C_{4}$ and $C_{5}$ such that

\displaystyle\Pr(\mathcal{E}\subseteq\widehat{\mathcal{E}}_{\gamma_{n}})\geq 1-C_{4}\exp(-C_{5}n^{1-2\kappa})

and

\displaystyle\Pr(\mathcal{E}_{j}\subseteq\widehat{\mathcal{E}}_{j,\gamma_{n}})\geq 1-C_{4}\exp(-C_{5}n^{1-2\kappa}).

Conversely, if $\min_{(j,j^{\prime})\in\mathcal{E}}|\rho_{jj^{\prime}}|<C_{1}n^{-\kappa}/3$ , then there exist constants $C_{6}$ and $C_{7}$ such that

\displaystyle\Pr(\mathcal{E}\not\subseteq\widehat{\mathcal{E}}_{\gamma_{n}}){\geq}1-C_{6}\exp(-C_{7}n^{1-2\kappa}).

(3.3)

Theorem 1 guarantees that the candidate edge set obtained from transelliptical GRASS contains the true edge set with high probability, which means the screening method will not result in false negatives with high probability. Moreover, (3.3) suggests that Assumption 1 is necessary up to a constant. The following corollary shows that under Assumption 1, transelliptical GRASS can recover the connected components of $\mathcal{E}$ with high probability.

Corollary 1.

Suppose there are $h$ connected components in the graph $\mathcal{E}$ , and that the $l$ th connected component contains the variables $x_{1}^{(l)},\dots,x_{p_{l}}^{(l)}$ , where $\sum_{l=1}^{h}p_{l}=p$ . That is, $x_{j}^{(s)}$ and $x_{j^{\prime}}^{(t)}$ are partially uncorrelated for $s\not=t$ . Suppose Assumption 1 and the conditions in Theorem 1 hold. Let $\gamma_{n,jj^{\prime}}=\gamma_{n}=2C_{1}n^{-\kappa}/3$ . Then the connected components of $\widehat{\mathcal{E}}_{\gamma_{n}}$ are the same as the connected components of $\mathcal{E}$ with probability at least $1-C_{4}\exp\left(-C_{5}n^{1-2\kappa}\right)$ .
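Corollary 1 concerns the connected components of the screened graph. One way to compute them, sketched below under the assumption that SciPy is available, is to form the adjacency matrix with entries $\mathbf{1}\{|\widehat{S}^{\tau}_{jj^{\prime}}|>\gamma_{n}\}$ and pass it to `scipy.sparse.csgraph.connected_components`; the wrapper name is hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def screened_components(S, gamma):
    """Connected components of the graph with edges |S_jk| > gamma, j != k."""
    S = np.asarray(S, dtype=float)
    adjacency = np.abs(S) > gamma
    np.fill_diagonal(adjacency, False)        # no self-loops
    n_comp, labels = connected_components(csr_matrix(adjacency.astype(int)), directed=False)
    return n_comp, labels
```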

Our next theorem will provide a bound on the size of $\widehat{\mathcal{E}}_{j,\gamma_{n}}$ . This requires an additional assumption.

Assumption 2.

There exist constants $\alpha\geq 0$ and $C_{2}>0$ such that $\Lambda_{\max}(\Sigma)\leq C_{2}n^{\alpha}$ , where $\Lambda_{\max}(\Sigma)$ is the largest eigenvalue of $\Sigma$ .

Assumption 2 allows the largest eigenvalue of the population covariance matrix $\Sigma$ to diverge as $n$ grows.

Theorem 2.

Let $\gamma_{n,jj^{\prime}}=\gamma_{n}=\frac{2}{3}C_{1}n^{-\kappa}$. Under Assumptions 1–2, if $\log\left(p\right)=C_{3}n^{\xi}$ for some constants $C_{3}>0$ and $\xi\in\left(0,1-2\kappa\right)$, then $\Pr\left\{\left|\widehat{\mathcal{E}}_{j,\gamma_{n}}\right|\leq O\left(n^{2\kappa+\alpha}\right)\right\}\geq 1-C_{4}\exp\left(-C_{5}n^{1-2\kappa}\right)$, where the constants $C_{4}$ and $C_{5}$ are as in Theorem 1.

Next we propose a choice of the threshold $\gamma_{n,jj^{\prime}}$ that enables us to control the expected false positive rate, defined as ${\mathbb{E}}\left(|\widehat{\mathcal{E}}_{\gamma_{n}}\cap\mathcal{E}^{c}|\right)/|\mathcal{E}^{c}|$, at a pre-specified value. Here, $\mathcal{E}^{c}\equiv\left\{\left(j,j^{\prime}\right):j<j^{\prime},\left(\Sigma^{-1}\right)_{jj^{\prime}}=0\right\}$, and $\widehat{\mathcal{E}}_{\gamma_{n}}$ is defined in (3.1). This requires an additional assumption.

Assumption 3.

For the same $\xi$ as in Theorem 1,

\max_{(j,j^{\prime})\notin\mathcal{E}}\left|\rho_{jj^{\prime}}\right|=o\left(n^{-\frac{1-\xi}{2}}\right).

Theorem 3.

Suppose that Assumptions 1–3 hold. If $\log(p)=C_{3}n^{\xi}$ for $\xi$ defined in Theorem 1, then we can control the asymptotic expected false positive rate at $f/|\mathcal{E}^{c}|$ by choosing $\gamma_{n,jj^{\prime}}=\frac{\pi}{2}\widehat{\omega}_{jj^{\prime}}\Phi^{-1}(1-\frac{f}{p(p-1)})/\sqrt{n}$ , where

\widehat{\omega}_{jj^{\prime}}^{2}=\frac{4(n-1)}{(n-2)^{2}}\sum_{i^{\prime}=1}^{n}\left[\left\{\frac{1}{n-1}\sum_{i=1,i\neq i^{\prime}}^{n}\mathrm{sign}\left(\left(X_{ij}-X_{i^{\prime}j}\right)\left(X_{ij^{\prime}}-X_{i^{\prime}j^{\prime}}\right)\right)\right\}-\widehat{\tau}_{jj^{\prime}}\right]^{2},

(3.4)

and $\widehat{\tau}_{jj^{\prime}}$ is defined in (2.3). Furthermore, with this threshold, the sure screening property of Theorem 1 still holds.

The estimator (3.4) is a jackknife estimator of the asymptotic variance of a U-statistic (Arvesen, 1969; Sen, 1977; Callaert and Veraverbeke, 1981; Fligner and Rust, 1983; Lee, 1985).
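The threshold of Theorem 3 can be computed pair by pair from (3.4). The sketch below (Python with NumPy and SciPy; function names hypothetical) uses the fact that averaging the $n-1$ signs involving observation $i^{\prime}$ gives the bracketed term in (3.4). As a sanity check, for independent continuous $X_{j}$ and $X_{j^{\prime}}$ the population value of $\omega_{jj^{\prime}}^{2}$ is $4/9$, so $\widehat{\omega}^{2}_{jj^{\prime}}$ should be close to $4/9$ in large samples.

```python
import numpy as np
from scipy.stats import norm

def jackknife_omega2(x, y):
    """Jackknife variance estimate (3.4) for sqrt(n) * (tau_hat - tau)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # Symmetric n x n matrix of concordance signs with a zero diagonal.
    s = np.sign(x[:, None] - x[None, :]) * np.sign(y[:, None] - y[None, :])
    tau_hat = s.sum() / (n * (n - 1))         # equals (2.3): each pair is counted twice
    tau_i = s.sum(axis=0) / (n - 1)           # average of the n - 1 signs involving i'
    return 4.0 * (n - 1) / (n - 2) ** 2 * np.sum((tau_i - tau_hat) ** 2)

def fpr_threshold(x, y, p, f):
    """Pair-specific threshold gamma_{n,jj'} of Theorem 3."""
    n = len(x)
    omega = np.sqrt(jackknife_omega2(x, y))
    return (np.pi / 2) * omega * norm.ppf(1 - f / (p * (p - 1))) / np.sqrt(n)
```

Here `f` plays the role of $f$ in Theorem 3, the target expected number of false positives, so that the threshold targets an expected false positive rate of $f/|\mathcal{E}^{c}|$; a smaller `f` yields a larger threshold.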

3.3 A Second Look at Assumptions 1 and 3

Assumptions 1 and 3 involve placing conditions on the elements of $\Sigma$ corresponding to non-zero and zero elements of $\Sigma^{-1}$, respectively. These conditions are somewhat hard to interpret, since in general there is no simple relationship between the $(j,j^{\prime})$th elements of a matrix $A$ and its inverse $A^{-1}$. We now present a result from Luo et al. (2015) that allows us to re-formulate these assumptions as conditions on the elements of $\Sigma^{-1}$. We let $\beta=\Lambda_{\max}(\Sigma)/\Lambda_{\min}(\Sigma)$ and $\nu={2}/\left\{\Lambda_{\max}(\Sigma)^{-1}+\Lambda_{\min}(\Sigma)^{-1}\right\}$. Here, $\Lambda_{\max}(\cdot)$ and $\Lambda_{\min}(\cdot)$ indicate the largest and smallest eigenvalues of a matrix, respectively.

Proposition 1.

(Luo et al., 2015) Suppose $1<\beta\leq\left\{n^{(1-\xi)/2}+\Lambda_{\max}(\Sigma)^{-1/2}\right\}/\left\{n^{(1-\xi)/2}-\Lambda_{\max}(\Sigma)^{-1/2}\right\}$. Then Assumption 3 holds. Suppose also that $n\geq\left(2/C_{1}\right)^{1/(1-\xi-\kappa)}$. If $\min_{(j,j^{\prime})\in\mathcal{E}}\nu^{2}\left|(\Sigma^{-1})_{jj^{\prime}}\right|\geq 2C_{1}n^{-\kappa}$, then Assumption 1 holds. Furthermore, if Assumption 1 holds, then $\min_{(j,j^{\prime})\in\mathcal{E}}\nu^{2}\left|(\Sigma^{-1})_{jj^{\prime}}\right|\geq C_{1}n^{-\kappa}/2$.

Therefore, if $\Sigma$ is very well-conditioned and the non-zero elements of $\Sigma^{-1}$ are sufficiently large, then Proposition 1 implies that Assumptions 1 and 3 hold, so that Theorem 1 applies, and Theorem 3 applies as well whenever Assumption 2 also holds.
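The quantities $\beta$ and $\nu$ that enter Proposition 1 are simple functions of the extreme eigenvalues of $\Sigma$. The short NumPy sketch below (hypothetical function names) computes them, together with the upper bound on $\beta$ in Proposition 1 for given $n$ and $\xi$. For a $2\times 2$ correlation matrix with off-diagonal entry $r$, one obtains $\beta=(1+r)/(1-r)$ and $\nu=1-r^{2}$.

```python
import numpy as np

def conditioning_quantities(Sigma):
    """beta = Lmax / Lmin and nu = 2 / (1/Lmax + 1/Lmin)."""
    eig = np.linalg.eigvalsh(np.asarray(Sigma, dtype=float))  # ascending order
    lmin, lmax = eig[0], eig[-1]
    return lmax / lmin, 2.0 / (1.0 / lmax + 1.0 / lmin)

def beta_upper_bound(Sigma, n, xi):
    """Right-hand side of the condition on beta in Proposition 1."""
    lmax = np.linalg.eigvalsh(np.asarray(Sigma, dtype=float))[-1]
    a = n ** ((1 - xi) / 2)
    b = lmax ** -0.5
    return (a + b) / (a - b)   # meaningful only when a > b
```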

4 Simulation Studies

The simulation studies in this section are largely based on those in Luo et al. (2015).

4.1 Data Generation

Let $p$ be the number of features, and $n$ the number of observations. Motivated by the simulation study of Luo et al. (2015), we considered four ways of generating the edge set $\mathcal{E}$.

Simulation A:

For all $j<j^{\prime}$ , we set $(j,j^{\prime})\in\mathcal{E}$ with probability $0.01$ . We then generated a $p\times p$ matrix $A$ , where

A_{jj^{\prime}}=A_{j^{\prime}j}=\begin{cases}1&\text{for $j=j^{\prime}$}\\ \mathrm{Unif}[-0.3,0.7]&\text{for $(j,j^{\prime})\in\mathcal{E}$ }\\ 0&\text{otherwise}\end{cases}.

(4.1)

Finally, we created a positive definite matrix ${\Sigma}^{-1}$ ,

{\Sigma}^{-1}={A}+(0.1-\Lambda_{\min}({A})){I},

(4.2)

where $\Lambda_{\min}(A)$ is the smallest eigenvalue of $A$ , and $I$ denotes the $p\times p$ identity matrix.

Simulation B:

We partitioned the $p$ features into $10$ equally-sized and non-overlapping sets, $C_{l}=\{(l-1)p/10+1,\ldots,lp/10\}$ for $l=1,\ldots,10$ . For all $j\in C_{l},j^{\prime}\in C_{l},j<j^{\prime}$ , we set $(j,j^{\prime})\in\mathcal{E}$ . We then generated $A$ and $\Sigma^{-1}$ according to (4.1) and (4.2).
Simulation C:

For all $j\leq j^{\prime}$ , we set $\rho_{jj^{\prime}}=0.3^{|j-j^{\prime}|}$ .
Simulation D:

We partitioned the features into $p/10$ equally-sized and non-overlapping sets, $C_{l}=\{10(l-1)+1,\ldots,10l\}$ for $l=1,\ldots,p/10$ . Then for all $j\in C_{l},j^{\prime}\in C_{l}$ , we set $(\Sigma^{-1})_{jj^{\prime}}=0.9^{|j-j^{\prime}|}$ . All other elements of $\Sigma^{-1}$ were set to zero.
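The precision-matrix constructions for Simulations A and D can be sketched in Python/NumPy as follows. This is an illustration under our own conventions (0-based indices, hypothetical function names), not the code used to produce the results below. The Simulation D part also shows why Assumption 1 fails there: the inverse of each within-block matrix with entries $0.9^{|j-j^{\prime}|}$ is tridiagonal, so most within-block edges correspond to zero correlations.

```python
import numpy as np

def simulation_a_precision(p, rng, edge_prob=0.01):
    """Simulation A: sparse symmetric A as in (4.1), shifted as in (4.2)."""
    A = np.eye(p)
    j, k = np.triu_indices(p, k=1)
    is_edge = rng.random(j.size) < edge_prob           # each pair j < k is an edge w.p. edge_prob
    vals = rng.uniform(-0.3, 0.7, size=int(is_edge.sum()))
    A[j[is_edge], k[is_edge]] = vals
    A[k[is_edge], j[is_edge]] = vals                   # keep A symmetric
    lam_min = np.linalg.eigvalsh(A)[0]                 # smallest eigenvalue of A
    Omega = A + (0.1 - lam_min) * np.eye(p)            # smallest eigenvalue becomes 0.1
    edges = set(zip(j[is_edge].tolist(), k[is_edge].tolist()))
    return Omega, edges

def simulation_d_precision(p, block=10, decay=0.9):
    """Simulation D: block-diagonal precision matrix, decay**|j - k| within blocks."""
    if p % block != 0:
        raise ValueError("p must be a multiple of the block size")
    idx = np.arange(block)
    B = decay ** np.abs(idx[:, None] - idx[None, :])
    return np.kron(np.eye(p // block), B)

def to_correlation(Omega):
    """Invert a precision matrix and rescale the result to unit diagonal."""
    Sigma = np.linalg.inv(Omega)
    d = np.sqrt(np.diag(Sigma))
    return Sigma / np.outer(d, d)
```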

In all four simulations, $\Sigma$ was rescaled to have diagonal elements equal to 1. We then generated observations from a $N(0,{\Sigma})$ distribution, as well as observations from a multivariate $t$-distribution with $\theta$ degrees of freedom, mean zero, and correlation $\Sigma$. Next, each feature was transformed by one of four monotonic functions, $\exp(x)$, $x^{3}$, $x^{5}$, or $(x-1)^{3}$, chosen with equal probability; this process gave us nonparanormal observations and transelliptical $t$-distributed observations, respectively.
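The final step of the data generation can be sketched as follows. We read "with equal probability" as drawing one of the four maps independently for each feature, which keeps the latent structure transelliptical; this reading, the NumPy sampler for the multivariate $t$, and the function name are our assumptions. Since every map is strictly increasing, the ranks within each column, and hence $\widehat{S}^{\tau}$, are unaffected.

```python
import numpy as np

# The four monotone maps listed above; each is strictly increasing on the real line.
TRANSFORMS = [np.exp, lambda x: x ** 3, lambda x: x ** 5, lambda x: (x - 1) ** 3]

def sample_transelliptical(n, Sigma, rng, df=None):
    """Latent Gaussian (df=None) or multivariate-t draws, then one random monotone map per feature."""
    Sigma = np.asarray(Sigma, dtype=float)
    p = Sigma.shape[0]
    Z = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    if df is not None:
        W = rng.chisquare(df, size=n)
        Z = Z / np.sqrt(W / df)[:, None]                # multivariate t with correlation Sigma
    choice = rng.integers(0, len(TRANSFORMS), size=p)  # each map chosen with probability 1/4
    X = np.column_stack([TRANSFORMS[c](Z[:, j]) for j, c in enumerate(choice)])
    return X, Z
```

For example, Simulation C corresponds to passing `Sigma = 0.3 ** np.abs(idx[:, None] - idx[None, :])` with `idx = np.arange(p)`.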

4.2 Control of False Positive Rate

Theorem 3 states that under certain conditions, performing transelliptical GRASS with $\gamma_{n,jj^{\prime}}=\frac{\pi}{2}\widehat{\omega}_{jj^{\prime}}\Phi^{-1}(1-\frac{f}{p(p-1)})/\sqrt{n}$ , where $\widehat{\omega}_{jj^{\prime}}$ is of the form (3.4), leads to control of the asymptotic expected false positive rate (FPR) at level $q\equiv f/|\mathcal{E}^{c}|$ . In Tables 1 and 2, we explore the control of the FPR in finite samples, for nonparanormal and transelliptical $t$ -distributed data. The FPR (defined as FP/(FP+TN)) and false negative rate (FNR; defined as FN/(TP+FN)) are reported for various values of the level of desired FPR control, $q$ . The size of the estimated edge set $|\widehat{\mathcal{E}}_{\gamma_{n}}|$ is also reported. Here $n=100$ and $p=1000$ , and results are averaged over 250 simulated data sets.
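The error rates reported in Tables 1 and 2 can be computed from an estimated and a true edge set as in the following plain-Python sketch, where edges are pairs $(j,j^{\prime})$ with $j<j^{\prime}$ and there are $p(p-1)/2$ candidate pairs in total. The function name is hypothetical, and the sketch assumes that both $\mathcal{E}$ and $\mathcal{E}^{c}$ are non-empty so that neither denominator is zero.

```python
def edge_error_rates(est_edges, true_edges, p):
    """Return (FPR, FNR) with FPR = FP/(FP+TN) and FNR = FN/(TP+FN)."""
    est, true = set(est_edges), set(true_edges)
    n_pairs = p * (p - 1) // 2        # all candidate pairs j < j'
    tp = len(est & true)
    fp = len(est - true)
    fn = len(true - est)
    tn = n_pairs - tp - fp - fn
    return fp / (fp + tn), fn / (tp + fn)
```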

Assumption 3 is the key to controlling the asymptotic expected false positive rate. In Simulation B, both $\Sigma$ and ${\Sigma}^{-1}$ are block diagonal with ten completely dense blocks, so Assumption 3 holds exactly. In Simulation D, all of the zero elements of the precision matrix ${\Sigma}^{-1}$ also correspond to zero elements of $\Sigma$, so Assumption 3 again holds exactly. As expected, the FPR is controlled successfully in Simulations B and D.

In contrast, in Simulations A and C, not all of the zero elements of the precision matrix ${\Sigma}^{-1}$ correspond to zero elements of the correlation matrix $\Sigma$. Nevertheless, Tables 1 and 2 show that the FPR is still well controlled in these settings. This is because Assumption 3 requires only that the elements in $\mathcal{E}^{c}$ correspond to small, though not necessarily zero, elements of $\Sigma$, and this holds for most of the elements of $\mathcal{E}^{c}$ in Simulations A and C.

Table 1: False positive rate control for nonparanormal data using $\gamma_{n,jj^{\prime}}=\frac{\pi}{2}\widehat{\omega}_{jj^{\prime}}\Phi^{-1}(1-\frac{f}{p(p-1)})/\sqrt{n}$, where $\widehat{\omega}_{jj^{\prime}}$ is of the form (3.4) in Theorem 3.

| $q$ | Sim. A $\lvert\widehat{\mathcal{E}}_{\gamma_{n}}\rvert$ | Sim. A FPR | Sim. A FNR | Sim. B $\lvert\widehat{\mathcal{E}}_{\gamma_{n}}\rvert$ | Sim. B FPR | Sim. B FNR | Sim. C $\lvert\widehat{\mathcal{E}}_{\gamma_{n}}\rvert$ | Sim. C FPR | Sim. C FNR | Sim. D $\lvert\widehat{\mathcal{E}}_{\gamma_{n}}\rvert$ | Sim. D FPR | Sim. D FNR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1e-04 | 542.46 | 3e-04 | 0.925 | 1611.6 | 2e-04 | 0.969 | 290.22 | 2e-04 | 0.819 | 920.38 | 2e-04 | 0.819 |
| 0.001 | 1579.76 | 0.0019 | 0.868 | 3472.04 | 0.0014 | 0.943 | 1071.96 | 0.0014 | 0.648 | 1580.36 | 0.0014 | 0.806 |
| 0.01 | 7426.36 | 0.013 | 0.762 | 10950.86 | 0.011 | 0.879 | 6125.34 | 0.011 | 0.382 | 6364.42 | 0.011 | 0.793 |
| 0.1 | 53548.1 | 0.104 | 0.551 | 59265.12 | 0.098 | 0.695 | 49926.3 | 0.098 | 0.111 | 49884.46 | 0.098 | 0.722 |
| 0.2 | 102830.74 | 0.202 | 0.446 | 108698.82 | 0.195 | 0.580 | 98455.7 | 0.196 | 0.055 | 98333.9 | 0.195 | 0.644 |
| 0.3 | 152029.24 | 0.301 | 0.368 | 157545.2 | 0.294 | 0.488 | 147546.26 | 0.294 | 0.031 | 147393.94 | 0.294 | 0.566 |
| 0.5 | 250794.72 | 0.500 | 0.244 | 254988.28 | 0.493 | 0.332 | 246989.82 | 0.493 | 0.012 | 246737.66 | 0.493 | 0.405 |

Table 2: False positive rate control for transelliptical $t$-distributed data with 5 degrees of freedom using $\gamma_{n,jj^{\prime}}=\frac{\pi}{2}\widehat{\omega}_{jj^{\prime}}\Phi^{-1}(1-\frac{f}{p(p-1)})/\sqrt{n}$, where $\widehat{\omega}_{jj^{\prime}}$ is of the form (3.4) in Theorem 3.

| $q$ | Sim. A $\lvert\widehat{\mathcal{E}}_{\gamma_{n}}\rvert$ | Sim. A FPR | Sim. A FNR | Sim. B $\lvert\widehat{\mathcal{E}}_{\gamma_{n}}\rvert$ | Sim. B FPR | Sim. B FNR | Sim. C $\lvert\widehat{\mathcal{E}}_{\gamma_{n}}\rvert$ | Sim. C FPR | Sim. C FNR | Sim. D $\lvert\widehat{\mathcal{E}}_{\gamma_{n}}\rvert$ | Sim. D FPR | Sim. D FNR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1e-04 | 448.836 | 3e-04 | 0.941 | 1325.172 | 2e-04 | 0.975 | 240.52 | 2e-04 | 0.87 | 859.488 | 2e-04 | 0.833 |
| 0.001 | 1447.58 | 0.0018 | 0.889 | 3039.336 | 0.0015 | 0.952 | 1012.12 | 0.0015 | 0.721 | 1564.668 | 0.0015 | 0.813 |
| 0.01 | 7261.952 | 0.013 | 0.788 | 10340.736 | 0.011 | 0.892 | 6123.648 | 0.0112 | 0.466 | 6438.42 | 0.0111 | 0.794 |
| 0.1 | 53507.624 | 0.104 | 0.575 | 58758.12 | 0.099 | 0.713 | 50330.904 | 0.099 | 0.158 | 50297.052 | 0.099 | 0.721 |
| 0.2 | 102902.456 | 0.203 | 0.468 | 108367.908 | 0.197 | 0.598 | 99069.432 | 0.197 | 0.085 | 98909.556 | 0.197 | 0.642 |
| 0.3 | 152190.388 | 0.302 | 0.387 | 157358.224 | 0.295 | 0.504 | 148208.564 | 0.295 | 0.053 | 148021.74 | 0.295 | 0.563 |
| 0.5 | 250987.548 | 0.50 | 0.258 | 254984.984 | 0.494 | 0.344 | 247579.316 | 0.495 | 0.023 | 247400.096 | 0.494 | 0.404 |

4.3 Comparison to Existing Approaches

We first compare the performance of the graphical lasso (Friedman et al., 2008), neighborhood selection (Meinshausen and Bühlmann, 2006), and transelliptical GRASS on nonparanormal data generated from Simulations A–D, with $n=50$ and $p=750$. Let $X$ denote the $n\times p$ simulated data matrix. We let GL($\widehat{\Sigma}$) and NS($\widehat{\Sigma}$) denote the results of the graphical lasso and neighborhood selection applied to the sample correlation matrix $\widehat{\Sigma}=X^{T}X/n$ (computed after standardizing the columns of $X$), and we let GL($\widehat{S}^{\tau}$) and NS($\widehat{S}^{\tau}$) denote the results of the graphical lasso and neighborhood selection applied to $\widehat{S}^{\tau}$, the Kendall’s tau estimator defined in (2.4). Results are shown in Figure 1.

In Simulation B, transelliptical GRASS outperforms GL($\widehat{S}^{\tau}$) and NS($\widehat{S}^{\tau}$). This is expected: the sparsity patterns of $\Sigma$ and ${\Sigma}^{-1}$ are identical, so the assumptions underlying transelliptical GRASS hold.

In Simulations A and C, most of the edges in $\mathcal{E}$ correspond to large elements of $\Sigma$, and most of the non-edges (the elements of $\mathcal{E}^{c}$) correspond to small elements of $\Sigma$. Consequently, transelliptical GRASS outperforms GL($\widehat{S}^{\tau}$) and NS($\widehat{S}^{\tau}$) in these settings as well.

Simulation D was designed to violate Assumption 1: most of the elements in the edge set $\mathcal{E}$ correspond to zero values in the correlation matrix $\Sigma$. Therefore, transelliptical GRASS does not perform well in Simulation D. Even in this unfavorable setting, however, Figure 1 indicates that GL($\widehat{S}^{\tau}$) and NS($\widehat{S}^{\tau}$) do not perform much better than transelliptical GRASS.

Not surprisingly, the original GRASS proposal (which involves thresholding $\widehat{\Sigma}$), GL($\widehat{\Sigma}$), and NS($\widehat{\Sigma}$) perform poorly, because they are designed for Gaussian data rather than nonparanormal data.

Finally, we compare the performance of the aforementioned methods on Gaussian data generated from Simulations A–D, with $n=50$ and $p=750$ . Figure 2 shows that the original GRASS proposal performs only slightly better than transelliptical GRASS on Gaussian data. This suggests that when the underlying distribution of the data is unknown, there is little cost (and potentially a large gain) associated with performing transelliptical GRASS instead of GRASS.

Overall, Figures 1 and 2 suggest that in these four settings, transelliptical GRASS performs competitively compared to some popular but computationally-intensive procedures for estimating a precision matrix.

Figure 1: For nonparanormally distributed data with $p=750$ and $n=50$, the true positive and false positive rates are shown. Curves are obtained by varying the tuning parameter for each method, and results are averaged over 20 simulated data sets.

5 Application to Equities Data

We examined the Yahoo! Finance stock price data, which are described in Liu et al. (2012b) and available in the huge package for R on CRAN. The data consist of 1258 daily closing prices for 452 stocks in the S&P 500 index between January 1, 2003 and January 1, 2008. The stocks are categorized into 10 Global Industry Classification Standard (GICS) sectors. Let $S_{ij}$ denote the closing price of the $j$th stock on the $i$th day. We construct a $1257\times 452$ data matrix $X$ such that $X_{ij}=\log\left(S_{(i+1)j}/S_{ij}\right)$ for $i=1,\dots,1257$ and $j=1,\dots,452$. We standardize each stock to have mean zero and standard deviation one, as in Tan et al. (2015).
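The construction of $X$ from the closing prices can be sketched as below (NumPy; the function name is hypothetical). We assume the sample standard deviation with divisor $n-1$, since the text does not specify the divisor. Because Kendall's tau depends only on ranks, this per-column standardization leaves $\widehat{S}^{\tau}$ unchanged; it matters only for methods based on $\widehat{\Sigma}$.

```python
import numpy as np

def standardized_log_returns(prices):
    """Map an (n_days x n_stocks) price matrix to standardized daily log returns.

    Row i of the intermediate matrix is log(S_{i+1, j} / S_{i, j}); each column
    is then centered and scaled to unit sample standard deviation."""
    prices = np.asarray(prices, dtype=float)
    X = np.log(prices[1:] / prices[:-1])
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
```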

We applied transelliptical GRASS with $\gamma_{n,jj^{\prime}}=0.5$, 0.6, and 0.7, and GL($\widehat{S}^{\tau}$) with $\lambda=0.5$, 0.6, and 0.7. Figure 3 indicates that in all estimated graphs, stocks from the same GICS sector tend to be highly connected, suggesting that both methods provide informative graph estimates. Furthermore, plots $(c)$, $(d)$, $(e)$ and $(f)$ of Figure 3 show that the graph estimates from transelliptical GRASS and GL($\widehat{S}^{\tau}$) are quite similar when $\gamma_{n,jj^{\prime}}=\lambda$. In fact, the arguments in Witten et al. (2011) can be used to show that when $\gamma_{n,jj^{\prime}}=\lambda$, the connected components of transelliptical GRASS and GL($\widehat{S}^{\tau}$) are identical.

6 Discussion

In this paper, we have proposed transelliptical GRASS, a simple and efficient procedure for recovering the structure of a high-dimensional transelliptical graphical model. Transelliptical GRASS is a natural extension of the GRASS proposal of Luo et al. (2015) to the non-normal setting, and it shares the attractive theoretical and computational properties of the original GRASS proposal. Our simulations suggest that it performs almost as well as methods that assume Gaussianity when the data are Gaussian, and much better than such methods when the data are non-Gaussian. Therefore, in general, there is little cost to applying transelliptical GRASS instead of the original GRASS proposal.

Appendix A Appendix

A.1 Proof of Theorem 1

Definition 4.

(Hoeffding, 1963, page 25) Let $W_{1},W_{2},\ldots,W_{n}$ be independent random variables. For $m\leq n$, a one-sample U-statistic takes the form

U=\frac{1}{n^{(m)}}\sum_{n,m}g\left(W_{i_{1}},\ldots,W_{i_{m}}\right),

(A.1)

where $n^{(m)}=n\left(n-1\right)\ldots\left(n-m+1\right)$ , and the sum $\sum_{n,m}$ is taken over all $m$ -tuples $i_{1},\ldots,i_{m}$ of distinct positive integers not exceeding $n$ .

Lemma 2.

(Hoeffding, 1963, page 25) If the function $g$ in (A.1) is bounded as $a\leq g\left(W_{i_{1}},\ldots,W_{i_{m}}\right)\leq b$, then for any $t>0$ and $m\leq n$, we have

\Pr\left\{\left|U-{\mathbb{E}}\left(U\right)\right|>t\right\}\leq 2\exp\left\{\frac{-2\lfloor n/m\rfloor t^{2}}{(b-a)^{2}}\right\}.

Recall from (2.3) that

\widehat{\tau}_{jj^{\prime}}=\frac{2}{n(n-1)}\sum_{1\leq i<i^{\prime}\leq n}\mathrm{sign}\left(\left(X_{ij}-X_{i^{\prime}j}\right)\left(X_{ij^{\prime}}-X_{i^{\prime}j^{\prime}}\right)\right).

Let $W_{i}=(X_{ij},X_{ij^{\prime}})$ , $i=1,\ldots,n$ . Then the sample estimator $\widehat{\tau}_{jj^{\prime}}$ is of the form (A.1) with $m=2$ and $g\left(W_{i},W_{i^{\prime}}\right)=\mathrm{sign}\left(\left(X_{ij}-X_{i^{\prime}j}\right)\left(X_{ij^{\prime}}-X_{i^{\prime}j^{\prime}}\right)\right)$ . Therefore, we can apply Lemma 2 with $m=2$ and $-1\leq g\left(W_{i},W_{i^{\prime}}\right)\leq 1$ , which yields

\Pr\left(\left|\widehat{\tau}_{jj^{\prime}}-\tau_{jj^{\prime}}\right|>\frac{2}{3\pi}C_{1}n^{-\kappa}\right)\leq 2\exp\left(-\frac{2\lfloor n/2\rfloor}{9\pi^{2}}C_{1}^{2}n^{-2\kappa}\right).

Next, we notice that

\begin{aligned}\Pr\left(\left|\widehat{S}^{\tau}_{jj^{\prime}}-\rho_{jj^{\prime}}\right|>\frac{1}{3}C_{1}n^{-\kappa}\right)&=\Pr\left(\left|\sin\left(\frac{\pi}{2}\widehat{\tau}_{jj^{\prime}}\right)-\sin\left(\frac{\pi}{2}\tau_{jj^{\prime}}\right)\right|>\frac{1}{3}C_{1}n^{-\kappa}\right)\\ &\leq\Pr\left(\left|\widehat{\tau}_{jj^{\prime}}-\tau_{jj^{\prime}}\right|>\frac{2}{3\pi}C_{1}n^{-\kappa}\right)\\ &\leq 2\exp\left(-\frac{2\lfloor n/2\rfloor}{9\pi^{2}}C_{1}^{2}n^{-2\kappa}\right).\end{aligned}

(A.2)

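The equality follows directly from Lemma 1 and the definition (2.4) of $\widehat{S}^{\tau}_{jj^{\prime}}$. The first inequality follows from the mean value theorem, since $\left|\sin(\frac{\pi}{2}a)-\sin(\frac{\pi}{2}b)\right|\leq\frac{\pi}{2}|a-b|$ for all real $a$ and $b$, and the second inequality is the Hoeffding bound displayed above. It follows that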

\begin{aligned}\Pr\left(\mathcal{E}\nsubseteq\widehat{\mathcal{E}}_{\gamma_{n}}\right)&=\Pr\left\{\bigcup_{(j,j^{\prime})\in\mathcal{E}}\left(\left|\widehat{S}^{\tau}_{jj^{\prime}}\right|\leq\gamma_{n}\right)\right\}\\ &\leq\sum_{(j,j^{\prime})\in\mathcal{E}}\Pr\left(\left|\widehat{S}^{\tau}_{jj^{\prime}}\right|\leq\frac{2}{3}C_{1}n^{-\kappa}\right)\\ &\leq\sum_{(j,j^{\prime})\in\mathcal{E}}\Pr\left(\left|\widehat{S}^{\tau}_{jj^{\prime}}-\rho_{jj^{\prime}}\right|\geq\frac{1}{3}C_{1}n^{-\kappa}\right)\\ &\leq 2p^{2}\exp\left(-\frac{2\lfloor n/2\rfloor}{9\pi^{2}}C_{1}^{2}n^{-2\kappa}\right).\end{aligned}

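Here, the first inequality follows from the union bound and the choice $\gamma_{n}=\frac{2}{3}C_{1}n^{-\kappa}$. The second follows from Assumption 1: for $(j,j^{\prime})\in\mathcal{E}$ we have $|\rho_{jj^{\prime}}|\geq C_{1}n^{-\kappa}$, so $|\widehat{S}^{\tau}_{jj^{\prime}}|\leq\frac{2}{3}C_{1}n^{-\kappa}$ forces $|\widehat{S}^{\tau}_{jj^{\prime}}-\rho_{jj^{\prime}}|\geq\frac{1}{3}C_{1}n^{-\kappa}$. The third follows from (A.2) together with the fact that $|\mathcal{E}|\leq p^{2}$.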
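The mean value theorem step in (A.2) uses only the fact that $x\mapsto\sin(\pi x/2)$ is Lipschitz on $[-1,1]$ with constant $\pi/2$. The following Python snippet is a quick numerical sanity check of that constant on a grid. It is an illustration, not a proof, and the function names are hypothetical.

```python
import math


def latent_corr(tau):
    """Map Kendall's tau to the correlation scale, rho = sin(pi * tau / 2)."""
    return math.sin(math.pi * tau / 2)


def max_difference_quotient(grid):
    """Largest |sin(pi a/2) - sin(pi b/2)| / |a - b| over distinct grid points."""
    return max(
        abs(latent_corr(a) - latent_corr(b)) / abs(a - b)
        for a in grid
        for b in grid
        if a != b
    )


grid = [i / 50 - 1 for i in range(101)]  # 101 equally spaced points in [-1, 1]
worst = max_difference_quotient(grid)
# By the mean value theorem worst <= pi/2; the slope is steepest near tau = 0.
```

Because the difference quotient never exceeds $\pi/2$, the event $|\widehat{S}^{\tau}_{jj^{\prime}}-\rho_{jj^{\prime}}|>\frac{1}{3}C_{1}n^{-\kappa}$ can occur only if $|\widehat{\tau}_{jj^{\prime}}-\tau_{jj^{\prime}}|>\frac{2}{3\pi}C_{1}n^{-\kappa}$.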

Therefore, we have shown that

\Pr(\mathcal{E}\subseteq\widehat{\mathcal{E}}_{\gamma_{n}})\geq 1-2p^{2}\cdot\exp\left(-\frac{2\lfloor n/2\rfloor}{9\pi^{2}}C_{1}^{2}n^{-2\kappa}\right).
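Since the bound is explicit, it can be evaluated directly. It is informative only once $n^{1-2\kappa}$ is large relative to $\log p$. Below is a Python sketch. The values of $C_{1}$, $\kappa$ and $p$ used with it are hypothetical and serve only to show how the bound behaves.

```python
import math


def sure_screening_lower_bound(n, p, C1, kappa):
    """Lower bound 1 - 2 p^2 exp(-2 floor(n/2) C1^2 n^(-2 kappa) / (9 pi^2))
    on Pr(E is contained in E_hat). A value <= 0 means the bound is vacuous."""
    exponent = 2 * (n // 2) * C1**2 * n ** (-2 * kappa) / (9 * math.pi**2)
    return 1.0 - 2.0 * p**2 * math.exp(-exponent)


# Hypothetical constants: p = 1000, C1 = 1, kappa = 0.1.
for n in (1_000, 10_000, 100_000):
    print(n, sure_screening_lower_bound(n, p=1_000, C1=1.0, kappa=0.1))
```

For small $n$ the bound is negative, and it approaches one as $n$ grows.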

A similar argument can be used to establish that

\Pr\left(\mathcal{E}_{j}\subseteq\widehat{\mathcal{E}}_{j,\gamma_{n}}\right)\geq 1-2p^{2}\cdot\exp\left(-\frac{2\lfloor n/2\rfloor}{9\pi^{2}}C_{1}^{2}n^{-2\kappa}\right).

Conversely, suppose that $\min_{(j,j^{\prime})\in\mathcal{E}}|\rho_{jj^{\prime}}|<C_{1}n^{-\kappa}/3$. Then there exists $(j^{\star},{j^{\prime}}^{\star})\in\mathcal{E}$ such that $|\rho_{j^{\star}{j^{\prime}}^{\star}}|<C_{1}n^{-\kappa}/3$. If $(j^{\star},{j^{\prime}}^{\star})\in\widehat{\mathcal{E}}_{\gamma_{n}}$, then $|\widehat{S}^{\tau}_{j^{\star}{j^{\prime}}^{\star}}|>\frac{2}{3}C_{1}n^{-\kappa}$, and therefore $|\widehat{S}^{\tau}_{j^{\star}{j^{\prime}}^{\star}}-\rho_{j^{\star}{j^{\prime}}^{\star}}|>C_{1}n^{-\kappa}/3$. This, together with (A.2), implies that

\Pr(\mathcal{E}\subseteq\widehat{\mathcal{E}}_{\gamma_{n}})\leq\Pr\left\{(j^{\star},{j^{\prime}}^{\star})\in\widehat{\mathcal{E}}_{\gamma_{n}}\right\}\leq\Pr\left(\left|\widehat{S}^{\tau}_{j^{\star}{j^{\prime}}^{\star}}-\rho_{j^{\star}{j^{\prime}}^{\star}}\right|\geq C_{1}n^{-\kappa}/3\right)\leq 2\exp\left(-\frac{2\lfloor n/2\rfloor}{9\pi^{2}}C_{1}^{2}n^{-2\kappa}\right),

so that the result holds.

A.2 Proof of Corollary 1

Proof.

It suffices to show that, with high probability, transelliptical GRASS does not produce an edge between $x_{j}^{(s)}$ and $x_{j^{\prime}}^{(t)}$ for any $s\neq t$.

This is the case whenever the event $\left\{\left|\widehat{S}^{\tau}_{jj^{\prime}}-\rho_{jj^{\prime}}\right|\leq\frac{1}{3}C_{1}n^{-\kappa}\right\}$ holds for all $j\neq j^{\prime}$. As in the proof of Theorem 1, a union bound over all pairs $j\neq j^{\prime}$ combined with (A.2) gives $\Pr\left[\bigcap_{j\neq j^{\prime}}\left\{\left|\widehat{S}^{\tau}_{jj^{\prime}}-\rho_{jj^{\prime}}\right|\leq\frac{1}{3}C_{1}n^{-\kappa}\right\}\right]\geq 1-C_{4}\exp\left(-C_{5}n^{1-2\kappa}\right)$. ∎

A.3 Proof of Theorem 2

Let

L_{j}=\left\{j^{\prime}:j^{\prime}\neq j,\left|\rho_{jj^{\prime}}\right|\geq\frac{1}{3}C_{1}n^{-\kappa}\right\}

and

\Gamma_{j}=\bigcap_{j^{\prime}:j^{\prime}\neq j}\left\{\left|\widehat{S}^{\tau}_{jj^{\prime}}-\rho_{jj^{\prime}}\right|\leq\frac{1}{3}C_{1}n^{-\kappa}\right\}.

By definition, $\widehat{\mathcal{E}}_{j,\gamma_{n}}=\left\{j^{\prime}:j^{\prime}\neq j,\left|\widehat{S}^{\tau}_{jj^{\prime}}\right|>\frac{2}{3}C_{1}n^{-\kappa}\right\}$. On the set $\Gamma_{j}$, every $j^{\prime}\in\widehat{\mathcal{E}}_{j,\gamma_{n}}$ satisfies $\left|\rho_{jj^{\prime}}\right|\geq\left|\widehat{S}^{\tau}_{jj^{\prime}}\right|-\left|\widehat{S}^{\tau}_{jj^{\prime}}-\rho_{jj^{\prime}}\right|>\frac{2}{3}C_{1}n^{-\kappa}-\frac{1}{3}C_{1}n^{-\kappa}=\frac{1}{3}C_{1}n^{-\kappa}$, and hence belongs to $L_{j}$. Thus, we conclude that $\Pr\left(\widehat{\mathcal{E}}_{j,\gamma_{n}}\subseteq L_{j}\right)\geq\Pr\left(\Gamma_{j}\right)$. An argument similar to that in the proof of Theorem 1 can be used to show that

\Pr\left(\Gamma_{j}\right)\geq 1-C_{4}\exp\left(-C_{5}n^{1-2\kappa}\right).

This implies that

\Pr\left(\widehat{\mathcal{E}}_{j,\gamma_{n}}\subseteq L_{j}\right)\geq 1-C_{4}\exp\left(-C_{5}n^{1-2\kappa}\right).

(A.3)
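The inclusion argument can be illustrated with a small Python sketch that uses a hypothetical toy row. Here $c$ stands for $\frac{1}{3}C_{1}n^{-\kappa}$, so the screening threshold is $2c$. The numbers are invented for illustration, and the function names are hypothetical.

```python
def screened_neighbors(S_row, j, threshold):
    """E_hat_{j, gamma_n}: indices j' != j with |S_hat_{jj'}| > threshold."""
    return {k for k, s in enumerate(S_row) if k != j and abs(s) > threshold}


def strong_correlations(rho_row, j, c):
    """L_j: indices j' != j with |rho_{jj'}| >= c."""
    return {k for k, r in enumerate(rho_row) if k != j and abs(r) >= c}


def on_gamma_j(S_row, rho_row, j, c):
    """Event Gamma_j: |S_hat_{jj'} - rho_{jj'}| <= c for every j' != j."""
    return all(abs(s - r) <= c for k, (s, r) in enumerate(zip(S_row, rho_row)) if k != j)


# Toy row j = 0 with c = C1 n^{-kappa} / 3 = 0.1, so gamma_n = 2c = 0.2 (hypothetical numbers).
c = 0.1
rho_row = [1.0, 0.05, 0.15, 0.30]
S_row = [1.0, 0.14, 0.22, 0.25]
```

In this example $\Gamma_{0}$ holds, and both sets equal $\{2,3\}$. Node $1$ has $|\widehat{S}^{\tau}_{01}|=0.14$, which is below the threshold, so it is screened out, consistent with $|\rho_{01}|<c$.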

Define $D=\sum_{j^{\prime}\in L_{j}}\rho_{jj^{\prime}}^{2}$ . Then, by the definition of $L_{j}$ , it follows that

D\geq\frac{1}{9}C_{1}^{2}\left|L_{j}\right|n^{-2\kappa}.

(A.4)

Furthermore,

D\leq\sum_{j^{\prime}=1}^{p}\rho_{jj^{\prime}}^{2}=\left\|\Sigma e_{j}\right\|_{2}^{2}\leq\Lambda_{\max}\left(\Sigma\right)e_{j}^{T}\Sigma e_{j}=\Lambda_{\max}\left(\Sigma\right),

(A.5)

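where $e_{j}$ is the unit vector with a one in the $j$th element and zeros elsewhere. The second inequality holds because $\Sigma$ is positive semidefinite, so that $\Sigma^{2}\preceq\Lambda_{\max}(\Sigma)\Sigma$, and the last equality holds because the diagonal elements of $\Sigma$ are equal to $1$.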

Combining (A.4) and (A.5), we have that

|L_{j}|\leq 9C_{1}^{-2}n^{2\kappa}\Lambda_{\max}\left(\Sigma\right).

This, in conjunction with Assumption 2 and (A.3), completes the proof of Theorem 2.
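The counting argument behind (A.4), (A.5) and the resulting bound can be checked numerically. The Python sketch below is an illustration only. It uses an AR(1) correlation matrix as a stand-in for $\Sigma$ (a hypothetical choice). It writes $c=\frac{1}{3}C_{1}n^{-\kappa}$, so that the bound reads $|L_{j}|\leq\Lambda_{\max}(\Sigma)/c^{2}$, and it estimates $\Lambda_{\max}(\Sigma)$ by power iteration.

```python
import math


def ar1_corr(p, r):
    """Hypothetical example: AR(1) correlation matrix with entries r^{|j-k|}."""
    return [[r ** abs(j - k) for k in range(p)] for j in range(p)]


def largest_eigenvalue(S, iters=200):
    """Power iteration followed by a Rayleigh quotient. For a symmetric PSD matrix the
    Rayleigh quotient never exceeds the true Lambda_max, so this is a lower estimate."""
    p = len(S)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(S[i][k] * v[k] for k in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Sv = [sum(S[i][k] * v[k] for k in range(p)) for i in range(p)]
    return sum(a * b for a, b in zip(v, Sv))


def row_bound_terms(S, j, c):
    """Return |L_j|, D = sum over L_j of rho_{jj'}^2, and ||Sigma e_j||_2^2."""
    row = S[j]
    L_j = [k for k in range(len(row)) if k != j and abs(row[k]) >= c]
    D = sum(row[k] ** 2 for k in L_j)
    row_norm_sq = sum(x * x for x in row)  # includes the diagonal entry 1
    return len(L_j), D, row_norm_sq


Sigma = ar1_corr(50, 0.6)
c = 0.1  # plays the role of C1 n^{-kappa} / 3
size, D, row_norm_sq = row_bound_terms(Sigma, 25, c)
lam = largest_eigenvalue(Sigma)
```

The chain $c^{2}|L_{j}|\leq D\leq\|\Sigma e_{j}\|_{2}^{2}\leq\Lambda_{\max}(\Sigma)$ is exactly (A.4) followed by (A.5).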

A.4 Proof of Theorem 3

First, we verify the sure screening property (Section A.4.1). We then establish the control of the asymptotic expected false positive rate (Section A.4.2).

A.4.1 Verification of Sure Screening Property

To show that the sure screening property holds, it is enough to show that

\gamma_{n,jj^{\prime}}=\frac{\pi}{2}\widehat{\omega}_{jj^{\prime}}\Phi^{-1}\left(1-\frac{f}{p(p-1)}\right)/\sqrt{n}\leq\frac{2}{3}C_{1}n^{-\kappa}.

(A.6)

In other words, we must show that

\frac{f}{p(p-1)}\geq 1-\Phi\left(\frac{4}{3\pi\widehat{\omega}_{jj^{\prime}}}C_{1}n^{\frac{1}{2}-\kappa}\right).

(A.7)
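The threshold in (A.6) and the equivalent condition (A.7) can both be computed directly. The following Python sketch treats $\widehat{\omega}_{jj^{\prime}}$ from (3.4) as a given number and does not implement (3.4). The function names and all parameter values are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def gamma_threshold(omega_hat, n, p, f):
    """gamma_{n,jj'} = (pi/2) * omega_hat * Phi^{-1}(1 - f / (p(p-1))) / sqrt(n)."""
    q = f / (p * (p - 1))
    # -Phi^{-1}(q) equals Phi^{-1}(1 - q) and avoids rounding 1 - q to 1 for tiny q
    return (math.pi / 2) * omega_hat * (-_STD_NORMAL.inv_cdf(q)) / math.sqrt(n)


def condition_A6(omega_hat, n, p, f, C1, kappa):
    """Check gamma_{n,jj'} <= (2/3) C1 n^{-kappa}."""
    return gamma_threshold(omega_hat, n, p, f) <= (2 / 3) * C1 * n ** (-kappa)


def condition_A7(omega_hat, n, p, f, C1, kappa):
    """Check f / (p(p-1)) >= 1 - Phi(4 C1 n^{1/2 - kappa} / (3 pi omega_hat))."""
    x = 4 / (3 * math.pi * omega_hat) * C1 * n ** (0.5 - kappa)
    upper_tail = 0.5 * math.erfc(x / math.sqrt(2))  # 1 - Phi(x)
    return f / (p * (p - 1)) >= upper_tail
```

Since $\Phi$ is strictly increasing, the two checks agree for every input, apart from floating-point ties.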

Recall that $1-\Phi\left(x\right)\leq\frac{1}{\sqrt{2\pi}}x^{-1}\exp\left(-x^{2}/2\right)$ for $x>0$, which implies that

1-\Phi\left(\frac{4}{3\pi\widehat{\omega}_{jj^{\prime}}}C_{1}n^{\frac{1}{2}-\kappa}\right)\leq C_{8}n^{-\frac{1}{2}+\kappa}\exp\left(-C_{9}n^{1-2\kappa}\right).

(A.8)
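The inequality used for (A.8) is the standard Gaussian (Mills-ratio) tail bound. Substituting $x=\frac{4}{3\pi\widehat{\omega}_{jj^{\prime}}}C_{1}n^{\frac{1}{2}-\kappa}$ produces the $n^{-\frac{1}{2}+\kappa}$ prefactor and the $\exp(-C_{9}n^{1-2\kappa})$ factor. The Python check below, with hypothetical function names, shows that the bound holds and becomes tight as $x$ grows.

```python
import math


def upper_tail(x):
    """1 - Phi(x), computed with erfc to avoid cancellation for large x."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def mills_bound(x):
    """The bound (2 pi)^{-1/2} x^{-1} exp(-x^2 / 2), valid for x > 0."""
    if x <= 0:
        raise ValueError("the bound requires x > 0")
    return math.exp(-x * x / 2) / (math.sqrt(2 * math.pi) * x)


xs = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
ratios = [upper_tail(x) / mills_bound(x) for x in xs]  # each ratio is at most 1
```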

Since $\log\left(p\right)=C_{3}n^{\xi}$, we have $p(p-1)\leq p^{2}=\exp\left(2C_{3}n^{\xi}\right)$, and therefore

f/\left\{p(p-1)\right\}\geq C_{10}\exp\left(-C_{11}n^{\xi}\right).

(A.9)

Since $\xi<1-2\kappa$, the right-hand side of (A.8) decays faster than the right-hand side of (A.9). Combining (A.8) and (A.9) therefore yields (A.7) for all sufficiently large $n$.

A.4.2 Control of the Asymptotic Expected False Positive Rate

Next, we show that the choice of $\gamma_{n,jj^{\prime}}$ given in the statement of Theorem 3 controls the asymptotic expected false positive rate at level $f/|\mathcal{E}^{c}|$. We use the following lemma.

Lemma 3.

Consider two random variables $X_{j}$ and $X_{j^{\prime}}$, observed jointly on $n$ i.i.d. samples. Then

\frac{\sqrt{n}(\widehat{\tau}_{jj^{\prime}}-\tau_{jj^{\prime}})}{\widehat{\omega}_{jj^{\prime}}}\stackrel{{\scriptstyle d}}{{\longrightarrow}}N(0,1),

(A.10)

where $\widehat{\tau}_{jj^{\prime}}$ is defined in (2.3) and $\widehat{\omega}_{jj^{\prime}}^{2}$ is defined in (3.4).

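Lemma 3 follows from Theorem 6 in Arvesen (1969) in conjunction with Slutsky's theorem. Recall that $\widehat{S}^{\tau}_{jj^{\prime}}=\sin\left(\frac{\pi}{2}\widehat{\tau}_{jj^{\prime}}\right)$. The derivative of $x\mapsto\sin\left(\frac{\pi}{2}x\right)$ at $\tau_{jj^{\prime}}$ equals $\frac{\pi}{2}\cos\left(\frac{\pi}{2}\tau_{jj^{\prime}}\right)=\frac{\pi}{2}\sqrt{1-\rho_{jj^{\prime}}^{2}}$, where the cosine is nonnegative because $|\tau_{jj^{\prime}}|\leq 1$. An application of the delta method therefore gives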

\frac{\sqrt{n}\left(\widehat{S}_{jj^{\prime}}^{\tau}-\rho_{jj^{\prime}}\right)}{\frac{\pi}{2}\widehat{\omega}_{jj^{\prime}}\sqrt{\left(1-\rho^{2}_{jj^{\prime}}\right)}}\stackrel{{\scriptstyle d}}{{\longrightarrow}}N(0,1).

(A.11)
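The scaling in (A.11) depends on the identity $\frac{d}{dx}\sin\left(\frac{\pi}{2}x\right)\big|_{x=\tau}=\frac{\pi}{2}\sqrt{1-\rho^{2}}$ with $\rho=\sin\left(\frac{\pi}{2}\tau\right)$. The following Python snippet, with hypothetical function names, checks this identity against a central finite difference.

```python
import math


def delta_method_scale(tau):
    """(pi/2) * sqrt(1 - rho^2) with rho = sin(pi * tau / 2): the derivative of
    x -> sin(pi x / 2) at tau, rewritten in terms of rho."""
    rho = math.sin(math.pi * tau / 2)
    return (math.pi / 2) * math.sqrt(max(0.0, 1.0 - rho**2))


def central_difference(tau, h=1e-6):
    """Numerical derivative of sin(pi x / 2) at x = tau."""

    def g(x):
        return math.sin(math.pi * x / 2)

    return (g(tau + h) - g(tau - h)) / (2 * h)
```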

Therefore, for any $(j,j^{\prime})\notin\mathcal{E}$ , we have

\begin{aligned}
\Pr\left(\left|\widehat{S}_{jj^{\prime}}^{\tau}\right|>\gamma_{n,jj^{\prime}}\right) &= \Pr\left(\frac{\sqrt{n}\left(\widehat{S}_{jj^{\prime}}^{\tau}-\rho_{jj^{\prime}}\right)}{\frac{\pi\widehat{\omega}_{jj^{\prime}}}{2}\sqrt{1-\rho^{2}_{jj^{\prime}}}}>\frac{\sqrt{n}\left(\gamma_{n,jj^{\prime}}-\rho_{jj^{\prime}}\right)}{\frac{\pi\widehat{\omega}_{jj^{\prime}}}{2}\sqrt{1-\rho^{2}_{jj^{\prime}}}}\right)\\
&\quad+\Pr\left(\frac{\sqrt{n}\left(\widehat{S}_{jj^{\prime}}^{\tau}-\rho_{jj^{\prime}}\right)}{\frac{\pi\widehat{\omega}_{jj^{\prime}}}{2}\sqrt{1-\rho^{2}_{jj^{\prime}}}}<-\frac{\sqrt{n}\left(\gamma_{n,jj^{\prime}}+\rho_{jj^{\prime}}\right)}{\frac{\pi\widehat{\omega}_{jj^{\prime}}}{2}\sqrt{1-\rho^{2}_{jj^{\prime}}}}\right)\\
&\to 1-\Phi\left(\Phi^{-1}\left(1-\frac{f}{p(p-1)}\right)\right)+1-\Phi\left(\Phi^{-1}\left(1-\frac{f}{p(p-1)}\right)\right)\\
&= \frac{2f}{p(p-1)},
\end{aligned}

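where the convergence follows from (A.11) and Assumption 3. Indeed, $\sqrt{n}\gamma_{n,jj^{\prime}}=\frac{\pi}{2}\widehat{\omega}_{jj^{\prime}}\Phi^{-1}\left(1-\frac{f}{p(p-1)}\right)$ is of the same order as $n^{\frac{\xi}{2}}$, because $\Phi^{-1}(1-x)\sim\sqrt{2\log(1/x)}$ as $x\to 0$ and $\log(p)=C_{3}n^{\xi}$. In contrast, $\sqrt{n}\rho_{jj^{\prime}}=o(n^{\frac{\xi}{2}})$ by Assumption 3. Hence $\sqrt{n}\rho_{jj^{\prime}}$ is negligible relative to $\sqrt{n}\gamma_{n,jj^{\prime}}$, and $\rho_{jj^{\prime}}\to 0$, so that $\sqrt{1-\rho^{2}_{jj^{\prime}}}\to 1$.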
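The order claim for $\sqrt{n}\gamma_{n,jj^{\prime}}$ rests on the asymptotic $\Phi^{-1}(1-q)\sim\sqrt{2\log(1/q)}$ as $q\to 0$. With $q=f/\{p(p-1)\}$ and $\log p=C_{3}n^{\xi}$, we have $\log(1/q)\approx 2C_{3}n^{\xi}$. The Python sketch below, with hypothetical names, shows that the ratio of the two quantities stays below one and increases slowly toward one.

```python
import math
from statistics import NormalDist


def upper_quantile(q):
    """Phi^{-1}(1 - q), computed as -Phi^{-1}(q) to stay accurate for tiny q."""
    return -NormalDist().inv_cdf(q)


def ratio_to_sqrt_log(q):
    """Phi^{-1}(1 - q) / sqrt(2 log(1/q)), which tends to 1 as q -> 0."""
    return upper_quantile(q) / math.sqrt(2 * math.log(1 / q))


qs = [1e-4, 1e-8, 1e-16, 1e-32]
ratios = [ratio_to_sqrt_log(q) for q in qs]
```

The slow convergence is why the argument compares orders of magnitude ($n^{\xi/2}$) rather than exact constants.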

Consequently, the expected $FPR$ is controlled as desired:

\begin{aligned}
\mathbb{E}(FPR) &= \frac{1}{|\mathcal{E}^{c}|}\sum_{(j,j^{\prime})\notin\mathcal{E}}\Pr\left(\left|\widehat{S}_{jj^{\prime}}^{\tau}\right|>\gamma_{n,jj^{\prime}}\right)\\
&\to \frac{1}{|\mathcal{E}^{c}|}\sum_{(j,j^{\prime})\notin\mathcal{E}}\frac{2f}{p(p-1)}=\frac{2f}{p(p-1)}\leq\frac{f}{|\mathcal{E}^{c}|},
\end{aligned}

where the last inequality results from the fact that $|\mathcal{E}^{c}|\leq\frac{p(p-1)}{2}$ .
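The final averaging step is simple arithmetic. The following Python sketch, with hypothetical function names and values, shows that the limit $2f/\{p(p-1)\}$ is at most $f/|\mathcal{E}^{c}|$ precisely because $|\mathcal{E}^{c}|\leq p(p-1)/2$. Equality holds when the true graph has no edges.

```python
import math


def expected_fpr_limit(f, p, n_null):
    """Average over n_null = |E^c| null pairs of the common per-pair limit 2f / (p(p-1))."""
    per_pair = 2 * f / (p * (p - 1))
    return math.fsum([per_pair] * n_null) / n_null


def fpr_target(f, n_null):
    """Nominal level f / |E^c| from Theorem 3."""
    return f / n_null
```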

References

Anandkumar et al., (2012) Anandkumar, A., Tan, V. Y. F., Huang, F., and Willsky, A. S. (2012). High-dimensional Structure Estimation in Ising Models: Local Separation Criterion. The Annals of Statistics, 40:1346–1375.
Arvesen, (1969) Arvesen, J. N. (1969). Jackknifing U-Statistics. The Annals of Mathematical Statistics, 40:2076–2100.
Callaert and Veraverbeke, (1981) Callaert, H. and Veraverbeke, N. (1981). The Order of the Normal Approximation for a Studentized U-Statistic. The Annals of Statistics, 9:194–200.
Chen et al., (2015) Chen, S., Witten, D., and Shojaie, A. (2015). Selection and estimation for mixed graphical models. Biometrika, 102(1):47–64.
Fligner and Rust, (1983) Fligner, M. A. and Rust, S. W. (1983). On the independence problem and Kendall’s tau. Communications in Statistics - Theory and Methods, 12:1597–1607.
Friedman et al., (2008) Friedman, J., Hastie, T. J., and Tibshirani, R. J. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9:432–441.
Hoeffding, (1963) Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58:13–30.
Lee, (1985) Lee, A. J. (1985). On estimating the variance of a U-statistic. Communications in Statistics - Theory and Methods, 14:289–302.
Liu et al., (2012a) Liu, H., Han, F., Yuan, M., Lafferty, J., and Wasserman, L. (2012a). High-dimensional semiparametric Gaussian copula graphical models. The Annals of Statistics, 40:2293–2326.
Liu et al., (2012b) Liu, H., Han, F., and Zhang, C. (2012b). Transelliptical graphical models. In Advances in Neural Information Processing Systems 25, pages 809–817.
Liu et al., (2009) Liu, H., Lafferty, J., and Wasserman, L. (2009). The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. Journal of Machine Learning Research, 10:2295–2328.
Luo et al., (2015) Luo, S., Shi, C., Song, R., Xie, Y., and Witten, D. (2015). Sure Screening for Gaussian Graphical Models. To appear.
Meinshausen and Bühlmann, (2006) Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34:1436–1462.
Ravikumar et al., (2009) Ravikumar, P., Lafferty, J., Liu, H., and Wasserman, L. (2009). Sparse additive models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71:1009–1030.
Ravikumar et al., (2010) Ravikumar, P., Wainwright, M. J., and Lafferty, J. D. (2010). High-dimensional Ising Model Selection Using L1-Regularized Logistic Regression. The Annals of Statistics, 38:1287–1319.
Rothman et al., (2008) Rothman, A. J., Bickel, P. J., Levina, E., and Zhu, J. (2008). Sparse permutation invariant covariance estimation. Electronic Journal of Statistics, 2:494–515.
Sen, (1977) Sen, P. K. (1977). Some Invariance Principles Relating to Jackknifing and Their Role in Sequential Analysis. The Annals of Statistics, 5:316–329.
Tan et al., (2015) Tan, K. M., Witten, D., and Shojaie, A. (2015). The cluster graphical lasso for improved estimation of Gaussian graphical models. Computational Statistics and Data Analysis, 85:23–36.
Witten et al., (2011) Witten, D. M., Friedman, J. H., and Simon, N. (2011). New insights and faster computations for the graphical lasso. Journal of Computational and Graphical Statistics, 20(4):892–900.
Yang et al., (2012) Yang, E., Allen, G., Liu, Z., and Ravikumar, P. (2012). Graphical models via generalized linear models. In Advances in Neural Information Processing Systems 25, pages 1367–1375.
Yang et al., (2014) Yang, E., Ravikumar, P., Allen, G., Baker, Y., Wan, Y., and Liu, Z. (2014). A General Framework for Mixed Graphical Models. arXiv:1411.0288.
Yuan and Lin, (2007) Yuan, M. and Lin, Y. (2007). Model selection and estimation in the Gaussian graphical model. Biometrika, 94:19–35.