
Reference-Invariant Inverse Covariance Estimation with Application to Microbial Network Recovery

Chuan Tian, Department of Statistics, Oregon State University
Duo Jiang, Department of Statistics, Oregon State University
Yuan Jiang, Department of Statistics, Oregon State University

Yuan Jiang is the corresponding author. This research is supported in part by National Institutes of Health grant R01 GM126549.
Abstract

The interactions between microbial taxa in microbiome data have been of great research interest in the science community. In particular, several methods, such as SPIEC-EASI, gCoda, and CD-trace, have been proposed to model the conditional dependency between microbial taxa, in order to eliminate the detection of spurious correlations. However, all of these methods are built upon the central log-ratio (clr) transformation, which results in a degenerate covariance matrix and thus an undefined inverse covariance matrix as the estimate of the underlying network. Jiang et al. (2021) and Tian et al. (2022) proposed bias-corrected graphical lasso and compositional graphical lasso, respectively, based on the additive log-ratio (alr) transformation, which first selects a reference taxon and then computes the log ratios of the abundances of all other taxa with respect to that of the reference. One concern about the alr transformation is whether the estimated network is invariant with respect to the choice of reference. In this paper, we first establish the reference-invariance property of a subnetwork of interest based on the alr transformed data. Then, we propose a reference-invariant version of the compositional graphical lasso by modifying the penalty in its objective function, penalizing only the invariant subnetwork. We validate the reference-invariance property of the proposed method under a variety of simulation scenarios as well as through an application to an oceanic microbiome data set.

1 Introduction

The microbiome, the collection of micro-organisms in an ecological system, is of great research interest in the science community and has been shown to play an important role in influencing its host or living environment. For instance, it has been found that intervening in the gut microbiota of the African turquoise killifish can delay its aging process (Smith et al., 2017). The advancement of high-throughput sequencing technologies has enabled researchers to analyze microbial compositions in uncultivated samples; for example, 16S rRNA profiling replicates a specific marker-gene sequence, which is counted and serves as a proxy for the abundance of an Operational Taxonomic Unit (OTU, a surrogate for a bacterial species) in a sample.

One common goal in microbiome data analysis is to understand how microbes interact with each other. However, the nature of microbiome data, determined by the technicalities of the sequencing procedures, imposes various challenges in recovering microbial interactions. First, the data are composed of discrete counts. Second, microbiome data are entangled with "compositionality," a technicality imposed by the sequencing procedures. For instance, in 16S rRNA profiling, the "sequencing depth," i.e., the total count of sequences in a sample, is pre-determined on the sequencing instrument and usually differs from sample to sample. This implies that the counts for each OTU in a sample carry information only about relative abundances rather than absolute abundances. Third, microbiome data are high-dimensional in nature. With resolution at the OTU level, the number of OTU's is likely far larger than the number of samples in a biological experiment.

After one obtains the abundance data for the microbial species, marginal correlation analysis can be used to infer the interactions among microbes (Faust and Raes, 2012). Over the years, several methods have been developed to address the compositionality issue in the construction of correlation networks for microbiome data, such as SparCC (Friedman and Alm, 2012), CCLasso (Fang et al., 2015), and REBACCA (Ban et al., 2015). All of these methods aim to construct a covariance (correlation) matrix or network of the unknown absolute abundances (the positive, unconstrained true abundances of the taxa). The high-dimensionality of microbiome data is tackled by imposing a sparsity constraint in all three methods, via a correlation threshold for SparCC and an $L_{1}$-norm penalty for CCLasso and REBACCA.

All of the above methods are built upon the marginal correlations between two microbial taxa, and they can lead to spurious correlations caused by confounding factors such as other taxa in the same community. Alternatively, interactions among taxa can be modeled through their conditional dependencies given the other taxa, which eliminates the detection of spurious correlations. SPIEC-EASI was probably the first method proposed to estimate a sparse microbial network based on conditional dependency (Kurtz et al., 2015). It first performs a central log-ratio (clr) transformation on the observed counts (Aitchison, 1986) and then applies graphical lasso (Yuan and Lin, 2007; Banerjee et al., 2008; Friedman et al., 2008) to estimate the inverse covariance matrix of the transformed data. More recently, gCoda and CD-trace were developed to improve SPIEC-EASI by accounting for the compositionality of microbiome data (Fang et al., 2017; Yuan et al., 2019), and both have been shown to recover the sparse microbial network better than SPIEC-EASI.

It is worth noting that SPIEC-EASI, gCoda, and CD-trace are all built upon the clr transformation of the observed counts. Meanwhile, Jiang et al. (2021) and Tian et al. (2022) proposed bias-corrected graphical lasso and compositional graphical lasso, respectively, based on the additive log-ratio (alr) transformed data. In the alr transformation, one needs to select a reference taxon and compute the log relative abundances of all other taxa with respect to the reference. One of the major concerns for the alr transformation is the robustness, or invariance, of the resulting method with respect to the choice of the reference taxon, which has not been well studied in the literature.

In this paper, we first establish the reference-invariance property of estimating the sparse microbial network based on the alr transformed data. We show that the submatrix of the inverse covariance matrix corresponding to the non-candidate-reference taxa is invariant with respect to the choice of the reference. Then, we propose a reference-invariant version of the compositional graphical lasso by modifying the penalty in its objective function to penalize only the invariant submatrix mentioned above. Additionally, we illustrate the reference-invariance property of the proposed method under a variety of simulation scenarios and demonstrate its applicability and advantages through an application to an oceanic microbiome data set.

2 Methodology

2.1 Reference-Invariance Property

Let $\mathbf{p}=(p_{1},\ldots,p_{K+2})'$ denote a vector of compositional probabilities satisfying $p_{1}+\cdots+p_{K+2}=1$. The additive log-ratio (alr) transformation picks an entry of this vector as the reference and transforms the compositional vector using the log ratios of each entry to the reference. Without loss of generality, suppose we pick the last entry as the reference; then the alr transformed vector becomes

$$\mathbf{z}=\left[\log\left(\frac{p_{1}}{p_{K+2}}\right),\ldots,\log\left(\frac{p_{K}}{p_{K+2}}\right),\log\left(\frac{p_{K+1}}{p_{K+2}}\right)\right]'.$$

The transformed vector $\mathbf{z}$ is often assumed to follow a multivariate continuous distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$; for example, $\mathbf{z}\sim N(\boldsymbol{\mu},\boldsymbol{\Sigma})$. Denote further the inverse covariance matrix $\boldsymbol{\Omega}=\boldsymbol{\Sigma}^{-1}$.

Similarly, if we pick another entry as the reference, we can define another alr transformed vector. For simplicity of illustration, suppose we choose the second-to-last entry as the reference and consider the following alr transformation

$$\mathbf{z}_{p}=\left[\log\left(\frac{p_{1}}{p_{K+1}}\right),\ldots,\log\left(\frac{p_{K}}{p_{K+1}}\right),\log\left(\frac{p_{K+2}}{p_{K+1}}\right)\right]',$$

where the subscript $p$ denotes the "permuted" version of $\mathbf{z}$. Similarly, define the mean vector of $\mathbf{z}_{p}$ by $\boldsymbol{\mu}_{p}$, the covariance matrix by $\boldsymbol{\Sigma}_{p}$, and the inverse covariance matrix by $\boldsymbol{\Omega}_{p}=\boldsymbol{\Sigma}_{p}^{-1}$.

A simple derivation implies that $\mathbf{z}_{p}$ is a linear transformation of $\mathbf{z}$, namely $\mathbf{z}_{p}=\mathbf{Q}_{p}\mathbf{z}$, where

$$\mathbf{Q}_{p}=\begin{pmatrix}\mathbf{I}_{K}&-\mathbf{1}\\ \mathbf{0}'&-1\end{pmatrix}$$

with $\mathbf{I}_{K}$ denoting the identity matrix, and $\mathbf{0}$ and $-\mathbf{1}$ denoting the column vectors of all $0$'s and all $-1$'s, respectively. It follows that $\boldsymbol{\mu}_{p}=\mathbf{Q}_{p}\boldsymbol{\mu}$, $\boldsymbol{\Sigma}_{p}=\mathbf{Q}_{p}\boldsymbol{\Sigma}\mathbf{Q}_{p}'$, and $\boldsymbol{\Omega}_{p}=(\mathbf{Q}_{p}')^{-1}\boldsymbol{\Omega}\mathbf{Q}_{p}^{-1}$. It is also worth noting that $\mathbf{Q}_{p}$ is an involutory matrix, i.e., $\mathbf{Q}_{p}^{-1}=\mathbf{Q}_{p}$.
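These identities are easy to check numerically. Below is a minimal Python (NumPy) sketch, with $K$ and the compositional vector chosen arbitrarily for illustration; it verifies both $\mathbf{z}_{p}=\mathbf{Q}_{p}\mathbf{z}$ and the involutory property of $\mathbf{Q}_{p}$.

```python
# Minimal numerical check of z_p = Q_p z and Q_p^{-1} = Q_p.
import numpy as np

rng = np.random.default_rng(0)
K = 5
p = rng.dirichlet(np.ones(K + 2))             # a compositional vector summing to 1

z = np.log(p[:-1] / p[-1])                    # alr with the (K+2)-th entry as reference
z_p = np.log(np.delete(p, K) / p[K])          # alr with the (K+1)-th entry as reference

# Q_p as defined above: identity block, last column of -1's, last row (0', -1)
Q_p = np.zeros((K + 1, K + 1))
Q_p[:K, :K] = np.eye(K)
Q_p[:, K] = -1.0

assert np.allclose(z_p, Q_p @ z)              # z_p = Q_p z
assert np.allclose(Q_p @ Q_p, np.eye(K + 1))  # Q_p is involutory
```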

The following theorem states the reference-invariance property of the inverse covariance matrix $\boldsymbol{\Omega}$ under the alr transformation.

Theorem 1.

$\boldsymbol{\Omega}_{1:K,1:K}=\boldsymbol{\Omega}_{p,1:K,1:K}$, i.e., the $K\times K$ upper-left submatrix of the inverse covariance matrix of the alr transformed vector is invariant with respect to the choice of the $(K+2)$-th entry or the $(K+1)$-th entry as the reference.

Theorem 1 concerns the reference-invariance property of the true value of the inverse covariance matrix $\boldsymbol{\Omega}$. It can also be extended to a class of estimators of $\boldsymbol{\Omega}$. Suppose we have i.i.d. observations of the compositional vectors $\mathbf{p}_{1},\ldots,\mathbf{p}_{n}$ and, consequently, their alr transformed counterparts $\mathbf{z}_{1},\ldots,\mathbf{z}_{n}$. Then, we can construct an estimator of $\boldsymbol{\Sigma}$, denoted by $\hat{\boldsymbol{\Sigma}}$, based on the i.i.d. observations $\mathbf{z}_{1},\ldots,\mathbf{z}_{n}$. Furthermore, we can construct an estimator of $\boldsymbol{\Omega}$, denoted by $\hat{\boldsymbol{\Omega}}$, by taking its inverse or generalized inverse. The following corollary presents the reference-invariance property for a class of such estimators.

Corollary 1.

Suppose $\hat{\boldsymbol{\Sigma}}_{p}=\mathbf{Q}_{p}\hat{\boldsymbol{\Sigma}}\mathbf{Q}_{p}'$ and both $\hat{\boldsymbol{\Sigma}}_{p}$ and $\hat{\boldsymbol{\Sigma}}$ are invertible. Let $\hat{\boldsymbol{\Omega}}=\hat{\boldsymbol{\Sigma}}^{-1}$ and $\hat{\boldsymbol{\Omega}}_{p}=\hat{\boldsymbol{\Sigma}}_{p}^{-1}$ be their inverse matrices. Then $\hat{\boldsymbol{\Omega}}_{1:K,1:K}=\hat{\boldsymbol{\Omega}}_{p,1:K,1:K}$, i.e., the $K\times K$ upper-left submatrix of the estimated inverse covariance matrix of the alr transformed vector is invariant with respect to the choice of the $(K+2)$-th entry or the $(K+1)$-th entry as the reference.
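As an illustration, the sample covariance estimator satisfies the condition of Corollary 1, since the transformed observations are exactly $\mathbf{Q}_{p}\mathbf{z}_{i}$. The following sketch (sample size and dimension chosen arbitrarily) checks the corollary numerically:

```python
# Verify Corollary 1 for the sample covariance: the upper-left K x K blocks
# of the two inverted estimates coincide, although the full matrices differ.
import numpy as np

rng = np.random.default_rng(1)
n, K = 200, 5
P = rng.dirichlet(np.ones(K + 2), size=n)           # n compositional vectors

Z = np.log(P[:, :-1] / P[:, [-1]])                  # alr with reference K+2
Z_p = np.log(np.delete(P, K, axis=1) / P[:, [K]])   # alr with reference K+1

Omega_hat = np.linalg.inv(np.cov(Z, rowvar=False))
Omega_hat_p = np.linalg.inv(np.cov(Z_p, rowvar=False))

assert np.allclose(Omega_hat[:K, :K], Omega_hat_p[:K, :K])
```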

The above results imply an important property of the additive log-ratio transformation in compositional data analysis. They can be extended to a more general situation as follows. Suppose we have selected a set of entries $\mathbf{p}_{\mathcal{R}}$ as "candidate references" in a compositional vector $\mathbf{p}$ and write $\mathbf{p}=(\mathbf{p}_{\mathcal{R}^{c}}',\mathbf{p}_{\mathcal{R}}')'$. Then, for any alr transformed vector $\mathbf{z}$ based on a reference in the set of candidate references $\mathbf{p}_{\mathcal{R}}$, the $|\mathcal{R}^{c}|\times|\mathcal{R}^{c}|$ upper-left submatrix of the (estimated) inverse covariance matrix of $\mathbf{z}$ is invariant with respect to the choice of the reference.

In the following subsections, we will incorporate the reference-invariance property into the estimation of a sparse inverse covariance matrix for compositional count data, such as the OTU abundance data in microbiome research.

2.2 Logistic Normal Multinomial Model

Consider an OTU abundance data set with $n$ independent samples, each consisting of observed counts of $K+2$ taxa, denoted by $\mathbf{x}_{i}=(x_{i,1},\ldots,x_{i,K+2})'$ for the $i$-th sample, $i=1,\ldots,n$. Due to the compositional property of the data, the sum of all counts for each sample $i$ is a fixed number, denoted by $M_{i}$. Naturally, a multinomial distribution is imposed on the observed counts as

$$\mathbf{x}_{i}\,|\,\mathbf{p}_{i}\sim\text{Multinomial}(M_{i},\mathbf{p}_{i}), \qquad (1)$$

where $\mathbf{p}_{i}=(p_{i,1},\ldots,p_{i,K+2})'$ are the multinomial probabilities with $\sum_{k=1}^{K+2}p_{i,k}=1$.

In addition, we choose one taxon, without loss of generality the $(K+2)$-th taxon, as the reference and then apply the alr transformation (Aitchison, 1986) to the multinomial probabilities as follows:

$$\mathbf{z}_{i}=\left[\log\left(\frac{p_{i,1}}{p_{i,K+2}}\right),\ldots,\log\left(\frac{p_{i,K}}{p_{i,K+2}}\right),\log\left(\frac{p_{i,K+1}}{p_{i,K+2}}\right)\right]',\quad i=1,\ldots,n. \qquad (2)$$

Further assume that the $\mathbf{z}_{i}$'s follow an i.i.d. multivariate normal distribution

$$\mathbf{z}_{i}\stackrel{iid}{\sim}N(\boldsymbol{\mu},\boldsymbol{\Sigma}),\quad i=1,\ldots,n, \qquad (3)$$

where $\boldsymbol{\mu}$ is the mean, $\boldsymbol{\Sigma}$ is the covariance matrix, and $\boldsymbol{\Omega}=\boldsymbol{\Sigma}^{-1}$ is the inverse covariance matrix. The above model in (1)–(3) is called a logistic normal multinomial model and has been applied to analyze microbiome abundance data (Xia et al., 2013).

Tian et al. (2022) proposed a method called compositional graphical lasso that aims to find a sparse estimator of the inverse covariance matrix $\boldsymbol{\Omega}$, in which the following objective function is minimized:

$$\ell(\mathbf{z}_{1},\ldots,\mathbf{z}_{n},\boldsymbol{\mu},\boldsymbol{\Omega})=-\frac{1}{n}\sum_{i=1}^{n}\left[\mathbf{x}_{i,-(K+2)}'\mathbf{z}_{i}-M_{i}\log\{\mathbf{1}'\exp(\mathbf{z}_{i})+1\}\right]-\frac{1}{2}\log\det(\boldsymbol{\Omega})+\frac{1}{2n}\sum_{i=1}^{n}(\mathbf{z}_{i}-\boldsymbol{\mu})'\boldsymbol{\Omega}(\mathbf{z}_{i}-\boldsymbol{\mu})+\lambda\|\boldsymbol{\Omega}\|_{1}, \qquad (4)$$

where $\mathbf{x}_{i,-(K+2)}=(x_{i,1},\ldots,x_{i,K+1})'$ and $\mathbf{1}=(1,\ldots,1)'$. The above objective function has two parts: the first term in (4) is the negative log-likelihood of the multinomial distribution in (1), and the remaining terms are the regular objective function of graphical lasso for the multivariate normal distribution in (3), regarding $\mathbf{z}_{1},\ldots,\mathbf{z}_{n}$ as known quantities.
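As a concrete illustration, (4) can be transcribed directly into Python; the sketch below only evaluates the objective (the function and variable names are ours) and is not the optimization routine of Tian et al. (2022).

```python
import numpy as np
from scipy.special import logsumexp

def comp_glasso_objective(X, Z, mu, Omega, lam):
    """Evaluate (4). X: n x (K+2) counts with the reference taxon last;
    Z: n x (K+1) latent alr vectors; mu: (K+1,); Omega: (K+1) x (K+1)."""
    n = X.shape[0]
    M = X.sum(axis=1)                             # sequencing depths M_i
    # log(1' exp(z_i) + 1) computed stably as logsumexp over (z_i, 0)
    lse = logsumexp(np.hstack([Z, np.zeros((n, 1))]), axis=1)
    multinom = np.sum(X[:, :-1] * Z) - np.sum(M * lse)
    R = Z - mu
    _, logdet = np.linalg.slogdet(Omega)          # Omega assumed positive definite
    quad = np.einsum('ij,jk,ik->', R, Omega, R)   # sum_i (z_i - mu)' Omega (z_i - mu)
    return -multinom / n - 0.5 * logdet + quad / (2 * n) + lam * np.abs(Omega).sum()
```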

2.3 Reference-Invariant Objective Function

Similar to Section 2.1, if we choose another taxon, for simplicity of illustration the $(K+1)$-th taxon, as the reference, then the alr transformation in (2) becomes

$$\mathbf{z}_{i,p}=\left[\log\left(\frac{p_{i,1}}{p_{i,K+1}}\right),\ldots,\log\left(\frac{p_{i,K}}{p_{i,K+1}}\right),\log\left(\frac{p_{i,K+2}}{p_{i,K+1}}\right)\right]'.$$

As in Section 2.1, $\mathbf{z}_{i,p}=\mathbf{Q}_{p}\mathbf{z}_{i}$. Therefore, $\mathbf{z}_{i,p}\stackrel{iid}{\sim}N(\boldsymbol{\mu}_{p},\boldsymbol{\Sigma}_{p})$, $i=1,\ldots,n$, where $\boldsymbol{\mu}_{p}=\mathbf{Q}_{p}\boldsymbol{\mu}$, $\boldsymbol{\Sigma}_{p}=\mathbf{Q}_{p}\boldsymbol{\Sigma}\mathbf{Q}_{p}'$, and $\boldsymbol{\Omega}_{p}=(\mathbf{Q}_{p}')^{-1}\boldsymbol{\Omega}\mathbf{Q}_{p}^{-1}$. The reference-invariance property in Section 2.1 implies that $\boldsymbol{\Omega}_{1:K,1:K}=\boldsymbol{\Omega}_{p,1:K,1:K}$.

The different choice of the reference also leads to a different objective function for the compositional graphical lasso method (Comp-gLASSO):

$$\ell_{p}(\mathbf{z}_{1,p},\ldots,\mathbf{z}_{n,p},\boldsymbol{\mu}_{p},\boldsymbol{\Omega}_{p})=-\frac{1}{n}\sum_{i=1}^{n}\left[\mathbf{x}_{i,-(K+1)}'\mathbf{z}_{i,p}-M_{i}\log\{\mathbf{1}'\exp(\mathbf{z}_{i,p})+1\}\right]-\frac{1}{2}\log\det(\boldsymbol{\Omega}_{p})+\frac{1}{2n}\sum_{i=1}^{n}(\mathbf{z}_{i,p}-\boldsymbol{\mu}_{p})'\boldsymbol{\Omega}_{p}(\mathbf{z}_{i,p}-\boldsymbol{\mu}_{p})+\lambda\|\boldsymbol{\Omega}_{p}\|_{1}, \qquad (5)$$

where $\mathbf{x}_{i,-(K+1)}=(x_{i,1},\ldots,x_{i,K},x_{i,K+2})'$ and $\mathbf{1}=(1,\ldots,1)'$. Comparing (4) and (5), their first terms are the same, as both equal the negative log-likelihood of the multinomial distribution, $-\frac{1}{n}\sum_{i=1}^{n}\sum_{k=1}^{K+2}x_{i,k}\log p_{i,k}$. In addition, from Aitchison (1986), $\det(\boldsymbol{\Omega})=\det(\boldsymbol{\Omega}_{p})$ and $\sum_{i=1}^{n}(\mathbf{z}_{i}-\boldsymbol{\mu})'\boldsymbol{\Omega}(\mathbf{z}_{i}-\boldsymbol{\mu})=\sum_{i=1}^{n}(\mathbf{z}_{i,p}-\boldsymbol{\mu}_{p})'\boldsymbol{\Omega}_{p}(\mathbf{z}_{i,p}-\boldsymbol{\mu}_{p})$ as known properties of the alr transformation. However, the $L_{1}$ penalties in (4) and (5) differ because $\boldsymbol{\Omega}$ is not necessarily equal to $\boldsymbol{\Omega}_{p}$; the reference-invariance property only implies that $\boldsymbol{\Omega}_{1:K,1:K}=\boldsymbol{\Omega}_{p,1:K,1:K}$.
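Both invariance identities are easy to check numerically; the determinant identity holds because $\det(\mathbf{Q}_{p})=-1$. A small sketch with an arbitrary positive definite $\boldsymbol{\Omega}$:

```python
# Check det(Omega) = det(Omega_p) and the equality of the quadratic forms.
import numpy as np

rng = np.random.default_rng(2)
K = 4
A = rng.standard_normal((K + 1, K + 1))
Omega = A @ A.T + (K + 1) * np.eye(K + 1)   # an arbitrary positive definite Omega

Q_p = np.zeros((K + 1, K + 1))
Q_p[:K, :K] = np.eye(K)
Q_p[:, K] = -1.0

Omega_p = Q_p.T @ Omega @ Q_p               # equals (Q_p')^{-1} Omega Q_p^{-1}, as Q_p^{-1} = Q_p
assert np.isclose(np.linalg.det(Omega), np.linalg.det(Omega_p))

z = rng.standard_normal(K + 1)
mu = rng.standard_normal(K + 1)
z_p, mu_p = Q_p @ z, Q_p @ mu
assert np.isclose((z - mu) @ Omega @ (z - mu),
                  (z_p - mu_p) @ Omega_p @ (z_p - mu_p))
```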

Motivated by the reference-invariance property, we can impose the $L_{1}$ penalty only on the invariant entries of $\boldsymbol{\Omega}$ instead of all entries of $\boldsymbol{\Omega}$ as in (4), which leads to

$$\ell_{inv}(\mathbf{z}_{1},\ldots,\mathbf{z}_{n},\boldsymbol{\mu},\boldsymbol{\Omega})=-\frac{1}{n}\sum_{i=1}^{n}\left[\mathbf{x}_{i,-(K+2)}'\mathbf{z}_{i}-M_{i}\log\{\mathbf{1}'\exp(\mathbf{z}_{i})+1\}\right]-\frac{1}{2}\log\det(\boldsymbol{\Omega})+\frac{1}{2n}\sum_{i=1}^{n}(\mathbf{z}_{i}-\boldsymbol{\mu})'\boldsymbol{\Omega}(\mathbf{z}_{i}-\boldsymbol{\mu})+\lambda\|\boldsymbol{\Omega}_{1:K,1:K}\|_{1}. \qquad (6)$$

By the previous arguments, $\ell_{inv}(\mathbf{z}_{1},\ldots,\mathbf{z}_{n},\boldsymbol{\mu},\boldsymbol{\Omega})$ is reference-invariant; in other words, the objective function $\ell_{inv}$ stays the same regardless of whether the $(K+1)$-th or the $(K+2)$-th taxon is selected as the reference. This is summarized in the following theorem.

Theorem 2.

If $\mathbf{z}_{i,p}=\mathbf{Q}_{p}\mathbf{z}_{i}$ for $i=1,\ldots,n$, $\boldsymbol{\mu}_{p}=\mathbf{Q}_{p}\boldsymbol{\mu}$, and $\boldsymbol{\Omega}_{p}=(\mathbf{Q}_{p}')^{-1}\boldsymbol{\Omega}\mathbf{Q}_{p}^{-1}$, then $\ell_{inv}(\mathbf{z}_{1},\ldots,\mathbf{z}_{n},\boldsymbol{\mu},\boldsymbol{\Omega})=\ell_{inv,p}(\mathbf{z}_{1,p},\ldots,\mathbf{z}_{n,p},\boldsymbol{\mu}_{p},\boldsymbol{\Omega}_{p})$.

We call $\ell_{inv}(\mathbf{z}_{1},\ldots,\mathbf{z}_{n},\boldsymbol{\mu},\boldsymbol{\Omega})$ in (6) the reference-invariant compositional graphical lasso objective function. To obtain a sparse estimator of $\boldsymbol{\Omega}$, we minimize $\ell_{inv}(\mathbf{z}_{1},\ldots,\mathbf{z}_{n},\boldsymbol{\mu},\boldsymbol{\Omega})$ with respect to $\mathbf{z}_{1},\ldots,\mathbf{z}_{n}$, $\boldsymbol{\mu}$, and $\boldsymbol{\Omega}$. We call this estimation approach the reference-invariant compositional graphical lasso (Inv-Comp-gLASSO) method.

In general, suppose we have selected a set of taxa $\mathbf{x}_{\mathcal{R}}$ as "candidate references" and write $\mathbf{x}=(\mathbf{x}_{\mathcal{R}^{c}}',\mathbf{x}_{\mathcal{R}}')'$. Then the reference-invariant objective function becomes

$$\ell_{inv}(\mathbf{z}_{1},\ldots,\mathbf{z}_{n},\boldsymbol{\mu},\boldsymbol{\Omega})=-\frac{1}{n}\sum_{i=1}^{n}\left[\mathbf{x}_{i,\mathcal{R}^{c}}'\mathbf{z}_{i}-M_{i}\log\{\mathbf{1}'\exp(\mathbf{z}_{i})+1\}\right]-\frac{1}{2}\log\det(\boldsymbol{\Omega})+\frac{1}{2n}\sum_{i=1}^{n}(\mathbf{z}_{i}-\boldsymbol{\mu})'\boldsymbol{\Omega}(\mathbf{z}_{i}-\boldsymbol{\mu})+\lambda\|\boldsymbol{\Omega}_{\mathcal{R}^{c},\mathcal{R}^{c}}\|_{1}. \qquad (7)$$

In other words, (7) is invariant regardless of which reference is selected from the set of candidate references, and so is the invariant part of its minimizer.

It is noteworthy that the trick in defining the reference-invariant version of Comp-gLASSO is to revise the penalty term from the regular lasso penalty on the whole inverse covariance matrix to one on only the invariant part of the inverse covariance matrix. Using the same trick, we can define reference-invariant versions of other methods, such as a reference-invariant graphical lasso (Inv-gLASSO). The objective function of Inv-gLASSO is defined as follows when $\mathbf{z}_{1},\ldots,\mathbf{z}_{n}$ are observed instead of $\mathbf{x}_{1},\ldots,\mathbf{x}_{n}$:

$$\ell_{inv}(\boldsymbol{\mu},\boldsymbol{\Omega})=-\frac{1}{2}\log\det(\boldsymbol{\Omega})+\frac{1}{2n}\sum_{i=1}^{n}(\mathbf{z}_{i}-\boldsymbol{\mu})'\boldsymbol{\Omega}(\mathbf{z}_{i}-\boldsymbol{\mu})+\lambda\|\boldsymbol{\Omega}_{\mathcal{R}^{c},\mathcal{R}^{c}}\|_{1}. \qquad (8)$$
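One convenient way to handle the modified penalty is through a graphical lasso solver that accepts an entrywise penalty matrix: setting the penalty weights to zero on every row and column indexed by a candidate reference reproduces the penalty in (7) and (8). The sketch below is a standard ADMM solver of this form, not the authors' implementation; it minimizes $-\log\det(\boldsymbol{\Omega})+\mathrm{tr}(\mathbf{S}\boldsymbol{\Omega})+\|\mathbf{W}\circ\boldsymbol{\Omega}\|_{1}$, which matches (8) up to an overall factor of 2 when $\mathbf{W}$ equals $2\lambda$ on the invariant block and 0 elsewhere.

```python
# ADMM graphical lasso with an entrywise penalty matrix W (a sketch, not the
# authors' algorithm). rho, n_iter, and tol are arbitrary defaults of ours.
import numpy as np

def glasso_admm(S, W, rho=1.0, n_iter=500, tol=1e-6):
    d = S.shape[0]
    Z = np.eye(d)
    U = np.zeros((d, d))
    for _ in range(n_iter):
        # Omega-step: closed-form solve of the smooth subproblem by eigendecomposition
        vals, vecs = np.linalg.eigh(rho * (Z - U) - S)
        omega_vals = (vals + np.sqrt(vals**2 + 4 * rho)) / (2 * rho)
        Omega = vecs @ np.diag(omega_vals) @ vecs.T
        # Z-step: entrywise soft-thresholding with thresholds W / rho
        Z_old = Z
        A = Omega + U
        Z = np.sign(A) * np.maximum(np.abs(A) - W / rho, 0.0)
        U += Omega - Z
        if np.linalg.norm(Z - Z_old) < tol * max(1.0, np.linalg.norm(Z_old)):
            break
    return Z

def invariant_weights(d, K, lam):
    """Penalty weights for (8): only the upper-left K x K block (the
    non-candidate taxa) is penalized; factor 2 accounts for the (1/2) scaling."""
    W = np.zeros((d, d))
    W[:K, :K] = 2 * lam
    return W
```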

The objective function (7) naturally involves three sets of parameters, $(\mathbf{z}_{1},\ldots,\mathbf{z}_{n})$, $\boldsymbol{\mu}$, and $\boldsymbol{\Omega}$, which motivates us to apply a block coordinate descent algorithm. A block coordinate descent algorithm minimizes the objective function iteratively over each set of parameters given the others. Given the initial values $(\mathbf{z}_{1}^{(0)},\ldots,\mathbf{z}_{n}^{(0)})$, $\boldsymbol{\mu}^{(0)}$, and $\boldsymbol{\Omega}^{(0)}$, the algorithm repeats the following steps cyclically for iterations $t=0,1,2,\ldots$ until convergence.

1. Given $\boldsymbol{\mu}^{(t)}$ and $\boldsymbol{\Omega}^{(t)}$, find $(\mathbf{z}_{1}^{(t+1)},\ldots,\mathbf{z}_{n}^{(t+1)})$ that minimizes (7).

2. Given $(\mathbf{z}_{1}^{(t+1)},\ldots,\mathbf{z}_{n}^{(t+1)})$ and $\boldsymbol{\Omega}^{(t)}$, find $\boldsymbol{\mu}^{(t+1)}$ that minimizes (7).

3. Given $(\mathbf{z}_{1}^{(t+1)},\ldots,\mathbf{z}_{n}^{(t+1)})$ and $\boldsymbol{\mu}^{(t+1)}$, find $\boldsymbol{\Omega}^{(t+1)}$ that minimizes (7).

Except for the mild modification of the objective function, the details of the algorithm are essentially the same as those described in Tian et al. (2022).
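A compact skeleton of this block coordinate descent is sketched below, reusing the glasso_admm and invariant_weights functions from the previous sketch. It departs from the authors' algorithm in two labeled ways: the z-step uses a generic quasi-Newton solver rather than Newton-Raphson, and the latent z's are initialized from shifted count ratios, which is our own simplification.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

def z_step(x_minus_ref, M_i, mu, Omega, z0):
    """Minimize the per-sample piece of (7) over z_i (quasi-Newton sketch)."""
    def f(z):
        return -(x_minus_ref @ z - M_i * logsumexp(np.append(z, 0.0))) \
               + 0.5 * (z - mu) @ Omega @ (z - mu)
    def grad(z):
        p = np.exp(z - logsumexp(np.append(z, 0.0)))  # first K+1 softmax probabilities
        return -(x_minus_ref - M_i * p) + Omega @ (z - mu)
    return minimize(f, z0, jac=grad, method='L-BFGS-B').x

def inv_comp_glasso(X, K, lam, n_outer=20):
    n, d = X.shape[0], X.shape[1] - 1                # d = K + 1 alr coordinates
    M = X.sum(axis=1)
    Z = np.log((X[:, :-1] + 0.5) / (X[:, [-1]] + 0.5))  # crude initial z's (ours)
    mu, Omega = Z.mean(axis=0), np.eye(d)
    W = invariant_weights(d, K, lam)                 # penalize only the invariant block
    for _ in range(n_outer):
        Z = np.vstack([z_step(X[i, :-1], M[i], mu, Omega, Z[i])
                       for i in range(n)])           # step 1: z-step
        mu = Z.mean(axis=0)                          # step 2: mu-step (closed form)
        S = np.cov(Z, rowvar=False, bias=True)
        Omega = glasso_admm(S, W)                    # step 3: Omega-step
    return Omega
```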

3 Simulation Study

3.1 Settings

To illustrate the reference-invariance property under the aforementioned framework, we conduct a simulation study and evaluate the performance of Inv-Comp-gLASSO as well as Inv-gLASSO.

Following Tian et al. (2022), we generate three types of inverse covariance matrices $\boldsymbol{\Omega}=(\omega_{kl})_{1\leq k,l\leq K+1}$ as follows (a sketch of generating these matrices is given after the list):

1. Chain: $\omega_{kk}=1.5$, $\omega_{kl}=0.5$ if $|k-l|=1$, and $\omega_{kl}=0$ if $|k-l|>1$. Every node is connected to its adjacent node(s), and therefore the degree is 2 for all but two nodes.

2. Random: $\omega_{kl}=1$ with probability $3/(K+1)$ for $k\neq l$. Every two nodes are connected with a fixed probability, and the expected degree is the same for all nodes.

3. Hub: Nodes are randomly partitioned into $\lceil(K+1)/20\rceil$ groups, with one "hub node" in each group. The other nodes in a group are connected only to the hub node and not to each other, and there are no connections between groups. The degree is much higher for the hub nodes and is 1 for the rest of the nodes.
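Hedged sketches of the three generators follow. The chain matrix is fully specified above; for the random and hub graphs, the diagonal inflation used to guarantee positive definiteness is our own filling-in, not necessarily the exact recipe of Tian et al. (2022).

```python
import numpy as np

def chain_precision(d):
    Omega = 1.5 * np.eye(d)                       # omega_kk = 1.5
    idx = np.arange(d - 1)
    Omega[idx, idx + 1] = Omega[idx + 1, idx] = 0.5  # omega_kl = 0.5 if |k-l| = 1
    return Omega

def random_precision(d, rng):
    A = np.triu(rng.random((d, d)) < 3.0 / d, k=1).astype(float)  # edges w.p. 3/(K+1)
    Omega = A + A.T
    Omega += np.eye(d) * (np.abs(np.linalg.eigvalsh(Omega)).max() + 0.1)  # ensure PD (ours)
    return Omega

def hub_precision(d, rng, group=20):
    Omega = np.zeros((d, d))
    for g in np.array_split(rng.permutation(d), int(np.ceil(d / group))):
        hub, rest = g[0], g[1:]                   # first node of each group is the hub
        Omega[hub, rest] = Omega[rest, hub] = 1.0
    Omega += np.eye(d) * (np.abs(np.linalg.eigvalsh(Omega)).max() + 0.1)  # ensure PD (ours)
    return Omega
```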

In the simulations, we also vary two other factors that play a crucial role in the performance of the methods:

1. Sequencing depth: the $M_{i}$'s are simulated from $\text{Uniform}(20\text{K},40\text{K})$ or $\text{Uniform}(100\text{K},200\text{K})$, denoted as "low" and "high" sequencing depth, respectively.

2. Compositional variation: each aforementioned inverse covariance matrix $\boldsymbol{\Omega}$ ("low" compositional variation) is also divided by a factor of 5 to obtain another set of inverse covariance matrices, i.e., $\boldsymbol{\Omega}/5$ ("high" compositional variation).

The data are simulated from the logistic normal multinomial distribution in (1)–(3). In detail, $\mathbf{z}_{i}\sim N(\boldsymbol{\mu},\boldsymbol{\Sigma})$ are first generated independently for $i=1,\ldots,n$; then, the softmax transformation (the inverse of the alr transformation) is applied to obtain the multinomial probabilities $\mathbf{p}_{i}$, with the $(K+2)$-th entry serving as the true reference; last, the multinomial counts $\mathbf{x}_{i}$ are simulated from $\text{Multinomial}(M_{i},\mathbf{p}_{i})$, $i=1,\ldots,n$. We set $n=100$ and $K=49$ throughout the simulations.
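This generation step can be sketched as follows, reusing chain_precision from the sketch above. The zero mean vector and the reading of Uniform(20K, 40K) as draws between 20,000 and 40,000 are our assumptions.

```python
# Simulate one data set from (1)-(3): z_i ~ N(mu, Sigma), softmax, multinomial.
import numpy as np

rng = np.random.default_rng(3)
n, K = 100, 49
Sigma = np.linalg.inv(chain_precision(K + 1))
mu = np.zeros(K + 1)                                # mean not specified in the text; assumed 0

Z = rng.multivariate_normal(mu, Sigma, size=n)      # latent alr vectors
expZ = np.hstack([np.exp(Z), np.ones((n, 1))])      # softmax = inverse alr, reference last
P = expZ / expZ.sum(axis=1, keepdims=True)          # multinomial probabilities p_i
M = rng.integers(20_000, 40_000, size=n)            # "low" sequencing depth (assumed scale)
X = np.vstack([rng.multinomial(M[i], P[i]) for i in range(n)])
```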

The simulation results are based on 100 replicates of the simulated data. Both Inv-Comp-gLASSO and Inv-gLASSO are applied with two choices of reference, the $(K+1)$-th entry serving as the false reference and the $(K+2)$-th entry serving as the true reference, and only the reference-invariant subnetwork $\boldsymbol{\Omega}_{1:K,1:K}$ is used in the evaluations. For Inv-gLASSO, we estimate $\mathbf{p}_{1},\ldots,\mathbf{p}_{n}$ with $\mathbf{x}_{1}/M_{1},\ldots,\mathbf{x}_{n}/M_{n}$ and perform the alr transformation to obtain estimates of $\mathbf{z}_{1},\ldots,\mathbf{z}_{n}$, denoted by $\tilde{\mathbf{z}}_{1},\ldots,\tilde{\mathbf{z}}_{n}$. We then apply Inv-gLASSO to $\tilde{\mathbf{z}}_{1},\ldots,\tilde{\mathbf{z}}_{n}$ directly to estimate the inverse covariance matrix $\boldsymbol{\Omega}$, which also serves as the starting value for Inv-Comp-gLASSO. We implement both methods with a common sequence of 70 tuning parameters $\lambda$.

We empirically validate the invariance property of Inv-Comp-gLASSO and Inv-gLASSO by comparing, for each method, the estimators of the subnetwork $\boldsymbol{\Omega}_{1:K,1:K}$ resulting from choosing the true and the false reference, which have been shown to be theoretically invariant in Section 2. The comparison is assessed under the four criteria below; Python sketches of these criteria are given after the list.

1. Normalized Manhattan Similarity

For two matrices $\mathbf{A}$ and $\mathbf{B}$, we define the normalized Manhattan similarity (NMS) as

$$\text{NMS}(\mathbf{A},\mathbf{B})=1-\frac{\|\mathbf{A}-\mathbf{B}\|_{1}}{\|\mathbf{A}\|_{1}+\|\mathbf{B}\|_{1}},$$

where $\|\cdot\|_{1}$ represents the entrywise $L_{1}$ norm of a matrix. Note that $0\leq\text{NMS}\leq 1$ due to the non-negativity of norms and the triangle inequality.

2. Jaccard Index

For two networks with the same nodes, denote their edge sets by $\mathcal{A}$ and $\mathcal{B}$. Then the Jaccard index (Jaccard, 1901) is defined as

$$J(\mathcal{A},\mathcal{B})=\frac{|\mathcal{A}\cap\mathcal{B}|}{|\mathcal{A}\cup\mathcal{B}|}.$$

Obviously, it also holds that $0\leq J(\mathcal{A},\mathcal{B})\leq 1$.

3. Normalized Hamming Similarity

In the context of network comparison, the normalized Hamming similarity between two adjacency matrices $\mathbf{A}$ and $\mathbf{B}$ on the same $N$ nodes is defined as (Hamming, 1950)

$$H(\mathbf{A},\mathbf{B})=1-\frac{\|\mathbf{A}-\mathbf{B}\|_{1}}{N(N-1)},$$

where $\|\cdot\|_{1}$ denotes the entrywise $L_{1}$ norm of a matrix. Since there are at most $N(N-1)$ 1's in an adjacency matrix with $N$ nodes, this metric is also between 0 and 1.

4. ROC Curve

ROC curves, which plot the true positive rate (TPR) against the false positive rate (FPR), are compared between the choices of true and false reference.
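Under the assumption that an edge corresponds to a nonzero off-diagonal entry of the estimated precision matrix (with an arbitrary numerical tolerance of ours), the first three criteria can be sketched as:

```python
import numpy as np

def nms(A, B):
    """Normalized Manhattan similarity between two matrices."""
    return 1 - np.abs(A - B).sum() / (np.abs(A).sum() + np.abs(B).sum())

def adjacency(Omega, tol=1e-10):
    """Off-diagonal support of a precision matrix as a 0/1 adjacency matrix."""
    A = (np.abs(Omega) > tol).astype(int)
    np.fill_diagonal(A, 0)
    return A

def jaccard(Omega1, Omega2):
    A, B = adjacency(Omega1), adjacency(Omega2)
    inter, union = np.sum(A & B), np.sum(A | B)   # each edge counted twice; ratio unaffected
    return inter / union if union > 0 else np.nan  # undefined for two empty graphs

def hamming_similarity(Omega1, Omega2):
    A, B = adjacency(Omega1), adjacency(Omega2)
    N = A.shape[0]
    return 1 - np.abs(A - B).sum() / (N * (N - 1))
```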

3.2 Results

3.2.1 Normalized Manhattan Similarity

The normalized Manhattan similarity serves as a direct measure of the similarity between the two estimated inverse covariance matrices with different choices of the reference. On a sequence of 70 tuning parameters $\lambda$, we calculated the normalized Manhattan similarity between the two estimated inverse covariance matrices with the true and false references. Figure 1 shows the average normalized Manhattan similarity over 100 replicates along with standard error bars.

Figure 1: Normalized Manhattan similarity between the two estimated inverse covariance matrices with true and false references. Solid blue: Inv-Comp-gLASSO; dashed red: Inv-gLASSO. h,h: high sequencing depth, high compositional variation; h,l: high sequencing depth, low compositional variation; l,h: low sequencing depth, high compositional variation; l,l: low sequencing depth, low compositional variation.

We can see from Figure 1 that the normalized Manhattan similarity for Inv-gLASSO stays close to 1 in all settings throughout the whole sequence of tuning parameters. On the other hand, there are some fluctuations in the same metric for Inv-Comp-gLASSO, although most values stay higher than 0.9. Empirically, the two estimated matrices are numerically identical for Inv-gLASSO and close for Inv-Comp-gLASSO. A potential reason why the invariance of Inv-gLASSO is numerically more evident than that of Inv-Comp-gLASSO is that Inv-gLASSO solves a convex optimization problem while the Inv-Comp-gLASSO objective is not necessarily convex (Tian et al., 2022). With different starting points (as we choose different references), Inv-Comp-gLASSO might arrive at different solutions, as the algorithm is only guaranteed to converge to a stationary point. We refer to Tian et al. (2022) for a more detailed discussion of the convexity and the convergence of the algorithm.

In addition, it is consistently observed that the normalized Manhattan similarity for Inv-Comp-gLASSO starts close to 1 when $\lambda$ is very large and gradually decreases, with some fluctuations, as $\lambda$ decreases. This is because the Inv-Comp-gLASSO objective function in (6) is solved by an algorithm iterating between graphical lasso to estimate $(\boldsymbol{\mu},\boldsymbol{\Omega})$ and Newton-Raphson to estimate $\mathbf{z}_{1},\ldots,\mathbf{z}_{n}$, which can accumulate numerical errors depending on the number of iterations. Furthermore, the algorithm is implemented with warm starts for a sequence of decreasing $\lambda$'s, i.e., the solution for the previous $\lambda$ value is used as the starting point for the current $\lambda$ value. With the accumulation of numerical errors, the numerical difference between the two estimated matrices becomes larger.

Among the simulation settings, we find the invariance property of Inv-Comp-gLASSO is most evidently supported by the numerical results in the "high sequencing depth, low compositional variation" setting, regardless of the network type: the normalized Manhattan similarity is very close to 1 for Inv-Comp-gLASSO throughout the sequence of tuning parameters. This is because the compositional probabilities $\mathbf{p}_{i}$, and thus the $\mathbf{z}_{i}$'s, are estimated accurately under high sequencing depth and low compositional variation in the first iteration of the Inv-Comp-gLASSO algorithm, which implies fewer iterations for the algorithm to converge and less numerical error accumulated in the process. On the other hand, the normalized Manhattan similarity is lowest in the "low sequencing depth, high compositional variation" setting, for a similar reason: it takes more iterations for the Inv-Comp-gLASSO algorithm to converge, accumulating more numerical errors. However, it is noteworthy that this is exactly the setting in which Inv-Comp-gLASSO has the greatest advantage over Inv-gLASSO in recovering the true network (see Section 3.2.3 for their ROC curves).

3.2.2 Jaccard Index and Normalized Hamming Similarity

Compared to the normalized Manhattan similarity, which directly measures the similarity between two inverse covariance matrices, both the Jaccard index and the normalized Hamming similarity measure the similarity between the two networks represented by the matrices, because they only compare the adjacency matrices, or the edges, of the two networks. Again, on a sequence of 70 tuning parameters $\lambda$, we computed the Jaccard index and the normalized Hamming similarity between the two networks with true and false references. Figures 2 and 3 plot the average Jaccard index and normalized Hamming similarity over 100 replicates along with standard error bars.

Figure 2: Jaccard index between the two networks with true and false references. Solid blue: Inv-Comp-gLASSO; dashed red: Inv-gLASSO. h,h: high sequencing depth, high compositional variation; h,l: high sequencing depth, low compositional variation; l,h: low sequencing depth, high compositional variation; l,l: low sequencing depth, low compositional variation.
Figure 3: Normalized Hamming similarity between the two networks with true and false references. Solid blue: Inv-Comp-gLASSO; dashed red: Inv-gLASSO. h,h: high sequencing depth, high compositional variation; h,l: high sequencing depth, low compositional variation; l,h: low sequencing depth, high compositional variation; l,l: low sequencing depth, low compositional variation.

The results for the normalized Hamming similarity in Figure 3 show a fairly similar pattern to those for the normalized Manhattan similarity in Figure 1 and can be interpreted similarly, so we focus here on the results for the Jaccard index in Figure 2. The Jaccard index stays close to 1 for Inv-gLASSO, implying that the two networks are identical. Although the results of the Jaccard index for Inv-Comp-gLASSO look quite different from the other measures at first glance, they actually imply a similar conclusion. First, we notice that the Jaccard index does not exist for the first few tuning parameters, which are large enough that the resultant network is empty with either the true or the false reference. Although two empty networks agree with each other perfectly, the Jaccard index is not well defined in this case. Then, as Inv-Comp-gLASSO starts to pick up edges when $\lambda$ decreases, the Jaccard index is quite low in some settings, suggesting that the two networks are dissimilar. However, this is because the Jaccard index is a much "stricter" similarity measure than the Hamming similarity. For example, for two networks with 100 possible edges, if each network has exactly one edge but these edges differ, then the Jaccard index is 0 while the normalized Hamming similarity is 0.98. Finally, as the networks become denser, the Jaccard index increases quickly and stabilizes at a quite high value in most settings.

It is also notable that both the Jaccard index and the normalized Hamming similarity are relatively high in the “high sequencing depth, low compositional variation” setting and relatively low in the “low sequencing depth, high compositional variation” setting, which is consistent with the finding for the normalized Manhattan similarity.

3.2.3 ROC Curves

An ROC curve is plotted from the average true positive rates and average false positive rates over 100 replicates. The ROC curve can be regarded as an indirect measure of invariance: two networks possessing similar ROC curves is a necessary but not sufficient condition for the two networks to be similar. Nevertheless, it is crucial to evaluate the algorithms with this criterion, since it answers the question, "Does the performance of the algorithm depend on the choice of reference?"
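For reference, the TPR and FPR underlying these curves can be computed as below, with adjacency() from the metrics sketch above; the edge-detection tolerance is again an arbitrary choice of ours.

```python
# TPR/FPR of an estimated network against the true network, over off-diagonal entries.
import numpy as np

def tpr_fpr(Omega_hat, Omega_true, tol=1e-10):
    est = adjacency(Omega_hat, tol)
    tru = adjacency(Omega_true, tol)
    off = ~np.eye(est.shape[0], dtype=bool)            # exclude the diagonal
    tp = np.sum((est == 1) & (tru == 1) & off)
    fp = np.sum((est == 1) & (tru == 0) & off)
    tpr = tp / max(np.sum((tru == 1) & off), 1)
    fpr = fp / max(np.sum((tru == 0) & off), 1)
    return tpr, fpr
```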

Figure 4: ROC curves for Inv-Comp-gLASSO and Inv-gLASSO with true and false references. Long-dashed blue: Inv-Comp-gLASSO with true reference; dashed red: Inv-Comp-gLASSO with false reference; dashed-dotted purple: Inv-gLASSO with true reference; dotted black: Inv-gLASSO with false reference. h,h: high sequencing depth and high compositional variation; h,l: high sequencing depth and low compositional variation; l,h: low sequencing depth and high compositional variation; l,l: low sequencing depth and low compositional variation.

We can see that the ROC curves from Inv-Comp-gLASSO, regardless of the choice of reference, dominate those from Inv-gLASSO in all settings. The two ROC curves from Inv-gLASSO lie perfectly on top of each other, while the curves from Inv-Comp-gLASSO are also fairly close to each other. These empirical results validate the theoretical reference-invariance property of both methods. In addition, Inv-Comp-gLASSO has the most obvious advantage over Inv-gLASSO in the "low sequencing depth, high compositional variation" setting, while the two methods perform almost identically in the "high sequencing depth, low compositional variation" setting. Although the similarity measures are lower in this "most favorable" setting for Inv-Comp-gLASSO (see Sections 3.2.1 and 3.2.2), the ROC curves of the two networks from the method do not deviate much from each other in this setting.

4 Real Data

To further validate the theoretical reference-invariance properties of Inv-Comp-gLASSO and Inv-gLASSO, we applied them to a dataset from the TARA Oceans project, in which the Tara Oceans consortium sampled both plankton and environmental data at 210 sites in the world's oceans. The collected data were later analyzed using sequencing and imaging techniques. We downloaded the taxonomic data and the literature-reported interactions from the TARA Oceans project data repository (https://doi.pangaea.de/10.1594/PANGAEA.843018). As part of the TARA Oceans project, Lima-Mendez et al. (2015) investigated the impact of both biotic and abiotic interactions in the oceanic ecosystem, and a literature-curated list of genus-level marine eukaryotic plankton interactions was generated by a panel of experts.

Similar to Tian et al. (2022), we focused the analysis at the genus level and kept only the 81 genera involved in the literature-reported interactions. For computational simplicity, we removed the samples with very low read counts ($<100$), which left 324 samples in the final preprocessed data. From the genera not reported in the literature interactions, we chose the two with the largest average relative abundances, Acrosphaera and Collosphaera, as the references. We then applied both Inv-Comp-gLASSO and Inv-gLASSO to the alr transformed data with each of these two references, using a common sequence of tuning parameters. For each combination of a method and a reference, we also selected the tuning parameter corresponding to the "asymptotically sparsistent" network (the sparsest estimated network in the path that contains the true network asymptotically) by StARS (Liu et al., 2010).

Figure 5: Normalized Manhattan similarity between the two estimated inverse covariance matrices with the two choices of reference, Acrosphaera and Collosphaera, from the TARA data. Solid blue: Inv-Comp-gLASSO; dashed red: Inv-gLASSO.
Figure 6: Jaccard index and normalized Hamming similarity between the two estimated networks with the two choices of reference, Acrosphaera and Collosphaera, from the TARA data. Solid blue: Inv-Comp-gLASSO; dashed red: Inv-gLASSO.

Figure 5 presents the normalized Manhattan similarity between the two estimated inverse covariance matrices with the two choices of reference. Figure 6 plots the Jaccard index and the normalized Hamming similarity between the two networks represented by the two estimated inverse covariance matrices. We can see from Figures 5 and 6 that all three similarity metrics stay steadily around 1 for Inv-gLASSO throughout the sequence of $\lambda$'s. This agrees with our observation in the simulations, where Inv-gLASSO produced numerically identical inverse covariance matrices.

For Inv-Comp-gLASSO, the similarity scores start around 1 (for the normalized Manhattan similarity and normalized Hamming similarity) or are undefined (for the Jaccard index) when the estimated networks are empty. Then, as $\lambda$ decreases and the estimated networks become denser, the measures start to fluctuate and decline slightly at the end. In spite of the fluctuations, both the normalized Manhattan similarity and the normalized Hamming similarity stay above 0.9, while the lowest Jaccard index is about 0.64. As discussed earlier, the Jaccard index is a stricter measure than the normalized Hamming similarity.

Within each method, StARS picked the same tuning parameter $\lambda$ regardless of the choice of reference, as denoted by the red dot for Inv-gLASSO and the blue dot for Inv-Comp-gLASSO in Figures 5 and 6. In other words, the red and blue dots represent the tuning parameters corresponding to the final estimated networks selected by StARS. Again, the three similarity measures for the two final inverse covariance matrices or networks from Inv-gLASSO are almost 1, while the normalized Manhattan similarity, Jaccard index, and normalized Hamming similarity are 0.98, 0.96, and 0.99 for Inv-Comp-gLASSO. All of these high similarity scores imply the empirical invariance of Inv-gLASSO and Inv-Comp-gLASSO: both methods result in inverse covariance matrices, and thus networks, that are invariant with respect to the choice of the reference genus (Acrosphaera or Collosphaera) when applied to the TARA Oceans eukaryotic dataset.

5 Discussion

In this work, we established the reference-invariance property in sparse inverse covariance matrix estimation and network construction based on the alr transformed data. We then proposed reference-invariant versions of compositional graphical lasso and graphical lasso by modifying the penalty in their respective objective functions. In addition, we validated the reference-invariance property of the proposed methods empirically under a variety of simulation scenarios and on a real TARA Oceans eukaryotic dataset.

It is noteworthy that the reference-invariance property is a general property of estimating the inverse covariance matrix based on the alr transformed data. We proposed reference-invariant versions of compositional graphical lasso and graphical lasso based on this property; however, one may similarly revise other existing methods for inverse covariance matrix estimation based on the alr transformed data. The trick is to revise the objective function so that it becomes invariant with respect to the choice of the reference. Subsequently, the resultant inverse covariance matrix and network are expected to be reference-invariant both theoretically and empirically, though the latter may depend on the algorithm used to optimize the reference-invariant objective function.

References

  • Aitchison (1986) Aitchison, J. (1986), The statistical analysis of compositional data, Monographs on statistics and applied probability, Chapman and Hall.
  • Ban et al. (2015) Ban, Y., An, L., and Jiang, H. (2015), “Investigating microbial co-occurrence patterns based on metagenomic compositional data,” Bioinformatics, 31, 3322–3329.
  • Banerjee et al. (2008) Banerjee, O., El Ghaoui, L., and d’Aspremont, A. (2008), “Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data,” The Journal of Machine Learning Research, 9, 485–516.
  • Fang et al. (2015) Fang, H., Huang, C., Zhao, H., and Deng, M. (2015), “CCLasso: correlation inference for compositional data through Lasso,” Bioinformatics, 31, 3172–3180.
  • Fang et al. (2017) — (2017), “gCoda: conditional dependence network inference for compositional data,” Journal of Computational Biology, 24, 699–708.
  • Faust and Raes (2012) Faust, K. and Raes, J. (2012), “Microbial interactions: from networks to models,” Nature Reviews Microbiology, 10, 538.
  • Friedman and Alm (2012) Friedman, J. and Alm, E. J. (2012), “Inferring correlation networks from genomic survey data,” PLoS computational biology, 8, e1002687.
  • Friedman et al. (2008) Friedman, J., Hastie, T., and Tibshirani, R. (2008), “Sparse inverse covariance estimation with the graphical lasso,” Biostatistics, 9, 432–441.
  • Hamming (1950) Hamming, R. W. (1950), “Error detecting and error correcting codes,” The Bell system technical journal, 29, 147–160.
  • Jaccard (1901) Jaccard, P. (1901), “Distribution de la flore alpine dans le bassin des Dranses et dans quelques régions voisines,” Bull Soc Vaudoise Sci Nat, 37, 241–272.
  • Jiang et al. (2021) Jiang, D., Sharpton, T., and Jiang, Y. (2021), “Microbial interaction network estimation via bias-corrected graphical lasso,” Statistics in Biosciences, 13, 329–350.
  • Kurtz et al. (2015) Kurtz, Z. D., Müller, C. L., Miraldi, E. R., Littman, D. R., Blaser, M. J., and Bonneau, R. A. (2015), “Sparse and compositionally robust inference of microbial ecological networks,” PLoS computational biology, 11, e1004226.
  • Lima-Mendez et al. (2015) Lima-Mendez, G., Faust, K., Henry, N., Decelle, J., Colin, S., Carcillo, F., Chaffron, S., Ignacio-Espinosa, J. C., Roux, S., Vincent, F., et al. (2015), “Determinants of community structure in the global plankton interactome,” Science, 348, 1262073.
  • Liu et al. (2010) Liu, H., Roeder, K., and Wasserman, L. (2010), “Stability approach to regularization selection (stars) for high dimensional graphical models,” in Advances in neural information processing systems, pp. 1432–1440.
  • Smith et al. (2017) Smith, P., Willemsen, D., Popkes, M., Metge, F., Gandiwa, E., Reichard, M., and Valenzano, D. R. (2017), “Regulation of life span by the gut microbiota in the short-lived African turquoise killifish,” elife, 6, e27014.
  • Tian et al. (2022) Tian, C., Jiang, D., Hammer, A., Sharpton, T., and Jiang, Y. (2022), “Compositional Graphical Lasso Resolves the Impact of Parasitic Infection on Gut Microbial Interaction Networks in a Zebrafish Model,” Under Review.
  • Xia et al. (2013) Xia, F., Chen, J., Fung, W. K., and Li, H. (2013), “A logistic normal multinomial regression model for microbiome compositional data analysis,” Biometrics, 69, 1053–1063.
  • Yuan et al. (2019) Yuan, H., He, S., and Deng, M. (2019), “Compositional data network analysis via lasso penalized D-trace loss,” Bioinformatics, 35, 3404–3411.
  • Yuan and Lin (2007) Yuan, M. and Lin, Y. (2007), “Model selection and estimation in the Gaussian graphical model,” Biometrika, 94, 19–35.