
Reference-Invariant Inverse Covariance Estimation with Application to Microbial Network Recovery

Chuan Tian, Department of Statistics, Oregon State University
Duo Jiang, Department of Statistics, Oregon State University
Yuan Jiang, Department of Statistics, Oregon State University

Yuan Jiang is the corresponding author. This research is supported in part by National Institutes of Health grant R01 GM126549.
Abstract

The interactions between microbial taxa in microbiome data have been of great research interest in the science community. In particular, several methods, such as SPIEC-EASI, gCoda, and CD-trace, have been proposed to model the conditional dependency between microbial taxa, in order to eliminate the detection of spurious correlations. However, all of these methods are built upon the central log-ratio (clr) transformation, which results in a degenerate covariance matrix and thus an undefined inverse covariance matrix as the estimate of the underlying network. Jiang et al. (2021) and Tian et al. (2022) proposed bias-corrected graphical lasso and compositional graphical lasso, respectively, based on the additive log-ratio (alr) transformation, which first selects a reference taxon and then computes the log ratios of the abundances of all other taxa with respect to that of the reference. One concern about the alr transformation is whether the estimated network is invariant with respect to the choice of reference. In this paper, we first establish the reference-invariance property of a subnetwork of interest based on the alr transformed data. Then, we propose a reference-invariant version of the compositional graphical lasso by modifying the penalty in its objective function, penalizing only the invariant subnetwork. We validate the reference-invariance property of the proposed method under a variety of simulation scenarios as well as through an application to an oceanic microbiome data set.

1 Introduction

The microbiome, the collection of micro-organisms in an ecological system, is of great research interest in the science community and has been shown to play an important role in influencing its host or living environment. For instance, it has been found that intervening in the gut microbiota of the African turquoise killifish can delay its aging process (Smith et al., 2017). The advancement of high-throughput sequencing technologies has enabled researchers to analyze microbial compositions in uncultivated samples; for example, 16S rRNA profiling replicates a specific marker-gene sequence, which is counted and serves as a proxy for the abundance of an Operational Taxonomic Unit (OTU, a surrogate for a bacterial species) in a sample.

One common goal in microbiome data analysis is to understand how microbes interact with each other. However, the nature of microbiome data, determined by the technicalities of the sequencing procedures, imposes various challenges in recovering microbial interactions. First, the data are composed of discrete counts. Second, microbiome data are entangled with "compositionality," a technicality imposed by the sequencing procedures. For instance, in 16S rRNA profiling, the "sequencing depth," i.e., the total count of sequences in a sample, is pre-determined on the sequencing instrument and usually differs from sample to sample. This implies that the counts for each OTU in a sample carry information only about relative abundances rather than absolute abundances. Third, microbiome data are high-dimensional in nature. With resolution at the OTU level, the number of OTU's is likely far larger than the number of samples in a biological experiment.

After one obtains the abundance data for the microbial species, marginal correlation analysis can be used to infer the interactions among microbes (Faust and Raes, 2012). Over the years, several methods have been developed to address the compositionality issue in the construction of correlation networks for microbiome data, such as SparCC (Friedman and Alm, 2012), CCLasso (Fang et al., 2015), and REBACCA (Ban et al., 2015). All of these methods aim to construct a covariance (correlation) matrix or network of the unknown absolute abundances (the positive, unconstrained true abundances of the taxa). The high-dimensionality of microbiome data is tackled by imposing a sparsity constraint in all three methods, via a correlation threshold for SparCC and an $L_{1}$-norm penalty for CCLasso and REBACCA.

All of the above methods are built upon the marginal correlations between two microbial taxa, and they can lead to spurious correlations caused by confounding factors such as other taxa in the same community. Alternatively, interactions among taxa can be modeled through their conditional dependencies given the other taxa, which eliminates the detection of spurious correlations. SPIEC-EASI was probably the first method proposed to estimate a sparse microbial network based on conditional dependency (Kurtz et al., 2015). It first performs a central log-ratio (clr) transformation on the observed counts (Aitchison, 1986) and then applies graphical lasso (Yuan and Lin, 2007; Banerjee et al., 2008; Friedman et al., 2008) to estimate the inverse covariance matrix of the transformed data. More recently, gCoda and CD-trace were developed to improve SPIEC-EASI by accounting for the compositionality of microbiome data (Fang et al., 2017; Yuan et al., 2019), and both have been shown to recover the sparse microbial network better than SPIEC-EASI.

It is worth noting that SPIEC-EASI, gCoda, and CD-trace are all built upon the clr transformation of the observed counts. Meanwhile, Jiang et al. (2021) and Tian et al. (2022) proposed bias-corrected graphical lasso and compositional graphical lasso, respectively, based on the additive log-ratio (alr) transformed data. In the alr transformation, one needs to select a reference taxon and compute the log relative abundances of all other taxa with respect to the reference. One of the major concerns for the alr transformation is the robustness, or invariance, of the resulting method with respect to the choice of the reference taxon, which has not been well studied in the literature.

In this paper, we first establish the reference-invariance property of estimating the sparse microbial network based on the alr transformed data. We show that the submatrix of the inverse covariance matrix corresponding to the non-candidate-reference taxa is invariant with respect to the choice of the reference. Then, we propose a reference-invariant version of the compositional graphical lasso by modifying the penalty in its objective function to penalize only the invariant submatrix mentioned above. Additionally, we illustrate the reference-invariance property of the proposed method under a variety of simulation scenarios and demonstrate its applicability and advantages through an application to an oceanic microbiome data set.

2 Methodology

2.1 Reference-Invariance Property

Let $\mathbf{p}=(p_{1},\ldots,p_{K+2})'$ denote a vector of compositional probabilities satisfying $p_{1}+\cdots+p_{K+2}=1$. The additive log-ratio (alr) transformation picks an entry of this vector as the reference and transforms the compositional vector using the log ratios of each entry to the reference. Without loss of generality, suppose we pick the last entry as the reference; then the alr transformed vector becomes

$$\mathbf{z}=\left[\log\left(\frac{p_{1}}{p_{K+2}}\right),\ldots,\log\left(\frac{p_{K}}{p_{K+2}}\right),\log\left(\frac{p_{K+1}}{p_{K+2}}\right)\right]'.$$

The transformed vector $\mathbf{z}$ is often assumed to follow a multivariate continuous distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$; for example, $\mathbf{z}\sim N(\boldsymbol{\mu},\boldsymbol{\Sigma})$. Denote further the inverse covariance matrix $\boldsymbol{\Omega}=\boldsymbol{\Sigma}^{-1}$.

Similarly, if we pick another entry as the reference, we can define another alr transformed vector. For simplicity of illustration, suppose we choose the second-to-last entry as the reference and consider the following alr transformation

$$\mathbf{z}_{p}=\left[\log\left(\frac{p_{1}}{p_{K+1}}\right),\ldots,\log\left(\frac{p_{K}}{p_{K+1}}\right),\log\left(\frac{p_{K+2}}{p_{K+1}}\right)\right]',$$

where the subscript $p$ denotes the "permuted" version of $\mathbf{z}$. Similarly, define the mean vector of $\mathbf{z}_{p}$ by $\boldsymbol{\mu}_{p}$, the covariance matrix by $\boldsymbol{\Sigma}_{p}$, and the inverse covariance matrix by $\boldsymbol{\Omega}_{p}=\boldsymbol{\Sigma}_{p}^{-1}$.

A simple derivation implies that $\mathbf{z}_{p}$ is a linear transformation of $\mathbf{z}$, namely $\mathbf{z}_{p}=\mathbf{Q}_{p}\mathbf{z}$, where

$$\mathbf{Q}_{p}=\begin{pmatrix}\mathbf{I}_{K}&-\mathbf{1}\\ \mathbf{0}'&-1\end{pmatrix}$$

with $\mathbf{I}_{K}$ denoting the identity matrix, and $\mathbf{0}$ and $-\mathbf{1}$ denoting the column vectors of all $0$'s and all $-1$'s, respectively. It follows that $\boldsymbol{\mu}_{p}=\mathbf{Q}_{p}\boldsymbol{\mu}$, $\boldsymbol{\Sigma}_{p}=\mathbf{Q}_{p}\boldsymbol{\Sigma}\mathbf{Q}_{p}'$, and $\boldsymbol{\Omega}_{p}=(\mathbf{Q}_{p}')^{-1}\boldsymbol{\Omega}\mathbf{Q}_{p}^{-1}$. It is also worth noting that $\mathbf{Q}_{p}$ is an involutory matrix, i.e., $\mathbf{Q}_{p}^{-1}=\mathbf{Q}_{p}$.
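These identities are easy to check numerically. Below is a minimal Python (NumPy) sketch, with $K$ and the compositional vector chosen arbitrarily for illustration; it verifies both $\mathbf{z}_{p}=\mathbf{Q}_{p}\mathbf{z}$ and the involutory property of $\mathbf{Q}_{p}$.

```python
# Minimal numerical check of z_p = Q_p z and Q_p^{-1} = Q_p.
import numpy as np

rng = np.random.default_rng(0)
K = 5
p = rng.dirichlet(np.ones(K + 2))             # a compositional vector summing to 1

z = np.log(p[:-1] / p[-1])                    # alr with the (K+2)-th entry as reference
z_p = np.log(np.delete(p, K) / p[K])          # alr with the (K+1)-th entry as reference

# Q_p as defined above: identity block, last column of -1's, last row (0', -1)
Q_p = np.zeros((K + 1, K + 1))
Q_p[:K, :K] = np.eye(K)
Q_p[:, K] = -1.0

assert np.allclose(z_p, Q_p @ z)              # z_p = Q_p z
assert np.allclose(Q_p @ Q_p, np.eye(K + 1))  # Q_p is involutory
```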

The following theorem states the reference-invariance property of the inverse covariance matrix $\boldsymbol{\Omega}$ under the alr transformation.

Theorem 1.

$\boldsymbol{\Omega}_{1:K,1:K}=\boldsymbol{\Omega}_{p,1:K,1:K}$, i.e., the $K\times K$ upper-left submatrix of the inverse covariance matrix of the alr transformed vector is invariant with respect to the choice of the $(K+2)$-th entry or the $(K+1)$-th entry as the reference.

Theorem 1 concerns the reference-invariance property of the true value of the inverse covariance matrix $\boldsymbol{\Omega}$. It can also be extended to a class of estimators of $\boldsymbol{\Omega}$. Suppose we have i.i.d. observations of the compositional vectors $\mathbf{p}_{1},\ldots,\mathbf{p}_{n}$ and, consequently, their alr transformed counterparts $\mathbf{z}_{1},\ldots,\mathbf{z}_{n}$. Then, we can construct an estimator of $\boldsymbol{\Sigma}$, denoted by $\hat{\boldsymbol{\Sigma}}$, based on the i.i.d. observations $\mathbf{z}_{1},\ldots,\mathbf{z}_{n}$. Furthermore, we can construct an estimator of $\boldsymbol{\Omega}$, denoted by $\hat{\boldsymbol{\Omega}}$, by taking its inverse or generalized inverse. The following corollary presents the reference-invariance property for a class of such estimators.

Corollary 1.

Suppose $\hat{\boldsymbol{\Sigma}}_{p}=\mathbf{Q}_{p}\hat{\boldsymbol{\Sigma}}\mathbf{Q}_{p}'$ and both $\hat{\boldsymbol{\Sigma}}_{p}$ and $\hat{\boldsymbol{\Sigma}}$ are invertible. Let $\hat{\boldsymbol{\Omega}}=\hat{\boldsymbol{\Sigma}}^{-1}$ and $\hat{\boldsymbol{\Omega}}_{p}=\hat{\boldsymbol{\Sigma}}_{p}^{-1}$ be their inverse matrices. Then $\hat{\boldsymbol{\Omega}}_{1:K,1:K}=\hat{\boldsymbol{\Omega}}_{p,1:K,1:K}$, i.e., the $K\times K$ upper-left submatrix of the estimated inverse covariance matrix of the alr transformed vector is invariant with respect to the choice of the $(K+2)$-th entry or the $(K+1)$-th entry as the reference.
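As an illustration, the sample covariance estimator satisfies the condition of Corollary 1, since the transformed observations are exactly $\mathbf{Q}_{p}\mathbf{z}_{i}$. The following sketch (sample size and dimension chosen arbitrarily) checks the corollary numerically:

```python
# Verify Corollary 1 for the sample covariance: the upper-left K x K blocks
# of the two inverted estimates coincide, although the full matrices differ.
import numpy as np

rng = np.random.default_rng(1)
n, K = 200, 5
P = rng.dirichlet(np.ones(K + 2), size=n)           # n compositional vectors

Z = np.log(P[:, :-1] / P[:, [-1]])                  # alr with reference K+2
Z_p = np.log(np.delete(P, K, axis=1) / P[:, [K]])   # alr with reference K+1

Omega_hat = np.linalg.inv(np.cov(Z, rowvar=False))
Omega_hat_p = np.linalg.inv(np.cov(Z_p, rowvar=False))

assert np.allclose(Omega_hat[:K, :K], Omega_hat_p[:K, :K])
```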

The above results imply an important property of the additive log-ratio transformation in compositional data analysis. They can be extended to a more general situation as follows. Suppose we have selected a set of entries $\mathbf{p}_{\mathcal{R}}$ as "candidate references" in a compositional vector $\mathbf{p}$ and write $\mathbf{p}=(\mathbf{p}_{\mathcal{R}^{c}}',\mathbf{p}_{\mathcal{R}}')'$. Then, for any alr transformed vector $\mathbf{z}$ based on a reference in the set of candidate references $\mathbf{p}_{\mathcal{R}}$, the $|\mathcal{R}^{c}|\times|\mathcal{R}^{c}|$ upper-left submatrix of the (estimated) inverse covariance matrix of $\mathbf{z}$ is invariant with respect to the choice of the reference.

In the following subsections, we will incorporate the reference-invariance property into the estimation of a sparse inverse covariance matrix for compositional count data, such as the OTU abundance data in microbiome research.

2.2 Logistic Normal Multinomial Model

Consider an OTU abundance data set with $n$ independent samples, each consisting of observed counts of $K+2$ taxa, denoted by $\mathbf{x}_{i}=(x_{i,1},\ldots,x_{i,K+2})'$ for the $i$-th sample, $i=1,\ldots,n$. Due to the compositional property of the data, the sum of all counts for each sample $i$ is a fixed number, denoted by $M_{i}$. Naturally, a multinomial distribution is imposed on the observed counts as

$$\mathbf{x}_{i}\,|\,\mathbf{p}_{i}\sim\text{Multinomial}(M_{i},\mathbf{p}_{i}), \qquad (1)$$

where $\mathbf{p}_{i}=(p_{i,1},\ldots,p_{i,K+2})'$ are the multinomial probabilities with $\sum_{k=1}^{K+2}p_{i,k}=1$.

In addition, we choose one taxon, without loss of generality the $(K+2)$-th taxon, as the reference and then apply the alr transformation (Aitchison, 1986) to the multinomial probabilities as follows:

$$\mathbf{z}_{i}=\left[\log\left(\frac{p_{i,1}}{p_{i,K+2}}\right),\ldots,\log\left(\frac{p_{i,K}}{p_{i,K+2}}\right),\log\left(\frac{p_{i,K+1}}{p_{i,K+2}}\right)\right]',\quad i=1,\ldots,n. \qquad (2)$$

Further assume that the $\mathbf{z}_{i}$'s follow an i.i.d. multivariate normal distribution

$$\mathbf{z}_{i}\stackrel{iid}{\sim}N(\boldsymbol{\mu},\boldsymbol{\Sigma}),\quad i=1,\ldots,n, \qquad (3)$$

where $\boldsymbol{\mu}$ is the mean, $\boldsymbol{\Sigma}$ is the covariance matrix, and $\boldsymbol{\Omega}=\boldsymbol{\Sigma}^{-1}$ is the inverse covariance matrix. The above model in (1)–(3) is called a logistic normal multinomial model and has been applied to analyze microbiome abundance data (Xia et al., 2013).

Tian et al. (2022) proposed a method called compositional graphical lasso that aims to find a sparse estimator of the inverse covariance matrix $\boldsymbol{\Omega}$, in which the following objective function is minimized:

$$\ell(\mathbf{z}_{1},\ldots,\mathbf{z}_{n},\boldsymbol{\mu},\boldsymbol{\Omega})=-\frac{1}{n}\sum_{i=1}^{n}\left[\mathbf{x}_{i,-(K+2)}'\mathbf{z}_{i}-M_{i}\log\{\mathbf{1}'\exp(\mathbf{z}_{i})+1\}\right]-\frac{1}{2}\log\det(\boldsymbol{\Omega})+\frac{1}{2n}\sum_{i=1}^{n}(\mathbf{z}_{i}-\boldsymbol{\mu})'\boldsymbol{\Omega}(\mathbf{z}_{i}-\boldsymbol{\mu})+\lambda\|\boldsymbol{\Omega}\|_{1}, \qquad (4)$$

where $\mathbf{x}_{i,-(K+2)}=(x_{i,1},\ldots,x_{i,K+1})'$ and $\mathbf{1}=(1,\ldots,1)'$. The above objective function has two parts: the first term in (4) is the negative log-likelihood of the multinomial distribution in (1), and the remaining terms are the regular objective function of graphical lasso for the multivariate normal distribution in (3), regarding $\mathbf{z}_{1},\ldots,\mathbf{z}_{n}$ as known quantities.
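As a concrete illustration, (4) can be transcribed directly into Python; the sketch below only evaluates the objective (the function and variable names are ours) and is not the optimization routine of Tian et al. (2022).

```python
import numpy as np
from scipy.special import logsumexp

def comp_glasso_objective(X, Z, mu, Omega, lam):
    """Evaluate (4). X: n x (K+2) counts with the reference taxon last;
    Z: n x (K+1) latent alr vectors; mu: (K+1,); Omega: (K+1) x (K+1)."""
    n = X.shape[0]
    M = X.sum(axis=1)                             # sequencing depths M_i
    # log(1' exp(z_i) + 1) computed stably as logsumexp over (z_i, 0)
    lse = logsumexp(np.hstack([Z, np.zeros((n, 1))]), axis=1)
    multinom = np.sum(X[:, :-1] * Z) - np.sum(M * lse)
    R = Z - mu
    _, logdet = np.linalg.slogdet(Omega)          # Omega assumed positive definite
    quad = np.einsum('ij,jk,ik->', R, Omega, R)   # sum_i (z_i - mu)' Omega (z_i - mu)
    return -multinom / n - 0.5 * logdet + quad / (2 * n) + lam * np.abs(Omega).sum()
```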

2.3 Reference-Invariant Objective Function

Similar to Section 2.1, if we choose another taxon, for simplicity of illustration the $(K+1)$-th taxon, as the reference, then the alr transformation in (2) becomes

$$\mathbf{z}_{i,p}=\left[\log\left(\frac{p_{i,1}}{p_{i,K+1}}\right),\ldots,\log\left(\frac{p_{i,K}}{p_{i,K+1}}\right),\log\left(\frac{p_{i,K+2}}{p_{i,K+1}}\right)\right]'.$$

As in Section 2.1, $\mathbf{z}_{i,p}=\mathbf{Q}_{p}\mathbf{z}_{i}$. Therefore, $\mathbf{z}_{i,p}\stackrel{iid}{\sim}N(\boldsymbol{\mu}_{p},\boldsymbol{\Sigma}_{p})$, $i=1,\ldots,n$, where $\boldsymbol{\mu}_{p}=\mathbf{Q}_{p}\boldsymbol{\mu}$, $\boldsymbol{\Sigma}_{p}=\mathbf{Q}_{p}\boldsymbol{\Sigma}\mathbf{Q}_{p}'$, and $\boldsymbol{\Omega}_{p}=(\mathbf{Q}_{p}')^{-1}\boldsymbol{\Omega}\mathbf{Q}_{p}^{-1}$. The reference-invariance property in Section 2.1 implies that $\boldsymbol{\Omega}_{1:K,1:K}=\boldsymbol{\Omega}_{p,1:K,1:K}$.

The different choice of the reference also leads to a different objective function for the compositional graphical lasso method (Comp-gLASSO):

$$\ell_{p}(\mathbf{z}_{1,p},\ldots,\mathbf{z}_{n,p},\boldsymbol{\mu}_{p},\boldsymbol{\Omega}_{p})=-\frac{1}{n}\sum_{i=1}^{n}\left[\mathbf{x}_{i,-(K+1)}'\mathbf{z}_{i,p}-M_{i}\log\{\mathbf{1}'\exp(\mathbf{z}_{i,p})+1\}\right]-\frac{1}{2}\log\det(\boldsymbol{\Omega}_{p})+\frac{1}{2n}\sum_{i=1}^{n}(\mathbf{z}_{i,p}-\boldsymbol{\mu}_{p})'\boldsymbol{\Omega}_{p}(\mathbf{z}_{i,p}-\boldsymbol{\mu}_{p})+\lambda\|\boldsymbol{\Omega}_{p}\|_{1}, \qquad (5)$$

where $\mathbf{x}_{i,-(K+1)}=(x_{i,1},\ldots,x_{i,K},x_{i,K+2})'$ and $\mathbf{1}=(1,\ldots,1)'$. Comparing (4) and (5), their first terms are the same, as both equal the negative log-likelihood of the multinomial distribution, $-\frac{1}{n}\sum_{i=1}^{n}\sum_{k=1}^{K+2}x_{i,k}\log p_{i,k}$. In addition, from Aitchison (1986), $\det(\boldsymbol{\Omega})=\det(\boldsymbol{\Omega}_{p})$ and $\sum_{i=1}^{n}(\mathbf{z}_{i}-\boldsymbol{\mu})'\boldsymbol{\Omega}(\mathbf{z}_{i}-\boldsymbol{\mu})=\sum_{i=1}^{n}(\mathbf{z}_{i,p}-\boldsymbol{\mu}_{p})'\boldsymbol{\Omega}_{p}(\mathbf{z}_{i,p}-\boldsymbol{\mu}_{p})$ as known properties of the alr transformation. However, the $L_{1}$ penalties in (4) and (5) differ because $\boldsymbol{\Omega}$ is not necessarily equal to $\boldsymbol{\Omega}_{p}$; the reference-invariance property only implies that $\boldsymbol{\Omega}_{1:K,1:K}=\boldsymbol{\Omega}_{p,1:K,1:K}$.
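Both invariance identities are easy to check numerically; the determinant identity holds because $\det(\mathbf{Q}_{p})=-1$. A small sketch with an arbitrary positive definite $\boldsymbol{\Omega}$:

```python
# Check det(Omega) = det(Omega_p) and the equality of the quadratic forms.
import numpy as np

rng = np.random.default_rng(2)
K = 4
A = rng.standard_normal((K + 1, K + 1))
Omega = A @ A.T + (K + 1) * np.eye(K + 1)   # an arbitrary positive definite Omega

Q_p = np.zeros((K + 1, K + 1))
Q_p[:K, :K] = np.eye(K)
Q_p[:, K] = -1.0

Omega_p = Q_p.T @ Omega @ Q_p               # equals (Q_p')^{-1} Omega Q_p^{-1}, as Q_p^{-1} = Q_p
assert np.isclose(np.linalg.det(Omega), np.linalg.det(Omega_p))

z = rng.standard_normal(K + 1)
mu = rng.standard_normal(K + 1)
z_p, mu_p = Q_p @ z, Q_p @ mu
assert np.isclose((z - mu) @ Omega @ (z - mu),
                  (z_p - mu_p) @ Omega_p @ (z_p - mu_p))
```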

Motivated by the reference-invariance property, we can impose the $L_{1}$ penalty only on the invariant entries of $\boldsymbol{\Omega}$ instead of all entries of $\boldsymbol{\Omega}$ as in (4), which leads to

$$\ell_{inv}(\mathbf{z}_{1},\ldots,\mathbf{z}_{n},\boldsymbol{\mu},\boldsymbol{\Omega})=-\frac{1}{n}\sum_{i=1}^{n}\left[\mathbf{x}_{i,-(K+2)}'\mathbf{z}_{i}-M_{i}\log\{\mathbf{1}'\exp(\mathbf{z}_{i})+1\}\right]-\frac{1}{2}\log\det(\boldsymbol{\Omega})+\frac{1}{2n}\sum_{i=1}^{n}(\mathbf{z}_{i}-\boldsymbol{\mu})'\boldsymbol{\Omega}(\mathbf{z}_{i}-\boldsymbol{\mu})+\lambda\|\boldsymbol{\Omega}_{1:K,1:K}\|_{1}. \qquad (6)$$

By the previous arguments, $\ell_{inv}(\mathbf{z}_{1},\ldots,\mathbf{z}_{n},\boldsymbol{\mu},\boldsymbol{\Omega})$ is reference-invariant; in other words, the objective function $\ell_{inv}$ stays the same regardless of whether the $(K+1)$-th or the $(K+2)$-th taxon is selected as the reference. This is summarized in the following theorem.

Theorem 2.

If $\mathbf{z}_{i,p}=\mathbf{Q}_{p}\mathbf{z}_{i}$ for $i=1,\ldots,n$, $\boldsymbol{\mu}_{p}=\mathbf{Q}_{p}\boldsymbol{\mu}$, and $\boldsymbol{\Omega}_{p}=(\mathbf{Q}_{p}')^{-1}\boldsymbol{\Omega}\mathbf{Q}_{p}^{-1}$, then $\ell_{inv}(\mathbf{z}_{1},\ldots,\mathbf{z}_{n},\boldsymbol{\mu},\boldsymbol{\Omega})=\ell_{inv,p}(\mathbf{z}_{1,p},\ldots,\mathbf{z}_{n,p},\boldsymbol{\mu}_{p},\boldsymbol{\Omega}_{p})$.

We call $\ell_{inv}(\mathbf{z}_{1},\ldots,\mathbf{z}_{n},\boldsymbol{\mu},\boldsymbol{\Omega})$ in (6) the reference-invariant compositional graphical lasso objective function. To obtain a sparse estimator of $\boldsymbol{\Omega}$, we minimize $\ell_{inv}(\mathbf{z}_{1},\ldots,\mathbf{z}_{n},\boldsymbol{\mu},\boldsymbol{\Omega})$ with respect to $\mathbf{z}_{1},\ldots,\mathbf{z}_{n}$, $\boldsymbol{\mu}$, and $\boldsymbol{\Omega}$. We call this estimation approach the reference-invariant compositional graphical lasso (Inv-Comp-gLASSO) method.

In general, suppose we have selected a set of taxa $\mathbf{x}_{\mathcal{R}}$ as "candidate references" and write $\mathbf{x}=(\mathbf{x}_{\mathcal{R}^{c}}',\mathbf{x}_{\mathcal{R}}')'$. Then the reference-invariant objective function becomes

$$\ell_{inv}(\mathbf{z}_{1},\ldots,\mathbf{z}_{n},\boldsymbol{\mu},\boldsymbol{\Omega})=-\frac{1}{n}\sum_{i=1}^{n}\left[\mathbf{x}_{i,\mathcal{R}^{c}}'\mathbf{z}_{i}-M_{i}\log\{\mathbf{1}'\exp(\mathbf{z}_{i})+1\}\right]-\frac{1}{2}\log\det(\boldsymbol{\Omega})+\frac{1}{2n}\sum_{i=1}^{n}(\mathbf{z}_{i}-\boldsymbol{\mu})'\boldsymbol{\Omega}(\mathbf{z}_{i}-\boldsymbol{\mu})+\lambda\|\boldsymbol{\Omega}_{\mathcal{R}^{c},\mathcal{R}^{c}}\|_{1}. \qquad (7)$$

In other words, (7) is invariant regardless of which reference is selected from the set of candidate references, and so is the invariant part of its minimizer.

It is noteworthy that the trick in defining the reference-invariant version of Comp-gLASSO is to revise the penalty term from the regular lasso penalty on the whole inverse covariance matrix to one on only the invariant part of the inverse covariance matrix. Using the same trick, we can define reference-invariant versions of other methods, such as a reference-invariant graphical lasso (Inv-gLASSO). The objective function of Inv-gLASSO is defined as follows when $\mathbf{z}_{1},\ldots,\mathbf{z}_{n}$ are observed instead of $\mathbf{x}_{1},\ldots,\mathbf{x}_{n}$:

$$\ell_{inv}(\boldsymbol{\mu},\boldsymbol{\Omega})=-\frac{1}{2}\log\det(\boldsymbol{\Omega})+\frac{1}{2n}\sum_{i=1}^{n}(\mathbf{z}_{i}-\boldsymbol{\mu})'\boldsymbol{\Omega}(\mathbf{z}_{i}-\boldsymbol{\mu})+\lambda\|\boldsymbol{\Omega}_{\mathcal{R}^{c},\mathcal{R}^{c}}\|_{1}. \qquad (8)$$
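One convenient way to handle the modified penalty is through a graphical lasso solver that accepts an entrywise penalty matrix: setting the penalty weights to zero on every row and column indexed by a candidate reference reproduces the penalty in (7) and (8). The sketch below is a standard ADMM solver of this form, not the authors' implementation; it minimizes $-\log\det(\boldsymbol{\Omega})+\mathrm{tr}(\mathbf{S}\boldsymbol{\Omega})+\|\mathbf{W}\circ\boldsymbol{\Omega}\|_{1}$, which matches (8) up to an overall factor of 2 when $\mathbf{W}$ equals $2\lambda$ on the invariant block and 0 elsewhere.

```python
# ADMM graphical lasso with an entrywise penalty matrix W (a sketch, not the
# authors' algorithm). rho, n_iter, and tol are arbitrary defaults of ours.
import numpy as np

def glasso_admm(S, W, rho=1.0, n_iter=500, tol=1e-6):
    d = S.shape[0]
    Z = np.eye(d)
    U = np.zeros((d, d))
    for _ in range(n_iter):
        # Omega-step: closed-form solve of the smooth subproblem by eigendecomposition
        vals, vecs = np.linalg.eigh(rho * (Z - U) - S)
        omega_vals = (vals + np.sqrt(vals**2 + 4 * rho)) / (2 * rho)
        Omega = vecs @ np.diag(omega_vals) @ vecs.T
        # Z-step: entrywise soft-thresholding with thresholds W / rho
        Z_old = Z
        A = Omega + U
        Z = np.sign(A) * np.maximum(np.abs(A) - W / rho, 0.0)
        U += Omega - Z
        if np.linalg.norm(Z - Z_old) < tol * max(1.0, np.linalg.norm(Z_old)):
            break
    return Z

def invariant_weights(d, K, lam):
    """Penalty weights for (8): only the upper-left K x K block (the
    non-candidate taxa) is penalized; factor 2 accounts for the (1/2) scaling."""
    W = np.zeros((d, d))
    W[:K, :K] = 2 * lam
    return W
```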

The objective function (7) naturally involves three sets of parameters, $(\mathbf{z}_{1},\ldots,\mathbf{z}_{n})$, $\boldsymbol{\mu}$, and $\boldsymbol{\Omega}$, which motivates us to apply a block coordinate descent algorithm. A block coordinate descent algorithm minimizes the objective function iteratively over each set of parameters given the others. Given the initial values $(\mathbf{z}_{1}^{(0)},\ldots,\mathbf{z}_{n}^{(0)})$, $\boldsymbol{\mu}^{(0)}$, and $\boldsymbol{\Omega}^{(0)}$, the algorithm repeats the following steps cyclically for iterations $t=0,1,2,\ldots$ until convergence.

1. Given $\boldsymbol{\mu}^{(t)}$ and $\boldsymbol{\Omega}^{(t)}$, find $(\mathbf{z}_{1}^{(t+1)},\ldots,\mathbf{z}_{n}^{(t+1)})$ that minimizes (7).

2. Given $(\mathbf{z}_{1}^{(t+1)},\ldots,\mathbf{z}_{n}^{(t+1)})$ and $\boldsymbol{\Omega}^{(t)}$, find $\boldsymbol{\mu}^{(t+1)}$ that minimizes (7).

3. Given $(\mathbf{z}_{1}^{(t+1)},\ldots,\mathbf{z}_{n}^{(t+1)})$ and $\boldsymbol{\mu}^{(t+1)}$, find $\boldsymbol{\Omega}^{(t+1)}$ that minimizes (7).

Except for the mild modification of the objective function, the details of the algorithm are essentially the same as those described in Tian et al. (2022).
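A compact skeleton of this block coordinate descent is sketched below, reusing the glasso_admm and invariant_weights functions from the previous sketch. It departs from the authors' algorithm in two labeled ways: the z-step uses a generic quasi-Newton solver rather than Newton-Raphson, and the latent z's are initialized from shifted count ratios, which is our own simplification.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

def z_step(x_minus_ref, M_i, mu, Omega, z0):
    """Minimize the per-sample piece of (7) over z_i (quasi-Newton sketch)."""
    def f(z):
        return -(x_minus_ref @ z - M_i * logsumexp(np.append(z, 0.0))) \
               + 0.5 * (z - mu) @ Omega @ (z - mu)
    def grad(z):
        p = np.exp(z - logsumexp(np.append(z, 0.0)))  # first K+1 softmax probabilities
        return -(x_minus_ref - M_i * p) + Omega @ (z - mu)
    return minimize(f, z0, jac=grad, method='L-BFGS-B').x

def inv_comp_glasso(X, K, lam, n_outer=20):
    n, d = X.shape[0], X.shape[1] - 1                # d = K + 1 alr coordinates
    M = X.sum(axis=1)
    Z = np.log((X[:, :-1] + 0.5) / (X[:, [-1]] + 0.5))  # crude initial z's (ours)
    mu, Omega = Z.mean(axis=0), np.eye(d)
    W = invariant_weights(d, K, lam)                 # penalize only the invariant block
    for _ in range(n_outer):
        Z = np.vstack([z_step(X[i, :-1], M[i], mu, Omega, Z[i])
                       for i in range(n)])           # step 1: z-step
        mu = Z.mean(axis=0)                          # step 2: mu-step (closed form)
        S = np.cov(Z, rowvar=False, bias=True)
        Omega = glasso_admm(S, W)                    # step 3: Omega-step
    return Omega
```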

3 Simulation Study

3.1 Settings

To illustrate the reference-invariance property under the aforementioned framework, we conduct a simulation study and evaluate the performance of Inv-Comp-gLASSO as well as Inv-gLASSO.

Following Tian et al. (2022), we generate three types of inverse covariance matrices $\boldsymbol{\Omega}=(\omega_{kl})_{1\leq k,l\leq K+1}$ as follows (a sketch of generating these matrices is given after the list):

1. Chain: $\omega_{kk}=1.5$, $\omega_{kl}=0.5$ if $|k-l|=1$, and $\omega_{kl}=0$ if $|k-l|>1$. Every node is connected to its adjacent node(s), and therefore the degree is 2 for all but two nodes.

2. Random: $\omega_{kl}=1$ with probability $3/(K+1)$ for $k\neq l$. Every two nodes are connected with a fixed probability, and the expected degree is the same for all nodes.

3. Hub: Nodes are randomly partitioned into $\lceil(K+1)/20\rceil$ groups, with one "hub node" in each group. The other nodes in a group are connected only to the hub node and not to each other, and there are no connections between groups. The degree is much higher for the hub nodes and is 1 for the rest of the nodes.
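Hedged sketches of the three generators follow. The chain matrix is fully specified above; for the random and hub graphs, the diagonal inflation used to guarantee positive definiteness is our own filling-in, not necessarily the exact recipe of Tian et al. (2022).

```python
import numpy as np

def chain_precision(d):
    Omega = 1.5 * np.eye(d)                       # omega_kk = 1.5
    idx = np.arange(d - 1)
    Omega[idx, idx + 1] = Omega[idx + 1, idx] = 0.5  # omega_kl = 0.5 if |k-l| = 1
    return Omega

def random_precision(d, rng):
    A = np.triu(rng.random((d, d)) < 3.0 / d, k=1).astype(float)  # edges w.p. 3/(K+1)
    Omega = A + A.T
    Omega += np.eye(d) * (np.abs(np.linalg.eigvalsh(Omega)).max() + 0.1)  # ensure PD (ours)
    return Omega

def hub_precision(d, rng, group=20):
    Omega = np.zeros((d, d))
    for g in np.array_split(rng.permutation(d), int(np.ceil(d / group))):
        hub, rest = g[0], g[1:]                   # first node of each group is the hub
        Omega[hub, rest] = Omega[rest, hub] = 1.0
    Omega += np.eye(d) * (np.abs(np.linalg.eigvalsh(Omega)).max() + 0.1)  # ensure PD (ours)
    return Omega
```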

In the simulations, we also vary two other factors that play a crucial role in the performance of the methods:

1. Sequencing depth: the $M_{i}$'s are simulated from $\text{Uniform}(20\text{K},40\text{K})$ or $\text{Uniform}(100\text{K},200\text{K})$, denoted as "low" and "high" sequencing depth, respectively.

2. Compositional variation: each aforementioned inverse covariance matrix $\boldsymbol{\Omega}$ ("low" compositional variation) is also divided by a factor of 5 to obtain another set of inverse covariance matrices, i.e., $\boldsymbol{\Omega}/5$ ("high" compositional variation).

The data are simulated from the logistic normal multinomial distribution in (1)–(3). In detail, $\mathbf{z}_{i}\sim N(\boldsymbol{\mu},\boldsymbol{\Sigma})$ are first generated independently for $i=1,\ldots,n$; then, the softmax transformation (the inverse of the alr transformation) is applied to obtain the multinomial probabilities $\mathbf{p}_{i}$, with the $(K+2)$-th entry serving as the true reference; last, the multinomial counts $\mathbf{x}_{i}$ are simulated from $\text{Multinomial}(M_{i},\mathbf{p}_{i})$, $i=1,\ldots,n$. We set $n=100$ and $K=49$ throughout the simulations.
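This generation step can be sketched as follows, reusing chain_precision from the sketch above. The zero mean vector and the reading of Uniform(20K, 40K) as draws between 20,000 and 40,000 are our assumptions.

```python
# Simulate one data set from (1)-(3): z_i ~ N(mu, Sigma), softmax, multinomial.
import numpy as np

rng = np.random.default_rng(3)
n, K = 100, 49
Sigma = np.linalg.inv(chain_precision(K + 1))
mu = np.zeros(K + 1)                                # mean not specified in the text; assumed 0

Z = rng.multivariate_normal(mu, Sigma, size=n)      # latent alr vectors
expZ = np.hstack([np.exp(Z), np.ones((n, 1))])      # softmax = inverse alr, reference last
P = expZ / expZ.sum(axis=1, keepdims=True)          # multinomial probabilities p_i
M = rng.integers(20_000, 40_000, size=n)            # "low" sequencing depth (assumed scale)
X = np.vstack([rng.multinomial(M[i], P[i]) for i in range(n)])
```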

The simulation results are based on 100 replicates of the simulated data. Both Inv-Comp-gLASSO and Inv-gLASSO are applied with two choices of reference, the $(K+1)$-th entry serving as the false reference and the $(K+2)$-th entry serving as the true reference, and only the reference-invariant subnetwork $\boldsymbol{\Omega}_{1:K,1:K}$ is used in the evaluations. For Inv-gLASSO, we estimate $\mathbf{p}_{1},\ldots,\mathbf{p}_{n}$ with $\mathbf{x}_{1}/M_{1},\ldots,\mathbf{x}_{n}/M_{n}$ and perform the alr transformation to obtain estimates of $\mathbf{z}_{1},\ldots,\mathbf{z}_{n}$, denoted by $\tilde{\mathbf{z}}_{1},\ldots,\tilde{\mathbf{z}}_{n}$. We then apply Inv-gLASSO to $\tilde{\mathbf{z}}_{1},\ldots,\tilde{\mathbf{z}}_{n}$ directly to estimate the inverse covariance matrix $\boldsymbol{\Omega}$, which also serves as the starting value for Inv-Comp-gLASSO. We implement both methods with a common sequence of 70 tuning parameters $\lambda$.

We empirically validate the invariance property of Inv-Comp-gLASSO and Inv-gLASSO by comparing, for each method, the estimators of the subnetwork $\boldsymbol{\Omega}_{1:K,1:K}$ resulting from choosing the true and the false reference, which have been shown to be theoretically invariant in Section 2. The comparison is assessed under the four criteria below; Python sketches of these criteria are given after the list.

1. Normalized Manhattan Similarity

For two matrices $\mathbf{A}$ and $\mathbf{B}$, we define the normalized Manhattan similarity (NMS) as

$$\text{NMS}(\mathbf{A},\mathbf{B})=1-\frac{\|\mathbf{A}-\mathbf{B}\|_{1}}{\|\mathbf{A}\|_{1}+\|\mathbf{B}\|_{1}},$$

where $\|\cdot\|_{1}$ represents the entrywise $L_{1}$ norm of a matrix. Note that $0\leq\text{NMS}\leq 1$ due to the non-negativity of norms and the triangle inequality.

2. Jaccard Index

For two networks with the same nodes, denote their edge sets by $\mathcal{A}$ and $\mathcal{B}$. Then the Jaccard index (Jaccard, 1901) is defined as

$$J(\mathcal{A},\mathcal{B})=\frac{|\mathcal{A}\cap\mathcal{B}|}{|\mathcal{A}\cup\mathcal{B}|}.$$

Obviously, it also holds that $0\leq J(\mathcal{A},\mathcal{B})\leq 1$.

3. Normalized Hamming Similarity

In the context of network comparison, the normalized Hamming similarity between two adjacency matrices $\mathbf{A}$ and $\mathbf{B}$ on the same $N$ nodes is defined as (Hamming, 1950)

$$H(\mathbf{A},\mathbf{B})=1-\frac{\|\mathbf{A}-\mathbf{B}\|_{1}}{N(N-1)},$$

where $\|\cdot\|_{1}$ denotes the entrywise $L_{1}$ norm of a matrix. Since there are at most $N(N-1)$ 1's in an adjacency matrix with $N$ nodes, this metric is also between 0 and 1.

4. ROC Curve

ROC curves, which plot the true positive rate (TPR) against the false positive rate (FPR), are compared between the choices of true and false reference.
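Under the assumption that an edge corresponds to a nonzero off-diagonal entry of the estimated precision matrix (with an arbitrary numerical tolerance of ours), the first three criteria can be sketched as:

```python
import numpy as np

def nms(A, B):
    """Normalized Manhattan similarity between two matrices."""
    return 1 - np.abs(A - B).sum() / (np.abs(A).sum() + np.abs(B).sum())

def adjacency(Omega, tol=1e-10):
    """Off-diagonal support of a precision matrix as a 0/1 adjacency matrix."""
    A = (np.abs(Omega) > tol).astype(int)
    np.fill_diagonal(A, 0)
    return A

def jaccard(Omega1, Omega2):
    A, B = adjacency(Omega1), adjacency(Omega2)
    inter, union = np.sum(A & B), np.sum(A | B)   # each edge counted twice; ratio unaffected
    return inter / union if union > 0 else np.nan  # undefined for two empty graphs

def hamming_similarity(Omega1, Omega2):
    A, B = adjacency(Omega1), adjacency(Omega2)
    N = A.shape[0]
    return 1 - np.abs(A - B).sum() / (N * (N - 1))
```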

3.2 Results

3.2.1 Normalized Manhattan Similarity

The normalized Manhattan similarity serves as a direct measure of the similarity between the two estimated inverse covariance matrices with different choices of the reference. On a sequence of 70 tuning parameters $\lambda$, we calculated the normalized Manhattan similarity between the two estimated inverse covariance matrices with the true and false references. Figure 1 shows the average normalized Manhattan similarity over 100 replicates along with standard error bars.

Figure 1: Normalized Manhattan similarity between the two estimated inverse covariance matrices with true and false references. Solid blue: Inv-Comp-gLASSO; dashed red: Inv-gLASSO. h,h: high sequencing depth, high compositional variation; h,l: high sequencing depth, low compositional variation; l,h: low sequencing depth, high compositional variation; l,l: low sequencing depth, low compositional variation.

We can see from Figure 1 that the normalized Manhattan similarity for Inv-gLASSO stays close to 1 in all settings throughout the whole sequence of tuning parameters. On the other hand, there are some fluctuations in the same metric for Inv-Comp-gLASSO, although most values stay higher than 0.9. Empirically, the two estimated matrices are numerically identical for Inv-gLASSO and close for Inv-Comp-gLASSO. A potential reason why the invariance of Inv-gLASSO is numerically more evident than that of Inv-Comp-gLASSO is that Inv-gLASSO solves a convex optimization problem while the Inv-Comp-gLASSO objective is not necessarily convex (Tian et al., 2022). With different starting points (as we choose different references), Inv-Comp-gLASSO might arrive at different solutions, as the algorithm is only guaranteed to converge to a stationary point. We refer to Tian et al. (2022) for a more detailed discussion of the convexity and the convergence of the algorithm.

In addition, it is consistently observed that the normalized Manhattan similarity for Inv-Comp-gLASSO starts close to 1 when $\lambda$ is very large and gradually decreases, with some fluctuations, as $\lambda$ decreases. This is because the Inv-Comp-gLASSO objective function in (6) is solved by an algorithm iterating between graphical lasso to estimate $(\boldsymbol{\mu},\boldsymbol{\Omega})$ and Newton-Raphson to estimate $\mathbf{z}_{1},\ldots,\mathbf{z}_{n}$, which can accumulate numerical errors depending on the number of iterations. Furthermore, the algorithm is implemented with warm starts for a sequence of decreasing $\lambda$'s, i.e., the solution for the previous $\lambda$ value is used as the starting point for the current $\lambda$ value. With the accumulation of numerical errors, the numerical difference between the two estimated matrices becomes larger.

Among the simulation settings, we find the invariance property of Inv-Comp-gLASSO is most evidently supported by the numerical results in the "high sequencing depth, low compositional variation" setting, regardless of the network type: the normalized Manhattan similarity is very close to 1 for Inv-Comp-gLASSO throughout the sequence of tuning parameters. This is because the compositional probabilities $\mathbf{p}_{i}$, and thus the $\mathbf{z}_{i}$'s, are estimated accurately under high sequencing depth and low compositional variation in the first iteration of the Inv-Comp-gLASSO algorithm, which implies fewer iterations for the algorithm to converge and less numerical error accumulated in the process. On the other hand, the normalized Manhattan similarity is lowest in the "low sequencing depth, high compositional variation" setting, for a similar reason: it takes more iterations for the Inv-Comp-gLASSO algorithm to converge, accumulating more numerical errors. However, it is noteworthy that this is exactly the setting in which Inv-Comp-gLASSO has the greatest advantage over Inv-gLASSO in recovering the true network (see Section 3.2.3 for their ROC curves).

3.2.2 Jaccard Index and Normalized Hamming Similarity

Compared to the normalized Manhattan similarity, which directly measures the similarity between two inverse covariance matrices, both the Jaccard index and the normalized Hamming similarity measure the similarity between the two networks represented by the matrices, because they only compare the adjacency matrices, or the edges, of the two networks. Again, on a sequence of 70 tuning parameters $\lambda$, we computed the Jaccard index and the normalized Hamming similarity between the two networks with true and false references. Figures 2 and 3 plot the average Jaccard index and normalized Hamming similarity over 100 replicates along with standard error bars.

Figure 2: Jaccard index between the two networks with true and false references. Solid blue: Inv-Comp-gLASSO; dashed red: Inv-gLASSO. h,h: high sequencing depth, high compositional variation; h,l: high sequencing depth, low compositional variation; l,h: low sequencing depth, high compositional variation; l,l: low sequencing depth, low compositional variation.
Figure 3: Normalized Hamming similarity between the two networks with true and false references. Solid blue: Inv-Comp-gLASSO; dashed red: Inv-gLASSO. h,h: high sequencing depth, high compositional variation; h,l: high sequencing depth, low compositional variation; l,h: low sequencing depth, high compositional variation; l,l: low sequencing depth, low compositional variation.

The results for the normalized Hamming similarity in Figure 3 show a fairly similar pattern to those for the normalized Manhattan similarity in Figure 1 and can be interpreted similarly, so we focus here on the results for the Jaccard index in Figure 2. The Jaccard index stays close to 1 for Inv-gLASSO, implying that the two networks are identical. Although the results of the Jaccard index for Inv-Comp-gLASSO look quite different from the other measures at first glance, they actually imply a similar conclusion. First, we notice that the Jaccard index does not exist for the first few tuning parameters, which are large enough that the resultant network is empty with either the true or the false reference. Although two empty networks agree with each other perfectly, the Jaccard index is not well defined in this case. Then, as Inv-Comp-gLASSO starts to pick up edges when $\lambda$ decreases, the Jaccard index is quite low in some settings, suggesting that the two networks are dissimilar. However, this is because the Jaccard index is a much "stricter" similarity measure than the Hamming similarity. For example, for two networks with 100 possible edges, if each network has exactly one edge but these edges differ, then the Jaccard index is 0 while the normalized Hamming similarity is 0.98. Finally, as the networks become denser, the Jaccard index increases quickly and stabilizes at a quite high value in most settings.

It is also notable that both the Jaccard index and the normalized Hamming similarity are relatively high in the “high sequencing depth, low compositional variation” setting and relatively low in the “low sequencing depth, high compositional variation” setting, which is consistent with the finding for the normalized Manhattan similarity.

3.2.3 ROC Curves

An ROC curve is plotted from the average true positive rates and average false positive rates over 100 replicates. The ROC curve can be regarded as an indirect measure of invariance: two networks possessing similar ROC curves is a necessary but not sufficient condition for the two networks to be similar. Nevertheless, it is crucial to evaluate the algorithms with this criterion, since it answers the question, "Does the performance of the algorithm depend on the choice of reference?"
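For reference, the TPR and FPR underlying these curves can be computed as below, with adjacency() from the metrics sketch above; the edge-detection tolerance is again an arbitrary choice of ours.

```python
# TPR/FPR of an estimated network against the true network, over off-diagonal entries.
import numpy as np

def tpr_fpr(Omega_hat, Omega_true, tol=1e-10):
    est = adjacency(Omega_hat, tol)
    tru = adjacency(Omega_true, tol)
    off = ~np.eye(est.shape[0], dtype=bool)            # exclude the diagonal
    tp = np.sum((est == 1) & (tru == 1) & off)
    fp = np.sum((est == 1) & (tru == 0) & off)
    tpr = tp / max(np.sum((tru == 1) & off), 1)
    fpr = fp / max(np.sum((tru == 0) & off), 1)
    return tpr, fpr
```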

Figure 4: ROC curves for Inv-Comp-gLASSO and Inv-gLASSO with true and false references. Long-dashed blue: Inv-Comp-gLASSO with true reference; dashed red: Inv-Comp-gLASSO with false reference; dashed-dotted purple: Inv-gLASSO with true reference; dotted black: Inv-gLASSO with false reference. h,h: high sequencing depth and high compositional variation; h,l: high sequencing depth and low compositional variation; l,h: low sequencing depth and high compositional variation; l,l: low sequencing depth and low compositional variation.

We can see that the ROC curves from Inv-Comp-gLASSO, regardless of the choice of reference, dominate those from Inv-gLASSO in all settings. The two ROC curves from Inv-gLASSO lie perfectly on top of each other, while the curves from Inv-Comp-gLASSO are also fairly close to each other. These empirical results validate the theoretical reference-invariance property of both methods. In addition, Inv-Comp-gLASSO has the most obvious advantage over Inv-gLASSO in the "low sequencing depth, high compositional variation" setting, while the two methods perform almost identically in the "high sequencing depth, low compositional variation" setting. Although the similarity measures are lower in this "most favorable" setting for Inv-Comp-gLASSO (see Sections 3.2.1 and 3.2.2), the ROC curves of the two networks from the method do not deviate much from each other in this setting.

4 Real Data

To further validate the theoretical reference-invariance properties of Inv-Comp-gLASSO and Inv-gLASSO, we applied them to a dataset from the TARA Oceans project, in which the Tara Oceans consortium sampled both plankton and environmental data at 210 sites in the world's oceans. The collected data were later analyzed using sequencing and imaging techniques. We downloaded the taxonomic data and the literature-reported interactions from the TARA Oceans project data repository (https://doi.pangaea.de/10.1594/PANGAEA.843018). As part of the TARA Oceans project, Lima-Mendez et al. (2015) investigated the impact of both biotic and abiotic interactions in the oceanic ecosystem, and a literature-curated list of genus-level marine eukaryotic plankton interactions was generated by a panel of experts.

Similar to Tian et al. (2022), we focused the analysis at the genus level and kept only the 81 genera involved in the literature-reported interactions. For computational simplicity, we removed the samples with very low read counts ($<100$), which left 324 samples in the final preprocessed data. From the genera not reported in the literature interactions, we chose the two with the largest average relative abundances, Acrosphaera and Collosphaera, as the references. We then applied both Inv-Comp-gLASSO and Inv-gLASSO to the alr transformed data with each of these two references, using a common sequence of tuning parameters. For each combination of a method and a reference, we also selected the tuning parameter corresponding to the "asymptotically sparsistent" network (the sparsest estimated network in the path that contains the true network asymptotically) by StARS (Liu et al., 2010).

Figure 5: Normalized Manhattan similarity between the two estimated inverse covariance matrices with the two choices of reference, Acrosphaera and Collosphaera, from the TARA data. Solid blue: Inv-Comp-gLASSO; dashed red: Inv-gLASSO.
Figure 6: Jaccard index and normalized Hamming similarity between the two estimated networks with the two choices of reference, Acrosphaera and Collosphaera, from the TARA data. Solid blue: Inv-Comp-gLASSO; dashed red: Inv-gLASSO.

Figure 5 presents the normalized Manhattan similarity between the two estimated inverse covariance matrices with the two choices of reference. Figure 6 plots the Jaccard index and the normalized Hamming similarity between the two networks represented by the two estimated inverse covariance matrices. We can see from Figures 5 and 6 that all three similarity metrics stay steadily around 1 for Inv-gLASSO throughout the sequence of $\lambda$'s. This agrees with our observation in the simulations, where Inv-gLASSO produced numerically identical inverse covariance matrices.

For Inv-Comp-gLASSO, the similarity scores start around 1 (for the normalized Manhattan similarity and normalized Hamming similarity) or are undefined (for the Jaccard index) when the estimated networks are empty. Then, as $\lambda$ decreases and the estimated networks become denser, the measures start to fluctuate and decline slightly at the end. In spite of the fluctuations, both the normalized Manhattan similarity and the normalized Hamming similarity stay above 0.9, while the lowest Jaccard index is about 0.64. As discussed earlier, the Jaccard index is a stricter measure than the normalized Hamming similarity.

Within each method, StARS picked the same tuning parameter $\lambda$ regardless of the choice of reference, as denoted by the red dot for Inv-gLASSO and the blue dot for Inv-Comp-gLASSO in Figures 5 and 6. In other words, the red and blue dots represent the tuning parameters corresponding to the final estimated networks selected by StARS. Again, the three similarity measures for the two final inverse covariance matrices or networks from Inv-gLASSO are almost 1, while the normalized Manhattan similarity, Jaccard index, and normalized Hamming similarity are 0.98, 0.96, and 0.99 for Inv-Comp-gLASSO. All of these high similarity scores imply the empirical invariance of Inv-gLASSO and Inv-Comp-gLASSO: both methods result in inverse covariance matrices, and thus networks, that are invariant with respect to the choice of the reference genus (Acrosphaera or Collosphaera) when applied to the TARA Oceans eukaryotic dataset.

5 Discussion

In this work, we established the reference-invariance property in sparse inverse covariance matrix estimation and network construction based on the alr transformed data. We then proposed reference-invariant versions of compositional graphical lasso and graphical lasso by modifying the penalty in their respective objective functions. In addition, we validated the reference-invariance property of the proposed methods empirically under a variety of simulation scenarios and on a real TARA Oceans eukaryotic dataset.

It is noteworthy that the reference-invariance property is a general property of estimating the inverse covariance matrix based on the alr transformed data. We proposed reference-invariant versions of compositional graphical lasso and graphical lasso based on this property; however, one may similarly revise other existing methods for inverse covariance matrix estimation based on the alr transformed data. The trick is to revise the objective function so that it becomes invariant with respect to the choice of the reference. Subsequently, the resultant inverse covariance matrix and network are expected to be reference-invariant both theoretically and empirically, though the latter may depend on the algorithm used to optimize the reference-invariant objective function.

References

  • Aitchison (1986) Aitchison, J. (1986), The statistical analysis of compositional data, Monographs on statistics and applied probability, Chapman and Hall.
  • Ban et al. (2015) Ban, Y., An, L., and Jiang, H. (2015), “Investigating microbial co-occurrence patterns based on metagenomic compositional data,” Bioinformatics, 31, 3322–3329.
  • Banerjee et al. (2008) Banerjee, O., El Ghaoui, L., and d’Aspremont, A. (2008), “Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data,” The Journal of Machine Learning Research, 9, 485–516.
  • Fang et al. (2015) Fang, H., Huang, C., Zhao, H., and Deng, M. (2015), “CCLasso: correlation inference for compositional data through Lasso,” Bioinformatics, 31, 3172–3180.
  • Fang et al. (2017) — (2017), “gCoda: conditional dependence network inference for compositional data,” Journal of Computational Biology, 24, 699–708.
  • Faust and Raes (2012) Faust, K. and Raes, J. (2012), “Microbial interactions: from networks to models,” Nature Reviews Microbiology, 10, 538.
  • Friedman and Alm (2012) Friedman, J. and Alm, E. J. (2012), “Inferring correlation networks from genomic survey data,” PLoS computational biology, 8, e1002687.
  • Friedman et al. (2008) Friedman, J., Hastie, T., and Tibshirani, R. (2008), “Sparse inverse covariance estimation with the graphical lasso,” Biostatistics, 9, 432–441.
  • Hamming (1950) Hamming, R. W. (1950), “Error detecting and error correcting codes,” The Bell system technical journal, 29, 147–160.
  • Jaccard (1901) Jaccard, P. (1901), “Distribution de la flore alpine dans le bassin des Dranses et dans quelques régions voisines,” Bull Soc Vaudoise Sci Nat, 37, 241–272.
  • Jiang et al. (2021) Jiang, D., Sharpton, T., and Jiang, Y. (2021), “Microbial interaction network estimation via bias-corrected graphical lasso,” Statistics in Biosciences, 13, 329–350.
  • Kurtz et al. (2015) Kurtz, Z. D., Müller, C. L., Miraldi, E. R., Littman, D. R., Blaser, M. J., and Bonneau, R. A. (2015), “Sparse and compositionally robust inference of microbial ecological networks,” PLoS computational biology, 11, e1004226.
  • Lima-Mendez et al. (2015) Lima-Mendez, G., Faust, K., Henry, N., Decelle, J., Colin, S., Carcillo, F., Chaffron, S., Ignacio-Espinosa, J. C., Roux, S., Vincent, F., et al. (2015), “Determinants of community structure in the global plankton interactome,” Science, 348, 1262073.
  • Liu et al. (2010) Liu, H., Roeder, K., and Wasserman, L. (2010), “Stability approach to regularization selection (stars) for high dimensional graphical models,” in Advances in neural information processing systems, pp. 1432–1440.
  • Smith et al. (2017) Smith, P., Willemsen, D., Popkes, M., Metge, F., Gandiwa, E., Reichard, M., and Valenzano, D. R. (2017), “Regulation of life span by the gut microbiota in the short-lived African turquoise killifish,” elife, 6, e27014.
  • Tian et al. (2022) Tian, C., Jiang, D., Hammer, A., Sharpton, T., and Jiang, Y. (2022), “Compositional Graphical Lasso Resolves the Impact of Parasitic Infection on Gut Microbial Interaction Networks in a Zebrafish Model,” Under Review.
  • Xia et al. (2013) Xia, F., Chen, J., Fung, W. K., and Li, H. (2013), “A logistic normal multinomial regression model for microbiome compositional data analysis,” Biometrics, 69, 1053–1063.
  • Yuan et al. (2019) Yuan, H., He, S., and Deng, M. (2019), “Compositional data network analysis via lasso penalized D-trace loss,” Bioinformatics, 35, 3404–3411.
  • Yuan and Lin (2007) Yuan, M. and Lin, Y. (2007), “Model selection and estimation in the Gaussian graphical model,” Biometrika, 94, 19–35.