Statistical Inference of Auto-correlated Eigenvalues with Applications to Diffusion Tensor Imaging

Zhou Lan
Center for Outcomes Research and Evaluation
Yale School of Medicine

Abstract

Diffusion tensor imaging (DTI) is a prevalent neuroimaging tool in analyzing the anatomical structure. The distinguishing feature of DTI is that the voxel-wise variable is a $3\times 3$ positive definite matrix other than a scalar, describing the diffusion process at the voxel. Recently, several statistical methods have been proposed to analyze the DTI data. This paper focuses on the statistical inference of eigenvalues of DTI because it provides more transparent clinical interpretations. However, the statistical inference of eigenvalues is statistically challenging because few treat these responses as random eigenvalues. In our paper, we rely on the distribution of the Wishart matrix’s eigenvalues to model the random eigenvalues. A hierarchical model which captures the eigenvalues’ randomness and spatial auto-correlation is proposed to infer the local covariate effects. The Monte-Carlo Expectation-Maximization algorithm is implemented for parameter estimation. Both simulation studies and application to IXI data-set are used to demonstrate our proposal. The results show that our proposal is more proper in analyzing auto-correlated random eigenvalues compared to alternatives.

Keywords: Auto-correlation, Diffusion Tensor Imaging, Monte-Carlo Expectation-Maximization Algorithm, Random Eigenvalues, Wishart Distribution

1 Introduction

Diffusion tensor imaging (DTI) is a prevalent neuroimaging tool in analyzing the anatomical structure of a tissue. The rich use of DTI in brain science triggers many meaningful interdisciplinary studies. Due to the nature of interdisciplinarity, providing a tutorial of DTI, which is intuitive to both statisticians and clinicians, is key in interdisciplinary researches of DTI. Based on Figure 1 which is modified from figures in (Zhou, 2010, Figure 1.6), we give a short tutorial of DTI as follows. DTI is a neuroimage measuring the white matter tracts within a tissue (e.g., human brain) voxel-by-voxel. Let the structure at the upper panel of Figure 1 be the anatomical structure measured at a voxel. The movement of water molecules in different environments is measured by giving their respective diffusion ellipsoids at the lower panel of Figure 1. The eigenvectors are the diffusion directions $[\bm{e}_{1}\ \bm{e}_{2}\ \bm{e}_{3}]$ and the squared roots of the eigenvalues $[l_{1}\ l_{2}\ l_{3}]$ are the corresponding semi-axis lengths (Zhou, 2010, Section 1.2.3). The corresponding diffusion tensor $D=[\bm{e}_{1}\ \bm{e}_{2}\ \bm{e}_{3}]\text{diag}(l_{1},l_{2},l_{3})[\bm{e}_{1}\ \bm{e}_{2}\ \bm{e}_{3}]^{T}$ describes the anatomical structure at the voxel. Combining the diffusion tensors, which are positive definite over all the voxels, we have an image whose voxel-wise variables are positive definite diffusion tensors, describing the anatomical structure of a tissue.

Refer to caption — Figure 1: The figure gives the graphical illustration of the anatomical structure and the corresponding diffusion process. The upper panel gives the tissue to be measured. The lower panel gives the corresponding diffusion ellipsoid.

The use of DTI is extensive (e.g., fiber tracking, clinical studies, etc.). Our paper focuses on understanding how the subject’s covariates (e.g., age, gender, disease status) drive the brain’s anatomical structures in terms of DTI. Many brain scientists raised this scientific objective. The relevant researches are now divided into two paths. The statisticians are interested in developing methodologies that handle the statistical randomness of the whole diffusion tensor. There are many representative works (e.g., Schwartzman et al., 2008; Zhu et al., 2009; Yuan et al., 2012; Lee and Schwartzman, 2017; Lan et al., 2021). The clinicians are more interested in using convenient summary scalar to tell the changes of diffusion tensor quickly and easily. For example, the fractional anisotropy, denoted as ${\displaystyle{\text{FA}}={\sqrt{\frac{1}{2}}}{\frac{\sqrt{(l_{1}-l_{2})^{2}+(l_{2}-l_{3})^{2}+(l_{3}-l_{1})^{2}}}{\sqrt{l_{1}^{2}+l_{2}^{2}+l_{3}^{2}}}},}$ is a frequently used summary scalar. Via implementing basic statistical tools, i.e., t-test, simple linear regression with these quantities as data, the researchers can easily tell the changes of the tissue statistically (e.g., Ma et al., 2009; Lane et al., 2010).

In light of Figure 1, we propose that analyzing the DTI’s eigenvalues and eigenvectors separately can be another promising research direction. For statisticians, this path does not preclude any valuable available data information. For clinicians, the scientific explanations in terms of the statistical uncertainties of eigenvalues and/or eigenvectors are more transparent than the other ways, e.g., coefficients based on Cholesky decomposition (Zhu et al., 2009; Lan et al., 2021) or local polynomial regression (Yuan et al., 2012). In our paper, we focus on the statistical analysis of eigenvalues because the variations of eigenvalues are primarily interested by the clinicians given the current clinical reports (e.g., Ma et al., 2009; Lane et al., 2010). In this paper, the analysis of eigenvectors is left as future works.

Compared to other statistical models regarding the eigenvalues of DTI (e.g., Zhu et al., 2010, 2011; Jin et al., 2017), our paper may be a very first paper discussing the statistical inference of random eigenvalues. It is an important fact that the eigenvalues are stochastically ordered, and thus the probability density function of the eigenvalues must ensure the inequality of the eigenvalues stochastically. This inequality makes many classic models such as multivariate Gaussian processes not be applied to this problem straightforwardly. In this paper, we use the density function of a Wishart distributed matrix’s eigenvalues as the basic probability distribution, referred to as distribution of Wishart’s eigenvalues. The distribution of Wishart’s eigenvalues ensures the inequality of the eigenvalues stochastically.

Following many neuroimaging studies, the spatial dependence of the eigenvalues among voxels is encouraged to be incorporated. In the previous analysis (e.g., Zhu et al., 2010, 2011; Jin et al., 2017), the spatial dependence can be easily achieved via the spatial Gaussian process model. Many fiber tracking reports (e.g., Wong et al., 2016) assume that the diffusion tensors are smooth along with a fiber tract (see Figure 2) whose diffusion tensors have similar directions. In light of this assumption, we assume that the eigenvalues are auto-correlated. The model based on auto-correlated eigenvalues captures the spatial correlation and produces an analytic expression of the density function of the spatially correlated eigenvalues (James et al., 1964).

Combining all the above, we construct a hierarchical spatial model for the random auto-correlated eigenvalues. This model is both statistically and scientifically proper regarding the applications to diffusion tensor imaging. Given the hierarchical structure, we implement the Monte-Carlo Expectation-Maximization (EM) algorithm (Levine and Casella, 2001, Section 2.1) for the parameter estimation and implement parametric bootstrapping for uncertainties quantification.

Both simulation studies and real data applications are used to demonstrate our proposed method. We compare our proposal to a non-linear regression model treating the data as normally distributed. The simulation studies indicate that our model gives a more accurate parameter point and confidence interval estimation. The real data analysis based on the IXI data-set (https://brain-development.org/ixi-dataset/) provided by Center for the Developing Brain, Imperial College London, an open data-set made available under the Creative Commons CC BY-SA 3.0 license, shows more clinically meaningful results after treating the data as random eigenvalues.

In the rest of the paper, we first give the hierarchical spatial model of random auto-correlated eigenvalues in Section 2. The estimation of the proposed model is provided in Section 3. We give simulation studies and real data analysis in Section 4 and Section 5, respectively. Finally, we give a conclusion in Section 6. The theoretical details in our paper are in Appendix A. The codes for implementing our method are summarized in Appendix B.

2 Model Setup

In this section, we give the model construction step-by-step. First, we provide the basic model setup in Section 2.1, where the probability distribution of the eigenvalues and the covariate effect specification are introduced. Under the basic model, we further give a latent layer of eigenvalues that induce spatial correlation and provide the details in Section 2.2. To wrap up, we provide a model summary and introduce its application to the statistical inference of eigenvalues of DTI in Section 2.3.

Our model is based on the the parameterized Wishart distribution, and the parameterized Wishart distribution is slightly different from the Wishart distribution introduced in the classic textbooks. We give the construction of the parameterized Wishart distribution as follows. $\bm{Z}_{j}$ independently follows a mean-zero normal distribution with the covariance matrix $\bm{\Sigma}$ over $j\in\{1,...,U\}$ , denoted as $\bm{Z}_{j}\stackrel{{\scriptstyle ind}}{{\sim}}\mathcal{N}(\bm{0},\bm{\Sigma})$ . $\bm{W}=\sum_{j=1}^{U}\bm{Z}_{j}\bm{Z}_{j}^{T}/U$ follows a parameterized Wishart distribution with the mean matrix $\bm{\Sigma}$ and the degrees of freedom $U$ , denoted as $\bm{W}\sim\mathcal{W}_{p}(\bm{\Sigma},U)$ . The density function of $\bm{W}$ is given in Appendix A.1.

2.1 Basic Model Setup

Let $\bm{D}_{it}$ be a $3\times 3$ positive definite matrix-random variable following a parameterized Wishart distribution with the mean matrix $\bm{\Sigma}_{it}$ and the degrees of freedom $m$ , denoted as

\bm{D}_{it}\sim\mathcal{W}_{3}(\bm{\Sigma}_{it},m).

(1)

$\bm{D}_{it}$ can be treated as the diffusion tensor at the voxel $t\in\{1,2,...,T\}$ of the subject $i\in\{1,2,...,I\}$ . The eigenvalues of $\bm{D}_{it}$ are $[L_{it}^{(1)},L_{it}^{(2)},L_{it}^{(3)}]$ with the stochastic inequality $L_{it}^{(1)}>L_{it}^{(2)}>L_{it}^{(3)}$ . Besides the degrees of freedom $m$ , the density function of $[L_{it}^{(1)},L_{it}^{(2)},L_{it}^{(3)}]$ only depends on the eigenvalues of $\bm{\Sigma}_{it}$ which are $\lambda_{it}^{(1)}>\lambda_{it}^{(2)}>\lambda_{it}^{(3)}$ (Muirhead, 2009, Theorem 3.2.18). Therefore, we say that $[L_{it}^{(1)},L_{it}^{(2)},L_{it}^{(3)}]$ follow a distribution of Wishart’s eigenvalues with the degrees freedom $m$ and the population eigenvalues $\lambda_{it}^{(1)}>\lambda_{it}^{(2)}>\lambda_{it}^{(3)}$ , denoted as

[L_{it}^{(1)},L_{it}^{(2)},L_{it}^{(3)}]\sim\mathcal{E}([\lambda_{it}^{(1)},\lambda_{it}^{(2)},\lambda_{it}^{(3)}],m)

(2)

Given Lawley (1956, Equation 3), the mean of the eigenvalue is expressed as $\mathbb{E}L_{it}^{(k)}=\lambda_{it}^{(k)}+\mathcal{O}(m^{-1/2})$ , where $\mathcal{O}(m^{-1/2})$ is a quantity with order $m^{-1/2}$ . Since $\lambda_{it}^{(k)}$ is the so-called population eigenvalue of the distribution (the eigenvalue of the mean matrix $\bm{\Sigma}_{it}$ ), we point out that the expression $\mathbb{E}L_{it}^{(k)}=\lambda_{it}^{(k)}+\mathcal{O}(m^{-1/2})$ may not be consistent with our intuition. The value of the degrees of freedom $m$ also leads to the small variance of the eigenvalues. Therefore, $\lambda_{it}^{(k)}$ can be approximately treated as the mean of $L_{it}^{(k)}$ when $m$ is large enough, denoted as $\mathbb{E}L_{it}^{(k)}\simeq\lambda_{it}^{(k)}$ , although its formal definition is the so-called population eigenvalue.

Let $\bm{X}_{i}$ be a $1\times D$ matrix containing the covariate information (e.g., age, gender) of the subject $i$ . We rescale the original value of each covariate $X_{id}$ into the range in $[0,1]$ . We further specify that $\mathbb{E}L_{it}^{(k)}\simeq e^{\beta_{0t}^{(k)}+\bm{X}_{i}\bm{\beta}_{t}^{(k)}}=\lambda_{it}^{(k)}$ where $\bm{\beta}_{t}^{(k)}=[{\beta}_{1t}^{(k)},...,{\beta}_{Dt}^{(k)}]^{T}$ , where ${\beta}_{dt}^{(k)}$ is the coefficient for the covariate $d$ of the $k$ -th eigenvalue located at the voxel $t$ and ${\beta}_{0t}^{(k)}$ is the corresponding intercept. This specification ensures that the population eigenvalues are positive, and the coefficients represent the linear effects on the logarithm of the mean of the random eigenvalue, denoted as $\log\mathbb{E}L_{it}^{(k)}\simeq\beta_{0t}^{(k)}+\bm{X}_{i}\bm{\beta}_{t}^{(k)}$ . We also put the constraints ${\beta}_{dt}^{(1)}>{\beta}_{dt}^{(2)}>{\beta}_{dt}^{(3)}$ for $d=0,1,2,...,D$ to ensure the inequality of the population eigenvalues. Combining all above, the means of the random eigenvalue responses (i.e., $\mathbb{E}L_{it}^{(k)}$ ) are determined by the covariate information.

2.2 Latent Eigenvalues Setup

In Section 2.1, we give basic model construction for the random eigenvalue responses. In this section, we further induce the spatial dependence among the random eigenvalues by giving the latent eigenvalues setup. The spatial model to be built is expected to have the following three properties:

Property 1.

The eigenvalues $[L_{it}^{(1)},L_{it}^{(2)},L_{it}^{(3)}]$ are independent and identically distributed (i.i.d) over subjects $i\in\{1,...,N\}$ .

Property 2.

Given a subject $i$ , $[L_{it}^{(1)},L_{it}^{(2)},L_{it}^{(3)}]$ are spatially correlated over the voxels $t\in\{1,...,T\}$ .

Property 3.

The approximation $\mathbb{E}L_{it}^{(k)}\simeq e^{\beta_{0t}^{(k)}+\bm{X}_{i}\bm{\beta}_{t}^{(k)}}$ given in the basic model setup is preserved after giving the latent eigenvalues setup.

To achieve these goals, we decompose $\lambda_{it}^{(k)}$ as $\lambda_{it}^{(k)}=\xi_{it}^{(k)}e^{\beta_{0t}^{(k)}+\bm{X}_{i}\bm{\beta}_{t}^{(k)}}$ , where $\bm{\beta}_{t}^{(k)}=[{\beta}_{1t}^{(k)},...,{\beta}_{Dt}^{(k)}]^{T}$ . The additional terms $\xi_{it}^{(1)}>\xi_{it}^{(2)}>\xi_{it}^{(3)}$ are random eigenvalues follow a so-called spatial process of Wishart’s eigenvalues, inducing the spatial correlation. As a prerequisite, we first introduce the spatial Wishart process (Gelfand et al., 2004; Lan et al., 2021) which is a means to induce the correlation among Wishart matrices. Let $\bm{Z}_{t}$ be a mean-zero Gaussian process with the covariance as $\text{cov}(\bm{Z}_{t},\bm{Z}_{s})=\rho_{ts}\times\bm{\Sigma}$ , where $\rho_{ts}\in(0,1)$ is the correlation, and $\bm{\Sigma}$ is the cross covariance matrix of the Gaussian process. In the rest of the paper, we simply give $\bm{\Sigma}=\bm{I}$ , where $\bm{I}$ is an identity matrix. If we have i.i.d realizations of this process, denoted as $\bm{Z}_{jt}$ , where $j$ is the subscript for realizations and $t$ is the subscript for the Gaussian process, then $\bm{U}_{t}=\sum_{j=1}^{M}\bm{Z}_{jt}\bm{Z}_{jt}^{T}/M$ follows a so-called spatial Wishart process, denoted as $\{\bm{U}_{t}\}_{t\in\mathbb{T}}\sim\mathcal{SWP}(M,\rho_{ts}\otimes\bm{I})$ . Subsequently, the eigenvalues of $\bm{U}_{t}$ which are $\xi_{t}^{(1)}>\xi_{t}^{(2)}>\xi_{t}^{(3)}$ follow a spatial process of Wishart’s eigenvalues, denoted as

{[\xi_{t}^{(1)},\xi_{t}^{(2)},\xi_{t}^{(3)}]\}_{t\in\mathbb{T}}\sim\mathcal{SEP}(M,\rho_{ts}\otimes[1,1,1])}.

(3)

Let $[\xi_{it}^{(1)}>\xi_{it}^{(2)}>\xi_{it}^{(3)}]$ be the $i$ -th i.i.d. realization of $\{[\xi_{t}^{(1)},\xi_{t}^{(2)},\xi_{t}^{(3)}]\}_{t\in\mathbb{T}}\sim\mathcal{SEP}(M,\rho_{ts}\otimes[1,1,1])$ at the voxel $t$ . To wrap-up, our model at this stage is

		$\displaystyle[L_{it}^{(1)},L_{it}^{(2)},L_{it}^{(3)}]\|[\xi_{it}^{(1)},\xi_{it}^{(2)},\xi_{it}^{(3)}]\sim\mathcal{E}([\lambda_{it}^{(1)},\lambda_{it}^{(2)},\lambda_{it}^{(3)}],m),$		(4)
		$\displaystyle\lambda_{it}^{(k)}=\xi_{it}^{(k)}e^{\beta_{0t}^{(k)}+\bm{X}_{i}\bm{\beta}_{t}^{(k)}},$
		$\displaystyle\{[\xi_{it}^{(1)},\xi_{it}^{(2)},\xi_{it}^{(3)}]\}_{t\in\mathbb{T}}\sim\mathcal{SEP}(M,\rho_{ts}\otimes[1,1,1]).$

Next, we show this model satisfies the three properties we stated at the beginning of this subsection.

Since the latent eigenvalues $[\xi_{it}^{(1)}>\xi_{it}^{(2)}>\xi_{it}^{(3)}]$ are i.i.d. over $i\in\{1,...,N\}$ , Property 1 that the eigenvalues $[L_{it}^{(1)},L_{it}^{(2)},L_{it}^{(3)}]$ are i.i.d over subjects $i\in\{1,...,N\}$ is satisfied. We measure the spatial dependence of this process via the expected squared Frobenius norm of the term $\text{diag}([\xi_{it}^{(1)},\xi_{it}^{(2)},\xi_{it}^{(3)}])-\text{diag}([\xi_{s}^{(1)},\xi_{s}^{(2)},\xi_{s}^{(3)}])$ , denoted as

\mathcal{V}(t,s)=\mathbb{E}||\text{diag}([\xi_{it}^{(1)},\xi_{it}^{(2)},\xi_{it}^{(3)}])-\text{diag}([\xi_{is}^{(1)},\xi_{is}^{(2)},\xi_{is}^{(3)}])||_{F}^{2}.

(5)

The expected squared Frobenius norm $\mathcal{V}(t,s)$ can be treated as a variogram (Cressie and Wikle, 2015; Lan et al., 2021). We use Monte-Carlo approximation to visualize $\mathcal{V}(t,s)$ (see Figure 3) since it has no analytic expression. The term $\mathcal{V}(t,s)$ is a decreasing function of $\rho_{ts}$ . A larger value of the degrees of freedom $m$ leads to a larger value of $\mathcal{V}(t,s)$ , indicating lager local variations. We conclude that the spatial dependence replies on the correlation parameter $\rho_{ts}$ of the latent Gaussian processes. Thus, Property 2 that $[L_{it}^{(1)},L_{it}^{(2)},L_{it}^{(3)}]$ are spatially correlated over the voxels $t$ , is satisfied. Furthermore, conditional on $\xi_{it}^{(k)}$ , the expectation of $L_{it}^{(k)}$ is $\mathbb{E}_{L_{it}^{(k)}|\xi_{it}^{(k)}}L_{it}^{(k)}\simeq\xi_{it}^{(k)}e^{\beta_{0t}^{(k)}+\bm{X}_{i}\bm{\beta}_{t}^{(k)}}$ . Locally, $[\xi_{it}^{(1)},\xi_{it}^{(2)},\xi_{it}^{(3)}]$ follows $\mathcal{E}([1,1,1],M)$ with $\mathbb{E}\xi_{it}^{(k)}\simeq 1$ as $M$ is large. Thus, Property 3 that the approximation $\mathbb{E}L_{it}^{(k)}\simeq e^{\beta_{0t}^{(k)}+\bm{X}_{i}\bm{\beta}_{t}^{(k)}}$ is still preserved after $\xi_{it}^{(k)}$ is marginalized out.

Next, we give the specification of the correlation $\rho_{ts}$ . In light of many interdisciplinary works on fiber tracking Wong et al. (e.g., 2016, Section 4), the spatial dependence can be assumed to be auto-correlated along with a fiber tract, and there is no spatial dependence across fibers (see Figure 2). To accompany with this feature, we no longer use the subscript $t$ to denote the voxels but use the subscript $g_{t}$ to denote the $t$ -th voxel of the $g$ -th fiber. Given a fiber $g$ , $[g_{1},...,g_{t},...,g_{T_{g}}]$ are the $1$ -st, …, $t$ -th, …, $T_{g}$ -th diffusion tensor ordered from the first diffusion tensor of the fiber to the last diffusion tensor of the tensor, respectively. To accompany with the auto-correlation of the voxels along a fiber, we give that $\rho_{g_{t},g^{\prime}_{s}}=p^{|t-s|}$ for $g=g^{\prime}$ , where $p$ is the correlation parameter controlling the spatial dependence. To accompany with that there is no spatial dependence across fibers, we give that $\rho_{g_{t},g^{\prime}_{s}}=0$ for $g\not=g^{\prime}$ . A by-product of this spatial auto-correlation is to produce an analytic expression of the density function of the spatially correlated eigenvalues (James et al., 1964).

2.3 Model Summary and Diffusion Tensor Imaging

To this end, our proposed model of random auto-correlated eigenvalues is fully expressed as

		$\displaystyle{\textbf{Basic Level:}}\left\{\begin{aligned} &[L_{i,g_{t}}^{(1)},L_{i,g_{t}}^{(2)},L_{i,g_{t}}^{(3)}]\|[\xi_{i,g_{t}}^{(1)},\xi_{i,g_{t}}^{(2)},\xi_{i,g_{t}}^{(3)}]\sim\mathcal{E}([\lambda_{i,g_{t}}^{(1)},\lambda_{i,g_{t}}^{(2)},\lambda_{i,g_{t}}^{(3)}],m);\\ &\lambda_{i,g_{t}}^{(k)}=\xi_{i,g_{t}}^{(k)}e^{\beta_{0,g_{t}}^{(k)}+\bm{X}_{i}\bm{\beta}_{g_{t}}^{(k)}};\\ &\bm{\beta}_{g_{t}}^{(k)}=[{\beta}_{1,g_{t}}^{(k)},...,{\beta}_{D,g_{t}}^{(k)}]^{T};\quad{\beta}_{d,g_{t}}^{(1)}>{\beta}_{d,g_{t}}^{(2)}>{\beta}_{d,g_{t}}^{(3)}\\ \end{aligned}\right.$		(6)
		$\displaystyle{\textbf{Latent Level:}}\left\{\begin{aligned} &\{[\xi_{i,g_{t}}^{(1)},\xi_{i,g_{t}}^{(2)},\xi_{i,g_{t}}^{(3)}]\}_{t\in\mathbb{T}}\sim\mathcal{SEP}(M,\rho_{g_{t},g^{\prime}_{s}}\otimes[1,1,1]);\\ &\rho_{g_{t},g^{\prime}_{s}}=\left\{\begin{aligned} p^{\|t-s\|}&,&g=g^{\prime}\\ 0&,&g\not=g^{\prime}\end{aligned}\right.,\end{aligned}\right.$		(6)

for $i=1,2,...,N;\ g=1,2,...,G;\ t=1,2,...,T_{g}$ .

This proposed model can be used for the statistical inference of eigenvalues of DTI. The DTI data are collected for subjects $i=1,2,...,N$ . For each subject $i$ , the eigenvalues of the voxel $t\in\{1,2,...,T_{g}\}$ located at the fiber $g\in\{1,2,...,G\}$ are treated as responses, i.e., $[L_{i,g_{t}}^{(1)},L_{i,g_{t}}^{(2)},L_{i,g_{t}}^{(3)}]$ . Conditional on $[\xi_{i,g_{t}}^{(1)},\xi_{i,g_{t}}^{(2)},\xi_{i,g_{t}}^{(3)}]$ , the eigenvalues $[L_{i,g_{t}}^{(1)},L_{i,g_{t}}^{(2)},L_{i,g_{t}}^{(3)}]$ follow a distribution of Wishart’s eigenvalues, i.e., $\mathcal{E}([\lambda_{i,g_{t}}^{(1)},\lambda_{i,g_{t}}^{(2)},\lambda_{i,g_{t}}^{(3)}],m)$ . The value of the degrees of freedom $m$ controls the local variations of the eigenvalues: a larger value of $m$ leads to smaller local variations. The term $e^{\beta_{0,g_{t}}^{(k)}+\bm{X}_{i}\bm{\beta}_{g_{t}}^{(k)}}$ dominates the mean of the eigenvalue $L_{i,g_{t}}^{(k)}$ when $m$ and $M$ are large enough. The coefficient ${\beta}_{d,g_{t}}^{(k)}$ captures the averaging effect of the covariate $d$ on the mean of $L_{i,g_{t}}^{(k)}$ .

To capture the spatial correlations, the latent variables $\{[\xi_{i,g_{t}}^{(1)},\xi_{i,g_{t}}^{(2)},\xi_{i,g_{t}}^{(3)}]\}_{t\in\mathbb{T}}$ follow the spatial process of Wishart’s eigenvalues $\mathcal{SEP}(M,\rho_{g_{t},g^{\prime}_{s}}\otimes[1,1,1])$ , where $\rho_{g_{t},g^{\prime}_{s}}=\left\{\begin{aligned} p^{|t-s|}&,&g=g^{\prime}\\ 0&,&g\not=g^{\prime}\end{aligned}\right.$ . The value of the degrees of freedom $M$ controls the local variations of the latent eigenvalues. No correlation is assumed across fibers. The parameter $p$ controls the correlation among voxels within a fiber tract: a larger value of $p$ leads to strong correlations. The information of the fibers and their connected voxels us treated as priori. These pieces of the known priori information can be obtained from a human brain diffusion tensor template (e.g., Peng et al., 2009).

3 Estimation

In this section, we introduce the statistical estimation of Model 6. We use $\bm{l}_{i,g_{t}}=[l_{i,g_{t}}^{(1)},l_{i,g_{t}}^{(2)},l_{i,g_{t}}^{(3)}]$ to denote the observed eigenvalues of the random eigenvalues $\bm{L}_{i,g_{t}}=[L_{i,g_{t}}^{(1)},L_{i,g_{t}}^{(2)},L_{i,g_{t}}^{(3)}]$ . We also give $\bm{\xi}_{i,g_{t}}=[\xi_{i,g_{t}}^{(1)},\xi_{i,g_{t}}^{(2)},\xi_{i,g_{t}}^{(3)}]$ , $\bm{L}=\{\bm{L}_{i,g_{t}}:i=1,2,...,N;g=1,2,...,G;t=1,2,...,T_{g}\}$ , and $\bm{\xi}=\{\bm{\xi}_{i,g_{t}}:i=1,2,...,N;g=1,2,...,G;t=1,2,...,T_{g}\}$ . The parameters to be estimated are $\bm{\theta}=\{M,m,p,\bm{\beta}^{(1)},\bm{\beta}^{(2)},\bm{\beta}^{(3)}\}$ where $\bm{\beta}^{(k)}=\{\bm{B}_{g_{t}}^{(k)}=[{\beta}_{0,g_{t}}^{(k)},{\bm{\beta}_{g_{t}}^{(k)}}^{T}]^{T}:g=1,2,...,G;t=1,2,...,T_{g}\}$ .

First, we aim to give the explicit expression of the log-likelihood of the data $\bm{l}$ and the latent variables $\bm{\xi}$ , in order to proceed with the next parameter estimation. The relevant proofs and derivations in this section are provided in Appendix A.2.

The log-likelihood of the data $\bm{l}$ and the latent variables $\bm{\xi}$ is expressed as

	$\displaystyle\ell_{full}(\bm{\theta}\|\bm{l},\bm{\xi})=$	$\displaystyle\log\Bigg{[}\prod_{i=1}^{N}\prod_{g=1}^{G}\prod_{t=1}^{T_{g}}\Big{[}f_{\bm{L}_{i,g_{t}}\|\bm{\xi}_{i,g_{t}}}(\bm{l}_{i,g_{t}}\|\bm{\xi}_{i,g_{t}};m,\bm{\beta})\Big{]}$		(7)
		$\displaystyle\times\prod_{i=1}^{N}\prod_{g=1}^{G}\Big{[}f_{\bm{\xi}_{ig}}(\bm{\xi}_{i,g_{1}},...,\bm{\xi}_{i,g_{t}},...,\bm{\xi}_{i,g_{T_{g}}};M,p)\Big{]}\Bigg{]},$		(7)

where the term $f_{\bm{L}_{i,g_{t}}|\bm{\xi}_{i,g_{t}}}(\bm{l}_{i,g_{t}}|\bm{\xi}_{i,g_{t}};m,\bm{\beta})$ is the conditional density of $\bm{L}_{i,g_{t}}$ given $\bm{\xi}_{i,g_{t}}$ and the term $f_{\bm{\xi}_{g}}(\bm{\xi}_{i,g_{1}},...,\bm{\xi}_{i,g_{t}},...,\bm{\xi}_{i,g_{T_{g}}};M,p)$ is the joint density function of $[\bm{\xi}_{i,g_{1}},...,\bm{\xi}_{i,g_{t}},...,\bm{\xi}_{i,g_{T_{g}}}]$ .

Muirhead (2009, Theorem 3.2.18) gives the expression of $f_{\bm{L}_{i,g_{t}}|\bm{\xi}_{i,g_{t}}}(\bm{l}_{i,g_{t}}|\bm{\xi}_{i,g_{t}};m,\bm{\beta})$ . The density function is difficult to compute due to the presence of an integral which is difficult to evaluate. Given the assumption that $m$ is assumed to be large to make $\mathbb{E}_{L_{it}^{(k)}|\xi_{it}^{(k)}}L_{it}^{(k)}$ dominated by $\lambda_{i,g_{t}}^{(k)}=\xi_{i,g_{t}}^{(k)}e^{\beta_{0,g_{t}}^{(k)}+\bm{X}_{i}\bm{\beta}_{g_{t}}^{(k)}}$ , we give Lemma 3.1 borrowed from Srivastava (1979, Corollary 9.4.2).

Lemma 3.1.

In Equation 2, $\sqrt{m}\frac{L_{i,g_{t}}^{(k)}-\lambda_{i,g_{t}}^{(k)}}{\sqrt{2}\lambda_{i,g_{t}}^{(k)}}$ converges in distribution to a standard normal distribution as $m\rightarrow\infty$ , conditional on $\xi_{it}^{(k)}$ .

Therefore, to simplify the computation, we approximate $f_{\bm{L}_{i,g_{t}}|\bm{\xi}_{i,g_{t}}}(\bm{l}_{i,g_{t}}|\bm{\xi}_{i,g_{t}};m,\bm{\beta})$ as a product of probability density functions of normal distributed random variables, expressed as

\displaystyle f_{\bm{L}_{i,g_{t}}|\bm{\xi}_{i,g_{t}}}(\bm{l}_{i,g_{t}}|\bm{\xi}_{i,g_{t}};m,\bm{\beta})\simeq\prod_{k=1}^{3}\phi\Bigg{(}l_{i,g_{t}}^{(k)};\lambda_{i,g_{t}}^{(k)},\sqrt{\frac{2}{m}}\lambda_{i,g_{t}}^{(k)}\Bigg{)},

(8)

where $\lambda_{i,g_{t}}^{(k)}=\xi_{i,g_{t}}^{(k)}e^{\beta_{0,g_{t}}^{(k)}+\bm{X}_{i}\bm{\beta}_{g_{t}}^{(k)}}$ and $\phi(l;\mu,\sigma)$ is the probability density function of a normal distribution with mean $\mu$ and standard error $\sigma$ . In practice, this simplification reduces the computational cost significantly but preserves the accurate parameter estimation.

Next we handle the probability density function $f_{\bm{\xi}_{ig}}(\bm{\xi}_{i,g_{1}},...,\bm{\xi}_{i,g_{t}},...,\bm{\xi}_{i,g_{T_{g}}};M,p)$ . In the following paragraphs, we derive the expression of the probability density function. First, Lemma 3.2 below allows us to handle only two density functions: $f_{\bm{\xi}_{i,g_{1}}}(\bm{\xi}_{i,g_{1}};M)$ and $f_{\bm{\xi}_{i,g_{t}}|\bm{\xi}_{i,g_{t-1}}}(\bm{\xi}_{i,g_{t}}|\bm{\xi}_{i,g_{t-1}};M,p)$ .

Lemma 3.2.

$f_{\bm{\xi}_{ig}}(\bm{\xi}_{i,g_{1}},...,\bm{\xi}_{i,g_{t}},...,\bm{\xi}_{i,g_{T_{g}}};M,p)$ in Equation 7 can be written as

f_{\bm{\xi}_{ig}}(\bm{\xi}_{i,g_{1}},...,\bm{\xi}_{i,g_{t}},...,\bm{\xi}_{i,g_{T_{g}}};M,p)=f_{\bm{\xi}_{i,g_{1}}}({\xi}_{i,g_{1}};M)\prod_{t=2}^{T_{g}}f_{\bm{\xi}_{i,g_{t}}|\bm{\xi}_{g_{i,t-1}}}(\bm{\xi}_{i,g_{t}}|\bm{\xi}_{g_{i,t-1}};M,p),

where $f_{\bm{\xi}_{i,g_{1}}}(\bm{\xi}_{i,g_{1}};M)$ is the density function of $\bm{\xi}_{i,g_{1}}=[\xi_{i,g_{1}}^{(1)},\xi_{i,g_{1}}^{(2)},\xi_{i,g_{1}}^{(3)}]$ and $f_{\bm{\xi}_{i,g_{t}}|\bm{\xi}_{i,g_{t-1}}}(\bm{\xi}_{i,g_{t}}|\bm{\xi}_{i,g_{t-1}};M,p)$ is the conditional density of $\bm{\xi}_{i,g_{t}}=[\xi_{i,g_{t}}^{(1)},\xi_{i,g_{t}}^{(2)},\xi_{i,g_{t}}^{(3)}]$ given $\bm{\xi}_{i,g_{t-1}}=[\xi_{i,g_{t-1}}^{(1)},\xi_{i,g_{t-1}}^{(2)},\xi_{i,g_{t-1}}^{(3)}]$ .

Because $\bm{\xi}_{i,g_{1}}\sim\mathcal{E}([1,1,1],M)$ , Muirhead (2009, Corollary 3.2.19) directly gives the expression of the probability density function $f_{\bm{\xi}_{i,g_{1}}}(\bm{\xi}_{i,g_{1}};M)$ , expressed

\displaystyle f_{\bm{\xi}_{i,g_{1}}}(\bm{\xi}_{i,g_{1}};M)=\frac{\pi^{3^{2}/2}}{(\frac{2}{M})^{3M/2}\Gamma_{3}(\frac{3}{2})\Gamma_{3}(\frac{M}{2})}\prod_{k=1}^{3}[\xi_{i,g_{t}}^{(k)}]^{(M-3-1)/2}\prod_{k<q}^{3}[\xi_{i,g_{t}}^{(k)}-\xi_{i,g_{t}}^{(q)}]\exp\left(-\frac{M}{2}\sum_{k=1}^{3}\xi_{i,g_{t}}^{(k)}\right).

(9)

Given James et al. (1960, Equation 7) and James et al. (1964, Equation 68), we further derives the expression of $f_{\bm{\xi}_{i,g_{t}}|\bm{\xi}_{i,g_{t-1}}}(\bm{\xi}_{i,g_{t}}|\bm{\xi}_{i,g_{t-1}};M,p)$ (see Appendix A.2), expressed as

$\displaystyle f_{\bm{\xi}_{i,g_{t}}\|\bm{\xi}_{i,g_{t-1}}}(\bm{\xi}_{i,g_{t}}\|\bm{\xi}_{i,g_{t-1}};M,p)=$	$\displaystyle\frac{\pi^{3^{2}/2}}{(\frac{2}{M})^{3M/2}\Gamma_{3}(\frac{3}{2})\Gamma_{3}(\frac{M}{2})}\prod_{k=1}^{3}[\xi_{i,g_{t}}^{(k)}]^{(M-3-1)/2}\prod_{k<q}^{3}[\xi_{i,g_{t}}^{(k)}-\xi_{i,g_{t}}^{(q)}]$	(10)
	$\displaystyle\times\exp\Bigg{[}-\frac{M}{2}\Big{(}\frac{p^{2}}{1-p^{2}}\sum_{k=1}^{K}\xi_{i,g_{t-1}}^{(k)}+\frac{1}{1-p^{2}}\sum_{k=1}^{K}\xi_{i,g_{t}}^{(k)}\Big{)}\Bigg{]}\times\big{(}\frac{1}{1-p^{2}}\big{)}^{\frac{3M}{2}}$
	$\displaystyle\times\ _{0}F_{1}\Big{(}\frac{1}{2}M;\frac{M}{2}\frac{p}{1-p^{2}}[\xi_{i,g_{t-1}}^{(1)},\xi_{i,g_{t-1}}^{(2)},\xi_{i,g_{t-1}}^{(3)}],\frac{M}{2}\frac{p}{1-p^{2}}[\xi_{i,g_{t}}^{(1)},\xi_{i,g_{t}}^{(2)},\xi_{i,g_{t}}^{(3)}]\Big{)},$

where $\ {}_{0}F_{1}(;)$ is a hypergeometric function of a matrix argument and its value can be numerically evaluated (Koev and Edelman, 2006). When $p=0$ , $\bm{\xi}_{i,g_{t}}$ is independent of $\bm{\xi}_{i,g_{t-1}}$ . Furthermore, either $f_{\bm{\xi}_{i,g_{t}},\bm{\xi}_{i,g_{t-1}}}(\bm{\xi}_{i,g_{t}},\bm{\xi}_{i,g_{t-1}};M,p)=f_{\bm{\xi}_{i,g_{t}}|\bm{\xi}_{i,g_{t-1}}}(\bm{\xi}_{i,g_{t}}|\bm{\xi}_{i,g_{t-1}};M,p)f_{\bm{\xi}_{i,g_{t-1}}}(\bm{\xi}_{i,g_{t-1}};M)$ or $f_{\bm{\xi}_{i,g_{t}},\bm{\xi}_{i,g_{t-1}}}(\bm{\xi}_{i,g_{t}},\bm{\xi}_{i,g_{t-1}};M,p)=f_{\bm{\xi}_{i,g_{t-1}}|\bm{\xi}_{i,g_{t}}}(\bm{\xi}_{i,g_{t-1}}|\bm{\xi}_{i,g_{t}};M,p)f_{\bm{\xi}_{i,g_{t}}}(\bm{\xi}_{i,g_{t}};M)$ produces the joint density function of $\bm{\xi}_{i,g_{t}}$ and $\bm{\xi}_{i,g_{t-1}}$ , denoted as $f_{\bm{\xi}_{i,g_{t}},\bm{\xi}_{i,g_{t-1}}}(\bm{\xi}_{i,g_{t}},\bm{\xi}_{i,g_{t-1}};M,p)$ .

Putting Equations 8-10 together, we have the explicit expression of the full log-likelihood $\ell_{full}(\bm{\theta}|\bm{l},\bm{\xi})$ (Equation 7). Standing on the above results, we next proceed the statistical inference of correlated eigenvalues. Our goal is to obtain the maximum likelihood estimators of $\bm{\theta}$ . In particular, to answer the clinical question about covariate effect quantification, the maximum likelihood estimate of $\bm{\beta}_{g_{t}}^{(k)}$ quantifies the covariate effects, and its confidence interval describes the uncertainties of covariate effects. Therefore, we have to maximize the likelihood whose the latent variables $\bm{\xi}$ are integrated out. The log-likelihood $\ell(\bm{\theta}|\bm{l})$ whose latent variables are integrated out is expressed as

	$\displaystyle\ell(\bm{\theta}\|\bm{l})=$	$\displaystyle\int\log\Bigg{[}\prod_{i=1}^{N}\prod_{g=1}^{G}\prod_{t=1}^{T_{g}}\Big{[}f_{\bm{L}_{i,g_{t}}\|\bm{\xi}_{i,g_{t}}}(\bm{l}_{i,g_{t}}\|\bm{\xi}_{i,g_{t}};m,\bm{\beta})\Big{]}$		(11)
		$\displaystyle\times\prod_{i=1}^{N}\prod_{g=1}^{G}\Big{[}f_{\bm{\xi}_{g}}(\bm{\xi}_{i,g_{1}},...,\bm{\xi}_{i,g_{t}},...,\bm{\xi}_{i,g_{T_{g}}};M,p)\Big{]}\Bigg{]}d\bm{\xi}.$		(11)

The hurdle caused by the unobserved latent variables $\bm{\xi}$ can be easily overcome through the EM algorithm. The iterative scheme of the algorithm is given as follows. We start with the current parameter $\bm{\theta}^{(r)}$ , thus the full likelihood at the current stage is $\ell_{Full}(\bm{\theta}^{(r)}|\bm{l},\bm{\xi})$ . We first compute the expected value of the likelihood function with respect to the current conditional distribution of $\bm{\xi}$ given the data $\bm{l}$ and the current parameter estimates $\bm{\theta}^{(r)}$ , denoted as $Q(\bm{\theta}|\bm{\theta}^{(r)})=\mathbb{E}_{\bm{\xi}|\bm{L},\bm{\theta}^{(r)}}\ell_{Full}(\bm{\theta}|\bm{l},\bm{\xi})$ . This step is called E-step since we take the expectation. Next we maximize the term $Q(\bm{\theta}|\bm{\theta}^{(r)})$ with respect to $\bm{\theta}$ to obtain the next estimates $\bm{\theta}^{(r+1)}$ . This step is called M-step because we maximize the function. We iteratively repeat the two steps until convergence, in order to obtain the maximum likelihood estimator $\hat{\bm{\theta}}$ for the parameters $\bm{\theta}$ .

In the M-step, the function $Q(\bm{\theta}|\bm{\theta}^{(r)})$ can be feasibly maximized using the Quasi-Newton method with the the constraints ${\beta}_{dt}^{(1)}>{\beta}_{dt}^{(2)}>{\beta}_{dt}^{(3)}$ for $d=0,1,2,...,D$ . In the E-step, the term $\mathbb{E}_{\bm{\xi}|\bm{L},\bm{\theta}^{(r)}}\ell_{Full}(\bm{\theta}|\bm{l},\bm{\xi})$ has no analytic expression. Thus, the Monte-Carlo EM based on importance sampling (Levine and Casella, 2001, Section 2.1) is applied. We first generate $C$ Markov chain Monte-Carlo samples from the posterior distribution $\bm{\xi}|\bm{L},\bm{\theta}^{(0)}$ , denoted as $\bm{\xi}^{(1)},...,\bm{\xi}^{(C)}$ , where $\bm{\theta}^{(0)}$ is the initial values of the EM algorithm. For every step $r$ , $Q(\bm{\theta}|\bm{\theta}^{(r)})$ is approximated as $Q(\bm{\theta}|\bm{\theta}^{(r)})=\mathbb{E}_{\bm{\xi}|\bm{L},\bm{\theta}^{(r)}}\ell_{Full}(\bm{\theta}|\bm{l},\bm{\xi})\approx\frac{1}{C}\sum_{c=1}^{C}\omega_{c}\ell_{Full}(\bm{\theta}|\bm{l},\bm{\xi}^{(c)})$ . The weight $\omega_{c}$ is defined as $\omega_{c}=\frac{\omega_{c}^{\prime}}{\sum_{c=1}^{C}\omega_{c}^{\prime}}$ where $\omega_{c}^{\prime}=\exp\Big{[}\ell_{Full}(\bm{\theta}^{(r)}|\bm{l},\bm{\xi})-\ell_{Full}(\bm{\theta}^{(0)}|\bm{l},\bm{\xi})\Big{]}$ . The correctness and relevant properties of the Monte-Carlo EM based on the importance sampling have already been provided in Levine and Casella (2001).

In clinical studies, the point and interval estimation of the coefficient ${\beta}_{d,g_{t}}^{(k)}$ are informative since it provides the covariate effects and the associated uncertainties, respectively. The confidence interval of parameters can be obtained by the parametric bootstrapping. We generate $B$ bootstrapping samples from Model 6 with $\bm{\theta}=\hat{\bm{\theta}}$ . We estimate ${\beta}_{d,g_{t}}^{(k)}$ for for each bootstrapping sample $b$ , denoted as ${\beta}_{d,g_{t}}^{(k)*}(b)$ , and we compute the bootstrapping difference as ${\delta}_{d,g_{t}}^{(k)*}(b)={\beta}_{d,g_{t}}^{(k)*}(b)-\widehat{\beta}_{d,g_{t}}^{(k)}$ . The $1-\alpha$ confidence interval of ${\beta}_{d,g_{t}}^{(k)}$ , is $[\widehat{\beta}_{d,g_{t}}^{(k)}-{\delta}_{d,g_{t}}^{(k)*}(b,1-\alpha/2),\hat{{\theta}}_{i}-{\delta}_{d,g_{t}}^{(k)*}(b,\alpha/2)]$ , where ${\delta}_{d,g_{t}}^{(k)*}(b,1-\alpha/2)$ and ${\delta}_{d,g_{t}}^{(k)*}(b,\alpha/2)$ are the $1-\alpha/2$ and $\alpha/2$ quantiles of all the bootstrapping differences ${\delta}_{d,g_{t}}^{(k)*}(b)$ for $b=1,...,B$ . In the next section, we use simulation studies to compare out proposal to the other methods in terms of the point and interval estimation of the coefficient.

4 Simulation Studies

In this section, we carry out simulation studies to measure the performance of our proposal. We call our proposed model as auto-eigenvalues model, and compare our method to a benchmark method. The benchmark model is a non-linear regression model treating the responses as normal distributed random variables. We refer this model as to the non-linear regression model, expressed as $L_{i,g_{t}}^{(k)}=e^{\beta_{0t}^{(k)}+\bm{X}_{i}\bm{\beta}_{g_{t}}^{(k)}}+\epsilon_{i,g_{t}}^{(k)},\epsilon_{i,g_{t}}^{(k)}\sim\mathcal{N}(0,\sigma^{2})$ .

The two models have the similar interpretations of the coefficients $[\beta_{0t}^{(k)},\ {\bm{\beta}_{g_{t}}^{(k)}}^{T}]$ , that are to control the mean of the random eigenvalues, but differ in basic statistical assumptions. To summary, the auto-eigenvalues model respects that the responses are random eigenvalues while the non-linear regression model does not. The auto-eigenvalues model additionally captures the correlation of the eigenvalues among voxels. More importantly, the interpretation of $e^{\beta_{0t}^{(k)}+\bm{X}_{i}\bm{\beta}_{t}^{(k)}}$ is actually different. Under the the auto-eigenvalues model, we have $\mathbb{E}L_{it}^{(k)}\simeq e^{\beta_{0t}^{(k)}+\bm{X}_{i}\bm{\beta}_{t}^{(k)}}$ as $M$ and $m$ are large. Under the non-linear regression model, there is no approximation involved, such that $\mathbb{E}L_{it}^{(k)}=e^{\beta_{0t}^{(k)}+\bm{X}_{i}\bm{\beta}_{t}^{(k)}}$ . Therefore, we emphasize that the aim of the comparison is only to see whether we treat the responses as random eigenvalues or not impacts the coefficient estimation.

The simulated data is generated from our proposed model. In each simulated data, we give the following settings. We have $G=3$ fibers and each fiber has $T_{g}=20$ voxels. The correlation parameter is set as $p=0.6$ . The number of subjects is $N=10$ . Since our model setup relies on large values of the degrees of freedom $m$ and $M$ , we test parameter estimation performance given $M=m=20,30,40$ , respectively. For each replication, we generate ${\beta}_{d,g_{t}}^{(1)}$ from a normal distribution with mean $0$ and standard error $0.1$ , and we give ${\beta}_{d,g_{t}}^{(2)}={\beta}_{d,g_{t}}^{(1)}-0.3$ and ${\beta}_{d,g_{t}}^{(3)}={\beta}_{d,g_{t}}^{(1)}-0.6$ .

We first evaluate the mean square error (MSE) of the coefficient parameter estimates. A smaller MSE indicates that the estimation is more accurate. The MSE is reported for each eigenvalue, defined as $\frac{1}{D\times\sum_{g}g_{T}}\sum_{d}\sum_{g}\sum_{g_{t}}({\beta}_{d,g_{t}}^{(k)}-{\beta}_{d,g_{t}}^{(k)})^{2}$ . We use box-plots to visualize the MSEs of each replications (see Figure 4).

Regarding our proposed method, we find the trivial impact of the degrees of freedom on the parameter estimation of the coefficients, although the larger value of the degrees of freedom produces slightly more accurate results. This validates that our approximation in Equation 8 is acceptable. Since the interpretations of the coefficients are not equal between the two models, the MSEs of the two methods are not very comparable in terms of their crude numbers. An important observation is that the three eigenvalues in the non-linear regression present heterogeneous MSEs, even if the values of the degrees of freedom are large. This should be caused by that the non-linear regression model treats the three random eigenvalues equally via ignoring the random process of eigenvalues.

Furthermore, we compare the two methods in terms of the $95\%$ coverage probability. The $95\%$ coverage probability is defined as the probability that the true value is covered by the estimated $95\%$ confidence interval. The confidence interval of the non-linear regression model is also obtained via parametric bootstrapping. In Table 1, we summarize the coverage probabilities associated with each eigenvalue $k$ , i.e., $[\beta_{0t}^{(k)},\ {\bm{\beta}_{g_{t}}^{(k)}}^{T}]$ , averaging over voxels, coefficients, simulation replications. The auto-eigenvalue model presents reasonable uncertainties of parameter estimation. However, the non-linear regression model presents an an inflated confidence interval due to the model misspecification, where the parameter estimation of the coefficients has inflated uncertainties.

Table 1: The table summarizes the

95\%

coverage probabilities of the auto-eigenvalue model and the non-linear regression model.

Method	Eigenvalue	Degrees of Freedom
Method	Eigenvalue	20	30	40
Auto-eigenvalues Model	First	90%	91%	92%
	Second	97%	95%	94%
	Third	99%	98%	97%
Non-linear Regression	First	100%	100%	100%
	Second	100%	100%	100%
	Third	100%	100%	100%

5 Applications to IXI Dataset

We use the IXI data-set provided by the Center for the Developing Brain, Imperial College London, to demonstrate our proposed method. The data-set is made available at the website https://brain-development.org/ixi-dataset/ under the CC BY-SA 3.0 license. From the data repository, we select the first $20$ healthy subjects collected from Hammersmith Hospital in London to demonstrate our methodological contribution and compare it to the benchmark method. We focus on the effects of body/weight index (BWI) on the two fibers located at the corpus callosum (see Figure 5). The covariate of BWI is scaled into $[0,1]$ .

We continue to compare our proposed auto-eigenvalue model to the alternative method non-linear regression model. To evaluate the covariate effects, we report the Z-socres of the coefficient estimators, which is defined as $\frac{\widehat{\beta}_{d,g_{t}}^{(k)}}{\sigma_{dt}^{(k)}}$ , where $\sigma_{dt}^{(k)}$ is the standard error of the estimator $\widehat{\beta}_{d,g_{t}}^{(k)}$ . A larger absolute value of z-score indicates that the corresponding covariate has more significant effect on $k$ -th eigenvalue located at the voxel $g_{t}$ .

The covariate effects of BWI are given in Figure 6. First, we observe that the two models’ z-scores are roughly consistent. It indicates that the two models will not produce too different scientific interpretations. However, our proposed method is powerful in finding regions (voxels) with covariate effects since z-scores’ absolute values are relatively larger. This is consistent with our results in the simulation studies (Section 4), that is the non-linear regression model presents inflated statistical uncertainties. Although the real statistical underlying process of the diffusion tensors’ eigenvalues is intractable, this may further endorse our major claim in this paper: the responses are encouraged to be treated as random eigenvalues. Using our proposed model based on the distribution of Wishart’s eigenvalues is an option. Furthermore, the point estimate and $95\%$ confidence interval of $p$ are $0.452$ and $[0.445,0.493]$ , implying the existence of auto-correlation among random eigenvalues. The point estimate of $m$ and $M$ are $31.956$ and $32.795$ , respectively. The $95\%$ confidence intervals of $m$ and $M$ are $[30.256,33.656]$ and $[31.573,33.940]$ , respectively. The large value of degrees of freedom implies that the values of degrees of freedom might be large enough to make the term $e^{\beta_{0t}^{(k)}+\bm{X}_{i}\bm{\beta}_{g_{t}}^{(k)}}$ dominate the mean of the random eigenvalue $\mathbb{E}L_{i,g_{t}}^{(k)}$ .

6 Conclusion and Future Works

This paper focuses on the statistical inference of the diffusion tensors’ eigenvalues because it is more scientifically interpretable. We propose a model of random eigenvalues based on the distribution of Wishart’s eigenvalues. Relying on the previous works i.e., Muirhead (2009, Chapter 3.2.5), James et al. (1964), and Srivastava (1979, Chapter 9.4), we are able to construct the hierarchical model of random eigenvalues and derive the relevant theoretical details. The estimation based on the Monte-Carlo EM algorithm is also proposed. We use simulation to study the impact on the parameter estimation if the responses are not treated as random eigenvalues. The real data analysis using IXI data also suggests that the eigenvalue responses of DTI are encouraged to be treated as random eigenvalues since our proposal produces more powerful results.

In this paper, we focus on eigenvalues and do not yet explore the statistical inference of eigenvectors. Srivastava (1979, Theorem 3.4.2) gives the density function expression of a Wishart distributed matrix’s eigenvectors. More studies on eigenvectors’ statistical inference are encouraged, including the specification of the covariate effect and correlation effect, and computational methods. All of these can be considered as future works.

SUPPLEMENTARY MATERIAL

Appendix A Theoretical Details

In this section, we offer theoretical details which are not fully elaborated in the main contexts.

A.1 Density Function of Parameterized Wishart Distribution

Let $\bm{W}\sim\mathcal{W}_{p}(\bm{\Sigma},U)$ . The probability density function of $\bm{W}$ is

{\displaystyle f_{\mathbf{W}}(\mathbf{w})={\frac{|\mathbf{w}|^{(U-p-1)/2}e^{-\operatorname{tr}(U\mathbf{\Sigma}^{-1}\mathbf{w})/2}}{2^{\frac{Up}{2}}|{\mathbf{\Sigma}/U}|^{U/2}\Gamma_{p}({\frac{U}{2}})}}}.

(12)

A.2 Proof of Lemma 3.2 and Derivation of Equation 10

To make our proof and derivation concise, we omit the subscript $i$ in the following. First, the joint density function of $(\bm{Z}_{j,g_{1}},...,\bm{Z}_{j,g_{T_{g}}})$ denoted as $f_{\bm{Z}_{j,g_{1}},...,\bm{Z}_{j,g_{t}},...,\bm{Z}_{j,g_{T_{g}}}}(\bm{z}_{j,g_{1}},...,\bm{z}_{j,g_{t}},...,\bm{z}_{j,g_{T_{g}}})$ can be decomposed as

		$\displaystyle f_{\bm{Z}_{j,g_{1}},...,\bm{Z}_{j,g_{t}},...,\bm{Z}_{j,g_{T_{g}}}}(\bm{z}_{j,g_{1}},...,\bm{z}_{j,g_{t}},...,\bm{z}_{j,g_{T_{g}}})$		(13)
		$\displaystyle=f_{\bm{Z}_{j,g_{1}}}(\bm{z}_{j,g_{1}})\times f_{\bm{Z}_{j,g_{2}}\|\bm{Z}_{j,g_{1}}}(\bm{z}_{j,g_{2}}\|\bm{z}_{j,g_{1}})\times f_{\bm{Z}_{j,g_{3}}\|\bm{Z}_{j,g_{2}},\bm{Z}_{j,g_{1}}}(\bm{z}_{j,g_{3}}\|\bm{z}_{j,g_{2}},\bm{z}_{j,g_{1}})\times$
		$\displaystyle\quad...$
		$\displaystyle\quad\times f_{\bm{Z}_{j,g_{T}}\|\bm{Z}_{j,g_{T-1}},...,\bm{Z}_{j,g_{1}}}(\bm{z}_{j,g_{T}}\|\bm{z}_{j,g_{T-1}},...,\bm{z}_{j,g_{1}}).$

Since $(\bm{Z}_{j,g_{1}},...,\bm{Z}_{j,g_{T_{g}}})$ follows a multivariate normal distribution with the covariance which is a Toeplitz matrix and the inverse covariance matrix is a tridiagonal matrix, the joint density function $f_{\bm{Z}_{j,g_{1}},...,\bm{Z}_{j,g_{t}},...,\bm{Z}_{j,g_{T_{g}}}}(\bm{z}_{j,g_{1}},...,\bm{z}_{j,g_{t}},...,\bm{z}_{j,g_{T_{g}}})$ can be simplified as

		$\displaystyle f_{\bm{Z}_{j,g_{1}},...,\bm{Z}_{j,g_{t}},...,\bm{Z}_{j,g_{T_{g}}}}(\bm{z}_{j,g_{1}},...,\bm{z}_{j,g_{t}},...,\bm{z}_{j,g_{T_{g}}})$		(14)
		$\displaystyle=f_{\bm{Z}_{j,g_{1}}}(\bm{z}_{j,g_{1}})\times f_{\bm{Z}_{j,g_{2}}\|\bm{Z}_{j,g_{1}}}(\bm{z}_{j,g_{2}}\|\bm{z}_{j,g_{1}})\times f_{\bm{Z}_{j,g_{3}}\|\bm{Z}_{j,g_{2}}}(\bm{z}_{j,g_{3}}\|\bm{z}_{j,g_{2}})\times...\times f_{\bm{Z}_{j,g_{T}}\|\bm{Z}_{j,g_{T-1}}}(\bm{z}_{j,g_{T}}\|\bm{z}_{j,g_{T-1}}).$		(14)

This decomposition indicates that the terms $(\bm{Z}_{j,g_{1}},[\bm{Z}_{j,g_{2}}|\bm{Z}_{j,g_{1}}],[\bm{Z}_{j,g_{3}}|\bm{Z}_{j,g_{2}}],...,[\bm{Z}_{j,g_{T}}|\bm{Z}_{j,g_{T-1}}])$ are mutually independent. Therefore, the joint density function of $\bm{\xi}_{i,g_{1}},...,\bm{\xi}_{i,g_{t}},...,\bm{\xi}_{i,g_{T_{g}}}$ can be written as

f_{\bm{\xi}_{g}}(\bm{\xi}_{i,g_{1}},...,\bm{\xi}_{i,g_{t}},...,\bm{\xi}_{i,g_{T_{g}}};M,p)=f_{\bm{\xi}_{g_{1}}}({\xi}_{i,g_{1}};M)\prod_{t=2}^{T_{g}}f_{\bm{\xi}_{g_{t}}|\bm{\xi}_{g_{t-1}}}(\bm{\xi}_{g_{t}}|\{\bm{Z}_{j,g_{t-1}}:j=1:M\};M,p),

because ${\xi}_{i,g_{t}}$ are the eigenvalues of $\bm{Z}_{j,g_{t}}\bm{Z}^{T}_{j,g_{t}}/M$ .

Next we want to show that $f_{\bm{\xi}_{g_{t}}|\{\bm{Z}_{j,g_{t-1}}:j=1:M\}}(\bm{\xi}_{g_{t}}|\{\bm{Z}_{j,g_{t-1}}:j=1:M\};M,p)=f_{\bm{\xi}_{i,g_{t}}|\bm{\xi}_{i,g_{t-1}}}(\bm{\xi}_{i,g_{t}}|\bm{\xi}_{i,g_{t-1}};M,p)$ . The derivation largely relies on James et al. (1960, Equation 7) and James et al. (1964, Equation 68). The derivation trick is borrowed from Smith and Garth (2007) where the conditional distribution of eigenvalues of a complex Wishart matrix is derived.

First, we re-illustrate James et al. (1960, Equation 7) and James et al. (1964, Equation 68) by giving Theorem A.1.

Theorem A.1.

Let $\bm{K}_{j}\stackrel{{\scriptstyle ind}}{{\sim}}\mathcal{N}(\bm{\mu}/\sqrt{M},\bm{\Sigma}/M)$ . The roots of $|\sum_{j=1}^{M}\bm{K}_{j}\bm{K}_{j}^{T}-\bm{W}\bm{\Sigma}/M|=0$ , denoted as $\bm{W}=diag[W_{1},W_{2},W_{3}]$ follow a distribution with the density function expressed as

		$\displaystyle f_{W_{1},W_{2},W_{3}}(w_{1},w_{2},w_{3})=\exp\Big{[}-\frac{1}{2}(w_{1}+w_{2}+w_{3})\Big{]}\ _{0}F_{1}\Big{(}\frac{1}{2}M;\frac{1}{4}[o_{1},o_{2},o_{3}],[w_{1},w_{2},w_{3}]\Big{)}$		(15)
		$\displaystyle\times\frac{\pi^{3^{2}/2}}{(\frac{2}{M})^{3M/2}\Gamma_{3}(\frac{3}{2})\Gamma_{3}(\frac{M}{2})}\exp\Big{[}-\frac{1}{2}(o_{1}+o_{2}+o_{3})\Big{]}\prod_{k=1}^{3}(w_{k})^{(M-3-1)/2}\prod_{k<q}^{3}[w_{k}-w_{q}],$		(15)

where the roots of $|\bm{\mu}\bm{\mu}^{T}-\bm{O}\bm{\Sigma}/M|=0$ are $\bm{O}=diag[o_{1},o_{2},o_{3}]$ .

We next apply this theorem to our problem based on the result of conditional normal distribution, i.e., $[\bm{Z}_{j,g_{t}}|\bm{Z}_{j,g_{t-1}}]\sim\mathcal{N}({p}\bm{Z}_{j,g_{t-1}},(1-{p}^{2})\bm{I})$ . By letting $\bm{K}_{j}$ in Theorem A.1 be $[\bm{Z}_{j,g_{t}}|\bm{Z}_{j,g_{t-1}}]$ , we have

		$\displaystyle f_{W_{1},W_{2},W_{3}}(w_{1},w_{2},w_{3})=\exp\Big{[}-\frac{1}{2}(w_{1}+w_{2}+w_{3})\Big{]}\ _{0}F_{1}\Big{(}\frac{1}{2}M;\frac{M{p}^{2}}{4(1-{p}^{2})}[\xi_{i,g_{t-1}}^{(1)},\xi_{i,g_{t-1}}^{(2)},\xi_{i,g_{t-1}}^{(3)}],[w_{1},w_{2},w_{3}]\Big{)}$		(16)
		$\displaystyle\times\frac{\pi^{3^{2}/2}}{(\frac{2}{M})^{3M/2}\Gamma_{3}(\frac{3}{2})\Gamma_{3}(\frac{M}{2})}\exp\Big{[}-\frac{M{p}^{2}}{2(1-{p}^{2})}(\xi_{i,g_{t-1}}^{(1)}+\xi_{i,g_{t-1}}^{(2)}+\xi_{i,g_{t-1}}^{(3)})\Big{]}\prod_{k=1}^{3}(w_{k})^{(M-3-1)/2}\prod_{k<q}^{3}[w_{k}-w_{q}],$		(16)

where $[W_{1},W_{2},W_{3}]=\frac{M}{(1-{p}^{2})}[\xi_{i,g_{t-1}}^{(1)},\xi_{i,g_{t-1}}^{(2)},\xi_{i,g_{t-1}}^{(3)}]$ . Via change of variables, we have

$\displaystyle f_{\bm{\xi}_{i,g_{t}}\|\bm{\xi}_{i,g_{t-1}}}(\bm{\xi}_{i,g_{t}}\|\bm{\xi}_{i,g_{t-1}};M,p)$	$\displaystyle=f_{W_{1},W_{2},W_{3}}(\frac{M}{(1-{p}^{2})}\xi_{i,g_{t-1}}^{(1)},\frac{M}{(1-{p}^{2})}\frac{M}{(1-{p}^{2})}\xi_{i,g_{t-1}}^{(2)},\frac{M}{(1-{p}^{2})}\xi_{i,g_{t-1}}^{(3)})\times\|J\|$	(17)
	(Because $\ {}_{0}F_{1}(m;\bm{A},\bm{B})=\ _{0}F_{1}(m;\bm{A}/c,\bm{B}c)$ , and $\|J\|=\|\frac{M}{1-p^{2}}\|^{3}$ )
	$\displaystyle=\frac{\pi^{3^{2}/2}}{(\frac{2}{M})^{3M/2}\Gamma_{3}(\frac{3}{2})\Gamma_{3}(\frac{M}{2})}\prod_{k=1}^{3}[\xi_{i,g_{t}}^{(k)}]^{(M-3-1)/2}\prod_{k<q}^{3}[\xi_{i,g_{t}}^{(k)}-\xi_{i,g_{t}}^{(q)}]$
	$\displaystyle\times\exp\Bigg{[}-\frac{M}{2}\Big{(}\frac{p^{2}}{1-p^{2}}\sum_{k=1}^{K}\xi_{i,g_{t-1}}^{(k)}+\frac{1}{1-p^{2}}\sum_{k=1}^{K}\xi_{i,g_{t}}^{(k)}\Big{)}\Bigg{]}\times\big{(}\frac{1}{1-p^{2}}\big{)}^{\frac{3M}{2}}$
	$\displaystyle\times\ _{0}F_{1}\Big{(}\frac{1}{2}M;\frac{M}{2}\frac{p}{1-p^{2}}[\xi_{i,g_{t-1}}^{(1)},\xi_{i,g_{t-1}}^{(2)},\xi_{i,g_{t-1}}^{(3)}],\frac{M}{2}\frac{p}{1-p^{2}}[\xi_{i,g_{t}}^{(1)},\xi_{i,g_{t}}^{(2)},\xi_{i,g_{t}}^{(3)}]\Big{)}.$

Appendix B Codes

All the codes written in MATLAB are summarized in AutoEigenCodes.zip. The instructions of implementing the codes are provided readme.txt. Some example scripts are also given.

References

(1)
Cressie and Wikle (2015) Cressie, N. and Wikle, C. K. (2015), Statistics for spatio-temporal data, John Wiley & Sons.
Gelfand et al. (2004) Gelfand, A. E., Schmidt, A. M., Banerjee, S. and Sirmans, C. (2004), ‘Nonstationary multivariate process modeling through spatially varying coregionalization’, Test 13(2), 263–312.
James et al. (1960) James, A. T. et al. (1960), ‘The distribution of the latent roots of the covariance matrix’, Annals of Mathematical Statistics 31(1), 151–158.
James et al. (1964) James, A. T. et al. (1964), ‘Distributions of matrix variates and latent roots derived from normal samples’, The Annals of Mathematical Statistics 35(2), 475–501.
Jin et al. (2017) Jin, Y., Huang, C., Daianu, M., Zhan, L., Dennis, E. L., Reid, R. I., Jack Jr, C. R., Zhu, H., Thompson, P. M. and Initiative, A. D. N. (2017), ‘3 d tract-specific local and global analysis of white matter integrity in a lzheimer’s disease’, Human brain mapping 38(3), 1191–1207.
Koev and Edelman (2006) Koev, P. and Edelman, A. (2006), ‘The efficient evaluation of the hypergeometric function of a matrix argument’, Mathematics of Computation 75(254), 833–846.
Lan et al. (2021) Lan, Z., Reich, B. J., Guinness, J., Bandyopadhyay, D., Ma, L. and Moeller, F. G. (2021), ‘Geostatistical modeling of positive definite matrices: An application to diffusion tensor imaging’, Biometrics p. In Press.
Lane et al. (2010) Lane, S. D., Steinberg, J. L., Ma, L., Hasan, K. M., Kramer, L. A., Zuniga, E. A., Narayana, P. A. and Moeller, F. G. (2010), ‘Diffusion tensor imaging and decision making in cocaine dependence’, PLoS One 5(7), e11591.
Lawley (1956) Lawley, D. (1956), ‘Tests of significance for the latent roots of covariance and correlation matrices’, Biometrika 43(1/2), 128–136.
Lee and Schwartzman (2017) Lee, H. N. and Schwartzman, A. (2017), ‘Inference for eigenvalues and eigenvectors in exponential families of random symmetric matrices’, Journal of Multivariate Analysis 162, 152–171.
Levine and Casella (2001) Levine, R. A. and Casella, G. (2001), ‘Implementations of the monte carlo em algorithm’, Journal of Computational and Graphical Statistics 10(3), 422–439.
Ma et al. (2009) Ma, L., Hasan, K. M., Steinberg, J. L., Narayana, P. A., Lane, S. D., Zuniga, E. A., Kramer, L. A. and Moeller, F. G. (2009), ‘Diffusion tensor imaging in cocaine dependence: regional effects of cocaine on corpus callosum and effect of cocaine administration route’, Drug and Alcohol Dependence 104(3), 262–267.
Muirhead (2009) Muirhead, R. J. (2009), Aspects of multivariate statistical theory, John Wiley & Sons.
Peng et al. (2009) Peng, H., Orlichenko, A., Dawe, R. J., Agam, G., Zhang, S. and Arfanakis, K. (2009), ‘Development of a human brain diffusion tensor template’, Neuroimage 46(4), 967–980.
Schwartzman et al. (2008) Schwartzman, A., Mascarenhas, W. F., Taylor, J. E. et al. (2008), ‘Inference for eigenvalues and eigenvectors of gaussian symmetric matrices’, The Annals of Statistics 36(6), 2886–2919.
Smith and Garth (2007) Smith, P. J. and Garth, L. M. (2007), ‘Distribution and characteristic functions for correlated complex Wishart matrices’, Journal of Multivariate Analysis 98(4), 661–677.
Srivastava (1979) Srivastava, M. (1979), An Introduction to Multivariate Statistics, North-Holland/New York.
Wong et al. (2016) Wong, R. K., Lee, T. C., Paul, D., Peng, J., Initiative, A. D. N. et al. (2016), ‘Fiber direction estimation, smoothing and tracking in diffusion mri’, The Annals of Applied Statistics 10(3), 1137–1156.
Yuan et al. (2012) Yuan, Y., Zhu, H., Lin, W. and Marron, J. S. (2012), ‘Local polynomial regression for symmetric positive definite matrices’, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 74(4), 697–719.
Zhou (2010) Zhou, D. (2010), Statistical analysis of diffusion tensor imaging, PhD thesis, University of Nottingham.
Zhu et al. (2009) Zhu, H., Chen, Y., Ibrahim, J. G., Li, Y., Hall, C. and Lin, W. (2009), ‘Intrinsic regression models for positive-definite matrices with applications to diffusion tensor imaging’, Journal of the American Statistical Association 104(487), 1203–1212.
Zhu et al. (2011) Zhu, H., Kong, L., Li, R., Styner, M., Gerig, G., Lin, W. and Gilmore, J. H. (2011), ‘Fadtts: functional analysis of diffusion tensor tract statistics’, NeuroImage 56(3), 1412–1425.
Zhu et al. (2010) Zhu, H., Styner, M., Li, Y., Kong, L., Shi, Y., Lin, W., Coe, C. and Gilmore, J. H. (2010), Multivariate varying coefficient models for dti tract statistics, in ‘International Conference on Medical Image Computing and Computer-Assisted Intervention’, Springer, pp. 690–697.