This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Statistical Inference of Auto-correlated Eigenvalues with Applications to Diffusion Tensor Imaging

Zhou Lan
 Center for Outcomes Research and Evaluation
Yale School of Medicine
Abstract

Diffusion tensor imaging (DTI) is a prevalent neuroimaging tool in analyzing the anatomical structure. The distinguishing feature of DTI is that the voxel-wise variable is a 3×33\times 3 positive definite matrix other than a scalar, describing the diffusion process at the voxel. Recently, several statistical methods have been proposed to analyze the DTI data. This paper focuses on the statistical inference of eigenvalues of DTI because it provides more transparent clinical interpretations. However, the statistical inference of eigenvalues is statistically challenging because few treat these responses as random eigenvalues. In our paper, we rely on the distribution of the Wishart matrix’s eigenvalues to model the random eigenvalues. A hierarchical model which captures the eigenvalues’ randomness and spatial auto-correlation is proposed to infer the local covariate effects. The Monte-Carlo Expectation-Maximization algorithm is implemented for parameter estimation. Both simulation studies and application to IXI data-set are used to demonstrate our proposal. The results show that our proposal is more proper in analyzing auto-correlated random eigenvalues compared to alternatives.


Keywords: Auto-correlation, Diffusion Tensor Imaging, Monte-Carlo Expectation-Maximization Algorithm, Random Eigenvalues, Wishart Distribution

1 Introduction

Diffusion tensor imaging (DTI) is a prevalent neuroimaging tool in analyzing the anatomical structure of a tissue. The rich use of DTI in brain science triggers many meaningful interdisciplinary studies. Due to the nature of interdisciplinarity, providing a tutorial of DTI, which is intuitive to both statisticians and clinicians, is key in interdisciplinary researches of DTI. Based on Figure 1 which is modified from figures in (Zhou, 2010, Figure 1.6), we give a short tutorial of DTI as follows. DTI is a neuroimage measuring the white matter tracts within a tissue (e.g., human brain) voxel-by-voxel. Let the structure at the upper panel of Figure 1 be the anatomical structure measured at a voxel. The movement of water molecules in different environments is measured by giving their respective diffusion ellipsoids at the lower panel of Figure 1. The eigenvectors are the diffusion directions [𝒆1𝒆2𝒆3][\bm{e}_{1}\ \bm{e}_{2}\ \bm{e}_{3}] and the squared roots of the eigenvalues [l1l2l3][l_{1}\ l_{2}\ l_{3}] are the corresponding semi-axis lengths (Zhou, 2010, Section 1.2.3). The corresponding diffusion tensor D=[𝒆1𝒆2𝒆3]diag(l1,l2,l3)[𝒆1𝒆2𝒆3]TD=[\bm{e}_{1}\ \bm{e}_{2}\ \bm{e}_{3}]\text{diag}(l_{1},l_{2},l_{3})[\bm{e}_{1}\ \bm{e}_{2}\ \bm{e}_{3}]^{T} describes the anatomical structure at the voxel. Combining the diffusion tensors, which are positive definite over all the voxels, we have an image whose voxel-wise variables are positive definite diffusion tensors, describing the anatomical structure of a tissue.

Refer to caption
Figure 1: The figure gives the graphical illustration of the anatomical structure and the corresponding diffusion process. The upper panel gives the tissue to be measured. The lower panel gives the corresponding diffusion ellipsoid.

The use of DTI is extensive (e.g., fiber tracking, clinical studies, etc.). Our paper focuses on understanding how the subject’s covariates (e.g., age, gender, disease status) drive the brain’s anatomical structures in terms of DTI. Many brain scientists raised this scientific objective. The relevant researches are now divided into two paths. The statisticians are interested in developing methodologies that handle the statistical randomness of the whole diffusion tensor. There are many representative works (e.g., Schwartzman et al., 2008; Zhu et al., 2009; Yuan et al., 2012; Lee and Schwartzman, 2017; Lan et al., 2021). The clinicians are more interested in using convenient summary scalar to tell the changes of diffusion tensor quickly and easily. For example, the fractional anisotropy, denoted as FA=12(l1l2)2+(l2l3)2+(l3l1)2l12+l22+l32,{\displaystyle{\text{FA}}={\sqrt{\frac{1}{2}}}{\frac{\sqrt{(l_{1}-l_{2})^{2}+(l_{2}-l_{3})^{2}+(l_{3}-l_{1})^{2}}}{\sqrt{l_{1}^{2}+l_{2}^{2}+l_{3}^{2}}}},} is a frequently used summary scalar. Via implementing basic statistical tools, i.e., t-test, simple linear regression with these quantities as data, the researchers can easily tell the changes of the tissue statistically (e.g., Ma et al., 2009; Lane et al., 2010).

In light of Figure 1, we propose that analyzing the DTI’s eigenvalues and eigenvectors separately can be another promising research direction. For statisticians, this path does not preclude any valuable available data information. For clinicians, the scientific explanations in terms of the statistical uncertainties of eigenvalues and/or eigenvectors are more transparent than the other ways, e.g., coefficients based on Cholesky decomposition (Zhu et al., 2009; Lan et al., 2021) or local polynomial regression (Yuan et al., 2012). In our paper, we focus on the statistical analysis of eigenvalues because the variations of eigenvalues are primarily interested by the clinicians given the current clinical reports (e.g., Ma et al., 2009; Lane et al., 2010). In this paper, the analysis of eigenvectors is left as future works.

Compared to other statistical models regarding the eigenvalues of DTI (e.g., Zhu et al., 2010, 2011; Jin et al., 2017), our paper may be a very first paper discussing the statistical inference of random eigenvalues. It is an important fact that the eigenvalues are stochastically ordered, and thus the probability density function of the eigenvalues must ensure the inequality of the eigenvalues stochastically. This inequality makes many classic models such as multivariate Gaussian processes not be applied to this problem straightforwardly. In this paper, we use the density function of a Wishart distributed matrix’s eigenvalues as the basic probability distribution, referred to as distribution of Wishart’s eigenvalues. The distribution of Wishart’s eigenvalues ensures the inequality of the eigenvalues stochastically.

Following many neuroimaging studies, the spatial dependence of the eigenvalues among voxels is encouraged to be incorporated. In the previous analysis (e.g., Zhu et al., 2010, 2011; Jin et al., 2017), the spatial dependence can be easily achieved via the spatial Gaussian process model. Many fiber tracking reports (e.g., Wong et al., 2016) assume that the diffusion tensors are smooth along with a fiber tract (see Figure 2) whose diffusion tensors have similar directions. In light of this assumption, we assume that the eigenvalues are auto-correlated. The model based on auto-correlated eigenvalues captures the spatial correlation and produces an analytic expression of the density function of the spatially correlated eigenvalues (James et al., 1964).

Refer to caption
Figure 2: The left panel gives a human brain in terms of fiber tracts. The right panel gives how fiber tracts are constructed. The principal eigenvalues of diffusion tensors are given in each voxel. The voxels with similar directions are considered as a fiber tract, e.g., the voxels linked by the blue curve.

Combining all the above, we construct a hierarchical spatial model for the random auto-correlated eigenvalues. This model is both statistically and scientifically proper regarding the applications to diffusion tensor imaging. Given the hierarchical structure, we implement the Monte-Carlo Expectation-Maximization (EM) algorithm (Levine and Casella, 2001, Section 2.1) for the parameter estimation and implement parametric bootstrapping for uncertainties quantification.

Both simulation studies and real data applications are used to demonstrate our proposed method. We compare our proposal to a non-linear regression model treating the data as normally distributed. The simulation studies indicate that our model gives a more accurate parameter point and confidence interval estimation. The real data analysis based on the IXI data-set (https://brain-development.org/ixi-dataset/) provided by Center for the Developing Brain, Imperial College London, an open data-set made available under the Creative Commons CC BY-SA 3.0 license, shows more clinically meaningful results after treating the data as random eigenvalues.

In the rest of the paper, we first give the hierarchical spatial model of random auto-correlated eigenvalues in Section 2. The estimation of the proposed model is provided in Section 3. We give simulation studies and real data analysis in Section 4 and Section 5, respectively. Finally, we give a conclusion in Section 6. The theoretical details in our paper are in Appendix A. The codes for implementing our method are summarized in Appendix B.

2 Model Setup

In this section, we give the model construction step-by-step. First, we provide the basic model setup in Section 2.1, where the probability distribution of the eigenvalues and the covariate effect specification are introduced. Under the basic model, we further give a latent layer of eigenvalues that induce spatial correlation and provide the details in Section 2.2. To wrap up, we provide a model summary and introduce its application to the statistical inference of eigenvalues of DTI in Section 2.3.

Our model is based on the the parameterized Wishart distribution, and the parameterized Wishart distribution is slightly different from the Wishart distribution introduced in the classic textbooks. We give the construction of the parameterized Wishart distribution as follows. 𝒁j\bm{Z}_{j} independently follows a mean-zero normal distribution with the covariance matrix 𝚺\bm{\Sigma} over j{1,,U}j\in\{1,...,U\} , denoted as 𝒁jind𝒩(𝟎,𝚺)\bm{Z}_{j}\stackrel{{\scriptstyle ind}}{{\sim}}\mathcal{N}(\bm{0},\bm{\Sigma}). 𝑾=j=1U𝒁j𝒁jT/U\bm{W}=\sum_{j=1}^{U}\bm{Z}_{j}\bm{Z}_{j}^{T}/U follows a parameterized Wishart distribution with the mean matrix 𝚺\bm{\Sigma} and the degrees of freedom UU, denoted as 𝑾𝒲p(𝚺,U)\bm{W}\sim\mathcal{W}_{p}(\bm{\Sigma},U). The density function of 𝑾\bm{W} is given in Appendix A.1.

2.1 Basic Model Setup

Let 𝑫it\bm{D}_{it} be a 3×33\times 3 positive definite matrix-random variable following a parameterized Wishart distribution with the mean matrix 𝚺it\bm{\Sigma}_{it} and the degrees of freedom mm, denoted as

𝑫it𝒲3(𝚺it,m).\bm{D}_{it}\sim\mathcal{W}_{3}(\bm{\Sigma}_{it},m). (1)

𝑫it\bm{D}_{it} can be treated as the diffusion tensor at the voxel t{1,2,,T}t\in\{1,2,...,T\} of the subject i{1,2,,I}i\in\{1,2,...,I\}. The eigenvalues of 𝑫it\bm{D}_{it} are [Lit(1),Lit(2),Lit(3)][L_{it}^{(1)},L_{it}^{(2)},L_{it}^{(3)}] with the stochastic inequality Lit(1)>Lit(2)>Lit(3)L_{it}^{(1)}>L_{it}^{(2)}>L_{it}^{(3)}. Besides the degrees of freedom mm, the density function of [Lit(1),Lit(2),Lit(3)][L_{it}^{(1)},L_{it}^{(2)},L_{it}^{(3)}] only depends on the eigenvalues of 𝚺it\bm{\Sigma}_{it} which are λit(1)>λit(2)>λit(3)\lambda_{it}^{(1)}>\lambda_{it}^{(2)}>\lambda_{it}^{(3)} (Muirhead, 2009, Theorem 3.2.18). Therefore, we say that [Lit(1),Lit(2),Lit(3)][L_{it}^{(1)},L_{it}^{(2)},L_{it}^{(3)}] follow a distribution of Wishart’s eigenvalues with the degrees freedom mm and the population eigenvalues λit(1)>λit(2)>λit(3)\lambda_{it}^{(1)}>\lambda_{it}^{(2)}>\lambda_{it}^{(3)}, denoted as

[Lit(1),Lit(2),Lit(3)]([λit(1),λit(2),λit(3)],m)[L_{it}^{(1)},L_{it}^{(2)},L_{it}^{(3)}]\sim\mathcal{E}([\lambda_{it}^{(1)},\lambda_{it}^{(2)},\lambda_{it}^{(3)}],m) (2)

Given Lawley (1956, Equation 3), the mean of the eigenvalue is expressed as 𝔼Lit(k)=λit(k)+𝒪(m1/2)\mathbb{E}L_{it}^{(k)}=\lambda_{it}^{(k)}+\mathcal{O}(m^{-1/2}), where 𝒪(m1/2)\mathcal{O}(m^{-1/2}) is a quantity with order m1/2m^{-1/2}. Since λit(k)\lambda_{it}^{(k)} is the so-called population eigenvalue of the distribution (the eigenvalue of the mean matrix 𝚺it\bm{\Sigma}_{it}), we point out that the expression 𝔼Lit(k)=λit(k)+𝒪(m1/2)\mathbb{E}L_{it}^{(k)}=\lambda_{it}^{(k)}+\mathcal{O}(m^{-1/2}) may not be consistent with our intuition. The value of the degrees of freedom mm also leads to the small variance of the eigenvalues. Therefore, λit(k)\lambda_{it}^{(k)} can be approximately treated as the mean of Lit(k)L_{it}^{(k)} when mm is large enough, denoted as 𝔼Lit(k)λit(k)\mathbb{E}L_{it}^{(k)}\simeq\lambda_{it}^{(k)}, although its formal definition is the so-called population eigenvalue.

Let 𝑿i\bm{X}_{i} be a 1×D1\times D matrix containing the covariate information (e.g., age, gender) of the subject ii. We rescale the original value of each covariate XidX_{id} into the range in [0,1][0,1]. We further specify that 𝔼Lit(k)eβ0t(k)+𝑿i𝜷t(k)=λit(k)\mathbb{E}L_{it}^{(k)}\simeq e^{\beta_{0t}^{(k)}+\bm{X}_{i}\bm{\beta}_{t}^{(k)}}=\lambda_{it}^{(k)} where 𝜷t(k)=[β1t(k),,βDt(k)]T\bm{\beta}_{t}^{(k)}=[{\beta}_{1t}^{(k)},...,{\beta}_{Dt}^{(k)}]^{T}, where βdt(k){\beta}_{dt}^{(k)} is the coefficient for the covariate dd of the kk-th eigenvalue located at the voxel tt and β0t(k){\beta}_{0t}^{(k)} is the corresponding intercept. This specification ensures that the population eigenvalues are positive, and the coefficients represent the linear effects on the logarithm of the mean of the random eigenvalue, denoted as log𝔼Lit(k)β0t(k)+𝑿i𝜷t(k)\log\mathbb{E}L_{it}^{(k)}\simeq\beta_{0t}^{(k)}+\bm{X}_{i}\bm{\beta}_{t}^{(k)}. We also put the constraints βdt(1)>βdt(2)>βdt(3){\beta}_{dt}^{(1)}>{\beta}_{dt}^{(2)}>{\beta}_{dt}^{(3)} for d=0,1,2,,Dd=0,1,2,...,D to ensure the inequality of the population eigenvalues. Combining all above, the means of the random eigenvalue responses (i.e., 𝔼Lit(k)\mathbb{E}L_{it}^{(k)}) are determined by the covariate information.

2.2 Latent Eigenvalues Setup

In Section 2.1, we give basic model construction for the random eigenvalue responses. In this section, we further induce the spatial dependence among the random eigenvalues by giving the latent eigenvalues setup. The spatial model to be built is expected to have the following three properties:

Property 1.

The eigenvalues [Lit(1),Lit(2),Lit(3)][L_{it}^{(1)},L_{it}^{(2)},L_{it}^{(3)}] are independent and identically distributed (i.i.d) over subjects i{1,,N}i\in\{1,...,N\}.

Property 2.

Given a subject ii, [Lit(1),Lit(2),Lit(3)][L_{it}^{(1)},L_{it}^{(2)},L_{it}^{(3)}] are spatially correlated over the voxels t{1,,T}t\in\{1,...,T\}.

Property 3.

The approximation 𝔼Lit(k)eβ0t(k)+𝐗i𝛃t(k)\mathbb{E}L_{it}^{(k)}\simeq e^{\beta_{0t}^{(k)}+\bm{X}_{i}\bm{\beta}_{t}^{(k)}} given in the basic model setup is preserved after giving the latent eigenvalues setup.

To achieve these goals, we decompose λit(k)\lambda_{it}^{(k)} as λit(k)=ξit(k)eβ0t(k)+𝑿i𝜷t(k)\lambda_{it}^{(k)}=\xi_{it}^{(k)}e^{\beta_{0t}^{(k)}+\bm{X}_{i}\bm{\beta}_{t}^{(k)}}, where 𝜷t(k)=[β1t(k),,βDt(k)]T\bm{\beta}_{t}^{(k)}=[{\beta}_{1t}^{(k)},...,{\beta}_{Dt}^{(k)}]^{T}. The additional terms ξit(1)>ξit(2)>ξit(3)\xi_{it}^{(1)}>\xi_{it}^{(2)}>\xi_{it}^{(3)} are random eigenvalues follow a so-called spatial process of Wishart’s eigenvalues, inducing the spatial correlation. As a prerequisite, we first introduce the spatial Wishart process (Gelfand et al., 2004; Lan et al., 2021) which is a means to induce the correlation among Wishart matrices. Let 𝒁t\bm{Z}_{t} be a mean-zero Gaussian process with the covariance as cov(𝒁t,𝒁s)=ρts×𝚺\text{cov}(\bm{Z}_{t},\bm{Z}_{s})=\rho_{ts}\times\bm{\Sigma}, where ρts(0,1)\rho_{ts}\in(0,1) is the correlation, and 𝚺\bm{\Sigma} is the cross covariance matrix of the Gaussian process. In the rest of the paper, we simply give 𝚺=𝑰\bm{\Sigma}=\bm{I}, where 𝑰\bm{I} is an identity matrix. If we have i.i.d realizations of this process, denoted as 𝒁jt\bm{Z}_{jt}, where jj is the subscript for realizations and tt is the subscript for the Gaussian process, then 𝑼t=j=1M𝒁jt𝒁jtT/M\bm{U}_{t}=\sum_{j=1}^{M}\bm{Z}_{jt}\bm{Z}_{jt}^{T}/M follows a so-called spatial Wishart process, denoted as {𝑼t}t𝕋𝒮𝒲𝒫(M,ρts𝑰)\{\bm{U}_{t}\}_{t\in\mathbb{T}}\sim\mathcal{SWP}(M,\rho_{ts}\otimes\bm{I}). Subsequently, the eigenvalues of 𝑼t\bm{U}_{t} which are ξt(1)>ξt(2)>ξt(3)\xi_{t}^{(1)}>\xi_{t}^{(2)}>\xi_{t}^{(3)} follow a spatial process of Wishart’s eigenvalues, denoted as

[ξt(1),ξt(2),ξt(3)]}t𝕋𝒮𝒫(M,ρts[1,1,1]).{[\xi_{t}^{(1)},\xi_{t}^{(2)},\xi_{t}^{(3)}]\}_{t\in\mathbb{T}}\sim\mathcal{SEP}(M,\rho_{ts}\otimes[1,1,1])}. (3)

Let [ξit(1)>ξit(2)>ξit(3)][\xi_{it}^{(1)}>\xi_{it}^{(2)}>\xi_{it}^{(3)}] be the ii-th i.i.d. realization of {[ξt(1),ξt(2),ξt(3)]}t𝕋𝒮𝒫(M,ρts[1,1,1])\{[\xi_{t}^{(1)},\xi_{t}^{(2)},\xi_{t}^{(3)}]\}_{t\in\mathbb{T}}\sim\mathcal{SEP}(M,\rho_{ts}\otimes[1,1,1]) at the voxel tt. To wrap-up, our model at this stage is

[Lit(1),Lit(2),Lit(3)]|[ξit(1),ξit(2),ξit(3)]([λit(1),λit(2),λit(3)],m),\displaystyle[L_{it}^{(1)},L_{it}^{(2)},L_{it}^{(3)}]|[\xi_{it}^{(1)},\xi_{it}^{(2)},\xi_{it}^{(3)}]\sim\mathcal{E}([\lambda_{it}^{(1)},\lambda_{it}^{(2)},\lambda_{it}^{(3)}],m), (4)
λit(k)=ξit(k)eβ0t(k)+𝑿i𝜷t(k),\displaystyle\lambda_{it}^{(k)}=\xi_{it}^{(k)}e^{\beta_{0t}^{(k)}+\bm{X}_{i}\bm{\beta}_{t}^{(k)}},
{[ξit(1),ξit(2),ξit(3)]}t𝕋𝒮𝒫(M,ρts[1,1,1]).\displaystyle\{[\xi_{it}^{(1)},\xi_{it}^{(2)},\xi_{it}^{(3)}]\}_{t\in\mathbb{T}}\sim\mathcal{SEP}(M,\rho_{ts}\otimes[1,1,1]).

Next, we show this model satisfies the three properties we stated at the beginning of this subsection.

Since the latent eigenvalues [ξit(1)>ξit(2)>ξit(3)][\xi_{it}^{(1)}>\xi_{it}^{(2)}>\xi_{it}^{(3)}] are i.i.d. over i{1,,N}i\in\{1,...,N\}, Property 1 that the eigenvalues [Lit(1),Lit(2),Lit(3)][L_{it}^{(1)},L_{it}^{(2)},L_{it}^{(3)}] are i.i.d over subjects i{1,,N}i\in\{1,...,N\} is satisfied. We measure the spatial dependence of this process via the expected squared Frobenius norm of the term diag([ξit(1),ξit(2),ξit(3)])diag([ξs(1),ξs(2),ξs(3)])\text{diag}([\xi_{it}^{(1)},\xi_{it}^{(2)},\xi_{it}^{(3)}])-\text{diag}([\xi_{s}^{(1)},\xi_{s}^{(2)},\xi_{s}^{(3)}]), denoted as

𝒱(t,s)=𝔼diag([ξit(1),ξit(2),ξit(3)])diag([ξis(1),ξis(2),ξis(3)])F2.\mathcal{V}(t,s)=\mathbb{E}||\text{diag}([\xi_{it}^{(1)},\xi_{it}^{(2)},\xi_{it}^{(3)}])-\text{diag}([\xi_{is}^{(1)},\xi_{is}^{(2)},\xi_{is}^{(3)}])||_{F}^{2}. (5)

The expected squared Frobenius norm 𝒱(t,s)\mathcal{V}(t,s) can be treated as a variogram (Cressie and Wikle, 2015; Lan et al., 2021). We use Monte-Carlo approximation to visualize 𝒱(t,s)\mathcal{V}(t,s) (see Figure 3) since it has no analytic expression. The term 𝒱(t,s)\mathcal{V}(t,s) is a decreasing function of ρts\rho_{ts}. A larger value of the degrees of freedom mm leads to a larger value of 𝒱(t,s)\mathcal{V}(t,s), indicating lager local variations. We conclude that the spatial dependence replies on the correlation parameter ρts\rho_{ts} of the latent Gaussian processes. Thus, Property 2 that [Lit(1),Lit(2),Lit(3)][L_{it}^{(1)},L_{it}^{(2)},L_{it}^{(3)}] are spatially correlated over the voxels tt, is satisfied. Furthermore, conditional on ξit(k)\xi_{it}^{(k)}, the expectation of Lit(k)L_{it}^{(k)} is 𝔼Lit(k)|ξit(k)Lit(k)ξit(k)eβ0t(k)+𝑿i𝜷t(k)\mathbb{E}_{L_{it}^{(k)}|\xi_{it}^{(k)}}L_{it}^{(k)}\simeq\xi_{it}^{(k)}e^{\beta_{0t}^{(k)}+\bm{X}_{i}\bm{\beta}_{t}^{(k)}}. Locally, [ξit(1),ξit(2),ξit(3)][\xi_{it}^{(1)},\xi_{it}^{(2)},\xi_{it}^{(3)}] follows ([1,1,1],M)\mathcal{E}([1,1,1],M) with 𝔼ξit(k)1\mathbb{E}\xi_{it}^{(k)}\simeq 1 as MM is large. Thus, Property 3 that the approximation 𝔼Lit(k)eβ0t(k)+𝑿i𝜷t(k)\mathbb{E}L_{it}^{(k)}\simeq e^{\beta_{0t}^{(k)}+\bm{X}_{i}\bm{\beta}_{t}^{(k)}} is still preserved after ξit(k)\xi_{it}^{(k)} is marginalized out.

Refer to caption
Figure 3: The function 𝒱(t,s)\mathcal{V}(t,s) is visualized via Monte-Carlo approximation. The x-axis is for the values of ρts\rho_{ts}. The y-axis is for the values of 𝒱(t,s)\mathcal{V}(t,s). The orange, red, and blue curves are for m=20,30,40m=20,30,40, respectively.

Next, we give the specification of the correlation ρts\rho_{ts}. In light of many interdisciplinary works on fiber tracking Wong et al. (e.g., 2016, Section 4), the spatial dependence can be assumed to be auto-correlated along with a fiber tract, and there is no spatial dependence across fibers (see Figure 2). To accompany with this feature, we no longer use the subscript tt to denote the voxels but use the subscript gtg_{t} to denote the tt-th voxel of the gg-th fiber. Given a fiber gg, [g1,,gt,,gTg][g_{1},...,g_{t},...,g_{T_{g}}] are the 11-st, …, tt-th, …, TgT_{g}-th diffusion tensor ordered from the first diffusion tensor of the fiber to the last diffusion tensor of the tensor, respectively. To accompany with the auto-correlation of the voxels along a fiber, we give that ρgt,gs=p|ts|\rho_{g_{t},g^{\prime}_{s}}=p^{|t-s|} for g=gg=g^{\prime}, where pp is the correlation parameter controlling the spatial dependence. To accompany with that there is no spatial dependence across fibers, we give that ρgt,gs=0\rho_{g_{t},g^{\prime}_{s}}=0 for ggg\not=g^{\prime}. A by-product of this spatial auto-correlation is to produce an analytic expression of the density function of the spatially correlated eigenvalues (James et al., 1964).

2.3 Model Summary and Diffusion Tensor Imaging

To this end, our proposed model of random auto-correlated eigenvalues is fully expressed as

Basic Level:{[Li,gt(1),Li,gt(2),Li,gt(3)]|[ξi,gt(1),ξi,gt(2),ξi,gt(3)]([λi,gt(1),λi,gt(2),λi,gt(3)],m);λi,gt(k)=ξi,gt(k)eβ0,gt(k)+𝑿i𝜷gt(k);𝜷gt(k)=[β1,gt(k),,βD,gt(k)]T;βd,gt(1)>βd,gt(2)>βd,gt(3)\displaystyle{\textbf{Basic Level:}}\left\{\begin{aligned} &[L_{i,g_{t}}^{(1)},L_{i,g_{t}}^{(2)},L_{i,g_{t}}^{(3)}]|[\xi_{i,g_{t}}^{(1)},\xi_{i,g_{t}}^{(2)},\xi_{i,g_{t}}^{(3)}]\sim\mathcal{E}([\lambda_{i,g_{t}}^{(1)},\lambda_{i,g_{t}}^{(2)},\lambda_{i,g_{t}}^{(3)}],m);\\ &\lambda_{i,g_{t}}^{(k)}=\xi_{i,g_{t}}^{(k)}e^{\beta_{0,g_{t}}^{(k)}+\bm{X}_{i}\bm{\beta}_{g_{t}}^{(k)}};\\ &\bm{\beta}_{g_{t}}^{(k)}=[{\beta}_{1,g_{t}}^{(k)},...,{\beta}_{D,g_{t}}^{(k)}]^{T};\quad{\beta}_{d,g_{t}}^{(1)}>{\beta}_{d,g_{t}}^{(2)}>{\beta}_{d,g_{t}}^{(3)}\\ \end{aligned}\right. (6)
Latent Level:{{[ξi,gt(1),ξi,gt(2),ξi,gt(3)]}t𝕋𝒮𝒫(M,ρgt,gs[1,1,1]);ρgt,gs={p|ts|,g=g0,gg,\displaystyle{\textbf{Latent Level:}}\left\{\begin{aligned} &\{[\xi_{i,g_{t}}^{(1)},\xi_{i,g_{t}}^{(2)},\xi_{i,g_{t}}^{(3)}]\}_{t\in\mathbb{T}}\sim\mathcal{SEP}(M,\rho_{g_{t},g^{\prime}_{s}}\otimes[1,1,1]);\\ &\rho_{g_{t},g^{\prime}_{s}}=\left\{\begin{aligned} p^{|t-s|}&,&g=g^{\prime}\\ 0&,&g\not=g^{\prime}\end{aligned}\right.,\end{aligned}\right.

for i=1,2,,N;g=1,2,,G;t=1,2,,Tgi=1,2,...,N;\ g=1,2,...,G;\ t=1,2,...,T_{g}.

This proposed model can be used for the statistical inference of eigenvalues of DTI. The DTI data are collected for subjects i=1,2,,Ni=1,2,...,N. For each subject ii, the eigenvalues of the voxel t{1,2,,Tg}t\in\{1,2,...,T_{g}\} located at the fiber g{1,2,,G}g\in\{1,2,...,G\} are treated as responses, i.e., [Li,gt(1),Li,gt(2),Li,gt(3)][L_{i,g_{t}}^{(1)},L_{i,g_{t}}^{(2)},L_{i,g_{t}}^{(3)}]. Conditional on [ξi,gt(1),ξi,gt(2),ξi,gt(3)][\xi_{i,g_{t}}^{(1)},\xi_{i,g_{t}}^{(2)},\xi_{i,g_{t}}^{(3)}], the eigenvalues [Li,gt(1),Li,gt(2),Li,gt(3)][L_{i,g_{t}}^{(1)},L_{i,g_{t}}^{(2)},L_{i,g_{t}}^{(3)}] follow a distribution of Wishart’s eigenvalues, i.e., ([λi,gt(1),λi,gt(2),λi,gt(3)],m)\mathcal{E}([\lambda_{i,g_{t}}^{(1)},\lambda_{i,g_{t}}^{(2)},\lambda_{i,g_{t}}^{(3)}],m). The value of the degrees of freedom mm controls the local variations of the eigenvalues: a larger value of mm leads to smaller local variations. The term eβ0,gt(k)+𝑿i𝜷gt(k)e^{\beta_{0,g_{t}}^{(k)}+\bm{X}_{i}\bm{\beta}_{g_{t}}^{(k)}} dominates the mean of the eigenvalue Li,gt(k)L_{i,g_{t}}^{(k)} when mm and MM are large enough. The coefficient βd,gt(k){\beta}_{d,g_{t}}^{(k)} captures the averaging effect of the covariate dd on the mean of Li,gt(k)L_{i,g_{t}}^{(k)}.

To capture the spatial correlations, the latent variables {[ξi,gt(1),ξi,gt(2),ξi,gt(3)]}t𝕋\{[\xi_{i,g_{t}}^{(1)},\xi_{i,g_{t}}^{(2)},\xi_{i,g_{t}}^{(3)}]\}_{t\in\mathbb{T}} follow the spatial process of Wishart’s eigenvalues 𝒮𝒫(M,ρgt,gs[1,1,1])\mathcal{SEP}(M,\rho_{g_{t},g^{\prime}_{s}}\otimes[1,1,1]), where ρgt,gs={p|ts|,g=g0,gg\rho_{g_{t},g^{\prime}_{s}}=\left\{\begin{aligned} p^{|t-s|}&,&g=g^{\prime}\\ 0&,&g\not=g^{\prime}\end{aligned}\right.. The value of the degrees of freedom MM controls the local variations of the latent eigenvalues. No correlation is assumed across fibers. The parameter pp controls the correlation among voxels within a fiber tract: a larger value of pp leads to strong correlations. The information of the fibers and their connected voxels us treated as priori. These pieces of the known priori information can be obtained from a human brain diffusion tensor template (e.g., Peng et al., 2009).

3 Estimation

In this section, we introduce the statistical estimation of Model 6. We use 𝒍i,gt=[li,gt(1),li,gt(2),li,gt(3)]\bm{l}_{i,g_{t}}=[l_{i,g_{t}}^{(1)},l_{i,g_{t}}^{(2)},l_{i,g_{t}}^{(3)}] to denote the observed eigenvalues of the random eigenvalues 𝑳i,gt=[Li,gt(1),Li,gt(2),Li,gt(3)]\bm{L}_{i,g_{t}}=[L_{i,g_{t}}^{(1)},L_{i,g_{t}}^{(2)},L_{i,g_{t}}^{(3)}]. We also give 𝝃i,gt=[ξi,gt(1),ξi,gt(2),ξi,gt(3)]\bm{\xi}_{i,g_{t}}=[\xi_{i,g_{t}}^{(1)},\xi_{i,g_{t}}^{(2)},\xi_{i,g_{t}}^{(3)}], 𝑳={𝑳i,gt:i=1,2,,N;g=1,2,,G;t=1,2,,Tg}\bm{L}=\{\bm{L}_{i,g_{t}}:i=1,2,...,N;g=1,2,...,G;t=1,2,...,T_{g}\}, and 𝝃={𝝃i,gt:i=1,2,,N;g=1,2,,G;t=1,2,,Tg}\bm{\xi}=\{\bm{\xi}_{i,g_{t}}:i=1,2,...,N;g=1,2,...,G;t=1,2,...,T_{g}\}. The parameters to be estimated are 𝜽={M,m,p,𝜷(1),𝜷(2),𝜷(3)}\bm{\theta}=\{M,m,p,\bm{\beta}^{(1)},\bm{\beta}^{(2)},\bm{\beta}^{(3)}\} where 𝜷(k)={𝑩gt(k)=[β0,gt(k),𝜷gt(k)T]T:g=1,2,,G;t=1,2,,Tg}\bm{\beta}^{(k)}=\{\bm{B}_{g_{t}}^{(k)}=[{\beta}_{0,g_{t}}^{(k)},{\bm{\beta}_{g_{t}}^{(k)}}^{T}]^{T}:g=1,2,...,G;t=1,2,...,T_{g}\}.

First, we aim to give the explicit expression of the log-likelihood of the data 𝒍\bm{l} and the latent variables 𝝃\bm{\xi}, in order to proceed with the next parameter estimation. The relevant proofs and derivations in this section are provided in Appendix A.2.

The log-likelihood of the data 𝒍\bm{l} and the latent variables 𝝃\bm{\xi} is expressed as

full(𝜽|𝒍,𝝃)=\displaystyle\ell_{full}(\bm{\theta}|\bm{l},\bm{\xi})= log[i=1Ng=1Gt=1Tg[f𝑳i,gt|𝝃i,gt(𝒍i,gt|𝝃i,gt;m,𝜷)]\displaystyle\log\Bigg{[}\prod_{i=1}^{N}\prod_{g=1}^{G}\prod_{t=1}^{T_{g}}\Big{[}f_{\bm{L}_{i,g_{t}}|\bm{\xi}_{i,g_{t}}}(\bm{l}_{i,g_{t}}|\bm{\xi}_{i,g_{t}};m,\bm{\beta})\Big{]} (7)
×i=1Ng=1G[f𝝃ig(𝝃i,g1,,𝝃i,gt,,𝝃i,gTg;M,p)]],\displaystyle\times\prod_{i=1}^{N}\prod_{g=1}^{G}\Big{[}f_{\bm{\xi}_{ig}}(\bm{\xi}_{i,g_{1}},...,\bm{\xi}_{i,g_{t}},...,\bm{\xi}_{i,g_{T_{g}}};M,p)\Big{]}\Bigg{]},

where the term f𝑳i,gt|𝝃i,gt(𝒍i,gt|𝝃i,gt;m,𝜷)f_{\bm{L}_{i,g_{t}}|\bm{\xi}_{i,g_{t}}}(\bm{l}_{i,g_{t}}|\bm{\xi}_{i,g_{t}};m,\bm{\beta}) is the conditional density of 𝑳i,gt\bm{L}_{i,g_{t}} given 𝝃i,gt\bm{\xi}_{i,g_{t}} and the term f𝝃g(𝝃i,g1,,𝝃i,gt,,𝝃i,gTg;M,p)f_{\bm{\xi}_{g}}(\bm{\xi}_{i,g_{1}},...,\bm{\xi}_{i,g_{t}},...,\bm{\xi}_{i,g_{T_{g}}};M,p) is the joint density function of [𝝃i,g1,,𝝃i,gt,,𝝃i,gTg][\bm{\xi}_{i,g_{1}},...,\bm{\xi}_{i,g_{t}},...,\bm{\xi}_{i,g_{T_{g}}}].

Muirhead (2009, Theorem 3.2.18) gives the expression of f𝑳i,gt|𝝃i,gt(𝒍i,gt|𝝃i,gt;m,𝜷)f_{\bm{L}_{i,g_{t}}|\bm{\xi}_{i,g_{t}}}(\bm{l}_{i,g_{t}}|\bm{\xi}_{i,g_{t}};m,\bm{\beta}). The density function is difficult to compute due to the presence of an integral which is difficult to evaluate. Given the assumption that mm is assumed to be large to make 𝔼Lit(k)|ξit(k)Lit(k)\mathbb{E}_{L_{it}^{(k)}|\xi_{it}^{(k)}}L_{it}^{(k)} dominated by λi,gt(k)=ξi,gt(k)eβ0,gt(k)+𝑿i𝜷gt(k)\lambda_{i,g_{t}}^{(k)}=\xi_{i,g_{t}}^{(k)}e^{\beta_{0,g_{t}}^{(k)}+\bm{X}_{i}\bm{\beta}_{g_{t}}^{(k)}}, we give Lemma 3.1 borrowed from Srivastava (1979, Corollary 9.4.2).

Lemma 3.1.

In Equation 2, mLi,gt(k)λi,gt(k)2λi,gt(k)\sqrt{m}\frac{L_{i,g_{t}}^{(k)}-\lambda_{i,g_{t}}^{(k)}}{\sqrt{2}\lambda_{i,g_{t}}^{(k)}} converges in distribution to a standard normal distribution as mm\rightarrow\infty, conditional on ξit(k)\xi_{it}^{(k)}.

Therefore, to simplify the computation, we approximate f𝑳i,gt|𝝃i,gt(𝒍i,gt|𝝃i,gt;m,𝜷)f_{\bm{L}_{i,g_{t}}|\bm{\xi}_{i,g_{t}}}(\bm{l}_{i,g_{t}}|\bm{\xi}_{i,g_{t}};m,\bm{\beta}) as a product of probability density functions of normal distributed random variables, expressed as

f𝑳i,gt|𝝃i,gt(𝒍i,gt|𝝃i,gt;m,𝜷)k=13ϕ(li,gt(k);λi,gt(k),2mλi,gt(k)),\displaystyle f_{\bm{L}_{i,g_{t}}|\bm{\xi}_{i,g_{t}}}(\bm{l}_{i,g_{t}}|\bm{\xi}_{i,g_{t}};m,\bm{\beta})\simeq\prod_{k=1}^{3}\phi\Bigg{(}l_{i,g_{t}}^{(k)};\lambda_{i,g_{t}}^{(k)},\sqrt{\frac{2}{m}}\lambda_{i,g_{t}}^{(k)}\Bigg{)}, (8)

where λi,gt(k)=ξi,gt(k)eβ0,gt(k)+𝑿i𝜷gt(k)\lambda_{i,g_{t}}^{(k)}=\xi_{i,g_{t}}^{(k)}e^{\beta_{0,g_{t}}^{(k)}+\bm{X}_{i}\bm{\beta}_{g_{t}}^{(k)}} and ϕ(l;μ,σ)\phi(l;\mu,\sigma) is the probability density function of a normal distribution with mean μ\mu and standard error σ\sigma. In practice, this simplification reduces the computational cost significantly but preserves the accurate parameter estimation.

Next we handle the probability density function f𝝃ig(𝝃i,g1,,𝝃i,gt,,𝝃i,gTg;M,p)f_{\bm{\xi}_{ig}}(\bm{\xi}_{i,g_{1}},...,\bm{\xi}_{i,g_{t}},...,\bm{\xi}_{i,g_{T_{g}}};M,p). In the following paragraphs, we derive the expression of the probability density function. First, Lemma 3.2 below allows us to handle only two density functions: f𝝃i,g1(𝝃i,g1;M)f_{\bm{\xi}_{i,g_{1}}}(\bm{\xi}_{i,g_{1}};M) and f𝝃i,gt|𝝃i,gt1(𝝃i,gt|𝝃i,gt1;M,p)f_{\bm{\xi}_{i,g_{t}}|\bm{\xi}_{i,g_{t-1}}}(\bm{\xi}_{i,g_{t}}|\bm{\xi}_{i,g_{t-1}};M,p).

Lemma 3.2.

f𝝃ig(𝝃i,g1,,𝝃i,gt,,𝝃i,gTg;M,p)f_{\bm{\xi}_{ig}}(\bm{\xi}_{i,g_{1}},...,\bm{\xi}_{i,g_{t}},...,\bm{\xi}_{i,g_{T_{g}}};M,p) in Equation 7 can be written as

f𝝃ig(𝝃i,g1,,𝝃i,gt,,𝝃i,gTg;M,p)=f𝝃i,g1(ξi,g1;M)t=2Tgf𝝃i,gt|𝝃gi,t1(𝝃i,gt|𝝃gi,t1;M,p),f_{\bm{\xi}_{ig}}(\bm{\xi}_{i,g_{1}},...,\bm{\xi}_{i,g_{t}},...,\bm{\xi}_{i,g_{T_{g}}};M,p)=f_{\bm{\xi}_{i,g_{1}}}({\xi}_{i,g_{1}};M)\prod_{t=2}^{T_{g}}f_{\bm{\xi}_{i,g_{t}}|\bm{\xi}_{g_{i,t-1}}}(\bm{\xi}_{i,g_{t}}|\bm{\xi}_{g_{i,t-1}};M,p),

where f𝛏i,g1(𝛏i,g1;M)f_{\bm{\xi}_{i,g_{1}}}(\bm{\xi}_{i,g_{1}};M) is the density function of 𝛏i,g1=[ξi,g1(1),ξi,g1(2),ξi,g1(3)]\bm{\xi}_{i,g_{1}}=[\xi_{i,g_{1}}^{(1)},\xi_{i,g_{1}}^{(2)},\xi_{i,g_{1}}^{(3)}] and f𝛏i,gt|𝛏i,gt1(𝛏i,gt|𝛏i,gt1;M,p)f_{\bm{\xi}_{i,g_{t}}|\bm{\xi}_{i,g_{t-1}}}(\bm{\xi}_{i,g_{t}}|\bm{\xi}_{i,g_{t-1}};M,p) is the conditional density of 𝛏i,gt=[ξi,gt(1),ξi,gt(2),ξi,gt(3)]\bm{\xi}_{i,g_{t}}=[\xi_{i,g_{t}}^{(1)},\xi_{i,g_{t}}^{(2)},\xi_{i,g_{t}}^{(3)}] given 𝛏i,gt1=[ξi,gt1(1),ξi,gt1(2),ξi,gt1(3)]\bm{\xi}_{i,g_{t-1}}=[\xi_{i,g_{t-1}}^{(1)},\xi_{i,g_{t-1}}^{(2)},\xi_{i,g_{t-1}}^{(3)}].

Because 𝝃i,g1([1,1,1],M)\bm{\xi}_{i,g_{1}}\sim\mathcal{E}([1,1,1],M), Muirhead (2009, Corollary 3.2.19) directly gives the expression of the probability density function f𝝃i,g1(𝝃i,g1;M)f_{\bm{\xi}_{i,g_{1}}}(\bm{\xi}_{i,g_{1}};M), expressed

f𝝃i,g1(𝝃i,g1;M)=π32/2(2M)3M/2Γ3(32)Γ3(M2)k=13[ξi,gt(k)](M31)/2k<q3[ξi,gt(k)ξi,gt(q)]exp(M2k=13ξi,gt(k)).\displaystyle f_{\bm{\xi}_{i,g_{1}}}(\bm{\xi}_{i,g_{1}};M)=\frac{\pi^{3^{2}/2}}{(\frac{2}{M})^{3M/2}\Gamma_{3}(\frac{3}{2})\Gamma_{3}(\frac{M}{2})}\prod_{k=1}^{3}[\xi_{i,g_{t}}^{(k)}]^{(M-3-1)/2}\prod_{k<q}^{3}[\xi_{i,g_{t}}^{(k)}-\xi_{i,g_{t}}^{(q)}]\exp\left(-\frac{M}{2}\sum_{k=1}^{3}\xi_{i,g_{t}}^{(k)}\right). (9)

Given James et al. (1960, Equation 7) and James et al. (1964, Equation 68), we further derives the expression of f𝝃i,gt|𝝃i,gt1(𝝃i,gt|𝝃i,gt1;M,p)f_{\bm{\xi}_{i,g_{t}}|\bm{\xi}_{i,g_{t-1}}}(\bm{\xi}_{i,g_{t}}|\bm{\xi}_{i,g_{t-1}};M,p) (see Appendix A.2), expressed as

f𝝃i,gt|𝝃i,gt1(𝝃i,gt|𝝃i,gt1;M,p)=\displaystyle f_{\bm{\xi}_{i,g_{t}}|\bm{\xi}_{i,g_{t-1}}}(\bm{\xi}_{i,g_{t}}|\bm{\xi}_{i,g_{t-1}};M,p)= π32/2(2M)3M/2Γ3(32)Γ3(M2)k=13[ξi,gt(k)](M31)/2k<q3[ξi,gt(k)ξi,gt(q)]\displaystyle\frac{\pi^{3^{2}/2}}{(\frac{2}{M})^{3M/2}\Gamma_{3}(\frac{3}{2})\Gamma_{3}(\frac{M}{2})}\prod_{k=1}^{3}[\xi_{i,g_{t}}^{(k)}]^{(M-3-1)/2}\prod_{k<q}^{3}[\xi_{i,g_{t}}^{(k)}-\xi_{i,g_{t}}^{(q)}] (10)
×exp[M2(p21p2k=1Kξi,gt1(k)+11p2k=1Kξi,gt(k))]×(11p2)3M2\displaystyle\times\exp\Bigg{[}-\frac{M}{2}\Big{(}\frac{p^{2}}{1-p^{2}}\sum_{k=1}^{K}\xi_{i,g_{t-1}}^{(k)}+\frac{1}{1-p^{2}}\sum_{k=1}^{K}\xi_{i,g_{t}}^{(k)}\Big{)}\Bigg{]}\times\big{(}\frac{1}{1-p^{2}}\big{)}^{\frac{3M}{2}}
×0F1(12M;M2p1p2[ξi,gt1(1),ξi,gt1(2),ξi,gt1(3)],M2p1p2[ξi,gt(1),ξi,gt(2),ξi,gt(3)]),\displaystyle\times\ _{0}F_{1}\Big{(}\frac{1}{2}M;\frac{M}{2}\frac{p}{1-p^{2}}[\xi_{i,g_{t-1}}^{(1)},\xi_{i,g_{t-1}}^{(2)},\xi_{i,g_{t-1}}^{(3)}],\frac{M}{2}\frac{p}{1-p^{2}}[\xi_{i,g_{t}}^{(1)},\xi_{i,g_{t}}^{(2)},\xi_{i,g_{t}}^{(3)}]\Big{)},

where F10(;)\ {}_{0}F_{1}(;) is a hypergeometric function of a matrix argument and its value can be numerically evaluated (Koev and Edelman, 2006). When p=0p=0, 𝝃i,gt\bm{\xi}_{i,g_{t}} is independent of 𝝃i,gt1\bm{\xi}_{i,g_{t-1}}. Furthermore, either f𝝃i,gt,𝝃i,gt1(𝝃i,gt,𝝃i,gt1;M,p)=f𝝃i,gt|𝝃i,gt1(𝝃i,gt|𝝃i,gt1;M,p)f𝝃i,gt1(𝝃i,gt1;M)f_{\bm{\xi}_{i,g_{t}},\bm{\xi}_{i,g_{t-1}}}(\bm{\xi}_{i,g_{t}},\bm{\xi}_{i,g_{t-1}};M,p)=f_{\bm{\xi}_{i,g_{t}}|\bm{\xi}_{i,g_{t-1}}}(\bm{\xi}_{i,g_{t}}|\bm{\xi}_{i,g_{t-1}};M,p)f_{\bm{\xi}_{i,g_{t-1}}}(\bm{\xi}_{i,g_{t-1}};M) or f𝝃i,gt,𝝃i,gt1(𝝃i,gt,𝝃i,gt1;M,p)=f𝝃i,gt1|𝝃i,gt(𝝃i,gt1|𝝃i,gt;M,p)f𝝃i,gt(𝝃i,gt;M)f_{\bm{\xi}_{i,g_{t}},\bm{\xi}_{i,g_{t-1}}}(\bm{\xi}_{i,g_{t}},\bm{\xi}_{i,g_{t-1}};M,p)=f_{\bm{\xi}_{i,g_{t-1}}|\bm{\xi}_{i,g_{t}}}(\bm{\xi}_{i,g_{t-1}}|\bm{\xi}_{i,g_{t}};M,p)f_{\bm{\xi}_{i,g_{t}}}(\bm{\xi}_{i,g_{t}};M) produces the joint density function of 𝝃i,gt\bm{\xi}_{i,g_{t}} and 𝝃i,gt1\bm{\xi}_{i,g_{t-1}}, denoted as f𝝃i,gt,𝝃i,gt1(𝝃i,gt,𝝃i,gt1;M,p)f_{\bm{\xi}_{i,g_{t}},\bm{\xi}_{i,g_{t-1}}}(\bm{\xi}_{i,g_{t}},\bm{\xi}_{i,g_{t-1}};M,p).

Putting Equations 8-10 together, we have the explicit expression of the full log-likelihood full(𝜽|𝒍,𝝃)\ell_{full}(\bm{\theta}|\bm{l},\bm{\xi}) (Equation 7). Standing on the above results, we next proceed the statistical inference of correlated eigenvalues. Our goal is to obtain the maximum likelihood estimators of 𝜽\bm{\theta}. In particular, to answer the clinical question about covariate effect quantification, the maximum likelihood estimate of 𝜷gt(k)\bm{\beta}_{g_{t}}^{(k)} quantifies the covariate effects, and its confidence interval describes the uncertainties of covariate effects. Therefore, we have to maximize the likelihood whose the latent variables 𝝃\bm{\xi} are integrated out. The log-likelihood (𝜽|𝒍)\ell(\bm{\theta}|\bm{l}) whose latent variables are integrated out is expressed as

(𝜽|𝒍)=\displaystyle\ell(\bm{\theta}|\bm{l})= log[i=1Ng=1Gt=1Tg[f𝑳i,gt|𝝃i,gt(𝒍i,gt|𝝃i,gt;m,𝜷)]\displaystyle\int\log\Bigg{[}\prod_{i=1}^{N}\prod_{g=1}^{G}\prod_{t=1}^{T_{g}}\Big{[}f_{\bm{L}_{i,g_{t}}|\bm{\xi}_{i,g_{t}}}(\bm{l}_{i,g_{t}}|\bm{\xi}_{i,g_{t}};m,\bm{\beta})\Big{]} (11)
×i=1Ng=1G[f𝝃g(𝝃i,g1,,𝝃i,gt,,𝝃i,gTg;M,p)]]d𝝃.\displaystyle\times\prod_{i=1}^{N}\prod_{g=1}^{G}\Big{[}f_{\bm{\xi}_{g}}(\bm{\xi}_{i,g_{1}},...,\bm{\xi}_{i,g_{t}},...,\bm{\xi}_{i,g_{T_{g}}};M,p)\Big{]}\Bigg{]}d\bm{\xi}.

The hurdle caused by the unobserved latent variables 𝝃\bm{\xi} can be easily overcome through the EM algorithm. The iterative scheme of the algorithm is given as follows. We start with the current parameter 𝜽(r)\bm{\theta}^{(r)}, thus the full likelihood at the current stage is Full(𝜽(r)|𝒍,𝝃)\ell_{Full}(\bm{\theta}^{(r)}|\bm{l},\bm{\xi}). We first compute the expected value of the likelihood function with respect to the current conditional distribution of 𝝃\bm{\xi} given the data 𝒍\bm{l} and the current parameter estimates 𝜽(r)\bm{\theta}^{(r)}, denoted as Q(𝜽|𝜽(r))=𝔼𝝃|𝑳,𝜽(r)Full(𝜽|𝒍,𝝃)Q(\bm{\theta}|\bm{\theta}^{(r)})=\mathbb{E}_{\bm{\xi}|\bm{L},\bm{\theta}^{(r)}}\ell_{Full}(\bm{\theta}|\bm{l},\bm{\xi}). This step is called E-step since we take the expectation. Next we maximize the term Q(𝜽|𝜽(r))Q(\bm{\theta}|\bm{\theta}^{(r)}) with respect to 𝜽\bm{\theta} to obtain the next estimates 𝜽(r+1)\bm{\theta}^{(r+1)}. This step is called M-step because we maximize the function. We iteratively repeat the two steps until convergence, in order to obtain the maximum likelihood estimator 𝜽^\hat{\bm{\theta}} for the parameters 𝜽\bm{\theta}.

In the M-step, the function Q(𝜽|𝜽(r))Q(\bm{\theta}|\bm{\theta}^{(r)}) can be feasibly maximized using the Quasi-Newton method with the the constraints βdt(1)>βdt(2)>βdt(3){\beta}_{dt}^{(1)}>{\beta}_{dt}^{(2)}>{\beta}_{dt}^{(3)} for d=0,1,2,,Dd=0,1,2,...,D. In the E-step, the term 𝔼𝝃|𝑳,𝜽(r)Full(𝜽|𝒍,𝝃)\mathbb{E}_{\bm{\xi}|\bm{L},\bm{\theta}^{(r)}}\ell_{Full}(\bm{\theta}|\bm{l},\bm{\xi}) has no analytic expression. Thus, the Monte-Carlo EM based on importance sampling (Levine and Casella, 2001, Section 2.1) is applied. We first generate CC Markov chain Monte-Carlo samples from the posterior distribution 𝝃|𝑳,𝜽(0)\bm{\xi}|\bm{L},\bm{\theta}^{(0)}, denoted as 𝝃(1),,𝝃(C)\bm{\xi}^{(1)},...,\bm{\xi}^{(C)}, where 𝜽(0)\bm{\theta}^{(0)} is the initial values of the EM algorithm. For every step rr, Q(𝜽|𝜽(r))Q(\bm{\theta}|\bm{\theta}^{(r)}) is approximated as Q(𝜽|𝜽(r))=𝔼𝝃|𝑳,𝜽(r)Full(𝜽|𝒍,𝝃)1Cc=1CωcFull(𝜽|𝒍,𝝃(c))Q(\bm{\theta}|\bm{\theta}^{(r)})=\mathbb{E}_{\bm{\xi}|\bm{L},\bm{\theta}^{(r)}}\ell_{Full}(\bm{\theta}|\bm{l},\bm{\xi})\approx\frac{1}{C}\sum_{c=1}^{C}\omega_{c}\ell_{Full}(\bm{\theta}|\bm{l},\bm{\xi}^{(c)}). The weight ωc\omega_{c} is defined as ωc=ωcc=1Cωc\omega_{c}=\frac{\omega_{c}^{\prime}}{\sum_{c=1}^{C}\omega_{c}^{\prime}} where ωc=exp[Full(𝜽(r)|𝒍,𝝃)Full(𝜽(0)|𝒍,𝝃)]\omega_{c}^{\prime}=\exp\Big{[}\ell_{Full}(\bm{\theta}^{(r)}|\bm{l},\bm{\xi})-\ell_{Full}(\bm{\theta}^{(0)}|\bm{l},\bm{\xi})\Big{]}. The correctness and relevant properties of the Monte-Carlo EM based on the importance sampling have already been provided in Levine and Casella (2001).

In clinical studies, the point and interval estimation of the coefficient βd,gt(k){\beta}_{d,g_{t}}^{(k)} are informative since it provides the covariate effects and the associated uncertainties, respectively. The confidence interval of parameters can be obtained by the parametric bootstrapping. We generate BB bootstrapping samples from Model 6 with 𝜽=𝜽^\bm{\theta}=\hat{\bm{\theta}}. We estimate βd,gt(k){\beta}_{d,g_{t}}^{(k)} for for each bootstrapping sample bb, denoted as βd,gt(k)(b){\beta}_{d,g_{t}}^{(k)*}(b), and we compute the bootstrapping difference as δd,gt(k)(b)=βd,gt(k)(b)β^d,gt(k){\delta}_{d,g_{t}}^{(k)*}(b)={\beta}_{d,g_{t}}^{(k)*}(b)-\widehat{\beta}_{d,g_{t}}^{(k)}. The 1α1-\alpha confidence interval of βd,gt(k){\beta}_{d,g_{t}}^{(k)}, is [β^d,gt(k)δd,gt(k)(b,1α/2),θ^iδd,gt(k)(b,α/2)][\widehat{\beta}_{d,g_{t}}^{(k)}-{\delta}_{d,g_{t}}^{(k)*}(b,1-\alpha/2),\hat{{\theta}}_{i}-{\delta}_{d,g_{t}}^{(k)*}(b,\alpha/2)], where δd,gt(k)(b,1α/2){\delta}_{d,g_{t}}^{(k)*}(b,1-\alpha/2) and δd,gt(k)(b,α/2){\delta}_{d,g_{t}}^{(k)*}(b,\alpha/2) are the 1α/21-\alpha/2 and α/2\alpha/2 quantiles of all the bootstrapping differences δd,gt(k)(b){\delta}_{d,g_{t}}^{(k)*}(b) for b=1,,Bb=1,...,B. In the next section, we use simulation studies to compare out proposal to the other methods in terms of the point and interval estimation of the coefficient.

4 Simulation Studies

In this section, we carry out simulation studies to measure the performance of our proposal. We call our proposed model as auto-eigenvalues model, and compare our method to a benchmark method. The benchmark model is a non-linear regression model treating the responses as normal distributed random variables. We refer this model as to the non-linear regression model, expressed as Li,gt(k)=eβ0t(k)+𝑿i𝜷gt(k)+ϵi,gt(k),ϵi,gt(k)𝒩(0,σ2)L_{i,g_{t}}^{(k)}=e^{\beta_{0t}^{(k)}+\bm{X}_{i}\bm{\beta}_{g_{t}}^{(k)}}+\epsilon_{i,g_{t}}^{(k)},\epsilon_{i,g_{t}}^{(k)}\sim\mathcal{N}(0,\sigma^{2}).

The two models have the similar interpretations of the coefficients [β0t(k),𝜷gt(k)T][\beta_{0t}^{(k)},\ {\bm{\beta}_{g_{t}}^{(k)}}^{T}], that are to control the mean of the random eigenvalues, but differ in basic statistical assumptions. To summary, the auto-eigenvalues model respects that the responses are random eigenvalues while the non-linear regression model does not. The auto-eigenvalues model additionally captures the correlation of the eigenvalues among voxels. More importantly, the interpretation of eβ0t(k)+𝑿i𝜷t(k)e^{\beta_{0t}^{(k)}+\bm{X}_{i}\bm{\beta}_{t}^{(k)}} is actually different. Under the the auto-eigenvalues model, we have 𝔼Lit(k)eβ0t(k)+𝑿i𝜷t(k)\mathbb{E}L_{it}^{(k)}\simeq e^{\beta_{0t}^{(k)}+\bm{X}_{i}\bm{\beta}_{t}^{(k)}} as MM and mm are large. Under the non-linear regression model, there is no approximation involved, such that 𝔼Lit(k)=eβ0t(k)+𝑿i𝜷t(k)\mathbb{E}L_{it}^{(k)}=e^{\beta_{0t}^{(k)}+\bm{X}_{i}\bm{\beta}_{t}^{(k)}}. Therefore, we emphasize that the aim of the comparison is only to see whether we treat the responses as random eigenvalues or not impacts the coefficient estimation.

The simulated data is generated from our proposed model. In each simulated data, we give the following settings. We have G=3G=3 fibers and each fiber has Tg=20T_{g}=20 voxels. The correlation parameter is set as p=0.6p=0.6. The number of subjects is N=10N=10. Since our model setup relies on large values of the degrees of freedom mm and MM, we test parameter estimation performance given M=m=20,30,40M=m=20,30,40, respectively. For each replication, we generate βd,gt(1){\beta}_{d,g_{t}}^{(1)} from a normal distribution with mean 0 and standard error 0.10.1, and we give βd,gt(2)=βd,gt(1)0.3{\beta}_{d,g_{t}}^{(2)}={\beta}_{d,g_{t}}^{(1)}-0.3 and βd,gt(3)=βd,gt(1)0.6{\beta}_{d,g_{t}}^{(3)}={\beta}_{d,g_{t}}^{(1)}-0.6.

We first evaluate the mean square error (MSE) of the coefficient parameter estimates. A smaller MSE indicates that the estimation is more accurate. The MSE is reported for each eigenvalue, defined as 1D×ggTdggt(βd,gt(k)βd,gt(k))2\frac{1}{D\times\sum_{g}g_{T}}\sum_{d}\sum_{g}\sum_{g_{t}}({\beta}_{d,g_{t}}^{(k)}-{\beta}_{d,g_{t}}^{(k)})^{2}. We use box-plots to visualize the MSEs of each replications (see Figure 4).

Refer to caption
Figure 4: The figure gives the MSE of coefficients through boxplots. The left panel is for the auto-eigenvalues model, and the right panel is for the non-linear regression model. The x-axis is for the values of degrees of freedom. The y-axis is for the values of MSE.

Regarding our proposed method, we find the trivial impact of the degrees of freedom on the parameter estimation of the coefficients, although the larger value of the degrees of freedom produces slightly more accurate results. This validates that our approximation in Equation 8 is acceptable. Since the interpretations of the coefficients are not equal between the two models, the MSEs of the two methods are not very comparable in terms of their crude numbers. An important observation is that the three eigenvalues in the non-linear regression present heterogeneous MSEs, even if the values of the degrees of freedom are large. This should be caused by that the non-linear regression model treats the three random eigenvalues equally via ignoring the random process of eigenvalues.

Furthermore, we compare the two methods in terms of the 95%95\% coverage probability. The 95%95\% coverage probability is defined as the probability that the true value is covered by the estimated 95%95\% confidence interval. The confidence interval of the non-linear regression model is also obtained via parametric bootstrapping. In Table 1, we summarize the coverage probabilities associated with each eigenvalue kk, i.e., [β0t(k),𝜷gt(k)T][\beta_{0t}^{(k)},\ {\bm{\beta}_{g_{t}}^{(k)}}^{T}], averaging over voxels, coefficients, simulation replications. The auto-eigenvalue model presents reasonable uncertainties of parameter estimation. However, the non-linear regression model presents an an inflated confidence interval due to the model misspecification, where the parameter estimation of the coefficients has inflated uncertainties.

Table 1: The table summarizes the 95%95\% coverage probabilities of the auto-eigenvalue model and the non-linear regression model.
Method Eigenvalue Degrees of Freedom
20 30 40
Auto-eigenvalues Model First 90% 91% 92%
Second 97% 95% 94%
Third 99% 98% 97%
Non-linear Regression First 100% 100% 100%
Second 100% 100% 100%
Third 100% 100% 100%

5 Applications to IXI Dataset

We use the IXI data-set provided by the Center for the Developing Brain, Imperial College London, to demonstrate our proposed method. The data-set is made available at the website https://brain-development.org/ixi-dataset/ under the CC BY-SA 3.0 license. From the data repository, we select the first 2020 healthy subjects collected from Hammersmith Hospital in London to demonstrate our methodological contribution and compare it to the benchmark method. We focus on the effects of body/weight index (BWI) on the two fibers located at the corpus callosum (see Figure 5). The covariate of BWI is scaled into [0,1][0,1].

Refer to caption
Figure 5: The figure illustrates the fibers the two fibers located at corpus callosum. The background is the Coronal plane of a human brain. The two red bundles are the fibers located in the region of interest.

We continue to compare our proposed auto-eigenvalue model to the alternative method non-linear regression model. To evaluate the covariate effects, we report the Z-socres of the coefficient estimators, which is defined as β^d,gt(k)σdt(k)\frac{\widehat{\beta}_{d,g_{t}}^{(k)}}{\sigma_{dt}^{(k)}}, where σdt(k)\sigma_{dt}^{(k)} is the standard error of the estimator β^d,gt(k)\widehat{\beta}_{d,g_{t}}^{(k)}. A larger absolute value of z-score indicates that the corresponding covariate has more significant effect on kk-th eigenvalue located at the voxel gtg_{t}.

The covariate effects of BWI are given in Figure 6. First, we observe that the two models’ z-scores are roughly consistent. It indicates that the two models will not produce too different scientific interpretations. However, our proposed method is powerful in finding regions (voxels) with covariate effects since z-scores’ absolute values are relatively larger. This is consistent with our results in the simulation studies (Section 4), that is the non-linear regression model presents inflated statistical uncertainties. Although the real statistical underlying process of the diffusion tensors’ eigenvalues is intractable, this may further endorse our major claim in this paper: the responses are encouraged to be treated as random eigenvalues. Using our proposed model based on the distribution of Wishart’s eigenvalues is an option. Furthermore, the point estimate and 95%95\% confidence interval of pp are 0.4520.452 and [0.445,0.493][0.445,0.493], implying the existence of auto-correlation among random eigenvalues. The point estimate of mm and MM are 31.95631.956 and 32.79532.795, respectively. The 95%95\% confidence intervals of mm and MM are [30.256,33.656][30.256,33.656] and [31.573,33.940][31.573,33.940], respectively. The large value of degrees of freedom implies that the values of degrees of freedom might be large enough to make the term eβ0t(k)+𝑿i𝜷gt(k)e^{\beta_{0t}^{(k)}+\bm{X}_{i}\bm{\beta}_{g_{t}}^{(k)}} dominate the mean of the random eigenvalue 𝔼Li,gt(k)\mathbb{E}L_{i,g_{t}}^{(k)}.

Refer to caption
Figure 6: This figure gives the covariate effects of BWI. In each panel, the left fiber is the result of the auto-eigenvalues model and the right fiber is the result of the non-linear regression model. The fills of the points show the the values of z-scores.

6 Conclusion and Future Works

This paper focuses on the statistical inference of the diffusion tensors’ eigenvalues because it is more scientifically interpretable. We propose a model of random eigenvalues based on the distribution of Wishart’s eigenvalues. Relying on the previous works i.e., Muirhead (2009, Chapter 3.2.5), James et al. (1964), and Srivastava (1979, Chapter 9.4), we are able to construct the hierarchical model of random eigenvalues and derive the relevant theoretical details. The estimation based on the Monte-Carlo EM algorithm is also proposed. We use simulation to study the impact on the parameter estimation if the responses are not treated as random eigenvalues. The real data analysis using IXI data also suggests that the eigenvalue responses of DTI are encouraged to be treated as random eigenvalues since our proposal produces more powerful results.

In this paper, we focus on eigenvalues and do not yet explore the statistical inference of eigenvectors. Srivastava (1979, Theorem 3.4.2) gives the density function expression of a Wishart distributed matrix’s eigenvectors. More studies on eigenvectors’ statistical inference are encouraged, including the specification of the covariate effect and correlation effect, and computational methods. All of these can be considered as future works.


SUPPLEMENTARY MATERIAL

Appendix A Theoretical Details

In this section, we offer theoretical details which are not fully elaborated in the main contexts.

A.1 Density Function of Parameterized Wishart Distribution

Let 𝑾𝒲p(𝚺,U)\bm{W}\sim\mathcal{W}_{p}(\bm{\Sigma},U). The probability density function of 𝑾\bm{W} is

f𝐖(𝐰)=|𝐰|(Up1)/2etr(U𝚺1𝐰)/22Up2|𝚺/U|U/2Γp(U2).{\displaystyle f_{\mathbf{W}}(\mathbf{w})={\frac{|\mathbf{w}|^{(U-p-1)/2}e^{-\operatorname{tr}(U\mathbf{\Sigma}^{-1}\mathbf{w})/2}}{2^{\frac{Up}{2}}|{\mathbf{\Sigma}/U}|^{U/2}\Gamma_{p}({\frac{U}{2}})}}}. (12)

A.2 Proof of Lemma 3.2 and Derivation of Equation 10

To make our proof and derivation concise, we omit the subscript ii in the following. First, the joint density function of (𝒁j,g1,,𝒁j,gTg)(\bm{Z}_{j,g_{1}},...,\bm{Z}_{j,g_{T_{g}}}) denoted as f𝒁j,g1,,𝒁j,gt,,𝒁j,gTg(𝒛j,g1,,𝒛j,gt,,𝒛j,gTg)f_{\bm{Z}_{j,g_{1}},...,\bm{Z}_{j,g_{t}},...,\bm{Z}_{j,g_{T_{g}}}}(\bm{z}_{j,g_{1}},...,\bm{z}_{j,g_{t}},...,\bm{z}_{j,g_{T_{g}}}) can be decomposed as

f𝒁j,g1,,𝒁j,gt,,𝒁j,gTg(𝒛j,g1,,𝒛j,gt,,𝒛j,gTg)\displaystyle f_{\bm{Z}_{j,g_{1}},...,\bm{Z}_{j,g_{t}},...,\bm{Z}_{j,g_{T_{g}}}}(\bm{z}_{j,g_{1}},...,\bm{z}_{j,g_{t}},...,\bm{z}_{j,g_{T_{g}}}) (13)
=f𝒁j,g1(𝒛j,g1)×f𝒁j,g2|𝒁j,g1(𝒛j,g2|𝒛j,g1)×f𝒁j,g3|𝒁j,g2,𝒁j,g1(𝒛j,g3|𝒛j,g2,𝒛j,g1)×\displaystyle=f_{\bm{Z}_{j,g_{1}}}(\bm{z}_{j,g_{1}})\times f_{\bm{Z}_{j,g_{2}}|\bm{Z}_{j,g_{1}}}(\bm{z}_{j,g_{2}}|\bm{z}_{j,g_{1}})\times f_{\bm{Z}_{j,g_{3}}|\bm{Z}_{j,g_{2}},\bm{Z}_{j,g_{1}}}(\bm{z}_{j,g_{3}}|\bm{z}_{j,g_{2}},\bm{z}_{j,g_{1}})\times
\displaystyle\quad...
×f𝒁j,gT|𝒁j,gT1,,𝒁j,g1(𝒛j,gT|𝒛j,gT1,,𝒛j,g1).\displaystyle\quad\times f_{\bm{Z}_{j,g_{T}}|\bm{Z}_{j,g_{T-1}},...,\bm{Z}_{j,g_{1}}}(\bm{z}_{j,g_{T}}|\bm{z}_{j,g_{T-1}},...,\bm{z}_{j,g_{1}}).

Since (𝒁j,g1,,𝒁j,gTg)(\bm{Z}_{j,g_{1}},...,\bm{Z}_{j,g_{T_{g}}}) follows a multivariate normal distribution with the covariance which is a Toeplitz matrix and the inverse covariance matrix is a tridiagonal matrix, the joint density function f𝒁j,g1,,𝒁j,gt,,𝒁j,gTg(𝒛j,g1,,𝒛j,gt,,𝒛j,gTg)f_{\bm{Z}_{j,g_{1}},...,\bm{Z}_{j,g_{t}},...,\bm{Z}_{j,g_{T_{g}}}}(\bm{z}_{j,g_{1}},...,\bm{z}_{j,g_{t}},...,\bm{z}_{j,g_{T_{g}}}) can be simplified as

f𝒁j,g1,,𝒁j,gt,,𝒁j,gTg(𝒛j,g1,,𝒛j,gt,,𝒛j,gTg)\displaystyle f_{\bm{Z}_{j,g_{1}},...,\bm{Z}_{j,g_{t}},...,\bm{Z}_{j,g_{T_{g}}}}(\bm{z}_{j,g_{1}},...,\bm{z}_{j,g_{t}},...,\bm{z}_{j,g_{T_{g}}}) (14)
=f𝒁j,g1(𝒛j,g1)×f𝒁j,g2|𝒁j,g1(𝒛j,g2|𝒛j,g1)×f𝒁j,g3|𝒁j,g2(𝒛j,g3|𝒛j,g2)××f𝒁j,gT|𝒁j,gT1(𝒛j,gT|𝒛j,gT1).\displaystyle=f_{\bm{Z}_{j,g_{1}}}(\bm{z}_{j,g_{1}})\times f_{\bm{Z}_{j,g_{2}}|\bm{Z}_{j,g_{1}}}(\bm{z}_{j,g_{2}}|\bm{z}_{j,g_{1}})\times f_{\bm{Z}_{j,g_{3}}|\bm{Z}_{j,g_{2}}}(\bm{z}_{j,g_{3}}|\bm{z}_{j,g_{2}})\times...\times f_{\bm{Z}_{j,g_{T}}|\bm{Z}_{j,g_{T-1}}}(\bm{z}_{j,g_{T}}|\bm{z}_{j,g_{T-1}}).

This decomposition indicates that the terms (𝒁j,g1,[𝒁j,g2|𝒁j,g1],[𝒁j,g3|𝒁j,g2],,[𝒁j,gT|𝒁j,gT1])(\bm{Z}_{j,g_{1}},[\bm{Z}_{j,g_{2}}|\bm{Z}_{j,g_{1}}],[\bm{Z}_{j,g_{3}}|\bm{Z}_{j,g_{2}}],...,[\bm{Z}_{j,g_{T}}|\bm{Z}_{j,g_{T-1}}]) are mutually independent. Therefore, the joint density function of 𝝃i,g1,,𝝃i,gt,,𝝃i,gTg\bm{\xi}_{i,g_{1}},...,\bm{\xi}_{i,g_{t}},...,\bm{\xi}_{i,g_{T_{g}}} can be written as

f𝝃g(𝝃i,g1,,𝝃i,gt,,𝝃i,gTg;M,p)=f𝝃g1(ξi,g1;M)t=2Tgf𝝃gt|𝝃gt1(𝝃gt|{𝒁j,gt1:j=1:M};M,p),f_{\bm{\xi}_{g}}(\bm{\xi}_{i,g_{1}},...,\bm{\xi}_{i,g_{t}},...,\bm{\xi}_{i,g_{T_{g}}};M,p)=f_{\bm{\xi}_{g_{1}}}({\xi}_{i,g_{1}};M)\prod_{t=2}^{T_{g}}f_{\bm{\xi}_{g_{t}}|\bm{\xi}_{g_{t-1}}}(\bm{\xi}_{g_{t}}|\{\bm{Z}_{j,g_{t-1}}:j=1:M\};M,p),

because ξi,gt{\xi}_{i,g_{t}} are the eigenvalues of 𝒁j,gt𝒁j,gtT/M\bm{Z}_{j,g_{t}}\bm{Z}^{T}_{j,g_{t}}/M.

Next we want to show that f𝝃gt|{𝒁j,gt1:j=1:M}(𝝃gt|{𝒁j,gt1:j=1:M};M,p)=f𝝃i,gt|𝝃i,gt1(𝝃i,gt|𝝃i,gt1;M,p)f_{\bm{\xi}_{g_{t}}|\{\bm{Z}_{j,g_{t-1}}:j=1:M\}}(\bm{\xi}_{g_{t}}|\{\bm{Z}_{j,g_{t-1}}:j=1:M\};M,p)=f_{\bm{\xi}_{i,g_{t}}|\bm{\xi}_{i,g_{t-1}}}(\bm{\xi}_{i,g_{t}}|\bm{\xi}_{i,g_{t-1}};M,p). The derivation largely relies on James et al. (1960, Equation 7) and James et al. (1964, Equation 68). The derivation trick is borrowed from Smith and Garth (2007) where the conditional distribution of eigenvalues of a complex Wishart matrix is derived.

First, we re-illustrate James et al. (1960, Equation 7) and James et al. (1964, Equation 68) by giving Theorem A.1.

Theorem A.1.

Let 𝐊jind𝒩(𝛍/M,𝚺/M)\bm{K}_{j}\stackrel{{\scriptstyle ind}}{{\sim}}\mathcal{N}(\bm{\mu}/\sqrt{M},\bm{\Sigma}/M). The roots of |j=1M𝐊j𝐊jT𝐖𝚺/M|=0|\sum_{j=1}^{M}\bm{K}_{j}\bm{K}_{j}^{T}-\bm{W}\bm{\Sigma}/M|=0, denoted as 𝐖=diag[W1,W2,W3]\bm{W}=diag[W_{1},W_{2},W_{3}] follow a distribution with the density function expressed as

fW1,W2,W3(w1,w2,w3)=exp[12(w1+w2+w3)]0F1(12M;14[o1,o2,o3],[w1,w2,w3])\displaystyle f_{W_{1},W_{2},W_{3}}(w_{1},w_{2},w_{3})=\exp\Big{[}-\frac{1}{2}(w_{1}+w_{2}+w_{3})\Big{]}\ _{0}F_{1}\Big{(}\frac{1}{2}M;\frac{1}{4}[o_{1},o_{2},o_{3}],[w_{1},w_{2},w_{3}]\Big{)} (15)
×π32/2(2M)3M/2Γ3(32)Γ3(M2)exp[12(o1+o2+o3)]k=13(wk)(M31)/2k<q3[wkwq],\displaystyle\times\frac{\pi^{3^{2}/2}}{(\frac{2}{M})^{3M/2}\Gamma_{3}(\frac{3}{2})\Gamma_{3}(\frac{M}{2})}\exp\Big{[}-\frac{1}{2}(o_{1}+o_{2}+o_{3})\Big{]}\prod_{k=1}^{3}(w_{k})^{(M-3-1)/2}\prod_{k<q}^{3}[w_{k}-w_{q}],

where the roots of |𝛍𝛍T𝐎𝚺/M|=0|\bm{\mu}\bm{\mu}^{T}-\bm{O}\bm{\Sigma}/M|=0 are 𝐎=diag[o1,o2,o3]\bm{O}=diag[o_{1},o_{2},o_{3}].

We next apply this theorem to our problem based on the result of conditional normal distribution, i.e., [𝒁j,gt|𝒁j,gt1]𝒩(p𝒁j,gt1,(1p2)𝑰)[\bm{Z}_{j,g_{t}}|\bm{Z}_{j,g_{t-1}}]\sim\mathcal{N}({p}\bm{Z}_{j,g_{t-1}},(1-{p}^{2})\bm{I}). By letting 𝑲j\bm{K}_{j} in Theorem A.1 be [𝒁j,gt|𝒁j,gt1][\bm{Z}_{j,g_{t}}|\bm{Z}_{j,g_{t-1}}], we have

fW1,W2,W3(w1,w2,w3)=exp[12(w1+w2+w3)]0F1(12M;Mp24(1p2)[ξi,gt1(1),ξi,gt1(2),ξi,gt1(3)],[w1,w2,w3])\displaystyle f_{W_{1},W_{2},W_{3}}(w_{1},w_{2},w_{3})=\exp\Big{[}-\frac{1}{2}(w_{1}+w_{2}+w_{3})\Big{]}\ _{0}F_{1}\Big{(}\frac{1}{2}M;\frac{M{p}^{2}}{4(1-{p}^{2})}[\xi_{i,g_{t-1}}^{(1)},\xi_{i,g_{t-1}}^{(2)},\xi_{i,g_{t-1}}^{(3)}],[w_{1},w_{2},w_{3}]\Big{)} (16)
×π32/2(2M)3M/2Γ3(32)Γ3(M2)exp[Mp22(1p2)(ξi,gt1(1)+ξi,gt1(2)+ξi,gt1(3))]k=13(wk)(M31)/2k<q3[wkwq],\displaystyle\times\frac{\pi^{3^{2}/2}}{(\frac{2}{M})^{3M/2}\Gamma_{3}(\frac{3}{2})\Gamma_{3}(\frac{M}{2})}\exp\Big{[}-\frac{M{p}^{2}}{2(1-{p}^{2})}(\xi_{i,g_{t-1}}^{(1)}+\xi_{i,g_{t-1}}^{(2)}+\xi_{i,g_{t-1}}^{(3)})\Big{]}\prod_{k=1}^{3}(w_{k})^{(M-3-1)/2}\prod_{k<q}^{3}[w_{k}-w_{q}],

where [W1,W2,W3]=M(1p2)[ξi,gt1(1),ξi,gt1(2),ξi,gt1(3)][W_{1},W_{2},W_{3}]=\frac{M}{(1-{p}^{2})}[\xi_{i,g_{t-1}}^{(1)},\xi_{i,g_{t-1}}^{(2)},\xi_{i,g_{t-1}}^{(3)}]. Via change of variables, we have

f𝝃i,gt|𝝃i,gt1(𝝃i,gt|𝝃i,gt1;M,p)\displaystyle f_{\bm{\xi}_{i,g_{t}}|\bm{\xi}_{i,g_{t-1}}}(\bm{\xi}_{i,g_{t}}|\bm{\xi}_{i,g_{t-1}};M,p) =fW1,W2,W3(M(1p2)ξi,gt1(1),M(1p2)M(1p2)ξi,gt1(2),M(1p2)ξi,gt1(3))×|J|\displaystyle=f_{W_{1},W_{2},W_{3}}(\frac{M}{(1-{p}^{2})}\xi_{i,g_{t-1}}^{(1)},\frac{M}{(1-{p}^{2})}\frac{M}{(1-{p}^{2})}\xi_{i,g_{t-1}}^{(2)},\frac{M}{(1-{p}^{2})}\xi_{i,g_{t-1}}^{(3)})\times|J| (17)
(Because F10(m;𝑨,𝑩)=0F1(m;𝑨/c,𝑩c)\ {}_{0}F_{1}(m;\bm{A},\bm{B})=\ _{0}F_{1}(m;\bm{A}/c,\bm{B}c), and |J|=|M1p2|3|J|=|\frac{M}{1-p^{2}}|^{3})
=π32/2(2M)3M/2Γ3(32)Γ3(M2)k=13[ξi,gt(k)](M31)/2k<q3[ξi,gt(k)ξi,gt(q)]\displaystyle=\frac{\pi^{3^{2}/2}}{(\frac{2}{M})^{3M/2}\Gamma_{3}(\frac{3}{2})\Gamma_{3}(\frac{M}{2})}\prod_{k=1}^{3}[\xi_{i,g_{t}}^{(k)}]^{(M-3-1)/2}\prod_{k<q}^{3}[\xi_{i,g_{t}}^{(k)}-\xi_{i,g_{t}}^{(q)}]
×exp[M2(p21p2k=1Kξi,gt1(k)+11p2k=1Kξi,gt(k))]×(11p2)3M2\displaystyle\times\exp\Bigg{[}-\frac{M}{2}\Big{(}\frac{p^{2}}{1-p^{2}}\sum_{k=1}^{K}\xi_{i,g_{t-1}}^{(k)}+\frac{1}{1-p^{2}}\sum_{k=1}^{K}\xi_{i,g_{t}}^{(k)}\Big{)}\Bigg{]}\times\big{(}\frac{1}{1-p^{2}}\big{)}^{\frac{3M}{2}}
×0F1(12M;M2p1p2[ξi,gt1(1),ξi,gt1(2),ξi,gt1(3)],M2p1p2[ξi,gt(1),ξi,gt(2),ξi,gt(3)]).\displaystyle\times\ _{0}F_{1}\Big{(}\frac{1}{2}M;\frac{M}{2}\frac{p}{1-p^{2}}[\xi_{i,g_{t-1}}^{(1)},\xi_{i,g_{t-1}}^{(2)},\xi_{i,g_{t-1}}^{(3)}],\frac{M}{2}\frac{p}{1-p^{2}}[\xi_{i,g_{t}}^{(1)},\xi_{i,g_{t}}^{(2)},\xi_{i,g_{t}}^{(3)}]\Big{)}.

Appendix B Codes

All the codes written in MATLAB are summarized in AutoEigenCodes.zip. The instructions of implementing the codes are provided readme.txt. Some example scripts are also given.

References

  • (1)
  • Cressie and Wikle (2015) Cressie, N. and Wikle, C. K. (2015), Statistics for spatio-temporal data, John Wiley & Sons.
  • Gelfand et al. (2004) Gelfand, A. E., Schmidt, A. M., Banerjee, S. and Sirmans, C. (2004), ‘Nonstationary multivariate process modeling through spatially varying coregionalization’, Test 13(2), 263–312.
  • James et al. (1960) James, A. T. et al. (1960), ‘The distribution of the latent roots of the covariance matrix’, Annals of Mathematical Statistics 31(1), 151–158.
  • James et al. (1964) James, A. T. et al. (1964), ‘Distributions of matrix variates and latent roots derived from normal samples’, The Annals of Mathematical Statistics 35(2), 475–501.
  • Jin et al. (2017) Jin, Y., Huang, C., Daianu, M., Zhan, L., Dennis, E. L., Reid, R. I., Jack Jr, C. R., Zhu, H., Thompson, P. M. and Initiative, A. D. N. (2017), ‘3 d tract-specific local and global analysis of white matter integrity in a lzheimer’s disease’, Human brain mapping 38(3), 1191–1207.
  • Koev and Edelman (2006) Koev, P. and Edelman, A. (2006), ‘The efficient evaluation of the hypergeometric function of a matrix argument’, Mathematics of Computation 75(254), 833–846.
  • Lan et al. (2021) Lan, Z., Reich, B. J., Guinness, J., Bandyopadhyay, D., Ma, L. and Moeller, F. G. (2021), ‘Geostatistical modeling of positive definite matrices: An application to diffusion tensor imaging’, Biometrics p. In Press.
  • Lane et al. (2010) Lane, S. D., Steinberg, J. L., Ma, L., Hasan, K. M., Kramer, L. A., Zuniga, E. A., Narayana, P. A. and Moeller, F. G. (2010), ‘Diffusion tensor imaging and decision making in cocaine dependence’, PLoS One 5(7), e11591.
  • Lawley (1956) Lawley, D. (1956), ‘Tests of significance for the latent roots of covariance and correlation matrices’, Biometrika 43(1/2), 128–136.
  • Lee and Schwartzman (2017) Lee, H. N. and Schwartzman, A. (2017), ‘Inference for eigenvalues and eigenvectors in exponential families of random symmetric matrices’, Journal of Multivariate Analysis 162, 152–171.
  • Levine and Casella (2001) Levine, R. A. and Casella, G. (2001), ‘Implementations of the monte carlo em algorithm’, Journal of Computational and Graphical Statistics 10(3), 422–439.
  • Ma et al. (2009) Ma, L., Hasan, K. M., Steinberg, J. L., Narayana, P. A., Lane, S. D., Zuniga, E. A., Kramer, L. A. and Moeller, F. G. (2009), ‘Diffusion tensor imaging in cocaine dependence: regional effects of cocaine on corpus callosum and effect of cocaine administration route’, Drug and Alcohol Dependence 104(3), 262–267.
  • Muirhead (2009) Muirhead, R. J. (2009), Aspects of multivariate statistical theory, John Wiley & Sons.
  • Peng et al. (2009) Peng, H., Orlichenko, A., Dawe, R. J., Agam, G., Zhang, S. and Arfanakis, K. (2009), ‘Development of a human brain diffusion tensor template’, Neuroimage 46(4), 967–980.
  • Schwartzman et al. (2008) Schwartzman, A., Mascarenhas, W. F., Taylor, J. E. et al. (2008), ‘Inference for eigenvalues and eigenvectors of gaussian symmetric matrices’, The Annals of Statistics 36(6), 2886–2919.
  • Smith and Garth (2007) Smith, P. J. and Garth, L. M. (2007), ‘Distribution and characteristic functions for correlated complex Wishart matrices’, Journal of Multivariate Analysis 98(4), 661–677.
  • Srivastava (1979) Srivastava, M. (1979), An Introduction to Multivariate Statistics, North-Holland/New York.
  • Wong et al. (2016) Wong, R. K., Lee, T. C., Paul, D., Peng, J., Initiative, A. D. N. et al. (2016), ‘Fiber direction estimation, smoothing and tracking in diffusion mri’, The Annals of Applied Statistics 10(3), 1137–1156.
  • Yuan et al. (2012) Yuan, Y., Zhu, H., Lin, W. and Marron, J. S. (2012), ‘Local polynomial regression for symmetric positive definite matrices’, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 74(4), 697–719.
  • Zhou (2010) Zhou, D. (2010), Statistical analysis of diffusion tensor imaging, PhD thesis, University of Nottingham.
  • Zhu et al. (2009) Zhu, H., Chen, Y., Ibrahim, J. G., Li, Y., Hall, C. and Lin, W. (2009), ‘Intrinsic regression models for positive-definite matrices with applications to diffusion tensor imaging’, Journal of the American Statistical Association 104(487), 1203–1212.
  • Zhu et al. (2011) Zhu, H., Kong, L., Li, R., Styner, M., Gerig, G., Lin, W. and Gilmore, J. H. (2011), ‘Fadtts: functional analysis of diffusion tensor tract statistics’, NeuroImage 56(3), 1412–1425.
  • Zhu et al. (2010) Zhu, H., Styner, M., Li, Y., Kong, L., Shi, Y., Lin, W., Coe, C. and Gilmore, J. H. (2010), Multivariate varying coefficient models for dti tract statistics, in ‘International Conference on Medical Image Computing and Computer-Assisted Intervention’, Springer, pp. 690–697.