
A Two-stage Approach for Variable Selection in Joint Modeling of Multiple Longitudinal Markers and Competing Risk Outcomes

Taban Baghfalaki (corresponding author: taban.baghfalaki@u-bordeaux.fr), Univ. Bordeaux, INSERM, INRIA, BPH, U1219, F-33000 Bordeaux, France; Reza Hashemi, Department of Statistics, Razi University, Kermanshah, Iran; Christophe Tzourio, Univ. Bordeaux, INSERM, INRIA, BPH, U1219, F-33000 Bordeaux, France; Catherine Helmer, Univ. Bordeaux, INSERM, INRIA, BPH, U1219, F-33000 Bordeaux, France; Helene Jacqmin-Gadda, Univ. Bordeaux, INSERM, INRIA, BPH, U1219, F-33000 Bordeaux, France
Abstract

Background In clinical and epidemiological research, the integration of longitudinal measurements and time-to-event outcomes is vital for understanding relationships and improving risk prediction. However, as the number of longitudinal markers increases, joint model estimation becomes more complex, leading to long computation times and convergence issues. This study introduces a novel two-stage Bayesian approach for variable selection in joint models, illustrated through a practical application.
Methods Our approach conceptualizes the analysis in two stages. In the first stage, we estimate one-marker joint models for each longitudinal marker related to the event, allowing for bias reduction from informative dropouts through individual marker trajectory predictions. The second stage employs a proportional hazard model that incorporates expected current values of all markers as time-dependent covariates. We explore continuous and Dirac spike-and-slab priors for variable selection, utilizing Markov chain Monte Carlo (MCMC) techniques.
Results The proposed method addresses the challenges of parameter estimation and risk prediction with numerous longitudinal markers, demonstrating robust performance through simulation studies. We further validate our approach by predicting dementia risk using the Three-City (3C) dataset, a longitudinal cohort study from France.
Conclusions This two-stage Bayesian method offers an efficient process for variable selection in joint modeling, enhancing risk prediction capabilities in longitudinal studies. The accompanying R package VSJM, which is freely available at https://github.com/tbaghfalaki/VSJM, facilitates implementation, making this approach accessible for diverse clinical applications.

1 Introduction

In many clinical studies, longitudinal measurements are repeatedly collected for each subject at different time points until an event such as death or drop-out occurs. It is also common to have multiple longitudinal measurements in a study. For example, in the context of HIV/AIDS studies, two important markers that are often repeatedly measured until death or drop-out are HIV-RNA and CD4 [24]. In most of these studies, the number of longitudinal markers under investigation and the number of explanatory variables are large. Estimating the association between longitudinal markers and time-to-event outcomes, as well as determining the effect of explanatory variables on event risks, is crucial in modern medicine [26, 47, 29, 16, 61], in particular for the development of individual risk prediction models [39, 40]. For this purpose, joint models for longitudinal and time-to-event outcomes, which combine mixed models and time-to-event models, have received considerable attention over the past decades. Interesting reviews on joint modeling can be found in [49, 35, 1] and [63]. From a computational point of view, the R software now offers several options: JMBayes [41], JMBayes2 [38], INLAjoint [44], and the stan_jm modeling function in the rstanarm package [10]. The Bayesian implementation of joint models with multiple markers is available through INLAjoint [44] and JMBayes2 [38], but dealing with a large number of markers still poses computational difficulties due to the large number of random effects and parameters in these models. To address this, [5] proposed a novel two-stage joint modeling approach (TSJM). In the first stage, joint models are estimated for each marker and the time-to-event outcome. In the second stage, the predicted values of the linear predictor of each marker are used as time-dependent covariates in a semi-parametric Cox model. This method enhances tractability and reduces bias from informative dropouts.
When multiple longitudinal markers and a large number of time-invariant explanatory variables are considered, the selection of the most predictive variables is challenging. Traditionally, this has been performed through model comparison criteria such as the Akaike information criterion (AIC), the Bayesian information criterion (BIC), or the deviance information criterion (DIC). However, when the models are complex and the number of candidate models is large, evaluating these criteria for every candidate model is not computationally feasible, so they may not effectively guide model selection in such scenarios. This has led to the exploration of alternative strategies, such as variable selection, to address these challenges and identify the most appropriate model. Variable selection involves choosing appropriate variables from a complete list and removing those that are irrelevant or redundant. The purpose of such selection is to determine a set of variables that will provide the best fit for the model in order to make accurate predictions [15]. In general, variable selection in the frequentist paradigm involves the use of a penalized likelihood approach, which considers various forms of penalty functions. These include the least absolute shrinkage and selection operator (LASSO), the smoothly clipped absolute deviation (SCAD), and the adaptive least absolute shrinkage and selection operator (ALASSO) [11]. In the Bayesian framework, the choice of appropriate priors plays a pivotal role, similar to the regularization and shrinkage techniques employed in frequentist variable selection methods. Properly chosen priors not only encode prior beliefs about the model parameters but also contribute to the regularization of the model, helping to prevent overfitting and promote model parsimony [21]. In this paper, we delve into Bayesian variable selection, where a careful choice of priors allows for the automatic identification of relevant predictors while appropriately handling model complexity [22]. Various types of priors have been proposed for variable selection, including spike-and-slab priors [32, 22, 34], the horseshoe prior [12], and the Laplace (i.e., double-exponential) prior [55]. Our investigation focuses on two prominent types of priors: continuous and Dirac spike priors. These priors strike a balance between flexibility and simplicity, offering practical advantages in the context of variable selection [46]. By exploring the implications and effectiveness of these priors, we aim to provide insights into their utility and contribute to the ongoing discourse surrounding Bayesian variable selection methodologies.
Variable selection methods in joint modeling have been explored across various studies, each offering unique insights into different aspects of the modeling process. [25] employed a penalized likelihood approach with adaptive LASSO penalty functions to select fixed and random effects in joint modeling of one longitudinal marker and time-to-event data. [60] extended this approach to the joint analysis of longitudinal data and interval-censored failure time data, utilizing penalized likelihood-based procedures to simultaneously select variables and estimate covariate effects. [23] adopted gradient boosting for variable selection in joint modeling, providing an alternative method for addressing this task. [13] utilized L1 penalty functions to select random effects and off-diagonal elements in the covariance matrix in the survival sub-model. Their focus was on joint modeling of multiple longitudinal markers and time-to-event data within the frequentist paradigm. [53] considered simultaneous parameter estimation and variable selection in joint modeling of longitudinal and survival data using various penalized functions. [56] utilized a B-spline decomposition and penalized likelihood with adaptive group LASSO to select the relevant independent variables and distinguish between the time-varying and time-invariant effects for the two sub-models. On the Bayesian side, [52] proposed a semiparametric joint model for analyzing multivariate longitudinal and survival data. They introduced a Bayesian Lasso method with independent Laplace priors for parameter estimation and variable selection. However, these studies primarily focused on scenarios with a limited number of longitudinal markers and did not address cases where the number of markers is high. Additionally, dynamic prediction in the context of variable selection in joint modeling remains unexplored in the literature reviewed here.
In this paper, we consider variable selection, including the selection of both the longitudinal markers and the time-invariant covariates, in the joint modeling of multiple longitudinal measurements and competing risks outcomes. This is the first study to specifically address variable selection in this context. For this purpose, we face two issues. The first challenge is estimating joint models including a large number of longitudinal markers, which may be infeasible. The second challenge, variable selection itself, is addressed by using spike-and-slab priors in a Bayesian paradigm. In summary, we propose a new two-stage approach for variable selection with multiple longitudinal markers and competing risks outcomes when the number of longitudinal markers and covariates is large and the estimation of the full joint model including all the markers and covariates is intractable. Using a Bayesian approach, the first stage involves estimating one-marker joint models for the competing risks and each longitudinal marker, as in [5]. This avoids the bias caused by informative dropout. In the second stage, extending TSJM, we estimate a cause-specific hazard model for the competing risks outcome using a Bayesian approach allowing variable selection. This model includes time-dependent covariates and incorporates spike-and-slab priors. The explanatory variables in the hazard model consist of estimates of functions of the random effects and parameters obtained from the first stage of jointly modeling each longitudinal marker. We provide an R package, VSJM, that implements the proposed methods and is available at https://github.com/tbaghfalaki/VSJM.
This paper is organized as follows: In Section 2, we describe our variable selection approach, which includes introducing longitudinal and competing risks sub-models, reviewing parameter estimation in MMJM, and finally describing our proposed two-stage approach for Bayesian variable selection. Section 3 focuses on dynamic prediction using the proposed approach. Section 4 presents a simulation study comparing the behavior of the variable selection two-stage approach with the oracle joint model in terms of variable selection and dynamic prediction. In Section 5, the proposed approach is applied to variable selection and the risk prediction of dementia in the Three-City (3C) cohort study [2]. Finally, the last section of the paper includes a conclusion of the proposed approach.

2 Variable selection for Joint modeling

The observed time is $T_i=\min(T_{i1}^{*},\ldots,T_{iL}^{*},C_i)$, where $C_i$ denotes the censoring time and $T_{il}^{*}$ the time to event $l$. The event indicator is denoted by $\delta_i\in\{0,1,\ldots,L\}$, where 0 corresponds to censoring and $1,\ldots,L$ to the competing events. Denote by $Y_{ijk}=Y_{ik}(s_{ijk})$ the measurement of the $k$th longitudinal marker for the $i$th subject at time $s_{ijk}$. The observed data $\{T_i,\delta_i,Y_{ijk};\ i=1,\ldots,N,\ j=1,\ldots,n_i,\ k=1,\ldots,K\}$ are thus a sample from the target population.

2.1 Longitudinal sub-model

A multivariate linear mixed-effects model is considered for the multiple longitudinal markers as follows:

$$Y_{ijk}=\eta_{ik}(s_{ijk}\mid\beta_k,b_{ik})+\varepsilon_{ijk},\qquad i=1,\ldots,N,\ j=1,\ldots,n_i,\ k=1,\ldots,K, \qquad (1)$$

where $\varepsilon_{ijk}\overset{\text{iid}}{\sim}N(0,\sigma_k^2)$. We assume that

$$\eta_{ik}(s\mid\beta_k,b_{ik})=X_{ik}(s)^{\top}\beta_k+Z_{ik}(s)^{\top}b_{ik},\qquad k=1,\ldots,K, \qquad (2)$$

where $X_{ik}(s)$ and $Z_{ik}(s)$ are the design vectors at time $s$ for the $p_k^{\beta}$-dimensional fixed effects $\beta_k$ and the $q_k$-dimensional random effects $b_{ik}$, respectively. We assume that $b_i=(b_{i1}^{\top},\ldots,b_{iK}^{\top})^{\top}$ accounts for the association within and between the multiple longitudinal markers, with $b_i\sim N(0,\Sigma)$, where $\Sigma$ is a $q\times q$ covariance matrix and $q=\sum_{k=1}^{K}q_k$. It is assumed that $\varepsilon_{ijk}$ is independent of $b_{ik}$ for all $i$ and $k$.
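For a marker with a random intercept and slope (the specification used later in the simulation study), the linear predictor in (2) reduces to $\eta_{ik}(s)=\beta_{0k}+\beta_{1k}s+b_{0ik}+b_{1ik}s$. The short R sketch below, with purely illustrative parameter values, shows this computation:

  # Sketch: subject-specific linear predictor eta_ik(s) for a random
  # intercept-and-slope specification, i.e. X_ik(s) = Z_ik(s) = (1, s)'.
  # beta_k and b_ik are illustrative values, not estimates from the paper.
  eta_ik <- function(s, beta_k, b_ik) {
    X <- cbind(1, s)   # fixed-effect design at times s
    Z <- cbind(1, s)   # random-effect design at times s
    drop(X %*% beta_k + Z %*% b_ik)
  }

  eta_ik(s = c(0, 0.5, 1), beta_k = c(-0.5, 0.5), b_ik = c(0.2, -0.1))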

2.2 Competing risks sub-model

For each of the $L$ causes, a cause-specific proportional hazard model is postulated. The risk of an event is assumed to be a function of the subject-specific linear predictors $\eta_{ik}(t\mid b_{ik})$ and of time-invariant covariates as follows [40]:

$$\lambda_{il}(t\mid\omega_i,b_i)=\lambda_{0l}(t)\exp\Big\{\gamma_l^{\top}\omega_{il}+\sum_{k=1}^{K}\alpha_{lk}\eta_{ik}(t\mid b_{ik})\Big\}, \qquad (3)$$

where $\lambda_{0l}(t)$ denotes the baseline hazard function for the $l$th cause, $\omega_{il}$ is a $p_l^{\gamma}$-vector of covariates with corresponding regression coefficients $\gamma_l$, and $\alpha_l=(\alpha_{l1},\ldots,\alpha_{lK})^{\top}$ represents the vector of association parameters between the longitudinal markers and cause $l$. There are different approaches for specifying the baseline hazard function in joint modeling [40]. The methodology is not restricted to a specific form of the baseline hazard, and we consider a piecewise constant baseline hazard [27]. A finite partition of the time axis is constructed as $0=\xi_0<\xi_1<\xi_2<\ldots<\xi_{\mathcal{K}}$, such that $\max(T_1,\ldots,T_N)<\xi_{\mathcal{K}}$, yielding $\mathcal{K}$ intervals $(0,\xi_1],(\xi_1,\xi_2],\ldots,(\xi_{\mathcal{K}-1},\xi_{\mathcal{K}}]$. We consider a constant baseline hazard on each interval, $\lambda_{0l}(t)=\lambda_{\kappa l}$ for $t\in(\xi_{\kappa-1},\xi_{\kappa}]$, $\kappa=1,\ldots,\mathcal{K}$.
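As an illustration, the cumulative baseline hazard implied by the piecewise constant specification is obtained by summing each interval's hazard over the time spent in that interval. The R sketch below uses illustrative knots and hazard values; it is not the VSJM implementation.

  # Sketch: cumulative baseline hazard for a piecewise-constant specification.
  # 'knots' are the interval boundaries (xi_1, ..., xi_K) and 'lambda' the
  # constant hazards on (0, xi_1], (xi_1, xi_2], ...; values are illustrative.
  cum_baseline_hazard <- function(t, knots, lambda) {
    breaks <- c(0, knots)
    # time spent in each interval before t
    exposure <- pmax(pmin(t, breaks[-1]) - breaks[-length(breaks)], 0)
    sum(lambda * exposure)
  }

  # Example: boundaries at 2 and 5, last knot beyond max(T)
  cum_baseline_hazard(t = 3, knots = c(2, 5, 10), lambda = c(0.05, 0.10, 0.20))
  # = 0.05*2 + 0.10*1 = 0.20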

2.3 Parameter estimation of the multi-marker joint model

Let $\psi_k$ and $\Upsilon$ represent the unknown parameters for the $k$th longitudinal marker and for the time-to-event outcomes, respectively, excluding the covariance matrix $\Sigma$. Let $\psi=\{\psi_1,\ldots,\psi_K\}$. If $y=\{y_1,\ldots,y_N\}$, $t=(t_1,\ldots,t_N)^{\top}$, and $\delta=(\delta_1,\ldots,\delta_N)^{\top}$, the joint likelihood function for models (1) and (3) is as follows:

$$\mathbb{L}(\psi,\Upsilon,\Sigma\mid y,t,\delta)=\prod_{i=1}^{N}\int\left(\prod_{k=1}^{K}p(y_{ik}\mid b_{ik},\psi_k)\right)p(t_i,\delta_i\mid\omega_i,b_i,\Upsilon)\,p(b_i\mid\Sigma)\,db_i$$
$$=\prod_{i=1}^{N}\int\left(\prod_{k=1}^{K}\prod_{j=1}^{n_i}\phi(y_{ijk};\eta_{ijk},\sigma_k^2)\right)\left(\prod_{l=1}^{L}\big[\lambda_{il}(t_i\mid\omega_i,b_i,\Upsilon)\big]^{I(\delta_i=l)}\exp\Big(-\sum_{l=1}^{L}\Lambda_{il}(t_i\mid\omega_i,b_i,\Upsilon)\Big)\right)\phi_q(b_i;0,\Sigma)\,db_i,$$

where $\Lambda_{il}(t\mid\omega_i,b_i)$ represents the cumulative hazard function of Equation (3), and the densities of the univariate normal distribution with mean $\mu$ and variance $\sigma^2$, and of the $q$-variate normal distribution with mean $\mu$ and covariance matrix $\Sigma$, are denoted by $\phi(x;\mu,\sigma^2)$ and $\phi_q(x;\mu,\Sigma)$, respectively.
In the frequentist paradigm, parameter estimation can be computed by maximizing (2.3), which involves the use of numerical integration [14, 48]. This integration will be more challenging when the number of outcomes increases. In the Bayesian paradigm [57, 42, 4, 30], the following joint posterior distribution can be used for parameter estimation:

$$\pi(\psi,\Upsilon,b,\Sigma\mid y,t,\delta)\propto\prod_{i=1}^{N}\left(\prod_{k=1}^{K}\prod_{j=1}^{n_i}p(y_{ijk}\mid b_{ik},\psi_k)\times p(\psi_k)\right)\prod_{l=1}^{L}\big[\lambda_{il}(t_i\mid\omega_i,b_i,\Upsilon)\big]^{I(\delta_i=l)}\exp\Big(-\sum_{l=1}^{L}\Lambda_{il}(t_i\mid\omega_i,b_i,\Upsilon)\Big)\,\phi_q(b_i;0,\Sigma)\times p(\Upsilon)\times p(\Sigma),$$

where $p(\psi_k)$, $p(\Upsilon)$ and $p(\Sigma)$ are the corresponding prior distributions for the unknown parameters. In this paper, we consider zero-mean normal distributions with a large variance as prior distributions for the regression coefficients and association parameters. A vague inverse gamma distribution is considered as the prior distribution for the variance of the errors. An inverse Wishart distribution with degrees of freedom equal to the dimension of the random effects and an identity scale matrix is used as the prior for the variance of the random effects. The parameters of the baseline hazard are assumed to follow a gamma distribution with a mean of 1 and a large variance. In the next section, we introduce our proposed two-stage approach for variable selection of longitudinal markers and covariates in a Bayesian paradigm.

2.4 Two-stage joint modeling of multiple longitudinal measurements and competing risks

The proposed method adopts a two-stage approach for variable selection in joint modeling, targeting the selection of markers (or time-varying covariates) and time-invariant covariates. This approach is specifically designed for handling multiple longitudinal markers and time-to-event outcomes, with the primary goal of avoiding the bias caused by informative dropouts. It relies on the TSJM approach of [5], but extends it to competing risks and replaces the partial-likelihood estimation of the Cox model in the second stage with a fully Bayesian approach allowing variable selection.
The first stage involves estimating $K$ one-marker joint models for the competing risks and each marker. In the second stage, we estimate cause-specific proportional hazard models including the time-invariant explanatory variables and the predicted current values of the linear predictor of each longitudinal marker obtained from stage 1. The approach therefore consists of two stages, outlined as follows:

Stage 1

For each $k$, $k=1,\ldots,K$, estimate the joint model for the longitudinal marker $k$ and the competing risks. The one-marker JMs are estimated using the joint posterior distribution given in Section 2.3 with $K=1$. Then, predict the random effects $\hat b_{ik}$ and the linear predictor $\eta_{ik}(t\mid\hat\beta_k,\hat b_{ik})$ for marker $k$ using the empirical mean of the posterior distribution of $(b_{ik}\mid y_{ik},t_i,\delta_i,\hat\theta_k)$, where $\hat\theta_k$ represents the estimated parameters of the one-marker joint model for the $k$th longitudinal marker. This first stage allows the estimation of $\eta_{ik}(t\mid\hat\beta_k,\hat b_{ik})$ without the bias due to informative dropout, but it neglects the correlation between the markers.

Stage 2

In the second stage, the values of $\eta_{ik}(t\mid\hat\beta_k,\hat b_{ik})$ are included as time-varying covariates in a proportional hazard model to estimate the parameters $\gamma_l$ and $\alpha_{lk}$, as well as the baseline hazard, of (3). The cumulative hazard function of (3) can be approximated using Gaussian quadrature [9, 54]. In the Bayesian paradigm, the following joint posterior distribution is considered:

$$\pi(\lambda,\alpha\mid t,\delta,\hat b,\omega)\propto\prod_{i=1}^{N}\prod_{l=1}^{L}\big[\lambda_{il}(t_i\mid\omega_i,\hat b_i,\Upsilon)\big]^{I(\delta_i=l)}\exp\Big(-\sum_{l=1}^{L}\Lambda_{il}(t_i\mid\omega_i,\hat b_i,\Upsilon)\Big)\times p(\Upsilon),$$

where $\Upsilon=(\lambda_l,\alpha_l,\gamma_l,\ l=1,\ldots,L)$ and $\lambda_l=\{\lambda_{l1},\ldots,\lambda_{l\mathcal{K}}\}$ denotes the vector of piecewise constant baseline hazards for cause $l$. The prior distributions for the unknown parameters are the same as those considered in Section 2.3.
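As an illustration of the Gaussian quadrature step, the cumulative cause-specific hazard $\Lambda_{il}(t)=\int_0^t\lambda_{il}(u)\,du$ can be approximated with Gauss-Legendre nodes rescaled to $(0,t)$. The R sketch below assumes the statmod package for the nodes and weights and uses an illustrative hazard function; it is not the VSJM implementation.

  # Sketch: Gauss-Legendre approximation of the cumulative hazard
  # Lambda_l(t) = int_0^t lambda_l(u) du, as used in the second-stage posterior.
  library(statmod)

  cum_hazard_gl <- function(hazard_fun, t, n_nodes = 15) {
    gq <- gauss.quad(n_nodes, kind = "legendre")  # nodes/weights on [-1, 1]
    u <- 0.5 * t * (gq$nodes + 1)                 # rescale nodes to [0, t]
    w <- 0.5 * t * gq$weights
    sum(w * hazard_fun(u))
  }

  # Illustrative hazard: lambda_0 * exp(alpha * eta(u)) with eta(u) = -0.5 + 0.5*u
  haz <- function(u) 0.10 * exp(0.8 * (-0.5 + 0.5 * u))
  cum_hazard_gl(haz, t = 2)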

2.5 Bayesian variable selection

As above, we consider the prior distributions $\beta_k\sim N_{p_k}(0,c\,\mathrm{I}_{p_k})$, where $c$ is a large value, $\sigma_k^2\sim\mathrm{I}\Gamma(a_{\sigma},b_{\sigma})$, $\Sigma\sim\mathrm{IWishart}(\mathcal{V},\nu)$ and $\lambda_{l\kappa}\sim\Gamma(a_{l\kappa},b_{l\kappa})$, $\kappa=1,\ldots,\mathcal{K}$, $l=1,\ldots,L$. Our purpose is to implement variable selection for the regression coefficients $\gamma_l$, $l=1,\ldots,L$, and the association parameters $\alpha_l$, $l=1,\ldots,L$, of the cause-specific hazard model (3). For these parameters we consider two different spike-and-slab prior distributions: a continuous spike and a Dirac spike.

2.5.1 Continuous spike

The continuous spike (CS) prior [32, 22] for $\gamma_l=(\gamma_{l1},\ldots,\gamma_{lp_l^{\gamma}})^{\prime}$ is given by

$$\begin{array}{l}\gamma_{lj}\mid\zeta_{lj},\sigma^2_{\gamma_{lj}},\tau^2_{\gamma_{lj}}\overset{ind}{\sim}\zeta_{lj}N(0,\sigma^2_{\gamma_{lj}})+(1-\zeta_{lj})N(0,\tau^2_{\gamma_{lj}}),\quad\sigma^2_{\gamma_{lj}}>\tau^2_{\gamma_{lj}}>0,\ l=1,\ldots,L,\ j=1,\ldots,p^{\gamma}_{l},\\ \zeta_{lj}\mid\pi_{\gamma_{lj}}\overset{ind}{\sim}Ber(\pi_{\gamma_{lj}}),\\ \pi_{\gamma_{lj}}\mid a_{\gamma_{lj}},b_{\gamma_{lj}}\sim Beta(a_{\gamma_{lj}},b_{\gamma_{lj}}),\\ \sigma^2_{\gamma_{lj}}\mid a_{\sigma^2_{\gamma_{lj}}},b_{\sigma^2_{\gamma_{lj}}}\sim\mathrm{I}\Gamma(a_{\sigma^2_{\gamma_{lj}}},b_{\sigma^2_{\gamma_{lj}}}).\end{array} \qquad (11)$$

Also, the CS prior for the association parameters $\alpha_l=(\alpha_{l1},\ldots,\alpha_{lK})^{\prime}$ is given by

$$\begin{array}{l}\alpha_{lk}\mid\zeta_{lk},\sigma^2_{\alpha_{lk}},\tau^2_{\alpha_{lk}}\overset{ind}{\sim}\zeta_{lk}N(0,\sigma^2_{\alpha_{lk}})+(1-\zeta_{lk})N(0,\tau^2_{\alpha_{lk}}),\quad\sigma^2_{\alpha_{lk}}>\tau^2_{\alpha_{lk}}>0,\ k=1,\ldots,K,\\ \zeta_{lk}\mid\pi_{\alpha_{lk}}\overset{ind}{\sim}Ber(\pi_{\alpha_{lk}}),\\ \pi_{\alpha_{lk}}\mid a_{\alpha_{lk}},b_{\alpha_{lk}}\sim Beta(a_{\alpha_{lk}},b_{\alpha_{lk}}),\\ \sigma^2_{\alpha_{lk}}\mid a_{\sigma^2_{\alpha_{lk}}},b_{\sigma^2_{\alpha_{lk}}}\sim\mathrm{I}\Gamma(a_{\sigma^2_{\alpha_{lk}}},b_{\sigma^2_{\alpha_{lk}}}),\end{array} \qquad (16)$$

where $\tau^2_{\gamma_{lj}}$ in (11) and $\tau^2_{\alpha_{lk}}$ in (16) are small fixed values (e.g., $10^{-3}$ or $10^{-4}$) defining the spike component. The latent variable $\zeta_{lj}$ in (11) (or $\zeta_{lk}$ in (16)) indicates whether the corresponding parameter is significant. For example, if $\zeta_{lk}=1$, the prior distribution of $\alpha_{lk}$ is the slab $N(0,\sigma^2_{\alpha_{lk}})$ and the $k$th marker can be considered significant for cause $l$; if $\zeta_{lk}=0$, the prior is the spike and $\alpha_{lk}$ is shrunk toward zero. A graphical example of the CS prior can be found in panel (a) of Figure 1.

Figure 1: An example of the continuous spike prior (panel a) and the Dirac spike prior (panel b).

2.5.2 Dirac spike

We exploit a Dirac spike-and-slab (DS) prior [22, 34] similar to that proposed by [58]. The DS prior for $\gamma_l$ is given by

$$\begin{array}{l}\gamma_{lj}\mid\pi_{\gamma_{lj}},\sigma^2_{\gamma_{lj}}\overset{ind}{\sim}(1-\pi_{\gamma_{lj}})N(0,\sigma^2_{\gamma_{lj}})+\pi_{\gamma_{lj}}\delta_0(\gamma_{lj}),\quad\sigma^2_{\gamma_{lj}}>0,\ l=1,\ldots,L,\ j=1,\ldots,p^{\gamma}_{l},\\ \pi_{\gamma_{lj}}\mid a_{\gamma_{lj}},b_{\gamma_{lj}}\sim Beta(a_{\gamma_{lj}},b_{\gamma_{lj}}),\\ \sigma^2_{\gamma_{lj}}\mid a_{\sigma^2_{\gamma_{lj}}},b_{\sigma^2_{\gamma_{lj}}}\sim\mathrm{I}\Gamma(a_{\sigma^2_{\gamma_{lj}}},b_{\sigma^2_{\gamma_{lj}}}),\end{array} \qquad (20)$$

where $\delta_0(\gamma_{lj})$ represents a point mass at $0\in\mathbb{R}$, that is, $\delta_0(\gamma_{lj})=1$ when $\gamma_{lj}=0$ and $\delta_0(\gamma_{lj})=0$ otherwise. Also, the DS prior for the association parameters $\alpha_l$ is given by

$$\begin{array}{l}\alpha_{lk}\mid\pi_{\alpha_{lk}},\sigma^2_{\alpha_{lk}}\overset{ind}{\sim}(1-\pi_{\alpha_{lk}})N(0,\sigma^2_{\alpha_{lk}})+\pi_{\alpha_{lk}}\delta_0(\alpha_{lk}),\quad\sigma^2_{\alpha_{lk}}>0,\ k=1,\ldots,K,\\ \pi_{\alpha_{lk}}\mid a_{\alpha_{lk}},b_{\alpha_{lk}}\sim Beta(a_{\alpha_{lk}},b_{\alpha_{lk}}),\\ \sigma^2_{\alpha_{lk}}\mid a_{\sigma^2_{\alpha_{lk}}},b_{\sigma^2_{\alpha_{lk}}}\sim\mathrm{I}\Gamma(a_{\sigma^2_{\alpha_{lk}}},b_{\sigma^2_{\alpha_{lk}}}).\end{array} \qquad (24)$$

A graphical example of the DS prior can be found in panel (b) of Figure 1. For Bayesian inference, a block Gibbs sampler is used in conjunction with the Metropolis-Hastings algorithm, which requires all of the full conditional distributions. The Markov chain Monte Carlo (MCMC) sampling is performed using JAGS [37, 36], and the R2jags package [51] is used as an interface between R and JAGS; JAGS automatically derives the full conditional distributions for all parameters and models. We refer to this approach as Variable Selection for Joint Modeling (VSJM). The methodology is implemented in the R package VSJM.
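To fix ideas, the following sketch shows how the CS prior for the association parameters could be written as a JAGS model block, with $\zeta_{lk}=1$ pointing to the slab. The hyperparameter names (a_pi, b_pi, a_sig, b_sig) and the fixed spike variance tau2 are illustrative, supplied as data, and do not necessarily match the parameterization used in the VSJM package; the likelihood contribution of the second-stage hazard model would be added alongside this block.

  # Sketch of the continuous spike-and-slab prior for alpha[l, k] in JAGS
  # (prior block only; illustrative hyperparameter names).
  cs_prior <- "
  model {
    for (l in 1:L) {
      for (k in 1:K) {
        zeta[l, k] ~ dbern(pi[l, k])             # 1 = slab (marker retained)
        pi[l, k]   ~ dbeta(a_pi, b_pi)
        prec_slab[l, k] ~ dgamma(a_sig, b_sig)   # slab precision = 1 / sigma^2
        # mixture through the precision: slab if zeta = 1, spike if zeta = 0
        prec[l, k] <- zeta[l, k] * prec_slab[l, k] + (1 - zeta[l, k]) / tau2
        alpha[l, k] ~ dnorm(0, prec[l, k])       # JAGS dnorm uses precision
      }
    }
  }
  "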

2.6 Statistical inference

To select the significant covariates of the cause-specific hazard model, considering Equation (11) or (20), the following hypothesis tests are conducted:

$$\mathrm{H}_{0lj}^{\gamma}:\gamma_{lj}=0\quad\text{versus}\quad\mathrm{H}_{1lj}^{\gamma}:\gamma_{lj}\neq 0,\qquad j=1,\ldots,p_l^{\gamma},\ l=1,\ldots,L. \qquad (27)$$

Also, to select the significant longitudinal markers based on the association parameters, considering Equation (16) or (24), the following hypothesis tests are considered:

$$\mathrm{H}_{0lk}^{\alpha}:\alpha_{lk}=0\quad\text{versus}\quad\mathrm{H}_{1lk}^{\alpha}:\alpha_{lk}\neq 0,\qquad k=1,\ldots,K,\ l=1,\ldots,L. \qquad (30)$$

In practice, when implementing the hypotheses (27) and (30) for the CS prior, one can consider the following equivalent hypotheses, respectively:

$$\mathrm{H}_{0lj}^{\gamma}:\zeta_{lj}=0\quad\text{versus}\quad\mathrm{H}_{1lj}^{\gamma}:\zeta_{lj}\neq 0,\qquad j=1,\ldots,p_l^{\gamma},\ l=1,\ldots,L, \qquad (33)$$
$$\mathrm{H}_{0lk}^{\alpha}:\zeta_{lk}=0\quad\text{versus}\quad\mathrm{H}_{1lk}^{\alpha}:\zeta_{lk}\neq 0,\qquad k=1,\ldots,K,\ l=1,\ldots,L. \qquad (36)$$

We consider two different Bayesian testing approaches for evaluating these hypotheses: a local Bayesian false discovery rate [18] and a Bayes factor [28].

2.6.1 Continuous spike

Local Bayesian false discovery rate
Let $\mathcal{D}$ denote the data in the study and let $\theta$ denote the set of unknown parameters of the CS model. Let $P(\mathrm{H}_{0lj}^{\gamma}\mid\mathcal{D})$ represent the posterior probability of the null hypothesis. This is the probability of making a false discovery when declaring a covariate significant, which Efron [18] called the local Bayesian false discovery rate (denoted $lBFDR$). Small values of $lBFDR$ indicate strong evidence for the presence of a significant covariate. For computing the $lBFDR$ of Equation (33), we obtain

$$lBFDR_{lj}^{\gamma}=P(\mathrm{H}^{\gamma}_{0lj}\mid\mathcal{D})=P(\zeta_{lj}=0\mid\mathcal{D})\approx\frac{1}{M}\sum_{m=1}^{M}I(\zeta_{lj}=0\mid\mathcal{D},\theta^{(m)}),$$

where $I(\zeta_{lj}=0\mid\mathcal{D},\theta^{(m)})$ is an indicator function and $\theta^{(m)}$ denotes the $m$th, $m=1,\ldots,M$, sample of $\theta$ generated by the Gibbs sampler.
The $lBFDR$ is a procedure for investigating the null hypothesis. It is an empirical Bayes version of the methodology proposed by [7]. This approach focuses on densities instead of tail areas [19, 20, 17]. Efron [17] demonstrated that the false discovery rate control rule proposed by [7] can be expressed in empirical Bayes terms, and showed that the desired control level (e.g., $\alpha=0.05$) for Benjamini-Hochberg testing of single tests can also be regarded as the desired control level for the local Bayesian false discovery rate ($lBFDR$). Based on Remark B of [17], if the value of $lBFDR$ is not greater than 0.05 (the desired control level), the null hypothesis is rejected. In fact, the interpretation of the $lBFDR$ value is analogous to the frequentist p-value: $lBFDR$ values less than a specified significance level (e.g., 0.05) provide stronger evidence against the null hypothesis [45, 59]. The value of 0.05 is also considered a threshold for $lBFDR$ in the context of variable selection by [31] and [6]. Thus, a local Bayesian false discovery rate threshold of 0.05 is considered in this paper.
Bayes factor
The Bayes factor for testing (33) is defined as

$$BF_{lj}^{\gamma}=\frac{P(\mathrm{H}_{1lj}^{\gamma}\mid\mathcal{D})/P(\mathrm{H}_{1lj}^{\gamma})}{P(\mathrm{H}_{0lj}^{\gamma}\mid\mathcal{D})/P(\mathrm{H}_{0lj}^{\gamma})}=\frac{P(\zeta_{lj}\neq 0\mid\mathcal{D})/P(\zeta_{lj}\neq 0)}{P(\zeta_{lj}=0\mid\mathcal{D})/P(\zeta_{lj}=0)},$$

which quantifies the evidence for $\mathrm{H}_{1lj}^{\gamma}$ against $\mathrm{H}_{0lj}^{\gamma}$. Note that $P(\zeta_{lj}=1)=E[\pi_{\gamma_{lj}}]=\frac{a_{\gamma_{lj}}}{a_{\gamma_{lj}}+b_{\gamma_{lj}}}$, as $\zeta_{11},\ldots,\zeta_{Lp_L^{\gamma}}$ are assumed independent. Hence,

$$P(\zeta_{lj}=0)=\frac{b_{\gamma_{lj}}}{a_{\gamma_{lj}}+b_{\gamma_{lj}}}.$$

Also, since $lBFDR_{lj}^{\gamma}=P(\mathrm{H}_{0lj}^{\gamma}\mid\mathcal{D})$,

$$BF_{lj}^{\gamma}=\frac{1-lBFDR_{lj}^{\gamma}}{lBFDR_{lj}^{\gamma}}\times\frac{b_{\gamma_{lj}}}{a_{\gamma_{lj}}}. \qquad (37)$$

Unlike the $lBFDR$, a large value of $BF$ indicates strong evidence in favor of $\mathrm{H}_1$. A widely cited table is provided by [28]; it is based on $\log_{10}(BF)$ and can be used to quantify the support that one model has over another.
All comments mentioned above can also be applied to the hypothesis testing of equations (36).
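In practice, both quantities are obtained directly from the MCMC output. A minimal R sketch, assuming the posterior draws of the indicator $\zeta_{lj}$ have been extracted (for the DS prior, the indicator $\Psi^{\gamma}_{lj}$ defined in Section 2.6.2 plays the same role):

  # Sketch: lBFDR and Bayes factor for one coefficient from MCMC output.
  # 'zeta_draws' is the posterior sample of the inclusion indicator zeta_lj
  # (for the Dirac spike, use as.integer(gamma_draws != 0) instead);
  # a_pi and b_pi are the Beta hyperparameters of pi_gamma_lj.
  lbfdr_bf <- function(zeta_draws, a_pi = 1, b_pi = 1) {
    lbfdr <- mean(zeta_draws == 0)             # P(H0 | D), Monte Carlo estimate
    bf <- (1 - lbfdr) / lbfdr * (b_pi / a_pi)  # Equation (37)
    c(lBFDR = lbfdr, BF = bf)
  }

  # Illustrative draws: the coefficient is in the slab in 90% of the iterations
  set.seed(1)
  lbfdr_bf(zeta_draws = rbinom(4000, 1, 0.9))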

2.6.2 Dirac spike

The strategy for computing the $lBFDR$ and $BF$ under the DS prior is to define indicator variables $\Psi^{\gamma}_{lj}$, $j=1,\ldots,p_l^{\gamma}$, $l=1,\ldots,L$, and $\Psi^{\alpha}_{lk}$, $k=1,\ldots,K$, such that

$$\Psi^{\gamma}_{lj}=\left\{\begin{array}{ll}1&\text{if }\gamma_{lj}\neq 0\\ 0&\text{if }\gamma_{lj}=0\end{array}\right.,\qquad\Psi^{\alpha}_{lk}=\left\{\begin{array}{ll}1&\text{if }\alpha_{lk}\neq 0\\ 0&\text{if }\alpha_{lk}=0\end{array}\right..$$

By considering these indicator variables, all of those described for the CS can also be used for the DS.

3 Dynamic predictions

3.1 Dynamic predictions from VSJM

The joint modeling of longitudinal measurements and time-to-event outcomes is a popular methodology for calculating dynamic predictions (DP). The DP represents the risk prediction for subject $i$ who has survived up to time point $s$, taking into account the longitudinal measurements and other information available until time $s$. Consider the notation described in Section 2 and let the proposed joint models be fitted to a random sample, referred to as the training data. For validation purposes, we are interested in predicting the risk of subjects in a new data set called the validation set. The longitudinal measurements until time $s$ are denoted by $\mathcal{Y}_i(s)=\{Y_{ik}(s_{ijk});\ 0\leq s_{ijk}\leq s,\ j=1,\ldots,n_i,\ k=1,\ldots,K\}$. Additionally, the vector of explanatory variables available until time $s$ is $\mathcal{X}_i(s)=\{w_i,X_{ik}(s_{ijk});\ 0\leq s_{ijk}\leq s,\ j=1,\ldots,n_i,\ k=1,\ldots,K\}$. Let $\delta_i=1$ denote the event of interest; then, for the landmark time $s$ and the prediction window $t$, the risk probability is as follows:

$$\pi_i(s+t\mid s)=\mathrm{P}\left(s\leq T_i^{*}<s+t,\ \delta_i=1\mid T_i^{*}>s,\ \mathcal{Y}_i(s),\ \mathcal{X}_i(s);\theta\right), \qquad (38)$$

where $\theta$ is the vector of all unknown parameters of the model, including the $\theta_k$ estimated at stage 1 and $\Upsilon$ estimated at stage 2, and $T_i^{*}$ represents the true event time for the $i$th subject. The predicted probability (38) can be computed as:

$$\mathrm{P}\left(s\leq T_i^{*}<s+t,\ \delta_i=1\mid T_i^{*}>s,\ \mathcal{Y}_i(s),\ \mathcal{X}_i(s);\theta\right)=\int\frac{\mathrm{P}\left(s\leq T_i^{*}<s+t,\ \delta_i=1\mid b_i,\mathcal{X}_i(s);\theta\right)}{S\left(s\mid b_i,\mathcal{X}_i(s);\theta\right)}\,f\left(b_i\mid T_i^{*}>s,\mathcal{Y}_i(s),\mathcal{X}_i(s);\theta\right)db_i, \qquad (39)$$

where $S(\cdot)$ denotes the probability of remaining free of both events given the random effects. The first-order approximation for estimating this quantity [39, 5] is given as follows:

$$\hat{\pi}_i(s+t\mid s)=\frac{\int_s^{s+t}\exp\Big\{-\sum_{l=1}^{L}\Lambda_l(u\mid\hat b_i,\mathcal{X}_i(s);\hat\theta)\Big\}\,\lambda_1(u\mid\hat b_i,\mathcal{X}_i(s);\hat\theta)\,du}{\exp\Big\{-\sum_{l=1}^{L}\Lambda_l(s\mid\hat b_i,\mathcal{X}_i(s);\hat\theta)\Big\}},$$

where $\hat\theta=\{\hat\theta_1,\ldots,\hat\theta_K,\hat\Upsilon\}$ is estimated from the training data. To predict $\hat b_i$, we consider the one-marker joint models for each marker and predict the $q_k$-dimensional random effect $b_{ik}$ given $(T^{*}_i>s,\mathcal{Y}_i(s),\mathcal{X}_i(s),\hat\theta_k)$. This prediction is denoted by $\hat b_{ik}$, where $\hat\theta_k$ represents the estimated parameters of the one-marker joint model in the first stage of VSJM. The predicted random effects are then merged to obtain the $q=\sum_{k=1}^{K}q_k$-dimensional predicted random effect $\hat b_i=(\hat b_{i1},\ldots,\hat b_{iK})$.
For the Monte Carlo approximation of (39), we follow the same strategy as that discussed in [62]. For this aim, consider a realization of the MCMC sample generated when estimating the unknown parameters of the one-marker JM for marker $k$ on the training data, denoted $\theta_k^{(\ell)}$, $\ell=1,\ldots,\mathcal{L}$. Draw $M$ values $b_{ik}^{(\ell,m)}$ from the posterior distribution $(b_{ik}\mid T^{*}_i>s,\mathcal{Y}_i(s),\mathcal{X}_i(s),\theta_k^{(\ell)})$. For $\ell=1,\ldots,\mathcal{L}$, $m=1,\ldots,M$, compute

$$\hat{\pi}_i(s+t\mid s)^{(\ell,m)}=\frac{\int_s^{s+t}\exp\Big\{-\sum_{l=1}^{L}\Lambda_l(u\mid b_i^{(\ell,m)},\mathcal{X}_i(s);\theta_k^{(\ell)})\Big\}\,\lambda_1(u\mid b_i^{(\ell,m)},\mathcal{X}_i(s);\theta_k^{(\ell)})\,du}{\exp\Big\{-\sum_{l=1}^{L}\Lambda_l(s\mid b_i^{(\ell,m)},\mathcal{X}_i(s);\theta_k^{(\ell)})\Big\}}.$$

Calculate any desired quantity from the generated chains, such as percentiles, to determine the credible interval.
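The following R sketch illustrates the first-order approximation for cause 1 by numerical integration on a grid, given plug-in cause-specific hazard functions; the hazards below are illustrative stand-ins for $\lambda_l(u\mid\hat b_i,\mathcal{X}_i(s);\hat\theta)$, not output of the VSJM package.

  # Sketch: first-order approximation of pi_i(s + t | s) for cause 1.
  # 'hazards' is a list of cause-specific hazard functions of time.
  dyn_pred <- function(hazards, s, t, n_grid = 500) {
    u  <- seq(s, s + t, length.out = n_grid)
    du <- diff(u)[1]
    # all-cause cumulative hazard Lambda(v) = sum_l int_0^v lambda_l(w) dw
    Lambda <- function(v) sum(sapply(hazards, function(h) integrate(h, 0, v)$value))
    surv_u <- exp(-sapply(u, Lambda))          # overall survival at the grid points
    num <- sum(surv_u * hazards[[1]](u) * du)  # int_s^{s+t} S(u) lambda_1(u) du
    den <- exp(-Lambda(s))                     # S(s)
    num / den
  }

  # Two illustrative cause-specific hazards
  haz1 <- function(u) 0.10 * exp(0.5 * (-0.5 + 0.5 * u))
  haz2 <- function(u) 0.05 * rep(1, length(u))
  dyn_pred(list(haz1, haz2), s = 0.5, t = 0.25)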

3.2 Reducing bias in dynamic predictions from VSJM

To reduce estimation biases resulting from variable selection, we propose incorporating an additional stage to calculate dynamic predictions. After variable selection using CS or DS prior, we recommend re-estimating the proportional hazard model by substituting CS or DS with non-informative normal priors for the association parameters of the selected markers and the regression coefficients of the selected covariates. Then calculate the DP as described in Section 3.1.

3.3 Measures of prediction accuracy

We use the area under the receiver operating characteristic curve (AUC) and the Brier score (BS) to assess the predictive performance of the prediction models [50]. The AUC represents the probability that the model correctly ranks a randomly chosen individual who develops the event higher than a randomly chosen individual who does not, based on their predicted risks. The BS measures prediction accuracy by quantifying the mean squared error of prediction. For censored data with competing risks and time-dependent markers, a non-parametric inverse probability of censoring weighted (IPCW) estimator of the AUC and BS was proposed by [8] and is used in this paper.
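As an illustration, the IPCW time-dependent AUC at the horizon $s+t$ can be computed with the timeROC package (assumed here to implement the estimator of [8]); the Brier score can be obtained analogously, for instance with the riskRegression package. The data and predicted risks below are simulated stand-ins, not results from the paper.

  # Sketch: IPCW dynamic AUC on a validation set with competing risks.
  library(timeROC)
  set.seed(1)
  n <- 300
  valid <- data.frame(time   = rexp(n, 0.5),
                      status = sample(0:2, n, replace = TRUE, prob = c(.3, .5, .2)))
  pred_risk <- runif(n)     # stand-in for pi_i(s + t | s) from the fitted model

  s <- 0.5; t_win <- 0.25
  at_risk <- valid$time > s # subjects still event-free at the landmark s
  roc <- timeROC(T = valid$time[at_risk], delta = valid$status[at_risk],
                 marker = pred_risk[at_risk], cause = 1,
                 weighting = "marginal", times = s + t_win)
  roc   # printed object reports the IPCW AUC estimate(s) at s + t_win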

4 Simulation study

4.1 Design of simulation study

A simulation study was conducted to assess the performance of the proposed VSJM approach for variable selection and risk prediction. To evaluate variable selection, we examine three criteria: the true positive rate (TPR), which measures the proportion of correctly selected variables among all truly associated variables; the false positive rate (FPR), which measures the proportion of incorrectly selected variables among all irrelevant variables; and the Matthews correlation coefficient (MCC) [33]. The latter is defined as follows:

$$\mathrm{MCC}=\frac{\mathrm{TP}\times\mathrm{TN}-\mathrm{FP}\times\mathrm{FN}}{\sqrt{(\mathrm{TP}+\mathrm{FP})(\mathrm{TP}+\mathrm{FN})(\mathrm{TN}+\mathrm{FP})(\mathrm{TN}+\mathrm{FN})}}, \qquad (40)$$

where TP, TN, FP, and FN represent the number of true positives, true negatives, false positives, and false negatives, respectively. The MCC and TPR are expected to reach 1, while the FPR is expected to be close to zero for optimal performance. BF and lBFDR are used for variable selection, with the results based on a threshold of 1 for BF and a threshold of 0.05 for lBFDR.
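A minimal R helper for computing these three criteria for one replication, given 0/1 vectors of truly associated and selected variables (the example values are illustrative):

  # Sketch: TPR, FPR and MCC from the true and selected indicator vectors.
  selection_metrics <- function(truth, selected) {
    tp <- sum(selected == 1 & truth == 1); fp <- sum(selected == 1 & truth == 0)
    tn <- sum(selected == 0 & truth == 0); fn <- sum(selected == 0 & truth == 1)
    mcc <- (tp * tn - fp * fn) /
      sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    c(TPR = tp / (tp + fn), FPR = fp / (fp + tn), MCC = mcc)
  }

  # Example: 4 truly associated variables out of 10, 4 selected (1 false positive)
  selection_metrics(truth    = c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0),
                    selected = c(1, 1, 1, 0, 1, 0, 0, 0, 0, 0))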
The simulation scenarios varied according to the strength of the association between the markers and the events, the residual variances of the markers (which quantify measurement error), and the correlation between the markers, which is neglected in the first stage of the approach. The performance of the approaches was also evaluated in terms of predictive ability using the AUC and BS. The predictive performance of the approach is compared with that of the true model using the true values of the parameters and random effects (referred to as "Real"), as well as with the oracle model (the true model with parameters estimated from the full multi-marker joint model).
Data were generated using multi-marker joint models with $K=10$ longitudinal markers and a competing risks model with two causes. The longitudinal data were generated from the following model:

$$Y_{ik}(t)=\eta_{ik}(t\mid\beta_k,b_{ik})+\varepsilon_{ikt}=\beta_{0k}+\beta_{1k}t+b_{0ki}+b_{1ki}t+\varepsilon_{ikt},$$

where $\varepsilon_{ikt}\sim N(0,\sigma_k^2)$ and $b_i=(b_{01i},b_{11i},b_{02i},b_{12i},\ldots,b_{0,10,i},b_{1,10,i})^{\top}\sim N(0,\Sigma)$, $i=1,\ldots,n$, $k=1,2,\ldots,K=10$. The measurement times were $t=0,0.1,0.2,\ldots,2$ and $\beta_k=(-0.5,0.5)^{\top}$, $k=1,2,\ldots,10$.
The time-to-event outcome was simulated using a cause-specific hazard model depending on the current value of the $K$ markers and on 24 time-invariant covariates as follows:

$$\lambda_l(t)=\lambda_{0l}\exp\Big(\gamma_l^{\top}\omega_{il}+\sum_{k=1}^{K}\alpha_{lk}\eta_{ik}(t\mid\beta_k,b_{ik})\Big), \qquad (42)$$

where $\omega_{il}=(x_{1i},w_{1i},\ldots,w_{24,i})^{T}$. Half of the covariates were independently generated from a standard normal distribution, while the others were independently generated from a binary distribution with a success probability of 0.5. To simulate the time-to-event data, we apply the inverse of the cumulative hazard function [3] using the inverse probability integral transform method [43], as sketched below.
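With a constant baseline hazard and linear-in-time current values, the cause-specific hazard in (42) has the form $c\,e^{dt}$, so the cumulative hazard can be inverted in closed form. The R sketch below illustrates one common latent-time construction under this form, with illustrative parameter values; the exact scheme of [3] and [43] used in the paper may differ in details.

  # Sketch: one subject's event time by inverting Lambda_l(t) for a hazard of
  # the form lambda_0l * exp(lin_pred0 + slope * t), where
  # lin_pred0 = gamma'w + sum_k alpha_lk*(beta_0k + b_0ik) and
  # slope     = sum_k alpha_lk*(beta_1k + b_1ik). Values are illustrative.
  sim_latent_time <- function(lambda0, lin_pred0, slope) {
    c0 <- lambda0 * exp(lin_pred0)
    e  <- rexp(1)                           # -log(U), U ~ Uniform(0, 1)
    if (abs(slope) < 1e-12) return(e / c0)  # constant hazard case
    arg <- 1 + slope * e / c0
    if (arg <= 0) return(Inf)               # decreasing hazard: event may never occur
    log(arg) / slope
  }

  set.seed(2)
  t1 <- sim_latent_time(lambda0 = 0.2, lin_pred0 =  0.3, slope = 0.50)  # cause 1
  t2 <- sim_latent_time(lambda0 = 0.1, lin_pred0 = -0.2, slope = 0.25)  # cause 2
  cens  <- runif(1, 0, 4)
  time  <- min(t1, t2, cens)
  delta <- ifelse(time == cens, 0, which.min(c(t1, t2)))
  c(time = time, delta = delta)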
The covariance matrix of the random effects is structured as follows:

$$\Sigma=\begin{pmatrix}\Sigma^{*}&\Sigma^{\dagger}&\ldots&\Sigma^{\dagger}\\ &\Sigma^{*}&\ldots&\Sigma^{\dagger}\\ &&\ddots&\vdots\\ &&&\Sigma^{*}\end{pmatrix},\qquad\text{where}\quad\Sigma^{*}=\begin{pmatrix}1&0.5\\ 0.5&1\end{pmatrix},\quad\Sigma^{\dagger}=\rho\begin{pmatrix}1&1\\ 1&1\end{pmatrix}. \qquad (43)$$
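A short R sketch for building this block covariance matrix (here for $\rho=0.1$):

  # Sketch: block covariance matrix of Equation (43) for K = 10 markers with a
  # random intercept and slope each (q = 20).
  K   <- 10
  rho <- 0.1
  Sig_star <- matrix(c(1, 0.5, 0.5, 1), 2, 2)   # within-marker block
  Sig_dag  <- rho * matrix(1, 2, 2)             # between-marker block
  # diagonal blocks = Sig_star, off-diagonal blocks = Sig_dag
  Sigma <- kronecker(diag(K), Sig_star - Sig_dag) + kronecker(matrix(1, K, K), Sig_dag)
  dim(Sigma)   # 20 x 20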

For each scenario, 100 data sets of $n=1000$ subjects were generated and split into a training sample of 500 subjects and a validation sample of 500 subjects. Parameters were estimated on the training sample using the various approaches, and the performance of these approaches for risk prediction was then evaluated on the validation samples. To this end, we consider four landmark times, $s=0,0.25,0.5,0.75$, with $t=0.25$ as the prediction window.

4.2 Simulation with common effect for the two events

Among the 10 generated longitudinal markers and the 24 generated time-invariant covariates, only 4 markers and 4 covariates (2 Gaussian and 2 binary) were truly associated with the two event risks, sharing identical regression parameters for both events. To evaluate the impact of increasing measurement errors, association parameters, and marker correlation on the performance of the three methods, we considered different values for these parameters, as described below:

  1. $\sigma_k^2=0.5,\ 1$ for $k=1,2,\ldots,10$;

  2. $\gamma_l=(\gamma^{*},-\gamma^{*},\underbrace{0,\ldots,0}_{10},\gamma^{*},-\gamma^{*},\underbrace{0,\ldots,0}_{10})^{T}$, $l=1,2$, and $\alpha_l=(-\alpha^{*},-\alpha^{*},\alpha^{*},\alpha^{*},0,0,0,0,0,0)^{\top}$, where $\gamma^{*}=\alpha^{*}=0.5,\ 1$;

  3. $\rho=0.1,\ 0.7$.

Based on Table 1, which pertains to variable selection, and Table 2, which focuses on risk prediction, we can draw the following conclusions regarding the performance of the VSJM:

  • For all scenarios, AUC and BS computed on the validation set after Bayesian variable selection on the training set were close to those of the oracle model.

  • Increasing the between-marker correlation tends to decrease the performance of the approach in terms of TPR, FPR, and MCC, albeit to an acceptable extent.

  • Increasing the residual variance of the longitudinal markers (σ2\sigma^{2}) leads to a decrease in the performance of the approach.

  • Larger values of the association parameters and regression coefficients improve the approach's performance.

  • Although the predictive performance of BF and lBFDR is similar, BF generally outperforms lBFDR in terms of total variable selection, except in the easiest scenario characterized by high association and low measurement error.

Table 1: Simulation results with identical effects for both events: Mean and standard deviation of the true positive rate (TPR), false positive rate (FPR), and Matthews Correlation Coefficient (MCC) over 100 replications. The results of DS and CS are based on a threshold of 1 for BF and a threshold of 0.05 for lBFDR.
CS DS
Total L-Markers Fixed covariates Total L-Markers Fixed covariates
Est. SD. Est. SD. Est. SD. Est. SD. Est. SD. Est. SD.
$\rho=0.7,\ \sigma^2=0.5,\ \alpha^{*}=0.5,\ \gamma^{*}=0.5$
BF TPR 0.787 0.115 0.704 0.174 0.871 0.117 0.820 0.099 0.760 0.158 0.879 0.100
FPR 0.028 0.023 0.049 0.060 0.021 0.025 0.053 0.029 0.097 0.088 0.040 0.032
MCC 0.797 0.104 0.697 0.179 0.862 0.093 0.771 0.095 0.683 0.188 0.822 0.091
lBFDR TPR 0.516 0.103 0.371 0.159 0.660 0.130 0.584 0.117 0.450 0.189 0.719 0.129
FPR 0.001 0.005 0.001 0.011 0.001 0.005 0.005 0.010 0.012 0.030 0.003 0.008
MCC 0.664 0.076 0.494 0.157 0.780 0.088 0.704 0.087 0.540 0.179 0.813 0.089
$\rho=0.1,\ \sigma^2=0.5,\ \alpha^{*}=0.5,\ \gamma^{*}=0.5$
BF TPR 0.941 0.054 1.000 0.000 0.881 0.109 0.947 0.054 1.000 0.000 0.894 0.108
FPR 0.024 0.021 0.024 0.044 0.024 0.024 0.037 0.030 0.040 0.056 0.036 0.034
MCC 0.913 0.051 0.973 0.050 0.860 0.090 0.892 0.069 0.954 0.063 0.840 0.109
lBFDR TPR 0.844 0.064 1.000 0.000 0.688 0.127 0.858 0.056 1.000 0.000 0.717 0.112
FPR 0.001 0.004 0.001 0.011 0.001 0.005 0.003 0.007 0.006 0.021 0.002 0.008
MCC 0.895 0.042 0.998 0.013 0.800 0.083 0.900 0.042 0.994 0.024 0.814 0.078
$\rho=0.1,\ \sigma^2=1,\ \alpha^{*}=0.5,\ \gamma^{*}=0.5$
BF TPR 0.929 0.058 1.000 0.000 0.858 0.116 0.939 0.057 1.000 0.000 0.877 0.114
FPR 0.025 0.022 0.028 0.043 0.024 0.027 0.041 0.029 0.062 0.067 0.034 0.035
MCC 0.903 0.054 0.967 0.050 0.848 0.095 0.880 0.066 0.930 0.073 0.834 0.113
lBFDR TPR 0.828 0.051 0.995 0.025 0.660 0.104 0.853 0.053 0.998 0.018 0.708 0.112
FPR 0.003 0.009 0.005 0.020 0.003 0.011 0.002 0.006 0.005 0.020 0.001 0.005
MCC 0.879 0.038 0.990 0.030 0.775 0.079 0.899 0.037 0.992 0.027 0.813 0.074
$\rho=0.1,\ \sigma^2=0.5,\ \alpha^{*}=1,\ \gamma^{*}=1$
BF TPR 0.999 0.009 1.000 0.000 0.998 0.018 0.999 0.009 1.000 0.000 0.998 0.018
FPR 0.035 0.026 0.037 0.051 0.034 0.031 0.060 0.039 0.057 0.070 0.061 0.043
MCC 0.933 0.048 0.958 0.058 0.914 0.075 0.890 0.064 0.936 0.077 0.856 0.088
lBFDR TPR 0.989 0.030 1.000 0.000 0.978 0.060 0.988 0.025 1.000 0.000 0.975 0.051
FPR 0.003 0.008 0.002 0.012 0.003 0.008 0.007 0.012 0.008 0.025 0.007 0.013
MCC 0.987 0.026 0.998 0.014 0.978 0.047 0.977 0.032 0.990 0.029 0.965 0.053
$\rho=0.1,\ \sigma^2=1,\ \alpha^{*}=1,\ \gamma^{*}=1$
BF TPR 0.996 0.016 1.000 0.000 0.992 0.032 1.000 0.000 1.000 0.000 1.000 0.000
FPR 0.038 0.026 0.053 0.056 0.034 0.027 0.062 0.025 0.081 0.056 0.057 0.029
MCC 0.924 0.045 0.939 0.063 0.908 0.064 0.885 0.041 0.908 0.060 0.862 0.062
lBFDR TPR 0.979 0.038 1.000 0.000 0.958 0.076 0.981 0.029 1.000 0.000 0.963 0.058
FPR 0.003 0.007 0.006 0.021 0.002 0.006 0.010 0.011 0.019 0.036 0.007 0.011
MCC 0.981 0.028 0.994 0.025 0.970 0.051 0.968 0.027 0.977 0.042 0.958 0.046
Table 2: Simulation results with identical effects for both events: Mean and standard deviation of AUC and Brier score computed for landmark times $s=0,0.25,0.5,0.75$ and a prediction window of 0.25.
Real Oracle CS DS
BF lBFDR BF lBFDR
s Est. SD. Est. SD. Est. SD. Est. SD. Est. SD. Est. SD.
$\rho=0.7,\ \sigma^2=0.5,\ \alpha^{*}=0.5,\ \gamma^{*}=0.5$
AUC 0 0.789 0.026 0.770 0.028 0.724 0.033 0.711 0.036 0.723 0.033 0.712 0.038
0.25 0.737 0.051 0.719 0.048 0.685 0.052 0.665 0.051 0.685 0.051 0.674 0.051
0.5 0.713 0.071 0.702 0.069 0.672 0.068 0.651 0.087 0.668 0.072 0.651 0.081
0.75 0.702 0.126 0.681 0.123 0.653 0.123 0.635 0.137 0.645 0.122 0.632 0.138
BS 0 0.160 0.011 0.161 0.010 0.170 0.012 0.171 0.013 0.170 0.013 0.171 0.014
0.25 0.132 0.015 0.134 0.015 0.138 0.015 0.140 0.015 0.138 0.015 0.139 0.016
0.5 0.114 0.027 0.114 0.027 0.117 0.027 0.119 0.027 0.117 0.027 0.119 0.027
0.75 0.096 0.027 0.097 0.027 0.099 0.027 0.100 0.027 0.100 0.028 0.100 0.027
$\rho=0.1,\ \sigma^2=0.5,\ \alpha^{*}=0.5,\ \gamma^{*}=0.5$
AUC 0 0.850 0.021 0.816 0.025 0.803 0.024 0.796 0.027 0.801 0.026 0.795 0.028
0.25 0.806 0.046 0.786 0.052 0.781 0.050 0.776 0.050 0.779 0.052 0.776 0.051
0.5 0.796 0.064 0.778 0.067 0.770 0.063 0.768 0.066 0.773 0.065 0.771 0.065
0.75 0.811 0.099 0.802 0.099 0.794 0.101 0.798 0.100 0.793 0.104 0.798 0.102
BS 0 0.154 0.008 0.159 0.009 0.161 0.009 0.162 0.010 0.161 0.009 0.162 0.010
0.25 0.102 0.016 0.105 0.017 0.105 0.017 0.106 0.017 0.106 0.017 0.106 0.017
0.5 0.077 0.018 0.078 0.018 0.079 0.018 0.078 0.018 0.079 0.018 0.078 0.018
0.75 0.052 0.015 0.053 0.016 0.054 0.016 0.054 0.016 0.054 0.016 0.054 0.016
$\rho=0.1,\ \sigma^2=1,\ \alpha^{*}=0.5,\ \gamma^{*}=0.5$
AUC 0 0.854 0.025 0.806 0.030 0.791 0.032 0.784 0.031 0.789 0.033 0.786 0.031
0.25 0.812 0.047 0.790 0.054 0.781 0.054 0.776 0.053 0.783 0.057 0.780 0.052
0.5 0.785 0.078 0.769 0.078 0.761 0.076 0.758 0.081 0.764 0.079 0.765 0.078
0.75 0.811 0.108 0.791 0.104 0.779 0.109 0.771 0.110 0.777 0.116 0.779 0.109
BS 0 0.151 0.009 0.160 0.008 0.161 0.009 0.163 0.009 0.162 0.009 0.162 0.009
0.25 0.107 0.014 0.110 0.015 0.110 0.015 0.110 0.015 0.110 0.015 0.110 0.015
0.5 0.071 0.016 0.073 0.017 0.074 0.017 0.074 0.017 0.074 0.017 0.074 0.017
0.75 0.054 0.015 0.055 0.015 0.056 0.015 0.056 0.015 0.055 0.015 0.056 0.015
$\rho=0.1,\ \sigma^2=0.5,\ \alpha^{*}=1,\ \gamma^{*}=1$
AUC 0 0.942 0.014 0.891 0.022 0.883 0.018 0.885 0.019 0.881 0.019 0.885 0.020
0.25 0.880 0.045 0.846 0.049 0.844 0.050 0.847 0.048 0.843 0.045 0.847 0.046
0.5 0.890 0.050 0.860 0.066 0.861 0.077 0.863 0.073 0.863 0.068 0.866 0.063
0.75 0.875 0.074 0.869 0.073 0.867 0.067 0.868 0.076 0.866 0.076 0.869 0.076
BS 0 0.144 0.009 0.157 0.009 0.157 0.008 0.156 0.009 0.158 0.009 0.157 0.009
0.25 0.064 0.011 0.067 0.013 0.068 0.012 0.068 0.012 0.068 0.011 0.068 0.012
0.5 0.040 0.013 0.042 0.013 0.042 0.013 0.041 0.013 0.041 0.013 0.041 0.013
0.75 0.028 0.012 0.028 0.012 0.028 0.011 0.028 0.011 0.028 0.011 0.028 0.011
$\rho=0.1,\ \sigma^2=1,\ \alpha^{*}=1,\ \gamma^{*}=1$
AUC 0 0.938 0.012 0.864 0.032 0.860 0.021 0.863 0.021 0.858 0.021 0.862 0.019
0.25 0.889 0.030 0.843 0.041 0.847 0.033 0.849 0.032 0.842 0.031 0.849 0.030
0.5 0.858 0.057 0.826 0.069 0.828 0.054 0.832 0.054 0.829 0.048 0.835 0.053
0.75 0.900 0.073 0.856 0.077 0.861 0.094 0.866 0.092 0.863 0.090 0.866 0.090
BS 0 0.144 0.007 0.163 0.010 0.160 0.008 0.160 0.009 0.160 0.007 0.160 0.008
0.25 0.063 0.012 0.069 0.013 0.068 0.012 0.067 0.012 0.068 0.012 0.067 0.012
0.5 0.040 0.011 0.042 0.012 0.043 0.012 0.042 0.012 0.043 0.012 0.042 0.012
0.75 0.028 0.010 0.030 0.011 0.030 0.010 0.030 0.010 0.030 0.010 0.030 0.010

4.3 Scenarios with different effects for two events

In this section, we explore scenarios where the markers and covariates have differential effects on the two causes of the event. The simulation design was identical to that of Section 4.1, except for the values of the regression parameters $\alpha_l$ and $\gamma_l$. Throughout all simulations in this context, we set $\sigma_k^2$ to 0.5 for $k=1,\ldots,10$, while $\gamma^{*}$ and $\alpha^{*}$ are fixed at 0.5 and $\rho$ is set to 0.1. We consider the three following scenarios:

Single effects on cause 1


  • $\alpha_1=(-\alpha^{*},-\alpha^{*},\alpha^{*},\alpha^{*},0,0,0,0,0,0)^{\top}$

  • $\gamma_1=(\gamma^{*},-\gamma^{*},\underbrace{0,\ldots,0}_{10},\gamma^{*},-\gamma^{*},\underbrace{0,\ldots,0}_{10})^{T}$

  • $\alpha_2=(0,\ldots,0)^{\top}$

  • $\gamma_2=(\gamma^{*},-\gamma^{*},\underbrace{0,\ldots,0}_{10},\gamma^{*},-\gamma^{*},\underbrace{0,\ldots,0}_{10})^{T}$

Single effects on both causes


  • $\alpha_1=(\alpha^{*},\alpha^{*},0,0,0,0,0,0,0,0)^{\top}$

  • $\gamma_1=(\gamma^{*},\gamma^{*},\underbrace{0,\ldots,0}_{22})^{T}$

  • $\alpha_2=(0,0,\alpha^{*},\alpha^{*},0,0,0,0,0,0)^{\top}$

  • $\gamma_2=(\underbrace{0,\ldots,0}_{12},\gamma^{*},\gamma^{*},\underbrace{0,\ldots,0}_{10})^{T}$

Opposite effects on the two causes


  • $\alpha_1=(-\alpha^{*},-\alpha^{*},\alpha^{*},\alpha^{*},0,0,0,0,0,0)^{\top}$

  • $\gamma_1=(\gamma^{*},-\gamma^{*},\underbrace{0,\ldots,0}_{10},\gamma^{*},-\gamma^{*},\underbrace{0,\ldots,0}_{10})^{T}$

  • $\alpha_2=(\alpha^{*},\alpha^{*},-\alpha^{*},-\alpha^{*},0,0,0,0,0,0)^{\top}$

  • $\gamma_2=(-\gamma^{*},\gamma^{*},\underbrace{0,\ldots,0}_{10},-\gamma^{*},\gamma^{*},\underbrace{0,\ldots,0}_{10})^{T}$

By analyzing the findings from Table 3, which focuses on variable selection, and Table 4, which pertains to risk prediction, we observe that the differing association structures between the two events have minimal impact on the performance of the VSJM method.

Table 3: Simulation results with differential effects between events: Mean and standard deviation of the true positive rate (TPR), false positive rate (FPR), and Matthews correlation coefficient (MCC) over 100 replications for $\rho=0.1,\ \sigma^2=0.5,\ \alpha^{*}=0.5,\ \gamma^{*}=0.5$. The results of DS and CS are based on a threshold of 1 for BF and a threshold of 0.05 for lBFDR.
CS DS
Total L-Markers Fixed covariates Total L-Markers Fixed covariates
Est. SD. Est. SD. Est. SD. Est. SD. Est. SD. Est. SD.
Single effects on cause 1
BF TPR 1.000 0.000 1.000 0.000 1.000 0.000 1.000 0.000 1.000 0.000 1.000 0.000
FPR 0.018 0.018 0.022 0.049 0.017 0.020 0.039 0.024 0.056 0.060 0.035 0.023
MCC 0.945 0.051 0.975 0.055 0.869 0.138 0.890 0.061 0.937 0.068 0.754 0.112
lBFDR TPR 1.000 0.000 1.000 0.000 1.000 0.000 1.000 0.000 1.000 0.000 1.000 0.000
FPR 0.000 0.000 0.000 0.000 0.000 0.000 0.002 0.006 0.000 0.000 0.003 0.008
MCC 1.000 0.000 1.000 0.000 1.000 0.000 0.993 0.019 1.000 0.000 0.974 0.068
Single effects on both causes
BF TPR 0.963 0.060 1.000 0.000 0.925 0.121 0.975 0.053 1.000 0.000 0.950 0.105
FPR 0.020 0.027 0.050 0.087 0.009 0.016 0.035 0.032 0.069 0.080 0.023 0.024
MCC 0.908 0.123 0.914 0.144 0.912 0.116 0.868 0.109 0.874 0.132 0.867 0.130
lBFDR TPR 0.863 0.092 1.000 0.000 0.725 0.184 0.887 0.071 1.000 0.000 0.775 0.142
FPR 0.000 0.000 0.000 0.000 0.000 0.000 0.003 0.011 0.013 0.040 0.000 0.000
MCC 0.919 0.055 1.000 0.000 0.836 0.115 0.922 0.060 0.976 0.075 0.869 0.086
Opposite effects on the two causes
BF TPR 0.981 0.030 1.000 0.000 0.963 0.060 0.988 0.026 1.000 0.000 0.975 0.053
FPR 0.015 0.015 0.025 0.040 0.013 0.018 0.044 0.027 0.058 0.056 0.040 0.038
MCC 0.956 0.044 0.971 0.047 0.943 0.066 0.908 0.049 0.933 0.063 0.887 0.093
lBFDR TPR 0.850 0.053 1.000 0.000 0.700 0.105 0.881 0.069 1.000 0.000 0.762 0.138
FPR 0.000 0.000 0.000 0.000 0.000 0.000 0.006 0.013 0.000 0.000 0.008 0.017
MCC 0.901 0.035 1.000 0.000 0.811 0.070 0.910 0.055 1.000 0.000 0.829 0.105
Table 4: Simulation results with differential effects between events: Mean and standard deviation of the AUC and Brier score computed for landmark times $s=0, 0.25, 0.5, 0.75$ and a prediction window of $0.25$ for $\rho=0.1$, $\sigma^{2}=0.5$, $\alpha^{*}=0.5$, $\gamma^{*}=0.5$.
Real Oracle CS DS
BF lBFDR BF lBFDR
s Est. SD. Est. SD. Est. SD. Est. SD. Est. SD. Est. SD.
Single effects on cause 1
AUC 0 0.781 0.029 0.747 0.026 0.742 0.019 0.740 0.020 0.742 0.017 0.737 0.020
0.25 0.790 0.047 0.759 0.060 0.761 0.050 0.762 0.053 0.757 0.052 0.757 0.053
0.5 0.786 0.063 0.766 0.057 0.769 0.054 0.772 0.058 0.769 0.059 0.771 0.066
0.75 0.778 0.090 0.794 0.087 0.786 0.095 0.796 0.090 0.787 0.089 0.790 0.092
BS 0 0.147 0.011 0.153 0.010 0.157 0.008 0.153 0.010 0.157 0.008 0.154 0.009
0.25 0.117 0.014 0.121 0.017 0.121 0.016 0.120 0.017 0.121 0.016 0.121 0.017
0.5 0.075 0.019 0.077 0.019 0.077 0.017 0.077 0.019 0.077 0.018 0.076 0.019
0.75 0.063 0.017 0.063 0.018 0.063 0.017 0.063 0.018 0.063 0.018 0.064 0.018
Single effects on both causes
AUC 0 0.733 0.027 0.729 0.026 0.723 0.021 0.708 0.026 0.723 0.020 0.709 0.025
0.25 0.762 0.035 0.752 0.036 0.743 0.035 0.734 0.038 0.746 0.034 0.736 0.037
0.5 0.756 0.073 0.743 0.082 0.736 0.079 0.725 0.071 0.733 0.079 0.736 0.077
0.75 0.756 0.073 0.730 0.071 0.741 0.064 0.736 0.056 0.730 0.066 0.743 0.060
BS 0 0.149 0.011 0.151 0.011 0.156 0.011 0.156 0.010 0.156 0.011 0.156 0.010
0.25 0.134 0.017 0.136 0.017 0.138 0.016 0.139 0.017 0.137 0.017 0.139 0.017
0.5 0.124 0.017 0.124 0.018 0.127 0.018 0.128 0.017 0.128 0.018 0.126 0.017
0.75 0.100 0.020 0.106 0.020 0.104 0.018 0.106 0.019 0.106 0.019 0.105 0.019
Opposite effects on the two causes
AUC 0 0.784 0.021 0.791 0.027 0.765 0.027 0.757 0.026 0.761 0.029 0.758 0.030
0.25 0.762 0.042 0.762 0.039 0.753 0.048 0.745 0.048 0.745 0.039 0.745 0.040
0.5 0.758 0.078 0.735 0.073 0.716 0.070 0.710 0.065 0.720 0.056 0.697 0.060
0.75 0.732 0.112 0.691 0.105 0.710 0.115 0.693 0.137 0.706 0.114 0.706 0.132
BS 0 0.138 0.011 0.137 0.011 0.146 0.012 0.149 0.011 0.147 0.012 0.149 0.012
0.25 0.136 0.017 0.137 0.019 0.140 0.021 0.143 0.022 0.143 0.020 0.143 0.022
0.5 0.145 0.029 0.153 0.028 0.156 0.026 0.160 0.025 0.159 0.023 0.164 0.025
0.75 0.145 0.034 0.150 0.038 0.148 0.034 0.154 0.032 0.158 0.039 0.157 0.033

5 Application: 3C study

5.1 Data description

In this section, we analyzed a subset of the Three-City (3C) Study [2], a French cohort study. Participants aged 65 or older were recruited in three cities (Bordeaux, Dijon, and Montpellier) and followed for over 10 years to investigate the association between vascular factors and dementia. The data include socio-demographic information, general health information, and cognitive test scores. The present application only includes the Dijon and Bordeaux centers, where MRI exams were conducted at years 0 and 4 in both centers, and additionally at year 10 in Bordeaux. Except for the MRI markers, the longitudinal markers were measured at baseline and at follow-up visits at years 2, 4, 7, 10, and 12, and also at years 14 and 17 in Bordeaux only. The diagnosis of dementia followed a three-step process: cognitive assessments were first conducted by neuropsychologists or investigators; suspected cases of dementia were then examined by clinicians; and the diagnoses were finally validated by an independent expert committee of neurologists and geriatricians. In addition to dementia, the exact time of death was collected and is considered a competing risk.
We considered $N=2133$ subjects who were dementia-free at the beginning of the study and had at least one measurement for each of the longitudinal markers. Out of these subjects, 227 were diagnosed with incident dementia, and 311 died before developing dementia. Figure A.1 shows the cumulative incidence functions for dementia and death in this sample.
We considered seventeen longitudinal markers: three cardio-metabolic markers (body mass index (BMI), diastolic blood pressure (DBP), and systolic blood pressure (SBP)); the total number of medications (TOTMED); depressive symptomatology measured using the Center for Epidemiologic Studies-Depression scale (CESDT, the lower the better); functional dependency assessed using the Instrumental Activities of Daily Living scale (IADL, the lower the better); four cognitive tests (the visual retention test of Benton (BENTON, number of correct responses out of 15), the Isaacs set test (ISA, total number of words given in 4 semantic categories in 15 seconds), and the trail making tests A and B (TMTA and TMTB, number of correct moves per minute)); the total intracranial volume (TIV); some biomarkers of neurodegeneration, including white matter volume (WMV), gray matter volume (GMV), left hippocampal volume (LHIPP), and right hippocampal volume (RHIPP); and two markers of vascular brain lesions, namely the volumes of white matter hyperintensities in the periventricular (Peri) and deep (Deep) white matter. Figure A.2 shows individual trajectories of these longitudinal markers in the 3C study. The data were pre-transformed using splines to satisfy the normality assumption of linear mixed models [16]. The minimum, maximum, and median numbers of repeated measurements are 1, 8, and 5 for SBP, DBP, CESDT, BENTON, ISA, TOTMED, and IADL; 1, 7, and 4 for BMI, TMTA, and TMTB; and 1, 3, and 2 for WMV, GMV, TIV, RHIPP, and LHIPP. For Peri and Deep (which were not measured at year 10), there are at most two repeated measures, with a median of 2.
In addition, the model incorporates five time-invariant explanatory variables: age at enrollment ($\frac{\text{age}-65}{10}$), sex (0=male, 1=female), education level (Educ, 0=less than 10 years of schooling, 1=at least 10 years of schooling), diabetes status at baseline (1=yes), and APOE4 allele carrier status (the main genetic susceptibility factor for Alzheimer's disease, 1=presence of APOE4).

5.2 Data analysis

For analyzing the 3C data using the proposed approach, we partitioned the dataset into two distinct sets: the training set, comprising 70% of the data, and a separate validation set. Parameter estimation was conducted using the training data, while risk prediction was exclusively performed on the validation set.
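As an illustration, such a subject-level split could be obtained in R as follows (the data frame dat and the identifier column id are hypothetical names):

    set.seed(123)                                    # reproducibility of the split
    ids       <- unique(dat$id)                      # one entry per subject
    train_ids <- sample(ids, size = round(0.7 * length(ids)))
    train_set <- dat[dat$id %in% train_ids, ]        # 70% of subjects: parameter estimation
    valid_set <- dat[!(dat$id %in% train_ids), ]     # remaining 30%: risk prediction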
To this end, a quadratic time trend is considered for both the fixed and random effects, as a non-linear time trend was previously observed for SBP, DBP, BMI, CESDT, BENTON, ISA, TOTMED, TMTA, TMTB, and IADL [16]:

$$\begin{aligned}
Y_{ik}(t) &= \eta_{ik}(t|{\beta}_{k},{b}_{ik})+\varepsilon_{ikt}\\
&= \beta_{0k}+\beta_{1k}t+\beta_{2k}t^{2}+b_{0ki}+b_{1ki}t+b_{2ki}t^{2}+\varepsilon_{ikt}.
\end{aligned}$$

For the other longitudinal markers, a linear mixed model is considered:

$$\begin{aligned}
Y_{ik}(t) &= \eta_{ik}(t|{\beta}_{k},{b}_{ik})+\varepsilon_{ikt}\\
&= \beta_{0k}+\beta_{1k}t+b_{0ki}+b_{1ki}t+\varepsilon_{ikt}.
\end{aligned}$$
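These longitudinal sub-models have the structure of standard linear mixed models. For illustration only, a minimal sketch with the lme4 package is given below (assuming a hypothetical long-format data frame long_dat with columns y, time, and id); in the proposed approach, each marker trajectory is actually estimated jointly with the competing risk outcome within a one-marker joint model, so this sketch merely illustrates the form of $\eta_{ik}(t|{\beta}_{k},{b}_{ik})$.

    library(lme4)

    # Quadratic specification (markers with a non-linear trend, e.g. SBP or BENTON):
    # fixed and random intercept, slope, and quadratic term.
    fit_quad <- lmer(y ~ time + I(time^2) + (time + I(time^2) | id), data = long_dat)

    # Linear specification (remaining markers): fixed and random intercept and slope.
    fit_lin  <- lmer(y ~ time + (time | id), data = long_dat)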

To model dementia and death, we utilize proportional hazard models that may depend on the five time-invariant explanatory variables and the current values of the markers. Let $\lambda_{il}(t)$ denote the cause-specific hazard function for subject $i$, where $l=1$ for dementia and $l=2$ for death. The proportional hazard model takes the following form:

$$\lambda_{il}(t) = \lambda_{0l}(t)\exp\Big(\gamma_{1l}\mathrm{APOE}_{i}+\gamma_{2l}\mathrm{Sex}_{i}+\gamma_{3l}\mathrm{Diabete}_{i}+\gamma_{4l}\mathrm{Educ}_{i}+\gamma_{5l}\mathrm{Age}_{i}+\sum_{k=1}^{17}\alpha_{kl}\eta_{ik}(t|{\beta}_{k},{b}_{ik})\Big).$$
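For intuition, a frequentist analogue of this second-stage model is a cause-specific Cox regression with the predicted current marker values entered as time-dependent covariates in counting-process format. The sketch below uses the survival package and assumes a hypothetical long-format data set surv_long with interval bounds tstart/tstop, a cause-specific event indicator dementia, the five baseline covariates, and columns eta1, ..., eta17 holding the predicted marker values over each interval; the VSJM approach instead estimates this stage in a Bayesian way with spike-and-slab priors on the $\gamma$ and $\alpha$ coefficients.

    library(survival)

    # Predicted current marker values enter as time-dependent covariates
    # via the counting-process formulation of the Cox model.
    marker_cols <- paste0("eta", 1:17)
    rhs <- paste(c("APOE", "Sex", "Diabete", "Educ", "Age", marker_cols),
                 collapse = " + ")

    # Cause-specific model for dementia; an analogous model is fitted for death.
    fit_dementia <- coxph(
      as.formula(paste("Surv(tstart, tstop, dementia) ~", rhs)),
      data = surv_long
    )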

Table A.1 in the supplementary material displays the estimated regression coefficients and error variances from the mixed-effects sub-models for each marker in the one-marker joint models. Additionally, Table 5 presents the estimated parameters from the proportional hazard models for dementia and death obtained with the VSJM approach under the DS prior, including estimates, standard deviations, 95% credible intervals, lBFDRs, and BFs (left part). The right part presents the estimates obtained after re-estimating the model with non-informative priors, including only the variables selected according to lBFDR or BF. The analysis identifies several factors associated with dementia and death. Specifically, high TIV and IADL values, coupled with lower scores on the BENTON, ISA, and TMTB tests, are linked to an increased risk of dementia. Additionally, higher educational attainment, when adjusted for current cognitive scores, is also associated with a higher risk of dementia. For death without dementia, lower DBP and BENTON scores, increasing age, diabetes, and a higher educational level (adjusted for current cognitive scores) are associated with a higher risk, whereas being female is linked to a lower risk of death. The results for CS remain similar and are reported in Table A.2.
The predictive ability of the model for the 5-year prediction of dementia, accounting for the competing risk of death, was assessed at landmark times $s=0, 5$, and $10$ years. To avoid over-optimistic results, the AUC and BS were computed on the validation set. The predictive performances presented in Table 6 are good: the AUC exceeds 0.85 for most configurations, with the lowest values observed at $s=0$ under the CS prior, and it is higher for $s=5$ and $10$ than for $s=0$, highlighting the information brought by the repeated measures of the longitudinal markers. Results were broadly similar for the CS and DS priors, with slightly better performance for lBFDR than for BF under the DS prior.
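As an illustration of how such dynamic accuracy measures can be obtained, the sketch below uses the timeROC package, which implements the censoring- and competing-risk-adjusted AUC of [8]. The data set valid_set (with event time time, status event coded 0 = censored, 1 = dementia, 2 = death, and a column risk_s holding the predicted 5-year dementia risk computed at the landmark time) is a hypothetical name.

    library(timeROC)

    s       <- 5    # landmark time (years)
    horizon <- 5    # prediction window (years)

    # Restrict to subjects still event-free and under follow-up at time s.
    at_risk <- subset(valid_set, time > s)

    # Dynamic AUC for dementia (cause = 1), treating death as a competing risk.
    roc_s <- timeROC(
      T      = at_risk$time,
      delta  = at_risk$event,
      marker = at_risk$risk_s,   # predicted risk of dementia in (s, s + horizon]
      cause  = 1,
      times  = s + horizon,
      iid    = TRUE              # enables standard errors for the AUC
    )
    roc_s   # the printed object reports the time-dependent AUC estimate(s)

The Brier score at the same landmark times can be obtained analogously with inverse-probability-of-censoring weighting, as in [8].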

Table 5: Estimated parameters of the cause-specific proportional hazard models for dementia and death in the 3C study using DS. Est: posterior mean, SD: standard deviation, 2.5% CI: lower bound of the credible interval and 97.5% CI: upper bound of the credible interval.
VSJM Re-estimation (lBFDR) Re-estimation (BF)
Est. SD 2.5% CI 97.5% CI lBFDR BF Est. SD 2.5% CI 97.5% CI Est. SD 2.5% CI 97.5% CI
Dementia
WMV 0.000 0.000 0.000 0.000 1.000 0.000
GMV 0.000 0.000 0.000 0.000 1.000 0.000
TIV 0.060 0.010 0.040 0.079 0.000 Inf 0.059 0.010 0.040 0.079 0.059 0.010 0.040 0.078
RHIPP 0.000 0.000 0.000 0.000 1.000 0.000
LHIPP -0.079 0.018 -0.115 -0.043 0.000 Inf -0.078 0.019 -0.115 -0.042 -0.079 0.018 -0.114 -0.044
Peri 0.000 0.000 0.000 0.000 1.000 0.000
Deep 0.000 0.000 0.000 0.000 1.000 0.000
SBP 0.000 0.000 0.000 0.000 1.000 0.000
DBP 0.000 0.000 0.000 0.000 1.000 0.000
BMI -0.018 0.034 -0.103 0.000 0.765 0.308
CESDT 0.013 0.043 0.000 0.171 0.900 0.111
BENTON -0.500 0.119 -0.744 -0.272 0.000 Inf -0.500 0.112 -0.717 -0.265 -0.510 0.108 -0.723 -0.295
ISA -0.323 0.079 -0.469 -0.171 0.000 Inf -0.321 0.075 -0.474 -0.180 -0.311 0.078 -0.468 -0.168
TOTMED 0.000 0.000 0.000 0.000 1.000 0.000
TMTA 0.000 0.000 0.000 0.000 1.000 0.000
TMTB -1.140 0.110 -1.345 -0.930 0.000 Inf -1.150 0.104 -1.345 -0.946 -1.150 0.102 -1.346 -0.953
IADL 0.326 0.062 0.207 0.445 0.000 Inf 0.333 0.058 0.222 0.450 0.329 0.057 0.220 0.444
Sex 0.002 0.028 0.000 0.000 0.986 0.014
Diabete 0.000 0.025 0.000 0.000 0.996 0.004
APOE 0.002 0.032 0.000 0.000 0.994 0.006
Educ 0.891 0.186 0.536 1.245 0.000 Inf 0.913 0.175 0.560 1.253 0.914 0.177 0.563 1.259
Age 0.000 0.000 0.000 0.000 1.000 0.000
Death
WMV 0.000 0.000 0.000 0.000 1.000 0.000
GMV 0.000 0.000 0.000 0.000 1.000 0.000
TIV 0.000 0.000 0.000 0.000 1.000 0.000
RHIPP 0.000 0.000 0.000 0.000 1.000 0.000
LHIPP 0.000 0.000 0.000 0.000 1.000 0.000
Peri 0.000 0.000 0.000 0.000 1.000 0.000
Deep 0.000 0.000 0.000 0.000 1.000 0.000
SBP 0.000 0.000 0.000 0.000 1.000 0.000
DBP -0.155 0.048 -0.248 -0.060 0.000 Inf -0.154 0.047 -0.245 -0.059 -0.156 0.046 -0.246 -0.065
BMI 0.000 0.000 0.000 0.000 1.000 0.000
CESDT 0.000 0.000 0.000 0.000 1.000 0.000
BENTON -3.347 0.132 -3.623 -3.097 0.000 Inf -3.365 0.150 -3.643 -3.081 -3.313 0.154 -3.618 -3.021
ISA 0.000 0.000 0.000 0.000 1.000 0.000
TOTMED 0.000 0.000 0.000 0.000 1.000 0.000
TMTA 0.000 0.000 0.000 0.000 1.000 0.000
TMTB 0.000 0.000 0.000 0.000 1.000 0.000
IADL 0.000 0.000 0.000 0.000 1.000 0.000
Sex -2.200 0.175 -2.553 -1.872 0.000 Inf -2.215 0.175 -2.555 -1.865 -2.184 0.179 -2.514 -1.828
Diabete 1.046 0.183 0.677 1.398 0.000 Inf 1.062 0.194 0.683 1.442 1.039 0.187 0.685 1.401
APOE 0.000 0.000 0.000 0.000 1.000 0.000
Educ 0.870 0.159 0.548 1.178 0.000 Inf 0.869 0.155 0.558 1.182 0.854 0.151 0.563 1.142
Age 1.230 0.191 0.844 1.588 0.000 Inf 1.255 0.192 0.875 1.628 1.216 0.188 0.841 1.589
Table 6: AUC and Brier score computed on the validation set for landmark times $s=0, 5, 10$ years and a prediction window of 5 years for the 3C data.
lBFDR BF
AUC BS AUC BS
ss Est. SD. Est. SD. Est. SD. Est. SD.
DS
0 0.891 0.037 0.019 0.005 0.885 0.034 0.019 0.005
5 0.891 0.018 0.130 0.014 0.846 0.027 0.152 0.016
10 0.971 0.016 0.044 0.016 0.901 0.035 0.122 0.029
CS
0 0.731 0.070 0.058 0.006 0.827 0.075 0.020 0.004
5 0.851 0.022 0.184 0.016 0.895 0.017 0.109 0.012
10 0.863 0.045 0.135 0.028 0.945 0.023 0.086 0.022

6 Conclusion

In this paper, we introduce a novel approach for variable selection in joint modeling of multiple longitudinal markers and competing risk outcomes within a Bayesian paradigm. Our method utilizes spike-and-slab priors for the association parameters and regression coefficients of the survival model. The approach involves two key steps: first, a simple Bayesian joint model of each longitudinal marker and the competing risk outcomes is estimated; then, variable selection is performed using a time-dependent Bayesian proportional hazard model. This model incorporates Dirac spike or continuous spike priors for the regression coefficients, including those of the time-dependent markers and of the fixed covariates. We also propose two criteria, the Bayes factor and the local Bayesian false discovery rate, for making the final variable selection decisions.
In addition to variable selection, our paper explores dynamic prediction based on the estimated joint model. To achieve this, we suggest re-estimating the second stage of the approach using only the selected markers and covariates.
We evaluate the performance of our approach through extensive simulation studies, focusing on variable selection and dynamic prediction. Furthermore, we apply our proposed method to analyze and predict the risk of dementia using the 3C dataset. The implementation of our method is available in the freely accessible R package VSJM, which can be found at https://github.com/tbaghfalaki/VSJM.
As a future direction, our approach can be extended to accommodate non-Gaussian longitudinal markers by including generalized linear mixed sub-models in the one-marker joint models. Moreover, while we focus on variable selection in this work, there is potential to extend our approach to group selection, offering further flexibility and applicability.

Declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Availability of data and materials

To facilitate implementation, we provide an R package called VSJM, which is freely available at https://github.com/tbaghfalaki/VSJM. A vignette and example usage are available at https://github.com/tbaghfalaki/VSJM/blob/main/Exam1.md.

Competing interests

Not applicable.

Funding

Not applicable.

Authors’ contributions

HJG and TB conceptualized and designed the study. CT and CH provided the data, while HJG and TB performed the data analysis and interpretation. HJG, TB, and RH developed the simulation studies. TB wrote the R code. All authors contributed to the manuscript preparation, reviewed, and approved the final version.

Acknowledgements

This work was partly funded by the French National Research Agency (grant ANR-21-CE36 for the project "Joint Models for Epidemiology and Clinical research"). This study received financial support from the French government in the framework of the University of Bordeaux's IdEx "Investments for the Future" program / RRI PHDS. The 3C Study was supported by Sanofi-Synthélabo, the FRM, the CNAM-TS, DGS, the Conseils Régionaux of Aquitaine, Languedoc-Roussillon, and Bourgogne; the Foundation of France; the Ministry of Research-INSERM "Cohorts and biological data collections" program; MGEN; the Longevity Institute; the General Council of the Côte d'Or; ANR PNRA 2006 (grant ANR/DEDD/PNRA/PROJ/200206-01-01) and Longvie 2007 (grant LVIE-003-01); the Alzheimer Plan Foundation (FCS grant 2009-2012); and Roche. The Three City Study data are available upon request at e3c.coordinatingcenter@gmail.com.

References

  • [1] Maha Alsefri, Maria Sudell, Marta García-Fiñana, and Ruwanthi Kolamunnage-Dona. Bayesian joint modelling of longitudinal and time to event data: a methodological review. BMC medical research methodology, 20:1–17, 2020.
  • [2] Marilyn Antoniak, Maura Pugliatti, Richard Hubbard, John Britton, Stefano Sotgiu, A Dessa Sadovnick, Irene ML Yee, Miguel A Cumsille, Jorge A Bevilacqua, Sarah Burdett, et al. Vascular factors and risk of dementia: design of the three-city study and baseline characteristics of the study population. Neuroepidemiology, 22(6):316–325, 2003.
  • [3] Peter C Austin. Generating survival times to simulate cox proportional hazards models with time-varying covariates. Statistics in medicine, 31(29):3946–3958, 2012.
  • [4] Taban Baghfalaki, Mojtaba Ganjali, and Damon Berridge. Joint modeling of multivariate longitudinal mixed measurements and time to event data using a bayesian approach. Journal of Applied Statistics, 41(9):1934–1955, 2014.
  • [5] Taban Baghfalaki, Reza Hashemi, Catherine Helmer, and Helene Jacqmin-Gadda. A two-stage joint modeling approach for multiple longitudinal markers and time-to-event data. submitted, pages 1–33, 2024.
  • [6] Taban Baghfalaki, Pierre-Emmanuel Sugier, Therese Truong, Anthony N Pettitt, Kerrie Mengersen, and Benoit Liquet. Bayesian meta-analysis models for cross cancer genomic investigation of pleiotropic effects using group structure. Statistics in Medicine, 40(6):1498–1518, 2021.
  • [7] Yoav Benjamini and Yosef Hochberg. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal statistical society: series B (Methodological), 57(1):289–300, 1995.
  • [8] Paul Blanche, Cécile Proust-Lima, Lucie Loubère, Claudine Berr, Jean-François Dartigues, and Hélène Jacqmin-Gadda. Quantifying and comparing dynamic predictive accuracy of joint models for longitudinal marker and time-to-event in presence of censoring and competing risks. Biometrics, 71(1):102–113, 2015.
  • [9] Ignace Bogaert. Iteration-free computation of gauss–legendre quadrature nodes and weights. SIAM Journal on Scientific Computing, 36(3):A1008–A1026, 2014.
  • [10] Sam Brilleman. Estimating joint models for longitudinal and time-to-event data with rstanarm. https://cran.uni-muenster.de/web/packages/rstanarm/vignettes/jm.html, 2023.
  • [11] Peter Bühlmann and Sara Van De Geer. Statistics for high-dimensional data: methods, theory and applications. Springer Science & Business Media, Germany, 2011.
  • [12] Carlos M Carvalho, Nicholas G Polson, and James G Scott. The horseshoe estimator for sparse signals. Biometrika, 97(2):465–480, 2010.
  • [13] Yuqi Chen and Yuedong Wang. Variable selection for joint models of multivariate longitudinal measurements and event time data. Statistics in medicine, 36(24):3820–3829, 2017.
  • [14] Yueh-Yun Chi and Joseph G Ibrahim. Joint models for multivariate longitudinal and multivariate survival data. Biometrics, 62(2):432–445, 2006.
  • [15] Mohammad Ziaul Islam Chowdhury and Tanvir C Turin. Variable selection strategies and its importance in clinical prediction modelling. Family medicine and community health, 8(1), 2020.
  • [16] Anthony Devaux, Catherine Helmer, Carole Dufouil, Robin Genuer, and Cécile Proust-Lima. Random survival forests for competing risks with multivariate longitudinal endogenous covariates. arXiv preprint arXiv:2208.05801, 2022.
  • [17] Bradley Efron. Large-scale simultaneous hypothesis testing: the choice of a null hypothesis. Journal of the American Statistical Association, 99(465):96–104, 2004.
  • [18] Bradley Efron. Large-scale inference: empirical Bayes methods for estimation, testing, and prediction, volume 1. Cambridge University Press, United Kingdom, 2012.
  • [19] Bradley Efron, Alan Gous, RE Kass, GS Datta, and P Lahiri. Scales of evidence for model selection: Fisher versus jeffreys. Lecture Notes-Monograph Series, pages 208–256, 2001.
  • [20] Bradley Efron and Robert Tibshirani. Empirical bayes methods and false discovery rates for microarrays. Genetic epidemiology, 23(1):70–86, 2002.
  • [21] Andrew Gelman, John B Carlin, Hal S Stern, and Donald B Rubin. Bayesian data analysis. Chapman and Hall/CRC, United States, 1995.
  • [22] Edward I George and Robert E McCulloch. Variable selection via gibbs sampling. Journal of the American Statistical Association, 88(423):881–889, 1993.
  • [23] Colin Griesbach, Andreas Mayr, and Elisabeth Bergherr. Variable selection and allocation in joint models via gradient boosting techniques. Mathematics, 11(2):411, 2023.
  • [24] Jeremie Guedj, Rodolphe Thiébaut, and Daniel Commenges. Joint modeling of the clinical progression and of the biomarkers’ dynamics using a mechanistic model. Biometrics, 67(1):59–66, 2011.
  • [25] Zangdong He, Wanzhu Tu, Sijian Wang, Haoda Fu, and Zhangsheng Yu. Simultaneous variable selection for joint models of longitudinal and survival outcomes. Biometrics, 71(1):178–187, 2015.
  • [26] Md Akhtar Hossain. Multivariate Joint Models and Dynamic Predictions. PhD thesis, University of South Carolina, 2020.
  • [27] Joseph G Ibrahim, Ming-Hui Chen, Debajyoti Sinha, JG Ibrahim, and MH Chen. Bayesian survival analysis, volume 2. Springer, Germany, 2001.
  • [28] Robert E Kass and Adrian E Raftery. Bayes factors. Journal of the american statistical association, 90(430):773–795, 1995.
  • [29] Jeffrey Lin and Sheng Luo. Deep learning for the dynamic prediction of multivariate longitudinal and survival data. Statistics in medicine, 41(15):2894–2907, 2022.
  • [30] Jeffrey D Long and James A Mills. Joint modeling of multivariate longitudinal data and survival data in several observational studies of huntington’s disease. BMC medical research methodology, 18(1):1–15, 2018.
  • [31] Arunabha Majumdar, Tanushree Haldar, Sourabh Bhattacharya, and John S Witte. An efficient bayesian meta-analysis approach for studying cross-phenotype genetic associations. PLoS genetics, 14(2):e1007139, 2018.
  • [32] Gertraud Malsiner-Walli and Helga Wagner. Comparing spike and slab priors for bayesian variable selection. Austrian Journal of Statistics, 40(4):241–264, 2011.
  • [33] B.W. Matthews. Comparison of the predicted and observed secondary structure of t4 phage lysozyme. Biochimica et Biophysica Acta (BBA) - Protein Structure, 405(2):442 – 451, 1975.
  • [34] Toby J Mitchell and John J Beauchamp. Bayesian variable selection in linear regression. Journal of the american statistical association, 83(404):1023–1032, 1988.
  • [35] Grigorios Papageorgiou, Katya Mauff, Anirudh Tomer, and Dimitris Rizopoulos. An overview of joint modeling of time-to-event and longitudinal outcomes. Annual review of statistics and its application, 6:223–240, 2019.
  • [36] Martyn Plummer. JAGS version 3.3.0 user manual. International Agency for Research on Cancer, Lyon, France, 2012.
  • [37] Martyn Plummer et al. Jags: A program for analysis of bayesian graphical models using gibbs sampling. In Proceedings of the 3rd international workshop on distributed statistical computing, volume 124, pages 1–10. Vienna, Austria, 2003.
  • [38] D Rizopoulos, G Papageorgiou, and P Miranda Afonso. Jmbayes2: extended joint models for longitudinal and time-to-event data. R package version 0.2-4, 2022.
  • [39] Dimitris Rizopoulos. Dynamic predictions and prospective accuracy in joint models for longitudinal and time-to-event data. Biometrics, 67(3):819–829, 2011.
  • [40] Dimitris Rizopoulos. Joint models for longitudinal and time-to-event data: With applications in R. CRC press, New York, 2012.
  • [41] Dimitris Rizopoulos. The r package jmbayes for fitting joint models for longitudinal and time-to-event data using mcmc. Journal of Statistical Software, 72:1–46, 2016.
  • [42] Dimitris Rizopoulos and Pulak Ghosh. A bayesian semiparametric multivariate joint model for multiple longitudinal outcomes and a time-to-event. Statistics in medicine, 30(12):1366–1380, 2011.
  • [43] Sheldon M Ross. Simulation, 5th edition, 2012.
  • [44] Denis Rustand, Elias T Krainski, and Haavard Rue. Denisrustand/inlajoint: Joint modeling multivariate longitudinal and time-to-event outcomes with inla. https://repository.kaust.edu.sa/items/82751567-82c6-477e-92af-8a2e9a7b0c3b, 2022.
  • [45] Joshua N Sampson, Nilanjan Chatterjee, Raymond J Carroll, and Samuel Müller. Controlling the local false discovery rate in the adaptive lasso. Biostatistics, 14(4):653–666, 2013.
  • [46] James G Scott and James O Berger. Bayes and empirical-bayes multiplicity adjustment in the variable-selection problem. The Annals of Statistics, pages 2587–2619, 2010.
  • [47] Fan Shen and Liang Li. Backward joint model and dynamic prediction of survival with multivariate longitudinal data. Statistics in medicine, 40(20):4395–4409, 2021.
  • [48] Xiao Song, Marie Davidian, and Anastasios A Tsiatis. An estimator for the proportional hazards model with multiple longitudinal covariates measured with error. Biostatistics, 3(4):511–528, 2002.
  • [49] Inês Sousa. A review on joint modelling of longitudinal measurements and time-to-event. REVSTAT-Statistical Journal, 9(1):57–81, 2011.
  • [50] Ewout W Steyerberg. Applications of prediction models. In Clinical prediction models, pages 11–31. Springer, Germany, 2009.
  • [51] Yu-Sung Su and Masanao Yajima. R2jags: Using r to run ‘jags’. R package version 0.5-7, 34, 2015.
  • [52] An-Min Tang, Xingqiu Zhao, and Nian-Sheng Tang. Bayesian variable selection and estimation in semiparametric joint models of multivariate longitudinal and survival data. Biometrical Journal, 59(1):57–78, 2017.
  • [53] Jiarui Tang, An-Min Tang, and Niansheng Tang. Variable selection for joint models of multivariate skew-normal longitudinal and survival data. Statistical Methods in Medical Research, 32(9):1694–1710, 2023.
  • [54] James Tanton. Encyclopedia of mathematics. Facts On File, Inc, United States, 2005.
  • [55] Peter M Williams. Bayesian regularization and pruning using a laplace prior. Neural computation, 7(1):117–143, 1995.
  • [56] Yujing Xie, Zangdong He, Wanzhu Tu, and Zhangsheng Yu. Variable selection for joint models with time-varying coefficients. Statistical Methods in Medical Research, 29(1):309–322, 2020.
  • [57] Jane Xu and Scott L Zeger. The evaluation of multiple surrogate endpoints. Biometrics, 57(1):81–87, 2001.
  • [58] Xiaofan Xu, Malay Ghosh, et al. Bayesian variable selection and estimation for group lasso. Bayesian Analysis, 10(4):909–936, 2015.
  • [59] Jurel K Yap and Iris Ivy M Gauran. Bayesian variable selection using knockoffs with applications to genomics. Computational Statistics, 38(4):1771–1790, 2023.
  • [60] Fengting Yi, Niansheng Tang, and Jianguo Sun. Simultaneous variable selection and estimation for joint models of longitudinal and failure time data with interval censoring. Biometrics, 78(1):151–164, 2022.
  • [61] TianHong Zhang, XiaoChen Tang, Yue Zhang, LiHua Xu, YanYan Wei, YeGang Hu, HuiRu Cui, YingYing Tang, HaiChun Liu, Tao Chen, et al. Multivariate joint models for the dynamic prediction of psychosis in individuals with clinical high risk. Asian Journal of Psychiatry, 81:103468, 2023.
  • [62] Xiang Zhou and Jerome P Reiter. A note on bayesian inference after multiple imputation. The American Statistician, 64(2):159–163, 2010.
  • [63] Kirill Zhudenkov, Sergey Gavrilov, Alina Sofronova, Oleg Stepanov, Nataliya Kudryashova, Gabriel Helmlinger, and Kirill Peskov. A workflow for the joint modeling of longitudinal and event data in the development of therapeutics: Tools, statistical methods, and diagnostics. CPT: Pharmacometrics & Systems Pharmacology, 11(4):425–437, 2022.

Supplementary Material

Figure A.1: Cumulative incidence functions for dementia and death over 17 years of follow-up.
Figure A.2: Individual trajectories of the normalized clinical, neuropsychological, and imaging longitudinal markers in the 3C study: systolic blood pressure (SBP), diastolic blood pressure (DBP), body mass index (BMI), depressive symptomatology measured using the Center for Epidemiologic Studies Depression scale (CESDT), the visual retention test of Benton (BENTON), the Isaacs set test (ISA), the total number of medications (TOTMED), the trail making tests A and B (TMTA and TMTB), functional dependency assessed using the Instrumental Activities of Daily Living scale (IADL), white matter volume (WMV), gray matter volume (GMV), total intracranial volume (TIV), right and left hippocampal volume (RHIPP and LHIPP), and the volumes of white matter hyperintensities in the periventricular (Peri) and deep (Deep) white matter.
Table A.1: The estimated regression coefficients and error variances of the longitudinal sub-models from the one-marker joint models in the 3C study data. Est: posterior mean, SD: standard deviation, 2.5% CI: lower bound of the credible interval and 97.5% CI: upper bound of the credible interval.
Est. SD 2.5% CI 97.5% CI
WMV Intercept -0.004 0.063 -0.119 0.114
Time 0.027 0.012 0.004 0.051
σ2\sigma^{2} 0.765 0.063 0.629 0.875
GMV Intercept -0.102 0.145 -0.382 0.170
Time -0.246 0.021 -0.285 -0.205
σ2\sigma^{2} 1.994 0.107 1.797 2.216
TIV Intercept -0.293 0.299 -0.779 0.420
Time 0.019 0.028 -0.037 0.074
σ2\sigma^{2} 3.760 0.199 3.394 4.168
RHIPP Intercept 0.137 0.100 -0.042 0.327
Time -0.316 0.013 -0.340 -0.291
σ2\sigma^{2} 0.905 0.054 0.801 1.018
LHIPP Intercept 0.087 0.112 -0.128 0.297
Time -0.350 0.014 -0.379 -0.322
σ2\sigma^{2} 1.017 0.062 0.900 1.144
Peri Intercept -0.074 0.116 -0.310 0.132
Time 0.605 0.029 0.549 0.661
σ2\sigma^{2} 1.704 0.493 0.945 2.671
Deep Intercept -0.009 0.058 -0.120 0.108
Time 0.019 0.012 -0.004 0.043
σ2\sigma^{2} 0.601 0.097 0.394 0.770
SBP Intercept 0.053 0.085 -0.071 0.198
Time -0.327 0.286 -0.668 -0.027
Time2 0.064 0.071 -0.008 0.140
σ2\sigma^{2} 0.994 0.037 0.934 1.072
DBP Intercept -0.038 0.108 -0.179 0.118
Time -0.488 0.339 -0.863 -0.135
Time2 0.075 0.084 -0.010 0.162
σ2\sigma^{2} 0.967 0.040 0.907 1.062
BMI Intercept 0.014 0.225 -0.273 0.299
Time 0.015 0.047 -0.057 0.079
Time2 0.003 0.017 -0.015 0.022
σ2\sigma^{2} 0.997 0.044 0.933 1.100
CESDT Intercept 0.053 0.163 -0.160 0.258
Time -0.060 0.196 -0.305 0.150
Time2 0.011 0.054 -0.044 0.070
σ2\sigma^{2} 1.029 0.028 0.981 1.091
BENTON Intercept 0.110 0.063 0.010 0.235
Time -0.239 0.223 -0.517 0.000
Time2 0.049 0.053 -0.005 0.107
σ2\sigma^{2} 0.980 0.030 0.925 1.037
ISA Intercept 0.069 0.113 -0.079 0.251
Time -0.143 0.229 -0.436 0.109
Time2 0.029 0.051 -0.023 0.085
σ2\sigma^{2} 0.963 0.027 0.911 1.017
TOTMED Intercept -0.096 0.118 -0.264 0.061
Time -0.038 0.327 -0.426 0.304
Time2 0.038 0.077 -0.041 0.121
σ2\sigma^{2} 1.053 0.046 0.977 1.140
TMTA Intercept 0.044 0.033 -0.021 0.104
Time -0.193 0.245 -0.500 0.067
Time2 0.051 0.057 -0.007 0.113
σ2\sigma^{2} 0.957 0.049 0.879 1.058
TMTB Intercept -0.000 0.035 -0.068 0.069
Time -0.250 0.205 -0.492 -0.031
Time2 0.040 0.054 -0.015 0.098
σ2\sigma^{2} 0.975 0.052 0.893 1.091
IADL Intercept 0.015 0.123 -0.144 0.180
Time -0.055 0.089 -0.175 0.050
Time2 0.020 0.025 -0.006 0.048
σ2\sigma^{2} 0.969 0.032 0.915 1.038
Table A.2: Estimated parameters of the cause-specific proportional hazard models for dementia and death in the 3C study using CS. Est: posterior mean, SD: standard deviation, 2.5% CI: lower bound of the credible interval and 97.5% CI: upper bound of the credible interval.
VSJM Re-estimation (lBFDR) Re-estimation (BF)
Est. SD 2.5% CI 97.5% CI lBFDR BF Est. SD 2.5% CI 97.5% CI Est. SD 2.5% CI 97.5% CI
Dementia
WMV -0.030 0.022 -0.076 0.008 0.991 0.009
GMV 0.009 0.019 -0.029 0.042 1.000 0.000
TIV 0.033 0.011 0.011 0.053 1.000 0.000
RHIPP -0.020 0.019 -0.055 0.015 1.000 0.000
LHIPP -0.042 0.021 -0.085 -0.001 1.000 0.000
Peri 0.021 0.015 -0.005 0.053 1.000 0.000
Deep 0.020 0.026 -0.037 0.065 1.000 0.000
SBP -0.006 0.040 -0.078 0.062 0.988 0.012
DBP -0.041 0.032 -0.102 0.020 1.000 0.000
BMI -0.003 0.014 -0.032 0.027 1.000 0.000
CESDT 0.038 0.030 -0.022 0.100 1.000 0.000
BENTON -0.893 0.187 -1.251 -0.604 0.000 Inf -1.201 0.085 -1.366 -1.038 -0.783 0.096 -0.969 -0.590
ISA -0.034 0.031 -0.094 0.027 1.000 0.000
TOTMED -0.040 0.033 -0.107 0.019 1.000 0.000
TMTA 0.004 0.042 -0.077 0.074 1.000 0.000
TMTB -0.851 0.489 -1.344 -0.079 0.297 2.364 -1.235 0.094 -1.412 -1.045
IADL 0.440 0.089 0.296 0.623 0.000 Inf 0.625 0.056 0.518 0.735 0.382 0.055 0.278 0.490
Sex -0.001 0.032 -0.062 0.061 1.000 0.000
Diabete 0.009 0.054 -0.054 0.071 0.979 0.021
APOE 0.688 0.183 0.311 1.020 0.000 Inf 0.897 0.174 0.548 1.225 0.737 0.179 0.380 1.084
Educ 0.641 0.175 0.283 0.951 0.000 Inf 0.674 0.174 0.339 1.026 0.804 0.171 0.462 1.136
Age 0.005 0.049 -0.063 0.082 0.955 0.047
Death
WMV 0.016 0.023 -0.030 0.061 1.000 0.000
GMV -0.010 0.019 -0.046 0.026 1.000 0.000
TIV 0.029 0.011 0.008 0.051 1.000 0.000
RHIPP -0.007 0.017 -0.038 0.024 1.000 0.000
LHIPP 0.018 0.018 -0.015 0.052 1.000 0.000
Peri 0.001 0.011 -0.020 0.023 1.000 0.000
Deep 0.002 0.025 -0.045 0.053 1.000 0.000
SBP 0.004 0.028 -0.051 0.057 1.000 0.000
DBP 0.296 0.047 0.203 0.393 0.000 Inf 0.356 0.046 0.268 0.451 0.358 0.049 0.266 0.456
BMI 0.012 0.010 -0.008 0.031 1.000 0.000
CESDT -0.003 0.025 -0.050 0.050 1.000 0.000
BENTON -0.088 0.030 -0.152 -0.032 1.000 0.000
ISA -1.761 0.084 -1.936 -1.620 0.000 Inf -1.772 0.078 -1.931 -1.621 -1.759 0.091 -1.935 -1.589
TOTMED 0.075 0.027 0.022 0.129 1.000 0.000
TMTA 0.014 0.028 -0.041 0.064 1.000 0.000
TMTB 0.014 0.027 -0.041 0.068 1.000 0.000
IADL 0.011 0.025 -0.040 0.058 1.000 0.000
Sex -1.816 0.196 -2.088 -1.813 0.000 Inf -1.829 0.201 -2.213 -1.639 -1.823 0.119 -2.201 -1.627
Diabete 0.125 0.221 -0.050 0.662 0.763 0.311
APOE 0.002 0.039 -0.063 0.067 0.979 0.021
Educ 0.001 0.032 -0.062 0.067 1.000 0.000
Age 1.227 0.170 0.864 1.543 0.000 Inf 1.086 0.178 0.743 1.446 1.068 0.185 0.707 1.434