
Chirag Nagpal (chiragn@cs.cmu.edu)
Vedant Sanil (vsanil@andrew.cmu.edu)
Artur Dubrawski (awd@cs.cmu.edu)
Auton Lab, Carnegie Mellon University

Recovering Sparse and Interpretable Subgroups with Heterogeneous Treatment Effects with Censored Time-to-Event Outcomes

Abstract

Both randomized experiments and observational studies frequently involve time-to-event outcomes such as time-to-failure, death, or onset of an adverse condition. Such outcomes are typically subject to censoring due to loss of follow-up, and established statistical practice involves comparing treatment efficacy in terms of hazard ratios between the treated and control groups. In this paper we propose a statistical approach to recovering sparse phenogroups (or subtypes) that demonstrate differential treatment effects as compared to the study population. Our approach involves modelling the data as a mixture while enforcing parameter shrinkage through structured sparsity regularization. We propose a novel inference procedure for the proposed model and demonstrate its efficacy in recovering sparse phenotypes across large landmark real-world clinical studies in cardiovascular health.

keywords:
Time-to-Event, Survival Analysis, Heterogeneous Treatment Effects, Hazard Ratio

1 Introduction

Data-driven decision making across multiple disciplines, including healthcare, epidemiology, econometrics and prognostics, often involves establishing the efficacy of an intervention when outcomes are measured in terms of the time to an adverse event, such as death, failure, or onset of a critical condition. Typically, the analysis of such studies involves assigning a patient population to two or more treatment arms, often called the ‘treated’ (or ‘exposed’) group and the ‘control’ (or ‘placebo’) group, and observing whether the treated population experiences the adverse event (for instance death or onset of a disease) over the study period at a rate that is higher (or lower) than the control group. Efficacy of a treatment is thus established by comparing the relative difference in the rate of event incidence between the two arms, called the hazard ratio. However, not all individuals benefit equally from an intervention. Thus, potentially beneficial interventions are often discarded even though there may exist individuals who benefit, because the population-level estimates of treatment efficacy are inconclusive.

In this paper we assume that patient responses to an intervention are typically heterogeneous and that there exist patient subgroups that are unaffected by (or, worse, harmed by) the intervention being assessed. The ability to discover or phenotype these patients is thus clinically useful, as it would allow for more precise clinical decision making by identifying the individuals who actually benefit from the intervention being assessed.

Towards this end, we propose Sparse Cox Subgrouping (SCS), a latent variable approach to model patient subgroups that demonstrate heterogeneous responses to an intervention. As opposed to existing literature on modelling heterogeneous treatment effects with censored time-to-event outcomes, our approach involves structured regularization of the covariates that assign individuals to subgroups, leading to parsimonious models and phenogroups that are interpretable. We release a python implementation of the proposed SCS approach as part of the auton-survival package (Nagpal et al., 2022b) for survival analysis.

2 Related Work

Large studies, especially in clinical medicine and epidemiology, involve outcomes that are time-to-events, such as death or the onset of an adverse clinical condition like stroke or cancer. Treatment efficacy is typically estimated by comparing event rates between the treated and control arms using the Proportional Hazards model (Cox, 1972) and its extensions.

Identification of subgroups in such scenarios has been the subject of a large body of traditional statistical literature. A large number of such approaches involve estimation of the factual and counterfactual outcomes using separate regression models (T-learner), followed by regressing the difference between these estimated potential outcomes. Within this category of approaches, Lipkovich et al. (2011) propose the subgroup identification based on differential effect search (SIDES) algorithm, Su et al. (2009) propose a recursive partitioning method for subgroup discovery, Dusseldorp and Mechelen (2014) propose the qualitative interaction trees (QUINT) algorithm, and Foster et al. (2011) propose the virtual twins (VT) method for subgroup discovery involving decision tree ensembles. We include a parametric version of such an approach as a competing baseline.

Identification of heterogeneous treatment effects (HTEs) is also of growing interest to the machine learning community, with multiple approaches involving deep neural networks with balanced representations (Shalit et al., 2017; Johansson et al., 2020), generative models (Louizos et al., 2017), as well as non-parametric methods involving random forests (Wager and Athey, 2018) and Gaussian processes (Alaa and Van Der Schaar, 2017). There is also growing interest in estimating HTEs from an interpretable and trustworthy standpoint (Lee et al., 2020; Nagpal et al., 2020; Morucci et al., 2020; Wu et al., 2022; Crabbé et al., 2022). Wang and Rudin (2022) propose a sampling based approach to discovering interpretable rule sets demonstrating HTEs.

However, a large part of this work has focused on outcomes that are binary or continuous; the estimation of HTEs in the presence of censored time-to-event outcomes has received limited attention. Xu et al. (2022) explore the problem and describe standard approaches for estimating treatment effect heterogeneity with survival outcomes. They also describe challenges associated with existing risk models when assessing treatment effect heterogeneity in the case of cardiovascular health.

There have been initial attempts to use neural networks for causal inference with censored time-to-event outcomes. Curth et al. (2021) propose a discrete-time method along with regularization to match the treated and control representations. Chapfuwa et al. (2021)’s approach is related and involves the use of normalizing flows to estimate the potential time-to-event distributions under treatment and control. While our contribution is similar to Nagpal et al. (2022a), in that we model treatment effect heterogeneity through a latent variable model, it differs in that 1) our approach is free of the expensive Monte Carlo sampling procedure, and 2) our generalized EM inference procedure allows us to naturally incorporate structured sparsity regularization, which helps recover phenogroups that are parsimonious in the features that define them.

Survival and time-to-event outcomes occur pre-eminently in areas of cardiovascular health. One such area is reducing the combined risk of adverse outcomes from atherosclerotic disease, a class of related clinical conditions arising from increasing deposits of plaque in the arteries and leading to stroke, myocardial infarction, and other coronary heart diseases (Herrington et al., 2016; Furberg et al., 2002; Group, 2009; Buse et al., 2007). The ability to recover groups with differential benefits from interventions can thus lead to improved patient care through the framing of optimal clinical guidelines.

3 Proposed Model: Sparse Cox Subgrouping

Figure 1: Potential outcome distributions under the assumptions of treatment effect heterogeneity. Case 1 ($Z\in\{0,+1\}$): amongst the treated population, conditioned on the latent $Z$, there are two subgroups that benefit from and are unaffected by the intervention, respectively. Case 2 ($Z\in\{0,+1,-1\}$): there is an additional latent subgroup conditioned on which the treated population is harmed, with a worse survival rate.
Notation

As is standard in survival analysis, we assume that we observe either the true time-to-event or the time of censoring, $U=\min\{T,C\}$, as indicated by the censoring indicator defined as $\Delta=\bm{1}\{T<C\}$. We thus work with a dataset of right-censored observations in the form of 4-tuples, $\mathcal{D}=\{({\bm{x}}_{i},\delta_{i},u_{i},{\bm{a}}_{i})\}_{i=1}^{n}$, where $u_{i}\in\mathbb{R}^{+}$ is the time-to-event or censoring as indicated by $\delta_{i}\in\{0,1\}$, ${\bm{a}}_{i}\in\{0,1\}$ is the indicator of treatment assignment, and ${\bm{x}}_{i}$ are individual covariates that confound the treatment assignment and the outcome.

Assumption 1 (Independent Censoring)

The time-to-event $T$ and the censoring distribution $C$ are independent conditional on the covariates $X$ and the intervention $A$.

Model

Consider a maximum likelihood approach to model the data $\mathcal{D}$ with the set of parameters $\bm{\Omega}$. Under Assumption 1, the likelihood of the data $\mathcal{D}$ can be given as,

$$\mathcal{L}(\bm{\Omega};\mathcal{D}) \propto \prod_{i=1}^{|\mathcal{D}|}\bm{\lambda}(u_{i}|X={\bm{x}}_{i},A={\bm{a}}_{i})^{\delta_{i}}\,\bm{S}(u_{i}|X={\bm{x}}_{i},A={\bm{a}}_{i}), \quad (1)$$

here $\bm{\lambda}(t)=\lim_{\Delta t\to 0}\frac{\mathbb{P}(t\leq T<t+\Delta t\,|\,T\geq t)}{\Delta t}$ is the hazard and $\bm{S}(t)=\mathbb{P}(T>t)$ is the survival rate.

Assumption 2 (PH)

The distribution of the time-to-event $T$ conditional on the covariates and the treatment assignment obeys proportional hazards.

From Assumption 2 (Proportional Hazards), the hazard for an individual with covariates $(X={\bm{x}})$ under intervention $(A={\bm{a}})$ in a Cox model with parameters $\bm{\beta}$ and treatment effect $\bm{\omega}$ is given as

$$\bm{\lambda}(t|A={\bm{a}},X={\bm{x}})=\bm{\lambda}_{0}(t)\exp\big(\bm{\beta}^{\top}{\bm{x}}+\bm{\omega}\cdot\bm{a}\big), \quad (2)$$

Here, $\bm{\lambda}_{0}(\cdot)$ is an infinite-dimensional parameter known as the base hazard rate. In practice, in the Cox model the base hazard is a nuisance parameter and is estimated non-parametrically. In order to model the heterogeneity of treatment response, we now introduce a latent variable $Z\in\{0,1,-1\}$ that mediates treatment response in the model,

$$\bm{\lambda}(t|A={\bm{a}},X={\bm{x}},Z={\bm{k}})=\bm{\lambda}_{0}(t)\exp(\bm{\beta}^{\top}{\bm{x}})\exp(\bm{\omega})^{{\bm{k}}{\bm{a}}},$$
$$\text{and, }\;\;\mathbb{P}(Z={\bm{k}}|X={\bm{x}})=\frac{\exp(\bm{\theta}_{k}^{\top}{\bm{x}})}{\sum_{j}\exp(\bm{\theta}_{j}^{\top}{\bm{x}})}. \quad (3)$$

Here, $\bm{\omega}\in\mathbb{R}$ is the treatment effect, and $\bm{\theta}\in\mathbb{R}^{k\times d}$ is the set of parameters that mediate assignment to the latent group $Z$ conditioned on the confounding features ${\bm{x}}$. Note that the above choice of parameterization naturally enforces the requirements of the model as in Figure 1. Consider the following scenarios,

Case 1: The study population consists of two sub-strata, i.e. $Z\in\{0,+1\}$, that benefit from and are unaffected by treatment, respectively.

Case 2: The study population consists of three sub-strata, i.e. $Z\in\{0,+1,-1\}$, that benefit from, are harmed by, or are unaffected by treatment, respectively.

Following from Equations 1 & 3, the complete likelihood of the data $\mathcal{D}$ under this model is,

$$\mathcal{L}(\bm{\Omega};\mathcal{D})=\prod_{i=1}^{|\mathcal{D}|}\sum_{k\in Z}\bigg(\bm{\lambda}_{0}(u_{i})\,\bm{h}({\bm{x}},{\bm{a}},{\bm{k}})\bigg)^{\delta_{i}}{\bm{S}}_{0}(u_{i})^{\bm{h}({\bm{x}},{\bm{a}},{\bm{k}})}\,\mathbb{P}(Z=k|X={\bm{x}}_{i}),$$
$$\text{where, }\ln\bm{h}({\bm{x}},{\bm{a}},{\bm{k}})=\bm{\beta}^{\top}{\bm{x}}+{\bm{k}}\cdot\bm{a}\cdot\bm{w}\;\text{ and }\;\ln{\bm{S}}_{0}(\cdot)=-\bm{\Lambda}_{0}(\cdot). \quad (4)$$

Note that $\bm{\Lambda}_{0}(t)=\int_{0}^{t}\bm{\lambda}_{0}(s)\,ds$ is the infinite-dimensional cumulative hazard and is inferred when learning the model. We will notate the set of all learnable parameters as $\bm{\Omega}=\{\bm{\theta},\bm{\beta},\bm{w},\bm{\Lambda}_{0}\}$.
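To make the parameterization concrete, the following is a minimal numpy sketch of the gating function $\mathbb{P}(Z=k|X={\bm{x}})$ and the hazard multiplier $\bm{h}({\bm{x}},{\bm{a}},{\bm{k}})$ from Equations 3 and 4. The ordering of the latent groups and all variable names are illustrative conventions, not the released auton-survival implementation.

```python
import numpy as np

# Latent group labels k; the ordering (0: unaffected, +1: benefit, -1: harm) is an illustrative convention.
GROUPS = np.array([0.0, 1.0, -1.0])

def gate_probabilities(x, theta):
    """Softmax gating P(Z = k | X = x) of Equation 3; theta holds one row of coefficients per latent group."""
    logits = theta @ x
    logits = logits - logits.max()          # numerical stability
    expl = np.exp(logits)
    return expl / expl.sum()

def hazard_multiplier(x, a, beta, omega):
    """h(x, a, k) = exp(beta^T x + k * a * omega) for every latent group k (Equation 4)."""
    return np.exp(beta @ x + GROUPS * a * omega)

# Toy usage on a single individual with four covariates.
rng = np.random.default_rng(0)
x, a = rng.normal(size=4), 1
theta = np.vstack([np.zeros(4), rng.normal(size=4), rng.normal(size=4)])   # row for Z = 0 fixed at zero
beta, omega = rng.normal(size=4), -0.5
print(gate_probabilities(x, theta), hazard_multiplier(x, a, beta, omega))
```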

Shrinkage

In retrospective analyses to recover treatment effect heterogeneity, a natural requirement is parsimony of the recovered subgroups in terms of the covariates, in order to promote model interpretability. Such parsimony can be naturally enforced through appropriate shrinkage of the coefficients that promotes sparsity. We want to recover phenogroups that are ‘sparse’ in $\bm{\theta}$, and we enforce sparsity in the parameters of the latent $Z$ gating function via a group $\ell_{1}$ (Lasso) penalty. The final loss function to be optimized, including the group sparsity regularization term, is

$$\mathcal{L}(\bm{\Omega};\mathcal{D})+\bm{\epsilon}\cdot\mathcal{R}(\bm{\theta}),\;\text{ where }\;\mathcal{R}(\bm{\theta})=\sum_{d}\sqrt{\sum_{k\in\mathcal{Z}}\big(\bm{\theta}^{k}_{d}\big)^{2}}$$
$$\text{and }\bm{\epsilon}>0\text{ is the strength of the shrinkage parameter.} \quad (5)$$
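To illustrate the structure of $\mathcal{R}(\bm{\theta})$, the short sketch below (assuming $\bm{\theta}$ is stored as a matrix with one row per latent group and one column per covariate) computes the penalty; a covariate contributes nothing when its coefficients are zero in every group, which is what drives entire features out of the gating function.

```python
import numpy as np

def group_sparsity_penalty(theta):
    """R(theta): sum over covariates d of the l2 norm of theta[:, d] across latent groups (Equation 5)."""
    return np.sqrt((theta ** 2).sum(axis=0)).sum()

# Illustrative gating coefficients: rows are latent groups (Z = 0 fixed at zero for identifiability),
# columns are covariates; only the second and fourth covariates enter the penalty.
theta = np.array([[0.0, 0.0, 0.0, 0.0],
                  [0.0, 1.2, 0.0, -0.4],
                  [0.0, -0.7, 0.0, 0.1]])
print(group_sparsity_penalty(theta))
```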
Identifiability

Further, to ensure identifiability we restrict the gating parameters for the $Z=0$ group to be $0$. Thus $\bm{\theta}_{1}=0$.

Inference

We will present a variation of the Expectation Maximization algorithm to infer the parameters in Equation 3. Our approach differs from Nagpal et al. (2022a, 2021) in that it does not require stochastic Monte Carlo sampling. Further, our generalized EM inference allows for incorporation of the structured sparsity penalty in the M-Step.

A Semi-Parametric $Q(\cdot)$

Note that the likelihood in Equation 3 is semi-parametric and consists of parametric components and the infinite-dimensional base hazard $\bm{\Lambda}(\cdot)$. We define the $Q(\cdot)$ as:

$$Q(\bm{\Omega};\mathcal{D})=\sum_{i=1}^{n}\sum_{k\in\mathcal{Z}}\bm{\gamma}^{k}_{i}\bigg(\ln\bm{p}_{\bm{\theta}}(Z=k|X={\bm{x}}_{i})+\ln\bm{p}_{\bm{w},\bm{\beta},\bm{\Lambda}}(T|Z=k,X={\bm{x}}_{i})\bigg)+\mathcal{R}(\bm{\theta})$$
The E-Step

Requires computation of the posterior counts $\bm{\gamma}:=\bm{p}(Z=k\,|\,T,X=\bm{x},A={\bm{a}})$.

Result 1 (Posterior Counts)

The posterior counts $\bm{\gamma}$ for the latent $Z$ are estimated as,

$$\bm{\gamma}^{k}=\widehat{\mathbb{P}}(Z=k|X={\bm{x}},A={\bm{a}},{\bm{u}})=\frac{\mathbb{P}({\bm{u}}|Z={\bm{k}},X={\bm{x}},A={\bm{a}})\,\mathbb{P}(Z={\bm{k}}|X={\bm{x}})}{\sum_{k}\mathbb{P}({\bm{u}}|Z={\bm{k}},X={\bm{x}},A={\bm{a}})\,\mathbb{P}(Z={\bm{k}}|X={\bm{x}})}=\frac{{\bm{h}}({\bm{x}},{\bm{a}},{\bm{k}})^{\delta_{i}}\,\widehat{{\bm{S}}}_{0}({\bm{u}})^{{\bm{h}}({\bm{x}},{\bm{a}},{\bm{k}})}\exp(\bm{\theta}_{\bm{k}}^{\top}\bm{x})}{\sum_{j\in\mathcal{Z}}{\bm{h}}({\bm{x}},{\bm{a}},{\bm{j}})^{\delta_{i}}\,\widehat{{\bm{S}}}_{0}({\bm{u}})^{{\bm{h}}({\bm{x}},{\bm{a}},{\bm{j}})}\exp(\bm{\theta}_{\bm{j}}^{\top}\bm{x})}. \quad (6)$$

For a full discussion of the derivation of the $Q(\cdot)$ and the posterior counts, please refer to Appendix B.
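Since Equation 6 factorizes over individuals, the E-step reduces to a few elementwise operations; the following is a minimal numpy sketch for a single individual (the group ordering, argument names, and the baseline survival estimate s0_u are illustrative assumptions, not the released implementation).

```python
import numpy as np

GROUPS = np.array([0.0, 1.0, -1.0])   # illustrative ordering of the latent groups k

def posterior_counts(x, a, s0_u, delta, theta, beta, omega):
    """E-step of Equation 6 for a single individual.

    s0_u  : baseline survival estimate S0_hat evaluated at the observed time u
    delta : event indicator (1 = event observed, 0 = censored)
    """
    h = np.exp(beta @ x + GROUPS * a * omega)            # h(x, a, k) for every latent group
    logits = theta @ x
    prior = np.exp(logits - logits.max())
    prior = prior / prior.sum()                          # gating P(Z = k | X = x)
    lik = (h ** delta) * (s0_u ** h)                     # P(u | Z = k, ...) up to the factor lambda_0(u)
    unnorm = lik * prior
    return unnorm / unnorm.sum()                         # posterior counts gamma^k

# Example: posterior_counts(np.ones(4), a=1, s0_u=0.7, delta=1,
#                           theta=np.zeros((3, 4)), beta=np.zeros(4), omega=-0.5)
```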

The M-Step

Involves maximizing the $Q(\cdot)$ function. Rewriting the $Q(\cdot)$ as a sum of two terms,

$$Q(\bm{\Omega})=\underbrace{\sum_{i=1}^{n}\sum_{k\in\mathcal{Z}}\bm{\gamma}^{k}_{i}\ln\bm{p}_{\bm{w},\bm{\beta},\bm{\Lambda}_{0}}(T|Z=k,X={\bm{x}}_{i},A={\bm{a}}_{i})}_{{\bm{A}}(\bm{w},\bm{\beta},\bm{\Lambda}_{0})}+\underbrace{\sum_{i=1}^{n}\sum_{k\in\mathcal{Z}}\bm{\gamma}^{k}_{i}\ln\bm{p}_{\bm{\theta}}(Z=k|X={\bm{x}}_{i})+\mathcal{R}(\bm{\theta})}_{{\bm{B}}(\bm{\theta})}$$
Result 2 (Weighted Cox model)

The term ${\bm{A}}$ can be rewritten as a weighted Cox model and thus optimized using the corresponding ‘partial likelihood’.

Updates for $\{\bm{\beta},\bm{\omega}\}$: The partial likelihood $\mathcal{PL}(\cdot)$ under sampling weights (Binder, 1992) is

$$\mathcal{PL}(\bm{\Omega};\mathcal{D})=\sum_{i=1,\,\delta_{i}=1}^{n}\sum_{k\in\mathcal{Z}}\bm{\gamma}^{k}_{i}\bigg(\ln\bm{h}_{k}({\bm{x}}_{i},\bm{a}_{i},{\bm{k}})-\ln\sum_{j\in\mathsf{RiskSet}(u_{i})}\sum_{k\in\mathcal{Z}}\bm{\gamma}^{k}_{j}\,\bm{h}_{k}({\bm{x}}_{j},\bm{a}_{j},{\bm{k}})\bigg) \quad (7)$$

Here $\mathsf{RiskSet}(\cdot)$ is the ‘risk set’, i.e. the set of all individuals who have not yet experienced the event at the corresponding time, $\mathsf{RiskSet}(t):=\{i:u_{i}>t\}$. $\mathcal{PL}(\cdot)$ is convex in $\{\bm{\beta},\bm{\omega}\}$ and we update these parameters with a gradient step.

Updates for $\bm{\Lambda}_{0}$: The base hazard $\bm{\Lambda}_{0}$ is updated using a weighted Breslow's estimate (Breslow, 1972; Lin, 2007), treating the posterior counts $\bm{\gamma}$ as sampling weights (Chen, 2009),

$$\widehat{\bm{\Lambda}}_{0}(t)^{+}=\sum_{i=1}^{n}\sum_{k\in\mathcal{Z}}\bm{1}\{u_{i}<t\}\frac{\bm{\gamma}^{k}_{i}\cdot\delta_{i}}{\sum_{j\in\mathsf{RiskSet}(u_{i})}\sum_{k\in\mathcal{Z}}\bm{\gamma}^{k}_{j}\,\bm{h}_{k}({\bm{x}}_{j},\bm{a}_{j},{\bm{k}})} \quad (8)$$
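A minimal vectorized sketch of the weighted Breslow update of Equation 8 is given below; it assumes the per-individual, per-group posterior weights (gamma) and hazard multipliers (h_all) have already been computed, and is illustrative rather than the released implementation.

```python
import numpy as np

def weighted_breslow(times, deltas, gamma, h_all, t_grid):
    """Weighted Breslow estimate of the cumulative base hazard (Equation 8).

    times  : (n,) observed times u_i
    deltas : (n,) event indicators delta_i
    gamma  : (n, K) posterior counts gamma_i^k, used as sampling weights
    h_all  : (n, K) hazard multipliers h(x_i, a_i, k)
    t_grid : times at which the estimate is evaluated
    """
    # Weighted risk-set total at each observed time: sum_{j: u_j > u_i} sum_k gamma_j^k h_k(x_j, a_j, k).
    denom = np.array([(gamma[times > u] * h_all[times > u]).sum() for u in times])
    denom = np.maximum(denom, 1e-12)                       # guard against empty risk sets
    increments = deltas * gamma.sum(axis=1) / denom        # delta_i * sum_k gamma_i^k / denominator
    return np.array([increments[times < t].sum() for t in t_grid])
```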

Term ${\bm{B}}$ is a function of the gating parameters $\bm{\theta}$ that determine the latent assignment $Z$, along with the sparsity regularization. We optimize ${\bm{B}}$ using a proximal gradient update, as in Iterative Soft Thresholding (ISTA) for group-sparse $\ell_{1}$ regression.

Updates for $\bm{\theta}$: The proximal update for $\bm{\theta}$ including the group regularization term $\mathcal{R}(\cdot)$ (Friedman et al., 2010) is,

$$\widehat{\bm{\theta}}^{+}=\mathsf{prox}_{\eta\epsilon}\bigg(\bm{\theta}-\eta\frac{d}{d\bm{\theta}}{\bm{B}}(\bm{\theta})\bigg),\;\;\;\text{ where }\mathsf{prox}_{\eta\epsilon}({\bm{y}})=\frac{{\bm{y}}}{||{\bm{y}}||_{2}}\max\{0,||\bm{y}||_{2}-\eta\bm{\epsilon}\}. \quad (9)$$
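The proximal step has a simple closed form; below is a hedged numpy sketch of the group soft-thresholding operator, assuming the gating coefficients are stored as a (groups × covariates) matrix with one group per covariate (the names eta and eps for the step size and shrinkage strength are illustrative).

```python
import numpy as np

def prox_group_l1(theta, eta, eps):
    """Group soft-thresholding operator of Equation 9, applied to each covariate's
    column of gating coefficients across the latent groups."""
    out = np.zeros_like(theta)
    for d in range(theta.shape[1]):
        col = theta[:, d]
        norm = np.linalg.norm(col)
        if norm > 0:
            out[:, d] = col / norm * max(0.0, norm - eta * eps)
    return out

# After the gradient step on B(theta), every covariate whose coefficient norm across the latent
# groups falls below eta * eps is zeroed out entirely, which yields the sparse gating function.
```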

Altogether, the inference procedure is described in Algorithm 1.

Algorithm 1: Parameter Learning for SCS with a Generalized EM

Input: Training set $\mathcal{D}=\{({\bm{x}}_{i},u_{i},a_{i},\delta_{i})\}_{i=1}^{n}$; maximum EM iterations $B$; step size $\eta$

While <not converged>:
    For $b\in\{1,2,...,B\}$:
        E-Step:
            Compute the posterior counts $\bm{\gamma}_{i}^{k}$ via Equation 6.
        M-Step:
            $\widehat{\bm{\beta}}^{+}\leftarrow\widehat{\bm{\beta}}-\eta\cdot\nabla_{\bm{\beta}}\mathcal{PL}(\bm{\beta},{\bm{w}};\mathcal{D})$
            $\widehat{\bm{w}}^{+}\leftarrow\widehat{\bm{w}}-\eta\cdot\nabla_{\bm{w}}\mathcal{PL}(\bm{\beta},{\bm{w}};\mathcal{D})$   ▷ Gradient descent update.
            Update $\widehat{\bm{\Lambda}}_{0}(t)^{+}$ via the weighted Breslow (1972) estimator of Equation 8.
            $\widehat{\bm{\theta}}^{+}\leftarrow\widehat{\bm{\theta}}-\eta\cdot\nabla_{\bm{\theta}}{\bm{B}}(\bm{\theta})$   ▷ Update $\bm{\theta}$ with the gradient of $\widehat{Q}$.
            $\widehat{\bm{\theta}}^{+}\leftarrow\mathsf{prox}_{\epsilon\eta}(\widehat{\bm{\theta}}^{+})$   ▷ Proximal update.

Return: learnt parameters $\bm{\Omega}$.

4 Experiments

In this section we describe the experiments conducted to benchmark the performance of SCS against alternative models for heterogeneous treatment effect estimation on multiple datasets, including a synthetic dataset and multiple large landmark clinical trials for cardiovascular diseases.

4.1 Simulation

Figure 2: a) Population-level Kaplan-Meier estimates of the survival distribution stratified by the treatment assignment. b) Distribution of the latent $Z$ in $X$ and the decision boundary recovered by SCS. c) Receiver Operating Characteristic of SCS in recovering the true phenotype.
Figure 3: The phenotypes recovered with Sparse Cox Subgrouping on the synthetic data. As expected, the recovered phenotypes conform to the modelling assumptions as in Figure 4.

In this section we first describe the performance of the proposed Sparse Cox Subgrouping approach on a synthetic dataset designed to demonstrate heterogeneous treatment effects. We randomly assign individuals to the treated or control group. The latent variable $Z$ is drawn from a uniform categorical distribution that determines the subgroup,

$$A\sim\mathrm{Bernoulli}(\nicefrac{1}{2}),\quad Z\sim\mathrm{Categorical}(\nicefrac{1}{3})$$

Conditioned on $Z$ we sample $X_{1:2}\sim\mathrm{Normal}(\bm{\mu}_{z},\bm{\sigma}_{z})$ as in Figure 2, which determine the conditional hazard ratios $\mathrm{HR}(k)$, and we additionally sample noisy covariates $X_{3:6}\sim\mathrm{Uniform}(-1,1)$. The true time-to-event $T$ and censoring times $C$ are then sampled as,

$$T|(X={\bm{x}},A={\bm{a}},Z={\bm{k}})\sim\mathrm{Gompertz}(\beta=1,\,\eta=0.25\cdot\mathrm{HR}(k)^{\bm{a}}),\quad C|T\sim\mathrm{Uniform}(0,T)$$

Finally, we sample the censoring indicator $\Delta\sim\mathrm{Bernoulli}(0.8)$ and set the observed time-to-event $U=T$ if $\Delta=1$; otherwise we set $U=C$.
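For concreteness, the following is a minimal sketch of this generative process. It assumes scipy's Gompertz parameterization matches the shape/rate convention above, and the group means, scales, and conditional hazard ratios are illustrative placeholders, since the paper does not list them.

```python
import numpy as np
from scipy.stats import gompertz

rng = np.random.default_rng(0)
n = 5000

# Illustrative (not the paper's) group-specific means and conditional hazard ratios HR(k).
mu = {0: [0.0, 0.0], 1: [1.5, 1.5], -1: [-1.5, 1.5]}
hr = {0: 1.0, 1: 0.5, -1: 2.0}

a = rng.binomial(1, 0.5, size=n)                           # A ~ Bernoulli(1/2)
z = rng.choice([0, 1, -1], size=n)                         # Z ~ Categorical(1/3)
x_signal = np.array([rng.normal(mu[k], 0.5) for k in z])   # X_{1:2} | Z
x_noise = rng.uniform(-1, 1, size=(n, 4))                  # X_{3:6}, noise covariates
x = np.hstack([x_signal, x_noise])

eta = 0.25 * np.array([hr[k] for k in z]) ** a             # Gompertz rate conditional on (Z, A), shape beta = 1
t = gompertz(c=eta).rvs(size=n, random_state=rng)          # true time-to-event
c = rng.uniform(0, t)                                      # censoring time C | T ~ Uniform(0, T)
delta = rng.binomial(1, 0.8, size=n)                       # censoring indicator
u = np.where(delta == 1, t, c)                             # observed time
```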

Figure 2 presents the ROC curves for SCS's ability to identify the groups with enhanced and diminished treatment effects, respectively. In Figure 3 we present Kaplan-Meier estimates of the time-to-event distributions conditioned on the $Z$ predicted by SCS. Clearly, SCS is able to identify the phenogroups corresponding to differential benefits.

4.2 Recovering subgroups demonstrating Heterogeneous Treatment Effects from Landmark studies of Cardiovascular Health

                 ALLHAT                     BARI2D
Size             18,102                     2,368
Outcome          Combined CVD               Death, MI or Stroke
Intervention     Lisinopril                 Medical Therapy
Control          Amlodipine                 Early Revascularization
Hazard Ratio     1.06 (1.01, 1.12)          1.02 (0.81, 1.14)
5-year RMST      -24.86 (-37.35, -8.89)     23.26 (-27.01, 64.84)

Figure 4: Event-free Kaplan-Meier survival curves stratified by the treatment assignment and summary statistics for the ALLHAT and BARI2D studies. (Combined CVD: Coronary Heart Disease, Stroke, other treated angina, fatal or non-fatal Heart Failure, and Peripheral Arterial Disease.)
Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack

(Furberg et al., 2002)

The ALLHAT study was a large randomized experiment conducted to assess the efficacy of multiple classes of blood pressure lowering medicines for patients with hypertension in reducing the risk of adverse cardiovascular conditions. We considered a subset of patients from the original ALLHAT study who were randomized to receive either Amlodipine (a calcium channel blocker) or Lisinopril (an angiotensin-converting enzyme inhibitor). Overall, Amlodipine was found to be more efficacious than Lisinopril in reducing the combined risk of cardiovascular disease.

Bypass Angioplasty Revascularization Investigation in Type II Diabetes

(Group, 2009)

Diabetic patients have traditionally been known to be at higher risk of cardiovascular disease; however, the appropriate intervention for diabetics with ischemic heart disease, between surgical coronary revascularization and management with medical therapy, is widely debated. BARI2D was a large landmark experiment conducted to assess efficacy between these two possible medical interventions. Overall, BARI2D was inconclusive in establishing the appropriate therapy, between coronary revascularization and medical management, for patients with Type-II diabetes.

Figure 4 presents the event-free survival rates as well as the summary statistics for the studies. In our experiments, we included a large set of confounders collected at the baseline visit of the patients, which we utilize to train the proposed model. A full list of these features is in Appendix A.

4.3 Baselines

Cox PH with 1\ell_{1} Regularized Treatment Interaction (cox-int)


We model treatment effect heterogeneity via interaction terms in a proportional hazards model of the time-to-event distribution, as in Kehl and Ulm (2006). Thus,

$$\bm{\lambda}(t|X={\bm{x}},A={\bm{a}})=\bm{\lambda}_{0}(t)\exp\big(\bm{\beta}^{\top}{\bm{x}}+{\bm{a}}\cdot\bm{\theta}^{\top}{\bm{x}}\big) \quad (10)$$

The interaction effects $\bm{\theta}$ are regularized with a lasso penalty in order to recover a sparse phenotyping rule defined as $G(\bm{x})=\bm{\theta}^{\top}\bm{x}$.
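A minimal Python sketch of this baseline is given below; the paper fits these models with glmnet in R, so lifelines' penalized Cox regression is used here only as a stand-in, the column names are illustrative, and (unlike the baseline described above) the penalty below is applied to all coefficients rather than only the interaction terms.

```python
import pandas as pd
from lifelines import CoxPHFitter

def fit_cox_int(df, covariates, penalizer=0.1):
    """Cox PH with l1-regularized treatment-interaction terms (Equation 10).

    df must contain the covariates plus 'time', 'event' and a binary 'treatment' column.
    """
    data = df[covariates + ["time", "event", "treatment"]].copy()
    for c in covariates:                                   # build a * x interaction features
        data[f"{c}_x_treatment"] = data[c] * data["treatment"]
    cph = CoxPHFitter(penalizer=penalizer, l1_ratio=1.0)   # pure lasso penalty
    cph.fit(data, duration_col="time", event_col="event")
    interaction_cols = [f"{c}_x_treatment" for c in covariates]
    return cph, cph.params_[interaction_cols]              # theta: coefficients of the phenotyping rule

# The phenotyping rule G(x) = theta^T x is the dot product of the returned interaction
# coefficients with a new patient's covariate vector.
```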

Binary Classifier with 1\ell_{1} Regularized Treatment Interaction (bin-int)


Instead of modelling the time-to-event distribution, we directly model the thresholded survival outcome $Y=\bm{1}\{T<t\}$ at a five-year time horizon using a log-linear parameterization with a logit link function. As compared to cox-int, this model ignores the data points that were right-censored prior to the thresholded time horizon; however, it is not sensitive to the strong assumption of proportional hazards.

$$\mathbb{E}[T>t|X={\bm{x}},A={\bm{a}}]=\sigma(\bm{\beta}^{\top}{\bm{x}}+\bm{\beta}_{0}+{\bm{a}}\cdot\bm{\theta}^{\top}{\bm{x}}),\quad\text{where }\sigma(\cdot)\text{ is the logistic link function.} \quad (11)$$
Cox PH T-Learner with 1\ell_{1} Regularized Logistic Regression (cox-tlr)


We train two separate Cox regression models on the treated and control arms (T-Learner) to estimate the potential outcomes under treatment $(A=1)$ and control $(A=0)$. Motivated by the ‘Virtual Twins’ approach of Foster et al. (2011), a logistic regression with an $\ell_{1}$ penalty is trained to estimate whether the risk of the potential outcome under treatment is higher than under control. This logistic regression is then employed as the phenotyping function $G(\cdot)$, given as,

$$G({\bm{x}})=\mathbb{E}[\bm{1}\{f_{1}({\bm{x}},t)>f_{0}({\bm{x}},t)\}|X={\bm{x}}],\quad\text{where }f_{\bm{a}}({\bm{x}},t)=\mathbb{P}(T>t|\text{do}(A=\bm{a}),X={\bm{x}}). \quad (12)$$

The above models involving sparse 1\ell_{1} regularization were trained with the glmnet (Friedman et al., 2009) package in R.
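As a hedged illustration of the cox-tlr baseline, the sketch below substitutes lifelines and scikit-learn for the glmnet fits used in the paper; the five-year horizon and column names are assumptions.

```python
from lifelines import CoxPHFitter
from sklearn.linear_model import LogisticRegression

def fit_cox_tlr(df, covariates, horizon=5 * 365.25):
    """Virtual-twins style T-learner: two Cox models plus an l1 logistic phenotyping function."""
    arms = {}
    for a in (0, 1):                                       # separate Cox model per treatment arm
        cph = CoxPHFitter()
        cph.fit(df.loc[df["treatment"] == a, covariates + ["time", "event"]],
                duration_col="time", event_col="event")
        arms[a] = cph

    # Estimated potential survival at the horizon under control (f_0) and treatment (f_1).
    s0 = arms[0].predict_survival_function(df[covariates], times=[horizon]).values.ravel()
    s1 = arms[1].predict_survival_function(df[covariates], times=[horizon]).values.ravel()

    # Phenotyping function G: l1-regularized logistic regression on 1{f_1 > f_0}.
    y = (s1 > s0).astype(int)
    pheno = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
    pheno.fit(df[covariates], y)
    return arms, pheno
```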

The ACC/AHA Long-term Atherosclerotic Cardiovascular Risk Estimate
(https://tools.acc.org/ascvd-risk-estimator-plus/)


The American College of Cardiology and the American Heart Association model for estimation of the risk of atherosclerotic disease (Goff Jr et al., 2014) involves pooling data from multiple observational cohorts of patients, followed by modelling the 10-year risk of an adverse cardiovascular condition, including death from coronary heart disease, non-fatal myocardial infarction, or non-fatal stroke. While the risk model was originally developed to assess factual risk in the observational sense, in practice it is also employed to assess risk when making counterfactual decisions.

Amlodipine versus Lisinopril in the ALLHAT Study

Figure 5: Conditional Average Treatment Effect in Hazard Ratio versus subgroup size for the latent phenogroups extracted from the ALLHAT study, at sparsity levels $||\bm{\theta}||_{0}\leq 5$, $||\bm{\theta}||_{0}\leq 10$, and with no sparsity.

Early Revascularization versus Medical Therapy in the BARI2D Study

Figure 6: Conditional Average Treatment Effect in Hazard Ratio versus subgroup size for the latent phenogroups extracted from the BARI2D study, at sparsity levels $||\bm{\theta}||_{0}\leq 5$, $||\bm{\theta}||_{0}\leq 10$, and with no sparsity.

4.4 Results and Discussion

Protocol

We compare the performance of SCS and the competing methods in the recovery of subgroups with enhanced (or diminished) treatment effects. For each of these studies we stratify the study population into equal-sized sets for training and validation, while preserving the proportion of individuals that were assigned to treatment and that experienced the outcome in the follow-up period. The models were trained on the training set and validated on the held-out test set. For each of the methods we experiment with models that do not enforce any sparsity ($\bm{\epsilon}=0$), as well as tune the level of sparsity to recover phenotyping functions that involve 5 and 10 features. The subgroup sizes are varied by controlling the threshold at which an individual is assigned to a group. Finally, the treatment effect is compared in terms of hazard ratios, risk differences, as well as restricted mean survival time over a 5-year event period.
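As an illustration of this protocol, the sketch below (using lifelines, with illustrative column names) estimates the treatment effect in hazard ratio within the top-q fraction of patients ranked by a phenotyping score; sweeping q over {0.2, 0.4, 0.6, 0.8} corresponds to the subgroup sizes reported in the figures.

```python
import numpy as np
from lifelines import CoxPHFitter

def subgroup_hazard_ratio(df, score, q=0.2):
    """Hazard ratio of treatment vs control within the top-q fraction by phenotyping score.

    df    : held-out data with 'time', 'event' and a binary 'treatment' column
    score : per-patient phenotyping score (e.g., G(x)); larger means more likely in the subgroup
    """
    threshold = np.quantile(score, 1 - q)                  # vary q to sweep the subgroup size
    subgroup = df.loc[score >= threshold, ["time", "event", "treatment"]]
    cph = CoxPHFitter()
    cph.fit(subgroup, duration_col="time", event_col="event")
    return np.exp(cph.params_["treatment"])                # conditional hazard ratio in the subgroup
```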

Results

We present the results of SCS versus the baselines in terms of hazard ratios on the ALLHAT and BARI2D datasets in Figures 5 and 6. In the case of ALLHAT, SCS consistently recovered phenogroups with more pronounced (or diminished) treatment effects. On external validation on the held-out dataset, we found a subgroup of patients that had similar outcomes whether assigned to Lisinopril or Amlodipine, whereas the other subgroup clearly identified patients that were harmed by Lisinopril. The group harmed by Lisinopril had higher diastolic blood pressure. On the other hand, patients with lower kidney function did not seem to benefit from Amlodipine.

In the case of BARI2D, SCS recovered phenogroups that were both harmed by and benefitted from medical therapy alone, without revascularization. The patients who were harmed by medical therapy were typically older; on the other hand, the patients who benefitted primarily included patients who were otherwise assigned to receive PCI instead of CABG revascularization, suggesting PCI to be harmful for diabetic patients.

Tables 3 and 4 present the features that were selected by the proposed model for the studies. Additionally, we report tabulated results involving metrics like risk difference and restricted mean survival time in Appendix C.

5 Concluding Remarks

We presented Sparse Cox Subgrouping (SCS), a latent variable approach to recover subgroups of patients that respond differentially to an intervention in the presence of censored time-to-event outcomes. As compared to alternative approaches to learning parsimonious hypotheses in such settings, our proposed model recovered hypotheses with more pronounced treatment effects, which we validated on multiple studies of cardiovascular health.

While powerful in its ability to recover parsimonious subgroups, SCS has limitations in its current form. The model relies on proportional hazards and may be mis-specified when the proportional hazards assumption is violated, as is evident in many real-world clinical studies (Maron et al., 2018; Bretthauer et al., 2022). Another limitation is that SCS in its current form considers only a single endpoint (typically death, or a composite of multiple adverse outcomes). In practice, however, real-world studies typically involve multiple endpoints. We envision that extensions of SCS would allow patient subgrouping across multiple endpoints, leading to the discovery of actionable sub-populations that similarly benefit from the intervention under assessment.

References

  • Alaa and Van Der Schaar (2017) Ahmed M Alaa and Mihaela Van Der Schaar. Bayesian inference of individualized treatment effects using multi-task gaussian processes. Advances in neural information processing systems, 30, 2017.
  • Binder (1992) David A Binder. Fitting cox’s proportional hazards models from survey data. Biometrika, 79(1):139–147, 1992.
  • Breslow (1972) Norman E Breslow. Contribution to discussion of paper by dr cox. J. Roy. Statist. Soc., Ser. B, 34:216–217, 1972.
  • Bretthauer et al. (2022) Michael Bretthauer, Magnus Løberg, Paulina Wieszczy, Mette Kalager, Louise Emilsson, Kjetil Garborg, Maciej Rupinski, Evelien Dekker, Manon Spaander, Marek Bugajski, et al. Effect of colonoscopy screening on risks of colorectal cancer and related death. New England Journal of Medicine, 2022.
  • Buse et al. (2007) John B Buse, ACCORD Study Group, et al. Action to control cardiovascular risk in diabetes (accord) trial: design and methods. The American journal of cardiology, 99(12):S21–S33, 2007.
  • Chapfuwa et al. (2021) Paidamoyo Chapfuwa, Serge Assaad, Shuxi Zeng, Michael J Pencina, Lawrence Carin, and Ricardo Henao. Enabling counterfactual survival analysis with balanced representations. In Proceedings of the Conference on Health, Inference, and Learning, pages 133–145, 2021.
  • Chen (2009) Yi-Hau Chen. Weighted breslow-type and maximum likelihood estimation in semiparametric transformation models. Biometrika, 96(3):591–600, 2009.
  • Cox (1972) David R Cox. Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological), 34(2):187–202, 1972.
  • Crabbé et al. (2022) Jonathan Crabbé, Alicia Curth, Ioana Bica, and Mihaela van der Schaar. Benchmarking heterogeneous treatment effect models through the lens of interpretability. arXiv preprint arXiv:2206.08363, 2022.
  • Curth et al. (2021) Alicia Curth, Changhee Lee, and Mihaela van der Schaar. Survite: Learning heterogeneous treatment effects from time-to-event data. Advances in Neural Information Processing Systems, 34:26740–26753, 2021.
  • Dusseldorp and Mechelen (2014) Elise Dusseldorp and Iven Mechelen. Qualitative interaction trees: A tool to identify qualitative treatment-subgroup interactions. Statistics in medicine, 33, 01 2014. 10.1002/sim.5933.
  • Foster et al. (2011) Jared C Foster, Jeremy MG Taylor, and Stephen J Ruberg. Subgroup identification from randomized clinical trial data. Statistics in medicine, 30(24):2867–2880, 2011.
  • Friedman et al. (2009) Jerome Friedman, Trevor Hastie, Rob Tibshirani, et al. glmnet: Lasso and elastic-net regularized generalized linear models. R package version, 1(4):1–24, 2009.
  • Friedman et al. (2010) Jerome Friedman, Trevor Hastie, and Robert Tibshirani. A note on the group lasso and a sparse group lasso. arXiv preprint arXiv:1001.0736, 2010.
  • Furberg et al. (2002) Curt D Furberg et al. Major outcomes in high-risk hypertensive patients randomized to angiotensin-converting enzyme inhibitor or calcium channel blocker vs diuretic: the antihypertensive and lipid-lowering treatment to prevent heart attack trial (allhat). Journal of the American Medical Association, 2002.
  • Goff Jr et al. (2014) David C Goff Jr, Donald M Lloyd-Jones, Glen Bennett, Sean Coady, Ralph B D’agostino, Raymond Gibbons, Philip Greenland, Daniel T Lackland, Daniel Levy, Christopher J O’donnell, et al. 2013 acc/aha guideline on the assessment of cardiovascular risk: a report of the american college of cardiology/american heart association task force on practice guidelines. Circulation, 129(25_suppl_2):S49–S73, 2014.
  • Group (2009) BARI 2D Study Group. A randomized trial of therapies for type 2 diabetes and coronary artery disease. New England Journal of Medicine, 360(24):2503–2515, 2009.
  • Herrington et al. (2016) William Herrington, Ben Lacey, Paul Sherliker, Jane Armitage, and Sarah Lewington. Epidemiology of atherosclerosis and the potential to reduce the global burden of atherothrombotic disease. Circulation research, 118(4):535–546, 2016.
  • Johansson et al. (2020) Fredrik D Johansson, Uri Shalit, Nathan Kallus, and David Sontag. Generalization bounds and representation learning for estimation of potential outcomes and causal effects. arXiv preprint arXiv:2001.07426, 2020.
  • Kehl and Ulm (2006) Victoria Kehl and Kurt Ulm. Responder identification in clinical trials with censored data. Computational Statistics & Data Analysis, 50(5):1338–1355, 2006.
  • Lee et al. (2020) Kwonsang Lee, Falco J Bargagli-Stoffi, and Francesca Dominici. Causal rule ensemble: Interpretable inference of heterogeneous treatment effects. arXiv preprint arXiv:2009.09036, 2020.
  • Lin (2007) DY Lin. On the breslow estimator. Lifetime data analysis, 13(4):471–480, 2007.
  • Lipkovich et al. (2011) Ilya Lipkovich, Alex Dmitrienko, Jonathan Denne, and Gregory Enas. Subgroup identification based on differential effect search (sides) – a recursive partitioning method for establishing response to treatment in patient subpopulations. Statistics in medicine, 30:2601–21, 07 2011. 10.1002/sim.4289.
  • Louizos et al. (2017) Christos Louizos, Uri Shalit, Joris M Mooij, David Sontag, Richard Zemel, and Max Welling. Causal effect inference with deep latent-variable models. Advances in neural information processing systems, 30, 2017.
  • Maron et al. (2018) David J Maron, Judith S Hochman, Sean M O’Brien, Harmony R Reynolds, William E Boden, Gregg W Stone, Sripal Bangalore, John A Spertus, Daniel B Mark, Karen P Alexander, et al. International study of comparative health effectiveness with medical and invasive approaches (ischemia) trial: rationale and design. American heart journal, 201:124–135, 2018.
  • Morucci et al. (2020) Marco Morucci, Vittorio Orlandi, Sudeepa Roy, Cynthia Rudin, and Alexander Volfovsky. Adaptive hyper-box matching for interpretable individualized treatment effect estimation. In Conference on Uncertainty in Artificial Intelligence, pages 1089–1098. PMLR, 2020.
  • Nagpal et al. (2020) Chirag Nagpal, Dennis Wei, Bhanukiran Vinzamuri, Monica Shekhar, Sara E Berger, Subhro Das, and Kush R Varshney. Interpretable subgroup discovery in treatment effect estimation with application to opioid prescribing guidelines. In Proceedings of the ACM Conference on Health, Inference, and Learning, pages 19–29, 2020.
  • Nagpal et al. (2021) Chirag Nagpal, Steve Yadlowsky, Negar Rostamzadeh, and Katherine Heller. Deep cox mixtures for survival regression. In Machine Learning for Healthcare Conference, pages 674–708. PMLR, 2021.
  • Nagpal et al. (2022a) Chirag Nagpal, Mononito Goswami, Keith Dufendach, and Artur Dubrawski. Counterfactual phenotyping with censored time-to-events. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD ’22, page 3634–3644, New York, NY, USA, 2022a. Association for Computing Machinery. ISBN 9781450393850. 10.1145/3534678.3539110. URL https://doi.org/10.1145/3534678.3539110.
  • Nagpal et al. (2022b) Chirag Nagpal, Willa Potosnak, and Artur Dubrawski. auton-survival: an open-source package for regression, counterfactual estimation, evaluation and phenotyping with censored time-to-event data. In Proceedings of the 7th Machine Learning for Healthcare Conference, volume 182 of Proceedings of Machine Learning Research, pages 585–608. PMLR, 05–06 Aug 2022b. URL https://proceedings.mlr.press/v182/nagpal22a.html.
  • Shalit et al. (2017) Uri Shalit, Fredrik D Johansson, and David Sontag. Estimating individual treatment effect: generalization bounds and algorithms. In International Conference on Machine Learning, pages 3076–3085. PMLR, 2017.
  • Su et al. (2009) Xiaogang Su, Chih-Ling Tsai, Hansheng Wang, David Nickerson, and Bogong Li. Subgroup analysis via recursive partitioning. Journal of Machine Learning Research, 10:141–158, 02 2009. 10.2139/ssrn.1341380.
  • Wager and Athey (2018) Stefan Wager and Susan Athey. Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523):1228–1242, 2018.
  • Wang and Rudin (2022) Tong Wang and Cynthia Rudin. Causal rule sets for identifying subgroups with enhanced treatment effects. INFORMS Journal on Computing, 2022.
  • Wu et al. (2022) Han Wu, Sarah Tan, Weiwei Li, Mia Garrard, Adam Obeng, Drew Dimmery, Shaun Singh, Hanson Wang, Daniel Jiang, and Eytan Bakshy. Interpretable personalized experimentation. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 4173–4183, 2022.
  • Xu et al. (2022) Yizhe Xu, Nikolaos Ignatiadis, Erik Sverdrup, Scott Fleming, Stefan Wager, and Nigam Shah. Treatment heterogeneity with survival outcomes. arXiv preprint arXiv:2207.07758, 2022.

Appendix A Additional Details on the ALLHAT and BARI 2D Case Studies

Tables 1 and 2 list the confounding variables used from the ALLHAT and BARI2D trials, respectively.

Name Description
ETHNIC Ethnicity
SEX Sex of Participant
ESTROGEN Estrogen supplementation
BLMEDS Antihypertensive treatment
MISTROKE History of Stroke
HXCABG History of coronary artery bypass
STDEPR Prior ST depression/T-wave inversion
OASCVD Other atherosclerotic cardiovascular disease
DIABETES Prior history of Diabetes
HDLLT35 HDL cholesterol <35mg/dl; 2x in past 5 years
LVHECG LVH by ECG in past 2 years
WALL25 LVH by ECG in past 2 years
LCHD History of CHD at baseline
CURSMOKE Current smoking status.
ASPIRIN Aspirin use
LLT Lipid-lowering trial
AGE Age upon entry
BLWGT Weight upon entry
BLHGT Height upon entry
BLBMI Body Mass Index upon entry
BV2SBP Baseline SBP
BV2DBP Baseline DBP
APOTAS Baseline serum potassium
BLGFR Baseline est glomerular filtration rate
ACHOL Total Cholesterol
AHDL Baseline HDL Cholesterol
AFGLUC Baseline fasting serum glucose
Table 1: List of confounding variables used for experiments involving the ALLHAT dataset.
Name Description
hxmi History of MI
age Age upon entry
dbp_stand Standing diastolic BP
sbp_stand Standing systolic BP
sex Sex
asp Aspirin use
smkcat Cigarette smoking category
betab Beta blocker use
ccb Calcium blocker use
hxhtn History of hypertension requiring tx
insulin Insulin use
weight Weight (kg) upon entry
bmi BMI upon entry
qabn Abnormal Q-Wave
trig Triglycerides (mg/dl) upon entry
dmdur Duration of diabetes mellitus
ablvef Left ventricular ejection fraction <50%
race Race
priorrev Prior revascularization
hxcva Cerebrovascular accident
screat Serum creatinine (mg/dl)
hmg Statin
hxhypo History of hypoglycemic episode
hba1c Hemoglobin A1c(%)
priorstent Prior stent
spotass Serum Potassium(mEq/L)
hispanic Hispanic ethnicity
tchol Total Cholesterol
hdl HDL Cholesterol
insul_circ Circulating insulin (IU/ml)
tzd Thiazolidinedione
ldl LDL Cholesterol
tabn Abnormal T-waves
nsgn Nonsublingual nitrate
sulf Sulfonylurea
hxchf History of congestive heart failure req tx
arb Angiotensin receptor blocker
acr Urine albumin/creatinine ratio mg/g
diur Diuretic
apa Anti-platelet
hxchl Hypercholesterolemia req tx
acei ACE inhibitor
abilow Low ABI (<= 0.9)
biguanide Biguanide
stabn Abnormal ST depression
Table 2: List of confounding variables used for experiments involving the BARI2D dataset.

ALLHAT
Name Description
BV2SBP Baseline Seated Diastolic Pressure
BLGFR Baseline est Glomerular Filtration Rate
BLMEDS Antihypertensive Treatment
CURSMOKE Current Smoking Status
SEX Sex of Participant

BARI2D
Name Description
age Age upon entry
asp Aspirin use
hxhtn History of hypertension requiring tx
hxchl Hypercholesterolemia req tx
priorstent Prior stent

Table 3: List of selected features with sparsity level: $||\bm{\theta}||_{0}\leq 5$

ALLHAT
Name Description
BV2SBP Baseline Seated Diastolic Pressure
BLGFR Baseline est Glomerular Filtration Rate
BLMEDS Antihypertensive Treatment
CURSMOKE Current Smoking Status
SEX Sex of Participant
ASPIRIN Aspirin Use
ACHOL Total Cholesterol
BLWGT Weight upon entry
BMI Body mass index upon entry
OASCVD Other atherosclerotic cardiovascular disease

BARI2D
Name Description
age Age upon entry
asp Aspirin use
hxhtn History of hypertension requiring tx
hxchl Hypercholesterolemia req tx
priorstent Prior stent
acei ACE Inhibitor
acr Urine albumin/creatinine ratio mg/g
insul_circ Circulating insulin
screat Serum creatinine (mg/dl)
tchol Total Cholesterol

Table 4: List of selected features with sparsity level: $||\bm{\theta}||_{0}\leq 10$

Appendix B Derivation of the Inference Algorithm

Censored Instances: Note that in the case of the censored instances we condition on the thresholded survival $(T>{\bm{u}})$. The posterior counts thus reduce to:

$$\bm{\gamma}^{k}=\mathbb{P}(Z=k|X={\bm{x}},A={\bm{a}},T>{\bm{u}})=\frac{\mathbb{P}(T>{\bm{t}}|Z={\bm{k}},X={\bm{x}},A={\bm{a}})\,\bm{p}(Z={\bm{k}}|X={\bm{x}})}{\sum_{k}\mathbb{P}(T>{\bm{t}}|Z={\bm{k}},X={\bm{x}},A={\bm{a}})\,\mathbb{P}(Z={\bm{k}}|X={\bm{x}})} \quad (13)$$
Here, $\mathbb{P}(T>{\bm{t}}|Z={\bm{k}},X={\bm{x}},A={\bm{a}})=\exp\big(-\bm{\Lambda}(t)\big)^{\bm{h}({\bm{x}},{\bm{a}},k)}$.

Uncensored Instances: The posteriors are $\bm{\gamma}^{k}=\bm{p}_{\bm{\theta}}(Z=k|X={\bm{x}},T={\bm{u}})$.

Posteriors for the uncensored data are more involved and involve the base hazard $\bm{\lambda}_{0}(\cdot)$; however, as it cancels in the ratio below, they are in fact independent of the base hazard function $\bm{\lambda}_{0}(\cdot)$:

$$\bm{\gamma}^{k}=\frac{\bm{\lambda}_{0}(u)\,\bm{h}_{k}({\bm{x}},\bm{a})\,{\bm{S}}_{0}(u)^{\bm{h}_{k}({\bm{x}},\bm{a})}}{\sum_{k}\bm{\lambda}_{0}(u)\,\bm{h}_{k}({\bm{x}},\bm{a})\,{\bm{S}}_{0}(u)^{\bm{h}_{k}({\bm{x}},\bm{a})}}=\frac{\bm{h}_{k}({\bm{x}},\bm{a})\,{\bm{S}}_{0}(u)^{\bm{h}_{k}({\bm{x}},\bm{a})}}{\sum_{k}\bm{h}_{k}({\bm{x}},\bm{a})\,{\bm{S}}_{0}(u)^{\bm{h}_{k}({\bm{x}},\bm{a})}}$$

Combining Equation 13 with the expression above, we arrive at the following estimate for the posterior counts,

$$\bm{\gamma}^{k}=\widehat{\mathbb{P}}(Z=k|X={\bm{x}},A={\bm{a}},{\bm{u}})=\frac{\mathbb{P}({\bm{u}}|Z={\bm{k}},X={\bm{x}},A={\bm{a}})\,\mathbb{P}(Z={\bm{k}}|X={\bm{x}})}{\sum_{k}\mathbb{P}({\bm{u}}|Z={\bm{k}},X={\bm{x}},A={\bm{a}})\,\mathbb{P}(Z={\bm{k}}|X={\bm{x}})}=\frac{{\bm{h}}({\bm{x}},{\bm{a}},{\bm{k}})^{\delta_{i}}\,\widehat{{\bm{S}}}_{0}({\bm{u}})^{{\bm{h}}({\bm{x}},{\bm{a}},{\bm{k}})}\exp(\bm{\theta}_{\bm{k}}^{\top}\bm{x})}{\sum_{j\in\mathcal{Z}}{\bm{h}}({\bm{x}},{\bm{a}},{\bm{j}})^{\delta_{i}}\,\widehat{{\bm{S}}}_{0}({\bm{u}})^{{\bm{h}}({\bm{x}},{\bm{a}},{\bm{j}})}\exp(\bm{\theta}_{\bm{j}}^{\top}\bm{x})}. \quad (14)$$

Appendix C Additional Results

Figures 7, 8, and 9 present tabulated metrics on ALLHAT in terms of Hazard Ratio, Risk Difference and Restricted Mean Survival Time, respectively. Figures 10, 11, and 12 present tabulated metrics on BARI2D in terms of Hazard Ratio, Risk Difference and Restricted Mean Survival Time, respectively.

$||\bm{\theta}||_{0}\leq 5$

20% 40% 60% 80%
SCS 1.31±0.11 1.22±0.09 1.13±0.04 1.07±0.06
COX-INT 1.17±0.12 1.09±0.06 1.06±0.06 1.05±0.05
COX-TLR 1.26±0.12 1.11±0.08 1.05±0.06 1.04±0.04
BIN-INT 1.12±0.11 1.08±0.06 1.08±0.06 1.05±0.05
ASCVD 1.07±0.1 1.09±0.07 1.02±0.06 1.06±0.05

20% 40% 60% 80%
SCS 1.03±0.08 0.95±0.06 0.98±0.05 1.01±0.04
COX-INT 1.05±0.1 1.04±0.05 1.04±0.05 1.04±0.05
COX-TLR 1.07±0.09 1.05±0.06 1.02±0.05 1.02±0.05
BIN-INT 1.08±0.1 1.03±0.08 1.05±0.05 1.05±0.04
ASCVD 1.07±0.1 1.09±0.07 1.02±0.06 1.06±0.05

$||\bm{\theta}||_{0}\leq 10$

20% 40% 60% 80%
SCS 1.51±0.14 1.26±0.08 1.07±0.06 1.07±0.05
COX-INT 1.25±0.12 1.22±0.08 1.09±0.06 1.04±0.05
COX-TLR 1.43±0.15 1.11±0.08 1.07±0.05 1.04±0.05
BIN-INT 1.13±0.11 1.05±0.05 1.06±0.05 1.04±0.04
ASCVD 1.07±0.1 1.09±0.07 1.02±0.06 1.06±0.05

20% 40% 60% 80%
SCS 1.05±0.07 1.04±0.07 0.99±0.04 1.0±0.04
COX-INT 1.08±0.1 1.05±0.06 0.98±0.05 1.02±0.05
COX-TLR 1.12±0.1 1.05±0.05 1.05±0.06 1.02±0.05
BIN-INT 1.07±0.1 1.06±0.06 1.07±0.06 1.03±0.05
ASCVD 1.07±0.1 1.09±0.07 1.02±0.06 1.06±0.05

No Sparsity

20% 40% 60% 80%
SCS 1.37±0.16 1.22±0.09 1.1±0.07 1.06±0.04
COX-INT 1.42±0.17 1.2±0.08 1.09±0.05 1.06±0.05
COX-TLR 1.37±0.14 1.12±0.07 1.06±0.06 1.05±0.05
BIN-INT 1.13±0.11 1.05±0.07 1.02±0.06 1.02±0.04
ASCVD 1.07±0.1 1.09±0.07 1.02±0.06 1.06±0.05

20% 40% 60% 80%
SCS 1.07±0.07 1.02±0.06 1.0±0.04 1.01±0.05
COX-INT 1.05±0.06 1.02±0.05 1.01±0.05 0.99±0.04
COX-TLR 1.1±0.1 1.05±0.06 1.03±0.05 1.0±0.04
BIN-INT 1.15±0.09 1.09±0.05 1.07±0.06 1.03±0.05
ASCVD 1.07±0.1 1.09±0.07 1.02±0.06 1.06±0.05

Figure 7: Conditional Average Treatment Effect in Hazard Ratio versus subgroup size for the latent phenogroups extracted from the ALLHAT study.

$||\bm{\theta}||_{0}\leq 5$

20% 40% 60% 80%
SCS -0.05±0.02 -0.03±0.02 -0.01±0.01 -0.0±0.01
COX-INT -0.06±0.02 -0.02±0.01 -0.01±0.01 -0.0±0.01
COX-TLR -0.06±0.02 -0.02±0.02 -0.0±0.01 -0.0±0.01
BIN-INT -0.01±0.02 -0.01±0.01 0.0±0.01 0.0±0.01
ASCVD -0.01±0.03 -0.01±0.02 0.01±0.02 -0.01±0.01

20% 40% 60% 80%
SCS -0.02±0.02 -0.0±0.02 0.0±0.01 0.01±0.01
COX-INT -0.02±0.02 -0.0±0.02 0.0±0.01 0.01±0.01
COX-TLR -0.02±0.02 -0.01±0.02 0.0±0.01 0.01±0.01
BIN-INT -0.02±0.02 -0.01±0.01 -0.01±0.02 -0.0±0.01
ASCVD -0.01±0.03 -0.01±0.02 0.01±0.02 -0.01±0.01

$||\bm{\theta}||_{0}\leq 10$

20% 40% 60% 80%
SCS -0.07±0.02 -0.04±0.02 -0.0±0.01 -0.0±0.01
COX-INT -0.03±0.02 -0.03±0.01 -0.01±0.01 0.0±0.01
COX-TLR -0.06±0.02 -0.01±0.02 -0.0±0.01 -0.0±0.01
BIN-INT -0.02±0.02 -0.01±0.01 -0.01±0.01 -0.0±0.01
ASCVD -0.01±0.03 -0.01±0.02 0.01±0.02 -0.01±0.01

20% 40% 60% 80%
SCS -0.02±0.02 -0.01±0.02 0.01±0.01 0.01±0.01
COX-INT -0.02±0.03 -0.01±0.02 0.01±0.01 0.0±0.01
COX-TLR -0.02±0.03 -0.01±0.02 -0.01±0.02 0.0±0.01
BIN-INT -0.01±0.02 -0.0±0.02 -0.01±0.02 -0.0±0.01
ASCVD -0.01±0.03 -0.01±0.02 0.01±0.02 -0.01±0.01

No Sparsity

20% 40% 60% 80%
SCS -0.05±0.02 -0.04±0.02 -0.02±0.01 -0.01±0.01
COX-INT -0.02±0.02 -0.0±0.01 -0.0±0.01 -0.0±0.01
COX-TLR -0.04±0.02 -0.01±0.01 -0.0±0.01 0.0±0.01
BIN-INT -0.02±0.02 -0.01±0.02 -0.01±0.01 -0.0±0.01
ASCVD -0.01±0.03 -0.01±0.02 0.01±0.02 -0.01±0.01

20% 40% 60% 80%
SCS -0.01±0.02 0.02±0.02 0.01±0.01 0.0±0.01
COX-INT -0.01±0.03 -0.01±0.01 -0.01±0.01 -0.0±0.01
COX-TLR -0.03±0.02 -0.01±0.02 -0.0±0.01 0.0±0.01
BIN-INT -0.02±0.02 -0.01±0.02 -0.0±0.01 -0.0±0.01
ASCVD -0.01±0.03 -0.01±0.02 0.01±0.02 -0.01±0.01

Figure 8: Conditional Average Treatment Effect in Risk versus subgroup size for the latent phenogroups extracted from the ALLHAT study.

$||\bm{\theta}||_{0}\leq 5$

20% 40% 60% 80%
SCS -80.91±24.81 -60.39±17.44 -32.92±15.67 -24.33±11.35
COX-INT -79.47±22.1 -55.03±15.93 -30.2±13.11 -22.46±11.68
COX-TLR -88.58±22.92 -37.01±15.97 -24.48±15.29 -22.49±12.61
BIN-INT -19.52±22.12 -14.56±15.51 -14.59±15.14 -12.92±10.25
ASCVD -18.81±30.57 -29.74±18.53 -13.37±18.52 -28.73±14.45

20% 40% 60% 80%
SCS -38.04±22.85 -16.19±18.4 -4.54±14.54 -9.46±12.27
COX-INT -22.11±23.05 -19.93±18.72 -9.21±15.22 -7.63±14.01
COX-TLR -37.29±28.39 -22.86±19.1 -16.42±13.25 -8.0±12.11
BIN-INT -57.22±25.47 -34.3±16.07 -35.18±16.41 -22.37±14.36
ASCVD -18.81±30.57 -29.74±18.53 -13.37±18.52 -28.73±14.45

$||\bm{\theta}||_{0}\leq 10$

20% 40% 60% 80%
SCS -101.97±20.57 -69.01±17.72 -28.38±14.82 -26.13±12.52
COX-INT -57.8±22.46 -53.56±16.89 -27.46±13.79 -17.04±13.01
COX-TLR -74.04±19.52 -28.19±17.6 -25.25±13.57 -18.17±13.06
BIN-INT -21.45±33.0 -15.78±13.54 -18.12±14.34 -21.48±12.17
ASCVD -18.81±30.57 -29.74±18.53 -13.37±18.52 -28.73±14.45

20% 40% 60% 80%
SCS -27.85±23.85 -15.3±20.0 -2.78±12.58 -8.69±12.72
COX-INT -35.66±31.92 -26.32±19.65 -5.07±16.91 -14.43±13.43
COX-TLR -50.44±29.24 -26.65±15.8 -25.8±19.21 -18.25±13.94
BIN-INT -30.1±26.83 -36.65±17.93 -34.47±16.25 -21.25±13.13
ASCVD -18.81±30.57 -29.74±18.53 -13.37±18.52 -28.73±14.45

No Sparsity

20% 40% 60% 80%
SCS -85.16±23.76 -69.9±20.1 -44.31±10.2 -31.01±15.13
COX-INT -44.56±24.49 -28.22±15.64 -23.32±15.2 -21.78±13.45
COX-TLR -60.94±24.58 -38.11±18.07 -25.09±14.44 -20.61±11.56
BIN-INT -20.17±27.07 -21.04±17.57 -24.72±14.28 -20.06±12.56
ASCVD -18.81±30.57 -29.74±18.53 -13.37±18.52 -28.73±14.45

20% 40% 60% 80%
SCS 1.74±24.81 7.5±18.37 -0.1±13.34 -11.88±12.05
COX-INT -27.49±32.86 -22.41±19.13 -20.47±15.24 -22.05±13.78
COX-TLR -28.94±28.29 -23.05±19.46 -15.34±15.18 -14.23±14.52
BIN-INT -40.82±27.02 -25.63±22.29 -28.62±14.63 -27.02±11.01
ASCVD -18.81±30.57 -29.74±18.53 -13.37±18.52 -28.73±14.45

Figure 9: Conditional Average Treatment Effect in Restricted Mean Survival Time versus subgroup size for the latent phenogroups extracted from the ALLHAT study.

$||\bm{\theta}||_{0}\leq 5$

20% 40% 60% 80%
SCS 1.5±0.36 1.29±0.2 1.19±0.16 1.14±0.15
COX-INT 1.61±0.4 1.28±0.22 1.05±0.15 1.07±0.16
COX-TLR 1.24±0.22 1.2±0.19 1.16±0.16 1.09±0.13
BIN-INT 0.9±0.2 0.98±0.16 1.05±0.13 1.08±0.15
ASCVD 0.86±0.27 0.86±0.16 1.05±0.18 1.06±0.15

20% 40% 60% 80%
SCS 0.71±0.24 0.66±0.16 0.82±0.14 0.95±0.15
COX-INT 0.76±0.3 0.86±0.14 0.86±0.14 0.87±0.1
COX-TLR 0.67±0.26 0.81±0.17 0.85±0.16 0.98±0.14
BIN-INT 0.9±0.2 0.98±0.16 1.05±0.13 1.08±0.15
ASCVD 0.86±0.27 0.86±0.16 1.05±0.18 1.06±0.15

$||\bm{\theta}||_{0}\leq 10$

20% 40% 60% 80%
SCS 1.42±0.4 1.24±0.22 1.22±0.19 1.11±0.14
COX-INT 1.22±0.25 1.28±0.2 1.06±0.17 1.07±0.13
COX-TLR 1.15±0.27 1.29±0.22 1.18±0.2 1.09±0.15
BIN-INT 0.9±0.2 0.98±0.16 1.05±0.13 1.08±0.15
ASCVD 0.86±0.27 0.86±0.16 1.05±0.18 1.06±0.15

20% 40% 60% 80%
SCS 1.21±0.3 1.35±0.22 1.38±0.25 1.1±0.16
COX-INT 1.37±0.31 1.29±0.25 1.15±0.18 1.1±0.14
COX-TLR 1.34±0.4 1.18±0.26 1.16±0.17 1.05±0.14
BIN-INT 0.9±0.2 0.98±0.16 1.05±0.13 1.08±0.15
ASCVD 0.86±0.27 0.86±0.16 1.05±0.18 1.06±0.15

No Sparsity

20% 40% 60% 80%
SCS 1.21±0.3 1.35±0.22 1.38±0.25 1.1±0.16
COX-INT 1.37±0.31 1.29±0.25 1.15±0.18 1.1±0.14
COX-TLR 1.34±0.4 1.18±0.26 1.16±0.17 1.05±0.14
BIN-INT 0.9±0.2 0.98±0.16 1.05±0.13 1.08±0.15
ASCVD 0.86±0.27 0.86±0.16 1.05±0.18 1.06±0.15

20% 40% 60% 80%
SCS 0.7±0.26 0.68±0.15 0.75±0.11 0.9±0.11
COX-INT 0.7±0.21 0.8±0.16 0.88±0.19 0.92±0.14
COX-TLR 0.95±0.33 0.79±0.17 0.95±0.15 0.94±0.13
BIN-INT 0.9±0.2 0.98±0.16 1.05±0.13 1.08±0.15
ASCVD 0.86±0.27 0.86±0.16 1.05±0.18 1.06±0.15

Figure 10: Conditional Average Treatment Effect in Hazard Ratio versus subgroup size for the latent phenogroups extracted from the BARI 2D study.

$||\bm{\theta}||_{0}\leq 5$

20% 40% 60% 80%
SCS -0.08±0.05 -0.04±0.04 -0.03±0.03 -0.01±0.03
COX-INT -0.1±0.07 -0.04±0.05 0.01±0.04 -0.01±0.03
COX-TLR -0.03±0.05 -0.02±0.05 -0.02±0.03 -0.01±0.03
BIN-INT 0.06±0.06 0.03±0.04 0.01±0.03 -0.01±0.03
ASCVD 0.02±0.07 0.05±0.04 -0.0±0.04 -0.01±0.04

20% 40% 60% 80%
SCS 0.05±0.05 0.07±0.04 0.03±0.03 0.02±0.03
COX-INT 0.03±0.05 0.02±0.04 0.02±0.03 0.03±0.02
COX-TLR 0.05±0.06 0.02±0.04 0.02±0.03 0.01±0.03
BIN-INT 0.06±0.06 0.03±0.04 0.01±0.03 -0.01±0.03
ASCVD 0.02±0.07 0.05±0.04 -0.0±0.04 -0.01±0.04

$||\bm{\theta}||_{0}\leq 10$

20% 40% 60% 80%
SCS -0.07±0.06 -0.03±0.04 -0.04±0.04 -0.02±0.03
COX-INT -0.05±0.05 -0.04±0.04 0.01±0.04 -0.01±0.03
COX-TLR -0.02±0.06 -0.05±0.04 -0.02±0.04 -0.01±0.03
BIN-INT 0.06±0.06 0.03±0.04 0.01±0.03 -0.01±0.03
ASCVD 0.02±0.07 0.05±0.04 -0.0±0.04 -0.01±0.04

20% 40% 60% 80%
SCS -0.07±0.06 -0.03±0.04 -0.04±0.04 -0.02±0.03
COX-INT -0.05±0.05 -0.04±0.04 0.01±0.04 -0.01±0.03
COX-TLR -0.02±0.06 -0.05±0.04 -0.02±0.04 -0.01±0.03
BIN-INT 0.06±0.06 0.03±0.04 0.01±0.03 -0.01±0.03
ASCVD 0.02±0.07 0.05±0.04 -0.0±0.04 -0.01±0.04

No Sparsity

20% 40% 60% 80%
SCS 0.07±0.06 0.09±0.04 0.07±0.03 0.03±0.03
COX-INT 0.06±0.07 0.03±0.04 0.03±0.04 0.03±0.03
COX-TLR 0.01±0.07 0.05±0.04 0.01±0.04 0.02±0.03
BIN-INT 0.06±0.06 0.03±0.04 0.01±0.03 -0.01±0.03
ASCVD 0.02±0.07 0.05±0.04 -0.0±0.04 -0.01±0.04

20% 40% 60% 80%
SCS 0.07±0.06 0.09±0.04 0.07±0.03 0.03±0.03
COX-INT 0.06±0.07 0.03±0.04 0.03±0.04 0.03±0.03
COX-TLR 0.01±0.07 0.05±0.04 0.01±0.04 0.02±0.03
BIN-INT 0.06±0.06 0.03±0.04 0.01±0.03 -0.01±0.03
ASCVD 0.02±0.07 0.05±0.04 -0.0±0.04 -0.01±0.04

Figure 11: Conditional Average Treatment Effect in Risk versus subgroup size for the latent phenogroups extracted from the BARI 2D study.

$||\bm{\theta}||_{0}\leq 5$

20% 40% 60% 80%
SCS -30.51±71.06 -34.71±41.55 -11.77±43.75 -2.73±36.52
COX-INT -106.06±74.9 -31.3±55.18 16.38±43.89 8.93±42.04
COX-TLR -16.18±53.18 0.65±60.94 -0.7±46.54 1.29±35.33
BIN-INT 11.84±65.18 10.39±46.33 12.23±33.8 -0.6±34.6
ASCVD 108.54±109.62 48.36±56.77 -5.7±49.77 3.93±36.04

20% 40% 60% 80%
SCS 61.73±68.48 61.76±53.61 52.87±37.98 24.86±45.35
COX-INT 41.07±68.66 33.99±51.0 43.48±38.35 44.84±31.96
COX-TLR 60.48±71.86 23.63±51.48 32.45±39.39 25.35±34.38
BIN-INT 11.84±65.18 10.39±46.33 12.23±33.8 -0.6±34.6
ASCVD 108.54±109.62 48.36±56.77 -5.7±49.77 3.93±36.04

$||\bm{\theta}||_{0}\leq 10$

20% 40% 60% 80%
SCS -30.2±67.44 -14.45±55.58 -6.03±43.99 -3.8±39.02
COX-INT -23.18±69.92 -26.64±53.34 12.09±41.39 8.87±36.87
COX-TLR -10.22±73.05 -17.07±54.39 2.7±49.05 0.31±33.17
BIN-INT 11.84±65.18 10.39±46.33 12.23±33.8 -0.6±34.6
ASCVD 108.54±109.62 48.36±56.77 -5.7±49.77 3.93±36.04

20% 40% 60% 80%
SCS 130.48±58.03 54.3±45.22 23.0±34.47 32.71±29.64
COX-INT 26.58±74.93 20.7±50.38 37.69±33.38 27.18±33.95
COX-TLR 71.81±70.62 27.41±57.71 52.85±39.85 28.33±32.79
BIN-INT 11.84±65.18 10.39±46.33 12.23±33.8 -0.6±34.6
ASCVD 108.54±109.62 48.36±56.77 -5.7±49.77 3.93±36.04

No Sparsity

20% 40% 60% 80%
SCS -15.46±67.61 -63.38±45.83 -50.14±44.42 1.99±40.43
COX-INT -40.54±64.8 -35.58±51.4 -5.82±49.97 4.42±36.41
COX-TLR -34.51±82.71 -11.28±57.62 -12.3±40.85 20.28±38.2
BIN-INT 11.84±65.18 10.39±46.33 12.23±33.8 -0.6±34.6
ASCVD 108.54±109.62 48.36±56.77 -5.7±49.77 3.93±36.04

20% 40% 60% 80%
SCS 102.43±64.9 113.99±41.35 76.49±38.53 49.76±31.25
COX-INT 74.04±63.72 57.82±50.04 50.8±52.89 41.27±38.38
COX-TLR 1.96±72.63 70.7±53.71 36.51±40.91 33.98±36.9
BIN-INT 11.84±65.18 10.39±46.33 12.23±33.8 -0.6±34.6
ASCVD 108.54±109.62 48.36±56.77 -5.7±49.77 3.93±36.04

Figure 12: Conditional Average Treatment Effect in Restricted Mean Survival Time versus subgroup size for the latent phenogroups extracted from the BARI 2D study.