
Analysis of N-of-1 trials using Bayesian distributed lag model with autocorrelated errors

Ziwei Liao1, Min Qian1, Ian M. Kronish2, Ying Kuen Cheung1

An N-of-1 trial is a multi-period crossover trial performed in a single individual, with a primary goal to estimate the treatment effect on that individual rather than a population-level mean response. As in a conventional crossover trial, it is critical to understand carryover effects of the treatment in an N-of-1 trial, especially when, to reduce trial duration, no washout periods are instituted between treatment periods. To deal with this issue in situations where a high volume of measurements is made during the study, we introduce a novel Bayesian distributed lag model that facilitates the estimation of carryover effects, while accounting for temporal correlations using an autoregressive model. Specifically, we propose a prior variance-covariance structure on the lag coefficients to address collinearity caused by the fact that treatment exposures are typically identical on successive days. A connection between the proposed Bayesian model and penalized regression is noted. Simulation results demonstrate that the proposed model substantially reduces the root mean squared error in the estimation of carryover effects and immediate effects when compared to other existing methods, while being comparable in the estimation of the total effects. We also apply the proposed method to assess the extent of carryover effects of light therapies in relieving depressive symptoms in cancer survivors.

Keywords:
Bayesian distributed lag model, Carryover effects, N-of-1 trials, Personalized treatment, Regression with autocorrelated errors, Time series

1 Department of Biostatistics, Columbia University, New York, NY, USA
2 Center for Behavioral Cardiovascular Health, Columbia University, New York, NY, USA
Corresponding Author:
Ying Kuen Cheung, Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, USA. Email: yc632@columbia.edu

1 Introduction

N-of-1 trials are multi-period crossover studies that compare two or more interventions in single individuals, and are suitable for evaluating personalized treatment effects in those with chronic conditions where the outcome is relatively stable. [1] Advances in mobile and sensor technology [2] and a better understanding of patient preferences [3] have improved the implementation of N-of-1 trials. However, their uptake in clinical practice remains limited. In particular, the duration of N-of-1 trials remains a key barrier. To reduce the duration needed to conduct an N-of-1 trial and to reduce the burden of participation, it is often necessary to forgo scheduling washout periods between treatments. When physical washout periods are not feasible, it is critical to have provisions for dealing with carryover effects analytically. To motivate our work, consider an N-of-1 trial series that compared bright white light (10,000 lux) and dim red light (50 lux) in cancer patients with depressive symptoms, where light therapy was delivered by portable light boxes with instructions. [4] Briefly, each individual would use one of two light boxes for 30 minutes each morning over 12 weeks. Along with the light boxes, a smartphone application was used to give treatment reminders and to assess daily depressive symptoms and fatigue over the entire 12-week period. While theory suggests bright white light may reduce cancer-related depression and fatigue, its effects may vary from individual to individual. [5] Thus, the primary analytical goal in the light therapy study is to identify, for each individual, whether bright white light is superior in terms of symptom control and to inform light therapy recommendations for subsequent clinical care. Figure 1 shows the daily assessments of two patients during the study.

Figure 1: Daily assessments of two patients, IDs 7706 and 7708. The black line represents the bright white light intervention, and the grey line the dim red light intervention.

In a systematic review of 108 N-of-1 trial series published between 1985 and 2010, Gabler et al. reported on the analytic methods used to compare the effectiveness of two or more treatments being studied in an N-of-1 trial, including graphical comparison, hypothesis tests (e.g., t-test, nonparametric tests), and regression models.[6] While there is no single agreed-upon analysis method, these methods ignore two key features of experimental N-of-1 data. First, most methods do not account for temporal dependence (i.e., autocorrelation) between assessments. Second, the methods do not capture the carryover effects of an intervention. The second data feature, which motivates this article, can be partly addressed by using a distributed lag model (DLM), which is widely used in economics, [7, 8] advertising, [9] and environmental health studies. [10, 11] A DLM postulates that the current value of the outcome variable depends on the previous values (lags) of an exposure as well as the current exposure value, thus allowing the total exposure effect to be distributed over a time period and facilitating explicit modeling of carryover effects. A potential challenge in fitting a DLM is collinearity of the exposure lags. The N-of-1 trial design further aggravates the problem: as illustrated in Figure 1, the exposure (light box) often remains the same as on the previous day in order to avoid switching interventions too frequently during a trial. One strategy to handle collinearity in a DLM is to impose parametric constraints on the lag coefficients, such as geometric lags [7] or polynomial lags. [8] Alternatively, one may place an informative prior on the coefficients in a Bayesian framework. [10]

In this article, we adopt the Bayesian framework and propose a Bayesian distributed lag model with autocorrelated errors (BDLM-AR) as an extension of DLMs for N-of-1 trial data. The model is novel in several ways. First, we propose a prior distribution that constrains the lag coefficients with shrinkage factors that increase over time. Second, we impose a fused ridge-type penalty to address collinearity, which may be viewed as a variant of the fused lasso method. [12] Third, while current DLM methods assume independent error terms, we incorporate temporal correlations using an autoregressive error model. We introduce the proposed BDLM-AR in detail in Section 2 and describe the posterior computations in Section 3. The performance of BDLM-AR is evaluated and compared with other methods in the simulation studies presented in Section 4. We apply the proposed method to the light therapy data in Section 5 and conclude with a discussion in Section 6.

2 Bayesian Distributed Lag Model with Autocorrelated Errors

2.1 Proposed Model

Suppose we observe data from a patient on n consecutive days. On day t = 1, ..., n, let X_t and Y_t denote the binary treatment indicator and the outcome of interest, respectively. We consider a distributed lag autoregressive model for Y, described as follows:

Y_{t}=\mu+\sum^{L}_{l=0}\beta_{l}X_{t-l}+\epsilon_{t}   (1)

for t = p+1, ..., n, where the error term ε_t follows an autoregressive process,

\epsilon_{t}=\varphi_{1}\epsilon_{t-1}+\varphi_{2}\epsilon_{t-2}+\cdots+\varphi_{p}\epsilon_{t-p}+w_{t}   (2)

where w_t is Gaussian white noise with mean zero and unknown variance σ² > 0, and L and p are pre-specified. Note that for t < L, the maximum available lag is t − 1, and terms involving X with a non-positive subscript are not included in the model.

Model (1) is composed of two parts. First, for the structural component, the mean model is specified by the lag coefficients β = (β_0, ..., β_L)' and the control mean μ. The immediate treatment effect is measured by β_0, and the carryover effect due to treatment l days earlier is measured by β_l for l > 0. In the model, we assume the carryover effect beyond lag L is zero. As such, the total carryover treatment effect is captured as

\delta\triangleq\sum_{l=1}^{L}\beta_{l}=E(Y_{t}\mid X_{t-1}=1,\ldots,X_{t-L}=1,X_{t})-E(Y_{t}\mid X_{t-1}=0,\ldots,X_{t-L}=0,X_{t}).

Hence, the model naturally decomposes the total treatment effect into β_0 and δ.

Second, for the stochastic component, temporal dependency between errors is specified using an order-p autoregressive error model with autoregression coefficients φ = (φ_1, ..., φ_p)'. Let B denote the backshift operator and define Φ(B) = 1 − φ_1 B − φ_2 B² − ... − φ_p B^p, so that the autoregressive model for the error terms can be written as Φ(B)ε_t = w_t. It is often convenient to work with the transformed data Y*_t = Φ(B)Y_t and X*_t = Φ(B)X_t in the estimation steps. Thus, applying Φ(B) to both sides of model (1), we can rewrite the model as

Y^{*}_{t}=\mu^{*}+\sum^{L}_{l=0}\beta_{l}X^{*}_{t-l}+w_{t},   (3)

for t = p+1, ..., n, where μ* = Φ(B)μ. Stacking the data in vector form, we have

(\boldsymbol{Y}^{*}\mid\boldsymbol{X}^{*},\mu^{*},\boldsymbol{\beta})\sim N(\mu^{*}\boldsymbol{1}_{n-p}+\boldsymbol{X}^{*}\boldsymbol{\beta},\;\sigma^{2}\mathbf{I}_{n-p}),   (4)

where Y* = (Y*_{p+1}, ..., Y*_n)', X* is an (n−p) × (L+1) matrix whose (k, l)-th element is X*_{k−l+p+1}, 1_{n−p} is a vector of ones of length n−p, and I_{n−p} is the identity matrix of dimension n−p. We denote β̃ = (μ, β')' and X̃* = (Φ(B)1_{n−p}, X*), so that X̃*β̃ = μ*1_{n−p} + X*β.
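For illustration, the following minimal R sketch shows one way to form the lagged design matrix and the AR-filtered quantities Y* and X̃* in (3)-(4). The vectors y and x and the helper names (make_lagged_design, ar_filter) are hypothetical and not part of our software; lags before day 1 are simply set to zero here.

# Build the lagged design matrix for model (1); column l+1 holds x_{t-l},
# with exposures before day 1 set to zero (an illustrative convention).
make_lagged_design <- function(x, L) {
  n <- length(x)
  X <- sapply(0:L, function(l) c(rep(0, l), x)[1:n])
  colnames(X) <- paste0("lag", 0:L)
  X
}

# Apply the AR filter Phi(B): z_t - phi_1 z_{t-1} - ... - phi_p z_{t-p},
# dropping the first p rows, as in the transformation leading to (3).
ar_filter <- function(z, phi) {
  z <- as.matrix(z)
  p <- length(phi); n <- nrow(z)
  out <- z[(p + 1):n, , drop = FALSE]
  for (j in seq_len(p)) out <- out - phi[j] * z[(p + 1 - j):(n - j), , drop = FALSE]
  out
}

# Example with hypothetical daily data y (outcome) and x (binary treatment):
# X  <- make_lagged_design(x, L = 7)
# Ys <- ar_filter(y, phi = 0.5)            # Y* in (3), for a working AR(1) coefficient
# Xs <- ar_filter(cbind(1, X), phi = 0.5)  # columns Phi(B)1 and X*, i.e., X-tilde* in (4)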

2.2 Prior Distribution on the Mean Model

We consider a normal prior distribution for β̃, that is,

\tilde{\boldsymbol{\beta}}\sim N(\boldsymbol{0},\sigma^{2}\tilde{\boldsymbol{\Omega}}^{-1}),   (5)

where Ω̃ = diag(c_0, Ω), so that the prior variance of μ is σ²c_0⁻¹ and the prior variance-covariance matrix of β is σ²Ω⁻¹. We note that the prior variance depends on the observation variance σ²: this dependence renders a fused ridge penalized estimation procedure that is free of the variance parameter, resulting in computational stability; see Equation (7) below. We postulate a non-informative prior on μ by setting c_0 to be a small number, and we consider Ω of the following form:

\begin{pmatrix}
\lambda_{0}+\lambda^{*}_{0} & -\lambda^{*}_{0} & 0 & \cdots & \cdots & 0 \\
-\lambda^{*}_{0} & \lambda_{1}+\lambda^{*}_{0}+\lambda^{*}_{1} & -\lambda^{*}_{1} & \cdots & \cdots & 0 \\
0 & -\lambda^{*}_{1} & \lambda_{2}+\lambda^{*}_{1}+\lambda^{*}_{2} & \cdots & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & \lambda_{L-1}+\lambda^{*}_{L-2}+\lambda^{*}_{L-1} & -\lambda^{*}_{L-1} \\
0 & 0 & 0 & \cdots & -\lambda^{*}_{L-1} & \lambda_{L}+\lambda^{*}_{L-1}+\lambda^{*}_{L}
\end{pmatrix},   (6)

where the hyperparameters λ_l, λ*_l > 0, for l = 0, ..., L, are constrained to increase in l. As a result of the monotonicity constraint, a lag coefficient β_l at a greater lag l is associated with a larger diagonal element of the precision matrix Ω, and is therefore shrunk toward the prior mean (zero) to a greater extent. This effectively addresses collinearity of the lag coefficients without imposing a strong parametric structure on β. In addition, using the normal prior (5) with precision matrix (6), we can show that the maximum a posteriori (MAP) estimate of β̃ minimizes the fused ridge-type penalized criterion

(\boldsymbol{Y}^{*}-\tilde{\boldsymbol{X}}^{*}\tilde{\boldsymbol{\beta}})^{\prime}(\boldsymbol{Y}^{*}-\tilde{\boldsymbol{X}}^{*}\tilde{\boldsymbol{\beta}})+c_{0}\mu^{2}+\sum_{l=0}^{L}\lambda_{l}\beta_{l}^{2}+\sum_{l=0}^{L}\lambda^{*}_{l}(\beta_{l}-\beta_{l+1})^{2},   (7)

where β_{L+1} ≜ 0. This gives insight into how the proposed prior constrains the lag coefficients: it regularizes not only the ℓ2-norm of the coefficients but also their successive differences, thereby enhancing local smoothness. The equivalence between the Bayesian formulation and the fused ridge regularization (7) is proved in Appendix A.

There are many ways to specify λ_l and λ*_l to meet the monotonicity constraints. In this article, we consider λ_l = exp{γ_1(l+1)} − 1 and λ*_l = exp{γ_2(l+1)} − 1 for γ_1, γ_2 > 0, so that γ_1 controls the rate at which the ridge penalty in (7) increases with l, and γ_2 controls the rate at which the smoothness penalty on the coefficient curve β increases. Instead of treating these hyperparameters as fixed, we postulate a truncated standard exponential hyperprior on (γ_1, γ_2), that is, with probability density function

\pi(\gamma_{1},\gamma_{2})\propto\exp(-\gamma_{1}-\gamma_{2})\,\boldsymbol{1}_{S_{\boldsymbol{\gamma}}}(\gamma_{1},\gamma_{2}),   (8)

where the support S_γ includes all pairs (γ_1, γ_2) for which the precision matrix Ω is positive definite. As such, the degree of ridge and smoothness penalization is determined by the posterior distribution of this pair.
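As an illustration of this prior, the following R sketch constructs Ω̃(γ) for given (γ_1, γ_2) under the exponential maps above; the function name build_Omega_tilde and the default value of c_0 are illustrative choices, not fixed by the method.

# Construct the prior precision matrix Omega-tilde(gamma) = diag(c0, Omega),
# with Omega as in (6), lambda_l = exp(gamma1*(l+1)) - 1 and lambda*_l = exp(gamma2*(l+1)) - 1.
build_Omega_tilde <- function(gamma1, gamma2, L, c0 = 1e-4) {
  lam  <- exp(gamma1 * (1:(L + 1))) - 1                  # lambda_0, ..., lambda_L
  lams <- exp(gamma2 * (1:(L + 1))) - 1                  # lambda*_0, ..., lambda*_L
  Omega <- matrix(0, L + 1, L + 1)
  diag(Omega) <- lam + lams + c(0, lams[seq_len(L)])     # lambda_l + lambda*_{l-1} + lambda*_l
  for (l in seq_len(L)) {
    Omega[l, l + 1] <- Omega[l + 1, l] <- -lams[l]       # off-diagonal entries -lambda*_{l-1}
  }
  rbind(c(c0, rep(0, L + 1)), cbind(0, Omega))           # prepend the scalar block for mu
}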

2.3 Prior Distribution on the Error Model

We place the Jeffreys prior on the error variance σ², that is, with density function

\pi(\sigma^{2})\propto 1/\sigma^{2}.   (9)

Note that any inverse-gamma prior for σ² would maintain conjugacy, and the Jeffreys prior can be regarded as an improper limit of the inverse-gamma prior distribution.

For the autoregressive process, we consider a truncated normal prior for φ, subject to the constraint that the error process is stationary. Specifically, we postulate

\boldsymbol{\varphi}\sim N_{p}(\boldsymbol{0}_{p},200\times\mathbf{I}_{p})\,\boldsymbol{1}_{S_{\boldsymbol{\varphi}}}(\boldsymbol{\varphi}),   (10)

where S_φ denotes the support on which all roots of the polynomial Φ(B) = 1 − Σ_{l=1}^{p} φ_l B^l lie outside the unit circle; the process {ε_t : t = 1, 2, ...} is stationary when φ ∈ S_φ. [13] Because the stationarity constraint confines φ to a bounded region, a prior variance of 200 in (10) essentially amounts to a flat prior over S_φ.
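The stationarity constraint can be checked numerically; a minimal R sketch (using the base function polyroot, with an illustrative tolerance) is:

# phi lies in the stationary region S_phi iff all roots of
# Phi(B) = 1 - phi_1 B - ... - phi_p B^p lie outside the unit circle.
is_stationary <- function(phi, tol = 1e-8) {
  all(Mod(polyroot(c(1, -phi))) > 1 + tol)
}
# is_stationary(0.5)          # TRUE: stationary AR(1)
# is_stationary(c(0.9, 0.3))  # FALSE: explosive AR(2)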

3 Conditional Posterior Distributions

The proposed Bayesian model includes several conditionally conjugate priors, which facilitate posterior computations via a hybrid Metropolis-Hastings/Gibbs algorithm. We describe the conditional posterior distributions in this section.

Working with the likelihood (4) based on the transformed data Y*_t, we obtain that β̃ is conditionally normally distributed a posteriori:

\tilde{\boldsymbol{\beta}}\mid\boldsymbol{Y},\tilde{\boldsymbol{X}},\sigma^{2},\boldsymbol{\varphi},\boldsymbol{\gamma}\sim N_{L+1}\left\{[\tilde{\boldsymbol{X}}^{*^{\prime}}\tilde{\boldsymbol{X}}^{*}+\tilde{\boldsymbol{\Omega}}(\boldsymbol{\gamma})]^{-1}\tilde{\boldsymbol{X}}^{*^{\prime}}\boldsymbol{Y}^{*},\;\sigma^{2}[\tilde{\boldsymbol{X}}^{*^{\prime}}\tilde{\boldsymbol{X}}^{*}+\tilde{\boldsymbol{\Omega}}(\boldsymbol{\gamma})]^{-1}\right\}   (11)

and that σ² has an inverse-gamma conditional posterior:

\sigma^{2}\mid\boldsymbol{Y},\tilde{\boldsymbol{X}},\tilde{\boldsymbol{\beta}},\boldsymbol{\varphi},\boldsymbol{\gamma}\sim\text{IG}\left[\frac{n-p+L+1}{2},\;\frac{(\boldsymbol{Y}^{*}-\tilde{\boldsymbol{X}}^{*}\tilde{\boldsymbol{\beta}})^{\prime}(\boldsymbol{Y}^{*}-\tilde{\boldsymbol{X}}^{*}\tilde{\boldsymbol{\beta}})+\tilde{\boldsymbol{\beta}}^{\prime}\tilde{\boldsymbol{\Omega}}(\boldsymbol{\gamma})\tilde{\boldsymbol{\beta}}}{2}\right]   (12)

Note that the dependence of (11) and (12) on φ is via the transformed data Y*.

Working with model (2) and prior (10), we obtain that the conditional posterior distribution of φ is truncated multivariate normal:

\boldsymbol{\varphi}\mid\boldsymbol{Y},\tilde{\boldsymbol{X}},\tilde{\boldsymbol{\beta}},\sigma^{2},\boldsymbol{\gamma}\sim N_{p}\left[\left(\sigma^{-2}\boldsymbol{E}^{*^{\prime}}\boldsymbol{E}^{*}+\sigma_{\boldsymbol{\varphi}}^{-2}\mathbf{I}\right)^{-1}\sigma^{-2}\boldsymbol{E}^{*^{\prime}}\boldsymbol{\epsilon}^{*},\;\left(\sigma^{-2}\boldsymbol{E}^{*^{\prime}}\boldsymbol{E}^{*}+\sigma_{\boldsymbol{\varphi}}^{-2}\mathbf{I}\right)^{-1}\right]\boldsymbol{1}_{S_{\boldsymbol{\varphi}}}(\boldsymbol{\varphi})   (13)

where ε* = (ε*_{p+1}, ..., ε*_n)', ε*_t = Y_t − μ − Σ_{l=0}^{L} β_l X_{t−l}, and E* is an (n−p) × p matrix with ε*_{p+k−j} as its (k, j)-th element. Because of conjugacy, the parameters β̃, σ², and φ can be easily updated in a Gibbs sampling fashion.
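For illustration, a minimal R sketch of the two conjugate updates (11) and (12) is given below, assuming the filtered outcome Ys, design matrix Xs_tilde and precision matrix Omega_tilde have been formed as in Section 2 (all object names are illustrative); MASS::mvrnorm is used for the multivariate normal draw.

library(MASS)  # for mvrnorm

# Draw beta-tilde from the conditional normal posterior (11).
draw_beta_tilde <- function(Ys, Xs_tilde, Omega_tilde, sigma2) {
  A_inv <- solve(crossprod(Xs_tilde) + Omega_tilde)       # [X*'X* + Omega-tilde(gamma)]^{-1}
  m <- drop(A_inv %*% crossprod(Xs_tilde, Ys))             # posterior mean
  mvrnorm(1, mu = m, Sigma = sigma2 * A_inv)
}

# Draw sigma^2 from the conditional inverse-gamma posterior (12).
draw_sigma2 <- function(Ys, Xs_tilde, beta_tilde, Omega_tilde) {
  resid <- Ys - Xs_tilde %*% beta_tilde
  shape <- (length(Ys) + ncol(Xs_tilde) - 1) / 2           # (n - p + L + 1)/2, as in (12)
  rate  <- (sum(resid^2) + drop(t(beta_tilde) %*% Omega_tilde %*% beta_tilde)) / 2
  1 / rgamma(1, shape = shape, rate = rate)                # inverse-gamma via reciprocal gamma
}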

Using the likelihood (4) and the priors on γ and β̃, the conditional posterior distribution of γ can be expressed as

\pi(\boldsymbol{\gamma}\mid\boldsymbol{Y},\tilde{\boldsymbol{X}},\tilde{\boldsymbol{\beta}},\boldsymbol{\varphi},\sigma^{2})\propto|\sigma^{-2}\tilde{\boldsymbol{\Omega}}(\boldsymbol{\gamma})|^{\frac{1}{2}}\exp\left[-\frac{1}{2\sigma^{2}}\tilde{\boldsymbol{\beta}}^{\prime}\tilde{\boldsymbol{\Omega}}(\boldsymbol{\gamma})\tilde{\boldsymbol{\beta}}\right]\exp(-\gamma_{1}-\gamma_{2})\,\boldsymbol{1}_{S_{\boldsymbol{\gamma}}}(\gamma_{1},\gamma_{2}).   (14)

We propose to sample γ using a Metropolis-Hastings (MH) step with a uniform U(−a, a) proposal, that is, with updating step γ_{i,new} = γ_i + U(−a, a), where the tuning parameter a is chosen such that the acceptance rate of the proposed samples is around 50%. [14] Note that updating the hyperparameter γ involves computing the precision matrix Ω̃(γ), which must be positive definite. The (L+2) × (L+2) precision matrix Ω̃(γ) is tridiagonal, and the computational cost of standard algorithms for checking whether a given matrix is positive definite is up to O(L³). In this article, we implement an efficient algorithm with cost O(L). Specifically, define an (L+2)-dimensional vector c = (c_0, c_1, ..., c_{L+1}) by

c_{l}=\begin{cases}\lambda_{0}+\lambda^{*}_{0}, & l=0,\\ (\lambda_{l}+\lambda^{*}_{l})-\dfrac{1}{c_{l-1}}, & l=1,2,\ldots,L+1.\end{cases}

El-Mikkawy showed that Ω̃(γ) is positive definite if and only if c_l > 0 for l = 0, 1, ..., L+1. [15] Thus, the problem reduces to checking the positivity of the elements of c. The complete algorithm is summarized in Appendix B.
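A generic version of this check is sketched in R below: a symmetric tridiagonal matrix is positive definite if and only if all pivots of its LDL' factorization are positive, which can be verified in O(L) operations. This is an illustrative sketch using the standard pivot recursion, not necessarily the exact recursion of El-Mikkawy used in our implementation.

# O(L) positive-definiteness check for a symmetric tridiagonal matrix
# with diagonal d (length m) and off-diagonal e (length m - 1).
tridiag_is_pd <- function(d, e) {
  piv <- d[1]
  if (piv <= 0) return(FALSE)
  for (i in seq_along(e)) {
    piv <- d[i + 1] - e[i]^2 / piv      # next LDL' pivot
    if (piv <= 0) return(FALSE)
  }
  TRUE
}
# For Omega in (6): d_l = lambda_l + lambda*_{l-1} + lambda*_l (no lambda*_{-1} term in the first entry)
# and e_l = -lambda*_l; the scalar block c0 > 0 of Omega-tilde is checked trivially.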

4 Simulation study

4.1 Comparison Methods

In this section, we evaluate the performance of the proposed BDLM-AR using simulation studies. At the end of each simulated trial, we fitted BDLM-AR with lag L = 7 and AR(1) errors, that is,

Y_{t}=\mu+\sum^{7}_{l=0}\beta_{l}X_{t-l}+\epsilon_{t},   (15)

where ε_t = φ ε_{t−1} + w_t and w_t ~ N(0, σ²). Posterior distributions were derived using the hybrid Metropolis-Hastings/Gibbs algorithm described in the previous section, with 50,000 iterations, a burn-in period of 25,000, and a = 0.2 for sampling γ in the MH step.

We compared BDLM-AR with several existing methods, including the Bayesian distributed lag model (BDLagM),[10] a Bayesian ridge DLM (BR-DLM) with a mean-zero normal prior for β̃, and a non-informative Bayesian DLM (NB-DLM) with flat improper priors on each parameter in β̃. These existing methods use the same mean model (15) but assume independent errors, without accounting for autocorrelation.

In addition, as a benchmark, we include the parametric Koyck DLM, [7] which assumes knowledge of the true autoregressive coefficients. For the Bayesian models, we estimate the parameters using the posterior means; for the Koyck model, we use the maximum likelihood estimates.

4.2 Simulation Scenarios and Data Generation

In each simulated N-of-1 trial, measurements were collected daily for 120 days under one of two possible treatment sequences. In the first sequence, a participant would receive x_t = 1 on the first 30 days and the last 30 days, and x_t = 0 between days 31 and 90; that is,

x_{t}^{(1)}=\begin{cases}1, & t=30s+1,\ldots,30s+30\text{ for }s=0,3,\\ 0, & t=30s+1,\ldots,30s+30\text{ for }s=1,2.\end{cases}

In the second treatment sequence, a participant would switch treatments more frequently; specifically,

x_{t}^{(2)}=\begin{cases}1, & t=15s+1,\ldots,15s+15\text{ for }s=0,3,5,6,\\ 0, & t=15s+1,\ldots,15s+15\text{ for }s=1,2,4,7.\end{cases}

For each treatment sequence, the data were generated according to model (15) under five sets of lag coefficients (lag curves, LC):

  1. LC1. Exponential decay curve: β = (5, 2.5, 1.25, 0.625, 0.3125, 0, 0, 0)';

  2. LC2. Exponential decay curve with oscillation: β = (5, 2.5, −1.25, −0.625, 0.3125, 0, 0, 0)';

  3. LC3. Slow absorption curve: β = (1.51, 2.75, 3.36, 2.03, 0.34, 0, 0, 0)';

  4. LC4. Slow absorption curve with oscillation: β = (1.51, 2.75, −3.36, −2.03, 0.34, 0, 0, 0)';

  5. LC5. No carryover effect: β = (10, 0, 0, 0, 0, 0, 0, 0)'.

The exponential decay curves (LC1 and LC2) specify coefficients that diminish in magnitude as the lag lengthens. Specifically, the coefficients under LC1 decrease geometrically, which is aligned with the assumption of the Koyck DLM. The slow absorption curves (LC3 and LC4) reflect scenarios where the carryover effect peaks two days after treatment. LC5 is the null scenario with no carryover effect. The magnitudes of the coefficients were chosen in these scenarios so that ‖β‖_1 ≈ 10; in addition, the total carryover effects (δ) are 4.69, 0.94, 8.48, −2.30 and 0, respectively, for LC1-LC5. For the stochastic component in data generation, we consider σ = 10, 20 and φ = 0.5, 0.2. For each of these scenarios, the methods were evaluated using 100 simulated trials.
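For concreteness, one simulated trial under treatment sequence x_t^{(1)}, lag curve LC1 and AR(1) errors (σ = 10, φ = 0.5) can be generated with the R sketch below, reusing the hypothetical make_lagged_design() helper sketched in Section 2.

set.seed(1)
n    <- 120
x1   <- rep(c(1, 0, 0, 1), each = 30)                  # x^(1): treatment on days 1-30 and 91-120
# x2 <- rep(c(1, 0, 0, 1, 0, 1, 1, 0), each = 15)      # x^(2): 15-day treatment blocks
beta <- c(5, 2.5, 1.25, 0.625, 0.3125, 0, 0, 0)        # LC1, L = 7
mu   <- 0
eps  <- as.numeric(arima.sim(model = list(ar = 0.5), n = n, sd = 10))  # AR(1) errors
y    <- drop(mu + make_lagged_design(x1, L = 7) %*% beta + eps)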

4.3 Simulation Results

Figure 2 shows the bias and root mean squared error (RMSE) of the posterior means of the individual lag coefficients. As expected, the biases of NB-DLM are relatively small; however, the method also has the largest RMSE uniformly because of its non-informative prior. The biases of the other methods are comparable and are generally small relative to the RMSE. While the ℓ2 penalty of BR-DLM on the lag coefficients reduces variability compared to NB-DLM, the additional constraints toward diminishing coefficients imposed by BDLagM and the proposed BDLM-AR further reduce the RMSE at large lags l. Additionally, since the proposed BDLM-AR explicitly incorporates ridge-type regularization on the lag coefficients, it yields smaller RMSE for β_0 and the early lag coefficients (e.g., β_1). However, as a trade-off, the bias of BDLM-AR for the early lag coefficients is slightly inflated relative to BDLagM, BR-DLM and NB-DLM, especially when the true lag coefficient curve fluctuates frequently. The benchmark method, the Koyck DLM, performs best in the exponential decay case, where the true autoregressive error coefficient (φ) is assumed known; the proposed BDLM-AR performs very similarly to the Koyck DLM. Note that the autoregressive error coefficient is estimated directly within the proposed BDLM-AR model, which is more practical in real applications. In summary, the proposed BDLM-AR generally yields the smallest RMSE across all lag coefficients.

Figure 2: Bias and RMSE of the lag coefficient estimates under the five lag curves, treatment sequence x_t^{(1)}, σ = 10, φ = 0.5.

To further examine the performance of each method in estimating the lag curve in aggregate, Figure 3 (top panel) plots the Euclidean distance between the vector of estimated lag coefficients and the vector of true lag coefficients. Under LC1 (exponential decay), the Koyck DLM has the best overall performance. This is not surprising because (i) the Koyck model mimics the coefficients under LC1 closely, and (ii) the Koyck DLM assumes knowledge of the true autoregressive coefficients used in the simulation and hence is not a method that can be implemented in practice. Thus, this comparison serves as a benchmark for the efficiency of the proposed BDLM-AR and the other methods. The figure demonstrates that the proposed BDLM-AR produces a smaller distance from the true lag curve than the other (implementable) methods.

Figure 3: Euclidean distance to the true lag curves under treatment sequence x_t^{(1)} (top panel) vs. treatment sequence x_t^{(2)} (bottom panel).

Table 1 gives the bias and RMSE in the estimation of the total effect (Σ_{l=0}^{7} β_l), the total carryover effect (δ = Σ_{l=1}^{7} β_l) and the immediate effect (β_0) under the different lag curves, with σ = 10, φ = 0.5, and treatment sequence x_t^{(1)}. Results for other values of φ, σ and the other treatment sequence are similar and are provided in Figures A1 to A3 in the online Supporting Information. Overall, the proposed BDLM-AR yields consistently lower RMSE in estimating the total effect, carryover effect, and immediate effect than the other comparison methods, except for the Koyck DLM. This is consistent with what we observe in Figures 2 and 3. We note that the advantages of BDLM-AR in terms of RMSE are more pronounced for the total carryover effect (δ) and the immediate effect (β_0) than for the total effect (δ + β_0). This is indeed what we set out to accomplish: to decompose the treatment effect and separate the carryover effect from the immediate effect.

Table 1: Summary of evaluation metrics (best values in bold) for the total effect, total carryover effect and immediate effect (β_0) under the five lag curves, treatment sequence x_t^{(1)}, σ = 10 and φ = 0.5.
Truth BDLM-AR Koyck DLM BDLagM BR-DLM NB-DLM
Bias
    Total Effect
    LC1. Exponential decay 10 -1.82 0.41 0.65 0.21 0.02
    LC2. Exponential decay (Oscillated) 5.94 -1.24 0.61 0.61 0.20 -0.79
    LC3. Slow absorption 10 -2.04 0.12 0.71 0.30 -0.42
    LC4. Slow absorption (Oscillated) -0.79 0.73 0.73 0.59 0.46 -0.15
    LC5. No carryover 10 -1.60 0.65 0.58 0.14 0.25
Total Carryover Effect
    LC1. Exponential decay 4.69 -1.57 0.10 0.01 1.13 -0.44
    LC2. Exponential decay (Oscillated) 0.94 0.31 2.14 -0.15 1.50 -0.17
    LC3. Slow absorption 8.48 -4.19 -3.61 -0.11 -0.31 0.40
    LC4. Slow absorption (Oscillated) -2.30 1.78 2.23 -0.70 0.71 0.06
    LC5. No carryover 0 2.05 4.98 0.94 3.25 0.23
Immediate Effect
    LC1. Exponential decay 5 -0.25 0.30 0.64 -0.92 0.46
    LC2. Exponential decay (Oscillated) 5 -1.55 -1.53 0.76 -1.30 -0.62
    LC3. Slow absorption 1.51 2.15 3.73 0.81 0.61 -0.82
    LC4. Slow absorption (Oscillated) 1.51 -1.05 -1.51 1.29 -0.25 -0.21
    LC5. No carryover 10 -3.65 -4.33 -0.35 -3.11 0.02
RMSE
    Total Effect
    LC1. Exponential decay 10 3.61 3.87 4.05 4.17 3.96
    LC2. Exponential decay (Oscillated) 5.94 2.95 3.90 4.05 4.11 3.93
    LC3. Slow absorption 10 3.81 3.85 4.06 4.16 4.40
    LC4. Slow absorption (Oscillated) -0.79 2.28 3.93 4.04 4.01 3.99
    LC5. No carryover 10 3.54 3.91 4.04 4.18 3.93
Total Carryover Effect
    LC1. Exponential decay 4.69 3.03 2.01 6.57 4.83 7.61
    LC2. Exponential decay (Oscillated) 0.94 2.25 2.87 6.56 4.83 6.31
    LC3. Slow absorption 8.48 5.13 4.15 6.58 4.64 6.94
    LC4. Slow absorption (Oscillated) -2.30 2.96 2.94 6.62 4.44 6.39
    LC5. No carryover 0 3.39 5.36 6.65 6.06 6.56
Immediate Effect
    LC1. Exponential decay 5 2.71 2.23 6.00 3.85 7.21
    LC2. Exponential decay (Oscillated) 5 2.99 2.62 6.01 3.85 6.72
    LC3. Slow absorption 1.51 3.42 4.30 6.03 3.61 6.67
    LC4. Slow absorption (Oscillated) 1.51 2.62 2.52 6.12 3.18 6.49
    LC5. No carryover 10 4.77 4.90 5.99 5.39 5.87

4.4 Effects of Design

Figure 3 shows that the Euclidean distance between the vector of estimated lag coefficients and the truth under x_t^{(2)} (bottom panel) is smaller than that under x_t^{(1)} (top panel), suggesting that switching treatments more frequently improves the information content of N-of-1 trial data. This is in line with expectation, because collinearity of the exposure lags is lessened when treatments change frequently while the total duration is held fixed. An implication in practice is that we should alternate interventions as frequently as is feasible. The relative performance of the methods is the same regardless of the treatment sequence; that is, the proposed BDLM-AR yields the shortest distance from the true lag coefficients β.

4.5 Effects of Model Misspecification

In the previous subsections, BDLM-AR and the other methods used a working mean model with L = 7 and an AR(1) model for the errors. These working models correctly specify (or include) the data-generating model. In this subsection, we investigate the robustness of BDLM-AR under model misspecification. Specifically, we consider (1) working mean models with L = 0, 1, ..., 7; and (2) stochastic components with autoregressive error order p = 0, 1, 7. That is, we consider a total of 24 BDLM-AR models.

In data generation, we use LC1 as the true mean model, where β_l > 0 for l = 0, 1, 2, 3, 4, and we consider two true scenarios for the errors:

  1. E1. AR(1) with φ = 0.5;

  2. E2. Autoregressive model with φ_1 = 0.5, φ_2 = 0, φ_3 = 0, φ_4 = 0.3, φ_5 = 0, φ_6 = 0.2.

Note that, under scenario LC1/E1, a working model with L < 4 or p = 0 under-specifies the true model. Likewise, under LC1/E2, a working model with L < 4 or p = 0, 1 under-specifies the true model.

Table 2 summarizes the RMSE of these 24 models under the two scenarios (LC1/E1 and LC1/E2) with σ = 10 under x_t^{(1)}. Misspecifying the lag length has little influence on the estimation of the total effect, total carryover effect and immediate effect, whereas under-specifying the error AR order increases the RMSE more than over-specifying it does. Note that when a small lag length is chosen, we can hardly estimate the whole distributed lag curve or learn the duration of the carryover effect. Therefore, when the lag length is unknown, we suggest fitting the data with a reasonably long lag length. Similarly, when the true error autoregressive order is unknown, we suggest fitting a model with a high autoregressive order.

Table 2: Summary of RMSE of the total effect, total carryover effect and immediate effect (β_0) fitted using BDLM-AR models with different lag lengths and error autoregressive orders. Simulated data were generated under LC1 (exponential decay curve), treatment sequence x_t^{(1)}, σ = 10, and either E1 (AR(1) with φ = 0.5) or E2 (autoregressive model with φ_1 = 0.5, φ_2 = 0, φ_3 = 0, φ_4 = 0.3, φ_5 = 0, φ_6 = 0.2).
Truth for errors: E1 Truth for errors: E2
Lag AR(7) AR(1) AR(0) AR(7) AR(1) AR(0)
Total Effect 7 4.32 3.85 3.95 5.85 8.32 10.72
6 4.37 3.88 3.92 5.92 8.25 10.62
5 4.44 3.91 3.93 5.89 8.21 10.55
4 4.49 3.98 3.94 6.01 8.06 10.48
3 4.59 4.02 3.93 6.02 7.81 10.40
2 4.63 4.07 3.91 6.08 7.68 10.34
1 4.75 4.15 3.87 6.12 7.43 10.26
0 4.01 3.71 3.77 5.82 8.58 10.52
Total Carryover Effect 7 3.56 3.18 3.85 3.55 5.24 6.69
6 3.51 3.17 3.73 3.60 5.15 6.38
5 3.42 3.15 3.61 3.60 4.95 6.08
4 3.41 3.12 3.49 3.73 4.67 5.68
3 3.41 3.12 3.36 3.70 4.29 5.33
2 3.38 3.10 3.15 3.73 4.00 4.62
1 3.58 3.33 2.92 3.86 3.88 3.54
0 - - - - - -
Immediate Effect 7 3.09 2.91 3.54 3.61 4.54 8.52
6 3.02 2.91 3.47 3.56 4.50 8.46
5 3.03 2.87 3.38 3.58 4.52 8.40
4 3.04 2.85 3.31 3.57 4.54 8.29
3 3.03 2.88 3.28 3.54 4.66 8.25
2 3.09 2.97 3.29 3.64 4.91 8.28
1 3.15 3.15 3.49 3.71 5.21 8.48
0 5.39 5.62 6.00 6.35 9.02 11.75

5 Application to Light Therapy Study

The data set we use is from the light therapy study,[4] which evaluated the effectiveness of bright white light therapy for depressive symptoms in cancer survivors. Besides the bright white light intervention (10,000 lux), a dim red light (50 lux) was used as a control intervention, which lacks sufficient light intensity to affect retinal cells. Patients were instructed to use one of two portable lightboxes each morning for 30 minutes. For each patient, the study lasted 12 weeks: one intervention was assigned during the first three weeks and the last three weeks, and the other intervention was assigned between the fourth and ninth weeks. The initial intervention, either the bright white or the dim red lightbox, was randomized. The collected outcomes were depressive symptoms and fatigue, which were tracked using a smartphone application and measured by the patient's self-report on a standard single-item visual analog scale from 0 (not at all depressed/tired) to 10 (extremely depressed/tired). Occasional missing outcomes were imputed using the average of the non-missing values in the corresponding treatment block.
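For reference, this block-mean imputation can be written in one line of base R; the vectors y and block and the function name are illustrative.

# Replace missing outcomes with the mean of non-missing values in the same treatment block.
impute_block_mean <- function(y, block) {
  ave(y, block, FUN = function(v) replace(v, is.na(v), mean(v, na.rm = TRUE)))
}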

We fit the data with the proposed BDLM-AR model with L = 7. Two autoregressive orders, AR(1) and AR(7), were used for the BDLM-AR error model. Convergence of all MCMC runs was checked using both trace plots and the Gelman-Rubin diagnostic. [16] Specifically, the potential scale reduction factors for the lag coefficients, autoregressive coefficients and model variance were all smaller than 1.2, indicating convergence of the posterior samples. [17] In addition to the comparison Bayesian distributed lag models, we also fit frequentist autoregressive regression models (RegAR) with p = 1 and 7.

Table 3 shows the posterior means of the coefficients from the Bayesian methods and the maximum likelihood estimates from RegAR for the depressive symptom outcome. For patient 7706, RegAR and the other Bayesian DLM models indicate a weak, non-significant total effect of the bright white intervention in relieving depressive symptoms. However, the BDLM-AR(7) model identifies a significant total effect of the bright white intervention of -0.52 (90% CI: -1.07, -0.02). Owing to the adjustment for serial correlation in the outcome and the ridge-type penalty on the distributed lag coefficients, the credible intervals of the lag coefficients are much narrower than those estimated from models assuming white-noise errors. To check the fit of each model, we used the Ljung-Box test to examine autocorrelation of the residuals,[18] and the corresponding p-values of the χ²-test are shown in the last row of Table 3. No statistically significant autocorrelation is found in the residuals of the BDLM-AR models. We also found a second peak of the treatment effect for patient 7706 two days after the immediate effect. For patient 7708, we observe similar total effect estimates across the models. The total effect estimated from the BDLM-AR(7) model is -1.13 (90% CI: -3.14, 0.28), slightly smaller in magnitude than that from the other models. Additional information obtained from the BDLM-AR model is that the majority of the treatment effect lasts for around two days. For the RegAR(1) model, BDLagM, BR-DLM and NB-DLM, statistically significant autocorrelation is found in the residuals, indicating inadequate model fit. Analysis results for the fatigue outcome can be found in Table A1 in the online Supporting Information.
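For completeness, the residual autocorrelation check can be carried out with the base R function stats::Box.test; resid_bdlm below stands for the residuals of a fitted model, and the choice of 10 lags is illustrative.

# Ljung-Box test for residual autocorrelation.
Box.test(resid_bdlm, lag = 10, type = "Ljung-Box")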

Table 3: Lag coefficient estimates for the light therapy study using the depressive symptom outcome. 90% credible/confidence intervals are in brackets. The p-value of the Ljung-Box test for each model is in the last row.
Subject ID: 7706
BDLM-AR(1) BDLM-AR(7) RegAR(1) RegAR(7) BDLagM BR-DLM NB-DLM
μ 2.56 (2.20,2.91) 2.02 (0.07,2.88) 2.58 (2.21,2.95) 3.14 (2.05,4.23) 2.56 (2.34,2.79) 2.57 (2.36,2.77) 2.58 (2.35,2.80)
Total effect -0.01 (-0.39,0.38) -0.52 (-1.07,-0.02) -0.02 (-0.52,0.49) -0.36 (-0.93,0.21) 0.04 (-0.31,0.39) 0.02 (-0.29,0.35) 0 (-0.38,0.38)
Total carryover effect 0.02 (-0.23,0.34) -0.22 (-0.74,0.14) - 0.28 (-0.52,1.07) 0.09 (-0.32,0.55) 0.24 (-0.62,1.10)
β_0 -0.03 (-0.41,0.32) -0.30 (-0.72,0.08) -0.02 (-0.52,0.49) -0.36 (-0.93,0.21) -0.23 (-1.02,0.56) -0.07 (-0.47,0.26) -0.24 (-1.10,0.61)
β_1 0.01 (-0.20,0.22) -0.01 (-0.30,0.37) - - 0.05 (-0.86,0.97) -0.02 (-0.43,0.37) -0.02 (-1.20,1.16)
β_2 0.01 (-0.11,0.15) -0.08 (-0.35,0.10) - - 0.10 (-0.30,0.49) 0.03 (-0.37,0.46) 0.09 (-1.10,1.28)
β_3 0.01 (-0.07,0.10) -0.08 (-0.34,0.05) - - 0.05 (-0.14,0.24) 0.06 (-0.33,0.51) 0 (-1.16,1.19)
β_4 0.01 (-0.03,0.07) -0.03 (-0.21,0.07) - - 0.03 (-0.08,0.14) 0.13 (-0.22,0.69) 0.90 (-0.27,2.07)
β_5 0 (-0.04,0.03) -0.02 (-0.15,0.06) - - 0.02 (-0.05,0.09) -0.10 (-0.64,0.26) -0.90 (-2.08,0.27)
β_6 0 (-0.02,0.02) 0 (-0.09,0.07) - - 0.01 (-0.03,0.05) 0 (-0.39,0.42) 0.26 (-0.91,1.43)
β_7 0 (-0.01,0.01) 0 (-0.06,0.06) - - 0.01 (-0.02,0.03) -0.02 (-0.40,0.33) -0.09 (-0.94,0.77)
φ_1 0.53 (0.36,0.70) 0.08 (-0.13,0.29) 0.52 (0.34,0.67) 0.31 (0.13,0.49) - - -
φ_2 - 0.18 (-0.02,0.39) - 0.13 (-0.06,0.32) - - -
φ_3 - 0.07 (-0.11,0.25) - -0.05 (-0.25,0.15) - - -
φ_4 - 0.22 (0.04,0.39) - 0.20 (0.01,0.40) - - -
φ_5 - 0.09 (-0.08,0.26) - 0.09 (-0.14,0.32) - - -
φ_6 - 0.06 (-0.10,0.23) - 0.07 (-0.15,0.29) - - -
φ_7 - 0.06 (-0.10,0.22) - 0.06 (-0.17,0.28) - - -
p-value 0.18 0.72 <0.001 0.97 <0.001 <0.001 <0.001
Subject ID: 7708
BDLM-AR(1) BDLM-AR(7) RegAR(1) RegAR(7) BDLagM BR-DLM NB-DLM
μ 2.62 (1.73,3.50) 2.80 (-1.3,7.57) 2.76 (1.91,3.62) 4.64 (2.38,6.89) 2.82 (2.25,3.39) 2.74 (2.17,3.30) 2.85 (2.25,3.44)
Total effect -0.99 (-2.28,0.17) -1.13 (-3.14,0.28) -1.42 (-2.64,-0.19) -1.72 (-3.14,-0.31) -1.53 (-2.44,-0.61) -1.38 (-2.32,-0.43) -1.62 (-2.63,-0.6)
Total carryover effect -0.36 (-1.58,0.50) -0.40 (-1.96,0.52) - - -0.45 (-2.59,1.66) -0.86 (-2.09,0.41) -0.46 (-2.73,1.80)
β_0 -0.63 (-1.80,0.48) -0.72 (-2.26,0.45) -1.42 (-2.64,-0.19) -1.72 (-3.14,-0.31) -1.07 (-3.15,1.02) -0.52 (-1.73,0.45) -1.15 (-3.42,1.11)
β_1 -0.22 (-1.03,0.45) -0.32 (-1.46,0.39) - - -0.02 (-2.46,2.43) -0.26 (-1.49,0.92) -0.11 (-3.22,2.98)
β_2 -0.08 (-0.61,0.36) -0.08 (-0.74,0.44) - - -0.15 (-1.21,0.91) -0.13 (-1.33,1.14) 0.10 (-3.03,3.22)
β_3 -0.03 (-0.38,0.26) -0.04 (-0.51,0.34) - - -0.12 (-0.63,0.39) -0.08 (-1.27,1.18) 0.34 (-2.76,3.44)
β_4 -0.03 (-0.29,0.15) -0.04 (-0.41,0.22) - - -0.07 (-0.37,0.22) -0.29 (-1.63,0.86) -1.33 (-4.43,1.77)
β_5 0 (-0.15,0.13) 0.03 (-0.14,0.34) - - -0.04 (-0.22,0.13) 0.09 (-1.06,1.44) 0.99 (-2.14,4.11)
β_6 0 (-0.10,0.09) 0.02 (-0.12,0.22) - - -0.03 (-0.13,0.08) -0.01 (-1.17,1.23) 0.13 (-2.95,3.22)
β_7 0 (-0.06,0.05) 0.01 (-0.08,0.13) - - -0.02 (-0.08,0.05) -0.18 (-1.31,0.84) -0.58 (-2.85,1.69)
φ_1 0.44 (0.26,0.62) 0.43 (0.22,0.64) 0.43 (0.24,0.59) 0.36 (0.17,0.54) - - -
φ_2 - 0 (-0.22,0.23) - -0.01 (-0.21,0.19) - - -
φ_3 - -0.02 (-0.25,0.22) - 0 (-0.20,0.21) - - -
φ_4 - 0.19 (-0.04,0.42) - 0.13 (-0.07,0.33) - - -
φ_5 - 0.09 (-0.15,0.32) - 0.11 (-0.10,0.31) - - -
φ_6 - 0.06 (-0.18,0.30) - 0.06 (-0.16,0.28) - - -
φ_7 - -0.05 (-0.28,0.17) - -0.09 (-0.29,0.12) - - -
p-value 0.76 0.83 <0.001 0.91 <0.001 <0.001 <0.001

6 Discussion

In this paper, we introduce a novel method to analyze data from N-of-1 trials. The method handles temporal correlation between measurements and carryover effects via a distributed lag structure, and the parameters are estimated using a Bayesian approach with (fused) ridge-type regularization. From a design perspective, an N-of-1 trial can be viewed as a multi-period crossover trial in an individual. Traditional crossover trials require physical washout periods to eliminate carryover effects, resulting in pauses of the study intervention and potentially lengthening the study duration. Instead of relying on physical washout periods, our proposed method provides an alternative that addresses carryover effects analytically, and it can be applied to N-of-1 trials even without washout periods. This is particularly suited to applications where outcomes are measured continuously over the study period. Our simulation studies show that the proposed BDLM-AR model generally outperforms the Koyck DLM, BDLagM, BR-DLM and NB-DLM in estimating carryover effects while remaining comparable in estimating the total effects. Furthermore, we showed that BDLM-AR can simultaneously estimate the autoregressive error structure. The advantage of the BDLM-AR model increases when strong serial correlation exists.

We adopt a Bayesian estimation framework, which facilitates modeling and inference for N-of-1 trial data in several ways. First, a key issue in modeling carryover effects in N-of-1 trial data is to address multicollinearity among the lagged treatment effects. Under a Bayesian framework, we achieve this by postulating a prior precision matrix on the lag coefficients that provides the appropriate constraints. Specifically, the designed form of this prior precision matrix is motivated by, and connected to, the fused ridge penalized estimation procedure (7), which imposes shrinkage and smoothness on the lag coefficients; see Appendix A for details. Second, while cross-validation is often the method of choice for tuning the penalty terms (λ_l and λ*_l via γ_1 and γ_2) in penalized estimation, it is not feasible to split the sample at random time points because of the temporal order of N-of-1 trial data. The Bayesian formulation provides a natural way to tune the penalties in a data-driven manner via posterior inference. Third, we have applied our model to a real application of white light therapy for depressive symptoms, along with other Bayesian approaches (BDLagM, BR-DLM, NB-DLM). These approaches allow posterior credible intervals of the individual lag coefficients to be used as inferential tools. While there are varying degrees of variability, the proposed BDLM-AR gives the shortest intervals. The posterior intervals offer additional insights into the whole time course of the treatment effect.

Data availability

R code for our Bayesian distributed lag model and corresponding simulations is available via the following GitHub repository, https://github.com/williammomo/BDLM-AR.

Appendix

Appendix A

Let π(A | ·) denote the conditional distribution of A given all other variables in the model. The posterior distribution of β̃ = (μ, β')' is

\pi(\tilde{\boldsymbol{\beta}}\mid\cdot) \propto \pi(\boldsymbol{Y}\mid\tilde{\boldsymbol{X}},\tilde{\boldsymbol{\beta}},\sigma^{2},\boldsymbol{\varphi},\boldsymbol{\gamma})\,\pi(\tilde{\boldsymbol{\beta}}\mid\sigma^{2},\boldsymbol{\gamma})
\propto \exp\left[-\frac{1}{2\sigma^{2}}(\boldsymbol{Y}^{*}-\tilde{\boldsymbol{X}}^{*}\tilde{\boldsymbol{\beta}})^{\prime}(\boldsymbol{Y}^{*}-\tilde{\boldsymbol{X}}^{*}\tilde{\boldsymbol{\beta}})\right]\exp\left[-\frac{1}{2\sigma^{2}}\tilde{\boldsymbol{\beta}}^{\prime}\tilde{\boldsymbol{\Omega}}(\boldsymbol{\gamma})\tilde{\boldsymbol{\beta}}\right]
\propto \exp\left[-\frac{1}{2\sigma^{2}}(\boldsymbol{Y}^{*}-\tilde{\boldsymbol{X}}^{*}\tilde{\boldsymbol{\beta}})^{\prime}(\boldsymbol{Y}^{*}-\tilde{\boldsymbol{X}}^{*}\tilde{\boldsymbol{\beta}})-\frac{1}{2\sigma^{2}}c_{0}\mu^{2}-\frac{1}{2\sigma^{2}}\sum_{l=0}^{L}\lambda_{l}\beta_{l}^{2}-\frac{1}{2\sigma^{2}}\sum_{l=0}^{L}\lambda^{*}_{l}(\beta_{l}-\beta_{l+1})^{2}\right].

The MAP estimate of β̃ is

\hat{\boldsymbol{\beta}}_{\text{MAP}} = \arg\max_{\tilde{\boldsymbol{\beta}}}\pi(\tilde{\boldsymbol{\beta}}\mid\cdot)
= \arg\min_{\tilde{\boldsymbol{\beta}}}\;(\boldsymbol{Y}^{*}-\tilde{\boldsymbol{X}}^{*}\tilde{\boldsymbol{\beta}})^{\prime}(\boldsymbol{Y}^{*}-\tilde{\boldsymbol{X}}^{*}\tilde{\boldsymbol{\beta}})+c_{0}\mu^{2}+\sum_{l=0}^{L}\lambda_{l}\beta_{l}^{2}+\sum_{l=0}^{L}\lambda^{*}_{l}(\beta_{l}-\beta_{l+1})^{2}.

Note that the model variance σ² is included in the prior distribution of β̃ as a scaling parameter. In this way, the ridge and smoothness penalties are controlled only by λ_l and λ*_l, without depending on the model variance or the prior covariance matrix of β̃.

Appendix B

Step 1. Set initial values for β̃, σ², φ and γ;
for i ← 1 to n_iteration do
       Step 2. Given the current value of φ, transform Y, X̃ to Y*, X̃* as described in equation (3) of Section 2.1; also construct the precision matrix Ω̃(γ) based on γ as described in Section 2.2;
       Step 3. Conditional on the current values of Y*, X̃*, σ² and Ω̃(γ), update β̃ based on π(β̃ | Y*, X̃*, σ², φ, γ);
       Step 4. Conditional on the current values of Y*, X̃*, β̃, φ and Ω̃(γ), update σ² based on π(σ² | Y*, X̃*, β̃, φ, γ);
       Step 5. Update ε* conditional on the current values of β̃, Y and X̃. Then update φ based on π(φ | Y, X̃, β̃, σ², γ). Reject the sample if any root of Φ(B) lies inside or on the unit circle;
       Step 6. Update (γ_1, γ_2) based on π(γ | β̃, σ²). Sample a proposal γ*_i by γ*_i = γ_i + aU(−1, 1) for i = 1, 2, where a is an adjustable step size. Compute the ratio
              R_γ = π(γ* | β̃, σ²) / π(γ | β̃, σ²)
       if Ω̃(γ*) is positive definite then
              update γ = γ* with probability min(1, R_γ);
       end if
end for
Algorithm 1: The hybrid Metropolis-Hastings/Gibbs algorithm

References

  • [1] R. Kravitz, N. Duan, I. Eslick, N. Gabler, H. Kaplan, et al., “Design and implementation of N-of-1 trials: a user’s guide,” Agency for Healthcare Research and Quality, US Department of Health and Human Services, 2014.
  • [2] E. J. Topol, “Transforming medicine via digital innovation,” Science translational medicine, vol. 2, no. 16, pp. 16cm4–16cm4, 2010.
  • [3] Y. K. Cheung, D. Wood, K. Zhang, T. A. Ridenour, L. Derby, T. St Onge, N. Duan, J. Duer-Hefele, K. W. Davidson, I. Kronish, et al., “Personal preferences for personalised trials among patients with chronic diseases: an empirical Bayesian analysis of a conjoint survey,” BMJ Open, vol. 10, no. 6, p. e036056, 2020.
  • [4] I. M. Kronish, Y. K. Cheung, J. Julian, F. Parsons, J. Lee, S. Yoon, H. Valdimarsdottir, P. Green, J. Suls, D. L. Hershman, et al., “Clinical usefulness of bright white light therapy for depressive symptoms in cancer survivors: Results from a series of personalized (N-of-1) trials,” in Healthcare, vol. 8, p. 10, Multidisciplinary Digital Publishing Institute, 2020.
  • [5] J. A. Johnson, S. N. Garland, L. E. Carlson, J. Savard, J. S. A. Simpson, S. Ancoli-Israel, and T. S. Campbell, “Bright light therapy improves cancer-related fatigue in cancer survivors: a randomized controlled trial,” Journal of Cancer Survivorship, vol. 12, no. 2, pp. 206–215, 2018.
  • [6] N. B. Gabler, N. Duan, S. Vohra, and R. L. Kravitz, “N-of-1 trials in the medical literature: a systematic review,” Medical care, pp. 761–768, 2011.
  • [7] L. M. Koyck, Distributed lags and investment analysis, vol. 4. North-Holland Publishing Company, 1954.
  • [8] S. Almon, “The distributed lag between capital appropriations and expenditures,” Econometrica: Journal of the Econometric Society, pp. 178–196, 1965.
  • [9] F. M. Bass and D. G. Clarke, “Testing distributed lag models of advertising effect,” Journal of Marketing Research, vol. 9, no. 3, pp. 298–308, 1972.
  • [10] L. J. Welty, R. Peng, S. Zeger, and F. Dominici, “Bayesian distributed lag models: estimating effects of particulate matter air pollution on daily mortality,” Biometrics, vol. 65, no. 1, pp. 282–291, 2009.
  • [11] A. Zanobetti, M. Wand, J. Schwartz, and L. Ryan, “Generalized additive distributed lag models: quantifying mortality displacement,” Biostatistics, vol. 1, no. 3, pp. 279–292, 2000.
  • [12] R. Tibshirani and P. Wang, “Spatial smoothing and hot spot detection for CGH data using the fused lasso,” Biostatistics, vol. 9, no. 1, pp. 18–29, 2008.
  • [13] S. Chib, “Bayes regression with autoregressive errors: A Gibbs sampling approach,” Journal of Econometrics, vol. 58, no. 3, pp. 275–294, 1993.
  • [14] A. Gelman, G. O. Roberts, W. R. Gilks, et al., “Efficient Metropolis jumping rules,” Bayesian Statistics, vol. 5, no. 599-608, p. 42, 1996.
  • [15] M. E. El-Mikkawy, “A fast algorithm for evaluating nth order tri-diagonal determinants,” Journal of computational and applied mathematics, vol. 166, no. 2, pp. 581–584, 2004.
  • [16] A. Gelman, D. B. Rubin, et al., “Inference from iterative simulation using multiple sequences,” Statistical science, vol. 7, no. 4, pp. 457–472, 1992.
  • [17] S. P. Brooks and A. Gelman, “General methods for monitoring convergence of iterative simulations,” Journal of computational and graphical statistics, vol. 7, no. 4, pp. 434–455, 1998.
  • [18] G. M. Ljung and G. E. Box, “On a measure of lack of fit in time series models,” Biometrika, vol. 65, no. 2, pp. 297–303, 1978.
  • [19] R. Tibshirani, M. Saunders, S. Rosset, J. Zhu, and K. Knight, “Sparsity and smoothness via the fused lasso,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 67, no. 1, pp. 91–108, 2005.
  • [20] H. B. Valdimarsdottir, M. G. Figueiro, W. Holden, S. Lutgendorf, L. M. Wu, S. Ancoli-Israel, J. Chen, A. Hoffman-Peterson, J. Granski, N. Prescott, et al., “Programmed environmental illumination during autologous stem cell transplantation hospitalization for the treatment of multiple myeloma reduces severity of depression: A preliminary randomized controlled trial,” Cancer medicine, vol. 7, no. 9, pp. 4345–4353, 2018.
  • [21] D. A. Belsley, “A guide to using the collinearity diagnostics,” Computer Science in Economics and Management, vol. 4, no. 1, pp. 33–50, 1991.
  • [22] D. V. Lindley and A. F. Smith, “Bayes estimates for the linear model,” Journal of the Royal Statistical Society: Series B (Methodological), vol. 34, no. 1, pp. 1–18, 1972.
  • [23] A. E. Hoerl and R. W. Kennard, “Ridge regression: Biased estimation for nonorthogonal problems,” Technometrics, vol. 12, no. 1, pp. 55–67, 1970.