
Sequential monitoring of response-adaptive randomized clinical trials

Hongjian Zhu (hz5n@virginia.edu) and Feifang Hu (fh6e@virginia.edu)
Department of Statistics, University of Virginia
Kerchof Hall, Charlottesville, Virginia 22904-4135, USA

Received September 2009; revised January 2010.
Abstract

Clinical trials are complex and usually involve multiple objectives, such as controlling the type I error rate, increasing power to detect treatment differences, and assigning more patients to the better treatment. In the literature, both response-adaptive randomization (RAR) procedures (which change the randomization procedure sequentially) and sequential monitoring (which changes the analysis procedure sequentially) have been proposed to achieve these objectives to some degree. In this paper, we propose to sequentially monitor response-adaptive randomized clinical trials and study the properties of this approach. We prove that the sequential test statistics of the new procedure converge to a Brownian motion in distribution. Further, we show that the sequential test statistics asymptotically satisfy the canonical joint distribution defined in Jennison and Turnbull (2000). Therefore, type I error and other objectives can be achieved theoretically by selecting appropriate boundaries. These results open a door to sequentially monitoring response-adaptive randomized clinical trials in practice. The simulation studies show that the proposed procedure brings together the advantages of both techniques in terms of power, total sample size and total number of failures, while controlling the type I error. In addition, we illustrate the characteristics of the proposed procedure by redesigning a well-known clinical trial of maternal-infant HIV transmission.

AMS subject classifications: 60F15, 62G10, 60F05, 60F10.

Keywords: Asymptotic properties, Brownian process, response-adaptive randomization, power, sample size, type I error.

doi: 10.1214/10-AOS796

Volume 38, Issue 4.


Supported by NSF Grants DMS-03-49048 and DMS-09-07297.

1 Introduction

Clinical trials usually involve multiple competing objectives, such as maximizing the power to detect clinical differences among treatments, minimizing the total sample size and protecting more patients from possibly inferior treatments. To achieve these objectives, two different techniques have been proposed in the literature: (i) the analysis approach, analyzing the observed data sequentially [sequential monitoring, Jennison and Turnbull (2000)]; and (ii) the design approach, changing the allocation probability sequentially [response-adaptive randomization, Hu and Rosenberger (2006)]. In this paper, we discuss how to combine the two procedures in one clinical trial in order to utilize the advantages of both.

In experiments where data accumulate sequentially, it is natural to conduct a sequential analysis. Sequential techniques originate from a methodology with a long history based on Brownian motion. Wald's classic work on the sequential probability ratio test (SPRT) [Wald (1947)] led to the application of sequential analysis in numerous fields of statistics. Armitage (1957, 1975) introduced sequential methods to clinical studies, which required monitoring results on a patient-by-patient basis. Pocock (1977) proposed sequential monitoring of clinical trials on a group basis. Since then, many authors have done important work on group sequential studies; this work is summarized in Jennison and Turnbull (2000) and Proschan, Lan and Wittes (2006).

The main advantages of sequential monitoring are listed in Jennison and Turnbull (2000). First, it is ethical to monitor clinical trials sequentially, because we can ensure that patients are not exposed to dangerous treatments and can stop a trial as soon as needed. Second, administratively, one needs to ensure that the protocol is not violated and that the assumptions on which the clinical trial is based remain valid. Third, sequential monitoring can decrease sample size and cost. With all of the above advantages, sequential monitoring has become a standard technique in conducting clinical trials.

The idea of response-adaptive randomization (RAR) can be traced back to Thompson (1933) and Robbins (1952). The play-the-winner rule [Zelen (1969)] and the randomized play-the-winner rule [Wei and Durham (1978)] were proposed to reduce the number of patients assigned to the inferior treatment. Hu and Rosenberger (2003) proved theoretically that adaptive randomization can be used to increase statistical efficiency in some clinical trials. Many papers in the literature have demonstrated its efficiency and ethical advantages over fixed designs [Hu and Rosenberger (2006)]. With modern technology and a high capability of collecting data, it has become increasingly easy to implement adaptive designs in sequential experiments. Some clinical trials have already implemented response-adaptive designs [Rout et al. (1993), Tamura et al. (1994), Andersen (1996), etc.].

Bayesian adaptive designs have also been proposed and studied in the literature. Berry (2005) provided a comprehensive introduction to Bayesian designs in clinical trials. Recently, Cheng and Shen (2005) proposed to sequentially monitor a Bayesian adaptive design using decision-theoretic approaches, allowing the maximum sample size to be sequentially adjusted by the observed data. Lewis, Lipsky and Berry (2007) proposed a Bayesian decision-theoretic group sequential design for a disease with two possible outcomes based on a quadratic loss function. Wathen and Thall (2008) studied Bayesian adaptive model selection for optimizing group sequential clinical trials. In this paper, we focus on sequential monitoring of response-adaptive randomized clinical trials.

Traditionally, sequential monitoring deals with fixed designs (usually with equal allocation). No systematic study is available on sequentially monitoring a sequential experiment that uses response-adaptive randomization, except for a simulation study by Coad and Rosenberger (1999). They found that the expected number of treatment failures can be further reduced by combining the triangular test with the randomized play-the-winner rule. In this paper, we study both the theoretical properties and the finite sample properties of combining sequential monitoring with response-adaptive randomization.

Sequential monitoring procedures use responses to stop or continue a clinical trial. Response-adaptive randomization procedures sequentially estimate the parameters and update the allocation probability for the next patient. To monitor a response-adaptive randomized clinical trial sequentially, one needs to study the two sequential procedures simultaneously. This is conceptually difficult because: (1) the number of patients assigned to each treatment is a random variable at each time point; and (2) both the treatment assignments (probabilities) and the estimators of the parameters (test statistics) depend on the responses at each time point. These problems arise from the sequential updating of the parameter estimators and the allocation probability function, which makes it difficult to find the joint distribution of the sequential test statistics. We overcome these difficulties by (i) approximating these different processes by martingale processes at each time point simultaneously, and (ii) then using a continuous Gaussian approximation to study these martingale processes simultaneously.

In this paper, we discuss sequential monitoring of the doubly adaptive biased coin design proposed by Hu and Zhang (2004) for comparing two treatments. Under widely satisfied conditions, we show that the sequential test statistics converge in distribution to (i) a standard Brownian motion under the null hypothesis, and (ii) a drifted Brownian motion under the alternative hypothesis. For a standard Brownian motion, the critical values for a fixed type I error rate have been well studied in the literature. Therefore, the problem of controlling the type I error is theoretically solved. Further, we show that the sequential test statistics asymptotically satisfy the canonical joint distribution defined in Jennison and Turnbull (2000). Hence, one can apply the group sequential methods in that book to response-adaptive randomized clinical trials.

Simulation results support our theoretical findings in terms of type I error and show that sequential monitoring of a response-adaptive randomization procedure can increase power and decrease the total number of failures. Also, compared to complete randomization, sequential monitoring of a response-adaptive randomization procedure can stop earlier and thus reduce the actual sample size. In other words, the proposed procedure achieves the goals of both RAR and sequential monitoring. We also redesign an experiment evaluating the effect of zidovudine treatment in reducing the risk of maternal-infant HIV transmission performed by Connor et al. (1994). The proposed procedure can be used to decrease the number of HIV-infected people and increase the power compared to complete randomization.

In Section 2, we introduce the notation, describe the framework and state the main theorem. In Sections 3 and 4, we use both generated data and real data to compare the proposed procedure with other randomization procedures. Conclusions are in Section 5 and technical proofs are given in the Appendix.

2 Sequential monitoring of response-adaptive randomization procedures

2.1 Notation and framework

We first describe the framework for the randomized adaptive designs. In this article, we consider clinical trials with two treatments, 1 and 2. Let $\mathbf{T}_i=(T_{i,1},T_{i,2})=(1,0)$, $i=1,\ldots,n$, if the $i$th patient is assigned to treatment 1, and $(0,1)$ otherwise, where $n$ is the sample size. Let $\mathbf{N}(n)=(N_1(n),N_2(n))$, where $N_j(n)=\sum_{i=1}^{n}T_{ij}$, $j=1,2$, is the number of patients on treatment $j$. Let $\mathbf{X}=(\mathbf{X}_1,\ldots,\mathbf{X}_n)'$, where $\mathbf{X}_i=(\mathbf{X}_{i1},\mathbf{X}_{i2})$, $i=1,\ldots,n$, is a random matrix of response variables and $\mathbf{X}_{ij}$, $j=1,2$, are $d$-dimensional random vectors. Only one element of $\mathbf{X}_i$, say $\mathbf{X}_{ij}$, can be observed if the $i$th patient is assigned to treatment $j$. We assume that $\mathbf{X}_1,\ldots,\mathbf{X}_n$ are independent and identically distributed with unknown parameter $(\theta_1,\theta_2)$, where $\theta_j$ is the corresponding $d_j$-dimensional parameter vector $(\theta_{j1},\ldots,\theta_{jd_j})$ of treatment $j$ ($j=1,2$). To simplify the notation, we assume that the parameter vectors of both treatments have the same dimension ($d_1=d_2=d$). Without loss of generality, we also assume that $\theta_j=E(\mathbf{X}_{ij})$. Otherwise, if a suitable transformation of $\mathbf{X}$ exists, we can treat the transformed values as responses so that this equation holds; such a transformation usually exists asymptotically. See Gwise, Hu and Hu (2008) and Hu and Zhang (2004) for further discussion.

Let $[nt]$ denote the largest integer smaller than or equal to $nt$ for $t\in[0,1]$. Then $\mathbf{N}([nt])=(N_1([nt]),N_2([nt]))$ and $N_j([nt])=\sum_{i=1}^{[nt]}T_{ij}$, $j=1,2$. Note that $t=N/n$ when $N$ is the number of patients who have already been enrolled. We introduce this so-called information time $t$ in order to formulate the problem in the Skorohod topology [Ethier and Kurtz (1986)]. After $N=[nt]$ patients have been assigned and their responses observed, we use the modified sample means $\hat{\theta}_{[nt]}=(\hat{\theta}_{[nt],1},\hat{\theta}_{[nt],2})$ to estimate the parameter $\theta=(\theta_1,\theta_2)$, that is,

$$\hat{\theta}_{[nt],1}=\frac{\sum_{i=1}^{[nt]}T_{i,1}\mathbf{X}_{i1}+\theta_{0,1}}{N_1([nt])+1}\quad\mbox{and}\quad\hat{\theta}_{[nt],2}=\frac{\sum_{i=1}^{[nt]}T_{i,2}\mathbf{X}_{i2}+\theta_{0,2}}{N_2([nt])+1}.\eqno(1)$$

Here, we add 1 to the denominator to prevent discontinuity, and add $\theta_{0,j}$, say 0.5, so that $\theta_j$ can still be estimated when no patient has yet been assigned to treatment $j$, $j=1,2$.
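As a small illustration, the modified estimator in (1) can be sketched for a single treatment arm; this is a minimal sketch (the function name and the use of NumPy are our own choices, with `theta0` defaulting to the 0.5 suggested above):

```python
import numpy as np

def modified_mean(responses, theta0=0.5):
    """Modified sample mean of equation (1) for one treatment arm:
    theta0 is added to the numerator and 1 to the denominator, so the
    estimate is well defined even before any patient is assigned."""
    responses = np.asarray(responses, dtype=float)
    return (responses.sum() + theta0) / (len(responses) + 1)
```

With no responses, the estimate equals `theta0`; as responses accumulate, it converges to the ordinary sample mean.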

Let $\rho=(\rho_1,\rho_2)$ be the target allocation proportion. Usually $\rho$ is obtained from some optimality criterion and depends on the unknown parameter $\theta$. The selection of $\rho=\rho(\theta)$ has been studied by Hayre (1979), Jennison and Turnbull (2000) and Tymofyeyev, Rosenberger and Hu (2007). In practice, the parameters are unknown, so we first estimate them from previous treatment assignments and responses in order to target the allocation proportion. We consider a general family of doubly adaptive biased coin designs (DBCD) [Eisele and Woodroofe (1995)] here.

Doubly adaptive biased coin design: (i) assign the first $2n_0$ patients to treatments 1 and 2 by some restricted randomization procedure [permuted block or truncated binomial randomization; see Rosenberger and Lachin (2002)]; (ii) when the $l$th ($l>2n_0$) patient arrives and all the responses of the previous $l-1$ patients are available, compute $\hat{\theta}_{l-1}$ and $\hat{\rho}_{l-1}=\rho(\hat{\theta}_{l-1})$; (iii) then assign the $l$th patient to treatment 1 with probability

$$g\bigl(N_1(l-1)/(l-1),\rho_1(\hat{\theta}_{l-1})\bigr),$$

where $g(s,r):[0,1]\times[0,1]\rightarrow[0,1]$ is the allocation function. Hu and Zhang (2004) proposed ($\gamma\geq 0$):

$$g^{(\gamma)}(0,r)=1,\qquad g^{(\gamma)}(1,r)=0,\qquad g^{(\gamma)}(s,r)=\frac{r(r/s)^{\gamma}}{r(r/s)^{\gamma}+(1-r)((1-r)/(1-s))^{\gamma}}.\eqno(2)$$

The design has drawn much attention since it was proposed and its advantages and properties can be found in Hu and Rosenberger (2003), Rosenberger and Hu (2004) and Tymofyeyev, Rosenberger and Hu (2007).
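The allocation function above translates directly into code; here is a minimal sketch (the function name and the default $\gamma=2$ are our own illustrative choices):

```python
def hu_zhang_g(s, r, gamma=2.0):
    """Hu and Zhang (2004) allocation function g^(gamma)(s, r) in (2):
    s is the current proportion assigned to treatment 1 and r is the
    (estimated) target proportion rho_1."""
    if s <= 0.0:   # boundary cases, first two lines of (2)
        return 1.0
    if s >= 1.0:
        return 0.0
    num = r * (r / s) ** gamma
    den = num + (1.0 - r) * ((1.0 - r) / (1.0 - s)) ** gamma
    return num / den
```

Note that $g(r,r)=r$, and over-allocation ($s>r$) lowers the assignment probability below the target, pushing the realized allocation back toward $\rho_1$.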

To compare two treatments in clinical trials, one considers a general hypothesis test:

$$H_0: h(\theta_1)=h(\theta_2)\quad\mbox{versus}\quad H_1: h(\theta_1)\neq h(\theta_2),$$

where $h$ is an $\Re^d\rightarrow\Re$ function of the parameters. In this paper, we assume $h(\theta_j)$ is continuous and twice differentiable in a small neighborhood of $\theta_j$, $j=1,2$. If one would like to test the above hypothesis at time point $t\in(0,1]$, it is natural to construct the test statistic as

$$Z_t\biggl(\frac{\mathbf{N}([nt])}{[nt]},\hat{\theta}([nt])\biggr)=\frac{h(\hat{\theta}_1([nt]))-h(\hat{\theta}_2([nt]))}{\sqrt{\widehat{\operatorname{Var}}(h(\hat{\theta}_1([nt])))+\widehat{\operatorname{Var}}(h(\hat{\theta}_2([nt])))}}.\eqno(3)$$

Here, $\widehat{\operatorname{Var}}(h(\hat{\theta}_1([nt])))$ and $\widehat{\operatorname{Var}}(h(\hat{\theta}_2([nt])))$ are consistent estimators of the variances of $h(\hat{\theta}_1([nt]))$ and $h(\hat{\theta}_2([nt]))$, respectively. There is no covariance term in the denominator because the two terms in the numerator are asymptotically independent [Hu, Rosenberger and Zhang (2006)]. Without loss of generality, we also assume that for some functions $v_1$ and $v_2$,

$$[nt]\widehat{\operatorname{Var}}(h(\hat{\theta}_j([nt])))=v_j\biggl(\frac{\mathbf{N}([nt])}{[nt]},\hat{\theta}([nt])\biggr)\bigl(1+o(1)\bigr)\qquad\mbox{a.s., }j=1,2.$$

It is easy to see that both $v_j(\mathbf{y},\mathbf{z})$ and $Z_t(\mathbf{y},\mathbf{z})$ are $\Re^{2+2d}\rightarrow\Re$ functions, where $\mathbf{y}$ is a two-dimensional vector and $\mathbf{z}$ is a $2d$-dimensional vector. Examples of this formulation are discussed in Section 2.3.

2.2 Main results

Based on the notation in Section 2.1, we observe the random processes $(\mathbf{T}_1,\ldots,\mathbf{T}_{[nt]})$, $(\mathbf{X}_1,\ldots,\mathbf{X}_{[nt]})$, $\mathbf{N}([nt])$, $\hat{\theta}_{[nt]}$, $\rho(\hat{\theta}_{[nt]})$ and $Z_t$ at time point $t$. When a response-adaptive randomization procedure is used, these random processes have the following characteristics, which differ from those of fixed designs:

(1) The allocation $\mathbf{N}([nt])$ at any time $t$ is a random vector, rather than a constant as in fixed designs.

(2) The allocation $\mathbf{N}([nt])$ and the assignments $(\mathbf{T}_1,\ldots,\mathbf{T}_{[nt]})$ are not independent of the responses $(\mathbf{X}_1,\ldots,\mathbf{X}_{[nt]})$ and the parameter estimator vector $\hat{\theta}_{[nt]}$.

(3) The elements $\hat{\theta}_1([nt])$ and $\hat{\theta}_2([nt])$ depend on each other at any given time $t\in(0,1]$.

These differences directly lead to difficulties in deriving the joint distributions of the sequential test statistics.

To sequentially monitor a clinical trial, we need to determine how to control the type I error. The answer relies on deriving the asymptotic joint distribution of the sequential statistics and on the right choice of boundaries. Before stating the main theorem, we need the following conditions on the response $\mathbf{X}$, the target allocation $\rho(\theta)$, the allocation function $g$ and the functions $v_j(\mathbf{y},\mathbf{z})$, $j=1,2$.

(A1) For some $\varepsilon>0$, $E\|\mathbf{X}_1\|^{2+\varepsilon}<\infty$;

(A2) $g(s,r)$ is jointly continuous and twice differentiable at $(\rho_1,\rho_1)$;

(A3) $g(r,r)=r$ for all $r\in(0,1)$, and $g(s,r)$ is strictly decreasing in $s$ and strictly increasing in $r$ on $(0,1)\times(0,1)$;

(A4) $\rho(\mathbf{z})$ is continuous and twice continuously differentiable in a small neighborhood of $\theta$;

(A5) $v_j(\mathbf{y},\mathbf{z})$ is jointly continuous and twice differentiable in a small neighborhood of $(\rho,\theta)$;

(A6) $Z_t(\mathbf{y},\mathbf{z})$ is continuous and twice continuously differentiable in a small neighborhood of the vector $(\rho,\theta)$.

Remark 2.1.

All of the conditions are widely satisfied. An example of a design satisfying them is the DBCD of Hu and Zhang (2004). Condition (A1) ensures the consistency of the procedure and the asymptotic normality of the allocation proportions. Condition (A3) forces the actual allocation proportion to approach the theoretical target. Conditions (A4), (A5) and (A6) are satisfied in all the examples in Chapter 5 of Hu and Rosenberger (2006).

Theorem 2.1

Let $B_t=\sqrt{t}Z_t$ in the space $D[0,1]$ with the Skorohod topology. Assume conditions (A1)–(A6) are satisfied. Then we have the following two results:

(i) Under $H_0$, $B_t$ converges in distribution to a standard Brownian motion.

(ii) Under $H_1$, $B_t-\sqrt{n}\mu t$ converges in distribution to a standard Brownian motion, where

$$\mu=\frac{h(\theta_1)-h(\theta_2)}{\sqrt{v_1(\rho,\theta)+v_2(\rho,\theta)}}.$$

Based on Theorem 2.1, we can obtain the asymptotic distribution of the sequence of test statistics $\{Z_{t_1},\ldots,Z_{t_K}\}$, where $0\leq t_1\leq t_2\leq\cdots\leq t_K\leq 1$. Because $Z_{t_i}=(\sqrt{t_i})^{-1}B_{t_i}$, we have asymptotically:

(i) $\{Z_{t_1},\ldots,Z_{t_K}\}$ is multivariate normal;

(ii) $EZ_{t_i}=\mu\sqrt{nt_i}$; and

(iii) $\operatorname{Cov}(Z_{t_i},Z_{t_j})=\sqrt{[nt_i]/[nt_j]}$, $0\leq t_i\leq t_j\leq 1$.

Therefore, the sequence of test statistics $\{Z_{t_1},\ldots,Z_{t_K}\}$ asymptotically has the canonical joint distribution defined in Jennison and Turnbull (2000).
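The covariance structure above can be checked numerically in a simplified setting: under complete randomization with known variances, $B_t$ behaves like a rescaled random walk, so a plain Gaussian random walk can stand in for the underlying process. A sketch under these simplifying assumptions (the sample size, number of replications and seed are arbitrary) compares the empirical correlation of $Z_{0.5}$ and $Z_1$ with the predicted $\sqrt{0.5}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 400, 10000

# Gaussian random walk standing in for the (suitably scaled) process;
# Z_t = S_[nt] / sqrt([nt]).
steps = rng.standard_normal((reps, n))
S = steps.cumsum(axis=1)
Z_half = S[:, n // 2 - 1] / np.sqrt(n // 2)
Z_one = S[:, n - 1] / np.sqrt(n)

# Canonical joint distribution predicts Corr(Z_s, Z_t) = sqrt(s / t).
corr = np.corrcoef(Z_half, Z_one)[0, 1]
```

With 10,000 replications the empirical correlation agrees with $\sqrt{0.5}\approx 0.707$ to well within Monte Carlo error.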

Remark 2.2.

Based on the canonical joint distribution of the sequence of test statistics $\{Z_{t_1},\ldots,Z_{t_K}\}$, we can see that the doubly adaptive biased coin design has a simple form of information time, namely the proportion of the sample size already enrolled. This is because the DBCD asymptotically allocates the same proportion of patients to the treatments from the beginning to the end. We conjecture that this simple form of information time holds for most response-adaptive randomization procedures.

Based on Theorem 2.1, we can easily choose the correct critical values for the asymptotic Brownian process, so that inflation of the type I error is avoided. Moreover, we can make use of the well-known properties of the Brownian process for further analysis when sequentially monitoring a response-adaptive randomization procedure. Because $\{Z_{t_1},\ldots,Z_{t_K}\}$ satisfies the canonical joint distribution asymptotically, we can apply the sequential techniques in Chapters 2–7 of Jennison and Turnbull (2000) to response-adaptive randomized clinical trials. We may also apply different types of spending functions to monitor a response-adaptive randomized clinical trial sequentially. Here, we use the $\alpha$ spending functions proposed by Lan and DeMets (1983).

Any increasing function $\alpha(t)$ defined on $[0,1]$ with $\alpha(0)=0$ and $\alpha(1)=\alpha$ is called an $\alpha$ spending function. We spend $\alpha(t_i)-\alpha(t_{i-1})$ of the total type I error rate at time point $t_i$, so that $\alpha(t_i)$ has been spent after this point. For times $t_i$, $i=1,2,\ldots,$ we can sequentially obtain the boundaries. This method requires neither a predetermined number of looks nor equally spaced looks; interim monitoring can be performed at any time during the trial. Such a procedure is usually preferred by Data and Safety Monitoring Boards (DSMB). Proschan, Lan and Wittes (2006) provided three special spending functions. The first approximates the O’Brien–Fleming boundaries [O’Brien and Fleming (1979)]:

$$\alpha_1(t)=2\{1-\Phi(z_{\alpha/2}/\sqrt{t})\}.$$

The second one is the linear spending function:

$$\alpha_2(t)=\alpha t.$$

The third one approximates the Pocock boundaries [Pocock (1982)]:

$$\alpha_3(t)=\alpha\ln\{1+(e-1)t\}.$$

The O’Brien–Fleming-like function spends little of the type I error at early looks; consequently, the boundary for the last look is very close to what it would have been without sequential monitoring. Conversely, the Pocock-like function makes it easier to reject the null hypothesis at early looks, with smaller boundaries, and then must use a reasonably large critical value at the end to maintain the type I error. The linear function lies between these two. The three functions therefore represent three typical types of spending function. Finally, it is worth mentioning that these three spending functions correspond to the process $Z_t$.
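The three spending functions are simple to compute; here is a minimal sketch for $\alpha=0.05$ (the hard-coded $z_{\alpha/2}$ and the function names are assumptions tied to that level, not part of the general method):

```python
from math import sqrt, log, e, erf

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

Z_ALPHA_2 = 1.959964  # z_{alpha/2} for alpha = 0.05

def alpha_obf(t):
    """O'Brien-Fleming-like: spends almost no type I error early."""
    return 2.0 * (1.0 - Phi(Z_ALPHA_2 / sqrt(t)))

def alpha_linear(t, alpha=0.05):
    """Linear spending function."""
    return alpha * t

def alpha_pocock(t, alpha=0.05):
    """Pocock-like: spends type I error quickly at early looks."""
    return alpha * log(1.0 + (e - 1.0) * t)
```

All three spend the full $\alpha=0.05$ at $t=1$, while at $t=0.2$ the O’Brien–Fleming-like function has spent far less than the linear and Pocock-like ones, illustrating the ordering described above.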

2.3 Examples

Here, we use two examples to illustrate how to sequentially monitor the response-adaptive randomization procedures based on Theorem 2.1.

Example 1 (Continuous responses from normal populations).

Suppose the responses of the two treatments are from two normal distributions, $Y_{i1}\sim N(\mu_1,\sigma_1^2)$ and $Y_{i2}\sim N(\mu_2,\sigma_2^2)$, $i=1,\ldots,n$. We would like to compare $\mu_1$ and $\mu_2$. In this case, $\theta_1=(\mu_1,\sigma_1^2+\mu_1^2)$, $\theta_2=(\mu_2,\sigma_2^2+\mu_2^2)$, $\mathbf{X}_{ij}=(Y_{ij},Y_{ij}^2)$ and $h(\theta_j)=\theta_{j1}=\mu_j$, $j=1,2$. Then the hypotheses are

$$H_0: \mu_1=\mu_2\quad\mbox{versus}\quad H_1: \mu_1\neq\mu_2.$$

Let the target allocation proportion be the Neyman allocation [Jennison and Turnbull (2000)] with

$$\rho_1=\frac{\sigma_1}{\sigma_1+\sigma_2}\quad\mbox{and}\quad\rho_2=1-\rho_1=\frac{\sigma_2}{\sigma_1+\sigma_2}.\eqno(4)$$

Other target allocation proportions can also be used, for example, the optimal allocation proportion [Zhang and Rosenberger (2006)] and the $D_A$-optimal allocation proportion [Gwise, Hu and Hu (2008)]. The sequential statistic $Z_t(\mathbf{y},\mathbf{z})$ is a function from $\Re^6$ to $\Re$:

$$Z_t(\mathbf{y},\mathbf{z})=\frac{z_{11}-z_{21}}{\sqrt{(z_{12}-z_{11}^2)/([nt]y_1)+(z_{22}-z_{21}^2)/([nt]y_2)}},$$

where $\mathbf{y}=\mathbf{N}([nt])/[nt]$ and $\mathbf{z}=\hat{\theta}=(\hat{\theta}_{11}([nt]),\hat{\theta}_{12}([nt]),\hat{\theta}_{21}([nt]),\hat{\theta}_{22}([nt]))$. It is easy to see that $h(\hat{\theta}_1([nt]))=\hat{\mu}_1([nt])$ and $h(\hat{\theta}_2([nt]))=\hat{\mu}_2([nt])$. The natural variance estimators are

$$\widehat{\operatorname{Var}}(h(\hat{\theta}_1([nt])))=\frac{\hat{\sigma}_1^2([nt])}{N_1([nt])}\quad\mbox{and}\quad\widehat{\operatorname{Var}}(h(\hat{\theta}_2([nt])))=\frac{\hat{\sigma}_2^2([nt])}{N_2([nt])},$$

where $\hat{\sigma}_1^2([nt])$ and $\hat{\sigma}_2^2([nt])$ are the usual unbiased estimators of $\sigma_1^2$ and $\sigma_2^2$ based on the first $[nt]$ responses ($N_1([nt])$ from treatment 1 and $N_2([nt])$ from treatment 2), respectively. Therefore,

$$v_1(\rho,\theta)=\frac{\sigma_1^2}{\rho_1}\quad\mbox{and}\quad v_2(\rho,\theta)=\frac{\sigma_2^2}{\rho_2}.$$

The test statistic is then

$$Z_t=\frac{\hat{\mu}_1([nt])-\hat{\mu}_2([nt])}{\sqrt{\hat{\sigma}_1^2([nt])/N_1([nt])+\hat{\sigma}_2^2([nt])/N_2([nt])}}.\eqno(5)$$

Then, based on Theorem 2.1, $B_t=\sqrt{t}Z_t$ is asymptotically a standard Brownian process under $H_0$. Under $H_1$, $B_t-\sqrt{n}\mu t$ is asymptotically a standard Brownian motion in distribution, where

$$\mu=\frac{\mu_1-\mu_2}{\sqrt{\sigma_1^2/\rho_1+\sigma_2^2/(1-\rho_1)}}.$$
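To make this concrete, here is a sketch of a single DBCD trial targeting Neyman allocation with the Hu–Zhang function and $\gamma=2$; the sample sizes mirror the simulation settings of Section 3, while the seed and helper names are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)

def g(s, r, gamma=2.0):
    """Hu-Zhang allocation function of (2)."""
    if s <= 0.0:
        return 1.0
    if s >= 1.0:
        return 0.0
    num = r * (r / s) ** gamma
    return num / (num + (1.0 - r) * ((1.0 - r) / (1.0 - s)) ** gamma)

def dbcd_trial(mu=(1.0, 1.0), sigma=(1.0, 2.0), n=500, n0=25):
    """One trial; returns the final proportion assigned to treatment 1.
    Target is the Neyman allocation rho_1 = sigma_1 / (sigma_1 + sigma_2)."""
    arms = list(rng.permutation([0, 1] * n0))   # first 2*n0 patients balanced
    ys = [rng.normal(mu[a], sigma[a]) for a in arms]
    for _ in range(2 * n0, n):
        y, a = np.array(ys), np.array(arms)
        s1 = y[a == 0].std(ddof=1)
        s2 = y[a == 1].std(ddof=1)
        r1 = s1 / (s1 + s2)                     # estimated Neyman target
        s = (a == 0).mean()                     # current proportion on arm 1
        nxt = 0 if rng.random() < g(s, r1) else 1
        arms.append(nxt)
        ys.append(rng.normal(mu[nxt], sigma[nxt]))
    return float(np.mean(np.array(arms) == 0))

rho1_hat = dbcd_trial()
```

With $\sigma_1=1$ and $\sigma_2=2$, the realized proportion on treatment 1 settles near the target $1/3$, consistent with Table 1 below.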
Example 2 (Binary responses).

Assume $Y_{i1}\sim\operatorname{Bin}(1,p_1)$ and $Y_{i2}\sim\operatorname{Bin}(1,p_2)$, $i=1,\ldots,n$, and we would like to compare $p_1$ and $p_2$. In this case, $\theta_1=(p_1)$, $\theta_2=(p_2)$, $\mathbf{X}_{ij}=(Y_{ij})$ and $h(\theta_j)=\theta_{j1}$, $j=1,2$. The hypotheses are

$$H_0: p_1=p_2\quad\mbox{versus}\quad H_1: p_1\neq p_2.$$

Three common target allocations are: (i) Neyman allocation,

$$\rho_1=\frac{\sqrt{p_1(1-p_1)}}{\sqrt{p_1(1-p_1)}+\sqrt{p_2(1-p_2)}}\quad\mbox{and}\quad\rho_2=\frac{\sqrt{p_2(1-p_2)}}{\sqrt{p_1(1-p_1)}+\sqrt{p_2(1-p_2)}};\eqno(6)$$

(ii) the optimal allocation proposed by Rosenberger et al. (2001),

$$\rho_1=\frac{\sqrt{p_1}}{\sqrt{p_1}+\sqrt{p_2}}\quad\mbox{and}\quad\rho_2=\frac{\sqrt{p_2}}{\sqrt{p_1}+\sqrt{p_2}};\eqno(7)$$

(iii) the urn allocation [Wei and Durham (1978)], where $q_j=1-p_j$, $j=1,2$:

$$\rho_1=\frac{q_2}{q_1+q_2}\quad\mbox{and}\quad\rho_2=\frac{q_1}{q_1+q_2}.\eqno(8)$$

Neyman allocation is commonly discussed in the response-adaptive randomization literature because of its connection to efficiency. We study sequential monitoring of response-adaptive designs with Neyman allocation in order to show that the proposed procedure is able to achieve various objectives.
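The three target allocations above translate directly into code; a minimal sketch (the function names are ours, and for the urn allocation we take $q_j=1-p_j$):

```python
from math import sqrt

def neyman_rho1(p1, p2):
    """Neyman allocation: proportional to the binomial standard deviations."""
    a, b = sqrt(p1 * (1.0 - p1)), sqrt(p2 * (1.0 - p2))
    return a / (a + b)

def optimal_rho1(p1, p2):
    """Optimal allocation (7) of Rosenberger et al. (2001)."""
    return sqrt(p1) / (sqrt(p1) + sqrt(p2))

def urn_rho1(p1, p2):
    """Urn allocation (8), with q_j = 1 - p_j."""
    q1, q2 = 1.0 - p1, 1.0 - p2
    return q2 / (q1 + q2)
```

For example, with $p_1=0.8$ and $p_2=0.2$, Neyman allocation gives $\rho_1=0.5$ (equal standard deviations), the optimal allocation gives $\rho_1=2/3$, and the urn allocation gives $\rho_1=0.8$, illustrating how the latter two skew toward the better treatment.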

In this case, $Z_t(\mathbf{y},\mathbf{z})$ is a function from $\Re^4$ to $\Re$:

$$Z_t(\mathbf{y},\mathbf{z})=\frac{z_{11}-z_{21}}{\sqrt{z_{11}(1-z_{11})/([nt]y_1)+z_{21}(1-z_{21})/([nt]y_2)}},$$

where $\mathbf{y}=(N_1([nt])/[nt],N_2([nt])/[nt])$, $\mathbf{z}=(\hat{\theta}_{11}([nt]),\hat{\theta}_{21}([nt]))$, $h(\hat{\theta}_1([nt]))=\hat{p}_1([nt])$ and $h(\hat{\theta}_2([nt]))=\hat{p}_2([nt])$. The corresponding variance estimators are

$$\widehat{\operatorname{Var}}(h(\hat{\theta}_1([nt])))=\frac{\hat{p}_1([nt])(1-\hat{p}_1([nt]))}{N_1([nt])}\quad\mbox{and}\quad\widehat{\operatorname{Var}}(h(\hat{\theta}_2([nt])))=\frac{\hat{p}_2([nt])(1-\hat{p}_2([nt]))}{N_2([nt])}.$$

Therefore,

$$v_1(\rho,\theta)=\frac{p_1(1-p_1)}{\rho_1}\quad\mbox{and}\quad v_2(\rho,\theta)=\frac{p_2(1-p_2)}{\rho_2}.$$

The test statistic is

$$Z_t=\frac{\hat{p}_1([nt])-\hat{p}_2([nt])}{\sqrt{\hat{p}_1([nt])(1-\hat{p}_1([nt]))/N_1([nt])+\hat{p}_2([nt])(1-\hat{p}_2([nt]))/N_2([nt])}}.$$

Then $B_t=\sqrt{t}Z_t$ converges in distribution to a standard Brownian process under $H_0$. Under $H_1$, $B_t-\sqrt{n}\mu t$ is asymptotically a standard Brownian motion in distribution, where

$$\mu=\frac{p_1-p_2}{\sqrt{p_1(1-p_1)/\rho_1+p_2(1-p_2)/(1-\rho_1)}}.$$

Theorem 2.1 can be applied in different situations, such as the examples considered in Chapter 5 of Hu and Rosenberger (2006). In Examples 1 and 2, suppose we look at the process at three time points: $t_1=0.2$, $t_2=0.5$ and $t_3=1$. Then we can use the corresponding critical values from the three spending functions [Proschan, Lan and Wittes (2006)] of the preceding subsection for $Z_t$ to keep the overall type I error at 0.05: O’Brien–Fleming-like boundaries (4.877, 2.963, 1.969), linear boundaries (2.576, 2.377, 2.141) and Pocock-like boundaries (2.438, 2.333, 2.225).

3 Simulation study

In Section 2, we obtained the asymptotic distribution of the test statistic Z_{t}. In this section, we use the two examples in Section 2 to study the finite-sample properties of the proposed procedure.

In Examples 1 and 2, we use the doubly adaptive biased coin design with Hu and Zhang's allocation function in (2.1) with \gamma=2. In Tables 1–5, we use the same total sample size of 500. The first 50 patients (n_{0}=25 per treatment) are randomly assigned to treatments 1 and 2 by permuted block randomization. Then, for the lth (l>50) patient, the unknown parameters are estimated by using (1) based on the first l-1 responses with \theta_{0,1}=\theta_{0,2}=0.5. For normal responses in Example 1, we estimate \sigma_{1}^{2} and \sigma_{2}^{2} by the standard unbiased estimators based on the first l-1 responses.
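For concreteness, a minimal sketch of the allocation function in (2.1) is given below; this is our reading of the Hu and Zhang (2004) allocation function with tuning parameter \gamma, and the Python names are ours:

```python
def hu_zhang_g(x, rho, gamma=2.0):
    """Hu-Zhang allocation function g(x, rho): probability of assigning
    the next patient to treatment 1, given the current sample proportion
    x on treatment 1 and the estimated target proportion rho."""
    num = rho * (rho / x) ** gamma
    den = num + (1 - rho) * ((1 - rho) / (1 - x)) ** gamma
    return num / den
```

At x=\rho the assignment probability equals \rho, and for \gamma=2 it is pushed below \rho whenever treatment 1 is currently over-represented, which is how the design drives the allocation toward its target.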

Table 1: Example 1 with Neyman allocation, \mu_{1}=\mu_{2}=1, \sigma_{1}=1, \sigma_{2}=2
Critical values Randomization Type I error \hat{\rho}_{1} (s.e.)
O–F-like DBCD 0.055 0.333 (0.020)
O–F-like CR 0.052 0.500 (0.022)
Linear DBCD 0.048 0.333 (0.020)
Linear CR 0.053 0.500 (0.023)
Pocock-like DBCD 0.051 0.332 (0.020)
Pocock-like CR 0.052 0.500 (0.023)
Table 2: Example 2 with optimal allocation, p_{1}=p_{2}=0.5
Critical values Randomization Type I error \hat{\rho}_{1} (s.e.)
O–F-like DBCD 0.051 0.500 (0.016)
O–F-like CR 0.046 0.500 (0.023)
Linear DBCD 0.055 0.500 (0.019)
Linear CR 0.061 0.500 (0.023)
Pocock-like DBCD 0.056 0.500 (0.019)
Pocock-like CR 0.050 0.500 (0.022)
Table 3: Example 1 with Neyman allocation, \mu_{1}=1, \mu_{2}=1.4, \sigma_{1}=1, \sigma_{2}=2
Critical values Randomization Power \hat{\rho}_{1} (s.e.) N_{1} N_{2} N_{3}
O–F-like DBCD 0.847 0.333 (0.021) 2 1013 3222
O–F-like CR 0.807 0.500 (0.024) 1 842 3193
Linear DBCD 0.812 0.332 (0.027) 594 1429 2035
Linear CR 0.765 0.500 (0.028) 477 1380 1970
Pocock-like DBCD 0.792 0.332 (0.028) 741 1443 1774
Pocock-like CR 0.738 0.500 (0.028) 544 1309 1835
Table 4: Example 2 with urn allocation, p_{1}=0.5, p_{2}=0.625
Critical values Randomization Power \hat{\rho}_{1} (s.e.) N_{1} N_{2} N_{3} Total failures (s.e.)
O–F-like DBCD 0.811 0.426 (0.033) 4 839 3214 211 (13)
O–F-like CR 0.811 0.500 (0.024) 1 839 3215 217 (13)
Linear DBCD 0.762 0.421 (0.041) 503 1396 1912 206 (14)
Linear CR 0.767 0.500 (0.029) 521 1300 2016 212 (14)
Pocock-like DBCD 0.749 0.421 (0.042) 609 1325 1809 205 (14)
Pocock-like CR 0.738 0.501 (0.029) 603 1312 1773 211 (15)

For simplicity, we look at the test at three time points [n_{1}=100 (t_{1}=0.2), n_{2}=250 (t_{2}=0.5) and n=500 (t_{3}=1)]. Then the three sets of spending function boundaries in Section 2.3 are used to ensure \alpha=0.05. For each spending function, the first row in the table is for the DBCD and the second row is for complete randomization (denoted as CR in the tables). All the simulations are based on 5000 replications.

In Table 1, we simulate Example 1 with two normal responses N(1,1) and N(1,2) using the Neyman allocation (4). We find that the type I errors of sequentially monitoring the response-adaptive randomization procedure and complete randomization are both well controlled at the 0.05 level. We also report the mean and standard deviation of the actual allocation proportion (\hat{\rho}_{1}) for treatment 1 [N(1,1)]. The mean agrees with the Neyman allocation and the standard deviation is reasonably small for the DBCD, which indicates that the DBCD targets the theoretical allocation proportion very well. In Table 2, we simulate Example 2 with two binary responses p_{1}=p_{2}=0.5 and the optimal allocation (7) as the target. We reach the same conclusion as in Table 1. We have also run simulations for other cases, with similar results. These numerical results indicate that sequential monitoring of response-adaptive randomization does not inflate the type I error when the boundaries are chosen according to Theorem 2.1.

Table 5: Example 2 with optimal allocation, p_{1}=0.5, p_{2}=0.625
Critical values Randomization Power \hat{\rho}_{1} (s.e.) N_{1} N_{2} N_{3} Total failures (s.e.)
O–F-like DBCD 0.810 0.471 (0.017) 4 863 3185 214 (12)
O–F-like CR 0.805 0.501 (0.024) 4 795 3229 218 (13)
Linear DBCD 0.768 0.468 (0.022) 520 1354 1964 210 (14)
Linear CR 0.762 0.500 (0.029) 474 1367 1971 214 (14)
Pocock-like DBCD 0.754 0.469 (0.023) 673 1309 1787 210 (14)
Pocock-like CR 0.749 0.500 (0.030) 602 1351 1793 213 (15)
1.96 DBCD 0.805 0.472 (0.015) NA NA NA 217 (11)
1.96 CR 0.802 0.500 (0.022) NA NA NA 221 (11)

Next, we show other advantages of sequentially monitoring the response-adaptive randomization procedure. In Table 3, we simulate Example 1 with two normal responses N(1,1) and N(1.4,2) using the Neyman allocation (4), the target allocation that maximizes the power. The power of sequentially monitoring the response-adaptive randomization procedure is about 5%–8% higher than that of sequentially monitoring complete randomization. N_{i} in the table is the number of rejections at the ith look; rejection at one of the first two looks means the trial stops early. The DBCD with sequential monitoring clearly stops the trial earlier than complete randomization.

In Table 4, we simulate Example 2 with two binary responses p_{1}=0.5 and p_{2}=0.625 using the urn allocation (8), the target allocation that assigns more patients to the better treatment. If we reject the null hypothesis at one of the first two looks, we assign all the remaining patients to the estimated better treatment and count the total number of failures. We do this only for comparison in the simulation study; in a real clinical trial, we stop the trial once the null hypothesis is rejected at an interim look. From the mean total failure number, the DBCD with sequential monitoring has fewer failures than complete randomization for each type of spending function. N_{1}, N_{2} and N_{3} show that our method stops the trial a little earlier while the power is almost the same.

In Table 5, we simulate Example 2 with two binary responses p_{1}=0.5 and p_{2}=0.625 using the optimal allocation (7), which maximizes the power while controlling the total number of failures. We deal with the remaining patients in the same way as in Table 4 if we reject the null hypothesis at one of the first two looks. We find that sequentially monitoring the response-adaptive randomization procedure achieves the aim of the optimal allocation: its power is higher and its failure number lower than those of the complete randomization procedure. In this table, we also run the simulation without sequential monitoring; that is, we look at the test only once at the end of the trial, with critical value 1.96 for the nominal significance level 0.05. The results are reported in the last two rows. It is obvious that sequential monitoring reduces the total number of failures.
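The target allocation proportions behind Tables 3–5 have closed forms, and the simulated means of \hat{\rho}_{1} can be checked against them. The sketch below uses the standard formulas as we read allocations (4), (8) and (7): Neyman \rho_{1}=\sigma_{1}/(\sigma_{1}+\sigma_{2}), the limiting urn allocation \rho_{1}=q_{2}/(q_{1}+q_{2}) with q_{k}=1-p_{k}, and the optimal allocation \rho_{1}=\sqrt{p_{1}}/(\sqrt{p_{1}}+\sqrt{p_{2}}):

```python
from math import sqrt

sigma1, sigma2 = 1.0, 2.0        # Example 1 settings (Table 3)
p1, p2 = 0.5, 0.625              # Example 2 settings (Tables 4 and 5)
q1, q2 = 1 - p1, 1 - p2

neyman = sigma1 / (sigma1 + sigma2)          # Table 3 target: 1/3
urn = q2 / (q1 + q2)                         # Table 4 target: ~0.429
optimal = sqrt(p1) / (sqrt(p1) + sqrt(p2))   # Table 5 target: ~0.472
print(round(neyman, 3), round(urn, 3), round(optimal, 3))
```

These values agree closely with the simulated means of \hat{\rho}_{1} reported for the DBCD in Tables 3–5, confirming that the design tracks its target.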

Based on the simulation results, we can see the advantages of sequentially monitoring response-adaptive randomized clinical trials: (i) controlling the type I error well; (ii) reducing the total number of failures; (iii) increasing power; and (iv) stopping the trial earlier (reducing the total sample size).

4 Re-designing the HIV transmission trial

Maternal-infant transmission is the primary means by which infants are infected with HIV. Connor et al. (1994) reported a trial to evaluate the drug AZT (zidovudine) in reducing the risk of maternal-infant HIV transmission. In this clinical trial, 477 HIV-infected pregnant women were enrolled from April 1991 to December 1993 and assigned to the zidovudine treatment group and the placebo group with a 50–50 randomization scheme. The experiment was a randomized, double-blind, placebo-controlled trial: 239 women were allocated to the treatment group and 238 to the placebo group. At the end of the trial, 8.3% of the infants from the treatment group were infected with HIV, while 25.5% from the placebo group were infected.

Table 6: Re-design of the HIV trial with the full sample size
Target allocation Critical values \hat{\rho}_{1} (s.e.) Power Total failures (s.e.)
CR Linear 0.500 (0.039) 0.999 60.1 (11.1)
CR 1.96 0.501 (0.023) 0.999 80.7 (8.2)
Urn allocation Linear 0.751 (0.062) 0.996 52.3 (9.2)
Optimal allocation Linear 0.527 (0.021) 0.997 56.4 (10.8)

In Table 6, we redesign the study by sequentially monitoring both complete randomization (the first two rows of the table) and response-adaptive randomization [DBCD (2.1) with \gamma=2] (the remaining rows). We assume the success rate for the treatment group is p_{1}=0.917 and that for the placebo group is p_{2}=0.745 (as reported in the original paper). We look at the test at the same three time points as in the last section: n_{1}=95 (t_{1}=0.2), n_{2}=143 (t_{2}=0.5) and n=239 (t_{3}=1). The boundary used is the linear spending function (2.576, 2.377, 2.141), except in the second row of the table, where we use equal allocation without sequential monitoring. We report the actual allocation proportion for the treatment group, the power and the total number of HIV-infected infants. As before, if we reject the null hypothesis at one of the first two looks, we assign all the remaining patients to the estimated better treatment. From the first two rows, we find that the sequential monitoring technique decreases the HIV-infected number dramatically. The response-adaptive randomization technique also reduces the HIV-infected number compared to complete randomization. Sequentially monitoring the DBCD while targeting the urn allocation yields the smallest HIV-infected number, which agrees with the aim of the urn allocation.

Table 7: Re-design of the HIV trial with sample size n=245
Target allocation Critical values \hat{\rho}_{1} (s.e.) Power Total failures (s.e.)
CR O–F-like 0.500 (0.036) 0.947 40.1 (7.0)
CR Linear 0.501 (0.042) 0.942 36.6 (7.5)
CR 1.96 0.500 (0.032) 0.958 43.1 (5.8)
Urn allocation O–F-like 0.745 (0.068) 0.920 30.7 (5.9)
Urn allocation Linear 0.747 (0.074) 0.885 29.3 (6.1)
Optimal allocation O–F-like 0.528 (0.023) 0.952 36.8 (6.7)
Optimal allocation Linear 0.529 (0.025) 0.945 32.8 (7.3)

In Table 7, we reduce the full sample size to 245 (to achieve power 0.95 for complete randomization) and keep all the other settings unchanged. We reach the same conclusion about the HIV-infected number as in Table 6. We also find that targeting the optimal allocation with the DBCD gives slightly higher power than targeting equal allocation when sequential monitoring is used. Targeting the urn allocation with the DBCD gives slightly lower power, but the smallest HIV-infected number. Overall, sequentially monitoring the response-adaptive randomization procedure is better than sequentially monitoring complete randomization, since it reduces the HIV-infected number while retaining good power.
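The fixed-sample row for complete randomization in Table 7 can be checked with a normal-approximation power calculation for the two-sample test of proportions under equal allocation; this back-of-the-envelope sketch is ours:

```python
from math import sqrt
from statistics import NormalDist

p1, p2, n = 0.917, 0.745, 245   # assumed success rates, total sample size
# Standard error of p1_hat - p2_hat with n/2 patients per arm.
se = sqrt(p1 * (1 - p1) / (n / 2) + p2 * (1 - p2) / (n / 2))
# Power of the two-sided level-0.05 z-test (upper-tail approximation).
power = NormalDist().cdf((p1 - p2) / se - NormalDist().inv_cdf(0.975))
print(round(power, 3))  # ~0.958, in line with the 1.96/CR row of Table 7
```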

5 Concluding remarks

Sequential monitoring has become a standard technique in clinical trials. To apply response-adaptive randomization in clinical trials, it is important to know how to sequentially monitor adaptive randomized trials. In this paper, we overcome this hurdle and show the advantages of sequentially monitoring response-adaptive randomized clinical trials both theoretically and numerically. We use a Gaussian process in the Skorohod topology to describe the relationship between the allocation and the parameter estimators. One of the main contributions of this paper is to show that the sequential statistics can be asymptotically approximated by a Brownian process in distribution under both the null and alternative hypotheses. Further, we find that the sequential test statistics satisfy the canonical joint distribution asymptotically. Consequently, the results of this paper not only solve the problem of preserving a preset type I error but may also lead to many areas of potential future research.

We have studied how to sequentially monitor a clinical trial based on the doubly adaptive biased coin design proposed by Eisele and Woodroofe (1995) and Hu and Zhang (2004). Another important family of response-adaptive randomization procedures is based on urn models, which include the randomized play-the-winner rule [Wei and Durham (1978)], generalized Friedman's urn models [Athreya and Karlin (1968), Bai and Hu (2005)], the drop-the-loser rule [Ivanova (2003)], sequential estimation-adjusted urn models [Zhang, Hu and Cheung (2006)], etc. The technique used in this paper opens a door to studying the properties of sequential monitoring of clinical trials based on these urn models or on the efficient randomized adaptive designs [Hu, Zhang and He (2009)]. We leave this for future study.

In this paper, we have used \alpha-spending functions to calculate the critical boundaries. Because the sequential test statistics satisfy the canonical joint distribution asymptotically, we can implement all the sequential techniques introduced in Jennison and Turnbull (2000) based on this canonical form. We can also use the optimal spending functions in Anderson (2007) or the beta spending functions in DeMets (2006). We leave the details for future research.

Appendix: Proofs

First, we introduce some further notation. For a function \bolds{\eta}(\mathbf{u},\mathbf{w})\dvtx\Re^{L}\times\Re^{M}\rightarrow\Re^{2}, we denote the partial derivative matrices by

\nabla_{u}(\bolds{\eta})=\biggl(\frac{\partial\eta_{k}}{\partial u_{i}};i=1,\ldots,L,k=1,2\biggr)_{L\times 2}

and

\nabla_{w}(\bolds{\eta})=\biggl(\frac{\partial\eta_{k}}{\partial w_{j}};j=1,\ldots,M,k=1,2\biggr)_{M\times 2}.

Let H=\nabla_{r}(g(r,s),1-g(r,s))|_{(\rho_{1},\rho_{1})} and E=\nabla_{s}(g(r,s),1-g(r,s))|_{(\rho_{1},\rho_{1})} be the partial derivative matrices of the allocation function g. Further, let V=\operatorname{diag}(\operatorname{var}(\mathbf{X}_{11})/\rho_{1},\operatorname{var}(\mathbf{X}_{12})/\rho_{2}), \Sigma_{3}=(\nabla(\bolds{\rho})|_{\bolds{\theta}})^{\prime}V\nabla(\bolds{\rho})|_{\bolds{\theta}}, \Sigma_{1}=\operatorname{diag}(\bolds{\rho})-\bolds{\rho}^{\prime}\bolds{\rho} and \Sigma_{2}=E^{\prime}\Sigma_{3}E. Hu and Zhang (2004) studied the asymptotic properties of \mathbf{N}(n), \hat{\bolds{\rho}}(n) and \hat{\bolds{\theta}}(n) at the end of the trial. Based on their results, one can carry out the corresponding statistical inference after observing all responses of the clinical trial. To monitor the response-adaptive randomized trial sequentially, we need the theoretical properties of the processes \mathbf{N}([nt]) and \hat{\bolds{\theta}}([nt]) for any given t\in(0,1]. To this end, we start with Lemma .1.

Lemma .1

Let W_{1t} and W_{2t} be two independent standard two-dimensional Brownian motions. \mathbf{N}([nt]), \hat{\bolds{\theta}}([nt]), \bolds{\rho} and \bolds{\theta} are defined as in Section 2. Under the conditions of Theorem 2.1, we have

n^{-1/2}([nt])\biggl(\frac{\mathbf{N}([nt])}{[nt]}-\bolds{\rho},\hat{\bolds{\theta}}([nt])-\bolds{\theta}\biggr)\rightarrow(G_{t},W_{2t}V^{1/2}) (10)

in distribution in the space D[0,1]D_{[0,1]} with the Skorohod topology, where the Gaussian process

G_{t}=\int_{0}^{t}(dW_{1x})\Sigma_{1}^{1/2}\biggl(\frac{t}{x}\biggr)^{H}+\int_{0}^{t}(dW_{2x})\Sigma_{2}^{1/2}\biggl[\int_{x}^{t}\frac{1}{y}\biggl(\frac{t}{y}\biggr)^{H}\,dy\biggr], (11)

which is the solution of the stochastic differential equation

dG_{t}=(dW_{1t})\Sigma_{1}^{1/2}+\frac{W_{2t}\Sigma_{2}^{1/2}}{t}\,dt+\frac{G_{t}}{t}H\,dt\qquad\mbox{with }G_{0}=0,

and a^{H} is the matrix power function defined as

a^{H}=e^{H\ln a}=\sum_{j=0}^{\infty}\frac{(\ln a)^{j}}{j!}H^{j}.

Proof.

It is worth noting that the response-adaptive design in Theorem 2.1 satisfies all the conditions of Hu and Zhang (2004), so all the results in Hu and Zhang (2004) are valid. We will prove this lemma by using the weak convergence of martingales [cf. Theorem 4.1 of Hall and Heyde (1980)]. To do this, we first approximate the process (\frac{\mathbf{N}([nt])}{[nt]}-\bolds{\rho},\hat{\bolds{\theta}}([nt])-\bolds{\theta}) by a martingale and then prove the following two facts: (1) the Lindeberg condition holds for the approximating martingale process; and (2) the limiting covariance of n^{-1/2}([nt])(([nt])^{-1}\mathbf{N}([nt])-\bolds{\rho},\hat{\bolds{\theta}}([nt])-\bolds{\theta}) agrees with that of (G_{t},W_{2t}V^{1/2}).

Now, we use the martingale approximation of \mathbf{N}(n)-n\bolds{\rho} and \hat{\bolds{\theta}}(n)-\bolds{\theta} from Hu and Zhang (2004). Let \mathcal{F}_{m}=\sigma(\mathbf{T}_{1},\ldots,\mathbf{T}_{m},\mathbf{X}_{1},\ldots,\mathbf{X}_{m}) be the \sigma-field generated by the first m stages. Then, conditionally on \mathcal{F}_{m-1}, \mathbf{T}_{m} and \mathbf{X}_{m} are independent, and

E[T_{m1}|\mathcal{F}_{m-1}]=g\biggl(\frac{N_{1}(m-1)}{m-1},\hat{\rho}_{1}(m-1)\biggr).

Let \mathbf{Q}_{n}=\sum_{m=1}^{n}\Delta\mathbf{Q}_{m}, where \Delta\mathbf{Q}_{m}=(\Delta\mathbf{Q}_{m,1},\Delta\mathbf{Q}_{m,2})=(\Delta Q_{m,1k},\Delta Q_{m,2k}; k=1,\ldots,d) and \Delta Q_{m,jk}=T_{m,j}(X_{m,jk}-\theta_{jk})/\rho_{j}, j=1,2. Then \mathbf{Q}_{n}=O(\sqrt{n\log\log n}) a.s. is a martingale sequence, and we can prove

\hat{\bolds{\theta}}(n)-\bolds{\theta}=\frac{\mathbf{Q}_{n}}{n}+O\biggl(\frac{\log\log n}{n}\biggr)\qquad\mbox{a.s.} (12)

Let \mathbf{M}_{n}=\sum_{m=1}^{n}\Delta\mathbf{M}_{m}, where \Delta\mathbf{M}_{m}=\mathbf{T}_{m}-E[\mathbf{T}_{m}|\mathcal{F}_{m-1}], and let B_{n,m} be as defined in Hu and Zhang (2004). Then

\mathbf{N}(n)-n\bolds{\rho} = \sum_{m=1}^{n}\Delta\mathbf{M}_{m}B_{n,m}+\sum_{m=1}^{n}\Delta\mathbf{Q}_{m}\nabla(\bolds{\rho})\big|_{\bolds{\theta}}E\sum_{k=m}^{n}\frac{1}{k}B_{n,k}+o(n^{1/2-\delta/3})
:= \mathbf{U}_{n}+o(n^{1/2-\delta/3})

almost surely, where \mathbf{U}_{n} is a sum of martingale differences.

We can approximate the processes \mathbf{N}([nt])-[nt]\bolds{\rho} and \hat{\bolds{\theta}}([nt])-\bolds{\theta} (for any point t\in(0,1]) in the same way as \mathbf{N}(n)-n\bolds{\rho} and \hat{\bolds{\theta}}(n)-\bolds{\theta}. We obtain

\hat{\bolds{\theta}}([nt])-\bolds{\theta}=\frac{\mathbf{Q}_{[nt]}}{[nt]}+O\biggl(\frac{\log\log[nt]}{[nt]}\biggr)\qquad\mbox{a.s.} (13)

and

\mathbf{N}([nt])-[nt]\bolds{\rho} = \sum_{m=1}^{[nt]}\Delta\mathbf{M}_{m}B_{[nt],m}+\sum_{m=1}^{[nt]}\Delta\mathbf{Q}_{m}\nabla(\bolds{\rho})\big|_{\bolds{\theta}}E\sum_{k=m}^{[nt]}\frac{1}{k}B_{[nt],k}+o(([nt])^{1/2-\delta/3})
:= \mathbf{U}_{[nt]}+o(([nt])^{1/2-\delta/3})

almost surely.

Hu and Zhang (2004) proved that both martingales \mathbf{Q}_{n} and \mathbf{U}_{n} satisfy the Lindeberg conditions. Similarly, we can show that the martingales \mathbf{Q}_{[nt]} and \mathbf{U}_{[nt]} also satisfy the Lindeberg conditions. It remains to calculate the covariance matrices of the martingales \mathbf{Q}_{[nt]} and \mathbf{U}_{[nt]}. First, based on the results of Hu and Zhang (2004), we have

\hat{\bolds{\rho}}(n)-\bolds{\rho}=O\Biggl(\sqrt{\frac{\log\log n}{n}}\Biggr)\quad\mbox{and}\quad n^{-1}\mathbf{N}(n)-\bolds{\rho}=O\Biggl(\sqrt{\frac{\log\log n}{n}}\Biggr)

almost surely. Therefore, for any t\in(0,1], we have

\hat{\bolds{\rho}}([nt])-\bolds{\rho}=O\Biggl(\sqrt{\frac{\log\log[nt]}{[nt]}}\Biggr)\quad\mbox{and}\quad([nt])^{-1}\mathbf{N}([nt])-\bolds{\rho}=O\Biggl(\sqrt{\frac{\log\log[nt]}{[nt]}}\Biggr)

almost surely. Now, we can calculate \operatorname{Var}[\Delta\mathbf{M}_{[nt]}|\mathcal{F}_{[nt]-1}], \operatorname{Var}[\Delta\mathbf{Q}_{[nt]}|\mathcal{F}_{[nt]-1}] and \operatorname{Cov}[\Delta\mathbf{M}_{[nt]},\Delta\mathbf{Q}_{[nt]}|\mathcal{F}_{[nt]-1}].

First, \Delta\mathbf{M}_{[nt]}=\mathbf{T}_{[nt]}-E[\mathbf{T}_{[nt]}|\mathcal{F}_{[nt]-1}] is a binary random vector. Based on conditions (A2), (A3) and the martingale approximation above, we have

\operatorname{Var}\bigl[\Delta\mathbf{M}_{[nt]}|\mathcal{F}_{[nt]-1}\bigr]=\Sigma_{1}+o(1) (15)

almost surely. Similarly, we can show

\operatorname{Var}\bigl[\Delta\mathbf{Q}_{[nt]}|\mathcal{F}_{[nt]-1}\bigr]=V+o(1) (16)

and

\operatorname{Cov}\bigl[\Delta\mathbf{M}_{[nt]},\Delta\mathbf{Q}_{[nt]}|\mathcal{F}_{[nt]-1}\bigr]=o(1) (17)

almost surely.

Based on results (15), (16) and (17), it follows that for any 0<s<t<1,

\operatorname{Cov}\bigl[\mathbf{Q}_{[ns]},\mathbf{Q}_{[nt]}\bigr] = \operatorname{Cov}\Biggl(\sum_{m=1}^{[ns]}\Delta\mathbf{Q}_{m},\sum_{m=1}^{[nt]}\Delta\mathbf{Q}_{m}\Biggr) = ns\bigl(V+o(1)\bigr)=nsV+o(n),

\operatorname{Cov}\bigl[\mathbf{U}_{[ns]},\mathbf{U}_{[nt]}\bigr] = n\wedge_{11}(s,t)+o(n),

\operatorname{Cov}\bigl[\mathbf{Q}_{[ns]},\mathbf{U}_{[nt]}\bigr] = \operatorname{Cov}\Biggl[\sum_{m=1}^{[ns]}\Delta\mathbf{Q}_{m},\sum_{m=1}^{[nt]}\Delta\mathbf{M}_{m}B_{[nt],m}+\sum_{m=1}^{[nt]}\Delta\mathbf{Q}_{m}\nabla(\bolds{\rho})\big|_{\bolds{\theta}}E\sum_{k=m}^{[nt]}\frac{1}{k}B_{[nt],k}\Biggr]
= \operatorname{Cov}\Biggl[\sum_{m=1}^{[ns]}\Delta\mathbf{Q}_{m},\sum_{m=1}^{[nt]}\Delta\mathbf{M}_{m}B_{[nt],m}\Biggr]+\operatorname{Cov}\Biggl[\sum_{m=1}^{[ns]}\Delta\mathbf{Q}_{m},\sum_{m=1}^{[nt]}\Delta\mathbf{Q}_{m}\nabla(\bolds{\rho})\big|_{\bolds{\theta}}E\sum_{k=m}^{[nt]}\frac{1}{k}B_{[nt],k}\Biggr]
= \operatorname{Cov}\Biggl[\sum_{m=1}^{[ns]}\Delta\mathbf{Q}_{m},\sum_{m=1}^{[nt]}\Delta\mathbf{Q}_{m}\nabla(\bolds{\rho})\big|_{\bolds{\theta}}E\sum_{k=m}^{[nt]}\frac{1}{k}B_{[nt],k}\Biggr]
= \bigl(V\nabla(\bolds{\rho})|_{\bolds{\theta}}E+o(1)\bigr)\sum_{m=1}^{[ns]}\Biggl(\sum_{k=m}^{[nt]}\frac{1}{k}B_{[nt],k}\Biggr)
= nV\nabla(\bolds{\rho})\big|_{\bolds{\theta}}E\int_{0}^{s}dx\biggl[\int_{x}^{t}\frac{1}{y}\biggl(\frac{t}{y}\biggr)^{H}\,dy\biggr]+o(n)
= n\wedge_{21}(s,t)+o(n),

and similarly,

\operatorname{Cov}\bigl[\mathbf{Q}_{[nt]},\mathbf{U}_{[ns]}\bigr]=n\wedge_{12}(s)+o(n),

where

\wedge_{11}(s,t) = \int_{0}^{s}\biggl(\frac{s}{x}\biggr)^{H^{\prime}}\Sigma_{1}\biggl(\frac{t}{x}\biggr)^{H}\,dx+\int_{0}^{s}dx\biggl[\int_{x}^{s}\frac{1}{y}\biggl(\frac{s}{y}\biggr)^{H}\,dy\biggr]^{\prime}\Sigma_{2}\biggl[\int_{x}^{t}\frac{1}{y}\biggl(\frac{t}{y}\biggr)^{H}\,dy\biggr],
\wedge_{21}(s,t) = V\nabla(\bolds{\rho})\big|_{\bolds{\theta}}E\int_{0}^{s}dx\biggl[\int_{x}^{t}\frac{1}{y}\biggl(\frac{t}{y}\biggr)^{H}\,dy\biggr],
\wedge_{12}(s) = \int_{0}^{s}dx\biggl[\int_{x}^{s}\frac{1}{y}\biggl(\frac{s}{y}\biggr)^{H}\,dy\biggr]E^{\prime}\nabla(\bolds{\rho})\big|_{\bolds{\theta}}^{\prime}V.

Therefore, the asymptotic covariance function of n^{-1/2}(\mathbf{U}_{[nt]},\mathbf{Q}_{[nt]}) agrees with that of (G_{t},W_{2t}V^{1/2}). So, by the weak convergence of martingales [cf. Theorem 4.1 of Hall and Heyde (1980)], we have

n^{-1/2}([nt])\biggl(\frac{\mathbf{N}([nt])}{[nt]}-\bolds{\rho},\hat{\bolds{\theta}}([nt])-\bolds{\theta}\biggr)\rightarrow(G_{t},W_{2t}V^{1/2})

in distribution in the space D_{[0,1]} with the Skorohod topology.

Proof of Theorem 2.1. We assume that, for j=1,2,

[nt]\widehat{\operatorname{Var}}(h(\hat{\bolds{\theta}}_{j}([nt])))=[nt]v_{j}\bigl(\mathbf{N}([nt])/[nt],\hat{\bolds{\theta}}([nt])\bigr)\bigl(1+o_{P}(1)\bigr)

and

[nt]\operatorname{Var}(h(\hat{\bolds{\theta}}_{j}([nt])))=[nt]v_{j}(\bolds{\rho},\bolds{\theta}),

where v_{j} is a continuous function. We also assume

[nt]v_{j}\bigl(\mathbf{N}([nt])/[nt],\hat{\bolds{\theta}}([nt])\bigr)=[nt]v_{j}(\bolds{\rho},\bolds{\theta})+O\Biggl(\sqrt{\frac{\log\log[nt]}{[nt]}}\Biggr)\qquad\mbox{a.s.},

which holds in most circumstances, since

\mathbf{N}([nt])/[nt]=\bolds{\rho}+O\Biggl(\sqrt{\frac{\log\log[nt]}{[nt]}}\Biggr)\qquad\mbox{a.s.}

and

\hat{\bolds{\theta}}([nt])=\bolds{\theta}+O\Biggl(\sqrt{\frac{\log\log[nt]}{[nt]}}\Biggr)\qquad\mbox{a.s.}

So

[nt]\widehat{\operatorname{Var}}(h(\hat{\bolds{\theta}}_{j}([nt])))=[nt]\operatorname{Var}(h(\hat{\bolds{\theta}}_{j}([nt])))+O_{P}\Biggl(\sqrt{\frac{\log\log[nt]}{[nt]}}\Biggr).

That is, [nt]\widehat{\operatorname{Var}}(h(\hat{\bolds{\theta}}_{j}([nt]))) converges to [nt]\operatorname{Var}(h(\hat{\bolds{\theta}}_{j}([nt]))), j=1,2, in probability. By Slutsky's theorem, the sequential statistics

B_{t}\biggl(\frac{\mathbf{N}([nt])}{[nt]},\hat{\bolds{\theta}}([nt])\biggr)=\sqrt{t}\frac{h(\hat{\bolds{\theta}}_{1}([nt]))-h(\hat{\bolds{\theta}}_{2}([nt]))}{\sqrt{\widehat{\operatorname{Var}}(h(\hat{\bolds{\theta}}_{1}([nt])))+\widehat{\operatorname{Var}}(h(\hat{\bolds{\theta}}_{2}([nt])))}}

and

B_{t}^{*}(\hat{\bolds{\theta}}([nt]))=\sqrt{t}\frac{h(\hat{\bolds{\theta}}_{1}([nt]))-h(\hat{\bolds{\theta}}_{2}([nt]))}{\sqrt{\operatorname{Var}(h(\hat{\bolds{\theta}}_{1}([nt])))+\operatorname{Var}(h(\hat{\bolds{\theta}}_{2}([nt])))}}

have the same asymptotic distribution. So we only need to prove that the sequential statistic B_{t}^{*} converges to a Brownian motion in distribution. Now

h(\hat{\bolds{\theta}}_{j})-h(\bolds{\theta}_{j}) = (\hat{\bolds{\theta}}_{j}-\bolds{\theta}_{j})\bigl(\partial h(\bolds{\theta}_{j})/\partial\bolds{\theta}_{j}\bigr)^{\prime}+o(\|\hat{\bolds{\theta}}_{j}-\bolds{\theta}_{j}\|^{1+\delta})
= (\hat{\bolds{\theta}}_{j}-\bolds{\theta}_{j})\bigl(\partial h(\bolds{\theta}_{j})/\partial\bolds{\theta}_{j}\bigr)^{\prime}+o(n^{-1/2-\delta/3})\qquad\mbox{a.s.}, j=1,2.

It is easy to see that

\operatorname{Var}[\hat{\bolds{\theta}}_{j}([nt])]=\operatorname{Var}[\hat{\bolds{\theta}}_{j}(n)]/t+o(n^{-1})\qquad\mbox{a.s.}, j=1,2.

Here, we define

C=\sqrt{\operatorname{Var}[h(\hat{\bolds{\theta}}_{1}([nt]))]+\operatorname{Var}[h(\hat{\bolds{\theta}}_{2}([nt]))]}\sqrt{\operatorname{Var}[h(\hat{\bolds{\theta}}_{1}([ns]))]+\operatorname{Var}[h(\hat{\bolds{\theta}}_{2}([ns]))]}

and

\mathbf{D}=\bigl(\partial h(\bolds{\theta}_{1})/\partial\bolds{\theta}_{1},-\partial h(\bolds{\theta}_{2})/\partial\bolds{\theta}_{2}\bigr).

Then

B_{t}^{*}(\hat{\bolds{\theta}}([nt])) = \sqrt{t}\frac{h(\hat{\bolds{\theta}}_{1}([nt]))-h(\hat{\bolds{\theta}}_{2}([nt]))}{\sqrt{\operatorname{Var}(h(\hat{\bolds{\theta}}_{1}([nt])))+\operatorname{Var}(h(\hat{\bolds{\theta}}_{2}([nt])))}}
= \sqrt{t}\frac{h(\bolds{\theta}_{1})-h(\bolds{\theta}_{2})+(\hat{\bolds{\theta}}([nt])-\bolds{\theta})\mathbf{D}^{\prime}+o(n^{-1/2-\delta/3})}{\sqrt{\operatorname{Var}(h(\hat{\bolds{\theta}}_{1}([nt])))+\operatorname{Var}(h(\hat{\bolds{\theta}}_{2}([nt])))}}.

By the conclusion of Lemma .1:

n^{-1/2}([nt])\bigl(\hat{\bolds{\theta}}([nt])-\bolds{\theta}\bigr)\rightarrow W_{2t}V^{1/2}

in distribution in the space D_{[0,1]} with the Skorohod topology. Under H_{0}, we have

B_{t}^{*}=\sqrt{t}\frac{n^{1/2}([nt])^{-1}W_{2t}V^{1/2}\mathbf{D}^{\prime}}{\sqrt{\operatorname{Var}(h(\hat{\bolds{\theta}}_{1}([nt])))+\operatorname{Var}(h(\hat{\bolds{\theta}}_{2}([nt])))}}+o(n^{-\delta/3})

almost surely. Hence the sequential statistic $B_{t}^{*}$ converges to a Gaussian process in distribution. To prove that $B_{t}^{*}$ converges to a Brownian motion in distribution, it suffices to show that $EB_{t}^{*}\rightarrow 0$ and that, for any $0<s<t<1$, $\operatorname{cov}(B_{t}^{*},B_{s}^{*})\rightarrow s$. Indeed,

\begin{eqnarray*}
\operatorname{cov}(B_{t}^{*},B_{s}^{*})&=&\frac{n\sqrt{ts}}{[nt][ns]}\frac{\operatorname{cov}(W_{2t},W_{2s})\mathbf{D}V\mathbf{D}^{\prime}}{C}+o(n^{-\delta/3})\\
&=&\frac{n\sqrt{t}s^{3/2}}{[nt][ns]}\frac{\mathbf{D}V\mathbf{D}^{\prime}}{C}+o(n^{-\delta/3})\\
&=&\frac{n\sqrt{t}s^{3/2}}{[nt][ns]}\frac{\mathbf{D}(n\operatorname{Var}[\hat{\bolds{\theta}}(n)-\bolds{\theta}]+o(1))\mathbf{D}^{\prime}}{C}+o(n^{-\delta/3})\\
&=&\frac{n^{2}\sqrt{t}s^{3/2}}{[nt][ns]}\frac{\partial h(\bolds{\theta}_{1})/\partial\bolds{\theta}_{1}\operatorname{Var}[\hat{\bolds{\theta}}_{1}(n)-\bolds{\theta}_{1}]\,\partial h(\bolds{\theta}_{1})/\partial\bolds{\theta}_{1}^{\prime}}{C}\\
&&{}+\frac{n^{2}\sqrt{t}s^{3/2}}{[nt][ns]}\frac{\partial h(\bolds{\theta}_{2})/\partial\bolds{\theta}_{2}\operatorname{Var}[\hat{\bolds{\theta}}_{2}(n)-\bolds{\theta}_{2}]\,\partial h(\bolds{\theta}_{2})/\partial\bolds{\theta}_{2}^{\prime}}{C}+o(1)\\
&=&\frac{n^{2}\sqrt{t}s^{3/2}}{[nt][ns]}\frac{\operatorname{Var}[h(\hat{\bolds{\theta}}_{1}(n))]+\operatorname{Var}[h(\hat{\bolds{\theta}}_{2}(n))]}{C}+o(1)\\
&=&\frac{n^{2}ts^{2}}{[nt][ns]}+o(1)\\
&\rightarrow&s\qquad\mbox{a.s.}
\end{eqnarray*}

It is easy to see that $EB_{t}^{*}\rightarrow 0$. This completes the proof and shows that $B_{t}^{*}$ is asymptotically a Brownian motion in distribution.
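The Brownian covariance structure just established can be checked numerically. The sketch below is not part of the paper's procedure: it assumes the simplest special case (two normal arms, equal non-adaptive allocation, $h$ the identity, $\sigma=1$) under $H_{0}$, and verifies empirically that $B_{t}^{*}=\sqrt{t}\,Z_{t}$ has $\operatorname{Var}(B_{t}^{*})\approx t$ and $\operatorname{cov}(B_{t}^{*},B_{s}^{*})\approx s$.

```python
import numpy as np

# Illustrative check of the Brownian covariance structure under H0.
# Assumptions (not from the paper): two normal arms, equal non-adaptive
# allocation, h = identity, known variance sigma^2 = 1.
rng = np.random.default_rng(0)
n, reps = 400, 10000          # per-arm size at full information; replications
x1 = rng.standard_normal((reps, n))
x2 = rng.standard_normal((reps, n))

def B_star(t):
    """B*_t = sqrt(t) times the standardized difference at fraction t."""
    m = int(n * t)
    diff = x1[:, :m].mean(axis=1) - x2[:, :m].mean(axis=1)
    z = diff / np.sqrt(2.0 / m)           # Var(diff) = 2 * sigma^2 / m
    return np.sqrt(t) * z

s, t = 0.5, 1.0
Bs, Bt = B_star(s), B_star(t)
print(np.cov(Bt, Bs)[0, 1])   # ≈ s = 0.5
print(Bt.var(), Bs.var())     # ≈ t = 1.0 and s = 0.5
```

The overlap of the cumulative samples is what produces $\operatorname{cov}(Z_{t},Z_{s})=\sqrt{s/t}$, and hence $\operatorname{cov}(B_{t}^{*},B_{s}^{*})=s$, matching the derivation above.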

Under H1,H_{1}, the sequential statistics

\begin{eqnarray*}
B_{t}^{*}(\hat{\bolds{\theta}}([nt]))&=&\sqrt{t}\frac{h(\hat{\bolds{\theta}}_{1}([nt]))-h(\hat{\bolds{\theta}}_{2}([nt]))-(h(\bolds{\theta}_{1})-h(\bolds{\theta}_{2}))}{\sqrt{\widehat{\operatorname{Var}}(h(\hat{\bolds{\theta}}_{1}([nt])))+\widehat{\operatorname{Var}}(h(\hat{\bolds{\theta}}_{2}([nt])))}}\\
&&{}+\sqrt{t}\frac{h(\bolds{\theta}_{1})-h(\bolds{\theta}_{2})}{\sqrt{\widehat{\operatorname{Var}}(h(\hat{\bolds{\theta}}_{1}([nt])))+\widehat{\operatorname{Var}}(h(\hat{\bolds{\theta}}_{2}([nt])))}}.
\end{eqnarray*}

By a similar argument, the first term converges to a standard Brownian motion in distribution. Because

[nt]\widehat{\operatorname{Var}}(h(\hat{\bolds{\theta}}_{j}([nt])))=v_{j}\biggl(\frac{\mathbf{N}([nt])}{[nt]},\hat{\bolds{\theta}}([nt])\biggr)\bigl(1+o(1)\bigr)\qquad\mbox{a.s., }j=1,2,

we have that

\sqrt{t}\frac{h(\bolds{\theta}_{1})-h(\bolds{\theta}_{2})}{\sqrt{\widehat{\operatorname{Var}}(h(\hat{\bolds{\theta}}_{1}([nt])))+\widehat{\operatorname{Var}}(h(\hat{\bolds{\theta}}_{2}([nt])))}}

converges to

t\frac{\sqrt{n}(h(\bolds{\theta}_{1})-h(\bolds{\theta}_{2}))}{\sqrt{v_{1}(\bolds{\rho},\bolds{\theta})+v_{2}(\bolds{\rho},\bolds{\theta})}}=\sqrt{n}\mu t \qquad (18)

in probability. Therefore, under H1H_{1}, by Slutsky’s theorem, BtnμtB_{t}^{*}-\sqrt{n}\mu t converges to a standard Brownian motion asymptotically.
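The drift $\sqrt{n}\mu t$ under $H_{1}$ can likewise be illustrated numerically. The following sketch is again only an illustration under assumed special-case conditions (normal arms $N(\delta,1)$ versus $N(0,1)$, equal non-adaptive allocation, $h$ the identity), in which case $\mu=\delta/(2\sigma)$; subtracting $\sqrt{n}\mu t$ from $B_{t}^{*}$ should leave a mean near zero at every information fraction.

```python
import numpy as np

# Illustrative check of the drift sqrt(n)*mu*t under H1.
# Assumptions (not from the paper): arms N(delta, 1) vs N(0, 1),
# equal non-adaptive allocation, h = identity, so mu = delta / (2 * sigma).
rng = np.random.default_rng(1)
n_total, reps = 800, 5000
delta, sigma = 0.1, 1.0
mu = delta / (2 * sigma)                  # (h(theta1) - h(theta2)) / sqrt(v1 + v2)

m = n_total // 2                          # per-arm size at full information
x1 = rng.normal(delta, sigma, (reps, m))  # treatment arm
x2 = rng.normal(0.0, sigma, (reps, m))    # control arm

residuals = {}
for t in (0.25, 0.5, 1.0):
    k = int(m * t)
    z = (x1[:, :k].mean(axis=1) - x2[:, :k].mean(axis=1)) / np.sqrt(2 * sigma**2 / k)
    b = np.sqrt(t) * z                    # B*_t under H1
    residuals[t] = b.mean() - np.sqrt(n_total) * mu * t
print(residuals)                          # each residual ≈ 0
```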

Acknowledgments

Special thanks go to the anonymous referees, the Associate Editor and the Editor for their constructive comments, which led to a much improved version of the paper.

References

  • (1) Andersen, J. (1996). Clinical trials designs—made to order. J. Biopharm. Statist. 6 515–522.
  • (2) Andersen, K. M. (2007). Optimal spending functions for asymmetric group sequential designs. Biom. J. 49 337–345. \MR2380516
  • (3) Armitage, P. (1957). Restricted sequential procedures. Biometrika 44 9–26. \MR0085685
  • (4) Armitage, P. (1975). Sequential Medical Trials. Blackwell, Oxford. \MR0370997
  • (5) Athreya, K. B. and Karlin, S. (1968). Embedding of urn schemes into continuous time Markov branching processes and related limit theorems. Ann. Math. Statist. 39 1801–1817. \MR0232455
  • (6) Bai, Z. D. and Hu, F. (2005). Asymptotics in randomized urn models. Ann. Appl. Probab. 15 914–940. \MR2114994
  • (7) Berry, D. A. (2005). Introduction to Bayesian methods III: Use and interpretation of Bayesian tools in design and analysis. Clinical Trials 2 295–300.
  • (8) Coad, D. S. and Rosenberger, W. F. (1999). A comparison of the randomized play-the-winner rule and the triangular test for clinical trials with binary responses. Stat. Med. 18 761–769.
  • (9) Cheng, Y. and Shen, Y. (2005). Bayesian adaptive designs for clinical trials. Biometrika 92 633–646. \MR2202651
  • (10) Connor, E. M., Sperling, R. S., Gelber, R., Kiselev, P., Scott, G., O’Sullivan, M. J., VanDyke, R., Bey, M., Shearer, W., Jacobson, R. L., Jimenez, E., O’Neill, E., Bazin, B., Delfraissy, J. F., Culname, M., Coombs, R., Elkins, M., Moye, J., Stratton, P. and Balsley, J. (1994). Reduction of maternal-infant transmission of human immunodeficiency virus type I with zidovudine treatment. The New England Journal of Medicine 331 1173–1180.
  • (11) DeMets, D. L. (2006). Futility approaches to interim monitoring by data monitoring committees. Clinical Trials 3 522–529.
  • (12) Eisele, J. and Woodroofe, M. (1995). Central limit theorems for doubly adaptive biased coin designs. Ann. Statist. 23 234–254. \MR1331666
  • (13) Ethier, S. N. and Kurtz, T. G. (1986). Markov Processes: Characterization and Convergence. Wiley, New York. \MR0838085
  • (14) Gwise, T. E., Hu, J. and Hu, F. (2008). Optimal biased coins for two-arm clinical trials. Stat. Interface 1 125–135. \MR2425350
  • (15) Hayre, L. S. (1979). Two-population sequential tests with three hypotheses. Biometrika 66 465–474. \MR0556733
  • (16) Hall, P. and Heyde, C. C. (1980). Martingale Limit Theory and Its Application. Academic Press, New York. \MR0624435
  • (17) Hu, F. and Rosenberger, W. F. (2003). Optimality, variability, power: Evaluating response-adaptive randomization procedures for treatment comparisons. J. Amer. Statist. Assoc. 98 671–678. \MR2011680
  • (18) Hu, F. and Rosenberger, W. F. (2006). The Theory of Response-Adaptive Randomization in Clinical Trials. Wiley, New York. \MR2245329
  • (19) Hu, F., Rosenberger, W. F. and Zhang, L. (2006). Asymptotically best response-adaptive randomization procedures. J. Statist. Plann. Inference 136 1911–1922. \MR2255603
  • (20) Hu, F. and Zhang, L. X. (2004). Asymptotic properties of doubly adaptive biased coin designs for multi-treatment clinical trials. Ann. Statist. 32 268–301. \MR2051008
  • (21) Hu, F., Zhang, L. X. and He, X. (2009). Efficient randomized adaptive designs. Ann. Statist. 37 2543–2560. \MR2543702
  • (22) Ivanova, A. V. (2003). A play-the-winner type urn model with reduced variability. Metrika 58 1–13. \MR1999248
  • (23) Jennison, C. and Turnbull, B. W. (2000). Group Sequential Methods With Applications to Clinical Trials. Chapman and Hall, Boca Raton, FL. \MR1710781
  • (24) Lan, K. and DeMets, D. L. (1983). Discrete sequential boundaries for clinical trials. Biometrika 70 659–663. \MR0725380
  • (25) Lewis, R. J., Lipsky, A. M. and Berry, D. A. (2007). Bayesian decision-theoretic group sequential clinical trial design based on a quadratic loss function: A frequentist evaluation. Clinical Trials 4 5–14.
  • (26) O’Brien, P. C. and Fleming, T. R. (1979). A multiple testing procedure for clinical trials. Biometrics 35 549–556.
  • (27) Pocock, S. J. (1977). Group sequential methods in the design and analysis of clinical trials. Biometrika 64 191–199.
  • (28) Pocock, S. J. (1982). Interim analyses for randomized clinical trials: The group sequential approach. Biometrics 38 153–162.
  • (29) Proschan, M. A., Lan, K. and Wittes, J. T. (2006). Statistical Monitoring of Clinical Trials: A Unified Approach. Springer, New York.
  • (30) Robbins, H. (1952). Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58 527–535. \MR0050246
  • (31) Rosenberger, W. F. and Hu, F. (2004). Maximizing power and minimizing treatment failures in clinical trials. Clinical Trials 1 141–147.
  • (32) Rosenberger, W. F. and Lachin, J. M. (2002). Randomization in Clinical Trials: Theory and Practice. Wiley, New York. \MR1914364
  • (33) Rosenberger, W. F., Stallard, N., Ivanova, A., Harper, C. N. and Ricks, M. L. (2001). Optimal adaptive designs for binary response trials. Biometrics 57 909–913. \MR1863454
  • (34) Rout, C. C., Rocke, D. A., Levin, L., Gouws, E. and Reddy, D. (1993). A reevaluation of the role of crystalloid preload in the prevention of hypotension associated with spinal anesthesia for elective cesarean section. Anesthesiology 79 262–269.
  • (35) Tamura, R. N., Faries, D. E., Andersen, J. S. and Heiligenstein, J. H. (1994). A case study of an adaptive clinical trial in the treatment of out-patients with depressive disorder. J. Amer. Statist. Assoc. 89 768–776.
  • (36) Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25 275–294.
  • (37) Tymofyeyev, Y., Rosenberger, W. F. and Hu, F. (2007). Implementing optimal allocation in sequential binary response experiments. J. Amer. Statist. Assoc. 102 224–234. \MR2345540
  • (38) Wald, A. (1947). Sequential Analysis. Wiley, New York. \MR0020764
  • (39) Wathen, J. K. and Thall, P. F. (2008). Bayesian adaptive model selection for optimizing group sequential clinical trials. Statistics in Medicine 27 5586–5604.
  • (40) Wei, L. J. and Durham, S. (1978). The randomized play-the-winner rule in medical trials. J. Amer. Statist. Assoc. 73 840–843.
  • (41) Zelen, M. (1969). Play the winner and the controlled clinical trial. J. Amer. Statist. Assoc. 64 131–146. \MR0240938
  • (42) Zhang, L. and Rosenberger, W. F. (2006). Response-adaptive randomization for clinical trials with continuous outcomes. Biometrics 62 562–569. \MR2236838
  • (43) Zhang, L. X., Hu, F. and Cheung, S. H. (2006). Asymptotic theorems of sequential estimation-adjusted urn models. Ann. Appl. Probab. 16 340–369. \MR2209345