
Sequential monitoring of response-adaptive randomized clinical trials

Hongjian Zhu (hz5n@virginia.edu) and Feifang Hu (fh6e@virginia.edu)
Department of Statistics, University of Virginia
Kerchof Hall, Charlottesville, Virginia 22904-4135, USA

Received September 2009; revised January 2010.
Abstract

Clinical trials are complex and usually involve multiple objectives, such as controlling the type I error rate, increasing power to detect treatment differences, and assigning more patients to the better treatment. In the literature, both response-adaptive randomization (RAR) procedures (which change the randomization procedure sequentially) and sequential monitoring (which changes the analysis procedure sequentially) have been proposed to achieve these objectives to some degree. In this paper, we propose to sequentially monitor response-adaptive randomized clinical trials and study the properties of this approach. We prove that the sequential test statistics of the new procedure converge to a Brownian motion in distribution. Further, we show that the sequential test statistics asymptotically satisfy the canonical joint distribution defined in Jennison and Turnbull (2000). Therefore, type I error and other objectives can be achieved theoretically by selecting appropriate boundaries. These results open a door to sequentially monitoring response-adaptive randomized clinical trials in practice. The simulation studies show that the proposed procedure brings together the advantages of both techniques in terms of power, total sample size and total number of failures, while controlling the type I error. In addition, we illustrate the characteristics of the proposed procedure by redesigning a well-known clinical trial of maternal-infant HIV transmission.

AMS subject classifications: 60F15, 62G10, 60F05, 60F10.

Keywords: Asymptotic properties, Brownian process, response-adaptive randomization, power, sample size, type I error.

doi: 10.1214/10-AOS796

Volume 38, Issue 4.


Supported by NSF Grants DMS-03-49048 and DMS-09-07297.

1 Introduction

Clinical trials usually involve multiple competing objectives, such as maximizing the power to detect clinical differences among treatments, minimizing the total sample size and protecting more patients from possibly inferior treatments. To achieve these objectives, two different techniques have been proposed in the literature: (i) the analysis approach, analyzing the observed data sequentially [sequential monitoring, Jennison and Turnbull (2000)]; and (ii) the design approach, changing the allocation probability sequentially [response-adaptive randomization, Hu and Rosenberger (2006)]. In this paper, we discuss how to combine the two procedures in one clinical trial in order to utilize the advantages of both.

In experiments where data accumulate sequentially, it is natural to conduct a sequential analysis. Sequential techniques originate from a methodology with a long history based on Brownian motion. Wald's classic work on the sequential probability ratio test (SPRT) [Wald (1947)] led to the application of sequential analysis in numerous fields of statistics. Armitage (1957, 1975) introduced sequential methods to clinical studies, which required monitoring results on a patient-by-patient basis. Pocock (1977) proposed sequential monitoring of clinical trials on a group basis. Since then, many authors have done important work on group sequential studies; this work is summarized in Jennison and Turnbull (2000) and Proschan, Lan and Wittes (2006).

The main advantages of sequential monitoring are listed in Jennison and Turnbull (2000). First, it is ethical to monitor clinical trials sequentially, because we can ensure that patients are not exposed to dangerous treatments and can stop a trial as soon as needed. Second, administratively, one needs to ensure that the protocol is not violated and that the assumptions on which the clinical trial is based remain valid. Third, sequential monitoring can decrease sample size and cost. With all of the above advantages, sequential monitoring has become a standard technique in conducting clinical trials.

The idea of response-adaptive randomization (RAR) can be traced back to Thompson (1933) and Robbins (1952). The play-the-winner rule [Zelen (1969)] and the randomized play-the-winner rule [Wei and Durham (1978)] were proposed to reduce the number of patients assigned to the inferior treatment. Hu and Rosenberger (2003) proved theoretically that adaptive randomization can be used to increase statistical efficiency in some clinical trials. Many papers in the literature have demonstrated its efficiency and ethical advantages over fixed designs [Hu and Rosenberger (2006)]. With modern technology and a high capability of collecting data, it has become increasingly easy to implement adaptive designs in sequential experiments. Some clinical trials have already implemented response-adaptive designs [Rout et al. (1993), Tamura et al. (1994), Andersen (1996), etc.].

Bayesian adaptive designs have also been proposed and studied in the literature. Berry (2005) provided a comprehensive introduction to Bayesian designs in clinical trials. Recently, Cheng and Shen (2005) proposed to sequentially monitor a Bayesian adaptive design using decision-theoretic approaches, allowing the maximum sample size to be sequentially adjusted by the observed data. Lewis, Lipsky and Berry (2007) proposed a Bayesian decision-theoretic group sequential design for a disease with two possible outcomes based on a quadratic loss function. Wathen and Thall (2008) studied Bayesian adaptive model selection for optimizing group sequential clinical trials. In this paper, we focus on sequential monitoring of response-adaptive randomized clinical trials.

Traditionally, sequential monitoring deals with fixed designs (usually with equal allocation). No systematic study is available on sequentially monitoring a sequential experiment that uses response-adaptive randomization, except for a simulation study by Coad and Rosenberger (1999). They found that the expected number of treatment failures can be further reduced by combining the triangular test with the randomized play-the-winner rule. In this paper, we study both the theoretical properties and the finite sample properties of combining sequential monitoring with response-adaptive randomization.

Sequential monitoring procedures use responses to stop or continue a clinical trial. Response-adaptive randomization procedures sequentially estimate the parameters and update the allocation probability for the next patient. To monitor a response-adaptive randomized clinical trial sequentially, one needs to study the two sequential procedures simultaneously. This is conceptually difficult because: (1) the number of patients assigned to each treatment is a random variable at each time point; and (2) both the treatment assignments (probabilities) and the estimators of the parameters (test statistics) depend on the responses at each time point. These problems arise from the sequential updating of the parameter estimators and the allocation probability function, which makes it difficult to find the joint distribution of the sequential test statistics. We overcome these difficulties by (i) approximating these different processes by martingale processes at each time point simultaneously, and (ii) then using a continuous Gaussian approximation to study these martingale processes simultaneously.

In this paper, we discuss sequential monitoring of the doubly adaptive biased coin design proposed by Hu and Zhang (2004) for comparing two treatments. Under widely satisfied conditions, we show that the sequential test statistics converge in distribution to (i) a standard Brownian motion under the null hypothesis, and (ii) a drifted Brownian motion under the alternative hypothesis. For a standard Brownian motion, the critical values for a fixed type I error rate have been well studied in the literature. Therefore, the problem of controlling the type I error is theoretically solved. Further, we show that the sequential test statistics asymptotically satisfy the canonical joint distribution defined in Jennison and Turnbull (2000). Hence, one can apply the group sequential methods in that book to response-adaptive randomized clinical trials.

Simulation results support our theoretical findings in terms of type I error and show that sequential monitoring of a response-adaptive randomization procedure can increase power and decrease the total number of failures. Also, compared to complete randomization, sequential monitoring of a response-adaptive randomization procedure can stop earlier and thus reduce the actual sample size. In other words, the proposed procedure achieves the goals of both RAR and sequential monitoring. We also redesign an experiment evaluating the effect of zidovudine treatment in reducing the risk of maternal-infant HIV transmission performed by Connor et al. (1994). The proposed procedure can be used to decrease the number of HIV-infected people and increase the power compared to complete randomization.

In Section 2, we introduce the notation, describe the framework and state the main theorem. In Sections 3 and 4, we use both generated data and real data to compare the proposed procedure with other randomization procedures. Conclusions are in Section 5 and technical proofs are given in the Appendix.

2 Sequential monitoring of response-adaptive randomization procedures

2.1 Notation and framework

We first describe the framework for the randomized adaptive designs. In this article, we consider clinical trials with two treatments, 1 and 2. Let $\mathbf{T}_i=(T_{i,1},T_{i,2})=(1,0)$, $i=1,\ldots,n$, if the $i$th patient is assigned to treatment 1, and $(0,1)$ otherwise, where $n$ is the sample size. Let $\mathbf{N}(n)=(N_1(n),N_2(n))$, where $N_j(n)=\sum_{i=1}^{n}T_{ij}$, $j=1,2$, is the number of patients on treatment $j$. Let $\mathbf{X}=(\mathbf{X}_1,\ldots,\mathbf{X}_n)'$, where $\mathbf{X}_i=(\mathbf{X}_{i1},\mathbf{X}_{i2})$, $i=1,\ldots,n$, is a random matrix of response variables and $\mathbf{X}_{ij}$, $j=1,2$, are $d$-dimensional random vectors. Only one element of $\mathbf{X}_i$, say $\mathbf{X}_{ij}$, can be observed if the $i$th patient is assigned to treatment $j$. We assume that $\mathbf{X}_1,\ldots,\mathbf{X}_n$ are independent and identically distributed with unknown parameter $(\theta_1,\theta_2)$, where $\theta_j$ is the corresponding $d_j$-dimensional parameter vector $(\theta_{j1},\ldots,\theta_{jd_j})$ of treatment $j$ ($j=1,2$). To simplify the notation, we assume that the parameter vectors of both treatments have the same dimension ($d_1=d_2=d$). Without loss of generality, we also assume that $\theta_j=E(\mathbf{X}_{ij})$. Otherwise, if a suitable transformation of $\mathbf{X}$ exists, we can treat the transformed values as responses so that this equation holds; such a transformation usually exists asymptotically. See Gwise, Hu and Hu (2008) and Hu and Zhang (2004) for further discussion.

Let $[nt]$ denote the largest integer smaller than or equal to $nt$ for $t\in[0,1]$. Then $\mathbf{N}([nt])=(N_1([nt]),N_2([nt]))$ and $N_j([nt])=\sum_{i=1}^{[nt]}T_{ij}$, $j=1,2$. Note that $t=N/n$ when $N$ is the number of patients who have already been enrolled. We introduce this so-called information time $t$ in order to formulate the problem in the Skorohod topology [Ethier and Kurtz (1986)]. After $N=[nt]$ patients have been assigned and their responses observed, we use the modified sample means $\hat{\theta}_{[nt]}=(\hat{\theta}_{[nt],1},\hat{\theta}_{[nt],2})$ to estimate the parameter $\theta=(\theta_1,\theta_2)$, that is,

$$\hat{\theta}_{[nt],1}=\frac{\sum_{i=1}^{[nt]}T_{i,1}\mathbf{X}_{i1}+\theta_{0,1}}{N_1([nt])+1}\quad\mbox{and}\quad\hat{\theta}_{[nt],2}=\frac{\sum_{i=1}^{[nt]}T_{i,2}\mathbf{X}_{i2}+\theta_{0,2}}{N_2([nt])+1}.\eqno(1)$$

Here, we add 1 to the denominator to prevent discontinuity, and add $\theta_{0,j}$, say 0.5, so that $\theta_j$ can still be estimated when no patient has yet been assigned to treatment $j$, $j=1,2$.
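As a small illustration, the modified estimator in (1) can be sketched for a single treatment arm; this is a minimal sketch (the function name and the use of NumPy are our own choices, with `theta0` defaulting to the 0.5 suggested above):

```python
import numpy as np

def modified_mean(responses, theta0=0.5):
    """Modified sample mean of equation (1) for one treatment arm:
    theta0 is added to the numerator and 1 to the denominator, so the
    estimate is well defined even before any patient is assigned."""
    responses = np.asarray(responses, dtype=float)
    return (responses.sum() + theta0) / (len(responses) + 1)
```

With no responses, the estimate equals `theta0`; as responses accumulate, it converges to the ordinary sample mean.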

Let $\rho=(\rho_1,\rho_2)$ be the target allocation proportion. Usually $\rho$ is obtained from some optimality criterion and depends on the unknown parameter $\theta$. The selection of $\rho=\rho(\theta)$ has been studied by Hayre (1979), Jennison and Turnbull (2000) and Tymofyeyev, Rosenberger and Hu (2007). In practice, the parameters are unknown, so we first estimate them from previous treatment assignments and responses in order to target the allocation proportion. We consider a general family of doubly adaptive biased coin designs (DBCD) [Eisele and Woodroofe (1995)] here.

Doubly adaptive biased coin design: (i) assign the first $2n_0$ patients to treatments 1 and 2 by some restricted randomization procedure [permuted block or truncated binomial randomization; see Rosenberger and Lachin (2002)]; (ii) when the $l$th ($l>2n_0$) patient arrives and all the responses of the previous $l-1$ patients are available, compute $\hat{\theta}_{l-1}$ and $\hat{\rho}_{l-1}=\rho(\hat{\theta}_{l-1})$; (iii) then assign the $l$th patient to treatment 1 with probability

$$g\bigl(N_1(l-1)/(l-1),\rho_1(\hat{\theta}_{l-1})\bigr),$$

where $g(s,r):[0,1]\times[0,1]\rightarrow[0,1]$ is the allocation function. Hu and Zhang (2004) proposed ($\gamma\geq 0$):

$$g^{(\gamma)}(0,r)=1,\qquad g^{(\gamma)}(1,r)=0,\qquad g^{(\gamma)}(s,r)=\frac{r(r/s)^{\gamma}}{r(r/s)^{\gamma}+(1-r)((1-r)/(1-s))^{\gamma}}.\eqno(2)$$

The design has drawn much attention since it was proposed and its advantages and properties can be found in Hu and Rosenberger (2003), Rosenberger and Hu (2004) and Tymofyeyev, Rosenberger and Hu (2007).
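The allocation function above translates directly into code; here is a minimal sketch (the function name and the default $\gamma=2$ are our own illustrative choices):

```python
def hu_zhang_g(s, r, gamma=2.0):
    """Hu and Zhang (2004) allocation function g^(gamma)(s, r) in (2):
    s is the current proportion assigned to treatment 1 and r is the
    (estimated) target proportion rho_1."""
    if s <= 0.0:   # boundary cases, first two lines of (2)
        return 1.0
    if s >= 1.0:
        return 0.0
    num = r * (r / s) ** gamma
    den = num + (1.0 - r) * ((1.0 - r) / (1.0 - s)) ** gamma
    return num / den
```

Note that $g(r,r)=r$, and over-allocation ($s>r$) lowers the assignment probability below the target, pushing the realized allocation back toward $\rho_1$.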

To compare two treatments in clinical trials, one considers a general hypothesis test:

$$H_0: h(\theta_1)=h(\theta_2)\quad\mbox{versus}\quad H_1: h(\theta_1)\neq h(\theta_2),$$

where $h$ is an $\Re^d\rightarrow\Re$ function of the parameters. In this paper, we assume $h(\theta_j)$ is continuous and twice differentiable in a small neighborhood of $\theta_j$, $j=1,2$. If one would like to test the above hypothesis at time point $t\in(0,1]$, it is natural to construct the test statistic as

$$Z_t\biggl(\frac{\mathbf{N}([nt])}{[nt]},\hat{\theta}([nt])\biggr)=\frac{h(\hat{\theta}_1([nt]))-h(\hat{\theta}_2([nt]))}{\sqrt{\widehat{\operatorname{Var}}(h(\hat{\theta}_1([nt])))+\widehat{\operatorname{Var}}(h(\hat{\theta}_2([nt])))}}.\eqno(3)$$

Here, $\widehat{\operatorname{Var}}(h(\hat{\theta}_1([nt])))$ and $\widehat{\operatorname{Var}}(h(\hat{\theta}_2([nt])))$ are consistent estimators of the variances of $h(\hat{\theta}_1([nt]))$ and $h(\hat{\theta}_2([nt]))$, respectively. There is no covariance term in the denominator because the two terms in the numerator are asymptotically independent [Hu, Rosenberger and Zhang (2006)]. Without loss of generality, we also assume that for some functions $v_1$ and $v_2$,

$$[nt]\widehat{\operatorname{Var}}(h(\hat{\theta}_j([nt])))=v_j\biggl(\frac{\mathbf{N}([nt])}{[nt]},\hat{\theta}([nt])\biggr)\bigl(1+o(1)\bigr)\qquad\mbox{a.s., }j=1,2.$$

It is easy to see that both $v_j(\mathbf{y},\mathbf{z})$ and $Z_t(\mathbf{y},\mathbf{z})$ are $\Re^{2+2d}\rightarrow\Re$ functions, where $\mathbf{y}$ is a two-dimensional vector and $\mathbf{z}$ is a $2d$-dimensional vector. Examples of this formulation are discussed in Section 2.3.

2.2 Main results

Based on the notation in Section 2.1, we observe the random processes $(\mathbf{T}_1,\ldots,\mathbf{T}_{[nt]})$, $(\mathbf{X}_1,\ldots,\mathbf{X}_{[nt]})$, $\mathbf{N}([nt])$, $\hat{\theta}_{[nt]}$, $\rho(\hat{\theta}_{[nt]})$ and $Z_t$ at time point $t$. When a response-adaptive randomization procedure is used, these random processes have the following characteristics, which differ from those of fixed designs:

(1) The allocation $\mathbf{N}([nt])$ at any time $t$ is a random vector, rather than a constant as in fixed designs.

(2) The allocation $\mathbf{N}([nt])$ and the assignments $(\mathbf{T}_1,\ldots,\mathbf{T}_{[nt]})$ are not independent of the responses $(\mathbf{X}_1,\ldots,\mathbf{X}_{[nt]})$ and the parameter estimator vector $\hat{\theta}_{[nt]}$.

(3) The elements $\hat{\theta}_1([nt])$ and $\hat{\theta}_2([nt])$ depend on each other at any given time $t\in(0,1]$.

These differences directly lead to difficulties in deriving the joint distributions of the sequential test statistics.

To sequentially monitor a clinical trial, we need to determine how to control the type I error. The answer relies on deriving the asymptotic joint distribution of the sequential statistics and on the right choice of boundaries. Before stating the main theorem, we need the following conditions on the response $\mathbf{X}$, the target allocation $\rho(\theta)$, the allocation function $g$ and the functions $v_j(\mathbf{y},\mathbf{z})$, $j=1,2$.

(A1) For some $\varepsilon>0$, $E\|\mathbf{X}_1\|^{2+\varepsilon}<\infty$;

(A2) $g(s,r)$ is jointly continuous and twice differentiable at $(\rho_1,\rho_1)$;

(A3) $g(r,r)=r$ for all $r\in(0,1)$, and $g(s,r)$ is strictly decreasing in $s$ and strictly increasing in $r$ on $(0,1)\times(0,1)$;

(A4) $\rho(\mathbf{z})$ is continuous and twice continuously differentiable in a small neighborhood of $\theta$;

(A5) $v_j(\mathbf{y},\mathbf{z})$ is jointly continuous and twice differentiable in a small neighborhood of $(\rho,\theta)$;

(A6) $Z_t(\mathbf{y},\mathbf{z})$ is continuous and twice continuously differentiable in a small neighborhood of the vector $(\rho,\theta)$.

Remark 2.1.

All of the conditions are widely satisfied. An example of a design satisfying them is the DBCD of Hu and Zhang (2004). Condition (A1) ensures the consistency of the procedure and the asymptotic normality of the allocation proportions. Condition (A3) forces the actual allocation proportion to approach the theoretical target. Conditions (A4), (A5) and (A6) are satisfied in all the examples in Chapter 5 of Hu and Rosenberger (2006).

Theorem 2.1

Let $B_t=\sqrt{t}Z_t$ in the space $D[0,1]$ with the Skorohod topology. Assume conditions (A1)–(A6) are satisfied. Then we have the following two results:

(i) Under $H_0$, $B_t$ converges in distribution to a standard Brownian motion.

(ii) Under $H_1$, $B_t-\sqrt{n}\mu t$ converges in distribution to a standard Brownian motion, where

$$\mu=\frac{h(\theta_1)-h(\theta_2)}{\sqrt{v_1(\rho,\theta)+v_2(\rho,\theta)}}.$$

Based on Theorem 2.1, we can obtain the asymptotic distribution of the sequence of test statistics $\{Z_{t_1},\ldots,Z_{t_K}\}$, where $0\leq t_1\leq t_2\leq\cdots\leq t_K\leq 1$. Because $Z_{t_i}=(\sqrt{t_i})^{-1}B_{t_i}$, we have asymptotically:

(i) $\{Z_{t_1},\ldots,Z_{t_K}\}$ is multivariate normal;

(ii) $EZ_{t_i}=\mu\sqrt{nt_i}$; and

(iii) $\operatorname{Cov}(Z_{t_i},Z_{t_j})=\sqrt{[nt_i]/[nt_j]}$, $0\leq t_i\leq t_j\leq 1$.

Therefore, the sequence of test statistics $\{Z_{t_1},\ldots,Z_{t_K}\}$ asymptotically has the canonical joint distribution defined in Jennison and Turnbull (2000).
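The covariance structure above can be checked numerically in a simplified setting: under complete randomization with known variances, $B_t$ behaves like a rescaled random walk, so a plain Gaussian random walk can stand in for the underlying process. A sketch under these simplifying assumptions (the sample size, number of replications and seed are arbitrary) compares the empirical correlation of $Z_{0.5}$ and $Z_1$ with the predicted $\sqrt{0.5}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 400, 10000

# Gaussian random walk standing in for the (suitably scaled) process;
# Z_t = S_[nt] / sqrt([nt]).
steps = rng.standard_normal((reps, n))
S = steps.cumsum(axis=1)
Z_half = S[:, n // 2 - 1] / np.sqrt(n // 2)
Z_one = S[:, n - 1] / np.sqrt(n)

# Canonical joint distribution predicts Corr(Z_s, Z_t) = sqrt(s / t).
corr = np.corrcoef(Z_half, Z_one)[0, 1]
```

With 10,000 replications the empirical correlation agrees with $\sqrt{0.5}\approx 0.707$ to well within Monte Carlo error.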

Remark 2.2.

Based on the canonical joint distribution of the sequence of test statistics $\{Z_{t_1},\ldots,Z_{t_K}\}$, we can see that the doubly adaptive biased coin design has a simple form of information time, namely the proportion of the sample size already enrolled. This is because the DBCD asymptotically allocates the same proportion of patients to the treatments from the beginning to the end. We conjecture that this simple form of information time holds for most response-adaptive randomization procedures.

Based on Theorem 2.1, we can easily choose the correct critical values for the asymptotic Brownian process, so that inflation of the type I error is avoided. Moreover, we can make use of the well-known properties of the Brownian process for further analysis when sequentially monitoring a response-adaptive randomization procedure. Because $\{Z_{t_1},\ldots,Z_{t_K}\}$ satisfies the canonical joint distribution asymptotically, we can apply the sequential techniques in Chapters 2–7 of Jennison and Turnbull (2000) to response-adaptive randomized clinical trials. We may also apply different types of spending functions to monitor a response-adaptive randomized clinical trial sequentially. Here, we use the $\alpha$ spending functions proposed by Lan and DeMets (1983).

Any increasing function $\alpha(t)$ defined on $[0,1]$ with $\alpha(0)=0$ and $\alpha(1)=\alpha$ is called an $\alpha$ spending function. We spend $\alpha(t_i)-\alpha(t_{i-1})$ of the total type I error rate at time point $t_i$, so that $\alpha(t_i)$ has been spent after this point. For times $t_i$, $i=1,2,\ldots,$ we can sequentially obtain the boundaries. This method requires neither a predetermined number of looks nor equally spaced looks; interim monitoring can be performed at any time during the trial. Such a procedure is usually preferred by Data and Safety Monitoring Boards (DSMB). Proschan, Lan and Wittes (2006) provided three special spending functions. The first approximates the O’Brien–Fleming boundaries [O’Brien and Fleming (1979)]:

$$\alpha_1(t)=2\{1-\Phi(z_{\alpha/2}/\sqrt{t})\}.$$

The second one is the linear spending function:

$$\alpha_2(t)=\alpha t.$$

The third one approximates the Pocock boundaries [Pocock (1982)]:

$$\alpha_3(t)=\alpha\ln\{1+(e-1)t\}.$$

The O’Brien–Fleming-like function spends little of the type I error at early looks; consequently, the boundary for the last look is very close to what it would have been without sequential monitoring. Conversely, the Pocock-like function makes it easier to reject the null hypothesis at early looks, with smaller boundaries, and then must use a reasonably large critical value at the end to maintain the type I error. The linear function lies between these two. The three functions therefore represent three typical types of spending function. Finally, it is worth mentioning that these three spending functions correspond to the process $Z_t$.
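The three spending functions are simple to compute; here is a minimal sketch for $\alpha=0.05$ (the hard-coded $z_{\alpha/2}$ and the function names are assumptions tied to that level, not part of the general method):

```python
from math import sqrt, log, e, erf

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

Z_ALPHA_2 = 1.959964  # z_{alpha/2} for alpha = 0.05

def alpha_obf(t):
    """O'Brien-Fleming-like: spends almost no type I error early."""
    return 2.0 * (1.0 - Phi(Z_ALPHA_2 / sqrt(t)))

def alpha_linear(t, alpha=0.05):
    """Linear spending function."""
    return alpha * t

def alpha_pocock(t, alpha=0.05):
    """Pocock-like: spends type I error quickly at early looks."""
    return alpha * log(1.0 + (e - 1.0) * t)
```

All three spend the full $\alpha=0.05$ at $t=1$, while at $t=0.2$ the O’Brien–Fleming-like function has spent far less than the linear and Pocock-like ones, illustrating the ordering described above.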

2.3 Examples

Here, we use two examples to illustrate how to sequentially monitor the response-adaptive randomization procedures based on Theorem 2.1.

Example 1 (Continuous responses from normal populations).

Suppose the responses of the two treatments are from two normal distributions, $Y_{i1}\sim N(\mu_1,\sigma_1^2)$ and $Y_{i2}\sim N(\mu_2,\sigma_2^2)$, $i=1,\ldots,n$. We would like to compare $\mu_1$ and $\mu_2$. In this case, $\theta_1=(\mu_1,\sigma_1^2+\mu_1^2)$, $\theta_2=(\mu_2,\sigma_2^2+\mu_2^2)$, $\mathbf{X}_{ij}=(Y_{ij},Y_{ij}^2)$ and $h(\theta_j)=\theta_{j1}=\mu_j$, $j=1,2$. Then the hypotheses are

$$H_0: \mu_1=\mu_2\quad\mbox{versus}\quad H_1: \mu_1\neq\mu_2.$$

Let the target allocation proportion be the Neyman allocation [Jennison and Turnbull (2000)] with

$$\rho_1=\frac{\sigma_1}{\sigma_1+\sigma_2}\quad\mbox{and}\quad\rho_2=1-\rho_1=\frac{\sigma_2}{\sigma_1+\sigma_2}.\eqno(4)$$

Other target allocation proportions can also be used, for example, the optimal allocation proportion [Zhang and Rosenberger (2006)] and the $D_A$-optimal allocation proportion [Gwise, Hu and Hu (2008)]. The sequential statistic $Z_t(\mathbf{y},\mathbf{z})$ is a function from $\Re^6$ to $\Re$:

$$Z_t(\mathbf{y},\mathbf{z})=\frac{z_{11}-z_{21}}{\sqrt{(z_{12}-z_{11}^2)/([nt]y_1)+(z_{22}-z_{21}^2)/([nt]y_2)}},$$

where $\mathbf{y}=\mathbf{N}([nt])/[nt]$ and $\mathbf{z}=\hat{\theta}=(\hat{\theta}_{11}([nt]),\hat{\theta}_{12}([nt]),\hat{\theta}_{21}([nt]),\hat{\theta}_{22}([nt]))$. It is easy to see that $h(\hat{\theta}_1([nt]))=\hat{\mu}_1([nt])$ and $h(\hat{\theta}_2([nt]))=\hat{\mu}_2([nt])$. The natural variance estimators are

$$\widehat{\operatorname{Var}}(h(\hat{\theta}_1([nt])))=\frac{\hat{\sigma}_1^2([nt])}{N_1([nt])}\quad\mbox{and}\quad\widehat{\operatorname{Var}}(h(\hat{\theta}_2([nt])))=\frac{\hat{\sigma}_2^2([nt])}{N_2([nt])},$$

where $\hat{\sigma}_1^2([nt])$ and $\hat{\sigma}_2^2([nt])$ are the usual unbiased estimators of $\sigma_1^2$ and $\sigma_2^2$ based on the first $[nt]$ responses ($N_1([nt])$ from treatment 1 and $N_2([nt])$ from treatment 2), respectively. Therefore,

$$v_1(\rho,\theta)=\frac{\sigma_1^2}{\rho_1}\quad\mbox{and}\quad v_2(\rho,\theta)=\frac{\sigma_2^2}{\rho_2}.$$

The test statistic is then

$$Z_t=\frac{\hat{\mu}_1([nt])-\hat{\mu}_2([nt])}{\sqrt{\hat{\sigma}_1^2([nt])/N_1([nt])+\hat{\sigma}_2^2([nt])/N_2([nt])}}.\eqno(5)$$

Then, based on Theorem 2.1, $B_t=\sqrt{t}Z_t$ is asymptotically a standard Brownian process under $H_0$. Under $H_1$, $B_t-\sqrt{n}\mu t$ is asymptotically a standard Brownian motion in distribution, where

$$\mu=\frac{\mu_1-\mu_2}{\sqrt{\sigma_1^2/\rho_1+\sigma_2^2/(1-\rho_1)}}.$$
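To make this concrete, here is a sketch of a single DBCD trial targeting Neyman allocation with the Hu–Zhang function and $\gamma=2$; the sample sizes mirror the simulation settings of Section 3, while the seed and helper names are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(7)

def g(s, r, gamma=2.0):
    """Hu-Zhang allocation function of (2)."""
    if s <= 0.0:
        return 1.0
    if s >= 1.0:
        return 0.0
    num = r * (r / s) ** gamma
    return num / (num + (1.0 - r) * ((1.0 - r) / (1.0 - s)) ** gamma)

def dbcd_trial(mu=(1.0, 1.0), sigma=(1.0, 2.0), n=500, n0=25):
    """One trial; returns the final proportion assigned to treatment 1.
    Target is the Neyman allocation rho_1 = sigma_1 / (sigma_1 + sigma_2)."""
    arms = list(rng.permutation([0, 1] * n0))   # first 2*n0 patients balanced
    ys = [rng.normal(mu[a], sigma[a]) for a in arms]
    for _ in range(2 * n0, n):
        y, a = np.array(ys), np.array(arms)
        s1 = y[a == 0].std(ddof=1)
        s2 = y[a == 1].std(ddof=1)
        r1 = s1 / (s1 + s2)                     # estimated Neyman target
        s = (a == 0).mean()                     # current proportion on arm 1
        nxt = 0 if rng.random() < g(s, r1) else 1
        arms.append(nxt)
        ys.append(rng.normal(mu[nxt], sigma[nxt]))
    return float(np.mean(np.array(arms) == 0))

rho1_hat = dbcd_trial()
```

With $\sigma_1=1$ and $\sigma_2=2$, the realized proportion on treatment 1 settles near the target $1/3$, consistent with Table 1 below.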
Example 2 (Binary responses).

Assume $Y_{i1}\sim\operatorname{Bin}(1,p_1)$ and $Y_{i2}\sim\operatorname{Bin}(1,p_2)$, $i=1,\ldots,n$, and we would like to compare $p_1$ and $p_2$. In this case, $\theta_1=(p_1)$, $\theta_2=(p_2)$, $\mathbf{X}_{ij}=(Y_{ij})$ and $h(\theta_j)=\theta_{j1}$, $j=1,2$. The hypotheses are

$$H_0: p_1=p_2\quad\mbox{versus}\quad H_1: p_1\neq p_2.$$

Three common target allocations are: (i) Neyman allocation,

$$\rho_1=\frac{\sqrt{p_1(1-p_1)}}{\sqrt{p_1(1-p_1)}+\sqrt{p_2(1-p_2)}}\quad\mbox{and}\quad\rho_2=\frac{\sqrt{p_2(1-p_2)}}{\sqrt{p_1(1-p_1)}+\sqrt{p_2(1-p_2)}};\eqno(6)$$

(ii) the optimal allocation proposed by Rosenberger et al. (2001),

$$\rho_1=\frac{\sqrt{p_1}}{\sqrt{p_1}+\sqrt{p_2}}\quad\mbox{and}\quad\rho_2=\frac{\sqrt{p_2}}{\sqrt{p_1}+\sqrt{p_2}};\eqno(7)$$

(iii) the urn allocation [Wei and Durham (1978)], where $q_j=1-p_j$, $j=1,2$:

$$\rho_1=\frac{q_2}{q_1+q_2}\quad\mbox{and}\quad\rho_2=\frac{q_1}{q_1+q_2}.\eqno(8)$$

Neyman allocation is commonly discussed in the response-adaptive randomization literature because of its connection to efficiency. We study sequential monitoring of response-adaptive designs with Neyman allocation in order to show that the proposed procedure is able to achieve various objectives.
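The three target allocations above translate directly into code; a minimal sketch (the function names are ours, and for the urn allocation we take $q_j=1-p_j$):

```python
from math import sqrt

def neyman_rho1(p1, p2):
    """Neyman allocation: proportional to the binomial standard deviations."""
    a, b = sqrt(p1 * (1.0 - p1)), sqrt(p2 * (1.0 - p2))
    return a / (a + b)

def optimal_rho1(p1, p2):
    """Optimal allocation (7) of Rosenberger et al. (2001)."""
    return sqrt(p1) / (sqrt(p1) + sqrt(p2))

def urn_rho1(p1, p2):
    """Urn allocation (8), with q_j = 1 - p_j."""
    q1, q2 = 1.0 - p1, 1.0 - p2
    return q2 / (q1 + q2)
```

For example, with $p_1=0.8$ and $p_2=0.2$, Neyman allocation gives $\rho_1=0.5$ (equal standard deviations), the optimal allocation gives $\rho_1=2/3$, and the urn allocation gives $\rho_1=0.8$, illustrating how the latter two skew toward the better treatment.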

In this case, $Z_t(\mathbf{y},\mathbf{z})$ is a function from $\Re^4$ to $\Re$:

$$Z_t(\mathbf{y},\mathbf{z})=\frac{z_{11}-z_{21}}{\sqrt{z_{11}(1-z_{11})/([nt]y_1)+z_{21}(1-z_{21})/([nt]y_2)}},$$

where $\mathbf{y}=(N_1([nt])/[nt],N_2([nt])/[nt])$, $\mathbf{z}=(\hat{\theta}_{11}([nt]),\hat{\theta}_{21}([nt]))$, $h(\hat{\theta}_1([nt]))=\hat{p}_1([nt])$ and $h(\hat{\theta}_2([nt]))=\hat{p}_2([nt])$. The corresponding variance estimators are

$$\widehat{\operatorname{Var}}(h(\hat{\theta}_1([nt])))=\frac{\hat{p}_1([nt])(1-\hat{p}_1([nt]))}{N_1([nt])}\quad\mbox{and}\quad\widehat{\operatorname{Var}}(h(\hat{\theta}_2([nt])))=\frac{\hat{p}_2([nt])(1-\hat{p}_2([nt]))}{N_2([nt])}.$$

Therefore,

$$v_1(\rho,\theta)=\frac{p_1(1-p_1)}{\rho_1}\quad\mbox{and}\quad v_2(\rho,\theta)=\frac{p_2(1-p_2)}{\rho_2}.$$

The test statistic is

$$Z_t=\frac{\hat{p}_1([nt])-\hat{p}_2([nt])}{\sqrt{\hat{p}_1([nt])(1-\hat{p}_1([nt]))/N_1([nt])+\hat{p}_2([nt])(1-\hat{p}_2([nt]))/N_2([nt])}}.$$

Then $B_t=\sqrt{t}Z_t$ converges in distribution to a standard Brownian process under $H_0$. Under $H_1$, $B_t-\sqrt{n}\mu t$ is asymptotically a standard Brownian motion in distribution, where

$$\mu=\frac{p_1-p_2}{\sqrt{p_1(1-p_1)/\rho_1+p_2(1-p_2)/(1-\rho_1)}}.$$

Theorem 2.1 can be applied in different situations, such as the examples considered in Chapter 5 of Hu and Rosenberger (2006). In Examples 1 and 2, suppose we look at the process at three time points: $t_1=0.2$, $t_2=0.5$ and $t_3=1$. Then we can use the corresponding critical values from the three spending functions [Proschan, Lan and Wittes (2006)] of the preceding subsection for $Z_t$ to keep the overall type I error at 0.05: O’Brien–Fleming-like boundaries (4.877, 2.963, 1.969), linear boundaries (2.576, 2.377, 2.141) and Pocock-like boundaries (2.438, 2.333, 2.225).

3 Simulation study

In Section 2, we obtained the asymptotic distribution of the test statistic Z_{t}. In this section, we use the two examples in Section 2 to study the finite-sample properties of the proposed procedure.

In Examples 1 and 2, we use the doubly adaptive biased coin design with Hu and Zhang's allocation function in (2.1) with \gamma=2. In Tables 1–5, we use the same total sample size of 500. The first 50 patients (n_{0}=25 per treatment) are randomly assigned to treatments 1 and 2 by permuted block randomization. Then, for the lth (l>50) patient, the unknown parameters are estimated by using (1) based on the first l-1 responses with \theta_{0,1}=\theta_{0,2}=0.5. For normal responses in Example 1, we estimate \sigma_{1}^{2} and \sigma_{2}^{2} by the standard unbiased estimators based on the first l-1 responses.
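For concreteness, a minimal sketch of the allocation function in (2.1) is given below; this is our reading of the Hu and Zhang (2004) allocation function with tuning parameter \gamma, and the Python names are ours:

```python
def hu_zhang_g(x, rho, gamma=2.0):
    """Hu-Zhang allocation function g(x, rho): probability of assigning
    the next patient to treatment 1, given the current sample proportion
    x on treatment 1 and the estimated target proportion rho."""
    num = rho * (rho / x) ** gamma
    den = num + (1 - rho) * ((1 - rho) / (1 - x)) ** gamma
    return num / den
```

At x=\rho the assignment probability equals \rho, and for \gamma=2 it is pushed below \rho whenever treatment 1 is currently over-represented, which is how the design drives the allocation toward its target.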

Table 1: Example 1 with Neyman allocation, \mu_{1}=\mu_{2}=1, \sigma_{1}=1, \sigma_{2}=2
Critical values Randomization Type I error \hat{\rho}_{1} (s.e.)
O–F-like DBCD 0.055 0.333 (0.020)
O–F-like CR 0.052 0.500 (0.022)
Linear DBCD 0.048 0.333 (0.020)
Linear CR 0.053 0.500 (0.023)
Pocock-like DBCD 0.051 0.332 (0.020)
Pocock-like CR 0.052 0.500 (0.023)
Table 2: Example 2 with optimal allocation, p_{1}=p_{2}=0.5
Critical values Randomization Type I error \hat{\rho}_{1} (s.e.)
O–F-like DBCD 0.051 0.500 (0.016)
O–F-like CR 0.046 0.500 (0.023)
Linear DBCD 0.055 0.500 (0.019)
Linear CR 0.061 0.500 (0.023)
Pocock-like DBCD 0.056 0.500 (0.019)
Pocock-like CR 0.050 0.500 (0.022)
Table 3: Example 1 with Neyman allocation, \mu_{1}=1, \mu_{2}=1.4, \sigma_{1}=1, \sigma_{2}=2
Critical values Randomization Power \hat{\rho}_{1} (s.e.) N_{1} N_{2} N_{3}
O–F-like DBCD 0.847 0.333 (0.021) 2 1013 3222
O–F-like CR 0.807 0.500 (0.024) 1 842 3193
Linear DBCD 0.812 0.332 (0.027) 594 1429 2035
Linear CR 0.765 0.500 (0.028) 477 1380 1970
Pocock-like DBCD 0.792 0.332 (0.028) 741 1443 1774
Pocock-like CR 0.738 0.500 (0.028) 544 1309 1835
Table 4: Example 2 with urn allocation, p_{1}=0.5, p_{2}=0.625
Critical values Randomization Power \hat{\rho}_{1} (s.e.) N_{1} N_{2} N_{3} Total failures (s.e.)
O–F-like DBCD 0.811 0.426 (0.033) 4 839 3214 211 (13)
O–F-like CR 0.811 0.500 (0.024) 1 839 3215 217 (13)
Linear DBCD 0.762 0.421 (0.041) 503 1396 1912 206 (14)
Linear CR 0.767 0.500 (0.029) 521 1300 2016 212 (14)
Pocock-like DBCD 0.749 0.421 (0.042) 609 1325 1809 205 (14)
Pocock-like CR 0.738 0.501 (0.029) 603 1312 1773 211 (15)

For simplicity, we look at the test at three time points [n_{1}=100 (t_{1}=0.2), n_{2}=250 (t_{2}=0.5) and n=500 (t_{3}=1)]. Then the three sets of spending function boundaries in Section 2.3 are used to ensure \alpha=0.05. For each spending function, the first row in the table is for the DBCD and the second row is for complete randomization (denoted as CR in the tables). All the simulations are based on 5000 replications.

In Table 1, we simulate Example 1 with two normal responses N(1,1) and N(1,2) using the Neyman allocation (4). We find that the type I errors of sequentially monitoring the response-adaptive randomization procedure and complete randomization are both well controlled at the 0.05 level. We also report the mean and standard deviation of the actual allocation proportion (\hat{\rho}_{1}) for treatment 1 [N(1,1)]. The mean agrees with the Neyman allocation and the standard deviation is reasonably small for the DBCD, which indicates that the DBCD targets the theoretical allocation proportion very well. In Table 2, we simulate Example 2 with two binary responses p_{1}=p_{2}=0.5 and the optimal allocation (7) as the target. We reach the same conclusion as in Table 1. We have also run simulations for other cases, with similar results. These numerical results indicate that sequential monitoring of response-adaptive randomization does not inflate the type I error when the boundaries are chosen according to Theorem 2.1.

Table 5: Example 2 with optimal allocation, p_{1}=0.5, p_{2}=0.625
Critical values Randomization Power \hat{\rho}_{1} (s.e.) N_{1} N_{2} N_{3} Total failures (s.e.)
O–F-like DBCD 0.810 0.471 (0.017) 4 863 3185 214 (12)
O–F-like CR 0.805 0.501 (0.024) 4 795 3229 218 (13)
Linear DBCD 0.768 0.468 (0.022) 520 1354 1964 210 (14)
Linear CR 0.762 0.500 (0.029) 474 1367 1971 214 (14)
Pocock-like DBCD 0.754 0.469 (0.023) 673 1309 1787 210 (14)
Pocock-like CR 0.749 0.500 (0.030) 602 1351 1793 213 (15)
1.96 DBCD 0.805 0.472 (0.015) NA NA NA 217 (11)
1.96 CR 0.802 0.500 (0.022) NA NA NA 221 (11)

Next, we show other advantages of sequentially monitoring the response-adaptive randomization procedure. In Table 3, we simulate Example 1 with two normal responses N(1,1) and N(1.4,2) using the Neyman allocation (4), the target allocation that maximizes the power. The power of sequentially monitoring the response-adaptive randomization procedure is about 5%–8% higher than that of sequentially monitoring complete randomization. N_{i} in the table is the number of rejections at the ith look; rejection at one of the first two looks means the trial stops early. The DBCD with sequential monitoring clearly stops the trial earlier than complete randomization.

In Table 4, we simulate Example 2 with two binary responses p_{1}=0.5 and p_{2}=0.625 using the urn allocation (8), the target allocation that assigns more patients to the better treatment. If we reject the null hypothesis at one of the first two looks, we assign all the remaining patients to the estimated better treatment and count the total number of failures. We do this only for comparison in the simulation study; in a real clinical trial, we stop the trial once the null hypothesis is rejected at an interim look. From the mean total failure number, the DBCD with sequential monitoring has fewer failures than complete randomization for each type of spending function. N_{1}, N_{2} and N_{3} show that our method stops the trial a little earlier while the power is almost the same.

In Table 5, we simulate Example 2 with two binary responses p_{1}=0.5 and p_{2}=0.625 using the optimal allocation (7), which maximizes the power while controlling the total number of failures. We deal with the remaining patients in the same way as in Table 4 if we reject the null hypothesis at one of the first two looks. We find that sequentially monitoring the response-adaptive randomization procedure achieves the aim of the optimal allocation: its power is higher and its failure number lower than those of the complete randomization procedure. In this table, we also run the simulation without sequential monitoring; that is, we look at the test only once at the end of the trial, with critical value 1.96 for the nominal significance level 0.05. The results are reported in the last two rows. It is obvious that sequential monitoring reduces the total number of failures.
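The target allocation proportions behind Tables 3–5 have closed forms, and the simulated means of \hat{\rho}_{1} can be checked against them. The sketch below uses the standard formulas as we read allocations (4), (8) and (7): Neyman \rho_{1}=\sigma_{1}/(\sigma_{1}+\sigma_{2}), the limiting urn allocation \rho_{1}=q_{2}/(q_{1}+q_{2}) with q_{k}=1-p_{k}, and the optimal allocation \rho_{1}=\sqrt{p_{1}}/(\sqrt{p_{1}}+\sqrt{p_{2}}):

```python
from math import sqrt

sigma1, sigma2 = 1.0, 2.0        # Example 1 settings (Table 3)
p1, p2 = 0.5, 0.625              # Example 2 settings (Tables 4 and 5)
q1, q2 = 1 - p1, 1 - p2

neyman = sigma1 / (sigma1 + sigma2)          # Table 3 target: 1/3
urn = q2 / (q1 + q2)                         # Table 4 target: ~0.429
optimal = sqrt(p1) / (sqrt(p1) + sqrt(p2))   # Table 5 target: ~0.472
print(round(neyman, 3), round(urn, 3), round(optimal, 3))
```

These values agree closely with the simulated means of \hat{\rho}_{1} reported for the DBCD in Tables 3–5, confirming that the design tracks its target.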

Based on the simulation results, we can see the advantages of sequentially monitoring response-adaptive randomized clinical trials: (i) controlling the type I error well; (ii) reducing the total number of failures; (iii) increasing power; and (iv) stopping the trial earlier (reducing the total sample size).

4 Re-designing the HIV transmission trial

Maternal-infant transmission is the primary means by which infants are infected with HIV. Connor et al. (1994) reported a trial to evaluate the drug AZT (zidovudine) in reducing the risk of maternal-infant HIV transmission. In this clinical trial, 477 HIV-infected pregnant women were enrolled from April 1991 to December 1993 and assigned to the zidovudine treatment group and the placebo group with a 50–50 randomization scheme. The experiment was a randomized, double-blind, placebo-controlled trial: 239 women were allocated to the treatment group and 238 to the placebo group. At the end of the trial, 8.3% of the infants from the treatment group were infected with HIV, while 25.5% from the placebo group were infected.

Table 6: Re-design of the HIV trial with the full sample size
Target allocation Critical values \hat{\rho}_{1} (s.e.) Power Total failures (s.e.)
CR Linear 0.500 (0.039) 0.999 60.1 (11.1)
CR 1.96 0.501 (0.023) 0.999 80.7 (8.2)
Urn allocation Linear 0.751 (0.062) 0.996 52.3 (9.2)
Optimal allocation Linear 0.527 (0.021) 0.997 56.4 (10.8)

In Table 6, we redesign the study by sequentially monitoring both complete randomization (the first two rows of the table) and response-adaptive randomization [DBCD (2.1) with \gamma=2] (the remaining rows). We assume the success rate for the treatment group is p_{1}=0.917 and that for the placebo group is p_{2}=0.745 (as reported in the original paper). We look at the test at the same three time points as in the last section: n_{1}=95 (t_{1}=0.2), n_{2}=143 (t_{2}=0.5) and n=239 (t_{3}=1). The boundary used is the linear spending function (2.576, 2.377, 2.141), except in the second row of the table, where we use equal allocation without sequential monitoring. We report the actual allocation proportion for the treatment group, the power and the total number of HIV-infected infants. As before, if we reject the null hypothesis at one of the first two looks, we assign all the remaining patients to the estimated better treatment. From the first two rows, we find that the sequential monitoring technique decreases the HIV-infected number dramatically. The response-adaptive randomization technique also reduces the HIV-infected number compared to complete randomization. Sequentially monitoring the DBCD while targeting the urn allocation yields the smallest HIV-infected number, which agrees with the aim of the urn allocation.

Table 7: Re-design of the HIV trial with sample size n=245
Target allocation Critical values \hat{\rho}_{1} (s.e.) Power Total failures (s.e.)
CR O–F-like 0.500 (0.036) 0.947 40.1 (7.0)
CR Linear 0.501 (0.042) 0.942 36.6 (7.5)
CR 1.96 0.500 (0.032) 0.958 43.1 (5.8)
Urn allocation O–F-like 0.745 (0.068) 0.920 30.7 (5.9)
Urn allocation Linear 0.747 (0.074) 0.885 29.3 (6.1)
Optimal allocation O–F-like 0.528 (0.023) 0.952 36.8 (6.7)
Optimal allocation Linear 0.529 (0.025) 0.945 32.8 (7.3)

In Table 7, we reduce the full sample size to 245 (to achieve power 0.95 for complete randomization) and keep all the other settings unchanged. We reach the same conclusion about the HIV-infected number as in Table 6. We also find that targeting the optimal allocation with the DBCD gives slightly higher power than targeting equal allocation when sequential monitoring is used. Targeting the urn allocation with the DBCD gives slightly lower power, but the smallest HIV-infected number. Overall, sequentially monitoring the response-adaptive randomization procedure is better than sequentially monitoring complete randomization, since it reduces the HIV-infected number while retaining good power.
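The fixed-sample row for complete randomization in Table 7 can be checked with a normal-approximation power calculation for the two-sample test of proportions under equal allocation; this back-of-the-envelope sketch is ours:

```python
from math import sqrt
from statistics import NormalDist

p1, p2, n = 0.917, 0.745, 245   # assumed success rates, total sample size
# Standard error of p1_hat - p2_hat with n/2 patients per arm.
se = sqrt(p1 * (1 - p1) / (n / 2) + p2 * (1 - p2) / (n / 2))
# Power of the two-sided level-0.05 z-test (upper-tail approximation).
power = NormalDist().cdf((p1 - p2) / se - NormalDist().inv_cdf(0.975))
print(round(power, 3))  # ~0.958, in line with the 1.96/CR row of Table 7
```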

5 Concluding remarks

Sequential monitoring has become a standard technique in clinical trials. To apply response-adaptive randomization in clinical trials, it is important to know how to sequentially monitor adaptive randomized trials. In this paper, we overcome this hurdle and show the advantages of sequentially monitoring response-adaptive randomized clinical trials both theoretically and numerically. We use a Gaussian process in the Skorohod topology to describe the relationship between the allocation and the parameter estimators. One of the main contributions of this paper is to show that the sequential statistics can be asymptotically approximated by a Brownian process in distribution under both the null and alternative hypotheses. Further, we find that the sequential test statistics satisfy the canonical joint distribution asymptotically. Consequently, the results of this paper not only solve the problem of preserving a preset type I error but may also lead to many areas of potential future research.

We have studied how to sequentially monitor a clinical trial based on the doubly adaptive biased coin design proposed by Eisele and Woodroofe (1995) and Hu and Zhang (2004). Another important family of response-adaptive randomization procedures is based on urn models, which include the randomized play-the-winner rule [Wei and Durham (1978)], generalized Friedman's urn models [Athreya and Karlin (1968), Bai and Hu (2005)], the drop-the-loser rule [Ivanova (2003)], sequential estimation-adjusted urn models [Zhang, Hu and Cheung (2006)], etc. The technique used in this paper opens a door to studying the properties of sequential monitoring of clinical trials based on these urn models or on the efficient randomized adaptive designs [Hu, Zhang and He (2009)]. We leave this for future study.

In this paper, we have used \alpha-spending functions to calculate the critical boundaries. Because the sequential test statistics satisfy the canonical joint distribution asymptotically, we can implement all the sequential techniques introduced in Jennison and Turnbull (2000) based on this canonical form. We can also use the optimal spending functions in Anderson (2007) or the beta spending functions in DeMets (2006). We leave the details for future research.

Appendix: Proofs

First, we introduce some further notation. For a function \bolds{\eta}(\mathbf{u},\mathbf{w})\dvtx\Re^{L}\times\Re^{M}\rightarrow\Re^{2}, we denote the partial derivative matrices by

\nabla_{u}(\bolds{\eta})=\biggl(\frac{\partial\eta_{k}}{\partial u_{i}};i=1,\ldots,L,k=1,2\biggr)_{L\times 2}

and

\nabla_{w}(\bolds{\eta})=\biggl(\frac{\partial\eta_{k}}{\partial w_{j}};j=1,\ldots,M,k=1,2\biggr)_{M\times 2}.

Let H=\nabla_{r}(g(r,s),1-g(r,s))|_{(\rho_{1},\rho_{1})} and E=\nabla_{s}(g(r,s),1-g(r,s))|_{(\rho_{1},\rho_{1})} be the partial derivative matrices of the allocation function g. Further, let V=\operatorname{diag}(\operatorname{var}(\mathbf{X}_{11})/\rho_{1},\operatorname{var}(\mathbf{X}_{12})/\rho_{2}), \Sigma_{3}=(\nabla(\bolds{\rho})|_{\bolds{\theta}})^{\prime}V\nabla(\bolds{\rho})|_{\bolds{\theta}}, \Sigma_{1}=\operatorname{diag}(\bolds{\rho})-\bolds{\rho}^{\prime}\bolds{\rho} and \Sigma_{2}=E^{\prime}\Sigma_{3}E. Hu and Zhang (2004) studied the asymptotic properties of \mathbf{N}(n), \hat{\bolds{\rho}}(n) and \hat{\bolds{\theta}}(n) at the end of the trial. Based on their results, one can carry out the corresponding statistical inference after observing all responses of the clinical trial. To monitor the response-adaptive randomized trial sequentially, we need the theoretical properties of the processes \mathbf{N}([nt]) and \hat{\bolds{\theta}}([nt]) for any given t\in(0,1]. To this end, we start with Lemma .1.

Lemma .1

Let W_{1t} and W_{2t} be two independent standard two-dimensional Brownian motions. \mathbf{N}([nt]), \hat{\bolds{\theta}}([nt]), \bolds{\rho} and \bolds{\theta} are defined as in Section 2. Under the conditions of Theorem 2.1, we have

n^{-1/2}([nt])\biggl(\frac{\mathbf{N}([nt])}{[nt]}-\bolds{\rho},\hat{\bolds{\theta}}([nt])-\bolds{\theta}\biggr)\rightarrow(G_{t},W_{2t}V^{1/2}) (10)

in distribution in the space D[0,1]D_{[0,1]} with the Skorohod topology, where the Gaussian process

G_{t}=\int_{0}^{t}(dW_{1x})\Sigma_{1}^{1/2}\biggl(\frac{t}{x}\biggr)^{H}+\int_{0}^{t}(dW_{2x})\Sigma_{2}^{1/2}\biggl[\int_{x}^{t}\frac{1}{y}\biggl(\frac{t}{y}\biggr)^{H}\,dy\biggr], (11)

which is the solution of the stochastic differential equation

dG_{t}=(dW_{1t})\Sigma_{1}^{1/2}+\frac{W_{2t}\Sigma_{2}^{1/2}}{t}\,dt+\frac{G_{t}}{t}H\,dt\qquad\mbox{with }G_{0}=0,

and a^{H} is the matrix power function defined as

a^{H}=e^{H\ln a}=\sum_{j=0}^{\infty}\frac{(\ln a)^{j}}{j!}H^{j}.

Proof.

It is worth noting that the response-adaptive design in Theorem 2.1 satisfies all the conditions of Hu and Zhang (2004), so all the results in Hu and Zhang (2004) are valid. We will prove this lemma by using the weak convergence of martingales [cf. Theorem 4.1 of Hall and Heyde (1980)]. To do this, we first approximate the process (\frac{\mathbf{N}([nt])}{[nt]}-\bolds{\rho},\hat{\bolds{\theta}}([nt])-\bolds{\theta}) by a martingale and then prove the following two facts: (1) the Lindeberg condition holds for the approximating martingale process; and (2) the limiting covariance of n^{-1/2}([nt])(([nt])^{-1}\mathbf{N}([nt])-\bolds{\rho},\hat{\bolds{\theta}}([nt])-\bolds{\theta}) agrees with that of (G_{t},W_{2t}V^{1/2}).

Now, we use the martingale approximation of \mathbf{N}(n)-n\bolds{\rho} and \hat{\bolds{\theta}}(n)-\bolds{\theta} from Hu and Zhang (2004). Let \mathcal{F}_{m}=\sigma(\mathbf{T}_{1},\ldots,\mathbf{T}_{m},\mathbf{X}_{1},\ldots,\mathbf{X}_{m}) be the \sigma-field generated by the first m stages. Then, conditionally on \mathcal{F}_{m-1}, \mathbf{T}_{m} and \mathbf{X}_{m} are independent, and

E[T_{m1}|\mathcal{F}_{m-1}]=g\biggl(\frac{N_{1}(m-1)}{m-1},\hat{\rho}_{1}(m-1)\biggr).

Let \mathbf{Q}_{n}=\sum_{m=1}^{n}\Delta\mathbf{Q}_{m}, where \Delta\mathbf{Q}_{m}=(\Delta\mathbf{Q}_{m,1},\Delta\mathbf{Q}_{m,2})=(\Delta Q_{m,1k},\Delta Q_{m,2k}; k=1,\ldots,d) and \Delta Q_{m,jk}=T_{m,j}(X_{m,jk}-\theta_{jk})/\rho_{j}, j=1,2. Then \mathbf{Q}_{n}=O(\sqrt{n\log\log n}) a.s. is a martingale sequence, and we can prove

\hat{\bolds{\theta}}(n)-\bolds{\theta}=\frac{\mathbf{Q}_{n}}{n}+O\biggl(\frac{\log\log n}{n}\biggr)\qquad\mbox{a.s.} (12)

Let \mathbf{M}_{n}=\sum_{m=1}^{n}\Delta\mathbf{M}_{m}, where \Delta\mathbf{M}_{m}=\mathbf{T}_{m}-E[\mathbf{T}_{m}|\mathcal{F}_{m-1}], and let B_{n,m} be as defined in Hu and Zhang (2004). Then

\mathbf{N}(n)-n\bolds{\rho} = \sum_{m=1}^{n}\Delta\mathbf{M}_{m}B_{n,m}+\sum_{m=1}^{n}\Delta\mathbf{Q}_{m}\nabla(\bolds{\rho})\big|_{\bolds{\theta}}E\sum_{k=m}^{n}\frac{1}{k}B_{n,k}+o(n^{1/2-\delta/3})
:= \mathbf{U}_{n}+o(n^{1/2-\delta/3})

almost surely, where \mathbf{U}_{n} is a sum of martingale differences.

We can approximate the processes \mathbf{N}([nt])-[nt]\bolds{\rho} and \hat{\bolds{\theta}}([nt])-\bolds{\theta} (for any point t\in(0,1]) in the same way as \mathbf{N}(n)-n\bolds{\rho} and \hat{\bolds{\theta}}(n)-\bolds{\theta}. We obtain

\hat{\bolds{\theta}}([nt])-\bolds{\theta}=\frac{\mathbf{Q}_{[nt]}}{[nt]}+O\biggl(\frac{\log\log[nt]}{[nt]}\biggr)\qquad\mbox{a.s.} (13)

and

\mathbf{N}([nt])-[nt]\bolds{\rho} = \sum_{m=1}^{[nt]}\Delta\mathbf{M}_{m}B_{[nt],m}+\sum_{m=1}^{[nt]}\Delta\mathbf{Q}_{m}\nabla(\bolds{\rho})\big|_{\bolds{\theta}}E\sum_{k=m}^{[nt]}\frac{1}{k}B_{[nt],k}+o(([nt])^{1/2-\delta/3})
:= \mathbf{U}_{[nt]}+o(([nt])^{1/2-\delta/3})

almost surely.

Hu and Zhang (2004) proved that both martingales \mathbf{Q}_{n} and \mathbf{U}_{n} satisfy the Lindeberg conditions. Similarly, we can show that the martingales \mathbf{Q}_{[nt]} and \mathbf{U}_{[nt]} also satisfy the Lindeberg conditions. It remains to calculate the covariance matrices of the martingales \mathbf{Q}_{[nt]} and \mathbf{U}_{[nt]}. First, based on the results of Hu and Zhang (2004), we have

\hat{\bolds{\rho}}(n)-\bolds{\rho}=O\Biggl(\sqrt{\frac{\log\log n}{n}}\Biggr)\quad\mbox{and}\quad n^{-1}\mathbf{N}(n)-\bolds{\rho}=O\Biggl(\sqrt{\frac{\log\log n}{n}}\Biggr)

almost surely. Therefore, for any t\in(0,1], we have

\hat{\bolds{\rho}}([nt])-\bolds{\rho}=O\Biggl(\sqrt{\frac{\log\log[nt]}{[nt]}}\Biggr)\quad\mbox{and}\quad([nt])^{-1}\mathbf{N}([nt])-\bolds{\rho}=O\Biggl(\sqrt{\frac{\log\log[nt]}{[nt]}}\Biggr)

almost surely. Now, we can calculate \operatorname{Var}[\Delta\mathbf{M}_{[nt]}|\mathcal{F}_{[nt]-1}], \operatorname{Var}[\Delta\mathbf{Q}_{[nt]}|\mathcal{F}_{[nt]-1}] and \operatorname{Cov}[\Delta\mathbf{M}_{[nt]},\Delta\mathbf{Q}_{[nt]}|\mathcal{F}_{[nt]-1}].

First, \Delta\mathbf{M}_{[nt]}=\mathbf{T}_{[nt]}-E[\mathbf{T}_{[nt]}|\mathcal{F}_{[nt]-1}] is a binary random vector. Based on conditions (A2), (A3) and the martingale approximation above, we have

\operatorname{Var}\bigl[\Delta\mathbf{M}_{[nt]}|\mathcal{F}_{[nt]-1}\bigr]=\Sigma_{1}+o(1) (15)

almost surely. Similarly, we can show

\operatorname{Var}\bigl[\Delta\mathbf{Q}_{[nt]}|\mathcal{F}_{[nt]-1}\bigr]=V+o(1) (16)

and

\operatorname{Cov}\bigl[\Delta\mathbf{M}_{[nt]},\Delta\mathbf{Q}_{[nt]}|\mathcal{F}_{[nt]-1}\bigr]=o(1) (17)

almost surely.

Based on results (15), (16) and (17), it follows that for any 0<s<t<1,

\operatorname{Cov}\bigl[\mathbf{Q}_{[ns]},\mathbf{Q}_{[nt]}\bigr] = \operatorname{Cov}\Biggl(\sum_{m=1}^{[ns]}\Delta\mathbf{Q}_{m},\sum_{m=1}^{[nt]}\Delta\mathbf{Q}_{m}\Biggr) = ns\bigl(V+o(1)\bigr)=nsV+o(n),

\operatorname{Cov}\bigl[\mathbf{U}_{[ns]},\mathbf{U}_{[nt]}\bigr] = n\wedge_{11}(s,t)+o(n),

\operatorname{Cov}\bigl[\mathbf{Q}_{[ns]},\mathbf{U}_{[nt]}\bigr] = \operatorname{Cov}\Biggl[\sum_{m=1}^{[ns]}\Delta\mathbf{Q}_{m},\sum_{m=1}^{[nt]}\Delta\mathbf{M}_{m}B_{[nt],m}+\sum_{m=1}^{[nt]}\Delta\mathbf{Q}_{m}\nabla(\bolds{\rho})\big|_{\bolds{\theta}}E\sum_{k=m}^{[nt]}\frac{1}{k}B_{[nt],k}\Biggr]
= \operatorname{Cov}\Biggl[\sum_{m=1}^{[ns]}\Delta\mathbf{Q}_{m},\sum_{m=1}^{[nt]}\Delta\mathbf{M}_{m}B_{[nt],m}\Biggr]+\operatorname{Cov}\Biggl[\sum_{m=1}^{[ns]}\Delta\mathbf{Q}_{m},\sum_{m=1}^{[nt]}\Delta\mathbf{Q}_{m}\nabla(\bolds{\rho})\big|_{\bolds{\theta}}E\sum_{k=m}^{[nt]}\frac{1}{k}B_{[nt],k}\Biggr]
= \operatorname{Cov}\Biggl[\sum_{m=1}^{[ns]}\Delta\mathbf{Q}_{m},\sum_{m=1}^{[nt]}\Delta\mathbf{Q}_{m}\nabla(\bolds{\rho})\big|_{\bolds{\theta}}E\sum_{k=m}^{[nt]}\frac{1}{k}B_{[nt],k}\Biggr]
= \bigl(V\nabla(\bolds{\rho})|_{\bolds{\theta}}E+o(1)\bigr)\sum_{m=1}^{[ns]}\Biggl(\sum_{k=m}^{[nt]}\frac{1}{k}B_{[nt],k}\Biggr)
= nV\nabla(\bolds{\rho})\big|_{\bolds{\theta}}E\int_{0}^{s}dx\biggl[\int_{x}^{t}\frac{1}{y}\biggl(\frac{t}{y}\biggr)^{H}\,dy\biggr]+o(n)
= n\wedge_{21}(s,t)+o(n),

and similarly,

\operatorname{Cov}\bigl[\mathbf{Q}_{[nt]},\mathbf{U}_{[ns]}\bigr]=n\wedge_{12}(s)+o(n),

where

\wedge_{11}(s,t) = \int_{0}^{s}\biggl(\frac{s}{x}\biggr)^{H^{\prime}}\Sigma_{1}\biggl(\frac{t}{x}\biggr)^{H}\,dx+\int_{0}^{s}dx\biggl[\int_{x}^{s}\frac{1}{y}\biggl(\frac{s}{y}\biggr)^{H}\,dy\biggr]^{\prime}\Sigma_{2}\biggl[\int_{x}^{t}\frac{1}{y}\biggl(\frac{t}{y}\biggr)^{H}\,dy\biggr],
\wedge_{21}(s,t) = V\nabla(\bolds{\rho})\big|_{\bolds{\theta}}E\int_{0}^{s}dx\biggl[\int_{x}^{t}\frac{1}{y}\biggl(\frac{t}{y}\biggr)^{H}\,dy\biggr],
\wedge_{12}(s) = \int_{0}^{s}dx\biggl[\int_{x}^{s}\frac{1}{y}\biggl(\frac{s}{y}\biggr)^{H}\,dy\biggr]E^{\prime}\nabla(\bolds{\rho})\big|_{\bolds{\theta}}^{\prime}V.

Therefore, the asymptotic covariance function of n^{-1/2}(\mathbf{U}_{[nt]},\mathbf{Q}_{[nt]}) agrees with that of (G_{t},W_{2t}V^{1/2}). So, by the weak convergence of martingales [cf. Theorem 4.1 of Hall and Heyde (1980)], we have

n^{-1/2}([nt])\biggl(\frac{\mathbf{N}([nt])}{[nt]}-\bolds{\rho},\hat{\bolds{\theta}}([nt])-\bolds{\theta}\biggr)\rightarrow(G_{t},W_{2t}V^{1/2})

in distribution in the space D_{[0,1]} with the Skorohod topology.

Proof of Theorem 2.1. We assume that, for j=1,2,

[nt]\widehat{\operatorname{Var}}(h(\hat{\bolds{\theta}}_{j}([nt])))=[nt]v_{j}\bigl(\mathbf{N}([nt])/[nt],\hat{\bolds{\theta}}([nt])\bigr)\bigl(1+o_{P}(1)\bigr)

and

[nt]\operatorname{Var}(h(\hat{\bolds{\theta}}_{j}([nt])))=[nt]v_{j}(\bolds{\rho},\bolds{\theta}),

where v_{j} is a continuous function. We also assume

[nt]v_{j}\bigl(\mathbf{N}([nt])/[nt],\hat{\bolds{\theta}}([nt])\bigr)=[nt]v_{j}(\bolds{\rho},\bolds{\theta})+O\Biggl(\sqrt{\frac{\log\log[nt]}{[nt]}}\Biggr)\qquad\mbox{a.s.},

which holds in most circumstances, since

\mathbf{N}([nt])/[nt]=\bolds{\rho}+O\Biggl(\sqrt{\frac{\log\log[nt]}{[nt]}}\Biggr)\qquad\mbox{a.s.}

and

\hat{\bolds{\theta}}([nt])=\bolds{\theta}+O\Biggl(\sqrt{\frac{\log\log[nt]}{[nt]}}\Biggr)\qquad\mbox{a.s.}

So

[nt]\widehat{\operatorname{Var}}(h(\hat{\bolds{\theta}}_{j}([nt])))=[nt]\operatorname{Var}(h(\hat{\bolds{\theta}}_{j}([nt])))+O_{P}\Biggl(\sqrt{\frac{\log\log[nt]}{[nt]}}\Biggr).

That is, [nt]\widehat{\operatorname{Var}}(h(\hat{\bolds{\theta}}_{j}([nt]))) converges to [nt]\operatorname{Var}(h(\hat{\bolds{\theta}}_{j}([nt]))), j=1,2, in probability. By Slutsky's theorem, the sequential statistics

B_{t}\biggl(\frac{\mathbf{N}([nt])}{[nt]},\hat{\bolds{\theta}}([nt])\biggr)=\sqrt{t}\frac{h(\hat{\bolds{\theta}}_{1}([nt]))-h(\hat{\bolds{\theta}}_{2}([nt]))}{\sqrt{\widehat{\operatorname{Var}}(h(\hat{\bolds{\theta}}_{1}([nt])))+\widehat{\operatorname{Var}}(h(\hat{\bolds{\theta}}_{2}([nt])))}}

and

B_{t}^{*}(\hat{\bolds{\theta}}([nt]))=\sqrt{t}\frac{h(\hat{\bolds{\theta}}_{1}([nt]))-h(\hat{\bolds{\theta}}_{2}([nt]))}{\sqrt{\operatorname{Var}(h(\hat{\bolds{\theta}}_{1}([nt])))+\operatorname{Var}(h(\hat{\bolds{\theta}}_{2}([nt])))}}

have the same asymptotic distribution. So we only need to prove that the sequential statistic B_{t}^{*} converges to a Brownian motion in distribution. Now

h(\hat{\bolds{\theta}}_{j})-h(\bolds{\theta}_{j}) = (\hat{\bolds{\theta}}_{j}-\bolds{\theta}_{j})\bigl(\partial h(\bolds{\theta}_{j})/\partial\bolds{\theta}_{j}\bigr)^{\prime}+o(\|\hat{\bolds{\theta}}_{j}-\bolds{\theta}_{j}\|^{1+\delta})
= (\hat{\bolds{\theta}}_{j}-\bolds{\theta}_{j})\bigl(\partial h(\bolds{\theta}_{j})/\partial\bolds{\theta}_{j}\bigr)^{\prime}+o(n^{-1/2-\delta/3})\qquad\mbox{a.s.}, j=1,2.

It is easy to see that

\operatorname{Var}[\hat{\bolds{\theta}}_{j}([nt])]=\operatorname{Var}[\hat{\bolds{\theta}}_{j}(n)]/t+o(n^{-1})\qquad\mbox{a.s.}, j=1,2.

Here, we define

C=\sqrt{\operatorname{Var}[h(\hat{\bolds{\theta}}_{1}([nt]))]+\operatorname{Var}[h(\hat{\bolds{\theta}}_{2}([nt]))]}\sqrt{\operatorname{Var}[h(\hat{\bolds{\theta}}_{1}([ns]))]+\operatorname{Var}[h(\hat{\bolds{\theta}}_{2}([ns]))]}

and

\mathbf{D}=\bigl(\partial h(\bolds{\theta}_{1})/\partial\bolds{\theta}_{1},-\partial h(\bolds{\theta}_{2})/\partial\bolds{\theta}_{2}\bigr).

Then

B_{t}^{*}(\hat{\bolds{\theta}}([nt])) = \sqrt{t}\frac{h(\hat{\bolds{\theta}}_{1}([nt]))-h(\hat{\bolds{\theta}}_{2}([nt]))}{\sqrt{\operatorname{Var}(h(\hat{\bolds{\theta}}_{1}([nt])))+\operatorname{Var}(h(\hat{\bolds{\theta}}_{2}([nt])))}}
= \sqrt{t}\frac{h(\bolds{\theta}_{1})-h(\bolds{\theta}_{2})+(\hat{\bolds{\theta}}([nt])-\bolds{\theta})\mathbf{D}^{\prime}+o(n^{-1/2-\delta/3})}{\sqrt{\operatorname{Var}(h(\hat{\bolds{\theta}}_{1}([nt])))+\operatorname{Var}(h(\hat{\bolds{\theta}}_{2}([nt])))}}.

By the conclusion of Lemma .1:

n^{-1/2}([nt])\bigl(\hat{\bolds{\theta}}([nt])-\bolds{\theta}\bigr)\rightarrow W_{2t}V^{1/2}

in distribution in the space D_{[0,1]} with the Skorohod topology. Under H_{0}, we have

B_{t}^{*}=\sqrt{t}\frac{n^{1/2}([nt])^{-1}W_{2t}V^{1/2}\mathbf{D}^{\prime}}{\sqrt{\operatorname{Var}(h(\hat{\bolds{\theta}}_{1}([nt])))+\operatorname{Var}(h(\hat{\bolds{\theta}}_{2}([nt])))}}+o(n^{-\delta/3})

almost surely. Hence the sequential statistic $B_{t}^{*}$ converges to a Gaussian process in distribution. To prove that $B_{t}^{*}$ converges to a Brownian motion in distribution, it suffices to show that $EB_{t}^{*}\rightarrow 0$ and that, for any $0<s<t<1$, $\operatorname{cov}(B_{t}^{*},B_{s}^{*})\rightarrow s$. Indeed,

\begin{eqnarray*}
\operatorname{cov}(B_{t}^{*},B_{s}^{*})&=&\frac{n\sqrt{ts}}{[nt][ns]}\frac{\operatorname{cov}(W_{2t},W_{2s})\mathbf{D}V\mathbf{D}^{\prime}}{C}+o(n^{-\delta/3})\\
&=&\frac{n\sqrt{t}s^{3/2}}{[nt][ns]}\frac{\mathbf{D}V\mathbf{D}^{\prime}}{C}+o(n^{-\delta/3})\\
&=&\frac{n\sqrt{t}s^{3/2}}{[nt][ns]}\frac{\mathbf{D}(n\operatorname{Var}[\hat{\bolds{\theta}}(n)-\bolds{\theta}]+o(1))\mathbf{D}^{\prime}}{C}+o(n^{-\delta/3})\\
&=&\frac{n^{2}\sqrt{t}s^{3/2}}{[nt][ns]}\frac{\partial h(\bolds{\theta}_{1})/\partial\bolds{\theta}_{1}\operatorname{Var}[\hat{\bolds{\theta}}_{1}(n)-\bolds{\theta}_{1}]\,\partial h(\bolds{\theta}_{1})/\partial\bolds{\theta}_{1}^{\prime}}{C}\\
&&{}+\frac{n^{2}\sqrt{t}s^{3/2}}{[nt][ns]}\frac{\partial h(\bolds{\theta}_{2})/\partial\bolds{\theta}_{2}\operatorname{Var}[\hat{\bolds{\theta}}_{2}(n)-\bolds{\theta}_{2}]\,\partial h(\bolds{\theta}_{2})/\partial\bolds{\theta}_{2}^{\prime}}{C}+o(1)\\
&=&\frac{n^{2}\sqrt{t}s^{3/2}}{[nt][ns]}\frac{\operatorname{Var}[h(\hat{\bolds{\theta}}_{1}(n))]+\operatorname{Var}[h(\hat{\bolds{\theta}}_{2}(n))]}{C}+o(1)\\
&=&\frac{n^{2}ts^{2}}{[nt][ns]}+o(1)\\
&\rightarrow&s\qquad\mbox{a.s.}
\end{eqnarray*}

It is easy to see that $EB_{t}^{*}\rightarrow 0$. This completes the proof and shows that $B_{t}^{*}$ is asymptotically a Brownian motion in distribution.
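The Brownian covariance structure just established can be checked numerically. The sketch below is not part of the paper's procedure: it assumes the simplest special case (two normal arms, equal non-adaptive allocation, $h$ the identity, $\sigma=1$) under $H_{0}$, and verifies empirically that $B_{t}^{*}=\sqrt{t}\,Z_{t}$ has $\operatorname{Var}(B_{t}^{*})\approx t$ and $\operatorname{cov}(B_{t}^{*},B_{s}^{*})\approx s$.

```python
import numpy as np

# Illustrative check of the Brownian covariance structure under H0.
# Assumptions (not from the paper): two normal arms, equal non-adaptive
# allocation, h = identity, known variance sigma^2 = 1.
rng = np.random.default_rng(0)
n, reps = 400, 10000          # per-arm size at full information; replications
x1 = rng.standard_normal((reps, n))
x2 = rng.standard_normal((reps, n))

def B_star(t):
    """B*_t = sqrt(t) times the standardized difference at fraction t."""
    m = int(n * t)
    diff = x1[:, :m].mean(axis=1) - x2[:, :m].mean(axis=1)
    z = diff / np.sqrt(2.0 / m)           # Var(diff) = 2 * sigma^2 / m
    return np.sqrt(t) * z

s, t = 0.5, 1.0
Bs, Bt = B_star(s), B_star(t)
print(np.cov(Bt, Bs)[0, 1])   # ≈ s = 0.5
print(Bt.var(), Bs.var())     # ≈ t = 1.0 and s = 0.5
```

The overlap of the cumulative samples is what produces $\operatorname{cov}(Z_{t},Z_{s})=\sqrt{s/t}$, and hence $\operatorname{cov}(B_{t}^{*},B_{s}^{*})=s$, matching the derivation above.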

Under H1,H_{1}, the sequential statistics

\begin{eqnarray*}
B_{t}^{*}(\hat{\bolds{\theta}}([nt]))&=&\sqrt{t}\frac{h(\hat{\bolds{\theta}}_{1}([nt]))-h(\hat{\bolds{\theta}}_{2}([nt]))-(h(\bolds{\theta}_{1})-h(\bolds{\theta}_{2}))}{\sqrt{\widehat{\operatorname{Var}}(h(\hat{\bolds{\theta}}_{1}([nt])))+\widehat{\operatorname{Var}}(h(\hat{\bolds{\theta}}_{2}([nt])))}}\\
&&{}+\sqrt{t}\frac{h(\bolds{\theta}_{1})-h(\bolds{\theta}_{2})}{\sqrt{\widehat{\operatorname{Var}}(h(\hat{\bolds{\theta}}_{1}([nt])))+\widehat{\operatorname{Var}}(h(\hat{\bolds{\theta}}_{2}([nt])))}}.
\end{eqnarray*}

By a similar argument, the first term converges to a standard Brownian motion in distribution. Because

[nt]\widehat{\operatorname{Var}}(h(\hat{\bolds{\theta}}_{j}([nt])))=v_{j}\biggl(\frac{\mathbf{N}([nt])}{[nt]},\hat{\bolds{\theta}}([nt])\biggr)\bigl(1+o(1)\bigr)\qquad\mbox{a.s., }j=1,2,

we have that

\sqrt{t}\frac{h(\bolds{\theta}_{1})-h(\bolds{\theta}_{2})}{\sqrt{\widehat{\operatorname{Var}}(h(\hat{\bolds{\theta}}_{1}([nt])))+\widehat{\operatorname{Var}}(h(\hat{\bolds{\theta}}_{2}([nt])))}}

converges to

t\frac{\sqrt{n}(h(\bolds{\theta}_{1})-h(\bolds{\theta}_{2}))}{\sqrt{v_{1}(\bolds{\rho},\bolds{\theta})+v_{2}(\bolds{\rho},\bolds{\theta})}}=\sqrt{n}\mu t \qquad (18)

in probability. Therefore, under H1H_{1}, by Slutsky’s theorem, BtnμtB_{t}^{*}-\sqrt{n}\mu t converges to a standard Brownian motion asymptotically.
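The drift $\sqrt{n}\mu t$ under $H_{1}$ can likewise be illustrated numerically. The following sketch is again only an illustration under assumed special-case conditions (normal arms $N(\delta,1)$ versus $N(0,1)$, equal non-adaptive allocation, $h$ the identity), in which case $\mu=\delta/(2\sigma)$; subtracting $\sqrt{n}\mu t$ from $B_{t}^{*}$ should leave a mean near zero at every information fraction.

```python
import numpy as np

# Illustrative check of the drift sqrt(n)*mu*t under H1.
# Assumptions (not from the paper): arms N(delta, 1) vs N(0, 1),
# equal non-adaptive allocation, h = identity, so mu = delta / (2 * sigma).
rng = np.random.default_rng(1)
n_total, reps = 800, 5000
delta, sigma = 0.1, 1.0
mu = delta / (2 * sigma)                  # (h(theta1) - h(theta2)) / sqrt(v1 + v2)

m = n_total // 2                          # per-arm size at full information
x1 = rng.normal(delta, sigma, (reps, m))  # treatment arm
x2 = rng.normal(0.0, sigma, (reps, m))    # control arm

residuals = {}
for t in (0.25, 0.5, 1.0):
    k = int(m * t)
    z = (x1[:, :k].mean(axis=1) - x2[:, :k].mean(axis=1)) / np.sqrt(2 * sigma**2 / k)
    b = np.sqrt(t) * z                    # B*_t under H1
    residuals[t] = b.mean() - np.sqrt(n_total) * mu * t
print(residuals)                          # each residual ≈ 0
```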

Acknowledgments

Special thanks go to the anonymous referees, the Associate Editor and the Editor for their constructive comments, which led to a much improved version of the paper.

References

  • (1) Andersen, J. (1996). Clinical trials designs—made to order. J. Biopharm. Statist. 6 515–522.
  • (2) Andersen, K. M. (2007). Optimal spending functions for asymmetric group sequential designs. Biom. J. 49 337–345. \MR2380516
  • (3) Armitage, P. (1957). Restricted sequential procedures. Biometrika 44 9–26. \MR0085685
  • (4) Armitage, P. (1975). Sequential Medical Trials. Blackwell, Oxford. \MR0370997
  • (5) Athreya, K. B. and Karlin, S. (1968). Embedding of urn schemes into continuous time Markov branching processes and related limit theorems. Ann. Math. Statist. 39 1801–1817. \MR0232455
  • (6) Bai, Z. D. and Hu, F. (2005). Asymptotics in randomized urn models. Ann. Appl. Probab. 15 914–940. \MR2114994
  • (7) Berry, D. A. (2005). Introduction to Bayesian methods III: Use and interpretation of Bayesian tools in design and analysis. Clinical Trials 2 295–300.
  • (8) Coad, D. S. and Rosenberger, W. F. (1999). A comparison of the randomized play-the-winner rule and the triangular test for clinical trials with binary responses. Stat. Med. 18 761–769.
  • (9) Cheng, Y. and Shen, Y. (2005). Bayesian adaptive designs for clinical trials. Biometrika 92 633–646. \MR2202651
  • (10) Connor, E. M., Sperling, R. S., Gelber, R., Kiselev, P., Scott, G., O’Sullivan, M. J., VanDyke, R., Bey, M., Shearer, W., Jacobson, R. L., Jimenez, E., O’Neill, E., Bazin, B., Delfraissy, J. F., Culname, M., Coombs, R., Elkins, M., Moye, J., Stratton, P. and Balsley, J. (1994). Reduction of maternal-infant transmission of human immunodeficiency virus type I with zidovudine treatment. The New England Journal of Medicine 331 1173–1180.
  • (11) DeMets, D. L. (2006). Futility approaches to interim monitoring by data monitoring committees. Clinical Trials 3 522–529.
  • (12) Eisele, J. and Woodroofe, M. (1995). Central limit theorems for doubly adaptive biased coin designs. Ann. Statist. 23 234–254. \MR1331666
  • (13) Ethier, S. N. and Kurtz, T. G. (1986). Markov Processes: Characterization and Convergence. Wiley, New York. \MR0838085
  • (14) Gwise, T. E., Hu, J. and Hu, F. (2008). Optimal biased coins for two-arm clinical trials. Stat. Interface 1 125–135. \MR2425350
  • (15) Hayre, L. S. (1979). Two-population sequential tests with three hypotheses. Biometrika 66 465–474. \MR0556733
  • (16) Hall, P. and Heyde, C. C. (1980). Martingale Limit Theory and Its Application. Academic Press, New York. \MR0624435
  • (17) Hu, F. and Rosenberger, W. F. (2003). Optimality, variability, power: Evaluating response-adaptive randomization procedures for treatment comparisons. J. Amer. Statist. Assoc. 98 671–678. \MR2011680
  • (18) Hu, F. and Rosenberger, W. F. (2006). The Theory of Response-Adaptive Randomization in Clinical Trials. Wiley, New York. \MR2245329
  • (19) Hu, F., Rosenberger, W. F. and Zhang, L. (2006). Asymptotically best response-adaptive randomization procedures. J. Statist. Plann. Inference 136 1911–1922. \MR2255603
  • (20) Hu, F. and Zhang, L. X. (2004). Asymptotic properties of doubly adaptive biased coin designs for multi-treatment clinical trials. Ann. Statist. 32 268–301. \MR2051008
  • (21) Hu, F., Zhang, L. X. and He, X. (2009). Efficient randomized adaptive designs. Ann. Statist. 37 2543–2560. \MR2543702
  • (22) Ivanova, A. V. (2003). A play-the-winner type urn model with reduced variability. Metrika 58 1–13. \MR1999248
  • (23) Jennison, C. and Turnbull, B. W. (2000). Group Sequential Methods With Applications to Clinical Trials. Chapman and Hall, Boca Raton, FL. \MR1710781
  • (24) Lan, K. and DeMets, D. L. (1983). Discrete sequential boundaries for clinical trials. Biometrika 70 659–663. \MR0725380
  • (25) Lewis, R. J., Lipsky, A. M. and Berry, D. A. (2007). Bayesian decision-theoretic group sequential clinical trial design based on a quadratic loss function: A frequentist evaluation. Clinical Trials 4 5–14.
  • (26) O’Brien, P. C. and Fleming, T. R. (1979). A multiple testing procedure for clinical trials. Biometrics 35 549–556.
  • (27) Pocock, S. J. (1977). Group sequential methods in the design and analysis of clinical trials. Biometrika 64 191–199.
  • (28) Pocock, S. J. (1982). Interim analyses for randomized clinical trials: The group sequential approach. Biometrics 38 153–162.
  • (29) Proschan, M. A., Lan, K. and Wittes, J. T. (2006). Statistical Monitoring of Clinical Trials: A Unified Approach. Springer, New York.
  • (30) Robbins, H. (1952). Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58 527–535. \MR0050246
  • (31) Rosenberger, W. F. and Hu, F. (2004). Maximizing power and minimizing treatment failures in clinical trials. Clinical Trials 1 141–147.
  • (32) Rosenberger, W. F. and Lachin, J. M. (2002). Randomization in Clinical Trials: Theory and Practice. Wiley, New York. \MR1914364
  • (33) Rosenberger, W. F., Stallard, N., Ivanova, A., Harper, C. N. and Ricks, M. L. (2001). Optimal adaptive designs for binary response trials. Biometrics 57 909–913. \MR1863454
  • (34) Rout, C. C., Rocke, D. A., Levin, L., Gouws, E. and Reddy, D. (1993). A reevaluation of the role of crystalloid preload in the prevention of hypotension associated with spinal anesthesia for elective cesarean section. Anesthesiology 79 262–269.
  • (35) Tamura, R. N., Faries, D. E., Andersen, J. S. and Heiligenstein, J. H. (1994). A case study of an adaptive clinical trial in the treatment of out-patients with depressive disorder. J. Amer. Statist. Assoc. 89 768–776.
  • (36) Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25 275–294.
  • (37) Tymofyeyev, Y., Rosenberger, W. F. and Hu, F. (2007). Implementing optimal allocation in sequential binary response experiments. J. Amer. Statist. Assoc. 102 224–234. \MR2345540
  • (38) Wald, A. (1947). Sequential Analysis. Wiley, New York. \MR0020764
  • (39) Wathen, J. K. and Thall, P. F. (2008). Bayesian adaptive model selection for optimizing group sequential clinical trials. Statistics in Medicine 27 5586–5604.
  • (40) Wei, L. J. and Durham, S. (1978). The randomized play-the-winner rule in medical trials. J. Amer. Statist. Assoc. 73 840–843.
  • (41) Zelen, M. (1969). Play the winner and the controlled clinical trial. J. Amer. Statist. Assoc. 64 131–146. \MR0240938
  • (42) Zhang, L. and Rosenberger, W. F. (2006). Response-adaptive randomization for clinical trials with continuous outcomes. Biometrics 62 562–569. \MR2236838
  • (43) Zhang, L. X., Hu, F. and Cheung, S. H. (2006). Asymptotic theorems of sequential estimation-adjusted urn models. Ann. Appl. Probab. 16 340–369. \MR2209345