Bootstraps for Dynamic Panel Threshold Models

Woosik Gong University of Wisconsin-Madison. Email: wgong28@wisc.edu. This research was supported by the BK21 FOUR (Fostering Outstanding Universities for Research) funded by the Ministry of Education(MOE, Korea) and National Research Foundation of Korea(NRF). The author is also grateful to the student travel grant award by the IAAE 2022 conference. Myung Hwan Seo Seoul National University. Email: myunghseo@snu.ac.kr. This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2018S1A5A2A01033487)

Abstract

This paper develops valid bootstrap inference methods for the dynamic short panel threshold regression. We demonstrate that the standard nonparametric bootstrap is inconsistent for the first-differenced generalized method of moments (GMM) estimator. The inconsistency arises from an $n^{1/4}$ -consistent non-normal asymptotic distribution of the threshold estimator when the true parameter lies in the continuity region of the parameter space, which stems from the rank deficiency of the approximate Jacobian of the sample moment conditions on the continuity region. To address this, we propose a grid bootstrap to construct confidence intervals for the threshold and a residual bootstrap to construct confidence intervals for the coefficients. They are shown to be valid regardless of the model’s continuity. Moreover, we establish a uniform validity for the grid bootstrap. A set of Monte Carlo experiments demonstrates that the proposed bootstraps improve upon the standard nonparametric bootstrap. An empirical application to a firm investment model illustrates our methods.

KEYWORDS: Dynamic Panel Threshold; Kink; Bootstrap; Endogeneity; Identification; Rank Deficiency; Uniformity.

JEL: C12, C23, C24

1 Introduction

Threshold regression models have been widely used by empirical researchers, which have been more fruitful because of their extensions to the panel data context. Estimation and inference methods for the threshold model in non-dynamic panels were developed by Hansen, 1999b and Wang, (2015). Dynamic panel threshold models were considered by Seo and Shin, (2016), which proposes the generalized method of moments (GMM) estimation by generalizing the Arellano and Bond, (1991) dynamic panel estimator. A latent group structure in the parameters of the panel threshold model was investigated by Miao et al., 2020b .

Applications of the panel threshold models cover numerous topics in economics. The effect of debt on economic growth is a well-known example that has been analyzed using the panel threshold models, e.g., Adam and Bevan, (2005), Cecchetti et al., (2011) and Chudik et al., (2017). Another example is the threshold effect of inflation on economic growth such as the works by Khan and Senhadji, (2001), Rousseau and Wachtel, (2002), Bick, (2010), and Kremer et al., (2013). The benefit of foreign direct investment to productivity growth that depends on the regime determined by absorptive capacity is studied by Girma, (2005) using firm-level panel data.

It is common practice to make inference in threshold regression models based on an assumption about whether the model is continuous or not. Continuous threshold models that have kinks at the tipping points have received active research attention, e.g., Hansen, (2017); Kim et al., (2019) and Yang et al., (2020). In the literature, kink threshold models are analyzed for estimators that impose the continuity restriction as in Chan and Tsay, (1998), Hansen, (2017), and Zhang et al., (2017). On the other hand, unrestricted estimators are commonly used for discontinuous threshold models as in Hansen, (2000). However, Hidalgo et al., (2019) showed that the unrestricted least squares estimator possesses a different asymptotic property in the absence of discontinuity. Specifically, while the unrestricted model is not misspecified under continuity, failing to impose the restriction results in incorrect inference without proper care.

In the empirical literature, there has been mixed use of kink/discontinuous threshold models without much consideration of a possible specification error. Among the empirical examples referred to previously, Khan and Senhadji, (2001) use a continuous threshold model and impose continuity on their estimation procedure. They claim that the continuous model is desirable to prevent small changes in inflation rate from yielding different impacts around the threshold level. On the other hand, Bick, (2010) claims that the discontinuous threshold model is more appropriate for the same research question since overlooking a regime-dependent intercept can result in omitted variable bias. However, both of them do not provide econometric evidence that supports their choice of models.

For the dynamic panel threshold model, asymptotic normality of the GMM estimator is derived by Seo and Shin, (2016) under the fixed $T$ scheme. However, the asymptotic normality is valid only for the discontinuous models since it requires a full rank condition on the Jacobian of the population moment, which is violated in continuous models. Although the continuity-restricted estimator described in Kim et al., (2019) is asymptotically normal, it may be problematic since empirical researchers often do not agree about whether their threshold models should have a kink or a jump at the threshold as in Khan and Senhadji, (2001) and Bick, (2010). Therefore, we are focusing on the unrestricted GMM estimator and bootstrap inference methods which do not require any pretest on continuity or prior knowledge about continuity of true models.

We first show that when the true model is continuous, the asymptotic normality of the unrestricted GMM estimator breaks down and the convergence rate of the threshold estimator becomes $n^{1/4}$ -rate, which is slower than the standard $\sqrt{n}$ -rate. Moreover, the standard nonparametric bootstrap is inconsistent in this case because the Jacobian from the bootstrap distribution does not degenerate fast enough due to the slow convergence rate of the threshold estimator.

We propose two different bootstrap methods to obtain confidence intervals for the parameters that are consistent regardless of whether the true model is continuous or not. One is for the threshold location, and the other is for the coefficients. The two bootstrap methods achieve the consistency irrespective of the continuity of the model by adaptively setting the recentering parameter at the bootstrap for GMM introduced by Hall and Horowitz, (1996). This means that our bootstrap moment function achieves zero not at the sample estimator but at the parameter values that we propose. In the bootstrap for the threshold location, we employ a grid bootstrap to fix the recentering parameter. The grid bootstrap was originally proposed by Hansen, 1999a for inference on an autoregressive parameter and applies the test inversion principle. In case of the bootstrap for the coefficients, the recentering parameter is set to adjust the unrestricted estimator by a data driven criterion on the model’s continuity. We also introduce a bootstrap test of model continuity.

Furthermore, we establish the uniform validity of the grid bootstrap for the unknown continuity (or discontinuity) of the threshold model. The importance of uniform validity is well recognized in the literature, notably in the works of Mikusheva, (2007), Andrews and Guggenberger, (2009), and Romano and Shaikh, (2012), among others, who have studied the uniformity of resampling procedures. In particular, Mikusheva, (2007) showed the uniform validity of the grid bootstrap for linear autoregressive models. Our work extends this advantage of the grid bootstrap to a different class of nonstandard inference problems involving continuity of the model.

A set of Monte Carlo simulations demonstrates that the grid bootstrap performs favorably for inference on the threshold location, not only when the model is continuous but also when it includes a jump for various jump sizes. However, inference on the coefficients turns out to be more challenging. Bootstrap confidence intervals for the coefficient, based on percentiles of bootstrap distributions, tend to exhibit severe undercoverage. Nevertheless, our residual bootstrap method improves upon the standard nonparametric bootstrap in both cases.

We apply our inference methods to the dynamic firm investment model, whose static version has been studied by Fazzari et al., (1988) or Hansen, 1999b . It takes financial constraints into account via the threshold effect to determine a firm’s investment decision.

In the literature, Dovonon and Renault, (2013) and Dovonon and Hall, (2018) also deal with the degeneracy of the Jacobian in the context of the common conditional heteroskedasticity testing problem. And a bootstrap based test for the common conditional heteroskedasticity feature was proposed by Dovonon and Goncalves, (2017). However, their works do not deal with a discontinuous criterion function and their null hypothesis of interest always induces the degeneracy of the first-order derivative. That is, they are only concerned with a hypothesis testing and do not consider the confidence intervals. So, they do not have to address the uncertainty associated with the potential degeneracy of the Jacobian.

Meanwhile, there is also a substantial body of literature on singularity-robust inference such as Andrews and Cheng, (2012, 2014) and Han and McCloskey, (2019), among many others. They are motivated by weak or non-identification problems, where models are not point identified. In contrast, we focus on the inference problem that does not involve identification failure even though the Jacobian of the moment restriction can become singular. Andrews and Guggenberger, (2019) study more general singular cases than non-identification, but their approach requires differentiability of sample moments for the subvector inference. Since our model exhibits discontinuity, the method of Andrews and Guggenberger, (2019) is not applicable.

This paper is organized as follows. Section 2 explains the dynamic panel threshold model. Section 3 presents the asymptotic distribution theories of the estimators and test statistics related to the threshold location and continuity. Section 4 proposes bootstrap methods. Section 5 reports Monte Carlo simulation results. Section 6 contains an empirical application. Section 7 concludes. The mathematical proofs and technical details are left to the Appendix.

2 Dynamic Panel Threshold Model

We consider the dynamic panel threshold model,

y_{it}=x_{it}^{\prime}\beta+(1,x_{it}^{\prime})\delta 1\{q_{it}>\gamma\}+\eta_{i}+\epsilon_{it},

(1)

where $1\leq i\leq n$ , $1\leq t\leq T$ , and $x_{it}\in\mathbb{R}^{p}$ is a regressor vector that includes $y_{i,t-1}$ and $q_{it}$ . The threshold variable $q_{it}\in\mathbb{R}$ is allowed to be endogenous and is the last element of $x_{it}$ .¹¹1Our analysis still holds if researchers have two sets of regressors $x_{1it}$ and $x_{2it}$ such that $y_{it}=x_{1it}^{\prime}\beta+(1,x_{2it}^{\prime})\delta 1\{q_{it}>\gamma\}+\eta_{i}+\epsilon_{it}$ where $q_{it}$ is an element of $x_{2it}$ . However, this paper sticks to the current form to keep the exposition simple. Then, we partition $x_{it}$ and write $x_{it}=(\xi_{it}^{\prime},q_{it})^{\prime}\in\mathbb{R}^{p}$ .

When $x_{it}$ consists of the lagged dependent variables, the model becomes the well-known self-exciting threshold autoregressive (TAR) model popularized by Chan and Tong, (1985). The static version where the lagged dependent variables are excluded from $x_{it}$ was considered by Hansen, 1999b , while the current dynamic model was studied by Seo and Shin, (2016).

The parameter $\gamma\in\Gamma$ denotes the threshold location, where $\Gamma$ is a compact set in $\mathbb{R}$ , and $\alpha=(\beta^{\prime},\delta^{\prime})^{\prime}\in A\subset\mathbb{R}^{2p+1}$ denotes the collection of coefficients. Let $\theta=(\alpha^{\prime},\gamma)=(\beta^{\prime},\delta^{\prime},\gamma)^{\prime}\in\Theta=A\bigtimes\Gamma$ denote the vector of all the parameters. The fixed effect $\eta_{i}$ is constant across time for each individual in the panel data. It is not identified but is eliminated after first-differencing for the GMM estimation. The idiosyncratic error $\epsilon_{it}$ is independent across individuals.

For the estimation, we use the GMM after the first-difference transformation

\Delta y_{it}=\Delta x_{it}^{\prime}\beta+1_{it}(\gamma)^{\prime}X_{it}\delta+\Delta\epsilon_{it},

(2)

where

X_{it}=\begin{pmatrix}(1,x_{it}^{\prime})\\ (1,x_{it-1}^{\prime})\end{pmatrix},\text{ and }1_{it}(\gamma)=\begin{pmatrix}1\{q_{it}>\gamma\}\\ -1\{q_{it-1}>\gamma\}\end{pmatrix}.

(3)

Let $z_{it}$ denote a set of instrumental variables at time $t$ such that $E[z_{it}\Delta\epsilon_{it}]$ becomes a zero vector, which may include lagged dependent variables $y_{it-2},...,y_{i1}$ and certain lagged variables of covariates $x_{it}$ and/or $q_{it}$ , depending on the assumptions regarding exogeneity of those variables.

Then, we can define a vector of moment functions for the GMM estimation,

g_{i}(\theta)=\begin{pmatrix}z_{it_{0}}(\Delta y_{it_{0}}-\Delta x_{it_{0}}^{\prime}\beta-1_{it_{0}}(\gamma)^{\prime}X_{it_{0}}\delta)\\ \vdots\\ z_{iT}(\Delta y_{iT}-\Delta x_{iT}^{\prime}\beta-1_{iT}(\gamma)^{\prime}X_{iT}\delta)\end{pmatrix}\in\mathbb{R}^{k},

(4)

where $k\geq dim(\theta)=2p+2$ and $t_{0}\geq 2$ is the earliest period that the regressor and instrument can be defined. For example, $k=(T-1)(T-2)/2$ when $z_{it}=(y_{it-2},...,y_{i1})^{\prime}$ and $t_{0}=3$ . Denote the population moment by $g_{0}(\theta)=E[g_{i}(\theta)]$ and the sample moment by

\bar{g}_{n}(\theta)=\frac{1}{n}\sum_{i=1}^{n}g_{i}(\theta).

We write $g_{i}$ instead of $g_{i}(\theta_{0})$ for simplicity of notations.

We consider the two-stage GMM estimation of the dynamic panel threshold model. In the first stage, we get an initial estimate by $\hat{\theta}_{(1)}=\arg\min_{\theta\in\Theta}\bar{g}_{n}(\theta)^{\prime}\bar{g}_{n}(\theta)$ to compute a weight matrix

W_{n}=\left(\frac{1}{n}\sum_{i=1}^{n}[g_{i}(\hat{\theta}_{(1)})g_{i}(\hat{\theta}_{(1)})^{\prime}]-\bar{g}_{n}(\hat{\theta}_{(1)})\bar{g}_{n}(\hat{\theta}_{(1)})^{\prime}\right)^{-1},

and obtain the second stage estimator

\hat{\theta}=\arg\min_{\theta\in\Theta}\hat{Q}_{n}(\theta),

where $\hat{Q}_{n}(\theta)=\bar{g}_{n}(\theta)^{\prime}W_{n}\bar{g}_{n}(\theta)$ . Seo and Shin, (2016) proposed averaging of a class of GMM estimators that are constructed from randomized first stage estimators. We do not pursue the averaging since our primary goal is the bootstrap inference.

In practice, the grid search algorithm is employed to compute the estimates. Note that when $\gamma$ is given, $\hat{\alpha}(\gamma)=\arg\min_{\alpha\in A}\hat{Q}_{n}(\alpha,\gamma)$ can be easily computed because the problem becomes the estimation of the linear dynamic panel model. Then, $\hat{\gamma}$ minimizes the profiled criterion $\tilde{Q}_{n}(\gamma)=\hat{Q}_{n}(\hat{\alpha}(\gamma),\gamma)$ over the grid of $\Gamma$ .

Let $\theta_{0}=(\alpha_{0}^{\prime},\gamma_{0})^{\prime}=(\beta_{0}^{\prime},\delta_{0}^{\prime},\gamma_{0})^{\prime}$ denote the true parameter value that lies in the interior of $\Theta$ . For the point identification of $\theta_{0}$ , $g_{0}(\theta)=0_{k}$ should hold if and only if $\theta=\theta_{0}$ , where $0_{k}=(0,...,0)^{\prime}\in\mathbb{R}^{k}$ . Let

M_{1i}=-\begin{bmatrix}z_{it_{0}}\Delta x_{it_{0}}^{\prime}\\ \vdots\\ z_{iT}\Delta x_{iT}^{\prime}\end{bmatrix}\in\mathbb{R}^{k\times p},\quad M_{2i}(\gamma)=-\begin{bmatrix}z_{it_{0}}1_{it_{0}}(\gamma)^{\prime}X_{it_{0}}\\ \vdots\\ z_{iT}1_{iT}(\gamma)^{\prime}X_{iT}\end{bmatrix}\in\mathbb{R}^{k\times(p+1)},

and $M_{i}(\gamma)=\left[\begin{array}[]{c;{2pt/2pt}c}M_{1i}&M_{2i}(\gamma)\end{array}\right]$ . Additionally, define $M_{0}(\gamma)=E[M_{i}(\gamma)]$ , $M_{10}=E[M_{1i}]$ , $M_{20}(\gamma)=E[M_{2i}(\gamma)]$ , $\bar{M}_{n}(\gamma)=n^{-1}\sum_{i=1}^{n}M_{i}(\gamma)$ , $\bar{M}_{1n}=n^{-1}\sum_{i=1}^{n}M_{1i}$ , and $\bar{M}_{2n}(\gamma)=n^{-1}\sum_{i=1}^{n}M_{2i}(\gamma)$ . We write $M_{0}$ , $M_{20}$ and $\bar{M}_{n}$ instead of $M_{0}(\gamma_{0})$ , $M_{20}(\gamma_{0})$ and $\bar{M}_{n}(\gamma_{0})$ , respectively, for simplicity of notation. The identification condition is stated in Theorem 1 that follows.

Theorem 1.

Let the following two conditions hold:

(i) The matrix $M_{0}$ is of full column rank.

(ii) For any $\gamma\neq\gamma_{0}$ , $M_{20}\delta_{0}$ is not in the column space of $M_{20}(\gamma)$ .
Then, $\theta_{0}$ is a unique solution to $g_{0}(\theta)=0_{k}$ .

Theorem 1 (i) is the identification condition for the coefficients once the true threshold location is identified. This means that instruments should be relevant to the first-differenced regressors appearing in $\eqref{eq:fd}$ when $\gamma=\gamma_{0}$ .

Theorem 1 (ii) is for the identification of the threshold location, which excludes the possibility of $\delta_{0}=0_{p+1}$ . In the standard GMM problem, it is usually assumed that the Jacobian of $g_{0}(\theta)$ at $\theta_{0}$ is of full column rank for both the point identification and the asymptotic normality of the GMM estimator. The condition (ii) does not require the full rank condition on the Jacobian, which is related to the presence of a jump in the threshold model, and thus it generalizes the identification conditions in Seo and Shin, (2016). When the model is continuous and has a kink at the threshold location, the last column of the Jacobian matrix, which is the first-order derivative with respect to $\gamma$ at the true parameter, becomes a zero vector. This degeneracy does not violate the condition (ii), but it fails the asymptotic normality of the standard GMM estimator, which relies on the linearization of $g_{0}(\theta)$ near $\theta_{0}$ as in Newey and McFadden, (1994).

To define the continuity, recall that $q_{it}$ is the last element of $x_{it}$ such that $x_{it}=(\xi_{it}^{\prime},q_{it})^{\prime}\in\mathbb{R}^{p}$ . Accordingly, partition $\delta=(\delta_{1},\delta_{2}^{\prime},\delta_{3})^{\prime}$ , where $\delta_{2}\in\mathbb{R}^{p-1}$ and $\delta_{1},\delta_{3}\in\mathbb{R}$ , and $\delta_{0}=(\delta_{10},\delta_{20}^{\prime},\delta_{30})^{\prime}$ . Hence, $\delta_{3}$ is the change in the coefficient of the threshold variable when the threshold variable surpasses the tipping point. Likewise, $\delta_{2}$ and $\delta_{1}$ are the changes in the coefficients for the other regressors, $\xi_{it}$ , and the intercept, respectively. The continuity of the dynamic panel threshold model is formally given in Definition 1.

Definition 1.

Let $\delta\neq 0_{p+1}$ . A dynamic panel threshold model is continuous with respect to the threshold variable if $\theta\in\Theta_{c}=\{\theta\in\Theta:\delta\neq 0_{p+1},\delta_{2}=0_{p-1}$ and $\delta_{1}+\delta_{3}\gamma=0\}$ . Otherwise, it is discontinuous at the threshold location.

Note that this definition of continuity requires that $\delta_{3}\neq 0$ ; otherwise, $\delta=0_{p+1}$ .

The rank of the first-order derivative matrix, say $D_{1}$ , of $g_{0}(\theta)$ at $\theta=\theta_{0}$ is crucial to the standard asymptotic normality of the GMM estimator. Let $G$ denote the first-order derivative of $g_{0}(\theta)$ with respect to $\gamma$ at $\theta=\theta_{0}$ . Then,

G=\underbrace{\begin{bmatrix}E_{t_{0}}[z_{it_{0}}(1,x_{it_{0}}^{\prime})|\gamma_{0}]f_{t_{0}}(\gamma_{0})-E_{t_{0}-1}[z_{it_{0}}(1,x_{it_{0}-1}^{\prime})|\gamma_{0}]f_{t_{0}-1}(\gamma_{0})\\ \vdots\\ E_{T}[z_{iT}(1,x_{iT}^{\prime})|\gamma_{0}]f_{T}(\gamma_{0})-E_{T-1}[z_{iT}(1,x_{iT-1}^{\prime})|\gamma_{0}]f_{T-1}(\gamma_{0})\end{bmatrix}}_{\textstyle G_{0}}\times\delta_{0}\in\mathbb{R}^{k},

(5)

where the conditional expectation $E_{t}[\cdot|q]=E[\cdot|q_{it}=q]$ and the density function $f_{t}(\cdot)$ of $q_{it}$ are assumed to exist. The derivation of $G$ is provided in the proof of Lemma D.1. Note that the first-order derivative of $g_{0}(\theta)$ with respect to $\alpha$ at $\theta=\theta_{0}$ is $M_{0}$ . The linear independence of $G$ from the other columns in $D_{1}$ is required for the standard linear approximation

g_{0}(\theta)\approx D_{1}(\theta-\theta_{0})=M_{0}(\alpha-\alpha_{0})+G(\gamma-\gamma_{0}).

Recall that the vector $G$ can be written as the product of the matrix $G_{0}$ and the vector $\delta_{0}$ , (5), and the first and last columns of $G_{0}$ are linearly dependent since $q_{it}=\gamma_{0}$ for all $t$ due to the conditioning. Then, the standard rank condition on the first derivative matrix $D_{1}$ can follow from a more primitive rank condition on $\left[\begin{array}[]{c;{2pt/2pt}c}M_{0}&G_{0,-(p+1)}\end{array}\right]$ , that is, the linear independence of all the columns in $M_{0}$ and all but the last column of $G_{0}$ , for the discontinuous case. Even if the primitive condition is met, however, the continuity restriction makes $G=0_{k}$ since $E_{s}[z_{it}(1,x_{is}^{\prime})\delta_{0}|\gamma_{0}]=(\delta_{10}+\delta_{30}\gamma_{0})E_{s}[z_{it}|\gamma_{0}]=0$ for $s\leq t$ , which leads to degeneracy of $D_{1}$ .

When the rank condition fails due to the continuity, the expansion becomes

g_{0}(\theta)\approx M_{0}(\alpha-\alpha_{0})+H(\gamma-\gamma_{0})^{2},

where

H=\frac{\partial^{2}g_{0}(\theta_{0})}{2\partial\gamma\partial\gamma}=\frac{\delta_{30}}{2}\begin{pmatrix}E_{t_{0}}[z_{it_{0}}|\gamma_{0}]f_{t_{0}}(\gamma_{0})-E_{t_{0}-1}[z_{it_{0}}|\gamma_{0}]f_{t_{0}-1}(\gamma_{0})\\ \vdots\\ E_{T}[z_{iT}|\gamma_{0}]f_{T}(\gamma_{0})-E_{T-1}[z_{iT}|\gamma_{0}]f_{T-1}(\gamma_{0})\end{pmatrix}\in\mathbb{R}^{k}.

(6)

The detailed derivation is given in the proof of Lemma D.1. It is worth noting that $H$ is identical to the first column of $G_{0}$ up to a constant multiple. Then, the rank condition on $\left[\begin{array}[]{c;{2pt/2pt}c}M_{0}&H\end{array}\right]$ is implied by the rank condition on $\left[\begin{array}[]{c;{2pt/2pt}c}M_{0}&G_{0,-(p+1)}\end{array}\right]$ . Thus, the rank condition on $\left[\begin{array}[]{c;{2pt/2pt}c}M_{0}&G_{0,-(p+1)}\end{array}\right]$ can be viewed as a sufficient condition for both Assumptions LK and LJ in the next section, apart from the continuity restriction on $\theta$ . Next section formalizes this discussion and presents the asymptotic distribution of the GMM estimator $\hat{\theta}$ under the continuity.

3 Asymptotic theory

This section considers the asymptotic analysis when $T$ is fixed, the data are independent and identically distributed across $i$ , and $n\rightarrow\infty$ . Specifically, the data for each individual $i$ is determined by the realization of $\{(z_{it},x_{it},\epsilon_{it})_{t=1}^{T},y_{i0},\eta_{i}\}$ , where $y_{i0}$ denotes the initial value. We make the following assumptions.

Assumption G.

The parameter space $\Theta$ is compact and $\theta_{0}\in\text{int }\Theta$ . $M_{0}$ is of full column rank, and $M_{20}\delta_{0}$ is not in the column space of $M_{20}(\gamma)$ for any $\gamma\neq\gamma_{0}$ . $\Omega=E[g_{i}g_{i}^{\prime}]$ is positive definite. $E\|z_{it}\|^{4}$ , $E\|x_{it}\|^{4}$ , and $E\epsilon_{it}^{4}$ are finite for all $t$ .

Assumption D.

For all $t$ , (i) $q_{it}$ has a continuous distribution and a bounded density $f_{t}(\cdot)$ , which is continuously differentiable at $\gamma_{0}$ and $f_{t}(\gamma_{0})>0$ . (ii) $E_{t}[z_{it}(1,x_{it}^{\prime})|q]$ and $E_{t-1}[z_{it}(1,x_{it-1}^{\prime})|q]$ are continuous on $q\in\Gamma$ and continuously differentiable at $q=\gamma_{0}$ .

Assumption LK.

$D_{2}=\left[\begin{array}[]{c;{2pt/2pt}c}M_{0}&H\end{array}\right]\in\mathbb{R}^{k\times(2p+2)}$ has full column rank.

Assumptions G and D are similar to Assumptions 1 and 2 in Seo and Shin, (2016) except for the differentiability conditions in D which allow the second-order derivative of the population moment to be defined. Since the regressors include lagged dependent variables, G requires the individual fixed effects and initial values to have finite fourth moments, too. The assumption also includes the conditions in Theorem 1. LK is a rank condition for a nondegenerate asymptotic distribution when the underlying model is continuous. This condition may be viewed as less restrictive than the standard rank assumption as discussed in the preceding section where $G$ and $H$ are defined. For easy reference, we restate the standard full rank assumption for the asymptotic normality of the GMM estimator for the discontinuous threshold regression below.

Assumption LJ.

$D_{1}=\left[\begin{array}[]{c;{2pt/2pt}c}M_{0}&G\end{array}\right]\in\mathbb{R}^{k\times(2p+2)}$ has full column rank.

In a simple model, where $y_{it}=x_{it}^{\prime}\beta+(\delta_{1}+\delta_{3}q_{it})1\{q_{it}>\gamma\}+\eta_{i}+\epsilon_{it}$ , LK is equivalent to LJ because $G=(\delta_{10}+\delta_{30}\gamma_{0})G_{01}$ while $H={\delta_{30}}G_{01}/2$ , where $G_{01}$ is the first column of $G_{0}$ in (5).

Theorem 2 below establishes the asymptotic distribution of the GMM estimator when the dynamic panel threshold model is continuous.

Theorem 2.

When the true model is continuous and Assumptions G, D, and LK hold,

\begin{pmatrix}\sqrt{n}(\hat{\alpha}-\alpha_{0})\\ \sqrt{n}(\hat{\gamma}-\gamma_{0})^{2}\end{pmatrix}\xrightarrow{d}\begin{pmatrix}U-(M_{0}^{\prime}\Omega^{-1}M_{0})^{-1}M_{0}^{\prime}\Omega^{-1}HV\\ V\end{pmatrix},

where $U\sim N(0,(M_{0}^{\prime}\Omega^{-1}M_{0})^{-1})$ and $V\sim\max\{0,N(0,(H^{\prime}\Xi H)^{-1})\}$ are independent of each other, while $\Xi=\Omega^{-1}-\Omega^{-1}M_{0}(M_{0}^{\prime}\Omega^{-1}M_{0})^{-1}M_{0}^{\prime}\Omega^{-1}$ .

We observe that the convergence rate of $\hat{\gamma}$ is $n^{1/4}$ , which is slower than the standard $\sqrt{n}$ -rate. Meanwhile, Seo and Shin, (2016) show the $\sqrt{n}$ -convergence rate for $\hat{\gamma}$ when the model is discontinuous. Intuitively, it would be more difficult to detect the precise threshold location when there is a kink than when there is a jump at the tipping point. More technically, when the threshold model is discontinuous and the Jacobian is not singular, the limit of the GMM objective function admits a quadratic approximation with respect to $\gamma$ at the true value, while the limit admits a quartic approximation for the continuous model. Hence, the limit objective function becomes flatter in $\gamma$ at the true value resulting in the slower convergence rate. Hidalgo et al., (2019) also showed in the least squares context that when the model is continuous, the convergence rate of the threshold estimator slows down to $n^{1/3}$ , while it is superconsistent $n$ -rate when the model is discontinuous.

Moreover, we can observe that the asymptotic distribution of $\hat{\alpha}$ is also shifting to a non-normal distribution. Hence, standard inference methods based on the asymptotic normality become invalid for the continuous dynamic panel threshold model.

The asymptotic distribution of the GMM estimator is identical to the distribution reported in Theorem 1 (b) in Dovonon and Hall, (2018), which studies a smooth GMM problem with the degeneracy of the Jacobian. Theorem 2 shows that even though the criterion of our threshold model is discontinuous with respect to the parameter $\gamma$ , the same asymptotic distribution as that of Dovonon and Hall, (2018) appears.

The censored normal distribution also appears in Andrews, (2002) which studies the estimation of a parameter on a boundary. Heuristically, because our analysis depends on the second-order derivative of $\gamma$ for the local polynomial expansion of $g_{0}(\theta)$ near $\theta_{0}$ , only the asymptotic distribution of $(\hat{\gamma}-\gamma_{0})^{2}$ can be derived. Since $(\hat{\gamma}-\gamma_{0})^{2}$ should be nonnegative, the asymptotic censored normal distribution appears as in Andrews, (2002). Meanwhile, Dovonon and Goncalves, (2017) show that the standard nonparametric bootstrap becomes invalid when the Jacobian degenerates. To address this issue, we propose different bootstrap methods in Section 4 for the inference of the parameters.

The asymptotic distribution in Theorem 2 can be used for parameter inference when the true model is continuous, but the estimator is obtained without imposing the continuity restriction. As discussed in Seo and Shin, (2016), $M_{0}$ and $\Omega$ can be consistently estimated, while $H$ can be nonparametrically estimated similarly to $G$ . It is then straightforward to simulate the limit distribution from Theorem 2 by generating random numbers for $U$ and $V$ . However, there are several drawbacks to that approach, and hence we do not recommend it. First, empirical researchers might construct confidence intervals based on Theorem 2 when they cannot reject the continuity. However, Leeb and Pötscher, (2005) show that confidence intervals after model selection are subject to size-distortion. Second, even if the true model is known to be continuous, the continuity-restricted estimator explained in Kim et al., (2019) is more efficient and asymptotically normal. Therefore, using the continuity-restricted estimator for estimation and inference is preferable. Finally, the nonparametric estimation of $H$ requires a tuning parameter and has a slower convergence rate.

Seo and Shin, (2016) derived the asymptotic distribution of the GMM estimator and propose an inference method when the underlying model is discontinuous. When the true model is discontinuous and Assumptions G, D, and LJ hold,

\begin{pmatrix}\sqrt{n}(\hat{\alpha}-\alpha_{0})\\ \sqrt{n}(\hat{\gamma}-\gamma_{0})\end{pmatrix}\xrightarrow{d}N(0,(D_{1}^{\prime}\Omega^{-1}D_{1})^{-1}).

$\Omega$ can be estimated by $\hat{\Omega}=\frac{1}{n}\sum_{i=1}^{n}[g_{i}(\hat{\theta})g_{i}(\hat{\theta})^{\prime}]-\bar{g}_{n}(\hat{\theta})\bar{g}_{n}(\hat{\theta})^{\prime}$ . Note that $D_{1}=\left[\begin{array}[]{c;{2pt/2pt}c}M_{0}&G\end{array}\right]$ , and $M_{0}$ can be estimated by $\bar{M}_{n}(\hat{\gamma})$ , while the estimation of $G$ involves nonparametric estimation of the conditional means and densities. See section 4 of Seo and Shin, (2016) for more details. Note that $(\hat{D}_{1}\hat{\Omega}^{-1}\hat{D}_{1})^{-1}$ diverges when the model is continuous since the last column of $\hat{D}_{1}$ converges to a zero vector when it is consistent. This paper does not analyze the issue and leaves it for future research.

3.1 Testing for threshold value

Since the asymptotic distribution of the threshold estimator is not standard, we consider the GMM distance test introduced by Newey and West, (1987) for a hypothesis on the location of the threshold. Let the test statistic for the threshold location at $\gamma$ be

\mathcal{D}_{n}(\gamma)=n(\min_{\alpha\in A}\hat{Q}_{n}(\alpha,\gamma)-\hat{Q}_{n}(\hat{\theta})),

and let $\chi^{2}_{1}$ denote the chi-square distribution with 1 degree of freedom.

Theorem 3.

(i) If $\gamma=\gamma_{0}$ , the true model is continuous, and Assumptions G, D, and LK hold, then

\mathcal{D}_{n}(\gamma)\xrightarrow{d}Z_{0}^{2}

where $Z_{0}=\max(0,Z_{0}^{*})$ , $Z_{0}^{*}\sim N(0,1)$ .

(ii) If $\gamma=\gamma_{0}$ , the true model is discontinuous, and Assumptions G, D, and LJ hold, then

\mathcal{D}_{n}(\gamma)\xrightarrow{d}\chi^{2}_{1}.

(iii) If $\gamma\neq\gamma_{0}$ , then for any $M<\infty$ , $\lim_{n\rightarrow\infty}P(\mathcal{D}_{n}(\gamma)<M)=0$ .

Theorem 3 (i) presents the asymptotic distribution of the distance statistic under the continuity. Due to the censoring, the asymptotic distribution becomes a mixture of the $\chi^{2}_{1}$ distribution with weight 1/2 and zero with weight 1/2. This type of distribution also arises in the context of testing parameters on a boundary; see e.g., Andrews, (2001).

Meanwhile, the chi-square limit in Theorem 3 (ii) extends Newey and West, (1987) for a discontinuous moment function. Seo and Shin, (2016) did not study the distance statistic.

Theorem 3 (iii) shows that the GMM distance test for the threshold location is consistent. It also serves as the consistency of a bootstrap test together with Theorem 5 since the bootstrap statistic is stochastically bounded whether or not the threshold location is true.

Since the limit distribution depends on the continuity of the model, we introduce a bootstrap in Section 4.1, which is valid regardless of the model continuity. Furthermore, Appendix I establishes the uniform validity of the bootstrap inference for the threshold location under some simplifying assumptions.

3.2 Testing continuity

We propose a test for the continuity of the threshold model, similar to the approach used by Gonzalo and Wolf, (2005) or Hidalgo et al., (2023) in the threshold regression literature. While empirical researchers may employ the test to select a model, we utilize the test to modify the standard nonparametric bootstrap to make the bootstrap valid irrespective of the model continuity. Details of the use of the continuity test statistic in the bootstrap method are explained in Section 4.2.

The continuity hypothesis is a joint hypothesis. We employ the GMM distance test. Let $\tilde{\theta}=\arg\min_{\theta\in\Theta_{c}}\hat{Q}_{n}(\theta)$ be the continuity-restricted estimator. The GMM distance test statistic is

\mathcal{T}_{n}=n(\hat{Q}_{n}(\tilde{\theta})-\hat{Q}_{n}(\hat{\theta})).

Theorem 4.

(i) When the true model is continuous and Assumptions G, D, and LK hold,

\mathcal{T}_{n}\xrightarrow{d}V_{1}-V_{2}+V_{3},

where $V_{1}\equiv Z^{\prime}\Psi M_{20}(M_{20}^{\prime}\Psi M_{20})^{-1}M_{20}^{\prime}\Psi Z$ , $V_{2}\equiv Z^{\prime}\Psi N_{20}(N_{20}^{\prime}\Psi N_{20})^{-1}N_{20}^{\prime}\Psi Z$ , $V_{3}\equiv Z_{0}^{2}$ , $Z\sim N(0,\Omega)$ , $Z_{0}=\max(0,Z_{0}^{*})$ , $Z_{0}^{*}\sim N(0,1)$ , $Z_{0}$ and $Z$ are independent, $\Psi=\Omega^{-1}-\Omega^{-1}M_{10}(M_{10}^{\prime}\Omega^{-1}M_{10})^{-1}M_{10}^{\prime}\Omega^{-1}$ , and $N_{20}=M_{20}\scalebox{0.8}{$\left(\begin{array}[]{ccc}-\gamma_{0}&0_{p-1}^{\prime}&1\\ -\delta_{30}&0_{p-1}^{\prime}&0\end{array}\right)^{\prime}$}$

(ii) If the model is discontinuous, then $\lim_{n\rightarrow\infty}P(n^{-m}\mathcal{T}_{n}<M)=0$ for any $m\in[0,1)$ and $M<\infty$ .

While the limit distribution in Theorem 4 (i) is non-standard, it can be simulated to obtain critical values for the test using consistent plug-in sample analogue estimators, e.g., $\hat{\Omega}=\frac{1}{n}\sum_{i=1}^{n}[g_{i}(\hat{\theta})g_{i}(\hat{\theta})^{\prime}]-\bar{g}_{n}(\hat{\theta})\bar{g}_{n}(\hat{\theta})^{\prime}$ , $\hat{M}_{1}=\bar{M}_{1n}$ , $\hat{M}_{2}=\bar{M}_{2n}(\hat{\gamma})$ , etc. Another way to obtain the critical values is via a bootstrap method, which will be introduced in Section 4.3.

Theorem 4 (ii) shows that the continuity test is consistent. It also implies the consistency of the bootstrap test together with Theorem 7, which shows that the bootstrap test statistic is stochastically bounded even when the true model is not continuous. The divergence rate of $\mathcal{T}_{n}$ , which is faster than $n^{m}$ for any $0\leq m<1$ , is exploited to modify the standard nonparametric bootstrap for the coefficients as detailed in Section 4.2.

4 Bootstrap

As usual, the superscript “*” denotes the bootstrap quantities or the convergence of bootstrap statistics under the bootstrap probability law conditional on the original sample. For example, $E^{*}$ denotes the expectation with respect to the bootstrap probability law conditional on the data. “ $\xrightarrow{d^{*}}$ , in $P$ ” denotes the distributional convergence of bootstrap statistics under the bootstrap probability law with probability approaching one. We write “ $\nu_{n}^{*}=O_{p}^{*}(1)$ , in $P$ ” if a sequence $\nu_{n}^{*}$ is stochastically bounded under the bootstrap probability law with probability approaching one. More details are written in Section B.1. Let $\widehat{F}^{*-1}_{n}(\varphi;S^{*})$ denote the empirical $\varphi$ quantile of a bootstrap statistic $S^{*}$ .

This section introduces three different bootstrap schemes. The first bootstrap is for constructing bootstrap confidence interval(CI)s for the threshold, while the second bootstrap is for constructing bootstrap CIs for the coefficients. Both methods aim to provide valid inferences, regardless of whether the model is continuous or not. The third bootstrap is for testing continuity of the threshold model. The three bootstrap methods can be represented by means of Algorithm 1 with suitable choices of $\theta_{0}^{*}=(\beta_{0}^{*\prime},\delta_{0}^{*\prime},\gamma_{0}^{*})^{\prime}$ .

Algorithm 1 Bootstrap with

\theta_{0}^{*}

1:For

i=1,...,n

, let

i^{*}

be the

i

th i.i.d. random draw from the discrete uniform distribution on

\{1,...,n\}

. Generate a bootstrap sample

\{(x_{it}^{*},x_{it-1}^{*},z_{it}^{*},\widehat{\Delta\epsilon}_{it}^{*})_{t=t_{0}}^{T}:i=1,...,n\}

by setting

(x_{it}^{*},x_{it-1}^{*},z_{it}^{*},\widehat{\Delta\epsilon}_{it}^{*})_{t=t_{0}}^{T}=(x_{i^{*}t},x_{i^{*}t-1},z_{i^{*}t},\widehat{\Delta\epsilon}_{i^{*}t})_{t=t_{0}}^{T}

for each

i

2:Generate

\{(\Delta y_{it}^{*})_{t=t_{0}}^{T}:i=1,...,n\}

using

\theta_{0}^{*}

\Delta y_{it}^{*}=\Delta x_{it}^{*\prime}\beta_{0}^{*}+1_{it}^{*}(\gamma_{0}^{*})^{\prime}X_{it}^{*}\delta_{0}^{*}+\widehat{\Delta\epsilon}_{it}^{*},

where

\Delta x_{it}^{*}=x_{it}^{*}-x_{it-1}^{*}

X_{it}^{*}=\begin{pmatrix}(1,x_{it}^{*\prime})\\ (1,x_{it-1}^{*\prime})\end{pmatrix},\text{ and }1_{it}^{*}(\gamma)=\begin{pmatrix}1\{q_{it}^{*}>\gamma\}\\ -1\{q_{it-1}^{*}>\gamma\}\end{pmatrix}.

3:Define the bootstrap moment function

g_{i}^{*}(\theta)=(g_{it_{0}}^{*}(\theta)^{\prime},...,g_{iT}^{*}(\theta)^{\prime})^{\prime}

where

g_{it}^{*}(\theta)=z_{it}^{*}(\Delta y_{it}^{*}-\Delta x_{it}^{*\prime}\beta-1_{it}^{*}(\gamma)^{\prime}X_{it}^{*}\delta)

4:Define the (recentered) bootstrap sample moment

\bar{g}_{n}^{*}(\theta)=\tfrac{1}{n}{\textstyle\sum}_{i=1}^{n}(g_{i}^{*}(\theta)-\bar{g}_{n}(\hat{\theta})).

5:Compute the initial estimator

\hat{\theta}_{(1)}^{*}=\arg\min_{\theta}\bar{g}_{n}^{*}(\theta)^{\prime}\bar{g}_{n}^{*}(\theta)

and the weight matrix

W_{n}^{*}=(\frac{1}{n}\sum_{i=1}^{n}g_{i}^{*}(\hat{\theta}_{(1)}^{*})g_{i}^{*}(\hat{\theta}_{(1)}^{*})^{\prime}-[\frac{1}{n}\sum_{i=1}^{n}g_{i}^{*}(\hat{\theta}_{(1)}^{*})][\frac{1}{n}\sum_{i=1}^{n}g_{i}^{*}(\hat{\theta}_{(1)}^{*})]^{\prime})^{-1}

6:Define the bootstrap criterion function

\hat{Q}_{n}^{*}(\theta)=\bar{g}_{n}^{*}(\theta)^{\prime}W_{n}^{*}\bar{g}_{n}^{*}(\theta)

, and obtain the bootstrap estimator or the test statistics.

In step 1, we resample the regressors, the instruments, and the residuals jointly to maintain the dependence among them, unlike in the usual residual bootstrap. See e.g., Giannerini et al., (2024) for the description of the standard residual bootstrap, which resamples the residuals only, and the wild bootstrap for the testing of linearity in the threshold regression. There could be other ways of resampling not mentioned here and we do not attempt to decide which is the best here.

The parameter $\theta_{0}^{*}$ is used in step 2 of Algorithm 1 to generate the dependent variables in the bootstrap samples. In step 4, recentering of the bootstrap sample moment is done by subtracting $\bar{g}_{n}(\hat{\theta})=(\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}}^{\prime}\widehat{\Delta\epsilon}_{it_{0}},...,\frac{1}{n}\sum_{i=1}^{n}z_{iT}^{\prime}\widehat{\Delta\epsilon}_{iT})^{\prime}$ . Note that the expectation of $\bar{g}_{n}^{*}(\theta)$ by the bootstrap probability law conditional on the data becomes zero when $\theta=\theta_{0}^{*}$ due to the recentering, which can be easily checked from the following equations: $g_{it}^{*}(\theta_{0}^{*})=z_{it}^{*}(\Delta y_{it}^{*}-\Delta x_{it}^{*}\beta_{0}^{*}-1_{it}(\gamma_{0}^{*})^{\prime}X_{it}^{*}\delta_{0}^{*})=z_{it}^{*}\widehat{\Delta\epsilon}_{it}^{*}$ and $E^{*}[g_{it}^{*}(\theta_{0}^{*})]=n^{-1}\sum_{i=1}^{n}z_{it}\widehat{\Delta\epsilon}_{it}$ for $t=t_{0},...,T$ .

A different choice of $\theta_{0}^{*}$ leads to a different bootstrap. For example, if $\theta_{0}^{*}=\hat{\theta}$ , then the bootstrap becomes the standard nonparametric bootstrap in Hall and Horowitz, (1996) because $\Delta y_{it}^{*}=\Delta y_{i^{*}t}$ holds true for $i=1,...,n$ and $t=t_{0},...,T$ in step 2. Note that, for $\theta_{0}^{*}$ not equal to $\hat{\theta}$ , step 2 of Algorithm 1 generates $\Delta y_{it}^{*}$ ’s that are generally different from $\Delta y_{i^{*}t}$ ’s. The following subsections detail three different choices of $\theta_{0}^{*}$ for three different inference problems.

4.1 Grid bootstrap for threshold location

To construct CIs for the threshold location, we propose to employ the grid bootstrap method introduced by Hansen, 1999a for autoregressive models. Let $\Gamma_{n}=\{\gamma_{\ell}\in\Gamma:\ell=1,...,L\}$ be a grid of the candidate thresholds. The grid bootstrap constructs the confidence set by inverting the bootstrap threshold location tests over $\Gamma_{n}$ . Specifically, a sequence of hypothesis tests for the hypothesized threshold locations in $\Gamma_{n}$ are performed by the bootstrap that imposes the null to generate bootstrap samples.

The null imposed bootstrap at a point $\gamma_{\ell}\in\Gamma_{n}$ can be implemented by setting $\theta_{0}^{*}=(\hat{\alpha}(\gamma_{\ell})^{\prime},\gamma_{\ell})^{\prime}$ in Algorithm 1, and the bootstrap test statistic is

\mathcal{D}_{n}^{*}(\gamma_{\ell})=n(\min_{\alpha\in A}\hat{Q}_{n}^{*}(\alpha,\gamma_{\ell})-\min_{\theta\in\Theta}\hat{Q}_{n}^{*}(\theta)).

The null hypothesis $\mathcal{H}_{0}:\gamma=\gamma_{\ell}$ is rejected at size $\tau$ if $\mathcal{D}_{n}(\gamma_{\ell})>\widehat{F}^{*-1}_{n}(1-\tau;\mathcal{D}_{n}^{*}(\gamma_{\ell}))$ . Consequently, after running the null imposed bootstrap for each point in $\Gamma_{n}$ , we can construct the $100(1-\tau)$ % confidence set of $\gamma$ by

CI_{n,1-\tau}^{grid}=\{\gamma\in\Gamma_{n}:\mathcal{D}_{n}(\gamma)\leq\widehat{F}^{*-1}_{n}(1-\tau;\mathcal{D}_{n}^{*}(\gamma))\}.

(7)

Note that the confidence set is not necessarily a connected set, even though researchers can convexify the set to get a connected CI. The CI does not become an empty set because $\mathcal{D}_{n}(\hat{\gamma})=0$ while $\mathcal{D}_{n}^{*}(\hat{\gamma})\geq 0$ . The consistency of the grid bootstrap method is implied by Theorem 5 that follows.

Theorem 5.

For a given $\gamma\in\Gamma$ , assume that $\mathcal{D}_{n}^{*}(\gamma)$ is obtained by Algorithm 1 with $\theta_{0}^{*}=(\hat{\alpha}(\gamma)^{\prime},\gamma)^{\prime}$ .

(i) If $\gamma=\gamma_{0}$ , the true model is continuous, and Assumptions G, D, and LK hold, then

\mathcal{D}_{n}^{*}(\gamma)\xrightarrow{d*}Z_{0}^{2}\quad\text{ in $P$},

where $Z_{0}=\max(0,Z_{0}^{*})$ and $Z_{0}^{*}\sim N(0,1)$ .

(ii) If $\gamma=\gamma_{0}$ , the true model is discontinuous, and Assumptions G, D, and LJ hold, then

\mathcal{D}_{n}^{*}(\gamma)\xrightarrow{d*}\chi^{2}_{1}\quad\text{ in $P$}.

(iii) If $\gamma\neq\gamma_{0}$ , then $\mathcal{D}_{n}^{*}(\gamma)=O_{p}^{*}(1)$ in $P$ .

Theorem 5 (i) and (ii) show that the limit distribution of the bootstrap test statistic, conditional on the data, is identical to that of the sample test statistic regardless of the continuity of the true model. Therefore, the CI for the threshold location by the grid bootstrap, (7), achieves an exact coverage rate for both continuous and discontinuous models asymptotically. Specifically, $\lim_{n\rightarrow\infty}P(\gamma_{0}\in CI_{n,1-\tau}^{grid})=1-\tau$ for both cases (i) and (ii). Theorem 5 (iii) says that the bootstrap test statistic is still stochastically bounded, conditionally on the data, under the alternatives. As Theorem 3 (iii) shows that the sample test statistic is stochastically unbounded under the alternatives, the grid bootstrap CI has power against the alternative threshold locations.

4.1.1 Uniform validity of grid bootstrap

We extend Theorem 5 to the uniform validity of the grid bootstrap to ensure its good finite sample performance when the model is nearly continuous. We establish the uniform validity for the following simplified specification for analytical tractability:

y_{it}=x_{it}^{\prime}\beta+(\delta_{1}+\delta_{3}q_{it})1\{q_{it}>\gamma\}+\eta_{i}+\epsilon_{it},

where $\theta=(\beta^{\prime},\delta^{\prime},\gamma)^{\prime}$ and $\delta=(\delta_{1},\delta_{3})^{\prime}$ in this subsection.

This section briefly states the uniformity result of the grid bootstrap and gives heuristic justification. Our derivation follows Andrews et al., (2020). It is highly complicated and involves more technical conditions, which are stated in Appendix I.

Specifically, we establish in Theorem I.1 that

\liminf_{n\rightarrow\infty}\inf_{\phi_{0}\in\Phi_{0}}P_{\phi_{0}}(\gamma_{0}\in CI_{n,1-\tau}^{grid})=\limsup_{n\rightarrow\infty}\sup_{\phi_{0}\in\Phi_{0}}P_{\phi_{0}}(\gamma_{0}\in CI_{n,1-\tau}^{grid})=1-\tau,

where $P_{\phi}$ is the probability law when the model is specified by $\phi=(\theta,F)$ and $F$ is the distribution of $\{\eta_{i},y_{i0},(z_{it},x_{it},\epsilon_{it})_{t=1}^{T}\}$ . The collection of probabilistic models $\Phi_{0}$ includes both continuous and discontinuous threshold models. More detailed discussions of technical assumptions about $\Phi_{0}$ are given in Appendix I.

For the uniformity analysis, we need to consider drifting sequences of true parameters $\phi_{0n}=(\theta_{0n},F_{0n})$ such that $\theta_{0n}\rightarrow\theta_{0,\infty}$ and $F_{0n}\rightarrow F_{0,\infty}$ . Here, the distance between $F_{0n}$ and $F_{0,\infty}$ is induced by a specific choice of norm that is explained in Appendix I. To show the uniform validity of the grid bootstrap CI, we need to verify that the limit distribution of $\mathcal{D}_{n}^{*}(\gamma_{0n})$ conditional on the data is identical to the limit distribution of $\mathcal{D}_{n}(\gamma_{0n})$ under all the above drifting sequences of models. Our analysis finds that the limit distribution of the threshold location test statistic under the true null, i.e., the limit distribution of $\mathcal{D}_{n}(\gamma_{0n})$ , is determined by $\zeta=\lim_{n\rightarrow\infty}n^{1/4}(\delta_{10n}+\delta_{30n}\gamma_{0n})$ ; see Lemma I.1 for details. When $\zeta=0$ , the limit distribution of $\mathcal{D}_{n}(\gamma_{0n})$ is as described in Theorem 3 (i). In contrast, when $|\zeta|=\infty$ , the limit distribution is the $\chi^{2}_{1}$ -distribution as in Theorem 3 (ii). When $\zeta$ is finite and nonzero, then $\mathcal{D}_{n}(\gamma_{0n})$ has a nonstandard limit distribution that depends on $\zeta$ .

Therefore, if $\theta_{0n}^{*}$ comprises a true parameter sequence of a bootstrap scheme, then $n^{1/4}(\delta_{10n}^{*}+\delta_{30n}^{*}\gamma_{0n}^{*})$ should consistently estimate $\zeta$ for the bootstrap statistics to exhibit the same asymptotic behavior as the sample statistics.

Note that under the grid bootstrap scheme, the bootstrap test statistic $\mathcal{D}_{n}^{*}(\gamma_{0n})$ is drawn from the bootstrap that imposes the null threshold location $\gamma_{0n}$ . The true parameter of the bootstrap data generating process (dgp) is $\theta_{0n}^{*}=(\hat{\alpha}_{n}(\gamma_{0n})^{\prime},\gamma_{0n})^{\prime}$ , where $\hat{\alpha}_{n}(\gamma)=(\hat{\beta}_{n}(\gamma)^{\prime},\hat{\delta}_{1n}(\gamma),\hat{\delta}_{3n}(\gamma))^{\prime}=\arg\min_{\alpha}\hat{Q}_{n}^{*}(\alpha,\gamma)$ . The restricted estimator satisfies $\|\hat{\alpha}(\gamma_{0n})-\alpha_{0n}\|=O_{p}(n^{-1/2})$ , as the problem becomes estimating a standard linear dynamic panel model, and hence $n^{1/4}(\hat{\delta}_{1n}(\gamma_{0n})+\hat{\delta}_{3n}(\gamma_{0n})\gamma_{0n})=\zeta+o_{p}(1)$ . Therefore, $\mathcal{D}_{n}^{*}(\gamma_{0n})$ conditionally converges to the limit distribution of $\mathcal{D}_{n}(\gamma_{0n})$ , which leads to the uniform validity of the grid bootstrap confidence interval. In contrast, $\hat{\theta}$ does not satisfy this property for some $\zeta$ and the bootstrap building on $\hat{\theta}$ may not be uniformly valid.

4.2 Residual bootstrap for coefficients

The bootstrap CIs for the coefficients can be obtained by applying Algorithm 1 with $\theta_{0}^{*}$ set as

\theta_{0}^{*}=w_{n}\hat{\theta}+(1-w_{n})\tilde{\theta},\quad w_{n}=\min\left(\frac{\mathcal{T}_{n}}{\hat{C}n^{1/4}},1\right),

(8)

where $\tilde{\theta}=\arg\min_{\theta\in\Theta_{c}}\hat{Q}_{n}(\theta)$ is the continuity-restricted estimator. $\hat{C}$ is some estimated quantile, such as the $50$ th percentile, of the limit distribution of the continuity test statistic $\mathcal{T}_{n}$ when the model is continuous. $\hat{C}$ can be obtained either by methods in Section 3.2 or Section 4.3. Since $w_{n}=O_{p}(n^{-1/4})$ if the true model is continuous, and $w_{n}=1+o_{p}(1)$ if the model is discontinuous, the true parameter value for the bootstrap adapts to the model continuity.

After collecting the bootstrap estimators

\hat{\theta}^{*}=(\hat{\alpha}^{*\prime},\hat{\gamma}^{*})^{\prime}=\arg\min_{\theta\in\Theta}\hat{Q}_{n}^{*}(\theta),

we can construct the CIs for the coefficients using the percentiles of either $|\hat{\alpha}_{j}^{*}-\alpha_{j0}^{*}|$ or $(\hat{\alpha}_{j}^{*}-\alpha_{j0}^{*})$ . Here, $\hat{\alpha}^{*}_{j}$ and $\alpha^{*}_{j0}$ are the $j$ th elements of $\hat{\alpha}^{*}$ and $\alpha^{*}_{0}$ , respectively. The $100(1-\tau)$ % CI for the $j$ th element of the coefficients, $\alpha_{j}$ , can be constructed by

CI^{RB}_{n,1-\tau}(\alpha_{j})=\left[\hat{\alpha}_{j}-\widehat{F}^{*-1}_{n}(1-\tfrac{\tau}{2};\hat{\alpha}^{*}_{j}-\alpha_{j0}^{*}),\hat{\alpha}_{j}-\widehat{F}^{*-1}_{n}(\tfrac{\tau}{2};\hat{\alpha}^{*}_{j}-\alpha_{j0}^{*})\right]

(9)

CI^{RB(S)}_{n,1-\tau}(\alpha_{j})=\left[\hat{\alpha}_{j}-\widehat{F}^{*-1}_{n}(1-\tau;|\hat{\alpha}^{*}_{j}-\alpha_{j0}^{*}|),\hat{\alpha}_{j}+\widehat{F}^{*-1}_{n}(1-\tau;|\hat{\alpha}^{*}_{j}-\alpha_{j0}^{*}|)\right],

(10)

which leads to a symmetric CI. The validity of the residual bootstrap CI is implied by Theorem 6 that follows.

We make the following additional assumption to derive the limit distribution of the bootstrap estimator when the true model is discontinuous.

Assumption P.

The continuity-restricted estimator $\tilde{\theta}=\arg\min_{\theta\in\Theta_{c}}\hat{Q}_{n}(\theta)$ is $O_{p}(1)$ .

The assumption holds if $M_{0}(\gamma)$ has full column rank for all $\gamma\in\Gamma$ . Details are explained in the comment after Lemma E.6.

Theorem 6.

Let $\hat{\theta}^{*}$ be obatined by Algorithm 1 with $\theta_{0}^{*}$ set as (8). (i) When the true model is continuous and Assumptions G, D, and LK hold,

\begin{pmatrix}\sqrt{n}(\hat{\alpha}^{*}-\alpha_{0}^{*})\\ \sqrt{n}(\hat{\gamma}^{*}-\gamma_{0}^{*})^{2}\end{pmatrix}\xrightarrow{d^{*}}\begin{pmatrix}U-(M_{0}^{\prime}\Omega^{-1}M_{0})^{-1}M_{0}^{\prime}\Omega^{-1}HV\\ V\end{pmatrix}\quad\text{ in $P$},

where $U$ and $V$ are defined as in Theorem 2.

(ii) When the true model is discontinuous and Assumptions G, D, LJ, and P hold,

\begin{pmatrix}\sqrt{n}(\hat{\alpha}^{*}-\alpha_{0}^{*})\\ \sqrt{n}(\hat{\gamma}^{*}-\gamma_{0}^{*})\end{pmatrix}\xrightarrow{d^{*}}N(0,(D_{1}^{\prime}\Omega^{-1}D_{1})^{-1})\quad\text{ in $P$}.

The asymptotic distributions of the bootstrap estimators in Theorem 6, conditional on the data, match those of the sample estimators for both continuous and discontinuous cases. Therefore, the residual bootstrap CI becomes asymptotically valid in a pointwise sense, regardless of whether the model is continuous or discontinuous. We acknowledge that Theorem 6 does not guarantee the uniform validity of the bootstrap CI. The difficulty in establishing the uniform validity lies in analyzing asymptotic behaviors of $\mathcal{T}_{n}$ and $w_{n}$ for drifting sequences of the true models. $\mathcal{T}_{n}$ already exhibits an irregular limit distribution even in the pointwise setup, as shown in Theorem 4 (i). This paper does not provide a theoretical analysis regarding the uniformity of the residual bootstrap. Instead, we conduct Monte Carlo experiments for nearly continuous cases in Section 5 and leaves theoretical work on the uniformity of the bootstrap method to future research.

The key motivation for setting $\theta_{0}^{*}$ , the true parameter of the bootstrap dgp, by (8) is to make $\delta^{*}_{10}+\delta^{*}_{30}\gamma^{*}_{0}$ degenerate fast enough when the underlying model is continuous. The $n^{1/4}$ convergence rate of the unrestricted estimator $\hat{\gamma}$ to $\gamma_{0}$ is not sufficiently fast. To see this, let the first-derivative of the population moment with respect to $\gamma$ at $\theta$ be

G(\theta)=(\delta_{1}+\delta_{3}\gamma)\cdot\begin{bmatrix}E_{t_{0}}[z_{it_{0}}|\gamma]f_{t_{0}}(\gamma)-E_{t_{0}-1}[z_{it_{0}}|\gamma]f_{t_{0}-1}(\gamma)\\ \vdots\\ E_{T}[z_{iT}|\gamma]f_{T}(\gamma)-E_{T-1}[z_{iT}|\gamma]f_{T-1}(\gamma)\end{bmatrix}\\ +\begin{bmatrix}E_{t_{0}}[z_{it_{0}}\xi_{it_{0}}^{\prime}\delta_{2}|\gamma]f_{t_{0}}(\gamma)-E_{t_{0}-1}[z_{it_{0}}\xi_{it_{0}-1}^{\prime}\delta_{2}|\gamma]f_{t_{0}-1}(\gamma)\\ \vdots\\ E_{T}[z_{iT}\xi_{iT}^{\prime}\delta_{2}|\gamma]f_{T}(\gamma)-E_{T-1}[z_{iT}\xi_{iT-1}^{\prime}\delta_{2}|\gamma]f_{T-1}(\gamma)\end{bmatrix},

(11)

for which we recall that $x_{it}=(\xi_{it}^{\prime},q_{it})^{\prime}$ and that $G(\theta_{0})=0_{k}$ under continuity. For the validity of a bootstrap method, the degeneracy of the Jacobian should be mimicked by the bootstrap dgp. In our residual bootstrap method, the Jacobian is $G(\theta_{0}^{*})=O_{p}(n^{-1/2})$ . However, it is $G(\hat{\theta})=O_{p}(n^{-1/4})$ for the standard nonparametric bootstrap. This fails the standard nonparametric bootstrap. More formal treatment of the invalidity of the standard nonparametric bootstrap is given in Appendix F.

It is not difficult to check $G(\hat{\theta})=O_{p}(n^{-1/4})$ but not $o_{p}(n^{-1/4})$ , which is directly implied by $n^{1/4}(\hat{\delta}_{1}+\hat{\delta}_{3}\hat{\gamma})=O_{p}(1)$ but not $o_{p}(1)$ due to Theorem 2. Meanwhile, in our residual bootstrap method, $\delta_{10}^{*}+\delta_{30}^{*}\gamma_{0}^{*}=w_{n}(\hat{\delta}_{1}+\hat{\delta}_{3}\hat{\gamma})+o_{p}(n^{-1/2})=O_{p}(n^{-1/2})$ and $\delta_{20}^{*}=w_{n}\hat{\delta}_{2}=O_{p}(n^{-3/4})$ , which leads to $G(\theta_{0}^{*})=O_{p}(n^{-1/2})$ . The exact formula for $\delta_{10}^{*}+\delta_{30}^{*}\gamma_{0}^{*}$ is provided in the comment of Lemma E.5.

According to the proof of Theorem 6 in Appendix B, $(\delta_{10}^{*}+\delta_{30}^{*}\gamma_{0}^{*})=O_{p}(n^{-1/2})$ is sufficient for the first-order asymptotic validity. This requirement is explicitly stated in the conditions of Lemma E.5. While our choice of $n^{1/4}$ decay rate for $w_{n}$ guarantees this condition, it remains an open question how fast $w_{n}$ must decay to ensure the uniform validity.

The idea of shrinking the first-order derivative in our bootstrap is closely related to other bootstrap methods developed for the case when asymptotic distributions of estimators are irregular. For example, Chatterjee and Lahiri, (2011) propose a bootstrap method for the lasso estimator, and Cavaliere et al., (2022) study bootstrap inference on the boundary of a parameter space. Both papers set up the model where the problem appears if the true parameter value is zero, and they obtain true parameters of bootstrap dgps by thresholding unrestricted estimators, i.e., $\theta_{j0}^{*}=\hat{\theta}_{j}1\{\hat{\theta}_{j}>c_{n}\}$ , where $c_{n}$ converges to zero in a proper rate.

4.3 Bootstrap for testing continuity

The critical value for the continuity test introduced in Section 3.2 can also be obtained by bootstrapping. Recall that $\tilde{\theta}=\arg\min_{\theta\in\Theta_{c}}\hat{Q}_{n}(\theta)$ is the continuity-restricted estimator. By setting $\theta_{0}^{*}=\tilde{\theta}$ in Algorithm 1, and collecting the bootstrap test statistic

\mathcal{T}_{n}^{*}=n\left(\min_{\theta\in\Theta_{c}}\hat{Q}_{n}^{*}(\theta)-\min_{\theta\in\Theta}\hat{Q}_{n}^{*}(\theta)\right),

we can get the critical value using the empirical quantile of $\mathcal{T}_{n}^{*}$ . To run the bootstrap continuity test at size $\tau$ , reject the continuity if $\mathcal{T}_{n}>\widehat{F}^{*-1}_{n}(1-\tau;\mathcal{T}_{n}^{*})$ , where $\widehat{F}^{*-1}_{n}(1-\tau;\mathcal{T}_{n}^{*})$ is the empirical $(1-\tau)$ quantile of $\mathcal{T}_{n}^{*}$ . The consistency of the bootstrap is implied by Theorem 7 that follows.

Theorem 7.

Assume that $\mathcal{T}_{n}^{*}$ is obtained by Algorithm 1 with $\theta_{0}^{*}=\tilde{\theta}$ .

(i) When the true model is continuous and Assumptions G, D, and LK hold,

\mathcal{T}_{n}^{*}\xrightarrow{d*}V_{1}-V_{2}+V_{3}\quad\text{ in $P$},

where the distributions of $V_{1}$ , $V_{2}$ , and $V_{3}$ are specified in Theorem 4.

(ii) When the model is discontinuous, then $\mathcal{T}_{n}^{*}=O_{p}^{*}(1)$ in $P$ .

Theorem 7 (i) shows that the limit distribution of $\mathcal{T}_{n}^{*}$ , conditional on the data, is identical to that of $\mathcal{T}_{n}$ under the null hypothesis. Moreover, Theorem 7 (ii) says that $\mathcal{T}_{n}^{*}$ is still stochastically bounded, conditionally on the data, when the true model is discontinuous. As $\mathcal{T}_{n}$ is shown to be stochastically unbounded under the alternative, according to Theorem 4 (ii), the bootstrap continuity test has power against the alternatives.

5 Monte Carlo results

This section executes Monte Carlo simulations to investigate finite sample performances of our bootstrap methods. The data is generated by

$\displaystyle y_{it}$	$\displaystyle=\beta_{2}y_{it-1}+\beta_{3}q_{it}+(\delta_{1}+\delta_{2}y_{it-1}+\delta_{3}q_{it})1\{q_{it}>\gamma\}+\sigma e_{it}$
$\displaystyle q_{it}$	$\displaystyle=\rho q_{it-1}+u_{it},$
	$\displaystyle\text{where }\begin{pmatrix}e_{it}\\ u_{it+1}\end{pmatrix}\overset{iid}{\sim}N\left(\begin{pmatrix}0\\ 0\end{pmatrix},\begin{pmatrix}1&\rho_{eu}\\ \rho_{eu}&1\end{pmatrix}\right),$	(12)

with $\beta_{2}=0.6$ , $\beta_{3}=1$ , $\delta_{2}=0$ , $\delta_{3}=2$ , $\gamma=0.25$ , $\sigma=0.5$ , $\rho=0.7$ , and $\rho_{eu}=0.5$ . Note that (12) implies that the threshold variable is weakly exogenous. That is, $E[e_{it}|q_{is}]=0$ for $s\leq t$ while $E[e_{it}|q_{is}]\neq 0$ for $s\geq t+1$ . Similar Monte Carlo results are obtained when the threshold variable is weakly endogenous, and they are reported in Appendix C.

To investigate how coverage rates of the CIs change depending on the continuity, we try different values of $\delta_{1}\in\{-0.5,-0.4,-0.3,0,0.5\}$ , which implies different degrees of (dis)continuity $\delta_{1}+\delta_{3}\gamma\in\{0,0.1,0.2,0.5,1\}$ . If $\delta_{1}=-0.5$ , then $\delta_{1}+\delta_{3}\gamma=0$ and the model is continuous. Otherwise, the model is discontinuous. As near continuous designs, we try $\delta_{1}+\delta_{3}\gamma=0.1,0.2$ and check if there is any poor performance of CIs. We generate samples of size $n\in\{400,800,1600\}$ and $T=6$ . The number of repetitions for the Monte Carlo simulations is 2000. Instruments used for the estimations are the lagged dependent variables that date back from period $t-2$ to period 1 and the lagged threshold variables from period $t-1$ to period 1, i.e., $z_{it}=(y_{it-2},...,y_{i1},q_{it-1},...,q_{i1})^{\prime}$ . The earliest period used for the estimation is $t_{0}=3$ , and the total number of the instruments becomes 24.

We begin with examining the finite sample coverage probabilities of bootstrap CIs for the threshold location. Specifically, the grid bootstrap CI (Grid-B) is compared with both percentile nonparametric bootstrap CI (NP-B) and symmetric percentile nonparametric bootstrap CI (NP-B(S)) that are defined as follows:

	$\displaystyle CI^{NPB}_{n,1-\tau}(\gamma)$	$\displaystyle=\left[\hat{\gamma}-\widehat{F}^{-1}_{n}(1-\tfrac{\tau}{2};\hat{\gamma}^{}-\hat{\gamma}),\hat{\gamma}-\widehat{F}^{-1}_{n}(\tfrac{\tau}{2};\hat{\gamma}^{}-\hat{\gamma})\right],$		(13)
	$\displaystyle CI^{NPB(S)}_{n,1-\tau}(\gamma)$	$\displaystyle=\left[\hat{\gamma}-\widehat{F}^{-1}_{n}(1-\tau;\|\hat{\gamma}^{}-\hat{\gamma}\|),\hat{\gamma}+\widehat{F}^{-1}_{n}(1-\tau;\|\hat{\gamma}^{}-\hat{\gamma}\|)\right].$		(14)

The number of bootstrap repetitions is set at 500 for each bootstrap method.

Table 1 reports the coverage rates of 95% CIs for the threshold location. First, it shows that the bootstrap CI by NP-B is subject to severe undercoverage in all cases. This is the case even when $\delta_{1}+\delta_{3}\gamma=1$ , despite the theoretical validity of NP-B when the model is discontinuous. Meanwhile, NP-B(S) exhibits extreme over-coverage in all cases. The large discrepancy in the results between NP-B and NP-B(S) suggests that the distribution of the nonparametric bootstrap estimator $\hat{\gamma}^{*}-\hat{\gamma}$ is poorly behaved, undermining its reliability for inference. The large difference between symmetric and non-symmetric CIs also arises in the inference for the coefficients, which we analyze in more detail in Appendix C.

Table 1: Coverage rates of 95% CIs for the threshold location. Grid-B denotes the grid bootstrap CI defined by (7). NP-B and NP-B(S) denote the percentile and the symmetric percentile CIs by the standard nonparametric bootstrap defined by (13) and (14).

		$\delta_{1}+\delta_{3}\gamma$
	n	0	0.1	0.2	0.5	1
	400	0.992	0.995	0.993	0.988	0.966
Grid-B	800	0.986	0.986	0.985	0.973	0.955
	1600	0.988	0.987	0.988	0.979	0.959
	400	0.484	0.491	0.494	0.524	0.631
NP-B	800	0.478	0.472	0.487	0.518	0.611
	1600	0.471	0.468	0.476	0.521	0.642
	400	1.000	1.000	1.000	1.000	0.998
NP-B(S)	800	1.000	1.000	1.000	0.999	0.994
	1600	1.000	1.000	1.000	1.000	0.994

On the other hand, Table 1 shows that Grid-B provides more reasonable coverage rates. It seems that a larger jump yields coverage rates closer to the nominal level as expected since it is easier to detect a bigger jump. As expected from the uniform validity of Grid-B against near continuity, coverage rates remain valid for all the parameter values, if somewhat over-coveraged near continuity or under smaller sample sizes.

Contrary to Grid-B, NP-B(S) exhibits higher coverage probabilities that are one or almost one for all cases. It indicates that NP-B(S) CIs are overly wide and non-informative. To investigate this further, we examine some power properties as reported in Table 2 below. It shows that NP-B(S) based tests for the threshold location are trivial for many parametrizations, specifically when the design is continuous or near-continuous or when the alternative is closer to the null. In contrast, Grid-B tests are more powerful, oftentime twice more powerful than NP-B(S) tests. Here, we report the power of the tests instead of the lengths of the bootstrap CIs due to computational burden associated with the grid bootstrap.

Table 2: Rejection rates of 5% level tests on the alternative threshold locations

\gamma=\gamma_{0}+c

are reported. Grid-B denotes the test using the 95% grid bootstrap CI defined by (7). NP-B(S) denotes the test using the symmetric percentile CI constructed by the standard nonparametric bootstrap defined by (14).

		Grid-B					NP-B(S)
		$\delta_{1}+\delta_{3}\gamma$					$\delta_{1}+\delta_{3}\gamma$
c	n	0	0.1	0.2	0.5	1	0	0.1	0.2	0.5	1
	400	0.015	0.015	0.015	0.027	0.096	0.000	0.000	0.000	0.004	0.018
0.10	800	0.011	0.014	0.015	0.038	0.112	0.000	0.000	0.000	0.004	0.017
	1600	0.017	0.020	0.021	0.040	0.125	0.000	0.000	0.002	0.004	0.023
	400	0.020	0.030	0.042	0.100	0.281	0.002	0.004	0.009	0.043	0.135
0.25	800	0.020	0.034	0.041	0.112	0.325	0.002	0.003	0.007	0.035	0.154
	1600	0.029	0.034	0.048	0.126	0.351	0.002	0.006	0.007	0.044	0.152
	400	0.102	0.137	0.172	0.314	0.581	0.062	0.109	0.142	0.274	0.298
0.50	800	0.114	0.162	0.207	0.362	0.632	0.078	0.117	0.169	0.310	0.327
	1600	0.136	0.186	0.240	0.396	0.652	0.076	0.124	0.189	0.332	0.316

Next, we turn to the coverage probabilities for the regression coefficients by different bootstrap CIs. Table 3 reports the coverage rates of bootstrap percentile CIs using the residual bootstrap (R-B), defined by (9), and the standard nonparametric bootstrap (NP-B), defined by

CI^{NPB}_{n,1-\tau}(\alpha_{j})=\left[\hat{\alpha}_{j}-\widehat{F}^{*-1}_{n}(1-\tfrac{\tau}{2};\hat{\alpha}_{j}^{*}-\hat{\alpha}_{j}),\hat{\alpha}_{j}-\widehat{F}^{*-1}_{n}(\tfrac{\tau}{2};\hat{\alpha}_{j}^{*}-\hat{\alpha}_{j})\right]

(15)

and $\hat{C}$ in (8) is set as the 50th percentile of the bootstrap distribution of the test statistic $\mathcal{T}_{n}$ under the null hypothesis that the model is continuous, using the bootstrap method explained in Section 4.3. Additional results on the coverage rates of the symmetric percentile CIs (NP-B(S) and R-B(S)) for the coefficients are reported in Appendix C.

As in the threshold inference case, the percentile CIs for the coefficients constructed using NP-B exhibit systematic undercoverage across all specifications and sample sizes. Even when $\delta_{1}+\delta_{3}\gamma=1$ , so that the model is discontinuous and the NP-B method is theoretically valid, the undercoverage remains severe. While the R-B method yields higher coverage rates than NP-B, they still fall short of the nominal 95% level. Moreover, as reported in Table 4, R-B results in wider average CI lengths compared to NP-B, partly accounting for its improved coverage.

Additional simulation results in Appendix C reveal highly asymmetric bootstrap distributions, which lead to one-sided inference failures because the bootstrap fails to reject the null when $\hat{\delta}_{j}<\delta_{j0}$ . These findings underscore the difficulty of reliable inference for the coefficients $\beta$ and $\delta$ . They echo similar concerns raised in the threshold regression literature; for instance, Hansen, (2000) documents comparable undercoverage issues for $\delta$ even when the threshold is estimated at a faster rate. A more comprehensive theoretical and methodological investigation is needed to address these challenges in future research.

Table 3: Coverage rates of 95% percentile CIs for the coefficients are shown. R-B denotes the percentile CIs by the residual bootstrap defined by (9). NP-B denotes the percentile CIs by the standard nonparametric bootstrap defined by (15).

		R-B					NP-B
$\delta_{1}+\delta_{3}\gamma$	n	$\beta_{2}$	$\beta_{3}$	$\delta_{1}$	$\delta_{2}$	$\delta_{3}$	$\beta_{2}$	$\beta_{3}$	$\delta_{1}$	$\delta_{2}$	$\delta_{3}$
	400	0.839	0.780	0.746	0.815	0.801	0.799	0.691	0.627	0.712	0.709
0.0	800	0.837	0.790	0.721	0.807	0.806	0.790	0.723	0.607	0.725	0.716
	1600	0.849	0.782	0.727	0.840	0.835	0.833	0.709	0.602	0.754	0.718
	400	0.837	0.784	0.749	0.813	0.799	0.794	0.697	0.624	0.706	0.708
0.1	800	0.830	0.779	0.724	0.803	0.800	0.786	0.714	0.599	0.720	0.710
	1600	0.853	0.787	0.727	0.840	0.829	0.827	0.700	0.598	0.760	0.719
	400	0.838	0.786	0.749	0.819	0.811	0.794	0.701	0.623	0.713	0.716
0.2	800	0.833	0.776	0.720	0.803	0.794	0.784	0.707	0.585	0.718	0.712
	1600	0.855	0.789	0.728	0.846	0.832	0.830	0.707	0.606	0.764	0.722
	400	0.836	0.775	0.739	0.820	0.802	0.787	0.703	0.601	0.718	0.724
0.5	800	0.841	0.789	0.732	0.815	0.807	0.787	0.714	0.602	0.716	0.727
	1600	0.843	0.799	0.728	0.826	0.834	0.815	0.717	0.595	0.753	0.737
	400	0.858	0.815	0.745	0.832	0.805	0.800	0.741	0.627	0.741	0.743
1.0	800	0.858	0.827	0.749	0.846	0.820	0.808	0.731	0.620	0.741	0.738
	1600	0.863	0.846	0.759	0.830	0.837	0.820	0.738	0.622	0.761	0.747

Table 4: Ratios of the average lengths of 95% percentile CIs for the coefficients, obtained using different bootstrap methods, are shown. R-B denotes the percentile CIs by the residual bootstrap defined by (9). NP-B denotes the percentile CIs by the standard nonparametric bootstrap defined by (15).

		Ratios of average lengths of CIs:
		R-B / NP-B
$\delta_{1}+\delta_{3}\gamma$	n	$\beta_{2}$	$\beta_{3}$	$\delta_{1}$	$\delta_{2}$	$\delta_{3}$
	400	1.076	1.091	1.099	1.074	1.046
0.0	800	1.081	1.086	1.093	1.070	1.046
	1600	1.088	1.100	1.111	1.083	1.057
	400	1.087	1.098	1.101	1.074	1.047
0.1	800	1.080	1.082	1.090	1.075	1.043
	1600	1.086	1.102	1.111	1.077	1.057
	400	1.080	1.088	1.097	1.074	1.047
0.2	800	1.079	1.089	1.094	1.075	1.047
	1600	1.085	1.100	1.106	1.077	1.054
	400	1.097	1.100	1.100	1.083	1.056
0.5	800	1.083	1.095	1.089	1.076	1.051
	1600	1.098	1.110	1.098	1.089	1.059
	400	1.164	1.159	1.084	1.114	1.074
1.0	800	1.158	1.159	1.079	1.109	1.076
	1600	1.158	1.177	1.084	1.109	1.079

6 Empirical example

Our empirical example examines a firm’s investment decision model that incorporates financial constraints, as in Hansen, 1999b and Seo and Shin, (2016). In a perfect financial market, firms can borrow as much money as they need to finance their investment projects, regardless of their financial conditions. Therefore, the financial conditions of firms are irrelevant to their investment decisions. However, in an imperfect financial market, some firms may be restricted in their access to external financing. These firms are said to be financially constrained. Financially constrained firms are more sensitive to the availability of internal financing, as they cannot rely on external financing to fund their investment projects.

Fazzari et al., (1988) argue that firms’ investments are positively related to their cash flow if they are financially constrained, where those firms are identified by low dividend payments. Hansen, 1999b applies the threshold panel regression more systematically to show that a more positive relationship between investment and cash flow is present for firms with higher leverage.

Since there are multiple candidate measures of the financial constraint for the threshold variable, we compare the following three dynamic panel threshold models:

$\displaystyle I_{it}$	$\displaystyle=\eta_{i}+\xi_{it-1}^{\prime}\beta+(\delta_{1}+\xi_{it-1}^{\prime}\delta_{2}+LEV_{it-1}\delta_{3})1\{LEV_{it-1}>\gamma\}+\epsilon_{it}$	(16)
$\displaystyle I_{it}$	$\displaystyle=\eta_{i}+\xi_{it-1}^{\prime}\beta+(\delta_{1}+\xi_{it-1}^{\prime}\delta_{2}+TQ_{it-1}\delta_{3})1\{TQ_{it-1}>\gamma\}+\epsilon_{it}$	(17)
$\displaystyle I_{it}$	$\displaystyle=\eta_{i}+\xi_{it-1}^{\prime}\beta+(\delta_{1}+TQ_{it-1}\delta_{3})1\{TQ_{it-1}>\gamma\}+\epsilon_{it}$	(18)

where $\xi_{it-1}=(I_{it-1},CF_{it},PPE_{it-1},ROA_{it-1})^{\prime}$ . Here, $I_{it}$ is investment, $CF_{it}$ is cash flow, $PPE_{it}$ is property, plant and equipment, and $ROA_{it}$ is return on assets. $I_{it}$ , $CF_{it}$ and $PPE_{it}$ are normalized by total assets. We have two candidate threshold variables, $LEV_{it}$ and $TQ_{it}$ , which are leverage and Tobin’s Q, respectively. Choice of the regressors and threshold variables is based on previous works like Hansen, 1999b and Lang et al., (1996). Note that the regression model (18) is nested within (17) and it is closer to a continuous threshold model.

Unlike the previous works, we do not need to assume either continuity or discontinuity for valid inferences since the bootstrap methods in this paper are adaptive to each case. With an assumption that the regressors are predetermined, we use the variables dated one period before as instruments. Hence, the instruments include $I_{t-2}$ , $CF_{t-1}$ , $PPE_{t-2}$ , $ROA_{t-2}$ added by $LEV_{t-2}$ or $TQ_{t-2}$ for each period.

We construct a balanced panel of 1459 U.S. firms, excluding finance and utility firms, from 2010 to 2019 available in Compustat. To deal with extreme values, we drop firms if any of their non-threshold variables’ values fall within the top or bottom 0.5% tails. Moreover, we exclude firms whose Tobin’s Q is larger than 5 for more than 5 years when the threshold variable is Tobin’s Q, leaving 1222 firms in the sample. Meanwhile, Strebulaev and Yang, (2013) claims that firms with large CEO ownership or CEO-friendly boards show persistent zero-leverage behavior. To prevent our threshold regression from capturing corporate governance characteristics rather than financial constraints, we exclude firms whose leverage is zero for more than half of the time periods when leverage is the threshold variable, leaving 1056 firms in the sample.

Table 5 reports the estimates and 95% CIs for (16) and (17), and Table 6 for (18). Figure 1 visualizes how the grid bootstrap CIs are obtained. The CIs for the coefficients are constructed by using the percentiles obtained from the residual bootstrap, defined as (10)²²2The symmetric percentile residual-bootstrap CIs that use the 0.05 quantiles of $|\hat{\alpha}_{j}^{*}-\alpha_{j0}^{*}|$ ’s return similar results, unlike in Monte Carlo results from Section 5. We report them in Appendix G.. $\hat{C}$ for the precentile bootstrap is set at the 50th percentile of the bootstrap statistic for the continuity test, explained in Section 4.3. For the threshold locations, the CIs are obtained by the grid bootstrap with convexification. For the grid bootstrap, we make 500 bootstrap draws for each grid point. The grids of the threshold locations have 81 points from the 10th percentile to the 90th percentile of the threshold variables, and there are equal number of observations between two consecutive points. Table 5 and Table 6 also report the bootstrap p-values for the continuity and linearity tests by the bootstrap methods explained in Section 4.3 and Appendix H, respectively. The null hypothesis of the linearity test is $\mathcal{H}_{0}:\delta=(0,...,0)^{\prime}$ , which implies no threshold effects.

Refer to caption — Figure 1: Panels (a), (b), and (c) are for the models (16), (17), and (18), respectively. Black solid lines in each subplot denote the test statistics, red dashed lines denote the 5% size bootstrapped critical values, and horizontal blue arrows visualize the 95% CIs. The regions where the test statistics are below the bootstrapped critical values become the CIs for the threshold locations.

We find supporting evidence for the presence of the threshold effect when the threshold variable is Tobin’s Q, but the statistical evidence is not strong for the leverage threshold model. Table 5 and Table 6 report the bootstrap p-values at .135, .011, and .011, for specifications (16) - (18), respectively. The statistical evidence to reject the continuity is not trivial for all specifications and gets stronger when it is the restricted model using Tobin’s Q. The estimated bootstrap p-values are .028 and .004 for the unrestricted and the restricted using Tobin’s Q. Furthermore, the confidence interval for the threshold location is narrower for the restricted model (18) than for the unrestricted model (17).

Table 5: Columns (a) and (b) report results of the models (16) and (17), respectively. The percentile of each threshold location value is shown in parentheses below each value. The significance levels for the coefficients are given by stars: * - 10%, ** - 5% and *** - 1%.

(a)				(b)
	est.	[95% CI]			est.	[95% CI]
Lower regime				Lower regime
$I_{t-1}$	0.778**	0.124	1.154	$I_{t-1}$	0.252	-0.258	0.724
$CF_{t-1}$	0.047	-0.034	0.145	$CF_{t-1}$	0.266*	-0.003	0.535
$PPE_{t-1}$	-0.147	-0.385	0.171	$PPE_{t-1}$	0.027	-0.103	0.264
$ROA_{t-1}$	-0.032	-0.132	0.047	$ROA_{t-1}$	-0.017	-0.180	0.090
$LEV_{t-1}$	0.231	-0.843	1.849	$TQ_{t-1}$	0.246*	-0.031	0.577
Upper regime				Upper regime
$I_{t-1}$	-0.154	-0.717	0.551	$I_{t-1}$	0.410	-0.049	0.751
$CF_{t-1}$	0.148	-0.015	0.326	$CF_{t-1}$	0.081**	0.021	0.200
$PPE_{t-1}$	-0.291*	-0.519	0.015	$PPE_{t-1}$	0.044	-0.214	0.398
$ROA_{t-1}$	0.013	-0.066	0.113	$ROA_{t-1}$	0.050*	-0.019	0.153
$LEV_{t-1}$	-0.081	-0.234	0.037	$TQ_{t-1}$	0.005	-0.004	0.012
Difference between regimes				Difference between regimes
intercept	0.068	-0.024	0.200	intercept	0.236*	-0.014	0.580
$I_{t-1}$	-0.932**	-1.830	-0.097	$I_{t-1}$	0.158	-0.559	0.843
$CF_{t-1}$	0.101	-0.107	0.322	$CF_{t-1}$	-0.185	-0.479	0.108
$PPE_{t-1}$	-0.144	-0.519	0.134	$PPE_{t-1}$	0.017	-0.227	0.275
$ROA_{t-1}$	0.045	-0.111	0.232	$ROA_{t-1}$	0.066	-0.074	0.287
$LEV_{t-1}$	-0.312*	-1.893	0.792	$TQ_{t-1}$	-0.242*	-0.573	0.038
Threshold				Threshold
$LEV_{t-1}$	0.172	0.101	0.265	$TQ_{t-1}$	1.298	1.169	1.386
	(38%)	(24%)	(58%)		(30%)	(21%)	(36%)
Testing (p-val)				Testing (p-val)
Linearity	0.135			Linearity	0.011
Continuity	0.033			Continuity	0.028

Table 6: Results of the model (18) are reported. The percentile of each threshold location value is shown in parentheses below each value. The significance levels for the coefficients are given by stars: * - 10%, ** - 5% and *** - 1%.

	est.	[95% CI]
Coefficients
$I_{t-1}$	0.392***	0.304	0.539
$CF_{t-1}$	0.122***	0.084	0.154
$PPE_{t-1}$	0.076	-0.027	0.271
$ROA_{t-1}$	0.027***	0.006	0.046
$TQ_{t-1}1\{TQ_{t-1}\leq\gamma\}$	0.298**	0.073	0.571
$TQ_{t-1}1\{TQ_{t-1}>\gamma\}$	0.008**	0.001	0.015
Difference between regimes
intercept	0.275**	0.010	0.540
$TQ_{t-1}$	-0.290**	-0.562	-0.018
Threshold
$TQ_{t-1}$	1.298	1.253	1.386
	(30%)	(27%)	(36%)
Testing (p-val)
Linearity	0.011
Continuity	0.004

A notable finding concerning the coefficients estimates is that the relationship between cash flow and investment is positive and has larger magnitude for the low Tobin’s Q firms and the high leverage firms compared to their other respective regimes, although they are not statistically significant at 5% level. Even though the sign and magnitude of the estimates align with the observations by Lang et al., (1996) and Hansen, 1999b that a firm is subject to financial constraints when its Tobin’s Q is low or leverage is high, there is uncertainty in the interpretation of our results due to the lack of statistical significance.

Next, the autoregressive coefficient of the lagged investment is significant at 5% level in the low leverage regime and is larger than in the high leverage regime. This lends supporting evidence for the presence of asymmetric dynamics in investment, akin to the dynamics of leverage analyzed by Dang et al., (2012). In the meantime, we note that the autoregressive coefficients for the low and high leverage regimes in Column (a) are 0.778 and -0.154, respectively, which appear more extreme than findings of the literature where the estimates are between 0.1 and 0.5, e.g., Blundell et al., (1992). The autoregressive coefficients in the Column (b) are more in line with these estimates. Since the changes of the estimated coefficients in Column (b) are moderate, we also estimate the restricted model (18).

Turning to Table 6, we observe that the differences between the coefficients of the two regimes become significant at 5% level, and the CI for the threshold location becomes narrower while the estimate of the threshold location remains close to the estimate under the unrestricted model. The autoregressive coefficient of the lagged investment and the sensitivity of investment to both cash flow and return on assets are all positive and significant. The effect of Tobin’s Q is both positive and significant for both high and low Tobin’s Q regimes, but it almost disappears once it surpasses the threshold location. This suggests that low Tobin’s Q is related to low investment but higher Tobin’s Q does not cause higher investment once it reaches some level.

7 Conclusion

This paper studies the asymptotic properties of the GMM estimator in dynamic panel threshold models, showing that the limiting distribution depends critically on whether the true model exhibits a kink or a jump at the threshold. We demonstrate that the standard nonparametric bootstrap is inconsistent when the true model has a kink. To address this, we propose alternative bootstrap procedures for constructing confidence intervals for the threshold location and the model coefficients, which are shown to be consistent regardless of the model’s continuity. In particular, we establish that the grid bootstrap for the threshold parameter is uniformly valid. Monte Carlo simulations confirm that the proposed methods outperform the standard bootstrap in finite samples.

Several directions remain for future research. Our simulation results reveal highly asymmetric bootstrap distributions for the coefficient estimates, which distort finite sample inference. This highlights the need for a more thorough theoretical understanding of the bootstrap’s behavior. In particular, establishing the uniform validity of the bootstrap for the coefficient estimates is an important open question. Extensions of our bootstrap algorithms to incorporate latent group structures, interactive fixed effects, or threshold indices, as studied in Miao et al., 2020b , Miao et al., 2020a , and Seo and Linton, (2007); Lee et al., (2021), respectively, would also be valuable.

References

Adam and Bevan, (2005) Adam, C. S. and Bevan, D. L. (2005). Fiscal deficits and growth in developing countries. Journal of Public Economics, 89:571–597.
Andrews et al., (2020) Andrews, D. W., Cheng, X., and Guggenberger, P. (2020). Generic results for establishing the asymptotic size of confidence sets and tests. Journal of Econometrics, 218(2):496–531.
Andrews, (2001) Andrews, D. W. K. (2001). Testing when a parameter is on the boundary of the maintained hypothesis. Econometrica, 69(3):683–734.
Andrews, (2002) Andrews, D. W. K. (2002). Generalized method of moments estimation when a parameter is on a boundary. Journal of Business & Economic Statistics, 20(4):530–544.
Andrews and Cheng, (2012) Andrews, D. W. K. and Cheng, X. (2012). Estimation and Inference With Weak, Semi-Strong, and Strong Identification. Econometrica, 80:2153–2211.
Andrews and Cheng, (2014) Andrews, D. W. K. and Cheng, X. (2014). GMM Estimation and Uniform Subvector Inference With Possible Identification Failure. Econometric Theory, 30:287–333.
Andrews and Guggenberger, (2009) Andrews, D. W. K. and Guggenberger, P. (2009). Hybrid and size-corrected subsampling methods. Econometrica, 77(3):721–762.
Andrews and Guggenberger, (2019) Andrews, D. W. K. and Guggenberger, P. (2019). Identification- and singularity-robust inference for moment condition models. Quantitative Economics, 10:1703–1746.
Arellano and Bond, (1991) Arellano, M. and Bond, S. (1991). Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations. The Review of Economic Studies, 58:277–297.
Bick, (2010) Bick, A. (2010). Threshold effects of inflation on economic growth in developing countries. Economics Letters, 108(2):126–129.
Blundell et al., (1992) Blundell, R., Bond, S., Devereux, M., and Schiantarelli, F. (1992). Investment and Tobin’s Q: Evidence from company panel data. Journal of Econometrics, 51:233–257.
Boyd and Vandenberghe, (2004) Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press.
Cavaliere et al., (2022) Cavaliere, G., Nielsen, H. B., Pedersen, R. S., and Rahbek, A. (2022). Bootstrap inference on the boundary of the parameter space, with application to conditional volatility models. Journal of Econometrics, 227(1):241–263.
Cecchetti et al., (2011) Cecchetti, S. G., Mohanty, M. S., and Zampolli, F. (2011). The Real Effects of Debt. BIS Working Paper No. 352.
Chan and Tong, (1985) Chan, K. S. and Tong, H. (1985). On the use of the deterministic lyapunov function for the ergodicity of stochastic difference equations. Advances in applied probability, 17(3):666–678.
Chan and Tsay, (1998) Chan, K. S. and Tsay, R. S. (1998). Limiting properties of the least squares estimator of a continuous threshold autoregressive model. Biometrika, 85(2):413–426.
Chatterjee and Lahiri, (2011) Chatterjee, A. and Lahiri, S. N. (2011). Bootstrapping lasso estimators. Journal of the American Statistical Association, 106(494):608–625.
Cheng and Huang, (2010) Cheng, G. and Huang, J. Z. (2010). Bootstrap consistency for general semiparametric m-estimation. The Annals of Statistics, 38(5):2884–2915.
Chudik et al., (2017) Chudik, A., Mohaddes, K., Pesaran, M. H., and Raissi, M. (2017). Is There a Debt-Threshold Effect on Output Growth? The Review of Economics and Statistics, 99:135–150.
Dang et al., (2012) Dang, V. A., Kim, M., and Shin, Y. (2012). Asymmetric capital structure adjustments: New evidence from dynamic panel threshold models. Journal of Empirical Finance, 19:465–482.
Dovonon and Goncalves, (2017) Dovonon, P. and Goncalves, S. (2017). Bootstrapping the GMM overidentification test under first-order underidentification. Journal of Econometrics, 201:43–71.
Dovonon and Hall, (2018) Dovonon, P. and Hall, A. R. (2018). The asymptotic properties of gmm and indirect inference under second-order identification. Journal of Econometrics, 205(1):76–111.
Dovonon and Renault, (2013) Dovonon, P. and Renault, E. (2013). Testing for Common Conditionally Heteroskedastic Factors. Econometrica, 81:2561–2586.
Fazzari et al., (1988) Fazzari, S. M., Hubbard, R. G., Petersen, B. C., Blinder, A. S., and Poterba, J. M. (1988). Financing Constraints and Corporate Investment. Brookings Papers on Economic Activity, 1988:141–206.
Giannerini et al., (2024) Giannerini, S., Goracci, G., and Rahbek, A. (2024). The validity of bootstrap testing for threshold autoregression. Journal of Econometrics, 239(1):105379.
Gine and Zinn, (1990) Gine, E. and Zinn, J. (1990). Bootstrapping general empirical measures. The Annals of Probability, 18(2):851 – 869.
Girma, (2005) Girma, S. (2005). Absorptive Capacity and Productivity Spillovers from FDI: A Threshold Regression Analysis. Oxford Bulletin of Economics and Statistics, 67:281–306.
Goncalves and White, (2004) Goncalves, S. and White, H. (2004). Maximum likelihood and the bootstrap for nonlinear dynamic models. Journal of Econometrics, 119(1):199–219.
Gonzalo and Wolf, (2005) Gonzalo, J. and Wolf, M. (2005). Subsampling inference in threshold autoregressive models. Journal of Econometrics, 127(2):201–224.
Hall and Horowitz, (1996) Hall, P. and Horowitz, J. L. (1996). Bootstrap Critical Values for Tests Based on Generalized-Method-of-Moments Estimators. Econometrica, 64:891–916.
Han and McCloskey, (2019) Han, S. and McCloskey, A. (2019). Estimation and Inference with a (nearly) Singular Jacobian. Quantitative Economics, 10:1019–1068.
(32) Hansen, B. E. (1999a). The Grid Bootstrap and the Autoregressive Model. The Review of Economics and Statistics, 81:594–607.
(33) Hansen, B. E. (1999b). Threshold effects in non-dynamic panels: Estimation, testing, and inference. Journal of Econometrics, 93:345–368.
Hansen, (2000) Hansen, B. E. (2000). Sample Splitting and Threshold Estimation. Econometrica, 68:575–603.
Hansen, (2017) Hansen, B. E. (2017). Regression kink with an unknown threshold. Journal of Business & Economic Statistics, 35(2):228–240.
Hidalgo et al., (2023) Hidalgo, J., Lee, H., Lee, J., and Seo, M. H. (2023). Minimax risk in estimating kink threshold and testing continuity. In Advances in Econometrics: Essays in Honor of Joon Y. Park: Econometric Theory, Vol. 45A, pages 233–259.
Hidalgo et al., (2019) Hidalgo, J., Lee, J., and Seo, M. H. (2019). Robust Inference for Threshold Regression Models. Journal of Econometrics, 210:291–309.
Khan and Senhadji, (2001) Khan, M. S. and Senhadji, A. S. (2001). Threshold Effects in the Relationship between Inflation and Growth. IMF Staff Papers, 48:1–21.
Kim et al., (2019) Kim, S., Kim, Y. J., and Seo, M. H. (2019). Estimation of Dynamic Panel Threshold Model Using Stata. The Stata Journal, 19:685–697.
Kremer et al., (2013) Kremer, S., Bick, A., and Nautz, D. (2013). Inflation and growth: new evidence from a dynamic panel threshold analysis. Empirical Economics, 44:861–878.
Lang et al., (1996) Lang, L., Ofek, E., and Stulz, R. (1996). Leverage, investment, and firm growth. Journal of Financial Economics, 40(1):3–29.
Lee et al., (2021) Lee, S., Liao, Y., Seo, M. H., and Shin, Y. (2021). Factor-driven two-regime regression. The Annals of Statistics, 49(3):1656–1678.
Lee et al., (2011) Lee, S., Seo, M. H., and Shin, Y. (2011). Testing for Threshold Effects in Regression Models. Journal of the American Statistical Association, 106:220–231.
Leeb and Pötscher, (2005) Leeb, H. and Pötscher, B. M. (2005). Model selection and inference: facts and fiction. Econometric Theory, 21(1):21–59.
(45) Miao, K., Li, K., and Su, L. (2020a). Panel threshold models with interactive fixed effects. Journal of Econometrics, 219(1):137–170.
(46) Miao, K., Su, L., and Wang, W. (2020b). Panel threshold regressions with latent group structures. Journal of Econometrics, 214(2):451–481.
Mikusheva, (2007) Mikusheva, A. (2007). Uniform inference in autoregressive models. Econometrica, 75(5):1411–1452.
Newey and McFadden, (1994) Newey, W. K. and McFadden, D. (1994). Chapter 36 Large Sample Estimation and Hypothesis Testing. In Handbook of Econometrics, volume 4, pages 2111–2245. Elsevier.
Newey and West, (1987) Newey, W. K. and West, K. D. (1987). Hypothesis Testing with Efficient Method of Moments Estimation. International Economic Review, 28:777–787.
Pakes and Pollard, (1989) Pakes, A. and Pollard, D. (1989). Simulation and the Asymptotics of Optimization Estimators. Econometrica, 57:1027–1057.
Praestgaard and Wellner, (1993) Praestgaard, J. and Wellner, J. A. (1993). Exchangeably Weighted Bootstraps of the General Empirical Process. The Annals of Probability, 21(4):2053 – 2086.
Romano and Shaikh, (2012) Romano, J. P. and Shaikh, A. M. (2012). On the uniform asymptotic validity of subsampling and the bootstrap. The Annals of Statistics, 40(6):2798 – 2822.
Rousseau and Wachtel, (2002) Rousseau, P. L. and Wachtel, P. (2002). Inflation thresholds and the finance–growth nexus. Journal of International Money and Finance, 21:777–793.
Seo and Linton, (2007) Seo, M. H. and Linton, O. (2007). A smoothed least squares estimator for threshold regression models. Journal of Econometrics, 141(2):704–735.
Seo and Shin, (2016) Seo, M. H. and Shin, Y. (2016). Dynamic Panels With Threshold Effect and Endogeneity. Journal of Econometrics, 195:169–186.
Strebulaev and Yang, (2013) Strebulaev, I. A. and Yang, B. (2013). The mystery of zero-leverage firms. Journal of Financial Economics, 109(1):1–23.
van der Vaart and Wellner, (1996) van der Vaart, A. W. and Wellner, J. (1996). Weak Convergence and Empirical Processes With Applications to Statistics. Springer Series in Statistics. Springer-Verlag, New York.
Wang, (2015) Wang, Q. (2015). Fixed-effect panel threshold model using stata. The Stata Journal, 15(1):121–134.
Yang et al., (2020) Yang, L., Zhang, C., Lee, C., and Chen, I.-P. (2020). Panel kink threshold regression model with a covariate-dependent threshold. The Econometrics Journal, 24(3):462–481.
Zhang et al., (2017) Zhang, Y., Zhou, Q., and Jiang, L. (2017). Panel kink regression with an unknown threshold. Economics Letters, 157:116–121.

Additional Notations.

For $k,p\in\mathbb{N}$ , $0_{k\times p}$ denotes $k\times p$ a matrix whose elements are all zero. “ $\rightsquigarrow$ ” denotes the weak convergence as in section 1.3 of van der Vaart and Wellner, (1996). $\|\cdot\|$ is a norm for either vectors or matrices. For a vector, it is the Euclidean norm. For a matrix, it is the Frobenius norm, i.e., $\|M\|=\sqrt{tr(M^{\prime}M)}$ for a matrix $M$ .

Appendix A Proofs for Section 3.

A.1 Proof of Theorem 1.

Note that $E[z_{it}(\Delta y_{it}-\Delta x_{it}^{\prime}\beta-1_{it}(\gamma)^{\prime}X_{it}\delta)]=-E[z_{it}\Delta x_{it}^{\prime}](\beta-\beta_{0})-E[z_{it}1_{it}(\gamma)^{\prime}X_{it}]\delta+E[z_{it}1_{it}(\gamma_{0})^{\prime}X_{it}\delta_{0}]$ due to $\Delta y_{it}=\Delta x_{it}^{\prime}\beta_{0}+1_{it}(\gamma_{0})^{\prime}X_{it}\delta_{0}+\Delta\epsilon_{it}$ . Hence, the population moment equation is $g_{0}(\theta)=M_{10}(\beta-\beta_{0})+M_{20}(\gamma)\delta-M_{20}\delta_{0}=\left[\begin{array}[]{c;{2pt/2pt}c}M_{0}(\gamma)&M_{20}\delta_{0}\\ \end{array}\right]\times((\beta^{\prime}-\beta_{0}^{\prime},\delta^{\prime}),-1)^{\prime},$ when $\gamma\neq\gamma_{0}$ . The condition (ii) of Theorem 1 implies that $\left[\begin{array}[]{c;{2pt/2pt}c}M_{0}(\gamma)&M_{20}\delta_{0}\\ \end{array}\right]$ has full column rank, and hence $g_{0}(\theta)\neq 0_{k}$ if $\gamma\neq\gamma_{0}$ . $g_{0}(\theta)=M_{0}\times(\alpha-\alpha_{0})$ , when $\gamma=\gamma_{0}$ . The condition (i) of Theorem 1 implies that $M_{0}\times(\alpha-\alpha_{0})$ is not zero if $\alpha\neq\alpha_{0}$ . Therefore, $g_{0}(\theta)\neq 0_{k}$ if $\theta\neq\theta_{0}$ , and $g_{0}(\theta)=0_{k}$ if $\theta=\theta_{0}$ , which is the standard identification condition in the literature, e.g., Section 2.2.3 in Newey and McFadden, (1994).

A.2 Proof of Theorem 2.

To obtain limit distribution of $\hat{\theta}$ , we first establish consistency of $\hat{\theta}$ to $\theta_{0}$ and rate of $\hat{\theta}$ ’s convergence. Then, we show asymptotic distribution of the estimates using rescaled versions of the parameters and criterions.

A.2.1 Consistency.

Constrained estimator of the coefficients, $\hat{\alpha}(\gamma)=\arg\min_{\alpha\in A}\hat{Q}_{n}(\alpha,\gamma)$ , given a fixed $\gamma$ can be expressed as

\hat{\alpha}(\gamma)=-(\bar{M}_{n}(\gamma)^{\prime}W_{n}\bar{M}_{n}(\gamma))^{-1}\bar{M}_{n}(\gamma)^{\prime}W_{n}\bar{v}_{n}

where

\bar{v}_{n}=-\bar{M}_{n}\alpha_{0}+u_{n},\quad u_{n}=\frac{1}{n}\sum_{i=1}^{n}\begin{pmatrix}z_{it_{0}}\Delta\epsilon_{it_{0}}\\ \vdots\\ z_{iT}\Delta\epsilon_{iT}\end{pmatrix}.

Therefore,

\hat{\alpha}(\gamma)=-(\bar{M}_{n}(\gamma)^{\prime}W_{n}\bar{M}_{n}(\gamma))^{-1}\bar{M}_{n}(\gamma)^{\prime}W_{n}(-\bar{M}_{n}\alpha_{0}+u_{n}).

Define profiled criterion with respect to $\gamma$ by $\tilde{g}_{n}(\gamma)=\bar{g}_{n}(\hat{\alpha}(\gamma),\gamma)$ and $\tilde{Q}_{n}(\gamma)=\tilde{g}_{n}(\gamma)^{\prime}W_{n}\tilde{g}_{n}(\gamma)$ . The threshold location estimator is $\hat{\gamma}=\arg\min_{\gamma\in\Gamma}\tilde{g}_{n}(\gamma)^{\prime}W_{n}\tilde{g}_{n}(\gamma)$ . By the law of large numbers (LLN), $u_{n}\xrightarrow{p}0$ . By the uniform law of large numbers (ULLN) in Lemma D.2, $\bar{M}_{n}(\gamma)\xrightarrow{p}M_{0}(\gamma)$ uniformly with respect to $\gamma\in\Gamma$ . Hence, $\hat{\gamma}\xrightarrow{p}\gamma_{0}$ would imply $\bar{M}_{n}(\hat{\gamma})\xrightarrow{p}M_{0}$ , and then $\hat{\alpha}(\hat{\gamma})\xrightarrow{p}\alpha_{0}$ , which completes the proof.

To show consistency of $\hat{\gamma}$ to $\gamma_{0}$ , we apply the argmin/argmax continuous mapping theorem (CMT) as in Theorem 3.2.2 in van der Vaart and Wellner, (1996). It is sufficient to check (i) $\tilde{Q}_{n}(\gamma)$ uniformly converges to some function $\tilde{Q}_{0}(\gamma)$ in probability, and (ii) $\tilde{Q}_{0}(\gamma_{0})<\inf_{\gamma\not\in\mathcal{O}}\tilde{Q}_{0}(\gamma)$ for any open set $\mathcal{O}$ contatining $\gamma_{0}$ . (ii) can be shown if $\tilde{Q}_{0}(\gamma)$ is uniquely minimized at $\gamma_{0}$ and continuous as $\Gamma$ is compact.

The profiled moment can be rewritten as

\tilde{g}_{n}(\gamma)=[I-\bar{M}_{n}(\gamma)\left(\bar{M}_{n}(\gamma)^{\prime}W_{n}\bar{M}_{n}(\gamma)\right)^{-1}\bar{M}_{n}(\gamma)^{\prime}W_{n}](-\bar{M}_{n}\alpha_{0}+u_{n}).

Therefore,

W_{n}^{1/2}\tilde{g}_{n}(\gamma)=[I-P_{W_{n}^{1/2}\bar{M}_{n}(\gamma)}](-W_{n}^{1/2}\bar{M}_{n}\alpha_{0}+W_{n}^{1/2}u_{n}),

where $P_{W_{n}^{1/2}\bar{M}_{n}(\gamma)}=W_{n}^{1/2}\bar{M}_{n}(\gamma)\left(\bar{M}_{n}(\gamma)^{\prime}W_{n}\bar{M}_{n}(\gamma)\right)^{-1}\bar{M}_{n}(\gamma)^{\prime}W_{n}^{1/2}$ is a projection matrix to the column space of $W_{n}^{1/2}\bar{M}_{n}(\gamma)$ . The profiled objective can be written as

\tilde{Q}_{n}(\gamma)=\|(I-P_{W_{n}^{1/2}\bar{M}_{n}(\gamma)})(-W_{n}^{1/2}\bar{M}_{n}\alpha_{0}+W_{n}^{1/2}u_{n})\|^{2}.

By $W_{n}\xrightarrow{p}W$ , $u_{n}\xrightarrow{p}0$ , and $\sup_{\gamma\in\Gamma}\|\bar{M}_{n}(\gamma)-M_{0}(\gamma)\|\xrightarrow{p}0$ , we can derive that

\tilde{Q}_{n}(\gamma)\xrightarrow{p}\tilde{Q}_{0}(\gamma)=\|(I-P_{W^{1/2}M_{0}(\gamma)})W^{1/2}M_{0}\alpha_{0}\|^{2}

uniformly with respect to $\gamma$ , where $P_{W^{1/2}M_{0}(\gamma)}=W^{1/2}M_{0}(\gamma)\left(M_{0}(\gamma)^{\prime}WM_{0}(\gamma)\right)^{-1}M_{0}(\gamma)^{\prime}W^{1/2}$ . Note that $W=\Omega^{-1}$ in the second stage of the two-step GMM estimation. $W=I$ when we consider the first stage. $\tilde{Q}_{0}(\gamma)$ is uniquely minimized when $\gamma=\gamma_{0}$ . This is because $W$ is positive definite, and the conditions in Theorem 1 implies that $M_{0}\alpha_{0}$ does not lie in the column space of $M_{0}(\gamma)$ whenever $\gamma\neq\gamma_{0}$ . Moreover, $\tilde{Q}_{0}(\gamma)$ is continuous as $M_{0}(\gamma)$ is continuous with respect to $\gamma$ by D.

A.2.2 Convergence rate.

$\|W_{n}-\Omega^{-1}\|\xrightarrow{p}0$ as the consistency of $\hat{\theta}_{(1)}$ is shown. Our proof follows arguments similar to the proof of Theorem 3.3 by Pakes and Pollard, (1989). By the consistency of $\hat{\theta}$ and by Lemma D.3,

\sqrt{n}\|\bar{g}_{n}(\hat{\theta})-\bar{g}_{n}(\theta_{0})-g_{0}(\hat{\theta})\|=o_{p}(1).

By $\|W_{n}-\Omega^{-1}\|\xrightarrow{p}0$ , we can obtain

\sqrt{n}\|W_{n}^{1/2}\bar{g}_{n}(\hat{\theta})-W_{n}^{1/2}\bar{g}_{n}(\theta_{0})-\Omega^{-1/2}g_{0}(\hat{\theta})\|=o_{p}(1).

Apply triangle inequality to get

\sqrt{n}\|\Omega^{-1/2}g_{0}(\hat{\theta})\|\leq o_{p}(1)+\sqrt{n}\|W_{n}^{1/2}\bar{g}_{n}(\theta_{0})\|+\sqrt{n}\|W_{n}^{1/2}\bar{g}_{n}(\hat{\theta})\|.

As $\hat{\theta}$ is the minimizer of the GMM criterion, $\sqrt{n}\|W_{n}^{1/2}\bar{g}_{n}(\hat{\theta})\|\leq o_{p}(1)+\sqrt{n}\|W_{n}^{1/2}\bar{g}_{n}(\theta_{0})\|=O_{p}(1)$ . Therefore,

\sqrt{n}\|\Omega^{-1/2}g_{0}(\hat{\theta})\|\leq O_{p}(1).

$\sqrt{n}\|\Omega^{-1/2}g_{0}(\hat{\theta})\|\geq\sqrt{n}\|\Omega^{-1/2}D_{2}(\hat{\alpha}^{\prime}-\alpha_{0}^{\prime},(\hat{\gamma}-\gamma_{0})^{2})^{\prime}\|-\sqrt{n}\|\Omega^{-1/2}(g(\hat{\theta})-D_{2}(\hat{\alpha}^{\prime}-\alpha_{0}^{\prime},(\hat{\gamma}-\gamma_{0})^{2})^{\prime})\|$ , while $\sqrt{n}\|\Omega^{-1/2}(g(\hat{\theta})-D_{2}(\hat{\alpha}^{\prime}-\alpha_{0}^{\prime},(\hat{\gamma}-\gamma_{0})^{2})^{\prime})\|\leq o_{p}(1+\sqrt{n}\|(\hat{\alpha}^{\prime}-\alpha_{0}^{\prime},(\hat{\gamma}-\gamma_{0})^{2})^{\prime}\|)$ by Lemma D.1. Thus,

\sqrt{n}(\|\hat{\alpha}-\alpha_{0}\|+(\hat{\gamma}-\gamma_{0})^{2})\leq O_{p}(1)

which implies $\|\hat{\alpha}-\alpha_{0}\|=O_{p}(n^{-1/2})$ and $(\hat{\gamma}-\gamma_{0})^{2}=O_{p}(n^{-1/2})$ .

A.2.3 Asymptotic distribution.

This section derives asymptotic distribution of the estimator through the argmin/argmax continuous mapping theorem (CMT) as in Theorem 3.2.2 in van der Vaart and Wellner, (1996).

Introduce a local reparametrization by $a=\sqrt{n}(\alpha-\alpha_{0})$ and $b=n^{\frac{1}{4}}(\gamma-\gamma_{0})$ , and let $a$ consist of subvectors $a_{1}=\sqrt{n}(\beta-\beta_{0})$ and $a_{2}=\sqrt{n}(\delta-\delta_{0})$ . Additionally, define $\hat{a}=\sqrt{n}(\hat{\alpha}-\alpha_{0})$ and $\hat{b}=n^{\frac{1}{4}}(\hat{\gamma}-\gamma_{0})$ . Note that $(\hat{a},\hat{b}^{2})$ is uniformly tight due to the convergence rate we obtained.³³3 A random variable $X$ is tight if for any $\epsilon>0$ , there exists a compact set $\mathbb{K}$ such that $P(X\in\mathbb{K})>1-\epsilon$ , and $X_{n}$ is uniformly tight if for any $\epsilon>0$ , there exists a compact set $\mathbb{K}$ such that $P(X_{n}\in\mathbb{K})>1-\epsilon$ for all $n\in\mathbb{N}$ . Note that by the convergence rate we derived, for any $\epsilon>0$ , there exists a compact $\mathbb{K}_{0}$ such that $\lim_{n\rightarrow\infty}P((\sqrt{n}(\hat{\alpha}-\alpha_{0})^{\prime},\sqrt{n}(\hat{\gamma}-\gamma_{0})^{2})^{\prime}\in\mathbb{K}_{0})>1-\epsilon/2$ , and $N<\infty$ such that $P((\sqrt{n}(\hat{\alpha}-\alpha_{0})^{\prime},\sqrt{n}(\hat{\gamma}-\gamma_{0})^{2})^{\prime}\in\mathbb{K}_{0})>1-\epsilon$ if $n\geq N$ . Then, we can define a compact set $\mathbb{K}=(\cup_{j=1}^{N-1}\mathbb{K}_{j})\cup\mathbb{K}_{0}$ , where $\mathbb{K}_{j}$ is a compact set such that $P((\sqrt{j}(\hat{\alpha}-\alpha_{0})^{\prime},\sqrt{j}(\hat{\gamma}-\gamma_{0})^{2})^{\prime}\in\mathbb{K}_{j})>1-\epsilon$ , which satisfies $P((\sqrt{n}(\hat{\alpha}-\alpha_{0})^{\prime},\sqrt{n}(\hat{\gamma}-\gamma_{0})^{2})^{\prime}\in\mathbb{K})>1-\epsilon$ for all $n\in\mathbb{N}$ . Let

\mathbb{S}_{n}(a,b)=n\hat{Q}_{n}(\alpha_{0}+\tfrac{a}{\sqrt{n}},\gamma_{0}+\tfrac{b}{n^{\frac{1}{4}}})=n\bar{g}_{n}(\alpha_{0}+\tfrac{a}{\sqrt{n}},\gamma_{0}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime}W_{n}\bar{g}_{n}(\alpha_{0}+\tfrac{a}{\sqrt{n}},\gamma_{0}+\tfrac{b}{n^{\frac{1}{4}}}).

We show that (i) $\mathbb{S}_{n}$ weakly converges to a stochastic process $\mathbb{S}$ in $\ell^{\infty}(\mathbb{K})$ for every compact $\mathbb{K}$ in the Euclidean space, (ii) $\mathbb{S}$ is continuous, and (iii) $\mathbb{S}$ possesses an unique optimum not in $b$ but in its square $b^{2}$ since $\mathbb{S}(a,b)=\mathbb{S}(a,-b)$ . Thus, we will establish that $(\hat{a}^{\prime},\hat{b}^{2})^{\prime}$ converges in distribution to $(a_{0}^{\prime},b_{0}^{2})^{\prime}=\arg\min_{a,b^{2}}\mathbb{S}(a,\sqrt{b^{2}})$ . In the characterization of the minimizers, $(a_{0}^{\prime},b_{0}^{2})^{\prime}$ is shown to be tight.

The rescaled and reparametrized sample moment can be written as

\sqrt{n}\bar{g}_{n}(\alpha_{0}+\tfrac{a}{\sqrt{n}},\gamma_{0}+\tfrac{b}{n^{\frac{1}{4}}})=\sqrt{n}\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}}\Delta\epsilon_{it_{0}}\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iT}\Delta\epsilon_{iT}\end{pmatrix}-\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0}}\Delta x_{it_{0}}^{\prime}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iT}\Delta x_{iT}^{\prime}\end{pmatrix}a_{1}\\ -\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0}}1_{it_{0}}(\gamma_{0}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime}X_{it_{0}}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iT}1_{iT}(\gamma_{0}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime}X_{iT}\end{pmatrix}a_{2}+\begin{pmatrix}\tfrac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{it_{0}}(1_{it_{0}}(\gamma_{0})^{\prime}-1_{it_{0}}(\gamma_{0}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{it_{0}}\\ \vdots\\ \tfrac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{iT}(1_{iT}(\gamma_{0})^{\prime}-1_{iT}(\gamma_{0}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{iT}\end{pmatrix}\delta_{0}.

By the central limit theorem (CLT),

\sqrt{n}\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}}\Delta\epsilon_{it_{0}}\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iT}\Delta\epsilon_{iT}\end{pmatrix}\xrightarrow{d}-e\sim N(0,\Omega).

By the LLN,

\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0}}\Delta x_{it_{0}}^{\prime}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iT}\Delta x_{iT}^{\prime}\end{pmatrix}\xrightarrow{p}\begin{pmatrix}Ez_{it_{0}}\Delta x_{it_{0}}^{\prime}\\ \vdots\\ Ez_{iT}\Delta x_{iT}^{\prime}\end{pmatrix}

Let $K<\infty$ be arbitrary. By the ULLN in Lemma D.2,

\left\|\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0}}1_{it_{0}}(\gamma_{0}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime}X_{it_{0}}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iT}1_{iT}(\gamma_{0}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime}X_{iT}\end{pmatrix}-\begin{pmatrix}Ez_{it_{0}}1_{it_{0}}(\gamma_{0}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime}X_{it_{0}}\\ \vdots\\ Ez_{iT}1_{iT}(\gamma_{0}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime}X_{iT}\end{pmatrix}\right\|\xrightarrow{p}0

uniformly with respect to $b\in[-K,K]$ . Then, by continuity of $\kappa\mapsto E[z_{it}1_{it}(\gamma+\kappa)X_{it}]$ at $\kappa=0$ ,

\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0}}1_{it_{0}}(\gamma_{0}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime}X_{it_{0}}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iT}1_{iT}(\gamma_{0}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime}X_{iT}\end{pmatrix}\xrightarrow{p}\begin{pmatrix}Ez_{it_{0}}1_{it_{0}}(\gamma_{0})^{\prime}X_{it_{0}}\\ \vdots\\ Ez_{iT}1_{iT}(\gamma_{0})^{\prime}X_{iT}\end{pmatrix}

uniformly with respect to $b\in[-K,K]$ . By Lemma D.4,

\begin{pmatrix}\tfrac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{it_{0}}(1_{it_{0}}(\gamma_{0})^{\prime}-1_{it_{0}}(\gamma_{0}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{it_{0}}\delta_{0}\\ \vdots\\ \tfrac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{iT}(1_{iT}(\gamma_{0})^{\prime}-1_{iT}(\gamma_{0}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{iT}\delta_{0}\end{pmatrix}\xrightarrow{p}\frac{\delta_{30}}{2}\begin{pmatrix}E_{t_{0}}[z_{it_{0}}|\gamma_{0}]f_{t_{0}}(\gamma_{0})-E_{t_{0}-1}[z_{it_{0}}|\gamma_{0}]f_{t_{0}-1}(\gamma_{0})\\ \vdots\\ E_{T}[z_{iT}|\gamma_{0}]f_{T}(\gamma_{0})-E_{T-1}[z_{iT}|\gamma_{0}]f_{T-1}(\gamma_{0})\end{pmatrix}b^{2}

uniformly with respect to $b\in[-K,K]$ .

Therefore, $\mathbb{S}_{n}(a,b)$ weakly converges to

\mathbb{S}(a,b)=(M_{0}a+Hb^{2}-e)^{\prime}\Omega^{-1}(M_{0}a+Hb^{2}-e),

in $\ell^{\infty}(\mathbb{K})$ for any compact $\mathbb{K}\subset\mathbb{R}^{2p+2}$ . Then, by the CMT,

(\hat{a},\hat{b}^{2})\xrightarrow{d}\arg\min_{a,b^{2}}(M_{0}a+Hb^{2}-e)^{\prime}\Omega^{-1}(M_{0}a+Hb^{2}-e).

Characterization of the minimizers

Next, we characterize the minimizers. The objective function of the minimization problem is strictly convex with respect to $a$ and $b^{2}$ , since $\left[\begin{array}[]{c;{2pt/2pt}c}M_{0}&H\end{array}\right]$ has full column rank and $\Omega^{-1}$ is positive definite. Hence, a solution $(a_{0}^{\prime},b_{0}^{2})^{\prime}$ can be characterized by the Karush-Kuhn-Tucker (KKT) conditions. See Chapter 5 in Boyd and Vandenberghe, (2004) for more details.

The Lagrangian for this problem is

\mathcal{L}(a,b,\lambda)=a^{\prime}M_{0}^{\prime}\Omega^{-1}M_{0}a+2a^{\prime}M_{0}^{\prime}\Omega^{-1}Hb^{2}+H^{\prime}\Omega^{-1}Hb^{4}-2a^{\prime}M_{0}^{\prime}\Omega^{-1}e-2H^{\prime}\Omega^{-1}e\cdot b^{2}+e^{\prime}\Omega^{-1}e-\lambda b^{2}

and the gradient of the Lagrangian with respect to $a$ and $b^{2}$ should vanish:

	$\displaystyle a:$	$\displaystyle\quad M_{0}^{\prime}\Omega^{-1}M_{0}a+M_{0}^{\prime}\Omega^{-1}Hb^{2}-M_{0}^{\prime}\Omega^{-1}e=0$
	$\displaystyle b^{2}:$	$\displaystyle\quad H^{\prime}\Omega^{-1}Hb^{2}+H^{\prime}\Omega^{-1}M_{0}a-H^{\prime}\Omega^{-1}e-\lambda=0.$

In addition, $\lambda\geq 0$ and $\lambda b^{2}=0$ should hold.

(i)

When $\lambda=0$ and $b^{2}\geq 0$ , we can obtain

b^{2}=(H^{\prime}\Omega^{-1/2}(I-P_{\Omega^{-1/2}M_{0}})\Omega^{-1/2}H)^{-1}H^{\prime}\Omega^{-1/2}(I-P_{\Omega^{-1/2}M_{0}})\Omega^{-1/2}e,

where $P_{\Omega^{-1/2}M_{0}}=\Omega^{-1/2}M_{0}(M_{0}^{\prime}\Omega^{-1}M_{0})^{-1}M_{0}^{\prime}\Omega^{-1/2}$ is the projection matrix to the column space of $\Omega^{-1/2}M_{0}$ . $H^{\prime}\Omega^{-1/2}(I-P_{\Omega^{-1/2}M_{0}})\Omega^{-1/2}H>0$ because the matrix $\left[\begin{array}[]{c;{2pt/2pt}c}M_{0}&H\end{array}\right]$ has full column rank, and $\Omega^{-1/2}H$ cannot be in the column space of $\Omega^{-1/2}M_{0}$ and $(I-P_{\Omega^{-1/2}M_{0}})\Omega^{-1/2}H\neq 0$ . Therefore,

H^{\prime}\Omega^{-1/2}(I-P_{\Omega^{-1/2}M_{0}})\Omega^{-1/2}e\geq 0

should hold for the feasibility condition $b^{2}\geq 0$ .

(ii)

When $\lambda>0$ and $b^{2}=0$ , we can obtain

$a=(M_{0}^{\prime}\Omega^{-1}M_{0})^{-1}M_{0}^{\prime}\Omega^{-1}e.$

By plugging this into the equation for $b^{2}$ , we get

$H^{\prime}\Omega^{-1/2}(I-P_{\Omega^{-1/2}M_{0}})\Omega^{-1/2}e<0.$

Thus,

b_{0}^{2}=\begin{cases}[H^{\prime}\Xi H]^{-1}H^{\prime}\Xi e&\text{if }H^{\prime}\Xi e\geq 0\\ \quad\quad 0&\text{else}\\ \end{cases}

where $\Xi=\Omega^{-1/2}(I-P_{\Omega^{-1/2}M_{0}})\Omega^{-1/2}$ . $b_{0}^{2}$ follows a normal distribution that is left censored at 0. Then,

a_{0}=\begin{cases}(M_{0}^{\prime}\Omega^{-1}M_{0})^{-1}M_{0}^{\prime}\Omega^{-1}[I-H[H^{\prime}\Xi H]^{-1}H^{\prime}\Xi]e&\text{if }H^{\prime}\Xi e\geq 0\\ (M_{0}^{\prime}\Omega^{-1}M_{0})^{-1}M_{0}^{\prime}\Omega^{-1}e&\text{else.}\end{cases}

Note that the two normal variables $(M_{0}^{\prime}\Omega^{-1}M_{0})^{-1}M_{0}^{\prime}\Omega^{-1}H[H^{\prime}\Xi H]^{-1}H^{\prime}\Xi e$ and $(M_{0}^{\prime}\Omega^{-1}M_{0})^{-1}M_{0}^{\prime}\Omega^{-1}e$ are independent of each other, because $E[H^{\prime}\Xi ee^{\prime}\Omega^{-1}M_{0}]=H^{\prime}\Omega^{-1/2}(I-P_{\Omega^{-1/2}M_{0}})\Omega^{-1/2}M_{0}$ becomes zero.

Appendix B Proofs for Section 4

B.1 Preliminaries

The bootstrap methods we consider are Algorithm 1 with different choices of $\theta_{0}^{*}$ . There are three bootstrap methods this paper propose: (i) $\theta_{0}^{*}=(\hat{\alpha}(\gamma),\gamma)^{\prime}$ for $\gamma\in\Gamma$ , (ii) $\theta_{0}^{*}$ set as (8), and (iii) $\theta_{0}^{*}=\tilde{\theta}$ which is the continuity-restricted estimator. In Appendix F, we consider the case $\theta_{0}^{*}=\hat{\theta}$ which results in the standard nonparametric bootstrap.

The probability law for the bootstrap is formalized following Goncalves and White, (2004). Let $P$ be the probability measure for data and $P^{*}$ be the conditional probability law of bootstrap given observations. $Z_{n}^{*}\xrightarrow{p^{*}}0$ in $P$ ( $Z_{n}^{*}=o_{p}^{*}(1)$ in $P$ ) if for any $\epsilon,\delta>0$ , $P(P^{*}(|Z_{n}^{*}|>\epsilon)>\delta)\rightarrow 0$ as $n\rightarrow\infty$ . $Z_{n}^{*}=O_{p}^{*}(1)$ in $P$ if for any $\epsilon>0$ and $\delta>0$ , there exists $M<\infty$ such that $\limsup_{n}P(P^{*}(|Z_{n}^{*}|\geq M)>\delta)<\epsilon$ . $Z_{n}^{*}\xrightarrow{d^{*}}Z$ in $P$ if $E^{*}f(Z_{n}^{*})\rightarrow Ef(Z)$ in $P$ for every continuous and bounded function $f$ , where $E^{*}$ is the expectation by the bootstrap probability law conditional on observations. $Z_{n}^{*}\overset{*}{\rightsquigarrow}Z$ in $\ell^{\infty}(\mathbb{K})$ in $P$ if $\sup_{f\in BL_{1}}|E^{*}f(Z_{n}^{*})-Ef(Z_{n})|\xrightarrow{p}0$ , where $BL_{1}$ is the set of all Lipschitz functions on $\ell^{\infty}(\mathbb{K})$ bounded in $[0,1]$ such that $|f(z_{1})-f(z_{2})|\leq\|z_{1}-z_{2}\|_{\ell^{\infty}(\mathbb{K})}=\sup_{x\in\mathbb{K}}|z_{1}(x)-z_{2}(x)|$ .

The following lemma is useful in analyzing bootstrap stochastic orders.

Lemma B.1.

(i)

If $A_{n}=o_{p}(1)$ or $O_{p}(1)$ , then $A_{n}=o_{p}^{*}(1)$ or $O_{p}^{*}(1)$ in $P$ , respectively.
(ii)

Let $Z_{n}^{*}=o_{p}^{*}(1)$ in $P$ and $W_{n}^{*}=O_{p}^{*}(1)$ in $P$ . Then, $Z_{n}^{*}\times W_{n}^{*}=o_{p}^{*}(1)$ in $P$ .

Proof.

See Lemma 3 in Cheng and Huang, (2010). ∎

Recall that $W_{n}^{*}=\{[\frac{1}{n}\sum_{i=1}^{n}g_{i}^{*}(\hat{\theta}_{(1)}^{*})g_{i}^{*}(\hat{\theta}_{(1)}^{*})^{\prime}]-[\frac{1}{n}\sum_{i=1}^{n}g_{i}^{*}(\hat{\theta}_{(1)}^{*})][\frac{1}{n}\sum_{i=1}^{n}g_{i}^{*}(\hat{\theta}_{(1)}^{*})^{\prime}]\}^{-1}$ . $\|W_{n}^{*}-\Omega^{-1}\|=o_{p}^{*}(1)$ in $P$ when $\hat{\theta}_{(1)}^{*}\xrightarrow{p^{*}}\theta_{0}$ in $P$ . This would be the case when $\|\hat{\theta}_{(1)}^{*}-\theta_{0}^{*}\|\xrightarrow{p^{*}}0$ in $P$ and $\|\theta_{0}^{*}-\theta_{0}\|=o_{p}(1)$ since then $\|\hat{\theta}_{(1)}^{*}-\theta_{0}\|\leq\|\hat{\theta}_{(1)}^{*}-\theta_{0}^{*}\|+\|\theta_{0}^{*}-\theta_{0}\|=o_{p}^{*}(1)$ in $P$ by Lemma B.1.

B.2 Proof of Theorem 6.

As in the proof of Theorem 2, consistency and convergence rates of the bootstrap estimator should be derived first. These results are summarized in the following proposition, with the proof provided in Online Appendix E.

Proposition 1.

(i) Under the assumptions of the case (i) in Theorems 5, 6, or 7,

\sqrt{n}(\hat{\alpha}^{*}-\alpha_{0}^{*})=O_{p}^{*}(1)\text{ in $P$, and }\sqrt{n}(\hat{\gamma}^{*}-\gamma_{0}^{*})^{2}=O_{p}^{*}(1)\text{ in $P$.}

(ii) Under the assumptions of the case (ii) in Theorems 5 or 6,

\sqrt{n}(\hat{\alpha}^{*}-\alpha_{0}^{*})=O_{p}^{*}(1)\text{ in $P$, and }\sqrt{n}(\hat{\gamma}^{*}-\gamma_{0}^{*})=O_{p}^{*}(1)\text{ in $P$.}

Then, we derive the (conditional) weak convergence limit of the rescaled criterion and apply the CMT to obtain the asymptotic distribution of the bootstrap estimator.

Asymptotic distribution under continuity.

Based on the convergence rate in Proposition 1, introduce the local reparametrization by $a=\sqrt{n}(\alpha-\alpha_{0}^{*})$ and $b=n^{\frac{1}{4}}(\gamma-\gamma_{0}^{*})$ , and let $a$ consist of subvectors $a_{1}=\sqrt{n}(\beta-\beta_{0}^{*})$ and $a_{2}=\sqrt{n}(\delta-\delta_{0}^{*})$ .

The asymptotic distributions of the bootstrap estimators can be derived by using the argmin/argmax CMT as in the proof of Theorem 2. Let

\mathbb{S}_{n}^{*}(a,b)=n\hat{Q}_{n}^{*}(\alpha_{0}^{*}+\tfrac{a}{\sqrt{n}},\gamma_{0}^{*}+\tfrac{b}{n^{\frac{1}{4}}})=n\bar{g}_{n}^{*}(\alpha_{0}^{*}+\tfrac{a}{\sqrt{n}},\gamma_{0}^{*}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime}W_{n}^{*}\bar{g}_{n}^{*}(\alpha_{0}^{*}+\tfrac{a}{\sqrt{n}},\gamma_{0}^{*}+\tfrac{b}{n^{\frac{1}{4}}}).

We show that $\mathbb{S}_{n}^{*}\overset{*}{\rightsquigarrow}\mathbb{S}$ in $\ell^{\infty}(\mathbb{K})$ in $P$ for every compact $\mathbb{K}$ in the Euclidean space. Recall that $\mathbb{S}(a,b)=(M_{0}a+Hb^{2}-e)^{\prime}\Omega^{-1}(M_{0}a+Hb^{2}-e)$ .

The rescaled and reparametrized bootstrap moment can be written as

	$\displaystyle\sqrt{n}\bar{g}_{n}^{}(\alpha_{0}^{}+\tfrac{a}{\sqrt{n}},\gamma_{0}^{*}+\tfrac{b}{n^{\frac{1}{4}}})=$	$\displaystyle\sqrt{n}\left\{\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}}^{}\widehat{\Delta\epsilon}_{it_{0}}^{}\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iT}^{}\widehat{\Delta\epsilon}_{iT}^{}\end{pmatrix}-\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}}\widehat{\Delta\epsilon}_{it_{0}}\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iT}\widehat{\Delta\epsilon}_{iT}\end{pmatrix}\right\}$
		$\displaystyle-\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0}}^{}\Delta x_{it_{0}}^{\prime}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iT}^{}\Delta x_{iT}^{\prime}\end{pmatrix}a_{1}-\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0}}^{}1_{it_{0}}^{}(\gamma_{0}^{}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime}X_{it_{0}}^{}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iT}^{}1_{iT}^{}(\gamma_{0}^{}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime}X_{iT}^{}\end{pmatrix}a_{2}$
		$\displaystyle+\sqrt{n}\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0}}^{}(1_{it_{0}}^{}(\gamma_{0}^{})^{\prime}-1_{it_{0}}(\gamma_{0}^{}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{it_{0}}^{}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iT}^{}(1_{iT}^{}(\gamma_{0}^{})^{\prime}-1_{iT}(\gamma_{0}^{}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{iT}^{}\end{pmatrix}\delta_{0}^{*}.$

By Lemma E.2,

\sqrt{n}\left\{\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}}^{*}\widehat{\Delta\epsilon}_{it_{0}}^{*}\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iT}^{*}\widehat{\Delta\epsilon}_{iT}^{*}\end{pmatrix}-\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}}\widehat{\Delta\epsilon}_{it_{0}}\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iT}\widehat{\Delta\epsilon}_{iT}\end{pmatrix}\right\}\xrightarrow{d^{*}}-e\sim N(0,\Omega)\quad\text{ in $P$.}

By the bootstrap LLN,

\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0}}^{*}\Delta x_{it_{0}}^{*\prime}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iT}^{*}\Delta x_{iT}^{*\prime}\end{pmatrix}\xrightarrow{p^{*}}\begin{pmatrix}Ez_{it_{0}}\Delta x_{it_{0}}^{\prime}\\ \vdots\\ Ez_{iT}\Delta x_{iT}^{\prime}\end{pmatrix}\quad\text{ in $P$.}

Let $K<\infty$ be arbitrary. By bootstrap Glivenko-Cantelli, e.g., Lemma 3.6.16 in van der Vaart and Wellner, (1996),

\sup_{b:|b|\leq K,\gamma\in\Gamma}\left\|\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0}}^{*}1_{it_{0}}^{*}(\gamma+\tfrac{b}{n^{\frac{1}{4}}})^{\prime}X_{it_{0}}^{*}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iT}^{*}1_{iT}^{*}(\gamma+\tfrac{b}{n^{\frac{1}{4}}})^{\prime}X_{iT}^{*}\end{pmatrix}-\begin{pmatrix}Ez_{it_{0}}1_{it_{0}}(\gamma+\tfrac{b}{n^{\frac{1}{4}}})^{\prime}X_{it_{0}}\\ \vdots\\ Ez_{iT}1_{iT}(\gamma+\tfrac{b}{n^{\frac{1}{4}}})^{\prime}X_{iT}\end{pmatrix}\right\|\xrightarrow{p^{*}}0\quad\text{ in $P$.}

By continuity of $J(\gamma):=E[z_{it}1_{it}(\gamma)X_{it}]$ at $\gamma=\gamma_{0}$ , for any $c>0$ , there exists $h>0$ such that $\|J(\gamma)-J(\gamma_{0})\|<c$ if $|\gamma-\gamma_{0}|<h$ . For any $h>0$ , $P(|\gamma_{0}-\gamma_{0}^{*}-\frac{b}{n^{\frac{1}{4}}}|>h)\rightarrow 0$ . Note that $\{\|\frac{1}{n}\sum_{i=1}^{n}z_{it}^{*}1_{it}^{*}(\gamma_{0}^{*}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime}X_{it}^{*}-J(\gamma_{0})\|>2c\}\subseteq\{\|\frac{1}{n}\sum_{i=1}^{n}z_{it}^{*}1_{it}^{*}(\gamma_{0}^{*}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime}X_{it}^{*}-J(\gamma_{0}^{*}+\tfrac{b}{n^{\frac{1}{4}}})\|>c\}\cup\{\|J(\gamma_{0}^{*}+\tfrac{b}{n^{\frac{1}{4}}})-J(\gamma_{0})\|>c\}$ , and hence $P^{*}(\|\frac{1}{n}\sum_{i=1}^{n}z_{it}^{*}1_{it}^{*}(\gamma_{0}^{*}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime}X_{it}^{*}-J(\gamma_{0})\|>2c)\leq P^{*}(\|\frac{1}{n}\sum_{i=1}^{n}z_{it}^{*}1_{it}^{*}(\gamma_{0}^{*}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime}X_{it}^{*}-J(\gamma_{0}^{*}+\tfrac{b}{n^{\frac{1}{4}}})\|>c)$ with probability approaching 1, while $P^{*}(\|\frac{1}{n}\sum_{i=1}^{n}z_{it}^{*}1_{it}^{*}(\gamma_{0}^{*}+\frac{b}{n^{\frac{1}{4}}})^{\prime}X_{it}^{*}-J(\gamma_{0}^{*}+\frac{b}{n^{\frac{1}{4}}})\|>c)\xrightarrow{p}0$ uniformly with respect to $b\in[-K,K]$ . Thus,

\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0}}^{*}1_{it_{0}}^{*}(\gamma_{0}^{*}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime}X_{it_{0}}^{*}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iT}^{*}1_{iT}^{*}(\gamma_{0}^{*}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime}X_{iT}^{*}\end{pmatrix}\xrightarrow{p^{*}}\begin{pmatrix}Ez_{it_{0}}1_{it_{0}}(\gamma_{0})^{\prime}X_{it_{0}}\\ \vdots\\ Ez_{iT}1_{iT}(\gamma_{0})^{\prime}X_{iT}\end{pmatrix}\text{ in $P$,}

both uniformly with respect to $b\in[-K,K]$ . By Lemma E.5,

\begin{pmatrix}\tfrac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{it_{0}}^{*}(1_{it_{0}}^{*}(\gamma_{0}^{*})^{\prime}-1_{it_{0}}(\gamma_{0}^{*}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{it_{0}}^{*}\\ \vdots\\ \tfrac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{iT}^{*}(1_{iT}^{*}(\gamma_{0}^{*})^{\prime}-1_{iT}(\gamma_{0}^{*}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{iT}^{*}\end{pmatrix}\delta_{0}^{*}\\ \xrightarrow{p^{*}}\frac{\delta_{30}}{2}\begin{pmatrix}E_{t_{0}}[z_{it_{0}}|\gamma_{0}]f_{t_{0}}(\gamma_{0})-E_{t_{0}-1}[z_{it_{0}}|\gamma_{0}]f_{t_{0}-1}(\gamma_{0})\\ \vdots\\ E_{T}[z_{iT}|\gamma_{0}]f_{T}(\gamma_{0})-E_{T-1}[z_{iT}|\gamma_{0}]f_{T-1}(\gamma_{0})\end{pmatrix}b^{2}\quad\text{ in $P$}

uniformly with respect to $b\in[-K,K]$ .

Therefore, $\mathbb{S}_{n}^{*}(a,b)\overset{*}{\rightsquigarrow}\mathbb{S}(a,b)$ in $\ell^{\infty}(\mathbb{K})$ in $P$ for any compact $\mathbb{K}\subset\mathbb{R}^{2p+2}$ . Then, by applying the argmin CMT as in the proof of Theorem 2, we can obtain the limit distribution of the bootstrap estimates conditional on the data.

Asymptotic distribution under discontinuity.

The proof for the discontinuous model only requires a slight change to the proof for the continuous model. As the convergence rate for the discontinuous model is $\sqrt{n}$ for both coefficients and threshold location estimators, let $a$ be unchanged and $b=\sqrt{n}(\gamma-\gamma_{0}^{*})$ for the local reparametrization. Let

\mathbb{S}_{n}^{*}(a,b)=n\hat{Q}_{n}^{*}(\alpha_{0}^{*}+\tfrac{a}{\sqrt{n}},\gamma_{0}^{*}+\tfrac{b}{\sqrt{n}})=n\bar{g}_{n}^{*}(\alpha_{0}^{*}+\tfrac{a}{\sqrt{n}},\gamma_{0}^{*}+\tfrac{b}{\sqrt{n}})^{\prime}W_{n}^{*}\bar{g}_{n}^{*}(\alpha_{0}^{*}+\tfrac{a}{\sqrt{n}},\gamma_{0}^{*}+\tfrac{b}{\sqrt{n}}).

We can write the rescaled and reparametrized moment as follows:

\sqrt{n}\bar{g}_{n}^{*}(\alpha_{0}^{*}+\tfrac{a}{\sqrt{n}},\gamma_{0}^{*}+\tfrac{b}{\sqrt{n}})=\sqrt{n}\left\{\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}}^{*}\widehat{\Delta\epsilon}_{it_{0}}^{*}\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iT}^{*}\widehat{\Delta\epsilon}_{iT}^{*}\end{pmatrix}-\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}}\widehat{\Delta\epsilon}_{it_{0}}\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iT}\widehat{\Delta\epsilon}_{iT}\end{pmatrix}\right\}-\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0}}^{*}\Delta x_{it_{0}}^{*\prime}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iT}^{*}\Delta x_{iT}^{*\prime}\end{pmatrix}a_{1}\\ -\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0}}^{*}1_{it_{0}}^{*}(\gamma_{0}^{*}+\tfrac{b}{\sqrt{n}})^{\prime}X_{it_{0}}^{*}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iT}^{*}1_{iT}^{*}(\gamma_{0}^{*}+\tfrac{b}{\sqrt{n}})^{\prime}X_{iT}^{*}\end{pmatrix}a_{2}+\sqrt{n}\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0}}^{*}(1_{it_{0}}^{*}(\gamma_{0}^{*})^{\prime}-1_{it_{0}}(\gamma_{0}^{*}+\tfrac{b}{\sqrt{n}})^{\prime})X_{it_{0}}^{*}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iT}^{*}(1_{iT}^{*}(\gamma_{0}^{*})^{\prime}-1_{iT}(\gamma_{0}^{*}+\tfrac{b}{\sqrt{n}})^{\prime})X_{iT}^{*}\end{pmatrix}\delta_{0}^{*}.

The limit of $\sqrt{n}\bar{g}_{n}^{*}(\alpha_{0}^{*}+\tfrac{a}{\sqrt{n}},\gamma_{0}^{*}+\tfrac{b}{\sqrt{n}})$ can be obtained similarly to the continuous model case, except that we use Lemma E.6 instead of Lemma E.5 to get

\sqrt{n}\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0}}^{*}(1_{it_{0}}^{*}(\gamma_{0}^{*})^{\prime}-1_{it_{0}}(\gamma_{0}^{*}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{it_{0}}^{*}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iT}^{*}(1_{iT}^{*}(\gamma_{0}^{*})^{\prime}-1_{iT}(\gamma_{0}^{*}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{iT}^{*}\end{pmatrix}\delta_{0}^{*}\\ \xrightarrow{p^{*}}\begin{pmatrix}E_{t_{0}}[z_{it_{0}}(1,x_{it_{0}}^{\prime})\delta_{0}|\gamma_{0}]f_{t_{0}}(\gamma_{0})-E_{t_{0}-1}[z_{it_{0}}(1,x_{it_{0}}^{\prime})\delta_{0}|\gamma_{0}]f_{t_{0}-1}(\gamma_{0})\\ \vdots\\ E_{T}[z_{iT}(1,x_{iT}^{\prime})\delta_{0}|\gamma_{0}]f_{T}(\gamma_{0})-E_{T-1}[z_{iT}(1,x_{iT-1}^{\prime})\delta_{0}|\gamma_{0}]f_{T-1}(\gamma_{0})\end{pmatrix}b\quad\text{ in $P$}

uniformly with respect to $b\in[-K,K]$ .

Then, $\mathbb{S}_{n}^{*}(a,b)$ conditonally weakly converges to $\mathbb{S}_{J}(a,b)=(M_{0}a+Gb-e)^{\prime}\Omega^{-1}(M_{0}a+Gb-e)$ in $\ell^{\infty}(\mathbb{K})$ in $P$ for any compact $\mathbb{K}\subset\mathbb{R}^{2p+2}$ . And the argmin CMT yields the asymptotic distribution of the bootstrap estimators. The limit distributions of the bootstrap estimators are normal because $(a_{0}^{\prime},b_{0})^{\prime}=\arg\min_{a,b}\mathbb{S}_{J}(a,b)=(D_{1}^{\prime}\Omega^{-1}D_{1})^{-1}D_{1}^{\prime}\Omega^{-1}e$ . ∎

Online Supplements for “Bootstraps for Dynamic Panel Threshold Models” (Not for Publication)

Woosik Gong and Myung Hwan Seo

This part of the appendix is only for online supplements. It contains supplementary results for the Monte Carlo simulations, the remaining proofs for Theorem 3, Theorem 4, Proposition 1, Theorem 5, Theorem 7, as well as additional lemmas with proofs. It also presents invalidity of the standard nonparametric bootstrap, percentile bootstrap confidence intervals for empirical application, explanation of bootstrap for linearity test, and the uniform validity of the grid bootstrap.

Appendix C Supplementary Results for Monte Carlo Simulation

In this section, we present supplementary results for the Monte Carlo simulations in Section 5.

C.1 Symmetric Percentile Confidence Intervals for Coefficients

First, we report the coverage rates of symmetric percentile CIs for the coefficients that are constructed using the nonparametric bootstrap,

CI^{NPB(S)}_{n,1-\tau}(\alpha_{j})\left[\hat{\alpha}_{j}-\widehat{F}^{*-1}_{n}(1-\tau;|\hat{\alpha}^{*}_{j}-\hat{\alpha}_{j}|),\hat{\alpha}_{j}+\widehat{F}^{*-1}_{n}(1-\tau;|\hat{\alpha}^{*}_{j}-\hat{\alpha}_{j}|)\right].

(C.1)

and the residual bootstrap, defined by (10). Tables 7 and 8 show the coverage rates and the ratios of the average lengths of CIs by the two different bootstrap methods.

In contrast to the results based on non-symmetric percentile CIs in Table 3 in Section 5, Table 7 shows that symmetric CIs provide much higher coverage rates, often resulting in over-coverage. Note that this observation also occurs for the threshold inference as shown in Table 1. Meanwhile, Table 8 shows that the difference in the average lengths of symmetric percentile CIs between the two bootstrap methods is less pronounced compared to the non-symmetric case shown in Table 4.

Table 7: Coverage rates of 95% symmetric percentile CIs for the coefficients are shown. R-B(S) denotes the symmetric percentile CIs by the residual bootstrap defined by (10). NP-B(S) denotes the symmetric percentile CIs by the standard nonparametric bootstrap defined by (C.1).

		R-B(S)					NP-B(S)
	400	0.964	0.976	0.980	0.974	0.930	0.996	0.996	0.996	0.992	0.982
0.0	800	0.951	0.974	0.971	0.967	0.931	0.987	0.992	0.995	0.988	0.976
	1600	0.955	0.972	0.964	0.961	0.923	0.983	0.994	0.995	0.980	0.977
	400	0.964	0.976	0.979	0.974	0.933	0.994	0.993	0.995	0.991	0.982
0.1	800	0.952	0.975	0.970	0.968	0.935	0.990	0.992	0.995	0.989	0.978
	1600	0.959	0.975	0.973	0.961	0.924	0.986	0.995	0.997	0.979	0.977
	400	0.963	0.974	0.978	0.977	0.939	0.995	0.993	0.997	0.993	0.986
0.2	800	0.959	0.972	0.977	0.974	0.929	0.992	0.994	0.996	0.987	0.978
	1600	0.958	0.972	0.976	0.964	0.933	0.986	0.995	0.996	0.979	0.980
	400	0.964	0.971	0.982	0.978	0.940	0.992	0.994	0.998	0.994	0.989
0.5	800	0.960	0.973	0.987	0.974	0.945	0.991	0.994	0.998	0.988	0.985
	1600	0.957	0.977	0.985	0.970	0.945	0.985	0.996	0.998	0.981	0.987
	400	0.970	0.982	0.985	0.984	0.967	0.991	0.995	0.992	0.991	0.993
1.0	800	0.968	0.982	0.988	0.981	0.967	0.992	0.993	0.995	0.989	0.994
	1600	0.960	0.981	0.987	0.972	0.963	0.989	0.995	0.995	0.988	0.989

Table 8: Ratios of the average lengths of 95% symmetric percentile CIs for the coefficients, obtained using different bootstrap methods, are shown. R-B(S) denotes the symmetric percentile CIs by the residual bootstrap defined by (10). NP-B(S) denotes the symmetric percentile CIs by the standard nonparametric bootstrap defined by (C.1).

		Ratios of average lengths of CIs:
		R-B(S) / NP-B(S)
$\delta_{1}+\delta_{3}\gamma$	n	$\beta_{2}$	$\beta_{3}$	$\delta_{1}$	$\delta_{2}$	$\delta_{3}$
	400	1.017	1.035	1.008	0.996	1.010
0.0	800	1.033	1.037	1.007	1.004	1.018
	1600	1.040	1.046	1.012	1.015	1.014
	400	1.028	1.040	1.008	0.996	1.012
0.1	800	1.032	1.033	1.000	1.004	1.015
	1600	1.039	1.047	1.011	1.020	1.016
	400	1.022	1.035	1.003	0.996	1.012
0.2	800	1.032	1.039	1.001	1.004	1.015
	1600	1.039	1.048	1.009	1.025	1.016
	400	1.037	1.046	0.991	1.014	1.016
0.5	800	1.044	1.045	0.991	1.008	1.024
	1600	1.052	1.056	0.996	1.035	1.022
	400	1.101	1.107	0.989	1.042	1.042
1.0	800	1.096	1.111	0.988	1.039	1.052
	1600	1.115	1.136	0.996	1.051	1.048

Although taking symmetric CI brings the coverage probabilities of both bootstraps closer to the nominal level in our Monte Carlo simulations, it is not desirable as both non-symmetric and symmetric percentile CIs should provide similar results if an employed bootstrap scheme is theoretically valid. To investigate the cause of the large difference in coverage rates between symmetric and non-symmetric CIs, we present Figure 2, which displays the sample statistic $(\hat{\delta}_{1}-\delta_{10})$ and the quantiles of the bootstrap test statistics relevant for confidence intervals for each simulated dataset. Figure 2 collects results under the specification $\delta_{1}+\delta_{3}\gamma=0$ , where the model is continuous, with the sample size 1600. Results for other coefficients and other specifications are almost identical and are therefore omitted.

Panels (a) and (b) show the 0.025 and 0.975 bootstrap quantiles of $(\hat{\delta}_{1}^{*}-\hat{\delta}_{1})$ (used for NP-B) and $(\hat{\delta}_{1}^{*}-\delta_{10}^{*})$ (for R-B), respectively. The coverage probability is the frequency that the upper and lower bootstrap quantiles (dots) include the red line (45 degree line) between them. We observe that R-B method improves upon NP-B, as the distance between the two bootstrap quantiles tends to be wider. However, the improvement is not sufficiently large to resolve the undercoverage; see Table 3.

Note that the bootstrap quantiles (dots of each color) would be horizontally flat if they are asymptotically independent to the sample statistic. The nonparametric bootstrap CIs are asymptotically valid if

\sqrt{n}(\hat{\theta}^{*}-\hat{\theta})\xrightarrow{d^{*}}\mathcal{Z}^{*}\text{ in $P$ when }\sqrt{n}(\hat{\theta}-\theta_{0})\xrightarrow{d}\mathcal{Z},

where $\mathcal{Z}^{*}$ is an independent copy of $\mathcal{Z}$ . Therefore, the empirical 95% percentile of $\sqrt{n}(\hat{\delta}_{1}^{*}-\hat{\delta}_{1})$ should be asymptotically independent to $\sqrt{n}(\hat{\delta}_{1}-\delta_{10})$ for the nonparametric bootstrap CI to be valid.

However, as shown in Panel (a), and the bootstrap quantiles are negatively correlated with the sample statistic. Specifically, the correlations between the sample statistic $(\hat{\delta}_{1}-\delta_{10})$ and the 0.975 and 0.025 bootstrap quantiles from NP-B are -0.9037 and -0.8892, respectively. Our residual bootstrap (R-B) mitigates this issue. The bootstrap quantiles in Panel (b) appear flatter compared to those in Panel (a). The corresponding correlations from R-B are -0.7083 and -0.7003 for the 0.975 and 0.025 quantiles, respectively. While the correlations have decreased, they remain far from zero. Further investigation is warranted, although we leave this for future research.

Panels (c) and (d) show the 0.95 bootstrap quantiles of $|\hat{\delta}_{1}^{*}-\hat{\delta}_{1}|$ (for NP-B(S)) and $|\hat{\delta}_{1}^{*}-\delta_{10}^{*}|$ (for R-B(S)), respectively. The coverage probability is the frequency of the dots that lie above the red line. Contrary to Panels (a) and (b), there is no rejection if $\hat{\delta}_{1}^{*}-\delta_{10}^{*}<0$ . Although this brings the coverage probabilities of both bootstraps closer to the nominal level, it is not desirable and misleading.

C.2 Weakly Endogenous Threshold Variable

We additionally report Monte Carlo results when the threshold variable is not weakly exogenous but weakly endogenous, that is, when the variable is predetermined. We consider the dgp same with the one in Section 5 with an exception that (12) is replaced by

\begin{pmatrix}e_{it}\\ u_{it}\end{pmatrix}\overset{iid}{\sim}N\left(\begin{pmatrix}0\\ 0\end{pmatrix},\begin{pmatrix}1&\rho_{eu}\\ \rho_{eu}&1\end{pmatrix}\right),

(C.2)

where $\rho_{eu}=0.5$ . Other parameters such as $\theta=(\beta^{\prime},\delta^{\prime},\gamma)^{\prime}$ and $\sigma$ remain the same as in Section 5. Note that under (12), $E[q_{is}\Delta e_{it}]=0$ if $s\leq t-1$ . On the other hand, $E[q_{is}\Delta e_{it}]=0$ if $s\leq t-2$ but $E[q_{it-1}\Delta e_{it}]\neq 0$ under (C.2). Therefore, we need to exclude $q_{it-1}$ from the instrument such that $z_{it}=(y_{it-2},\dots,y_{i1},q_{it-2},\dots,q_{i1})^{\prime}$ .

We consider the specifications where $\delta_{1}+\delta_{3}\gamma=0,0.5,1$ and repeat Monte Carlo iterations 1,000 times. We report coverage rates of 95% CIs constructed by different bootstrap methods. Tables 9 and 10 show the coverage rates of the threshold location and the coefficients, respectively.

Table 9 shows that Grid-B achieves the most reasonable coverage rates, similar to the results in Table 1 in Section 5. Table 10 shows that both R-B and NP-B are subject to undercoverage for the coefficients, although R-B offers higher coverage rates than NP-B.

Table 9: Coverage rates of 95% CIs for the threshold location. Grid-B denotes the grid bootstrap CI defined by (7). NP-B and NP-B(S) denote the percentile and the symmetric percentile CIs by the standard nonparametric bootstrap defined by (13) and (14).

		$\delta_{1}+\delta_{3}\gamma$
	n	0	0.5	1
	400	0.990	0.983	0.975
Grid-B	800	0.986	0.983	0.965
	1600	0.981	0.975	0.959
	400	0.508	0.519	0.634
NP-B	800	0.443	0.496	0.612
	1600	0.468	0.501	0.610
	400	1.000	0.998	0.994
NP-B(S)	800	1.000	1.000	0.996
	1600	1.000	0.999	0.999

Table 10: Coverage rates of 95% percentile CIs for the coefficients are shown. R-B denotes the percentile CIs by the residual bootstrap defined by (9). NP-B denotes the percentile CIs by the standard nonparametric bootstrap defined by (15).

		R-B					NP-B
$\delta_{1}+\delta_{3}\gamma$	n	$\beta_{2}$	$\beta_{3}$	$\delta_{1}$	$\delta_{2}$	$\delta_{3}$	$\beta_{2}$	$\beta_{3}$	$\delta_{1}$	$\delta_{2}$	$\delta_{3}$
	400	0.753	0.739	0.781	0.796	0.765	0.726	0.658	0.636	0.706	0.691
0.0	800	0.795	0.729	0.783	0.786	0.756	0.764	0.629	0.640	0.709	0.669
	1600	0.832	0.746	0.803	0.787	0.755	0.800	0.647	0.640	0.720	0.674
	400	0.773	0.756	0.757	0.806	0.750	0.740	0.672	0.601	0.725	0.670
0.5	800	0.816	0.736	0.755	0.802	0.770	0.778	0.661	0.580	0.717	0.675
	1600	0.835	0.746	0.776	0.791	0.770	0.811	0.660	0.605	0.720	0.660
	400	0.805	0.777	0.743	0.822	0.754	0.765	0.712	0.618	0.731	0.701
1.0	800	0.829	0.770	0.725	0.798	0.742	0.784	0.685	0.582	0.727	0.683
	1600	0.867	0.799	0.751	0.815	0.762	0.822	0.697	0.576	0.747	0.673

C.3 Coverage Rates by Asymptotic Confidence Intervals

We additionally report coverage rates of CIs based on the asymptotic method described in Seo and Shin, (2016). The dgp remains the same as in Section 5. Tables 11 and 12 show the results for the threshold and the coefficients, respectively.

Table 11 shows that the asymptotic method suffers undercoverage for all specifications we consider and does not improve as the sample size grows. This remains true even when $\delta_{1}+\delta_{3}\gamma=1$ , a case in which the model is discontinuous and the asymptotic CIs are theoretically valid, as shown in Seo and Shin, (2016). This especially highlights the desirability of our grid bootstrap method for inference of the threshold location which achieves good coverage rates in finite samples.

On the other hand, in Table 12, the coverage rates of the coefficients by the asymptotic method are much closer to the nominal level compared to those obtained from the nonparametric bootstrap or our residual bootstrap for both continuous and discontinuous models; see Table 3. We ask readers to be cautious, as it is unclear how the coverage rates of the asymptotic CIs behave when the true model is continuous, as explained in the last paragraph of Section 3.

Table 11: Coverage rates of 95% CIs for the threshold location by the asymptotic method described in Seo and Shin, (2016). The method is based on the asymptotic normality, which holds only when the true model is discontinuous.

	$\delta_{1}+\delta_{3}\gamma$
n	0	0.1	0.2	0.5	1
400	0.881	0.881	0.885	0.884	0.899
800	0.864	0.862	0.860	0.846	0.869
1600	0.837	0.836	0.837	0.836	0.864

Table 12: Coverage rates of 95% CIs for the coefficients by the asymptotic method described in Seo and Shin, (2016). The method is based on the asymptotic normality, which holds only when the true model is discontinuous.

$\delta_{1}+\delta_{3}\gamma$	n	$\beta_{2}$	$\beta_{3}$	$\delta_{1}$	$\delta_{2}$	$\delta_{3}$
	400	0.950	0.923	0.951	0.916	0.970
0.0	800	0.956	0.921	0.952	0.921	0.973
	1600	0.960	0.927	0.956	0.931	0.979
	400	0.947	0.922	0.947	0.917	0.972
0.1	800	0.961	0.923	0.952	0.928	0.973
	1600	0.960	0.929	0.956	0.933	0.983
	400	0.942	0.919	0.947	0.915	0.974
0.2	800	0.959	0.926	0.952	0.926	0.971
	1600	0.957	0.923	0.954	0.933	0.982
	400	0.943	0.922	0.944	0.914	0.977
0.5	800	0.959	0.934	0.953	0.937	0.977
	1600	0.953	0.934	0.953	0.930	0.983
	400	0.949	0.937	0.950	0.925	0.987
1.0	800	0.958	0.952	0.952	0.945	0.985
	1600	0.958	0.949	0.955	0.936	0.981

Appendix D Proofs of Theorems in Section 3 and Auxiliary Lemmas

Additional notations

We introduce additional notations as lemmas in this online appendix involve more empirical process theory. Suppose that $(\mathcal{X},\mathcal{A})$ is a measurable space and $\omega_{1},\omega_{2},...$ are i.i.d. random elements in $(\mathcal{X},\mathcal{A})$ with probability law $P$ . For a point $\omega\in\mathcal{X}$ , let $\delta_{\omega}$ be a dirac measure at $\omega$ ⁴⁴4Although we already use $\delta$ as the subvector of the parameter $\theta=(\beta^{\prime},\delta^{\prime},\gamma)^{\prime}$ , we still use $\delta$ to represent dirac measure as it is strong convention in the literature. We explicitly mention if $\delta$ is used as dirac measure to avoid confusion.. The empirical measure of a sample $\omega_{1},...,\omega_{n}$ is $\mathbb{P}_{n}=\frac{1}{n}\sum_{i=1}^{n}\delta_{\omega_{i}}$ , and the empirical process is $\mathbb{G}_{n}=\sqrt{n}(\mathbb{P}_{n}-P)$ . Let $\mathcal{F}$ be a functional class, elements of which are measurable functions from $\mathcal{X}$ to $\mathbb{R}$ . We call a function $F:\mathcal{X}\rightarrow\mathbb{R}$ an envelope of $\mathcal{F}$ if $|f|\leq F$ for all $f\in\mathcal{F}$ . For a stochastic process $\mathbb{G}$ and a functional class $\mathcal{F}$ , define $\|\mathbb{G}\|_{\mathcal{F}}:=\sup_{f\in\mathcal{F}}|\mathbb{G}f|$ .

D.1 Proof of Theorem 3.

D.1.1 Continuous Model.

When $\gamma=\gamma_{0}$ .

Note that the constrained estimator $\hat{\alpha}(\gamma_{0})=\arg\min_{\alpha\in A}\hat{Q}_{n}(\alpha,\gamma_{0})$ is $\sqrt{n}$ -consistent to $\alpha_{0}$ , which is identical to the convergence rate of $\hat{\alpha}$ , since the problem becomes a standard linear dynamic panel estimation. Let $a=\sqrt{n}(\alpha-\alpha_{0})$ and $b=n^{1/4}(\gamma-\gamma_{0})$ . The distance test statistic can be rewritten as follows:

$\displaystyle\mathcal{D}_{n}(\gamma_{0})$	$\displaystyle=$	$\displaystyle\inf_{a}\mathbb{S}_{n}(a,0)-\inf_{a,b}\mathbb{S}_{n}(a,b)+o_{p}(1)$
	$\displaystyle\xrightarrow{d}$	$\displaystyle\inf_{a}\mathbb{S}(a,0)-\inf_{a,b}\mathbb{S}(a,b)$
	$\displaystyle=$	$\displaystyle\min_{a}(M_{0}a-e)^{\prime}\Omega^{-1}(M_{0}a-e)-\min_{a,b^{2}}(M_{0}a+Hb^{2}-e)^{\prime}\Omega^{-1}(M_{0}a+Hb^{2}-e),$

where we apply the CMT. Lee et al., (2011) showed that the difference between the constrained and unconstrained infima is a continuous operator on $\ell^{\infty}(\mathbb{K})$ .

Note that $\min_{a}(M_{0}a-e)^{\prime}\Omega^{-1}(M_{0}a-e)=e^{\prime}(\Omega^{-1}-\Omega^{-1}M_{0}(M_{0}^{\prime}\Omega^{-1}M_{0})^{-1}M_{0}^{\prime}\Omega^{-1})e$ , while

	$\displaystyle\min_{a,b^{2}}(M_{0}a+Hb^{2}-e)^{\prime}\Omega^{-1}(M_{0}a+Hb^{2}-e)$
	$\displaystyle=(M_{0}a_{0}+Hb_{0}^{2}-e)^{\prime}\Omega^{-1}(M_{0}a_{0}+Hb_{0}^{2}-e)$
	$\displaystyle=(M_{0}^{\prime}\Omega^{-1}M_{0}a_{0}+M_{0}^{\prime}\Omega^{-1}Hb_{0}^{2})^{\prime}(M_{0}^{\prime}\Omega^{-1}M_{0})^{-1}(M_{0}^{\prime}\Omega^{-1}M_{0}a_{0}+M_{0}^{\prime}\Omega^{-1}Hb_{0}^{2})$
	$\displaystyle\quad+b_{0}^{2}H^{\prime}\Omega^{-1/2}(I-P_{\Omega^{-1/2}M_{0}})\Omega^{-1/2}Hb_{0}^{2}-2e^{\prime}\Omega^{-1}M_{0}(M_{0}^{\prime}\Omega^{-1}M_{0})^{-1}(M_{0}^{\prime}\Omega^{-1}M_{0}a_{0}+M_{0}^{\prime}\Omega^{-1}Hb_{0}^{2})$
	$\displaystyle\quad-2e^{\prime}\Omega^{-1/2}(I-P_{\Omega^{-1/2}M_{0}})\Omega^{-1/2}Hb_{0}^{2}+e^{\prime}\Omega^{-1}e,$

where $(a_{0},b_{0}^{2})$ is the argmin, whose formula is derived in the proof of Theorem 2. By plugging in one of the first order conditions, $M_{0}^{\prime}\Omega^{-1}M_{0}a_{0}+M_{0}^{\prime}\Omega^{-1}Hb_{0}^{2}=M_{0}^{\prime}\Omega^{-1}e$ , and the formula for $b_{0}$ , we can get

	$\displaystyle\min_{a,b^{2}}(M_{0}a+Hb^{2}-e)^{\prime}\Omega^{-1}(M_{0}a+Hb^{2}-e)$
	$\displaystyle\quad=\begin{cases}-e^{\prime}\Omega^{-1}M_{0}(M_{0}^{\prime}\Omega^{-1}M_{0})^{-1}M_{0}^{\prime}\Omega^{-1}e-e^{\prime}\Xi H(H^{\prime}\Xi H)^{-1}H^{\prime}\Xi e+e^{\prime}\Omega^{-1}e&\text{if }H^{\prime}\Xi e\geq 0\\ -e^{\prime}\Omega^{-1}M_{0}(M_{0}^{\prime}\Omega^{-1}M_{0})^{-1}M_{0}^{\prime}\Omega^{-1}e+e^{\prime}\Omega^{-1}e&\text{else}.\end{cases}$

Therefore, the limit distribution of the test statistic is identical to

\begin{cases}e^{\prime}\Xi H(H^{\prime}\Xi H)^{-1}H^{\prime}\Xi e&\text{if }H^{\prime}\Xi e\geq 0\\ 0&\text{else}.\end{cases}

Note that $e^{\prime}\Xi H(H^{\prime}\Xi H)^{-1}H^{\prime}\Xi e\sim\chi^{2}_{1}$ as $H^{\prime}\Xi e\sim N(0,H^{\prime}\Xi\Omega\Xi H)$ , and $H^{\prime}\Xi\Omega\Xi H=H^{\prime}\Xi H$ .

When $\gamma\neq\gamma_{0}$ .

We show that $\mathcal{D}_{n}(\gamma)$ diverges to infinity in probability. There is a constant $C_{1}\in(0,+\infty)$ such that $\inf_{\alpha\in A}\|g_{0}(\alpha,\gamma)\|\geq C_{1}$ . This is because $g_{0}(\theta)$ is zero if and only if $\theta=\theta_{0}$ , by G and Theorem 1, and continuous on $\Theta$ , by D, while the restricted parameter set $\{\theta=(\beta^{\prime},\delta^{\prime},\gamma)^{\prime}\in\Theta:\gamma=c\}$ is closed for all $c\in\Gamma$ . $\mathcal{G}=\{g(\omega_{i},\theta):\theta\in\Theta\}$ is shown to satisfy the uniform entropy condition in the proof of Lemma D.3, and hence $\sup_{\theta\in\Theta}\|\bar{g}_{n}(\theta)-g_{0}(\theta)\|=o_{p}(1)$ by Glivenko-Cantelli theorem. By triangle inequality, $C_{1}\leq\|g_{0}(\hat{\alpha}(\gamma),\gamma)\|\leq\|\bar{g}_{n}(\hat{\alpha}(\gamma),\gamma)\|+o_{p}(1)$ . Meanwhile, $\|\bar{g}_{n}(\hat{\theta})\|=O_{p}(n^{-1/2})$ because $\|\bar{g}_{n}(\hat{\theta})\|\leq\|\bar{g}_{n}(\theta_{0})\|=O_{p}(n^{-1/2})$ . Therefore, there exists $C_{2}\in(0,+\infty)$ such that $\hat{Q}_{n}(\hat{\alpha}(\gamma),\gamma)-\hat{Q}_{n}(\hat{\theta})\geq C_{2}+O_{p}(n^{-1})$ , which implies that $P(\mathcal{D}_{n}(\gamma)>M)=P(\hat{Q}_{n}(\hat{\alpha}(\gamma),\gamma)-\hat{Q}_{n}(\hat{\theta})>M/n)\rightarrow 1$ for any $M<\infty$ .

D.1.2 Discontinuous Model.

When $\gamma=\gamma_{0}$ .

As in the proof for the continuous model, we apply the CMT to the test statistic. Let $a=\sqrt{n}(\alpha-\alpha_{0})$ and $b=\sqrt{n}(\gamma-\gamma_{0})$ . First, we will show that when the model is discontinuous and Assumptions G, D, and LJ are true, $\mathbb{S}_{n}(a,b)\rightsquigarrow\mathbb{S}_{J}(a,b)=(M_{0}a+Gb-e)^{\prime}\Omega^{-1}(M_{0}a+Gb-e)$ in $\ell^{\infty}(\mathbb{K})$ for any compact $\mathbb{K}\subset\mathbb{R}^{2p+2}$ . Note that

$\displaystyle\sqrt{n}\bar{g}_{n}(\alpha_{0}+\tfrac{a}{\sqrt{n}},\gamma_{0}+\tfrac{b}{\sqrt{n}})=$	$\displaystyle\sqrt{n}\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}}\Delta\epsilon_{it_{0}}\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iT}\Delta\epsilon_{iT}\end{pmatrix}-\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0}}\Delta x_{it_{0}}^{\prime}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iT}\Delta x_{iT}^{\prime}\end{pmatrix}a_{1}$	(D.1)
	$\displaystyle-\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0}}1_{it_{0}}(\gamma_{0}+\tfrac{b}{\sqrt{n}})^{\prime}X_{it_{0}}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iT}1_{iT}(\gamma_{0}+\tfrac{b}{\sqrt{n}})^{\prime}X_{iT}\end{pmatrix}a_{2}$	(D.2)
	$\displaystyle+\sqrt{n}\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0}}(1_{it_{0}}(\gamma_{0})^{\prime}-1_{it_{0}}(\gamma_{0}+\tfrac{b}{\sqrt{n}})^{\prime})X_{it_{0}}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iT}(1_{iT}(\gamma_{0})^{\prime}-1_{iT}(\gamma_{0}+\tfrac{b}{\sqrt{n}})^{\prime})X_{iT}\end{pmatrix}\delta_{0}.$	(D.3)

The terms in the first two lines of the right hand side (D.1) and (D.2) converge in distribution to $(M_{0}a-e)$ uniformly with respect to $b\in[-K,K]$ . Since $\sup_{b:|b|\leq K}\sqrt{n}\|\bar{g}_{n}(\alpha_{0},\gamma_{0}+\frac{b}{\sqrt{n}})-\bar{g}_{n}(\alpha_{0},\gamma_{0})-g_{0}(\alpha_{0},\gamma_{0}+\frac{b}{\sqrt{n}})+g_{0}(\alpha_{0},\gamma_{0})\|=o_{p}(1)$ by Lemma D.3,

\sqrt{n}\left\|\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0}}(1_{it_{0}}(\gamma_{0})^{\prime}-1_{it_{0}}(\gamma_{0}+\tfrac{b}{\sqrt{n}})^{\prime})X_{it_{0}}\delta_{0}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iT}(1_{iT}(\gamma_{0})^{\prime}-1_{iT}(\gamma_{0}+\tfrac{b}{\sqrt{n}})^{\prime})X_{iT}\delta_{0}\end{pmatrix}-\begin{pmatrix}E[z_{it_{0}}(1_{it_{0}}(\gamma_{0})^{\prime}-1_{it_{0}}(\gamma_{0}+\tfrac{b}{\sqrt{n}})^{\prime})X_{it_{0}}\delta_{0}]\\ \vdots\\ E[z_{iT}(1_{iT}(\gamma_{0})^{\prime}-1_{iT}(\gamma_{0}+\tfrac{b}{\sqrt{n}})^{\prime})X_{iT}\delta_{0}]\end{pmatrix}\right\|

converges in probability to zero uniformly with respect to $b\in[-K,K]$ . Suppose $b>0$ . The result for $b<0$ is similar. By application of Talyor expansion,

\sqrt{n}E[z_{it}(1,x_{it}^{\prime})\delta_{0}1\{\gamma_{0}+\tfrac{b}{\sqrt{n}}\geq q_{it}>\gamma_{0}\}]\rightarrow E_{t}[z_{it}(1,x_{it}^{\prime})\delta_{0}|\gamma_{0}]f_{t}(\gamma_{0})b,

uniformly with respect to $b\in[-K,K]$ , and similar limit result can be derived for $\sqrt{n}E[z_{it}((1,x_{it-1}^{\prime})\delta_{0}1\{\gamma_{0}+\tfrac{b}{\sqrt{n}}\geq q_{it-1}>\gamma_{0}\}]$ . Hence, we can derive that the term (D.3) converges in probability to $Gb$ uniformly with respect to $b\in[-K,K]$ .

By the CMT, the test statistic converges in distribution to

\min_{a}(M_{0}a-e)^{\prime}\Omega^{-1}(M_{0}a-e)-\min_{a,b}(M_{0}a+Gb-e)^{\prime}\Omega^{-1}(M_{0}a+Gb-e).

Note that $\min_{a}(M_{0}a-e)^{\prime}\Omega^{-1}(M_{0}a-e)=e^{\prime}(\Omega^{-1}-\Omega^{-1}M_{0}(M_{0}^{\prime}\Omega^{-1}M_{0})^{-1}M_{0}^{\prime}\Omega^{-1})e$ , and $\min_{a,b}(M_{0}a+Gb-e)^{\prime}\Omega^{-1}(M_{0}a+Gb-e)=e^{\prime}(\Omega^{-1}-\Omega^{-1}D_{1}(D_{1}^{\prime}\Omega^{-1}D_{1})^{-1}D_{1}^{\prime}\Omega^{-1})e$ . Therefore, the limit distribution of the test statistic is identical to the distribution of

e^{\prime}\Omega^{-1/2}[\Omega^{-1/2}D_{1}(D_{1}^{\prime}\Omega^{-1}D_{1})^{-1}D_{1}^{\prime}\Omega^{-1/2}-\Omega^{-1/2}M_{0}(M_{0}^{\prime}\Omega^{-1}M_{0})^{-1}M_{0}^{\prime}\Omega^{-1/2}]\Omega^{-1/2}e.

The matrix $\Omega^{-1/2}D_{1}(D_{1}^{\prime}\Omega^{-1}D_{1})^{-1}D_{1}^{\prime}\Omega^{-1/2}-\Omega^{-1/2}M_{0}(M_{0}^{\prime}\Omega^{-1}M_{0})^{-1}M_{0}^{\prime}\Omega^{-1/2}$ is idempotent since the column space of $\Omega^{-1/2}M_{0}$ lies in the column space of $\Omega^{-1/2}D_{1}$ . The rank of the matrix is 1. Since $\Omega^{-1/2}e\sim N(0,I)$ , the chi-square distribution with 1 degree of freedom is the limit distribution.

When $\gamma\neq\gamma_{0}$ .

The proof showing that $\mathcal{D}_{n}(\gamma)$ diverges when $\gamma\neq\gamma_{0}$ for the discontinuous model is identical to the proof written for the continuous model.

D.2 Proof of Theorem 4.

Under the null hypothesis.

Define a map $T$ such that $T(\psi)=(\beta^{\prime},-\gamma\delta_{3},0,...,0,\delta_{3},\gamma)^{\prime}\in\mathbb{R}^{2p+2}$ if $\psi=(\beta^{\prime},\delta_{3},\gamma)^{\prime}\in\mathbb{R}^{p+2}$ . Let $\psi_{0}=(\beta_{0}^{\prime},\delta_{30},\gamma_{0})^{\prime}$ . Note that

g_{i}(T(\psi))=\begin{pmatrix}z_{it_{0}}\{\Delta y_{it_{0}}-\Delta x_{it_{0}}^{\prime}\beta-[(q_{it_{0}}-\gamma)1_{\{q_{it_{0}}>\gamma\}}-(q_{it_{0}-1}-\gamma)1_{\{q_{it_{0}-1}>\gamma\}}]\delta_{3}\}\\ \vdots\\ z_{iT}\{\Delta y_{iT}-\Delta x_{iT}^{\prime}\beta-[(q_{iT}-\gamma)1_{\{q_{iT}>\gamma\}}-(q_{iT-1}-\gamma)1_{\{q_{iT-1}>\gamma\}}]\delta_{3}\}\end{pmatrix}.

The first-order derivative of $g_{0}(T(\psi))$ with respect to $\psi$ is

D_{\psi}=\\ E\begin{bmatrix}-z_{it_{0}}\Delta x_{it_{0}}^{\prime},&-z_{it_{0}}[(q_{it_{0}}-\gamma_{0})1_{\{q_{it_{0}}>\gamma_{0}\}}-(q_{it_{0}-1}-\gamma_{0})1_{\{q_{it_{0}-1}>\gamma_{0}\}}],&z_{it_{0}}[1_{\{q_{it_{0}}>\gamma_{0}\}}-1_{\{q_{it_{0}-1}>\gamma_{0}\}}]\delta_{30}\\ \vdots&\vdots&\vdots\\ -z_{iT}\Delta x_{iT}^{\prime},&-z_{iT}[(q_{iT}-\gamma_{0})1_{\{q_{iT}>\gamma_{0}\}}-(q_{iT-1}-\gamma_{0})1_{\{q_{iT-1}>\gamma_{0}\}}],&z_{iT}[1_{\{q_{iT}>\gamma_{0}\}}-1_{\{q_{iT-1}>\gamma_{0}\}}]\delta_{30}\end{bmatrix}.

$D_{\psi}$ is a matrix that is identical to a binding of the columns of $M_{10}$ and $N_{20}$ . If $\hat{\psi}=\arg\min_{\psi}\hat{Q}_{n}(T(\psi))$ , then $\sqrt{n}(\hat{\psi}-\psi_{0})\xrightarrow{d}N(0,(D_{\psi}^{\prime}\Omega D_{\psi})^{-1})$ (see Kim et al., (2019)). The continuity test statistic $\mathcal{T}_{n}=n(\hat{Q}_{n}(\tilde{\theta})-\hat{Q}_{n}(\hat{\theta}))$ can be rewritten as

n(\hat{Q}_{n}(T(\hat{\psi}))-\hat{Q}_{n}(\hat{\theta}))=n\left(\min_{(\theta^{\prime},\psi^{\prime})^{\prime}:\theta=\theta_{0}}(\hat{Q}_{n}(T(\psi))-\hat{Q}_{n}(\theta))-\min_{(\theta^{\prime},\psi^{\prime})^{\prime}:\psi=\psi_{0}}(-\hat{Q}_{n}(T(\psi))+\hat{Q}_{n}(\theta))\right).

Reparametrize such that $a=\sqrt{n}(\alpha-\alpha_{0})$ , $b=n^{1/4}(\gamma-\gamma_{0})$ , and $r=\sqrt{n}(\psi-\psi_{0})$ . Define a centered criterion by

\mathbb{M}_{n}(a,b,r)=n(\hat{Q}_{n}(T(\psi_{0}+\tfrac{r}{\sqrt{n}}))-\hat{Q}_{n}(\alpha_{0}+\tfrac{a}{\sqrt{n}},\gamma_{0}+\tfrac{b}{n^{\frac{1}{4}}})).

We will show that $\mathbb{M}_{n}$ weakly converges to a process $\mathbb{M}$ in $\ell^{\infty}(\mathbb{K})$ for every compact $\mathbb{K}\subset\mathbb{R}^{3p+4}$ . Then, by the CMT, the continuity test statistic converges in distribution to

\min_{(a^{\prime},b,r^{\prime})^{\prime}:(a^{\prime},b)^{\prime}=0}\mathbb{M}(a,b,r)-\min_{(a^{\prime},b,r^{\prime})^{\prime}:r=0}(-\mathbb{M}(a,b,r)).

In the proof of Theorem 2, it is shown that $\sqrt{n}\bar{g}_{n}(\alpha_{0}+\tfrac{a}{\sqrt{n}},\gamma_{0}+\tfrac{b}{n^{\frac{1}{4}}})\rightsquigarrow(M_{0}a+Hb^{2}-e)$ and

n\hat{Q}_{n}(\alpha_{0}+\tfrac{a}{\sqrt{n}},\gamma_{0}+\tfrac{b}{n^{\frac{1}{4}}})\rightsquigarrow(M_{0}a+Hb^{2}-e)^{\prime}\Omega^{-1}(M_{0}a+Hb^{2}-e).

Let $r_{1}=\sqrt{n}(\beta-\beta_{0})$ , $r_{2}=(r_{21},r_{22})^{\prime}$ , $r_{21}=\sqrt{n}(\delta_{3}-\delta_{30})$ , and $r_{22}=\sqrt{n}(\gamma-\gamma_{0})$ . Then,

	$\displaystyle\sqrt{n}\bar{g}_{n}(T(\psi_{0}+\tfrac{r}{\sqrt{n}}))$
	$\displaystyle=\begin{pmatrix}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{it_{0}}\Delta\epsilon_{it_{0}}\\ \vdots\\ \frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{iT}\Delta\epsilon_{iT}\end{pmatrix}-\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}}\Delta x_{it_{0}}\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iT}\Delta x_{iT}\end{pmatrix}r_{1}$
	$\displaystyle-\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}}[(q_{it_{0}}-\gamma_{0}-\frac{r_{22}}{\sqrt{n}})1\{q_{it_{0}}>\gamma_{0}+\frac{r_{22}}{\sqrt{n}}\}-(q_{it_{0}-1}-\gamma_{0}-\frac{r_{22}}{\sqrt{n}})1\{q_{it_{0}-1}>\gamma_{0}+\frac{r_{22}}{\sqrt{n}}\}]\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iT}[(q_{iT}-\gamma_{0}-\frac{r_{22}}{\sqrt{n}})1\{q_{iT}>\gamma_{0}+\frac{r_{22}}{\sqrt{n}}\}-(q_{iT-1}-\gamma_{0}-\frac{r_{22}}{\sqrt{n}})1\{q_{iT-1}>\gamma_{0}+\frac{r_{22}}{\sqrt{n}}\}]\end{pmatrix}r_{21}$
	$\displaystyle+\sqrt{n}\left\{\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}}[(q_{it_{0}}-\gamma_{0})1\{q_{it_{0}}>\gamma_{0}\}-(q_{it_{0}}-\gamma_{0}-\frac{r_{22}}{\sqrt{n}})1\{q_{it_{0}}>\gamma_{0}+\frac{r_{22}}{\sqrt{n}}\}]\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iT}[(q_{iT}-\gamma_{0})1\{q_{iT}>\gamma_{0}\}-(q_{iT}-\gamma_{0}-\frac{r_{22}}{\sqrt{n}})1\{q_{iT}>\gamma_{0}+\frac{r_{22}}{\sqrt{n}}\}]\end{pmatrix}\right.$
	$\displaystyle\left.\qquad-\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}}[(q_{it_{0}-1}-\gamma_{0})1\{q_{it_{0}-1}>\gamma_{0}\}-(q_{it_{0}-1}-\gamma_{0}-\frac{r_{22}}{\sqrt{n}})1\{q_{it_{0}-1}>\gamma_{0}+\frac{r_{22}}{\sqrt{n}}\}]\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iT}[(q_{iT-1}-\gamma_{0})1\{q_{iT-1}>\gamma_{0}\}-(q_{iT-1}-\gamma_{0}-\frac{r_{22}}{\sqrt{n}})1\{q_{iT-1}>\gamma_{0}+\frac{r_{22}}{\sqrt{n}}\}]\end{pmatrix}\right\}\delta_{30}.$

By the CLT and LLN,

\begin{pmatrix}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{it_{0}}\Delta\epsilon_{it_{0}}\\ \vdots\\ \frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{iT}\Delta\epsilon_{iT}\end{pmatrix}-\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}}\Delta x_{it_{0}}\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iT}\Delta x_{iT}\end{pmatrix}r_{1}\xrightarrow{d}(M_{10}r_{1}-e).

By the ULLN (application of Lemma D.2) and continuity of $\kappa\mapsto E[z_{it}(1,q_{it})1\{q_{it}>\gamma_{0}+\kappa\}]$ and $\kappa\mapsto E[z_{it}(1,q_{it-1})1\{q_{it-1}>\gamma_{0}+\kappa\}]$ at $\kappa=0$ ,

\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}}[(q_{it_{0}}-\gamma_{0}-\frac{r_{22}}{\sqrt{n}})1\{q_{it_{0}}>\gamma_{0}+\frac{r_{22}}{\sqrt{n}}\}-(q_{it_{0}-1}-\gamma_{0}-\frac{r_{22}}{\sqrt{n}})1\{q_{it_{0}-1}>\gamma_{0}+\frac{r_{22}}{\sqrt{n}}\}]\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iT}[(q_{iT}-\gamma_{0}-\frac{r_{22}}{\sqrt{n}})1\{q_{iT}>\gamma_{0}+\frac{r_{22}}{\sqrt{n}}\}-(q_{iT-1}-\gamma_{0}-\frac{r_{22}}{\sqrt{n}})1\{q_{iT-1}>\gamma_{0}+\frac{r_{22}}{\sqrt{n}}\}]\end{pmatrix}r_{21}\\ \xrightarrow{p}\begin{pmatrix}Ez_{it_{0}}[(q_{it_{0}}-\gamma_{0})1\{q_{it_{0}}>\gamma_{0}\}-(q_{it_{0}-1}-\gamma_{0})1\{q_{it_{0}-1}>\gamma_{0}\}]\\ \vdots\\ Ez_{iT}[(q_{iT}-\gamma_{0})1\{q_{iT}>\gamma_{0}\}-(q_{iT-1}-\gamma_{0})1\{q_{iT-1}>\gamma_{0}\}]\end{pmatrix}r_{21}

uniformly with respect to $r_{22}\in[-K,K]$ . Finally,

\sqrt{n}\left\{\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}}[(q_{it_{0}}-\gamma_{0})1\{q_{it_{0}}>\gamma_{0}\}-(q_{it_{0}}-\gamma_{0}-\frac{r_{22}}{\sqrt{n}})1\{q_{it_{0}}>\gamma_{0}+\frac{r_{22}}{\sqrt{n}}\}]\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iT}[(q_{iT}-\gamma_{0})1\{q_{iT}>\gamma_{0}\}-(q_{iT}-\gamma_{0}-\frac{r_{22}}{\sqrt{n}})1\{q_{iT}>\gamma_{0}+\frac{r_{22}}{\sqrt{n}}\}]\end{pmatrix}\right.\\ \left.\qquad-\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}}[(q_{it_{0}-1}-\gamma_{0})1\{q_{it_{0}-1}>\gamma_{0}\}-(q_{it_{0}-1}-\gamma_{0}-\frac{r_{22}}{\sqrt{n}})1\{q_{it_{0}-1}>\gamma_{0}+\frac{r_{22}}{\sqrt{n}}\}]\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iT}[(q_{iT-1}-\gamma_{0})1\{q_{iT-1}>\gamma_{0}\}-(q_{iT-1}-\gamma_{0}-\frac{r_{22}}{\sqrt{n}})1\{q_{iT-1}>\gamma_{0}+\frac{r_{22}}{\sqrt{n}}\}]\end{pmatrix}\right\}\\ \xrightarrow{p}\begin{pmatrix}Ez_{it_{0}}[1\{q_{it_{0}}>\gamma_{0}\}-1\{q_{it_{0}-1}>\gamma_{0}\}]\\ \vdots\\ Ez_{iT}[1\{q_{iT}>\gamma_{0}\}-1\{q_{iT-1}>\gamma_{0}\}]\end{pmatrix}r_{22}

uniformly with respect to $r_{22}\in[-K,K]$ . Suppose that $r_{22}>0$ . The case for $r_{22}<0$ follows similarly. The last uniform convergence holds because Lemma D.3 yields $\sqrt{n}\|\bar{g}_{n}(T(\beta_{0},\delta_{30},\gamma_{0}+\frac{r_{22}}{\sqrt{n}}))-\bar{g}_{n}(T(\beta_{0},\delta_{30},\gamma_{0}))-g_{0}(T(\beta_{0},\delta_{30},\gamma_{0}+\frac{r_{22}}{\sqrt{n}}))+g_{0}(T(\beta_{0},\delta_{30},\gamma_{0}))\|=o_{p}(1)$ uniformly with respect to $r_{22}\in[-K,K]$ and the following application of Taylor expansion:

	$\displaystyle\sqrt{n}E[z_{it}((q_{it}-\gamma_{0})1\{q_{it}>\gamma_{0}\}-(q_{it}-\gamma_{0}-\frac{r_{22}}{\sqrt{n}})1\{q_{it}>\gamma_{0}+\frac{r_{22}}{\sqrt{n}}\})]$
	$\displaystyle=\sqrt{n}E[z_{it}(q_{it}-\gamma_{0})1\{\gamma_{0}+\frac{r_{22}}{\sqrt{n}}\geq q_{it}>\gamma_{0}\}]+r_{22}E[z_{it}1\{q_{it}>\gamma_{0}+\frac{r_{22}}{\sqrt{n}}\}]$
	$\displaystyle\rightarrow E_{t}[z_{it}(q_{it}-\gamma_{0})\|\gamma_{0}]f_{t}(\gamma_{0})r_{22}+E[z_{it}1\{q_{it}>\gamma_{0}\}]r_{22}=E[z_{it}1\{q_{it}>\gamma_{0}\}]r_{22}$

uniformly with respect to $r_{22}\in[-K,K]$ as $n\rightarrow\infty$ .

In conclusion, $\sqrt{n}\bar{g}_{n}(T(\psi_{0}+\tfrac{r}{\sqrt{n}}))\rightsquigarrow(D_{\psi}r-e)$ , and

	$\displaystyle\mathbb{M}(a,b,r)$	$\displaystyle=(D_{\psi}r-e)^{\prime}\Omega^{-1}(D_{\psi}r-e)-(M_{0}a+Hb^{2}-e)^{\prime}\Omega^{-1}(M_{0}a+Hb^{2}-e)$
		$\displaystyle=(M_{10}r_{1}+N_{20}r_{2}-e)^{\prime}\Omega^{-1}(M_{10}r_{1}+N_{20}r_{2}-e)$
		$\displaystyle\qquad-(M_{10}a_{1}+M_{20}a_{2}+Hb^{2}-e)^{\prime}\Omega^{-1}(M_{10}a_{1}+M_{20}a_{2}+Hb^{2}-e),$

where $a_{1}=\sqrt{n}(\beta-\beta_{0})$ and $a_{2}=\sqrt{n}(\delta-\delta_{0})$ . By applying the CMT, the continuity test statistic converges in distribution to

	$\displaystyle\min_{r}(M_{10}r_{1}+N_{20}r_{2}-e)^{\prime}\Omega^{-1}(M_{10}r_{1}+N_{20}r_{2}-e)$
	$\displaystyle\qquad-\min_{a,b^{2}}(M_{10}a_{1}+M_{20}a_{2}+Hb^{2}-e)^{\prime}\Omega^{-1}(M_{10}a_{1}+M_{20}a_{2}+Hb^{2}-e).$

By similar computations to the proof of Theorem 3,

	$\displaystyle\min_{r_{1},r_{2}}(M_{10}r_{1}+N_{20}r_{2}-e)^{\prime}\Omega^{-1}(M_{10}r_{1}+N_{20}r_{2}-e)$
	$\displaystyle\quad=e^{\prime}\Omega^{-1}e-e^{\prime}\Omega^{-1}M_{10}(M_{10}^{\prime}\Omega^{-1}M_{10})^{-1}M_{10}^{\prime}\Omega^{-1}e-e^{\prime}\Xi_{1}N_{20}(N_{20}^{\prime}\Xi_{1}N_{20})^{-1}N_{20}^{\prime}\Xi_{1}e,$
	$\displaystyle\min_{a_{1},a_{2},b^{2}}(M_{10}a_{1}+M_{20}a_{2}+Hb^{2}-e)^{\prime}\Omega^{-1}(M_{10}a_{1}+M_{20}a_{2}+Hb^{2}-e)$
	$\displaystyle\quad=\begin{cases}\begin{array}[]{l}-e^{\prime}\Omega^{-1}M_{10}(M_{10}^{\prime}\Omega^{-1}M_{10})^{-1}M_{10}^{\prime}\Omega^{-1}e-e^{\prime}\Xi_{1}M_{20}(M_{20}^{\prime}\Xi_{1}M_{20})^{-1}M_{20}^{\prime}\Xi_{1}e\\ -e^{\prime}\Xi_{12}H(H^{\prime}\Xi_{12}H)^{-1}H^{\prime}\Xi_{12}e+e^{\prime}\Omega^{-1}e\end{array}&\hskip-25.60747pt\text{if }H^{\prime}\Xi_{12}e\geq 0\\ &\\ -e^{\prime}\Omega^{-1}M_{10}(M_{10}^{\prime}\Omega^{-1}M_{10})^{-1}M_{10}^{\prime}\Omega^{-1}e-e^{\prime}\Xi_{1}M_{20}(M_{20}^{\prime}\Xi_{1}M_{20})^{-1}M_{20}^{\prime}\Xi_{1}e+e^{\prime}\Omega^{-1}e&\text{ else}\end{cases}$

where $\Xi_{1}=\Omega^{-1/2}(I-\Omega^{-1/2}M_{10}(M_{10}^{\prime}\Omega^{-1}M_{10})^{-1}M_{10}^{\prime}\Omega^{-1/2})\Omega^{-1/2}$ and $\Xi_{12}=\Xi_{1}^{1/2}(I-\Xi_{1}^{1/2}M_{20}(M_{20}^{\prime}\Xi_{1}M_{20})^{-1}M_{20}^{\prime}\Xi_{1}^{1/2})\Xi_{1}^{1/2}$ . As $\Xi_{1}\Omega\Xi_{1}=\Xi_{1}$ , we can derive $\Xi_{12}\Omega\Xi_{12}=(\Xi_{1}-\Xi_{1}M_{20}(M_{20}^{\prime}\Xi_{1}M_{20})^{-1}M_{20}^{\prime}\Xi_{1})\Omega(\Xi_{1}-\Xi_{1}M_{20}(M_{20}^{\prime}\Xi_{1}M_{20})^{-1}M_{20}^{\prime}\Xi_{1})=\Xi_{1}-\Xi_{1}M_{20}(M_{20}^{\prime}\Xi_{1}M_{20})^{-1}M_{20}^{\prime}\Xi_{1}=\Xi_{12}$ , and hence $e^{\prime}\Xi_{12}H(H^{\prime}\Xi_{12}H)^{-1}H^{\prime}\Xi_{12}e\sim\chi^{2}_{1}$ . Since $E[H^{\prime}\Xi_{12}ee^{\prime}\Xi_{1}M_{20}]$ is zero, $(e^{\prime}\Xi_{1}M_{20}(M_{20}^{\prime}\Xi_{1}M_{20})^{-1}M_{20}^{\prime}\Xi_{1}e,e^{\prime}\Xi_{1}N_{20}(N_{20}^{\prime}\Xi_{1}N_{20})^{-1}N_{20}^{\prime}\Xi_{1}e)$ is independent to $e^{\prime}\Xi_{12}H(H^{\prime}\Xi_{12}H)^{-1}H^{\prime}\Xi_{12}e$ .

Under the alternative hypothesis.

There is a constant $C_{1}\in(0,+\infty)$ such that $\inf_{\theta\in\Theta:\delta_{1}+\delta_{3}\gamma=0,\delta_{2}=0_{p-1}}\|g_{0}(\theta)\|\geq C_{1}$ . This is because $g_{0}(\theta)$ is zero if and only if $\theta=\theta_{0}$ , by G and Theorem 1, and continuous on $\Theta$ , by D, while the restricted parameter set $\{\theta=(\beta^{\prime},\delta^{\prime},\gamma)^{\prime}:\delta_{2}=0_{p-1},\delta_{1}+\delta_{3}\gamma=0\}$ is closed. $\mathcal{G}=\{g(\omega_{i},\theta):\theta\in\Theta\}$ is shown to satisfy the uniform entropy condition in the proof of Lemma D.3, and hence $\sup_{\theta\in\Theta}\|\bar{g}_{n}(\theta)-g_{0}(\theta)\|=o_{p}(1)$ by Glivenko-Cantelli theorem. By triangle inequality, $C_{1}\leq\|g_{0}(\tilde{\theta})\|\leq\|\bar{g}_{n}(\tilde{\theta})\|+o_{p}(1)$ . Recall that $\tilde{\theta}$ is the continuity-restricted estimator. Meanwhile, $\|\bar{g}_{n}(\hat{\theta})\|=O_{p}(n^{-1/2})$ because $\|\bar{g}_{n}(\hat{\theta})\|\leq\|\bar{g}_{n}(\theta_{0})\|=O_{p}(n^{-1/2})$ . Therefore, there exists $C_{2}\in(0,+\infty)$ such that $\hat{Q}_{n}(\tilde{\theta})-\hat{Q}_{n}(\hat{\theta})\geq C_{2}+O_{p}(n^{-1})$ , which implies that $P(n^{-m}\mathcal{T}_{n}>M)=P(\hat{Q}_{n}(\tilde{\theta})-\hat{Q}_{n}(\hat{\theta})>M/(n^{1-m}))\rightarrow 1$ , for any $m\in[0,1)$ and $M<\infty$ .

D.3 Auxiliary Lemmas

Lemma D.1.

Suppose that the true model is continuous and Assumptions G, D, and LK are true. For any $\eta>0$ , there is a neighborhood $\mathcal{O}$ of $\theta_{0}$ such that the population moment function $g_{0}(\theta)$ satisfies

\lim_{n\rightarrow\infty}\sup_{\theta\in\mathcal{O}}\frac{\sqrt{n}\|g_{0}(\theta)-D_{2}\left(\alpha^{\prime}-\alpha_{0}^{\prime},(\gamma-\gamma_{0})^{2}\right)^{\prime}\|}{1+\sqrt{n}\|\left(\alpha^{\prime}-\alpha_{0}^{\prime},(\gamma-\gamma_{0})^{2}\right)^{\prime}\|}<\eta.

Proof.

Recall that $G$ , whose formula is (5), is the first-order derivative of $g_{0}(\theta)$ with respect to $\gamma$ at $\theta=\theta_{0}$ , and $H$ , whose formula is (6), is a half of the second-order derivative. $G$ can be obtained by applying the Leibniz rule as follows:

	$\displaystyle\left.\frac{d}{d\gamma}E[-z_{it}(1,x_{it}^{\prime})\delta_{0}1\{q_{it}>\gamma\}]\right\|_{\gamma=\gamma_{0}}$	$\displaystyle=\left.\frac{d}{d\gamma}\int_{\gamma}^{\infty}-E_{t}[z_{it}(1,x_{it}^{\prime})\delta_{0}\|q]f_{t}(q)dq\right\|_{\gamma=\gamma_{0}}$
		$\displaystyle=E_{t}[z_{it}(1,x_{it}^{\prime})\delta_{0}\|\gamma_{0}]f_{t}(\gamma_{0}).$

Similarly, we can get

\left.\frac{d}{d\gamma}E[z_{it}(1,x_{it-1}^{\prime})\delta_{0}1\{q_{it-1}>\gamma\}]\right|_{\gamma=\gamma_{0}}=-E_{t-1}[z_{it}(1,x_{it-1}^{\prime})\delta_{0}|\gamma_{0}]f_{t-1}(\gamma_{0}).

This implies the formula (5) for $G$ . $H$ can also be obtained by the Leibniz rule as follows:

	$\displaystyle\left.\frac{d}{d\gamma}E_{t}[z_{it}(1,x_{it}^{\prime})\delta_{0}\|\gamma]f_{t}(\gamma)\right\|_{\gamma=\gamma_{0}}$	$\displaystyle=\left.\frac{d}{d\gamma}E_{t}[z_{it}(\delta_{10}+\delta_{30}\gamma)\|\gamma]f_{t}(\gamma)\right\|_{\gamma=\gamma_{0}}$
		$\displaystyle=\left.\frac{d}{d\gamma}(\delta_{10}+\delta_{30}\gamma)\cdot E_{t}[z_{it}\|\gamma]f_{t}(\gamma)\right\|_{\gamma=\gamma_{0}}$
		$\displaystyle=\delta_{30}E_{t}[{z_{it}}\|\gamma_{0}]f_{t}(\gamma_{0})+(\delta_{10}+\delta_{30}\gamma_{0})\left.\frac{d}{d\gamma}E_{t}[z_{it}\|\gamma]f_{t}(\gamma)\right\|_{\gamma=\gamma_{0}}$
		$\displaystyle=\delta_{30}E_{t}[{z_{it}}\|\gamma_{0}]f_{t}(\gamma_{0}).$

Similarly, we can get

\left.\frac{d}{d\gamma}\{-E_{t-1}[z_{it}(1,x_{it-1}^{\prime})\delta_{0}|\gamma]f_{t-1}(\gamma)\}\right|_{\gamma=\gamma_{0}}=-\delta_{30}E_{t-1}[{z_{it}}|\gamma_{0}]f_{t-1}(\gamma_{0}).

This implies the formula (6) for $H$ .

The population moment can be expressed as,

g_{0}(\alpha,\gamma)=M_{0}(\gamma)(\alpha-\alpha_{0})+H(\gamma-\gamma_{0})^{2}+o((\gamma-\gamma_{0})^{2}).

Define $M_{0,G}=\left[\begin{array}[]{c;{2pt/2pt}c}0_{k\times p}&M_{G}\end{array}\right]\in\mathbb{R}^{k\times(2p+1)}$ where

M_{G}=\begin{bmatrix}E_{t_{0}}[z_{it_{0}}(1,x_{it_{0}}^{\prime})|\gamma_{0}]f_{t_{0}}(\gamma_{0})-E_{t_{0}-1}[z_{it_{0}}(1,x_{it_{0}-1}^{\prime})|\gamma_{0}]f_{t_{0}-1}(\gamma_{0})\\ \vdots\\ E_{T}[z_{iT}(1,x_{iT}^{\prime})|\gamma_{0}]f_{T}(\gamma_{0})-E_{T-1}[z_{iT}(1,x_{iT-1}^{\prime})|\gamma_{0}]f_{T-1}(\gamma_{0})\end{bmatrix}\in\mathbb{R}^{k\times(p+1)}.

The polynomial expansion $M_{0}(\gamma)=M_{0}+M_{0,G}(\gamma-\gamma_{0})+o(|\gamma-\gamma_{0}|)$ implies

g_{0}(\alpha,\gamma)=M_{0}(\alpha-\alpha_{0})+H(\gamma-\gamma_{0})^{2}+o(\|\alpha-\alpha_{0}\|+(\gamma-\gamma_{0})^{2}).

Thus, $\sqrt{n}\|g_{0}(\theta)-D_{2}\left(\alpha^{\prime}-\alpha_{0}^{\prime},(\gamma-\gamma_{0})^{2}\right)^{\prime}\|=o(\sqrt{n}(\|\alpha-\alpha_{0}\|+(\gamma-\gamma_{0})^{2}))$ , which completes the proof. ∎

Lemma D.2.

If G is true, then

\sup_{\gamma\in\Gamma}\|\bar{M}_{n}(\gamma)-M_{0}(\gamma)\|\xrightarrow{p}0.

Proof.

We show that the classes $\{z_{it}(1,x_{it}^{\prime})1\{q_{it}>\gamma\}:\gamma\in\Gamma\}$ and $\{z_{it}(1,x_{it-1}^{\prime})1\{q_{it-1}>\gamma\}:\gamma\in\Gamma\}$ are P-Glivenko-Cantelli. We focus on the former class since the verification for the latter class is exactly identical. Let $\omega_{i}=\{(z_{it},y_{it},x_{it},\epsilon_{it})_{t=1}^{T}\}$ be a random element in a measurable space $(\mathcal{X},\mathcal{A})$ . A collection of measurable index functions $\mathcal{G}_{index}=\{1\{q_{it}>\gamma\}:\gamma\in\Gamma\}$ on $\mathcal{X}$ is a VC class with a VC index 2. If $m_{ij}$ is the $(i,j)$ th element of $z_{it}(1,x_{it}^{\prime})$ , then $\mathcal{G}_{index}\cdot m_{ij}=\{g_{index}\cdot m_{ij}:g_{index}\in\mathcal{G}_{index}\}$ is also a VC class as discussed by Lemma 2.6.18 in van der Vaart and Wellner, (1996). The envelope for $\mathcal{G}_{index}\cdot m_{ij}$ would be $|m_{ij}|$ since an index function is always bounded by 1. The expectation of the envelope is bounded since $E\|z_{it}(1,x_{it}^{\prime})\|\leq\sqrt{E\|z_{it}\|^{2}E\|(1,x_{it}^{\prime})^{\prime}\|^{2}}<\infty$ . In conclusion, $\mathcal{G}_{index}\cdot m_{ij}$ is a $P$ -Glivenko-Cantelli for each $(i,j)$ , and thus the ULLN for $\{z_{it}(1,x_{it}^{\prime})1\{q_{it}>\gamma\}:\gamma\in\Gamma\}$ holds. ∎

Lemma D.3.

Let G hold. If $h_{n}\rightarrow 0$ , then

\sup_{\|\theta_{1}-\theta_{2}\|<h_{n}}\sqrt{n}\|\bar{g}_{n}(\theta_{1})-\bar{g}_{n}(\theta_{2})-g_{0}(\theta_{1})+g_{0}(\theta_{2})\|=o_{p}(1).

Proof.

Let $\omega_{i}=\{(z_{it},y_{it},x_{it},\epsilon_{it})_{t=1}^{T}\}$ be a random element in a measurable space $(\mathcal{X},\mathcal{A})$ , and $P$ is the probability measure for $\omega_{i}$ . Define a functional class $\mathcal{G}=\{g(\omega_{i},\theta):\theta\in\Theta\}$ on $\mathcal{X}$ such that

$\displaystyle g(\omega_{i},\theta)$	$\displaystyle=(g_{t_{0}}(\omega_{i},\theta)^{\prime},...,g_{T}(\omega_{i},\theta)^{\prime})^{\prime},$	(D.4)
$\displaystyle g_{t}(\omega_{i},\theta)$	$\displaystyle=z_{it}\Delta y_{it}-z_{it}\Delta x_{it}^{\prime}\beta-z_{it}1_{it}(\gamma)^{\prime}X_{it}\delta$
	$\displaystyle=z_{it}\Delta\epsilon_{it}-z_{it}\Delta x_{it}^{\prime}(\beta-\beta_{0})-z_{it}1_{it}(\gamma)^{\prime}X_{it}(\delta-\delta_{0})+z_{it}(1_{it}(\gamma_{0})^{\prime}-1_{it}(\gamma)^{\prime})X_{it}\delta_{0}.$

and $\mathcal{G}_{h}=\{g(\omega_{i},\theta_{1})-g(\omega_{i},\theta_{2}):\|\theta_{1}-\theta_{2}\|<h,\theta_{1},\theta_{2}\in\Theta\}$ . We need to show that $P(\|\mathbb{G}_{n}\|_{\mathcal{G}_{h}}>x)\rightarrow 0$ if $h\rightarrow 0$ as $n\rightarrow\infty$ , which is the asymptotic equicontinuity. To show the asymptotic equicontinuity, it is sufficient to show that each element of $\mathcal{G}$ is P-Donsker, e.g., 2.3.11 Lemma and its corollary in van der Vaart and Wellner, (1996), which is implied by the uniform entropy condtion:

\int_{0}^{\infty}\sup_{Q}\sqrt{\log N(\varepsilon\|G\|_{Q,2},\mathcal{G},L_{2}(Q))}d\varepsilon<\infty,

where supremum is taken over all probability measures $Q$ on $(\mathcal{X},\mathcal{A})$ such that $QG^{2}<\infty$ , and $G$ is an envelope for $\mathcal{G}$ . For more details, see section 2.1 in van der Vaart and Wellner, (1996). As we only need to consider each scalar element of $\mathcal{G}$ , it is sufficient to consider the following functional class

\widetilde{\mathcal{G}}^{(t)}=\{z_{it}\Delta\epsilon_{it}-z_{it}\Delta x_{it}\bar{\beta}-z_{it}1_{it}(\gamma_{1})^{\prime}X_{it}\delta_{1}+z_{it}1_{it}(\gamma_{2})^{\prime}X_{it}\delta_{2}\\ :\|\bar{\beta}\|\leq K,\|\delta_{1}\|\leq K,\|\delta_{2}\|\leq K,\gamma_{1},\gamma_{2}\in\Gamma\},

where $K<\infty$ is a constant such that $\|\theta\|\leq K/2$ if $\theta\in\Theta$ . Assume that $z_{it}$ is a scalar without losing of generality. Note that $g_{t}(\omega_{i},\theta)=z_{it}(\Delta y_{it}-\Delta x_{it}^{\prime}\beta-1_{it}(\gamma)^{\prime}X_{it}\delta)=z_{it}\Delta\epsilon_{it}-z_{it}\Delta x_{it}(\beta-\beta_{0n})-z_{it}1_{it}(\gamma)^{\prime}X_{it}\delta+z_{it}1_{it}(\gamma_{0})^{\prime}X_{it}\delta_{0}$ is an element of $\widetilde{\mathcal{G}}^{(t)}$ . So it is sufficient to show $\widetilde{\mathcal{G}}^{(t)}$ satisfies the uniform entropy condition.

Let $\mathcal{G}_{1}=\{z_{it}\Delta x_{it}^{\prime}\bar{\beta}:\|\bar{\beta}\|\leq K\}$ . $\mathcal{G}_{1}$ is a $p$ -dimensional vector space and is a VC class by 2.6.15 Lemma in van der Vaart and Wellner, (1996), with an envelope function $G_{1}(\omega_{i})=C\|z_{it}\Delta x_{it}^{\prime}\|$ for some constant $C<\infty$ , and $EG_{1}^{2}<\infty$ . Let $\mathcal{G}_{2}=\{z_{it}(1,x_{it}^{\prime})^{\prime}\delta 1\{q_{it}>\gamma\}:\|\delta\|\leq K,\gamma\in\Gamma\}$ , $\mathcal{G}_{2a}=\{z_{it}(1,x_{it}^{\prime})^{\prime}\delta:\|\delta\|\leq K\}$ , and $\mathcal{G}_{2b}=\{1\{q_{it}>\gamma\}:\gamma\in\Gamma\}$ . $G_{2a}=C\|z_{it}(1,x_{it}^{\prime})\|$ for some $C<\infty$ and $G_{2b}=1$ are envelopes for $\mathcal{G}_{2a}$ and $\mathcal{G}_{2b}$ , respectively. Note that $\mathcal{G}_{2}=\mathcal{G}_{2a}\mathcal{G}_{2b}$ , i.e., $\mathcal{G}_{2}$ is a collection of $g_{2a}\cdot g_{2b}$ where $g_{2a}\in\mathcal{G}_{2a}$ and $g_{2b}\in\mathcal{G}_{2b}$ . $\mathcal{G}_{2}$ satisfies the uniform entropy condition as pairwise sum or product of functional classes preserve the uniform entropy condition, e.g., Theorem 2.10.20 in van der Vaart and Wellner, (1996). Note that for every $d>0$ ,

\int_{0}^{d}\sup_{Q}\sqrt{\log N(\varepsilon\|(2G_{2a}^{2}G_{2b}^{2})^{1/2}\|_{Q,2},\mathcal{G}_{2},L_{2}(Q))}d\varepsilon\\ \leq\int_{0}^{d}\sup_{Q}\sqrt{\log N(\varepsilon\|G_{2a}\|_{Q,2},\mathcal{G}_{2a},L_{2}(Q))}d\varepsilon+\int_{0}^{d}\sup_{Q}\sqrt{\log N(\varepsilon\|G_{2b}\|_{Q,2},\mathcal{G}_{2b},L_{2}(Q))}d\varepsilon,

while $G_{2a}G_{2b}$ is an envelope of $\mathcal{G}_{2}$ . So the uniform entropy condition for $\mathcal{G}_{2}$ holds. Similarly, we can show that $\mathcal{G}_{3}=\{z_{it}(1,x_{it-1}^{\prime})^{\prime}\delta 1\{q_{it-1}>\gamma\}:\|\delta\|\leq K,\gamma\in\Gamma\}$ satisfies the uniform entropy condition. Hence, the functional class $(\mathcal{G}_{2}-\mathcal{G}_{3})$ defined by pairwise sum, which is a set of functions $g_{2}-g_{3}$ for all $g_{2}\in\mathcal{G}_{2}$ and $g_{3}\in\mathcal{G}_{3}$ , also satisfies the uniform entropy condition, e.g., Theorem 2.10.20 in van der Vaart and Wellner, (1996). As $(\mathcal{G}_{2}-\mathcal{G}_{3})$ is a superset of $\{z_{it}1_{it}(\gamma)^{\prime}X_{it}\delta:\|\delta\|\leq K,\gamma\in\Gamma\}$ , the functional class $\{z_{it}1_{it}(\gamma)^{\prime}X_{it}\delta:\|\delta\|\leq K,\gamma\in\Gamma\}$ also satisfies the uniform entropy condition . Thus, $\{z_{it}\Delta\epsilon_{it}\}-\mathcal{G}_{1}-(\mathcal{G}_{2}-\mathcal{G}_{3})+(\mathcal{G}_{2}-\mathcal{G}_{3})$ , which is a superset of $\widetilde{\mathcal{G}}^{(t)}$ , satisfies the uniform entropy condition by repetitively applying Theorem 2.10.20 in van der Vaart and Wellner, (1996), and hence $\widetilde{\mathcal{G}}^{(t)}$ also satisfies the condition.

Note that for some constant $C<\infty$ ,

\widetilde{G}=C(\|z_{it}\Delta x_{it}^{\prime}\|+\|z_{it}(1,x_{it}^{\prime})\|+\|z_{it}(1,x_{it-1}^{\prime})\|)+\|z_{it}\Delta\epsilon_{it}\|

is an envelope for $\widetilde{\mathcal{G}}^{(t)}$ , and $E\widetilde{G}^{2}<\infty$ by G.

∎

Lemma D.4.

When the true model is continuous and Assumptions G, D, and LK are true,

\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{it}(1_{it}(\gamma_{0})^{\prime}-1_{it}(\gamma_{0}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{it}\delta_{0}\xrightarrow{p}\frac{\delta_{30}}{2}\left\{E_{t}[z_{it}|\gamma_{0}]f_{t}(\gamma_{0})-E_{t-1}[z_{it}|\gamma_{0}]f_{t-1}(\gamma_{0})\right\}b^{2}

uniformly over $b\in[-K,K]$ for any $K<\infty$ .

Proof.

Note that

		$\displaystyle\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{it}(1_{it}(\gamma_{0})^{\prime}-1_{it}(\gamma_{0}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{it}\delta_{0}$
		$\displaystyle=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left\{z_{it}(1_{it}(\gamma_{0})^{\prime}-1_{it}(\gamma_{0}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{it}\delta_{0}-E[z_{it}(1_{it}(\gamma_{0})^{\prime}-1_{it}(\gamma_{0}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{it}\delta_{0}]\right\}$		(D.5)
		$\displaystyle\quad+\sqrt{n}E[z_{it}(1_{it}(\gamma_{0})^{\prime}-1_{it}(\gamma_{0}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{it}\delta_{0}].$		(D.6)

The stochastic term (D.5) converges in probability to zero uniformly with respect to $b\in[-K,K]$ . This is because Lemma D.3 shows that when $h_{n}\downarrow 0$ , then

\sup_{|\gamma-\gamma_{0}|<h_{n}}\sqrt{n}\left\{\frac{1}{n}\sum_{i=1}^{n}z_{it}(1_{it}(\gamma_{0})-1_{it}(\gamma))^{\prime}X_{it}\delta_{0}-E[z_{it}(1_{it}(\gamma_{0})-1_{it}(\gamma))^{\prime}X_{it}\delta_{0}]\right\}=o_{p}(1)

as it can be expressed as $\sup_{|\gamma-\gamma_{0}|<h_{n}}\|\bar{g}_{n}(\alpha_{0},\gamma)-\bar{g}_{n}(\alpha_{0},\gamma_{0})-g_{0}(\alpha_{0},\gamma)+g_{0}(\alpha_{0},\gamma_{0})\|$ .

Suppose $b>0$ . The case for $b<0$ follows similarly. As $n\rightarrow\infty$ , the deterministic term (D.6) converges as follows:

\sqrt{n}Ez_{it}(1_{it}(\gamma_{0})^{\prime}-1_{it}(\gamma_{0}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{it}\delta_{0}\\ =\sqrt{n}\left\{E[z_{it}(\delta_{10}+\delta_{30}q_{it})1\{\gamma_{0}+\tfrac{b}{n^{\frac{1}{4}}}\geq q_{it}>\gamma_{0}\}]-E[z_{it}(\delta_{10}+\delta_{30}q_{it-1})1\{\gamma_{0}+\tfrac{b}{n^{\frac{1}{4}}}\geq q_{it-1}>\gamma_{0}\}]\right\}\\ \rightarrow\frac{\delta_{30}}{2}\left\{E_{t}[z_{it}|\gamma_{0}]f_{t}(\gamma_{0})-E_{t-1}[z_{it}|\gamma_{0}]f_{t-1}(\gamma_{0})\right\}b^{2},

uniformly with respect to $b\in[-K,K]$ . To show that, use the (second-order) derivative of $\kappa\mapsto E[z_{it}(\delta_{10}+\delta_{30}q_{it})1\{\gamma_{0}+\kappa\geq q_{it}>\gamma_{0}\}]$ and derive the Taylor expansion

	$\displaystyle\sqrt{n}E[z_{it}(\delta_{10}+\delta_{30}q_{it})1\{\gamma_{0}+\tfrac{b}{n^{\frac{1}{4}}}\geq q_{it}>\gamma_{0}\}]$
	$\displaystyle=\frac{b^{2}}{2}\left(\delta_{30}E_{t}[z_{it}\|\gamma_{n,b}]f_{t}(\gamma_{n,b})+(\delta_{10}+\delta_{30}\gamma_{n,b})\frac{d}{d\gamma}E_{t}[z_{it}\|\gamma]f_{t}(\gamma)\|_{\gamma=\gamma_{n,b}}\right),$

where $\gamma_{n,b}\in[\gamma_{0},\gamma_{0}+\frac{b}{n^{1/4}}]$ . Note that $|\gamma_{n,b}-\gamma_{0}|\rightarrow 0$ unifromly with respect to $b\in[-K,K]$ . Since $E_{t}[z_{it}|\gamma]$ and $f_{t}(\gamma)$ are continuously differentiable at $\gamma_{0}$ by D, both $\frac{d}{d\gamma}E_{t}[z_{it}|\gamma]f_{t}(\gamma)|_{\gamma=\gamma_{n,b}}\rightarrow\frac{d}{d\gamma}E_{t}[z_{it}|\gamma]f_{t}(\gamma)|_{\gamma=\gamma_{0}}$ and $(\delta_{10}+\delta_{30}\gamma_{n,b})\rightarrow 0$ hold uniformly with respect to $b\in[-K,K]$ . On the other hand, $E_{t}[z_{it}|\gamma_{n,b}]f_{t}(\gamma_{n,b})\rightarrow E_{t}[z_{it}|\gamma_{0}]f_{t}(\gamma_{0})$ uniformly with respect to $b\in[-K,K]$ . Hence, $\sqrt{n}E[z_{it}(\delta_{10}+\delta_{30}q_{it})1\{\gamma_{0}+\tfrac{b}{n^{\frac{1}{4}}}\geq q_{it}>\gamma_{0}\}$ converges to $\frac{\delta_{30}}{2}E_{t}[z_{it}|\gamma_{0}]f_{t}(\gamma_{0})b^{2}$ uniformly with respect to $b\in[-K,K]$ as $n\rightarrow\infty$ . We can derive the similar result for $\sqrt{n}E[z_{it}(\delta_{10}+\delta_{30}q_{it-1})1\{\gamma_{0}+\tfrac{b}{n^{\frac{1}{4}}}\geq q_{it-1}>\gamma_{0}\}]$ .

∎

Appendix E Proofs of Theorems in Section 4 and Auxiliary Lemmas

E.1 Preliminaries

Proofs in this section are regarding bootstrap results, and hence we explain empirical process framework for our bootstrap analysis. Let $\omega_{1}^{*},...,\omega_{n}^{*}$ be i.i.d. resampling draws from a given sample $\{\omega_{i}:1\leq i\leq n\}$ . We set $\omega_{i}=\{(z_{it},y_{it},x_{it},\epsilon_{it})_{t=1}^{T}\}$ as in the proofs of Lemmas D.2 and D.3. An important functional class for our bootstrap analysis is $\mathcal{G}=\{g(\omega_{i},\theta):\theta\in\Theta\}$ where $g(\omega_{i},\theta)$ is defined as in (D.4).

Be mindful that $g_{i}^{*}(\theta)$ that appears in Section 4 is different from $g(\omega_{i}^{*},\theta)$ . This is because $g_{i}^{*}(\theta)=(g_{it_{0}}^{*}(\theta)^{\prime},...,g_{iT}^{*}(\theta)^{\prime})^{\prime}$ where

	$\displaystyle g_{it}^{*}(\theta)$	$\displaystyle=z_{it}^{}(\Delta y_{it}^{}-\Delta x_{it}^{\prime}\beta-1_{it}^{}(\gamma)^{\prime}X_{it}^{*}\delta)$
		$\displaystyle=\underbrace{-z_{it}^{}\Delta x_{it}^{\prime}(\beta-\beta_{0}^{})-z_{it}^{}1_{it}^{}(\gamma)^{\prime}X_{it}^{}(\delta-\delta_{0}^{})+z_{it}^{}(1_{it}^{}(\gamma_{0}^{})-1_{it}^{}(\gamma))^{\prime}X_{it}^{}\delta_{0}^{}}_{\textstyle(I)}+\underbrace{z_{it}^{}\widehat{\Delta\epsilon}_{it}^{*}}_{\textstyle(II)}.$		(E.1)

Recall that $\Delta y_{it}^{*}$ is not an i.i.d. resampling draw from $\{\Delta y_{it}:1\leq i\leq n\}$ but is generated using resampled regressors and residuals with regression equation using $\theta_{0}^{*}$ . The formula for $\Delta y_{it}^{*}$ is used to derive the equality in (E.1) (see Step 2 in Algorithm 1). Instead, $g_{it}^{*}(\theta)=g_{t}(\omega_{i}^{*},\theta)-g_{t}(\omega_{i}^{*},\theta_{0}^{*})+g_{t}(\omega_{i}^{*},\hat{\theta})$ . To be more precise, $(I)$ in (E.1) is $g_{t}(\omega_{i}^{*},\theta)-g_{t}(\omega_{i}^{*},\theta_{0}^{*})$ , and $(II)$ in (E.1) is $g_{t}(\omega_{i}^{*},\hat{\theta})$ .

E.2 Proof of Proposition 1

Consistency of the bootstrap estimator.

The bootstrap sample moment can be rewritten by

	$\displaystyle\bar{g}_{n}^{*}(\theta)$	$\displaystyle=\frac{1}{n}\sum_{i=1}^{n}(g_{i}^{*}(\theta)-\bar{g}_{n}(\hat{\theta}))$
		$\displaystyle=\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}}^{}\widehat{\Delta\epsilon}_{it_{0}}^{}\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iT}^{}\widehat{\Delta\epsilon}_{iT}^{}\end{pmatrix}-\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}}\widehat{\Delta\epsilon}_{it_{0}}\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iT}\widehat{\Delta\epsilon}_{iT}\end{pmatrix}-\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}}^{}\Delta x_{it_{0}}^{\prime}\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iT}^{}\Delta x_{iT}^{\prime}\end{pmatrix}(\beta-\beta_{0}^{*})$
		$\displaystyle\quad-\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}}^{}1_{it_{0}}^{}(\gamma)^{\prime}X_{it_{0}}^{}\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iT}^{}1_{iT}^{}(\gamma)^{\prime}X_{iT}^{}\end{pmatrix}(\delta-\delta_{0}^{})+\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}}^{}(1_{it_{0}}^{}(\gamma_{0}^{})-1_{it_{0}}^{}(\gamma))^{\prime}X_{it_{0}}^{}\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iT}^{}(1_{iT}^{}(\gamma_{0}^{})-1_{iT}^{}(\gamma))^{\prime}X_{iT}^{}\end{pmatrix}\delta_{0}^{}.$

We additionally define

\displaystyle v_{i}^{*}=

\displaystyle\begin{pmatrix}z_{it_{0}}^{*}\Delta y_{it_{0}}^{*}\\ \vdots\\ z_{iT}^{*}\Delta y_{iT}^{*}\end{pmatrix}-\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}}\widehat{\Delta\epsilon}_{it_{0}}\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iT}\widehat{\Delta\epsilon}_{iT}\end{pmatrix}

\displaystyle,\quad M_{i}^{*}(\gamma)=

\displaystyle-\begin{bmatrix}z_{it_{0}^{*}}(\Delta x_{it_{0}}^{*\prime},1_{it_{0}}^{*}(\gamma)^{\prime}X_{it_{0}}^{*})\\ \vdots\\ z_{iT}^{*}(\Delta x_{iT}^{*\prime},1_{iT}^{*}(\gamma)^{\prime}X_{iT}^{*})\end{bmatrix},

$\bar{v}_{n}^{*}=\frac{1}{n}\sum_{i=1}^{n}v_{i}^{*}$ , and $\bar{M}_{n}^{*}(\gamma)=\frac{1}{n}\sum_{i=1}^{n}M_{i}^{*}(\gamma)$ . Then, $\bar{g}_{n}^{*}(\theta)=\bar{v}_{n}^{*}+\bar{M}_{n}^{*}(\gamma)\alpha$ . Given $\gamma$ , we can obtain the constrained optimizer

\hat{\alpha}^{*}(\gamma)=-(\bar{M}_{n}^{*\prime}(\gamma)W_{n}^{*}\bar{M}_{n}^{*}(\gamma))^{-1}\bar{M}_{n}^{*\prime}(\gamma)W_{n}^{*}\bar{v}_{n}^{*}

where

\bar{v}_{n}^{*}=-\bar{M}_{n}^{*}(\gamma_{0}^{*})\alpha_{0}^{*}+\hat{u}_{n}^{*};\quad\hat{u}_{n}^{*}=\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}}^{*}\widehat{\Delta\epsilon}_{it_{0}}^{*}\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iT}^{*}\widehat{\Delta\epsilon}_{iT}^{*}\end{pmatrix}-\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}}\widehat{\Delta\epsilon}_{it_{0}}\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iT}\widehat{\Delta\epsilon}_{iT}\end{pmatrix}.

Let $\tilde{Q}_{n}^{*}(\gamma)=\hat{Q}_{n}^{*}(\hat{\alpha}^{*}(\gamma),\gamma)$ be a profiled criterion and $\hat{\gamma}^{*}=\arg\min_{\gamma\in\Gamma}\tilde{Q}_{n}^{*}(\gamma)$ . $\hat{u}_{n}^{*}=o_{p}^{*}(1)$ in $P$ by Lemma E.1. By Lemma E.3, $\sup_{\gamma\in\Gamma}\|\bar{M}_{n}^{*}(\gamma)-M_{0}(\gamma)\|=o_{p}^{*}(1)$ in $P$ . Therefore, if $|\hat{\gamma}^{*}-\gamma_{0}^{*}|\xrightarrow{p^{*}}0$ in $P$ , then $\|\hat{\alpha}^{*}(\hat{\gamma}^{*})-\alpha_{0}^{*}\|\xrightarrow{p^{*}}0$ in $P$ , which completes the proof.

Let $\tilde{g}_{n}^{*}(\gamma)=\bar{g}_{n}^{*}(\hat{\alpha}^{*}(\gamma),\gamma)$ which can be expressed as

\tilde{g}_{n}^{*}(\gamma)=\left[I-\bar{M}_{n}^{*}(\gamma)(\bar{M}_{n}^{*\prime}(\gamma)W_{n}^{*}\bar{M}_{n}^{*}(\gamma))^{-1}\bar{M}_{n}^{*\prime}(\gamma)W_{n}^{*}\right]\left(-\bar{M}_{n}^{*}(\gamma_{0}^{*})\alpha_{0}^{*}+\hat{u}_{n}^{*}\right).

Therefore,

W_{n}^{*1/2}\tilde{g}_{n}^{*}(\gamma)=\left[I-P_{W_{n}^{*1/2}\bar{M}_{n}^{*}(\gamma)}\right]\left(-W_{n}^{*1/2}\bar{M}_{n}^{*}(\gamma_{0}^{*})\alpha_{0}^{*}+W_{n}^{*1/2}\hat{u}_{n}^{*}\right),

and

\sup_{\gamma\in\Gamma}\left|\tilde{Q}_{n}^{*}(\gamma)-\left\|\left[I-P_{W^{1/2}M_{0}(\gamma)}\right]\left(-W^{1/2}M_{0}(\gamma_{0})\alpha_{0}\right)\right\|^{2}\right|=o_{p}^{*}(1)\text{ in $P$}

when $\|W_{n}^{*}-W\|=o_{p}^{*}(1)$ in $P$ and $\theta_{0}^{*}\xrightarrow{p}\theta_{0}$ . Note that $W$ is the identity matrix if it is for the first step estimation and $\Omega^{-1}$ if it is for the second step estimation and the first step estimator is consistent. Since the uniform probability limit of $\tilde{Q}_{n}^{*}(\gamma)$ conditional on the data is minimized when $\gamma=\gamma_{0}$ , the argmin CMT implies $\hat{\gamma}^{*}-\gamma_{0}=o_{p}^{*}(1)$ in $P$ . Recall that $\theta_{0}^{*}$ is set as $(\hat{\alpha}(\gamma_{0}),\gamma_{0})^{\prime}$ in Theorem 5, (8) in Theorem 6, and $\tilde{\theta}$ in Theorem 7. For both cases (i) and (ii) of the proposition, $\gamma_{0}^{*}\xrightarrow{p}\gamma_{0}$ which implies $\gamma_{0}^{*}-\gamma_{0}=o_{p}^{*}(1)$ in $P$ by Lemma B.1. Therefore, we can derive that $\hat{\gamma}^{*}-\gamma_{0}^{*}=(\hat{\gamma}^{*}-\gamma_{0})-(\gamma_{0}^{*}-\gamma_{0})=o_{p}^{*}(1)$ in $P$ .

Convergence rate under continuity.

By bootstrap equicontinuity, Lemma E.4, and the consistency of $\hat{\theta}^{*}$ to $\theta_{0}^{*}$ ,

\sqrt{n}\|\bar{g}_{n}^{*}(\hat{\theta}^{*})-\bar{g}_{n}^{*}(\theta_{0}^{*})-\bar{g}_{n}(\hat{\theta}^{*})+\bar{g}_{n}(\theta_{0}^{*})\|=o_{p}^{*}(1)\text{ in $P$.}

$\|W_{n}^{*}-W_{n}\|\xrightarrow{p^{*}}0$ in $P$ since $\|W_{n}-\Omega^{-1}\|=o_{p}^{*}(1)$ in $P$ and $\|W_{n}^{*}-\Omega^{-1}\|=o_{p}^{*}(1)$ in $P$ . The condition $\|W_{n}^{*}-\Omega^{-1}\|=o_{p}^{*}(1)$ in $P$ is implied by $\hat{\theta}_{(1)}\xrightarrow{p^{*}}\theta_{0}$ in $P$ , as $|\hat{\theta}_{(1)}^{*}-\theta_{0}^{*}|\xrightarrow{p^{*}}0$ and $\theta_{0}^{*}\xrightarrow{p^{*}}\theta_{0}$ in $P$ . Thus,

\sqrt{n}\|W_{n}^{*1/2}\bar{g}_{n}^{*}(\hat{\theta}^{*})-W_{n}^{*1/2}\bar{g}_{n}^{*}(\theta_{0}^{*})-W_{n}^{1/2}\bar{g}_{n}(\hat{\theta}^{*})+W_{n}^{1/2}\bar{g}_{n}(\theta_{0}^{*})\|=o_{p}^{*}(1)\text{ in $P$}.

Apply triangle inequality to get

\sqrt{n}\|W_{n}^{1/2}\bar{g}_{n}(\hat{\theta}^{*})-W_{n}^{1/2}\bar{g}_{n}(\theta_{0}^{*})\|\leq o_{p}^{*}(1)+\sqrt{n}\|W_{n}^{*1/2}\bar{g}_{n}^{*}(\theta_{0}^{*})\|+\sqrt{n}\|W_{n}^{*1/2}\bar{g}_{n}^{*}(\hat{\theta}^{*})\|

where $o_{p}^{*}(1)$ holds in $P$ . As $\hat{\theta}^{*}$ is the minimizer of the bootstrap criterion, $\sqrt{n}\|W_{n}^{*1/2}\bar{g}_{n}^{*}(\hat{\theta}^{*})\|\leq\sqrt{n}\|W_{n}^{*1/2}\bar{g}_{n}^{*}(\theta_{0}^{*})\|=O_{p}^{*}(1)$ in $P$ where the last equality is implied by Lemma E.2. Therefore,

\sqrt{n}\|W_{n}^{1/2}\bar{g}_{n}(\hat{\theta}^{*})-W_{n}^{1/2}\bar{g}_{n}(\theta_{0}^{*})\|\leq O_{p}^{*}(1)\text{ in $P$}.

By Lemma D.3, $\sqrt{n}\|W_{n}^{1/2}\bar{g}_{n}(\hat{\theta}^{*})-W_{n}^{1/2}\bar{g}_{n}(\theta_{0}^{*})-\Omega^{-1/2}g_{0}(\hat{\theta}^{*})+\Omega^{-1/2}g_{0}(\theta_{0}^{*})\|=o_{p}(1)$ , so it is $o_{p}^{*}(1)$ in $P$ by Lemma B.1. Hence,

\sqrt{n}\|\Omega^{-1/2}g_{0}(\hat{\theta}^{*})-\Omega^{-1/2}g_{0}(\theta_{0}^{*})\|\leq O_{p}^{*}(1)\text{ in $P$}.

By Lemma D.1, $\sqrt{n}\|\Omega^{-1/2}g_{0}(\hat{\theta}^{*})-\Omega^{-1/2}g_{0}(\theta_{0}^{*})\|\geq\sqrt{n}\|\Omega^{-1/2}M_{0}(\hat{\alpha}^{*}-\alpha_{0}^{*})+\Omega^{-1/2}H\{(\hat{\gamma}^{*}-\gamma_{0})^{2}-(\gamma_{0}^{*}-\gamma_{0})^{2}\}\|+o_{p}^{*}(1+\sqrt{n}\{\|\hat{\alpha}^{*}-\alpha_{0}^{*}\|+(\hat{\gamma}^{*}-\gamma_{0})^{2}+(\gamma_{0}^{*}-\gamma_{0})^{2}\})$ in $P$ . Therefore, $\sqrt{n}\|\hat{\alpha}^{*}-\alpha_{0}^{*}\|=O_{p}^{*}(1)$ in $P$ and $\sqrt{n}(\hat{\gamma}^{*}-\gamma_{0})^{2}=O_{p}^{*}(1)$ in $P$ . Suppose that $\sqrt{n}(\gamma_{0}^{*}-\gamma_{0})^{2}=O_{p}^{*}(1)$ in $P$ . Then, $\sqrt{n}(\hat{\gamma}^{*}-\gamma_{0}^{*})^{2}=O_{p}^{*}(1)$ in $P$ since $\sqrt{n}(\hat{\gamma}^{*}-\gamma_{0}^{*})^{2}\leq 2\sqrt{n}[(\hat{\gamma}^{*}-\gamma_{0})^{2}+(\gamma_{0}^{*}-\gamma_{0})^{2}]=O_{p}^{*}(1)$ in $P$ .

The condition, $\sqrt{n}(\gamma_{0}^{*}-\gamma_{0})^{2}=O_{p}^{*}(1)$ in $P$ , is true if $\sqrt{n}(\gamma_{0}^{*}-\gamma_{0})^{2}=O_{p}(1)$ by Lemma B.1. This is true for $\gamma_{0}^{*}=\gamma_{0}$ (Theorem 5 (i)), $\gamma_{0}^{*}=w_{n}\hat{\gamma}+(1-w_{n})\tilde{\gamma}$ (Theorem 6 (i)), or $\gamma_{0}^{*}=\tilde{\gamma}$ (Theorem 7 (i)). It is also the case for the standard nonparametric bootstrap as $\sqrt{n}(\hat{\gamma}-\gamma_{0})^{2}=O_{p}(1)$ by Theorem 2.

Convergence rate under discontinuity.

Identically to the proof for the continuous model, we can get

\sqrt{n}\|\Omega^{-1/2}g_{0}(\hat{\theta}^{*})-\Omega^{-1/2}g_{0}(\theta_{0}^{*})\|\leq O_{p}^{*}(1)\text{ in $P$}.

Meanwhile, $\sqrt{n}\|\Omega^{-1/2}g_{0}(\hat{\theta}^{*})-\Omega^{-1/2}g_{0}(\theta_{0}^{*})\|\geq C\sqrt{n}\|\hat{\theta}^{*}-\theta_{0}^{*}\|+o_{p}^{*}(1+\sqrt{n}\|\hat{\theta}^{*}-\theta_{0}^{*}\|)$ for some $C<\infty$ in $P$ when the true model is discontinuous and LJ holds. This is because $g_{0}(\theta)=D_{1}(\theta-\theta_{0})+o(\|\theta-\theta_{0}\|)$ by LJ and

o(1)=\frac{\|g_{0}(\theta)-D_{1}(\theta-\theta_{0})\|}{\|\theta-\theta_{0}\|}\geq\frac{\sqrt{n}\|g_{0}(\theta)-D_{1}(\theta-\theta_{0})\|}{1+\sqrt{n}\|\theta-\theta_{0}\|}.

Therefore, $\sqrt{n}\|\hat{\theta}^{*}-\theta_{0}^{*}\|\leq O_{p}^{*}(1)$ in $P$ .

E.3 Proof of Theorem 5.

In the grid bootstrap at $\gamma$ , $\theta_{0}^{*}=(\hat{\alpha}(\gamma)^{\prime},\gamma)^{\prime}$ .

When $\gamma=\gamma_{0}$ .

The proof of Theorem 6 still holds, and $\mathbb{S}_{n}^{*}(a,b)$ conditionally weakly converges to either $\mathbb{S}$ or $\mathbb{S}_{J}$ in $\ell^{\infty}(\mathbb{K})$ in $P$ for every compact $\mathbb{K}$ . The limit is $\mathbb{S}$ for the Theorem 5 (i) case, and $\mathbb{S}_{J}$ for the Theorem 5 (ii) case. By following the similar steps to the proof of Theorem 3, we can derive the asymptotic distributions of $\mathcal{D}_{n}^{*}(\gamma)$ .

When $\gamma\neq\gamma_{0}$ .

Note that $\bar{g}_{n}^{*}(\hat{\alpha}(\gamma),\gamma)=O_{p}^{*}(n^{-1/2})$ . It will be shown that $\|W_{n}^{*}\|=O_{p}^{*}(1)$ in $P$ . Then, $\min_{\alpha}\hat{Q}_{n}^{*}(\alpha,\gamma)\leq\hat{Q}_{n}^{*}(\hat{\alpha}(\gamma),\gamma)=\bar{g}_{n}^{*}(\hat{\alpha}(\gamma),\gamma)^{\prime}W_{n}^{*}\bar{g}_{n}^{*}(\hat{\alpha}(\gamma),\gamma)=O_{p}^{*}(n^{-1})$ , and $\mathcal{D}_{n}^{*}(\gamma)\leq n\min_{\alpha}\hat{Q}_{n}^{*}(\alpha,\gamma)=O_{p}^{*}(1)$ in $P$ , which completes the proof.

Recall that

W_{n}^{*}=\left\{\frac{1}{n}\sum_{i=1}^{n}[g_{i}^{*}(\hat{\theta}_{(1)}^{*})g_{i}^{*}(\hat{\theta}_{(1)}^{*})^{\prime}]-\frac{1}{n}\sum_{i=1}^{n}g_{i}^{*}(\hat{\theta}_{(1)}^{*})\frac{1}{n}\sum_{i=1}^{n}g_{i}^{*}(\hat{\theta}_{(1)}^{*})^{\prime}\right\}^{-1},

while $g_{i}^{*}(\theta)=g(\omega_{i}^{*},\theta)-g(\omega_{i}^{*},\theta_{0}^{*})+g(\omega_{i}^{*},\hat{\theta})$ as explained in Online Appendix E.1. The functional class $\mathcal{G}=\{g(\omega_{i},\theta):\theta\in\Theta\}$ is shown to satisfy the uniform entropy condition in the proof of Lemma D.3, and pairwise sum or product of functional classes preserve the uniform entropy condition by Theorem 2.10.20 in van der Vaart and Wellner, (1996). Hence, by applying the bootstrap Glivenko-Cantelli theorem, e.g., Lemma 3.6.16 in van der Vaart and Wellner, (1996),

\textstyle\sup_{\theta\in\Theta}\Biggl{\|}\frac{1}{n}\sum_{i=1}^{n}[g_{i}^{*}(\theta)g_{i}^{*}(\theta)^{\prime}]-\frac{1}{n}\sum_{i=1}^{n}g_{i}^{*}(\theta)\frac{1}{n}\sum_{i=1}^{n}g_{i}^{*}(\theta)^{\prime}\\ \textstyle-\Bigl{(}\frac{1}{n}\sum_{i=1}^{n}\left[\{g_{i}(\theta)-g_{i}(\theta_{0}^{*})+g_{i}(\hat{\theta})\}\{g_{i}(\theta)-g_{i}(\theta_{0}^{*})+g_{i}(\hat{\theta})\}^{\prime}\right]\\ \textstyle-\frac{1}{n}\sum_{i=1}^{n}\{g_{i}(\theta)-g_{i}(\theta_{0}^{*})+g_{i}(\hat{\theta})\}\frac{1}{n}\sum_{i=1}^{n}\{g_{i}(\theta)-g_{i}(\theta_{0}^{*})+g_{i}(\hat{\theta})\}^{\prime}\Bigr{)}\Biggl{\|}

is $o_{p}^{*}(1)$ in $P$ . Furthermore,

\frac{1}{n}\sum_{i=1}^{n}\left[\{g_{i}(\theta)-g_{i}(\theta_{1})+g_{i}(\theta_{2})\}\{g_{i}(\theta)-g_{i}(\theta_{1})+g_{i}(\theta_{2})\}^{\prime}\right]\\ -\frac{1}{n}\sum_{i=1}^{n}\{g_{i}(\theta)-g_{i}(\theta_{1})+g_{i}(\theta_{2})\}\frac{1}{n}\sum_{i=1}^{n}\{g_{i}(\theta)-g_{i}(\theta_{1})+g_{i}(\theta_{2})\}^{\prime}\\ \xrightarrow{p}E\left[\{g_{i}(\theta)-g_{i}(\theta_{1})+g_{i}(\theta_{2})\}\{g_{i}(\theta)-g_{i}(\theta_{1})+g_{i}(\theta_{2})\}^{\prime}\right]\\ -E[g_{i}(\theta)-g_{i}(\theta_{1})+g_{i}(\theta_{2})]E[g_{i}(\theta)-g_{i}(\theta_{1})+g_{i}(\theta_{2})]^{\prime}

uniformly with respect to $\theta$ , $\theta_{1}$ , and $\theta_{2}$ . As $\hat{\theta}$ and $\hat{\theta}_{0}^{*}$ are consistent to $\theta_{0}$ ,

\frac{1}{n}\sum_{i=1}^{n}[g_{i}^{*}(\theta)g_{i}^{*}(\theta)^{\prime}]-\frac{1}{n}\sum_{i=1}^{n}g_{i}^{*}(\theta)\frac{1}{n}\sum_{i=1}^{n}g_{i}^{*}(\theta)^{\prime}\xrightarrow{p^{*}}E\left[g_{i}(\theta)g_{i}(\theta)^{\prime}\right]-E[g_{i}(\theta)]E[g_{i}(\theta)]^{\prime}

uniformly with respect to $\theta$ . By the compactness of $\Theta$ , the minimum eigenvalue of $\{E\left[g_{i}(\theta)g_{i}(\theta)^{\prime}\right]-E[g_{i}(\theta)]E[g_{i}(\theta)]^{\prime}\}$ is bounded below by some constant $c>0$ . Therefore, $\sup_{\theta\in\Theta}\|W_{n}^{*}(\theta)\|=O_{p}^{*}(1)$ in $P$ where

W_{n}^{*}(\theta)=\left\{\frac{1}{n}\sum_{i=1}^{n}[g_{i}^{*}(\theta)g_{i}^{*}(\theta)^{\prime}]-\frac{1}{n}\sum_{i=1}^{n}g_{i}^{*}(\theta)\frac{1}{n}\sum_{i=1}^{n}g_{i}^{*}(\theta)^{\prime}\right\}^{-1}.

As $W_{n}^{*}=W_{n}^{*}(\hat{\theta}_{(1)}^{*})$ , we can conclude that $\|W_{n}^{*}\|=O_{p}^{*}(1)$ .

E.4 Proof of Theorem 7.

In the bootstrap for continuity test, $\theta_{0}^{*}=\tilde{\theta}$ , where $\tilde{\theta}$ is the continuity-restricted estimator.

Under the null hypothesis.

When the true model is continuous, the proof of Theorem 6 still holds. $\mathbb{S}_{n}^{*}(a,b)$ conditionally weakly converges to $\mathbb{S}$ in $\ell^{\infty}(\mathbb{K})$ in $P$ for every compact $\mathbb{K}$ . By following the similar steps to the proof of Theorem 4, we can derive the asymptotic distribution of $\mathcal{T}_{n}^{*}$ .

Under the alternative hypothesis.

Let the true model be discontinuous. Note that $\bar{g}_{n}^{*}(\tilde{\theta})=O_{p}^{*}(n^{-1/2})$ . Meanwhile, $\|W_{n}^{*}\|=O_{p}^{*}(1)$ in $P$ , by the same logic used in the proof of Theorem 5 when $\gamma\neq\gamma_{0}$ . Then, $\min_{\theta\in\Theta:\delta_{2}=0_{p-1},\delta_{1}=-\delta_{3}\gamma}\hat{Q}_{n}^{*}(\theta)\leq\hat{Q}_{n}^{*}(\tilde{\theta})=\bar{g}_{n}^{*}(\tilde{\theta})^{\prime}W_{n}^{*}\bar{g}_{n}^{*}(\tilde{\theta})=O_{p}^{*}(n^{-1})$ . Therefore, $\mathcal{T}_{n}^{*}\leq n\min_{\theta\in\Theta:\delta_{2}=0_{p-1},\delta_{1}=-\delta_{3}\gamma}\hat{Q}_{n}^{*}(\theta)=O_{p}^{*}(1)$ in $P$ , which completes the proof.

E.5 Lemmas

Lemma E.1.

If G holds,

\hat{u}_{n}^{*}=\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}}^{*}\widehat{\Delta\epsilon}_{it_{0}}^{*}\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iT}^{*}\widehat{\Delta\epsilon}_{iT}^{*}\end{pmatrix}-\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}}\widehat{\Delta\epsilon}_{it_{0}}\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iT}\widehat{\Delta\epsilon}_{iT}\end{pmatrix}\xrightarrow{p^{*}}0\text{ in $P$.}

Proof.

Let $u_{n}^{*}(\theta)=\frac{1}{n}\sum_{i=1}^{n}[g(\omega_{i}^{*},\theta)-\frac{1}{n}\sum_{i=1}^{n}g(\omega_{i},\theta)]$ where $g(\omega_{i},\theta)$ is defined by (D.4), and $\omega_{i}^{*}$ is a resampling draw from $\{\omega_{i}:i=1,...,n\}$ . See Online Appendix E.1 for more explanation. $\mathcal{G}=\{g(\omega_{i},\theta):\theta\in\Theta\}$ is shown to satisfy the uniform entropy condition in the proof of Lemma D.3. Therefore, by bootstrap Glivenko-Cantelli theorem, e.g., Lemma 3.6.16 in van der Vaart and Wellner, (1996), $\sup_{\theta\in\Theta}\|u_{n}^{*}(\theta)\|=o_{p}^{*}(1)$ in $P$ . Note that $\hat{u}_{n}^{*}=u_{n}^{*}(\hat{\theta})$ which completes the proof. ∎

Lemma E.2.

If G holds and $\hat{\theta}\xrightarrow{p}\theta_{0}$ , then

\sqrt{n}\left\{\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}}^{*}\widehat{\Delta\epsilon}_{it_{0}}^{*}\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iT}^{*}\widehat{\Delta\epsilon}_{iT}^{*}\end{pmatrix}-\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}}\widehat{\Delta\epsilon}_{it_{0}}\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iT}\widehat{\Delta\epsilon}_{iT}\end{pmatrix}\right\}\xrightarrow{d^{*}}N(0,\Omega)\text{ in $P$.}

Proof.

\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left[g(\omega_{i}^{*},\hat{\theta})-\frac{1}{n}\sum_{i=1}^{n}g(\omega_{i},\hat{\theta})\right]=\sqrt{n}\left\{\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}}^{*}\widehat{\Delta\epsilon}_{it_{0}}^{*}\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iT}^{*}\widehat{\Delta\epsilon}_{iT}^{*}\end{pmatrix}-\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}}\widehat{\Delta\epsilon}_{it_{0}}\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iT}\widehat{\Delta\epsilon}_{iT}\end{pmatrix}\right\}.

By Lemma E.4, $\sqrt{n}\|\bar{g}_{n}^{*}(\hat{\theta})-\bar{g}_{n}^{*}(\theta_{0})-\bar{g}_{n}(\hat{\theta})+\bar{g}_{n}(\theta_{0})\|=\sqrt{n}\|\frac{1}{n}\sum_{i=1}^{n}\left[g(\omega_{i}^{*},\hat{\theta})-\frac{1}{n}\sum_{i=1}^{n}g(\omega_{i},\hat{\theta})\right]-\frac{1}{n}\sum_{i=1}^{n}\left[g(\omega_{i}^{*},\theta_{0})-\frac{1}{n}\sum_{i=1}^{n}g(\omega_{i},\theta_{0})\right]\|=o_{p}^{*}(1)$ in $P$ . By the bootstrap CLT (e.g., Gine and Zinn, (1990)),

\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left[g(\omega_{i}^{*},\theta_{0})-\frac{1}{n}\sum_{i=1}^{n}g(\omega_{i},\theta_{0})\right]\xrightarrow{d^{*}}N(0,\Omega)\text{ in $P$.}

By applying the Slutsky theorem, we can derive $\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left[g(\omega_{i}^{*},\hat{\theta})-\frac{1}{n}\sum_{i=1}^{n}g(\omega_{i},\hat{\theta})\right]\xrightarrow{d^{*}}N(0,\Omega)$ in $P$ . ∎

Recall that $\bar{M}_{n}^{*}(\gamma)=\frac{1}{n}\sum_{i=1}^{n}M_{i}^{*}(\gamma)$ where

M_{i}^{*}(\gamma)=-\begin{bmatrix}z_{it_{0}^{*}}(\Delta x_{it_{0}}^{*\prime},1_{it_{0}}^{*}(\gamma)^{\prime}X_{it_{0}}^{*})\\ \vdots\\ z_{iT}^{*}(\Delta x_{iT}^{*\prime},1_{iT}^{*}(\gamma)^{\prime}X_{iT}^{*})\end{bmatrix}.

Lemma E.3.

If G is true, then

\sup_{\gamma\in\Gamma}\|\bar{M}_{n}^{*}(\gamma)-M_{0}(\gamma)\|\xrightarrow{p^{*}}0\text{ in $P$}.

Proof.

It is shown that the classes $\{z_{it}(1,x_{it}^{\prime})1\{q_{it}>\gamma\}:\gamma\in\Gamma\}$ and $\{z_{it}(1,x_{it-1}^{\prime})1\{q_{it-1}>\gamma\}:\gamma\in\Gamma\}$ are P-Glivenko-Cantelli in the proof of Lemma D.2. Then, by bootstrap Glivenko-Cantelli theorem, e.g., Lemma 3.6.16 in van der Vaart and Wellner, (1996), the result of this lemma holds.

∎

Lemma E.4.

Let G hold. If $h_{n}\rightarrow 0$ , then

\sup_{\|\theta_{1}-\theta_{2}\|<h_{n}}\sqrt{n}\|\bar{g}_{n}^{*}(\theta_{1})-\bar{g}_{n}^{*}(\theta_{2})-\bar{g}_{n}(\theta_{1})+\bar{g}_{n}(\theta_{2})\|=o_{p}^{*}(1)\text{ in $P$.}

Proof.

Note that $g_{i}^{*}(\theta_{1})-g_{i}^{*}(\theta_{2})=g(\omega_{i}^{*},\theta_{1})-g(\omega_{i}^{*},\theta_{2})$ for any $\theta_{1}$ and $\theta_{2}$ where $g(\omega_{i},\theta)$ is defined by (D.4), and $\omega_{i}^{*}$ is a resampling from $\{\omega_{i}:i=1,...,n\}$ . Hence, $\bar{g}_{n}^{*}(\theta_{1})-\bar{g}_{n}^{*}(\theta_{2})-\bar{g}_{n}(\theta_{1})+\bar{g}_{n}(\theta_{2})=\frac{1}{n}\sum_{i=1}^{n}\left[g(\omega_{i}^{*},\theta_{1})-\frac{1}{n}\sum_{i=1}^{n}g(\omega_{i},\theta_{1})\right]-\frac{1}{n}\sum_{i=1}^{n}\left[g(\omega_{i}^{*},\theta_{2})-\frac{1}{n}\sum_{i=1}^{n}g(\omega_{i},\theta_{2})\right]$ . By bootstrap version of stochastic equicontinuity, e.g., C2 in the proof of Theorem 2.1 in Praestgaard and Wellner, (1993), the result of this lemma holds if $\{g(\omega_{i},\theta):\theta\in\Theta\}$ satisfies the uniform entropy condition and has a square integrable envelope function, which are verified in the proof of Lemma D.3. ∎

Lemma E.5.

Suppose that Assumptions G, D, and LK hold, and the true model is continuous. If $\delta_{20}^{*}=O_{p}(n^{-1/2})$ , $\delta_{30}^{*}-\delta_{30}=O_{p}(n^{-1/2})$ , $\gamma_{0}^{*}-\gamma_{0}=O_{p}(n^{-1/4})$ , and $\delta_{10}^{*}+\delta_{30}^{*}\gamma_{0}^{*}=O_{p}(n^{-1/2})$ , then

\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{it}^{*}(1_{it}^{*}(\gamma_{0}^{*})^{\prime}-1_{it}^{*}(\gamma_{0}^{*}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{it}^{*}\delta_{0}^{*}\xrightarrow{p^{*}}\frac{\delta_{30}}{2}\left\{E_{t}[z_{it}|\gamma_{0}]f_{t}(\gamma_{0})-E_{t-1}[z_{it}|\gamma_{0}]f_{t-1}(\gamma_{0})\right\}b^{2},

in $P$ uniformly with respect to $b\in[-K,K]$ for any $K<\infty$ .

The conditions for $\delta_{0}^{*}$ and $\gamma_{0}^{*}$ hold if (i) $\theta_{0}^{*}=(\hat{\alpha}(\gamma_{0})^{\prime},\gamma_{0})^{\prime}$ , (ii) $\theta_{0}^{*}$ is set as (8), and (iii) $\theta_{0}^{*}=\tilde{\theta}$ , which is the continuity-restricted estimator in Section 3.2, under the assumptions of this lemma. For (i), $\sqrt{n}(\hat{\alpha}(\gamma_{0})-\alpha_{0})$ is asymptotically normal, and $\hat{\delta}_{1}(\gamma_{0})-\delta_{10}+(\hat{\delta}_{3}(\gamma_{0})-\delta_{30})\cdot\gamma_{0}=O_{p}(n^{-1/2})$ . For (ii), note that $w_{n}=O_{p}(n^{-1/4})$ . $\delta_{10}^{*}+\delta_{30}^{*}\gamma_{0}^{*}=w_{n}(\hat{\delta}_{1}+\hat{\delta}_{3}\hat{\gamma})+w_{n}(1-w_{n})(\hat{\delta}_{3}-\tilde{\delta}_{3})(\tilde{\gamma}-\hat{\gamma})+(1-w_{n})(\tilde{\delta}_{1}+\tilde{\delta}_{3}\tilde{\gamma})$ , while $w_{n}(\hat{\delta}_{1}+\hat{\delta}_{3}\hat{\gamma})=O_{p}(n^{-1/2})$ , $(1-w_{n})(\tilde{\delta}_{1}+\tilde{\delta}_{3}\tilde{\gamma})=0$ , and $(1-w_{n})w_{n}(\hat{\delta}_{3}-\tilde{\delta}_{3})(\tilde{\gamma}-\hat{\gamma})=O_{p}(n^{-1/4})O_{p}(n^{-1/2})O_{p}(n^{-1/4})$ . $\delta_{20}^{*}=w_{n}\hat{\delta}_{2}=O_{p}(n^{-3/4})$ , and $\delta_{30}^{*}-\delta_{30}=w_{n}(\hat{\delta}_{3}-\delta_{30})+(1-w_{n})(\tilde{\delta}_{3}-\delta_{30})=O_{p}(n^{-3/4})+O_{p}(n^{-1/2})$ . $\gamma_{0}^{*}-\gamma_{0}=w_{n}(\hat{\gamma}-\gamma_{0})+(1-w_{n})(\tilde{\gamma}-\gamma_{0})=O_{p}(n^{-1/4})O_{p}(n^{-1/4})+O_{p}(n^{-1/2})=O_{p}(n^{-1/2})$ also holds. For (iii), Kim et al., (2019) showed that $\tilde{\theta}-\theta_{0}=O_{p}(n^{-1/2})$ , while $\tilde{\delta}_{1}+\tilde{\delta}_{3}\tilde{\gamma}=0$ and $\tilde{\delta}_{2}=0_{p-1}$ by definition.

Proof.

Note that

		$\displaystyle\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{it}^{}(1_{it}(\gamma_{0}^{})^{\prime}-1_{it}^{}(\gamma_{0}^{}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{it}^{}\delta_{0}^{}$
		$\displaystyle=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{it}^{}(1_{it}^{}(\gamma_{0}^{})^{\prime}-1_{it}^{}(\gamma_{0}^{}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{it}^{}\delta_{0}^{}-\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{it}(1_{it}(\gamma_{0}^{})^{\prime}-1_{it}(\gamma_{0}^{}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{it}\delta_{0}^{}$		(E.2)
		$\displaystyle\quad+\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{it}(1_{it}(\gamma_{0}^{})^{\prime}-1_{it}(\gamma_{0}^{}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{it}\delta_{0}^{*}.$		(E.3)

First, we show that the stochastic term (E.2) is $o_{p}^{*}(1)$ in $P$ uniformly with respect to $b\in[-K,K]$ . Note that $\{z_{it}(1_{it}(\gamma)^{\prime}-1_{it}(\gamma+\kappa)^{\prime})X_{it}\delta:\theta\in\Theta,|\kappa|\leq K\}=\{g(\omega_{i},(\alpha^{\prime},\gamma)^{\prime})-g(\omega_{i},(\alpha^{\prime},\gamma+\kappa)^{\prime}):\theta\in\Theta,|\kappa|\leq K\}$ while $\mathcal{G}=\{g(\omega_{i},\theta):\theta\in\Theta\}$ is shown to satisfy the uniform entropy condition and to have a square integrable envelope in the proof of Lemma D.3. Then, by C2 in the proof of Theorem 2.1 in Praestgaard and Wellner, (1993), the following bootstrap asymptotic equicontinuity can be derived:

\sup_{\begin{subarray}{c}b\in[-K,K],\\ \theta\in\Theta\end{subarray}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left\{z_{it}^{*}(1_{it}^{*}(\gamma)^{\prime}-1_{it}^{*}(\gamma+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{it}^{*}\delta-\frac{1}{n}\sum_{i=1}^{n}z_{it}(1_{it}(\gamma)^{\prime}-1_{it}(\gamma+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{it}\delta\right\}

is $o_{p}^{*}(1)$ in $P$ . Hence, by plugging in $\theta_{0}^{*}$ to the place of $\theta$ in the last display, we can derive that (E.2) is $o_{p}^{*}(1)$ in $P$ uniformly with respect to $b\in[-K,K]$ .

Next, we show that (E.3) term converges to a deterministic limit. As $\{z_{it}1_{it}(\gamma)^{\prime}X_{it}\delta:\theta\in\Theta,|\kappa|\leq K\}$ satisfies the uniform entropy condition and has a square integrable envelope function, we can derive the following asymptotic equicontinuity:

\sup_{\begin{subarray}{c}b\in[-K,K],\\ \theta\in\Theta\end{subarray}}\left\|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{it}(1_{it}(\gamma)^{\prime}-1_{it}(\gamma+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{it}\delta-\sqrt{n}E[z_{it}(1_{it}(\gamma)^{\prime}-1_{it}(\gamma+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{it}\delta]\right\|

is $o_{p}(1)$ , and hence $o_{p}^{*}(1)$ in $P$ by Lemma B.1. Therefore,

\sup_{\begin{subarray}{c}b\in[-K,K],\\ \theta\in\Theta\end{subarray}}\left\|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{it}^{*}(1_{it}^{*}(\gamma)^{\prime}-1_{it}^{*}(\gamma+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{it}^{*}\delta-\sqrt{n}E[z_{it}(1_{it}(\gamma)^{\prime}-1_{it}(\gamma+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{it}\delta]\right\|

is $o_{p}^{*}(1)$ in $P$ .

Let $J_{n}(\delta,\gamma,b)=\sqrt{n}E[z_{it}(1_{it}(\gamma)^{\prime}-1_{it}(\gamma+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{it}\delta]$ . By assumption, we can reparametrize such that $\delta_{20}^{*}=\frac{r_{\delta_{2}}}{\sqrt{n}}$ , $\delta_{30}^{*}=\delta_{30}+\frac{r_{\delta_{3}}}{\sqrt{n}}$ , $\gamma_{0}^{*}=\gamma_{0}+\frac{r_{\gamma}}{n^{1/4}}$ , and $\delta_{10}^{*}=-\delta_{30}^{*}\gamma_{0}^{*}+\frac{r_{\delta_{1}+\delta_{3}\gamma}}{\sqrt{n}}=\delta_{10}-\delta_{30}\frac{r_{\gamma}}{n^{1/4}}-\gamma_{0}\frac{r_{\delta_{3}}}{\sqrt{n}}-\tfrac{r_{\gamma}r_{\delta_{3}}}{n^{3/4}}+\frac{r_{\delta_{1}+\delta_{3}\gamma}}{\sqrt{n}}$ . Then, we can reparametrize the function $J_{n}$ such that

\widetilde{J}_{n}(r_{\delta_{1}+\delta_{3}\gamma},r_{\delta_{2}},r_{\delta_{3}},r_{\gamma},b)=J_{n}(\delta_{10}-\delta_{30}\tfrac{r_{\gamma}}{n^{1/4}}-\gamma_{0}\tfrac{r_{\delta_{3}}}{\sqrt{n}}-\tfrac{r_{\gamma}r_{\delta_{3}}}{n^{3/4}}+\tfrac{r_{\delta_{1}+\delta_{3}\gamma}}{\sqrt{n}},\tfrac{r_{\delta_{2}}}{\sqrt{n}},\delta_{30}+\tfrac{r_{\delta_{3}}}{\sqrt{n}},\gamma_{0}+\tfrac{r_{\gamma}}{n^{1/4}},b).

(E.4)

Let $r=(r_{\delta_{1}+\delta_{3}\gamma},r_{\delta_{2}},r_{\delta_{3}},r_{\gamma})$ which lies in a compact set $\mathcal{R}=\{r\in\mathbb{R}^{p+2}:\|r\|\leq\overline{K}\}$ for an aribtrary $\overline{K}<\infty$ .

To prove the lemma, it will be shown below that

\widetilde{J}_{n}(r_{\delta_{1}+\delta_{3}\gamma},r_{\delta_{2}},r_{\delta_{3}},r_{\gamma},b)\rightarrow\frac{\delta_{30}}{2}\left\{E_{t}[z_{it}|\gamma_{0}]f_{t}(\gamma_{0})-E_{t-1}[z_{it}|\gamma_{0}]f_{t-1}(\gamma_{0})\right\}b^{2}

uniformly with respect to $r\in\mathcal{R}$ and $b\in[-K,K]$ , which in turn implies

\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{it}^{*}(1_{it}^{*}(\gamma_{0}^{*})^{\prime}-1_{it}^{*}(\gamma_{0}^{*}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{it}^{*}\delta_{0}^{*}\xrightarrow{p^{*}}\frac{\delta_{30}}{2}\left\{E_{t}[z_{it}|\gamma_{0}]f_{t}(\gamma_{0})-E_{t-1}[z_{it}|\gamma_{0}]f_{t-1}(\gamma_{0})\right\}b^{2}\text{ in $P$}

uniformly with respect to $b\in[-K,K]$ since

\sup_{b\in[-K,K]}\left\|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{it}^{*}(1_{it}^{*}(\gamma_{0}^{*})^{\prime}-1_{it}^{*}(\gamma_{0}^{*}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{it}^{*}\delta_{0}^{*}-J_{n}(\delta_{0}^{*},\gamma_{0}^{*},b)\right\|=o_{p}^{*}(1)\text{ in $P$.}

Suppose $b>0$ . The case for $b<0$ follows similarly. Note that

\sqrt{n}E[z_{it}(1_{it}(\gamma)^{\prime}-1_{it}(\gamma+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{it}\delta]=\sqrt{n}E[z_{it}(1,x_{it}^{\prime})\delta 1\{\gamma+\tfrac{b}{n^{\frac{1}{4}}}\geq q_{it}>\gamma\}]\\ -\sqrt{n}E[z_{it}(1,x_{it-1}^{\prime})\delta 1\{\gamma+\tfrac{b}{n^{\frac{1}{4}}}\geq q_{it-1}>\gamma\}].

We focus on the first term on the right hand side $\sqrt{n}E[z_{it}(1,x_{it}^{\prime})\delta 1\{\gamma+\tfrac{b}{n^{\frac{1}{4}}}\geq q_{it}>\gamma\}]$ since the limit of the second term can be analyzed similarly, and redefine $J_{n}(\delta,\gamma,b)=\sqrt{n}E[z_{it}(1,x_{it}^{\prime})\delta 1\{\gamma+\tfrac{b}{n^{\frac{1}{4}}}\geq q_{it}>\gamma\}]$ and $\widetilde{J}_{n}$ , accordingly. Let $x_{it}=(\xi_{it}^{\prime},q_{it})^{\prime}$ where $\xi_{it}\in\mathbb{R}^{p-1}$ . Then, $J_{n}(\delta,\gamma,b)=J_{1n}(\delta,\gamma,b)+J_{2n}(\delta,\gamma,b)$ where

	$\displaystyle J_{1n}(\delta,\gamma,b)$	$\displaystyle=\sqrt{n}E[z_{it}\xi_{it}^{\prime}\delta_{2}1\{\gamma+\tfrac{b}{n^{\frac{1}{4}}}\geq q_{it}>\gamma\}],\text{ and}$
	$\displaystyle J_{2n}(\delta,\gamma,b)$	$\displaystyle=\sqrt{n}E[z_{it}(\delta_{1}+\delta_{3}q_{it})1\{\gamma+\tfrac{b}{n^{\frac{1}{4}}}\geq q_{it}>\gamma\}].$

Similarly to $\widetilde{J}_{n}$ in (E.4), we define reparametrized function $\widetilde{J}_{1n}$ and $\widetilde{J}_{2n}$ .

Limit of $\widetilde{J}_{1n}$ :

We can derive the Taylor expansion

\widetilde{J}_{1n}(r,b)=E[z_{it}\xi_{it}^{\prime}r_{\delta_{2}}1\{\gamma_{0}+\tfrac{b+r_{\gamma}}{n^{\frac{1}{4}}}\geq q_{it}>\gamma_{0}+\tfrac{r_{\gamma}}{n^{\frac{1}{4}}}\}]=E_{t}[z_{it}\xi_{it}^{\prime}\tfrac{r_{\delta_{2}}}{n^{1/4}}|\gamma_{n,b}]f_{t}(\gamma_{n,b})b,

where $\gamma_{n,b}\in[\gamma_{0}+\frac{r_{\gamma}}{n^{1/4}},\gamma_{0}+\frac{b+r_{\gamma}}{n^{1/4}}]$ . As both $r_{\gamma}$ and $b$ are in compact spaces, $\gamma_{n,b}\rightarrow\gamma_{0}$ uniformly with respect to $r_{\gamma}$ and $b$ . By D, $E_{t}[z_{it}\xi_{it}^{\prime}|\gamma]f_{t}(\gamma)$ is bounded and continuous on a neighborhood $\mathcal{O}$ of $\gamma_{0}$ . Therefore, $E_{t}[z_{it}\xi_{it}^{\prime}|\gamma_{n,b}]f_{t}(\gamma_{n,b})\rightarrow E_{t}[z_{it}\xi_{it}^{\prime}|\gamma_{0}]f_{t}(\gamma_{0})$ . Since $\frac{r_{\delta_{2}}}{n^{1/4}}\rightarrow 0$ , we can derive $\widetilde{J}_{1n}(r,b)\rightarrow 0$ uniformly in $r$ and $b$ .

Limit of $\widetilde{J}_{2n}$ :

We can derive the Taylor expansion

		$\displaystyle\widetilde{J}_{2n}(r,b)$
		$\displaystyle=\sqrt{n}E[z_{it}(\delta_{10}-\delta_{30}\tfrac{r_{\gamma}}{n^{1/4}}-\gamma_{0}\tfrac{r_{\delta_{3}}}{\sqrt{n}}-\tfrac{r_{\gamma}r_{\delta_{3}}}{n^{3/4}}+\tfrac{r_{\delta_{1}+\delta_{3}\gamma}}{\sqrt{n}}+(\delta_{30}+\tfrac{r_{\delta_{3}}}{\sqrt{n}})q_{it})1\{\gamma_{0}+\tfrac{b+r_{\gamma}}{n^{\frac{1}{4}}}\geq q_{it}>\gamma_{0}+\tfrac{r_{\gamma}}{n^{\frac{1}{4}}}\}]$
		$\displaystyle=\tfrac{r_{\delta_{1}+\delta_{3}\gamma}}{n^{1/4}}E_{t}[z_{it}\|\gamma_{0}+\tfrac{r_{\gamma}}{n^{\frac{1}{4}}}]f_{t}(\gamma_{0}+\tfrac{r_{\gamma}}{n^{\frac{1}{4}}})b$		(E.5)
		$\displaystyle\quad+\frac{b^{2}}{2}(\tfrac{r_{\delta_{1}+\delta_{3}\gamma}}{\sqrt{n}}+(\delta_{30}+\tfrac{r_{\delta_{3}}}{\sqrt{n}})(\gamma_{n,b}-\gamma_{0}-\tfrac{b}{n^{\frac{1}{4}}}))\frac{d}{d\gamma}\left\{E_{t}[z_{it}\|\gamma]f_{t}(\gamma)\right\}\|_{\gamma=\gamma_{n,b}}$		(E.6)
		$\displaystyle\quad+\frac{b^{2}}{2}(\delta_{30}+\tfrac{r_{\delta_{3}}}{\sqrt{n}})E_{t}[z_{it}\|\gamma_{n,b}]f_{t}(\gamma_{n,b}),$		(E.7)

where $\gamma_{n,b}\in[\gamma_{0}+\frac{r_{\gamma}}{n^{1/4}},\gamma_{0}+\frac{b+r_{\gamma}}{n^{1/4}}]$ .

First, we can observe that (E.5) converges to zero uniformly with respect to $r_{\delta_{1}+\delta_{3}\gamma}$ , $r_{\gamma}$ , and $b$ . This is because $\gamma_{n,b}\rightarrow\gamma_{0}$ uniformly with respect to $r_{\gamma}$ and $b$ , which implies $E_{t}[z_{it}|\gamma_{0}+\tfrac{r_{\gamma}}{n^{\frac{1}{4}}}]f_{t}(\gamma_{0}+\tfrac{r_{\gamma}}{n^{\frac{1}{4}}})\rightarrow E_{t}[z_{it}|\gamma_{0}]f_{t}(\gamma_{0})$ , while $\frac{r_{\delta_{1}+\delta_{3}\gamma}}{n^{1/4}}b\rightarrow 0$ .

Next, we check that (E.6) converges to zero uniformly with respect to $r_{\delta_{1}+\delta_{3}\gamma}$ , $r_{\gamma}$ , and $b$ . By D, $\frac{d}{d\gamma}(E_{t}[z_{it}|\gamma]f_{t}(\gamma))$ is bounded and continuous on a neighborhood $\mathcal{O}$ of $\gamma_{0}$ . As $\gamma_{n,b}\rightarrow\gamma_{0}$ uniformly with respect to $r_{\gamma}$ and $b$ , $\frac{d}{d\gamma}(E_{t}[z_{it}|\gamma]f_{t}(\gamma))|_{\gamma=\gamma_{n,b}}\rightarrow\frac{d}{d\gamma}(E_{t}[z_{it}|\gamma]f_{t}(\gamma))|_{\gamma=\gamma_{0}}$ and $(\tfrac{r_{\delta_{1}+\delta_{3}\gamma}}{\sqrt{n}}+(\delta_{30}+\tfrac{r_{\delta_{3}}}{\sqrt{n}})(\gamma_{n,b}-\gamma_{0}-\tfrac{b}{n^{\frac{1}{4}}}))\rightarrow 0$ , which implies the convergence of (E.6) to zero.

Finally, we obtain the limit of (E.7). Since $E_{t}[z_{it}|\gamma_{n,b}]f_{t}(\gamma_{n,b})\rightarrow E_{t}[z_{it}|\gamma_{0}]f_{t}(\gamma_{0})$ and $\frac{r_{\delta_{3}}}{\sqrt{n}}\rightarrow 0$ , (E.7) converges to $\frac{\delta_{30}}{2}E_{t}[z_{it}|\gamma_{0}]f_{t}(\gamma_{0})b^{2}$ uniformly with respect to $r\in\mathcal{R}$ and $b\in[-K,K]$ .

In conclusion,

\widetilde{J}_{n}(r,b)\rightarrow\frac{\delta_{30}}{2}E_{t}[z_{it}|\gamma_{0}]f_{t}(\gamma_{0})b^{2}

uniformly with respect to $r\in\mathcal{R}$ and $b\in[-K,K]$ , and hence

\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{it}^{*}(1,x_{it}^{*\prime})\delta_{0}^{*}1\{\gamma_{0}^{*}+\tfrac{b}{n^{\frac{1}{4}}}\geq q_{it}^{*}>\gamma_{0}^{*}\}\xrightarrow{p^{*}}\frac{\delta_{30}}{2}E_{t}[z_{it}|\gamma_{0}]f_{t}(\gamma_{0})b^{2}\quad\text{ in $P$}

uniformly with respect to $b\in[-K,K]$ . Similarly, we can show that

\frac{1}{\sqrt{n}}z_{it}^{*}(1,x_{it-1}^{*\prime})\delta_{0}^{*}1\{\gamma_{0}^{*}+\tfrac{b}{n^{\frac{1}{4}}}\geq q_{it-1}^{*}>\gamma_{0}^{*}\}\xrightarrow{p^{*}}\frac{\delta_{30}}{2}E_{t-1}[z_{it}|\gamma_{0}]f_{t-1}(\gamma_{0})b^{2}\quad\text{ in $P$}

uniformly with respect to $b\in[-K,K]$ . ∎

Lemma E.6.

Suppose that Assumptions G, D, and LJ hold, and the true model is discontinuous. If $\delta_{0}^{*}-\delta_{0}=O_{p}(n^{-1/2})$ and $\gamma_{0}^{*}-\gamma_{0}=O_{p}(n^{-1/2})$ , then

\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{it}^{*}(1_{it}^{*}(\gamma_{0}^{*})^{\prime}-1_{it}^{*}(\gamma_{0}^{*}+\tfrac{b}{\sqrt{n}})^{\prime})X_{it}^{*}\delta_{0}^{*}\\ \xrightarrow{p^{*}}\left\{E_{t}[z_{it}(1,x_{it}^{\prime})\delta_{0}|\gamma_{0}]f_{t}(\gamma_{0})-E_{t-1}[z_{it}(1,x_{it-1}^{\prime})\delta_{0}|\gamma_{0}]f_{t-1}(\gamma_{0})\right\}b,

in $P$ uniformly with respect to $b\in[-K,K]$ for any $K<\infty$ .

The conditions for $\delta_{0}^{*}$ and $\gamma_{0}^{*}$ hold if (i) $\theta_{0}^{*}=(\hat{\alpha}(\gamma_{0})^{\prime},\gamma_{0})^{\prime}$ or (ii) $\theta_{0}^{*}$ is set as (8) under the assumptions of this lemma. Note that $\delta_{0}^{*}=w_{n}\hat{\delta}+(1-w_{n})\tilde{\delta}=\delta_{0}+O_{p}(n^{-1/2})$ since $w_{n}\xrightarrow{p}1$ , $\hat{\delta}=\delta_{0}+O_{p}(n^{-1/2})$ , and $\tilde{\delta}=O_{p}(1)$ by P.

If $M_{0}(\gamma)$ has full column rank for all $\gamma\in\Gamma$ , then P holds. Let $\tilde{M}_{n}(\gamma)=\left[\begin{array}[]{c;{2pt/2pt}c}\bar{M}_{1n}&\tilde{M}_{2n}(\gamma)\end{array}\right]$ , $\tilde{M}_{2n}(\gamma)=\bar{M}_{2n}(\gamma)\left(-\gamma,0_{p-1},1\right)^{\prime}$ , $\tilde{M}_{0}(\gamma)=\left[\begin{array}[]{c;{2pt/2pt}c}M_{10}&\tilde{M}_{20}(\gamma)\end{array}\right]$ , and $\tilde{M}_{20}(\gamma)=M_{20}(\gamma)\left(-\gamma,0_{p-1},1\right)^{\prime}$ . Note that $\tilde{\alpha}(\gamma)=-(\tilde{M}_{n}(\gamma)^{\prime}W_{n}\tilde{M}_{n}(\gamma))^{-1}\tilde{M}_{n}(\gamma)^{\prime}W_{n}v_{n}$ , where $\tilde{\alpha}(\gamma)=\arg\min_{\alpha:\delta_{2}=0_{p-1},\delta_{1}=-\delta_{3}\gamma}\hat{Q}_{n}(\alpha,\gamma)$ . Since $v_{n}=\frac{1}{n}\sum_{i=1}^{n}(z_{it_{0}}^{\prime}\Delta y_{it_{0}},...,z_{iT}^{\prime}\Delta y_{iT})^{\prime}\xrightarrow{p}-M_{0}\alpha_{0}$ and $\sup_{\gamma\in\Gamma}\|\tilde{M}_{n}(\gamma)-\tilde{M}_{0}(\gamma)\|\xrightarrow{p}0$ by Lemma D.2, $\tilde{\alpha}(\gamma)\xrightarrow{p}(\tilde{M}_{0}(\gamma)^{\prime}\Omega^{-1}\tilde{M}_{0}(\gamma))^{-1}\tilde{M}_{0}(\gamma)\Omega^{-1}M_{0}\alpha_{0}$ uniformly with respect to $\gamma$ . Since $\Gamma$ is compact, there exists $C<\infty$ such that $\sup_{\gamma\in\Gamma}\|(\tilde{M}_{0}(\gamma)^{\prime}\Omega^{-1}\tilde{M}_{0}(\gamma))^{-1}\tilde{M}_{0}(\gamma)\Omega^{-1}M_{0}\alpha_{0}\|<C$ . As $\tilde{\gamma}\in\Gamma$ , $P(\|\tilde{\alpha}\|>C)\rightarrow 0$ holds, which implies $\tilde{\delta}=O_{p}(1)$ .

Proof.

By similar arguments used in the proof of Lemma E.5, we can derive that

\sup_{\begin{subarray}{c}b\in[-K,K],\\ \theta\in\Theta\end{subarray}}\left\|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{it}^{*}(1_{it}^{*}(\gamma)^{\prime}-1_{it}^{*}(\gamma+\tfrac{b}{\sqrt{n}})^{\prime})X_{it}^{*}\delta-\sqrt{n}E[z_{it}(1_{it}(\gamma)^{\prime}-1_{it}(\gamma+\tfrac{b}{\sqrt{n}})^{\prime})X_{it}\delta]\right\|

is $o_{p}^{*}(1)$ in $P$ .

Let $J_{n}(\delta,\gamma,b)=\sqrt{n}E[z_{it}(1_{it}(\gamma)^{\prime}-1_{it}(\gamma+\tfrac{b}{\sqrt{n}})^{\prime})X_{it}\delta]$ . By assumption, we can reparametrize such that $\delta_{0}^{*}=\delta_{0}+\frac{r_{\delta}}{\sqrt{n}}$ and $\gamma_{0}^{*}=\gamma_{0}+\frac{r_{\gamma}}{\sqrt{n}}$ . Then, we can reparametrize the function $J_{n}$ such that $\widetilde{J}_{n}(r_{\delta},r_{\gamma},b)=J_{n}(\delta_{0}+\frac{r_{\delta}}{\sqrt{n}},\gamma_{0}+\frac{r_{\gamma}}{\sqrt{n}},b)$ . Let $r=(r_{\delta},r_{\gamma})$ which lies in a compact set $\mathcal{R}=\{r\in\mathbb{R}^{p+2}:\|r\|\leq\overline{K}\}$ for an aribtrary $\overline{K}<\infty$ .

To prove the lemma, it will be shown that

\widetilde{J}_{n}(r_{\delta},r_{\gamma},b)\rightarrow\{E_{t}[z_{it}(1,x_{it}^{\prime})\delta_{0}|\gamma_{0}]f_{t}(\gamma_{0})-E_{t-1}[z_{it}(1,x_{it-1}^{\prime})\delta_{0}|\gamma_{0}]f_{t-1}(\gamma_{0})\}b

uniformly with respect to $r\in\mathcal{R}$ and $b\in[-K,K]$ , which in turn implies

\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{it}^{*}(1_{it}^{*}(\gamma_{0}^{*})^{\prime}-1_{it}^{*}(\gamma_{0}^{*}+\tfrac{b}{\sqrt{n}})^{\prime})X_{it}^{*}\delta_{0}^{*}\\ \xrightarrow{p^{*}}\{E_{t}[z_{it}(1,x_{it}^{\prime})\delta_{0}|\gamma_{0}]f_{t}(\gamma_{0})-E_{t-1}[z_{it}(1,x_{it-1}^{\prime})\delta_{0}|\gamma_{0}]f_{t-1}(\gamma_{0})\}b\text{ in $P$}

uniformly with respect to $b\in[-K,K]$ since

\sup_{b\in[-K,K]}\left\|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{it}^{*}(1_{it}^{*}(\gamma_{0}^{*})^{\prime}-1_{it}^{*}(\gamma_{0}^{*}+\tfrac{b}{\sqrt{n}})^{\prime})X_{it}^{*}\delta_{0}^{*}-J_{n}(\delta_{0}^{*},\gamma_{0}^{*},b)\right\|=o_{p}^{*}(1)\text{ in $P$.}

Suppose $b>0$ . The case for $b<0$ follows similarly. Then,

\sqrt{n}E[z_{it}(1_{it}(\gamma)^{\prime}-1_{it}(\gamma+\tfrac{b}{\sqrt{n}})^{\prime})X_{it}\delta]=\sqrt{n}E[z_{it}(1,x_{it}^{\prime})\delta 1\{\gamma+\tfrac{b}{\sqrt{n}}\geq q_{it}>\gamma\}]\\ -\sqrt{n}E[z_{it}(1,x_{it-1}^{\prime})\delta 1\{\gamma+\tfrac{b}{\sqrt{n}}\geq q_{it}>\gamma\}].

We focus on the first term of the right hand side $\sqrt{n}E[z_{it}(1,x_{it}^{\prime})\delta 1\{\gamma+\tfrac{b}{\sqrt{n}}\geq q_{it}>\gamma\}]$ as the limit of the second term can be derived identically, and redefine $J_{n}(\delta,\gamma,b)=\sqrt{n}E[z_{it}(1,x_{it}^{\prime})\delta 1\{\gamma+\tfrac{b}{\sqrt{n}}\geq q_{it}>\gamma\}]$ and $\widetilde{J}_{n}$ , accordingly.

We can derive the following Taylor expansion:

\widetilde{J}_{n}(r,b)=\sqrt{n}E[z_{it}(1,x_{it}^{\prime})(\delta_{0}+\tfrac{r_{\delta}}{\sqrt{n}})1\{\gamma_{0}+\tfrac{b+r_{\gamma}}{\sqrt{n}}>q_{it}\geq\gamma_{0}+\tfrac{r_{\gamma}}{\sqrt{n}}\}]=E_{t}[z_{it}(1,x_{it}^{\prime})(\delta_{0}+\tfrac{r_{\delta}}{\sqrt{n}})|\gamma_{n,b}]f_{t}(\gamma_{n,b})b,

where $\gamma_{n,b}\in[\gamma_{0}+\frac{r_{\gamma}}{\sqrt{n}},\gamma_{0}+\frac{b+r_{\gamma}}{\sqrt{n}}]$ . As $\gamma_{n,b}\rightarrow\gamma_{0}$ uniformly with respect to $r\in\mathcal{R}$ and $b\in[-K,K]$ , $E_{t}[z_{it}(1,x_{it}^{\prime})(\delta_{0}+\tfrac{r_{\delta}}{\sqrt{n}})|\gamma_{n,b}]f_{t}(\gamma_{n,b})b\rightarrow E_{t}[z_{it}(1,x_{it}^{\prime})\delta_{0}|\gamma_{0}]f_{t}(\gamma_{0})b$ uniformly, and hence $\widetilde{J}_{n}(r,b)\rightarrow E_{t}[z_{it}(1,x_{it}^{\prime})\delta_{0}|\gamma_{0}]f_{t}(\gamma_{0})b$ uniformly.

In conclusion,

\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{it}^{*}(1,x_{it}^{*\prime})\delta_{0}^{*}1\{\gamma_{0}^{*}+\tfrac{b}{\sqrt{n}}\geq q_{it}^{*}>\gamma_{0}^{*}\}\xrightarrow{p^{*}}E_{t}[z_{it}(1,x_{it}^{\prime})\delta_{0}|\gamma_{0}]f_{t}(\gamma_{0})b\quad\text{in $P$}

uniformly with respect to $b\in[-K,K]$ . Similarly, we can show that

\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{it}^{*}(1,x_{it-1}^{*\prime})\delta_{0}^{*}1\{\gamma_{0}^{*}+\tfrac{b}{\sqrt{n}}\geq q_{it-1}^{*}>\gamma_{0}^{*}\}\xrightarrow{p^{*}}E_{t-1}[z_{it}(1,x_{it-1}^{\prime})\delta_{0}|\gamma_{0}]f_{t-1}(\gamma_{0})b\quad\text{in $P$}

uniformly with respect to $b\in[-K,K]$ . ∎

Appendix F Invalidity of standard nonparametric bootstrap

In this section, we explain why the bootstrap estimators of the standard bootstrap does not have the asymptotic distribution in Theorem 2 when the true model is continuous. Note that the bootstrap explained by Algorithm 1 becomes the standard nonparametric bootstrap when $\theta_{0}^{*}=\hat{\theta}$ . The consistency and convergence rate derivations in the proof of Proposition 1 can still be followed, and hence $\sqrt{n}(\hat{\alpha}^{*}-\hat{\alpha})=O_{p}^{*}(1)$ and $\sqrt{n}(\hat{\gamma}^{*}-\hat{\gamma})^{2}=O_{p}^{*}(1)$ both in $P$ . However, the conditions for Lemma E.5 do not hold for the standard nonparametric bootstrap as $n^{1/4}(\hat{\delta}_{1}+\hat{\delta}_{3}\hat{\gamma})\neq o_{p}(1)$ as explained in Section 4.2. Therefore, the rescaled versions of the criterion converges to a different limit. Specifically,

\sqrt{n}\bar{g}_{n}^{*}(\hat{\alpha}+\tfrac{a}{\sqrt{n}},\hat{\gamma}+\tfrac{b}{n^{1/4}})-n^{1/4}G(\hat{\theta})b\overset{*}{\rightsquigarrow}M_{0}a+Hb^{2}-e

in $\ell^{\infty}(\mathbb{K})$ in $P$ for every compcat $\mathbb{K}$ in the Euclidean space, where $G(\theta)$ is defined by (11). Recall that $n^{1/4}G(\hat{\theta})\neq o_{p}(1)$ as shown in Section 4.2. The conditional weak convergence, $\overset{*}{\rightsquigarrow}$ , in the last display comes from applying the following Lemma F.1 in the place of Lemma E.5 used in the proof of Theorem 6.

Lemma F.1.

Suppose that Assumptions G, D, LK are true and that the true model is continuous. Then,

\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{it}^{*}(1_{it}^{*}(\hat{\gamma})^{\prime}-1_{it}^{*}(\hat{\gamma}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{it}^{*}\hat{\delta}-\left\{E_{t}[z_{it}|\gamma_{0}]f_{t}(\gamma_{0})-E_{t-1}[z_{it}|\gamma_{0}]f_{t-1}(\gamma_{0})\right\}n^{1/4}(\hat{\delta}_{1}+\hat{\delta}_{3}\hat{\gamma})b\\ \xrightarrow{p^{*}}\frac{\delta_{30}}{2}\left\{E_{t}[z_{it}|\gamma_{0}]f_{t}(\gamma_{0})-E_{t-1}[z_{it}|\gamma_{0}]f_{t-1}(\gamma_{0})\right\}b^{2}

in $P$ uniformly with respect to $b\in[-K,K]$ for any $K<\infty$ .

Proof.

By similar arguments used in the proof of Lemma E.5, we can derive that

\sup_{\begin{subarray}{c}b\in[-K,K],\\ \theta\in\Theta\end{subarray}}\left\|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{it}^{*}(1_{it}^{*}(\gamma)^{\prime}-1_{it}^{*}(\gamma+\tfrac{b}{n^{1/4}})^{\prime})X_{it}^{*}\delta-\sqrt{n}E[z_{it}(1_{it}(\gamma)^{\prime}-1_{it}(\gamma+\tfrac{b}{n^{1/4}})^{\prime})X_{it}\delta]\right\|

is $o_{p}^{*}(1)$ in $P$ .

Suppose that $b>0$ . The $b<0$ case can be analyzed similarly. Let $J_{n}(\delta,\gamma,b)=\sqrt{n}E[z_{it}(1,x_{it}^{\prime})\delta 1\{\gamma+\frac{b}{n^{1/4}}\geq q_{it}>\gamma\}]-n^{1/4}(\delta_{1}+\delta_{3}\gamma)E_{t}[z_{it}|\gamma_{0}]f_{t}(\gamma_{0})b$ . Reparametrize such that $\hat{\gamma}=\gamma_{0}+\frac{r_{\gamma}}{n^{1/4}}$ and $\hat{\delta}=\delta_{0}+\frac{r_{\delta}}{\sqrt{n}}$ . Let the set of $r=(r_{\delta},r_{\gamma})$ be $\mathcal{R}=\{r\in\mathbb{R}^{p+2}:\|r\|\leq\overline{K}\}$ for arbitrary $\overline{K}<\infty$ . Let $\widetilde{J}_{n}(r,b)=J_{n}(\delta_{0}+\frac{r_{\delta}}{\sqrt{n}},\gamma_{0}+\frac{r_{\gamma}}{n^{1/4}},b)$ .

We will show that $\widetilde{J}_{n}(r,b)\rightarrow\frac{\delta_{30}}{2}E_{t}[z_{it}|\gamma_{0}]f_{t}(\gamma_{0})b^{2}$ uniformly with respect to $r\in\mathcal{R}$ and $b\in[-K,K]$ , which implies

\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{it}^{*}(1,x_{it}^{*\prime})\hat{\delta}1\{\hat{\gamma}+\tfrac{b}{n^{1/4}}\geq q_{it}^{*}>\hat{\gamma}\}-n^{1/4}(\hat{\delta}_{1}+\hat{\delta}_{3}\hat{\gamma})E_{t}[z_{it}|\gamma_{0}]f_{t}(\gamma_{0})b\xrightarrow{p^{*}}\frac{\delta_{30}}{2}E_{t}[z_{it}|\gamma_{0}]f_{t}(\gamma_{0})b^{2}

in $P$ uniformly with respect to $b\in[-K,K]$ , because

\sup_{b\in[-K,K]}\left\|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{it}(1,x_{it}^{\prime})\hat{\delta}1\{\hat{\gamma}+\tfrac{b}{n^{1/4}}\geq q_{it}>\hat{\gamma}\}-n^{1/4}(\hat{\delta}_{1}+\hat{\delta}_{3}\hat{\gamma})E_{t}[z_{it}|\gamma_{0}]f_{t}(\gamma_{0})b-J_{n}(\hat{\delta},\hat{\gamma},b)\right\|\\ =o_{p}^{*}(1)\text{ in $P$.}

Note that $J_{n}(\delta,\gamma,b)=J_{1n}(\delta,\gamma,b)+J_{2n}(\delta,\gamma,b)$ where

	$\displaystyle J_{1n}(\delta,\gamma,b)$	$\displaystyle=\sqrt{n}E[z_{it}\xi_{it}^{\prime}\delta_{2}1\{\gamma+\tfrac{b}{n^{\frac{1}{4}}}\geq q_{it}>\gamma\}],\text{ and}$
	$\displaystyle J_{2n}(\delta,\gamma,b)$	$\displaystyle=\sqrt{n}E[z_{it}(\delta_{1}+\delta_{3}q_{it})1\{\gamma+\tfrac{b}{n^{\frac{1}{4}}}\geq q_{it}>\gamma\}]-n^{1/4}(\delta_{1}+\delta_{3}\gamma)E_{t}[z_{it}\|\gamma_{0}]f_{t}(\gamma_{0})b.$

Let $\widetilde{J}_{1n}$ and $\widetilde{J}_{2n}$ denote the reparametrized version of $J_{1n}$ and $J_{2n}$ , respectively.

$\widetilde{J}_{1n}(r,b)$ converges to zero uniformly, for which we recall that it is identical to $\widetilde{J}_{1n}$ that appears in the proof of Lemma E.5.

$\widetilde{J}_{2n}(r,b)=\widetilde{J}_{2an}(r,b)+\widetilde{J}_{2bn}(r,b)$ where

	$\displaystyle\widetilde{J}_{2an}(r,b)$	$\displaystyle=E[z_{it}(r_{\delta_{1}}+r_{\delta_{3}}q_{it})1\{\gamma_{0}+\tfrac{b+r_{\gamma}}{n^{\frac{1}{4}}}\geq q_{it}>\gamma_{0}+\tfrac{r_{\gamma}}{n^{\frac{1}{4}}}\}],\text{ and}$
	$\displaystyle\widetilde{J}_{2bn}(r,b)$	$\displaystyle=\sqrt{n}E[z_{it}(\delta_{10}+\delta_{30}q_{it})1\{\gamma_{0}+\tfrac{b+r_{\gamma}}{n^{\frac{1}{4}}}\geq q_{it}>\gamma_{0}+\tfrac{r_{\gamma}}{n^{\frac{1}{4}}}\}]$
		$\displaystyle\quad-(\delta_{30}r_{\gamma}+\tfrac{r_{\delta_{1}}+r_{\delta_{3}}\gamma_{0}}{n^{1/4}}+\tfrac{r_{\delta_{3}}r_{\gamma}}{\sqrt{n}})E_{t}[z_{it}\|\gamma_{0}]f_{t}(\gamma_{0})b.$

It can be easily checked that $\widetilde{J}_{2an}(r,b)$ converges to zero uniformly. It will be shown in the next paragraph that $\widetilde{J}_{2bn}(r,b)\rightarrow\frac{\delta_{30}}{2}E_{t}[z_{it}|\gamma_{0}]f_{t}(\gamma_{0})b^{2}$ uniformly, which implies $\widetilde{J}_{n}(r,b)\rightarrow\frac{\delta_{30}}{2}E_{t}[z_{it}|\gamma_{0}]f_{t}(\gamma_{0})b^{2}$ uniformly.

By Taylor expansion,

		$\displaystyle\widetilde{J}_{2bn}(r,b)$
		$\displaystyle=\sqrt{n}E[z_{it}(\delta_{10}+\delta_{30}q_{it})1\{\gamma_{0}+\tfrac{b+r_{\gamma}}{n^{\frac{1}{4}}}\geq q_{it}>\gamma_{0}+\tfrac{r_{\gamma}}{n^{\frac{1}{4}}}\}]-(\delta_{30}r_{\gamma}+\tfrac{r_{\delta_{1}}+r_{\delta_{3}}\gamma_{0}}{n^{1/4}}+\tfrac{r_{\delta_{3}}r_{\gamma}}{\sqrt{n}})E_{t}[z_{it}\|\gamma_{0}]f_{t}(\gamma_{0})b$
		$\displaystyle=\delta_{30}r_{\gamma}E_{t}[z_{it}\|\gamma_{0}+\tfrac{r_{\gamma}}{n^{1/4}}]f_{t}(\gamma_{0}+\tfrac{r_{\gamma}}{n^{1/4}})b-(\delta_{30}r_{\gamma}+\tfrac{r_{\delta_{1}}+r_{\delta_{3}}\gamma_{0}}{n^{1/4}}+\tfrac{r_{\delta_{3}}r_{\gamma}}{\sqrt{n}})E_{t}[z_{it}\|\gamma_{0}]f_{t}(\gamma_{0})b$		(F.1)
		$\displaystyle\quad+\frac{b^{2}}{2}\left((\delta_{10}+\delta_{30}\gamma_{n,b})\frac{d}{d\gamma}\{E_{t}[z_{it}\|\gamma]f_{t}(\gamma)\}\|_{\gamma=\gamma_{n,b}}+\delta_{30}E_{t}[z_{it}\|\gamma_{n,b}]f_{t}(\gamma_{n,b})\right),$		(F.2)

where $\gamma_{n,b}\in[\gamma_{0}+\frac{r_{\gamma}}{n^{1/4}},\gamma_{0}+\frac{b+r_{\gamma}}{n^{1/4}}]$ . By continuity of $E_{t}[z_{it}|\gamma]f_{t}(\gamma)$ at $\gamma=\gamma_{0}$ , (F.1) converges to 0 uniformly with respect to $r\in\mathcal{R}$ and $b\in[-K,K]$ . As $\gamma_{n,b}\rightarrow\gamma_{0}$ uniformly, we can derive that (F.2) converges to $\frac{\delta_{30}}{2}E_{t}[z_{it}|\gamma_{0}]f_{t}(\gamma_{0})b^{2}$ uniformly.

By similar manner, we can derive

\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{it}^{*}(1,x_{it-1}^{*\prime})\hat{\delta}1\{\hat{\gamma}+\tfrac{b}{n^{1/4}}\geq q_{it-1}^{*}>\hat{\gamma}\}-n^{1/4}(\hat{\delta}_{1}+\hat{\delta}_{3}\hat{\gamma})E_{t-1}[z_{it}|\gamma_{0}]f_{t-1}(\gamma_{0})b\\ \xrightarrow{p^{*}}\frac{\delta_{30}}{2}E_{t-1}[z_{it}|\gamma_{0}]f_{t-1}(\gamma_{0})b^{2}

in $P$ uniformly with respect to $b\in[-K,K]$ . ∎

Appendix G Symmetricc percentile bootstrap confidence intervals for empirical application

In this section, we report the symmetric percentile residual-bootstrap confidence intervals for the coefficients for the empirical application. Table 13 and Table 14 correspond to Table 5 and Table 6 in Section 6, respectively.

Table 13: The 95% symmetric percentile bootstrap confidence intervals that use the 0.95 quantile of

|\hat{\alpha}_{j}^{*}-\alpha_{j0}^{*}|

are reported. Columns (a) and (b) report results of the models (16) and (17), respectively. The percentile of each threshold location value is shown in parentheses below each value. The significance levels for the coefficients are given by stars: * - 10%, ** - 5% and *** - 1%.

(a)				(b)
	est.	[95% CI]			est.	[95% CI]
Lower regime				Lower regime
$I_{t-1}$	0.778**	0.319	1.237	$I_{t-1}$	0.252	-0.242	0.746
$CF_{t-1}$	0.047	-0.041	0.135	$CF_{t-1}$	0.266*	-0.004	0.535
$PPE_{t-1}$	-0.147	-0.428	0.134	$PPE_{t-1}$	0.027	-0.175	0.229
$ROA_{t-1}$	-0.032	-0.128	0.065	$ROA_{t-1}$	-0.017	-0.157	0.123
$LEV_{t-1}$	0.231	-1.219	1.682	$TQ_{t-1}$	0.246	-0.071	0.564
Upper regime				Upper regime
$I_{t-1}$	-0.154	-0.769	0.462	$I_{t-1}$	0.410**	0.007	0.813
$CF_{t-1}$	0.148*	-0.026	0.322	$CF_{t-1}$	0.081*	-0.023	0.184
$PPE_{t-1}$	-0.291**	-0.566	-0.015	$PPE_{t-1}$	0.044	-0.251	0.340
$ROA_{t-1}$	0.013	-0.076	0.102	$ROA_{t-1}$	0.050	-0.038	0.137
$LEV_{t-1}$	-0.081	-0.216	0.054	$TQ_{t-1}$	0.005	-0.004	0.013
Difference between regimes				Difference between regimes
intercept	0.068	-0.045	0.181	intercept	0.236	-0.083	0.554
$I_{t-1}$	-0.932**	-1.803	-0.061	$I_{t-1}$	0.158	-0.542	0.857
$CF_{t-1}$	0.101	-0.117	0.319	$CF_{t-1}$	-0.185	-0.479	0.109
$PPE_{t-1}$	-0.144	-0.463	0.176	$PPE_{t-1}$	0.017	-0.233	0.267
$ROA_{t-1}$	0.045	-0.129	0.218	$ROA_{t-1}$	0.066	-0.128	0.261
$LEV_{t-1}$	-0.312	-1.754	1.130	$TQ_{t-1}$	-0.242	-0.557	0.074

Table 14: The 95% symmetric percentile bootstrap confidence intervals that use the 0.05 quantile of

|\hat{\alpha}_{j}^{*}-\alpha_{j0}^{*}|

are reported. Results of the model (18) are reported. The percentile of each threshold location value is shown in parentheses below each value. The significance levels for the coefficients are given by stars: * - 10%, ** - 5% and *** - 1%.

	est.	[95% CI]
Coefficients
$I_{t-1}$	0.392***	0.269	0.514
$CF_{t-1}$	0.122***	0.087	0.156
$PPE_{t-1}$	0.076	-0.095	0.247
$ROA_{t-1}$	0.027***	0.007	0.047
$TQ_{t-1}1\{TQ_{t-1}\leq\gamma\}$	0.298**	0.028	0.567
$TQ_{t-1}1\{TQ_{t-1}>\gamma\}$	0.008**	0.000	0.015
Difference between regimes
intercept	0.275**	0.074	0.566
$TQ_{t-1}$	-0.290**	-0.566	-0.061

Appendix H Bootstrap for linearity test

We explain the bootstrap for linearity test based on sup-Wald statistic, explained in Seo and Shin, (2016). Null hypothesis of the test is $\delta=0_{p+1}$ . The sup-Wald test statistic is

\sup_{\gamma\in\Gamma}\{n\hat{\delta}(\gamma)^{\prime}[B^{\prime}(\bar{M}_{n}(\gamma)^{\prime}W_{n}(\gamma)\bar{M}_{n}(\gamma))^{-1}\bar{M}_{n}(\gamma)^{\prime}W_{n}(\gamma)\hat{\Omega}(\gamma)W_{n}(\gamma)\bar{M}_{n}(\gamma)(\bar{M}_{n}(\gamma)^{\prime}W_{n}(\gamma)\bar{M}_{n}(\gamma))^{-1}B]^{-1}\hat{\delta}(\gamma)\},

(H.1)

where $B=\left[\begin{array}[]{c;{2pt/2pt}c}0_{(p+1)\times p}&I_{p+1}\end{array}\right]\in\mathbb{R}^{(p+1)\times(2p+1)}$ , $W_{n}(\gamma)$ is the weight matrix obtained by the initial estimator with the restriction that the threshold location is $\gamma$ , $\hat{\delta}(\gamma)$ is a subvector of the restricted estimator $\hat{\alpha}(\gamma)=(\hat{\beta}(\gamma)^{\prime},\hat{\delta}(\gamma)^{\prime})^{\prime}$ , and $\hat{\Omega}(\gamma)=(\frac{1}{n}\sum_{i=1}^{n}[g_{i}(\hat{\alpha}(\gamma),\gamma)g_{i}(\hat{\alpha}(\gamma),\gamma)^{\prime}]-[\frac{1}{n}\sum_{i=1}^{n}g_{i}(\hat{\alpha}(\gamma),\gamma)][\frac{1}{n}\sum_{i=1}^{n}g_{i}(\hat{\alpha}(\gamma),\gamma)]^{\prime})$ .

The bootstrap for the linearity test can be implemented by setting

\beta_{0}^{*}=\hat{\beta},\quad\delta_{0}^{*}=0_{p+1}

in Algorithm 1. Note that $\gamma_{0}^{*}$ does not matter in this case as $\delta_{0}^{*}=0_{p+1}$ . The critical value for $\tau$ -size test is obtained by using the $(1-\tau)$ quantile of the bootstrapped sup-Wald test statistics, defined analogously to (H.1).

Appendix I Uniform validity of the grid bootstrap

In this section, we show the uniform validity of the grid bootstrap given in Section 4.1. As discussed in Section 4.1.1, the following simplified specification is analyzed for the clarity of exposition:

y_{it}=x_{it}^{\prime}\beta+(\delta_{1}+\delta_{3}q_{it})1\{q_{it}>\gamma\}+\eta_{i}+\epsilon_{it},\quad t=1,...,T,

where $\theta=(\alpha^{\prime},\gamma)^{\prime}=(\beta^{\prime},\delta^{\prime},\gamma)$ , $\alpha=(\beta^{\prime},\delta^{\prime})^{\prime}$ , and $\delta=(\delta_{1},\delta_{3})^{\prime}\in\mathbb{R}^{2}$ . $x_{it}=(\xi_{it}^{\prime},q_{it})^{\prime}$ still includes the threshold variable. The goal here is to show the uniform validity of the grid bootstrap near parameter values that make threshold models continuous. Let $\Theta,\Gamma,g_{0}(\cdot),M_{0},M_{10},M_{20}(\gamma),M_{20},\Omega,f_{t}(\cdot)$ , and $E_{t}[\cdot|q]$ be defined as in Section 2, while

\widetilde{H}=\begin{pmatrix}E_{t_{0}}[z_{it_{0}}|\gamma_{0}]f_{t_{0}}(\gamma_{0})-E_{t_{0}-1}[z_{it_{0}}|\gamma_{0}]f_{t_{0}-1}(\gamma_{0})\\ \vdots\\ E_{T}[z_{iT}|\gamma_{0}]f_{T}(\gamma_{0})-E_{T-1}[z_{iT}|\gamma_{0}]f_{T-1}(\gamma_{0})\end{pmatrix}.

Let $\phi=(\theta,F)$ index the dgp while $F$ is an infinite dimensional index that determines the distribution of the random variables $\{\eta_{i},y_{i0},(z_{it},x_{it},\epsilon_{it})_{t=1}^{T}\}$ . This section restricts $F$ to admit continuous density function. Let the space of the distributions be $\Phi_{F}$ which is compact and equipped with sup-norm over the space of density functions⁵⁵5 That means $d(F_{1},F_{2})=\sup_{x\in\mathbb{R}^{d_{x}}}|f_{1}(x)-f_{2}(x)|$ , where $f_{1}$ and $f_{2}$ are densities of the distribution functions $F_{1}$ and $F_{2}$ , and $d_{x}$ is a dimension of the random vectors whose distributions are $F_{1}$ or $F_{2}$ . It is a stronger norm than the sup-norm over the space of distribution functions as $\sup_{x\in\mathbb{R}^{d_{x}}}|f_{n}(x)-f_{0}(x)|\rightarrow 0$ implies $\sup_{x\in\mathbb{R}^{d_{x}}}|F_{n}(x)-F_{0}(x)|\rightarrow 0$ . , and the space of $\phi$ be $\Phi=\Theta\bigtimes\Phi_{F}$ which is compact since $\Theta$ and $\Phi_{F}$ are compact.

Following the general framework explained in Andrews et al., (2020), we consider a sequence of true parameters $\phi_{0n}=(\theta_{0n},F_{0n})=((\beta_{0n}^{\prime},\delta_{10n},\delta_{30n},\gamma_{0n})^{\prime},F_{0n})$ . Let $\sigma_{\min}(A)$ and $\sigma_{\max}(A)$ be the square root of the minimum and maximum eigenvalues of $A^{\prime}A$ , respectively. Let the parameter space for $\phi_{0n}$ be

\begin{array}[]{rl}\Phi_{0}=\Bigl{\{}\phi_{0}\in\Phi:&(\delta_{10}+\delta_{30}\gamma_{0})^{2}+\delta_{30}^{2}\geq c_{1},\\ &c_{2}\leq\sigma_{\min}(\Omega)\leq\sigma_{\max}(\Omega)\leq c_{3},\\ &c_{4}\leq E\|z_{it}\|^{4+r}\leq c_{5},\ c_{4}\leq E\|x_{it}\|^{4+r}\leq c_{5},\ c_{4}\leq E\|\epsilon_{it}\|^{4+r}\leq c_{5},\\ &f_{t}(\cdot)\text{ is continuously differentiable at }[\gamma_{0}-c_{6},\gamma_{0}+c_{6}],\\ &c_{7}\leq\min_{q\in[\gamma_{0}-c_{6},\gamma_{0}+c_{6}]}f_{t}(q)\leq\max_{q\in[\gamma_{0}-c_{6},\gamma_{0}+c_{6}]}f_{t}(q)\leq c_{8},\\ &\min_{q\in[\gamma_{0}-c_{6},\gamma_{0}+c_{6}]}|f_{t}^{\prime}(q)|\leq c_{9},\\ &E_{t}[z_{it}|q]\text{ and }E_{t-1}[z_{it}|q]\text{ are continuously differentiable at }[\gamma_{0}-c_{10},\gamma_{0}+c_{10}],\\ &\max_{q\in[\gamma_{0}-c_{10},\gamma_{0}+c_{10}]}\|E_{t}[z_{it}|q]\|\leq c_{11},\\ &\max_{q\in[\gamma_{0}-c_{10},\gamma_{0}+c_{10}]}\|E_{t-1}[z_{it}|q]\|\leq c_{11},\\ &\max_{q\in[\gamma_{0}-c_{10},\gamma_{0}+c_{10}]}\|\frac{d}{d\gamma}\left(E_{t}[z_{it}|\gamma]\right)_{\gamma=q}\|\leq c_{11},\\ &\max_{q\in[\gamma_{0}-c_{10},\gamma_{0}+c_{10}]}\|\frac{d}{d\gamma}\left(E_{t-1}[z_{it}|\gamma]\right)_{\gamma=q}\|\leq c_{11},\\ &c_{12}\leq\sigma_{\min}\left(\textstyle{\left[\begin{array}[]{c;{2pt/2pt}c}M_{0}&\widetilde{H}\end{array}\right]}\right)\leq\sigma_{\max}\left(\textstyle{\left[\begin{array}[]{c;{2pt/2pt}c}M_{0}&\widetilde{H}\end{array}\right]}\right)\leq c_{13}\\ &E_{t}[\|z_{it}\|^{1+r}|\gamma_{0}]\leq c_{14},\ E_{t-1}[\|z_{it}\|^{1+r}|\gamma_{0}]\leq c_{14},\hfill\text{ for }t=1,...,T\Bigr{\}},\end{array}

where $c_{1},...,c_{14},\text{ and }r$ are some positive constants. Note that $(\delta_{10}+\delta_{30}\gamma_{0})^{2}+\delta_{30}^{2}\geq c_{1}$ is to prevent $(\delta_{10n}+\delta_{30n}\gamma_{0n},\delta_{30n})^{\prime}$ from (having a subsequence) converging to zero.⁶⁶6 This implies that our threshold model has a strong threshold effect which excludes the diminishing or small threshold effect as in Hansen, (2000). The remaining conditions for $\Phi_{0}$ other than $E_{t}[\|z_{it}\|^{1+r}|\gamma_{0}]\leq c_{14},\ E_{t-1}[\|z_{it}\|^{1+r}|\gamma_{0}]\leq c_{14}$ imply that Assumptions D, G, and LK/LJ hold uniformly. The condition $E_{t}[\|z_{it}\|^{1+r}|\gamma_{0}]\leq c_{14},\ E_{t-1}[\|z_{it}\|^{1+r}|\gamma_{0}]\leq c_{14}$ is a uniform integrability condition for the distribution of $z_{it}$ conditional on $q_{it}$ or $q_{it-1}$ . Its role will be explained after introducing the drifting sequence framework.

Because of the nonlinearity and discontinuity of our dynamic model, it is not trivial to answer what primitive conditions for the parameter and distributions of random variables, such as initial value $y_{i0}$ or individual fixed effect $\eta_{i}$ , are sufficient for $\Phi_{0}$ . This paper does not investigate this issue so that we can focus on uniformity analysis with respect to degeneracy of the Jacobian of nonlinear GMM.

For $n=1,2,...$ , let $\{\eta_{in},y_{i0n},(z_{itn},x_{itn},\epsilon_{itn})_{t=1}^{T}\}$ be drawn from distribution $F_{0n}$ . For a function or random variable $u$ , e.g., $u=z,x$ or $\Delta\epsilon$ , we often write $u_{it,n}$ and $u_{it-1,n}$ to indicate more explicitly that indices in subscript are $((i,t),n)$ or $((i,t-1),n)$ , while $n$ is the new index introduced in this section. Suppose that

	$\displaystyle y_{itn}=$	$\displaystyle x_{itn}^{\prime}\beta_{0n}+(\delta_{10n}+\delta_{30n}q_{itn})1\{q_{itn}>\gamma_{0n}\}+\eta_{in}+\epsilon_{itn},\text{ for }t=1,...,T,$
		$\displaystyle E[z_{itn}\Delta\epsilon_{itn}]=0,\quad\text{where }\Delta\epsilon_{itn}=\epsilon_{it,n}-\epsilon_{it-1,n}.$

As in Section 2, we define

M_{1in}=-\begin{bmatrix}z_{it_{0}n}\Delta x_{it_{0}n}^{\prime}\\ \vdots\\ z_{iTn}\Delta x_{iTn}^{\prime}\end{bmatrix}\in\mathbb{R}^{k\times p},\quad M_{2in}(\gamma)=-\begin{bmatrix}z_{it_{0}n}1_{it_{0}n}(\gamma)^{\prime}X_{it_{0}n}\\ \vdots\\ z_{iTn}1_{iTn}(\gamma)^{\prime}X_{iTn}\end{bmatrix}\in\mathbb{R}^{k\times 2},

where $\Delta y_{itn}=y_{it,n}-y_{it-1,n}$ , $\Delta x_{itn}=x_{it,n}-x_{it-1,n}$ ,

X_{itn}=\begin{pmatrix}(1,q_{it,n})\\ (1,q_{it-1,n})\end{pmatrix},\text{ and}\quad 1_{itn}(\gamma)=\begin{pmatrix}1\{q_{it,n}>\gamma\}\\ -1\{q_{it-1,n}>\gamma\}\end{pmatrix}.

Let $M_{in}(\gamma)=\left[\begin{array}[]{c;{2pt/2pt}c}M_{1in}&M_{2in}(\gamma)\end{array}\right]$ , and $M_{0n}(\gamma)=E[M_{in}(\gamma)]$ , $M_{10n}=E[M_{1in}]$ , $M_{20n}(\gamma)=E[M_{2in}(\gamma)]$ , $\bar{M}_{n}(\gamma)=\frac{1}{n}\sum_{i=1}^{n}M_{in}(\gamma)$ , $\bar{M}_{1n}=\frac{1}{n}\sum_{i=1}^{n}M_{1in}$ , and $\bar{M}_{2n}(\gamma)=\frac{1}{n}\sum_{i=1}^{n}M_{2in}(\gamma)$ . We write $M_{0n}$ , $M_{20n}$ and $\bar{M}_{n}$ instead of $M_{0n}(\gamma_{0n})$ , $M_{20n}(\gamma_{0n})$ and $\bar{M}_{n}(\gamma_{0n})$ . Define

\widetilde{H}_{n}=\begin{pmatrix}E_{t_{0}n}[z_{it_{0}n}|\gamma_{0n}]f_{t_{0}n}(\gamma_{0n})-E_{t_{0}-1,n}[z_{it_{0},n}|\gamma_{0n}]f_{t_{0}-1,n}(\gamma_{0n})\\ \vdots\\ E_{Tn}[z_{iTn}|\gamma_{0n}]f_{Tn}(\gamma_{0n})-E_{T-1,n}[z_{iT,n}|\gamma_{0n}]f_{T-1,n}(\gamma_{0n})\end{pmatrix},

where $E_{tn}[\cdot|q]$ and $f_{tn}(\cdot)$ are the conditional expectation $E[\cdot|q_{itn}=q]$ and the density of $q_{itn}$ , respectively.

Suppose that a sequence $\{\phi_{0n}\}$ (or its subsequence $\{\phi_{0p_{n}}\}$ ) converges so that $\theta_{0n}\rightarrow\theta_{0,\infty}=(\alpha_{0,\infty}^{\prime},\gamma_{0,\infty})^{\prime}=(\beta_{0,\infty}^{\prime},\delta_{10,\infty},\delta_{30,\infty},\gamma_{0,\infty})^{\prime}$ and $F_{0n}\rightarrow F_{0,\infty}$ , i.e., $\phi_{0n}\text{ (or $\phi_{0p_{n}}$)}\rightarrow\phi_{0,\infty}$ . Note that the density of the distribution $F_{0n}$ converges to the density of $F_{0,\infty}$ uniformly by our choice of norm in $\Phi_{F}$ , and $\sup_{\upsilon}\|F_{0n}(\upsilon)-F_{0,\infty}(\upsilon)\|\rightarrow 0$ .

Note that $M_{0,\infty}(\gamma)=E[M_{i,\infty}(\gamma)]=\lim_{n\rightarrow\infty}M_{0n}(\gamma)$ as each element of $M_{in}(\gamma)$ is uniformly integrable by $\max\{E\|z_{itn}\|^{4+r},E\|x_{itn}\|^{4+r},E\|\epsilon_{itn}\|^{4+r}\}\leq c_{5}<\infty$ for all $n$ while $F_{0n}$ converges to $F_{0,\infty}$ . Hence, $M_{10,\infty}=E[M_{1i,\infty}]=\lim_{n\rightarrow\infty}M_{10n}$ and $M_{20,\infty}(\gamma)=E[M_{2i,\infty}(\gamma)]=\lim_{n\rightarrow\infty}M_{20n}(\gamma)$ also hold. Furthermore, $\widetilde{H}_{\infty}=\lim_{n\rightarrow\infty}\widetilde{H}_{n}$ , where

\widetilde{H}_{\infty}=\begin{pmatrix}E_{t_{0},\infty}[z_{it_{0},\infty}|\gamma_{0,\infty}]f_{t_{0},\infty}(\gamma_{0,\infty})-E_{t_{0},\infty}[z_{it_{0}-1,\infty}|\gamma_{0,\infty}]f_{t_{0}-1,\infty}(\gamma_{0,\infty})\\ \vdots\\ E_{T,\infty}[z_{iT,\infty}|\gamma_{0,\infty}]f_{T,\infty}(\gamma_{0,\infty})-E_{T,\infty}[z_{iT-1,\infty}|\gamma_{0,\infty}]f_{T-1,\infty}(\gamma_{0,\infty})\end{pmatrix}.

This is because $f_{tn}\rightarrow f_{t,\infty}$ uniformly by our definition of norm in $\Phi_{F}$ , and it is straightforward to derive $z_{itn}|q_{isn}=\gamma_{0n}\xrightarrow{d}z_{it,\infty}|q_{is,\infty}=\gamma_{0,\infty}$ for $s=t,t-1$ , which implies $E_{s}[z_{itn}|\gamma_{0n}]\rightarrow E_{s}[z_{it,\infty}|\gamma_{0,\infty}]$ due to the uniform integrability $E_{s}[\|z_{it}\|^{1+r}|\gamma_{0}]\leq c_{14}$ for $s=t,t-1$ . Furthermore, $\|M_{0n}-M_{0,\infty}\|\rightarrow 0$ as $n\rightarrow\infty$ because $\|M_{0n}(\gamma_{0,\infty})-M_{0,\infty}(\gamma_{0,\infty})\|\rightarrow 0$ , and $\|M_{0n}-M_{0n}(\gamma_{0,\infty})\|=\|M_{20n}-M_{20n}(\gamma_{0,\infty})\|\leq\|\mathfrak{H}_{n}(\bar{\gamma}_{n})\|(\gamma_{0n}-\gamma_{0,\infty})$ , where

\mathfrak{H}_{n}(\gamma)=\begin{pmatrix}E_{t_{0}n}[z_{it_{0}n}(1,\gamma)|\gamma]f_{t_{0}n}(\gamma)-E_{t_{0}-1,n}[z_{it_{0}n}(1,\gamma)|\gamma]f_{t_{0}-1,n}(\gamma)\\ \vdots\\ E_{Tn}[z_{iTn}(1,\gamma)|\gamma]f_{Tn}(\gamma)-E_{T-1,n}[z_{iTn}(1,\gamma)|\gamma]f_{T-1,n}(\gamma)\end{pmatrix},

and $\bar{\gamma}_{n}$ is between $\gamma_{0n}$ and $\gamma_{0,\infty}$ . Note that $\|\mathfrak{H}_{n}(\bar{\gamma}_{n})\|<C$ for some nonnegative $C<\infty$ for sufficiently large $n$ as $(\theta_{0n},F_{0n})\in\Phi_{0}$ .

Let $\omega_{in}=\{(z_{itn},y_{itn},x_{itn},\epsilon_{itn})_{t=1}^{T}\}$ and $g(\omega_{in},\theta)=(g_{t_{0}}(\omega_{in},\theta)^{\prime},\dots,g_{T}(\omega_{in},\theta)^{\prime})^{\prime}$ , where $g_{t}(\omega_{in},\theta)=z_{itn}(\Delta y_{itn}-\Delta x_{itn}^{\prime}\beta-1_{itn}(\gamma)^{\prime}X_{itn}\delta)$ . Let $\Omega_{n}=E[g(\omega_{in},\theta_{0n})g(\omega_{in},\theta_{0n})^{\prime}]$ , and $\Omega_{\infty}=E[g(\omega_{i,\infty},\theta_{0,\infty})g(\omega_{i,\infty},\theta_{0,\infty})^{\prime}]=\lim_{n\rightarrow\infty}\Omega_{n}$ . Let $\bar{g}_{n}(\theta)=\frac{1}{n}\sum_{i=1}^{n}g(\omega_{in},\theta)$ , $\hat{Q}_{n}(\theta)=\bar{g}_{n}(\theta)^{\prime}W_{n}\bar{g}_{n}(\theta)$ , and $g_{0n}(\theta)=E[g(\omega_{in},\theta)]$ , while $W_{n}=\{\frac{1}{n}\sum_{i=1}^{n}[g(\omega_{in},\hat{\theta}_{(1)n})g(\omega_{in},\hat{\theta}_{(1)n})^{\prime}]-\bar{g}_{n}(\hat{\theta}_{(1)n})\bar{g}_{n}(\hat{\theta}_{(1)n})^{\prime}\}^{-1}$ and $\hat{\theta}_{(1)n}=\arg\min_{\theta}\bar{g}_{n}(\theta)^{\prime}\bar{g}_{n}(\theta)$ is the initial estimator. $\hat{\theta}_{n}=(\hat{\alpha}_{n}^{\prime},\hat{\gamma}_{n})^{\prime}=\arg\min_{\theta}\hat{Q}_{n}(\theta)$ and $\mathcal{D}_{n}(\gamma)=n(\min_{\alpha\in A}\hat{Q}_{n}(\alpha,\gamma)-\hat{Q}_{n}(\hat{\theta}_{n}))$ .

Let $\omega_{in}^{*}$ be an i.i.d. draw along the index $i$ from $\{\omega_{in}:i=1,...,n\}$ . Let

$\displaystyle g_{in}^{*}(\theta)$	$\displaystyle=(g_{it_{0}n}^{}(\theta)^{\prime},...,g_{iTn}^{}(\theta)^{\prime})^{\prime}$
$\displaystyle g_{itn}^{*}(\theta)$	$\displaystyle=g_{t}(\omega_{in}^{},\theta)-g_{t}(\omega_{in}^{},\theta_{0n}^{})+g_{t}(\omega_{in}^{},\hat{\theta}_{n})$	(I.1)
	$\displaystyle=-z_{itn}^{}\Delta x_{itn}^{\prime}(\beta-\beta_{0n}^{})-z_{itn}^{}1_{itn}^{}(\gamma)^{\prime}X_{itn}^{}(\delta-\delta_{0n}^{*})$
	$\displaystyle\quad+z_{itn}^{}(1_{itn}^{}(\gamma_{0n}^{})^{\prime}-1_{itn}^{}(\gamma)^{\prime})X_{itn}^{}\delta_{0n}^{}+z_{itn}^{}\widehat{\Delta\epsilon}_{itn}^{},$

where $\theta_{0}^{*}=(\hat{\alpha}_{n}(\gamma_{0n})^{\prime},\gamma_{0n})^{\prime}$ and $\hat{\alpha}_{n}(\gamma)=\arg\min_{\alpha}\hat{Q}_{n}(\alpha,\gamma)$ . For the justification of the representation (I.1), please refer to (E.1) and description in Section E.1 . Note that $\bar{g}_{n}^{*}(\theta)=\frac{1}{n}\sum_{i=1}^{n}[g_{in}^{*}(\theta)-\bar{g}_{n}(\hat{\theta}_{n})]$ becomes the bootstrap sample moment from the grid bootstrap. Then, let $\hat{Q}_{n}^{*}(\theta)=\bar{g}_{n}^{*}(\theta)^{\prime}W_{n}^{*}\bar{g}_{n}^{*}(\theta)$ , $W_{n}^{*}=[\frac{1}{n}\sum_{i=1}^{n}\{g_{in}^{*}(\hat{\theta}_{(1)n}^{*})g_{in}^{*}(\hat{\theta}_{(1)n}^{*})^{\prime}\}-\{\frac{1}{n}\sum_{i=1}^{n}g_{in}^{*}(\hat{\theta}_{(1)n}^{*})\}\{\frac{1}{n}\sum_{i=1}^{n}g_{in}^{*}(\hat{\theta}_{(1)n}^{*})\}^{\prime}]^{-1}$ , $\hat{\theta}_{(1)n}^{*}=\arg\min_{\theta}\bar{g}_{n}^{*}(\theta)^{\prime}\bar{g}_{n}^{*}(\theta)$ , $\hat{\theta}_{n}^{*}=\arg\min_{\theta}\hat{Q}_{n}^{*}(\theta)$ , and $\mathcal{D}_{n}^{*}(\gamma)=n(\min_{\alpha}\hat{Q}_{n}^{*}(\alpha,\gamma)-\hat{Q}_{n}^{*}(\hat{\theta}_{n}^{*}))$ . Recall that in Section 4.1 the $100(1-\tau)$ % grid bootstrap confidence set was defined as

CI_{n,1-\tau}^{grid}=\{\gamma\in\Gamma:\mathcal{D}_{n}(\gamma)\leq\widehat{F}^{*-1}_{n}(1-\tau;\mathcal{D}_{n}^{*}(\gamma))\}.

Define a mapping $\pi_{n}:\Phi_{0}\rightarrow\Pi$ , where $\Pi=[-\infty,\infty]\bigtimes\mathbb{R}\bigtimes\Phi_{0}$ such that

\pi_{n}(\phi)=\begin{pmatrix}n^{1/4}(\delta_{1}+\delta_{3}\gamma)\\ (\delta_{1}+\delta_{3}\gamma)\\ \phi\end{pmatrix}.

This is because the limits of $n^{1/4}(\delta_{1}+\delta_{3}\gamma)$ and $(\delta_{1}+\delta_{3}\gamma)$ characterize the asymptotic behaviors of the test statistic used in the grid bootstrap.

Theorem I.1.

For any subsequence $\{p_{n}\}$ of $\{n:n\in\mathbb{N}\}$ and any sequence $\{\phi_{0p_{n}}\in\Phi_{0}:n\geq 1\}$ s.t. $\pi_{p_{n}}(\phi_{0p_{n}})\rightarrow(\zeta_{1},\zeta_{2},\phi_{0,\infty})\in\Pi$ ,

P_{\phi_{0p_{n}}}(\gamma_{0p_{n}}\in CI_{p_{n},1-\tau}^{grid})\rightarrow 1-\tau,

where $P_{\phi_{0p_{n}}}(\cdot)$ is the probability law under $\phi_{0p_{n}}=(\theta_{0p_{n}},F_{0p_{n}})$ . Moreover,

\liminf_{n\rightarrow\infty}\inf_{\phi_{0}\in\Phi_{0}}P_{\phi_{0}}(\gamma_{0}\in CI_{n,1-\tau}^{grid})=\limsup_{n\rightarrow\infty}\sup_{\phi_{0}\in\Phi_{0}}P_{\phi_{0}}(\gamma_{0}\in CI_{n,1-\tau}^{grid})=1-\tau,

which establishes the uniform validity of the grid bootstrap confidence interval.

Note that the last statement of Theorem I.1 follows from the theorem’s preceding statement, as the latter verifies Assumption B* from Andrews et al., (2020). Let $\{\pm\infty\}=\{-\infty,+\infty\}$ . To show Theorem I.1, we consider the following four cases:

(i) continuous: $\zeta_{1}=0$ and $\zeta_{2}=0$ .
(ii) semi-continuous: $\zeta_{1}\in\mathbb{R}\setminus\{0\}$ and $\zeta_{2}=0$ .
(iii) semi-discontinuous: $\zeta_{1}\in\{\pm\infty\}$ and $\zeta_{2}=0$ .
(vi) discontinuous: $\zeta_{1}\in\{\pm\infty\}$ and $\zeta_{2}\neq 0$ .

The following lemma implies Theorem I.1.

Lemma I.1.

For all sequences $\{\phi_{0p_{n}}\in\Phi_{0}:n\geq 1\}$ for which $\pi_{p_{n}}(\phi_{0p_{n}})\rightarrow(\zeta_{1},\zeta_{2},\phi_{0,\infty})\in\Pi$ , the following convergences hold ( $P$ in “ $\xrightarrow{d^{*}}$ in $P$ ” denotes the probability of $\{\omega_{ip_{n}}:1\leq i\leq p_{n},n=1,2,...\}$ ):

(i) For continuous case, $\mathcal{D}_{p_{n}}(\gamma_{0p_{n}})\xrightarrow{d}Z_{0}^{2}$ , and $\mathcal{D}_{p_{n}}^{*}(\gamma_{0p_{n}})\xrightarrow{d^{*}}Z_{0}^{2}$ in $P$ , where $Z_{0}=\max\{Z_{0}^{*},0\}$ and $Z_{0}^{*}\sim N(0,1)$ .
(ii) For semi-continuous case, $\mathcal{D}_{p_{n}}(\gamma_{0p_{n}})\xrightarrow{d}\mathcal{D}_{\infty}$ , and $\mathcal{D}_{p_{n}}^{*}(\gamma_{0p_{n}})\xrightarrow{d^{*}}\mathcal{D}_{\infty}$ in $P$ , where

\mathcal{D}_{\infty}=\begin{cases}(\frac{U}{\sqrt{\widetilde{H}_{\infty}^{\prime}\Xi_{\infty}\widetilde{H}_{\infty}}})^{2}&\text{if }U\geq\frac{-\zeta_{1}^{2}}{2|\delta_{30,\infty}|}\widetilde{H}_{\infty}^{\prime}\Xi_{\infty}\widetilde{H}_{\infty}\\ -(\frac{-\zeta_{1}^{2}}{2|\delta_{30,\infty}|})^{2}\widetilde{H}_{\infty}^{\prime}\Xi_{\infty}\widetilde{H}_{\infty}+2\frac{-\zeta_{1}^{2}}{2|\delta_{30,\infty}|}U&\text{if }U<\frac{-\zeta_{1}^{2}}{2|\delta_{30,\infty}|}\widetilde{H}_{\infty}^{\prime}\Xi_{\infty}\widetilde{H}_{\infty}\end{cases},

$U\sim N(0,\widetilde{H}_{\infty}^{\prime}\Xi_{\infty}\widetilde{H}_{\infty})$ , and $\Xi_{\infty}=\Omega_{\infty}^{-1}-\Omega_{\infty}^{-1}M_{0,\infty}(M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}M_{0,\infty})M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}$ .
(iii) For semi-discontinuous and discontinuous cases, $\mathcal{D}_{p_{n}}(\gamma_{0p_{n}})\xrightarrow{d}\chi^{2}_{1}$ , and $\mathcal{D}_{p_{n}}^{*}(\gamma_{0p_{n}})\xrightarrow{d^{*}}\chi^{2}_{1}$ in $P$ .

Remark 1.

Note that the distribution of $\mathcal{D}_{\infty}$ is (first-order) stochastically dominated by the $\chi^{2}_{1}$ distribution. This is because $f_{1}(Z_{0}):=(\frac{Z_{0}}{\sqrt{\widetilde{H}_{\infty}^{\prime}\Xi_{\infty}\widetilde{H}_{\infty}}})^{2}=-(\frac{-\zeta_{1}^{2}}{2|\delta_{30,\infty}|})^{2}\widetilde{H}_{\infty}^{\prime}\Xi_{\infty}\widetilde{H}_{\infty}+2\frac{-\zeta_{1}^{2}}{2|\delta_{30,\infty}|}Z_{0}=:f_{2}(Z_{0})$ when $Z_{0}=\frac{-\zeta_{1}^{2}}{2|\delta_{30,\infty}|}\widetilde{H}_{\infty}^{\prime}\Xi_{\infty}\widetilde{H}_{\infty}<0$ , and $f_{1}^{\prime}(Z_{0})<f_{2}^{\prime}(Z_{0})$ when $Z_{0}<\frac{-\zeta_{1}^{2}}{2|\delta_{30,\infty}|}\widetilde{H}_{\infty}^{\prime}\Xi_{\infty}\widetilde{H}_{\infty}$ , which implies $f_{1}(Z_{0})>f_{2}(Z_{0})$ for $Z_{0}<\frac{-\zeta_{1}^{2}}{2|\delta_{30,\infty}|}\widetilde{H}_{\infty}^{\prime}\Xi_{\infty}\widetilde{H}_{\infty}$ .

Proof of Lemma I.1.

We prove the result for sequence $\{n\}$ rather than $\{p_{n}\}$ to ease notation. Then, we can replace $\{n\}$ by $\{p_{n}\}$ to complete the proof.

First, we derive the consistency, convergence rates, and asymptotic distributions of $\hat{\theta}_{n}$ , and then we derive the asymptotic distributions of $\mathcal{D}_{n}(\gamma_{0n})$ , depending on the regimes determined by $\zeta_{1}$ and $\zeta_{2}$ . Then, the same results are derived for bootstrap estimator and test statistic for each case.

Consistency of estimator

Define $\hat{\alpha}_{n}(\gamma)=\arg\min_{\alpha\in A}\hat{Q}_{n}(\alpha,\gamma)$ , which is

	$\displaystyle\hat{\alpha}_{n}(\gamma)=-(\bar{M}_{n}(\gamma)^{\prime}W_{n}\bar{M}_{n}(\gamma))^{-1}\bar{M}_{n}(\gamma)^{\prime}W_{n}\bar{v}_{n}$
	$\displaystyle\bar{v}_{n}=-\bar{M}_{n}\alpha_{0n}+u_{n},\quad u_{n}=\frac{1}{n}\sum_{i=1}^{n}\begin{pmatrix}z_{it_{0}n}\Delta\epsilon_{it_{0}n}\\ \vdots\\ z_{iTn}\Delta\epsilon_{iTn}\end{pmatrix}.$

Therefore, $\hat{\alpha}_{n}(\gamma)=-(\bar{M}_{n}(\gamma)^{\prime}W_{n}\bar{M}_{n}(\gamma))^{-1}\bar{M}_{n}(\gamma)^{\prime}W_{n}(-\bar{M}_{n}\alpha_{0n}+u_{n})$ .

Note that $u_{n}\xrightarrow{p}0$ by the WLLN for triangular array which holds as $\sup_{n\in\mathbb{N}}E\|z_{itn}\Delta\epsilon_{itn}\|^{2}\leq\sup_{n\in\mathbb{N}}(E\|z_{itn}\|^{4})^{1/2}(E\|\Delta\epsilon_{itn}\|^{4})^{1/2}<\infty$ . Furthermore, $\sup_{\gamma\in\Gamma}\|\bar{M}_{n}(\gamma)-M_{0n}(\gamma)\|\xrightarrow{p}0$ by Lemma I.3. Thus, $\sup_{\gamma\in\Gamma}\|\hat{\alpha}_{n}(\gamma)-(M_{0n}(\gamma)^{\prime}WM_{0n}(\gamma))^{-1}M_{0n}(\gamma)^{\prime}WM_{0n}\alpha_{0n}\|\xrightarrow{p}0$ so that $\|\hat{\alpha}_{n}(\hat{\gamma}_{n})-\alpha_{0n}\|\xrightarrow{p}0$ if $\hat{\gamma}_{n}=\arg\min_{\gamma\in\Gamma}\tilde{Q}_{n}(\gamma)$ , where $\tilde{Q}_{n}(\gamma)=\hat{Q}_{n}(\hat{\alpha}_{n}(\gamma),\gamma)$ , is consistent such that $|\hat{\gamma}_{n}-\gamma_{0n}|\xrightarrow{p}0$ .

If $\hat{\theta}_{(1)n}$ is consistent, then $\|W_{n}-\Omega_{n}^{-1}\|\rightarrow 0$ by Lemma I.4. Then,

\sup_{\gamma\in\Gamma}\left|\tilde{Q}_{n}(\gamma)-\|(I-P_{\Omega_{n}^{-1/2}M_{0n}(\gamma)})(\Omega_{n}^{-1/2}M_{0n}\alpha_{0n})\|^{2}\right|\rightarrow 0.

Since $\sigma_{\min}\left(\left[\begin{array}[]{c;{2pt/2pt}c}M_{20n}&\widetilde{H}_{n}\end{array}\right]\right)\geq c_{12}$ for all $n$ , $M_{20n}\delta_{0n}$ is not in the column space of $M_{20n}(\gamma)$ , and $\gamma_{0n}$ is the unique minimizer of $\|(I-P_{\Omega_{n}^{-1/2}M_{0n}(\gamma)})(\Omega_{n}^{-1/2}M_{0n}\alpha_{0n})\|$ . By applying the argmin CMT as in the proof of Theorem 2, $|\hat{\gamma}_{n}-\gamma_{0n}|\xrightarrow{p}0$ can be derived. Derivation of the consistency of $\hat{\theta}_{(1)n}$ is straightforward if we replace $\Omega_{n}^{-1/2}$ by the identity matrix.

Convergence rate of estimator

By Lemma I.5 and $\|\hat{\theta}_{n}-\theta_{0n}\|\xrightarrow{p}0$ , $\sqrt{n}\|\bar{g}_{n}(\hat{\theta}_{n})-\bar{g}_{n}(\theta_{0n})-g_{0n}(\hat{\theta}_{n})\|=o_{p}(1)$ . As $\|W_{n}-\Omega_{n}^{-1}\|\xrightarrow{p}0$ ,

\sqrt{n}\|W_{n}^{1/2}\bar{g}_{n}(\hat{\theta}_{n})-W_{n}^{1/2}\bar{g}_{n}(\theta_{0n})-\Omega_{n}^{-1/2}g_{0n}(\hat{\theta}_{n})\|=o_{p}(1).

By triangle inequality, $\sqrt{n}\|\Omega_{n}^{-1/2}g_{0n}(\hat{\theta}_{n})\|\leq\sqrt{n}\|W_{n}^{1/2}\bar{g}_{n}(\hat{\theta}_{n})\|+\sqrt{n}\|W_{n}^{1/2}\bar{g}_{n}(\theta_{0n})\|+o_{p}(1)$ . As $\hat{\theta}_{n}$ minimizes $\|W_{n}^{1/2}\bar{g}_{n}(\theta)\|$ , $\sqrt{n}\|W_{n}^{1/2}\bar{g}_{n}(\hat{\theta}_{n})\|\leq\sqrt{n}\|W_{n}^{1/2}\bar{g}_{n}(\theta_{0n})\|$ . Note that $\sqrt{n}\|W_{n}^{1/2}\bar{g}_{n}(\theta_{0n})\|=O_{p}(1)$ because $\|W_{n}\|=O_{p}(1)$ , while the CLT for triangular array implies $\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{itn}\Delta\epsilon_{itn}\xrightarrow{d}N(0,\lim_{n\rightarrow\infty}E[z_{itn}z_{itn}^{\prime}\Delta\epsilon_{itn}^{2}])$ . The CLT holds by combination of Lyapunov condition and Cramér-Wold if $\lim_{n\rightarrow\infty}\frac{E[(\lambda^{\prime}z_{itn})^{2+r}\Delta\epsilon_{itn}^{2+r}]}{n^{r/2}\{E[(\lambda^{\prime}z_{itn})^{2}\Delta\epsilon_{itn}^{2}]\}^{1+r/2}}=0$ for some $r>0$ and for any $\lambda\in\mathbb{R}^{dim(z_{it})}$ , which holds as $\inf_{n\in\mathbb{N}}\sigma_{\min}(\Omega_{n})>0$ and $\sup_{n\in\mathbb{N}}\max\{(E\|z_{itn}\|^{4+2r})^{1/2},(E\Delta\epsilon_{itn}^{4+2r})^{1/2}\}<\infty$ for some $r>0$ . Therefore,

\begin{array}[]{rcl}\sqrt{n}\|\Omega_{n}^{-1/2}g_{0n}(\hat{\theta}_{n})\|&\leq&\sqrt{n}\|W_{n}^{1/2}\bar{g}_{n}(\hat{\theta}_{n})\|+\sqrt{n}\|W_{n}^{1/2}\bar{g}_{n}(\theta_{0n})\|+o_{p}(1)\\ &\leq&2\sqrt{n}\|W_{n}^{1/2}\bar{g}_{n}(\theta_{0n})\|+o_{p}(1)\\ &=&O_{p}(1),\end{array}

while $\sqrt{n}\|\Omega_{n}^{-1/2}g_{0n}(\hat{\theta}_{n})\|\geq\sqrt{n}\|\Omega_{n}^{-1/2}M_{0n}(\hat{\alpha}_{n}-\alpha_{0n})+\Omega_{n}^{-1/2}\widetilde{H}_{n}[(\delta_{10n}+\delta_{30n}\gamma_{0n})(\hat{\gamma}_{n}-\gamma_{0n})+\frac{\delta_{30n}}{2}(\hat{\gamma}_{n}-\gamma_{0n})^{2}]\|+o(\sqrt{n}(\|\hat{\alpha}_{n}-\alpha_{0n}\|+|(\delta_{10n}+\delta_{30n}\gamma_{0n})(\hat{\gamma}_{n}-\gamma_{0n})|+(\hat{\gamma}_{n}-\gamma_{0n})^{2}))$ by Lemma I.2.

In conclusion,

\sqrt{n}(\|\hat{\alpha}_{n}-\alpha_{0n}\|+|(\delta_{10n}+\delta_{30n}\gamma_{0n})(\hat{\gamma}_{n}-\gamma_{0n})|+(\hat{\gamma}_{n}-\gamma_{0n})^{2})\leq O_{p}(1).

It implies that $\sqrt{n}\|\hat{\alpha}_{n}-\alpha_{0n}\|=O_{p}(1)$ for any values of $\zeta_{1}=\lim_{n}n^{1/4}(\delta_{10n}+\delta_{30n}\gamma_{0n})$ and $\zeta_{2}=\lim_{n}(\delta_{10n}+\delta_{30n}\gamma_{0n})$ , while for $\hat{\gamma}_{n}$ ,

(i) $n^{1/4}(\hat{\gamma}_{n}-\gamma_{0n})=O_{p}(1)$ if $\zeta_{1}=\zeta_{2}=0$
(ii) $n^{1/4}(\hat{\gamma}_{n}-\gamma_{0n})=O_{p}(1)$ if $\zeta_{1}\in\mathbb{R}\setminus\{0\},\zeta_{2}=0$
(iii) $\sqrt{n}(\delta_{10n}+\delta_{30n}\gamma_{0n})(\hat{\gamma}_{n}-\gamma_{0n})=O_{p}(1)$ if $|\zeta_{1}|=\infty,\zeta_{2}=0$
(vi) $\sqrt{n}(\hat{\gamma}_{n}-\gamma_{0n})=O_{p}(1)$ if $|\zeta_{1}|=\infty,\zeta_{2}\neq 0$ .

Asymptotic distribution of estimator and test statistic

We only consider (ii) semi-continuous and (iii) semi-discontinuous cases since the proofs for (i) continuous and (iv) discontinuous cases are almost identical to the proof of continuous and discontinuous cases in Theorem 3.

Case (ii): Let $a=\sqrt{n}(\alpha-\alpha_{0n})$ and $b=n^{1/4}(\gamma-\gamma_{0n})$ . Additionally, define $\hat{a}_{n}=\sqrt{n}(\hat{\alpha}-\alpha_{0n})$ and $\hat{b}_{n}=n^{\frac{1}{4}}(\hat{\gamma}-\gamma_{0n})$ . Let

\mathbb{S}_{n}(a,b)=n\hat{Q}_{n}(\alpha_{0n}+\tfrac{a}{\sqrt{n}},\gamma_{0n}+\tfrac{b}{n^{\frac{1}{4}}})=n\bar{g}_{n}(\alpha_{0n}+\tfrac{a}{\sqrt{n}},\gamma_{0n}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime}W_{n}\bar{g}_{n}(\alpha_{0n}+\tfrac{a}{\sqrt{n}},\gamma_{0n}+\tfrac{b}{n^{\frac{1}{4}}}).

The rescaled and reparametrized sample moment can be written as

	$\displaystyle\sqrt{n}\bar{g}_{n}(\alpha_{0n}+\tfrac{a}{\sqrt{n}},\gamma_{0n}+\tfrac{b}{n^{\frac{1}{4}}})=$	$\displaystyle\sqrt{n}\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}n}\Delta\epsilon_{it_{0}n}\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iTn}\Delta\epsilon_{iTn}\end{pmatrix}-\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0}n}\Delta x_{it_{0}n}^{\prime}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iTn}\Delta x_{iTn}^{\prime}\end{pmatrix}a_{1}$
		$\displaystyle-\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0}n}1_{it_{0}n}(\gamma_{0n}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime}X_{it_{0}n}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iTn}1_{iTn}(\gamma_{0n}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime}X_{iTn}\end{pmatrix}a_{2}$
		$\displaystyle+\sqrt{n}\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0}n}(1_{it_{0}n}(\gamma_{0n})^{\prime}-1_{it_{0}n}(\gamma_{0n}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{it_{0}n}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iTn}(1_{iTn}(\gamma_{0n})^{\prime}-1_{iT}(\gamma_{0n}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{iTn}\end{pmatrix}\delta_{0n}.$

By the CLT for triangular array,

\sqrt{n}\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}n}\Delta\epsilon_{it_{0}n}\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iTn}\Delta\epsilon_{iTn}\end{pmatrix}\xrightarrow{d}-e\sim N(0,\Omega_{\infty}).

Note that the CLT holds by combination of Lyapunov condition and Cramér-Wold device if $\lim_{n\rightarrow\infty}\frac{E[(\lambda^{\prime}z_{itn})^{2+r}\Delta\epsilon_{itn}^{2+r}]}{n^{r/2}\{E[(\lambda^{\prime}z_{itn})^{2}\Delta\epsilon_{itn}^{2}]\}^{1+r/2}}=0$ for some $r>0$ for any $\lambda\in\mathbb{R}^{k}$ , which holds as $\inf_{n\in\mathbb{N}}\sigma_{\min}(\Omega_{n})>0$ and $\sup_{n\in\mathbb{N}}\max\{(E\|z_{itn}\|^{4+2r})^{1/2},(E\Delta\epsilon_{itn}^{4+2r})^{1/2}\}<\infty$ for some $r>0$ . By the WLLN for triangular array,

\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0}n}\Delta x_{it_{0}n}^{\prime}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iTn}\Delta x_{iTn}^{\prime}\end{pmatrix}\xrightarrow{p}\begin{pmatrix}Ez_{it_{0},\infty}\Delta x_{it_{0},\infty}^{\prime}\\ \vdots\\ Ez_{iT,\infty}\Delta x_{iT,\infty}^{\prime}\end{pmatrix},

which holds as $\sup_{n\in\mathbb{N}}E\|z_{itn}\Delta x_{itn}\|^{2}\leq\sup_{n\in\mathbb{N}}(E\|z_{itn}\|^{4})^{1/2}(E\|\Delta x_{itn}\|^{4})^{1/2}<\infty$ . Let $K<\infty$ be some constant. By the ULLN in Lemma I.3,

\left\|\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0}n}1_{it_{0}n}(\gamma_{0n}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime}X_{it_{0}n}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iTn}1_{iTn}(\gamma_{0n}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime}X_{iTn}\end{pmatrix}-\begin{pmatrix}Ez_{it_{0},\infty}1_{it_{0},\infty}(\gamma_{0,\infty}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime}X_{it_{0},\infty}\\ \vdots\\ Ez_{iT,\infty}1_{iT,\infty}(\gamma_{0,\infty}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime}X_{iT,\infty}\end{pmatrix}\right\|\xrightarrow{p}0

uniformly with respect to $b\in[-K,K]$ . Then, by the continuity of $\kappa\mapsto E[z_{it,\infty}1_{it,\infty}(\gamma_{0,\infty}+\kappa)X_{it,\infty}]$ at $\kappa=0$ ,

\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0},\infty}1_{it_{0},\infty}(\gamma_{0,\infty}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime}X_{it_{0},\infty}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iT,\infty}1_{iT,\infty}(\gamma_{0,\infty}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime}X_{iT,\infty}\end{pmatrix}\xrightarrow{p}\begin{pmatrix}Ez_{it_{0},\infty}1_{it_{0,\infty}}(\gamma_{0,\infty})^{\prime}X_{it_{0},\infty}\\ \vdots\\ Ez_{iT,\infty}1_{iT,\infty}(\gamma_{0,\infty})^{\prime}X_{iT,\infty}\end{pmatrix}

uniformly with respect to $b\in[-K,K]$ . By Lemma I.6,

\sqrt{n}\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0}n}(1_{it_{0}n}(\gamma_{0n})^{\prime}-1_{it_{0}n}(\gamma_{0n}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{it_{0}n}\delta_{0n}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iTn}(1_{iTn}(\gamma_{0n})^{\prime}-1_{iTn}(\gamma_{0n}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{iTn}\delta_{0n}\end{pmatrix}\\ \xrightarrow{p}\begin{pmatrix}E_{t_{0},\infty}[z_{it_{0},\infty}|\gamma_{0,\infty}]f_{t_{0},\infty}(\gamma_{0,\infty})-E_{t_{0}-1,\infty}[z_{iT,\infty}|\gamma_{0,\infty}]f_{t_{0}-1,\infty}(\gamma_{0,\infty})\\ \vdots\\ E_{T,\infty}[z_{iT,\infty}|\gamma_{0,\infty}]f_{T,\infty}(\gamma_{0,\infty})-E_{T-1,\infty}[z_{iT,\infty}|\gamma_{0,\infty}]f_{T-1,\infty}(\gamma_{0,\infty})\end{pmatrix}\left\{\zeta_{1}b+\frac{\delta_{30,\infty}}{2}b^{2}\right\}

uniformly with respect to $b\in[-K,K]$ . Therefore, $\mathbb{S}_{n}(a,b)$ weakly converges to

\mathbb{S}(a,b)=(M_{0,\infty}a+\widetilde{H}_{\infty}(\zeta_{1}b+\frac{\delta_{30,\infty}}{2}b^{2})-e)^{\prime}\Omega_{\infty}^{-1}(M_{0,\infty}a+\widetilde{H}_{\infty}(\zeta_{1}b+\frac{\delta_{30,\infty}}{2}b^{2})-e),

in $\ell^{\infty}(\mathbb{K})$ for any compact $\mathbb{K}\subset\mathbb{R}^{2p+2}$ .

Let $\tilde{b}=\zeta_{1}b+\frac{\delta_{30,\infty}}{2}b^{2}$ and $\hat{\tilde{b}}_{n}=\zeta_{1}\hat{b}_{n}+\frac{\delta_{30,\infty}}{2}\hat{b}_{n}^{2}$ . We consider $\delta_{30,\infty}>0$ so that $\tilde{b}\geq-\frac{\zeta_{1}^{2}}{2\delta_{30,\infty}}$ . When $\delta_{30,\infty}<0$ , derivations are almost identical and lead to the same limit distribution of the test statistic. Let $\underline{b}=-\frac{\zeta_{1}^{2}}{2\delta_{30,\infty}}$ . Then, by the CMT,

(\hat{a}_{n},\hat{\tilde{b}}_{n})\xrightarrow{d}(a_{0},\tilde{b}_{0})=\arg\min_{a,\tilde{b}\geq\underline{b}}(M_{0,\infty}a+\widetilde{H}_{\infty}\tilde{b}-e)^{\prime}\Omega_{\infty}^{-1}(M_{0,\infty}a+\widetilde{H}_{\infty}\tilde{b}-e).

KKT conditions, as in the proof of Theorem 2, imply

	$\displaystyle M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}M_{0,\infty}a_{0}+M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}\widetilde{H}_{\infty}\tilde{b}_{0}-M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}e=0,$
	$\displaystyle\widetilde{H}_{\infty}^{\prime}\Omega_{\infty}^{-1}\widetilde{H}_{\infty}\tilde{b}_{0}+\widetilde{H}_{\infty}^{\prime}\Omega_{\infty}^{-1}M_{0,\infty}a_{0}-\widetilde{H}_{\infty}^{\prime}\Omega_{\infty}^{-1}e-\lambda=0,$

$\lambda\geq 0$ , $\tilde{b}_{0}\geq\underline{b}$ , and $\lambda(\tilde{b}_{0}-\underline{b})=0$ should hold. Then, we can get

\tilde{b}_{0}=\begin{cases}[\widetilde{H}_{\infty}^{\prime}\Xi_{\infty}\widetilde{H}_{\infty}]^{-1}\widetilde{H}_{\infty}^{\prime}\Xi_{\infty}e&\text{if }[\widetilde{H}_{\infty}^{\prime}\Xi_{\infty}\widetilde{H}_{\infty}]^{-1}\widetilde{H}_{\infty}^{\prime}\Xi_{\infty}e\geq\underline{b}\\ \quad\quad\underline{b}&\text{else}\\ \end{cases}

where $\Xi_{\infty}=\Omega_{\infty}^{-1/2}(I-P_{\Omega_{\infty}^{-1/2}M_{0,\infty}})\Omega_{\infty}^{-1/2}$ . $\tilde{b}_{0}$ follows a normal distribution that is left censored at $\underline{b}$ . Then,

a_{0}=\begin{cases}(M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}M_{0,\infty})^{-1}M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}[I-\widetilde{H}_{\infty}[\widetilde{H}_{\infty}^{\prime}\Xi_{\infty}\widetilde{H}_{\infty}]^{-1}\widetilde{H}_{\infty}^{\prime}\Xi_{\infty}]e&\text{if }[\widetilde{H}_{\infty}^{\prime}\Xi_{\infty}\widetilde{H}_{\infty}]^{-1}\widetilde{H}_{\infty}^{\prime}\Xi_{\infty}e\geq\underline{b}\\ (M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}M_{0,\infty})^{-1}M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}(e-\widetilde{H}_{\infty}\underline{b})&\text{else.}\end{cases}

Asymptotic distribution of the test statistic $\mathcal{D}_{n}(\gamma_{0n})$ can be derived by

	$\displaystyle\mathcal{D}_{n}(\gamma_{0n})\xrightarrow{d}$	$\displaystyle\min_{a}(M_{0,\infty}a-e)^{\prime}\Omega_{\infty}^{-1}(M_{0,\infty}a-e)$
		$\displaystyle-\min_{a,\tilde{b}\geq\underline{b}}(M_{0,\infty}a+\widetilde{H}_{\infty}\tilde{b}-e)^{\prime}\Omega_{\infty}^{-1}(M_{0,\infty}a+\widetilde{H}_{\infty}\tilde{b}-e),$

where we apply the CMT. Note that $\min_{a}(M_{0,\infty}a-e)^{\prime}\Omega_{\infty}^{-1}(M_{0,\infty}a-e)=e^{\prime}\Omega_{\infty}^{-1/2}(I-P_{\Omega_{\infty}^{-1/2}M_{0,\infty}})\Omega_{\infty}^{-1/2}e$ , while

	$\displaystyle\min_{a,\tilde{b}\geq\underline{b}}(M_{0,\infty}a+\widetilde{H}_{\infty}\tilde{b}-e)^{\prime}\Omega_{\infty}^{-1}(M_{0,\infty}a+\widetilde{H}_{\infty}\tilde{b}-e)$
	$\displaystyle=(M_{0,\infty}a_{0}+\widetilde{H}_{\infty}\tilde{b}_{0}-e)^{\prime}\Omega_{\infty}^{-1}(M_{0,\infty}a_{0}+\widetilde{H}_{\infty}\tilde{b}_{0}-e)$
	$\displaystyle=(M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}M_{0,\infty}a_{0}+M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}\widetilde{H}_{\infty}\tilde{b}_{0})^{\prime}(M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}M_{0,\infty})^{-1}(M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}M_{0,\infty}a_{0}+M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}\widetilde{H}_{\infty}\tilde{b}_{0})$
	$\displaystyle\quad+\tilde{b}_{0}\widetilde{H}_{\infty}^{\prime}\Omega_{\infty}^{-1/2}(I-P_{\Omega_{\infty}^{-1/2}M_{0,\infty}})\Omega_{\infty}^{-1/2}\widetilde{H}_{\infty}\tilde{b}_{0}$
	$\displaystyle\quad-2e^{\prime}\Omega_{\infty}^{-1}M_{0,\infty}(M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}M_{0,\infty})^{-1}(M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}M_{0,\infty}a_{0}+M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}\widetilde{H}_{\infty}\tilde{b}_{0})$
	$\displaystyle\quad-2e^{\prime}\Omega_{\infty}^{-1/2}(I-P_{\Omega_{\infty}^{-1/2}M_{0,\infty}})\Omega_{\infty}^{-1/2}\widetilde{H}_{\infty}\tilde{b}_{0}+e^{\prime}\Omega_{\infty}^{-1}e.$

By plugging in the formula for $(a_{0},\tilde{b}_{0})$ (note that $M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}M_{0,\infty}a_{0}+M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}\widetilde{H}_{\infty}\tilde{b}_{0}=M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}e$ ) we can get

	$\displaystyle\min_{a,\tilde{b}\geq\underline{b}}(M_{0,\infty}a+\widetilde{H}_{\infty}\tilde{b}-e)^{\prime}\Omega_{\infty}^{-1}(M_{0,\infty}a+\widetilde{H}_{\infty}\tilde{b}-e)$
	$\displaystyle\quad=\begin{cases}e^{\prime}\Omega_{\infty}^{-1/2}(I-P_{\Omega_{\infty}^{-1/2}M_{0,\infty}})\Omega_{\infty}^{-1/2}e-e^{\prime}\Xi_{\infty}\widetilde{H}_{\infty}(\widetilde{H}_{\infty}^{\prime}\Xi_{\infty}\widetilde{H}_{\infty})^{-1}\widetilde{H}_{\infty}\Xi_{\infty}e&\text{if }[\widetilde{H}_{\infty}^{\prime}\Xi_{\infty}\widetilde{H}_{\infty}]^{-1}\widetilde{H}_{\infty}^{\prime}\Xi_{\infty}e\geq\underline{b}\\ e^{\prime}\Omega_{\infty}^{-1/2}(I-P_{\Omega_{\infty}^{-1/2}M_{0,\infty}})\Omega_{\infty}^{-1/2}e+(\widetilde{H}_{\infty}^{\prime}\Xi_{\infty}\widetilde{H}_{\infty})\underline{b}^{2}-2(e^{\prime}\Xi_{\infty}\widetilde{H}_{\infty})\underline{b}&\text{else}.\end{cases}$

Therefore, the limit distribution of the test statistic is identical to

\begin{cases}e^{\prime}\Xi_{\infty}\widetilde{H}_{\infty}(\widetilde{H}_{\infty}^{\prime}\Xi_{\infty}\widetilde{H}_{\infty})^{-1}\widetilde{H}_{\infty}\Xi_{\infty}e&\text{if }[\widetilde{H}_{\infty}^{\prime}\Xi_{\infty}\widetilde{H}_{\infty}]^{-1}\widetilde{H}_{\infty}^{\prime}\Xi_{\infty}e\geq\underline{b}\\ -(\widetilde{H}_{\infty}^{\prime}\Xi_{\infty}\widetilde{H}_{\infty})\underline{b}^{2}+2(e^{\prime}\Xi_{\infty}\widetilde{H}_{\infty})\underline{b}&\text{else}.\end{cases}

Case (iii): Let $a=\sqrt{n}(\alpha-\alpha_{0n})$ and $b=\sqrt{n}(\delta_{10n}+\delta_{30n}\gamma_{0n})(\gamma-\gamma_{0n})$ . The rescaled and reparametrized sample moment can be written as

	$\displaystyle\sqrt{n}\bar{g}_{n}(\alpha_{0n}+\tfrac{a}{\sqrt{n}},\gamma_{0n}+\tfrac{b}{\sqrt{n}(\delta_{10n}+\delta_{30n}\gamma_{0n})})=$
	$\displaystyle\hskip 56.9055pt\sqrt{n}\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}n}\Delta\epsilon_{it_{0}n}\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iTn}\Delta\epsilon_{iTn}\end{pmatrix}-\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0}n}\Delta x_{it_{0}n}^{\prime}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iTn}\Delta x_{iTn}^{\prime}\end{pmatrix}a_{1}$
	$\displaystyle\hskip 56.9055pt-\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0}n}1_{it_{0}n}(\gamma_{0n}+\tfrac{b}{\sqrt{n}(\delta_{10n}+\delta_{30n}\gamma_{0n})})^{\prime}X_{it_{0}n}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iTn}1_{iTn}(\gamma_{0n}+\tfrac{b}{\sqrt{n}(\delta_{10n}+\delta_{30n}\gamma_{0n})})^{\prime}X_{iTn}\end{pmatrix}a_{2}$
	$\displaystyle\hskip 56.9055pt+\sqrt{n}\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0}n}(1_{it_{0}n}(\gamma_{0n})^{\prime}-1_{it_{0}n}(\gamma_{0n}+\tfrac{b}{\sqrt{n}(\delta_{10n}+\delta_{30n}\gamma_{0n})})^{\prime})X_{it_{0}n}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iTn}(1_{iTn}(\gamma_{0n})^{\prime}-1_{iT}(\gamma_{0n}+\tfrac{b}{\sqrt{n}(\delta_{10n}+\delta_{30n}\gamma_{0n})})^{\prime})X_{iTn}\end{pmatrix}\delta_{0n}.$

By the CLT for triangular array,

\sqrt{n}\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}n}\Delta\epsilon_{it_{0}n}\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iTn}\Delta\epsilon_{iTn}\end{pmatrix}\xrightarrow{d}-e\sim N(0,\Omega_{\infty}).

By the WLLN for triangular array,

\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0}n}\Delta x_{it_{0}n}^{\prime}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iTn}\Delta x_{iTn}^{\prime}\end{pmatrix}\xrightarrow{p}\begin{pmatrix}Ez_{it_{0},\infty}\Delta x_{it_{0},\infty}^{\prime}\\ \vdots\\ Ez_{iT,\infty}\Delta x_{iT,\infty}^{\prime}\end{pmatrix}.

By the ULLN in Lemma I.3,

\left\|\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0}n}1_{it_{0}n}(\gamma_{0n}+\tfrac{b}{\sqrt{n}(\delta_{10n}+\delta_{30n}\gamma_{0n})})^{\prime}X_{it_{0}n}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iTn}1_{iTn}(\gamma_{0n}+\tfrac{b}{\sqrt{n}(\delta_{10n}+\delta_{30n}\gamma_{0n})})^{\prime}X_{iTn}\end{pmatrix}\right.-\\ \left.\begin{pmatrix}Ez_{it_{0},\infty}1_{it_{0},\infty}(\gamma_{0,\infty}+\tfrac{b}{\sqrt{n}(\delta_{10n}+\delta_{30n}\gamma_{0n})})^{\prime}X_{it_{0},\infty}\\ \vdots\\ Ez_{iT,\infty}1_{iT,\infty}(\gamma_{0,\infty}+\tfrac{b}{\sqrt{n}(\delta_{10n}+\delta_{30n}\gamma_{0n})})^{\prime}X_{iT,\infty}\end{pmatrix}\right\|\xrightarrow{p}0

uniformly with respect to $b\in[-K,K]$ , which implies

\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0},\infty}1_{it_{0},\infty}(\gamma_{0,\infty}+\tfrac{b}{\sqrt{n}(\delta_{10n}+\delta_{30n}\gamma_{0n})})^{\prime}X_{it_{0},\infty}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iT,\infty}1_{iT,\infty}(\gamma_{0,\infty}+\tfrac{b}{\sqrt{n}(\delta_{10n}+\delta_{30n}\gamma_{0n})})^{\prime}X_{iT,\infty}\end{pmatrix}\xrightarrow{p}\begin{pmatrix}Ez_{it_{0},\infty}1_{it_{0,\infty}}(\gamma_{0,\infty})^{\prime}X_{it_{0},\infty}\\ \vdots\\ Ez_{iT,\infty}1_{iT,\infty}(\gamma_{0,\infty})^{\prime}X_{iT,\infty}\end{pmatrix}

uniformly with respect to $b\in[-K,K]$ . By Lemma I.7,

\sqrt{n}\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0}n}(1_{it_{0}n}(\gamma_{0n})^{\prime}-1_{it_{0}n}(\gamma_{0n}+\tfrac{b}{\sqrt{n}(\delta_{10n}+\delta_{30n}\gamma_{0n})})^{\prime})X_{it_{0}n}\delta_{0n}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iTn}(1_{iTn}(\gamma_{0n})^{\prime}-1_{iTn}(\gamma_{0n}+\tfrac{b}{\sqrt{n}(\delta_{10n}+\delta_{30n}\gamma_{0n})})^{\prime})X_{iTn}\delta_{0n}\end{pmatrix}\\ \xrightarrow{p}\begin{pmatrix}E_{t_{0},\infty}[z_{it_{0},\infty}|\gamma_{0,\infty}]f_{t_{0},\infty}(\gamma_{0,\infty})-E_{t_{0}-1,\infty}[z_{iT,\infty}|\gamma_{0,\infty}]f_{t_{0}-1,\infty}(\gamma_{0,\infty})\\ \vdots\\ E_{T,\infty}[z_{iT,\infty}|\gamma_{0,\infty}]f_{T,\infty}(\gamma_{0,\infty})-E_{T-1,\infty}[z_{iT,\infty}|\gamma_{0,\infty}]f_{T-1,\infty}(\gamma_{0,\infty})\end{pmatrix}b

uniformly with respect to $b\in[-K,K]$ . Therefore, $\mathbb{S}_{n}(a,b)=n\hat{Q}_{n}(\alpha_{0n}+\tfrac{a}{\sqrt{n}},\gamma_{0n}+\frac{b}{\sqrt{n}(\delta_{10n}+\delta_{30n}\gamma_{0n})})$ weakly converges to

\mathbb{S}(a,b)=(M_{0,\infty}a+\widetilde{H}_{\infty}b-e)^{\prime}\Omega_{\infty}^{-1}(M_{0,\infty}a+\widetilde{H}_{\infty}b-e),

in $\ell^{\infty}(\mathbb{K})$ for any compact $\mathbb{K}\subset\mathbb{R}^{2p+2}$ . Then, $\hat{a}_{n}=\sqrt{n}(\hat{\alpha}_{n}-\alpha_{0n})$ and $\hat{b}_{n}=\sqrt{n}(\delta_{10n}+\delta_{30n}\gamma_{0n})(\hat{\gamma}_{n}-\gamma_{0n})$ converges in distribution to

(a_{0},b_{0})=\arg\min_{a,b}(M_{0,\infty}a+\widetilde{H}_{\infty}b-e)^{\prime}\Omega_{\infty}^{-1}(M_{0,\infty}a+\widetilde{H}_{\infty}b-e).

by the argmin CMT. KKT conditions, as in the proof of Theorem 2, imply

	$\displaystyle M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}M_{0,\infty}a_{0}+M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}\widetilde{H}_{\infty}b_{0}-M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}e=0$
	$\displaystyle\widetilde{H}_{\infty}^{\prime}\Omega_{\infty}^{-1}\widetilde{H}_{\infty}b_{0}+\widetilde{H}_{\infty}^{\prime}\Omega_{\infty}^{-1}M_{0,\infty}a_{0}-\widetilde{H}_{\infty}^{\prime}\Omega_{\infty}^{-1}e=0.$

Then, we can get

b_{0}=[\widetilde{H}_{\infty}^{\prime}\Xi_{\infty}\widetilde{H}_{\infty}]^{-1}\widetilde{H}_{\infty}^{\prime}\Xi_{\infty}e,

where $\Xi_{\infty}=\Omega_{\infty}^{-1/2}(I-P_{\Omega_{\infty}^{-1/2}M_{0,\infty}})\Omega_{\infty}^{-1/2}$ , and

a_{0}=(M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}M_{0,\infty})^{-1}M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}[I-\widetilde{H}_{\infty}[\widetilde{H}_{\infty}^{\prime}\Xi_{\infty}\widetilde{H}_{\infty}]^{-1}\widetilde{H}_{\infty}^{\prime}\Xi_{\infty}]e

Asymptotic distribution of the test statistic $\mathcal{D}_{n}(\gamma_{0n})$ can be derived by

	$\displaystyle\mathcal{D}_{n}(\gamma_{0n})\xrightarrow{d}$	$\displaystyle\min_{a}(M_{0,\infty}a-e)^{\prime}\Omega_{\infty}^{-1}(M_{0,\infty}a-e)$
		$\displaystyle-\min_{a,b}(M_{0,\infty}a+\widetilde{H}_{\infty}b-e)^{\prime}\Omega_{\infty}^{-1}(M_{0,\infty}a+\widetilde{H}_{\infty}b-e),$

	$\displaystyle\min_{a,b}(M_{0,\infty}a+\widetilde{H}_{\infty}b-e)^{\prime}\Omega_{\infty}^{-1}(M_{0,\infty}a+\widetilde{H}_{\infty}b-e)$
	$\displaystyle=(M_{0,\infty}a_{0}+\widetilde{H}_{\infty}b_{0}-e)^{\prime}\Omega_{\infty}^{-1}(M_{0,\infty}a_{0}+\widetilde{H}_{\infty}b_{0}-e)$
	$\displaystyle=(M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}M_{0,\infty}a_{0}+M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}\widetilde{H}_{\infty}b_{0})^{\prime}(M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}M_{0,\infty})^{-1}(M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}M_{0,\infty}a_{0}+M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}\widetilde{H}_{\infty}b_{0})$
	$\displaystyle\quad+b_{0}\widetilde{H}_{\infty}^{\prime}\Omega_{\infty}^{-1/2}(I-P_{\Omega_{\infty}^{-1/2}M_{0,\infty}})\Omega_{\infty}^{-1/2}\widetilde{H}_{\infty}b_{0}$
	$\displaystyle\quad-2e^{\prime}\Omega_{\infty}^{-1}M_{0,\infty}(M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}M_{0,\infty})^{-1}(M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}M_{0,\infty}a_{0}+M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}\widetilde{H}_{\infty}b_{0})$
	$\displaystyle\quad-2e^{\prime}\Omega_{\infty}^{-1/2}(I-P_{\Omega_{\infty}^{-1/2}M_{0,\infty}})\Omega_{\infty}^{-1/2}\widetilde{H}_{\infty}b_{0}+e^{\prime}\Omega_{\infty}^{-1}e.$

By plugging in the formula for $(a_{0},b_{0})$ (note that $M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}M_{0,\infty}a_{0}+M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}\widetilde{H}_{\infty}b_{0}=M_{0,\infty}^{\prime}\Omega_{\infty}^{-1}e$ ), we can get

	$\displaystyle\min_{a,b}(M_{0,\infty}a+\widetilde{H}_{\infty}b-e)^{\prime}\Omega_{\infty}^{-1}(M_{0,\infty}a+\widetilde{H}_{\infty}b-e)$
	$\displaystyle\quad=e^{\prime}\Omega_{\infty}^{-1/2}(I-P_{\Omega_{\infty}^{-1/2}M_{0,\infty}})\Omega_{\infty}^{-1/2}e-e^{\prime}\Xi_{\infty}\widetilde{H}_{\infty}(\widetilde{H}_{\infty}^{\prime}\Xi_{\infty}\widetilde{H}_{\infty})^{-1}\widetilde{H}_{\infty}\Xi_{\infty}e$

Therefore, the limit distribution of the test statistic is identical to

e^{\prime}\Xi_{\infty}\widetilde{H}_{\infty}(\widetilde{H}_{\infty}^{\prime}\Xi_{\infty}\widetilde{H}_{\infty})^{-1}\widetilde{H}_{\infty}\Xi_{\infty}e,

which has the $\chi^{2}_{1}$ distribution.

Limit distribution of bootstrap estimator and test statistic

The derivation of the limit distributions of the bootstrap estimator and test statistic is almost identical to that of the asymptotic distributions of the sample estimator and test statistic. We need to replace $\delta_{0n}$ by $\delta_{0n}^{*}=\hat{\delta}_{0n}(\gamma_{0n})$ , $\{\Delta\epsilon_{itn}\}$ by $\{\widehat{\Delta\epsilon}_{itn}\}$ , and sample moments by bootstrap moments in the previous part of the proof regarding asymptotic analysis. Be mindful that we do not need to replace $\gamma_{0n}$ in the previous part of the proof as we focus on the grid bootstrap when $\gamma_{0n}^{*}=\gamma_{0n}$ to show that the grid bootstrap CI provides correct coverage rate. Lemmas I.10, I.11, I.12, and I.13 are applied instead of Lemmas I.3, I.5, I.6, and I.7 in the places where the latter are used in the previous part of the proof. Moreover, Lemmas I.8 and I.9 are applied instead of the WLLN and CLT for triangular array applied to $\{z_{itn}\Delta\epsilon_{itn}:1\leq i\leq n,n\in\mathbb{N}\}$ in the places where the latter are used in the previous part of the proof. ∎

I.1 Auxiliary Lemmas

Lemma I.2.

Let $\{\phi_{0n}\in\Phi_{0}:n\geq 1\}$ and $\pi_{n}(\phi_{0n})\rightarrow(\zeta_{1},\zeta_{2},\phi_{0,\infty})\in\Pi$ . For any $\eta>0$ , there is $h>0$ such that

\lim_{n\rightarrow\infty}\sup_{\|\theta-\theta_{0n}\|<h}\frac{\sqrt{n}\left\|g_{0n}(\theta)-M_{0n}(\alpha-\alpha_{0n})-\widetilde{H}_{n}[(\delta_{10n}+\delta_{30n}\gamma_{0n})(\gamma-\gamma_{0n})+\frac{\delta_{30n}}{2}(\gamma-\gamma_{0n})^{2}]\right\|}{1+\sqrt{n}(\|\alpha-\alpha_{0n}\|+|(\delta_{10n}+\delta_{30n}\gamma_{0n})(\gamma-\gamma_{0n})|+(\gamma-\gamma_{0n})^{2})}<\eta.

Proof.

Note that $g_{0n}(\theta)-M_{0n}(\alpha-\alpha_{0n})=M_{0n}(\gamma)\alpha-M_{0n}\alpha_{0n}-M_{0n}(\alpha-\alpha_{0n})=(M_{0n}(\gamma)-M_{0n})\alpha=(M_{20n}(\gamma)-M_{20n})\delta=(M_{20n}(\gamma)-M_{20n})[\delta_{0n}+(\delta-\delta_{0n})]$ .

First, we derive a bound for $(M_{20n}(\gamma)-M_{20n})\delta_{0n}$ which is

\begin{pmatrix}E[z_{it_{0}n}(\delta_{10n}+\delta_{30n}q_{it_{0}n})1\{\gamma\geq q_{it_{0}n}>\gamma_{0n}\}]-E[z_{it_{0}-1,n}(\delta_{10n}+\delta_{30n}q_{it_{0}-1,n})1\{\gamma\geq q_{it_{0}-1,n}>\gamma_{0n}\}]\\ \vdots\\ E[z_{iTn}(\delta_{10n}+\delta_{30n}q_{iTn})1\{\gamma\geq q_{iTn}>\gamma_{0n}\}]-E[z_{iT-1,n}(\delta_{10n}+\delta_{30n}q_{iT-1,n})1\{\gamma\geq q_{iT-1,n}>\gamma_{0n}\}]\end{pmatrix}.

Suppose $\gamma>\gamma_{0n}$ , and the other case can be analyzed similarly. By Taylor expansion,

E[z_{itn}(\delta_{10n}+\delta_{30n}q_{itn})1\{\gamma\geq q_{itn}>\gamma_{0n}\}]\\ =E_{tn}[z_{itn}|\gamma_{0n}]f_{tn}(\gamma_{0n})\left\{(\delta_{10n}+\delta_{30n}\gamma_{0n})\cdot(\gamma-\gamma_{0n})+\frac{\delta_{30n}}{2}(\gamma-\gamma_{0n})^{2}\right\}+R_{n},

where

R_{n}=\frac{1}{2}\frac{d}{d\gamma}\left(E_{tn}[z_{itn}|\gamma]f_{tn}(\gamma)\right)|_{\gamma=\bar{\gamma}_{0n}}\times(\delta_{10n}+\delta_{30n}\bar{\gamma}_{0n})(\gamma-\gamma_{0n})^{2}\\ +\frac{1}{2}\left\{E_{tn}[z_{itn}|\bar{\gamma}_{0n}]f_{tn}(\bar{\gamma}_{0n})-E_{tn}[z_{itn}|\gamma_{0n}]f_{tn}(\gamma_{0n})\right\}(\gamma-\gamma_{0n})^{2},

and $\bar{\gamma}_{0n}\in[\gamma_{0n},\gamma]$ . Suppose $|\gamma-\gamma_{0n}|\leq h_{1}$ . For sufficiently small $h_{1}>0$ , there is $N$ such that if $n>N$ , then $\|\frac{d}{d\gamma}\left(E_{tn}[z_{itn}|\gamma]f_{tn}(\gamma)\right)|_{\gamma=\bar{\gamma}_{0n}}\|\leq C_{1}<\infty$ for some $C_{1}<\infty$ . There also exists $C_{2}<\infty$ such that $\delta_{10n}+\delta_{30n}\bar{\gamma}_{0n}\leq(\delta_{10n}+\delta_{30n}\gamma_{0n})+\sup_{n}|\delta_{30n}|h_{1}\leq(\delta_{10n}+\delta_{30n}\gamma_{0n})+C_{2}h_{1}$ , and hence $\|\frac{d}{d\gamma}\left(E_{tn}[z_{itn}|\gamma]f_{tn}(\gamma)\right)|_{\gamma=\bar{\gamma}_{0n}}\times(\delta_{10n}+\delta_{30n}\bar{\gamma}_{0n})(\gamma-\gamma_{0n})^{2}\|\leq C_{1}((\delta_{10n}+\delta_{30n}\gamma_{0n})+C_{2}h_{1})h_{1}^{2}$ for sufficiently large $n$ . Moreover, there exists $C_{3}<\infty$ such that $\|E_{tn}[z_{itn}|\bar{\gamma}_{0n}]f_{tn}(\bar{\gamma}_{0n})-E_{tn}[z_{itn}|\gamma_{0n}]f_{tn}(\gamma_{0n})\|\leq\sup_{\bar{\gamma}:|\gamma_{0n}-\bar{\gamma}|\leq h_{1}}\|\frac{d}{d\gamma}\left(E_{tn}[z_{itn}|\gamma]f_{tn}(\gamma)\right)\mid_{\gamma=\bar{\gamma}}\|h_{1}\leq C_{3}h_{1}$ for sufficiently small $h_{1}>0$ and sufficiently large $n$ . Hence, $\|R_{n}\|<C((\delta_{10n}+\delta_{30n}\gamma_{0n})h_{1}^{2}+h_{1}^{3})$ for some $C<\infty$ and for sufficiently small $h_{1}>0$ and sufficiently large $n$ . Therfore, there exists $h_{1}>0$ such that if $|\gamma-\gamma_{0n}|\leq h_{1}$ , then

\Bigl{\|}E[z_{itn}(\delta_{10n}+\delta_{30n}q_{itn})1\{\gamma\geq q_{itn}>\gamma_{0n}\}]\Bigr{.}-E_{tn}[z_{itn}|\gamma_{0n}]f_{tn}(\gamma_{0n})\\ \left.\times\left\{(\delta_{10n}+\delta_{30n}\gamma_{0n})\cdot(\gamma-\gamma_{0n})+\frac{\delta_{30n}}{2}(\gamma-\gamma_{0n})^{2}\right\}\right\|<C((\delta_{10n}+\delta_{30n}\gamma_{0n})h_{1}^{2}+h_{1}^{3})

for some $C<\infty$ and for sufficiently large $n$ . By similar computations for $E[z_{itn}(\delta_{10n}+\delta_{30n}q_{it-1,n})1\{\gamma\geq q_{it-1,n}>\gamma_{0n}\}]$ , we can derive that there exists $h_{1}>0$ such that if $|\gamma-\gamma_{0n}|\leq h_{1}$ , then $\left\|(M_{20n}(\gamma)-M_{20n})\delta_{0n}-\widetilde{H}_{n}[(\delta_{10n}+\delta_{30n}\gamma_{0n})(\gamma-\gamma_{0n})+\frac{\delta_{30n}}{2}(\gamma-\gamma_{0n})^{2}]\right\|<C((\delta_{10n}+\delta_{30n}\gamma_{0n})h_{1}^{2}+h_{1}^{3})$ for some $C<\infty$ and for sufficiently large $n$ .

Meanwhile, there exist $h_{1},h_{2}>0$ such that if $|\gamma-\gamma_{0n}|\leq h_{1}$ and $\|\alpha-\alpha_{0n}\|\leq h_{2}$ , then $\|(M_{20n}(\gamma)-M_{20n})(\delta-\delta_{0n})\|<Ch_{2}h_{1}$ for some $C<\infty$ and for sufficiently large $n$ . This is because for sufficiently small $h_{1}>0$ , $\|M_{20n}(\gamma)-M_{20n}\|<\sup_{\bar{\gamma}:|\bar{\gamma}-\gamma_{0n}|\leq h_{1}}\|\mathfrak{H}_{n}(\bar{\gamma})\|h_{1}$ , where

\mathfrak{H}_{n}(\gamma)=\begin{pmatrix}E_{t_{0}n}[z_{it_{0}n}(1,\gamma)|\gamma]f_{t_{0}n}(\gamma)-E_{t_{0}-1,n}[z_{it_{0}n}(1,\gamma)|\gamma]f_{t_{0}-1,n}(\gamma)\\ \vdots\\ E_{Tn}[z_{iTn}(1,\gamma)|\gamma]f_{Tn}(\gamma)-E_{T-1,n}[z_{iTn}(1,\gamma)|\gamma]f_{T-1,n}(\gamma)\end{pmatrix},

Note that if $h_{1}$ is sufficently small, $\sup_{\bar{\gamma}:|\bar{\gamma}-\gamma_{0n}|\leq h_{1}}\|\mathfrak{H}_{n}(\bar{\gamma})\|$ is bounded above by some nonnegative constant $C<\infty$ , and $\|M_{20n}(\gamma)-M_{20n}\|<Ch_{1}$ .

Hence, for any $\eta>0$ , there exist $h_{1},h_{2}>0$ such that if $|\gamma-\gamma_{0n}|\leq h_{1}$ and $\|\alpha-\alpha_{0n}\|\leq h_{2}$ , then

\Bigl{\|}(M_{20n}(\gamma)-M_{20n})[\delta_{0n}+(\delta-\delta_{0n})]\Bigr{.}\\ \Bigl{.}-\widetilde{H}_{n}[(\delta_{10n}+\delta_{30n}\gamma_{0n})(\gamma-\gamma_{0n})+\frac{\delta_{30n}}{2}(\gamma-\gamma_{0n})^{2}]\Bigr{\|}<C(h_{1}h_{2}+(\delta_{10n}+\delta_{30n}\gamma_{0n})h_{1}^{2}+h_{1}^{3}),

for some nonnegative $C<\infty$ and sufficiently large $n$ . Therefore, for any $\eta>0$ , we can set $h_{1}$ and $h_{2}$ sufficiently small such that $\sup_{|\gamma-\gamma_{0n}|\leq h_{1},\|\alpha-\alpha_{0n}\|\leq h_{2}}\sqrt{n}\|g_{0n}(\theta)-M_{0n}(\alpha-\alpha_{0n})-\widetilde{H}_{n}[(\delta_{10n}+\delta_{30n}\gamma_{0n})(\gamma-\gamma_{0n})+\frac{\delta_{30n}}{2}(\gamma-\gamma_{0n})^{2}]\|\leq\sqrt{n}(h_{2}+(\delta_{10n}+\delta_{30n}\gamma_{0n})h_{1}+h_{1}^{2})\eta$ for sufficiently large $n$ , which completes the proof.

∎

Lemma I.3.

Let $\{\phi_{0n}\in\Phi_{0}:n\geq 1\}$ and $\pi_{n}(\phi_{0n})\rightarrow(\zeta_{1},\zeta_{2},\phi_{0,\infty})\in\Pi$ . Then,

\sup_{\gamma\in\Gamma}\|\bar{M}_{n}(\gamma)-M_{0n}(\gamma)\|\xrightarrow{p}0.

Proof.

We show that the classes $\{z_{it}(1,q_{it})1\{q_{it}>\gamma\}:\gamma\in\Gamma\}$ and $\{z_{it}(1,q_{it-1})1\{q_{it-1}>\gamma\}:\gamma\in\Gamma\}$ are Glivenko-Cantelli uniformly in $\{P_{n}:n=1,2,...\}$ , where $P_{n}$ is the probability law of $\omega_{in}=\{(z_{itn},y_{itn},x_{itn},\epsilon_{itn})_{t=1}^{T}\}$ . We focus on the former class since the verification for the latter class is exactly identical. As it is sufficient to show that each element of $\{z_{it}(1,q_{it})1\{q_{it}>\gamma\}:\gamma\in\Gamma\}$ , we additionally restrict our focus on $\mathcal{G}_{m\cdot index}=\{z_{it}q_{it}1\{q_{it}>\gamma\}:\gamma\in\Gamma\}$ and assume that $z_{it}$ is scalar without losing of generality. By Theorem 2.8.1 in van der Vaart and Wellner, (1996), $\mathcal{G}_{m\cdot index}$ is Glivenko-Cantelli uniformly in $\{P_{n}\}$ if

	$\displaystyle\sup_{n\in\mathbb{N}}E\|G_{m\cdot index}(\omega_{in})\|^{1+r}<\infty\text{ for some $r>0$, and}$
	$\displaystyle\sup_{Q}\log N(\varepsilon\\|G_{m\cdot index}\\|_{Q,1},\mathcal{G}_{m\cdot index},L_{1}(Q))<\infty\text{ for all $\varepsilon>0$},$

where supremum is taken over all probability measures $Q$ such that $QG_{m\cdot index}<\infty$ , and $G_{m\cdot index}=|z_{it}q_{it}|$ is an envelope of $\mathcal{G}_{m\cdot index}$ . The first condition holds because $\sup_{n\in\mathbb{N}}E|z_{itn}q_{itn}|^{1+r}\leq\sup_{n\in\mathbb{N}}(E|z_{itn}|^{2+2r})^{1/2}(E|q_{itn}|^{2+2r})^{1/2}<C$ for some $C<\infty$ and $r>0$ . The second condition holds as we have shown in the proof of Lemma D.2 that $\mathcal{G}_{m\cdot index}$ is a VC class that satisfies the uniform entropy condition. Therefore, the ULLN with triangular array holds for $\{z_{it}q_{it}1\{q_{it}>\gamma\}:\gamma\in\Gamma\}$ . ∎

Lemma I.4.

Let $\{\phi_{0n}\in\Phi_{0}:n\geq 1\}$ and $\pi_{n}(\phi_{0n})\rightarrow(\zeta_{1},\zeta_{2},\phi_{0,\infty})\in\Pi$ . Suppose that $|\hat{\theta}_{(1)n}-\theta_{0n}|\xrightarrow{p}0$ . Then,

\left\|\left\{\frac{1}{n}\sum_{i=1}^{n}[g(\omega_{in},\hat{\theta}_{(1)n})g(\omega_{in},\hat{\theta}_{(1)n})^{\prime}]-\bar{g}_{n}(\hat{\theta}_{(1)n})\bar{g}_{n}(\hat{\theta}_{(1)n})^{\prime}\right\}-\Omega_{n}\right\|\xrightarrow{p}0,

where $\Omega_{n}=E[g(\omega_{in},\theta_{0n})g(\omega_{in},\theta_{0n})^{\prime}]-g_{0n}(\theta_{0n})g_{0n}(\theta_{0n})^{\prime}$ .

Proof.

We need to show $\|\bar{g}_{n}(\hat{\theta}_{(1)n})-g_{0n}(\theta_{0n})\|\xrightarrow{p}0$ and $\|\frac{1}{n}\sum_{i=1}^{n}g(\omega_{in},\hat{\theta}_{(1)n})g(\omega_{in},\hat{\theta}_{(1)n})^{\prime}-E[g(\omega_{in},\theta_{0n})g(\omega_{in},\theta_{0n})^{\prime}]\|\xrightarrow{p}0$ . $\mathcal{G}=\{g(\omega_{i},\theta):\theta\in\Theta\}$ is Glivenko-Cantelli class uniformly with respect to $\{P_{n}:n=1,2,...\}$ , where $P_{n}$ is the probability law of $\omega_{in}=\{(z_{itn},y_{itn},x_{itn},\epsilon_{itn})_{t=1}^{T}\}$ , as the proof of Lemma I.5 shows that the class is uniformly Donsker and pre-Gaussian. Therefore, $\|\bar{g}_{n}(\hat{\theta}_{(1)n})-g_{0n}(\theta_{0n})\|\xrightarrow{p}0$ when $|\hat{\theta}_{(1)n}-\theta_{0n}|\xrightarrow{p}0$ .

Let $\mathcal{G}^{2}=\{g(\omega_{i},\theta)g(\omega_{i},\theta)^{\prime}:\theta\in\Theta\}$ . If $\mathcal{G}^{2}$ is Glivenko-Cantelli class uniformly with respect to $\{P_{n}\}$ , then $\sup_{\theta\in\Theta}\|\frac{1}{n}\sum_{i=1}^{n}g(\omega_{in},\theta)g(\omega_{in},\theta)^{\prime}-E[g(\omega_{in},\theta)g(\omega_{in},\theta)^{\prime}]\|\xrightarrow{p}0$ . Then, $\|\frac{1}{n}\sum_{i=1}^{n}g(\omega_{in},\hat{\theta}_{(1)n})g(\omega_{in},\hat{\theta}_{(1)n})^{\prime}-E[g(\omega_{in},\theta_{0n})g(\omega_{in},\theta_{0n})^{\prime}]\|\xrightarrow{p}0$ as $|\hat{\theta}_{(1)n}-\theta_{0n}|\xrightarrow{p}0$ . By Theorem 2.8.1 in van der Vaart and Wellner, (1996), $\mathcal{G}^{2}$ is Glivenko-Cantelli uniformly in $\{P_{n}\}$ if

	$\displaystyle\sup_{n\in\mathbb{N}}E\|G^{2}(\omega_{in})\|^{1+r}<\infty\text{ for some $r>0$, and}$
	$\displaystyle\sup_{Q}\log N(\varepsilon\\|G^{2}\\|_{Q,1},\mathcal{G}^{2},L_{1}(Q))<\infty\text{ for all $\varepsilon>0$},$

where supremum is taken over all probability measures $Q$ such that $QG^{2}<\infty$ , and $G^{2}=[\sum_{t=1}^{T}\{C(\|z_{it}\Delta x_{it}\|+\|z_{it}(1,q_{it})^{\prime}\|+\|z_{it}(1,q_{it-1})^{\prime}\|)+\|z_{it}\Delta\epsilon_{it}\|\}]^{2}$ for some $C<\infty$ is an envelope of $\mathcal{G}^{2}$ as $G$ is an envelope of $\mathcal{G}$ as shown in the proof of Lemma D.3. The first condition $\sup_{n\in\mathbb{N}}E[G(\omega_{in})^{2+r}]<\infty$ holds because $\sup_{n\in\mathbb{N}}\max\{(E\|z_{itn}\|^{4+2r})^{1/2},(E\|x_{it-1,n}\|^{4+2r})^{1/2},(E\|x_{itn}\|^{4+2r})^{1/2},(E\|\Delta\epsilon_{itn}\|^{4+2r})^{1/2}\}<\infty$ for some $r>0$ . The second condition holds because $\mathcal{G}$ satisfies the uniform entropy condition (see the proof of Lemma D.3) while pairwise product preserves uniform entropy condition, e.g., Theorem 2.10.20 in van der Vaart and Wellner, (1996).

∎

Lemma I.5.

Let $\{\phi_{0n}\in\Phi_{0}:n\geq 1\}$ and $\pi_{n}(\phi_{0n})\rightarrow(\zeta_{1},\zeta_{2},\phi_{0,\infty})\in\Pi$ . If $h_{n}\rightarrow 0$ , then

\sup_{\|\theta_{1}-\theta_{2}\|<h_{n}}\sqrt{n}\|\bar{g}_{n}(\theta_{1})-\bar{g}_{n}(\theta_{2})-g_{0n}(\theta_{1})+g_{0n}(\theta_{2})\|=o_{p}(1).

Proof.

Let $P_{n}$ be a probability law of $\omega_{in}=\{(z_{itn},y_{itn},x_{itn},\epsilon_{itn})_{t=1}^{T}\}$ . We show that the class $\mathcal{G}=\{g(\omega_{i},\theta):\theta\in\Theta\}$ is pre-Gaussian uniformly in $\{P_{n}:n=1,2,...\}$ (see Section 2.8.2 in van der Vaart and Wellner, (1996) for its definition), which implies asymptotic equicontinuity uniform in $\{P_{n}\}$ . That is, for any $\epsilon>0$ , $\sup_{m\in\mathbb{N}}P_{m}(\|\mathbb{G}_{n}\|_{\mathcal{G}_{h}}>\epsilon)\rightarrow 0$ if $h\rightarrow 0$ and $n\rightarrow\infty$ , while $\mathcal{G}_{h}=\{g(\omega_{i},\theta_{1})-g(\omega_{i},\theta_{2}):\|\theta_{1}-\theta_{2}\|<h\}$ . Let $G$ be an envelope of $\mathcal{G}$ . By Theorem 2.8.3 in van der Vaart and Wellner, (1996), it is sufficient to show that

	$\displaystyle\sup_{n\in\mathbb{N}}E\|G(\omega_{in})\|^{2+r}<\infty\text{ for some $r>0$, and}$
	$\displaystyle\int_{0}^{\infty}\sup_{Q}\log N(\varepsilon\\|G\\|_{Q,2},\mathcal{G},L_{2}(Q))d\varepsilon<\infty,$

where $Q$ ranges over all finitely discrete probability measures, which implies that $\mathcal{G}$ is Donsker and uniformly pre-Gaussian in $\{P_{n}\}$ .

Let $\widetilde{\mathcal{G}}^{(t)}=\{z_{it}\Delta\epsilon_{it}-z_{it}\Delta x_{it}\bar{\beta}-z_{it}1_{it}(\gamma_{1})^{\prime}X_{it}\delta_{1}+z_{it}1_{it}(\gamma_{2})^{\prime}X_{it}\delta_{2}:\|\bar{\beta}\|\leq K,\|\delta_{1}\|\leq K,\|\delta_{2}\|\leq K,\gamma_{1},\gamma_{2}\in\Gamma\}$ . Suppose that $z_{it}$ is a scalar without losing of generality as it is sufficient to show the conditions hold for each element of $\mathcal{G}$ . Note that $g_{t}(\omega_{i},\theta)=z_{it}(\Delta y_{it}-\Delta x_{it}^{\prime}\beta-1_{it}(\gamma)^{\prime}X_{it}\delta)=z_{it}\Delta\epsilon_{it}-z_{it}\Delta x_{it}(\beta-\beta_{0n})-z_{it}1_{it}(\gamma)^{\prime}X_{it}\delta+z_{it}1_{it}(\gamma_{0n})^{\prime}X_{it}\delta_{0n}$ is an element of $\widetilde{\mathcal{G}}^{(t)}$ for any $\theta_{0n}\in\Theta$ . So it is sufficient to show $\widetilde{\mathcal{G}}^{(t)}$ is pre-Gaussian uniformly in $\{P_{n}\}$ instead of each element of $\mathcal{G}$ .

$\widetilde{G}(\omega_{i})=C(\|z_{it}\Delta x_{it}\|+\|z_{it}(1,q_{it})^{\prime}\|+\|z_{it}(1,q_{it-1})^{\prime}\|)+\|z_{it}\Delta\epsilon_{it}\|$ is an envelope of $\widetilde{\mathcal{G}}^{(t)}$ for some $C<\infty$ . The first condition for the uniform pre-Gaussianity $\sup_{n\in\mathbb{N}}E|\widetilde{G}(\omega_{in})|^{2+r}<\infty$ holds as $\sup_{n\in\mathbb{N}}\max\{(E\|z_{itn}\|^{4+2r})^{1/2},(E\|x_{itn}\|^{4+2r})^{1/2},(E\|x_{it-1,n}\|^{4+2r})^{1/2},(E\|\Delta\epsilon_{itn}\|^{4+2r})^{1/2}\}<\infty$ for some $r>0$ . The second condition holds as $\widetilde{\mathcal{G}}^{(t)}$ is shown to satisfy the uniform entropy condition in the proof of Lemma D.3.

∎

Lemma I.6.

Let $\{\phi_{0n}\in\Phi_{0}:n\geq 1\}$ and $\pi_{n}(\phi_{0n})\rightarrow(\zeta_{1},\zeta_{2},\phi_{0,\infty})\in\Pi$ , and suppose that $\zeta_{1}\neq\{\pm\infty\}$ , and $\zeta_{2}=0$ , i.e., it is (i) continuous or (ii) semi-continuous. Then,

\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{itn}(1_{itn}(\gamma_{0n})^{\prime}-1_{itn}(\gamma_{0n}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{itn}\delta_{0n}\\ \xrightarrow{p}\left\{E_{t,\infty}[z_{it,\infty}|\gamma_{0,\infty}]f_{t,\infty}(\gamma_{0,\infty})-E_{t-1,\infty}[z_{it,\infty}|\gamma_{0,\infty}]f_{t-1,\infty}(\gamma_{0,\infty})\right\}[\zeta_{1}b+\frac{\delta_{30,\infty}}{2}b^{2}]

uniformly over $b\in[-K,K]$ for any $K<\infty$ .

Proof.

Note that

		$\displaystyle\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{itn}(1_{itn}(\gamma_{0n})^{\prime}-1_{itn}(\gamma_{0n}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{itn}\delta_{0n}$
		$\displaystyle=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left\{z_{itn}(1_{itn}(\gamma_{0n})^{\prime}-1_{itn}(\gamma_{0n}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{itn}\delta_{0n}-E[z_{itn}(1_{itn}(\gamma_{0n})^{\prime}-1_{itn}(\gamma_{0n}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{itn}\delta_{0n}]\right\}$		(I.2)
		$\displaystyle\quad+\sqrt{n}E[z_{itn}(1_{itn}(\gamma_{0n})^{\prime}-1_{itn}(\gamma_{0n}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{itn}\delta_{0n}].$		(I.3)

The stochastic term (I.2) converges in probability to zero uniformly with respect to $b\in[-K,K]$ . This is because Lemma I.5 shows that when $h_{n}\downarrow 0$ , then

\sup_{|\gamma-\gamma_{0n}|<h_{n}}\sqrt{n}\left\{\frac{1}{n}\sum_{i=1}^{n}z_{itn}(1_{itn}(\gamma_{0n})-1_{itn}(\gamma))^{\prime}X_{itn}\delta_{0n}-E[z_{itn}(1_{itn}(\gamma_{0n})-1_{itn}(\gamma))^{\prime}X_{itn}\delta_{0n}]\right\}=o_{p}(1)

as it can be expressed as $\sup_{|\gamma-\gamma_{0n}|<h_{n}}\|\bar{g}_{n}(\alpha_{0n},\gamma)-\bar{g}_{n}(\alpha_{0n},\gamma_{0n})-g_{0n}(\alpha_{0n},\gamma)+g_{0n}(\alpha_{0n},\gamma_{0n})\|$ .

Suppose $b>0$ . The case for $b<0$ follows similarly. We will show that (I.3) converges as follows:

\begin{array}[]{l}\sqrt{n}Ez_{itn}(1_{itn}(\gamma_{0n})^{\prime}-1_{itn}(\gamma_{0n}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{itn}\delta_{0n}\\ =\sqrt{n}\left\{E[z_{itn}(\delta_{10n}+\delta_{30n}q_{itn})1\{\gamma_{0n}+\tfrac{b}{n^{\frac{1}{4}}}\geq q_{itn}>\gamma_{0n}\}]\right.\\ \left.\hskip 56.9055pt-E[z_{itn}(\delta_{10n}+\delta_{30n}q_{it-1,n})1\{\gamma_{0n}+\tfrac{b}{n^{\frac{1}{4}}}\geq q_{it-1,n}>\gamma_{0n}\}]\right\}\\ \rightarrow\left\{E_{t,\infty}[z_{it,\infty}|\gamma_{0,\infty}]f_{t,\infty}(\gamma_{0,\infty})-E_{t-1,\infty}[z_{it,\infty}|\gamma_{0,\infty}]f_{t-1,\infty}(\gamma_{0,\infty})\right\}[\zeta_{1}b+\frac{\delta_{30,\infty}}{2}b^{2}],\end{array}

uniformly with respect to $b\in[-K,K]$ .

Let

\begin{array}[]{l}R_{n,b}=\left(\sqrt{n}E[z_{itn}(\delta_{10n}+\delta_{30n}q_{itn})1\{\gamma_{0n}+\tfrac{b}{n^{\frac{1}{4}}}\geq q_{itn}>\gamma_{0n}\}]\right.\\ \left.\hskip 56.9055pt-\{E_{tn}[z_{itn}|\gamma_{0n}]f_{tn}(\gamma_{0n})\}(n^{1/4}(\delta_{10n}+\delta_{30n}\gamma_{0n})b+\frac{\delta_{30n}}{2}b^{2})\right),\end{array}

which will be shown to converge to zero uniformly with respect to $b\in[-K,K]$ . By Taylor epxansion, its formula can be derived as follows:

R_{n,b}=\Bigl{(}\delta_{30n}\{E_{tn}[z_{itn}|\gamma_{n,b}]f_{tn}(\gamma_{n,b})-E_{tn}[z_{itn}|\gamma_{0n}]f_{tn}(\gamma_{0n})\}\Bigr{.}\\ \Bigl{.}+(\delta_{10n}+\delta_{30n}\gamma_{n,b})\frac{d}{d\gamma}\{E_{tn}[z_{itn}|\gamma]f_{tn}(\gamma)\}|_{\gamma=\gamma_{n,b}}\Bigl{)}\frac{b^{2}}{2},

where $\gamma_{n,b}\in[\gamma_{0n},\gamma_{0n}+\frac{b}{n^{1/4}}]$ . Note that $|\gamma_{n,b}-\gamma_{0n}|\rightarrow 0$ unifromly with respect to $b\in[-K,K]$ . Hence, for sufficiently large $n$ , $\|\frac{d}{d\gamma}\{E_{tn}[z_{itn}|\gamma]f_{tn}(\gamma)\}|_{\gamma=\gamma_{n,b}}\|\leq C$ for some $C<\infty$ . Moreover, $\delta_{10n}+\delta_{30n}\gamma_{n,b}\rightarrow 0$ and $E_{tn}[z_{itn}|\gamma_{0n}]f_{tn}(\gamma_{0n})-E_{tn}[z_{itn}|\gamma_{n,b}]f_{tn}(\gamma_{n,b})\rightarrow 0$ uniformly with respect to $b\in[-K,K]$ . Therefore, $\|R_{n,b}\|\rightarrow 0$ uniformly with respect to $b\in[-K,K]$ , i.e.,

\begin{array}[]{l}\left(\sqrt{n}E[z_{itn}(\delta_{10n}+\delta_{30n}q_{itn})1\{\gamma_{0n}+\tfrac{b}{n^{\frac{1}{4}}}\geq q_{itn}>\gamma_{0n}\}]\right.\\ \left.\hskip 56.9055pt-\{E_{tn}[z_{itn}|\gamma_{0n}]f_{tn}(\gamma_{0n})\}(n^{1/4}(\delta_{10n}+\delta_{30n}\gamma_{0n})b+\frac{\delta_{30n}}{2}b^{2})\right)\rightarrow 0\end{array}

uniformly with respect to $b\in[-K,K]$ . We can derive a similar result for $\sqrt{n}E[z_{itn}(\delta_{10n}+\delta_{30n}q_{it-1,n})1\{\gamma_{0n}+\tfrac{b}{n^{\frac{1}{4}}}\geq q_{it-1,n}>\gamma_{0n}\}]$ that leads to

\begin{array}[]{l}\Biggl{\|}\sqrt{n}Ez_{itn}(1_{itn}(\gamma_{0n})^{\prime}-1_{itn}(\gamma_{0n}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{itn}\delta_{0n}\Biggr{.}\\ \Biggl{.}-\left\{E_{tn}[z_{itn}|\gamma_{0n}]f_{tn}(\gamma_{0n})-E_{t-1,n}[z_{itn}|\gamma_{0n}]f_{t-1,n}(\gamma_{0n})\right\}[n^{1/4}(\delta_{10n}+\delta_{30n}\gamma_{0n})b+\frac{\delta_{30n}}{2}b^{2}]\Biggr{\|}\rightarrow 0,\end{array}

uniformly with respect to $b\in[-K,K]$ . As $\pi_{n}(\phi_{0n})\rightarrow(\zeta_{1},\zeta_{2},\phi_{0,\infty})$ ,

\left\{E_{tn}[z_{itn}|\gamma_{0n}]f_{tn}(\gamma_{0n})-E_{t-1,n}[z_{itn}|\gamma_{0n}]f_{t-1,n}(\gamma_{0n})\right\}[n^{1/4}(\delta_{10n}+\delta_{30n}\gamma_{0n})b+\frac{\delta_{30n}}{2}b^{2}]\\ \rightarrow\left\{E_{t,\infty}[z_{it,\infty}|\gamma_{0,\infty}]f_{t,\infty}(\gamma_{0,\infty})-E_{t-1,\infty}[z_{it,\infty}|\gamma_{0,\infty}]f_{t-1,\infty}(\gamma_{0,\infty})\right\}[\zeta_{1}b+\frac{\delta_{30,\infty}}{2}b^{2}],

which completes the proof. ∎

Lemma I.7.

Let $\{\phi_{0n}\in\Phi_{0}:n\geq 1\}$ and $\pi_{n}(\phi_{0n})\rightarrow(\zeta_{1},\zeta_{2},\phi_{0,\infty})\in\Pi$ , and suppose that $\zeta_{1}=\{\pm\infty\}$ and $\zeta_{2}=0$ , i.e., it is (iii) semi-discontinuous. Then,

\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{itn}(1_{itn}(\gamma_{0n})^{\prime}-1_{itn}(\gamma_{0n}+\tfrac{b}{\sqrt{n}(\delta_{10n}+\delta_{30n}\gamma_{0n})})^{\prime})X_{itn}\delta_{0n}\\ \xrightarrow{p}\left\{E_{t,\infty}[z_{it,\infty}|\gamma_{0,\infty}]f_{t,\infty}(\gamma_{0,\infty})-E_{t-1,\infty}[z_{it,\infty}|\gamma_{0,\infty}]f_{t-1,\infty}(\gamma_{0,\infty})\right\}b

uniformly over $b\in[-K,K]$ for any $K<\infty$ .

Proof.

Note that

		$\displaystyle\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{itn}(1_{itn}(\gamma_{0n})^{\prime}-1_{itn}(\gamma_{0n}+\tfrac{b}{\sqrt{n}(\delta_{10n}+\delta_{30n}\gamma_{0n})})^{\prime})X_{itn}\delta_{0n}$
		$\displaystyle=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\left\{z_{itn}(1_{itn}(\gamma_{0n})^{\prime}-1_{itn}(\gamma_{0n}+\tfrac{b}{\sqrt{n}(\delta_{10n}+\delta_{30n}\gamma_{0n})})^{\prime})X_{itn}\delta_{0n}\right.$
		$\displaystyle\left.\hskip 113.81102pt-E[z_{itn}(1_{itn}(\gamma_{0n})^{\prime}-1_{itn}(\gamma_{0n}+\tfrac{b}{\sqrt{n}(\delta_{10n}+\delta_{30n}\gamma_{0n})})^{\prime})X_{itn}\delta_{0n}]\right\}$		(I.4)
		$\displaystyle\hskip 56.9055pt+\sqrt{n}E[z_{itn}(1_{itn}(\gamma_{0n})^{\prime}-1_{itn}(\gamma_{0n}+\tfrac{b}{\sqrt{n}(\delta_{10n}+\delta_{30n}\gamma_{0n})})^{\prime})X_{itn}\delta_{0n}].$		(I.5)

The stochastic term (I.4) converges in probability to zero uniformly with respect to $b\in[-K,K]$ by Lemma I.5, by an argument similar to the proof of Lemma I.6 that shows (I.2) converges to zero.

Suppose $b>0$ . The case for $b<0$ follows similarly. We will show that (I.5) converges as follows:

\begin{array}[]{l}\sqrt{n}Ez_{itn}(1_{itn}(\gamma_{0n})^{\prime}-1_{itn}(\gamma_{0n}+\tfrac{b}{\sqrt{n}(\delta_{10n}+\delta_{30n}\gamma_{0n})})^{\prime})X_{itn}\delta_{0n}\\ =\sqrt{n}\left\{E[z_{itn}(\delta_{10n}+\delta_{30n}q_{itn})1\{\gamma_{0n}+\tfrac{b}{\sqrt{n}(\delta_{10n}+\delta_{30n}\gamma_{0n})}\geq q_{itn}>\gamma_{0n}\}]\right.\\ \left.\hskip 56.9055pt-E[z_{itn}(\delta_{10n}+\delta_{30n}q_{it-1,n})1\{\gamma_{0n}+\tfrac{b}{\sqrt{n}(\delta_{10n}+\delta_{30n}\gamma_{0n})}\geq q_{it-1,n}>\gamma_{0n}\}]\right\}\\ \rightarrow\left\{E_{t,\infty}[z_{it,\infty}|\gamma_{0,\infty}]f_{t,\infty}(\gamma_{0,\infty})-E_{t-1,\infty}[z_{it,\infty}|\gamma_{0,\infty}]f_{t-1,\infty}(\gamma_{0,\infty})\right\}b,\end{array}

uniformly with respect to $b\in[-K,K]$ .

Let

R_{n,b}=\left(\sqrt{n}E[z_{itn}(\delta_{10n}+\delta_{30n}q_{itn})1\{\gamma_{0n}+\tfrac{b}{\sqrt{n}(\delta_{10n}+\delta_{30n}\gamma_{0n})}\geq q_{itn}>\gamma_{0n}\}]-\{E_{tn}[z_{itn}|\gamma_{0n}]f_{tn}(\gamma_{0n})\}b\right),

which will be shown to converge to zero uniformly with respect to $b\in[-K,K]$ . By Taylor expansion, its formula can be derived as follows:

R_{n,b}=\frac{1}{\sqrt{n}(\delta_{10n}+\delta_{30n}\gamma_{0n})^{2}}\Bigl{(}\delta_{30n}\{E_{tn}[z_{itn}|\gamma_{n,b}]f_{tn}(\gamma_{n,b})\}\Bigr{.}\\ \Bigl{.}+(\delta_{10n}+\delta_{30n}\gamma_{n,b})\frac{d}{d\gamma}\{E_{tn}[z_{itn}|\gamma]f_{tn}(\gamma)\}|_{\gamma=\gamma_{n,b}}\Bigl{)}\frac{b^{2}}{2},

where $\gamma_{n,b}\in[\gamma_{0n},\gamma_{0n}+\frac{b}{\sqrt{n}(\delta_{10n}+\delta_{30n}\gamma_{0n})}]$ . Note that $|\gamma_{n,b}-\gamma_{0n}|\rightarrow 0$ unifromly with respect to $b\in[-K,K]$ . Hence, for sufficiently large $n$ , $\|E_{tn}[z_{itn}|\gamma_{n,b}]f_{tn}(\gamma_{n,b})\|\leq C$ and $\|\frac{d}{d\gamma}\{E_{tn}[z_{itn}|\gamma]f_{tn}(\gamma)\}|_{\gamma=\gamma_{n,b}}\|\leq C$ for some $C<\infty$ . Moreover, $\delta_{10n}+\delta_{30n}\gamma_{n,b}\rightarrow 0$ uniformly with respect to $b\in[-K,K]$ . As $\sqrt{n}(\delta_{10n}+\delta_{30n}\gamma_{0n})^{2}\rightarrow\infty$ , $\|R_{n,b}\|\rightarrow 0$ uniformly with respect to $b\in[-K,K]$ , i.e.,

\left(\sqrt{n}E[z_{itn}(\delta_{10n}+\delta_{30n}q_{itn})1\{\gamma_{0n}+\tfrac{b}{\sqrt{n}(\delta_{10n}+\delta_{30n}\gamma_{0n})}\geq q_{itn}>\gamma_{0n}\}]-\{E_{tn}[z_{itn}|\gamma_{0n}]f_{tn}(\gamma_{0n})\}b\right)\rightarrow 0

uniformly with respect to $b\in[-K,K]$ . We can derive a similar result for $\sqrt{n}E[z_{itn}(\delta_{10n}+\delta_{30n}q_{it-1,n})1\{\gamma_{0n}+\tfrac{b}{\sqrt{n}(\delta_{10n}+\delta_{30n}\gamma_{0n})}\geq q_{it-1,n}>\gamma_{0n}\}]$ that leads to

\begin{array}[]{l}\Biggl{\|}\sqrt{n}Ez_{itn}(1_{itn}(\gamma_{0n})^{\prime}-1_{itn}(\gamma_{0n}+\tfrac{b}{\sqrt{n}(\delta_{10n}+\delta_{30n}\gamma_{0n})})^{\prime})X_{itn}\delta_{0n}\Biggr{.}\\ \Biggl{.}\hskip 113.81102pt-\left\{E_{tn}[z_{itn}|\gamma_{0n}]f_{tn}(\gamma_{0n})-E_{t-1,n}[z_{itn}|\gamma_{0n}]f_{t-1,n}(\gamma_{0n})\right\}b\Biggr{\|}\rightarrow 0,\end{array}

uniformly with respect to $b\in[-K,K]$ . As $\pi_{n}(\phi_{0n})\rightarrow(\zeta_{1},\zeta_{2},\phi_{0,\infty})$ ,

\left\{E_{tn}[z_{itn}|\gamma_{0n}]f_{tn}(\gamma_{0n})-E_{t-1,n}[z_{itn}|\gamma_{0n}]f_{t-1,n}(\gamma_{0n})\right\}b\\ \rightarrow\left\{E_{t,\infty}[z_{it,\infty}|\gamma_{0,\infty}]f_{t,\infty}(\gamma_{0,\infty})-E_{t-1,\infty}[z_{it,\infty}|\gamma_{0,\infty}]f_{t-1,\infty}(\gamma_{0,\infty})\right\}b,

which completes the proof. ∎

Lemma I.8.

Let $\{\phi_{0n}\in\Phi_{0}:n\geq 1\}$ and $\pi_{n}(\phi_{0n})\rightarrow(\zeta_{1},\zeta_{2},\phi_{0,\infty})\in\Pi$ . Then,

\hat{u}_{n}^{*}=\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}n}^{*}\widehat{\Delta\epsilon}_{it_{0}n}^{*}\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iTn}^{*}\widehat{\Delta\epsilon}_{iTn}^{*}\end{pmatrix}-\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}n}\widehat{\Delta\epsilon}_{it_{0}n}\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iTn}\widehat{\Delta\epsilon}_{iTn}\end{pmatrix}\xrightarrow{p^{*}}0\text{ in $P$.}

Proof.

Note that $\hat{u}_{n}^{*}=\frac{1}{n}\sum_{i=1}^{n}[g(\omega_{in}^{*},\hat{\theta}_{n})-E[g(\omega_{in},\hat{\theta}_{n})]]-\frac{1}{n}\sum_{i=1}^{n}[g(\omega_{in},\hat{\theta}_{n})-E[g(\omega_{in},\hat{\theta}_{n})]]$ . Let $P_{n}$ be the probability law of $\omega_{in}=\{(z_{itn},y_{itn},x_{itn},\epsilon_{itn})_{t=1}^{T}\}$ . As $\mathcal{G}=\{g(\omega_{i},\theta):\theta\in\Theta\}$ is Glivenko-Cantelli uniformly in $\{P_{n}\}$ , which is shown in the proof of Lemma I.5, $\frac{1}{n}\sum_{i=1}^{n}[g(\omega_{in},\hat{\theta}_{n})-E[g(\omega_{in},\hat{\theta}_{n})]]$ is $o_{p}(1)$ , and hence $o_{p}^{*}(1)$ in $P$ by Lemma B.1. By Proposition 2, $\frac{1}{n}\sum_{i=1}^{n}[g(\omega_{in}^{*},\hat{\theta}_{n})-E[g(\omega_{in},\hat{\theta}_{n})]]$ is also $o_{p}^{*}(1)$ in $P$ , which completes the proof. ∎

Lemma I.9.

Let $\{\phi_{0n}\in\Phi_{0}:n\geq 1\}$ and $\pi_{n}(\phi_{0n})\rightarrow(\zeta_{1},\zeta_{2},\phi_{0,\infty})\in\Pi$ . Then,

\sqrt{n}\hat{u}_{n}^{*}=\sqrt{n}\left\{\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}n}^{*}\widehat{\Delta\epsilon}_{it_{0}n}^{*}\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iTn}^{*}\widehat{\Delta\epsilon}_{iTn}^{*}\end{pmatrix}-\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}n}\widehat{\Delta\epsilon}_{it_{0}n}\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iTn}\widehat{\Delta\epsilon}_{iTn}\end{pmatrix}\right\}\xrightarrow{d^{*}}N(0,\Omega_{\infty})\text{ in $P$.}

Proof.

Note that $\sqrt{n}\hat{u}_{n}^{*}=\sqrt{n}\{\bar{g}_{n}^{*}(\hat{\theta}_{n})-\bar{g}_{n}^{*}(\theta_{0n})-\bar{g}_{n}(\hat{\theta}_{n})+\bar{g}_{n}(\theta_{0n})\}+\sqrt{n}\{\bar{g}_{n}^{*}(\theta_{0n})-\bar{g}_{n}(\theta_{0n})\}$ . As $\|\hat{\theta}_{n}-\theta_{0n}\|=o_{p}(1)$ and $o_{p}^{*}(1)$ in $P$ by Lemma B.1, $\sqrt{n}\{\bar{g}_{n}^{*}(\hat{\theta}_{n})-\bar{g}_{n}^{*}(\theta_{0n})-\bar{g}_{n}(\hat{\theta}_{n})+\bar{g}_{n}(\theta_{0n})\}$ is $o_{p}^{*}(1)$ in $P$ . By applying Lemma I.18, $\sqrt{n}\lambda^{\prime}\{\bar{g}_{n}^{*}(\theta_{0n})-\bar{g}_{n}(\theta_{0n})\}\xrightarrow{d^{*}}N(0,\lambda^{\prime}\Omega_{\infty}\lambda)$ in $P$ for any real vector $\lambda$ . By Cramér-Wold, $\sqrt{n}\{\bar{g}_{n}^{*}(\theta_{0n})-\bar{g}_{n}(\theta_{0n})\}\xrightarrow{d^{*}}N(0,\Omega_{\infty})$ in $P$ , and applying Slutsky theorem completes the proof. ∎

The Lemma I.10 states uniform bootstrap probability limit of the following matrix:

\bar{M}_{n}^{*}(\gamma)=\frac{1}{n}\sum_{i=1}^{n}\begin{pmatrix}z_{it_{0}n}^{*}\Delta x_{it_{0}n}^{*\prime}&z_{it_{0}n}^{*}1_{it_{0}n}^{*}(\gamma)^{\prime}X_{it_{0}n}^{*}\\ \vdots&\vdots\\ z_{iTn}^{*}\Delta x_{iTn}^{*\prime}&z_{iTn}^{*}1_{iTn}^{*}(\gamma)^{\prime}X_{iTn}^{*}\end{pmatrix}.

Lemma I.10.

Let $\{\phi_{0n}\in\Phi_{0}:n\geq 1\}$ and $\pi_{n}(\phi_{0n})\rightarrow(\zeta_{1},\zeta_{2},\phi_{0,\infty})\in\Pi$ . Then,

\sup_{\gamma\in\Gamma}\|\bar{M}_{n}^{*}(\gamma)-M_{0n}(\gamma)\|\xrightarrow{p^{*}}0\text{ in $P$}.

Proof.

We apply Proposition 2 to prove the result. First, we need to show that $\{z_{it}(1,q_{it})1\{q_{it}>\gamma\}:\gamma\in\Gamma\}$ and $\{z_{it}(1,q_{it-1})1\{q_{it-1}>\gamma\}:\gamma\in\Gamma\}$ are Glivenko-Cantelli uniformly in $\{P_{n}:n=1,2,...\}$ , where $P_{n}$ is the probability law of $\omega_{in}=\{(z_{itn},y_{itn},x_{itn},\epsilon_{itn})_{t=1}^{T}\}$ . It is shown in Lemma I.3 that the functional classes are Glivenko-Cantelli uniformly in $\{P_{n}\}$ . Second, the condition for envelope holds as $\sup_{n\in\mathbb{N}}E[\|z_{itn}(1,q_{itn})\|+\|z_{itn}(1,q_{it-1,n})\|]<\infty$ , which is implied by $\sup_{n\in\mathbb{N}}\max\{(E\|z_{itn}\|^{2+r})^{1/2},(E\|q_{itn}\|^{2+r})^{1/2},(E\|q_{it-1,n}\|^{2+r})^{1/2}\}<\infty$ for some $r>0$ . ∎

Lemma I.11.

Let $\{\phi_{0n}\in\Phi_{0}:n\geq 1\}$ and $\pi_{n}(\phi_{0n})\rightarrow(\zeta_{1},\zeta_{2},\phi_{0,\infty})\in\Pi$ . If $h_{n}\rightarrow 0$ , then

\sup_{\|\theta_{1}-\theta_{2}\|<h_{n}}\sqrt{n}\|\bar{g}_{n}^{*}(\theta_{1})-\bar{g}_{n}^{*}(\theta_{2})-\bar{g}_{n}(\theta_{1})+\bar{g}_{n}(\theta_{2})\|=o_{p}^{*}(1)\text{ in $P$.}

Proof.

Note that $\bar{g}_{n}^{*}(\theta_{1})-\bar{g}_{n}^{*}(\theta_{2})=\frac{1}{n}\sum_{i=1}^{n}(g(\omega_{in}^{*},\theta_{1})-g(\omega_{in}^{*},\theta_{2}))$ because $g_{in}^{*}(\theta)=g(\omega_{in}^{*},\theta)-g(\omega_{in}^{*},\theta_{0n}^{*})+g(\omega_{in}^{*},\hat{\theta}_{n})$ , see (I.1). Therefore, $\sqrt{n}\{\bar{g}_{n}^{*}(\theta_{1})-\bar{g}_{n}^{*}(\theta_{2})-\bar{g}_{n}(\theta_{1})+\bar{g}_{n}(\theta_{2})\}=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\{g(\omega_{in}^{*},\theta_{1})-g(\omega_{in}^{*},\theta_{2})-g(\omega_{in},\theta_{1})+g(\omega_{in},\theta_{2})\}$ . Let $\widehat{\mathbb{G}}_{n}=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(\delta_{\omega_{in}^{*}}-\mathbb{P}_{n})$ and $\mathbb{P}_{n}=n^{-1}\sum_{i=1}^{n}\delta_{\omega_{in}}$ , where $\delta_{\omega_{in}^{*}}$ and $\delta_{\omega_{in}}$ are dirac measures at $\omega_{in}^{*}$ and $\omega_{in}$ . Then, it is sufficient to prove $\|\widehat{\mathbb{G}}_{n}\|_{\mathcal{G}_{h}}=o_{p}^{*}(1)$ in $P$ if $h\rightarrow 0$ and $n\rightarrow\infty$

For $h>0$ , let $\mathcal{G}_{h}=\{g(\omega_{i},\theta_{1})-g(\omega_{i},\theta_{2}):\|\theta_{1}-\theta_{2}\|\leq h\}$ and $G_{h}$ be its envelope. Let $\widetilde{N}_{1},\widetilde{N}_{2},...$ be symmetrized Poisson random variables with parameter $1/2$ . By Lemma I.14,

E^{*}\|\widehat{\mathbb{G}}_{n}\|_{\mathcal{G}_{h}}\leq 4E_{\widetilde{N}}\|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\widetilde{N}_{i}\delta_{\omega_{in}}\|_{\mathcal{G}_{h}}

conditionally on $\{\omega_{in}:1\leq i\leq n\}$ . For all $1\leq n_{0}\leq n$ , the last display is stochastically bounded upto constant by

(n_{0}-1)E_{\widetilde{N}}\max_{1\leq i\leq n}\frac{\widetilde{N}_{i}}{\sqrt{n}}PG(\omega_{in})+\|\widetilde{N}_{1}\|_{2,1}\max_{n_{0}\leq j\leq n}E\|\frac{1}{\sqrt{j}}\sum_{i=n_{0}}^{j}\varepsilon_{i}\delta_{\omega_{in}}\|_{\mathcal{G}_{h}},

(I.6)

by Lemma I.16, where $G(\cdot)$ is an envelope function of $\mathcal{G}$ . The first term is bounded above by $(n_{0}-1)2\sqrt{2}n^{-1/4}$ , which converges to zero for any $n_{0}$ as $n\rightarrow\infty$ , and $\|\widetilde{N}_{1}\|_{2,1}\leq 2\sqrt{2}$ (see proof of Theorem 3.6.3 in van der Vaart and Wellner, (1996)). By triangle inequality,

	$\displaystyle\max_{n_{0}\leq j\leq n}E\\|\frac{1}{\sqrt{j}}\sum_{i=n_{0}}^{j}\varepsilon_{i}\delta_{\omega_{in}}\\|_{\mathcal{G}_{h}}$	$\displaystyle\leq\max_{n_{0}\leq j\leq n}E\left(\\|\frac{1}{\sqrt{j}}\sum_{i=1}^{j}\varepsilon_{i}\delta_{\omega_{in}}\\|_{\mathcal{G}_{h}}+\\|\frac{1}{\sqrt{j}}\sum_{i=1}^{n_{0}-1}\varepsilon_{i}\delta_{\omega_{in}}\\|_{\mathcal{G}_{h}}\right)$
		$\displaystyle\leq 2\max_{n_{0}-1\leq j\leq n}E\\|\frac{1}{\sqrt{j}}\sum_{i=1}^{j}\varepsilon_{i}\delta_{\omega_{in}}\\|_{\mathcal{G}_{h}},$

and the last display is bounded upto constant by

\max_{n_{0}-1\leq j\leq n}\left(E\sup_{\|\theta_{1}-\theta_{2}\|\leq h}\|\frac{1}{\sqrt{j}}\sum_{i=1}^{j}\varepsilon_{i}(g(\omega_{in},\theta_{1})-g(\omega_{in},\theta_{2})-E[g(\omega_{in},\theta_{1})]+E[g(\omega_{in},\theta_{2})])\|\right.\\ \left.+E\sup_{\|\theta_{1}-\theta_{2}\|\leq h}\|\frac{1}{\sqrt{j}}\sum_{i=1}^{j}\varepsilon_{i}(E[g(\omega_{in},\theta_{1})]-E[g(\omega_{in},\theta_{2})])\|\right).

For each $j$ , by Lemma I.15,

E\sup_{\|\theta_{1}-\theta_{2}\|\leq h}\|\frac{1}{\sqrt{j}}\sum_{i=1}^{j}\varepsilon_{i}(g(\omega_{in},\theta_{1})-g(\omega_{in},\theta_{2})-E[g(\omega_{in},\theta_{1})]+E[g(\omega_{in},\theta_{2})])\|\\ \leq 2E\sup_{\|\theta_{1}-\theta_{2}\|\leq h}\|\frac{1}{\sqrt{j}}\sum_{i=1}^{j}(g(\omega_{in},\theta_{1})-g(\omega_{in},\theta_{2})-E[g(\omega_{in},\theta_{1})]+E[g(\omega_{in},\theta_{2})])\|.

The right hand side of the last display converges to zero uniformly with respect to $n$ as $j\rightarrow\infty$ and $h\rightarrow 0$ since the functional class $\mathcal{G}$ is shown to be pre-Gaussian uniformly in $\{P_{n}\}$ in the proof of Lemma I.5.

For each $j$ ,

E\sup_{\|\theta_{1}-\theta_{2}\|\leq h}\|\frac{1}{\sqrt{j}}\sum_{i=1}^{j}\varepsilon_{i}(E[g(\omega_{in},\theta_{1})]-E[g(\omega_{in},\theta_{2})])\|\leq E|\frac{1}{\sqrt{j}}\sum_{i=1}^{j}\varepsilon_{i}|\cdot(E\|G_{h}(\omega_{in})\|),

and $E|\frac{1}{\sqrt{j}}\sum_{i=1}^{j}\varepsilon_{i}|<\infty$ by Hoeffding’s inequality, e.g., Lemma 2.2.7 in van der Vaart and Wellner, (1996). The following paragaph shows that $E\|G_{h}(\omega_{in})\|\rightarrow 0$ as $h\rightarrow 0$ and $n\rightarrow\infty$ .

As it is sufficient to consider each element of $\mathcal{G}$ , we focus on $g_{t}(\omega_{i},\theta)$ , the $t$ th term of $g(\omega_{i},\theta)$ , and assume that $g_{t}(\omega_{i},\theta)$ is a scalar without losing of generality. Note that

	$\displaystyle g_{t}(\omega_{i},\theta_{1})-g_{t}(\omega_{i},\theta_{2})$	$\displaystyle=-z_{it}\Delta x_{it}^{\prime}(\beta_{1}-\beta_{2})-z_{it}1_{it}(\gamma_{1})^{\prime}X_{it}(\delta_{1}-\delta_{2})$
		$\displaystyle\quad+z_{it}(1_{it}(\gamma_{2})^{\prime}-1_{it}(\gamma_{1})^{\prime})X_{it}\delta_{2}.$

Without losing of generality, let $\gamma_{1}\geq\gamma_{2}$ , and $K$ be a constant such that $\|\theta\|\leq K/2$ for $\theta\in\Theta$ . Set

G_{h,t}(\omega_{i})=\|z_{it}\Delta x_{it}^{\prime}\|\cdot h+(\|z_{it}(1,q_{it})\|+\|z_{it}(1,q_{it-1})\|)\cdot h\\ +K(\|z_{it}(1,q_{it})1\{\gamma_{1}\geq q_{it}>\gamma_{2}\}\|+\|z_{it}(1,q_{it-1})1\{\gamma_{1}\geq q_{it-1}>\gamma_{2}\}\|),

which is an envelope of $\{g_{t}(\omega_{i},\theta_{1})-g_{t}(\omega_{i},\theta_{2}):\|\theta_{1}-\theta_{2}\|<h\}$ . $\sup_{n\in\mathbb{N}}E[\|z_{itn}\Delta x_{itn}^{\prime}\|+\|z_{itn}(1,q_{itn})\|+\|z_{itn}(1,q_{it-1,n})\|]<\infty$ . Furthermore,

E\|z_{itn}(1,q_{itn})1\{\gamma_{1}\geq q_{itn}>\gamma_{2}\}\|\leq(E\|z_{itn}(1,q_{itn})\|^{2})^{1/2}(E1\{\gamma_{1}\geq q_{itn}>\gamma_{2}\})^{1/2},

while $\sup_{n\in\mathbb{N}}(E\|z_{itn}(1,q_{itn})\|^{2})^{1/2}<\infty$ , and

E1\{\gamma_{1}\geq q_{itn}>\gamma_{2}\}=\int_{\gamma_{2}}^{\gamma_{1}}f_{tn}(q)dq=(\gamma_{1}-\gamma_{2})f_{tn}(\bar{\gamma})

for some $\bar{\gamma}\in[\gamma_{2},\gamma_{1}]$ . Hence, $E1\{\gamma_{1}\geq q_{itn}>\gamma_{2}\}<Ch$ for some $C<\infty$ uniformly over all $n$ . Therefore, $E|G_{h,t}(\omega_{in})|<C\sqrt{h}$ for some $C<\infty$ and converges to zero as $h\rightarrow 0$ .

Recall that the first term in (I.6) goes to zero for any fixed $n_{0}$ when $n\rightarrow\infty$ . The second term in (I.6) is bounded by $2\sqrt{2}\max_{n_{0}\leq j\leq n}Z_{jn}$ , where $Z_{jn}=E\|\frac{1}{\sqrt{j}}\sum_{i=n_{0}}^{j}\varepsilon_{i}\delta_{\omega_{in}}\|_{\mathcal{G}_{h}}$ . It is shown in the previous paragraph that $Z_{jn}\rightarrow 0$ uniformly with respect to $n$ as $j\rightarrow\infty$ and $h\rightarrow 0$ . Therefore, for any $\epsilon>0$ , there exists $n_{0}<\infty$ such that $\max_{n_{0}\leq j\leq n}Z_{jn}<\epsilon/2$ for all $n>n_{0}$ . Then, there exists $N(n_{0})$ large enough such that the first term in (I.6) is bounded by $\epsilon/2$ for $n>N(n_{0})$ . In conclusion, $E^{*}\|\widehat{\mathbb{G}}_{n}\|_{\mathcal{G}_{h}}\rightarrow 0$ if $h\rightarrow 0$ and $n\rightarrow\infty$ . By applying the Markov inequality, we can complete the proof. ∎

Lemma I.12.

\sup_{b\in[-K,K]}\Biggl{\|}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{itn}^{*}(1_{itn}^{*}(\gamma_{0n})^{\prime}-1_{itn}^{*}(\gamma_{0n}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{itn}^{*}\delta_{0n}^{*}\\ -\left\{E_{t,\infty}[z_{it,\infty}|\gamma_{0,\infty}]f_{t,\infty}(\gamma_{0,\infty})-E_{t-1,\infty}[z_{it,\infty}|\gamma_{0,\infty}]f_{t-1,\infty}(\gamma_{0,\infty})\right\}[\zeta_{1}b+\frac{\delta_{30,\infty}}{2}b^{2}]\Biggr{\|}

is $o_{p}^{*}(1)$ in $P$ .

Proof.

As the proof is quite similar to the proofs of Lemma E.5 and Lemma I.6, we just explain direction of the proof heuristically. As $\delta_{0n}^{*}=\hat{\delta}_{n}(\gamma_{0n})$ is consistent to $\delta_{0n}$ ,

\sup_{b\in[-K,K]}\left\|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{itn}^{*}(1_{itn}^{*}(\gamma_{0n})^{\prime}-1_{itn}^{*}(\gamma_{0n}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{itn}^{*}(\delta_{0n}^{*}-\delta_{0n})\right\|=o_{p}^{*}(1)\text{ in $P$.}

By Lemma I.11,

\sup_{b\in[-K,K]}\Biggl{\|}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{itn}^{*}(1_{itn}^{*}(\gamma_{0n})^{\prime}-1_{itn}^{*}(\gamma_{0n}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{itn}^{*}\delta_{0n}\\ -\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{itn}(1_{itn}(\gamma_{0n})^{\prime}-1_{itn}(\gamma_{0n}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{itn}\delta_{0n}\Biggr{\|}=o_{p}^{*}(1)\text{ in $P$,}

as the last display can be expressed by $\sqrt{n}\|\bar{g}_{n}^{*}(\alpha_{0n},\gamma_{0n}+\frac{b}{n^{1/4}})-\bar{g}_{n}^{*}(\alpha_{0n},\gamma_{0n})-\bar{g}_{n}(\alpha_{0n},\gamma_{0n}+\frac{b}{n^{1/4}})+\bar{g}_{n}(\alpha_{0n},\gamma_{0n})\|$ . Hence,

\sup_{b\in[-K,K]}\Biggl{\|}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{itn}^{*}(1_{itn}^{*}(\gamma_{0n})^{\prime}-1_{itn}^{*}(\gamma_{0n}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{itn}^{*}\delta_{0n}^{*}\\ -\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{itn}(1_{itn}(\gamma_{0n})^{\prime}-1_{itn}(\gamma_{0n}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{itn}\delta_{0n}\Biggr{\|}=o_{p}^{*}(1)\text{ in $P$,}

and applying Lemma I.6 completes the proof.

∎

Lemma I.13.

\sup_{b\in[-K,K]}\Biggl{\|}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{itn}^{*}(1_{itn}^{*}(\gamma_{0n})^{\prime}-1_{itn}^{*}(\gamma_{0n}+\tfrac{b}{\sqrt{n}(\delta_{10n}+\delta_{30n}\gamma_{0n})})^{\prime})X_{itn}^{*}\delta_{0n}^{*}\\ -\left\{E_{t,\infty}[z_{it,\infty}|\gamma_{0,\infty}]f_{t,\infty}(\gamma_{0,\infty})-E_{t-1,\infty}[z_{it,\infty}|\gamma_{0,\infty}]f_{t-1,\infty}(\gamma_{0,\infty})\right\}b\Biggr{\|}

is $o_{p}^{*}(1)$ in $P$ .

Proof.

We omit the proof as it is almost identical to the proof of Lemma I.12. ∎

The following proposition is bootstrap Glivenko-Cantelli theorem uniform in underlying probability measures $P\in\{P_{1},P_{2},...\}$ .

Proposition 2.

Let $\{X_{in}:1\leq i\leq n,n=1,2,...\}$ be a triangular array of random elements in a measurable space $(\mathcal{X},\mathcal{A})$ while $X_{in}$ ’s are independent to each other with probability law $P_{n}$ , and $\mathcal{F}$ be a class of functions on $(\mathcal{X},\mathcal{A})$ with an envelope $F$ . Suppose that $\mathcal{F}$ is a Glivenko-Cantelli class uniformly in $P\in\{P_{m}\}$ , and $\sup_{n\in\mathbb{N}}P_{n}F<\infty$ . For each $n$ , let $W=(W_{1n},...,W_{nn})$ be an exchangeable nonnegative random vector independent of $X_{1n},X_{2n},...,X_{nn}$ such that $\sum_{i=1}^{n}W_{in}=1$ and $\max_{1\leq i\leq n}|W_{in}|$ converges to zero in probability. Then, for every $\epsilon>0$ and $\eta>0$ , as $n\rightarrow\infty$ ,

P_{n}\left(P_{W}\left(\|\sum_{i=1}^{n}W_{in}(\delta_{X_{in}}-P_{n})\|_{\mathcal{F}}>\epsilon\right)>\eta\right)\rightarrow 0,

where $\delta_{X_{in}}$ is a dirac measure at $X_{in}$ .

Let $W=(W_{1n},...,W_{nn})$ be a multinomial vector divided by $n$ with parameters $n$ and probabilities $(1/n,...,1/n)$ , which satisfies $\sum_{i=1}^{n}W_{in}=1$ and $\max_{1\leq i\leq n}|W_{in}|$ converges to zero in probability. Suppose that $\widehat{X}_{1n},...,\widehat{X}_{nn}$ are i.i.d. resampling draws from $\{X_{1n},...,X_{nn}\}$ . Then, $\frac{1}{n}\sum_{i=1}^{n}(\delta_{\widehat{X}_{in}}-P_{n})=\sum_{i=1}^{n}W_{in}(\delta_{X_{in}}-P_{n})$ , and the probability law of $W$ can be identified with the probability law of the empirical bootstrap conditional on the data.

Proof.

Let $Z_{in}=(\delta_{X_{in}}-P_{n})$ . By Lemma I.17,

E_{W}\|\sum_{i=1}^{n}W_{in}Z_{in}\|_{\mathcal{F}}\leq 2(n_{0}-1)\frac{1}{n}\sum_{i=1}^{n}\|Z_{in}\|_{\mathcal{F}}E_{W}\max_{1\leq i\leq n}|W_{in}|\\ +2n\|W_{1n}\|_{2,1}\max_{n_{0}\leq k\leq n}E_{R}\|\frac{1}{k}\sum_{i=n_{0}}^{k}Z_{R_{i}n}\|_{\mathcal{F}}.

(I.7)

Note that $\frac{1}{n}\sum_{i=1}^{n}\|Z_{in}\|_{\mathcal{F}}\leq\frac{1}{n}\sum_{i=1}^{n}Z_{in}(F)\leq(\mathbb{P}_{n}-P_{n})F+2P_{n}F$ , while $(\mathbb{P}_{n}-P_{n})F\xrightarrow{p}0$ and $\limsup_{n}\|P_{n}\|_{\mathcal{F}}\leq\limsup_{n}P_{n}F<\infty$ . Moreover, $E_{W}\max_{1\leq i\leq n}|W_{in}|\rightarrow 0$ by dominated convergence theorem because $|W_{in}|\leq 1$ . Hence, the first term in the right hand side of (I.7) converges to zero in probability for fixed $n_{0}$ as $n\rightarrow\infty$ . That is, for any $\epsilon>0$ and $n_{0}<\infty$ ,

P_{n}\left(\left|(n_{0}-1)\frac{1}{n}\sum_{i=1}^{n}\|Z_{in}\|_{\mathcal{F}}E_{W}\max_{1\leq i\leq n}|W_{in}|\right|>\epsilon\right)\rightarrow 0\text{ as $n\rightarrow\infty$.}

Note that $n\|W_{1n}\|_{2,1}\leq n(EW_{1n})=1$ (see the proof of Theorem 3.6.16 in van der Vaart and Wellner, (1996)). Finally, we need to show $\max_{n_{0}\leq k\leq n}E_{R}\|\frac{1}{k}\sum_{i=n_{0}}^{k}Z_{R_{i}n}\|_{\mathcal{F}}\xrightarrow{p}0$ . By triangle inequality,

	$\displaystyle\max_{n_{0}\leq k\leq n}E_{R}\\|\frac{1}{k}\sum_{i=n_{0}}^{k}Z_{R_{i}n}\\|_{\mathcal{F}}$	$\displaystyle\leq\max_{n_{0}\leq k\leq n}\left\{E_{R}\\|\frac{1}{k}\sum_{i=1}^{k}Z_{R_{i}n}\\|_{\mathcal{F}}+E_{R}\\|\frac{1}{k}\sum_{i=1}^{n_{0}-1}Z_{R_{i}n}\\|_{\mathcal{F}}\right\}$
		$\displaystyle\leq\max_{n_{0}-1\leq k\leq n}2E_{R}\\|\frac{1}{k}\sum_{i=1}^{k}Z_{R_{i}n}\\|_{\mathcal{F}}$
		$\displaystyle=\max_{n_{0}-1\leq k\leq n}2\\|\frac{1}{k}\sum_{i=1}^{k}Z_{{i}n}\\|_{\mathcal{F}}.$

The equality comes from $R$ being independent of $Z_{in}$ . Note that $\sup_{n\in\mathbb{N}}P_{n}(\|\frac{1}{k}\sum_{i=1}^{k}Z_{{i}n}\|_{\mathcal{F}}>\epsilon)\rightarrow 0$ as $k\rightarrow\infty$ since $\mathcal{F}$ is Glivenko-Cantelli uniformly in $\{P_{m}\}$ . Hence, the second term in the right hand side of (I.7) converges to zero in probability as $n_{0}\rightarrow\infty$ . That is, for any $\epsilon>0$ ,

\sup_{n\geq n_{0}}P_{n}\left(\left|n\|W_{1n}\|_{2,1}\max_{n_{0}\leq k\leq n}E_{R}\|\frac{1}{k}\sum_{i=n_{0}}^{k}Z_{R_{i}n}\|_{\mathcal{F}}\right|>\epsilon\right)\rightarrow 0\text{ as $n_{0}\rightarrow\infty$.}

Therefore, for any $\epsilon>0$ ,

P_{n}\left(E_{W}\|\sum_{i=1}^{n}W_{in}Z_{in}\|_{\mathcal{F}}>\epsilon\right)\rightarrow 0\text{ as $n\rightarrow\infty$.}

By applying the Markov inequality as follows, we can complete the proof:

P_{n}\left(P_{W}\left(\|\sum_{i=1}^{n}W_{in}Z_{in}\|_{\mathcal{F}}>\epsilon\right)>\eta\right)\leq P_{n}\left(E_{W}\|\sum_{i=1}^{n}W_{in}Z_{in}\|_{\mathcal{F}}>\eta\epsilon\right).

∎

Lemma I.14 (Lemma 3.6.6 van der Vaart and Wellner, (1996)).

For fixed elements $x_{1},...,x_{n}$ of a set $\mathcal{X}$ , let $\widehat{X}_{1},...,\widehat{X}_{k}$ be an i.i.d. sample from $\mathbb{P}_{n}=n^{-1}\sum_{i=1}^{n}\delta_{x_{i}}$ , where $\delta_{x_{i}}$ is a dirac measure at $x_{i}$ . Then,

E_{\widehat{X}}\|\sum_{j=1}^{k}(\delta_{\widehat{X}_{j}}-\mathbb{P}_{n})\|_{\mathcal{F}}\leq 4E_{N,N^{\prime}}\|\sum_{i=1}^{n}(N_{i}-N_{i}^{\prime})\delta_{x_{i}}\|_{\mathcal{F}}

for every class $\mathcal{F}$ of functions $f:\mathcal{X}\rightarrow\mathbb{R}$ and i.i.d. Poisson variables $N_{1},N_{1}^{\prime},...,N_{n},N_{n}^{\prime}$ with mean $\frac{1}{2}k/n$ .

Lemma I.15 (Lemma 2.3.6 van der Vaart and Wellner, (1996)).

Let $Z_{1},...,Z_{n}$ be independent stochastic processes with mean zero. Then,

E\|\sum_{i=1}^{n}\varepsilon_{i}Z_{i}\|_{\mathcal{F}}\leq 2E\|\sum_{i=1}^{n}Z_{i}\|_{\mathcal{F}}

for i.i.d. Rademacher random variables $\varepsilon_{1},...,\varepsilon_{n}$ and any functional class $\mathcal{F}$ .

Lemma I.16 (Lemma 2.9.1 van der Vaart and Wellner, (1996)).

Let $Z_{1},...,Z_{n}$ be i.i.d. stochastic processes with $E\|Z_{i}\|_{\mathcal{F}}<\infty$ independent of the Rademacher variables $\varepsilon_{1},...,\varepsilon_{n}$ . Then, for every i.i.d. sample $\xi_{1},...,\xi_{n}$ of mean-zero and symmetrically distributed random variables independent of $Z_{1},...,Z_{n}$ and $1\leq n_{0}\leq n$ ,

E\|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}Z_{i}\|_{\mathcal{F}}\leq(n_{0}-1)E\|Z_{1}\|_{\mathcal{F}}E_{\xi}\max_{1\leq i\leq n}\frac{|\xi_{i}|}{\sqrt{n}}\\ +\|\xi_{1}\|_{2,1}\max_{n_{0}\leq k\leq n}E\|\frac{1}{\sqrt{k}}\sum_{i=n_{0}}^{k}\varepsilon_{i}Z_{i}\|_{\mathcal{F}},

where $\|\cdot\|_{2,1}$ is $L_{2,1}$ norm such that $\|\xi\|_{2,1}=\int_{0}^{\infty}\sqrt{P(|\xi|>x)}dx$ for a random variable $\xi$ .

Lemma I.17 (Lemma 3.6.7 van der Vaart and Wellner, (1996)).

For arbitrary stochastic processes $Z_{1},...,Z_{n}$ , every exchangeable random vector $(\xi_{1},...,\xi_{n})$ that is independent of $Z_{1},...,Z_{n}$ , and any $1\leq n_{0}\leq n$ ,

E_{\xi}\|\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}Z_{i}\|_{\mathcal{F}}\leq 2(n_{0}-1)\frac{1}{n}\sum_{i=1}^{n}\|Z_{i}\|_{\mathcal{F}}E_{\xi}\max_{1\leq i\leq n}\frac{|\xi_{i}|}{\sqrt{n}}+2\|\xi_{1}\|_{2,1}\max_{n_{0}\leq k\leq n}E_{R}\|\frac{1}{\sqrt{k}}\sum_{i=n_{0}}^{k}Z_{R_{i}}\|_{\mathcal{F}},

where $(R_{1},...,R_{n})$ is a random vector uniformly distributed on the set of all permutations of $\{1,...,n\}$ and independent of $Z_{1},...,Z_{n}$ . $\|\cdot\|_{2,1}$ is $L_{2,1}$ norm such that $\|\xi\|_{2,1}=\int_{0}^{\infty}\sqrt{P(|\xi|>x)}dx$ for a random variable $\xi$ .

Lemma I.18 (Lemma 3.6.15 van der Vaart and Wellner, (1996)).

For each $n$ , let $(a_{1n},...,a_{nn})$ and $(B_{1n},...,B_{nn})$ be a vector of numbers and exchangeable random vector such that

	$\displaystyle\frac{1}{n}\sum_{i=1}^{n}(a_{in}-\bar{a}_{n})^{2}\rightarrow{\sigma^{2}},\quad\lim_{M\rightarrow\infty}\limsup_{n\rightarrow\infty}\frac{1}{n}\sum_{i=1}^{n}a_{in}^{2}\{\|a_{in}\|>M\}=0,$
	$\displaystyle\frac{1}{n}\sum_{i=1}^{n}(B_{in}-\bar{B}_{n})^{2}\xrightarrow{p}\alpha^{2},\quad\frac{1}{n}\max_{1\leq i\leq n}(B_{in}-\bar{B}_{n})^{2}\xrightarrow{p}0,$

where $\bar{a}_{n}=\frac{1}{n}\sum_{i=1}^{n}a_{in}$ and $\bar{B}_{n}=\frac{1}{n}\sum_{i=1}^{n}B_{in}$ . Then, $n^{-1/2}\sum_{i=1}^{n}(a_{in}B_{in}-\bar{a}_{n}\bar{B}_{n})\xrightarrow{d}N(0,\alpha^{2}\sigma^{2})$ .

Let $B=(B_{1n},...,B_{nn})$ be a multinomial vector with parameters $n$ and probabilities $(1/n,...,1/n)$ . Then, $\bar{B}_{n}=1$ , and conditions for $B$ in Lemma I.18 hold.

	$\displaystyle\sqrt{n}\bar{g}_{n}^{}(\alpha_{0}^{}+\tfrac{a}{\sqrt{n}},\gamma_{0}^{*}+\tfrac{b}{n^{\frac{1}{4}}})=$	$\displaystyle\sqrt{n}\left\{\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}}^{}\widehat{\Delta\epsilon}_{it_{0}}^{}\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iT}^{}\widehat{\Delta\epsilon}_{iT}^{}\end{pmatrix}-\begin{pmatrix}\frac{1}{n}\sum_{i=1}^{n}z_{it_{0}}\widehat{\Delta\epsilon}_{it_{0}}\\ \vdots\\ \frac{1}{n}\sum_{i=1}^{n}z_{iT}\widehat{\Delta\epsilon}_{iT}\end{pmatrix}\right\}$
		$\displaystyle-\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0}}^{}\Delta x_{it_{0}}^{\prime}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iT}^{}\Delta x_{iT}^{\prime}\end{pmatrix}a_{1}-\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0}}^{}1_{it_{0}}^{}(\gamma_{0}^{}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime}X_{it_{0}}^{}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iT}^{}1_{iT}^{}(\gamma_{0}^{}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime}X_{iT}^{}\end{pmatrix}a_{2}$
		$\displaystyle+\sqrt{n}\begin{pmatrix}\tfrac{1}{n}\sum_{i=1}^{n}z_{it_{0}}^{}(1_{it_{0}}^{}(\gamma_{0}^{})^{\prime}-1_{it_{0}}(\gamma_{0}^{}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{it_{0}}^{}\\ \vdots\\ \tfrac{1}{n}\sum_{i=1}^{n}z_{iT}^{}(1_{iT}^{}(\gamma_{0}^{})^{\prime}-1_{iT}(\gamma_{0}^{}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{iT}^{}\end{pmatrix}\delta_{0}^{*}.$

	$\displaystyle\left.\frac{d}{d\gamma}E_{t}[z_{it}(1,x_{it}^{\prime})\delta_{0}\|\gamma]f_{t}(\gamma)\right\|_{\gamma=\gamma_{0}}$	$\displaystyle=\left.\frac{d}{d\gamma}E_{t}[z_{it}(\delta_{10}+\delta_{30}\gamma)\|\gamma]f_{t}(\gamma)\right\|_{\gamma=\gamma_{0}}$
		$\displaystyle=\left.\frac{d}{d\gamma}(\delta_{10}+\delta_{30}\gamma)\cdot E_{t}[z_{it}\|\gamma]f_{t}(\gamma)\right\|_{\gamma=\gamma_{0}}$
		$\displaystyle=\delta_{30}E_{t}[{z_{it}}\|\gamma_{0}]f_{t}(\gamma_{0})+(\delta_{10}+\delta_{30}\gamma_{0})\left.\frac{d}{d\gamma}E_{t}[z_{it}\|\gamma]f_{t}(\gamma)\right\|_{\gamma=\gamma_{0}}$
		$\displaystyle=\delta_{30}E_{t}[{z_{it}}\|\gamma_{0}]f_{t}(\gamma_{0}).$

		$\displaystyle\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{it}^{}(1_{it}(\gamma_{0}^{})^{\prime}-1_{it}^{}(\gamma_{0}^{}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{it}^{}\delta_{0}^{}$
		$\displaystyle=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{it}^{}(1_{it}^{}(\gamma_{0}^{})^{\prime}-1_{it}^{}(\gamma_{0}^{}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{it}^{}\delta_{0}^{}-\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{it}(1_{it}(\gamma_{0}^{})^{\prime}-1_{it}(\gamma_{0}^{}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{it}\delta_{0}^{}$		(E.2)
		$\displaystyle\quad+\frac{1}{\sqrt{n}}\sum_{i=1}^{n}z_{it}(1_{it}(\gamma_{0}^{})^{\prime}-1_{it}(\gamma_{0}^{}+\tfrac{b}{n^{\frac{1}{4}}})^{\prime})X_{it}\delta_{0}^{*}.$		(E.3)

$\displaystyle g_{in}^{*}(\theta)$	$\displaystyle=(g_{it_{0}n}^{}(\theta)^{\prime},...,g_{iTn}^{}(\theta)^{\prime})^{\prime}$
$\displaystyle g_{itn}^{*}(\theta)$	$\displaystyle=g_{t}(\omega_{in}^{},\theta)-g_{t}(\omega_{in}^{},\theta_{0n}^{})+g_{t}(\omega_{in}^{},\hat{\theta}_{n})$	(I.1)
	$\displaystyle=-z_{itn}^{}\Delta x_{itn}^{\prime}(\beta-\beta_{0n}^{})-z_{itn}^{}1_{itn}^{}(\gamma)^{\prime}X_{itn}^{}(\delta-\delta_{0n}^{*})$
	$\displaystyle\quad+z_{itn}^{}(1_{itn}^{}(\gamma_{0n}^{})^{\prime}-1_{itn}^{}(\gamma)^{\prime})X_{itn}^{}\delta_{0n}^{}+z_{itn}^{}\widehat{\Delta\epsilon}_{itn}^{},$

	$\displaystyle\max_{n_{0}\leq k\leq n}E_{R}\\|\frac{1}{k}\sum_{i=n_{0}}^{k}Z_{R_{i}n}\\|_{\mathcal{F}}$	$\displaystyle\leq\max_{n_{0}\leq k\leq n}\left\{E_{R}\\|\frac{1}{k}\sum_{i=1}^{k}Z_{R_{i}n}\\|_{\mathcal{F}}+E_{R}\\|\frac{1}{k}\sum_{i=1}^{n_{0}-1}Z_{R_{i}n}\\|_{\mathcal{F}}\right\}$
		$\displaystyle\leq\max_{n_{0}-1\leq k\leq n}2E_{R}\\|\frac{1}{k}\sum_{i=1}^{k}Z_{R_{i}n}\\|_{\mathcal{F}}$
		$\displaystyle=\max_{n_{0}-1\leq k\leq n}2\\|\frac{1}{k}\sum_{i=1}^{k}Z_{{i}n}\\|_{\mathcal{F}}.$

Bootstraps for Dynamic Panel Threshold Models

Abstract

1 Introduction

2 Dynamic Panel Threshold Model

Theorem 1.

Definition 1.

3 Asymptotic theory

Assumption G.

Assumption D.

Assumption LK.

Assumption LJ.

Theorem 2.

3.1 Testing for threshold value

Theorem 3.

3.2 Testing continuity

Theorem 4.

4 Bootstrap

4.1 Grid bootstrap for threshold location

Theorem 5.

4.1.1 Uniform validity of grid bootstrap

4.2 Residual bootstrap for coefficients

Assumption P.

Theorem 6.

4.3 Bootstrap for testing continuity

Theorem 7.

5 Monte Carlo results

6 Empirical example

7 Conclusion

References

Additional Notations.

Appendix A Proofs for Section 3.

A.1 Proof of Theorem 1.

A.2 Proof of Theorem 2.

A.2.1 Consistency.

A.2.2 Convergence rate.

A.2.3 Asymptotic distribution.

Characterization of the minimizers

Appendix B Proofs for Section 4

B.1 Preliminaries

Lemma B.1.

Proof.

B.2 Proof of Theorem 6.

Proposition 1.

Asymptotic distribution under continuity.

Asymptotic distribution under discontinuity.

Online Supplements for “Bootstraps for Dynamic Panel Threshold Models” (Not for Publication)

Woosik Gong and Myung Hwan Seo

Appendix C Supplementary Results for Monte Carlo Simulation

C.1 Symmetric Percentile Confidence Intervals for Coefficients

C.2 Weakly Endogenous Threshold Variable

C.3 Coverage Rates by Asymptotic Confidence Intervals

Appendix D Proofs of Theorems in Section 3 and Auxiliary Lemmas

Additional notations

D.1 Proof of Theorem 3.

D.1.1 Continuous Model.

When γ=γ0\gamma=\gamma_{0}.

When γ≠γ0\gamma\neq\gamma_{0}.

D.1.2 Discontinuous Model.

When γ=γ0\gamma=\gamma_{0}.

When γ≠γ0\gamma\neq\gamma_{0}.

D.2 Proof of Theorem 4.

Under the null hypothesis.

Under the alternative hypothesis.

D.3 Auxiliary Lemmas

Lemma D.1.

Proof.

Lemma D.2.

Proof.

Lemma D.3.

Proof.

Lemma D.4.

Proof.

Appendix E Proofs of Theorems in Section 4 and Auxiliary Lemmas

E.1 Preliminaries

E.2 Proof of Proposition 1

Consistency of the bootstrap estimator.

Convergence rate under continuity.

Convergence rate under discontinuity.

E.3 Proof of Theorem 5.

When γ=γ0\gamma=\gamma_{0}.

When $\gamma=\gamma_{0}$ .

When $\gamma\neq\gamma_{0}$ .

When $\gamma=\gamma_{0}$ .

When $\gamma\neq\gamma_{0}$ .

When $\gamma=\gamma_{0}$ .

When $\gamma\neq\gamma_{0}$ .

Limit of $\widetilde{J}_{1n}$ :

Limit of $\widetilde{J}_{2n}$ :