
Inference in Nonparametric Series Estimation with Specification Searches for the Number of Series Terms

Byunghoon Kang
Department of Economics, Lancaster University
I thank the editor Peter Phillips, the co-editor Iván Fernández-Val, and the two anonymous referees for thoughtful comments that significantly improved this paper. I am also grateful to Bruce Hansen, Jack Porter, Xiaoxia Shi and Joachim Freyberger for useful comments and discussions, and thanks to Michal Kolesár, Denis Chetverikov, Yixiao Sun, Andres Santos, Patrik Guggenberger, Federico Bugni, Joris Pinkse, Liangjun Su, Myung Hwan Seo, and Áureo de Paula for helpful conversations and criticism. This paper is a revised version of the first chapter in my Ph.D. thesis at UW-Madison and previously titled “Inference in Nonparametric Series Estimation with Data-Dependent Undersmoothing”. I acknowledge support by the Kwanjeong Educational Foundation Graduate Research Fellowship and Leon Mears Dissertation Fellowship from UW-Madison. All errors are my own. Email: b.kang1@lancaster.ac.uk, Homepage: https://sites.google.com/site/davidbhkang
(September 15, 2025)
Abstract

Nonparametric series regression often involves specification search over the tuning parameter, i.e., evaluating estimates and confidence intervals with different numbers of series terms. This paper develops pointwise and uniform inference for conditional mean functions in nonparametric series estimation that is uniform in the number of series terms. As a result, this paper constructs confidence intervals and confidence bands with possibly data-dependent series terms that have valid asymptotic coverage probabilities. This paper also considers a partially linear model setup and develops inference methods for the parametric part that are uniform in the number of series terms. The finite sample performance of the proposed methods is investigated in various simulation setups as well as in an illustrative example, i.e., the nonparametric estimation of the wage elasticity of the expected labor supply from Blomquist and Newey (2002).

Keywords: Nonparametric series regression, Pointwise confidence interval, Smoothing parameter choice, Specification search, Undersmoothing, Uniform confidence bands.

JEL classification: C12, C14.

1 Introduction

We consider the following nonparametric regression model

y_i=g_0(x_i)+\varepsilon_i,\qquad E(\varepsilon_i|x_i)=0 \qquad (1.1)

where $\{y_i,x_i\}_{i=1}^n$ is i.i.d., $y_i$ is a scalar response variable, $x_i\in\mathcal{X}\subset\mathbb{R}^{d_x}$ is a vector of covariates, and $g_0(x)=E(y_i|x_i=x)$ is the conditional mean function. The theory of estimation and inference is well developed for nonparametric series (sieve) methods in a large body of the econometrics and statistics literature. Series estimators have also received attention in applied economics because they have many appealing features, e.g., they can easily impose shape restrictions such as additive separability and monotonicity. Once the basis function is chosen (e.g., polynomial or regression spline series of fixed order), implementation requires a choice of the number of series terms $K=K_n$, where $K$ denotes the order of the polynomials or the number of knots in the splines. However, this often involves some ad hoc specification searches over $K\in\mathcal{K}_n$. For example, when $x_i\in\mathbb{R}^{d_x}$ is vector valued, researchers often evaluate different numbers of terms in each dimension separately and construct a set of bases with different powers and cross-products of covariates. Although specification search seems necessary in some cases, it may lead to misleading inference if the first-step specification search or series term selection is not taken into account.\footnote{As a referee noted, the bias and MSE of the series estimator depend not only on $K$ but also on the specific bases or sieve spaces, e.g., the order of the splines. In this paper, we fix the basis function and do not allow searching over the specific bases or sieve spaces.}

Existing theory for the asymptotic normality of t-statistics and valid inference imposes a so-called undersmoothing (i.e., overfitting) condition, that is, a faster rate of $K$ than the mean-squared error (MSE) optimal convergence rate, and many papers in the literature suggest rules of thumb intended to deliver the desired level of undersmoothing. Among many others, Newey (2013) suggested increasing $K$ until the standard errors are large relative to small changes in the objects of interest. Newey, Powell, and Vella (1999) suggested using more terms than the number chosen by cross-validation. Horowitz and Lee (2012) suggested increasing $K$ until the integrated variance suddenly increases and then adding additional terms.

In this paper, we formally justify these rule-of-thumb or “plug-in” methods with undersmoothed $\widehat{K}$ for valid inference in nonparametric series regression. Specifically, we provide pointwise inference for $g_0(x)$ with possibly data-dependent (undersmoothed) $\widehat{K}\in\mathcal{K}_n$, i.e., we construct a $100(1-\alpha)\%$ confidence interval (CI) satisfying

\liminf_{n\rightarrow\infty}P\big(g_0(x)\in[\widehat{g}_n(\widehat{K},x)\pm\widehat{c}_{1-\alpha}(x)\sqrt{\widehat{V}_n(\widehat{K},x)/n}\,]\big)\geq 1-\alpha, \qquad (1.2)

with an estimator $\widehat{g}_n(K,x)$, a variance estimator $\widehat{V}_n(K,x)$ using $K$ series terms, and critical values $\widehat{c}_{1-\alpha}(x)$ from the supremum of the t-statistics. For this result, we first develop a uniform distributional approximation theory for the supremum of the absolute t-statistics over different numbers of series terms, which yields asymptotically valid confidence intervals that are uniform in $K\in\mathcal{K}_n$,

P\big(g_0(x)\in[\widehat{g}_n(K,x)\pm\widehat{c}_{1-\alpha}(x)\sqrt{\widehat{V}_n(K,x)/n}\,],\quad K\in\mathcal{K}_n\big)=1-\alpha+o(1). \qquad (1.3)

The critical values $\widehat{c}_{1-\alpha}(x)$ can be easily implemented using simple simulation or weighted bootstrap methods.

Furthermore, this paper develops the construction of confidence bands for $g_0(x)$ with asymptotically uniform (in $K\in\mathcal{K}_n$) coverage, with critical values $\widehat{c}_{1-\alpha}$ chosen to satisfy

P\big(g_0(x)\in[\widehat{g}_n(K,x)\pm\widehat{c}_{1-\alpha}\sqrt{\widehat{V}_n(K,x)/n}\,],\quad K\in\mathcal{K}_n,\ x\in\mathcal{X}\big)=1-\alpha+o(1). \qquad (1.4)

Analogous to the pointwise inference in (1.2), we can show the validity of confidence bands with data-dependent $\widehat{K}$. Even for pointwise inference, deriving a uniform asymptotic distribution theory for all sequences of t-statistics over $K\in\mathcal{K}_n$ may not be possible unless $p=|\mathcal{K}_n|$ is finite. Allowing $p\rightarrow\infty$ as $n\rightarrow\infty$, the results in this paper build on coupling inequalities for the supremum of an empirical process developed by Chernozhukov, Chetverikov, and Kato (2014a, 2016) combined with the anti-concentration inequality in Chernozhukov, Chetverikov, and Kato (2014b).

We also provide inference methods in a partially linear model setup, focusing on the common parametric part. Unlike nonparametric objects of interest that converge more slowly than $n^{1/2}$ (e.g., the regression function or regression derivative), the t-statistics for the parametric object of interest are asymptotically equivalent for all sequences of $K$ under the standard rate condition $K/n\rightarrow 0$ as $n\rightarrow\infty$. To capture the dependence of the t-statistics across different sequences of $K$ in this setup, we consider a faster rate of $K$ that grows as fast as the sample size $n$, as in Cattaneo, Jansson, and Newey (2018a, 2018b), and develop an asymptotic distribution of the t-statistics over $K\in\mathcal{K}_n$. We then discuss methods to construct confidence intervals similar to those in the nonparametric regression setup and provide uniform (in $K\in\mathcal{K}_n$) coverage properties.

We investigate finite sample coverage and length properties of the proposed CIs and uniform confidence bands in various simulation setups. As an illustrative example, we revisit the nonparametric estimation of the labor supply function using the entire individual piecewise-linear budget set, as in Blomquist and Newey (2002). Imposing additive separability, which is derived from economic theory, Blomquist and Newey (2002) estimate the conditional mean of the labor supply function using series estimation and report the wage elasticity of the expected labor supply, as well as other welfare measures, for various specifications with different numbers of series terms.

Several important papers have investigated the asymptotic properties of series (and sieve) estimators, including Andrews (1991a); Eastwood and Gallant (1991); Newey (1997); Chen and Shen (1998); Huang (2003); Chen (2007); Chen and Liao (2014); Chen, Liao, and Sun (2014); Belloni, Chernozhukov, Chetverikov, and Kato (2015); and Chen and Christensen (2015), among many others. This paper extends inference based on the t-statistic under a single sequence of $K$ to sequences of $K$ over a set $\mathcal{K}_n$ and focuses on both pointwise and uniform inference on $g_0(x)$, which is an irregular (i.e., slower than $n^{1/2}$ rate) linear functional, under an i.i.d. setup.

The supremum t-statistics have been used as a correction for multiple-testing problems and to construct simultaneous confidence bands, and the importance of multiple-testing problems (data mining or data snooping) has long been noted in various other contexts (see Leamer (1983), White (2000), Romano and Wolf (2005), Hansen (2005)).

There is also a growing literature on data-dependent series term selection and its impact on estimation and inference in econometrics and statistics. Asymptotic optimality results for cross-validation have been developed, e.g., by Li (1987), Andrews (1991b), and Hansen (2015). Horowitz (2014) develops data-driven methods for choosing the sieve dimension in nonparametric instrumental variables (NPIV) estimation such that the resulting NPIV estimators attain the optimal sup-norm or $L^2$ norm rates adaptive to the unknown smoothness of $g_0(x)$. Although we do not pursue adaptive inference in this paper, there is also a large statistical literature on adaptive inference. For example, Giné and Nickl (2010) and Chernozhukov, Chetverikov, and Kato (2014b) construct adaptive confidence bands in the density estimation problem (see Giné and Nickl (2015, Section 8) for a comprehensive list of references). However, once a data-driven choice is obtained for adaptive estimation (e.g., Lepski (1990)-type procedures), one still requires an undersmoothing condition for inference to eliminate asymptotic bias terms (see Theorem 1 of Giné and Nickl (2010)), and this may result in similar specification search issues when choosing a sufficiently “large” $K$ in practice.

We can, in principle, consider kernel-based estimation, for which several data-dependent bandwidth selections and explicit bias corrections have been proposed.\footnote{See Härdle and Linton (1994) and Li and Racine (2007) for references. See also Hall and Horowitz (2013), Calonico, Cattaneo, and Farrell (2018), Schennach (2015), and references therein for various recent works on related bias issues and inference for kernel estimators.} However, there are many applications that estimate $g_0(x)$ using (global) series estimation, easily imposing shape constraints (such as additive separability to reduce dimensionality), in which both pointwise and uniform inference are of interest. Given the issues of specification search, our paper is closely related to a recent paper by Armstrong and Kolesár (2018), which considers a bandwidth snooping adjustment for kernel-based inference.

Unlike kernel-based methods, little is known about the statistical properties of data-dependent selection rules and explicit bias formulas for general series estimation; Zhou, Shen, and Wolfe (1998) and Huang (2003) are two of the few exceptions. A recent paper, Cattaneo, Farrell, and Feng (2019), develops novel explicit asymptotic bias/integrated mean squared error (IMSE) formulas and asymptotic theory for bias-correction methods for general partitioning-based series estimators. The results in Cattaneo, Farrell, and Feng (2019) can be used as an alternative to the undersmoothing approach to avoid specification search issues.

The remainder of the paper is organized as follows. Section 2 introduces the basic nonparametric series regression setup and the candidate set $\mathcal{K}_n$. Section 3 provides the pointwise inference, and Section 4 provides uniform inference in $x\in\mathcal{X}$. Section 5 extends our inference methods to the partially linear model setup. Section 6 summarizes Monte Carlo experiments in various setups, and Section 7 illustrates an empirical example as in Blomquist and Newey (2002). Then, Section 8 concludes the paper. Appendix A includes the main proofs, and Appendix B includes figures and tables. Additional supporting lemmas and simulation results are provided in the Online Supplementary Material available at Cambridge Journals Online (journals.cambridge.org/ect).

1.1 Notation

$\|A\|$ denotes the spectral norm, which equals the largest singular value of a matrix $A$, and $\lambda_{\min}(A),\lambda_{\max}(A)$ denote the minimum and maximum eigenvalues of a symmetric matrix $A$, respectively. $o_p(\cdot)$ and $O_p(\cdot)$ denote the usual stochastic order symbols, $\overset{d}{\longrightarrow}$ denotes convergence in distribution, and $\Rightarrow$ denotes weak convergence. Let $a\wedge b=\min\{a,b\}$, $a\vee b=\max\{a,b\}$, and let $\lfloor a\rfloor$ denote the largest integer no greater than the real number $a$. For two sequences of positive real numbers $a_n$ and $b_n$, $a_n\lesssim b_n$ denotes $a_n\leq cb_n$ for all $n$ sufficiently large with some constant $c>0$ that is independent of $n$, and $a_n\asymp b_n$ denotes $a_n\lesssim b_n$ and $b_n\lesssim a_n$. Furthermore, $a_n\lesssim_P b_n$ denotes $a_n=O_p(b_n)$. For a given random variable $X_i$ and $1\leq p<\infty$, $L^p(X)$ is the space of all $L^p$-norm bounded functions with $\|f\|_{L^p}=[E\|f(X_i)\|^p]^{1/p}$, $\ell^\infty(X)$ denotes the space of all bounded functions under the sup-norm, and $\|f\|_\infty=\sup_{x\in\mathcal{X}}|f(x)|$ for bounded real-valued functions $f$ on the support $\mathcal{X}$.

2 Setup

We introduce the nonparametric series regression setup in the model (1.1). Given a random sample $\{y_i,x_i\}_{i=1}^n$, we are interested in inference on the conditional mean $g_0(x)=E(y_i|x_i=x)$ at a particular point $x\in\mathcal{X}\subset\mathbb{R}^{d_x}$ or uniformly in $x\in\mathcal{X}$.

Let $\widehat{g}_n(K,x)$ be an estimator of $g_0(x)$ using $K=K_n\geq 1$ series terms $P(K,x)=(p_1(x),\cdots,p_K(x))'$, a vector of basis functions that can change with $n$. Standard examples of basis functions are power series, Fourier series, orthogonal polynomials, splines, and wavelets. The series estimator is then obtained by least squares (LS) estimation of $y_i$ on the regressors $P(K,x_i)$:

\widehat{g}_n(K,x)=P(K,x)'\widehat{\beta}_K,\qquad\widehat{\beta}_K=(P^{K\prime}P^K)^{-1}P^{K\prime}Y \qquad (2.1)

where $P^K=[P_{K1},\cdots,P_{Kn}]'$, $P_{Ki}\equiv P(K,x_i)=(p_1(x_i),p_2(x_i),\cdots,p_K(x_i))'$, and $Y=(y_1,\cdots,y_n)'$. Define the least squares residuals $\widehat{\varepsilon}_{Ki}=y_i-P_{Ki}'\widehat{\beta}_K$, let

\widehat{V}_n(K,x)=P(K,x)'\widehat{Q}_K^{-1}\widehat{\Omega}_K\widehat{Q}_K^{-1}P(K,x),\qquad\widehat{Q}_K=\frac{1}{n}\sum_{i=1}^n P_{Ki}P_{Ki}',\quad\widehat{\Omega}_K=\frac{1}{n}\sum_{i=1}^n P_{Ki}P_{Ki}'\widehat{\varepsilon}_{Ki}^2, \qquad (2.2)

and consider the t-statistic

\widehat{T}_n(K,x)\equiv\frac{\sqrt{n}(\widehat{g}_n(K,x)-g_0(x))}{\widehat{V}_n(K,x)^{1/2}}. \qquad (2.3)
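
To fix ideas, the following Python sketch computes the series estimator, the sandwich variance in (2.2), and the t-statistic (2.3) for a single $K$. The quadratic truncated-power spline basis and all helper names here are illustrative assumptions rather than the paper's prescription; any fixed basis $P(K,\cdot)$ can be substituted.

```python
import numpy as np

def spline_basis(x, K):
    """Quadratic spline basis with K evenly spaced interior knots on [0, 1].
    One possible choice of P(K, .); here the basis dimension is K + 3."""
    knots = np.linspace(0, 1, K + 2)[1:-1]               # interior knots
    powers = np.column_stack([np.ones_like(x), x, x**2])
    truncated = np.maximum(x[:, None] - knots[None, :], 0.0) ** 2
    return np.column_stack([powers, truncated])

def series_fit(y, x, K, x0):
    """LS series estimate g_hat(K, x0) and variance V_hat(K, x0), following
    (2.1)-(2.2) with LS residuals in the sandwich form."""
    n = len(y)
    P = spline_basis(x, K)
    beta = np.linalg.lstsq(P, y, rcond=None)[0]          # beta_hat_K
    eps = y - P @ beta                                   # LS residuals
    Q = P.T @ P / n                                      # Q_hat_K
    Omega = (P * eps[:, None] ** 2).T @ P / n            # Omega_hat_K
    p0 = spline_basis(np.atleast_1d(x0), K)[0]           # P(K, x0)
    a = np.linalg.solve(Q, p0)                           # Q_hat_K^{-1} P(K, x0)
    return p0 @ beta, a @ Omega @ a                      # g_hat, V_hat

# t-statistic (2.3) at x0 for a hypothesized value g0:
# g_hat, V_hat = series_fit(y, x, K, x0)
# t = np.sqrt(len(y)) * (g_hat - g0) / np.sqrt(V_hat)
```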

Under standard regularity conditions (discussed in the next section), the t-statistic can be decomposed as follows:

\widehat{T}_n(K,x)=\frac{1}{\sqrt{n}}\sum_{i=1}^n\frac{P(K,x)'Q_K^{-1}P_{Ki}\varepsilon_i}{\widehat{V}_n(K,x)^{1/2}}-\frac{r_n(K,x)}{\sqrt{\widehat{V}_n(K,x)/n}}+o_p(1) \qquad (2.4)

where $Q_K=E(P_{Ki}P_{Ki}')$, $r_n(K,x)=g_0(x)-P(K,x)'\beta_K$, and $\beta_K\equiv(E[P_{Ki}P_{Ki}'])^{-1}E[P_{Ki}y_i]$ is the best linear $L^2$ projection coefficient. The first term in the decomposition (2.4) converges to a standard normal distribution for a deterministic sequence $K\rightarrow\infty$ as $n\rightarrow\infty$, while the second term does not necessarily converge to 0 because of the approximation error $r_n(K,x)$. The second term can be ignored under an undersmoothing assumption, and the asymptotic distribution of the t-statistic, $\widehat{T}_n(K,x)\overset{d}{\longrightarrow}N(0,1)$, is well known in the literature (see, for example, Andrews (1991a), Newey (1997), Belloni et al. (2015), and Chen and Christensen (2015), among many others). The $100(1-\alpha)\%$ confidence interval for $g_0(x)$ can then be easily constructed using the normal critical value $z_{1-\alpha/2}$:

\Big[\widehat{g}_n(K,x)\pm z_{1-\alpha/2}\sqrt{\widehat{V}_n(K,x)/n}\Big]. \qquad (2.5)

However, it is not clear whether the conventional CI (2.5) using normal critical values has correct coverage probability with a possibly data-dependent $\widehat{K}$, such as a cross-validated or IMSE-optimal selection. First, $\widehat{T}_n(\widehat{K},x)\overset{d}{\rightarrow}N(0,1)$ may not hold for a random sequence $\widehat{K}$, even if we assume the asymptotic bias is negligible. Second, it is well known that some data-dependent rules $\widehat{K}$ do not satisfy the undersmoothing rate conditions, which can lead to a large asymptotic bias and coverage distortion of the standard CI. For example, suppose that the researcher uses $\widehat{K}=\widehat{K}_{\texttt{cv}}$ selected by cross-validation; then $\widehat{K}_{\texttt{cv}}$ is typically too “small” and violates the undersmoothing assumption needed to ensure asymptotic normality without bias terms and valid inference.

As discussed in the introduction, the undersmoothing assumption involves possibly ad hoc methods of choosing the series terms $K$ over a candidate set $\mathcal{K}_n$ for valid inference, and cross-validation methods naturally involve specification search over a set of different numbers of series terms.

The following assumption on $\mathcal{K}_n$ is constructed to allow a broad range of $K$: $\mathcal{K}_n$ can contain the (unknown) MSE-optimal rate of $K$ as well as undersmoothing rates that increase faster than the MSE-optimal rate.

Assumption 2.1.

(Set of number of series terms) Assume the candidate set is $\mathcal{K}_n=\{K_j:1\leq j\leq p\}$, where $\underline{K}=K_1\rightarrow\infty$ and $\overline{K}=K_p\rightarrow\infty$ as $n\rightarrow\infty$.

Here, we consider a possibly growing set of numbers of series terms; similar assumptions are used in the literature, for example, in Newey (1994a, 1994b). Suppose $g_0(x)$ belongs to the Hölder space of smoothness $s>0$, $\Sigma(s,\mathcal{X})$; then we obtain the optimal $L^2$ convergence rate $O_p(n^{-s/(2s+d_x)})$ with $K\asymp n^{d_x/(d_x+2s)}$. Assumption 2.1 allows $\mathcal{K}_n$ to contain the $L^2$-optimal rates of $K$ for a large class of functions. By setting $\mathcal{K}_n=[\underline{K},\overline{K}]\cap\mathbb{N}$, $\overline{K}\asymp n^{\overline{\phi}}$, and $\underline{K}\asymp n^{\underline{\phi}}$ with $\overline{\phi}=d_x/(d_x+2\underline{s})$ and $\underline{\phi}=d_x/(d_x+2\overline{s})$, Assumption 2.1 contains the numbers of series terms that attain the optimal $L^2$ rate of convergence for $g_0(x)\in\bigcup_{s\in S}\Sigma(s,\mathcal{X})$, $S=[\underline{s},\overline{s}]$. A similar assumption is used in the literature on adaptive inference, although we do not pursue this direction in the current paper.

Assumption 2.1 gives flexible choices of $K$, as we only assume the rates of $K$, for example, $\overline{K}=Cn^{\overline{\phi}}$ and $\underline{K}=cn^{\underline{\phi}}$, where $c$ and $C$ can be set arbitrarily small or large. We only require rate restrictions uniformly over $K\in\mathcal{K}_n$ to guarantee the linearization of the t-statistic in (2.4), together with restrictions on the rate of the cardinality $p=|\mathcal{K}_n|$. Since $K\in\mathcal{K}_n$ is a positive integer and $p\leq\overline{K}$, $p$ grows at a rate much slower than $n$ under the rate restrictions in Section 3.

Remark 2.1 ($\mathcal{K}_n$ and the largest $K$).

As a referee noted, specification search is often performed over a simple pre-defined set in practice. For example, a researcher may only use quadratic, cubic, or quartic terms in polynomial regression or try only a few different numbers of knots in regression splines to observe how the estimate and standard error change. In the nonparametric estimation of the Mincer equation (Heckman, Lochner, and Todd (2006)), researchers may consider a regression of log wages on experience with polynomials of order $\underline{K}=1$ (linear) to $\overline{K}=4$ (quartic).\footnote{All of our results continue to hold with fixed $p$; however, it may be preferable to use larger sets $\mathcal{K}_n$ with $p\rightarrow\infty$ to give greater flexibility to the candidate models as the sample size $n$ increases.}

However, it may not be clear how to define $\mathcal{K}_n$ a priori in practice. One must first consider a set of pre-selected models over which to search. As discussed earlier and suggested by many papers in the literature, formal data-dependent methods for obtaining optimal $L^2$ norm or sup-norm rates, such as cross-validation, can provide a useful guideline for $\mathcal{K}_n$. For example, one can consider a reasonable set $\widetilde{\mathcal{K}}_n$ first, choose $\widehat{K}_{\texttt{cv}}\in\widetilde{\mathcal{K}}_n$ by cross-validation, and then consider $\mathcal{K}_n=[\widehat{K}_{\texttt{cv}},c_1\widehat{K}_{\texttt{cv}}]$ or $[\widehat{K}_{\texttt{cv}},\widehat{K}_{\texttt{cv}}n^{c_2}]$ for some constants $c_1,c_2>0$; a minimal sketch of this construction appears below. One can also search for $\underline{K}$ and $\overline{K}$ sequentially by calculating changes in the cross-validation criterion or in standard errors from the initial candidate set. Extending the results developed in this paper to data-dependent $\mathcal{K}_n$ is beyond the scope of the paper.
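
As a minimal sketch of this construction (the helper name and the constants are illustrative, not part of the paper's procedure):

```python
import numpy as np

def candidate_set(K_cv, n, c1=2.0, c2=None):
    """Build K_n = [K_cv, c1*K_cv] (or [K_cv, K_cv*n**c2]) from a
    cross-validated K_cv; the constants c1, c2 are the researcher's choice."""
    K_top = K_cv * n ** c2 if c2 is not None else c1 * K_cv
    return np.arange(K_cv, int(np.ceil(K_top)) + 1)

# e.g., candidate_set(6, 200) returns array([ 6,  7,  8,  9, 10, 11, 12])
```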

3 Pointwise Inference

In this section, we focus on pointwise inference for $g_0(x)$. The goal of this section is to provide a uniform distributional approximation theory for $\widehat{T}_n(K,x)$ over the set $\mathcal{K}_n$ and to provide uniform (in $K\in\mathcal{K}_n$) coverage properties of the confidence intervals for $g_0(x)$ in (1.2) and (1.3), together with the construction of critical values.

From the decomposition of the t-statistic in (2.4), we first consider the (infeasible) test statistic

\max_{K\in\mathcal{K}_n}|t_n(K,x)|=\max_{1\leq j\leq p}|t_n(K_j,x)| \qquad (3.1)

where $t_n(K,x)=n^{-1/2}\sum_{i=1}^n P(K,x)'Q_K^{-1}P_{Ki}\varepsilon_i/V_n(K,x)^{1/2}$ with the series variance $V_n(K,x)=P(K,x)'Q_K^{-1}\Omega_K Q_K^{-1}P(K,x)$ and $\Omega_K=E(P_{Ki}P_{Ki}'\varepsilon_i^2)$. In general, $t_n(K,x)$, $K\in\mathcal{K}_n$, does not have a limiting distribution because it is not asymptotically tight under Assumption 2.1, unless $|\mathcal{K}_n|$ is finite or restrictive assumptions are imposed on $\mathcal{K}_n$.\footnote{In an earlier version of the paper, we provide the weak convergence of a series process under the same rates of $K\in\mathcal{K}_n$ and high-level assumptions. This can be viewed as a result analogous to those in the kernel estimation literature (see Section 2 of Armstrong and Kolesár (2018) and other references therein).} However, we show below that there exists a sequence of random variables $\max_{1\leq j\leq p}|\sum_{i=1}^n Z_{ij}|$ such that $\big|\max_{K\in\mathcal{K}_n}|t_n(K,x)|-\max_{1\leq j\leq p}|\sum_{i=1}^n Z_{ij}|\big|=O_p(a_n)$ for a sequence of constants $a_n\rightarrow 0$, where $Z_i=(Z_{i1},\ldots,Z_{ip})'$ is a Gaussian random vector in $\mathbb{R}^p$ such that $Z_i\sim N(0,\frac{1}{n}\Sigma_n)$ with $(j,l)$ elements of the variance-covariance matrix

\Sigma_n(j,l)=E[t_n(K_j,x)t_n(K_l,x)]=\frac{P(K_j,x)'Q_{K_j}^{-1}\Omega_{K_j,K_l}Q_{K_l}^{-1}P(K_l,x)}{V_n(K_j,x)^{1/2}V_n(K_l,x)^{1/2}}, \qquad (3.2)

where $\Omega_{K_j,K_l}=E(P_{K_j i}P_{K_l i}'\varepsilon_i^2)$.

By replacing the unknown $\Sigma_n$ and $V_n(K,x)$ with consistent estimators $\widehat{\Sigma}_n$ and $\widehat{V}_n(K,x)$, we show below that we can approximate $\max_{K\in\mathcal{K}_n}|\widehat{T}_n(K,x)|$ by $\max_{1\leq j\leq p}|\sum_{i=1}^n Z_{ij}|$ and then obtain critical values by a simulation-based method to provide the valid coverage properties in (1.2) and (1.3). We define $\widehat{c}_{1-\alpha}(x)$ as follows:

\widehat{c}_{1-\alpha}(x)\equiv(1-\alpha)\text{ quantile of }\max_{1\leq j\leq p}\Big|\sum_{i=1}^n\widehat{Z}_{ij}\Big|,\text{ where }\widehat{Z}_i=(\widehat{Z}_{i1},\ldots,\widehat{Z}_{ip})'\sim N(0,\tfrac{1}{n}\widehat{\Sigma}_n),
\widehat{\Sigma}_n(j,j)=1,\qquad\widehat{\Sigma}_n(j,l)=\frac{\widehat{V}_n(K_j,K_l,x)}{\widehat{V}_n(K_j,x)^{1/2}\widehat{V}_n(K_l,x)^{1/2}},
\widehat{V}_n(K_j,K_l,x)=P(K_j,x)'\widehat{Q}_{K_j}^{-1}\widehat{\Omega}_{K_j,K_l}\widehat{Q}_{K_l}^{-1}P(K_l,x),\qquad\widehat{\Omega}_{K_j,K_l}=\frac{1}{n}\sum_{i=1}^n P_{K_j i}P_{K_l i}'\widehat{\varepsilon}_{K_j i}\widehat{\varepsilon}_{K_l i}, \qquad (3.3)

where $\widehat{\Sigma}_n$ is a consistent estimator of the variance-covariance matrix $\Sigma_n$ defined in (3.2), $\widehat{V}_n(K,x)$ is the simple plug-in estimator of $V_n(K,x)$ as in (2.2), and $\widehat{\varepsilon}_{Ki}=y_i-P_{Ki}'\widehat{\beta}_K$ for all $K\in\mathcal{K}_n$. One can compute $\widehat{c}_{1-\alpha}(x)$ by simulating $B$ (typically $B=1000$ or $5000$) i.i.d. random vectors $\widehat{Z}_i^b\sim N(0,\frac{1}{n}\widehat{\Sigma}_n)$ and taking the $(1-\alpha)$ sample quantile of $\{\max_{1\leq j\leq p}|\sum_{i=1}^n\widehat{Z}_{ij}^b|:b=1,\cdots,B\}$. Alternatively, we can use weighted bootstrap methods; see Section 4 for the implementation and the validity of bootstrap procedures in the construction of confidence bands.
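
A minimal simulation sketch of this step, assuming $\widehat{\Sigma}_n$ has already been assembled from (3.3); the shortcut of drawing $\sum_i\widehat{Z}_{ij}$ directly (it is exactly $N(0,\widehat{\Sigma}_n)$) is ours:

```python
import numpy as np

def pointwise_critical_value(Sigma_hat, alpha=0.05, B=1000, seed=0):
    """Simulated c_hat_{1-alpha}(x) from (3.3): since sum_i Z_hat_ij with
    Z_hat_i ~ N(0, Sigma_hat/n) is exactly N(0, Sigma_hat), draw one
    p-vector per replication and take the (1-alpha) quantile of max_j |.|."""
    rng = np.random.default_rng(seed)
    p = Sigma_hat.shape[0]
    S = rng.multivariate_normal(np.zeros(p), Sigma_hat, size=B)  # B x p draws
    return np.quantile(np.abs(S).max(axis=1), 1 - alpha)
```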

To establish our main results, we impose mild regularity conditions uniform in $K\in\mathcal{K}_n$. For each $K\in\mathcal{K}_n$, define $\zeta_K\equiv\sup_{x\in\mathcal{X}}\|P(K,x)\|$ as the largest normalized length of the regressor vector and $\lambda_K\equiv(\lambda_{\min}(Q_K))^{-1/2}$ for the $K\times K$ design matrix $Q_K=E(P_{Ki}P_{Ki}')$.

Assumption 3.1.

(Regularity conditions - model)

(i) $\{y_i,x_i\}_{i=1}^n$ are i.i.d. random variables satisfying the model (1.1).

(ii) $\max_{K\in\mathcal{K}_n}\lambda_K\lesssim 1$, and for each $K\in\mathcal{K}_n$, as $K\rightarrow\infty$, there exist $c_K,\ell_K$ such that

\sup_{x\in\mathcal{X}}|r_n(K,x)|\leq\ell_K c_K,\qquad E[r_n(K,x)^2]^{1/2}\leq c_K,

where $r_n(K,x)=g_0(x)-P(K,x)'\beta_K$ and $\beta_K=(E[P_{Ki}P_{Ki}'])^{-1}E[P_{Ki}y_i]$.

Assumption 3.2.

(Regularity conditions - pointwise inference)

(i) $\max_{K\in\mathcal{K}_n}\sqrt{\zeta_K^2\log K\log^2 p/n}\,(1+\sqrt{K}\ell_K c_K)+\ell_K c_K\log p\rightarrow 0$ as $n\rightarrow\infty$.

(ii) $\sup_{x\in\mathcal{X}}E(|\varepsilon_i|^3|x_i=x)<\infty$, $\inf_{x\in\mathcal{X}}E(\varepsilon_i^2|x_i=x)>0$, and either of the following conditions holds: (a) $\sup_{x\in\mathcal{X}}E[|\varepsilon_i|^q|x_i=x]<\infty$ for $q\geq 4$ or (b) there exists a constant $C>0$ such that $\sup_{x\in\mathcal{X}}E[\exp(|\varepsilon_i|/C)|x_i=x]\leq 2$.

(iii) $\max_{K\in\mathcal{K}_n}|\frac{V_n(K,x)}{\widehat{V}_n(K,x)}-1|=o_p(1/\log p)$ and $\max_{1\leq j,l\leq p}|\widehat{\Sigma}_n(j,l)-\Sigma_n(j,l)|=o_p(1/\log^2 p)$.

Assumptions 3.1(ii) and 3.2(i) are similar to those imposed in Belloni et al. (2015) and Chen and Christensen (2015), and all the discussions made there also apply here, except that we impose rate conditions on $K$ uniformly over $\mathcal{K}_n$. The rate conditions can be replaced by specific bounds on $\zeta_K,c_K,\ell_K$ for various sieve bases. For example, when $\mathcal{X}=[0,1]^{d_x}$, the probability density of $x_i$ is uniformly bounded above and away from zero, and $g_0(x)\in\Sigma(s,\mathcal{X})$, the Hölder space of smoothness $s>0$, then $\lambda_K\lesssim 1$, $\zeta_K\lesssim\sqrt{K}$, and $\ell_K c_K\lesssim K^{-(s\wedge s_0)/d_x}$ for regression spline series of order $s_0$, and Assumption 3.2(i) is satisfied when $\sqrt{\overline{K}(\log^3\overline{K})/n}(1+\overline{K}^{1/2}\underline{K}^{-(s\wedge s_0)/d_x})+\underline{K}^{-(s\wedge s_0)/d_x}\log\overline{K}\rightarrow 0$. Other standard regularity conditions in the literature (e.g., Newey (1997) and Chen (2007)) can also be used here, and the rate condition can be improved with different pointwise linearization and approximation bounds in Huang (2003) for splines and in Cattaneo et al. (2019) for partitioning-based estimators.

Assumption 3.2(ii) imposes either bounded polynomial moments or sub-exponential moments on the regression errors. Assumption 3.2(iii) imposes consistency of the variance estimator $\widehat{V}_n(K,x)$ uniformly in $K\in\mathcal{K}_n$, which holds under mild regularity conditions (see Lemma 5.1 of Belloni et al. (2015) and Lemmas 3.1-3.2 of Chen and Christensen (2015)).

Theorem 3.1.

Suppose that Assumptions 2.1, 3.1, and 3.2 hold and that the following rate condition holds under case (a) or (b) of Assumption 3.2(ii), respectively: (a) $(\max_K\zeta_K)^2\log^5 n\log^3 p/n\vee\max_K\zeta_K\log^{3/4}n\log p/n^{1/2-1/q}\rightarrow 0$ or (b) $(\max_K\zeta_K)^2\log^5 n\log^3 p/n\rightarrow 0$. If, in addition, we assume that $\max_{K\in\mathcal{K}_n}|\frac{\sqrt{n}r_n(K,x)}{V_n(K,x)^{1/2}}|=o(1/\sqrt{\log p})$, then

\sup_{u\in\mathbb{R}}\Big|P\big(\max_{K\in\mathcal{K}_n}|\widehat{T}_n(K,x)|\leq u\big)-P\big(\max_{1\leq j\leq p}\Big|\sum_{i=1}^n\widehat{Z}_{ij}\Big|\leq u\big)\Big|=o(1), \qquad (3.4)

and the following coverage property holds

P\big(g_0(x)\in[\widehat{g}_n(K,x)\pm\widehat{c}_{1-\alpha}(x)\sqrt{\widehat{V}_n(K,x)/n}\,],\quad K\in\mathcal{K}_n\big)=1-\alpha+o(1) \qquad (3.5)

with the critical value $\widehat{c}_{1-\alpha}(x)$ defined in (3.3). Alternatively, if we assume $|\frac{\sqrt{n}r_n(\widehat{K},x)}{V_n(\widehat{K},x)^{1/2}}|=o(1/\sqrt{\log p})$ with $\widehat{K}\in\mathcal{K}_n$, then the following holds:

\liminf_{n\rightarrow\infty}P\big(g_0(x)\in[\widehat{g}_n(\widehat{K},x)\pm\widehat{c}_{1-\alpha}(x)\sqrt{\widehat{V}_n(\widehat{K},x)/n}\,]\big)\geq 1-\alpha. \qquad (3.6)

Theorem 3.1 provides a uniform coverage property of the confidence intervals over $K\in\mathcal{K}_n$ for the regression function $g_0(x)$. Equation (3.6) guarantees the asymptotic coverage of the CI for data-dependent $\widehat{K}\in\mathcal{K}_n$ with undersmoothing. Note that standard inference methods in the nonparametric regression setup typically consider a singleton set $\mathcal{K}_n=\{K\}$ with $K\rightarrow\infty$ as $n\rightarrow\infty$. The rate restriction is mild because it only requires $\overline{K}/n^{1-2/q}\rightarrow 0$, up to $\log n$ terms, in case (a) and $\overline{K}/n\rightarrow 0$, up to $\log n$ terms, in case (b) when $\zeta_K\lesssim\sqrt{K}$, as for splines and wavelet series. Theorem 3.1 builds upon a coupling inequality for maxima of sums of random vectors in Chernozhukov, Chetverikov, and Kato (2014a) combined with the anti-concentration inequality in Chernozhukov, Chetverikov, and Kato (2014b).

Remark 3.1 (Undersmoothing assumption).

Note that (3.5) requires an undersmoothing assumption uniformly over $K\in\mathcal{K}_n$. Without $\max_{K\in\mathcal{K}_n}|\frac{\sqrt{n}r_n(K,x)}{V_n(K,x)^{1/2}}|=o(1)$, the coverage in (3.5) can be understood as uniform confidence intervals for the pseudo-true value $g(K,x)=P(K,x)'\beta_K$, i.e.,

P\big(g(K,x)\in[\widehat{g}_n(K,x)\pm\widehat{c}_{1-\alpha}(x)\sqrt{\widehat{V}_n(K,x)/n}\,],\quad K\in\mathcal{K}_n\big)=1-\alpha+o(1). \qquad (3.7)

However, a uniform undersmoothing condition is not assumed in (3.6); it only requires that the chosen $\widehat{K}\in\mathcal{K}_n$ satisfy the undersmoothing condition so that the asymptotic bias is negligible. This allows broader ranges of $K$ in $\mathcal{K}_n$, including the unknown MSE-optimal rate. We thus formally justify the rule-of-thumb methods for valid inference suggested in the literature, such as including an additional number of series terms, blowing up the number chosen by cross-validation, or “plug-in” methods for choosing $\widehat{K}$ as in Newey, Powell, and Vella (1999) and Newey (2013). Here, uniform (in $K\in\mathcal{K}_n$) inference accounts for the uncertainty from specification search by using critical values $\widehat{c}_{1-\alpha}(x)$ larger than the normal critical value $z_{1-\alpha/2}$.

Remark 3.2 (Other functionals).

Here, we focus on the leading example of $g_0(x)$ at some fixed point $x\in\mathcal{X}$; however, we can consider other linear functionals $a(g_0(\cdot))$, such as the regression derivative $a(g_0(x))=\frac{d}{dx}g_0(x)$. All the results in this paper can be applied to irregular (slower than $n^{1/2}$ rate) linear functionals using the estimators $a(\widehat{g}_n(K,x))=a_K(x)'\widehat{\beta}_K$ and an appropriate transformation of the basis, $a_K(x)=(a(p_1(x)),\cdots,a(p_K(x)))'$, with proper smoothness conditions on the functional and continuity conditions on the derivative as in Newey (1997). Although the verification of the previous results for regular ($n^{1/2}$ rate) functionals, such as integrals and weighted average derivatives, is beyond the scope of this paper, we examine similar results for the partially linear model setup in Section 5.

4 Uniform Inference

This section provides the construction of uniform confidence bands for $g_0(x)$ (uniform in $K\in\mathcal{K}_n$) given in (1.4). We define the following empirical process

\widehat{T}_n(K,x)\equiv\frac{\sqrt{n}(\widehat{g}_n(K,x)-g_0(x))}{\widehat{V}_n(K,x)^{1/2}} \qquad (4.1)

over $\mathcal{K}_n\times\mathcal{X}$, and we show below that the supremum of the empirical process, $\sup_{(K,x)\in\mathcal{K}_n\times\mathcal{X}}|\widehat{T}_n(K,x)|$, can be approximated by a sequence of random variables $\sup_{(K,x)\in\mathcal{K}_n\times\mathcal{X}}|Z_n(K,x)|$, where $Z_n(K,x)$ is a tight Gaussian random process in $\ell^\infty(\mathcal{K}_n\times\mathcal{X})$ with zero mean and covariance function

E[Z_n(K,x)Z_n(K',x')]=\frac{P(K,x)'Q_K^{-1}\Omega_{K,K'}Q_{K'}^{-1}P(K',x')}{V_n(K,x)^{1/2}V_n(K',x')^{1/2}}. \qquad (4.2)

Although the Gaussian approximation is an important first step, the covariance function (4.2) is generally difficult to construct for the purpose of uniform inference. Thus, we employ weighted bootstrap methods similar to Belloni et al. (2015) and show the validity of the bootstrap procedure for uniform confidence bands.

Let $e_1,\ldots,e_n$ be a sequence of i.i.d. standard exponential random variables that are independent of $X^n=\{x_1,\ldots,x_n\}$. For $(K,x)\in\mathcal{K}_n\times\mathcal{X}$, we define a (centered) weighted bootstrap process

\widehat{T}_n^e(K,x)=\frac{\sqrt{n}(\widehat{g}_n^e(K,x)-\widehat{g}_n(K,x))}{\widehat{V}_n(K,x)^{1/2}} \qquad (4.3)

where $\widehat{g}_n^e(K,x)=P(K,x)'\widehat{\beta}_K^e$, and $\widehat{\beta}_K^e$ is obtained by the following weighted least squares regression:

\widehat{\beta}_K^e=\operatorname*{arg\,min}_{\beta\in\mathbb{R}^K}\sum_{i=1}^n e_i(y_i-P(K,x_i)'\beta)^2. \qquad (4.4)

Define the critical value

\widehat{c}_{1-\alpha}\equiv(1-\alpha)\text{ conditional quantile of }\sup_{K\in\mathcal{K}_n,x\in\mathcal{X}}|\widehat{T}_n^e(K,x)|\text{ given the data }X^n, \qquad (4.5)

and we consider confidence bands of the form

[\widehat{g}_n(K,x)\pm\widehat{c}_{1-\alpha}\sqrt{\widehat{V}_n(K,x)/n}\,],\quad K\in\mathcal{K}_n,\ x\in\mathcal{X}. \qquad (4.6)

To establish the validity of the bootstrap critical values and the confidence bands in (4.6), we show below that the conditional distribution of $\sup_{(K,x)\in\mathcal{K}_n\times\mathcal{X}}|\widehat{T}_n^e(K,x)|$ is “close” to the distribution of $\sup_{(K,x)\in\mathcal{K}_n\times\mathcal{X}}|Z_n(K,x)|$ and to that of $\sup_{(K,x)\in\mathcal{K}_n\times\mathcal{X}}|\widehat{T}_n(K,x)|$, using coupling inequalities for the supremum of the empirical process and the bootstrap process as in Chernozhukov et al. (2016). Then, similar to Theorem 3.1, this gives bounds on the Kolmogorov distance between the distribution functions $P(\sup_{K\in\mathcal{K}_n,x\in\mathcal{X}}|\widehat{T}_n(K,x)|\leq u)$ and $P(\sup_{K\in\mathcal{K}_n,x\in\mathcal{X}}|\widehat{T}_n^e(K,x)|\leq u|X^n)$.
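
A sketch of the weighted bootstrap critical value in (4.5), reusing the hypothetical spline_basis helper from the Section 2 sketch and evaluating the sup over a finite grid of $x$ values (both are our simplifying assumptions):

```python
import numpy as np

def grid_fit(y, x, K, x_grid):
    """g_hat(K, x) and se(K, x) = sqrt(V_hat(K, x)/n) on a grid, using the
    sandwich variance from (2.2)."""
    n = len(y)
    P = spline_basis(x, K)
    beta = np.linalg.lstsq(P, y, rcond=None)[0]
    eps = y - P @ beta
    Q, Omega = P.T @ P / n, (P * eps[:, None] ** 2).T @ P / n
    Pg = spline_basis(x_grid, K)
    A = np.linalg.solve(Q, Pg.T)               # Q_hat^{-1} P(K, x) per grid x
    V = np.einsum('kg,kg->g', A, Omega @ A)    # diagonal of A' Omega A
    return Pg @ beta, np.sqrt(V / n)

def band_critical_value(y, x, K_set, x_grid, alpha=0.05, B=1000, seed=0):
    """Weighted-bootstrap c_hat_{1-alpha} in (4.5): i.i.d. Exp(1) weights e_i,
    a weighted LS refit (4.4) for each K, and the sup over (K, x) of the
    centered process |T_hat^e(K, x)| in (4.3)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    base = {K: grid_fit(y, x, K, x_grid) for K in K_set}
    sups = np.empty(B)
    for b in range(B):
        w = np.sqrt(rng.standard_exponential(n))   # sqrt of bootstrap weights
        sup_b = 0.0
        for K in K_set:
            P = spline_basis(x, K)
            beta_e = np.linalg.lstsq(w[:, None] * P, w * y, rcond=None)[0]
            g_hat, se = base[K]
            g_e = spline_basis(x_grid, K) @ beta_e
            sup_b = max(sup_b, np.max(np.abs(g_e - g_hat) / se))
        sups[b] = sup_b
    return np.quantile(sups, 1 - alpha)
```

The band (4.6) is then $\widehat{g}_n(K,x)$ plus or minus the returned quantile times $\sqrt{\widehat{V}_n(K,x)/n}$ at each grid point.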

The following assumptions are used to establish the coverage probability of the confidence bands uniformly over $K\in\mathcal{K}_n$. Define $\alpha(K,x)\equiv Q_K^{-1/2}P(K,x)/V_n(K,x)^{1/2}$ and

\zeta^{L_1}=\max_{K\in\mathcal{K}_n}\sup_{x,x'\in\mathcal{X},x\neq x'}\frac{\|\alpha(K,x)-\alpha(K,x')\|}{\|x-x'\|},\qquad\zeta^{L_2}=\sup_{x\in\mathcal{X}}\max_{K,K'\in\mathcal{K}_n:K\neq K'}\frac{\|\alpha(K,x)-\alpha(K',x)\|}{|K-K'|}.
Assumption 4.1.

(Regularity conditions - uniform inference)

(i) $\sup_{x\in\mathcal{X}}E[|\varepsilon_i|^q|x_i=x]<\infty$ for $q\geq 4$ and $\inf_{x\in\mathcal{X}}E(\varepsilon_i^2|x_i=x)>0$.

(ii) $\max_{K\in\mathcal{K}_n}\sqrt{\frac{\lambda_K^2\zeta_K^2\log K\log^4 n}{n}}(n^{1/q}+\ell_K c_K\sqrt{K})+(\ell_K c_K)\log n\rightarrow 0$ as $n\rightarrow\infty$.

(iii) $\log(\zeta^{L_1}\vee\zeta^{L_2})\lesssim\log n$, $\max_K\zeta_K^{2q/(q-2)}\log^3 n/n\lesssim 1$, and $\max_K\zeta_K\lesssim\log n$.

(iv) $\sup_{(K,x)\in\mathcal{K}_n\times\mathcal{X}}|\frac{V_n(K,x)}{\widehat{V}_n(K,x)}-1|=o_p(1/\log^2 n)$.

For uniform inference, we require conditions similar to, but slightly stronger than, Assumption 3.2. We also impose mild rate restrictions on $\zeta^{L_1}$, $\zeta^{L_2}$, and $\max_{K\in\mathcal{K}_n}\zeta_K$ similar to Chernozhukov et al. (2014a) and Belloni et al. (2015).

Theorem 4.1.

Suppose that Assumptions 2.1, 3.1, and 4.1 hold, and $(\max_K\zeta_K)\log^{2+1/(2q)}n/n^{1/2-1/q}\rightarrow 0$, $(\max_K\zeta_K)^2\log^7 n/n\rightarrow 0$ as $n\rightarrow\infty$. If, in addition, we assume that $\sup_{(K,x)\in\mathcal{K}_n\times\mathcal{X}}|\frac{\sqrt{n}r_n(K,x)}{V_n(K,x)^{1/2}}|=o(1/\sqrt{\log n})$, then

P\big(g_0(x)\in[\widehat{g}_n(K,x)\pm\widehat{c}_{1-\alpha}\sqrt{\widehat{V}_n(K,x)/n}\,],\quad K\in\mathcal{K}_n,\ x\in\mathcal{X}\big)=1-\alpha+o(1) \qquad (4.7)

with the critical value $\widehat{c}_{1-\alpha}$ in (4.5).

Alternatively, if we assume $\sup_{x\in\mathcal{X}}|\frac{\sqrt{n}r_n(\widehat{K},x)}{V_n(\widehat{K},x)^{1/2}}|=o(1/\sqrt{\log n})$ with $\widehat{K}\in\mathcal{K}_n$, then the following coverage property holds:

\liminf_{n\rightarrow\infty}P\big(g_0(x)\in[\widehat{g}_n(\widehat{K},x)\pm\widehat{c}_{1-\alpha}\sqrt{\widehat{V}_n(\widehat{K},x)/n}\,],\quad x\in\mathcal{X}\big)\geq 1-\alpha. \qquad (4.8)

Theorem 4.1 shows the asymptotic coverage property of the confidence bands defined in (4.6), uniformly over $K\in\mathcal{K}_n$. Furthermore, it shows that a confidence band with possibly data-dependent $\widehat{K}\in\mathcal{K}_n$ has asymptotic coverage of at least $1-\alpha$. The confidence band constructed in (4.8) requires a substantially weaker undersmoothing assumption, similar to Theorem 3.1.

5 Extension: Partially Linear Model

In this section, we provide inference methods for the partially linear model (PLM) setup. For notational simplicity, we use notation similar to that of the nonparametric regression setup. Suppose we observe a random sample $\{y_i,w_i,x_i\}_{i=1}^n$, where $y_i$ is the scalar response variable, $w_i\in\mathcal{W}\subset\mathbb{R}$ is the treatment/policy variable of interest, and $x_i\in\mathcal{X}\subset\mathbb{R}^{d_x}$ is a set of explanatory variables. For simplicity, we assume that $w_i$ is a scalar. We consider the model

y_i=\theta_0 w_i+g_0(x_i)+\varepsilon_i,\qquad E(\varepsilon_i|w_i,x_i)=0. \qquad (5.1)

We are interested in inference on $\theta_0$ after approximating the unknown function $g_0(x)$ by series terms/regressors $p(x_i)$ among a set of potential control variables. Specification searches can be performed over the number of approximating terms or over the number of covariates used in estimating the nonparametric part.

The series estimator $\widehat{\theta}_n(K)$ of $\theta_0$ using the first $K=K_n$ terms is obtained by standard LS estimation of $y_i$ on $w_i$ and $P_{Ki}=P(K,x_i)$ and has the usual “partialling out” formula

\widehat{\theta}_n(K)=(W'M_K W)^{-1}W'M_K Y \qquad (5.2)

where $W=(w_1,\cdots,w_n)'$, $M_K=I_n-P^K(P^{K\prime}P^K)^{-1}P^{K\prime}$, $P^K=[P_{K1},\cdots,P_{Kn}]'$, and $Y=(y_1,\cdots,y_n)'$. The asymptotic normality of $\widehat{\theta}_n(K)$ and valid inference have been developed in the literature.\footnote{See also Robinson (1988), Linton (1995), and references therein for results on kernel estimators.} Donald and Newey (1994) derived the asymptotic normality of $\widehat{\theta}_n(K)$ under the standard rate condition $K/n\rightarrow 0$. Belloni, Chernozhukov, and Hansen (2014) analyzed asymptotic normality and uniformly valid inference for the post-double-selection estimator even when $K$ is much larger than $n$ (see also Kozbur (2018)). Recent papers by Cattaneo, Jansson, and Newey (2018a, 2018b) provided a valid approximation theory for $\widehat{\theta}_n(K)$ when $K$ grows at the same rate as $n$.
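
A minimal partialling-out sketch of (5.2), again assuming the spline_basis helper from the Section 2 sketch for the nonparametric part:

```python
import numpy as np

def plm_theta(y, w, x, K):
    """Partialling-out form of (5.2): residualize y and w on P(K, x) by LS,
    then regress the residualized y on the residualized w."""
    P = spline_basis(x, K)
    w_res = w - P @ np.linalg.lstsq(P, w, rcond=None)[0]   # M_K W
    y_res = y - P @ np.linalg.lstsq(P, y, rcond=None)[0]   # M_K Y
    return (w_res @ y_res) / (w_res @ w_res)               # theta_hat_n(K)
```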

A different approximation theory using a faster rate of $K$ ($K/n\rightarrow c$, $0<c<1$) than the standard rate condition ($K/n\rightarrow 0$) is particularly useful for our purpose of establishing the asymptotic distribution of the t-statistics over $K\in\mathcal{K}_n$. From the results in Cattaneo, Jansson, and Newey (2018a), we have the following decomposition:

\sqrt{n}(\widehat{\theta}_n(K)-\theta_0)=\Big(\frac{1}{n}W'M_K W\Big)^{-1}\frac{1}{\sqrt{n}}W'M_K(Y-W\theta_0)
=\widehat{\Gamma}_n(K)^{-1}\Big(\frac{1}{\sqrt{n}}\sum_{i=1}^n v_i M_{K,ii}\varepsilon_i+\frac{1}{\sqrt{n}}\sum_{i=1}^n\sum_{j=1,j\neq i}^n v_i M_{K,ij}\varepsilon_j\Big)+o_p(1) \qquad (5.3)

where $v_i\equiv w_i-g_{w0}(x_i)$, $g_{w0}(x_i)\equiv E[w_i|x_i]$, and $\widehat{\Gamma}_n(K)=W'M_K W/n$. For any deterministic sequence $K\rightarrow\infty$ satisfying the standard rate condition $K/n\rightarrow 0$, $\sqrt{n}(\widehat{\theta}_n(K)-\theta_0)$ is asymptotically normal with variance $V=\Gamma^{-1}\Omega\Gamma^{-1}$, $\Gamma=E[v_iv_i']$, $\Omega=E[v_iv_i'\varepsilon_i^2]$. Unlike the nonparametric object of interest in the fully nonparametric model, where the variance term increases with $K$, $\widehat{\theta}_n(K)$ has a parametric ($n^{1/2}$) convergence rate, and the estimators $\widehat{\theta}_n(K)$ for all different sequences of $K$ are asymptotically equivalent under $K/n\rightarrow 0$.\footnote{This is also related to the well-known results on two-step semiparametric estimation; the asymptotic variance of two-step semiparametric estimators does not depend on the type of the first-step estimator or the smoothing parameter sequence under certain conditions (see Newey (1994b)).} However, under the faster rate condition $K/n\rightarrow c$ for $0<c<1$, the second term in (5.3) is not negligible and converges to a bounded random variable. Cattaneo, Jansson, and Newey (2018a) apply a central limit theorem for degenerate U-statistics to the second term, similar to the many-instrument asymptotics analyzed in Chao, Swanson, Hausman, Newey, and Woutersen (2012). The limiting normal distribution then has a larger variance than the standard first-order asymptotic variance, and the adjusted variances generally depend on the number of terms $K$, so that we can provide an asymptotic distribution of the t-statistics across different sequences of $K$ over $\mathcal{K}_n$.

The following assumption on $\mathcal{K}_n$ is considered, and we impose the regularity conditions used in Cattaneo, Jansson, and Newey (2018a, Assumption PLM) uniformly over $K\in\mathcal{K}_n$.

Assumption 5.1.

(Set of finite number of series terms)

Assume $\mathcal{K}_n=\{\underline{K}\equiv K_1,\cdots,K_m,\cdots,\overline{K}\equiv K_p\}$, where $K_m\rightarrow\infty$ and $K_m/n\rightarrow c_m$ as $n\rightarrow\infty$ for all $m=1,\ldots,p$, with constants $c_m$ such that $0<c_1<c_2<\cdots<c_p<1$, and with $p$ fixed.

Assumption 5.2.

(Regularity conditions - partially linear model)

(i) $\{y_i,w_i,x_i\}_{i=1}^n$ are i.i.d. random variables satisfying the model (5.1).

(ii) There exist constants $0<c\leq C<\infty$ such that $E[\varepsilon_i^2|w_i,x_i]\geq c$, $E[v_i^2|x_i]\geq c$, $E[\varepsilon_i^4|w_i,x_i]\leq C$, and $E[v_i^4|x_i]\leq C$.

(iii) $\mathrm{rank}(P^K)=K$ (a.s.) and $M_{K,ii}\geq C$ for some $C>0$ for all $K\in\mathcal{K}_n$.

(iv) For each $K\in\mathcal{K}_n$, there exist some $\gamma_g,\gamma_{g_w}$ such that

\min_{\eta_g}E[(g_0(x_i)-\eta_g'P_{Ki})^2]=O(K^{-2\gamma_g}),\qquad\min_{\eta_{g_w}}E[(g_{w0}(x_i)-\eta_{g_w}'P_{Ki})^2]=O(K^{-2\gamma_{g_w}}).

Assumption 5.2 does not require $K/n\rightarrow 0$, which is imposed to obtain asymptotic normality in the literature (e.g., Donald and Newey (1994)). Similar to Assumption 3.1(ii) in the nonparametric setup, Assumption 5.2(iv) holds for polynomial and spline bases. For example, 5.2(iv) holds with $\gamma_g=s_g/d_x$ and $\gamma_{g_w}=s_w/d_x$ when $\mathcal{X}$ is compact and the unknown functions $g_0(x)$ and $g_{w0}(x)$ have $s_g$ and $s_w$ continuous derivatives, respectively.

Under Assumptions 5.1 and 5.2 and the undersmoothing condition $n\overline{K}^{-2(\gamma_g+\gamma_{g_w})}\rightarrow 0$, we have a joint asymptotic distribution of the t-statistics $T_n(K,\theta_0)=\sqrt{n}V_n(K)^{-1/2}(\widehat{\theta}_n(K)-\theta_0)$ over $K\in\mathcal{K}_n$:

(T_n(K_1,\theta_0),\cdots,T_n(K_p,\theta_0))'\overset{d}{\longrightarrow}Z_\Sigma=(Z_1,\cdots,Z_p)'\sim N(0,\Sigma)

where

V_n(K)=\Gamma_n(K)^{-1}\Omega_n(K)\Gamma_n(K)^{-1},\qquad\Gamma_n(K)=\frac{1}{n}\sum_{i=1}^n M_{K,ii}E[v_i^2|x_i],\qquad\Omega_n(K)=\frac{1}{n}\sum_{i=1}^n\sum_{j=1}^n M_{K,ij}^2 E[v_i^2\varepsilon_j^2|x_i,x_j],

and the variance-covariance matrix $\Sigma$ has $(l,l')$ element

\Sigma(l,l')\equiv\lim_{n\rightarrow\infty}\frac{V_n(K_l,K_{l'})}{V_n(K_l)^{1/2}V_n(K_{l'})^{1/2}},\qquad V_n(K_l,K_{l'})=\Gamma_n(K_l)^{-1}\Omega_n(K_l,K_{l'})\Gamma_n(K_{l'})^{-1},
\Omega_n(K_l,K_{l'})=\frac{1}{n}\sum_{i=1}^n\sum_{j=1}^n M_{K_l,ij}M_{K_{l'},ij}E[v_i^2\varepsilon_j^2|x_i,x_j], \qquad (5.4)

for $l,l'=1,\ldots,p$. Then, we can similarly define critical values as in (3.3) to construct confidence intervals for $\theta_0$ uniform in $K\in\mathcal{K}_n$, analogous to the nonparametric setup. Let

\widehat{c}_{1-\alpha}\equiv(1-\alpha)\text{ quantile of }\max_{m=1,\ldots,p}|\widehat{Z}_m|,\qquad\widehat{Z}_\Sigma=(\widehat{Z}_1,\ldots,\widehat{Z}_p)'\sim N(0,\widehat{\Sigma}_n) \qquad (5.5)

where $\widehat{\Sigma}_n$ is a consistent estimator of the unknown $\Sigma$ defined in (5.4).

Theorem 5.1 is the main result for the partially linear model setup and provides asymptotic coverage results for the CIs uniform in $K\in\mathcal{K}_n$, analogous to the nonparametric setup in Section 3.

Theorem 5.1.

Suppose that Assumptions 5.1 and 5.2 hold. In addition, assume that $n\overline{K}^{-2(\gamma_g+\gamma_{g_w})}\rightarrow 0$ and $\max_{K,K'\in\mathcal{K}_n}|\frac{\widehat{V}_n(K,K')}{V_n(K,K')}-1|=o_p(1)$ as $n,K\rightarrow\infty$. Then,

\lim_{n\rightarrow\infty}P\big(\theta_0\in[\widehat{\theta}_n(K)\pm\widehat{c}_{1-\alpha}\sqrt{\widehat{V}_n(K)/n}\,],\quad\forall K\in\mathcal{K}_n\big)=1-\alpha, \qquad (5.6)
\liminf_{n\rightarrow\infty}P\big(\theta_0\in[\widehat{\theta}_n(\widehat{K})\pm\widehat{c}_{1-\alpha}\sqrt{\widehat{V}_n(\widehat{K})/n}\,]\big)\geq 1-\alpha,\quad\widehat{K}\in\mathcal{K}_n, \qquad (5.7)

where the critical value $\widehat{c}_{1-\alpha}$ is defined in (5.5).

Remark 5.1.

Note that the construction of the CIs requires consistent estimation of the variance $\Omega_n(K)$. As discussed in Cattaneo, Jansson, and Newey (2018a, 2018b), constructing a heteroskedasticity-robust estimator of $\Omega_n(K)$ under $K/n\rightarrow c>0$ is challenging, and the Eicker-Huber-White-type variance estimator generally requires $K/n\rightarrow 0$ for consistency. Cattaneo, Jansson, and Newey (2018b) consider the following standard error formula:

\widehat{\Omega}_n(K,\kappa_n)=\frac{1}{n}\sum_{i=1}^n\sum_{j=1}^n\kappa_{ij}\hat{v}_{K,i}^2\hat{\varepsilon}_{K,j}^2 \qquad (5.8)

where $\hat{v}_K=M_K W$, $\hat{\varepsilon}_K=M_K(Y-W\widehat{\theta}_n(K))$, and $\kappa_n$ is a symmetric matrix with $(i,j)$ element $\kappa_{ij}$. Cattaneo, Jansson, and Newey (2018b) show that $\widehat{\Omega}_n(K,\kappa_n)$ is consistent even under heteroskedasticity and $K/n\rightarrow c>0$ for a certain choice of $\kappa_n$ and provide a sufficient condition for consistency. See Theorems 3 and 4 of Cattaneo, Jansson, and Newey (2018b) for further discussion.

6 Simulations

This section investigates the small sample performance of the proposed inference methods. We report the empirical coverage and the average length of the confidence intervals/confidence bands considered in Sections 3 and 4 with various simulation setups.

We consider the following data generating process:

y_i=g(x_i)+\varepsilon_i,\qquad x_i=\Phi(x_i^*),\qquad\begin{pmatrix}x_i^*\\ \varepsilon_i\end{pmatrix}\sim N\bigg(\begin{pmatrix}0\\ 0\end{pmatrix},\begin{pmatrix}1&0\\ 0&\sigma^2(x_i^*)\end{pmatrix}\bigg)

where $\Phi(\cdot)$ is the standard normal cumulative distribution function, used to ensure compact support, and $\sigma^2(x_i^*)=((1+2x_i^*)/2)^2$ (heteroskedastic). We investigate the following three functions for $g(x)$: $g_1(x)=\ln(|6x-3|+1)\mathrm{sgn}(x-1/2)$, $g_2(x)=\frac{\sin(7\pi x/2)}{1+2x^2(\mathrm{sgn}(x)+1)}$, and $g_3(x)=x-1/2+5\phi(10(x-1/2))$, where $\phi(\cdot)$ is the standard normal probability density function and $\mathrm{sgn}(\cdot)$ is the sign function. $g_1(x)$ is used in Newey and Powell (2003) as well as in Chen and Christensen (2018). $g_2(x)$ and $g_3(x)$ are rescaled versions of functions used in Hall and Horowitz (2013). See Figure 1 for the shapes of all three functions on $[0,1]$. For all simulation results below, we generate 2000 simulation replications for each design with sample size $n=200$.
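
For replication purposes, a sketch of this data generating process (SciPy is used only for $\Phi$ and $\phi$; the function name is ours):

```python
import numpy as np
from scipy.stats import norm

def simulate(n, model=1, heteroskedastic=True, seed=0):
    """One draw from the Section 6 design: x = Phi(x*), x* ~ N(0, 1), and
    errors with conditional standard deviation |1 + 2x*|/2."""
    rng = np.random.default_rng(seed)
    x_star = rng.standard_normal(n)
    x = norm.cdf(x_star)                          # compact support on [0, 1]
    sd = np.abs(1 + 2 * x_star) / 2 if heteroskedastic else np.ones(n)
    eps = sd * rng.standard_normal(n)
    if model == 1:
        g = np.log(np.abs(6 * x - 3) + 1) * np.sign(x - 0.5)
    elif model == 2:
        g = np.sin(7 * np.pi * x / 2) / (1 + 2 * x**2 * (np.sign(x) + 1))
    else:
        g = x - 0.5 + 5 * norm.pdf(10 * (x - 0.5))
    return x, g + eps
```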

We report results for quadratic splines with evenly placed knots, where the number of knots K is selected from \mathcal{K}_n = \{6,7,\ldots,12\}, obtained by setting \underline{K} = 2n^{1/5} and \overline{K} = 2n^{1/3} and rounding up to the nearest integer. We then calculate the pointwise coverage rate (COV) and the average length (AL) of various 95% nominal CIs, as well as analogous uniform CBs, over grid points of x on the support \mathcal{X} = [0.05, 0.95]. To calculate critical values, 1000 additional Monte Carlo or bootstrap replications are performed in each simulation iteration. In addition, we investigate results for homoskedastic errors (\sigma^2(x_i^*) = 1), different sample sizes n = \{100, 500\}, polynomial regressions, and different specifications as in Cattaneo and Farrell (2013) with multivariate and non-normal regressors; the results show qualitatively similar patterns and hence are not reported here for brevity. Additional simulation results are reported in the Online Supplementary Material.
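The design matrix and the set \mathcal{K}_n can be built as follows (a sketch; the truncated-power form is one of several equivalent parametrizations of a quadratic spline with evenly placed knots, and the helper name is an illustrative choice):

```python
import numpy as np

def spline_basis(x, num_knots, degree=2):
    """Quadratic spline with `num_knots` evenly placed interior knots on [0,1],
    in truncated-power form: 1, x, x^2, (x - t_1)_+^2, ..., (x - t_K)_+^2."""
    knots = np.linspace(0, 1, num_knots + 2)[1:-1]           # interior knots
    powers = np.vander(x, degree + 1, increasing=True)       # 1, x, x^2
    trunc = np.clip(x[:, None] - knots[None, :], 0, None) ** degree
    return np.hstack([powers, trunc])                        # n x (degree + 1 + num_knots)

n = 200
K_lo = int(np.ceil(2 * n ** (1 / 5)))                        # = 6 for n = 200
K_hi = int(np.ceil(2 * n ** (1 / 3)))                        # = 12 for n = 200
K_grid = range(K_lo, K_hi + 1)                               # K_n = {6, 7, ..., 12}
```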

Table 1 reports the coverage and average length of the following nominal 95% pointwise CIs at x = 0.2, 0.5, 0.8, 0.9: (1) the standard CI in (2.5) with \widehat{K}_{\texttt{cv}} \in \mathcal{K}_n selected to minimize the leave-one-out cross-validation criterion; (2) the robust CI in (3.6) with \widehat{K}_{\texttt{cv}}, using the critical value \widehat{c}_{1-\alpha}(x); (3) the robust CI using \widehat{K}_{\texttt{cv+}} = \widehat{K}_{\texttt{cv}} + 2. Analogous uniform inference results for CBs are also reported. The critical values \widehat{c}_{1-\alpha}(x) and \widehat{c}_{1-\alpha} are constructed using the Monte Carlo method and the weighted bootstrap method, respectively.
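Both steps can be sketched in a few lines (reusing the spline_basis helper above; the plug-in covariance estimator used to simulate the joint limit of the t-statistics is an assumption in the spirit of the construction in Section 3, not the paper's exact formula):

```python
import numpy as np

def loo_cv_select(x, y, K_grid, basis):
    """K_hat_cv: minimize leave-one-out CV via the hat-matrix shortcut
    CV(K) = n^{-1} sum_i [(y_i - g_hat_K(x_i)) / (1 - h_ii(K))]^2."""
    scores = {}
    for K in K_grid:
        P = basis(x, K)
        H = P @ np.linalg.solve(P.T @ P, P.T)
        resid = y - H @ y
        scores[K] = np.mean((resid / (1 - np.diag(H))) ** 2)
    return min(scores, key=scores.get)

def mc_critical_value(x0, x, eps_hat, K_grid, basis, alpha=0.05, S=1000, seed=0):
    """c_hat_{1-alpha}(x0): (1-alpha) quantile of max_K |Z_K|, where Z has the
    estimated correlation of g_hat_K(x0) across K (plug-in scores a_{K,i} * eps_i)."""
    rng = np.random.default_rng(seed)
    scores = []
    for K in K_grid:
        P = basis(x, K)
        aK = np.linalg.solve(P.T @ P, basis(np.array([x0]), K).ravel()) @ P.T
        scores.append(aK * eps_hat)              # influence of each obs. on g_hat_K(x0)
    A = np.vstack(scores)
    Sigma = A @ A.T                              # estimated covariance across K
    d = np.sqrt(np.diag(Sigma))
    Z = rng.multivariate_normal(np.zeros(len(d)), Sigma / np.outer(d, d), size=S)
    return np.quantile(np.abs(Z).max(axis=1), 1 - alpha)
```

In practice, eps_hat would be residuals from, say, the largest model \overline{K}; this is an implementation choice, not part of the theory.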

Overall, we find that the coverage of the standard CI with \widehat{K}_{\texttt{cv}} is far below 95% over the support, although it has the shortest length. In contrast, the coverage of the robust CIs based on \widehat{K}_{\texttt{cv}} or \widehat{K}_{\texttt{cv+}} with \widehat{c}_{1-\alpha}(x) is close to or above 95% and performs well across the different simulation designs, consistent with the theoretical results in Theorem 3.1. Using the undersmoothed \widehat{K}_{\texttt{cv+}} (more terms than cross-validation selects) works quite well at most points and for highly nonlinear designs with relatively large bias, e.g., Model 3 (g_3(x)) at x = 0.5.777The possibly poor coverage of standard kernel-based CIs for g_3(x) at the single peak (x = 0.5) was also described in Hall and Horowitz (2013, Figure 3). Uniform coverage rates of confidence bands with selected K appear conservative; this is due to the large critical values from the weighted bootstrap, which must be uniform in both K \in \mathcal{K}_n and x \in \mathcal{X}, including boundary points.
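A weighted bootstrap sketch for the uniform critical value \widehat{c}_{1-\alpha} (again reusing spline_basis; the i.i.d. standard exponential multipliers follow the description of \widehat{\beta}_K^e in Section 4, while the plug-in standard errors are an illustrative assumption) is:

```python
import numpy as np

def bootstrap_band_cv(x, y, K_grid, grid_x, basis, B=1000, alpha=0.05, seed=0):
    """(1-alpha) quantile of sup_{K,x} |g_hat_e(K,x) - g_hat(K,x)| / se(K,x)
    over exponential-multiplier draws e_1, ..., e_n."""
    rng = np.random.default_rng(seed)
    n = len(y)
    Ps, fits, ses = {}, {}, {}
    for K in K_grid:
        P = basis(x, K)
        b = np.linalg.solve(P.T @ P, P.T @ y)
        e_hat = y - P @ b
        A = np.linalg.solve(P.T @ P, basis(grid_x, K).T).T @ P.T   # grid x n influence
        Ps[K], fits[K] = P, basis(grid_x, K) @ b
        ses[K] = np.sqrt(((A * e_hat) ** 2).sum(axis=1))           # robust plug-in SEs
    sups = np.empty(B)
    for r in range(B):
        e = rng.exponential(size=n)                                # E[e_i] = Var(e_i) = 1
        stats = []
        for K in K_grid:
            P = Ps[K]
            be = np.linalg.solve(P.T @ (P * e[:, None]), P.T @ (e * y))  # weighted LS
            stats.append(np.abs(basis(grid_x, K) @ be - fits[K]) / ses[K])
        sups[r] = np.max(stats)
    return np.quantile(sups, 1 - alpha)
```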

7 Empirical application

In this section, we illustrate the proposed inference procedures by revisiting Blomquist and Newey (2002). Understanding how tax policy affects individual labor supply has been a central issue in labor economics (see Hausman (1985) and Blundell and MaCurdy (1999), among many others). Blomquist and Newey (2002) estimate the conditional mean of hours of work given individual nonlinear budget sets using nonparametric series estimation. They also estimate the wage elasticity of the expected labor supply and find evidence of possible misspecification of the usual parametric models, such as those estimated by maximum likelihood (MLE).

Specifically, Blomquist and Newey (2002) consider the following model by exploiting an additive structure from the utility maximization with piecewise linear budget sets:

hi\displaystyle h_{i} =g(xi)+εi,E(εi|xi)=0,\displaystyle=g(x_{i})+\varepsilon_{i},\quad E(\varepsilon_{i}|x_{i})=0, (7.1)
g(xi)\displaystyle g(x_{i}) =g1(yJ,wJ)+j=1J1[g2(yj,wj,j)g2(yj+1,wj+1,j)],\displaystyle=g_{1}(y_{J},w_{J})+\sum_{j=1}^{J-1}[g_{2}(y_{j},w_{j},\ell_{j})-g_{2}(y_{j+1},w_{j+1},\ell_{j})], (7.2)

where hih_{i} is the hours worked of the iith individual and xi=(y1,,yJ,w1,,wJ,1,,J,J)x_{i}=(y_{1},\cdots,y_{J},w_{1},\cdots,w_{J},\ell_{1},\cdots,\ell_{J},J) is the budget set, which can be represented by the intercept yjy_{j} (non-labor income), slope wjw_{j} (marginal wage rates) and the end point j\ell_{j} of the jjth segment in a piecewise linear budget with JJ segments. Equation (7.2) for the conditional mean function follows from Theorem 2.1 of Blomquist and Newey (2002), and this additive structure substantially reduces the dimensionality issues. To approximate g(x)g(x), they consider the power series, pk(x)=(yJp1(k)wJq1(k),j=1J1jm(k)(yjp2(k)wjq2(k)yj+1p2(k)wj+1q2(k))),p_{k}(x)=(y_{J}^{p_{1}(k)}w_{J}^{q_{1}(k)},\sum_{j=1}^{J-1}\ell_{j}^{m(k)}(y_{j}^{p_{2}(k)}w_{j}^{q_{2}(k)}-y_{j+1}^{p_{2}(k)}w_{j+1}^{q_{2}(k)})), p2(k)+q2(k)1p_{2}(k)+q_{2}(k)\geq 1.

From the Swedish “Level of Living” survey in 1973, 1980 and 1990, they pool the data from the three waves and use the observations for married or cohabiting men aged 20-60. Changes in the tax system over the three periods generate large variation in the budget sets. The sample size is n = 2321. See Section 5 of Blomquist and Newey (2002) for more detailed descriptions. They estimate the wage elasticity of the expected labor supply

E_w = (\bar{w}/\bar{h})\Big[\frac{\partial g(w,\cdots,w,\bar{y},\cdots,\bar{y})}{\partial w}\Big]\Big|_{w=\bar{w}}, (7.3)

which is the regression derivative of g(x)g(x) evaluated at the mean of the net wage rates w¯\bar{w}, virtual income y¯\bar{y} and level of hours h¯\bar{h}.
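Given estimated series coefficients, (7.3) is the derivative of the fitted conditional mean in the common wage argument, scaled by \bar{w}/\bar{h}; a central-difference sketch (basis_fn is a hypothetical helper that evaluates the series terms at a budget set with constant slope w and virtual income \bar{y}) is:

```python
def wage_elasticity(beta_hat, basis_fn, w_bar, y_bar, h_bar, J, dw=1e-5):
    """Numerical version of (7.3): E_w = (w_bar / h_bar) * dg/dw at w = w_bar."""
    g = lambda w: basis_fn(w, y_bar, J) @ beta_hat   # fitted g(w, ..., w, y_bar, ..., y_bar)
    dg_dw = (g(w_bar + dw) - g(w_bar - dw)) / (2 * dw)
    return (w_bar / h_bar) * dg_dw
```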

Table 2 reproduces Table 1 of Blomquist and Newey (2002). They report estimates \widehat{E}_w and standard errors SE_{\widehat{E}_w} for specifications that add series terms sequentially. For example, the estimates in the second row use the terms in the first row, (1, y_J, w_J), together with the additional terms (\Delta y, \Delta w). Here, \ell^m \Delta y^p w^q denotes approximating the term \sum_j \ell_j^m (y_j^p w_j^q - y_{j+1}^p w_{j+1}^q). Blomquist and Newey (2002) also report a cross-validation criterion, CV, for each specification; series terms are chosen to maximize CV, which minimizes the asymptotic MSE. In addition to their original table, we add the standard 95% CI for each specification, i.e., CI(K) = \widehat{E}_w(K) \pm 1.96 SE_{\widehat{E}_w}(K). From Table 2, it is unclear which of the larger models (K) should be used for inference, and we do not have compelling data-dependent methods for selecting one of the large K for the reported confidence interval. Here we want to construct CIs that are robust to such specification searches.

Figure 2 displays 95% CIs that are uniform over K_m \in \{K_1, K_2, \cdots, K_{11}\}, where K_m corresponds to each specification in Table 2 with an increasing number of series terms, along with the point estimates and the standard 95% confidence intervals.888It is straightforward to construct \widehat{c}_{1-\alpha}(x) using the covariance structure under homoskedastic errors; it only requires the estimated variances for different K \in \mathcal{K}_n, which are already reported in the table of Blomquist and Newey (2002). Based on 100,000 simulation repetitions, we obtain \widehat{c}_{1-\alpha}(x) = 2.503. From Figure 2, we reject a zero wage elasticity of labor supply for almost all models except \overline{K}. Table 2 also reports the robust confidence intervals CI_{\widehat{E}_w}^{\texttt{sup}}(K) = \widehat{E}_w(K) \pm \widehat{c}_{1-\alpha}(x) SE_{\widehat{E}_w}(K) with possibly data-dependent \widehat{K}, justified by Theorem 3.1 (eq. (3.6)). Note that cross-validation chooses \widehat{K}_{\texttt{cv}} = K_5; the standard CI with \widehat{K}_{\texttt{cv}} is [0.0247, 0.0839] and the robust CI is [0.0165, 0.0921]. Using \widehat{K}_{\texttt{cv+}} = K_6 or \widehat{K}_{\texttt{cv++}} = K_7 widens the standard CI, and the robust CIs are CI_{\widehat{E}_w}^{\texttt{sup}}(\widehat{K}_{\texttt{cv+}}) = [0.0166, 0.1152] and CI_{\widehat{E}_w}^{\texttt{sup}}(\widehat{K}_{\texttt{cv++}}) = [0.0070, 0.1186].
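The computation behind footnote 8 can be sketched as follows. Under homoskedastic errors and nested series spaces, Cov(\widehat{E}_w(K), \widehat{E}_w(K')) = Var(\widehat{E}_w(K \wedge K')) (an assumption consistent with the footnote's claim that only the estimated variances across K are needed), so the correlation matrix over the eleven specifications is built from the reported standard errors alone. The numbers below are illustrative placeholders, not the entries of Table 2.

```python
import numpy as np

rng = np.random.default_rng(0)
se = np.linspace(0.015, 0.060, 11)                # placeholders for SE(K_1) <= ... <= SE(K_11)
# Corr(K_j, K_k) = SE(K_min)^2 / (SE(K_j) * SE(K_k)) under the nested structure
R = np.minimum.outer(se, se) ** 2 / np.outer(se, se)
Z = rng.multivariate_normal(np.zeros(len(se)), R, size=100_000)
c_hat = np.quantile(np.abs(Z).max(axis=1), 0.95)  # plays the role of c_hat_{1-alpha}(x) = 2.503
```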

8 Conclusion

This paper considers nonparametric inference methods given specification searches over different numbers of series terms in the nonparametric series regression model. We provide methods for constructing uniform CIs and confidence bands by replacing the conventional normal critical value with a critical value based on the supremum of the t-statistics. The critical values can be constructed using simple Monte Carlo simulation or weighted bootstrap methods. We then extend the proposed CIs to the partially linear model setup. Finally, we investigate the finite sample properties of the proposed methods and illustrate the uniform CIs in an empirical example from Blomquist and Newey (2002).

While beyond the scope of this paper, there are some potential directions to extend the results established here. First, investigating the coverage property of CIs with data-dependent K^\widehat{K} using bias-corrected methods is of interest. In particular, it would be of interest to analyze the bias-corrected CI and confidence bands using cross-validation methods combined with the recent results established in Cattaneo, Farrell, and Feng (2019). Second, an extension of the current theory for quantile regression (e.g., Belloni, Chernozhukov, Chetverikov, and Fernández-Val (2019)) or the nonparametric IV setup would be desirable. In the NPIV setup, for example, one can consider pointwise CIs (or uniform confidence bands) that are uniform in pairs of (Kn,Jn)𝒦n×𝒥n(K_{n},J_{n})\in\mathcal{K}_{n}\times\mathcal{J}_{n} with an additional dimension of the instrument sieve and the number of instruments J=JnJ=J_{n}. This is a difficult problem, and it would require a distinct theory to address the ill-posed inverse problem as well as two-dimensional choices. We leave these topics for future research.

References

Andrews, D. W. K. (1991a): “Asymptotic Normality of Series Estimators for Nonparametric and Semiparametric Regression Models,” Econometrica, 59, 307-345.

Andrews, D. W. K. (1991b): “Asymptotic Optimality of Generalized C_L, Cross-Validation, and Generalized Cross-Validation in Regression with Heteroskedastic Errors,” Journal of Econometrics, 47, 359-377.

Armstrong, T. B. and M. Kolesár (2018): “A Simple Adjustment for Bandwidth Snooping,” Review of Economic Studies, 85, 732-765.

Belloni, A., V. Chernozhukov, D. Chetverikov, and I. Fernández-Val (2019): “Conditional quantile processes based on series or many regressors,” Journal of Econometrics, 213, 4-29.

Belloni, A., V. Chernozhukov, D. Chetverikov, and K. Kato (2015): “Some New Asymptotic Theory for Least Squares Series: Pointwise and Uniform Results,” Journal of Econometrics, 186, 345-366.

Belloni, A., V. Chernozhukov, and C. Hansen (2014): “Inference on Treatment Effects after Selection among High-Dimensional Controls,” Review of Economic Studies, 81, 608-650.

Blomquist, S. and W. K. Newey (2002): “Nonparametric Estimation with Nonlinear Budget Sets,” Econometrica, 70, 2455-2480.

Blundell, R. and T. E. MaCurdy (1999): “Labor Supply: A Review of Alternative Approaches,” Handbook of Labor Economics, In: O. Ashenfelter, D. Card (Eds.), vol. 3., Elsevier, Chapter 27.

Calonico, S., M. D. Cattaneo, and M. H. Farrell (2018): “On the Effect of Bias Estimation on Coverage Accuracy in Nonparametric Inference,” Journal of the American Statistical Association, 113, 767-779.

Cattaneo, M. D. and M. H. Farrell (2013): “Optimal Convergence Rates, Bahadur Representation, and Asymptotic Normality of Partitioning Estimators,” Journal of Econometrics, 174, 127-143.

Cattaneo, M. D., M. H. Farrell, and Y. Feng (2019): “Large Sample Properties of Partitioning-Based Series Estimators,” Annals of Statistics, forthcoming.

Cattaneo, M. D., M. Jansson, and W. K. Newey (2018a): “Alternative Asymptotics and the Partially Linear Model with Many Regressors,” Econometric Theory, 34, 277-301.

Cattaneo, M. D., M. Jansson, and W. K. Newey (2018b): “Inference in Linear Regression Models with Many Covariates and Heteroscedasticity,” Journal of the American Statistical Association, 113, 1350-1361.

Chao, J. C., N. R. Swanson, J. A. Hausman, W. K. Newey, and T. Woutersen (2012): “Asymptotic Distribution of JIVE in a Heteroskedastic IV Regression with Many Instruments,” Econometric Theory, 28, 42-86.

Chatterjee, S. (2005): “An error bound in the Sudakov-Fernique inequality,” arXiv:math/0510424.

Chen, X. (2007): “Large Sample Sieve Estimation of Semi-nonparametric Models,” Handbook of Econometrics, In: J.J. Heckman, E. Leamer (Eds.), vol. 6B., Elsevier, Chapter 76.

Chen, X. and T. Christensen (2015): “Optimal Uniform Convergence Rates and Asymptotic Normality for Series Estimators Under Weak Dependence and Weak Conditions,” Journal of Econometrics, 188, 447-465.

Chen, X. and T. Christensen (2018): “Optimal Sup-norm Rates and Uniform Inference on Nonlinear Functionals of Nonparametric IV Regression,” Quantitative Economics, 9(1), 39-85.

Chen, X. and Z. Liao (2014): “Sieve M inference on irregular parameters,” Journal of Econometrics, 182, 70-86.

Chen, X., Z. Liao, and Y. Sun (2014): “Sieve inference on possibly misspecified semi-nonparametric time series models,” Journal of Econometrics, 178, 639-658.

Chen, X. and X. Shen (1998): “Sieve extremum estimates for weakly dependent data,” Econometrica, 66 (2), 289-314.

Chernozhukov, V., D. Chetverikov, and K. Kato (2014a): “Gaussian approximation of suprema of empirical processes,” The Annals of Statistics, 42(4), 1564-1597.

Chernozhukov, V., D. Chetverikov, and K. Kato (2014b): “Anti-Concentration and Honest, Adaptive Confidence Bands,” The Annals of Statistics, 42(5), 1787-1818.

Chernozhukov, V., D. Chetverikov, and K. Kato (2016): “Empirical and multiplier bootstraps for suprema of empirical processes of increasing complexity, and related Gaussian couplings,” Stochastic Processes and their Applications, 126(12), 3632-3651.

Donald, S. G. and W. K. Newey (1994): “Series Estimation of Semilinear Models,” Journal of Multivariate Analysis, 50, 30-40.

Eastwood, B. J. and A. R. Gallant (1991): “Adaptive Rules for Seminonparametric Estimators That Achieve Asymptotic Normality,” Econometric Theory, 7, 307-340.

Giné, E. and R. Nickl (2010): “Confidence bands in density estimation,” The Annals of Statistics, 38, 1122-1170.

Giné, E. and R. Nickl (2015): Mathematical Foundations of Infinite-Dimensional Statistical Models, Cambridge University Press.

Hall, P. and J. Horowitz (2013): “A Simple Bootstrap Method for Constructing Nonparametric Confidence Bands for Functions,” The Annals of Statistics, 41, 1892-1921.

Hansen, B. E. (2015): “The Integrated Mean Squared Error of Series Regression and a Rosenthal Hilbert-Space Inequality,” Econometric Theory, 31, 337-361.

Hansen, P. R. (2005): “A Test for Superior Predictive Ability,” Journal of Business and Economic Statistics, 23, 365-380.

Härdle, W. and O. Linton (1994): “Applied Nonparametric Methods,” Handbook of Econometrics, In: R. F. Engle, D. F. McFadden (Eds.), vol. 4., Elsevier, Chapter 38.

Hausman, J. A. (1985): “The Econometrics of Nonlinear Budget Sets,” Econometrica, 53, 1255-1282.

Heckman, J. J., L. J. Lochner, and P. E. Todd (2006): “Earnings Functions, Rates of Return and Treatment Effects: The Mincer Equation and Beyond,” Handbook of the Economics of Education, In: E. A. Hanushek, and F. Welch (Eds.), Vol. 1, Elsevier, Chapter 7.

Horowitz, J. L. (2014): “Adaptive Nonparametric Instrumental Variables Estimation: Empirical Choice of the Regularization Parameter,” Journal of Econometrics, 180, 158-173.

Horowitz, J. L. and S. Lee (2012): “Uniform Confidence Bands for Functions Estimated Nonparametrically with Instrumental Variables,” Journal of Econometrics, 168, 175-188.

Huang, J. Z. (2003): “Local Asymptotics for Polynomial Spline Regression,” The Annals of Statistics, 31, 1600-1635.

Kozbur, D. (2018): “Inference in Additively Separable Models With a High-Dimensional Set of Conditioning Variables,” Working Paper, arXiv:1503.05436.

Leamer, E. E. (1983): “Let’s Take the Con Out of Econometrics,” The American Economic Review, 73, 31-43.

Lepski, O. V. (1990): “On a problem of adaptive estimation in Gaussian white noise,” Theory of Probability and its Applications, 35, 454-466.

Li, K. C. (1987): “Asymptotic Optimality for CpC_{p}, CLC_{L}, Cross-Validation and Generalized Cross-Validation: Discrete Index Set,” The Annals of Statistics, 15, 958-975.

Li, Q. and J. S. Racine (2007): Nonparametric Econometrics: Theory and Practice, Princeton University Press.

Linton, O. (1995): “Second order approximation in the partially linear regression model,” Econometrica, 63(5), 1079-1112.

Newey, W. K. (1994a): “Series Estimation of Regression Functionals,” Econometric Theory, 10, 1-28.

Newey, W. K. (1994b): “The Asymptotic Variance of Semiparametric Estimators,” Econometrica, 62, 1349-1382.

Newey, W. K. (1997): “Convergence Rates and Asymptotic Normality for Series Estimators,” Journal of Econometrics, 79, 147-168.

Newey, W. K. (2013): “Nonparametric Instrumental Variables Estimation,” American Economic Review: Papers & Proceedings, 103, 550-556.

Newey, W. K. and J. L. Powell (2003): “Instrumental Variable Estimation of Nonparametric Models,” Econometrica, 71, 1565-1578.

Newey, W. K., J. L. Powell, and F. Vella (1999): “Nonparametric Estimation of Triangular Simultaneous Equations Models,” Econometrica, 67, 565-603.

Robinson, P. M. (1988): “Root-N-Consistent Semiparametric Regression,” Econometrica, 56(4), 931-954.

Romano, J. P. and M. Wolf (2005): “Stepwise Multiple Testing as Formalized Data Snooping,” Econometrica, 73, 1237-1282.

Schennach, S. M. (2015): “A bias bound approach to nonparametric inference,” CEMMAP working paper CWP71/15.

van der Vaart, A. W. and J. A. Wellner (1996): Weak Convergence and Empirical Processes, Springer.

White, H. (2000): “A Reality Check for Data Snooping,” Econometrica, 68, 1097-1126.

Zhou, S., X. Shen, and D.A. Wolfe (1998): “Local Asymptotics for Regression Splines and Confidence Regions,” The Annals of Statistics, 26, 1760-1782.

Appendix A Proofs

A.1 Preliminaries and Useful Lemmas

We define additional notation for the empirical process theory used in the proof of Theorem 4.1. Given a measurable space (S, \mathcal{S}), let \mathcal{F} be a class of measurable functions f: S \rightarrow \mathbb{R}. For any probability measure Q on (S, \mathcal{S}), we define the covering number N(\epsilon, \mathcal{F}, L_2(Q)) as the minimal number of L_2(Q)-balls of radius \epsilon needed to cover \mathcal{F}, where the L_2(Q) norm is ||f||_{Q,2} = (\int |f|^2 dQ)^{1/2}. The uniform entropy numbers relative to the L_2(Q) norms are defined as \sup_Q \log N(\epsilon ||F||_{Q,2}, \mathcal{F}, L_2(Q)), where the supremum is over all discrete probability measures and F is an envelope function. We say that \mathcal{F} is of VC type with envelope F if there are constants A, v > 0 such that \sup_Q N(\epsilon ||F||_{Q,2}, \mathcal{F}, L_2(Q)) \leq (A/\epsilon)^v for all 0 < \epsilon \leq 1.

Let the data z_i = (\varepsilon_i, x_i) be i.i.d. random vectors defined on the probability space (\mathcal{Z} = \mathcal{E} \times \mathcal{X}, \mathcal{A}, P) with common probability distribution P \equiv P_{\varepsilon,x}. We think of (\varepsilon_1, x_1), \cdots, (\varepsilon_n, x_n) as the coordinates of the infinite product probability space. We avoid discussing nonmeasurability issues and outer expectations (for the related issues, see van der Vaart and Wellner (1996)). Throughout the proofs, we denote by c, C > 0 universal constants that do not depend on n.

For any sequence {K=Kn:n1}n=1𝒦n\{K=K_{n}:n\geq 1\}\in\prod_{n=1}^{\infty}\mathcal{K}_{n} under Assumption 2.1, we first define the orthonormalized vector of basis functions

P~(K,x)QK1/2P(K,x)=E[PKiPKi]1/2P(K,x),P~Ki=P~(K,xi),P~K=[P~K1,,P~Kn].\tilde{P}(K,x)\equiv Q_{K}^{-1/2}P(K,x)=E[P_{Ki}P_{Ki}^{\prime}]^{-1/2}P(K,x),\ \tilde{P}_{Ki}=\tilde{P}(K,x_{i}),\ \tilde{P}^{K}=[\tilde{P}_{K1},\cdots,\tilde{P}_{Kn}]^{\prime}.

We observe that

g^n(K,x)=P~(K,x)(P~KP~K)1P~KY,Vn(K,x)=P~(K,x)Ω~KP~(K,x),Ω~K=E(P~KiP~Kiεi2).\widehat{g}_{n}(K,x)=\tilde{P}(K,x)^{\prime}(\tilde{P}^{K^{\prime}}\tilde{P}^{K})^{-1}\tilde{P}^{K^{\prime}}Y,\ \ V_{n}(K,x)=\tilde{P}(K,x)^{\prime}\tilde{\Omega}_{K}\tilde{P}(K,x),\ \ \tilde{\Omega}_{K}=E(\tilde{P}_{Ki}\tilde{P}_{Ki}^{\prime}\varepsilon_{i}^{2}).

Without loss of generality, we may impose the normalization Q_{\overline{K}} = I_{\overline{K}}, or Q_K = E(P_{Ki} P_{Ki}') = I_K uniformly over K \in \mathcal{K}_n, since \widehat{g}_n(K,x) is invariant to nonsingular linear transformations of P(K,x). However, we shall treat Q_K as unknown and deal with the non-orthonormalized series terms. Next, with an abuse of notation, we redefine the pseudo-true value \beta_K using the orthonormalized series terms \tilde{P}_{Ki}. That is, y_i = \tilde{P}_{Ki}' \beta_K + \varepsilon_{Ki}, E[\tilde{P}_{Ki} \varepsilon_{Ki}] = 0, where \varepsilon_{Ki} = r_{Ki} + \varepsilon_i, r_n(K,x) = g_0(x) - \tilde{P}(K,x)' \beta_K, r_{Ki} = r_n(K, x_i), and r_K \equiv (r_{K1}, \cdots, r_{Kn})'. We also define \widehat{Q}_K \equiv \frac{1}{n} \tilde{P}^{K\prime} \tilde{P}^K, \underline{\sigma}^2 \equiv \inf_x E[\varepsilon_i^2 | x_i = x], and \bar{\sigma}^2 \equiv \sup_x E[\varepsilon_i^2 | x_i = x].
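In computations, the orthonormalization above is implemented with the sample second-moment matrix in place of the unknown Q_K; a minimal sketch using the symmetric inverse square root is:

```python
import numpy as np

def orthonormalize(P):
    """P_tilde = Q^{-1/2} P with Q replaced by its sample analogue P'P/n."""
    Q_hat = P.T @ P / P.shape[0]
    lam, U = np.linalg.eigh(Q_hat)                 # Q_hat = U diag(lam) U'
    return P @ (U @ np.diag(lam ** -0.5) @ U.T)    # rows are P_tilde(K, x_i)'
```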

We first provide useful lemmas that will be used in the proofs of Theorems 3.1 and 4.1. Versions of Lemmas 1 and 2 with \mathcal{K}_n = \{K\} are available in the literature, such as Belloni et al. (2015) and Chen and Christensen (2015), among many others. Maximal inequalities are used in the proofs of Lemmas 1 and 2 to bound the remainder terms in the linearization of the t-statistics. Also note that different rate conditions on K, such as those in Newey (1997), can be used here but lead to different bounds. We provide the proofs of Lemmas 1 and 2 in the Online Supplementary Material (Section B).

Lemma 1.

Suppose that Assumptions 2.1, 3.1, and 3.2 hold. Then ||\widehat{Q}_K - I_K|| = O_p(\sqrt{\lambda_K^2 \zeta_K^2 \log K / n}) for any K \in \mathcal{K}_n, and the following holds:

maxK𝒦n|R1(K,x)|=Op(maxK𝒦nλK2ζK2logKlogpn(1+KcKK)),\displaystyle\max_{K\in\mathcal{K}_{n}}|R_{1}(K,x)|=O_{p}(\max_{K\in\mathcal{K}_{n}}\sqrt{\frac{\lambda_{K}^{2}\zeta_{K}^{2}\log K\log p}{n}}(1+\ell_{K}c_{K}\sqrt{K})), (A.1)
maxK𝒦n|R2(K,x)|=Op(maxK𝒦n(KcK)logp),\displaystyle\max_{K\in\mathcal{K}_{n}}|R_{2}(K,x)|=O_{p}(\max_{K\in\mathcal{K}_{n}}(\ell_{K}c_{K})\sqrt{\log p}), (A.2)

where R1(K,x)1nVn(K,x)P~(K,x)(Q^K1IK)P~K(ε+rK),R2(K,x)1nVn(K,x)P~(K,x)P~KrKR_{1}(K,x)\equiv\sqrt{\frac{1}{nV_{n}(K,x)}}\tilde{P}(K,x)^{\prime}(\widehat{Q}_{K}^{-1}-I_{K})\tilde{P}^{K^{\prime}}(\varepsilon+r_{K}),R_{2}(K,x)\equiv\sqrt{\frac{1}{nV_{n}(K,x)}}\tilde{P}(K,x)^{\prime}\tilde{P}^{K^{\prime}}r_{K}.

Lemma 2.

Suppose that Assumptions 2.1, 3.1, and 4.1 hold. Then the following holds:

supK𝒦n,x𝒳|R1(K,x)|=Op(maxK𝒦nλK2ζK2logKlognn(n1/q+KcKK)),\displaystyle\sup_{K\in\mathcal{K}_{n},x\in\mathcal{X}}|R_{1}(K,x)|=O_{p}(\max_{K\in\mathcal{K}_{n}}\sqrt{\frac{\lambda_{K}^{2}\zeta_{K}^{2}\log K\log n}{n}}(n^{1/q}+\ell_{K}c_{K}\sqrt{K})), (A.3)
supK𝒦n,x𝒳|R2(K,x)|=Op(maxK𝒦n(KcK)logn),\displaystyle\sup_{K\in\mathcal{K}_{n},x\in\mathcal{X}}|R_{2}(K,x)|=O_{p}(\max_{K\in\mathcal{K}_{n}}(\ell_{K}c_{K})\sqrt{\log n}), (A.4)

where R1(K,x),R2(K,x)R_{1}(K,x),R_{2}(K,x) are defined in Lemma 1.

A.2 Proofs of the Main Results

A.2.1 Proof of Theorem 3.1

Proof.

For any K𝒦nK\in\mathcal{K}_{n}, we first consider the decomposition of the t-statistic in (2.4) with the known variance Vn(K,x)V_{n}(K,x),

Tn(K,x)\displaystyle T_{n}(K,x) =nVn(K,x)P~(K,x)(β^KβK)nVn(K,x)rn(K,x)\displaystyle=\sqrt{\frac{n}{V_{n}(K,x)}}\tilde{P}(K,x)^{\prime}(\widehat{\beta}_{K}-\beta_{K})-\sqrt{\frac{n}{V_{n}(K,x)}}r_{n}(K,x)
=tn(K,x)+R1(K,x)+R2(K,x)+νn(K,x)\displaystyle=t_{n}(K,x)+R_{1}(K,x)+R_{2}(K,x)+\nu_{n}(K,x)

where tn(K,x)=n1/2i=1nP~(K,x)P~KiεiVn(K,x)1/2t_{n}(K,x)=n^{-1/2}\sum_{i=1}^{n}\frac{\tilde{P}(K,x)^{\prime}\tilde{P}_{Ki}\varepsilon_{i}}{V_{n}(K,x)^{1/2}}, R1(K,x),R2(K,x)R_{1}(K,x),R_{2}(K,x) are defined in Lemma 1, and νn(K,x)=nVn(K,x)1/2rn(K,x)\nu_{n}(K,x)=-\sqrt{n}V_{n}(K,x)^{-1/2}r_{n}(K,x). Define

tn(tn(K1,x),,tn(Kp,x))=1ni=1nξit_{n}\equiv(t_{n}(K_{1},x),\cdots,t_{n}(K_{p},x))^{\prime}=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\xi_{i}

where ξi=(ξi1,ξi2,,ξip)p\xi_{i}=(\xi_{i1},\xi_{i2},\cdots,\xi_{ip})^{\prime}\in\mathbb{R}^{p} with ξij=P~(Kj,x)P~KjiεiVn(Kj,x)1/2\xi_{ij}=\frac{\tilde{P}(K_{j},x)^{\prime}\tilde{P}_{K_{j}i}\varepsilon_{i}}{V_{n}(K_{j},x)^{1/2}} and p=|𝒦n|p=|\mathcal{K}_{n}|. Note that E[ξij]=0E[\xi_{ij}]=0 and E[|ξij|3]E[|P~(Kj,x)P~Kji/Vn(Kj,x)1/2|3]supxE[|εi|3|xi=x]maxKζKE[|\xi_{ij}|^{3}]\lesssim E[|\tilde{P}(K_{j},x)^{\prime}\tilde{P}_{K_{j}i}/V_{n}(K_{j},x)^{1/2}|^{3}]\sup_{x}E[|\varepsilon_{i}|^{3}|x_{i}=x]\lesssim\max_{K}\zeta_{K} for all 1in,1jp1\leq i\leq n,1\leq j\leq p. By Lemma A.2 in the Online Supplementary Material, for any δ>0\delta>0, there exists a random variable max1jpi=1nZij\max_{1\leq j\leq p}\sum_{i=1}^{n}Z_{ij} with independent random vectors {Zi}i=1np\{Z_{i}\}_{i=1}^{n}\in\mathbb{R}^{p}, ZiN(0,1nE[ξiξi]),1inZ_{i}\sim N(0,\frac{1}{n}E[\xi_{i}\xi_{i}^{\prime}]),1\leq i\leq n, such that

P(|max1jp|tn(Kj,x)|max1jpi=1n|Zij||>16δ)log(pn)δ2D1+log2(pn)δ3n3/2(D2+D3)+lognnP(|\max_{1\leq j\leq p}|t_{n}(K_{j},x)|-\max_{1\leq j\leq p}\sum_{i=1}^{n}|Z_{ij}||>16\delta)\lesssim\frac{\log(p\vee n)}{\delta^{2}}D_{1}+\frac{\log^{2}(p\vee n)}{\delta^{3}n^{3/2}}(D_{2}+D_{3})+\frac{\log n}{n}

where D1=E[max1j,lp|1ni=1n(ξijξilE[ξijξil])|],D2=E[max1jpi=1n|ξij|3]D_{1}=E\big{[}\max_{1\leq j,l\leq p}|\frac{1}{n}\sum_{i=1}^{n}(\xi_{ij}\xi_{il}-E[\xi_{ij}\xi_{il}])|\big{]},D_{2}=E\big{[}\max_{1\leq j\leq p}\sum_{i=1}^{n}|\xi_{ij}|^{3}\big{]}, and D3=i=1nE[max1jp|ξij|31(max1jp|ξij|>δn/log(pn))].D_{3}=\sum_{i=1}^{n}E\big{[}\max_{1\leq j\leq p}|\xi_{ij}|^{3}1\big{(}\max_{1\leq j\leq p}|\xi_{ij}|>\delta\sqrt{n}/\log(p\vee n)\big{)}\big{]}.

First consider the case (a) in Assumption 3.2(ii). Combining bounds for D1,D2,D3D_{1},D_{2},D_{3} in Lemma B.1 in the Online Supplementary Material gives, for any δ>0\delta>0,

P(|max1jp|tn(Kj,x)|max1jpi=1n|Zij||>16δ)\displaystyle\hskip-14.22636ptP(|\max_{1\leq j\leq p}|t_{n}(K_{j},x)|-\max_{1\leq j\leq p}\sum_{i=1}^{n}|Z_{ij}||>16\delta)
log(pn)δ2[((maxKζK)2logpn)1/2+(maxKζK)2logpn12/q]\displaystyle\lesssim\frac{\log(p\vee n)}{\delta^{2}}\big{[}(\frac{(\max_{K}\zeta_{K})^{2}\log p}{n})^{1/2}+\frac{(\max_{K}\zeta_{K})^{2}\log p}{n^{1-2/q}}\big{]}
+log2(pn)δ3[((maxKζK)2n)1/2+(maxKζK)3logpn3/23/q]+logq1(pn)δq(maxKζK)qnq/21+lognn.\displaystyle+\frac{\log^{2}(p\vee n)}{\delta^{3}}\big{[}(\frac{(\max_{K}\zeta_{K})^{2}}{n})^{1/2}+\frac{(\max_{K}\zeta_{K})^{3}\log p}{n^{3/2-3/q}}\big{]}+\frac{\log^{q-1}(p\vee n)}{\delta^{q}}\frac{(\max_{K}\zeta_{K})^{q}}{n^{q/2-1}}+\frac{\log n}{n}.

For γ>0\gamma>0, by setting

δ\displaystyle\delta =\displaystyle= γ1/3((maxKζK)2log4(pn)n)1/6+γ1/2((maxKζK)2log(pn)logpn12/q)1/2\displaystyle\gamma^{-1/3}\big{(}\frac{(\max_{K}\zeta_{K})^{2}\log^{4}(p\vee n)}{n}\big{)}^{1/6}+\gamma^{-1/2}\big{(}\frac{(\max_{K}\zeta_{K})^{2}\log(p\vee n)\log p}{n^{1-2/q}}\big{)}^{1/2}
+γ1/3((maxKζK)3log2(pn)logpn3/23/q)1/3,\displaystyle+\gamma^{-1/3}\big{(}\frac{(\max_{K}\zeta_{K})^{3}\log^{2}(p\vee n)\log p}{n^{3/2-3/q}}\big{)}^{1/3},

we have

P(|max1jp|tn(Kj,x)|max1jpi=1n|Zij||>C1δ)C2(γ+lognn)P(|\max_{1\leq j\leq p}|t_{n}(K_{j},x)|-\max_{1\leq j\leq p}\sum_{i=1}^{n}|Z_{ij}||>C_{1}\delta)\leq C_{2}(\gamma+\frac{\log n}{n})

where C_1, C_2 are positive constants that depend only on q. If we take \gamma = \gamma_n \rightarrow 0 sufficiently slowly, e.g., \gamma = \log(p \vee n)^{-1/2}, then the above implies that there exists \max_{1\leq j\leq p}\sum_{i=1}^{n} Z_{ij} such that

|max1jp|tn(Kj,x)|max1jpi=1n|Zij||=op(((maxKζK)2log5(pn)n)1/6+(maxKζK)log3/4(pn)log1/2pn1/21/q).|\max_{1\leq j\leq p}|t_{n}(K_{j},x)|-\max_{1\leq j\leq p}\sum_{i=1}^{n}|Z_{ij}||=o_{p}(\big{(}\frac{(\max_{K}\zeta_{K})^{2}\log^{5}(p\vee n)}{n}\big{)}^{1/6}+\frac{(\max_{K}\zeta_{K})\log^{3/4}(p\vee n)\log^{1/2}p}{n^{1/2-1/q}}).

Next, consider the case (b) in Assumption 3.2(ii). For any δ>0\delta>0,

P(|max1jp|tn(Kj,x)|max1jpi=1n|Zij||>16δ)\displaystyle\hskip-14.22636ptP(|\max_{1\leq j\leq p}|t_{n}(K_{j},x)|-\max_{1\leq j\leq p}\sum_{i=1}^{n}|Z_{ij}||>16\delta)
log(pn)δ2[((maxKζK)2logpn)1/2+(maxKζK)2log2(pn)logpn]\displaystyle\lesssim\frac{\log(p\vee n)}{\delta^{2}}\big{[}(\frac{(\max_{K}\zeta_{K})^{2}\log p}{n})^{1/2}+\frac{(\max_{K}\zeta_{K})^{2}\log^{2}(pn)\log p}{n}\big{]}
+log2(pn)δ3[((maxKζK)2n)1/2+(maxKζK)3log3(pn)logpn3/2]\displaystyle+\frac{\log^{2}(p\vee n)}{\delta^{3}}\big{[}(\frac{(\max_{K}\zeta_{K})^{2}}{n})^{1/2}+\frac{(\max_{K}\zeta_{K})^{3}\log^{3}(pn)\log p}{n^{3/2}}\big{]}
+log2(pn)δ3[1n1/2(δ3n3/2log3(pn)+(maxKζK)3log3p)exp(δnCmaxKζKlogplog(pn))]+lognn\displaystyle+\frac{\log^{2}(p\vee n)}{\delta^{3}}\big{[}\frac{1}{n^{1/2}}(\frac{\delta^{3}n^{3/2}}{\log^{3}(p\vee n)}+(\max_{K}\zeta_{K})^{3}\log^{3}p)\exp(-\frac{\delta\sqrt{n}}{C\max_{K}\zeta_{K}\log p\log(p\vee n)})\big{]}+\frac{\log n}{n}

by Lemma B.1 in the Online Supplementary Material. Similarly, by setting

\delta = \max\{\gamma^{-1/3}((\max_K \zeta_K)^2 \log^4(p \vee n)/n)^{1/6},\ 2C((\max_K \zeta_K)^2 \log^4(p \vee n) \log^2 p / n)^{1/2}\}

we have, for γ>0\gamma>0,

P(|max1jp|tn(Kj,x)|max1jpi=1n|Zij||>C1δ)C2(γ+lognn)P(|\max_{1\leq j\leq p}|t_{n}(K_{j},x)|-\max_{1\leq j\leq p}\sum_{i=1}^{n}|Z_{ij}||>C_{1}\delta)\leq C_{2}(\gamma+\frac{\log n}{n})

where C1,C2C_{1},C_{2} are universal constants which do not depend on nn. Here we use δnCmaxKζKlogplog(pn)2log(pn)\frac{\delta\sqrt{n}}{C\max_{K}\zeta_{K}\log p\log(p\vee n)}\geq 2\log(p\vee n). By taking γ=log(pn)1/2\gamma=\log(p\vee n)^{-1/2}, there exists max1jpi=1nZij\max_{1\leq j\leq p}\sum_{i=1}^{n}Z_{ij} such that

|\max_{1\leq j\leq p}|t_n(K_j,x)| - \max_{1\leq j\leq p}\sum_{i=1}^{n}|Z_{ij}|| = o_p\Big(\big(\frac{(\max_K \zeta_K)^2 \log^5(p\vee n)}{n}\big)^{1/6} + \big(\frac{(\max_K \zeta_K)^2 \log^4(p\vee n) \log^2 p}{n}\big)^{1/2}\Big).

In either case (a) or (b), the above coupling inequality shows that there exists a sequence of random variables max1jpi=1n|Zij|\max_{1\leq j\leq p}\sum_{i=1}^{n}|Z_{ij}| such that |maxK𝒦n|tn(K,x)|max1jpi=1n|Zij||=op(an)\big{|}\max_{K\in\mathcal{K}_{n}}|t_{n}(K,x)|-\max_{1\leq j\leq p}\sum_{i=1}^{n}|Z_{ij}|\big{|}=o_{p}(a_{n}), an=1/(logp)1/2a_{n}=1/(\log p)^{1/2} under the rate conditions imposed in Theorem 3.1. Furthermore,

|max1jp|Tn(Kj,x)|max1jp|tn(Kj,x)||\displaystyle\big{|}\max_{1\leq j\leq p}|T_{n}(K_{j},x)|-\max_{1\leq j\leq p}|t_{n}(K_{j},x)|\big{|} max1jp|Tn(Kj,x)tn(Kj,x)|max1jp|R1(Kj,x)|\displaystyle\leq\max_{1\leq j\leq p}|T_{n}(K_{j},x)-t_{n}(K_{j},x)|\leq\max_{1\leq j\leq p}|R_{1}(K_{j},x)|
+max1jp|R2(Kj,x)|+max1jp|νn(Kj,x)|=op(an)\displaystyle\quad+\max_{1\leq j\leq p}|R_{2}(K_{j},x)|+\max_{1\leq j\leq p}|\nu_{n}(K_{j},x)|=o_{p}(a_{n}) (A.5)

with an=1/(logp)1/2a_{n}=1/(\log p)^{1/2} by Lemma 1 and the assumption imposed in Theorem 3.1. We also have

|\displaystyle\big{|} max1jp|Tn(Kj,x)|max1jp|T^n(Kj,x)||max1jp|Tn(Kj,x)T^n(Kj,x)|\displaystyle\max_{1\leq j\leq p}|T_{n}(K_{j},x)|-\max_{1\leq j\leq p}|\widehat{T}_{n}(K_{j},x)|\big{|}\leq\max_{1\leq j\leq p}|T_{n}(K_{j},x)-\widehat{T}_{n}(K_{j},x)|
\displaystyle\leq max1jp|Tn(Kj,x)|max1jp|1Vn(Kj,x)1/2V^n(Kj,x)1/2|=op(an)\displaystyle\max_{1\leq j\leq p}|T_{n}(K_{j},x)|\max_{1\leq j\leq p}|1-\frac{V_{n}(K_{j},x)^{1/2}}{\widehat{V}_{n}(K_{j},x)^{1/2}}|=o_{p}(a_{n}) (A.6)

where we use Lemma 1, the bound \max_{1\leq j\leq p}|t_n(K_j,x)| \lesssim_P \sqrt{\log p} from the maximal inequality (e.g., Lemma A.4 in the Online Supplementary Material), and Assumption 3.2(iii) with a_n = 1/(\log p)^{1/2}. Combining (A.5) and (A.6) gives |\max_{1\leq j\leq p}|\widehat{T}_n(K_j,x)| - \max_{1\leq j\leq p}\sum_{i=1}^{n}|Z_{ij}|| = o_p(a_n) with a_n = 1/(\log p)^{1/2}. Then, there exists a sequence of positive constants \delta_n such that \delta_n = o(1) and P(|\max_{1\leq j\leq p}|\widehat{T}_n(K_j,x)| - \max_{1\leq j\leq p}\sum_{i=1}^{n}|Z_{ij}|| > a_n \delta_n) = o(1).

For any uu\in\mathbb{R}, we have

P(max1jp|T^n(Kj,x)|u)\displaystyle P(\max_{1\leq j\leq p}|\widehat{T}_{n}(K_{j},x)|\leq u)
P({max1jp|T^n(Kj,x)|u}{|max1jp|T^n(Kj,x)|max1jpi=1n|Zij||anδn})\displaystyle\leq P(\{\max_{1\leq j\leq p}|\widehat{T}_{n}(K_{j},x)|\leq u\}\cap\{\big{|}\max_{1\leq j\leq p}|\widehat{T}_{n}(K_{j},x)|-\max_{1\leq j\leq p}\sum_{i=1}^{n}|Z_{ij}|\big{|}\leq a_{n}\delta_{n}\})
+P(|max1jp|T^n(Kj,x)|max1jpi=1n|Zij||>anδn)\displaystyle\quad+P(\big{|}\max_{1\leq j\leq p}|\widehat{T}_{n}(K_{j},x)|-\max_{1\leq j\leq p}\sum_{i=1}^{n}|Z_{ij}|\big{|}>a_{n}\delta_{n})
P(max1jpi=1n|Zij|u+anδn)+o(1)P(max1jpi=1n|Zij|u)+anδnE[max1jpi=1n|Zij|]+o(1)\displaystyle\leq P(\max_{1\leq j\leq p}\sum_{i=1}^{n}|Z_{ij}|\leq u+a_{n}\delta_{n})+o(1)\leq P(\max_{1\leq j\leq p}\sum_{i=1}^{n}|Z_{ij}|\leq u)+a_{n}\delta_{n}E[\max_{1\leq j\leq p}\sum_{i=1}^{n}|Z_{ij}|]+o(1)

where the last inequality uses the anti-concentration inequality (Lemma A.8 in the Online Supplementary Material). The reverse inequality holds by a similar argument, and thus

\sup_{u\in\mathbb{R}} \big| P(\max_{1\leq j\leq p}|\widehat{T}_n(K,x)| \leq u) - P(\max_{1\leq j\leq p}\sum_{i=1}^{n}|Z_{ij}| \leq u) \big| \leq a_n \delta_n E[\max_{1\leq j\leq p}\sum_{i=1}^{n}|Z_{ij}|] + o(1) = o(1)

where we use E[\max_{1\leq j\leq p}\sum_{i=1}^{n}|Z_{ij}|] \lesssim \sqrt{\log p} by the Gaussian maximal inequality and a_n = (\log p)^{-1/2}. By the same arguments as above, since |\max_{1\leq j\leq p}\sum_{i=1}^{n}|Z_{ij}| - \max_{1\leq j\leq p}\sum_{i=1}^{n}|\widehat{Z}_{ij}|| = o_p(a_n) by a Sudakov-Fernique type bound (e.g., Chatterjee (2005)) and Assumption 3.2(iii), we have \sup_{u\in\mathbb{R}} \big| P(\max_{1\leq j\leq p}\sum_{i=1}^{n}|\widehat{Z}_{ij}| \leq u) - P(\max_{1\leq j\leq p}\sum_{i=1}^{n}|Z_{ij}| \leq u) \big| = o(1). Therefore, the following holds by the triangle inequality:

supu|P(max1jp|T^n(K,x)|u)P(max1jpi=1n|Z^ij|u)|=o(1),\sup_{u\in\mathbb{R}}\big{|}P(\max_{1\leq j\leq p}|\widehat{T}_{n}(K,x)|\leq u)-P(\max_{1\leq j\leq p}\sum_{i=1}^{n}|\widehat{Z}_{ij}|\leq u)\big{|}=o(1),

and then we conclude

P(maxK𝒦n|T^n(K,x)|c^1α(x))=1α+o(1),P(\max_{K\in\mathcal{K}_{n}}|\widehat{T}_{n}(K,x)|\leq\widehat{c}_{1-\alpha}(x))=1-\alpha+o(1),

with a critical value c^1α(x)\widehat{c}_{1-\alpha}(x) given in (3.3), and the coverage result (3.5) follows.

Finally, we will show (3.6). For K^𝒦n\widehat{K}\in\mathcal{K}_{n}, observe that

|T^n(K^,x)|(|tn(K^,x)|+|R1(K^,x)|+|R2(K^,x)|+|νn(K^,x)|)|Vn(K^,x)1/2V^n(K^,x)1/2||\widehat{T}_{n}(\widehat{K},x)|\leq(|t_{n}(\widehat{K},x)|+|R_{1}(\widehat{K},x)|+|R_{2}(\widehat{K},x)|+|\nu_{n}(\widehat{K},x)|)|\frac{V_{n}(\widehat{K},x)^{1/2}}{\widehat{V}_{n}(\widehat{K},x)^{1/2}}| (A.7)

by the triangle inequality. Then,

P(g0(x)[g^n(K^,x)±c^1α(x)V^n(K^,x)/n])\displaystyle P(g_{0}(x)\in[\widehat{g}_{n}(\widehat{K},x)\pm\widehat{c}_{1-\alpha}(x)\sqrt{\widehat{V}_{n}(\widehat{K},x)/n}])
P(|tn(K^,x)|+|R1(K^,x)|+|R2(K^,x)|+|νn(K^,x)|c^1α(x)|V^n(K^,x)1/2Vn(K^,x)1/2|)\displaystyle\geq P(|t_{n}(\widehat{K},x)|+|R_{1}(\widehat{K},x)|+|R_{2}(\widehat{K},x)|+|\nu_{n}(\widehat{K},x)|\leq\widehat{c}_{1-\alpha}(x)|\frac{\widehat{V}_{n}(\widehat{K},x)^{1/2}}{V_{n}(\widehat{K},x)^{1/2}}|)
P(|tn(K^,x)|+|R1(K^,x)|+|R2(K^,x)|+|νn(K^,x)|c^1α(x)(1an2δ1n))ϵ1n\displaystyle\geq P(|t_{n}(\widehat{K},x)|+|R_{1}(\widehat{K},x)|+|R_{2}(\widehat{K},x)|+|\nu_{n}(\widehat{K},x)|\leq\widehat{c}_{1-\alpha}(x)(1-a_{n}^{2}\delta_{1n}))-\epsilon_{1n} (A.8)
P(|tn(K^,x)|c^1α(x)(1an2δ1n)anδ2nanδ3n)ϵ1nϵ2nϵ3n\displaystyle\geq P(|t_{n}(\widehat{K},x)|\leq\widehat{c}_{1-\alpha}(x)(1-a_{n}^{2}\delta_{1n})-a_{n}\delta_{2n}-a_{n}\delta_{3n})-\epsilon_{1n}-\epsilon_{2n}-\epsilon_{3n} (A.9)
P(maxK𝒦n|tn(K,x)|c^1α(x)(1an2δ1n)anδ2nanδ3n)ϵ1nϵ2nϵ3n\displaystyle\geq P(\max_{K\in\mathcal{K}_{n}}|t_{n}(K,x)|\leq\widehat{c}_{1-\alpha}(x)(1-a_{n}^{2}\delta_{1n})-a_{n}\delta_{2n}-a_{n}\delta_{3n})-\epsilon_{1n}-\epsilon_{2n}-\epsilon_{3n} (A.10)
P(max1jpi=1n|Z^ij|c^1α(x)δ~n)ϵ~n\displaystyle\geq P(\max_{1\leq j\leq p}\sum_{i=1}^{n}|\widehat{Z}_{ij}|\leq\widehat{c}_{1-\alpha}(x)-\tilde{\delta}_{n})-\tilde{\epsilon}_{n} (A.11)
1αsupuP(|max1jpi=1n|Z^ij|u|δ~n)ϵ~n1αo(1).\displaystyle\geq 1-\alpha-\sup_{u}P(|\max_{1\leq j\leq p}\sum_{i=1}^{n}|\widehat{Z}_{ij}|-u|\leq\tilde{\delta}_{n})-\tilde{\epsilon}_{n}\ \geq 1-\alpha-o(1). (A.12)

The first inequality follows from (A.7); (A.8) holds by Assumption 3.2(iii) with sequences of positive constants \delta_{1n} = o(1), \epsilon_{1n} = o(1); and (A.9) follows from |R_1(\widehat{K},x)| + |R_2(\widehat{K},x)| = o_p(a_n) by Lemma 1 and the assumption |\frac{\sqrt{n} r_n(\widehat{K},x)}{V_n(\widehat{K},x)^{1/2}}| = o(a_n) with a_n = 1/(\log p)^{1/2}, for some sequences of constants \delta_{2n} = o(1), \epsilon_{2n} = o(1), \delta_{3n} = o(1), \epsilon_{3n} = o(1). (A.10) follows from |t_n(\widehat{K},x)| \leq \max_{K\in\mathcal{K}_n}|t_n(K,x)|, and (A.11) holds by |\max_{K\in\mathcal{K}_n}|t_n(K,x)| - \max_{1\leq j\leq p}\sum_{i=1}^{n}|\widehat{Z}_{ij}|| = o_p(a_n) with some sequences \delta_{4n} = o(1), \epsilon_{4n} = o(1), defining \tilde{\delta}_n = \widehat{c}_{1-\alpha}(x) a_n^2 \delta_{1n} + a_n\delta_{2n} + a_n\delta_{3n} + a_n\delta_{4n} and \tilde{\epsilon}_n = \epsilon_{1n} + \epsilon_{2n} + \epsilon_{3n} + \epsilon_{4n}. Finally, (A.12) holds by Lemma A.8, E[\max_{1\leq j\leq p}\sum_{i=1}^{n}|\widehat{Z}_{ij}|] \lesssim \sqrt{\log p}, and \tilde{\delta}_n \sqrt{\log p} = o(1), since \widehat{c}_{1-\alpha}(x) \lesssim \sqrt{\log p} by Lemma A.15. This completes the proof. ∎

A.2.2 Proof of Theorem 4.1

Proof.

Similar to the proof of Theorem 3.1, we have the following linearization of the t-statistics uniformly in (K,x)𝒦n×𝒳(K,x)\in\mathcal{K}_{n}\times\mathcal{X},

Tn(K,x)=tn(K,x)+νn(K,x)+Rn(K,x),T_{n}(K,x)=t_{n}(K,x)+\nu_{n}(K,x)+R_{n}(K,x),

where tn(K,x)=n1/2i=1nP~(K,x)P~Kiεi/Vn(K,x)1/2t_{n}(K,x)=n^{-1/2}\sum_{i=1}^{n}\tilde{P}(K,x)^{\prime}\tilde{P}_{Ki}\varepsilon_{i}/V_{n}(K,x)^{1/2} and Rn(K,x)=R1(K,x)+R2(K,x)R_{n}(K,x)=R_{1}(K,x)+R_{2}(K,x). Define fn,K,x:(×𝒳)f_{n,K,x}:(\mathcal{E}\times\mathcal{X})\mapsto\mathbb{R} for given n1n\geq 1, K𝒦n,x𝒳K\in\mathcal{K}_{n},x\in\mathcal{X},

fn,K,x(ε,t)=P~(K,x)P~(K,t)εVn(K,x)1/2,(ε,t)×𝒳.f_{n,K,x}(\varepsilon,t)=\frac{\tilde{P}(K,x)^{\prime}\tilde{P}(K,t)\varepsilon}{V_{n}(K,x)^{1/2}},(\varepsilon,t)\in\mathcal{E}\times\mathcal{X}. (A.13)

and consider the class of measurable functions n={fn,K,x:(K,x)𝒦n×𝒳}\mathcal{F}_{n}=\{f_{n,K,x}:(K,x)\in\mathcal{K}_{n}\times\mathcal{X}\}. Then, we consider the following empirical process:

{tn(K,x):(K,x)𝒦n×𝒳}={n1/2i=1nfn,K,x(εi,xi):(K,x)𝒦n×𝒳}\Big{\{}t_{n}(K,x):(K,x)\in\mathcal{K}_{n}\times\mathcal{X}\Big{\}}=\Big{\{}n^{-1/2}\sum_{i=1}^{n}f_{n,K,x}(\varepsilon_{i},x_{i}):(K,x)\in\mathcal{K}_{n}\times\mathcal{X}\Big{\}}

which is indexed by classes of functions n\mathcal{F}_{n}. Define α(K,x)P~(K,x)/Vn(K,x)1/2=P~(K,x)/ΩK1/2P~(K,x)\alpha(K,x)\equiv\tilde{P}(K,x)/V_{n}(K,x)^{1/2}=\tilde{P}(K,x)/||\Omega^{1/2}_{K}\tilde{P}(K,x)||. Note that |fn,K,x(ε,t)|=|α(K,x)P~(K,t)ε|C|ε|maxKζK|f_{n,K,x}(\varepsilon,t)|=|\alpha(K,x)^{\prime}\tilde{P}(K,t)\varepsilon|\leq C|\varepsilon|\max_{K}\zeta_{K} for any (K,x)𝒦n×𝒳(K,x)\in\mathcal{K}_{n}\times\mathcal{X}. We define the envelope function Fn(ε,t)C|ε|maxKζK1F_{n}(\varepsilon,t)\equiv C|\varepsilon|\max_{K}\zeta_{K}\vee 1. By Assumption 4.1, we have

|f_{n,K,x} - f_{n,K',x'}| = |\varepsilon| |\alpha(K,x)'\tilde{P}(K,t) - \alpha(K',x')'\tilde{P}(K',t)|
\leq |\varepsilon| \big[ |\alpha(K,x)'\tilde{P}(K,t) - \alpha(K,x)'\tilde{P}(K',t)| + |\alpha(K,x)'\tilde{P}(K',t) - \alpha(K',x)'\tilde{P}(K',t)|
+ |\alpha(K',x)'\tilde{P}(K',t) - \alpha(K',x')'\tilde{P}(K',t)| \big] \leq |\varepsilon| A \max_K \zeta_K L_n(||x - x'|| + |K - K'|)

for all x, x' \in \mathcal{X} and K, K' \in \mathcal{K}_n, where L_n = \zeta^{L_1} \vee \zeta^{L_2}. Therefore, the class of functions \mathcal{F}_n = \{f_{n,K,x} : (K,x) \in \mathcal{K}_n \times \mathcal{X}\} is of VC type, and there are constants A, V > 0 such that

supQN(ϵFnL2(Q),n,L2(Q))(ALn/ϵ)V,0<ϵ1\sup_{Q}N(\epsilon||F_{n}||_{L^{2}(Q)},\mathcal{F}_{n},L^{2}(Q))\leq(AL_{n}/\epsilon)^{V},0<\forall\epsilon\leq 1

for each n. Then, using Theorem 2.1 of Chernozhukov et al. (2016) (Lemma A.9 in the Online Supplementary Material) with B(f) = 0, there exist a tight Gaussian process G_n(f) in \ell^\infty(\mathcal{F}_n) and Z_n(K,x) = G_n(f_{n,K,x}) in \ell^\infty(\mathcal{K}_n \times \mathcal{X}) with zero mean and covariance function (4.2), E[G_n(f) G_n(f')] = Cov(f_{n,K,x}(\varepsilon_i, x_i), f_{n,K',x'}(\varepsilon_i, x_i)), and a sequence of random variables \widetilde{Z} \equiv \sup_{(K,x)\in\mathcal{K}_n\times\mathcal{X}} |Z_n(K,x)| such that, for every \gamma \in (0,1),

P(|sup(K,x)𝒦n×𝒳|tn(K,x)|Z~|>C1δ1n)C2(γ+n1)P(|\sup_{(K,x)\in\mathcal{K}_{n}\times\mathcal{X}}|t_{n}(K,x)|-\widetilde{Z}|>C_{1}\delta_{1n})\leq C_{2}(\gamma+n^{-1}) (A.14)

where C1,C2C_{1},C_{2} are positive constants that depend only on qq, and

δ1n=γ1/qn1/2+1/qmaxKζKlogn+γ1/3n1/6(maxKζK)1/3log2/3n\delta_{1n}=\gamma^{-1/q}n^{-1/2+1/q}\max_{K}\zeta_{K}\log n+\gamma^{-1/3}n^{-1/6}(\max_{K}\zeta_{K})^{1/3}\log^{2/3}n

by Assumption 4.1(iii) and assuming log3nn\log^{3}n\leq n. By taking γ=(logn)1/2\gamma=(\log n)^{-1/2}, we have

|supK,x|tn(K,x)|Z~|=op(n1/2+1/qmaxKζKlog1+1/2qn+n1/6(maxKζK)1/3log5/6n).|\sup_{K,x}|t_{n}(K,x)|-\widetilde{Z}|=o_{p}(n^{-1/2+1/q}\max_{K}\zeta_{K}\log^{1+1/2q}n+n^{-1/6}(\max_{K}\zeta_{K})^{1/3}\log^{5/6}n).

Furthermore, |R1(K,x)|=op(an),|R2(K,x)|=op(an),|νn(K,x)|=op(an)|R_{1}(K,x)|=o_{p}(a_{n}),|R_{2}(K,x)|=o_{p}(a_{n}),|\nu_{n}(K,x)|=o_{p}(a_{n}) uniformly in (K,x)𝒦n×𝒳(K,x)\in\mathcal{K}_{n}\times\mathcal{X} with an=1/(logn)1/2a_{n}=1/(\log n)^{1/2} by Lemma 2 and the rate conditions. Again, consider the class of functions n={fn,K,x:(K,x)𝒦n×𝒳}\mathcal{F}_{n}=\{f_{n,K,x}:(K,x)\in\mathcal{K}_{n}\times\mathcal{X}\} and then

E[supK,x|tn(K,x)|]logn+(maxKζK)q/(q2)logn/nlognE\big{[}\sup_{K,x}|t_{n}(K,x)|\big{]}\lesssim\sqrt{\log n}+(\max_{K}\zeta_{K})^{q/(q-2)}\log n/\sqrt{n}\lesssim\sqrt{\log n}

by Lemma A.13 and Assumption 4.1(iii), so that \sup_{K,x}|t_n(K,x)| \lesssim_P \sqrt{\log n}. Furthermore, \sup_{K,x}|Z_n(K,x)| \lesssim_P \sqrt{\log n} by Dudley's inequality (Corollary 2.2.8 in van der Vaart and Wellner (1996)). Using the same arguments as in Theorem 3.1, we have \big|\sup_{K,x}|\widehat{T}_n(K,x)| - \widetilde{Z}\big| = o_p(a_n) with a_n = 1/(\log n)^{1/2} and

\sup_{u\in\mathbb{R}} \big| P(\sup_{(K,x)\in\mathcal{K}_n\times\mathcal{X}} |\widehat{T}_n(K,x)| \leq u) - P(\widetilde{Z} \leq u) \big| = o(1). (A.15)

Next, we consider the following (infeasible) bootstrap process:

Tne(K,x)=n(g^ne(K,x)g^n(K,x))Vn(K,x)1/2,(K,x)𝒦n×𝒳T_{n}^{e}(K,x)=\frac{\sqrt{n}(\widehat{g}_{n}^{e}(K,x)-\widehat{g}_{n}(K,x))}{V_{n}(K,x)^{1/2}},\quad(K,x)\in\mathcal{K}_{n}\times\mathcal{X}

where \widehat{g}_n^e(K,x) = \tilde{P}(K,x)'\widehat{\beta}_K^e, \widehat{\beta}_K^e is defined in (4.4) with \tilde{P}(K,x_i), and the e_i are i.i.d. standard exponential random variables independent of X^n = \{x_1, \ldots, x_n\}. Then, we have

Tne(K,x)\displaystyle T_{n}^{e}(K,x) =n(g^ne(K,x)g0(x))Vn(K,x)1/2n(g^n(K,x)g0(x))Vn(K,x)1/2\displaystyle=\frac{\sqrt{n}(\widehat{g}_{n}^{e}(K,x)-g_{0}(x))}{V_{n}(K,x)^{1/2}}-\frac{\sqrt{n}(\widehat{g}_{n}(K,x)-g_{0}(x))}{V_{n}(K,x)^{1/2}}
=tne(K,x)+Rne(K,x)Rn(K,x)\displaystyle=t_{n}^{e}(K,x)+R_{n}^{e}(K,x)-R_{n}(K,x)

where t_n^e(K,x) = n^{-1/2}\sum_{i=1}^{n}(e_i - 1) f_{n,K,x}(\varepsilon_i, x_i), R_n^e(K,x) = R_1^e(K,x) + R_2^e(K,x), and R_1^e(K,x), R_2^e(K,x) are defined as in Lemma 1 with the rescaled data \{(\sqrt{e_i}\tilde{P}(K,x_i), \sqrt{e_i}\varepsilon_i)\}_{i=1}^{n}. Note that \widehat{\beta}_K^e is the weighted least squares estimator for the original data, and we can extend the uniform linearization results in Lemma 2 by replacing \zeta_K with \zeta_K^e = \zeta_K \log^{1/2} n and noting that E[e_i] = 1, Var(e_i) = 1, and \max_{1\leq i\leq n} e_i = O_p(\log n).

By applying Theorem 2.1 in Chernozhukov et al. (2016) to the weighted bootstrap process tne(K,x)t_{n}^{e}(K,x), there exists a random variable Z~e=d|Xnsup(K,x)𝒦n×𝒳|Zn(K,x)|\widetilde{Z}^{e}\overset{d|X^{n}}{=}\sup_{(K,x)\in\mathcal{K}_{n}\times\mathcal{X}}|Z_{n}(K,x)| such that, for every γ(0,1)\gamma\in(0,1),

P(|sup(K,x)𝒦n×𝒳|tne(K,x)|Z~e|>C3δ2n)C4(γ+n1)P(|\sup_{(K,x)\in\mathcal{K}_{n}\times\mathcal{X}}|t_{n}^{e}(K,x)|-\widetilde{Z}^{e}|>C_{3}\delta_{2n})\leq C_{4}(\gamma+n^{-1}) (A.16)

where C3,C4C_{3},C_{4} are positive constants that depend only on qq,

δ2n=γ1/qn1/2+1/qmaxKζKlog2n+γ1/3n1/6(maxKζK)1/3logn,\delta_{2n}=\gamma^{-1/q}n^{-1/2+1/q}\max_{K}\zeta_{K}\log^{2}n+\gamma^{-1/3}n^{-1/6}(\max_{K}\zeta_{K})^{1/3}\log n,

and =d|Xn\overset{d|X^{n}}{=} denotes that the two random variables have the same conditional distribution given XnX^{n}.

Further,

|supK,x|T^ne(K,x)|supK,x|tne(K,x)||supK,x|T^ne(K,x)Tne(K,x)|+supK,x|Tne(K,x)tne(K,x)|=op(an)\big{|}\sup_{K,x}|\widehat{T}_{n}^{e}(K,x)|-\sup_{K,x}|t_{n}^{e}(K,x)|\big{|}\leq\sup_{K,x}|\widehat{T}_{n}^{e}(K,x)-T_{n}^{e}(K,x)|+\sup_{K,x}|T_{n}^{e}(K,x)-t_{n}^{e}(K,x)|=o_{p}(a_{n})

by using E\big[\sup_{K,x}|t_n^e(K,x)|\big] \leq \max_{1\leq i\leq n}|e_i| E\big[\sup_{K,x}|t_n(K,x)|\big] \lesssim_P \log^{3/2} n, Assumption 4.1(iv), and |R_n^e(K,x)| = o_p(a_n), |R_n(K,x)| = o_p(a_n) uniformly in (K,x) \in \mathcal{K}_n \times \mathcal{X} under the rate conditions in Assumption 4.1(ii) with a_n = 1/(\log n)^{1/2}. Then, there exist sequences of positive constants \delta_{3n} = o(1) and \delta_{4n} = o(1) such that

P(|supK,x|T^ne(K,x)|supK,x|tne(K,x)||>anδ3n)δ4n.P(\big{|}\sup_{K,x}|\widehat{T}_{n}^{e}(K,x)|-\sup_{K,x}|t_{n}^{e}(K,x)|\big{|}>a_{n}\delta_{3n})\leq\delta_{4n}. (A.17)

Combining (A.16) and (A.17) gives

P(|\sup_{(K,x)\in\mathcal{K}_n\times\mathcal{X}} |\widehat{T}_n^e(K,x)| - \widetilde{Z}^e| > a_n\delta_{3n} + C_3\delta_{2n}) \leq C_4(\gamma + n^{-1}) + \delta_{4n}. (A.18)

By Markov's inequality, the following is deduced from (A.18): for every \nu \in (0,1),

P(|sup(K,x)𝒦n×𝒳|T^ne(K,x)|Z~e|>anδ3n+C3δ2n|Xn)ν1(C4(γ+n1)+δ4n)P(|\sup_{(K,x)\in\mathcal{K}_{n}\times\mathcal{X}}|\widehat{T}_{n}^{e}(K,x)|-\widetilde{Z}^{e}|>a_{n}\delta_{3n}+C_{3}\delta_{2n}|X^{n})\leq\nu^{-1}(C_{4}(\gamma+n^{-1})+\delta_{4n}) (A.19)

with probability at least 1 - \nu. A derivation similar to that in Theorem 3.1, using Lemma A.14, gives

\sup_{u\in\mathbb{R}} \big| P(\sup_{(K,x)\in\mathcal{K}_n\times\mathcal{X}} |\widehat{T}_n^e(K,x)| \leq u | X^n) - P(\widetilde{Z} \leq u) \big| \leq (a_n\delta_{3n} + C_3\delta_{2n})\sqrt{\log n} + \nu^{-1}(C_4(\gamma + n^{-1}) + \delta_{4n}) (A.20)

with probability at least 1 - \nu, where we use \widetilde{Z}^e \overset{d|X^n}{=} \widetilde{Z} and E[\sup_{K,x}|Z_n(K,x)|] \lesssim \sqrt{\log n}. Taking \gamma = (\log n)^{-1/2} and \nu = \nu_n \rightarrow 0 sufficiently more slowly than (\log n)^{-1/2} \vee \delta_{4n}, and using \delta_{2n} = o(a_n) under the rate conditions imposed in the theorem, (A.20) is o_p(1). Combining this with (A.15),

\sup_{u\in\mathbb{R}} \big| P(\sup_{(K,x)\in\mathcal{K}_n\times\mathcal{X}} |\widehat{T}_n^e(K,x)| \leq u | X^n) - P(\sup_{(K,x)\in\mathcal{K}_n\times\mathcal{X}} |\widehat{T}_n(K,x)| \leq u) \big| = o_p(1). (A.21)

Then, the coverage result (4.7) follows. The second part of the theorem, (4.8), can be derived similarly to the proof of Theorem 3.1, and this completes the proof. ∎

A.2.3 Proof of Theorem 5.1

Proof.

Conditional on X=[x1,,xn]X=[x_{1},\cdots,x_{n}]^{\prime}, the following decomposition holds for any sequence K𝒦nK\in\mathcal{K}_{n}:

n(θ^n(K)θ0)=Γ^n(K)1Sn(K),Γ^n(K)=1n(WMKW),Sn(K)=1nWMK(g+ε)\displaystyle\sqrt{n}(\widehat{\theta}_{n}(K)-\theta_{0})=\widehat{\Gamma}_{n}(K)^{-1}S_{n}(K),\quad\widehat{\Gamma}_{n}(K)=\frac{1}{n}(W^{\prime}M_{K}W),\quad S_{n}(K)=\frac{1}{\sqrt{n}}W^{\prime}M_{K}(g+\varepsilon)

where g = [g_1,\cdots,g_n]', g_i = g_0(x_i), g_w = [g_{w1},\cdots,g_{wn}]', g_{wi} = g_{w0}(x_i) = E[w_i|x_i], and v = [v_1,\cdots,v_n]'. All remaining arguments involve conditional expectations (conditioning on X) and hold almost surely (a.s.). Under Assumption 5.2,

Γ^n(K)=Γn(K)+op(1),Γn(K)=1ni=1nMK,iiE[vi2|xi]\displaystyle\widehat{\Gamma}_{n}(K)=\Gamma_{n}(K)+o_{p}(1),\quad\Gamma_{n}(K)=\frac{1}{n}\sum_{i=1}^{n}M_{K,ii}E[v_{i}^{2}|x_{i}]

by Lemma 1 of Cattaneo, Jansson, and Newey (2018a). Moreover,

Sn(K)=1ni=1nMK,iiviεi1ni=1nj=1,j<inPK,ij(viεj+vjεi)+op(1)\displaystyle S_{n}(K)=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}M_{K,ii}v_{i}\varepsilon_{i}-\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\sum_{j=1,j<i}^{n}P_{K,ij}(v_{i}\varepsilon_{j}+v_{j}\varepsilon_{i})+o_{p}(1)

since MK,ij=PK,ijM_{K,ij}=-P_{K,ij} for j<ij<i, 1ngwMKg=Op(nK¯γgγgw)=op(1)\frac{1}{\sqrt{n}}g_{w}^{\prime}M_{K}g=O_{p}(\sqrt{n}\overline{K}^{-\gamma_{g}-\gamma_{g_{w}}})=o_{p}(1), 1n(vMKg+gwMKε)=Op(K¯γg+K¯γgw)=op(1)\frac{1}{\sqrt{n}}(v^{\prime}M_{K}g+g_{w}^{\prime}M_{K}\varepsilon)=O_{p}(\overline{K}^{-\gamma_{g}}+\overline{K}^{-\gamma_{g_{w}}})=o_{p}(1) by Lemma 2 of Cattaneo, Jansson and Newey (2018a) under Assumption 5.2. Then, the following holds:

Tn(K,θ0)=nVn(K)1/2(θ^n(K)θ0)=Vn(K)1/2Γn(K)11nvMKε+op(1)𝑑N(0,1)T_{n}(K,\theta_{0})=\sqrt{n}V_{n}({K})^{-1/2}(\widehat{\theta}_{n}(K)-\theta_{0})=V_{n}(K)^{-1/2}\Gamma_{n}(K)^{-1}\frac{1}{\sqrt{n}}v^{\prime}M_{K}\varepsilon+o_{p}(1)\overset{d}{\longrightarrow}N(0,1)

by Theorem 1 of Cattaneo, Jansson and Newey (2018a).

For simplicity, here we only show the joint convergence of bivariate t-statistics, but the proof can be easily extended to the multivariate case. For any K1<K2K_{1}<K_{2} in 𝒦n\mathcal{K}_{n}, we show

Yn=Ξ1/2(δ1Tn(K1,θ0)+δ2Tn(K2,θ0))𝑑N(0,1),(δ1,δ2)2\displaystyle Y_{n}=\Xi^{-1/2}(\delta_{1}T_{n}(K_{1},\theta_{0})+\delta_{2}T_{n}(K_{2},\theta_{0}))\overset{d}{\longrightarrow}N(0,1),\quad\forall(\delta_{1},\delta_{2})\in\mathbb{R}^{2} (A.22)

where Ξ=δ12+δ22+2δ1δ2v12,v12=limnVn(K1)1/2Γn(K1)1Ωn(K1,K2)Γn(K2)1Vn(K2)1/2\Xi=\delta_{1}^{2}+\delta_{2}^{2}+2\delta_{1}\delta_{2}v_{12},v_{12}=\lim_{n\rightarrow\infty}V_{n}({K_{1}})^{-1/2}\Gamma_{n}(K_{1})^{-1}\Omega_{n}(K_{1},K_{2})\Gamma_{n}(K_{2})^{-1}V_{n}({K_{2}})^{-1/2}.

Define Y_n = Y_{1,n} + Y_{2,n}, where Y_{1,n} and Y_{2,n} are given by

Y1,n=ω1,1n+i=2ny1,in,y1,in=ω1,in+y¯1,in,Y2,n=ω2,1n+i=2ny2,in,y2,in=ω2,in+y¯2,in,\displaystyle Y_{1,n}=\omega_{1,1n}+\sum_{i=2}^{n}y_{1,in},y_{1,in}=\omega_{1,in}+\bar{y}_{1,in},\quad Y_{2,n}=\omega_{2,1n}+\sum_{i=2}^{n}y_{2,in},y_{2,in}=\omega_{2,in}+\bar{y}_{2,in},

where ω1,in=b1,nW1,in\omega_{1,in}=b_{1,n}W_{1,in}, b1,n=δ1Ξ1/2Vn(K1)1/2Γn(K1)1b_{1,n}=\delta_{1}\Xi^{-1/2}V_{n}({K_{1}})^{-1/2}\Gamma_{n}(K_{1})^{-1}, W1,in=viMK1,iiεi/nW_{1,in}=v_{i}M_{K_{1},ii}\varepsilon_{i}/\sqrt{n}, y¯1,in=j<i(u1,jPK1,ijεi+u1,iPK1,ijεj)/K1,u1,i=c1,nvi,c1,n=δ1Ξ1/2Vn(K1)1/2Γn(K1)1K1/n\bar{y}_{1,in}=\sum_{j<i}(u_{1,j}P_{K_{1},ij}\varepsilon_{i}+u_{1,i}P_{K_{1},ij}\varepsilon_{j})/\sqrt{K_{1}},u_{1,i}=c_{1,n}v_{i},c_{1,n}=-\delta_{1}\Xi^{-1/2}V_{n}({K_{1}})^{-1/2}\Gamma_{n}(K_{1})^{-1}\sqrt{K_{1}/n} and ω2,in=b2,nW2,in,y¯2,in\omega_{2,in}=b_{2,n}W_{2,in},\bar{y}_{2,in} are similarly defined with PK2,Vn(K2),Γn(K2)P_{K_{2}},V_{n}({K_{2}}),\Gamma_{n}(K_{2}) and K2K_{2}. Note that Vn(K1)1C||V_{n}(K_{1})^{-1}||\leq C and Γn(K1)1C||\Gamma_{n}(K_{1})^{-1}||\leq C a.s. for nn large enough by Assumption 5.2, and it follows that b1,nC||b_{1,n}||\leq C. Also, E[ω1,1n4|X]Ci=1nE[W1,in4|X]0E[||\omega_{1,1n}||^{4}|X]\leq C\sum_{i=1}^{n}E[||W_{1,in}||^{4}|X]\rightarrow 0 a.s. by Assumption 5.2(ii). Using the same arguments in the proof of Lemma A2 in Chao et al. (2012), we have ω1,1n=op(1)\omega_{1,1n}=o_{p}(1) and ω2,1n=op(1)\omega_{2,1n}=o_{p}(1) unconditionally, thus Yn=i=2nyin+op(1),yin=y1,in+y2,inY_{n}=\sum_{i=2}^{n}y_{in}+o_{p}(1),y_{in}=y_{1,in}+y_{2,in}.

Let 𝒳i=(W1,in,W2,in,vi,εi)\mathcal{X}_{i}=(W_{1,in},W_{2,in},v_{i},\varepsilon_{i})^{\prime} and define the σ\sigma-fields Fi,n=σ(𝒳1,,𝒳i)F_{i,n}=\sigma(\mathcal{X}_{1},...,\mathcal{X}_{i}) for i=1,,n.i=1,...,n. Then, conditional on XX, {yin,Fi,n,1in,n2}\{y_{in},F_{i,n},1\leq i\leq n,n\geq 2\} is a martingale difference array with Fi1,nFi,nF_{i-1,n}\subseteq F_{i,n}. We apply the martingale central limit theorem to show, conditional on XX, i=2nyin𝑑N(0,1)\sum_{i=2}^{n}y_{in}\overset{d}{\longrightarrow}N(0,1) a.s. Note that E[ω1,iny¯1,jn|X]=0,E[ω1,iny¯2,jn|X]=0,E[ω2,iny¯1,jn|X]=0,E[ω2,iny¯2,jn|X]=0E[\omega_{1,in}\bar{y}_{1,jn}|X]=0,E[\omega_{1,in}\bar{y}_{2,jn}|X]=0,E[\omega_{2,in}\bar{y}_{1,jn}|X]=0,E[\omega_{2,in}\bar{y}_{2,jn}|X]=0 for all i,ji,j. Then similar to the proof of Lemma A2 in Chao et al. (2012),

sn2(X)\displaystyle s_{n}^{2}(X) =E[(i=2nyin)2|X]=i=2n(E[ω1,in2|X]+E[y¯1,in2|X])+i=2n(E[ω2,in2|X]+E[y¯2,in2|X])\displaystyle=E[(\sum_{i=2}^{n}y_{in})^{2}|X]=\sum_{i=2}^{n}(E[\omega_{1,in}^{2}|X]+E[\bar{y}_{1,in}^{2}|X])+\sum_{i=2}^{n}(E[\omega_{2,in}^{2}|X]+E[\bar{y}_{2,in}^{2}|X])
+2i=2n(E[ω1,inω2,in|X]+E[y¯1,iny¯2,in|X])\displaystyle+2\sum_{i=2}^{n}(E[\omega_{1,in}\omega_{2,in}|X]+E[\bar{y}_{1,in}\bar{y}_{2,in}|X])
=δ12Ξ1+δ22Ξ1E[ω1,1n2|X]E[ω2,1n2|X]2E[ω1,1nω2,1n|X]\displaystyle=\delta_{1}^{2}\Xi^{-1}+\delta_{2}^{2}\Xi^{-1}-E[\omega_{1,1n}^{2}|X]-E[\omega_{2,1n}^{2}|X]-2E[\omega_{1,1n}\omega_{2,1n}|X]
+2δ1δ2Ξ1Vn(K1)1/2Γn(K1)1Ωn(K1,K2)Γn(K2)1Vn(K2)1/21a.s.\displaystyle+2\delta_{1}\delta_{2}\Xi^{-1}V_{n}({K_{1}})^{-1/2}\Gamma_{n}(K_{1})^{-1}\Omega_{n}(K_{1},K_{2})\Gamma_{n}(K_{2})^{-1}V_{n}({K_{2}})^{-1/2}\rightarrow 1\quad a.s.

Moreover, we have i=2nE[yin4|X]i=2nE[y1,in4|X]+i=2nE[y2,in4|X]a.s.0\sum_{i=2}^{n}E[y_{in}^{4}|X]\lesssim\sum_{i=2}^{n}E[y_{1,in}^{4}|X]+\sum_{i=2}^{n}E[y_{2,in}^{4}|X]\overset{a.s.}{\rightarrow}0 as in the proof of Lemma A2 of Chao et al. (2012).

It remains to prove that, for any $\delta>0$, $P\big(\big|\sum_{i=2}^{n}E[y_{in}^{2}|\mathcal{X}_{1},\ldots,\mathcal{X}_{i-1},X]-s_{n}^{2}(X)\big|\geq\delta\,\big|\,X\big)\rightarrow 0$. Note that

\begin{align}
&\sum_{i=2}^{n}E[y_{in}^{2}|\mathcal{X}_{1},\ldots,\mathcal{X}_{i-1},X]-s_{n}^{2}(X)\nonumber\\
&=\sum_{i=2}^{n}E[y_{1,in}^{2}|\mathcal{X}_{1},\ldots,\mathcal{X}_{i-1},X]-\sum_{i=2}^{n}\big(E[\omega_{1,in}^{2}|X]+E[\bar{y}_{1,in}^{2}|X]\big)\tag{A.23}\\
&\quad+\sum_{i=2}^{n}E[y_{2,in}^{2}|\mathcal{X}_{1},\ldots,\mathcal{X}_{i-1},X]-\sum_{i=2}^{n}\big(E[\omega_{2,in}^{2}|X]+E[\bar{y}_{2,in}^{2}|X]\big)\tag{A.24}\\
&\quad+2\Big(\sum_{i=2}^{n}\big(E[\omega_{1,in}\omega_{2,in}|\mathcal{X}_{1},\ldots,\mathcal{X}_{i-1},X]-E[\omega_{1,in}\omega_{2,in}|X]\big)\tag{A.25}\\
&\quad+\sum_{i=2}^{n}E[\omega_{1,in}\bar{y}_{2,in}+\omega_{2,in}\bar{y}_{1,in}|\mathcal{X}_{1},\ldots,\mathcal{X}_{i-1},X]+\sum_{i=2}^{n}\big(E[\bar{y}_{1,in}\bar{y}_{2,in}|\mathcal{X}_{1},\ldots,\mathcal{X}_{i-1},X]-E[\bar{y}_{1,in}\bar{y}_{2,in}|X]\big)\Big).\tag{A.26}
\end{align}

Terms (A.23) and (A.24) converge to 0 a.s. by the proof of Lemma A2 in Chao et al. (2012). Moreover, it is straightforward to verify that (A.25) and (A.26) converge to 0 a.s., using $P_{K_{1},ij}P_{K_{2},ij}\leq P_{K_{1},ij}^{2}\vee P_{K_{2},ij}^{2}$ and $K_{1}\asymp K_{2}$ and closely following the proof of Lemma A2 in Chao et al. (2012). We can therefore apply the martingale central limit theorem and deduce $Y_{n}\overset{d}{\longrightarrow}N(0,1)$ by arguments similar to those in the proof of Lemma A2 in Chao et al. (2012). The coverage results (5.6) and (5.7) then follow from the joint convergence of $\widehat{T}_{n}(K,\theta_{0})$ together with $\max_{K\in\mathcal{K}_{n}}|\widehat{V}_{n}(K)/V_{n}(K)-1|=o_{p}(1)$ and $\|\widehat{\Sigma}_{n}-\Sigma_{n}\|=o_{p}(1)$ as $n,K\rightarrow\infty$, under the assumptions imposed in Theorem 5.1, and the Slutsky theorem. This completes the proof. ∎
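For implementation, the uniform-in-$K$ critical value $\widehat{c}_{1-\alpha}$ can be approximated by simulating from the estimated joint law of the $t$-statistics across $K\in\mathcal{K}_{n}$. The following is a minimal sketch of that step, not the paper's exact algorithm: it assumes an estimated correlation matrix `Sigma_hat` of the $t$-statistics over the candidate set is available, and the function and argument names are illustrative.

```python
import numpy as np

def sup_t_critical_value(Sigma_hat, alpha=0.05, n_draws=100_000, seed=0):
    """Approximate the (1 - alpha) quantile of max_K |Z_K|, where
    Z ~ N(0, Sigma_hat) collects the correlated t-statistics across
    the candidate numbers of series terms K (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    dim = Sigma_hat.shape[0]
    # Draw from the estimated joint law of the t-statistics.
    Z = rng.multivariate_normal(np.zeros(dim), Sigma_hat, size=n_draws)
    # Sup-t statistic: the largest absolute coordinate in each draw.
    sup_t = np.abs(Z).max(axis=1)
    return np.quantile(sup_t, 1 - alpha)
```

A confidence interval robust to the choice of $K$ then takes the form estimate $\pm$ critical value $\times$ standard error, evaluated at a possibly data-dependent $K$, as in the empirical illustration below.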

Appendix B Figures and Tables

Figure 1: Different functions $g(x)$ used in the simulations (Section 6).
Solid lines (black): $g_{1}(x)=\ln(|6x-3|+1)\,\mathrm{sgn}(x-1/2)$; dashed lines (green): $g_{2}(x)=\sin(7\pi x/2)/[1+2x^{2}(\mathrm{sgn}(x)+1)]$; dotted lines (blue): $g_{3}(x)=x-1/2+5\phi(10(x-1/2))$, where $\phi(\cdot)$ is the standard normal pdf.
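For concreteness, the three regression functions in Figure 1 can be transcribed directly into code; the sketch below assumes $x\in[0,1]$ as in the simulation designs and uses `scipy` only for the standard normal pdf.

```python
import numpy as np
from scipy.stats import norm

def g1(x):
    # g1(x) = ln(|6x - 3| + 1) * sgn(x - 1/2)
    return np.log(np.abs(6 * x - 3) + 1) * np.sign(x - 0.5)

def g2(x):
    # g2(x) = sin(7*pi*x/2) / [1 + 2x^2(sgn(x) + 1)]
    return np.sin(7 * np.pi * x / 2) / (1 + 2 * x**2 * (np.sign(x) + 1))

def g3(x):
    # g3(x) = x - 1/2 + 5*phi(10(x - 1/2)), with phi the standard normal pdf
    return x - 0.5 + 5 * norm.pdf(10 * (x - 0.5))
```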
Table 1: Coverage and Length of Nominal 95% CIs and CBs - Splines
                                      Pointwise                                    Uniform
                          x=0.2         x=0.5         x=0.8         x=0.9
                          COV   AL      COV   AL      COV   AL      COV   AL       COV   AL

Model 1: $g_{1}(x)=\ln(|6x-3|+1)\,\mathrm{sgn}(x-1/2)$
Standard                  0.93  0.27    0.93  0.36    0.91  0.92    0.92  1.49     0.42  0.69
Robust ($\widehat{K}_{\texttt{cv}}$)    0.98  0.37    0.98  0.46    0.96  1.14    0.95  1.76     0.97  1.33
Robust ($\widehat{K}_{\texttt{cv+}}$)   0.98  0.51    0.98  0.49    0.98  1.51    0.97  2.08     0.98  1.42

Model 2: $g_{2}(x)=\sin(7\pi x/2)/[1+2x^{2}(\mathrm{sgn}(x)+1)]$
Standard                  0.80  0.28    0.93  0.36    0.91  0.92    0.92  1.49     0.27  0.69
Robust ($\widehat{K}_{\texttt{cv}}$)    0.93  0.37    0.97  0.46    0.96  1.14    0.95  1.76     0.96  1.33
Robust ($\widehat{K}_{\texttt{cv+}}$)   0.98  0.51    0.98  0.49    0.98  1.51    0.97  2.08     0.98  1.42

Model 3: $g_{3}(x)=x-1/2+5\phi(10(x-1/2))$
Standard                  0.77  0.29    0.65  0.40    0.89  1.00    0.91  1.57     0.16  0.70
Robust ($\widehat{K}_{\texttt{cv}}$)    0.88  0.39    0.74  0.50    0.96  1.23    0.95  1.85     0.75  1.35
Robust ($\widehat{K}_{\texttt{cv+}}$)   0.98  0.52    0.92  0.53    0.98  1.52    0.97  2.06     0.97  1.44
  • Notes: “Pointwise” reports coverage (COV) and average length (AL) of (1) the standard 95% CI with $\widehat{K}_{\texttt{cv}}\in\mathcal{K}_{n}$; (2) the robust CI with $\widehat{K}_{\texttt{cv}}$; and (3) the robust CI with $\widehat{K}_{\texttt{cv+}}$. “Uniform” reports analogous uniform inference results for confidence bands. $\widehat{K}_{\texttt{cv}}$ minimizes the leave-one-out cross-validation criterion and $\widehat{K}_{\texttt{cv+}}=\widehat{K}_{\texttt{cv}}+2$. Results are based on quadratic spline regressions with evenly spaced knots; an illustrative cross-validation sketch follows below.
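As a rough illustration of the cross-validation step in the notes, the sketch below selects the number of evenly spaced knots for a quadratic spline by leave-one-out cross-validation, using the hat-matrix shortcut available for linear smoothers. The truncated-power basis and the function names are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def quad_spline_basis(x, n_knots):
    """Quadratic truncated-power spline basis with evenly spaced
    interior knots on [0, 1] (illustrative basis choice)."""
    knots = np.linspace(0, 1, n_knots + 2)[1:-1]  # interior knots only
    cols = [np.ones_like(x), x, x**2]
    cols += [np.clip(x - t, 0, None)**2 for t in knots]
    return np.column_stack(cols)

def loocv_select_knots(x, y, knot_grid):
    """Pick the number of knots minimizing leave-one-out CV, using
    CV = mean((resid_i / (1 - h_ii))^2) for linear smoothers."""
    best_m, best_cv = None, np.inf
    for m in knot_grid:
        P = quad_spline_basis(x, m)
        H = P @ np.linalg.pinv(P.T @ P) @ P.T  # hat (projection) matrix
        resid = y - H @ y
        cv = np.mean((resid / (1.0 - np.diag(H)))**2)
        if cv < best_cv:
            best_m, best_cv = m, cv
    return best_m
```

For example, `loocv_select_knots(x, y, range(1, 11))` returns the knot count with the smallest leave-one-out criterion over the candidate grid.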

Table 2: Nonparametric Wage Elasticity of Hours of Work Estimates in Blomquist and Newey (Table 1, 2002). The wage elasticity is evaluated at the mean net wage rates, virtual income, and level of hours.
Additional Terms$^{1}$                              CV$^{2}$    $\widehat{E}_{w}$   $SE_{\widehat{E}_{w}}$   $CI_{\widehat{E}_{w}}(K)$
$1,y_{J},w_{J}$                                     0.00472     0.0372              0.0104                   [0.0168, 0.0576]
$\Delta y\Delta w$                                  0.0313      0.0761              0.0128                   [0.0510, 0.1012]
$\ell\Delta y$                                      0.0305      0.0760              0.0127                   [0.0511, 0.1009]
$y_{J}^{2},w_{J}^{2}$                               0.0323      0.0763              0.0129                   [0.0510, 0.1016]
$\Delta y^{2},\Delta w^{2}$                         0.0369      0.0543              0.0151                   [0.0247, 0.0839]
$y_{J}w_{J}$                                        0.0364      0.0659              0.0197                   [0.0273, 0.1045]
$\Delta yw$                                         0.0350      0.0628              0.0223                   [0.0191, 0.1065]
$\ell^{2}\Delta y$                                  0.0364      0.0636              0.0223                   [0.0199, 0.1073]
$y_{J}^{3},w_{J}^{3}$                               0.0331      0.0845              0.0275                   [0.0306, 0.1384]
$\ell\Delta y^{2},\ell\Delta w^{2},\ell\Delta yw$   0.0263      0.0775              0.0286                   [0.0214, 0.1336]
$y_{J}^{2}w_{J},y_{J}w_{J}^{2}$                     0.0252      0.0714              0.0289                   [0.0148, 0.1280]
MLE estimates                                                   0.123               0.0137
critical value: $\widehat{c}_{1-\alpha}(x)=2.503$;  $CI_{\widehat{E}_{w}}^{\texttt{sup}}(\widehat{K}_{\texttt{cv}})=[0.0165,0.0921]$$^{3}$,
$CI_{\widehat{E}_{w}}^{\texttt{sup}}(\widehat{K}_{\texttt{cv+}})=[0.0166,0.1152]$, $CI_{\widehat{E}_{w}}^{\texttt{sup}}(\widehat{K}_{\texttt{cv++}})=[0.0070,0.1186]$
  • 1. $y$: non-labor income; $w$: marginal wage rates; $\ell$: the end point of the segment in a piecewise linear budget set. $\ell^{m}\Delta y^{p}w^{q}$ denotes $\sum_{j}\ell_{j}^{m}(y_{j}^{p}w_{j}^{q}-y_{j+1}^{p}w_{j+1}^{q})$.
  • 2. $CV$ denotes the cross-validation criterion defined in Blomquist and Newey (2002, p. 2464). $\widehat{K}_{\texttt{cv}}=K_{5}$, the 5th smallest model, is chosen by cross-validation; we let $\widehat{K}_{\texttt{cv+}}=K_{6}$ and $\widehat{K}_{\texttt{cv++}}=K_{7}$.
  • 3. $CI_{\widehat{E}_{w}}^{\texttt{sup}}(K)=\widehat{E}_{w}(K)\pm\widehat{c}_{1-\alpha}(x)SE_{\widehat{E}_{w}}(K)$ and $CI_{\widehat{E}_{w}}(K)=\widehat{E}_{w}(K)\pm z_{1-\alpha/2}SE_{\widehat{E}_{w}}(K)$.
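As a worked check of footnote 3 against the table entries: with $\widehat{K}_{\texttt{cv}}=K_{5}$ (the $\Delta y^{2},\Delta w^{2}$ row), $\widehat{E}_{w}=0.0543$, $SE_{\widehat{E}_{w}}=0.0151$, and $\widehat{c}_{1-\alpha}(x)=2.503$,

\[
CI_{\widehat{E}_{w}}^{\texttt{sup}}(\widehat{K}_{\texttt{cv}})=0.0543\pm 2.503\times 0.0151=0.0543\pm 0.0378=[0.0165,\,0.0921],
\]

matching the reported interval and wider than the pointwise $CI_{\widehat{E}_{w}}(K_{5})=0.0543\pm 1.96\times 0.0151=[0.0247,\,0.0839]$.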

Figure 2: Nonparametric Wage Elasticity of Hours of Work Estimates in Blomquist and Newey (Table 1, 2002).

Figure 2 plots the same wage elasticity estimates of the expected labor supply as in Table 2, together with the standard pointwise 95% CIs as well as the CIs that are uniform in $K\in\mathcal{K}_{n}$, constructed with the critical value $\widehat{c}_{1-\alpha}(x)$.