
Bias correction and uniform inference
for the quantile density function

Grigory Franguridi Department of Economics, University of Southern California. Email: franguri@usc.edu
Abstract

For the kernel estimator of the quantile density function (the derivative of the quantile function), I show how to perform the boundary bias correction, establish the rate of strong uniform consistency of the bias-corrected estimator, and construct confidence bands that are asymptotically exact uniformly over the entire domain [0,1]. The proposed procedures rely on the pivotality of the studentized bias-corrected estimator and known anti-concentration properties of the Gaussian approximation for its supremum.

1 Introduction

The derivative of the quantile function, the quantile density (QD), has long been recognized as an important object in statistical inference (it is sometimes also called the sparsity function; Tukey, 1965). In particular, it arises as a factor in the asymptotically linear expansion for the quantile function (Bahadur, 1966; Kiefer, 1967), and hence may be used for asymptotically valid inference on quantiles (Csörgő and Révész, 1981a, b; Koenker, 2005).

Given its importance, several estimators of the QD have been proposed in the literature. The most widely used is the kernel quantile density (KQD) estimator, originally developed by Siddiqui (1960) and Bloch and Gastwirth (1968) for the case of a rectangular kernel, and generalized to arbitrary kernels by Falk (1986), Welsh (1988), Csörgő et al. (1991), and Jones (1992). This estimator is simply a smoothed derivative of the empirical quantile function, where the smoothing is performed via convolution with a kernel function.

Similarly to the classical case of kernel density estimation, the KQD suffers from bias close to the boundary points {0,1} of its domain [0,1], rendering the estimator inconsistent there. To the best of my knowledge, no bias correction procedures have been developed for the QD.

In this paper, I show how to correct for the boundary bias, recovering strong uniform consistency for the resulting bias-corrected KQD (BC-KQD) estimator. The bias correction is computationally cheap and is based on the fact that the bias of the KQD is approximately equal to the integral of the localized kernel function, a quantity that depends only on the chosen kernel and bandwidth. I also develop an algorithm for constructing uniform confidence bands around the QD on its entire domain [0,1]. This procedure relies on the fact that the studentized BC-KQD exhibits an influence function that is pivotal, which makes it possible to calculate the critical values by simulating from either the known influence function or the studentized BC-KQD under an alternative (pseudo) distribution of the data.

The rest of the paper is organized as follows. Section 2 outlines the framework and defines the KQD estimator. Section 3 introduces the BC-KQD estimator and establishes its Bahadur-Kiefer expansion. Section 4 develops the uniform confidence bands based on the BC-KQD. Section 5 illustrates the performance of the confidence bands in a set of Monte Carlo simulations. Section 6 concludes. Proofs of theoretical results are given in the Appendix.

2 Setup and kernel quantile density estimator

The data consist of independent and identically distributed draws X_1, \dots, X_n from a distribution on \mathbb{R} with a cumulative distribution function (CDF) F satisfying the following assumption.

Assumption 1 (Data generating process).

The distribution F has compact support [\underline{x}, \bar{x}] and admits a density f = F' that is continuously differentiable and bounded away from zero and infinity on [\underline{x}, \bar{x}].

Assumption 1 implies that the quantile density

q(u) \coloneqq \frac{dF^{-1}(u)}{du} = \frac{1}{f(F^{-1}(u))}   (1)

is continuously differentiable and bounded away from zero and infinity on its domain [0,1].

Let X_{(1)} \leq \cdots \leq X_{(n)} be the order statistics of the sample X_1, \dots, X_n, and let \hat{Q} denote the empirical quantile function,

\hat{Q}(u) \coloneqq \begin{cases} X_{(\lfloor nu \rfloor + 1)}, & u \in [0,1), \\ X_{(n)}, & u = 1. \end{cases}   (2)

The KQD estimator is defined as

\hat{q}_h(u) \coloneqq \int_0^1 K_h(u-z)\,d\hat{Q}(z) = \sum_{i=1}^{n-1} K_h\left(u - \frac{i}{n}\right)\left(X_{(i+1)} - X_{(i)}\right), \quad u \in [0,1],   (3)

where K is a kernel function, K_h(z) \coloneqq h^{-1} K(h^{-1} z), and h > 0 is the bandwidth (see, e.g., Csörgő et al., 1991).
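To fix ideas, here is a minimal Python sketch of the KQD estimator (3). It is my illustration, not code from the paper, and it uses the rectangular kernel of Siddiqui (1960) and Bloch and Gastwirth (1968) for concreteness; any kernel satisfying Assumption 2 below would do.

import numpy as np

def kqd(x, u, h):
    """Kernel quantile density estimator (3) at grid points u in [0, 1], with
    the rectangular kernel K = 1{|t| <= 1/2}, so that K_h(z) = 1{|z| <= h/2}/h."""
    n = len(x)
    spacings = np.diff(np.sort(x))          # X_(i+1) - X_(i), i = 1, ..., n-1
    grid = np.arange(1, n) / n              # i/n,             i = 1, ..., n-1
    W = (np.abs(u[:, None] - grid[None, :]) <= h / 2) / h   # K_h(u - i/n)
    return W @ spacings                     # sum_i K_h(u - i/n) (X_(i+1) - X_(i))

# Example: for U[0,1] data, the estimate is close to q = 1 away from the boundary.
rng = np.random.default_rng(0)
q_hat = kqd(rng.uniform(size=1000), np.linspace(0.05, 0.95, 19), h=0.1)

We impose the following assumptions on the kernel and bandwidth.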

Assumption 2 (Kernel function).

The kernel K is a nonnegative function of bounded variation that is supported on [-1/2, 1/2], symmetric around 0, and satisfies

\int_{\mathbb{R}} K(x)\,dx = 1, \qquad \int_{\mathbb{R}} K^2(x)\,dx < \infty.   (4)
Assumption 3 (Bandwidth, estimation).

The bandwidth h = h_n is such that h_n \to 0 and

  1. h_n^{-1} = o\left(n^{1/2} (\log n)^{-1} (\log\log n)^{-1/2} \log h^{-1}\right),

  2. h_n = o\left(n^{-1/3} (\log h^{-1})^{-1/3}\right).

Assumption 4 (Bandwidth, inference).

The bandwidth h = h_n is such that h_n \to 0 and

  1. h_n^{-1} = o\left(n^{1/2} (\log n)^{-2} (\log\log n)^{-1/2}\right),

  2. h_n = o\left(n^{-1/3} (\log n)^{-1}\right).

Assumption 2 is standard; boundedness of the total variation of K ensures that the class

\mathcal{F} \coloneqq \left\{ K\left(\frac{u - \cdot}{h}\right),\; u \in [0,1],\, h > 0 \right\}   (5)

is a bounded VC class of measurable functions, see, e.g., Nolan and Pollard (1987).

Assumptions 3 and 4 are essentially the same, up to the log terms in the bandwidth rates, with Assumption 3 being slightly weaker. Assumption 4.1 states that the bandwidth is large enough (slightly larger than n^{-1/2}) to guarantee that the smoothed remainder of the classical Bahadur-Kiefer expansion vanishes asymptotically, see the proof of Theorem 1 below. Assumption 4.2 imposes the undersmoothing bandwidth rate (slightly smaller than n^{-1/3}), which ensures that the smoothing bias disappears fast enough for the confidence bands to be valid, see the proof of Theorem 2 below.

3 Bias correction and Bahadur-Kiefer expansion

In this section, I introduce the bias-corrected estimator and develop its asymptotically linear expansion with an explicit a.s. uniform rate of the remainder (the Bahadur-Kiefer expansion).

To see the necessity of bias correction, note that, for u close to the boundary, the kernel weights K_h(u - i/n), i = 1, \dots, n-1, do not approximately sum to one, rendering the KQD \hat{q}_h(u) inconsistent. Therefore, dividing the KQD by the sum of the kernel weights (or the corresponding integral of the kernel function) may eliminate the boundary bias. To this end, define

\psi_h(u) \coloneqq \int_0^1 K_h(u-z)\,dz = \int_{\max(u-h/2,\,0)}^{\min(u+h/2,\,1)} K_h(u-z)\,dz, \quad u \in [0,1].   (6)

For computational purposes, note that \psi_h is symmetric around 1/2 (i.e. \psi_h(u) = \psi_h(1-u) for all u \in [0,1]), \psi_h \in [1/2, 1], and \psi_h(u) = 1 for u \in [h/2, 1-h/2]. The bias-corrected KQD (BC-KQD) is then defined as

\hat{q}_h^{bc}(u) \coloneqq \frac{\hat{q}_h(u)}{\psi_h(u)} = \frac{\sum_{i=1}^{n-1} K_h\left(u - \frac{i}{n}\right)\left(X_{(i+1)} - X_{(i)}\right)}{\int_0^1 K_h(u-z)\,dz}, \quad u \in [0,1].   (7)
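Continuing the Python sketch above (again my illustration): for the rectangular kernel, \psi_h has the closed form below, while a general kernel would require numerical integration of K.

import numpy as np

def psi_h_rect(u, h):
    """psi_h(u) in (6) for the rectangular kernel: the length of
    [u - h/2, u + h/2] ∩ [0, 1] divided by h; equals 1 for u in [h/2, 1 - h/2]."""
    return (np.minimum(u + h / 2, 1.0) - np.maximum(u - h / 2, 0.0)) / h

def bc_kqd(x, u, h):
    """Bias-corrected KQD (7): the estimator kqd from the previous sketch,
    divided by psi_h(u)."""
    n = len(x)
    spacings = np.diff(np.sort(x))
    grid = np.arange(1, n) / n
    W = (np.abs(u[:, None] - grid[None, :]) <= h / 2) / h   # K_h(u - i/n)
    return (W @ spacings) / psi_h_rect(u, h)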

The following theorem establishes that the studentized BC-KQD is approximately equal to the centered kernel density estimator with an approximation error that converges to zero a.s. at an explicit uniform rate. Since this result resembles (and relies on) the classical asymptotically linear expansion for the quantile function (Bahadur, 1966; Kiefer, 1967), we call it the Bahadur-Kiefer expansion for the BC-KQD. Denote U_i = F(X_i), i = 1, \dots, n.

Theorem 1 (Bahadur-Kiefer expansion for the BC-KQD).

Suppose Assumptions 1 and 2 are satisfied and h_n \to 0. Then the following representation holds uniformly in u \in [0,1]:

Z_n^{bc}(u) = -\mathbb{G}_n(u) + O_{a.s.}\left(n^{1/2} h^{3/2} + h \log h^{-1} + h^{-1/2} n^{-1/4} (\log n)^{1/2} (\log\log n)^{1/4}\right),   (8)

where

Z_n^{bc}(u) \coloneqq \frac{\sqrt{nh}\left(\hat{q}_h^{bc}(u) - q(u)\right)}{q(u)/\psi_h(u)},   (9)

\mathbb{G}_n(u) \coloneqq \frac{1}{\sqrt{nh}} \sum_{i=1}^{n} \left[ K\left(\frac{U_i - u}{h}\right) - \mathbf{E}\, K\left(\frac{U_i - u}{h}\right) \right]   (10)

= \sqrt{nh} \cdot \frac{1}{n} \sum_{i=1}^{n} \left[ K_h(U_i - u) - \psi_h(u) \right].   (11)

This representation allows us to establish the exact rate of strong uniform consistency of the BC-KQD under a bandwidth that achieves undersmoothing (Assumption 3.2).

Corollary 1 (Strong uniform consistency of BC-KQD).

Suppose Assumptions 1, 2, and 3 hold. Then

\lim_{n\to\infty} \sqrt{\frac{n h_n}{2 \log h_n^{-1}}} \sup_{u\in[0,1]} \left| \hat{q}_h^{bc}(u) - q(u) \right| = \left( \int_{\mathbb{R}} K^2(x)\,dx \right)^{1/2} \text{ a.s.}   (12)

One of the convenient features of the KQD (and BC-KQD) estimator is that its bandwidth has a natural scale [0,1] which is independent of the data generating process. Hence, I put aside the choice of the constant c in the bandwidth h = c n^{-\eta} and suggest setting c = 1.

Regarding the choice of the rate \eta, and ignoring the log terms, the rate-optimal bandwidth is easy to derive: it is achieved when the rate of the smoothing bias, n^{1/2} h^{3/2}, matches that of the remainder in the original Bahadur-Kiefer expansion, h^{-1/2} n^{-1/4}. It follows that the nearly optimal bandwidth is

h_n^{opt} = O\left(n^{-3/8}\right).   (13)
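To spell out the balancing step behind (13), equate the two rates and solve for h:

n^{1/2} h^{3/2} = h^{-1/2} n^{-1/4} \quad\Longleftrightarrow\quad h^{2} = n^{-3/4} \quad\Longleftrightarrow\quad h = n^{-3/8}.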

Under this bandwidth, the exact rate of strong uniform convergence is

O\left(\frac{\log n}{n^{5/16}}\right),   (14)

which is just slightly worse than the familiar “cube-root” rate (Kim and Pollard, 1990).

4 Uniform confidence bands

Suppose we had access to valid approximations c_{n,\tau}, c_{n,\tau}^{abs} to the \tau-quantiles of the random variables

W_n^{bc} = \sup_{u\in[0,1]} Z_n^{bc}(u),   (15)

W_n^{bc,abs} = \sup_{u\in[0,1]} \left| Z_n^{bc}(u) \right|,   (16)

respectively, in the sense that

\mathbb{P}(W_n^{bc} \leq c_{n,\tau}) = \tau + o(1),   (17)

\mathbb{P}(W_n^{bc,abs} \leq c_{n,\tau}^{abs}) = \tau + o(1).   (18)

Then the following confidence bands for q(\cdot) would be asymptotically valid at the confidence level 1 - \alpha (a computational sketch follows the list):

  1. the one-sided CB

     \left[ \frac{\hat{q}_h^{bc}(u)}{1 + \frac{c_{n,1-\alpha}}{\psi_h(u)\sqrt{nh}}},\; +\infty \right), \quad u \in [0,1],   (19)

  2. the one-sided CB

     \left( -\infty,\; \frac{\hat{q}_h^{bc}(u)}{1 - \frac{c_{n,1-\alpha}}{\psi_h(u)\sqrt{nh}}} \right], \quad u \in [0,1],   (20)

  3. the two-sided CB

     q(u) \in \left[ \frac{\hat{q}_h^{bc}(u)}{1 + \frac{c_{n,1-\alpha/2}^{abs}}{\psi_h(u)\sqrt{nh}}},\; \frac{\hat{q}_h^{bc}(u)}{1 - \frac{c_{n,1-\alpha/2}^{abs}}{\psi_h(u)\sqrt{nh}}} \right], \quad u \in [0,1].   (21)
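Once a critical value is available, the bands are immediate to compute. A minimal sketch of mine (not the paper's implementation), assuming q_bc = \hat{q}_h^{bc}(u) and psi = \psi_h(u) have been evaluated on a grid as in the sketches of Section 3:

import numpy as np

def two_sided_band(q_bc, psi, n, h, c_abs):
    """Two-sided band (21), given the BC-KQD values q_bc, the values psi of
    psi_h on the same grid, and a critical value c_abs for sup |Z_n^bc|.
    Requires c_abs < psi * sqrt(n h) for the upper limit to be positive."""
    t = c_abs / (psi * np.sqrt(n * h))
    return q_bc / (1 + t), q_bc / (1 - t)

The one-sided bands (19) and (20) are analogous, with c_{n,1-\alpha} in place of c_abs and one of the two limits replaced by an infinite endpoint.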

I propose two ways of obtaining such approximate critical values, both making use of the pivotality of the studentized bias-corrected KQD Z_n^{bc}(u), see Theorem 1. I focus on the one-sided critical value c_{n,\tau} for simplicity; the proofs for the two-sided critical value are analogous.

The first approach is to let c_{n,\tau} be the \tau-quantile of the random variable

W_n^{\mathbb{G}} = \sup_{u\in[0,1]} \mathbb{G}_n(u).   (22)

Since \mathbb{G}_n is a known process, c_{n,\tau} can be obtained easily by simulation. In principle, c_{n,\tau} can be tabulated for different choices of the kernel K and values of the sample size n and the bandwidth h.
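For instance, here is a Python sketch of this simulation (my illustration, rectangular kernel as before); the two-sided critical value c_{n,\tau}^{abs} is obtained by taking the supremum of |G_n| instead:

import numpy as np

def critical_value_G(n, h, tau, u, n_sim=20000, two_sided=False, seed=0):
    """tau-quantile of sup_u G_n(u) (or of sup_u |G_n(u)| if two_sided), where
    G_n(u) = sqrt(nh) * mean_i [K_h(U_i - u) - psi_h(u)] with U_i ~ U[0,1],
    cf. (11); u is the evaluation grid."""
    rng = np.random.default_rng(seed)
    psi = (np.minimum(u + h / 2, 1.0) - np.maximum(u - h / 2, 0.0)) / h
    sups = np.empty(n_sim)
    for s in range(n_sim):
        U = rng.uniform(size=n)
        Kh = (np.abs(U[None, :] - u[:, None]) <= h / 2) / h   # K_h(U_i - u)
        G = np.sqrt(n * h) * (Kh.mean(axis=1) - psi)
        sups[s] = np.abs(G).max() if two_sided else G.max()
    return np.quantile(sups, tau)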

The other approach is to let c_{n,\tau} be the \tau-quantile of the random variable

W_n^{U[0,1]} \coloneqq \sup_{u\in[0,1]} Z_n^{bc,U[0,1]}(u),   (23)

where Z_n^{bc,U[0,1]}(u) is equal to Z_n^{bc}(u) evaluated at a pseudo-sample \tilde{X}_1, \dots, \tilde{X}_n \sim U[0,1] in place of the original sample. For the uniform distribution, q \equiv 1, and hence

Z_n^{bc,U[0,1]}(u) \coloneqq \frac{\sqrt{nh}\left(\tilde{q}_h^{bc}(u) - q(u)\right)}{q(u)/\psi_h(u)} = \sqrt{nh}\left(\tilde{q}_h(u) - \psi_h(u)\right),   (24)

where \tilde{q}_h(u) is the (non-bias-corrected) KQD calculated using the pseudo-sample, i.e.

\tilde{q}_h(u) = \sum_{i=1}^{n-1} K_h\left(u - \frac{i}{n}\right)\left(\tilde{X}_{(i+1)} - \tilde{X}_{(i)}\right), \quad u \in [0,1].   (25)
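A sketch of this second approach, parallel to the previous one (again my illustration with the rectangular kernel). Note that the kernel weights K_h(u - i/n) do not depend on the pseudo-sample, so they are precomputed outside the simulation loop:

import numpy as np

def critical_value_U01(n, h, tau, u, n_sim=20000, seed=0):
    """tau-quantile of sup_u Z_n^{bc,U[0,1]}(u) = sup_u sqrt(nh)(q~_h(u) - psi_h(u))
    over uniform pseudo-samples, cf. (23)-(25)."""
    rng = np.random.default_rng(seed)
    psi = (np.minimum(u + h / 2, 1.0) - np.maximum(u - h / 2, 0.0)) / h
    grid = np.arange(1, n) / n
    W = (np.abs(u[:, None] - grid[None, :]) <= h / 2) / h    # K_h(u - i/n)
    sups = np.empty(n_sim)
    for s in range(n_sim):
        spacings = np.diff(np.sort(rng.uniform(size=n)))     # uniform spacings
        sups[s] = (np.sqrt(n * h) * (W @ spacings - psi)).max()
    return np.quantile(sups, tau)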

The following theorem establishes that the two aforementioned approximations to the critical values are valid, implying the asymptotic validity of the confidence bands. These confidence bands are centered at an AMSE-suboptimal estimator \hat{q}_h^{bc} and are expected to shrink at a rate slightly slower than the minimax optimal rate, as noted by Chernozhukov et al. (2014a, p. 1795). This is compensated for by the asymptotically exact coverage of the bands.

Theorem 2 (Exactness of confidence bands).

Suppose Assumptions 1, 2, and 4 hold. Then

\lim_{n\to\infty} \sup_{t\in\mathbb{R}} \left| \mathbb{P}\left(W_n^{bc} \leq t\right) - \mathbb{P}\left(W_n^{\mathbb{G}} \leq t\right) \right| = 0,   (26)

\lim_{n\to\infty} \sup_{t\in\mathbb{R}} \left| \mathbb{P}\left(W_n^{bc} \leq t\right) - \mathbb{P}\left(W_n^{U[0,1]} \leq t\right) \right| = 0,   (27)

and hence the confidence bands (19), (20), and (21) are asymptotically exact.

5 Monte Carlo study

In this section I study the finite-sample behavior of the proposed confidence bands in a set of Monte Carlo simulations.

I consider the following distributions of the data, all supported on the interval [0,1]: (i) the uniform distribution on [0,1]; (ii) the N(1/2, 1) distribution truncated to [0,1]; (iii) the linear distribution with PDF f(x) = x + 1/2, x \in [0,1]. I set the nominal confidence level to 1 - \alpha \in \{0.8, 0.9, 0.95, 0.99\} and the sample size to n \in \{100, 500, 1000, 5000\}. The critical values are obtained by simulating \mathbb{G}_n(u) and calculating the quantiles of its supremum on the grid u \in \{0.005, 0.015, 0.02, \dots, 0.995\}, with the number of simulations set to 20000 (simulation results for the critical values based on Z_n^{bc,U[0,1]}(u) are very similar, so I do not report them here). I use the kernel corresponding to the standard normal distribution truncated to [-1/2, 1/2] and the nearly optimal bandwidth h = c n^{-3/8}, where I set c = 1 since the scale of the bandwidth is [0,1], see Section 3.
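For reference, a condensed sketch of one cell of this exercise for uniform data, reusing the helpers from the earlier sketches. It is my illustration: it uses the rectangular kernel rather than the truncated normal kernel of the actual simulations, and a uniform grid approximating the one above.

import numpy as np

def coverage_uniform(n, alpha, n_rep=1000, seed=1):
    """Simulated coverage of the two-sided band (21) for U[0,1] data, whose true
    quantile density is q = 1. Uses bc_kqd, psi_h_rect, and critical_value_G
    from the sketches above."""
    rng = np.random.default_rng(seed)
    h = n ** (-3 / 8)                              # c = 1, eta = 3/8
    u = np.arange(0.005, 1.0, 0.005)               # evaluation grid
    c_abs = critical_value_G(n, h, 1 - alpha / 2, u, two_sided=True)
    t = c_abs / (psi_h_rect(u, h) * np.sqrt(n * h))
    hits = 0
    for _ in range(n_rep):
        q_bc = bc_kqd(rng.uniform(size=n), u, h)
        hits += np.all((q_bc / (1 + t) <= 1.0) & (1.0 <= q_bc / (1 - t)))
    return hits / n_rep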

In Figure 1, included for illustration, I plot 100 independent realizations of the 90% confidence bands for the linear distribution, along with the true quantile density (in blue). Table 1 contains simulated coverage values for the two-sided confidence bands. The coverage is almost invariant to the distribution of the data, but the size distortion tends to be smaller for higher nominal confidence levels.

Figure 1: 90% confidence bands for the quantile density (in blue) of the linear distribution with PDF f(x) = x + 0.5, x \in [0,1]. Number of independent realizations of the bands S = 100, sample size n = 5000.
Confidence level                 0.8      0.9      0.95     0.99
Uniform distribution
n = 100                          0.891    0.936    0.962    0.986
n = 500                          0.881    0.943    0.966    0.990
n = 1000                         0.898    0.947    0.970    0.993
n = 5000                         0.907    0.949    0.976    0.996
Linear distribution
n = 100                          0.891    0.929    0.956    0.987
n = 500                          0.878    0.936    0.961    0.989
n = 1000                         0.890    0.944    0.970    0.991
n = 5000                         0.914    0.949    0.976    0.996
Truncated normal distribution
n = 100                          0.898    0.942    0.964    0.988
n = 500                          0.887    0.944    0.967    0.992
n = 1000                         0.905    0.950    0.972    0.993
n = 5000                         0.911    0.952    0.978    0.997
Table 1: Simulated coverage of the two-sided confidence bands

6 Conclusion

To the best of my knowledge, no boundary bias correction or uniform inference procedures have been developed for the quantile density (sparsity) function. In this paper, I develop such procedures, establish their validity and show in a set of Monte Carlo simulations that they perform reasonably well in finite samples. I hope that, even when the quantile density itself is not the main inference target, these results may be employed for improving the quality of inference for other statistical objects, including the quantile function.

References

  • Andreyanov and Franguridi (2022) Andreyanov, P. and G. Franguridi (2022): “Nonparametric inference on counterfactuals in first-price auctions,” Available at https://arxiv.org/pdf/2106.13856.pdf.
  • Bahadur (1966) Bahadur, R. R. (1966): “A note on quantiles in large samples,” The Annals of Mathematical Statistics, 37, 577–580.
  • Bloch and Gastwirth (1968) Bloch, D. A. and J. L. Gastwirth (1968): “On a simple estimate of the reciprocal of the density function,” The Annals of Mathematical Statistics, 39, 1083–1085.
  • Chernozhukov et al. (2014a) Chernozhukov, V., D. Chetverikov, and K. Kato (2014a): “Anti-concentration and honest, adaptive confidence bands,” The Annals of Statistics, 42, 1787–1818.
  • Chernozhukov et al. (2014b) ——— (2014b): “Gaussian approximation of suprema of empirical processes,” The Annals of Statistics, 42, 1564–1597.
  • Csörgő et al. (1991) Csörgő, M., L. Horváth, and P. Deheuvels (1991): “Estimating the quantile-density function,” in Nonparametric Functional Estimation and Related Topics, Springer, 213–223.
  • Csörgő and Révész (1981a) Csörgő, M. and P. Révész (1981a): Strong approximations in probability and statistics, Academic Press.
  • Csörgő and Révész (1981b) ——— (1981b): Two approaches to constructing simultaneous confidence bounds for quantiles, 176, Carleton University. Department of Mathematics and Statistics.
  • Falk (1986) Falk, M. (1986): “On the estimation of the quantile density function,” Statistics & Probability Letters, 4, 69–73.
  • Giné and Guillou (2002) Giné, E. and A. Guillou (2002): “Rates of strong uniform consistency for multivariate kernel density estimators,” in Annales de l’Institut Henri Poincare (B) Probability and Statistics, Elsevier, vol. 38, 907–921.
  • Jones (1992) Jones, M. C. (1992): “Estimating densities, quantiles, quantile densities and density quantiles,” Annals of the Institute of Statistical Mathematics, 44, 721–727.
  • Kiefer (1967) Kiefer, J. (1967): “On Bahadur’s representation of sample quantiles,” The Annals of Mathematical Statistics, 38, 1323–1342.
  • Kim and Pollard (1990) Kim, J. and D. Pollard (1990): “Cube root asymptotics,” The Annals of Statistics, 191–219.
  • Koenker (2005) Koenker, R. (2005): Quantile Regression, Econometric Society Monographs, Cambridge University Press.
  • Nolan and Pollard (1987) Nolan, D. and D. Pollard (1987): “U-processes: rates of convergence,” The Annals of Statistics, 780–799.
  • Siddiqui (1960) Siddiqui, M. M. (1960): “Distribution of quantiles in samples from a bivariate population,” Journal of Research of the National Bureau of Standards, 64, 145–150.
  • Stroock (1998) Stroock, D. W. (1998): A concise introduction to the theory of integration, Springer Science & Business Media.
  • Tukey (1965) Tukey, J. W. (1965): “Which part of the sample contains the information?” Proceedings of the National Academy of Sciences, 53, 127–134.
  • Welsh (1988) Welsh, A. (1988): “Asymptotically efficient estimation of the sparsity function at a point,” Statistics & Probability Letters, 6, 427–432.

Appendix

Appendix A Proof of Theorem 1 and Corollary 1

First, note that

q_h(u) \coloneqq \int_0^1 K_h(u-z) q(z)\,dz = \int_0^1 K_h(u-z)\left(q(u) + q'(\xi(u,z))(z-u)\right)dz   (28)

= q(u)\psi_h(u) + r_h(u),   (29)

where r_h(u) = O(h) uniformly in u \in [0,1], since q is continuously differentiable on [0,1].

Therefore,

Z_n^{bc}(u) \coloneqq \frac{\sqrt{nh}\left(\hat{q}_h^{bc}(u) - q(u)\right)}{q(u)/\psi_h(u)} = \frac{\sqrt{nh}\left(\hat{q}_h(u) - \psi_h(u) q(u)\right)}{q(u)} = Z_n^{c}(u) + r_n^{bc}(u),   (30)

where

Z_n^{c}(u) \coloneqq \frac{\sqrt{nh}\left(\hat{q}_h(u) - q_h(u)\right)}{q(u)},   (31)

r_n^{bc}(u) = \frac{\sqrt{nh}\, r_h(u)}{q(u)} = O\left(n^{1/2} h^{3/2}\right) \text{ uniformly in } u \in [0,1].   (32)

The result now follows from the asymptotically linear expansion of the process Z_n^{c},

Z_n^{c}(u) = -\mathbb{G}_n(u) + O_{a.s.}\left(h \log h^{-1} + h^{-1/2} n^{-1/4} (\log n)^{1/2} (\log\log n)^{1/4}\right).   (33)

This expansion is implied by the proof of Andreyanov and Franguridi (2022, Theorem 1). I reproduce this proof here for completeness.

A.1 Proof of the representation (33)

First, we need the following two lemmas concerning expressions that appear further in the proof.

Lemma 1.

Suppose that Assumptions 1 and 2 hold. Then, for every u \in [0,1],

\int_0^1 K_h(u-z)\,d\left(\hat{Q}(z) - Q(z)\right) = -\int_0^1 \left(\hat{Q}(z) - Q(z)\right)dK_h(u-z) + R_n^{I}(u),   (34)

where \sup_{u\in[0,1]} |R_n^{I}(u)| = O_{a.s.}\left(\frac{1}{nh}\right).

Proof.

Denote \hat{\psi}(z) = \hat{Q}(z) - Q(z) and note that \hat{\psi} is a function of bounded variation a.s. Using integration by parts for the Riemann-Stieltjes integral (see, e.g., Stroock, 1998, Theorem 1.2.7), we have

\int_0^1 K_h(u-z)\,d\hat{\psi}(z) = -\int_0^1 \hat{\psi}(z)\,dK_h(u-z) + K_h(u-1)\hat{\psi}(1) - K_h(u)\hat{\psi}(0).   (35)

To complete the proof, note that \hat{\psi}(1) = X_{(n)} - \bar{x} = O_{a.s.}(n^{-1}), \hat{\psi}(0) = X_{(1)} - \underline{x} = O_{a.s.}(n^{-1}), |K_h(u-1)| \leq h^{-1} K(0), and |K_h(u)| \leq h^{-1} K(0). ∎

Lemma 2.

Suppose that Assumptions 1 and 2 hold. Then, for every u \in [0,1],

\int_0^1 \left(\hat{F}(Q(z)) - z\right)dK_h(u-z) = -\mathbb{G}_n(u)/\sqrt{nh}.   (36)
Proof.

Using integration by parts for the Riemann-Stieltjes integral (see e.g. Stroock, 1998, Theorem 1.2.7), we have

\int_0^1 (\hat{F}(Q(z)) - z)\,dK_h(u-z) = -\int_0^1 K_h(u-z)\,d\left[\hat{F}(Q(z)) - z\right] + K_h(u-1)\left[\hat{F}(\bar{x}) - 1\right] + K_h(u)\hat{F}(\underline{x})   (37)

= -\int_0^1 K_h(u-z)\,d\left[\hat{F}(Q(z)) - z\right],   (38)

where we used the fact that F^(x¯)=1\hat{F}(\bar{x})=1 a.s. and F^(x¯)=0\hat{F}(\underline{x})=0 a.s. We further write

\int_0^1 (\hat{F}(Q(z)) - z)\,dK_h(u-z) = -\int_0^1 K_h(u-z)\,d\left[\hat{F}(Q(z)) - z\right]   (39)

= -\int_{\underline{x}}^{\bar{x}} K_h(u - F(x))\,d\left[\hat{F}(x) - F(x)\right]   (40)

= -\frac{1}{n}\sum_{i=1}^{n}\left[K_h(u - F(X_i)) - \mathbf{E}\, K_h(u - F(X_i))\right]   (41)

= -\mathbb{G}_n(u)/\sqrt{nh},   (42)

where in the second equality we used the change of variables x = Q(z). ∎

We now proceed with the proof of representation (33).

Recall the classical Bahadur-Kiefer expansion (Bahadur, 1966; Kiefer, 1967),

\hat{Q}(u) - Q(u) = -q(u)\left(\hat{F}(Q(u)) - u\right) + R_n^{BK}(u),   (43)

\text{where } R_n^{BK}(u) = O_{a.s.}\left(n^{-3/4}\ell(n)\right) \text{ uniformly in } u \in [0,1],   (44)

and \ell(n) \coloneqq (\log n)^{1/2} (\log\log n)^{1/4}. Combine this expansion with Lemma 1 to obtain

\hat{q}_h(u) - q_h(u) = \int_0^1 K_h(u-z)\,d\left[\hat{Q}(z) - Q(z)\right]   (45)

= -\int_0^1 \left[\hat{Q}(z) - Q(z)\right]dK_h(u-z) + R_n^{I}(u)   (46)

= \int_0^1 q(z)(\hat{F}(Q(z)) - z)\,dK_h(u-z) - \int_0^1 R_n^{BK}(z)\,dK_h(u-z) + R_n^{I}(u).   (47)

First term in (47).

Since f is continuously differentiable and bounded away from zero, |q'| \leq M < \infty for some constant M, and hence |q(z) - q(u)| \leq M|z - u|. The first term in (47) can then be rewritten as

\int_0^1 q(z)(\hat{F}(Q(z)) - z)\,dK_h(u-z) = q(u)\int_0^1 (\hat{F}(Q(z)) - z)\,dK_h(u-z) + R_n^{II}(u),   (48)

where

\left|R_n^{II}(u)\right| = \left|\int_0^1 (q(z) - q(u))(\hat{F}(Q(z)) - z)\,dK_h(u-z)\right|   (49)

\leq Mh\left|\int_0^1 (\hat{F}(Q(z)) - z)\,dK_h(u-z)\right| = Mh\left|\mathbb{G}_n(u)/\sqrt{nh}\right|,   (50)

the last equality using Lemma 2. The process \mathbb{G}_n has the strong uniform convergence rate \log h^{-1}/\sqrt{nh} (see, e.g., Giné and Guillou, 2002), and hence

R_n^{II}(u) = O_{a.s.}\left(\frac{h \log h^{-1}}{\sqrt{nh}}\right) \text{ uniformly over } u \in [0,1].   (51)

Applying Lemma 2 to the first term in (48) allows us to rewrite

\int_0^1 q(z)(\hat{F}(Q(z)) - z)\,dK_h(u-z) = -q(u)\frac{\mathbb{G}_n(u)}{\sqrt{nh}} + O_{a.s.}\left(\frac{h \log h^{-1}}{\sqrt{nh}}\right).   (52)

Second term in (47).

This term can be upper bounded as follows,

\sup_u \left|\int_0^1 R_n^{BK}(z)\,dK_h(u-z)\right| \leq \sup_u \int_0^1 \left|R_n^{BK}(z)\right| \left|dK_h(u-z)\right| \leq \sup_z |R_n^{BK}(z)|\, TV(K_h)   (53)

= O_{a.s.}\left(n^{-3/4}\ell(n)\right) h^{-1} TV(K) = O_{a.s.}\left(h^{-1} n^{-3/4} \ell(n)\right),   (54)

where the two inequalities use standard properties of the total variation, and the first equality uses TV(K_h) = h^{-1} TV(K).

Plugging (52) and (54) into (47) and multiplying by nh\sqrt{nh} yields

\sqrt{nh}\left(\hat{q}_h(u) - q_h(u)\right) = -q(u)\mathbb{G}_n(u) + O_{a.s.}\left(h \log h^{-1}\right) + O_{a.s.}\left(h^{-1/2} n^{-1/4} \ell(n)\right).   (55)

Note that we disregarded the term \sqrt{nh}\, R_n^{I}(u), since it has the uniform order O_{a.s.}(n^{-1/2} h^{-1/2}), which is smaller than O_{a.s.}\left(h^{-1/2} n^{-1/4} \ell(n)\right). Dividing by q(u), which is bounded away from zero for u \in [0,1] due to Assumption 1, finishes the proof.

A.2 Proof of Corollary 1

Let us check that the conditions of Giné and Guillou (2002, Proposition 3.1) hold. Indeed, Assumption 2 implies their condition (K_2), while Assumption 3 implies their conditions (2.11) and (W_2). By Giné and Guillou (2002, Remark 3.5), their condition (D_2) can be replaced by the conditions satisfied by the uniform distribution. To complete the proof, divide the expansion in Theorem 1 by \sqrt{2 \log h_n^{-1}} and note that the supremum of the first term, \sup_{u\in[0,1]} |\mathbb{G}_n(u)|/\sqrt{2 \log h_n^{-1}}, converges to \left(\int_{\mathbb{R}} K^2(x)\,dx\right)^{1/2} by Giné and Guillou (2002, Proposition 3.1), while the remainder converges to zero a.s. due to Assumption 3.

Appendix B Proof of Theorem 2

A key ingredient of the proof is to note that Lemmas 2.3 and 2.4 of Chernozhukov et al. (2014b) continue to hold even if their random variable Z_n does not have the form Z_n = \sup_{f\in\mathcal{F}_n} \mathbb{G}_n f for the standard empirical process \mathbb{G}_n, but instead is a generic random variable admitting a strong sup-Gaussian approximation with a sufficiently small remainder.

For completeness, we provide the aforementioned trivial extensions of the two lemmas here, taken directly from Andreyanov and Franguridi (2022).

Let X be a random variable with distribution P taking values in a measurable space (S, \mathcal{S}). Let \mathcal{F} be a class of real-valued functions on S. We say that a function F: S \to \mathbb{R} is an envelope of \mathcal{F} if F is measurable and |f(x)| \leq F(x) for all f \in \mathcal{F} and x \in S.

We impose the following assumptions (A1)-(A3) of Chernozhukov et al. (2014b).

  (A1) The class \mathcal{F} is pointwise measurable, i.e. it contains a countable subset \mathbb{G} such that for every f \in \mathcal{F} there exists a sequence g_m \in \mathbb{G} with g_m(x) \to f(x) for every x \in S.

  (A2) For some q \geq 2, an envelope F of \mathcal{F} satisfies F \in L^q(P).

  (A3) The class \mathcal{F} is P-pre-Gaussian, i.e. there exists a tight Gaussian random variable G_P in l^{\infty}(\mathcal{F}) with mean zero and covariance function

  \mathbf{E}[G_P(f) G_P(g)] = \mathbf{E}[f(X) g(X)] \text{ for all } f, g \in \mathcal{F}.   (56)
Lemma 3 (A trivial extension of Lemma 2.3 of Chernozhukov et al. (2014b)).

Suppose that Assumptions (A1)-(A3) are satisfied and that there exist constants \underline{\sigma}, \bar{\sigma} > 0 such that \underline{\sigma}^2 \leq P f^2 \leq \bar{\sigma}^2 for all f \in \mathcal{F}. Moreover, suppose there exist constants r_1, r_2 > 0 and a random variable \tilde{Z} = \sup_{f\in\mathcal{F}} G_P f such that \mathbb{P}(|Z - \tilde{Z}| > r_1) \leq r_2. Then

\sup_{t\in\mathbb{R}} \left|\mathbb{P}(Z \leq t) - \mathbb{P}(\tilde{Z} \leq t)\right| \leq C_{\sigma} r_1 \left\{\mathbf{E}\tilde{Z} + \sqrt{1 \vee \log(\underline{\sigma}/r_1)}\right\} + r_2,   (57)

where C_{\sigma} is a constant depending only on \underline{\sigma} and \bar{\sigma}.

Proof.

For every t \in \mathbb{R}, we have

\mathbb{P}(Z \leq t) = \mathbb{P}(\{Z \leq t\} \cap \{|Z - \tilde{Z}| \leq r_1\}) + \mathbb{P}(\{Z \leq t\} \cap \{|Z - \tilde{Z}| > r_1\})   (58)

\leq \mathbb{P}(\tilde{Z} \leq t + r_1) + r_2   (59)

\leq \mathbb{P}(\tilde{Z} \leq t) + C_{\sigma} r_1 \left\{\mathbf{E}\tilde{Z} + \sqrt{1 \vee \log(\underline{\sigma}/r_1)}\right\} + r_2,   (60)

where Lemma A.1 of Chernozhukov et al. (2014b) (an anti-concentration inequality for \tilde{Z}) is used to deduce the last inequality. A similar argument leads to the reverse inequality, which completes the proof. ∎

Lemma 4 (A trivial extension of Lemma 2.4 of Chernozhukov et al. (2014b)).

Suppose that there exists a sequence of P-centered classes \mathcal{F}_n of measurable functions S \to \mathbb{R} satisfying assumptions (A1)-(A3) with \mathcal{F} = \mathcal{F}_n for each n, where in assumption (A3) the constants \underline{\sigma} and \bar{\sigma} do not depend on n. Denote by B_n the Brownian bridge on \ell^{\infty}(\mathcal{F}_n), i.e. a tight Gaussian random variable in \ell^{\infty}(\mathcal{F}_n) with mean zero and covariance function

\mathbf{E}[B_n(f) B_n(g)] = \mathbf{E}[f(X) g(X)] \text{ for all } f, g \in \mathcal{F}_n.   (61)

Moreover, suppose that there exists a sequence of random variables \tilde{Z}_n = \sup_{f\in\mathcal{F}_n} B_n(f) and a sequence of constants r_n \to 0 such that |Z_n - \tilde{Z}_n| = O_P(r_n) and r_n \mathbf{E}\tilde{Z}_n \to 0. Then

\sup_{t\in\mathbb{R}} \left|\mathbb{P}(Z_n \leq t) - \mathbb{P}(\tilde{Z}_n \leq t)\right| \to 0.   (62)
Proof.

Take \beta_n \to \infty sufficiently slowly such that \beta_n r_n (1 \vee \mathbf{E}\tilde{Z}_n) = o(1). Then, since \mathbb{P}(|Z_n - \tilde{Z}_n| > \beta_n r_n) = o(1), by Lemma 3 we have

\sup_{t\in\mathbb{R}} \left|\mathbb{P}(Z_n \leq t) - \mathbb{P}(\tilde{Z}_n \leq t)\right| = O\left(r_n (\mathbf{E}\tilde{Z}_n + |\log(\beta_n r_n)|)\right) + o(1) = o(1).   (63)

This completes the proof. ∎

I now go back to the proof of Theorem 2. Chernozhukov et al. (2014b, Proposition 3.1) establish a sup-Gaussian approximation of W_n^{\mathbb{G}}; namely, there exists a tight centered Gaussian random variable B_n in \ell^{\infty}([0,1]) with the covariance function

\mathbf{E}[B_n(u) B_n(v)] = \mathrm{Cov}\left(K_h(U - u), K_h(U - v)\right), \quad u, v \in [0,1],   (64)

where U \sim \text{Uniform}[0,1], such that, for \tilde{W}_n \coloneqq \sup_{u\in[0,1]} B_n(u), we have the approximation

W_n^{\mathbb{G}} = \tilde{W}_n + O_p\left((nh)^{-1/6} \log n\right).   (65)

Lemma 4 and Chernozhukov et al. (2014b, Remark 3.2) then imply

\sup_{t\in\mathbb{R}} \left|\mathbb{P}(W_n^{\mathbb{G}} \leq t) - \mathbb{P}(\tilde{W}_n \leq t)\right| \to 0.   (66)

On the other hand, from Theorem 1 it follows that

W_n^{bc} = W_n^{\mathbb{G}} + O_{a.s.}\left(n^{1/2} h^{3/2} + h \log h^{-1} + h^{-1/2} n^{-1/4} \ell(n)\right),   (67)

where, as before, \ell(n) \coloneqq (\log n)^{1/2} (\log\log n)^{1/4}. Substituting (65) into (67) yields

W_n^{bc} = \tilde{W}_n + O_p\left((nh)^{-1/6} \log n + n^{1/2} h^{3/2} + h \log h^{-1} + h^{-1/2} n^{-1/4} \ell(n)\right).   (68)

Assumption 4 implies that n^{1/2} h^{3/2} = o\left((\log n)^{-1/2}\right) and h^{-1/2} n^{-1/4} \ell(n) = o\left((\log n)^{-1/2}\right). Therefore,

W_n^{bc} - \tilde{W}_n = o_p\left((\log n)^{-1/2}\right).   (69)

It now follows from Chernozhukov et al. (2014b, Remark 3.2) that

\sup_{t\in\mathbb{R}} \left|\mathbb{P}(W_n^{bc} \leq t) - \mathbb{P}(\tilde{W}_n \leq t)\right| \to 0.   (70)

Applying the triangle inequality to equations (66) and (70) yields

\sup_{t\in\mathbb{R}} \left|\mathbb{P}(W_n^{bc} \leq t) - \mathbb{P}(W_n^{\mathbb{G}} \leq t)\right| \to 0.   (71)

Similarly, considering the sample U_i = F(X_i), which is iid Uniform[0,1], we have

W_n^{bc,U[0,1]} = W_n^{\mathbb{G}} + O_p\left((nh)^{-1/6} \log n + n^{1/2} h^{3/2} + h \log h^{-1} + h^{-1/2} n^{-1/4} \ell(n)\right).   (72)

A similar argument yields

\sup_{t\in\mathbb{R}} \left|\mathbb{P}(W_n^{bc} \leq t) - \mathbb{P}(W_n^{bc,U[0,1]} \leq t)\right| \to 0,   (73)

which completes the proof.