

Self-convolved Bootstrap for M-regression under Complex Temporal Dynamics

Miaoshiqi Liu  
Department of Statistical Sciences, University of Toronto
and
Zhou Zhou   
Department of Statistical Sciences, University of Toronto
Email: zhou.zhou@utoronto.ca
Abstract

The paper considers simultaneous nonparametric inference for a wide class of M-regression models with time-varying coefficients. The covariates and errors of the regression model are modeled as a general class of nonstationary time series and are allowed to be cross-dependent. A novel and easy-to-implement self-convolved bootstrap procedure is proposed. With only one tuning parameter, the bootstrap facilitates a \sqrt{n}-consistent inference of the cumulative regression function for the M-estimators under complex temporal dynamics, even under the possible presence of breakpoints in the time series. Our methodology leads to a unified framework to conduct general classes of Exact Function Tests, Lack-of-fit Tests, and Qualitative Tests for the time-varying coefficients. These tests enable one to, among many others, conduct variable selection, check for constancy and linearity, as well as verify shape assumptions, including monotonicity and convexity. As applications, our method is utilized to study the time-varying properties of global climate data and Microsoft stock return, respectively.


Keywords: Bahadur representation, cumulative regression function, M estimator, nonstationary time series, quantile regression, shape test.

1 Introduction

There has been an increasing necessity to account for temporal non-stationarity in finance and economics. For instance, financial returns are frequently non-stationary (Francq and Sucarrat, 2023). In climate economics, climate change has become progressively unpredictable with extreme weather happening globally with greater intensity (Ebi et al., 2021). See also Dette et al. (2011); Vogt (2012); Nason (2013); Kreiss and Paparoditis (2014); Dahlhaus et al. (2019) among others for the recent literature on non-stationary time series analysis. As deviations from temporal homogeneity become more pronounced, static models may fail to capture the dynamic features of the data. In the context of regression analysis, one successful generalization is to use the varying-coefficient model (Hastie and Tibshirani, 1993), which is capable of describing the non-constant relationship between the predictors and the response (Hoover et al., 1998; Fan and Zhang, 2000; Karmakar et al., 2022). In the present work, we shall further consider the following time-varying M-regression (Huber, 1964, 1973) model for non-stationary time series:

\displaystyle y_{i}={\bf x}_{i}^{\top}\mbox{\boldmath$\beta$}_{i}+e_{i},\quad i=1,2,\cdots,n, \qquad (1)

where \{{\bf x}_{i}=(x_{i1},\cdots,x_{ip})^{\top}\}_{i=1}^{n} is a p-dimensional predictor process, \{e_{i}\}_{i=1}^{n} is the error process satisfying \mathbb{E}(\psi(e_{i})|{\bf x}_{i},{\bf x}_{i-1},\cdots,{\bf x}_{1})=0 almost surely with \psi(\cdot) being the left derivative of a convex function \rho(\cdot), and \mbox{\boldmath$\beta$}_{i}=\mbox{\boldmath$\beta$}(t_{i})=(\beta_{0}(t_{i}),\cdots,\beta_{p-1}(t_{i}))^{\top} is the coefficient \mbox{\boldmath$\beta$}(t) evaluated at time t_{i}=i/n. We study model (1) in a general framework where \{{\bf x}_{i}\} and \{e_{i}\} belong to a general class of non-stationary time series models with both smoothly and abruptly time-varying dynamics. The detailed assumptions of the model are deferred to Section 7. Observe that (1) includes a wide range of regression models with flexible choices of \rho(\cdot). Prominent examples include the least squares regression with \rho(x)=x^{2}, the quantile regression with \rho(x)=\tau x^{+}+(1-\tau)(-x)^{+}, \tau\in(0,1), x^{+}=\max(x,0), the L^{q} regression with \rho(x)=|x|^{q}, q>0, the expectile regression with \rho(x)=|I(x\leq 0)-\alpha|x^{2}, \alpha\in(0,1), and Huber’s robust regression with \rho(x)=x^{2}I(|x|\leq\varsigma)/2+(\varsigma|x|-\varsigma^{2}/2)I(|x|>\varsigma), \varsigma>0.
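To fix ideas, the following minimal Python sketch (ours, not the authors' code) implements the loss functions \rho listed above, together with the bounded derivative \psi of the Huber loss that drives its robustness; all function names are illustrative.

```python
import numpy as np

def rho_quantile(x, tau):
    """Check loss: tau * x^+ + (1 - tau) * (-x)^+."""
    return np.where(x >= 0, tau * x, (tau - 1.0) * x)

def rho_lq(x, q):
    """L^q loss: |x|^q, q > 0."""
    return np.abs(x) ** q

def rho_expectile(x, alpha):
    """Expectile (asymmetric squared) loss: |I(x <= 0) - alpha| * x^2."""
    return np.abs((x <= 0).astype(float) - alpha) * x ** 2

def rho_huber(x, c):
    """Huber loss: x^2/2 if |x| <= c, else c|x| - c^2/2."""
    return np.where(np.abs(x) <= c, 0.5 * x ** 2, c * np.abs(x) - 0.5 * c ** 2)

def psi_huber(x, c):
    """Derivative of the Huber loss; bounded, hence robust to outliers."""
    return np.clip(x, -c, c)
```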

This article aims to provide a unified framework for nonparametric inference of model (1). Due to the complex temporal dynamics and the broad choice of loss functions, it is difficult to directly estimate the limiting distributions of the M-estimators, in which case resampling methods such as the bootstrap are typically adopted to circumvent the difficulty. In the context of nonparametric M-regression, however, so far there exist no valid bootstrap methods for its simultaneous inference when the regressors and errors are non-stationary time series with possible jumps in the underlying data generating mechanism. The most related work can be found in Wu and Zhou (2017), who explored time-varying coefficient quantile regression with smoothly time-varying dynamics. We also refer to Friedrich and Lin (2022), who proposed an autoregressive sieve bootstrap framework, valid under L_{2} regression with strictly stationary \alpha-mixing predictors. Extending to our current setup, the primary challenge in designing bootstrap methods originates from the inconsistency in estimating various key quantities around the jump points. Additionally, bootstrap inference for M-regression involves estimating a density-like function, which requires the delicate choice of a tuning parameter (Koenker, 2005). For instance, in quantile regression where \rho(x)=\tau x^{+}+(1-\tau)(-x)^{+}, the asymptotic behavior of local estimates of \mbox{\boldmath$\beta$}(\cdot) is tied to a density-type quantity \Sigma(\cdot) (Wu and Zhou, 2017). In moderate samples, the quality of the bootstrap inference depends on the estimation accuracy of \Sigma(\cdot), which entails the nontrivial task of selecting the tuning parameter. Lastly, the existing literature on bootstrap inference of nonparametric M-regression, such as Friedrich and Lin (2022) and Wu and Zhou (2017), often constructs the bootstrap based on residuals. In moderate samples, the performance of such residual-based bootstraps can be significantly hampered by the estimation error of the residuals as estimators of the true regression errors.

The methodological innovation of this paper lies in proposing a novel self-convolved bootstrap for the simultaneous nonparametric inference of Model (1), which shares a simple and unified construction among a broad class of M-estimators and hypothesis tests, and is consistent under very general forms of non-stationary temporal dependence and predictor-error dependence. As its name implies, the self-convolved bootstrap only necessitates the convolution of the M-estimators with i.i.d. auxiliary standard normal variables, eliminating the need for additional computation. Particularly, it avoids estimating the aforementioned density-like quantities or residuals. The construction principle applies uniformly to a wide class of M-estimators and therefore saves lots of effort when conducting inference based on various estimators simultaneously. The validity of the proposed bootstrap arises from two main observations. Firstly, by taking the second-order difference of the local M-estimators as shown in Section 3, their progressive sum can be well approximated by a weighted block sum of the data through a uniform Bahadur representation. Secondly, the weighted block sum convolved by i.i.d. auxiliary standard normal variables takes the form of a robust multiplier bootstrap (Zhou, 2013), which can consistently mimic the probabilistic behavior of their partial sum processes. We will show that the easy-to-implement bootstrap simulates the probabilistic behavior of the M-estimators consistently under complex temporal dynamics. Compared to the bootstrap methods developed in Karmakar et al. (2022), Wu and Zhou (2017) and Friedrich and Lin (2022), the self-convolved bootstrap enjoys broader applicability yet maintains a simple implementation.

To facilitate the self-convolved bootstrap, our framework utilizes the cumulative regression function (CRF)

\displaystyle\Lambda(t)=(\Lambda_{1}(t),\cdots,\Lambda_{p}(t))^{\top}=\int_{0}^{t}\mbox{\boldmath$\beta$}(s)\,ds,\quad 0\leq t\leq 1, \qquad (2)

instead of \mbox{\boldmath$\beta$}(t) for inference, driven by two main reasons. First, as is suggested in Theorem 1, the CRF can be estimated at a parametric rate of \sqrt{n}. Such merits of getting \sqrt{n} convergence via integration have been demonstrated by Hall and Marron (1987), Bickel and Ritov (1988), Huang and Fan (1999) and many others. Recently, Mies (2023) employed the method to estimate parameters of locally stationary time series, leading to a \sqrt{n} functional central limit theorem. Cai and Juhl (2023) also obtained a similar result regarding rolling regression estimators, where the error process is assumed to be a martingale difference. Although the results in the existing literature have gradually adapted from the i.i.d. case to a broader non-stationary time series setup, they generally rely on a closed-form statistic, e.g., the least squares estimator, and therefore do not apply to the M-regression setting. To our knowledge, no comparable research has been conducted on uniform nonparametric inference of the CRF under the time-varying M-regression framework. The second reason, possibly more salient, pertains to the complexity of the temporal dynamics considered. Specifically, it is difficult to consistently estimate the (asymptotic) distributional behavior of the M-estimators around the time points where {\bf x}_{i} and/or e_{i} experience abrupt changes. As a result, directly inferring the M-estimators of \mbox{\boldmath$\beta$}(\cdot) simultaneously over time becomes challenging. Nevertheless, we discover that aggregated M-estimators of the CRF converge to a Gaussian process uniformly over time under mild conditions, which can be effectively approximated by the self-convolved bootstrap mentioned earlier. Consequently, it is simpler and more accurate to make simultaneous inferences of \Lambda(\cdot) than of \mbox{\boldmath$\beta$}(\cdot) under complex temporal dynamics. As far as we know, no previous literature has used the method of integration or aggregation to alleviate the effect of complex dependence and heteroscedasticity in regression analysis.

The self-convolved bootstrap facilitated by the CRF addresses different types of tests using a unified asymptotic result, which ventures into a new avenue compared to the conventional methods. To elaborate, we consider three types of hypothesis tests: Exact Function Tests, Lack-of-fit Tests, and Qualitative Tests. The first type, the Exact Function Test (EFT), takes the form of H_{0}:\mbox{\boldmath$\beta$}(t)=\mbox{\boldmath$\beta$}_{0}(t) for some specific \mbox{\boldmath$\beta$}_{0}(t), which, for example, enables one to check the significance of the variables by letting \mbox{\boldmath$\beta$}_{0}(t)=\mathbf{0}. Secondly, the Lack-of-fit Test (LOFT), also referred to as a diagnostic test (Stute, 1997), tackles the circumstance where the null hypothesis is a parametric family while the alternative remains nonparametric (He and Zhu, 2003). This includes, but is not limited to, tests of constancy and linearity of the function versus a general smooth alternative. An example in the realm of econometrics is Chen and Hong (2012), where the LOFT is used to tackle the long-standing problem of detecting structural changes in economic relationships. Apart from the aforementioned two types of tests, one may also want to check qualitative beliefs on functions where the null hypothesis is a relatively general nonparametric class of functions (Komarova and Hidalgo, 2020). For example, many economic models adopt certain shape restrictions (e.g., monotonicity, convexity) as plausible assumptions (Chetverikov, 2019; Fang and Seo, 2021). While those prespecified qualitative shape assumptions facilitate the inference and enhance performance (Friedman and Tibshirani, 1984; Matzkin, 1994), wrong conclusions can be drawn when these assumptions are not satisfied. The necessity to conduct Qualitative Tests (QT), therefore, becomes axiomatic.

Historically in the context of regression, much of the literature has focused on the Lack-of-fit Test for the regression mean function, serving as a key component in model checking. One popular approach is to estimate the parametric and nonparametric curves separately and compute their discrepancy as a natural test statistic, as seen in Hardle and Mammen (1993) and Hong and White (1995). Similarly, Chen and Hong (2012) proposed a generalized Hausman test for checking parameter stability in time series models, applicable to stationary data under an L_{2} framework. Alternatively, another distinguished direction is to study the empirical process based on residuals; see Stute et al. (1998); Koul and Stute (1999). Recently, Mies (2023) utilized the integration technique for change-point detection of local parameters. In comparison, our current work is based on a new self-convolved bootstrap technique and applies to a broader range of tests and a more general M-regression setting. Different from the LOFT, where the null hypothesis forms a parametric family, qualitative hypotheses often include shape constraints that are more complicated to deal with. To test the monotonicity of a regression curve, Bowman et al. (1998) exploited Silverman (1981)’s idea of critical bandwidth. Gijbels et al. (2000) and Ghosal et al. (2000) formulated monotonicity of the regression curve as the concordance of X and Y. As a distinctive approach, Durot (2003) transformed the test of monotonicity into testing whether the integral coincides with its least concave majorant. A localized version of the test was later elaborated by Akakpo et al. (2014). Other contributions in nonparametric qualitative tests include Hall and Heckman (2000); I. Gijbels and Verhasselt (2017); Komarova and Hidalgo (2020); Fang and Seo (2021), among others.

The rest of the paper is structured as follows. Section 2 introduces the piece-wise locally stationary time series, dependence measures, and local linear M-estimators. The estimator of the CRF is proposed, with its asymptotic results stated in Theorem 1. Section 3 presents the self-convolved bootstrap. In Section 4, three types of hypothesis tests are discussed, where detailed algorithms and their asymptotic behaviors are provided. The finite-sample performance of our method is demonstrated in Section 5, followed by empirical illustrations using global climate data and Microsoft stock return in Section 6. Regularity conditions of the model and auxiliary theoretical results about Bahadur representation are given in Section 7. Detailed proofs are provided in the supplementary material.

2 Preliminary

We start by introducing some notation. For an m-dimensional (random) vector \boldsymbol{v}=(v_{1},\cdots,v_{m}), m\geq 1, let |\boldsymbol{v}| be the Euclidean norm and |\boldsymbol{v}|_{\infty}:=\max_{i=1}^{m}|v_{i}|. Let I\{\cdot\} be the indicator function, and denote weak convergence by \Rightarrow.

2.1 Non-stationary time series models

The following piece-wise locally stationary (PLS) time series models (Zhou, 2013) are used to describe the complex temporal dynamics of the covariates and errors. For a sequence of i.i.d. random variables \{\eta_{i}\}_{i=-\infty}^{\infty}, let \{\eta^{\prime}_{i}\}_{i=-\infty}^{\infty} be an i.i.d. copy of \{\eta_{i}\}_{i=-\infty}^{\infty}. Define \mathcal{F}_{i}=(\cdots,\eta_{i-1},\eta_{i}) and \mathcal{F}_{i}^{*}=(\mathcal{F}_{-1},\eta_{0}^{\prime},\eta_{1},\cdots,\eta_{i}).

Definition 1.

We say that \{{\bf Z}_{i}\} is a d-dimensional piece-wise locally stationary time series with r break points and filtration \{\mathcal{F}_{i}\} if

\displaystyle {\bf Z}_{i}=\sum_{j=0}^{r}G_{j}(i/n,\mathcal{F}_{i})\,I\{i/n\in(w_{j},w_{j+1}]\},\quad i=1,2,\cdots,n, \qquad (3)

where 0=w_{0}<w_{1}<\cdots<w_{r}<w_{r+1}=1, and G_{j}:[w_{j},w_{j+1}]\times\mathbb{R}^{\mathbb{Z}}\rightarrow\mathbb{R}^{d}, j=0,1,\cdots,r, are possibly nonlinear filters.

The breakpoints w_{1},\cdots,w_{r} are assumed to be fixed but unknown, and the number of breaks r is assumed to be bounded. The PLS process can capture a broad class of non-stationary behavior in practice because it allows the underlying data-generating mechanism to evolve smoothly between breakpoints (provided that the filters G_{j}(t,\mathcal{F}_{i}) are smooth in t) and permits the latter mechanism to change abruptly from G_{j-1}(\cdot) to G_{j}(\cdot) at the breakpoint w_{j}. Let \zeta_{i} be the index j such that i/n\in(w_{j},w_{j+1}]. Then (3) is equivalent to {\bf Z}_{i}=G_{\zeta_{i}}(i/n,\mathcal{F}_{i}). For q>0 and i\geq 0, we define the dependence measures

\displaystyle\delta_{{\bf Z}}(i,q)=\max_{0\leq j\leq r}\sup_{w_{j}\leq t\leq w_{j+1}}\|G_{j}(t,\mathcal{F}_{i})-G_{j}(t,\mathcal{F}_{i}^{*})\|_{q}, \qquad (4)

where \|\cdot\|_{q}=\{\mathbb{E}[|\cdot|^{q}]\}^{1/q} is the {\cal L}^{q} norm. Write \|\cdot\|:=\|\cdot\|_{2}. Let \delta_{{\bf Z}}(i,q)=0 if i<0. Intuitively, \delta_{{\bf Z}}(i,q) measures uniformly the changes in a dynamic system’s output when the innovations of the system i steps ahead are replaced with i.i.d. copies. Therefore the decay speed of \delta_{{\bf Z}}(i,q) as a function of i measures the strength of the system’s temporal dependence. We refer the readers to Zhou (2013) for more discussions and examples of the PLS time series models and the associated dependence measures.

Here we consider filtrations \mathcal{F}_{i}=(\cdots,\eta_{i-1},\eta_{i}) and \mathcal{G}_{i}=(\cdots,\tau_{i-1},\tau_{i}), where \{\eta_{i}\}_{i=-\infty}^{\infty} and \{\tau_{i}\}_{i=-\infty}^{\infty} are independent. Assume the piece-wise locally stationary time series {\bf x}_{i}={\bf H}_{\zeta_{i}}(t_{i},\mathcal{G}_{i}) and e_{i}=G_{\zeta_{i}}(t_{i},\mathcal{F}_{i},\mathcal{G}_{i}). Observe that {\bf x}_{i} and e_{i} can be dependent, as the innovations \mathcal{G}_{i} which generate {\bf x}_{i} are also involved in e_{i}’s generating process. On the other hand, the factors that influence e_{i} but are unrelated to {\bf x}_{i} are captured in \mathcal{F}_{i}. Without loss of generality, assume \{{\bf x}_{i}\} and \{e_{i}\} share the same r breakpoints 0=w_{0}<\cdots<w_{r}<w_{r+1}=1.

2.2 Aggregated local estimation of the CRF

Throughout this paper, we assume that the regression function \mbox{\boldmath$\beta$}(t)\in{\cal C}^{2}[0,1]. For any t\in(0,1), since \mbox{\boldmath$\beta$}(s)\approx\mbox{\boldmath$\beta$}(t)+\mbox{\boldmath$\beta$}^{\prime}(t)(s-t) in a small neighborhood of t, we define the preliminary local linear M-estimates of \mbox{\boldmath$\beta$}(t) and \mbox{\boldmath$\beta$}^{\prime}(t) by

\displaystyle\tilde{\mbox{\boldmath$\beta$}}_{b_{n}}(t):=\left(\begin{array}{c}\hat{\mbox{\boldmath$\beta$}}_{b_{n}}(t)\\ \hat{\mbox{\boldmath$\beta$}}_{b_{n}}^{\prime}(t)\end{array}\right)=\mathop{\mbox{argmin}}_{\mbox{\boldmath$b$}_{0},\mbox{\boldmath$b$}_{1}\in\mathbb{R}^{p}}\sum_{i=1}^{n}\rho\big(y_{i}-{\bf x}_{i}^{\top}\mbox{\boldmath$b$}_{0}-{\bf x}_{i}^{\top}\mbox{\boldmath$b$}_{1}(t_{i}-t)\big)K\big((t_{i}-t)/b_{n}\big), \qquad (7)

where t_{i}=i/n, b_{n} is a bandwidth satisfying b_{n}\rightarrow 0, and \rho(\cdot) is a convex loss function with left derivative \psi(\cdot). In this paper, we also assume robustness of the loss functions in the sense that |\psi(x)-\psi(y)|\leq M_{1}+M_{2}|x-y| for all x,y\in\mathbb{R} and some positive constants M_{1},M_{2}. K(\cdot) is a kernel function satisfying K\in{\cal K}, where {\cal K} is the collection of symmetric and {\cal C}^{1} density functions with support [-1,1].
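As a rough illustration of how (7) can be computed, the sketch below (our own, with illustrative names) minimizes the kernel-weighted loss with a derivative-free optimizer so that non-smooth losses such as the check loss can be plugged in; dedicated solvers (e.g., linear programming for quantile loss) would be preferable in practice.

```python
import numpy as np
from scipy.optimize import minimize

def epanechnikov(u):
    """Epanechnikov kernel supported on [-1, 1]."""
    return 0.75 * (1.0 - u ** 2) * (np.abs(u) <= 1.0)

def local_linear_m(t, X, y, bn, rho, kernel=epanechnikov):
    """Sketch of the local linear M-estimator (7).
    X: (n, p) design matrix, y: (n,) responses, rho: vectorized convex loss.
    Returns estimates of beta(t) and beta'(t)."""
    n, p = X.shape
    ti = np.arange(1, n + 1) / n
    w = kernel((ti - t) / bn)
    idx = w > 0                                   # only observations inside the local window

    def objective(theta):
        b0, b1 = theta[:p], theta[p:]
        resid = y[idx] - X[idx] @ b0 - (X[idx] @ b1) * (ti[idx] - t)
        return np.sum(rho(resid) * w[idx])

    out = minimize(objective, x0=np.zeros(2 * p), method="Powell")
    return out.x[:p], out.x[p:]
```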

Recalling the CRF defined in (2), the regression function \mbox{\boldmath$\beta$}(\cdot) can be retrieved via

\mbox{\boldmath$\beta$}(t)=\Lambda^{\prime}(t-),\quad 0<t\leq 1,\quad\mbox{and}\quad\mbox{\boldmath$\beta$}(0)=\Lambda^{\prime}(0+),

where \Lambda^{\prime}(t-) and \Lambda^{\prime}(t+) represent the left and right derivatives of \Lambda(t), respectively.

As is discussed in Section 7, the M-estimator \hat{\mbox{\boldmath$\beta$}}_{b_{n}}(t) obtained in (7) involves a bias term regarding \mbox{\boldmath$\beta$}^{\prime\prime}(t). In order to proceed without estimating \mbox{\boldmath$\beta$}^{\prime\prime}(t), we shall employ a Jackknife bias-corrected estimator denoted as \check{\mbox{\boldmath$\beta$}}_{b_{n}}(t), where

\check{\mbox{\boldmath$\beta$}}_{b_{n}}(t):=2\hat{\mbox{\boldmath$\beta$}}_{b_{n}/\sqrt{2}}(t)-\hat{\mbox{\boldmath$\beta$}}_{b_{n}}(t). \qquad (8)

The implementation of the bias-correction procedure is asymptotically equivalent to using the second-order kernel K^{*}(x):=2\sqrt{2}K(\sqrt{2}x)-K(x). In this way, the bias of \check{\mbox{\boldmath$\beta$}}_{b_{n}}(t) is asymptotically negligible under mild conditions. Based on the Jackknife bias-corrected estimator, we propose to estimate \Lambda(t) via

\check{\Lambda}(t_{j}):=\sum_{i=1}^{j}\check{\mbox{\boldmath$\beta$}}_{b_{n}}(t_{i})/n,\quad 1\leq j\leq n, \qquad (9)

and \check{\Lambda}(0):=0. For any t\in[0,1], let \check{\Lambda}(t) be the linear interpolation of the sequence \{\check{\Lambda}(t_{j})\}_{j=0}^{n}. We omit the subscript b_{n} of \check{\mbox{\boldmath$\beta$}}_{b_{n}}(t) hereafter whenever no confusion is caused.
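The two displays above translate directly into the following sketch (ours, building on the hypothetical local_linear_m helper from the previous sketch): the Jackknife correction (8) is applied at every grid point and the corrected estimates are accumulated as in (9).

```python
import numpy as np

def crf_estimate(X, y, bn, rho):
    """Sketch of the aggregated CRF estimator: returns the bias-corrected
    local estimates beta_check(t_i) and the CRF estimates Lambda_check(t_j)."""
    n, p = X.shape
    ti = np.arange(1, n + 1) / n
    beta_check = np.zeros((n, p))
    for j, t in enumerate(ti):
        b_full, _ = local_linear_m(t, X, y, bn, rho)                 # bandwidth b_n
        b_half, _ = local_linear_m(t, X, y, bn / np.sqrt(2.0), rho)  # bandwidth b_n / sqrt(2)
        beta_check[j] = 2.0 * b_half - b_full                        # Jackknife correction (8)
    Lambda_check = np.cumsum(beta_check, axis=0) / n                 # aggregation (9)
    return beta_check, Lambda_check
```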

Let \mathbf{C} be a fixed s\times p (s\leq p) full-rank matrix. We are interested in testing the dynamic pattern of \mbox{\boldmath$\beta$}_{\mathbf{C}}(\cdot):=\mathbf{C}\mbox{\boldmath$\beta$}(\cdot), which includes any linear combination of \{\mbox{\boldmath$\beta$}_{j}(\cdot)\}_{j=0}^{p-1}. As a result, for the CRF \Lambda_{\mathbf{C}}(t):=\mathbf{C}\Lambda(t), its estimator can be obtained via \check{\Lambda}_{\mathbf{C}}(t):=\mathbf{C}\check{\Lambda}(t). Based on the Bahadur representation presented in Section 7, the following theorem establishes a \sqrt{n}-consistent Gaussian weak convergence result for the proposed aggregated M-estimator:

Theorem 1.

Suppose assumptions (A1)-(A7) in Section 7 hold, \frac{nb_{n}^{4}}{\log^{8}n}\to\infty, n^{c}b_{n}\to 0 for some positive constant c, and let \Psi_{\mathbf{C}}(t)=(\mathbf{C}\Sigma^{-1}(t)\Psi(t)\Sigma^{-1}(t)\mathbf{C}^{\top})^{\frac{1}{2}} with \Sigma(t) and \Psi(t) defined in Section 7. Then we have

\left\{\sqrt{n}\big(\check{\Lambda}_{\mathbf{C}}(t)-\Lambda_{\mathbf{C}}(t)\big)\right\}_{0<t<1}\Rightarrow\left\{\mathbf{U}(t)\right\}_{0<t<1},

where \{\mathbf{U}(t)\} is a mean-zero Gaussian process with \text{Cov}(\mathbf{U}(t_{1}),\mathbf{U}(t_{2}))=\int_{0}^{\min(t_{1},t_{2})}\Psi_{\mathbf{C}}(t)\Psi_{\mathbf{C}}^{\top}(t)\,dt.

3 The Self-convolved Bootstrap

The Gaussian process \mathbf{U}(t) in Theorem 1 provides a theoretical foundation for the simultaneous inference of \Lambda(t), but direct implementation remains infeasible due to its complex covariance structure. To circumvent the problem, we propose a self-convolved bootstrap method to mimic the behavior of the Gaussian process \mathbf{U}(t).

Let \mu=\int_{-2}^{2}[K^{*}(t-1)+K^{*}(t+1)-2K^{*}(t)]^{2}\,\mathrm{d}t, and let \{R_{i}\}_{i=1}^{n} be i.i.d. standard normal random variables independent of \{(\mathbf{x}_{i},y_{i})\}_{i=1}^{n}. For a given bandwidth c_{n}, define the bootstrap process \{\tilde{\boldsymbol{\Phi}}_{n}^{o}(t),t\in(0,1)\} as the linear interpolation of \{\boldsymbol{\Phi}_{n,j}^{o}\}, where

\boldsymbol{\Phi}_{n,j}^{o}=\mathbf{C}\sum_{i=\lceil 2nc_{n}\rceil}^{j}\sqrt{\frac{c_{n}}{\mu}}\big(\check{\mbox{\boldmath$\beta$}}_{c_{n}}(t_{i}+c_{n})+\check{\mbox{\boldmath$\beta$}}_{c_{n}}(t_{i}-c_{n})-2\check{\mbox{\boldmath$\beta$}}_{c_{n}}(t_{i})\big)R_{i},\quad j=\lceil 2nc_{n}\rceil,\cdots,n-\lceil 2nc_{n}\rceil.

The self-convolved bootstrap \tilde{\boldsymbol{\Phi}}_{n}^{o}(t) only requires the convolution of local M-regression estimates with i.i.d. standard normal random variables \{R_{i}\}_{i=1}^{n}. The heuristics stem from two fundamental observations. Firstly, the Bahadur representation of \{\check{\mbox{\boldmath$\beta$}}(t_{i})\} in Section 7 suggests that \check{\mbox{\boldmath$\beta$}}_{c_{n}}(t_{i}+c_{n})+\check{\mbox{\boldmath$\beta$}}_{c_{n}}(t_{i}-c_{n})-2\check{\mbox{\boldmath$\beta$}}_{c_{n}}(t_{i}) can be expressed as a weighted block sum of \{\mathbf{x}_{i}\psi(e_{i})\}_{i=1}^{n}, where we take the second-order difference as a bias-correction technique. Secondly, Zhou (2013) demonstrated that for a broad range of non-stationary time series, progressive convolutions of their block sums with i.i.d. standard normal random variables can consistently mimic the joint probabilistic behavior of their partial sum processes. Observe that \mathbf{U}(t) represents the limiting behavior of a weighted partial sum process of \{\mathbf{x}_{i}\psi(e_{i})\}_{i=1}^{n}. By leveraging these observations, the self-convolved bootstrap achieves an accurate simulation of the limiting Gaussian process of the CRF and retains a concise form for a broad range of M-estimators, utilizing just a single tuning parameter.
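The construction can be summarized in a few lines of code. The sketch below (ours; the names and the interface of beta_check_fun are assumptions) takes a user-supplied function returning the bias-corrected local estimate with bandwidth c_n and produces B bootstrap copies of the maximum of |Phi^o_{n,j}|, which is what the tests in Section 4 consume.

```python
import numpy as np

def self_convolved_bootstrap(beta_check_fun, C, n, cn, mu, B, seed=None):
    """beta_check_fun(t): bias-corrected local estimate (length-p array) at time t,
    computed with bandwidth c_n; C: (s, p) contrast matrix; mu: the kernel constant
    defined above. Returns B bootstrap maxima of the process |Phi^o_{n,j}|_inf."""
    rng = np.random.default_rng(seed)
    lo = int(np.ceil(2 * n * cn))
    hi = n - lo
    ti = np.arange(lo, hi + 1) / n
    # second-order differences of the local estimates, premultiplied by C and scaled
    D = np.array([np.sqrt(cn / mu) *
                  (C @ (beta_check_fun(t + cn) + beta_check_fun(t - cn) - 2.0 * beta_check_fun(t)))
                  for t in ti])                        # shape (m, s)
    maxima = np.empty(B)
    for r in range(B):
        R = rng.standard_normal(len(ti))               # auxiliary i.i.d. N(0, 1) multipliers
        Phi = np.cumsum(D * R[:, None], axis=0)        # progressive convolved sums Phi^o_{n,j}
        maxima[r] = np.max(np.abs(Phi))
    return maxima
```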

In contrast, it is difficult to conduct inference on \mbox{\boldmath$\beta$}(t) directly based on \check{\mbox{\boldmath$\beta$}}_{b_{n}}(t), as the density-like quantity involved in the distributional behavior of \check{\mbox{\boldmath$\beta$}}_{b_{n}}(t) is hard to estimate in practice due to possible discontinuities and the nontrivial task of choosing smoothing parameters. As a result, the CRF considered in this paper, combined with the proposed self-convolved bootstrap, enables us to conduct inference more efficiently and accurately.

To quantify the consistency of \{\tilde{\boldsymbol{\Phi}}_{n}^{o}(t),t\in(0,1)\}, we derive a direct comparison between the distributions of \{\boldsymbol{\Phi}_{n,j}^{o}\}_{j=\lceil 2nc_{n}\rceil}^{n-\lceil 2nc_{n}\rceil} and \{\mathbf{U}(t_{j})\}_{j=\lceil 2nc_{n}\rceil}^{n-\lceil 2nc_{n}\rceil} in the following theorem. First, we define the ns-dimensional vector \mathbf{\Theta}^{B}=[(\mathbf{\Theta}_{1}^{B})^{\top},(\mathbf{\Theta}_{2}^{B})^{\top},\cdots,(\mathbf{\Theta}_{n}^{B})^{\top}] with

\mathbf{\Theta}_{j}^{B}:=\begin{cases}\boldsymbol{\Phi}_{n,\lceil 2nc_{n}\rceil}^{o} & \text{if }j\leq\lceil 2nc_{n}\rceil\\ \boldsymbol{\Phi}_{n,j}^{o} & \text{if }\lceil 2nc_{n}\rceil<j\leq n-\lceil 2nc_{n}\rceil\\ \boldsymbol{\Phi}_{n,n-\lceil 2nc_{n}\rceil}^{o} & \text{if }n-\lceil 2nc_{n}\rceil<j\leq n,\end{cases}

and similarly define the ns-dimensional vector \mathbf{U}=[(\mathbf{U}_{1})^{\top},(\mathbf{U}_{2})^{\top},\cdots,(\mathbf{U}_{n})^{\top}] with

\mathbf{U}_{i}:=\begin{cases}\mathbf{U}(t_{\lceil 2nc_{n}\rceil}) & \text{if }i\leq\lceil 2nc_{n}\rceil\\ \mathbf{U}(t_{i}) & \text{if }\lceil 2nc_{n}\rceil<i\leq n-\lceil 2nc_{n}\rceil\\ \mathbf{U}(t_{n-\lceil 2nc_{n}\rceil}) & \text{if }n-\lceil 2nc_{n}\rceil<i\leq n.\end{cases}

Recall that \mathbf{U}(t) is defined in Theorem 1. Furthermore, we define

\Delta:=\max_{1\leq i,j\leq ns}\left|\left[\operatorname{Cov}(\mathbf{U})-\operatorname{Cov}\big(\mathbf{\Theta}^{B}\mid\{(\mathbf{x}_{k},y_{k})\}_{k=1}^{n}\big)\right]_{ij}\right|

as a measure of the difference in covariance structure between the bootstrapped process and the limiting Gaussian process.

Theorem 2.

Suppose assumptions (A1)-(A7) in Section 7 hold, \frac{nc_{n}^{4}}{\log^{8}n}\to\infty, and nc_{n}^{5}\to 0; then we have \Delta=O_{\mathbb{P}}(\sqrt{c_{n}}+nc_{n}^{5}). Define the sequence of events A_{n}=\{\Delta\leq(\sqrt{c_{n}}+nc_{n}^{5})h_{n}\}, where h_{n}>0 is a sequence diverging at an arbitrarily slow rate. Then \mathbb{P}(A_{n})=1-o(1). On the event A_{n}, for any \delta\in(0,1), we have

\sup_{|x|>d_{n,p}^{\circ}}\left|\mathbb{P}\left(\max_{\lceil 2nc_{n}\rceil\leq i\leq n-\lceil 2nc_{n}\rceil}|\mathbf{U}(t_{i})|_{\infty}\leq x\right)-\mathbb{P}\left(\max_{\lceil 2nc_{n}\rceil\leq i\leq n-\lceil 2nc_{n}\rceil}|\boldsymbol{\Phi}_{n,i}^{o}|_{\infty}\leq x\mid\{(\mathbf{x}_{i},y_{i})\}_{i=1}^{n}\right)\right|\lesssim\left((\sqrt{c_{n}}+nc_{n}^{5})h_{n}\right)^{1/3}(\log(n))^{1+4\delta/3}+\frac{1}{n(\log(n))^{1/2}}, \qquad (10)

where d_{n,p}^{\circ}=C\left[\left((\sqrt{c_{n}}+nc_{n}^{5})h_{n}\right)^{1/3}\log^{\delta/3}(n)+\log^{-\delta}(n)\right] for some finite constant C that does not depend on n.

Remark 1.

Theorem 2 guarantees that the conditional behavior of the bootstrap process consistently mimics the distribution of the limiting process {\bf U}(t) in the {\cal L}_{\infty} norm. The restricted range |x|>d_{n,p}^{\circ} is to make the Gaussian approximation valid under the circumstance that the variances of \boldsymbol{\Phi}_{n,i}^{o} have no lower bounds. Note that d_{n,p}^{\circ} and the upper bound in (10) converge to 0 as n\to\infty. Accordingly, for a given \alpha and when n is sufficiently large, d_{n,p}^{\circ} will be dominated by the (1-\alpha)-th percentile of \max_{\lceil 2nc_{n}\rceil\leq i\leq n-\lceil 2nc_{n}\rceil}|\boldsymbol{\Phi}_{n,i}^{o}|_{\infty}, thereby ensuring that the results in Theorem 2 asymptotically validate the bootstrap procedure. Furthermore, under the conditions of Theorem 2, c_{n} converges to 0; hence the consistency of the bootstrap \tilde{\boldsymbol{\Phi}}_{n}^{o}(t) demonstrated on [2c_{n},1-2c_{n}] is asymptotically uniform on t\in(0,1).

4 Applications to Hypothesis Testing

In principle, hypotheses regarding the regression function \mbox{\boldmath$\beta$}(t) can be expressed equivalently in terms of the CRF \Lambda(t). In this section, we shall explore the application of the CRF and the self-convolved bootstrap to three general classes of hypothesis tests for the M-regression: the Exact Function Test (EFT), the Lack-of-fit Test (LOFT), and the Qualitative Test (QT).

4.1 Exact Function Test

The Exact Function Test amounts to testing H_{0}:\mbox{\boldmath$\beta$}_{\mathbf{C}}(t)=f(t),\;t\in(0,1), where f(\cdot) is a known function. It can be reformulated as H_{0}:\Lambda_{\mathbf{C}}(t)=\int_{0}^{t}f(s)\,ds. Define

T_{e}:=\sup_{t\in(0,1)}\sqrt{n}\left|\check{\Lambda}_{\mathbf{C}}(t)-\int_{0}^{t}f(s)\,ds\right|_{\infty}. \qquad (11)

For any prespecified level \alpha, our goal is to find the critical value q_{\alpha} such that \mathbb{P}\left(T_{e}>q_{\alpha}\mid H_{0}\right)=\alpha asymptotically. The detailed procedures for conducting the EFT are given in Algorithm 1, and a compact code sketch of its decision step (Steps 4-6) is provided below. Propositions 1 and 2 in Section 4.2 validate the latter algorithm theoretically.
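As a hedged illustration (ours, not the paper's implementation), the decision rule reduces to comparing the maximal CRF discrepancy with the scaled bootstrap quantile; here boot_maxima could come from the bootstrap sketch of Section 3 and the CRF arrays from the sketch of Section 2.2.

```python
import numpy as np

def exact_function_test(Lambda_check_C, Lambda0_C, boot_maxima, n, i_lo, i_hi, alpha=0.05):
    """Lambda_check_C, Lambda0_C: arrays of estimated and hypothesized CRF values on
    the grid t_1, ..., t_n (entry j-1 corresponds to t_j); boot_maxima: the B bootstrap
    maxima M_r. Returns the test statistic, the critical value, and the decision."""
    q_hat = np.quantile(boot_maxima, 1.0 - alpha)            # (1 - alpha) bootstrap quantile
    diff = Lambda_check_C[i_lo - 1:i_hi] - Lambda0_C[i_lo - 1:i_hi]
    stat = np.max(np.abs(diff))                              # max_j |Lambda_check - Lambda_0|_inf
    critical = q_hat / np.sqrt(n)
    return stat, critical, stat > critical
```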

Algorithm 1 Exact Function Test

 

1:Select appropriate bandwidths b_{n}, c_{n} according to the procedures in Section 5.1, and define i_{*}=\max(\lceil nb_{n}\rceil,\lceil 2nc_{n}\rceil) and i^{*}=n-i_{*}.
2:Obtain the Jackknife bias-corrected M-estimators \check{\mbox{\boldmath$\beta$}}_{b_{n}}(t_{i}), \check{\mbox{\boldmath$\beta$}}_{c_{n}}(t_{i}), \check{\mbox{\boldmath$\beta$}}_{c_{n}}(t_{i}-c_{n}) and \check{\mbox{\boldmath$\beta$}}_{c_{n}}(t_{i}+c_{n}) for i=1,2,\cdots,n, and compute \{\check{\Lambda}_{\mathbf{C}}(t_{j})\}_{j=1}^{n} based on \{\check{\mbox{\boldmath$\beta$}}_{b_{n}}(t_{i})\}_{i=1}^{n}.
3:Generate B sets of i.i.d. standard normal random variables \{u_{i}^{(r)}\}_{i=1}^{n}, r=1,2,\cdots,B. For each r, calculate \{\boldsymbol{\Phi}_{n,j}^{o}\}_{j=1}^{n} as \{\boldsymbol{\Phi}_{n,j}^{(r)}\}_{j=1}^{n} by letting R_{i}=u_{i}^{(r)} in Theorem 2.
4:Calculate M_{r}=\max_{i_{*}\leq j\leq i^{*}}|\boldsymbol{\Phi}_{n,j}^{(r)}|_{\infty}, r=1,2,\cdots,B.
5:For a given \alpha\in(0,1), find the (1-\alpha)-th sample quantile of \{M_{r}\}_{r=1}^{B}, denoted \hat{q}_{n,1-\alpha}.
6:For the Exact Function Test where H_{0}:\mbox{\boldmath$\beta$}_{\mathbf{C}}(t)=f(t),\;t\in(0,1), reject the null hypothesis if \max_{i_{*}\leq j\leq i^{*}}|\check{\Lambda}_{\mathbf{C}}(t_{j})-\int_{0}^{t_{j}}f(s)\,ds|_{\infty}>\hat{q}_{n,1-\alpha}/\sqrt{n}.

 

4.2 Lack-of-fit Test

The Lack-of-fit Test investigates whether \Lambda_{\mathbf{C}}(t) belongs to a given parametric family; i.e., \Lambda_{\mathbf{C}}(t)=g(t,\boldsymbol{\theta}),\;t\in(0,1), for a given family of functions g(t,\cdot) with unknown \boldsymbol{\theta}. Under H_{0}, oftentimes the unknown \boldsymbol{\theta} can be expressed (solved) in terms of \Lambda_{\mathbf{C}}(v_{i}) for some given points v_{1},v_{2},\cdots,v_{k}\in[0,1]. See Section 4.2.1 for a detailed example of testing whether \mbox{\boldmath$\beta$}_{\mathbf{C}}(t) is a polynomial of a given degree. In this case, the null hypothesis is written as

\Lambda_{\mathbf{C}}(t)-f(t,\{\Lambda_{\mathbf{C}}(v_{i})\}_{i=1}^{k})=\mathbf{0},\quad t\in(0,1), \qquad (12)

where k is a finite integer. Assumptions on f(\cdot) are discussed in Theorem 3. Consider

T_{l}:=\sup_{t\in(0,1)}\sqrt{n}\left|\check{\Lambda}_{\mathbf{C}}(t)-f(t,\{\check{\Lambda}_{\mathbf{C}}(v_{i})\}_{i=1}^{k})\right|_{\infty}, \qquad (13)

then the critical value can be obtained as shown in Algorithm 2.

4.2.1 Polynomial Test

The polynomial test aims to verify whether the coefficients are q-th order polynomials for a fixed q. For instance, q=0 allows one to test if \mbox{\boldmath$\beta$}_{\mathbf{C}}(t)=\boldsymbol{\theta}_{0} for some unknown \boldsymbol{\theta}_{0}. Therefore, under H_{0}, \Lambda_{\mathbf{C}}(t)=\boldsymbol{\theta}_{0}t and \boldsymbol{\theta}_{0}=\Lambda_{\mathbf{C}}(1) by setting v_{1}=1. Hence H_{0} can be written equivalently as \Lambda_{\mathbf{C}}(t)-t\Lambda_{\mathbf{C}}(1)=\mathbf{0},\;t\in(0,1). This special case of q=0 is closely related to CUSUM tests in structural change detection and has been widely discussed in the statistics literature recently; see for instance Mies (2023) and the references therein.

In general, testing whether \mbox{\boldmath$\beta$}_{\mathbf{C}}(t) is a q-th order polynomial is equivalent to testing H_{0}:\boldsymbol{\Lambda}_{\mathbf{C}}(t)=\mathbf{a}_{1}t+\cdots+\mathbf{a}_{q+1}t^{q+1},\;t\in(0,1). Choosing k=q+1 and v_{i}=i/(q+1), we solve \mathbf{a}_{i},1\leq i\leq q+1, via the system of linear equations

\begin{cases}\Lambda_{\mathbf{C}}\left(\frac{1}{q+1}\right)-\frac{1}{q+1}\mathbf{a}_{1}-\left(\frac{1}{q+1}\right)^{2}\mathbf{a}_{2}-\cdots-\left(\frac{1}{q+1}\right)^{q+1}\mathbf{a}_{q+1}=\mathbf{0}\\ \Lambda_{\mathbf{C}}\left(\frac{2}{q+1}\right)-\frac{2}{q+1}\mathbf{a}_{1}-\left(\frac{2}{q+1}\right)^{2}\mathbf{a}_{2}-\cdots-\left(\frac{2}{q+1}\right)^{q+1}\mathbf{a}_{q+1}=\mathbf{0}\\ \quad\vdots\\ \Lambda_{\mathbf{C}}(1)-\mathbf{a}_{1}-\mathbf{a}_{2}-\cdots-\mathbf{a}_{q+1}=\mathbf{0}.\end{cases}

By plugging in the solutions for the \mathbf{a}_{i}’s, H_{0} is of the form (12), and the LOFT can be applied.
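For concreteness, consider the linear case q=1 (a worked instance supplied here for illustration): with k=2, v_{1}=1/2 and v_{2}=1, the system reads \Lambda_{\mathbf{C}}(1/2)=\mathbf{a}_{1}/2+\mathbf{a}_{2}/4 and \Lambda_{\mathbf{C}}(1)=\mathbf{a}_{1}+\mathbf{a}_{2}, whose solution is

\mathbf{a}_{1}=4\Lambda_{\mathbf{C}}(1/2)-\Lambda_{\mathbf{C}}(1),\qquad\mathbf{a}_{2}=2\Lambda_{\mathbf{C}}(1)-4\Lambda_{\mathbf{C}}(1/2),

so that H_{0} becomes \Lambda_{\mathbf{C}}(t)-t\big(4\Lambda_{\mathbf{C}}(1/2)-\Lambda_{\mathbf{C}}(1)\big)-t^{2}\big(2\Lambda_{\mathbf{C}}(1)-4\Lambda_{\mathbf{C}}(1/2)\big)=\mathbf{0},\;t\in(0,1), which is exactly of the form (12).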

Theorem 3 lays the theoretical foundation for the Lack-of-fit Test, and we summarize the procedure in Algorithm 2.

Theorem 3.

Under the conditions of Theorems 1 and 2, assume f(t,\{\mathbf{s}_{i}\}_{i=1}^{k}):[0,1]\times\mathbb{R}^{sk}\to\mathbb{R}^{s} is continuously differentiable with respect to \{\mathbf{s}_{i}\}_{i=1}^{k}. Denoting the partial derivative of f(\cdot) with respect to \mathbf{s}_{j} as \partial_{j}f(t,\{\mathbf{s}_{i}\}_{i=1}^{k}), we assume \partial_{j}f(t,\{\mathbf{s}_{i}\}_{i=1}^{k}), j=1,2,\cdots,k, are Lipschitz continuous in \{\mathbf{s}_{i}\}_{i=1}^{k} uniformly over t\in(0,1), and that \max_{j=1}^{k}\sup_{t\in(0,1)}|\partial_{j}f(t,\{\mathbf{s}_{i}\}_{i=1}^{k})|\leq C. Let \mathbf{U}(t) be the Gaussian process defined in Theorem 2, T(t):=\Lambda_{\mathbf{C}}(t)-f(t,\{\Lambda_{\mathbf{C}}(v_{i})\}_{i=1}^{k}), T_{n}(t):=\check{\Lambda}_{\mathbf{C}}(t)-f(t,\{\check{\Lambda}_{\mathbf{C}}(v_{i})\}_{i=1}^{k}), and \tilde{\mathbf{U}}(t):=\mathbf{U}(t)-\sum_{j=1}^{k}\partial_{j}f(t,\{\Lambda_{\mathbf{C}}(v_{i})\}_{i=1}^{k})\mathbf{U}(v_{j}). Let i_{*}=\max(\lceil nb_{n}\rceil,\lceil 2nc_{n}\rceil) and i^{*}=n-i_{*}; then

\max_{i_{*}\leq i\leq i^{*}}\left|\sqrt{n}\big(T_{n}(t_{i})-T(t_{i})\big)\right|_{\infty}\Rightarrow\sup_{t\in(0,1)}\left|\tilde{\mathbf{U}}(t)\right|_{\infty}. \qquad (14)

Additionally, let \check{L}_{n}(t)=\tilde{\boldsymbol{\Phi}}^{\circ}_{n}(t)-\sum_{j=1}^{k}\partial_{j}f(t,\{\check{\Lambda}_{\mathbf{C}}(v_{i})\}_{i=1}^{k})\tilde{\boldsymbol{\Phi}}^{\circ}_{n}(v_{j}); then on a sequence of events A_{n} with \mathbb{P}(A_{n})=1-o(1), we have

\sup_{|x|>d_{n,p}^{\circ}}\left|\mathbb{P}\left(\max_{i_{*}\leq i\leq i^{*}}|\tilde{\mathbf{U}}(t_{i})|_{\infty}\leq x\right)-\mathbb{P}\left(\max_{i_{*}\leq i\leq i^{*}}|\check{L}_{n}(t_{i})|_{\infty}\leq x\mid\{(\mathbf{x}_{i},y_{i})\}_{i=1}^{n}\right)\right|\to 0, \qquad (15)

where d_{n,p}^{\circ}\to 0 is as defined in Theorem 2.

Algorithm 2 Lack-of-fit Test

 

1:Perform Steps 1-3 of Algorithm 1. Based on \{\check{\Lambda}_{\mathbf{C}}(t_{j})\}_{j=1}^{n}, obtain the Jackknife bias-corrected statistic T_{n}(t)=\check{\Lambda}_{\mathbf{C}}(t)-f(t,\{\check{\Lambda}_{\mathbf{C}}(v_{i})\}_{i=1}^{k}), and construct the corresponding bootstrap statistics \{\check{L}^{(r)}_{n}(t)=\tilde{\boldsymbol{\Phi}}^{(r)}_{n}(t)-\sum_{j=1}^{k}\partial_{j}f(t,\{\check{\Lambda}_{\mathbf{C}}(v_{i})\}_{i=1}^{k})\tilde{\boldsymbol{\Phi}}^{(r)}_{n}(v_{j})\}_{r=1}^{B}.
2:Calculate M_{r}=\max_{i_{*}\leq j\leq i^{*}}|\check{L}^{(r)}_{n}(t_{j})|_{\infty}, r=1,2,\cdots,B.
3:For a given \alpha\in(0,1), find the (1-\alpha)-th sample quantile of \{M_{r}\}_{r=1}^{B}, denoted \hat{q}_{n,1-\alpha}.
4:For the Lack-of-fit Test H_{0}:\Lambda_{\mathbf{C}}(t)-f(t,\{\Lambda_{\mathbf{C}}(v_{i})\}_{i=1}^{k})=\mathbf{0},\;t\in(0,1), reject the null hypothesis if \max_{i_{*}\leq j\leq i^{*}}|T_{n}(t_{j})|_{\infty}>\hat{q}_{n,1-\alpha}/\sqrt{n}.

 

It is straightforward to see that the EFT addressed in Section 4.1 is a special case of the LOFT in Section 4.2. Therefore, we state the asymptotic rejection rate of the proposed LOFT algorithm in Propositions 1 and 2, which also apply to the EFT.

Proposition 1.

Suppose the conditions of Theorem 3 hold. Under the null hypothesis, for a given significance level \alpha\in(0,1), the rejection rate of the proposed LOFT satisfies

\lim_{n\to\infty}\lim_{B\to\infty}\mathbb{P}\left(\max_{i_{*}\leq j\leq i^{*}}|T_{n}(t_{j})|_{\infty}>\hat{q}_{n,1-\alpha}/\sqrt{n}\right)=\alpha.
Proposition 2.

Suppose the conditions of Theorem 3 hold. Under the alternative hypothesis of the LOFT, if \sup_{t\in(0,1)}|\Lambda_{\mathbf{C}}(t)-f(t,\{\Lambda_{\mathbf{C}}(v_{i})\}_{i=1}^{k})|_{\infty}\gg n^{-\frac{1}{2}}, then

\lim_{n\to\infty}\lim_{B\to\infty}\mathbb{P}\left(\max_{i_{*}\leq j\leq i^{*}}|T_{n}(t_{j})|_{\infty}>\hat{q}_{n,1-\alpha}/\sqrt{n}\right)=1.

Propositions 1 and 2 demonstrate the accuracy and power of our proposed testing framework. In particular, both the EFT and the LOFT are asymptotically accurate under the null hypothesis, while attaining asymptotic power of 1 under local alternatives deviating from H_{0} with a distance much greater than n^{-1/2}. These results highlight the asymptotic reliability and robustness of our methodology.

4.3 Qualitative Test

In this subsection, we are interested in testing qualitative hypotheses on M-regression coefficients in the sense that the null hypothesis is nonparametric and can be written as

H_{0}:\Lambda_{\mathbf{C}}(t)\in\mathcal{N}_{0},\quad t\in(0,1), \qquad (16)

where \mathcal{N}_{0} is a non-empty subset of \mathcal{N}:=\{g\in\mathcal{C}^{3}[0,1]:[0,1]\to\mathbb{R}^{s},\;g(0)=\mathbf{0}\}. Based on the cumulative M-estimator \check{\Lambda}_{\mathbf{C}}(t), consider the optimization problem:

\text{minimize}\quad\sup_{t\in(0,1)}|\boldsymbol{\phi}(t)-\check{\Lambda}_{\mathbf{C}}(t)|_{\infty}, \qquad (17)
\text{subject to}\quad\boldsymbol{\phi}(t)\in\mathcal{N}_{0}. \qquad (18)

Suppose that there exists a projection under the \mathcal{L}_{\infty} norm onto \mathcal{N}_{0} for any f\in\mathcal{N}. Then the solution to (17), denoted by \tilde{\Lambda}_{\mathbf{C}}(t), is the projection of \check{\Lambda}_{\mathbf{C}}(t) onto \mathcal{N}_{0}. Define

T_{q}:=\sup_{t\in(0,1)}\sqrt{n}\,|\check{\Lambda}_{\mathbf{C}}(t)-\tilde{\Lambda}_{\mathbf{C}}(t)|_{\infty}, \qquad (19)

we observe that T_{q} becomes a natural indicator of the distance between \Lambda_{\mathbf{C}}(t) and \mathcal{N}_{0}, as guaranteed by the uniform consistency of \check{\Lambda}_{\mathbf{C}}(t). Therefore, in principle, T_{q} should be small under H_{0} and large under the alternative. The critical value of the test can be easily obtained by the self-convolved bootstrap, similar to the implementation of the EFT in Algorithm 1. Detailed steps are given in Algorithm 3.

Algorithm 3 Qualitative Test

 

1:Perform Steps 1-5 of Algorithm 1 to obtain the critical value of the EFT, \hat{q}_{n,1-\alpha}.
2:Formulate a projection problem as in (17) and find the solution \tilde{\Lambda}_{\mathbf{C}}(t).
3:For the Qualitative Test where H_{0}:\Lambda_{\mathbf{C}}(t)\in\mathcal{N}_{0},\;t\in(0,1), reject the null hypothesis whenever \max_{i_{*}\leq j\leq i^{*}}|\check{\Lambda}_{\mathbf{C}}(t_{j})-\tilde{\Lambda}_{\mathbf{C}}(t_{j})|_{\infty}>\hat{q}_{n,1-\alpha}/\sqrt{n}.

 

Proposition 3 stated below shows the capability of our approach to asymptotically control the type I error at the specified significance level. Meanwhile, the test attains an asymptotic power of 1 when the CRF deviates from the qualitative hypothesis with an \mathcal{L}_{\infty} distance dominating 1/\sqrt{n}, as illustrated in Proposition 4.

Proposition 3.

Suppose the conditions of Theorems 1 and 2 hold. Under the null hypothesis of the QT, we have

\lim_{n\to\infty}\lim_{B\to\infty}\mathbb{P}\left(\max_{i_{*}\leq j\leq i^{*}}|\check{\Lambda}_{\mathbf{C}}(t_{j})-\tilde{\Lambda}_{\mathbf{C}}(t_{j})|_{\infty}>\hat{q}_{n,1-\alpha}/\sqrt{n}\right)\leq\alpha,

where \tilde{\Lambda}_{\mathbf{C}}(t) is the projection of \check{\Lambda}_{\mathbf{C}}(t) onto the feasible region \mathcal{N}_{0}.

Proposition 4.

Suppose the conditions of Theorems 1 and 2 hold. Under the alternative hypothesis of the QT, further assume that \inf_{g\in\mathcal{N}_{0}}\sup_{t\in(0,1)}|\Lambda_{\mathbf{C}}(t)-g(t)|_{\infty}\gg n^{-\frac{1}{2}}; then we have

\lim_{n\to\infty}\lim_{B\to\infty}\mathbb{P}\left(\max_{i_{*}\leq j\leq i^{*}}|\check{\Lambda}_{\mathbf{C}}(t_{j})-\tilde{\Lambda}_{\mathbf{C}}(t_{j})|_{\infty}>\hat{q}_{n,1-\alpha}/\sqrt{n}\right)=1.

4.3.1 Shape Test

One instance of such Qualitative Tests involves testing the shape of \mbox{\boldmath$\beta$}_{\mathbf{C}}(t). For example, one may wish to check the monotonicity of \mbox{\boldmath$\beta$}_{\mathbf{C}}(t), which is equivalent to testing

H_{0}:\Lambda_{\mathbf{C}}(t)\text{ is a convex/concave function.}

As a straightforward extension, we may consider the null hypothesis H_{0}:\mbox{\boldmath$\beta$}_{\mathbf{C}}^{(d)}(t)\geq\mathbf{0}. Particularly, d=0 corresponds to nonnegativity, d=1 corresponds to the null hypothesis that \mbox{\boldmath$\beta$}_{\mathbf{C}}(t) is a monotonically increasing function, and d=2 refers to the convexity of \mbox{\boldmath$\beta$}_{\mathbf{C}}(t). Clearly, such differential-type shape constraints belong to the realm of the QT, by letting the feasible region \mathcal{N}_{0}=\{g\in\mathcal{C}^{3}[0,1]:g(0)=\mathbf{0},g^{(d+1)}(t)\geq\mathbf{0}\}.

In practice, the functional optimization problem is reduced to one over discrete vectors in finite samples. For simplicity, let s=1, denote the d-th order difference matrix by \nabla_{d}, and let \boldsymbol{\phi}=(\phi_{0},\phi_{1},\cdots,\phi_{n})^{\top}; then (17) can be formulated as follows:

\text{minimize}\quad\max_{i_{*}\leq i\leq i^{*}}|\phi_{i}-\check{\Lambda}_{\mathbf{C}}(t_{i})|_{\infty}, \qquad (20)
\text{subject to}\quad\nabla_{d+1}\boldsymbol{\phi}\geq 0,\quad\phi_{0}=0. \qquad (21)

Further, as is discussed in Knight (2017), the projection under the \mathcal{L}_{\infty} norm in (20) can be expressed as the solution to a linear program, which can be obtained easily.
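A minimal sketch of this linear-programming formulation (ours; assuming the s=1 case and SciPy's linprog) is given below: the variables are (\phi_{0},\cdots,\phi_{n}) together with a bound on the maximal deviation, the shape constraint is imposed through a difference matrix, and the projected CRF is read off from the solution.

```python
import numpy as np
from scipy.optimize import linprog

def linf_projection(Lambda_check, d, i_lo, i_hi):
    """L_infinity projection (20)-(21) for s = 1: minimize the bound m subject to
    |phi_i - Lambda_check[i]| <= m for i_lo <= i <= i_hi, nabla_{d+1} phi >= 0,
    and phi_0 = 0. Lambda_check has length n + 1 (values at t_0, ..., t_n)."""
    n1 = len(Lambda_check)
    c = np.zeros(n1 + 1)
    c[-1] = 1.0                                    # objective: the deviation bound m

    rows, rhs = [], []
    for i in range(i_lo, i_hi + 1):                # |phi_i - Lambda_i| <= m as two inequalities
        for sgn in (1.0, -1.0):
            row = np.zeros(n1 + 1)
            row[i], row[-1] = sgn, -1.0
            rows.append(row)
            rhs.append(sgn * Lambda_check[i])

    D = np.diff(np.eye(n1), n=d + 1, axis=0)       # (d+1)-th order difference matrix
    shape_rows = np.hstack([-D, np.zeros((D.shape[0], 1))])   # -nabla_{d+1} phi <= 0
    A_ub = np.vstack([np.array(rows), shape_rows])
    b_ub = np.concatenate([rhs, np.zeros(D.shape[0])])

    A_eq = np.zeros((1, n1 + 1))
    A_eq[0, 0] = 1.0                               # phi_0 = 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[0.0],
                  bounds=[(None, None)] * n1 + [(0.0, None)], method="highs")
    return res.x[:n1]                              # projected values (phi_0, ..., phi_n)
```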

5 Simulation Studies

This section utilizes Monte Carlo simulations to demonstrate the finite sample performance of the proposed method. All three hypotheses stated in Section 4 are considered.

Let a(t)=1/2-(t-1/2)^{2}, b(t)=1/2-t/2, and c(t)=1/4+t/2. For i=1,2,\cdots,n, define t_{i}=i/n and e_{i}=e(t_{i}), where e(t)=\sum_{j=0}^{\infty}a(t)^{j}\zeta_{i-j}/4, and x_{i,1}=x_{i,1}(t_{i}), x_{i,2}=x_{i,2}(t_{i}), where x_{i,1}(t)=\sum_{j=0}^{\infty}b^{j}(t)\epsilon_{i-j}, x_{i,2}(t)=\sum_{j=0}^{\infty}c^{j}(t)\eta_{i-j}, and e_{i}=(\eta_{i}+\varepsilon_{i})/\sqrt{2}; here \{\zeta_{i}\}_{i=-\infty}^{\infty}, \{\eta_{i}\}_{i=-\infty}^{\infty} and \{\varepsilon_{i}\}_{i=-\infty}^{\infty} are i.i.d. standard normals. Let \mathcal{G}_{i}=(\eta_{-\infty},\varepsilon_{-\infty},\cdots,\eta_{i},\varepsilon_{i}), \mathcal{F}_{i}=(\zeta_{-\infty},\cdots,\zeta_{i}), and let e_{i,\tau} be the \tau-th quantile of e_{i}. Consider the following models:

  1. I

    y_{i}=\sin(2\pi t_{i})+0.5x_{i,1}+2\log(1+2t_{i})x_{i,2}+e_{i}. Covariates and errors are independent.

  2. II

    y_{i}=\sin(2\pi t_{i})+0.5x_{i,1}+2\log(1+2t_{i})x_{i,2}+\sqrt{1+x_{i,1}^{2}+x_{i,2}^{2}}(e_{i}-e_{i,\tau})/\sqrt{3} when using the quantile loss, and y_{i}=\sin(2\pi t_{i})+0.5x_{i,1}+2\log(1+2t_{i})x_{i,2}+\sqrt{1+x_{i,1}^{2}+x_{i,2}^{2}}e_{i}/\sqrt{3} when using the quadratic loss. In this model, covariates and errors are correlated.

Cases I and II share the same \mbox{\boldmath$\beta$}(t)=(\sin(2\pi t),0.5,2\log(1+2t))^{\top} but have different errors. By investigating both cases, we can illustrate the applicability of our method under error-covariate independence and dependence. Throughout this section, we use the Epanechnikov kernel. The number of replications is 1000, and the bootstrap sample size is B=1000.

5.1 Bandwidth Selection

The framework involves smoothing parameters b_{n} and c_{n}, where b_{n} is used for estimating \Lambda_{\mathbf{C}}(t), while c_{n} is used for constructing the bootstrapped process \{\tilde{\boldsymbol{\Phi}}_{n}^{o}(t),t\in[0,1]\}.

To select the appropriate bandwidth b_{n}, we recommend using Leave-One-Out Cross-Validation (LOOCV) to select \tilde{b} from a predetermined grid \{b_{1},b_{2},\cdots,b_{k}\}. Denote by \hat{\mbox{\boldmath$\beta$}}_{b,-i}(t) the estimator of \mbox{\boldmath$\beta$}(t) using bandwidth b and the training set \{(\mathbf{x},y)_{(-i)}\}; then we select \tilde{b} via the minimum average error:

\tilde{b}=\mathop{\mbox{argmin}}_{b\in\{b_{1},\cdots,b_{k}\}}\frac{1}{n}\sum_{i=1}^{n}\rho\big(y_{i}-{\bf x}_{i}^{\top}\hat{\mbox{\boldmath$\beta$}}_{b,-i}(t_{i})\big). \qquad (22)

Ideally \tilde{b}=O(n^{-1/5}) by the proof of Theorem 4 and a similar argument from Wu and Zhou (2017) regarding quantile regression. Based on Chen and Hong (2012), where b_{n}=\frac{1}{\sqrt{12}}n^{-1/5} was chosen as a rule-of-thumb approach, a feasible grid of b_{n} can be selected around it, say equispaced points within the interval [0.5,1.5]\cdot\frac{1}{\sqrt{12}}n^{-1/5}. The proposed procedure works reasonably well in our simulation studies.
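A sketch of this cross-validation rule (ours; fit_local is an assumed helper that returns the local estimate of \beta(t) from a supplied subsample, e.g., a wrapper around the local M-estimation sketch of Section 2.2) is:

```python
import numpy as np

def loocv_bandwidth(X, y, grid, rho, fit_local):
    """Select the bandwidth minimizing the leave-one-out criterion (22).
    grid: candidate bandwidths; rho: vectorized loss; fit_local(t, X, y, b)
    returns an estimate of beta(t) using bandwidth b and the supplied sample."""
    n = len(y)
    ti = np.arange(1, n + 1) / n
    scores = []
    for b in grid:
        err = 0.0
        for i in range(n):
            keep = np.arange(n) != i                          # drop the i-th observation
            beta_i = fit_local(ti[i], X[keep], y[keep], b)    # estimate beta(t_i) without it
            err += float(rho(np.atleast_1d(y[i] - X[i] @ beta_i))[0])
        scores.append(err / n)
    return grid[int(np.argmin(scores))]
```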

Regarding the bandwidth c_{n}, we can use c_{n}=0.5b_{n} as an easy implementation. For refinements, we recommend the extended Minimum Volatility (MV) method suggested by Zhou and Wu (2010), which is an extension of the minimal volatility method proposed in Politis et al. (1999). Suppose we want to obtain bootstrap samples of \mbox{\boldmath$\beta$}_{\mathbf{C}}(t), where \mathbf{C} is an s\times p matrix. Let the candidate bandwidths be \{c_{1},c_{2},\cdots,c_{k}\}. For t\in[2c_{h},1-2c_{h}], define V_{h}(t)=\sqrt{\frac{c_{h}}{\mu}}\mathbf{C}(\check{\mbox{\boldmath$\beta$}}_{c_{h}}(t+c_{h})+\check{\mbox{\boldmath$\beta$}}_{c_{h}}(t-c_{h})-2\check{\mbox{\boldmath$\beta$}}_{c_{h}}(t)) and M_{h}(t)=\sum_{i=\lceil 2nc_{h}\rceil}^{\lceil nt\rceil}V_{h}(t_{i})V_{h}(t_{i})^{\top}. We further define M_{h}(t) over the interval [0,1] by letting M_{h}(t)=M_{h}(2c_{h}) if t\in[0,2c_{h}) and M_{h}(t)=M_{h}(1-2c_{h}) if t\in(1-2c_{h},1]. Based on this, we then compute \{M_{1}(t_{j}),M_{2}(t_{j}),\cdots,M_{k}(t_{j})\}_{j=1}^{n}, respectively. For a positive integer r, say r=5, define the integrated standard error (ise) as

ise\left[\left\{M_{h}^{a,b}(t)\right\}_{h=1}^{k-r+1}\right]=\left\{\frac{1}{r-1}\sum_{l=h}^{h+r-1}\left(M_{l}^{a,b}(t)-\sum_{l=h}^{h+r-1}M_{l}^{a,b}(t)/r\right)^{2}\right\}^{1/2},

where M_{l}^{a,b}(t) represents the (a,b)-th entry of the matrix M_{l}(t). Further, define

ise\left[\left\{M_{h}\right\}_{h=1}^{k-r+1}\right]=\frac{1}{n}\sum_{i=1}^{n}\left(\sum_{a=1}^{s}ise\left[\left\{M_{h}^{a,a}(t_{i})\right\}_{h=1}^{k-r+1}\right]^{2}\right)^{1/2},

then c_{n} is chosen as c_{h+\lfloor r/2\rfloor}, where h=\mathop{\mbox{argmin}}_{h\in\{1,\cdots,k-r+1\}}\big(ise\big[\{M_{h}\}_{h=1}^{k-r+1}\big]\big).

5.2 Exact Function Test

We generate data under the settings of Case I and Case II, focusing on two null hypotheses, H_{0}:\beta_{1}(t)=0.5 and H_{0}:(\beta_{1}(t),\beta_{2}(t))^{\top}=(0.5,2\log(1+2t))^{\top}, for both cases. Sample sizes N=300 and N=500 are considered, and four different losses are applied: the quadratic loss and quantile losses with \tau=0.15, \tau=0.5, and \tau=0.85.

Table 1: Simulated type I error of the EFT in \% with nominal levels \alpha=5\%, 10\% for Case I.

Loss:            L_2                  \tau=0.15            \tau=0.5             \tau=0.85
Tested:          \beta_1   (\beta_1,\beta_2)   \beta_1   (\beta_1,\beta_2)   \beta_1   (\beta_1,\beta_2)   \beta_1   (\beta_1,\beta_2)
(entries: rejection rate in \% at \alpha = 5\%/10\%)
N=300  b/1.15    5.9/10.1  4.7/8.8    4.5/8.4   4.5/8.0    5.3/9.8   4.8/9.0    5.5/10.6  5.4/9.3
N=300  b         5.2/10.5  5.2/10.0   6.2/10.1  5.1/8.7    5.8/9.7   6.0/10.4   6.8/10.7  5.0/9.5
N=300  1.15b     5.9/11.0  5.0/9.6    5.8/9.6   4.9/8.8    6.7/10.9  5.4/10.1   6.1/10.1  5.5/8.5
N=500  b/1.15    4.9/9.0   4.1/9.6    6.3/11.2  4.8/9.2    6.0/9.7   5.3/8.7    5.3/10.6  4.7/8.6
N=500  b         5.1/10.2  5.6/10.1   6.3/10.8  4.7/9.0    5.5/8.9   5.3/8.9    5.3/9.2   4.9/8.6
N=500  1.15b     5.4/10.2  5.7/9.9    6.3/9.9   4.9/9.4    5.3/9.3   4.9/8.8    5.8/9.6   4.6/8.6

Table 2: Simulated type I error of the EFT in \% with nominal levels \alpha=5\%, 10\% for Case II.

Loss:            L_2                  \tau=0.15            \tau=0.5             \tau=0.85
Tested:          \beta_1   (\beta_1,\beta_2)   \beta_1   (\beta_1,\beta_2)   \beta_1   (\beta_1,\beta_2)   \beta_1   (\beta_1,\beta_2)
(entries: rejection rate in \% at \alpha = 5\%/10\%)
N=300  b/1.15    5.2/10.9  5.2/9.7    4.7/9.8   4.7/8.9    6.2/10.2  4.8/9.0    5.6/10.2  5.3/9.1
N=300  b         5.5/11.3  5.1/10.2   6.1/10.5  4.6/9.1    6.1/9.2   5.6/9.8    6.7/10.7  5.0/9.5
N=300  1.15b     6.3/11.1  6.0/10.9   6.2/9.7   4.5/8.0    5.4/10.7  5.2/8.9    6.9/11.8  5.6/9.5
N=500  b/1.15    5.4/9.9   5.7/9.8    6.1/11.0  4.5/9.1    6.6/9.8   5.5/10.1   6.2/10.2  4.1/9.2
N=500  b         6.1/10.9  6.2/10.0   6.2/9.5   4.4/8.3    5.6/10.5  4.9/9.9    5.9/9.0   4.7/8.6
N=500  1.15b     5.9/10.9  6.4/10.8   5.5/9.7   4.9/8.1    5.6/10.7  4.4/9.7    6.1/9.7   4.6/9.1

Tables 1 and 2 report the simulated type I error of the EFT with nominal levels α=5%,10%\alpha=5\%,10\%. Using the bandwidth bb selected via the procedure described in Section 5.1, reasonable type I errors are observed for both cases. We also present results using b/1.15b/1.15 and 1.15b1.15b for sensitivity analysis. The nominal level is well approximated as long as the bandwidth is not far from the one chosen by the proposed procedure.

To demonstrate the power of the EFT, we generate data under the mechanism β1(t)=0.5+δ\beta_{1}(t)=0.5+\delta, while β0(t)\beta_{0}(t) and β2(t)\beta_{2}(t) remain unchanged from the original setting. For a positive constant δ\delta, the power of the test H0:β1(t)=0.5H_{0}:\beta_{1}(t)=0.5 should increase with δ\delta. For comparison, we also replicate the SCB test proposed by Wu and Zhou (2017) under the same setting. For both Case I and Case II, we present the power when using the quantile loss with τ=0.5\tau=0.5 and sample size N=500N=500. The nominal level used here is 5%5\%. Figures 1 and 2 show that, in both cases, the power of both methods goes to 11 as δ\delta increases from 0 to 0.30.3. As expected, the power of the EFT converges to 11 faster than that of the SCB test, as guaranteed theoretically by its n\sqrt{n} convergence rate.
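The power curves below are obtained by straightforward Monte Carlo; a minimal sketch follows, where test_pvalue and simulate are hypothetical placeholders for a p-value routine (e.g., the EFT or SCB test) and the data-generating mechanism of the corresponding case.

```python
import numpy as np

def power_curve(test_pvalue, simulate, deltas, n_rep=1000, alpha=0.05, seed=0):
    """Monte Carlo rejection rates over a grid of deltas (illustrative helper).

    test_pvalue(y, X) -> p-value of H0: beta_1(t) = 0.5.
    simulate(delta, rng) -> (y, X) generated with beta_1(t) = 0.5 + delta.
    """
    rng = np.random.default_rng(seed)
    power = []
    for delta in deltas:
        rejected = sum(test_pvalue(*simulate(delta, rng)) < alpha for _ in range(n_rep))
        power.append(rejected / n_rep)
    return np.array(power)

# e.g. power_curve(eft_pvalue, simulate_case1, deltas=np.linspace(0.0, 0.3, 7))
```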

Figure 1: Power of EFT and SCB: Case I
Figure 2: Power of EFT and SCB: Case II

5.3 Lack-of-fit Test

Table 3: Simulated type I error of constancy test of β1(t)\beta_{1}(t) in Case I and II.

Loss                  L2                      τ=0.15                  τ=0.5                   τ=0.85
                  Case I     Case II      Case I     Case II      Case I     Case II      Case I     Case II
(each cell: rejection rate at α = 5% / α = 10%)
N=300   b/1.15    5.1/8.6    5.0/10.0     5.4/8.9    5.6/10.0     5.0/8.7    5.0/8.8      5.1/8.9    5.6/10.7
        b         5.0/9.1    6.1/10.2     6.0/9.1    5.7/10.2     5.3/9.4    5.6/9.5      5.4/9.0    5.6/9.9
        1.15b     4.4/8.0    6.0/9.4      4.7/8.0    5.8/9.6      5.2/9.1    5.9/10.2     5.3/9.8    6.5/10.2
N=500   b/1.15    5.9/9.2    4.7/9.6      6.1/10.3   5.5/10.6     5.0/9.1    5.9/9.6      6.7/10.8   5.9/9.5
        b         5.2/9.1    6.0/11.2     6.1/9.7    6.2/10.8     5.4/8.8    6.1/9.8      5.9/9.5    6.0/9.8
        1.15b     5.6/9.9    5.1/10.0     5.4/9.0    5.9/10.6     5.5/9.1    5.1/9.6      6.6/10.3   6.1/9.6

Table 4: Simulated type I error of linearity test of β1(t)\beta_{1}(t) in Case I and II.

Loss                  L2                      τ=0.15                  τ=0.5                   τ=0.85
                  Case I     Case II      Case I     Case II      Case I     Case II      Case I     Case II
(each cell: rejection rate at α = 5% / α = 10%)
N=300   b/1.15    4.6/8.5    5.6/9.5      5.4/9.8    5.5/9.5      5.7/10.8   5.9/9.7      6.0/9.1    6.0/10.5
        b         4.5/8.3    6.5/9.8      6.3/8.9    5.0/9.1      5.6/9.4    5.7/9.7      6.1/10.2   5.4/9.7
        1.15b     5.6/8.8    6.6/9.6      5.8/9.5    5.6/8.7      5.0/9.3    5.5/10.3     5.8/9.0    5.4/9.9
N=500   b/1.15    5.3/9.0    6.3/10.0     5.0/8.9    5.5/9.1      5.6/9.0    5.3/9.3      5.9/10.2   5.6/9.6
        b         5.9/8.9    6.4/10.5     5.7/9.1    5.6/9.1      6.8/9.9    6.0/9.5      5.5/10.4   6.0/10.2
        1.15b     5.2/9.0    5.3/9.9      6.2/10.3   6.0/9.9      5.7/9.0    6.8/10.6     5.8/10.6   5.1/8.9

To illustrate the Lack-of-fit Test, we focus on β1(t)\beta_{1}(t) in Cases I and II. Under both settings, β1(t)=0.5\beta_{1}(t)=0.5 satisfies constancy and linearity, so both types of Lack-of-fit Tests can be conducted to demonstrate the accuracy of our method. Following the procedures in Section 4.2, we obtain the simulated type I errors for the four losses and sample sizes N=300,500N=300,500. The optimal bandwidths are the same as those selected in Section 5.2 since the settings are identical. Tables 3 and 4 show that the accuracy is quite satisfactory and not very sensitive to the choice of bandwidth.

Under the setting of Case II, we also conduct the power analysis under two scenarios: (1) β1(t)=0.5+δt\beta_{1}(t)=0.5+\delta t; (2) β1(t)=0.5+δt2\beta_{1}(t)=0.5+\delta t^{2}, where the first scenario allows us to investigate constancy while the second enables the testing of linearity. The sample size is chosen to be N=500N=500 and the nominal level is 5%5\%. For each test, power curves under the four different losses are simulated and compared in Figures 3 and 4. The quadratic loss is the most powerful and the quantile loss with τ=0.5\tau=0.5 comes second. For the extreme quantiles τ=0.15\tau=0.15 and τ=0.85\tau=0.85, the power converges more slowly.

Figure 3: Power of LOFT: Constancy
Figure 4: Power of LOFT: Linearity

We then compare our method with that of Chen and Hong (2012) for testing the constancy of the coefficients. Since their method assumes stationary covariates and errors, we consider the following stationary model for a fairer comparison:

  • Case III

    yi=δsin(2πti)+0.5xi,1+δ2log(1+2ti)xi,2+eiy_{i}=\delta\cdot\sin(2\pi t_{i})+0.5x_{i,1}+\delta\cdot 2\log(1+2t_{i})x_{i,2}+e_{i}, where xi,1=0.5xi1,1+εix_{i,1}=0.5x_{i-1,1}+\varepsilon_{i}, xi,2=0.5xi1,2+ηix_{i,2}=0.5x_{i-1,2}+\eta_{i}, and {εi},{ηi},{ei}\{\varepsilon_{i}\},\{\eta_{i}\},\{e_{i}\} are i.i.d standard normal random variables independent of each other.
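For reference, a direct transcription of the Case III data-generating mechanism is sketched below; for simplicity the AR(1) recursions are started at zero, without a burn-in period (an assumption of this sketch).

```python
import numpy as np

def simulate_case3(n=500, delta=0.0, rng=None):
    """Generate (y, X) from the stationary Case III design described above."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(1, n + 1) / n
    eps, eta, e = rng.standard_normal((3, n))   # i.i.d. standard normal innovations and errors
    x1 = np.zeros(n)
    x2 = np.zeros(n)
    for i in range(1, n):                       # AR(1) covariates with coefficient 0.5
        x1[i] = 0.5 * x1[i - 1] + eps[i]
        x2[i] = 0.5 * x2[i - 1] + eta[i]
    y = (delta * np.sin(2 * np.pi * t) + 0.5 * x1
         + delta * 2 * np.log(1 + 2 * t) * x2 + e)
    X = np.column_stack([np.ones(n), x1, x2])   # intercept, x_{i,1}, x_{i,2}
    return y, X
```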

Figure 5: Power comparison of LOFT

Note that when δ=0\delta=0, the model corresponds to constant coefficients 𝜷(t)=(0,0.5,0)\mbox{\boldmath$\beta$}(t)=(0,0.5,0)^{\top}, and for the joint constancy test of 𝜷(t)\mbox{\boldmath$\beta$}(t), we expect the rejection rate to increase with δ\delta. We generate the data with N=500N=500 and use the L2L_{2} loss to implement our method. The nominal level is chosen as α=5%\alpha=5\%. Figure 5 indicates that our method has accuracy and power comparable to the method of Chen and Hong (2012), where ‘H-homo’ and ‘H-het’ represent the homoscedastic and heteroscedasticity-robust versions of their generalized Hausman test. Considering that their method is designed for constancy tests of L2L_{2}-regression models with stationary variables, our Lack-of-fit Test enjoys more flexibility and broader applicability, with a theoretically faster convergence rate.

5.4 Qualitative Test

For the Qualitative Test, we focus on the hypothesis of monotonicity as an example. Under the setting of Case II with N=500N=500, we generate the data using the following mechanism: β1(t)=0.5δ10exp((t0.5)2)\beta_{1}(t)=0.5-\delta\cdot 10\exp(-(t-0.5)^{2}) for δ0\delta\geq 0, and we expect the power curve to increase with δ\delta.

Applying a similar projection idea to the SCB test of Wu and Zhou (2017), we also obtain the power curve of the monotonicity test under the quantile loss with τ=0.5\tau=0.5 for their method. Figures 6 and 7 provide the results with nominal level α=5%\alpha=5\%, demonstrating the performance of our method under the four losses and the comparison with the SCB test. The power curves of the quadratic loss and the quantile loss with τ=0.5\tau=0.5 converge faster than those of the extreme quantiles, as we expect. Figure 7 also shows that, under the same quantile loss with τ=0.5\tau=0.5, our Qualitative Test is more powerful than the SCB test with projection. This is due to the fact that our CRF-based test can detect local alternatives of the order O(1/n)O(1/\sqrt{n}) while the SCB of Wu and Zhou (2017) is sensitive to those of the order O(logn/nbn)O(\sqrt{\log n}/\sqrt{nb_{n}}) with bandwidth bn0b_{n}\rightarrow 0.

Figure 6: Power of QT under different losses
Figure 7: Power comparison of QT and SCB

6 Real Data Illustrations

6.1 Global Temperature Anomalies

In this section, we study monthly global temperature anomalies from January 1882 to December 2005 (available in the HadCRUT5 dataset), investigating the anthropogenic warming trend and the time-varying relationship between the anomalies and potential factors. The initial candidate factors include lags of the multivariate ENSO index (MEI), total solar irradiance (TSI), aerosol optical depth (AOD) and the Atlantic multidecadal oscillation (AMO). These factors are typical regressors in existing studies, where multivariate linear regression is commonly employed to filter out the fluctuations caused by the factors and to reveal the underlying anthropogenic warming (Zhou and Tung, 2013). Those authors conclude that the warming trend has been remarkably steady since the mid-twentieth century. Foster and Rahmstorf (2011) applied a similar approach to the global temperature anomalies from 1979 to 2010, where the deduced warming rate was also found to be steady over the investigated time interval. All previous analyses assume a stationary error structure and constant coefficients over time, and additionally hypothesize a linear trend of anthropogenic warming. Nevertheless, using the change-point detection method of Zhou (2013), we detect nonstationarity of the residuals after fitting the multivariate linear model, suggesting the necessity of a more general model that allows for nonstationarity of the time series. Furthermore, no rigorous proof has been provided in the existing literature to support convincing conclusions about the shape of the warming trend.

We now apply our method to the aforementioned time series. Following the steps stated in Section 4, a full model consisting of all the factors is first constructed and analyzed, and then a backward variable selection is implemented based on our EFT (a sketch of this selection loop is given below). Most covariates are thereby removed due to insignificance or multicollinearity. The final model includes the intercept, MEI, and the 6-month lag of MEI under the L2L_{2} loss and the quantile losses with τ=0.15\tau=0.15 and τ=0.5\tau=0.5, while the τ=0.85\tau=0.85 quantile regression model is composed of the intercept, the 2-month and 6-month lags of MEI, and the 5-month lag of AOD. All models involve the intercept and corresponding lags of MEI, enabling us to conduct tests regarding the anthropogenic warming trend and the coefficient of the ENSO effect. This outcome agrees with the fact that ENSO has conventionally been recognized as a leading contributor to global temperature fluctuations (Trenberth et al., 2002; Foster and Rahmstorf, 2011). The 6-month lag of MEI chosen in each model also aligns with the belief that El Niño warms up the global temperature with a lag of \sim6 months (Trenberth et al., 2002). As a distinctive predictor, the 5-month lag of AOD is selected under the τ=0.85\tau=0.85 quantile, emphasizing the influence of volcanic activity on extreme high temperatures. The model selection result is consistent with a large body of climatology literature, where ENSO and volcanic events are identified as significant sources of variance in global temperature, compared to solar activity and other factors (Lean and Rind, 2008; Foster and Rahmstorf, 2011). Moreover, our conclusions are drawn under general non-stationarity assumptions and are therefore arguably more reliable.
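A minimal sketch of the backward selection loop is given below; eft_zero_pvalue is a hypothetical placeholder for the EFT of the zero-function hypothesis H0: βj(t)=0 in the current model, and practical refinements (e.g., always retaining the intercept or screening for multicollinearity first) are omitted.

```python
import numpy as np

def backward_selection(y, X, names, eft_zero_pvalue, alpha=0.05):
    """Backward elimination driven by EFT significance tests (illustrative sketch).

    eft_zero_pvalue(y, X_sub, j) -> p-value of H0: beta_j(t) = 0 when regressing
    y on the columns of X_sub (assumed to be supplied by the user).
    """
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        pvals = [eft_zero_pvalue(y, X[:, keep], j) for j in range(len(keep))]
        worst = int(np.argmax(pvals))
        if pvals[worst] <= alpha:
            break                              # every remaining covariate is significant
        del keep[worst]                        # drop the least significant covariate
    return [names[j] for j in keep]
```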

Table 5: Tests of monotonicity (increasing), linearity and constancy.

              Intercept, Monotonicity       Intercept, Linearity          MEI or MEI (lag 2), Constancy    MEI (lag 6), Constancy
              Stat.   95% C.V.  p-val       Stat.   95% C.V.  p-val       Stat.   95% C.V.  p-val          Stat.   95% C.V.  p-val
L2            0.132   1.355     1           0.853   0.69      0.007       0.217   0.702     0.986          0.221   0.499     0.842
τ=0.15        0.157   1.02      1           0.886   0.647     0.002       0.401   0.711     0.613          0.4     0.528     0.244
τ=0.5         0.143   1.464     1           0.877   0.752     0.013       0.333   0.729     0.804          0.216   0.643     0.978
τ=0.85        0.293   1.495     0.999       0.954   0.733     0.001       0.322   0.527     0.468          0.431   0.496     0.121
(Stat. = test statistic; 95% C.V. = 95% critical value)

Figure 8: Estimated warming trend
Figure 9: Influence of the 5-month lag of AOD

We are interested in hypotheses on the shape of the anthropogenic trend and on the coefficients of MEI and AOD. It is commonly believed that the anthropogenic trend is increasing, and we can further check whether it is steady over time, which is equivalent to testing whether the intercept is a linear function. As for the coefficients of MEI and AOD, we examine whether they are time-invariant, which amounts to testing constancy. The test results are summarized in Table 5, which shows similar conclusions under the different losses. For the shape of the intercept, increasing monotonicity is not rejected, while linearity is rejected at the 5%5\% significance level. This strengthens the belief in discernible human influences on global warming and also suggests that the warming rate was not steady from 1882 to 2005. The non-uniform rate is consistent with the conclusion of IPCC (2007), where global warming is observed to accelerate until 2005 after a cooling period in the 1960s and 1970s. This can also be seen in Figure 8, with a relatively flat trend from 1960 to 1970 and a steep increase afterward. While a similar pattern can be detected by solely analyzing the temperature anomalies, our approach rigorously assesses the anthropogenic fluctuations by separating them from the natural ones, making the conclusion more reliable and statistically verifiable. As for the coefficients of MEI and its lags, we fail to reject the null hypothesis of constancy at the 5%5\% significance level. This suggests that the overall ENSO effect on global temperature is likely to have been steady from 1882 to 2005.

To represent the temperature change induced by volcanic activity, we display the influence of the 5-month lag of AOD in Figure 9, computed by multiplying the estimated coefficient by the factor itself. Unlike traditional multivariate linear regression, where only the overall negative effect of AOD can be statistically detected, Figure 9 reveals the effect of AOD as a combination of heating and long-term cooling. This finding is consistent with the mechanism of volcanic activity, whereby volcanic eruptions can cool the surface due to an increased aerosol loading in the stratosphere that scatters solar radiation back to space. Simultaneously, these eruptions can lead to regional heating, particularly evident during winters in the Northern Hemisphere (Robock, 2000; Christiansen, 2008). Such an unstable and complicated impact is also confirmed by our Lack-of-fit Test, where the pp-values of the tests of constancy and linearity are 0.029 and 0.002, respectively.

6.2 Microsoft Stock Return

As another application, we combine the time-varying M-regression framework with the Fama-French five-factor (FF5) model to study the monthly return of Microsoft stock from May 1986 to October 2023. Initially proposed by Fama and French (2015), the FF5 model aims to capture the size, value, profitability, and investment patterns in average stock returns. There have been subsequent discussions on the time-stability of the regression coefficients of the FF5 model. Racicot et al. (2019) accounted for the time-varying nature of the model using a GMM approach, presenting evidence against a static market systematic risk parameter β\beta. Similarly, the time-varying parameters were examined by Horváth and Wang (2021), especially during selected events such as the dot-com bubble around 2000, the 2008 financial crisis, and the COVID-19 outbreak. Noda (2022) further argued that the parameters change over time, exhibiting distinct patterns of change across different countries and regions.

In the meantime, a quantile version of the FF5 model has been discussed in order to analyze the extremes of stock returns (Allen and Powell, 2011). Emerging as an alternative to least squares estimation, quantile regression not only looks beyond the conditional mean, but also alleviates some of the statistical problems that plague the L2L_{2} model, including errors-in-variables, omitted-variable bias, sensitivity to outliers, and non-normal error distributions (Barnes and Hughes, 2002).

While these two extensions have primarily been studied separately in the literature, both can be accommodated under our framework. Denoting the monthly return of Microsoft stock as RR, the model is written as

RiRFi=αi+β1,i(RMiRFi)+β2,iSMBi+β3,iHMLi+β4,iRMWi+β5,iCMAi+ei,\displaystyle R_{i}-{R_{F}}_{i}=\alpha_{i}+\beta_{1,i}({R_{M}}_{i}-{R_{F}}_{i})+\beta_{2,i}SMB_{i}+\beta_{3,i}HML_{i}+\beta_{4,i}RMW_{i}+\beta_{5,i}CMA_{i}+e_{i}, (23)

where RFR_{F} represents the risk-free rate, RMR_{M} is the monthly return on the value-weight (VW) market portfolio, and SMB,HML,RMW,CMASMB,HML,RMW,CMA are the size, value, profitability, and investment factors, respectively. The data are accessible from Professor French’s website, and we refer to Fama and French (2015) for a detailed explanation of the variables.

We estimate the regression coefficients under four losses: the L2L_{2} loss and the quantile losses with τ=0.15\tau=0.15, 0.50.5, and 0.850.85. Using the test proposed in Zhou (2013), second-order stationarity of the residuals is rejected under all losses, with pp-values 0.0085,0.0145,0.01750.0085,0.0145,0.0175 and 0, suggesting the existence of time non-stationarity in the underlying data. Additionally, following the LOFT procedures presented in Section 4.2, a joint constancy test is applied to the regression coefficients, which yields pp-values of 0.0020.002, 0.0030.003, 0.0020.002, and 0.0490.049 under the four losses, respectively. These results confirm the necessity of a time-varying coefficient model and suggest that the original FF5 model with static parameters might be misspecified.

Figure 10: Estimated intercept α(t)\alpha(t)
Figure 11: Estimated systematic risk β1(t)\beta_{1}(t)

We further focus on the behavior of the estimators under the L2L_{2} loss, as the intercept α(t)\alpha(t) and the market coefficient β1(t)\beta_{1}(t) have been widely recognized as important indicators for investment. Jensen (1968) proposed the intercept as a measure of performance, where a positive value suggests that the stock tends to outperform the average market. Meanwhile, β1(t)\beta_{1}(t) is regarded as the measure of systematic risk (Sharpe, 1964). As indicated in Figure 10, the estimated intercept is greater than 0 most of the time, which aligns with the result of the non-negativity test, whose pp-value equals 11. When it comes to the systematic market risk β1(t)\beta_{1}(t), fluctuations and extreme values are observed in Figure 11, especially during the dot-com bubble around 2000. Additionally, β1(t)\beta_{1}(t) is smaller than 11 most of the time, indicating a relatively lower risk of Microsoft stock compared to the average market.

Table 6: Tests of significance of the predictors (pp-values).

          Intercept   RM−RF    SMB      HML      RMW      CMA
L2        0           0        0        0        0.482    0.028
τ=0.15    0           0.029    0        0        0.976    0
τ=0.5     0           0        0        0        0.368    0.375
τ=0.85    0           0        0.68     0.002    0.075    0.428

While it remains unclear how to interpret the quantile coefficients in an economic context due to the scarce literature on the quantile FF5 model, we gain some insights into the predictors by conducting significance tests on the regression coefficients. Table 6 presents the pp-values of the significance tests. The intercept and the market factor serve as significant predictors under all losses, and the value factor HML is significant under all losses as well. The size factor SMB plays a role under the L2L_{2} loss and the quantile losses with τ=0.15\tau=0.15 and τ=0.5\tau=0.5, but becomes insignificant under the higher quantile τ=0.85\tau=0.85. This suggests that the higher quantile of Microsoft stock return is no longer attributable to the size factor. The opposite pattern is observed for the profitability factor RMW, which is insignificant under the other losses but starts to make a difference under the higher quantile τ=0.85\tau=0.85. Finally, the investment factor CMA might contribute to the stock return under the L2L_{2} loss and the lower quantile τ=0.15\tau=0.15, but is insignificant under τ=0.5\tau=0.5 and τ=0.85\tau=0.85.

7 Assumptions and Auxiliary Theoretical Results

Let zi(t)=yi𝐱i𝜷(t)𝐱i𝜷(t)(tit)z_{i}(t)=y_{i}-{\bf x}_{i}^{\top}\mbox{\boldmath$\beta$}(t)-{\bf x}_{i}^{\top}\mbox{\boldmath$\beta$}^{\prime}(t)(t_{i}-t), and define

𝜽~bn(t):=(𝜽^bn(t)𝜽^bn(t))=argmin𝒄0,𝒄1pi=1nρ(zi(t)𝐱i𝒄0𝐱i𝒄1(tit)/bn)Kbn(tit),\displaystyle\tilde{\mbox{\boldmath$\theta$}}_{b_{n}}(t):=\left(\begin{array}[]{c}\hat{\mbox{\boldmath$\theta$}}_{b_{n}}(t)\\ \hat{\mbox{\boldmath$\theta$}}_{b_{n}}^{\prime}(t)\end{array}\right)=\mathop{\mbox{argmin}}_{\mbox{\boldmath$c$}_{0},\mbox{\boldmath$c$}_{1}\in\mathbb{R}^{p}}\sum_{i=1}^{n}\rho(z_{i}(t)-{\bf x}_{i}^{\top}\mbox{\boldmath$c$}_{0}-{\bf x}_{i}^{\top}\mbox{\boldmath$c$}_{1}(t_{i}-t)/b_{n})K_{b_{n}}(t_{i}-t), (26)

where Kbn():=K(/bn)K_{b_{n}}(\cdot):=K(\cdot/b_{n}). Comparing this with equation (7), simple calculations yield that

𝜽^bn(t)=𝜷^bn(t)𝜷(t),𝜽^bn(t)=bn(𝜷^bn(t)𝜷(t)).\displaystyle\hat{\mbox{\boldmath$\theta$}}_{b_{n}}(t)=\hat{\mbox{\boldmath$\beta$}}_{b_{n}}(t)-\mbox{\boldmath$\beta$}(t),\quad\hat{\mbox{\boldmath$\theta$}}_{b_{n}}^{\prime}(t)=b_{n}(\hat{\mbox{\boldmath$\beta$}}_{b_{n}}^{\prime}(t)-{\mbox{\boldmath$\beta$}}^{\prime}(t)). (27)
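As a numerical illustration of the kernel-weighted minimization in (26) (equivalently, the local linear fit defined in equation (7)), the following sketch computes the estimate at a single time point with a generic derivative-free optimizer; the function name, the Epanechnikov kernel, and the use of scipy's Nelder-Mead routine are illustrative assumptions rather than the algorithm used in our implementation.

```python
import numpy as np
from scipy.optimize import minimize

def local_linear_m_fit(y, X, t, b, rho,
                       K=lambda u: 0.75 * np.clip(1.0 - u ** 2, 0.0, None)):
    """Local linear M-estimate of beta(t) at one time point (a sketch of (26)).

    rho(u) -> loss applied elementwise, e.g. lambda u: u**2 for the L2 loss,
              or lambda u: u * (tau - (u < 0)) for the tau-th quantile loss.
    K is a kernel (Epanechnikov here) and b is the bandwidth b_n.
    """
    n, p = X.shape
    ti = np.arange(1, n + 1) / n
    w = K((ti - t) / b)                                  # kernel weights K_{b_n}(t_i - t)
    W = np.hstack([X, X * ((ti - t) / b)[:, None]])      # (x_i, x_i (t_i - t)/b_n)

    def objective(theta):
        return np.sum(rho(y - W @ theta) * w)            # kernel-weighted empirical loss

    res = minimize(objective, np.zeros(2 * p), method="Nelder-Mead",
                   options={"maxiter": 20000, "fatol": 1e-8})
    return res.x[:p]                                     # first p entries estimate beta(t)
```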

Let 𝜽=(𝜽0,𝜽1)=(θ01,θ02,θ0p,θ11,θ12,,θ1p)\mbox{\boldmath$\theta$}=(\mbox{\boldmath$\theta$}_{0}^{\top},\mbox{\boldmath$\theta$}_{1}^{\top})^{\top}=(\theta_{01},\theta_{02},\cdots\theta_{0p},\theta_{11},\theta_{12},\cdots,\theta_{1p})^{\top} be a (2p×1)(2p\times 1) vector, 𝐰in(t)=(𝐱i,𝐱i(i/nt)/bn)=(win,1(t),,win,2p(t))\mathbf{w}_{in}(t)=\left(\mathbf{x}_{i}^{\top},\mathbf{x}_{i}^{\top}(i/n-t)/b_{n}\right)^{\top}\allowbreak=\left(w_{in,1}(t),\ldots,w_{in,2p}(t)\right)^{\top}. Write ηi(t,𝜽)=ψ(zi(t)𝐰in(t)𝜽)Kbn(tit)\eta_{i}(t,\mbox{\boldmath$\theta$})=\psi(z_{i}(t)-\mathbf{w}_{in}^{\top}(t)\mbox{\boldmath$\theta$})K_{b_{n}}(t_{i}-t), Sn(t,𝜽)=i=1nηi(t,𝜽)𝐰in(t)S_{n}(t,\mbox{\boldmath$\theta$})=\sum_{i=1}^{n}\eta_{i}(t,\mbox{\boldmath$\theta$})\mathbf{w}_{in}(t). We assume supt(0,1)Sn(t,𝜽~bn(t))=O(logn)\sup_{t\in(0,1)}S_{n}(t,\tilde{\mbox{\boldmath$\theta$}}_{b_{n}}(t))=O_{\mathbb{P}}(\log n).

Remark 2.

The assumption supt(0,1)Sn(t,𝛉~bn(t))=O(logn)\sup_{t\in(0,1)}S_{n}(t,\tilde{\mbox{\boldmath$\theta$}}_{b_{n}}(t))=O_{\mathbb{P}}(\log n) controls the magnitude of Sn(t,𝛉~bn(t))S_{n}(t,\tilde{\mbox{\boldmath$\theta$}}_{b_{n}}(t)), which is used to establish the Bahadur representation. If ψ()\psi(\cdot) is continuous, this assumption is automatically satisfied since supt(0,1)Sn(t,𝛉~bn(t))=0\sup_{t\in(0,1)}S_{n}(t,\tilde{\mbox{\boldmath$\theta$}}_{b_{n}}(t))=0. In the case where ψ()\psi(\cdot) is discontinuous, however, an exact solution of Sn(t,𝛉)=0S_{n}(t,\mbox{\boldmath$\theta$})=0 may not exist. A key example of such a discontinuous ψ()\psi(\cdot) arises in quantile regression (Wu, 2007, Remark 4), and a proof of the assumption for quantile regression is provided in Lemma LABEL:lem:4 of the supplementary material.

7.1 Model Assumptions

For q0,j=0,1,,rq\geq 0,j=0,1,\cdots,r, define for t(wj,wj+1]t\in(w_{j},w_{j+1}],

Ξj(q)(t,xk1,𝒢k)\displaystyle\Xi_{j}^{(q)}\left(t,x\mid\mathcal{F}_{k-1},\mathcal{G}_{k}\right) =\displaystyle= qxq𝔼{ψ(Gj(t,k,𝒢k)+x)k1,𝒢k},\displaystyle\dfrac{\partial^{q}}{\partial x^{q}}\mathbb{E}\left\{\psi\left(G_{j}\left(t,\mathcal{F}_{k},\mathcal{G}_{k}\right)+x\right)\mid\mathcal{F}_{k-1},\mathcal{G}_{k}\right\},
κ¯j(t,x𝒢k)\displaystyle\bar{\kappa}_{j}\left(t,x\mid\mathcal{G}_{k}\right) =\displaystyle= x𝔼(ψ(Gj(t,k,𝒢k)+x)𝒢k).\displaystyle\dfrac{\partial}{\partial x}\mathbb{E}\left(\psi\left(G_{j}\left(t,\mathcal{F}_{k},\mathcal{G}_{k}\right)+x\right)\mid\mathcal{G}_{k}\right).
  • (A1)

    For 0jr0\leq j\leq r, assume that

    supt(wj,wj+1),x,y|Ξj(t,x1,𝒢0)Ξj(t,y1,𝒢0)|\displaystyle\sup_{t\in(w_{j},w_{j+1}),x,y\in\mathbb{R}}|\Xi_{j}(t,x\mid\mathcal{F}_{-1},\mathcal{G}_{0})-\Xi_{j}(t,y\mid\mathcal{F}_{-1},\mathcal{G}_{0})| (28)
    Cj,1|xy|+Cj,2|x2y2|+Cj,3|xy|2,\displaystyle\leq C_{j,1}|x-y|+C_{j,2}|x^{2}-y^{2}|+C_{j,3}|x-y|^{2}, (29)

    where Cj,1,Cj,2,Cj,3C_{j,1},C_{j,2},C_{j,3} are (1,𝒢0)(\mathcal{F}_{-1},\mathcal{G}_{0}) measurable r.v. with finite fourth moments. We also require that for 0jr0\leq j\leq r, Ξj(1)(t,01,𝒢0)\Xi_{j}^{(1)}(t,0\mid\mathcal{F}_{-1},\mathcal{G}_{0}) is stochastically Lipschitz continuous for t(wj,wj+1]t\in(w_{j},w_{j+1}], that is, M<\exists M<\infty, s.t. t1,t2(wj,wj+1]\forall t_{1},t_{2}\in(w_{j},w_{j+1}], 0jr0\leq j\leq r,

    ||Ξj(1)(t1,01,𝒢0)Ξj(1)(t2,01,𝒢0)||M|t1t2|.\displaystyle||\Xi_{j}^{(1)}(t_{1},0\mid\mathcal{F}_{-1},\mathcal{G}_{0})-\Xi_{j}^{(1)}(t_{2},0\mid\mathcal{F}_{-1},\mathcal{G}_{0})||\leq M|t_{1}-t_{2}|. (30)
  • (A2)

    Assume for the covariate process, tx>0\exists t_{x}>0, s.t. max1in𝔼(exp(tx|𝐱i|))\max_{1\leq i\leq n}\mathbb{E}(\exp(t_{x}|\mathbf{x}_{i}|))<\infty. In addition, max1jrsupt(wj,wj+1]𝐇j(t,𝒢i)𝐇j(t,𝒢i)1=O(χ|i|)\max_{1\leq j\leq r}\sup_{t\in(w_{j},w_{j+1}]}||{\bf H}_{j}(t,\mathcal{G}_{i})-{\bf H}_{j}(t,\mathcal{G}_{i}^{*})||_{1}=O(\chi^{|i|}) for some constant χ(0,1)\chi\in(0,1). For all t1,t2(wj,wj+1],0jrt_{1},t_{2}\in(w_{j},w_{j+1}],0\leq j\leq r, 𝐇j(t1,𝒢0)𝐇j(t2,𝒢0)M|t1t2|||{\bf H}_{j}(t_{1},\mathcal{G}_{0})-{\bf H}_{j}(t_{2},\mathcal{G}_{0})||\leq M|t_{1}-t_{2}| for some constant M<M<\infty. Recall the definition of 𝒢i\mathcal{G}_{i}^{*} in Section 2.1.

  • (A3)

    For ϵ\epsilon\in\mathbb{R}, 0q2p+10\leq q\leq 2p+1 and any integer ss, define

    δs(q)(k1,ϵ):=max0jrsupt(wj,wj+1],|x|ϵ||Ξj(q)(t,xk1,𝒢k)Ξj(q)(t,xk1,𝒢k)||s.\displaystyle\delta_{s}^{(q)}(k-1,\epsilon):=\max_{0\leq j\leq r}\sup_{t\in(w_{j},w_{j+1}],|x|\leq\epsilon}||\Xi_{j}^{(q)}\left(t,x\mid\mathcal{F}_{k-1},\mathcal{G}_{k}\right)-\Xi_{j}^{(q)}\left(t,x\mid\mathcal{F}_{k-1}^{*},\mathcal{G}_{k}\right)||_{s}.

    We assume that ϵ0\exists\epsilon_{0}\in\mathbb{R}, s.t. δ1(q)(k1,ϵ0)=O(χ|k1|)\delta_{1}^{(q)}(k-1,\epsilon_{0})=O(\chi^{|k-1|}) for some constant 0<χ<10<\chi<1.

  • (A4)

    Define Ξ¯(q)(x𝒢i)=qxq𝔼[ψ(ei+x)𝒢i]\bar{\Xi}^{(q)}(x\mid\mathcal{G}_{i})=\frac{\partial^{q}}{\partial x^{q}}\mathbb{E}[\psi(e_{i}+x)\mid\mathcal{G}_{i}], we require for all i=1,2,,ni=1,2,\cdots,n and any pp-dimensional vector 𝐡\mathbf{h} that

    Ξ¯(0𝒢i)=𝔼[ψ(ei)𝒢i]=0,a.s.\displaystyle\bar{\Xi}(0\mid\mathcal{G}_{i})=\mathbb{E}[\psi(e_{i})\mid\mathcal{G}_{i}]=0,a.s.
    Ξ¯(𝐱i𝐡𝒢i)=Ξ¯(1)(0𝒢i)𝐱i𝐡+O(|𝐱i2||𝐡|2),\displaystyle\bar{\Xi}(\mathbf{x}^{\top}_{i}\mathbf{h}\mid\mathcal{G}_{i})=\bar{\Xi}^{(1)}(0\mid\mathcal{G}_{i})\mathbf{x}^{\top}_{i}\mathbf{h}+O_{\mathbb{P}}(|\mathbf{x}_{i}^{2}||\mathbf{h}|^{2}),

    and Ξ¯(1)(x𝒢i)>0\bar{\Xi}^{(1)}(x\mid\mathcal{G}_{i})>0 a.s. for |x|ϵ|x|\leq\epsilon for some ϵ>0\epsilon>0. Let λ¯n(t)\bar{\lambda}_{n}(t) be the smallest eigenvalue of 𝔼{i=1nΞ¯(1)(0𝒢i)}𝐱i𝐱iKbn(tit)}\mathbb{E}\{\sum_{i=1}^{n}\bar{\Xi}^{(1)}(0\mid\mathcal{G}_{i})\}\mathbf{x}_{i}\mathbf{x}^{\top}_{i}K_{b_{n}}(t_{i}-t)\}, assume that lim infninft(0,1)λ¯n(t)nbn>0\liminf_{n\to\infty}\inf_{t\in(0,1)}\frac{\bar{\lambda}_{n}(t)}{nb_{n}}>0. Let λ(t)\lambda(t) be the smallest eigenvalue of Σ(t):=𝔼{κ¯j(t,0𝒢i)𝐇j(t,𝒢i)𝐇j(t,𝒢i)}\Sigma(t):=\mathbb{E}\{\bar{\kappa}_{j}\left(t,0\mid\mathcal{G}_{i}\right){\bf H}_{j}(t,\mathcal{G}_{i}){\bf H}_{j}^{\top}(t,\mathcal{G}_{i})\} if t(wj,wj+1]t\in(w_{j},w_{j+1}]. We require that i) max0jrsupt(wj,wj+1]||κ¯j(t,0𝒢i)κ¯j(t,0𝒢i)||=O(χ|i|)\max_{0\leq j\leq r}\sup_{t\in(w_{j},w_{j+1}]}||\bar{\kappa}_{j}\left(t,0\mid\mathcal{G}_{i}\right)-\bar{\kappa}_{j}\left(t,0\mid\mathcal{G}_{i}^{*}\right)||=O(\chi^{|i|}) for some constant χ(0,1)\chi\in(0,1), and max0jrsupt(wj,wj+1]|κ¯j(t,0𝒢0)|M\max_{0\leq j\leq r}\sup_{t\in(w_{j},w_{j+1}]}|{\bar{\kappa}}_{j}\left(t,0\mid\mathcal{G}_{0}\right)|\leq M a.s. for some constant M<M<\infty; ii) inft(0,1)λ(t)η>0\inf_{t\in(0,1)}\lambda(t)\geq\eta>0 for some positive constant η\eta; iii) max0jrsupt(wj,wj+1]||κ¯˙j(t,0𝒢i)κ¯˙j(t,0𝒢i)||=O(χ|i|)\max_{0\leq j\leq r}\sup_{t\in(w_{j},w_{j+1}]}||\dot{\bar{\kappa}}_{j}\left(t,0\mid\mathcal{G}_{i}\right)-\dot{\bar{\kappa}}_{j}\left(t,0\mid\mathcal{G}_{i}^{*}\right)||=O(\chi^{|i|}) for some constant χ(0,1)\chi\in(0,1), and max0jrsupt(wj,wj+1]|κ¯˙j(t,0𝒢0)|M\max_{0\leq j\leq r}\sup_{t\in(w_{j},w_{j+1}]}|\dot{\bar{\kappa}}_{j}\left(t,0\mid\mathcal{G}_{0}\right)|\leq M a.s. for some constant M<M<\infty, where κ¯˙j(t,0𝒢i):=tκ¯j(t,0𝒢i)\dot{\bar{\kappa}}_{j}\left(t,0\mid\mathcal{G}_{i}\right):=\frac{\partial}{\partial t}\bar{\kappa}_{j}\left(t,0\mid\mathcal{G}_{i}\right).

  • (A5)

    Assume δ𝐖(k,4)=O(χk)\delta_{{\bf W}}(k,4)=O(\chi^{k}) for some χ(0,1)\chi\in(0,1), where 𝐖(t,i,𝒢i)=𝐇j(t,𝒢i)ψ(Gj(t,i,𝒢i))\mathbf{W}(t,\mathcal{F}_{i},\mathcal{G}_{i})={\bf H}_{j}(t,\mathcal{G}_{i})\psi(G_{j}(t,\mathcal{F}_{i},\mathcal{G}_{i})) if t(wj,wj+1]t\in(w_{j},w_{j+1}], j=0,,rj=0,\cdots,r. Also assume for the error process that supiψ(Gj(t,i,𝒢i))4<\sup_{i}||\psi(G_{j}(t,\mathcal{F}_{i},\mathcal{G}_{i}))||_{4}<\infty for all j=0,,rj=0,\cdots,r and t(wj,wj+1]t\in(w_{j},w_{j+1}]. Further assume for all t1,t2(wj,wj+1],0jrt_{1},t_{2}\in(w_{j},w_{j+1}],0\leq j\leq r, ψ(Gj(t1,0,𝒢0))ψ(Gj(t2,0,𝒢0))M|t1t2|||\psi(G_{j}(t_{1},\mathcal{F}_{0},\mathcal{G}_{0}))-\psi(G_{j}(t_{2},\mathcal{F}_{0},\mathcal{G}_{0}))||\leq M|t_{1}-t_{2}| for some constant M<M<\infty.

  • (A6)

    Define the long-run covariance function of {𝐖i:=𝐖(ti,i,𝒢i)=𝐱iψ(ei)}\{\mathbf{W}_{i}:=\mathbf{W}(t_{i},\mathcal{F}_{i},\mathcal{G}_{i})=\mathbf{x}_{i}\psi(e_{i})\}

    Ψ(t)=i= Cov[𝐖(t,0,𝒢0),𝐖(t,i,𝒢i)].\displaystyle\Psi(t)=\sum_{i=-\infty}^{\infty}\mbox{ Cov}[\mathbf{W}(t,\mathcal{F}_{0},\mathcal{G}_{0}),\mathbf{W}(t,\mathcal{F}_{i},\mathcal{G}_{i})].

    Let Ψ(0)=limt0Ψ(t)\Psi(0)=\lim_{t\downarrow 0}\Psi(t). Assume that the minimum eigenvalue of Ψ(t)\Psi(t) is bounded away from 0 on [0,1][0,1].

  • (A7)

    Define Σm=i=1mΣ1(ti)Ψ(ti)Σ1(ti)\Sigma_{m}=\sum_{i=1}^{m}\Sigma^{-1}(t_{i})\Psi(t_{i})\Sigma^{-1}(t_{i}). Assume that, for sufficiently large nn, there exists L<L<\infty, such that |Σs1nΣs2n|L(s2ns1n)|\Sigma_{\lfloor s_{1}n\rfloor}-\Sigma_{\lfloor s_{2}n\rfloor}|\leq L(\lfloor s_{2}n\rfloor-\lfloor s_{1}n\rfloor) holds for 0<s1<s2<10<s_{1}<s_{2}<1.

Here are some insights on the above regularity conditions. Assumption (A1) guarantees the smoothness of the conditional loss functions. (28) holds if supt(0,1)|ψ(1)(t)|<\sup_{t\in(0,1)}|\psi^{(1)}(t)|<\infty, which is satisfied by least squares regression and can be achieved by quantile regression under some mild constraints. A sufficient condition for (28) is provided in Wu and Zhou (2018) by virtue of the robustness of loss functions. (A2) requires that the covariate process is stochastically Lipschitz continuous with geometrically decaying dependence measures. The moment requirement on 𝐱i\mathbf{x}_{i} in (A2) can also be significantly relaxed when ψ()\psi(\cdot) is bounded or light-tailed. (A3) controls the dependence measures of the derivatives of the loss functions’ conditional expectations and is easy to verify for a large class of non-stationary processes; see Wu and Zhou (2018) for further details. (A4) plays a key role in the consistency of 𝜷ˇ(t)\check{\mbox{\boldmath$\beta$}}(t) by endowing Ξ¯(𝒢i)\bar{\Xi}(\cdot\mid\mathcal{G}_{i}) with a certain degree of smoothness. These conditions together enable the Bahadur representation of the estimators. Condition (A6) means that the time-varying long-run covariance matrices of {𝐖i}i=1n\{\mathbf{W}_{i}\}_{i=1}^{n} are non-degenerate on [0,1][0,1], which is mild in most applications. Condition (A7) assumes Lipschitz continuity of Σm\Sigma_{m}. Observe that Σm\Sigma_{m} is a cumulative sum of Σ1(ti)Ψ(ti)Σ1(ti)\Sigma^{-1}(t_{i})\Psi(t_{i})\Sigma^{-1}(t_{i}), and hence the assumption is mild even though both Σ(t)\Sigma(t) and Ψ(t)\Psi(t) may experience jumps. (A7) is utilized to guarantee the tightness of the estimators; see Lemma LABEL:lem:7 in the supplementary material for more details.

7.2 Uniform Bahadur Representation

This section establishes uniform Bahadur representations for the local linear M-estimators and the estimated CRF, by which the limiting distribution of the estimators can be derived in conjunction with some Gaussian approximation results.

Define a 2p×2p2p\times 2p matrix Σ1(t)=(Σ(t)00μ2Σ(t))\Sigma_{1}(t)=\begin{pmatrix}\Sigma(t)&0\\ 0&\mu_{2}\Sigma(t)\end{pmatrix}, where Σ(t)\Sigma(t) is defined in (A4) and μk:=|x|kK(x)𝑑x\mu_{k}:=\int|x|^{k}K(x)\,dx for some positive integer kk.

Theorem 4.

Suppose (A1)-(A4), nbn4log8n\frac{nb_{n}^{4}}{\log^{8}n}\to\infty, and ncbn0n^{c}b_{n}\to 0 for some positive constant cc. Let 𝔗n=[bn,1bn]\mathfrak{T}_{n}=\left[b_{n},1-b_{n}\right], and 𝔗n=𝔗nj=1r[wjbn,wj+bn]\mathfrak{T}_{n}^{\prime}=\mathfrak{T}_{n}\setminus\cup_{j=1}^{r}[w_{j}-b_{n},w_{j}+b_{n}], then

supt𝔗n|Σ1(t)𝜽~bn(t)i=1nψ(zi(t))𝐰in(t)Kbn(i/nt)nbn|=O(πnnbn),\displaystyle\sup_{t\in\mathfrak{T}_{n}^{\prime}}\left|\Sigma_{1}(t)\tilde{\mbox{\boldmath$\theta$}}_{b_{n}}(t)-\frac{\sum_{i=1}^{n}\psi\left(z_{i}(t)\right)\mathbf{w}_{in}(t)K_{b_{n}}(i/n-t)}{nb_{n}}\right|=O_{\mathbb{P}}\left(\frac{\pi_{n}}{\sqrt{nb_{n}}}\right), (31)

where πn=bnlog6n+(nbn)14log3n+nbnbn3log3n\pi_{n}=b_{n}\log^{6}n+(nb_{n})^{-\frac{1}{4}}\log^{3}n+\sqrt{nb_{n}}b_{n}^{3}\log^{3}n.

The Bahadur representation in (31) contains a bias term involving 𝜷′′(t)\mbox{\boldmath$\beta$}^{\prime\prime}(t). Recall the jackknife bias-corrected M-estimator Λˇ(t)\check{\Lambda}(t) defined in (8) and (9). We extend the above Bahadur representation result to the estimated CRF as follows.

Corollary 1.

Suppose the conditions of Theorem 4 hold. Then for any s×ps\times p full rank matrix 𝐂\mathbf{C} with sps\leq p, and with πn=bnlog6n+(nbn)14log3n+nbnbn3log3n\pi_{n}=b_{n}\log^{6}n+(nb_{n})^{-\frac{1}{4}}\log^{3}n+\sqrt{nb_{n}}b_{n}^{3}\log^{3}n as in Theorem 4,

supt𝔗n|n𝐂(Λˇ(t)Λ(t))1ni=1nt𝐂Σ1(ti)ψ(ei)𝐱i|=O(πnbn).\displaystyle\sup_{t\in\mathfrak{T}_{n}}\left|\sqrt{n}\mathbf{C}\left(\check{\Lambda}(t)-{\Lambda}(t)\right)-\frac{1}{\sqrt{n}}\sum_{i=1}^{\lfloor nt\rfloor}\mathbf{C}\Sigma^{-1}\left(t_{i}\right)\psi\left(e_{i}\right)\mathbf{x}_{i}\right|=O_{\mathbb{P}}\left(\frac{\pi_{n}}{\sqrt{b_{n}}}\right).

SUPPLEMENTARY MATERIAL

Theoretical proof:

This file contains detailed proofs of all theorems. (.pdf file)

References

  • Akakpo et al. (2014) Akakpo, N., F. Balabdaoui, and C. Durot (2014). Testing monotonicity via local least concave majorants. Bernoulli 20(2), 514 – 544.
  • Allen and Powell (2011) Allen, D. E. and S. R. Powell (2011). Asset Pricing, the Fama—French Factor Model and the Implications of Quantile-Regression Analysis, pp.  176–193. London: Palgrave Macmillan UK.
  • Barnes and Hughes (2002) Barnes, M. L. and A. T. W. Hughes (2002, November). A quantile regression analysis of the cross section of stock market returns.  (458522).
  • Bickel and Ritov (1988) Bickel, P. J. and Y. Ritov (1988). Estimating integrated squared density derivatives: sharp best order of convergence estimates. Sankhya A (1961-2002) 50(3), 381–393.
  • Bowman et al. (1998) Bowman, A. W., M. C. Jones, and I. Gijbels (1998). Testing monotonicity of regression. J. Comput. Graph. Stat. 7(4), 489–500.
  • Cai and Juhl (2023) Cai, Z. and T. Juhl (2023). The distribution of rolling regression estimators. Journal of Econometrics 235(2), 1447–1463.
  • Chen and Hong (2012) Chen, B. and Y. Hong (2012). Testing for smooth structural changes in time series models via nonparametric regression. Econometrica 80(3), 1157–1183.
  • Chetverikov (2019) Chetverikov, D. (2019). Testing regression monotonicity in econometric models. Econometric Theory 35(4), 729–776.
  • Christiansen (2008) Christiansen, B. (2008). Volcanic eruptions, large-scale modes in the northern hemisphere, and the el niño–southern oscillation. Journal of Climate 21(5), 910–922.
  • Dahlhaus et al. (2019) Dahlhaus, R., S. Richter, and W. B. Wu (2019). Towards a general theory for nonlinear locally stationary processes. Bernoulli 25(2), 1013 – 1044.
  • Dette et al. (2011) Dette, H., P. Preuß, and M. Vetter (2011). A measure of stationarity in locally stationary processes with applications to testing. J. Am. Statist. Ass. 106(495), 1113–1124.
  • Durot (2003) Durot, C. (2003). A Kolmogorov-type test for monotonicity of regression. Statist. Probab. Lett. 63(4), 425–433.
  • Ebi et al. (2021) Ebi, K. L., J. Vanos, J. W. Baldwin, J. E. Bell, D. M. Hondula, N. A. Errett, K. Hayes, C. E. Reid, S. Saha, J. Spector, and P. Berry (2021). Extreme weather and climate change: population health and health system implications. Annu. Rev. Public Health 42(1), 293–315.
  • Fama and French (2015) Fama, E. F. and K. R. French (2015). A five-factor asset pricing model. Journal of Financial Economics 116(1), 1–22.
  • Fan and Zhang (2000) Fan, J. and W. Zhang (2000). Simultaneous confidence bands and hypothesis testing in varying-coefficient models. Scandinavian Journal of Statistics 27(4), 715–731.
  • Fang and Seo (2021) Fang, Z. and J. Seo (2021). A projection framework for testing shape restrictions that form convex cones. Econometrica 89(5), 2439–2458.
  • Foster and Rahmstorf (2011) Foster, G. and S. Rahmstorf (2011, dec). Global temperature evolution 1979–2010. Environ. Res. Lett. 6(4), 044022.
  • Francq and Sucarrat (2023) Francq, C. and G. Sucarrat (2023). Volatility estimation when the zero-process is nonstationary. J. Bus. Econ. Statist. 41(1), 53–66.
  • Friedman and Tibshirani (1984) Friedman, J. and R. Tibshirani (1984). The monotone smoothing of scatterplots. Technometrics 26(3), 243–250.
  • Friedrich and Lin (2022) Friedrich, M. and Y. Lin (2022). Sieve bootstrap inference for linear time-varying coefficient models. Journal of Econometrics, 105345.
  • Ghosal et al. (2000) Ghosal, S., A. Sen, and A. W. van der Vaart (2000). Testing monotonicity of regression. Ann. Statist. 28(4), 1054–1082.
  • Gijbels et al. (2000) Gijbels, I., P. Hall, M. C. Jones, and I. Koch (2000). Tests for monotonicity of a regression mean with guaranteed level. Biometrika 87(3), 663–673.
  • Hall and Heckman (2000) Hall, P. and N. E. Heckman (2000). Testing for monotonicity of a regression mean by calibrating for linear functions. Ann. Statist. 28(1), 20–39.
  • Hall and Marron (1987) Hall, P. and J. Marron (1987). Estimation of integrated squared density derivatives. Statist. Probab. Lett. 6(2), 109–115.
  • Hardle and Mammen (1993) Hardle, W. and E. Mammen (1993). Comparing nonparametric versus parametric regression fits. Ann. Statist. 21(4), 1926 – 1947.
  • Hastie and Tibshirani (1993) Hastie, T. and R. Tibshirani (1993). Varying-coefficient models. J. R. Statist. Soc. B 55(4), 757–796.
  • He and Zhu (2003) He, X. and L.-X. Zhu (2003). A lack-of-fit test for quantile regression. J. Am. Statist. Ass. 98(464), 1013–1022.
  • Hong and White (1995) Hong, Y. and H. White (1995). Consistent specification testing via nonparametric series regression. Econometrica 63(5), 1133–1159.
  • Hoover et al. (1998) Hoover, D. R., J. A. Rice, C. O. Wu, and L.-P. Yang (1998). Nonparametric smoothing estimates of time-varying coefficient models with longitudinal data. Biometrika 85(4), 809–822.
  • Horváth and Wang (2021) Horváth, D. and Y.-L. Wang (2021). The examination of fama-french model during the covid-19. Finance Research Letters 41, 101848.
  • Huang and Fan (1999) Huang, L.-S. and J. Fan (1999). Nonparametric estimation of quadratic regression functionals. Bernoulli 5(5), 927 – 949.
  • Huber (1964) Huber, P. J. (1964). Robust estimation of a location parameter. Ann. Math. Stat. 35(1), 73 – 101.
  • Huber (1973) Huber, P. J. (1973). Robust regression: asymptotics, conjectures and Monte Carlo. Ann. Statist. 1(5), 799 – 821.
  • I. Gijbels and Verhasselt (2017) I. Gijbels, M. A. I. and A. Verhasselt (2017). Shape testing in quantile varying coefficient models with heteroscedastic error. J. Nonparametr. Stat. 29(2), 391–406.
  • IPCC (2007) IPCC (2007). Climate Change 2007: The Physical Science Basis. Contribution of Working Group I to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change [Solomon, S., D. Qin, M. Manning, Z. Chen, M. Marquis, K.B. Averyt, M. Tignor and H.L. Miller (eds.)]. Cambridge: Cambridge university press.
  • Jensen (1968) Jensen, M. C. (1968). The performance of mutual funds in the period 1945-1964. The Journal of Finance 23(2), 389–416.
  • Karmakar et al. (2022) Karmakar, S., S. Richter, and W. B. Wu (2022). Simultaneous inference for time-varying models. Journal of Econometrics 227(2), 408–428.
  • Knight (2017) Knight, K. (2017). On the asymptotic distribution of the L{L}_{\infty} estimator in linear regression.
  • Koenker (2005) Koenker, R. (2005). Quantile Regression. Econometric Society Monographs. Cambridge University Press.
  • Komarova and Hidalgo (2020) Komarova, T. and J. Hidalgo (2020, June). Testing nonparametric shape restrictions.  (arXiv:1909.01675). arXiv:1909.01675 [econ, math, stat].
  • Koul and Stute (1999) Koul, H. L. and W. Stute (1999). Nonparametric model checks for time series. Ann. Statist. 27(1), 204 – 236.
  • Kreiss and Paparoditis (2014) Kreiss, J.-P. and E. Paparoditis (2014, 04). Bootstrapping Locally Stationary Processes. J. R. Statist. Soc. B 77(1), 267–290.
  • Lean and Rind (2008) Lean, J. L. and D. H. Rind (2008). How natural and anthropogenic influences alter global and regional surface temperatures: 1889 to 2006. Geophys. Res. Lett. 35(18).
  • Matzkin (1994) Matzkin, R. L. (1994). Chapter 42 restrictions of economic theory in nonparametric methods. Volume 4 of Handbook of Econometrics, pp.  2523–2558. Elsevier.
  • Mies (2023) Mies, F. (2023). Functional estimation and change detection for nonstationary time series. J. Am. Statist. Ass. 118(542), 1011–1022.
  • Nason (2013) Nason, G. (2013, 03). A Test for Second-Order Stationarity and Approximate Confidence Intervals for Localized Autocovariances for Locally Stationary Time Series. J. R. Statist. Soc. B 75(5), 879–904.
  • Noda (2022) Noda, A. (2022, August). Estimating the time-varying structures of the fama-french multi-factor models.  (arXiv:2208.01270). arXiv:2208.01270 [econ, q-fin].
  • Politis et al. (1999) Politis, D. N., J. P. Romano, and M. Wolf (1999). Choice of the Block Size, pp.  188–212. New York, NY: Springer New York.
  • Racicot et al. (2019) Racicot,  François-Éric., W. F. Rentz, D. Tessier, and R.  Théoret (2019, 09). The conditional fama-french model and endogenous illiquidity: A robust instrumental variables test. PLOS ONE 14(9), 1–26.
  • Robock (2000) Robock, A. (2000). Volcanic eruptions and climate. Reviews of Geophysics 38(2), 191–219.
  • Sharpe (1964) Sharpe, W. F. (1964). Capital asset prices: A theory of market equilibrium under conditions of risk*. The Journal of Finance 19(3), 425–442.
  • Silverman (1981) Silverman, B. W. (1981). Using kernel density estimates to investigate multimodality. J. R. Statist. Soc. B 43(1), 97–99.
  • Stute (1997) Stute, W. (1997). Nonparametric model checks for regression. Ann. Statist. 25(2), 613 – 641.
  • Stute et al. (1998) Stute, W., S. Thies, and L.-X. Zhu (1998). Model checks for regression: an innovation process approach. Ann. Statist. 26(5), 1916 – 1934.
  • Trenberth et al. (2002) Trenberth, K. E., J. M. Caron, D. P. Stepaniak, and S. Worley (2002). Evolution of El Niño–Southern Oscillation and global atmospheric surface temperatures. Journal of Geophysical Research: Atmospheres 107(D8), AAC 5–1–AAC 5–17.
  • Vogt (2012) Vogt, M. (2012). Nonparametric regression for locally stationary time series. Ann. Statist. 40(5), 2601–2633.
  • Wu and Zhou (2017) Wu, W. and Z. Zhou (2017). Nonparametric inference for time-varying coefficient quantile regression. J. Bus. Econ. Statist. 35(1), 98–109.
  • Wu and Zhou (2018) Wu, W. and Z. Zhou (2018). Gradient-based structural change detection for nonstationary time series M-estimation. Ann. Statist. 46(3), 1197–1224.
  • Wu (2007) Wu, W. B. (2007). M-estimation of linear models with dependent errors. Ann. Statist. 35(2), 495 – 521.
  • Zhou and Tung (2013) Zhou, J. and K.-K. Tung (2013). Deducing multidecadal anthropogenic global warming trends using multiple regression analysis. Journal of the Atmospheric Sciences 70(1), 3 – 8.
  • Zhou (2013) Zhou, Z. (2013). Heteroscedasticity and autocorrelation robust structural change detection. J. Am. Statist. Ass. 108(502), 726–740.
  • Zhou and Wu (2010) Zhou, Z. and W. B. Wu (2010). Simultaneous inference of linear models with time varying coefficients. J. R. Statist. Soc. B 72(4), 513–531.