
Statistical inference on partially shape-constrained function-on-scalar linear regression models

Kyunghee Han (University of Illinois at Chicago), Yeonjoo Park (corresponding author, yeonjoo.park@utsa.edu; University of Texas at San Antonio), Soo-Young Kim (Fred Hutchinson Cancer Center)
Abstract

We consider functional linear regression models in which functional outcomes are associated with scalar predictors through coefficient functions subject to shape constraints, such as monotonicity and convexity, that apply only to sub-domains of interest. To validate such partial shape constraints, we propose testing a composite hypothesis of linear functional constraints on the regression coefficients. Our approach employs kernel- and spline-based methods within a unified inferential framework, evaluating the statistical significance of the hypothesis by measuring an $L^2$-distance between the constrained and unconstrained model fits. In a large-sample analysis under mild conditions, we show that both methods achieve the standard rate of convergence observed in the nonparametric estimation literature. Through finite-sample numerical experiments, we demonstrate that the type I error rate is maintained at the specified significance level across various scenarios and that the power increases with the sample size, confirming the consistency of the test procedure under both estimation methods. Our theoretical and numerical results give researchers the flexibility to choose a method based on computational preference. The practicality of partially shape-constrained inference is illustrated by two data applications: one involving clinical trials of NeuroBloc in type A-resistant cervical dystonia and the other from the National Institute of Mental Health Schizophrenia Study.

Keywords: Nonparametric estimation, partial shape constraints, shape-constrained kernel least squares, shape-constrained regression spline, testing

1 Introduction

As function-valued data acquisition becomes increasingly common, a significant amount of work has been devoted to functional regression models, addressing both coefficient function estimation and inference [21, 32]. Among them, inference for the function-on-scalar regression (FoSR) model has found applications in many fields [24] by identifying dynamic associations between the response and scalar covariates that vary over the domain. Beyond establishing statistical evidence of a non-null covariate effect on response trajectories, validating the shape of functional regression coefficients, such as monotonicity, convexity, or concavity, over a specific subset of the domain is crucial for domain experts because it permits tangible interpretation in practice. Once a functional shape is validated, one may incorporate the corresponding conditions into the model estimates to avoid potential bias. Conversely, in practical estimation problems, shape restrictions on functional regression coefficients over specific sub-intervals of the domain may be available as prior knowledge.

Our motivating example is a functional data analysis of treatment efficacy during the clinical trial period of the National Institute of Mental Health Schizophrenia Collaborative Study. Previous studies, including [1] and [12], showed a significant continual drop in disease severity for the treatment group, i.e., improving drug efficacy, over the weeks of the trial through shape-constrained tests on the drug effect coefficient function. While the study design and models are described in Section 4.2, the left panel of Figure 1 displays the longitudinal disease severity measures surveyed from the initial treatment (week 0) to week 6 for 30 randomly selected patients from the placebo and treatment cohorts. Refined conclusions over specific periods can also be of practical interest, such as when the maximum drug efficacy is achieved or whether the improvement in drug effectiveness remains significant during the later weeks of the trial. As we shall see in Section 4.2, our proposed partial inference method concludes that disease severity decreases significantly and monotonically owing to the drug treatment over weeks 0-3, followed by a consistent duration of such efficacy for the remaining weeks, as illustrated by the blue solid line in the right panel of Figure 1. This is somewhat distinct from the constrained estimates under a monotone decrease over the entire domain, the conclusion of [12], and from the unconstrained smooth estimates, displayed with the black dashed and dotted lines, respectively.

Figure 1: The left panel illustrates the individual trajectories of disease severity for a subset of patients in the treatment and placebo groups. The average disease severity for each group (solid line) shows an overall decrease, with the treatment group experiencing a rapid drop in the first few weeks, followed by a continual reduction at a seemingly similar rate for both groups. As shown in the right panel, [12] previously validated this observation using globally monotone shape-constrained estimates (black dashed line), which provide more explicit, decision-ready interpretations than unconstrained estimates (black dotted line). However, the global shape-constrained inference may be misleading because it ignores the rebound after Week 5 indicated by the unconstrained estimates. Our proposed method (blue solid line) with partial shape constraints refines the conclusion, suggesting that the duration of drug action may effectively extend up to Week 3, with sustained efficacy thereafter.

The recent literature on functional data analysis and its applications also pays attention to shape-restricted estimation and inference in functional regression models. For inference on shape-constrained regression coefficients, [6] proposed a goodness-of-fit test based on empirical processes projected onto the space with shape constraints, while [22] addressed a similar problem under null hypotheses with linear operator constraints. For estimation, [4] extended shape-constrained kernel smoothing to the functional and longitudinal data framework, and [12] employed Bernstein polynomials for shape-constrained estimation in functional regression, with one of their data application results illustrated in Figure 1. To name a few other applications: in economics, motivated by the pioneering work of Slutsky [31] recognizing the necessity of shape-restricted estimation and inference, [5] elaborated theoretical and applied work in econometric research. In public health research, the analysis of growth charts under monotone increasing shape restrictions has provided a crucial clinical tool for growth screening during infancy, childhood, and adolescence [15], and such growth charts help health providers assess monotone increasing growth patterns against age-specific percentile curves [9]. The reliability engineering community has also paid attention to this topic for bathtub-shaped function estimation in assessing the degradation of system reliability. However, to the best of our knowledge, inference for partial shape constraints has not been addressed, although such questions arise naturally in practice.

In this study, we propose an inferential tool to validate partial shape constraints on functional regression coefficients in FoSR, along with two corresponding estimation approaches using kernel-smoothing and spline techniques. Suppose one is interested in shape constraints on a sub-interval $\mathcal{I}=[a,b]\subset[0,1]$, where the collected response trajectories span $[0,1]$. Under existing tools for shape-constrained inference, one might consider using only the functional observations restricted to $\mathcal{I}$ and applying a standard testing procedure. However, such an approach causes a significant drop in sample size when the number of measurements per response trajectory is bounded, which is common in longitudinal studies, including our motivating example in Figure 1. The situation becomes much worse depending on the length of $\mathcal{I}$ or the sparsity of evaluation grids on it. Consequently, this naive approach results in a severe deterioration of power. To prevent this, we propose to borrow information from observations over the entire domain even when testing is focused on specific subsets of the domain. This indeed prevents a significant drop in the power of the test while keeping the type I error at the desired level, as demonstrated in the simulation studies of Section 3. The other contribution of this paper is the parallel development of two partially shape-constrained regression estimators using the most widely used nonparametric smoothing techniques, kernel smoothing and regression splines. The proposed unified testing tool, applicable to both estimators, provides practical flexibility in real applications. We additionally derive the asymptotic behavior of the regression coefficient estimators with embedded partial shape constraints.

The article is organized as follows. Section 2 presents the proposed methodology and theory. We introduce the partially shape-constrained FoSR models in Section 2.1. The estimation methods and theoretical findings are provided in Sections 2.2 and 2.3, covering the kernel and spline estimators, respectively. In Section 2.4, we establish a unified inferential procedure for testing the partial shape constraints, encompassing the kernel and spline approaches. Simulation results are reported in Section 3, and data applications are illustrated with two examples in Section 4. Technical details and proofs are provided in the Supplementary Material.

2 Methodology and Theory

2.1 Testing partial shape constraints in FoSR models

Let $Y_i$ be the functional response coupled with a vector covariate $\mathbf{X}_i=(X_{i,1},\ldots,X_{i,p})^{\top}$ as

\[ Y_i(t)=\sum_{j=1}^{p}\beta_j(t)X_{i,j}+\varepsilon_i(t)\quad(t\in[0,1]) \tag{2.1} \]

independently for $i=1,\ldots,n$, where $\boldsymbol{\beta}(t)=\big(\beta_1(t),\ldots,\beta_p(t)\big)^{\top}$ is a vector coefficient function and $\varepsilon_i(t)$ is a mean-zero stochastic process that models the regression error uncorrelated with $\mathbf{X}_i$, i.e., $\mathbb{E}(\varepsilon_i(t)\,|\,\mathbf{X}_i)=0$. We assume that $X_{i,1},\ldots,X_{i,p}$ are linearly independent with positive probability, which allows the design matrix to include an intercept, e.g., $X_{i,1}=1$ for all $i=1,\ldots,n$, depending on the inferential interest. Despite the generality of the functional model (2.1), it is practically infeasible to observe the functional outcomes as infinite-dimensional objects, so we assume that only discrete evaluations $\mathbf{Y}_i=(Y_{i,1},\ldots,Y_{i,L_i})^{\top}$ with $Y_{i,\ell}=Y_i(T_{i,\ell})$ are accessible for $\ell=1,\ldots,L_i$, where $T_{i,1},\ldots,T_{i,L_i}$ is a random sample of $T$ with a probability density function $\pi$ on $[0,1]$. Here, $L_i$ is an independent random integer that may or may not depend on the sample size $n$. More detailed conditions on $L_i$ required for the kernel and regression spline methods are provided in the Supplementary Material S.1 and S.2. We denote the finite random sample as $\mathcal{X}_n=\{(\mathbf{Y}_i,\mathbf{X}_i): i=1,\ldots,n\}$.

Suppose we test the functional shape of $\beta_j$ on a pre-specified sub-interval $\mathcal{I}_j\subset[0,1]$ for $j\in J$ with $J\subset\{1,\ldots,p\}$. The null hypothesis "$H_0$: $H_{0,j}$ is true for all $j\in J$" consists of the individual hypotheses

\[ H_{0,j}: \beta_j \text{ is } \mathcal{A}_j\text{-shaped on } \mathcal{I}_j \tag{2.2} \]

for $j\in J$. For example, if we hypothesize that $\beta_j$ is monotone increasing on $\mathcal{I}_j$, we read (2.2) as "$H_{0,j}$: $\beta_j$ is monotone increasing on $\mathcal{I}_j$." Similarly, if one tests a partial convexity hypothesis, (2.2) can be written as "$H_{0,j}$: $\beta_j$ is convex on $\mathcal{I}_j$." In this paper, monotone and convex hypotheses are the main examples, yet one can extend our framework to general shape constraints, as covered by many others in the literature [17, 18, 27, 28, 20, 25].

We then propose to reject the null hypothesis $H_0$ if the test statistic

\[ D_n=\frac{1}{n}\sum_{i=1}^{n}\int_0^1\left\{\mathbf{X}_i^{\top}\big(\widehat{\boldsymbol{\beta}}(t)-\widetilde{\boldsymbol{\beta}}(t)\big)\right\}^2\,\mathrm{d}t \tag{2.3} \]

is significantly large, where $\widehat{\boldsymbol{\beta}}(t)=\big(\hat{\beta}_1(t),\ldots,\hat{\beta}_p(t)\big)^{\top}$ and $\widetilde{\boldsymbol{\beta}}(t)=\big(\tilde{\beta}_1(t),\ldots,\tilde{\beta}_p(t)\big)^{\top}$ denote the estimates of the coefficient function $\boldsymbol{\beta}(t)$ under the null hypothesis $H_0$ and its general alternative $H_1$, respectively. While $D_n$ is partly motivated by [22], a goodness-of-fit test statistic for validating functional constraints on $\beta_j(t)$ via linear operators, similar $L^2$-based test statistics have frequently been employed in the literature for testing the nullity of functional differences in more general settings [30, 36, 35]. Indeed, our proposed method reduces to global shape constraints if one sets $\mathcal{I}_j=[0,1]$. Sections 2.2 and 2.3 elaborate on how to obtain $\widehat{\boldsymbol{\beta}}(t)$ under the kernel-smoothing and spline-smoothing approaches, respectively. Based on those estimators, we present a bootstrap test procedure in Section 2.4.
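For concreteness, the following minimal sketch computes $D_n$ by approximating the integral in (2.3) with the trapezoidal rule; it assumes the constrained and unconstrained estimates are available as callables mapping a grid of time points to a $(p,\,\text{grid})$ array, and all names are illustrative rather than part of the authors' implementation.

```python
import numpy as np

def test_statistic(X, beta_hat, beta_tilde, n_grid=201):
    """D_n = (1/n) sum_i int_0^1 { X_i^T (beta_hat(t) - beta_tilde(t)) }^2 dt,
    with the integral approximated on an equally spaced grid."""
    t = np.linspace(0.0, 1.0, n_grid)
    diff = beta_hat(t) - beta_tilde(t)   # (p, n_grid) difference of coefficient fits
    fitted_diff = X @ diff               # (n, n_grid); row i is X_i^T (diff)
    return np.trapz(fitted_diff**2, t, axis=1).mean()
```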

2.2 Estimation: Shape-constrained kernel-weighted least squares

We assume that the vector coefficient $\boldsymbol{\beta}(t)$ is smooth enough to allow the local linear approximation $\boldsymbol{\beta}(u)\approx\boldsymbol{\beta}(t)+\dot{\boldsymbol{\beta}}(t)(u-t)$ for $u$ near $t$, where $\dot{\boldsymbol{\beta}}(t)=\big(\beta_1'(t),\ldots,\beta_p'(t)\big)^{\top}$ denotes the gradient of $\boldsymbol{\beta}(t)$. The local linear kernel estimator of $\mathbb{B}(t)=\big[\boldsymbol{\beta}(t),\dot{\boldsymbol{\beta}}(t)\big]\in\mathbb{R}^{p\times 2}$ is given by the unconstrained minimizer $\widetilde{\mathbb{B}}(t)$ of the following objective function:

\[ \mathcal{L}_n\big(\mathbb{B}(t)\big)=\sum_{i=1}^{n}\frac{1}{L_i}\sum_{\ell=1}^{L_i}\Big[Y_{i,\ell}-\mathbf{X}_i^{\top}\mathbb{B}(t)\mathbf{Z}_{i,\ell}(t)\Big]^2 K_h(T_{i,\ell}-t), \tag{2.4} \]

where $K_h(u)=K(u/h)/h$ with a probability density function $K$ and a bandwidth $h>0$, $\mathbf{Z}_{i,\ell}(t)=\big(1,T_{i,\ell}-t\big)^{\top}$ is the local smoothing design, and $\mathbf{X}_i^{\top}\mathbb{B}(t)\mathbf{Z}_{i,\ell}(t)=\mathbb{W}_{i,\ell}(t)^{\top}\mathrm{vec}\big(\mathbb{B}(t)\big)$ with the vectorization $\mathrm{vec}\big(\mathbb{B}(t)\big)=\big(\boldsymbol{\beta}(t)^{\top},\dot{\boldsymbol{\beta}}(t)^{\top}\big)^{\top}\in\mathbb{R}^{2p}$ of $\mathbb{B}(t)$ and the Kronecker product $\mathbb{W}_{i,\ell}(t)=\mathbf{Z}_{i,\ell}(t)\otimes\mathbf{X}_i$ of the covariate design for local linear smoothing.
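As an illustration, a minimal sketch of the unconstrained local linear fit minimizing (2.4) at a single point $t$ is given below; it accumulates the kernel-weighted normal equations over subjects and solves for $\mathrm{vec}(\widetilde{\mathbb{B}}(t))$. A Gaussian kernel and the data containers are illustrative assumptions.

```python
import numpy as np

def local_linear_fit(t, T_list, Y_list, X, h):
    """Solve the kernel-weighted least squares (2.4) at a point t.
    T_list[i], Y_list[i]: arrays of the L_i evaluation points / responses of subject i;
    X: (n, p) covariate matrix; h: bandwidth. Returns (beta(t), beta_dot(t))."""
    p = X.shape[1]
    A = np.zeros((2 * p, 2 * p))     # kernel-weighted Gram matrix, as in Psi_hat(t)
    b = np.zeros(2 * p)
    for Ti, Yi, Xi in zip(T_list, Y_list, X):
        w = np.exp(-0.5 * ((Ti - t) / h) ** 2) / (h * np.sqrt(2 * np.pi))  # K_h weights
        for l in range(len(Ti)):
            W = np.kron(np.array([1.0, Ti[l] - t]), Xi)  # W_{i,l}(t) = Z_{i,l}(t) ⊗ X_i
            A += (w[l] / len(Ti)) * np.outer(W, W)
            b += (w[l] / len(Ti)) * W * Yi[l]
    sol = np.linalg.solve(A, b)      # vec(B~(t)) = (beta(t)^T, beta_dot(t)^T)^T
    return sol[:p], sol[p:]
```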

Proposition 2.2.1.

For each $t\in[0,1]$, let $\mathbb{B}_0(t)=\big[\boldsymbol{\beta}_0(t),\dot{\boldsymbol{\beta}}_0(t)\big]$ be the $p\times 2$ matrix, where $\boldsymbol{\beta}_0(t)$ is the true vector coefficient function such that $\ddot{\boldsymbol{\beta}}_0(t)=\big(\beta_1''(t),\ldots,\beta_p''(t)\big)^{\top}$ exists and is continuous. Suppose conditions (K0)-(K4) in Supplementary Material S.1 hold. Then, we have

\[ \sup_{t\in[0,1]}\big\|\widetilde{\boldsymbol{\beta}}(t)-\boldsymbol{\beta}_0(t)\big\|=O_P\bigg(h^2+\sqrt{\frac{\log n}{nh}}\bigg). \tag{2.5} \]

The implications of conditions (K0)-(K4) are discussed in Supplementary Material S.1. It is worth mentioning that Proposition 2.2.1 remains valid even if only a few longitudinal observations are available for some subjects. For example, it is common in observational studies that dense observations of functional responses are unavailable for some subjects regardless of the sample size $n$, i.e., the minimum number of longitudinal observations $\lambda_n=\min_{1\le i\le n}L_i$ is bounded. The technical condition on $\lambda_n$ in (K4) is therefore not restrictive because we do not require $\lambda_n\to\infty$ as $n\to\infty$.

Remark 1.

Under technical conditions (K0)-(K4), $k>2.5$ and $a=1/5$ are commonly adopted in the standard theory of kernel smoothing [10]. Then, Proposition 2.2.1 gives $\sup_{t\in[0,1]}\|\widetilde{\boldsymbol{\beta}}(t)-\boldsymbol{\beta}_0(t)\|=O_P(n^{-2/5}\sqrt{\log n})$. It follows that $\int_0^1\|\widetilde{\boldsymbol{\beta}}(t)-\boldsymbol{\beta}_0(t)\|^2\,\mathrm{d}t=O_P(n^{-2/5}\sqrt{\log n})$.

For the shape-constrained estimation of $\widehat{\boldsymbol{\beta}}$ under $H_0$, we extend the kernel-weighted least squares approach for nonparametric regression with scalar responses [34], where the side conditions are empirically examined over a fine grid $\mathcal{G}_M=\{t_m\in[0,1]: m=0,1,\ldots,M\}$ with $0=t_0<t_1<\cdots<t_{M-1}<t_M=1$ so that $\hat{\beta}_j$ satisfies the null hypothesis (2.2) over $\mathcal{G}_j=\mathcal{G}|_{\mathcal{I}_j}=\{t_{j_m}\in\mathcal{I}_j: m=0,1,\ldots,M_j\}$ for each $j\in J$. Specifically, if we consider "$H_{0,j}$: $\beta_j$ is monotone increasing on $\mathcal{I}_j$," we propose to substitute

\[ H_{0,j}': \beta_j(t_{j_m})-\beta_j(t_{j_{m-1}})\ge 0 \quad\text{for all}\quad m=1,\ldots,M_j. \tag{2.6} \]

Similarly, we empirically validate the partial hypothesis "$H_{0,j}$: $\beta_j$ is convex on $\mathcal{I}_j$" as

\[ H_{0,j}': \frac{\beta_j(t_{j_m})-\beta_j(t_{j_{m-1}})}{t_{j_m}-t_{j_{m-1}}}\ge \beta_j'(t_{j_{m-1}}) \quad\text{for all}\quad m=1,\ldots,M_j. \tag{2.7} \]

To define the shape-constrained kernel-weighted least squares for the function-on-scalar regression model (2.1) under the empirical null hypothesis on the grid $\mathcal{G}_M$, we note that

\[ \begin{split} \frac{1}{n}\mathcal{L}_n\big(\mathbb{B}(t)\big) &= \frac{1}{n}\sum_{i=1}^{n}\frac{1}{L_i}\sum_{\ell=1}^{L_i}\Big(\mathbb{W}_{i,\ell}(t)^{\top}\big[\mathrm{vec}\big(\mathbb{B}(t)\big)-\mathrm{vec}\big(\widetilde{\mathbb{B}}(t)\big)\big]-\tilde{e}_{i,\ell}(t)\Big)^2 K_h(T_{i,\ell}-t) \\ &= \mathcal{Q}_n\big(\mathbb{B}(t),\widetilde{\mathbb{B}}(t)\big)+\frac{1}{n}\sum_{i=1}^{n}\frac{1}{L_i}\sum_{\ell=1}^{L_i}K_h(T_{i,\ell}-t)\,\tilde{e}_{i,\ell}(t)^2, \end{split} \tag{2.8} \]

where $\tilde{e}_{i,\ell}(t)=Y_{i,\ell}-\mathbb{W}_{i,\ell}(t)^{\top}\mathrm{vec}\big(\widetilde{\mathbb{B}}(t)\big)$ is the unconstrained residual of fitting $Y_{i,\ell}$ and $\mathcal{Q}_n\big(\mathbb{B}(t),\widetilde{\mathbb{B}}(t)\big)=\mathrm{vec}\big(\mathbb{B}(t)-\widetilde{\mathbb{B}}(t)\big)^{\top}\widehat{\Psi}(t)\,\mathrm{vec}\big(\mathbb{B}(t)-\widetilde{\mathbb{B}}(t)\big)$ with $\widehat{\Psi}(t)=n^{-1}\sum_{i=1}^{n}L_i^{-1}\sum_{\ell=1}^{L_i}\mathbb{W}_{i,\ell}(t)\mathbb{W}_{i,\ell}(t)^{\top}K_h(T_{i,\ell}-t)$. To obtain (2.8), we use the fact that the unconstrained estimator $\widetilde{\mathbb{B}}(t)$ solves the estimating equation $\partial\mathcal{L}_n\big(\mathbb{B}(t)\big)/\partial\,\mathrm{vec}\big(\mathbb{B}(t)\big)=0_{2p}$, or equivalently $n^{-1}\sum_{i=1}^{n}L_i^{-1}\sum_{\ell=1}^{L_i}\mathbb{W}_{i,\ell}(t)\,\tilde{e}_{i,\ell}(t)\,K_h(T_{i,\ell}-t)=0_{2p}$, where $0_{2p}=(0,\ldots,0)^{\top}$ denotes the zero vector in $\mathbb{R}^{2p}$. Noting that the minimization of $\mathcal{L}_n\big(\mathbb{B}(t)\big)$ depends only on the first term of (2.8) given the unconstrained estimator $\widetilde{\mathbb{B}}(t)$, we propose the following constrained optimization problem:

\[ \begin{aligned} \text{Minimize}&\quad \sum_{m=0}^{M}\mathrm{vec}\big(\mathbb{B}(t_m)-\widetilde{\mathbb{B}}(t_m)\big)^{\top}\widehat{\Psi}(t_m)\,\mathrm{vec}\big(\mathbb{B}(t_m)-\widetilde{\mathbb{B}}(t_m)\big) \\ \text{subject to}&\quad A_j\left[\begin{array}{c}\mathrm{vec}\big(\mathbb{B}(t_{j_0})\big)\\ \vdots\\ \mathrm{vec}\big(\mathbb{B}(t_{j_{M_j}})\big)\end{array}\right]\ge 0_{M_j} \quad\text{for all}\quad j\in J, \end{aligned} \tag{2.9} \]

where $0_k\in\mathbb{R}^k$ and $O_{k\times\ell}\in\mathbb{R}^{k\times\ell}$ denote the $k$-vector and the $(k,\ell)$-matrix of zeros, respectively, and $A_j\in\mathbb{R}^{M_j\times 2p(M_j+1)}$ is the linear constraint matrix associated with the individual hypothesis $H_{0,j}$. For the monotone increasing hypothesis (2.6), we set $A_j=A_{\mathrm{mono}}(\mathcal{I}_j)\otimes\big[\mathrm{e}_j^{\top},0_p^{\top}\big]$, where $\mathrm{e}_j\in\mathbb{R}^p$ is the unit vector with $1$ only at its $j$-th coordinate and $A_{\mathrm{mono}}(\mathcal{I}_j)\in\mathbb{R}^{M_j\times(M_j+1)}$ is defined as

\[ A_{\mathrm{mono}}(\mathcal{I}_j)=\left[\begin{array}{cccccc} -1 & 1 & 0 & \cdots & 0 & 0 \\ 0 & -1 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & -1 & 1 \end{array}\right]. \tag{2.14} \]

Similarly, for the convexity hypothesis (2.7), we set $A_j=A_{\mathrm{conv},0}(\mathcal{I}_j)\otimes\big[\mathrm{e}_j^{\top},0_p^{\top}\big]-A_{\mathrm{conv},1}(\mathcal{I}_j)\otimes\big[0_p^{\top},\mathrm{e}_j^{\top}\big]$, where $A_{\mathrm{conv},0}(\mathcal{I}_j),A_{\mathrm{conv},1}(\mathcal{I}_j)\in\mathbb{R}^{M_j\times(M_j+1)}$ are defined as

\[ A_{\mathrm{conv},0}(\mathcal{I}_j)=\left[\begin{array}{cccccc} \frac{-1}{t_{j_1}-t_{j_0}} & \frac{1}{t_{j_1}-t_{j_0}} & 0 & \cdots & 0 & 0 \\ 0 & \frac{-1}{t_{j_2}-t_{j_1}} & \frac{1}{t_{j_2}-t_{j_1}} & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & \frac{-1}{t_{j_{M_j}}-t_{j_{M_j-1}}} & \frac{1}{t_{j_{M_j}}-t_{j_{M_j-1}}} \end{array}\right] \tag{2.15} \]

and $A_{\mathrm{conv},1}(\mathcal{I}_j)=\big[I_{M_j},0_{M_j}\big]$. This illustrates that the constrained estimates $\widehat{\mathbb{B}}(t_0),\ldots,\widehat{\mathbb{B}}(t_M)$ can be obtained by standard quadratic programming with $\sum_{j\in J}M_j$ linear constraints.
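To make the quadratic program concrete, the sketch below solves (2.9) for a single monotone-increasing constraint on $\beta_j$, using a general-purpose solver (scipy's SLSQP) in place of a dedicated quadratic programming routine; the inputs are assumed to come from the unconstrained kernel fit on the grid, and all names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def constrained_kernel_fit(B_tilde, Psi, j, idx):
    """B_tilde: (M+1, 2p) array whose rows are vec(B~(t_m)); Psi: list of
    (2p, 2p) weight matrices Psi_hat(t_m); j < p: constrained coordinate;
    idx: indices of the grid points falling in I_j."""
    M1, q = B_tilde.shape

    def objective(b_flat):
        B = b_flat.reshape(M1, q)
        return sum((B[m] - B_tilde[m]) @ Psi[m] @ (B[m] - B_tilde[m])
                   for m in range(M1))

    # side conditions (2.6): beta_j(t_{j_m}) - beta_j(t_{j_{m-1}}) >= 0
    cons = [{"type": "ineq",
             "fun": lambda b, m=m: (b.reshape(M1, q)[idx[m], j]
                                    - b.reshape(M1, q)[idx[m - 1], j])}
            for m in range(1, len(idx))]
    res = minimize(objective, B_tilde.ravel(), constraints=cons, method="SLSQP")
    return res.x.reshape(M1, q)      # constrained vec(B^(t_m)) over the grid
```

For large grids, a dedicated quadratic programming solver would be preferable, but the feasible set being optimized over is the same.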

Remark 2.

Suppose we hypothesize "$H_{0,j}$: $\beta_j$ is monotone increasing on $\mathcal{I}_{j,1}$ and convex on $\mathcal{I}_{j,2}$," where $\mathcal{I}_{j,1}$ and $\mathcal{I}_{j,2}$ may or may not overlap. Then, the side conditions corresponding to the null hypothesis can be examined by imposing both linear constraint matrices $A_{\mathrm{mono}}(\mathcal{I}_{j,1})\otimes\big[\mathrm{e}_j^{\top},0_p^{\top}\big]$ and $A_{\mathrm{conv},0}(\mathcal{I}_{j,2})\otimes\big[\mathrm{e}_j^{\top},0_p^{\top}\big]-A_{\mathrm{conv},1}(\mathcal{I}_{j,2})\otimes\big[0_p^{\top},\mathrm{e}_j^{\top}\big]$.

Theorem 2.2.2.

Suppose the conditions in Proposition 2.2.1 hold. For any $M\ge 1$, let $\mathcal{G}_M=\{t_m\in[0,1]: m=0,1,\ldots,M\}$ be the finite grid used in (2.9) with $0=t_0<t_1<\cdots<t_M=1$. Under the null hypothesis $H_0$ in (2.2), we have

\[ \max_{t\in\mathcal{G}_M}\big\|\widehat{\boldsymbol{\beta}}(t)-\boldsymbol{\beta}_0(t)\big\|=O_P\bigg(h^2+\sqrt{\frac{\log n}{nh}}\bigg). \tag{2.16} \]

Theorem 2.2.2 indicates that the shape-constrained kernel least squares estimator achieves the same rate of convergence as the unconstrained estimator under the null hypothesis. Moreover, suppose the empirical distribution of $\mathcal{G}_M$ converges to the uniform distribution on $[0,1]$. Then, it can be verified that $\int_0^1\|\widehat{\boldsymbol{\beta}}(t)-\boldsymbol{\beta}_0(t)\|\,\mathrm{d}t=O_P(n^{-2/5}\sqrt{\log n})$, similarly to Remark 1.

2.3 Estimation: Shape-constrained regression spline

We now consider fitting the functional regression model by means of spline functions for smooth, flexible, and parsimonious estimation of the coefficient functions. Let $\phi_k(t)$, $k=1,\ldots,d$, denote spline basis functions defined over $[0,1]$ for a given sequence of knots; we then express $\beta_j(t)=\sum_{k=1}^{d}c_{jk}\phi_k(t)$, where the $c_{jk}$ denote basis coefficients. Under shape constraints on $\beta_j(t)$ over $\mathcal{I}_j$, we can still employ the regression spline approach based on suitably chosen basis functions, such as $I$-splines [23], $C$-splines [18], or general $B$-splines, with corresponding constraints assigned to a subset of the basis coefficients $c_{jk}$. While we focus on partially constrained estimation based on $I$- or $C$-splines in this section, estimation via $B$-splines is also discussed in the Supplementary Material S.6.

[23] first introduced $I$-splines for curve estimation under a monotonicity restriction over the entire domain. As illustrated in the left panel of Figure 2, $I$-spline basis functions are piece-wise quadratic, and at each of the knots indicated by dotted vertical lines, exactly one basis function has a non-zero slope. Thus, a monotone increasing (decreasing) spline function can be constructed as a non-negative (non-positive) linear combination of $I$-spline basis functions. For readers interested in the construction of $I$-splines, we refer to [23] for how piece-wise quadratic $I$-splines are formed from the nonnegative $M$-spline family. Then, if the specific aim is to impose, for example, a monotone increasing restriction over a subset $\mathcal{I}_j$ of the domain, the non-negativity condition should apply to the subset of the $c_{jk}$ whose corresponding $I$-splines $\phi_k(t)$ have positive slopes within $\mathcal{I}_j$.

Figure 2: The piece-wise quadratic $I$-spline basis functions and the piece-wise cubic $C$-spline basis functions under equally spaced knots indicated by the symbol '×' along with the dotted vertical lines.

Next, regarding convexity constraints, [18] proposed $C$-splines by integrating $I$-splines. As shown in the right panel of Figure 2, $C$-splines are convex, piece-wise cubic functions with non-zero second derivatives at each knot, implying that a linear combination of $C$-splines with non-negative (non-positive) basis coefficients ensures the convexity (concavity) of the function estimator. See Section 2 of [18] for details. Similarly, for convexity constraints over sub-interval(s) $\mathcal{I}_j$, we can impose non-negativity restrictions on the basis coefficients corresponding to splines with positive second derivatives within $\mathcal{I}_j$.

We now apply the shape-restricted spline technique to the functional regression model framework to estimate functional coefficients under (2.2). Although the methodological and theoretical developments are general to other shape constraints, such as convexity or monotone convexity, we elaborate on estimation under the monotonicity restriction below and refer to the remarks for other restrictions. Let $\phi_k^j(t)$, $k=1,\ldots,d_j$, denote piece-wise quadratic $I$-spline basis functions under a given knot sequence, employed to fit $\beta_j(t)$, $j=1,\ldots,p$. For the partially constrained coefficients $j\in J$, the knot sequence should include the lower and upper boundaries of $\mathcal{I}_j$; when $\mathcal{I}_j$ consists of multiple disjoint sub-intervals, all their boundaries should be part of the knot sequence. We then estimate the functional regression coefficients by minimizing the following objective function subject to non-negativity conditions on a subset of the basis coefficients $c_{jk}$ determined by the sets $J$ and $A_j$:

\[ \begin{aligned} \text{Minimize}&\quad \sum_{i=1}^{n}\frac{1}{L_i}\sum_{\ell=1}^{L_i}\Big[Y_{i,\ell}-\sum_{j=1}^{p}X_{i,j}\Big\{\sum_{k=0}^{d_j}c_{jk}\phi_k^j(T_{i,\ell})\Big\}\Big]^2 \\ \text{subject to}&\quad c_{jk}\ge 0 \quad\text{for}\quad \{(j,k): j\in J~\text{and}~k\in A_j\}, \end{aligned} \tag{2.17} \]

where $\phi_0^j(t)$ indicates the intercept function, i.e., $\phi_0^j(t)=1$, so that $c_{j0}$ represents the vertical location of the monotone part of $\beta_j(t)$; the dimensionality $d_j$ of the splines is determined by the number of knots chosen to approximate $\beta_j$; the set $J$ from (2.2) collects the indices of the coefficients constrained in $H_0$; and the set $A_j$ is defined as $\big\{k\in\{1,\ldots,d_j\}: {\phi_k^j}'(t)>0~\text{for}~t\in\mathcal{I}_j\big\}$.

To obtain the shape-restricted $\hat{\beta}_j(t)$, we first write the regression problem associated with (2.17) as below. For notational simplicity, we elaborate on the case of regular evaluation grids $T_1,\ldots,T_L$ for $i=1,\ldots,n$ and $d_j=d$, but having different $T_{i,\ell}$ or $d_j$ does not affect the methodological developments. Let $\mathrm{vec}\big([\boldsymbol{c}_j]_{j=1}^{p}\big)=(\boldsymbol{c}_1^{\top},\ldots,\boldsymbol{c}_p^{\top})^{\top}$ for $\boldsymbol{c}_j=(c_{j0},c_{j1},\ldots,c_{jd})^{\top}$; then

\[ \mathbf{Y}_i=(\mathbf{X}_i^{\top}\otimes\Phi)\cdot\mathrm{vec}\big([\boldsymbol{c}_j]_{j=1}^{p}\big)+\boldsymbol{\epsilon}_i, \quad i=1,\ldots,n, \tag{2.18} \]

under $c_{jk}\ge 0$ for the $(j,k)$ specified in (2.17), where $\Phi$ denotes the $L\times(d+1)$ basis matrix whose $\ell$-th row is $\big(1,\phi_1(T_\ell),\ldots,\phi_d(T_\ell)\big)$ for $\ell=1,\ldots,L$. The coefficient estimation then amounts to finding the projection of the $nL$-dimensional vector $(\mathbf{Y}_1^{\top},\ldots,\mathbf{Y}_n^{\top})^{\top}$ onto the space spanned by the columns of $(\mathbf{X}\otimes\Phi)$ with the corresponding constraints on $\boldsymbol{c}_j$, where $\mathbf{X}$ denotes the $(n\times p)$ design matrix. Letting $\boldsymbol{\sigma}^h\in\mathbb{R}^{nL}$, $h=1,\ldots,p(d+1)$, denote the column vectors of $(\mathbf{X}\otimes\Phi)$ and $\mathcal{L}(\cdot)$ the linear space spanned by a given set of vectors, we separate out $\boldsymbol{\sigma}^h$, $h\in H_1$, the column vectors associated with basis coefficients carrying no non-negativity constraint, and define $\mathcal{V}=\mathcal{L}\big(\{\boldsymbol{\sigma}^h: h\in H_1\}\big)$ as the linear space spanned by them. We then obtain a set of generating vectors, denoted $\boldsymbol{\delta}^h$ for $h\in H_2$ with $H_2=\{1,\ldots,p(d+1)\}\backslash H_1$, that are orthogonal to $\mathcal{V}$; that is, $\boldsymbol{\delta}^h=\boldsymbol{\sigma}^h-\Pi(\boldsymbol{\sigma}^h|\mathcal{V})$ for $h\in H_2$, where $\Pi$ denotes a projection operator, such as the Gram-Schmidt process. Extending [18], we can characterize the constrained set as

\[ \mathcal{C}=\Big\{\boldsymbol{\mu}\in\mathbb{R}^{nL}: \boldsymbol{\mu}=\sum_{h\in H_1}b_h\boldsymbol{\sigma}^h+\sum_{h\in H_2}b_h\boldsymbol{\delta}^h, ~\text{where}~ b_h\ge 0~\text{for}~h\in H_2\Big\}. \tag{2.19} \]

The projection of $(\mathbf{Y}_1^{\top},\ldots,\mathbf{Y}_n^{\top})^{\top}$ onto $\mathcal{C}$ can then be accomplished by projecting it onto the set of nonnegative linear combinations of $\boldsymbol{\delta}^h$, $h\in H_2$, called the constraint cone, and onto $\mathcal{V}$, separately. We refer readers interested in a more comprehensive description under the nonparametric regression setting to Section 2 of [18]. Because the constraint set $\mathcal{C}$ is a closed convex polyhedral cone in $\mathbb{R}^{nL}$, the projection is unique, and we employ the cone projection algorithm of [19] to find the solution, available as the functions qprog and coneA in the R package coneproj. We note that the unconstrained estimates $\tilde{\beta}_j(t)$ can be calculated from the least squares estimates of $\boldsymbol{c}_j$ under (2.18).
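As an illustration of (2.17)-(2.18) for regular grids, the constrained spline fit can be recast as a bounded least-squares problem, since the side conditions are simple non-negativity bounds on selected coefficients. The sketch below substitutes scipy's lsq_linear for the cone projection algorithm of [19]; it is a sketch under these assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import lsq_linear

def constrained_spline_fit(X, Y, Phi, constrained):
    """X: (n, p) design; Y: (n, L) responses on a common grid; Phi: (L, d+1)
    basis matrix with rows (1, phi_1(T_l), ..., phi_d(T_l)); constrained: set
    of (j, k) pairs whose coefficient c_{jk} must be nonnegative."""
    n, p = X.shape
    d1 = Phi.shape[1]
    design = np.kron(X, Phi)            # block row i equals X_i^T ⊗ Phi, as in (2.18)
    lb = np.full(p * d1, -np.inf)
    for j, k in constrained:            # non-negativity on the selected c_{jk}
        lb[j * d1 + k] = 0.0
    res = lsq_linear(design, Y.ravel(), bounds=(lb, np.inf))
    return res.x.reshape(p, d1)         # row j holds the basis coefficients c_j
```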

Next, we investigate the rates of convergence of the estimated functional coefficients under the constraints. Let $K_{jn}$ denote the number of knots used to fit $\beta_j(t)$, growing with $n$; let $q$ be the order of the spline; and let $\|a\|_{L_2}$ be the $L_2$ norm of a square-integrable function $a(t)$. Under Conditions (S1)-(S6), deferred to Supplementary Material S.2, and $\lim_{n\to\infty}K_n\log K_n/n=0$ for $K_n=\max_{1\le j\le p}K_{jn}$, Theorem 2 of [14] and Theorem 6.25 of [26] imply the consistency of the unconstrained $\widetilde{\boldsymbol{\beta}}(t)=\big(\tilde{\beta}_1(t),\ldots,\tilde{\beta}_p(t)\big)^{\top}$ in the sense that $\|\tilde{\beta}_j-\beta_j\|_{L_2}^2=O_p(K_n n^{-1}+K_n^{-2q})$, $j=1,\ldots,p$. Next, we show the consistency of the constrained estimators by adding condition (S7), specified in the Supplementary Material.

Theorem 2.3.1.

When the shape constraints assigned to $\beta_j$ in $H_0$ are true, $K_{jn}=O(n^{1/(2q+1)})$, and conditions (S1)-(S7) are satisfied, the constrained estimator $\widehat{\boldsymbol{\beta}}(t)=\big(\hat{\beta}_1(t),\ldots,\hat{\beta}_p(t)\big)^{\top}$ attains the same rate as the unconstrained estimator:

\[ \|\hat{\beta}_j-\beta_j\|_{L_2}^2=O_p(K_n n^{-1}+K_n^{-2q}), \quad j=1,\ldots,p, \]

where the minimum rate $O_p(n^{-2q/(2q+1)})$ is achieved when $K_n=O(n^{1/(2q+1)})$.

Remark 3 (Convexity shape constraints).

When a convexity constraint is considered in (2.2) instead of monotonicity, we employ $C$-splines and express the coefficient function in the objective function (2.17) as $\beta_j(T_{i,\ell})=c_{j0,1}\phi_{0,1}(T_{i,\ell})+c_{j0,2}\phi_{0,2}(T_{i,\ell})+\sum_{k=1}^{d_j}c_{jk}\phi_k^j(T_{i,\ell})$, where $\phi_{0,1}(t)$ denotes the intercept function, $\phi_{0,2}(t)$ is the identity function, i.e., $\phi_{0,2}(T_{i,\ell})=T_{i,\ell}$, and $\phi_k^j(t)$ denotes the $C$-splines under a given sequence of knots. The first two terms, with no constraints on $c_{j0,1}$ and $c_{j0,2}$, determine the first-order behavior of the functional coefficients. We then assign the constraints $c_{jk}\ge 0$ for $(j,k)$ with $j\in J$ and $k\in A_j$, where the set $A_j$ is defined as $\{k\in\{1,\ldots,d_j\}: {\phi_k^j}''(t)>0~\text{for}~t\in\mathcal{I}_j\}$. The non-negativity constraints on the basis coefficients associated with $\phi_k^j(t)$ having positive second derivatives within $\mathcal{I}_j$ ensure the convexity of the functional estimates over $\mathcal{I}_j$. For a concavity constraint, we apply the same framework but with non-positivity constraints on the chosen set of $c_{jk}$. In this way, we can consider a general $H_0$ with various types of shape constraints on $\beta_j(t)$ over $\mathcal{I}_j$ for $j\in J$.

2.4 Test procedure

We employ a resampling method to assess the statistical significance of the observed $D_n$ given a random sample $\mathcal{X}_n$. We note that resampling functional observations, say $\widetilde{\mathbf{Y}}_i=\big(\widetilde{Y}_i(T_{i,1}),\ldots,\widetilde{Y}_i(T_{i,L_i})\big)^{\top}$, is not straightforward because $\widetilde{Y}_i(T_{i,1}),\ldots,\widetilde{Y}_i(T_{i,L_i})$ should inherit the joint distribution of $Y_i(T_{i,1}),\ldots,Y_i(T_{i,L_i})$ with within-subject heteroscedasticity. In the literature, the wild bootstrap method [33, 13, 16], which effectively handles heteroscedasticity, has been extensively investigated, and there have been many advances in handling dependent data [3, 7, 29, 11]. In this study, we use the wild bootstrap procedure described in Algorithm 1, which can be regarded as a wild bootstrap for clustered data [3, 7].

Algorithm 1 Bootstrap procedure under the partial shape constraints
1: For a given random sample $\mathcal{X}_n=\{(\mathbf{X}_i,\mathbf{Y}_i): i=1,\ldots,n\}$, find the unconstrained and shape-constrained estimators $\widetilde{\boldsymbol{\beta}}$ and $\widehat{\boldsymbol{\beta}}$, respectively.
2: Set a bootstrap size $B\ge 1$.
3: Compute functional residuals;
4: for $i=1$ to $n$ do
5:     for $\ell=1$ to $L_i$ do
6:         $\tilde{e}_{i,\ell}\leftarrow Y_{i,\ell}-\mathbf{X}_i^{\top}\widetilde{\boldsymbol{\beta}}(T_{i,\ell})$.
7:     end for
8: end for
9: Generate $B$ bootstrap samples under the null hypothesis;
10: for $b=1$ to $B$ do
11:     for $i=1$ to $n$ do
12:         for $\ell=1$ to $L_i$ do
13:             Sample $v_{i,\ell}^{(b)}\in\big\{-\frac{\sqrt{5}-1}{2},\frac{\sqrt{5}+1}{2}\big\}$ with probabilities $\big(\frac{\sqrt{5}+1}{2\sqrt{5}},\frac{\sqrt{5}-1}{2\sqrt{5}}\big)$.
14:             $\widetilde{Y}_{i,\ell}^{(b)}\leftarrow\mathbf{X}_i^{\top}\widehat{\boldsymbol{\beta}}(T_{i,\ell})+v_{i,\ell}^{(b)}\tilde{e}_{i,\ell}$.
15:         end for
16:     end for
17:     $\mathcal{X}_n^{(b)}\leftarrow\{(\widetilde{\mathbf{Y}}_i^{(b)},\mathbf{X}_i): i=1,\ldots,n\}$.
18:     Find the shape-constrained $\widehat{\boldsymbol{\beta}}^{(b)}$ and unconstrained $\widetilde{\boldsymbol{\beta}}^{(b)}$ based on $\mathcal{X}_n^{(b)}$.
19:     Compute $D_n^{(b)}$ based on $\widehat{\boldsymbol{\beta}}^{(b)}$, $\widetilde{\boldsymbol{\beta}}^{(b)}$, and $\mathcal{X}_n^{(b)}$.
20: end for

Once the $B$ bootstrap samples $\mathcal{X}_n^{(1)},\ldots,\mathcal{X}_n^{(B)}$ are independently generated, we calculate the shape-constrained and unconstrained estimates $\widehat{\boldsymbol{\beta}}^{(b)}$ and $\widetilde{\boldsymbol{\beta}}^{(b)}$ and the test statistic $D_n^{(b)}$ for each $b=1,\ldots,B$. These bootstrap test statistics are compared to the original test statistic $D_n$ to determine the bootstrap $p$-value, $p_n=B^{-1}\sum_{b=1}^{B}\mathbb{I}(D_n^{(b)}>D_n)$. We then reject the null hypothesis $H_0$ if $p_n<\alpha$ and retain $H_0$ otherwise, at the significance level $\alpha\in(0,1)$. Our numerical study (Section 3) demonstrates that the proposed test procedure is consistent in the sense that the type I error rate meets the specified significance level and the power of the test increases with the sample size.
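The sketch below mirrors Algorithm 1 end to end, assuming `fit_constrained` and `fit_unconstrained` return coefficient estimates as callables and `test_statistic` computes $D_n$ as in (2.3); all three are placeholders for either the kernel or the spline method described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_pvalue(X, T_list, Y_list, fit_constrained, fit_unconstrained,
                     test_statistic, B=200):
    beta_hat = fit_constrained(X, T_list, Y_list)      # estimate under H_0
    beta_tilde = fit_unconstrained(X, T_list, Y_list)  # unconstrained estimate
    D_n = test_statistic(X, beta_hat, beta_tilde)
    # two-point wild bootstrap weights with mean 0 and variance 1 (Algorithm 1, step 13)
    v_vals = np.array([-(np.sqrt(5) - 1) / 2, (np.sqrt(5) + 1) / 2])
    v_prob = np.array([(np.sqrt(5) + 1) / (2 * np.sqrt(5)),
                       (np.sqrt(5) - 1) / (2 * np.sqrt(5))])
    count = 0
    for _ in range(B):
        Y_boot = []
        for Ti, Yi, Xi in zip(T_list, Y_list, X):
            resid = Yi - Xi @ beta_tilde(Ti)              # unconstrained residuals
            v = rng.choice(v_vals, size=len(Ti), p=v_prob)
            Y_boot.append(Xi @ beta_hat(Ti) + v * resid)  # resample under H_0
        bh = fit_constrained(X, T_list, Y_boot)
        bt = fit_unconstrained(X, T_list, Y_boot)
        count += test_statistic(X, bh, bt) > D_n
    return count / B                                      # bootstrap p-value p_n
```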

3 Simulation Study

We demonstrate the finite-sample performance of the proposed methods with numerical simulations. As described in Section 2, functional observations are assumed to be only partially available in our simulation study. Specifically, we first generate subject-specific evaluation points $\mathbf{T}_i=(T_{i,1},\ldots,T_{i,L_i})$ independently drawn from $\mathrm{Beta}(1,1.25)$, a right-skewed distribution on $[0,1]$. Here, $L_i$ is a uniform random integer on $[5,15]$, representing a bounded number of sparse and irregular observations. Then, the longitudinal observations $\mathbf{Y}_i=\big(Y_i(T_{i,1}),\ldots,Y_i(T_{i,L_i})\big)^{\top}$ of each functional response $Y_i$ are generated by the model (2.1) with the four functional coefficients $\{\beta_j: j=1,\ldots,4\}$ depicted in Figure 3(a), where the random vector $\mathbf{X}_i=(X_{i,1},\ldots,X_{i,4})^{\top}$ of scalar covariates is given by $X_{i,j}=(U_{i,j}+0.5\,U_{i,5})/(1+0.5)$ with independent uniform random variables $U_{i,1},\ldots,U_{i,5}$ on $[0,1]$. This makes the four components $X_{i,1},\ldots,X_{i,4}$ correlated. Moreover, to introduce within-subject dependency of the longitudinal observations, we set the functional noise as $\varepsilon_i(t)=\sum_{k=1}^{50}(\xi_{i,k}/\sqrt{k})\sqrt{2}\sin(2k\pi t)+\epsilon_i(t)$, where $\boldsymbol{\xi}_i=(\xi_{i,1},\ldots,\xi_{i,50})^{\top}$ has a multivariate normal distribution with mean zero and $\mathrm{Cov}(\xi_{i,k},\xi_{i,k'})=0.5^{|k-k'|}$, and $\{\epsilon_i(t): t\in[0,1]\}$ is a Gaussian white noise process with mean zero and $\mathrm{Var}(\epsilon_i(t))=0.1^2$. We thus obtain a random sample $\mathcal{X}_n=\{(\mathbf{X}_i,\mathbf{Y}_i): i=1,\ldots,n\}$ of size $n\in\{100,500\}$.
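For concreteness, a sketch of this data-generating mechanism is given below; `beta_fns` is a placeholder list of the four coefficient functions (the exact shapes of Figure 3(a) are not reproduced here).

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_sample(n, beta_fns):
    """Generate (X, T_list, Y_list) following the simulation design of Section 3."""
    U = rng.uniform(size=(n, 5))
    X = (U[:, :4] + 0.5 * U[:, [4]]) / 1.5            # correlated scalar covariates
    ks = np.arange(1, 51)
    Sigma = 0.5 ** np.abs(ks[:, None] - ks[None, :])  # Cov(xi_k, xi_k') = 0.5^|k-k'|
    T_list, Y_list = [], []
    for i in range(n):
        L = rng.integers(5, 16)                       # uniform on {5, ..., 15}
        T = rng.beta(1.0, 1.25, size=L)               # right-skewed design points
        xi = rng.multivariate_normal(np.zeros(50), Sigma)
        eps = (xi / np.sqrt(ks)) @ (np.sqrt(2) * np.sin(2 * np.pi * np.outer(ks, T)))
        eps += rng.normal(scale=0.1, size=L)          # Gaussian white-noise component
        signal = sum(X[i, j] * beta_fns[j](T) for j in range(4))
        T_list.append(T)
        Y_list.append(signal + eps)
    return X, T_list, Y_list
```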

Figure 3: True regression coefficient functions associated with the function-on-scalar regression model (2.1) are depicted in the left panel (a). The right panel (b) illustrates empirical sample paths of 50 functional responses $\mathbf{Y}_i$ generated with right-skewed evaluation points $\mathbf{T}_i$. We consider testing monotone-increasing hypotheses of $\beta_1$ and $\beta_3$ on sub-intervals of inferential interest.

We consider testing a composite null hypothesis

\[ H_0: \beta_1 \text{ and } \beta_3 \text{ are monotone increasing on } \mathcal{I}, \tag{3.1} \]

where $\mathcal{I}\subset[0,1]$ is a pre-specified sub-interval to which the partial shape constraints of interest apply. This setup is intended to mimic a common situation in longitudinal studies where patients drop out of the study or transfer to other clinics. For example, as the vertical rugs in Figure 3(b) show that the evaluation points are right-skewed, fewer observations are available near $t=1$. Still, verifying partial shape constraints on the regression coefficient functions near $t=1$ can be of inferential interest. For a systematic assessment of the numerical experiments, we consider ten simulation scenarios for the sub-interval: $\mathcal{I}=[0,0.5]$, $[0.1,0.5]$, $[0.2,0.5]$, $[0.3,0.5]$, $[0.4,0.5]$, $[0.5,1]$, $[0.6,1]$, $[0.7,1]$, $[0.8,1]$, and $[0.9,1]$. From the graphical illustration of the true regression coefficient functions in Figure 3(a), we note that the null hypothesis (3.1) is true if $\mathcal{I}\subset[0,0.5]$ and false otherwise.

To demonstrate the consistency of the proposed method across the different scenarios of $\mathcal{I}$, we evaluate the power of the test by the Monte Carlo approximation

\[ \gamma_\alpha(\mathcal{I})=\frac{1}{R}\sum_{r=1}^{R}\mathbb{I}\big(p_n^{(r)}<\alpha\big) \tag{3.2} \]

at the significance level $\alpha\in(0,1)$, where $p_n^{(r)}$ is the $p$-value obtained in the $r$-th Monte Carlo repetition. Since $\gamma_\alpha(\mathcal{I})$ gives the empirical level of the test under the null hypothesis (2.2), we can validate how well the proposed test keeps the specified test level for different significance levels. Under the general alternative, $\gamma_\alpha(\mathcal{I})$ corresponds to the empirical power of the test, and we can examine how fast $\gamma_\alpha(\mathcal{I})$ approaches 1 as the sample size increases.

Besides the consistency of the partially shape-constrained inference, we evaluate the consistency of estimation with the integrated squared bias (ISB) and the integrated variance (IVar),

\[ \mathrm{ISB}(\mathcal{I})=\frac{1}{|\mathcal{I}|}\int_{\mathcal{I}}\Big\|\overline{\widehat{\boldsymbol{\beta}}}(t)-\boldsymbol{\beta}(t)\Big\|^2\,\mathrm{d}t, \quad \mathrm{IVar}(\mathcal{I})=\frac{1}{R}\sum_{r=1}^{R}\frac{1}{|\mathcal{I}|}\int_{\mathcal{I}}\Big\|\widehat{\boldsymbol{\beta}}^{(r)}(t)-\overline{\widehat{\boldsymbol{\beta}}}(t)\Big\|^2\,\mathrm{d}t, \tag{3.3} \]

where $\overline{\widehat{\boldsymbol{\beta}}}(t)=R^{-1}\sum_{r=1}^{R}\widehat{\boldsymbol{\beta}}^{(r)}(t)$ is the average of the shape-constrained estimates obtained from the repeated Monte Carlo experiments. In (3.3), we normalize the ISB and IVar by $|\mathcal{I}|$ to ease comparison of the numerical performance across sub-intervals of different lengths. Combining (3.2) and (3.3), we can assess the sensitivity and specificity of the proposed test procedure with respect to the location and length of the sub-interval. We mainly report the simulation results obtained with $B=200$ and $R=200$, but our background simulation study gave the same lessons with larger $B$ and $R$. We refer the readers to Section S.3 of the Supplementary Materials for implementation details, including the bandwidth and knot selection for the shape-constrained kernel smoothing and regression spline, respectively.
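A short sketch of the criteria in (3.3) is given below, assuming the $R$ constrained estimates have been evaluated on a common equally spaced grid; the array shapes are illustrative.

```python
import numpy as np

def isb_ivar(beta_reps, beta_true, t, a, b):
    """beta_reps: (R, p, G) constrained estimates over a grid t of length G;
    beta_true: (p, G) true coefficients; [a, b]: the sub-interval I."""
    mask = (t >= a) & (t <= b)
    tt = t[mask]
    bar = beta_reps.mean(axis=0)                      # pointwise Monte Carlo average
    isb = np.trapz(((bar - beta_true)[:, mask] ** 2).sum(axis=0), tt) / (b - a)
    ivar = np.mean([np.trapz(((br - bar)[:, mask] ** 2).sum(axis=0), tt)
                    for br in beta_reps]) / (b - a)   # normalized by |I| = b - a
    return isb, ivar
```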

We also consider a sub-cohort analysis as a comparative study, shedding light on another essential feature of the proposed method. As mentioned in the Introduction, one may argue that it is more appropriate to simply treat (3.1) as the global shape constraints "$\beta_1|_{\mathcal{I}}$ and $\beta_3|_{\mathcal{I}}$ are globally monotone increasing" under the model (2.1) restricted to $\mathcal{I}$. To address this concern, we conduct the sub-cohort analysis using the same inferential procedure as the proposed method but utilizing only the partial data $\mathcal{X}_n|_{\mathcal{I}}=\{(\mathbf{Y}_i|_{\mathcal{I}},\mathbf{X}_i): i=1,\ldots,n\}$, where $\mathbf{Y}_i|_{\mathcal{I}}=\{Y_{i,\ell}: T_{i,\ell}\in\mathcal{I},\,\ell=1,\ldots,L_i\}$. In contrast to the sub-cohort analysis, we call our proposed method the "full cohort" analysis since it uses all available observations in $\mathcal{X}_n$ for estimation regardless of $\mathcal{I}$. As seen below, our simulation results indicate that the sub-cohort analysis suffers from a significant loss of information: the farther $\mathcal{I}$ lies in the second half of the domain, the smaller the statistical power one may expect. The situation becomes much worse depending on the length of $\mathcal{I}$ and the sparsity of functional evaluations on it. Through our simulation study, we also demonstrate that the proposed test procedure is robust against the location of the sub-interval.

Table 1: Simulation results for the partially shape-constrained estimates corresponding to the kernel and spline methods. The estimation and test performance are evaluated with the integrated squared bias (ISB), the integrated variance (IVar), the type I error rate, and the power of the test, defined in (3.2) and (3.3). The null hypothesis is $H_0$: $\beta_1$ and $\beta_3$ are monotone increasing on $\mathcal{I}$.

                                                        Kernel least squares           Spline regression
Sample size  Dataset         Criterion                I=[0,0.5]     I=[0.4,0.5]     I=[0,0.5]     I=[0.4,0.5]
n=100        Full data       (γ_0.05, γ_0.1)          (0.06, 0.11)  (0.05, 0.11)    (0.04, 0.07)  (0.08, 0.11)
                             ISB                      0.0302        0.0073          0.0101        0.0033
                             IVar                     0.4034        0.5346          0.3737        0.2054
             Partial data    (γ_0.05, γ_0.1)          (0.06, 0.11)  (0.09, 0.19)    (0.05, 0.11)  (0.06, 0.12)
             restricted      ISB                      0.0517        0.0768          0.0176        0.1156
             to I            IVar                     0.5335        1.5028          0.5758        1.6896
n=500        Full data       (γ_0.05, γ_0.1)          (0.05, 0.10)  (0.05, 0.09)    (0.04, 0.08)  (0.05, 0.08)
                             ISB                      0.0102        0.0018          0.0017        0.0016
                             IVar                     0.1010        0.1135          0.0946        0.0572
             Partial data    (γ_0.05, γ_0.1)          (0.04, 0.10)  (0.07, 0.13)    (0.05, 0.11)  (0.04, 0.08)
             restricted      ISB                      0.0129        0.0110          0.0022        0.0109
             to I            IVar                     0.1497        0.3229          0.1168        0.2798

We report the simulation results in Table 1 and Figure 4, highlighting the main lessons from the power analysis and estimation performance. Additional simulation results are provided in Section S.7 of the Supplementary Materials. The proposed partially shape-constrained inference (full cohort analysis) outperforms the sub-cohort analysis with global shape constraints. Specifically, Table 1 shows that both the kernel and spline methods yield consistent estimates in the sense that ISB and IVar decrease as the sample size increases, which is well aligned with the large-sample properties investigated in Section 2. Figure 4 shows that the proposed test procedure consistently meets the specified level of the test ($\alpha=0.05, 0.1$) across the different scenarios with sub-interval lengths varying from 0.1 to 0.5.

We note that the proposed method keeps the significance level under the null hypothesis and generally attains greater power under the general alternative, while the sub-cohort analysis often violates the pre-specified level of the test under the null hypothesis. For the sub-interval $\mathcal{I}=[0.9,1]$, the sub-cohort analysis appears more sensitive to the general alternative than the full cohort analysis. However, as shown in Figure 1(b) of the Supplementary Materials, the shape-constrained estimates of the sub-cohort analysis suffer from significant bias due to the small sample size of $\mathcal{X}_n|_{\mathcal{I}=[0.9,1]}$. A similar issue arises with the sub-interval $\mathcal{I}=[0.4,0.5]$, where the sub-cohort analysis exhibits a higher type I error rate than the full cohort analysis. Therefore, we do not recommend the sub-cohort analysis for verifying partial shape constraints over short sub-intervals.

Figure 4: The power analysis of the partially shape-constrained kernel method (2.9) subject to the shape constraints (3.1). The sensitivity and specificity are evaluated at the 5% significance level under different scenarios for the location and length of the sub-interval: $\mathcal{I}=[0,0.5]$, $[0.1,0.5]$, $[0.2,0.5]$, $[0.3,0.5]$, $[0.4,0.5]$, $[0.5,1]$, $[0.6,1]$, $[0.7,1]$, $[0.8,1]$, and $[0.9,1]$.

4 Real Data Applications

We illustrate the application of shape-constrained inference to two datasets from clinical trials and demonstrate how it characterizes treatment efficacy over time. The proposed tools help summarize dynamic efficacy patterns so that practitioners can better understand when the maximum treatment effect is achieved and decide on appropriate treatment frequencies. Compared to existing studies that draw such conclusions from visual inspection of estimated regression coefficient functions or from inference over the entire domain, our method provides statistical evidence for refined conclusions on any sub-intervals of interest. Although we illustrate examples from clinical trials, the proposed tool can be applied to any field with similar interests.

4.1 Application 1: Cervical Dystonia Dataset

Cervical dystonia is a painful neurological condition that causes the head to twist or turn to one side because of involuntary muscle contractions in the neck. Although this rare disorder can occur at any age, it most often occurs in middle-aged people, women more often than men. The dataset comes from a randomized placebo-controlled trial of botulinum toxin (botox) B, in which 109 participants across 9 sites were assigned to placebo (36 subjects), 5,000 units of botox B (36 subjects), or 10,000 units of botox B (37 subjects), injected into the affected muscle to partially paralyze it and make it relax. The response variable is the score on the Toronto Western Spasmodic Torticollis Rating Scale (TWSTRS), measuring the severity, pain, and disability of cervical dystonia, where higher scores mean more impairment on a 0-87 scale. The TWSTRS was administered at baseline (week 0) and at weeks 2, 4, 8, 12, and 16 thereafter. The study was originally conducted by [2], and 96.5% of subjects were followed up at the designated weeks, on average. In addition, the dataset contains information on the age and sex of the subjects. The data are available in R through the package 'medicaldata' (https://github.com/higgi13425/medicaldata). While [2] and [8] focus on demonstrating the efficacy of botox B for cervical dystonia during the weeks of the clinical trial under longitudinal models, our application aims to assess the pattern of drug efficacy over the weeks by applying shape-constrained inference on various sub-intervals and to investigate when the maximum efficacy is achieved. To do this, we introduce the indicator variable $D_i$, where $D_i=1$ if the drug is assigned to the $i$th subject and $D_i=0$ otherwise. We then consider the following regression model:

\[ Y_i(t)=\beta_0(t)+\beta_1(t)D_i+\beta_2(t)\,\texttt{Age}_i+\beta_3(t)\,\texttt{Sex}_i+\epsilon_i(t) \quad (t\in[0,16]). \tag{4.1} \]
Table 2: $p$-values from the Cervical Dystonia data under the corresponding shape-constrained null hypotheses with given sub-intervals, using kernel-based and spline-based estimation.

Hypothesis                                            Kernel-based   Spline-based
$\beta_1(t)$ is monotone decreasing on [0,4]          0.36           0.45
$\beta_1(t)$ is monotone increasing on [0,4]          0.03           0.02
$\beta_1(t)$ is monotone decreasing on [4,16]         0.07           0.01
$\beta_1(t)$ is monotone increasing on [4,16]         0.84           0.48

Table 2 contains the results of the monotonicity tests, $H_0$: $\beta_1(t)$ is monotone decreasing (increasing), over the two sub-intervals $\mathcal{I}=[0,4]$ and $[4,16]$, from the kernel- and spline-based tests using a bootstrap size of $B=500$. The spline-based approach adopts piece-wise quadratic $I$-spline basis functions with one internal knot located at $t=8$, determined by cross-validation as detailed in Supplementary Material S.3. To estimate $\beta_1(t)$ with a shape restriction over $\mathcal{I}=[0,4]$ or $[4,16]$, we place one additional knot at $t=4$, only for $\beta_1(t)$, to represent the desired shape within the given boundaries. The kernel-based results are obtained from estimates based on the optimal bandwidth $h\approx 3.2$ ($=0.2\times 16$), with the detailed selection criteria discussed in the Supplementary Material. We then observe that both the kernel- and spline-based tests conclude a significant decrease in $\beta_1(t)$ from weeks 0 to 4 by rejecting $H_0$: $\beta_1(t)$ is monotone increasing on $[0,4]$, but declare an increasing pattern from weeks 4 to 16 by rejecting $H_0$: $\beta_1(t)$ is monotone decreasing on $[4,16]$ at the significance level 0.1. This conclusion implies improving efficacy of botox B during the first 4 weeks but diminishing effectiveness thereafter. The finding is all the more convincing given that the conclusions agree regardless of the choice of shape constraints in the null hypotheses, either monotone decreasing or increasing, and of the estimation method.

Figure 5: Estimated coefficient functions from Cervical Dystonia data. Solid lines are constrained estimates with a decreasing restriction over weeks 0 to 4 and an increasing restriction over weeks 4 to 16 on the treatment effect, and dotted lines are unconstrained spline estimates; knot locations are indicated by the symbol ‘×’.

Figure 5 displays the estimated functional coefficients under spline estimation with the chosen optimal selection of knots in each panel. Following the inferential conclusion on the treatment effect, $\beta_1(t)$ is estimated under the decreasing constraint over weeks 0 to 4 and under the increasing constraint over weeks 4 to 16. The unconstrained and constrained models are fitted using the same knots, except that one extra knot at week 4 is added for $\beta_1(t)$ because the shape constraint changes at week 4. We observe a positive coefficient at week 0, presumably due to a slight imbalance from the random assignment in the trial: the mean response scores at baseline (week 0) are 43.5 and 46.7 for the placebo and treatment groups, respectively. Although patients in the treatment group show slightly higher scores at the beginning, the decreasing trend over the first four weeks, i.e., improving efficacy, is clear from the estimated function. After week 4, the degree of effectiveness weakens, and by week 16 the treatment effect vanishes, returning to the level observed at week 0. While the shape constraints yield a precise fit for $\beta_1(t)$, the remaining regression coefficient estimates for age and sex, $\beta_2(t)$ and $\beta_3(t)$, are similar with or without constraints on $\beta_1(t)$. In addition to the conclusion on overall treatment efficacy as in [2], we obtain a comprehensive summary of treatment efficacy based on refined conclusions over the sub-intervals of interest, which further sharpens the estimation.

4.2 Application 2: Mental Health Schizophrenia Collaborative Study

We next apply the proposed method to the data from the National Institute of Mental Health Schizophrenia Collaborative Study, where 437 patients were randomized to receive either a placebo or an anti-psychotic drug, followed by longitudinal monitoring. [1] and [12] analyzed these data to assess the efficacy of the drug using Item 79, ‘Severity of Illness,’ of the Inpatient Multidimensional Psychiatric Scale (IMPS), ranging from 1 (normal, not ill at all) to 7 (among the most extremely ill), evaluated at weeks 0, 1, 3, and 6 under the protocol, with additional measurements for some patients at weeks 2, 4, and 5. There were 108 and 329 patients in the placebo and treatment groups, respectively, with on average 89.8% of measurements collected during the protocol weeks and 2.6% during other weeks. The data are available in R (package ‘mixor’). As in [1] and [12], we consider the regression model $Y_i(t)=\beta_0(t)+\beta_1(t)D_i+\epsilon_i(t)$, $t\in[0,6]$, where $Y_i(t)$ denotes the illness severity measurement of the $i$th subject at week $t$, and $\beta_1(t)$ represents the effect of drug treatment over weeks with the indicator variable $D_i$; $D_i=1$ for individuals assigned to the treatment group. As mentioned in the Introduction, although [1] and [12] demonstrated improving effectiveness over the entire domain, from week 0 to 6, by concluding that $\beta_1(t)$ is significantly decreasing using their respective inferential tools under shape-constrained estimation, we are further interested in the efficacy dynamics in later weeks and apply sub-interval tests, as in Section 4.1. This is motivated by Figure 4 of [1], which displays a clear decreasing trend in the estimated $\beta_1(t)$ with narrow confidence bands over weeks 0–3, but a flattening phase afterward with widening confidence intervals. By applying sub-interval tests over weeks 0 to 3 and weeks 3 to 6, we examine patterns in efficacy beyond overall drug effectiveness.

Table 3: $p$-values from Schizophrenia data under corresponding shape-constrained null hypotheses with given sub-intervals using kernel-based and spline-based estimation.

Hypothesis                                       p-value (kernel-based)   p-value (spline-based)
β1(t) is monotone decreasing on [0,3]                     0.71                     0.26
β1(t) is monotone increasing on [0,3]                     0.05                     0.00
β1(t) is monotone decreasing on [3,6]                     0.65                     0.15
β1(t) is monotone increasing on [3,6]                     0.62                     0.11

Table 3 provides $p$-values for $H_0$: $\beta_1(t)$ is monotone increasing (decreasing) over $\mathcal{I}$, with bootstrap size $B=500$. The optimal bandwidth for the kernel estimates is $h\approx 1.8$ $(=0.3\times 6)$, and for the spline estimates we adopt piecewise $I$-splines with two internal knots located at $t=1$ and $t=4$. Both kernel- and spline-based tests conclude that $\beta_1(t)$ is significantly decreasing during weeks 0–3, supported by $p$-values less than or equal to the significance level $\alpha=0.05$ for the null hypothesis of monotone increase. Consistently, the null hypothesis of monotone decrease over weeks 0–3 is not rejected by either test. However, the sub-interval tests over weeks 3–6 find insufficient statistical evidence to reject either the monotone increasing or the monotone decreasing trend. Thus, we conclude that $\beta_1(t)$ remains at a constant level over this period, implying that the maximum drug efficacy is achieved at week 3 and that this degree of effectiveness persists until the end of the experiment. The estimated drug effect is illustrated in Figure 1, together with the estimates with no constraints and with decreasing constraints over the entire domain. As discussed in Section 1, the latter fit corresponds to the inferential findings of [1] and [12].

References

  • [1] M. Ahkim, I. Gijbels, and A. Verhasselt. Shape testing in varying coefficient models. Test, 26:429–450, 2017.
  • [2] A. Brashear, M. Lew, D. Dykstra, C. Comella, S. Factor, R. Rodnitzky, R. Trosch, C. Singer, M. Brin, and J. Murray. Safety and efficacy of NeuroBloc (botulinum toxin type B) in type A–responsive cervical dystonia. Neurology, 53(7):1439–1439, 1999.
  • [3] A. C. Cameron, J. B. Gelbach, and D. L. Miller. Bootstrap-based improvements for inference with clustered errors. The Review of Economics and Statistics, 90(3):414–427, 2008.
  • [4] Z. Chen, Q. Gao, B. Fu, and H. Zhu. Monotone nonparametric regression for functional/longitudinal data. Statistica Sinica, 29(4):2229–2249, 2019.
  • [5] D. Chetverikov, A. Santos, and A. M. Shaikh. The econometrics of shape restrictions. Annual Review of Economics, 10(1):31–63, 2018.
  • [6] J. A. Cuesta-Albertos, E. García-Portugués, M. Febrero-Bande, and W. González-Manteiga. Goodness-of-fit tests for the functional linear model based on randomly projected empirical processes. The Annals of Statistics, 47(1):439–467, 2019.
  • [7] R. Davidson and E. Flachaire. The wild bootstrap, tamed at last. Journal of Econometrics, 146(1):162–169, 2008.
  • [8] C. S. Davis. Statistical Methods for the Analysis of Repeated Measurements. Springer Texts in Statistics. New York, NY: Springer Nature, 1 edition, 2003.
  • [9] L. Dümbgen. Shape-constrained statistical inference. Annual Review of Statistics and Its Application, 11, 2024.
  • [10] Y. Fan and E. Guerre. Multivariate local polynomial estimators: Uniform boundary properties and asymptotic linear representation. In Essays in Honor of Aman Ullah, volume 36, pages 489–537. Emerald Group Publishing Limited, 2016.
  • [11] S. Friedrich, F. Konietschke, and M. Pauly. A wild bootstrap approach for nonparametric repeated measurements. Computational Statistics & Data Analysis, 113:38–52, 2017.
  • [12] R. Ghosal, S. Ghosh, J. Urbanek, J. A. Schrack, and V. Zipunnikov. Shape-constrained estimation in functional regression with Bernstein polynomials. Computational Statistics & Data Analysis, 178:107614, 2023.
  • [13] W. Härdle and E. Mammen. Bootstrap methods in nonparametric regression. In Nonparametric functional estimation and related topics, pages 111–123. Springer, 1991.
  • [14] J. Z. Huang, C. O. Wu, and L. Zhou. Polynomial spline estimation and inference for varying coefficient models with longitudinal data. Statistica Sinica, 14(3):763–788, 2004.
  • [15] R. J. Kuczmarski. 2000 CDC growth charts for the United States: Methods and development. Number 246. Department of Health and Human Services, Centers for Disease Control and Prevention, National Center for Health Statistics, 2002.
  • [16] E. Mammen. Bootstrap and wild bootstrap for high dimensional linear models. The Annals of Statistics, 21(1):255–285, 1993.
  • [17] E. Mammen, J. S. Marron, B. A. Turlach, and M. P. Wand. A general projection framework for constrained smoothing. Statistical Science, 16(3):232–248, 2001.
  • [18] M. C. Meyer. Inference using shape-restricted regression splines. The Annals of Applied Statistics, 2(3):1013–1033, 2008.
  • [19] M. C. Meyer. A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics - Simulation and Computation, 42(5):1126–1139, 2013.
  • [20] M. C. Meyer. A framework for estimation and inference in generalized additive models with shape and order restrictions. Statistical Science, 33(4):595–614, 2018.
  • [21] J. S. Morris. Functional regression. Annual Review of Statistics and Its Application, 2:321–359, 2015.
  • [22] Y. Park, K. Han, and D. G. Simpson. Testing linear operator constraints in functional response regression with incomplete response functions. Electronic Journal of Statistics, 17(2):3143–3180, 2023.
  • [23] J. O. Ramsay. Monotone regression splines in action. Statistical Science, 3(4):425–441, 1988.
  • [24] P. T. Reiss, J. Goldsmith, H. L. Shang, and R. T. Ogden. Methods for scalar-on-function regression. International Statistical Review, 85(2):228–249, 2017.
  • [25] R. J. Samworth and B. Sen. Special issue on “Nonparametric inference under shape constraints”. Statistical Science, 33(4):469–472, 2018.
  • [26] L. Schumaker. Spline Functions: Basic Theory. Cambridge Mathematical Library. Cambridge University Press, 3 edition, 2007.
  • [27] E. Seijo and B. Sen. Nonparametric least squares estimation of a multivariate convex regression function. The Annals of Statistics, 39(3):1633–1657, 2011.
  • [28] B. Sen and M. Meyer. Testing against a linear regression model using ideas from shape-restricted estimation. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(2):423–448, 2017.
  • [29] X. Shao. The dependent wild bootstrap. Journal of the American Statistical Association, 105(489):218–235, 2010.
  • [30] Q. Shen and J. Faraway. An $F$ test for linear models with functional responses. Statistica Sinica, 14:1239–1257, 2004.
  • [31] E. Slutsky. Sulla teoria del bilancio del consumatore. Giornale degli economisti e rivista di statistica, pages 1–26, 1915.
  • [32] J.-L. Wang, J.-M. Chiou, and H.-G. Müller. Functional data analysis. Annual Review of Statistics and Its Application, 3:257–295, 2016.
  • [33] C.-F. J. Wu. Jackknife, bootstrap and other resampling methods in regression analysis. The Annals of Statistics, 14(4):1261–1295, 1986.
  • [34] D. Yagi, Y. Chen, A. L. Johnson, and T. Kuosmanen. Shape-constrained kernel-weighted least squares: Estimating production functions for Chilean manufacturing industries. Journal of Business & Economic Statistics, 38(1):43–54, 2020.
  • [35] J.-T. Zhang. Statistical inferences for linear models with functional responses. Statistica Sinica, 21:1431–1451, 2011.
  • [36] J.-T. Zhang and J. Chen. Statistical inferences for functional data. The Annals of Statistics, 35(3):1052–1079, 2007.

Supplementary materials:
Statistical inference on partially shape-constrained
function-on-scalar linear regression models

S.1 Technical conditions for kernel estimates

For the minimum and maximum measurement frequencies $\lambda_n=\min_{1\leq i\leq n}L_i$ and $\Lambda_n=\max_{1\leq i\leq n}L_i$, let $N=n\lambda_n$. We list the technical conditions for Proposition 2.2.1 and Theorem 2.2.2.

  1. (K0) The density $\pi$ of $T$ is continuously differentiable and strictly positive.

  2. (K1) $K$ is a symmetric probability density function that is Lipschitz continuous on $[-1,1]$.

  3. (K2) $E\|\varepsilon\|_{\infty}^{k}<\infty$ for some $k>2$.

  4. (K3) $h\asymp N^{-a}$ for some $0<a<1-2/k$.

  5. (K4) $\lambda_n=O(1)$ and $\Lambda_n=O\big((N^{b}/\log N)^{k/2}\big)$ for some $0<b<1-(2/k)-a$.

(K1)–(K3) are standard conditions for analyzing the large-sample properties of the kernel estimator with an exponential inequality. (K4) is a technical condition that ensures the uniform convergence of the unconstrained kernel least squares estimator even if the minimum number $\lambda_n$ of longitudinal observations for some subjects stays bounded. Still, we require $\Lambda_n$ to increase so that the collection $\{T_{i,\ell}:1\leq i\leq n,\ \ell=1,\ldots,L_i\}$ of sampling points from all subjects densely covers the entire domain $[0,1]$. For instance, with $k=4$ finite moments in (K2) and the bandwidth rate $a=1/5$ in (K3), condition (K4) permits any $0<b<3/10$.

S.2 Technical conditions for spline estimates

  1. (S1) Assume $\beta_j(t)\in C^{q}[0,1]$. For monotone $\beta_j$, quadratic splines are used ($q=3$), and for convex $\beta_j$, cubic splines are used ($q=4$).

  2. (S2) There is a positive constant $M_1$ such that $|X_j|\leq M_1$ for $j=1,\ldots,p$.

  3. (S3) There is a constant $M_3$ such that $E[\epsilon(t)^{2}]\leq M_3<\infty$ for $t\in[0,1]$.

  4. (S4) The knots $0=\xi_{j,1}<\xi_{j,2}<\cdots<\xi_{j,K_{jn}}=1$ have a bounded mesh ratio; that is, there is a constant $M_2\in(1,\infty)$, not depending on $n$ or $j$, such that $K_{jn}^{-1}M_2^{-1}\leq\xi_{j,k+1}-\xi_{j,k}\leq K_{jn}^{-1}M_2$ for $k=1,\ldots,K_{jn}-1$.

  5. (S5) $\limsup_{n}(\max_{j}K_{jn}/\min_{j}K_{jn})<\infty$.

  6. (S6) $\sum_{i=1}^{n}L_i^{-1}\asymp n$.

  7. (S7) For the monotone assumption, $\beta_j'(t)\geq\varepsilon>0$ on $[0,1]$, and for the convex assumption, $\beta_j''(t)\geq\varepsilon>0$ on $[0,1]$. For the combined increasing and convex assumption, the constraints hold strictly if $\beta_j''(t)\geq\varepsilon>0$ and $\beta_j'(t)\geq\varepsilon>0$ on $[0,1]$.

Condition (S4) requires the knots to be fairly evenly distributed over $[0,1]$ without bias in location; equidistant knots satisfy this condition. Similar growth rates of the number of knots $K_{jn}$ across $j$ are then imposed through (S5). Lastly, condition (S6) indicates that the number of observations per subject increases more slowly than the sample size $n$. In practice, (S6) is common, as clinical experiments typically have a bounded number of measurements per subject rather than a diverging one.

S.3 Implementation Details

Bandwidth selection for kernel estimation

To select bandwidths in a data-adaptive manner, we propose a $K$-fold cross-validation (CV) procedure that minimizes empirical prediction errors for individual trajectories. Specifically, let $\mathcal{P}=\bigcup_{k=1}^{K}\mathcal{P}_k$ be an arbitrary disjoint partition of the index set $\{1,\ldots,n\}$, where the partition sets have approximately equal sizes. Also, let $\widetilde{\boldsymbol{\beta}}\big(t;h,\mathcal{X}_{n,(-k)}\big)$ be the unconstrained estimate of $\boldsymbol{\beta}(t)$ in (2.4), defined with a bandwidth $h>0$ and the training sample $\mathcal{X}_{n,(-k)}=\{(\mathbf{Y}_i,\mathbf{X}_i):i\not\in\mathcal{P}_k\}$. We choose

$$\widetilde{h}=\operatorname*{arg\,min}_{h\in\mathcal{H}}\;\sum_{k=1}^{K}\sum_{i\in\mathcal{P}_k}\frac{1}{L_i}\sum_{\ell=1}^{L_i}\Big(Y_{i,\ell}-\mathbf{X}_i^{\top}\widetilde{\boldsymbol{\beta}}\big(T_{i,\ell};h,\mathcal{X}_{n,(-k)}\big)\Big)^{2}\tag{S.3.1}$$

and use $\widetilde{\boldsymbol{\beta}}(t)=\widetilde{\boldsymbol{\beta}}\big(t;\widetilde{h},\mathcal{X}_n\big)$ as the final estimate, where $\mathcal{H}$ is a grid of candidate bandwidths. We implement the constrained estimate $\widehat{\boldsymbol{\beta}}(t)=\widehat{\boldsymbol{\beta}}\big(t;\widehat{h},\mathcal{X}_n\big)$ similarly. In our numerical studies, we used the 5-fold CV procedure with $\mathcal{H}$ consisting of ten equally spaced points.
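For concreteness, a minimal R sketch of the selection rule (S.3.1) follows. Here dat and fit_unconstrained are hypothetical stand-ins: dat is a list of subjects, each carrying the design vector $\mathbf{X}_i$ and the longitudinal pairs $(T_{i,\ell},Y_{i,\ell})$, and fit_unconstrained(h, dat) returns the unconstrained estimate $\widetilde{\boldsymbol{\beta}}(\cdot;h,\cdot)$ as a function of $t$.

```r
# A minimal sketch of the K-fold CV bandwidth selection in (S.3.1), K = 5.
# 'dat' is a hypothetical list of subjects, each a list with vectors T, Y and
# a covariate vector X; 'fit_unconstrained(h, dat)' is a hypothetical stand-in
# for the unconstrained kernel least squares fit (2.4), returning a function
# beta_hat(t) that gives the p-vector of coefficient estimates at t.
cv_bandwidth <- function(dat, h_grid, K = 5) {
  n <- length(dat)
  folds <- split(sample(seq_len(n)), rep(seq_len(K), length.out = n))
  cv_err <- sapply(h_grid, function(h) {
    err <- 0
    for (k in seq_len(K)) {
      beta_hat <- fit_unconstrained(h, dat[-folds[[k]]])   # train without fold k
      for (i in folds[[k]]) {                              # validate on fold k
        pred <- vapply(dat[[i]]$T,
                       function(t) sum(dat[[i]]$X * beta_hat(t)), numeric(1))
        err <- err + mean((dat[[i]]$Y - pred)^2)           # (1/L_i) * sum of squares
      }
    }
    err
  })
  h_grid[which.min(cv_err)]
}
```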

Knot selection for regression spline functions

In spline regression, the number and location of knots can significantly impact the fit. Thus, in the simulation studies and real data examples, we use a $K$-fold CV error to choose the optimal number and locations of internal knots for the unconstrained estimates, with the sample partition and evaluation details as described in the preceding paragraph on bandwidth selection for kernel least squares. Although equally spaced knots are preferred in some of the literature, considering the right-skewed or unbalanced evaluation points $T_{i,\ell}$ in our numerical examples, we further consider a set of different knot sequences spanning the entire domain for each candidate number of internal knots, set to 1, 2, or 3 in our study. We assign the same optimal knot sequence to all $\widetilde{\beta}_j(t)$, although it could in principle vary over $j$. Once the optimal knot sequence for the unconstrained fit is determined, we add the lower and upper boundaries of the sub-interval $\mathcal{I}_j$ as additional internal knots for the estimation of the partially shape-constrained $\beta_j(t)$, $j\in J$, so that the desired shape is represented explicitly over $\mathcal{I}_j$; a short sketch is given below. For the simulation experiments, we perform the 5-fold CV-based knot selection at each iteration.
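A sketch of the corresponding knot search, under the same hypothetical data structure as above, might look as follows; cv_error_spline is a placeholder for the 5-fold CV prediction error of the unconstrained spline fit with a given internal-knot sequence, and a and b denote the boundaries of $\mathcal{I}_j$.

```r
# Sketch of CV-based knot selection for the unconstrained spline fit.
# 'cv_error_spline(dat, knots)' is a hypothetical helper returning the 5-fold
# CV prediction error of the fit with the given internal knots.
candidate_knots <- list(c(0.5),               # one internal knot
                        c(1/3, 2/3),          # two internal knots
                        c(0.25, 0.5, 0.75))   # three internal knots
cv_scores <- vapply(candidate_knots,
                    function(kn) cv_error_spline(dat, kn), numeric(1))
knots_opt <- candidate_knots[[which.min(cv_scores)]]
# For a partial shape constraint on beta_j over I_j = [a, b], augment its knots:
knots_j <- sort(unique(c(knots_opt, a, b)))   # 'a', 'b' are the boundaries of I_j
```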

S.4 Proofs of Proposition 2.2.1 and Theorem 2.2.2

Lemma S.4.1.

For $r\in\{0,1,2\}$, let
$$H_r(t)=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{L_i}\sum_{\ell=1}^{L_i}K_h(T_{i,\ell}-t)\Big(\frac{T_{i,\ell}-t}{h}\Big)^{r}\eta_i(T_{i,\ell})\quad(t\in[0,1]),\tag{S.4.1}$$
where $\eta_1,\ldots,\eta_n$ are random copies of a stochastic process $\eta$ on $[0,1]$. Assume the following conditions hold.

  1. (L0) $\pi$ is continuously differentiable and strictly positive.

  2. (L1) $K$ is Lipschitz continuous on $[0,1]$, i.e., $|K(s)-K(t)|\leq\mathrm{Lip}(K)|s-t|$ for every $s,t\in[0,1]$, where $\mathrm{Lip}(K)$ denotes the Lipschitz constant of $K$.

  3. (L2) $E\|\eta\|_{\infty}^{k}<\infty$ for some $k>2$.

  4. (L3) $h\asymp N^{-a}$ for some $0<a<1-2/k$.

  5. (L4) $\lambda_n=O(1)$ and $\Lambda_n=O\big((N^{b}/\log N)^{k/2}\big)$ for some $0<b<1-(2/k)-a$.

Then, letting $N=n\lambda_n$, we have
$$\sup_{t\in[0,1]}h^{r}\big|H_r(t)-E\big(H_r(t)\big)\big|=O_P\bigg(\sqrt{\frac{\log N}{Nh}}\bigg).\tag{S.4.2}$$
Proof of Lemma S.4.1.

Define $\widetilde{H}_r(t)$ analogously to $H_r(t)$ by substituting $\tilde{\eta}_i(T_{i,\ell})=\eta_i(T_{i,\ell})\mathbb{I}\big(|\eta_i(T_{i,\ell})|\leq\delta_n^{-1}\big)$ for $\eta_i(T_{i,\ell})$ in (S.4.1), where $\delta_n=N^{-1/2}h^{-1/2}\sqrt{\log N}$. It follows that
$$\begin{split}H_r(t)-E\big(H_r(t)\big)&=\Big\{\widetilde{H}_r(t)-E\big(\widetilde{H}_r(t)\big)\Big\}\\&\quad+\frac{1}{n}\sum_{i=1}^{n}\frac{1}{L_i}\sum_{\ell=1}^{L_i}K_h(T_{i,\ell}-t)\Big(\frac{T_{i,\ell}-t}{h}\Big)^{r}\eta_i(T_{i,\ell})\,\mathbb{I}\big(|\eta_i(T_{i,\ell})|>\delta_n^{-1}\big)\\&\quad-E\Big[K_h(T-t)\Big(\frac{T-t}{h}\Big)^{r}\eta(T)\,\mathbb{I}\big(|\eta(T)|>\delta_n^{-1}\big)\Big].\end{split}\tag{S.4.3}$$

First, we claim that the last two terms on the right-hand side are negligible up to the order of $\delta_n$, so that it suffices to show the uniform convergence of $\widetilde{H}_r(t)-E\big(\widetilde{H}_r(t)\big)$. To establish the approximation, define a sequence of events
$$\mathcal{E}_n=\big(|\eta_i(T_{i,\ell})|\leq\delta_n^{-1}\textrm{ for all }1\leq i\leq n\textrm{ and }1\leq\ell\leq L_i\big)\tag{S.4.4}$$
for $n=1,2,\ldots$. Recalling $N=n\lambda_n$, we obtain from Markov's inequality that
$$\begin{split}P(\mathcal{E}_n)&\geq 1-\sum_{i=1}^{n}\sum_{\ell=1}^{L_i}P\big(|\eta_i(T_{i,\ell})|>\delta_n^{-1}\big)\\&\geq 1-n\Lambda_n\delta_n^{k}E\|\eta\|_{\infty}^{k}\\&\geq 1-\lambda_n^{-1}\Big(N^{b-\{1-(2/k)-a\}}\log N\Big)^{k/2}E\|\eta\|_{\infty}^{k}\\&=1-o(1),\end{split}\tag{S.4.5}$$

where $\lambda_n^{-1}\leq 1$. This implies that the second term on the right-hand side of (S.4.3) is zero with probability tending to one. Similarly, we also get
$$\begin{split}&\bigg|E\Big(K_h(T-t)\Big(\frac{T-t}{h}\Big)^{r}\eta(T)\,\mathbb{I}\big(|\eta(T)|>\delta_n^{-1}\big)\Big)\bigg|\\&\quad\leq h^{-1}\nu_r(t;h)\,E\big[\textstyle\sup_t|\eta(t)|\,\mathbb{I}\big(|\eta(t)|>\delta_n^{-1}\big)\big]\\&\quad\leq h^{-1}\nu_r(t;h)\,E\|\eta\|_{\infty}^{k}\,\delta_n^{k-1}\\&\quad\lesssim\delta_n\Big(N^{-\frac{k-2}{2}}h^{-\frac{k-2}{2}-1}(\log N)^{\frac{k-2}{2}}\Big)\\&\quad=\delta_n\,O\Big(N^{-\frac{k}{2}\{1-(2/k)-a\}}(\log N)^{\frac{k-2}{2}}\Big)\\&\quad=o(\delta_n),\end{split}\tag{S.4.6}$$
where $\nu_r(t;h)=\int_0^1\big(\frac{u-t}{h}\big)^{r}K_h(u-t)\pi(u)\,\mathrm{d}u$.

Now, we prove the uniform convergence of $\widetilde{H}_r(t)-E\big(\widetilde{H}_r(t)\big)$. For $\tau>0$, let $\mathcal{T}_N$ be an $N^{-\tau}$-net of $[0,1]$ with cardinality $|\mathcal{T}_N|=O(N^{\tau})$, so that for any $s\in[0,1]$ there exists $t\in\mathcal{T}_N$ with $|s-t|\leq N^{-\tau}$. Then, we have
$$\begin{split}&\sup_{t\in[0,1]}\big|\widetilde{H}_r(t)-E\big(\widetilde{H}_r(t)\big)\big|\\&\leq\sup_{t\in\mathcal{T}_N}\big|\widetilde{H}_r(t)-E\big(\widetilde{H}_r(t)\big)\big|+\sup_{|s-t|\leq N^{-\tau}}\Big|\big(\widetilde{H}_r(s)-\widetilde{H}_r(t)\big)-E\big(\widetilde{H}_r(s)-\widetilde{H}_r(t)\big)\Big|.\end{split}\tag{S.4.7}$$
The second term on the right-hand side is almost surely negligible up to the order of $\delta_n$. Specifically, for any $|s-t|\leq N^{-\tau}$, it follows from the Lipschitz continuity of $K$ that
$$\begin{split}&\big|\widetilde{H}_r(s)-\widetilde{H}_r(t)\big|\\&\leq\frac{1}{n}\sum_{i=1}^{n}\frac{1}{L_i}\sum_{\ell=1}^{L_i}|\tilde{\eta}_i(T_{i,\ell})|\,\Big|K_h(T_{i,\ell}-s)\Big(\frac{T_{i,\ell}-s}{h}\Big)^{r}-K_h(T_{i,\ell}-t)\Big(\frac{T_{i,\ell}-t}{h}\Big)^{r}\Big|\\&\leq\delta_n^{-1}h^{-1}\big\{\mathrm{Lip}(K)h^{-r}+r\|K\|_{\infty}\big\}\frac{|s-t|}{h}\\&\lesssim\delta_n\big(N^{1-\tau}h^{-1-r}/\log N\big)\\&=\delta_n\,O\big(N^{1+a(1+r)-\tau}/\log N\big)\\&=o(\delta_n),\end{split}\tag{S.4.8}$$

for a sufficiently large $\tau>1+a(1+r)$. To investigate the stochastic magnitude of $\sup_{t\in\mathcal{T}_N}h^{r}\big|\widetilde{H}_r(t)-E\big(\widetilde{H}_r(t)\big)\big|$, we write
$$\widetilde{H}_r(t)-E\big(\widetilde{H}_r(t)\big)=\frac{1}{n}\sum_{i=1}^{n}\big\{\zeta_{i,r}(t)-E\big(\zeta_{i,r}(t)\big)\big\},\tag{S.4.9}$$
where $\zeta_{i,r}(t)=\frac{1}{L_i}\sum_{\ell=1}^{L_i}K_h(T_{i,\ell}-t)\big(\frac{T_{i,\ell}-t}{h}\big)^{r}\tilde{\eta}_i(T_{i,\ell})$. Since $\|\zeta_{i,r}\|_{\infty}$ is almost surely bounded for all $i=1,\ldots,n$, it follows from Bernstein's inequality that
$$\begin{split}&P\bigg(\sup_{t\in\mathcal{T}_N}h^{r}\big|\widetilde{H}_r(t)-E\big(\widetilde{H}_r(t)\big)\big|\geq C\delta_n\bigg)\\&\leq\sum_{t\in\mathcal{T}_N}P\bigg(\Big|\sum_{i=1}^{n}h^{r}\big\{\zeta_{i,r}(t)-E\big(\zeta_{i,r}(t)\big)\big\}\Big|\geq Cn\delta_n\bigg)\\&\leq\sum_{t\in\mathcal{T}_N}2\exp\Bigg(-\frac{(Cn\delta_n)^{2}/2}{h^{2r}\sum_{i=1}^{n}\mathrm{Var}\big(\zeta_{i,r}(t)\big)+2h^{r}\|\zeta_{i,r}\|_{\infty}(Cn\delta_n)/3}\Bigg)\\&\leq 2|\mathcal{T}_N|\exp\Bigg(-\frac{(C^{2}/2)(\lambda_n^{-1}nh^{-1}\log N)}{\int_{0}^{1}u^{2r}K^{2}(u)\,\mathrm{d}u\,E\|\eta\|_{\infty}^{2}\|\pi\|_{\infty}(nh^{-1+2r})+(C/3)\|K\|_{\infty}(nh^{-1})}\Bigg)\\&\lesssim N^{\tau}\exp\big(-\lambda_n^{-1}g(C)\log N\big)\\&=O\big(N^{\tau-\lambda_n^{-1}g(C)}\big),\end{split}\tag{S.4.10}$$
where $g(C)=\dfrac{C^{2}/2}{\int_{0}^{1}u^{2r}K^{2}(u)\,\mathrm{d}u\,E\|\eta\|_{\infty}^{2}\|\pi\|_{\infty}+(C/3)\|K\|_{\infty}}$ is increasing in $C>0$. Since $\lambda_n^{-1}>0$ with $\lambda_n=O(1)$, we conclude that $\sup_{t\in\mathcal{T}_N}h^{r}\big|\widetilde{H}_r(t)-E\big(\widetilde{H}_r(t)\big)\big|=O_P(\delta_n)$. ∎

Proof of Proposition 2.2.1.

We note that

$$\begin{split}\mathrm{vec}\big(\widetilde{\mathbb{B}}(t)-\mathbb{B}_0(t)\big)&=\Bigg[\sum_{i=1}^{n}\frac{1}{L_i}\sum_{\ell=1}^{L_i}K_h(T_{i,\ell}-t)\big(\mathbf{Z}_{i,\ell}(t)\otimes\mathbf{X}_i\big)\big(\mathbf{Z}_{i,\ell}(t)\otimes\mathbf{X}_i\big)^{\top}\Bigg]^{-1}\\&\qquad\times\Bigg[\sum_{i=1}^{n}\frac{1}{L_i}\sum_{\ell=1}^{L_i}K_h(T_{i,\ell}-t)\big(\mathbf{Z}_{i,\ell}(t)\otimes\mathbf{X}_i\big)\Big\{Y_{i,\ell}-\mathbf{X}_i^{\top}\mathbb{B}_0(t)\mathbf{Z}_{i,\ell}(t)\Big\}\Bigg].\end{split}\tag{S.4.11}$$

To analyze the large-sample property of $\widetilde{\boldsymbol{\beta}}(t)-\boldsymbol{\beta}(t)$, we obtain the asymptotic approximation of the above matrix product by analyzing its individual components. For the first factor in (S.4.11), Lemma S.4.1 gives

$$\frac{1}{n}\sum_{i=1}^{n}\frac{1}{L_i}\sum_{\ell=1}^{L_i}\Big(\frac{T_{i,\ell}-t}{h}\Big)^{r}K_h(T_{i,\ell}-t)\mathbf{X}_i\mathbf{X}_i^{\top}=\nu_r(t;h)\,\Gamma+O_P\bigg(\sqrt{\frac{\log N}{Nh^{1+r}}}\bigg)\tag{S.4.12}$$
for $r\in\{0,1,2\}$ and uniformly for $t\in[0,1]$, where $\nu_r(t;h)=\int_0^1\big(\frac{u-t}{h}\big)^{r}K_h(u-t)\pi(u)\,\mathrm{d}u$ and $\Gamma=E(\mathbf{X}\mathbf{X}^{\top})$. Similarly, for the second factor in (S.4.11), we have

$$\begin{split}&\frac{1}{n}\sum_{i=1}^{n}\frac{1}{L_i}\sum_{\ell=1}^{L_i}\Big(\frac{T_{i,\ell}-t}{h}\Big)^{r}K_h(T_{i,\ell}-t)\mathbf{X}_i\Big\{Y_{i,\ell}-\mathbf{X}_i^{\top}\mathbb{B}(t)\mathbf{Z}_{i,\ell}(t)\Big\}\\&=\frac{1}{2}h^{2}\nu_{2+r}(t;h)\big(\Gamma\ddot{\boldsymbol{\beta}}_0(t)\big)+o(h^{2})+O_P\bigg(h^{2}\sqrt{\frac{\log N}{Nh^{1+r}}}\bigg)\end{split}\tag{S.4.13}$$
for $r\in\{0,1\}$ and uniformly for $t\in[0,1]$, where $Y_{i,\ell}-\mathbf{X}_i^{\top}\mathbb{B}(t)\mathbf{Z}_{i,\ell}(t)=\mathbf{X}_i^{\top}\ddot{\boldsymbol{\beta}}(t)\{(T_{i,\ell}-t)^{2}/2\}\cdot(1+o(|T_{i,\ell}-t|))+\varepsilon_{i,\ell}$ for $T_{i,\ell}$ near $t$. Combining (S.4.12) and (S.4.13), we have

$$\begin{split}\widetilde{\boldsymbol{\beta}}(t)-\boldsymbol{\beta}_0(t)&=\left[\begin{array}{c}I_p\\O_p\end{array}\right]^{\top}\mathrm{vec}\big(\widetilde{\mathbb{B}}(t)-\mathbb{B}_0(t)\big)\\&=\frac{h^{2}}{2}\Big[(1,0)^{\top}\otimes I_p\Big]\Bigg[\left(\begin{array}{cc}\nu_0(t;h)&\nu_1(t;h)\\\nu_1(t;h)&\nu_2(t;h)\end{array}\right)\otimes\Gamma\Bigg]^{-1}\Bigg[\left(\begin{array}{c}\nu_2(t;h)\\\nu_3(t;h)\end{array}\right)\otimes\big(\Gamma\ddot{\boldsymbol{\beta}}_0(t)\big)\Bigg]\\&\qquad+o(h^{2})+O_P\bigg(\sqrt{\frac{\log N}{Nh}}\bigg)\\&=\frac{h^{2}}{2}\ddot{\boldsymbol{\beta}}_0(t)\bigg(\frac{\nu_2(t;h)^{2}-\nu_1(t;h)\nu_3(t;h)}{\nu_0(t;h)\nu_2(t;h)-\nu_1(t;h)^{2}}\bigg)+o(h^{2})+O_P\bigg(\sqrt{\frac{\log N}{Nh}}\bigg),\end{split}\tag{S.4.14}$$
where it can be verified that the first term is $O(h^{2})$ uniformly for $t\in[0,1]$. ∎

Remark 4.

Using a similar argument as in (S.4.14), one can further verify that
$$\sup_{t\in[0,1]}h\Big\|\widetilde{\dot{\boldsymbol{\beta}}}(t)-\dot{\boldsymbol{\beta}}(t)\Big\|=O_P\bigg(h^{2}+\sqrt{\frac{\log N}{Nh}}\bigg).\tag{S.4.15}$$
Proof of Theorem 2.2.2.

We write

$$\mathcal{Q}_n\big(\mathbb{B}(t),\widetilde{\mathbb{B}}(t)\big)=\mathrm{vec}\big(\mathbb{B}(t)-\widetilde{\mathbb{B}}(t)\big)^{\top}\widehat{\Psi}(t)\,\mathrm{vec}\big(\mathbb{B}(t)-\widetilde{\mathbb{B}}(t)\big),\tag{S.4.16}$$
where $\widehat{\Psi}(t)=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{L_i}\sum_{\ell=1}^{L_i}K_h(T_{i,\ell}-t)\mathbb{W}_{i,\ell}(t)\mathbb{W}_{i,\ell}(t)^{\top}$. It follows from Lemma S.4.1 that

$$\widehat{\Psi}(t)=\mathcal{N}(t;h)\otimes\Gamma+o_P(1)\tag{S.4.17}$$
uniformly for $t\in[0,1]$, where $\Gamma=E(\mathbf{X}\mathbf{X}^{\top})$ and $\mathcal{N}(t;h)$ is the $2\times 2$ matrix whose $(r+1,r'+1)$-component is given by $\nu_{r+r'}(t;h)=\int_0^1\big(\frac{u-t}{h}\big)^{r+r'}K_h(u-t)\pi(u)\,\mathrm{d}u$ for $0\leq r,r'\leq 1$. For $t\in(0,1)$, we note that $\mathcal{N}(t;h)=\pi(t)\mathcal{N}$ is strictly positive definite, where $\mathcal{N}=\mathrm{diag}(\nu_0,\nu_2)$ is a diagonal matrix with $\nu_r=\int_{-1}^{1}u^{r}K(u)\,\mathrm{d}u$. Therefore, with probability tending to one, we have

$$\begin{split}\frac{1}{M}\sum_{m=1}^{M}\mathcal{Q}_n\big(\mathbb{B}_0(t_m),\widehat{\mathbb{B}}(t_m)\big)&\leq\frac{1}{M}\sum_{m=1}^{M}\Big(\mathcal{Q}_n\big(\mathbb{B}_0(t_m),\widetilde{\mathbb{B}}(t_m)\big)+\mathcal{Q}_n\big(\widehat{\mathbb{B}}(t_m),\widetilde{\mathbb{B}}(t_m)\big)\Big)\\&\leq\frac{2}{M}\sum_{m=1}^{M}\mathcal{Q}_n\big(\mathbb{B}_0(t_m),\widetilde{\mathbb{B}}(t_m)\big)\\&\leq 2\sigma_{\mathrm{max}}(\mathcal{N}\otimes\Gamma)\sup_{t\in[0,1]}\big\|\mathrm{vec}\big(\mathbb{B}_0(t)-\widetilde{\mathbb{B}}(t)\big)\big\|^{2}\\&=O_P\bigg(h^{4}+\frac{\log N}{Nh}\bigg),\end{split}\tag{S.4.18}$$
where $\sigma_{\mathrm{max}}(\mathcal{N}\otimes\Gamma)$ denotes the maximum eigenvalue of $\mathcal{N}\otimes\Gamma$. For the second inequality in (S.4.18), we used the fact that $\sum_{m=1}^{M}\mathcal{Q}_n\big(\widehat{\mathbb{B}}(t_m),\widetilde{\mathbb{B}}(t_m)\big)\leq\sum_{m=1}^{M}\mathcal{Q}_n\big(\mathbb{B}_0(t_m),\widetilde{\mathbb{B}}(t_m)\big)$, which holds because $\mathbb{B}_0$ satisfies the side conditions (2.9) under which $\widehat{\mathbb{B}}$ minimizes $\sum_{m=1}^{M}\mathcal{Q}_n\big(\mathbb{B}(t_m),\widetilde{\mathbb{B}}(t_m)\big)$. The last equality in (S.4.18) follows from Proposition 2.2.1 and (S.4.15). Finally, we conclude from (S.4.18) that

$$\begin{split}\max_{1\leq m\leq M}\big\|\widehat{\boldsymbol{\beta}}(t_m)-\boldsymbol{\beta}_0(t_m)\big\|^{2}&\leq\max_{1\leq m\leq M}\big\|\mathrm{vec}\big(\mathbb{B}_0(t_m)-\widehat{\mathbb{B}}(t_m)\big)\big\|^{2}\\&\leq\sigma_{\mathrm{min}}(\mathcal{N}\otimes\Gamma)^{-1}\sum_{m=1}^{M}\mathcal{Q}_n\big(\mathbb{B}_0(t_m),\widehat{\mathbb{B}}(t_m)\big)\\&=O_P\bigg(h^{4}+\frac{\log N}{Nh}\bigg),\end{split}\tag{S.4.19}$$
where $\sigma_{\mathrm{min}}(\mathcal{N}\otimes\Gamma)$ denotes the minimum eigenvalue of $\mathcal{N}\otimes\Gamma$. ∎

S.5 Proof of Theorem 2.3.1

Proof.

Let $\boldsymbol{\mu}_i$ denote the true mean vector of the responses of the $i$th trajectory when (2.2) is true. We define $\tilde{\boldsymbol{\mu}}_i$ and $\hat{\boldsymbol{\mu}}_i$ to be the estimated responses calculated from the unconstrained $\tilde{\boldsymbol{\beta}}(t)=\big(\tilde{\beta}_1(t),\ldots,\tilde{\beta}_p(t)\big)^{\top}$ and the constrained $\hat{\boldsymbol{\beta}}(t)=\big(\hat{\beta}_1(t),\ldots,\hat{\beta}_p(t)\big)^{\top}$, respectively. Let $\|\boldsymbol{a}\|^{2}=\boldsymbol{a}^{\top}\boldsymbol{a}$ and define $\mathrm{vec}[\boldsymbol{\mu}_i]=(\boldsymbol{\mu}_1^{\top},\ldots,\boldsymbol{\mu}_n^{\top})^{\top}$. Then we have

$$\begin{split}&\|\mathrm{vec}[\tilde{\boldsymbol{\mu}}_i]-\mathrm{vec}[\boldsymbol{\mu}_i]\|^{2}-\|\mathrm{vec}[\hat{\boldsymbol{\mu}}_i]-\mathrm{vec}[\boldsymbol{\mu}_i]\|^{2}\\&=\|\mathrm{vec}[\tilde{\boldsymbol{\mu}}_i]-\mathrm{vec}[\hat{\boldsymbol{\mu}}_i]\|^{2}+2\big(\mathrm{vec}[\tilde{\boldsymbol{\mu}}_i]-\mathrm{vec}[\hat{\boldsymbol{\mu}}_i]\big)^{\top}\big(\mathrm{vec}[\hat{\boldsymbol{\mu}}_i]-\mathrm{vec}[\boldsymbol{\mu}_i]\big)\\&=\|\mathrm{vec}[\tilde{\boldsymbol{\mu}}_i]-\mathrm{vec}[\hat{\boldsymbol{\mu}}_i]\|^{2}-2\big(\mathrm{vec}[\mathbf{Y}_i]-\mathrm{vec}[\tilde{\boldsymbol{\mu}}_i]\big)^{\top}\big(\mathrm{vec}[\hat{\boldsymbol{\mu}}_i]-\mathrm{vec}[\boldsymbol{\mu}_i]\big)\\&\qquad+2\big(\mathrm{vec}[\mathbf{Y}_i]-\mathrm{vec}[\hat{\boldsymbol{\mu}}_i]\big)^{\top}\big(\mathrm{vec}[\hat{\boldsymbol{\mu}}_i]-\mathrm{vec}[\boldsymbol{\mu}_i]\big).\end{split}\tag{S.5.1}$$

The second term in the last expression is zero because of the orthogonality of $(\mathbf{Y}_i-\tilde{\boldsymbol{\mu}}_i)$ and $(\hat{\boldsymbol{\mu}}_i-\boldsymbol{\mu}_i)$ for $i=1,\ldots,n$. Based on Proposition 3.12.3 of Silvapulle and Sen (2005), the last term is non-negative by the Karush–Kuhn–Tucker conditions:

$$\langle\mathbf{Y}_i-\hat{\boldsymbol{\mu}}_i,\,\hat{\boldsymbol{\mu}}_i\rangle=0\quad\text{and}\quad\langle\mathbf{Y}_i-\hat{\boldsymbol{\mu}}_i,\,\boldsymbol{\mu}_i\rangle\leq 0,$$

for $i=1,\ldots,n$. Thus, we have $\|\mathrm{vec}[\hat{\boldsymbol{\mu}}_i]-\mathrm{vec}[\boldsymbol{\mu}_i]\|^{2}\leq\|\mathrm{vec}[\tilde{\boldsymbol{\mu}}_i]-\mathrm{vec}[\boldsymbol{\mu}_i]\|^{2}$. Then, from Condition (S2), $\|\tilde{\boldsymbol{\mu}}_i-\boldsymbol{\mu}_i\|^{2}\simeq\|\mathbf{X}_i\boldsymbol{\beta}(t)-\mathbf{X}_i\tilde{\boldsymbol{\beta}}(t)\|_{L_2}^{2}=\|\mathbf{X}_i\{\tilde{\boldsymbol{\beta}}(t)-\boldsymbol{\beta}(t)\}\|_{L_2}^{2}\leq M_1\|\tilde{\boldsymbol{\beta}}(t)-\boldsymbol{\beta}(t)\|_{L_2}^{2}$. For non-trivial $\mathbf{X}_i$, we can always find $M_4>0$ satisfying $\|\hat{\boldsymbol{\mu}}_i-\boldsymbol{\mu}_i\|^{2}\simeq\|\mathbf{X}_i\{\hat{\boldsymbol{\beta}}(t)-\boldsymbol{\beta}(t)\}\|_{L_2}^{2}\geq M_4\|\hat{\boldsymbol{\beta}}(t)-\boldsymbol{\beta}(t)\|_{L_2}^{2}$. Hence, altogether, we have $M_4\|\hat{\boldsymbol{\beta}}(t)-\boldsymbol{\beta}(t)\|_{L_2}^{2}\leq\|\hat{\boldsymbol{\mu}}_i-\boldsymbol{\mu}_i\|^{2}\leq\|\tilde{\boldsymbol{\mu}}_i-\boldsymbol{\mu}_i\|^{2}\leq M_1\|\tilde{\boldsymbol{\beta}}(t)-\boldsymbol{\beta}(t)\|_{L_2}^{2}$, which leads to $\|\hat{\beta}_j(t)-\beta_j(t)\|_{L_2}^{2}\lesssim\|\tilde{\beta}_j(t)-\beta_j(t)\|_{L_2}^{2}$ for $j=1,\ldots,p$. ∎

S.6 Partially shape-constrained regression spline via BB-splines

We consider the monotone increasing constraints on $\beta_j(t)$ over the sub-interval $\mathcal{I}_j$ for $j\in J$, as in (2.2). Borrowing notation from Section 2.3, let $\{\phi_k^j(t): k=1,\ldots,d_j\}$ denote a set of $B$-spline basis functions of degree 2 under the given knot sequence used to express $\beta_j(t)$, and let $\mathbf{c}_j=(c_{j1},\ldots,c_{jd_j})^{\top}$ be the corresponding basis coefficients. For $i=1,\ldots,n$, we define the $(L_i\times d_j)$ matrix of discretized basis functions as

$$\Phi_{ij}=\left[\begin{array}{cccc}\phi_1^j(T_{i,1})&\phi_2^j(T_{i,1})&\cdots&\phi_{d_j}^j(T_{i,1})\\\phi_1^j(T_{i,2})&\phi_2^j(T_{i,2})&\cdots&\phi_{d_j}^j(T_{i,2})\\\vdots&\vdots&\ddots&\vdots\\\phi_1^j(T_{i,L_i})&\phi_2^j(T_{i,L_i})&\cdots&\phi_{d_j}^j(T_{i,L_i})\end{array}\right].$$

We then denote the knot sequence generating the basis $\{\phi_k^j(t)\}$ by $\mathcal{K}_j=\{t_{jm}\in[0,1]: m=1,\ldots,M_j\}$ with $0=t_{j1}<\cdots<t_{jM_j}=1$. For $j=1,\ldots,p$, let $\mathbf{A}_j$ denote the $(M_j\times d_j)$ matrix of discretized first derivatives of the basis functions evaluated at $\mathcal{K}_j$:

$$\mathbf{A}_j=\left[\begin{array}{cccc}{\phi_1^j}'(t_{j1})&{\phi_2^j}'(t_{j1})&\cdots&{\phi_{d_j}^j}'(t_{j1})\\{\phi_1^j}'(t_{j2})&{\phi_2^j}'(t_{j2})&\cdots&{\phi_{d_j}^j}'(t_{j2})\\\vdots&\vdots&\ddots&\vdots\\{\phi_1^j}'(t_{jM_j})&{\phi_2^j}'(t_{jM_j})&\cdots&{\phi_{d_j}^j}'(t_{jM_j})\end{array}\right].$$

For the special case of $\mathcal{I}_j=[0,1]$ for all $j\in J$, i.e., when the monotone increasing constraint is assigned over the entire domain, the objective function for finding the shape-constrained $\beta_j(t)$ is written as

$$\textrm{Minimize}\quad\sum_{i=1}^{n}\Big\|\mathbf{Y}_i-\sum_{j=1}^{p}X_{i,j}\big\{\Phi_{ij}\boldsymbol{c}_j\big\}\Big\|^{2}\quad\textrm{subject to}\quad\mathbf{A}_j\mathbf{c}_j\geq 0_{M_j},\quad j\in J,\tag{S.6.1}$$

where $0_{M_j}$ represents a length-$M_j$ vector of zeros. Now, under the general partial-constraint case, suppose that the lower and upper boundaries of the sub-interval $\mathcal{I}_j$, say $t_{jl}$ and $t_{ju}$, belong to $\mathcal{K}_j$ without loss of generality; since $\mathcal{K}_j$ is user-defined, we can always arrange this by adding internal knots. Then, the objective function can be written as in (S.6.1), still minimizing the least squares criterion but under a different $\mathbf{A}_j$, which assigns the non-negativity restriction to the first derivative evaluated at the knots belonging to $\mathcal{I}_j$. Specifically, the $(v,w)$th element of $\mathbf{A}_j$ is defined as $(\mathbf{A}_j)_{[v,w]}={\phi_w^j}'(t_{jv})$ for $v\in\mathcal{V}_j$ and $w\in\mathcal{W}_j$, where $\mathcal{V}_j=\big\{m\in\{1,\ldots,M_j\}: t_{jl}\leq t_{jm}\leq t_{ju}\big\}$ and $\mathcal{W}_j=\big\{k\in\{1,\ldots,d_j\}:{\phi_k^j}'(t)\neq 0\textrm{ over }t\in\mathcal{I}_j\big\}$; otherwise, $(\mathbf{A}_j)_{[v,w]}=0$. In matrix form, denoting the elements of $\mathcal{V}_j$ by $m_l,m_{l+1},\ldots,m_u$ and those of $\mathcal{W}_j$ by $k_l,k_{l+1},\ldots,k_u$, $\mathbf{A}_j$ can be written as

$$\mathbf{A}_j=\left[\begin{array}{ccc:cccc:ccc}0&\cdots&0&0&0&\cdots&0&0&\cdots&0\\&\vdots&&&&\vdots&&&\vdots&\\0&\cdots&0&0&0&\cdots&0&0&\cdots&0\\\hdashline 0&\cdots&0&{\phi_{k_l}^j}'(t_{j,m_l})&{\phi_{k_{l+1}}^j}'(t_{j,m_l})&\cdots&{\phi_{k_u}^j}'(t_{j,m_l})&0&\cdots&0\\0&\cdots&0&{\phi_{k_l}^j}'(t_{j,m_{l+1}})&{\phi_{k_{l+1}}^j}'(t_{j,m_{l+1}})&\cdots&{\phi_{k_u}^j}'(t_{j,m_{l+1}})&0&\cdots&0\\\vdots&&\vdots&\vdots&&\ddots&\vdots&\vdots&&\vdots\\0&\cdots&0&{\phi_{k_l}^j}'(t_{j,m_u})&{\phi_{k_{l+1}}^j}'(t_{j,m_u})&\cdots&{\phi_{k_u}^j}'(t_{j,m_u})&0&\cdots&0\\\hdashline 0&\cdots&0&0&0&\cdots&0&0&\cdots&0\\&\vdots&&&&\vdots&&&\vdots&\\0&\cdots&0&0&0&\cdots&0&0&\cdots&0\end{array}\right].$$

Although here we elaborate on the case of a single sub-interval $\mathcal{I}_j$, the same approach applies to multiple disjoint sub-intervals. The optimization can be carried out by quadratic programming through the qprog function in the R package coneproj, as sketched below. Indeed, this approach applies to shape-constrained estimation under any kind of basis functions, not necessarily $B$-splines.
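As a minimal sketch under stated assumptions, the quadratic program can be set up as follows; the stacked design matrix Phi (built from the blocks $X_{i,j}\Phi_{ij}$), the stacked response vector y, and the constraint matrix A (with rows as in $\mathbf{A}_j$, $j\in J$) are assumed to have been constructed as described above.

```r
library(coneproj)
# Sketch of solving (S.6.1) with coneproj::qprog, which minimizes
#   theta' q theta - 2 c' theta   subject to   amat %*% theta >= b.
# 'Phi', 'y', and 'A' are assumed to be pre-built as described in the text.
q_mat <- crossprod(Phi)        # Phi' Phi
c_vec <- crossprod(Phi, y)     # Phi' y, so the objective matches least squares
fit   <- qprog(q = q_mat, c = c_vec, amat = A, b = rep(0, nrow(A)))
c_hat <- fit$thetahat          # constrained basis coefficient estimates
```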

S.7 Additional simulation results

[Figure S.7.1 panels appear here: (a) I=[0,0.5], (b) I=[0.3,0.5], (c) I=[0.4,0.5], (d) I=[0.5,1], (e) I=[0.8,1], (f) I=[0.9,1].]

Figure S.7.1: Regression coefficient function estimates with the kernel method (2.9) under shape constraints (3.1). The shaded area indicates the sub-interval $\mathcal{I}$ of inferential interest. The blue dash-dotted lines depict the Monte Carlo average estimate in the full cohort analysis, the red long-dashed lines the Monte Carlo average estimate from the sub-cohort analysis, and the solid black lines the true functions.

[Figure S.7.2 appears here.]

Figure S.7.2: Power analysis of the partially shape-constrained spline method (2.17) subject to the shape constraints $H_0$: $\beta_1$ and $\beta_3$ are monotone increasing on $\mathcal{I}$. The sensitivity and specificity are evaluated at the 5% significance level under different scenarios for the location and length of the sub-interval: $\mathcal{I}=[0,0.5]$, $[0.1,0.5]$, $[0.2,0.5]$, $[0.3,0.5]$, $[0.4,0.5]$, $[0.5,1]$, $[0.6,1]$, $[0.7,1]$, $[0.8,1]$, and $[0.9,1]$.
Table S.7.1: Simulation results for the shape-constrained kernel-weighted least squares approach (2.9). The estimation and test performance are evaluated with the integrated squared bias (ISB), the integrated variance (IVar), the type I error rate, and the power of the test, defined as (3.2) and (3.3). The null hypothesis is H0: β1 and β3 are monotone increasing on I.

                                                H0 is true                  H0 is false
Sample size  Dataset         Criterion          I=[0,0.5]    I=[0.4,0.5]    I=[0.5,1]    I=[0.9,1]
n=100        Full data       (γ0.05, γ0.1)      (0.06,0.11)  (0.05,0.11)    (0.40,0.50)  (0.12,0.20)
                             ISB                0.0302       0.0073         0.1600       0.0230
                             IVar               0.4034       0.5346         0.4537       1.0901
             Partial data    (γ0.05, γ0.1)      (0.06,0.11)  (0.09,0.19)    (0.30,0.40)  (0.21,0.30)
             restricted to I ISB                0.0517       0.0768         0.2299       0.2200
                             IVar               0.5335       1.5028         0.5987       3.7844
n=500        Full data       (γ0.05, γ0.1)      (0.05,0.10)  (0.05,0.09)    (0.86,0.92)  (0.11,0.16)
                             ISB                0.0102       0.0018         0.1211       0.0097
                             IVar               0.1010       0.1135         0.1059       0.2681
             Partial data    (γ0.05, γ0.1)      (0.04,0.10)  (0.07,0.13)    (0.78,0.88)  (0.16,0.25)
             restricted to I ISB                0.0129       0.0110         0.1699       0.1045
                             IVar               0.1497       0.3229         0.1430       0.7396
Table S.7.2: Simulation results for the shape-constrained regression spline approach (2.17). The estimation and test performance are evaluated with the integrated squared bias (ISB), the integrated variance (IVar), the type I error rate, and the power of the test, defined as (3.2) and (3.3). The null hypothesis is H0: β1 and β3 are monotone increasing on I.

                                                H0 is true                  H0 is false
Sample size  Dataset         Criterion          I=[0,0.5]    I=[0.4,0.5]    I=[0.5,1]    I=[0.9,1]
n=100        Full data       (γ0.05, γ0.1)      (0.04,0.07)  (0.08,0.11)    (0.29,0.40)  (0.09,0.11)
                             ISB                0.0101       0.0033         0.1850       0.2304
                             IVar               0.3737       0.2054         0.4046       0.9282
             Partial data    (γ0.05, γ0.1)      (0.05,0.11)  (0.06,0.12)    (0.18,0.24)  (0.05,0.11)
             restricted to I ISB                0.0176       0.1156         0.1783       0.4200
                             IVar               0.5758       1.6896         0.6151       4.2123
n=500        Full data       (γ0.05, γ0.1)      (0.04,0.08)  (0.05,0.08)    (0.81,0.89)  (0.10,0.16)
                             ISB                0.0017       0.0016         0.1461       0.0913
                             IVar               0.0946       0.0572         0.0974       0.2028
             Partial data    (γ0.05, γ0.1)      (0.05,0.11)  (0.04,0.08)    (0.44,0.56)  (0.04,0.09)
             restricted to I ISB                0.0022       0.0109         0.1406       0.0731
                             IVar               0.1168       0.2798         0.1463       0.7005

References

M. J. Silvapulle and P. K. Sen. Constrained Statistical Inference: Inequality, Order, and Shape Restrictions. Wiley Series in Probability and Statistics. John Wiley & Sons, Inc, 1 edition, 2005.