
Statistical inference on partially shape-constrained function-on-scalar linear regression models

Kyunghee Han (University of Illinois at Chicago), Yeonjoo Park (corresponding author, yeonjoo.park@utsa.edu; University of Texas at San Antonio), Soo-Young Kim (Fred Hutchinson Cancer Center)
Abstract

We consider functional linear regression models in which functional outcomes are associated with scalar predictors through coefficient functions subject to shape constraints, such as monotonicity and convexity, that apply only to sub-domains of interest. To validate such partial shape constraints, we propose testing a composite hypothesis of linear functional constraints on the regression coefficients. Our approach employs kernel- and spline-based methods within a unified inferential framework, evaluating the statistical significance of the hypothesis by measuring an $L^2$-distance between the constrained and unconstrained model fits. In a large-sample analysis under mild conditions, we show that both methods achieve the standard rate of convergence observed in the nonparametric estimation literature. Through finite-sample numerical experiments, we demonstrate that the type I error rate is maintained at the specified significance level across various scenarios and that the power increases with the sample size, confirming the consistency of the test procedure under both estimation methods. Our theoretical and numerical results give researchers the flexibility to choose a method based on computational preference. The practicality of partially shape-constrained inference is illustrated by two data applications: one involving clinical trials of NeuroBloc in type A-resistant cervical dystonia and the other from the National Institute of Mental Health Schizophrenia Study.

Keywords: Nonparametric estimation, partial shape constraints, shape-constrained kernel least squares, shape-constrained regression spline, testing

1 Introduction

As function-valued data acquisition becomes increasingly common, a significant amount of work has been devoted to functional regression models, addressing both coefficient function estimation and inference [21, 32]. Among them, inference for the function-on-scalar regression (FoSR) model has found applications in many fields [24] by identifying dynamic associations between the response and scalar covariates that vary over the domain. Beyond establishing statistical evidence of a non-null covariate effect on response trajectories, validating the shape of functional regression coefficients, such as monotonicity, convexity, or concavity, over a specific subset of the domain is crucial for domain experts because it permits tangible interpretation in practice. Once a functional shape is validated, one may incorporate the corresponding conditions into the model estimates to avoid potential bias. Conversely, in practical estimation problems, shape restrictions on functional regression coefficients over specific sub-intervals of the domain may be available as prior knowledge.

Our motivating example is a functional data analysis of treatment efficacy during the clinical trial period of the National Institute of Mental Health Schizophrenia Collaborative Study. Previous studies, including [1] and [12], showed a significant continual drop in disease severity for the treatment group, i.e., improving drug efficacy, over the weeks of the trial through shape-constrained tests on the drug effect coefficient function. While the study design and models are described in Section 4.2, the left panel of Figure 1 displays the longitudinal disease severity measures surveyed from the initial treatment (week 0) to week 6 for 30 randomly selected patients from the placebo and treatment cohorts. Refined conclusions over specific periods can also be of practical interest, such as when the maximum drug efficacy is achieved or whether the improvement in drug effectiveness remains significant during the later weeks of the trial. As we shall see in Section 4.2, our proposed partial inference method concludes that disease severity decreases significantly and monotonically owing to the drug treatment over weeks 0-3, followed by a consistent duration of such efficacy for the remaining weeks, as illustrated by the blue solid line in the right panel of Figure 1. This is somewhat distinct from the constrained estimates under a monotone decrease over the entire domain, the conclusion of [12], and from the unconstrained smooth estimates, displayed with the black dashed and dotted lines, respectively.

Figure 1: The left panel illustrates the individual trajectories of disease severity for a subset of patients in the treatment and placebo groups. The average disease severity for each group (solid line) shows an overall decrease, with the treatment group experiencing a rapid drop in the first few weeks, followed by a continual reduction at a seemingly similar rate for both groups. As shown in the right panel, [12] previously validated this observation using globally monotone shape-constrained estimates (black dashed line), which provide more explicit, decision-ready interpretations than unconstrained estimates (black dotted line). However, the global shape-constrained inference may be misleading because it ignores the rebound after Week 5 indicated by the unconstrained estimates. Our proposed method (blue solid line) with partial shape constraints refines the conclusion, suggesting that the duration of drug action may effectively extend up to Week 3, with sustained efficacy thereafter.

The recent literature on functional data analysis and its applications also pays attention to shape-restricted estimation and inference in functional regression models. For inference on shape-constrained regression coefficients, [6] proposed a goodness-of-fit test based on empirical processes projected onto the space with shape constraints, while [22] addressed a similar problem under null hypotheses with linear operator constraints. For estimation, [4] extended shape-constrained kernel smoothing to the functional and longitudinal data framework, and [12] employed Bernstein polynomials for shape-constrained estimation in functional regression, with one of their data application results illustrated in Figure 1. To name a few other applications: in economics, motivated by the pioneering work of Slutsky [31] recognizing the necessity of shape-restricted estimation and inference, [5] elaborated theoretical and applied work in econometric research. In public health research, the analysis of growth charts under monotone increasing shape restrictions has provided a crucial clinical tool for growth screening during infancy, childhood, and adolescence [15], and such growth charts help health providers assess monotone increasing growth patterns against age-specific percentile curves [9]. The reliability engineering community has also paid attention to this topic for bathtub-shaped function estimation in assessing the degradation of system reliability. However, to the best of our knowledge, inference for partial shape constraints has not been addressed, although such questions arise naturally in practice.

In this study, we propose an inferential tool to validate partial shape constraints on functional regression coefficients in FoSR, along with two corresponding estimation approaches using kernel-smoothing and spline techniques. Suppose one is interested in shape constraints on a sub-interval $\mathcal{I}=[a,b]\subset[0,1]$, where the collected response trajectories span $[0,1]$. Under existing tools for shape-constrained inference, one might consider using only the functional observations restricted to $\mathcal{I}$ and applying a standard testing procedure. However, such an approach causes a significant drop in sample size when the number of measurements per response trajectory is bounded, which is common in longitudinal studies, including our motivating example in Figure 1. The situation becomes much worse depending on the length of $\mathcal{I}$ or the sparsity of evaluation grids on it. Consequently, this naive approach results in a severe deterioration of power. To prevent this, we propose to borrow information from observations over the entire domain even when testing is focused on specific subsets of the domain. This indeed prevents a significant drop in the power of the test while keeping the type I error at the desired level, as demonstrated in the simulation studies of Section 3. The other contribution of this paper is the parallel development of two partially shape-constrained regression estimators using the most widely used nonparametric smoothing techniques, kernel smoothing and regression splines. The proposed unified testing tool, applicable to both estimators, provides practical flexibility in real applications. We additionally derive the asymptotic behavior of the regression coefficient estimators with embedded partial shape constraints.

The article is organized as follows. Section 2 presents the proposed methodology and theory. We introduce the partially shape-constrained FoSR models in Section 2.1. The estimation methods and theoretical findings are provided in Sections 2.2 and 2.3, covering the kernel and spline estimators, respectively. In Section 2.4, we establish a unified inferential procedure for testing the partial shape constraints, encompassing the kernel and spline approaches. Simulation results are reported in Section 3, and data applications are illustrated with two examples in Section 4. Technical details and proofs are provided in the Supplementary Material.

2 Methodology and Theory

2.1 Testing partial shape constraints in FoSR models

Let $Y_i$ be the functional response coupled with a vector covariate $\mathbf{X}_i=(X_{i,1},\ldots,X_{i,p})^{\top}$ as

\[ Y_i(t)=\sum_{j=1}^{p}\beta_j(t)X_{i,j}+\varepsilon_i(t)\quad(t\in[0,1]) \tag{2.1} \]

independently for $i=1,\ldots,n$, where $\boldsymbol{\beta}(t)=\big(\beta_1(t),\ldots,\beta_p(t)\big)^{\top}$ is a vector coefficient function and $\varepsilon_i(t)$ is a mean-zero stochastic process that models the regression error uncorrelated with $\mathbf{X}_i$, i.e., $\mathbb{E}(\varepsilon_i(t)\,|\,\mathbf{X}_i)=0$. We assume that $X_{i,1},\ldots,X_{i,p}$ are linearly independent with positive probability, which allows the design matrix to include an intercept, e.g., $X_{i,1}=1$ for all $i=1,\ldots,n$, depending on the inferential interest. Despite the generality of the functional model (2.1), it is practically infeasible to observe the functional outcomes as infinite-dimensional objects, so we assume that only discrete evaluations $\mathbf{Y}_i=(Y_{i,1},\ldots,Y_{i,L_i})^{\top}$ with $Y_{i,\ell}=Y_i(T_{i,\ell})$ are accessible for $\ell=1,\ldots,L_i$, where $T_{i,1},\ldots,T_{i,L_i}$ is a random sample of $T$ with a probability density function $\pi$ on $[0,1]$. Here, $L_i$ is an independent random integer that may or may not depend on the sample size $n$. More detailed conditions on $L_i$ required for the kernel and regression spline methods are provided in the Supplementary Material S.1 and S.2. We denote the finite random sample as $\mathcal{X}_n=\{(\mathbf{Y}_i,\mathbf{X}_i): i=1,\ldots,n\}$.

Suppose we test the functional shape of $\beta_j$ on a pre-specified sub-interval $\mathcal{I}_j\subset[0,1]$ for $j\in J$ with $J\subset\{1,\ldots,p\}$. The null hypothesis "$H_0$: $H_{0,j}$ is true for all $j\in J$" consists of the individual hypotheses

\[ H_{0,j}: \beta_j \text{ is } \mathcal{A}_j\text{-shaped on } \mathcal{I}_j \tag{2.2} \]

for $j\in J$. For example, if we hypothesize that $\beta_j$ is monotone increasing on $\mathcal{I}_j$, we read (2.2) as "$H_{0,j}$: $\beta_j$ is monotone increasing on $\mathcal{I}_j$." Similarly, if one tests a partial convexity hypothesis, (2.2) can be written as "$H_{0,j}$: $\beta_j$ is convex on $\mathcal{I}_j$." In this paper, monotone and convex hypotheses are the main examples, yet one can extend our framework to general shape constraints, as covered by many others in the literature [17, 18, 27, 28, 20, 25].

We then propose to reject the null hypothesis $H_0$ if the test statistic

\[ D_n=\frac{1}{n}\sum_{i=1}^{n}\int_0^1\left\{\mathbf{X}_i^{\top}\big(\widehat{\boldsymbol{\beta}}(t)-\widetilde{\boldsymbol{\beta}}(t)\big)\right\}^2\,\mathrm{d}t \tag{2.3} \]

is significantly large, where $\widehat{\boldsymbol{\beta}}(t)=\big(\hat{\beta}_1(t),\ldots,\hat{\beta}_p(t)\big)^{\top}$ and $\widetilde{\boldsymbol{\beta}}(t)=\big(\tilde{\beta}_1(t),\ldots,\tilde{\beta}_p(t)\big)^{\top}$ denote the estimates of the coefficient function $\boldsymbol{\beta}(t)$ under the null hypothesis $H_0$ and its general alternative $H_1$, respectively. While $D_n$ is partly motivated by [22], a goodness-of-fit test statistic for validating functional constraints on $\beta_j(t)$ via linear operators, similar $L^2$-based test statistics have frequently been employed in the literature for testing the nullity of functional differences in more general settings [30, 36, 35]. Indeed, our proposed method reduces to global shape constraints if one sets $\mathcal{I}_j=[0,1]$. Sections 2.2 and 2.3 elaborate on how to obtain $\widehat{\boldsymbol{\beta}}(t)$ under the kernel-smoothing and spline-smoothing approaches, respectively. Based on those estimators, we present a bootstrap test procedure in Section 2.4.
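For concreteness, the following minimal sketch computes $D_n$ by approximating the integral in (2.3) with the trapezoidal rule; it assumes the constrained and unconstrained estimates are available as callables mapping a grid of time points to a $(p,\,\text{grid})$ array, and all names are illustrative rather than part of the authors' implementation.

```python
import numpy as np

def test_statistic(X, beta_hat, beta_tilde, n_grid=201):
    """D_n = (1/n) sum_i int_0^1 { X_i^T (beta_hat(t) - beta_tilde(t)) }^2 dt,
    with the integral approximated on an equally spaced grid."""
    t = np.linspace(0.0, 1.0, n_grid)
    diff = beta_hat(t) - beta_tilde(t)   # (p, n_grid) difference of coefficient fits
    fitted_diff = X @ diff               # (n, n_grid); row i is X_i^T (diff)
    return np.trapz(fitted_diff**2, t, axis=1).mean()
```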

2.2 Estimation: Shape-constrained kernel-weighted least squares

We assume that the vector coefficient $\boldsymbol{\beta}(t)$ is smooth enough to allow the local linear approximation $\boldsymbol{\beta}(u)\approx\boldsymbol{\beta}(t)+\dot{\boldsymbol{\beta}}(t)(u-t)$ for $u$ near $t$, where $\dot{\boldsymbol{\beta}}(t)=\big(\beta_1'(t),\ldots,\beta_p'(t)\big)^{\top}$ denotes the gradient of $\boldsymbol{\beta}(t)$. The local linear kernel estimator of $\mathbb{B}(t)=\big[\boldsymbol{\beta}(t),\dot{\boldsymbol{\beta}}(t)\big]\in\mathbb{R}^{p\times 2}$ is given by the unconstrained minimizer $\widetilde{\mathbb{B}}(t)$ of the following objective function:

\[ \mathcal{L}_n\big(\mathbb{B}(t)\big)=\sum_{i=1}^{n}\frac{1}{L_i}\sum_{\ell=1}^{L_i}\Big[Y_{i,\ell}-\mathbf{X}_i^{\top}\mathbb{B}(t)\mathbf{Z}_{i,\ell}(t)\Big]^2 K_h(T_{i,\ell}-t), \tag{2.4} \]

where $K_h(u)=K(u/h)/h$ with a probability density function $K$ and a bandwidth $h>0$, $\mathbf{Z}_{i,\ell}(t)=\big(1,T_{i,\ell}-t\big)^{\top}$ is the local smoothing design, and $\mathbf{X}_i^{\top}\mathbb{B}(t)\mathbf{Z}_{i,\ell}(t)=\mathbb{W}_{i,\ell}(t)^{\top}\mathrm{vec}\big(\mathbb{B}(t)\big)$ with the vectorization $\mathrm{vec}\big(\mathbb{B}(t)\big)=\big(\boldsymbol{\beta}(t)^{\top},\dot{\boldsymbol{\beta}}(t)^{\top}\big)^{\top}\in\mathbb{R}^{2p}$ of $\mathbb{B}(t)$ and the Kronecker product $\mathbb{W}_{i,\ell}(t)=\mathbf{Z}_{i,\ell}(t)\otimes\mathbf{X}_i$ of the covariate design for local linear smoothing.
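As an illustration, a minimal sketch of the unconstrained local linear fit minimizing (2.4) at a single point $t$ is given below; it accumulates the kernel-weighted normal equations over subjects and solves for $\mathrm{vec}(\widetilde{\mathbb{B}}(t))$. A Gaussian kernel and the data containers are illustrative assumptions.

```python
import numpy as np

def local_linear_fit(t, T_list, Y_list, X, h):
    """Solve the kernel-weighted least squares (2.4) at a point t.
    T_list[i], Y_list[i]: arrays of the L_i evaluation points / responses of subject i;
    X: (n, p) covariate matrix; h: bandwidth. Returns (beta(t), beta_dot(t))."""
    p = X.shape[1]
    A = np.zeros((2 * p, 2 * p))     # kernel-weighted Gram matrix, as in Psi_hat(t)
    b = np.zeros(2 * p)
    for Ti, Yi, Xi in zip(T_list, Y_list, X):
        w = np.exp(-0.5 * ((Ti - t) / h) ** 2) / (h * np.sqrt(2 * np.pi))  # K_h weights
        for l in range(len(Ti)):
            W = np.kron(np.array([1.0, Ti[l] - t]), Xi)  # W_{i,l}(t) = Z_{i,l}(t) ⊗ X_i
            A += (w[l] / len(Ti)) * np.outer(W, W)
            b += (w[l] / len(Ti)) * W * Yi[l]
    sol = np.linalg.solve(A, b)      # vec(B~(t)) = (beta(t)^T, beta_dot(t)^T)^T
    return sol[:p], sol[p:]
```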

Proposition 2.2.1.

For each $t\in[0,1]$, let $\mathbb{B}_0(t)=\big[\boldsymbol{\beta}_0(t),\dot{\boldsymbol{\beta}}_0(t)\big]$ be the $p\times 2$ matrix, where $\boldsymbol{\beta}_0(t)$ is the true vector coefficient function such that $\ddot{\boldsymbol{\beta}}_0(t)=\big(\beta_1''(t),\ldots,\beta_p''(t)\big)^{\top}$ exists and is continuous. Suppose conditions (K0)-(K4) in Supplementary Material S.1 hold. Then, we have

\[ \sup_{t\in[0,1]}\big\|\widetilde{\boldsymbol{\beta}}(t)-\boldsymbol{\beta}_0(t)\big\|=O_P\bigg(h^2+\sqrt{\frac{\log n}{nh}}\bigg). \tag{2.5} \]

The implications of conditions (K0)-(K4) are discussed in Supplementary Material S.1. It is worth mentioning that Proposition 2.2.1 remains valid even if only a few longitudinal observations are available for some subjects. For example, it is common in observational studies that dense observations of functional responses are unavailable for some subjects regardless of the sample size $n$, i.e., the minimum number of longitudinal observations $\lambda_n=\min_{1\le i\le n}L_i$ is bounded. The technical condition on $\lambda_n$ in (K4) is therefore not restrictive because we do not require $\lambda_n\to\infty$ as $n\to\infty$.

Remark 1.

Under technical conditions (K0)-(K4), $k>2.5$ and $a=1/5$ are commonly adopted in the standard theory of kernel smoothing [10]. Then, Proposition 2.2.1 gives $\sup_{t\in[0,1]}\|\widetilde{\boldsymbol{\beta}}(t)-\boldsymbol{\beta}_0(t)\|=O_P(n^{-2/5}\sqrt{\log n})$. It follows that $\int_0^1\|\widetilde{\boldsymbol{\beta}}(t)-\boldsymbol{\beta}_0(t)\|^2\,\mathrm{d}t=O_P(n^{-2/5}\sqrt{\log n})$.

For the shape-constrained estimation of $\widehat{\boldsymbol{\beta}}$ under $H_0$, we extend the kernel-weighted least squares approach for nonparametric regression with scalar responses [34], where the side conditions are empirically examined over a fine grid $\mathcal{G}_M=\{t_m\in[0,1]: m=0,1,\ldots,M\}$ with $0=t_0<t_1<\cdots<t_{M-1}<t_M=1$ so that $\hat{\beta}_j$ satisfies the null hypothesis (2.2) over $\mathcal{G}_j=\mathcal{G}|_{\mathcal{I}_j}=\{t_{j_m}\in\mathcal{I}_j: m=0,1,\ldots,M_j\}$ for each $j\in J$. Specifically, if we consider "$H_{0,j}$: $\beta_j$ is monotone increasing on $\mathcal{I}_j$," we propose to substitute

\[ H_{0,j}': \beta_j(t_{j_m})-\beta_j(t_{j_{m-1}})\ge 0 \quad\text{for all}\quad m=1,\ldots,M_j. \tag{2.6} \]

Similarly, we empirically validate the partial hypothesis "$H_{0,j}$: $\beta_j$ is convex on $\mathcal{I}_j$" as

\[ H_{0,j}': \frac{\beta_j(t_{j_m})-\beta_j(t_{j_{m-1}})}{t_{j_m}-t_{j_{m-1}}}\ge \beta_j'(t_{j_{m-1}}) \quad\text{for all}\quad m=1,\ldots,M_j. \tag{2.7} \]

To define the shape-constrained kernel-weighted least squares for the function-on-scalar regression model (2.1) under the empirical null hypothesis on the grid $\mathcal{G}_M$, we note that

\[ \begin{split} \frac{1}{n}\mathcal{L}_n\big(\mathbb{B}(t)\big) &= \frac{1}{n}\sum_{i=1}^{n}\frac{1}{L_i}\sum_{\ell=1}^{L_i}\Big(\mathbb{W}_{i,\ell}(t)^{\top}\big[\mathrm{vec}\big(\mathbb{B}(t)\big)-\mathrm{vec}\big(\widetilde{\mathbb{B}}(t)\big)\big]-\tilde{e}_{i,\ell}(t)\Big)^2 K_h(T_{i,\ell}-t) \\ &= \mathcal{Q}_n\big(\mathbb{B}(t),\widetilde{\mathbb{B}}(t)\big)+\frac{1}{n}\sum_{i=1}^{n}\frac{1}{L_i}\sum_{\ell=1}^{L_i}K_h(T_{i,\ell}-t)\,\tilde{e}_{i,\ell}(t)^2, \end{split} \tag{2.8} \]

where $\tilde{e}_{i,\ell}(t)=Y_{i,\ell}-\mathbb{W}_{i,\ell}(t)^{\top}\mathrm{vec}\big(\widetilde{\mathbb{B}}(t)\big)$ is the unconstrained residual of fitting $Y_{i,\ell}$ and $\mathcal{Q}_n\big(\mathbb{B}(t),\widetilde{\mathbb{B}}(t)\big)=\mathrm{vec}\big(\mathbb{B}(t)-\widetilde{\mathbb{B}}(t)\big)^{\top}\widehat{\Psi}(t)\,\mathrm{vec}\big(\mathbb{B}(t)-\widetilde{\mathbb{B}}(t)\big)$ with $\widehat{\Psi}(t)=n^{-1}\sum_{i=1}^{n}L_i^{-1}\sum_{\ell=1}^{L_i}\mathbb{W}_{i,\ell}(t)\mathbb{W}_{i,\ell}(t)^{\top}K_h(T_{i,\ell}-t)$. To obtain (2.8), we use the fact that the unconstrained estimator $\widetilde{\mathbb{B}}(t)$ solves the estimating equation $\partial\mathcal{L}_n\big(\mathbb{B}(t)\big)/\partial\,\mathrm{vec}\big(\mathbb{B}(t)\big)=0_{2p}$, or equivalently $n^{-1}\sum_{i=1}^{n}L_i^{-1}\sum_{\ell=1}^{L_i}\mathbb{W}_{i,\ell}(t)\,\tilde{e}_{i,\ell}(t)\,K_h(T_{i,\ell}-t)=0_{2p}$, where $0_{2p}=(0,\ldots,0)^{\top}$ denotes the zero vector in $\mathbb{R}^{2p}$. Noting that the minimization of $\mathcal{L}_n\big(\mathbb{B}(t)\big)$ depends only on the first term of (2.8) given the unconstrained estimator $\widetilde{\mathbb{B}}(t)$, we propose the following constrained optimization problem:

\[ \begin{aligned} \text{Minimize}&\quad \sum_{m=0}^{M}\mathrm{vec}\big(\mathbb{B}(t_m)-\widetilde{\mathbb{B}}(t_m)\big)^{\top}\widehat{\Psi}(t_m)\,\mathrm{vec}\big(\mathbb{B}(t_m)-\widetilde{\mathbb{B}}(t_m)\big) \\ \text{subject to}&\quad A_j\left[\begin{array}{c}\mathrm{vec}\big(\mathbb{B}(t_{j_0})\big)\\ \vdots\\ \mathrm{vec}\big(\mathbb{B}(t_{j_{M_j}})\big)\end{array}\right]\ge 0_{M_j} \quad\text{for all}\quad j\in J, \end{aligned} \tag{2.9} \]

where $0_k\in\mathbb{R}^k$ and $O_{k\times\ell}\in\mathbb{R}^{k\times\ell}$ denote the $k$-vector and the $(k,\ell)$-matrix of zeros, respectively, and $A_j\in\mathbb{R}^{M_j\times 2p(M_j+1)}$ is the linear constraint matrix associated with the individual hypothesis $H_{0,j}$. For the monotone increasing hypothesis (2.6), we set $A_j=A_{\mathrm{mono}}(\mathcal{I}_j)\otimes\big[\mathrm{e}_j^{\top},0_p^{\top}\big]$, where $\mathrm{e}_j\in\mathbb{R}^p$ is the unit vector with $1$ only at its $j$-th coordinate and $A_{\mathrm{mono}}(\mathcal{I}_j)\in\mathbb{R}^{M_j\times(M_j+1)}$ is defined as

\[ A_{\mathrm{mono}}(\mathcal{I}_j)=\left[\begin{array}{cccccc} -1 & 1 & 0 & \cdots & 0 & 0 \\ 0 & -1 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & -1 & 1 \end{array}\right]. \tag{2.14} \]

Similarly, for the convexity hypothesis (2.7), we set $A_j=A_{\mathrm{conv},0}(\mathcal{I}_j)\otimes\big[\mathrm{e}_j^{\top},0_p^{\top}\big]-A_{\mathrm{conv},1}(\mathcal{I}_j)\otimes\big[0_p^{\top},\mathrm{e}_j^{\top}\big]$, where $A_{\mathrm{conv},0}(\mathcal{I}_j),A_{\mathrm{conv},1}(\mathcal{I}_j)\in\mathbb{R}^{M_j\times(M_j+1)}$ are defined as

\[ A_{\mathrm{conv},0}(\mathcal{I}_j)=\left[\begin{array}{cccccc} \frac{-1}{t_{j_1}-t_{j_0}} & \frac{1}{t_{j_1}-t_{j_0}} & 0 & \cdots & 0 & 0 \\ 0 & \frac{-1}{t_{j_2}-t_{j_1}} & \frac{1}{t_{j_2}-t_{j_1}} & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & \frac{-1}{t_{j_{M_j}}-t_{j_{M_j-1}}} & \frac{1}{t_{j_{M_j}}-t_{j_{M_j-1}}} \end{array}\right] \tag{2.15} \]

and $A_{\mathrm{conv},1}(\mathcal{I}_j)=\big[I_{M_j},0_{M_j}\big]$. This illustrates that the constrained estimates $\widehat{\mathbb{B}}(t_0),\ldots,\widehat{\mathbb{B}}(t_M)$ can be obtained by standard quadratic programming with $\sum_{j\in J}M_j$ linear constraints.
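To make the quadratic program concrete, the sketch below solves (2.9) for a single monotone-increasing constraint on $\beta_j$, using a general-purpose solver (scipy's SLSQP) in place of a dedicated quadratic programming routine; the inputs are assumed to come from the unconstrained kernel fit on the grid, and all names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def constrained_kernel_fit(B_tilde, Psi, j, idx):
    """B_tilde: (M+1, 2p) array whose rows are vec(B~(t_m)); Psi: list of
    (2p, 2p) weight matrices Psi_hat(t_m); j < p: constrained coordinate;
    idx: indices of the grid points falling in I_j."""
    M1, q = B_tilde.shape

    def objective(b_flat):
        B = b_flat.reshape(M1, q)
        return sum((B[m] - B_tilde[m]) @ Psi[m] @ (B[m] - B_tilde[m])
                   for m in range(M1))

    # side conditions (2.6): beta_j(t_{j_m}) - beta_j(t_{j_{m-1}}) >= 0
    cons = [{"type": "ineq",
             "fun": lambda b, m=m: (b.reshape(M1, q)[idx[m], j]
                                    - b.reshape(M1, q)[idx[m - 1], j])}
            for m in range(1, len(idx))]
    res = minimize(objective, B_tilde.ravel(), constraints=cons, method="SLSQP")
    return res.x.reshape(M1, q)      # constrained vec(B^(t_m)) over the grid
```

For large grids, a dedicated quadratic programming solver would be preferable, but the feasible set being optimized over is the same.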

Remark 2.

Suppose we hypothesize "$H_{0,j}$: $\beta_j$ is monotone increasing on $\mathcal{I}_{j,1}$ and convex on $\mathcal{I}_{j,2}$," where $\mathcal{I}_{j,1}$ and $\mathcal{I}_{j,2}$ may or may not overlap. Then, the side conditions corresponding to the null hypothesis can be examined by imposing both linear constraint matrices $A_{\mathrm{mono}}(\mathcal{I}_{j,1})\otimes\big[\mathrm{e}_j^{\top},0_p^{\top}\big]$ and $A_{\mathrm{conv},0}(\mathcal{I}_{j,2})\otimes\big[\mathrm{e}_j^{\top},0_p^{\top}\big]-A_{\mathrm{conv},1}(\mathcal{I}_{j,2})\otimes\big[0_p^{\top},\mathrm{e}_j^{\top}\big]$.

Theorem 2.2.2.

Suppose the conditions in Proposition 2.2.1 hold. For any $M\ge 1$, let $\mathcal{G}_M=\{t_m\in[0,1]: m=0,1,\ldots,M\}$ be the finite grid used in (2.9) with $0=t_0<t_1<\cdots<t_M=1$. Under the null hypothesis $H_0$ in (2.2), we have

\[ \max_{t\in\mathcal{G}_M}\big\|\widehat{\boldsymbol{\beta}}(t)-\boldsymbol{\beta}_0(t)\big\|=O_P\bigg(h^2+\sqrt{\frac{\log n}{nh}}\bigg). \tag{2.16} \]

Theorem 2.2.2 indicates that the shape-constrained kernel least squares estimator achieves the same rate of convergence as the unconstrained estimator under the null hypothesis. Moreover, suppose the empirical distribution of $\mathcal{G}_M$ converges to the uniform distribution on $[0,1]$. Then, it can be verified that $\int_0^1\|\widehat{\boldsymbol{\beta}}(t)-\boldsymbol{\beta}_0(t)\|\,\mathrm{d}t=O_P(n^{-2/5}\sqrt{\log n})$, similarly to Remark 1.

2.3 Estimation: Shape-constrained regression spline

We now consider fitting the functional regression model by means of spline functions for smooth, flexible, and parsimonious estimation of the coefficient functions. Let $\phi_k(t)$, $k=1,\ldots,d$, denote spline basis functions defined over $[0,1]$ for a given sequence of knots; we then express $\beta_j(t)=\sum_{k=1}^{d}c_{jk}\phi_k(t)$, where the $c_{jk}$ denote basis coefficients. Under shape constraints on $\beta_j(t)$ over $\mathcal{I}_j$, we can still employ the regression spline approach based on suitably chosen basis functions, such as $I$-splines [23], $C$-splines [18], or general $B$-splines, with corresponding constraints assigned to a subset of the basis coefficients $c_{jk}$. While we focus on partially constrained estimation based on $I$- or $C$-splines in this section, estimation via $B$-splines is also discussed in the Supplementary Material S.6.

[23] first introduced $I$-splines for curve estimation under a monotonicity restriction over the entire domain. As illustrated in the left panel of Figure 2, $I$-spline basis functions are piece-wise quadratic, and at each of the knots indicated by dotted vertical lines, exactly one basis function has a non-zero slope. Thus, a monotone increasing (decreasing) spline function can be constructed as a non-negative (non-positive) linear combination of $I$-spline basis functions. For readers interested in the construction of $I$-splines, we refer to [23] for how piece-wise quadratic $I$-splines are formed from the nonnegative $M$-spline family. Then, if the specific aim is to impose, for example, a monotone increasing restriction over a subset $\mathcal{I}_j$ of the domain, the non-negativity condition should apply to the subset of the $c_{jk}$ whose corresponding $I$-splines $\phi_k(t)$ have positive slopes within $\mathcal{I}_j$.

Figure 2: The piece-wise quadratic $I$-spline basis functions and the piece-wise cubic $C$-spline basis functions under equally spaced knots indicated by the symbol '×' along with the dotted vertical lines.

Next, regarding convexity constraints, [18] proposed $C$-splines by integrating $I$-splines. As shown in the right panel of Figure 2, $C$-splines are convex, piece-wise cubic functions with non-zero second derivatives at each knot, implying that a linear combination of $C$-splines with non-negative (non-positive) basis coefficients ensures the convexity (concavity) of the function estimator. See Section 2 of [18] for details. Similarly, for convexity constraints over sub-interval(s) $\mathcal{I}_j$, we can impose non-negativity restrictions on the basis coefficients corresponding to splines with positive second derivatives within $\mathcal{I}_j$.

We now apply the shape-restricted spline technique to the functional regression model framework to estimate functional coefficients under (2.2). Although the methodological and theoretical developments are general to other shape constraints, such as convexity or monotone convexity, we elaborate on estimation under the monotonicity restriction below and refer to the remarks for other restrictions. Let $\phi_k^j(t)$, $k=1,\ldots,d_j$, denote piece-wise quadratic $I$-spline basis functions under a given knot sequence, employed to fit $\beta_j(t)$, $j=1,\ldots,p$. For the partially constrained coefficients $j\in J$, the knot sequence should include the lower and upper boundaries of $\mathcal{I}_j$; when $\mathcal{I}_j$ consists of multiple disjoint sub-intervals, all their boundaries should be part of the knot sequence. We then estimate the functional regression coefficients by minimizing the following objective function subject to non-negativity conditions on a subset of the basis coefficients $c_{jk}$ determined by the sets $J$ and $A_j$:

\[ \begin{aligned} \text{Minimize}&\quad \sum_{i=1}^{n}\frac{1}{L_i}\sum_{\ell=1}^{L_i}\Big[Y_{i,\ell}-\sum_{j=1}^{p}X_{i,j}\Big\{\sum_{k=0}^{d_j}c_{jk}\phi_k^j(T_{i,\ell})\Big\}\Big]^2 \\ \text{subject to}&\quad c_{jk}\ge 0 \quad\text{for}\quad \{(j,k): j\in J~\text{and}~k\in A_j\}, \end{aligned} \tag{2.17} \]

where $\phi_0^j(t)$ indicates the intercept function, i.e., $\phi_0^j(t)=1$, so that $c_{j0}$ represents the vertical location of the monotone part of $\beta_j(t)$; the dimensionality $d_j$ of the splines is determined by the number of knots chosen to approximate $\beta_j$; the set $J$ from (2.2) collects the indices of the coefficients constrained in $H_0$; and the set $A_j$ is defined as $\big\{k\in\{1,\ldots,d_j\}: {\phi_k^j}'(t)>0~\text{for}~t\in\mathcal{I}_j\big\}$.

To obtain the shape-restricted $\hat{\beta}_j(t)$, we first write the regression problem associated with (2.17) as below. For notational simplicity, we elaborate on the case of regular evaluation grids $T_1,\ldots,T_L$ for $i=1,\ldots,n$ and $d_j=d$, but having different $T_{i,\ell}$ or $d_j$ does not affect the methodological developments. Let $\mathrm{vec}\big([\boldsymbol{c}_j]_{j=1}^{p}\big)=(\boldsymbol{c}_1^{\top},\ldots,\boldsymbol{c}_p^{\top})^{\top}$ for $\boldsymbol{c}_j=(c_{j0},c_{j1},\ldots,c_{jd})^{\top}$; then

\[ \mathbf{Y}_i=(\mathbf{X}_i^{\top}\otimes\Phi)\cdot\mathrm{vec}\big([\boldsymbol{c}_j]_{j=1}^{p}\big)+\boldsymbol{\epsilon}_i, \quad i=1,\ldots,n, \tag{2.18} \]

under $c_{jk}\ge 0$ for the $(j,k)$ specified in (2.17), where $\Phi$ denotes the $L\times(d+1)$ basis matrix whose $\ell$-th row is $\big(1,\phi_1(T_\ell),\ldots,\phi_d(T_\ell)\big)$ for $\ell=1,\ldots,L$. The coefficient estimation then amounts to finding the projection of the $nL$-dimensional vector $(\mathbf{Y}_1^{\top},\ldots,\mathbf{Y}_n^{\top})^{\top}$ onto the space spanned by the columns of $(\mathbf{X}\otimes\Phi)$ with the corresponding constraints on $\boldsymbol{c}_j$, where $\mathbf{X}$ denotes the $(n\times p)$ design matrix. Letting $\boldsymbol{\sigma}^h\in\mathbb{R}^{nL}$, $h=1,\ldots,p(d+1)$, denote the column vectors of $(\mathbf{X}\otimes\Phi)$ and $\mathcal{L}(\cdot)$ the linear space spanned by a given set of vectors, we separate out $\boldsymbol{\sigma}^h$, $h\in H_1$, the column vectors associated with basis coefficients carrying no non-negativity constraint, and define $\mathcal{V}=\mathcal{L}\big(\{\boldsymbol{\sigma}^h: h\in H_1\}\big)$ as the linear space spanned by them. We then obtain a set of generating vectors, denoted $\boldsymbol{\delta}^h$ for $h\in H_2$ with $H_2=\{1,\ldots,p(d+1)\}\backslash H_1$, that are orthogonal to $\mathcal{V}$; that is, $\boldsymbol{\delta}^h=\boldsymbol{\sigma}^h-\Pi(\boldsymbol{\sigma}^h|\mathcal{V})$ for $h\in H_2$, where $\Pi$ denotes a projection operator, such as the Gram-Schmidt process. Extending [18], we can characterize the constrained set as

\[ \mathcal{C}=\Big\{\boldsymbol{\mu}\in\mathbb{R}^{nL}: \boldsymbol{\mu}=\sum_{h\in H_1}b_h\boldsymbol{\sigma}^h+\sum_{h\in H_2}b_h\boldsymbol{\delta}^h, ~\text{where}~ b_h\ge 0~\text{for}~h\in H_2\Big\}. \tag{2.19} \]

The projection of $(\mathbf{Y}_1^{\top},\ldots,\mathbf{Y}_n^{\top})^{\top}$ onto $\mathcal{C}$ can then be accomplished by projecting it onto the set of nonnegative linear combinations of $\boldsymbol{\delta}^h$, $h\in H_2$, called the constraint cone, and onto $\mathcal{V}$, separately. We refer readers interested in a more comprehensive description under the nonparametric regression setting to Section 2 of [18]. Because the constraint set $\mathcal{C}$ is a closed convex polyhedral cone in $\mathbb{R}^{nL}$, the projection is unique, and we employ the cone projection algorithm of [19] to find the solution, available as the functions qprog and coneA in the R package coneproj. We note that the unconstrained estimates $\tilde{\beta}_j(t)$ can be calculated from the least squares estimates of $\boldsymbol{c}_j$ under (2.18).
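As an illustration of (2.17)-(2.18) for regular grids, the constrained spline fit can be recast as a bounded least-squares problem, since the side conditions are simple non-negativity bounds on selected coefficients. The sketch below substitutes scipy's lsq_linear for the cone projection algorithm of [19]; it is a sketch under these assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import lsq_linear

def constrained_spline_fit(X, Y, Phi, constrained):
    """X: (n, p) design; Y: (n, L) responses on a common grid; Phi: (L, d+1)
    basis matrix with rows (1, phi_1(T_l), ..., phi_d(T_l)); constrained: set
    of (j, k) pairs whose coefficient c_{jk} must be nonnegative."""
    n, p = X.shape
    d1 = Phi.shape[1]
    design = np.kron(X, Phi)            # block row i equals X_i^T ⊗ Phi, as in (2.18)
    lb = np.full(p * d1, -np.inf)
    for j, k in constrained:            # non-negativity on the selected c_{jk}
        lb[j * d1 + k] = 0.0
    res = lsq_linear(design, Y.ravel(), bounds=(lb, np.inf))
    return res.x.reshape(p, d1)         # row j holds the basis coefficients c_j
```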

Next, we investigate the rates of convergence of the estimated functional coefficients under the constraints. Let $K_{jn}$ denote the number of knots used to fit $\beta_j(t)$, growing with $n$; let $q$ be the order of the spline; and let $\|a\|_{L_2}$ be the $L_2$ norm of a square-integrable function $a(t)$. Under Conditions (S1)-(S6), deferred to Supplementary Material S.2, and $\lim_{n\to\infty}K_n\log K_n/n=0$ for $K_n=\max_{1\le j\le p}K_{jn}$, Theorem 2 of [14] and Theorem 6.25 of [26] imply the consistency of the unconstrained $\widetilde{\boldsymbol{\beta}}(t)=\big(\tilde{\beta}_1(t),\ldots,\tilde{\beta}_p(t)\big)^{\top}$ in the sense that $\|\tilde{\beta}_j-\beta_j\|_{L_2}^2=O_p(K_n n^{-1}+K_n^{-2q})$, $j=1,\ldots,p$. Next, we show the consistency of the constrained estimators by adding condition (S7), specified in the Supplementary Material.

Theorem 2.3.1.

When the shape constraints assigned to $\beta_j$ in $H_0$ are true, $K_{jn}=O(n^{1/(2q+1)})$, and conditions (S1)-(S7) are satisfied, the constrained estimator $\widehat{\boldsymbol{\beta}}(t)=\big(\hat{\beta}_1(t),\ldots,\hat{\beta}_p(t)\big)^{\top}$ attains the same rate as the unconstrained estimator:

\[ \|\hat{\beta}_j-\beta_j\|_{L_2}^2=O_p(K_n n^{-1}+K_n^{-2q}), \quad j=1,\ldots,p, \]

where the minimum rate $O_p(n^{-2q/(2q+1)})$ is achieved when $K_n=O(n^{1/(2q+1)})$.

Remark 3 (Convexity shape constraints).

When a convexity constraint is considered in (2.2) instead of monotonicity, we employ $C$-splines and express the coefficient function in the objective function (2.17) as $\beta_j(T_{i,\ell})=c_{j0,1}\phi_{0,1}(T_{i,\ell})+c_{j0,2}\phi_{0,2}(T_{i,\ell})+\sum_{k=1}^{d_j}c_{jk}\phi_k^j(T_{i,\ell})$, where $\phi_{0,1}(t)$ denotes the intercept function, $\phi_{0,2}(t)$ is the identity function, i.e., $\phi_{0,2}(T_{i,\ell})=T_{i,\ell}$, and $\phi_k^j(t)$ denotes the $C$-splines under a given sequence of knots. The first two terms, with no constraints on $c_{j0,1}$ and $c_{j0,2}$, determine the first-order behavior of the functional coefficients. We then assign the constraints $c_{jk}\ge 0$ for $(j,k)$ with $j\in J$ and $k\in A_j$, where the set $A_j$ is defined as $\{k\in\{1,\ldots,d_j\}: {\phi_k^j}''(t)>0~\text{for}~t\in\mathcal{I}_j\}$. The non-negativity constraints on the basis coefficients associated with $\phi_k^j(t)$ having positive second derivatives within $\mathcal{I}_j$ ensure the convexity of the functional estimates over $\mathcal{I}_j$. For a concavity constraint, we apply the same framework but with non-positivity constraints on the chosen set of $c_{jk}$. In this way, we can consider a general $H_0$ with various types of shape constraints on $\beta_j(t)$ over $\mathcal{I}_j$ for $j\in J$.

2.4 Test procedure

We employ a resampling method to assess the statistical significance of the observed $D_n$ given a random sample $\mathcal{X}_n$. We note that resampling functional observations, say $\widetilde{\mathbf{Y}}_i=\big(\widetilde{Y}_i(T_{i,1}),\ldots,\widetilde{Y}_i(T_{i,L_i})\big)^{\top}$, is not straightforward because $\widetilde{Y}_i(T_{i,1}),\ldots,\widetilde{Y}_i(T_{i,L_i})$ should inherit the joint distribution of $Y_i(T_{i,1}),\ldots,Y_i(T_{i,L_i})$ with within-subject heteroscedasticity. In the literature, the wild bootstrap method [33, 13, 16], which effectively handles heteroscedasticity, has been extensively investigated, and there have been many advances in handling dependent data [3, 7, 29, 11]. In this study, we use the wild bootstrap procedure described in Algorithm 1, which can be regarded as a wild bootstrap for clustered data [3, 7].

Algorithm 1 Bootstrap procedure under the partial shape constraints
1: For a given random sample $\mathcal{X}_n=\{(\mathbf{X}_i,\mathbf{Y}_i): i=1,\ldots,n\}$, find the unconstrained and shape-constrained estimators $\widetilde{\boldsymbol{\beta}}$ and $\widehat{\boldsymbol{\beta}}$, respectively.
2: Set a bootstrap size $B\ge 1$.
3: Compute functional residuals;
4: for $i=1$ to $n$ do
5:     for $\ell=1$ to $L_i$ do
6:         $\tilde{e}_{i,\ell}\leftarrow Y_{i,\ell}-\mathbf{X}_i^{\top}\widetilde{\boldsymbol{\beta}}(T_{i,\ell})$.
7:     end for
8: end for
9: Generate $B$ bootstrap samples under the null hypothesis;
10: for $b=1$ to $B$ do
11:     for $i=1$ to $n$ do
12:         for $\ell=1$ to $L_i$ do
13:             Sample $v_{i,\ell}^{(b)}\in\big\{-\frac{\sqrt{5}-1}{2},\frac{\sqrt{5}+1}{2}\big\}$ with probabilities $\big(\frac{\sqrt{5}+1}{2\sqrt{5}},\frac{\sqrt{5}-1}{2\sqrt{5}}\big)$.
14:             $\widetilde{Y}_{i,\ell}^{(b)}\leftarrow\mathbf{X}_i^{\top}\widehat{\boldsymbol{\beta}}(T_{i,\ell})+v_{i,\ell}^{(b)}\tilde{e}_{i,\ell}$.
15:         end for
16:     end for
17:     $\mathcal{X}_n^{(b)}\leftarrow\{(\widetilde{\mathbf{Y}}_i^{(b)},\mathbf{X}_i): i=1,\ldots,n\}$.
18:     Find the shape-constrained $\widehat{\boldsymbol{\beta}}^{(b)}$ and unconstrained $\widetilde{\boldsymbol{\beta}}^{(b)}$ based on $\mathcal{X}_n^{(b)}$.
19:     Compute $D_n^{(b)}$ based on $\widehat{\boldsymbol{\beta}}^{(b)}$, $\widetilde{\boldsymbol{\beta}}^{(b)}$, and $\mathcal{X}_n^{(b)}$.
20: end for

Once the $B$ bootstrap samples $\mathcal{X}_n^{(1)},\ldots,\mathcal{X}_n^{(B)}$ are independently generated, we calculate the shape-constrained and unconstrained estimates $\widehat{\boldsymbol{\beta}}^{(b)}$ and $\widetilde{\boldsymbol{\beta}}^{(b)}$ and the test statistic $D_n^{(b)}$ for each $b=1,\ldots,B$. These bootstrap test statistics are compared to the original test statistic $D_n$ to determine the bootstrap $p$-value, $p_n=B^{-1}\sum_{b=1}^{B}\mathbb{I}(D_n^{(b)}>D_n)$. We then reject the null hypothesis $H_0$ if $p_n<\alpha$ and retain $H_0$ otherwise, at the significance level $\alpha\in(0,1)$. Our numerical study (Section 3) demonstrates that the proposed test procedure is consistent in the sense that the type I error rate meets the specified significance level and the power of the test increases with the sample size.
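The sketch below mirrors Algorithm 1 end to end, assuming `fit_constrained` and `fit_unconstrained` return coefficient estimates as callables and `test_statistic` computes $D_n$ as in (2.3); all three are placeholders for either the kernel or the spline method described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_pvalue(X, T_list, Y_list, fit_constrained, fit_unconstrained,
                     test_statistic, B=200):
    beta_hat = fit_constrained(X, T_list, Y_list)      # estimate under H_0
    beta_tilde = fit_unconstrained(X, T_list, Y_list)  # unconstrained estimate
    D_n = test_statistic(X, beta_hat, beta_tilde)
    # two-point wild bootstrap weights with mean 0 and variance 1 (Algorithm 1, step 13)
    v_vals = np.array([-(np.sqrt(5) - 1) / 2, (np.sqrt(5) + 1) / 2])
    v_prob = np.array([(np.sqrt(5) + 1) / (2 * np.sqrt(5)),
                       (np.sqrt(5) - 1) / (2 * np.sqrt(5))])
    count = 0
    for _ in range(B):
        Y_boot = []
        for Ti, Yi, Xi in zip(T_list, Y_list, X):
            resid = Yi - Xi @ beta_tilde(Ti)              # unconstrained residuals
            v = rng.choice(v_vals, size=len(Ti), p=v_prob)
            Y_boot.append(Xi @ beta_hat(Ti) + v * resid)  # resample under H_0
        bh = fit_constrained(X, T_list, Y_boot)
        bt = fit_unconstrained(X, T_list, Y_boot)
        count += test_statistic(X, bh, bt) > D_n
    return count / B                                      # bootstrap p-value p_n
```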

3 Simulation Study

We demonstrate the finite-sample performance of the proposed methods with numerical simulations. As described in Section 2, functional observations are assumed to be only partially available in our simulation study. Specifically, we first generate subject-specific evaluation points $\mathbf{T}_i=(T_{i,1},\ldots,T_{i,L_i})$ independently drawn from $\mathrm{Beta}(1,1.25)$, a right-skewed distribution on $[0,1]$. Here, $L_i$ is a uniform random integer on $[5,15]$, representing a bounded number of sparse and irregular observations. Then, the longitudinal observations $\mathbf{Y}_i=\big(Y_i(T_{i,1}),\ldots,Y_i(T_{i,L_i})\big)^{\top}$ of each functional response $Y_i$ are generated by the model (2.1) with the four functional coefficients $\{\beta_j: j=1,\ldots,4\}$ depicted in Figure 3(a), where the random vector $\mathbf{X}_i=(X_{i,1},\ldots,X_{i,4})^{\top}$ of scalar covariates is given by $X_{i,j}=(U_{i,j}+0.5\,U_{i,5})/(1+0.5)$ with independent uniform random variables $U_{i,1},\ldots,U_{i,5}$ on $[0,1]$. This makes the four components $X_{i,1},\ldots,X_{i,4}$ correlated. Moreover, to introduce within-subject dependency of the longitudinal observations, we set the functional noise as $\varepsilon_i(t)=\sum_{k=1}^{50}(\xi_{i,k}/\sqrt{k})\sqrt{2}\sin(2k\pi t)+\epsilon_i(t)$, where $\boldsymbol{\xi}_i=(\xi_{i,1},\ldots,\xi_{i,50})^{\top}$ has a multivariate normal distribution with mean zero and $\mathrm{Cov}(\xi_{i,k},\xi_{i,k'})=0.5^{|k-k'|}$, and $\{\epsilon_i(t): t\in[0,1]\}$ is a Gaussian white noise process with mean zero and $\mathrm{Var}(\epsilon_i(t))=0.1^2$. We thus obtain a random sample $\mathcal{X}_n=\{(\mathbf{X}_i,\mathbf{Y}_i): i=1,\ldots,n\}$ of size $n\in\{100,500\}$.
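For concreteness, a sketch of this data-generating mechanism is given below; `beta_fns` is a placeholder list of the four coefficient functions (the exact shapes of Figure 3(a) are not reproduced here).

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_sample(n, beta_fns):
    """Generate (X, T_list, Y_list) following the simulation design of Section 3."""
    U = rng.uniform(size=(n, 5))
    X = (U[:, :4] + 0.5 * U[:, [4]]) / 1.5            # correlated scalar covariates
    ks = np.arange(1, 51)
    Sigma = 0.5 ** np.abs(ks[:, None] - ks[None, :])  # Cov(xi_k, xi_k') = 0.5^|k-k'|
    T_list, Y_list = [], []
    for i in range(n):
        L = rng.integers(5, 16)                       # uniform on {5, ..., 15}
        T = rng.beta(1.0, 1.25, size=L)               # right-skewed design points
        xi = rng.multivariate_normal(np.zeros(50), Sigma)
        eps = (xi / np.sqrt(ks)) @ (np.sqrt(2) * np.sin(2 * np.pi * np.outer(ks, T)))
        eps += rng.normal(scale=0.1, size=L)          # Gaussian white-noise component
        signal = sum(X[i, j] * beta_fns[j](T) for j in range(4))
        T_list.append(T)
        Y_list.append(signal + eps)
    return X, T_list, Y_list
```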

Figure 3: True regression coefficient functions associated with the function-on-scalar regression model (2.1) are depicted in the left panel (a). The right panel (b) illustrates empirical sample paths of 50 functional responses $\mathbf{Y}_i$ generated with right-skewed evaluation points $\mathbf{T}_i$. We consider testing monotone-increasing hypotheses of $\beta_1$ and $\beta_3$ on sub-intervals of inferential interest.

We consider testing a composite null hypothesis

\[ H_0: \beta_1 \text{ and } \beta_3 \text{ are monotone increasing on } \mathcal{I}, \tag{3.1} \]

where $\mathcal{I}\subset[0,1]$ is a pre-specified sub-interval to which the partial shape constraints of interest apply. This setup is intended to mimic a common situation in longitudinal studies where patients drop out of the study or transfer to other clinics. For example, as the vertical rugs in Figure 3(b) show that the evaluation points are right-skewed, fewer observations are available near $t=1$. Still, verifying partial shape constraints on the regression coefficient functions near $t=1$ can be of inferential interest. For a systematic assessment of the numerical experiments, we consider ten simulation scenarios for the sub-interval: $\mathcal{I}=[0,0.5]$, $[0.1,0.5]$, $[0.2,0.5]$, $[0.3,0.5]$, $[0.4,0.5]$, $[0.5,1]$, $[0.6,1]$, $[0.7,1]$, $[0.8,1]$, and $[0.9,1]$. From the graphical illustration of the true regression coefficient functions in Figure 3(a), we note that the null hypothesis (3.1) is true if $\mathcal{I}\subset[0,0.5]$ and false otherwise.

To demonstrate the consistency of the proposed method across the different scenarios of $\mathcal{I}$, we evaluate the power of the test by the Monte Carlo approximation

\[ \gamma_\alpha(\mathcal{I})=\frac{1}{R}\sum_{r=1}^{R}\mathbb{I}\big(p_n^{(r)}<\alpha\big) \tag{3.2} \]

at the significance level $\alpha\in(0,1)$, where $p_n^{(r)}$ is the $p$-value obtained in the $r$-th Monte Carlo repetition. Since $\gamma_\alpha(\mathcal{I})$ gives the empirical level of the test under the null hypothesis (2.2), we can validate how well the proposed test keeps the specified test level for different significance levels. Under the general alternative, $\gamma_\alpha(\mathcal{I})$ corresponds to the empirical power of the test, and we can examine how fast $\gamma_\alpha(\mathcal{I})$ approaches 1 as the sample size increases.

Besides the consistency of the partially shape-constrained inference, we evaluate the consistency of estimation with the integrated squared bias (ISB) and the integrated variance (IVar),

\[ \mathrm{ISB}(\mathcal{I})=\frac{1}{|\mathcal{I}|}\int_{\mathcal{I}}\Big\|\overline{\widehat{\boldsymbol{\beta}}}(t)-\boldsymbol{\beta}(t)\Big\|^2\,\mathrm{d}t, \quad \mathrm{IVar}(\mathcal{I})=\frac{1}{R}\sum_{r=1}^{R}\frac{1}{|\mathcal{I}|}\int_{\mathcal{I}}\Big\|\widehat{\boldsymbol{\beta}}^{(r)}(t)-\overline{\widehat{\boldsymbol{\beta}}}(t)\Big\|^2\,\mathrm{d}t, \tag{3.3} \]

where $\overline{\widehat{\boldsymbol{\beta}}}(t)=R^{-1}\sum_{r=1}^{R}\widehat{\boldsymbol{\beta}}^{(r)}(t)$ is the average of the shape-constrained estimates obtained from the repeated Monte Carlo experiments. In (3.3), we normalize the ISB and IVar by $|\mathcal{I}|$ to ease comparison of the numerical performance across sub-intervals of different lengths. Combining (3.2) and (3.3), we can assess the sensitivity and specificity of the proposed test procedure with respect to the location and length of the sub-interval. We mainly report the simulation results obtained with $B=200$ and $R=200$, but our background simulation study gave the same lessons with larger $B$ and $R$. We refer the readers to Section S.3 of the Supplementary Materials for implementation details, including the bandwidth and knot selection for the shape-constrained kernel smoothing and regression spline, respectively.
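A short sketch of the criteria in (3.3) is given below, assuming the $R$ constrained estimates have been evaluated on a common equally spaced grid; the array shapes are illustrative.

```python
import numpy as np

def isb_ivar(beta_reps, beta_true, t, a, b):
    """beta_reps: (R, p, G) constrained estimates over a grid t of length G;
    beta_true: (p, G) true coefficients; [a, b]: the sub-interval I."""
    mask = (t >= a) & (t <= b)
    tt = t[mask]
    bar = beta_reps.mean(axis=0)                      # pointwise Monte Carlo average
    isb = np.trapz(((bar - beta_true)[:, mask] ** 2).sum(axis=0), tt) / (b - a)
    ivar = np.mean([np.trapz(((br - bar)[:, mask] ** 2).sum(axis=0), tt)
                    for br in beta_reps]) / (b - a)   # normalized by |I| = b - a
    return isb, ivar
```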

We also consider a sub-cohort analysis as a comparative study, shedding light on another essential feature of the proposed method. As mentioned in the Introduction, one may argue that it is more appropriate to simply treat (3.1) as the global shape constraints "$\beta_1|_{\mathcal{I}}$ and $\beta_3|_{\mathcal{I}}$ are globally monotone increasing" under the model (2.1) restricted to $\mathcal{I}$. To address this concern, we conduct the sub-cohort analysis using the same inferential procedure as the proposed method but utilizing only the partial data $\mathcal{X}_n|_{\mathcal{I}}=\{(\mathbf{Y}_i|_{\mathcal{I}},\mathbf{X}_i): i=1,\ldots,n\}$, where $\mathbf{Y}_i|_{\mathcal{I}}=\{Y_{i,\ell}: T_{i,\ell}\in\mathcal{I},\,\ell=1,\ldots,L_i\}$. In contrast to the sub-cohort analysis, we call our proposed method the "full cohort" analysis since it uses all available observations in $\mathcal{X}_n$ for estimation regardless of $\mathcal{I}$. As seen below, our simulation results indicate that the sub-cohort analysis suffers from a significant loss of information: the farther $\mathcal{I}$ lies in the second half of the domain, the smaller the statistical power one may expect. The situation becomes much worse depending on the length of $\mathcal{I}$ and the sparsity of functional evaluations on it. Through our simulation study, we also demonstrate that the proposed test procedure is robust against the location of the sub-interval.

Table 1: Simulation results for the partially shape-constrained estimates corresponding to the kernel and spline methods. The estimation and test performance are evaluated with the integrated squared bias (ISB), the integrated variance (IVar), the type I error rate, and the power of the test, defined in (3.2) and (3.3). The null hypothesis is $H_0$: $\beta_1$ and $\beta_3$ are monotone increasing on $\mathcal{I}$.

                                                        Kernel least squares           Spline regression
Sample size  Dataset         Criterion                I=[0,0.5]     I=[0.4,0.5]     I=[0,0.5]     I=[0.4,0.5]
n=100        Full data       (γ_0.05, γ_0.1)          (0.06, 0.11)  (0.05, 0.11)    (0.04, 0.07)  (0.08, 0.11)
                             ISB                      0.0302        0.0073          0.0101        0.0033
                             IVar                     0.4034        0.5346          0.3737        0.2054
             Partial data    (γ_0.05, γ_0.1)          (0.06, 0.11)  (0.09, 0.19)    (0.05, 0.11)  (0.06, 0.12)
             restricted      ISB                      0.0517        0.0768          0.0176        0.1156
             to I            IVar                     0.5335        1.5028          0.5758        1.6896
n=500        Full data       (γ_0.05, γ_0.1)          (0.05, 0.10)  (0.05, 0.09)    (0.04, 0.08)  (0.05, 0.08)
                             ISB                      0.0102        0.0018          0.0017        0.0016
                             IVar                     0.1010        0.1135          0.0946        0.0572
             Partial data    (γ_0.05, γ_0.1)          (0.04, 0.10)  (0.07, 0.13)    (0.05, 0.11)  (0.04, 0.08)
             restricted      ISB                      0.0129        0.0110          0.0022        0.0109
             to I            IVar                     0.1497        0.3229          0.1168        0.2798

We report the simulation results in Table 1 and Figure 4, highlighting the main lessons from the power analysis and estimation performance. Additional simulation results are provided in Section S.7 of the Supplementary Materials. The proposed partially shape-constrained inference (full cohort analysis) outperforms the sub-cohort analysis with global shape constraints. Specifically, Table 1 shows that both the kernel and spline methods yield consistent estimates in the sense that ISB and IVar decrease as the sample size increases, which is well aligned with the large-sample properties investigated in Section 2. Figure 4 shows that the proposed test procedure consistently meets the specified level of the test ($\alpha=0.05, 0.1$) across the different scenarios with sub-interval lengths varying from 0.1 to 0.5.

We note that the proposed method keeps the significance level under the null hypothesis and generally attains greater power under the general alternative, while the sub-cohort analysis often violates the pre-specified level of the test under the null hypothesis. For the sub-interval $\mathcal{I}=[0.9,1]$, the sub-cohort analysis appears more sensitive to the general alternative than the full cohort analysis. However, as shown in Figure 1(b) of the Supplementary Materials, the shape-constrained estimates of the sub-cohort analysis suffer from significant bias due to the small sample size of $\mathcal{X}_n|_{\mathcal{I}=[0.9,1]}$. A similar issue arises with the sub-interval $\mathcal{I}=[0.4,0.5]$, where the sub-cohort analysis exhibits a higher type I error rate than the full cohort analysis. Therefore, we do not recommend the sub-cohort analysis for verifying partial shape constraints over short sub-intervals.

Figure 4: The power analysis of the partially shape-constrained kernel method (2.9) subject to the shape constraints (3.1). The sensitivity and specificity are evaluated at the 5% significance level under different scenarios for the location and length of the sub-interval: $\mathcal{I}=[0,0.5]$, $[0.1,0.5]$, $[0.2,0.5]$, $[0.3,0.5]$, $[0.4,0.5]$, $[0.5,1]$, $[0.6,1]$, $[0.7,1]$, $[0.8,1]$, and $[0.9,1]$.

4 Real Data Applications

We illustrate the application of shape-constrained inference to two datasets from clinical trials and demonstrate how it characterizes treatment efficacy over time. The proposed tools help summarize dynamic efficacy patterns so that practitioners can better understand when the maximum treatment effect is achieved and decide on appropriate treatment frequencies. Compared to existing studies that draw such conclusions from visual inspection of estimated regression coefficient functions or from inference over the entire domain, our method provides statistical evidence for refined conclusions on any sub-intervals of interest. Although we illustrate examples from clinical trials, the proposed tool can be applied to any field with similar interests.

4.1 Application 1: Cervical Dystonia Dataset

Cervical dystonia is a painful neurological condition that causes the head to twist or turn to one side because of involuntary muscle contractions in the neck. Although this rare disorder can occur at any age, it most often occurs in middle-aged people, women more often than men. The dataset comes from a randomized placebo-controlled trial of botulinum toxin (botox) B, in which 109 participants across 9 sites were assigned to placebo (36 subjects), 5,000 units of botox B (36 subjects), or 10,000 units of botox B (37 subjects), injected into the affected muscle to partially paralyze it and make it relax. The response variable is the score on the Toronto Western Spasmodic Torticollis Rating Scale (TWSTRS), measuring the severity, pain, and disability of cervical dystonia, where higher scores mean more impairment on a 0-87 scale. The TWSTRS was administered at baseline (week 0) and at weeks 2, 4, 8, 12, and 16 thereafter. The study was originally conducted by [2], and 96.5% of subjects were followed up at the designated weeks, on average. In addition, the dataset contains information on the age and sex of the subjects. The data are available in R through the package 'medicaldata' (https://github.com/higgi13425/medicaldata). While [2] and [8] focus on demonstrating the efficacy of botox B for cervical dystonia during the weeks of the clinical trial under longitudinal models, our application aims to assess the pattern of drug efficacy over the weeks by applying shape-constrained inference on various sub-intervals and to investigate when the maximum efficacy is achieved. To do this, we introduce the indicator variable $D_i$, where $D_i=1$ if the drug is assigned to the $i$th subject and $D_i=0$ otherwise. We then consider the following regression model:

\[ Y_i(t)=\beta_0(t)+\beta_1(t)D_i+\beta_2(t)\,\texttt{Age}_i+\beta_3(t)\,\texttt{Sex}_i+\epsilon_i(t) \quad (t\in[0,16]). \tag{4.1} \]
Table 2: $p$-values from the Cervical Dystonia data under the corresponding shape-constrained null hypotheses with given sub-intervals, using kernel-based and spline-based estimation.

Hypothesis                                            Kernel-based   Spline-based
$\beta_1(t)$ is monotone decreasing on [0,4]          0.36           0.45
$\beta_1(t)$ is monotone increasing on [0,4]          0.03           0.02
$\beta_1(t)$ is monotone decreasing on [4,16]         0.07           0.01
$\beta_1(t)$ is monotone increasing on [4,16]         0.84           0.48

Table 2 contains the results of the monotonicity tests, $H_0$: $\beta_1(t)$ is monotone decreasing (increasing), over the two sub-intervals $\mathcal{I}=[0,4]$ and $[4,16]$, from the kernel- and spline-based tests using a bootstrap size of $B=500$. The spline-based approach adopts piece-wise quadratic $I$-spline basis functions with one internal knot located at $t=8$, determined by cross-validation as detailed in Supplementary Material S.3. To estimate $\beta_1(t)$ with a shape restriction over $\mathcal{I}=[0,4]$ or $[4,16]$, we place one additional knot at $t=4$, only for $\beta_1(t)$, to represent the desired shape within the given boundaries. The kernel-based results are obtained from estimates based on the optimal bandwidth $h\approx 3.2$ ($=0.2\times 16$), with the detailed selection criteria discussed in the Supplementary Material. We then observe that both the kernel- and spline-based tests conclude a significant decrease in $\beta_1(t)$ from weeks 0 to 4 by rejecting $H_0$: $\beta_1(t)$ is monotone increasing on $[0,4]$, but declare an increasing pattern from weeks 4 to 16 by rejecting $H_0$: $\beta_1(t)$ is monotone decreasing on $[4,16]$ at the significance level 0.1. This conclusion implies improving efficacy of botox B during the first 4 weeks but diminishing effectiveness thereafter. The finding is all the more convincing given that the conclusions agree regardless of the choice of shape constraints in the null hypotheses, either monotone decreasing or increasing, and of the estimation method.

Figure 5: Estimated coefficient functions from Cervical Dystonia data. Solid lines are constrained estimates with a decreasing restriction over weeks 0 to 4 and an increasing restriction over weeks 4 to 16 on the treatment effect, and dotted lines are unconstrained spline estimates; knot locations are indicated by the symbol ‘×’.

Figure 5 displays the estimated functional coefficients under spline estimation with the chosen optimal selection of knots in each panel. Following the inferential conclusion on the treatment effect, $\beta_1(t)$ is estimated under the decreasing constraint over weeks 0 to 4 and under the increasing constraint over weeks 4 to 16. The unconstrained and constrained models are fitted using the same knots, except that one extra knot at week 4 is added for $\beta_1(t)$ because the shape constraint changes at week 4. We observe a positive coefficient at week 0, presumably due to a slight imbalance from the random assignment in the trial: the mean response scores at baseline (week 0) are 43.5 and 46.7 for the placebo and treatment groups, respectively. Although patients in the treatment group show slightly higher scores at the beginning, the decreasing trend over the first four weeks, i.e., improving efficacy, is clear from the estimated function. After week 4, the degree of effectiveness weakens, and by week 16 the treatment effect vanishes, returning to the level observed at week 0. While the shape constraints yield a precise fit for $\beta_1(t)$, the remaining regression coefficient estimates for age and sex, $\beta_2(t)$ and $\beta_3(t)$, are similar with or without constraints on $\beta_1(t)$. In addition to the conclusion on overall treatment efficacy as in [2], we obtain a comprehensive summary of treatment efficacy based on refined conclusions over the sub-intervals of interest, which further sharpens the estimation.

4.2 Application 2: Mental Health Schizophrenia Collaborative Study

We next apply the proposed method to the data from the National Institute of Mental Health Schizophrenia Collaborative Study, where 437 patients were randomized to receive either a placebo or an anti-psychotic drug, followed by longitudinal monitoring. [1] and [12] analyzed these data to assess the efficacy of the drug using Item 79, ‘Severity of Illness,’ of the Inpatient Multidimensional Psychiatric Scale (IMPS), ranging from 1 (normal, not ill at all) to 7 (among the most extremely ill), evaluated at weeks 0, 1, 3, and 6 under the protocol, with additional measurements for some patients at weeks 2, 4, and 5. There were 108 and 329 patients in the placebo and treatment groups, respectively, with on average 89.8% of measurements collected during the protocol weeks and 2.6% during other weeks. The data are available in R (package ‘mixor’). As in [1] and [12], we consider the regression model $Y_i(t)=\beta_0(t)+\beta_1(t)D_i+\epsilon_i(t)$, $t\in[0,6]$, where $Y_i(t)$ denotes the illness severity measurement of the $i$th subject at week $t$, and $\beta_1(t)$ represents the effect of drug treatment over weeks with the indicator variable $D_i$; $D_i=1$ for individuals assigned to the treatment group. As mentioned in the Introduction, although [1] and [12] demonstrated improving effectiveness over the entire domain, from week 0 to 6, by concluding that $\beta_1(t)$ is significantly decreasing using their respective inferential tools under shape-constrained estimation, we are further interested in the efficacy dynamics in later weeks and apply sub-interval tests, as in Section 4.1. This is motivated by Figure 4 of [1], which displays a clear decreasing trend in the estimated $\beta_1(t)$ with narrow confidence bands over weeks 0–3, but a flattening phase afterward with widening confidence intervals. By applying sub-interval tests over weeks 0 to 3 and weeks 3 to 6, we examine patterns in efficacy beyond overall drug effectiveness.

Table 3: $p$-values from Schizophrenia data under corresponding shape-constrained null hypotheses with given sub-intervals using kernel-based and spline-based estimation.

Hypothesis                                       p-value (kernel-based)   p-value (spline-based)
β1(t) is monotone decreasing on [0,3]                     0.71                     0.26
β1(t) is monotone increasing on [0,3]                     0.05                     0.00
β1(t) is monotone decreasing on [3,6]                     0.65                     0.15
β1(t) is monotone increasing on [3,6]                     0.62                     0.11

Table 3 provides $p$-values for $H_0$: $\beta_1(t)$ is monotone increasing (decreasing) over $\mathcal{I}$, with bootstrap size $B=500$. The optimal bandwidth for the kernel estimates is $h\approx 1.8$ $(=0.3\times 6)$, and for the spline estimates we adopt piecewise $I$-splines with two internal knots located at $t=1$ and $t=4$. Both kernel- and spline-based tests conclude that $\beta_1(t)$ is significantly decreasing during weeks 0–3, supported by $p$-values less than or equal to the significance level $\alpha=0.05$ for the null hypothesis of monotone increase. Consistently, the null hypothesis of monotone decrease over weeks 0–3 is not rejected by either test. However, the sub-interval tests over weeks 3–6 find insufficient statistical evidence to reject either the monotone increasing or the monotone decreasing trend. Thus, we conclude that $\beta_1(t)$ remains at a constant level over this period, implying that the maximum drug efficacy is achieved at week 3 and that this degree of effectiveness persists until the end of the experiment. The estimated drug effect is illustrated in Figure 1, together with the estimates with no constraints and with decreasing constraints over the entire domain. As discussed in Section 1, the latter fit corresponds to the inferential findings of [1] and [12].

References

  • [1] M. Ahkim, I. Gijbels, and A. Verhasselt. Shape testing in varying coefficient models. Test, 26:429–450, 2017.
  • [2] A. Brashear, M. Lew, D. Dykstra, C. Comella, S. Factor, R. Rodnitzky, R. Trosch, C. Singer, M. Brin, and J. Murray. Safety and efficacy of NeuroBloc (botulinum toxin type B) in type A–responsive cervical dystonia. Neurology, 53(7):1439–1439, 1999.
  • [3] A. C. Cameron, J. B. Gelbach, and D. L. Miller. Bootstrap-based improvements for inference with clustered errors. The Review of Economics and Statistics, 90(3):414–427, 2008.
  • [4] Z. Chen, Q. Gao, B. Fu, and H. Zhu. Monotone nonparametric regression for functional/longitudinal data. Statistica Sinica, 29(4):2229–2249, 2019.
  • [5] D. Chetverikov, A. Santos, and A. M. Shaikh. The econometrics of shape restrictions. Annual Review of Economics, 10(1):31–63, 2018.
  • [6] J. A. Cuesta-Albertos, E. García-Portugués, M. Febrero-Bande, and W. González-Manteiga. Goodness-of-fit tests for the functional linear model based on randomly projected empirical processes. The Annals of Statistics, 47(1):439–467, 2019.
  • [7] R. Davidson and E. Flachaire. The wild bootstrap, tamed at last. Journal of Econometrics, 146(1):162–169, 2008.
  • [8] C. S. Davis. Statistical Methods for the Analysis of Repeated Measurements. Springer Texts in Statistics. New York, NY: Springer Nature, 1 edition, 2003.
  • [9] L. Dümbgen. Shape-constrained statistical inference. Annual Review of Statistics and Its Application, 11, 2024.
  • [10] Y. Fan and E. Guerre. Multivariate local polynomial estimators: Uniform boundary properties and asymptotic linear representation. In Essays in Honor of Aman Ullah, volume 36, pages 489–537. Emerald Group Publishing Limited, 2016.
  • [11] S. Friedrich, F. Konietschke, and M. Pauly. A wild bootstrap approach for nonparametric repeated measurements. Computational Statistics & Data Analysis, 113:38–52, 2017.
  • [12] R. Ghosal, S. Ghosh, J. Urbanek, J. A. Schrack, and V. Zipunnikov. Shape-constrained estimation in functional regression with Bernstein polynomials. Computational Statistics & Data Analysis, 178:107614, 2023.
  • [13] W. Härdle and E. Mammen. Bootstrap methods in nonparametric regression. In Nonparametric functional estimation and related topics, pages 111–123. Springer, 1991.
  • [14] J. Z. Huang, C. O. Wu, and L. Zhou. Polynomial spline estimation and inference for varying coefficient models with longitudinal data. Statistica Sinica, 14(3):763–788, 2004.
  • [15] R. J. Kuczmarski. 2000 CDC growth charts for the United States: Methods and development. Number 246. Department of Health and Human Services, Centers for Disease Control and Prevention, National Center for Health Statistics, 2002.
  • [16] E. Mammen. Bootstrap and wild bootstrap for high dimensional linear models. The Annals of Statistics, 21(1):255–285, 1993.
  • [17] E. Mammen, J. S. Marron, B. A. Turlach, and M. P. Wand. A general projection framework for constrained smoothing. Statistical Science, 16(3):232–248, 2001.
  • [18] M. C. Meyer. Inference using shape-restricted regression splines. The Annals of Applied Statistics, 2(3):1013–1033, 2008.
  • [19] M. C. Meyer. A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics - Simulation and Computation, 42(5):1126–1139, 2013.
  • [20] M. C. Meyer. A framework for estimation and inference in generalized additive models with shape and order restrictions. Statistical Science, 33(4):595–614, 2018.
  • [21] J. S. Morris. Functional regression. Annual Review of Statistics and Its Application, 2:321–359, 2015.
  • [22] Y. Park, K. Han, and D. G. Simpson. Testing linear operator constraints in functional response regression with incomplete response functions. Electronic Journal of Statistics, 17(2):3143–3180, 2023.
  • [23] J. O. Ramsay. Monotone regression splines in action. Statistical Science, 3(4):425–441, 1988.
  • [24] P. T. Reiss, J. Goldsmith, H. L. Shang, and R. T. Ogden. Methods for scalar-on-function regression. International Statistical Review, 85(2):228–249, 2017.
  • [25] R. J. Samworth and B. Sen. Special issue on “Nonparametric inference under shape constraints”. Statistical Science, 33(4):469–472, 2018.
  • [26] L. Schumaker. Spline Functions: Basic Theory. Cambridge Mathematical Library. Cambridge University Press, 3 edition, 2007.
  • [27] E. Seijo and B. Sen. Nonparametric least squares estimation of a multivariate convex regression function. The Annals of Statistics, 39(3):1633–1657, 2011.
  • [28] B. Sen and M. Meyer. Testing against a linear regression model using ideas from shape-restricted estimation. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(2):423–448, 2017.
  • [29] X. Shao. The dependent wild bootstrap. Journal of the American Statistical Association, 105(489):218–235, 2010.
  • [30] Q. Shen and J. Faraway. An $F$ test for linear models with functional responses. Statistica Sinica, 14:1239–1257, 2004.
  • [31] E. Slutsky. Sulla teoria del bilancio del consumatore. Giornale degli economisti e rivista di statistica, pages 1–26, 1915.
  • [32] J.-L. Wang, J.-M. Chiou, and H.-G. Müller. Functional data analysis. Annual Review of Statistics and Its Application, 3:257–295, 2016.
  • [33] C.-F. J. Wu. Jackknife, bootstrap and other resampling methods in regression analysis. The Annals of Statistics, 14(4):1261–1295, 1986.
  • [34] D. Yagi, Y. Chen, A. L. Johnson, and T. Kuosmanen. Shape-constrained kernel-weighted least squares: Estimating production functions for Chilean manufacturing industries. Journal of Business & Economic Statistics, 38(1):43–54, 2020.
  • [35] J.-T. Zhang. Statistical inferences for linear models with functional responses. Statistica Sinica, 21:1431–1451, 2011.
  • [36] J.-T. Zhang and J. Chen. Statistical inferences for functional data. The Annals of Statistics, 35(3):1052–1079, 2007.

Supplementary materials:
Statistical inference on partially shape-constrained
function-on-scalar linear regression models

S.1 Technical conditions for kernel estimates

For the minimum and maximum measurement frequencies $\lambda_n=\min_{1\leq i\leq n}L_i$ and $\Lambda_n=\max_{1\leq i\leq n}L_i$, let $N=n\lambda_n$. We list the technical conditions for Proposition 2.2.1 and Theorem 2.2.2.

  1. (K0) The density $\pi$ of $T$ is continuously differentiable and strictly positive.

  2. (K1) $K$ is a symmetric probability density function that is Lipschitz continuous on $[-1,1]$.

  3. (K2) $E\|\varepsilon\|_{\infty}^{k}<\infty$ for some $k>2$.

  4. (K3) $h\asymp N^{-a}$ for some $0<a<1-2/k$.

  5. (K4) $\lambda_n=O(1)$ and $\Lambda_n=O\big((N^{b}/\log N)^{k/2}\big)$ for some $0<b<1-(2/k)-a$.

(K1)–(K3) are standard conditions for analyzing the large-sample properties of the kernel estimator with an exponential inequality. (K4) is a technical condition that ensures the uniform convergence of the unconstrained kernel least squares estimator even if the minimum number $\lambda_n$ of longitudinal observations for some subjects stays bounded. Still, we require $\Lambda_n$ to increase so that the collection $\{T_{i,\ell}:1\leq i\leq n,\ \ell=1,\ldots,L_i\}$ of sampling points from all subjects densely covers the entire domain $[0,1]$. For instance, with $k=4$ finite moments in (K2) and the bandwidth rate $a=1/5$ in (K3), condition (K4) permits any $0<b<3/10$.

S.2 Technical conditions for spline estimates

  1. (S1) Assume $\beta_j(t)\in C^{q}[0,1]$. For monotone $\beta_j$, quadratic splines are used ($q=3$), and for convex $\beta_j$, cubic splines are used ($q=4$).

  2. (S2) There is a positive constant $M_1$ such that $|X_j|\leq M_1$ for $j=1,\ldots,p$.

  3. (S3) There is a constant $M_3$ such that $E[\epsilon(t)^{2}]\leq M_3<\infty$ for $t\in[0,1]$.

  4. (S4) The knots $0=\xi_{j,1}<\xi_{j,2}<\cdots<\xi_{j,K_{jn}}=1$ have a bounded mesh ratio; that is, there is a constant $M_2\in(1,\infty)$, not depending on $n$ or $j$, such that $K_{jn}^{-1}M_2^{-1}\leq\xi_{j,k+1}-\xi_{j,k}\leq K_{jn}^{-1}M_2$ for $k=1,\ldots,K_{jn}-1$.

  5. (S5) $\limsup_{n}(\max_{j}K_{jn}/\min_{j}K_{jn})<\infty$.

  6. (S6) $\sum_{i=1}^{n}L_i^{-1}\asymp n$.

  7. (S7) For the monotone assumption, $\beta_j'(t)\geq\varepsilon>0$ on $[0,1]$, and for the convex assumption, $\beta_j''(t)\geq\varepsilon>0$ on $[0,1]$. For the combined increasing and convex assumption, the constraints hold strictly if $\beta_j''(t)\geq\varepsilon>0$ and $\beta_j'(t)\geq\varepsilon>0$ on $[0,1]$.

Condition (S4) requires the knots to be fairly evenly distributed over $[0,1]$ without bias in location; equidistant knots satisfy this condition. Similar growth rates of the number of knots $K_{jn}$ across $j$ are then imposed through (S5). Lastly, condition (S6) indicates that the number of observations per subject increases more slowly than the sample size $n$. In practice, (S6) is common, as clinical experiments typically have a bounded number of measurements per subject rather than a diverging one.

S.3 Implementation Details

Bandwidth selection for kernel estimation

To select bandwidths in a data-adaptive manner, we propose a $K$-fold cross-validation (CV) procedure that minimizes empirical prediction errors for individual trajectories. Specifically, let $\mathcal{P}=\bigcup_{k=1}^{K}\mathcal{P}_k$ be an arbitrary disjoint partition of the index set $\{1,\ldots,n\}$, where the partition sets have approximately equal sizes. Also, let $\widetilde{\boldsymbol{\beta}}\big(t;h,\mathcal{X}_{n,(-k)}\big)$ be the unconstrained estimate of $\boldsymbol{\beta}(t)$ in (2.4), defined with a bandwidth $h>0$ and the training sample $\mathcal{X}_{n,(-k)}=\{(\mathbf{Y}_i,\mathbf{X}_i):i\not\in\mathcal{P}_k\}$. We choose

$$\widetilde{h}=\operatorname*{arg\,min}_{h\in\mathcal{H}}\;\sum_{k=1}^{K}\sum_{i\in\mathcal{P}_k}\frac{1}{L_i}\sum_{\ell=1}^{L_i}\Big(Y_{i,\ell}-\mathbf{X}_i^{\top}\widetilde{\boldsymbol{\beta}}\big(T_{i,\ell};h,\mathcal{X}_{n,(-k)}\big)\Big)^{2}\tag{S.3.1}$$

and use $\widetilde{\boldsymbol{\beta}}(t)=\widetilde{\boldsymbol{\beta}}\big(t;\widetilde{h},\mathcal{X}_n\big)$ as the final estimate, where $\mathcal{H}$ is a grid of candidate bandwidths. We implement the constrained estimate $\widehat{\boldsymbol{\beta}}(t)=\widehat{\boldsymbol{\beta}}\big(t;\widehat{h},\mathcal{X}_n\big)$ similarly. In our numerical studies, we used the 5-fold CV procedure with $\mathcal{H}$ consisting of ten equally spaced points.
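For concreteness, a minimal R sketch of the selection rule (S.3.1) follows. Here dat and fit_unconstrained are hypothetical stand-ins: dat is a list of subjects, each carrying the design vector $\mathbf{X}_i$ and the longitudinal pairs $(T_{i,\ell},Y_{i,\ell})$, and fit_unconstrained(h, dat) returns the unconstrained estimate $\widetilde{\boldsymbol{\beta}}(\cdot;h,\cdot)$ as a function of $t$.

```r
# A minimal sketch of the K-fold CV bandwidth selection in (S.3.1), K = 5.
# 'dat' is a hypothetical list of subjects, each a list with vectors T, Y and
# a covariate vector X; 'fit_unconstrained(h, dat)' is a hypothetical stand-in
# for the unconstrained kernel least squares fit (2.4), returning a function
# beta_hat(t) that gives the p-vector of coefficient estimates at t.
cv_bandwidth <- function(dat, h_grid, K = 5) {
  n <- length(dat)
  folds <- split(sample(seq_len(n)), rep(seq_len(K), length.out = n))
  cv_err <- sapply(h_grid, function(h) {
    err <- 0
    for (k in seq_len(K)) {
      beta_hat <- fit_unconstrained(h, dat[-folds[[k]]])   # train without fold k
      for (i in folds[[k]]) {                              # validate on fold k
        pred <- vapply(dat[[i]]$T,
                       function(t) sum(dat[[i]]$X * beta_hat(t)), numeric(1))
        err <- err + mean((dat[[i]]$Y - pred)^2)           # (1/L_i) * sum of squares
      }
    }
    err
  })
  h_grid[which.min(cv_err)]
}
```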

Knot selection for regression spline functions

In spline regression, the number and location of knots can significantly impact the fit. Thus, in the simulation studies and real data examples, we use a $K$-fold CV error to choose the optimal number and locations of internal knots for the unconstrained estimates, with the sample partition and evaluation details as described in the preceding paragraph on bandwidth selection for kernel least squares. Although equally spaced knots are preferred in some of the literature, considering the right-skewed or unbalanced evaluation points $T_{i,\ell}$ in our numerical examples, we further consider a set of different knot sequences spanning the entire domain for each candidate number of internal knots, set to 1, 2, or 3 in our study. We assign the same optimal knot sequence to all $\widetilde{\beta}_j(t)$, although it could in principle vary over $j$. Once the optimal knot sequence for the unconstrained fit is determined, we add the lower and upper boundaries of the sub-interval $\mathcal{I}_j$ as additional internal knots for the estimation of the partially shape-constrained $\beta_j(t)$, $j\in J$, so that the desired shape is represented explicitly over $\mathcal{I}_j$; a short sketch is given below. For the simulation experiments, we perform the 5-fold CV-based knot selection at each iteration.
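A sketch of the corresponding knot search, under the same hypothetical data structure as above, might look as follows; cv_error_spline is a placeholder for the 5-fold CV prediction error of the unconstrained spline fit with a given internal-knot sequence, and a and b denote the boundaries of $\mathcal{I}_j$.

```r
# Sketch of CV-based knot selection for the unconstrained spline fit.
# 'cv_error_spline(dat, knots)' is a hypothetical helper returning the 5-fold
# CV prediction error of the fit with the given internal knots.
candidate_knots <- list(c(0.5),               # one internal knot
                        c(1/3, 2/3),          # two internal knots
                        c(0.25, 0.5, 0.75))   # three internal knots
cv_scores <- vapply(candidate_knots,
                    function(kn) cv_error_spline(dat, kn), numeric(1))
knots_opt <- candidate_knots[[which.min(cv_scores)]]
# For a partial shape constraint on beta_j over I_j = [a, b], augment its knots:
knots_j <- sort(unique(c(knots_opt, a, b)))   # 'a', 'b' are the boundaries of I_j
```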

S.4 Proofs of Proposition 2.2.1 and Theorem 2.2.2

Lemma S.4.1.

For $r\in\{0,1,2\}$, let
$$H_r(t)=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{L_i}\sum_{\ell=1}^{L_i}K_h(T_{i,\ell}-t)\Big(\frac{T_{i,\ell}-t}{h}\Big)^{r}\eta_i(T_{i,\ell})\quad(t\in[0,1]),\tag{S.4.1}$$
where $\eta_1,\ldots,\eta_n$ are random copies of a stochastic process $\eta$ on $[0,1]$. Assume the following conditions hold.

  1. (L0) $\pi$ is continuously differentiable and strictly positive.

  2. (L1) $K$ is Lipschitz continuous on $[0,1]$, i.e., $|K(s)-K(t)|\leq\mathrm{Lip}(K)|s-t|$ for every $s,t\in[0,1]$, where $\mathrm{Lip}(K)$ denotes the Lipschitz constant of $K$.

  3. (L2) $E\|\eta\|_{\infty}^{k}<\infty$ for some $k>2$.

  4. (L3) $h\asymp N^{-a}$ for some $0<a<1-2/k$.

  5. (L4) $\lambda_n=O(1)$ and $\Lambda_n=O\big((N^{b}/\log N)^{k/2}\big)$ for some $0<b<1-(2/k)-a$.

Then, letting $N=n\lambda_n$, we have
$$\sup_{t\in[0,1]}h^{r}\big|H_r(t)-E\big(H_r(t)\big)\big|=O_P\bigg(\sqrt{\frac{\log N}{Nh}}\bigg).\tag{S.4.2}$$
Proof of Lemma S.4.1.

Define $\widetilde{H}_r(t)$ analogously to $H_r(t)$ by substituting $\tilde{\eta}_i(T_{i,\ell})=\eta_i(T_{i,\ell})\mathbb{I}\big(|\eta_i(T_{i,\ell})|\leq\delta_n^{-1}\big)$ for $\eta_i(T_{i,\ell})$ in (S.4.1), where $\delta_n=N^{-1/2}h^{-1/2}\sqrt{\log N}$. It follows that
$$\begin{split}H_r(t)-E\big(H_r(t)\big)&=\Big\{\widetilde{H}_r(t)-E\big(\widetilde{H}_r(t)\big)\Big\}\\&\quad+\frac{1}{n}\sum_{i=1}^{n}\frac{1}{L_i}\sum_{\ell=1}^{L_i}K_h(T_{i,\ell}-t)\Big(\frac{T_{i,\ell}-t}{h}\Big)^{r}\eta_i(T_{i,\ell})\,\mathbb{I}\big(|\eta_i(T_{i,\ell})|>\delta_n^{-1}\big)\\&\quad-E\Big[K_h(T-t)\Big(\frac{T-t}{h}\Big)^{r}\eta(T)\,\mathbb{I}\big(|\eta(T)|>\delta_n^{-1}\big)\Big].\end{split}\tag{S.4.3}$$

First, we claim that the last two terms on the right-hand side are negligible up to the order of $\delta_n$, so that it suffices to show the uniform convergence of $\widetilde{H}_r(t)-E\big(\widetilde{H}_r(t)\big)$. To establish the approximation, define a sequence of events
$$\mathcal{E}_n=\big(|\eta_i(T_{i,\ell})|\leq\delta_n^{-1}\textrm{ for all }1\leq i\leq n\textrm{ and }1\leq\ell\leq L_i\big)\tag{S.4.4}$$
for $n=1,2,\ldots$. Recalling $N=n\lambda_n$, we obtain from Markov's inequality that
$$\begin{split}P(\mathcal{E}_n)&\geq 1-\sum_{i=1}^{n}\sum_{\ell=1}^{L_i}P\big(|\eta_i(T_{i,\ell})|>\delta_n^{-1}\big)\\&\geq 1-n\Lambda_n\delta_n^{k}E\|\eta\|_{\infty}^{k}\\&\geq 1-\lambda_n^{-1}\Big(N^{b-\{1-(2/k)-a\}}\log N\Big)^{k/2}E\|\eta\|_{\infty}^{k}\\&=1-o(1),\end{split}\tag{S.4.5}$$

where $\lambda_n^{-1}\leq 1$. This implies that the second term on the right-hand side of (S.4.3) is zero with probability tending to one. Similarly, we also get
$$\begin{split}&\bigg|E\Big(K_h(T-t)\Big(\frac{T-t}{h}\Big)^{r}\eta(T)\,\mathbb{I}\big(|\eta(T)|>\delta_n^{-1}\big)\Big)\bigg|\\&\quad\leq h^{-1}\nu_r(t;h)\,E\big[\textstyle\sup_t|\eta(t)|\,\mathbb{I}\big(|\eta(t)|>\delta_n^{-1}\big)\big]\\&\quad\leq h^{-1}\nu_r(t;h)\,E\|\eta\|_{\infty}^{k}\,\delta_n^{k-1}\\&\quad\lesssim\delta_n\Big(N^{-\frac{k-2}{2}}h^{-\frac{k-2}{2}-1}(\log N)^{\frac{k-2}{2}}\Big)\\&\quad=\delta_n\,O\Big(N^{-\frac{k}{2}\{1-(2/k)-a\}}(\log N)^{\frac{k-2}{2}}\Big)\\&\quad=o(\delta_n),\end{split}\tag{S.4.6}$$
where $\nu_r(t;h)=\int_0^1\big(\frac{u-t}{h}\big)^{r}K_h(u-t)\pi(u)\,\mathrm{d}u$.

Now, we prove the uniform convergence of $\widetilde{H}_r(t)-E\big(\widetilde{H}_r(t)\big)$. For $\tau>0$, let $\mathcal{T}_N$ be an $N^{-\tau}$-net of $[0,1]$ with cardinality $|\mathcal{T}_N|=O(N^{\tau})$, so that for any $s\in[0,1]$ there exists $t\in\mathcal{T}_N$ with $|s-t|\leq N^{-\tau}$. Then, we have
$$\begin{split}&\sup_{t\in[0,1]}\big|\widetilde{H}_r(t)-E\big(\widetilde{H}_r(t)\big)\big|\\&\leq\sup_{t\in\mathcal{T}_N}\big|\widetilde{H}_r(t)-E\big(\widetilde{H}_r(t)\big)\big|+\sup_{|s-t|\leq N^{-\tau}}\Big|\big(\widetilde{H}_r(s)-\widetilde{H}_r(t)\big)-E\big(\widetilde{H}_r(s)-\widetilde{H}_r(t)\big)\Big|.\end{split}\tag{S.4.7}$$
The second term on the right-hand side is almost surely negligible up to the order of $\delta_n$. Specifically, for any $|s-t|\leq N^{-\tau}$, it follows from the Lipschitz continuity of $K$ that
$$\begin{split}&\big|\widetilde{H}_r(s)-\widetilde{H}_r(t)\big|\\&\leq\frac{1}{n}\sum_{i=1}^{n}\frac{1}{L_i}\sum_{\ell=1}^{L_i}|\tilde{\eta}_i(T_{i,\ell})|\,\Big|K_h(T_{i,\ell}-s)\Big(\frac{T_{i,\ell}-s}{h}\Big)^{r}-K_h(T_{i,\ell}-t)\Big(\frac{T_{i,\ell}-t}{h}\Big)^{r}\Big|\\&\leq\delta_n^{-1}h^{-1}\big\{\mathrm{Lip}(K)h^{-r}+r\|K\|_{\infty}\big\}\frac{|s-t|}{h}\\&\lesssim\delta_n\big(N^{1-\tau}h^{-1-r}/\log N\big)\\&=\delta_n\,O\big(N^{1+a(1+r)-\tau}/\log N\big)\\&=o(\delta_n),\end{split}\tag{S.4.8}$$

for a sufficiently large $\tau>1+a(1+r)$. To investigate the stochastic magnitude of $\sup_{t\in\mathcal{T}_N}h^{r}\big|\widetilde{H}_r(t)-E\big(\widetilde{H}_r(t)\big)\big|$, we write
$$\widetilde{H}_r(t)-E\big(\widetilde{H}_r(t)\big)=\frac{1}{n}\sum_{i=1}^{n}\big\{\zeta_{i,r}(t)-E\big(\zeta_{i,r}(t)\big)\big\},\tag{S.4.9}$$
where $\zeta_{i,r}(t)=\frac{1}{L_i}\sum_{\ell=1}^{L_i}K_h(T_{i,\ell}-t)\big(\frac{T_{i,\ell}-t}{h}\big)^{r}\tilde{\eta}_i(T_{i,\ell})$. Since $\|\zeta_{i,r}\|_{\infty}$ is almost surely bounded for all $i=1,\ldots,n$, it follows from Bernstein's inequality that
$$\begin{split}&P\bigg(\sup_{t\in\mathcal{T}_N}h^{r}\big|\widetilde{H}_r(t)-E\big(\widetilde{H}_r(t)\big)\big|\geq C\delta_n\bigg)\\&\leq\sum_{t\in\mathcal{T}_N}P\bigg(\Big|\sum_{i=1}^{n}h^{r}\big\{\zeta_{i,r}(t)-E\big(\zeta_{i,r}(t)\big)\big\}\Big|\geq Cn\delta_n\bigg)\\&\leq\sum_{t\in\mathcal{T}_N}2\exp\Bigg(-\frac{(Cn\delta_n)^{2}/2}{h^{2r}\sum_{i=1}^{n}\mathrm{Var}\big(\zeta_{i,r}(t)\big)+2h^{r}\|\zeta_{i,r}\|_{\infty}(Cn\delta_n)/3}\Bigg)\\&\leq 2|\mathcal{T}_N|\exp\Bigg(-\frac{(C^{2}/2)(\lambda_n^{-1}nh^{-1}\log N)}{\int_{0}^{1}u^{2r}K^{2}(u)\,\mathrm{d}u\,E\|\eta\|_{\infty}^{2}\|\pi\|_{\infty}(nh^{-1+2r})+(C/3)\|K\|_{\infty}(nh^{-1})}\Bigg)\\&\lesssim N^{\tau}\exp\big(-\lambda_n^{-1}g(C)\log N\big)\\&=O\big(N^{\tau-\lambda_n^{-1}g(C)}\big),\end{split}\tag{S.4.10}$$
where $g(C)=\dfrac{C^{2}/2}{\int_{0}^{1}u^{2r}K^{2}(u)\,\mathrm{d}u\,E\|\eta\|_{\infty}^{2}\|\pi\|_{\infty}+(C/3)\|K\|_{\infty}}$ is increasing in $C>0$. Since $\lambda_n^{-1}>0$ with $\lambda_n=O(1)$, we conclude that $\sup_{t\in\mathcal{T}_N}h^{r}\big|\widetilde{H}_r(t)-E\big(\widetilde{H}_r(t)\big)\big|=O_P(\delta_n)$. ∎

Proof of Proposition 2.2.1.

We note that

$$\begin{split}\mathrm{vec}\big(\widetilde{\mathbb{B}}(t)-\mathbb{B}_0(t)\big)&=\Bigg[\sum_{i=1}^{n}\frac{1}{L_i}\sum_{\ell=1}^{L_i}K_h(T_{i,\ell}-t)\big(\mathbf{Z}_{i,\ell}(t)\otimes\mathbf{X}_i\big)\big(\mathbf{Z}_{i,\ell}(t)\otimes\mathbf{X}_i\big)^{\top}\Bigg]^{-1}\\&\qquad\times\Bigg[\sum_{i=1}^{n}\frac{1}{L_i}\sum_{\ell=1}^{L_i}K_h(T_{i,\ell}-t)\big(\mathbf{Z}_{i,\ell}(t)\otimes\mathbf{X}_i\big)\Big\{Y_{i,\ell}-\mathbf{X}_i^{\top}\mathbb{B}_0(t)\mathbf{Z}_{i,\ell}(t)\Big\}\Bigg].\end{split}\tag{S.4.11}$$

To analyze the large-sample property of $\widetilde{\boldsymbol{\beta}}(t)-\boldsymbol{\beta}(t)$, we obtain the asymptotic approximation of the above matrix product by analyzing its individual components. For the first factor in (S.4.11), Lemma S.4.1 gives

$$\frac{1}{n}\sum_{i=1}^{n}\frac{1}{L_i}\sum_{\ell=1}^{L_i}\Big(\frac{T_{i,\ell}-t}{h}\Big)^{r}K_h(T_{i,\ell}-t)\mathbf{X}_i\mathbf{X}_i^{\top}=\nu_r(t;h)\,\Gamma+O_P\bigg(\sqrt{\frac{\log N}{Nh^{1+r}}}\bigg)\tag{S.4.12}$$
for $r\in\{0,1,2\}$ and uniformly for $t\in[0,1]$, where $\nu_r(t;h)=\int_0^1\big(\frac{u-t}{h}\big)^{r}K_h(u-t)\pi(u)\,\mathrm{d}u$ and $\Gamma=E(\mathbf{X}\mathbf{X}^{\top})$. Similarly, for the second factor in (S.4.11), we have

$$\begin{split}&\frac{1}{n}\sum_{i=1}^{n}\frac{1}{L_i}\sum_{\ell=1}^{L_i}\Big(\frac{T_{i,\ell}-t}{h}\Big)^{r}K_h(T_{i,\ell}-t)\mathbf{X}_i\Big\{Y_{i,\ell}-\mathbf{X}_i^{\top}\mathbb{B}(t)\mathbf{Z}_{i,\ell}(t)\Big\}\\&=\frac{1}{2}h^{2}\nu_{2+r}(t;h)\big(\Gamma\ddot{\boldsymbol{\beta}}_0(t)\big)+o(h^{2})+O_P\bigg(h^{2}\sqrt{\frac{\log N}{Nh^{1+r}}}\bigg)\end{split}\tag{S.4.13}$$
for $r\in\{0,1\}$ and uniformly for $t\in[0,1]$, where $Y_{i,\ell}-\mathbf{X}_i^{\top}\mathbb{B}(t)\mathbf{Z}_{i,\ell}(t)=\mathbf{X}_i^{\top}\ddot{\boldsymbol{\beta}}(t)\{(T_{i,\ell}-t)^{2}/2\}\cdot(1+o(|T_{i,\ell}-t|))+\varepsilon_{i,\ell}$ for $T_{i,\ell}$ near $t$. Combining (S.4.12) and (S.4.13), we have

$$\begin{split}\widetilde{\boldsymbol{\beta}}(t)-\boldsymbol{\beta}_0(t)&=\left[\begin{array}{c}I_p\\O_p\end{array}\right]^{\top}\mathrm{vec}\big(\widetilde{\mathbb{B}}(t)-\mathbb{B}_0(t)\big)\\&=\frac{h^{2}}{2}\Big[(1,0)^{\top}\otimes I_p\Big]\Bigg[\left(\begin{array}{cc}\nu_0(t;h)&\nu_1(t;h)\\\nu_1(t;h)&\nu_2(t;h)\end{array}\right)\otimes\Gamma\Bigg]^{-1}\Bigg[\left(\begin{array}{c}\nu_2(t;h)\\\nu_3(t;h)\end{array}\right)\otimes\big(\Gamma\ddot{\boldsymbol{\beta}}_0(t)\big)\Bigg]\\&\qquad+o(h^{2})+O_P\bigg(\sqrt{\frac{\log N}{Nh}}\bigg)\\&=\frac{h^{2}}{2}\ddot{\boldsymbol{\beta}}_0(t)\bigg(\frac{\nu_2(t;h)^{2}-\nu_1(t;h)\nu_3(t;h)}{\nu_0(t;h)\nu_2(t;h)-\nu_1(t;h)^{2}}\bigg)+o(h^{2})+O_P\bigg(\sqrt{\frac{\log N}{Nh}}\bigg),\end{split}\tag{S.4.14}$$
where it can be verified that the first term is $O(h^{2})$ uniformly for $t\in[0,1]$. ∎

Remark 4.

Using a similar argument as in (S.4.14), one can further verify that
$$\sup_{t\in[0,1]}h\Big\|\widetilde{\dot{\boldsymbol{\beta}}}(t)-\dot{\boldsymbol{\beta}}(t)\Big\|=O_P\bigg(h^{2}+\sqrt{\frac{\log N}{Nh}}\bigg).\tag{S.4.15}$$
Proof of Theorem 2.2.2.

We write

$$\mathcal{Q}_n\big(\mathbb{B}(t),\widetilde{\mathbb{B}}(t)\big)=\mathrm{vec}\big(\mathbb{B}(t)-\widetilde{\mathbb{B}}(t)\big)^{\top}\widehat{\Psi}(t)\,\mathrm{vec}\big(\mathbb{B}(t)-\widetilde{\mathbb{B}}(t)\big),\tag{S.4.16}$$
where $\widehat{\Psi}(t)=\frac{1}{n}\sum_{i=1}^{n}\frac{1}{L_i}\sum_{\ell=1}^{L_i}K_h(T_{i,\ell}-t)\mathbb{W}_{i,\ell}(t)\mathbb{W}_{i,\ell}(t)^{\top}$. It follows from Lemma S.4.1 that

$$\widehat{\Psi}(t)=\mathcal{N}(t;h)\otimes\Gamma+o_P(1)\tag{S.4.17}$$
uniformly for $t\in[0,1]$, where $\Gamma=E(\mathbf{X}\mathbf{X}^{\top})$ and $\mathcal{N}(t;h)$ is the $2\times 2$ matrix whose $(r+1,r'+1)$-component is given by $\nu_{r+r'}(t;h)=\int_0^1\big(\frac{u-t}{h}\big)^{r+r'}K_h(u-t)\pi(u)\,\mathrm{d}u$ for $0\leq r,r'\leq 1$. For $t\in(0,1)$, we note that $\mathcal{N}(t;h)=\pi(t)\mathcal{N}$ is strictly positive definite, where $\mathcal{N}=\mathrm{diag}(\nu_0,\nu_2)$ is a diagonal matrix with $\nu_r=\int_{-1}^{1}u^{r}K(u)\,\mathrm{d}u$. Therefore, with probability tending to one, we have

$$\begin{split}\frac{1}{M}\sum_{m=1}^{M}\mathcal{Q}_n\big(\mathbb{B}_0(t_m),\widehat{\mathbb{B}}(t_m)\big)&\leq\frac{1}{M}\sum_{m=1}^{M}\Big(\mathcal{Q}_n\big(\mathbb{B}_0(t_m),\widetilde{\mathbb{B}}(t_m)\big)+\mathcal{Q}_n\big(\widehat{\mathbb{B}}(t_m),\widetilde{\mathbb{B}}(t_m)\big)\Big)\\&\leq\frac{2}{M}\sum_{m=1}^{M}\mathcal{Q}_n\big(\mathbb{B}_0(t_m),\widetilde{\mathbb{B}}(t_m)\big)\\&\leq 2\sigma_{\mathrm{max}}(\mathcal{N}\otimes\Gamma)\sup_{t\in[0,1]}\big\|\mathrm{vec}\big(\mathbb{B}_0(t)-\widetilde{\mathbb{B}}(t)\big)\big\|^{2}\\&=O_P\bigg(h^{4}+\frac{\log N}{Nh}\bigg),\end{split}\tag{S.4.18}$$
where $\sigma_{\mathrm{max}}(\mathcal{N}\otimes\Gamma)$ denotes the maximum eigenvalue of $\mathcal{N}\otimes\Gamma$. For the second inequality in (S.4.18), we used the fact that $\sum_{m=1}^{M}\mathcal{Q}_n\big(\widehat{\mathbb{B}}(t_m),\widetilde{\mathbb{B}}(t_m)\big)\leq\sum_{m=1}^{M}\mathcal{Q}_n\big(\mathbb{B}_0(t_m),\widetilde{\mathbb{B}}(t_m)\big)$, which holds because $\mathbb{B}_0$ satisfies the side conditions (2.9) under which $\widehat{\mathbb{B}}$ minimizes $\sum_{m=1}^{M}\mathcal{Q}_n\big(\mathbb{B}(t_m),\widetilde{\mathbb{B}}(t_m)\big)$. The last equality in (S.4.18) follows from Proposition 2.2.1 and (S.4.15). Finally, we conclude from (S.4.18) that

$$\begin{split}\max_{1\leq m\leq M}\big\|\widehat{\boldsymbol{\beta}}(t_m)-\boldsymbol{\beta}_0(t_m)\big\|^{2}&\leq\max_{1\leq m\leq M}\big\|\mathrm{vec}\big(\mathbb{B}_0(t_m)-\widehat{\mathbb{B}}(t_m)\big)\big\|^{2}\\&\leq\sigma_{\mathrm{min}}(\mathcal{N}\otimes\Gamma)^{-1}\sum_{m=1}^{M}\mathcal{Q}_n\big(\mathbb{B}_0(t_m),\widehat{\mathbb{B}}(t_m)\big)\\&=O_P\bigg(h^{4}+\frac{\log N}{Nh}\bigg),\end{split}\tag{S.4.19}$$
where $\sigma_{\mathrm{min}}(\mathcal{N}\otimes\Gamma)$ denotes the minimum eigenvalue of $\mathcal{N}\otimes\Gamma$. ∎

S.5 Proof of Theorem 2.3.1

Proof.

Let $\boldsymbol{\mu}_i$ denote the true mean vector of the responses of the $i$th trajectory when (2.2) is true. We define $\tilde{\boldsymbol{\mu}}_i$ and $\hat{\boldsymbol{\mu}}_i$ to be the estimated responses calculated from the unconstrained $\tilde{\boldsymbol{\beta}}(t)=\big(\tilde{\beta}_1(t),\ldots,\tilde{\beta}_p(t)\big)^{\top}$ and the constrained $\hat{\boldsymbol{\beta}}(t)=\big(\hat{\beta}_1(t),\ldots,\hat{\beta}_p(t)\big)^{\top}$, respectively. Let $\|\boldsymbol{a}\|^{2}=\boldsymbol{a}^{\top}\boldsymbol{a}$ and define $\mathrm{vec}[\boldsymbol{\mu}_i]=(\boldsymbol{\mu}_1^{\top},\ldots,\boldsymbol{\mu}_n^{\top})^{\top}$. Then we have

$$\begin{split}&\|\mathrm{vec}[\tilde{\boldsymbol{\mu}}_i]-\mathrm{vec}[\boldsymbol{\mu}_i]\|^{2}-\|\mathrm{vec}[\hat{\boldsymbol{\mu}}_i]-\mathrm{vec}[\boldsymbol{\mu}_i]\|^{2}\\&=\|\mathrm{vec}[\tilde{\boldsymbol{\mu}}_i]-\mathrm{vec}[\hat{\boldsymbol{\mu}}_i]\|^{2}+2\big(\mathrm{vec}[\tilde{\boldsymbol{\mu}}_i]-\mathrm{vec}[\hat{\boldsymbol{\mu}}_i]\big)^{\top}\big(\mathrm{vec}[\hat{\boldsymbol{\mu}}_i]-\mathrm{vec}[\boldsymbol{\mu}_i]\big)\\&=\|\mathrm{vec}[\tilde{\boldsymbol{\mu}}_i]-\mathrm{vec}[\hat{\boldsymbol{\mu}}_i]\|^{2}-2\big(\mathrm{vec}[\mathbf{Y}_i]-\mathrm{vec}[\tilde{\boldsymbol{\mu}}_i]\big)^{\top}\big(\mathrm{vec}[\hat{\boldsymbol{\mu}}_i]-\mathrm{vec}[\boldsymbol{\mu}_i]\big)\\&\qquad+2\big(\mathrm{vec}[\mathbf{Y}_i]-\mathrm{vec}[\hat{\boldsymbol{\mu}}_i]\big)^{\top}\big(\mathrm{vec}[\hat{\boldsymbol{\mu}}_i]-\mathrm{vec}[\boldsymbol{\mu}_i]\big).\end{split}\tag{S.5.1}$$

The second term in the last expression is zero because of the orthogonality of $(\mathbf{Y}_i-\tilde{\boldsymbol{\mu}}_i)$ and $(\hat{\boldsymbol{\mu}}_i-\boldsymbol{\mu}_i)$ for $i=1,\ldots,n$. Based on Proposition 3.12.3 of Silvapulle and Sen (2005), the last term is non-negative by the Karush–Kuhn–Tucker conditions:

$$\langle\mathbf{Y}_i-\hat{\boldsymbol{\mu}}_i,\,\hat{\boldsymbol{\mu}}_i\rangle=0\quad\text{and}\quad\langle\mathbf{Y}_i-\hat{\boldsymbol{\mu}}_i,\,\boldsymbol{\mu}_i\rangle\leq 0,$$

for $i=1,\ldots,n$. Thus, we have $\|\mathrm{vec}[\hat{\boldsymbol{\mu}}_i]-\mathrm{vec}[\boldsymbol{\mu}_i]\|^{2}\leq\|\mathrm{vec}[\tilde{\boldsymbol{\mu}}_i]-\mathrm{vec}[\boldsymbol{\mu}_i]\|^{2}$. Then, from Condition (S2), $\|\tilde{\boldsymbol{\mu}}_i-\boldsymbol{\mu}_i\|^{2}\simeq\|\mathbf{X}_i\boldsymbol{\beta}(t)-\mathbf{X}_i\tilde{\boldsymbol{\beta}}(t)\|_{L_2}^{2}=\|\mathbf{X}_i\{\tilde{\boldsymbol{\beta}}(t)-\boldsymbol{\beta}(t)\}\|_{L_2}^{2}\leq M_1\|\tilde{\boldsymbol{\beta}}(t)-\boldsymbol{\beta}(t)\|_{L_2}^{2}$. For non-trivial $\mathbf{X}_i$, we can always find $M_4>0$ satisfying $\|\hat{\boldsymbol{\mu}}_i-\boldsymbol{\mu}_i\|^{2}\simeq\|\mathbf{X}_i\{\hat{\boldsymbol{\beta}}(t)-\boldsymbol{\beta}(t)\}\|_{L_2}^{2}\geq M_4\|\hat{\boldsymbol{\beta}}(t)-\boldsymbol{\beta}(t)\|_{L_2}^{2}$. Hence, altogether, we have $M_4\|\hat{\boldsymbol{\beta}}(t)-\boldsymbol{\beta}(t)\|_{L_2}^{2}\leq\|\hat{\boldsymbol{\mu}}_i-\boldsymbol{\mu}_i\|^{2}\leq\|\tilde{\boldsymbol{\mu}}_i-\boldsymbol{\mu}_i\|^{2}\leq M_1\|\tilde{\boldsymbol{\beta}}(t)-\boldsymbol{\beta}(t)\|_{L_2}^{2}$, which leads to $\|\hat{\beta}_j(t)-\beta_j(t)\|_{L_2}^{2}\lesssim\|\tilde{\beta}_j(t)-\beta_j(t)\|_{L_2}^{2}$ for $j=1,\ldots,p$. ∎

S.6 Partially shape-constrained regression spline via BB-splines

We consider the monotone increasing constraints on $\beta_j(t)$ over the sub-interval $\mathcal{I}_j$ for $j\in J$, as in (2.2). Borrowing notation from Section 2.3, let $\{\phi_k^j(t): k=1,\ldots,d_j\}$ denote a set of $B$-spline basis functions of degree 2 under the given knot sequence used to express $\beta_j(t)$, and let $\mathbf{c}_j=(c_{j1},\ldots,c_{jd_j})^{\top}$ be the corresponding basis coefficients. For $i=1,\ldots,n$, we define the $(L_i\times d_j)$ matrix of discretized basis functions as

$$\Phi_{ij}=\left[\begin{array}{cccc}\phi_1^j(T_{i,1})&\phi_2^j(T_{i,1})&\cdots&\phi_{d_j}^j(T_{i,1})\\\phi_1^j(T_{i,2})&\phi_2^j(T_{i,2})&\cdots&\phi_{d_j}^j(T_{i,2})\\\vdots&\vdots&\ddots&\vdots\\\phi_1^j(T_{i,L_i})&\phi_2^j(T_{i,L_i})&\cdots&\phi_{d_j}^j(T_{i,L_i})\end{array}\right].$$

We then denote the knot sequence generating the basis $\{\phi_k^j(t)\}$ by $\mathcal{K}_j=\{t_{jm}\in[0,1]: m=1,\ldots,M_j\}$ with $0=t_{j1}<\cdots<t_{jM_j}=1$. For $j=1,\ldots,p$, let $\mathbf{A}_j$ denote the $(M_j\times d_j)$ matrix of discretized first derivatives of the basis functions evaluated at $\mathcal{K}_j$:

$$\mathbf{A}_j=\left[\begin{array}{cccc}{\phi_1^j}'(t_{j1})&{\phi_2^j}'(t_{j1})&\cdots&{\phi_{d_j}^j}'(t_{j1})\\{\phi_1^j}'(t_{j2})&{\phi_2^j}'(t_{j2})&\cdots&{\phi_{d_j}^j}'(t_{j2})\\\vdots&\vdots&\ddots&\vdots\\{\phi_1^j}'(t_{jM_j})&{\phi_2^j}'(t_{jM_j})&\cdots&{\phi_{d_j}^j}'(t_{jM_j})\end{array}\right].$$

For the special case of $\mathcal{I}_j=[0,1]$ for all $j\in J$, i.e., when the monotone increasing constraint is assigned over the entire domain, the objective function for finding the shape-constrained $\beta_j(t)$ is written as

$$\textrm{Minimize}\quad\sum_{i=1}^{n}\Big\|\mathbf{Y}_i-\sum_{j=1}^{p}X_{i,j}\big\{\Phi_{ij}\boldsymbol{c}_j\big\}\Big\|^{2}\quad\textrm{subject to}\quad\mathbf{A}_j\mathbf{c}_j\geq 0_{M_j},\quad j\in J,\tag{S.6.1}$$

where $0_{M_j}$ represents a length-$M_j$ vector of zeros. Now, under the general partial-constraint case, suppose that the lower and upper boundaries of the sub-interval $\mathcal{I}_j$, say $t_{jl}$ and $t_{ju}$, belong to $\mathcal{K}_j$ without loss of generality; since $\mathcal{K}_j$ is user-defined, we can always arrange this by adding internal knots. Then, the objective function can be written as in (S.6.1), still minimizing the least squares criterion but under a different $\mathbf{A}_j$, which assigns the non-negativity restriction to the first derivative evaluated at the knots belonging to $\mathcal{I}_j$. Specifically, the $(v,w)$th element of $\mathbf{A}_j$ is defined as $(\mathbf{A}_j)_{[v,w]}={\phi_w^j}'(t_{jv})$ for $v\in\mathcal{V}_j$ and $w\in\mathcal{W}_j$, where $\mathcal{V}_j=\big\{m\in\{1,\ldots,M_j\}: t_{jl}\leq t_{jm}\leq t_{ju}\big\}$ and $\mathcal{W}_j=\big\{k\in\{1,\ldots,d_j\}:{\phi_k^j}'(t)\neq 0\textrm{ over }t\in\mathcal{I}_j\big\}$; otherwise, $(\mathbf{A}_j)_{[v,w]}=0$. In matrix form, denoting the elements of $\mathcal{V}_j$ by $m_l,m_{l+1},\ldots,m_u$ and those of $\mathcal{W}_j$ by $k_l,k_{l+1},\ldots,k_u$, $\mathbf{A}_j$ can be written as

$$\mathbf{A}_j=\left[\begin{array}{ccc:cccc:ccc}0&\cdots&0&0&0&\cdots&0&0&\cdots&0\\&\vdots&&&&\vdots&&&\vdots&\\0&\cdots&0&0&0&\cdots&0&0&\cdots&0\\\hdashline 0&\cdots&0&{\phi_{k_l}^j}'(t_{j,m_l})&{\phi_{k_{l+1}}^j}'(t_{j,m_l})&\cdots&{\phi_{k_u}^j}'(t_{j,m_l})&0&\cdots&0\\0&\cdots&0&{\phi_{k_l}^j}'(t_{j,m_{l+1}})&{\phi_{k_{l+1}}^j}'(t_{j,m_{l+1}})&\cdots&{\phi_{k_u}^j}'(t_{j,m_{l+1}})&0&\cdots&0\\\vdots&&\vdots&\vdots&&\ddots&\vdots&\vdots&&\vdots\\0&\cdots&0&{\phi_{k_l}^j}'(t_{j,m_u})&{\phi_{k_{l+1}}^j}'(t_{j,m_u})&\cdots&{\phi_{k_u}^j}'(t_{j,m_u})&0&\cdots&0\\\hdashline 0&\cdots&0&0&0&\cdots&0&0&\cdots&0\\&\vdots&&&&\vdots&&&\vdots&\\0&\cdots&0&0&0&\cdots&0&0&\cdots&0\end{array}\right].$$

Although here we elaborate on the case of a single sub-interval $\mathcal{I}_j$, the same approach applies to multiple disjoint sub-intervals. The optimization can be carried out by quadratic programming through the qprog function in the R package coneproj, as sketched below. Indeed, this approach applies to shape-constrained estimation under any kind of basis functions, not necessarily $B$-splines.
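As a minimal sketch under stated assumptions, the quadratic program can be set up as follows; the stacked design matrix Phi (built from the blocks $X_{i,j}\Phi_{ij}$), the stacked response vector y, and the constraint matrix A (with rows as in $\mathbf{A}_j$, $j\in J$) are assumed to have been constructed as described above.

```r
library(coneproj)
# Sketch of solving (S.6.1) with coneproj::qprog, which minimizes
#   theta' q theta - 2 c' theta   subject to   amat %*% theta >= b.
# 'Phi', 'y', and 'A' are assumed to be pre-built as described in the text.
q_mat <- crossprod(Phi)        # Phi' Phi
c_vec <- crossprod(Phi, y)     # Phi' y, so the objective matches least squares
fit   <- qprog(q = q_mat, c = c_vec, amat = A, b = rep(0, nrow(A)))
c_hat <- fit$thetahat          # constrained basis coefficient estimates
```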

S.7 Additional simulation results

[Figure S.7.1 panels appear here: (a) I=[0,0.5], (b) I=[0.3,0.5], (c) I=[0.4,0.5], (d) I=[0.5,1], (e) I=[0.8,1], (f) I=[0.9,1].]

Figure S.7.1: Regression coefficient function estimates with the kernel method (2.9) under shape constraints (3.1). The shaded area indicates the sub-interval $\mathcal{I}$ of inferential interest. The blue dash-dotted lines depict the Monte Carlo average estimate in the full cohort analysis, the red long-dashed lines the Monte Carlo average estimate from the sub-cohort analysis, and the solid black lines the true functions.

[Figure S.7.2 appears here.]

Figure S.7.2: Power analysis of the partially shape-constrained spline method (2.17) subject to the shape constraints $H_0$: $\beta_1$ and $\beta_3$ are monotone increasing on $\mathcal{I}$. The sensitivity and specificity are evaluated at the 5% significance level under different scenarios for the location and length of the sub-interval: $\mathcal{I}=[0,0.5]$, $[0.1,0.5]$, $[0.2,0.5]$, $[0.3,0.5]$, $[0.4,0.5]$, $[0.5,1]$, $[0.6,1]$, $[0.7,1]$, $[0.8,1]$, and $[0.9,1]$.
Table S.7.1: Simulation results for the shape-constrained kernel-weighted least squares approach (2.9). The estimation and test performance are evaluated with the integrated squared bias (ISB), the integrated variance (IVar), the type I error rate, and the power of the test, defined as (3.2) and (3.3). The null hypothesis is H0: β1 and β3 are monotone increasing on I.

                                                H0 is true                  H0 is false
Sample size  Dataset         Criterion          I=[0,0.5]    I=[0.4,0.5]    I=[0.5,1]    I=[0.9,1]
n=100        Full data       (γ0.05, γ0.1)      (0.06,0.11)  (0.05,0.11)    (0.40,0.50)  (0.12,0.20)
                             ISB                0.0302       0.0073         0.1600       0.0230
                             IVar               0.4034       0.5346         0.4537       1.0901
             Partial data    (γ0.05, γ0.1)      (0.06,0.11)  (0.09,0.19)    (0.30,0.40)  (0.21,0.30)
             restricted to I ISB                0.0517       0.0768         0.2299       0.2200
                             IVar               0.5335       1.5028         0.5987       3.7844
n=500        Full data       (γ0.05, γ0.1)      (0.05,0.10)  (0.05,0.09)    (0.86,0.92)  (0.11,0.16)
                             ISB                0.0102       0.0018         0.1211       0.0097
                             IVar               0.1010       0.1135         0.1059       0.2681
             Partial data    (γ0.05, γ0.1)      (0.04,0.10)  (0.07,0.13)    (0.78,0.88)  (0.16,0.25)
             restricted to I ISB                0.0129       0.0110         0.1699       0.1045
                             IVar               0.1497       0.3229         0.1430       0.7396
Table S.7.2: Simulation results for the shape-constrained regression spline approach (2.17). The estimation and test performance are evaluated with the integrated squared bias (ISB), the integrated variance (IVar), the type I error rate, and the power of the test, defined as (3.2) and (3.3). The null hypothesis is H0: β1 and β3 are monotone increasing on I.

                                                H0 is true                  H0 is false
Sample size  Dataset         Criterion          I=[0,0.5]    I=[0.4,0.5]    I=[0.5,1]    I=[0.9,1]
n=100        Full data       (γ0.05, γ0.1)      (0.04,0.07)  (0.08,0.11)    (0.29,0.40)  (0.09,0.11)
                             ISB                0.0101       0.0033         0.1850       0.2304
                             IVar               0.3737       0.2054         0.4046       0.9282
             Partial data    (γ0.05, γ0.1)      (0.05,0.11)  (0.06,0.12)    (0.18,0.24)  (0.05,0.11)
             restricted to I ISB                0.0176       0.1156         0.1783       0.4200
                             IVar               0.5758       1.6896         0.6151       4.2123
n=500        Full data       (γ0.05, γ0.1)      (0.04,0.08)  (0.05,0.08)    (0.81,0.89)  (0.10,0.16)
                             ISB                0.0017       0.0016         0.1461       0.0913
                             IVar               0.0946       0.0572         0.0974       0.2028
             Partial data    (γ0.05, γ0.1)      (0.05,0.11)  (0.04,0.08)    (0.44,0.56)  (0.04,0.09)
             restricted to I ISB                0.0022       0.0109         0.1406       0.0731
                             IVar               0.1168       0.2798         0.1463       0.7005

References

M. J. Silvapulle and P. K. Sen. Constrained Statistical Inference: Inequality, Order, and Shape Restrictions. Wiley Series in Probability and Statistics. John Wiley & Sons, Inc, 1 edition, 2005.