
Decomposition with Monotone B-splines: Fitting and Testing

Lijun Wang (Email: ljwang@link.cuhk.edu.hk, lijun.wang@yale.edu), Department of Statistics, The Chinese University of Hong Kong, Hong Kong SAR, China; Department of Biostatistics, Yale University, New Haven, Connecticut, USA
Xiaodan Fan (Email: xfan@cuhk.edu.hk), Department of Statistics, The Chinese University of Hong Kong, Hong Kong SAR, China
Hongyu Zhao (Email: hongyu.zhao@yale.edu), Department of Biostatistics, Yale University, New Haven, Connecticut, USA
Jun S. Liu (Email: jliu@stat.harvard.edu), Department of Statistics, Harvard University, Cambridge, Massachusetts, USA
Abstract

A univariate continuous function can always be decomposed as the sum of a non-increasing function and a non-decreasing one. Based on this property, we propose a non-parametric regression method that combines two spline-fitted monotone curves. We demonstrate by extensive simulations that, compared to standard spline-fitting methods, the proposed approach is particularly advantageous in high-noise scenarios. Several theoretical guarantees are established for the proposed approach. Additionally, we present statistics to test the monotonicity of a function based on monotone decomposition, which better control the Type I error and achieve comparable (if not higher) power than existing methods. Finally, we apply the proposed fitting and testing approaches to analyze single-cell pseudotime trajectory datasets, identifying significant biological insights for non-monotonically expressed genes through Gene Ontology enrichment analysis. The source code implementing the methodology and producing all results is accessible at https://github.com/szcf-weiya/MonotoneDecomposition.jl.

Keywords: Function Decomposition; Monotone B-splines; Curve Fitting; Test of Monotonicity.

1 Introduction

Suppose we have $n$ pairs of observations $(x_i, y_i)$, $i=1,\ldots,n$, with $x_i\in\mathrm{I\!R}^d$, $y_i\in\mathrm{I\!R}$, independent and identically distributed (i.i.d.) according to an unknown probability distribution $P(X,Y)$. Various methods exist for estimating the conditional expectation function $f(x)=\mathbb{E}(Y\mid X=x)$, ranging from simple linear regressions (including ridge and lasso) to more sophisticated nonlinear techniques. Splines are among the most popular methods, particularly when $d=1$. Unlike existing methods, we aim to estimate the monotonic components of $f(x)$ and then use their sum as an estimator for $f(x)$. This is possible because any general function can be decomposed into the sum of an increasing function $f_{\mathrm{up}}(x)$ and a decreasing function $f_{\mathrm{down}}(x)$ (a formal proof is given in the Supplementary Material to keep the paper self-contained).

The monotone decomposition idea has been exploited by [5] in their recent algorithm, where the monotone decomposition is incorporated into fitting monotone Bayesian additive regression trees (mBART). They found that fitting by monotone decomposition with mBART outperforms the corresponding BART algorithm proposed earlier in [4]. We here focus on the case of $d=1$ and adopt B-spline basis functions to represent the monotone components,

$$f_{\mathrm{up}}(x)=\sum_{j=1}^{J_u}\gamma_j^u B_j^u(x)+\varepsilon_u\,,\qquad f_{\mathrm{down}}(x)=\sum_{j=1}^{J_d}\gamma_j^d B_j^d(x)+\varepsilon_d\,,$$

where the superscripts and subscripts “$u$” and “$d$” indicate up (increasing) and down (decreasing), respectively. The monotonicity of each monotone component is ensured by the monotonicity of its coefficients [23], i.e.,

$$\gamma_1^u\leq\gamma_2^u\leq\cdots\leq\gamma_{J_u}^u;\quad\gamma_1^d\geq\gamma_2^d\geq\cdots\geq\gamma_{J_d}^d\,.$$

This paper is organized as follows. In Section 2, we formulate monotone decomposition with cubic splines and establish properties of the solution, particularly for monotone functions. In Section 3, we propose the monotone decomposition with smoothing splines and establish similar properties and theoretical guarantees. In Section 4, we present simulation results to demonstrate how fitting via monotone decomposition can outperform the corresponding unconstrained fitting, particularly in high-noise scenarios. In Section 5, we propose statistics for testing monotonicity and, in Section 6, we demonstrate the power of the proposed method via simulations. In Section 7, we present the results of our analysis on single-cell pseudo-time trajectory datasets using the fitting and testing techniques based on monotone decomposition. Finally, we discuss the limitations and potential future work in Section 8.

2 Monotone Decomposition with Cubic Splines

Cubic splines are the most popular polynomial splines for practitioners. Presumably, cubic splines are the lowest-order splines for which the knot-discontinuity is not visible to the human eye, and there is scarcely any good reason to go beyond cubic splines [12]. On the other hand, although there are many equivalent bases for representing a spline function, the B-spline basis system developed by [7] is attractive numerically [17]. Thus, we take the order-4 B-spline basis to represent cubic splines, under which the problem reduces to solving the following optimization problem:

$$\begin{split}\min_{\gamma^u,\gamma^d}\ &\|\mathbf{y}-\mathbf{B}_u\gamma^u-\mathbf{B}_d\gamma^d\|_2\,,\\ \text{s.t. }&\gamma_1^u\leq\gamma_2^u\leq\cdots\leq\gamma_{J_u}^u;\;\gamma_1^d\geq\gamma_2^d\geq\cdots\geq\gamma_{J_d}^d\,,\end{split}\tag{1}$$

where $\mathbf{y}=(y_1,\ldots,y_n)$ is an $n$-vector of the responses (note that we use round brackets to denote column vectors), $(\mathbf{B}_u)_{ik}=B_k^u(x_i)$, $k=1,\ldots,J_u$, and $(\mathbf{B}_d)_{i\ell}=B_\ell^d(x_i)$, $\ell=1,\ldots,J_d$, are the matrices constructed by evaluating the B-spline bases at $\{x_i\}_{i=1}^n$, and $\gamma^u=(\gamma_1^u,\ldots,\gamma_{J_u}^u)$, $\gamma^d=(\gamma_1^d,\ldots,\gamma_{J_d}^d)$ are the coefficient vectors.

For simplicity, we consider $J_u=J_d=J$. Since the knots determining the B-spline basis are conventionally placed at the quantiles of the $x_i$'s, the two sets of basis functions coincide, $B_k^u=B_k^d$, so the above problem (1) becomes

$$\begin{split}\min_{\gamma^u,\gamma^d}\ &\|\mathbf{y}-\mathbf{B}(\gamma^u+\gamma^d)\|_2\\ \text{s.t. }&\gamma_1^u\leq\gamma_2^u\leq\cdots\leq\gamma_J^u;\;\gamma_1^d\geq\gamma_2^d\geq\cdots\geq\gamma_J^d\,,\end{split}\tag{2}$$

where $\mathbf{B}_{ij}=B_j(x_i)$, $j=1,\ldots,J$.
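
For concreteness, the design matrix $\mathbf{B}$ can be assembled directly from the Cox–de Boor recursion, with interior knots at the quantiles of the $x_i$'s as just described. Below is a minimal Julia sketch; the helper names are ours for illustration, and the full implementation lives in the MonotoneDecomposition.jl package linked in the abstract.

```julia
using Statistics  # for quantile

# Cox–de Boor recursion: value of the j-th B-spline of order k (degree k-1)
# for knot vector t, evaluated at x.
function bspline_basis(x::Real, t::AbstractVector, j::Int, k::Int)
    if k == 1
        return float(t[j] <= x < t[j+1])
    end
    left  = t[j+k-1] == t[j] ? 0.0 :
            (x - t[j]) / (t[j+k-1] - t[j]) * bspline_basis(x, t, j, k - 1)
    right = t[j+k] == t[j+1] ? 0.0 :
            (t[j+k] - x) / (t[j+k] - t[j+1]) * bspline_basis(x, t, j + 1, k - 1)
    return left + right
end

# n × J design matrix B with B[i, j] = B_j(x_i); interior knots at quantiles
# of x, boundary knots repeated `order` times (order = 4 gives cubic splines).
function bspline_design(x::AbstractVector, J::Int; order::Int = 4)
    a, b = minimum(x), maximum(x)
    probs = range(0, 1; length = J - order + 2)[2:end-1]
    interior = isempty(probs) ? Float64[] : quantile(x, probs)
    # nudge the right boundary so x = b falls inside the last half-open interval
    t = vcat(fill(float(a), order), interior, fill(nextfloat(float(b)), order))
    return [bspline_basis(xi, t, j, order) for xi in x, j in 1:J]
end
```

For instance, `bspline_design(x, 26)` would give the order-4 design matrix with $J=26$ basis functions.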

First of all, Proposition 1 establishes the equivalence between problem (2) and the corresponding unconstrained B-spline fitting.

Proposition 1.

Regardless of the component solutions $\hat\gamma^u,\hat\gamma^d$ to problem (2), the overall solution $\hat\gamma^u+\hat\gamma^d$ is equivalent to the unconstrained B-spline fitting, i.e., the least squares solution,

$$\hat\gamma^{\mathrm{ls}}=\operatorname*{arg\,min}_\gamma\|\mathbf{y}-\mathbf{B}\gamma\|_2=(\mathbf{B}^T\mathbf{B})^{-1}\mathbf{B}^T\mathbf{y}\,.\tag{3}$$

Specifically, $\hat\gamma^u+\hat\gamma^d=\hat\gamma^{\mathrm{ls}}$.

Note that the monotone components in (2) are not identifiable, since

$$\gamma^u+\gamma^d=(\gamma^u+\delta)+(\gamma^d-\delta)\,,$$

where $\delta$ is an arbitrary increasing sequence, $\delta_1\leq\cdots\leq\delta_J$. In other words, the decomposition of a general function is not unique since

$$f_{\mathrm{up}}(x)+f_{\mathrm{down}}(x)=\{f_{\mathrm{up}}(x)+h(x)\}+\{f_{\mathrm{down}}(x)-h(x)\}\,,$$

where $h(x)$ is an arbitrary increasing function.

In order to obtain a unique decomposition, we consider the closest decomposition in some sense, such as requiring the difference between the two monotone components to be smallest in the $L_2$-norm. Thus, we impose a discrepancy constraint on problem (2), as detailed in the following subsections, to obtain a unique solution.

2.1 Discrepancy Constraint: A Motivating Example

Consider the simple function $y=x^3$, $x\in[-1,1]$, which may be decomposed as

$$x^3=\{x^3+h(x)\}+\{0-h(x)\}\triangleq f_{\mathrm{up}}(x)+f_{\mathrm{down}}(x)\,,$$

where $h(x)$ is an increasing function. If we set $h(0)=0$, then $h(x)\geq 0$ for $x\geq 0$ and $h(x)\leq 0$ for $x\leq 0$, so $x^3$ and $2h(x)$ always share the same sign; it follows that the magnitude of the difference between the two monotone components, $f_{\mathrm{up}}(x)-f_{\mathrm{down}}(x)=x^3+2h(x)$, is lower-bounded by $|x^3|$, i.e.,

$$|x^3|\leq|x^3+2h(x)|\,.\tag{4}$$

Since it is unreasonable for a strictly increasing function to have a non-constant decreasing component, the ideal decomposition should correspond to $h(x)=0$. Equation (4) suggests that such an ideal decomposition may be obtained by requiring the two monotone components to differ the least.

In light of this observation, we introduce the following discrepancy constraint for the two monotone components in the decomposition:

$$\|f_{\mathrm{up}}-f_{\mathrm{down}}\|_2\leq s\,,\tag{5}$$

where $s\geq 0$ is a tuning parameter. The role of the parameter $s$ can be summarized as follows:

  • if $s\to 0$, then the solution is $\gamma^u=\gamma^d=c\boldsymbol{1}_J$, and hence $\hat f_{\mathrm{up}}=\hat f_{\mathrm{down}}=c\mathbf{B}\boldsymbol{1}_J=c\boldsymbol{1}_n$ are constant functions;

  • if $s\to\infty$, then the problem reduces to the unconstrained B-spline fitting;

  • a moderate $s$ imposes some regularization, which is preferable for a better fit.

2.2 General Functions

With the discrepancy constraint in (5), we can restate problem (2) as

$$\begin{split}\min_{\gamma^u,\gamma^d}\ &\|\mathbf{y}-\mathbf{B}(\gamma^u+\gamma^d)\|_2\\ \text{s.t. }&\gamma_1^u\leq\gamma_2^u\leq\cdots\leq\gamma_J^u;\;\gamma_1^d\geq\gamma_2^d\geq\cdots\geq\gamma_J^d\\ &\|\mathbf{B}(\gamma^u-\gamma^d)\|_2\leq s\,.\end{split}\tag{6}$$

Defining $\Upsilon\triangleq\{\gamma=(\gamma^u,\gamma^d)\in\mathrm{I\!R}^{2J}:\gamma_1^u\leq\gamma_2^u\leq\cdots\leq\gamma_J^u;\;\gamma_1^d\geq\gamma_2^d\geq\cdots\geq\gamma_J^d\}$ and

$$\mathbf{Z}\triangleq\begin{bmatrix}\mathbf{B}&\mathbf{B}\end{bmatrix},\quad\mathbf{W}\triangleq\begin{bmatrix}\mathbf{B}^T\\-\mathbf{B}^T\end{bmatrix}\begin{bmatrix}\mathbf{B}&-\mathbf{B}\end{bmatrix},$$

we further rewrite problem (6) as

$$\min_{\gamma\in\Upsilon}\;\|\mathbf{y}-\mathbf{Z}\gamma\|_2^2\qquad\text{s.t.}\ \ \gamma^T\mathbf{W}\gamma\leq s^2\,.\tag{7}$$

It is more convenient to consider its Lagrangian form

$$\min_{\gamma\in\Upsilon}\left[\|\mathbf{y}-\mathbf{Z}\gamma\|_2^2+\mu(\gamma^T\mathbf{W}\gamma-s^2)\right]=\min_{\gamma\in\Upsilon}\|\mathbf{y}-\mathbf{Z}\gamma\|_2^2+\mu\gamma^T\mathbf{W}\gamma\,,\tag{8}$$

where $\mu\geq 0$ is the Lagrange multiplier. By Lagrangian duality, there is a one-to-one correspondence between the constrained problem (7) and the Lagrangian form (8): for each value of $s$ in the range where the constraint $\gamma^T\mathbf{W}\gamma\leq s^2$ is active, there is a corresponding value of $\mu$ that yields the same solution from the Lagrangian form (8). Conversely, the solution $\hat\gamma_\mu$ to problem (8) solves the bound problem (7) with $s^2=\hat\gamma_\mu^T\mathbf{W}\hat\gamma_\mu$. Some basic properties of the solution to problem (8) are summarized in Proposition 2.
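
Problem (8) is a convex quadratic program over the product cone $\Upsilon$, so any off-the-shelf QP solver applies. Purely for illustration, the following self-contained Julia sketch minimizes (8) by projected gradient descent: the Euclidean projection onto $\Upsilon$ factorizes over the two coefficient blocks and reduces to isotonic regression, computed by the pool-adjacent-violators algorithm (PAVA). The function names and the fixed iteration budget are our own simplifications, not the paper's implementation.

```julia
using LinearAlgebra

# Pool-adjacent-violators: Euclidean projection of v onto {u : u₁ ≤ ⋯ ≤ u_J}.
function pava_increasing(v::AbstractVector{<:Real})
    vals, wts, lens = Float64[], Float64[], Int[]
    for x in v
        push!(vals, float(x)); push!(wts, 1.0); push!(lens, 1)
        while length(vals) > 1 && vals[end-1] > vals[end]   # merge violating blocks
            m = (wts[end-1]*vals[end-1] + wts[end]*vals[end]) / (wts[end-1] + wts[end])
            vals[end-1] = m; wts[end-1] += wts[end]; lens[end-1] += lens[end]
            pop!(vals); pop!(wts); pop!(lens)
        end
    end
    reduce(vcat, [fill(vals[i], lens[i]) for i in eachindex(vals)])
end

# Projected gradient descent for the Lagrangian form (8).
function monotone_decomp(B::AbstractMatrix, y::AbstractVector, μ::Real; iters = 5_000)
    n, J = size(B)
    Z = [B B]
    W = [B'B -B'B; -B'B B'B]
    H = 2 .* (Z'Z .+ μ .* W)           # Hessian of the quadratic objective
    η = 1 / opnorm(H)                  # step size 1/L, L = gradient Lipschitz constant
    γ = zeros(2J)
    for _ in 1:iters
        g = 2 .* (Z' * (Z*γ .- y)) .+ 2μ .* (W*γ)   # gradient
        γ .-= η .* g
        γ[1:J]    .= pava_increasing(γ[1:J])          # project γᵘ onto increasing cone
        γ[J+1:2J] .= -pava_increasing(-γ[J+1:2J])     # project γᵈ onto decreasing cone
    end
    return γ[1:J], γ[J+1:2J]           # (γ̂ᵘ, γ̂ᵈ)
end
```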

Proposition 2.

Let $\hat\gamma=(\hat\gamma^u,\hat\gamma^d)$ be the monotone decomposition solution to problem (8) (or problem (13) discussed later).

  1. (i)

    There must be ties in the solution $\hat\gamma^u$ or $\hat\gamma^d$, i.e., there exists $i$ or $j$ such that $\hat\gamma_i^u=\hat\gamma_{i+1}^u$ or $\hat\gamma_j^d=\hat\gamma_{j+1}^d$.

  2. (ii)

    The mean values of the two monotone components are equal, $\boldsymbol{1}^T\mathbf{B}\hat\gamma^u=\boldsymbol{1}^T\mathbf{B}\hat\gamma^d$.

2.3 Monotone Functions

To delve deeper into the properties of the solution to problem (8), this section discusses the monotone decomposition of monotone functions. Without loss of generality, we consider increasing functions.

Proposition 3.

Let $\hat\gamma=(\hat\gamma^u,\hat\gamma^d)$ be the monotone decomposition solution to problem (8). Suppose $\hat\gamma^u+\hat\gamma^d$ is increasing; then

  1. (i)

    $\hat\gamma^d$ is a vector with identical elements, i.e., $\hat\gamma^d=c\boldsymbol{1}$, where the constant $c=\frac{\boldsymbol{1}^T\mathbf{B}\hat\gamma^u}{n}$;

  2. (ii)

    if there are no ties in $\hat\gamma^u$, i.e., $\hat\gamma_1^u<\hat\gamma_2^u<\cdots<\hat\gamma_J^u$, then

    $$\hat\gamma^u=\frac{1}{\mu+1}\hat\gamma^{\mathrm{ls}}+\frac{\mu-1}{\mu+1}c\boldsymbol{1}\,,\tag{9}$$

    where the unconstrained B-spline solution $\hat\gamma^{\mathrm{ls}}$ is given in Equation (3);

  3. (iii)

    if $\hat\gamma_1^u<\cdots<\hat\gamma_{k_1}^u=\cdots=\hat\gamma_{k_2}^u<\cdots<\hat\gamma_{k_{2m-1}}^u=\cdots=\hat\gamma_{k_{2m}}^u<\cdots<\hat\gamma_J^u$, where $1\leq k_1\leq k_2\leq\cdots\leq k_{2m-1}\leq k_{2m}\leq J$, then

    $$\hat\gamma^u=\frac{1}{\mu+1}\mathbf{G}^T(\mathbf{G}\mathbf{B}^T\mathbf{B}\mathbf{G}^T)^{-1}\mathbf{G}\mathbf{B}^T\mathbf{y}+\frac{\mu-1}{\mu+1}c\boldsymbol{1}\,,\tag{10}$$

    where $\mathbf{G}$ is a block diagonal matrix with blocks

    $$\{\mathbf{I}_{k_1-1},\,\boldsymbol{1}^T_{k_2-k_1+1},\,\ldots,\,\mathbf{I}_{k_{2m-1}-k_{2m-2}-1},\,\boldsymbol{1}^T_{k_{2m}-k_{2m-1}+1},\,\mathbf{I}_{J-k_{2m}}\}\,.$$

    The above result (ii) can be viewed as a special case with $k_1=k_2=J$.

Intuitively, solution (9) can be viewed as shrinkage with an offset applied to the unconstrained B-spline fit $\hat\gamma^{\mathrm{ls}}$: $\frac{1}{\mu+1}$ is the shrinkage factor and $\frac{\mu-1}{\mu+1}c\boldsymbol{1}$ is the offset. Even with the general matrix $\mathbf{G}$, solution (10) exhibits a similar shrinkage-with-offset pattern.
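
To make the tie structure concrete, $\mathbf{G}$ can be constructed mechanically from the runs of tied coefficients in $\hat\gamma^u$: each maximal tied run contributes one row $\boldsymbol{1}^T$ over that run, and each untied coefficient contributes an identity row. A hedged Julia sketch (the function name and the numerical tolerance are ours):

```julia
# Build the block-diagonal G of Proposition 3 from the tie pattern of γ̂ᵘ.
# Each maximal run of (numerically) equal entries yields one row 1ᵀ over the
# run; isolated entries yield identity rows, so G has size g × J with g ≤ J.
function tie_matrix(γ::AbstractVector{<:Real}; tol = 1e-8)
    J = length(γ)
    rows = Vector{Vector{Float64}}()
    j = 1
    while j <= J
        k = j
        while k < J && abs(γ[k+1] - γ[k]) <= tol
            k += 1
        end
        row = zeros(J); row[j:k] .= 1.0   # 1ᵀ over the tied run (length 1 if no tie)
        push!(rows, row)
        j = k + 1
    end
    return permutedims(reduce(hcat, rows))   # g × J
end
```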

2.4 MSE Comparisons

To quantify the performance of fitting by monotone decomposition, consider the model

$$y=f(x)+\varepsilon\,,\quad\varepsilon\sim N(0,\sigma^2)\,.\tag{11}$$

Define the mean squared errors of the fits,

$$\operatorname{MSE}(\hat{\mathbf{y}})=\mathbb{E}\|\mathbf{f}-\mathbf{B}(\hat\gamma^u+\hat\gamma^d)\|_2^2\,,\quad\operatorname{MSE}(\hat{\mathbf{y}}^{\mathrm{ls}})=\mathbb{E}\|\mathbf{f}-\mathbf{B}\hat\gamma^{\mathrm{ls}}\|_2^2\,,$$

where $\mathbf{f}=[f(x_1),\ldots,f(x_n)]^T$, and the expectations are taken over $\mathbf{y}\sim N(\mathbf{f},\sigma^2\mathbf{I})$. Proposition 4 shows that the fitting with monotone decomposition can achieve better performance, particularly in high-noise scenarios, when the underlying function is monotone.

Proposition 4.

Suppose the monotone decomposition $\hat\gamma=(\hat\gamma^u,\hat\gamma^d)$ satisfies that $\hat\gamma^u+\hat\gamma^d$ is increasing. Let $\mathbf{G}$ be the $g\times J$ matrix defined in Proposition 3, so that $\mathbf{G}\hat\gamma^u$ is the sub-vector with unique elements. If

$$\sigma^2>\frac{-C_1(J-\mathbb{E}g)+C_2(\mathbb{E}g+1)+\sqrt{\Delta}}{2[(J-\mathbb{E}g)(\mathbb{E}g+1)+(\mathbb{E}g-1)^2]}\,,$$

where

$$\begin{aligned}C_1&=\mathbf{f}^T\mathbf{A}\mathbf{f}-\frac{(\boldsymbol{1}_n^T\mathbf{f})^2}{n}\geq 0\,,\qquad C_2=\mathbf{f}^T(\mathbf{I}-\mathbf{A})\mathbf{f}\geq 0\,,\\ \mathbf{A}&=\mathbb{E}[\mathbf{B}\mathbf{G}^T(\mathbf{G}\mathbf{B}^T\mathbf{B}\mathbf{G}^T)^{-1}\mathbf{G}\mathbf{B}^T]\,,\\ \Delta&=\left[C_1(J-\mathbb{E}g)+C_2(\mathbb{E}g+1)\right]^2+4C_1C_2(\mathbb{E}g-1)^2\,,\end{aligned}$$

and the expectations are taken over $\mathbf{y}$ since $\mathbf{G}$ (and hence $g$) depends on $\mathbf{y}$, then there exists a monotone decomposition such that $\operatorname{MSE}(\hat{\mathbf{y}})<\operatorname{MSE}(\hat{\mathbf{y}}^{\mathrm{ls}})$.

In particular, if we assume there are no ties in $\hat\gamma^u$, i.e., $\mathbf{G}\equiv\mathbf{I}$ for all $\mathbf{y}$, then there always exists a monotone decomposition such that $\operatorname{MSE}(\hat{\mathbf{y}})<\operatorname{MSE}(\hat{\mathbf{y}}^{\mathrm{ls}})$, regardless of the noise level.

The lower bound on $\sigma^2$ in Proposition 4 might not be easy to evaluate. Nonetheless, the pivotal implication is that the monotone decomposition fitting can achieve better performance when the noise level is large enough. Extensive simulations in Section 4 agree with this argument. Moreover, although Proposition 4 is established specifically for monotone functions, the simulations show that the monotone decomposition fitting with cubic splines can also outperform the corresponding unconstrained cubic splines applied to random functions, particularly in high-noise scenarios.

3 Monotone Decomposition with Smoothing Splines

When dealing with cubic splines, it is typically necessary to ascertain both the number of basis functions, denoted $J$, and the placement of the knots. In contrast, smoothing splines take a different approach by employing all unique data points as knots, thus bypassing the need to optimize the knot placement and the number of knots required for the B-spline basis functions.

With B-spline basis functions, the smoothing spline $f(x)=\sum_{j=1}^J\gamma_jB_j(x)$ can be estimated as follows,

$$\hat\gamma^{\mathrm{ss}}=\operatorname*{arg\,min}_\gamma\|\mathbf{y}-\mathbf{B}\gamma\|_2^2+\lambda\gamma^T\boldsymbol{\Omega}\gamma\,,\tag{12}$$

where $\{\boldsymbol{\Omega}\}_{jk}=\int B_j''(s)B_k''(s)\,ds$ is called the roughness penalty matrix and $\lambda>0$ is the penalty parameter. For this reason, smoothing splines are also referred to as penalized splines.
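
Numerically, $\boldsymbol{\Omega}$ and the solution of (12) are straightforward to compute. The Julia sketch below reuses the illustrative `bspline_basis` helper from Section 2, approximating second derivatives by central differences and the integral by a Riemann sum on a fine grid; this is an illustration only, as an exact implementation would integrate the piecewise polynomials analytically.

```julia
# Roughness penalty matrix Ω with Ω[j,k] = ∫ B_j''(s) B_k''(s) ds, approximated
# on a uniform grid; t is the knot vector and J the number of basis functions.
function penalty_matrix(t::AbstractVector, J::Int; order = 4, ngrid = 2001, h = 1e-4)
    a, b = t[order], t[end - order + 1]          # boundary knots
    xs = range(a + 2h, b - 2h; length = ngrid)
    d2(j, x) = (bspline_basis(x + h, t, j, order) - 2bspline_basis(x, t, j, order) +
                bspline_basis(x - h, t, j, order)) / h^2   # central difference
    Ω = zeros(J, J)
    for j in 1:J, k in j:J
        Ω[j, k] = Ω[k, j] = step(xs) * sum(d2(j, x) * d2(k, x) for x in xs)
    end
    return Ω
end

# Closed-form smoothing-spline coefficients from (12).
smooth_coef(B, y, Ω, λ) = (B'B + λ .* Ω) \ (B'y)
```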

Imposing the roughness penalty $\gamma^T\boldsymbol{\Omega}\gamma=(\gamma^u+\gamma^d)^T\boldsymbol{\Omega}(\gamma^u+\gamma^d)=\gamma^T\boldsymbol{\Sigma}\gamma$ on problem (8), where $\boldsymbol{\Sigma}\triangleq\begin{bmatrix}\boldsymbol{\Omega}&\boldsymbol{\Omega}\\\boldsymbol{\Omega}&\boldsymbol{\Omega}\end{bmatrix}$, we have the Lagrangian form of monotone decomposition with smoothing splines,

$$\min_{\gamma\in\Upsilon}\|\mathbf{y}-\mathbf{Z}\gamma\|_2^2+\mu\gamma^T\mathbf{W}\gamma+\lambda\gamma^T\boldsymbol{\Sigma}\gamma\,.\tag{13}$$

For general functions, the properties in Proposition 2 also hold for the monotone decomposition with smoothing splines.

3.1 Monotone Functions

Likewise, we delve deeper into the characteristics of monotone decomposition with smoothing splines for monotone functions. The solutions exhibit analogous shrinkage-with-offset patterns, akin to those observed in Proposition 3 for monotone decomposition with cubic splines; the results are articulated in the following Proposition 5.

Proposition 5.

Let $\hat\gamma=(\hat\gamma^u,\hat\gamma^d)$ be the monotone decomposition solution to problem (13). Suppose $\hat\gamma^u+\hat\gamma^d$ is increasing; then

  1. (i)

    $\hat\gamma^d$ is a vector with identical elements, i.e., $\hat\gamma^d=c\boldsymbol{1}$, where the constant $c=\frac{\boldsymbol{1}^T\mathbf{B}\hat\gamma^u}{n}$;

  2. (ii)

    if there are no ties in $\hat\gamma^u$, i.e., the inequalities hold strictly, $\hat\gamma_1^u<\hat\gamma_2^u<\cdots<\hat\gamma_J^u$, then

    $$\hat\gamma^u=\frac{1}{1+\mu}\hat\gamma^{\mathrm{ss}}\!\left(\frac{\lambda}{1+\mu}\right)-c\,((1+\mu)\mathbf{B}^T\mathbf{B}+\lambda\boldsymbol{\Omega})^{-1}((1-\mu)\mathbf{B}^T\mathbf{B}+\lambda\boldsymbol{\Omega})\boldsymbol{1}_J\,,\tag{14}$$

    where $\hat\gamma^{\mathrm{ss}}\left(\frac{\lambda}{1+\mu}\right)$ is the smoothing-spline solution with penalty parameter $\frac{\lambda}{1+\mu}$,

    $$\hat\gamma^{\mathrm{ss}}\!\left(\frac{\lambda}{1+\mu}\right)=\left(\mathbf{B}^T\mathbf{B}+\frac{\lambda}{1+\mu}\boldsymbol{\Omega}\right)^{-1}\mathbf{B}^T\mathbf{y}\,.$$
  3. (iii)

    if $\hat\gamma_1^u<\cdots<\hat\gamma_{k_1}^u=\cdots=\hat\gamma_{k_2}^u<\cdots<\hat\gamma_{k_{2m-1}}^u=\cdots=\hat\gamma_{k_{2m}}^u<\cdots<\hat\gamma_J^u$, where $1\leq k_1\leq k_2\leq\cdots\leq k_{2m-1}\leq k_{2m}\leq J$, then

    $$\begin{split}\hat\gamma^u&=\mathbf{G}^T((1+\mu)\mathbf{G}\mathbf{B}^T\mathbf{B}\mathbf{G}^T+\lambda\mathbf{G}\boldsymbol{\Omega}\mathbf{G}^T)^{-1}\mathbf{G}\mathbf{B}^T\mathbf{y}\\ &\quad-c\,\mathbf{G}^T((1+\mu)\mathbf{G}\mathbf{B}^T\mathbf{B}\mathbf{G}^T+\lambda\mathbf{G}\boldsymbol{\Omega}\mathbf{G}^T)^{-1}((1-\mu)\mathbf{G}\mathbf{B}^T\mathbf{B}\mathbf{G}^T+\lambda\mathbf{G}\boldsymbol{\Omega}\mathbf{G}^T)\boldsymbol{1}_g\,,\end{split}$$

    where $\mathbf{G}$ is defined in Proposition 3. The above result (ii) can be viewed as a special case with $k_1=k_2=J$.

For the no-tie solution (14), the shrinkage acts on the ridge solution $\hat\gamma^{\mathrm{ss}}(\frac{\lambda}{1+\mu})$, but unlike the constant offset in Equation (9), the offset $c((1+\mu)\mathbf{B}^T\mathbf{B}+\lambda\boldsymbol{\Omega})^{-1}((1-\mu)\mathbf{B}^T\mathbf{B}+\lambda\boldsymbol{\Omega})\boldsymbol{1}_J$ depends on $\mathbf{B}$ and $\boldsymbol{\Omega}$.

3.2 MSE Comparisons

Based on model (11), to compare the fitting performance of monotone decomposition with smoothing splines against that of smoothing splines, we further define the mean squared error for smoothing splines,

$$\operatorname{MSE}(\hat{\mathbf{y}}^{\mathrm{ss}}(\lambda))=\mathbb{E}\|\mathbf{f}-\mathbf{B}\hat\gamma^{\mathrm{ss}}(\lambda)\|_2^2\,,$$

where $\hat\gamma^{\mathrm{ss}}(\lambda)$ is the solution to smoothing splines with penalty parameter $\lambda$.

Proposition 6 shows that employing monotone decomposition with smoothing splines can result in a superior mean squared error (MSE) compared to smoothing splines in the context of monotone functions, particularly when the noise level is sufficiently high. While the condition (15) outlined in Proposition 6 may appear intricate, the simulations presented in the next section empirically substantiate this assertion. Furthermore, although the proposition is specifically formulated for monotone functions, the simulations show that the monotone decomposition applied to general functions can still achieve better performance in high-noise scenarios.

Proposition 6.

Consider the smoothing spline with penalty parameter $\lambda_0$. Let $\hat\gamma^{\mathrm{ss}}(\lambda_0)$ be the coefficient vector and denote its hat matrix by $\mathbf{Q}=\mathbf{B}(\mathbf{B}^T\mathbf{B}+\lambda_0\boldsymbol{\Omega})^{-1}\mathbf{B}^T$. If

$$\sigma^2>\frac{\mathbf{f}^T\mathbf{Q}(\mathbf{I}-\frac{\boldsymbol{1}_n\boldsymbol{1}_n^T\mathbf{Q}}{n})(\mathbf{I}-\mathbf{Q})\mathbf{f}}{\operatorname{tr}[(\mathbf{I}-\frac{\boldsymbol{1}_n\boldsymbol{1}_n^T\mathbf{Q}}{n})\mathbf{Q}^2]}\,,\tag{15}$$

and the monotone decomposition $\hat\gamma^u+\hat\gamma^d$ is increasing with no ties, then there exists a monotone decomposition with parameters $\lambda=\lambda_0/k$, $\mu=1/k-1$, where $k\in(0,1)$, that achieves a smaller mean squared error than $\operatorname{MSE}(\hat{\mathbf{y}}^{\mathrm{ss}}(\lambda_0))$.

4 Simulations for Fitting

In this section, we compare the performance of monotone decomposition using simulated examples. We generate data from a function $f$ with additive Gaussian noise,

$$y=f(x)+N(0,\sigma^2)\,,\quad x\in[-1,1]\,.$$

To cover a diverse range of functional forms, we consider the following different types of functions.

  • monotone functions: (i) polynomial function: $y=x^3$; (ii) exponential function: $y=\exp(x)$; (iii) sigmoid function: $y=\frac{1}{1+\exp(-5x)}$.

  • general functions: (i) unimodal: $y=x^2$; (ii) random functions, where the kernel can be Squared Exponential (SE), Rational Quadratic (RQ), Matérn (Mat), or Periodic [18]. The numerical values appended to the kernel names in Tables 1 and 2 are the kernel parameters. For example, “Mat12” refers to the Matérn kernel with parameter $\nu=1/2$. The detailed procedure for generating a random function and a visualization of those curves can be found in the Supplementary Material.

To compare the performance of different methods, we adopt the mean squared fitting error (MSFE), i.e., the residual sum of squares (RSS) divided by the sample size, and the mean squared prediction error (MSPE),

$$\mathrm{MSFE}=\frac{1}{n}\sum_{i=1}^n\left(y_i-\hat f(x_i)\right)^2\,,\qquad\mathrm{MSPE}=\frac{1}{N}\sum_{i=1}^N\left(\hat f(t_i)-f(t_i)\right)^2\,,$$

where $t_i=x_{(1)}+(i-1)\cdot\frac{x_{(N)}-x_{(1)}}{N}$, $i=1,\ldots,N$, are equally spaced within the same range as the $x_i$'s. Based on $R=100$ replications, we estimate the mean MSFE and MSPE, together with their respective standard errors. To judge how significant the differences in MSPE between the fitting by monotone decomposition and the corresponding spline fitting are, we consider

$$\Delta=\mathrm{MSPE}(\text{Monotone decomposition})-\mathrm{MSPE}(\text{Spline fitting})\,,$$

and denote the differences for each experiment as $\delta_i$, $i=1,\ldots,R$. We report the $p$-value for the one-sided $t$-test of $H_0:\Delta=0$ versus $H_1^<:\Delta<0$ when $\sum_{i=1}^R\delta_i<0$ (or $H_1^>:\Delta>0$ when $\sum_{i=1}^R\delta_i>0$). Besides, we also count the proportion of replications in which the fitting by monotone decomposition achieves better performance, $\text{prop}\triangleq\frac{\sum_{i=1}^R\#\{\delta_i<0\}}{R}$.
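
These quantities translate directly into code. A hedged Julia sketch of the metrics and the one-sided paired $t$-test on the per-replication differences $\delta_i$ follows; the function names are ours, and the HypothesisTests.jl package is assumed for the $t$-test $p$-value.

```julia
using Statistics, HypothesisTests

msfe(y, yhat) = mean((y .- yhat) .^ 2)          # RSS / n
mspe(fhat_t, f_t) = mean((fhat_t .- f_t) .^ 2)  # on the equally spaced grid tᵢ

# δ[i] = MSPE(monotone decomposition)ᵢ − MSPE(spline fitting)ᵢ over R replications.
function compare_mspe(δ::AbstractVector{<:Real})
    tail = sum(δ) < 0 ? :left : :right          # H₁: Δ < 0 or H₁: Δ > 0
    p = pvalue(OneSampleTTest(collect(δ)); tail = tail)
    (pvalue = p, prop = mean(δ .< 0))           # prop = share of wins for MD
end
```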

4.1 Cubic Splines

Firstly, we consider the monotone decomposition with cubic splines. There are two tuning parameters: the number of basis functions JJ, and the Lagrange multiplier μ\mu for the discrepancy between the two components. We adopt two strategies to optimize these parameters:

Tune $\mu$ with fixed $J$:

We pick $J$, the tuning parameter for cubic splines, to be a minimizer of the cross-validation (CV) error, and then perform the monotone decomposition with cubic splines using the same $J$ while tuning the parameter $\mu$. Figure 1 shows an example, with $J=26$ selected by CV. The left panel shows the leave-one-out CV error plotted against $\mu$. The cubic spline fit and the monotone decomposition fit are displayed in the right panel, where the former achieves an MSPE of 0.102, while the latter improves the MSPE to 0.066.

Figure 1: Demo for monotone decomposition with cubic splines. The left panel shows the leave-one-out cross-validation error curve for each candidate $\mu$ when $J$ is CV-tuned for the cubic spline. The right panel shows the corresponding fitting curves, together with the truth and the noised observations. The values in the parentheses are the mean squared prediction errors.
Tune $\mu$ and $J$ simultaneously:

Instead of fixing $J$ in the monotone decomposition procedure, we also use cross-validation to choose it, together with $\mu$. Figure 2 displays an example of this process, where the left panel shows the CV error for each parameter pair $(\mu,J)$, and the right panel compares the fit given the parameters that minimize the CV error against the cubic spline fit, whose parameter $J$ is tuned separately by CV.

Figure 2: Demo for monotone decomposition with cubic splines. The left panel shows the leave-one-out cross-validation error curve for each candidate pair $(\mu,J)$. The right panel shows the corresponding fitting curves, together with the truth and noised observations. The values in the parentheses are the mean squared prediction errors.

Note that the right panels of Figures 1 and 2 depict the same training data. Although the MSPE of 0.079 obtained by tuning $(\mu,J)$ simultaneously is slightly larger than the MSPE of 0.066 obtained by tuning $\mu$ with fixed $J$, both monotone decomposition fits achieve better performance than the cubic spline fit, which has an MSPE of 0.102.

To provide comprehensive comparisons, we conducted 100 repetitions for 12 types of curves under different noise levels. The results from tuning $\mu$ and $J$ simultaneously are summarized in Table 1, and the results from tuning $\mu$ with fixed $J$ can be found in the Supplementary Material. For some curves with small noise (e.g., $\sigma=0.2$), such as $y=x^3$, the decomposition method performs slightly worse than cubic spline fitting. Nevertheless, the monotone decomposition always outperforms cubic spline fitting in higher-noise (e.g., $\sigma=1.0$) scenarios, regardless of the optimization strategy.

Table 1: Results for comparing the cubic splines and the fitting by monotone decomposition with CV-tuned $(\mu,J)$. The values are averages over 100 replications, with the standard error of the average in parentheses. The bold values highlight the smaller mean squared prediction error. The complete table with finer noise levels can be found in the Supplementary Material.
curve | $\sigma$ | MSFE (CubicSpline) | MSFE (MonoDecomp) | MSPE (CubicSpline) | MSPE (MonoDecomp) | p-value | prop.
$x^2$ 1.0 9.71e+00 (7.3e-02) 9.73e+00 (7.3e-02) 7.50e+00 (3.1e-01) 6.93e+00 (2.6e-01) 5.91e-03 (**) 0.59
$x^3$ 1.0 9.65e+00 (7.4e-02) 9.60e+00 (7.2e-02) 7.63e+00 (3.5e-01) 7.17e+00 (2.7e-01) 2.21e-02 (*) 0.6
$\exp(x)$ 1.0 9.57e+00 (5.9e-02) 9.49e+00 (7.3e-02) 7.56e+00 (2.8e-01) 7.08e+00 (3.1e-01) 3.59e-02 (*) 0.57
sigmoid 1.0 9.51e+00 (8.5e-02) 9.50e+00 (8.7e-02) 7.33e+00 (3.2e-01) 6.61e+00 (2.6e-01) 1.45e-03 (**) 0.56
SE-1 1.0 9.55e+00 (7.0e-02) 9.51e+00 (7.6e-02) 7.29e+00 (3.2e-01) 6.62e+00 (2.7e-01) 5.51e-03 (**) 0.63
SE-0.1 1.0 9.29e+00 (8.5e-02) 9.20e+00 (9.3e-02) 1.44e+01 (2.6e-01) 1.38e+01 (2.4e-01) 5.93e-04 (***) 0.7
Mat12-1 1.0 9.79e+00 (8.9e-02) 9.73e+00 (9.1e-02) 1.26e+01 (2.9e-01) 1.17e+01 (2.4e-01) 9.31e-07 (***) 0.68
Mat12-0.1 1.0 1.04e+01 (1.3e-01) 1.04e+01 (1.3e-01) 2.07e+01 (2.4e-01) 2.01e+01 (2.4e-01) 1.85e-04 (***) 0.73
Mat32-1 1.0 9.62e+00 (8.2e-02) 9.61e+00 (8.3e-02) 9.01e+00 (3.5e-01) 8.00e+00 (2.6e-01) 1.46e-04 (***) 0.56
Mat32-0.1 1.0 9.62e+00 (1.1e-01) 9.53e+00 (9.7e-02) 1.68e+01 (2.5e-01) 1.58e+01 (2.1e-01) 4.67e-10 (***) 0.72
RQ-0.1-0.5 1.0 9.55e+00 (1.1e-01) 9.50e+00 (1.2e-01) 1.50e+01 (2.4e-01) 1.44e+01 (2.6e-01) 5.01e-03 (**) 0.68
Periodic-0.1-4 1.0 9.41e+00 (1.2e-01) 9.31e+00 (1.1e-01) 1.72e+01 (3.0e-01) 1.60e+01 (2.5e-01) 2.03e-10 (***) 0.63

4.2 Smoothing Splines

This section compares the fitting performance of monotone decomposition with smoothing splines to the fitting of the smoothing splines. There are two tuning parameters: the penalty parameter $\lambda$ and the Lagrange multiplier $\mu$ for the discrepancy. We consider three strategies to optimize these parameters:

  • Tune $\lambda$ for smoothing splines first, then tune $\mu$ for the monotone decomposition with smoothing splines using the tuned $\lambda$;

  • Following Proposition 6, tune $\lambda$ for smoothing splines first, then tune the shrinkage factor $k=\frac{1}{1+\mu}$ for monotone decomposition with smoothing splines using penalty parameter $\lambda/k$;

  • Tune $\lambda$ and $\mu$ simultaneously for monotone decomposition with smoothing splines.

All strategies use cross-validation (CV) to determine the parameters. Specifically, we pick the tuning parameters that minimize the CV error.

For brevity, we only present the results using the third strategy in this section. Results based on the other two strategies can be found in the Supplementary Material. Figure 3 illustrates the simultaneous tuning of $(\mu,\lambda)$ using the same toy example presented in Figures 1 and 2. The smoothness penalty in smoothing splines results in fitted curves that are notably less wiggly than the curves fitted by cubic splines without such a penalty.

Figure 3: Demo for monotone decomposition with smoothing splines. The left panel shows the leave-one-out cross-validation error heatmap for each candidate pair $(\mu,\lambda)$. The right panel shows the corresponding fitting curves, together with the truth and the noised observations. The values in the parentheses are the mean squared prediction errors.

The results of 100 repeated experiments are summarized in Table 2. We observe that the monotone decomposition consistently outperforms the corresponding smoothing splines. Moreover, the monotone decomposition with CV-tuned $(\lambda,\mu)$ is more likely to obtain better performance than the other two strategies. Regardless of the optimization strategy employed, all results consistently show that the monotone decomposition fitting can achieve good performance, especially in high-noise scenarios.

Table 2: Results for comparing the smoothing splines with CV-tuned $\lambda$ and the fitting by monotone decomposition with CV-tuned $(\lambda,\mu)$. The values are averages over 100 replications, with the standard error of the average in parentheses. The bold values highlight the smaller mean squared prediction error. The complete table with finer noise levels can be found in the Supplementary Material.
curve | $\sigma$ | MSFE (SmoothSpline) | MSFE (MonoDecomp) | MSPE (SmoothSpline) | MSPE (MonoDecomp) | p-value | prop.
$x^2$ 1.0 9.65e+00 (8.9e-02) 9.69e+00 (8.5e-02) 6.44e+00 (2.9e-01) 6.35e+00 (2.5e-01) 1.68e-01 0.49
$x^3$ 1.0 9.75e+00 (8.2e-02) 9.77e+00 (8.2e-02) 6.47e+00 (1.8e-01) 6.21e+00 (1.7e-01) 1.18e-04 (***) 0.65
$\exp(x)$ 1.0 9.74e+00 (8.4e-02) 9.75e+00 (8.4e-02) 5.94e+00 (2.3e-01) 5.82e+00 (2.1e-01) 3.12e-02 (*) 0.58
sigmoid 1.0 9.60e+00 (7.8e-02) 9.64e+00 (7.5e-02) 5.99e+00 (2.9e-01) 5.68e+00 (2.4e-01) 4.47e-04 (***) 0.67
SE-1 1.0 9.67e+00 (8.3e-02) 9.70e+00 (8.1e-02) 6.32e+00 (2.8e-01) 6.11e+00 (2.5e-01) 3.86e-03 (**) 0.6
SE-0.1 1.0 8.92e+00 (9.2e-02) 8.99e+00 (9.0e-02) 1.23e+01 (2.1e-01) 1.23e+01 (2.1e-01) 4.32e-01 0.57
Mat12-1 1.0 9.65e+00 (9.5e-02) 9.69e+00 (9.2e-02) 1.15e+01 (2.3e-01) 1.13e+01 (2.1e-01) 3.40e-04 (***) 0.56
Mat12-0.1 1.0 9.76e+00 (1.2e-01) 9.92e+00 (1.1e-01) 1.90e+01 (2.2e-01) 1.90e+01 (2.2e-01) 2.06e-01 0.58
Mat32-1 1.0 9.59e+00 (8.5e-02) 9.64e+00 (7.5e-02) 7.20e+00 (2.4e-01) 7.04e+00 (2.0e-01) 3.65e-02 (*) 0.49
Mat32-0.1 1.0 8.96e+00 (1.1e-01) 9.14e+00 (1.0e-01) 1.48e+01 (2.2e-01) 1.47e+01 (2.1e-01) 1.44e-01 0.54
RQ-0.1-0.5 1.0 9.10e+00 (1.1e-01) 9.22e+00 (1.1e-01) 1.27e+01 (2.4e-01) 1.25e+01 (2.2e-01) 1.21e-03 (**) 0.6
Periodic-0.1-4 1.0 8.69e+00 (1.0e-01) 8.89e+00 (9.1e-02) 1.44e+01 (2.1e-01) 1.43e+01 (1.9e-01) 3.03e-01 0.49

5 Test of Monotonicity

Having obtained two monotone components through monotone decomposition, in addition to utilizing the sum of these two components as a fitting method, we can also derive statistics for testing monotonicity. Consider the model $Y=f(X)+\epsilon$, where $X$ is a scalar covariate, $Y$ is a scalar dependent random variable, $\epsilon$ is the noise satisfying $\mathbb{E}[\epsilon\mid X]=0$, and $f$ is an unknown function. The test of monotonicity aims to test

$$H_0:f\text{ is monotone}\quad\text{vs.}\quad H_1:f\text{ is not monotone}\,,$$

where “monotone” can be specifically (strictly) “increasing” or “decreasing”.

5.1 Related Work

There are many existing approaches for testing monotonicity. [1] constructed a test based on a critical bandwidth: they fitted a local linear regression and determined the smallest bandwidth such that the estimate becomes monotone. This critical bandwidth is then used as the test statistic, and the $p$-value is calculated by the bootstrap method. [11] pointed out the shortcoming of this test when the true function has flat or nearly flat spots, and proposed a test that estimates local slopes and approximates the distribution of their weighted minimum. [9] proposed test statistics that are functionals of a U-process based on the signs of $(Y_{i+k}-Y_i)(X_{i+k}-X_i)$; they approximated the limiting distribution by Gaussian processes and then derived the critical values for an asymptotic significance level $\alpha$. [3] used similar U-statistics, but introduced a weighting function $Q(X_i,X_j)$ and proposed a statistic based on $(Y_i-Y_j)\operatorname{sign}(X_j-X_i)Q(X_i,X_j)$, where $Q$ is chosen from a large set of potentially useful weighting functions to maximize the statistic. [22] used quadratic regression splines to fit the data, took the minimum of the slopes at the knots as the test statistic, and then estimated the null distribution of this statistic by performing constrained quadratic regression splines.

5.2 Test by Monotone Decomposition

Suppose we have obtained the monotone components. Propositions 3 and 5 imply that the coefficients of one component are constant if the function is monotone. Thus, we can test the monotonicity of a function by testing whether the coefficients of a monotone component are constant. The equivalences are summarized in Table 3.

Table 3: Equivalent hypotheses.
Original Hypothesis Hypothesis in terms of Monotone Decomposition
$\mathcal{H}_0^u: f$ is increasing    $H_0^u: \gamma^d=c\boldsymbol{1}$.
$\mathcal{H}_0^d: f$ is decreasing    $H_0^d: \gamma^u=c\boldsymbol{1}$.
$\mathcal{H}_0^{su}: f$ is strictly increasing    $H_0^{su}: \gamma^d=c\boldsymbol{1},\ \gamma^u\neq c\boldsymbol{1}$.
$\mathcal{H}_0^{sd}: f$ is strictly decreasing    $H_0^{sd}: \gamma^u=c\boldsymbol{1},\ \gamma^d\neq c\boldsymbol{1}$.
$\mathcal{H}_0^m: f$ is monotone    $H_0^m$: one monotone component is constant, i.e., $\min(\gamma^u,\gamma^d)=c\boldsymbol{1}$.
$\mathcal{H}_0^{sm}: f$ is strictly monotone    $H_0^{sm}$: one and only one monotone component is constant, i.e., $\min(\gamma^u,\gamma^d)=c\boldsymbol{1},\ \max(\gamma^u,\gamma^d)\neq c\boldsymbol{1}$.

In Table 3, the minimum (maximum) of two vectors $a,b\in\mathrm{I\!R}^J$ is defined as:

$$\min(a,b)\triangleq\operatorname*{arg\,min}_{x\in\{a,b\}}V(x)\,,\qquad\max(a,b)\triangleq\operatorname*{arg\,max}_{x\in\{a,b\}}V(x)\,,$$

where $V$ is the sample variance of the elements of a vector. To test $H_0^u$, $H_0^d$ and $H_0^m$, consider the test statistics

$$T_u=V(\hat\gamma^d)\,,\quad T_d=V(\hat\gamma^u)\,,\quad T_m=V(\min(\hat\gamma^u,\hat\gamma^d))\,.$$

Note that the null hypothesis is rejected if the test statistic $T$ is large enough. Specifically, given a significance level $\alpha$, the respective null hypothesis is rejected if $T_u\geq t_{u,1-\alpha}$, $T_d\geq t_{d,1-\alpha}$ or $T_m\geq t_{m,1-\alpha}$, where $t_{u,1-\alpha}$, $t_{d,1-\alpha}$ and $t_{m,1-\alpha}$ denote the critical values of the distributions of $T_u$, $T_d$, $T_m$ under the respective null hypotheses. These null distributions can be characterized by bootstrap samples. Since the errors $\epsilon$ (and the residuals $\hat\epsilon$) can be heteroscedastic, we adopt the wild bootstrap [6]. Without loss of generality, we focus on the test for increasing functions; the procedure for testing $H_0^u$ is outlined in Algorithm 1.

Algorithm 1 Test of Monotonicity (Increasing): $H_0^u$
0:  Significance level $\alpha$, number of bootstrap samples $R$.
1:  Fit $\{x_i,y_i\}_{i=1}^n$ by monotone decomposition, either with cubic splines or smoothing splines. Denote the increasing component by $\hat y^u$, let $\hat c=\frac{\boldsymbol{1}^T\hat y^u}{n}$, and denote the fitting method by $\hat m$.
2:  Calculate the test statistic $T$.
3:  Calculate the residuals $\epsilon_i=y_i-\hat y_i$.
4:  for $r$ from 1 to $R$ do
5:     Sample $\eta_i\sim N(0,1)$ and construct $\epsilon_i^\star=\eta_i\epsilon_i$. [Wild Bootstrap]
6:     Construct bootstrap samples $y_i^\star=\hat y_i^u+\hat c+\epsilon_i^\star$, $i=1,\ldots,n$.
7:     Apply $\hat m$ to $\{x_i,y_i^\star\}_{i=1}^n$, and calculate the bootstrap statistic $\delta_r$.
8:  end for
9:  The critical value $t_{1-\alpha}$ is the $1-\alpha$ quantile of $\{\delta_r\}_{r=1}^R$; reject the null hypothesis if $T>t_{1-\alpha}$. Alternatively, compute the $p$-value $\frac{1}{R}\sum_{r=1}^R\#\{\delta_r>T\}$ and reject if $p<\alpha$.
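
A compact Julia sketch of Algorithm 1 is given below. Here `fit_md` is a hypothetical callback standing in for the chosen monotone decomposition fit (MDCS or MDSS); it is assumed to return the fitted increasing component $\hat y^u$, the overall fit $\hat y$, and the test statistic $T$.

```julia
using Statistics

function test_increasing(x, y, fit_md; R = 100, α = 0.05)
    ŷu, ŷ, T = fit_md(x, y)           # steps 1–2: fit and test statistic
    ĉ = mean(ŷu)                      # ĉ = 1ᵀŷᵘ / n
    ϵ = y .- ŷ                        # step 3: residuals
    δ = map(1:R) do _                 # steps 4–8: wild bootstrap
        η = randn(length(y))          # multipliers ηᵢ ~ N(0, 1)
        ystar = ŷu .+ ĉ .+ η .* ϵ     # resample under H₀ᵘ
        last(fit_md(x, ystar))        # bootstrap statistic δᵣ
    end
    p = mean(δ .> T)                  # step 9: bootstrap p-value
    (stat = T, pvalue = p, reject = p < α)
end
```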

6 Simulations for Testing

First, we want to check whether the methods can control the Type I error. Specifically, consider five monotone functions: $x$, $x^3$, $x^{1/3}$, $e^x$ and $\frac{1}{1+e^{-x}}$. We conducted 100 simulations and calculated the proportion of rejections of the null hypothesis. Ideally, the rejection proportion should be less than 0.05 at the commonly used significance level $\alpha=0.05$. The results are reported in Table 4.

Table 4: Simulated size (the proportion of rejections of the null hypothesis) for monotone curves under different noise levels and different sample sizes, given the significance level $\alpha=0.05$.
Methods Curves $\sigma=0.001$ $\sigma=0.01$ $\sigma=0.1$
n = 50 100 200 n = 50 100 200 n = 50 100 200
[22] $x$ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.01 0.0
$x^3$ 0.53 0.84 1.0 0.05 0.08 0.08 0.02 0.06 0.04
$x^{1/3}$ 0.0 0.0 0.0 0.0 0.0 0.0 0.03 0.02 0.01
$e^x$ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
$1/(1+e^{-x})$ 0.0 0.0 0.0 0.01 0.0 0.0 0.04 0.06 0.05
[9] $x$ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
$x^3$ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
$x^{1/3}$ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
$e^x$ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
$1/(1+e^{-x})$ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
[1] $x$ 0.0 0.0 0.0 0.02 0.0 0.01 0.01 0.0 0.01
$x^3$ 0.0 0.0 0.0 0.2 0.19 0.18 0.26 0.2 0.15
$x^{1/3}$ 0.0 0.0 0.0 0.0 0.0 0.0 0.05 0.02 0.01
$e^x$ 0.0 0.0 0.0 0.0 0.0 0.0 0.04 0.02 0.0
$1/(1+e^{-x})$ 0.0 0.0 0.0 0.02 0.0 0.02 0.02 0.01 0.0
MDCS $x$ 0.01 0.0 0.12 0.0 0.0 0.0 0.0 0.0 0.0
$x^3$ 0.0 0.0 0.0 0.01 0.0 0.0 0.0 0.0 0.0
$x^{1/3}$ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
$e^x$ 0.02 0.11 0.03 0.0 0.0 0.0 0.0 0.0 0.0
$1/(1+e^{-x})$ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
MDSS $x$ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
$x^3$ 0.1 0.08 0.08 0.08 0.1 0.07 0.09 0.05 0.03
$x^{1/3}$ 0.0 0.02 0.02 0.01 0.03 0.07 0.05 0.06 0.05
$e^x$ 0.04 0.05 0.02 0.08 0.07 0.05 0.04 0.08 0.08
$1/(1+e^{-x})$ 0.03 0.03 0.03 0.0 0.01 0.0 0.0 0.0 0.0

[9] always accepts the null hypothesis. [22] fails to control the Type I error on the curve $x^3$ when the noise level is small, and [1] fails to control the Type I error on $x^3$ when the noise level is large. In contrast, our proposed methods, monotone decomposition with cubic splines (MDCS) and monotone decomposition with smoothing splines (MDSS), demonstrate strong Type I error control in the majority of cases, even though the rejection rates are slightly elevated in a few instances.

Furthermore, the $p$-value should follow $\mathrm{Uniform}[0,1]$ under the null hypothesis. To check the distribution of the $p$-values for each approach, Figure 4 displays the uniform QQ plots of 1000 $p$-values for $x^3$ and $e^x$ with sample size $n=200$ and noise level $\sigma=0.01$. The uniform QQ plots of $p$-values for all five curves under different noise levels can be found in the Supplementary Material. Notably, our proposed MDSS aligns well with the diagonal line in the QQ plots, indicating the closest resemblance to the uniform distribution.

(a) $x^3$
(b) $e^x$
Figure 4: Uniform QQ plots of 1000 $p$-values for five approaches on two curves with $n=200$, $\sigma=0.01$, respectively.

Next, we compare the simulated size and power under the settings of two competitors. The first is the simulation setting in [1],

$$f(x,a)=1+x-a\exp\left(-\frac{(x-0.5)^2}{2\cdot 0.1^2}\right)\,,\quad y=f(x,a)+N(0,\sigma^2)\,,$$

where $a=0$ and $a=0.15$ generate strongly and just monotone curves, respectively, while $a=0.25$ and $a=0.45$ produce mildly and strongly non-monotone shapes, respectively. We also consider the simulation setting in [9], $y=m_i(x)+N(0,\sigma^2)$, where

$$\begin{aligned}m_1(x)&=0\,,\quad m_2(x)=x(1-x)\,,\quad m_3(x)=x+0.415\exp(-50x^2)\,,\\ m_4(x)&=\begin{cases}10(x-0.5)^3-\exp(-100(x-0.25)^2)&\text{if }x<0.5\,,\\ 0.1(x-0.5)-\exp(-100(x-0.25)^2)&\text{otherwise}\,,\end{cases}\end{aligned}$$

and $m_i(x)$, $i\geq 2$, are non-monotone curves. A visualization of those curves can be found in the Supplementary Material.

For each combination of parameters on each curve, 100 simulations are carried out, each using 100 bootstrap samples. The proportions of rejections of the null hypothesis, i.e., the simulated size and power, are reported. The complete results are displayed in Table 5, from which we have the following observations:

  • [9] fails to achieve high power on [1]'s curve with $a=0.25$, whereas our proposed method achieves power as high as [1]'s.

  • [1] loses power on [9]'s curve $m_4$, whereas our proposed method attains power as high as [9]'s.

  • For most curves, MDCS behaves similarly to MDSS, but MDCS has better control of the Type I error at the cost of some power.

In summary, our proposed method achieves power comparable to (and sometimes better than) other methods while controlling the Type I error.

Table 5: Simulated size and power on curves from [1] and [9], based on 100 repeated experiments.
Methods [1] [9]
Curves $\sigma=0.001$ $\sigma=0.01$ $\sigma=0.1$ Curves $\sigma=0.001$ $\sigma=0.01$ $\sigma=0.1$
n = 50 100 200 n = 50 100 200 n = 50 100 200 n = 50 100 200 n = 50 100 200 n = 50 100 200
[22] a = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.01 0.01 0.01 m1 0.1 0.04 0.04 0.06 0.02 0.06 0.07 0.07 0.1
a = 0.15 0.0 0.0 0.0 0.07 0.04 0.06 0.03 0.03 0.01 m2 1.0 1.0 1.0 1.0 1.0 1.0 0.09 0.27 0.45
a = 0.25 1.0 1.0 1.0 1.0 1.0 1.0 0.13 0.24 0.64 m3 1.0 1.0 1.0 1.0 1.0 1.0 0.34 0.56 0.88
a = 0.45 1.0 1.0 1.0 1.0 1.0 1.0 0.56 0.95 1.0 m4 0.36 0.57 0.98 0.39 0.57 0.98 0.23 0.38 0.78
[9] a = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 m1 0.01 0.0 0.01 0.01 0.04 0.02 0.02 0.0 0.01
a = 0.15 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 m2 1.0 1.0 1.0 1.0 1.0 1.0 0.37 0.7 0.94
a = 0.25 0.0 0.0 1.0 0.0 0.0 0.31 0.0 0.0 0.0 m3 0.97 1.0 1.0 0.94 1.0 1.0 0.1 0.33 0.87
a = 0.45 1.0 1.0 1.0 1.0 1.0 1.0 0.06 0.71 0.99 m4 0.01 0.19 0.53 0.02 0.13 0.53 0.0 0.05 0.34
[1] a = 0.0 0.0 0.0 0.0 0.02 0.01 0.0 0.01 0.01 0.03 m1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
a = 0.15 0.0 0.0 0.0 0.0 0.03 0.0 0.01 0.02 0.04 m2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
a = 0.25 1.0 1.0 1.0 1.0 1.0 1.0 0.09 0.16 0.44 m3 1.0 1.0 1.0 1.0 1.0 1.0 0.9 1.0 1.0
a = 0.45 1.0 1.0 1.0 1.0 1.0 1.0 0.3 0.72 0.98 m4 0.0 0.0 0.0 0.01 0.02 0.0 0.34 0.33 0.28
MDCS a = 0.0 0.01 0.0 0.07 0.0 0.0 0.0 0.0 0.0 0.0 m1 0.03 0.02 0.02 0.0 0.01 0.0 0.0 0.0 0.0
a = 0.15 0.05 0.02 0.04 0.01 0.0 0.0 0.0 0.0 0.0 m2 0.98 0.99 0.95 1.0 1.0 0.94 0.18 0.35 0.65
a = 0.25 0.97 1.0 1.0 0.74 0.92 0.95 0.0 0.0 0.0 m3 0.89 0.93 1.0 0.9 0.97 0.99 0.16 0.22 0.29
a = 0.45 1.0 1.0 1.0 0.99 1.0 1.0 0.01 0.08 0.17 m4 0.89 0.91 0.94 0.88 0.92 0.99 0.51 0.68 0.81
MDSS a = 0.0 0.02 0.07 0.06 0.02 0.08 0.03 0.02 0.04 0.04 m1 0.05 0.07 0.04 0.06 0.04 0.03 0.05 0.07 0.05
a = 0.15 0.05 0.03 0.05 0.03 0.06 0.05 0.02 0.05 0.06 m2 1.0 1.0 1.0 1.0 1.0 1.0 0.79 0.96 1.0
a = 0.25 0.99 1.0 1.0 0.98 1.0 1.0 0.03 0.04 0.03 m3 1.0 1.0 1.0 1.0 1.0 1.0 0.67 0.83 0.97
a = 0.45 1.0 1.0 1.0 1.0 1.0 1.0 0.46 0.82 0.98 m4 0.98 1.0 1.0 0.99 1.0 1.0 0.84 0.99 1.0

7 Application: Monotonicity Test for scRNA-seq Trajectory Inference

Single-cell transcriptome sequencing (scRNA-seq) is a powerful technique that allows researchers to profile transcript abundance at the resolution of individual cells. Trajectory inference aims first to allocate cells to lineages and then to order them based on pseudotimes within these lineages. Based on trajectory inference, researchers can discover differentially expressed genes within lineages, using methods such as [21]'s tradeSeq, [20]'s PseudotimeDE, and [13]'s Lamian. These methods mostly identify differential genes by checking whether the expression trajectory is constant along the pseudotime. Once a gene is identified as differentially expressed, researchers may further check whether its expression exhibits a monotone pattern. A non-decreasing expression pattern indicates that the corresponding gene is turned on and needed thereafter along the cell lineage. A decreasing expression pattern indicates that the corresponding gene is needed less and less along the pseudotime. On the other hand, a non-monotone expression pattern indicates that the corresponding gene is involved in more complex dynamics. Such detailed dynamics may illuminate critical regulatory mechanisms of cell differentiation along the corresponding lineage.

In analogy to the term differentially expressed (DE) gene, used when the null hypothesis that the expression of a gene along the trajectory is constant is rejected, we call a gene non-monotonically expressed (nME) when the null hypothesis that its expression is monotone is rejected. We adopt tradeSeq to identify DE genes, and the monotonicity test via monotone decomposition with cubic splines (MDCS) to find nME genes. Both DE genes and nME genes are selected using the Benjamini–Hochberg (BH) procedure to control the false discovery rate (FDR) at cutoff $\alpha=0.05$.

To explore the biological functions of DE genes and nME genes, we examined the Gene Ontology (GO), which is a relational database of terms (concepts) used to describe gene functions, and conducted enrichment analysis [2].

Suppose there are $N$ genes in the reference gene list, among which $n$ genes are in our analyzed gene set. For a GO term of interest, suppose there are $M$ and $k$ genes within the reference gene list and our analyzed gene set, respectively, that are annotated with the GO term. The $p$-value for the one-sided Fisher's exact test of the null hypothesis that the GO term is not enriched in the analyzed gene set can be calculated based on the hypergeometric distribution:

$$p=1-\sum_{i=0}^{k-1}\frac{\binom{M}{i}\binom{N-M}{k-i}}{\binom{N}{n}}\,.$$
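
Numerically, this is the upper tail of a hypergeometric distribution. A one-line Julia sketch, assuming the Distributions.jl package (whose `Hypergeometric(s, f, n)` counts successes among $n$ draws from a population with $s$ successes and $f$ failures):

```julia
using Distributions

# P(X ≥ k) for X ~ Hypergeometric: M annotated genes among N, n genes drawn.
go_pvalue(N, M, n, k) = ccdf(Hypergeometric(M, N - M, n), k - 1)
```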

We repeat this test for multiple GO terms of interest and correct for multiple comparisons via the BH procedure to control the FDR at the cutoff $\alpha=0.05$.
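
The BH step-up procedure itself takes only a few lines of base Julia (a sketch; the function name is ours):

```julia
# Benjamini–Hochberg step-up: indices of hypotheses rejected at FDR level α.
function benjamini_hochberg(p::AbstractVector{<:Real}; α = 0.05)
    m = length(p)
    ord = sortperm(p)                                   # sort p-values ascending
    k = findlast(i -> p[ord[i]] <= α * i / m, 1:m)      # largest i passing the BH bound
    k === nothing ? Int[] : sort(ord[1:k])              # reject the k smallest p-values
end
```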

7.1 nME genes can identify significant GO terms when DE genes fail

We studied the leukocyte lineage of the mouse bone marrow data set [16], which consists of the expression measurements of 3004 genes at 1474 pseudotime points. Figure 5(a) shows the reduced two-dimensional representation of the data using uniform manifold approximation and projection (UMAP) [15]. Eight cell types are denoted with different colors and shapes. The solid curve is the pseudotime axis, which starts from the Multipotent progenitors cell type at the bottom right and ends at the Neutrophils cell type at the top left. Note that although a monotone pattern is a special DE pattern, we do not perform the monotonicity test in a two-step manner, i.e., first finding DE genes and then performing the monotonicity test among those DE genes. Instead, for each gene, we test whether it is a DE gene and whether it is an nME gene independently. Figure 5(b) displays the paired $p$-values $(p_{\text{nME}}, p_{\text{DE}})$ on the logarithmic scale, where the dashed lines denote the cutoffs determined by the BH procedure. As a result, we identified 109 nME genes (102 after GO analysis, since 7 genes are not mapped in the GO database) and 767 DE genes, of which 53 genes are in common. These numbers are also noted in the titles of the GO bar plots in Figure 6. Figure 5(c) illustrates the fitted trajectory for gene 2610029G23Rik, which is identified as an nME gene but not a DE gene, i.e., it lies in the bottom right green block of Figure 5(b).

Figure 5: (a): Two-dimensional representation of the data using UMAP. Different cell types are denoted with different colors (and shapes). The pseudotime axis is displayed, starting from the Multipotent progenitors cell type at the bottom right and ending at the Neutrophils cell type at the top left. (b): the scatter plot of the paired $p$-values $(p_{\text{nME}}, p_{\text{DE}})$, where the star symbols denote NA values where tradeSeq failed. (c): one example gene located in the bottom right green block of (b), which is identified as an nME gene but not a DE gene. (d): Trajectory curves of the 109 nME genes. The curves of the 22 genes annotated with the GO term “translation” are highlighted.

Figure 6 displays the GO enrichment analysis of the DE genes and nME genes using the R package clusterProfiler [24]. Figures 6(a), 6(b) and 6(c) take all 3004 genes as the reference gene list; note that, because some genes are not mapped in the GO database, only 2665 genes remain after filtering. We cannot find significant GO terms for the DE gene list, as shown in Figure 6(b), which is deliberately left blank because there are no significant results. In contrast, we can identify several significant GO terms for the nME gene list. Figures 6(c) and 6(d) display the intersection of the DE gene list and the nME gene list with respect to the whole gene list and the DE gene list, respectively. Both identify significant GO terms. One possible reason for the DE genes failing to yield significant GO terms is that the DE category might be too broad: different sub-categories of DE genes (the non-monotone pattern being one special sub-category) might contribute to different GO terms, and the increased sample size from incorporating unrelated genes might dilute the significance when determining the enriched GO terms. Another potential reason is that the tradeSeq test is not robust enough. As shown by the star symbols in Figure 5(b), tradeSeq returns many NA values, a known issue discussed in its GitHub repository (https://github.com/statOmics/tradeSeq/issues/209).

Figure 6: GO enrichment analysis of four gene sets ((a): nME gene set; (b): DE gene set; (c): the intersection of nME and DE genes; (d): the same gene set as (c) but with the DE genes as the reference list) with BH FDR cutoff $\alpha = 0.05$. The numbers of genes are noted in the title of each subfigure. The enriched GO terms are sorted by the adjusted p-values. (b) is left empty deliberately since no significant GO terms are selected.

We can further check whether the pattern of the trajectory curves in the enriched GO terms agrees with the underlying biological mechanism. Take the first GO term “translation” as an example. Figure 5(d) displays the trajectory curves of the 109 nME genes along the pseudotime. Among them, the curves of the 22 genes annotated in the GO term “translation” are highlighted. These genes exhibit a coherent pattern, characterized by an initial increase in expression followed by a subsequent decrease. Mapping the pseudotime axis of Figure 5(d) back onto Figure 5(a), this pattern implies that gene expression increases while the cell develops from Multipotent progenitors to Monocytes, and roughly after the expression reaches its peak, the cell evolves into Neutrophils. This behavior is consistent with the biological fact that specialized cell types (here, Neutrophils) tend to reduce their rates of translation (and hence protein synthesis), since their structure and function are relatively stable.

7.2 nME genes can fine-locate GO terms when DE genes identify too many terms

In some scenarios, although GO enrichment analysis can identify significant GO terms given the DE genes, nME genes can further fine-locate GO terms and focus on a small but significant set of GO terms. We analyzed the cholangiocyte lineage from the mouse liver data studied in [8] to demonstrate such an advantage of nME genes. In the dataset, there are 6038 genes and 308 pseudotime points.

tradeSeq identifies 767 DE genes (801 before filtering out genes unmapped in the GO database), and the monotonicity test identifies 67 nME genes (69 before filtering), of which 45 genes are in common. For the 767 DE genes, we identify 439 significant GO terms, whereas for the 67 nME genes, we find 39 significant GO terms; 28 GO terms are common to the two sets. Figure 7(a) displays all 39 GO terms returned by the nME genes, where the star symbol indicates GO terms not shared by the DE genes. Figure 7(b) shows the enrichment map constructed from the GO terms of the DE genes, in which the GO terms shared with the nME genes are highlighted. In the enrichment map, an edge connects two GO terms if they have overlapping genes; hence, mutually overlapping GO terms tend to cluster together, making it easy to identify functional modules. It is clear that the shared GO terms mainly concentrate in two clusters: one is isolated from the others on the right side and forms a pentagon shape with the keyword “coagulation”; the other is located at the left corner of the biggest cluster and relates to “catabolic process”. In other words, nME genes can help fine-locate GO terms, sparing researchers from inspecting the much longer list of GO terms produced by the DE genes.

Figure 7: GO enrichment analysis for the DE genes and nME genes on the cholangiocyte lineage of the liver data. (a) GO terms enriched by the nME genes, sorted by adjusted p-values. (b) Enrichment map of the GO terms by the DE genes, where the common GO terms shared by the nME genes are highlighted.

We further check the new GO terms enriched by the nME genes, which are annotated with the star symbol in Figure 7(a). These new GO terms might be contributed by genes that are not identified as DE genes. For example, the GO term “regulation of blood coagulation” contains 5 genes {F11, Kng2, Serpinc1, Serpinf2, Vtn} in the nME gene set, where the first three are also in the DE gene set, but the last two are only in the nME gene set. Figures 8(c) and 8(d) display the fitted trajectories of the expression along pseudotime by our monotone decomposition fitting technique, with the p-values noted in the title of each panel. We observe that the DE p-value is not significant while the nME p-value is; in other words, these two p-values lie in the bottom-right green block of Figure 8(a). Using the same data at the hepatoblast stage (an earlier stage than the cholangiocyte stage we considered), [8] identified 68 differentially variable (DV) genes, for which the variances (instead of the mean expression considered for DE genes and nME genes) of gene expression change along the pseudotime. Notably, Serpinf2 and Vtn are the only two genes belonging to both the 68 DV gene set and the 69 nME gene set. The coexistence of DV and nME characteristics in these genes suggests intricate and dynamic expression patterns, which might be of significant biological interest. The dual nature of being both DV and nME genes underscores the complexity of the regulatory mechanisms governing these specific genetic expressions.

Furthermore, we examine all genes that are identified as nME but not DE. Figure 8(b) displays the trajectory curves of all nME genes, with the curves of nME-but-not-DE genes highlighted and the others shown transparent. To facilitate comparison, all curves are centered, and different colors denote different clusters (see Section 7.3). Note that the variations of the highlighted curves are relatively small compared to the transparent ones, so different methods may reach different conclusions: some genes are identified as nME but not DE, despite the initial expectation that nME genes would form a subset of DE genes.

Figure 8: Hypothesis testing on the cholangiocyte lineage of the mouse liver data. (a) The scatter plot of paired p-values $(p_{\text{nME}}, p_{\text{DE}})$, where the cutoffs selected by the BH procedure are 0.00053 and 0.00660, respectively. (b) Trajectory curves of all nME genes. (c) and (d) are two genes in the GO term “regulation of blood coagulation”, which is enriched by the nME genes but not by the DE genes.

7.3 nME genes can highlight non-monotonic patterns while DE genes blur them

The clustering of genes based on their fitted expression patterns can reveal intriguing insights for biologists. However, a potential limitation of clustering based on DE genes is the tendency of clustering methods to amalgamate pure monotonic patterns with more intricate non-monotonic ones (e.g., Figure 5 of [21]). To keep the non-monotonic patterns distinct, one possible direction is to tailor the clustering approach, for example by constructing a more suitable similarity measure. A more direct approach is to cluster only the non-monotonic patterns of the nME genes.

Here, we consider the liver dataset of Section 7.2. Figure 9(a) shows the dendrogram from hierarchical clustering with complete linkage on the 69 nME genes. We take the cutoff 2.0 to obtain 10 clusters with clear patterns. Figure 9(b) presents the trajectory curves of those 69 nME genes, with different colors denoting different clusters. We highlight two representative clusters, depicted in Figures 9(c) and 9(d): Figure 9(c) displays a peak on the right side, while Figure 9(d) showcases a peak on the left side. A sketch of the clustering step is given below.
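The clustering step itself uses standard tools; the sketch below (our own illustration, assuming the Julia packages Clustering.jl and Distances.jl; the matrix `curves` is a hypothetical stand-in for the fitted trajectories) mirrors the procedure just described:

```julia
using Clustering, Distances, Statistics

curves = randn(308, 69)               # placeholder: one column per fitted nME curve
centered = curves .- mean(curves, dims = 1)   # center each curve, as in Figure 8(b)

D = pairwise(Euclidean(), centered, dims = 2) # gene-by-gene distance matrix
hc = hclust(D, linkage = :complete)           # complete-linkage hierarchical clustering
labels = cutree(hc, h = 2.0)                  # cut the dendrogram at height 2.0
```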

[Figure 9 panels (a)–(h); panel (c): C9, Hp, Kng2; panel (d): Hgd, Proz, Sord]
Figure 9: Clustering of trajectory curves of nME genes (first row) and DE genes (second row). (a): Hierarchical dendrogram of 69 nME genes. The cutoff (2.0) at the dashed line results in 10 clusters. (b): Trajectory curves of 69 nME genes. (c)-(d) display the curves of 2 out of 10 clusters from (b), respectively. The gene symbols are noted for clusters with few genes. (e): Hierarchical dendrogram of 801 DE genes. The cutoff (1.5) at the dashed line results in 12 clusters. (f): Trajectory curves of 801 DE genes. (g) and (h) are two distinct clusters from 12 clusters in (f). Different colors denote different clusters. The same color in (b) to (d) denotes the same cluster, and the same color in (g) to (h) denotes the same cluster, but there is no direct correspondence between the colors in (b) and (f).

On the other hand, we perform hierarchical clustering on the 801 DE genes with a comparable cutoff of 1.5 and identify 12 clusters, as shown in Figure 9(e). The trajectory curves of the 801 DE genes, with different colors representing different clusters, are displayed in Figure 9(f). It is worth noting that the presence of monotonic patterns somewhat conceals the underlying wiggly patterns: Figure 9(g) mixes non-monotonic patterns similar to those in Figure 9(c) with numerous ascending curves, and Figure 9(h) likewise mixes the non-monotonic pattern observed in Figure 9(d) with monotone curves, illustrating the challenge of distinguishing these patterns.

Clusters of purely non-monotonic patterns can also reveal significant biological signals. For example, all three genes in Figure 9(c) are significantly annotated to the GO terms “defense response to other organism”, “response to external biotic stimulus”, and “response to other organism”, each with an FDR-adjusted p-value of 0.00278.

8 Discussions

We formulated the monotone decomposition with monotone splines. Summing the two monotone components yields a fitting method, and checking whether one of the components is constant yields a test of monotonicity.

As a fitting method, our experiments show that the monotone decomposition with cubic splines improves performance, especially in high-noise cases; for monotone functions, this improvement can be explained theoretically. Similar phenomena are observed for the monotone decomposition with smoothing splines. However, there are also some limitations:

  • The cross-validation procedure for simultaneously tuning two parameters is computationally expensive. The generative bootstrap sampler (GBS) proposed by [19] might be an alternative to speed up the cross-validation step.

  • Currently, the theoretical guarantees only cover monotone functions; extending them to general functions remains an open problem.

For the test of monotonicity, the proposed statistics based on monotone decomposition show competitive, and often superior, performance compared with existing approaches.

We also apply the fitting and testing based on monotone decomposition to investigate the monotonic and non-monotonic trajectory patterns in several scRNA-seq datasets. In parallel with the conventional analysis of differentially expressed (DE) genes, we propose the concept of non-monotonically expressed (nME) genes, which might lead to new biological insights.

Acknowledgement

Lijun Wang was supported by the Hong Kong Ph.D. Fellowship Scheme from the University Grant Committee. Hongyu Zhao was supported in part by NIH grant P50 CA196530. JSL was supported in part by the NSF DMS/NIGMS 2 Collaborative Research grant (R01 GM152814).

Supplementary Material

The Supplementary Material contains additional simulation results and technical proofs of propositions.

References

  • [1] A. W. Bowman, M. C. Jones and I. Gijbels “Testing Monotonicity of Regression” In Journal of Computational and Graphical Statistics 7.4, 1998, pp. 489–500 DOI: 10.1080/10618600.1998.10474790
  • [2] Elizabeth I. Boyle et al. “GO::TermFinder—Open Source Software for Accessing Gene Ontology Information and Finding Significantly Enriched Gene Ontology Terms Associated with a List of Genes” In Bioinformatics 20.18, 2004, pp. 3710–3715 DOI: 10.1093/bioinformatics/bth456
  • [3] Denis Chetverikov “Testing Regression Monotonicity in Econometric Models” In Econometric Theory 35.4, 2019, pp. 729–776 DOI: 10.1017/S0266466618000282
  • [4] Hugh A. Chipman, Edward I. George and Robert E. McCulloch “BART: Bayesian Additive Regression Trees” In The Annals of Applied Statistics 4.1 Institute of Mathematical Statistics, 2010, pp. 266–298 DOI: 10.1214/09-AOAS285
  • [5] Hugh A. Chipman, Edward I. George, Robert E. McCulloch and Thomas S. Shively “mBART: Multidimensional Monotone BART” In Bayesian Analysis 17.2 International Society for Bayesian Analysis, 2022, pp. 515–544 DOI: 10.1214/21-BA1259
  • [6] Russell Davidson and Emmanuel Flachaire “The Wild Bootstrap, Tamed at Last” In Journal of Econometrics 146.1, 2008, pp. 162–169 DOI: 10.1016/j.jeconom.2008.08.003
  • [7] Carl De Boor “A Practical Guide to Splines” New York: Springer, 1978
  • [8] Shila Ghazanfar et al. “Investigating Higher-Order Interactions in Single-Cell Data with scHOT” In Nature Methods 17.8 Nature Publishing Group, 2020, pp. 799–806 DOI: 10.1038/s41592-020-0885-x
  • [9] Subhashis Ghosal, Arusharka Sen and Aad W. van der Vaart “Testing Monotonicity of Regression” In The Annals of Statistics 28.4 Institute of Mathematical Statistics, 2000, pp. 1054–1082 JSTOR: 2673954
  • [10] Robert B. Gramacy “Surrogates: Gaussian Process Modeling, Design and Optimization for the Applied Sciences” Chapman & Hall/CRC, 2020
  • [11] Peter Hall and Nancy E. Heckman “Testing for Monotonicity of a Regression Mean by Calibrating for Linear Functions” In The Annals of Statistics 28.1 Institute of Mathematical Statistics, 2000, pp. 20–39 JSTOR: 2673980
  • [12] Trevor Hastie, Robert Tibshirani and Jerome Friedman “The Elements of Statistical Learning: Data Mining, Inference, and Prediction” Springer Science & Business Media, 2009
  • [13] Wenpin Hou et al. “A Statistical Framework for Differential Pseudotime Analysis with Multiple Single-Cell RNA-seq Samples” In Nature Communications 14.1 Nature Publishing Group, 2023, pp. 7286 DOI: 10.1038/s41467-023-42841-y
  • [14] Jan R. Magnus and Heinz Neudecker “Matrix Differential Calculus with Applications in Statistics and Econometrics” John Wiley & Sons, Ltd, 2019 DOI: 10.1002/9781119541219
  • [15] Leland McInnes, John Healy, Nathaniel Saul and Lukas Großberger “UMAP: Uniform Manifold Approximation and Projection” In Journal of Open Source Software 3.29, 2018, pp. 861 DOI: 10.21105/joss.00861
  • [16] Franziska Paul et al. “Transcriptional Heterogeneity and Lineage Commitment in Myeloid Progenitors” In Cell 163.7, 2015, pp. 1663–1677 DOI: 10.1016/j.cell.2015.11.013
  • [17] James O. Ramsay and Bernard W. Silverman “Functional Data Analysis”, Springer Series in Statistics New York, NY: Springer, 2005
  • [18] Carl Edward Rasmussen and Christopher K. I. Williams “Gaussian Processes for Machine Learning”, Adaptive Computation and Machine Learning Cambridge, Mass: MIT Press, 2006
  • [19] Minsuk Shin, Shijie Wang and Jun S. Liu “Generative Multi-purpose Sampler for Weighted M-estimation” In Journal of Computational and Graphical Statistics, Taylor & Francis, 2023, pp. 1–53 DOI: 10.1080/10618600.2023.2292668
  • [20] Dongyuan Song and Jingyi Jessica Li “PseudotimeDE: Inference of Differential Gene Expression along Cell Pseudotime with Well-Calibrated p-Values from Single-Cell RNA Sequencing Data” In Genome Biology 22.1, 2021, pp. 124 DOI: 10.1186/s13059-021-02341-y
  • [21] Koen Van den Berge et al. “Trajectory-Based Differential Expression Analysis for Single-Cell Sequencing Data” In Nature Communications 11.1 Nature Publishing Group, 2020, pp. 1201 DOI: 10.1038/s41467-020-14766-3
  • [22] Jianqiang C. Wang and Mary C. Meyer “Testing the Monotonicity or Convexity of a Function Using Regression Splines” In The Canadian Journal of Statistics / La Revue Canadienne de Statistique 39.1 [Statistical Society of Canada, Wiley], 2011, pp. 89–107 JSTOR: 41304465
  • [23] Lijun Wang, Xiaodan Fan, Huabai Li and Jun S. Liu “Monotone Cubic B-Splines” arXiv, 2023 DOI: 10.48550/arXiv.2307.01748
  • [24] Guangchuang Yu et al. “clusterProfiler: A Universal Enrichment Tool for Interpreting Omics Data”, Bioconductor version: Release (3.17), 2023 DOI: 10.18129/B9.bioc.clusterProfiler

Appendix A More Simulation Results

A.1 Candidate Kernels

Random functions are generated with a kernel $K$, where $K$ is one of the Squared Exponential (SE) kernel $K_{SE}$, the Rational Quadratic (RQ) kernel $K_{RQ}$, the Matérn (Mat) kernel $K_{Mat}$, and the Periodic kernel $K_{Periodic}$ [18]:

$$K_{SE}(x,x^{\prime})=\exp\left(-\frac{(x-x^{\prime})^{2}}{2\ell^{2}}\right),\qquad K_{Mat}(x,x^{\prime})=\frac{2^{1-\nu}}{\Gamma(\nu)}\left(\frac{\sqrt{2\nu}|x-x^{\prime}|}{\ell}\right)^{\nu}S_{\nu}\left(\frac{\sqrt{2\nu}|x-x^{\prime}|}{\ell}\right),$$
$$K_{RQ}(x,x^{\prime})=\left(1+\frac{(x-x^{\prime})^{2}}{2\alpha\ell^{2}}\right)^{-\alpha},\qquad K_{Periodic}(x,x^{\prime})=\exp\left(-\frac{2\sin^{2}(|x-x^{\prime}|/T)}{\ell^{2}}\right)\,,$$

where $\ell,\nu,\alpha,T$ are the parameters and $S_{\nu}$ is a modified Bessel function. In particular, “Mat12” refers to the Matérn kernel with $\nu=1/2$; similarly, “Mat32” and “Mat52” indicate $\nu=3/2$ and $\nu=5/2$, respectively. Any additional parameters are appended to the kernel name; for example, “SE-1” represents the Squared Exponential kernel with $\ell=1$, “Mat12-1” denotes the Matérn kernel with $\nu=1/2,\ell=1$, and “RQ-0.1-0.5” is the Rational Quadratic kernel with $\ell=0.1,\alpha=0.5$.
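For reference, these kernels translate directly into Julia; the sketch below is our own transcription (we use `gamma` and `besselk` from SpecialFunctions.jl, the latter playing the role of $S_{\nu}$):

```julia
using SpecialFunctions  # gamma, besselk

k_se(x, y; ℓ = 1.0) = exp(-(x - y)^2 / (2ℓ^2))
k_rq(x, y; ℓ = 0.1, α = 0.5) = (1 + (x - y)^2 / (2α * ℓ^2))^(-α)
k_periodic(x, y; ℓ = 0.1, T = 4.0) = exp(-2sin(abs(x - y) / T)^2 / ℓ^2)
function k_mat(x, y; ν = 1/2, ℓ = 1.0)
    x == y && return 1.0                  # limit as |x - y| -> 0
    r = sqrt(2ν) * abs(x - y) / ℓ
    return 2^(1 - ν) / gamma(ν) * r^ν * besselk(ν, r)
end
```

For example, the kernel named “Mat12-1” corresponds to `k_mat(x, y; ν = 1/2, ℓ = 1.0)`.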

A.2 Random Function Generation

A random function $f$ with kernel $K$ is generated as follows (a Julia sketch is given after the list):

  1. Generate $n$ random points $x_{i}\sim U[-1,1], i=1,\ldots,n$.

  2. Calculate the covariance matrix $\Sigma$ based on kernel $K$: $\Sigma_{ij}=K(x_{i},x_{j})$. In practice, add a small constant, say $10^{-7}$, to the diagonal of $\Sigma$ to prevent a numerically ill-conditioned matrix [10].

  3. Generate a random Gaussian vector with this covariance matrix, $f\sim N(0,\Sigma)$.
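A minimal Julia sketch of these three steps (our own illustration; the kernel argument can be any of the kernel functions sketched in Section A.1):

```julia
using LinearAlgebra

function rand_gp(n; kernel = (x, y) -> exp(-(x - y)^2 / 2), jitter = 1e-7)
    x = sort(2rand(n) .- 1)                    # Step 1: x_i ~ U[-1, 1]
    Σ = [kernel(xi, xj) for xi in x, xj in x]  # Step 2: Σ_ij = K(x_i, x_j)
    Σ += jitter * I                            #   add 1e-7 to the diagonal [10]
    f = cholesky(Symmetric(Σ)).L * randn(n)    # Step 3: f ~ N(0, Σ)
    return x, f
end

x, f = rand_gp(100)   # one random function evaluated at 100 points
```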

Figure 10: Candidate functions. The left panel shows several regular functions, the middle panel displays random functions drawn from Gaussian processes with the squared exponential kernel, and the right panel shows random functions generated from Gaussian processes with other kernels.

A.3 Monotone Decomposition with Cubic Splines

Table 6: A complete version of Table 1 with finer noise levels. Results for comparing the cubic splines and the fitting by monotone decomposition with CV-tuned $(\mu, J)$. The values are averages over 100 replications, with the standard error of the average in parentheses. The bold values highlight the smaller mean squared prediction error.
curve $\sigma$ MSFE (CubicSpline) MSFE (MonoDecomp) MSPE (CubicSpline) MSPE (MonoDecomp) p-value prop.
$x^2$ 0.2 1.95e+00 (1.4e-02) 1.94e+00 (1.5e-02) 1.46e+00 (6.1e-02) 1.46e+00 (6.4e-02) 4.44e-01 0.57
0.4 3.84e+00 (3.4e-02) 3.83e+00 (3.0e-02) 2.97e+00 (1.5e-01) 2.90e+00 (1.3e-01) 2.79e-01 0.57
0.6 5.80e+00 (4.3e-02) 5.77e+00 (4.7e-02) 4.45e+00 (2.0e-01) 4.22e+00 (1.9e-01) 7.97e-02 (.) 0.63
1.0 9.71e+00 (7.3e-02) 9.73e+00 (7.3e-02) 7.50e+00 (3.1e-01) 6.93e+00 (2.6e-01) 5.91e-03 (**) 0.59
$x^3$ 0.2 1.89e+00 (1.6e-02) 1.88e+00 (1.7e-02) 1.53e+00 (6.9e-02) 1.54e+00 (5.9e-02) 3.81e-01 0.47
0.4 3.83e+00 (3.0e-02) 3.80e+00 (3.0e-02) 2.82e+00 (1.4e-01) 2.89e+00 (1.2e-01) 1.71e-01 0.45
0.6 5.74e+00 (4.6e-02) 5.68e+00 (4.9e-02) 4.78e+00 (2.1e-01) 4.71e+00 (1.6e-01) 2.93e-01 0.49
1.0 9.65e+00 (7.4e-02) 9.60e+00 (7.2e-02) 7.63e+00 (3.5e-01) 7.17e+00 (2.7e-01) 2.21e-02 (*) 0.6
$\exp(x)$ 0.2 1.91e+00 (1.3e-02) 1.88e+00 (1.5e-02) 1.51e+00 (5.8e-02) 1.61e+00 (6.6e-02) 3.47e-02 (*) 0.53
0.4 3.83e+00 (2.9e-02) 3.82e+00 (3.1e-02) 2.79e+00 (1.3e-01) 2.71e+00 (1.1e-01) 1.97e-01 0.57
0.6 5.72e+00 (4.4e-02) 5.68e+00 (4.4e-02) 4.45e+00 (1.8e-01) 4.27e+00 (1.7e-01) 1.15e-01 0.61
1.0 9.57e+00 (5.9e-02) 9.49e+00 (7.3e-02) 7.56e+00 (2.8e-01) 7.08e+00 (3.1e-01) 3.59e-02 (*) 0.57
sigmoid 0.2 1.91e+00 (1.5e-02) 1.91e+00 (1.6e-02) 1.69e+00 (5.4e-02) 1.59e+00 (5.0e-02) 6.14e-03 (**) 0.65
0.4 3.89e+00 (3.2e-02) 3.88e+00 (3.2e-02) 2.99e+00 (1.2e-01) 2.96e+00 (1.1e-01) 3.53e-01 0.47
0.6 5.77e+00 (4.8e-02) 5.76e+00 (4.8e-02) 4.35e+00 (1.7e-01) 4.14e+00 (1.4e-01) 4.19e-02 (*) 0.56
1.0 9.51e+00 (8.5e-02) 9.50e+00 (8.7e-02) 7.33e+00 (3.2e-01) 6.61e+00 (2.6e-01) 1.45e-03 (**) 0.56
SE-1 0.2 1.93e+00 (1.4e-02) 1.93e+00 (1.6e-02) 1.62e+00 (7.3e-02) 1.59e+00 (5.2e-02) 3.21e-01 0.52
0.4 3.81e+00 (3.4e-02) 3.78e+00 (3.6e-02) 3.18e+00 (1.4e-01) 3.16e+00 (1.2e-01) 3.93e-01 0.54
0.6 5.77e+00 (4.4e-02) 5.77e+00 (4.6e-02) 4.67e+00 (2.0e-01) 4.34e+00 (1.6e-01) 2.88e-02 (*) 0.57
1.0 9.55e+00 (7.0e-02) 9.51e+00 (7.6e-02) 7.29e+00 (3.2e-01) 6.62e+00 (2.7e-01) 5.51e-03 (**) 0.63
SE-0.1 0.2 1.76e+00 (2.0e-02) 1.78e+00 (2.5e-02) 3.54e+00 (5.3e-02) 3.54e+00 (6.7e-02) 4.92e-01 0.6
0.4 3.54e+00 (3.6e-02) 3.55e+00 (3.8e-02) 6.59e+00 (1.1e-01) 6.25e+00 (1.0e-01) 1.36e-04 (***) 0.66
0.6 5.57e+00 (5.4e-02) 5.59e+00 (6.0e-02) 9.20e+00 (1.6e-01) 9.13e+00 (1.6e-01) 3.14e-01 0.59
1.0 9.29e+00 (8.5e-02) 9.20e+00 (9.3e-02) 1.44e+01 (2.6e-01) 1.38e+01 (2.4e-01) 5.93e-04 (***) 0.7
Mat12-1 0.2 2.12e+00 (3.2e-02) 2.15e+00 (2.9e-02) 5.43e+00 (5.2e-02) 5.29e+00 (6.5e-02) 7.85e-03 (**) 0.72
0.4 4.08e+00 (4.7e-02) 4.02e+00 (4.8e-02) 7.55e+00 (9.5e-02) 7.20e+00 (8.9e-02) 7.51e-09 (***) 0.72
0.6 5.82e+00 (6.5e-02) 5.79e+00 (5.8e-02) 9.34e+00 (1.4e-01) 8.94e+00 (1.1e-01) 1.35e-04 (***) 0.7
1.0 9.79e+00 (8.9e-02) 9.73e+00 (9.1e-02) 1.26e+01 (2.9e-01) 1.17e+01 (2.4e-01) 9.31e-07 (***) 0.68
Mat12-0.1 0.2 3.87e+00 (6.8e-02) 3.88e+00 (7.3e-02) 1.26e+01 (1.6e-01) 1.25e+01 (1.8e-01) 2.28e-01 0.57
0.4 5.15e+00 (8.3e-02) 5.08e+00 (6.9e-02) 1.47e+01 (1.5e-01) 1.42e+01 (1.3e-01) 1.25e-03 (**) 0.66
0.6 6.77e+00 (1.0e-01) 6.64e+00 (8.8e-02) 1.67e+01 (1.8e-01) 1.61e+01 (1.7e-01) 2.66e-04 (***) 0.67
1.0 1.04e+01 (1.3e-01) 1.04e+01 (1.3e-01) 2.07e+01 (2.4e-01) 2.01e+01 (2.4e-01) 1.85e-04 (***) 0.73
Mat32-1 0.2 1.87e+00 (1.6e-02) 1.87e+00 (1.6e-02) 2.40e+00 (4.5e-02) 2.29e+00 (4.3e-02) 1.30e-03 (**) 0.66
0.4 3.82e+00 (3.5e-02) 3.79e+00 (3.6e-02) 3.99e+00 (1.0e-01) 3.92e+00 (9.6e-02) 2.08e-01 0.63
0.6 5.77e+00 (4.4e-02) 5.76e+00 (4.3e-02) 5.60e+00 (1.6e-01) 5.28e+00 (1.3e-01) 4.84e-03 (**) 0.68
1.0 9.62e+00 (8.2e-02) 9.61e+00 (8.3e-02) 9.01e+00 (3.5e-01) 8.00e+00 (2.6e-01) 1.46e-04 (***) 0.56
Mat32-0.1 0.2 2.05e+00 (3.5e-02) 2.11e+00 (4.4e-02) 5.46e+00 (7.6e-02) 5.54e+00 (1.1e-01) 1.82e-01 0.57
0.4 3.84e+00 (5.4e-02) 3.77e+00 (6.0e-02) 8.83e+00 (1.1e-01) 8.41e+00 (1.3e-01) 8.21e-05 (***) 0.74
0.6 5.64e+00 (7.0e-02) 5.58e+00 (7.6e-02) 1.17e+01 (1.5e-01) 1.13e+01 (1.4e-01) 1.68e-05 (***) 0.69
1.0 9.62e+00 (1.1e-01) 9.53e+00 (9.7e-02) 1.68e+01 (2.5e-01) 1.58e+01 (2.1e-01) 4.67e-10 (***) 0.72
RQ-0.1-0.5 0.2 1.72e+00 (2.9e-02) 1.74e+00 (3.1e-02) 4.14e+00 (6.5e-02) 4.12e+00 (7.4e-02) 3.53e-01 0.61
0.4 3.77e+00 (4.6e-02) 3.68e+00 (4.6e-02) 7.33e+00 (1.1e-01) 6.92e+00 (9.9e-02) 1.44e-06 (***) 0.77
0.6 5.63e+00 (7.2e-02) 5.54e+00 (6.4e-02) 1.03e+01 (1.7e-01) 9.61e+00 (1.6e-01) 2.72e-07 (***) 0.72
1.0 9.55e+00 (1.1e-01) 9.50e+00 (1.2e-01) 1.50e+01 (2.4e-01) 1.44e+01 (2.6e-01) 5.01e-03 (**) 0.68
Periodic-0.1-4 0.2 1.68e+00 (2.8e-02) 1.70e+00 (2.7e-02) 4.27e+00 (7.0e-02) 4.21e+00 (6.6e-02) 2.00e-01 0.6
0.4 3.40e+00 (4.2e-02) 3.48e+00 (5.2e-02) 7.90e+00 (1.1e-01) 7.84e+00 (1.3e-01) 2.81e-01 0.65
0.6 5.42e+00 (7.3e-02) 5.45e+00 (7.0e-02) 1.12e+01 (1.6e-01) 1.08e+01 (1.8e-01) 2.52e-02 (*) 0.65
1.0 9.41e+00 (1.2e-01) 9.31e+00 (1.1e-01) 1.72e+01 (3.0e-01) 1.60e+01 (2.5e-01) 2.03e-10 (***) 0.63
Table 7: Results for comparing the cubic splines and the fitting by monotone decomposition with CV-tuned $\mu$ and fixed $J$. The values are averages over 100 replications, with the standard error of the average in parentheses. The bold values highlight the smaller mean squared prediction error.
curve $\sigma$ MSFE (CubicSpline) MSFE (MonoDecomp) MSPE (CubicSpline) MSPE (MonoDecomp) p-value prop.
$x^2$ 0.1 9.68e-01 (8.2e-03) 9.71e-01 (8.2e-03) 7.38e-01 (3.6e-02) 7.07e-01 (3.3e-02) 1.47e-03 (**) 0.65
0.2 1.92e+00 (1.6e-02) 1.93e+00 (1.6e-02) 1.53e+00 (7.7e-02) 1.45e+00 (7.2e-02) 1.03e-03 (**) 0.74
0.5 4.88e+00 (3.6e-02) 4.90e+00 (3.6e-02) 3.67e+00 (1.5e-01) 3.37e+00 (1.3e-01) 2.20e-06 (***) 0.77
1.0 9.63e+00 (8.2e-02) 9.73e+00 (8.0e-02) 7.12e+00 (3.2e-01) 5.92e+00 (2.5e-01) 1.73e-10 (***) 0.77
$x^3$ 0.1 9.56e-01 (7.6e-03) 9.58e-01 (7.1e-03) 7.32e-01 (3.5e-02) 6.98e-01 (2.9e-02) 1.01e-03 (**) 0.61
0.2 1.93e+00 (1.6e-02) 1.94e+00 (1.5e-02) 1.47e+00 (7.1e-02) 1.38e+00 (5.8e-02) 2.88e-05 (***) 0.69
0.5 4.83e+00 (4.3e-02) 4.86e+00 (4.2e-02) 3.66e+00 (1.5e-01) 3.41e+00 (1.4e-01) 7.49e-03 (**) 0.64
1.0 9.55e+00 (8.1e-02) 9.68e+00 (7.8e-02) 8.03e+00 (3.7e-01) 7.02e+00 (2.7e-01) 2.01e-06 (***) 0.69
$\exp(x)$ 0.1 9.56e-01 (7.4e-03) 9.57e-01 (7.3e-03) 7.51e-01 (3.3e-02) 7.26e-01 (3.0e-02) 3.12e-04 (***) 0.61
0.2 1.93e+00 (1.5e-02) 1.94e+00 (1.5e-02) 1.40e+00 (6.6e-02) 1.32e+00 (5.9e-02) 4.32e-07 (***) 0.67
0.5 4.82e+00 (3.9e-02) 4.84e+00 (3.7e-02) 3.52e+00 (1.7e-01) 3.02e+00 (1.4e-01) 9.36e-11 (***) 0.8
1.0 9.74e+00 (7.6e-02) 9.82e+00 (7.5e-02) 7.47e+00 (3.3e-01) 6.09e+00 (2.5e-01) 6.35e-11 (***) 0.8
sigmoid 0.1 9.53e-01 (7.3e-03) 9.60e-01 (7.5e-03) 8.84e-01 (2.4e-02) 8.55e-01 (2.5e-02) 3.16e-03 (**) 0.68
0.2 1.87e+00 (1.4e-02) 1.89e+00 (1.4e-02) 1.81e+00 (6.2e-02) 1.67e+00 (5.2e-02) 5.12e-05 (***) 0.71
0.5 4.78e+00 (3.8e-02) 4.82e+00 (3.6e-02) 3.83e+00 (1.7e-01) 3.51e+00 (1.3e-01) 2.42e-04 (***) 0.66
1.0 9.50e+00 (7.7e-02) 9.62e+00 (7.7e-02) 7.62e+00 (3.4e-01) 6.36e+00 (2.4e-01) 5.51e-08 (***) 0.61
SE-1 0.1 9.69e-01 (6.9e-03) 9.74e-01 (6.7e-03) 8.68e-01 (3.5e-02) 8.18e-01 (3.0e-02) 1.44e-04 (***) 0.73
0.2 1.92e+00 (1.6e-02) 1.92e+00 (1.6e-02) 1.62e+00 (6.1e-02) 1.55e+00 (5.6e-02) 3.30e-04 (***) 0.66
0.5 4.84e+00 (3.6e-02) 4.88e+00 (3.5e-02) 3.77e+00 (1.7e-01) 3.52e+00 (1.3e-01) 2.43e-03 (**) 0.61
1.0 9.69e+00 (6.9e-02) 9.77e+00 (6.7e-02) 7.31e+00 (3.1e-01) 6.15e+00 (2.5e-01) 2.24e-09 (***) 0.63
SE-0.1 0.1 8.51e-01 (1.1e-02) 9.02e-01 (1.9e-02) 1.88e+00 (3.1e-02) 1.98e+00 (6.3e-02) 3.83e-02 (*) 0.57
0.2 1.73e+00 (1.6e-02) 1.79e+00 (3.1e-02) 3.43e+00 (4.8e-02) 3.48e+00 (1.1e-01) 2.94e-01 0.73
0.5 4.47e+00 (5.0e-02) 4.58e+00 (6.0e-02) 7.77e+00 (1.4e-01) 7.73e+00 (1.7e-01) 3.74e-01 0.67
1.0 9.31e+00 (8.4e-02) 9.52e+00 (8.4e-02) 1.45e+01 (2.6e-01) 1.41e+01 (2.5e-01) 3.91e-04 (***) 0.76
Mat12-1 0.1 1.45e+00 (2.5e-02) 1.56e+00 (2.4e-02) 4.42e+00 (5.5e-02) 4.61e+00 (5.8e-02) 2.37e-07 (***) 0.37
0.2 2.19e+00 (3.5e-02) 2.30e+00 (3.3e-02) 5.46e+00 (5.7e-02) 5.50e+00 (6.5e-02) 1.78e-01 0.64
0.5 4.95e+00 (5.5e-02) 5.05e+00 (5.2e-02) 8.18e+00 (9.8e-02) 8.04e+00 (1.1e-01) 1.52e-02 (*) 0.73
1.0 9.80e+00 (9.1e-02) 9.94e+00 (9.0e-02) 1.21e+01 (2.6e-01) 1.16e+01 (2.2e-01) 6.76e-04 (***) 0.74
Mat12-0.1 0.1 3.41e+00 (6.6e-02) 3.83e+00 (8.4e-02) 1.16e+01 (1.4e-01) 1.25e+01 (2.3e-01) 1.85e-06 (***) 0.23
0.2 3.75e+00 (7.9e-02) 4.17e+00 (9.4e-02) 1.24e+01 (1.7e-01) 1.32e+01 (2.3e-01) 2.05e-07 (***) 0.27
0.5 5.91e+00 (9.0e-02) 6.36e+00 (1.0e-01) 1.56e+01 (1.6e-01) 1.62e+01 (2.4e-01) 1.01e-03 (**) 0.35
1.0 1.04e+01 (1.4e-01) 1.08e+01 (1.2e-01) 2.11e+01 (2.2e-01) 2.11e+01 (2.3e-01) 4.48e-01 0.48
Mat32-1 0.1 9.39e-01 (9.2e-03) 9.51e-01 (8.3e-03) 1.33e+00 (2.3e-02) 1.29e+00 (2.0e-02) 5.22e-04 (***) 0.7
0.2 1.91e+00 (1.7e-02) 1.92e+00 (1.6e-02) 2.27e+00 (4.7e-02) 2.18e+00 (4.2e-02) 1.76e-04 (***) 0.71
0.5 4.79e+00 (4.0e-02) 4.84e+00 (3.8e-02) 5.01e+00 (1.4e-01) 4.53e+00 (1.1e-01) 3.31e-07 (***) 0.66
1.0 9.59e+00 (8.2e-02) 9.67e+00 (8.1e-02) 8.49e+00 (2.9e-01) 7.58e+00 (2.6e-01) 3.94e-08 (***) 0.67
Mat32-0.1 0.1 1.25e+00 (3.3e-02) 1.42e+00 (4.2e-02) 3.86e+00 (8.2e-02) 4.27e+00 (1.3e-01) 1.67e-05 (***) 0.42
0.2 1.97e+00 (3.8e-02) 2.23e+00 (6.7e-02) 5.41e+00 (7.8e-02) 5.95e+00 (1.9e-01) 1.59e-03 (**) 0.53
0.5 4.66e+00 (5.9e-02) 4.97e+00 (6.9e-02) 1.03e+01 (1.2e-01) 1.06e+01 (1.8e-01) 2.05e-02 (*) 0.55
1.0 9.55e+00 (1.2e-01) 9.82e+00 (1.2e-01) 1.69e+01 (2.3e-01) 1.67e+01 (2.7e-01) 4.00e-02 (*) 0.68
RQ-0.1-0.5 0.1 8.92e-01 (1.9e-02) 9.95e-01 (3.3e-02) 2.37e+00 (4.1e-02) 2.58e+00 (1.0e-01) 1.91e-02 (*) 0.51
0.2 1.82e+00 (2.7e-02) 1.95e+00 (3.7e-02) 4.15e+00 (7.0e-02) 4.36e+00 (1.1e-01) 6.07e-03 (**) 0.57
0.5 4.60e+00 (6.2e-02) 4.84e+00 (7.4e-02) 9.08e+00 (1.5e-01) 8.99e+00 (2.0e-01) 2.66e-01 0.61
1.0 9.30e+00 (9.9e-02) 9.57e+00 (9.8e-02) 1.48e+01 (2.3e-01) 1.43e+01 (2.2e-01) 6.64e-03 (**) 0.74
Periodic-0.1-4 0.1 8.82e-01 (3.6e-02) 9.46e-01 (3.8e-02) 2.44e+00 (1.1e-01) 2.58e+00 (1.2e-01) 2.75e-03 (**) 0.54
0.2 1.65e+00 (2.6e-02) 1.80e+00 (4.4e-02) 4.26e+00 (6.2e-02) 4.47e+00 (1.3e-01) 3.92e-02 (*) 0.67
0.5 4.34e+00 (5.1e-02) 4.54e+00 (6.0e-02) 9.44e+00 (1.3e-01) 9.41e+00 (1.9e-01) 4.14e-01 0.59
1.0 9.25e+00 (1.3e-01) 9.64e+00 (1.3e-01) 1.76e+01 (2.6e-01) 1.72e+01 (2.9e-01) 3.24e-02 (*) 0.66

A.4 Monotone Decomposition Fitting with Smoothing Splines

A.4.1 A complete version for Table 2

Table 8: Results for comparing the smoothing splines with CV-tuned $\lambda$ and the fitting by monotone decomposition with CV-tuned $(\lambda, \mu)$. The values are averages over 100 replications, with the standard error of the average in parentheses. The bold values highlight the smaller mean squared prediction error.
curve $\sigma$ MSFE (SmoothSpline) MSFE (MonoDecomp) MSPE (SmoothSpline) MSPE (MonoDecomp) p-value prop.
$x^2$ 0.1 9.57e-01 (8.3e-03) 9.73e-01 (8.2e-03) 7.38e-01 (2.4e-02) 8.67e-01 (2.4e-02) 1.74e-14 (***) 0.25
0.5 4.79e+00 (4.1e-02) 4.80e+00 (4.1e-02) 3.41e+00 (1.2e-01) 3.39e+00 (1.1e-01) 3.61e-01 0.46
1.0 9.65e+00 (8.9e-02) 9.69e+00 (8.5e-02) 6.44e+00 (2.9e-01) 6.35e+00 (2.5e-01) 1.68e-01 0.49
1.5 1.44e+01 (1.1e-01) 1.45e+01 (1.1e-01) 9.31e+00 (3.9e-01) 9.14e+00 (3.4e-01) 4.68e-02 (*) 0.6
2.0 1.92e+01 (1.6e-01) 1.92e+01 (1.5e-01) 1.23e+01 (4.9e-01) 1.15e+01 (4.3e-01) 4.45e-06 (***) 0.81
$x^3$ 0.1 9.46e-01 (8.0e-03) 9.89e-01 (8.0e-03) 8.65e-01 (2.4e-02) 1.11e+00 (2.5e-02) 0.00e+00 (***) 0.2
0.5 4.86e+00 (4.4e-02) 4.88e+00 (4.3e-02) 3.65e+00 (1.2e-01) 3.55e+00 (1.2e-01) 9.06e-03 (**) 0.58
1.0 9.75e+00 (8.2e-02) 9.77e+00 (8.2e-02) 6.47e+00 (1.8e-01) 6.21e+00 (1.7e-01) 1.18e-04 (***) 0.65
1.5 1.45e+01 (1.1e-01) 1.46e+01 (1.1e-01) 9.16e+00 (3.1e-01) 8.67e+00 (2.8e-01) 1.67e-06 (***) 0.81
2.0 1.92e+01 (1.8e-01) 1.92e+01 (1.6e-01) 1.16e+01 (5.9e-01) 1.09e+01 (4.9e-01) 1.15e-04 (***) 0.78
$\exp(x)$ 0.1 9.53e-01 (7.6e-03) 9.64e-01 (7.7e-03) 7.96e-01 (2.6e-02) 9.13e-01 (2.6e-02) 5.02e-12 (***) 0.26
0.5 4.80e+00 (4.3e-02) 4.82e+00 (3.9e-02) 3.33e+00 (1.5e-01) 3.30e+00 (1.3e-01) 2.60e-01 0.51
1.0 9.74e+00 (8.4e-02) 9.75e+00 (8.4e-02) 5.94e+00 (2.3e-01) 5.82e+00 (2.1e-01) 3.12e-02 (*) 0.58
1.5 1.45e+01 (1.3e-01) 1.46e+01 (1.3e-01) 9.10e+00 (4.4e-01) 8.83e+00 (4.1e-01) 7.93e-03 (**) 0.69
2.0 1.95e+01 (1.5e-01) 1.95e+01 (1.5e-01) 1.08e+01 (4.9e-01) 1.06e+01 (4.4e-01) 3.80e-02 (*) 0.57
sigmoid 0.1 9.58e-01 (8.0e-03) 9.58e-01 (7.9e-03) 7.75e-01 (2.7e-02) 7.69e-01 (2.5e-02) 1.81e-01 0.56
0.5 4.82e+00 (4.2e-02) 4.82e+00 (4.2e-02) 3.55e+00 (1.3e-01) 3.49e+00 (1.2e-01) 2.45e-02 (*) 0.61
1.0 9.60e+00 (7.8e-02) 9.64e+00 (7.5e-02) 5.99e+00 (2.9e-01) 5.68e+00 (2.4e-01) 4.47e-04 (***) 0.67
1.5 1.43e+01 (1.6e-01) 1.44e+01 (1.5e-01) 8.94e+00 (5.4e-01) 8.36e+00 (4.9e-01) 5.86e-07 (***) 0.65
2.0 1.93e+01 (1.6e-01) 1.94e+01 (1.5e-01) 1.15e+01 (6.4e-01) 1.08e+01 (5.4e-01) 2.06e-05 (***) 0.7
SE-1 0.1 9.66e-01 (7.5e-03) 9.71e-01 (7.4e-03) 7.54e-01 (2.4e-02) 8.19e-01 (2.6e-02) 7.24e-08 (***) 0.26
0.5 4.88e+00 (3.9e-02) 4.89e+00 (3.8e-02) 3.15e+00 (1.2e-01) 3.10e+00 (1.1e-01) 5.92e-02 (.) 0.55
1.0 9.67e+00 (8.3e-02) 9.70e+00 (8.1e-02) 6.32e+00 (2.8e-01) 6.11e+00 (2.5e-01) 3.86e-03 (**) 0.6
1.5 1.47e+01 (1.1e-01) 1.47e+01 (1.1e-01) 8.65e+00 (3.8e-01) 8.39e+00 (3.4e-01) 2.48e-02 (*) 0.55
2.0 1.93e+01 (1.5e-01) 1.94e+01 (1.5e-01) 1.15e+01 (4.3e-01) 1.10e+01 (3.8e-01) 6.24e-04 (***) 0.7
SE-0.1 0.1 7.68e-01 (9.1e-03) 7.94e-01 (9.4e-03) 1.71e+00 (2.7e-02) 1.79e+00 (3.0e-02) 3.41e-09 (***) 0.24
0.5 4.36e+00 (4.4e-02) 4.40e+00 (4.5e-02) 6.65e+00 (1.0e-01) 6.72e+00 (1.0e-01) 2.24e-03 (**) 0.42
1.0 8.92e+00 (9.2e-02) 8.99e+00 (9.0e-02) 1.23e+01 (2.1e-01) 1.23e+01 (2.1e-01) 4.32e-01 0.57
1.5 1.41e+01 (1.3e-01) 1.42e+01 (1.3e-01) 1.76e+01 (3.1e-01) 1.72e+01 (3.1e-01) 5.96e-05 (***) 0.64
2.0 1.89e+01 (1.7e-01) 1.91e+01 (1.7e-01) 2.16e+01 (4.3e-01) 2.11e+01 (4.3e-01) 2.13e-04 (***) 0.68
Mat12-1 0.1 1.16e+00 (2.0e-02) 1.23e+00 (1.9e-02) 3.71e+00 (2.7e-02) 3.80e+00 (3.1e-02) 5.72e-14 (***) 0.15
0.5 4.80e+00 (4.9e-02) 4.87e+00 (4.5e-02) 7.50e+00 (8.3e-02) 7.51e+00 (8.3e-02) 3.83e-01 0.4
1.0 9.65e+00 (9.5e-02) 9.69e+00 (9.2e-02) 1.15e+01 (2.3e-01) 1.13e+01 (2.1e-01) 3.40e-04 (***) 0.56
1.5 1.43e+01 (1.1e-01) 1.43e+01 (1.1e-01) 1.40e+01 (3.0e-01) 1.38e+01 (2.8e-01) 3.13e-03 (**) 0.64
2.0 1.96e+01 (1.6e-01) 1.97e+01 (1.6e-01) 1.69e+01 (4.4e-01) 1.64e+01 (3.7e-01) 1.97e-04 (***) 0.69
Mat12-0.1 0.1 2.60e+00 (3.9e-02) 2.88e+00 (3.8e-02) 9.86e+00 (7.0e-02) 1.03e+01 (8.1e-02) 0.00e+00 (***) 0.06
0.5 4.81e+00 (6.4e-02) 5.11e+00 (6.1e-02) 1.35e+01 (8.5e-02) 1.37e+01 (9.4e-02) 6.90e-06 (***) 0.3
1.0 9.76e+00 (1.2e-01) 9.92e+00 (1.1e-01) 1.90e+01 (2.2e-01) 1.90e+01 (2.2e-01) 2.06e-01 0.58
1.5 1.46e+01 (1.7e-01) 1.48e+01 (1.6e-01) 2.29e+01 (2.8e-01) 2.25e+01 (2.7e-01) 1.21e-05 (***) 0.69
2.0 1.93e+01 (1.8e-01) 1.95e+01 (1.7e-01) 2.70e+01 (4.1e-01) 2.65e+01 (3.6e-01) 3.05e-04 (***) 0.66
Mat32-1 0.1 9.15e-01 (8.0e-03) 9.20e-01 (7.7e-03) 1.18e+00 (2.1e-02) 1.19e+00 (2.2e-02) 5.89e-02 (.) 0.45
0.5 4.86e+00 (4.2e-02) 4.88e+00 (4.0e-02) 4.35e+00 (1.3e-01) 4.32e+00 (1.2e-01) 2.45e-01 0.52
1.0 9.59e+00 (8.5e-02) 9.64e+00 (7.5e-02) 7.20e+00 (2.4e-01) 7.04e+00 (2.0e-01) 3.65e-02 (*) 0.49
1.5 1.45e+01 (1.3e-01) 1.46e+01 (1.3e-01) 1.03e+01 (3.7e-01) 1.00e+01 (3.3e-01) 7.41e-03 (**) 0.65
2.0 1.95e+01 (1.7e-01) 1.96e+01 (1.7e-01) 1.26e+01 (5.1e-01) 1.19e+01 (4.4e-01) 6.29e-05 (***) 0.68
Mat32-0.1 0.1 8.83e-01 (1.4e-02) 9.47e-01 (1.4e-02) 2.92e+00 (2.4e-02) 3.02e+00 (2.6e-02) 3.54e-10 (***) 0.21
0.5 4.22e+00 (5.3e-02) 4.37e+00 (5.0e-02) 8.99e+00 (9.9e-02) 9.03e+00 (9.9e-02) 1.82e-01 0.45
1.0 8.96e+00 (1.1e-01) 9.14e+00 (1.0e-01) 1.48e+01 (2.2e-01) 1.47e+01 (2.1e-01) 1.44e-01 0.54
1.5 1.41e+01 (1.6e-01) 1.43e+01 (1.6e-01) 1.99e+01 (3.5e-01) 1.97e+01 (3.4e-01) 8.29e-02 (.) 0.59
2.0 1.92e+01 (1.9e-01) 1.94e+01 (1.8e-01) 2.36e+01 (4.4e-01) 2.29e+01 (3.9e-01) 6.38e-08 (***) 0.74
RQ-0.1-0.5 0.1 7.21e-01 (9.5e-03) 7.52e-01 (1.0e-02) 2.04e+00 (2.1e-02) 2.07e+00 (2.2e-02) 1.35e-04 (***) 0.38
0.5 4.31e+00 (4.8e-02) 4.41e+00 (4.7e-02) 7.50e+00 (9.8e-02) 7.55e+00 (9.5e-02) 1.36e-01 0.46
1.0 9.10e+00 (1.1e-01) 9.22e+00 (1.1e-01) 1.27e+01 (2.4e-01) 1.25e+01 (2.2e-01) 1.21e-03 (**) 0.6
1.5 1.43e+01 (1.5e-01) 1.44e+01 (1.5e-01) 1.75e+01 (3.3e-01) 1.74e+01 (3.1e-01) 1.29e-01 0.5
2.0 1.91e+01 (1.9e-01) 1.92e+01 (1.8e-01) 2.10e+01 (5.4e-01) 2.07e+01 (5.0e-01) 1.08e-02 (*) 0.63
Periodic-0.1-4 0.1 7.11e-01 (8.8e-03) 7.44e-01 (1.0e-02) 2.07e+00 (2.7e-02) 2.11e+00 (2.6e-02) 3.37e-06 (***) 0.27
0.5 4.07e+00 (4.6e-02) 4.20e+00 (4.6e-02) 8.20e+00 (1.2e-01) 8.35e+00 (1.1e-01) 1.66e-03 (**) 0.27
1.0 8.69e+00 (1.0e-01) 8.89e+00 (9.1e-02) 1.44e+01 (2.1e-01) 1.43e+01 (1.9e-01) 3.03e-01 0.49
1.5 1.39e+01 (1.7e-01) 1.42e+01 (1.6e-01) 2.06e+01 (2.7e-01) 2.02e+01 (2.7e-01) 1.96e-05 (***) 0.67
2.0 1.90e+01 (1.8e-01) 1.92e+01 (1.8e-01) 2.47e+01 (3.9e-01) 2.45e+01 (3.8e-01) 4.46e-02 (*) 0.63

A.4.2 Tune $\mu$ with fixed $\lambda$

First, we fix the roughness-penalty parameter $\lambda$ at the value tuned by CV for the smoothing splines. Then, we choose the parameter $\mu$ to minimize the CV error. Figure 11 demonstrates the procedure: the left panel shows the CV-error curve over the candidate values of $\mu$, and the right panel compares the smoothing-spline fit with the monotone decomposition fit at the $\mu$ minimizing that curve. The monotone decomposition achieves a better MSPE; judging from the shapes of the fitted curves, the improvement stems mainly from the shrinkage of the local modes. A generic sketch of this tuning step is given below.
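The tuning loop itself is a plain grid search; the following sketch (ours, not the package's API; `fit_md` is a hypothetical solver that fits the monotone decomposition at fixed $(\lambda, \mu)$ and returns a predictor) outlines the leave-one-out step:

```julia
# Choose μ on a grid by leave-one-out CV, with λ frozen at the
# smoothing-spline CV choice. `fit_md(x, y, λ, μ)` is hypothetical.
function tune_mu(x, y, λ, μgrid, fit_md)
    n = length(x)
    cverr = map(μgrid) do μ
        sum(1:n) do i
            idx = setdiff(1:n, i)               # leave observation i out
            fhat = fit_md(x[idx], y[idx], λ, μ) # fit on the remaining n-1 points
            (y[i] - fhat(x[i]))^2               # squared prediction error at x_i
        end / n
    end
    return μgrid[argmin(cverr)]                 # μ minimizing the LOOCV error
end
```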

Figure 11: Demo for monotone decomposition with smoothing splines. The left panel shows the leave-one-out cross-validation error curve for each candidate $\mu$ when $\lambda$ is fixed to that of the smoothing splines. The right panel shows the corresponding fitted curves, together with the truth and the noisy observations.

The average results based on 100 repetitions are summarized in Table 9. The table shows that we obtain better performance in the high-noise cases and comparable results in the lower-noise scenarios.

Table 9: Results for comparing the smoothing splines with CV-tuned $\lambda$ and the fitting by monotone decomposition with CV-tuned $\mu$ and the same $\lambda$. The values are averages over 100 replications, with the standard error of the average in parentheses. The bold values highlight the smaller mean squared prediction error.
curve $\sigma$ SNR MSFE (SmoothSpline) MSFE (MonoDecomp) MSPE (SmoothSpline) MSPE (MonoDecomp) p-value prop.
$x^2$ 0.1 1.03e+01 (1.9e-01) 9.30e-03 (1.4e-04) 9.55e-03 (1.4e-04) 6.78e-04 (4.8e-05) 9.03e-04 (5.4e-05) 7.17e-12 (***) 0.24
0.2 2.68e+00 (7.4e-02) 3.72e-02 (6.4e-04) 3.76e-02 (6.3e-04) 2.15e-03 (1.8e-04) 2.29e-03 (1.7e-04) 1.03e-02 (*) 0.33
0.5 5.04e-01 (2.1e-02) 2.35e-01 (3.8e-03) 2.37e-01 (3.6e-03) 1.36e-02 (1.4e-03) 1.23e-02 (9.5e-04) 3.49e-02 (*) 0.44
1.0 1.75e-01 (1.0e-02) 9.48e-01 (1.5e-02) 9.59e-01 (1.5e-02) 5.17e-02 (3.0e-03) 4.81e-02 (2.7e-03) 7.22e-03 (**) 0.48
1.5 1.23e-01 (1.2e-02) 2.07e+00 (3.5e-02) 2.09e+00 (3.6e-02) 1.03e-01 (1.0e-02) 9.57e-02 (8.8e-03) 1.14e-02 (*) 0.57
2.0 1.13e-01 (2.3e-02) 3.82e+00 (5.9e-02) 3.87e+00 (5.6e-02) 1.94e-01 (2.4e-02) 1.55e-01 (1.6e-02) 1.98e-04 (***) 0.63
$x^3$ 0.1 1.76e+01 (3.3e-01) 8.80e-03 (1.3e-04) 9.92e-03 (1.4e-04) 8.51e-04 (4.8e-05) 1.70e-03 (7.3e-05) 0.00e+00 (***) 0.15
0.2 4.51e+00 (1.1e-01) 3.62e-02 (6.0e-04) 3.74e-02 (6.1e-04) 2.89e-03 (1.5e-04) 3.11e-03 (1.6e-04) 4.74e-02 (*) 0.51
0.5 7.70e-01 (2.3e-02) 2.33e-01 (3.5e-03) 2.36e-01 (3.5e-03) 1.51e-02 (9.3e-04) 1.36e-02 (8.0e-04) 3.25e-04 (***) 0.52
1.0 2.43e-01 (1.6e-02) 9.44e-01 (1.6e-02) 9.54e-01 (1.6e-02) 5.32e-02 (3.9e-03) 4.63e-02 (2.9e-03) 1.77e-04 (***) 0.69
1.5 1.58e-01 (1.6e-02) 2.09e+00 (3.7e-02) 2.12e+00 (3.5e-02) 1.11e-01 (1.1e-02) 9.53e-02 (1.1e-02) 1.90e-05 (***) 0.69
2.0 1.36e-01 (2.1e-02) 3.75e+00 (7.0e-02) 3.80e+00 (6.6e-02) 1.74e-01 (2.1e-02) 1.37e-01 (1.6e-02) 1.55e-04 (***) 0.74
$\exp(x)$ 0.1 5.10e+01 (1.1e+00) 9.06e-03 (1.7e-04) 9.42e-03 (1.6e-04) 6.66e-04 (5.0e-05) 9.11e-04 (5.6e-05) 3.13e-06 (***) 0.36
0.2 1.26e+01 (3.1e-01) 3.65e-02 (6.4e-04) 3.69e-02 (6.0e-04) 2.35e-03 (2.6e-04) 2.34e-03 (1.8e-04) 4.72e-01 0.37
0.5 1.98e+00 (4.5e-02) 2.40e-01 (3.9e-03) 2.41e-01 (3.8e-03) 1.16e-02 (9.4e-04) 1.10e-02 (7.9e-04) 4.03e-02 (*) 0.52
1.0 5.89e-01 (2.2e-02) 9.07e-01 (1.4e-02) 9.14e-01 (1.3e-02) 5.39e-02 (5.7e-03) 4.79e-02 (4.5e-03) 3.84e-04 (***) 0.55
1.5 3.02e-01 (1.9e-02) 2.11e+00 (3.6e-02) 2.13e+00 (3.5e-02) 1.02e-01 (1.0e-02) 9.10e-02 (8.7e-03) 5.87e-04 (***) 0.62
2.0 2.00e-01 (2.9e-02) 3.79e+00 (6.3e-02) 3.81e+00 (6.2e-02) 1.71e-01 (2.7e-02) 1.53e-01 (2.4e-02) 1.71e-04 (***) 0.72
sigmoid 0.1 1.70e+01 (3.2e-01) 9.31e-03 (1.4e-04) 9.36e-03 (1.4e-04) 6.33e-04 (4.3e-05) 6.03e-04 (3.3e-05) 5.12e-02 (.) 0.55
0.2 4.38e+00 (6.9e-02) 3.66e-02 (4.6e-04) 3.68e-02 (4.6e-04) 2.44e-03 (1.7e-04) 2.39e-03 (1.6e-04) 7.53e-02 (.) 0.61
0.5 7.87e-01 (2.7e-02) 2.32e-01 (4.0e-03) 2.33e-01 (3.9e-03) 1.50e-02 (1.0e-03) 1.34e-02 (7.9e-04) 1.59e-05 (***) 0.63
1.0 2.31e-01 (1.3e-02) 9.44e-01 (1.6e-02) 9.51e-01 (1.6e-02) 3.96e-02 (3.9e-03) 3.43e-02 (3.1e-03) 8.84e-06 (***) 0.67
1.5 1.70e-01 (1.5e-02) 2.09e+00 (3.3e-02) 2.12e+00 (3.1e-02) 1.12e-01 (1.0e-02) 8.82e-02 (7.8e-03) 4.95e-05 (***) 0.64
2.0 1.26e-01 (1.9e-02) 3.69e+00 (6.5e-02) 3.73e+00 (6.3e-02) 1.67e-01 (2.4e-02) 1.32e-01 (1.7e-02) 1.58e-04 (***) 0.75
SE-1 0.1 3.43e+01 (4.3e+00) 9.02e-03 (1.6e-04) 9.16e-03 (1.5e-04) 7.53e-04 (7.0e-05) 7.32e-04 (5.5e-05) 2.40e-01 0.43
0.2 5.41e+00 (6.9e-01) 3.77e-02 (6.6e-04) 3.78e-02 (6.5e-04) 2.31e-03 (2.0e-04) 2.36e-03 (1.9e-04) 2.04e-01 0.4
0.5 1.20e+00 (1.4e-01) 2.38e-01 (3.9e-03) 2.39e-01 (3.8e-03) 1.24e-02 (1.2e-03) 1.12e-02 (9.5e-04) 7.34e-03 (**) 0.51
1.0 3.59e-01 (3.9e-02) 9.38e-01 (1.6e-02) 9.50e-01 (1.6e-02) 4.74e-02 (3.6e-03) 4.12e-02 (2.8e-03) 4.67e-04 (***) 0.63
1.5 1.68e-01 (1.9e-02) 2.13e+00 (3.2e-02) 2.15e+00 (3.2e-02) 9.58e-02 (8.1e-03) 8.60e-02 (6.7e-03) 3.72e-04 (***) 0.6
2.0 1.18e-01 (1.2e-02) 3.68e+00 (5.9e-02) 3.71e+00 (5.7e-02) 1.69e-01 (1.6e-02) 1.48e-01 (1.2e-02) 2.36e-04 (***) 0.75
SE-0.1 0.1 1.50e+02 (6.4e+00) 6.36e-03 (1.3e-04) 7.06e-03 (1.8e-04) 3.00e-03 (7.7e-05) 3.51e-03 (1.6e-04) 3.81e-05 (***) 0.25
0.2 3.55e+01 (1.9e+00) 2.79e-02 (5.5e-04) 2.90e-02 (5.7e-04) 1.00e-02 (3.1e-04) 1.05e-02 (3.3e-04) 6.47e-05 (***) 0.39
0.5 4.98e+00 (2.3e-01) 1.97e-01 (3.9e-03) 2.02e-01 (3.7e-03) 4.70e-02 (1.7e-03) 4.74e-02 (1.5e-03) 2.98e-01 0.38
1.0 1.30e+00 (6.9e-02) 8.23e-01 (1.6e-02) 8.54e-01 (1.5e-02) 1.54e-01 (5.3e-03) 1.55e-01 (4.6e-03) 3.62e-01 0.35
1.5 6.35e-01 (3.9e-02) 1.96e+00 (4.0e-02) 2.02e+00 (4.0e-02) 3.20e-01 (1.2e-02) 3.23e-01 (1.3e-02) 2.79e-01 0.51
2.0 3.69e-01 (2.3e-02) 3.71e+00 (6.7e-02) 3.79e+00 (6.8e-02) 4.69e-01 (1.8e-02) 4.62e-01 (1.7e-02) 1.57e-01 0.54
Mat12-1 0.1 3.51e+01 (2.7e+00) 1.41e-02 (4.1e-04) 1.58e-02 (4.3e-04) 1.40e-02 (2.3e-04) 1.48e-02 (2.6e-04) 3.30e-10 (***) 0.2
0.2 1.29e+01 (9.7e-01) 3.80e-02 (1.0e-03) 4.17e-02 (9.8e-04) 2.39e-02 (4.6e-04) 2.44e-02 (5.0e-04) 1.39e-02 (*) 0.43
0.5 1.97e+00 (1.5e-01) 2.27e-01 (4.6e-03) 2.38e-01 (4.4e-03) 5.88e-02 (1.7e-03) 5.78e-02 (1.5e-03) 1.09e-01 0.54
1.0 6.51e-01 (5.1e-02) 9.19e-01 (1.6e-02) 9.40e-01 (1.5e-02) 1.26e-01 (4.8e-03) 1.20e-01 (4.1e-03) 1.12e-03 (**) 0.57
1.5 3.52e-01 (4.4e-02) 2.10e+00 (3.4e-02) 2.14e+00 (3.1e-02) 2.13e-01 (1.2e-02) 1.97e-01 (9.0e-03) 2.02e-02 (*) 0.62
2.0 1.71e-01 (1.9e-02) 3.76e+00 (6.0e-02) 3.81e+00 (5.9e-02) 2.94e-01 (2.1e-02) 2.75e-01 (1.8e-02) 1.00e-03 (**) 0.61
Mat12-0.1 0.1 1.51e+01 (8.5e-01) 6.84e-02 (2.5e-03) 8.71e-02 (3.1e-03) 9.73e-02 (1.8e-03) 1.10e-01 (2.3e-03) 7.39e-13 (***) 0.1
0.2 9.80e+00 (5.0e-01) 9.35e-02 (2.9e-03) 1.12e-01 (3.5e-03) 1.14e-01 (1.7e-03) 1.26e-01 (2.5e-03) 3.16e-11 (***) 0.1
0.5 3.90e+00 (2.3e-01) 2.55e-01 (7.8e-03) 2.90e-01 (7.9e-03) 1.92e-01 (3.0e-03) 2.02e-01 (3.5e-03) 8.52e-08 (***) 0.2
1.0 1.24e+00 (7.2e-02) 9.43e-01 (2.5e-02) 1.01e+00 (2.3e-02) 3.62e-01 (7.5e-03) 3.65e-01 (7.5e-03) 2.24e-01 0.25
1.5 5.39e-01 (4.7e-02) 2.15e+00 (5.0e-02) 2.23e+00 (4.6e-02) 5.26e-01 (1.7e-02) 5.14e-01 (1.3e-02) 8.29e-02 (.) 0.51
2.0 3.22e-01 (3.3e-02) 3.85e+00 (8.0e-02) 3.94e+00 (7.4e-02) 7.69e-01 (2.3e-02) 7.36e-01 (2.2e-02) 7.18e-04 (***) 0.5
Mat32-1 0.1 3.78e+01 (3.8e+00) 8.48e-03 (1.6e-04) 8.70e-03 (1.6e-04) 1.50e-03 (5.1e-05) 1.57e-03 (5.1e-05) 1.71e-02 (*) 0.4
0.2 1.11e+01 (1.0e+00) 3.50e-02 (6.3e-04) 3.57e-02 (6.0e-04) 4.32e-03 (2.0e-04) 4.37e-03 (1.9e-04) 2.68e-01 0.47
0.5 1.70e+00 (1.6e-01) 2.32e-01 (3.6e-03) 2.35e-01 (3.6e-03) 1.95e-02 (1.1e-03) 1.87e-02 (9.8e-04) 1.28e-02 (*) 0.48
1.0 4.05e-01 (3.9e-02) 9.54e-01 (1.4e-02) 9.65e-01 (1.4e-02) 6.44e-02 (4.2e-03) 6.09e-02 (3.9e-03) 5.44e-03 (**) 0.57
1.5 2.24e-01 (2.4e-02) 2.17e+00 (3.9e-02) 2.18e+00 (3.8e-02) 1.13e-01 (1.1e-02) 1.01e-01 (8.2e-03) 1.09e-03 (**) 0.6
2.0 1.55e-01 (1.4e-02) 3.79e+00 (6.3e-02) 3.84e+00 (6.2e-02) 1.94e-01 (1.6e-02) 1.62e-01 (1.2e-02) 4.61e-05 (***) 0.74
Mat32-0.1 0.1 1.30e+02 (6.8e+00) 7.72e-03 (2.2e-04) 9.77e-03 (3.8e-04) 8.67e-03 (2.0e-04) 9.84e-03 (3.0e-04) 2.79e-05 (***) 0.27
0.2 4.25e+01 (2.6e+00) 2.47e-02 (7.1e-04) 2.88e-02 (8.5e-04) 2.04e-02 (3.6e-04) 2.15e-02 (5.4e-04) 3.83e-03 (**) 0.32
0.5 5.49e+00 (2.8e-01) 1.81e-01 (4.9e-03) 2.01e-01 (5.2e-03) 8.16e-02 (2.2e-03) 8.51e-02 (2.4e-03) 1.74e-03 (**) 0.4
1.0 1.34e+00 (6.7e-02) 8.44e-01 (2.0e-02) 9.01e-01 (1.9e-02) 2.44e-01 (7.8e-03) 2.40e-01 (7.6e-03) 6.21e-02 (.) 0.41
1.5 6.73e-01 (4.4e-02) 1.99e+00 (4.1e-02) 2.07e+00 (3.9e-02) 3.98e-01 (1.6e-02) 3.82e-01 (1.3e-02) 4.94e-03 (**) 0.5
2.0 3.67e-01 (3.0e-02) 3.76e+00 (6.6e-02) 3.87e+00 (6.2e-02) 5.92e-01 (2.6e-02) 5.64e-01 (2.0e-02) 8.86e-03 (**) 0.6
RQ-0.1-0.5 0.1 1.23e+02 (6.6e+00) 5.70e-03 (1.5e-04) 6.56e-03 (1.9e-04) 4.33e-03 (8.3e-05) 4.71e-03 (1.0e-04) 2.57e-07 (***) 0.28
0.2 3.44e+01 (2.0e+00) 2.47e-02 (6.1e-04) 2.66e-02 (6.3e-04) 1.30e-02 (3.2e-04) 1.34e-02 (3.3e-04) 2.92e-03 (**) 0.34
0.5 4.27e+00 (2.4e-01) 1.94e-01 (4.5e-03) 2.10e-01 (5.0e-03) 5.97e-02 (1.6e-03) 6.17e-02 (1.7e-03) 6.21e-02 (.) 0.41
1.0 1.07e+00 (6.2e-02) 8.53e-01 (1.8e-02) 8.92e-01 (1.7e-02) 1.72e-01 (6.8e-03) 1.70e-01 (6.1e-03) 3.52e-01 0.43
1.5 4.43e-01 (3.3e-02) 2.11e+00 (3.9e-02) 2.15e+00 (3.8e-02) 3.05e-01 (1.0e-02) 3.02e-01 (1.0e-02) 3.03e-01 0.49
2.0 3.07e-01 (2.4e-02) 3.83e+00 (6.6e-02) 3.92e+00 (6.6e-02) 4.56e-01 (1.8e-02) 4.43e-01 (1.6e-02) 3.16e-02 (*) 0.54
Periodic-0.1-4 0.1 1.90e+02 (7.8e+00) 5.22e-03 (1.3e-04) 6.09e-03 (2.5e-04) 4.37e-03 (9.4e-05) 4.99e-03 (2.3e-04) 1.19e-03 (**) 0.34
0.2 4.32e+01 (2.3e+00) 2.35e-02 (4.9e-04) 2.60e-02 (6.4e-04) 1.45e-02 (3.6e-04) 1.54e-02 (4.9e-04) 2.14e-03 (**) 0.3
0.5 6.84e+00 (3.2e-01) 1.66e-01 (4.2e-03) 1.76e-01 (4.6e-03) 6.68e-02 (1.8e-03) 6.84e-02 (1.9e-03) 3.44e-02 (*) 0.51
1.0 1.59e+00 (8.1e-02) 7.92e-01 (1.9e-02) 8.49e-01 (1.9e-02) 2.27e-01 (7.4e-03) 2.34e-01 (7.8e-03) 5.08e-02 (.) 0.48
1.5 6.94e-01 (5.0e-02) 1.96e+00 (4.3e-02) 2.06e+00 (4.4e-02) 4.40e-01 (1.4e-02) 4.35e-01 (1.4e-02) 2.60e-01 0.55
2.0 4.22e-01 (3.9e-02) 3.62e+00 (8.5e-02) 3.72e+00 (8.4e-02) 6.51e-01 (2.3e-02) 6.30e-01 (2.0e-02) 1.19e-02 (*) 0.62

A.4.3 Tune the shrinkage factor $k$

In light of Proposition 6, we tune the shrinkage factor $k$ instead of $(\lambda, \mu)$. Figure 12 shows that the monotone decomposition with the shrinkage factor achieves better performance in high-noise cases for monotone functions such as $x^3$, $\exp(x)$ and the sigmoid function. Promisingly, it also works for $x^2$ and SE-1, although Proposition 6 is intended for purely monotone functions.

Figure 12: Results for comparing the smoothing splines and the fitting by monotone decomposition, where the parameters are tuned by varying the shrinkage factor $k$. The values are averages over 100 replications.

A.5 QQ plots of p-values

Figure 13: Uniform QQ plot of p-values for five approaches on curve $x$ with $n=200, \sigma=0.01$.
Figure 14: Uniform QQ plot of p-values for five approaches on curve $x^{1/3}$ with $n=200, \sigma=0.01$.
Figure 15: Uniform QQ plot of p-values for five approaches on curve $1/(1+e^{-x})$ with $n=200, \sigma=0.01$.
Figure 16: Uniform QQ plot of p-values for five approaches on curve $x$ with $n=200, \sigma=0.1$.
Figure 17: Uniform QQ plot of p-values for five approaches on curve $x^3$ with $n=200, \sigma=0.1$.
Figure 18: Uniform QQ plot of p-values for five approaches on curve $x^{1/3}$ with $n=200, \sigma=0.1$.
Figure 19: Uniform QQ plot of p-values for five approaches on curve $e^x$ with $n=200, \sigma=0.1$.
Figure 20: Uniform QQ plot of p-values for five approaches on curve $1/(1+e^{-x})$ with $n=200, \sigma=0.1$.

A.6 Different Curves for Testing the Monotonicity

Figure 21: Different curves for testing the monotonicity. (a): curves from [1]; (b): curves from [9].

Appendix B More GO Analysis

Figure 22 is the same as Figure 6 except that panel (d) displays the GO terms identified by the intersection of the DE genes and the ME genes (the complement of the nME genes). Similar to the DE genes alone, the intersection of the DE and ME genes fails to identify significant GO terms.

Figure 22: GO enrichment analysis on the [16] dataset of four gene sets ((a): nME gene set; (b): DE gene set; (c): the intersection of nME and DE genes; (d): the intersection of ME and DE genes) with BH FDR cutoff $\alpha = 0.05$. The numbers of genes are noted in the title of each subfigure. The enriched GO terms are sorted by the adjusted p-value. (b) and (d) are left empty deliberately since no significant GO terms are selected.

For the liver dataset, the GO terms identified by the different gene sets are shown in Figure 23. Most GO terms selected by the DE genes (Figure 23(b)) and by the ME&DE genes (Figure 23(d)) coincide; the overlap counts are illustrated in the Venn diagram of Figure 24.

Figure 23: GO enrichment analysis on the liver dataset of four gene sets ((a): nME gene set; (b): DE gene set; (c): the intersection of nME and DE genes; (d): the intersection of ME and DE genes) with BH FDR cutoff $\alpha = 0.05$. The numbers of genes are noted in the title of each subfigure. The enriched GO terms are sorted by the adjusted p-value.
Figure 24: Venn diagram of GO terms identified by DE genes and DE&ME genes.

Figures 25 and 26 display the GO terms selected only by the DE genes and only by the ME genes, respectively. Note that, compared with the GO terms common to both gene sets, these distinct GO terms have relatively small gene counts and less significant associated p-values.

Figure 25: GO terms identified by DE genes but not ME genes.
Figure 26: GO terms identified by ME genes but not DE genes.

Appendix C Theoretical proofs of some basic propositions

C.1 A proposition

Proposition 7.

Suppose $f(x)$ is a univariate continuous function on an interval $I$. Then there exists a decomposition

$$f(x)=f_{\mathrm{up}}(x)+f_{\mathrm{down}}(x)\,, \qquad (16)$$

where $f_{\mathrm{up}}(x)$ and $f_{\mathrm{down}}(x)$ are continuous non-decreasing and non-increasing functions, respectively.

Proof.

Partition the interval $I$ by points $a_{1}<a_{2}<\cdots$ into consecutive subintervals,

$$I=\cup_{k}[a_{k},a_{k+1}]\,.$$

Without loss of generality, suppose $f(x)$ is increasing on $[a_{2k-1},a_{2k}]$ and decreasing on $[a_{2k},a_{2k+1}]$. Then one feasible decomposition is

$$f_{\mathrm{up}}(x)=\begin{cases}f(x)&x\in[a_{1},a_{2}]\\ f(a_{2})&x\in[a_{2},a_{3}]\\ f(x)-f(a_{3})+f(a_{2})&x\in[a_{3},a_{4}]\\ f(a_{4})-f(a_{3})+f(a_{2})&x\in[a_{4},a_{5}]\\ \cdots&\cdots\\ f(x)+\sum_{i=2}^{k}(f(a_{2i-2})-f(a_{2i-1}))&x\in[a_{2k-1},a_{2k}],\,k\geq 2\\ f(a_{2k})+\sum_{i=2}^{k}(f(a_{2i-2})-f(a_{2i-1}))&x\in[a_{2k},a_{2k+1}],\,k\geq 2\end{cases}\,,$$

and $f_{\mathrm{down}}(x)=f(x)-f_{\mathrm{up}}(x)$. ∎
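As a concrete instance of this construction, take $f(x)=\sin x$ on $I=[0,2\pi]$, so that $a_{1}=0$, $a_{2}=\pi/2$, $a_{3}=3\pi/2$, $a_{4}=2\pi$. The construction yields

$$f_{\mathrm{up}}(x)=\begin{cases}\sin x&x\in[0,\pi/2]\\ 1&x\in[\pi/2,3\pi/2]\\ \sin x+2&x\in[3\pi/2,2\pi]\end{cases}\,,\qquad f_{\mathrm{down}}(x)=\begin{cases}0&x\in[0,\pi/2]\\ \sin x-1&x\in[\pi/2,3\pi/2]\\ -2&x\in[3\pi/2,2\pi]\end{cases}\,,$$

which are continuous, non-decreasing and non-increasing respectively, and sum to $f$.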

C.2 Proposition 1

Proof.

Consider the problem

$$\begin{split}\min_{\gamma^{u},\gamma^{d}}&\;\|y-{\mathbf{B}}(\gamma^{u}+\gamma^{d})\|_{2}\\ \text{s.t. }&\gamma_{1}^{u}\leq\gamma_{2}^{u}\leq\cdots\leq\gamma_{J}^{u};\;\gamma_{1}^{d}\geq\gamma_{2}^{d}\geq\cdots\geq\gamma_{J}^{d}\\ &\gamma^{u}+\gamma^{d}=\hat{\gamma}\,,\end{split} \qquad (17)$$

where $\hat{\gamma}$ is the unconstrained solution. We can always decompose a vector into the sum of a non-decreasing vector and a non-increasing one: for $2\leq i\leq J$, let

$$\hat{\gamma}^{u}_{i}=\hat{\gamma}_{i-1}^{u}+(\hat{\gamma}_{i}-\hat{\gamma}_{i-1})_{+}\,, \qquad (18)$$
$$\hat{\gamma}^{d}_{i}=\hat{\gamma}_{i-1}^{d}+(\hat{\gamma}_{i}-\hat{\gamma}_{i-1})_{-}\,, \qquad (19)$$

where $t_{+}=\max(0,t)$ and $t_{-}=\min(0,t)$, together with any split of the first entry satisfying

$$\hat{\gamma}^{u}_{1}+\hat{\gamma}^{d}_{1}=\hat{\gamma}_{1}\,,$$

for example, $\hat{\gamma}_{1}^{u}=\hat{\gamma}_{1}$ and $\hat{\gamma}_{1}^{d}=0$. Thus the constraint set is feasible, i.e., non-empty, and any solution satisfies

$$\hat{\gamma}^{u}+\hat{\gamma}^{d}=\hat{\gamma}\,.$$

∎
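The construction (18)-(19) amounts to accumulating the positive and negative increments separately; a minimal Julia sketch (ours):

```julia
# Split γ̂ into a non-decreasing part and a non-increasing part via
# cumulative positive/negative increments, taking γ̂₁ᵘ = γ̂₁ and γ̂₁ᵈ = 0.
function monotone_split(γ::AbstractVector{<:Real})
    d = diff(γ)
    γu = [γ[1]; γ[1] .+ cumsum(max.(d, 0))]   # non-decreasing
    γd = [0.0;  cumsum(min.(d, 0))]           # non-increasing
    return γu, γd                             # γu + γd == γ
end

# monotone_split([0.0, 2.0, 1.0, 3.0])  # -> ([0.0, 2.0, 2.0, 4.0], [0.0, 0.0, -1.0, -1.0])
```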

C.3 Proposition 2

Proof.

Here we only prove the second property since the first property has been incorporated into the proof for the main propositions in Sections D.2 and G.3.

We can also put the constraint γΥ\gamma\in\Upsilon into the Lagrangian form. Define

$${\mathbf{A}}\triangleq\begin{bmatrix}{\mathbf{I}}_{J-1}&0\end{bmatrix}-\begin{bmatrix}0&{\mathbf{I}}_{J-1}\end{bmatrix}=\begin{bmatrix}1&-1&0&\cdots&0&0\\ 0&1&-1&\cdots&0&0\\ 0&0&1&\cdots&0&0\\ \vdots&\vdots&\vdots&\ddots&\vdots&\vdots\\ 0&0&0&\cdots&1&-1\end{bmatrix}_{(J-1)\times J}\,,\quad{\mathbf{H}}\triangleq\begin{bmatrix}{\mathbf{A}}&\boldsymbol{0}\\ \boldsymbol{0}&-{\mathbf{A}}\end{bmatrix}\,,$$

and the Lagrange form is

$$L_{\mathrm{ls}}(\gamma,\nu,\mu)=\|{\mathbf{y}}-{\mathbf{Z}}\gamma\|_{2}^{2}+2\nu^{T}{\mathbf{H}}\gamma+\mu\gamma^{T}{\mathbf{W}}\gamma\,, \qquad (20)$$

where the $2(J-1)$-vector $\nu$ and the scalar $\mu$ are the Lagrange multipliers; the factor $2$ merely yields cleaner formulas after differentiation. Taking derivatives of the Lagrangian,

$$-2{\mathbf{Z}}^{T}(y-{\mathbf{Z}}\gamma)+2{\mathbf{H}}^{T}\nu+2\mu{\mathbf{W}}\gamma+2\lambda\begin{bmatrix}{\boldsymbol{\Omega}}&{\boldsymbol{\Omega}}\\ {\boldsymbol{\Omega}}&{\boldsymbol{\Omega}}\end{bmatrix}\gamma=0\,,$$

which reduces to the cubic-spline case when $\lambda=0$. Then

$$-{\mathbf{B}}^{T}_{i}({\mathbf{y}}-{\mathbf{Z}}\gamma)+{\mathbf{H}}_{i}^{T}\nu+\mu{\mathbf{B}}_{i}^{T}{\mathbf{B}}(\gamma^{u}-\gamma^{d})+\lambda{\boldsymbol{\Omega}}_{i}(\gamma^{u}+\gamma^{d})=0 \qquad (21)$$
$$-{\mathbf{B}}^{T}_{i}({\mathbf{y}}-{\mathbf{Z}}\gamma)+{\mathbf{H}}_{i+J}^{T}\nu+\mu{\mathbf{B}}_{i}^{T}{\mathbf{B}}(-\gamma^{u}+\gamma^{d})+\lambda{\boldsymbol{\Omega}}_{i}(\gamma^{u}+\gamma^{d})=0 \qquad (22)$$

Note that

$$\sum_{i=1}^{J}{\mathbf{H}}_{i}^{T}\nu=0 \qquad\text{and}\qquad \sum_{i=1}^{J}{\mathbf{H}}_{i+J}^{T}\nu=0\,,$$

then we have

$$\mu{\boldsymbol{1}}^{T}{\mathbf{B}}^{T}{\mathbf{B}}(\gamma^{u}-\gamma^{d})=\mu{\boldsymbol{1}}^{T}{\mathbf{B}}^{T}{\mathbf{B}}(-\gamma^{u}+\gamma^{d})\,,$$

which implies

$$\mu{\boldsymbol{1}}^{T}{\mathbf{B}}(\gamma^{u}-\gamma^{d})=\mu{\boldsymbol{1}}^{T}{\mathbf{B}}(-\gamma^{u}+\gamma^{d})\,,$$

so

$$2\mu{\boldsymbol{1}}^{T}{\mathbf{B}}\gamma^{u}=2\mu{\boldsymbol{1}}^{T}{\mathbf{B}}\gamma^{d}\,.$$

If $\mu=0$, then by the KKT conditions ${\mathbf{B}}\gamma^{u}-{\mathbf{B}}\gamma^{d}=0$. If $\mu\neq 0$, then ${\boldsymbol{1}}^{T}{\mathbf{B}}\gamma^{u}={\boldsymbol{1}}^{T}{\mathbf{B}}\gamma^{d}$.

Thus, ${\boldsymbol{1}}^{T}{\mathbf{B}}\gamma^{u}={\boldsymbol{1}}^{T}{\mathbf{B}}\gamma^{d}$ always holds. ∎

Appendix D Proof of Proposition 3

Lemma 1 (Chebyshev Sum Inequality).

If $a_{1}\geq a_{2}\geq\cdots\geq a_{n}$ and $b_{1}\geq b_{2}\geq\cdots\geq b_{n}$, then

$$n\sum_{k=1}^{n}a_{k}b_{k}\geq\left(\sum_{k=1}^{n}a_{k}\right)\left(\sum_{k=1}^{n}b_{k}\right)\,.$$
Proof.

Since the two sequences are ordered in the same direction, every term $(a_{j}-a_{k})(b_{j}-b_{k})$ is non-negative, so the sum

$$S=\sum_{j=1}^{n}\sum_{k=1}^{n}(a_{j}-a_{k})(b_{j}-b_{k})\geq 0\,.$$

Note that

$$\begin{aligned}S&=\sum_{j=1}^{n}\left(na_{j}b_{j}-b_{j}\sum_{k=1}^{n}a_{k}-a_{j}\sum_{k=1}^{n}b_{k}+\sum_{k=1}^{n}a_{k}b_{k}\right)\\ &=n\sum_{j=1}^{n}a_{j}b_{j}+n\sum_{k=1}^{n}a_{k}b_{k}-\sum_{j=1}^{n}b_{j}\sum_{k=1}^{n}a_{k}-\sum_{j=1}^{n}a_{j}\sum_{k=1}^{n}b_{k}\\ &=2n\sum_{k=1}^{n}a_{k}b_{k}-2\sum_{k=1}^{n}a_{k}\sum_{k=1}^{n}b_{k}\,,\end{aligned}$$

hence

$$n\sum_{k=1}^{n}a_{k}b_{k}\geq\left(\sum_{k=1}^{n}a_{k}\right)\left(\sum_{k=1}^{n}b_{k}\right)\,.$$

∎

D.1 (i)

Proof.

Suppose $\gamma=\gamma^{u}+\gamma^{d}$ is increasing; then there is a decomposition in which $\gamma^{d}$ is a vector with identical elements. Write this decomposition as $\gamma=\gamma^{u}+c{\boldsymbol{1}}$, where $c$ is a constant. Suppose there exists a non-zero non-decreasing vector $\delta$ such that the non-increasing part is not constant, i.e.,

$$\gamma=(\gamma^{u}+\delta)+(c{\boldsymbol{1}}-\delta)\triangleq\tilde{\gamma}^{u}+\tilde{\gamma}^{d}\,.$$

Then

$$\|y-{\mathbf{B}}(\gamma^{u}+c{\boldsymbol{1}})\|=\|y-{\mathbf{B}}(\tilde{\gamma}^{u}+\tilde{\gamma}^{d})\|\,,$$

and if we impose the roughness penalty, we have

$$(\gamma^{u}+\gamma^{d})^{T}{\boldsymbol{\Omega}}(\gamma^{u}+\gamma^{d})=(\tilde{\gamma}^{u}+\tilde{\gamma}^{d})^{T}{\boldsymbol{\Omega}}(\tilde{\gamma}^{u}+\tilde{\gamma}^{d})\,.$$

So the first two terms in the Lagrangian objective (13) are identical for the two decompositions, and we only need to compare their discrepancy terms:

𝐁(γ~uγ~d)22𝐁(γuγd)22\displaystyle\|{\mathbf{B}}(\tilde{\gamma}^{u}-\tilde{\gamma}^{d})\|_{2}^{2}-\|{\mathbf{B}}(\gamma^{u}-\gamma^{d})\|_{2}^{2}
=\displaystyle= (γ~uγ~d(γuγd))T𝐁T𝐁(γ~uγ~d+(γuγd))\displaystyle(\tilde{\gamma}^{u}-\tilde{\gamma}^{d}-(\gamma^{u}-\gamma^{d}))^{T}{\mathbf{B}}^{T}{\mathbf{B}}(\tilde{\gamma}^{u}-\tilde{\gamma}^{d}+(\gamma^{u}-\gamma^{d}))
=\displaystyle= 2δT𝐁T𝐁(2γu+2δ2c𝟏)\displaystyle 2\delta^{T}{\mathbf{B}}^{T}{\mathbf{B}}(2\gamma^{u}+2\delta-2c{\boldsymbol{1}})
=\displaystyle= 4{δT𝐁T𝐁δ+δT𝐁T𝐁(γuc𝟏)}.\displaystyle 4\left\{\delta^{T}{\mathbf{B}}^{T}{\mathbf{B}}\delta+\delta^{T}{\mathbf{B}}^{T}{\mathbf{B}}(\gamma^{u}-c{\boldsymbol{1}})\right\}\,.

By Proposition 2, we have

𝟏T𝐁γu=𝟏T𝐁γd𝟏T𝐁γ~u=𝟏T𝐁γ~d,{\boldsymbol{1}}^{T}{\mathbf{B}}\gamma^{u}={\boldsymbol{1}}^{T}{\mathbf{B}}\gamma^{d}\,\qquad{\boldsymbol{1}}^{T}{\mathbf{B}}\tilde{\gamma}^{u}={\boldsymbol{1}}^{T}{\mathbf{B}}\tilde{\gamma}^{d}\,,

then

𝟏T𝐁(γuc𝟏)=𝟏T𝐁δ=0.{\boldsymbol{1}}^{T}{\mathbf{B}}(\gamma^{u}-c{\boldsymbol{1}})={\boldsymbol{1}}^{T}{\mathbf{B}}\delta=0\,.

Let 𝐚=𝐁δ and 𝐛=𝐁(γᵘ−c𝟏); both vectors are similarly ordered, being evaluations of non-decreasing spline functions at the same points. Then by Lemma 1,

𝐚T𝐛1n(𝟏T𝐚)(𝟏T𝐛)=0.{\mathbf{a}}^{T}{\mathbf{b}}\geq\frac{1}{n}({\boldsymbol{1}}^{T}{\mathbf{a}})({\boldsymbol{1}}^{T}{\mathbf{b}})=0\,.

And δT𝐁T𝐁δ>0\delta^{T}{\mathbf{B}}^{T}{\mathbf{B}}\delta>0 for non-zero δ\delta, thus,

𝐁(γ~uγ~d)22>𝐁(γuγd)22.\|{\mathbf{B}}(\tilde{\gamma}^{u}-\tilde{\gamma}^{d})\|_{2}^{2}>\|{\mathbf{B}}(\gamma^{u}-\gamma^{d})\|_{2}^{2}\,.

So if γu+γd\gamma^{u}+\gamma^{d} is increasing, the best decomposition is γd=c𝟏\gamma^{d}=c{\boldsymbol{1}}.

Note that

𝟏T𝐁γu=𝟏T𝐁c𝟏=c𝟏T𝟏,{\boldsymbol{1}}^{T}{\mathbf{B}}\gamma^{u}={\boldsymbol{1}}^{T}{\mathbf{B}}c{\boldsymbol{1}}=c{\boldsymbol{1}}^{T}{\boldsymbol{1}}\,,

the constant cc is

c=𝟏T𝐁γun.c=\frac{{\boldsymbol{1}}^{T}{\mathbf{B}}\gamma^{u}}{n}\,.
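The argument above can also be checked numerically. A minimal sketch (assuming the hat-basis stand-in for 𝐁 introduced earlier): perturbing the canonical decomposition (γᵘ, c𝟏) by any non-zero non-decreasing δ strictly increases the discrepancy ‖𝐁(γᵘ−γᵈ)‖².

```julia
using LinearAlgebra
n, J = 50, 8
t = range(0, 1; length = J)
x = range(0, 1; length = n)
B = max.(0, 1 .- abs.(x .- t') ./ step(t))   # hat basis, stand-in for 𝐁
γu = sort(randn(J))                          # non-decreasing part
c  = sum(B * γu) / n                         # c = 𝟏ᵀ𝐁γᵘ / n
γd = fill(c, J)                              # constant non-increasing part
δ  = cumsum(abs.(randn(J)))                  # non-zero, non-decreasing
@assert norm(B * ((γu + δ) - (γd - δ)))^2 > norm(B * (γu - γd))^2
```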

D.2 (ii)

Proof.

Taking derivatives of the Lagrangian,

2𝐙T(y𝐙γ)+2𝐇Tν+2μ𝐖γ=0\displaystyle-2{\mathbf{Z}}^{T}(y-{\mathbf{Z}}\gamma)+2{\mathbf{H}}^{T}\nu+2\mu{\mathbf{W}}\gamma=0 (23)

then

γ^\displaystyle\hat{\gamma} =(𝐙T𝐙+μ𝐖)1(𝐙Ty𝐇Tν)\displaystyle=({\mathbf{Z}}^{T}{\mathbf{Z}}+\mu{\mathbf{W}})^{-1}({\mathbf{Z}}^{T}y-{\mathbf{H}}^{T}\nu)
(𝐙T𝐙+μ𝐖)1𝐙Ty(𝐙T𝐙+μ𝐖)1ξ,\displaystyle\triangleq({\mathbf{Z}}^{T}{\mathbf{Z}}+\mu{\mathbf{W}})^{-1}{\mathbf{Z}}^{T}y-({\mathbf{Z}}^{T}{\mathbf{Z}}+\mu{\mathbf{W}})^{-1}\xi\,,

where ξ𝐇Tν\xi\triangleq{\mathbf{H}}^{T}\nu. Note that

(𝐙T𝐙+μ𝐖)1=[1+μ1μ1μ1+μ]1(𝐁T𝐁)1=14μ[μ+1μ1μ1μ+1](𝐁T𝐁)1,({\mathbf{Z}}^{T}{\mathbf{Z}}+\mu{\mathbf{W}})^{-1}=\begin{bmatrix}1+\mu&1-\mu\\ 1-\mu&1+\mu\end{bmatrix}^{-1}\otimes({\mathbf{B}}^{T}{\mathbf{B}})^{-1}=\frac{1}{4\mu}\begin{bmatrix}\mu+1&\mu-1\\ \mu-1&\mu+1\end{bmatrix}\otimes({\mathbf{B}}^{T}{\mathbf{B}})^{-1}\,,

then

(𝐙T𝐙+μ𝐖)1𝐙Ty={14μ[μ+1μ1μ1μ+1][11]}(𝐁T𝐁)1𝐁Ty=12[11](𝐁T𝐁)1𝐁Ty.({\mathbf{Z}}^{T}{\mathbf{Z}}+\mu{\mathbf{W}})^{-1}{\mathbf{Z}}^{T}y=\left\{\frac{1}{4\mu}\begin{bmatrix}\mu+1&\mu-1\\ \mu-1&\mu+1\end{bmatrix}\begin{bmatrix}1\\ 1\end{bmatrix}\right\}\otimes({\mathbf{B}}^{T}{\mathbf{B}})^{-1}{\mathbf{B}}^{T}y=\frac{1}{2}\begin{bmatrix}1\\ 1\end{bmatrix}\otimes({\mathbf{B}}^{T}{\mathbf{B}})^{-1}{\mathbf{B}}^{T}y\,.
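Both Kronecker-product identities are easy to verify numerically; a minimal sketch with a random positive-definite stand-in for 𝐁ᵀ𝐁:

```julia
using LinearAlgebra
J, μ = 4, 0.7
M = let X = randn(12, J); X'X end              # stands in for 𝐁ᵀ𝐁
ZtZ = kron([1 1; 1 1], M)                      # 𝐙ᵀ𝐙 with 𝐙 = [𝐁 𝐁]
W   = kron([1 -1; -1 1], M)                    # the matrix 𝐖
@assert inv(ZtZ + μ * W) ≈ kron([μ+1 μ-1; μ-1 μ+1], inv(M)) / (4 * μ)
# the left factor of (𝐙ᵀ𝐙 + μ𝐖)⁻¹𝐙ᵀy collapses to ½[1; 1]:
@assert [μ+1 μ-1; μ-1 μ+1] * ones(2) / (4 * μ) ≈ 0.5 * ones(2)
```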

Consider the ii-th element of (23),

𝐙iT(y𝐙γ)+𝐇iTν+μ𝐖iTγ=0,-{\mathbf{Z}}_{i}^{T}(y-{\mathbf{Z}}\gamma)+{\mathbf{H}}_{i}^{T}\nu+\mu{\mathbf{W}}_{i}^{T}\gamma=0\,, (24)

where 𝐙i,𝐇i,𝐖i{\mathbf{Z}}_{i},{\mathbf{H}}_{i},{\mathbf{W}}_{i} are the ii-th column of 𝐙,𝐇,𝐖{\mathbf{Z}},{\mathbf{H}},{\mathbf{W}}, respectively. Note that for 1iJ1\leq i\leq J,

𝐖i=[𝐁T𝐁i𝐁T𝐁i],𝐖i+J=[𝐁T𝐁i𝐁T𝐁i],{\mathbf{W}}_{i}=\begin{bmatrix}{\mathbf{B}}^{T}{\mathbf{B}}_{i}\\ -{\mathbf{B}}^{T}{\mathbf{B}}_{i}\end{bmatrix}\,,\qquad{\mathbf{W}}_{i+J}=\begin{bmatrix}-{\mathbf{B}}^{T}{\mathbf{B}}_{i}\\ {\mathbf{B}}^{T}{\mathbf{B}}_{i}\end{bmatrix}\,,

and

𝐙i=𝐁i,𝐙i+J=𝐁i,{\mathbf{Z}}_{i}={\mathbf{B}}_{i}\,,\qquad{\mathbf{Z}}_{i+J}={\mathbf{B}}_{i}\,,

where 𝐁ᵢ is the i-th column of 𝐁. Then (24) becomes

𝐁iT(y𝐙γ)+𝐇iTν+μ𝐁iT𝐁(γuγd)\displaystyle-{\mathbf{B}}^{T}_{i}(y-{\mathbf{Z}}\gamma)+{\mathbf{H}}_{i}^{T}\nu+\mu{\mathbf{B}}_{i}^{T}{\mathbf{B}}(\gamma^{u}-\gamma^{d}) =0,\displaystyle=0\,, (25)
𝐁iT(y𝐙γ)+𝐇i+JTν+μ𝐁iT𝐁(γu+γd)\displaystyle-{\mathbf{B}}^{T}_{i}(y-{\mathbf{Z}}\gamma)+{\mathbf{H}}_{i+J}^{T}\nu+\mu{\mathbf{B}}^{T}_{i}{\mathbf{B}}(-\gamma^{u}+\gamma^{d}) =0.\displaystyle=0\,. (26)

Subtracting (26) from (25) yields

(𝐇iT𝐇i+JT)ν=2μ𝐁iT𝐁(γuγd),({\mathbf{H}}_{i}^{T}-{\mathbf{H}}_{i+J}^{T})\nu=-2\mu{\mathbf{B}}_{i}^{T}{\mathbf{B}}(\gamma^{u}-\gamma^{d})\,, (27)

and adding (25) and (26) yields

(𝐇iT+𝐇i+JT)ν=2𝐁iT(y𝐙γ).({\mathbf{H}}_{i}^{T}+{\mathbf{H}}_{i+J}^{T})\nu=2{\mathbf{B}}^{T}_{i}(y-{\mathbf{Z}}\gamma)\,.

If there are no ties in the solution, i.e.,

γ1u<γ2u<<γJu,γ1d>γ2d>>γJd,\gamma_{1}^{u}<\gamma_{2}^{u}<\cdots<\gamma_{J}^{u}\,,\qquad\gamma_{1}^{d}>\gamma_{2}^{d}>\cdots>\gamma_{J}^{d}\,,

by the KKT conditions, ν=0, and then (27) implies μ=0, since γᵘ−γᵈ is strictly increasing (hence nonzero) and 𝐁ᵀ𝐁 is nonsingular. The problem thus reduces to unconstrained B-spline fitting. This argument proves the first point of Proposition 2.

If

γ1u<γ2u<<γJu,\gamma^{u}_{1}<\gamma^{u}_{2}<\ldots<\gamma^{u}_{J}\,,

then ν₁=⋯=ν_{J−1}=0, so ξᵢ=0 for 1≤i≤J, and (27) gives

ξi+J=2μ𝐁iT𝐁(γuγd),\xi_{i+J}=2\mu{\mathbf{B}}_{i}^{T}{\mathbf{B}}(\gamma^{u}-\gamma^{d})\,,

then

ξ=[01]2μ𝐁T𝐁(γuγd),\xi=\begin{bmatrix}0\\ 1\end{bmatrix}\otimes 2\mu{\mathbf{B}}^{T}{\mathbf{B}}(\gamma^{u}-\gamma^{d})\,,

it follows that

(𝐙T𝐙+μ𝐖)1ξ=12[μ1μ+1](γuγd).({\mathbf{Z}}^{T}{\mathbf{Z}}+\mu{\mathbf{W}})^{-1}\xi=\frac{1}{2}\begin{bmatrix}\mu-1\\ \mu+1\end{bmatrix}\otimes(\gamma^{u}-\gamma^{d})\,.

So

[γuγd]=12[(𝐁T𝐁)1𝐁Ty(μ1)(γuγd)(𝐁T𝐁)1𝐁Ty(μ+1)(γuγd)],\begin{bmatrix}\gamma^{u}\\ \gamma^{d}\end{bmatrix}=\frac{1}{2}\begin{bmatrix}({\mathbf{B}}^{T}{\mathbf{B}})^{-1}{\mathbf{B}}^{T}y-(\mu-1)(\gamma^{u}-\gamma^{d})\\ ({\mathbf{B}}^{T}{\mathbf{B}})^{-1}{\mathbf{B}}^{T}y-(\mu+1)(\gamma^{u}-\gamma^{d})\end{bmatrix}\,,

and both rows yield the same relation:

(μ+1)γu(μ1)γd=(𝐁T𝐁)1𝐁Ty.(\mu+1)\gamma^{u}-(\mu-1)\gamma^{d}=({\mathbf{B}}^{T}{\mathbf{B}})^{-1}{\mathbf{B}}^{T}y\,.

On the other hand, from (25), we have

𝐁T(y𝐙γ)+μ𝐁T𝐁(γuγd)=0,-{\mathbf{B}}^{T}(y-{\mathbf{Z}}\gamma)+\mu{\mathbf{B}}^{T}{\mathbf{B}}(\gamma^{u}-\gamma^{d})=0\,,

that is,

(γu+γd)+μ(γuγd)=(𝐁T𝐁)1𝐁Ty,(\gamma^{u}+\gamma^{d})+\mu(\gamma^{u}-\gamma^{d})=({\mathbf{B}}^{T}{\mathbf{B}})^{-1}{\mathbf{B}}^{T}y\,,

that is,

(μ+1)γu(μ1)γd=(𝐁T𝐁)1𝐁Tyγ^ls.(\mu+1)\gamma^{u}-(\mu-1)\gamma^{d}=({\mathbf{B}}^{T}{\mathbf{B}})^{-1}{\mathbf{B}}^{T}y\triangleq\hat{\gamma}^{\mathrm{ls}}\,.

Incorporating the result of Section D.1, we have

γd=c𝟏,γu=1μ+1γ^ls+μ1μ+1c𝟏.\gamma^{d}=c{\boldsymbol{1}}\,,\gamma^{u}=\frac{1}{\mu+1}\hat{\gamma}^{\mathrm{ls}}+\frac{\mu-1}{\mu+1}c{\boldsymbol{1}}\,.
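A small numerical sketch of this closed form (reusing the hat-basis stand-in for 𝐁; by the balance condition of Proposition 2, the constant resolves to c=𝟏ᵀ𝐁γ̂ˡˢ/(2n)): the resulting pair satisfies both the stationarity relation and 𝟏ᵀ𝐁γᵘ=𝟏ᵀ𝐁γᵈ.

```julia
using LinearAlgebra
n, J, μ = 50, 8, 0.5
t = range(0, 1; length = J)
x = range(0, 1; length = n)
B = max.(0, 1 .- abs.(x .- t') ./ step(t))   # hat basis, stand-in for 𝐁
y = randn(n)
γls = (B'B) \ (B'y)                          # γ̂ˡˢ = (𝐁ᵀ𝐁)⁻¹𝐁ᵀy
c   = sum(B * γls) / (2 * n)                 # c = 𝟏ᵀ𝐁γ̂ˡˢ / (2n)
γd  = fill(c, J)
γu  = γls / (μ + 1) + (μ - 1) / (μ + 1) * γd
@assert (μ + 1) * γu - (μ - 1) * γd ≈ γls                  # stationarity
@assert isapprox(sum(B * γu), sum(B * γd); atol = 1e-8)    # 𝟏ᵀ𝐁γᵘ = 𝟏ᵀ𝐁γᵈ
```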

D.3 (iii)

Proof.

Note that

{ξ1=ν1ξ2=ν2ν1ξJ1=νJ1νJ2ξJ=νJ1\begin{cases}\xi_{1}&=\nu_{1}\\ \xi_{2}&=\nu_{2}-\nu_{1}\\ \vdots&\\ \xi_{J-1}&=\nu_{J-1}-\nu_{J-2}\\ \xi_{J}&=-\nu_{J-1}\end{cases} (28)

If \hat{\gamma}^{u}_{1}<\cdots<\hat{\gamma}^{u}_{k_{1}}=\cdots=\hat{\gamma}^{u}_{k_{2}}<\cdots<\hat{\gamma}^{u}_{k_{2m-1}}=\cdots=\hat{\gamma}^{u}_{k_{2m}}<\cdots<\hat{\gamma}^{u}_{J}, then

\begin{cases}\nu_{1}=\cdots=\nu_{k_{1}-1}=0\\ \vdots\\ \nu_{k_{2m-2}}=\cdots=\nu_{k_{2m-1}-1}=0\\ \nu_{k_{2m}}=\cdots=\nu_{J-1}=0\end{cases}\qquad\Longrightarrow\qquad\begin{cases}\xi_{1}=\cdots=\xi_{k_{1}-1}=0\\ \vdots\\ \displaystyle\sum_{i=k_{1}}^{k_{2}}\xi_{i}=0\\ \vdots\\ \displaystyle\sum_{i=k_{2m-1}}^{k_{2m}}\xi_{i}=0\\ \xi_{k_{2m}+1}=\cdots=\xi_{J}=0\end{cases}

where the group sums run over the full tie blocks, matching the blocks of 𝐆 below.

Thus,

{\mathbf{B}}_{1}^{T}y=(\mu+1){\mathbf{B}}^{T}_{1}{\mathbf{B}}\gamma^{u}-(\mu-1){\mathbf{B}}^{T}_{1}{\mathbf{B}}\gamma^{d}
\vdots
{\mathbf{B}}_{k_{1}-1}^{T}y=(\mu+1){\mathbf{B}}^{T}_{k_{1}-1}{\mathbf{B}}\gamma^{u}-(\mu-1){\mathbf{B}}^{T}_{k_{1}-1}{\mathbf{B}}\gamma^{d}
\sum_{i=k_{1}}^{k_{2}}{\mathbf{B}}_{i}^{T}y=(\mu+1)\sum_{i=k_{1}}^{k_{2}}{\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{u}-(\mu-1)\sum_{i=k_{1}}^{k_{2}}{\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{d}
\vdots
\sum_{i=k_{2m-1}}^{k_{2m}}{\mathbf{B}}_{i}^{T}y=(\mu+1)\sum_{i=k_{2m-1}}^{k_{2m}}{\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{u}-(\mu-1)\sum_{i=k_{2m-1}}^{k_{2m}}{\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{d}
{\mathbf{B}}_{k_{2m}+1}^{T}y=(\mu+1){\mathbf{B}}^{T}_{k_{2m}+1}{\mathbf{B}}\gamma^{u}-(\mu-1){\mathbf{B}}^{T}_{k_{2m}+1}{\mathbf{B}}\gamma^{d}
\vdots
{\mathbf{B}}_{J}^{T}y=(\mu+1){\mathbf{B}}^{T}_{J}{\mathbf{B}}\gamma^{u}-(\mu-1){\mathbf{B}}_{J}^{T}{\mathbf{B}}\gamma^{d}\,.

Then

𝐆𝐁Ty=(μ+1)𝐆𝐁T𝐁γu(μ1)𝐆𝐁T𝐁γd,{\mathbf{G}}{\mathbf{B}}^{T}y=(\mu+1){\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}\gamma^{u}-(\mu-1){\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}\gamma^{d}\,,

where

𝐆=[𝐈k11𝟏k2k1+1T𝐈k2m1k2m21𝟏k2mk2m1+1T𝐈Jk2m].{\mathbf{G}}=\begin{bmatrix}{\mathbf{I}}_{k_{1}-1}&&&&&\\ &{\boldsymbol{1}}^{T}_{k_{2}-k_{1}+1}&&&&\\ &&\ddots&&&\\ &&&{\mathbf{I}}_{k_{2m-1}-k_{2m-2}-1}&&\\ &&&&{\boldsymbol{1}}^{T}_{k_{2m}-k_{2m-1}+1}&\\ &&&&&{\mathbf{I}}_{J-k_{2m}}\end{bmatrix}\,.

Let γᵘ=𝐆ᵀβ, where β is the sub-vector of γᵘ formed by its unique elements. Since γᵈ=c𝟏_J and 𝐆ᵀ𝟏_g=𝟏_J, we have

{\mathbf{G}}{\mathbf{B}}^{T}y=(\mu+1){\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T}\beta-(\mu-1)c\,{\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T}{\boldsymbol{1}}_{g}\,,

then

β=1μ+1(𝐆𝐁T𝐁𝐆T)1𝐆𝐁Ty+μ1μ+1c𝟏.\beta=\frac{1}{\mu+1}({\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T})^{-1}{\mathbf{G}}{\mathbf{B}}^{T}y+\frac{\mu-1}{\mu+1}c{\boldsymbol{1}}\,.

Thus,

\gamma^{u}={\mathbf{G}}^{T}\beta=\frac{1}{\mu+1}{\mathbf{G}}^{T}({\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T})^{-1}{\mathbf{G}}{\mathbf{B}}^{T}y+\frac{\mu-1}{\mu+1}c{\boldsymbol{1}}_{J}\,,\qquad\gamma^{d}=c{\boldsymbol{1}}_{J}\,.
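The grouping matrix 𝐆 is easy to build from a tie pattern; a small illustrative sketch (the helper `groupmat` is hypothetical, not part of the package):

```julia
using LinearAlgebra

# Build 𝐆 from a partition of 1:J into groups: singleton groups give
# identity rows, tie groups give a row of ones over the tied indices.
function groupmat(groups, J)
    G = zeros(length(groups), J)
    for (g, r) in enumerate(groups)
        G[g, r] .= 1.0
    end
    return G
end

G = groupmat([1:1, 2:4, 5:5, 6:8, 9:9], 9)   # J = 9, ties {2,3,4} and {6,7,8}
@assert G' * ones(size(G, 1)) == ones(9)      # 𝐆ᵀ𝟏_g = 𝟏_J, as used above
```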

D.4 Corollary 1

The monotone decomposition for decreasing functions can be derived in the same way, as summarized in Corollary 1.

Corollary 1.

Let γ̂=(γ̂ᵘ,γ̂ᵈ) be the monotone decomposition solution to problem (6). Suppose γ̂ᵘ+γ̂ᵈ is decreasing; then

  1. (i)

    γ^u\hat{\gamma}^{u} is a vector with identical elements;

  2. (ii)

    if \hat{\gamma}^{d}_{1}>\cdots>\hat{\gamma}^{d}_{k_{1}}=\cdots=\hat{\gamma}^{d}_{k_{2}}>\cdots>\hat{\gamma}^{d}_{k_{2m-1}}=\cdots=\hat{\gamma}^{d}_{k_{2m}}>\cdots>\hat{\gamma}^{d}_{J}, where 1\leq k_{1}\leq k_{2}\leq\cdots\leq k_{2m-1}\leq k_{2m}\leq J, then

    γ^d=1μ+1𝐆T(𝐆𝐁T𝐁𝐆T)1𝐆𝐁Ty+μ1μ+1c𝟏,γ^u=𝟏T𝐁γ^dn𝟏.\hat{\gamma}^{d}=\frac{1}{\mu+1}{\mathbf{G}}^{T}({\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T})^{-1}{\mathbf{G}}{\mathbf{B}}^{T}y+\frac{\mu-1}{\mu+1}c{\boldsymbol{1}}\,,\qquad\hat{\gamma}^{u}=\frac{{\boldsymbol{1}}^{T}{\mathbf{B}}\hat{\gamma}^{d}}{n}{\boldsymbol{1}}\,.
Proof.

If the solution with data (X,y) is γ̂ᵘ+γ̂ᵈ, then the solution with (X,−y) is −γ̂ᵘ−γ̂ᵈ.

If γᵘ+γᵈ is decreasing, then −γᵘ−γᵈ is increasing. Note that −γᵘ is non-increasing and −γᵈ is non-decreasing; then, by the result of Section D.1, −γᵘ must be a vector with identical elements, and hence so is γᵘ.

If \hat{\gamma}^{d}_{1}>\cdots>\hat{\gamma}^{d}_{k_{1}}=\cdots=\hat{\gamma}^{d}_{k_{2}}>\cdots>\hat{\gamma}^{d}_{k_{2m-1}}=\cdots=\hat{\gamma}^{d}_{k_{2m}}>\hat{\gamma}^{d}_{J}, then

-\hat{\gamma}^{d}_{1}<\cdots<-\hat{\gamma}^{d}_{k_{1}}=\cdots=-\hat{\gamma}^{d}_{k_{2}}<\cdots<-\hat{\gamma}^{d}_{k_{2m-1}}=\cdots=-\hat{\gamma}^{d}_{k_{2m}}<-\hat{\gamma}^{d}_{J}\,.

Thus,

-\hat{\gamma}^{d}=\frac{1}{\mu+1}{\mathbf{G}}^{T}({\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T})^{-1}{\mathbf{G}}{\mathbf{B}}^{T}(-y)+\frac{\mu-1}{\mu+1}\tilde{c}{\boldsymbol{1}}\,,\qquad\tilde{c}=\frac{{\boldsymbol{1}}^{T}{\mathbf{B}}(-\gamma^{d})}{n}\,,

it follows that

\hat{\gamma}^{d}=\frac{1}{\mu+1}{\mathbf{G}}^{T}({\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T})^{-1}{\mathbf{G}}{\mathbf{B}}^{T}y+\frac{\mu-1}{\mu+1}c{\boldsymbol{1}}\,,\qquad c=\frac{{\boldsymbol{1}}^{T}{\mathbf{B}}\gamma^{d}}{n}\,,\qquad\hat{\gamma}^{u}=c{\boldsymbol{1}}\,.

Appendix E A Side Product: Unimodal Functions

We explore the solution for more general unimodal functions, as summarized in Proposition 8. In particular, the second point of Proposition 3 can be viewed as the special instance of Proposition 8 with k=ℓ=1.

Proposition 8.

Let γ^=(γ^u,γ^d)\hat{\gamma}=(\hat{\gamma}^{u},\hat{\gamma}^{d}) be the monotone decomposition to problem (8). If

γ^1u\displaystyle\hat{\gamma}_{1}^{u} =γ^2u==γ^ku<γ^k+1u<<γ^Ju\displaystyle=\hat{\gamma}_{2}^{u}=\cdots=\hat{\gamma}_{k}^{u}<\hat{\gamma}_{k+1}^{u}<\cdots<\hat{\gamma}_{J}^{u}
γ^1d\displaystyle\hat{\gamma}_{1}^{d} >γ^2d>>γ^d=γ^+1d==γ^Jd,\displaystyle>\hat{\gamma}_{2}^{d}>\cdots>\hat{\gamma}_{\ell}^{d}=\hat{\gamma}_{\ell+1}^{d}=\cdots=\hat{\gamma}_{J}^{d}\,,

suppose k\ell\leq k, then

[𝐈1𝟏k+1𝐈Jk]𝐁T𝐲=𝐀𝐁T𝐁γ^u+(2𝐈𝐀)𝐁T𝐁γ^d,\begin{bmatrix}{\mathbf{I}}_{\ell-1}&&\\ &{\boldsymbol{1}}_{k-\ell+1}&\\ &&{\mathbf{I}}_{J-k}\end{bmatrix}{\mathbf{B}}^{T}{\mathbf{y}}={\mathbf{A}}{\mathbf{B}}^{T}{\mathbf{B}}\hat{\gamma}^{u}+(2{\mathbf{I}}-{\mathbf{A}}){\mathbf{B}}^{T}{\mathbf{B}}\hat{\gamma}^{d}\,,

where

𝐀=[(1μ)𝐈12μ𝟏1T1+μ(1+μ)𝐈Jk].{\mathbf{A}}=\begin{bmatrix}(1-\mu){\mathbf{I}}_{\ell-1}&&\\ 2\mu{\boldsymbol{1}}_{\ell-1}^{T}&1+\mu&\\ &&(1+\mu){\mathbf{I}}_{J-k}\end{bmatrix}\,.
Proof.

If

γ1u=γ2u==γku<γk+1u<<γJu\gamma_{1}^{u}=\gamma_{2}^{u}=\cdots=\gamma_{k}^{u}<\gamma_{k+1}^{u}<\cdots<\gamma_{J}^{u}

and

γ1d>γ2d>>γd=γ+1d==γJd.\gamma_{1}^{d}>\gamma_{2}^{d}>\cdots>\gamma_{\ell}^{d}=\gamma_{\ell+1}^{d}=\cdots=\gamma_{J}^{d}\,.

Suppose k\ell\leq k, then

νk=νk+1==νJ1=0,\nu_{k}=\nu_{k+1}=\cdots=\nu_{J-1}=0\,,

and

νJ1+i=0,i=1,,1.\nu_{J-1+i}=0,i=1,\ldots,\ell-1\,.

Then from (28), we have ξi=0,k+1iJ\xi_{i}=0,k+1\leq i\leq J, and then

i=1k𝐁iTy\displaystyle\sum_{i=1}^{k}{\mathbf{B}}_{i}^{T}y =(μ+1)i=1k𝐁iT𝐁γu(μ1)i=1k𝐁iT𝐁γd,\displaystyle=(\mu+1)\sum_{i=1}^{k}{\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{u}-(\mu-1)\sum_{i=1}^{k}{\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{d}\,, (29)
𝐁iTy\displaystyle{\mathbf{B}}_{i}^{T}y =(μ+1)𝐁iT𝐁γu(μ1)𝐁iT𝐁γdk+1iJ.\displaystyle=(\mu+1){\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{u}-(\mu-1){\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{d}\quad k+1\leq i\leq J\,. (30)

Similarly, ξi+J=0,1i1\xi_{i+J}=0,1\leq i\leq\ell-1, and hence

𝐁iTy\displaystyle{\mathbf{B}}_{i}^{T}y =(μ+1)𝐁iT𝐁γd(μ1)𝐁iT𝐁γu1i1\displaystyle=(\mu+1){\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{d}-(\mu-1){\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{u}\quad 1\leq i\leq\ell-1 (31)
i=J𝐁iTy\displaystyle\sum_{i=\ell}^{J}{\mathbf{B}}_{i}^{T}y =(μ+1)i=J𝐁iT𝐁γd(μ1)i=J𝐁iT𝐁γu.\displaystyle=(\mu+1)\sum_{i=\ell}^{J}{\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{d}-(\mu-1)\sum_{i=\ell}^{J}{\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{u}\,. (32)

Note that

i=1J𝐁iTy\displaystyle\sum_{i=1}^{J}{\mathbf{B}}_{i}^{T}y =(μ+1)i=1J𝐁iT𝐁γu(μ1)i=1J𝐁iT𝐁γd\displaystyle=(\mu+1)\sum_{i=1}^{J}{\mathbf{B}}_{i}^{T}{\mathbf{B}}\gamma^{u}-(\mu-1)\sum_{i=1}^{J}{\mathbf{B}}_{i}^{T}{\mathbf{B}}\gamma^{d}
=(\mu+1)\sum_{i=1}^{J}{\mathbf{B}}_{i}^{T}{\mathbf{B}}\gamma^{d}-(\mu-1)\sum_{i=1}^{J}{\mathbf{B}}_{i}^{T}{\mathbf{B}}\gamma^{u}\,, (33)

then (29) + (32) - (33) yields

k𝐁iTy\displaystyle\sum_{\ell}^{k}{\mathbf{B}}_{i}^{T}y =i=1k𝐁iTy+i=J𝐁iTyi=1J𝐁iTy\displaystyle=\sum_{i=1}^{k}{\mathbf{B}}_{i}^{T}y+\sum_{i=\ell}^{J}{\mathbf{B}}_{i}^{T}y-\sum_{i=1}^{J}{\mathbf{B}}_{i}^{T}y
=(μ+1)i=1k𝐁iT𝐁γu(μ1)i=1k𝐁iT𝐁γd+\displaystyle=(\mu+1)\sum_{i=1}^{k}{\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{u}-(\mu-1)\sum_{i=1}^{k}{\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{d}+
(μ+1)i=J𝐁iT𝐁γd(μ1)i=J𝐁iT𝐁γu\displaystyle\qquad\quad(\mu+1)\sum_{i=\ell}^{J}{\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{d}-(\mu-1)\sum_{i=\ell}^{J}{\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{u}-
[(1μ)𝟏T𝐁γu+(μ+1)𝟏T𝐁γd]\displaystyle\qquad\quad[(1-\mu){\boldsymbol{1}}^{T}{\mathbf{B}}\gamma^{u}+(\mu+1){\boldsymbol{1}}^{T}{\mathbf{B}}\gamma^{d}]
=(μ+1)i=1k𝐁iT𝐁γu(μ1)i=1k𝐁iT𝐁γd+\displaystyle=(\mu+1)\sum_{i=1}^{k}{\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{u}-(\mu-1)\sum_{i=1}^{k}{\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{d}+
(μ1)i=11𝐁iT𝐁γu(μ+1)i=11𝐁iT𝐁γd\displaystyle\qquad\quad(\mu-1)\sum_{i=1}^{\ell-1}{\mathbf{B}}_{i}^{T}{\mathbf{B}}\gamma^{u}-(\mu+1)\sum_{i=1}^{\ell-1}{\mathbf{B}}_{i}^{T}{\mathbf{B}}\gamma^{d}
=2μi=11𝐁iT𝐁γu+(μ+1)i=k𝐁iT𝐁γu2μi=11𝐁iT𝐁γd(μ1)i=k𝐁iT𝐁γd,\displaystyle=2\mu\sum_{i=1}^{\ell-1}{\mathbf{B}}_{i}^{T}{\mathbf{B}}\gamma^{u}+(\mu+1)\sum_{i=\ell}^{k}{\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{u}-2\mu\sum_{i=1}^{\ell-1}{\mathbf{B}}_{i}^{T}{\mathbf{B}}\gamma^{d}-(\mu-1)\sum_{i=\ell}^{k}{\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{d}\,,

then

[𝐁1Ty𝐁1Tyi=k𝐁iTy𝐁k+1Ty𝐁JTy]\displaystyle\begin{bmatrix}{\mathbf{B}}_{1}^{T}y\\ \vdots\\ {\mathbf{B}}_{\ell-1}^{T}y\\ \sum_{i=\ell}^{k}{\mathbf{B}}_{i}^{T}y\\ {\mathbf{B}}_{k+1}^{T}y\\ \vdots\\ {\mathbf{B}}_{J}^{T}y\\ \end{bmatrix} =[1μ0000000001μ0002μ2μ1+μ000001+μ000001+μ]𝐁T𝐁γu+\displaystyle=\begin{bmatrix}1-\mu&\cdots&0&0&0&\cdots&0\\ \vdots&\ddots&0&0&0&\cdots&0\\ 0&\cdots&1-\mu&0&0&\cdots&0\\ 2\mu&\cdots&2\mu&1+\mu&0&\cdots&0\\ 0&\cdots&0&0&1+\mu&\cdots&0\\ \vdots&\vdots&\vdots&\vdots&\vdots&\ddots&\vdots\\ 0&\cdots&0&0&0&\cdots&1+\mu\end{bmatrix}{\mathbf{B}}^{T}{\mathbf{B}}\gamma^{u}+
[1+μ0000000001+μ0002μ2μ1μ000001μ000001μ]𝐁T𝐁γd\displaystyle\qquad\qquad\begin{bmatrix}1+\mu&\cdots&0&0&0&\cdots&0\\ \vdots&\ddots&0&0&0&\cdots&0\\ 0&\cdots&1+\mu&0&0&\cdots&0\\ -2\mu&\cdots&-2\mu&1-\mu&0&\cdots&0\\ 0&\cdots&0&0&1-\mu&\cdots&0\\ \vdots&\vdots&\vdots&\vdots&\vdots&\ddots&\vdots\\ 0&\cdots&0&0&0&\cdots&1-\mu\end{bmatrix}{\mathbf{B}}^{T}{\mathbf{B}}\gamma^{d}
𝐀𝐁T𝐁γu+(2𝐈𝐀)𝐁T𝐁γd.\displaystyle\triangleq{\mathbf{A}}{\mathbf{B}}^{T}{\mathbf{B}}\gamma^{u}+(2{\mathbf{I}}-{\mathbf{A}}){\mathbf{B}}^{T}{\mathbf{B}}\gamma^{d}\,.

If =k\ell=k, then

𝐁Ty=𝐀𝐁T𝐁γu+(2𝐈𝐀)𝐁T𝐁γd.{\mathbf{B}}^{T}y={\mathbf{A}}{\mathbf{B}}^{T}{\mathbf{B}}\gamma^{u}+(2{\mathbf{I}}-{\mathbf{A}}){\mathbf{B}}^{T}{\mathbf{B}}\gamma^{d}\,.

Appendix F Proof of Proposition 4

Proof.

The fitting vector is

𝐟^\displaystyle\hat{\mathbf{f}} =𝐁γ^u+𝐁γ^d\displaystyle={\mathbf{B}}\hat{\gamma}^{u}+{\mathbf{B}}\hat{\gamma}^{d}
=1μ+1𝐁𝐆T(𝐆𝐁T𝐁𝐆T)1𝐆𝐁T𝐲+2μμ+1𝟏nT𝐟^2n𝟏n\displaystyle=\frac{1}{\mu+1}{\mathbf{B}}{\mathbf{G}}^{T}({\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T})^{-1}{\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{y}}+\frac{2\mu}{\mu+1}\frac{{\boldsymbol{1}}^{T}_{n}\hat{\mathbf{f}}}{2n}{\boldsymbol{1}}_{n}
k𝐀𝐲+1n(1k)𝟏nT𝐟^𝟏n,\displaystyle\triangleq k{\mathbf{A}}{\mathbf{y}}+\frac{1}{n}(1-k){\boldsymbol{1}}^{T}_{n}\hat{\mathbf{f}}{\boldsymbol{1}}_{n}\,, (34)

then

𝟏nT𝐟^=k𝟏nT𝐀𝐲+(1k)𝟏nT𝐟^.{\boldsymbol{1}}^{T}_{n}\hat{\mathbf{f}}=k{\boldsymbol{1}}^{T}_{n}{\mathbf{A}}{\mathbf{y}}+(1-k){\boldsymbol{1}}^{T}_{n}\hat{\mathbf{f}}\,. (35)

It follows that

𝟏nT𝐟^=𝟏nT𝐀𝐲,{\boldsymbol{1}}^{T}_{n}\hat{\mathbf{f}}={\boldsymbol{1}}^{T}_{n}{\mathbf{A}}{\mathbf{y}}\,,

then

𝐟^=k𝐀𝐲+1kn𝟏nT𝐀𝐲𝟏n=[k𝐈+1kn𝟏n𝟏nT]𝐀𝐲.\hat{\mathbf{f}}=k{\mathbf{A}}{\mathbf{y}}+\frac{1-k}{n}{\boldsymbol{1}}^{T}_{n}{\mathbf{A}}{\mathbf{y}}{\boldsymbol{1}}_{n}=\left[k{\mathbf{I}}+\frac{1-k}{n}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\right]{\mathbf{A}}{\mathbf{y}}\,. (36)
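The ingredients of (36) can be verified numerically. A minimal sketch (hat-basis stand-in for 𝐁 and a hand-picked tie pattern for 𝐆) checking that 𝐀 is a symmetric idempotent smoother with 𝟏ₙᵀ𝐀=𝟏ₙᵀ and tr(𝐀)=g, identities used in the sequel:

```julia
using LinearAlgebra
n, J = 50, 8
t = range(0, 1; length = J)
x = range(0, 1; length = n)
B = max.(0, 1 .- abs.(x .- t') ./ step(t))   # hat basis, stand-in for 𝐁
G = Float64[1 0 0 0 0 0 0 0;                  # tie pattern: {2,3} and {5,6,7}
            0 1 1 0 0 0 0 0;
            0 0 0 1 0 0 0 0;
            0 0 0 0 1 1 1 0;
            0 0 0 0 0 0 0 1]
A = B * G' * ((G * (B'B) * G') \ (G * B'))
@assert A * A ≈ A && A ≈ A'                   # idempotent and symmetric
@assert ones(n)' * A ≈ ones(n)'               # 𝟏ₙᵀ𝐀 = 𝟏ₙᵀ
@assert tr(A) ≈ size(G, 1)                    # tr(𝐀) = g
```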

Note that 𝐆 depends on 𝐲, and hence 𝐀 also depends on 𝐲. Taking expectations in (36) and applying the law of total expectation, we obtain

𝔼𝐟^=𝔼[𝔼[𝐟^𝐆]]=𝔼[k𝐀𝐟+1kn𝟏nT𝐀𝐟𝟏n].{\mathbb{E}}\hat{\mathbf{f}}={\mathbb{E}}[{\mathbb{E}}[\hat{\mathbf{f}}\mid{\mathbf{G}}]]={\mathbb{E}}[k{\mathbf{A}}{\mathbf{f}}+\frac{1-k}{n}{\boldsymbol{1}}^{T}_{n}{\mathbf{A}}{\mathbf{f}}{\boldsymbol{1}}_{n}]\,.

Note that 𝐀𝐀ᵀ=𝐀=𝐀ᵀ, and that 𝟏ₙ=𝐁𝟏_J and 𝟏_J=𝐆ᵀ𝟏_g, so 𝟏ₙ=𝐁𝐆ᵀ𝟏_g; then

𝟏nT𝐀\displaystyle{\boldsymbol{1}}^{T}_{n}{\mathbf{A}} =𝟏nT𝐁𝐆T(𝐆𝐁T𝐁𝐆T)1𝐆𝐁T\displaystyle={\boldsymbol{1}}^{T}_{n}{\mathbf{B}}{\mathbf{G}}^{T}({\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T})^{-1}{\mathbf{G}}{\mathbf{B}}^{T}
=𝟏gT𝐆𝐁T𝐁𝐆T(𝐆𝐁T𝐁𝐆T)1𝐆𝐁T\displaystyle={\boldsymbol{1}}^{T}_{g}{\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T}({\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T})^{-1}{\mathbf{G}}{\mathbf{B}}^{T}
=𝟏gT𝐆𝐁T\displaystyle={\boldsymbol{1}}^{T}_{g}{\mathbf{G}}{\mathbf{B}}^{T}
=𝟏nT,\displaystyle={\boldsymbol{1}}^{T}_{n}\,,

and hence 𝟏nT𝐀𝐀T𝟏n=n{\boldsymbol{1}}_{n}^{T}{\mathbf{A}}{\mathbf{A}}^{T}{\boldsymbol{1}}_{n}=n. It follows that

𝔼𝐟^=k𝔼𝐀𝐟+1kn𝟏nT𝐟𝟏n.{\mathbb{E}}\hat{\mathbf{f}}=k{\mathbb{E}}{\mathbf{A}}{\mathbf{f}}+\frac{1-k}{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{f}}{\boldsymbol{1}}_{n}\,.

The square of bias is

𝐟𝔼𝐟^2\displaystyle\|{\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}\|^{2} =𝐟k𝔼𝐀𝐟1kn𝟏nT𝐟𝟏n2\displaystyle=\|{\mathbf{f}}-k{\mathbb{E}}{\mathbf{A}}{\mathbf{f}}-\frac{1-k}{n}{\boldsymbol{1}}^{T}_{n}{\mathbf{f}}{\boldsymbol{1}}_{n}\|^{2}
=𝐟T(𝐈k𝔼𝐀)2𝐟2(1k)n𝐟T(𝐈k𝔼𝐀)𝟏nT𝐟𝟏n+(1k)2n2(𝟏nT𝐟)2𝟏nT𝟏n\displaystyle={\mathbf{f}}^{T}({\mathbf{I}}-k{\mathbb{E}}{\mathbf{A}})^{2}{\mathbf{f}}-\frac{2(1-k)}{n}{\mathbf{f}}^{T}({\mathbf{I}}-k{\mathbb{E}}{\mathbf{A}}){\boldsymbol{1}}^{T}_{n}{\mathbf{f}}{\boldsymbol{1}}_{n}+\frac{(1-k)^{2}}{n^{2}}({\boldsymbol{1}}^{T}_{n}{\mathbf{f}})^{2}{\boldsymbol{1}}^{T}_{n}{\boldsymbol{1}}_{n}
=𝐟T(𝐈k𝔼𝐀)2𝐟2(1k)n(𝟏nT𝐟)𝐟T(𝐈k𝔼𝐀)𝟏n+(1k)2n(𝟏nT𝐟)2\displaystyle={\mathbf{f}}^{T}({\mathbf{I}}-k{\mathbb{E}}{\mathbf{A}})^{2}{\mathbf{f}}-\frac{2(1-k)}{n}({\boldsymbol{1}}^{T}_{n}{\mathbf{f}}){\mathbf{f}}^{T}({\mathbf{I}}-k{\mathbb{E}}{\mathbf{A}}){\boldsymbol{1}}_{n}+\frac{(1-k)^{2}}{n}({\boldsymbol{1}}^{T}_{n}{\mathbf{f}})^{2}
=𝐟T(𝐈k𝔼𝐀)2𝐟(1k)2n(𝟏nT𝐟)2\displaystyle={\mathbf{f}}^{T}({\mathbf{I}}-k{\mathbb{E}}{\mathbf{A}})^{2}{\mathbf{f}}-\frac{(1-k)^{2}}{n}({\boldsymbol{1}}^{T}_{n}{\mathbf{f}})^{2}
=𝐟T𝐟2k𝐟T𝔼𝐀𝐟+k2𝐟T[𝔼𝐀]2𝐟(1k)2n(𝟏nT𝐟)2.\displaystyle={\mathbf{f}}^{T}{\mathbf{f}}-2k{\mathbf{f}}^{T}{\mathbb{E}}{\mathbf{A}}{\mathbf{f}}+k^{2}{\mathbf{f}}^{T}[{\mathbb{E}}{\mathbf{A}}]^{2}{\mathbf{f}}-\frac{(1-k)^{2}}{n}({\boldsymbol{1}}^{T}_{n}{\mathbf{f}})^{2}\,.

Note that

[𝔼𝐀]2=𝔼𝐀2Var(𝐀),[{\mathbb{E}}{\mathbf{A}}]^{2}={\mathbb{E}}{\mathbf{A}}^{2}-\operatorname*{Var}({\mathbf{A}})\,,

and since 𝐀²=𝐀, we can write

k2𝐟T[𝔼𝐀]2𝐟=k2𝐟T[𝔼𝐀Var(𝐀)]𝐟.k^{2}{\mathbf{f}}^{T}[{\mathbb{E}}{\mathbf{A}}]^{2}{\mathbf{f}}=k^{2}{\mathbf{f}}^{T}[{\mathbb{E}}{\mathbf{A}}-\operatorname*{Var}({\mathbf{A}})]{\mathbf{f}}\,.

It follows that

𝐟𝔼𝐟^2\displaystyle\|{\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}\|^{2} =𝐟T𝔼𝐀𝐟2k𝐟T𝔼𝐀𝐟+k2𝐟T𝔼𝐀𝐟+𝐟T𝐟𝐟T𝔼𝐀𝐟k2𝐟TVar(𝐀)𝐟(1k)2n(𝟏nT𝐟)2\displaystyle={\mathbf{f}}^{T}{\mathbb{E}}{\mathbf{A}}{\mathbf{f}}-2k{\mathbf{f}}^{T}{\mathbb{E}}{\mathbf{A}}{\mathbf{f}}+k^{2}{\mathbf{f}}^{T}{\mathbb{E}}{\mathbf{A}}{\mathbf{f}}+{\mathbf{f}}^{T}{\mathbf{f}}-{\mathbf{f}}^{T}{\mathbb{E}}{\mathbf{A}}{\mathbf{f}}-k^{2}{\mathbf{f}}^{T}\operatorname*{Var}({\mathbf{A}}){\mathbf{f}}-\frac{(1-k)^{2}}{n}({\boldsymbol{1}}^{T}_{n}{\mathbf{f}})^{2}
=(1k)2𝐟T𝔼𝐀𝐟+𝐟T(𝐈𝔼𝐀)𝐟k2𝐟TVar(𝐀)𝐟(1k)2n(𝟏nT𝐟)2\displaystyle=(1-k)^{2}{\mathbf{f}}^{T}{\mathbb{E}}{\mathbf{A}}{\mathbf{f}}+{\mathbf{f}}^{T}({\mathbf{I}}-{\mathbb{E}}{\mathbf{A}}){\mathbf{f}}-k^{2}{\mathbf{f}}^{T}\operatorname*{Var}({\mathbf{A}}){\mathbf{f}}-\frac{(1-k)^{2}}{n}({\boldsymbol{1}}^{T}_{n}{\mathbf{f}})^{2}
=(1k)2(𝐟T𝔼𝐀𝐟(𝟏nT𝐟)2n)+𝐟T(𝐈𝔼𝐀)𝐟k2𝐟TVar(𝐀)𝐟\displaystyle=(1-k)^{2}\left({\mathbf{f}}^{T}{\mathbb{E}}{\mathbf{A}}{\mathbf{f}}-\frac{({\boldsymbol{1}}_{n}^{T}{\mathbf{f}})^{2}}{n}\right)+{\mathbf{f}}^{T}({\mathbf{I}}-{\mathbb{E}}{\mathbf{A}}){\mathbf{f}}-k^{2}{\mathbf{f}}^{T}\operatorname*{Var}({\mathbf{A}}){\mathbf{f}}
(1k)2C1+C2k2C3.\displaystyle\triangleq(1-k)^{2}C_{1}+C_{2}-k^{2}C_{3}\,.

Since for each 𝐀 we have (𝐈−𝐀)²=𝐈−𝐀 and (𝐀−𝟏𝟏ᵀ/n)²=𝐀−𝟏𝟏ᵀ/n, both 𝐈−𝐀 and 𝐀−𝟏𝟏ᵀ/n are idempotent and hence positive semi-definite. It follows that 𝐈−𝔼𝐀 and 𝔼𝐀−𝟏𝟏ᵀ/n are also positive semi-definite, so C₁≥0 and C₂≥0.

On the other hand, by the law of total variance, we have

Var(𝐟^)=𝔼[Var(𝐟^𝐆)]+Var[𝔼[𝐟^𝐆]].\operatorname*{Var}(\hat{\mathbf{f}})={\mathbb{E}}[\operatorname*{Var}(\hat{\mathbf{f}}\mid{\mathbf{G}})]+\operatorname*{Var}[{\mathbb{E}}[\hat{\mathbf{f}}\mid{\mathbf{G}}]]\,. (37)

For the first term of (37), we have

\operatorname*{Var}(\hat{\mathbf{f}}\mid{\mathbf{G}})=\sigma^{2}\left[k{\mathbf{I}}+\frac{1-k}{n}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\right]{\mathbf{A}}{\mathbf{A}}^{T}\left[k{\mathbf{I}}+\frac{1-k}{n}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\right]
=\sigma^{2}\left[k^{2}{\mathbf{A}}{\mathbf{A}}^{T}+\frac{k(1-k)}{n}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{A}}{\mathbf{A}}^{T}+\frac{k(1-k)}{n}{\mathbf{A}}{\mathbf{A}}^{T}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}+\frac{(1-k)^{2}}{n^{2}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{A}}{\mathbf{A}}^{T}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\right]\,.

Since

tr(𝐀𝐀T)\displaystyle\operatorname*{tr}({\mathbf{A}}{\mathbf{A}}^{T}) =tr(𝐀)\displaystyle=\operatorname*{tr}({\mathbf{A}})
=tr(𝐁𝐆T(𝐆𝐁T𝐁𝐆T)1𝐆𝐁T)\displaystyle=\operatorname*{tr}({\mathbf{B}}{\mathbf{G}}^{T}({\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T})^{-1}{\mathbf{G}}{\mathbf{B}}^{T})
=tr((𝐆𝐁T𝐁𝐆T)1𝐆𝐁T𝐁𝐆T)\displaystyle=\operatorname*{tr}(({\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T})^{-1}{\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T})
=g,\displaystyle=g\,,

then

\operatorname*{tr}[\operatorname*{Var}(\hat{\mathbf{f}}\mid{\mathbf{G}})]=\sigma^{2}\left[k^{2}g+2k(1-k)+(1-k)^{2}\right]\,.

For the second term of (37), we have

Var[𝔼[𝐟^𝐆]]=Var[k𝐀𝐟+1kn𝟏nT𝐟𝟏n]=k2Var[𝐀]𝐟𝐟T.\displaystyle\operatorname*{Var}[{\mathbb{E}}[\hat{\mathbf{f}}\mid{\mathbf{G}}]]=\operatorname*{Var}[k{\mathbf{A}}{\mathbf{f}}+\frac{1-k}{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{f}}{\boldsymbol{1}}_{n}]=k^{2}\operatorname*{Var}[{\mathbf{A}}]{\mathbf{f}}{\mathbf{f}}^{T}\,.

Thus,

tr[Var(𝐟^)]\displaystyle\operatorname*{tr}[\operatorname*{Var}(\hat{\mathbf{f}})] =tr[𝔼[Var(𝐟^𝐆)]]+tr[Var[𝔼[𝐟^𝐆]]]\displaystyle=\operatorname*{tr}[{\mathbb{E}}[\operatorname*{Var}(\hat{\mathbf{f}}\mid{\mathbf{G}})]]+\operatorname*{tr}[\operatorname*{Var}[{\mathbb{E}}[\hat{\mathbf{f}}\mid{\mathbf{G}}]]]
=𝔼[tr[Var(𝐟^𝐆)]]+tr[k2Var[𝐀]𝐟𝐟T]\displaystyle={\mathbb{E}}[\operatorname*{tr}[\operatorname*{Var}(\hat{\mathbf{f}}\mid{\mathbf{G}})]]+\operatorname*{tr}[k^{2}\operatorname*{Var}[{\mathbf{A}}]{\mathbf{f}}{\mathbf{f}}^{T}]
=\sigma^{2}\left[k^{2}{\mathbb{E}}g+2k(1-k)+(1-k)^{2}\right]+k^{2}{\mathbf{f}}^{T}\operatorname*{Var}({\mathbf{A}}){\mathbf{f}}
\triangleq\sigma^{2}\left[k^{2}{\mathbb{E}}g+2k(1-k)+(1-k)^{2}\right]+k^{2}C_{3}\,.

It follows that the mean square error is

\operatorname*{MSE}=\|\operatorname{Bias}\|^{2}+\operatorname*{tr}[\operatorname*{Var}(\hat{\mathbf{f}})]=(1-k)^{2}C_{1}+C_{2}+\sigma^{2}\left[k^{2}{\mathbb{E}}g+2k(1-k)+(1-k)^{2}\right]=(1-k)^{2}C_{1}+C_{2}+\sigma^{2}\left[k^{2}({\mathbb{E}}g-1)+1\right]\,.

The unconstrained B-spline fitting has

MSE=lsJσ2.\operatorname*{MSE}{}^{\mathrm{ls}}=J\sigma^{2}\,.

To have MSE<MSEls\operatorname*{MSE}<\operatorname*{MSE}{}^{\mathrm{ls}},

(1-k)^{2}C_{1}+C_{2}+\sigma^{2}\left[k^{2}({\mathbb{E}}g-1)+1\right]<J\sigma^{2}\,,

that is

h(k)\triangleq\left[C_{1}+({\mathbb{E}}g-1)\sigma^{2}\right]k^{2}-2C_{1}k+C_{1}+C_{2}+(1-J)\sigma^{2}<0\,.

Note that the minimum is attained at

k_{\min}=\frac{2C_{1}}{2\left[C_{1}+({\mathbb{E}}g-1)\sigma^{2}\right]}=\frac{C_{1}}{C_{1}+({\mathbb{E}}g-1)\sigma^{2}}\in(0,1)\,,

provided C_{1}>0 and {\mathbb{E}}g>1, and the minimum value is

h_{\min}=C_{1}+C_{2}+(1-J)\sigma^{2}-\frac{(2C_{1})^{2}}{4\left[C_{1}+({\mathbb{E}}g-1)\sigma^{2}\right]}
=\frac{-\sigma^{4}(J-1)({\mathbb{E}}g-1)+\sigma^{2}\left[-C_{1}(J-{\mathbb{E}}g)+C_{2}({\mathbb{E}}g-1)\right]+C_{1}C_{2}}{C_{1}+({\mathbb{E}}g-1)\sigma^{2}}
=\frac{u(\sigma^{2})}{C_{1}+({\mathbb{E}}g-1)\sigma^{2}}\,,

where

u(t)=-t^{2}(J-1)({\mathbb{E}}g-1)+t\left[-C_{1}(J-{\mathbb{E}}g)+C_{2}({\mathbb{E}}g-1)\right]+C_{1}C_{2}\,.

Since C_{1},C_{2}\geq 0, the discriminant satisfies

\Delta=\left[-C_{1}(J-{\mathbb{E}}g)+C_{2}({\mathbb{E}}g-1)\right]^{2}+4C_{1}C_{2}(J-1)({\mathbb{E}}g-1)=\left[C_{1}(J-{\mathbb{E}}g)+C_{2}({\mathbb{E}}g-1)\right]^{2}+4C_{1}C_{2}({\mathbb{E}}g-1)^{2}\geq 0\,,

then u(t)<0u(t)<0 if

t>\frac{C_{1}(J-{\mathbb{E}}g)-C_{2}({\mathbb{E}}g-1)-\sqrt{\Delta}}{-2(J-1)({\mathbb{E}}g-1)}=\frac{-C_{1}(J-{\mathbb{E}}g)+C_{2}({\mathbb{E}}g-1)+\sqrt{\Delta}}{2(J-1)({\mathbb{E}}g-1)}\,.
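As a sanity check on the h(k) algebra, a minimal numerical sketch with arbitrary non-negative constants (all values illustrative):

```julia
C1, C2, Eg, J, σ2 = 2.0, 0.5, 3.0, 6.0, 1.3
h(k) = (1 - k)^2 * C1 + C2 + σ2 * (k^2 * (Eg - 1) + 1) - J * σ2
kmin = C1 / (C1 + (Eg - 1) * σ2)
u(t) = -t^2 * (J - 1) * (Eg - 1) + t * (-C1 * (J - Eg) + C2 * (Eg - 1)) + C1 * C2
@assert h(kmin) ≈ u(σ2) / (C1 + (Eg - 1) * σ2)   # h at its minimizer equals h_min
@assert h(kmin) < 0                               # MSE < MSEˡˢ is attainable here
```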

F.1 𝐆=𝐈{\mathbf{G}}={\mathbf{I}}

If 𝐆=𝐈{\mathbf{G}}={\mathbf{I}}, that is, no ties in the solution,

𝐀𝐟=𝐀𝐁γ=𝐁(𝐁T𝐁)1𝐁T𝐁γ=𝐁γ=𝐟.{\mathbf{A}}{\mathbf{f}}={\mathbf{A}}{\mathbf{B}}\gamma={\mathbf{B}}({\mathbf{B}}^{T}{\mathbf{B}})^{-1}{\mathbf{B}}^{T}{\mathbf{B}}\gamma={\mathbf{B}}\gamma={\mathbf{f}}\,.
𝐟𝔼𝐟^2\displaystyle\|{\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}\|^{2} =𝐟k𝐟1kn𝟏T𝐟𝟏2\displaystyle=\|{\mathbf{f}}-k{\mathbf{f}}-\frac{1-k}{n}{\boldsymbol{1}}^{T}{\mathbf{f}}{\boldsymbol{1}}\|^{2}
=(1k)2𝐟1n𝟏T𝐟𝟏2\displaystyle=(1-k)^{2}\|{\mathbf{f}}-\frac{1}{n}{\boldsymbol{1}}^{T}{\mathbf{f}}{\boldsymbol{1}}\|^{2}
=(1k)2[𝐟21n(𝟏nT𝐟)2]\displaystyle=(1-k)^{2}\left[\|{\mathbf{f}}\|^{2}-\frac{1}{n}({\boldsymbol{1}}^{T}_{n}{\mathbf{f}})^{2}\right]

If 𝐆=𝐈, then J=g and C_{2}=0, so

h_{\min}=\frac{-\sigma^{4}(g-1)^{2}}{C_{1}+(g-1)\sigma^{2}}<0\,.

Appendix G Proof of Proposition 5

G.1 (i)

Proof.

If γᵘ+γᵈ=γ̃ᵘ+γ̃ᵈ, then the roughness penalty is unchanged:

(γu+γd)T𝛀(γu+γd)=(γ~u+γ~d)T𝛀(γ~u+γ~d).(\gamma^{u}+\gamma^{d})^{T}{\boldsymbol{\Omega}}(\gamma^{u}+\gamma^{d})=(\tilde{\gamma}^{u}+\tilde{\gamma}^{d})^{T}{\boldsymbol{\Omega}}(\tilde{\gamma}^{u}+\tilde{\gamma}^{d})\,.

Then we can repeat the argument of Section D.1. ∎

G.2 (ii)

Proof.

This is the special case 𝐆=𝐈 of the result proved in Section G.3 below. ∎

G.3 (iii)

Proof.

Taking derivatives of the Lagrangian,

2𝐙T(𝐲𝐙γ)+2𝐇Tν+2μ𝐖γ+2λ[𝛀𝛀𝛀𝛀]γ=0-2{\mathbf{Z}}^{T}({\mathbf{y}}-{\mathbf{Z}}\gamma)+2{\mathbf{H}}^{T}\nu+2\mu{\mathbf{W}}\gamma+2\lambda\begin{bmatrix}{\boldsymbol{\Omega}}&{\boldsymbol{\Omega}}\\ {\boldsymbol{\Omega}}&{\boldsymbol{\Omega}}\end{bmatrix}\gamma=0

Then

𝐁iT(𝐲𝐙γ)+𝐇iTν+μ𝐁iT𝐁(γuγd)+λ𝛀i(γu+γd)=0\displaystyle-{\mathbf{B}}^{T}_{i}({\mathbf{y}}-{\mathbf{Z}}\gamma)+{\mathbf{H}}_{i}^{T}\nu+\mu{\mathbf{B}}_{i}^{T}{\mathbf{B}}(\gamma^{u}-\gamma^{d})+\lambda{\boldsymbol{\Omega}}_{i}(\gamma^{u}+\gamma^{d})=0 (38)
𝐁iT(𝐲𝐙γ)+𝐇i+JTν+μ𝐁iT𝐁(γu+γd)+λ𝛀i(γu+γd)=0\displaystyle-{\mathbf{B}}^{T}_{i}({\mathbf{y}}-{\mathbf{Z}}\gamma)+{\mathbf{H}}_{i+J}^{T}\nu+\mu{\mathbf{B}}_{i}^{T}{\mathbf{B}}(-\gamma^{u}+\gamma^{d})+\lambda{\boldsymbol{\Omega}}_{i}(\gamma^{u}+\gamma^{d})=0 (39)

Rewrite them as

{\mathbf{B}}^{T}_{i}{\mathbf{y}}-\xi_{i}=((1+\mu){\mathbf{B}}_{i}^{T}{\mathbf{B}}+\lambda{\boldsymbol{\Omega}}_{i})\gamma^{u}+((1-\mu){\mathbf{B}}_{i}^{T}{\mathbf{B}}+\lambda{\boldsymbol{\Omega}}_{i})\gamma^{d} (40)
{\mathbf{B}}^{T}_{i}{\mathbf{y}}-\xi_{i+J}=((1-\mu){\mathbf{B}}_{i}^{T}{\mathbf{B}}+\lambda{\boldsymbol{\Omega}}_{i})\gamma^{u}+((1+\mu){\mathbf{B}}_{i}^{T}{\mathbf{B}}+\lambda{\boldsymbol{\Omega}}_{i})\gamma^{d}\,. (41)

If there are no ties in the solution, i.e.,

γ1u<γ2u<<γJu,γ1d>γ2d>>γJd,\gamma_{1}^{u}<\gamma_{2}^{u}<\cdots<\gamma_{J}^{u}\,,\qquad\gamma_{1}^{d}>\gamma_{2}^{d}>\cdots>\gamma_{J}^{d}\,,

by the KKT condition, ν=0\nu=0, and hence ξi=ξi+J=0\xi_{i}=\xi_{i+J}=0, then (40) - (41) yields

0=2μ𝐁iT𝐁γu2μ𝐁iT𝐁γd,0=2\mu{\mathbf{B}}_{i}^{T}{\mathbf{B}}\gamma^{u}-2\mu{\mathbf{B}}_{i}^{T}{\mathbf{B}}\gamma^{d}\,,

that is

0=2μ𝐁T𝐁(γuγd),0=2\mu{\mathbf{B}}^{T}{\mathbf{B}}(\gamma^{u}-\gamma^{d})\,,

which implies μ=0, since γᵘ−γᵈ is strictly increasing (hence nonzero) and 𝐁ᵀ𝐁 is nonsingular. The problem then reduces to unconstrained B-spline fitting. This argument proves the first point of Proposition 2.

If \hat{\gamma}^{u}_{1}<\cdots<\hat{\gamma}^{u}_{k_{1}}=\cdots=\hat{\gamma}^{u}_{k_{2}}<\cdots<\hat{\gamma}^{u}_{k_{2m-1}}=\cdots=\hat{\gamma}^{u}_{k_{2m}}<\cdots<\hat{\gamma}^{u}_{J}, then, arguing as in Section D.3, we have

𝐆𝐁Ty=((1+μ)𝐆𝐁T𝐁+λ𝐆𝛀)γu+((1μ)𝐆𝐁T𝐁+λ𝐆𝛀)γd.{\mathbf{G}}{\mathbf{B}}^{T}y=((1+\mu){\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}+\lambda{\mathbf{G}}{\boldsymbol{\Omega}})\gamma^{u}+((1-\mu){\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}+\lambda{\mathbf{G}}{\boldsymbol{\Omega}})\gamma^{d}\,.

Let γu=𝐆Tβ\gamma^{u}={\mathbf{G}}^{T}\beta, where β\beta is the sub-vector of γu\gamma^{u} constructed by the unique elements. By G.1, γd=c𝟏J\gamma^{d}=c{\boldsymbol{1}}_{J}, and note that 𝐆T𝟏g=𝟏J{\mathbf{G}}^{T}{\boldsymbol{1}}_{g}={\boldsymbol{1}}_{J}, then

𝐆𝐁Ty=((1+μ)𝐆𝐁T𝐁𝐆T+λ𝐆𝛀𝐆T)β+c((1μ)𝐆𝐁T𝐁𝐆T+λ𝐆𝛀𝐆T)𝟏g,{\mathbf{G}}{\mathbf{B}}^{T}y=((1+\mu){\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T}+\lambda{\mathbf{G}}{\boldsymbol{\Omega}}{\mathbf{G}}^{T})\beta+c((1-\mu){\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T}+\lambda{\mathbf{G}}{\boldsymbol{\Omega}}{\mathbf{G}}^{T}){\boldsymbol{1}}_{g}\,,

it follows that

β\displaystyle\beta =((1+μ)𝐆𝐁T𝐁𝐆T+λ𝐆𝛀𝐆T)1𝐆𝐁Ty\displaystyle=((1+\mu){\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T}+\lambda{\mathbf{G}}{\boldsymbol{\Omega}}{\mathbf{G}}^{T})^{-1}{\mathbf{G}}{\mathbf{B}}^{T}y-
c((1+μ)𝐆𝐁T𝐁𝐆T+λ𝐆𝛀𝐆T)1((1μ)𝐆𝐁T𝐁𝐆T+λ𝐆𝛀𝐆T)𝟏g,\displaystyle\qquad c((1+\mu){\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T}+\lambda{\mathbf{G}}{\boldsymbol{\Omega}}{\mathbf{G}}^{T})^{-1}((1-\mu){\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T}+\lambda{\mathbf{G}}{\boldsymbol{\Omega}}{\mathbf{G}}^{T}){\boldsymbol{1}}_{g}\,,

and

c=𝟏nT𝐁γun=𝟏nT𝐁𝐆Tβn.c=\frac{{\boldsymbol{1}}_{n}^{T}{\mathbf{B}}\gamma^{u}}{n}=\frac{{\boldsymbol{1}}_{n}^{T}{\mathbf{B}}{\mathbf{G}}^{T}\beta}{n}\,.

G.4 Corollary 2

Analogously, we can obtain the monotone decomposition with smoothing splines for decreasing functions, as summarized in Corollary 2.

Corollary 2.

Let γ^=(γ^u,γ^d)\hat{\gamma}=(\hat{\gamma}^{u},\hat{\gamma}^{d}) be the monotone decomposition to problem (11). Suppose γ^u+γ^d\hat{\gamma}^{u}+\hat{\gamma}^{d} is decreasing, then

  1. (i)

    γ^u\hat{\gamma}^{u} is a vector with identical elements;

  2. (ii)

    if \hat{\gamma}^{d}_{1}>\cdots>\hat{\gamma}^{d}_{k_{1}}=\cdots=\hat{\gamma}^{d}_{k_{2}}>\cdots>\hat{\gamma}^{d}_{k_{2m-1}}=\cdots=\hat{\gamma}^{d}_{k_{2m}}>\hat{\gamma}^{d}_{J}, where 1\leq k_{1}\leq k_{2}\leq\cdots\leq k_{2m-1}\leq k_{2m}\leq J, then

    γ^d\displaystyle\hat{\gamma}^{d} =𝐆T((1+μ)𝐆𝐁T𝐁𝐆T+λ𝐆𝛀𝐆T)1𝐆𝐁Ty\displaystyle={\mathbf{G}}^{T}((1+\mu){\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T}+\lambda{\mathbf{G}}{\boldsymbol{\Omega}}{\mathbf{G}}^{T})^{-1}{\mathbf{G}}{\mathbf{B}}^{T}y-
    c𝐆T((1+μ)𝐆𝐁T𝐁𝐆T+λ𝐆𝛀𝐆T)1((1μ)𝐆𝐁T𝐁𝐆T+λ𝐆𝛀𝐆T)𝟏g,\displaystyle\qquad c{\mathbf{G}}^{T}((1+\mu){\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T}+\lambda{\mathbf{G}}{\boldsymbol{\Omega}}{\mathbf{G}}^{T})^{-1}((1-\mu){\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T}+\lambda{\mathbf{G}}{\boldsymbol{\Omega}}{\mathbf{G}}^{T}){\boldsymbol{1}}_{g}\,,
    γ^u\displaystyle\hat{\gamma}^{u} =c𝟏J=𝟏T𝐁γ^dn𝟏J,\displaystyle=c{\boldsymbol{1}}_{J}=\frac{{\boldsymbol{1}}^{T}{\mathbf{B}}\hat{\gamma}^{d}}{n}{\boldsymbol{1}}_{J}\,,
Proof.

Similar to the proof of Corollary 1 in Section D.4. ∎

Appendix H Proof of Proposition 6

H.1 Lemmas

Lemma 2 ([14]).

Let 𝐀,𝐁{\mathbf{A}},{\mathbf{B}} be two real matrices of the same size, then

[tr(𝐀𝐁)]2[tr(𝐀2)][tr(𝐁2)].[\operatorname*{tr}({\mathbf{A}}{\mathbf{B}})]^{2}\leq[\operatorname*{tr}({\mathbf{A}}^{2})][\operatorname*{tr}({\mathbf{B}}^{2})]\,.
Lemma 3.

Let 𝐀{\mathbf{A}} be a real positive semi-definite matrix, then

tr(𝐀2)[tr(𝐀)]2.\operatorname*{tr}({\mathbf{A}}^{2})\leq[\operatorname*{tr}({\mathbf{A}})]^{2}\,.
Proof.

Let λᵢ be the eigenvalues of 𝐀; then λᵢ² are the eigenvalues of 𝐀². Since 𝐀 is positive semi-definite, λᵢ≥0, and hence

\operatorname*{tr}[{\mathbf{A}}^{2}]=\sum\lambda_{i}^{2}\leq\left(\sum\lambda_{i}\right)^{2}=[\operatorname*{tr}({\mathbf{A}})]^{2}\,.

Lemma 4.

The eigenvalues of 𝟏ₙ𝟏ₙᵀ are

λ1=n,λ2==λn=0.\lambda_{1}=n,\lambda_{2}=\cdots=\lambda_{n}=0\,.
Proof.

Since 𝟏ₙ𝟏ₙᵀ is a rank-one matrix, it has n−1 zero eigenvalues. Suppose the nonzero eigenvalue is λ₁ with associated eigenvector ξ₁; then

𝟏n𝟏nTξ1=λ1ξ1,{\boldsymbol{1}}_{n}{\boldsymbol{1}}^{T}_{n}\xi_{1}=\lambda_{1}\xi_{1}\,,

left multiplying 𝟏nT{\boldsymbol{1}}^{T}_{n} yields

n𝟏nTξ1=λ1𝟏Tξ1,n{\boldsymbol{1}}_{n}^{T}\xi_{1}=\lambda_{1}{\boldsymbol{1}}^{T}\xi_{1}\,,

which implies that λ1=n\lambda_{1}=n. ∎
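Lemmas 3 and 4 are easy to confirm numerically; a minimal sketch:

```julia
using LinearAlgebra
n = 6
λ = eigvals(Symmetric(ones(n, n)))             # spectrum of 𝟏ₙ𝟏ₙᵀ (Lemma 4)
@assert count(x -> x ≈ n, λ) == 1
@assert count(x -> abs(x) < 1e-10, λ) == n - 1
A = let X = randn(n, n); Symmetric(X'X) end    # a PSD matrix (Lemma 3)
@assert tr(A^2) ≤ tr(A)^2
```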

H.2 𝐆=𝐈{\mathbf{G}}={\mathbf{I}}

Proof.

If 𝐆=𝐈{\mathbf{G}}={\mathbf{I}}, the solution is

\hat{\gamma}^{u}=\frac{1}{1+\mu}\left[{\mathbf{B}}^{T}{\mathbf{B}}+\frac{\lambda}{1+\mu}{\boldsymbol{\Omega}}\right]^{-1}{\mathbf{B}}^{T}{\mathbf{y}}-c((1+\mu){\mathbf{B}}^{T}{\mathbf{B}}+\lambda{\boldsymbol{\Omega}})^{-1}((1-\mu){\mathbf{B}}^{T}{\mathbf{B}}+\lambda{\boldsymbol{\Omega}}){\boldsymbol{1}}_{J}
\triangleq\frac{1}{1+\mu}\left[{\mathbf{B}}^{T}{\mathbf{B}}+\lambda_{0}{\boldsymbol{\Omega}}\right]^{-1}{\mathbf{B}}^{T}{\mathbf{y}}-c{\mathbf{K}}{\boldsymbol{1}}_{J}\,,\qquad\lambda_{0}\triangleq\frac{\lambda}{1+\mu}\,,
γ^d\displaystyle\hat{\gamma}^{d} =c𝟏J=𝟏nT𝐟^2n𝟏J,\displaystyle=c{\boldsymbol{1}}_{J}=\frac{{\boldsymbol{1}}_{n}^{T}\hat{\mathbf{f}}}{2n}{\boldsymbol{1}}_{J}\,,

then the fitting vector is

𝐟^\displaystyle\hat{\mathbf{f}} =𝐁γ^u+𝐁γ^d\displaystyle={\mathbf{B}}\hat{\gamma}^{u}+{\mathbf{B}}\hat{\gamma}^{d}
=11+μ𝐟^ss+c𝐁(𝐊+𝐈)𝟏J\displaystyle=\frac{1}{1+\mu}\hat{\mathbf{f}}^{\mathrm{ss}}+c{\mathbf{B}}(-{\mathbf{K}}+{\mathbf{I}}){\boldsymbol{1}}_{J}
=11+μ𝐟^ss+𝟏nT𝐟^2n𝐁[(1+μ)𝐁T𝐁+λ𝛀]12μ𝐁T𝐁𝟏J\displaystyle=\frac{1}{1+\mu}\hat{\mathbf{f}}^{\mathrm{ss}}+\frac{{\boldsymbol{1}}^{T}_{n}\hat{\mathbf{f}}}{2n}{\mathbf{B}}\left[(1+\mu){\mathbf{B}}^{T}{\mathbf{B}}+\lambda{\boldsymbol{\Omega}}\right]^{-1}2\mu{\mathbf{B}}^{T}{\mathbf{B}}{\boldsymbol{1}}_{J}
=11+μ𝐟^ss+μ1+μ𝟏nT𝐟^n𝐁[𝐁T𝐁+λ1+μ𝛀]1𝐁T𝟏n\displaystyle=\frac{1}{1+\mu}\hat{\mathbf{f}}^{\mathrm{ss}}+\frac{\mu}{1+\mu}\frac{{\boldsymbol{1}}^{T}_{n}\hat{\mathbf{f}}}{n}{\mathbf{B}}\left[{\mathbf{B}}^{T}{\mathbf{B}}+\frac{\lambda}{1+\mu}{\boldsymbol{\Omega}}\right]^{-1}{\mathbf{B}}^{T}{\boldsymbol{1}}_{n}
k𝐟^ss+(1k)𝟏nT𝐟^n𝐐𝟏n,\displaystyle\triangleq k\hat{\mathbf{f}}^{\mathrm{ss}}+(1-k)\frac{{\boldsymbol{1}}^{T}_{n}\hat{\mathbf{f}}}{n}{\mathbf{Q}}{\boldsymbol{1}}_{n}\,,

where 𝐐=𝐁[𝐁T𝐁+λ0𝛀]1𝐁T{\mathbf{Q}}={\mathbf{B}}\left[{\mathbf{B}}^{T}{\mathbf{B}}+\lambda_{0}{\boldsymbol{\Omega}}\right]^{-1}{\mathbf{B}}^{T}. In practice, λ0\lambda_{0} is given as a constant, so 𝐐{\mathbf{Q}} does not depend on kk. Left multiplying 𝟏nT{\boldsymbol{1}}_{n}^{T} yields

𝟏nT𝐟^=k𝟏nT𝐟^ss+(1k)𝟏nT𝐟^𝟏nT𝐐𝟏nn,{\boldsymbol{1}}^{T}_{n}\hat{\mathbf{f}}=k{\boldsymbol{1}}^{T}_{n}\hat{\mathbf{f}}^{\mathrm{ss}}+(1-k){\boldsymbol{1}}^{T}_{n}\hat{\mathbf{f}}\frac{{\boldsymbol{1}}^{T}_{n}{\mathbf{Q}}{\boldsymbol{1}}_{n}}{n}\,,

that is,

𝟏nT𝐟^=k𝟏nT𝐟^ss1𝟏nT𝐐𝟏nn(1k)k𝟏nT𝐟^ss1η(1k).{\boldsymbol{1}}^{T}_{n}\hat{\mathbf{f}}=\frac{k{\boldsymbol{1}}^{T}_{n}\hat{\mathbf{f}}^{\mathrm{ss}}}{1-\frac{{\boldsymbol{1}}^{T}_{n}{\mathbf{Q}}{\boldsymbol{1}}_{n}}{n}(1-k)}\triangleq\frac{k{\boldsymbol{1}}^{T}_{n}\hat{\mathbf{f}}^{\mathrm{ss}}}{1-\eta(1-k)}\,.

It follows that

𝐟^=[k𝐈+1knk1η(1k)𝐐𝟏n𝟏nT]𝐟^ss(k𝐈+α𝐐𝟏n𝟏nT)𝐟^ss.\hat{\mathbf{f}}=\left[k{\mathbf{I}}+\frac{1-k}{n}\frac{k}{1-\eta(1-k)}{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\right]\hat{\mathbf{f}}^{\mathrm{ss}}\triangleq(k{\mathbf{I}}+\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T})\hat{\mathbf{f}}^{\mathrm{ss}}\,.

Then the squared bias is

𝐟𝔼𝐟^22=𝐟(k𝐈+α𝐐𝟏n𝟏nT)𝔼𝐟^ss22\displaystyle\|{\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}\|^{2}_{2}=\|{\mathbf{f}}-(k{\mathbf{I}}+\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}){\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}}\|^{2}_{2}
=(k𝐈+α𝐐𝟏n𝟏nT)(𝐟𝔼𝐟^ss)+((1k)𝐈α𝐐𝟏n𝟏nT)𝐟2\displaystyle=\|(k{\mathbf{I}}+\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T})({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})+((1-k){\mathbf{I}}-\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}){\mathbf{f}}\|^{2}
=(𝐟𝔼𝐟^ss)T(k𝐈+α𝟏n𝟏nT𝐐)(k𝐈+α𝐐𝟏n𝟏nT)(𝐟𝔼𝐟^ss)+\displaystyle=({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})^{T}(k{\mathbf{I}}+\alpha{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}})(k{\mathbf{I}}+\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T})({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})+
2(𝐟𝔼𝐟^ss)T(k𝐈+α𝐐𝟏n𝟏nT)T((1k)𝐈α𝐐𝟏n𝟏nT)𝐟+\displaystyle\qquad 2({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})^{T}(k{\mathbf{I}}+\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T})^{T}((1-k){\mathbf{I}}-\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}){\mathbf{f}}+
𝐟T((1k)𝐈α𝐐𝟏n𝟏nT)T((1k)𝐈α𝐐𝟏n𝟏nT)𝐟\displaystyle\qquad{\mathbf{f}}^{T}((1-k){\mathbf{I}}-\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T})^{T}((1-k){\mathbf{I}}-\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}){\mathbf{f}}
C1+2C2+C3.\displaystyle\triangleq C_{1}+2C_{2}+C_{3}\,.

First for term C1C_{1}, we have

C1\displaystyle C_{1} =(𝐟𝔼𝐟^ss)T(k𝐈+α𝟏n𝟏nT𝐐)(k𝐈+α𝐐𝟏n𝟏nT)(𝐟𝔼𝐟^ss)\displaystyle=({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})^{T}(k{\mathbf{I}}+\alpha{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}})(k{\mathbf{I}}+\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T})({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})
=k2𝐟𝔼𝐟^ss2+2kα(𝐟𝔼𝐟^ss)T𝟏n𝟏nT𝐐(𝐟𝔼𝐟^ss)+\displaystyle=k^{2}\|{\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}}\|^{2}+2k\alpha({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})^{T}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})+
α2(𝐟𝔼𝐟^ss)T𝟏n𝟏nT𝐐2𝟏n𝟏nT(𝐟𝔼𝐟^ss).\displaystyle\qquad\qquad\alpha^{2}({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})^{T}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}^{2}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})\,.

Take the derivative with respect to kk,

C1k\displaystyle\frac{\partial C_{1}}{\partial k} =2k𝐟𝔼𝐟^ss2+2(kαk+α)(𝐟𝔼𝐟^ss)T𝟏n𝟏nT𝐐(𝐟𝔼𝐟^ss)\displaystyle=2k\|{\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}}\|^{2}+2\left(k\frac{\partial\alpha}{\partial k}+\alpha\right)({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})^{T}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})
+2ααk(𝐟𝔼𝐟^ss)T𝟏n𝟏nT𝐐2𝟏n𝟏nT(𝐟𝔼𝐟^ss).\displaystyle\qquad\qquad+2\alpha\frac{\partial\alpha}{\partial k}({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})^{T}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}^{2}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})\,.

Note that αkk=1=1n\frac{\partial\alpha}{\partial k}\mid_{k=1}=-\frac{1}{n} and α(1)=0\alpha(1)=0, then

C1kk=1=2𝐟𝔼𝐟^ss22n(𝐟𝔼𝐟^ss)T𝟏n𝟏nT𝐐(𝐟𝔼𝐟^ss).\frac{\partial C_{1}}{\partial k}\mid_{k=1}=2\|{\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}}\|^{2}-\frac{2}{n}({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})^{T}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})\,.

Similarly, for term C2C_{2} and C3C_{3}, we have

C2\displaystyle C_{2} =(𝐟𝔼𝐟^ss)T(k𝐈+α𝐐𝟏n𝟏nT)T((1k)𝐈α𝐐𝟏n𝟏nT)𝐟\displaystyle=({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})^{T}(k{\mathbf{I}}+\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T})^{T}((1-k){\mathbf{I}}-\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}){\mathbf{f}}
=𝐟T(𝐈𝐐)T(k𝐈+α𝐐𝟏n𝟏nT)T((1k)𝐈α𝐐𝟏n𝟏nT)𝐟\displaystyle={\mathbf{f}}^{T}({\mathbf{I}}-{\mathbf{Q}})^{T}(k{\mathbf{I}}+\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T})^{T}((1-k){\mathbf{I}}-\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}){\mathbf{f}}
=k(1-k){\mathbf{f}}^{T}({\mathbf{I}}-{\mathbf{Q}}){\mathbf{f}}+(1-k)\alpha{\mathbf{f}}^{T}({\mathbf{I}}-{\mathbf{Q}}){\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}{\mathbf{f}}-k\alpha{\mathbf{f}}^{T}({\mathbf{I}}-{\mathbf{Q}}){\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{f}}-
\alpha^{2}{\mathbf{f}}^{T}({\mathbf{I}}-{\mathbf{Q}}){\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}^{2}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{f}}\,,
C3\displaystyle C_{3} =𝐟T((1k)𝐈α𝐐𝟏n𝟏nT)T((1k)𝐈α𝐐𝟏n𝟏nT)𝐟\displaystyle={\mathbf{f}}^{T}((1-k){\mathbf{I}}-\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T})^{T}((1-k){\mathbf{I}}-\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}){\mathbf{f}}
=(1k)2𝐟T𝐟2α(1k)𝐟T𝟏n𝟏nT𝐐𝐟+α2𝐟T𝟏n𝟏nT𝐐2𝟏n𝟏nT𝐟,\displaystyle=(1-k)^{2}{\mathbf{f}}^{T}{\mathbf{f}}-2\alpha(1-k){\mathbf{f}}^{T}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}{\mathbf{f}}+\alpha^{2}{\mathbf{f}}^{T}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}^{2}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{f}}\,,

and then take their derivatives with respect to kk,

\frac{\partial C_{2}}{\partial k}=(1-2k){\mathbf{f}}^{T}({\mathbf{I}}-{\mathbf{Q}}){\mathbf{f}}+\left((1-k)\frac{\partial\alpha}{\partial k}-\alpha\right){\mathbf{f}}^{T}({\mathbf{I}}-{\mathbf{Q}}){\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}{\mathbf{f}}-
\left(k\frac{\partial\alpha}{\partial k}+\alpha\right){\mathbf{f}}^{T}({\mathbf{I}}-{\mathbf{Q}}){\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{f}}-2\alpha\frac{\partial\alpha}{\partial k}{\mathbf{f}}^{T}({\mathbf{I}}-{\mathbf{Q}}){\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}^{2}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{f}}\,,
C3k\displaystyle\frac{\partial C_{3}}{\partial k} =2(1k)𝐟T𝐟2((1k)αkα)𝐟T𝟏n𝟏nT𝐐𝐟+2ααk𝐟T𝟏n𝟏nT𝐐2𝟏n𝟏nT𝐟,\displaystyle=-2(1-k){\mathbf{f}}^{T}{\mathbf{f}}-2\left((1-k)\frac{\partial\alpha}{\partial k}-\alpha\right){\mathbf{f}}^{T}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}{\mathbf{f}}+2\alpha\frac{\partial\alpha}{\partial k}{\mathbf{f}}^{T}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}^{2}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{f}}\,,

it follows that

C2kk=1\displaystyle\frac{\partial C_{2}}{\partial k}\mid_{k=1} =𝐟T(𝐈𝐐)𝐟+1n𝐟T(𝐈𝐐)𝐐𝟏n𝟏nT𝐟,\displaystyle=-{\mathbf{f}}^{T}({\mathbf{I}}-{\mathbf{Q}}){\mathbf{f}}+\frac{1}{n}{\mathbf{f}}^{T}({\mathbf{I}}-{\mathbf{Q}}){\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{f}}\,,
C3kk=1\displaystyle\frac{\partial C_{3}}{\partial k}\mid_{k=1} =0.\displaystyle=0\,.

On the other hand, the variance is

Var(𝐟^)\displaystyle\operatorname*{Var}(\hat{\mathbf{f}}) =(k𝐈+α𝐐𝟏n𝟏nT)Var(𝐟^ss)(k𝐈+α𝐐𝟏n𝟏nT)T\displaystyle=(k{\mathbf{I}}+\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T})\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})(k{\mathbf{I}}+\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T})^{T}
=k2Var(𝐟^ss)+kα𝐐𝟏n𝟏nTVar(𝐟^ss)+kαVar(𝐟^ss)𝟏n𝟏nT𝐐+α2𝐐𝟏n𝟏nTVar(𝐟^ss)𝟏n𝟏nT𝐐,\displaystyle=k^{2}\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})+k\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})+k\alpha\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}}){\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}+\alpha^{2}{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}}){\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}\,,

and

tr[Var(𝐟^)]=k2tr[Var(𝐟^ss)]+2kαtr[𝐐𝟏n𝟏nTVar(𝐟^ss)]+α2tr[𝐐𝟏n𝟏nTVar(𝐟^ss)𝟏n𝟏nT𝐐],\operatorname*{tr}[\operatorname*{Var}(\hat{\mathbf{f}})]=k^{2}\operatorname*{tr}[\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})]+2k\alpha\operatorname*{tr}[{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})]+\alpha^{2}\operatorname*{tr}[{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}}){\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}]\,,

then its derivative with respect to kk is

\frac{\partial\operatorname*{tr}[\operatorname*{Var}(\hat{\mathbf{f}})]}{\partial k}=2k\operatorname*{tr}[\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})]+2\left(k\frac{\partial\alpha}{\partial k}+\alpha\right)\operatorname*{tr}[{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})]+
2\alpha\frac{\partial\alpha}{\partial k}\operatorname*{tr}[{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}}){\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}]\,.

Evaluate at k=1k=1,

tr[Var(𝐟^)]kk=1=2tr[Var(𝐟^ss)]2ntr[𝐐𝟏n𝟏nTVar(𝐟^ss)].\frac{\partial\operatorname*{tr}[\operatorname*{Var}(\hat{\mathbf{f}})]}{\partial k}\mid_{k=1}=2\operatorname*{tr}[\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})]-\frac{2}{n}\operatorname*{tr}[{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})]\,.

Thus

MSE(k)kk=1\displaystyle\frac{\partial\operatorname*{MSE}(k)}{\partial k}\mid_{k=1} =2𝐟𝔼𝐟^ss22n(𝐟𝔼𝐟^ss)T𝟏n𝟏nT𝐐(𝐟𝔼𝐟^ss)+\displaystyle=2\|{\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}}\|^{2}-\frac{2}{n}({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})^{T}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})+
2[𝐟T(𝐈𝐐)𝐟+1n𝐟T(𝐈𝐐)𝐐𝟏n𝟏nT𝐟]+\displaystyle\qquad\qquad 2\left[-{\mathbf{f}}^{T}({\mathbf{I}}-{\mathbf{Q}}){\mathbf{f}}+\frac{1}{n}{\mathbf{f}}^{T}({\mathbf{I}}-{\mathbf{Q}}){\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{f}}\right]+
2tr[Var(𝐟^ss)]2ntr[𝐐𝟏n𝟏nTVar(𝐟^ss)]\displaystyle\qquad\qquad 2\operatorname*{tr}[\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})]-\frac{2}{n}\operatorname*{tr}[{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})]
=2𝐟T𝐐𝟏n𝟏nT𝐐n(𝐈𝐐)f2𝐟T𝐐(𝐈𝐐)𝐟+\displaystyle=2{\mathbf{f}}^{T}{\mathbf{Q}}\frac{{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}}{n}({\mathbf{I}}-{\mathbf{Q}})f-2{\mathbf{f}}^{T}{\mathbf{Q}}({\mathbf{I}}-{\mathbf{Q}}){\mathbf{f}}+
2tr[Var(𝐟^ss)]2ntr[𝐐𝟏n𝟏nTVar(𝐟^ss)]\displaystyle\qquad\qquad 2\operatorname*{tr}[\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})]-\frac{2}{n}\operatorname*{tr}[{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})]
=2[tr[Var(𝐟^ss)]1ntr[𝐐𝟏n𝟏nTVar(𝐟^ss)]𝐟T𝐐(𝐈𝟏n𝟏nT𝐐n)(𝐈𝐐)𝐟]\displaystyle=2\left[\operatorname*{tr}[\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})]-\frac{1}{n}\operatorname*{tr}[{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})]-{\mathbf{f}}^{T}{\mathbf{Q}}({\mathbf{I}}-\frac{{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}}{n})({\mathbf{I}}-{\mathbf{Q}}){\mathbf{f}}\right] (42)
=2[σ2tr[(𝐈𝟏n𝟏nT𝐐n)𝐐2]tr[(𝐈𝟏n𝟏nT𝐐n)(𝐈𝐐)𝐟𝐟T𝐐]].\displaystyle=2\left[\sigma^{2}\operatorname*{tr}[({\mathbf{I}}-\frac{{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}}{n}){\mathbf{Q}}^{2}]-\operatorname*{tr}[({\mathbf{I}}-\frac{{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}}{n})({\mathbf{I}}-{\mathbf{Q}}){\mathbf{f}}{\mathbf{f}}^{T}{\mathbf{Q}}]\right]\,. (43)

Next, we will prove the first term on the right-hand side of (43) is positive. By the Cauchy-Schwarz inequality for trace in Lemma 2,

tr[𝐐𝟏n𝟏nTVar(𝐟^ss)]\displaystyle\operatorname*{tr}[{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})] tr[(𝐐𝟏n𝟏nT)2]tr[(Var(𝐟^ss))2]\displaystyle\leq\sqrt{\operatorname*{tr}[({\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T})^{2}]\operatorname*{tr}[(\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}}))^{2}]}
=𝟏nT𝐐𝟏ntr[(Var(𝐟^ss))2],\displaystyle={\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}{\boldsymbol{1}}_{n}\sqrt{\operatorname*{tr}[(\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}}))^{2}]}\,,

and since Var(𝐟^ss)=σ2𝐐2\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})=\sigma^{2}{\mathbf{Q}}^{2} is a positive semi-definite matrix, then by Lemma 3,

tr[(Var(𝐟^ss))2][tr(Var(𝐟^ss))]2.\operatorname*{tr}[(\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}}))^{2}]\leq[\operatorname*{tr}(\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}}))]^{2}\,.

Note that

η\displaystyle\eta =𝟏nT𝐐𝟏nn=𝟏nT𝐐𝟏n𝟏nT𝟏nλmax(𝐐)=λmax(𝐁(𝐁T𝐁+λ0𝛀)1𝐁T).\displaystyle=\frac{{\boldsymbol{1}}^{T}_{n}{\mathbf{Q}}{\boldsymbol{1}}_{n}}{n}=\frac{{\boldsymbol{1}}^{T}_{n}{\mathbf{Q}}{\boldsymbol{1}}_{n}}{{\boldsymbol{1}}^{T}_{n}{\boldsymbol{1}}_{n}}\leq\lambda_{\max}({\mathbf{Q}})=\lambda_{\max}({\mathbf{B}}({\mathbf{B}}^{T}{\mathbf{B}}+\lambda_{0}{\boldsymbol{\Omega}})^{-1}{\mathbf{B}}^{T})\,.

Perform singular value decomposition (SVD) on 𝐁, 𝐁=𝐔𝐃𝐕ᵀ, where 𝐔 (n×J) has orthonormal columns, 𝐕 (J×J) is orthogonal, and 𝐃 is a J×J diagonal matrix with diagonal entries d₁≥d₂≥⋯≥d_J>0 (assuming 𝐁 has full column rank). Then

𝐐=𝐔𝐃𝐕T(𝐕𝐃2𝐕T+λ0𝛀)1𝐕𝐃𝐔T=𝐔(𝐈+λ0𝐃1𝐕T𝛀𝐕𝐃1)1𝐔T,{\mathbf{Q}}={\mathbf{U}}{\mathbf{D}}{\mathbf{V}}^{T}({\mathbf{V}}{\mathbf{D}}^{2}{\mathbf{V}}^{T}+\lambda_{0}{\boldsymbol{\Omega}})^{-1}{\mathbf{V}}{\mathbf{D}}{\mathbf{U}}^{T}={\mathbf{U}}({\mathbf{I}}+\lambda_{0}{\mathbf{D}}^{-1}{\mathbf{V}}^{T}{\boldsymbol{\Omega}}{\mathbf{V}}{\mathbf{D}}^{-1})^{-1}{\mathbf{U}}^{T}\,,

it follows that

η\displaystyle\eta λmax(𝐔(𝐈+λ0𝐃1𝐕T𝛀𝐕𝐃1)1𝐔T)\displaystyle\leq\lambda_{\max}({\mathbf{U}}({\mathbf{I}}+\lambda_{0}{\mathbf{D}}^{-1}{\mathbf{V}}^{T}{\boldsymbol{\Omega}}{\mathbf{V}}{\mathbf{D}}^{-1})^{-1}{\mathbf{U}}^{T})
=λmax((𝐈+λ0𝐃1𝐕T𝛀𝐕𝐃1)1)\displaystyle=\lambda_{\max}(({\mathbf{I}}+\lambda_{0}{\mathbf{D}}^{-1}{\mathbf{V}}^{T}{\boldsymbol{\Omega}}{\mathbf{V}}{\mathbf{D}}^{-1})^{-1})
=1λmin(𝐈+λ0𝐃1𝐕T𝛀𝐕𝐃1)\displaystyle=\frac{1}{\lambda_{\min}({\mathbf{I}}+\lambda_{0}{\mathbf{D}}^{-1}{\mathbf{V}}^{T}{\boldsymbol{\Omega}}{\mathbf{V}}{\mathbf{D}}^{-1})}
=11+λmin(λ0𝐃1𝐕T𝛀𝐕𝐃1)<1,\displaystyle=\frac{1}{1+\lambda_{\min}(\lambda_{0}{\mathbf{D}}^{-1}{\mathbf{V}}^{T}{\boldsymbol{\Omega}}{\mathbf{V}}{\mathbf{D}}^{-1})}<1\,,

thus

1ntr[𝐐𝟏n𝟏nTVar(𝐟^ss)]<tr[Var(𝐟^ss)].\frac{1}{n}\operatorname*{tr}[{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})]<\operatorname*{tr}[\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})]\,.
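The SVD identity for 𝐐 and the resulting bound η<1 can be checked numerically; a minimal sketch that, for simplicity, uses a positive-definite stand-in for 𝛀 (the paper's roughness penalty 𝛀 is positive semi-definite):

```julia
using LinearAlgebra
n, J, λ0 = 30, 6, 0.4
B = randn(n, J)
Ω = let R = randn(J, J); R'R end       # positive-definite stand-in for 𝛀
F = svd(B)                             # thin SVD: 𝐁 = 𝐔𝐃𝐕ᵀ
Dinv = inv(Diagonal(F.S))
Q = B * ((B'B + λ0 * Ω) \ B')
@assert Q ≈ F.U * inv(I + λ0 * Dinv * F.Vt * Ω * F.Vt' * Dinv) * F.U'
@assert maximum(eigvals(Symmetric(Q))) < 1     # λ_max(𝐐) < 1, hence η < 1
```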

Note that if \frac{\partial\operatorname*{MSE}(k)}{\partial k}\mid_{k=1}>0, that is,

σ2>𝐟T𝐐(𝐈𝟏n𝟏nT𝐐n)(𝐈𝐐)𝐟tr[(𝐈𝟏n𝟏nT𝐐n)𝐐2],\sigma^{2}>\frac{{\mathbf{f}}^{T}{\mathbf{Q}}({\mathbf{I}}-\frac{{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}}{n})({\mathbf{I}}-{\mathbf{Q}}){\mathbf{f}}}{\operatorname*{tr}[({\mathbf{I}}-\frac{{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}}{n}){\mathbf{Q}}^{2}]}\,,

then there exists k(0,1)k\in(0,1) such that MSE(k)<MSE(1)\operatorname*{MSE}(k)<\operatorname*{MSE}(1).

If λ0=0\lambda_{0}=0, then (𝐈𝐐)𝐟=𝟎({\mathbf{I}}-{\mathbf{Q}}){\mathbf{f}}=\boldsymbol{0}, and

𝟏n𝟏nT𝐐=𝟏n(𝐁𝟏J)T𝐁(𝐁T𝐁)1𝐁T=𝟏n𝟏JT𝐁T=𝟏n𝟏nT,{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}={\boldsymbol{1}}_{n}({\mathbf{B}}{\boldsymbol{1}}_{J})^{T}{\mathbf{B}}({\mathbf{B}}^{T}{\mathbf{B}})^{-1}{\mathbf{B}}^{T}={\boldsymbol{1}}_{n}{\boldsymbol{1}}_{J}^{T}{\mathbf{B}}^{T}={\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\,,
𝐈𝟏n𝟏nT𝐐n=𝐈𝟏n(𝐁𝟏J)T𝐁(𝐁T𝐁)1𝐁Tn=𝐈𝟏n𝟏JT𝐁Tn=𝐈𝟏n𝟏nTn,{\mathbf{I}}-\frac{{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}}{n}={\mathbf{I}}-\frac{{\boldsymbol{1}}_{n}({\mathbf{B}}{\boldsymbol{1}}_{J})^{T}{\mathbf{B}}({\mathbf{B}}^{T}{\mathbf{B}})^{-1}{\mathbf{B}}^{T}}{n}={\mathbf{I}}-\frac{{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{J}^{T}{\mathbf{B}}^{T}}{n}={\mathbf{I}}-\frac{{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}}{n}\,,

then

tr[(𝐈𝟏n𝟏nT𝐐n)𝐐2]=tr[(𝐈𝟏n𝟏nTn)𝐐]=J1ntr[𝟏n𝟏nT𝐐]=J1,\operatorname*{tr}[({\mathbf{I}}-\frac{{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}}{n}){\mathbf{Q}}^{2}]=\operatorname*{tr}[({\mathbf{I}}-\frac{{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}}{n}){\mathbf{Q}}]=J-\frac{1}{n}\operatorname*{tr}[{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}]=J-1\,,

and hence \frac{\partial\operatorname*{MSE}(k)}{\partial k}\mid_{k=1}>0 whenever σ²>0, which always holds.