
Decomposition with Monotone B-splines: Fitting and Testing

Lijun Wang (Email: ljwang@link.cuhk.edu.hk, lijun.wang@yale.edu), Department of Statistics, The Chinese University of Hong Kong, Hong Kong SAR, China; Department of Biostatistics, Yale University, New Haven, Connecticut, USA
Xiaodan Fan (Email: xfan@cuhk.edu.hk), Department of Statistics, The Chinese University of Hong Kong, Hong Kong SAR, China
Hongyu Zhao (Email: hongyu.zhao@yale.edu), Department of Biostatistics, Yale University, New Haven, Connecticut, USA
Jun S. Liu (Email: jliu@stat.harvard.edu), Department of Statistics, Harvard University, Cambridge, Massachusetts, USA
Abstract

A univariate continuous function can always be decomposed as the sum of a non-increasing function and a non-decreasing one. Based on this property, we propose a non-parametric regression method that combines two spline-fitted monotone curves. We demonstrate by extensive simulations that, compared to standard spline-fitting methods, the proposed approach is particularly advantageous in high-noise scenarios. Several theoretical guarantees are established for the proposed approach. Additionally, we present statistics to test the monotonicity of a function based on monotone decomposition, which better control the Type I error and achieve comparable (if not higher) power than existing methods. Finally, we apply the proposed fitting and testing approaches to analyze single-cell pseudotime trajectory datasets, identifying significant biological insights for non-monotonically expressed genes through Gene Ontology enrichment analysis. The source code implementing the methodology and producing all results is accessible at https://github.com/szcf-weiya/MonotoneDecomposition.jl.

Keywords: Function Decomposition; Monotone B-splines; Curve Fitting; Test of Monotonicity.

1 Introduction

Suppose we have $n$ pairs of observations $(x_i, y_i)$, $i=1,\ldots,n$, with $x_i\in\mathrm{I\!R}^d$, $y_i\in\mathrm{I\!R}$, independent and identically distributed (i.i.d.) according to an unknown probability distribution $P(X,Y)$. Various methods exist for estimating the conditional expectation function $f(x)=\mathbb{E}(Y\mid X=x)$, ranging from simple linear regressions (including ridge and lasso) to more sophisticated nonlinear techniques. Splines are among the most popular methods, particularly when $d=1$. Unlike existing methods, we aim to estimate the monotonic components of $f(x)$ and then use their sum as an estimator for $f(x)$. This is possible because any general function can be decomposed into the sum of an increasing function $f_{\mathrm{up}}(x)$ and a decreasing function $f_{\mathrm{down}}(x)$ (a formal proof is given in the Supplementary Material to keep the paper self-contained).

The monotone decomposition idea has been exploited by [5] in their recent algorithm, where the monotone decomposition is incorporated into fitting monotone Bayesian additive regression trees (mBART). They found that fitting by monotone decomposition with mBART outperforms the corresponding BART algorithm proposed earlier in [4]. We here focus on the case of $d=1$ and adopt B-spline basis functions to represent the monotone components,

$$f_{\mathrm{up}}(x)=\sum_{j=1}^{J_u}\gamma_j^u B_j^u(x)+\varepsilon_u\,,\qquad f_{\mathrm{down}}(x)=\sum_{j=1}^{J_d}\gamma_j^d B_j^d(x)+\varepsilon_d\,,$$

where the superscripts and subscripts “$u$” and “$d$” indicate up (increasing) and down (decreasing), respectively. The monotonicity of each monotone component is ensured by the monotonicity of its coefficients [23], i.e.,

$$\gamma_1^u\leq\gamma_2^u\leq\cdots\leq\gamma_{J_u}^u;\quad\gamma_1^d\geq\gamma_2^d\geq\cdots\geq\gamma_{J_d}^d\,.$$

This paper is organized as follows. In Section 2, we formulate monotone decomposition with cubic splines and establish properties of the solution, particularly for monotone functions. In Section 3, we propose the monotone decomposition with smoothing splines and establish similar properties and theoretical guarantees. In Section 4, we present simulation results to demonstrate how fitting via monotone decomposition can outperform the corresponding unconstrained fitting, particularly in high-noise scenarios. In Section 5, we propose statistics for testing monotonicity and, in Section 6, we demonstrate the power of the proposed method via simulations. In Section 7, we present the results of our analysis on single-cell pseudo-time trajectory datasets using the fitting and testing techniques based on monotone decomposition. Finally, we discuss the limitations and potential future work in Section 8.

2 Monotone Decomposition with Cubic Splines

Cubic splines are the most popular polynomial splines for practitioners. Presumably, cubic splines are the lowest-order splines for which the knot-discontinuity is not visible to the human eye, and there is scarcely any good reason to go beyond cubic splines [12]. On the other hand, although there are many equivalent bases for representing a spline function, the B-spline basis system developed by [7] is attractive numerically [17]. Thus, we take the order-4 B-spline basis to represent cubic splines, under which the problem reduces to solving the following optimization problem:

$$\begin{split}\min_{\gamma^u,\gamma^d}\ &\|\mathbf{y}-\mathbf{B}_u\gamma^u-\mathbf{B}_d\gamma^d\|_2\,,\\ \text{s.t. }&\gamma_1^u\leq\gamma_2^u\leq\cdots\leq\gamma_{J_u}^u;\;\gamma_1^d\geq\gamma_2^d\geq\cdots\geq\gamma_{J_d}^d\,,\end{split}\tag{1}$$

where $\mathbf{y}=(y_1,\ldots,y_n)$ is an $n$-vector of the responses (note that we use round brackets to denote column vectors), $(\mathbf{B}_u)_{ik}=B_k^u(x_i)$, $k=1,\ldots,J_u$, and $(\mathbf{B}_d)_{i\ell}=B_\ell^d(x_i)$, $\ell=1,\ldots,J_d$, are the matrices constructed by evaluating the B-spline bases at $\{x_i\}_{i=1}^n$, and $\gamma^u=(\gamma_1^u,\ldots,\gamma_{J_u}^u)$, $\gamma^d=(\gamma_1^d,\ldots,\gamma_{J_d}^d)$ are the coefficient vectors.

For simplicity, we consider $J_u=J_d=J$. Since the knots determining the B-spline basis are conventionally placed at the quantiles of the $x_i$'s, the two sets of basis functions coincide, $B_k^u=B_k^d$, so the above problem (1) becomes

$$\begin{split}\min_{\gamma^u,\gamma^d}\ &\|\mathbf{y}-\mathbf{B}(\gamma^u+\gamma^d)\|_2\\ \text{s.t. }&\gamma_1^u\leq\gamma_2^u\leq\cdots\leq\gamma_J^u;\;\gamma_1^d\geq\gamma_2^d\geq\cdots\geq\gamma_J^d\,,\end{split}\tag{2}$$

where $\mathbf{B}_{ij}=B_j(x_i)$, $j=1,\ldots,J$.
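
For concreteness, the design matrix $\mathbf{B}$ can be assembled directly from the Cox–de Boor recursion, with interior knots at the quantiles of the $x_i$'s as just described. Below is a minimal Julia sketch; the helper names are ours for illustration, and the full implementation lives in the MonotoneDecomposition.jl package linked in the abstract.

```julia
using Statistics  # for quantile

# Cox–de Boor recursion: value of the j-th B-spline of order k (degree k-1)
# for knot vector t, evaluated at x.
function bspline_basis(x::Real, t::AbstractVector, j::Int, k::Int)
    if k == 1
        return float(t[j] <= x < t[j+1])
    end
    left  = t[j+k-1] == t[j] ? 0.0 :
            (x - t[j]) / (t[j+k-1] - t[j]) * bspline_basis(x, t, j, k - 1)
    right = t[j+k] == t[j+1] ? 0.0 :
            (t[j+k] - x) / (t[j+k] - t[j+1]) * bspline_basis(x, t, j + 1, k - 1)
    return left + right
end

# n × J design matrix B with B[i, j] = B_j(x_i); interior knots at quantiles
# of x, boundary knots repeated `order` times (order = 4 gives cubic splines).
function bspline_design(x::AbstractVector, J::Int; order::Int = 4)
    a, b = minimum(x), maximum(x)
    probs = range(0, 1; length = J - order + 2)[2:end-1]
    interior = isempty(probs) ? Float64[] : quantile(x, probs)
    # nudge the right boundary so x = b falls inside the last half-open interval
    t = vcat(fill(float(a), order), interior, fill(nextfloat(float(b)), order))
    return [bspline_basis(xi, t, j, order) for xi in x, j in 1:J]
end
```

For instance, `bspline_design(x, 26)` would give the order-4 design matrix with $J=26$ basis functions.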

First of all, Proposition 1 establishes the equivalence between problem (2) and the corresponding unconstrained B-spline fitting.

Proposition 1.

Regardless of the component solutions $\hat\gamma^u,\hat\gamma^d$ to problem (2), the overall solution $\hat\gamma^u+\hat\gamma^d$ is equivalent to the unconstrained B-spline fitting, i.e., the least squares solution,

$$\hat\gamma^{\mathrm{ls}}=\operatorname*{arg\,min}_\gamma\|\mathbf{y}-\mathbf{B}\gamma\|_2=(\mathbf{B}^T\mathbf{B})^{-1}\mathbf{B}^T\mathbf{y}\,.\tag{3}$$

Specifically, $\hat\gamma^u+\hat\gamma^d=\hat\gamma^{\mathrm{ls}}$.

Note that the monotone components in (2) are not identifiable, since

$$\gamma^u+\gamma^d=(\gamma^u+\delta)+(\gamma^d-\delta)\,,$$

where $\delta$ is an arbitrary increasing sequence, $\delta_1\leq\cdots\leq\delta_J$. In other words, the decomposition of a general function is not unique since

$$f_{\mathrm{up}}(x)+f_{\mathrm{down}}(x)=\{f_{\mathrm{up}}(x)+h(x)\}+\{f_{\mathrm{down}}(x)-h(x)\}\,,$$

where $h(x)$ is an arbitrary increasing function.

In order to obtain a unique decomposition, we consider the closest decomposition in some sense, such as requiring the difference between the two monotone components to be smallest in the $L_2$-norm. Thus, we impose a discrepancy constraint on problem (2), as detailed in the following subsections, to obtain a unique solution.

2.1 Discrepancy Constraint: A Motivating Example

Consider the simple function $y=x^3$, $x\in[-1,1]$, which may be decomposed as

$$x^3=\{x^3+h(x)\}+\{0-h(x)\}\triangleq f_{\mathrm{up}}(x)+f_{\mathrm{down}}(x)\,,$$

where $h(x)$ is an increasing function. If we set $h(0)=0$, then $h(x)\geq 0$ for $x\geq 0$ and $h(x)\leq 0$ for $x\leq 0$, so $x^3$ and $2h(x)$ always share the same sign; it follows that the magnitude of the difference between the two monotone components, $f_{\mathrm{up}}(x)-f_{\mathrm{down}}(x)=x^3+2h(x)$, is lower-bounded by $|x^3|$, i.e.,

$$|x^3|\leq|x^3+2h(x)|\,.\tag{4}$$

Since it is unreasonable for a strictly increasing function to have a non-constant decreasing component, the ideal decomposition should correspond to $h(x)=0$. Equation (4) suggests that such an ideal decomposition may be obtained by requiring the two monotone components to differ the least.

In light of this observation, we introduce the following discrepancy constraint for the two monotone components in the decomposition:

$$\|f_{\mathrm{up}}-f_{\mathrm{down}}\|_2\leq s\,,\tag{5}$$

where $s\geq 0$ is a tuning parameter. The role of the parameter $s$ can be summarized as follows:

  • if $s\to 0$, then the solution is $\gamma^u=\gamma^d=c\boldsymbol{1}_J$, and hence $\hat f_{\mathrm{up}}=\hat f_{\mathrm{down}}=c\mathbf{B}\boldsymbol{1}_J=c\boldsymbol{1}_n$ are constant functions;

  • if $s\to\infty$, then the problem reduces to the unconstrained B-spline fitting;

  • a moderate $s$ imposes some regularization, which is preferable for a better fit.

2.2 General Functions

With the discrepancy constraint in (5), we can restate problem (2) as

$$\begin{split}\min_{\gamma^u,\gamma^d}\ &\|\mathbf{y}-\mathbf{B}(\gamma^u+\gamma^d)\|_2\\ \text{s.t. }&\gamma_1^u\leq\gamma_2^u\leq\cdots\leq\gamma_J^u;\;\gamma_1^d\geq\gamma_2^d\geq\cdots\geq\gamma_J^d\\ &\|\mathbf{B}(\gamma^u-\gamma^d)\|_2\leq s\,.\end{split}\tag{6}$$

Defining $\Upsilon\triangleq\{\gamma=(\gamma^u,\gamma^d)\in\mathrm{I\!R}^{2J}:\gamma_1^u\leq\gamma_2^u\leq\cdots\leq\gamma_J^u;\;\gamma_1^d\geq\gamma_2^d\geq\cdots\geq\gamma_J^d\}$ and

$$\mathbf{Z}\triangleq\begin{bmatrix}\mathbf{B}&\mathbf{B}\end{bmatrix},\quad\mathbf{W}\triangleq\begin{bmatrix}\mathbf{B}^T\\-\mathbf{B}^T\end{bmatrix}\begin{bmatrix}\mathbf{B}&-\mathbf{B}\end{bmatrix},$$

we further rewrite problem (6) as

$$\min_{\gamma\in\Upsilon}\;\|\mathbf{y}-\mathbf{Z}\gamma\|_2^2\qquad\text{s.t.}\ \ \gamma^T\mathbf{W}\gamma\leq s^2\,.\tag{7}$$

It is more convenient to consider its Lagrangian form

$$\min_{\gamma\in\Upsilon}\left[\|\mathbf{y}-\mathbf{Z}\gamma\|_2^2+\mu(\gamma^T\mathbf{W}\gamma-s^2)\right]=\min_{\gamma\in\Upsilon}\|\mathbf{y}-\mathbf{Z}\gamma\|_2^2+\mu\gamma^T\mathbf{W}\gamma\,,\tag{8}$$

where $\mu\geq 0$ is the Lagrange multiplier. By Lagrangian duality, there is a one-to-one correspondence between the constrained problem (7) and the Lagrangian form (8): for each value of $s$ in the range where the constraint $\gamma^T\mathbf{W}\gamma\leq s^2$ is active, there is a corresponding value of $\mu$ that yields the same solution from the Lagrangian form (8). Conversely, the solution $\hat\gamma_\mu$ to problem (8) solves the bound problem (7) with $s^2=\hat\gamma_\mu^T\mathbf{W}\hat\gamma_\mu$. Some basic properties of the solution to problem (8) are summarized in Proposition 2.
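
Problem (8) is a convex quadratic program over the product cone $\Upsilon$, so any off-the-shelf QP solver applies. Purely for illustration, the following self-contained Julia sketch minimizes (8) by projected gradient descent: the Euclidean projection onto $\Upsilon$ factorizes over the two coefficient blocks and reduces to isotonic regression, computed by the pool-adjacent-violators algorithm (PAVA). The function names and the fixed iteration budget are our own simplifications, not the paper's implementation.

```julia
using LinearAlgebra

# Pool-adjacent-violators: Euclidean projection of v onto {u : u₁ ≤ ⋯ ≤ u_J}.
function pava_increasing(v::AbstractVector{<:Real})
    vals, wts, lens = Float64[], Float64[], Int[]
    for x in v
        push!(vals, float(x)); push!(wts, 1.0); push!(lens, 1)
        while length(vals) > 1 && vals[end-1] > vals[end]   # merge violating blocks
            m = (wts[end-1]*vals[end-1] + wts[end]*vals[end]) / (wts[end-1] + wts[end])
            vals[end-1] = m; wts[end-1] += wts[end]; lens[end-1] += lens[end]
            pop!(vals); pop!(wts); pop!(lens)
        end
    end
    reduce(vcat, [fill(vals[i], lens[i]) for i in eachindex(vals)])
end

# Projected gradient descent for the Lagrangian form (8).
function monotone_decomp(B::AbstractMatrix, y::AbstractVector, μ::Real; iters = 5_000)
    n, J = size(B)
    Z = [B B]
    W = [B'B -B'B; -B'B B'B]
    H = 2 .* (Z'Z .+ μ .* W)           # Hessian of the quadratic objective
    η = 1 / opnorm(H)                  # step size 1/L, L = gradient Lipschitz constant
    γ = zeros(2J)
    for _ in 1:iters
        g = 2 .* (Z' * (Z*γ .- y)) .+ 2μ .* (W*γ)   # gradient
        γ .-= η .* g
        γ[1:J]    .= pava_increasing(γ[1:J])          # project γᵘ onto increasing cone
        γ[J+1:2J] .= -pava_increasing(-γ[J+1:2J])     # project γᵈ onto decreasing cone
    end
    return γ[1:J], γ[J+1:2J]           # (γ̂ᵘ, γ̂ᵈ)
end
```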

Proposition 2.

Let $\hat\gamma=(\hat\gamma^u,\hat\gamma^d)$ be the monotone decomposition solution to problem (8) (or problem (13) discussed later).

  1. (i)

    There must be ties in the solution $\hat\gamma^u$ or $\hat\gamma^d$, i.e., there exists $i$ or $j$ such that $\hat\gamma_i^u=\hat\gamma_{i+1}^u$ or $\hat\gamma_j^d=\hat\gamma_{j+1}^d$.

  2. (ii)

    The mean values of the two monotone components are equal, $\boldsymbol{1}^T\mathbf{B}\hat\gamma^u=\boldsymbol{1}^T\mathbf{B}\hat\gamma^d$.

2.3 Monotone Functions

To delve deeper into the properties of the solution to problem (8), this section discusses the monotone decomposition of monotone functions. Without loss of generality, we consider increasing functions.

Proposition 3.

Let $\hat\gamma=(\hat\gamma^u,\hat\gamma^d)$ be the monotone decomposition solution to problem (8). Suppose $\hat\gamma^u+\hat\gamma^d$ is increasing; then

  1. (i)

    $\hat\gamma^d$ is a vector with identical elements, i.e., $\hat\gamma^d=c\boldsymbol{1}$, where the constant $c=\frac{\boldsymbol{1}^T\mathbf{B}\hat\gamma^u}{n}$;

  2. (ii)

    if there are no ties in $\hat\gamma^u$, i.e., $\hat\gamma_1^u<\hat\gamma_2^u<\cdots<\hat\gamma_J^u$, then

    $$\hat\gamma^u=\frac{1}{\mu+1}\hat\gamma^{\mathrm{ls}}+\frac{\mu-1}{\mu+1}c\boldsymbol{1}\,,\tag{9}$$

    where the unconstrained B-spline solution $\hat\gamma^{\mathrm{ls}}$ is given in Equation (3);

  3. (iii)

    if $\hat\gamma_1^u<\cdots<\hat\gamma_{k_1}^u=\cdots=\hat\gamma_{k_2}^u<\cdots<\hat\gamma_{k_{2m-1}}^u=\cdots=\hat\gamma_{k_{2m}}^u<\cdots<\hat\gamma_J^u$, where $1\leq k_1\leq k_2\leq\cdots\leq k_{2m-1}\leq k_{2m}\leq J$, then

    $$\hat\gamma^u=\frac{1}{\mu+1}\mathbf{G}^T(\mathbf{G}\mathbf{B}^T\mathbf{B}\mathbf{G}^T)^{-1}\mathbf{G}\mathbf{B}^T\mathbf{y}+\frac{\mu-1}{\mu+1}c\boldsymbol{1}\,,\tag{10}$$

    where $\mathbf{G}$ is a block diagonal matrix with blocks

    $$\{\mathbf{I}_{k_1-1},\,\boldsymbol{1}^T_{k_2-k_1+1},\,\ldots,\,\mathbf{I}_{k_{2m-1}-k_{2m-2}-1},\,\boldsymbol{1}^T_{k_{2m}-k_{2m-1}+1},\,\mathbf{I}_{J-k_{2m}}\}\,.$$

    The above result (ii) can be viewed as a special case with $k_1=k_2=J$.

Intuitively, solution (9) can be viewed as shrinkage with an offset applied to the unconstrained B-spline fit $\hat\gamma^{\mathrm{ls}}$: $\frac{1}{\mu+1}$ is the shrinkage factor and $\frac{\mu-1}{\mu+1}c\boldsymbol{1}$ is the offset. Even with the general matrix $\mathbf{G}$, solution (10) exhibits a similar shrinkage-with-offset pattern.
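
To make the tie structure concrete, $\mathbf{G}$ can be constructed mechanically from the runs of tied coefficients in $\hat\gamma^u$: each maximal tied run contributes one row $\boldsymbol{1}^T$ over that run, and each untied coefficient contributes an identity row. A hedged Julia sketch (the function name and the numerical tolerance are ours):

```julia
# Build the block-diagonal G of Proposition 3 from the tie pattern of γ̂ᵘ.
# Each maximal run of (numerically) equal entries yields one row 1ᵀ over the
# run; isolated entries yield identity rows, so G has size g × J with g ≤ J.
function tie_matrix(γ::AbstractVector{<:Real}; tol = 1e-8)
    J = length(γ)
    rows = Vector{Vector{Float64}}()
    j = 1
    while j <= J
        k = j
        while k < J && abs(γ[k+1] - γ[k]) <= tol
            k += 1
        end
        row = zeros(J); row[j:k] .= 1.0   # 1ᵀ over the tied run (length 1 if no tie)
        push!(rows, row)
        j = k + 1
    end
    return permutedims(reduce(hcat, rows))   # g × J
end
```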

2.4 MSE Comparisons

To quantify the performance of fitting by monotone decomposition, consider the model

$$y=f(x)+\varepsilon\,,\quad\varepsilon\sim N(0,\sigma^2)\,.\tag{11}$$

Define the mean squared errors of the fits,

$$\operatorname{MSE}(\hat{\mathbf{y}})=\mathbb{E}\|\mathbf{f}-\mathbf{B}(\hat\gamma^u+\hat\gamma^d)\|_2^2\,,\quad\operatorname{MSE}(\hat{\mathbf{y}}^{\mathrm{ls}})=\mathbb{E}\|\mathbf{f}-\mathbf{B}\hat\gamma^{\mathrm{ls}}\|_2^2\,,$$

where $\mathbf{f}=[f(x_1),\ldots,f(x_n)]^T$, and the expectations are taken over $\mathbf{y}\sim N(\mathbf{f},\sigma^2\mathbf{I})$. Proposition 4 shows that the fitting with monotone decomposition can achieve better performance, particularly in high-noise scenarios, when the underlying function is monotone.

Proposition 4.

Suppose the monotone decomposition $\hat\gamma=(\hat\gamma^u,\hat\gamma^d)$ satisfies that $\hat\gamma^u+\hat\gamma^d$ is increasing. Let $\mathbf{G}$ be the $g\times J$ matrix defined in Proposition 3, so that $\mathbf{G}\hat\gamma^u$ is the sub-vector with unique elements. If

$$\sigma^2>\frac{-C_1(J-\mathbb{E}g)+C_2(\mathbb{E}g+1)+\sqrt{\Delta}}{2[(J-\mathbb{E}g)(\mathbb{E}g+1)+(\mathbb{E}g-1)^2]}\,,$$

where

$$\begin{aligned}C_1&=\mathbf{f}^T\mathbf{A}\mathbf{f}-\frac{(\boldsymbol{1}_n^T\mathbf{f})^2}{n}\geq 0\,,\qquad C_2=\mathbf{f}^T(\mathbf{I}-\mathbf{A})\mathbf{f}\geq 0\,,\\ \mathbf{A}&=\mathbb{E}[\mathbf{B}\mathbf{G}^T(\mathbf{G}\mathbf{B}^T\mathbf{B}\mathbf{G}^T)^{-1}\mathbf{G}\mathbf{B}^T]\,,\\ \Delta&=\left[C_1(J-\mathbb{E}g)+C_2(\mathbb{E}g+1)\right]^2+4C_1C_2(\mathbb{E}g-1)^2\,,\end{aligned}$$

and the expectations are taken over $\mathbf{y}$ since $\mathbf{G}$ (and hence $g$) depends on $\mathbf{y}$, then there exists a monotone decomposition such that $\operatorname{MSE}(\hat{\mathbf{y}})<\operatorname{MSE}(\hat{\mathbf{y}}^{\mathrm{ls}})$.

In particular, if we assume there are no ties in $\hat\gamma^u$, i.e., $\mathbf{G}\equiv\mathbf{I}$ for all $\mathbf{y}$, then there always exists a monotone decomposition such that $\operatorname{MSE}(\hat{\mathbf{y}})<\operatorname{MSE}(\hat{\mathbf{y}}^{\mathrm{ls}})$, regardless of the noise level.

The lower bound on $\sigma^2$ in Proposition 4 might not be easy to evaluate. Nonetheless, the pivotal implication is that the monotone decomposition fitting can achieve better performance when the noise level is large enough. Extensive simulations in Section 4 agree with this argument. Moreover, although Proposition 4 is established specifically for monotone functions, the simulations show that the monotone decomposition fitting with cubic splines can also outperform the corresponding unconstrained cubic splines applied to random functions, particularly in high-noise scenarios.

3 Monotone Decomposition with Smoothing Splines

When dealing with cubic splines, it is typically necessary to ascertain both the number of basis functions, denoted $J$, and the placement of the knots. In contrast, smoothing splines take a different approach by employing all unique data points as knots, thus bypassing the need to optimize the knot placement and the number of knots required for the B-spline basis functions.

With B-spline basis functions, the smoothing spline $f(x)=\sum_{j=1}^J\gamma_jB_j(x)$ can be estimated as follows,

$$\hat\gamma^{\mathrm{ss}}=\operatorname*{arg\,min}_\gamma\|\mathbf{y}-\mathbf{B}\gamma\|_2^2+\lambda\gamma^T\boldsymbol{\Omega}\gamma\,,\tag{12}$$

where $\{\boldsymbol{\Omega}\}_{jk}=\int B_j''(s)B_k''(s)\,ds$ is called the roughness penalty matrix and $\lambda>0$ is the penalty parameter. For this reason, smoothing splines are also referred to as penalized splines.
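
Numerically, $\boldsymbol{\Omega}$ and the solution of (12) are straightforward to compute. The Julia sketch below reuses the illustrative `bspline_basis` helper from Section 2, approximating second derivatives by central differences and the integral by a Riemann sum on a fine grid; this is an illustration only, as an exact implementation would integrate the piecewise polynomials analytically.

```julia
# Roughness penalty matrix Ω with Ω[j,k] = ∫ B_j''(s) B_k''(s) ds, approximated
# on a uniform grid; t is the knot vector and J the number of basis functions.
function penalty_matrix(t::AbstractVector, J::Int; order = 4, ngrid = 2001, h = 1e-4)
    a, b = t[order], t[end - order + 1]          # boundary knots
    xs = range(a + 2h, b - 2h; length = ngrid)
    d2(j, x) = (bspline_basis(x + h, t, j, order) - 2bspline_basis(x, t, j, order) +
                bspline_basis(x - h, t, j, order)) / h^2   # central difference
    Ω = zeros(J, J)
    for j in 1:J, k in j:J
        Ω[j, k] = Ω[k, j] = step(xs) * sum(d2(j, x) * d2(k, x) for x in xs)
    end
    return Ω
end

# Closed-form smoothing-spline coefficients from (12).
smooth_coef(B, y, Ω, λ) = (B'B + λ .* Ω) \ (B'y)
```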

Imposing the roughness penalty $\gamma^T\boldsymbol{\Omega}\gamma=(\gamma^u+\gamma^d)^T\boldsymbol{\Omega}(\gamma^u+\gamma^d)=\gamma^T\boldsymbol{\Sigma}\gamma$ on problem (8), where $\boldsymbol{\Sigma}\triangleq\begin{bmatrix}\boldsymbol{\Omega}&\boldsymbol{\Omega}\\\boldsymbol{\Omega}&\boldsymbol{\Omega}\end{bmatrix}$, we have the Lagrangian form of monotone decomposition with smoothing splines,

$$\min_{\gamma\in\Upsilon}\|\mathbf{y}-\mathbf{Z}\gamma\|_2^2+\mu\gamma^T\mathbf{W}\gamma+\lambda\gamma^T\boldsymbol{\Sigma}\gamma\,.\tag{13}$$

For general functions, the properties in Proposition 2 also hold for the monotone decomposition with smoothing splines.

3.1 Monotone Functions

Likewise, we delve deeper into the characteristics of monotone decomposition with smoothing splines for monotone functions. The solutions exhibit analogous shrinkage-with-offset patterns, akin to those observed in Proposition 3 for monotone decomposition with cubic splines; the results are articulated in the following Proposition 5.

Proposition 5.

Let $\hat\gamma=(\hat\gamma^u,\hat\gamma^d)$ be the monotone decomposition solution to problem (13). Suppose $\hat\gamma^u+\hat\gamma^d$ is increasing; then

  1. (i)

    $\hat\gamma^d$ is a vector with identical elements, i.e., $\hat\gamma^d=c\boldsymbol{1}$, where the constant $c=\frac{\boldsymbol{1}^T\mathbf{B}\hat\gamma^u}{n}$;

  2. (ii)

    if there are no ties in $\hat\gamma^u$, i.e., the inequalities hold strictly, $\hat\gamma_1^u<\hat\gamma_2^u<\cdots<\hat\gamma_J^u$, then

    $$\hat\gamma^u=\frac{1}{1+\mu}\hat\gamma^{\mathrm{ss}}\!\left(\frac{\lambda}{1+\mu}\right)-c\,((1+\mu)\mathbf{B}^T\mathbf{B}+\lambda\boldsymbol{\Omega})^{-1}((1-\mu)\mathbf{B}^T\mathbf{B}+\lambda\boldsymbol{\Omega})\boldsymbol{1}_J\,,\tag{14}$$

    where $\hat\gamma^{\mathrm{ss}}\left(\frac{\lambda}{1+\mu}\right)$ is the smoothing-spline solution with penalty parameter $\frac{\lambda}{1+\mu}$,

    $$\hat\gamma^{\mathrm{ss}}\!\left(\frac{\lambda}{1+\mu}\right)=\left(\mathbf{B}^T\mathbf{B}+\frac{\lambda}{1+\mu}\boldsymbol{\Omega}\right)^{-1}\mathbf{B}^T\mathbf{y}\,.$$
  3. (iii)

    if $\hat\gamma_1^u<\cdots<\hat\gamma_{k_1}^u=\cdots=\hat\gamma_{k_2}^u<\cdots<\hat\gamma_{k_{2m-1}}^u=\cdots=\hat\gamma_{k_{2m}}^u<\cdots<\hat\gamma_J^u$, where $1\leq k_1\leq k_2\leq\cdots\leq k_{2m-1}\leq k_{2m}\leq J$, then

    $$\begin{split}\hat\gamma^u&=\mathbf{G}^T((1+\mu)\mathbf{G}\mathbf{B}^T\mathbf{B}\mathbf{G}^T+\lambda\mathbf{G}\boldsymbol{\Omega}\mathbf{G}^T)^{-1}\mathbf{G}\mathbf{B}^T\mathbf{y}\\ &\quad-c\,\mathbf{G}^T((1+\mu)\mathbf{G}\mathbf{B}^T\mathbf{B}\mathbf{G}^T+\lambda\mathbf{G}\boldsymbol{\Omega}\mathbf{G}^T)^{-1}((1-\mu)\mathbf{G}\mathbf{B}^T\mathbf{B}\mathbf{G}^T+\lambda\mathbf{G}\boldsymbol{\Omega}\mathbf{G}^T)\boldsymbol{1}_g\,,\end{split}$$

    where $\mathbf{G}$ is defined in Proposition 3. The above result (ii) can be viewed as a special case with $k_1=k_2=J$.

For the no-tie solution (14), the shrinkage acts on the ridge solution $\hat\gamma^{\mathrm{ss}}(\frac{\lambda}{1+\mu})$, but unlike the constant offset in Equation (9), the offset $c((1+\mu)\mathbf{B}^T\mathbf{B}+\lambda\boldsymbol{\Omega})^{-1}((1-\mu)\mathbf{B}^T\mathbf{B}+\lambda\boldsymbol{\Omega})\boldsymbol{1}_J$ depends on $\mathbf{B}$ and $\boldsymbol{\Omega}$.

3.2 MSE Comparisons

Based on model (11), to compare the fitting performance of monotone decomposition with smoothing splines against that of smoothing splines, we further define the mean squared error for smoothing splines,

$$\operatorname{MSE}(\hat{\mathbf{y}}^{\mathrm{ss}}(\lambda))=\mathbb{E}\|\mathbf{f}-\mathbf{B}\hat\gamma^{\mathrm{ss}}(\lambda)\|_2^2\,,$$

where $\hat\gamma^{\mathrm{ss}}(\lambda)$ is the solution to smoothing splines with penalty parameter $\lambda$.

Proposition 6 shows that employing monotone decomposition with smoothing splines can result in a superior mean squared error (MSE) compared to smoothing splines in the context of monotone functions, particularly when the noise level is sufficiently high. While the condition (15) outlined in Proposition 6 may appear intricate, the simulations presented in the next section empirically substantiate this assertion. Furthermore, although the proposition is specifically formulated for monotone functions, the simulations show that the monotone decomposition applied to general functions can still achieve better performance in high-noise scenarios.

Proposition 6.

Consider the smoothing spline with penalty parameter $\lambda_0$. Let $\hat\gamma^{\mathrm{ss}}(\lambda_0)$ be the coefficient vector and denote its hat matrix by $\mathbf{Q}=\mathbf{B}(\mathbf{B}^T\mathbf{B}+\lambda_0\boldsymbol{\Omega})^{-1}\mathbf{B}^T$. If

$$\sigma^2>\frac{\mathbf{f}^T\mathbf{Q}(\mathbf{I}-\frac{\boldsymbol{1}_n\boldsymbol{1}_n^T\mathbf{Q}}{n})(\mathbf{I}-\mathbf{Q})\mathbf{f}}{\operatorname{tr}[(\mathbf{I}-\frac{\boldsymbol{1}_n\boldsymbol{1}_n^T\mathbf{Q}}{n})\mathbf{Q}^2]}\,,\tag{15}$$

and the monotone decomposition $\hat\gamma^u+\hat\gamma^d$ is increasing with no ties, then there exists a monotone decomposition with parameters $\lambda=\lambda_0/k$, $\mu=1/k-1$, where $k\in(0,1)$, that achieves a smaller mean squared error than $\operatorname{MSE}(\hat{\mathbf{y}}^{\mathrm{ss}}(\lambda_0))$.

4 Simulations for Fitting

In this section, we compare the performance of monotone decomposition using simulated examples. We generate data from a function $f$ with additive Gaussian noise,

$$y=f(x)+N(0,\sigma^2)\,,\quad x\in[-1,1]\,.$$

To cover a diverse range of functional forms, we consider the following different types of functions.

  • monotone functions: (i) polynomial function: $y=x^3$; (ii) exponential function: $y=\exp(x)$; (iii) sigmoid function: $y=\frac{1}{1+\exp(-5x)}$.

  • general functions: (i) unimodal: $y=x^2$; (ii) random functions, where the kernel can be Squared Exponential (SE), Rational Quadratic (RQ), Matérn (Mat), or Periodic [18]. The numerical values appended to the kernel names in Tables 1 and 2 are the kernel parameters. For example, “Mat12” refers to the Matérn kernel with parameter $\nu=1/2$. The detailed procedure for generating a random function and a visualization of those curves can be found in the Supplementary Material.

To compare the performance of different methods, we adopt the mean squared fitting error (MSFE), i.e., the residual sum of squares (RSS) divided by the sample size, and the mean squared prediction error (MSPE),

$$\mathrm{MSFE}=\frac{1}{n}\sum_{i=1}^n\left(y_i-\hat f(x_i)\right)^2\,,\qquad\mathrm{MSPE}=\frac{1}{N}\sum_{i=1}^N\left(\hat f(t_i)-f(t_i)\right)^2\,,$$

where $t_i=x_{(1)}+(i-1)\cdot\frac{x_{(N)}-x_{(1)}}{N}$, $i=1,\ldots,N$, are equally spaced within the same range as the $x_i$'s. Based on $R=100$ replications, we estimate the mean MSFE and MSPE, together with their respective standard errors. To judge how significant the differences in MSPE between the fitting by monotone decomposition and the corresponding spline fitting are, we consider

$$\Delta=\mathrm{MSPE}(\text{Monotone decomposition})-\mathrm{MSPE}(\text{Spline fitting})\,,$$

and denote the differences for each experiment as $\delta_i$, $i=1,\ldots,R$. We report the $p$-value for the one-sided $t$-test of $H_0:\Delta=0$ versus $H_1^<:\Delta<0$ when $\sum_{i=1}^R\delta_i<0$ (or $H_1^>:\Delta>0$ when $\sum_{i=1}^R\delta_i>0$). Besides, we also count the proportion of replications in which the fitting by monotone decomposition achieves better performance, $\text{prop}\triangleq\frac{\sum_{i=1}^R\#\{\delta_i<0\}}{R}$.
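
These quantities translate directly into code. A hedged Julia sketch of the metrics and the one-sided paired $t$-test on the per-replication differences $\delta_i$ follows; the function names are ours, and the HypothesisTests.jl package is assumed for the $t$-test $p$-value.

```julia
using Statistics, HypothesisTests

msfe(y, yhat) = mean((y .- yhat) .^ 2)          # RSS / n
mspe(fhat_t, f_t) = mean((fhat_t .- f_t) .^ 2)  # on the equally spaced grid tᵢ

# δ[i] = MSPE(monotone decomposition)ᵢ − MSPE(spline fitting)ᵢ over R replications.
function compare_mspe(δ::AbstractVector{<:Real})
    tail = sum(δ) < 0 ? :left : :right          # H₁: Δ < 0 or H₁: Δ > 0
    p = pvalue(OneSampleTTest(collect(δ)); tail = tail)
    (pvalue = p, prop = mean(δ .< 0))           # prop = share of wins for MD
end
```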

4.1 Cubic Splines

Firstly, we consider the monotone decomposition with cubic splines. There are two tuning parameters: the number of basis functions JJ, and the Lagrange multiplier μ\mu for the discrepancy between the two components. We adopt two strategies to optimize these parameters:

Tune $\mu$ with fixed $J$:

We pick $J$, the tuning parameter for cubic splines, to be a minimizer of the cross-validation (CV) error, and then perform the monotone decomposition with cubic splines using the same $J$ while tuning the parameter $\mu$. Figure 1 shows an example, with $J=26$ selected by CV. The left panel shows the leave-one-out CV error plotted against $\mu$. The cubic spline fit and the monotone decomposition fit are displayed in the right panel, where the former achieves an MSPE of 0.102, while the latter improves the MSPE to 0.066.

Figure 1: Demo for monotone decomposition with cubic splines. The left panel shows the leave-one-out cross-validation error curve for each candidate $\mu$ when $J$ is CV-tuned for the cubic spline. The right panel shows the corresponding fitting curves, together with the truth and the noised observations. The values in the parentheses are the mean squared prediction errors.
Tune $\mu$ and $J$ simultaneously:

Instead of fixing $J$ in the monotone decomposition procedure, we also use cross-validation to choose it, together with $\mu$. Figure 2 displays an example of this process, where the left panel shows the CV error for each parameter pair $(\mu,J)$, and the right panel compares the fit given the parameters that minimize the CV error against the cubic spline fit, whose parameter $J$ is tuned separately by CV.

Figure 2: Demo for monotone decomposition with cubic splines. The left panel shows the leave-one-out cross-validation error curve for each candidate pair $(\mu,J)$. The right panel shows the corresponding fitting curves, together with the truth and noised observations. The values in the parentheses are the mean squared prediction errors.

Note that the right panels of Figures 1 and 2 depict the same training data. Although the MSPE of 0.079 obtained by tuning $(\mu,J)$ simultaneously is slightly larger than the MSPE of 0.066 obtained by tuning $\mu$ with fixed $J$, both monotone decomposition fits achieve better performance than the cubic spline fit, which has an MSPE of 0.102.

To provide comprehensive comparisons, we conducted 100 repetitions for 12 types of curves under different noise levels. The results from tuning $\mu$ and $J$ simultaneously are summarized in Table 1, and the results from tuning $\mu$ with fixed $J$ can be found in the Supplementary Material. For some curves with small noise (e.g., $\sigma=0.2$), such as $y=x^3$, the decomposition method performs slightly worse than cubic spline fitting. Nevertheless, the monotone decomposition always outperforms cubic spline fitting in higher-noise (e.g., $\sigma=1.0$) scenarios, regardless of the optimization strategy.

Table 1: Results for comparing the cubic splines and the fitting by monotone decomposition with CV-tuned $(\mu,J)$. The values are averages over 100 replications, with the standard error of the average in parentheses. The bold values highlight the smaller mean squared prediction error. The complete table with finer noise levels can be found in the Supplementary Material.
curve | $\sigma$ | MSFE (CubicSpline) | MSFE (MonoDecomp) | MSPE (CubicSpline) | MSPE (MonoDecomp) | p-value | prop.
$x^2$ 1.0 9.71e+00 (7.3e-02) 9.73e+00 (7.3e-02) 7.50e+00 (3.1e-01) 6.93e+00 (2.6e-01) 5.91e-03 (**) 0.59
$x^3$ 1.0 9.65e+00 (7.4e-02) 9.60e+00 (7.2e-02) 7.63e+00 (3.5e-01) 7.17e+00 (2.7e-01) 2.21e-02 (*) 0.6
$\exp(x)$ 1.0 9.57e+00 (5.9e-02) 9.49e+00 (7.3e-02) 7.56e+00 (2.8e-01) 7.08e+00 (3.1e-01) 3.59e-02 (*) 0.57
sigmoid 1.0 9.51e+00 (8.5e-02) 9.50e+00 (8.7e-02) 7.33e+00 (3.2e-01) 6.61e+00 (2.6e-01) 1.45e-03 (**) 0.56
SE-1 1.0 9.55e+00 (7.0e-02) 9.51e+00 (7.6e-02) 7.29e+00 (3.2e-01) 6.62e+00 (2.7e-01) 5.51e-03 (**) 0.63
SE-0.1 1.0 9.29e+00 (8.5e-02) 9.20e+00 (9.3e-02) 1.44e+01 (2.6e-01) 1.38e+01 (2.4e-01) 5.93e-04 (***) 0.7
Mat12-1 1.0 9.79e+00 (8.9e-02) 9.73e+00 (9.1e-02) 1.26e+01 (2.9e-01) 1.17e+01 (2.4e-01) 9.31e-07 (***) 0.68
Mat12-0.1 1.0 1.04e+01 (1.3e-01) 1.04e+01 (1.3e-01) 2.07e+01 (2.4e-01) 2.01e+01 (2.4e-01) 1.85e-04 (***) 0.73
Mat32-1 1.0 9.62e+00 (8.2e-02) 9.61e+00 (8.3e-02) 9.01e+00 (3.5e-01) 8.00e+00 (2.6e-01) 1.46e-04 (***) 0.56
Mat32-0.1 1.0 9.62e+00 (1.1e-01) 9.53e+00 (9.7e-02) 1.68e+01 (2.5e-01) 1.58e+01 (2.1e-01) 4.67e-10 (***) 0.72
RQ-0.1-0.5 1.0 9.55e+00 (1.1e-01) 9.50e+00 (1.2e-01) 1.50e+01 (2.4e-01) 1.44e+01 (2.6e-01) 5.01e-03 (**) 0.68
Periodic-0.1-4 1.0 9.41e+00 (1.2e-01) 9.31e+00 (1.1e-01) 1.72e+01 (3.0e-01) 1.60e+01 (2.5e-01) 2.03e-10 (***) 0.63

4.2 Smoothing Splines

This section compares the fitting performance of monotone decomposition with smoothing splines to the fitting of the smoothing splines. There are two tuning parameters: the penalty parameter $\lambda$ and the Lagrange multiplier $\mu$ for the discrepancy. We consider three strategies to optimize these parameters:

  • Tune $\lambda$ for smoothing splines first, then tune $\mu$ for the monotone decomposition with smoothing splines using the tuned $\lambda$;

  • Following Proposition 6, tune $\lambda$ for smoothing splines first, then tune the shrinkage factor $k=\frac{1}{1+\mu}$ for monotone decomposition with smoothing splines using penalty parameter $\lambda/k$;

  • Tune $\lambda$ and $\mu$ simultaneously for monotone decomposition with smoothing splines.

All strategies use cross-validation (CV) to determine the parameters. Specifically, we pick the tuning parameters that minimize the CV error.

For brevity, we only present the results using the third strategy in this section. Results based on the other two strategies can be found in the Supplementary Material. Figure 3 illustrates the simultaneous tuning of $(\mu,\lambda)$ using the same toy example presented in Figures 1 and 2. The smoothness penalty in smoothing splines results in fitted curves that are notably less wiggly than the curves fitted by cubic splines without such a penalty.

Figure 3: Demo for monotone decomposition with smoothing splines. The left panel shows the leave-one-out cross-validation error heatmap for each candidate pair $(\mu,\lambda)$. The right panel shows the corresponding fitting curves, together with the truth and the noised observations. The values in the parentheses are the mean squared prediction errors.

The results of 100 repeated experiments are summarized in Table 2. We observe that the monotone decomposition consistently outperforms the corresponding smoothing splines. Moreover, the monotone decomposition with CV-tuned $(\lambda,\mu)$ is more likely to obtain better performance than the other two strategies. Regardless of the optimization strategy employed, all results consistently show that the monotone decomposition fitting can achieve good performance, especially in high-noise scenarios.

Table 2: Results for comparing the smoothing splines with CV-tuned $\lambda$ and the fitting by monotone decomposition with CV-tuned $(\lambda,\mu)$. The values are averages over 100 replications, with the standard error of the average in parentheses. The bold values highlight the smaller mean squared prediction error. The complete table with finer noise levels can be found in the Supplementary Material.
curve | $\sigma$ | MSFE (SmoothSpline) | MSFE (MonoDecomp) | MSPE (SmoothSpline) | MSPE (MonoDecomp) | p-value | prop.
$x^2$ 1.0 9.65e+00 (8.9e-02) 9.69e+00 (8.5e-02) 6.44e+00 (2.9e-01) 6.35e+00 (2.5e-01) 1.68e-01 0.49
$x^3$ 1.0 9.75e+00 (8.2e-02) 9.77e+00 (8.2e-02) 6.47e+00 (1.8e-01) 6.21e+00 (1.7e-01) 1.18e-04 (***) 0.65
$\exp(x)$ 1.0 9.74e+00 (8.4e-02) 9.75e+00 (8.4e-02) 5.94e+00 (2.3e-01) 5.82e+00 (2.1e-01) 3.12e-02 (*) 0.58
sigmoid 1.0 9.60e+00 (7.8e-02) 9.64e+00 (7.5e-02) 5.99e+00 (2.9e-01) 5.68e+00 (2.4e-01) 4.47e-04 (***) 0.67
SE-1 1.0 9.67e+00 (8.3e-02) 9.70e+00 (8.1e-02) 6.32e+00 (2.8e-01) 6.11e+00 (2.5e-01) 3.86e-03 (**) 0.6
SE-0.1 1.0 8.92e+00 (9.2e-02) 8.99e+00 (9.0e-02) 1.23e+01 (2.1e-01) 1.23e+01 (2.1e-01) 4.32e-01 0.57
Mat12-1 1.0 9.65e+00 (9.5e-02) 9.69e+00 (9.2e-02) 1.15e+01 (2.3e-01) 1.13e+01 (2.1e-01) 3.40e-04 (***) 0.56
Mat12-0.1 1.0 9.76e+00 (1.2e-01) 9.92e+00 (1.1e-01) 1.90e+01 (2.2e-01) 1.90e+01 (2.2e-01) 2.06e-01 0.58
Mat32-1 1.0 9.59e+00 (8.5e-02) 9.64e+00 (7.5e-02) 7.20e+00 (2.4e-01) 7.04e+00 (2.0e-01) 3.65e-02 (*) 0.49
Mat32-0.1 1.0 8.96e+00 (1.1e-01) 9.14e+00 (1.0e-01) 1.48e+01 (2.2e-01) 1.47e+01 (2.1e-01) 1.44e-01 0.54
RQ-0.1-0.5 1.0 9.10e+00 (1.1e-01) 9.22e+00 (1.1e-01) 1.27e+01 (2.4e-01) 1.25e+01 (2.2e-01) 1.21e-03 (**) 0.6
Periodic-0.1-4 1.0 8.69e+00 (1.0e-01) 8.89e+00 (9.1e-02) 1.44e+01 (2.1e-01) 1.43e+01 (1.9e-01) 3.03e-01 0.49

5 Test of Monotonicity

Having obtained two monotone components through monotone decomposition, in addition to utilizing the sum of these two components as a fitting method, we can also derive statistics for testing monotonicity. Consider the model $Y=f(X)+\epsilon$, where $X$ is a scalar covariate, $Y$ is a scalar dependent random variable, $\epsilon$ is the noise satisfying $\mathbb{E}[\epsilon\mid X]=0$, and $f$ is an unknown function. The test of monotonicity aims to test

$$H_0:f\text{ is monotone}\quad\text{vs.}\quad H_1:f\text{ is not monotone}\,,$$

where “monotone” can be specifically (strictly) “increasing” or “decreasing”.

5.1 Related Work

There are many existing approaches for testing monotonicity. [1] constructed a test based on a critical bandwidth: they fitted a local linear regression and determined the smallest bandwidth such that the estimate becomes monotone. This critical bandwidth is then used as the test statistic, and the $p$-value is calculated by the bootstrap method. [11] pointed out the shortcoming of this test when the true function has flat or nearly flat spots, and proposed a test that estimates local slopes and approximates the distribution of their weighted minimum. [9] proposed test statistics that are functionals of a U-process based on the signs of $(Y_{i+k}-Y_i)(X_{i+k}-X_i)$; they approximated the limiting distribution by Gaussian processes and then derived the critical values for an asymptotic significance level $\alpha$. [3] used similar U-statistics, but introduced a weighting function $Q(X_i,X_j)$ and proposed a statistic based on $(Y_i-Y_j)\operatorname{sign}(X_j-X_i)Q(X_i,X_j)$, where $Q$ is chosen from a large set of potentially useful weighting functions to maximize the statistic. [22] used quadratic regression splines to fit the data, took the minimum of the slopes at the knots as the test statistic, and then estimated the null distribution of this statistic by performing constrained quadratic regression splines.

5.2 Test by Monotone Decomposition

Suppose we have obtained the monotone components. Propositions 3 and 5 imply that the coefficients of one component are constant if the function is monotone. Thus, we can test the monotonicity of a function by testing whether the coefficients of a monotone component are constant. The equivalences are summarized in Table 3.

Table 3: Equivalent hypotheses.
Original Hypothesis Hypothesis in terms of Monotone Decomposition
$\mathcal{H}_0^u: f$ is increasing    $H_0^u: \gamma^d=c\boldsymbol{1}$.
$\mathcal{H}_0^d: f$ is decreasing    $H_0^d: \gamma^u=c\boldsymbol{1}$.
$\mathcal{H}_0^{su}: f$ is strictly increasing    $H_0^{su}: \gamma^d=c\boldsymbol{1},\ \gamma^u\neq c\boldsymbol{1}$.
$\mathcal{H}_0^{sd}: f$ is strictly decreasing    $H_0^{sd}: \gamma^u=c\boldsymbol{1},\ \gamma^d\neq c\boldsymbol{1}$.
$\mathcal{H}_0^m: f$ is monotone    $H_0^m$: one monotone component is constant, i.e., $\min(\gamma^u,\gamma^d)=c\boldsymbol{1}$.
$\mathcal{H}_0^{sm}: f$ is strictly monotone    $H_0^{sm}$: one and only one monotone component is constant, i.e., $\min(\gamma^u,\gamma^d)=c\boldsymbol{1},\ \max(\gamma^u,\gamma^d)\neq c\boldsymbol{1}$.

In Table 3, the minimum (maximum) of two vectors $a,b\in\mathrm{I\!R}^J$ is defined as:

$$\min(a,b)\triangleq\operatorname*{arg\,min}_{x\in\{a,b\}}V(x)\,,\qquad\max(a,b)\triangleq\operatorname*{arg\,max}_{x\in\{a,b\}}V(x)\,,$$

where $V$ is the sample variance of the elements of a vector. To test $H_0^u$, $H_0^d$ and $H_0^m$, consider the test statistics

$$T_u=V(\hat\gamma^d)\,,\quad T_d=V(\hat\gamma^u)\,,\quad T_m=V(\min(\hat\gamma^u,\hat\gamma^d))\,.$$

Note that the null hypothesis is rejected if the test statistic $T$ is large enough. Specifically, given a significance level $\alpha$, the respective null hypothesis is rejected if $T_u\geq t_{u,1-\alpha}$, $T_d\geq t_{d,1-\alpha}$ or $T_m\geq t_{m,1-\alpha}$, where $t_{u,1-\alpha}$, $t_{d,1-\alpha}$ and $t_{m,1-\alpha}$ denote the critical values of the distributions of $T_u$, $T_d$, $T_m$ under the respective null hypotheses. These null distributions can be characterized by bootstrap samples. Since the errors $\epsilon$ (and the residuals $\hat\epsilon$) can be heteroscedastic, we adopt the wild bootstrap [6]. Without loss of generality, we focus on the test for increasing functions; the procedure for testing $H_0^u$ is outlined in Algorithm 1.

Algorithm 1 Test of Monotonicity (Increasing): $H_0^u$
0:  Significance level $\alpha$, number of bootstrap samples $R$.
1:  Fit $\{x_i,y_i\}_{i=1}^n$ by monotone decomposition, either with cubic splines or smoothing splines. Denote the increasing component by $\hat y^u$, let $\hat c=\frac{\boldsymbol{1}^T\hat y^u}{n}$, and denote the fitting method by $\hat m$.
2:  Calculate the test statistic $T$.
3:  Calculate the residuals $\epsilon_i=y_i-\hat y_i$.
4:  for $r$ from 1 to $R$ do
5:     Sample $\eta_i\sim N(0,1)$ and construct $\epsilon_i^\star=\eta_i\epsilon_i$. [Wild Bootstrap]
6:     Construct bootstrap samples $y_i^\star=\hat y_i^u+\hat c+\epsilon_i^\star$, $i=1,\ldots,n$.
7:     Apply $\hat m$ to $\{x_i,y_i^\star\}_{i=1}^n$, and calculate the bootstrap statistic $\delta_r$.
8:  end for
9:  The critical value $t_{1-\alpha}$ is the $1-\alpha$ quantile of $\{\delta_r\}_{r=1}^R$; reject the null hypothesis if $T>t_{1-\alpha}$. Alternatively, compute the $p$-value $\frac{1}{R}\sum_{r=1}^R\#\{\delta_r>T\}$ and reject if $p<\alpha$.
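
A compact Julia sketch of Algorithm 1 is given below. Here `fit_md` is a hypothetical callback standing in for the chosen monotone decomposition fit (MDCS or MDSS); it is assumed to return the fitted increasing component $\hat y^u$, the overall fit $\hat y$, and the test statistic $T$.

```julia
using Statistics

function test_increasing(x, y, fit_md; R = 100, α = 0.05)
    ŷu, ŷ, T = fit_md(x, y)           # steps 1–2: fit and test statistic
    ĉ = mean(ŷu)                      # ĉ = 1ᵀŷᵘ / n
    ϵ = y .- ŷ                        # step 3: residuals
    δ = map(1:R) do _                 # steps 4–8: wild bootstrap
        η = randn(length(y))          # multipliers ηᵢ ~ N(0, 1)
        ystar = ŷu .+ ĉ .+ η .* ϵ     # resample under H₀ᵘ
        last(fit_md(x, ystar))        # bootstrap statistic δᵣ
    end
    p = mean(δ .> T)                  # step 9: bootstrap p-value
    (stat = T, pvalue = p, reject = p < α)
end
```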

6 Simulations for Testing

First, we want to check whether the methods can control the Type I error. Specifically, consider five monotone functions: $x$, $x^3$, $x^{1/3}$, $e^x$ and $\frac{1}{1+e^{-x}}$. We conducted 100 simulations and calculated the proportion of rejections of the null hypothesis. Ideally, the rejection proportion should be less than 0.05 at the commonly used significance level $\alpha=0.05$. The results are reported in Table 4.

Table 4: Simulated size (the proportion of rejections of the null hypothesis) for monotone curves under different noise levels and different sample sizes, given the significance level $\alpha=0.05$.
Methods Curves $\sigma=0.001$ $\sigma=0.01$ $\sigma=0.1$
n = 50 100 200 n = 50 100 200 n = 50 100 200
[22] $x$ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.01 0.0
$x^3$ 0.53 0.84 1.0 0.05 0.08 0.08 0.02 0.06 0.04
$x^{1/3}$ 0.0 0.0 0.0 0.0 0.0 0.0 0.03 0.02 0.01
$e^x$ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
$1/(1+e^{-x})$ 0.0 0.0 0.0 0.01 0.0 0.0 0.04 0.06 0.05
[9] $x$ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
$x^3$ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
$x^{1/3}$ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
$e^x$ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
$1/(1+e^{-x})$ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
[1] $x$ 0.0 0.0 0.0 0.02 0.0 0.01 0.01 0.0 0.01
$x^3$ 0.0 0.0 0.0 0.2 0.19 0.18 0.26 0.2 0.15
$x^{1/3}$ 0.0 0.0 0.0 0.0 0.0 0.0 0.05 0.02 0.01
$e^x$ 0.0 0.0 0.0 0.0 0.0 0.0 0.04 0.02 0.0
$1/(1+e^{-x})$ 0.0 0.0 0.0 0.02 0.0 0.02 0.02 0.01 0.0
MDCS $x$ 0.01 0.0 0.12 0.0 0.0 0.0 0.0 0.0 0.0
$x^3$ 0.0 0.0 0.0 0.01 0.0 0.0 0.0 0.0 0.0
$x^{1/3}$ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
$e^x$ 0.02 0.11 0.03 0.0 0.0 0.0 0.0 0.0 0.0
$1/(1+e^{-x})$ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
MDSS $x$ 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
$x^3$ 0.1 0.08 0.08 0.08 0.1 0.07 0.09 0.05 0.03
$x^{1/3}$ 0.0 0.02 0.02 0.01 0.03 0.07 0.05 0.06 0.05
$e^x$ 0.04 0.05 0.02 0.08 0.07 0.05 0.04 0.08 0.08
$1/(1+e^{-x})$ 0.03 0.03 0.03 0.0 0.01 0.0 0.0 0.0 0.0

[9] always accepts the null hypothesis. [22] fails to control the Type I error on the curve $x^3$ when the noise level is small, and [1] fails to control the Type I error on $x^3$ when the noise level is large. In contrast, our proposed methods, monotone decomposition with cubic splines (MDCS) and monotone decomposition with smoothing splines (MDSS), demonstrate strong Type I error control in the majority of cases, even though the rejection rates are slightly elevated in a few instances.

Furthermore, the $p$-value should follow $\mathrm{Uniform}[0,1]$ under the null hypothesis. To check the distribution of the $p$-values for each approach, Figure 4 displays the uniform QQ plots of 1000 $p$-values for $x^3$ and $e^x$ with sample size $n=200$ and noise level $\sigma=0.01$. The uniform QQ plots of $p$-values for all five curves under different noise levels can be found in the Supplementary Material. Notably, our proposed MDSS aligns well with the diagonal line in the QQ plots, indicating the closest resemblance to the uniform distribution.

(a) $x^3$
(b) $e^x$
Figure 4: Uniform QQ plots of 1000 $p$-values for five approaches on two curves with $n=200$, $\sigma=0.01$, respectively.

Next, we compare the simulated size and power under the settings of two competitors. The first is the simulation setting in [1],

$$f(x,a)=1+x-a\exp\left(-\frac{(x-0.5)^2}{2\cdot 0.1^2}\right)\,,\quad y=f(x,a)+N(0,\sigma^2)\,,$$

where $a=0$ and $a=0.15$ generate strongly and just monotone curves, respectively, while $a=0.25$ and $a=0.45$ produce mildly and strongly non-monotone shapes, respectively. We also consider the simulation setting in [9], $y=m_i(x)+N(0,\sigma^2)$, where

$$\begin{aligned}m_1(x)&=0\,,\quad m_2(x)=x(1-x)\,,\quad m_3(x)=x+0.415\exp(-50x^2)\,,\\ m_4(x)&=\begin{cases}10(x-0.5)^3-\exp(-100(x-0.25)^2)&\text{if }x<0.5\,,\\ 0.1(x-0.5)-\exp(-100(x-0.25)^2)&\text{otherwise}\,,\end{cases}\end{aligned}$$

and $m_i(x)$, $i\geq 2$, are non-monotone curves. A visualization of those curves can be found in the Supplementary Material.

For each combination of parameters on each curve, 100 simulations are carried out, each using 100 bootstrap samples. The proportions of rejections of the null hypothesis, i.e., the simulated size and power, are reported. The complete results are displayed in Table 5, from which we have the following observations:

  • [9] fails to achieve high power on [1]'s curve with $a=0.25$, whereas our proposed method achieves power as high as [1]'s.

  • [1] loses power on [9]'s curve $m_4$, whereas our proposed method attains power as high as [9]'s.

  • For most curves, MDCS behaves similarly to MDSS, but MDCS has better control of the Type I error at the cost of some power.

In summary, our proposed method achieves power comparable to (and sometimes better than) other methods while controlling the Type I error.

Table 5: Simulated size and power on curves from [1] and [9], based on 100 repeated experiments.
Methods [1] [9]
Curves $\sigma=0.001$ $\sigma=0.01$ $\sigma=0.1$ Curves $\sigma=0.001$ $\sigma=0.01$ $\sigma=0.1$
n = 50 100 200 n = 50 100 200 n = 50 100 200 n = 50 100 200 n = 50 100 200 n = 50 100 200
[22] a = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.01 0.01 0.01 m1 0.1 0.04 0.04 0.06 0.02 0.06 0.07 0.07 0.1
a = 0.15 0.0 0.0 0.0 0.07 0.04 0.06 0.03 0.03 0.01 m2 1.0 1.0 1.0 1.0 1.0 1.0 0.09 0.27 0.45
a = 0.25 1.0 1.0 1.0 1.0 1.0 1.0 0.13 0.24 0.64 m3 1.0 1.0 1.0 1.0 1.0 1.0 0.34 0.56 0.88
a = 0.45 1.0 1.0 1.0 1.0 1.0 1.0 0.56 0.95 1.0 m4 0.36 0.57 0.98 0.39 0.57 0.98 0.23 0.38 0.78
[9] a = 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 m1 0.01 0.0 0.01 0.01 0.04 0.02 0.02 0.0 0.01
a = 0.15 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 m2 1.0 1.0 1.0 1.0 1.0 1.0 0.37 0.7 0.94
a = 0.25 0.0 0.0 1.0 0.0 0.0 0.31 0.0 0.0 0.0 m3 0.97 1.0 1.0 0.94 1.0 1.0 0.1 0.33 0.87
a = 0.45 1.0 1.0 1.0 1.0 1.0 1.0 0.06 0.71 0.99 m4 0.01 0.19 0.53 0.02 0.13 0.53 0.0 0.05 0.34
[1] a = 0.0 0.0 0.0 0.0 0.02 0.01 0.0 0.01 0.01 0.03 m1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
a = 0.15 0.0 0.0 0.0 0.0 0.03 0.0 0.01 0.02 0.04 m2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
a = 0.25 1.0 1.0 1.0 1.0 1.0 1.0 0.09 0.16 0.44 m3 1.0 1.0 1.0 1.0 1.0 1.0 0.9 1.0 1.0
a = 0.45 1.0 1.0 1.0 1.0 1.0 1.0 0.3 0.72 0.98 m4 0.0 0.0 0.0 0.01 0.02 0.0 0.34 0.33 0.28
MDCS a = 0.0 0.01 0.0 0.07 0.0 0.0 0.0 0.0 0.0 0.0 m1 0.03 0.02 0.02 0.0 0.01 0.0 0.0 0.0 0.0
a = 0.15 0.05 0.02 0.04 0.01 0.0 0.0 0.0 0.0 0.0 m2 0.98 0.99 0.95 1.0 1.0 0.94 0.18 0.35 0.65
a = 0.25 0.97 1.0 1.0 0.74 0.92 0.95 0.0 0.0 0.0 m3 0.89 0.93 1.0 0.9 0.97 0.99 0.16 0.22 0.29
a = 0.45 1.0 1.0 1.0 0.99 1.0 1.0 0.01 0.08 0.17 m4 0.89 0.91 0.94 0.88 0.92 0.99 0.51 0.68 0.81
MDSS a = 0.0 0.02 0.07 0.06 0.02 0.08 0.03 0.02 0.04 0.04 m1 0.05 0.07 0.04 0.06 0.04 0.03 0.05 0.07 0.05
a = 0.15 0.05 0.03 0.05 0.03 0.06 0.05 0.02 0.05 0.06 m2 1.0 1.0 1.0 1.0 1.0 1.0 0.79 0.96 1.0
a = 0.25 0.99 1.0 1.0 0.98 1.0 1.0 0.03 0.04 0.03 m3 1.0 1.0 1.0 1.0 1.0 1.0 0.67 0.83 0.97
a = 0.45 1.0 1.0 1.0 1.0 1.0 1.0 0.46 0.82 0.98 m4 0.98 1.0 1.0 0.99 1.0 1.0 0.84 0.99 1.0

7 Application: Monotonicity Test for scRNA-seq Trajectory Inference

Single-cell transcriptome sequencing (scRNA-seq) is a powerful technique that allows researchers to profile transcript abundance at the resolution of individual cells. Trajectory inference aims first to allocate cells to lineages and then to order them based on pseudotimes within these lineages. Based on trajectory inference, researchers can discover differentially expressed genes within lineages, using methods such as [21]'s tradeSeq, [20]'s PseudotimeDE, and [13]'s Lamian. These methods mostly identify differential genes by checking whether the expression trajectory is constant along the pseudotime. Once a gene is identified as differentially expressed, researchers may further check whether its expression exhibits a monotone pattern. A non-decreasing expression pattern indicates that the corresponding gene is turned on and needed thereafter along the cell lineage. A decreasing expression pattern indicates that the corresponding gene is needed less and less along the pseudotime. On the other hand, a non-monotone expression pattern indicates that the corresponding gene is involved in more complex dynamics. Such detailed dynamics may illuminate critical regulatory mechanisms of cell differentiation along the corresponding lineage.

In analogy to the term differentially expressed (DE) gene, used when the null hypothesis that the expression of a gene along the trajectory is constant is rejected, we call a gene non-monotonically expressed (nME) when the null hypothesis that its expression is monotone is rejected. We adopt tradeSeq to identify DE genes, and the monotonicity test via monotone decomposition with cubic splines (MDCS) to find nME genes. Both DE genes and nME genes are selected using the Benjamini–Hochberg (BH) procedure to control the false discovery rate (FDR) at cutoff $\alpha=0.05$.

To explore the biological functions of DE genes and nME genes, we examined the Gene Ontology (GO), which is a relational database of terms (concepts) used to describe gene functions, and conducted enrichment analysis [2].

Suppose there are $N$ genes in the reference gene list, among which $n$ genes are in our analyzed gene set. For a GO term of interest, suppose there are $M$ and $k$ genes within the reference gene list and our analyzed gene set, respectively, that are annotated with the GO term. The $p$-value for the one-sided Fisher's exact test of the null hypothesis that the GO term is not enriched in the analyzed gene set can be calculated based on the hypergeometric distribution:

$$p=1-\sum_{i=0}^{k-1}\frac{\binom{M}{i}\binom{N-M}{k-i}}{\binom{N}{n}}\,.$$
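
Numerically, this is the upper tail of a hypergeometric distribution. A one-line Julia sketch, assuming the Distributions.jl package (whose `Hypergeometric(s, f, n)` counts successes among $n$ draws from a population with $s$ successes and $f$ failures):

```julia
using Distributions

# P(X ≥ k) for X ~ Hypergeometric: M annotated genes among N, n genes drawn.
go_pvalue(N, M, n, k) = ccdf(Hypergeometric(M, N - M, n), k - 1)
```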

We repeat this test for multiple GO terms of interest and correct for multiple comparisons via the BH procedure to control the FDR at the cutoff $\alpha=0.05$.
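
The BH step-up procedure itself takes only a few lines of base Julia (a sketch; the function name is ours):

```julia
# Benjamini–Hochberg step-up: indices of hypotheses rejected at FDR level α.
function benjamini_hochberg(p::AbstractVector{<:Real}; α = 0.05)
    m = length(p)
    ord = sortperm(p)                                   # sort p-values ascending
    k = findlast(i -> p[ord[i]] <= α * i / m, 1:m)      # largest i passing the BH bound
    k === nothing ? Int[] : sort(ord[1:k])              # reject the k smallest p-values
end
```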

7.1 nME genes can identify significant GO terms when DE genes fail

We studied the leukocyte lineage of the mouse bone marrow data set [16], which consists of the expression measurements of 3004 genes at 1474 pseudotime points. Figure 5(a) shows the reduced two-dimensional representation of the data using uniform manifold approximation and projection (UMAP) [15]. Eight cell types are denoted with different colors and shapes. The solid curve is the pseudotime axis, which starts from the Multipotent progenitors cell type at the bottom right and ends at the Neutrophils cell type at the top left. Note that although a monotone pattern is a special DE pattern, we do not perform the monotonicity test in a two-step manner, i.e., first finding DE genes and then performing the monotonicity test among those DE genes. Instead, for each gene, we test whether it is a DE gene and whether it is an nME gene independently. Figure 5(b) displays the paired $p$-values $(p_{\text{nME}}, p_{\text{DE}})$ on the logarithmic scale, where the dashed lines denote the cutoffs determined by the BH procedure. As a result, we identified 109 nME genes (102 after GO analysis, since 7 genes are not mapped in the GO database) and 767 DE genes, of which 53 genes are in common. These numbers are also noted in the titles of the GO bar plots in Figure 6. Figure 5(c) illustrates the fitted trajectory for gene 2610029G23Rik, which is identified as an nME gene but not a DE gene, i.e., it lies in the bottom right green block of Figure 5(b).

Figure 5: (a): Two-dimensional representation of the data using UMAP. Different cell types are denoted with different colors (and shapes). The pseudotime axis is displayed, starting from the Multipotent progenitors cell type at the bottom right and ending at the Neutrophils cell type at the top left. (b): the scatter plot of the paired $p$-values $(p_{\text{nME}}, p_{\text{DE}})$, where the star symbols denote NA values where tradeSeq failed. (c): one example gene located in the bottom right green block of (b), which is identified as an nME gene but not a DE gene. (d): Trajectory curves of the 109 nME genes. The curves of the 22 genes annotated with the GO term “translation” are highlighted.

Figure 6 displays the GO enrichment analysis of the DE genes and nME genes using the R package clusterProfiler [24]. Figures 6(a), 6(b) and 6(c) take all 3004 genes as the reference gene list; note that, because some genes are not mapped in the GO database, only 2665 genes remain after filtering. We cannot find significant GO terms for the DE gene list, as shown in Figure 6(b), which is deliberately left blank because there are no significant results. In contrast, we can identify several significant GO terms for the nME gene list. Figures 6(c) and 6(d) display the intersection of the DE gene list and the nME gene list with respect to the whole gene list and the DE gene list, respectively. Both identify significant GO terms. One possible reason for the DE genes failing to yield significant GO terms is that the DE category might be too broad: different sub-categories of DE genes (the non-monotone pattern being one special sub-category) might contribute to different GO terms, and the increased sample size from incorporating unrelated genes might dilute the significance when determining the enriched GO terms. Another potential reason is that the tradeSeq test is not robust enough. As shown by the star symbols in Figure 5(b), tradeSeq returns many NA values, a known issue discussed in its GitHub repository (https://github.com/statOmics/tradeSeq/issues/209).

Figure 6: GO enrichment analysis of four gene sets ((a): nME gene set; (b): DE gene set; (c): the intersection of nME and DE genes; (d): the same gene set as (c) but with the DE genes as the reference list) with BH FDR cutoff $\alpha = 0.05$. The numbers of genes are noted in the title of each subfigure. The enriched GO terms are sorted by the adjusted p-values. (b) is left empty deliberately since no significant GO terms are selected.

We can further check whether the pattern of the trajectory curves in the enriched GO terms agrees with the underlying biological mechanism. Take the first GO term “translation” as an example. Figure 5(d) displays the trajectory curves of the 109 nME genes along the pseudotime. Among them, the curves of the 22 genes annotated in the GO term “translation” are highlighted. These genes exhibit a coherent pattern, characterized by an initial increase in expression followed by a subsequent decrease. Mapping the pseudotime axis of Figure 5(d) back onto Figure 5(a), this pattern implies that gene expression increases while the cell develops from Multipotent progenitors to Monocytes, and roughly after the expression reaches its peak, the cell evolves into Neutrophils. This behavior is consistent with the biological fact that specialized cell types (here, Neutrophils) tend to reduce their rates of translation (and hence protein synthesis), since their structure and function are relatively stable.

7.2 nME genes can fine-locate GO terms when DE genes identify too many terms

In some scenarios, although GO enrichment analysis can identify significant GO terms given the DE genes, nME genes can further fine-locate GO terms and focus on a small but significant set of GO terms. We analyzed the cholangiocyte lineage from the mouse liver data studied in [8] to demonstrate such an advantage of nME genes. In the dataset, there are 6038 genes and 308 pseudotime points.

tradeSeq identifies 767 DE genes (801 before filtering out genes unmapped in the GO database), and the monotonicity test identifies 67 nME genes (69 before filtering), of which 45 genes are in common. For the 767 DE genes, we identify 439 significant GO terms, whereas for the 67 nME genes, we find 39 significant GO terms; 28 GO terms are common to the two sets. Figure 7(a) displays all 39 GO terms returned by the nME genes, where the star symbol indicates GO terms not shared by the DE genes. Figure 7(b) shows the enrichment map constructed from the GO terms of the DE genes, in which the GO terms shared with the nME genes are highlighted. In the enrichment map, an edge connects two GO terms if they have overlapping genes; hence, mutually overlapping GO terms tend to cluster together, making it easy to identify functional modules. It is clear that the shared GO terms mainly concentrate in two clusters: one is isolated from the others on the right side and forms a pentagon shape with the keyword “coagulation”; the other is located at the left corner of the biggest cluster and relates to “catabolic process”. In other words, nME genes can help fine-locate GO terms, sparing researchers from inspecting the much longer list of GO terms produced by the DE genes.

Figure 7: GO enrichment analysis for the DE genes and nME genes on the cholangiocyte lineage of the liver data. (a) GO terms enriched by the nME genes, sorted by adjusted p-values. (b) Enrichment map of the GO terms by the DE genes, where the common GO terms shared by the nME genes are highlighted.

We further check the new GO terms enriched by the nME genes, which are annotated with the star symbol in Figure 7(a). These new GO terms might be contributed by genes that are not identified as DE genes. For example, the GO term “regulation of blood coagulation” contains 5 genes {F11, Kng2, Serpinc1, Serpinf2, Vtn} in the nME gene set, where the first three are also in the DE gene set, but the last two are only in the nME gene set. Figures 8(c) and 8(d) display the fitted trajectories of the expression along pseudotime by our monotone decomposition fitting technique, with the p-values noted in the title of each panel. We observe that the DE p-value is not significant while the nME p-value is; in other words, these two p-values lie in the bottom-right green block of Figure 8(a). Using the same data at the hepatoblast stage (an earlier stage than the cholangiocyte stage we considered), [8] identified 68 differentially variable (DV) genes, for which the variances (instead of the mean expression considered for DE genes and nME genes) of gene expression change along the pseudotime. Notably, Serpinf2 and Vtn are the only two genes belonging to both the 68 DV gene set and the 69 nME gene set. The coexistence of DV and nME characteristics in these genes suggests intricate and dynamic expression patterns, which might be of significant biological interest. The dual nature of being both DV and nME genes underscores the complexity of the regulatory mechanisms governing these specific genetic expressions.

Furthermore, we examine all genes that are identified as nME but not DE. Figure 8(b) displays the trajectory curves of all nME genes, with the curves of nME-but-not-DE genes highlighted and the others shown transparent. To facilitate comparison, all curves are centered, and different colors denote different clusters (see Section 7.3). Note that the variations of the highlighted curves are relatively small compared to the transparent ones, so different methods may reach different conclusions: some genes are identified as nME but not DE, despite the initial expectation that nME genes would form a subset of DE genes.

Figure 8: Hypothesis testing on the cholangiocyte lineage of the mouse liver data. (a) The scatter plot of paired p-values $(p_{\text{nME}}, p_{\text{DE}})$, where the cutoffs selected by the BH procedure are 0.00053 and 0.00660, respectively. (b) Trajectory curves of all nME genes. (c) and (d) are two genes in the GO term “regulation of blood coagulation”, which is enriched by the nME genes but not by the DE genes.

7.3 nME genes can highlight non-monotonic patterns while DE genes blur them

The clustering of genes based on their fitted expression patterns can reveal intriguing insights for biologists. However, a potential limitation of clustering based on DE genes is the tendency of clustering methods to amalgamate pure monotonic patterns with more intricate non-monotonic ones (e.g., Figure 5 of [21]). To keep the non-monotonic patterns distinct, one possible direction is to tailor the clustering approach, for example by constructing a more suitable similarity measure. A more direct approach is to cluster only the non-monotonic patterns of the nME genes.

Here, we consider the liver dataset of Section 7.2. Figure 9(a) shows the dendrogram from hierarchical clustering with complete linkage on the 69 nME genes. We take the cutoff 2.0 to obtain 10 clusters with clear patterns. Figure 9(b) presents the trajectory curves of those 69 nME genes, with different colors denoting different clusters. We highlight two representative clusters, depicted in Figures 9(c) and 9(d): Figure 9(c) displays a peak on the right side, while Figure 9(d) showcases a peak on the left side. A sketch of the clustering step is given below.
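The clustering step itself uses standard tools; the sketch below (our own illustration, assuming the Julia packages Clustering.jl and Distances.jl; the matrix `curves` is a hypothetical stand-in for the fitted trajectories) mirrors the procedure just described:

```julia
using Clustering, Distances, Statistics

curves = randn(308, 69)               # placeholder: one column per fitted nME curve
centered = curves .- mean(curves, dims = 1)   # center each curve, as in Figure 8(b)

D = pairwise(Euclidean(), centered, dims = 2) # gene-by-gene distance matrix
hc = hclust(D, linkage = :complete)           # complete-linkage hierarchical clustering
labels = cutree(hc, h = 2.0)                  # cut the dendrogram at height 2.0
```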

[Figure 9 panels (a)–(h); panel (c): C9, Hp, Kng2; panel (d): Hgd, Proz, Sord]
Figure 9: Clustering of trajectory curves of nME genes (first row) and DE genes (second row). (a): Hierarchical dendrogram of 69 nME genes. The cutoff (2.0) at the dashed line results in 10 clusters. (b): Trajectory curves of 69 nME genes. (c)-(d) display the curves of 2 out of 10 clusters from (b), respectively. The gene symbols are noted for clusters with few genes. (e): Hierarchical dendrogram of 801 DE genes. The cutoff (1.5) at the dashed line results in 12 clusters. (f): Trajectory curves of 801 DE genes. (g) and (h) are two distinct clusters from 12 clusters in (f). Different colors denote different clusters. The same color in (b) to (d) denotes the same cluster, and the same color in (g) to (h) denotes the same cluster, but there is no direct correspondence between the colors in (b) and (f).

On the other hand, we perform hierarchical clustering on the 801 DE genes with a comparable cutoff of 1.5 and identify 12 clusters, as shown in Figure 9(e). The trajectory curves of the 801 DE genes, with different colors representing different clusters, are displayed in Figure 9(f). It is worth noting that the presence of monotonic patterns somewhat conceals the underlying wiggly patterns: Figure 9(g) mixes non-monotonic patterns similar to those in Figure 9(c) with numerous ascending curves, and Figure 9(h) likewise mixes the non-monotonic pattern observed in Figure 9(d) with monotone curves, illustrating the challenge of distinguishing these patterns.

Clusters of purely non-monotonic patterns can also reveal significant biological signals. For example, all three genes in Figure 9(c) are significantly annotated to the GO terms “defense response to other organism”, “response to external biotic stimulus”, and “response to other organism”, each with an FDR-adjusted p-value of 0.00278.

8 Discussions

We formulated the monotone decomposition with monotone splines. Summing the two monotone components yields a fitting method, and checking whether one of the components is constant yields a test of monotonicity.

As a fitting method, our experiments show that the monotone decomposition with cubic splines improves performance, especially in high-noise cases; for monotone functions, this improvement can be explained theoretically. Similar phenomena are observed for the monotone decomposition with smoothing splines. However, there are also some limitations:

  • The cross-validation procedure for simultaneously tuning two parameters is computationally expensive. The generative bootstrap sampler (GBS) proposed by [19] might be an alternative to speed up the cross-validation step.

  • Currently, the theoretical guarantees only cover monotone functions; extending them to general functions remains an open problem.

For the test of monotonicity, the proposed statistics based on monotone decomposition show competitive, and often superior, performance compared with existing approaches.

We also apply the fitting and testing based on monotone decomposition to investigate the monotonic and non-monotonic trajectory patterns in several scRNA-seq datasets. In parallel with the conventional analysis of differentially expressed (DE) genes, we propose the concept of non-monotonically expressed (nME) genes, which might lead to new biological insights.

Acknowledgement

Lijun Wang was supported by the Hong Kong Ph.D. Fellowship Scheme from the University Grant Committee. Hongyu Zhao was supported in part by NIH grant P50 CA196530. JSL was supported in part by the NSF DMS/NIGMS 2 Collaborative Research grant (R01 GM152814).

Supplementary Material

The Supplementary Material contains additional simulation results and technical proofs of propositions.

References

  • [1] A. W. Bowman, M. C. Jones and I. Gijbels “Testing Monotonicity of Regression” In Journal of Computational and Graphical Statistics 7.4, 1998, pp. 489–500 DOI: 10.1080/10618600.1998.10474790
  • [2] Elizabeth I. Boyle et al. “GO::TermFinder—Open Source Software for Accessing Gene Ontology Information and Finding Significantly Enriched Gene Ontology Terms Associated with a List of Genes” In Bioinformatics 20.18, 2004, pp. 3710–3715 DOI: 10.1093/bioinformatics/bth456
  • [3] Denis Chetverikov “Testing Regression Monotonicity in Econometric Models” In Econometric Theory 35.4, 2019, pp. 729–776 DOI: 10.1017/S0266466618000282
  • [4] Hugh A. Chipman, Edward I. George and Robert E. McCulloch “BART: Bayesian Additive Regression Trees” In The Annals of Applied Statistics 4.1 Institute of Mathematical Statistics, 2010, pp. 266–298 DOI: 10.1214/09-AOAS285
  • [5] Hugh A. Chipman, Edward I. George, Robert E. McCulloch and Thomas S. Shively “mBART: Multidimensional Monotone BART” In Bayesian Analysis 17.2 International Society for Bayesian Analysis, 2022, pp. 515–544 DOI: 10.1214/21-BA1259
  • [6] Russell Davidson and Emmanuel Flachaire “The Wild Bootstrap, Tamed at Last” In Journal of Econometrics 146.1, 2008, pp. 162–169 DOI: 10.1016/j.jeconom.2008.08.003
  • [7] Carl De Boor “A Practical Guide to Splines” New York: Springer, 1978
  • [8] Shila Ghazanfar et al. “Investigating Higher-Order Interactions in Single-Cell Data with scHOT” In Nature Methods 17.8 Nature Publishing Group, 2020, pp. 799–806 DOI: 10.1038/s41592-020-0885-x
  • [9] Subhashis Ghosal, Arusharka Sen and Aad W. van der Vaart “Testing Monotonicity of Regression” In The Annals of Statistics 28.4 Institute of Mathematical Statistics, 2000, pp. 1054–1082 JSTOR: 2673954
  • [10] Robert B. Gramacy “Surrogates: Gaussian Process Modeling, Design and Optimization for the Applied Sciences” Chapman & Hall/CRC, 2020
  • [11] Peter Hall and Nancy E. Heckman “Testing for Monotonicity of a Regression Mean by Calibrating for Linear Functions” In The Annals of Statistics 28.1 Institute of Mathematical Statistics, 2000, pp. 20–39 JSTOR: 2673980
  • [12] Trevor Hastie, Robert Tibshirani and Jerome Friedman “The Elements of Statistical Learning: Data Mining, Inference, and Prediction” Springer Science & Business Media, 2009
  • [13] Wenpin Hou et al. “A Statistical Framework for Differential Pseudotime Analysis with Multiple Single-Cell RNA-seq Samples” In Nature Communications 14.1 Nature Publishing Group, 2023, pp. 7286 DOI: 10.1038/s41467-023-42841-y
  • [14] Jan R. Magnus and Heinz Neudecker “Matrix Differential Calculus with Applications in Statistics and Econometrics” John Wiley & Sons, Ltd, 2019 DOI: 10.1002/9781119541219
  • [15] Leland McInnes, John Healy, Nathaniel Saul and Lukas Großberger “UMAP: Uniform Manifold Approximation and Projection” In Journal of Open Source Software 3.29, 2018, pp. 861 DOI: 10.21105/joss.00861
  • [16] Franziska Paul et al. “Transcriptional Heterogeneity and Lineage Commitment in Myeloid Progenitors” In Cell 163.7, 2015, pp. 1663–1677 DOI: 10.1016/j.cell.2015.11.013
  • [17] James O. Ramsay and Bernard W. Silverman “Functional Data Analysis”, Springer Series in Statistics New York, NY: Springer, 2005
  • [18] Carl Edward Rasmussen and Christopher K. I. Williams “Gaussian Processes for Machine Learning”, Adaptive Computation and Machine Learning Cambridge, Mass: MIT Press, 2006
  • [19] Minsuk Shin, Shijie Wang and Jun S. Liu “Generative Multi-purpose Sampler for Weighted M-estimation” In Journal of Computational and Graphical Statistics, Taylor & Francis, 2023, pp. 1–53 DOI: 10.1080/10618600.2023.2292668
  • [20] Dongyuan Song and Jingyi Jessica Li “PseudotimeDE: Inference of Differential Gene Expression along Cell Pseudotime with Well-Calibrated p-Values from Single-Cell RNA Sequencing Data” In Genome Biology 22.1, 2021, pp. 124 DOI: 10.1186/s13059-021-02341-y
  • [21] Koen Van den Berge et al. “Trajectory-Based Differential Expression Analysis for Single-Cell Sequencing Data” In Nature Communications 11.1 Nature Publishing Group, 2020, pp. 1201 DOI: 10.1038/s41467-020-14766-3
  • [22] Jianqiang C. Wang and Mary C. Meyer “Testing the Monotonicity or Convexity of a Function Using Regression Splines” In The Canadian Journal of Statistics / La Revue Canadienne de Statistique 39.1 [Statistical Society of Canada, Wiley], 2011, pp. 89–107 JSTOR: 41304465
  • [23] Lijun Wang, Xiaodan Fan, Huabai Li and Jun S. Liu “Monotone Cubic B-Splines” arXiv, 2023 DOI: 10.48550/arXiv.2307.01748
  • [24] Guangchuang Yu et al. “clusterProfiler: A Universal Enrichment Tool for Interpreting Omics Data”, Bioconductor version: Release (3.17), 2023 DOI: 10.18129/B9.bioc.clusterProfiler

Appendix A More Simulation Results

A.1 Candidate Kernels

Random functions are generated with a kernel $K$, where $K$ is one of the Squared Exponential (SE) kernel $K_{SE}$, the Rational Quadratic (RQ) kernel $K_{RQ}$, the Matérn (Mat) kernel $K_{Mat}$, and the Periodic kernel $K_{Periodic}$ [18]:

$$K_{SE}(x,x^{\prime})=\exp\left(-\frac{(x-x^{\prime})^{2}}{2\ell^{2}}\right),\qquad K_{Mat}(x,x^{\prime})=\frac{2^{1-\nu}}{\Gamma(\nu)}\left(\frac{\sqrt{2\nu}|x-x^{\prime}|}{\ell}\right)^{\nu}S_{\nu}\left(\frac{\sqrt{2\nu}|x-x^{\prime}|}{\ell}\right),$$
$$K_{RQ}(x,x^{\prime})=\left(1+\frac{(x-x^{\prime})^{2}}{2\alpha\ell^{2}}\right)^{-\alpha},\qquad K_{Periodic}(x,x^{\prime})=\exp\left(-\frac{2\sin^{2}(|x-x^{\prime}|/T)}{\ell^{2}}\right)\,,$$

where $\ell,\nu,\alpha,T$ are the parameters and $S_{\nu}$ is a modified Bessel function. In particular, “Mat12” refers to the Matérn kernel with $\nu=1/2$; similarly, “Mat32” and “Mat52” indicate $\nu=3/2$ and $\nu=5/2$, respectively. Any additional parameters are appended to the kernel name; for example, “SE-1” represents the Squared Exponential kernel with $\ell=1$, “Mat12-1” denotes the Matérn kernel with $\nu=1/2,\ell=1$, and “RQ-0.1-0.5” is the Rational Quadratic kernel with $\ell=0.1,\alpha=0.5$.
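For reference, these kernels translate directly into Julia; the sketch below is our own transcription (we use `gamma` and `besselk` from SpecialFunctions.jl, the latter playing the role of $S_{\nu}$):

```julia
using SpecialFunctions  # gamma, besselk

k_se(x, y; ℓ = 1.0) = exp(-(x - y)^2 / (2ℓ^2))
k_rq(x, y; ℓ = 0.1, α = 0.5) = (1 + (x - y)^2 / (2α * ℓ^2))^(-α)
k_periodic(x, y; ℓ = 0.1, T = 4.0) = exp(-2sin(abs(x - y) / T)^2 / ℓ^2)
function k_mat(x, y; ν = 1/2, ℓ = 1.0)
    x == y && return 1.0                  # limit as |x - y| -> 0
    r = sqrt(2ν) * abs(x - y) / ℓ
    return 2^(1 - ν) / gamma(ν) * r^ν * besselk(ν, r)
end
```

For example, the kernel named “Mat12-1” corresponds to `k_mat(x, y; ν = 1/2, ℓ = 1.0)`.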

A.2 Random Function Generation

A random function $f$ with kernel $K$ is generated as follows (a Julia sketch is given after the list):

  1. Generate $n$ random points $x_{i}\sim U[-1,1], i=1,\ldots,n$.

  2. Calculate the covariance matrix $\Sigma$ based on kernel $K$: $\Sigma_{ij}=K(x_{i},x_{j})$. In practice, add a small constant, say $10^{-7}$, to the diagonal of $\Sigma$ to prevent a numerically ill-conditioned matrix [10].

  3. Generate a random Gaussian vector with this covariance matrix, $f\sim N(0,\Sigma)$.
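A minimal Julia sketch of these three steps (our own illustration; the kernel argument can be any of the kernel functions sketched in Section A.1):

```julia
using LinearAlgebra

function rand_gp(n; kernel = (x, y) -> exp(-(x - y)^2 / 2), jitter = 1e-7)
    x = sort(2rand(n) .- 1)                    # Step 1: x_i ~ U[-1, 1]
    Σ = [kernel(xi, xj) for xi in x, xj in x]  # Step 2: Σ_ij = K(x_i, x_j)
    Σ += jitter * I                            #   add 1e-7 to the diagonal [10]
    f = cholesky(Symmetric(Σ)).L * randn(n)    # Step 3: f ~ N(0, Σ)
    return x, f
end

x, f = rand_gp(100)   # one random function evaluated at 100 points
```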

Figure 10: Candidate functions. The left panel shows several regular functions, the middle panel displays random functions drawn from Gaussian processes with the squared exponential kernel, and the right panel shows random functions generated from Gaussian processes with other kernels.

A.3 Monotone Decomposition with Cubic Splines

Table 6: A complete version of Table 1 with finer noise levels. Results for comparing the cubic splines and the fitting by monotone decomposition with CV-tuned $(\mu, J)$. The values are averages over 100 replications, with the standard error of the average in parentheses. The bold values highlight the smaller mean squared prediction error.
curve $\sigma$ MSFE (CubicSpline) MSFE (MonoDecomp) MSPE (CubicSpline) MSPE (MonoDecomp) p-value prop.
$x^2$ 0.2 1.95e+00 (1.4e-02) 1.94e+00 (1.5e-02) 1.46e+00 (6.1e-02) 1.46e+00 (6.4e-02) 4.44e-01 0.57
0.4 3.84e+00 (3.4e-02) 3.83e+00 (3.0e-02) 2.97e+00 (1.5e-01) 2.90e+00 (1.3e-01) 2.79e-01 0.57
0.6 5.80e+00 (4.3e-02) 5.77e+00 (4.7e-02) 4.45e+00 (2.0e-01) 4.22e+00 (1.9e-01) 7.97e-02 (.) 0.63
1.0 9.71e+00 (7.3e-02) 9.73e+00 (7.3e-02) 7.50e+00 (3.1e-01) 6.93e+00 (2.6e-01) 5.91e-03 (**) 0.59
$x^3$ 0.2 1.89e+00 (1.6e-02) 1.88e+00 (1.7e-02) 1.53e+00 (6.9e-02) 1.54e+00 (5.9e-02) 3.81e-01 0.47
0.4 3.83e+00 (3.0e-02) 3.80e+00 (3.0e-02) 2.82e+00 (1.4e-01) 2.89e+00 (1.2e-01) 1.71e-01 0.45
0.6 5.74e+00 (4.6e-02) 5.68e+00 (4.9e-02) 4.78e+00 (2.1e-01) 4.71e+00 (1.6e-01) 2.93e-01 0.49
1.0 9.65e+00 (7.4e-02) 9.60e+00 (7.2e-02) 7.63e+00 (3.5e-01) 7.17e+00 (2.7e-01) 2.21e-02 (*) 0.6
$\exp(x)$ 0.2 1.91e+00 (1.3e-02) 1.88e+00 (1.5e-02) 1.51e+00 (5.8e-02) 1.61e+00 (6.6e-02) 3.47e-02 (*) 0.53
0.4 3.83e+00 (2.9e-02) 3.82e+00 (3.1e-02) 2.79e+00 (1.3e-01) 2.71e+00 (1.1e-01) 1.97e-01 0.57
0.6 5.72e+00 (4.4e-02) 5.68e+00 (4.4e-02) 4.45e+00 (1.8e-01) 4.27e+00 (1.7e-01) 1.15e-01 0.61
1.0 9.57e+00 (5.9e-02) 9.49e+00 (7.3e-02) 7.56e+00 (2.8e-01) 7.08e+00 (3.1e-01) 3.59e-02 (*) 0.57
sigmoid 0.2 1.91e+00 (1.5e-02) 1.91e+00 (1.6e-02) 1.69e+00 (5.4e-02) 1.59e+00 (5.0e-02) 6.14e-03 (**) 0.65
0.4 3.89e+00 (3.2e-02) 3.88e+00 (3.2e-02) 2.99e+00 (1.2e-01) 2.96e+00 (1.1e-01) 3.53e-01 0.47
0.6 5.77e+00 (4.8e-02) 5.76e+00 (4.8e-02) 4.35e+00 (1.7e-01) 4.14e+00 (1.4e-01) 4.19e-02 (*) 0.56
1.0 9.51e+00 (8.5e-02) 9.50e+00 (8.7e-02) 7.33e+00 (3.2e-01) 6.61e+00 (2.6e-01) 1.45e-03 (**) 0.56
SE-1 0.2 1.93e+00 (1.4e-02) 1.93e+00 (1.6e-02) 1.62e+00 (7.3e-02) 1.59e+00 (5.2e-02) 3.21e-01 0.52
0.4 3.81e+00 (3.4e-02) 3.78e+00 (3.6e-02) 3.18e+00 (1.4e-01) 3.16e+00 (1.2e-01) 3.93e-01 0.54
0.6 5.77e+00 (4.4e-02) 5.77e+00 (4.6e-02) 4.67e+00 (2.0e-01) 4.34e+00 (1.6e-01) 2.88e-02 (*) 0.57
1.0 9.55e+00 (7.0e-02) 9.51e+00 (7.6e-02) 7.29e+00 (3.2e-01) 6.62e+00 (2.7e-01) 5.51e-03 (**) 0.63
SE-0.1 0.2 1.76e+00 (2.0e-02) 1.78e+00 (2.5e-02) 3.54e+00 (5.3e-02) 3.54e+00 (6.7e-02) 4.92e-01 0.6
0.4 3.54e+00 (3.6e-02) 3.55e+00 (3.8e-02) 6.59e+00 (1.1e-01) 6.25e+00 (1.0e-01) 1.36e-04 (***) 0.66
0.6 5.57e+00 (5.4e-02) 5.59e+00 (6.0e-02) 9.20e+00 (1.6e-01) 9.13e+00 (1.6e-01) 3.14e-01 0.59
1.0 9.29e+00 (8.5e-02) 9.20e+00 (9.3e-02) 1.44e+01 (2.6e-01) 1.38e+01 (2.4e-01) 5.93e-04 (***) 0.7
Mat12-1 0.2 2.12e+00 (3.2e-02) 2.15e+00 (2.9e-02) 5.43e+00 (5.2e-02) 5.29e+00 (6.5e-02) 7.85e-03 (**) 0.72
0.4 4.08e+00 (4.7e-02) 4.02e+00 (4.8e-02) 7.55e+00 (9.5e-02) 7.20e+00 (8.9e-02) 7.51e-09 (***) 0.72
0.6 5.82e+00 (6.5e-02) 5.79e+00 (5.8e-02) 9.34e+00 (1.4e-01) 8.94e+00 (1.1e-01) 1.35e-04 (***) 0.7
1.0 9.79e+00 (8.9e-02) 9.73e+00 (9.1e-02) 1.26e+01 (2.9e-01) 1.17e+01 (2.4e-01) 9.31e-07 (***) 0.68
Mat12-0.1 0.2 3.87e+00 (6.8e-02) 3.88e+00 (7.3e-02) 1.26e+01 (1.6e-01) 1.25e+01 (1.8e-01) 2.28e-01 0.57
0.4 5.15e+00 (8.3e-02) 5.08e+00 (6.9e-02) 1.47e+01 (1.5e-01) 1.42e+01 (1.3e-01) 1.25e-03 (**) 0.66
0.6 6.77e+00 (1.0e-01) 6.64e+00 (8.8e-02) 1.67e+01 (1.8e-01) 1.61e+01 (1.7e-01) 2.66e-04 (***) 0.67
1.0 1.04e+01 (1.3e-01) 1.04e+01 (1.3e-01) 2.07e+01 (2.4e-01) 2.01e+01 (2.4e-01) 1.85e-04 (***) 0.73
Mat32-1 0.2 1.87e+00 (1.6e-02) 1.87e+00 (1.6e-02) 2.40e+00 (4.5e-02) 2.29e+00 (4.3e-02) 1.30e-03 (**) 0.66
0.4 3.82e+00 (3.5e-02) 3.79e+00 (3.6e-02) 3.99e+00 (1.0e-01) 3.92e+00 (9.6e-02) 2.08e-01 0.63
0.6 5.77e+00 (4.4e-02) 5.76e+00 (4.3e-02) 5.60e+00 (1.6e-01) 5.28e+00 (1.3e-01) 4.84e-03 (**) 0.68
1.0 9.62e+00 (8.2e-02) 9.61e+00 (8.3e-02) 9.01e+00 (3.5e-01) 8.00e+00 (2.6e-01) 1.46e-04 (***) 0.56
Mat32-0.1 0.2 2.05e+00 (3.5e-02) 2.11e+00 (4.4e-02) 5.46e+00 (7.6e-02) 5.54e+00 (1.1e-01) 1.82e-01 0.57
0.4 3.84e+00 (5.4e-02) 3.77e+00 (6.0e-02) 8.83e+00 (1.1e-01) 8.41e+00 (1.3e-01) 8.21e-05 (***) 0.74
0.6 5.64e+00 (7.0e-02) 5.58e+00 (7.6e-02) 1.17e+01 (1.5e-01) 1.13e+01 (1.4e-01) 1.68e-05 (***) 0.69
1.0 9.62e+00 (1.1e-01) 9.53e+00 (9.7e-02) 1.68e+01 (2.5e-01) 1.58e+01 (2.1e-01) 4.67e-10 (***) 0.72
RQ-0.1-0.5 0.2 1.72e+00 (2.9e-02) 1.74e+00 (3.1e-02) 4.14e+00 (6.5e-02) 4.12e+00 (7.4e-02) 3.53e-01 0.61
0.4 3.77e+00 (4.6e-02) 3.68e+00 (4.6e-02) 7.33e+00 (1.1e-01) 6.92e+00 (9.9e-02) 1.44e-06 (***) 0.77
0.6 5.63e+00 (7.2e-02) 5.54e+00 (6.4e-02) 1.03e+01 (1.7e-01) 9.61e+00 (1.6e-01) 2.72e-07 (***) 0.72
1.0 9.55e+00 (1.1e-01) 9.50e+00 (1.2e-01) 1.50e+01 (2.4e-01) 1.44e+01 (2.6e-01) 5.01e-03 (**) 0.68
Periodic-0.1-4 0.2 1.68e+00 (2.8e-02) 1.70e+00 (2.7e-02) 4.27e+00 (7.0e-02) 4.21e+00 (6.6e-02) 2.00e-01 0.6
0.4 3.40e+00 (4.2e-02) 3.48e+00 (5.2e-02) 7.90e+00 (1.1e-01) 7.84e+00 (1.3e-01) 2.81e-01 0.65
0.6 5.42e+00 (7.3e-02) 5.45e+00 (7.0e-02) 1.12e+01 (1.6e-01) 1.08e+01 (1.8e-01) 2.52e-02 (*) 0.65
1.0 9.41e+00 (1.2e-01) 9.31e+00 (1.1e-01) 1.72e+01 (3.0e-01) 1.60e+01 (2.5e-01) 2.03e-10 (***) 0.63
Table 7: Results for comparing the cubic splines and the fitting by monotone decomposition with CV-tuned $\mu$ and fixed $J$. The values are averages over 100 replications, with the standard error of the average in parentheses. The bold values highlight the smaller mean squared prediction error.
curve $\sigma$ MSFE (CubicSpline) MSFE (MonoDecomp) MSPE (CubicSpline) MSPE (MonoDecomp) p-value prop.
$x^2$ 0.1 9.68e-01 (8.2e-03) 9.71e-01 (8.2e-03) 7.38e-01 (3.6e-02) 7.07e-01 (3.3e-02) 1.47e-03 (**) 0.65
0.2 1.92e+00 (1.6e-02) 1.93e+00 (1.6e-02) 1.53e+00 (7.7e-02) 1.45e+00 (7.2e-02) 1.03e-03 (**) 0.74
0.5 4.88e+00 (3.6e-02) 4.90e+00 (3.6e-02) 3.67e+00 (1.5e-01) 3.37e+00 (1.3e-01) 2.20e-06 (***) 0.77
1.0 9.63e+00 (8.2e-02) 9.73e+00 (8.0e-02) 7.12e+00 (3.2e-01) 5.92e+00 (2.5e-01) 1.73e-10 (***) 0.77
$x^3$ 0.1 9.56e-01 (7.6e-03) 9.58e-01 (7.1e-03) 7.32e-01 (3.5e-02) 6.98e-01 (2.9e-02) 1.01e-03 (**) 0.61
0.2 1.93e+00 (1.6e-02) 1.94e+00 (1.5e-02) 1.47e+00 (7.1e-02) 1.38e+00 (5.8e-02) 2.88e-05 (***) 0.69
0.5 4.83e+00 (4.3e-02) 4.86e+00 (4.2e-02) 3.66e+00 (1.5e-01) 3.41e+00 (1.4e-01) 7.49e-03 (**) 0.64
1.0 9.55e+00 (8.1e-02) 9.68e+00 (7.8e-02) 8.03e+00 (3.7e-01) 7.02e+00 (2.7e-01) 2.01e-06 (***) 0.69
$\exp(x)$ 0.1 9.56e-01 (7.4e-03) 9.57e-01 (7.3e-03) 7.51e-01 (3.3e-02) 7.26e-01 (3.0e-02) 3.12e-04 (***) 0.61
0.2 1.93e+00 (1.5e-02) 1.94e+00 (1.5e-02) 1.40e+00 (6.6e-02) 1.32e+00 (5.9e-02) 4.32e-07 (***) 0.67
0.5 4.82e+00 (3.9e-02) 4.84e+00 (3.7e-02) 3.52e+00 (1.7e-01) 3.02e+00 (1.4e-01) 9.36e-11 (***) 0.8
1.0 9.74e+00 (7.6e-02) 9.82e+00 (7.5e-02) 7.47e+00 (3.3e-01) 6.09e+00 (2.5e-01) 6.35e-11 (***) 0.8
sigmoid 0.1 9.53e-01 (7.3e-03) 9.60e-01 (7.5e-03) 8.84e-01 (2.4e-02) 8.55e-01 (2.5e-02) 3.16e-03 (**) 0.68
0.2 1.87e+00 (1.4e-02) 1.89e+00 (1.4e-02) 1.81e+00 (6.2e-02) 1.67e+00 (5.2e-02) 5.12e-05 (***) 0.71
0.5 4.78e+00 (3.8e-02) 4.82e+00 (3.6e-02) 3.83e+00 (1.7e-01) 3.51e+00 (1.3e-01) 2.42e-04 (***) 0.66
1.0 9.50e+00 (7.7e-02) 9.62e+00 (7.7e-02) 7.62e+00 (3.4e-01) 6.36e+00 (2.4e-01) 5.51e-08 (***) 0.61
SE-1 0.1 9.69e-01 (6.9e-03) 9.74e-01 (6.7e-03) 8.68e-01 (3.5e-02) 8.18e-01 (3.0e-02) 1.44e-04 (***) 0.73
0.2 1.92e+00 (1.6e-02) 1.92e+00 (1.6e-02) 1.62e+00 (6.1e-02) 1.55e+00 (5.6e-02) 3.30e-04 (***) 0.66
0.5 4.84e+00 (3.6e-02) 4.88e+00 (3.5e-02) 3.77e+00 (1.7e-01) 3.52e+00 (1.3e-01) 2.43e-03 (**) 0.61
1.0 9.69e+00 (6.9e-02) 9.77e+00 (6.7e-02) 7.31e+00 (3.1e-01) 6.15e+00 (2.5e-01) 2.24e-09 (***) 0.63
SE-0.1 0.1 8.51e-01 (1.1e-02) 9.02e-01 (1.9e-02) 1.88e+00 (3.1e-02) 1.98e+00 (6.3e-02) 3.83e-02 (*) 0.57
0.2 1.73e+00 (1.6e-02) 1.79e+00 (3.1e-02) 3.43e+00 (4.8e-02) 3.48e+00 (1.1e-01) 2.94e-01 0.73
0.5 4.47e+00 (5.0e-02) 4.58e+00 (6.0e-02) 7.77e+00 (1.4e-01) 7.73e+00 (1.7e-01) 3.74e-01 0.67
1.0 9.31e+00 (8.4e-02) 9.52e+00 (8.4e-02) 1.45e+01 (2.6e-01) 1.41e+01 (2.5e-01) 3.91e-04 (***) 0.76
Mat12-1 0.1 1.45e+00 (2.5e-02) 1.56e+00 (2.4e-02) 4.42e+00 (5.5e-02) 4.61e+00 (5.8e-02) 2.37e-07 (***) 0.37
0.2 2.19e+00 (3.5e-02) 2.30e+00 (3.3e-02) 5.46e+00 (5.7e-02) 5.50e+00 (6.5e-02) 1.78e-01 0.64
0.5 4.95e+00 (5.5e-02) 5.05e+00 (5.2e-02) 8.18e+00 (9.8e-02) 8.04e+00 (1.1e-01) 1.52e-02 (*) 0.73
1.0 9.80e+00 (9.1e-02) 9.94e+00 (9.0e-02) 1.21e+01 (2.6e-01) 1.16e+01 (2.2e-01) 6.76e-04 (***) 0.74
Mat12-0.1 0.1 3.41e+00 (6.6e-02) 3.83e+00 (8.4e-02) 1.16e+01 (1.4e-01) 1.25e+01 (2.3e-01) 1.85e-06 (***) 0.23
0.2 3.75e+00 (7.9e-02) 4.17e+00 (9.4e-02) 1.24e+01 (1.7e-01) 1.32e+01 (2.3e-01) 2.05e-07 (***) 0.27
0.5 5.91e+00 (9.0e-02) 6.36e+00 (1.0e-01) 1.56e+01 (1.6e-01) 1.62e+01 (2.4e-01) 1.01e-03 (**) 0.35
1.0 1.04e+01 (1.4e-01) 1.08e+01 (1.2e-01) 2.11e+01 (2.2e-01) 2.11e+01 (2.3e-01) 4.48e-01 0.48
Mat32-1 0.1 9.39e-01 (9.2e-03) 9.51e-01 (8.3e-03) 1.33e+00 (2.3e-02) 1.29e+00 (2.0e-02) 5.22e-04 (***) 0.7
0.2 1.91e+00 (1.7e-02) 1.92e+00 (1.6e-02) 2.27e+00 (4.7e-02) 2.18e+00 (4.2e-02) 1.76e-04 (***) 0.71
0.5 4.79e+00 (4.0e-02) 4.84e+00 (3.8e-02) 5.01e+00 (1.4e-01) 4.53e+00 (1.1e-01) 3.31e-07 (***) 0.66
1.0 9.59e+00 (8.2e-02) 9.67e+00 (8.1e-02) 8.49e+00 (2.9e-01) 7.58e+00 (2.6e-01) 3.94e-08 (***) 0.67
Mat32-0.1 0.1 1.25e+00 (3.3e-02) 1.42e+00 (4.2e-02) 3.86e+00 (8.2e-02) 4.27e+00 (1.3e-01) 1.67e-05 (***) 0.42
0.2 1.97e+00 (3.8e-02) 2.23e+00 (6.7e-02) 5.41e+00 (7.8e-02) 5.95e+00 (1.9e-01) 1.59e-03 (**) 0.53
0.5 4.66e+00 (5.9e-02) 4.97e+00 (6.9e-02) 1.03e+01 (1.2e-01) 1.06e+01 (1.8e-01) 2.05e-02 (*) 0.55
1.0 9.55e+00 (1.2e-01) 9.82e+00 (1.2e-01) 1.69e+01 (2.3e-01) 1.67e+01 (2.7e-01) 4.00e-02 (*) 0.68
RQ-0.1-0.5 0.1 8.92e-01 (1.9e-02) 9.95e-01 (3.3e-02) 2.37e+00 (4.1e-02) 2.58e+00 (1.0e-01) 1.91e-02 (*) 0.51
0.2 1.82e+00 (2.7e-02) 1.95e+00 (3.7e-02) 4.15e+00 (7.0e-02) 4.36e+00 (1.1e-01) 6.07e-03 (**) 0.57
0.5 4.60e+00 (6.2e-02) 4.84e+00 (7.4e-02) 9.08e+00 (1.5e-01) 8.99e+00 (2.0e-01) 2.66e-01 0.61
1.0 9.30e+00 (9.9e-02) 9.57e+00 (9.8e-02) 1.48e+01 (2.3e-01) 1.43e+01 (2.2e-01) 6.64e-03 (**) 0.74
Periodic-0.1-4 0.1 8.82e-01 (3.6e-02) 9.46e-01 (3.8e-02) 2.44e+00 (1.1e-01) 2.58e+00 (1.2e-01) 2.75e-03 (**) 0.54
0.2 1.65e+00 (2.6e-02) 1.80e+00 (4.4e-02) 4.26e+00 (6.2e-02) 4.47e+00 (1.3e-01) 3.92e-02 (*) 0.67
0.5 4.34e+00 (5.1e-02) 4.54e+00 (6.0e-02) 9.44e+00 (1.3e-01) 9.41e+00 (1.9e-01) 4.14e-01 0.59
1.0 9.25e+00 (1.3e-01) 9.64e+00 (1.3e-01) 1.76e+01 (2.6e-01) 1.72e+01 (2.9e-01) 3.24e-02 (*) 0.66

A.4 Monotone Decomposition Fitting with Smoothing Splines

A.4.1 A complete version for Table 2

Table 8: Results for comparing the smoothing splines with CV-tuned $\lambda$ and the fitting by monotone decomposition with CV-tuned $(\lambda, \mu)$. The values are averages over 100 replications, with the standard error of the average in parentheses. The bold values highlight the smaller mean squared prediction error.
curve $\sigma$ MSFE (SmoothSpline) MSFE (MonoDecomp) MSPE (SmoothSpline) MSPE (MonoDecomp) p-value prop.
$x^2$ 0.1 9.57e-01 (8.3e-03) 9.73e-01 (8.2e-03) 7.38e-01 (2.4e-02) 8.67e-01 (2.4e-02) 1.74e-14 (***) 0.25
0.5 4.79e+00 (4.1e-02) 4.80e+00 (4.1e-02) 3.41e+00 (1.2e-01) 3.39e+00 (1.1e-01) 3.61e-01 0.46
1.0 9.65e+00 (8.9e-02) 9.69e+00 (8.5e-02) 6.44e+00 (2.9e-01) 6.35e+00 (2.5e-01) 1.68e-01 0.49
1.5 1.44e+01 (1.1e-01) 1.45e+01 (1.1e-01) 9.31e+00 (3.9e-01) 9.14e+00 (3.4e-01) 4.68e-02 (*) 0.6
2.0 1.92e+01 (1.6e-01) 1.92e+01 (1.5e-01) 1.23e+01 (4.9e-01) 1.15e+01 (4.3e-01) 4.45e-06 (***) 0.81
$x^3$ 0.1 9.46e-01 (8.0e-03) 9.89e-01 (8.0e-03) 8.65e-01 (2.4e-02) 1.11e+00 (2.5e-02) 0.00e+00 (***) 0.2
0.5 4.86e+00 (4.4e-02) 4.88e+00 (4.3e-02) 3.65e+00 (1.2e-01) 3.55e+00 (1.2e-01) 9.06e-03 (**) 0.58
1.0 9.75e+00 (8.2e-02) 9.77e+00 (8.2e-02) 6.47e+00 (1.8e-01) 6.21e+00 (1.7e-01) 1.18e-04 (***) 0.65
1.5 1.45e+01 (1.1e-01) 1.46e+01 (1.1e-01) 9.16e+00 (3.1e-01) 8.67e+00 (2.8e-01) 1.67e-06 (***) 0.81
2.0 1.92e+01 (1.8e-01) 1.92e+01 (1.6e-01) 1.16e+01 (5.9e-01) 1.09e+01 (4.9e-01) 1.15e-04 (***) 0.78
$\exp(x)$ 0.1 9.53e-01 (7.6e-03) 9.64e-01 (7.7e-03) 7.96e-01 (2.6e-02) 9.13e-01 (2.6e-02) 5.02e-12 (***) 0.26
0.5 4.80e+00 (4.3e-02) 4.82e+00 (3.9e-02) 3.33e+00 (1.5e-01) 3.30e+00 (1.3e-01) 2.60e-01 0.51
1.0 9.74e+00 (8.4e-02) 9.75e+00 (8.4e-02) 5.94e+00 (2.3e-01) 5.82e+00 (2.1e-01) 3.12e-02 (*) 0.58
1.5 1.45e+01 (1.3e-01) 1.46e+01 (1.3e-01) 9.10e+00 (4.4e-01) 8.83e+00 (4.1e-01) 7.93e-03 (**) 0.69
2.0 1.95e+01 (1.5e-01) 1.95e+01 (1.5e-01) 1.08e+01 (4.9e-01) 1.06e+01 (4.4e-01) 3.80e-02 (*) 0.57
sigmoid 0.1 9.58e-01 (8.0e-03) 9.58e-01 (7.9e-03) 7.75e-01 (2.7e-02) 7.69e-01 (2.5e-02) 1.81e-01 0.56
0.5 4.82e+00 (4.2e-02) 4.82e+00 (4.2e-02) 3.55e+00 (1.3e-01) 3.49e+00 (1.2e-01) 2.45e-02 (*) 0.61
1.0 9.60e+00 (7.8e-02) 9.64e+00 (7.5e-02) 5.99e+00 (2.9e-01) 5.68e+00 (2.4e-01) 4.47e-04 (***) 0.67
1.5 1.43e+01 (1.6e-01) 1.44e+01 (1.5e-01) 8.94e+00 (5.4e-01) 8.36e+00 (4.9e-01) 5.86e-07 (***) 0.65
2.0 1.93e+01 (1.6e-01) 1.94e+01 (1.5e-01) 1.15e+01 (6.4e-01) 1.08e+01 (5.4e-01) 2.06e-05 (***) 0.7
SE-1 0.1 9.66e-01 (7.5e-03) 9.71e-01 (7.4e-03) 7.54e-01 (2.4e-02) 8.19e-01 (2.6e-02) 7.24e-08 (***) 0.26
0.5 4.88e+00 (3.9e-02) 4.89e+00 (3.8e-02) 3.15e+00 (1.2e-01) 3.10e+00 (1.1e-01) 5.92e-02 (.) 0.55
1.0 9.67e+00 (8.3e-02) 9.70e+00 (8.1e-02) 6.32e+00 (2.8e-01) 6.11e+00 (2.5e-01) 3.86e-03 (**) 0.6
1.5 1.47e+01 (1.1e-01) 1.47e+01 (1.1e-01) 8.65e+00 (3.8e-01) 8.39e+00 (3.4e-01) 2.48e-02 (*) 0.55
2.0 1.93e+01 (1.5e-01) 1.94e+01 (1.5e-01) 1.15e+01 (4.3e-01) 1.10e+01 (3.8e-01) 6.24e-04 (***) 0.7
SE-0.1 0.1 7.68e-01 (9.1e-03) 7.94e-01 (9.4e-03) 1.71e+00 (2.7e-02) 1.79e+00 (3.0e-02) 3.41e-09 (***) 0.24
0.5 4.36e+00 (4.4e-02) 4.40e+00 (4.5e-02) 6.65e+00 (1.0e-01) 6.72e+00 (1.0e-01) 2.24e-03 (**) 0.42
1.0 8.92e+00 (9.2e-02) 8.99e+00 (9.0e-02) 1.23e+01 (2.1e-01) 1.23e+01 (2.1e-01) 4.32e-01 0.57
1.5 1.41e+01 (1.3e-01) 1.42e+01 (1.3e-01) 1.76e+01 (3.1e-01) 1.72e+01 (3.1e-01) 5.96e-05 (***) 0.64
2.0 1.89e+01 (1.7e-01) 1.91e+01 (1.7e-01) 2.16e+01 (4.3e-01) 2.11e+01 (4.3e-01) 2.13e-04 (***) 0.68
Mat12-1 0.1 1.16e+00 (2.0e-02) 1.23e+00 (1.9e-02) 3.71e+00 (2.7e-02) 3.80e+00 (3.1e-02) 5.72e-14 (***) 0.15
0.5 4.80e+00 (4.9e-02) 4.87e+00 (4.5e-02) 7.50e+00 (8.3e-02) 7.51e+00 (8.3e-02) 3.83e-01 0.4
1.0 9.65e+00 (9.5e-02) 9.69e+00 (9.2e-02) 1.15e+01 (2.3e-01) 1.13e+01 (2.1e-01) 3.40e-04 (***) 0.56
1.5 1.43e+01 (1.1e-01) 1.43e+01 (1.1e-01) 1.40e+01 (3.0e-01) 1.38e+01 (2.8e-01) 3.13e-03 (**) 0.64
2.0 1.96e+01 (1.6e-01) 1.97e+01 (1.6e-01) 1.69e+01 (4.4e-01) 1.64e+01 (3.7e-01) 1.97e-04 (***) 0.69
Mat12-0.1 0.1 2.60e+00 (3.9e-02) 2.88e+00 (3.8e-02) 9.86e+00 (7.0e-02) 1.03e+01 (8.1e-02) 0.00e+00 (***) 0.06
0.5 4.81e+00 (6.4e-02) 5.11e+00 (6.1e-02) 1.35e+01 (8.5e-02) 1.37e+01 (9.4e-02) 6.90e-06 (***) 0.3
1.0 9.76e+00 (1.2e-01) 9.92e+00 (1.1e-01) 1.90e+01 (2.2e-01) 1.90e+01 (2.2e-01) 2.06e-01 0.58
1.5 1.46e+01 (1.7e-01) 1.48e+01 (1.6e-01) 2.29e+01 (2.8e-01) 2.25e+01 (2.7e-01) 1.21e-05 (***) 0.69
2.0 1.93e+01 (1.8e-01) 1.95e+01 (1.7e-01) 2.70e+01 (4.1e-01) 2.65e+01 (3.6e-01) 3.05e-04 (***) 0.66
Mat32-1 0.1 9.15e-01 (8.0e-03) 9.20e-01 (7.7e-03) 1.18e+00 (2.1e-02) 1.19e+00 (2.2e-02) 5.89e-02 (.) 0.45
0.5 4.86e+00 (4.2e-02) 4.88e+00 (4.0e-02) 4.35e+00 (1.3e-01) 4.32e+00 (1.2e-01) 2.45e-01 0.52
1.0 9.59e+00 (8.5e-02) 9.64e+00 (7.5e-02) 7.20e+00 (2.4e-01) 7.04e+00 (2.0e-01) 3.65e-02 (*) 0.49
1.5 1.45e+01 (1.3e-01) 1.46e+01 (1.3e-01) 1.03e+01 (3.7e-01) 1.00e+01 (3.3e-01) 7.41e-03 (**) 0.65
2.0 1.95e+01 (1.7e-01) 1.96e+01 (1.7e-01) 1.26e+01 (5.1e-01) 1.19e+01 (4.4e-01) 6.29e-05 (***) 0.68
Mat32-0.1 0.1 8.83e-01 (1.4e-02) 9.47e-01 (1.4e-02) 2.92e+00 (2.4e-02) 3.02e+00 (2.6e-02) 3.54e-10 (***) 0.21
0.5 4.22e+00 (5.3e-02) 4.37e+00 (5.0e-02) 8.99e+00 (9.9e-02) 9.03e+00 (9.9e-02) 1.82e-01 0.45
1.0 8.96e+00 (1.1e-01) 9.14e+00 (1.0e-01) 1.48e+01 (2.2e-01) 1.47e+01 (2.1e-01) 1.44e-01 0.54
1.5 1.41e+01 (1.6e-01) 1.43e+01 (1.6e-01) 1.99e+01 (3.5e-01) 1.97e+01 (3.4e-01) 8.29e-02 (.) 0.59
2.0 1.92e+01 (1.9e-01) 1.94e+01 (1.8e-01) 2.36e+01 (4.4e-01) 2.29e+01 (3.9e-01) 6.38e-08 (***) 0.74
RQ-0.1-0.5 0.1 7.21e-01 (9.5e-03) 7.52e-01 (1.0e-02) 2.04e+00 (2.1e-02) 2.07e+00 (2.2e-02) 1.35e-04 (***) 0.38
0.5 4.31e+00 (4.8e-02) 4.41e+00 (4.7e-02) 7.50e+00 (9.8e-02) 7.55e+00 (9.5e-02) 1.36e-01 0.46
1.0 9.10e+00 (1.1e-01) 9.22e+00 (1.1e-01) 1.27e+01 (2.4e-01) 1.25e+01 (2.2e-01) 1.21e-03 (**) 0.6
1.5 1.43e+01 (1.5e-01) 1.44e+01 (1.5e-01) 1.75e+01 (3.3e-01) 1.74e+01 (3.1e-01) 1.29e-01 0.5
2.0 1.91e+01 (1.9e-01) 1.92e+01 (1.8e-01) 2.10e+01 (5.4e-01) 2.07e+01 (5.0e-01) 1.08e-02 (*) 0.63
Periodic-0.1-4 0.1 7.11e-01 (8.8e-03) 7.44e-01 (1.0e-02) 2.07e+00 (2.7e-02) 2.11e+00 (2.6e-02) 3.37e-06 (***) 0.27
0.5 4.07e+00 (4.6e-02) 4.20e+00 (4.6e-02) 8.20e+00 (1.2e-01) 8.35e+00 (1.1e-01) 1.66e-03 (**) 0.27
1.0 8.69e+00 (1.0e-01) 8.89e+00 (9.1e-02) 1.44e+01 (2.1e-01) 1.43e+01 (1.9e-01) 3.03e-01 0.49
1.5 1.39e+01 (1.7e-01) 1.42e+01 (1.6e-01) 2.06e+01 (2.7e-01) 2.02e+01 (2.7e-01) 1.96e-05 (***) 0.67
2.0 1.90e+01 (1.8e-01) 1.92e+01 (1.8e-01) 2.47e+01 (3.9e-01) 2.45e+01 (3.8e-01) 4.46e-02 (*) 0.63

A.4.2 Tune $\mu$ with fixed $\lambda$

First, we fix the roughness-penalty parameter $\lambda$ at the value tuned by CV for the smoothing splines. Then, we choose the parameter $\mu$ to minimize the CV error. Figure 11 demonstrates the procedure: the left panel shows the CV-error curve over the candidate values of $\mu$, and the right panel compares the smoothing-spline fit with the monotone decomposition fit at the $\mu$ minimizing that curve. The monotone decomposition achieves a better MSPE; judging from the shapes of the fitted curves, the improvement stems mainly from the shrinkage of the local modes. A generic sketch of this tuning step is given below.
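The tuning loop itself is a plain grid search; the following sketch (ours, not the package's API; `fit_md` is a hypothetical solver that fits the monotone decomposition at fixed $(\lambda, \mu)$ and returns a predictor) outlines the leave-one-out step:

```julia
# Choose μ on a grid by leave-one-out CV, with λ frozen at the
# smoothing-spline CV choice. `fit_md(x, y, λ, μ)` is hypothetical.
function tune_mu(x, y, λ, μgrid, fit_md)
    n = length(x)
    cverr = map(μgrid) do μ
        sum(1:n) do i
            idx = setdiff(1:n, i)               # leave observation i out
            fhat = fit_md(x[idx], y[idx], λ, μ) # fit on the remaining n-1 points
            (y[i] - fhat(x[i]))^2               # squared prediction error at x_i
        end / n
    end
    return μgrid[argmin(cverr)]                 # μ minimizing the LOOCV error
end
```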

Figure 11: Demo for monotone decomposition with smoothing splines. The left panel shows the leave-one-out cross-validation error curve for each candidate $\mu$ when $\lambda$ is fixed to that of the smoothing splines. The right panel shows the corresponding fitted curves, together with the truth and the noisy observations.

The average results based on 100 repetitions are summarized in Table 9. The table shows that we obtain better performance in the high-noise cases and comparable results in the lower-noise scenarios.

Table 9: Results for comparing the smoothing splines with CV-tuned $\lambda$ and the fitting by monotone decomposition with CV-tuned $\mu$ and the same $\lambda$. The values are averages over 100 replications, with the standard error of the average in parentheses. The bold values highlight the smaller mean squared prediction error.
curve $\sigma$ SNR MSFE (SmoothSpline) MSFE (MonoDecomp) MSPE (SmoothSpline) MSPE (MonoDecomp) p-value prop.
$x^2$ 0.1 1.03e+01 (1.9e-01) 9.30e-03 (1.4e-04) 9.55e-03 (1.4e-04) 6.78e-04 (4.8e-05) 9.03e-04 (5.4e-05) 7.17e-12 (***) 0.24
0.2 2.68e+00 (7.4e-02) 3.72e-02 (6.4e-04) 3.76e-02 (6.3e-04) 2.15e-03 (1.8e-04) 2.29e-03 (1.7e-04) 1.03e-02 (*) 0.33
0.5 5.04e-01 (2.1e-02) 2.35e-01 (3.8e-03) 2.37e-01 (3.6e-03) 1.36e-02 (1.4e-03) 1.23e-02 (9.5e-04) 3.49e-02 (*) 0.44
1.0 1.75e-01 (1.0e-02) 9.48e-01 (1.5e-02) 9.59e-01 (1.5e-02) 5.17e-02 (3.0e-03) 4.81e-02 (2.7e-03) 7.22e-03 (**) 0.48
1.5 1.23e-01 (1.2e-02) 2.07e+00 (3.5e-02) 2.09e+00 (3.6e-02) 1.03e-01 (1.0e-02) 9.57e-02 (8.8e-03) 1.14e-02 (*) 0.57
2.0 1.13e-01 (2.3e-02) 3.82e+00 (5.9e-02) 3.87e+00 (5.6e-02) 1.94e-01 (2.4e-02) 1.55e-01 (1.6e-02) 1.98e-04 (***) 0.63
$x^3$ 0.1 1.76e+01 (3.3e-01) 8.80e-03 (1.3e-04) 9.92e-03 (1.4e-04) 8.51e-04 (4.8e-05) 1.70e-03 (7.3e-05) 0.00e+00 (***) 0.15
0.2 4.51e+00 (1.1e-01) 3.62e-02 (6.0e-04) 3.74e-02 (6.1e-04) 2.89e-03 (1.5e-04) 3.11e-03 (1.6e-04) 4.74e-02 (*) 0.51
0.5 7.70e-01 (2.3e-02) 2.33e-01 (3.5e-03) 2.36e-01 (3.5e-03) 1.51e-02 (9.3e-04) 1.36e-02 (8.0e-04) 3.25e-04 (***) 0.52
1.0 2.43e-01 (1.6e-02) 9.44e-01 (1.6e-02) 9.54e-01 (1.6e-02) 5.32e-02 (3.9e-03) 4.63e-02 (2.9e-03) 1.77e-04 (***) 0.69
1.5 1.58e-01 (1.6e-02) 2.09e+00 (3.7e-02) 2.12e+00 (3.5e-02) 1.11e-01 (1.1e-02) 9.53e-02 (1.1e-02) 1.90e-05 (***) 0.69
2.0 1.36e-01 (2.1e-02) 3.75e+00 (7.0e-02) 3.80e+00 (6.6e-02) 1.74e-01 (2.1e-02) 1.37e-01 (1.6e-02) 1.55e-04 (***) 0.74
$\exp(x)$ 0.1 5.10e+01 (1.1e+00) 9.06e-03 (1.7e-04) 9.42e-03 (1.6e-04) 6.66e-04 (5.0e-05) 9.11e-04 (5.6e-05) 3.13e-06 (***) 0.36
0.2 1.26e+01 (3.1e-01) 3.65e-02 (6.4e-04) 3.69e-02 (6.0e-04) 2.35e-03 (2.6e-04) 2.34e-03 (1.8e-04) 4.72e-01 0.37
0.5 1.98e+00 (4.5e-02) 2.40e-01 (3.9e-03) 2.41e-01 (3.8e-03) 1.16e-02 (9.4e-04) 1.10e-02 (7.9e-04) 4.03e-02 (*) 0.52
1.0 5.89e-01 (2.2e-02) 9.07e-01 (1.4e-02) 9.14e-01 (1.3e-02) 5.39e-02 (5.7e-03) 4.79e-02 (4.5e-03) 3.84e-04 (***) 0.55
1.5 3.02e-01 (1.9e-02) 2.11e+00 (3.6e-02) 2.13e+00 (3.5e-02) 1.02e-01 (1.0e-02) 9.10e-02 (8.7e-03) 5.87e-04 (***) 0.62
2.0 2.00e-01 (2.9e-02) 3.79e+00 (6.3e-02) 3.81e+00 (6.2e-02) 1.71e-01 (2.7e-02) 1.53e-01 (2.4e-02) 1.71e-04 (***) 0.72
sigmoid 0.1 1.70e+01 (3.2e-01) 9.31e-03 (1.4e-04) 9.36e-03 (1.4e-04) 6.33e-04 (4.3e-05) 6.03e-04 (3.3e-05) 5.12e-02 (.) 0.55
0.2 4.38e+00 (6.9e-02) 3.66e-02 (4.6e-04) 3.68e-02 (4.6e-04) 2.44e-03 (1.7e-04) 2.39e-03 (1.6e-04) 7.53e-02 (.) 0.61
0.5 7.87e-01 (2.7e-02) 2.32e-01 (4.0e-03) 2.33e-01 (3.9e-03) 1.50e-02 (1.0e-03) 1.34e-02 (7.9e-04) 1.59e-05 (***) 0.63
1.0 2.31e-01 (1.3e-02) 9.44e-01 (1.6e-02) 9.51e-01 (1.6e-02) 3.96e-02 (3.9e-03) 3.43e-02 (3.1e-03) 8.84e-06 (***) 0.67
1.5 1.70e-01 (1.5e-02) 2.09e+00 (3.3e-02) 2.12e+00 (3.1e-02) 1.12e-01 (1.0e-02) 8.82e-02 (7.8e-03) 4.95e-05 (***) 0.64
2.0 1.26e-01 (1.9e-02) 3.69e+00 (6.5e-02) 3.73e+00 (6.3e-02) 1.67e-01 (2.4e-02) 1.32e-01 (1.7e-02) 1.58e-04 (***) 0.75
SE-1 0.1 3.43e+01 (4.3e+00) 9.02e-03 (1.6e-04) 9.16e-03 (1.5e-04) 7.53e-04 (7.0e-05) 7.32e-04 (5.5e-05) 2.40e-01 0.43
0.2 5.41e+00 (6.9e-01) 3.77e-02 (6.6e-04) 3.78e-02 (6.5e-04) 2.31e-03 (2.0e-04) 2.36e-03 (1.9e-04) 2.04e-01 0.4
0.5 1.20e+00 (1.4e-01) 2.38e-01 (3.9e-03) 2.39e-01 (3.8e-03) 1.24e-02 (1.2e-03) 1.12e-02 (9.5e-04) 7.34e-03 (**) 0.51
1.0 3.59e-01 (3.9e-02) 9.38e-01 (1.6e-02) 9.50e-01 (1.6e-02) 4.74e-02 (3.6e-03) 4.12e-02 (2.8e-03) 4.67e-04 (***) 0.63
1.5 1.68e-01 (1.9e-02) 2.13e+00 (3.2e-02) 2.15e+00 (3.2e-02) 9.58e-02 (8.1e-03) 8.60e-02 (6.7e-03) 3.72e-04 (***) 0.6
2.0 1.18e-01 (1.2e-02) 3.68e+00 (5.9e-02) 3.71e+00 (5.7e-02) 1.69e-01 (1.6e-02) 1.48e-01 (1.2e-02) 2.36e-04 (***) 0.75
SE-0.1 0.1 1.50e+02 (6.4e+00) 6.36e-03 (1.3e-04) 7.06e-03 (1.8e-04) 3.00e-03 (7.7e-05) 3.51e-03 (1.6e-04) 3.81e-05 (***) 0.25
0.2 3.55e+01 (1.9e+00) 2.79e-02 (5.5e-04) 2.90e-02 (5.7e-04) 1.00e-02 (3.1e-04) 1.05e-02 (3.3e-04) 6.47e-05 (***) 0.39
0.5 4.98e+00 (2.3e-01) 1.97e-01 (3.9e-03) 2.02e-01 (3.7e-03) 4.70e-02 (1.7e-03) 4.74e-02 (1.5e-03) 2.98e-01 0.38
1.0 1.30e+00 (6.9e-02) 8.23e-01 (1.6e-02) 8.54e-01 (1.5e-02) 1.54e-01 (5.3e-03) 1.55e-01 (4.6e-03) 3.62e-01 0.35
1.5 6.35e-01 (3.9e-02) 1.96e+00 (4.0e-02) 2.02e+00 (4.0e-02) 3.20e-01 (1.2e-02) 3.23e-01 (1.3e-02) 2.79e-01 0.51
2.0 3.69e-01 (2.3e-02) 3.71e+00 (6.7e-02) 3.79e+00 (6.8e-02) 4.69e-01 (1.8e-02) 4.62e-01 (1.7e-02) 1.57e-01 0.54
Mat12-1 0.1 3.51e+01 (2.7e+00) 1.41e-02 (4.1e-04) 1.58e-02 (4.3e-04) 1.40e-02 (2.3e-04) 1.48e-02 (2.6e-04) 3.30e-10 (***) 0.2
0.2 1.29e+01 (9.7e-01) 3.80e-02 (1.0e-03) 4.17e-02 (9.8e-04) 2.39e-02 (4.6e-04) 2.44e-02 (5.0e-04) 1.39e-02 (*) 0.43
0.5 1.97e+00 (1.5e-01) 2.27e-01 (4.6e-03) 2.38e-01 (4.4e-03) 5.88e-02 (1.7e-03) 5.78e-02 (1.5e-03) 1.09e-01 0.54
1.0 6.51e-01 (5.1e-02) 9.19e-01 (1.6e-02) 9.40e-01 (1.5e-02) 1.26e-01 (4.8e-03) 1.20e-01 (4.1e-03) 1.12e-03 (**) 0.57
1.5 3.52e-01 (4.4e-02) 2.10e+00 (3.4e-02) 2.14e+00 (3.1e-02) 2.13e-01 (1.2e-02) 1.97e-01 (9.0e-03) 2.02e-02 (*) 0.62
2.0 1.71e-01 (1.9e-02) 3.76e+00 (6.0e-02) 3.81e+00 (5.9e-02) 2.94e-01 (2.1e-02) 2.75e-01 (1.8e-02) 1.00e-03 (**) 0.61
Mat12-0.1 0.1 1.51e+01 (8.5e-01) 6.84e-02 (2.5e-03) 8.71e-02 (3.1e-03) 9.73e-02 (1.8e-03) 1.10e-01 (2.3e-03) 7.39e-13 (***) 0.1
0.2 9.80e+00 (5.0e-01) 9.35e-02 (2.9e-03) 1.12e-01 (3.5e-03) 1.14e-01 (1.7e-03) 1.26e-01 (2.5e-03) 3.16e-11 (***) 0.1
0.5 3.90e+00 (2.3e-01) 2.55e-01 (7.8e-03) 2.90e-01 (7.9e-03) 1.92e-01 (3.0e-03) 2.02e-01 (3.5e-03) 8.52e-08 (***) 0.2
1.0 1.24e+00 (7.2e-02) 9.43e-01 (2.5e-02) 1.01e+00 (2.3e-02) 3.62e-01 (7.5e-03) 3.65e-01 (7.5e-03) 2.24e-01 0.25
1.5 5.39e-01 (4.7e-02) 2.15e+00 (5.0e-02) 2.23e+00 (4.6e-02) 5.26e-01 (1.7e-02) 5.14e-01 (1.3e-02) 8.29e-02 (.) 0.51
2.0 3.22e-01 (3.3e-02) 3.85e+00 (8.0e-02) 3.94e+00 (7.4e-02) 7.69e-01 (2.3e-02) 7.36e-01 (2.2e-02) 7.18e-04 (***) 0.5
Mat32-1 0.1 3.78e+01 (3.8e+00) 8.48e-03 (1.6e-04) 8.70e-03 (1.6e-04) 1.50e-03 (5.1e-05) 1.57e-03 (5.1e-05) 1.71e-02 (*) 0.4
0.2 1.11e+01 (1.0e+00) 3.50e-02 (6.3e-04) 3.57e-02 (6.0e-04) 4.32e-03 (2.0e-04) 4.37e-03 (1.9e-04) 2.68e-01 0.47
0.5 1.70e+00 (1.6e-01) 2.32e-01 (3.6e-03) 2.35e-01 (3.6e-03) 1.95e-02 (1.1e-03) 1.87e-02 (9.8e-04) 1.28e-02 (*) 0.48
1.0 4.05e-01 (3.9e-02) 9.54e-01 (1.4e-02) 9.65e-01 (1.4e-02) 6.44e-02 (4.2e-03) 6.09e-02 (3.9e-03) 5.44e-03 (**) 0.57
1.5 2.24e-01 (2.4e-02) 2.17e+00 (3.9e-02) 2.18e+00 (3.8e-02) 1.13e-01 (1.1e-02) 1.01e-01 (8.2e-03) 1.09e-03 (**) 0.6
2.0 1.55e-01 (1.4e-02) 3.79e+00 (6.3e-02) 3.84e+00 (6.2e-02) 1.94e-01 (1.6e-02) 1.62e-01 (1.2e-02) 4.61e-05 (***) 0.74
Mat32-0.1 0.1 1.30e+02 (6.8e+00) 7.72e-03 (2.2e-04) 9.77e-03 (3.8e-04) 8.67e-03 (2.0e-04) 9.84e-03 (3.0e-04) 2.79e-05 (***) 0.27
0.2 4.25e+01 (2.6e+00) 2.47e-02 (7.1e-04) 2.88e-02 (8.5e-04) 2.04e-02 (3.6e-04) 2.15e-02 (5.4e-04) 3.83e-03 (**) 0.32
0.5 5.49e+00 (2.8e-01) 1.81e-01 (4.9e-03) 2.01e-01 (5.2e-03) 8.16e-02 (2.2e-03) 8.51e-02 (2.4e-03) 1.74e-03 (**) 0.4
1.0 1.34e+00 (6.7e-02) 8.44e-01 (2.0e-02) 9.01e-01 (1.9e-02) 2.44e-01 (7.8e-03) 2.40e-01 (7.6e-03) 6.21e-02 (.) 0.41
1.5 6.73e-01 (4.4e-02) 1.99e+00 (4.1e-02) 2.07e+00 (3.9e-02) 3.98e-01 (1.6e-02) 3.82e-01 (1.3e-02) 4.94e-03 (**) 0.5
2.0 3.67e-01 (3.0e-02) 3.76e+00 (6.6e-02) 3.87e+00 (6.2e-02) 5.92e-01 (2.6e-02) 5.64e-01 (2.0e-02) 8.86e-03 (**) 0.6
RQ-0.1-0.5 0.1 1.23e+02 (6.6e+00) 5.70e-03 (1.5e-04) 6.56e-03 (1.9e-04) 4.33e-03 (8.3e-05) 4.71e-03 (1.0e-04) 2.57e-07 (***) 0.28
0.2 3.44e+01 (2.0e+00) 2.47e-02 (6.1e-04) 2.66e-02 (6.3e-04) 1.30e-02 (3.2e-04) 1.34e-02 (3.3e-04) 2.92e-03 (**) 0.34
0.5 4.27e+00 (2.4e-01) 1.94e-01 (4.5e-03) 2.10e-01 (5.0e-03) 5.97e-02 (1.6e-03) 6.17e-02 (1.7e-03) 6.21e-02 (.) 0.41
1.0 1.07e+00 (6.2e-02) 8.53e-01 (1.8e-02) 8.92e-01 (1.7e-02) 1.72e-01 (6.8e-03) 1.70e-01 (6.1e-03) 3.52e-01 0.43
1.5 4.43e-01 (3.3e-02) 2.11e+00 (3.9e-02) 2.15e+00 (3.8e-02) 3.05e-01 (1.0e-02) 3.02e-01 (1.0e-02) 3.03e-01 0.49
2.0 3.07e-01 (2.4e-02) 3.83e+00 (6.6e-02) 3.92e+00 (6.6e-02) 4.56e-01 (1.8e-02) 4.43e-01 (1.6e-02) 3.16e-02 (*) 0.54
Periodic-0.1-4 0.1 1.90e+02 (7.8e+00) 5.22e-03 (1.3e-04) 6.09e-03 (2.5e-04) 4.37e-03 (9.4e-05) 4.99e-03 (2.3e-04) 1.19e-03 (**) 0.34
0.2 4.32e+01 (2.3e+00) 2.35e-02 (4.9e-04) 2.60e-02 (6.4e-04) 1.45e-02 (3.6e-04) 1.54e-02 (4.9e-04) 2.14e-03 (**) 0.3
0.5 6.84e+00 (3.2e-01) 1.66e-01 (4.2e-03) 1.76e-01 (4.6e-03) 6.68e-02 (1.8e-03) 6.84e-02 (1.9e-03) 3.44e-02 (*) 0.51
1.0 1.59e+00 (8.1e-02) 7.92e-01 (1.9e-02) 8.49e-01 (1.9e-02) 2.27e-01 (7.4e-03) 2.34e-01 (7.8e-03) 5.08e-02 (.) 0.48
1.5 6.94e-01 (5.0e-02) 1.96e+00 (4.3e-02) 2.06e+00 (4.4e-02) 4.40e-01 (1.4e-02) 4.35e-01 (1.4e-02) 2.60e-01 0.55
2.0 4.22e-01 (3.9e-02) 3.62e+00 (8.5e-02) 3.72e+00 (8.4e-02) 6.51e-01 (2.3e-02) 6.30e-01 (2.0e-02) 1.19e-02 (*) 0.62

A.4.3 Tune the shrinkage factor $k$

In light of Proposition 6, we tune the shrinkage factor $k$ instead of $(\lambda, \mu)$. Figure 12 shows that the monotone decomposition with the shrinkage factor achieves better performance in high-noise cases for monotone functions such as $x^3$, $\exp(x)$ and the sigmoid function. Promisingly, it also works for $x^2$ and SE-1, although Proposition 6 is intended for purely monotone functions.

Figure 12: Results for comparing the smoothing splines and the fitting by monotone decomposition, where the parameters are tuned by varying the shrinkage factor $k$. The values are averages over 100 replications.

A.5 QQ plots of p-values

Figure 13: Uniform QQ plot of p-values for five approaches on curve $x$ with $n=200, \sigma=0.01$.
Figure 14: Uniform QQ plot of p-values for five approaches on curve $x^{1/3}$ with $n=200, \sigma=0.01$.
Figure 15: Uniform QQ plot of p-values for five approaches on curve $1/(1+e^{-x})$ with $n=200, \sigma=0.01$.
Figure 16: Uniform QQ plot of p-values for five approaches on curve $x$ with $n=200, \sigma=0.1$.
Figure 17: Uniform QQ plot of p-values for five approaches on curve $x^3$ with $n=200, \sigma=0.1$.
Figure 18: Uniform QQ plot of p-values for five approaches on curve $x^{1/3}$ with $n=200, \sigma=0.1$.
Figure 19: Uniform QQ plot of p-values for five approaches on curve $e^x$ with $n=200, \sigma=0.1$.
Figure 20: Uniform QQ plot of p-values for five approaches on curve $1/(1+e^{-x})$ with $n=200, \sigma=0.1$.

A.6 Different Curves for Testing the Monotonicity

Figure 21: Different curves for testing the monotonicity. (a): curves from [1]; (b): curves from [9].

Appendix B More GO Analysis

Figure 22 is the same as Figure 6 except that panel (d) displays the GO terms identified by the intersection of the DE genes and the ME genes (the complement of the nME genes). Similar to the DE genes alone, the intersection of the DE and ME genes fails to identify significant GO terms.

Figure 22: GO enrichment analysis on the [16] dataset of four gene sets ((a): nME gene set; (b): DE gene set; (c): the intersection of nME and DE genes; (d): the intersection of ME and DE genes) with BH FDR cutoff $\alpha = 0.05$. The numbers of genes are noted in the title of each subfigure. The enriched GO terms are sorted by the adjusted p-value. (b) and (d) are left empty deliberately since no significant GO terms are selected.

For the liver dataset, the GO terms identified by the different gene sets are shown in Figure 23. Most GO terms selected by the DE genes (Figure 23(b)) and by the ME&DE genes (Figure 23(d)) coincide; the overlap counts are illustrated in the Venn diagram of Figure 24.

Figure 23: GO enrichment analysis on the liver dataset of four gene sets ((a): nME gene set; (b): DE gene set; (c): the intersection of nME and DE genes; (d): the intersection of ME and DE genes) with BH FDR cutoff $\alpha = 0.05$. The numbers of genes are noted in the title of each subfigure. The enriched GO terms are sorted by the adjusted p-value.
Figure 24: Venn diagram of GO terms identified by DE genes and DE&ME genes.

Figures 25 and 26 display the GO terms selected only by the DE genes and only by the ME genes, respectively. Note that, compared with the GO terms common to both gene sets, these distinct GO terms have relatively small gene counts and less significant associated p-values.

Figure 25: GO terms identified by DE genes but not ME genes.
Figure 26: GO terms identified by ME genes but not DE genes.

Appendix C Theoretical proofs of some basic propositions

C.1 A proposition

Proposition 7.

Suppose $f(x)$ is a univariate continuous function on an interval $I$. Then there exists a decomposition

$$f(x)=f_{\mathrm{up}}(x)+f_{\mathrm{down}}(x)\,, \qquad (16)$$

where $f_{\mathrm{up}}(x)$ and $f_{\mathrm{down}}(x)$ are continuous non-decreasing and non-increasing functions, respectively.

Proof.

Partition the interval $I$ by points $a_{1}<a_{2}<\cdots$ into consecutive subintervals,

$$I=\cup_{k}[a_{k},a_{k+1}]\,.$$

Without loss of generality, suppose $f(x)$ is increasing on $[a_{2k-1},a_{2k}]$ and decreasing on $[a_{2k},a_{2k+1}]$. Then one feasible decomposition is

$$f_{\mathrm{up}}(x)=\begin{cases}f(x)&x\in[a_{1},a_{2}]\\ f(a_{2})&x\in[a_{2},a_{3}]\\ f(x)-f(a_{3})+f(a_{2})&x\in[a_{3},a_{4}]\\ f(a_{4})-f(a_{3})+f(a_{2})&x\in[a_{4},a_{5}]\\ \cdots&\cdots\\ f(x)+\sum_{i=2}^{k}(f(a_{2i-2})-f(a_{2i-1}))&x\in[a_{2k-1},a_{2k}],\,k\geq 2\\ f(a_{2k})+\sum_{i=2}^{k}(f(a_{2i-2})-f(a_{2i-1}))&x\in[a_{2k},a_{2k+1}],\,k\geq 2\end{cases}\,,$$

and $f_{\mathrm{down}}(x)=f(x)-f_{\mathrm{up}}(x)$. ∎
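As a concrete instance of this construction, take $f(x)=\sin x$ on $I=[0,2\pi]$, so that $a_{1}=0$, $a_{2}=\pi/2$, $a_{3}=3\pi/2$, $a_{4}=2\pi$. The construction yields

$$f_{\mathrm{up}}(x)=\begin{cases}\sin x&x\in[0,\pi/2]\\ 1&x\in[\pi/2,3\pi/2]\\ \sin x+2&x\in[3\pi/2,2\pi]\end{cases}\,,\qquad f_{\mathrm{down}}(x)=\begin{cases}0&x\in[0,\pi/2]\\ \sin x-1&x\in[\pi/2,3\pi/2]\\ -2&x\in[3\pi/2,2\pi]\end{cases}\,,$$

which are continuous, non-decreasing and non-increasing respectively, and sum to $f$.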

C.2 Proposition 1

Proof.

Consider the problem

$$\begin{split}\min_{\gamma^{u},\gamma^{d}}&\;\|y-{\mathbf{B}}(\gamma^{u}+\gamma^{d})\|_{2}\\ \text{s.t. }&\gamma_{1}^{u}\leq\gamma_{2}^{u}\leq\cdots\leq\gamma_{J}^{u};\;\gamma_{1}^{d}\geq\gamma_{2}^{d}\geq\cdots\geq\gamma_{J}^{d}\\ &\gamma^{u}+\gamma^{d}=\hat{\gamma}\,,\end{split} \qquad (17)$$

where $\hat{\gamma}$ is the unconstrained solution. We can always decompose a vector into the sum of a non-decreasing vector and a non-increasing one: for $2\leq i\leq J$, let

$$\hat{\gamma}^{u}_{i}=\hat{\gamma}_{i-1}^{u}+(\hat{\gamma}_{i}-\hat{\gamma}_{i-1})_{+}\,, \qquad (18)$$
$$\hat{\gamma}^{d}_{i}=\hat{\gamma}_{i-1}^{d}+(\hat{\gamma}_{i}-\hat{\gamma}_{i-1})_{-}\,, \qquad (19)$$

where $t_{+}=\max(0,t)$ and $t_{-}=\min(0,t)$, together with any split of the first entry satisfying

$$\hat{\gamma}^{u}_{1}+\hat{\gamma}^{d}_{1}=\hat{\gamma}_{1}\,,$$

for example, $\hat{\gamma}_{1}^{u}=\hat{\gamma}_{1}$ and $\hat{\gamma}_{1}^{d}=0$. Thus the constraint set is feasible, i.e., non-empty, and any solution satisfies

$$\hat{\gamma}^{u}+\hat{\gamma}^{d}=\hat{\gamma}\,.$$

∎
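The construction (18)-(19) amounts to accumulating the positive and negative increments separately; a minimal Julia sketch (ours):

```julia
# Split γ̂ into a non-decreasing part and a non-increasing part via
# cumulative positive/negative increments, taking γ̂₁ᵘ = γ̂₁ and γ̂₁ᵈ = 0.
function monotone_split(γ::AbstractVector{<:Real})
    d = diff(γ)
    γu = [γ[1]; γ[1] .+ cumsum(max.(d, 0))]   # non-decreasing
    γd = [0.0;  cumsum(min.(d, 0))]           # non-increasing
    return γu, γd                             # γu + γd == γ
end

# monotone_split([0.0, 2.0, 1.0, 3.0])  # -> ([0.0, 2.0, 2.0, 4.0], [0.0, 0.0, -1.0, -1.0])
```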

C.3 Proposition 2

Proof.

Here we only prove the second property since the first property has been incorporated into the proof for the main propositions in Sections D.2 and G.3.

We can also put the constraint γΥ\gamma\in\Upsilon into the Lagrangian form. Define

$${\mathbf{A}}\triangleq\begin{bmatrix}{\mathbf{I}}_{J-1}&0\end{bmatrix}-\begin{bmatrix}0&{\mathbf{I}}_{J-1}\end{bmatrix}=\begin{bmatrix}1&-1&0&\cdots&0&0\\ 0&1&-1&\cdots&0&0\\ 0&0&1&\cdots&0&0\\ \vdots&\vdots&\vdots&\ddots&\vdots&\vdots\\ 0&0&0&\cdots&1&-1\end{bmatrix}_{(J-1)\times J}\,,\quad{\mathbf{H}}\triangleq\begin{bmatrix}{\mathbf{A}}&\boldsymbol{0}\\ \boldsymbol{0}&-{\mathbf{A}}\end{bmatrix}\,,$$

and the Lagrange form is

$$L_{\mathrm{ls}}(\gamma,\nu,\mu)=\|{\mathbf{y}}-{\mathbf{Z}}\gamma\|_{2}^{2}+2\nu^{T}{\mathbf{H}}\gamma+\mu\gamma^{T}{\mathbf{W}}\gamma\,, \qquad (20)$$

where the $2(J-1)$-vector $\nu$ and the scalar $\mu$ are the Lagrange multipliers; the factor $2$ merely yields cleaner formulas after differentiation. Taking derivatives of the Lagrangian,

$$-2{\mathbf{Z}}^{T}(y-{\mathbf{Z}}\gamma)+2{\mathbf{H}}^{T}\nu+2\mu{\mathbf{W}}\gamma+2\lambda\begin{bmatrix}{\boldsymbol{\Omega}}&{\boldsymbol{\Omega}}\\ {\boldsymbol{\Omega}}&{\boldsymbol{\Omega}}\end{bmatrix}\gamma=0\,,$$

which reduces to the cubic-spline case when $\lambda=0$. Then

$$-{\mathbf{B}}^{T}_{i}({\mathbf{y}}-{\mathbf{Z}}\gamma)+{\mathbf{H}}_{i}^{T}\nu+\mu{\mathbf{B}}_{i}^{T}{\mathbf{B}}(\gamma^{u}-\gamma^{d})+\lambda{\boldsymbol{\Omega}}_{i}(\gamma^{u}+\gamma^{d})=0 \qquad (21)$$
$$-{\mathbf{B}}^{T}_{i}({\mathbf{y}}-{\mathbf{Z}}\gamma)+{\mathbf{H}}_{i+J}^{T}\nu+\mu{\mathbf{B}}_{i}^{T}{\mathbf{B}}(-\gamma^{u}+\gamma^{d})+\lambda{\boldsymbol{\Omega}}_{i}(\gamma^{u}+\gamma^{d})=0 \qquad (22)$$

Note that

$$\sum_{i=1}^{J}{\mathbf{H}}_{i}^{T}\nu=0 \qquad\text{and}\qquad \sum_{i=1}^{J}{\mathbf{H}}_{i+J}^{T}\nu=0\,,$$

then we have

$$\mu{\boldsymbol{1}}^{T}{\mathbf{B}}^{T}{\mathbf{B}}(\gamma^{u}-\gamma^{d})=\mu{\boldsymbol{1}}^{T}{\mathbf{B}}^{T}{\mathbf{B}}(-\gamma^{u}+\gamma^{d})\,,$$

which implies

$$\mu{\boldsymbol{1}}^{T}{\mathbf{B}}(\gamma^{u}-\gamma^{d})=\mu{\boldsymbol{1}}^{T}{\mathbf{B}}(-\gamma^{u}+\gamma^{d})\,,$$

so

$$2\mu{\boldsymbol{1}}^{T}{\mathbf{B}}\gamma^{u}=2\mu{\boldsymbol{1}}^{T}{\mathbf{B}}\gamma^{d}\,.$$

If $\mu=0$, then by the KKT conditions ${\mathbf{B}}\gamma^{u}-{\mathbf{B}}\gamma^{d}=0$. If $\mu\neq 0$, then ${\boldsymbol{1}}^{T}{\mathbf{B}}\gamma^{u}={\boldsymbol{1}}^{T}{\mathbf{B}}\gamma^{d}$.

Thus, ${\boldsymbol{1}}^{T}{\mathbf{B}}\gamma^{u}={\boldsymbol{1}}^{T}{\mathbf{B}}\gamma^{d}$ always holds. ∎

Appendix D Proof of Proposition 3

Lemma 1 (Chebyshev Sum Inequality).

If $a_{1}\geq a_{2}\geq\cdots\geq a_{n}$ and $b_{1}\geq b_{2}\geq\cdots\geq b_{n}$, then

$$n\sum_{k=1}^{n}a_{k}b_{k}\geq\left(\sum_{k=1}^{n}a_{k}\right)\left(\sum_{k=1}^{n}b_{k}\right)\,.$$
Proof.

Since the two sequences are ordered in the same direction, every term $(a_{j}-a_{k})(b_{j}-b_{k})$ is non-negative, so the sum

$$S=\sum_{j=1}^{n}\sum_{k=1}^{n}(a_{j}-a_{k})(b_{j}-b_{k})\geq 0\,.$$

Note that

$$\begin{aligned}S&=\sum_{j=1}^{n}\left(na_{j}b_{j}-b_{j}\sum_{k=1}^{n}a_{k}-a_{j}\sum_{k=1}^{n}b_{k}+\sum_{k=1}^{n}a_{k}b_{k}\right)\\ &=n\sum_{j=1}^{n}a_{j}b_{j}+n\sum_{k=1}^{n}a_{k}b_{k}-\sum_{j=1}^{n}b_{j}\sum_{k=1}^{n}a_{k}-\sum_{j=1}^{n}a_{j}\sum_{k=1}^{n}b_{k}\\ &=2n\sum_{k=1}^{n}a_{k}b_{k}-2\sum_{k=1}^{n}a_{k}\sum_{k=1}^{n}b_{k}\,,\end{aligned}$$

hence

$$n\sum_{k=1}^{n}a_{k}b_{k}\geq\left(\sum_{k=1}^{n}a_{k}\right)\left(\sum_{k=1}^{n}b_{k}\right)\,.$$

∎

D.1 (i)

Proof.

Suppose $\gamma=\gamma^{u}+\gamma^{d}$ is increasing; then there is a decomposition in which $\gamma^{d}$ is a vector with identical elements. Write this decomposition as $\gamma=\gamma^{u}+c{\boldsymbol{1}}$, where $c$ is a constant. Suppose there exists a non-zero non-decreasing vector $\delta$ such that the non-increasing part is not constant, i.e.,

$$\gamma=(\gamma^{u}+\delta)+(c{\boldsymbol{1}}-\delta)\triangleq\tilde{\gamma}^{u}+\tilde{\gamma}^{d}\,.$$

Then

$$\|y-{\mathbf{B}}(\gamma^{u}+c{\boldsymbol{1}})\|=\|y-{\mathbf{B}}(\tilde{\gamma}^{u}+\tilde{\gamma}^{d})\|\,,$$

and if we impose the roughness penalty, we have

$$(\gamma^{u}+\gamma^{d})^{T}{\boldsymbol{\Omega}}(\gamma^{u}+\gamma^{d})=(\tilde{\gamma}^{u}+\tilde{\gamma}^{d})^{T}{\boldsymbol{\Omega}}(\tilde{\gamma}^{u}+\tilde{\gamma}^{d})\,.$$

So the first two terms in the Lagrangian objective (13) are identical for the two decompositions, and we only need to compare their discrepancy terms:

𝐁(γ~uγ~d)22𝐁(γuγd)22\displaystyle\|{\mathbf{B}}(\tilde{\gamma}^{u}-\tilde{\gamma}^{d})\|_{2}^{2}-\|{\mathbf{B}}(\gamma^{u}-\gamma^{d})\|_{2}^{2}
=\displaystyle= (γ~uγ~d(γuγd))T𝐁T𝐁(γ~uγ~d+(γuγd))\displaystyle(\tilde{\gamma}^{u}-\tilde{\gamma}^{d}-(\gamma^{u}-\gamma^{d}))^{T}{\mathbf{B}}^{T}{\mathbf{B}}(\tilde{\gamma}^{u}-\tilde{\gamma}^{d}+(\gamma^{u}-\gamma^{d}))
=\displaystyle= 2δT𝐁T𝐁(2γu+2δ2c𝟏)\displaystyle 2\delta^{T}{\mathbf{B}}^{T}{\mathbf{B}}(2\gamma^{u}+2\delta-2c{\boldsymbol{1}})
=\displaystyle= 4{δT𝐁T𝐁δ+δT𝐁T𝐁(γuc𝟏)}.\displaystyle 4\left\{\delta^{T}{\mathbf{B}}^{T}{\mathbf{B}}\delta+\delta^{T}{\mathbf{B}}^{T}{\mathbf{B}}(\gamma^{u}-c{\boldsymbol{1}})\right\}\,.

By Proposition 2, we have

𝟏T𝐁γu=𝟏T𝐁γd𝟏T𝐁γ~u=𝟏T𝐁γ~d,{\boldsymbol{1}}^{T}{\mathbf{B}}\gamma^{u}={\boldsymbol{1}}^{T}{\mathbf{B}}\gamma^{d}\,\qquad{\boldsymbol{1}}^{T}{\mathbf{B}}\tilde{\gamma}^{u}={\boldsymbol{1}}^{T}{\mathbf{B}}\tilde{\gamma}^{d}\,,

then

𝟏T𝐁(γuc𝟏)=𝟏T𝐁δ=0.{\boldsymbol{1}}^{T}{\mathbf{B}}(\gamma^{u}-c{\boldsymbol{1}})={\boldsymbol{1}}^{T}{\mathbf{B}}\delta=0\,.

Let 𝐚=𝐁δ and 𝐛=𝐁(γᵘ−c𝟏); both vectors are similarly ordered, being evaluations of non-decreasing spline functions at the same points. Then by Lemma 1,

𝐚T𝐛1n(𝟏T𝐚)(𝟏T𝐛)=0.{\mathbf{a}}^{T}{\mathbf{b}}\geq\frac{1}{n}({\boldsymbol{1}}^{T}{\mathbf{a}})({\boldsymbol{1}}^{T}{\mathbf{b}})=0\,.

And δT𝐁T𝐁δ>0\delta^{T}{\mathbf{B}}^{T}{\mathbf{B}}\delta>0 for non-zero δ\delta, thus,

𝐁(γ~uγ~d)22>𝐁(γuγd)22.\|{\mathbf{B}}(\tilde{\gamma}^{u}-\tilde{\gamma}^{d})\|_{2}^{2}>\|{\mathbf{B}}(\gamma^{u}-\gamma^{d})\|_{2}^{2}\,.

So if γu+γd\gamma^{u}+\gamma^{d} is increasing, the best decomposition is γd=c𝟏\gamma^{d}=c{\boldsymbol{1}}.

Note that

𝟏T𝐁γu=𝟏T𝐁c𝟏=c𝟏T𝟏,{\boldsymbol{1}}^{T}{\mathbf{B}}\gamma^{u}={\boldsymbol{1}}^{T}{\mathbf{B}}c{\boldsymbol{1}}=c{\boldsymbol{1}}^{T}{\boldsymbol{1}}\,,

the constant cc is

c=𝟏T𝐁γun.c=\frac{{\boldsymbol{1}}^{T}{\mathbf{B}}\gamma^{u}}{n}\,.
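The argument above can also be checked numerically. A minimal sketch (assuming the hat-basis stand-in for 𝐁 introduced earlier): perturbing the canonical decomposition (γᵘ, c𝟏) by any non-zero non-decreasing δ strictly increases the discrepancy ‖𝐁(γᵘ−γᵈ)‖².

```julia
using LinearAlgebra
n, J = 50, 8
t = range(0, 1; length = J)
x = range(0, 1; length = n)
B = max.(0, 1 .- abs.(x .- t') ./ step(t))   # hat basis, stand-in for 𝐁
γu = sort(randn(J))                          # non-decreasing part
c  = sum(B * γu) / n                         # c = 𝟏ᵀ𝐁γᵘ / n
γd = fill(c, J)                              # constant non-increasing part
δ  = cumsum(abs.(randn(J)))                  # non-zero, non-decreasing
@assert norm(B * ((γu + δ) - (γd - δ)))^2 > norm(B * (γu - γd))^2
```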

D.2 (ii)

Proof.

Taking derivatives of the Lagrangian,

2𝐙T(y𝐙γ)+2𝐇Tν+2μ𝐖γ=0\displaystyle-2{\mathbf{Z}}^{T}(y-{\mathbf{Z}}\gamma)+2{\mathbf{H}}^{T}\nu+2\mu{\mathbf{W}}\gamma=0 (23)

then

γ^\displaystyle\hat{\gamma} =(𝐙T𝐙+μ𝐖)1(𝐙Ty𝐇Tν)\displaystyle=({\mathbf{Z}}^{T}{\mathbf{Z}}+\mu{\mathbf{W}})^{-1}({\mathbf{Z}}^{T}y-{\mathbf{H}}^{T}\nu)
(𝐙T𝐙+μ𝐖)1𝐙Ty(𝐙T𝐙+μ𝐖)1ξ,\displaystyle\triangleq({\mathbf{Z}}^{T}{\mathbf{Z}}+\mu{\mathbf{W}})^{-1}{\mathbf{Z}}^{T}y-({\mathbf{Z}}^{T}{\mathbf{Z}}+\mu{\mathbf{W}})^{-1}\xi\,,

where ξ𝐇Tν\xi\triangleq{\mathbf{H}}^{T}\nu. Note that

(𝐙T𝐙+μ𝐖)1=[1+μ1μ1μ1+μ]1(𝐁T𝐁)1=14μ[μ+1μ1μ1μ+1](𝐁T𝐁)1,({\mathbf{Z}}^{T}{\mathbf{Z}}+\mu{\mathbf{W}})^{-1}=\begin{bmatrix}1+\mu&1-\mu\\ 1-\mu&1+\mu\end{bmatrix}^{-1}\otimes({\mathbf{B}}^{T}{\mathbf{B}})^{-1}=\frac{1}{4\mu}\begin{bmatrix}\mu+1&\mu-1\\ \mu-1&\mu+1\end{bmatrix}\otimes({\mathbf{B}}^{T}{\mathbf{B}})^{-1}\,,

then

(𝐙T𝐙+μ𝐖)1𝐙Ty={14μ[μ+1μ1μ1μ+1][11]}(𝐁T𝐁)1𝐁Ty=12[11](𝐁T𝐁)1𝐁Ty.({\mathbf{Z}}^{T}{\mathbf{Z}}+\mu{\mathbf{W}})^{-1}{\mathbf{Z}}^{T}y=\left\{\frac{1}{4\mu}\begin{bmatrix}\mu+1&\mu-1\\ \mu-1&\mu+1\end{bmatrix}\begin{bmatrix}1\\ 1\end{bmatrix}\right\}\otimes({\mathbf{B}}^{T}{\mathbf{B}})^{-1}{\mathbf{B}}^{T}y=\frac{1}{2}\begin{bmatrix}1\\ 1\end{bmatrix}\otimes({\mathbf{B}}^{T}{\mathbf{B}})^{-1}{\mathbf{B}}^{T}y\,.
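Both Kronecker-product identities are easy to verify numerically; a minimal sketch with a random positive-definite stand-in for 𝐁ᵀ𝐁:

```julia
using LinearAlgebra
J, μ = 4, 0.7
M = let X = randn(12, J); X'X end              # stands in for 𝐁ᵀ𝐁
ZtZ = kron([1 1; 1 1], M)                      # 𝐙ᵀ𝐙 with 𝐙 = [𝐁 𝐁]
W   = kron([1 -1; -1 1], M)                    # the matrix 𝐖
@assert inv(ZtZ + μ * W) ≈ kron([μ+1 μ-1; μ-1 μ+1], inv(M)) / (4 * μ)
# the left factor of (𝐙ᵀ𝐙 + μ𝐖)⁻¹𝐙ᵀy collapses to ½[1; 1]:
@assert [μ+1 μ-1; μ-1 μ+1] * ones(2) / (4 * μ) ≈ 0.5 * ones(2)
```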

Consider the ii-th element of (23),

𝐙iT(y𝐙γ)+𝐇iTν+μ𝐖iTγ=0,-{\mathbf{Z}}_{i}^{T}(y-{\mathbf{Z}}\gamma)+{\mathbf{H}}_{i}^{T}\nu+\mu{\mathbf{W}}_{i}^{T}\gamma=0\,, (24)

where 𝐙i,𝐇i,𝐖i{\mathbf{Z}}_{i},{\mathbf{H}}_{i},{\mathbf{W}}_{i} are the ii-th column of 𝐙,𝐇,𝐖{\mathbf{Z}},{\mathbf{H}},{\mathbf{W}}, respectively. Note that for 1iJ1\leq i\leq J,

𝐖i=[𝐁T𝐁i𝐁T𝐁i],𝐖i+J=[𝐁T𝐁i𝐁T𝐁i],{\mathbf{W}}_{i}=\begin{bmatrix}{\mathbf{B}}^{T}{\mathbf{B}}_{i}\\ -{\mathbf{B}}^{T}{\mathbf{B}}_{i}\end{bmatrix}\,,\qquad{\mathbf{W}}_{i+J}=\begin{bmatrix}-{\mathbf{B}}^{T}{\mathbf{B}}_{i}\\ {\mathbf{B}}^{T}{\mathbf{B}}_{i}\end{bmatrix}\,,

and

𝐙i=𝐁i,𝐙i+J=𝐁i,{\mathbf{Z}}_{i}={\mathbf{B}}_{i}\,,\qquad{\mathbf{Z}}_{i+J}={\mathbf{B}}_{i}\,,

where 𝐁ᵢ is the i-th column of 𝐁. Then (24) becomes

𝐁iT(y𝐙γ)+𝐇iTν+μ𝐁iT𝐁(γuγd)\displaystyle-{\mathbf{B}}^{T}_{i}(y-{\mathbf{Z}}\gamma)+{\mathbf{H}}_{i}^{T}\nu+\mu{\mathbf{B}}_{i}^{T}{\mathbf{B}}(\gamma^{u}-\gamma^{d}) =0,\displaystyle=0\,, (25)
𝐁iT(y𝐙γ)+𝐇i+JTν+μ𝐁iT𝐁(γu+γd)\displaystyle-{\mathbf{B}}^{T}_{i}(y-{\mathbf{Z}}\gamma)+{\mathbf{H}}_{i+J}^{T}\nu+\mu{\mathbf{B}}^{T}_{i}{\mathbf{B}}(-\gamma^{u}+\gamma^{d}) =0.\displaystyle=0\,. (26)

Subtracting (26) from (25) yields

(𝐇iT𝐇i+JT)ν=2μ𝐁iT𝐁(γuγd),({\mathbf{H}}_{i}^{T}-{\mathbf{H}}_{i+J}^{T})\nu=-2\mu{\mathbf{B}}_{i}^{T}{\mathbf{B}}(\gamma^{u}-\gamma^{d})\,, (27)

and adding (25) and (26) yields

(𝐇iT+𝐇i+JT)ν=2𝐁iT(y𝐙γ).({\mathbf{H}}_{i}^{T}+{\mathbf{H}}_{i+J}^{T})\nu=2{\mathbf{B}}^{T}_{i}(y-{\mathbf{Z}}\gamma)\,.

If there are no ties in the solution, i.e.,

γ1u<γ2u<<γJu,γ1d>γ2d>>γJd,\gamma_{1}^{u}<\gamma_{2}^{u}<\cdots<\gamma_{J}^{u}\,,\qquad\gamma_{1}^{d}>\gamma_{2}^{d}>\cdots>\gamma_{J}^{d}\,,

by the KKT conditions, ν=0, and then (27) implies μ=0, since γᵘ−γᵈ is strictly increasing (hence nonzero) and 𝐁ᵀ𝐁 is nonsingular. The problem thus reduces to unconstrained B-spline fitting. This argument proves the first point of Proposition 2.

If

γ1u<γ2u<<γJu,\gamma^{u}_{1}<\gamma^{u}_{2}<\ldots<\gamma^{u}_{J}\,,

then ν₁=⋯=ν_{J−1}=0, so ξᵢ=0 for 1≤i≤J, and (27) gives

ξi+J=2μ𝐁iT𝐁(γuγd),\xi_{i+J}=2\mu{\mathbf{B}}_{i}^{T}{\mathbf{B}}(\gamma^{u}-\gamma^{d})\,,

then

ξ=[01]2μ𝐁T𝐁(γuγd),\xi=\begin{bmatrix}0\\ 1\end{bmatrix}\otimes 2\mu{\mathbf{B}}^{T}{\mathbf{B}}(\gamma^{u}-\gamma^{d})\,,

it follows that

(𝐙T𝐙+μ𝐖)1ξ=12[μ1μ+1](γuγd).({\mathbf{Z}}^{T}{\mathbf{Z}}+\mu{\mathbf{W}})^{-1}\xi=\frac{1}{2}\begin{bmatrix}\mu-1\\ \mu+1\end{bmatrix}\otimes(\gamma^{u}-\gamma^{d})\,.

So

[γuγd]=12[(𝐁T𝐁)1𝐁Ty(μ1)(γuγd)(𝐁T𝐁)1𝐁Ty(μ+1)(γuγd)],\begin{bmatrix}\gamma^{u}\\ \gamma^{d}\end{bmatrix}=\frac{1}{2}\begin{bmatrix}({\mathbf{B}}^{T}{\mathbf{B}})^{-1}{\mathbf{B}}^{T}y-(\mu-1)(\gamma^{u}-\gamma^{d})\\ ({\mathbf{B}}^{T}{\mathbf{B}})^{-1}{\mathbf{B}}^{T}y-(\mu+1)(\gamma^{u}-\gamma^{d})\end{bmatrix}\,,

and both rows yield the same relation:

(μ+1)γu(μ1)γd=(𝐁T𝐁)1𝐁Ty.(\mu+1)\gamma^{u}-(\mu-1)\gamma^{d}=({\mathbf{B}}^{T}{\mathbf{B}})^{-1}{\mathbf{B}}^{T}y\,.

On the other hand, from (25), we have

𝐁T(y𝐙γ)+μ𝐁T𝐁(γuγd)=0,-{\mathbf{B}}^{T}(y-{\mathbf{Z}}\gamma)+\mu{\mathbf{B}}^{T}{\mathbf{B}}(\gamma^{u}-\gamma^{d})=0\,,

that is,

(γu+γd)+μ(γuγd)=(𝐁T𝐁)1𝐁Ty,(\gamma^{u}+\gamma^{d})+\mu(\gamma^{u}-\gamma^{d})=({\mathbf{B}}^{T}{\mathbf{B}})^{-1}{\mathbf{B}}^{T}y\,,

that is,

(μ+1)γu(μ1)γd=(𝐁T𝐁)1𝐁Tyγ^ls.(\mu+1)\gamma^{u}-(\mu-1)\gamma^{d}=({\mathbf{B}}^{T}{\mathbf{B}})^{-1}{\mathbf{B}}^{T}y\triangleq\hat{\gamma}^{\mathrm{ls}}\,.

Incorporating the result of Section D.1, we have

γd=c𝟏,γu=1μ+1γ^ls+μ1μ+1c𝟏.\gamma^{d}=c{\boldsymbol{1}}\,,\gamma^{u}=\frac{1}{\mu+1}\hat{\gamma}^{\mathrm{ls}}+\frac{\mu-1}{\mu+1}c{\boldsymbol{1}}\,.
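A small numerical sketch of this closed form (reusing the hat-basis stand-in for 𝐁; by the balance condition of Proposition 2, the constant resolves to c=𝟏ᵀ𝐁γ̂ˡˢ/(2n)): the resulting pair satisfies both the stationarity relation and 𝟏ᵀ𝐁γᵘ=𝟏ᵀ𝐁γᵈ.

```julia
using LinearAlgebra
n, J, μ = 50, 8, 0.5
t = range(0, 1; length = J)
x = range(0, 1; length = n)
B = max.(0, 1 .- abs.(x .- t') ./ step(t))   # hat basis, stand-in for 𝐁
y = randn(n)
γls = (B'B) \ (B'y)                          # γ̂ˡˢ = (𝐁ᵀ𝐁)⁻¹𝐁ᵀy
c   = sum(B * γls) / (2 * n)                 # c = 𝟏ᵀ𝐁γ̂ˡˢ / (2n)
γd  = fill(c, J)
γu  = γls / (μ + 1) + (μ - 1) / (μ + 1) * γd
@assert (μ + 1) * γu - (μ - 1) * γd ≈ γls                  # stationarity
@assert isapprox(sum(B * γu), sum(B * γd); atol = 1e-8)    # 𝟏ᵀ𝐁γᵘ = 𝟏ᵀ𝐁γᵈ
```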

D.3 (iii)

Proof.

Note that

{ξ1=ν1ξ2=ν2ν1ξJ1=νJ1νJ2ξJ=νJ1\begin{cases}\xi_{1}&=\nu_{1}\\ \xi_{2}&=\nu_{2}-\nu_{1}\\ \vdots&\\ \xi_{J-1}&=\nu_{J-1}-\nu_{J-2}\\ \xi_{J}&=-\nu_{J-1}\end{cases} (28)

If \hat{\gamma}^{u}_{1}<\cdots<\hat{\gamma}^{u}_{k_{1}}=\cdots=\hat{\gamma}^{u}_{k_{2}}<\cdots<\hat{\gamma}^{u}_{k_{2m-1}}=\cdots=\hat{\gamma}^{u}_{k_{2m}}<\cdots<\hat{\gamma}^{u}_{J}, then

\begin{cases}\nu_{1}=\cdots=\nu_{k_{1}-1}=0\\ \vdots\\ \nu_{k_{2m-2}}=\cdots=\nu_{k_{2m-1}-1}=0\\ \nu_{k_{2m}}=\cdots=\nu_{J-1}=0\end{cases}\qquad\Longrightarrow\qquad\begin{cases}\xi_{1}=\cdots=\xi_{k_{1}-1}=0\\ \vdots\\ \displaystyle\sum_{i=k_{1}}^{k_{2}}\xi_{i}=0\\ \vdots\\ \displaystyle\sum_{i=k_{2m-1}}^{k_{2m}}\xi_{i}=0\\ \xi_{k_{2m}+1}=\cdots=\xi_{J}=0\end{cases}

where the group sums run over the full tie blocks, matching the blocks of 𝐆 below.

Thus,

{\mathbf{B}}_{1}^{T}y=(\mu+1){\mathbf{B}}^{T}_{1}{\mathbf{B}}\gamma^{u}-(\mu-1){\mathbf{B}}^{T}_{1}{\mathbf{B}}\gamma^{d}
\vdots
{\mathbf{B}}_{k_{1}-1}^{T}y=(\mu+1){\mathbf{B}}^{T}_{k_{1}-1}{\mathbf{B}}\gamma^{u}-(\mu-1){\mathbf{B}}^{T}_{k_{1}-1}{\mathbf{B}}\gamma^{d}
\sum_{i=k_{1}}^{k_{2}}{\mathbf{B}}_{i}^{T}y=(\mu+1)\sum_{i=k_{1}}^{k_{2}}{\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{u}-(\mu-1)\sum_{i=k_{1}}^{k_{2}}{\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{d}
\vdots
\sum_{i=k_{2m-1}}^{k_{2m}}{\mathbf{B}}_{i}^{T}y=(\mu+1)\sum_{i=k_{2m-1}}^{k_{2m}}{\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{u}-(\mu-1)\sum_{i=k_{2m-1}}^{k_{2m}}{\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{d}
{\mathbf{B}}_{k_{2m}+1}^{T}y=(\mu+1){\mathbf{B}}^{T}_{k_{2m}+1}{\mathbf{B}}\gamma^{u}-(\mu-1){\mathbf{B}}^{T}_{k_{2m}+1}{\mathbf{B}}\gamma^{d}
\vdots
{\mathbf{B}}_{J}^{T}y=(\mu+1){\mathbf{B}}^{T}_{J}{\mathbf{B}}\gamma^{u}-(\mu-1){\mathbf{B}}_{J}^{T}{\mathbf{B}}\gamma^{d}\,.

Then

𝐆𝐁Ty=(μ+1)𝐆𝐁T𝐁γu(μ1)𝐆𝐁T𝐁γd,{\mathbf{G}}{\mathbf{B}}^{T}y=(\mu+1){\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}\gamma^{u}-(\mu-1){\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}\gamma^{d}\,,

where

𝐆=[𝐈k11𝟏k2k1+1T𝐈k2m1k2m21𝟏k2mk2m1+1T𝐈Jk2m].{\mathbf{G}}=\begin{bmatrix}{\mathbf{I}}_{k_{1}-1}&&&&&\\ &{\boldsymbol{1}}^{T}_{k_{2}-k_{1}+1}&&&&\\ &&\ddots&&&\\ &&&{\mathbf{I}}_{k_{2m-1}-k_{2m-2}-1}&&\\ &&&&{\boldsymbol{1}}^{T}_{k_{2m}-k_{2m-1}+1}&\\ &&&&&{\mathbf{I}}_{J-k_{2m}}\end{bmatrix}\,.

Let γᵘ=𝐆ᵀβ, where β is the sub-vector of γᵘ formed by its unique elements. Since γᵈ=c𝟏_J and 𝐆ᵀ𝟏_g=𝟏_J, we have

{\mathbf{G}}{\mathbf{B}}^{T}y=(\mu+1){\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T}\beta-(\mu-1)c\,{\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T}{\boldsymbol{1}}_{g}\,,

then

β=1μ+1(𝐆𝐁T𝐁𝐆T)1𝐆𝐁Ty+μ1μ+1c𝟏.\beta=\frac{1}{\mu+1}({\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T})^{-1}{\mathbf{G}}{\mathbf{B}}^{T}y+\frac{\mu-1}{\mu+1}c{\boldsymbol{1}}\,.

Thus,

\gamma^{u}={\mathbf{G}}^{T}\beta=\frac{1}{\mu+1}{\mathbf{G}}^{T}({\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T})^{-1}{\mathbf{G}}{\mathbf{B}}^{T}y+\frac{\mu-1}{\mu+1}c{\boldsymbol{1}}_{J}\,,\qquad\gamma^{d}=c{\boldsymbol{1}}_{J}\,.
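The grouping matrix 𝐆 is easy to build from a tie pattern; a small illustrative sketch (the helper `groupmat` is hypothetical, not part of the package):

```julia
using LinearAlgebra

# Build 𝐆 from a partition of 1:J into groups: singleton groups give
# identity rows, tie groups give a row of ones over the tied indices.
function groupmat(groups, J)
    G = zeros(length(groups), J)
    for (g, r) in enumerate(groups)
        G[g, r] .= 1.0
    end
    return G
end

G = groupmat([1:1, 2:4, 5:5, 6:8, 9:9], 9)   # J = 9, ties {2,3,4} and {6,7,8}
@assert G' * ones(size(G, 1)) == ones(9)      # 𝐆ᵀ𝟏_g = 𝟏_J, as used above
```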

D.4 Corollary 1

The monotone decomposition for decreasing functions can be derived in the same way, as summarized in Corollary 1.

Corollary 1.

Let γ̂=(γ̂ᵘ,γ̂ᵈ) be the monotone decomposition solution to problem (6). Suppose γ̂ᵘ+γ̂ᵈ is decreasing; then

  1. (i)

    γ^u\hat{\gamma}^{u} is a vector with identical elements;

  2. (ii)

    if \hat{\gamma}^{d}_{1}>\cdots>\hat{\gamma}^{d}_{k_{1}}=\cdots=\hat{\gamma}^{d}_{k_{2}}>\cdots>\hat{\gamma}^{d}_{k_{2m-1}}=\cdots=\hat{\gamma}^{d}_{k_{2m}}>\cdots>\hat{\gamma}^{d}_{J}, where 1\leq k_{1}\leq k_{2}\leq\cdots\leq k_{2m-1}\leq k_{2m}\leq J, then

    γ^d=1μ+1𝐆T(𝐆𝐁T𝐁𝐆T)1𝐆𝐁Ty+μ1μ+1c𝟏,γ^u=𝟏T𝐁γ^dn𝟏.\hat{\gamma}^{d}=\frac{1}{\mu+1}{\mathbf{G}}^{T}({\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T})^{-1}{\mathbf{G}}{\mathbf{B}}^{T}y+\frac{\mu-1}{\mu+1}c{\boldsymbol{1}}\,,\qquad\hat{\gamma}^{u}=\frac{{\boldsymbol{1}}^{T}{\mathbf{B}}\hat{\gamma}^{d}}{n}{\boldsymbol{1}}\,.
Proof.

If the solution with data (X,y) is γ̂ᵘ+γ̂ᵈ, then the solution with (X,−y) is −γ̂ᵘ−γ̂ᵈ.

If γᵘ+γᵈ is decreasing, then −γᵘ−γᵈ is increasing. Note that −γᵘ is non-increasing and −γᵈ is non-decreasing; then, by the result of Section D.1, −γᵘ must be a vector with identical elements, and hence so is γᵘ.

If \hat{\gamma}^{d}_{1}>\cdots>\hat{\gamma}^{d}_{k_{1}}=\cdots=\hat{\gamma}^{d}_{k_{2}}>\cdots>\hat{\gamma}^{d}_{k_{2m-1}}=\cdots=\hat{\gamma}^{d}_{k_{2m}}>\hat{\gamma}^{d}_{J}, then

-\hat{\gamma}^{d}_{1}<\cdots<-\hat{\gamma}^{d}_{k_{1}}=\cdots=-\hat{\gamma}^{d}_{k_{2}}<\cdots<-\hat{\gamma}^{d}_{k_{2m-1}}=\cdots=-\hat{\gamma}^{d}_{k_{2m}}<-\hat{\gamma}^{d}_{J}\,.

Thus,

-\hat{\gamma}^{d}=\frac{1}{\mu+1}{\mathbf{G}}^{T}({\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T})^{-1}{\mathbf{G}}{\mathbf{B}}^{T}(-y)+\frac{\mu-1}{\mu+1}\tilde{c}{\boldsymbol{1}}\,,\qquad\tilde{c}=\frac{{\boldsymbol{1}}^{T}{\mathbf{B}}(-\gamma^{d})}{n}\,,

it follows that

\hat{\gamma}^{d}=\frac{1}{\mu+1}{\mathbf{G}}^{T}({\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T})^{-1}{\mathbf{G}}{\mathbf{B}}^{T}y+\frac{\mu-1}{\mu+1}c{\boldsymbol{1}}\,,\qquad c=\frac{{\boldsymbol{1}}^{T}{\mathbf{B}}\gamma^{d}}{n}\,,\qquad\hat{\gamma}^{u}=c{\boldsymbol{1}}\,.

Appendix E A Side Product: Unimodal Functions

We explore the solution for more general unimodal functions, as summarized in Proposition 8. In particular, the second point of Proposition 3 can be viewed as the special instance of Proposition 8 with k=ℓ=1.

Proposition 8.

Let γ^=(γ^u,γ^d)\hat{\gamma}=(\hat{\gamma}^{u},\hat{\gamma}^{d}) be the monotone decomposition to problem (8). If

γ^1u\displaystyle\hat{\gamma}_{1}^{u} =γ^2u==γ^ku<γ^k+1u<<γ^Ju\displaystyle=\hat{\gamma}_{2}^{u}=\cdots=\hat{\gamma}_{k}^{u}<\hat{\gamma}_{k+1}^{u}<\cdots<\hat{\gamma}_{J}^{u}
γ^1d\displaystyle\hat{\gamma}_{1}^{d} >γ^2d>>γ^d=γ^+1d==γ^Jd,\displaystyle>\hat{\gamma}_{2}^{d}>\cdots>\hat{\gamma}_{\ell}^{d}=\hat{\gamma}_{\ell+1}^{d}=\cdots=\hat{\gamma}_{J}^{d}\,,

suppose k\ell\leq k, then

[𝐈1𝟏k+1𝐈Jk]𝐁T𝐲=𝐀𝐁T𝐁γ^u+(2𝐈𝐀)𝐁T𝐁γ^d,\begin{bmatrix}{\mathbf{I}}_{\ell-1}&&\\ &{\boldsymbol{1}}_{k-\ell+1}&\\ &&{\mathbf{I}}_{J-k}\end{bmatrix}{\mathbf{B}}^{T}{\mathbf{y}}={\mathbf{A}}{\mathbf{B}}^{T}{\mathbf{B}}\hat{\gamma}^{u}+(2{\mathbf{I}}-{\mathbf{A}}){\mathbf{B}}^{T}{\mathbf{B}}\hat{\gamma}^{d}\,,

where

𝐀=[(1μ)𝐈12μ𝟏1T1+μ(1+μ)𝐈Jk].{\mathbf{A}}=\begin{bmatrix}(1-\mu){\mathbf{I}}_{\ell-1}&&\\ 2\mu{\boldsymbol{1}}_{\ell-1}^{T}&1+\mu&\\ &&(1+\mu){\mathbf{I}}_{J-k}\end{bmatrix}\,.
Proof.

If

γ1u=γ2u==γku<γk+1u<<γJu\gamma_{1}^{u}=\gamma_{2}^{u}=\cdots=\gamma_{k}^{u}<\gamma_{k+1}^{u}<\cdots<\gamma_{J}^{u}

and

γ1d>γ2d>>γd=γ+1d==γJd.\gamma_{1}^{d}>\gamma_{2}^{d}>\cdots>\gamma_{\ell}^{d}=\gamma_{\ell+1}^{d}=\cdots=\gamma_{J}^{d}\,.

Suppose k\ell\leq k, then

νk=νk+1==νJ1=0,\nu_{k}=\nu_{k+1}=\cdots=\nu_{J-1}=0\,,

and

νJ1+i=0,i=1,,1.\nu_{J-1+i}=0,i=1,\ldots,\ell-1\,.

Then from (28), we have ξi=0,k+1iJ\xi_{i}=0,k+1\leq i\leq J, and then

i=1k𝐁iTy\displaystyle\sum_{i=1}^{k}{\mathbf{B}}_{i}^{T}y =(μ+1)i=1k𝐁iT𝐁γu(μ1)i=1k𝐁iT𝐁γd,\displaystyle=(\mu+1)\sum_{i=1}^{k}{\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{u}-(\mu-1)\sum_{i=1}^{k}{\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{d}\,, (29)
𝐁iTy\displaystyle{\mathbf{B}}_{i}^{T}y =(μ+1)𝐁iT𝐁γu(μ1)𝐁iT𝐁γdk+1iJ.\displaystyle=(\mu+1){\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{u}-(\mu-1){\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{d}\quad k+1\leq i\leq J\,. (30)

Similarly, ξi+J=0,1i1\xi_{i+J}=0,1\leq i\leq\ell-1, and hence

𝐁iTy\displaystyle{\mathbf{B}}_{i}^{T}y =(μ+1)𝐁iT𝐁γd(μ1)𝐁iT𝐁γu1i1\displaystyle=(\mu+1){\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{d}-(\mu-1){\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{u}\quad 1\leq i\leq\ell-1 (31)
i=J𝐁iTy\displaystyle\sum_{i=\ell}^{J}{\mathbf{B}}_{i}^{T}y =(μ+1)i=J𝐁iT𝐁γd(μ1)i=J𝐁iT𝐁γu.\displaystyle=(\mu+1)\sum_{i=\ell}^{J}{\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{d}-(\mu-1)\sum_{i=\ell}^{J}{\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{u}\,. (32)

Note that

i=1J𝐁iTy\displaystyle\sum_{i=1}^{J}{\mathbf{B}}_{i}^{T}y =(μ+1)i=1J𝐁iT𝐁γu(μ1)i=1J𝐁iT𝐁γd\displaystyle=(\mu+1)\sum_{i=1}^{J}{\mathbf{B}}_{i}^{T}{\mathbf{B}}\gamma^{u}-(\mu-1)\sum_{i=1}^{J}{\mathbf{B}}_{i}^{T}{\mathbf{B}}\gamma^{d}
=(\mu+1)\sum_{i=1}^{J}{\mathbf{B}}_{i}^{T}{\mathbf{B}}\gamma^{d}-(\mu-1)\sum_{i=1}^{J}{\mathbf{B}}_{i}^{T}{\mathbf{B}}\gamma^{u}\,, (33)

then (29) + (32) - (33) yields

k𝐁iTy\displaystyle\sum_{\ell}^{k}{\mathbf{B}}_{i}^{T}y =i=1k𝐁iTy+i=J𝐁iTyi=1J𝐁iTy\displaystyle=\sum_{i=1}^{k}{\mathbf{B}}_{i}^{T}y+\sum_{i=\ell}^{J}{\mathbf{B}}_{i}^{T}y-\sum_{i=1}^{J}{\mathbf{B}}_{i}^{T}y
=(μ+1)i=1k𝐁iT𝐁γu(μ1)i=1k𝐁iT𝐁γd+\displaystyle=(\mu+1)\sum_{i=1}^{k}{\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{u}-(\mu-1)\sum_{i=1}^{k}{\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{d}+
(μ+1)i=J𝐁iT𝐁γd(μ1)i=J𝐁iT𝐁γu\displaystyle\qquad\quad(\mu+1)\sum_{i=\ell}^{J}{\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{d}-(\mu-1)\sum_{i=\ell}^{J}{\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{u}-
[(1μ)𝟏T𝐁γu+(μ+1)𝟏T𝐁γd]\displaystyle\qquad\quad[(1-\mu){\boldsymbol{1}}^{T}{\mathbf{B}}\gamma^{u}+(\mu+1){\boldsymbol{1}}^{T}{\mathbf{B}}\gamma^{d}]
=(μ+1)i=1k𝐁iT𝐁γu(μ1)i=1k𝐁iT𝐁γd+\displaystyle=(\mu+1)\sum_{i=1}^{k}{\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{u}-(\mu-1)\sum_{i=1}^{k}{\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{d}+
(μ1)i=11𝐁iT𝐁γu(μ+1)i=11𝐁iT𝐁γd\displaystyle\qquad\quad(\mu-1)\sum_{i=1}^{\ell-1}{\mathbf{B}}_{i}^{T}{\mathbf{B}}\gamma^{u}-(\mu+1)\sum_{i=1}^{\ell-1}{\mathbf{B}}_{i}^{T}{\mathbf{B}}\gamma^{d}
=2μi=11𝐁iT𝐁γu+(μ+1)i=k𝐁iT𝐁γu2μi=11𝐁iT𝐁γd(μ1)i=k𝐁iT𝐁γd,\displaystyle=2\mu\sum_{i=1}^{\ell-1}{\mathbf{B}}_{i}^{T}{\mathbf{B}}\gamma^{u}+(\mu+1)\sum_{i=\ell}^{k}{\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{u}-2\mu\sum_{i=1}^{\ell-1}{\mathbf{B}}_{i}^{T}{\mathbf{B}}\gamma^{d}-(\mu-1)\sum_{i=\ell}^{k}{\mathbf{B}}^{T}_{i}{\mathbf{B}}\gamma^{d}\,,

then

[𝐁1Ty𝐁1Tyi=k𝐁iTy𝐁k+1Ty𝐁JTy]\displaystyle\begin{bmatrix}{\mathbf{B}}_{1}^{T}y\\ \vdots\\ {\mathbf{B}}_{\ell-1}^{T}y\\ \sum_{i=\ell}^{k}{\mathbf{B}}_{i}^{T}y\\ {\mathbf{B}}_{k+1}^{T}y\\ \vdots\\ {\mathbf{B}}_{J}^{T}y\\ \end{bmatrix} =[1μ0000000001μ0002μ2μ1+μ000001+μ000001+μ]𝐁T𝐁γu+\displaystyle=\begin{bmatrix}1-\mu&\cdots&0&0&0&\cdots&0\\ \vdots&\ddots&0&0&0&\cdots&0\\ 0&\cdots&1-\mu&0&0&\cdots&0\\ 2\mu&\cdots&2\mu&1+\mu&0&\cdots&0\\ 0&\cdots&0&0&1+\mu&\cdots&0\\ \vdots&\vdots&\vdots&\vdots&\vdots&\ddots&\vdots\\ 0&\cdots&0&0&0&\cdots&1+\mu\end{bmatrix}{\mathbf{B}}^{T}{\mathbf{B}}\gamma^{u}+
[1+μ0000000001+μ0002μ2μ1μ000001μ000001μ]𝐁T𝐁γd\displaystyle\qquad\qquad\begin{bmatrix}1+\mu&\cdots&0&0&0&\cdots&0\\ \vdots&\ddots&0&0&0&\cdots&0\\ 0&\cdots&1+\mu&0&0&\cdots&0\\ -2\mu&\cdots&-2\mu&1-\mu&0&\cdots&0\\ 0&\cdots&0&0&1-\mu&\cdots&0\\ \vdots&\vdots&\vdots&\vdots&\vdots&\ddots&\vdots\\ 0&\cdots&0&0&0&\cdots&1-\mu\end{bmatrix}{\mathbf{B}}^{T}{\mathbf{B}}\gamma^{d}
𝐀𝐁T𝐁γu+(2𝐈𝐀)𝐁T𝐁γd.\displaystyle\triangleq{\mathbf{A}}{\mathbf{B}}^{T}{\mathbf{B}}\gamma^{u}+(2{\mathbf{I}}-{\mathbf{A}}){\mathbf{B}}^{T}{\mathbf{B}}\gamma^{d}\,.

If =k\ell=k, then

𝐁Ty=𝐀𝐁T𝐁γu+(2𝐈𝐀)𝐁T𝐁γd.{\mathbf{B}}^{T}y={\mathbf{A}}{\mathbf{B}}^{T}{\mathbf{B}}\gamma^{u}+(2{\mathbf{I}}-{\mathbf{A}}){\mathbf{B}}^{T}{\mathbf{B}}\gamma^{d}\,.

Appendix F Proof of Proposition 4

Proof.

The fitting vector is

𝐟^\displaystyle\hat{\mathbf{f}} =𝐁γ^u+𝐁γ^d\displaystyle={\mathbf{B}}\hat{\gamma}^{u}+{\mathbf{B}}\hat{\gamma}^{d}
=1μ+1𝐁𝐆T(𝐆𝐁T𝐁𝐆T)1𝐆𝐁T𝐲+2μμ+1𝟏nT𝐟^2n𝟏n\displaystyle=\frac{1}{\mu+1}{\mathbf{B}}{\mathbf{G}}^{T}({\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T})^{-1}{\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{y}}+\frac{2\mu}{\mu+1}\frac{{\boldsymbol{1}}^{T}_{n}\hat{\mathbf{f}}}{2n}{\boldsymbol{1}}_{n}
k𝐀𝐲+1n(1k)𝟏nT𝐟^𝟏n,\displaystyle\triangleq k{\mathbf{A}}{\mathbf{y}}+\frac{1}{n}(1-k){\boldsymbol{1}}^{T}_{n}\hat{\mathbf{f}}{\boldsymbol{1}}_{n}\,, (34)

then

𝟏nT𝐟^=k𝟏nT𝐀𝐲+(1k)𝟏nT𝐟^.{\boldsymbol{1}}^{T}_{n}\hat{\mathbf{f}}=k{\boldsymbol{1}}^{T}_{n}{\mathbf{A}}{\mathbf{y}}+(1-k){\boldsymbol{1}}^{T}_{n}\hat{\mathbf{f}}\,. (35)

It follows that

𝟏nT𝐟^=𝟏nT𝐀𝐲,{\boldsymbol{1}}^{T}_{n}\hat{\mathbf{f}}={\boldsymbol{1}}^{T}_{n}{\mathbf{A}}{\mathbf{y}}\,,

then

𝐟^=k𝐀𝐲+1kn𝟏nT𝐀𝐲𝟏n=[k𝐈+1kn𝟏n𝟏nT]𝐀𝐲.\hat{\mathbf{f}}=k{\mathbf{A}}{\mathbf{y}}+\frac{1-k}{n}{\boldsymbol{1}}^{T}_{n}{\mathbf{A}}{\mathbf{y}}{\boldsymbol{1}}_{n}=\left[k{\mathbf{I}}+\frac{1-k}{n}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\right]{\mathbf{A}}{\mathbf{y}}\,. (36)
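The ingredients of (36) can be verified numerically. A minimal sketch (hat-basis stand-in for 𝐁 and a hand-picked tie pattern for 𝐆) checking that 𝐀 is a symmetric idempotent smoother with 𝟏ₙᵀ𝐀=𝟏ₙᵀ and tr(𝐀)=g, identities used in the sequel:

```julia
using LinearAlgebra
n, J = 50, 8
t = range(0, 1; length = J)
x = range(0, 1; length = n)
B = max.(0, 1 .- abs.(x .- t') ./ step(t))   # hat basis, stand-in for 𝐁
G = Float64[1 0 0 0 0 0 0 0;                  # tie pattern: {2,3} and {5,6,7}
            0 1 1 0 0 0 0 0;
            0 0 0 1 0 0 0 0;
            0 0 0 0 1 1 1 0;
            0 0 0 0 0 0 0 1]
A = B * G' * ((G * (B'B) * G') \ (G * B'))
@assert A * A ≈ A && A ≈ A'                   # idempotent and symmetric
@assert ones(n)' * A ≈ ones(n)'               # 𝟏ₙᵀ𝐀 = 𝟏ₙᵀ
@assert tr(A) ≈ size(G, 1)                    # tr(𝐀) = g
```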

Note that 𝐆 depends on 𝐲, and hence 𝐀 also depends on 𝐲. Taking expectations in (36) and applying the law of total expectation, we obtain

𝔼𝐟^=𝔼[𝔼[𝐟^𝐆]]=𝔼[k𝐀𝐟+1kn𝟏nT𝐀𝐟𝟏n].{\mathbb{E}}\hat{\mathbf{f}}={\mathbb{E}}[{\mathbb{E}}[\hat{\mathbf{f}}\mid{\mathbf{G}}]]={\mathbb{E}}[k{\mathbf{A}}{\mathbf{f}}+\frac{1-k}{n}{\boldsymbol{1}}^{T}_{n}{\mathbf{A}}{\mathbf{f}}{\boldsymbol{1}}_{n}]\,.

Note that 𝐀𝐀ᵀ=𝐀=𝐀ᵀ, and that 𝟏ₙ=𝐁𝟏_J and 𝟏_J=𝐆ᵀ𝟏_g, so 𝟏ₙ=𝐁𝐆ᵀ𝟏_g; then

𝟏nT𝐀\displaystyle{\boldsymbol{1}}^{T}_{n}{\mathbf{A}} =𝟏nT𝐁𝐆T(𝐆𝐁T𝐁𝐆T)1𝐆𝐁T\displaystyle={\boldsymbol{1}}^{T}_{n}{\mathbf{B}}{\mathbf{G}}^{T}({\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T})^{-1}{\mathbf{G}}{\mathbf{B}}^{T}
=𝟏gT𝐆𝐁T𝐁𝐆T(𝐆𝐁T𝐁𝐆T)1𝐆𝐁T\displaystyle={\boldsymbol{1}}^{T}_{g}{\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T}({\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T})^{-1}{\mathbf{G}}{\mathbf{B}}^{T}
=𝟏gT𝐆𝐁T\displaystyle={\boldsymbol{1}}^{T}_{g}{\mathbf{G}}{\mathbf{B}}^{T}
=𝟏nT,\displaystyle={\boldsymbol{1}}^{T}_{n}\,,

and hence 𝟏nT𝐀𝐀T𝟏n=n{\boldsymbol{1}}_{n}^{T}{\mathbf{A}}{\mathbf{A}}^{T}{\boldsymbol{1}}_{n}=n. It follows that

𝔼𝐟^=k𝔼𝐀𝐟+1kn𝟏nT𝐟𝟏n.{\mathbb{E}}\hat{\mathbf{f}}=k{\mathbb{E}}{\mathbf{A}}{\mathbf{f}}+\frac{1-k}{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{f}}{\boldsymbol{1}}_{n}\,.

The square of bias is

𝐟𝔼𝐟^2\displaystyle\|{\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}\|^{2} =𝐟k𝔼𝐀𝐟1kn𝟏nT𝐟𝟏n2\displaystyle=\|{\mathbf{f}}-k{\mathbb{E}}{\mathbf{A}}{\mathbf{f}}-\frac{1-k}{n}{\boldsymbol{1}}^{T}_{n}{\mathbf{f}}{\boldsymbol{1}}_{n}\|^{2}
=𝐟T(𝐈k𝔼𝐀)2𝐟2(1k)n𝐟T(𝐈k𝔼𝐀)𝟏nT𝐟𝟏n+(1k)2n2(𝟏nT𝐟)2𝟏nT𝟏n\displaystyle={\mathbf{f}}^{T}({\mathbf{I}}-k{\mathbb{E}}{\mathbf{A}})^{2}{\mathbf{f}}-\frac{2(1-k)}{n}{\mathbf{f}}^{T}({\mathbf{I}}-k{\mathbb{E}}{\mathbf{A}}){\boldsymbol{1}}^{T}_{n}{\mathbf{f}}{\boldsymbol{1}}_{n}+\frac{(1-k)^{2}}{n^{2}}({\boldsymbol{1}}^{T}_{n}{\mathbf{f}})^{2}{\boldsymbol{1}}^{T}_{n}{\boldsymbol{1}}_{n}
=𝐟T(𝐈k𝔼𝐀)2𝐟2(1k)n(𝟏nT𝐟)𝐟T(𝐈k𝔼𝐀)𝟏n+(1k)2n(𝟏nT𝐟)2\displaystyle={\mathbf{f}}^{T}({\mathbf{I}}-k{\mathbb{E}}{\mathbf{A}})^{2}{\mathbf{f}}-\frac{2(1-k)}{n}({\boldsymbol{1}}^{T}_{n}{\mathbf{f}}){\mathbf{f}}^{T}({\mathbf{I}}-k{\mathbb{E}}{\mathbf{A}}){\boldsymbol{1}}_{n}+\frac{(1-k)^{2}}{n}({\boldsymbol{1}}^{T}_{n}{\mathbf{f}})^{2}
=𝐟T(𝐈k𝔼𝐀)2𝐟(1k)2n(𝟏nT𝐟)2\displaystyle={\mathbf{f}}^{T}({\mathbf{I}}-k{\mathbb{E}}{\mathbf{A}})^{2}{\mathbf{f}}-\frac{(1-k)^{2}}{n}({\boldsymbol{1}}^{T}_{n}{\mathbf{f}})^{2}
=𝐟T𝐟2k𝐟T𝔼𝐀𝐟+k2𝐟T[𝔼𝐀]2𝐟(1k)2n(𝟏nT𝐟)2.\displaystyle={\mathbf{f}}^{T}{\mathbf{f}}-2k{\mathbf{f}}^{T}{\mathbb{E}}{\mathbf{A}}{\mathbf{f}}+k^{2}{\mathbf{f}}^{T}[{\mathbb{E}}{\mathbf{A}}]^{2}{\mathbf{f}}-\frac{(1-k)^{2}}{n}({\boldsymbol{1}}^{T}_{n}{\mathbf{f}})^{2}\,.

Note that

[𝔼𝐀]2=𝔼𝐀2Var(𝐀),[{\mathbb{E}}{\mathbf{A}}]^{2}={\mathbb{E}}{\mathbf{A}}^{2}-\operatorname*{Var}({\mathbf{A}})\,,

and since 𝐀²=𝐀, we can write

k2𝐟T[𝔼𝐀]2𝐟=k2𝐟T[𝔼𝐀Var(𝐀)]𝐟.k^{2}{\mathbf{f}}^{T}[{\mathbb{E}}{\mathbf{A}}]^{2}{\mathbf{f}}=k^{2}{\mathbf{f}}^{T}[{\mathbb{E}}{\mathbf{A}}-\operatorname*{Var}({\mathbf{A}})]{\mathbf{f}}\,.

It follows that

𝐟𝔼𝐟^2\displaystyle\|{\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}\|^{2} =𝐟T𝔼𝐀𝐟2k𝐟T𝔼𝐀𝐟+k2𝐟T𝔼𝐀𝐟+𝐟T𝐟𝐟T𝔼𝐀𝐟k2𝐟TVar(𝐀)𝐟(1k)2n(𝟏nT𝐟)2\displaystyle={\mathbf{f}}^{T}{\mathbb{E}}{\mathbf{A}}{\mathbf{f}}-2k{\mathbf{f}}^{T}{\mathbb{E}}{\mathbf{A}}{\mathbf{f}}+k^{2}{\mathbf{f}}^{T}{\mathbb{E}}{\mathbf{A}}{\mathbf{f}}+{\mathbf{f}}^{T}{\mathbf{f}}-{\mathbf{f}}^{T}{\mathbb{E}}{\mathbf{A}}{\mathbf{f}}-k^{2}{\mathbf{f}}^{T}\operatorname*{Var}({\mathbf{A}}){\mathbf{f}}-\frac{(1-k)^{2}}{n}({\boldsymbol{1}}^{T}_{n}{\mathbf{f}})^{2}
=(1k)2𝐟T𝔼𝐀𝐟+𝐟T(𝐈𝔼𝐀)𝐟k2𝐟TVar(𝐀)𝐟(1k)2n(𝟏nT𝐟)2\displaystyle=(1-k)^{2}{\mathbf{f}}^{T}{\mathbb{E}}{\mathbf{A}}{\mathbf{f}}+{\mathbf{f}}^{T}({\mathbf{I}}-{\mathbb{E}}{\mathbf{A}}){\mathbf{f}}-k^{2}{\mathbf{f}}^{T}\operatorname*{Var}({\mathbf{A}}){\mathbf{f}}-\frac{(1-k)^{2}}{n}({\boldsymbol{1}}^{T}_{n}{\mathbf{f}})^{2}
=(1k)2(𝐟T𝔼𝐀𝐟(𝟏nT𝐟)2n)+𝐟T(𝐈𝔼𝐀)𝐟k2𝐟TVar(𝐀)𝐟\displaystyle=(1-k)^{2}\left({\mathbf{f}}^{T}{\mathbb{E}}{\mathbf{A}}{\mathbf{f}}-\frac{({\boldsymbol{1}}_{n}^{T}{\mathbf{f}})^{2}}{n}\right)+{\mathbf{f}}^{T}({\mathbf{I}}-{\mathbb{E}}{\mathbf{A}}){\mathbf{f}}-k^{2}{\mathbf{f}}^{T}\operatorname*{Var}({\mathbf{A}}){\mathbf{f}}
(1k)2C1+C2k2C3.\displaystyle\triangleq(1-k)^{2}C_{1}+C_{2}-k^{2}C_{3}\,.

Since for each 𝐀 we have (𝐈−𝐀)²=𝐈−𝐀 and (𝐀−𝟏𝟏ᵀ/n)²=𝐀−𝟏𝟏ᵀ/n, both 𝐈−𝐀 and 𝐀−𝟏𝟏ᵀ/n are idempotent and hence positive semi-definite. It follows that 𝐈−𝔼𝐀 and 𝔼𝐀−𝟏𝟏ᵀ/n are also positive semi-definite, so C₁≥0 and C₂≥0.

On the other hand, by the law of total variance, we have

Var(𝐟^)=𝔼[Var(𝐟^𝐆)]+Var[𝔼[𝐟^𝐆]].\operatorname*{Var}(\hat{\mathbf{f}})={\mathbb{E}}[\operatorname*{Var}(\hat{\mathbf{f}}\mid{\mathbf{G}})]+\operatorname*{Var}[{\mathbb{E}}[\hat{\mathbf{f}}\mid{\mathbf{G}}]]\,. (37)

For the first term of (37), we have

\operatorname*{Var}(\hat{\mathbf{f}}\mid{\mathbf{G}})=\sigma^{2}\left[k{\mathbf{I}}+\frac{1-k}{n}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\right]{\mathbf{A}}{\mathbf{A}}^{T}\left[k{\mathbf{I}}+\frac{1-k}{n}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\right]
=\sigma^{2}\left[k^{2}{\mathbf{A}}{\mathbf{A}}^{T}+\frac{k(1-k)}{n}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{A}}{\mathbf{A}}^{T}+\frac{k(1-k)}{n}{\mathbf{A}}{\mathbf{A}}^{T}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}+\frac{(1-k)^{2}}{n^{2}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{A}}{\mathbf{A}}^{T}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\right]\,.

Since

tr(𝐀𝐀T)\displaystyle\operatorname*{tr}({\mathbf{A}}{\mathbf{A}}^{T}) =tr(𝐀)\displaystyle=\operatorname*{tr}({\mathbf{A}})
=tr(𝐁𝐆T(𝐆𝐁T𝐁𝐆T)1𝐆𝐁T)\displaystyle=\operatorname*{tr}({\mathbf{B}}{\mathbf{G}}^{T}({\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T})^{-1}{\mathbf{G}}{\mathbf{B}}^{T})
=tr((𝐆𝐁T𝐁𝐆T)1𝐆𝐁T𝐁𝐆T)\displaystyle=\operatorname*{tr}(({\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T})^{-1}{\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T})
=g,\displaystyle=g\,,

then

\operatorname*{tr}[\operatorname*{Var}(\hat{\mathbf{f}}\mid{\mathbf{G}})]=\sigma^{2}\left[k^{2}g+2k(1-k)+(1-k)^{2}\right]\,.

For the second term of (37), we have

Var[𝔼[𝐟^𝐆]]=Var[k𝐀𝐟+1kn𝟏nT𝐟𝟏n]=k2Var[𝐀]𝐟𝐟T.\displaystyle\operatorname*{Var}[{\mathbb{E}}[\hat{\mathbf{f}}\mid{\mathbf{G}}]]=\operatorname*{Var}[k{\mathbf{A}}{\mathbf{f}}+\frac{1-k}{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{f}}{\boldsymbol{1}}_{n}]=k^{2}\operatorname*{Var}[{\mathbf{A}}]{\mathbf{f}}{\mathbf{f}}^{T}\,.

Thus,

tr[Var(𝐟^)]\displaystyle\operatorname*{tr}[\operatorname*{Var}(\hat{\mathbf{f}})] =tr[𝔼[Var(𝐟^𝐆)]]+tr[Var[𝔼[𝐟^𝐆]]]\displaystyle=\operatorname*{tr}[{\mathbb{E}}[\operatorname*{Var}(\hat{\mathbf{f}}\mid{\mathbf{G}})]]+\operatorname*{tr}[\operatorname*{Var}[{\mathbb{E}}[\hat{\mathbf{f}}\mid{\mathbf{G}}]]]
=𝔼[tr[Var(𝐟^𝐆)]]+tr[k2Var[𝐀]𝐟𝐟T]\displaystyle={\mathbb{E}}[\operatorname*{tr}[\operatorname*{Var}(\hat{\mathbf{f}}\mid{\mathbf{G}})]]+\operatorname*{tr}[k^{2}\operatorname*{Var}[{\mathbf{A}}]{\mathbf{f}}{\mathbf{f}}^{T}]
=\sigma^{2}\left[k^{2}{\mathbb{E}}g+2k(1-k)+(1-k)^{2}\right]+k^{2}{\mathbf{f}}^{T}\operatorname*{Var}({\mathbf{A}}){\mathbf{f}}
\triangleq\sigma^{2}\left[k^{2}{\mathbb{E}}g+2k(1-k)+(1-k)^{2}\right]+k^{2}C_{3}\,.

It follows that the mean square error is

\operatorname*{MSE}=\|\operatorname{Bias}\|^{2}+\operatorname*{tr}[\operatorname*{Var}(\hat{\mathbf{f}})]=(1-k)^{2}C_{1}+C_{2}+\sigma^{2}\left[k^{2}{\mathbb{E}}g+2k(1-k)+(1-k)^{2}\right]=(1-k)^{2}C_{1}+C_{2}+\sigma^{2}\left[k^{2}({\mathbb{E}}g-1)+1\right]\,.

The unconstrained B-spline fitting has

MSE=lsJσ2.\operatorname*{MSE}{}^{\mathrm{ls}}=J\sigma^{2}\,.

To have MSE<MSEls\operatorname*{MSE}<\operatorname*{MSE}{}^{\mathrm{ls}},

(1-k)^{2}C_{1}+C_{2}+\sigma^{2}\left[k^{2}({\mathbb{E}}g-1)+1\right]<J\sigma^{2}\,,

that is

h(k)\triangleq\left[C_{1}+({\mathbb{E}}g-1)\sigma^{2}\right]k^{2}-2C_{1}k+C_{1}+C_{2}+(1-J)\sigma^{2}<0\,.

Note that the minimum is attained at

k_{\min}=\frac{2C_{1}}{2\left[C_{1}+({\mathbb{E}}g-1)\sigma^{2}\right]}=\frac{C_{1}}{C_{1}+({\mathbb{E}}g-1)\sigma^{2}}\in(0,1)\,,

provided C_{1}>0 and {\mathbb{E}}g>1, and the minimum value is

h_{\min}=C_{1}+C_{2}+(1-J)\sigma^{2}-\frac{(2C_{1})^{2}}{4\left[C_{1}+({\mathbb{E}}g-1)\sigma^{2}\right]}
=\frac{-\sigma^{4}(J-1)({\mathbb{E}}g-1)+\sigma^{2}\left[-C_{1}(J-{\mathbb{E}}g)+C_{2}({\mathbb{E}}g-1)\right]+C_{1}C_{2}}{C_{1}+({\mathbb{E}}g-1)\sigma^{2}}
=\frac{u(\sigma^{2})}{C_{1}+({\mathbb{E}}g-1)\sigma^{2}}\,,

where

u(t)=-t^{2}(J-1)({\mathbb{E}}g-1)+t\left[-C_{1}(J-{\mathbb{E}}g)+C_{2}({\mathbb{E}}g-1)\right]+C_{1}C_{2}\,.

Since C_{1},C_{2}\geq 0, the discriminant satisfies

\Delta=\left[-C_{1}(J-{\mathbb{E}}g)+C_{2}({\mathbb{E}}g-1)\right]^{2}+4C_{1}C_{2}(J-1)({\mathbb{E}}g-1)=\left[C_{1}(J-{\mathbb{E}}g)+C_{2}({\mathbb{E}}g-1)\right]^{2}+4C_{1}C_{2}({\mathbb{E}}g-1)^{2}\geq 0\,,

then u(t)<0u(t)<0 if

t>\frac{C_{1}(J-{\mathbb{E}}g)-C_{2}({\mathbb{E}}g-1)-\sqrt{\Delta}}{-2(J-1)({\mathbb{E}}g-1)}=\frac{-C_{1}(J-{\mathbb{E}}g)+C_{2}({\mathbb{E}}g-1)+\sqrt{\Delta}}{2(J-1)({\mathbb{E}}g-1)}\,.
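As a sanity check on the h(k) algebra, a minimal numerical sketch with arbitrary non-negative constants (all values illustrative):

```julia
C1, C2, Eg, J, σ2 = 2.0, 0.5, 3.0, 6.0, 1.3
h(k) = (1 - k)^2 * C1 + C2 + σ2 * (k^2 * (Eg - 1) + 1) - J * σ2
kmin = C1 / (C1 + (Eg - 1) * σ2)
u(t) = -t^2 * (J - 1) * (Eg - 1) + t * (-C1 * (J - Eg) + C2 * (Eg - 1)) + C1 * C2
@assert h(kmin) ≈ u(σ2) / (C1 + (Eg - 1) * σ2)   # h at its minimizer equals h_min
@assert h(kmin) < 0                               # MSE < MSEˡˢ is attainable here
```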

F.1 𝐆=𝐈{\mathbf{G}}={\mathbf{I}}

If 𝐆=𝐈{\mathbf{G}}={\mathbf{I}}, that is, no ties in the solution,

𝐀𝐟=𝐀𝐁γ=𝐁(𝐁T𝐁)1𝐁T𝐁γ=𝐁γ=𝐟.{\mathbf{A}}{\mathbf{f}}={\mathbf{A}}{\mathbf{B}}\gamma={\mathbf{B}}({\mathbf{B}}^{T}{\mathbf{B}})^{-1}{\mathbf{B}}^{T}{\mathbf{B}}\gamma={\mathbf{B}}\gamma={\mathbf{f}}\,.
𝐟𝔼𝐟^2\displaystyle\|{\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}\|^{2} =𝐟k𝐟1kn𝟏T𝐟𝟏2\displaystyle=\|{\mathbf{f}}-k{\mathbf{f}}-\frac{1-k}{n}{\boldsymbol{1}}^{T}{\mathbf{f}}{\boldsymbol{1}}\|^{2}
=(1k)2𝐟1n𝟏T𝐟𝟏2\displaystyle=(1-k)^{2}\|{\mathbf{f}}-\frac{1}{n}{\boldsymbol{1}}^{T}{\mathbf{f}}{\boldsymbol{1}}\|^{2}
=(1k)2[𝐟21n(𝟏nT𝐟)2]\displaystyle=(1-k)^{2}\left[\|{\mathbf{f}}\|^{2}-\frac{1}{n}({\boldsymbol{1}}^{T}_{n}{\mathbf{f}})^{2}\right]

If 𝐆=𝐈, then J=g and C_{2}=0, so

h_{\min}=\frac{-\sigma^{4}(g-1)^{2}}{C_{1}+(g-1)\sigma^{2}}<0\,.

Appendix G Proof of Proposition 5

G.1 (i)

Proof.

If γᵘ+γᵈ=γ̃ᵘ+γ̃ᵈ, then the roughness penalty is unchanged:

(γu+γd)T𝛀(γu+γd)=(γ~u+γ~d)T𝛀(γ~u+γ~d).(\gamma^{u}+\gamma^{d})^{T}{\boldsymbol{\Omega}}(\gamma^{u}+\gamma^{d})=(\tilde{\gamma}^{u}+\tilde{\gamma}^{d})^{T}{\boldsymbol{\Omega}}(\tilde{\gamma}^{u}+\tilde{\gamma}^{d})\,.

Then we can repeat the argument of Section D.1. ∎

G.2 (ii)

Proof.

This is the special case 𝐆=𝐈 of the result proved in Section G.3 below. ∎

G.3 (iii)

Proof.

Taking derivatives of the Lagrangian,

2𝐙T(𝐲𝐙γ)+2𝐇Tν+2μ𝐖γ+2λ[𝛀𝛀𝛀𝛀]γ=0-2{\mathbf{Z}}^{T}({\mathbf{y}}-{\mathbf{Z}}\gamma)+2{\mathbf{H}}^{T}\nu+2\mu{\mathbf{W}}\gamma+2\lambda\begin{bmatrix}{\boldsymbol{\Omega}}&{\boldsymbol{\Omega}}\\ {\boldsymbol{\Omega}}&{\boldsymbol{\Omega}}\end{bmatrix}\gamma=0

Then

𝐁iT(𝐲𝐙γ)+𝐇iTν+μ𝐁iT𝐁(γuγd)+λ𝛀i(γu+γd)=0\displaystyle-{\mathbf{B}}^{T}_{i}({\mathbf{y}}-{\mathbf{Z}}\gamma)+{\mathbf{H}}_{i}^{T}\nu+\mu{\mathbf{B}}_{i}^{T}{\mathbf{B}}(\gamma^{u}-\gamma^{d})+\lambda{\boldsymbol{\Omega}}_{i}(\gamma^{u}+\gamma^{d})=0 (38)
𝐁iT(𝐲𝐙γ)+𝐇i+JTν+μ𝐁iT𝐁(γu+γd)+λ𝛀i(γu+γd)=0\displaystyle-{\mathbf{B}}^{T}_{i}({\mathbf{y}}-{\mathbf{Z}}\gamma)+{\mathbf{H}}_{i+J}^{T}\nu+\mu{\mathbf{B}}_{i}^{T}{\mathbf{B}}(-\gamma^{u}+\gamma^{d})+\lambda{\boldsymbol{\Omega}}_{i}(\gamma^{u}+\gamma^{d})=0 (39)

Rewrite them as

{\mathbf{B}}^{T}_{i}{\mathbf{y}}-\xi_{i}=((1+\mu){\mathbf{B}}_{i}^{T}{\mathbf{B}}+\lambda{\boldsymbol{\Omega}}_{i})\gamma^{u}+((1-\mu){\mathbf{B}}_{i}^{T}{\mathbf{B}}+\lambda{\boldsymbol{\Omega}}_{i})\gamma^{d} (40)
{\mathbf{B}}^{T}_{i}{\mathbf{y}}-\xi_{i+J}=((1-\mu){\mathbf{B}}_{i}^{T}{\mathbf{B}}+\lambda{\boldsymbol{\Omega}}_{i})\gamma^{u}+((1+\mu){\mathbf{B}}_{i}^{T}{\mathbf{B}}+\lambda{\boldsymbol{\Omega}}_{i})\gamma^{d}\,. (41)

If there are no ties in the solution, i.e.,

γ1u<γ2u<<γJu,γ1d>γ2d>>γJd,\gamma_{1}^{u}<\gamma_{2}^{u}<\cdots<\gamma_{J}^{u}\,,\qquad\gamma_{1}^{d}>\gamma_{2}^{d}>\cdots>\gamma_{J}^{d}\,,

by the KKT condition, ν=0\nu=0, and hence ξi=ξi+J=0\xi_{i}=\xi_{i+J}=0, then (40) - (41) yields

0=2μ𝐁iT𝐁γu2μ𝐁iT𝐁γd,0=2\mu{\mathbf{B}}_{i}^{T}{\mathbf{B}}\gamma^{u}-2\mu{\mathbf{B}}_{i}^{T}{\mathbf{B}}\gamma^{d}\,,

that is

0=2μ𝐁T𝐁(γuγd),0=2\mu{\mathbf{B}}^{T}{\mathbf{B}}(\gamma^{u}-\gamma^{d})\,,

which implies μ=0, since γᵘ−γᵈ is strictly increasing (hence nonzero) and 𝐁ᵀ𝐁 is nonsingular. The problem then reduces to unconstrained B-spline fitting. This argument proves the first point of Proposition 2.

If \hat{\gamma}^{u}_{1}<\cdots<\hat{\gamma}^{u}_{k_{1}}=\cdots=\hat{\gamma}^{u}_{k_{2}}<\cdots<\hat{\gamma}^{u}_{k_{2m-1}}=\cdots=\hat{\gamma}^{u}_{k_{2m}}<\cdots<\hat{\gamma}^{u}_{J}, then, arguing as in Section D.3, we have

𝐆𝐁Ty=((1+μ)𝐆𝐁T𝐁+λ𝐆𝛀)γu+((1μ)𝐆𝐁T𝐁+λ𝐆𝛀)γd.{\mathbf{G}}{\mathbf{B}}^{T}y=((1+\mu){\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}+\lambda{\mathbf{G}}{\boldsymbol{\Omega}})\gamma^{u}+((1-\mu){\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}+\lambda{\mathbf{G}}{\boldsymbol{\Omega}})\gamma^{d}\,.

Let γu=𝐆Tβ\gamma^{u}={\mathbf{G}}^{T}\beta, where β\beta is the sub-vector of γu\gamma^{u} constructed by the unique elements. By G.1, γd=c𝟏J\gamma^{d}=c{\boldsymbol{1}}_{J}, and note that 𝐆T𝟏g=𝟏J{\mathbf{G}}^{T}{\boldsymbol{1}}_{g}={\boldsymbol{1}}_{J}, then

𝐆𝐁Ty=((1+μ)𝐆𝐁T𝐁𝐆T+λ𝐆𝛀𝐆T)β+c((1μ)𝐆𝐁T𝐁𝐆T+λ𝐆𝛀𝐆T)𝟏g,{\mathbf{G}}{\mathbf{B}}^{T}y=((1+\mu){\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T}+\lambda{\mathbf{G}}{\boldsymbol{\Omega}}{\mathbf{G}}^{T})\beta+c((1-\mu){\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T}+\lambda{\mathbf{G}}{\boldsymbol{\Omega}}{\mathbf{G}}^{T}){\boldsymbol{1}}_{g}\,,

it follows that

β\displaystyle\beta =((1+μ)𝐆𝐁T𝐁𝐆T+λ𝐆𝛀𝐆T)1𝐆𝐁Ty\displaystyle=((1+\mu){\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T}+\lambda{\mathbf{G}}{\boldsymbol{\Omega}}{\mathbf{G}}^{T})^{-1}{\mathbf{G}}{\mathbf{B}}^{T}y-
c((1+μ)𝐆𝐁T𝐁𝐆T+λ𝐆𝛀𝐆T)1((1μ)𝐆𝐁T𝐁𝐆T+λ𝐆𝛀𝐆T)𝟏g,\displaystyle\qquad c((1+\mu){\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T}+\lambda{\mathbf{G}}{\boldsymbol{\Omega}}{\mathbf{G}}^{T})^{-1}((1-\mu){\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T}+\lambda{\mathbf{G}}{\boldsymbol{\Omega}}{\mathbf{G}}^{T}){\boldsymbol{1}}_{g}\,,

and

c=𝟏nT𝐁γun=𝟏nT𝐁𝐆Tβn.c=\frac{{\boldsymbol{1}}_{n}^{T}{\mathbf{B}}\gamma^{u}}{n}=\frac{{\boldsymbol{1}}_{n}^{T}{\mathbf{B}}{\mathbf{G}}^{T}\beta}{n}\,.

G.4 Corollary 2

Analogously, we can obtain the monotone decomposition with smoothing splines for decreasing functions, as summarized in Corollary 2.

Corollary 2.

Let γ^=(γ^u,γ^d)\hat{\gamma}=(\hat{\gamma}^{u},\hat{\gamma}^{d}) be the monotone decomposition to problem (11). Suppose γ^u+γ^d\hat{\gamma}^{u}+\hat{\gamma}^{d} is decreasing, then

  1. (i)

    γ^u\hat{\gamma}^{u} is a vector with identical elements;

  2. (ii)

    if \hat{\gamma}^{d}_{1}>\cdots>\hat{\gamma}^{d}_{k_{1}}=\cdots=\hat{\gamma}^{d}_{k_{2}}>\cdots>\hat{\gamma}^{d}_{k_{2m-1}}=\cdots=\hat{\gamma}^{d}_{k_{2m}}>\hat{\gamma}^{d}_{J}, where 1\leq k_{1}\leq k_{2}\leq\cdots\leq k_{2m-1}\leq k_{2m}\leq J, then

    γ^d\displaystyle\hat{\gamma}^{d} =𝐆T((1+μ)𝐆𝐁T𝐁𝐆T+λ𝐆𝛀𝐆T)1𝐆𝐁Ty\displaystyle={\mathbf{G}}^{T}((1+\mu){\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T}+\lambda{\mathbf{G}}{\boldsymbol{\Omega}}{\mathbf{G}}^{T})^{-1}{\mathbf{G}}{\mathbf{B}}^{T}y-
    c𝐆T((1+μ)𝐆𝐁T𝐁𝐆T+λ𝐆𝛀𝐆T)1((1μ)𝐆𝐁T𝐁𝐆T+λ𝐆𝛀𝐆T)𝟏g,\displaystyle\qquad c{\mathbf{G}}^{T}((1+\mu){\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T}+\lambda{\mathbf{G}}{\boldsymbol{\Omega}}{\mathbf{G}}^{T})^{-1}((1-\mu){\mathbf{G}}{\mathbf{B}}^{T}{\mathbf{B}}{\mathbf{G}}^{T}+\lambda{\mathbf{G}}{\boldsymbol{\Omega}}{\mathbf{G}}^{T}){\boldsymbol{1}}_{g}\,,
    γ^u\displaystyle\hat{\gamma}^{u} =c𝟏J=𝟏T𝐁γ^dn𝟏J,\displaystyle=c{\boldsymbol{1}}_{J}=\frac{{\boldsymbol{1}}^{T}{\mathbf{B}}\hat{\gamma}^{d}}{n}{\boldsymbol{1}}_{J}\,,
Proof.

Similar to the proof of Corollary 1 in Section D.4. ∎

Appendix H Proof of Proposition 6

H.1 Lemmas

Lemma 2 ([14]).

Let 𝐀,𝐁{\mathbf{A}},{\mathbf{B}} be two real matrices of the same size, then

[tr(𝐀𝐁)]2[tr(𝐀2)][tr(𝐁2)].[\operatorname*{tr}({\mathbf{A}}{\mathbf{B}})]^{2}\leq[\operatorname*{tr}({\mathbf{A}}^{2})][\operatorname*{tr}({\mathbf{B}}^{2})]\,.
Lemma 3.

Let 𝐀{\mathbf{A}} be a real positive semi-definite matrix, then

tr(𝐀2)[tr(𝐀)]2.\operatorname*{tr}({\mathbf{A}}^{2})\leq[\operatorname*{tr}({\mathbf{A}})]^{2}\,.
Proof.

Let λᵢ be the eigenvalues of 𝐀; then λᵢ² are the eigenvalues of 𝐀². Since 𝐀 is positive semi-definite, λᵢ≥0, and hence

\operatorname*{tr}[{\mathbf{A}}^{2}]=\sum\lambda_{i}^{2}\leq\left(\sum\lambda_{i}\right)^{2}=[\operatorname*{tr}({\mathbf{A}})]^{2}\,.

Lemma 4.

The eigenvalues of 𝟏ₙ𝟏ₙᵀ are

λ1=n,λ2==λn=0.\lambda_{1}=n,\lambda_{2}=\cdots=\lambda_{n}=0\,.
Proof.

Since 𝟏ₙ𝟏ₙᵀ is a rank-one matrix, it has n−1 zero eigenvalues. Suppose the nonzero eigenvalue is λ₁ with associated eigenvector ξ₁; then

𝟏n𝟏nTξ1=λ1ξ1,{\boldsymbol{1}}_{n}{\boldsymbol{1}}^{T}_{n}\xi_{1}=\lambda_{1}\xi_{1}\,,

left multiplying 𝟏nT{\boldsymbol{1}}^{T}_{n} yields

n𝟏nTξ1=λ1𝟏Tξ1,n{\boldsymbol{1}}_{n}^{T}\xi_{1}=\lambda_{1}{\boldsymbol{1}}^{T}\xi_{1}\,,

which implies that λ1=n\lambda_{1}=n. ∎
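Lemmas 3 and 4 are easy to confirm numerically; a minimal sketch:

```julia
using LinearAlgebra
n = 6
λ = eigvals(Symmetric(ones(n, n)))             # spectrum of 𝟏ₙ𝟏ₙᵀ (Lemma 4)
@assert count(x -> x ≈ n, λ) == 1
@assert count(x -> abs(x) < 1e-10, λ) == n - 1
A = let X = randn(n, n); Symmetric(X'X) end    # a PSD matrix (Lemma 3)
@assert tr(A^2) ≤ tr(A)^2
```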

H.2 𝐆=𝐈{\mathbf{G}}={\mathbf{I}}

Proof.

If 𝐆=𝐈{\mathbf{G}}={\mathbf{I}}, the solution is

\hat{\gamma}^{u}=\frac{1}{1+\mu}\left[{\mathbf{B}}^{T}{\mathbf{B}}+\frac{\lambda}{1+\mu}{\boldsymbol{\Omega}}\right]^{-1}{\mathbf{B}}^{T}{\mathbf{y}}-c((1+\mu){\mathbf{B}}^{T}{\mathbf{B}}+\lambda{\boldsymbol{\Omega}})^{-1}((1-\mu){\mathbf{B}}^{T}{\mathbf{B}}+\lambda{\boldsymbol{\Omega}}){\boldsymbol{1}}_{J}
\triangleq\frac{1}{1+\mu}\left[{\mathbf{B}}^{T}{\mathbf{B}}+\lambda_{0}{\boldsymbol{\Omega}}\right]^{-1}{\mathbf{B}}^{T}{\mathbf{y}}-c{\mathbf{K}}{\boldsymbol{1}}_{J}\,,\qquad\lambda_{0}\triangleq\frac{\lambda}{1+\mu}\,,
γ^d\displaystyle\hat{\gamma}^{d} =c𝟏J=𝟏nT𝐟^2n𝟏J,\displaystyle=c{\boldsymbol{1}}_{J}=\frac{{\boldsymbol{1}}_{n}^{T}\hat{\mathbf{f}}}{2n}{\boldsymbol{1}}_{J}\,,

then the fitting vector is

𝐟^\displaystyle\hat{\mathbf{f}} =𝐁γ^u+𝐁γ^d\displaystyle={\mathbf{B}}\hat{\gamma}^{u}+{\mathbf{B}}\hat{\gamma}^{d}
=11+μ𝐟^ss+c𝐁(𝐊+𝐈)𝟏J\displaystyle=\frac{1}{1+\mu}\hat{\mathbf{f}}^{\mathrm{ss}}+c{\mathbf{B}}(-{\mathbf{K}}+{\mathbf{I}}){\boldsymbol{1}}_{J}
=11+μ𝐟^ss+𝟏nT𝐟^2n𝐁[(1+μ)𝐁T𝐁+λ𝛀]12μ𝐁T𝐁𝟏J\displaystyle=\frac{1}{1+\mu}\hat{\mathbf{f}}^{\mathrm{ss}}+\frac{{\boldsymbol{1}}^{T}_{n}\hat{\mathbf{f}}}{2n}{\mathbf{B}}\left[(1+\mu){\mathbf{B}}^{T}{\mathbf{B}}+\lambda{\boldsymbol{\Omega}}\right]^{-1}2\mu{\mathbf{B}}^{T}{\mathbf{B}}{\boldsymbol{1}}_{J}
=11+μ𝐟^ss+μ1+μ𝟏nT𝐟^n𝐁[𝐁T𝐁+λ1+μ𝛀]1𝐁T𝟏n\displaystyle=\frac{1}{1+\mu}\hat{\mathbf{f}}^{\mathrm{ss}}+\frac{\mu}{1+\mu}\frac{{\boldsymbol{1}}^{T}_{n}\hat{\mathbf{f}}}{n}{\mathbf{B}}\left[{\mathbf{B}}^{T}{\mathbf{B}}+\frac{\lambda}{1+\mu}{\boldsymbol{\Omega}}\right]^{-1}{\mathbf{B}}^{T}{\boldsymbol{1}}_{n}
k𝐟^ss+(1k)𝟏nT𝐟^n𝐐𝟏n,\displaystyle\triangleq k\hat{\mathbf{f}}^{\mathrm{ss}}+(1-k)\frac{{\boldsymbol{1}}^{T}_{n}\hat{\mathbf{f}}}{n}{\mathbf{Q}}{\boldsymbol{1}}_{n}\,,

where 𝐐=𝐁[𝐁T𝐁+λ0𝛀]1𝐁T{\mathbf{Q}}={\mathbf{B}}\left[{\mathbf{B}}^{T}{\mathbf{B}}+\lambda_{0}{\boldsymbol{\Omega}}\right]^{-1}{\mathbf{B}}^{T}. In practice, λ0\lambda_{0} is given as a constant, so 𝐐{\mathbf{Q}} does not depend on kk. Left multiplying 𝟏nT{\boldsymbol{1}}_{n}^{T} yields

𝟏nT𝐟^=k𝟏nT𝐟^ss+(1k)𝟏nT𝐟^𝟏nT𝐐𝟏nn,{\boldsymbol{1}}^{T}_{n}\hat{\mathbf{f}}=k{\boldsymbol{1}}^{T}_{n}\hat{\mathbf{f}}^{\mathrm{ss}}+(1-k){\boldsymbol{1}}^{T}_{n}\hat{\mathbf{f}}\frac{{\boldsymbol{1}}^{T}_{n}{\mathbf{Q}}{\boldsymbol{1}}_{n}}{n}\,,

that is,

𝟏nT𝐟^=k𝟏nT𝐟^ss1𝟏nT𝐐𝟏nn(1k)k𝟏nT𝐟^ss1η(1k).{\boldsymbol{1}}^{T}_{n}\hat{\mathbf{f}}=\frac{k{\boldsymbol{1}}^{T}_{n}\hat{\mathbf{f}}^{\mathrm{ss}}}{1-\frac{{\boldsymbol{1}}^{T}_{n}{\mathbf{Q}}{\boldsymbol{1}}_{n}}{n}(1-k)}\triangleq\frac{k{\boldsymbol{1}}^{T}_{n}\hat{\mathbf{f}}^{\mathrm{ss}}}{1-\eta(1-k)}\,.

It follows that

𝐟^=[k𝐈+1knk1η(1k)𝐐𝟏n𝟏nT]𝐟^ss(k𝐈+α𝐐𝟏n𝟏nT)𝐟^ss.\hat{\mathbf{f}}=\left[k{\mathbf{I}}+\frac{1-k}{n}\frac{k}{1-\eta(1-k)}{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\right]\hat{\mathbf{f}}^{\mathrm{ss}}\triangleq(k{\mathbf{I}}+\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T})\hat{\mathbf{f}}^{\mathrm{ss}}\,.

Then the squared bias is

𝐟𝔼𝐟^22=𝐟(k𝐈+α𝐐𝟏n𝟏nT)𝔼𝐟^ss22\displaystyle\|{\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}\|^{2}_{2}=\|{\mathbf{f}}-(k{\mathbf{I}}+\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}){\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}}\|^{2}_{2}
=(k𝐈+α𝐐𝟏n𝟏nT)(𝐟𝔼𝐟^ss)+((1k)𝐈α𝐐𝟏n𝟏nT)𝐟2\displaystyle=\|(k{\mathbf{I}}+\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T})({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})+((1-k){\mathbf{I}}-\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}){\mathbf{f}}\|^{2}
=(𝐟𝔼𝐟^ss)T(k𝐈+α𝟏n𝟏nT𝐐)(k𝐈+α𝐐𝟏n𝟏nT)(𝐟𝔼𝐟^ss)+\displaystyle=({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})^{T}(k{\mathbf{I}}+\alpha{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}})(k{\mathbf{I}}+\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T})({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})+
2(𝐟𝔼𝐟^ss)T(k𝐈+α𝐐𝟏n𝟏nT)T((1k)𝐈α𝐐𝟏n𝟏nT)𝐟+\displaystyle\qquad 2({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})^{T}(k{\mathbf{I}}+\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T})^{T}((1-k){\mathbf{I}}-\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}){\mathbf{f}}+
𝐟T((1k)𝐈α𝐐𝟏n𝟏nT)T((1k)𝐈α𝐐𝟏n𝟏nT)𝐟\displaystyle\qquad{\mathbf{f}}^{T}((1-k){\mathbf{I}}-\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T})^{T}((1-k){\mathbf{I}}-\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}){\mathbf{f}}
C1+2C2+C3.\displaystyle\triangleq C_{1}+2C_{2}+C_{3}\,.

First for term C1C_{1}, we have

C1\displaystyle C_{1} =(𝐟𝔼𝐟^ss)T(k𝐈+α𝟏n𝟏nT𝐐)(k𝐈+α𝐐𝟏n𝟏nT)(𝐟𝔼𝐟^ss)\displaystyle=({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})^{T}(k{\mathbf{I}}+\alpha{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}})(k{\mathbf{I}}+\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T})({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})
=k2𝐟𝔼𝐟^ss2+2kα(𝐟𝔼𝐟^ss)T𝟏n𝟏nT𝐐(𝐟𝔼𝐟^ss)+\displaystyle=k^{2}\|{\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}}\|^{2}+2k\alpha({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})^{T}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})+
α2(𝐟𝔼𝐟^ss)T𝟏n𝟏nT𝐐2𝟏n𝟏nT(𝐟𝔼𝐟^ss).\displaystyle\qquad\qquad\alpha^{2}({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})^{T}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}^{2}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})\,.

Take the derivative with respect to kk,

C1k\displaystyle\frac{\partial C_{1}}{\partial k} =2k𝐟𝔼𝐟^ss2+2(kαk+α)(𝐟𝔼𝐟^ss)T𝟏n𝟏nT𝐐(𝐟𝔼𝐟^ss)\displaystyle=2k\|{\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}}\|^{2}+2\left(k\frac{\partial\alpha}{\partial k}+\alpha\right)({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})^{T}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})
+2ααk(𝐟𝔼𝐟^ss)T𝟏n𝟏nT𝐐2𝟏n𝟏nT(𝐟𝔼𝐟^ss).\displaystyle\qquad\qquad+2\alpha\frac{\partial\alpha}{\partial k}({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})^{T}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}^{2}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})\,.

Note that αkk=1=1n\frac{\partial\alpha}{\partial k}\mid_{k=1}=-\frac{1}{n} and α(1)=0\alpha(1)=0, then

C1kk=1=2𝐟𝔼𝐟^ss22n(𝐟𝔼𝐟^ss)T𝟏n𝟏nT𝐐(𝐟𝔼𝐟^ss).\frac{\partial C_{1}}{\partial k}\mid_{k=1}=2\|{\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}}\|^{2}-\frac{2}{n}({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})^{T}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})\,.

Similarly, for term C2C_{2} and C3C_{3}, we have

C2\displaystyle C_{2} =(𝐟𝔼𝐟^ss)T(k𝐈+α𝐐𝟏n𝟏nT)T((1k)𝐈α𝐐𝟏n𝟏nT)𝐟\displaystyle=({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})^{T}(k{\mathbf{I}}+\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T})^{T}((1-k){\mathbf{I}}-\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}){\mathbf{f}}
=𝐟T(𝐈𝐐)T(k𝐈+α𝐐𝟏n𝟏nT)T((1k)𝐈α𝐐𝟏n𝟏nT)𝐟\displaystyle={\mathbf{f}}^{T}({\mathbf{I}}-{\mathbf{Q}})^{T}(k{\mathbf{I}}+\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T})^{T}((1-k){\mathbf{I}}-\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}){\mathbf{f}}
=k(1-k){\mathbf{f}}^{T}({\mathbf{I}}-{\mathbf{Q}}){\mathbf{f}}+(1-k)\alpha{\mathbf{f}}^{T}({\mathbf{I}}-{\mathbf{Q}}){\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}{\mathbf{f}}-k\alpha{\mathbf{f}}^{T}({\mathbf{I}}-{\mathbf{Q}}){\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{f}}-
\alpha^{2}{\mathbf{f}}^{T}({\mathbf{I}}-{\mathbf{Q}}){\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}^{2}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{f}}\,,
C3\displaystyle C_{3} =𝐟T((1k)𝐈α𝐐𝟏n𝟏nT)T((1k)𝐈α𝐐𝟏n𝟏nT)𝐟\displaystyle={\mathbf{f}}^{T}((1-k){\mathbf{I}}-\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T})^{T}((1-k){\mathbf{I}}-\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}){\mathbf{f}}
=(1k)2𝐟T𝐟2α(1k)𝐟T𝟏n𝟏nT𝐐𝐟+α2𝐟T𝟏n𝟏nT𝐐2𝟏n𝟏nT𝐟,\displaystyle=(1-k)^{2}{\mathbf{f}}^{T}{\mathbf{f}}-2\alpha(1-k){\mathbf{f}}^{T}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}{\mathbf{f}}+\alpha^{2}{\mathbf{f}}^{T}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}^{2}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{f}}\,,

and then take their derivatives with respect to kk,

\frac{\partial C_{2}}{\partial k}=(1-2k){\mathbf{f}}^{T}({\mathbf{I}}-{\mathbf{Q}}){\mathbf{f}}+\left((1-k)\frac{\partial\alpha}{\partial k}-\alpha\right){\mathbf{f}}^{T}({\mathbf{I}}-{\mathbf{Q}}){\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}{\mathbf{f}}-
\left(k\frac{\partial\alpha}{\partial k}+\alpha\right){\mathbf{f}}^{T}({\mathbf{I}}-{\mathbf{Q}}){\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{f}}-2\alpha\frac{\partial\alpha}{\partial k}{\mathbf{f}}^{T}({\mathbf{I}}-{\mathbf{Q}}){\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}^{2}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{f}}\,,
C3k\displaystyle\frac{\partial C_{3}}{\partial k} =2(1k)𝐟T𝐟2((1k)αkα)𝐟T𝟏n𝟏nT𝐐𝐟+2ααk𝐟T𝟏n𝟏nT𝐐2𝟏n𝟏nT𝐟,\displaystyle=-2(1-k){\mathbf{f}}^{T}{\mathbf{f}}-2\left((1-k)\frac{\partial\alpha}{\partial k}-\alpha\right){\mathbf{f}}^{T}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}{\mathbf{f}}+2\alpha\frac{\partial\alpha}{\partial k}{\mathbf{f}}^{T}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}^{2}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{f}}\,,

it follows that

C2kk=1\displaystyle\frac{\partial C_{2}}{\partial k}\mid_{k=1} =𝐟T(𝐈𝐐)𝐟+1n𝐟T(𝐈𝐐)𝐐𝟏n𝟏nT𝐟,\displaystyle=-{\mathbf{f}}^{T}({\mathbf{I}}-{\mathbf{Q}}){\mathbf{f}}+\frac{1}{n}{\mathbf{f}}^{T}({\mathbf{I}}-{\mathbf{Q}}){\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{f}}\,,
C3kk=1\displaystyle\frac{\partial C_{3}}{\partial k}\mid_{k=1} =0.\displaystyle=0\,.

On the other hand, the variance is

Var(𝐟^)\displaystyle\operatorname*{Var}(\hat{\mathbf{f}}) =(k𝐈+α𝐐𝟏n𝟏nT)Var(𝐟^ss)(k𝐈+α𝐐𝟏n𝟏nT)T\displaystyle=(k{\mathbf{I}}+\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T})\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})(k{\mathbf{I}}+\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T})^{T}
=k2Var(𝐟^ss)+kα𝐐𝟏n𝟏nTVar(𝐟^ss)+kαVar(𝐟^ss)𝟏n𝟏nT𝐐+α2𝐐𝟏n𝟏nTVar(𝐟^ss)𝟏n𝟏nT𝐐,\displaystyle=k^{2}\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})+k\alpha{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})+k\alpha\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}}){\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}+\alpha^{2}{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}}){\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}\,,

and

tr[Var(𝐟^)]=k2tr[Var(𝐟^ss)]+2kαtr[𝐐𝟏n𝟏nTVar(𝐟^ss)]+α2tr[𝐐𝟏n𝟏nTVar(𝐟^ss)𝟏n𝟏nT𝐐],\operatorname*{tr}[\operatorname*{Var}(\hat{\mathbf{f}})]=k^{2}\operatorname*{tr}[\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})]+2k\alpha\operatorname*{tr}[{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})]+\alpha^{2}\operatorname*{tr}[{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}}){\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}]\,,

then its derivative with respect to kk is

\frac{\partial\operatorname*{tr}[\operatorname*{Var}(\hat{\mathbf{f}})]}{\partial k}=2k\operatorname*{tr}[\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})]+2\left(k\frac{\partial\alpha}{\partial k}+\alpha\right)\operatorname*{tr}[{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})]+
2\alpha\frac{\partial\alpha}{\partial k}\operatorname*{tr}[{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}}){\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}]\,.

Evaluate at k=1k=1,

tr[Var(𝐟^)]kk=1=2tr[Var(𝐟^ss)]2ntr[𝐐𝟏n𝟏nTVar(𝐟^ss)].\frac{\partial\operatorname*{tr}[\operatorname*{Var}(\hat{\mathbf{f}})]}{\partial k}\mid_{k=1}=2\operatorname*{tr}[\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})]-\frac{2}{n}\operatorname*{tr}[{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})]\,.

Thus

MSE(k)kk=1\displaystyle\frac{\partial\operatorname*{MSE}(k)}{\partial k}\mid_{k=1} =2𝐟𝔼𝐟^ss22n(𝐟𝔼𝐟^ss)T𝟏n𝟏nT𝐐(𝐟𝔼𝐟^ss)+\displaystyle=2\|{\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}}\|^{2}-\frac{2}{n}({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})^{T}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}({\mathbf{f}}-{\mathbb{E}}\hat{\mathbf{f}}^{\mathrm{ss}})+
2[𝐟T(𝐈𝐐)𝐟+1n𝐟T(𝐈𝐐)𝐐𝟏n𝟏nT𝐟]+\displaystyle\qquad\qquad 2\left[-{\mathbf{f}}^{T}({\mathbf{I}}-{\mathbf{Q}}){\mathbf{f}}+\frac{1}{n}{\mathbf{f}}^{T}({\mathbf{I}}-{\mathbf{Q}}){\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{f}}\right]+
2tr[Var(𝐟^ss)]2ntr[𝐐𝟏n𝟏nTVar(𝐟^ss)]\displaystyle\qquad\qquad 2\operatorname*{tr}[\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})]-\frac{2}{n}\operatorname*{tr}[{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})]
=2𝐟T𝐐𝟏n𝟏nT𝐐n(𝐈𝐐)f2𝐟T𝐐(𝐈𝐐)𝐟+\displaystyle=2{\mathbf{f}}^{T}{\mathbf{Q}}\frac{{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}}{n}({\mathbf{I}}-{\mathbf{Q}})f-2{\mathbf{f}}^{T}{\mathbf{Q}}({\mathbf{I}}-{\mathbf{Q}}){\mathbf{f}}+
2tr[Var(𝐟^ss)]2ntr[𝐐𝟏n𝟏nTVar(𝐟^ss)]\displaystyle\qquad\qquad 2\operatorname*{tr}[\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})]-\frac{2}{n}\operatorname*{tr}[{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})]
=2[tr[Var(𝐟^ss)]1ntr[𝐐𝟏n𝟏nTVar(𝐟^ss)]𝐟T𝐐(𝐈𝟏n𝟏nT𝐐n)(𝐈𝐐)𝐟]\displaystyle=2\left[\operatorname*{tr}[\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})]-\frac{1}{n}\operatorname*{tr}[{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})]-{\mathbf{f}}^{T}{\mathbf{Q}}({\mathbf{I}}-\frac{{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}}{n})({\mathbf{I}}-{\mathbf{Q}}){\mathbf{f}}\right] (42)
=2[σ2tr[(𝐈𝟏n𝟏nT𝐐n)𝐐2]tr[(𝐈𝟏n𝟏nT𝐐n)(𝐈𝐐)𝐟𝐟T𝐐]].\displaystyle=2\left[\sigma^{2}\operatorname*{tr}[({\mathbf{I}}-\frac{{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}}{n}){\mathbf{Q}}^{2}]-\operatorname*{tr}[({\mathbf{I}}-\frac{{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}}{n})({\mathbf{I}}-{\mathbf{Q}}){\mathbf{f}}{\mathbf{f}}^{T}{\mathbf{Q}}]\right]\,. (43)

Next, we will prove the first term on the right-hand side of (43) is positive. By the Cauchy-Schwarz inequality for trace in Lemma 2,

tr[𝐐𝟏n𝟏nTVar(𝐟^ss)]\displaystyle\operatorname*{tr}[{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})] tr[(𝐐𝟏n𝟏nT)2]tr[(Var(𝐟^ss))2]\displaystyle\leq\sqrt{\operatorname*{tr}[({\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T})^{2}]\operatorname*{tr}[(\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}}))^{2}]}
=𝟏nT𝐐𝟏ntr[(Var(𝐟^ss))2],\displaystyle={\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}{\boldsymbol{1}}_{n}\sqrt{\operatorname*{tr}[(\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}}))^{2}]}\,,

and since Var(𝐟^ss)=σ2𝐐2\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})=\sigma^{2}{\mathbf{Q}}^{2} is a positive semi-definite matrix, then by Lemma 3,

tr[(Var(𝐟^ss))2][tr(Var(𝐟^ss))]2.\operatorname*{tr}[(\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}}))^{2}]\leq[\operatorname*{tr}(\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}}))]^{2}\,.

Note that

η\displaystyle\eta =𝟏nT𝐐𝟏nn=𝟏nT𝐐𝟏n𝟏nT𝟏nλmax(𝐐)=λmax(𝐁(𝐁T𝐁+λ0𝛀)1𝐁T).\displaystyle=\frac{{\boldsymbol{1}}^{T}_{n}{\mathbf{Q}}{\boldsymbol{1}}_{n}}{n}=\frac{{\boldsymbol{1}}^{T}_{n}{\mathbf{Q}}{\boldsymbol{1}}_{n}}{{\boldsymbol{1}}^{T}_{n}{\boldsymbol{1}}_{n}}\leq\lambda_{\max}({\mathbf{Q}})=\lambda_{\max}({\mathbf{B}}({\mathbf{B}}^{T}{\mathbf{B}}+\lambda_{0}{\boldsymbol{\Omega}})^{-1}{\mathbf{B}}^{T})\,.

Perform singular value decomposition (SVD) on 𝐁, 𝐁=𝐔𝐃𝐕ᵀ, where 𝐔 (n×J) has orthonormal columns, 𝐕 (J×J) is orthogonal, and 𝐃 is a J×J diagonal matrix with diagonal entries d₁≥d₂≥⋯≥d_J>0 (assuming 𝐁 has full column rank). Then

𝐐=𝐔𝐃𝐕T(𝐕𝐃2𝐕T+λ0𝛀)1𝐕𝐃𝐔T=𝐔(𝐈+λ0𝐃1𝐕T𝛀𝐕𝐃1)1𝐔T,{\mathbf{Q}}={\mathbf{U}}{\mathbf{D}}{\mathbf{V}}^{T}({\mathbf{V}}{\mathbf{D}}^{2}{\mathbf{V}}^{T}+\lambda_{0}{\boldsymbol{\Omega}})^{-1}{\mathbf{V}}{\mathbf{D}}{\mathbf{U}}^{T}={\mathbf{U}}({\mathbf{I}}+\lambda_{0}{\mathbf{D}}^{-1}{\mathbf{V}}^{T}{\boldsymbol{\Omega}}{\mathbf{V}}{\mathbf{D}}^{-1})^{-1}{\mathbf{U}}^{T}\,,

it follows that

η\displaystyle\eta λmax(𝐔(𝐈+λ0𝐃1𝐕T𝛀𝐕𝐃1)1𝐔T)\displaystyle\leq\lambda_{\max}({\mathbf{U}}({\mathbf{I}}+\lambda_{0}{\mathbf{D}}^{-1}{\mathbf{V}}^{T}{\boldsymbol{\Omega}}{\mathbf{V}}{\mathbf{D}}^{-1})^{-1}{\mathbf{U}}^{T})
=λmax((𝐈+λ0𝐃1𝐕T𝛀𝐕𝐃1)1)\displaystyle=\lambda_{\max}(({\mathbf{I}}+\lambda_{0}{\mathbf{D}}^{-1}{\mathbf{V}}^{T}{\boldsymbol{\Omega}}{\mathbf{V}}{\mathbf{D}}^{-1})^{-1})
=1λmin(𝐈+λ0𝐃1𝐕T𝛀𝐕𝐃1)\displaystyle=\frac{1}{\lambda_{\min}({\mathbf{I}}+\lambda_{0}{\mathbf{D}}^{-1}{\mathbf{V}}^{T}{\boldsymbol{\Omega}}{\mathbf{V}}{\mathbf{D}}^{-1})}
=11+λmin(λ0𝐃1𝐕T𝛀𝐕𝐃1)<1,\displaystyle=\frac{1}{1+\lambda_{\min}(\lambda_{0}{\mathbf{D}}^{-1}{\mathbf{V}}^{T}{\boldsymbol{\Omega}}{\mathbf{V}}{\mathbf{D}}^{-1})}<1\,,

thus

1ntr[𝐐𝟏n𝟏nTVar(𝐟^ss)]<tr[Var(𝐟^ss)].\frac{1}{n}\operatorname*{tr}[{\mathbf{Q}}{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})]<\operatorname*{tr}[\operatorname*{Var}(\hat{\mathbf{f}}^{\mathrm{ss}})]\,.
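The SVD identity for 𝐐 and the resulting bound η<1 can be checked numerically; a minimal sketch that, for simplicity, uses a positive-definite stand-in for 𝛀 (the paper's roughness penalty 𝛀 is positive semi-definite):

```julia
using LinearAlgebra
n, J, λ0 = 30, 6, 0.4
B = randn(n, J)
Ω = let R = randn(J, J); R'R end       # positive-definite stand-in for 𝛀
F = svd(B)                             # thin SVD: 𝐁 = 𝐔𝐃𝐕ᵀ
Dinv = inv(Diagonal(F.S))
Q = B * ((B'B + λ0 * Ω) \ B')
@assert Q ≈ F.U * inv(I + λ0 * Dinv * F.Vt * Ω * F.Vt' * Dinv) * F.U'
@assert maximum(eigvals(Symmetric(Q))) < 1     # λ_max(𝐐) < 1, hence η < 1
```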

Note that if \frac{\partial\operatorname*{MSE}(k)}{\partial k}\mid_{k=1}>0, that is,

σ2>𝐟T𝐐(𝐈𝟏n𝟏nT𝐐n)(𝐈𝐐)𝐟tr[(𝐈𝟏n𝟏nT𝐐n)𝐐2],\sigma^{2}>\frac{{\mathbf{f}}^{T}{\mathbf{Q}}({\mathbf{I}}-\frac{{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}}{n})({\mathbf{I}}-{\mathbf{Q}}){\mathbf{f}}}{\operatorname*{tr}[({\mathbf{I}}-\frac{{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}}{n}){\mathbf{Q}}^{2}]}\,,

then there exists k(0,1)k\in(0,1) such that MSE(k)<MSE(1)\operatorname*{MSE}(k)<\operatorname*{MSE}(1).

If λ0=0\lambda_{0}=0, then (𝐈𝐐)𝐟=𝟎({\mathbf{I}}-{\mathbf{Q}}){\mathbf{f}}=\boldsymbol{0}, and

𝟏n𝟏nT𝐐=𝟏n(𝐁𝟏J)T𝐁(𝐁T𝐁)1𝐁T=𝟏n𝟏JT𝐁T=𝟏n𝟏nT,{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}={\boldsymbol{1}}_{n}({\mathbf{B}}{\boldsymbol{1}}_{J})^{T}{\mathbf{B}}({\mathbf{B}}^{T}{\mathbf{B}})^{-1}{\mathbf{B}}^{T}={\boldsymbol{1}}_{n}{\boldsymbol{1}}_{J}^{T}{\mathbf{B}}^{T}={\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}\,,
𝐈𝟏n𝟏nT𝐐n=𝐈𝟏n(𝐁𝟏J)T𝐁(𝐁T𝐁)1𝐁Tn=𝐈𝟏n𝟏JT𝐁Tn=𝐈𝟏n𝟏nTn,{\mathbf{I}}-\frac{{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}}{n}={\mathbf{I}}-\frac{{\boldsymbol{1}}_{n}({\mathbf{B}}{\boldsymbol{1}}_{J})^{T}{\mathbf{B}}({\mathbf{B}}^{T}{\mathbf{B}})^{-1}{\mathbf{B}}^{T}}{n}={\mathbf{I}}-\frac{{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{J}^{T}{\mathbf{B}}^{T}}{n}={\mathbf{I}}-\frac{{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}}{n}\,,

then

tr[(𝐈𝟏n𝟏nT𝐐n)𝐐2]=tr[(𝐈𝟏n𝟏nTn)𝐐]=J1ntr[𝟏n𝟏nT𝐐]=J1,\operatorname*{tr}[({\mathbf{I}}-\frac{{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}}{n}){\mathbf{Q}}^{2}]=\operatorname*{tr}[({\mathbf{I}}-\frac{{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}}{n}){\mathbf{Q}}]=J-\frac{1}{n}\operatorname*{tr}[{\boldsymbol{1}}_{n}{\boldsymbol{1}}_{n}^{T}{\mathbf{Q}}]=J-1\,,

and hence \frac{\partial\operatorname*{MSE}(k)}{\partial k}\mid_{k=1}>0 whenever σ²>0, which always holds.