
Regularized Weighted Discrete Least Squares Approximation Using Gauss Quadrature Points

Congpei An (School of Economic Mathematics, Southwestern University of Finance and Economics, Chengdu 611130, China; email: andbach1984@hotmail.com, ancp@swufe.edu.cn)
and Hao-Ning Wu (Department of Mathematics, The University of Hong Kong, Hong Kong, China; email: haoning.wu@outlook.com, hnwu@hku.hk)
Abstract

We consider polynomial approximation over the interval $[-1,1]$ by regularized weighted discrete least squares methods with $\ell_2$- or $\ell_1$-regularization, respectively. As the set of nodes we use Gauss quadrature points (which are zeros of orthogonal polynomials). The number of Gauss quadrature points is $N+1$. For $2L\leq 2N+1$, with the aid of Gauss quadrature, we obtain approximation polynomials of degree $L$ in closed form without solving linear algebra or optimization problems. In fact, these approximation polynomials can be expressed in the form of the barycentric interpolation formula (Berrut & Trefethen, 2004) when an interpolation condition is satisfied. We then study the approximation quality of the $\ell_2$-regularized approximation polynomial in terms of Lebesgue constants, and the sparsity of the $\ell_1$-regularized approximation polynomial. Finally, we give numerical examples to illustrate these theoretical results and show that a well-chosen regularization parameter can lead to good performance, with or without contaminated data.

Keywords. regularized least squares approximation, Gauss quadrature, Lebesgue constants, sparsity, barycentric interpolation.

AMS subject classifications. 41A05, 65D99, 65D32, 94A99

1 Introduction

In this paper, we are interested in approximating or recovering a function (possibly noisy) $f\in\mathcal{C}([-1,1])$ by a polynomial

p_{L}(x)=\sum_{\ell=0}^{L}\beta_{\ell}\tilde{\Phi}_{\ell}(x)\in\mathbb{P}_{L},\quad x\in[-1,1], (1.1)

where $\mathbb{P}_{L}$ is the linear space of polynomials of degree at most $L$, and $\tilde{\Phi}_{\ell}(x)$, $\ell=0,1,\ldots,L$, are normalized orthogonal polynomials (Szegö, 1939; Gautschi, 2004). Once the basis $\{\tilde{\Phi}_{\ell}(x)\}_{\ell=0}^{L}$ for $\mathbb{P}_{L}$ is given, the next step is to determine the coefficients $\beta_{\ell}$, $\ell=0,\ldots,L$. We will consider the $\ell_2$-regularized approximation problem

\min_{\beta_{\ell}\in\mathbb{R}}\ \left\{\sum_{j=0}^{N}\omega_{j}\left(\sum_{\ell=0}^{L}\beta_{\ell}\tilde{\Phi}_{\ell}(x_{j})-f(x_{j})\right)^{2}+\lambda\sum_{\ell=0}^{L}(\mu_{\ell}\beta_{\ell})^{2}\right\},\quad\lambda>0, (1.2)

and the $\ell_1$-regularized approximation problem

\min_{\beta_{\ell}\in\mathbb{R}}\ \left\{\sum_{j=0}^{N}\omega_{j}\left(\sum_{\ell=0}^{L}\beta_{\ell}\tilde{\Phi}_{\ell}(x_{j})-f(x_{j})\right)^{2}+\lambda\sum_{\ell=0}^{L}|\mu_{\ell}\beta_{\ell}|\right\},\quad\lambda>0, (1.3)

where $f$ is a given continuous function with values (possibly noisy) taken at $N+1$ distinct points $x_{0},x_{1},\ldots,x_{N}$ over the interval $[-1,1]$; $2L\leq 2N+1$; $\mathbf{w}=[\omega_{0},\omega_{1},\ldots,\omega_{N}]^{T}$ is a vector of positive Gauss quadrature weights (Gautschi, 2004); $\{\mu_{\ell}\}_{\ell=0}^{L}$ is a nonnegative nondecreasing sequence, which penalizes the coefficients $\{\beta_{\ell}\}_{\ell=0}^{L}$; and $\lambda>0$ is the regularization parameter.

It is known that the approximation schemes (1.2) and (1.3) are special cases of classical penalized least squares methods; see Powell (1967), Golitschek & Schumaker (1990), Gautschi (2004), Lazarov et al. (2007), Cai et al. (2009), Kim et al. (2009), An et al. (2012), Xiang & Zou (2013) and Zhou & Chen (2018). Various optimization methods and iterative algorithms have been proposed to find minimizers. We, however, concentrate on constructing minimizers to problems (1.2) and (1.3) by means of orthogonal polynomials and Gauss quadrature (Kress, 1998; Gautschi, 2004, 2012; Trefethen, 2013). Gauss quadrature plays an important role in this paper. We assume that the weight function $w:(-1,1)\rightarrow\mathbb{R}$ is positive and such that

\int_{-1}^{1}w(x)\,dx<\infty,\quad\int_{-1}^{1}x^{i}w(x)\,dx<\infty,\quad i=1,2,\ldots.
Definition 1.1

A quadrature formula

\int_{-1}^{1}w(x)f(x)\,dx\approx\sum_{j=0}^{N}\omega_{j}f(x_{j})

with $N+1$ distinct quadrature points $x_{0},x_{1},\ldots,x_{N}$ is called a Gauss quadrature formula if it integrates all polynomials $p\in\mathbb{P}_{2N+1}$ exactly, i.e., if

\sum_{j=0}^{N}\omega_{j}p(x_{j})=\int_{-1}^{1}w(x)p(x)\,dx\quad\forall p\in\mathbb{P}_{2N+1}. (1.4)

The points $x_{0},x_{1},\ldots,x_{N}$ are called Gauss quadrature points.
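
For the Chebyshev weight $w(x)=1/\sqrt{1-x^{2}}$, for instance, the Gauss quadrature points and weights have the well-known closed forms $x_{j}=\cos\left((2j+1)\pi/(2N+2)\right)$ and $\omega_{j}=\pi/(N+1)$, and the exactness property (1.4) is easy to check numerically. The following MATLAB sketch (the test degree is our choice) illustrates this; substituting $x=\cos\theta$ turns the exact integral into $\int_{0}^{\pi}\cos^{k}\theta\,d\theta$.

    % Gauss-Chebyshev rule: integrates p in P_{2N+1} exactly against
    % w(x) = 1/sqrt(1-x^2).
    N = 5;
    x = cos((2*(0:N)'+1)*pi/(2*N+2));   % Gauss quadrature points (zeros of T_{N+1})
    w = pi/(N+1)*ones(N+1,1);           % Gauss quadrature weights
    k = 2*N;                            % any degree k <= 2N+1 works
    exact = integral(@(th) cos(th).^k, 0, pi);  % int_{-1}^{1} x^k w(x) dx
    disp(abs(w'*x.^k - exact))          % ~ machine precision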

Without causing confusion, we write $\mathcal{X}_{N+1}=\{x_{j}\}_{j=0}^{N}$ for the Gauss quadrature points, and throughout this paper we assume that $\mathcal{X}_{N+1}$ is a set of Gauss quadrature points. It is well known (see, for example, Powell (1981), Kress (1998) and Gautschi (2012)) that the Gauss quadrature points $\mathcal{X}_{N+1}$ are the zeros of the orthogonal polynomial of degree $N+1$, where orthogonality is with respect to the $L_2$ inner product

(f,g)_{L_{2}}:=\int_{-1}^{1}w(x)f(x)g(x)\,dx.

This inner product induces the weighted $L_2$ norm

\|f\|_{L_{2}}:=\sqrt{(f,f)_{L_{2}}}=\left(\int_{-1}^{1}w(x)|f(x)|^{2}\,dx\right)^{1/2}.

Given a continuous function $f$ defined on $[-1,1]$, sampling on $\mathcal{X}_{N+1}$ generates

\mathbf{f}:=\mathbf{f}(\mathcal{X}_{N+1})=[f(x_{0}),f(x_{1}),\ldots,f(x_{N})]^{T}\in\mathbb{R}^{N+1}.

Let $\mathbf{A}_{L}:=\mathbf{A}_{L}(\mathcal{X}_{N+1})\in\mathbb{R}^{(N+1)\times(L+1)}$ be the matrix of orthogonal polynomials evaluated at $\mathcal{X}_{N+1}$:

\mathbf{A}_{L}=\left[\tilde{\Phi}_{\ell}(x_{j})\right]\in\mathbb{R}^{(N+1)\times(L+1)},\quad j=0,1,\ldots,N,\quad\ell=0,1,\ldots,L.

By substituting the structure (1.1) of the approximation polynomial into the $\ell_2$-regularized approximation problem (1.2), the problem (1.2) transforms into

\min_{\bm{\beta}\in\mathbb{R}^{L+1}}\ \|\mathbf{W}^{\frac{1}{2}}(\mathbf{A}_{L}\bm{\beta}-\mathbf{f})\|^{2}_{2}+\lambda\|\mathbf{R}_{L}\bm{\beta}\|_{2}^{2},\quad\lambda>0, (1.5)

where

\mathbf{W}={\rm diag}(\omega_{0},\omega_{1},\ldots,\omega_{N})\in\mathbb{R}^{(N+1)\times(N+1)},

and the diagonal matrix $\mathbf{R}_{L}:={\rm diag}(\mu_{0},\mu_{1},\ldots,\mu_{L})\in\mathbb{R}^{(L+1)\times(L+1)}$ is positive semi-definite.

With the same basis and weight vector as in the $\ell_2$-regularized approximation problem (1.2) above, the $\ell_1$-regularized approximation problem (1.3) transforms into

\min_{\bm{\beta}\in\mathbb{R}^{L+1}}\ \|\mathbf{W}^{\frac{1}{2}}(\mathbf{A}_{L}\bm{\beta}-\mathbf{f})\|^{2}_{2}+\lambda\|\mathbf{R}_{L}\bm{\beta}\|_{1},\quad\lambda>0. (1.6)

The next step is to determine $\bm{\beta}=[\beta_{0},\beta_{1},\ldots,\beta_{L}]^{T}\in\mathbb{R}^{L+1}$.

The goal of this paper is to construct approximation polynomials of the form (1.1). We specify the coefficients $\{\beta_{\ell}\}_{\ell=0}^{L}$ by solving problems (1.5) and (1.6) directly. This immediately yields the $\ell_2$-regularized approximation polynomial $p_{L,N+1}^{\ell_{2}}$ (2.3) and the $\ell_1$-regularized approximation polynomial $p_{L,N+1}^{\ell_{1}}$ (2.7).

Gauss quadrature goes hand in hand with the theory and computation of orthogonal polynomials; see Gautschi (2004), Trefethen (2013) and the references therein. Orthogonal polynomials occur in a wide range of applications and play a remarkable role in pure and applied mathematics. Chebyshev polynomials and Legendre polynomials are two outstanding members of the family of orthogonal polynomials, and many polynomial approximation textbooks present fruitful results on them (Szegö, 1939; Powell, 1981; Gautschi, 2012; Trefethen, 2013). We therefore take these two families of orthogonal polynomials (Chebyshev and Legendre) as representative examples in the choice of basis and Gauss quadrature points.

In the next section, we introduce some necessary notation and terminology, and we present the construction of the $\ell_2$- and $\ell_1$-regularized minimizers to problems (1.2) and (1.3), respectively. The crucial fact is that both $p_{L,N+1}^{\ell_{2}}$ and $p_{L,N+1}^{\ell_{1}}$ can be expressed in barycentric form under an interpolation condition; see the $\ell_2$-regularized barycentric interpolation formula (2.13) and the $\ell_1$-regularized barycentric interpolation formula (2.17). It is worth noting that the Wang-Xiang formula (Wang & Xiang, 2012) is a special case of $p_{L,N+1}^{\ell_{2}}$ when Legendre polynomials are taken as the basis; see Section 2.3. In Section 3, we study the quality of the approximation polynomial $p_{L,N+1}^{\ell_{2}}$ in terms of Lebesgue constants, and we show that Lebesgue constants decay as the regularization parameter increases. Section 4 analyses the $\ell_1$-regularized approximation problem (1.3) from the viewpoint of sparsity; in particular, we derive a sharp upper bound on the number of nonzero entries in the solution to the $\ell_1$-regularized approximation problem (1.6). In Section 5, we consider numerical experiments with exact and contaminated data.

All numerical results in this paper are carried out using MATLAB R2017A on a desktop (8.00 GB RAM, Intel(R) Processor 5Y70 at 1.10 GHz and 1.30 GHz) with the Windows 10 operating system. All codes are available at https://github.com/HaoNingWu/Regularized-Least-Squares-Approximation-using-Orthogonal-Polynomials.

2 Regularized weighted least squares approximation

The construction of minimizers to problems (1.2) and (1.3) is presented in this section.

2.1 $\ell_2$-regularized approximation problem

First, we consider solving the $\ell_2$-regularized weighted discrete least squares approximation problem (the $\ell_2$-regularized approximation problem) (1.2). The problem can be transformed into the convex and differentiable optimization problem (1.5).

Taking the first derivative of the objective function of the $\ell_2$-regularized approximation problem (1.5) in matrix form with respect to $\bm{\beta}$ leads to the first order condition

\left(\mathbf{A}_{L}^{T}\mathbf{W}\mathbf{A}_{L}+\lambda\mathbf{R}_{L}^{T}\mathbf{R}_{L}\right)\bm{\beta}=\mathbf{A}_{L}^{T}\mathbf{W}\mathbf{f},\quad\lambda>0. (2.1)

One may solve the first order condition (2.1) using methods of numerical linear algebra; however, in this paper we concentrate on how to obtain the solution to the first order condition (2.1) in a closed form.

Lemma 2.1

Let $\{\tilde{\Phi}_{\ell}\}_{\ell=0}^{L}$ be a family of normalized orthogonal polynomials with respect to the weight function $w(x)$ over $[-1,1]$, and let $\mathcal{X}_{N+1}=\{x_{0},x_{1},\ldots,x_{N}\}$ be the set of zeros of $\tilde{\Phi}_{N+1}$. Assume $2L\leq 2N+1$ and that $\mathbf{w}$ is a vector of weights satisfying the Gauss quadrature formula (1.4). Then

\mathbf{H}_{L}:=\mathbf{A}_{L}^{T}\mathbf{W}\mathbf{A}_{L}=\mathbf{I}_{L}\in\mathbb{R}^{(L+1)\times(L+1)},

where $\mathbf{I}_{L}$ is the identity matrix.

Proof.  By the structure of the matrix $\mathbf{H}_{L}$ and the definition (1.4) of the Gauss quadrature formula, we obtain

(\mathbf{H}_{L})_{\ell,\ell^{\prime}}=\sum_{j=0}^{N}\omega_{j}\tilde{\Phi}_{\ell}(x_{j})\tilde{\Phi}_{\ell^{\prime}}(x_{j})=\int_{-1}^{1}w(x)\tilde{\Phi}_{\ell}(x)\tilde{\Phi}_{\ell^{\prime}}(x)\,dx=\delta_{\ell,\ell^{\prime}},

where $\delta_{\ell,\ell^{\prime}}$ is the Kronecker delta. The middle equality holds because $\tilde{\Phi}_{\ell}(x)\tilde{\Phi}_{\ell^{\prime}}(x)\in\mathbb{P}_{2L}\subset\mathbb{P}_{2N+1}$, and the last equality holds because of the orthonormality of $\{\tilde{\Phi}_{\ell}\}_{\ell=0}^{L}$. $\square$

Theorem 2.1

Under the conditions of Lemma 2.1, the optimal solution to the $\ell_2$-regularized approximation problem (1.5) in matrix form can be expressed as

\beta_{\ell}=\frac{1}{1+\lambda\mu_{\ell}^{2}}\sum_{j=0}^{N}\omega_{j}\tilde{\Phi}_{\ell}(x_{j})f(x_{j}),\quad\ell=0,1,\ldots,L,\quad\lambda>0. (2.2)

Consequently, the $\ell_2$-regularized approximation polynomial is

p_{L,N+1}^{\ell_{2}}(x)=\sum_{\ell=0}^{L}\frac{\tilde{\Phi}_{\ell}(x)}{1+\lambda\mu_{\ell}^{2}}\sum_{j=0}^{N}\omega_{j}\tilde{\Phi}_{\ell}(x_{j})f(x_{j}). (2.3)

Proof.  This follows immediately from the first order condition (2.1) of problem (1.5) and Lemma 2.1. $\square$
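
As a concrete illustration, the following MATLAB sketch evaluates the closed form (2.2)-(2.3) for the Chebyshev weight $w(x)=1/\sqrt{1-x^{2}}$, so that $\tilde{\Phi}_{0}=1/\sqrt{\pi}$ and $\tilde{\Phi}_{\ell}=\sqrt{2/\pi}\,T_{\ell}$ for $\ell\geq 1$; the parameters and the test function are our choices, not prescribed by the theorem. No linear system or optimization problem is solved.

    % Closed-form l2-regularized coefficients (2.2) with Gauss-Chebyshev data.
    N = 100; L = 30; lambda = 1e-1; mu = ones(L+1,1);
    x = cos((2*(0:N)'+1)*pi/(2*N+2));      % zeros of T_{N+1} (Gauss points)
    w = pi/(N+1)*ones(N+1,1);              % Gauss-Chebyshev weights
    basis = @(t) [ones(numel(t),1)/sqrt(pi), ...
                  sqrt(2/pi)*cos(acos(t(:))*(1:L))];   % normalized basis A_L
    f = @(t) 1./(1 + 25*t.^2);             % sample target (Runge function)
    alpha = basis(x)'*(w.*f(x));           % alpha_l = sum_j w_j Phi_l(x_j) f(x_j)
    beta  = alpha./(1 + lambda*mu.^2);     % coefficients (2.2)
    p     = @(t) basis(t)*beta;            % l2-regularized polynomial (2.3)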

In the limiting case $N\rightarrow\infty$, we obtain the following simple but significant result.

Theorem 2.2

Adopt the notation and assumptions of Lemma 2.1. Let $f\in\mathcal{C}([-1,1])$, and let $L\geq 0$ be given. Then the polynomial $p_{L,N+1}^{\ell_{2}}$ (2.3) has the uniform limit $p_{L}^{*}$ as $N\rightarrow\infty$, that is,

\lim_{N\rightarrow\infty}\|p_{L,N+1}^{\ell_{2}}-p_{L}^{*}\|_{\infty}=0,

where

p_{L}^{*}(x)=\sum_{\ell=0}^{L}\frac{\tilde{\Phi}_{\ell}(x)}{1+\lambda\mu_{\ell}^{2}}\int_{-1}^{1}w(t)\tilde{\Phi}_{\ell}(t)f(t)\,dt. (2.4)

Proof.  Since the interval $[-1,1]$ is compact, and since the sums over $\ell$ in (2.2) and (2.4) are finite, it suffices to prove that

\lim_{N\rightarrow\infty}\sum_{j=0}^{N}\omega_{j}\tilde{\Phi}_{\ell}(x_{j})f(x_{j})=\int_{-1}^{1}w(x)\tilde{\Phi}_{\ell}(x)f(x)\,dx,\quad 0\leq\ell\leq L. (2.5)

Since $w(x)\tilde{\Phi}_{\ell}(x)f(x)\in\mathcal{C}([-1,1])$, the result follows from the convergence of the sequence of Gauss quadrature formulae (Kress, 1998, Chapter 9). Hence (2.5) holds, which proves the theorem. $\square$

2.2 $\ell_1$-regularized approximation problem

Now we discuss the $\ell_1$-regularized approximation problem (1.3), again converting it into the matrix-form problem (1.6). To solve this problem, we first recall the soft threshold operator $S_{k}(a)$.

Definition 2.1 (Donoho & Johnstone, 1994)

The soft threshold operator, denoted by $S_{k}(a)$, is defined as

S_{k}(a)=\max(0,a-k)+\min(0,a+k).
Theorem 2.3

Adopt the notation and assumptions of Lemma 2.1. Then the $\ell_1$-regularized approximation problem (1.6) in matrix form has the unique closed-form solution

\beta_{\ell}=\frac{1}{2}S_{\lambda\mu_{\ell}}(2\alpha_{\ell}),\quad\ell=0,1,\ldots,L, (2.6)

where $\alpha_{\ell}=\sum_{j=0}^{N}\omega_{j}\tilde{\Phi}_{\ell}(x_{j})f(x_{j})$. Consequently, the $\ell_1$-regularized approximation polynomial is

p_{L,N+1}^{\ell_{1}}(x)=\frac{1}{2}\sum_{\ell=0}^{L}S_{\lambda\mu_{\ell}}\left(2\alpha_{\ell}\right)\tilde{\Phi}_{\ell}(x). (2.7)

The method of proof is similar to that of Zhou & Chen (2018, Theorem 5.1); we note, however, that our regularized least squares approximation problem (1.3) is posed over the interval $[-1,1]$ rather than over the unit sphere.

Proof.  Since $\mathbf{H}_{L}$ is non-singular, the first order condition for the $\ell_1$-regularized approximation problem (1.6) in matrix form is

\mathbf{0}\in 2\mathbf{H}_{L}\bm{\beta}-2\mathbf{A}_{L}^{T}\mathbf{W}\mathbf{f}+\lambda\,\partial(\|\mathbf{R}_{L}\bm{\beta}\|_{1}), (2.8)

where $\partial(\cdot)$ denotes the subgradient (Clarke, 1990). Since $\mathbf{H}_{L}=\mathbf{I}_{L}$ is the identity matrix and $\mathbf{R}_{L}$ is a diagonal matrix, $\bm{\beta}$ is the solution to the $\ell_1$-regularized approximation problem (1.6) in matrix form if and only if

0\in 2\beta_{\ell}-2\alpha_{\ell}+\lambda\mu_{\ell}\,\partial|\beta_{\ell}|,\quad\ell=0,1,\ldots,L, (2.9)

where $-1\leq\partial|\beta_{\ell}|\leq 1$. Let $\beta_{\ell}^{*}$ be the optimal solution to problem (2.9); then

\beta_{\ell}^{*}=\frac{1}{2}(2\alpha_{\ell}-\lambda\mu_{\ell}\,\partial|\beta_{\ell}^{*}|),\quad\ell=0,1,\ldots,L.

There are three cases to consider:

  1. If $2\alpha_{\ell}>\lambda\mu_{\ell}$, then $2\alpha_{\ell}-\lambda\mu_{\ell}\,\partial|\beta_{\ell}^{*}|>0$, thus $\beta_{\ell}^{*}>0$, yielding $\partial|\beta_{\ell}^{*}|=1$ and

     \beta_{\ell}^{*}=\frac{1}{2}(2\alpha_{\ell}-\lambda\mu_{\ell})>0.

  2. If $2\alpha_{\ell}<-\lambda\mu_{\ell}$, then $2\alpha_{\ell}-\lambda\mu_{\ell}\,\partial|\beta_{\ell}^{*}|<0$, thus $\beta_{\ell}^{*}<0$, giving $\partial|\beta_{\ell}^{*}|=-1$ and

     \beta_{\ell}^{*}=\frac{1}{2}(2\alpha_{\ell}+\lambda\mu_{\ell})<0.

  3. If $-\lambda\mu_{\ell}\leq 2\alpha_{\ell}\leq\lambda\mu_{\ell}$, then on the one hand $\beta_{\ell}^{*}>0$ leads to $\partial|\beta_{\ell}^{*}|=1$ and thus $\beta_{\ell}^{*}\leq 0$, a contradiction; on the other hand, $\beta_{\ell}^{*}<0$ produces $\partial|\beta_{\ell}^{*}|=-1$ and hence $\beta_{\ell}^{*}\geq 0$, again a contradiction. Thus

     \beta_{\ell}^{*}=0.

Hence, with the aid of the soft threshold operator, we obtain

\beta_{\ell}^{*}=\frac{1}{2}\left(\max(0,2\alpha_{\ell}-\lambda\mu_{\ell})+\min(0,2\alpha_{\ell}+\lambda\mu_{\ell})\right)=\frac{1}{2}S_{\lambda\mu_{\ell}}(2\alpha_{\ell}).

\square
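
Continuing the MATLAB sketch given after Theorem 2.1 (same nodes, weights, basis and coefficients $\alpha_{\ell}$; again our illustrative setup), the $\ell_1$-solution (2.6) is simply a componentwise soft-thresholding of the quadrature coefficients:

    % Soft threshold operator of Definition 2.1, applied componentwise as in (2.6).
    S = @(a,k) max(0, a-k) + min(0, a+k);
    beta1 = 0.5*S(2*alpha, lambda*mu);     % closed form (2.6)
    p1    = @(t) basis(t)*beta1;           % l1-regularized polynomial (2.7)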

2.3 Regularized barycentric interpolation formulae

In this subsection, we focus on the case $L=N$ and require the interpolation conditions

p(x_{j})=f(x_{j}),\quad j=0,1,\ldots,N,

where $p\in\mathbb{P}_{L}$ is the interpolant of $f$. As pointed out by Wang, Huybrechs & Vandewalle (2014), "barycentric interpolation is arguably the method of choice for numerical polynomial interpolation". It is therefore significant to express the $\ell_2$-regularized approximation polynomial (2.3) and the $\ell_1$-regularized approximation polynomial (2.7) in the barycentric form (Berrut & Trefethen, 2004; Higham, 2004)

p(x)=\frac{\sum_{j=0}^{N}\frac{\Omega_{j}}{x-x_{j}}f(x_{j})}{\sum_{j=0}^{N}\frac{\Omega_{j}}{x-x_{j}}},\quad\Omega_{j}=\frac{1}{\prod_{k\neq j}(x_{k}-x_{j})},

respectively. The $\{\Omega_{j}\}_{j=0}^{N}$ are called barycentric weights. The study of $\{\Omega_{j}\}_{j=0}^{N}$ for roots and extrema of classical orthogonal polynomials is well developed; see Salzer (1972), Schwarz & Waldvogel (1989), Berrut & Trefethen (2004), Wang & Xiang (2012) and Wang et al. (2014).

We first derive the $\ell_2$-regularized barycentric interpolation formula. Recalling (2.3), we have

p_{L,N+1}^{\ell_{2}}(x)=\sum_{\ell=0}^{N}\frac{\sum_{j=0}^{N}\omega_{j}\tilde{\Phi}_{\ell}(x_{j})f(x_{j})}{1+\lambda\mu_{\ell}^{2}}\tilde{\Phi}_{\ell}(x)=\sum_{j=0}^{N}\omega_{j}f(x_{j})\sum_{\ell=0}^{N}\frac{\tilde{\Phi}_{\ell}(x_{j})\tilde{\Phi}_{\ell}(x)}{1+\lambda\mu_{\ell}^{2}}. (2.10)

From the orthonormality of $\tilde{\Phi}_{\ell}(x)$, $\ell=0,1,\ldots,N$, taking $f(x)\equiv 1$ gives

\sum_{j=0}^{N}\omega_{j}\sum_{\ell=0}^{N}\frac{\tilde{\Phi}_{\ell}(x_{j})\tilde{\Phi}_{\ell}(x)}{1+\lambda\mu_{\ell}^{2}}=\sum_{\ell=0}^{N}\left(\sum_{j=0}^{N}\omega_{j}\tilde{\Phi}_{\ell}(x_{j})\cdot 1\right)\frac{\tilde{\Phi}_{\ell}(x)}{1+\lambda\mu_{\ell}^{2}}=\|1\|_{L_{2}}\frac{\tilde{\Phi}_{0}(x)}{1+\lambda\mu_{0}^{2}}=\frac{1}{1+\lambda\mu_{0}^{2}},

since $\sum_{j=0}^{N}\omega_{j}\tilde{\Phi}_{\ell}(x_{j})=\int_{-1}^{1}w(x)\tilde{\Phi}_{\ell}(x)\,dx=\|1\|_{L_{2}}\delta_{\ell,0}$ and $\tilde{\Phi}_{0}\equiv 1/\|1\|_{L_{2}}$. Note that

\sum_{j=0}^{N}\omega_{j}\sum_{\ell=0}^{N}\frac{\tilde{\Phi}_{\ell}(x_{j})\tilde{\Phi}_{\ell}(x)}{1+\lambda\mu_{\ell}^{2}}\neq 1\quad\text{when}\quad\lambda\mu_{0}^{2}\neq 0,

due to the presence of regularization. The $\ell_2$-regularized approximation polynomial (2.3) can therefore be expressed as

p_{L,N+1}^{\ell_{2}}(x)=\frac{\sum_{j=0}^{N}\left(\omega_{j}\sum_{\ell=0}^{N}\frac{\tilde{\Phi}_{\ell}(x_{j})\tilde{\Phi}_{\ell}(x)}{1+\lambda\mu_{\ell}^{2}}\right)f(x_{j})}{(1+\lambda\mu_{0}^{2})\sum_{j=0}^{N}\omega_{j}\sum_{\ell=0}^{N}\frac{\tilde{\Phi}_{\ell}(x_{j})\tilde{\Phi}_{\ell}(x)}{1+\lambda\mu_{\ell}^{2}}}. (2.11)

Without loss of generality, assume $\mu_{\ell}=1$ for $\ell\geq N+1$. Under this assumption, $\left\{\tilde{\Phi}_{\ell}(x)/\sqrt{1+\lambda\mu_{\ell}^{2}}\right\}_{\ell\in\mathbb{N}}$ is still a sequence of orthogonal polynomials. By the Christoffel-Darboux formula (Gautschi, 2004, Section 1.3.3), we rewrite $\sum_{\ell=0}^{N}\frac{\tilde{\Phi}_{\ell}(x_{j})\tilde{\Phi}_{\ell}(x)}{1+\lambda\mu_{\ell}^{2}}$ in the form

\sum_{\ell=0}^{N}\frac{\tilde{\Phi}_{\ell}(x)\tilde{\Phi}_{\ell}(x_{j})}{1+\lambda\mu_{\ell}^{2}}=\frac{k_{N}}{h_{N}k_{N+1}}\,\frac{\frac{\tilde{\Phi}_{N+1}(x)}{\sqrt{1+\lambda\mu_{N+1}^{2}}}\frac{\tilde{\Phi}_{N}(x_{j})}{\sqrt{1+\lambda\mu_{N}^{2}}}-\frac{\tilde{\Phi}_{N+1}(x_{j})}{\sqrt{1+\lambda\mu_{N+1}^{2}}}\frac{\tilde{\Phi}_{N}(x)}{\sqrt{1+\lambda\mu_{N}^{2}}}}{x-x_{j}}=\frac{k_{N}}{h_{N}k_{N+1}}\,\frac{\tilde{\Phi}_{N+1}(x)\tilde{\Phi}_{N}(x_{j})}{\sqrt{1+\lambda\mu_{N+1}^{2}}\sqrt{1+\lambda\mu_{N}^{2}}\,(x-x_{j})}, (2.12)

where $k_{\ell}$ and $h_{\ell}$ denote the leading coefficient and the $L_2$ norm of $\tilde{\Phi}_{\ell}(x)/\sqrt{1+\lambda\mu_{\ell}^{2}}$, respectively; the second equality in (2.12) uses the fact that $\tilde{\Phi}_{N+1}(x_{j})=0$. Combining (2.12) with the expression (2.11) of the $\ell_2$-regularized approximation polynomial and cancelling the common factor $\frac{k_{N}}{h_{N}k_{N+1}}\frac{\tilde{\Phi}_{N+1}(x)}{\sqrt{1+\lambda\mu_{N+1}^{2}}\sqrt{1+\lambda\mu_{N}^{2}}}$ from both the numerator and the denominator, we obtain the solution to the $\ell_2$-regularized approximation problem in barycentric form, which we name the $\ell_2$-regularized barycentric interpolation formula:

p_{N}^{\ell_{2}\text{-bary}}(x)=\frac{\sum_{j=0}^{N}\frac{\Omega_{j}}{x-x_{j}}f(x_{j})}{(1+\lambda\mu_{0}^{2})\sum_{j=0}^{N}\frac{\Omega_{j}}{x-x_{j}}}, (2.13)

where $\Omega_{j}=\omega_{j}\tilde{\Phi}_{N}(x_{j})$ is the corresponding barycentric weight at $x_{j}$. This relation between barycentric weights and Gauss quadrature weights was revealed by Wang, Huybrechs & Vandewalle (2014); it does not by itself lead to fast computation, since it still requires evaluating orthogonal polynomials on $\mathcal{X}_{N+1}$, but from it they derived explicit barycentric weights for all classical orthogonal polynomials.
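
A minimal MATLAB sketch of evaluating (2.13), assuming $L=N$ and the Chebyshev setup of the earlier snippets (set $L=N$ there, so that $\tilde{\Phi}_{N}=\tilde{T}_{N}$), might read as follows; the evaluation points are assumed distinct from the nodes, at which the formula's limiting value is $f(x_{j})/(1+\lambda\mu_{0}^{2})$.

    % l2-regularized barycentric formula (2.13) with Omega_j = w_j * T~_N(x_j).
    Omega = w.*(sqrt(2/pi)*cos(N*acos(x)));
    fx    = f(x);
    pbary = @(t) (1./(t(:)-x.')*(Omega.*fx)) ./ ...
                 ((1+lambda*mu(1)^2)*(1./(t(:)-x.')*Omega));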

Secondly, we derive the $\ell_1$-regularized barycentric interpolation formula. The $\ell_1$-regularized approximation polynomial (2.7) can be expressed as the sum of two terms:

p_{L,N+1}^{\ell_{1}}(x)=\sum_{\ell=0}^{N}\frac{S_{\lambda\mu_{\ell}}\left(2\sum_{j=0}^{N}\omega_{j}\tilde{\Phi}_{\ell}(x_{j})f(x_{j})\right)}{2}\tilde{\Phi}_{\ell}(x)=\sum_{\ell=0}^{N}\left(\sum_{j=0}^{N}\omega_{j}\tilde{\Phi}_{\ell}(x_{j})f(x_{j})\right)\tilde{\Phi}_{\ell}(x)+\sum_{\ell=0}^{N}c_{\ell}\tilde{\Phi}_{\ell}(x), (2.14)

where

c_{\ell}=\frac{S_{\lambda\mu_{\ell}}(2\alpha_{\ell})}{2}-\alpha_{\ell},\quad\ell=0,1,\ldots,N.

The first term in (2.14) can be written directly in barycentric form by setting $\lambda=0$ in the $\ell_2$-regularized barycentric formula. For the second term, we transform the basis $\{\tilde{\Phi}_{\ell}\}_{\ell=0}^{N}$ into the Lagrange polynomials $\{\ell_{j}(x)\}_{j=0}^{N}$. By the basis-transformation relation between orthogonal polynomials and Lagrange polynomials (Gander, 2005), the second term in (2.14) can be represented by Lagrange polynomials in the form

\sum_{\ell=0}^{N}c_{\ell}\tilde{\Phi}_{\ell}(x)=\sum_{j=0}^{N}\left(\sum_{\ell=0}^{N}c_{\ell}\tilde{\Phi}_{\ell}(x_{j})\right)\ell_{j}(x). (2.15)

Following the same procedure that leads from the classical Lagrange interpolation formula to the barycentric formula in Berrut & Trefethen (2004), the second term (2.15) of (2.14) can be rewritten as

\sum_{\ell=0}^{N}c_{\ell}\tilde{\Phi}_{\ell}(x)=\frac{\sum_{j=0}^{N}\frac{\Omega_{j}}{x-x_{j}}\left(\sum_{\ell=0}^{N}c_{\ell}\tilde{\Phi}_{\ell}(x_{j})\right)}{\sum_{j=0}^{N}\frac{\Omega_{j}}{x-x_{j}}}. (2.16)

Combining (2.14) and (2.16), we obtain the $\ell_1$-regularized barycentric interpolation formula:

p_{N}^{\ell_{1}\text{-bary}}(x)=\frac{\sum_{j=0}^{N}\frac{\Omega_{j}}{x-x_{j}}\left(f(x_{j})+\sum_{\ell=0}^{N}c_{\ell}\tilde{\Phi}_{\ell}(x_{j})\right)}{\sum_{j=0}^{N}\frac{\Omega_{j}}{x-x_{j}}}. (2.17)

If $\lambda=0$, the basis consists of normalized Legendre polynomials (i.e., the interpolation nodes are Legendre points) and $\Omega_{j}=(-1)^{j}\sqrt{(1-x_{j}^{2})\omega_{j}}$, where $\omega_{j}$ is the Gauss quadrature weight at $x_{j}$ (Wang & Xiang, 2012); then both the $\ell_2$-regularized barycentric interpolation formula (2.13) and the $\ell_1$-regularized barycentric interpolation formula (2.17) reduce to the Wang-Xiang formula (Wang & Xiang, 2012; Trefethen, 2013). Inspired by the work of Higham (2004), we conduct numerical studies of both regularized barycentric interpolation formulae (2.13) and (2.17), including their numerical stability, in a subsequent paper (An & Wu, 2019).

3 Quality of $\ell_2$-regularized weighted least squares approximation

In this section, we study the quality of the $\ell_2$-regularized weighted least squares approximation in terms of Lebesgue constants. The Lebesgue constant is a classical tool for quantifying the divergence or convergence of polynomial approximation, and much work has been devoted to it since 1910 (Fejér, 1910; Szegö, 1939; Rivlin, 2003; Powell, 1981; Wang & Xiang, 2012; Trefethen, 2013). This paper considers Lebesgue constants in the regularized setting. The Lebesgue constant is the $\infty$-norm of the linear mapping from the data to the approximation polynomial:

\Lambda_{L}:=\sup_{f\neq 0}\frac{\|p_{L}\|_{\infty}}{\|f\|_{\infty}}.

In particular, we have the following estimate for $\Lambda_{L}$ as a consequence of (2.3).

Proposition 3.1

Adopt the notation and assumptions of Theorem 2.1. We have

\Lambda_{L}=\max_{x\in[-1,1]}\sum_{j=0}^{N}\omega_{j}\Big{|}\sum_{\ell=0}^{L}\frac{1}{1+\lambda\mu_{\ell}^{2}}\tilde{\Phi}_{\ell}(x)\tilde{\Phi}_{\ell}(x_{j})\Big{|}\leq\max_{x\in[-1,1]}\sum_{j=0}^{N}\omega_{j}\sum_{\ell=0}^{L}\frac{1}{1+\lambda\mu_{\ell}^{2}}\Big{|}\tilde{\Phi}_{\ell}(x)\tilde{\Phi}_{\ell}(x_{j})\Big{|}. (3.1)

3.1 Lebesgue constants with the basis of Chebyshev polynomials of the first kind

We mimic the discussion of least squares approximation without regularization in Rivlin (2003, Section 2.4) and treat the case of normalized Chebyshev polynomials of the first kind $T_{\ell}(x)$, $\ell=0,1,\ldots$, as the basis for $\mathbb{P}_{L}$; early results are also available in Powell (1967). Consider a weighted Fourier series of a given continuous function $g(\theta)$ over $[-\pi,\pi]$:

q_{L}(\theta)=\frac{\rho_{0,L}}{2}a_{0}+\sum_{\ell=1}^{L}\rho_{\ell,L}(a_{\ell}\cos\ell\theta+b_{\ell}\sin\ell\theta),

where $a_{0},a_{1},\ldots,a_{L},b_{1},\ldots,b_{L}$ are the Fourier coefficients

a_{\ell}=\frac{1}{\pi}\int_{-\pi}^{\pi}g(t)\cos\ell t\,dt,\quad\ell=0,1,\ldots,L,

and

b_{\ell}=\frac{1}{\pi}\int_{-\pi}^{\pi}g(t)\sin\ell t\,dt,\quad\ell=1,\ldots,L,

and the weights are $\rho_{\ell,L}=1/(1+\lambda\mu_{\ell}^{2})$, $\ell=0,1,\ldots,L$.

Lemma 3.1 (Rivlin, 2003)

If $g(\theta)$ is continuous on $[-\pi,\pi]$ with period $2\pi$, then

q_{L}(\theta)=\frac{1}{\pi}\int_{-\pi}^{\pi}g(t+\theta)u_{L}(t)\,dt,

where

u_{L}(t)=\frac{\rho_{0,L}}{2}+\sum_{\ell=1}^{L}\rho_{\ell,L}\cos\ell t.
Definition 3.1

Lebesgue constants for the $\ell_2$-regularized least squares approximation using Chebyshev polynomials of the first kind are defined as

\Lambda_{L}:=\frac{1}{\pi}\int_{-\pi}^{\pi}|u_{L}(t)|\,dt.

The case $\lambda=0$ recovers the Lebesgue constants for Fourier series (without regularization) (Rivlin, 2003, Section 2.4) in the form

\Lambda_{L}=\frac{1}{\pi}\int_{0}^{\pi}\frac{|\sin(L+\frac{1}{2})t|}{\sin\frac{t}{2}}\,dt=\frac{1}{2\pi}\int_{-\pi}^{\pi}\Big{|}\frac{\sin(L+\frac{1}{2})t}{\sin\frac{t}{2}}\Big{|}\,dt,

where the last integrand is the absolute value of the famous Dirichlet kernel. For the estimation of $\Lambda_{L}$, we have the following lemma.

Lemma 3.2

Let $D_{n}$ denote the Dirichlet kernel (Stein & Shakarchi, 2011)

D_{n}(x):=\sum_{k=-n}^{n}e^{ikx}=\frac{\sin(n+\frac{1}{2})x}{\sin\frac{x}{2}}.

Then for $n\geq 2$,

\frac{1}{2\pi}\int_{-\pi}^{\pi}|D_{n}(x)|\,dx=\frac{4}{\pi^{2}}\log n+\mathcal{O}(1), (3.2)

and

\frac{1}{2\pi}\int_{-\pi}^{\pi}|D_{n}(x)|\,dx\leq\frac{4}{\pi^{2}}\log(n-1)+\eta, (3.3)

where $\eta=\frac{4}{\pi^{2}}+\frac{2}{\pi}\left(1+\int_{0}^{\pi}\frac{\sin x}{x}\,dx\right)=2.220884\ldots$.

(3.2) is a known result given by Fejér (1910); one may also find it in Lorentz (1966, p. 5) and Stein & Shakarchi (2011, Section 2.7.2). (3.3) is sharper than the known bound $\Lambda_{L}<\frac{4}{\pi^{2}}\log n+3$, $n\geq 2$ (Rivlin, 2003, Lemma 2.2). For completeness, the proof of Lemma 3.2 is given in the Appendix.

Theorem 3.1

Suppose $f$ is continuous on $[-1,1]$, and normalized Chebyshev polynomials constitute a basis for $\mathbb{P}_{L}$. Then the Lebesgue constants $\Lambda_{L}$ for the $\ell_2$-regularized least squares approximation of degree $L$ ($L\geq 2$) on $[-1,1]$ satisfy

\Lambda_{L}=\frac{4\log L/\pi^{2}}{1+\lambda\tilde{\mu}^{2}}+\mathcal{O}(1), (3.4)

and

\Lambda_{L}\leq\frac{4\log(L-1)/\pi^{2}+\eta}{1+\lambda\underline{\mu}^{2}}, (3.5)

where $\underline{\mu}=\min\{\mu_{0},\mu_{1},\ldots,\mu_{L}\}$, $\tilde{\mu}\in\mathbb{R}$ satisfies $\min\{\mu_{0},\mu_{1},\ldots,\mu_{L}\}\leq\tilde{\mu}\leq\max\{\mu_{0},\mu_{1},\ldots,\mu_{L}\}$, and $\eta=2.220884\ldots$.

Proof.  Since $f$ is continuous on $[-1,1]$, $g(\theta)=f(\cos\theta)$ is continuous on $[0,\pi]$; extending it evenly by $g(-\theta)=g(\theta)$ makes $g$ continuous on $[-\pi,\pi]$. The even function $g$ gives $b_{\ell}=0$ for all $\ell=1,\ldots,L$, and then

q_{L}(\theta)=\frac{\rho_{0,L}}{2}a_{0}+\sum_{\ell=1}^{L}\rho_{\ell,L}a_{\ell}\cos\ell\theta=\frac{\rho_{0,L}}{2}a_{0}+\sum_{\ell=1}^{L}\rho_{\ell,L}a_{\ell}T_{\ell}(x)=\frac{\sqrt{\pi}\,\rho_{0,L}}{2}a_{0}\tilde{T}_{0}+\sum_{\ell=1}^{L}\sqrt{\pi/2}\,\rho_{\ell,L}a_{\ell}\tilde{T}_{\ell}(x), (3.6)

which reveals that this is the $\ell_2$-regularized least squares approximation of degree $L$ with the normalized Chebyshev polynomials of the first kind as the basis for $\mathbb{P}_{L}$. Since $g(\theta)$ is continuous on $[-\pi,\pi]$ with period $2\pi$, there exists $M\geq 0$ such that

|g(t+\theta)|\leq M,\quad t,\theta\in[-\pi,\pi].

By Lemma 3.1,

\max_{\theta\in[-\pi,\pi]}|q_{L}(\theta)|\leq M\Lambda_{L}.

When $\lambda=0$, one may easily verify that

u_{L}(t)=\frac{1}{2}+\sum_{\ell=1}^{L}\cos\ell t=\frac{\sin(L+\frac{1}{2})t}{2\sin\frac{t}{2}},

then by Lemma 3.2,

\Lambda_{L}=\frac{1}{2\pi}\int_{-\pi}^{\pi}\Big{|}\frac{\sin(L+\frac{1}{2})t}{\sin\frac{t}{2}}\Big{|}\,dt\leq\frac{4}{\pi^{2}}\log(L-1)+\eta,

and

\Lambda_{L}=\frac{4}{\pi^{2}}\log L+\mathcal{O}(1).

Let $\underline{\mu}=\min\{\mu_{0},\mu_{1},\ldots,\mu_{L}\}$ and $\overline{\mu}=\max\{\mu_{0},\mu_{1},\ldots,\mu_{L}\}$, and let $\tilde{\mu}$ be a real number satisfying $\underline{\mu}\leq\tilde{\mu}\leq\overline{\mu}$. If $\lambda>0$, then

u_{L}(t)\leq\frac{1}{1+\lambda\underline{\mu}^{2}}\left(\frac{1}{2}+\sum_{\ell=1}^{L}\cos\ell t\right)=\frac{1}{1+\lambda\underline{\mu}^{2}}\,\frac{\sin(L+\frac{1}{2})t}{2\sin\frac{t}{2}},

and

u_{L}(t)=\frac{1}{1+\lambda\tilde{\mu}^{2}}\left(\frac{1}{2}+\sum_{\ell=1}^{L}\cos\ell t\right)=\frac{1}{1+\lambda\tilde{\mu}^{2}}\,\frac{\sin(L+\frac{1}{2})t}{2\sin\frac{t}{2}},

which gives the asymptotic result (3.4) and the inequality (3.5). $\square$

Take the family of normalized Chebyshev polynomials of the first kind $\{\tilde{T}_{\ell}(x)\}_{\ell=0}^{L}$ as the basis for $\mathbb{P}_{L}$ and the set of zeros of $\tilde{T}_{L+1}(x)$ as the node set, with $L=N$. Fig. 1 illustrates the Lebesgue constants with respect to different choices of the regularization parameter $\lambda$.

Figure 1: The Lebesgue constant of $\ell_2$-regularized approximation with $L=N$ and $\mu_{\ell}=1$ for $\ell=0,1,\ldots,L$, using Chebyshev polynomials of the first kind

3.2 Lebesgue constants with the basis of Legendre polynomials

In this subsection, we derive asymptotic bounds for the Lebesgue constants of the $\ell_2$-regularized approximation using Legendre polynomials. Consider the kernel

K_{L}(x,y):=\sum_{\ell=0}^{L}\frac{2\ell+1}{1+\lambda\mu_{\ell}^{2}}P_{\ell}(x)P_{\ell}(y),

where $P_{\ell}(\cdot)$ denotes the Legendre polynomial of degree $\ell$. The case $\lambda=0$ gives the simpler kernel

T_{L}(x,y)=\sum_{\ell=0}^{L}(2\ell+1)P_{\ell}(x)P_{\ell}(y)=(L+1)\frac{P_{L}(x)P_{L+1}(y)-P_{L+1}(x)P_{L}(y)}{y-x},

where the last equality is due to Gronwall (1913, (4)). Thus,

K_{L}(x)\triangleq K_{L}(x,1):=\sum_{\ell=0}^{L}\frac{2\ell+1}{1+\lambda\mu_{\ell}^{2}}P_{\ell}(x),

and

T_{L}(x)\triangleq T_{L}(x,1):=\sum_{\ell=0}^{L}(2\ell+1)P_{\ell}(x)=(L+1)\frac{P_{L}(x)-P_{L+1}(x)}{1-x},

where the rightmost equality is due to Gronwall (1913, (5)). The reader may note that

|K_{L}(x)|\leq\frac{1}{1+\lambda\underline{\mu}^{2}}|T_{L}(x)|, (3.7)

where $\underline{\mu}=\min\{\mu_{0},\mu_{1},\ldots,\mu_{L}\}$.

Definition 3.2

Lebesgue constants for the $\ell_2$-regularized approximation using Legendre polynomials are defined as

\Lambda_{L}:=\frac{1}{2}\int_{-1}^{1}|K_{L}(x)|\,dx.

The case $\lambda=0$ leads to

\Theta_{L}:=\frac{1}{2}\int_{-1}^{1}|T_{L}(x)|\,dx=\frac{L+1}{2}\int_{-1}^{1}\left|\frac{P_{L}(x)-P_{L+1}(x)}{1-x}\right|\,dx, (3.8)

which is the Lebesgue constant of the Legendre truncation of degree $L$ (Gronwall, 1913).

Lemma 3.3 (Gronwall, 1913; Szegö, 1934)

Let $\Theta_{L}$ be defined as in (3.8). Then

\lim_{L\rightarrow\infty}\frac{\Theta_{L}}{\sqrt{L}}=2\sqrt{\frac{2}{\pi}}. (3.9)

Combining Lemma 3.3 with the inequality (3.7), we obtain the following estimate of $\Lambda_{L}$ in the Legendre case.

Theorem 3.2

Suppose $f$ is continuous on $[-1,1]$, and Legendre polynomials constitute the basis for $\mathbb{P}_{L}$. Then the Lebesgue constants $\Lambda_{L}$ for the $\ell_2$-regularized least squares approximation of degree $L$ ($L\geq 2$) on $[-1,1]$ satisfy

\Lambda_{L}\leq\frac{1}{1+\lambda\underline{\mu}^{2}}\left(\frac{2^{3/2}}{\sqrt{\pi}}L^{1/2}+o(L^{1/2})\right),

where $\underline{\mu}=\min\{\mu_{0},\mu_{1},\ldots,\mu_{L}\}$.

The proof of Theorem 3.2 follows from the discussion above.

4 Sparsity of the solution to the $\ell_1$-regularized approximation problem

Some real-world problems, such as those arising in signal processing, often admit sparse solutions, and one may seek the sparsest solution to a problem, that is, the solution containing the fewest nonzero elements. However, a vector of real data rarely contains many exact zeros. One may instead seek sparsity by minimizing $\|x\|_{p}$, where $\|x\|_{p}=(\sum_{i}|x_{i}|^{p})^{1/p}$ with $0<p<1$; such optimization problems, however, are nonconvex and nondifferentiable (Clarke, 1990; Bruckstein et al., 2009). Regularized methods, especially $\ell_1$-regularized methods, also produce sparse solutions, as our examples show. Minimizing the $\ell_1$ norm yields a relatively sparse solution: the resulting problem is convex, and among all $p\geq 1$ its solution is the closest to the sparsest one. For more on sparsity, we refer to Bruckstein et al. (2009). We consider the sparsity of the solution $\bm{\beta}$ to the $\ell_1$-regularized approximation problem (1.6), measured by the number of nonzero elements of $\bm{\beta}$, denoted by $\|\bm{\beta}\|_{0}$ and known as the zero "norm" (although it is not actually a norm).

Before discussing the upper bound on $\|\bm{\beta}\|_{0}$, we offer a quick glimpse into the distribution of zero elements in the $\ell_1$-regularized approximation solution. From Definition 2.1 of the soft threshold operator, we have the following result.

Proposition 4.1 (Zero element distribution of the $\ell_1$-regularized approximation solution)

Adopt the notation and assumptions of Lemma 2.1. If $\mu_{\ell}$ satisfies

-\lambda\mu_{\ell}\leq 2\sum_{j=0}^{N}\omega_{j}\tilde{\Phi}_{\ell}(x_{j})f(x_{j})\leq\lambda\mu_{\ell}, (4.1)

then the corresponding $\beta_{\ell}$ is zero, $\ell=0,1,\ldots,L$.

If $\lambda>0$, then $\|\mathbf{A}_{L}^{T}\mathbf{W}\mathbf{f}\|_{0}$ is an upper bound on the number of nonzero elements of $\bm{\beta}$. Furthermore, we can obtain the exact number of nonzero elements of $\bm{\beta}$ with the help of the information in $\bm{\beta}$ itself.

Theorem 4.1

Let $\bm{\beta}=[\beta_{0},\beta_{1},\ldots,\beta_{L}]^{T}$ be the solution to the $\ell_1$-regularized problem (1.6). If $\lambda>0$, then the number of nonzero elements of $\bm{\beta}$ satisfies

\|\bm{\beta}\|_{0}\leq\|\mathbf{A}_{L}^{T}\mathbf{W}\mathbf{f}\|_{0}, (4.2)

and

\|\bm{\beta}\|_{0}=\|\mathbf{A}_{L}^{T}\mathbf{W}\mathbf{f}\|_{0}-\#\{\text{occurrences of }\beta_{\ell}=0\text{ but }\alpha_{\ell}\neq 0\}, (4.3)

where $\#\{\text{occurrences of }\beta_{\ell}=0\text{ but }\alpha_{\ell}\neq 0\}$ denotes the number of indices $\ell\in\{0,1,\ldots,L\}$ with $\beta_{\ell}=0$ but $\alpha_{\ell}\neq 0$.

Proof.  The first order condition (2.8) of the $\ell_1$-regularized approximation problem (1.6) in matrix form can be rewritten as

\bm{\beta}\in\mathbf{A}_{L}^{T}\mathbf{W}\mathbf{f}-\frac{\lambda\,\partial(\|\mathbf{R}_{L}\bm{\beta}\|_{1})}{2}.

Hence there must exist a vector $\mathbf{h}=[h_{0},h_{1},\ldots,h_{L}]^{T}\in\partial(\|\mathbf{R}_{L}\bm{\beta}\|_{1})$ of length $L+1$ such that

\bm{\beta}=\mathbf{A}_{L}^{T}\mathbf{W}\mathbf{f}-\frac{\lambda\mathbf{h}}{2}. (4.4)

Since $\mu_{\ell}>0$ for all $\ell=0,1,\ldots,L$, the elements of $\mathbf{h}$ satisfy

h_{\ell}=\begin{cases}\mu_{\ell},&\mu_{\ell}\beta_{\ell}>0,\text{ i.e., }\beta_{\ell}>0,\\ -\mu_{\ell},&\mu_{\ell}\beta_{\ell}<0,\text{ i.e., }\beta_{\ell}<0,\\ r\mu_{\ell}\text{ for some }r\in[-1,1],&\mu_{\ell}\beta_{\ell}=0,\text{ i.e., }\beta_{\ell}=0,\end{cases}

yielding $\|\mathbf{h}\|_{0}\geq\|\bm{\beta}\|_{0}$. Expression (4.4) gives

\left\|\bm{\beta}+\frac{\lambda\mathbf{h}}{2}\right\|_{0}=\left\|\mathbf{A}_{L}^{T}\mathbf{W}\mathbf{f}\right\|_{0}.

If $\beta_{\ell}>0$ (or $<0$), then $h_{\ell}>0$ (or $<0$); if $\beta_{\ell}=0$, $h_{\ell}$ may nevertheless be nonzero. Thus,

\left\|\bm{\beta}+\frac{\lambda\mathbf{h}}{2}\right\|_{0}=\|\mathbf{h}\|_{0}.

Hence,

\|\bm{\beta}\|_{0}\leq\|\mathbf{h}\|_{0}=\|\mathbf{A}_{L}^{T}\mathbf{W}\mathbf{f}\|_{0}.

Let $\beta_{\ell}^{*}$ denote the optimal solution to problem (2.9). With the aid of the closed-form solution (2.6) to the $\ell_1$-regularized approximation problem, expression (4.4) gives

\frac{h_{\ell}}{\mu_{\ell}}=\frac{2}{\lambda\mu_{\ell}}(\alpha_{\ell}-\beta_{\ell}^{*})=\begin{cases}1,&\beta_{\ell}^{*}>0,\\ -1,&\beta_{\ell}^{*}<0,\\ \frac{2\alpha_{\ell}}{\lambda\mu_{\ell}},&\beta_{\ell}^{*}=0.\end{cases}

Since $\|\mathbf{h}\|_{0}=\|\mathbf{h}/\bm{\mu}\|_{0}$, where $\mathbf{h}/\bm{\mu}$ denotes the elementwise division of $\mathbf{h}$ by $\bm{\mu}$, the difference between $\|\mathbf{h}\|_{0}$ and $\|\bm{\beta}\|_{0}$ is

\|\mathbf{h}\|_{0}-\|\bm{\beta}\|_{0}=\left\|\frac{\mathbf{h}}{\bm{\mu}}\right\|_{0}-\|\bm{\beta}\|_{0}=\#\{\text{occurrences of }\beta_{\ell}=0\text{ but }\alpha_{\ell}\neq 0\}. (4.5)

Together with $\|\mathbf{h}\|_{0}=\|\mathbf{A}_{L}^{T}\mathbf{W}\mathbf{f}\|_{0}$ and (4.5), we obtain the exact count (4.3) of nonzero elements of $\bm{\beta}$. $\square$

Corollary 4.1

If $\lambda=0$, then the number of nonzero elements of $\bm{\beta}$ satisfies

\|\bm{\beta}\|_{0}=\|\mathbf{A}_{L}^{T}\mathbf{W}\mathbf{f}\|_{0}. (4.6)

Remark.  Together, Theorem 4.1 and Corollary 4.1 show that regularized minimization is better than minimization without regularization in terms of sparsity.

Let the basis for $\mathbb{P}_{L}$ be the family of normalized Chebyshev polynomials of the first kind $\{\tilde{T}_{\ell}(x)\}_{\ell=0}^{L}$ and the node set be the set of zeros of $\tilde{T}_{N+1}(x)$. With the degree $L$ of the approximation polynomial ranging from $1$ to $60$, $\lambda=10^{-1}$ and $\mu_{\ell}=1$ for all $\ell=0,1,\ldots,L$, Fig. 2 gives four examples of the bounds and counts derived above; a small numerical check of these counts is sketched after Fig. 2.

Figure 2: Examples of bounding the number of nonzero elements, where $\#$ denotes $\#\{\text{occurrences of }\beta_{\ell}=0\text{ but }\alpha_{\ell}\neq 0\}$
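
The counts in Theorem 4.1 are easy to verify numerically. A sketch, reusing alpha and beta1 from the $\ell_1$ snippet in Section 2.2 (there alpha collects the quadrature coefficients $\alpha_{\ell}$, so $\|\mathbf{A}_{L}^{T}\mathbf{W}\mathbf{f}\|_{0}$ is nnz(alpha)):

    % Check (4.2) and (4.3): nnz(beta1) equals nnz(alpha) minus the number of
    % coefficients thresholded to zero (alpha_l ~= 0 but beta_l = 0).
    killed = nnz(alpha ~= 0 & beta1 == 0);
    assert(nnz(beta1) <= nnz(alpha));             % bound (4.2)
    assert(nnz(beta1) == nnz(alpha) - killed);    % identity (4.3)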

5 Numerical experiments

In this section, we report numerical results to illustrate the theoretical results derived above and to test the efficiency of the $\ell_2$-regularized approximation polynomial $p_{L,N+1}^{\ell_{2}}$ (2.3) and the $\ell_1$-regularized approximation polynomial $p_{L,N+1}^{\ell_{1}}$ (2.7). The choices of the basis for $\mathbb{P}_{L}$ and the node set $\mathcal{X}_{N+1}$ are primary when using both models. We choose Chebyshev polynomials of the first kind and the corresponding Chebyshev points; choosing other orthogonal polynomials, such as Legendre polynomials, is certainly also possible. All computations are performed in MATLAB in double precision arithmetic. Some related commands, for instance those for obtaining quadrature points and weights, are included in Chebfun 5.7.0 (Trefethen et al., 2017).

To test the efficiency of the approximation, we define the uniform error and the $L_2$ error to measure the approximation error (both are sketched in MATLAB after this list):

  • The uniform error of the approximation is estimated by

\|f-p_{L,N+1}\|_{\infty}:=\max_{x\in[-1,1]}|f(x)-p_{L,N+1}(x)|\simeq\max_{x\in\mathcal{X}}|f(x)-p_{L,N+1}(x)|,

    where $\mathcal{X}$ is a large but finite set of well-distributed points (for example, clustered grids; see Trefethen (2000, Chapter 5)) over the interval $[-1,1]$.

  • The $L_2$ error of the approximation is estimated by a proper Gauss quadrature rule:

\|f-p_{L,N+1}\|_{L_{2}}=\left(\int_{-1}^{1}w(x)(f(x)-p_{L,N+1}(x))^{2}\,dx\right)^{1/2}\simeq\left(\sum_{j=0}^{N}\omega_{j}(f(x_{j})-p_{L,N+1}(x_{j}))^{2}\right)^{1/2}.
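
With the polynomial handle p and the quadrature data from the Section 2 snippets, these two error measures can be sketched as follows (the grid size is our choice):

    % Uniform error on a clustered (Chebyshev-type) evaluation grid, and L2
    % error estimated by the same Gauss quadrature rule.
    t = cos(pi*(0:2000)'/2000);                 % clustered grid on [-1,1]
    err_inf = max(abs(f(t) - p(t)));
    err_L2  = sqrt(sum(w.*(f(x) - p(x)).^2));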

5.1 Regularized approximation models for exact data

Readers should bear in mind that regularization is introduced to solve ill-posed problems or to prevent overfitting. When approximating functions without noise, the regularization parameter $\lambda=0$ (no regularization) is the best choice. Fig. 3 reports the efficiency and errors for approximating the function

f_{1}(x)=\tanh(20\sin(12x))+0.02e^{3x}\sin(300x),

with and without regularization over $[-1,1]$. This test function is taken from Trefethen (2013). Let $N=600$, $L=200$, $\lambda=10^{-1}$ and $\mu_{\ell}=1$ for all $\ell=0,1,\ldots,L$. Fig. 3 illustrates that regularization is of no use in this well-posed approximation problem, and that $\ell_2$-regularization is better than $\ell_1$-regularization for approximating smooth functions.

Figure 3: Approximation of the function $f_{1}(x)=\tanh(20\sin(12x))+0.02e^{3x}\sin(300x)$ with exact values, using Chebyshev polynomials of the first kind

5.2 Regularized approximation models for contaminated data

We consider

f_{2}(x)=\begin{cases}\frac{5\sin(5\pi x)}{5\pi x},&x\neq 0,\\ 5,&x=0,\end{cases} (5.1)

which is the Fourier transform of the gate signal

g(t)=\begin{cases}1,&|t|\leq 5/2,\\ 0,&|t|>5/2,\end{cases}

see Bracewell (1965). We use the regularized least squares models to reduce Gaussian white noise added to the function (5.1) with signal-to-noise ratio (SNR) 10 dB. The choice of $\lambda$ is critical in these models, so we first consider the relation between $\lambda$ and the approximation errors in order to choose the optimal $\lambda$. Let $L=30$ and $N=100$. We sweep $\lambda=10^{-15},10^{-14.5},10^{-14},\ldots,10^{4.5},10^{5}$ to choose the best regularization parameter; here we choose $\lambda=10^{-1}$. For more advanced methods of choosing the parameter $\lambda$, we refer to Lazarov et al. (2007) and Pereverzyev et al. (2015) for a further discussion. A MATLAB sketch of this sweep follows.
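
A sketch of the sweep, assuming the Section 2 setup with $L=30$, $N=100$ and $\mu_{\ell}=1$; the noise model implements the 10 dB SNR described above, and the evaluation grid avoids $t=0$ so that (5.1) needs no special case:

    % Sweep lambda over 10^(-15:0.5:5) and record the uniform error of the
    % l2-regularized approximant (2.2) computed from noisy samples of f2.
    f2 = @(t) 5*sin(5*pi*t)./(5*pi*t);         % (5.1); no node is exactly 0
    fx = f2(x);
    sigma = sqrt(mean(fx.^2))*10^(-10/20);     % noise level for 10 dB SNR
    y = fx + sigma*randn(size(fx));            % contaminated samples
    t = cos(pi*(0:2000)'/2000);                % evaluation grid (avoids t = 0)
    lams = 10.^(-15:0.5:5); errs = zeros(size(lams));
    for i = 1:numel(lams)
        beta = (basis(x)'*(w.*y))./(1 + lams(i)*mu.^2);   % (2.2), noisy data
        errs(i) = max(abs(f2(t) - basis(t)*beta));
    end
    [~, imin] = min(errs);
    fprintf('best lambda = 10^%.1f\n', log10(lams(imin)))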

Fig. 4 shows that the $\ell_2$- and $\ell_1$-regularized approximation models with $\lambda=10^{-1}$ are effective in recovering the noisy function. In this case we let

\mu_{\ell}=\frac{1}{F(\ell/L)},\quad\ell=0,1,\ldots,L,

where the filter function $F$ is defined as (An et al., 2012)

F(x):=\begin{cases}1,&x\in[0,1/2],\\ \sin^{2}\pi x,&x\in[1/2,1],\\ 0,&x\in(1,+\infty).\end{cases}

In this case, $\{\mu_{\ell}\}_{\ell=0}^{L}$ is a sequence of nonnegative nondecreasing parameters.

Figure 4: Regularized approximation models with $L=30$ and $N=100$ recovering the function (5.1) from contaminated data, using Chebyshev polynomials of the first kind

The results in Fig. 5 illustrate that the $\ell_1$-regularized approximation model is the best choice for recovering a contaminated function, which accords with known facts (Lu & Pereverzev, 2009). Both Fig. 4 and Fig. 5 show that the regularized models are better than those without regularization ($\lambda=0$).

Figure 5: Errors of the regularized approximation models recovering $f_{2}(x)$ with fixed $N=100$, using Chebyshev polynomials of the first kind

In addition, consider the highly oscillatory function

f_{3}(x)=\text{Airy}(40x),\quad x\in[-1,1],

with 12 dB Gaussian white noise added (the noisy function is shown in Fig. 6). We use the regularized barycentric formulae (2.13) and (2.17) to conduct this experiment. Let $L=N=500$ and let $\{\mu_{\ell}\}_{\ell=0}^{L}$ be the same as above. Different values of $\lambda$, say $10^{-1}$, $10^{-2}$, $10^{-5}$, lead to different results; see Fig. 6. This experiment indicates that one can apply a simple formula to reduce noise, rather than employing an iterative scheme.

Figure 6: Denoising by the regularized barycentric formulae with $L=N=500$ on Chebyshev points of the first kind: the function $f_{3}(x)=\text{Airy}(40x)$ with 12 dB Gaussian white noise added

These numerical examples illustrate that for some problems $\ell_2$-regularization can also be better than $\ell_1$-regularization. For example, $\lambda=10^{-1}$ suits $\ell_2$-regularization, but with $\ell_1$-regularization it almost flattens the recovered function. We can also see that $\ell_2$-regularization is less sensitive to the choice of $\lambda$ than $\ell_1$-regularization.

6 Concluding remarks

In this paper, we have investigated minimizers of the $\ell_2$- and $\ell_1$-regularized least squares approximation problems with the aid of Gauss quadrature points and orthogonal polynomials on the interval $[-1,1]$. Based on the explicitly constructed approximation polynomials (2.3) and (2.7), the $\ell_2$-regularized barycentric interpolation formula (2.13) and the $\ell_1$-regularized barycentric interpolation formula (2.17) have been derived. In addition, Lebesgue constants have been studied for the cases in which Legendre polynomials and normalized Chebyshev polynomials of the first kind form the basis for the polynomial space $\mathbb{P}_{L}$. A bound on the sparsity of the solution to the $\ell_1$-regularized approximation problem is obtained by a refinement of the subgradient. Numerical results indicate that both the $\ell_2$-regularized and the $\ell_1$-regularized approximation polynomials are practicable and efficient. Regularization parameter choice strategies and error bounds for the approximation should be studied in future work. These results provide new insights into $\ell_2$- and $\ell_1$-regularized approximation, and they can be adapted to many practical applications, such as noise reduction using the barycentric interpolation scheme on Gauss quadrature points.

References

  • An et al. (2012) An, C., Chen, X., Sloan, I. H. & Womersley, R. S. (2012) Regularized least squares approximations on the sphere using spherical designs. SIAM Journal on Numerical Analysis, 50, 1513–1534.
  • An & Wu (2019) An, C. & Wu, H.-N. (2019) The numerical stability of regularized barycentric interpolation formulae for interpolation and extrapolation. arXiv preprint arXiv:1904.07187.
  • Berrut & Trefethen (2004) Berrut, J.-P. & Trefethen, L. N. (2004) Barycentric Lagrange interpolation. SIAM Review, 46, 501–517.
  • Bracewell (1965) Bracewell, R. (1965) The Fourier Transform and Its Applications. New York: McGraw-Hill.
  • Bruckstein et al. (2009) Bruckstein, A. M., Donoho, D. L. & Elad, M. (2009) From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Review, 51, 34–81.
  • Cai et al. (2009) Cai, J.-F., Osher, S. & Shen, Z. (2009) Split Bregman methods and frame based image restoration. Multiscale Modeling & Simulation, 8, 337–369.
  • Clarke (1990) Clarke, F. H. (1990) Optimization and nonsmooth analysis, vol. 5. Philadelphia: SIAM.
  • Donoho & Johnstone (1994) Donoho, D. L. & Johnstone, J. M. (1994) Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81, 425–455.
  • Fejér (1910) Fejér, L. (1910) Lebesguessche konstanten und divergente fourierreihen. Journal für die Reine und Angewandte Mathematik, 138, 22–53.
  • Gander (2005) Gander, W. (2005) Change of basis in polynomial interpolation. Numerical Linear Algebra with Applications, 12, 769–778.
  • Gautschi (2004) Gautschi, W. (2004) Orthogonal Polynomials: Computation and Approximation. Oxford: Oxford University Press on Demand.
  • Gautschi (2012) Gautschi, W. (2012) Numerical analysis 2nd edition. Basel: Birkhäuser Basel.
  • Golitschek & Schumaker (1990) Golitschek, M. & Schumaker, L. (1990) Data fitting by penalized least squares. Algorithms for Approximation II. New York: Springer-Verlag, pp. 210–227.
  • Gronwall (1913) Gronwall, T. H. (1913) Über die Laplacesche reihe. Mathematische Annalen, 74, 213–270.
  • Higham (2004) Higham, N. J. (2004) The numerical stability of barycentric Lagrange interpolation. IMA Journal of Numerical Analysis, 24, 547–556.
  • Kim et al. (2009) Kim, S.-J., Koh, K., Boyd, S. & Gorinevsky, D. (2009) $\ell_1$ trend filtering. SIAM Review, 51, 339–360.
  • Kress (1998) Kress, R. (1998) Numerical analysis. Graduate Texts in Mathematics, vol. 181. New York: Springer-Verlag.
  • Lazarov et al. (2007) Lazarov, R. D., Lu, S. & Pereverzev, S. V. (2007) On the balancing principle for some problems of numerical analysis. Numerische Mathematik, 106, 659–689.
  • Lorentz (1966) Lorentz, G. G. (1966) Approximation of functions. Athena Series: Selected Topics in Mathematics. New York Chicago San Francisco Toronto London: Holt, Rinehart and Winston.
  • Lu & Pereverzev (2009) Lu, S. & Pereverzev, S. V. (2009) Sparse recovery by the standard Tikhonov method. Numerische Mathematik, 112, 403–424.
  • Pereverzyev et al. (2015) Pereverzyev, S. V., Sloan, I. H. & Tkachenko, P. (2015) Parameter choice strategies for least-squares approximation of noisy smooth functions on the sphere. SIAM Journal on Numerical Analysis, 53, 820–835.
  • Powell (1967) Powell, M. J. D. (1967) On the maximum errors of polynomial approximations defined by interpolation and by least squares criteria. The Computer Journal, 9, 404–407.
  • Powell (1981) Powell, M. J. D. (1981) Approximation theory and methods. Cambridge: Cambridge University Press.
  • Rivlin (2003) Rivlin, T. J. (2003) An introduction to the approximation of functions. North Chelmsford: Courier Corporation.
  • Salzer (1972) Salzer, H. E. (1972) Lagrangian interpolation at the Chebyshev points $x_{n,\nu}\equiv\cos(\nu\pi/n)$, $\nu=0(1)n$; some unnoted advantages. The Computer Journal, 15, 156–159.
  • Schwarz & Waldvogel (1989) Schwarz, H. R. & Waldvogel, J. (1989) Numerical analysis: a comprehensive introduction. New York: Wiley.
  • Stein & Shakarchi (2011) Stein, E. M. & Shakarchi, R. (2011) Fourier analysis: an introduction, vol. 1. Princeton: Princeton University Press.
  • Szegö (1934) Szegö, G. (1934) Über einige asymptotische entwicklungen der Legendreschen funktionen. Proceedings of the London Mathematical Society, 2, 427–450.
  • Szegö (1939) Szegö, G. (1939) Orthogonal polynomials. Colloquium Publications Volume XXIII, vol. 23. Providence, Rhode Island: American Mathematical Society.
  • Trefethen (2000) Trefethen, L. N. (2000) Spectral methods in MATLAB, vol. 10. Philadelphia: SIAM.
  • Trefethen (2013) Trefethen, L. N. (2013) Approximation theory and approximation practice, vol. 128. Philadelphia: SIAM.
  • Trefethen et al. (2017) Trefethen, L. N. et al. (2017) Chebfun Version 5.7.0. http://www/maths.ox.ac.uk/chebfun/: Chebfun Development Team.
  • Wang et al. (2014) Wang, H., Huybrechs, D. & Vandewalle, S. (2014) Explicit barycentric weights for polynomial interpolation in the roots or extrema of classical orthogonal polynomials. Mathematics of Computation, 83, 2893–2914.
  • Wang & Xiang (2012) Wang, H. & Xiang, S. (2012) On the convergence rates of Legendre approximation. Mathematics of Computation, 81, 861–877.
  • Xiang & Zou (2013) Xiang, H. & Zou, J. (2013) Regularization with randomized SVD for large-scale discrete inverse problems. Inverse Problems, 29, 085008.
  • Zhou & Chen (2018) Zhou, Y. & Chen, X. (2018) Spherical $t_{\epsilon}$-designs for approximations on the sphere. Mathematics of Computation, 87, 2831–2855.

Appendix A. Proof of Lemma 3.2

Proof.  We adapt the proof of Rivlin (2003, Lemma 2.2), but derive a sharper bound than the known inequality

\frac{1}{2\pi}\int_{-\pi}^{\pi}|D_{n}(x)|\,dx=\frac{1}{2\pi}\int_{-\pi}^{\pi}\Big{|}\frac{\sin(n+\frac{1}{2})x}{\sin\frac{x}{2}}\Big{|}\,dx<\frac{4}{\pi^{2}}\log n+3,\quad n\geq 1. (A.1)

When $n=1$, a direct computation gives $\frac{1}{2\pi}\int_{-\pi}^{\pi}|D_{1}(x)|\,dx=\frac{1}{3}+\frac{2\sqrt{3}}{\pi}<3$, so (A.1) holds trivially. Suppose $n\geq 2$; then

\frac{1}{2\pi}\int_{-\pi}^{\pi}\Big{|}\frac{\sin(n+\frac{1}{2})x}{\sin\frac{x}{2}}\Big{|}\,dx=\frac{1}{\pi}\int_{0}^{\pi}\frac{|\sin(n+\frac{1}{2})x|}{\sin\frac{x}{2}}\,dx=\frac{1}{\pi}\int_{0}^{\pi}\left|\frac{\sin nx}{\tan\frac{x}{2}}+\cos nx\right|\,dx\leq\frac{1}{\pi}\int_{0}^{\pi}\frac{|\sin nx|}{\tan\frac{x}{2}}\,dx+\frac{1}{\pi}\int_{0}^{\pi}|\cos nx|\,dx. (A.2)

We have

\frac{1}{\pi}\int_{0}^{\pi}|\cos nx|\,dx=\frac{2}{\pi}, (A.3)

and

\int_{0}^{\pi}\frac{|\sin nx|}{\tan\frac{x}{2}}\,dx\leq 2\int_{0}^{\pi}\frac{\sin x}{x}\,dx+\frac{4}{\pi}[1+\log(n-1)]. (A.4)

Combining (A.2), (A.3) and (A.4), we obtain

\frac{1}{2\pi}\int_{-\pi}^{\pi}\Big{|}\frac{\sin(n+\frac{1}{2})x}{\sin\frac{x}{2}}\Big{|}\,dx\leq\frac{2}{\pi}\int_{0}^{\pi}\frac{\sin x}{x}\,dx+\frac{4}{\pi^{2}}[1+\log(n-1)]+\frac{2}{\pi}.

Thus,

\frac{1}{2\pi}\int_{-\pi}^{\pi}|D_{n}(x)|\,dx\leq\frac{4}{\pi^{2}}\log(n-1)+\eta, (A.5)

where

\eta:=\frac{4}{\pi^{2}}+\frac{2}{\pi}+\frac{2}{\pi}\int_{0}^{\pi}\frac{\sin x}{x}\,dx.

The value of $\eta$ can be calculated in MATLAB with the function quadgk:

\eta=\frac{4}{\pi^{2}}+\frac{2}{\pi}+\frac{2}{\pi}\int_{0}^{\pi}\frac{\sin x}{x}\,dx=2.220884\ldots.
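
A quick numerical check of $\eta$ and of the bound (A.5) (our small verification script; quadgk does not evaluate the integrand at the endpoints, so the removable singularities at $x=0$ cause no trouble):

    % eta via quadgk, and the bound (A.5) for a few n.
    eta = 4/pi^2 + 2/pi + (2/pi)*quadgk(@(x) sin(x)./x, 0, pi)   % 2.220884...
    for n = [2 5 10 100]
        lhs = quadgk(@(x) abs(sin((n+1/2)*x)./sin(x/2)), 0, pi)/pi;
        rhs = 4/pi^2*log(n-1) + eta;
        fprintf('n = %3d: %.6f <= %.6f\n', n, lhs, rhs);
    end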

It is not difficult to see that inequality (A.5) is sharper than inequality (A.1) for $n\geq 2$. $\square$