
On the Probabilistic Approximation in
Reproducing Kernel Hilbert Spaces

Dongwei Chen, Department of Mathematics, Colorado State University, CO, US, dongwei.chen@colostate.edu, and Kai-Hsiang Wang, Department of Mathematics, Northwestern University, IL, US, kai-hsiangwang2025@u.northwestern.edu
Abstract.

This paper generalizes the least square method to probabilistic approximation in reproducing kernel Hilbert spaces. We show the existence and uniqueness of the optimizer. Furthermore, we generalize the celebrated representer theorem in this setting; in particular, when the probability measure is finitely supported or the Hilbert space is finite-dimensional, we show that the approximation problem turns into a measure quantization problem. Some discussion and examples are also given for the case where the space is infinite-dimensional and the measure is infinitely supported.

1. Introduction and Main Results

Let $X$ be a set, $\mathbb{F}=\mathbb{R}$ or $\mathbb{C}$, and $\mathscr{F}(X,\mathbb{F})$ the set of functions from $X$ to $\mathbb{F}$. The set $\mathscr{F}(X,\mathbb{F})$ is naturally equipped with a vector space structure over $\mathbb{F}$ by pointwise addition and scalar multiplication:

(f+h)(x)=f(x)+h(x),\qquad(\lambda\cdot f)(x)=\lambda\cdot f(x)\quad\text{for } x\in X \text{ and } \lambda\in\mathbb{F}.

A vector subspace $\mathscr{H}\subset\mathscr{F}(X,\mathbb{F})$ is said to be a reproducing kernel Hilbert space (RKHS) on $X$ if

  • $\mathscr{H}$ is endowed with a Hilbert space structure $\langle\cdot,\cdot\rangle$; our convention is that the inner product is $\mathbb{F}$-linear in the first argument;

  • for every $x\in X$, the linear evaluation functional $E_x:\mathscr{H}\rightarrow\mathbb{F}$, defined by $E_x(f)=f(x)$, is bounded.

If $\mathscr{H}$ is an RKHS on $X$, then the Riesz representation theorem shows that for each $x\in X$, there exists a unique vector $k_x\in\mathscr{H}$ such that for any $f\in\mathscr{H}$,

E_x(f)=\langle f,k_x\rangle=f(x).

The function $k_x$ is called the reproducing kernel for the point $x$, and the function $K:X\times X\rightarrow\mathbb{F}$ defined by $K(y,x)=k_x(y)$ is called the reproducing kernel for $\mathscr{H}$. One can check that $K$ is indeed a kernel function, meaning that for any $n\in\mathbb{N}$ and any $n$ distinct points $\{x_1,\cdots,x_n\}\subset X$, the matrix $(K(x_i,x_j))$ is symmetric (Hermitian) and positive semidefinite. It is well known that there is a one-to-one correspondence between RKHSs and kernel functions on $X$: by Moore's theorem [5], if $K:X\times X\rightarrow\mathbb{F}$ is a kernel function, then there exists a unique RKHS $\mathscr{H}$ on $X$ such that $K$ is the reproducing kernel of $\mathscr{H}$. We let $\mathscr{H}(K)$ denote the unique RKHS with reproducing kernel $K$, and define the feature map $\phi:X\rightarrow\mathscr{H}(K)$ by $\phi(x)=k_x$. We refer to [1, 2, 4, 6, 7, 8] for more details on RKHSs and their applications.

One of the interesting topics on the RKHS is interpolation. Let $\mathscr{H}(K)$ be an RKHS on $X$, $F=\{x_1,\cdots,x_N\}$ a finite set of distinct points in $X$, and $\{c_1,\dots,c_N\}\subset\mathbb{F}$. If the matrix $(K(x_i,x_j))$ is invertible, then there exists $f\in\mathscr{H}(K)$ such that $f(x_i)=c_i$ for all $1\leq i\leq N$. However, if $(K(x_i,x_j))$ is not invertible, such an $f$ may not exist. In this case, one is often interested in finding the best approximation in $\mathscr{H}(K)$, minimizing the least square error:

\inf_{f\in\mathscr{H}(K)}\ \sum_{i=1}^{N}|f(x_i)-c_i|^2.

The theorem below shows the existence of the optimizer and describes its structure:

Theorem 1.1 (Theorem 3.8 in [6]).

Let $\mathscr{H}(K)$ be an RKHS on $X$, $F=\{x_1,\cdots,x_N\}$ a finite set of distinct points in $X$, and $v=(c_1,\dots,c_N)^T\in\mathbb{F}^N$. Let $Q=(K(x_i,x_j))$ and $\mathscr{N}(Q)$ the null space of $Q$. Then there exists $w=(\alpha_1,\cdots,\alpha_N)^T\in\mathbb{F}^N$ with $v-Qw\in\mathscr{N}(Q)$. If we let

\hat{f}=\alpha_1 k_{x_1}+\cdots+\alpha_N k_{x_N},

then $\hat{f}$ minimizes the least square error. Moreover, among all minimizers in $\mathscr{H}(K)$, $\hat{f}$ is the unique function with minimum norm.
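As a computational remark, a vector $w$ as in Theorem 1.1 can be obtained from the Moore–Penrose pseudoinverse of $Q$: since $Q$ is Hermitian, $v-QQ^{+}v$ lies in $\mathscr{N}(Q)$, and any two admissible choices of $w$ yield the same function $\hat{f}$. A minimal numerical sketch follows; the Gaussian kernel and the data are illustrative assumptions, not part of the theorem.

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # Illustrative choice of kernel function K; any kernel function works.
    return np.exp(-np.abs(x - y) ** 2 / (2 * sigma ** 2))

xs = np.array([0.0, 0.1, 0.2, 1.0])  # the points x_1, ..., x_N (hypothetical data)
v = np.array([1.0, 0.9, 0.8, 0.0])   # the target values c_1, ..., c_N

Q = gaussian_kernel(xs[:, None], xs[None, :])  # Q = (K(x_i, x_j)), possibly singular

# w = Q^+ v satisfies Qw = (orthogonal projection of v onto range(Q)),
# so v - Qw lies in range(Q)^perp = N(Q) because Q is Hermitian.
w = np.linalg.pinv(Q, hermitian=True) @ v

def f_hat(t):
    # f_hat = alpha_1 k_{x_1} + ... + alpha_N k_{x_N}, evaluated at t.
    return gaussian_kernel(t, xs) @ w

print(f_hat(0.05))
```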

Now, let $\mathscr{P}(X)$ be the set of probability measures on $X$ and $\mu_N=\frac{1}{N}\sum_{i=1}^{N}\delta_{x_i}\in\mathscr{P}(X)$. Then the above least square error problem is equivalent to

\inf_{f\in\mathscr{H}(K)}\ \int_X|f(x)-g(x)|^2\,d\mu_N(x)=\inf_{f\in\mathscr{H}(K)}\ \frac{1}{N}\sum_{i=1}^{N}|f(x_i)-c_i|^2,

where $g$ is any given function with $g(x_i)=c_i$ for all $i=1,\dots,N$. This inspires us to replace $\mu_N$ with an arbitrary probability measure $\mu\in\mathscr{P}(X)$ and consider the probabilistic approximation problem in the RKHS.

The general formulation is as follows. Throughout this paper, we assume that $X$ is a Polish space and that all functions and measures considered are Borel measurable. Let $g:X\rightarrow\mathbb{F}$ be a given function, $\mu\in\mathscr{P}(X)$, and $c:\mathbb{F}\times\mathbb{F}\rightarrow\mathbb{R}^+$ a nonnegative cost function. We then consider the following minimization problem:

(1.1)\qquad \inf_{f\in\mathscr{H}(K)}\ \int_X c(f(x),g(x))\,d\mu(x).

Our first result concerns the case where the cost function $c$ comes from the $L^p$ norm and the feature map $\phi$ is a continuous $p$-frame for $\mathscr{H}(K)$ with respect to $\mu$. (A set of vectors $\{f_x\}_{x\in X}$ in $\mathscr{H}(K)$ is a continuous $p$-frame with respect to $\mu\in\mathscr{P}(X)$ if there exist $0<A\leq B$ such that $A\|f\|^p_{\mathscr{H}(K)}\leq\int_X|\langle f,f_x\rangle|^p\,d\mu(x)\leq B\|f\|^p_{\mathscr{H}(K)}$ for any $f\in\mathscr{H}(K)$.)

Theorem 1.2.

Let $\mathscr{H}(K)$ be an RKHS on $X$ with feature map $\phi$. Let $\mu\in\mathscr{P}(X)$ and $g\in L^p(X,\mu)$, where $1\leq p<\infty$. Assume that $\{\phi(x):x\in X\}$ is a continuous $p$-frame for $\mathscr{H}(K)$ with respect to $\mu$. Then the problem

\inf_{f\in\mathscr{H}(K)}\ \int_X|f(x)-g(x)|^p\,d\mu(x)

admits an optimizer $\hat{f}\in\mathscr{H}(K)$. Furthermore, if $p>1$, the optimizer is unique.

Note that the continuous $p$-frame condition says precisely that the $L^p$ norm is equivalent to the Hilbert space norm, in the sense of the following inequality:

C_1\|f\|_{\mathscr{H}(K)}\leq\|f\|_{L^p(X,\mu)}=\left(\int_X|\langle f,\phi(x)\rangle_{\mathscr{H}(K)}|^p\,d\mu(x)\right)^{\frac{1}{p}}\leq C_2\|f\|_{\mathscr{H}(K)}

for some $0<C_1\leq C_2<\infty$. Thus we can restate Theorem 1.2 as follows:

Corollary 1.3.

Let $\mathscr{H}(K)$ be an RKHS on $X$, $\mu\in\mathscr{P}(X)$, and $g\in L^p(X,\mu)$, where $1\leq p<\infty$. If $\mathscr{H}(K)$ is a (closed) subspace of $L^p(X,\mu)$ and the norm induced by the inner product is equivalent to the $L^p$ norm, then the problem

\inf_{f\in\mathscr{H}(K)}\ \int_X|f(x)-g(x)|^p\,d\mu(x)

admits an optimizer $\hat{f}\in\mathscr{H}(K)$. Furthermore, if $p>1$, the optimizer is unique.

In the special case where $p=2$ and $\mathscr{H}$ is a Hilbert subspace of $L^2(X,\mu)$, this unique closest vector is classically given by the orthogonal projection of $g$ onto $\mathscr{H}$. Although $L^p(X,\mu)$ is not a Hilbert space for general $p\neq 2$, our corollary (under some assumptions) provides such a unique optimizer in the probabilistic approximation sense, which can be viewed as a "projection" onto the given subspace.
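For instance, when $p=2$ and the subspace is spanned by finitely many functions in $L^2(X,\mu)$, the unique optimizer is obtained by solving the normal equations for the Gram matrix of the basis. A hedged sketch follows, in which the basis, the measure $\mu$ (uniform on $[0,1]$), the target $g$, and the Monte Carlo estimation of the inner products are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy finite-dimensional subspace of L^2(mu): span{1, x, x^2}, with mu the
# uniform measure on [0, 1]; all choices here are illustrative assumptions.
basis = [lambda x: np.ones_like(x), lambda x: x, lambda x: x**2]
g = np.sin  # a target function g in L^2(mu)

# Monte Carlo approximation of the L^2(mu) inner products.
xs = rng.uniform(0.0, 1.0, size=100_000)
E = np.stack([e(xs) for e in basis])  # shape (3, N)
G = E @ E.T / xs.size                 # Gram matrix G_ij = <e_i, e_j>_{L^2(mu)}
b = E @ g(xs) / xs.size               # b_i = <g, e_i>_{L^2(mu)}

# Normal equations: the coefficients of the orthogonal projection of g,
# i.e., the unique minimizer of ||f - g||_{L^2(mu)} over the subspace.
c = np.linalg.solve(G, b)
print(c)  # coefficients of f_hat = sum_i c_i e_i
```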

Our next result adds an extra regularization term to the minimization problem. In statistical regression and machine learning, a regularization term is often included to perform variable selection, enhance prediction accuracy, and prevent overfitting; examples of this practice are ridge regression [3] and Lasso regression [10]. In this paper, we consider the following minimization problem with regularization:

(1.2)\qquad \inf_{f\in\mathscr{H}(K)}\ \int_X c(f(x),g(x))\,d\mu(x)+\|f\|^p_{\mathscr{H}(K)}.

We show the existence and uniqueness of the optimizer in the following theorem:

Theorem 1.4.

Let $\mathscr{H}(K)$ be an RKHS on $X$. Let $\mu\in\mathscr{P}(X)$ and $0<p<\infty$. Let a function $g:X\to\mathbb{F}$ and a cost function $c:\mathbb{F}\times\mathbb{F}\rightarrow\mathbb{R}^+$ be given such that

  • $c(0,g(\cdot))\in L^1(X,\mu)$;

  • for any given $z\in\mathbb{F}$, $c(\cdot,z)$ is lower semicontinuous.

Then Problem (1.2) admits an optimizer $\hat{f}\in\mathscr{H}(K)$. Furthermore, when $p>1$ and $c(\cdot,z)$ is convex for any given $z\in\mathbb{F}$, the optimizer is unique.
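To make the objects in Theorem 1.4 concrete: for $f=\sum_i\alpha_i k_{x_i}$ in a kernel span, one has $\|f\|^2_{\mathscr{H}(K)}=\alpha^*Q\alpha$ with $Q=(K(x_i,x_j))$, and the data term in (1.2) can be estimated by sampling from $\mu$. The sketch below evaluates the regularized objective under illustrative assumptions (Gaussian kernel, uniform $\mu$ on $[0,1]$, squared cost); it is not the paper's method, only a numerical reading of the functional.

```python
import numpy as np

rng = np.random.default_rng(1)

def K(x, y, sigma=0.5):
    # Illustrative Gaussian kernel; any kernel function works.
    return np.exp(-(x - y)**2 / (2 * sigma**2))

centers = np.linspace(0, 1, 8)             # expansion points for f (assumption)
Q = K(centers[:, None], centers[None, :])  # kernel Gram matrix
g = lambda x: np.abs(x - 0.5)              # a given target function (assumption)
cost = lambda y, z: (y - z)**2             # convex and lower semicontinuous

def objective(alpha, p=2, n_samples=50_000):
    xs = rng.uniform(0, 1, n_samples)      # samples from mu (uniform here)
    f_vals = K(xs[:, None], centers[None, :]) @ alpha
    data_term = cost(f_vals, g(xs)).mean() # Monte Carlo for the integral in (1.2)
    rkhs_norm = np.sqrt(alpha @ Q @ alpha) # ||f||_{H(K)} for f in the span
    return data_term + rkhs_norm**p

print(objective(np.zeros(8)))              # approximately int_X c(0, g) d(mu)
```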

The theorems in this section are proved in the last part of this paper. In the following sections, we establish representer-type theorems describing the optimizer, mainly the one from Theorem 1.4.

2. Probabilistic Representer Theorem

As witnessed by the work of Wahba [11] and later Schölkopf, Herbrich, and Smola [9], the celebrated representer theorem (Theorem 2.1 below) states that the optimizer in an RKHS minimizing a regularized loss function lies in the linear span of the kernel functions at the given points.

Theorem 2.1 (Theorems 8.7 and 8.8 in [6]).

Let $\mathscr{H}(K)$ be an RKHS on $X$, $F=\{x_1,\cdots,x_N\}$ a finite set of distinct points in $X$, and $\{c_1,\dots,c_N\}\subset\mathbb{F}$. Let $L$ be a convex loss function and consider the minimization problem

\inf_{f\in\mathscr{H}(K)}\ L(f(x_1),\cdots,f(x_N))+\|f\|^2_{\mathscr{H}(K)}.

Then the optimizer to this problem exists and is unique. Furthermore, the optimizer lies in the linear span of the functions $k_{x_1},\cdots,k_{x_N}$.
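For example, with the squared loss $L(y_1,\dots,y_N)=\sum_i|y_i-c_i|^2$, writing $f=\sum_i\alpha_i k_{x_i}$ turns the problem into minimizing $\|Q\alpha-v\|^2+\alpha^*Q\alpha$, whose minimizer solves $(Q+I)\alpha=v$; this is kernel ridge regression with unit regularization weight. A minimal sketch, with kernel and data as illustrative assumptions:

```python
import numpy as np

def K(x, y, sigma=1.0):
    # Illustrative Gaussian kernel.
    return np.exp(-(x - y)**2 / (2 * sigma**2))

xs = np.array([0.0, 0.3, 0.6, 1.0])  # the given points x_1, ..., x_N
v = np.array([0.0, 0.5, 0.4, 1.0])   # the given values c_1, ..., c_N

Q = K(xs[:, None], xs[None, :])

# With the squared loss, the objective in the coefficients a of
# f = sum_i a_i k_{x_i} is ||Q a - v||^2 + a^T Q a, minimized by the
# solution of (Q + I) a = v; Q + I is positive definite, so this is well posed.
alpha = np.linalg.solve(Q + np.eye(len(xs)), v)

f_hat = lambda t: K(t, xs) @ alpha   # lies in span{k_{x_1}, ..., k_{x_N}}
print(f_hat(0.45))
```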

The representer theorem is useful in practice since it turns the minimization into a finite-dimensional optimization problem. It is natural to ask whether there is a corresponding version of the representer theorem in the probabilistic approximation setting introduced in the previous section. We confirm this speculation with the following representer theorem in measure representation form.

Theorem 2.2 (Probabilistic Representer).

Let $\hat{f}$ be the unique optimizer in Theorem 1.4. Assume the probability measure $\mu$ satisfies that for any $\mathbb{F}$-measure $\xi$ on $X$ with $\operatorname{supp}\xi\subset\operatorname{supp}\mu$, the following holds:

(2.1)\qquad \int_X\|\phi(x)\|_{\mathscr{H}(K)}\,d|\xi|(x)<\infty.

Then there exists a sequence of $\mathbb{F}$-measures $\{\nu_n\}_{n=1}^{\infty}$ on $X$ with $\operatorname{supp}(\nu_n)\subset\operatorname{supp}(\mu)$ such that

\hat{f}=\lim_{n\rightarrow\infty}\int_X\phi(x)\,d\nu_n(x).

Furthermore, if $\mu$ is finitely supported, or $\mathscr{H}(K)$ is finite-dimensional, then there exists an $\mathbb{F}$-measure $\nu$ on $X$ with $\operatorname{supp}(\nu)\subset\operatorname{supp}(\mu)$ such that

\hat{f}=\int_X\phi(x)\,d\nu(x).

The finiteness condition (2.1) in Theorem 2.2 holds when $\mu$ is finitely supported, or when $X$ is compact and $K$ is continuous. Condition (2.1) also holds on the Hardy space $H^2(D)$ when $\operatorname{supp}\mu$ is a compact set in the open unit disk $D$, as we will see in Example 3.2.

Proof.

Define

\mathscr{A}:=\left\{\int_X\phi(x)\,d\xi(x):\xi\ \text{is an $\mathbb{F}$-measure on $X$},\ \operatorname{supp}(\xi)\subset\operatorname{supp}(\mu)\right\},

and let $\Omega$ be the closure of $\mathscr{A}$ in $\mathscr{H}(K)$. The integral above is defined via duality: for any $f\in\mathscr{H}(K)$,

\left|\left\langle\int_X\phi(x)\,d\xi(x),f\right\rangle_{\mathscr{H}(K)}\right|=\left|\int_X\langle\phi(x),f\rangle_{\mathscr{H}(K)}\,d\xi(x)\right|\leq\|f\|_{\mathscr{H}(K)}\int_X\|\phi(x)\|_{\mathscr{H}(K)}\,d|\xi|(x),

where $|\xi|$ is the variation measure of $\xi$. By Assumption (2.1), $\int_X\phi(x)\,d\xi(x)$ thus defines a bounded linear functional on $\mathscr{H}(K)$.

It is easy to see that $\Omega$ is a closed subspace of $\mathscr{H}(K)$. Let $P_\Omega$ be the orthogonal projection of $\mathscr{H}(K)$ onto $\Omega$. Note that for any $x\in\operatorname{supp}(\mu)$, we have $\phi(x)\in\Omega$ by taking $\xi$ to be the delta measure at $x$. Therefore, for any $x\in\operatorname{supp}(\mu)$,

(P_\Omega\hat{f})(x)=\langle P_\Omega\hat{f},\phi(x)\rangle_{\mathscr{H}(K)}=\langle\hat{f},P_\Omega\phi(x)\rangle_{\mathscr{H}(K)}=\langle\hat{f},\phi(x)\rangle_{\mathscr{H}(K)}=\hat{f}(x).

Therefore,

\int_X c(P_\Omega\hat{f}(x),g(x))\,d\mu(x)+\|P_\Omega\hat{f}\|^p_{\mathscr{H}(K)}\leq\int_X c(\hat{f}(x),g(x))\,d\mu(x)+\|\hat{f}\|^p_{\mathscr{H}(K)}.

Hence $P_\Omega\hat{f}$ is also an optimizer. Since the optimizer is unique, we conclude $P_\Omega\hat{f}=\hat{f}$, and thus $\hat{f}\in\Omega$. Therefore, there exists a sequence of $\mathbb{F}$-measures $\{\nu_n\}_{n=1}^{\infty}$ on $X$ with $\operatorname{supp}(\nu_n)\subset\operatorname{supp}(\mu)$ for each $n$, such that

\hat{f}=\lim_{n\rightarrow\infty}\int_X\phi(x)\,d\nu_n(x).

If $\mu$ is finitely supported or $\mathscr{H}(K)$ is finite-dimensional, then the set $\mathscr{A}$ is automatically closed and $\Omega=\mathscr{A}$. Thus there exists an $\mathbb{F}$-measure $\nu$ on $X$ with $\operatorname{supp}(\nu)\subset\operatorname{supp}(\mu)$ such that

f^=Xϕ(x)𝑑ν(x).\hat{f}=\int_{X}\phi(x)d\nu(x).\qed

We can furthermore restrict the $\mathbb{F}$-measures $\xi$ on $X$ to be finitely supported, in which case the set $\mathscr{A}$ becomes the linear span of $\{k_x\}_{x\in\operatorname{supp}(\mu)}$. This leads to the following corollary:

Corollary 2.3 (Discrete Probabilistic Representer).

Let $\hat{f}$ be the unique optimizer in Theorem 1.4. Then $\hat{f}$ is in the closure of the linear span of $\{k_x\}_{x\in\operatorname{supp}(\mu)}$. Furthermore, if $\mu$ is finitely supported, or $\mathscr{H}(K)$ is finite-dimensional, then $\hat{f}$ is in the linear span of $\{k_x\}_{x\in\operatorname{supp}(\mu)}$.

Proof.

Here, we use the following definition of $\mathscr{A}$:

\mathscr{A}:=\operatorname{span}\{k_x\}_{x\in\operatorname{supp}(\mu)}=\left\{\sum_{i=1}^{m}w_i k_{x_i}:m\in\mathbb{N}^+,\ \{w_i\}_{i=1}^{m}\subset\mathbb{F},\ \{x_i\}_{i=1}^{m}\subset\operatorname{supp}(\mu)\right\},

and let $\Omega$ be the closure of $\mathscr{A}$ in $\mathscr{H}(K)$. Then $\Omega$ is a closed linear subspace of $\mathscr{H}(K)$. Using the same arguments as in Theorem 2.2, we conclude $\hat{f}\in\Omega$. If $\mu$ is finitely supported or $\mathscr{H}(K)$ is finite-dimensional, the set $\mathscr{A}$ is automatically closed, and thus $\hat{f}\in\Omega=\mathscr{A}$. ∎

Note that when $\mu$ is finitely supported, Corollary 2.3 recovers the celebrated representer theorem (Theorem 2.1). On the other hand, when $\mathscr{H}(K)$ is finite-dimensional, the unique optimizer again lies in the linear span of $\{k_x\}_{x\in\operatorname{supp}(\mu)}$, regardless of the cardinality of the support of $\mu$. Both cases indicate that the probabilistic approximation problem turns into a measure quantization problem for the measure $\mu$ with respect to the loss function (1.2), as the sketch below illustrates.
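As a concrete instance of the finitely supported case, take $c(y,z)=|y-z|^2$ and $p=2$ with $\mu=\sum_i\mu_i\delta_{x_i}$: writing $f=\sum_j\alpha_j k_{x_j}$, Problem (1.2) reduces to minimizing $(Q\alpha-v)^*W(Q\alpha-v)+\alpha^*Q\alpha$ with $W=\operatorname{diag}(\mu_i)$, solved by $(WQ+I)\alpha=Wv$, and the representing measure is $\nu=\sum_j\alpha_j\delta_{x_j}$. A sketch under illustrative choices of kernel and data:

```python
import numpy as np

def K(x, y, sigma=1.0):
    # Illustrative Gaussian kernel.
    return np.exp(-(x - y)**2 / (2 * sigma**2))

xs = np.array([0.1, 0.4, 0.7, 0.9])  # supp(mu) (hypothetical points)
mu = np.array([0.4, 0.3, 0.2, 0.1])  # probability weights of mu
v = np.array([1.0, 0.2, 0.5, 0.0])   # g evaluated on supp(mu)

Q = K(xs[:, None], xs[None, :])
W = np.diag(mu)

# Objective in the coefficients a of f = sum_j a_j k_{x_j}:
#   (Qa - v)^T W (Qa - v) + a^T Q a,
# and a = (WQ + I)^{-1} W v zeroes its gradient 2Q[(WQ + I)a - Wv].
alpha = np.linalg.solve(W @ Q + np.eye(len(xs)), W @ v)

f_hat = lambda t: K(t, xs) @ alpha   # represented by nu = sum_j alpha_j delta_{x_j}
print(f_hat(0.5))
```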

Remark 2.4.

When the cost function $c$ and the given function $g$ satisfy the assumptions in Theorem 1.4, the existence and uniqueness still hold for the following more general problem:

(2.2)\qquad \inf_{f\in\mathscr{H}(K)}\ \int_X c(f(x),g(x))\,d\mu(x)+h(\|f\|_{\mathscr{H}(K)}),

where $h:\mathbb{R}^+\rightarrow\mathbb{R}^+$ can be any strictly convex function. Furthermore, when $h$ is increasing and strictly convex, the probabilistic representer theorems (Theorem 2.2 and Corollary 2.3) also hold.

3. Discussions on the Representer Theorem

The preferable outcome in the probabilistic representer theorem (Theorem 2.2) is that the minimizer $\hat{f}$ can be represented directly by an $\mathbb{F}$-measure $\nu$, instead of an approximating sequence $\{\nu_n\}$. We conjecture that this form holds under Assumption (2.1) alone:

Conjecture 3.1.

Let $\hat{f}$ be the unique optimizer in Theorem 1.4. Assume the probability measure $\mu$ satisfies that for any $\mathbb{F}$-measure $\xi$ on $X$ with $\operatorname{supp}\xi\subset\operatorname{supp}\mu$, the following holds:

\int_X\|\phi(x)\|_{\mathscr{H}(K)}\,d|\xi|(x)<\infty.

Then there exists an $\mathbb{F}$-measure $\nu$ on $X$ with $\operatorname{supp}(\nu)\subset\operatorname{supp}(\mu)$ such that

\hat{f}=\int_X\phi(x)\,d\nu(x).

To support this conjecture and to illustrate Theorem 2.2, we provide the following example:

Example 3.2 (Measure Representation).

Consider the Hardy space $H^2(D)$ on the unit disk $D$. Fix $0<r<1$ and let $D_r$ be the open disk in $\mathbb{C}$ centered at the origin of radius $r$. Let $\mu$ be the uniform probability measure on $D_r$. First we check Assumption (2.1) in this setting: for any $\mathbb{F}$-measure $\xi$ with $\operatorname{supp}\xi\subset\operatorname{supp}\mu$, we have

\int_D\|\phi(w)\|_{H^2(D)}\,d|\xi|(w)=\int_{D_r}\frac{1}{\sqrt{1-|w|^2}}\,d|\xi|(w)\leq\frac{1}{\sqrt{1-r^2}}\|\xi\|<\infty,

where $\|\xi\|$ is the total variation of $\xi$. Now let $g\in B^2(D)$, the Bergman space consisting of Lebesgue-$L^2$ holomorphic functions on $D$. Consider the minimization problem:

\inf_{f\in H^2(D)}\ \frac{1}{|D_r|}\int_{D_r}|f-g|^2\,d\operatorname{Area}+\|f\|^2_{H^2(D)}.

If we use the power series expressions $\hat{f}=\sum_n a_n z^n$ and $g=\sum_n b_n z^n$, then varying the coefficients $a_n$ yields an Euler–Lagrange equation for the minimizer $\hat{f}$:

\left(1+\frac{n+1}{r^{2n}}\right)a_n=b_n.

Thus $\hat{f}$ is determined by $g$ via this formula, and our goal is to find a $\mathbb{C}$-measure representation of $\hat{f}$, as in Conjecture 3.1.

Our strategy is to first find the measure representation for the basis vectors $\{z^k\}_{k\in\mathbb{N}_0}$ of $H^2(D)$. Computation gives

z^k=\int_{D_r}\frac{1}{1-\bar{w}z}\,\frac{k+1}{\pi}r^{-2k-2}w^k\,d\operatorname{Area}(w).

That is, $z^k$ can be represented by the $\mathbb{C}$-measure $\xi_k$:

d\xi_k(w):=\frac{k+1}{\pi}r^{-2k-2}w^k\,d\operatorname{Area}\llcorner D_r(w).

From $\hat{f}=\sum_n a_n z^n$, we expect $\hat{f}$ to be represented by the measure $\nu:=\sum_n a_n\xi_n$. This is indeed a well-defined $\mathbb{C}$-measure, since

\|\nu\|\leq\sum_n|a_n|\|\xi_n\|\leq\sum_n\frac{r^{2n}}{r^{2n}+n+1}|b_n|\frac{2n+2}{n+2}r^{-n}\leq\sqrt{\sum_n\frac{1}{n+1}|b_n|^2}\sqrt{\sum_n\frac{4}{n+1}r^{2n}}<\infty,

where the first square root is finite since $g\in B^2(D)$.

Now we show precisely that $\hat{f}$ is represented by $\nu$. Let $\nu_k=\sum_{n=0}^{k}a_n\xi_n$ be the partial sums of $\nu$. Then $\hat{f}=\lim_{k\rightarrow\infty}\int_D\phi\,d\nu_k$ strongly in $H^2(D)$, since $\{z^k\}_{k\in\mathbb{N}_0}$ is an orthonormal basis. On the other hand, $\int_D\phi\,d\nu_k$ converges weakly to $\int_D\phi\,d\nu$ in $H^2(D)$, since all functions in $H^2(D)$ are bounded on $D_r$. Thus, by the uniqueness of the weak limit, we have $\hat{f}=\int_D\phi\,d\nu$.
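The moment identity for $z^k$ above can also be checked numerically. The following sketch approximates $\int_{D_r}(1-\bar{w}z)^{-1}\,d\xi_k(w)$ by a Riemann sum in polar coordinates and compares it with $z^k$; the grid sizes and the test values of $r$, $k$, $z$ are arbitrary illustrative choices, not part of the proof.

```python
import numpy as np

# Numerical check of z^k = int_{D_r} (1 - conj(w) z)^{-1} d(xi_k)(w)
# via a midpoint Riemann sum in polar coordinates.
r, k, z = 0.8, 3, 0.3 + 0.4j

rho = np.linspace(0, r, 400, endpoint=False) + r / 800    # midpoints in rho
theta = np.linspace(0, 2 * np.pi, 400, endpoint=False)
R, T = np.meshgrid(rho, theta)
w = R * np.exp(1j * T)
dA = R * (r / 400) * (2 * np.pi / 400)                    # rho drho dtheta

density = (k + 1) / np.pi * r**(-2 * k - 2) * w**k        # d(xi_k)/dArea on D_r
integral = np.sum(density * dA / (1 - np.conj(w) * z))

print(integral, z**k)  # the two values agree to quadrature accuracy
```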

4. Proof of Theorem 1.2 and Theorem 1.4

Proof of Theorem 1.2.

Since $g\in L^p(X,\mu)$ and $f=0\in\mathscr{H}(K)$ is admissible, we have

I:=\inf_{f\in\mathscr{H}(K)}\ \int_X|f(x)-g(x)|^p\,d\mu(x)\leq\int_X|g(x)|^p\,d\mu(x)<+\infty,

i.e., the value $I$ is finite. Let $\{f_i\}_{i=1}^{\infty}$ be a minimizing sequence. Then

\int_X|f_i(x)-g(x)|^p\,d\mu(x)\rightarrow I.

Thus there exists $0<M<\infty$ such that $\|f_i-g\|_{L^p(X,\mu)}\leq M$ for each $i$. Then

\|f_i\|_{L^p(X,\mu)}\leq\|f_i-g\|_{L^p(X,\mu)}+\|g\|_{L^p(X,\mu)}\leq M+\|g\|_{L^p(X,\mu)}\quad\text{for any } i.

On the other hand, since $\{\phi(x):x\in X\}$ is a continuous $p$-frame for $\mathscr{H}(K)$ with respect to $\mu$, for some lower frame bound $A>0$ we have

A\|f_i\|^p_{\mathscr{H}(K)}\leq\int_X|f_i(x)|^p\,d\mu(x)=\int_X|\langle f_i,\phi(x)\rangle_{\mathscr{H}(K)}|^p\,d\mu(x)\quad\text{for any } i.

Combining these results, we get

\|f_i\|^p_{\mathscr{H}(K)}\leq\frac{1}{A}\int_X|f_i(x)|^p\,d\mu(x)=\frac{\|f_i\|^p_{L^p(X,\mu)}}{A}\leq\frac{(M+\|g\|_{L^p(X,\mu)})^p}{A}<+\infty.

Thus $\{f_i\}_{i=1}^{\infty}$ is a bounded sequence in $\mathscr{H}(K)$, and hence has a weakly convergent subsequence $\{f_{i_k}\}_{k=1}^{\infty}$: there exists $\hat{f}\in\mathscr{H}(K)$ such that for any $h\in\mathscr{H}(K)$, $\langle f_{i_k},h\rangle\rightarrow\langle\hat{f},h\rangle$ as $k\rightarrow\infty$. Taking $h=\phi(x)$, we get $f_{i_k}(x)\rightarrow\hat{f}(x)$ for every $x\in X$. Now, by Fatou's lemma,

\int_X\liminf_{k\rightarrow\infty}|f_{i_k}(x)-g(x)|^p\,d\mu(x)\leq\liminf_{k\rightarrow\infty}\int_X|f_{i_k}(x)-g(x)|^p\,d\mu(x).

Using the pointwise convergence and the fact that $\{f_i\}_{i=1}^{\infty}$ is minimizing, we obtain

\int_X|\hat{f}(x)-g(x)|^p\,d\mu(x)\leq\inf_{f\in\mathscr{H}(K)}\int_X|f(x)-g(x)|^p\,d\mu(x).

Therefore $\hat{f}$ is an optimizer.

Next, we show that the optimizer is unique when $p>1$. Let $\hat{f_1}$ and $\hat{f_2}$ be optimizers. Then, by Minkowski's inequality,

\left\|\frac{\hat{f_1}+\hat{f_2}}{2}-g\right\|_{L^p(X,\mu)}\leq\left\|\frac{\hat{f_1}-g}{2}\right\|_{L^p(X,\mu)}+\left\|\frac{\hat{f_2}-g}{2}\right\|_{L^p(X,\mu)}=I^{\frac{1}{p}}.

Since $I$ is the infimum and $\frac{\hat{f_1}+\hat{f_2}}{2}\in\mathscr{H}(K)$, equality must hold, and we are in one of the following two cases:

  1. $\hat{f_2}-g=0$ $\mu$-a.e. In this case, we have $I=0$ and $\hat{f_1}=g=\hat{f_2}$ $\mu$-a.e.

  2. $\hat{f_1}-g=\lambda(\hat{f_2}-g)$ $\mu$-a.e. for some number $\lambda\geq 0$. It easily follows that $\lambda=1$ and hence $\hat{f_1}=\hat{f_2}$ $\mu$-a.e.
In either case, $\hat{f_1}=\hat{f_2}$ $\mu$-a.e., and hence $\hat{f_1}=\hat{f_2}$ in $\mathscr{H}(K)$ by the lower bound of the continuous frame condition. ∎

Proof of Theorem 1.4.

Since $c(0,g(\cdot))\in L^1(X,\mu)$ and $f=0\in\mathscr{H}(K)$ is admissible, we have

I_g:=\inf_{f\in\mathscr{H}(K)}\ \int_X c(f(x),g(x))\,d\mu(x)+\|f\|^p_{\mathscr{H}(K)}\leq\int_X c(0,g(x))\,d\mu(x)<+\infty.

Hence the value $I_g$ is finite. Let $\{f_i\}_{i=1}^{\infty}$ be a minimizing sequence. Then

\int_X c(f_i(x),g(x))\,d\mu(x)+\|f_i\|^p_{\mathscr{H}(K)}\rightarrow I_g<+\infty.

In particular, there exists $0<M<\infty$ such that for each $i$,

\int_X c(f_i(x),g(x))\,d\mu(x)+\|f_i\|^p_{\mathscr{H}(K)}\leq M.

Thus $\{f_i\}_{i=1}^{\infty}$ is a bounded sequence in $\mathscr{H}(K)$, and hence has a weakly convergent subsequence $\{f_{i_k}\}_{k=1}^{\infty}$: there exists $\hat{f}\in\mathscr{H}(K)$ such that for any $h\in\mathscr{H}(K)$, $\langle f_{i_k},h\rangle\rightarrow\langle\hat{f},h\rangle$ as $k\rightarrow\infty$. Taking $h=\phi(x)$, we get $f_{i_k}(x)\rightarrow\hat{f}(x)$ for every $x\in X$. By the lower semicontinuity of $c(\cdot,z)$ and Fatou's lemma, we get

\int_X c(\hat{f}(x),g(x))\,d\mu(x)\leq\int_X\liminf_{k\rightarrow\infty}c(f_{i_k}(x),g(x))\,d\mu(x)\leq\liminf_{k\rightarrow\infty}\int_X c(f_{i_k}(x),g(x))\,d\mu(x).

On the other hand, since $f_{i_k}$ converges to $\hat{f}$ weakly in $\mathscr{H}(K)$, we have $\|\hat{f}\|^p_{\mathscr{H}(K)}\leq\liminf_{k\rightarrow\infty}\|f_{i_k}\|^p_{\mathscr{H}(K)}$. Furthermore, by the superadditivity of the limit inferior, we get

\liminf_{k\rightarrow\infty}\int_X c(f_{i_k}(x),g(x))\,d\mu(x)+\liminf_{k\rightarrow\infty}\|f_{i_k}\|^p_{\mathscr{H}(K)}\leq\liminf_{k\rightarrow\infty}\left(\int_X c(f_{i_k}(x),g(x))\,d\mu(x)+\|f_{i_k}\|^p_{\mathscr{H}(K)}\right).

Combining these results and the fact that $\{f_i\}_{i=1}^{\infty}$ is minimizing, we obtain

\int_X c(\hat{f}(x),g(x))\,d\mu(x)+\|\hat{f}\|^p_{\mathscr{H}(K)}\leq I_g.

Therefore $\hat{f}$ is an optimizer.

Next, we show uniqueness when $p>1$ and $c(\cdot,z)$ is convex for any fixed $z\in\mathbb{F}$. Let $\hat{f_1}$ and $\hat{f_2}$ be optimizers attaining $I_g$. Since $\frac{\hat{f_1}+\hat{f_2}}{2}\in\mathscr{H}(K)$, we have

\int_X c\left(\frac{\hat{f_1}(x)+\hat{f_2}(x)}{2},g(x)\right)d\mu(x)+\left\|\frac{\hat{f_1}+\hat{f_2}}{2}\right\|^p_{\mathscr{H}(K)}\geq I_g=\frac{\|\hat{f_1}\|^p_{\mathscr{H}(K)}}{2}+\frac{\|\hat{f_2}\|^p_{\mathscr{H}(K)}}{2}+\frac{1}{2}\int_X c(\hat{f_1}(x),g(x))\,d\mu(x)+\frac{1}{2}\int_X c(\hat{f_2}(x),g(x))\,d\mu(x).

Since $c(\cdot,z)$ is convex for any given $z$, it follows that

\left\|\frac{\hat{f_1}+\hat{f_2}}{2}\right\|^p_{\mathscr{H}(K)}\geq\frac{\|\hat{f_1}\|^p_{\mathscr{H}(K)}}{2}+\frac{\|\hat{f_2}\|^p_{\mathscr{H}(K)}}{2}.

On the other hand, by the triangle inequality and the (strict) convexity of $|x|^p$ for $p>1$, we get

\left\|\frac{\hat{f_1}+\hat{f_2}}{2}\right\|^p_{\mathscr{H}(K)}\leq\left(\frac{\|\hat{f_1}\|_{\mathscr{H}(K)}}{2}+\frac{\|\hat{f_2}\|_{\mathscr{H}(K)}}{2}\right)^p\leq\frac{\|\hat{f_1}\|^p_{\mathscr{H}(K)}}{2}+\frac{\|\hat{f_2}\|^p_{\mathscr{H}(K)}}{2}.

Hence the inequalities above must be equalities, and we infer $\hat{f_1}=c\hat{f_2}$ for some $c\geq 0$ as well as $\|\hat{f_1}\|_{\mathscr{H}(K)}=\|\hat{f_2}\|_{\mathscr{H}(K)}$ (the case $\hat{f_1}=0$ forces $\hat{f_2}=0$, and vice versa). Therefore $c=1$ and $\hat{f_1}=\hat{f_2}$. ∎

Acknowledgement

The authors would like to express gratitude to Qiyu Sun for valuable discussions.

References

  • [1] Nachman Aronszajn. Theory of reproducing kernels. Transactions of the American Mathematical Society, 68(3):337–404, 1950.
  • [2] Alain Berlinet and Christine Thomas-Agnan. Reproducing Kernel Hilbert Spaces in Probability and Statistics. Springer Science & Business Media, 2011.
  • [3] Arthur E. Hoerl and Robert W. Kennard. Ridge regression: biased estimation for nonorthogonal problems. Technometrics, 12(1):55–67, 1970.
  • [4] Jonathan H. Manton, Pierre-Olivier Amblard, et al. A primer on reproducing kernel Hilbert spaces. Foundations and Trends in Signal Processing, 8(1–2):1–126, 2015.
  • [5] E. H. Moore. General Analysis, Part 2: The Fundamental Notions of General Analysis. Edited by R. W. Barnard. Memoirs of the American Philosophical Society, 1, 2013.
  • [6] Vern I. Paulsen and Mrinal Raghupathi. An Introduction to the Theory of Reproducing Kernel Hilbert Spaces, volume 152. Cambridge University Press, 2016.
  • [7] Sergei Pereverzyev. An Introduction to Artificial Intelligence Based on Reproducing Kernel Hilbert Spaces. Springer Nature, 2022.
  • [8] Saburou Saitoh, Yoshihiro Sawano, et al. Theory of Reproducing Kernels and Applications. Springer, 2016.
  • [9] Bernhard Schölkopf, Ralf Herbrich, and Alex J. Smola. A generalized representer theorem. In International Conference on Computational Learning Theory, pages 416–426. Springer, 2001.
  • [10] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58(1):267–288, 1996.
  • [11] Grace Wahba. Spline Models for Observational Data. SIAM, 1990.