New duality in choices of feature spaces via kernel analysis

Palle E.T. Jorgensen, Department of Mathematics, The University of Iowa, Iowa City, IA 52242-1419, U.S.A. palle-jorgensen@uiowa.edu  and  James F. Tian, Mathematical Reviews, 416 4th Street Ann Arbor, MI 48103-4816, U.S.A. james.ftian@gmail.com
Abstract.

We present a systematic study of the family of positive definite (p.d.) kernels with the use of their associated feature maps and feature spaces. For a fixed set $X$, generalizing Loewner, we make precise the corresponding partially ordered set $Pos\left(X\right)$ of all p.d. kernels on $X$, and we study its global properties. This new analysis includes both results dealing with applications and concrete examples, including such general notions for $Pos\left(X\right)$ as the structure of its partial order, its products, sums, and limits, as well as their Hilbert space-theoretic counterparts. For this purpose, we introduce a new duality for feature spaces, feature selections, and feature mappings. For our analysis, we further introduce a general notion of dual pairs of p.d. kernels. Three special classes of kernels are studied in detail: (a) the case when the reproducing kernel Hilbert spaces (RKHSs) may be chosen as Hilbert spaces of analytic functions, (b) the case when they are realized in spaces of Schwartz distributions, and (c) the case when they arise as fractal limits. We further prove inverse theorems in which we derive results for the analysis of $Pos\left(X\right)$ from the operator theory of specified counterpart-feature spaces. We present constructions of new p.d. kernels in two ways: (i) as limits of monotone families in $Pos\left(X\right)$, and (ii) as p.d. kernels which model fractal limits, i.e., are invariant with respect to certain iterated function systems (IFS)-transformations.

Key words and phrases:
Positive-definite kernel, feature space, feature selection, operator theory, reproducing kernel Hilbert space, Schwartz distributions, embedding problem, factorization, geometry, optimization, Principal Component Analysis, covariance kernels, kernel optimization, Gaussian process.
2020 Mathematics Subject Classification:
Primary 46E22. Secondary 47B32, 41A65, 42A82, 42C15, 60G15, 68T07.

1. Introduction

The purpose of this paper is to introduce a new duality for the study of feature spaces, feature selections, and feature mappings, which arise in diverse applications of kernel analysis of non-linear problems. Here, our use of the notion of “feature space” is in the sense of data science: it refers to the collections of features used to characterize the data at hand. By “feature selection,” we mean one or more techniques from machine learning, typically involving the choice of subsets of relevant features from the original set to enhance model performance. The term “feature mappings” refers to a technique in data analysis and machine learning for transforming input data from a lower-dimensional space to a higher-dimensional space using kernels, enabling easier analysis or classification. Choices of feature mapping involve constructions and optimization algorithms, which lead to the selection of specific functions. These mappings serve to transform the original data into a new set of features (feature spaces) that better capture the significant patterns in the data.

We consider families of positive definite (p.d.) kernels, defined on a product set $X\times X$, where $X$ is merely a set with no extra a priori structure. The positivity condition for $K$ (p.d.), Definition 2.1, was first studied by Aronszajn [Aro50] and his contemporaries (see also [PR16, AMP92, ZZ23, SBP23]).

Pairs $\left(X,K\right)$ arise in various contexts, including optimization, principal component analysis (PCA), partial differential equations (PDEs), and statistical inference. In stochastic models, the p.d. kernel often serves as a covariance kernel of a Gaussian field. We emphasize that the set $X$ does not lend itself to direct analysis; in particular, it does not come equipped with any linear structure. However, measurements performed on $X$ often lead to p.d. kernels $K$. Moreover, $K$ then allows one to represent data from $X$ in a linear space, often referred to as a feature space.

We say that a pair $\left(\phi,\mathscr{H}\right)$ represents a feature map and a feature space if $\phi$ is a function from $X$ into the Hilbert space $\mathscr{H}$ such that $K$ is recovered from the inner product in $\mathscr{H}$ via $\phi$; see Proposition 3.3 below. There is a vast variety of choices of feature selections in the form $\left(\phi,\mathscr{H}\right)$, and Aronszajn’s reproducing kernel Hilbert space (RKHS), denoted as $\mathscr{H}_{K}$, is only one possibility.

In this paper, we introduce a new duality approach to the study of choices of feature selections, and apply it to particular p.d. kernels $\left(X,K\right)$ arising in both pure and applied mathematics. For related papers on feature selection, we refer to [JT23b, JST23, JT23a, AJ22, JT22, AJ21] as well as the references cited therein.

Organization. Our duality tools are outlined below and presented in detail in Section 3.1, especially Propositions 3.5 and 3.7. Section 4 deals with the need for choices of “bigger” spaces for implementation, which here includes choices of Hilbert spaces of Schwartz distributions, i.e., generalized functions. For optimization questions arising in practice, it is important to have a useful ordering of families of kernels, as well as monotone limit theorems for kernels; these two questions are addressed systematically in Section 6. Applications of our kernel duality principles and new transforms are addressed in Section 7.

Applications. While our focus is primarily theoretical, kernel theory and optimization have had significant impact on practical applications, particularly in machine learning algorithms and big data analysis [NSW11, PDC+14, YTDMM11, Jon09, ZXZ09, MDL19, HSZ+19]. Beyond these areas, kernel methods have found relevance in fields such as statistical inference, quantum dynamics, perturbation theory, and operator algebras, with applications ranging from multiplicative change-of-measure algorithms to the analysis of coherent states and Fock spaces [AJP22, Gia21, AP20, DS20, DKS19, CCF16]. This versatility has sparked renewed interest in both the theoretical foundations and practical applications of kernels, leading to deeper understanding of inference techniques and optimization models [WK23, ZCH19, vdL96, AAARM24, HMBV24, Ste24, AA23, TXK23].

Our approach focuses on identifying duality principles to better understand feature spaces and their role in kernel methods. While the “kernel trick” in machine learning often bypasses explicit constructions of the ambient feature space, studying its structure offers a richer theoretical perspective. Insights into feature space flexibility, stability, and scalability can inform the design and optimization of kernels, enhancing algorithmic robustness and interpretability. This dual perspective bridges practical applications like clustering, support-vector machines, and principal component analysis with the broader mathematical framework, enabling more sophisticated kernel-based solutions for complex data problems.

2. Preliminaries

In this section we introduce the main notions which will be used throughout the paper.

The concept of a kernel in machine learning is a powerful tool used in the design of Support Vector Machines (SVMs). A kernel is a function that operates on pairs of points from the input space, commonly referred to as the $X$ space. The function itself returns a scalar value, but this value is an inner product computed in a Hilbert space, called a reproducing kernel Hilbert space (RKHS). This higher-dimensional space is known as the $Z$ space, and the scalar conveys how close or similar vectors are in the $Z$ space. The kernel allows one to glean the necessary information about the vectors in this more complex space without having to access the space directly. This approach allows one to understand the relationship and position of vectors in a higher-dimensional space and is a powerful tool for classification tasks.
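As a minimal illustration of the point above, the following sketch compares a kernel evaluated directly in the $X$ space with the inner product of explicit features in a $Z$ space. The degree-2 polynomial kernel on $\mathbb{R}^{2}$ and the names `poly_kernel` and `poly_features` are assumptions chosen for illustration; they do not come from the paper.

```python
import numpy as np

def poly_kernel(x, y):
    """Degree-2 polynomial kernel on R^2, evaluated directly in the input (X) space."""
    return (1.0 + np.dot(x, y)) ** 2

def poly_features(x):
    """Explicit feature map into R^6 (the Z space) with <phi(x), phi(y)> = (1 + x.y)^2."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, s * x1, s * x2, x1 * x1, x2 * x2, s * x1 * x2])

x = np.array([0.5, -1.0])
y = np.array([2.0, 0.25])

# Same scalar, computed without and with access to the Z space.
k_direct = float(poly_kernel(x, y))
k_feature = float(np.dot(poly_features(x), poly_features(y)))
print(k_direct, k_feature)
```

The two numbers agree up to round-off, which is the content of the “kernel trick”: the kernel carries the geometry of the $Z$ space without the features ever being formed.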

In this section, we introduce fundamental definitions, along with selected lemmas and properties that serve as key building blocks for the paper.

Definition 2.1 (Positive definite).

Let $X$ be a set.

  (1) A function $K:X\times X\rightarrow\mathbb{C}$ is said to be a positive definite (p.d.) kernel if, for all $n\in\mathbb{N}$, all $\left(x_{i}\right)_{i=1}^{n}$ in $X$, and all $\left(c_{i}\right)_{i=1}^{n}$ in $\mathbb{C}$, we have

    \[
    \sum_{i=1}^{n}\sum_{j=1}^{n}\overline{c_{i}}c_{j}K\left(x_{i},x_{j}\right)\geq 0.
    \]

  (2) Given a p.d. kernel $K:X\times X\rightarrow\mathbb{C}$, let $\mathscr{H}_{K}$ be the Hilbert completion of the set $H_{0}=span\left\{K_{x}:x\in X\right\}$, where $K_{x}\left(\cdot\right)=K\left(\cdot,x\right)$, $x\in X$, with respect to the norm

    \[
    \left\|\sum_{i=1}^{n}c_{i}K_{x_{i}}\right\|_{\mathscr{H}_{K}}^{2}=\sum_{i=1}^{n}\sum_{j=1}^{n}\overline{c_{i}}c_{j}K\left(x_{i},x_{j}\right).
    \]

    $\mathscr{H}_{K}$ is called the reproducing kernel Hilbert space (RKHS) of $K$, and it has the reproducing property:

    \[
    f\left(x\right)=\left\langle K_{x},f\right\rangle_{\mathscr{H}_{K}}
    \]

    valid for all $f\in\mathscr{H}_{K}$ and $x\in X$.

Throughout this paper, all the Hilbert spaces are assumed separable.
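On a finite sample of points, condition (1) of Definition 2.1 says that the Gram matrix has no negative eigenvalues, and the norm in (2) is the quadratic form of that matrix. The sketch below checks this numerically. The helper names, the Gaussian test kernel, and the tolerance `tol` are assumptions for illustration only; a finite check can refute positive definiteness but cannot prove it.

```python
import numpy as np

def gram(kernel, xs):
    """Gram matrix G[i, j] = K(x_i, x_j) on a finite sample xs."""
    return np.array([[kernel(a, b) for b in xs] for a in xs])

def is_positive_definite_on(kernel, xs, tol=1e-10):
    """True if the Gram matrix on xs has no eigenvalue below -tol."""
    G = gram(kernel, xs)
    H = (G + G.conj().T) / 2  # symmetrize to guard against round-off
    return bool(np.linalg.eigvalsh(H).min() >= -tol)

def rkhs_norm_sq(c, G):
    """Squared RKHS norm of sum_i c_i K_{x_i}, i.e. sum_ij conj(c_i) c_j K(x_i, x_j)."""
    return float(np.real(np.conj(c) @ G @ c))

gaussian = lambda a, b: np.exp(-(a - b) ** 2)   # a standard p.d. kernel
neg_dist = lambda a, b: -abs(a - b)             # not p.d.: zero diagonal, nonzero entries

xs = np.linspace(-1.0, 1.0, 6)
c = np.array([1.0, -2.0, 0.5, 0.0, 3.0, -1.0])

print(is_positive_definite_on(gaussian, xs))   # passes the finite check
print(is_positive_definite_on(neg_dist, xs))   # fails the finite check
print(rkhs_norm_sq(c, gram(gaussian, xs)))     # a nonnegative number
```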

Lemma 2.2 (Parseval frame).

Let $\mathscr{H}$ be a Hilbert space, and $\left\{f_{n}\right\}_{n\in\mathbb{N}}\subset\mathscr{H}$. Suppose

\[
\sum_{n\in\mathbb{N}}\left|\left\langle f_{n},h\right\rangle_{\mathscr{H}}\right|^{2}=\left\|h\right\|_{\mathscr{H}}^{2},\quad\forall h\in\mathscr{H}. \tag{2.1}
\]

Then $\left\{f_{n}\right\}$ is an orthonormal basis (ONB) if and only if $\left\|f_{n}\right\|_{\mathscr{H}}=1$ for all $n\in\mathbb{N}$.

Proof.

Fix $n_{0}$ and assume (2.1). Taking $h=f_{n_{0}}$ in (2.1) gives

\[
\left\|f_{n_{0}}\right\|_{\mathscr{H}}^{2}=\left\|f_{n_{0}}\right\|_{\mathscr{H}}^{4}+\sum_{n\neq n_{0}}\left|\left\langle f_{n},f_{n_{0}}\right\rangle_{\mathscr{H}}\right|^{2},
\]

and so $\left\langle f_{n},f_{n_{0}}\right\rangle_{\mathscr{H}}=0$, for all $n\in\mathbb{N}\backslash\left\{n_{0}\right\}$, if $\left\|f_{n_{0}}\right\|_{\mathscr{H}}=1$. ∎
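To see why the norm condition in Lemma 2.2 matters, the following sketch builds a Parseval frame in $\mathbb{R}^{2}$ that is not an ONB: three vectors at angles $120^{\circ}$ apart, each of norm $\sqrt{2/3}$. This specific frame is an illustrative assumption, not an example taken from the paper.

```python
import numpy as np

# Three equally spaced directions in the plane, scaled by sqrt(2/3).
angles = np.pi / 2 + 2 * np.pi * np.arange(3) / 3
frame = np.sqrt(2.0 / 3.0) * np.column_stack([np.cos(angles), np.sin(angles)])  # rows are f_n

# Frame operator S = sum_n f_n f_n^T; identity (2.1) holds for all h iff S = I.
S = frame.T @ frame
norms = np.linalg.norm(frame, axis=1)

# Direct check of (2.1) for one arbitrary vector h.
h = np.array([0.3, -1.7])
parseval_lhs = float(np.sum((frame @ h) ** 2))
parseval_rhs = float(h @ h)

print(np.round(S, 12))
print(norms)                       # all below 1, so the frame is not an ONB
print(parseval_lhs, parseval_rhs)
```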

Lemma 2.3 (Kernel representation).

Let $K$ be a p.d. kernel on $X\times X$.

  (1) A system of functions $\left\{f_{n}\right\}$ on $X$ is a Parseval frame for $\mathscr{H}_{K}$ if and only if

    \[
    K\left(x,y\right)=\sum_{n}\overline{f_{n}\left(x\right)}f_{n}\left(y\right),\quad x,y\in X. \tag{2.2}
    \]

    Moreover, when (2.2) holds, then $\left(f_{n}\left(x\right)\right)\in l^{2}$ for all $x\in X$.

  (2) Further, if all the $f_{n}$’s are distinct, i.e., each with multiplicity one, then $\left\{f_{n}\right\}$ is an ONB.

Proof.

For (1), see e.g., [PR16]. Note $\sum_{n}\left|f_{n}\left(x\right)\right|^{2}=\left\|K_{x}\right\|_{\mathscr{H}_{K}}^{2}=K\left(x,x\right)<\infty$, for all $x\in X$.

Part (2) follows from the argument in Lemma 2.2, where $\mathscr{H}_{K}\simeq l^{2}$ with the isomorphism $f_{n}\mapsto e_{n}$. ∎
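The representation (2.2) can be tested numerically on a kernel given by a power series. The sketch below uses $K\left(x,y\right)=\left(1-xy\right)^{-1}$ on $\left(-1,1\right)$ with $f_{n}\left(x\right)=x^{n}$ (the kernel of Example 3.6 below); the truncation lengths and function names are assumptions made for illustration.

```python
import numpy as np

def szego_real(x, y):
    """K(x, y) = 1 / (1 - x y) for real x, y in (-1, 1)."""
    return 1.0 / (1.0 - x * y)

def truncated_kernel(x, y, terms):
    """Partial sum of (2.2) with f_n(x) = x^n, n = 0, ..., terms - 1."""
    n = np.arange(terms)
    return float(np.sum(x ** n * y ** n))

x, y = 0.6, -0.4
exact = szego_real(x, y)
approx = truncated_kernel(x, y, 40)

# The sequence (f_n(x)) is square summable, with sum K(x, x).
l2_norm_sq = truncated_kernel(x, x, 200)

print(exact, approx)
print(l2_norm_sq, szego_real(x, x))
```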

Proposition 2.4 (Products of p.d. kernels).

Let $K$ and $L$ be p.d. kernels defined on $X\times X$. Set $M=KL$ as follows: $M\left(x,y\right)\coloneqq K\left(x,y\right)L\left(x,y\right)$ for $\left(x,y\right)\in X\times X$. Then $M$ is also p.d. on $X\times X$.

Proof.

Pick the representation (2.2) for $K$, and consider $N\in\mathbb{N}$, $c_{i}\in\mathbb{R}$ (or $\mathbb{C}$), $x_{i}\in X$, $1\leq i\leq N$. Then we get the desired conclusion as follows:

\[
\sum_{i}\sum_{j}\overline{c_{i}}c_{j}M\left(x_{i},x_{j}\right)=\sum_{n}\left(\sum_{i}\sum_{j}\left(\overline{c_{i}f_{n}\left(x_{i}\right)}\right)\left(c_{j}f_{n}\left(x_{j}\right)\right)L\left(x_{i},x_{j}\right)\right)\geq 0,
\]

where we used the p.d. property of $L$ in the last step. ∎
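On a finite sample, Proposition 2.4 reduces to the statement that the entrywise (Hadamard) product of two positive semidefinite Gram matrices is again positive semidefinite. A minimal numerical sketch follows; the two test kernels, the random sample, and the tolerance in the comparison are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
xs = rng.uniform(-1.0, 1.0, size=8)

K = np.exp(-np.subtract.outer(xs, xs) ** 2)   # Gaussian kernel Gram matrix
L = (1.0 + np.outer(xs, xs)) ** 2             # polynomial kernel Gram matrix
M = K * L                                     # pointwise product M(x, y) = K(x, y) L(x, y)

min_eig_M = float(np.linalg.eigvalsh(M).min())

# The quadratic form of Definition 2.1 for a random complex coefficient vector.
c = rng.standard_normal(8) + 1j * rng.standard_normal(8)
quad_form = float(np.real(np.conj(c) @ M @ c))

print(min_eig_M, quad_form)
```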

Definition 2.5 (Loewner order, see e.g., [Don74, Loe48]).

For p.d. kernels $K,L$ on $X\times X$, we say $K\leq L$ if $L-K$ is p.d.

Lemma 2.6.

If $K:X\times X\rightarrow\mathbb{C}$ is p.d. and $K\geq 1$, then $K^{n}\leq K^{n+1}$, $n\in\mathbb{N}$.

Proof.

Recall that products of p.d. kernels are p.d. (see Proposition 2.4, and also Proposition 3.7). Therefore, $K^{n+1}-K^{n}=K^{n}\left(K-1\right)\geq 0$. ∎

Example 2.7.

Let $K\left(z,w\right)=\left(1-\overline{w}z\right)^{-1}$ be the Szegő kernel on $\mathbb{D}\times\mathbb{D}$, where $\mathbb{D}=\left\{z\in\mathbb{C}:\left|z\right|<1\right\}$. Then

\[
\left(1-\overline{w}z\right)^{-1}=\sum_{n=0}^{\infty}\overline{w}^{n}z^{n}\geq 1,
\]

in the sense of Definition 2.5, and so

\[
1\leq K\leq K^{2}\leq\cdots\leq K^{n}.
\]

More specifically,

\[
1\leq\frac{1}{1-\overline{w}z}\leq\frac{1}{\left(1-\overline{w}z\right)^{2}}\leq\frac{1}{\left(1-\overline{w}z\right)^{3}}\leq\cdots\leq\frac{1}{\left(1-\overline{w}z\right)^{n}}.
\]

Here, $K^{2}\left(z,w\right)=\left(1-\overline{w}z\right)^{-2}$ is the Bergman kernel.
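The chain in Example 2.7 can be checked on a finite sample of points in $\mathbb{D}$: each difference $K^{n+1}-K^{n}$ should have a Gram matrix with no negative eigenvalues. The sketch below does this for $n=0,\dots,3$; the random sample (restricted to radius at most $0.9$), the helper `min_eig`, and the tolerance are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
radii = 0.9 * np.sqrt(rng.uniform(size=6))
z = radii * np.exp(2j * np.pi * rng.uniform(size=6))   # sample points in the disk

# Gram matrix of the Szego kernel: G[i, j] = 1 / (1 - conj(z_j) z_i).
G = 1.0 / (1.0 - np.outer(z, z.conj()))

def min_eig(H):
    """Smallest eigenvalue of the Hermitian part of H."""
    return float(np.linalg.eigvalsh((H + H.conj().T) / 2).min())

# Entrywise powers give the kernels K^n; G ** 0 is the constant kernel 1.
chain_min_eigs = [min_eig(G ** (n + 1) - G ** n) for n in range(4)]
print(chain_min_eigs)
```

Each entry is nonnegative up to round-off, consistent with $1\leq K\leq K^{2}\leq K^{3}\leq K^{4}$ on this sample.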

Throughout the paper, we will revisit this family of kernels and its variations, both for motivations and for illustrations of general results. The reader may refer to Examples 3.4, 3.6, 4.2, 5.5, 6.3, as well as Corollaries 4.3, 7.3, and Remark 5.11.

Proposition 2.8 (Monotonicity).

Consider two pairs of p.d. kernels $K_{i}$ and $L_{i}$, and form the product $P_{i}:=K_{i}L_{i}$, $i=1,2$. If $K_{1}\leq K_{2}$, $L_{1}\leq L_{2}$, then $P_{1}\leq P_{2}$.

Proof.

By assumption, $K_{2}-K_{1}$ and $L_{2}-L_{1}$ are p.d. Since products and sums of p.d. kernels are p.d. (Proposition 2.4), we get the following conclusion:

\[
P_{2}-P_{1}=K_{2}L_{2}-K_{1}L_{1}=\left(K_{2}-K_{1}\right)L_{2}+K_{1}\left(L_{2}-L_{1}\right)\geq 0.
\]

Since our paper is interdisciplinary, it combines topics from pure and applied mathematics. Of special significance to the above are the following citations [AJ21, BGG99, JS21, JST20, MDL19, WK23, Ste24].

3. Feature space realizations

The purpose of the present section is to outline key links connecting the notions from Section 2 to the present applications. We outline the main connections between the key pure math notions (especially kernel-duality introduced above), and the applied notions, focusing on feature selection, feature maps, and kernel-machines.

With “feature selection” we refer to the part of machine learning that identifies the “best” insights into phenomena/observations. A feature is an input, i.e., a measurable property of the phenomena. In statistical learning, features are often identified with choices of independent random variables, typically identically distributed (i.i.d.); see below. More generally, learning algorithms serve to identify features that yield better models. Features come in several forms; for example, they may be numeric or qualitative. “Good” feature selections in turn let us identify the important or significant patterns that distinguish between data forms and instances. Indeed, as we recall, machine learning is truly multidisciplinary, as is reflected, for example, in how features are viewed. One example is the geometric view, which treats features as tuples, or vectors in a high-dimensional space, the feature space. Equally important is the probabilistic perspective, i.e., viewing features as multivariate random variables. The following references may be helpful: [AA23, BB23, Gia21, Jon09, MDL19, PDC+14, ZCH19], and [ZZ23]. An important part of the tools that go into feature selection in the statistical setting is known as principal component analysis (PCA) [CDD15]. It is a dimensionality reduction of features, i.e., a reduction of the dimensionality of large data sets. With the use of choices of covariance kernels, it allows one to transform large sets of variables into smaller ones that still contain most of the information in the large set; see e.g., [LCA+24].

As noted above, in applications such as data analysis, the initial set $X$ is general and typically unstructured. In particular, in applications, choices of sets $X$ may not have any linear structure. But, nonetheless, in the design of optimization models (for example in statistical inference, and in machine learning models), there will in fact be natural choices of families of positive definite (p.d.) kernels, specified on $X\times X$. Each such p.d. kernel will then yield an RKHS, denoted here $\mathscr{H}_{K}$. And $\mathscr{H}_{K}$ does present one possible choice of feature space (Definition 3.1), but applications dictate a qualitative and quantitative comparison within the variety of feature spaces for a single p.d. kernel $K$. In particular, we study the possibility of a second p.d. kernel, say $L$, serving to generate a feature space for $K$ (Proposition 3.3). These themes are addressed below. We further study operations on the variety of p.d. kernels, as they relate to feature space selection questions.

Definition 3.1 (Feature map, and feature space).

Given a p.d. kernel $K:X\times X\rightarrow\mathbb{C}$ defined on a set $X$, a Hilbert space $\mathscr{L}$ is said to be a feature space for $K$ if there is a map $\varphi:X\rightarrow\mathscr{L}$, such that $K\left(x,y\right)=\left\langle\varphi\left(x\right),\varphi\left(y\right)\right\rangle_{\mathscr{L}}$, for all $x,y\in X$. Set

\[
H_{S}\left(K\right)\coloneqq\left\{\left(\varphi,\mathscr{L}\right):K\left(x,y\right)=\left\langle\varphi\left(x\right),\varphi\left(y\right)\right\rangle_{\mathscr{L}},\>x,y\in X\right\}.
\]
Remark 3.2.

$H_{S}\left(K\right)\neq\emptyset$. Some basic examples include:

  (1) $\varphi\left(x\right)=K_{x}$, $\mathscr{L}=\mathscr{H}_{K}$ (the RKHS of $K$), and $K\left(x,y\right)=\left\langle K_{x},K_{y}\right\rangle_{\mathscr{H}_{K}}$.

  (2) $\varphi\left(x\right)=\delta_{x}$, $\mathscr{L}=\overline{span}\left\{\delta_{x}:x\in X\right\}$, where the Hilbert completion is with respect to

    \[
    \left\|\sum_{i=1}^{N}c_{i}\delta_{x_{i}}\right\|_{\mathscr{L}}^{2}=\sum_{i,j=1}^{N}\overline{c_{i}}c_{j}K\left(x_{i},x_{j}\right),
    \]

    and $K\left(x,y\right)=\left\langle\delta_{x},\delta_{y}\right\rangle_{\mathscr{L}}$.

  (3) $\varphi\left(x\right)=W_{x}\sim N\left(0,K\left(x,x\right)\right)$, i.e., $\left(W_{x}\right)_{x\in X}$ is a mean zero Gaussian field, realized in $\mathscr{L}=L^{2}\left(\Omega,\mathbb{P}\right)$; and $K\left(x,y\right)=\left\langle W_{x},W_{y}\right\rangle_{L^{2}\left(\mathbb{P}\right)}$.

More general constructions are considered below.

Proposition 3.3.

Given a p.d. kernel $K$ on $X\times X$, then for every Hilbert space $\mathscr{L}$ with $\dim\mathscr{L}=\dim\mathscr{H}_{K}$, there exists $\varphi$ such that $\left(\varphi,\mathscr{L}\right)\in H_{S}\left(K\right)$.

Proof.

Let $\left\{f_{n}\right\}$ be an ONB in $\mathscr{H}_{K}$, and $\left\{\zeta_{n}\right\}$ an ONB in $\mathscr{L}$. Then the map

\[
\varphi\left(x\right)=\sum_{n}f_{n}\left(x\right)\zeta_{n}\in\mathscr{L} \tag{3.1}
\]

is well defined since $\left(f_{n}\left(x\right)\right)\in l^{2}$, $x\in X$ (Lemma 2.3), and

\[
\left\langle\varphi\left(x\right),\varphi\left(y\right)\right\rangle_{\mathscr{L}}=\sum_{n}\overline{f_{n}\left(x\right)}f_{n}\left(y\right)=K\left(x,y\right),\quad x,y\in X.
\]
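The construction (3.1) can be sketched numerically: the coefficients $f_{n}\left(x\right)$ are kept fixed, while the orthonormal system $\left\{\zeta_{n}\right\}$ is chosen freely. Below, the kernel is $K\left(x,y\right)=\left(1-xy\right)^{-1}$ on a few real points with $f_{n}\left(x\right)=x^{n}$, truncated to $N$ terms, and $\left\{\zeta_{n}\right\}$ is taken as the columns of a random orthogonal matrix. The truncation, the sample points, and the random choice of $\zeta_{n}$ are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
xs = np.linspace(-0.8, 0.8, 5)
N = 60  # truncation of the ONB expansion

# F[i, n] = f_n(x_i) = x_i ** n.
F = np.vander(xs, N, increasing=True)

# An arbitrary orthonormal system zeta_n in R^N: the columns of Q.
Q, _ = np.linalg.qr(rng.standard_normal((N, N)))

# Row i of Phi is phi(x_i) = sum_n f_n(x_i) zeta_n.
Phi = F @ Q.T

K_from_features = Phi @ Phi.T
K_exact = 1.0 / (1.0 - np.outer(xs, xs))
max_error = float(np.abs(K_from_features - K_exact).max())
print(max_error)
```

Replacing `Q` by any other orthogonal matrix leaves `K_from_features` unchanged, which reflects the freedom in the choice of feature space asserted by Proposition 3.3.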

Example 3.4.

Let $K:X\times X\rightarrow\mathbb{C}$ be p.d., and $\mathscr{H}_{K}$ the associated RKHS. Let $\left\{f_{n}\right\}$ be an ONB for $\mathscr{H}_{K}$.

  (1) Let $\left\{Z_{n}\right\}$ be a sequence of i.i.d. Gaussian random variables, where $Z_{n}\sim N\left(0,1\right)$, realized in $L^{2}\left(\Omega,\mathbb{P}\right)$. Here, one may take $\Omega=\prod_{\mathbb{N}}\mathbb{R}$, equipped with the $\sigma$-algebra $\mathscr{C}$ generated by the cylinder sets. Define

    \[
    \varphi\left(x\right)=\sum_{n}f_{n}\left(x\right)Z_{n}\left(\cdot\right)\in L^{2}\left(\Omega,\mathbb{P}\right).
    \]

    Then $\left(\varphi,L^{2}\left(\Omega,\mathbb{P}\right)\right)\in H_{S}\left(K\right)$, and

    \[
    K\left(x,y\right)=\left\langle\varphi\left(x\right),\varphi\left(y\right)\right\rangle_{L^{2}\left(\Omega,\mathbb{P}\right)}.
    \]

  (2) The above holds, in particular, when $K$ is the reproducing kernel of the Bergman space $B_{2}\left(\Omega\right)$, $\Omega\subset\mathbb{C}^{n}$. For $n=1$, $\Omega=\mathbb{D}$,

    \[
    K\left(z,w\right)=\left(1-\overline{w}z\right)^{-2}=\sum_{n=0}^{\infty}\left(n+1\right)\overline{w}^{n}z^{n},
    \]

    where $\left\{\sqrt{1+n}z^{n}\right\}_{n\in\mathbb{N}_{0}}$ is an ONB for $\mathscr{H}_{K}$. Setting

    \[
    \varphi\left(z\right)=\sum_{n=0}^{\infty}\sqrt{n+1}z^{n}Z_{n}\left(\cdot\right)\in L^{2}\left(\Omega,\mathbb{P}\right),
    \]

    then

    \[
    K\left(z,w\right)=\left\langle\varphi\left(z\right),\varphi\left(w\right)\right\rangle_{L^{2}\left(\Omega,\mathbb{P}\right)}.
    \]

  (3) Choose $\mathscr{L}$ to be any $L^{2}$-space, e.g., $\mathscr{L}=L^{2}\left(M,\mu\right)$, with an ONB $\left\{\zeta_{n}\right\}$. Then, with

    \[
    \varphi\left(x\right)=\sum_{n}f_{n}\left(x\right)\zeta_{n}\left(\cdot\right)\in L^{2}\left(M,\mu\right),
    \]

    we have

    \[
    K\left(x,y\right)=\int_{M}\overline{\varphi\left(x\right)}\varphi\left(y\right)d\mu=\left\langle\varphi\left(x\right),\varphi\left(y\right)\right\rangle_{L^{2}\left(\mu\right)}.
    \]

    Here the variable $m\in M$ is suppressed.
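The Gaussian realization in Example 3.4 lends itself to a Monte Carlo sketch: sampling the $Z_{n}$ and averaging $\varphi\left(x\right)\varphi\left(y\right)$ should approximate $K\left(x,y\right)$. The code below does this for the Bergman kernel restricted to real points of $\mathbb{D}$. The restriction to real points, the truncation `N_TERMS`, the sample size, and the seed are assumptions for illustration; the agreement is only statistical.

```python
import numpy as np

rng = np.random.default_rng(3)
N_TERMS, N_SAMPLES = 30, 100_000

# Each row of Z is one draw of (Z_0, ..., Z_{N_TERMS - 1}), i.i.d. N(0, 1).
Z = rng.standard_normal((N_SAMPLES, N_TERMS))

def phi(x):
    """Truncated phi(x) = sum_n sqrt(n + 1) x^n Z_n, one value per sample."""
    n = np.arange(N_TERMS)
    return Z @ (np.sqrt(n + 1.0) * x ** n)

x, y = 0.3, -0.5
estimate = float(np.mean(phi(x) * phi(y)))   # sample version of <phi(x), phi(y)> in L^2(P)
exact = (1.0 - x * y) ** (-2)                # Bergman kernel at real points
print(estimate, exact)
```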

3.1. A duality for feature selections

In Proposition 3.3, the case when $\mathscr{L}$ is another RKHS is of particular interest in the analysis below, as it offers a certain symmetry between the two RKHSs and their feature selections. This is stated as follows:

Proposition 3.5 (duality).

Let $K,L$ be p.d. kernels on $X\times X$, and let $\mathscr{H}_{K},\mathscr{H}_{L}$ be the corresponding RKHSs. Choose an ONB $\left\{f_{n}\right\}$ for $\mathscr{H}_{K}$, and $\left\{g_{n}\right\}$ for $\mathscr{H}_{L}$. Define the following vector-valued functions on $X$:

\begin{align*}
\varphi\left(x\right) &=\sum_{n}f_{n}\left(x\right)g_{n}\left(\cdot\right)\in\mathscr{H}_{L},\\
\psi\left(x\right) &=\sum_{n}f_{n}\left(\cdot\right)g_{n}\left(x\right)\in\mathscr{H}_{K}.
\end{align*}

Then,

\begin{align*}
\left(\varphi,\mathscr{H}_{L}\right) &\in H_{S}\left(K\right),\;\text{and}\\
\left(\psi,\mathscr{H}_{K}\right) &\in H_{S}\left(L\right).
\end{align*}
Proof.

Note that $\varphi,\psi$ are well defined since, by Lemma 2.3, $\left(f_{n}\left(x\right)\right),\left(g_{n}\left(x\right)\right)\in l^{2}$ for all $x\in X$. Moreover, for all $x,y\in X$,

\begin{align*}
\left\langle\varphi\left(x\right),\varphi\left(y\right)\right\rangle_{\mathscr{H}_{L}} &=\sum_{n}\overline{f_{n}\left(x\right)}f_{n}\left(y\right)=K\left(x,y\right),\;\text{and}\\
\left\langle\psi\left(x\right),\psi\left(y\right)\right\rangle_{\mathscr{H}_{K}} &=\sum_{n}\overline{g_{n}\left(x\right)}g_{n}\left(y\right)=L\left(x,y\right),
\end{align*}

which is the desired conclusion. ∎
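When $X$ is a finite set and both Gram matrices are invertible, the duality of Proposition 3.5 can be verified with matrices. In that setting a function on $X$ is a vector, the RKHS inner product of a kernel with Gram matrix $G$ is $\left\langle u,v\right\rangle=u^{T}G^{-1}v$ (real case), and the rows of the transposed Cholesky factor give an ONB. The sketch below rests on these finite-dimensional assumptions, and the two test kernels are chosen only for illustration.

```python
import numpy as np

xs = np.linspace(-0.7, 0.7, 5)
K_mat = 1.0 / (1.0 - np.outer(xs, xs))            # K(x, y) = 1 / (1 - x y)
L_mat = np.exp(-np.subtract.outer(xs, xs) ** 2)   # L(x, y) = exp(-(x - y)^2)

def onb_rows(G):
    """Rows f_n with G = sum_n f_n f_n^T; they form an ONB of the RKHS of G on xs."""
    return np.linalg.cholesky(G).T

def rkhs_gram(U, G):
    """Matrix of RKHS inner products <U[i], U[j]> = U[i]^T G^{-1} U[j]."""
    return U @ np.linalg.solve(G, U.T)

F = onb_rows(K_mat)    # F[n, i] = f_n(x_i)
Gm = onb_rows(L_mat)   # Gm[n, i] = g_n(x_i)

Phi = F.T @ Gm   # row i is phi(x_i) = sum_n f_n(x_i) g_n, a function in H_L
Psi = Gm.T @ F   # row i is psi(x_i) = sum_n g_n(x_i) f_n, a function in H_K

K_recovered = rkhs_gram(Phi, L_mat)   # inner products taken in H_L
L_recovered = rkhs_gram(Psi, K_mat)   # inner products taken in H_K
print(np.abs(K_recovered - K_mat).max(), np.abs(L_recovered - L_mat).max())
```

Both errors are at round-off level: $\mathscr{H}_{L}$ serves as a feature space for $K$, and $\mathscr{H}_{K}$ as a feature space for $L$.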

Example 3.6.

Consider the Szegő kernel $K_{Sz}\left(z,w\right)=\left(1-\overline{w}z\right)^{-1}$, $\left(z,w\right)\in\mathbb{D}\times\mathbb{D}$. Its RKHS is the Hardy space $H_{2}\left(\mathbb{D}\right)=\left\{\sum_{n=0}^{\infty}c_{n}z^{n}:\left(c_{n}\right)\in l^{2}\right\}$ with ONB $\left\{z^{n}\right\}_{n\in\mathbb{N}_{0}}$.

Let $K\left(x,y\right)=\left(1-xy\right)^{-1}=\sum_{n=0}^{\infty}x^{n}y^{n}$, defined on $J\times J$, where $J=\left(-1,1\right)$. That is, $K=K_{Sz}\big|_{J\times J}$. It follows from Lemma 2.3 that $\left\{x^{n}\right\}_{n\in\mathbb{N}_{0}}$ is an ONB for $\mathscr{H}_{K}$. Setting

\[
\varphi\left(x\right)=\sum_{n\in\mathbb{N}_{0}}x^{n}z^{n},
\]

then $\left(\varphi,H_{2}\left(\mathbb{D}\right)\right)\in H_{S}\left(K\right)$ by Proposition 3.5.

3.2. Operations

A key property in the choice of Hilbert spaces when constructing feature spaces for positive definite (p.d.) kernels is that tensor products behave well; specifically, the category of Hilbert spaces is closed under tensor product, meaning that the tensor product formed from two Hilbert spaces is a new and canonically defined Hilbert space, see e.g., [PR16]. Here we take advantage of this geometric fact, showing e.g., that if a p.d. kernel $M$ arises as a product $M=KL$ (Proposition 2.4), then the feature spaces for $M$ arise as tensor products of the feature spaces for the respective factors $K$ and $L$.

Proposition 3.7.

Let $K_{i}$, $i=1,2$, be p.d. kernels on $X\times X$, and let

\[
K\left(x,y\right)\coloneqq K_{1}\left(x,y\right)K_{2}\left(x,y\right),\quad x,y\in X, \tag{3.2}
\]

the Hadamard product. Suppose

\[
\left(\varphi_{i},\mathscr{H}_{i}\right)\in H_{S}\left(K_{i}\right),\quad i=1,2. \tag{3.3}
\]

Set $\mathscr{H}\coloneqq\mathscr{H}_{1}\otimes\mathscr{H}_{2}$ (as tensor product in the category of Hilbert spaces), and

\[
\varphi\left(x\right)=\varphi_{1}\left(x\right)\otimes\varphi_{2}\left(x\right),\quad x\in X. \tag{3.4}
\]

Then

\[
\left(\varphi,\mathscr{H}\right)\in H_{S}\left(K\right).
\]
Proof.

We have

\begin{align*}
K\left(x,y\right) &\underset{(3.2)}{=} K_{1}\left(x,y\right)K_{2}\left(x,y\right)\\
&\underset{(3.3)}{=} \left\langle\varphi_{1}\left(x\right),\varphi_{1}\left(y\right)\right\rangle_{\mathscr{H}_{1}}\left\langle\varphi_{2}\left(x\right),\varphi_{2}\left(y\right)\right\rangle_{\mathscr{H}_{2}}\\
&\underset{(3.4)}{=} \left\langle\varphi\left(x\right),\varphi\left(y\right)\right\rangle_{\mathscr{H}_{1}\otimes\mathscr{H}_{2}}.
\end{align*}
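In finite dimensions, the tensor product in (3.4) is the Kronecker product of coordinate vectors, and the computation in the proof can be reproduced directly. The two factor kernels and their feature maps below are assumptions chosen for illustration: $K_{1}\left(x,y\right)=\left(1+xy\right)^{2}$ with features in $\mathbb{R}^{3}$, and $K_{2}\left(x,y\right)=\cos\left(x-y\right)$ with features in $\mathbb{R}^{2}$.

```python
import numpy as np

def phi1(x):
    """Feature map for K1(x, y) = (1 + x y)^2 in R^3."""
    return np.array([1.0, np.sqrt(2.0) * x, x * x])

def phi2(x):
    """Feature map for K2(x, y) = cos(x - y) in R^2."""
    return np.array([np.cos(x), np.sin(x)])

def phi(x):
    """Tensor product feature map phi1(x) (x) phi2(x) in R^3 (x) R^2 = R^6."""
    return np.kron(phi1(x), phi2(x))

x, y = 0.4, -1.3
product_kernel = (1.0 + x * y) ** 2 * np.cos(x - y)   # K = K1 K2, as in (3.2)
from_features = float(np.dot(phi(x), phi(y)))          # inner product in H1 (x) H2
print(product_kernel, from_features)
```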

Proposition 3.8.

Let $I$ be an index set. Suppose $\left(\varphi_{i},\mathscr{H}_{i}\right)\in H_{S}\left(K_{i}\right)$, $i\in I$, and $\sum_{i\in I}K_{i}\left(x,x\right)<\infty$ for all $x\in X$. Let $K\coloneqq\sum_{i\in I}K_{i}$, $\mathscr{H}\coloneqq\oplus_{i\in I}\mathscr{H}_{i}$, and $\varphi\left(x\right)\coloneqq\oplus_{i\in I}\varphi_{i}\left(x\right)$. Then, $\left(\varphi,\mathscr{H}\right)\in H_{S}\left(K\right)$.

Proof.

Note $K$ is well defined if and only if $\sum_{i\in I}K_{i}\left(x,x\right)<\infty$, for all $x\in X$. By assumptions,

\[
\left\langle\varphi\left(x\right),\varphi\left(y\right)\right\rangle_{\mathscr{H}}=\sum_{i\in I}\left\langle\varphi_{i}\left(x\right),\varphi_{i}\left(y\right)\right\rangle_{\mathscr{H}_{i}}=\sum_{i\in I}K_{i}\left(x,y\right)=K\left(x,y\right).
\]

Lemma 3.9 (sums of p.d. kernels).

Let $K_{i}$, $i=1,2$, be p.d. kernels on $X\times X$. Then it is immediate that $K=K_{1}+K_{2}$ is also p.d. Indeed, with Definition 3.1, we have that for every $\left(\varphi_{i},\mathscr{L}_{i}\right)\in H_{S}\left(K_{i}\right)$,

\[
\left(\varphi^{\oplus}\left(x\right)=\varphi_{1}\left(x\right)\oplus\varphi_{2}\left(x\right),\mathscr{L}_{1}\oplus\mathscr{L}_{2}\right)\in H_{S}\left(K\right).
\]

Proof.

\begin{align*}
\left\langle\varphi^{\oplus}\left(x\right),\varphi^{\oplus}\left(y\right)\right\rangle_{\mathscr{L}_{1}\oplus\mathscr{L}_{2}} &=\left\langle\varphi_{1}\left(x\right),\varphi_{1}\left(y\right)\right\rangle_{\mathscr{L}_{1}}+\left\langle\varphi_{2}\left(x\right),\varphi_{2}\left(y\right)\right\rangle_{\mathscr{L}_{2}}\\
&=K_{1}\left(x,y\right)+K_{2}\left(x,y\right)=K\left(x,y\right).
\end{align*}

However, the RKHS from $K=K_{1}+K_{2}$ is not a direct sum Hilbert space. By [Aro50], $F\in\mathscr{H}_{K}$ has its norm represented as

\[
\left\|F\right\|_{\mathscr{H}_{K}}^{2}=\inf_{F_{1},F_{2}}\left\{\left\|F_{1}\right\|_{\mathscr{H}_{K_{1}}}^{2}+\left\|F_{2}\right\|_{\mathscr{H}_{K_{2}}}^{2},\;F_{1}+F_{2}=F,\;F_{i}\in\mathscr{H}_{K_{i}}\right\}.
\]

Indeed, the RKHS $\mathscr{H}_{K}$ takes the form

\[
\mathscr{H}_{K}\simeq\left(\mathscr{H}_{K_{1}}\oplus\mathscr{H}_{K_{2}}\right)\ominus N
\]

where

\[
N=\left\{\left(f,-f\right)\in\mathscr{H}_{K_{1}}\oplus\mathscr{H}_{K_{2}},\;f\in\mathscr{H}_{K_{1}}\cap\mathscr{H}_{K_{2}}\right\}.
\]
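For finite-dimensional feature spaces, the direct sum of Proposition 3.8 and Lemma 3.9 amounts to concatenating coordinate vectors. A minimal sketch follows, with two illustrative feature maps (assumptions, not taken from the paper) for $K_{1}\left(x,y\right)=\left(1+xy\right)^{2}$ and $K_{2}\left(x,y\right)=\cos\left(x-y\right)$.

```python
import numpy as np

def phi1(x):
    """Feature map for K1(x, y) = (1 + x y)^2 in R^3."""
    return np.array([1.0, np.sqrt(2.0) * x, x * x])

def phi2(x):
    """Feature map for K2(x, y) = cos(x - y) in R^2."""
    return np.array([np.cos(x), np.sin(x)])

def phi_sum(x):
    """Direct sum phi1(x) (+) phi2(x), realized as concatenation in R^3 (+) R^2 = R^5."""
    return np.concatenate([phi1(x), phi2(x)])

x, y = 0.4, -1.3
sum_kernel = (1.0 + x * y) ** 2 + np.cos(x - y)         # K = K1 + K2
from_features = float(np.dot(phi_sum(x), phi_sum(y)))   # inner product in the direct sum
print(sum_kernel, from_features)
```

The cross terms between the two summands vanish in the direct sum, which is why concatenation, rather than addition inside a common space, reproduces $K_{1}+K_{2}$.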

Of special significance to the above discussion are the following citations [AA23, AAARM24, CDD15, Jon09, NSW11, YTDMM11, ZCH19].

4. Hilbert space of distributions

Above, in Sections 2 and 3, we introduced realizations via feature maps. Here we address the question of “good” choices of feature spaces; in particular, we make precise the choices of “bigger” feature spaces, taking the form of Hilbert spaces of Schwartz distributions.

The details below concern special families of p.d. kernels $K$, and ways to make precise the corresponding feature spaces, including the form taken by the RKHS $\mathscr{H}_{K}$. This is motivated in part by an important paper by Laurent Schwartz [Sch64].

Theorem 4.1.

Let $K$ be a p.d. kernel on $\Omega\times\Omega$, where $\Omega\subset\mathbb{R}^{d}$ is open, and let $\mathscr{H}_{K}$ be the corresponding RKHS. Suppose $K\in C^{\infty}\left(\Omega\times\Omega\right)$, and let $\left\{f_{n}\right\}$ be an ONB for $\mathscr{H}_{K}$, so that $K\left(x,y\right)=\sum_{n}\overline{f_{n}\left(x\right)}f_{n}\left(y\right)$, where $f_{n}\in C^{\infty}$.

Let $\mathscr{E}^{\prime}\left(\Omega\right)$ denote the space of Schwartz distributions with compact support in $\Omega$, i.e., $\mathscr{E}^{\prime}\left(\Omega\right)$ is the dual of the Fréchet space $C^{\infty}\left(\Omega\right)$. Let

\[
H_{K}^{dist}\left(\Omega\right)=\left\{D\in\mathscr{E}^{\prime}\left(\Omega\right):DKD<\infty\right\} \tag{4.1}
\]

with an ONB $\left\{D_{n}\right\}$. The notation $DKD$ in (4.1) refers to a pair $K,D$ where $K$ is a $C^{\infty}$ p.d. kernel and $D$ is a Schwartz distribution. Then $DKD$ refers to the action of $D$ in the two variables of $K$, i.e., on the left and on the right; see the cited literature. Here, the Hilbert completion is with respect to the inner product $\left(\xi,\eta\right)\mapsto\xi K\eta$.

Set

\[
\varphi\left(x\right)=\sum_{n}f_{n}\left(x\right)D_{n}\in H_{K}^{dist}\left(\Omega\right),
\]

then

\[
\left(\varphi,H_{K}^{dist}\left(\Omega\right)\right)\in H_{S}\left(K\right),
\]

i.e.,

\[
K\left(x,y\right)=\left\langle\varphi\left(x\right),\varphi\left(y\right)\right\rangle_{H_{K}^{dist}\left(\Omega\right)}.
\]
Proof.

See Proposition 3.3. ∎

Example 4.2 ([Jor02]).

Let $K\left(x,y\right)=\left(1-xy\right)^{-1}$, defined on $J\times J$, where $J=\left(-1,1\right)$. The corresponding RKHS $\mathscr{H}_{K}$ has an ONB $\left\{x^{n}:n\in\mathbb{N}_{0}\right\}$.

Let $D_{n}=\left(n!\right)^{-1}\delta_{0}^{\left(n\right)}$, where $\delta_{0}^{\left(n\right)}$ is the $n^{th}$ derivative of the Dirac distribution $\delta_{0}$. Note that $K\in C^{\infty}$, and

\[
D_{n}\left(x\right)K\left(x,y\right)=y^{n}\in\mathscr{H}_{K}.
\]

Define

\[
D_{n}KD_{m}\coloneqq D_{n}\left(x\right)K\left(x,y\right)D_{m}\left(y\right)=\begin{cases}1&n=m,\\ 0&n\neq m.\end{cases} \tag{4.2}
\]

Let $\mathscr{H}_{K}^{dist}$ be the Hilbert completion of $span\left\{D_{n}\right\}$, then

\[
\mathscr{H}_{K}^{dist}=\left\{\xi\in\mathscr{E}^{\prime}\left(J\right):\xi K\xi<\infty\right\}=\left\{\sum c_{n}D_{n}:\sum\left|c_{n}\right|^{2}<\infty\right\}, \tag{4.3}
\]

for which $\left\{D_{n}:n\in\mathbb{N}_{0}\right\}$ is an ONB. Set

\[
\varphi\left(x\right)=\sum_{n=0}^{\infty}x^{n}D_{n}\in\mathscr{H}_{K}^{dist},
\]

then $\left(\varphi,\mathscr{H}_{K}^{dist}\right)\in H_{S}\left(K\right)$, by Theorem 4.1. Note that

\[
\mathscr{H}_{K}^{dist}\cap\mathscr{H}_{K}=\emptyset.
\]

Similarly, for any kernel that is defined by power series, we obtain $\mathscr{H}_{K}^{dist}$ as in (4.3), by adjusting the coefficients of $\delta^{\left(n\right)}$. In particular, this applies to the kernel

\[
K\left(x,y\right)=\left(1-xy\right)^{-n}=1+nxy+\frac{1}{2}n\left(n+1\right)x^{2}y^{2}+\cdots,\quad\left(x,y\right)\in J\times J.
\]
Corollary 4.3.

Suppose $f(x)=\sum_{n=0}^{\infty}a_{n}x^{n}$, $a_{n}>0$, with radius of convergence $r^{2}>0$. Let $K(x,y)=\sum_{n=0}^{\infty}a_{n}x^{n}y^{n}$, defined on $J\times J$, where $J=(-r,r)$. Set

\[
H_{K}^{dist}=\left\{\sum_{n=0}^{\infty}c_{n}D_{n}:\sum\left|c_{n}\right|^{2}<\infty,\;D_{n}=\frac{1}{n!\sqrt{a_{n}}}\delta^{(n)}\right\}
\]

and

\[
\varphi(x)=\sum_{n=0}^{\infty}\sqrt{a_{n}}\,x^{n}D_{n}.
\]

Then $(\varphi,H_{K}^{dist})\in H_{S}(K)$.
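As a numerical sanity check of Corollary 4.3, the following Python sketch identifies $\sum c_{n}D_{n}$ with its coefficient sequence $(c_{n})\in l^{2}$, so that $\varphi(x)$ becomes the sequence $(\sqrt{a_{n}}x^{n})$. The choice $a_{n}=\binom{m+n-1}{n}$ (i.e., the kernel $(1-xy)^{-m}$), the truncation level `N`, and all function names are assumptions made for illustration only.

```python
import math

def coeffs(m, N):
    """Taylor coefficients a_n of (1 - t)^(-m), n = 0..N-1 (illustrative choice)."""
    return [math.comb(m + n - 1, n) for n in range(N)]

def feature(x, a):
    """Coefficient sequence of phi(x) = sum_n sqrt(a_n) x^n D_n in the ONB {D_n}."""
    return [math.sqrt(an) * x**n for n, an in enumerate(a)]

def inner(u, v):
    """l^2 inner product of two real coefficient sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

m, N = 3, 400            # assumed exponent and truncation level
a = coeffs(m, N)
x, y = 0.4, -0.7         # sample points in J = (-1, 1)
approx = inner(feature(x, a), feature(y, a))
exact = (1 - x * y) ** (-m)
print(approx, exact)
```

Up to truncation error, the two printed values should agree, which is the identity $K(x,y)=\langle\varphi(x),\varphi(y)\rangle$ in coordinates.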

Of special significance to the above discussion are the following citations [AJ21, AJP22, Jor02, Sch64].

5. Ordering of kernels and RKHSs

Since the choice of “good” features for kernel machines depends on a prior identification of kernels, precise comparisons of kernels are important. We therefore stress the ordering of kernels: below we address the role that ordering between pairs of kernels plays in feature selection, and its implications for computations.

The role of “ordering”, as we saw, arises both in feature selection and in the role kernels play in geometry and in analysis. More generally, the question of order plays an important role in diverse methods used for building new reproducing kernel Hilbert spaces (RKHSs) from other Hilbert spaces with specified frame elements having specific properties. Such new constructions of RKHSs are used in turn within the framework of regularization theory and in approximation theory, involving such questions as semiparametric estimation and multiscale schemes of regularization. Making use of the results from the previous two sections, we turn below to a systematic analysis of these questions of ordering.

Returning to the general framework (X,K)(X,K), where the set XX does not come with any particular structure, we now study the case when XX is fixed, and we examine the collection of all p.d. kernels defined on X×XX\times X. A special feature of interest is that of deciding how the ordering of pairs of p.d. kernels relates to operators which map between the associated families of feature spaces, see Theorem 5.7. This study includes an identification of multipliers, see Corollary 5.8.

We first recall Aronszajn’s inclusion theorem, which states that, for two p.d. kernels K,LK,L on X×XX\times X, KLK\leq L if and only if K\mathscr{H}_{K} is contractively contained in L\mathscr{H}_{L} (see e.g., [Aro50]):

Theorem 5.1.

Let $K_{i}$, $i=1,2$, be p.d. kernels on $X\times X$. Then $\mathscr{H}_{K_{1}}\subset\mathscr{H}_{K_{2}}$ (boundedly contained) if and only if there exists a constant $c>0$ such that $K_{1}\leq c^{2}K_{2}$. Moreover, $\|f\|_{\mathscr{H}_{K_{2}}}\leq c\|f\|_{\mathscr{H}_{K_{1}}}$ for all $f\in\mathscr{H}_{K_{1}}$.

This theorem is reformulated in Lemma 5.2 by means of quadratic forms. It is then extended in Theorem 5.7 to feature spaces.

Lemma 5.2.

Suppose K,LK,L are p.d. on X×XX\times X, and KLK\leq L. Let L\mathscr{H}_{L} be the RKHS of LL.

  1. (1)

    Let L0=span{Lx:xX}L_{0}=span\left\{L_{x}:x\in X\right\}. Define Φ:L0×L0\Phi:L_{0}\times L_{0}\rightarrow\mathbb{C} by

    Φ(Lx,Ly)=K(x,y)\Phi\left(L_{x},L_{y}\right)=K\left(x,y\right) (5.1)

    and extend by sesquilinearity:

    Φ(i=1mciLxi,j=1ndjLyj)=i=1mj=1nci¯djK(xi,yj).\Phi\left(\sum_{i=1}^{m}c_{i}L_{x_{i}},\sum_{j=1}^{n}d_{j}L_{y_{j}}\right)=\sum_{i=1}^{m}\sum_{j=1}^{n}\overline{c_{i}}d_{j}K\left(x_{i},y_{j}\right). (5.2)

    Then Φ\Phi extends to a bounded sesquilinear form on L\mathscr{H}_{L}.

  2. (2)

    There exists a unique positive selfadjoint operator AA on L\mathscr{H}_{L}, such that 0AI0\leq A\leq I, and

    Φ(f,g)=A1/2f,A1/2gL,f,gL.\Phi\left(f,g\right)=\left\langle A^{1/2}f,A^{1/2}g\right\rangle_{\mathscr{H}_{L}},\quad f,g\in\mathscr{H}_{L}. (5.3)
  3. (3)

    Especially,

    K(x,y)=A1/2Lx,A1/2LyL,x,yX.K\left(x,y\right)=\left\langle A^{1/2}L_{x},A^{1/2}L_{y}\right\rangle_{\mathscr{H}_{L}},\quad x,y\in X. (5.4)
Proof.

Part (1). We need only show that

\[
\left|\Phi\left(\sum_{i=1}^{m}c_{i}L_{x_{i}},\sum_{j=1}^{n}d_{j}L_{y_{j}}\right)\right|^{2}\leq\left\|\sum_{i=1}^{m}c_{i}L_{x_{i}}\right\|_{\mathscr{H}_{L}}^{2}\left\|\sum_{j=1}^{n}d_{j}L_{y_{j}}\right\|_{\mathscr{H}_{L}}^{2}. \tag{5.5}
\]

Assume $(\pi,\mathscr{E})\in H_{S}(K)$, i.e., $K(x,y)=\langle\pi(x),\pi(y)\rangle_{\mathscr{E}}$. Then,

\[
\left|\sum_{i=1}^{m}\sum_{j=1}^{n}\overline{c_{i}}d_{j}K(x_{i},y_{j})\right|^{2}=\left|\left\langle\sum_{i=1}^{m}c_{i}\pi(x_{i}),\sum_{j=1}^{n}d_{j}\pi(y_{j})\right\rangle_{\mathscr{E}}\right|^{2}.
\]

Thus, by the Cauchy-Schwarz inequality in $\mathscr{E}$ and the assumption $K\leq L$,

\begin{align*}
\text{LHS}_{(5.5)} & \leq\left\|\sum_{i=1}^{m}c_{i}\pi(x_{i})\right\|_{\mathscr{E}}^{2}\left\|\sum_{j=1}^{n}d_{j}\pi(y_{j})\right\|_{\mathscr{E}}^{2}\\
 & =\left(\sum_{s,t=1}^{m}\overline{c_{s}}c_{t}\left\langle\pi(x_{s}),\pi(x_{t})\right\rangle_{\mathscr{E}}\right)\left(\sum_{s,t=1}^{n}\overline{d_{s}}d_{t}\left\langle\pi(y_{s}),\pi(y_{t})\right\rangle_{\mathscr{E}}\right)\\
 & =\left(\sum_{s,t=1}^{m}\overline{c_{s}}c_{t}K(x_{s},x_{t})\right)\left(\sum_{s,t=1}^{n}\overline{d_{s}}d_{t}K(y_{s},y_{t})\right)\\
 & \leq\left(\sum_{s,t=1}^{m}\overline{c_{s}}c_{t}L(x_{s},x_{t})\right)\left(\sum_{s,t=1}^{n}\overline{d_{s}}d_{t}L(y_{s},y_{t})\right)=\text{RHS}_{(5.5)}.
\end{align*}

Part (2) follows from the general theory of quadratic forms. Note that (5.4) follows from (5.3) and (5.1). ∎

Corollary 5.3.

Assume K,LK,L are p.d. on X×XX\times X, and KLK\leq L. Let K\mathscr{H}_{K} and L\mathscr{H}_{L} be the corresponding RKHSs. Then

KxA1/2LxK_{x}\mapsto A^{1/2}L_{x}

extends to an isometry from K\mathscr{H}_{K} into L\mathscr{H}_{L}.
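The operator $A$ of Lemma 5.2 can be made concrete on a finite set of points. The Python sketch below is only an illustration under stated assumptions: it uses the real kernels $K(x,y)=(1-xy)^{-1}$ and $L(x,y)=(1-xy)^{-2}$ on $(-1,1)$, a hypothetical choice of four sample points, and it computes the compression of $A$ to $span\{L_{x_{i}}\}$ rather than $A$ itself.

```python
import numpy as np

def gram(kernel, pts):
    """Gram matrix [kernel(x_i, x_j)] on a finite set of points."""
    return np.array([[kernel(x, y) for y in pts] for x in pts])

K = lambda x, y: 1.0 / (1.0 - x * y)        # Szegő-type kernel on (-1, 1)
L = lambda x, y: 1.0 / (1.0 - x * y) ** 2   # Bergman-type kernel on (-1, 1)

pts = [-0.6, -0.1, 0.3, 0.7]                # hypothetical sample points
GK, GL = gram(K, pts), gram(L, pts)

# In coordinates f = sum_i c_i L_{x_i}, the inner product of H_L is c^T GL d
# and the form Phi is c^T GK d, so the compressed A has matrix M = GL^{-1} GK.
M = np.linalg.solve(GL, GK)

# The spectrum of M equals that of the symmetric matrix R^{-1} GK R^{-T},
# where GL = R R^T is the Cholesky factorization.
R = np.linalg.cholesky(GL)
S = np.linalg.solve(R, np.linalg.solve(R, GK).T)
spec = np.linalg.eigvalsh(S)
print(spec)
```

The eigenvalues in `spec` should lie in $[0,1]$, reflecting $0\leq A\leq I$, and `GL @ M` reproduces `GK`, which is (5.4) restricted to the sample points.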

Remark 5.4.

Let a pair of p.d. kernels satisfy the Loewner order relation (Definition 2.5). Then the corresponding operator $A$ introduced in (5.4) and Corollary 5.3 is bounded. However, Example 5.5 below (see (5.8)) illustrates that, in general, the inverse $A^{-1}$ is an unbounded operator. In applications to the theory of elliptic PDEs, the operator $A$ may take the form of a “Green's function”; see e.g., [Nel58a, Nel58b].

Example 5.5.

Let K(z,w)=(1w¯z)1K\left(z,w\right)=\left(1-\overline{w}z\right)^{-1}, L(z,w)=(1w¯z)2L\left(z,w\right)=\left(1-\overline{w}z\right)^{-2}, defined on 𝔻×𝔻\mathbb{D}\times\mathbb{D}, where

K\displaystyle\mathscr{H}_{K} =H2(𝔻)={n=0cnzn:(cn)l2},\displaystyle=H_{2}\left(\mathbb{D}\right)=\left\{\sum_{n=0}^{\infty}c_{n}z^{n}:\left(c_{n}\right)\in l^{2}\right\},
L\displaystyle\mathscr{H}_{L} =B2(𝔻)={n=0cnzn:(cn/1+n)l2}.\displaystyle=B_{2}\left(\mathbb{D}\right)=\left\{\sum_{n=0}^{\infty}c_{n}z^{n}:\left(c_{n}/\sqrt{1+n}\right)\in l^{2}\right\}.

Define

A(zn)=(1+n)1zn.A\left(z^{n}\right)=\left(1+n\right)^{-1}z^{n}. (5.6)

Then

K(z,w)=A1/2Lz,A1/2LwL.K\left(z,w\right)=\left\langle A^{1/2}L_{z},A^{1/2}L_{w}\right\rangle_{\mathscr{H}_{L}}. (5.7)

Moreover, the inverse operator is given by

A1=1+zddz:zn(1+n)zn,A^{-1}=1+z\frac{d}{dz}:z^{n}\longmapsto\left(1+n\right)z^{n}, (5.8)

where A11A^{-1}\geq 1.

Proof of (5.7).

Recall that Lw(s)=L(s,w)=n0(1+n)w¯nsnL_{w}\left(s\right)=L\left(s,w\right)=\sum_{n\in\mathbb{N}_{0}}\left(1+n\right)\overline{w}^{n}s^{n}, and

1=znKznL=11+n,n0.1=\left\|z^{n}\right\|_{\mathscr{H}_{K}}\geq\left\|z^{n}\right\|_{\mathscr{H}_{L}}=\frac{1}{\sqrt{1+n}},\quad n\in\mathbb{N}_{0}.

Then,

\begin{align*}
\left\langle A^{1/2}L_{z},A^{1/2}L_{w}\right\rangle_{\mathscr{H}_{L}} & =\left\langle A^{1/2}\sum_{n}(1+n)\overline{z}^{n}s^{n},A^{1/2}\sum_{m}(1+m)\overline{w}^{m}s^{m}\right\rangle_{\mathscr{H}_{L}}\\
 & =\sum_{n}(1+n)^{2}z^{n}\overline{w}^{n}\left\langle A^{1/2}s^{n},A^{1/2}s^{n}\right\rangle_{\mathscr{H}_{L}}\\
 & =\sum_{n}(1+n)z^{n}\overline{w}^{n}\left\langle s^{n},s^{n}\right\rangle_{\mathscr{H}_{L}}\\
 & =\sum_{n}z^{n}\overline{w}^{n}=\left\langle K_{z},K_{w}\right\rangle_{\mathscr{H}_{K}}=K(z,w).
\end{align*}
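The computation behind (5.7) can be checked numerically in coordinates. The following Python sketch assumes the ONB $e_{n}=\sqrt{1+n}\,s^{n}$ of $\mathscr{H}_{L}$, in which $A$ is diagonal with entries $(1+n)^{-1}$; the truncation level `N` and the sample points are hypothetical.

```python
import numpy as np

def L_coeffs(w, N):
    """Coefficients of L_w in the ONB e_n = sqrt(1+n) s^n of H_L, truncated at N."""
    n = np.arange(N)
    return np.sqrt(1 + n) * np.conj(w) ** n

def form_via_A(z, w, N=300):
    """<A^{1/2} L_z, A^{1/2} L_w>_{H_L}, with A diagonal: A e_n = e_n / (1+n)."""
    n = np.arange(N)
    # np.vdot conjugates its first argument, matching an inner product
    # that is conjugate-linear in the first variable.
    return np.vdot(L_coeffs(z, N), L_coeffs(w, N) / (1 + n))

z, w = 0.3 + 0.2j, -0.1 + 0.5j   # sample points in the unit disk
print(form_via_A(z, w), 1 / (1 - np.conj(w) * z))
```

Both printed values should agree up to truncation, i.e., the form defined through $A$ recovers the Szegő kernel $K(z,w)=(1-\overline{w}z)^{-1}$.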

Remark 5.6.

If j:KLj:\mathscr{H}_{K}\rightarrow\mathscr{H}_{L} is the inclusion map, then the adjoint j:LKj^{*}:\mathscr{H}_{L}\rightarrow\mathscr{H}_{K} is given by j(zn)=(1+n)1znj^{*}\left(z^{n}\right)=\left(1+n\right)^{-1}z^{n}. Therefore, the operator AA in (5.6) is precisely the contraction A=jj:LLA=jj^{*}:\mathscr{H}_{L}\rightarrow\mathscr{H}_{L}.

More generally, we have:

Theorem 5.7.

Let $K,L$ be p.d. kernels on $X\times X$, with $(\varphi,\mathscr{K})\in H_{S}(K)$ and $(\psi,\mathscr{L})\in H_{S}(L)$. Then $K\leq L$ if and only if there exists a positive selfadjoint operator $B$ on $\mathscr{L}$, such that $0\leq B\leq I$, and

K(x,y)=φ(x),φ(y)𝒦=B1/2ψ(x),B1/2ψ(y),x,yX.K\left(x,y\right)=\left\langle\varphi\left(x\right),\varphi\left(y\right)\right\rangle_{\mathscr{K}}=\left\langle B^{1/2}\psi\left(x\right),B^{1/2}\psi\left(y\right)\right\rangle_{\mathscr{L}},\quad x,y\in X.
Proof.

See the proof of Lemma 5.2. ∎

Corollary 5.8 (Multipliers).

Let KK be a p.d. kernel on 𝔻\mathbb{D} and K\mathscr{H}_{K} be the corresponding RKHS. For φ\varphi in the unit ball (H)1\left(H^{\infty}\right)_{1} of HH^{\infty}, the function

K(z,w)=(1φ(w)¯φ(z))K(z,w)K^{*}\left(z,w\right)=\left(1-\overline{\varphi\left(w\right)}\varphi\left(z\right)\right)K\left(z,w\right)

is a p.d. kernel on $\mathbb{D}$ if and only if $\varphi$ is a contractive multiplier on $\mathscr{H}_{K}$, i.e., $\|\varphi h\|_{\mathscr{H}_{K}}\leq\|h\|_{\mathscr{H}_{K}}$ for all $h\in\mathscr{H}_{K}$.

Proof.

The assertion is equivalent to:

φ(w)¯φ(z)K(z,w)K(z,w)\displaystyle\overline{\varphi\left(w\right)}\varphi\left(z\right)K\left(z,w\right)\leq K\left(z,w\right) (5.9)
\displaystyle\Updownarrow
φhKhK,hK.\displaystyle\left\|\varphi h\right\|_{\mathscr{H}_{K}}\leq\left\|h\right\|_{\mathscr{H}_{K}},\forall h\in\mathscr{H}_{K}. (5.10)

Pick an ONB {fn}\left\{f_{n}\right\} for K\mathscr{H}_{K}, then

φ(w)¯φ(z)K(z,w)=nφ(w)fn(w)¯φ(z)fn(z)\overline{\varphi\left(w\right)}\varphi\left(z\right)K\left(z,w\right)=\sum_{n}\overline{\varphi\left(w\right)f_{n}\left(w\right)}\varphi\left(z\right)f_{n}\left(z\right) (5.11)

Therefore, an application of Theorem 5.7 to (5.11) shows that (5.9) holds if and only if the operator KK\mathscr{H}_{K}\rightarrow\mathscr{H}_{K}, fnφfnf_{n}\mapsto\varphi f_{n} is contractive, thus the equivalence to (5.10). ∎
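The dichotomy in Corollary 5.8 can be observed on finite Gram matrices. The sketch below is an illustration only: it takes $K$ to be the Szegő kernel, and the two multipliers `contractive` and `expanding`, as well as the sample points, are hypothetical choices.

```python
import numpy as np

def szego(z, w):
    """Szegő kernel of the Hardy space H_2(D)."""
    return 1.0 / (1.0 - np.conj(w) * z)

def twisted_gram(phi, pts):
    """Gram matrix of (1 - conj(phi(w)) phi(z)) K(z, w) on a finite set of points."""
    return np.array([[(1 - np.conj(phi(w)) * phi(z)) * szego(z, w) for w in pts]
                     for z in pts])

def min_eig(G):
    """Smallest eigenvalue of a Hermitian matrix."""
    return np.linalg.eigvalsh(G).min()

pts = [0.1 + 0.2j, -0.5j, 0.7, -0.3 + 0.6j, 0.8 - 0.1j]  # points in the disk
contractive = lambda z: (z + z**2) / 2   # sup norm <= 1 on the disk
expanding = lambda z: 2 * z              # sup norm 2: not in the unit ball of H^infty

print(min_eig(twisted_gram(contractive, pts)))  # nonnegative up to rounding
print(min_eig(twisted_gram(expanding, pts)))    # strictly negative
```

A negative eigenvalue for `expanding` certifies that the corresponding kernel fails to be p.d.; a nonnegative spectrum on one finite set is of course only a necessary condition.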

Next, we focus on certain limit constructions of p.d. kernels.

Definition 5.9.

Given a set XX, let Pos(X)Pos\left(X\right) be the set of all p.d. kernels defined on X×XX\times X.

Theorem 5.10.

Let K1K2KnKn+1K_{1}\leq K_{2}\leq\cdots\leq K_{n}\leq K_{n+1}\leq\cdots, with KnPos(X)K_{n}\in Pos\left(X\right) for all nn\in\mathbb{N}. Assume that for all xXx\in X,

supnKn(x,x)=S(x)<,\sup_{n}K_{n}\left(x,x\right)=S\left(x\right)<\infty, (5.12)

then the Hilbert completion

K~:=(nKn),\mathscr{H}_{\tilde{K}}:=\left(\bigcup_{n}\mathscr{H}_{K_{n}}\right)^{\sim}, (5.13)

is the RKHS of a limit p.d. kernel

K~(x,y):=limnKn(x,y),\tilde{K}\left(x,y\right):=\lim_{n\rightarrow\infty}K_{n}\left(x,y\right), (5.14)

defined on X×XX\times X.

Proof.

First note that, from (5.12), we get the following boundedness:

\[
\left|K_{n}(x,y)\right|^{2}\leq K_{n}(x,x)K_{n}(y,y)\leq S(x)S(y)<\infty,
\]

and so the sequence $\{K_{n}(x,y)\}_{n\in\mathbb{N}}$ is bounded in $\mathbb{C}$ for all $(x,y)\in X\times X$.

For all NN\in\mathbb{N}, cic_{i}\in\mathbb{C}, xiXx_{i}\in X, 1iN1\leq i\leq N, set

Fn(c,x,N)=ijci¯cjKn(xi,xj).F_{n}\left(\vec{c},\vec{x},N\right)=\sum_{i}\sum_{j}\overline{c_{i}}c_{j}K_{n}\left(x_{i},x_{j}\right). (5.15)

Since KnKn+1K_{n}\leq K_{n+1}, it follows that

Fn(c,x,N)Fn+1(c,x,N)F_{n}\left(\vec{c},\vec{x},N\right)\leq F_{n+1}\left(\vec{c},\vec{x},N\right)

and that

\[
\sup_{n\in\mathbb{N}}F_{n}(\vec{c},\vec{x},N)<\infty\quad\text{(by (5.12)).}
\]

Hence $\tilde{K}$ is well defined, and

\[
\sum_{i}\sum_{j}\overline{c_{i}}c_{j}\tilde{K}(x_{i},x_{j})=\sup_{n}F_{n}(\vec{c},\vec{x},N)<\infty.
\]

In this case, for every fK1K2f\in\mathscr{H}_{K_{1}}\subset\mathscr{H}_{K_{2}}\subset\cdots,

fK1fK2fK3\left\|f\right\|_{\mathscr{H}_{K_{1}}}\geq\left\|f\right\|_{\mathscr{H}_{K_{2}}}\geq\left\|f\right\|_{\mathscr{H}_{K_{3}}} (5.16)

and we have

fK~=limnfKn.\left\|f\right\|_{\mathscr{H}_{\tilde{K}}}=\lim_{n}\left\|f\right\|_{\mathscr{H}_{K_{n}}}.
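A simple instance of Theorem 5.10 is the family of partial sums of a power-series kernel. The Python sketch below assumes the illustrative family $K_{n}(x,y)=\sum_{k=0}^{n}(xy)^{k}$ on $(-1,1)$, whose limit is $(1-xy)^{-1}$ and for which $\sup_{n}K_{n}(x,x)=(1-x^{2})^{-1}<\infty$; the sample points are hypothetical.

```python
import numpy as np

def K_n(n, x, y):
    """Partial sums K_n(x, y) = sum_{k=0}^{n} (xy)^k, an increasing family on (-1, 1)."""
    return sum((x * y) ** k for k in range(n + 1))

def gram(kernel, pts):
    """Gram matrix [kernel(x_i, x_j)] on a finite set of points."""
    return np.array([[kernel(x, y) for y in pts] for x in pts])

pts = [-0.8, -0.2, 0.5, 0.9]   # hypothetical sample points
limit = gram(lambda x, y: 1.0 / (1.0 - x * y), pts)

gaps = []
for n in range(1, 200):
    G_prev = gram(lambda x, y: K_n(n - 1, x, y), pts)
    G_next = gram(lambda x, y: K_n(n, x, y), pts)
    # K_{n-1} <= K_n means the Gram difference is positive semidefinite.
    assert np.linalg.eigvalsh(G_next - G_prev).min() > -1e-12
    gaps.append(np.abs(limit - G_next).max())

print(gaps[0], gaps[-1])
```

The entries of `gaps` decrease toward zero, in line with the pointwise limit (5.14).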

Remark 5.11.

Condition (5.12) is necessary for this construction. For example, consider $K_{n}(z,w)=(1-\overline{w}z)^{-n}$ on $\mathbb{D}\times\mathbb{D}$, where $n\in\mathbb{N}$. Here $K_{n}(z,z)=(1-|z|^{2})^{-n}\rightarrow\infty$ for $z\neq0$, and for every $k\geq1$,

\[
1=\left\|z^{k}\right\|_{\mathscr{H}_{K_{1}}}>\underset{\frac{1}{\sqrt{1+k}}}{\underbrace{\left\|z^{k}\right\|_{\mathscr{H}_{K_{2}}}}}>\left\|z^{k}\right\|_{\mathscr{H}_{K_{3}}}>\cdots,\qquad\left\|z^{k}\right\|_{\mathscr{H}_{K_{n}}}\rightarrow0,\;n\rightarrow\infty.
\]
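The failure of (5.12) in Remark 5.11 is easy to tabulate. The sketch below uses the expansion $(1-t)^{-n}=\sum_{k}\binom{n+k-1}{k}t^{k}$, so that $\|z^{k}\|_{\mathscr{H}_{K_{n}}}=\binom{n+k-1}{k}^{-1/2}$; the function names are hypothetical.

```python
import math

def norm_zk(k, n):
    """Norm of z^k in the RKHS of K_n = (1 - conj(w) z)^(-n): 1 / sqrt(C(n+k-1, k))."""
    return 1.0 / math.sqrt(math.comb(n + k - 1, k))

def diag(n, r):
    """Diagonal value K_n(z, z) for |z| = r < 1."""
    return (1.0 - r * r) ** (-n)

for n in (1, 2, 10, 50):
    print(n, norm_zk(3, n), diag(n, 0.5))
```

For fixed $k\geq1$ the norms decrease to zero while the diagonal values grow without bound, so no limit kernel exists on $\mathbb{D}\times\mathbb{D}$.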

As an application we mention the following Cantor construction and a monotone kernel limit. While the example selects a particular scaling-iteration, the idea will apply more generally to a variety of iterated function system constructions (IFSs). For background on IFSs, see e.g., [JT23c, JS21].

Lemma 5.12.

Let fC([0,1])f\in C\left(\left[0,1\right]\right), and extend it to \mathbb{R} by setting f(x)=0f\left(x\right)=0 for x[0,1]x\notin\left[0,1\right]. Define T0f=fT^{0}f=f, and

Tnf(x)=Tn1f(3x)+Tn1f(3x2),n.T^{n}f\left(x\right)=T^{n-1}f\left(3x\right)+T^{n-1}f\left(3x-2\right),\quad n\in\mathbb{N}. (5.17)

Then the limit (pointwise)

F(x)=limnTnf(x)F\left(x\right)=\lim_{n\rightarrow\infty}T^{n}f\left(x\right) (5.18)

is supported in the middle-third Cantor set C1/3C_{1/3}. (See Figure 5.1 for an illustration.)

Figure 5.1. $g_{n}(x)=T^{n}f(x)$, $n=0,1,\ldots,5$.
Proof.

Recall that C1/3C_{1/3} is defined as follows: Let I=[0,1]I=\left[0,1\right]. Introduce two endomorphisms τ1,τ2:II\tau_{1},\tau_{2}:I\rightarrow I, where τ1(x)=x/3\tau_{1}\left(x\right)=x/3, τ2(x)=(x+2)/3\tau_{2}\left(x\right)=\left(x+2\right)/3. Set C0=IC_{0}=I, and

Cn=τ1(Cn1)τ2(Cn1),n.C_{n}=\tau_{1}\left(C_{n-1}\right)\cup\tau_{2}\left(C_{n-1}\right),\quad n\in\mathbb{N}.

Then

C1/3=n=0Cn.C_{1/3}=\bigcap_{n=0}^{\infty}C_{n}.

Note (5.17) is the dual construction for functions on the unit interval II. ∎
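The recursion (5.17) is straightforward to implement. The following Python sketch is a minimal illustration; the helper names `extend`, `T`, and `T_power` are hypothetical, and the test function $f\equiv1$ is chosen because then $T^{n}f$ is the indicator function of $C_{n}$.

```python
def extend(f):
    """Extend f from [0, 1] to the real line by zero."""
    return lambda x: f(x) if 0.0 <= x <= 1.0 else 0.0

def T(g):
    """One step of (5.17): (Tg)(x) = g(3x) + g(3x - 2)."""
    return lambda x: g(3 * x) + g(3 * x - 2)

def T_power(f, n):
    """T^n f for a function f on [0, 1], extended by zero."""
    g = extend(f)
    for _ in range(n):
        g = T(g)
    return g

one = lambda x: 1.0  # with f = 1, T^n f is the indicator function of C_n

print(T_power(one, 1)(0.5))   # 0.5 lies in the removed middle third
print(T_power(one, 1)(0.2))   # 0.2 lies in C_1
print(T_power(one, 2)(0.2))   # 0.2 does not lie in C_2
```

Evaluating `T_power(f, n)` costs $2^{n}$ evaluations of $f$, so this direct recursion is only meant for small $n$.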

Theorem 5.13.

Let KK be p.d. on X×XX\times X with X=[0,1]X=\left[0,1\right], such that

K(x,y)=i=0fi(x)fi(y)¯,K\left(x,y\right)=\sum_{i=0}^{\infty}f_{i}\left(x\right)\overline{f_{i}\left(y\right)},

where {fi}\left\{f_{i}\right\} is an ONB for the corresponding RKHS K\mathscr{H}_{K}. Extend fif_{i} to \mathbb{R} by setting fi(x)=0f_{i}\left(x\right)=0 for x[0,1]x\notin\left[0,1\right], and set

Kn(x,y)=i=0Tnfi(x)Tnfi(y)¯.K_{n}\left(x,y\right)=\sum_{i=0}^{\infty}T^{n}f_{i}\left(x\right)\overline{T^{n}f_{i}\left(y\right)}.

Then the limit

K(x,y)=limnKn(x,y)=limni=0Tnfi(x)Tnfi(y)¯K_{\infty}\left(x,y\right)=\lim_{n\rightarrow\infty}K_{n}\left(x,y\right)=\lim_{n\rightarrow\infty}\sum_{i=0}^{\infty}T^{n}f_{i}\left(x\right)\overline{T^{n}f_{i}\left(y\right)}

is a p.d. kernel on C1/3×C1/3C_{1/3}\times C_{1/3}.

Moreover, KK_{\infty} is invariant under the action of TT, where TT acts on a p.d. kernel LL on X×XX\times X by

L(x,y)=ili(x)li(y)¯TL(x,y)=iTli(x)Tli(y)¯,x,yX.L\left(x,y\right)=\sum_{i}l_{i}\left(x\right)\overline{l_{i}\left(y\right)}\longmapsto TL\left(x,y\right)=\sum_{i}Tl_{i}\left(x\right)\overline{Tl_{i}\left(y\right)},\quad x,y\in X.
Proof.

By assumption, (fi(x))l2\left(f_{i}\left(x\right)\right)\in l^{2}, x\forall x\in\mathbb{R}. Note that

(fi(x))l2,x(Tfi(x))l2,x\left(f_{i}\left(x\right)\right)\in l^{2},\forall x\in\mathbb{R}\Longrightarrow\left(Tf_{i}\left(x\right)\right)\in l^{2},\forall x\in\mathbb{R}

since

(Tfi(x))l22(fi(3x))l22+(fi(3x2))l22<.\left\|\left(Tf_{i}\left(x\right)\right)\right\|_{l^{2}}^{2}\leq\left\|\left(f_{i}\left(3x\right)\right)\right\|_{l^{2}}^{2}+\left\|\left(f_{i}\left(3x-2\right)\right)\right\|_{l^{2}}^{2}<\infty.

Therefore, KnK_{n} is a well defined p.d. kernel, for all nn\in\mathbb{N}.

The conclusion follows by passing to the limit, where Fi(x):=limnTnfi(x)F_{i}\left(x\right):=\lim_{n\rightarrow\infty}T^{n}f_{i}\left(x\right) exists, and has support in C1/3C_{1/3}, by (5.17)–(5.18). See Figure 5.2 for an illustration. ∎

Figure 5.2. $K_{n}$, $n=0,1,2,3$, for $K(x,y)=\frac{1}{1-xy}=\sum_{i=0}^{\infty}x^{i}y^{i}=\sum_{i=0}^{\infty}f_{i}(x)f_{i}(y)$ and $K_{n}(x,y)=\sum_{i=0}^{\infty}T^{n}f_{i}(x)T^{n}f_{i}(y)$, $x,y\in(0,1)$.
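The kernels $K_{n}$ of Theorem 5.13 can be evaluated directly from the definition. The Python sketch below assumes, as in Figure 5.2, the real-valued choice $f_{i}(x)=x^{i}$, and truncates the sum over $i$ at a hypothetical level `terms`.

```python
def T(g):
    """One step of (5.17): (Tg)(x) = g(3x) + g(3x - 2)."""
    return lambda x: g(3 * x) + g(3 * x - 2)

def monomial(i):
    """f_i(x) = x^i on [0, 1], extended by zero outside [0, 1]."""
    return lambda x: x ** i if 0.0 <= x <= 1.0 else 0.0

def K_n(n, x, y, terms=80):
    """Truncated K_n(x, y) = sum_i T^n f_i(x) T^n f_i(y) for K = 1/(1 - xy)."""
    total = 0.0
    for i in range(terms):
        g = monomial(i)
        for _ in range(n):
            g = T(g)
        total += g(x) * g(y)
    return total

print(K_n(1, 0.1, 0.2))   # both points in C_1: equals K(0.3, 0.6)
print(K_n(1, 0.5, 0.2))   # 0.5 is outside C_1: the kernel vanishes
```

The output illustrates that $K_{n}$ vanishes as soon as one argument leaves $C_{n}$, which is the mechanism behind the support statement for $K_{\infty}$.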

Of special significance to the above discussion are the following citations [JST23, JT22, KL21, PDC+14, Don74].

6. RKHS of analytic functions

As noted in Section 4, the identification of good kernels, and of their corresponding RKHSs, depends on the particular function spaces that arise as RKHSs. The case when the RKHS is a Hilbert space of analytic functions has received special attention in the earlier literature on the use of kernels in analysis. The section below outlines properties of RKHSs realized as Hilbert spaces of analytic functions, and their role in our present applications.

The focus of our analysis below is the case when the RKHS $\mathscr{H}_{K}$ is a Hilbert space of analytic functions, defined on an open domain in $\mathbb{C}^{d}$ for some $d$.

Definition 6.1.

Let Ω\Omega be an open subset in d\mathbb{C}^{d} and let KK be a \mathbb{C}-valued p.d. function on Ω×Ω\Omega\times\Omega. We say that KK is analytic if the corresponding RKHS K\mathscr{H}_{K} consists of analytic functions on Ω\Omega.

Remark 6.2.

We note that there are other definitions in the literature which make precise this property of analyticity, and it follows from our discussion that they are equivalent to the present one.

Note that Definition 6.1 makes it clear that the following three familiar classes of p.d. kernels $K$ are analytic: the cases when $K$ is a Szegő kernel, a Bergman kernel, or Bargmann’s kernel [Ber55, BB23, SS17]. In these cases, the respective RKHSs $\mathscr{H}_{K}$ are the Hardy space $H_{2}(\Omega)$, the Bergman space $B_{2}(\Omega)$, and Bargmann’s Hilbert space of entire analytic functions on $\mathbb{C}^{d}$, also called the Segal-Bargmann space. For the literature, we refer to [LG20, Alp15, ADR03, Kis23, Has21, CCL17], and we call attention to the Drury-Arveson kernel [Arv98] as a generalization of the Szegő/Bergman case.

Example 6.3 (Bergman = (Szegő)2\left(\text{Szeg\H{o}}\right)^{2}).

Recall the Szegő kernel

K(z,w)=n0znw¯nK\left(z,w\right)=\sum_{n\in\mathbb{N}_{0}}z^{n}\overline{w}^{n}

and the Bergman kernel

K2(z,w)=n0(n+1)znw¯n.K^{2}\left(z,w\right)=\sum_{n\in\mathbb{N}_{0}}\left(n+1\right)z^{n}\overline{w}^{n}.

Here, we have

K2K=K(K1)=(11zw¯)2(zw¯)0,K^{2}-K=K\left(K-1\right)=\left(\frac{1}{1-z\overline{w}}\right)^{2}\left(z\overline{w}\right)\geq 0,

i.e., KK2K\leq K^{2}. By the discussion above, we have

B2(𝔻)AH2(𝔻)B2(𝔻),B_{2}\left(\mathbb{D}\right)\xrightarrow{\quad A\quad}H_{2}\left(\mathbb{D}\right)\subset B_{2}\left(\mathbb{D}\right),

where AA is the operator in (6.1). Specifically,

A(n=0cnzn)B2:=n=0cnn+1znH2A\underset{\in B_{2}}{\underbrace{\left(\sum_{n=0}^{\infty}c_{n}z^{n}\right)}}:=\sum_{n=0}^{\infty}\frac{c_{n}}{\sqrt{n+1}}z^{n}\in H_{2} (6.1)

where

n=0cnn+1znH22\displaystyle\left\|\sum_{n=0}^{\infty}\frac{c_{n}}{\sqrt{n+1}}z^{n}\right\|_{H_{2}}^{2} =n=0|cn|2n+1\displaystyle=\sum_{n=0}^{\infty}\frac{\left|c_{n}\right|^{2}}{n+1}
n=0cnznB22\displaystyle\left\|\sum_{n=0}^{\infty}c_{n}z^{n}\right\|_{B_{2}}^{2} =n=0cnn+1n+1znB22=n=0|cn|2n+1.\displaystyle=\left\|\sum_{n=0}^{\infty}\frac{c_{n}}{\sqrt{n+1}}\sqrt{n+1}z^{n}\right\|_{B_{2}}^{2}=\sum_{n=0}^{\infty}\frac{\left|c_{n}\right|^{2}}{n+1}.
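The isometry property of the operator $A$ in (6.1) can be verified on Taylor coefficients. The sketch below is a minimal illustration in which functions are represented by finite coefficient arrays; the sample coefficients are hypothetical.

```python
import numpy as np

def A(c):
    """Operator (6.1) on Taylor coefficients: c_n -> c_n / sqrt(n + 1)."""
    n = np.arange(len(c))
    return c / np.sqrt(n + 1)

def hardy_norm_sq(c):
    """||sum c_n z^n||^2 in H_2(D)."""
    return np.sum(np.abs(c) ** 2)

def bergman_norm_sq(c):
    """||sum c_n z^n||^2 in B_2(D), using ||z^n||^2 = 1/(n + 1)."""
    n = np.arange(len(c))
    return np.sum(np.abs(c) ** 2 / (n + 1))

c = np.array([1.0, -2.0 + 1.0j, 0.5j, 3.0, 0.25])   # hypothetical coefficients
print(hardy_norm_sq(A(c)), bergman_norm_sq(c))      # equal: A is isometric
print(hardy_norm_sq(c) >= bergman_norm_sq(c))       # H_2 is contractively contained in B_2
```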
Remark 6.4.

Note this covers many of the kernels we have considered, such as:

  1. (1)

    The Szegő and Bergman kernels on $X=\mathbb{D}$:

    \[
    \frac{1}{1-z\overline{w}},\qquad\left(\frac{1}{1-z\overline{w}}\right)^{2}.
    \]
  2. (2)

    The kernel

    \[
    \frac{i}{2\left(z-\overline{w}\right)}
    \]

    defined for $(z,w)\in\mathbb{C}_{+}\times\mathbb{C}_{+}$, where $\mathbb{C}_{+}=\{z\in\mathbb{C}:\Im z>0\}$; note that $K(z,z)=1/(4\Im z)>0$.
  3. (3)

    The Bargmann kernel:

    ezw¯=n=0znw¯nn!=n=0znn!w¯nn!,(z,w)×.e^{z\overline{w}}=\sum_{n=0}^{\infty}\frac{z^{n}\overline{w}^{n}}{n!}=\sum_{n=0}^{\infty}\frac{z^{n}}{\sqrt{n!}}\frac{\overline{w}^{n}}{\sqrt{n!}},\quad\forall\left(z,w\right)\in\mathbb{C}\times\mathbb{C}.

    This leads to an RKHS $\mathscr{H}_{K}$ consisting of all entire functions $F$ on $\mathbb{C}$ with

    \[
    \left\|F\right\|_{\mathscr{H}_{K}}^{2}=\frac{1}{\pi}\int_{\mathbb{C}}\left|F(z)\right|^{2}e^{-\left|z\right|^{2}}dA(z)<\infty. \tag{6.2}
    \]

    Here, $dA(z)=dx\,dy$, $z=x+iy$.
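For the Bargmann kernel, orthonormality of $z^{n}/\sqrt{n!}$ amounts to $\|z^{n}\|^{2}=n!$. The Python sketch below checks this by quadrature, assuming the Gaussian normalization $\frac{1}{\pi}e^{-|z|^{2}}dA(z)$ (the one compatible with the expansion $e^{z\overline{w}}=\sum z^{n}\overline{w}^{n}/n!$); the cutoff `r_max` and the number of steps are hypothetical numerical parameters.

```python
import math

def monomial_norm_sq(n, r_max=12.0, steps=200_000):
    """(1/pi) * integral over C of |z|^(2n) exp(-|z|^2) dA(z), by a midpoint rule.

    In polar coordinates this equals 2 * integral_0^inf r^(2n+1) exp(-r^2) dr.
    """
    h = r_max / steps
    total = 0.0
    for j in range(steps):
        r = (j + 0.5) * h
        total += r ** (2 * n + 1) * math.exp(-r * r)
    return 2.0 * total * h

for n in range(5):
    print(n, monomial_norm_sq(n), math.factorial(n))
```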

Remark 6.5.

Comparing kernels is relatively straightforward,

KKKK is p.d.K\leq K^{\prime}\Longleftrightarrow\text{$K^{\prime}-K$ is p.d.} (6.3)

while comparing the associated Hilbert spaces is intriguing. The challenge lies in understanding how the embedding or inclusion of one RKHS into another reflects the geometry of the underlying kernels. For instance, the inclusion involves not just the kernels’ positivity but also their interaction with the data, operator norms, and potential scaling factors. Moreover, the relationship between the norms of the two spaces is critical, as it determines the stability and sensitivity of algorithms using these spaces. This makes the comparison of RKHSs more than a direct numerical or functional comparison—it becomes a study of their geometry, boundedness properties, and the behavior of operators that map between them.

For example,

H2(𝔻)H_{2}\left(\mathbb{D}\right) (Hardy space) vs B2(𝔻)B_{2}\left(\mathbb{D}\right) (Bergman space), (6.4)

see the discussion above.

Now, consider feature maps and feature spaces:

HS(K)={(φ,),Xxφ(x),s.t. K(x,y)=φ(x),φ(y)}.H_{S}\left(K\right)=\left\{\left(\varphi,\mathscr{H}\right),X\ni x\rightarrow\varphi\left(x\right)\in\mathscr{H},\>\text{s.t. $K\left(x,y\right)=\left\langle\varphi\left(x\right),\varphi\left(y\right)\right\rangle_{\mathscr{H}}$}\right\}. (6.5)

We may also consider the following two variants of HS(K)H_{S}\left(K\right):

Definition 6.6.

Given KK, p.d. in X×XX\times X, set

super feature space:

\[
H_{S}^{+}(K)=\left\{(\psi,\mathscr{H});\;X\ni x\rightarrow\psi(x)\in\mathscr{H},\;\text{s.t. }K\leq\left\langle\psi(x),\psi(y)\right\rangle_{\mathscr{H}}\right\}, \tag{6.6}
\]

sub feature space:

\[
H_{S}^{-}(K)=\left\{(\psi,\mathscr{H});\;X\ni x\rightarrow\psi(x)\in\mathscr{H},\;\text{s.t. }\left\langle\psi(x),\psi(y)\right\rangle_{\mathscr{H}}\leq K\right\}. \tag{6.7}
\]

6.1. Three kernels of H2H_{2}-Hardy spaces

Here, $f_{n}(z)=z^{n}$, in a complex variable $z\in\mathbb{D}=\{z\in\mathbb{C}:|z|<1\}$, the unit disk in $\mathbb{C}$.

Summary:

  1. (1)

    Coefficients in the scalars $\mathbb{C}$:

    H2(𝔻)={n=0cnzn:(cn)l2(0)},H_{2}\left(\mathbb{D}\right)=\left\{\sum_{n=0}^{\infty}c_{n}z^{n}:\left(c_{n}\right)\in l^{2}\left(\mathbb{N}_{0}\right)\right\},
    n=0cnznH2(𝔻)2=n=0|cn|2=(cn)l22.\left\|\sum_{n=0}^{\infty}c_{n}z^{n}\right\|_{H_{2}\left(\mathbb{D}\right)}^{2}=\sum_{n=0}^{\infty}\left|c_{n}\right|^{2}=\left\|\left(c_{n}\right)\right\|_{l^{2}}^{2}.
  2. (2)

    Coefficients in a fixed Hilbert space \mathscr{H}:

    H2():={n=0hnzn:hn,n=0hn2<},H_{2}\left(\mathscr{H}\right):=\left\{\sum_{n=0}^{\infty}h_{n}z^{n}:h_{n}\in\mathscr{H},\>\sum_{n=0}^{\infty}\left\|h_{n}\right\|^{2}<\infty\right\},
    n=0hnznH2()2=n=0hn2<.\left\|\sum_{n=0}^{\infty}h_{n}z^{n}\right\|_{H_{2}\left(\mathscr{H}\right)}^{2}=\sum_{n=0}^{\infty}\left\|h_{n}\right\|_{\mathscr{H}}^{2}<\infty.
  3. (3)

    Coefficients in the Hilbert-Schmidt class $HS(\mathscr{H})\subset B(\mathscr{H})$, where $B(\mathscr{H})$ is the space of all bounded operators in $\mathscr{H}$:

    \[
    H_{2}(B(\mathscr{H})):=\left\{\sum_{n=0}^{\infty}Q_{n}z^{n}:Q_{n}\in B(\mathscr{H}),\;\sum_{n=0}^{\infty}Q_{n}^{*}Q_{n}\in\mathscr{T}(\mathscr{H})\;\text{(trace class)}\right\},
    \]
    \[
    \left\|\sum_{n=0}^{\infty}Q_{n}z^{n}\right\|_{H_{2}(B(\mathscr{H}))}^{2}=\text{Trace}\left(\sum_{n=0}^{\infty}Q_{n}^{*}Q_{n}\right).
    \]

Correspondences, transforms: $3\rightarrow2\rightarrow1$, $3\rightarrow1$. Recall that a Kaczmarz system of projections $P_{n}$ yields operators $Q_{n}$ such that $\sum_{n}Q_{n}^{*}Q_{n}=I$. See e.g., [JST23, HJW20, JST20] for additional details.
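The correspondence $3\rightarrow2$ can be illustrated in finite dimensions. In the Python sketch below, the coefficients $Q_{n}$ are hypothetical random real $d\times d$ matrices standing in for Hilbert-Schmidt operators, and only finitely many of them are nonzero; the sketch is not tied to any Kaczmarz system.

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 4, 6
# Hypothetical coefficients Q_0, ..., Q_{N-1}: real d x d matrices standing in
# for Hilbert-Schmidt operators on a finite-dimensional H.
Q = [rng.standard_normal((d, d)) / (n + 1) for n in range(N)]

# Case (3): squared norm Trace(sum_n Q_n^* Q_n).
S = sum(q.T @ q for q in Q)
norm3_sq = np.trace(S)

# Map 3 -> 2: F(z)h = sum_n (Q_n h) z^n has coefficients h_n = Q_n h in H.
h = rng.standard_normal(d)
norm2_sq = sum(np.linalg.norm(q @ h) ** 2 for q in Q)

print(norm2_sq, norm3_sq * (h @ h))
```

Since $\sum_{n}\|Q_{n}h\|^{2}=\langle h,(\sum_{n}Q_{n}^{*}Q_{n})h\rangle$, the first printed value is bounded by the second, so $h\mapsto F(\cdot)h$ maps $\mathscr{H}$ boundedly into $H_{2}(\mathscr{H})$.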

6.2. Realization using tensor product of Hilbert spaces

Recall that

\[
\mathscr{H}\otimes\mathscr{H}^{*}\longleftrightarrow\text{the Hilbert space of all \emph{Hilbert-Schmidt} operators acting on $\mathscr{H}$.} \tag{6.8}
\]

Consider case (3) from above, i.e., F(z)=n=0QnznF\left(z\right)=\sum_{n=0}^{\infty}Q_{n}z^{n}, then for hh\in\mathscr{H},

h,F(z)h=n=0h,QnhznH2(𝔻),\left\langle h,F\left(z\right)h\right\rangle_{\mathscr{H}}=\sum_{n=0}^{\infty}\left\langle h,Q_{n}h\right\rangle_{\mathscr{H}}z^{n}\in H_{2}\left(\mathbb{D}\right),

where $\langle\cdot,\cdot\rangle_{\mathscr{H}}$ denotes the inner product in $\mathscr{H}$, and

\begin{align*}
\left\|F(z)h\right\|_{\mathscr{H}}^{2} & =\left\|\sum_{n=0}^{\infty}\left(Q_{n}h\right)z^{n}\right\|_{H_{2}(\mathbb{D},\mathscr{H})}^{2}\\
 & =\sum_{n=0}^{\infty}\left\|Q_{n}h\right\|_{\mathscr{H}}^{2}\left|z\right|^{2n}\\
 & =\sum_{n=0}^{\infty}\left\langle h,Q_{n}^{*}Q_{n}h\right\rangle_{\mathscr{H}}\left|z\right|^{2n}\\
 & \leq\left\|h\right\|_{\mathscr{H}}^{2}\sum_{n=0}^{\infty}\left\|Q_{n}^{*}Q_{n}\right\|\left|z\right|^{2n}\\
 & =\left\|h\right\|_{\mathscr{H}}^{2}\sum_{n=0}^{\infty}\left\|Q_{n}\right\|^{2}\left|z\right|^{2n}
\end{align*}

where =\left\|\cdot\right\|=\left\|\cdot\right\|_{\mathscr{H}\rightarrow\mathscr{H}} is the operator norm.

Trace-norm: Pick an ONB $\{e_{k}\}$ in $\mathscr{H}$, then

\begin{align*}
\text{Tr}\left(F(z)^{*}F(z)\right) & =\sum_{k}\left\langle e_{k},F(z)^{*}F(z)e_{k}\right\rangle \\
 & =\sum_{k}\left\|F(z)e_{k}\right\|_{\mathscr{H}}^{2}\\
 & =\sum_{n=0}^{\infty}\text{Tr}\left(Q_{n}^{*}Q_{n}\right)\left|z\right|^{2n}<\infty
\end{align*}

for all $z\in\mathbb{D}$ when $\sum Q_{n}^{*}Q_{n}$ is trace-class.

Of special significance to the above discussion are the following citations [ADR03, Alp15, Arv98, HJW20, Loe48, PR16].

7. KK-duality via the RKHS K\mathscr{H}_{K}

The present final section addresses a list of direct links between kernel properties, and the role they play in feature selection.

7.1. Dirac-masses and K\mathscr{H}_{K}

Below we consider the role of the Dirac masses in the reproducing kernel Hilbert space $\mathscr{H}_{K}$ when $K$ is a general p.d. kernel defined on $X\times X$. Specifically, we show that the completion of the span of the $X$-Dirac masses can be identified with a realization of all bounded linear functionals on $\mathscr{H}_{K}$.

Theorem 7.1.

Fix KK, assumed p.d. on XX. Let

K=span¯K{Kx:xX},~K=span¯~K{δx:xX},\mathscr{H}_{K}=\overline{span}^{\left\|\cdot\right\|_{\mathscr{H}_{K}}}\left\{K_{x}:x\in X\right\},\quad\tilde{\mathscr{H}}_{K}=\overline{span}^{\left\|\cdot\right\|_{\tilde{\mathscr{H}}_{K}}}\left\{\delta_{x}:x\in X\right\}, (7.1)

where

iciKxiK2=iciδxi~K2=i,jci¯cjK(xi,xj).\left\|\sum\nolimits_{i}c_{i}K_{x_{i}}\right\|_{\mathscr{H}_{K}}^{2}=\left\|\sum\nolimits_{i}c_{i}\delta_{x_{i}}\right\|_{\tilde{\mathscr{H}}_{K}}^{2}=\sum\nolimits_{i,j}\overline{c_{i}}c_{j}K\left(x_{i},x_{j}\right). (7.2)

Then

~KK.\tilde{\mathscr{H}}_{K}\simeq\mathscr{H}_{K}^{\prime}. (7.3)
Proof.

Every bounded linear functional $l$ on $\mathscr{H}_{K}$ is given by a unique $\xi_{l}$ in $\mathscr{H}_{K}$, and $\xi_{l}$ is the limit of a sequence $(\varphi_{n})$ in $span\{K_{x}:x\in X\}$. Using the correspondence

\[
\varphi_{n}\left(x\right)=\sum c_{i}^{(n)}K_{x_{i}}\xleftrightarrow{\text{by (7.2)}}\tilde{\varphi}_{n}\left(x\right)=\sum c_{i}^{(n)}\delta_{x_{i}},
\]

(φ~n)\left(\tilde{\varphi}_{n}\right) is Cauchy in ~K\tilde{\mathscr{H}}_{K}, and it converges to some f~Kf\in\tilde{\mathscr{H}}_{K}. Note, by (7.2),

φnξlK=φ~nf~K.\left\|\varphi_{n}-\xi_{l}\right\|_{\mathscr{H}_{K}}=\left\|\tilde{\varphi}_{n}-f\right\|_{\tilde{\mathscr{H}}_{K}}.

Then, for all $h\in\mathscr{H}_{K}$,

\[
l\left(h\right)=\left\langle\xi_{l},h\right\rangle_{\mathscr{H}_{K}}=\lim_{n}\left\langle\varphi_{n},h\right\rangle_{\mathscr{H}_{K}}=\lim_{n}\tilde{\varphi}_{n}\left(h\right)=f\left(h\right),
\]

where

\[
\tilde{\varphi}_{n}\left(h\right)=\left(\sum c_{i}^{\left(n\right)}\delta_{x_{i}}\right)\left(h\right)=\left\langle\left(\sum c_{i}^{\left(n\right)}K_{x_{i}}\right),h\right\rangle_{\mathscr{H}_{K}}=\sum c_{i}^{\left(n\right)}h\left(x_{i}\right).
\]

Therefore,

\[
l=f.
\]

This shows that $\mathscr{H}_{K}^{\prime}\subset\tilde{\mathscr{H}}_{K}$. Similarly, $\tilde{\mathscr{H}}_{K}\subset\mathscr{H}_{K}^{\prime}$, and so (7.3) holds. ∎

We note that Theorem 7.1 follows from the Riesz Representation Theorem. We include it in order to make explicit the equivalence between $\mathscr{H}_{K}^{\prime}$, the dual space of $\mathscr{H}_{K}$, and $\tilde{\mathscr{H}}_{K}$, the completion of the span of Dirac masses, and to show how the RKHS structure and the kernel properties enter into this identification (see (7.3)). In Corollary 7.3 below, we use this to construct explicit bases for $\mathscr{H}_{K}$ and $\mathscr{H}_{K}^{\prime}$ for the kernel $K^{n}\left(x,y\right):=\left(1-xy\right)^{-n}$, where the representation of the Dirac delta function gives a concrete realization of $\tilde{\mathscr{H}}_{K}$.
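The identities (7.2) and the pairing used in the proof can be checked numerically on finitely many points. The following Python sketch is only an illustration under stated assumptions: it takes the real kernel $K(x,y)=(1-xy)^{-1}$ on $(-1,1)$ (the case $n=1$ of Corollary 7.3), and the sample points and coefficient vectors are arbitrary choices that do not come from the text.

```python
def kernel(x, y):
    """Assumed example kernel K(x, y) = 1 / (1 - x*y) on X = (-1, 1)."""
    return 1.0 / (1.0 - x * y)


def gram(points):
    """Gram matrix (K(x_i, x_j)) on a finite sample of X."""
    return [[kernel(s, t) for t in points] for s in points]


def quad_form(c, G, d):
    """Bilinear form sum_{i,j} c_i d_j G[i][j] (real scalars only)."""
    return sum(c[i] * d[j] * G[i][j]
               for i in range(len(c)) for j in range(len(d)))


points = [-0.5, 0.1, 0.6]   # hypothetical sample points x_i
c = [1.0, -2.0, 0.5]        # phi = sum c_i K_{x_i}, phi~ = sum c_i delta_{x_i}
d = [0.3, 0.7, -1.1]        # h = sum d_j K_{x_j}, an element of H_K

G = gram(points)

# Common value of the two squared norms in (7.2).
norm_sq = quad_form(c, G, c)


def h(x):
    """Evaluate h = sum d_j K_{x_j} at the point x."""
    return sum(dj * kernel(xj, x) for dj, xj in zip(d, points))


# Action of phi~ on h: sum c_i h(x_i).
pairing_dirac = sum(ci * h(xi) for ci, xi in zip(c, points))

# Inner product <phi, h> in H_K, computed from the Gram matrix.
pairing_inner = quad_form(c, G, d)

print(norm_sq, pairing_dirac, pairing_inner)
```

In this real, finite setting the two pairings agree up to rounding, which is the content of the displayed formula for $\tilde{\varphi}_{n}\left(h\right)$.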

Corollary 7.2.

Fix $K,L$ p.d. on $X$. The following are equivalent:

  (1) $K\leq L$.

  (2) $\mathscr{H}_{K}$ is contractively contained in $\mathscr{H}_{L}$.

  (3) $\tilde{\mathscr{H}}_{L}$ is contractively contained in $\tilde{\mathscr{H}}_{K}$.

Moreover, $\mathscr{H}_{K}$ is dense in $\mathscr{H}_{L}$ if and only if $\tilde{\mathscr{H}}_{L}$ is dense in $\tilde{\mathscr{H}}_{K}$.
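A finite-sample check of conditions (1) and (3) may help fix ideas. The sketch below assumes that $K\leq L$ is read as "$L-K$ is p.d." and uses the hypothetical pair $K(x,y)=(1-xy)^{-1}$, $L(x,y)=(1-xy)^{-2}$ on $(-1,1)$, for which $L-K=\sum_{k\geq1}k\,x^{k}y^{k}$. The sample points, the coefficient vector, and the helper names are illustrative choices only.

```python
def k1(x, y):
    """Assumed kernel K(x, y) = (1 - x*y)^(-1)."""
    return 1.0 / (1.0 - x * y)


def k2(x, y):
    """Assumed kernel L(x, y) = (1 - x*y)^(-2)."""
    return 1.0 / (1.0 - x * y) ** 2


def gram(kernel, points):
    return [[kernel(s, t) for t in points] for s in points]


def is_positive_definite(A, tol=1e-12):
    """Cholesky test: True if every pivot of the symmetric matrix A exceeds tol."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                if s <= tol:
                    return False
                L[i][i] = s ** 0.5
            else:
                L[i][j] = s / L[j][j]
    return True


def quad_form(c, G):
    return sum(c[i] * c[j] * G[i][j]
               for i in range(len(c)) for j in range(len(c)))


points = [-0.6, 0.3, 0.8]   # arbitrary nonzero sample points in (-1, 1)
GK, GL = gram(k1, points), gram(k2, points)
diff = [[GL[i][j] - GK[i][j] for j in range(3)] for i in range(3)]
neg_diff = [[-v for v in row] for row in diff]

order_holds = is_positive_definite(diff)        # L - K on the sample
reverse_holds = is_positive_definite(neg_diff)  # K - L on the sample

c = [1.0, -2.0, 0.5]
norm_in_K = quad_form(c, GK)  # squared norm of sum c_i delta_{x_i} in H~_K
norm_in_L = quad_form(c, GL)  # squared norm of the same combination in H~_L

print(order_holds, reverse_holds, norm_in_K <= norm_in_L)
```

The comparison of the two quadratic forms is the finite-sample shadow of (3): by (7.2), a combination of Dirac masses has a smaller norm in $\tilde{\mathscr{H}}_{K}$ than in $\tilde{\mathscr{H}}_{L}$. A check on one sample is of course only a necessary condition for $K\leq L$.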

Corollary 7.3.

Let $X=\left(-1,1\right)$, and

\[
K^{n}\left(x,y\right)=\left(1-xy\right)^{-n}=\sum_{k=0}^{\infty}a_{k}x^{k}y^{k},\quad x,y\in X,
\]

where

\[
a_{k}=\left(-1\right)^{k}\binom{-n}{k}=\frac{n\left(n+1\right)\cdots\left(n+k-1\right)}{k!}>0.
\]

Then

\begin{align}
\mathscr{H}_{K^{n}} & =\left\{\sum c_{k}x^{k}:\left(c_{k}/\sqrt{a_{k}}\right)\in l^{2}\right\}\simeq\left\{\left(c_{k}\right):\left(c_{k}/\sqrt{a_{k}}\right)\in l^{2}\right\}\tag{7.4}\\
\mathscr{H}^{\prime}_{K^{n}} & =\left\{\sum c_{k}\frac{D^{k}}{k!}:\left(c_{k}\sqrt{a_{k}}\right)\in l^{2}\right\}\simeq\left\{\left(c_{k}\right):\left(c_{k}\sqrt{a_{k}}\right)\in l^{2}\right\}\tag{7.5}
\end{align}

where

\[
D^{k}=\left(\frac{d}{dy}\right)^{k}\Big|_{y=0}.
\]

Further, for all $x\in X$,

\[
\delta_{x}=\sum_{k=0}^{\infty}x^{k}\frac{D^{k}}{k!}.\tag{7.6}
\]

Proof.

Note that

\[
\frac{D^{k}}{k!}\left(x^{m}\right)=\begin{cases}1 & k=m\\ 0 & k\neq m\end{cases}
\]

and so we have the natural isomorphism

\[
\mathscr{H}^{\prime}_{K^{n}}\ni\frac{D^{k}}{k!}\longleftrightarrow a_{k}x^{k}\in\mathscr{H}_{K^{n}}.
\]

Thus,

\[
\left\{\frac{D^{k}}{k!\sqrt{a_{k}}}\right\}_{k=0}^{\infty}
\]

is an ONB for $\mathscr{H}^{\prime}_{K^{n}}$. (See also Corollary 4.3.)

Lastly, with $\delta_{x}\longleftrightarrow K_{x}$, we have

\[
\left\langle\frac{D^{k}}{k!\sqrt{a_{k}}},\delta_{x}\right\rangle_{\mathscr{H}^{\prime}_{K^{n}}}=\left\langle\sqrt{a_{k}}x^{k},K_{x}\right\rangle_{\mathscr{H}_{K^{n}}}=\sqrt{a_{k}}x^{k},
\]

and thus

\[
\delta_{x}=\sum_{k=0}^{\infty}\left\langle\frac{D^{k}}{k!\sqrt{a_{k}}},\delta_{x}\right\rangle_{\mathscr{H}^{\prime}_{K^{n}}}\frac{D^{k}}{k!\sqrt{a_{k}}}=\sum_{k=0}^{\infty}\sqrt{a_{k}}x^{k}\frac{D^{k}}{k!\sqrt{a_{k}}}=\sum_{k=0}^{\infty}x^{k}\frac{D^{k}}{k!},
\]

which is (7.6). ∎
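The coefficients $a_{k}$ and the expansion (7.6) lend themselves to a quick numerical sanity check. The Python sketch below assumes that $n$ is a positive integer, so that $a_{k}=\binom{n+k-1}{k}$; the values of $n$, $x$, $y$ and the truncation length are arbitrary. It compares the truncated series with the closed form of $K^{n}$, and it evaluates the squared norm of $\delta_{x}$ from (7.5) and (7.6), which should equal $K^{n}\left(x,x\right)$.

```python
from math import comb


def a(n, k):
    """a_k = n(n+1)...(n+k-1)/k! = C(n+k-1, k), assuming an integer n >= 1."""
    return comb(n + k - 1, k)


def kernel_series(n, x, y, terms=200):
    """Truncated power series sum_k a_k x^k y^k for K^n(x, y)."""
    return sum(a(n, k) * (x * y) ** k for k in range(terms))


n, x, y = 3, 0.4, -0.7                # illustrative values with x, y in (-1, 1)
closed_form = (1.0 - x * y) ** (-n)   # (1 - xy)^(-n)
series = kernel_series(n, x, y)

# By (7.6), delta_x has coefficients c_k = x^k in the basis D^k/k!, so by
# (7.5) its squared norm is sum_k a_k x^(2k), which should be K^n(x, x).
delta_norm_sq = sum(a(n, k) * x ** (2 * k) for k in range(200))
diagonal = (1.0 - x * x) ** (-n)

print(series, closed_form, delta_norm_sq, diagonal)
```

The second comparison is consistent with the identification $\delta_{x}\longleftrightarrow K_{x}$, since $\left\|K_{x}\right\|_{\mathscr{H}_{K^{n}}}^{2}=K^{n}\left(x,x\right)$.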

7.2. A $K$-transform and $K^{-1}$

The following result is motivated by the special case when p.d. kernels arise as Green's functions. Recall that, in the context of PDEs, Green's functions arise as “inverses” of positive elliptic operators; see, e.g., [Nel58b, KL21, BGG99, Hil98]. Since in these cases the p.d. kernels $K$ arise as inverses of elliptic operators, it seems natural, in our present general framework of $Pos\left(X\right)$, to ask for a precise form of $K^{-1}$.

We start with $K\in Pos\left(X\right)$, and we consider functions $f\in\mathscr{H}_{K}$ (the RKHS) and signed measures $\mu$ on $X$ s.t. $\mu K\mu<\infty$.

Recall that the following are equivalent:

\begin{gather}
\infty>\iint\mu\left(dx\right)K\left(x,y\right)\mu\left(dy\right)=\left(\text{abbreviated $\mu K\mu$}\right)\tag{7.7}\\
\Updownarrow\notag\\
\int\mu\left(dx\right)K\left(x,\cdot\right)\in\mathscr{H}_{K}\tag{7.8}
\end{gather}

So we get a pre-Hilbert space

\[
\mathscr{M}_{2}\left(K\right):=\left\{\text{signed measures $\mu$ s.t. $\mu K\mu<\infty$}\right\},
\]

and

\[
T_{K}:\mathscr{M}_{2}\left(K\right)\longrightarrow\mathscr{H}_{K},\tag{7.9}
\]

where

\begin{align}
\mathscr{M}_{2}\left(K\right)\ni\mu & \xrightarrow{\quad T_{K}\quad}T_{K}\mu\in\mathscr{H}_{K}\tag{7.10}\\
\mathscr{M}_{2}\left(K\right)\ni K^{-1}f & \xleftarrow[\quad T_{K}^{*}\quad]{}f\in\mathscr{H}_{K}\tag{7.11}
\end{align}

So we have a well-defined operator $T_{K}^{*}$, and we use it to give $K^{-1}$ a precise definition.

Recall that we proved (7.7) $\Longleftrightarrow$ (7.8), together with

\[
\left\|T_{K}\mu\right\|_{\mathscr{H}_{K}}^{2}=\mu K\mu=\left\|\mu\right\|_{\mathscr{M}_{2}\left(K\right)}^{2}.\tag{7.12}
\]

So, if we introduce the inner product

\[
\left\langle\nu,\mu\right\rangle_{\mathscr{M}_{2}\left(K\right)}=\nu K\mu,
\]

we have that

\[
\mathscr{M}_{2}\left(K\right)\xrightarrow{\quad T_{K}\quad}\mathscr{H}_{K}
\]

is isometric.

Proposition 7.4.

Fix $K\in Pos\left(X\right)$. We have:

\[
\mu\in\underset{\substack{\text{signed measures }\mu,\\ \mu K\mu<\infty}}{\underbrace{\mathscr{M}_{2}\left(K\right)}}\;\overset{T_{K}}{\underset{T_{K}^{*}}{\rightleftarrows}}\;\underset{\text{RKHS}}{\mathscr{H}_{K}}
\]

\begin{align}
T_{K}\mu & =\int\mu\left(dx\right)K\left(x,\cdot\right),\tag{7.13}\\
T_{K}^{*}f & =K^{-1}f,\tag{7.14}
\end{align}

where $K^{-1}$ is a Moore-Penrose inverse (see e.g., [MWS25]) of $K$, with $K$ interpreted as a kernel operator.

Proof.

We must show the following identity for the respective inner products, for all $f\in\mathscr{H}_{K}$ and $\mu\in\mathscr{M}_{2}\left(K\right)$:

\[
\left\langle T_{K}\mu,f\right\rangle_{\mathscr{H}_{K}}=\left\langle\mu,K^{-1}f\right\rangle_{\mathscr{M}_{2}\left(K\right)}.\tag{7.15}
\]

Using (7.13), the reproducing property, and (7.14), we arrive at the following:

\begin{align*}
\text{LHS}_{\left(7.15\right)} & =\left\langle\int\mu\left(dx\right)K\left(x,\cdot\right),f\right\rangle_{\mathscr{H}_{K}}\\
 & =\int\mu\left(dx\right)f\left(x\right)\\
 & =\int\mu\left(dx\right)\left(K\left(K^{-1}f\right)\right)\left(x\right)=\text{RHS}_{\left(7.15\right)}.
\end{align*}

Note that when $K^{-1}$ acts via the Moore-Penrose inverse on the function $f$, the result $K^{-1}f$ is a signed measure, and that is the interpretation used in the statement of the Proposition. ∎
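For a finite set $X$ the objects in Proposition 7.4 reduce to linear algebra, which gives a way to test (7.12) and (7.15) numerically. The Python sketch below is a toy model under explicit assumptions: $X$ consists of three points of $(-1,1)$, $K(x,y)=(1-xy)^{-1}$ has an invertible Gram matrix $G$ there, signed measures are weight vectors, $T_{K}\mu=G\mu$, the RKHS inner product is $\left\langle f,g\right\rangle=f^{T}G^{-1}g$, and $K^{-1}f$ is obtained by solving $Gw=f$ (in this invertible case the Moore-Penrose inverse is the ordinary inverse). The helper `solve` and all numerical values are illustrative, not part of the text.

```python
def kernel(x, y):
    """Assumed example kernel K(x, y) = 1 / (1 - x*y)."""
    return 1.0 / (1.0 - x * y)


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (M[r][n] - s) / M[r][r]
    return w


def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


points = [-0.5, 0.1, 0.6]   # the finite set X (hypothetical)
G = [[kernel(s, t) for t in points] for s in points]

mu = [1.0, -2.0, 0.5]       # signed measure: weights on the points of X
f = [0.2, -0.4, 1.0]        # values of a function f in H_K on X

TK_mu = matvec(G, mu)                   # (7.13): values of T_K mu on X
norm_sq_H = dot(TK_mu, solve(G, TK_mu)) # ||T_K mu||^2 in H_K
mu_K_mu = dot(mu, matvec(G, mu))        # mu K mu, as in (7.12)

K_inv_f = solve(G, f)                   # (7.14): weights of the measure K^{-1} f
lhs = dot(TK_mu, solve(G, f))           # <T_K mu, f> in H_K
rhs = dot(mu, matvec(G, K_inv_f))       # <mu, K^{-1} f> in M_2(K)

print(norm_sq_H, mu_K_mu, lhs, rhs)
```

In this toy model both sides of (7.15) reduce to $\sum_{i}\mu_{i}f\left(x_{i}\right)$, matching the middle line of the computation in the proof.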

Of special significance to the above discussion are the following citations [JST23, JT23b, MWS25, PDC+14, TXK23, ZCH19].

References

  • [AA23] Nourhane Attia and Ali Akgül, On solutions of biological models using reproducing Kernel Hilbert space method, Computational methods for biological models, Stud. Comput. Intell., vol. 1109, Springer, Singapore, [2023] ©2023, pp. 117–136. MR 4689655
  • [AAARM24] Taher Amoozad, Tofigh Allahviranloo, Saeid Abbasbandy, and Mohsen Rostamy Malkhalifeh, Using a new implementation of reproducing kernel Hilbert space method to solve a system of second-order BVPs, Int. J. Dyn. Control 12 (2024), no. 6, 1694–1706. MR 4751246
  • [ADR03] D. Alpay, A. Dijksma, and J. Rovnyak, A theorem of Beurling-Lax type for Hilbert spaces of functions analytic in the unit ball, Integral Equations Operator Theory 47 (2003), no. 3, 251–274. MR 2012838
  • [AJ21] Daniel Alpay and Palle E. T. Jorgensen, New characterizations of reproducing kernel Hilbert spaces and applications to metric geometry, Opuscula Math. 41 (2021), no. 3, 283–300. MR 4302453
  • [AJ22] Daniel Alpay and Palle Jorgensen, Reflection positivity via Krein space analysis, Adv. in Appl. Math. 141 (2022), Paper No. 102411, 45. MR 4467152
  • [AJP22] Daniel Alpay, Palle Jorgensen, and Motke Porat, White noise space analysis and multiplicative change of measures, J. Math. Phys. 63 (2022), no. 4, Paper No. 042102, 23. MR 4405120
  • [Alp15] Daniel Alpay, An advanced complex analysis problem book, Birkhäuser/Springer, Cham, 2015, Topological vector spaces, functional analysis, and Hilbert spaces of analytic functions. MR 3410523
  • [AMP92] Gregory T. Adams, Paul J. McGuire, and Vern I. Paulsen, Analytic reproducing kernels and multiplication operators, Illinois J. Math. 36 (1992), no. 3, 404–419. MR 1161974
  • [AP20] Daniel Alpay and Ismael L. Paiva, On the extension of positive definite kernels to topological algebras, J. Math. Phys. 61 (2020), no. 6, 063507, 10. MR 4113042
  • [Aro50] N. Aronszajn, Theory of reproducing kernels, Trans. Amer. Math. Soc. 68 (1950), 337–404. MR 51437
  • [Arv98] William Arveson, Subalgebras of $C^{*}$-algebras. III. Multivariable operator theory, Acta Math. 181 (1998), no. 2, 159–228. MR 1668582
  • [BB23] Anton Baranov and Timur Batenev, Representing systems of reproducing kernels in spaces of analytic functions, Results Math. 78 (2023), no. 4, Paper No. 143, 17. MR 4586941
  • [Ber55] Stefan Bergman, Bounds for analytic functions in domains with a distinguished boundary surface, Math. Z. 63 (1955), 173–194. MR 75656
  • [BGG99] Richard Beals, Peter Greiner, and Bernard Gaveau, Green’s functions for some highly degenerate elliptic operators, J. Funct. Anal. 165 (1999), no. 2, 407–429. MR 1698952
  • [CCF16] Min Chen, Yang Chen, and Engui Fan, Perturbed Hankel determinant, correlation functions and Painlevé equations, J. Math. Phys. 57 (2016), no. 2, 023501, 31. MR 3439677
  • [CCL17] Hong Rae Cho, Hyunil Choi, and Han-Wool Lee, Boundedness of the Segal-Bargmann transform on fractional Hermite-Sobolev spaces, J. Funct. Spaces (2017), Art. ID 9176914, 6. MR 3603383
  • [CDD15] Sergio Cruces and Iván Durán-Díaz, The minimum risk principle that underlies the criteria of bounded component analysis, IEEE Trans. Neural Netw. Learn. Syst. 26 (2015), no. 5, 964–981. MR 3454256
  • [DKS19] Kamal Diki, Rolf Sören Krausshar, and Irene Sabadini, On the Bargmann-Fock-Fueter and Bergman-Fueter integral transforms, J. Math. Phys. 60 (2019), no. 8, 083506, 26. MR 3994389
  • [Don74] William F. Donoghue, Jr., Monotone matrix functions and analytic continuation, Die Grundlehren der mathematischen Wissenschaften, Band 207, Springer-Verlag, New York-Heidelberg, 1974. MR 0486556
  • [DS20] Micho Durdevich and Stephen Bruce Sontz, Coherent states for the Manin plane via Toeplitz quantization, J. Math. Phys. 61 (2020), no. 2, 023502, 17. MR 4059341
  • [Gia21] Dimitrios Giannakis, Quantum dynamics of the classical harmonic oscillator, J. Math. Phys. 62 (2021), no. 4, Paper No. 042701, 45. MR 4241109
  • [Has21] Friedrich Haslinger, The generalized $\partial$-complex on the Segal-Bargmann space, Operator theory, functional analysis and applications, Oper. Theory Adv. Appl., vol. 282, Birkhäuser/Springer, Cham, [2021] ©2021, pp. 317–328. MR 4248024
  • [Hil98] Adrian T. Hill, Estimates on the Green’s function of second-order elliptic operators in $\mathbf{R}^{N}$, Proc. Roy. Soc. Edinburgh Sect. A 128 (1998), no. 5, 1033–1051. MR 1642132
  • [HJW20] John E. Herr, Palle E. T. Jorgensen, and Eric S. Weber, Harmonic analysis of fractal measures: basis and frame algorithms for fractal $L^{2}$-spaces, and boundary representations as closed subspaces of the Hardy space, Analysis, probability and mathematical physics on fractals, Fractals Dyn. Math. Sci. Arts Theory Appl., vol. 5, World Sci. Publ., Hackensack, NJ, [2020] ©2020, pp. 163–221. MR 4472249
  • [HMBV24] Boya Hou, Amarsagar Reddy Ramapuram Matavalam, Subhonmesh Bose, and Umesh Vaidya, Propagating uncertainty through system dynamics in reproducing kernel Hilbert space, Phys. D 463 (2024), Paper No. 134168, 9. MR 4735224
  • [HSZ+19] Bo He, Yan Song, Yuemei Zhu, Qixin Sha, Yue Shen, Tianhong Yan, Rui Nian, and Amaury Lendasse, Local receptive fields based extreme learning machine with hybrid filter kernels for image classification, Multidimens. Syst. Signal Process. 30 (2019), no. 3, 1149–1169. MR 3969356
  • [Jon09] Lee K. Jones, Local minimax learning of functions with best finite sample estimation error bounds: applications to ridge and lasso regression, boosting, tree learning, kernel machines, and inverse problems, IEEE Trans. Inform. Theory 55 (2009), no. 12, 5700–5727. MR 2597189
  • [Jor02] Palle E. T. Jorgensen, Diagonalizing operators with reflection symmetry, vol. 190, 2002, Special issue dedicated to the memory of I. E. Segal, pp. 93–132. MR 1895530
  • [JS21] Palle Jorgensen and David E. Stewart, Approximation properties of ridge functions and extreme learning machines, SIAM J. Math. Data Sci. 3 (2021), no. 3, 815–832. MR 4291375
  • [JST20] Palle Jorgensen, Myung-Sin Song, and James Tian, A Kaczmarz algorithm for sequences of projections, infinite products, and applications to frames in IFS $L^{2}$ spaces, Adv. Oper. Theory 5 (2020), no. 3, 1100–1131. MR 4126821
  • [JST23] Palle E. T. Jorgensen, Myung-Sin Song, and James Tian, Infinite-dimensional stochastic transforms and reproducing kernel Hilbert space, Sampl. Theory Signal Process. Data Anal. 21 (2023), no. 1, Paper No. 12, 27. MR 4561157
  • [JT22] Palle Jorgensen and James Tian, Reproducing kernels and choices of associated feature spaces, in the form of $L^{2}$-spaces, J. Math. Anal. Appl. 505 (2022), no. 2, Paper No. 125535, 31. MR 4295177
  • [JT23a] by same author, Harmonic analysis of network systems via kernels and their boundary realizations, Discrete Contin. Dyn. Syst. Ser. S 16 (2023), no. 2, 277–308. MR 4536608
  • [JT23b] Palle E. T. Jorgensen and James Tian, Dual pairs of operators, harmonic analysis of singular nonatomic measures and Krein-Feller diffusion, J. Operator Theory 89 (2023), no. 1, 205–248. MR 4567343
  • [JT23c] by same author, Stochastics and dynamics of fractals, Recent developments in operator theory, mathematical physics and complex analysis, Oper. Theory Adv. Appl., vol. 290, Birkhäuser/Springer, Cham, [2023] ©2023, pp. 171–216. MR 4590528
  • [Kis23] Vladimir V. Kisil, Cross-Toeplitz operators on the Fock-Segal-Bargmann spaces and two-sided convolutions on the Heisenberg group, Ann. Funct. Anal. 14 (2023), no. 2, Paper No. 38, 57. MR 4553935
  • [KL21] Seick Kim and Sungjin Lee, Estimates for Green’s functions of elliptic equations in non-divergence form with continuous coefficients, Ann. Appl. Math. 37 (2021), no. 2, 111–130. MR 4294330
  • [LCA+24] William Lippitt, Nichole E. Carlson, Jaron Arbet, Tasha E. Fingerlin, Lisa A. Maier, and Katerina Kechris, Limitations of clustering with PCA and correlated noise, J. Stat. Comput. Simul. 94 (2024), no. 10, 2291–2319. MR 4769269
  • [LG20] Marcos López-García, The weighted Bergman space on a sector and a degenerate parabolic equation, J. Math. Anal. Appl. 491 (2020), no. 2, 124344, 15. MR 4122067
  • [Loe48] Charles Loewner, A topological characterization of a class of integral operators, Ann. of Math. (2) 49 (1948), 316–332. MR 24487
  • [MDL19] Elisa Marcelli and Renato De Leone, Infinite Kernel Extreme Learning Machine, Advances in optimization and decision science for society, services and enterprises, AIRO Springer Ser., vol. 3, Springer, Cham, [2019] ©2019, pp. 95–105. MR 4300781
  • [MWS25] Haifeng Ma, Wen Wang, and Predrag S. Stanimirović, Weighted Moore-Penrose inverses for dual matrices and its applications, Appl. Math. Comput. 489 (2025), Paper No. 129145, 14. MR 4815055
  • [Nel58a] Edward Nelson, An existence theorem for second order parabolic equations, Trans. Amer. Math. Soc. 88 (1958), 414–429. MR 95341
  • [Nel58b] by same author, Kernel functions and eigenfunction expansions, Duke Math. J. 25 (1958), 15–27. MR 91442
  • [NSW11] P. Niyogi, S. Smale, and S. Weinberger, A topological view of unsupervised learning from noisy data, SIAM J. Comput. 40 (2011), no. 3, 646–663. MR 2810909
  • [PDC+14] Gianluigi Pillonetto, Francesco Dinuzzo, Tianshi Chen, Giuseppe De Nicolao, and Lennart Ljung, Kernel methods in system identification, machine learning and function estimation: a survey, Automatica J. IFAC 50 (2014), no. 3, 657–682. MR 3173967
  • [PR16] Vern I. Paulsen and Mrinal Raghupathi, An introduction to the theory of reproducing kernel Hilbert spaces, Cambridge Studies in Advanced Mathematics, vol. 152, Cambridge University Press, Cambridge, 2016. MR 3526117
  • [SBP23] Anirban Sen, Pintu Bhunia, and Kallol Paul, Bounds for the Berezin number of reproducing kernel Hilbert space operators, Filomat 37 (2023), no. 6, 1741–1749. MR 4569938
  • [Sch64] Laurent Schwartz, Sous-espaces hilbertiens d’espaces vectoriels topologiques et noyaux associés (noyaux reproduisants), J. Analyse Math. 13 (1964), 115–256. MR 179587
  • [SS17] Jan Stochel and Jerzy Bartłomiej Stochel, Composition operators on Hilbert spaces of entire functions with analytic symbols, J. Math. Anal. Appl. 454 (2017), no. 2, 1019–1066. MR 3658810
  • [Ste24] Ingo Steinwart, Reproducing kernel Hilbert spaces cannot contain all continuous functions on a compact metric space, Arch. Math. (Basel) 122 (2024), no. 5, 553–557. MR 4734568
  • [TXK23] Xin Tan, Yingcun Xia, and Efang Kong, Choosing shape parameters for regression in reproducing kernel Hilbert space and variable selection, J. Nonparametr. Stat. 35 (2023), no. 3, 514–528. MR 4635411
  • [vdL96] Angelika van der Linde, The invariance of statistical analyses with smoothing splines with respect to the inner product in the reproducing kernel Hilbert space, Statistical theory and computational aspects of smoothing (Semmering, 1994), Contrib. Statist., Physica, Heidelberg, 1996, pp. 149–164. MR 1482832
  • [WK23] Hengfang Wang and Jae Kwang Kim, Statistical inference using regularized M-estimation in the reproducing kernel Hilbert space for handling missing data, Ann. Inst. Statist. Math. 75 (2023), no. 6, 911–929. MR 4655783
  • [YTDMM11] Shi Yu, Léon-Charles Tranchevent, Bart De Moor, and Yves Moreau, Kernel-based data fusion for machine learning, Studies in Computational Intelligence, vol. 345, Springer-Verlag, Berlin, 2011, Methods and applications in bioinformatics and text mining. MR 3024752
  • [ZCH19] Yang Zhou, Di-Rong Chen, and Wei Huang, A class of optimal estimators for the covariance operator in reproducing kernel Hilbert spaces, J. Multivariate Anal. 169 (2019), 166–178. MR 3875593
  • [ZXZ09] Haizhang Zhang, Yuesheng Xu, and Jun Zhang, Reproducing kernel Banach spaces for machine learning, J. Mach. Learn. Res. 10 (2009), 2741–2775. MR 2579912
  • [ZZ23] Yilin Zhang and Liping Zhu, Projection divergence in the reproducing kernel Hilbert space: Asymptotic normality, block-wise and slicing estimation, and computational efficiency, J. Multivariate Anal. 197 (2023), Paper No. 105204. MR 4601874