New duality in choices of feature spaces via kernel analysis

Palle E.T. Jorgensen, Department of Mathematics, The University of Iowa, Iowa City, IA 52242-1419, U.S.A., palle-jorgensen@uiowa.edu, and James F. Tian, Mathematical Reviews, 416 4th Street, Ann Arbor, MI 48103-4816, U.S.A., james.ftian@gmail.com

Abstract.

We present a systematic study of the family of positive definite (p.d.) kernels with the use of their associated feature maps and feature spaces. For a fixed set $X$ , generalizing Loewner, we make precise the corresponding partially ordered set $Pos\left(X\right)$ of all p.d. kernels on $X$ , and we study its global properties. This new analysis includes both results dealing with applications and concrete examples, and it covers such general notions for $Pos\left(X\right)$ as the structure of its partial order, its products, sums, and limits, as well as their Hilbert space-theoretic counterparts. For this purpose, we introduce a new duality for feature spaces, feature selections, and feature mappings. For our analysis, we further introduce a general notion of dual pairs of p.d. kernels. Three special classes of kernels are studied in detail: (a) the case when the reproducing kernel Hilbert spaces (RKHSs) may be chosen as Hilbert spaces of analytic functions, (b) the case when they are realized in spaces of Schwartz distributions, and (c) the case when they arise as fractal limits. We further prove inverse theorems in which we derive results for the analysis of $Pos\left(X\right)$ from the operator theory of specified counterpart-feature spaces. We present constructions of new p.d. kernels in two ways: (i) as limits of monotone families in $Pos\left(X\right)$ , and (ii) as p.d. kernels which model fractal limits, i.e., are invariant with respect to certain iterated function systems (IFS)-transformations.

Key words and phrases:

Positive-definite kernel, feature space, feature selection, operator theory, reproducing kernel Hilbert space, Schwartz distributions, embedding problem, factorization, geometry, optimization, Principal Component Analysis, covariance kernels, kernel optimization, Gaussian process.

2020 Mathematics Subject Classification:

Primary 46E22. Secondary 47B32, 41A65, 42A82, 42C15, 60G15, 68T07.

1. Introduction

The purpose of this paper is to introduce a new duality for the study of feature spaces, feature selections, and feature mappings, which arise in diverse applications of kernel analysis of non-linear problems. Here, our use of the notion of “feature space” is in the sense of data science: it refers to the collections of features used to characterize the data at hand. By “feature selection,” we mean one or more techniques from machine learning, typically involving the choice of subsets of relevant features from the original set to enhance model performance. The term “feature mappings” refers to a technique in data analysis and machine learning for transforming input data from a lower-dimensional space to a higher-dimensional space using kernels, enabling easier analysis or classification. Choices of feature mapping involve constructions and optimization algorithms, which lead to the selection of specific functions. These mappings serve to transform the original data into a new set of features (feature spaces) that better capture the significant patterns in the data.

We consider families of positive definite (p.d.) kernels, defined on a product set $X\times X$ , where $X$ is merely a set with no extra a priori structure. The positivity condition for $K$ (p.d.), Definition 2.1, was first studied by Aronszajn [Aro50] and his contemporaries (see also [PR16, AMP92, ZZ23, SBP23]).

Pairs $\left(X,K\right)$ arise in various contexts, including optimization, principal component analysis (PCA), partial differential equations (PDEs), and statistical inference. In stochastic models, the p.d. kernel often serves as a covariance kernel of a Gaussian field. We emphasize that the set $X$ does not lend itself to direct analysis; in particular, it does not come equipped with any linear structure. However, measurements performed on $X$ often lead to p.d. kernels $K$ . Moreover, $K$ then allows one to represent data from $X$ in a linear space, often referred to as a feature space.

We say that a pair $\left(\phi,\mathscr{H}\right)$ consists of a feature map and a feature space for $K$ if $\phi$ is a function from $X$ into the Hilbert space $\mathscr{H}$ such that $K$ is recovered from the inner product in $\mathscr{H}$ via $\phi$ ; see Proposition 3.3 below. There is a vast variety of choices of feature selections of the form $\left(\phi,\mathscr{H}\right)$ , and Aronszajn’s reproducing kernel Hilbert space (RKHS), denoted $\mathscr{H}_{K}$ , is only one possibility.

In this paper, we introduce a new duality approach to the study of choices of feature selections, and apply it to particular p.d. kernels $\left(X,K\right)$ arising in both pure and applied mathematics. For related papers on feature selection, we refer to [JT23b, JST23, JT23a, AJ22, JT22, AJ21] as well as the references cited therein.

Organization. Our duality tools are outlined below, and presented in more detail in Section 3.1, especially Propositions 3.5 and 3.7. Section 4 deals with the need for choices of “bigger” spaces for implementation, which here includes choices of Hilbert spaces of Schwartz distributions, i.e., generalized functions. For optimization questions arising in practice, it is important to have a useful ordering of families of kernels, as well as monotone limit theorems for kernels; these two questions are addressed systematically in Section 6. Applications of our kernel duality principles and new transforms are addressed in Section 7.

Applications. While our focus is primarily theoretical, kernel theory and optimization have had significant impact on practical applications, particularly in machine learning algorithms and big data analysis [NSW11, PDC⁺14, YTDMM11, Jon09, ZXZ09, MDL19, HSZ⁺19]. Beyond these areas, kernel methods have found relevance in fields such as statistical inference, quantum dynamics, perturbation theory, and operator algebras, with applications ranging from multiplicative change-of-measure algorithms to the analysis of coherent states and Fock spaces [AJP22, Gia21, AP20, DS20, DKS19, CCF16]. This versatility has sparked renewed interest in both the theoretical foundations and practical applications of kernels, leading to deeper understanding of inference techniques and optimization models [WK23, ZCH19, vdL96, AAARM24, HMBV24, Ste24, AA23, TXK23].

Our approach focuses on identifying duality principles to better understand feature spaces and their role in kernel methods. While the “kernel trick” in machine learning often bypasses explicit constructions of the ambient feature space, studying its structure offers a richer theoretical perspective. Insights into feature space flexibility, stability, and scalability can inform the design and optimization of kernels, enhancing algorithmic robustness and interpretability. This dual perspective bridges practical applications like clustering, support-vector machines, and principal component analysis with the broader mathematical framework, enabling more sophisticated kernel-based solutions for complex data problems.

2. Preliminaries

The concept of a kernel is a powerful tool in machine learning, used for example in the design of support vector machines (SVMs). A kernel is a function that operates on pairs of points from the input space, commonly referred to as the $X$ space. It returns a scalar value, namely an inner product of the images of the two points in a higher-dimensional Hilbert space, known as the $Z$ space; a canonical choice is the reproducing kernel Hilbert space (RKHS). This scalar conveys how close or similar the vectors are in the $Z$ space. The kernel thus allows one to glean the necessary information about vectors in this more complex space without having to access the space directly, which makes it a powerful tool for classification tasks.

In this section, we introduce fundamental definitions, along with selected lemmas and properties that serve as key building blocks for the paper.

Definition 2.1 (Positive definite).

Let $X$ be a set.

(1)

A function $K:X\times X\rightarrow\mathbb{C}$ is said to be a positive definite (p.d.) kernel if, for all $n\in\mathbb{N}$ , all $\left(x_{i}\right)_{i=1}^{n}$ in $X$ , and all $\left(c_{i}\right)_{i=1}^{n}$ in $\mathbb{C}$ , we have

$\sum_{i=1}^{n}\sum_{j=1}^{n}\overline{c_{i}}c_{j}K\left(x_{i},x_{j}\right)\geq 0.$

(2)

Given a p.d. kernel $K:X\times X\rightarrow\mathbb{C}$ , let $\mathscr{H}_{K}$ be the Hilbert completion of the set $H_{0}=span\left\{K_{x}:x\in X\right\}$ , where $K_{x}\left(\cdot\right)=K\left(\cdot,x\right)$ , $x\in X$ , with respect to the norm

\left\|\sum_{i=1}^{n}c_{i}K_{x_{i}}\right\|_{\mathscr{H}_{K}}^{2}=\sum_{i=1}^{n}\sum_{j=1}^{n}\overline{c_{i}}c_{j}K\left(x_{i},x_{j}\right).

$\mathscr{H}_{K}$ is called the reproducing kernel Hilbert space (RKHS) of $K$ , and it has the reproducing property:

f\left(x\right)=\left\langle K_{x},f\right\rangle_{\mathscr{H}_{K}}

valid for all $f\in\mathscr{H}_{K}$ and $x\in X$ .

Throughout this paper, all the Hilbert spaces are assumed separable.
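Definition 2.1 can be probed numerically on a finite sample. The following is a minimal sketch, not part of the original text: it assumes the Szegő kernel $K\left(z,w\right)=\left(1-\overline{w}z\right)^{-1}$ on the unit disk as a test case, and the sample points, the evaluation point `x_eval`, and all function names are illustrative choices. It checks that the squared norm of $f=\sum_{i}c_{i}K_{x_{i}}$ is nonnegative, and that $\left|f\left(x\right)\right|^{2}\leq K\left(x,x\right)\left\|f\right\|_{\mathscr{H}_{K}}^{2}$ , which follows from the reproducing property and the Cauchy-Schwarz inequality.

```python
import random

def szego(z: complex, w: complex) -> complex:
    """Szego kernel K(z, w) = 1 / (1 - conj(w) * z) on the open unit disk."""
    return 1.0 / (1.0 - w.conjugate() * z)

def norm_sq(kernel, points, coeffs) -> float:
    """||sum_i c_i K_{x_i}||^2 = sum_{i,j} conj(c_i) c_j K(x_i, x_j)."""
    total = sum(
        ci.conjugate() * cj * kernel(xi, xj)
        for xi, ci in zip(points, coeffs)
        for xj, cj in zip(points, coeffs)
    )
    return total.real  # the imaginary part vanishes up to rounding

def evaluate(kernel, points, coeffs, x) -> complex:
    """f(x) for f = sum_i c_i K_{x_i}, where K_{x_i}(x) = K(x, x_i)."""
    return sum(ci * kernel(x, xi) for xi, ci in zip(points, coeffs))

# Illustrative sample points in the unit disk and an evaluation point.
points = [0.5 + 0.2j, -0.3 + 0.4j, 0.1 - 0.6j, -0.7 - 0.1j]
x_eval = 0.2 + 0.3j
rng = random.Random(0)

checks = []
for _ in range(200):
    coeffs = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in points]
    n2 = norm_sq(szego, points, coeffs)
    fx = evaluate(szego, points, coeffs, x_eval)
    bound = szego(x_eval, x_eval).real * n2
    # Positivity of the quadratic form, and the pointwise bound on f(x).
    checks.append(n2 >= -1e-9 and abs(fx) ** 2 <= bound + 1e-9)
all_ok = all(checks)
```

Random coefficient vectors only test a necessary condition on a fixed finite sample; they do not prove positive definiteness.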

Lemma 2.2 (Parseval frame).

Let $\mathscr{H}$ be a Hilbert space, and $\left\{f_{n}\right\}_{n\in\mathbb{N}}\subset\mathscr{H}$ . Suppose

\sum_{n\in\mathbb{N}}\left|\left\langle f_{n},h\right\rangle_{\mathscr{H}}\right|^{2}=\left\|h\right\|_{\mathscr{H}}^{2},\quad\forall h\in\mathscr{H}.

(2.1)

Then $\left\{f_{n}\right\}$ is an orthonormal basis (ONB) if and only if $\left\|f_{n}\right\|_{\mathscr{H}}=1$ for all $n\in\mathbb{N}$ .

Proof.

Fix $n_{0}$ and apply (2.1) to $h=f_{n_{0}}$ ; then

\left\|f_{n_{0}}\right\|_{\mathscr{H}}^{2}=\left\|f_{n_{0}}\right\|_{\mathscr{H}}^{4}+\sum_{n\neq n_{0}}\left|\left\langle f_{n},f_{n_{0}}\right\rangle_{\mathscr{H}}\right|^{2},

and so $\left\langle f_{n},f_{n_{0}}\right\rangle_{\mathscr{H}}=0$ , for all $n\in\mathbb{N}\backslash\left\{n_{0}\right\}$ , if $\left\|f_{n_{0}}\right\|_{\mathscr{H}}=1$ . Hence unit vectors satisfying (2.1) are orthonormal, and (2.1) shows that they are total in $\mathscr{H}$ . The converse is immediate. ∎

Lemma 2.3 (Kernel representation).

Let $K$ be a p.d. kernel on $X\times X$ .

(1)

A system of functions $\left\{f_{n}\right\}$ on $X$ is a Parseval frame for $\mathscr{H}_{K}$ if and only if

$K\left(x,y\right)=\sum_{n}\overline{f_{n}\left(x\right)}f_{n}\left(y\right),\quad x,y\in X.$ (2.2)

Moreover, when (2.2) holds, then $\left(f_{n}\left(x\right)\right)\in l^{2}$ for all $x\in X$ .
(2)

Further, if all the $f_{n}$ ’s are distinct, i.e., each with multiplicity one, then $\left\{f_{n}\right\}$ is an ONB.

Proof.

For (1), see e.g., [PR16]. Note $\sum_{n}\left|f_{n}\left(x\right)\right|^{2}=\left\|K_{x}\right\|_{\mathscr{H}_{K}}^{2}=K\left(x,x\right)<\infty$ , for all $x\in X$ .

Part (2) follows from the argument in Lemma 2.2, where $\mathscr{H}_{K}\simeq l^{2}$ with the isomorphism $f_{n}\mapsto e_{n}$ . ∎
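The expansion (2.2) can be illustrated on a concrete case. The sketch below assumes the kernel $K\left(x,y\right)=\left(1-xy\right)^{-1}$ on $\left(-1,1\right)$ with the real-valued system $f_{n}\left(x\right)=x^{n}$ (an illustrative choice); the truncation levels and sample points are arbitrary. It compares partial sums of (2.2) with the closed form, and checks that $\sum_{n}\left|f_{n}\left(x\right)\right|^{2}$ approaches $K\left(x,x\right)$ .

```python
def kernel(x: float, y: float) -> float:
    """Closed form K(x, y) = 1 / (1 - x y) on J x J, J = (-1, 1)."""
    return 1.0 / (1.0 - x * y)

def truncated_kernel(x: float, y: float, terms: int) -> float:
    """Partial sum of (2.2) with the assumed system f_n(x) = x**n."""
    return sum((x ** n) * (y ** n) for n in range(terms))

x, y = 0.6, -0.8

# Truncation errors for an increasing number of terms.
errors = [abs(kernel(x, y) - truncated_kernel(x, y, N)) for N in (5, 20, 80)]

# Diagonal case: sum_n |f_n(x)|^2 approaches K(x, x), so (f_n(x)) lies in l^2.
l2_norm_sq = truncated_kernel(x, x, 200)
```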

Proposition 2.4 (Products of p.d. kernels).

Let $K$ and $L$ be p.d. kernels defined on $X\times X$ . Set $M=KL$ as follows: $M\left(x,y\right)\coloneqq K\left(x,y\right)L\left(x,y\right)$ for $\left(x,y\right)\in X\times X$ . Then $M$ is also p.d. on $X\times X$ .

Proof.

Pick the representation (2.2) for $K$ , and consider $N\in\mathbb{N}$ , $c_{i}\in\mathbb{R}$ (or $\mathbb{C}$ ), $x_{i}\in X$ , $1\leq i\leq N$ . Then we get the desired conclusion as follows:

\sum_{i}\sum_{j}\overline{c_{i}}c_{j}M\left(x_{i},x_{j}\right)=\sum_{n}\left(\sum_{i}\sum_{j}\left(\overline{c_{i}f_{n}\left(x_{i}\right)}\right)\left(c_{j}f_{n}\left(x_{j}\right)\right)L\left(x_{i},x_{j}\right)\right)\geq 0,

where we used the p.d. property of $L$ in the last step. ∎
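The identity used in this proof can be checked numerically. The following sketch is only an illustration under stated assumptions: it takes $K\left(x,y\right)=\left(1-xy\right)^{-1}$ with $f_{n}\left(x\right)=x^{n}$ , and $L\left(x,y\right)=e^{xy}$ , both on $\left(-1,1\right)$ , with real coefficients; the sample points and the truncation at 400 terms are arbitrary choices. It compares the quadratic form of $M=KL$ with the sum over $n$ of quadratic forms of $L$ with rescaled coefficients $c_{i}f_{n}\left(x_{i}\right)$ .

```python
import math
import random

def k_geom(x: float, y: float) -> float:
    """K(x, y) = 1 / (1 - x y) = sum_n x**n y**n on (-1, 1)."""
    return 1.0 / (1.0 - x * y)

def k_exp(x: float, y: float) -> float:
    """L(x, y) = exp(x y), a second p.d. kernel chosen for illustration."""
    return math.exp(x * y)

def quad_form(kernel, points, coeffs) -> float:
    """sum_{i,j} c_i c_j kernel(x_i, x_j) for real coefficients."""
    return sum(
        ci * cj * kernel(xi, xj)
        for xi, ci in zip(points, coeffs)
        for xj, cj in zip(points, coeffs)
    )

points = [-0.8, -0.3, 0.1, 0.5, 0.9]
rng = random.Random(1)
coeffs = [rng.gauss(0, 1) for _ in points]

# Left-hand side: quadratic form of the product kernel M = K L.
lhs = quad_form(lambda x, y: k_geom(x, y) * k_exp(x, y), points, coeffs)

# Right-hand side: for each n, a quadratic form of L with coefficients c_i f_n(x_i).
terms = [
    quad_form(k_exp, points, [c * x ** n for c, x in zip(coeffs, points)])
    for n in range(400)
]
rhs = sum(terms)
```

Each entry of `terms` is a quadratic form of the p.d. kernel $L$ , hence nonnegative, which is exactly the step that closes the proof.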

Definition 2.5 (Loewner order, see e.g., [Don74, Loe48]).

For p.d. kernels $K,L$ on $X\times X$ , we say $K\leq L$ if $L-K$ is p.d.

Lemma 2.6.

If $K:X\times X\rightarrow\mathbb{C}$ is p.d. and $K\geq 1$ , then $K^{n}\leq K^{n+1}$ , $n\in\mathbb{N}$ .

Proof.

Recall that products of p.d. kernels are p.d. (see Proposition 2.4, and also Proposition 3.7). Therefore, $K^{n+1}-K^{n}=K^{n}\left(K-1\right)\geq 0$ . ∎

Example 2.7.

Let $K\left(z,w\right)=\left(1-\overline{w}z\right)^{-1}$ be the Szegő kernel on $\mathbb{D}\times\mathbb{D}$ , where $\mathbb{D}=\left\{z\in\mathbb{C}:\left|z\right|<1\right\}$ . Then

\left(1-\overline{w}z\right)^{-1}=\sum_{n=0}^{\infty}\overline{w}^{n}z^{n}\geq 1,

in the sense of Definition 2.5, and so

1\leq K\leq K^{2}\leq\cdots\leq K^{n}.

More specifically,

1\leq\frac{1}{1-\overline{w}z}\leq\frac{1}{\left(1-\overline{w}z\right)^{2}}\leq\frac{1}{\left(1-\overline{w}z\right)^{3}}\leq\cdots\leq\frac{1}{\left(1-\overline{w}z\right)^{n}}.

Here, $K^{2}\left(z,w\right)=\left(1-\overline{w}z\right)^{-2}$ is the Bergman kernel.

Throughout the paper, we will revisit this family of kernels and its variations, both for motivations and for illustrations of general results. The reader may refer to Examples 3.4, 3.6, 4.2, 5.5, 6.3, as well as Corollaries 4.3, 7.3, and Remark 5.11.
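The chain $1\leq K\leq K^{2}\leq\cdots$ can be tested on finite samples. The sketch below is a hedged illustration, with arbitrarily chosen sample points and helper names: it evaluates the quadratic form of $K^{n+1}-K^{n}$ for the Szegő kernel with random complex coefficients; by Definition 2.5, every such value should be nonnegative.

```python
import random

def szego(z: complex, w: complex) -> complex:
    """Szego kernel K(z, w) = 1 / (1 - conj(w) * z)."""
    return 1.0 / (1.0 - w.conjugate() * z)

def gap_form(n: int, points, coeffs) -> float:
    """Quadratic form of K**(n+1) - K**n; the case n = 0 gives K - 1."""
    total = 0j
    for zi, ci in zip(points, coeffs):
        for zj, cj in zip(points, coeffs):
            k = szego(zi, zj)
            total += ci.conjugate() * cj * (k ** (n + 1) - k ** n)
    return total.real

# Illustrative nonzero sample points in the unit disk.
points = [0.5 + 0.2j, -0.3 + 0.4j, 0.1 - 0.6j, -0.7 - 0.1j]
rng = random.Random(4)

def random_coeffs():
    return [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in points]

# Smallest observed value over several exponents and coefficient draws.
smallest = min(
    gap_form(n, points, random_coeffs())
    for n in range(5)
    for _ in range(100)
)
```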

Proposition 2.8 (Monotonicity).

Consider two pairs of p.d. kernels $K_{i}$ and $L_{i}$ , and form the product $P_{i}:=K_{i}L_{i}$ , $i=1,2$ . If $K_{1}\leq K_{2}$ , $L_{1}\leq L_{2}$ , then $P_{1}\leq P_{2}$ .

Proof.

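By assumption, $K_{2}-K_{1}$ and $L_{2}-L_{1}$ are p.d., so by Proposition 2.4 both products on the right-hand side below are p.d., and so is their sum: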

P_{2}-P_{1}=K_{2}L_{2}-K_{1}L_{1}=\left(K_{2}-K_{1}\right)L_{2}+K_{1}\left(L_{2}-L_{1}\right)\geq 0.

∎

Since our paper is interdisciplinary, it combines topics from pure and applied mathematics. Of special significance to the above are the following citations [AJ21, BGG99, JS21, JST20, MDL19, WK23, Ste24].

3. Feature space realizations

The purpose of the present section is to outline key links connecting the notions from Section 2 to the present applications. We outline the main connections between the key pure math notions (especially the kernel duality introduced above) and the applied notions, focusing on feature selection, feature maps, and kernel machines.

With “feature selection” we refer to the part of machine learning that identifies the “best” insights into phenomena/observations. A feature is an input, i.e., a measurable property of the phenomena. In statistical learning, features are often identified with choices of independent random variables, typically identically distributed (i.i.d.); see below. More generally, learning algorithms serve to identify features that yield better models. Features come in several forms; for example, they may be numeric or qualitative. “Good” feature selections in turn let us identify the important or significant patterns that distinguish between data forms and instances. Indeed, machine learning is truly multidisciplinary, as is reflected, for example, in how features are viewed. One view is geometric: it treats features as tuples, or vectors, in a high-dimensional space, the feature space. Equally important is the probabilistic perspective, i.e., viewing features as multivariate random variables. The following references may be helpful: [AA23, BB23, Gia21, Jon09, MDL19, PDC⁺14, ZCH19], and [ZZ23]. An important tool for feature selection in the statistical setting is principal component analysis (PCA) [CDD15]. It is a dimensionality reduction of features, i.e., a reduction of the dimensionality of large data sets. With the use of choices of covariance kernels, it allows one to transform large sets of variables into smaller ones that still contain most of the information in the large set; see e.g., [LCA⁺24].

As noted above, in applications such as data analysis, the initial set $X$ is general and typically unstructured. In particular, choices of sets $X$ may not have any linear structure. Nonetheless, in the design of optimization models (for example in statistical inference, and in machine learning models), there will in fact be natural choices of families of positive definite (p.d.) kernels, specified on $X\times X$ . Each such p.d. kernel will then yield an RKHS, denoted here $\mathscr{H}_{K}$ . The space $\mathscr{H}_{K}$ does present one possible choice of feature space (Definition 3.1), but applications dictate a qualitative and quantitative comparison within the variety of feature spaces for a single p.d. kernel $K$ . In particular, we study the possibility of a second p.d. kernel, say $L$ , serving to generate a feature space for $K$ (Proposition 3.3). These themes are addressed below. We further study operations on the variety of p.d. kernels, as they relate to feature space selection questions.

Definition 3.1 (Feature map, and feature space).

Given a p.d. kernel $K:X\times X\rightarrow\mathbb{C}$ defined on a set $X$ , a Hilbert space $\mathscr{L}$ is said to be a feature space for $K$ if there is a map $\varphi:X\rightarrow\mathscr{L}$ , such that $K\left(x,y\right)=\left\langle\varphi\left(x\right),\varphi\left(y\right)\right\rangle_{\mathscr{L}}$ , for all $x,y\in X$ . Set

H_{S}\left(K\right)\coloneqq\left\{\left(\varphi,\mathscr{L}\right):K\left(x,y\right)=\left\langle\varphi\left(x\right),\varphi\left(y\right)\right\rangle_{\mathscr{L}},\>x,y\in X\right\}.

Remark 3.2.

$H_{S}\left(K\right)\neq\emptyset$ . Some basic examples include:

(1)

$\varphi\left(x\right)=K_{x}$ , $\mathscr{L}=\mathscr{H}_{K}$ (the RKHS of $K$ ), and $K\left(x,y\right)=\left\langle K_{x},K_{y}\right\rangle_{\mathscr{H}_{K}}$ .
(2)

$\varphi\left(x\right)=\delta_{x}$ , $\mathscr{L}=\overline{span}\left\{\delta_{x}:x\in X\right\}$ , where the Hilbert completion is with respect to

$\left\|\sum_{i=1}^{N}c_{i}\delta_{x_{i}}\right\|_{\mathscr{L}}^{2}=\sum_{i,j=1}^{N}\overline{c_{i}}c_{j}K\left(x_{i},x_{j}\right),$

and $K\left(x,y\right)=\left\langle\delta_{x},\delta_{y}\right\rangle_{\mathscr{L}}$ .
(3)

$\varphi\left(x\right)=W_{x}\sim N\left(0,K\left(x,x\right)\right)$ , i.e., $\left(W_{x}\right)_{x\in X}$ is a mean zero Gaussian field, realized in $\mathscr{L}=L^{2}\left(\Omega,\mathbb{P}\right)$ ; and $K\left(x,y\right)=\left\langle W_{x},W_{y}\right\rangle_{L^{2}\left(\mathbb{P}\right)}$ .

More general constructions are considered below.

Proposition 3.3.

Given a p.d. kernel $K$ on $X\times X$ , then for every Hilbert space $\mathscr{L}$ with $\dim\mathscr{L}=\dim\mathscr{H}_{K}$ , there exists $\varphi$ such that $\left(\varphi,\mathscr{L}\right)\in H_{S}\left(K\right)$ .

Proof.

Let $\left\{f_{n}\right\}$ be an ONB in $\mathscr{H}_{K}$ , and $\left\{\zeta_{n}\right\}$ an ONB in $\mathscr{L}$ . Then the map

\varphi\left(x\right)=\sum_{n}f_{n}\left(x\right)\zeta_{n}\in\mathscr{L}

(3.1)

is well defined since $\left(f_{n}\left(x\right)\right)\in l^{2}$ , $x\in X$ (Lemma 2.3), and

\left\langle\varphi\left(x\right),\varphi\left(y\right)\right\rangle_{\mathscr{L}}=\sum_{n}\overline{f_{n}\left(x\right)}f_{n}\left(y\right)=K\left(x,y\right),\quad x,y\in X.

∎
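The construction (3.1) can be sketched in finite dimensions. The code below is an illustration under explicit assumptions: the system $f_{n}\left(x\right)=x^{n}$ is truncated at $N=12$ terms (so it reproduces a truncation of $\left(1-xy\right)^{-1}$ ), $\mathscr{L}=\mathbb{R}^{N}$ , and the orthonormal basis `zeta` is produced by Gram-Schmidt from random vectors. The point is that the recovered inner product does not depend on which orthonormal basis of $\mathscr{L}$ is used.

```python
import math
import random

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent real vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = sum(bi * wi for bi, wi in zip(b, w))
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        basis.append([wi / norm for wi in w])
    return basis

N = 12  # truncation level (assumption of this sketch)
rng = random.Random(2)
zeta = gram_schmidt([[rng.gauss(0, 1) for _ in range(N)] for _ in range(N)])

def feature_map(x: float):
    """phi(x) = sum_n f_n(x) zeta_n with the assumed system f_n(x) = x**n, n < N."""
    phi = [0.0] * N
    for n in range(N):
        fn = x ** n
        for k in range(N):
            phi[k] += fn * zeta[n][k]
    return phi

def inner(u, v) -> float:
    return sum(a * b for a, b in zip(u, v))

x, y = 0.4, -0.7
truncated = sum((x * y) ** n for n in range(N))    # truncated kernel value
recovered = inner(feature_map(x), feature_map(y))  # <phi(x), phi(y)> in R^N
closed_form = 1.0 / (1.0 - x * y)
```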

Example 3.4.

Let $K:X\times X\rightarrow\mathbb{C}$ be p.d., and $\mathscr{H}_{K}$ the associated RKHS. Let $\left\{f_{n}\right\}$ be an ONB for $\mathscr{H}_{K}$ .

(1)

Let $\left\{Z_{n}\right\}$ be a sequence of i.i.d. Gaussian random variables, where $Z_{n}\sim N\left(0,1\right)$ , realized on a probability space $\left(\Omega,\mathscr{C},\mathbb{P}\right)$ , and hence in $L^{2}\left(\Omega,\mathbb{P}\right)$ . Here, one may take $\Omega=\prod_{\mathbb{N}}\mathbb{R}$ , equipped with the $\sigma$ -algebra $\mathscr{C}$ generated by the cylinder sets. Define

$\varphi\left(x\right)=\sum_{n}f_{n}\left(x\right)Z_{n}\left(\cdot\right)\in L^{2}\left(\Omega,\mathbb{P}\right).$

Then $\left(\varphi,L^{2}\left(\Omega,\mathbb{P}\right)\right)\in H_{S}\left(K\right)$ , and

$K\left(x,y\right)=\left\langle\varphi\left(x\right),\varphi\left(y\right)\right\rangle_{L^{2}\left(\Omega,\mathbb{P}\right)}.$
(2)

The above holds, in particular, when $K$ is the reproducing kernel of the Bergman space $B_{2}\left(\Omega\right)$ , $\Omega\subset\mathbb{C}^{n}$ . For $n=1$ , $\Omega=\mathbb{D}$ ,

$K\left(z,w\right)=\left(1-\overline{w}z\right)^{-2}=\sum_{n=0}^{\infty}\left(n+1\right)\overline{w}^{n}z^{n},$

where $\left\{\sqrt{1+n}z^{n}\right\}_{n\in\mathbb{N}_{0}}$ is an ONB for $\mathscr{H}_{K}$ . Setting

$\varphi\left(z\right)=\sum_{n=0}^{\infty}\sqrt{n+1}z^{n}Z_{n}\left(\cdot\right)\in L^{2}\left(\Omega,\mathbb{P}\right),$

then

$K\left(z,w\right)=\left\langle\varphi\left(z\right),\varphi\left(w\right)\right\rangle_{L^{2}\left(\Omega,\mathbb{P}\right)}.$

(3)

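Choose $\mathscr{L}$ to be any $L^{2}$ -space, e.g., $\mathscr{L}=L^{2}\left(M,\mu\right)$ , with $\dim\mathscr{L}=\dim\mathscr{H}_{K}$ as in Proposition 3.3, and let $\left\{\zeta_{n}\right\}$ be an ONB for $\mathscr{L}$ . Then, with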

\varphi\left(x\right)=\sum_{n}f_{n}\left(x\right)\zeta_{n}\left(\cdot\right)\in L^{2}\left(M,\mu\right),

we have

K\left(x,y\right)=\int_{M}\overline{\varphi\left(x\right)}\varphi\left(y\right)d\mu=\left\langle\varphi\left(x\right),\varphi\left(y\right)\right\rangle_{L^{2}\left(\mu\right)}.

Here the variable $m\in M$ is suppressed.
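The Gaussian realization in part (1) lends itself to a Monte Carlo check. The following is a rough sketch under stated assumptions: the kernel is $\left(1-xy\right)^{-1}$ on $\left(-1,1\right)$ with $f_{n}\left(x\right)=x^{n}$ , the series is truncated at 20 terms, and the sample size and seed are arbitrary. It estimates $\mathbb{E}\left[\varphi\left(x\right)\varphi\left(y\right)\right]$ and compares the result with $K\left(x,y\right)$ .

```python
import random

TERMS = 20       # truncation of the series (assumption of this sketch)
SAMPLES = 20000  # Monte Carlo sample size (arbitrary)

def sample_pair(fx, fy, rng):
    """One joint sample of (phi(x), phi(y)) with phi(x) = sum_n f_n(x) Z_n."""
    z = [rng.gauss(0.0, 1.0) for _ in fx]  # i.i.d. N(0, 1) variables
    wx = sum(a * zn for a, zn in zip(fx, z))
    wy = sum(b * zn for b, zn in zip(fy, z))
    return wx, wy

x, y = 0.3, 0.5
fx = [x ** n for n in range(TERMS)]
fy = [y ** n for n in range(TERMS)]

rng = random.Random(3)
acc = 0.0
for _ in range(SAMPLES):
    wx, wy = sample_pair(fx, fy, rng)
    acc += wx * wy

estimate = acc / SAMPLES    # empirical covariance of phi(x) and phi(y)
exact = 1.0 / (1.0 - x * y)  # K(x, y)
```

The agreement is only statistical: the error decays like the inverse square root of the sample size.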

3.1. A duality for feature selections

In Proposition 3.3, the case when $\mathscr{L}$ is another RKHS is of particular interest in the analysis below, as it offers a certain symmetry between the two RKHSs and their feature selections. This is stated as follows:

Proposition 3.5 (duality).

Let $K,L$ be p.d. kernels on $X\times X$ , and let $\mathscr{H}_{K},\mathscr{H}_{L}$ be the corresponding RKHSs. Choose an ONB $\left\{f_{n}\right\}$ for $\mathscr{H}_{K}$ , and $\left\{g_{n}\right\}$ for $\mathscr{H}_{L}$ . Define the following vector-valued functions on $X$ :

$\varphi\left(x\right)=\sum_{n}f_{n}\left(x\right)g_{n}\left(\cdot\right)\in\mathscr{H}_{L},$

$\psi\left(x\right)=\sum_{n}f_{n}\left(\cdot\right)g_{n}\left(x\right)\in\mathscr{H}_{K}.$

Then,

$\left(\varphi,\mathscr{H}_{L}\right)\in H_{S}\left(K\right),\;\text{and}\;\left(\psi,\mathscr{H}_{K}\right)\in H_{S}\left(L\right).$

Proof.

Note that $\varphi,\psi$ are well defined since, by Lemma 2.3, $\left(f_{n}\left(x\right)\right),\left(g_{n}\left(x\right)\right)\in l^{2}$ for all $x\in X$ . Moreover, for all $x,y\in X$ ,

$\left\langle\varphi\left(x\right),\varphi\left(y\right)\right\rangle_{\mathscr{H}_{L}}=\sum_{n}\overline{f_{n}\left(x\right)}f_{n}\left(y\right)=K\left(x,y\right),\;\text{and}$

$\left\langle\psi\left(x\right),\psi\left(y\right)\right\rangle_{\mathscr{H}_{K}}=\sum_{n}\overline{g_{n}\left(x\right)}g_{n}\left(y\right)=L\left(x,y\right),$

which is the desired conclusion. ∎
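The symmetry in Proposition 3.5 can be made concrete. The sketch below assumes, for illustration only, $K\left(x,y\right)=\left(1-xy\right)^{-1}$ with ONB $f_{n}\left(x\right)=x^{n}$ , and $L\left(x,y\right)=e^{xy}$ with ONB $g_{n}\left(x\right)=x^{n}/\sqrt{n!}$ , both on $\left(-1,1\right)$ and truncated at 40 terms. Vectors in $\mathscr{H}_{L}$ and $\mathscr{H}_{K}$ are stored as coefficient lists with respect to the respective ONBs, so inner products are plain dot products.

```python
import math

N = 40  # truncation level (assumption of this sketch)

def f(n: int, x: float) -> float:
    """Assumed ONB of H_K for K(x, y) = 1 / (1 - x y)."""
    return x ** n

def g(n: int, x: float) -> float:
    """Assumed ONB of H_L for L(x, y) = exp(x y)."""
    return x ** n / math.sqrt(math.factorial(n))

def phi(x: float):
    """Coefficients of phi(x) = sum_n f_n(x) g_n in the ONB {g_n} of H_L."""
    return [f(n, x) for n in range(N)]

def psi(x: float):
    """Coefficients of psi(x) = sum_n g_n(x) f_n in the ONB {f_n} of H_K."""
    return [g(n, x) for n in range(N)]

def inner(u, v) -> float:
    return sum(a * b for a, b in zip(u, v))

x, y = 0.5, -0.6
k_value = inner(phi(x), phi(y))  # recovers K(x, y) inside H_L
l_value = inner(psi(x), psi(y))  # recovers L(x, y) inside H_K
```

In this example both maps come from the same two-variable series $\sum_{n}x^{n}t^{n}/\sqrt{n!}$ ; which kernel is recovered depends only on the space in which the inner product is taken.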

Example 3.6.

Consider the Szegő kernel $K_{Sz}\left(z,w\right)=\left(1-\overline{w}z\right)^{-1}$ , $\left(z,w\right)\in\mathbb{D}\times\mathbb{D}$ . Its RKHS is the Hardy space $H_{2}\left(\mathbb{D}\right)=\left\{\sum_{n=0}^{\infty}c_{n}z^{n}:\left(c_{n}\right)\in l^{2}\right\}$ with ONB $\left\{z^{n}\right\}_{n\in\mathbb{N}_{0}}$ .

Let $K\left(x,y\right)=\left(1-xy\right)^{-1}=\sum_{n=0}^{\infty}x^{n}y^{n}$ , defined on $J\times J$ , where $J=\left(-1,1\right)$ . That is, $K=K_{Sz}\big{|}_{J\times J}$ . It follows from Lemma 2.3 that $\left\{x^{n}\right\}_{n\in\mathbb{N}_{0}}$ is an ONB for $\mathscr{H}_{K}$ . Setting

\varphi\left(x\right)=\sum_{n\in\mathbb{N}_{0}}x^{n}z^{n},

then $\left(\varphi,H_{2}\left(\mathbb{D}\right)\right)\in H_{S}\left(K\right)$ by Proposition 3.5.

3.2. Operations

A key property in the choice of Hilbert spaces when constructing feature spaces for positive definite (p.d.) kernels is that tensor products behave well; specifically, the category of Hilbert spaces is closed under tensor products, meaning that the tensor product formed from two Hilbert spaces is a new and canonically defined Hilbert space, see e.g., [PR16]. Here we take advantage of this geometric fact, showing e.g., that if a p.d. kernel $M$ arises as a product $M=KL$ (Proposition 2.4), then feature spaces for $M$ arise as tensor products of the feature spaces for the respective factors $K$ and $L$ .

Proposition 3.7.

Let $K_{i}$ , $i=1,2$ , be p.d. kernels on $X\times X$ , and let

K\left(x,y\right)\coloneqq K_{1}\left(x,y\right)K_{2}\left(x,y\right),\quad x,y\in X,

(3.2)

the Hadamard product. Suppose

\left(\varphi_{i},\mathscr{H}_{i}\right)\in H_{S}\left(K_{i}\right),\quad i=1,2.

(3.3)

Set $\mathscr{H}\coloneqq\mathscr{H}_{1}\otimes\mathscr{H}_{2}$ (as tensor product in the category of Hilbert spaces), and

\varphi\left(x\right)=\varphi_{1}\left(x\right)\otimes\varphi_{2}\left(x\right),\quad x\in X.

(3.4)

Then

\left(\varphi,\mathscr{H}\right)\in H_{S}\left(K\right).

Proof.

We have

$K\left(x,y\right)\underset{\left(3.2\right)}{=}K_{1}\left(x,y\right)K_{2}\left(x,y\right)$

$\underset{\left(3.3\right)}{=}\left\langle\varphi_{1}\left(x\right),\varphi_{1}\left(y\right)\right\rangle_{\mathscr{H}_{1}}\left\langle\varphi_{2}\left(x\right),\varphi_{2}\left(y\right)\right\rangle_{\mathscr{H}_{2}}$

$\underset{\left(3.4\right)}{=}\left\langle\varphi\left(x\right),\varphi\left(y\right)\right\rangle_{\mathscr{H}_{1}\otimes\mathscr{H}_{2}}.$

∎
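In finite dimensions the tensor product in (3.4) is the Kronecker product of coordinate vectors. The sketch below uses two kernels chosen purely for illustration (they do not appear in the text): $K_{1}\left(x,y\right)=\left(1+xy\right)^{2}$ with feature map $\left(1,\sqrt{2}x,x^{2}\right)\in\mathbb{R}^{3}$ , and $K_{2}\left(x,y\right)=\cos\left(x-y\right)$ with feature map $\left(\cos x,\sin x\right)\in\mathbb{R}^{2}$ . The six-dimensional tensor feature map then reproduces the product kernel.

```python
import math

def phi1(x: float):
    """Feature map for K1(x, y) = (1 + x y)**2 in R^3."""
    return [1.0, math.sqrt(2.0) * x, x * x]

def phi2(x: float):
    """Feature map for K2(x, y) = cos(x - y) in R^2."""
    return [math.cos(x), math.sin(x)]

def tensor(u, v):
    """Coordinates of u (x) v in the product basis (Kronecker product)."""
    return [a * b for a in u for b in v]

def inner(u, v) -> float:
    return sum(a * b for a, b in zip(u, v))

def phi(x: float):
    """phi(x) = phi1(x) (x) phi2(x), a vector in R^6."""
    return tensor(phi1(x), phi2(x))

x, y = 0.7, -1.3
product_kernel = (1.0 + x * y) ** 2 * math.cos(x - y)  # K1(x, y) * K2(x, y)
recovered = inner(phi(x), phi(y))                      # <phi(x), phi(y)> in R^6
```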

Proposition 3.8.

Let $I$ be an index set. Suppose $\left(\varphi_{i},\mathscr{H}_{i}\right)\in H_{S}\left(K_{i}\right)$ , $i\in I$ , and $\sum_{i\in I}K_{i}\left(x,x\right)<\infty$ for all $x\in X$ . Let $K\coloneqq\sum_{i\in I}K_{i}$ , $\mathscr{H}\coloneqq\oplus_{i\in I}\mathscr{H}_{i}$ , and $\varphi\left(x\right)\coloneqq\oplus_{i\in I}\varphi_{i}\left(x\right)$ . Then, $\left(\varphi,\mathscr{H}\right)\in H_{S}\left(K\right)$ .

Proof.

Note $K$ is well defined if and only if $\sum_{i\in I}K_{i}\left(x,x\right)<\infty$ , for all $x\in X$ . By assumptions,

\left\langle\varphi\left(x\right),\varphi\left(y\right)\right\rangle_{\mathscr{H}}=\sum_{i\in I}\left\langle\varphi_{i}\left(x\right),\varphi_{i}\left(y\right)\right\rangle_{\mathscr{H}_{i}}=\sum_{i\in I}K_{i}\left(x,y\right)=K\left(x,y\right).

∎

Lemma 3.9 (sums of p.d. kernels).

Let $K_{i}$ , $i=1,2$ , be p.d. kernels on $X\times X$ . Then it is immediate that $K=K_{1}+K_{2}$ is also p.d. Indeed, in the notation of Definition 3.1, for every $\left(\varphi_{i},\mathscr{L}_{i}\right)\in H_{S}\left(K_{i}\right)$ , $i=1,2$ , we have

\left(\varphi^{\oplus}\left(x\right)=\varphi_{1}\left(x\right)\oplus\varphi_{2}\left(x\right),\mathscr{L}_{1}\oplus\mathscr{L}_{2}\right)\in H_{S}\left(K\right).

Proof.

$\left\langle\varphi^{\oplus}\left(x\right),\varphi^{\oplus}\left(y\right)\right\rangle_{\mathscr{L}_{1}\oplus\mathscr{L}_{2}}=\left\langle\varphi_{1}\left(x\right),\varphi_{1}\left(y\right)\right\rangle_{\mathscr{L}_{1}}+\left\langle\varphi_{2}\left(x\right),\varphi_{2}\left(y\right)\right\rangle_{\mathscr{L}_{2}}$

$=K_{1}\left(x,y\right)+K_{2}\left(x,y\right)=K\left(x,y\right).$

∎

However, the RKHS of $K=K_{1}+K_{2}$ is in general not a direct sum Hilbert space. By [Aro50], the norm of $F\in\mathscr{H}_{K}$ is represented as

\left\|F\right\|_{\mathscr{H}_{K}}^{2}=\inf_{F_{1},F_{2}}\left\{\left\|F_{1}\right\|_{\mathscr{H}_{K_{1}}}^{2}+\left\|F_{2}\right\|_{\mathscr{H}_{K_{2}}}^{2}:F_{1}+F_{2}=F,\;F_{i}\in\mathscr{H}_{K_{i}}\right\}.

Indeed, the RKHS $\mathscr{H}_{K}$ takes the form

\mathscr{H}_{K}\simeq\left(\mathscr{H}_{K_{1}}\oplus\mathscr{H}_{K_{2}}\right)\ominus N

where

N=\left\{\left(f,-f\right)\in\mathscr{H}_{K_{1}}\oplus\mathscr{H}_{K_{2}},f\in\mathscr{H}_{K_{1}}\cap\mathscr{H}_{K_{2}}\right\}.
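As a sanity check of this description (a worked special case added for illustration, not taken from the cited sources), take $K_{1}=K_{2}$ , so that $K=2K_{1}$ and $\mathscr{H}_{K_{1}}\cap\mathscr{H}_{K_{2}}=\mathscr{H}_{K_{1}}$ . For $F\in\mathscr{H}_{K_{1}}$ , every decomposition has the form $F_{1}=\frac{1}{2}F+G$ , $F_{2}=\frac{1}{2}F-G$ with $G\in\mathscr{H}_{K_{1}}$ , and the parallelogram law gives

$\left\|F_{1}\right\|_{\mathscr{H}_{K_{1}}}^{2}+\left\|F_{2}\right\|_{\mathscr{H}_{K_{1}}}^{2}=\frac{1}{2}\left\|F\right\|_{\mathscr{H}_{K_{1}}}^{2}+2\left\|G\right\|_{\mathscr{H}_{K_{1}}}^{2}.$

The infimum is therefore attained at $G=0$ , and

$\left\|F\right\|_{\mathscr{H}_{K}}^{2}=\frac{1}{2}\left\|F\right\|_{\mathscr{H}_{K_{1}}}^{2},$

in agreement with the rescaling rule $\left\|F\right\|_{\mathscr{H}_{cK_{1}}}^{2}=c^{-1}\left\|F\right\|_{\mathscr{H}_{K_{1}}}^{2}$ for $c>0$ . In this case $N$ is the anti-diagonal $\left\{\left(f,-f\right)\right\}$ , its orthogonal complement is the diagonal $\left\{\left(f,f\right)\right\}$ , and $\mathscr{H}_{K}$ is a proper quotient of the direct sum.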

Of special significance to the above discussion are the following citations [AA23, AAARM24, CDD15, Jon09, NSW11, YTDMM11, ZCH19].

4. Hilbert space of distributions

Above, in Sections 2 and 3, we introduced realizations via feature maps. Here we address the question of “good” choices of feature spaces; in particular, we make precise the choices of “bigger” feature spaces, taking the form of Hilbert spaces of Schwartz distributions.

The details below concern special families of p.d. kernels $K$ , and ways to make precise the corresponding feature spaces, including the form taken by the RKHS $\mathscr{H}_{K}$ . This is motivated in part by an important paper by Laurent Schwartz [Sch64].

Theorem 4.1.

Let $K$ be a p.d. kernel on $\Omega\times\Omega$ , where $\Omega\subset\mathbb{R}^{d}$ is open, and let $\mathscr{H}_{K}$ be the corresponding RKHS. Suppose $K\in C^{\infty}\left(\Omega\times\Omega\right)$ , and let $\left\{f_{n}\right\}$ be an ONB for $\mathscr{H}_{K}$ , so that $K\left(x,y\right)=\sum_{n}\overline{f_{n}\left(x\right)}f_{n}\left(y\right)$ , where $f_{n}\in C^{\infty}$ .

Let $\mathscr{E}^{\prime}\left(\Omega\right)$ denote the space of Schwartz distributions with compact support in $\Omega$ , i.e., $\mathscr{E}^{\prime}\left(\Omega\right)$ is the dual of the Fréchet space $C^{\infty}\left(\Omega\right)$ . Let

H_{K}^{dist}\left(\Omega\right)=\left\{D\in\mathscr{E}^{\prime}\left(\Omega\right):DKD<\infty\right\}

(4.1)

with an ONB $\left\{D_{n}\right\}$ . The notation $DKD$ in (4.1) refers to a pair $K,D$ where $K$ is a $C^{\infty}$ p.d. kernel and $D$ is a Schwartz distribution. Then $DKD$ refers to the action of $D$ in the two variables of $K$ , i.e., on the left and on the right; see the cited literature. Here, the Hilbert completion is with respect to the inner product $\left(\xi,\eta\right)\mapsto\xi K\eta$ .

Set

\varphi\left(x\right)=\sum_{n}f_{n}\left(x\right)D_{n}\in H_{K}^{dist}\left(\Omega\right),

then

\left(\varphi,H_{K}^{dist}\left(\Omega\right)\right)\in H_{S}\left(K\right),

i.e.,

K\left(x,y\right)=\left\langle\varphi\left(x\right),\varphi\left(y\right)\right\rangle_{H_{K}^{dist}\left(\Omega\right)}.

Proof.

See Proposition 3.3. ∎

Example 4.2 ([Jor02]).

Let $K\left(x,y\right)=\left(1-xy\right)^{-1}$ , defined on $J\times J$ , where $J=\left(-1,1\right)$ . The corresponding RKHS $\mathscr{H}_{K}$ has an ONB $\left\{x^{n}:n\in\mathbb{N}_{0}\right\}$ .

Let $D_{n}=\left(n!\right)^{-1}\delta_{0}^{\left(n\right)}$ , where $\delta_{0}^{\left(n\right)}$ is the $n^{th}$ derivative of the Dirac distribution $\delta_{0}$ . Note that $K\in C^{\infty}$ , and

D_{n}\left(x\right)K\left(x,y\right)=y^{n}\in\mathscr{H}_{K}.

Define

D_{n}KD_{m}\coloneqq D_{n}\left(x\right)K\left(x,y\right)D_{m}\left(y\right)=\begin{cases}1&n=m,\\ 0&n\neq m.\end{cases}

(4.2)

Let $\mathscr{H}_{K}^{dist}$ be the Hilbert completion of $span\left\{D_{n}\right\}$ , then

\mathscr{H}_{K}^{dist}=\left\{\xi\in\mathscr{E}^{\prime}\left(J\right):\xi K\xi<\infty\right\}=\left\{\sum c_{n}D_{n}:\sum\left|c_{n}\right|^{2}<\infty\right\},

(4.3)

for which $\left\{D_{n}:n\in\mathbb{N}_{0}\right\}$ is an ONB. Set

\varphi\left(x\right)=\sum_{n=0}^{\infty}x^{n}D_{n}\in\mathscr{H}_{K}^{dist},

then $\left(\varphi,\mathscr{H}_{K}^{dist}\right)\in H_{S}\left(K\right)$ , by Theorem 4.1. Note that

\mathscr{H}_{K}^{dist}\cap\mathscr{H}_{K}=\emptyset.

Similarly, for any kernel that is defined by power series, we obtain $\mathscr{H}_{K}^{dist}$ as in (4.3), by adjusting the coefficients of $\delta^{\left(n\right)}$ . In particular, this applies to the kernel

K\left(x,y\right)=\left(1-xy\right)^{-n}=1+nxy+\frac{1}{2}n\left(n+1\right)x^{2}y^{2}+\cdots,\quad\left(x,y\right)\in J\times J.

Corollary 4.3.

Suppose $f\left(x\right)=\sum_{n=0}^{\infty}a_{n}x^{n}$ , $a_{n}>0$ , with radius of convergence $r^{2}>0$ . Let $K\left(x,y\right)=\sum_{n=0}^{\infty}a_{n}x^{n}y^{n}$ , defined on $J\times J$ , where $J=\left(-r,r\right)$ . Set

H_{K}^{dist}=\left\{\sum_{n=0}^{\infty}c_{n}D_{n}:\sum\left|c_{n}\right|^{2}<\infty,\;D_{n}=\frac{1}{n!\sqrt{a_{n}}}\delta^{\left(n\right)}\right\}

and

\varphi\left(x\right)=\sum_{n=0}^{\infty}\sqrt{a_{n}}x^{n}D_{n}.

Then $\left(\varphi,H_{K}^{dist}\right)\in H_{S}\left(K\right)$ .
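The orthonormality behind Corollary 4.3 can be verified directly. The following is a sketch, assuming (as in Example 4.2) that $\frac{1}{n!}\delta^{\left(n\right)}$ acts on a power series by extracting its $n$ -th Taylor coefficient at $0$ ; any sign convention for $\delta^{\left(n\right)}$ cancels in the two-sided pairing. Writing $\delta_{n,m}$ for the Kronecker delta, and using $K\left(x,y\right)=\sum_{k}a_{k}x^{k}y^{k}$ , we get

$D_{n}KD_{m}=\frac{1}{\sqrt{a_{n}a_{m}}}\cdot\left(\text{coefficient of }x^{n}y^{m}\text{ in }K\right)=\frac{a_{n}\delta_{n,m}}{\sqrt{a_{n}a_{m}}}=\delta_{n,m},$

so $\left\{D_{n}\right\}$ is orthonormal with respect to $\left(\xi,\eta\right)\mapsto\xi K\eta$ . Consequently,

$\left\langle\varphi\left(x\right),\varphi\left(y\right)\right\rangle_{H_{K}^{dist}}=\sum_{n=0}^{\infty}\sqrt{a_{n}}x^{n}\sqrt{a_{n}}y^{n}=\sum_{n=0}^{\infty}a_{n}x^{n}y^{n}=K\left(x,y\right),\quad x,y\in J.$

For $a_{n}=1$ this reduces to Example 4.2.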

Of special significance to the above discussion are the following citations [AJ21, AJP22, Jor02, Sch64].

5. Ordering of kernels and RKHSs

Since the choice of “good” features for kernel machines depends on a prior identification of kernels, precise comparisons of kernels are important. We therefore stress the ordering of kernels: below we address the role that ordering between pairs of kernels plays in feature selection, and its implications for computations.

The role of “ordering”, as we saw, arises both in issues of feature selection and from the role kernels play in geometry and in analysis. More generally, the question of order plays an important role in diverse methods used for building new reproducing kernel Hilbert spaces (RKHSs) from other Hilbert spaces with specified frame elements having specific properties. Such new constructions of RKHSs are used in turn within the framework of regularization theory and in approximation theory, involving such questions as semiparametric estimation and multiscale schemes of regularization. Making use of the results from the previous two sections, we turn below to a systematic analysis of these questions of ordering.

Returning to the general framework $(X,K)$ , where the set $X$ does not come with any particular structure, we now study the case when $X$ is fixed, and we examine the collection of all p.d. kernels defined on $X\times X$ . A special feature of interest is that of deciding how the ordering of pairs of p.d. kernels relates to operators which map between the associated families of feature spaces, see Theorem 5.7. This study includes an identification of multipliers, see Corollary 5.8.

We first recall Aronszajn’s inclusion theorem, which states that, for two p.d. kernels $K,L$ on $X\times X$ , $K\leq L$ if and only if $\mathscr{H}_{K}$ is contractively contained in $\mathscr{H}_{L}$ (see e.g., [Aro50]):

Theorem 5.1.

Let $K_{i}$ , $i=1,2$ , be p.d. kernels on $X\times X$ . Then $\mathscr{H}_{K_{1}}\subset\mathscr{H}_{K_{2}}$ (boundedly contained) if and only if there exists a constant $c>0$ such that $K_{1}\leq c^{2}K_{2}$ . Moreover, $\left\|f\right\|_{\mathscr{H}_{K_{2}}}\leq c\left\|f\right\|_{\mathscr{H}_{K_{1}}}$ for all $f\in\mathscr{H}_{K_{1}}$ .

This theorem is reformulated in Lemma 5.2 by means of quadratic forms. It is then extended in Theorem 5.7 to feature spaces.

Lemma 5.2.

Suppose $K,L$ are p.d. on $X\times X$ , and $K\leq L$ . Let $\mathscr{H}_{L}$ be the RKHS of $L$ .

(1)

Let $L_{0}=span\left\{L_{x}:x\in X\right\}$ . Define $\Phi:L_{0}\times L_{0}\rightarrow\mathbb{C}$ by

\Phi\left(L_{x},L_{y}\right)=K\left(x,y\right)

(5.1)

and extend by sesquilinearity:

\Phi\left(\sum_{i=1}^{m}c_{i}L_{x_{i}},\sum_{j=1}^{n}d_{j}L_{y_{j}}\right)=\sum_{i=1}^{m}\sum_{j=1}^{n}\overline{c_{i}}d_{j}K\left(x_{i},y_{j}\right).

(5.2)

Then $\Phi$ extends to a bounded sesquilinear form on $\mathscr{H}_{L}$ .

(2)

There exists a unique positive selfadjoint operator $A$ on $\mathscr{H}_{L}$ , such that $0\leq A\leq I$ , and

$\Phi\left(f,g\right)=\left\langle A^{1/2}f,A^{1/2}g\right\rangle_{\mathscr{H}_{L}},\quad f,g\in\mathscr{H}_{L}.$ (5.3)
(3)

Especially,

$K\left(x,y\right)=\left\langle A^{1/2}L_{x},A^{1/2}L_{y}\right\rangle_{\mathscr{H}_{L}},\quad x,y\in X.$ (5.4)

Proof.

Part (1). We need only show that

\left|\Phi\left(\sum_{i=1}^{m}c_{i}L_{x_{i}},\sum_{j=1}^{n}d_{j}L_{y_{j}}\right)\right|^{2}\leq\left\|\sum_{i=1}^{m}c_{i}L_{x_{i}}\right\|_{\mathscr{H}_{L}}^{2}\left\|\sum_{j=1}^{n}d_{j}L_{y_{j}}\right\|_{\mathscr{H}_{L}}^{2}.

(5.5)

Assume $\left(\pi,\mathscr{E}\right)\in H_{S}\left(K\right)$ , i.e., $K\left(x,y\right)=\left\langle\pi\left(x\right),\pi\left(y\right)\right\rangle_{\mathscr{E}}$ . Then,

\left|\sum_{i=1}^{m}\sum_{j=1}^{n}\overline{c_{i}}d_{j}K\left(x_{i},y_{j}\right)\right|^{2}=\left|\left\langle\sum_{i=1}^{m}c_{i}\pi\left(x_{i}\right),\sum_{j=1}^{n}d_{j}\pi\left(y_{j}\right)\right\rangle_{\mathscr{E}}\right|^{2}.

Thus,

\begin{aligned}
\text{LHS}_{\left(5.5\right)}&\leq\left\|\sum_{i=1}^{m}c_{i}\pi\left(x_{i}\right)\right\|_{\mathscr{E}}^{2}\left\|\sum_{j=1}^{n}d_{j}\pi\left(y_{j}\right)\right\|_{\mathscr{E}}^{2}\\
&=\left(\sum_{s,t=1}^{m}\overline{c_{s}}c_{t}\left\langle\pi\left(x_{s}\right),\pi\left(x_{t}\right)\right\rangle_{\mathscr{E}}\right)\left(\sum_{s,t=1}^{n}\overline{d_{s}}d_{t}\left\langle\pi\left(y_{s}\right),\pi\left(y_{t}\right)\right\rangle_{\mathscr{E}}\right)\\
&=\left(\sum_{s,t=1}^{m}\overline{c_{s}}c_{t}K\left(x_{s},x_{t}\right)\right)\left(\sum_{s,t=1}^{n}\overline{d_{s}}d_{t}K\left(y_{s},y_{t}\right)\right)\\
&\leq\left(\sum_{s,t=1}^{m}\overline{c_{s}}c_{t}L\left(x_{s},x_{t}\right)\right)\left(\sum_{s,t=1}^{n}\overline{d_{s}}d_{t}L\left(y_{s},y_{t}\right)\right)=\text{RHS}_{\left(5.5\right)}.
\end{aligned}

Part (2) follows from the general theory of quadratic forms. Note that (5.4) follows from (5.3) and (5.1). ∎

Corollary 5.3.

Assume $K,L$ are p.d. on $X\times X$ , and $K\leq L$ . Let $\mathscr{H}_{K}$ and $\mathscr{H}_{L}$ be the corresponding RKHSs. Then

K_{x}\mapsto A^{1/2}L_{x}

extends to an isometry from $\mathscr{H}_{K}$ into $\mathscr{H}_{L}$ .

Remark 5.4.

Let a pair of p.d. kernels satisfy the Loewner order relation (Definition 2.5). Then the corresponding operator $A$ introduced in (5.4) and Corollary 5.3 is bounded. However, Example 5.5 below (see (5.8)) illustrates that, in general, the inverse $A^{-1}$ is an unbounded operator. In applications to the theory of elliptic PDEs, the operator $A$ introduced in (5.4) and Corollary 5.3 may take the form of a “Green's function”; see e.g., [Nel58a, Nel58b].

Example 5.5.

Let $K\left(z,w\right)=\left(1-\overline{w}z\right)^{-1}$ , $L\left(z,w\right)=\left(1-\overline{w}z\right)^{-2}$ , defined on $\mathbb{D}\times\mathbb{D}$ , where

	$\displaystyle\mathscr{H}_{K}$	$\displaystyle=H_{2}\left(\mathbb{D}\right)=\left\{\sum_{n=0}^{\infty}c_{n}z^{n}:\left(c_{n}\right)\in l^{2}\right\},$
	$\displaystyle\mathscr{H}_{L}$	$\displaystyle=B_{2}\left(\mathbb{D}\right)=\left\{\sum_{n=0}^{\infty}c_{n}z^{n}:\left(c_{n}/\sqrt{1+n}\right)\in l^{2}\right\}.$

Define

A\left(z^{n}\right)=\left(1+n\right)^{-1}z^{n}.

(5.6)

Then

K\left(z,w\right)=\left\langle A^{1/2}L_{z},A^{1/2}L_{w}\right\rangle_{\mathscr{H}_{L}}.

(5.7)

Moreover, the inverse operator is given by

A^{-1}=1+z\frac{d}{dz}:z^{n}\longmapsto\left(1+n\right)z^{n},

(5.8)

where $A^{-1}\geq 1$ .

Proof of (5.7).

Recall that $L_{w}\left(s\right)=L\left(s,w\right)=\sum_{n\in\mathbb{N}_{0}}\left(1+n\right)\overline{w}^{n}s^{n}$ , and

1=\left\|z^{n}\right\|_{\mathscr{H}_{K}}\geq\left\|z^{n}\right\|_{\mathscr{H}_{L}}=\frac{1}{\sqrt{1+n}},\quad n\in\mathbb{N}_{0}.

Then,

\begin{aligned}
\left\langle A^{1/2}L_{z},A^{1/2}L_{w}\right\rangle_{\mathscr{H}_{L}}&=\left\langle A^{1/2}\sum_{n}\left(1+n\right)\overline{z}^{n}s^{n},A^{1/2}\sum_{m}\left(1+m\right)\overline{w}^{m}s^{m}\right\rangle_{\mathscr{H}_{L}}\\
&=\sum_{n}\left(1+n\right)^{2}z^{n}\overline{w}^{n}\left\langle A^{1/2}s^{n},A^{1/2}s^{n}\right\rangle_{\mathscr{H}_{L}}\\
&=\sum_{n}\left(1+n\right)z^{n}\overline{w}^{n}\left\langle s^{n},s^{n}\right\rangle_{\mathscr{H}_{L}}\\
&=\sum_{n}z^{n}\overline{w}^{n}=\left\langle K_{z},K_{w}\right\rangle_{\mathscr{H}_{K}}=K\left(z,w\right).
\end{aligned}

∎
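The computation in the proof of (5.7) can be checked numerically. The Python sketch below sums a truncated version of the series, keeping the three factors from the proof separate: the coefficients of $L_{z}$ and $L_{w}$, the action of $A$ on $s^{n}$, and $\left\|s^{n}\right\|_{\mathscr{H}_{L}}^{2}$. The function names, the sample points, and the truncation order are illustrative assumptions.

```python
def szego(z, w):
    """K(z, w) = 1 / (1 - conj(w) z) on the unit disk."""
    return 1 / (1 - w.conjugate() * z)

def form_via_A(z, w, order=200):
    """Truncated series for <A^{1/2} L_z, A^{1/2} L_w> in the space H_L."""
    ratio = z * w.conjugate()
    power = 1 + 0j                 # holds (z * conj(w))^n
    total = 0j
    for n in range(order):
        coeff = (1 + n) ** 2       # from the expansions of L_z and L_w
        a_factor = 1 / (1 + n)     # A s^n = (1+n)^(-1) s^n, see (5.6)
        norm_sq = 1 / (1 + n)      # ||s^n||^2 in H_L
        total += coeff * a_factor * norm_sq * power
        power *= ratio
    return total

z, w = 0.3 + 0.4j, -0.5 + 0.2j
lhs = form_via_A(z, w)
rhs = szego(z, w)
print(abs(lhs - rhs))
```

The printed difference is at the level of rounding error, in agreement with (5.7).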

Remark 5.6.

If $j:\mathscr{H}_{K}\rightarrow\mathscr{H}_{L}$ is the inclusion map, then the adjoint $j^{*}:\mathscr{H}_{L}\rightarrow\mathscr{H}_{K}$ is given by $j^{*}\left(z^{n}\right)=\left(1+n\right)^{-1}z^{n}$ . Therefore, the operator $A$ in (5.6) is precisely the contraction $A=jj^{*}:\mathscr{H}_{L}\rightarrow\mathscr{H}_{L}$ .

More generally, we have:

Theorem 5.7.

Let $K,L$ be p.d. kernels on $X\times X$ , with $\left(\varphi,\mathscr{K}\right)\in H_{S}\left(K\right)$ and $\left(\psi,\mathscr{L}\right)\in H_{S}\left(L\right)$ . Then $K\leq L$ if and only if there exists a positive selfadjoint operator $B$ on $\mathscr{L}$ , such that $0\leq B\leq I$ , and

K\left(x,y\right)=\left\langle\varphi\left(x\right),\varphi\left(y\right)\right\rangle_{\mathscr{K}}=\left\langle B^{1/2}\psi\left(x\right),B^{1/2}\psi\left(y\right)\right\rangle_{\mathscr{L}},\quad x,y\in X.

Proof.

See the proof of Lemma 5.2. ∎

Corollary 5.8 (Multipliers).

Let $K$ be a p.d. kernel on $\mathbb{D}$ and $\mathscr{H}_{K}$ be the corresponding RKHS. For $\varphi$ in the unit ball $\left(H^{\infty}\right)_{1}$ of $H^{\infty}$ , the function

K^{*}\left(z,w\right)=\left(1-\overline{\varphi\left(w\right)}\varphi\left(z\right)\right)K\left(z,w\right)

is a p.d. kernel on $\mathbb{D}$ if and only if $\varphi$ is a contractive multiplier on $\mathscr{H}_{K}$ , i.e., $\left\|\varphi h\right\|_{\mathscr{H}_{K}}\leq\left\|h\right\|_{\mathscr{H}_{K}}$ for all $h\in\mathscr{H}_{K}$ .

Proof.

The assertion is equivalent to:

\overline{\varphi\left(w\right)}\varphi\left(z\right)K\left(z,w\right)\leq K\left(z,w\right)

(5.9)

\Updownarrow

\left\|\varphi h\right\|_{\mathscr{H}_{K}}\leq\left\|h\right\|_{\mathscr{H}_{K}},\quad\forall h\in\mathscr{H}_{K}.

(5.10)

Pick an ONB $\left\{f_{n}\right\}$ for $\mathscr{H}_{K}$ , then

\overline{\varphi\left(w\right)}\varphi\left(z\right)K\left(z,w\right)=\sum_{n}\overline{\varphi\left(w\right)f_{n}\left(w\right)}\varphi\left(z\right)f_{n}\left(z\right)

(5.11)

Therefore, an application of Theorem 5.7 to (5.11) shows that (5.9) holds if and only if the operator $\mathscr{H}_{K}\rightarrow\mathscr{H}_{K}$ , $f_{n}\mapsto\varphi f_{n}$ is contractive, thus the equivalence to (5.10). ∎
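To illustrate Corollary 5.8 in the simplest setting, take $K$ to be the Szegő kernel and $\varphi\left(z\right)=z^{2}$, for which $K^{*}\left(z,w\right)=1+\overline{w}z$. The Python sketch below evaluates the quadratic form $\sum_{i,j}\overline{c_{i}}c_{j}K^{*}\left(x_{i},x_{j}\right)$ on random finite samples. For contrast it also evaluates the form for $\varphi\left(z\right)=2z$, which lies outside the unit ball of $H^{\infty}$ and so is not covered by the corollary. All names, the random seed, and the sample sizes are illustrative assumptions.

```python
import random

def szego(z, w):
    """K(z, w) = 1 / (1 - conj(w) z) on the unit disk."""
    return 1 / (1 - w.conjugate() * z)

def k_star(phi, z, w):
    """K*(z, w) = (1 - conj(phi(w)) phi(z)) K(z, w)."""
    return (1 - phi(w).conjugate() * phi(z)) * szego(z, w)

def quadratic_form(phi, points, coeffs):
    """Real part of sum_{i,j} conj(c_i) c_j K*(x_i, x_j)."""
    total = 0j
    for ci, xi in zip(coeffs, points):
        for cj, xj in zip(coeffs, points):
            total += ci.conjugate() * cj * k_star(phi, xi, xj)
    return total.real

rng = random.Random(0)

def sample():
    """A complex number with |z| < 0.85, hence inside the unit disk."""
    return complex(rng.uniform(-0.6, 0.6), rng.uniform(-0.6, 0.6))

def contractive(z):
    return z * z          # multiplication by z^2 is contractive on H_2

def not_contractive(z):
    return 2 * z          # sup norm 2 on the disk

forms = []
for _ in range(200):
    pts = [sample() for _ in range(5)]
    cs = [sample() for _ in range(5)]
    forms.append(quadratic_form(contractive, pts, cs))
min_form = min(forms)

# A single point already shows the failure of positivity for 2z.
bad_value = quadratic_form(not_contractive, [0.8 + 0j], [1 + 0j])
print(min_form, bad_value)
```

A nonnegative minimum over the samples is consistent with positivity of $K^{*}$ but of course does not prove it; the negative value for $2z$ does show that the corresponding kernel fails to be p.d.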

Next, we focus on certain limit constructions of p.d. kernels.

Definition 5.9.

Given a set $X$ , let $Pos\left(X\right)$ be the set of all p.d. kernels defined on $X\times X$ .

Theorem 5.10.

Let $K_{1}\leq K_{2}\leq\cdots\leq K_{n}\leq K_{n+1}\leq\cdots$ , with $K_{n}\in Pos\left(X\right)$ for all $n\in\mathbb{N}$ . Assume that for all $x\in X$ ,

\sup_{n}K_{n}\left(x,x\right)=S\left(x\right)<\infty,

(5.12)

then the Hilbert completion

\mathscr{H}_{\tilde{K}}:=\left(\bigcup_{n}\mathscr{H}_{K_{n}}\right)^{\sim},

(5.13)

is the RKHS of a limit p.d. kernel

\tilde{K}\left(x,y\right):=\lim_{n\rightarrow\infty}K_{n}\left(x,y\right),

(5.14)

defined on $X\times X$ .

Proof.

First note that, from (5.12), we get the following boundedness:

\left|K_{n}\left(x,y\right)\right|^{2}\leq K_{n}\left(x,x\right)K_{n}\left(y,y\right)\leq S\left(x\right)S\left(y\right)<\infty,

and so the sequence

\left\{K_{n}\left(x,y\right)\right\}_{n\in\mathbb{N}}

is bounded in $\mathbb{C}$ for all $\left(x,y\right)\in X\times X$ .

For all $N\in\mathbb{N}$ , $c_{i}\in\mathbb{C}$ , $x_{i}\in X$ , $1\leq i\leq N$ , set

F_{n}\left(\vec{c},\vec{x},N\right)=\sum_{i}\sum_{j}\overline{c_{i}}c_{j}K_{n}\left(x_{i},x_{j}\right).

(5.15)

Since $K_{n}\leq K_{n+1}$ , it follows that

F_{n}\left(\vec{c},\vec{x},N\right)\leq F_{n+1}\left(\vec{c},\vec{x},N\right)

and that

\sup_{n\in\mathbb{N}}F_{n}\left(\vec{c},\vec{x},N\right)<\infty\;\left(\text{by $\left(5.12\right)$}\right).

Hence $\tilde{K}$ is well defined, and

\sum_{i}\sum_{j}\overline{c_{i}}c_{j}\tilde{K}\left(x_{i},x_{j}\right)=\sup_{n}F_{n}\left(\vec{c},\vec{x},N\right)<\infty.

In this case, for every $f\in\mathscr{H}_{K_{1}}\subset\mathscr{H}_{K_{2}}\subset\cdots$ ,

\left\|f\right\|_{\mathscr{H}_{K_{1}}}\geq\left\|f\right\|_{\mathscr{H}_{K_{2}}}\geq\left\|f\right\|_{\mathscr{H}_{K_{3}}}

(5.16)

and we have

\left\|f\right\|_{\mathscr{H}_{\tilde{K}}}=\lim_{n}\left\|f\right\|_{\mathscr{H}_{K_{n}}}.

∎
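A simple instance of Theorem 5.10 is the family of partial sums $K_{n}\left(x,y\right)=\sum_{k=0}^{n}\left(xy\right)^{k}$ on $X=\left(-1,1\right)$, which increases to $\tilde{K}\left(x,y\right)=\left(1-xy\right)^{-1}$ and satisfies (5.12) with $S\left(x\right)=\left(1-x^{2}\right)^{-1}$. This family is our choice of example, not one taken from the text. The Python sketch below evaluates the forms $F_{n}$ of (5.15) for one fixed choice of real coefficients and points; the names and sample data are illustrative assumptions.

```python
def k_n(n, x, y):
    """Partial sum K_n(x, y) = sum_{k=0}^{n} (x y)^k, a p.d. kernel on (-1, 1)."""
    return sum((x * y) ** k for k in range(n + 1))

def k_limit(x, y):
    """Limit kernel (1 - x y)^(-1)."""
    return 1 / (1 - x * y)

def form(kernel, coeffs, points):
    """sum_{i,j} c_i c_j kernel(x_i, x_j) for real coefficients c_i."""
    return sum(ci * cj * kernel(xi, xj)
               for ci, xi in zip(coeffs, points)
               for cj, xj in zip(coeffs, points))

points = [-0.5, 0.1, 0.6]
coeffs = [1.0, -2.0, 0.5]

# F_n for n = 1, ..., 59, and the form of the limit kernel.
values = [form(lambda x, y, n=n: k_n(n, x, y), coeffs, points)
          for n in range(1, 60)]
limit_form = form(k_limit, coeffs, points)
print(values[0], values[-1], limit_form)
```

The sequence of forms is nondecreasing, since $F_{n+1}-F_{n}=\left(\sum_{i}c_{i}x_{i}^{n+1}\right)^{2}\geq0$ here, and it converges to the form of the limit kernel.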

Remark 5.11.

Condition (5.12) is necessary for this construction. For example, consider $K_{n}\left(z,w\right)=\left(1-\overline{w}z\right)^{-n}$ on $\mathbb{D}\times\mathbb{D}$ , where $n\in\mathbb{N}$ . Here $K_{n}\left(z,z\right)=\left(1-\left|z\right|^{2}\right)^{-n}\rightarrow\infty$ for $z\neq 0$ , and, for every $k\geq 1$ ,

1=\left\|z^{k}\right\|_{\mathscr{H}_{K_{1}}}>\underset{\frac{1}{\sqrt{1+k}}}{\underbrace{\left\|z^{k}\right\|_{\mathscr{H}_{K_{2}}}}}>\left\|z^{k}\right\|_{\mathscr{H}_{K_{3}}}>\cdots>\left\|z^{k}\right\|_{\mathscr{H}_{K_{n}}}\rightarrow 0,\;n\rightarrow\infty.

As an application we mention the following Cantor construction and a monotone kernel limit. While the example selects a particular scaling-iteration, the idea will apply more generally to a variety of iterated function system constructions (IFSs). For background on IFSs, see e.g., [JT23c, JS21].

Lemma 5.12.

Let $f\in C\left(\left[0,1\right]\right)$ , and extend it to $\mathbb{R}$ by setting $f\left(x\right)=0$ for $x\notin\left[0,1\right]$ . Define $T^{0}f=f$ , and

T^{n}f\left(x\right)=T^{n-1}f\left(3x\right)+T^{n-1}f\left(3x-2\right),\quad n\in\mathbb{N}.

(5.17)

Then the limit (pointwise)

F\left(x\right)=\lim_{n\rightarrow\infty}T^{n}f\left(x\right)

(5.18)

is supported in the middle-third Cantor set $C_{1/3}$ . (See Figure 5.1 for an illustration.)

Figure 5.1. $g_{n}\left(x\right)=T^{n}f\left(x\right)$ , $n=0,1,\cdots,5$ .

Proof.

Recall that $C_{1/3}$ is defined as follows: Let $I=\left[0,1\right]$ . Introduce two endomorphisms $\tau_{1},\tau_{2}:I\rightarrow I$ , where $\tau_{1}\left(x\right)=x/3$ , $\tau_{2}\left(x\right)=\left(x+2\right)/3$ . Set $C_{0}=I$ , and

C_{n}=\tau_{1}\left(C_{n-1}\right)\cup\tau_{2}\left(C_{n-1}\right),\quad n\in\mathbb{N}.

Then

C_{1/3}=\bigcap_{n=0}^{\infty}C_{n}.

Note (5.17) is the dual construction for functions on the unit interval $I$ . ∎
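The recursion (5.17) and its relation to the sets $C_{n}$ can be made concrete with exact rational arithmetic. In the Python sketch below, `iterate_T` builds $T^{n}f$ and `in_cantor_level` tests membership in $C_{n}$; for a strictly positive $f$ on $\left[0,1\right]$ the function $T^{n}f$ is nonzero exactly on $C_{n}$. The function names and the test function $f\left(x\right)=1+x$ are illustrative assumptions.

```python
from fractions import Fraction

def extend_by_zero(f):
    """Extend f from [0, 1] to the real line by zero."""
    return lambda x: f(x) if 0 <= x <= 1 else 0

def apply_T(g):
    """One step of (5.17): (T g)(x) = g(3x) + g(3x - 2)."""
    return lambda x: g(3 * x) + g(3 * x - 2)

def iterate_T(f, n):
    """Return the function T^n f, with f extended by zero outside [0, 1]."""
    g = extend_by_zero(f)
    for _ in range(n):
        g = apply_T(g)
    return g

def in_cantor_level(x, n):
    """True if x lies in C_n, the n-th stage of the middle-third construction."""
    for _ in range(n):
        if 0 <= x <= Fraction(1, 3):
            x = 3 * x          # undo tau_1
        elif Fraction(2, 3) <= x <= 1:
            x = 3 * x - 2      # undo tau_2
        else:
            return False
    return 0 <= x <= 1

f = lambda x: 1 + x            # strictly positive on [0, 1]
g5 = iterate_T(f, 5)
grid = [Fraction(k, 243) for k in range(244)]
support_matches = all((g5(x) != 0) == in_cantor_level(x, 5) for x in grid)
print(support_matches, g5(Fraction(1, 2)))
```

On this grid the support of $T^{5}f$ coincides with $C_{5}$, and in particular $T^{5}f$ vanishes at the midpoint $x=1/2$ of the first removed interval.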

Theorem 5.13.

Let $K$ be p.d. on $X\times X$ with $X=\left[0,1\right]$ , such that

K\left(x,y\right)=\sum_{i=0}^{\infty}f_{i}\left(x\right)\overline{f_{i}\left(y\right)},

where $\left\{f_{i}\right\}$ is an ONB for the corresponding RKHS $\mathscr{H}_{K}$ . Extend $f_{i}$ to $\mathbb{R}$ by setting $f_{i}\left(x\right)=0$ for $x\notin\left[0,1\right]$ , and set

K_{n}\left(x,y\right)=\sum_{i=0}^{\infty}T^{n}f_{i}\left(x\right)\overline{T^{n}f_{i}\left(y\right)}.

Then the limit

K_{\infty}\left(x,y\right)=\lim_{n\rightarrow\infty}K_{n}\left(x,y\right)=\lim_{n\rightarrow\infty}\sum_{i=0}^{\infty}T^{n}f_{i}\left(x\right)\overline{T^{n}f_{i}\left(y\right)}

is a p.d. kernel on $C_{1/3}\times C_{1/3}$ .

Moreover, $K_{\infty}$ is invariant under the action of $T$ , where $T$ acts on a p.d. kernel $L$ on $X\times X$ by

L\left(x,y\right)=\sum_{i}l_{i}\left(x\right)\overline{l_{i}\left(y\right)}\longmapsto TL\left(x,y\right)=\sum_{i}Tl_{i}\left(x\right)\overline{Tl_{i}\left(y\right)},\quad x,y\in X.

Proof.

By assumption, $\left(f_{i}\left(x\right)\right)\in l^{2}$ , $\forall x\in\mathbb{R}$ . Note that

\left(f_{i}\left(x\right)\right)\in l^{2},\forall x\in\mathbb{R}\Longrightarrow\left(Tf_{i}\left(x\right)\right)\in l^{2},\forall x\in\mathbb{R}

since

\left\|\left(Tf_{i}\left(x\right)\right)\right\|_{l^{2}}^{2}\leq\left\|\left(f_{i}\left(3x\right)\right)\right\|_{l^{2}}^{2}+\left\|\left(f_{i}\left(3x-2\right)\right)\right\|_{l^{2}}^{2}<\infty.

Therefore, $K_{n}$ is a well defined p.d. kernel, for all $n\in\mathbb{N}$ .

The conclusion follows by passing to the limit, where $F_{i}\left(x\right):=\lim_{n\rightarrow\infty}T^{n}f_{i}\left(x\right)$ exists, and has support in $C_{1/3}$ , by (5.17)–(5.18). See Figure 5.2 for an illustration. ∎
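The finite stages $K_{n}$ in Theorem 5.13 are easy to compute in a small example. The Python sketch below takes $K\left(x,y\right)=1+xy$ on $\left[0,1\right]$, with $f_{0}\left(x\right)=1$ and $f_{1}\left(x\right)=x$, and forms $K_{n}\left(x,y\right)=\sum_{i}T^{n}f_{i}\left(x\right)T^{n}f_{i}\left(y\right)$ using exact rationals. This kernel, the names, and the sample points are illustrative assumptions; only finite $n$ is computed, not the limit.

```python
from fractions import Fraction

def extend_by_zero(f):
    """Extend f from [0, 1] to the real line by zero."""
    return lambda x: f(x) if 0 <= x <= 1 else 0

def apply_T(g):
    """One step of (5.17): (T g)(x) = g(3x) + g(3x - 2)."""
    return lambda x: g(3 * x) + g(3 * x - 2)

def kernel_level(features, n):
    """K_n(x, y) = sum_i T^n f_i(x) * T^n f_i(y), for real-valued f_i."""
    lifted = [extend_by_zero(f) for f in features]
    for _ in range(n):
        lifted = [apply_T(g) for g in lifted]
    return lambda x, y: sum(g(x) * g(y) for g in lifted)

features = [lambda x: 1, lambda x: x]    # K(x, y) = 1 + x y on [0, 1]
K0 = kernel_level(features, 0)
K2 = kernel_level(features, 2)

inside = K2(Fraction(2, 27), Fraction(1, 9))   # both points lie in C_2
outside = K2(Fraction(1, 2), Fraction(1, 9))   # 1/2 lies outside C_1
print(K0(Fraction(1, 2), Fraction(1, 3)), inside, outside)
```

The stage $K_{2}$ vanishes as soon as one argument leaves $C_{2}$, which is the mechanism behind the support statement for the limit kernel.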

Of special significance to the above discussion are the following citations [JST23, JT22, KL21, PDC⁺14, Don74].

6. RKHS of analytic functions

As noted in Section 4, an identification of good kernels, and their corresponding RKHSs, depends on the particular function spaces that arise as RKHSs. The case when the RKHSs are Hilbert spaces of analytic functions has received special attention in the earlier literature on the use of kernels in analysis. The section below outlines properties of RKHSs realized as Hilbert spaces of analytic functions, and their role in our present applications.

The focus of our analysis below is the case when the RKHS $\mathscr{H}_{K}$ is a Hilbert space of analytic functions, defined on an open domain in $\mathbb{C}^{d}$ for some $d$ .

Definition 6.1.

Let $\Omega$ be an open subset in $\mathbb{C}^{d}$ and let $K$ be a $\mathbb{C}$ -valued p.d. function on $\Omega\times\Omega$ . We say that $K$ is analytic if the corresponding RKHS $\mathscr{H}_{K}$ consists of analytic functions on $\Omega$ .

Remark 6.2.

We note that there are other definitions in the literature which make precise this property of analyticity, and it follows from our discussion that they are equivalent to the present one.

Note that Definition 6.1 makes it clear that the following three familiar classes of p.d. kernels $K$ are analytic: the cases when $K$ is a Szegő kernel, a Bergman kernel, or Bargmann’s kernel [Ber55, BB23, SS17]. In these cases, the respective RKHSs $\mathscr{H}_{K}$ are the Hardy space $H_{2}\left(\Omega\right)$ , the Bergman space $B_{2}\left(\Omega\right)$ , or Bargmann’s Hilbert space of entire analytic functions on $\mathbb{C}^{d}$ , also called the Segal-Bargmann space. For the literature, we refer to [LG20, Alp15, ADR03, Kis23, Has21, CCL17], and we call attention to the Drury-Arveson kernel [Arv98] as a generalization of the Szegő/Bergman case.

Example 6.3 (Bergman = $\left(\text{Szeg\H{o}}\right)^{2}$ ).

Recall the Szegő kernel

K\left(z,w\right)=\sum_{n\in\mathbb{N}_{0}}z^{n}\overline{w}^{n}

and the Bergman kernel

K^{2}\left(z,w\right)=\sum_{n\in\mathbb{N}_{0}}\left(n+1\right)z^{n}\overline{w}^{n}.

Here, we have

K^{2}-K=K\left(K-1\right)=\left(\frac{1}{1-z\overline{w}}\right)^{2}\left(z\overline{w}\right)\geq 0,

i.e., $K\leq K^{2}$ . By the discussion above, we have

B_{2}\left(\mathbb{D}\right)\xrightarrow{\quad A\quad}H_{2}\left(\mathbb{D}\right)\subset B_{2}\left(\mathbb{D}\right),

where $A$ is the operator in (6.1). Specifically,

A\underset{\in B_{2}}{\underbrace{\left(\sum_{n=0}^{\infty}c_{n}z^{n}\right)}}:=\sum_{n=0}^{\infty}\frac{c_{n}}{\sqrt{n+1}}z^{n}\in H_{2}

(6.1)

where

\begin{aligned}
\left\|\sum_{n=0}^{\infty}\frac{c_{n}}{\sqrt{n+1}}z^{n}\right\|_{H_{2}}^{2}&=\sum_{n=0}^{\infty}\frac{\left|c_{n}\right|^{2}}{n+1},\\
\left\|\sum_{n=0}^{\infty}c_{n}z^{n}\right\|_{B_{2}}^{2}&=\left\|\sum_{n=0}^{\infty}\frac{c_{n}}{\sqrt{n+1}}\sqrt{n+1}z^{n}\right\|_{B_{2}}^{2}=\sum_{n=0}^{\infty}\frac{\left|c_{n}\right|^{2}}{n+1}.
\end{aligned}
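The two norm identities say that the operator $A$ of (6.1) is isometric from $B_{2}\left(\mathbb{D}\right)$ into $H_{2}\left(\mathbb{D}\right)$. The Python sketch below checks this on a polynomial, represented by its list of Taylor coefficients, and also lists the Taylor coefficients of $K^{2}-K$ in the variable $z\overline{w}$. The function names and the sample coefficients are illustrative assumptions.

```python
def hardy_norm_sq(c):
    """||sum c_n z^n||^2 in H_2: the l^2 norm of the coefficients."""
    return sum(abs(ck) ** 2 for ck in c)

def bergman_norm_sq(c):
    """||sum c_n z^n||^2 in B_2, where ||z^n||^2 = 1/(n+1)."""
    return sum(abs(ck) ** 2 / (n + 1) for n, ck in enumerate(c))

def apply_A(c):
    """The operator A of (6.1): c_n -> c_n / sqrt(n+1)."""
    return [ck / (n + 1) ** 0.5 for n, ck in enumerate(c)]

c = [1.0, -2.0, 0.5 + 1.5j, 3.0, 0.25j]
norm_in_bergman = bergman_norm_sq(c)
norm_of_image_in_hardy = hardy_norm_sq(apply_A(c))

# Coefficients of K^2 - K in powers of z*conj(w): (n+1) - 1 = n >= 0.
difference_coeffs = [(n + 1) - 1 for n in range(6)]
print(norm_in_bergman, norm_of_image_in_hardy, difference_coeffs)
```

The nonnegative coefficients of $K^{2}-K$ reflect the ordering $K\leq K^{2}$ used above.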

Remark 6.4.

Note that this covers many of the kernels we have considered, such as

(1)

$K$ :

$\begin{matrix}{\displaystyle\frac{1}{1-z\overline{w}}},&{\displaystyle\left(\frac{1}{1-z\overline{w}}\right)^{2}},&X=\mathbb{D}\end{matrix}$
(2)

$K:$

$\frac{1}{2i\left(\overline{w}-z\right)}$

defined for $\left(z,w\right)\in\mathbb{C}_{+}\times\mathbb{C}_{+}$ , where

$\mathbb{C}_{+}=\left\{z\in\mathbb{C}:\Im z>0\right\}$

(3)

The Bargmann kernel:

e^{z\overline{w}}=\sum_{n=0}^{\infty}\frac{z^{n}\overline{w}^{n}}{n!}=\sum_{n=0}^{\infty}\frac{z^{n}}{\sqrt{n!}}\frac{\overline{w}^{n}}{\sqrt{n!}},\quad\forall\left(z,w\right)\in\mathbb{C}\times\mathbb{C}.

This leads to an RKHS $\mathscr{H}_{K}$ consisting of all entire functions $F$ on $\mathbb{C}$ with norm

\frac{1}{\pi}\int_{\mathbb{C}}\left|F\left(z\right)\right|^{2}e^{-\left|z\right|^{2}}dA\left(z\right)<\infty.

(6.2)

Here, $dA\left(z\right)=dxdy$ , $z=x+iy$ .

Remark 6.5.

Comparing kernels is relatively straightforward,

K\leq K^{\prime}\Longleftrightarrow\text{$K^{\prime}-K$ is p.d.}

(6.3)

while comparing the associated Hilbert spaces is intriguing. The challenge lies in understanding how the embedding or inclusion of one RKHS into another reflects the geometry of the underlying kernels. For instance, the inclusion involves not just the kernels’ positivity but also their interaction with the data, operator norms, and potential scaling factors. Moreover, the relationship between the norms of the two spaces is critical, as it determines the stability and sensitivity of algorithms using these spaces. This makes the comparison of RKHSs more than a direct numerical or functional comparison—it becomes a study of their geometry, boundedness properties, and the behavior of operators that map between them.

For example,

H_{2}\left(\mathbb{D}\right)\;\text{(Hardy space)}\quad\text{vs.}\quad B_{2}\left(\mathbb{D}\right)\;\text{(Bergman space)};

(6.4)

see the discussion above.

Now, consider feature maps and feature spaces:

H_{S}\left(K\right)=\left\{\left(\varphi,\mathscr{H}\right),X\ni x\rightarrow\varphi\left(x\right)\in\mathscr{H},\>\text{s.t. $K\left(x,y\right)=\left\langle\varphi\left(x\right),\varphi\left(y\right)\right\rangle_{\mathscr{H}}$}\right\}.

(6.5)

We may also consider the following two variants of $H_{S}\left(K\right)$ :

Definition 6.6.

Given $K$ , p.d. in $X\times X$ , set

\text{super feature space:}\\ H_{S}^{+}\left(K\right)=\left\{\left(\psi,\mathscr{H}\right);X\ni x\rightarrow\psi\left(x\right)\in\mathscr{H},\>\text{s.t. $K\leq\left\langle\psi\left(x\right),\psi\left(y\right)\right\rangle_{\mathscr{H}}$}\right\},

(6.6)

\text{sub feature space: }\\ H_{S}^{-}\left(K\right)=\left\{\left(\psi,\mathscr{H}\right);X\ni x\rightarrow\psi\left(x\right)\in\mathscr{H},\>\text{s.t. $\left\langle\psi\left(x\right),\psi\left(y\right)\right\rangle_{\mathscr{H}}\leq K$}\right\}.

(6.7)

6.1. Three kernels of $H_{2}$ -Hardy spaces

Here, $f_{n}\left(z\right)=z^{n}$ , in a complex variable $z\in\mathbb{D}=\left\{z\in\mathbb{C}:\left|z\right|<1\right\}$ , the unit disk in $\mathbb{C}$ .

Summary:

(1)

Coefficients in the scalar $\mathbb{C}$ :

H_{2}\left(\mathbb{D}\right)=\left\{\sum_{n=0}^{\infty}c_{n}z^{n}:\left(c_{n}\right)\in l^{2}\left(\mathbb{N}_{0}\right)\right\},

\left\|\sum_{n=0}^{\infty}c_{n}z^{n}\right\|_{H_{2}\left(\mathbb{D}\right)}^{2}=\sum_{n=0}^{\infty}\left|c_{n}\right|^{2}=\left\|\left(c_{n}\right)\right\|_{l^{2}}^{2}.

(2)

Coefficients in a fixed Hilbert space $\mathscr{H}$ :

H_{2}\left(\mathscr{H}\right):=\left\{\sum_{n=0}^{\infty}h_{n}z^{n}:h_{n}\in\mathscr{H},\>\sum_{n=0}^{\infty}\left\|h_{n}\right\|^{2}<\infty\right\},

\left\|\sum_{n=0}^{\infty}h_{n}z^{n}\right\|_{H_{2}\left(\mathscr{H}\right)}^{2}=\sum_{n=0}^{\infty}\left\|h_{n}\right\|_{\mathscr{H}}^{2}<\infty.

(3)

Coefficients in the Hilbert-Schmidt class $HS\left(\mathscr{H}\right)\subset B\left(\mathscr{H}\right)$ , where $B\left(\mathscr{H}\right)$ is the space of all bounded operators in $\mathscr{H}$ :

H_{2}\left(B\left(\mathscr{H}\right)\right):=\left\{\sum_{n=0}^{\infty}Q_{n}z^{n}:Q_{n}\in B\left(\mathscr{H}\right),\right.\\ \;\left.\sum_{n=0}^{\infty}Q_{n}^{*}Q_{n}\in\mathscr{T}\left(\mathscr{H}\right),\>\text{trace class}\right\},

\left\|\sum_{n=0}^{\infty}Q_{n}z^{n}\right\|_{H_{2}\left(B\left(\mathscr{H}\right)\right)}^{2}=\text{Trace}\left(\sum_{n=0}^{\infty}Q_{n}^{*}Q_{n}\right).

Correspondences, transforms: $3\rightarrow 2\rightarrow 1$ , $3\rightarrow 1$ . Recall a Kaczmarz system of projections $P_{n}$ yields operators $Q_{n}$ s.t. $\sum_{n}Q_{n}^{*}Q_{n}=I$ . See e.g., [JST23, HJW20, JST20] for additional details.

6.2. Realization using tensor product of Hilbert spaces

Recall that

\mathscr{H}\otimes\mathscr{H}^{*}\longleftrightarrow\text{the Hilbert space of all \emph{Hilbert-Schmidt} operators acting on $\mathscr{H}.$}

(6.8)

Consider case (3) from above, i.e., $F\left(z\right)=\sum_{n=0}^{\infty}Q_{n}z^{n}$ , then for $h\in\mathscr{H}$ ,

\left\langle h,F\left(z\right)h\right\rangle_{\mathscr{H}}=\sum_{n=0}^{\infty}\left\langle h,Q_{n}h\right\rangle_{\mathscr{H}}z^{n}\in H_{2}\left(\mathbb{D}\right),

where $\left\langle\cdot,\cdot\right\rangle_{\mathscr{H}}$ denotes the inner product in $\mathscr{H}$ , and

\begin{aligned}
\left\|F\left(z\right)h\right\|_{\mathscr{H}}^{2}&=\left\|\sum_{n=0}^{\infty}\left(Q_{n}h\right)z^{n}\right\|_{H_{2}\left(\mathbb{D},\mathscr{H}\right)}^{2}\\
&=\sum_{n=0}^{\infty}\left\|Q_{n}h\right\|_{\mathscr{H}}^{2}\left|z\right|^{2n}\\
&=\sum_{n=0}^{\infty}\left\langle h,Q_{n}^{*}Q_{n}h\right\rangle_{\mathscr{H}}\left|z\right|^{2n}\\
&\leq\left\|h\right\|_{\mathscr{H}}^{2}\sum_{n=0}^{\infty}\left\|Q_{n}^{*}Q_{n}\right\|\left|z\right|^{2n}\\
&=\left\|h\right\|_{\mathscr{H}}^{2}\sum_{n=0}^{\infty}\left\|Q_{n}\right\|^{2}\left|z\right|^{2n}
\end{aligned}

where $\left\|\cdot\right\|=\left\|\cdot\right\|_{\mathscr{H}\rightarrow\mathscr{H}}$ is the operator norm.

Trace-norm: Pick an ONB $\left\{e_{k}\right\}$ in $\mathscr{H}$ , then

\begin{aligned}
\text{Tr}\left(F\left(z\right)^{*}F\left(z\right)\right)&=\sum_{k}\left\langle e_{k},F\left(z\right)^{*}F\left(z\right)e_{k}\right\rangle\\
&=\sum_{k}\left\|F\left(z\right)e_{k}\right\|_{\mathscr{H}}^{2}\\
&=\sum_{n=0}^{\infty}\text{Tr}\left(Q_{n}^{*}Q_{n}\right)\left|z\right|^{2n}<\infty
\end{aligned}

for all $z\in\mathbb{D}$ when $\sum Q_{n}^{*}Q_{n}$ is trace-class.

Of special significance to the above discussion are the following citations [ADR03, Alp15, Arv98, HJW20, Loe48, PR16].

7. $K$ -duality via the RKHS $\mathscr{H}_{K}$

The present final section addresses a list of direct links between kernel properties, and the role they play in feature selection.

7.1. Dirac-masses and $\mathscr{H}_{K}$

Below we consider the role of the Dirac masses in the reproducing kernel Hilbert space $\mathscr{H}_{K}$ when $K$ is a general p.d. kernel defined on $X\times X$ . Specifically, we show that the completion of the span of the $X$ -Dirac masses can be identified with a realization of all bounded linear functionals on $\mathscr{H}_{K}$ .

Theorem 7.1.

Fix $K$ , assumed p.d. on $X$ . Let

\mathscr{H}_{K}=\overline{span}^{\left\|\cdot\right\|_{\mathscr{H}_{K}}}\left\{K_{x}:x\in X\right\},\quad\tilde{\mathscr{H}}_{K}=\overline{span}^{\left\|\cdot\right\|_{\tilde{\mathscr{H}}_{K}}}\left\{\delta_{x}:x\in X\right\},

(7.1)

where

\left\|\sum\nolimits_{i}c_{i}K_{x_{i}}\right\|_{\mathscr{H}_{K}}^{2}=\left\|\sum\nolimits_{i}c_{i}\delta_{x_{i}}\right\|_{\tilde{\mathscr{H}}_{K}}^{2}=\sum\nolimits_{i,j}\overline{c_{i}}c_{j}K\left(x_{i},x_{j}\right).

(7.2)

Then

\tilde{\mathscr{H}}_{K}\simeq\mathscr{H}_{K}^{\prime}.

(7.3)

Proof.

Every bounded linear functional $l$ on $\mathscr{H}_{K}$ is given by a unique $\xi_{l}$ in $\mathscr{H}_{K}$ , and $\xi_{l}$ is the limit of a sequence $\left(\varphi_{n}\right)$ in $span\left\{K_{x}:x\in X\right\}$ . Using the correspondence

\varphi_{n}\left(x\right)=\sum c_{i}^{\left(n\right)}K_{x_{i}}\xleftrightarrow{\text{by $\left(7.2\right)$}}\tilde{\varphi}_{n}\left(x\right)=\sum c_{i}^{\left(n\right)}\delta_{x_{i}},

$\left(\tilde{\varphi}_{n}\right)$ is Cauchy in $\tilde{\mathscr{H}}_{K}$ , and it converges to some $f\in\tilde{\mathscr{H}}_{K}$ . Note, by (7.2),

\left\|\varphi_{n}-\xi_{l}\right\|_{\mathscr{H}_{K}}=\left\|\tilde{\varphi}_{n}-f\right\|_{\tilde{\mathscr{H}}_{K}}.

Then, for all $h\in\mathscr{H}_{K}$ ,

l\left(h\right)=\left\langle\xi_{l},h\right\rangle_{\mathscr{H}_{K}}=\lim_{n}\left\langle\varphi_{n},h\right\rangle_{\mathscr{H}_{K}}=\lim_{n}\tilde{\varphi}_{n}\left(h\right)=f\left(h\right),

where

\tilde{\varphi}_{n}\left(h\right)=\left(\sum c_{i}^{\left(n\right)}\delta_{x_{i}}\right)\left(h\right)=\left\langle\left(\sum c_{i}^{\left(n\right)}K_{x_{i}}\right),h\right\rangle_{\mathscr{H}_{K}}=\sum c_{i}^{\left(n\right)}h\left(x_{i}\right).

Therefore,

l=f.

This shows that $\mathscr{H}_{K}^{\prime}\subset\tilde{\mathscr{H}}_{K}$ . Similarly, $\tilde{\mathscr{H}}_{K}\subset\mathscr{H}_{K}^{\prime}$ , and so (7.3) holds. ∎

We note that Theorem 7.1 follows from the Riesz Representation Theorem. However, we include this theorem to explicitly establish the equivalence between $\mathscr{H}_{K}^{\prime}$ , the dual space of $\mathscr{H}_{K}$ , and $\tilde{\mathscr{H}}_{K}$ , the completion of the span of Dirac masses. The point is to explicitly show how the RKHS structure and kernel properties are used for this identification (see (7.3)). In Corollary 7.3 below, we use this to construct explicit bases for $\mathscr{H}_{K}$ and $\mathscr{H}_{K}^{\prime}$ for the kernel $K^{n}\left(x,y\right):=\left(1-xy\right)^{-n}$ , where the representation of the Dirac delta function gives a concrete realization of $\tilde{\mathscr{H}}_{K}$ .

Corollary 7.2.

Fix $K,L$ p.d. on $X$ . The following are equivalent:

(1)

$K\leq L$ .
(2)

$\mathscr{H}_{K}$ is contractively contained in $\mathscr{H}_{L}$ .
(3)

$\tilde{\mathscr{H}}_{L}$ is contractively contained in $\tilde{\mathscr{H}}_{K}$ .

Moreover, $\mathscr{H}_{K}$ is dense in $\mathscr{H}_{L}$ if and only if $\tilde{\mathscr{H}}_{L}$ is dense in $\tilde{\mathscr{H}}_{K}$ .

Corollary 7.3.

Let $X=\left(-1,1\right)$ , and

K^{n}\left(x,y\right)=\left(1-xy\right)^{-n}=\sum_{k=0}^{\infty}a_{k}x^{k}y^{k},\quad x,y\in X

where

a_{k}=\left(-1\right)^{k}\binom{-n}{k}=\frac{n\left(n+1\right)\cdots\left(n+k-1\right)}{k!}>0.

Then

	$\displaystyle\mathscr{H}_{K^{n}}$	$\displaystyle=\left\{\sum c_{k}x^{k}:\left(c_{k}/\sqrt{a_{k}}\right)\in l^{2}\right\}\simeq\left\{\left(c_{k}\right):\left(c_{k}/\sqrt{a_{k}}\right)\in l^{2}\right\}$		(7.4)
	$\displaystyle\mathscr{H}^{\prime}_{K^{n}}$	$\displaystyle=\left\{\sum c_{k}\frac{D^{k}}{k!}:\left(c_{k}\sqrt{a_{k}}\right)\in l^{2}\right\}\simeq\left\{\left(c_{k}\right):\left(c_{k}\sqrt{a_{k}}\right)\in l^{2}\right\}$		(7.5)

where

D^{k}=\left(\frac{d}{dy}\right)^{k}\big{|}_{y=0}.

Further, for all $x\in X$ ,

\delta_{x}=\sum_{k=0}^{\infty}x^{k}\frac{D^{k}}{k!}.

(7.6)

Proof.

Note that

\frac{D^{k}}{k!}\left(x^{m}\right)=\begin{cases}1&k=m\\ 0&k\neq m\end{cases}

and so we have the natural isomorphism

\mathscr{H}^{\prime}_{K^{n}}\ni\frac{D^{k}}{k!}\longleftrightarrow a_{k}x^{k}\in\mathscr{H}_{K^{n}}.

Thus,

\left\{\frac{D^{k}}{k!\sqrt{a_{k}}}\right\}_{k=0}^{\infty}

is an ONB for $\mathscr{H}^{\prime}_{K^{n}}$ . (See also Corollary 4.3.)

Lastly, with $\delta_{x}\longleftrightarrow K_{x}$ , then

\left\langle\frac{D^{k}}{k!\sqrt{a_{k}}},\delta_{x}\right\rangle_{\mathscr{H}^{\prime}_{K^{n}}}=\left\langle\sqrt{a_{k}}x^{k},K_{x}\right\rangle_{\mathscr{H}_{K^{n}}}=\sqrt{a_{k}}x^{k},

and thus

\delta_{x}=\sum_{k=0}^{\infty}\left\langle\frac{D^{k}}{k!\sqrt{a_{k}}},\delta_{x}\right\rangle_{\mathscr{H}^{\prime}_{K^{n}}}\frac{D^{k}}{k!\sqrt{a_{k}}}=\sum_{k=0}^{\infty}\sqrt{a_{k}}x^{k}\frac{D^{k}}{k!\sqrt{a_{k}}}=\sum_{k=0}^{\infty}x^{k}\frac{D^{k}}{k!},

which is (7.6). ∎
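Both ingredients of Corollary 7.3 can be checked by direct computation: the coefficients $a_{k}=\left(-1\right)^{k}\binom{-n}{k}=\binom{n+k-1}{k}$ sum to $\left(1-t\right)^{-n}$, and the series (7.6) acts on a polynomial $h$ by returning $h\left(x\right)$. In the Python sketch below a polynomial is represented by its list of coefficients, so that $D^{k}/k!$ simply reads off the $k$-th entry. The names and the sample data are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def a_coeff(n, k):
    """Taylor coefficient a_k = (-1)^k * binom(-n, k) of (1 - t)^(-n)."""
    return comb(n + k - 1, k)

def apply_dirac_series(x, poly, order):
    """Apply sum_{k < order} x^k D^k/k! to h(y) = sum_m poly[m] y^m.

    Since (D^k/k!)(y^m) is 1 if k == m and 0 otherwise, the functional
    D^k/k! returns the coefficient poly[k].
    """
    return sum(x ** k * poly[k] for k in range(min(order, len(poly))))

def evaluate(poly, x):
    """Point evaluation h(x), i.e. the action of delta_x on h."""
    return sum(c * x ** m for m, c in enumerate(poly))

n = 3
t = Fraction(1, 4)
partial = sum(a_coeff(n, k) * t ** k for k in range(80))
exact = 1 / (1 - t) ** n

poly = [2, -1, 0, 5]           # h(y) = 2 - y + 5 y^3
x = Fraction(1, 3)
via_series = apply_dirac_series(x, poly, order=10)
direct = evaluate(poly, x)
print(float(partial - exact), via_series, direct)
```

For polynomials the series in (7.6) terminates, so the agreement of the last two printed values is exact.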

7.2. A $K$ -transform and $K^{-1}$

The following result is motivated by the special case when p.d. kernels arise as Green's functions. In more detail, recall that in the context of PDEs, Green's functions arise as “inverses” of positive elliptic operators, see e.g., [Nel58b, KL21, BGG99, Hil98]. Since in these cases p.d. kernels $K$ arise as inverses of elliptic operators, it seems natural, in our present general framework of $Pos(X)$ , to ask for a precise form of $K^{-1}$ .

Starting with $K\in Pos\left(X\right)$ , we introduce functions $f\in\mathscr{H}_{K}$ (the RKHS), and signed measures $\mu$ on $X$ s.t. $\mu K\mu<\infty$ .

Recall that the following are equivalent:

\iint\mu\left(dx\right)K\left(x,y\right)\mu\left(dy\right)<\infty\quad\left(\text{abbreviated $\mu K\mu<\infty$}\right)

(7.7)

\Updownarrow

\int\mu\left(dx\right)K\left(x,\cdot\right)\in\mathscr{H}_{K}

(7.8)

So we get a pre-Hilbert space

\mathscr{M}_{2}\left(K\right):=\left\{\text{signed measures $\mu$ s.t. $\mu K\mu<\infty$}\right\},

and

T_{K}:\mathscr{M}_{2}\left(K\right)\longrightarrow\mathscr{H}_{K},

(7.9)

where

	$\displaystyle\mathscr{M}_{2}\left(K\right)\ni\mu$		$\displaystyle\xrightarrow{\quad T_{K}\quad}$		$\displaystyle T_{K}\mu\in\mathscr{H}_{K}$		(7.10)
	$\displaystyle\mathscr{M}_{2}\left(K\right)\ni K^{-1}f$		$\displaystyle\xleftarrow[\quad T_{K}^{*}\quad]{}$		$\displaystyle f\in\mathscr{H}_{K}$		(7.11)

So we have a well defined operator $T_{K}^{*}$ , and through it $K^{-1}$ gets a precise definition.

Recall the equivalence (7.7) $\Longleftrightarrow$ (7.8), together with the identity

\left\|T_{K}\mu\right\|_{\mathscr{H}_{K}}^{2}=\mu K\mu=\left\|\mu\right\|_{\mathscr{M}_{2}\left(K\right)}^{2}.

(7.12)

Introducing the inner product

\left\langle\nu,\mu\right\rangle_{\mathscr{M}_{2}\left(K\right)}=\nu K\mu,

we see that

\mathscr{M}_{2}\left(K\right)\xrightarrow{\quad T_{K}\quad}\mathscr{H}_{K}

is isometric.

Proposition 7.4.

Fix $K\in Pos\left(X\right)$ . We have:

	$\displaystyle T_{K}\mu$	$\displaystyle=\int\mu\left(dx\right)K\left(x,\cdot\right),$		(7.13)
	$\displaystyle T_{K}^{*}f$	$\displaystyle=K^{-1}f,$		(7.14)

where $K^{-1}$ is a Penrose inverse (see, e.g., [MWS25]) of $K$ , with $K$ interpreted as a kernel operator.

Proof.

We must verify the following identity for the respective inner products, for all $f\in\mathscr{H}_{K}$ and $\mu\in\mathscr{M}_{2}\left(K\right)$ :

\left\langle T_{K}\mu,f\right\rangle_{\mathscr{H}_{K}}=\left\langle\mu,K^{-1}f\right\rangle_{\mathscr{M}_{2}\left(K\right)}.

(7.15)

Using (7.13) and the reproducing property of $\mathscr{H}_{K}$ , we arrive at the following:

	$\displaystyle\text{LHS}_{\left(7.15\right)}$	$\displaystyle=\left\langle\int\mu\left(dx\right)K\left(x,\cdot\right),f\right\rangle_{\mathscr{H}_{K}}$
		$\displaystyle=\int\mu\left(dx\right)f\left(x\right)$
		$\displaystyle=\int\mu\left(dx\right)\left(K\left(K^{-1}f\right)\right)\left(x\right)=\text{RHS}_{\left(7.15\right)}.$

Note that when $K^{-1}$ acts via the Penrose inverse on the function $f$ , the result $K^{-1}f$ is a signed measure, and that is the interpretation used in the statement of the Proposition. ∎
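For a finite set $X$ the statement of Proposition 7.4 reduces to linear algebra, and the adjoint identity (7.15) can be verified exactly. The sketch below is an illustration under stated assumptions, not the general construction: $X=\left\{0,1,2\right\}$ , the kernel $K\left(x,y\right)=\min\left(x,y\right)+1$ is a hypothetical choice that happens to be invertible (so the Penrose inverse is the ordinary inverse), signed measures are vectors, $T_{K}\mu=K\mu$ , and $\left\langle f,g\right\rangle_{\mathscr{H}_{K}}=f^{T}K^{-1}g$ . The helper names are ours.

```python
from fractions import Fraction

# Assumed finite model: X = {0, 1, 2}, K(x, y) = min(x, y) + 1.
# This matrix is symmetric and strictly positive definite, so K^{-1} exists.
K = [[Fraction(min(i, j) + 1) for j in range(3)] for i in range(3)]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A assumed invertible)."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]

def T_K(mu):          # (7.13): T_K mu = sum_x mu(x) K(x, .)
    return matvec(K, mu)

def T_K_star(f):      # (7.14): T_K^* f = K^{-1} f, a signed measure on X
    return solve(K, f)

def inner_H(f, g):    # inner product of H_K in the finite, invertible case
    return dot(f, solve(K, g))

def inner_M(nu, mu):  # <nu, mu>_{M_2(K)} = nu K mu
    return dot(nu, matvec(K, mu))

mu = [1, -2, 3]       # illustrative signed measure
f = [2, 0, 5]         # illustrative function on X

lhs = inner_H(T_K(mu), f)          # left-hand side of (7.15)
rhs = inner_M(mu, T_K_star(f))     # right-hand side of (7.15)
print(lhs, rhs, inner_M(mu, mu))
```

Both sides of (7.15) reduce to $\sum_{x}\mu\left(x\right)f\left(x\right)$ in this model, matching the middle line of the proof. Exact rational arithmetic is used so that the identities hold without rounding error. When $K$ is singular, the same check would need a genuine pseudo-inverse in place of `solve`.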

Of special significance to the above discussion are the following citations [JST23, JT23b, MWS25, PDC⁺14, TXK23, ZCH19].

References

[AA23] Nourhane Attia and Ali Akgül, On solutions of biological models using reproducing Kernel Hilbert space method, Computational methods for biological models, Stud. Comput. Intell., vol. 1109, Springer, Singapore, [2023] ©2023, pp. 117–136. MR 4689655
[AAARM24] Taher Amoozad, Tofigh Allahviranloo, Saeid Abbasbandy, and Mohsen Rostamy Malkhalifeh, Using a new implementation of reproducing kernel Hilbert space method to solve a system of second-order BVPs, Int. J. Dyn. Control 12 (2024), no. 6, 1694–1706. MR 4751246
[ADR03] D. Alpay, A. Dijksma, and J. Rovnyak, A theorem of Beurling-Lax type for Hilbert spaces of functions analytic in the unit ball, Integral Equations Operator Theory 47 (2003), no. 3, 251–274. MR 2012838
[AJ21] Daniel Alpay and Palle E. T. Jorgensen, New characterizations of reproducing kernel Hilbert spaces and applications to metric geometry, Opuscula Math. 41 (2021), no. 3, 283–300. MR 4302453
[AJ22] Daniel Alpay and Palle Jorgensen, Reflection positivity via Krein space analysis, Adv. in Appl. Math. 141 (2022), Paper No. 102411, 45. MR 4467152
[AJP22] Daniel Alpay, Palle Jorgensen, and Motke Porat, White noise space analysis and multiplicative change of measures, J. Math. Phys. 63 (2022), no. 4, Paper No. 042102, 23. MR 4405120
[Alp15] Daniel Alpay, An advanced complex analysis problem book, Birkhäuser/Springer, Cham, 2015, Topological vector spaces, functional analysis, and Hilbert spaces of analytic functions. MR 3410523
[AMP92] Gregory T. Adams, Paul J. McGuire, and Vern I. Paulsen, Analytic reproducing kernels and multiplication operators, Illinois J. Math. 36 (1992), no. 3, 404–419. MR 1161974
[AP20] Daniel Alpay and Ismael L. Paiva, On the extension of positive definite kernels to topological algebras, J. Math. Phys. 61 (2020), no. 6, 063507, 10. MR 4113042
[Aro50] N. Aronszajn, Theory of reproducing kernels, Trans. Amer. Math. Soc. 68 (1950), 337–404. MR 51437
[Arv98] William Arveson, Subalgebras of $C^{*}$ -algebras. III. Multivariable operator theory, Acta Math. 181 (1998), no. 2, 159–228. MR 1668582
[BB23] Anton Baranov and Timur Batenev, Representing systems of reproducing kernels in spaces of analytic functions, Results Math. 78 (2023), no. 4, Paper No. 143, 17. MR 4586941
[Ber55] Stefan Bergman, Bounds for analytic functions in domains with a distinguished boundary surface, Math. Z. 63 (1955), 173–194. MR 75656
[BGG99] Richard Beals, Peter Greiner, and Bernard Gaveau, Green’s functions for some highly degenerate elliptic operators, J. Funct. Anal. 165 (1999), no. 2, 407–429. MR 1698952
[CCF16] Min Chen, Yang Chen, and Engui Fan, Perturbed Hankel determinant, correlation functions and Painlevé equations, J. Math. Phys. 57 (2016), no. 2, 023501, 31. MR 3439677
[CCL17] Hong Rae Cho, Hyunil Choi, and Han-Wool Lee, Boundedness of the Segal-Bargmann transform on fractional Hermite-Sobolev spaces, J. Funct. Spaces (2017), Art. ID 9176914, 6. MR 3603383
[CDD15] Sergio Cruces and Iván Durán-Díaz, The minimum risk principle that underlies the criteria of bounded component analysis, IEEE Trans. Neural Netw. Learn. Syst. 26 (2015), no. 5, 964–981. MR 3454256
[DKS19] Kamal Diki, Rolf Sören Krausshar, and Irene Sabadini, On the Bargmann-Fock-Fueter and Bergman-Fueter integral transforms, J. Math. Phys. 60 (2019), no. 8, 083506, 26. MR 3994389
[Don74] William F. Donoghue, Jr., Monotone matrix functions and analytic continuation, Die Grundlehren der mathematischen Wissenschaften, Band 207, Springer-Verlag, New York-Heidelberg, 1974. MR 0486556
[DS20] Micho Durdevich and Stephen Bruce Sontz, Coherent states for the Manin plane via Toeplitz quantization, J. Math. Phys. 61 (2020), no. 2, 023502, 17. MR 4059341
[Gia21] Dimitrios Giannakis, Quantum dynamics of the classical harmonic oscillator, J. Math. Phys. 62 (2021), no. 4, Paper No. 042701, 45. MR 4241109
[Has21] Friedrich Haslinger, The generalized $\partial$ -complex on the Segal-Bargmann space, Operator theory, functional analysis and applications, Oper. Theory Adv. Appl., vol. 282, Birkhäuser/Springer, Cham, [2021] ©2021, pp. 317–328. MR 4248024
[Hil98] Adrian T. Hill, Estimates on the Green’s function of second-order elliptic operators in ${\bf R}^{N}$ , Proc. Roy. Soc. Edinburgh Sect. A 128 (1998), no. 5, 1033–1051. MR 1642132
[HJW20] John E. Herr, Palle E. T. Jorgensen, and Eric S. Weber, Harmonic analysis of fractal measures: basis and frame algorithms for fractal $L^{2}$ -spaces, and boundary representations as closed subspaces of the Hardy space, Analysis, probability and mathematical physics on fractals, Fractals Dyn. Math. Sci. Arts Theory Appl., vol. 5, World Sci. Publ., Hackensack, NJ, [2020] ©2020, pp. 163–221. MR 4472249
[HMBV24] Boya Hou, Amarsagar Reddy Ramapuram Matavalam, Subhonmesh Bose, and Umesh Vaidya, Propagating uncertainty through system dynamics in reproducing kernel Hilbert space, Phys. D 463 (2024), Paper No. 134168, 9. MR 4735224
[HSZ⁺19] Bo He, Yan Song, Yuemei Zhu, Qixin Sha, Yue Shen, Tianhong Yan, Rui Nian, and Amaury Lendasse, Local receptive fields based extreme learning machine with hybrid filter kernels for image classification, Multidimens. Syst. Signal Process. 30 (2019), no. 3, 1149–1169. MR 3969356
[Jon09] Lee K. Jones, Local minimax learning of functions with best finite sample estimation error bounds: applications to ridge and lasso regression, boosting, tree learning, kernel machines, and inverse problems, IEEE Trans. Inform. Theory 55 (2009), no. 12, 5700–5727. MR 2597189
[Jor02] Palle E. T. Jorgensen, Diagonalizing operators with reflection symmetry, vol. 190, 2002, Special issue dedicated to the memory of I. E. Segal, pp. 93–132. MR 1895530
[JS21] Palle Jorgensen and David E. Stewart, Approximation properties of ridge functions and extreme learning machines, SIAM J. Math. Data Sci. 3 (2021), no. 3, 815–832. MR 4291375
[JST20] Palle Jorgensen, Myung-Sin Song, and James Tian, A Kaczmarz algorithm for sequences of projections, infinite products, and applications to frames in IFS $L^{2}$ spaces, Adv. Oper. Theory 5 (2020), no. 3, 1100–1131. MR 4126821
[JST23] Palle E. T. Jorgensen, Myung-Sin Song, and James Tian, Infinite-dimensional stochastic transforms and reproducing kernel Hilbert space, Sampl. Theory Signal Process. Data Anal. 21 (2023), no. 1, Paper No. 12, 27. MR 4561157
[JT22] Palle Jorgensen and James Tian, Reproducing kernels and choices of associated feature spaces, in the form of $L^{2}$ -spaces, J. Math. Anal. Appl. 505 (2022), no. 2, Paper No. 125535, 31. MR 4295177
[JT23a] Palle Jorgensen and James Tian, Harmonic analysis of network systems via kernels and their boundary realizations, Discrete Contin. Dyn. Syst. Ser. S 16 (2023), no. 2, 277–308. MR 4536608
[JT23b] Palle E. T. Jorgensen and James Tian, Dual pairs of operators, harmonic analysis of singular nonatomic measures and Krein-Feller diffusion, J. Operator Theory 89 (2023), no. 1, 205–248. MR 4567343
[JT23c] Palle E. T. Jorgensen and James Tian, Stochastics and dynamics of fractals, Recent developments in operator theory, mathematical physics and complex analysis, Oper. Theory Adv. Appl., vol. 290, Birkhäuser/Springer, Cham, [2023] ©2023, pp. 171–216. MR 4590528
[Kis23] Vladimir V. Kisil, Cross-Toeplitz operators on the Fock-Segal-Bargmann spaces and two-sided convolutions on the Heisenberg group, Ann. Funct. Anal. 14 (2023), no. 2, Paper No. 38, 57. MR 4553935
[KL21] Seick Kim and Sungjin Lee, Estimates for Green’s functions of elliptic equations in non-divergence form with continuous coefficients, Ann. Appl. Math. 37 (2021), no. 2, 111–130. MR 4294330
[LCA⁺24] William Lippitt, Nichole E. Carlson, Jaron Arbet, Tasha E. Fingerlin, Lisa A. Maier, and Katerina Kechris, Limitations of clustering with PCA and correlated noise, J. Stat. Comput. Simul. 94 (2024), no. 10, 2291–2319. MR 4769269
[LG20] Marcos López-García, The weighted Bergman space on a sector and a degenerate parabolic equation, J. Math. Anal. Appl. 491 (2020), no. 2, 124344, 15. MR 4122067
[Loe48] Charles Loewner, A topological characterization of a class of integral operators, Ann. of Math. (2) 49 (1948), 316–332. MR 24487
[MDL19] Elisa Marcelli and Renato De Leone, Infinite Kernel Extreme Learning Machine, Advances in optimization and decision science for society, services and enterprises, AIRO Springer Ser., vol. 3, Springer, Cham, [2019] ©2019, pp. 95–105. MR 4300781
[MWS25] Haifeng Ma, Wen Wang, and Predrag S. Stanimirović, Weighted Moore-Penrose inverses for dual matrices and its applications, Appl. Math. Comput. 489 (2025), Paper No. 129145, 14. MR 4815055
[Nel58a] Edward Nelson, An existence theorem for second order parabolic equations, Trans. Amer. Math. Soc. 88 (1958), 414–429. MR 95341
[Nel58b] Edward Nelson, Kernel functions and eigenfunction expansions, Duke Math. J. 25 (1958), 15–27. MR 91442
[NSW11] P. Niyogi, S. Smale, and S. Weinberger, A topological view of unsupervised learning from noisy data, SIAM J. Comput. 40 (2011), no. 3, 646–663. MR 2810909
[PDC⁺14] Gianluigi Pillonetto, Francesco Dinuzzo, Tianshi Chen, Giuseppe De Nicolao, and Lennart Ljung, Kernel methods in system identification, machine learning and function estimation: a survey, Automatica J. IFAC 50 (2014), no. 3, 657–682. MR 3173967
[PR16] Vern I. Paulsen and Mrinal Raghupathi, An introduction to the theory of reproducing kernel Hilbert spaces, Cambridge Studies in Advanced Mathematics, vol. 152, Cambridge University Press, Cambridge, 2016. MR 3526117
[SBP23] Anirban Sen, Pintu Bhunia, and Kallol Paul, Bounds for the Berezin number of reproducing kernel Hilbert space operators, Filomat 37 (2023), no. 6, 1741–1749. MR 4569938
[Sch64] Laurent Schwartz, Sous-espaces hilbertiens d’espaces vectoriels topologiques et noyaux associés (noyaux reproduisants), J. Analyse Math. 13 (1964), 115–256. MR 179587
[SS17] Jan Stochel and Jerzy Bartłomiej Stochel, Composition operators on Hilbert spaces of entire functions with analytic symbols, J. Math. Anal. Appl. 454 (2017), no. 2, 1019–1066. MR 3658810
[Ste24] Ingo Steinwart, Reproducing kernel Hilbert spaces cannot contain all continuous functions on a compact metric space, Arch. Math. (Basel) 122 (2024), no. 5, 553–557. MR 4734568
[TXK23] Xin Tan, Yingcun Xia, and Efang Kong, Choosing shape parameters for regression in reproducing kernel Hilbert space and variable selection, J. Nonparametr. Stat. 35 (2023), no. 3, 514–528. MR 4635411
[vdL96] Angelika van der Linde, The invariance of statistical analyses with smoothing splines with respect to the inner product in the reproducing kernel Hilbert space, Statistical theory and computational aspects of smoothing (Semmering, 1994), Contrib. Statist., Physica, Heidelberg, 1996, pp. 149–164. MR 1482832
[WK23] Hengfang Wang and Jae Kwang Kim, Statistical inference using regularized M-estimation in the reproducing kernel Hilbert space for handling missing data, Ann. Inst. Statist. Math. 75 (2023), no. 6, 911–929. MR 4655783
[YTDMM11] Shi Yu, Léon-Charles Tranchevent, Bart De Moor, and Yves Moreau, Kernel-based data fusion for machine learning, Studies in Computational Intelligence, vol. 345, Springer-Verlag, Berlin, 2011, Methods and applications in bioinformatics and text mining. MR 3024752
[ZCH19] Yang Zhou, Di-Rong Chen, and Wei Huang, A class of optimal estimators for the covariance operator in reproducing kernel Hilbert spaces, J. Multivariate Anal. 169 (2019), 166–178. MR 3875593
[ZXZ09] Haizhang Zhang, Yuesheng Xu, and Jun Zhang, Reproducing kernel Banach spaces for machine learning, J. Mach. Learn. Res. 10 (2009), 2741–2775. MR 2579912
[ZZ23] Yilin Zhang and Liping Zhu, Projection divergence in the reproducing kernel Hilbert space: Asymptotic normality, block-wise and slicing estimation, and computational efficiency, J. Multivariate Anal. 197 (2023), Paper No. 105204. MR 4601874

