
Generalization of the energy distance by Bernstein functions

J. C. Guella jean.guella@riken.jp RIKEN Center for Advanced Intelligence Project, Tokyo, Japan
Abstract.

We reprove the well-known fact that the energy distance defines a metric on the space of Borel probability measures with finite first moment on a Hilbert space. Our approach is new: it combines an analysis of the behaviour of the Gaussian kernel on Hilbert spaces with a Maximum Mean Discrepancy argument. From this point of view we generalize the energy distance metric to a family of kernels related to Bernstein functions and conditionally negative definite kernels. We also explain what happens to the energy distance for the kernel $\|x-y\|^{\alpha}$ with $\alpha>2$, and we generalize that idea to a family of kernels related to derivatives of completely monotone functions and conditionally negative definite kernels.

Key words and phrases:
Energy distance; Metric spaces of strong negative type; Metrics on probabilities; Bernstein functions; Conditionally negative definite kernels
2010 Mathematics Subject Classification:
42A82 ; 43A35

1. Introduction

A popular way to compare two probabilities is to embed the space (or a subset) of probabilities into a Hilbert space and use the metric provided by the embedding. Currently, there are two main approaches for this task:

  1. (I)

    The maximum mean discrepancy on a bounded, continuous, positive definite kernel $K:X\times X\to\mathbb{R}$ that is characteristic [7], [4]. The distance between two Radon regular probabilities $P$ and $Q$ is defined by

    \[ MMD(P,Q):=\sqrt{\int_{X}\int_{X}K(x,y)\,d[P-Q](x)\,d[P-Q](y)}. \]
  2. (II)

    The use of a continuous conditionally negative definite kernel $\gamma:X\times X\to\mathbb{R}$ with $\gamma(x,x)=0$ for every $x\in X$ [19]. The kernel $\gamma$ must additionally satisfy the following property: if $P$ and $Q$ are two Radon regular probabilities that integrate the function $x\to\gamma(x,z)$ for every $z\in X$, then the equality

    \[ \int_{X}\int_{X}-\gamma(x,y)\,d[P-Q](x)\,d[P-Q](y)=0 \tag{1.1} \]

    holds only when $P=Q$. It can be proved that the above double integral is always a nonnegative number, and when this property holds,

    \[ D_{\gamma}(P,Q):=\sqrt{\int_{X}\int_{X}-\gamma(x,y)\,d[P-Q](x)\,d[P-Q](y)} \]

    is a metric on the mentioned subspace of probabilities on $X$.

In this paper, we focus on the second method.

The most popular example of this method is the energy distance, initially defined for $X=\mathbb{R}^{m}$ and $\gamma(x,y)=\|x-y\|^{\theta}$ with $0<\theta<2$, on the set of probabilities that integrate $\|x\|^{\theta}$ [24], [23]. When $\theta=2$, the kernel is conditionally negative definite but does not satisfy the additional property of Equation 1.1.
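To make the definition of $D_{\gamma}$ concrete, the following minimal Python sketch (ours, not taken from the cited references) evaluates the double integral for two empirical measures with $\gamma(x,y)=\|x-y\|^{\theta}$. The function name `energy_distance_sq` and the representation of points as tuples of floats are assumptions made only for illustration.

```python
import math


def energy_distance_sq(xs, ys, theta=1.0):
    """Squared D_gamma between the empirical measures of xs and ys.

    gamma(x, y) = ||x - y||^theta; xs and ys are non-empty lists of
    equal-length tuples of floats (an illustrative data layout).
    """
    def mean_gamma(a, b):
        # Average of gamma over all pairs, i.e. the double integral of gamma
        # against the two empirical measures.
        total = sum(math.dist(u, v) ** theta for u in a for v in b)
        return total / (len(a) * len(b))

    # Expanding -gamma d[P-Q](x) d[P-Q](y) gives three terms.
    return 2.0 * mean_gamma(xs, ys) - mean_gamma(xs, xs) - mean_gamma(ys, ys)
```

For instance, with `theta=2.0` the one-dimensional samples `[(-1.0,), (1.0,)]` and `[(0.0,)]` have the same mean and the function returns zero although the samples differ, which illustrates why $\theta=2$ fails the property of Equation 1.1; with `theta=1.0` the same samples give a positive value.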

A more geometrical approach arises when $\gamma$ is a metric on $X$ that satisfies Equation 1.1 (the topology being the one induced by the metric); in that case $(X,\gamma)$ is a metric space of strong negative type. Examples of such spaces include:

• Hilbert spaces: proved in [12] as a generalization of the energy distance.

• Hyperbolic spaces (finite dimensional): proved in [13].

In some cases, the conditionally negative definite kernel $\gamma$ may define a metric on the set $X$ without being of strong negative type. A metric space for which we only know that the distance is a conditionally negative definite kernel is called a metric space of negative type. An example of such a space is the real sphere, as proved in [5], where it is also proved that the real, complex and quaternionic projective spaces and the Cayley projective plane are not metric spaces of negative type.

In [12], it is also proved that if $(X,\gamma)$ is a metric space of negative type then $\gamma^{\theta}$, $0<\theta<1$, is a conditionally negative definite kernel that satisfies Equation 1.1, with the topology of the metric $\gamma$. Interestingly, the kernel $\gamma^{\theta}$ is a metric on $X$ with the same topology as $\gamma$, so we can rephrase the result of Lyon as $(X,\gamma^{\theta})$ being a metric space of strong negative type. We provide more details and generalizations of this property in Corollary 4.3.

The major aim of this paper is to provide a large number of examples of conditionally negative definite kernels that satisfy Equation 1.1, by using Bernstein functions in Theorem 4.1. Our method encompasses all of the above-mentioned kernels that satisfy (II). We also provide a new proof that hyperbolic spaces (of any dimension) are metric spaces of strong negative type in Theorem 4.2.

In [15], Mattner analysed the behaviour of the kernel $\|x-y\|^{\alpha}$, for $\alpha>2$, defined on $\mathbb{R}^{m}$. What occurs is that we can still provide a metric structure on the space of probabilities with certain integrability assumptions, but we can only compare two probabilities if they have the same vector mean ($2<\alpha<4$), the same vector mean and the same covariance matrix ($4<\alpha<6$), and so on. The same analysis is provided there for other radial kernels, which we generalize in Theorem 4.4 to a broader setting.

Section 3 is focused on the integrability conditions of a conditionally negative definite kernel (and its generalizations). Section 4 contains the most important results of this paper, mentioned before. In Section 5 we analyse the space of functions

\[ y\in\mathcal{H}\to\int_{\mathcal{H}}\psi(\|x-y\|^{2})\,d\mu(x)\in\mathbb{R}, \]

where $\psi$ is a continuous function that is the difference of two derivatives (of the same order) of a completely monotone function. More precisely, we analyse when they are uniquely determined by the measure $\mu$. Section 2 is entirely devoted to the definitions that we use. The proofs are presented in Section 6.

2. Definitions

We recall that a nonnegative measure $\lambda$ on a Hausdorff space $X$ is Radon regular (which we simply refer to as Radon) when it is a Borel measure that is finite on every compact subset of $X$ and satisfies:

  1. (i)

    (Inner regular) $\lambda(E)=\sup\{\lambda(K),\ K\text{ is compact},\ K\subset E\}$ for every Borel set $E$.

  2. (ii)

    (Outer regular) $\lambda(E)=\inf\{\lambda(U),\ U\text{ is open},\ E\subset U\}$ for every Borel set $E$.

We then say that a complex valued measure $\lambda$ of bounded variation is Radon if its variation is a Radon measure. The vector space of such measures is denoted by $\mathfrak{M}(X)$. Recall that every Borel measure of finite variation (in particular, every probability measure) on a separable complete metric space is necessarily Radon.

A semi-inner product on a real (complex) vector space $V$ is a bilinear real valued (sesquilinear complex valued) function $(\cdot,\cdot)_{V}$ defined on $V\times V$ such that $(u,u)_{V}\geq 0$ for every $u\in V$. When this inequality is an equality only for $u=0$, we say that $(\cdot,\cdot)_{V}$ is an inner product. Similarly, a pseudometric on a set $X$ is a symmetric function $d:X\times X\to[0,\infty)$ such that $d(x,x)=0$ for every $x\in X$ and that satisfies the triangle inequality. If $d(x,y)=0$ only when $x=y$, then $d$ is a metric on $X$.

A kernel $K:X\times X\to\mathbb{C}$ is called positive definite if for every finite collection of distinct points $x_{1},\ldots,x_{n}\in X$ and scalars $c_{1},\ldots,c_{n}\in\mathbb{C}$, we have that

\[ \int_{X}\int_{X}K(x,y)\,d\lambda(x)\,d\overline{\lambda}(y)=\sum_{i,j=1}^{n}c_{i}\overline{c_{j}}K(x_{i},x_{j})\geq 0, \]

where $\lambda=\sum_{i=1}^{n}c_{i}\delta_{x_{i}}$. The set of measures on $X$ of this form is denoted by $\mathcal{M}_{\delta}(X)$.

The reproducing kernel Hilbert space (RKHS) of a positive definite kernel $K:X\times X\to\mathbb{C}$ is the Hilbert space $\mathcal{H}_{K}\subset\mathcal{F}(X,\mathbb{C})$ that satisfies [22]

  1. (i)

    $x\in X\to K_{y}(x):=K(x,y)\in\mathcal{H}_{K}$;

  2. (ii)

    $\langle K_{y},K_{x}\rangle=K(x,y)$;

  3. (iii)

    $\overline{\operatorname{span}\{K_{y},\quad y\in X\}}=\mathcal{H}_{K}$.

When $X$ is a Hausdorff space and $K$ is continuous it holds that $\mathcal{H}_{K}\subset C(X)$.

The following widely known result describes how a continuous positive definite kernel defines a semi-inner product structure on a subspace of $\mathfrak{M}(X)$.

Lemma 2.1.

If $K:X\times X\to\mathbb{C}$ is a continuous positive definite kernel and $\mu\in\mathfrak{M}(X)$ with $\sqrt{K(x,x)}\in L^{1}(|\mu|)$ (we write $\mu\in\mathfrak{M}_{\sqrt{K}}(X)$), then

\[ z\in X\to K_{\mu}(z):=\int_{X}K(x,z)\,d\mu(x)\in\mathbb{C} \]

is an element of $\mathcal{H}_{K}$, and if $\eta$ is another measure satisfying the same conditions as $\mu$, we have that

\[ \langle K_{\eta},K_{\mu}\rangle_{\mathcal{H}_{K}}=\int_{X}\int_{X}K(x,y)\,d\eta(x)\,d\overline{\mu}(y). \]

In particular, $(\eta,\mu)\in\mathfrak{M}_{\sqrt{K}}(X)\times\mathfrak{M}_{\sqrt{K}}(X)\to\langle K_{\eta},K_{\mu}\rangle_{\mathcal{H}_{K}}$ is a semi-inner product.

We present a generalization of this result to a larger class of measures in Lemma 3.6. Usually, the kernel $K$ is bounded, so $\mathfrak{M}_{\sqrt{K}}(X)=\mathfrak{M}(X)$. In this case, if the semi-inner product is in fact an inner product we say that $K$ is integrally strictly positive definite (ISPD), and when it is an inner product on the vector space of measures $\mu\in\mathfrak{M}(X)$ with $\mu(X)=0$, we say that $K$ is characteristic. If the kernel $K$ is real valued, it is sufficient to analyse the ISPD and characteristic properties on real valued measures.

When the kernel is characteristic we define the maximum mean discrepancy (MMD) as the metric on the space of probability measures in $\mathfrak{M}(X)$ given by

\[ MMD(P,Q)_{K}:=\sqrt{\langle K_{P}-K_{Q},K_{P}-K_{Q}\rangle_{\mathcal{H}_{K}}}=\sqrt{\int_{X}\int_{X}K(x,y)\,d[P-Q](x)\,d[P-Q](y)}. \tag{2.2} \]
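As an illustration of Equation 2.2, the following Python sketch (ours) evaluates the MMD between two empirical measures. The Gaussian kernel is used here only as a familiar bounded continuous kernel; the names `gaussian_kernel` and `mmd`, the parameter `r`, and the tuple representation of points are assumptions made for the example.

```python
import math


def gaussian_kernel(x, y, r=1.0):
    """Gaussian kernel exp(-r * ||x - y||^2) on tuples of floats."""
    return math.exp(-r * math.dist(x, y) ** 2)


def mmd(xs, ys, kernel=gaussian_kernel):
    """Equation 2.2 evaluated on the empirical measures of xs and ys."""
    def mean_k(a, b):
        # Double integral of the kernel against two empirical measures.
        return sum(kernel(u, v) for u in a for v in b) / (len(a) * len(b))

    squared = mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(squared, 0.0))
```

For two point masses at distance one, the value is $\sqrt{2-2e^{-r}}$, which can be used as a quick check of the implementation.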

As mentioned in the introduction, the focus of this paper is to analyse metrics on the space of probabilities using conditionally negative definite kernels. We present a more general definition, which will be useful in the analysis of the energy distance through the kernel $\|x-y\|^{\alpha}$, $\alpha>2$, defined on a Hilbert space.

Definition 2.2.

Let $\gamma:X\times X\to\mathbb{C}$ be a Hermitian kernel and $P$ a finite dimensional space of functions from $X$ to $\mathbb{C}$. We say that $\gamma$ is $P$-conditionally positive definite ($P$-CPD) if for every finite collection of points $x_{1},\ldots,x_{n}\in X$ and scalars $c_{1},\ldots,c_{n}\in\mathbb{C}$, under the restriction that $\sum_{i=1}^{n}c_{i}p(x_{i})=0$ for every $p\in P$, we have that

\[ \sum_{i,j=1}^{n}c_{i}\overline{c_{j}}\gamma(x_{i},x_{j})\geq 0. \]

This definition generalizes the concepts of positive definite kernels ($P$ is the zero space) and CPD kernels ($P$ is the set of constant functions). The most important example is when $X$ is a finite dimensional Euclidean space and $P$ is the set of multivariable polynomials on $X$ with degree less than or equal to a constant $k\in\mathbb{N}$ [25], [9], [6]. Sometimes it is more convenient to work with the opposite sign in Definition 2.2; in this case we say that the kernel is $P$-conditionally negative definite ($P$-CND).

In [9], [16], a characterization is proved for the continuous functions $\psi:[0,\infty)\to\mathbb{R}$ such that the kernel

\[ (x,y)\in\mathbb{R}^{m}\times\mathbb{R}^{m}\to\psi(\|x-y\|^{2})\in\mathbb{R} \]

is CPD for $P$ the family of multivariable polynomials of degree less than a fixed $\ell\in\mathbb{Z}_{+}$ (we denote this family by $\pi_{\ell-1}(\mathbb{R}^{m})$, where $\pi_{-1}(\mathbb{R}^{m})=\{0\}$ and $\pi_{0}(\mathbb{R}^{m})=\{\text{constant functions}\}$), for every $m\in\mathbb{N}$. A function $\psi$ satisfies this property if and only if $\psi\in C^{\infty}(0,\infty)$ and $(-1)^{\ell}\psi^{(\ell)}$ is a completely monotone function on $(0,\infty)$. A function with this property can be uniquely written as

\[ \psi(t)=\int_{(0,\infty)}\frac{e^{-tr}-e_{\ell}(r)\omega_{\ell,\infty}(rt)}{r^{\ell}}\,d\lambda(r)+\sum_{k=0}^{\ell}a_{k}t^{k} \tag{2.3} \]

where $\lambda$ is a nonnegative Radon measure on $(0,\infty)$ (not necessarily with finite variation),

\[ \omega_{\ell,\infty}(s):=\sum_{l=0}^{\ell-1}(-1)^{l}\frac{s^{l}}{l!},\quad e_{\ell}(s):=e^{-s}\sum_{l=0}^{\ell-1}\frac{s^{l}}{l!},\quad\int_{(0,\infty)}\min\{1,r^{-\ell}\}\,d\lambda(r)<\infty, \]

$a_{k}\in\mathbb{R}$, $(-1)^{\ell}a_{\ell}\geq 0$, and $\omega_{0,\infty}$ is the zero function. For instance, the functions

  1. i)

    $(-1)^{\ell}t^{a}+p(t)$;

  2. ii)

    $(-1)^{\ell+1}t^{\ell}\log(t)+p(t)$;

  3. iii)

    $(-1)^{\ell}(c+t)^{a}+p(t)$;

  4. iv)

    $e^{-rt}+p(t)$,

are elements of $CM_{\ell}$, for $\ell-1<a\leq\ell$, $c>0$ and $p\in\pi_{\ell-1}$. These functions are not only in $CM_{\ell}$: they are also $\ell-1$ times continuously differentiable on $[0,\infty)$, and for such functions there is a characterization similar to, and simpler than, Equation 2.3.

In general, a function $\psi\in CM_{\ell}$ is such that $\psi\in C^{\ell-1}([0,\infty))$ if and only if

\[ \psi(t)=\int_{(0,\infty)}\frac{e^{-tr}-\omega_{\ell,\infty}(rt)}{r^{\ell}}\,d\eta(r)+\sum_{k=0}^{\ell}b_{k}t^{k} \tag{2.4} \]

where $\eta$ is a nonnegative Radon measure on $(0,\infty)$ (not necessarily with finite variation),

\[ \omega_{\ell,\infty}(s):=\sum_{l=0}^{\ell-1}(-1)^{l}\frac{s^{l}}{l!},\quad\int_{(0,\infty)}\min\{1,r^{-\ell}\}\,d\eta(r)<\infty, \]

$b_{k}=\psi^{(k)}(0)/k!$ for $k<\ell$ and $(-1)^{\ell}b_{\ell}\geq 0$.

Note that if a function $\psi\in CM_{\ell}$ then $\psi(\cdot+c)\in CM_{\ell}\cap C^{\ell-1}([0,\infty))$. In this case, the measure $\eta_{c}$ associated with the decomposition given in Equation 2.4 has finite variation and satisfies $d\eta_{c+s}(r)=e^{-sr}d\eta_{c}(r)$ for every $c,s>0$. This property and the decomposition given in Equation 2.4 are implicitly proved in Theorem 2.1 of [16] and can also be found in Theorem 8.19 of [25]. We remark that a polynomial $p$ belongs to $CM_{\ell}$ if and only if $p\in\pi_{\ell}(\mathbb{R})$ and the constant $(-1)^{\ell}p^{(\ell)}\geq 0$.

By Lemma 2.4 in [9], a function $\psi\in CM_{\ell}$ satisfies $|\psi(t)|\lesssim 1+t^{\ell}$ (this notation means that $|\psi(t)|/(1+t^{\ell})$ is a bounded function).

3. Conditionally positive definite kernels

The following known result connects positive definite kernels and $P$-CPD kernels [25]. A Lagrange basis for $P$ is a basis $\{p_{1},\ldots,p_{m}\}$ of $P$ together with points $\xi_{1},\ldots,\xi_{m}\in X$ such that $p_{i}(\xi_{j})=\delta_{i,j}$. A set of points $\xi_{1},\ldots,\xi_{m}\in X$ is unisolvent with respect to an $m$-dimensional space $P$ if the only function $p\in P$ such that $p(\xi_{i})=0$ for every $i$ is the zero function.

Theorem 3.1.

Let $\xi_{1},\ldots,\xi_{m}\in X$ and $p_{1},\ldots,p_{m}$ be a Lagrange basis for a finite dimensional space $P$ of functions from $X$ to $\mathbb{C}$. A Hermitian kernel $\gamma:X\times X\to\mathbb{C}$ is $P$-CPD if and only if the Hermitian kernel

\[ K_{\gamma}(x,y):=\gamma(x,y)-\sum_{k=1}^{m}p_{k}(x)\gamma(\xi_{k},y)-\sum_{l=1}^{m}\overline{p_{l}(y)}\gamma(x,\xi_{l})+\sum_{k,l=1}^{m}p_{k}(x)\overline{p_{l}(y)}\gamma(\xi_{k},\xi_{l}) \]

is positive definite.

This result follows from the fact that if $x_{1},\ldots,x_{n}\in X$ and $c_{1},\ldots,c_{n}\in\mathbb{C}$ are such that $\sum_{i=1}^{n}c_{i}p(x_{i})=0$ for every $p\in P$, then

\[ \sum_{i,j=1}^{n}c_{i}\overline{c_{j}}K_{\gamma}(x_{i},x_{j})=\sum_{i,j=1}^{n}c_{i}\overline{c_{j}}\gamma(x_{i},x_{j}), \]

and conversely, if $z_{1},\ldots,z_{m+n}\in X$ (with $z_{n+k}=\xi_{k}$) and $d_{1},\ldots,d_{m+n}\in\mathbb{C}$, then

\[ \sum_{i,j=1}^{m+n}d_{i}\overline{d_{j}}K_{\gamma}(z_{i},z_{j})=\sum_{i,j=1}^{m+n}e_{i}\overline{e_{j}}\gamma(z_{i},z_{j}), \]

where $e_{i}=d_{i}$ for $i\leq n$ and $e_{i}=-\sum_{j=1}^{n}d_{j}p_{i-n}(z_{j})$ for $i>n$.
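The construction in Theorem 3.1 can be sketched numerically in the simplest case. The Python snippet below (ours) takes $X=\mathbb{R}$, $P$ the constant functions with $p_{1}=1$, a single point $\xi_{1}$ (called `xi`), and the CPD kernel $\gamma(x,y)=-|x-y|$; all function names and the choice `xi=0.0` are illustrative assumptions.

```python
def gamma(x, y):
    """CPD kernel for P = constants on the real line: minus the distance."""
    return -abs(x - y)


def k_gamma(x, y, xi=0.0):
    """Kernel of Theorem 3.1 for m = 1, p_1 = 1 and the point xi."""
    return gamma(x, y) - gamma(xi, y) - gamma(x, xi) + gamma(xi, xi)


def quadratic_form(kernel, points, coeffs):
    """Sum of c_i * c_j * kernel(x_i, x_j) for real coefficients c_i."""
    return sum(
        ci * cj * kernel(pi, pj)
        for pi, ci in zip(points, coeffs)
        for pj, cj in zip(points, coeffs)
    )
```

With points `[0.5, 1.0, 2.0]`, the coefficients `[1.0, -2.0, 1.0]` sum to zero and both kernels give the same quadratic form, while the unconstrained coefficients `[1.0, 1.0, 1.0]` give a negative value for `gamma` but a nonnegative one for `k_gamma`, in line with the two identities above.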

Similarly to continuous positive definite kernels, continuous $P$-CPD kernels can be analysed through their behaviour on a certain space of measures.

Definition 3.2.

Let $X$ be a Hausdorff space and $P\subset C(X)$ a finite dimensional vector space. We define the set

\[ \mathfrak{M}_{P}(X):=\{\mu\in\mathfrak{M}(X),\quad\int_{X}|p(x)|\,d|\mu|(x)<\infty\text{ and }\int_{X}p(x)\,d\mu(x)=0\text{ for every }p\in P\}. \]
Theorem 3.3.

A continuous Hermitian kernel $\gamma:X\times X\to\mathbb{C}$ is $P$-CPD if and only if for every $\mu\in\mathfrak{M}_{P}(X)$ for which $\gamma(x,y)\in L^{1}(|\mu|\times|\mu|)$ and $\gamma(x,\xi_{i})\in L^{1}(|\mu|)$, where $(\xi_{i})_{1\leq i\leq m}$ is unisolvent, we have that

\[ \int_{X}\int_{X}\gamma(x,y)\,d\mu(x)\,d\overline{\mu}(y)\geq 0. \]

If we restrict the measures in Theorem 3.3 to those for which $\gamma(x,\xi)\in L^{1}(|\mu|)$ for every $\xi\in X$, then the kernel $\gamma$ defines a semi-inner product on this vector space.

When $P$ is the space generated by a single function $p$, we can simplify the assumptions of Theorem 3.3.

Lemma 3.4.

Let $\gamma:X\times X\to\mathbb{C}$ be a continuous Hermitian kernel and $[p]=P\subset C(X)$ be a one dimensional vector space. Then, $\gamma$ is $P$-CPD if and only if for every $\mu\in\mathfrak{M}_{P}(X)$ for which $\gamma(x,y)\in L^{1}(|\mu|\times|\mu|)$

\[ \int_{X}\int_{X}\gamma(x,y)\,d\mu(x)\,d\overline{\mu}(y)\geq 0. \]

Additionally, if $p$ and $\gamma$ are real valued functions such that $p(x)\neq 0$ for every $x\in X$ and the function $\gamma(x,x)/p^{2}(x)$ is bounded, the following assertions are equivalent:

  1. (i)

    $\gamma\in L^{1}(|\mu|\times|\mu|)$;

  2. (ii)

    The function $x\in X\to\gamma(x,z)$ belongs to $L^{1}(|\mu|)$ for some $z\in X$;

  3. (iii)

    The function $x\in X\to\gamma(x,z)$ belongs to $L^{1}(|\mu|)$ for every $z\in X$.

As a direct consequence of the previous lemma, if the function $\gamma(x,x)/p^{2}(x)$ is bounded, then the set of measures in $\mathfrak{M}_{P}(X)$ that integrate $\gamma(x,y)$ is a vector space and the double integral defines a semi-inner product on it. We focus on the CPD case with $\gamma$ real valued due to its relevance.

Corollary 3.5.

Let $\gamma:X\times X\to\mathbb{R}$ be a continuous CPD kernel such that the function $\gamma(x,x)$ is bounded. The semi-inner product

\[ (\mu,\nu)\in\mathfrak{M}_{1}(X,\gamma)\times\mathfrak{M}_{1}(X,\gamma)\to I(\mu,\nu)_{\gamma}:=\int_{X}\int_{X}\gamma(x,y)\,d\mu(x)\,d\nu(y)\in\mathbb{R} \]

is well defined on the vector space

\[ \mathfrak{M}_{1}(X,\gamma):=\{\eta\in\mathfrak{M}(X),\quad\eta(X)=0,\ \gamma\in L^{1}(|\eta|\times|\eta|)\}. \]

In the next lemma we weaken the condition $\sqrt{K(x,x)}\in L^{1}(|\mu|)$ and enlarge the set of measures analysed in Lemma 2.1, at the cost of describing the function $K_{\mu}$ only up to a set of $|\mu|$-measure zero.

Lemma 3.6.

Let $K:X\times X\to\mathbb{C}$ be a continuous positive definite kernel and let $\mu\in\mathfrak{M}(X)$ be such that $K(x,y)\in L^{1}(|\mu|\times|\mu|)$. Then the set of points

\[ X_{\mu}:=\{z\in X,\quad K(\cdot,z)\in L^{1}(|\mu|)\} \]

is such that $|\mu|(X\setminus X_{\mu})=0$, and the function

\[ z\in X_{\mu}\to\int_{X}K(x,z)\,d\mu(x)\in\mathbb{C} \]

is the restriction of an element $K_{\mu}\in\mathcal{H}_{K}$. If $\eta$ is a measure satisfying the same conditions as the measure $\mu$ and $K\in L^{1}(\mu\times\eta)$, we have that

\[ \langle K_{\eta},K_{\mu}\rangle_{\mathcal{H}_{K}}=\int_{X}\int_{X}K(x,y)\,d\eta(x)\,d\overline{\mu}(y). \]

4. Inner products defined by CND kernels and derivatives of completely monotone functions

Since all kernels that we deal with in this section are real valued, we simplify the notation by only considering real valued measures (for which we still use the notation $\mathfrak{M}(X)$). As mentioned in Section 2, this is not a restriction.

In [12], it is proved that on a separable real Hilbert space $\mathcal{H}$, the bilinear function $I_{1/2}$ defined as

\[ (\mu,\nu)\in\mathfrak{M}_{1}(\mathcal{H})\times\mathfrak{M}_{1}(\mathcal{H})\to I(\mu,\nu)_{1/2}:=\int_{\mathcal{H}}\int_{\mathcal{H}}-\|x-y\|_{\mathcal{H}}\,d\mu(x)\,d\nu(y) \]

defines an inner product on the vector space

\[ \mathfrak{M}_{1}(\mathcal{H}):=\{\eta\in\mathfrak{M}(\mathcal{H}),\quad\eta(\mathcal{H})=0,\ \|x\|\in L^{1}(|\eta|)\}. \]

The function $t\in[0,\infty)\to\psi(t):=\sqrt{t}\in\mathbb{R}$ is an example of a Bernstein function [18]: it is continuous, $\psi\in C^{\infty}((0,\infty))$ and $\psi^{\prime}$ is a completely monotone function on $(0,\infty)$ (in our context we do not need to assume that Bernstein functions are nonnegative). In other words, a function $\psi$ is a Bernstein function if and only if $-\psi\in CM_{1}$, and for $\psi(t)=\sqrt{t}$ Equation 2.4 with $\ell=1$ reads

\[ -\sqrt{t}=\frac{1}{2\sqrt{\pi}}\int_{(0,\infty)}(e^{-rt}-1)\frac{1}{r^{3/2}}\,dr. \]

So,

\[ (x,y)\in\mathcal{H}\times\mathcal{H}\to-\|x-y\|_{\mathcal{H}}=\frac{1}{2\sqrt{\pi}}\int_{(0,\infty)}(e^{-r\|x-y\|^{2}}-1)\frac{1}{r^{3/2}}\,dr, \]

and this kernel is CPD. The Gaussian kernels $e^{-r\|x-y\|^{2}}$, $r>0$, are ISPD on every Hilbert space [8]. Hence, by the Fubini-Tonelli Theorem, if $\mu\in\mathfrak{M}(\mathcal{H})$ with $\mu(\mathcal{H})=0$ and $\|x\|\in L^{1}(|\mu|)$, then the constant term $-1$ integrates to zero and

\[ \int_{\mathcal{H}}\int_{\mathcal{H}}(-1)\|x-y\|_{\mathcal{H}}\,d\mu(x)\,d\mu(y)=\frac{1}{2\sqrt{\pi}}\int_{(0,\infty)}\left(\int_{\mathcal{H}}\int_{\mathcal{H}}e^{-r\|x-y\|^{2}}\,d\mu(x)\,d\mu(y)\right)\frac{1}{r^{3/2}}\,dr\geq 0. \]

Further, the inner double integral is positive whenever $\mu$ is not the zero measure, so the final result is a positive number. This is the key argument to verify that $I_{1/2}$ is an inner product, and it recovers the main result of [12] by a completely different argument. More generally, we have the following result.
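As a sanity check (this short computation is ours and is not part of the cited results), the constant $1/(2\sqrt{\pi})$ in the representation of $-\sqrt{t}$ can be verified directly. With the substitution $s=rt$ and one integration by parts, for $t>0$,

\[
\int_{(0,\infty)}(e^{-rt}-1)\frac{dr}{r^{3/2}}
=\sqrt{t}\int_{(0,\infty)}(e^{-s}-1)\frac{ds}{s^{3/2}}
=\sqrt{t}\left(\Big[-2s^{-1/2}(e^{-s}-1)\Big]_{0}^{\infty}-2\int_{(0,\infty)}s^{-1/2}e^{-s}\,ds\right)
=-2\Gamma(1/2)\sqrt{t}
=-2\sqrt{\pi}\sqrt{t}.
\]

The boundary term vanishes because $s^{-1/2}(e^{-s}-1)$ behaves like $-s^{1/2}$ near $0$ and tends to $0$ at infinity. Dividing by $2\sqrt{\pi}$ gives $-\sqrt{t}$, in agreement with the displayed formula.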

Theorem 4.1.

Let $\psi:[0,\infty)\to\mathbb{R}$ be a Bernstein function and $\gamma:X\times X\to[0,\infty)$ be a continuous CND kernel such that $x\to\gamma(x,x)$ is a bounded function. Consider the vector space

\[ \mathfrak{M}_{1}(X;\gamma,\psi):=\{\eta\in\mathfrak{M}(X),\quad\psi(\gamma(x,y))\in L^{1}(|\eta|\times|\eta|)\text{ and }\eta(X)=0\}. \]

Then the function

\[ (\mu,\nu)\in\mathfrak{M}_{1}(X;\gamma,\psi)\times\mathfrak{M}_{1}(X;\gamma,\psi)\to I(\mu,\nu)_{\gamma,\psi}:=-\int_{X}\int_{X}\psi(\gamma(x,y))\,d\mu(x)\,d\nu(y) \]

defines a semi-inner product on $\mathfrak{M}_{1}(X;\gamma,\psi)$. If $\psi$ is not a linear function and $2\gamma(x,y)=\gamma(x,x)+\gamma(y,y)$ only when $x=y$, then $I(\mu,\nu)_{\gamma,\psi}$ defines an inner product on $\mathfrak{M}_{1}(X;\gamma,\psi)$.

We emphasize that by Lemma 3.4 (with $p$ the constant function $1$), $\psi(\gamma(x,y))\in L^{1}(|\eta|\times|\eta|)$ if and only if $x\to\psi(\gamma(x,z))\in L^{1}(|\eta|)$ for some (or every) $z\in X$.

For instance, if $X$ is a real Hilbert space $\mathcal{H}$, $\gamma(x,y)=\|x-y\|^{2}$ and $\psi(t)=t^{a/2}$, $0<a<2$, then

\[ (\mu,\nu)\in\mathfrak{M}_{1}(\mathcal{H};t^{a/2})\times\mathfrak{M}_{1}(\mathcal{H};t^{a/2})\to I(\mu,\nu)_{a/2}:=-\int_{\mathcal{H}}\int_{\mathcal{H}}\|x-y\|^{a}\,d\mu(x)\,d\nu(y) \]

defines an inner product on

\[ \mathfrak{M}_{1}(\mathcal{H};t^{a/2}):=\{\eta\in\mathfrak{M}(\mathcal{H}),\quad\|x\|^{a}\in L^{1}(|\eta|)\text{ and }\eta(\mathcal{H})=0\}. \]

It is worth noting that the inner product in Theorem 4.1 is usually not complete (hence $\mathfrak{M}_{1}(X;\gamma,\psi)$ is not a Hilbert space). For instance, in [20] it is proved that the Gaussian kernel can be used to define an inner product on the space of tempered distributions on Euclidean spaces.

Another example occurs on the generalized real hyperbolic space. Let $\mathcal{H}$ be a Hilbert space, let $\mathbb{H}:=\{(x,t_{x})\in\mathcal{H}\times(0,\infty),\quad t_{x}^{2}-\|x\|^{2}=1\}$ be the real hyperbolic space relative to $\mathcal{H}$, and consider the kernel

\[ ((x,t_{x}),(y,t_{y}))\in\mathbb{H}\times\mathbb{H}\to[(x,t_{x}),(y,t_{y})]:=t_{x}t_{y}-\langle x,y\rangle\in[1,\infty), \]

which satisfies the relation

\[ \cosh(d_{\mathbb{H}}((x,t_{x}),(y,t_{y})))=[(x,t_{x}),(y,t_{y})], \]

where $d_{\mathbb{H}}$ is a metric on $\mathbb{H}$. In [3] or Chapter 5 of [1], it is proved that the metric $d_{\mathbb{H}}$ on $\mathbb{H}$ is a CND kernel. Hence we can apply Theorem 4.1 to the kernel $\gamma=d_{\mathbb{H}}$ and $\psi(t)=t^{a/2}$, $0<a<2$, and conclude that

\[ (\mu,\nu)\in\mathfrak{M}_{1}(\mathbb{H};t^{a/2})\times\mathfrak{M}_{1}(\mathbb{H};t^{a/2})\to H(\mu,\nu)_{a/2}:=-\int_{\mathbb{H}}\int_{\mathbb{H}}d_{\mathbb{H}}(x,y)^{a/2}\,d\mu(x)\,d\nu(y) \]

defines an inner product on

\[ \mathfrak{M}_{1}(\mathbb{H};t^{a/2}):=\{\eta\in\mathfrak{M}(\mathbb{H}),\quad x\in\mathbb{H}\to d_{\mathbb{H}}(x,z)^{a/2}\in L^{1}(|\eta|)\text{ for some (or every) }z\in\mathbb{H}\text{ and }\eta(\mathbb{H})=0\}. \]

We can also include the case $a=2$. A proof when $\mathbb{H}$ is finite dimensional was provided in [13] using geometric properties of hyperbolic spaces. Our proof relies on a Laurent type approximation for the function $\operatorname{arccosh}(t)$.

Theorem 4.2.

Let $\mathbb{H}$ be a real hyperbolic space, and consider the vector space

\[ \mathfrak{M}_{1}(\mathbb{H};t):=\{\eta\in\mathfrak{M}(\mathbb{H}),\quad x\in\mathbb{H}\to d_{\mathbb{H}}(x,z)\in L^{1}(|\eta|)\text{ for some (or every) }z\in\mathbb{H}\text{ and }\eta(\mathbb{H})=0\}. \]

Then

\[ (\mu,\nu)\in\mathfrak{M}_{1}(\mathbb{H};t)\times\mathfrak{M}_{1}(\mathbb{H};t)\to H(\mu,\nu)_{1}:=-\int_{\mathbb{H}}\int_{\mathbb{H}}d_{\mathbb{H}}(x,y)\,d\mu(x)\,d\nu(y) \]

is an inner product.

A different behaviour occurs on the generalized real spheres. Let $\mathcal{H}$ be a Hilbert space and let $S^{\mathcal{H}}:=\{x\in\mathcal{H},\quad\|x\|=1\}$ be the real sphere relative to $\mathcal{H}$. The kernel $d_{S^{\mathcal{H}}}$ defined on $S^{\mathcal{H}}$ by the relation

\[ \cos(d_{S^{\mathcal{H}}}(x,y))=\langle x,y\rangle_{\mathcal{H}},\quad x,y\in S^{\mathcal{H}} \]

is a metric and defines a CND kernel, as shown in [5]. However, unlike the Hilbert space and the real hyperbolic space, $(S^{\mathcal{H}},d_{S^{\mathcal{H}}})$ is not a metric space of strong negative type [14]. Gangolli also proved in [5] that the metric on the other compact two-point homogeneous spaces (real/complex/quaternionic projective spaces and the Cayley projective plane) does not define a CND kernel.

The following corollary of Theorem 4.1 connects the setting of metric spaces of strong negative type with the kernels in Theorem 4.1.

Corollary 4.3.

Let $\psi:[0,\infty)\to\mathbb{R}$ be a nonzero Bernstein function such that $\psi(0)=0$ and $\lim_{t\to\infty}\psi(t)/t=0$, and let $(X,\gamma)$ be a metric space of negative type. Then,

\[ (x,y)\in X\times X\to D_{\psi,\gamma}(x,y):=\psi(\gamma(x,y)) \]

is a metric on $X$ and $(X,D_{\psi,\gamma})$ is a metric space of strong negative type homeomorphic to $(X,\gamma)$.

As an example of Corollary 4.3, the Bernstein function $\psi(t)=\log(t+1)$ satisfies $\psi(0)=0$ and $\lim_{t\to\infty}\psi(t)/t=0$. In particular, on a Hilbert space $\mathcal{H}$, $\log(\|x-y\|+1)$ is a metric on $\mathcal{H}$ that induces the Hilbertian topology, and this metric is of strong negative type. Interestingly, we can apply Corollary 4.3 again to obtain that the same holds for the metric $\log(\log(\|x-y\|+1)+1)$.
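The metric part of this example can be probed numerically. The Python sketch below (ours) checks the triangle inequality for $\log(|x-y|+1)$ and its iterate on a finite set of real numbers, that is, on the one dimensional Hilbert space $\mathbb{R}$; it does not test the strong negative type property. The function names and the parameter `iterations` are illustrative assumptions.

```python
import itertools
import math


def log_metric(x, y, iterations=1):
    """Apply psi(t) = log(1 + t) `iterations` times to the distance |x - y|."""
    value = abs(x - y)
    for _ in range(iterations):
        value = math.log1p(value)
    return value


def max_triangle_violation(points, iterations=1):
    """Largest value of d(x, z) - d(x, y) - d(y, z) over all ordered triples."""
    return max(
        log_metric(x, z, iterations)
        - log_metric(x, y, iterations)
        - log_metric(y, z, iterations)
        for x, y, z in itertools.product(points, repeat=3)
    )
```

A value of `max_triangle_violation` that is not positive (up to rounding) on a sample of points is consistent with the triangle inequality; a finite check of this kind is of course no substitute for the proof.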

Returning to the kernel $(x,y)\in\mathcal{H}\times\mathcal{H}\to\|x-y\|^{a}$, we may ask what occurs when $a\geq 2$. The case $a=2$ is simpler, because

\[ -\int_{\mathcal{H}}\int_{\mathcal{H}}\|x-y\|^{2}\,d\mu(x)\,d\nu(y)=2\int_{\mathcal{H}}\int_{\mathcal{H}}\langle x,y\rangle_{\mathcal{H}}\,d\mu(x)\,d\nu(y), \]

for every $\mu,\nu\in\mathfrak{M}_{1}(\mathcal{H};t):=\{\eta\in\mathfrak{M}(\mathcal{H}),\quad\|x\|^{2}\in L^{1}(|\eta|)\text{ and }\eta(\mathcal{H})=0\}$. This still defines a semi-inner product on $\mathfrak{M}_{1}(\mathcal{H};t)$, but every measure in the vector space

\[ \mathfrak{M}_{2}(\mathcal{H};t):=\{\eta\in\mathfrak{M}_{1}(\mathcal{H};t),\quad\int_{\mathcal{H}}\langle x,y\rangle_{\mathcal{H}}\,d\eta(x)=0\text{ for every }y\in\mathcal{H}\}\subset\mathfrak{M}_{1}(\mathcal{H};t) \]

is equivalent to the zero measure with respect to this semi-inner product. For an arbitrary measure $\eta\in\mathfrak{M}(\mathcal{H})$ such that $\|x\|^{2}\in L^{1}(|\eta|)$, the linear functional

\[ y\in\mathcal{H}\to\int_{\mathcal{H}}\langle x,y\rangle\,d\eta(x)\in\mathbb{R} \]

is continuous, so there exists a vector $v_{\eta}\in\mathcal{H}$, which we call the vector mean of $\eta$, that represents this functional.

In the case $a>2$, a different behaviour emerges. The double integral does not define a semi-inner product on $\mathfrak{M}_{1}(\mathcal{H};t^{a/2})$. However, if we restrict ourselves to the vector space

\[ \mathfrak{M}_{2}(\mathcal{H};t^{a/2}):=\{\eta\in\mathfrak{M}(\mathcal{H}),\quad\|x\|^{a}\in L^{1}(|\eta|),\ \eta(\mathcal{H})=0,\ v_{\eta}=0\} \]

for $2<a<4$ and use the representation given in Equation 2.4 for the $CM_{2}$ function

\[ t^{a/2}=\frac{a(a-2)}{4\Gamma(2-a/2)}\int_{(0,\infty)}(e^{-rt}-1+rt)\frac{1}{r^{a/2+1}}\,dr, \]

then by Fubini-Tonelli we obtain that if $\mu,\nu\in\mathfrak{M}_{2}(\mathcal{H};t^{a/2})$,

\[ \int_{\mathcal{H}}\int_{\mathcal{H}}\|x-y\|^{a}\,d\mu(x)\,d\nu(y)=\frac{a(a-2)}{4\Gamma(2-a/2)}\int_{(0,\infty)}\left(\int_{\mathcal{H}}\int_{\mathcal{H}}e^{-r\|x-y\|^{2}}\,d\mu(x)\,d\nu(y)\right)\frac{1}{r^{a/2+1}}\,dr, \]

because the terms $-1$ and $r\|x-y\|^{2}$ integrate to zero against measures with zero total mass and zero vector mean. The right-hand side is nonnegative when $\nu=\mu$.

In particular, we can use the kernel $\|x-y\|^{a}$, $2<a<4$, to define a metric on the space of Radon probability measures on $\mathcal{H}$ that integrate $\|x\|^{a}$ and have a fixed vector mean.

More generally, we have the following result.

Theorem 4.4.

Let $\ell\in\mathbb{N}$, $\psi:[0,\infty)\to\mathbb{R}$ be a continuous function in $CM_{\ell}$ and $\gamma:X\times X\to[0,\infty)$ be a continuous CND kernel such that $x\to\gamma(x,x)$ is a constant function. Consider the vector space

𝔐(X;γ,ψ):\displaystyle\mathfrak{M}_{\ell}(X;\gamma,\psi): ={η𝔐(X),ψ(γ(x,y))L1(|η|×|η|),γ(x,y)L1(|η|×|η|) and\displaystyle=\{\eta\in\mathfrak{M}(X),\quad\psi(\gamma(x,y))\in L^{1}(|\eta|\times|\eta|),\quad\gamma(x,y)^{\ell}\in L^{1}(|\eta|\times|\eta|)\text{ and }
η(X)=0,XXKγ(x,y)jdη(x)dη(y)=0,1j1}\displaystyle\eta(X)=0,\quad\int_{X}\int_{X}K_{-\gamma}(x,y)^{j}d\eta(x)d\eta(y)=0,\quad 1\leq j\leq\ell-1\}

where KγK_{-\gamma} is the kernel in Theorem 3.1, then the function

(μ,ν)𝔐(X;γ,ψ)×𝔐(X;γ,ψ)I(μ,ν)γ,ψ:=XXψ(γ(x,y))𝑑μ(x)𝑑ν(y)(\mu,\nu)\in\mathfrak{M}_{\ell}(X;\gamma,\psi)\times\mathfrak{M}_{\ell}(X;\gamma,\psi)\to I(\mu,\nu)_{\gamma,\psi}:=\int_{X}\int_{X}\psi(\gamma(x,y))d\mu(x)d\nu(y)

defines a semi-inner product on $\mathfrak{M}_{\ell}(X;\gamma,\psi)$. If $\psi$ is not a polynomial of degree $\ell$ or less and $2\gamma(x,y)=\gamma(x,x)+\gamma(y,y)$ only when $x=y$, then $I(\mu,\nu)_{\gamma,\psi}$ defines an inner product on $\mathfrak{M}_{\ell}(X;\gamma,\psi)$.

From Equation 2.4 and the fact that $(-1)^{\ell}[e^{-tr}-\omega_{\ell,\infty}(rt)]\geq 0$ for every $t,r\geq 0$ and $\ell\in\mathbb{N}$ (this can be easily proved by induction on $\ell$), if the function $\psi\in CM_{\ell}$ in Theorem 4.4 also belongs to $C^{\ell-1}[0,\infty)$, we may relax the requirement $\gamma^{\ell}\in L^{1}(|\eta|\times|\eta|)$ to $\gamma^{\ell-1}\in L^{1}(|\eta|\times|\eta|)$ in the definition of $\mathfrak{M}_{\ell}(X;\gamma,\psi)$.

Theorem 4.4 imposes stronger conditions on the function $\gamma(x,x)$ than Theorem 4.1 because the integrals $\int_{X}\int_{X}\gamma(x,y)^{j}d\eta(x)d\eta(y)$ are difficult to analyse in the general setting of Theorem 4.1. However, if the function $\psi\in CM_{\ell}$ in Theorem 4.4 also belongs to $C^{\ell-1}[0,\infty)$ and all of its derivatives up to order $\ell-1$ vanish at the point $0$, then there is no polynomial part in Equation 2.4, and in this case it suffices to assume in Theorem 4.4 that $x\to\gamma(x,x)$ is a bounded function. This is the case for the function $(-1)^{\ell}t^{a/2}$, $2(\ell-1)<a<2\ell$.

As an example of Theorem 4.4, if XX is a Hilbert space \mathcal{H}, γ(x,y)=xy2\gamma(x,y)=\|x-y\|^{2} and ψ(t)=(1)ta/2\psi(t)=(-1)^{\ell}t^{a/2}, 2(1)<a<22(\ell-1)<a<2\ell, \ell\in\mathbb{N} then

(μ,ν)𝔐(;ta/2)×𝔐(;ta/2)I(μ,ν)a/2:=(1)xya𝑑μ(x)𝑑ν(y)(\mu,\nu)\in\mathfrak{M}_{\ell}(\mathcal{H};t^{a/2})\times\mathfrak{M}_{\ell}(\mathcal{H};t^{a/2})\to I(\mu,\nu)_{a/2}:=\int_{\mathcal{H}}\int_{\mathcal{H}}(-1)^{\ell}\|x-y\|^{a}d\mu(x)d\nu(y)

defines an inner product on the vector space

𝔐(;ta/2):\displaystyle\mathfrak{M}_{\ell}(\mathcal{H};t^{a/2}): ={μ𝔐(),xaL1(|μ|),μ()=0, and\displaystyle=\{\mu\in\mathfrak{M}(\mathcal{H}),\quad\|x\|^{a}\in L^{1}(|\mu|),\quad\mu(\mathcal{H})=0,\text{ and }
x,y1x,yjdμ(x)=0,y1,,yj1j1}.\displaystyle\quad\int_{\mathcal{H}}\langle x,y_{1}\rangle\ldots\langle x,y_{j}\rangle d\mu(x)=0,\quad y_{1},\ldots,y_{j}\in\mathcal{H}\quad 1\leq j\leq\ell-1\}.

Theorem 4.1 and Theorem 4.4 in the case where $X$ is a Euclidean space $\mathbb{R}^{m}$ and $\gamma(x,y)=\|x-y\|^{2}$ were proved in [15].
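The Hilbert space example of Theorem 4.4 can be illustrated on the real line with finitely supported signed measures. The sketch below is an assumption-laden toy check, not the author's method: it uses the weights of the $\ell$-th forward difference on the atoms $0,1,\ldots,\ell$ (so that the mass and all moments up to order $\ell-1$ vanish) and evaluates $(-1)^{\ell}\sum_{i,j}w_{i}w_{j}|x_{i}-x_{j}|^{a}$ for $2(\ell-1)<a<2\ell$. The helper names are hypothetical.

```python
from math import comb


def finite_difference_measure(order):
    """Atoms 0..order weighted by the order-th forward difference.

    These weights annihilate every polynomial of degree < order, so the
    total mass and the moments of order 1..order-1 are all zero.
    """
    points = list(range(order + 1))
    weights = [(-1) ** (order - k) * comb(order, k) for k in points]
    return points, weights


def signed_energy(points, weights, a, ell):
    """Return (-1)^ell * sum_i sum_j w_i w_j |x_i - x_j|^a."""
    total = 0.0
    for xi, wi in zip(points, weights):
        for xj, wj in zip(points, weights):
            total += wi * wj * abs(xi - xj) ** a
    return (-1) ** ell * total


if __name__ == "__main__":
    # One exponent a in (2(ell-1), 2 ell) for ell = 1, 2, 3.
    for ell, a in [(1, 1.0), (2, 3.0), (3, 5.0)]:
        pts, wts = finite_difference_measure(ell)
        print(ell, a, signed_energy(pts, wts, a, ell))
```

For these three measures the printed values should be strictly positive, in agreement with the inner product claim. Dropping the moment constraints, for example by using the $\ell=2$ measure with $4<a<6$, is a simple way to see why the constraints are needed.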

5. Space of functions defined by derivatives of completely monotone functions

As mentioned in [12], the fact that the energy distance defines a metric on a separable Hilbert space can be proved using the proposed method, but it also follows from the fact that if $\mathcal{H}$ is a separable Hilbert space, then a measure $\mu\in\mathfrak{M}(\mathcal{H})$ such that $\|x\|^{a}\in L^{1}(|\mu|)$, $a\in(0,\infty)\setminus 2\mathbb{N}$, satisfies

(5.5) xya𝑑μ(x)=0,y\int_{\mathcal{H}}\|x-y\|^{a}d\mu(x)=0,\quad y\in\mathcal{H}

if and only if $\mu$ is the zero measure, as proved in [11], [10].

In [8] it is proved that if $\psi\in CM_{0}$ is not a constant function, then

ψ(xy2)𝑑μ(x)=0,y\int_{\mathcal{H}}\psi(\|x-y\|^{2})d\mu(x)=0,\quad y\in\mathcal{H}

if and only if $\mu$ is the zero measure. In this section we prove similar results in a much broader setting, as a consequence of the results presented in Section 4.

Theorem 5.1.

Let \mathcal{H} be an infinite dimensional Hilbert space, +\ell\in\mathbb{Z}_{+} and ϕ,φCM\phi,\varphi\in CM_{\ell}. If a measure μ𝔐()\mu\in\mathfrak{M}(\mathcal{H}) such that x2L1(|μ|)\|x\|^{2\ell}\in L^{1}(|\mu|) satisfies

ψ(xy2)𝑑μ(x)=0y,\int_{\mathcal{H}}\psi(\|x-y\|^{2})d\mu(x)=0\quad y\in\mathcal{H},

where ψ:=ϕφ\psi:=\phi-\varphi then it must hold that

ψ(xy2+c)𝑑μ(x)=0y,c0.\int_{\mathcal{H}}\psi(\|x-y\|^{2}+c)d\mu(x)=0\quad y\in\mathcal{H},c\geq 0.

In addition (even if $\mathcal{H}$ is not infinite dimensional), $\psi$ is not a polynomial if and only if the only measure $\mu\in\mathfrak{M}(\mathcal{H})$ with $\|x\|^{2\ell}\in L^{1}(|\mu|)$ that satisfies

ψ(xy2+c)𝑑μ(x)=0y,c0\int_{\mathcal{H}}\psi(\|x-y\|^{2}+c)d\mu(x)=0\quad y\in\mathcal{H},c\geq 0

is the zero measure.

For some functions we can provide a version of Theorem 5.1 on finite dimensional spaces.

Lemma 5.2.

Let \ell\in\mathbb{N} and \mathcal{H} be a Hilbert space. A measure μ𝔐()\mu\in\mathfrak{M}(\mathcal{H}) such that x2(1)L1(|μ|)\|x\|^{2(\ell-1)}\in L^{1}(|\mu|) and ψ(xy2)L1(|μ|×|μ|)\psi(\|x-y\|^{2})\in L^{1}(|\mu|\times|\mu|), satisfies

ψ(xy2)𝑑μ(x)=0y\int_{\mathcal{H}}\psi(\|x-y\|^{2})d\mu(x)=0\quad y\in\mathcal{H}

when ψ:[0,)\psi:[0,\infty)\to\mathbb{R} is one of the following functions:

  1. (i)(i)

    ψ(t)=ta/2\psi(t)=t^{a/2},  2(1)<a<22(\ell-1)<a<2\ell;

  2. (ii)(ii)

    ψ(t)=t1log(t)\psi(t)=t^{\ell-1}\log(t),  >1\ell>1;

  3. (iii)(iii)

    =1\ell=1 and ψCM\psi\in CM_{\ell} is not a polynomial, ψ(0)0\psi(0)\leq 0.

  4. (iv)(iv)

    =2\ell=2 and ψCM\psi\in CM_{\ell} is not a polynomial, ψ(0)0\psi(0)\leq 0 but x2L1(|μ|)\|x\|^{2\ell}\in L^{1}(|\mu|).

if and only if μ\mu is the zero measure.

We remark that in case $(iv)$ we may drop the additional assumption $\|x\|^{2\ell}\in L^{1}(|\mu|)$ if $\psi\in C^{\ell-1}[0,\infty)$.

6. Proofs

6.1. Section 3

Proof of Theorem 3.3.

The converse is immediate.
Suppose that γ\gamma is PP-CPD. Since PP is finite dimensional there exists a basis p1,,pmPp_{1},\ldots,p_{m}\in P for it such that pi(ξj)=δi,jp_{i}(\xi_{j})=\delta_{i,j}. By the integrability assumptions on the functions pip_{i} and γ(x,ξj)\gamma(x,\xi_{j}), the kernel KγL1(|μ|×|μ|)K_{\gamma}\in L^{1}(|\mu|\times|\mu|), and

XXγ(x,y)𝑑μ(x)𝑑μ¯(y)=XXKγ(x,y)𝑑μ(x)𝑑μ¯(y)\int_{X}\int_{X}\gamma(x,y)d\mu(x)d\overline{\mu}(y)=\int_{X}\int_{X}K_{\gamma}(x,y)d\mu(x)d\overline{\mu}(y)

the conclusion will follow from Lemma 3.6. ∎

Proof of Lemma 3.4 .

Let $\mu\in\mathfrak{M}_{P}(X)$ be such that $\gamma(x,y)\in L^{1}(|\mu|\times|\mu|)$. Let
$A:=\{\xi\in X,\quad\gamma(\cdot,\xi)\in L^{1}(|\mu|)\}$; by Fubini-Tonelli, its complement has $|\mu|$-measure zero. If $A\cap\{\xi,\quad p(\xi)\neq 0\}\neq\emptyset$, the result is a consequence of Theorem 3.3. On the other hand, if $A\cap\{\xi,\quad p(\xi)\neq 0\}=\emptyset$, note that the kernel $\gamma$ is positive definite when restricted to the closed set $B:=\{\xi,\quad p(\xi)=0\}$, $A\subset B$, and

XXγ(x,y)𝑑μ(x)𝑑μ¯(y)=BBγ(x,y)𝑑μ(x)𝑑μ¯(y).\int_{X}\int_{X}\gamma(x,y)d\mu(x)d\overline{\mu}(y)=\int_{B}\int_{B}\gamma(x,y)d\mu(x)d\overline{\mu}(y).

The conclusion follows from Lemma 3.6.
Now, under the additional requirements on pp and γ\gamma it is easy to see that γ\gamma is [p][p]-PD if and only if the kernel

(x,y)X×Xβ(x,y):=γ(x,y)p(x)p(y)(x,y)\in X\times X\to\beta(x,y):=\frac{\gamma(x,y)}{p(x)p(y)}\in\mathbb{R}

is CPD. Note that it is sufficient to prove the $3$ equivalences for the kernel $\beta$ and any measure $\eta\in\mathfrak{M}(X)$ with $\eta(X)=0$, because we can take $d\eta=pd\mu$.
The kernel d(x,y):=(2β(x,y)+β(x,x)+β(y,y))1/2d(x,y):=(-2\beta(x,y)+\beta(x,x)+\beta(y,y))^{1/2} is a pseudometric on XX, because d2d^{2} is a CND kernel with d(x,x)=0d(x,x)=0 for all xXx\in X, so it satisfies the triangle inequality. Since the function β(x,x)\beta(x,x) is bounded, the relations

βL1(|η|×|η|),β(x,z)L1(|η|) for some zX,β(x,z)L1(|η|) for every zX\beta\in L^{1}(|\eta|\times|\eta|),\quad\beta(x,z)\in L^{1}(|\eta|)\text{ for some }z\in X,\quad\beta(x,z)\in L^{1}(|\eta|)\text{ for every }z\in X

are respectively equivalent to the relations

dL2(|η|×|η|),d(x,z)L2(|η|) for some zX,d(x,z)L2(|η|) for every zX.d\in L^{2}(|\eta|\times|\eta|),\quad d(x,z)\in L^{2}(|\eta|)\text{ for some }z\in X,\quad d(x,z)\in L^{2}(|\eta|)\text{ for every }z\in X.

The conclusion that these $3$ properties are equivalent for the kernel $d$ follows directly from the triangle inequality. ∎

Proof of Lemma 3.6.

Assume without loss of generality that $\mu$ is a nonnegative measure. The fact that the set $X_{\mu}$ satisfies $\mu(X-X_{\mu})=0$ is a direct consequence of the Fubini-Tonelli Theorem.
Also, by the Radon hypothesis, there exists a sequence of nested compact sets (𝒞n)n(\mathcal{C}_{n})_{n\in\mathbb{N}} for which μ(Xn𝒞n)=0\mu(X-\cup_{n\in\mathbb{N}}\mathcal{C}_{n})=0. In particular, by the Dominated Convergence Theorem, the L1(μ×μ)L^{1}(\mu\times\mu) convergence holds

XXK(x,y)χ𝒞n(x)χ𝒞n(y)𝑑μ(x)𝑑μ(y)XXK(x,y)𝑑μ(x)𝑑μ(y),\int_{X}\int_{X}K(x,y)\chi_{\mathcal{C}_{n}}(x)\chi_{\mathcal{C}_{n}}(y)d\mu(x)d\mu(y)\to\int_{X}\int_{X}K(x,y)d\mu(x)d\mu(y),

because μ×μ(X×[Xn𝒞n])=0\mu\times\mu(X\times[X-\cup_{n\in\mathbb{N}}\mathcal{C}_{n}])=0. The function K(x,x)L1(χ𝒞nμ)\sqrt{K(x,x)}\in L^{1}(\chi_{\mathcal{C}_{n}}\mu), so by Lemma 2.1, KμnKK_{\mu^{n}}\in\mathcal{H}_{K}, where μn:=χ𝒞ndμ\mu^{n}:=\chi_{\mathcal{C}_{n}}d\mu, and

KμnKμm,\displaystyle\langle K_{\mu^{n}}-K_{\mu^{m}}, KμnKμmK\displaystyle K_{\mu^{n}}-K_{\mu^{m}}\rangle_{\mathcal{H}_{K}}
=XXK(x,y)[χ𝒞n(x)χ𝒞m(x)][χ𝒞n(y)χ𝒞m(y)]𝑑μ(x)𝑑μ(y)m,n0\displaystyle=\int_{X}\int_{X}K(x,y)[\chi_{\mathcal{C}_{n}}(x)-\chi_{\mathcal{C}_{m}}(x)][\chi_{\mathcal{C}_{n}}(y)-\chi_{\mathcal{C}_{m}}(y)]d\mu(x)d\mu(y)\xrightarrow[m,n\to\infty]{}0

which proves that the sequence $(K_{\mu^{n}})_{n\in\mathbb{N}}$ is Cauchy and, in particular, convergent to an element $K_{\mu}\in\mathcal{H}_{K}$. Since $\mathcal{H}_{K}$ is a RKHS, convergence in norm implies pointwise convergence, so

\[
K_{\mu}(z)=\lim_{n\to\infty}K_{\mu^{n}}(z)=\lim_{n\to\infty}\int_{\mathcal{C}_{n}}K(x,z)d\mu(x)=\int_{X}K(x,z)d\mu(x),
\]

for every zXμz\in X_{\mu}, which proves our claim.
Now, if $K\in L^{1}(\mu\times\eta)$, we have that

\[
\langle K_{\eta^{n}},K_{\mu^{n}}\rangle_{\mathcal{H}_{K}}=\int_{X}\int_{X}K(x,y)\chi_{\mathcal{D}_{n}}(x)\chi_{\mathcal{C}_{n}}(y)d\eta(x)d\mu(y).
\]

The left hand side of this equality converges to $\langle K_{\eta},K_{\mu}\rangle_{\mathcal{H}_{K}}$, while the right hand side converges to $\int_{X}\int_{X}K(x,y)d\eta(x)d\mu(y)$ by the Dominated Convergence Theorem. ∎

6.2. Section 4

Throughout the rest of the paper, we use the well known fact that a Hermitian kernel $\gamma:X\times X\to\mathbb{C}$ is CND if and only if the kernel $e^{-r\gamma(x,y)}$ is positive definite for every $r>0$; see page 74 in [1].
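For the model case $\gamma(x,y)=\|x-y\|^{2}$ on $\mathbb{R}^{d}$, both sides of this equivalence can be probed numerically through quadratic forms. The sketch below is only a randomized sanity check under assumed parameters (number of points, range of $r$, tolerances); it does not prove anything, and the helper names are hypothetical.

```python
import math
import random


def sq_dist(x, y):
    """Squared Euclidean distance, the model CND kernel."""
    return sum((u - v) ** 2 for u, v in zip(x, y))


def quadratic_form(kernel, points, coeffs):
    """Return sum_i sum_j c_i c_j kernel(x_i, x_j)."""
    return sum(ci * cj * kernel(x, y)
               for x, ci in zip(points, coeffs)
               for y, cj in zip(points, coeffs))


def check_kernels(num_trials=200, num_points=6, dim=3, seed=0):
    """Return (min Gaussian form, max squared-distance form on zero-sum weights)."""
    rng = random.Random(seed)
    worst_pd = float("inf")
    worst_cnd = float("-inf")
    for _ in range(num_trials):
        points = [[rng.uniform(-1, 1) for _ in range(dim)]
                  for _ in range(num_points)]
        coeffs = [rng.gauss(0, 1) for _ in range(num_points)]
        r = rng.uniform(0.1, 5.0)

        def gaussian(x, y, r=r):
            return math.exp(-r * sq_dist(x, y))

        worst_pd = min(worst_pd, quadratic_form(gaussian, points, coeffs))
        # CND only constrains coefficient vectors that sum to zero.
        mean = sum(coeffs) / num_points
        centred = [c - mean for c in coeffs]
        worst_cnd = max(worst_cnd, quadratic_form(sq_dist, points, centred))
    return worst_pd, worst_cnd


if __name__ == "__main__":
    print(check_kernels())
```

Up to rounding, the first returned value should be nonnegative (positive definiteness of $e^{-r\gamma}$) and the second nonpositive (conditional negative definiteness of $\gamma$).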

The next lemma is an improvement of Lemma 3.4 for $p$ as the set of constant functions. We use CND instead of CPD because this is how we apply the result.

Lemma 6.1.

Let γ:X×X\gamma:X\times X\to\mathbb{R} be a continuous CND kernel such that γ(x,x)\gamma(x,x) is a bounded function, μ𝔐(X)\mu\in\mathfrak{M}(X) and θ>0\theta>0. Then, the following assertions are equivalent

  1. (i)(i)

    γLθ(|μ|×|μ|)\gamma\in L^{\theta}(|\mu|\times|\mu|);

  2. (ii)(ii)

    The function xXγ(x,z)Lθ(|μ|)x\in X\to\gamma(x,z)\in L^{\theta}(|\mu|) for some zXz\in X;

  3. (iii)(iii)

    The function xXγ(x,z)Lθ(|μ|)x\in X\to\gamma(x,z)\in L^{\theta}(|\mu|) for every zXz\in X.

Proof.

Since γ\gamma is CND there exists a CND kernel β:X×X\beta:X\times X\to\mathbb{R}, for which β(x,x)=0\beta(x,x)=0 for every xXx\in X, β1/2\beta^{1/2} is a pseudometric on XX and γ(x,y)=β(x,y)+γ(x,x)/2+γ(y,y)/2\gamma(x,y)=\beta(x,y)+\gamma(x,x)/2+\gamma(y,y)/2.
Since $\gamma(x,x)$ is bounded and $\mu$ is a finite measure, for $\theta\geq 1$ the three assertions for $\gamma$ are respectively equivalent to the three assertions for the CND kernel $\beta$ by the Minkowski inequality. If $\theta\in(0,1)$ the same holds, but it follows from the general inequality on $L^{\theta}$ spaces

|f+g|θ|f|θ+|g|θ.\int|f+g|^{\theta}\leq\int|f|^{\theta}+\int|g|^{\theta}.

In particular, we may suppose that γ\gamma is a CND kernel for which γ(x,x)=0\gamma(x,x)=0 for every xXx\in X.
If γ(x,y)θL1(|μ|×|μ|)\gamma(x,y)^{\theta}\in L^{1}(|\mu|\times|\mu|) then there exists zXz\in X for which γ(x,z)θL1(|μ|)\gamma(x,z)^{\theta}\in L^{1}(|\mu|) by the Fubini-Tonelli Theorem.
If xXγ(x,z)θL1(|μ|)x\in X\to\gamma(x,z)^{\theta}\in L^{1}(|\mu|) for some zXz\in X, then for every yXy\in X

(γ(x,y))θ=((γ(x,y))1/2)2θ(γ(x,z)1/2+γ(y,z)1/2)2θ.(\gamma(x,y))^{\theta}=((\gamma(x,y))^{1/2})^{2\theta}\leq(\gamma(x,z)^{1/2}+\gamma(y,z)^{1/2})^{2\theta}.

For $\theta\geq 1/2$, the functions inside the parentheses on the right hand side of the previous inequality are elements of $L^{2\theta}(|\mu|)$ (in the variable $x$), so by the Minkowski inequality we obtain the integrability of $x\in X\to\gamma(x,y)^{\theta}$. Integrating the Minkowski inequality with respect to $d|\mu|(y)$, we also obtain that $\gamma^{\theta}\in L^{1}(|\mu|\times|\mu|)$. For $0<\theta<1/2$, the proof is the same, but it follows from the general inequality on $L^{2\theta}$ spaces mentioned above. ∎

Proof of Theorem 4.1.

For the first claim it is sufficient to prove that I(μ,μ)γ,ψ0I(\mu,\mu)_{\gamma,\psi}\geq 0 by the linearity of the integration involved. Indeed, by Equation 2.4, we have that

ψ(γ(x,y))=a+bγ(x,y)+(0,)erγ(x,y)1r𝑑λ(r),-\psi(\gamma(x,y))=a+b\gamma(x,y)+\int_{(0,\infty)}\frac{e^{-r\gamma(x,y)}-1}{r}d\lambda(r),

where $b\leq 0$ and $\lambda$ is a nonnegative Radon measure such that $\min\{1,r^{-1}\}\in L^{1}(\lambda)$. Consequently, $-\psi(\gamma(x,y))$ is a CPD kernel, and the conclusion is then a consequence of Corollary 3.4.
If $\psi$ is not a linear function, then $\lambda((0,\infty))>0$, because the representation in Equation 2.4 is unique.
If $\mu\in\mathfrak{M}_{1}(X;\gamma,\psi)$, then the $3$ functions that describe $\psi(\gamma(x,y))$ are in $L^{1}(|\mu|\times|\mu|)$, because $e^{-r\gamma(x,y)}-1\leq 0$ for every $r>0$ and $x,y\in X$ and $b\leq 0$. Since

(1ert)r(1+t)min{1,r1},r,t0(1-e^{-rt})\leq r(1+t)\min\{1,r^{-1}\},\quad r,t\geq 0

we can apply Fubini-Tonelli and obtain that

XXψ(γ(x,y))𝑑μ(x)𝑑μ(y)=\displaystyle-\int_{X}\int_{X}\psi(\gamma(x,y))d\mu(x)d\mu(y)= bXXγ(x,y)𝑑μ(x)𝑑μ(y)\displaystyle b\int_{X}\int_{X}\gamma(x,y)d\mu(x)d\mu(y)
+(0,)[XXerγ(x,y)𝑑μ(x)𝑑μ(y)]1r𝑑λ(r).\displaystyle+\int_{(0,\infty)}\left[\int_{X}\int_{X}e^{-r\gamma(x,y)}d\mu(x)d\mu(y)\right]\frac{1}{r}d\lambda(r).

The first double integral is nonpositive by Corollary 3.4, so its product with $b\leq 0$ is nonnegative. Since $2\gamma(x,y)=\gamma(x,x)+\gamma(y,y)$ only when $x=y$, the kernel $e^{-r\gamma(x,y)}$ is ISPD for every $r>0$ by Theorem 4.2 in [8], so

XXerγ(x,y)𝑑μ(x)𝑑μ(y)>0,r>0\int_{X}\int_{X}e^{-r\gamma(x,y)}d\mu(x)d\mu(y)>0,\quad r>0

and the conclusion follows because $\lambda((0,\infty))>0$. ∎

Proof of Theorem 4.2.

By equation 4.38.2 in [2], we have that for $t\geq 1$

arcCosh(t)=log(2)+log(t)k=1(2k)!22k(k!)2t2k2k.\operatorname{arcCosh}(t)=\log(2)+\log(t)-\sum_{k=1}^{\infty}\frac{(2k)!}{2^{2k}(k!)^{2}}\frac{t^{-2k}}{2k}.
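The series expansion above can be compared numerically with a library implementation of the inverse hyperbolic cosine. The following sketch uses Python's `math.acosh` as the reference and truncates the series after a hypothetical number of terms; the function name `arccosh_series` is an illustrative assumption.

```python
import math


def arccosh_series(t, terms=200):
    """Evaluate log(2) + log(t) - sum_k c_k t^{-2k} / (2k), truncated.

    Here c_k = (2k)! / (2^{2k} (k!)^2), computed with the recurrence
    c_k = c_{k-1} * (2k - 1) / (2k) and c_0 = 1.
    """
    if t < 1:
        raise ValueError("the expansion is stated for t >= 1")
    total = math.log(2) + math.log(t)
    coeff = 1.0
    for k in range(1, terms + 1):
        coeff *= (2 * k - 1) / (2 * k)
        total -= coeff * t ** (-2 * k) / (2 * k)
    return total


if __name__ == "__main__":
    for t in (1.5, 2.0, 10.0):
        print(t, arccosh_series(t), math.acosh(t))
```

All coefficients of the series are nonnegative, which is the property the proof relies on. Convergence is geometric for $t>1$ and slow at $t=1$, so the truncation level would need to grow near the endpoint.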

In [1] it is proved that $\log([x,y])$ is a CND kernel on $\mathbb{H}$, while by [8] the positive definite kernel $[x,y]^{-2k}$ on $\mathbb{H}$ is ISPD for every $k\in\mathbb{N}$. Since the series appearing in the $\operatorname{arcCosh}$ formula above contains only nonnegative terms, we may interchange the order of summation and integration for any $\mu\in\mathfrak{M}_{1}(\mathbb{H};t)$. Consequently, if $\mu$ is not the zero measure

d(x,y)𝑑μ(x)𝑑μ(y)\displaystyle-\int_{\mathbb{H}}\int_{\mathbb{H}}d_{\mathbb{H}}(x,y)d\mu(x)d\mu(y) =arcCosh([x,y])𝑑μ(x)𝑑μ(y)\displaystyle=-\int_{\mathbb{H}}\int_{\mathbb{H}}\operatorname{arcCosh}([x,y])d\mu(x)d\mu(y)
=log([x,y])+k=1(2k)!22k(k!)22k[x,y]2kdμ(x)dμ(y)\displaystyle=\int_{\mathbb{H}}\int_{\mathbb{H}}-\log([x,y])+\sum_{k=1}^{\infty}\frac{(2k)!}{2^{2k}(k!)^{2}2k}[x,y]^{-2k}d\mu(x)d\mu(y)
k=1(2k)!22k(k!)22k[x,y]2kdμ(x)dμ(y)\displaystyle\geq\int_{\mathbb{H}}\int_{\mathbb{H}}\sum_{k=1}^{\infty}\frac{(2k)!}{2^{2k}(k!)^{2}2k}[x,y]^{-2k}d\mu(x)d\mu(y)
=k=1(2k)!22k(k!)22k[x,y]2k𝑑μ(x)𝑑μ(y)>0.\displaystyle=\sum_{k=1}^{\infty}\frac{(2k)!}{2^{2k}(k!)^{2}2k}\int_{\mathbb{H}}\int_{\mathbb{H}}[x,y]^{-2k}d\mu(x)d\mu(y)>0.

Proof of Corollary 4.3.

By Remark 3.3-(iv) in [18], if $\psi$ satisfies these assumptions then we can write the kernel $D_{\psi,\gamma}$ as

Dψ,γ(x,y)=ψ(γ(x,y))=(0,)1erγ(x,y)r𝑑λ(r)D_{\psi,\gamma}(x,y)=\psi(\gamma(x,y))=\int_{(0,\infty)}\frac{1-e^{-r\gamma(x,y)}}{r}d\lambda(r)

where λ\lambda is a nonnegative Radon measure such that min{1,r1}L1(λ)\min\{1,r^{-1}\}\in L^{1}(\lambda). Because γ\gamma is a metric, we have that

1erγ(x,y)[1erγ(x,z)]+[1erγ(z,y)],x,y,zX,1-e^{-r\gamma(x,y)}\leq[1-e^{-r\gamma(x,z)}]+[1-e^{-r\gamma(z,y)}],\quad x,y,z\in X,

which proves that $D_{\psi,\gamma}(x,y)\leq D_{\psi,\gamma}(x,z)+D_{\psi,\gamma}(z,y)$.
The topologies are equivalent because ψ\psi is necessarily an increasing function with ψ(0)=0\psi(0)=0, so ψ(tn)0\psi(t_{n})\to 0 if and only if tn0t_{n}\to 0.
The metric space (X,Dψ,γ)(X,D_{\psi,\gamma}) has strong negative type because the kernel γ\gamma is continuous on the metric topology (X,γ)(X,\gamma), ψ\psi is not a linear function and the remaining requirements for Theorem 4.1 are satisfied. ∎
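The triangle inequality for $D_{\psi,\gamma}$ can be illustrated with a concrete choice. The sketch below assumes $\psi(t)=\log(1+t)$, which admits the representation $\int_{(0,\infty)}\frac{1-e^{-rt}}{r}e^{-r}dr$, and takes $\gamma$ to be the Euclidean metric on the plane; both choices, and the helper names, are assumptions made for illustration only.

```python
import math
import random


def bernstein_log(t):
    """psi(t) = log(1 + t), an example with psi(0) = 0."""
    return math.log1p(t)


def max_triangle_violation(psi, metric, points):
    """Largest value of D(x, y) - D(x, z) - D(z, y), with D = psi(metric)."""
    worst = float("-inf")
    for x in points:
        for y in points:
            for z in points:
                gap = (psi(metric(x, y))
                       - psi(metric(x, z))
                       - psi(metric(z, y)))
                worst = max(worst, gap)
    return worst


if __name__ == "__main__":
    rng = random.Random(0)
    pts = [(rng.uniform(-2, 2), rng.uniform(-2, 2)) for _ in range(15)]
    print(max_triangle_violation(bernstein_log, math.dist, pts))
```

A value that is not positive (up to rounding) is consistent with $D_{\psi,\gamma}$ being a metric. The value $0$ is attained at degenerate triples such as $z=x$.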

In order to prove the next result, we use the same infinite dimensional multinomial theorem that was used in [8] to prove that the Gaussian kernel is ISPD on Hilbert spaces. If $\mathcal{H}$ is a real Hilbert space and $(e_{\xi})_{\xi\in\mathbb{I}}$ is a complete orthonormal basis for it, then for every $n\in\mathbb{N}$

(6.6) x,yn=(ξ𝕀xξyξ)n=α(𝕀,+),|α|=nn!α!xαyα\langle x,y\rangle^{n}=\left(\sum_{\xi\in\mathbb{I}}x_{\xi}y_{\xi}\right)^{n}=\sum_{\alpha\in(\mathbb{I},\mathbb{Z}_{+}),|\alpha|=n}\frac{n!}{\alpha!}x^{\alpha}y^{\alpha}

where $x_{\xi}=\langle x,e_{\xi}\rangle$, $(\mathbb{I},\mathbb{Z}_{+})$ is the space of functions from $\mathbb{I}$ to $\mathbb{Z}_{+}$, and the condition $|\alpha|=n$ means that $\sum_{\xi\in\mathbb{I}}\alpha(\xi)=n$ (in particular $\alpha$ must be the zero function except at a finite number of points). Also $\alpha!=\prod_{\xi\in\mathbb{I}}\alpha(\xi)!$ (which makes sense because $0!=1$) and $x^{\alpha}=\prod_{\alpha(\xi)\neq 0}x_{\xi}^{\alpha(\xi)}$. This result can be proved using approximations of $\langle x,y\rangle$ on finite dimensional spaces and the multinomial theorem on those spaces. The number $\lfloor l\rfloor$ stands for the largest integer less than or equal to $l$.
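In finite dimensions the identity (6.6) is the usual multinomial theorem applied to the numbers $x_{\xi}y_{\xi}$. The following sketch checks it with exact integer arithmetic; the function names are hypothetical, and the brute-force enumeration of multi-indices is only meant for small dimensions.

```python
from itertools import product
from math import factorial


def multi_indices(dim, n):
    """Yield every alpha in Z_+^dim with |alpha| = n."""
    for alpha in product(range(n + 1), repeat=dim):
        if sum(alpha) == n:
            yield alpha


def multinomial_expansion(x, y, n):
    """Return sum over |alpha| = n of (n! / alpha!) x^alpha y^alpha."""
    total = 0
    for alpha in multi_indices(len(x), n):
        coeff = factorial(n)
        monomial = 1
        for a_i, x_i, y_i in zip(alpha, x, y):
            coeff //= factorial(a_i)  # every partial quotient is an integer
            monomial *= (x_i * y_i) ** a_i
        total += coeff * monomial
    return total


if __name__ == "__main__":
    x, y, n = (1, 2, 3), (4, 5, 6), 3
    inner = sum(u * v for u, v in zip(x, y))
    print(inner ** n, multinomial_expansion(x, y, n))
```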

In the next lemma we use the fact that, for a continuous positive definite kernel $K:X\times X\to\mathbb{C}$, a measure $\mu\in\mathfrak{M}_{\sqrt{K}}(X)$ satisfies

XK(x,y)𝑑μ(x)=0,yX\int_{X}K(x,y)d\mu(x)=0,\quad y\in X

if and only if $\int_{X}\int_{X}K(x,y)d\mu(x)d\overline{\mu}(y)=0$; see [17], [21].

Lemma 6.2.

Let \mathcal{H} be a real Hilbert space, nn\in\mathbb{N} and μ𝔐()\mu\in\mathfrak{M}(\mathcal{H}). Suppose that xy2nL1(|μ|×|μ|)\|x-y\|^{2n}\in L^{1}(|\mu|\times|\mu|), then

x,ykx2iy2jL1(|μ|×|μ|),k,i,j+,k+i+jn.\langle x,y\rangle^{k}\|x\|^{2i}\|y\|^{2j}\in L^{1}(|\mu|\times|\mu|),\quad k,i,j\in\mathbb{Z}_{+},\quad k+i+j\leq n.

Moreover, if x,yk𝑑μ(x)𝑑μ(y)=0\int_{\mathcal{H}}\int_{\mathcal{H}}\langle x,y\rangle^{k}d\mu(x)d\mu(y)=0 for every 0kn10\leq k\leq n-1, then

(1)n\displaystyle(-1)^{n}\int_{\mathcal{H}}\int_{\mathcal{H}} xy2ndμ(x)dμ(y)\displaystyle\|x-y\|^{2n}d\mu(x)d\mu(y)
=l=0n/2(n2l)(2ll)2n2lx,yn2lx2ly2l𝑑μ(x)𝑑μ(y)0,\displaystyle=\sum_{l=0}^{\lfloor n/2\rfloor}\binom{n}{2l}\binom{2l}{l}2^{n-2l}\int_{\mathcal{H}}\int_{\mathcal{H}}\langle x,y\rangle^{n-2l}\|x\|^{2l}\|y\|^{2l}d\mu(x)d\mu(y)\geq 0,

and

xy2m𝑑μ(x)𝑑μ(y)=0,0mn1.\int_{\mathcal{H}}\int_{\mathcal{H}}\|x-y\|^{2m}d\mu(x)d\mu(y)=0,\quad 0\leq m\leq n-1.
Proof.

By Lemma 6.1, the fact that $\|x-y\|^{2n}\in L^{1}(|\mu|\times|\mu|)$ is equivalent to $\|x\|^{2n}\in L^{1}(|\mu|)$. Since $|\langle x,y\rangle^{k}\|x\|^{2i}\|y\|^{2j}|\leq\|x\|^{2i+k}\|y\|^{2j+k}$ and $\|x\|^{2i+k}\leq\max\{1,\|x\|^{2n}\}$, we obtain the desired integrability.
Note that

xy2m=(x2+y22x,y)m=k=0mi=0mk(mk)(mki)(2)kx,ykx2iy2(mki)\|x-y\|^{2m}=(\|x\|^{2}+\|y\|^{2}-2\langle x,y\rangle)^{m}=\sum_{k=0}^{m}\sum_{i=0}^{m-k}\binom{m}{k}\binom{m-k}{i}(-2)^{k}\langle x,y\rangle^{k}\|x\|^{2i}\|y\|^{2(m-k-i)}

If $k+2i\leq n-1$, then by the hypothesis

\begin{align*}
0&=\int_{\mathcal{H}}\int_{\mathcal{H}}\langle x,y\rangle^{k+2i}d\mu(x)d\mu(y)=\int_{\mathcal{H}}\int_{\mathcal{H}}\langle x,y\rangle^{k}\left(\sum_{\xi\in\mathbb{I}}x_{\xi}y_{\xi}\right)^{2i}d\mu(x)d\mu(y)\\
&=\int_{\mathcal{H}}\int_{\mathcal{H}}\langle x,y\rangle^{k}\left(\sum_{|\alpha|=2i}\frac{(2i)!}{\alpha!}x^{\alpha}y^{\alpha}\right)d\mu(x)d\mu(y)\\
&=\sum_{|\alpha|=2i}\frac{(2i)!}{\alpha!}\int_{\mathcal{H}}\int_{\mathcal{H}}\langle x,y\rangle^{k}x^{\alpha}y^{\alpha}d\mu(x)d\mu(y).
\end{align*}

But then, x,ykxαyα𝑑μ(x)𝑑μ(y)=0\int_{\mathcal{H}}\int_{\mathcal{H}}\langle x,y\rangle^{k}x^{\alpha}y^{\alpha}d\mu(x)d\mu(y)=0 for every α(𝕀,+)\alpha\in(\mathbb{I},\mathbb{Z}_{+}) with |α|=2i|\alpha|=2i, because the kernel inside the double integral is positive definite, continuous and satisfies the conditions on Lemma 2.1. In particular, since for every yy\in\mathcal{H} and |α|=2i|\alpha|=2i there exists a sequence (yl)l(y_{l})_{l\in\mathbb{N}} that converges to yy and ylα0y_{l}^{\alpha}\neq 0, we have that

x,ykxα𝑑μ(x)=0,y,α(𝕀,+),|α|=2i.\int_{\mathcal{H}}\langle x,y\rangle^{k}x^{\alpha}d\mu(x)=0,\quad y\in\mathcal{H},\alpha\in(\mathbb{I},\mathbb{Z}_{+}),|\alpha|=2i.

Then

x,ykx2iy2(mki)𝑑μ(x)𝑑μ(y)\displaystyle\int_{\mathcal{H}}\int_{\mathcal{H}}\langle x,y\rangle^{k}\|x\|^{2i}\|y\|^{2(m-k-i)}d\mu(x)d\mu(y)
=|β|=mki|α|=i(mki)!β!i!α!x,ykx2αy2β𝑑μ(x)𝑑μ(y)=0.\displaystyle=\sum_{|\beta|=m-k-i}\sum_{|\alpha|=i}\frac{(m-k-i)!}{\beta!}\frac{i!}{\alpha!}\int_{\mathcal{H}}\int_{\mathcal{H}}\langle x,y\rangle^{k}x^{2\alpha}y^{2\beta}d\mu(x)d\mu(y)=0.

By symmetry, the same double integral is zero when $k+2(m-k-i)\leq n-1$. Both conditions fail only when $n=m$ and $2i=2(n-i-k)$. The remaining terms in the sum when $n=m$ are exactly those in the statement of the lemma, after a simplification using those two equalities. The conclusion follows because the kernel $\langle x,y\rangle^{k}\|x\|^{2l}\|y\|^{2l}$ is continuous, positive definite and satisfies the conditions of Lemma 2.1. ∎
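The identity of Lemma 6.2 can be sanity-checked on finitely supported signed measures with integer data, where both sides are exact integers. The sketch below is a toy verification under assumed example measures (chosen so that $\int\int\langle x,y\rangle^{k}d\mu d\mu=0$ for $0\leq k\leq n-1$); the helper names are hypothetical.

```python
from math import comb


def dot(x, y):
    return sum(u * v for u, v in zip(x, y))


def double_sum(kernel, points, weights):
    """Return sum_i sum_j w_i w_j kernel(x_i, x_j)."""
    return sum(wi * wj * kernel(x, y)
               for x, wi in zip(points, weights)
               for y, wj in zip(points, weights))


def lemma_lhs(points, weights, n):
    """(-1)^n times the double integral of ||x - y||^{2n}."""
    def kernel(x, y):
        return (dot(x, x) + dot(y, y) - 2 * dot(x, y)) ** n
    return (-1) ** n * double_sum(kernel, points, weights)


def lemma_rhs(points, weights, n):
    """Sum over l of C(n,2l) C(2l,l) 2^{n-2l} times the mixed double integral."""
    total = 0
    for l in range(n // 2 + 1):
        def kernel(x, y, l=l):
            return dot(x, y) ** (n - 2 * l) * dot(x, x) ** l * dot(y, y) ** l
        coeff = comb(n, 2 * l) * comb(2 * l, l) * 2 ** (n - 2 * l)
        total += coeff * double_sum(kernel, points, weights)
    return total


if __name__ == "__main__":
    # n = 2: zero mass and zero vector mean on the corners of the unit square.
    square = [(0, 0), (1, 0), (0, 1), (1, 1)]
    print(lemma_lhs(square, [1, -1, -1, 1], 2), lemma_rhs(square, [1, -1, -1, 1], 2))
    # n = 3: third forward difference on the line (moments 0, 1, 2 vanish).
    line = [(0,), (1,), (2,), (3,)]
    print(lemma_lhs(line, [-1, 3, -3, 1], 3), lemma_rhs(line, [-1, 3, -3, 1], 3))
```

In both examples the two sides agree and are nonnegative, as the lemma predicts.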

Corollary 6.3.

Let γ:X×X[0,)\gamma:X\times X\to[0,\infty) be a continuous CND kernel such that xγ(x,x)x\to\gamma(x,x) is a constant function and 2γ(x,y)=γ(x,x)+γ(y,y)2\gamma(x,y)=\gamma(x,x)+\gamma(y,y) only when x=yx=y. Then for nn\in\mathbb{N} and μ𝔐(X)\mu\in\mathfrak{M}(X) such that γnL1(|μ|×|μ|)\gamma^{n}\in L^{1}(|\mu|\times|\mu|), the kernel KγK_{-\gamma} defined in Theorem 3.1 satisfies (Kγ)mL1(|μ|×|μ|)(K_{-\gamma})^{m}\in L^{1}(|\mu|\times|\mu|), 0mn0\leq m\leq n and if

XXKγ(x,y)m𝑑μ(x)𝑑μ(y)=0,0mn1,\int_{X}\int_{X}K_{-\gamma}(x,y)^{m}d\mu(x)d\mu(y)=0,\quad 0\leq m\leq n-1,

then

(1)nXXγ(x,y)n𝑑μ(x)𝑑μ(y)0(-1)^{n}\int_{X}\int_{X}\gamma(x,y)^{n}d\mu(x)d\mu(y)\geq 0

and

XXγ(x,y)m𝑑μ(x)𝑑μ(y)=0,0mn1.\int_{X}\int_{X}\gamma(x,y)^{m}d\mu(x)d\mu(y)=0,\quad 0\leq m\leq n-1.
Proof.

By the hypothesis on γ\gamma, there exists a Hilbert space \mathcal{H} and a continuous and injective function T:XT:X\to\mathcal{H}, such that γ(x,y)=T(x)T(y)2+c\gamma(x,y)=\|T(x)-T(y)\|_{\mathcal{H}}^{2}+c, where c0c\geq 0 is the value of γ\gamma on the diagonal. If μ𝔐(X)\mu\in\mathfrak{M}(X) is a measure satisfying the conditions on the Corollary, then the image measure μT𝔐()\mu_{T}\in\mathfrak{M}(\mathcal{H}) satisfies the same conditions of Lemma 6.2. The conclusion follows by standard properties of image measures.∎

Proof of Theorem 4.4.

By Equation 2.3, we have that

ψ(γ(x,y))=(0,)eγ(x,y)re(r)ω,(γ(x,y)r)r𝑑λ(r)+k=0akγ(x,y)k.\psi(\gamma(x,y))=\int_{(0,\infty)}\frac{e^{-\gamma(x,y)r}-e_{\ell}(r)\omega_{\ell,\infty}(\gamma(x,y)r)}{r^{\ell}}d\lambda(r)+\sum_{k=0}^{\ell}a_{k}\gamma(x,y)^{k}.

By the hypothesis, the +2\ell+2 functions above are in L1(|μ|×|μ|)L^{1}(|\mu|\times|\mu|). Corollary 6.3 implies that

XXk=0akγ(x,y)kdμ(x)dμ(y)=XXaγ(x,y)𝑑μ(x)𝑑μ(y)0.\int_{X}\int_{X}\sum_{k=0}^{\ell}a_{k}\gamma(x,y)^{k}d\mu(x)d\mu(y)=\int_{X}\int_{X}a_{\ell}\gamma(x,y)^{\ell}d\mu(x)d\mu(y)\geq 0.

On the other hand, because of Lemma 6.4 we can apply Fubini-Tonelli, and then

XX[(0,)eγ(x,y)re(r)ω,(γ(x,y)r)r𝑑λ(r)]𝑑μ(x)𝑑μ(y)\displaystyle\int_{X}\int_{X}\left[\int_{(0,\infty)}\frac{e^{-\gamma(x,y)r}-e_{\ell}(r)\omega_{\ell,\infty}(\gamma(x,y)r)}{r^{\ell}}d\lambda(r)\right]d\mu(x)d\mu(y)
=\displaystyle= (0,)1r[XXeγ(x,y)r𝑑μ(x)𝑑μ(y)]𝑑λ(r)0,\displaystyle\int_{(0,\infty)}\frac{1}{r^{\ell}}\left[\int_{X}\int_{X}e^{-\gamma(x,y)r}d\mu(x)d\mu(y)\right]d\lambda(r)\geq 0,

because the inner double integral is a nonnegative number for every r>0r>0 by [8].
Because the representation for $\psi$ is unique, if $\psi$ is not a polynomial of degree $\ell$ or less then $\lambda((0,\infty))>0$. Moreover, if $2\gamma(x,y)=\gamma(x,x)+\gamma(y,y)$ only when $x=y$, then by [8] the inner double integral is a positive number for every $r>0$ when $\mu$ is not the zero measure, and so the triple integral is positive as well. ∎

Lemma 6.4.

There exists an $M>0$, which only depends on $\ell\in\mathbb{Z}_{+}$, for which

(6.7) |erte(r)ω,(rt)|Mr(1+t)min{1,r},r>0,t0.|e^{-rt}-e_{\ell}(r)\omega_{\ell,\infty}(rt)|\leq Mr^{\ell}(1+t^{\ell})\min\{1,r^{-\ell}\},\quad r>0,t\geq 0.
Proof.

Note that $r^{\ell}\min\{1,r^{-\ell}\}=\min\{r^{\ell},1\}$.
Case $r\geq 1$: In this case, the right hand side of Equation 6.7 is $M(1+t^{\ell})$, while the left hand side satisfies

|erte(r)ω,(rt)|1+|e(r)ω,(rt)|1+l=01|e(r)rl|tl/l!.|e^{-rt}-e_{\ell}(r)\omega_{\ell,\infty}(rt)|\leq 1+|e_{\ell}(r)\omega_{\ell,\infty}(rt)|\leq 1+\sum_{l=0}^{\ell-1}|e_{\ell}(r)r^{l}|t^{l}/l!.

Since each function $|e_{\ell}(r)r^{l}|$ is bounded, the result follows from the fact that $t^{l}\leq 1+t^{\ell}$.

Case $r<1$: In this case, the right hand side of Equation 6.7 is $M(1+t^{\ell})r^{\ell}$, while the left hand side satisfies

|erte(r)ω,(rt)||ertω,(rt)|+|(e(r)1)ω,(rt)|.|e^{-rt}-e_{\ell}(r)\omega_{\ell,\infty}(rt)|\leq|e^{-rt}-\omega_{\ell,\infty}(rt)|+|(e_{\ell}(r)-1)\omega_{\ell,\infty}(rt)|.

The function [esω,(s)]/s[e^{-s}-\omega_{\ell,\infty}(s)]/s^{\ell} is a bounded function on s[0,)s\in[0,\infty), and from this we obtain the desired inequality for |ertω,(rt)||e^{-rt}-\omega_{\ell,\infty}(rt)|.
On the other function we have that

|(e(r)1)ω,(rt)|l=01|(e(r)1)rl|tl/l!.|(e_{\ell}(r)-1)\omega_{\ell,\infty}(rt)|\leq\sum_{l=0}^{\ell-1}|(e_{\ell}(r)-1)r^{l}|t^{l}/l!.

Similarly, since e(r)1=erk=rk/k!e_{\ell}(r)-1=-e^{-r}\sum_{k=\ell}^{\infty}r^{k}/k! the functions (e(r)1)rlr(e_{\ell}(r)-1)r^{l}r^{-\ell} are bounded on r(0,1)r\in(0,1) and from this we also obtain the desired inequality for |(e(r)1)ω,(rt)||(e_{\ell}(r)-1)\omega_{\ell,\infty}(rt)|, which concludes the proof. ∎

6.3. Section 5

Proof of Theorem 5.1.

Since $\mathcal{H}$ is infinite dimensional, let $(e_{\iota})_{\iota\in\mathbb{N}}$ be an orthonormal sequence of vectors in $\mathcal{H}$. By the Dominated Convergence Theorem, we have that

0=ψ(xyreι2)𝑑μ(x)ψ(xy2+r2)𝑑μ(x),y,r0=\int_{\mathcal{H}}\psi(\|x-y-re_{\iota}\|^{2})d\mu(x)\to\int_{\mathcal{H}}\psi(\|x-y\|^{2}+r^{2})d\mu(x),\quad y\in\mathcal{H},\quad r\in\mathbb{R}

because xy,eι0\langle x-y,e_{\iota}\rangle\to 0 as ι\iota\to\infty and |ψ(t)||φ(t)|+|ϕ(t)|(1+t)|\psi(t)|\leq|\varphi(t)|+|\phi(t)|\lesssim(1+t)^{\ell}, which proves the first assertion.
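The pointwise limit used in this step rests on the expansion $\|z-re_{\iota}\|^{2}=\|z\|^{2}+r^{2}-2r\langle z,e_{\iota}\rangle$ with $z=x-y$, together with $\langle z,e_{\iota}\rangle\to 0$. A minimal sketch, assuming a truncated square-summable sequence with coordinates $1/(k+1)$ as a stand-in for $z$ (an illustrative choice, not taken from the paper):

```python
def squared_norm(z):
    return sum(zk * zk for zk in z)


def shifted_squared_norm(z, r, index):
    """Return ||z - r e_index||^2 for a finitely supported sequence z."""
    return sum((zk - (r if k == index else 0.0)) ** 2
               for k, zk in enumerate(z))


def gap(z, r, index):
    """Difference between ||z - r e_index||^2 and ||z||^2 + r^2."""
    return shifted_squared_norm(z, r, index) - (squared_norm(z) + r * r)


if __name__ == "__main__":
    z = [1.0 / (k + 1) for k in range(2000)]
    for index in (9, 99, 999, 1999):
        print(index, gap(z, 1.5, index))
```

The printed gaps equal $-2r\langle z,e_{\iota}\rangle$ up to rounding and shrink as the index grows, which is the convergence that the Dominated Convergence Theorem is then applied to.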
Now, if $\psi$ is a polynomial of degree $n$, let $t_{1},\ldots,t_{N}\in\mathbb{R}$ and $c_{1},\ldots,c_{N}\in\mathbb{R}$ (not all zero) be such that $\sum_{i=1}^{N}c_{i}p(t_{i})=0$ for every $p\in\pi_{2n}(\mathbb{R})$. Then if $\|v\|=1$, the measure $\mu:=\sum_{i=1}^{N}c_{i}\delta(t_{i}v)\in\mathfrak{M}(\mathcal{H})$ is nonzero and

ψ(xy2)𝑑μ(x)=i=1Nciψ(yy,vv2+(y,vti)2)=0\int_{\mathcal{H}}\psi(\|x-y\|^{2})d\mu(x)=\sum_{i=1}^{N}c_{i}\psi(\|y-\langle y,v\rangle v\|^{2}+(\langle y,v\rangle-t_{i})^{2})=0

because, for every fixed $y\in\mathcal{H}$, this function is a polynomial of degree $2n$ in the variable $t_{i}$.
For the converse, first, we show that it is sufficient to prove the case $\ell=0$.
Indeed, the function $c\in(0,\infty)\to F(c):=\psi(\|x-y\|^{2}+c)\in\mathbb{R}$ is differentiable for every $x,y\in\mathcal{H}$, and

\[
\frac{\partial F}{\partial c}(c)=\psi^{\prime}(c+\|x-y\|^{2}).
\]

Since $\psi=\phi-\varphi$, and those functions are elements of $CM_{\ell}$, we have that $|\psi^{\prime}(t+c)|\lesssim(1+t)^{\ell-1}$, for every $c>0$. In particular, the derivative is a function in $L^{1}(|\mu|)$ and

(6.8) ψ(c+xy2)𝑑μ(x)=0,y,c>0.\int_{\mathcal{H}}\psi^{\prime}(c+\|x-y\|^{2})d\mu(x)=0,\quad y\in\mathcal{H},\quad c>0.

Since ψ(c+)\psi^{\prime}(c+\cdot) also is the difference between two functions in CM1CM_{\ell-1} for every c>0c>0, by induction, we may assume that =0\ell=0.
Assume that $\psi$ is not a polynomial and $\mu$ is a nonzero measure that satisfies the equality in the statement of the Theorem. The function $\psi(c+\cdot)$ is the difference between two completely monotone functions on $[0,\infty)$, so there exists a measure $\beta_{c}$ on $[0,\infty)$ for which

ψ(c+t)=[0,)ert𝑑βc(r),c>0,t0\psi(c+t)=\int_{[0,\infty)}e^{-rt}d\beta_{c}(r),\quad c>0,t\geq 0

and dβc+s(r)=ersdβc(r)d\beta_{c+s}(r)=e^{-rs}d\beta_{c}(r) for every c,s>0c,s>0. Integrating the function on the hypotheses with respect to the measure dμ(y)d\mu(y), we obtain that

(6.9) 0=ψ(c+xy2)𝑑μ(x)𝑑μ(y)=[0,)erxy2𝑑μ(x)𝑑μ(y)𝑑βc(r),c>0.0=\int_{\mathcal{H}}\int_{\mathcal{H}}\psi(c+\|x-y\|^{2})d\mu(x)d\mu(y)=\int_{[0,\infty)}\int_{\mathcal{H}}\int_{\mathcal{H}}e^{-r\|x-y\|^{2}}d\mu(x)d\mu(y)d\beta_{c}(r),\quad c>0.

The continuous and bounded function $I_{\mu}(r):=\int_{\mathcal{H}}\int_{\mathcal{H}}e^{-r\|x-y\|^{2}}d\mu(x)d\mu(y)$, $r\geq 0$, is positive for every $r>0$ by [8]. Additionally, Equation 6.9 with $c=s+1$ implies that

0=[0,)esrIμ(r)𝑑β1(r),s0.0=\int_{[0,\infty)}e^{-sr}I_{\mu}(r)d\beta_{1}(r),\quad s\geq 0.

By the uniqueness of the Laplace transform representation, this can only occur if the finite measure $I_{\mu}d\beta_{1}$ is the zero measure on $[0,\infty)$. The behaviour of $I_{\mu}$ implies that this occurs if and only if $I_{\mu}(0)=0$ and $\beta_{1}$ is a multiple of $\delta_{0}$; the latter implies that $\psi$ is a constant function, which is a contradiction. ∎

Proof of Lemma 5.2.

By Theorem 5.1 we only need to focus on the finite dimensional case. We prove $(i)$ and $(ii)$ by showing that it is sufficient to prove the cases $\ell=1,2$, which will follow from $(iii)$ and $(iv)$. For the induction argument in $(ii)$ we assume a more general setting, namely that $\psi(t)=t^{\ell-1}\log(t)+bt^{\ell-1}$, with $b\in\mathbb{R}$.
Indeed, suppose that 3\ell\geq 3. Note then that the function yF(y):=ψ(xy2)y\in\mathcal{H}\to F(y):=\psi(\|x-y\|^{2})\in\mathbb{R} is twice differentiable on each direction of an orthonormal basis (eι)ι(e_{\iota})_{\iota\in\mathfrak{I}} for \mathcal{H}, and

\[ \frac{\partial^{2}F}{\partial e_{\iota}^{2}}(y)=4\psi^{\prime\prime}(\|x-y\|^{2})(y_{\iota}-x_{\iota})^{2}+2\psi^{\prime}(\|x-y\|^{2}). \]

Since $\psi\in C^{\ell-1}([0,\infty))\cap CM_{\ell}$ (or $-\psi$ is an element; the sign does not make a difference for the induction step), we have that $|\psi^{\prime}(t)|\lesssim(1+t)^{\ell-1}$ and $|\psi^{\prime\prime}(t)|\lesssim(1+t)^{\ell-2}$. In particular, the second derivative is a function in $L^{1}(|\mu|)$, and summing over the $\iota$ variable we obtain ($m=\dim(\mathcal{H})$)

\[ 0=\int_{\mathcal{H}}\left[4\psi^{\prime\prime}(\|x-y\|^{2})\|x-y\|^{2}+2m\psi^{\prime}(\|x-y\|^{2})\right]d\mu(x),\quad y\in\mathcal{H}. \tag{6.10} \]

When $\psi$ is a function of type $(i)$ or $(ii)$, the integrand in this equation is equal to a positive multiple of $\|x-y\|^{2a-2}$ (or $\|x-y\|^{2\ell-4}\log(\|x-y\|^{2})$ plus a multiple of $\|x-y\|^{2\ell-4}$), which is the induction argument.
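The induction step above rests on the identity $\sum_{\iota}\partial^{2}F/\partial e_{\iota}^{2}=4\psi^{\prime\prime}(s)s+2m\psi^{\prime}(s)$ with $s=\|x-y\|^{2}$. For a power $\psi(t)=t^{a}$ this simplifies to $2a(2a+m-2)s^{a-1}$. The sketch below checks both facts by central finite differences. It is an illustration under assumptions: the exponent `a = 2.5`, the points `x`, `y` in $\mathbb{R}^{3}$, the step `h` and all function names are hypothetical choices, not taken from the paper.

```python
import math

def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def laplacian_fd(psi, x, y, h=1e-3):
    """Sum over coordinates of central second differences of F(y) = psi(||x - y||^2)."""
    total = 0.0
    for i in range(len(y)):
        up = list(y)
        up[i] += h
        down = list(y)
        down[i] -= h
        total += (psi(sq_dist(x, up)) - 2.0 * psi(sq_dist(x, y)) + psi(sq_dist(x, down))) / h ** 2
    return total

def laplacian_formula(dpsi, d2psi, x, y):
    """Right-hand side 4 psi''(s) s + 2 m psi'(s), with s = ||x - y||^2 and m = dim."""
    s = sq_dist(x, y)
    return 4.0 * d2psi(s) * s + 2.0 * len(y) * dpsi(s)

a = 2.5  # illustrative exponent (assumption)
psi = lambda t: t ** a
dpsi = lambda t: a * t ** (a - 1)
d2psi = lambda t: a * (a - 1) * t ** (a - 2)

x = (0.3, -1.2, 0.7)
y = (1.0, 0.5, -0.4)
m = len(y)

fd = laplacian_fd(psi, x, y)
closed = laplacian_formula(dpsi, d2psi, x, y)
power = 2.0 * a * (2.0 * a + m - 2.0) * sq_dist(x, y) ** (a - 1)
print(fd, closed, power)
```

The three printed numbers should agree up to the finite-difference error, which reflects that the integrand in Equation 6.10 is again a power of $\|x-y\|$ with the exponent lowered by two.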
Now, let $\psi$ be an arbitrary function in $CM_{\ell}$, $\ell=1,2$, that is not a polynomial. For every $t>0$, define $\eta_{t}:=t\mu-\tau_{t}$, where $\tau_{t}=t\mu(\mathcal{H})\delta_{0}+(\delta_{tv_{\mu}}-\delta_{-tv_{\mu}})/2$ and $v_{\mu}$ is the vector mean, that is

\[ \int_{\mathcal{H}}\langle x,y\rangle d\mu(x)=\langle v_{\mu},y\rangle,\quad y\in\mathcal{H}. \]

In the case $\ell=1$ the vector $v_{\mu}$ might not be well defined; in this case define it as the zero vector. Then $\eta_{t}(\mathcal{H})=0$, and if it is well defined $v_{\eta_{t}}=0$. By the hypothesis we obtain that

\[
\begin{aligned}
4\int_{\mathcal{H}}\int_{\mathcal{H}}\psi(\|x-y\|^{2})d\eta_{t}(x)d\eta_{t}(y)&=\int_{\mathcal{H}}\int_{\mathcal{H}}\psi(\|x-y\|^{2})d[2\tau_{t}](x)d[2\tau_{t}](y)\\
&=\psi(0)(4t^{2}\mu(\mathcal{H})^{2}+2)-2\psi(4t^{2}\|v_{\mu}\|^{2}).
\end{aligned}
\]

By Theorem 4.4, this is a nonnegative number for every $t>0$.
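The closed form $\psi(0)(4t^{2}\mu(\mathcal{H})^{2}+2)-2\psi(4t^{2}\|v_{\mu}\|^{2})$ for the double integral with respect to $2\tau_{t}$ is a purely algebraic fact about a three-atom measure: the cross terms between $\delta_{0}$ and $\delta_{\pm tv_{\mu}}$ cancel. The sketch below verifies it by brute force. Everything in it is an assumption made for illustration: the function $\psi$, the total mass `mass` standing in for $\mu(\mathcal{H})$, the vector `v` standing in for $v_{\mu}$, and the helper names. The parameter `sign` shows that the value does not depend on the sign in front of the two-point part of $\tau_{t}$.

```python
import math

def double_integral(psi, atoms):
    """Sum_i sum_j w_i w_j psi(||x_i - x_j||^2) for the discrete measure sum_i w_i delta_{x_i}."""
    total = 0.0
    for xi, wi in atoms:
        for xj, wj in atoms:
            total += wi * wj * psi(sum((a - b) ** 2 for a, b in zip(xi, xj)))
    return total

def two_tau(t, mass, v, sign):
    """Atoms of 2*tau_t = 2 t mass delta_0 + sign * (delta_{t v} - delta_{-t v})."""
    origin = tuple(0.0 for _ in v)
    plus = tuple(t * c for c in v)
    minus = tuple(-t * c for c in v)
    return [(origin, 2.0 * t * mass), (plus, sign * 1.0), (minus, -sign * 1.0)]

def closed_form(psi, t, mass, v):
    norm_sq = sum(c * c for c in v)
    return psi(0.0) * (4.0 * t * t * mass * mass + 2.0) - 2.0 * psi(4.0 * t * t * norm_sq)

psi = lambda s: (1.0 + s) ** 1.5  # arbitrary test function (assumption)
mass, v = 0.7, (0.5, -1.0)        # stand-ins for mu(H) and v_mu (assumption)

for t in (0.5, 1.0, 3.0):
    for sign in (1.0, -1.0):
        print(t, sign, double_integral(psi, two_tau(t, mass, v, sign)), closed_form(psi, t, mass, v))
```

The check uses no property of $\psi$ beyond being a function on $[0,\infty)$, so it says nothing about the sign of the expression; the nonnegativity is what Theorem 4.4 supplies.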
On the other hand, if $\ell=2$, by the relation in Equation 2.4 we know that $(-1)^{2}\psi(t)$ converges to $+\infty$ as $t\to\infty$, so if $\|v_{\mu}\|\neq 0$ or $\psi(0)<0$ we would reach a contradiction. Consequently $v_{\mu}=0$ and $\psi(0)=0$. In particular, we obtain that the double integral with respect to $\eta_{1}$ is zero, which by Theorem 4.4 implies that $\mu=\mu(\mathcal{H})\delta_{0}$, because $\psi$ is not a polynomial. From this equality and the initial assumption on $\mu$ we obtain that $\mu(\mathcal{H})\psi(\|y\|^{2})=0$ for every $y\in\mathcal{H}$, which can only occur if $\mu$ is the zero measure, because $\psi$ is not a polynomial.
The case $\ell=1$ follows by a similar analysis. ∎

References

  • [1] C. Berg, J. Christensen, and P. Ressel, Harmonic analysis on semigroups: theory of positive definite and related functions, vol. 100 of Graduate Texts in Mathematics, Springer, 1984.
  • [2] NIST Digital Library of Mathematical Functions. F. W. J. Olver, A. B. Olde Daalhuis, D. W. Lozier, B. I. Schneider, R. F. Boisvert, C. W. Clark, B. R. Miller, B. V. Saunders, H. S. Cohl, and M. A. McClain, eds.
  • [3] J. Faraut and K. Harzallah, Distances hilbertiennes invariantes sur un espace homogène, Annales de l’Institut Fourier, 24 (1974), pp. 171–217.
  • [4] K. Fukumizu, F. R. Bach, and M. I. Jordan, Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces, Journal of Machine Learning Research, 5 (2004), pp. 73–99.
  • [5] R. Gangolli, Positive definite kernels on homogeneous spaces and certain stochastic processes related to Lévy Brownian motion of several parameters, Annales de l’Institut Henri Poincaré Probabilités et Statistiques, 3 (1967), pp. 121–226.
  • [6] I. M. Gelfand and N. Y. Vilenkin, Generalized Functions, Vol. 4: Applications of Harmonic Analysis, Academic Press, 1964.
  • [7] A. Gretton, K. Borgwardt, M. Rasch, B. Schölkopf, and A. Smola, A kernel method for the two-sample-problem, Advances in neural information processing systems, 19 (2006), pp. 513–520.
  • [8] J. C. Guella, On Gaussian kernels on Hilbert spaces and kernels on Hyperbolic spaces, arXiv e-prints, (2020), p. arXiv:2007.14697.
  • [9] K. Guo, S. Hu, and X. Sun, Conditionally positive definite functions and Laplace-Stieltjes integrals, Journal of Approximation Theory, 74 (1993), pp. 249–265.
  • [10] A. L. Koldobskii, Isometric operators in vector-valued Lp-spaces, Journal of Soviet Mathematics, 36 (1987), pp. 420–423.
  • [11] W. Linde, On Rudin’s equimeasurability theorem for infinite dimensional Hilbert spaces, Indiana University Mathematics Journal, 35 (1986), pp. 235–243.
  • [12] R. Lyons, Distance covariance in metric spaces, Ann. Probab., 41 (2013), pp. 3284–3305.
  • [13] R. Lyons, Hyperbolic space has strong negative type, Illinois J. Math., 58 (2014), pp. 1009–1013.
  • [14] R. Lyons, Strong negative type in spheres, Pacific Journal of Mathematics, 307 (2020), pp. 383–390.
  • [15] L. Mattner, Strict definiteness of integrals via complete monotonicity of derivatives, Transactions of the American Mathematical Society, 349 (1997), pp. 3321–3342.
  • [16] C. A. Micchelli, Interpolation of scattered data: distance matrices and conditionally positive definite functions, Constructive Approximation, 2 (1984), pp. 11–22.
  • [17] C. A. Micchelli, Y. Xu, and H. Zhang, Universal kernels, Journal of Machine Learning Research, 7 (2006), pp. 2651–2667.
  • [18] R. L. Schilling, R. Song, and Z. Vondracek, Bernstein functions: theory and applications, vol. 37, Walter de Gruyter, 2012.
  • [19] D. Sejdinovic, B. Sriperumbudur, A. Gretton, and K. Fukumizu, Equivalence of distance-based and RKHS-based statistics in hypothesis testing, The Annals of Statistics (2013), pp. 2263–2291.
  • [20] C.J. Simon-Gabriel and B. Schölkopf, Kernel distribution embeddings: Universal kernels, characteristic kernels and kernel metrics on distributions, Journal of Machine Learning Research, 19 (2018), pp. 1–29.
  • [21] B. K. Sriperumbudur, K. Fukumizu, and G. R. Lanckriet, Universality, characteristic kernels and RKHS embedding of measures, Journal of Machine Learning Research, 12 (2011), pp. 2389–2410.
  • [22] I. Steinwart and A. Christmann, Support vector machines, Springer Science & Business Media, 2008.
  • [23] G. J. Székely and M. L. Rizzo, Energy statistics: A class of statistics based on distances, Journal of Statistical Planning and Inference, 143 (2013), pp. 1249–1272.
  • [24] G. J. Székely, M. L. Rizzo, et al., Testing for equal distributions in high dimension, InterStat, 5 (2004), pp. 1249–1272.
  • [25] H. Wendland, Scattered data approximation, vol. 17, Cambridge University Press, 2005.