Generalization of the energy distance by Bernstein functions
Abstract.
We reprove the well-known fact that the energy distance defines a metric on the space of Borel probability measures on a Hilbert space with finite first moment, via a new approach: an analysis of the behaviour of the Gaussian kernel on Hilbert spaces together with a Maximum Mean Discrepancy argument. From this new point of view we are able to generalize the energy distance metric to a family of kernels related to Bernstein functions and conditionally negative definite kernels. We also explain what occurs with the energy distance for the kernel for every , where we generalize the idea to a family of kernels related to derivatives of completely monotone functions and conditionally negative definite kernels.
Key words and phrases: Energy distance; Metric spaces of strong negative type; Metrics on probabilities; Bernstein functions; Conditionally negative definite kernels
2010 Mathematics Subject Classification: 42A82; 43A35
1. Introduction
A popular method to compare two probabilities is to embed the space (or a subset) of probabilities into a Hilbert space and use the metric provided by the embedding. Currently, there are two main approaches for this task:
- The use of a continuous conditionally negative definite kernel with for every , [19]. The kernel must additionally satisfy the equality
(1.1) for two Radon regular probabilities and that integrate the function for every , only when . It can be proved that the above double integral is always a nonnegative number and, when this property occurs,
is a metric on the mentioned subspace of probabilities on .
In this paper, we focus on the second method.
The most popular example of this method is the energy distance, initially defined as , , where and the set of probabilities are those that integrate , [24], [23]. When , the kernel is conditionally negative definite but does not satisfy the additional property of Equation 1.1.
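Since the displayed formulas did not survive extraction, a small numerical sketch may help fix ideas. Assuming the classical definition with kernel K(x, y) = ||x - y|| (the case a = 1, which is our assumption here), a V-statistic estimate of the energy distance between two samples can be written as:

```python
import numpy as np

def energy_distance(x, y):
    """V-statistic estimate of the energy distance between two samples
    x (n, d) and y (m, d), based on the kernel K(u, v) = ||u - v||:
    E = 2 E||X - Y|| - E||X - X'|| - E||Y - Y'||."""
    def mean_norm(a, b):
        # average of ||a_i - b_j|| over all pairs (i, j)
        return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2).mean()
    return 2 * mean_norm(x, y) - mean_norm(x, x) - mean_norm(y, y)

rng = np.random.default_rng(0)
x = rng.normal(size=(500, 2))
y_same = rng.normal(size=(500, 2))          # same distribution as x
y_shift = rng.normal(size=(500, 2)) + 1.0   # mean shifted by (1, 1)
e_same, e_shift = energy_distance(x, y_same), energy_distance(x, y_shift)
```

On samples from the same distribution the estimate is close to zero, while a mean shift makes it clearly positive.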
A more geometrical approach is when is a metric on that satisfies Equation 1.1 (the topology is the one from the metric); in this case is a metric space of strong negative type. Examples of such spaces include:
Hilbert spaces: proved in [12] as a generalization of the energy distance.
Hyperbolic spaces (finite dimensional): proved in [13].
In some cases, the conditionally negative definite kernel may define a metric on the set , but not of strong negative type. A metric space where we only know that the distance is a conditionally negative definite kernel is called a metric space of negative type. An example of such a space is the real sphere, proved in [5], where it is also proved that the real, complex and quaternionic projective spaces and the Cayley projective plane are not metric spaces of negative type.
In [12], it is also proved that if is a metric space of negative type then , is a conditionally negative definite kernel that satisfies Equation 1.1, with the topology of the metric . Interestingly, the kernel is a metric on , with the same topology as , so we can rephrase the result of Lyons as being a metric space of strong negative type. We provide more details and generalizations of this property in Corollary 4.3.
The major aim of this paper is to provide a large number of examples of conditionally negative definite kernels that satisfy Equation 1.1, by using Bernstein functions in Theorem 4.1. Our method encompasses all of the above-mentioned kernels that satisfy . We also provide a new proof that hyperbolic spaces (of any dimension) are metric spaces of strong negative type in Theorem 4.2.
In [15], Mattner analysed the behaviour of the kernel , for , defined on . What occurs is that we can still provide a metric structure on the space of probabilities with certain integrability assumptions, but we can only compare them if they have the same vector mean ( ), the same vector mean and the same covariance matrix ( ), and so on. He also provided the same analysis for other radial kernels, which we generalize in Theorem 4.4 to a broader setting.
Section 3 is focused on the integrability conditions of a conditionally negative definite kernel (and its generalizations). Section 4 contains the most important results of this paper, mentioned before. In Section 5 we analyse the space of functions
where is a continuous function that is the difference of two derivatives (of the same order) of a completely monotone function. More precisely, we analyse when they are uniquely defined by the measure . Section 2 is entirely focused on the definitions that we use. The proofs are presented in Section 6.
2. Definitions
We recall that a nonnegative measure on a Hausdorff space is Radon regular (which we simply call Radon) when it is a Borel measure such that is finite on every compact set of and
- (i) (Inner regular) for every Borel set .
- (ii) (Outer regular) for every Borel set .
We then say that a complex-valued measure of bounded variation is Radon if its variation is a Radon measure. The vector space of such measures is denoted by . Recall that every Borel measure of finite variation (in particular, every probability measure) on a separable complete metric space is necessarily Radon.
A semi-inner product on a real (complex) vector space is a bilinear real (sesquilinear complex) valued function defined on such that for every . When this inequality is an equality only for , we say that is an inner product. Similarly, a pseudometric on a set is a symmetric function , such that , that satisfies the triangle inequality. If only when , is a metric on .
A kernel is called positive definite if for every finite collection of distinct points and scalars , we have that
where . The set of measures on used before is denoted by the symbol .
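As an illustration of the quadratic-form condition, positive definiteness can be probed numerically through the spectrum of a Gram matrix. The Gaussian kernel exp(-||x - y||^2) used below is our choice of example, not a formula from the text:

```python
import numpy as np

# Every Gram matrix of a positive definite kernel must be positive
# semidefinite; we check this for the Gaussian kernel on random points.
rng = np.random.default_rng(1)
pts = rng.normal(size=(40, 3))                       # 40 points in R^3
sq_dist = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1)
gram = np.exp(-sq_dist)                              # K(x, y) = e^{-||x-y||^2}
min_eig = np.linalg.eigvalsh(gram).min()             # should be >= 0 (up to rounding)
```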
The reproducing kernel Hilbert space (RKHS) of a positive definite kernel is the Hilbert space , and it satisfies [22]
- ;
- ;
- .
When is a Hausdorff space and is continuous it holds that .
The following widely known result describes how it is possible to define a semi-inner product structure on a subspace of using a continuous positive definite kernel.
Lemma 2.1.
If is a continuous positive definite kernel and with (), then
is an element of , and if is another measure with the same conditions as , we have that
In particular, is a semi-inner product.
We present a generalization of this result to a larger class of measures in Lemma 3.6. Usually, the kernel is bounded, so . In this case, if the semi-inner product is in fact an inner product we say that is integrally strictly positive definite (ISPD), and when is an inner product on the vector space of measures in that , we say that is characteristic. If the kernel is real valued, it is sufficient to analyse the ISPD and characteristic properties on real valued measures.
When the kernel is characteristic we define the maximum mean discrepancy (MMD) as the metric on the space of probability measures in by
(2.2)
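The MMD formula in Equation 2.2 did not survive extraction, but its standard biased (V-statistic) sample estimate can be sketched; the Gaussian kernel below is our own assumed example:

```python
import numpy as np

def mmd2(x, y, r=1.0):
    """Biased (V-statistic) estimate of MMD^2 for the Gaussian kernel
    K(u, v) = exp(-r ||u - v||^2); nonnegative by construction."""
    def k_mean(a, b):
        sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-r * sq).mean()
    return k_mean(x, x) + k_mean(y, y) - 2.0 * k_mean(x, y)

rng = np.random.default_rng(2)
p = rng.normal(size=(400, 2))
q_same = rng.normal(size=(400, 2))
q_shift = rng.normal(size=(400, 2)) + 1.0
m_same, m_shift = mmd2(p, q_same), mmd2(p, q_shift)
```

As with the energy distance, identical distributions give a value near zero and a mean shift gives a clearly positive value.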
As mentioned in the introduction, the focus of this paper is to analyse metrics on the space of probabilities using conditionally negative definite kernels. We present a more general definition which will be useful for the analysis of the energy distance through the kernel , , defined on a Hilbert space.
Definition 2.2.
Let be an Hermitian kernel and a finite dimensional space of functions from to . We say that is -conditionally positive definite ( -CPD) if for every finite collection of points and scalars , under the restriction that for every , we have that
This definition generalizes the concepts of positive definite kernels ( is the zero space) and CPD kernels ( is the set of constant functions). The most important example is when is a finite dimensional Euclidean space and is the set of multivariable polynomials on with degree less than or equal to a constant , [25], [9], [6]. Sometimes it might be more convenient to work with the opposite sign in Definition 2.2; in this case we say that the kernel is -conditionally negative definite ( -CND).
In [9], [16], a characterization is proved for the continuous functions such that the kernel
is CPD for as the family of multivariable polynomials of degree less than a fixed (we denote this family by , where and ) for every . A function satisfies this property if and only if and is a completely monotone function on . A function with this property can be uniquely written as
(2.3)
where is a nonnegative Radon measure on (not necessarily with finite variation) with
and , and is the zero function. For instance, the functions
- ;
- ;
- ;
- ,
are elements of , for , and . Those functions are not only in , but they are continuously differentiable on and we have a similar and simpler characterization compared to Equation 2.3 for them.
In general, a function is such that if and only if
(2.4)
where is a nonnegative Radon measure on (not necessarily with finite variation) with
for and .
Note that if a function then . In this case, the measure relative to the decomposition given in Equation 2.4 has finite variation and satisfies for every . This property and the decomposition given in Equation 2.4 are implicitly proved in [16] and can also be found in [25]. We remark that a polynomial if and only if and the constant .
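A quick numerical sanity check of complete monotonicity is possible through the classical alternating-sign condition on forward differences, a necessary discrete consequence of (-1)^n f^(n) >= 0; the test functions below are our own examples:

```python
import numpy as np

def alternating_signs_ok(f, max_order=5, h=0.1):
    """Discrete necessary condition for complete monotonicity on (0, inf):
    the n-th forward differences of f must satisfy (-1)^n * diff^n f >= 0."""
    grid = np.linspace(0.5, 5.0, 50)
    vals = f(grid[:, None] + h * np.arange(max_order + 1)[None, :])
    for n in range(max_order + 1):
        d = vals
        for _ in range(n):
            d = d[:, 1:] - d[:, :-1]          # one forward difference
        if (((-1.0) ** n) * d < -1e-12).any():
            return False
    return True

cm_exp = alternating_signs_ok(lambda t: np.exp(-t))   # completely monotone
cm_pow = alternating_signs_ok(lambda t: t ** -0.5)    # completely monotone
not_cm = alternating_signs_ok(np.sin)                 # not completely monotone
```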
By a lemma in [9], a function satisfies (this notation means that is a bounded function).
3. Conditionally positive definite kernels
The following known result states a connection between positive definite kernels and -CPD kernels [25]. A Lagrange basis for is a basis of and points , such that . A set of points is unisolvent with respect to a -dimensional space if the only function such that for every is the zero function.
Theorem 3.1.
Let and be a Lagrange basis for a finite dimensional space of functions from to . An Hermitian kernel is -CPD if and only if the Hermitian kernel
is positive definite.
This result can be easily seen by the fact that if and are such that for every , then
and conversely, if (with ) and , then
where , for and , for .
Similarly to continuous positive definite kernels, continuous -CPD kernels can be analysed through their behaviour on a certain type of space of measures.
Definition 3.2.
Let be a Hausdorff space and a finite dimensional vector space. We define the set
Theorem 3.3.
A continuous Hermitian kernel is -CPD if and only if for every for which and , where is unisolvent, we have that
If we restrict the measures in Theorem 3.3 to those for which for every , then the kernel defines a semi-inner product on this vector space.
When is the space generated by a single function , we can simplify the assumptions of Theorem 3.3.
Lemma 3.4.
Let be a continuous Hermitian kernel and be a one dimensional vector space. Then, is -CPD if and only if for every for which
Additionally, if and are real valued functions such that and the function is bounded, the following assertions are equivalent:
- ;
- The function for some ;
- The function for every .
As a direct consequence of the previous lemma we obtain that if the function is bounded, the set of measures on that integrate is a vector space and the double integral defines a semi-inner product on it. We focus on the CPD case and when is real valued, due to its relevance.
Corollary 3.5.
Let be a continuous CPD kernel such that the function is bounded. The semi-inner product
is well defined on the vector space
In the next lemma we improve the condition and the set of measures analysed in Lemma 2.1, at the cost of describing the function only outside a set of measure zero.
Lemma 3.6.
Let be a continuous positive definite kernel. Let such that , then the set of points
is such that , and the function
is the restriction of an element . If is a measure with the same conditions as the measure and , we have that
4. Inner products defined by CND kernels and derivatives of completely monotone functions
Since all kernels that we deal with in this section are real valued, we simplify the writing by focusing only on real valued measures (for which we still use the notation ). As mentioned in Section , this is not a restriction.
In [12], it is proved that on a separable real Hilbert space , the bilinear function defined as
defines an inner product on the vector space
The function is an example of a Bernstein function, [18]. It is continuous, and is a completely monotone function on (we do not need to assume in our context that Bernstein functions are nonnegative). In other words, a function is a Bernstein function if and only if , and then it can be written, by Equation 2.4 for , as
So,
and this kernel is CPD. The Gaussian kernels , , are ISPD on every Hilbert space [8]; hence, by the Fubini-Tonelli Theorem, if with and , then
Further, the inner double integral is positive whenever is not the zero measure, implying that the final result is a positive number, which is the key argument in order to verify that is an inner product, thus reobtaining the main result of [12] by a completely different argument. More generally, we have the following result.
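The argument above integrates Gaussian-type terms 1 - e^{-rt} against the representing measure of a Bernstein function. For the Bernstein function f(t) = sqrt(t) (our own choice of example) the representing measure is explicit, and the classical identity sqrt(t) = (2 sqrt(pi))^{-1} int_0^inf (1 - e^{-rt}) r^{-3/2} dr can be verified numerically:

```python
import numpy as np
from scipy.integrate import quad

def bernstein_sqrt(t):
    """Evaluate sqrt(t) through its Bernstein (Levy) representation
    sqrt(t) = (1 / (2 sqrt(pi))) * int_0^inf (1 - exp(-r t)) r^{-3/2} dr."""
    integrand = lambda r: (1.0 - np.exp(-r * t)) * r ** -1.5
    head, _ = quad(integrand, 0.0, 1.0)     # integrable singularity ~ t r^{-1/2}
    tail, _ = quad(integrand, 1.0, np.inf)
    return (head + tail) / (2.0 * np.sqrt(np.pi))

checks = [(t, bernstein_sqrt(t), np.sqrt(t)) for t in (0.25, 1.0, 4.0)]
```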
Theorem 4.1.
Let be a Bernstein function and be a continuous CND kernel such that is a bounded function. Consider the vector space
then the function
defines a semi-inner product on . If is not a linear function and only when , then defines an inner product on .
We emphasize that by Lemma 3.4 ( is the constant function) if and only if for some (or every) .
For instance, if is a real Hilbert space , and , , then
defines an inner product on
It is relevant to say that usually the inner product in Theorem 4.1 is not complete (hence, is not a Hilbert space). For instance, in [20] it is proved that the Gaussian kernel can be used to define an inner product on the space of tempered distributions on Euclidean spaces.
Another example occurs on the generalized real hyperbolic space. Let be a Hilbert space, define to be the real hyperbolic space relative to , and consider the kernel
which satisfies the relation
where is a metric on . In [3], or in a chapter of [1], it is proved that the metric on is a CND kernel; hence we can apply Theorem 4.1 for the kernel and , , and then
defines an inner product on
We can also include the case . A proof when is finite dimensional was provided in [13] using geometric properties of hyperbolic spaces. Our proof relies on a Laurent-type approximation of the function .
Theorem 4.2.
Let be a real hyperbolic space, and consider the vector space
Then
is an inner product.
A different behaviour occurs on the generalized real spheres. Let be a Hilbert space and define to be the real sphere relative to . The kernel defined on by the relation
is a metric and defines a CND kernel, as shown in [5]. However, unlike the Hilbert space and the real hyperbolic space, is not a metric space of strong negative type, [14]. Gangolli also proved in [5] that the metric on the other compact two-point homogeneous spaces (real/complex/quaternionic projective spaces and the Cayley projective plane) does not define a CND kernel.
The following corollary of Theorem 4.1 connects the setting of metric spaces of strong negative type and the kernels in Theorem 4.1.
Corollary 4.3.
Let be a nonzero Bernstein function such that ,
and is a metric space of negative type. Then,
is a metric on and is a metric space of strong negative type homeomorphic to .
As an example of Corollary 4.3, the Bernstein function , satisfies and . In particular, on a Hilbert space , is a metric on that is homeomorphic to the Hilbertian topology, and this metric is of strong negative type. Interestingly, we can apply Corollary 4.3 again in order to obtain that the same occurs with the metric .
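Assuming the transformed metric in this example is 1 - e^{-d} (the formulas are elided above, so this is our reading), the metric axioms for the transform can be checked numerically on a random sample from a Euclidean space:

```python
import numpy as np

# Euclidean metric d on a sample of points in R^4, and the transformed
# metric 1 - exp(-d) coming from the Bernstein function f(t) = 1 - e^{-t}.
rng = np.random.default_rng(3)
pts = rng.normal(size=(30, 4))
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
d_new = 1.0 - np.exp(-d)

# triangle inequality for d_new: d_new[i, k] <= d_new[i, j] + d_new[j, k]
violations = d_new[:, None, :] > d_new[:, :, None] + d_new[None, :, :] + 1e-12
```

Note that the transformed metric is bounded by 1, so the topology change is a genuine feature (bounded metric, same open sets).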
Returning to the kernel , we may ask ourselves what occurs when . The case is simpler, because
for every . This still defines a semi-inner product on , but the vector space
is equivalent to the zero measure under this semi-inner product. For an arbitrary measure such that , the linear functional
is continuous, so there exists a vector , which we call the vector mean of , which represents the above continuous linear functional.
In the case , a different behaviour emerges. The double integral kernel does not define a semi-inner product on ; however, if we restrict ourselves to the vector space
for and using the representation given in Equation 2.4 for the function
by Fubini-Tonelli we obtain that if
In particular, we can use the kernel , , in order to define a metric on the space of Radon probability measures on with finite second moment, but with a fixed vector mean.
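The sign flip behind this restriction can be seen on a discrete signed measure: for the kernel |x - y|^a, the quadratic form is nonpositive on mass-zero measures when a lies in (0, 2), while for a in (2, 4) it becomes nonnegative once the first moments also vanish. The three-point example below is our own illustration:

```python
import numpy as np

def quad_form(points, weights, a):
    """sum_{i,j} w_i w_j |x_i - x_j|^a for the discrete signed measure
    sum_i w_i delta_{x_i} on the real line."""
    d = np.abs(points[:, None] - points[None, :]) ** a
    return float(weights @ d @ weights)

x = np.array([-1.0, 1.0, 0.0])
w0 = np.array([1.0, 1.0, -2.0])     # total mass 0 AND first moment (mean) 0
q_a1 = quad_form(x, w0, 1.0)        # exponent in (0, 2): form is <= 0
q_a3 = quad_form(x, w0, 3.0)        # exponent in (2, 4): sign flips, >= 0
w1 = np.array([1.0, -1.0, 0.0])     # mass 0 but nonzero mean
q_bad = quad_form(x, w1, 3.0)       # positivity fails once means differ
```

Here q_a1 = -4 and q_a3 = 8, while q_bad = -16 shows that without matched means the form for a = 3 has no definite sign.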
More generally, we have the following.
Theorem 4.4.
Let , be a continuous function on and be a continuous CND kernel such that is a constant function. Consider the vector space
where is the kernel in Theorem 3.1; then the function
defines a semi-inner product on . If is not a polynomial of degree or less and only when , then defines an inner product on .
From Equation 2.4 and the fact that , for every and (this can be easily proved by induction on ), if in Theorem 4.4 also belongs to , we may weaken the requirement to in the definition of .
The additional properties required of the function in Theorem 4.4, compared to Theorem 4.1, are related to the fact that the integrals are difficult to analyse in the general setting of Theorem 4.1. However, if in Theorem 4.4 also belongs to and all of its derivatives up to order are zero at the point , then there is no polynomial part in Equation 2.4, and in this case we may only assume that is a bounded function in Theorem 4.4. This is the case for the function , .
As an example of Theorem 4.4, if is a Hilbert space , and , , then
defines an inner product on the vector space
5. Space of functions defined by derivatives of completely monotone functions
As mentioned in [12], the fact that the energy distance defines a metric on a separable Hilbert space can be proved using the proposed method, but also follows as a consequence of the fact that if is a separable Hilbert space, then a measure such that , , satisfies
(5.5)
In [8] it is proved that if and is not a constant function, then
if and only if is the zero measure. In this section we prove similar results in a much broader setting, as a consequence of the results presented in Section 4.
Theorem 5.1.
Let be an infinite dimensional Hilbert space, and . If a measure such that satisfies
where then it must hold that
In addition, (even if is not infinite dimensional), is not a polynomial if and only if the only measure such that satisfies
is the zero measure.
For some functions we can provide a version of Theorem 5.1 on finite dimensional spaces.
Lemma 5.2.
Let and be a Hilbert space. A measure such that and , satisfies
when is one of the following functions:
- , ;
- , ;
- and is not a polynomial, ;
- and is not a polynomial, but ,
if and only if is the zero measure.
We remark that in the case we may drop the additional assumption if .
6. Proofs
6.1. Section 3
Proof of Theorem 3.3.
The converse is immediate.
Suppose that is -CPD. Since is finite dimensional there exists a basis for it such that . By the integrability assumptions on the functions and , the kernel , and
the conclusion will follow from Lemma 3.6. ∎
Proof of Lemma 3.4.
Let for which . Let
, whose complement has zero measure by the Fubini-Tonelli Theorem. If , the result is a consequence of Theorem 3.3. On the other hand, if , note that the kernel is positive definite when restricted to the closed set , , and
The conclusion follows from Lemma 3.6.
Now, under the additional requirements on and , it is easy to see that is -PD if and only if the kernel
is CPD. Note that it is sufficient to prove the equivalences for the kernel for any measure with , because we can take .
The kernel is a pseudometric on , because is a CND kernel with for all , so it satisfies the triangle inequality. Since the function is bounded, the relations
are respectively equivalent to the relations
The conclusion that these properties are equivalent for the kernel follows directly from the triangle inequality. ∎
Proof of Lemma 3.6.
Assume without loss of generality that is a nonnegative measure. The fact that the set satisfies is a direct consequence of the Fubini-Tonelli Theorem.
Also, by the Radon hypothesis, there exists a sequence of nested compact sets for which . In particular, by the Dominated Convergence Theorem, the convergence holds
because . The function , so by Lemma 2.1, , where , and
which proves that the sequence is Cauchy, in particular, convergent to an element . Since is a RKHS, convergence in norm implies pointwise convergence, so
for every , which proves our claim.
Now, if , we have that
The left hand side of this equality converges to , while the right hand side converges to by the Dominated Convergence Theorem. ∎
6.2. Section 4
Throughout the rest of the paper, we use the well known fact that a Hermitian kernel is CND if and only if the kernel is positive definite for every ; see [1].
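This equivalence can be probed numerically: for the CND kernel K(x, y) = |x - y| on the real line (our choice of example), the Gram matrices of e^{-rK} should be positive semidefinite for every r > 0:

```python
import numpy as np

# Schoenberg's criterion, checked numerically: |x - y| is CND on R, so
# each Gram matrix of exp(-r |x - y|) should be positive semidefinite.
rng = np.random.default_rng(4)
x = rng.uniform(-5.0, 5.0, size=50)
K = np.abs(x[:, None] - x[None, :])
min_eigs = [np.linalg.eigvalsh(np.exp(-r * K)).min() for r in (0.1, 1.0, 10.0)]
```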
The next lemma is an improvement of Lemma 3.4 for as the set of constant functions. We state it using CND instead of CPD kernels because that is how we apply this result.
Lemma 6.1.
Let be a continuous CND kernel such that is a bounded function, and . Then, the following assertions are equivalent
- ;
- The function for some ;
- The function for every .
Proof.
Since is CND there exists a CND kernel , for which for every , is a pseudometric on and .
Since is bounded and is a finite measure, for the three equivalences for are respectively equivalent to the three equivalences for the CND kernel by the Minkowski inequality. If , the same relation occurs, but it follows from the general relation on spaces
In particular, we may suppose that is a CND kernel for which for every .
If then there exists for which by the Fubini-Tonelli Theorem.
If for some , then for every
For , the functions inside the parentheses on the right hand side of the previous equation are elements of ( variable), and by the Minkowski inequality we obtain the integrability of . Integrating the Minkowski inequality with respect to , we also obtain that . For , the proof is the same, but it follows from the general relation on spaces mentioned above. ∎
Proof of Theorem 4.1.
For the first claim it is sufficient to prove that by the linearity of the integration involved. Indeed, by Equation 2.4, we have that
where and is a nonnegative Radon measure such that . Consequently, is a CPD kernel; the conclusion is then a consequence of Corollary 3.4.
If is not a linear function, then , because the representation on Equation 2.4 is unique.
If , then the functions that describe are in , because for every and , and . Since
we can apply Fubini-Tonelli and obtain that
The first double integral is nonpositive by Corollary 3.4. Since only when , the kernel is ISPD for every by a theorem in [8], so
and the conclusion follows because .
∎
Proof of Theorem 4.2.
By an equation in [2], we have that for
In [1] it is proved that is a CND kernel on , while by [8] the positive definite kernel on is ISPD for every . Since the series appearing in the formula above contains only nonnegative terms, we may interchange summation and integration for any . Consequently, if is not the zero measure
∎
Proof of Corollary 4.3.
By Remark -(iv) in [18], if satisfies these assumptions then we can write the kernel as
where is a nonnegative Radon measure such that . Because is a metric, we have that
This proves that .
The topologies are equivalent because is necessarily an increasing function with , so if and only if .
The metric space has strong negative type because the kernel is continuous on the metric topology , is not a linear function and the remaining requirements for Theorem 4.1 are satisfied.
∎
In order to prove the next result, we will use the same infinite dimensional multinomial theorem that was used to prove that the Gaussian kernel is ISPD on Hilbert spaces in [8]. If is a real Hilbert space and is a complete orthonormal basis for it, then for every
(6.6)
where , is the space of functions from to , and the condition means that (in particular, must be the zero function except at a finite number of points). Also (which makes sense because ) and . This result can be proved using approximations of on finite dimensional spaces and the multinomial theorem on those spaces. The number stands for the largest integer less than or equal to .
In the next lemma we use the fact that for a continuous positive definite kernel , a measure satisfies
Lemma 6.2.
Let be a real Hilbert space, and . Suppose that , then
Moreover, if for every , then
and
Proof.
By Lemma 6.1, the fact that is equivalent to . Since
and , we obtain the desired integrability.
Note that
If , then by the hypothesis
But then, for every with , because the kernel inside the double integral is positive definite, continuous and satisfies the conditions in Lemma 2.1. In particular, since for every and there exists a sequence that converges to and , we have that
Then
By symmetry, the same double integral is zero when . Those two relations occur only when and . The remaining terms in the sum when are exactly those in the statement of the theorem, after a simplification using those two equalities. The conclusion follows because the kernel is continuous, positive definite and satisfies the conditions in Lemma 2.1. ∎
Corollary 6.3.
Let be a continuous CND kernel such that is a constant function and only when . Then for and such that , the kernel defined in Theorem 3.1 satisfies , and if
then
and
Proof.
By the hypothesis on , there exists a Hilbert space and a continuous and injective function , such that , where is the value of on the diagonal. If is a measure satisfying the conditions of the Corollary, then the image measure satisfies the same conditions as Lemma 6.2. The conclusion follows by standard properties of image measures. ∎
Proof of Theorem 4.4.
By Equation 2.3, we have that
By the hypothesis, the functions above are in . Corollary 6.3 implies that
On the other hand, because of Lemma 6.4 we can apply Fubini-Tonelli, and then
because the inner double integral is a nonnegative number for every by [8].
Because the representation for is unique, if is not a polynomial of degree or less, then . Also, if only when , then by [8] the inner double integral is a positive number for every when is not the zero measure, and the triple integral is a positive number as well. ∎
Lemma 6.4.
There exists an , which depends only on , for which
(6.7)
Proof.
Note that .
Case : In this case, the right hand side of Equation 6.7 is , while the left hand side is
Since each function is bounded, the result follows from the fact that .
Case : In this case, the right hand side of Equation 6.7 is , while the left hand side is
The function is a bounded function on , and from this we obtain the desired inequality for .
For the other function we have that
Similarly, since the functions are bounded on , we also obtain the desired inequality for , which concludes the proof. ∎
6.3. Section 5
Proof of Theorem 5.1.
Since is infinite dimensional, take to be an orthonormal sequence of vectors in . By the Dominated Convergence Theorem, we have that
because as and , which proves the first assertion.
Now, if is a polynomial of degree , let , (not all zero) be such that for every . Then if , the measure is nonzero and
because this function is a polynomial of degree for every fixed .
For the converse, we first show that it is sufficient to prove the case .
Indeed, the function is differentiable for every , and
Since , and those functions are elements of , we have that , for every . In particular, the derivative is a function in and
(6.8)
Since also is the difference between two functions in for every , by induction, we may assume that .
Assume that is not a polynomial and is a nonzero measure that satisfies the equality in the statement of the theorem. The function is the difference between two completely monotone functions on , so there exists a measure in for which
and for every . Integrating the function in the hypotheses with respect to the measure , we obtain that
(6.9)
The continuous and bounded function , , is positive for every by [8]; additionally, Equation 6.9 implies that ( )
By the uniqueness of the Laplace transform representation, this can only occur if the finite measure is the zero measure on . The behaviour of implies that this occurs if and only if and is a multiple of ; the latter implies that is a constant function, which is a contradiction. ∎
Proof of Lemma 5.2.
By Theorem 5.1 we only need to focus on the finite dimensional case. We prove and by showing that it is sufficient to prove the case , which will follow from and . For the induction argument on we assume a more general setting, that , with .
Indeed, suppose that . Note then that the function is twice differentiable in each direction of an orthonormal basis for , and
Since (or is an element; the sign makes no difference for the induction step), we have that and . In particular, the second derivative is a function in , and summing over the variable we obtain ( )
(6.10)
When is a function of type or , the integrand in this equation is equal to a positive multiple of (or plus a multiple of ), which gives the induction argument.
Now, let be an arbitrary function on , , that is not a polynomial. For every , define , where and is the vector mean, that is
In the case the vector might not be well defined; in this case define it as the zero vector. Then , and if it is well defined, . By the hypothesis we obtain that
By Theorem 4.4, this is a nonnegative number for every .
On the other hand, if , by the relation in Equation 2.4 we know that converges to as , so if or we would reach a contradiction; consequently . In particular, we obtain that the double integral with respect to is zero, and by Theorem 4.4 we must have that , because is not a polynomial. From this equality and the initial assumption on we obtain that for every , which can only occur if is the zero measure, because is not a polynomial.
The case follows by a similar analysis.
∎
References
- [1] C. Berg, J. Christensen, and P. Ressel, Harmonic analysis on semigroups: theory of positive definite and related functions, vol. 100 of Graduate Texts in Mathematics, Springer, 1984.
- [2] NIST Digital Library of Mathematical Functions. F. W. J. Olver, A. B. Olde Daalhuis, D. W. Lozier, B. I. Schneider, R. F. Boisvert, C. W. Clark, B. R. Miller, B. V. Saunders, H. S. Cohl, and M. A. McClain, eds.
- [3] J. Faraut and K. Harzallah, Distances hilbertiennes invariantes sur un espace homogène, Annales de l’Institut Fourier, 24 (1974), pp. 171–217.
- [4] K. Fukumizu, F. R. Bach, and M. I. Jordan, Dimensionality reduction for supervised learning with reproducing kernel Hilbert spaces, Journal of Machine Learning Research, 5 (2004), pp. 73–99.
- [5] R. Gangolli, Positive definite kernels on homogeneous spaces and certain stochastic processes related to Lévy Brownian motion of several parameters, Annales de l'Institut Henri Poincaré Probabilités et Statistiques, 3 (1967), pp. 121–226.
- [6] I. M. Gelfand and N. Y. Vilenkin, Generalized Functions, Vol. 4: Applications of Harmonic Analysis, Academic Press, 1964.
- [7] A. Gretton, K. Borgwardt, M. Rasch, B. Schölkopf, and A. Smola, A kernel method for the two-sample-problem, Advances in neural information processing systems, 19 (2006), pp. 513–520.
- [8] J. C. Guella, On Gaussian kernels on Hilbert spaces and kernels on Hyperbolic spaces, arXiv e-prints, (2020), p. arXiv:2007.14697.
- [9] K. Guo, S. Hu, and X. Sun, Conditionally positive definite functions and Laplace-Stieltjes integrals, Journal of Approximation Theory, 74 (1993), pp. 249–265.
- [10] A. L. Koldobskii, Isometric operators in vector-valued lp-spaces, Journal of Soviet Mathematics, 36 (1987), pp. 420–423.
- [11] W. Linde, On Rudin's equimeasurability theorem for infinite dimensional Hilbert spaces, Indiana University Mathematics Journal, 35 (1986), pp. 235–243.
- [12] R. Lyons, Distance covariance in metric spaces, Ann. Probab., 41 (2013), pp. 3284–3305.
- [13] R. Lyons, Hyperbolic space has strong negative type, Illinois J. Math., 58 (2014), pp. 1009–1013.
- [14] R. Lyons, Strong negative type in spheres, Pacific Journal of Mathematics, 307 (2020), pp. 383–390.
- [15] L. Mattner, Strict definiteness of integrals via complete monotonicity of derivatives, Transactions of the American Mathematical Society, 349 (1997), pp. 3321–3342.
- [16] C. A. Micchelli, Interpolation of scattered data: distance matrices and conditionally positive definite functions, Constructive Approximation, 2 (1984), pp. 11–22.
- [17] C. A. Micchelli, Y. Xu, and H. Zhang, Universal kernels, Journal of Machine Learning Research, 7 (2006), pp. 2651–2667.
- [18] R. L. Schilling, R. Song, and Z. Vondracek, Bernstein functions: theory and applications, vol. 37, Walter de Gruyter, 2012.
- [19] D. Sejdinovic, B. Sriperumbudur, A. Gretton, and K. Fukumizu, Equivalence of distance-based and RKHS-based statistics in hypothesis testing, The Annals of Statistics, (2013), pp. 2263–2291.
- [20] C.J. Simon-Gabriel and B. Schölkopf, Kernel distribution embeddings: Universal kernels, characteristic kernels and kernel metrics on distributions, Journal of Machine Learning Research, 19 (2018), pp. 1–29.
- [21] B. K. Sriperumbudur, K. Fukumizu, and G. R. Lanckriet, Universality, characteristic kernels and RKHS embedding of measures, Journal of Machine Learning Research, 12 (2011), pp. 2389–2410.
- [22] I. Steinwart and A. Christmann, Support vector machines, Springer Science & Business Media, 2008.
- [23] G. J. Székely and M. L. Rizzo, Energy statistics: A class of statistics based on distances, Journal of Statistical Planning and Inference, 143 (2013), pp. 1249–1272.
- [24] G. J. Székely, M. L. Rizzo, et al., Testing for equal distributions in high dimension, InterStat, 5 (2004), pp. 1249–1272.
- [25] H. Wendland, Scattered data approximation, vol. 17, Cambridge university press, 2005.