
A global structure-preserving kernel method for the learning of Poisson systems

Jianyu Hu1, Juan-Pablo Ortega1, and Daiying Yin1
Abstract

A structure-preserving kernel ridge regression method is presented that allows the recovery of globally defined, potentially high-dimensional, and nonlinear Hamiltonian functions on Poisson manifolds out of datasets made of noisy observations of Hamiltonian vector fields. The proposed method is based on finding the solution of a non-standard kernel ridge regression where the observed data is generated as the noisy image by a vector bundle map of the differential of the function that one is trying to estimate. Additionally, it is shown how a suitable regularization solves the intrinsic non-identifiability of the learning problem due to the degeneracy of the Poisson tensor and the presence of Casimir functions. A full error analysis is conducted that provides convergence rates using fixed and adaptive regularization parameters. The good performance of the proposed estimator is illustrated with several numerical experiments.

1footnotetext: Jianyu Hu, Juan-Pablo Ortega, and Daiying Yin are with the Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore. Their email addresses are Jianyu.Hu@ntu.edu.sg, Juan-Pablo.Ortega@ntu.edu.sg, and YIND0004@e.ntu.edu.sg, respectively.

1 Introduction

The study of Poisson systems plays a crucial role in understanding a wide range of physical phenomena, from classical mechanics to fluid dynamics [Mars 99, Orte 04, Holm 09, Holm 11a, Holm 11b]. These systems, characterized by their inherent geometric structures and conservation laws, present unique challenges in modeling and simulation. Traditional methods for learning and approximating Poisson systems often require extensive domain knowledge and computational resources. In recent years, several machine learning techniques have emerged as powerful tools for addressing complex scientific problems that could offer new avenues for efficiently learning the structures underlying Poisson systems.

Recent literature has extensively explored the learning and prediction of some Poisson systems. These studies incorporate varying degrees of domain-specific knowledge into learning algorithms, enhancing their ability to accurately model and predict such systems’ behavior. An initial approach involved directly learning the Hamiltonian [Grey 19, Davi 23, Hu 24] or Lagrangian [Cran 20, Chen 22] functions using neural networks or other machine learning models, such as Gaussian processes [Offe 24, Pfö 24, Beck 22]. It is worth pointing out that proper variational integrators [Mars 01, Leok 12] should be adopted to match the predicted vector field with observed time series data [Zhu 20]. Researchers have also proposed learning the generating function that produces the flow map [Chen 21, Rath 21]. Alternative approaches aim to parameterize a (possibly universal) family of symplectic or Poisson maps to fit the flow map directly [Chen 19, Jin 20, Baja 23, Jin 22, Eldr 24b], with SympNet being a prominent example [Jin 20]. Using SympNet, symplectic autoencoders have been developed to achieve a structure-preserving model reduction of Hamiltonian systems [Bran 23], effectively mapping high-dimensional systems to low-dimensional representations. Additionally, physics-informed neural networks have become popular for solving known physical partial differential equations with high precision [Rais 19] and convergence analysis in certain scenarios [Doum 23, Shin 20], though they enforce the PDEs as loss functions and thus do not preserve the exact structure. In general, to characterize and replicate the flow of (port-)Hamiltonian systems with exact structure preservation remains challenging, with a notable exception in the literature for linear port-Hamiltonian systems [Orte 24].

This paper proposes a non-standard kernel ridge regression method that allows the recovery of globally defined and potentially high-dimensional and nonlinear Hamiltonian functions on Poisson manifolds out of datasets made of noisy observations of Hamiltonian vector fields. We now spell out three key features of the approach proposed in this paper.

First, much of the existing research in this field operates under the assumption that the underlying dynamical system is Hamiltonian or that the flow map is symplectic [Chen 19, Jin 20], allowing for the use of generating function models or universal parameterizations of symplectic maps to directly model the flow. However, this assumption does not extend to general Poisson systems on manifolds for which the Hamiltonian function cannot be globally expressed in canonical coordinates, and the flow map is generally not symplectic. Moreover, due to the potential degeneracy of the Poisson tensor, Poisson manifolds typically possess a family of functions known as Casimirs, which do not influence the Hamiltonian vector fields. Preserving these Casimirs is an intrinsic requirement for any machine learning algorithm applied to such systems. Inspired by the Darboux-Weinstein theorem [Wein 83, Libe 12, Vais 12], recent approaches have proposed learning coordinate transformations to convert coordinates into canonical ones, followed by inverse transformations to retrieve the dynamics [Jin 22, Baja 23]. While the local existence of these transformations is guaranteed, their global behavior remains unclear, particularly in cases where coordinate patches overlap, and numerical errors on the preservation of Casimir values accumulate. The Lie-Poisson Network [Eldr 24b, Eldr 24a] attempts to approximate flows of the special case of Lie-Poisson systems using compositions of flow maps of linear systems, preserving Casimirs exactly through projection but not addressing the expressiveness of the constructed flows. Rather than learning the flow map, we propose to discover an underlying Hamiltonian function of the Poisson system from Hamiltonian vector fields in a structure-preserving manner. The presence of Casimir functions renders the identification of the Hamiltonian function from the observed vector field infeasible, which is a challenge that lies beyond the scope of the work already presented by the authors in [Hu 24] and necessitates the development of new algorithms to address this issue.

Second, while significant research has focused on the forward problem of learning the flow map of dynamical systems using approaches like Physics-Informed neural networks (PINNs), symbolic regression [Brun 16], Gaussian processes, or kernel methods, the inverse problem of uncovering unknown Hamiltonian functions from data remains relatively underexplored. Unlike neural networks or likelihood-based methods that rely on iterative or stochastic gradient descent to optimize objective functions, kernel methods offer a more straightforward, easy-to-train, and data-efficient alternative. By transforming the optimization problem into a convex one, kernel methods enable the derivation of a closed-form solution for the Hamiltonian function estimation problem in a single step, even on manifolds. Another advantage of kernel methods is that by selecting a kernel with inherent symmetry, the learning scheme can be designed to automatically preserve it, which is particularly important when learning mechanical systems. We recall that due to the celebrated Noether’s Theorem, symmetries usually carry in their wake additional conserved quantities when dealing with mechanical systems. The Reproducing Kernel Hilbert Space (RKHS) framework also facilitates rigorous estimation and approximation error bounds, offering both theoretical insights and interpretability. Additionally, it is well-suited for the design of online learning paradigms.

Finally, while the literature on learning physical dynamical systems is extensive and the idea of structure-preserving kernel regression has already been proposed to recover either interacting potentials of particle systems [Feng 23] or Hamiltonian functions on Euclidean spaces [Hu 24], the majority of studies have been confined to the context of Euclidean spaces, with little attention to scenarios where the underlying configuration space is a more general manifold, which is typically the case in many applications like rigid body mechanics, robotics, fluids, or elasticity. This generalization introduces a two-fold challenge in model construction, both for the learning of the flows of dynamical systems and for that of their corresponding Hamiltonian functions, namely, preserving simultaneously the manifold and the variational structures. When a model is designed to predict Hamiltonian vector fields on a manifold, ensuring that the predicted flow remains on the manifold and preserves symplectic or Poisson structures necessitates structure-preserving integrators. However, even in Euclidean spaces, the design of variational integrators is complex and often requires implicit methods, indicating a heightened difficulty in the manifold setting. Additionally, standard methods such as Galerkin projections are typically approximately but not exactly manifold-preserving [Boro 20] and typically fail to preserve the variational structure. In [Magg 21], an algorithm is proposed to identify interacting potentials on a Riemannian manifold, though the estimated potential remains dependent only on the Euclidean scalar distance. This approach requires a prior selection of a finite set of basis functions and coordinate charts or embeddings. In contrast, our work introduces a kernel method that globally estimates a Hamiltonian function for Poisson systems on the phase-space manifold and yields a globally well-defined and chart-independent solution. This sets it apart from existing approaches in the literature [Feng 23, Magg 21, Lu 19].

Main Results.

This paper proposes a method to learn a Hamiltonian function $H:P\rightarrow\mathbb{R}$ of a globally defined Hamiltonian system on a $d$-dimensional Poisson manifold $P$ from noisy realizations of the corresponding Hamiltonian vector field $X_{H}\in\mathfrak{X}(P)$, that is:

$$\mathbf{X}_{\sigma^{2}}^{(n)}:=X_{H}(\mathbf{Z}^{(n)})+\bm{\varepsilon}^{(n)},\quad n=1,\ldots,N,$$

where $\mathbf{Z}^{(n)}$ are IID random variables with values on the Poisson manifold $P$ and with the same distribution $\mu_{\mathbf{Z}}$, and $\bm{\varepsilon}^{(n)}$ are independent random variables on the tangent spaces $T_{\mathbf{z}^{(n)}}P$ ($\mathbf{z}^{(n)}$ is a particular realization of $\mathbf{Z}^{(n)}$) with mean zero and variance $\sigma^{2}I_{d}$. In this setup, we consider the inverse problem consisting in recovering the Hamiltonian function $H$, or a dynamically equivalent version of it, by solving the following optimization problem,

$$\widehat{h}_{\lambda,N}:=\mathop{\arg\min}\limits_{h\in\mathcal{H}_{K}}\frac{1}{N}\sum_{n=1}^{N}\left\|X_{h}(\mathbf{Z}^{(n)})-\mathbf{X}^{(n)}_{\sigma^{2}}\right\|_{g}^{2}+\lambda\|h\|_{\mathcal{H}_{K}}^{2},\qquad(1.1)$$

where $X_{h}$ is the Hamiltonian vector field corresponding to the function $h\in\mathcal{H}_{K}$, $\mathcal{H}_{K}$ is the reproducing kernel Hilbert space (RKHS) associated to a kernel function $K:P\times P\rightarrow\mathbb{R}$ that will be introduced later on, $g$ is a Riemannian metric on $P$, and $\lambda\geq 0$ is a Tikhonov regularization parameter. The solution $\widehat{h}_{\lambda,N}$ in the RKHS is sought to minimize the natural regularized empirical risk defined in this setup. We shall call the solution $\widehat{h}_{\lambda,N}$ of the optimization problem (1.1) the structure-preserving kernel estimator of the Hamiltonian function.
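To fix ideas, the following minimal Python sketch generates synthetic observations of the form above in the simplest Euclidean case $P=\mathbb{R}^{2}$ equipped with the canonical symplectic matrix $J_{\mathrm{can}}$ (introduced later in Remark 2.17); the Hamiltonian, the sampling distribution, and the noise level are illustrative assumptions and are not taken from the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Canonical symplectic structure on R^2 with coordinates (q, p)
J_can = np.array([[0.0, 1.0], [-1.0, 0.0]])

def grad_H(z):
    """Gradient of an illustrative Hamiltonian H(q, p) = (q^2 + p^2)/2."""
    return z

N, sigma = 200, 0.05
Z = rng.uniform(-1.0, 1.0, size=(N, 2))            # samples Z^(n) drawn from an assumed mu_Z
X_H = (J_can @ grad_H(Z).T).T                      # Hamiltonian vector field X_H = J_can grad H
X_obs = X_H + sigma * rng.standard_normal((N, 2))  # noisy observations X_{sigma^2}^(n)
```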

We now summarize the outline and the main contributions of the paper.

  1. In Section 2, we review basic concepts in Riemannian geometry, including differentials on Riemannian manifolds and the definitions of random variables and integration in this context. Additionally, we provide the necessary background on Poisson mechanics, covering topics such as Hamiltonian vector fields, Casimir functions, compatible structures, and Lie-Poisson systems, illustrated with two real-world examples: rigid-body dynamics and underwater vehicle dynamics.

  2. In Section 3, we extend the differentiable reproducing property, previously established for the compact [Zhou 08] and Euclidean cases [Hu 24], to Riemannian manifolds. Specifically, we show that the differential of a function $f\in\mathcal{H}_{K}$ in the reproducing kernel Hilbert space (RKHS) $\mathcal{H}_{K}$ associated to a kernel function $K:P\times P\rightarrow\mathbb{R}$, when paired with a vector $v\in T_{x}P$, can be represented as the RKHS inner product of $f\in\mathcal{H}_{K}$ with the differential of the kernel section of $K$ evaluated along $v\in T_{x}P$. This property is pivotal throughout our analysis. Following this, we derive the solution of the optimization problem (1.1) using operator representations and ultimately express it as a closed-form formula made of a linear combination of kernel sections, similar to the standard Representer Theorem. Importantly, this formula provides a globally defined expression on the Poisson manifold $P$, independent of the choice of local coordinates, though we also present the formula in local charts. In Subsection 3.5, we study the case in which the Hamiltonian that needs to be learned is known to be a priori invariant with respect to the canonical action of a given Lie group and show that if the kernel is chosen to be invariant under that group action, then the flow of the estimated Hamiltonian inherently conserves momentum maps, as a direct consequence of Noether's theorem.

  3. In Section 4, we utilize the operator representation of the estimator to derive bounds for estimation and approximation errors. These bounds demonstrate that, as the number of random samples increases, the estimator converges to a true Hamiltonian function with high probability.

  4. In Section 5, we implement the structure-preserving kernel ridge regression in Python and conduct extensive numerical experiments. These experiments focus on two primary scenarios: Hamiltonian systems on symplectic manifolds (Subsection 5.2) and Lie-Poisson systems (Subsection 5.1). In Subsection 5.2, we aim to recover the Hamiltonian function for the classical two-vortex system on products of 2-spheres, which exhibits singularities, as well as another well-behaved Hamiltonian function represented as a 3-norm defined on the same manifold. In Subsection 5.1, we address symmetric Poisson systems on cotangent bundles of Lie groups. Through Lie-Poisson reduction, these systems can be modeled as Lie-Poisson systems on the dual of a Lie algebra. Numerical simulations validate the effectiveness, ease of training, and data efficiency of our proposed kernel-based approach.

We emphasize that, although this paper focuses on recovering Hamiltonian functions from observed Hamiltonian vector fields, the proposed framework is more broadly applicable. Specifically, it can be extended to any scenario where the observed quantity (in this paper $J\nabla H$) is the image by a vector bundle map of the differential of a function (in this paper $H$) on the manifold. This makes the framework versatile for various systems beyond Hamiltonian dynamics; for example, in gradient systems with an underlying control-dependent potential function, the same algorithm can be applied to learn an optimal control force based on sensor-measured external forces.

2 Preliminaries on Poisson mechanics

2.1 Hamiltonian systems on Poisson manifolds

The aim of this paper is to learn Hamiltonian systems on Poisson manifolds. In this subsection, we will provide a brief introduction to Poisson mechanics relevant to the learning problems; for more details, see [Mars 99, Orte 04, Libe 12, Vais 12]. Let $P$ be a manifold and denote by $C^{\infty}(P)$ the space of all smooth functions defined on it.

Definition 2.1.

A Poisson bracket (or a Poisson structure) on a manifold $P$ is a bilinear map $\{\cdot,\cdot\}:C^{\infty}(P)\times C^{\infty}(P)\to C^{\infty}(P)$ such that:

(i) $\left(C^{\infty}(P),\{\cdot,\cdot\}\right)$ is a Lie algebra.

(ii) $\{\cdot,\cdot\}$ is a derivation in each factor, that is,

$$\{FG,H\}=\{F,H\}G+F\{G,H\},$$

for all $F,G$, and $H\in C^{\infty}(P)$.

A manifold $P$ endowed with a Poisson bracket $\{\cdot,\cdot\}$ on $C^{\infty}(P)$ is called a Poisson manifold.

Example 2.2 (Symplectic bracket).

Any symplectic manifold $(P,\omega)$ is a Poisson manifold with the Poisson bracket defined by the symplectic form $\omega$ as follows:

$$\{F,G\}(z)=\omega(X_{F}(z),X_{G}(z)),\quad F,G\in C^{\infty}(P),\qquad(2.1)$$

where $X_{F},X_{G}$ are the Hamiltonian vector fields of $F,G\in C^{\infty}(P)$, respectively, uniquely determined by the equalities $\mathbf{i}_{X_{F}}\omega=\mathbf{d}F$ and $\mathbf{i}_{X_{G}}\omega=\mathbf{d}G$ ($\mathbf{i}$ denotes the interior product and $\mathbf{d}$ the exterior derivative).

Example 2.3 (Lie-Poisson bracket).

If $\mathfrak{g}$ is a Lie algebra, then its dual $\mathfrak{g}^{*}$ is a Poisson manifold with respect to the Lie-Poisson brackets $\{\cdot,\cdot\}_{+}$ and $\{\cdot,\cdot\}_{-}$ defined by

$$\{F,G\}_{\pm}(\mu)=\pm\left\langle\mu,\left[\frac{\delta F}{\delta\mu},\frac{\delta G}{\delta\mu}\right]\right\rangle,\quad F,G\in C^{\infty}(\mathfrak{g}^{\ast}),\qquad(2.2)$$

where $[\cdot,\cdot]$ is the Lie bracket on $\mathfrak{g}$, $\langle\cdot,\cdot\rangle$ denotes the pairing between $\mathfrak{g}^{*}$ and $\mathfrak{g}$, and $\frac{\delta F}{\delta\mu}\in\mathfrak{g}$ is the functional derivative of $F$ at $\mu\in\mathfrak{g}^{*}$, defined as the unique element that satisfies

$$\lim_{\varepsilon\to 0}\frac{1}{\varepsilon}\left[F(\mu+\varepsilon\delta\mu)-F(\mu)\right]=\left\langle\delta\mu,\frac{\delta F}{\delta\mu}\right\rangle\quad\mbox{for any }\delta\mu\in\mathfrak{g}^{\ast}.\qquad(2.3)$$

By the derivation property of the Poisson bracket, the value of the bracket $\{F,G\}$ at $z\in P$ (and thus $X_{F}(z)$ as well) depends on $F$ only through the differential $\mathbf{d}F(z)$. Thus, a Poisson tensor associated with the Poisson bracket can be defined as follows.

Definition 2.4.

Let $(P,\left\{\cdot,\cdot\right\})$ be a Poisson manifold. The Poisson tensor of this Poisson bracket is the contravariant anti-symmetric two-tensor

$$B:T^{*}P\times T^{*}P\rightarrow\mathbb{R},$$

defined by

$$B(z)\left(\alpha_{z},\beta_{z}\right)=\{F,G\}(z),\quad\mbox{for any }\alpha_{z},\beta_{z}\in T_{z}^{\ast}P,\ z\in P,\qquad(2.4)$$

where $F,G\in C^{\infty}(P)$ are any two functions such that $\mathbf{d}F(z)=\alpha_{z}$ and $\mathbf{d}G(z)=\beta_{z}\in T_{z}^{*}P$.

Hamiltonian vector fields and Hamilton’s equations. Given a Poisson manifold $(P,\left\{\cdot,\cdot\right\})$ and $H\in C^{\infty}(P)$, there is a unique vector field $X_{H}\in\mathfrak{X}(P)$ such that

$$X_{H}[F]=\{F,H\},\quad\mbox{for all }F\in C^{\infty}(P).\qquad(2.5)$$

The unique vector field $X_{H}$ is called the Hamiltonian vector field associated with the Hamiltonian function $H$. Let $F_{t}:P\rightarrow P$ be the flow of the Hamiltonian vector field $X_{H}$; then, for any $G\in C^{\infty}(P)$,

$$\frac{d}{dt}(G\circ F_{t})=\{G,H\}\circ F_{t}=\{G\circ F_{t},H\},\quad\text{or}\quad\dot{G}=\{G,H\}.\qquad(2.6)$$

We call (2.6) the Hamilton equations associated with the Hamiltonian function $H$. Let $B^{\sharp}:T^{*}P\to TP$ be the vector bundle map associated to the Poisson tensor $B$ defined in (2.4), that is,

$$B(z)\left(\alpha_{z},\beta_{z}\right)=\alpha_{z}\cdot B^{\sharp}(z)(\beta_{z}).\qquad(2.7)$$

Then the Hamiltonian vector field defined in (2.5) can be expressed as $X_{H}=B^{\sharp}(\mathbf{d}H)$. Furthermore, the Poisson bracket is given by

$$\{F,H\}(z)=B(z)(\mathbf{d}F(z),\mathbf{d}H(z))=\mathbf{d}F(z)\cdot B^{\sharp}(z)(\mathbf{d}H(z)).$$

Poisson geometry is closely related to symplectic geometry: for instance, every Poisson bracket determines a (generalized) foliation of a Poisson manifold into symplectic (immersed) submanifolds such that the leaf that passes through a given point has the dimension of the rank of the Poisson tensor at that point. We recall that the non-degeneracy hypothesis on symplectic forms implies that the Poisson tensor, in that case, has constant maximal rank. The following paragraph describes a distinctive feature of Poisson manifolds with respect to the symplectic case.

Casimir functions.  As we just explained, an important difference between the symplectic and Poisson cases is that the Poisson tensor could be degenerate, which leads to the emergence of Casimir functions. A function $C\in C^{\infty}(P)$ is called a Casimir function if $\{C,F\}=0$ for all $F\in C^{\infty}(P)$, that is, $C$ is constant along the flow of all Hamiltonian vector fields or, equivalently, $X_{C}=0$, that is, $C$ generates trivial dynamics. Casimir functions are conserved quantities along the equations of motion associated with any Hamiltonian function. The Casimirs of a Poisson manifold form the center $\mathcal{C}(P)$ of the Poisson algebra $(C^{\infty}(P),\left\{\cdot,\cdot\right\})$, which is defined by

$$\mathcal{C}(P)=\{C\in C^{\infty}(P)\mid\{C,F\}=0,\text{ for all }F\in C^{\infty}(P)\}.$$

If the Poisson tensor is non-degenerate (symplectic case), then $\mathcal{C}(P)$ consists only of the constant functions on $P$. The following Lie-Poisson systems are two examples that admit non-constant Casimirs.

Example 2.5 (Rigid body Casimirs [Mars 99]).

The motion of a rigid body follows a Lie-Poisson equation on the dual $\mathfrak{so}(3)^{*}$ of the Lie algebra of the special orthogonal group $SO(3)$, determined by the Hamiltonian function

$$H(\Pi)=\frac{1}{2}\Pi^{\top}\mathbb{I}^{-1}\Pi,$$

where $\mathbb{I}=\operatorname{diag}\{I_{1},I_{2},I_{3}\}$ is a diagonal matrix. The Poisson bracket on $\mathfrak{so}(3)^{*}\simeq\mathbb{R}^{3}$ is given by

$$\{F,K\}(\Pi)=-\Pi\cdot(\nabla F\times\nabla K).\qquad(2.8)$$

It can be easily checked that $C(\Pi)=\|\Pi\|^{2}$ is a Casimir function of the Poisson structure (2.8). Furthermore, for any analytic function $\Phi:\mathbb{R}\to\mathbb{R}$, the function $\Phi\circ C$ is also a Casimir.
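As a quick numerical illustration of this example, the sketch below integrates the rigid-body (Euler) equations $\dot{\Pi}=\Pi\times\nabla H(\Pi)$, which is the coordinate form of $X_{H}$ for the bracket (2.8), and monitors the Casimir $C(\Pi)=\|\Pi\|^{2}$ along the flow. The moments of inertia, initial condition, and explicit Runge-Kutta integrator are illustrative choices, not the structure-preserving schemes discussed in the paper.

```python
import numpy as np

I_body = np.array([1.0, 2.0, 3.0])          # principal moments of inertia (illustrative)

def X_H(Pi):
    """Rigid-body Hamiltonian vector field: dPi/dt = Pi x grad H, with grad H = I^{-1} Pi."""
    return np.cross(Pi, Pi / I_body)

def rk4_step(Pi, dt):
    k1 = X_H(Pi); k2 = X_H(Pi + 0.5 * dt * k1)
    k3 = X_H(Pi + 0.5 * dt * k2); k4 = X_H(Pi + dt * k3)
    return Pi + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

Pi = np.array([1.0, 0.3, -0.5])
c0 = np.dot(Pi, Pi)                          # Casimir C(Pi) = ||Pi||^2 at t = 0
for _ in range(10_000):
    Pi = rk4_step(Pi, 1e-3)
print(abs(np.dot(Pi, Pi) - c0))              # tiny drift: C is conserved by the exact flow
```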

Example 2.6 (Underwater vehicle Casimirs [Leon 97]).

The underwater vehicle dynamics can be described using the Lie-Poisson structure on $\mathfrak{so}(3)^{*}\times{\mathbb{R}^{3}}^{*}\times{\mathbb{R}^{3}}^{*}$ and the Hamiltonian function

$$H(\Pi,\mathrm{Q},\Gamma)=\frac{1}{2}\left(\Pi^{\top}A\Pi+2\Pi^{\top}B^{\top}\mathrm{Q}+\mathrm{Q}^{\top}C\mathrm{Q}-2mg\left(\Gamma\cdot r_{\mathrm{G}}\right)\right),$$

where $A,B,C$ are matrices coming from first principles and physical laws, $m$ is the mass of the body, $g$ is the gravity constant, and $r_{G}$ is the vector that links the center of buoyancy to the center of gravity. For the underwater vehicle, $\Pi$ and $\mathrm{Q}$ correspond, respectively, to the angular and linear components of the impulse of the system (roughly, the momentum of the system less a term at infinity). The vector $\Gamma$ describes the direction of gravity in body-fixed coordinates. The Poisson bracket on $\mathfrak{so}(3)^{*}\times{\mathbb{R}^{3}}^{*}\times{\mathbb{R}^{3}}^{*}$ is

$$\{F,K\}(\Pi,\mathrm{Q},\Gamma)=\nabla F^{\top}\Lambda(\Pi,\mathrm{Q},\Gamma)\nabla K,$$

where $F$ and $K$ are smooth functions on $\mathfrak{so}(3)^{*}\times{\mathbb{R}^{3}}^{*}\times{\mathbb{R}^{3}}^{*}$ and the Poisson tensor $\Lambda$ is given by

$$\Lambda(\Pi,\mathrm{Q},\Gamma)=\begin{pmatrix}\hat{\Pi}&\hat{\mathrm{Q}}&\hat{\Gamma}\\ \hat{\mathrm{Q}}&0&0\\ \hat{\Gamma}&0&0\end{pmatrix}.$$

In this equation, the operation ${}^{\wedge}:\mathbb{R}^{3}\rightarrow\mathfrak{so}(3)$ is the standard isomorphism of $\mathbb{R}^{3}$ with the Lie algebra of the rotation group $SO(3)$ and is defined by $\hat{v}w=v\times w$ for $v,w\in\mathbb{R}^{3}$. There are six Casimir functions for the underwater vehicle dynamics:

$$C_{1}(\Pi,\mathrm{Q},\Gamma)=\mathrm{Q}\cdot\Gamma,\quad C_{2}(\Pi,\mathrm{Q},\Gamma)=\|\mathrm{Q}\|^{2},\quad C_{3}(\Pi,\mathrm{Q},\Gamma)=\|\Gamma\|^{2},$$
$$C_{4}(\Pi,\mathrm{Q},\Gamma)=\Pi\cdot\mathrm{Q},\quad C_{5}(\Pi,\mathrm{Q},\Gamma)=\Pi\cdot\Gamma,\quad C_{6}(\Pi,\mathrm{Q},\Gamma)=\|\Pi\|^{2}.$$

It can be easily checked that for any analytic function $\Phi:\mathbb{R}^{6}\to\mathbb{R}$, the function $\Phi\circ(C_{1},\ldots,C_{6})$ is also a Casimir.

The momentum map and conservation laws.  Symmetries are particularly important in mechanics since they are, on many occasions, associated with the appearance of conservation laws. This fact is the celebrated Noether’s Theorem [Noet 18]. A modern and elegant way to encode this fact uses the momentum map, an object introduced in [Lie 93, Sour 65, Sour 69, Kost 09, Sour 66]. The definition of the momentum map requires a canonical Lie group action, that is, an action that respects the Poisson bracket. Its existence is guaranteed when the infinitesimal generators of this action are Hamiltonian vector fields. In other words, if the Lie group $G$, with Lie algebra $\mathfrak{g}$, acts canonically on the Poisson manifold $(P,\{\cdot,\cdot\})$, then, for each $\xi\in\mathfrak{g}$ with associated infinitesimal generator vector field $\xi_{P}\in\mathfrak{X}(P)$, we require the existence of a globally defined function $\mathbf{J}^{\xi}\in C^{\infty}(P)$ such that

$$\xi_{P}=X_{\mathbf{J}^{\xi}}.$$
Definition 2.7.

Let $\mathfrak{g}$ be the Lie algebra of a Lie group $G$ acting canonically on the Poisson manifold $(P,\{\cdot,\cdot\})$. Suppose that for any $\xi\in\mathfrak{g}$, the vector field $\xi_{P}$ is globally Hamiltonian, with Hamiltonian function $\mathbf{J}^{\xi}\in C^{\infty}(P)$. The map $\mathbf{J}:P\rightarrow\mathfrak{g}^{\ast}$ defined by the relation

$$\langle\mathbf{J}(z),\,\xi\rangle=\mathbf{J}^{\xi}(z),$$

for all $\xi\in\mathfrak{g}$ and $z\in P$, is called a momentum map of the $G$-action.

Notice that momentum maps are not uniquely determined; indeed, $\mathbf{J}_{1}$ and $\mathbf{J}_{2}$ are momentum maps for the same canonical action if and only if for any $\xi\in\mathfrak{g}$, $\mathbf{J}_{1}^{\xi}-\mathbf{J}_{2}^{\xi}\in\mathcal{C}(P)$. Obviously, if $P$ is symplectic and connected, then $\mathbf{J}$ is determined up to a constant in $\mathfrak{g}^{\ast}$.

Noether’s theorem is formulated in terms of the momentum map by saying that the fibers of the momentum map are preserved by the flow of the Hamiltonian vector field associated with any GG-invariant Hamiltonian function.

Example 2.8 (The linear momentum).

Take the phase space of the $N$-particle system, that is, $T^{\ast}\mathbb{R}^{3N}$. The additive group $\mathbb{R}^{3}$ acts on it by spatial translation on each factor: $\bm{v}\cdot(\bm{q}_{i},\,\bm{p}^{i})=(\bm{q}_{i}+\bm{v},\,\bm{p}^{i})$, with $i=1,\ldots,N$. This action is canonical and has an associated momentum map that coincides with the classical linear momentum

$$\mathbf{J}:T^{\ast}\mathbb{R}^{3N}\longrightarrow{\rm Lie}(\mathbb{R}^{3})\simeq\mathbb{R}^{3},\qquad(\bm{q}_{i},\,\bm{p}^{i})\longmapsto\sum_{i=1}^{N}\bm{p}^{i}.$$
Example 2.9 (The angular momentum).

Let ${\rm SO}(3)$ act on $\mathbb{R}^{3}$ and then, by lift, on $T^{\ast}\mathbb{R}^{3}$, that is, $A\cdot(\bm{q},\,\bm{p})=(A\bm{q},\,A\bm{p})$. This action is canonical and has an associated momentum map

$$\mathbf{J}:T^{\ast}\mathbb{R}^{3}\longrightarrow\mathfrak{so}(3)^{\ast}\simeq\mathbb{R}^{3},\qquad(\bm{q},\,\bm{p})\longmapsto\bm{q}\times\bm{p},$$

which is the classical angular momentum.

Example 2.10 (Lifted actions on cotangent bundles).

The previous two examples are particular cases of the following situation. Let $G$ be a Lie group acting on the manifold $Q$ and then by lift on its cotangent bundle $T^{\ast}Q$. Any such lifted action is canonical with respect to the canonical symplectic form on $T^{\ast}Q$ and has an associated momentum map $\mathbf{J}:T^{\ast}Q\rightarrow\mathfrak{g}^{\ast}$ given by

$$\langle\mathbf{J}(\alpha_{q}),\,\xi\rangle=\langle\alpha_{q},\,\xi_{Q}(q)\rangle,\qquad(2.9)$$

for any $\alpha_{q}\in T^{\ast}Q$ and any $\xi\in\mathfrak{g}$.
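For the lifted $SO(3)$-action of Example 2.9, the infinitesimal generator is $\xi_{Q}(q)=\xi\times q$ (identifying $\mathfrak{so}(3)\simeq\mathbb{R}^{3}$), so formula (2.9) gives $\langle\mathbf{J}(q,p),\xi\rangle=p\cdot(\xi\times q)=\xi\cdot(q\times p)$, recovering the angular momentum. The short sketch below is only a numerical check of this identity at arbitrary sample points.

```python
import numpy as np

rng = np.random.default_rng(1)
q, p, xi = rng.standard_normal((3, 3))    # a point (q, p) in T*R^3 and xi in so(3) ~ R^3

lhs = np.dot(np.cross(q, p), xi)          # <J(q, p), xi> with J(q, p) = q x p
rhs = np.dot(p, np.cross(xi, q))          # <alpha_q, xi_Q(q)> = p . (xi x q), cf. (2.9)
print(np.isclose(lhs, rhs))               # True: both sides of (2.9) agree
```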

2.2 High-order differentials and random variables on Riemannian manifolds

We start by reviewing higher-order differentials and the spaces of bounded differentiable functions on general manifolds, which will allow us to examine the regularity of functions in a reproducing kernel Hilbert space (RKHS) later on in Section 3.1. Finally, we introduce some basic facts about statistics on Riemannian manifolds that will be needed later on. For further details, see [Penn 19, Hsu 02, Emer 12] and references therein.

Let $M$ be a smooth manifold with an atlas $\{U_{\beta},\varphi_{\beta},\beta\in\mathcal{I}\}$, where $\mathcal{I}$ is an index set. A function $f:M\to\mathbb{R}$ is said to be in the $C^{k}$ class if for any chart $\varphi_{\beta},\beta\in\mathcal{I}$, the function $f\circ\varphi_{\beta}^{-1}:\varphi_{\beta}(U_{\beta})\to\mathbb{R}$ is a $C^{k}$ function and the $k$-order derivative of $f\circ\varphi_{\beta}^{-1}$ is continuous.

Definition 2.11 (Differentials).

If $f:M\to\mathbb{R}$ is in the $C^{1}$ class, we define its differential $Df:M\to T^{*}M$ as

$$Df(x)\cdot v:=\left.\frac{d}{dt}\right|_{t=0}f(c(t)),\quad\mbox{for all }x\in M,\ v\in T_{x}M,$$

where $c:[-1,1]\to M$ is a differentiable curve with $c(0)=x$ and $c^{\prime}(0)=v$.

In the sequel, we shall also denote by $\mathbf{d}f$ the differential of $f$. The differential $Df$ can be identified with a function on $TM$. Then, if $f$ is regular enough, we can continue taking the differential of $Df$ on $TM$. In this way, we can define high-order differentials as follows.

Definition 2.12 (High-order differentials).

If $f:M\to\mathbb{R}$ is in the $C^{k}$ class, we define the $k$-order differential of $f$, denoted $D^{k}f:T^{k}M\to\mathbb{R}$, inductively as the differential of $D^{k-1}f:T^{k-1}M\to\mathbb{R}$.

Remark 2.13.

For a differentiable function $K:M\times M\to\mathbb{R}$, we denote by $D^{(m,l)}K:T^{m}M\times T^{l}M\to\mathbb{R}$ the $m$-order differential with respect to the first variable and the $l$-order differential with respect to the second variable.

Remark 2.14 (Functions with bounded differentials).

Having defined high-order differentials, we can now define the spaces $C_{b}^{s}(M)$ of functions with bounded differentials for all positive integers $s$. We assume first that $M$ is a second-countable manifold, and we can hence construct a complete Riemannian metric $g$ on it [Nomi 61]. A general result [Tu 08, Proposition 12.3] guarantees that the higher-order tangent bundles $T^{k}M$ are also second countable and can hence also be endowed with complete metrics $g_{k}$. Let now $C^{s}_{b}(M)$ be the subset of $C^{s}(M)$ given by

$$C_{b}^{s}(M):=\left\{f\in C^{s}(M)\mid\|f\|_{C_{b}^{s}}:=\|f\|_{\infty}+\sum_{k=1}^{s}\|D^{k}f\|_{\infty}<\infty\right\},$$

where $\|f\|_{\infty}:=\sup_{x\in M}|f(x)|$ and

$$\|D^{k}f\|_{\infty}:=\sup_{y\in T^{k-1}M}\ \sup_{v\in T_{y}(T^{k-1}M),v\neq 0}\frac{\left|D^{k}f(y)\cdot v\right|}{\|v\|_{k-1}}.$$

Here, $\|v\|_{k-1}$ stands for the norm of $v$ in the tangent space $T_{y}T^{k-1}M$ that is defined using the complete Riemannian metric $g_{k-1}$ on $T^{k-1}M$.

The metric $g=g_{0}$ is also used to define the gradient $\nabla f\in\mathfrak{X}(M)$ of a function $f:M\to\mathbb{R}$, which is the vector field defined by its differential as

$$g(x)(\nabla f(x),v)=Df(x)\cdot v,\quad\text{for all }v\in T_{x}M.$$

For a differentiable function $f:M\to\mathbb{R}$, a Fundamental Theorem of Calculus can be formulated as

$$f(y)-f(x)=\int_{a}^{b}\frac{d}{dt}f(\gamma(t))\,\mathrm{d}t=\int_{a}^{b}g(\gamma(t))\left(\nabla f(\gamma(t)),\gamma^{\prime}(t)\right)\mathrm{d}t=\int_{a}^{b}Df(\gamma(t))\cdot\gamma^{\prime}(t)\,\mathrm{d}t,\qquad(2.10)$$

where $\gamma:[a,b]\to M$ is a differentiable curve with $\gamma(a)=x$ and $\gamma(b)=y$. Taking absolute values and using the Cauchy-Schwarz inequality, we obtain

$$|f(x)-f(y)|\leq\|Df\|_{\infty}\int_{a}^{b}\|\gamma^{\prime}(t)\|_{g}\,\mathrm{d}t=\|Df\|_{\infty}\,\text{length}(\gamma),$$

for each $f\in C^{1}_{b}(M)$. Note that, if we choose $\gamma$ to be the geodesic connecting $x$ and $y$, then $\text{length}(\gamma)$ is the geodesic distance between $x$ and $y$.

Random variables on Riemannian manifolds. Let $(M,g)$ be a smooth, connected, and complete $d$-dimensional Riemannian manifold with Riemannian metric $g$. Existence and uniqueness theorems from the theory of ordinary differential equations guarantee that there exists a unique geodesic $\gamma_{x,v}$ going through the point $x\in M$ at time $t=0$ with tangent vector $v\in T_{x}M$. The Riemannian exponential map at $x\in M$, denoted by $\operatorname{exp}_{x}$, maps each vector $v\in T_{x}M$ to the point of the manifold reached in unit time, that is,

$$\operatorname{exp}_{x}(v):=\gamma_{x,v}(1),$$

where $\gamma_{x,v}$ is the geodesic starting at $x$ along the vector $v$. Let $t_{0}(x,v)>0$ be the first time at which the geodesic $t\mapsto\operatorname{exp}_{x}(tv)$ ceases to be length-minimizing. The point $\operatorname{exp}_{x}(t_{0}(x,v)v)$ is called a cut point and the corresponding tangent vector $t_{0}(x,v)v$ is called a tangent cut point. The set of all cut points of all geodesics starting from $x\in M$ is the cut locus $C(x)\subset M$, and the set of corresponding vectors is the tangential cut locus $\mathcal{C}(x)\subset T_{x}M$. Thus, we have $C(x)=\operatorname{exp}_{x}(\mathcal{C}(x))$. Define the maximal definition domain for the exponential chart as

$$\mathcal{D}(x):=\left\{tv\in T_{x}M\mid v\in S^{d-1},0\leq t<t_{0}(x,v)\right\},$$

where $S^{d-1}$ is the unit sphere in the tangent space with respect to the metric $g(x)$. Then the exponential map $\operatorname{exp}_{x}$ is a diffeomorphism from $\mathcal{D}(x)$ onto its image $\operatorname{exp}_{x}(\mathcal{D}(x))$. Denote by $\operatorname{log}_{x}$ the inverse of the exponential map; the Riemannian distance is then $d(x,y)=\|\operatorname{log}_{x}(y)\|_{g(x)}$. Recall that for connected complete Riemannian manifolds we have that

$$M=\operatorname{exp}_{x}(\mathcal{D}(x))\cup C(x).$$

It is easy to check that $\mathcal{D}(x)$ is an open star-shaped subset of $T_{x}M$, so $\left(\mathcal{D}(x),\exp_{x}\right)$ is a smooth chart, and one has that for any Borel measurable function $f:M\rightarrow\mathbb{R}$ on $M$

$$\int_{M}f\,dV_{M}=\int_{\mathcal{D}(x)}\left(f\circ\exp_{x}\right)\sqrt{\operatorname{det}\left(G_{x}\right)}\,d\lambda^{d},$$

where $dV_{M}$ is the Riemannian measure on the manifold $(M,g)$, $G_{x}:=\exp_{x}^{*}g$ is the pullback of $g$ by $\exp_{x}$, and $\lambda^{d}$ denotes the $d$-dimensional Lebesgue measure on $\mathbb{R}^{d}$.

Definition 2.15 (Random variables on Riemannian manifold).

Let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space and $(M,g)$ a Riemannian manifold. A random variable $\xi$ on the Riemannian manifold $M$ is a Borel measurable function from $\Omega$ to $M$.

Let $\mathcal{B}(M)$ be the Borel $\sigma$-algebra of $M$. The random variable $\xi$ on the Riemannian manifold $(M,g)$ has a probability density function $p:M\rightarrow\mathbb{R}$ (a real, positive, and integrable function) if for all $A\in\mathcal{B}(M)$,

$$\mathbb{P}(\xi\in A)=\int_{A}p(x)\,dV_{M}(x)\quad\text{ and }\quad\mathbb{P}(\xi\in M)=\int_{M}p(x)\,dV_{M}(x)=1.$$

The definitions of the mean and variance of a random variable on a Riemannian manifold can be found in the literature [Penn 06].

Remark 2.16 (Random variables on tangent spaces).

Since the tangent space of a manifold at any given point is a vector space, the definition of a random variable on it is the same as for vector spaces. Let $\eta$ be a random variable on the tangent space $T_{x}M$ with density $p:T_{x}M\rightarrow\mathbb{R}$, that is, $\eta$ is a Borel measurable function from $\Omega$ to $T_{x}M$.

(i) (Mean and variance). The mean and variance of the random variable $\eta$ on $T_{x}M$ are defined as

$$\mathbb{E}[\eta]:=\int_{T_{x}M}vp(v)\,d\lambda^{d}(v)\in T_{x}M,$$
$$\operatorname{Var}[\eta]:=\mathbb{E}\left[\|\eta-\mathbb{E}[\eta]\|_{g}^{2}\right]=\int_{T_{x}M}\left\|v-\mathbb{E}[\eta]\right\|_{g}^{2}p(v)\,d\lambda^{d}(v),$$

where $\lambda^{d}$ is the $d$-dimensional Lebesgue measure.

(ii) (Independence). Let $\eta_{x}$ and $\eta_{y}$ be two random variables on the tangent spaces $T_{x}M$ and $T_{y}M$, respectively, and let $\mathcal{B}(T_{x}M)$ and $\mathcal{B}(T_{y}M)$ be the Borel $\sigma$-algebras of $T_{x}M$ and $T_{y}M$, respectively. We say that $\eta_{x}$ and $\eta_{y}$ are independent if for all $A_{x}\in\mathcal{B}(T_{x}M)$ and $A_{y}\in\mathcal{B}(T_{y}M)$, we have that

$$\mathbb{P}(\eta_{x}\in A_{x},\eta_{y}\in A_{y})=\mathbb{P}(\eta_{x}\in A_{x})\mathbb{P}(\eta_{y}\in A_{y}).$$

2.3 Compatible structures and local coordinate representations

This subsection introduces the concept of compatible structure, a vector bundle map from the tangent bundle of the Poisson manifold to itself that enables us to express Hamiltonian vector fields as images of gradients by a vector bundle map. Using this tool, we can formulate adapted differential reproducing properties in RKHS setups and establish an operator framework for them later in Section 3. Finally, we shall apply the Poisson-Darboux-Lie-Weinstein theorem to provide local coordinate representations of the compatible structure.

We first endow the Poisson manifold $(P,\left\{\cdot,\cdot\right\})$ with a Riemannian metric $g$ and define the vector bundle map $J:TP\to TP$ by

$$J(z)v:=B^{\sharp}(z)\left(g^{\flat}(z)(v)\right),\quad\mbox{for all }z\in P,\ v\in T_{z}P,\qquad(2.11)$$

where $B^{\sharp}:T^{*}P\to TP$ is the vector bundle map defined in (2.7) and $g^{\flat}:TP\to T^{*}P$ is the vector bundle isomorphism determined by the Riemannian metric $g$. Notice that the vector bundle map $B^{\sharp}$ may not be an isomorphism due to the possible degeneracy of the Poisson tensor $B$, but nevertheless, the map $J$ is always well-defined since we do not need to use the inverse of $B^{\sharp}$ at any moment. Note that the Poisson tensor $B$ and the Riemannian metric $g$ are linked by the following relation:

$$B(z)(\alpha_{z},\beta_{z})=\alpha_{z}\cdot B^{\sharp}(z)(\beta_{z})=g(z)\left(g^{\sharp}(z)(\alpha_{z}),B^{\sharp}(z)(\beta_{z})\right),\quad\mbox{for all }\alpha_{z},\beta_{z}\in T^{*}_{z}P.\qquad(2.12)$$

We call $J:TP\to TP$ the compatible structure associated with the Poisson tensor $B$ and the Riemannian metric $g$. Using the compatible structure $J$, the Hamiltonian vector field associated with a Hamiltonian function $h$ can be expressed as

$$X_{h}=B^{\sharp}(\mathbf{d}h)=B^{\sharp}(g^{\flat}(\nabla h))=J\nabla h.\qquad(2.13)$$
Remark 2.17.

If the Poisson manifold $P$ is symplectic with a symplectic form $\omega$, then $B^{\sharp}=\omega^{\sharp}$, and the relationship (2.12) reduces to

$$g(z)(u,v)=\omega(z)(u,-Jv),\quad u,v\in T_{z}P,$$

where $J(z)v:=\omega^{\sharp}(z)(g^{\flat}(z)(v))$ is, in this case, an isomorphism from $TP$ to $TP$. When the compatible structure $J$ defined in (2.11) satisfies $J^{2}=-I$, it is called a complex structure, and the corresponding manifold is said to be a Kähler manifold. Even-dimensional Euclidean vector spaces are special cases of Kähler manifolds obtained by choosing $J$ as the canonical (Darboux) symplectic matrix $J_{\mathrm{can}}$; in that case, the Hamiltonian vector field associated with a Hamiltonian function $h$ is expressed as $X_{h}=J_{\mathrm{can}}\nabla h$.
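The Euclidean canonical case of this remark can be checked directly in a few lines. The sketch below verifies the complex-structure property $J_{\mathrm{can}}^{2}=-I$ and the fact that a Hamiltonian $h$ is conserved along $X_{h}=J_{\mathrm{can}}\nabla h$, since $\nabla h\cdot J_{\mathrm{can}}\nabla h=0$ by antisymmetry; the particular Hamiltonian used is an arbitrary illustration.

```python
import numpy as np

# Canonical (Darboux) symplectic matrix on R^2
J_can = np.array([[0.0, 1.0], [-1.0, 0.0]])

# Complex-structure property J^2 = -I from the remark
print(np.allclose(J_can @ J_can, -np.eye(2)))             # True

def grad_h(z):
    q, p = z
    return np.array([np.sin(q) + q, p])                   # gradient of an illustrative h

# Along X_h = J_can grad h the Hamiltonian is conserved:
# dh/dt = grad h . (J_can grad h) = 0 by antisymmetry of J_can.
z = np.array([0.4, -0.9])
print(np.isclose(grad_h(z) @ (J_can @ grad_h(z)), 0.0))   # True
```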

Using the compatible structure JJ, we obtain a useful property of Casimir functions which will be needed later on in Section 3 to prove the uniqueness of the solution to the learning problem.

Proposition 2.18.

Let $(P,\left\{\cdot,\cdot\right\})$ be a Poisson manifold equipped with a Riemannian metric $g$. Then for any vector field $Y\in\mathfrak{X}(P)$ and any $h\in C^{\infty}(P)$:

$$g(z)(J(z)\nabla h(z),Y(z))=g(z)(\nabla h(z),-J(z)Y(z)),\quad\mbox{for all }z\in P.\qquad(2.14)$$

Furthermore, Casimir functions are constant along the flow of the vector field $-JY$, that is, $(-JY)[h]=0$ for any vector field $Y\in\mathfrak{X}(P)$ and any $h\in\mathcal{C}(P)$.

Proof.

By the definition of the compatible structure $J$ and the vector bundle map $B^{\sharp}$, we have that

$$g(z)(J(z)\nabla h(z),Y(z))=g(z)\left(B^{\sharp}(z)\mathbf{d}h(z),Y(z)\right)=B(z)(g^{\flat}(z)(Y)(z),\mathbf{d}h(z))$$
$$=-B(z)(\mathbf{d}h(z),g^{\flat}(z)(Y)(z))=-\mathbf{d}h(z)\cdot B^{\sharp}(z)(g^{\flat}(z)(Y))(z)$$
$$=-g(z)(\nabla h(z),B^{\sharp}(z)(g^{\flat}(z)(Y))(z))=g(z)(\nabla h(z),-J(z)Y(z)),$$

where the second and fifth equalities are due to equation (2.12) and the symmetry of the Riemannian metric $g$, and the third equality holds thanks to the anti-symmetry of the Poisson tensor $B$. Furthermore, let $h\in\mathcal{C}(P)$. Then equation (2.14) implies that for all $z\in P$,

$$(-JY)[h](z)=\mathbf{d}h(z)\cdot(-JY)(z)=g(z)(\nabla h(z),-J(z)Y(z))=g(z)(X_{h}(z),Y(z))=0.$$

Local coordinate representations.  We start by recalling the local Lie-Darboux-Weinstein coordinates of Poisson manifolds (see the original paper [Wein 83] or [Libe 12, Vais 12] for details).

Theorem 2.19 (Lie-Darboux-Weinstein coordinates).

Let $(P,\{\cdot,\cdot\})$ be an $m$-dimensional Poisson manifold and $z_{0}\in P$ a point where the rank of the Poisson structure equals $2n$, $0\leq 2n\leq m$. There exists a chart $(U,\varphi)$ of $P$ whose domain contains the point $z_{0}$ and such that the associated local coordinates, denoted by $(q^{1},\ldots,q^{n},p_{1},\ldots,p_{n},y_{1},\ldots,y_{m-2n})$, satisfy

$$\{q^{i},q^{j}\}=\{p_{i},p_{j}\}=\{q^{i},y_{k}\}=\{p_{i},y_{k}\}=0,\quad\text{ and }\quad\{q^{i},p_{j}\}=\delta^{i}_{j},$$

for all $i,j,k$, $1\leq i,j\leq n$, $1\leq k\leq m-2n$. For all $k,l$, $1\leq k,l\leq m-2n$, the Poisson bracket $\{y_{k},y_{l}\}$ is a function of the local coordinates $y_{1},\ldots,y_{m-2n}$ exclusively, and vanishes at $z_{0}$. Hence, the restriction of the bracket $\{\cdot,\cdot\}$ to the coordinates $y_{1},\ldots,y_{m-2n}$ induces a Poisson structure that is usually referred to as the transverse Poisson structure of $(P,\{\cdot,\cdot\})$ at $z_{0}$. This structure is unique up to isomorphisms.

If the rank of $(P,\{\cdot,\cdot\})$ is constant and equal to $2n$ on a neighborhood $W$ of $z_{0}$ then, by choosing the domain $U$ of the chart contained in $W$, the coordinates $y$ satisfy

$$\{y_{k},y_{l}\}=0,$$

for all $k,l$, $1\leq k,l\leq m-2n$.

This theorem proves that if the rank of the Poisson structure around a given point $z_{0}\in P$ is locally constant, then there exist local coordinates $\{z^{1},\ldots,z^{m}\}$ in which the Poisson tensor $B$ can be represented by a constant matrix $\left(B^{ij}\right)_{i,j=1}^{m}$ as $B=B^{ij}\frac{\partial}{\partial z^{i}}\otimes\frac{\partial}{\partial z^{j}}$. Hence, the Poisson bracket can be locally expressed as (using the Einstein convention in the summation of the indices)

$$\{F,G\}=B^{ij}\frac{\partial F}{\partial z^{i}}\frac{\partial G}{\partial z^{j}}.$$

Furthermore, for any one-form $\alpha=\alpha_{j}dz^{j}$, the vector bundle map $B^{\sharp}$ can be locally expressed as

$$B^{\sharp}(\alpha)=B^{ij}\alpha_{i}\frac{\partial}{\partial z^{j}}.\qquad(2.15)$$

Note that $g^{\flat}:TP\to T^{*}P$ is the bundle isomorphism with respect to the Riemannian metric $g:=g_{ij}dz^{i}\otimes dz^{j}$. Thus, for any $X:=X^{j}\frac{\partial}{\partial z^{j}}$, the bundle isomorphism $g^{\flat}$ can be expressed as

$$g^{\flat}(X)=g_{ij}X^{i}dz^{j}.\qquad(2.16)$$

Therefore, combining equations (2.15) and (2.16), the compatible structure $J$ defined in (2.11) has the local representation

$$J\left(X^{j}\frac{\partial}{\partial z^{j}}\right)=B^{\sharp}\left(g^{\flat}\left(X^{j}\frac{\partial}{\partial z^{j}}\right)\right)=B^{\sharp}\left(g_{ij}X^{i}dz^{j}\right)=B^{ij}g_{ik}X^{k}\frac{\partial}{\partial z^{j}},$$

which proves that the matrix representation of the bundle map is given by $J=-Bg$, that is,

$$J^{ij}=-B^{ik}g_{kj}.\qquad(2.17)$$
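The following short sketch illustrates the local formula above on $\mathbb{R}^{2}$ with a constant Darboux-type Poisson tensor and a constant non-Euclidean metric, both chosen only for illustration: it checks numerically that $X_{h}=B^{\sharp}(\mathbf{d}h)$ coincides with $J\nabla h$ when $J=-Bg$ in coordinates and $\nabla h$ is the Riemannian gradient.

```python
import numpy as np

# Constant Darboux-type Poisson tensor B^{ij} and a constant SPD metric g_{ij} on R^2
# (both are illustrative choices)
B = np.array([[0.0, 1.0], [-1.0, 0.0]])
g = np.array([[2.0, 0.3], [0.3, 1.0]])

def dh(z):
    """Coordinate differential (dh)_i = partial_i h for h(q, p) = (q^2 + p^2)/2."""
    return np.array([z[0], z[1]])

z = np.array([0.7, -1.2])
X_direct = B.T @ dh(z)                 # X_h^j = B^{ij} partial_i h, i.e. B^#(dh)
J = -B @ g                             # local matrix of the compatible structure, cf. (2.17)
grad_h = np.linalg.inv(g) @ dh(z)      # Riemannian gradient: (grad h)^k = g^{kl} partial_l h
print(np.allclose(X_direct, J @ grad_h))   # True: X_h = B^#(dh) = J grad h
```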

3 Structure-preserving kernel regressions on Poisson manifolds

This section proposes a structure-preserving kernel regression method to recover Hamiltonian functions on Poisson manifolds, which is a significant generalization of the same learning problem on even-dimensional symplectic vector spaces [Hu 24]. First, in Section 3.1, we establish a differential reproducing property, which not only ensures the existence and boundedness of function gradients in the RKHS but also provides an operator framework to represent the minimizers of our structure-preserving kernel regression problems. In Section 3.3, we present an operator framework to represent the estimator for the structure-preserving regression problem. In Section 3.4, we develop a differential kernel representation of the estimator, providing a closed-form solution that is computationally feasible in applications. The error analysis of this structure-preserving estimator is conducted later in Section 4, where we establish convergence rates with both fixed and adaptive regularization parameters.

3.1 RKHS on Riemannian manifolds

The structure-preserving learning of Hamiltonian systems on symplectic vector spaces has been tackled by proving a differential reproducing property in [Hu 24]. We shall show in this section that this differential reproducing property still holds on a generic Riemannian manifold, and we will give a global version of it, where the term global means that the result that we provide is not formulated in terms of locally defined partial derivatives but of globally defined differentials as in (2.11) that do not depend on specific choices of local coordinates.

We first quickly recall Mercer kernels and the associated reproducing kernel Hilbert spaces (RKHS). A Mercer kernel on a non-empty set $\mathcal{X}$ is a positive semidefinite symmetric function $K:\mathcal{X}\times\mathcal{X}\to\mathbb{R}$, where positive semidefinite means that it satisfies

$$\sum_{i=1}^{n}\sum_{j=1}^{n}c_{i}c_{j}K(x_{i},x_{j})\geq 0,\qquad(3.1)$$

for any $x_{1},\ldots,x_{n}\in\mathcal{X}$, $c_{1},\ldots,c_{n}\in\mathbb{R}$, and any $n\in\mathbb{N}$. Property (3.1) is equivalent to requiring that the Gram matrices $[K(x_{i},x_{j})]_{i,j=1}^{n}$ are positive semidefinite for any $x_{1},\ldots,x_{n}\in\mathcal{X}$ and any given $n\in\mathbb{N}$. We emphasize that the positive semidefiniteness of kernels in non-Euclidean spaces requires care in the sense that the natural non-Euclidean generalization of positive definite kernels in Euclidean spaces (like the Gaussian kernel) may not be positive definite [Fera 15, Da C 23a, Da C 23b, Li 23]. However, as the next proposition shows, it is easy to construct Mercer kernels in any non-Euclidean space.

Proposition 3.1.

[Van  12, page 69] For any real-valued function $f$ on a non-empty set $\mathcal{X}$, the function $K(x,y):=f(x)f(y)$ is a Mercer kernel on $\mathcal{X}$. Moreover, if $K_{1},K_{2}$ are Mercer kernels on $\mathcal{X}$, so is $K_{1}K_{2}$.
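As a concrete illustration of how Mercer kernels can be obtained on a non-Euclidean space, the sketch below builds two kernels on the 2-sphere, one of the product form $f(x)f(y)$ from the proposition and one obtained by restricting an ambient Euclidean Gaussian kernel to points of the sphere (restriction of a positive semidefinite kernel to a subset remains positive semidefinite), and numerically checks that random Gram matrices have no significantly negative eigenvalues. The manifold, the functions, and the bandwidth are illustrative choices, not the kernels used in the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Points on the 2-sphere S^2, obtained by normalizing ambient Gaussian samples
X = rng.standard_normal((50, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)

def K_product(x, y, f=lambda z: 1.0 + z[0] * z[2]):
    return f(x) * f(y)                       # Mercer kernel of the form f(x) f(y)

def K_gauss(x, y, ell=0.7):
    return np.exp(-np.sum((x - y) ** 2) / (2 * ell ** 2))   # ambient Gaussian, restricted to S^2

for K in (K_product, K_gauss):
    G = np.array([[K(x, y) for y in X] for x in X])          # Gram matrix
    print(np.min(np.linalg.eigvalsh(G)) > -1e-10)            # True: positive semidefinite
```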

A Mercer kernel is the key element to define a reproducing kernel Hilbert space (RKHS) as follows.

Definition 3.2 (RKHS).

Let $K:\mathcal{X}\times\mathcal{X}\to\mathbb{R}$ be a Mercer kernel on a non-empty set $\mathcal{X}$. A Hilbert space $\mathcal{H}_{K}$ of real-valued functions on $\mathcal{X}$, endowed with the pointwise sum and pointwise scalar multiplication, and with inner product $\langle\cdot,\cdot\rangle_{\mathcal{H}_{K}}$, is called a reproducing kernel Hilbert space (RKHS) associated to $K$ if the following properties hold:

(i) For all $x\in\mathcal{X}$, the function $K(x,\cdot)\in\mathcal{H}_{K}$.

(ii) For all $x\in\mathcal{X}$ and for all $f\in\mathcal{H}_{K}$, the following reproducing property holds:

$$f(x)=\langle f,K(x,\cdot)\rangle_{\mathcal{H}_{K}}.$$

The Moore-Aronszajn Theorem [Aron 50] establishes that given a Mercer kernel $K$ on a set $\mathcal{X}$, there is a unique Hilbert space of real-valued functions $\mathcal{H}_{K}$ on $\mathcal{X}$ for which $K$ is a reproducing kernel, which allows us to talk about the RKHS $\mathcal{H}_{K}$ associated to $K$. Conversely, an RKHS naturally induces a kernel function that is a Mercer kernel. Therefore, there is a bijection between RKHSs and Mercer kernels.

We now state a differential reproducing property on Riemannian manifolds whose proof can be found in Appendix A.1. The following theorem is a generalization of similar results proved for compact [Nova 18, Zhou 08], bounded [Ferr 12], and unbounded [Hu 24] subsets of Euclidean spaces.

Theorem 3.3 (Differential reproducing property on Riemannian manifolds).

Let $M$ be a complete Riemannian manifold and let $K:M\times M\rightarrow\mathbb{R}$ be a Mercer kernel such that $K\in C_{b}^{2s+1}(M\times M)$ for some $s\in\mathbb{N}$ (see Remark 2.14 for the definition of this space). Then the following statements hold:

(i) For all $1\leq k\leq s$, the function $D^{(k,0)}K(y,\cdot)\cdot v\in\mathcal{H}_{K}$ for all $y\in T^{k-1}M$ and $v\in T_{y}(T^{k-1}M)$.

(ii) A differential reproducing property holds true for all $f\in\mathcal{H}_{K}$:

$$D^{k}f(y)\cdot v=\langle D^{(k,0)}K(y,\cdot)\cdot v,f\rangle_{\mathcal{H}_{K}},\quad\mbox{for all }y\in T^{k-1}M,\ v\in T_{y}(T^{k-1}M),\text{ and }1\leq k\leq s.\qquad(3.2)$$

(iii) Denote $\kappa^{2}=(s+1)\|K\|_{C_{b}^{s}}$. The inclusion $J:\mathcal{H}_{K}\hookrightarrow C_{b}^{s}(M)$ is well-defined and bounded:

$$\|f\|_{C_{b}^{s}}\leqslant\kappa\|f\|_{\mathcal{H}_{K}},\quad\forall~f\in\mathcal{H}_{K}.\qquad(3.3)$$

For convenience, we use the following notation in what follows: given $x\in M$, we denote by $K_{x}:M\rightarrow\mathbb{R}$ the function on $M$ given by $K_{x}(y):=K(x,y)$ and we call it the kernel section of $K$ at $x\in M$. Therefore, $K_{\cdot}$ can be regarded as a function on $M\times M$ such that $K_{x}(y):=K(y,x)$. For any function $f\in\mathcal{H}_{K}$, we denote by $\langle f,K_{\cdot}\rangle_{\mathcal{H}_{K}}$ the function in $\mathcal{H}_{K}$ given by $\langle f,K_{y}\rangle_{\mathcal{H}_{K}}=f(y)$ (by the reproducing property). Furthermore, for a Poisson manifold $P$ equipped with a Riemannian metric $g$, we shall also denote by $X_{K_{\cdot}}$ the family of Hamiltonian vector fields parametrized by the elements in $P$, that is, given any $z\in P$, $X_{K_{z}}$ is the Hamiltonian vector field of the function $K_{z}$. In particular, for any $z\in P$ and any vector $v\in T_{z}P$, $g(z)\left(X_{K_{\cdot}}(z),v\right)$ is a function on the manifold $P$.

Remark 3.4.

(i) By Theorem 3.3 (i), $D^{(1,0)}K(x,\cdot)\cdot u\in\mathcal{H}_{K}$ for any $u\in T_{x}M$. Thus, the differential reproducing property (3.2) with $f=D^{(1,0)}K(x,\cdot)\cdot u$ implies that

$$D^{(1,1)}K(x,y)\cdot(u,v)=\left\langle D^{(1,0)}K(x,\cdot)\cdot u,D^{(1,0)}K(y,\cdot)\cdot v\right\rangle_{\mathcal{H}_{K}},\quad\mbox{for all }y\in M,\ v\in T_{y}M.$$

Furthermore, in the presence of a Poisson structure and combining with Proposition 2.18, we obtain that for any vector field $Y\in\mathfrak{X}(M)$,

$$g(z)\left(X_{K_{\cdot}}(z),Y(z)\right)=D^{(1,0)}K(z,\cdot)\cdot(-J(z)Y(z))\in\mathcal{H}_{K}.$$

(ii) Note that unlike the differential reproducing property in [Hu 24] that was formulated in terms of partial derivatives, (3.2) is a coordinate-free expression that has been written down in terms of global differentials.
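To make the case $k=1$ of the differential reproducing property (3.2) concrete in the simplest setting, the following sketch works on $M=\mathbb{R}^{2}$ with a Gaussian kernel and a function $f=\sum_{i}c_{i}K_{x_{i}}\in\mathcal{H}_{K}$: for such an $f$, the right-hand side of (3.2) reduces, by the reproducing property, to $\sum_{i}c_{i}\,D^{(1,0)}K(y,x_{i})\cdot v$, which is compared with a finite-difference approximation of $Df(y)\cdot v$. The kernel, bandwidth, and test points are illustrative assumptions.

```python
import numpy as np

ell = 0.8                                              # illustrative Gaussian bandwidth
K = lambda x, y: np.exp(-np.sum((x - y) ** 2) / (2 * ell ** 2))
d1K = lambda x, y: -K(x, y) * (x - y) / ell ** 2       # D^{(1,0)}K(x, y): derivative in x

rng = np.random.default_rng(2)
Xc, c = rng.standard_normal((5, 2)), rng.standard_normal(5)
f = lambda x: sum(ci * K(xi, x) for ci, xi in zip(c, Xc))   # f = sum_i c_i K_{x_i} in H_K

y, v = rng.standard_normal(2), rng.standard_normal(2)

# Right-hand side of (3.2): <D^{(1,0)}K(y, .) . v, f> = sum_i c_i D^{(1,0)}K(y, x_i) . v
rhs = sum(ci * d1K(y, xi) @ v for ci, xi in zip(c, Xc))
# Left-hand side: directional derivative Df(y) . v, here by central finite differences
eps = 1e-6
lhs = (f(y + eps * v) - f(y - eps * v)) / (2 * eps)
print(np.isclose(lhs, rhs))                            # True up to discretization error
```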

Corollary 3.5.

Let $M$ be a Riemannian manifold and denote by $\mathcal{H}_{K}$ the RKHS associated with the kernel $K:M\times M\to\mathbb{R}$. Then for each vector field $X\in\mathfrak{X}(M)$ and each $f\in\mathcal{H}_{K}$, we have that

$$X[f](x)=X[\langle f,K_{\cdot}\rangle_{\mathcal{H}_{K}}](x),\quad\forall\ x\in M.$$

Proof.

By the reproducing property, we have that for each $f\in\mathcal{H}_{K}$, $f(x)=\langle f,K_{x}\rangle_{\mathcal{H}_{K}}$ for all $x\in M$. Let $F_{t}$ be the flow of the vector field $X$. Then,

$$X[\langle f,K_{\cdot}\rangle_{\mathcal{H}_{K}}](x):=\left.\frac{d}{dt}\right|_{t=0}\langle f,K_{F_{t}(x)}\rangle_{\mathcal{H}_{K}}=\left.\frac{d}{dt}\right|_{t=0}f(F_{t}(x))=X[f](x).$$

3.2 The structure-preserving learning problem

As we already explained in the introduction, the data that we are given for the learning problem are noisy realizations of the Hamiltonian vector field $X_{H}\in\mathfrak{X}(P)$ whose Hamiltonian function $H\in C^{\infty}(P)$ we are trying to estimate, that is,

$$\mathbf{X}_{\sigma^{2}}^{(n)}:=X_{H}(\mathbf{Z}^{(n)})+\bm{\varepsilon}^{(n)},\quad n=1,\ldots,N,$$

where $\mathbf{Z}^{(n)}$ are IID random variables with values in the Poisson manifold $P$ with the same distribution $\mu_{\mathbf{Z}}$ and $\bm{\varepsilon}^{(n)}$ are independent random variables on the tangent spaces $T_{\mathbf{z}^{(n)}}P$ with mean zero and variance $\sigma^{2}I_{d}$ (see Remark 2.16). We notice that unlike the Euclidean case studied in [Hu 24], where it was assumed that the $\bm{\varepsilon}^{(n)}$ have the same distribution in $\mathbb{R}^{d}$, in the present situation the tangent spaces $T_{\mathbf{z}^{(n)}}P$ are different vector spaces even though they are isomorphic and, hence, we cannot assume that the random variables $\bm{\varepsilon}^{(n)}$ in different spaces have the same distribution. However, the independence hypothesis and the assumptions on the moments are sufficient for our structure-preserving kernel regression.

Notation. We shall write the random samples as:

𝐙N:=Vec(𝐙(1)||𝐙(N))Πi=1NP,𝐗σ2,N:=Vec(𝐗σ2(1)||𝐗σ2(N))Πi=1NT𝐙(i)P,\displaystyle\begin{split}\mathbf{Z}_{N}&:=\mathrm{Vec}\left(\mathbf{Z}^{(1)}|\cdots|\mathbf{Z}^{(N)}\right)\in\Pi_{i=1}^{N}P,\\ \mathbf{X}_{\sigma^{2},N}&:=\mathrm{Vec}\left(\mathbf{X}_{\sigma^{2}}^{(1)}|\cdots|\mathbf{X}_{\sigma^{2}}^{(N)}\right)\in\Pi_{i=1}^{N}T_{\mathbf{Z}^{(i)}}P,\end{split}

where the symbol ‘Vec\mathrm{Vec}’ stands for the vectorization of the corresponding matrices. We shall denote by 𝐳(n)\mathbf{z}^{(n)} and 𝐱σ2(n)\mathbf{x}_{\sigma^{2}}^{(n)} the realizations of the random variables 𝐙(n)\mathbf{Z}^{(n)} and 𝐗σ2(n)\mathbf{X}_{\sigma^{2}}^{(n)}, respectively. The collection of realizations is denoted by

𝐳N:=Vec(𝐳(1)||𝐳(N))Πi=1NP,𝐱σ2,N:=Vec(𝐱σ2(1)||𝐱σ2(N))Πi=1NT𝐳(i)P.\displaystyle\begin{split}\mathbf{z}_{N}&:=\mathrm{Vec}\left(\mathbf{z}^{(1)}|\cdots|\mathbf{z}^{(N)}\right)\in\Pi_{i=1}^{N}P,\\ \mathbf{x}_{\sigma^{2},N}&:=\mathrm{Vec}\left(\mathbf{x}_{\sigma^{2}}^{(1)}|\cdots|\mathbf{x}_{\sigma^{2}}^{(N)}\right)\in\Pi_{i=1}^{N}T_{\mathbf{z}^{(i)}}P.\end{split}

In the sequel, if f:Psf:P\to\mathbb{R}^{s} is a function, we then shall denote the value Vec(f(𝐙(1))||f(𝐙(N)))sN\mathrm{Vec}\left(f(\mathbf{Z}^{(1)})|\cdots|f(\mathbf{Z}^{(N)})\right)\in\mathbb{R}^{sN} by f(𝐙N)f(\mathbf{Z}_{N}).

Structure-preserving kernel ridge regression. The strategy behind structure-preservation in the context of kernel regression is that we search the vector field X𝔛(P)X\in\mathfrak{X}(P) that minimizes a risk functional among those that have the form X:=XhX:=X_{h}, where h:Ph:P\rightarrow\mathbb{R} belongs to the RKHS K\mathcal{H}_{K} associated with a kernel KK defined on the Poisson manifold PP. This approach obviously guarantees that the learned vector field is Hamiltonian. To make the method explicit, we shall be solving the following optimization problem

h^λ,N\displaystyle\widehat{h}_{\lambda,N} :=argminhKR^λ,N(h),\displaystyle:=\mathop{\arg\min}\limits_{h\in\mathcal{H}_{K}}\ \widehat{R}_{\lambda,N}(h), (3.4)
R^λ,N(h)\displaystyle\widehat{R}_{\lambda,N}(h) :=1Nn=1NXh(𝐙(n))𝐗σ2(n)g2+λhK2,\displaystyle:=\frac{1}{N}\sum_{n=1}^{N}\|X_{h}(\mathbf{Z}^{(n)})-\mathbf{X}^{(n)}_{\sigma^{2}}\|_{g}^{2}+\lambda\|h\|_{\mathcal{H}_{K}}^{2}, (3.5)

where XhX_{h} is the Hamiltonian vector field of hh and λ0\lambda\geq 0 is the Tikhonov regularization parameter. We shall refer to the minimizer h^λ,N\widehat{h}_{\lambda,N} as the structure-preserving kernel estimator of the Hamiltonian function HH. The functional R^λ,N\widehat{R}_{\lambda,N} is called the regularized empirical risk.
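For illustration, the following minimal Python sketch evaluates the regularized empirical risk (3.5) in the flat special case P=\mathbb{R}^{d} with the Euclidean metric and a constant compatible structure J (so that X_{h}=J\nabla h); the helper names canonical_J, grad_h, and empirical_risk are purely illustrative, and the candidate Hamiltonian and its RKHS norm are supplied by the user.

```python
import numpy as np

def canonical_J(d):
    # Constant compatible structure in the flat canonical case: J = [[0, I], [-I, 0]].
    k = d // 2
    J = np.zeros((d, d))
    J[:k, k:] = np.eye(k)
    J[k:, :k] = -np.eye(k)
    return J

def empirical_risk(grad_h, h_rkhs_norm, Z, X_obs, lam, J=None):
    # Regularized empirical risk (3.5) with g Euclidean, so that X_h(z) = J grad h(z).
    # Z: (N, d) array of samples Z^(n); X_obs: (N, d) array of observed vector fields.
    N, d = Z.shape
    J = canonical_J(d) if J is None else J
    residual = np.array([J @ grad_h(z) for z in Z]) - X_obs
    return np.mean(np.sum(residual ** 2, axis=1)) + lam * h_rkhs_norm ** 2

# Example with a quadratic candidate h(q, p) = (q^2 + p^2)/2 on R^2, whose gradient is z -> z.
Z = np.random.uniform(-1.0, 1.0, size=(50, 2))
X_obs = Z @ canonical_J(2).T          # noiseless Hamiltonian vector fields of the same h
print(empirical_risk(lambda z: z, h_rkhs_norm=1.0, Z=Z, X_obs=X_obs, lam=1e-3))
```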

Remark 3.6.

(i) If K\in C_{b}^{3}(P\times P), then the differential reproducing property in Theorem 3.3 implies that the functions in \mathcal{H}_{K} are all differentiable and hence they always have an associated Hamiltonian vector field. This implies that the regularized empirical risk \widehat{R}_{\lambda,N} defined in (3.5) and the associated optimization problem (3.4) are always well-defined in that situation.

(ii) The minimization problem (3.4) might be ill-posed since for any Casimir function C𝒞KC\in\mathcal{C}\cap\mathcal{H}_{K} and hKh\in\mathcal{H}_{K}, the function h+Ch+C shares the same Hamiltonian vector field as hh. This could result in the lack of unique solutions for the minimization problem. Later on in Remark 3.14, we show how ridge regularization solves this problem.

Given the probability measure \mu_{\mathbf{Z}} and the Riemannian metric g on the manifold P, a vector field Y\in\mathfrak{X}(P) is said to be L^{2}(\mu_{\mathbf{Z}})-integrable if the following L^{2}(\mu_{\mathbf{Z}}) norm of Y is finite, that is,

\|Y\|_{L^{2}(\mu_{\mathbf{Z}})}:=\left(\int_{P}g(z)(Y(z),Y(z))\mathrm{d}\mu_{\mathbf{Z}}(z)\right)^{1/2}<\infty. (3.6)

Denote by L^{2}(\mathfrak{X}(P),g,\mu_{\mathbf{Z}}) the space consisting of all L^{2}(\mu_{\mathbf{Z}})-integrable vector fields. The measure-theoretic analogue R_{\lambda} of (3.5) is referred to as the regularized statistical risk and is defined by

Rλ(h)\displaystyle R_{\lambda}(h) :=XhXHL2(μ𝐙)2+λhK2+σ2,\displaystyle:=\|X_{h}-X_{H}\|_{L^{2}(\mu_{\mathbf{Z}})}^{2}+\lambda\|h\|_{\mathcal{H}_{K}}^{2}+\sigma^{2}, (3.7)

where the L^{2}(\mu_{\mathbf{Z}}) norm is defined in (3.6) and X_{h},X_{H} are the Hamiltonian vector fields of h and H, respectively. We denote by h^{*}_{\lambda}\in\mathcal{H}_{K} the best-in-class function with the minimal regularized statistical risk, that is,

hλ:=argminhKRλ(h).\displaystyle h^{*}_{\lambda}:=\mathop{\arg\min}\limits_{h\in\mathcal{H}_{K}}R_{\lambda}(h). (3.8)

We note that by the strong law of large numbers, the regularized empirical and statistical risks are consistent within the RKHS, that is, for every hKh\in\mathcal{H}_{K}, we have that

limN𝔼𝜺[R^λ,N(h)]=Rλ(h),almost surely,\displaystyle\lim\limits_{N\to\infty}\mathbb{E}_{\bm{\varepsilon}}\left[\widehat{R}_{\lambda,N}(h)\right]=R_{\lambda}(h),\quad\mbox{almost surely},

where \mathbb{E}_{\bm{\varepsilon}} denotes the conditional expectation with respect to the random variables \bm{\varepsilon}^{(n)}, given all the \mathbf{Z}^{(n)}.

3.3 Operator representations of the kernel estimator

In order to find and study the solutions of the inverse learning problems (3.4)-(3.5) and (3.7)-(3.8), we introduce the operators A:K𝔛Ham(P)A:\mathcal{H}_{K}\to\mathfrak{X}_{\mathrm{Ham}}(P) and AN:KT𝐙NP:=Πi=1NT𝐙(i)PA_{N}:\mathcal{H}_{K}\to T_{\mathbf{Z}_{N}}P:=\Pi_{i=1}^{N}T_{\mathbf{Z}^{(i)}}P as

Ah:=X_{h}=J\nabla h,\text{ and }A_{N}h:=\frac{1}{\sqrt{N}}X_{h}(\mathbf{Z}_{N})=\frac{1}{\sqrt{N}}\mathbb{J}_{N}\nabla h(\mathbf{Z}_{N}),\quad\text{ for all }h\in\mathcal{H}_{K}, (3.9)

where 𝕁N=diag{J(𝐙(1)),,J(𝐙(N))}\mathbb{J}_{N}=\operatorname{diag}\{J(\mathbf{Z}^{(1)}),\ldots,J(\mathbf{Z}^{(N)})\} and JJ is the compatible structure defined in (2.11). By Theorem 3.3, if the Mercer kernel KCb3(P×P)K\in C_{b}^{3}(P\times P), then the functions in K\mathcal{H}_{K} are differentiable, and hence the operators AA and ANA_{N} are well-defined. We aim to show that the image of the operator AA belongs to the space L2(𝔛(P),g,μ𝐙)L^{2}(\mathfrak{X}(P),g,\mu_{\mathbf{Z}}). To achieve this, we must make the following boundedness assumption on the compatible structure JJ.

Assumption 3.7.

The compatible structure JJ defined in (2.11) satisfies

g(z)(J(z)v,J(z)v)γ(z)g(z)(v,v),for all zP, and vTzP,\displaystyle g(z)(J(z)v,J(z)v)\leq\gamma(z)g(z)(v,v),\quad\text{for all }z\in P,\text{ and }v\in T_{z}P, (3.10)

where γ\gamma is a positive function on the Poisson manifold PP bounded above by some constant C2C^{2} (C>0C>0).

Using this boundedness condition on the compatible structure J, we can obtain the boundedness of the operators A and A_{N} defined in (3.9) as maps from \mathcal{H}_{K} to L^{2}(\mathfrak{X}(P),g,\mu_{\mathbf{Z}}) and from \mathcal{H}_{K} to T_{\mathbf{Z}_{N}}P, respectively. A detailed proof of the following proposition can be found in Appendix A.2.

Proposition 3.8.

Let PP be a Poisson manifold and let KCb3(P×P)K\in C_{b}^{3}(P\times P) be a Mercer kernel. Suppose that Assumption 3.7 holds. Then, the operator AA defined in (3.9) is a bounded linear operator that maps K\mathcal{H}_{K} into L2(𝔛(P),g,μ𝐙)L^{2}(\mathfrak{X}(P),g,\mu_{\mathbf{Z}}) with an operator norm A\|A\| that satisfies AκC\|A\|\leq\kappa C, where κ2=2KCb2\kappa^{2}=2\|K\|_{C_{b}^{2}}. The adjoint operator A:L2(𝔛(P),g,μ𝐙)KA^{*}:L^{2}(\mathfrak{X}(P),g,\mu_{\mathbf{Z}})\longrightarrow{\mathcal{H}}_{K} of AA is given by

AY=Pg(z)(XK(z),Y(z))dμ𝐙(z),for all YL2(𝔛(P),g,μ𝐙).\displaystyle A^{*}Y=\int_{P}g(z)(X_{K_{\cdot}}(z),Y(z))\mathrm{d}\mu_{\mathbf{Z}}(z),\quad\mbox{for all $Y\in L^{2}(\mathfrak{X}(P),g,\mu_{\mathbf{Z}})$.}

As a consequence, the bounded linear operator Q:KKQ:\mathcal{H}_{K}\longrightarrow\mathcal{H}_{K}, defined by

Qh:=A^{*}Ah=\int_{P}g(z)(X_{K_{\cdot}}(z),X_{h}(z))\mathrm{d}\mu_{\mathbf{Z}}(z), (3.11)

is a positive trace class operator that satisfies Tr(Q)12dκ2C2\operatorname{Tr}(Q)\leq\frac{1}{2}d\kappa^{2}C^{2}.

Remark 3.9.

(i) The proof of Proposition 3.8 shows that the boundedness condition on the compatible structure J in Assumption 3.7 can be weakened by requiring only that the positive function \gamma be L^{1}-integrable with respect to the probability measure \mu_{\mathbf{Z}}. This weaker condition obviously holds if \mu_{\mathbf{Z}} is compactly supported or is a Gaussian probability measure, which shows that it is satisfied for a large class of compatible structures.

(ii) When the Poisson manifold PP is symplectic and the compatible structure JJ satisfies J2=IJ^{2}=-I then it is a complex structure and PP becomes a Kähler manifold. Even-dimensional vector spaces are special cases of Kähler manifolds by picking as the complex structure the one naturally associated to the canonical (Darboux) symplectic form. In these cases, the boundedness condition (3.10) naturally holds by choosing γ1\gamma\equiv 1. This is why this condition never appears in [Hu 24].

We now deal with the empirical version ANA_{N} of the operator AA defined in (3.9). For convenience, we define the inner product gN(,)g_{N}(\cdot,\cdot) on the product space T𝐙NPT_{\mathbf{Z}_{N}}P as follows:

gN(u,v):=i=1Ng(𝐙(i))(ui,vi),u=(u1,,uN),v=(v1,,vN)T𝐙NP,\displaystyle g_{N}(u,v):=\sum_{i=1}^{N}g\left(\mathbf{Z}^{(i)}\right)(u_{i},v_{i}),\quad u=(u_{1},\ldots,u_{N}),v=(v_{1},\ldots,v_{N})\in T_{\mathbf{Z}_{N}}P, (3.12)

where ui,viT𝐙(i)Pu_{i},v_{i}\in T_{\mathbf{Z}^{(i)}}P for i=1,,Ni=1,\ldots,N. Furthermore, we denote by gN\|\cdot\|_{g_{N}} the corresponding norm.

Proposition 3.10.

Suppose that Assumption 3.7 holds. Then, the operator AN:KT𝐙NPA_{N}:\mathcal{H}_{K}\rightarrow T_{\mathbf{Z}_{N}}P defined in (3.9) is a bounded linear operator whose operator norm ANκC\|A_{N}\|\leq\kappa C. The adjoint operator AN:T𝐙NPKA_{N}^{*}:T_{\mathbf{Z}_{N}}P\rightarrow\mathcal{H}_{K} of ANA_{N} is a finite rank operator given by

ANW=1NgN(W,XK(𝐙N)),WT𝐙NP.\displaystyle A^{*}_{N}W=\frac{1}{\sqrt{N}}g_{N}(W,X_{K_{\cdot}}(\mathbf{Z}_{N})),\quad W\in T_{\mathbf{Z}_{N}}P.

Moreover, the operator QNQ_{N} defined by

QNh:=ANANh=1NgN(Xh(𝐙N),XK(𝐙N)),hK,\displaystyle Q_{N}h:=A^{*}_{N}A_{N}h=\frac{1}{N}g_{N}(X_{h}(\mathbf{Z}_{N}),X_{K_{\cdot}}(\mathbf{Z}_{N})),\quad h\in\mathcal{H}_{K}, (3.13)

is a positive-semidefinite compact operator.

Proof.

The explicit expressions of ANA_{N}^{\ast} and QNQ_{N} follow from a direct calculation. Under Assumption 3.7 it follows from the equation (A.4) that

ANh2=1N𝕁Nh(𝐙N)gN2C2κ2hK2,\|A_{N}h\|^{2}=\frac{1}{N}\|\mathbb{J}_{N}\nabla h(\mathbf{Z}_{N})\|_{g_{N}}^{2}\leq C^{2}\kappa^{2}\|h\|^{2}_{\mathcal{H}_{K}},

which implies that ANA_{N} is bounded and that ANκC\|A_{N}\|\leq\kappa C. The compactness of the operator QNQ_{N} follows from a proof similar to that of Proposition 3.8 for QQ. ∎

Having defined the operators A and A_{N}, we obtain an operator representation of the minimizers of the learning problems (3.4)-(3.5) and (3.7)-(3.8) that is similar to the one already stated in [Hu 24].

Proposition 3.11.

Let h^λ,N\widehat{h}_{\lambda,N} and hλh_{\lambda}^{*} be the minimizers of (3.5) and (3.7), respectively. Then, for all λ>0\lambda>0, the minimizers hλh_{\lambda}^{*} and h^λ,N\widehat{h}_{\lambda,N} are unique and they are given by

h_{\lambda}^{*}:=(Q+\lambda I)^{-1}A^{*}X_{H},\qquad\widehat{h}_{\lambda,N}:=\frac{1}{\sqrt{N}}(Q_{N}+\lambda I)^{-1}A_{N}^{*}\mathbf{X}_{\sigma^{2},N},\mbox{ with }Q_{N}=A_{N}^{*}A_{N}. (3.14)

3.4 The global solution of the learning problem

We now derive a kernel representation for the solution of the learning problem (3.4)-(3.5). More precisely, we derive a closed-form expression for the estimator of the Hamiltonian function of the data-generating process by defining a generalized differential Gram matrix on manifolds. We shall see that the symmetry and positive semidefiniteness of this generalized differential Gram matrix guarantee the uniqueness of the estimator. The main expression is stated below in Theorem 3.13; it amounts to a differential version of the Representer Theorem for Poisson manifolds, and that is how we shall refer to it. In order to derive it, we first define a generalized differential Gram matrix G_{N}:T_{\mathbf{Z}_{N}}P\rightarrow T_{\mathbf{Z}_{N}}P as

GN𝐜:=XgN(𝐜,XK(𝐙N))(𝐙N),𝐜T𝐙NP.\displaystyle G_{N}\mathbf{c}:=X_{g_{N}(\mathbf{c},X_{K_{\cdot}}(\mathbf{Z}_{N}))}(\mathbf{Z}_{N}),\quad\mathbf{c}\in T_{\mathbf{Z}_{N}}P. (3.15)

As it was studied in [Hu 24], in symplectic vector spaces endowed with the canonical symplectic form ωcan\omega_{\mathrm{can}}, the generalized differential Gram matrix GNG_{N} defined in (3.15) reduces to the usual differential Gram matrix 𝕁can1,2K(𝐙N,𝐙N)𝕁can\mathbb{J}_{\mathrm{can}}\nabla_{1,2}K(\mathbf{Z}_{N},\mathbf{Z}_{N})\mathbb{J}_{\mathrm{can}}^{\top}, where 1,2K\nabla_{1,2}K represents the matrix of partial derivatives of KK with respect to the first and second arguments.

The generalized differential Gram matrix G_{N} defined in (3.15) corresponds, roughly speaking, to the operator Q_{N} defined in (3.13) (see also [Hu 24]). In the following proposition, which is proved in the appendix, we see that the generalized differential Gram matrix G_{N} is symmetric and positive semidefinite.

Proposition 3.12.

Given a Mercer kernel K\in C_{b}^{3}(P\times P) on the d-dimensional Poisson manifold (P,\left\{\cdot,\cdot\right\}) endowed with a Riemannian metric g, the generalized differential Gram matrix G_{N} defined in (3.15) is symmetric and positive semidefinite with respect to the metric g_{N} in (3.12), that is,

gN(𝐜,GN𝐜)=gN(𝐜,GN𝐜) and gN(𝐜,GN𝐜)0 for all 𝐜,𝐜T𝐙NP.g_{N}(\mathbf{c},G_{N}\mathbf{c}^{\prime})=g_{N}(\mathbf{c}^{\prime},G_{N}\mathbf{c})~{}\text{ and }~{}g_{N}(\mathbf{c},G_{N}\mathbf{c})\geq 0~{}\text{ for all }\mathbf{c},\mathbf{c}^{\prime}\in T_{\mathbf{Z}_{N}}P.
Theorem 3.13 (Differential Representer Theorem on Poisson manifolds).

Let K\in C_{b}^{3}(P\times P) be a Mercer kernel on the Poisson manifold (P,\left\{\cdot,\cdot\right\}) endowed with a Riemannian metric g. Suppose that Assumption 3.7 holds. Then, for every \lambda>0, the estimator \widehat{h}_{\lambda,N} introduced in (3.4) as the minimizer of the regularized empirical risk functional \widehat{R}_{\lambda,N} in (3.5) can be written as

h^λ,N=gN(𝐜^,XK(𝐙N)),\widehat{h}_{\lambda,N}=g_{N}(\widehat{\bf c},X_{K_{\cdot}}(\mathbf{Z}_{N})), (3.16)

where 𝐜^T𝐙NP\widehat{\mathbf{c}}\in T_{\mathbf{Z}_{N}}P is given by

𝐜^=(GN+λNI)1𝐗σ2,N.\displaystyle\widehat{\mathbf{c}}=(G_{N}+\lambda NI)^{-1}\mathbf{X}_{\sigma^{2},N}.
Proof.

Denote by \mathcal{H}_{K}^{N} the space

\mathcal{H}_{K}^{N}:=\left\{g_{N}(X_{K_{\cdot}}(\mathbf{Z}_{N}),\mathbf{c}),\ \mathbf{c}\in T_{\mathbf{Z}_{N}}P\right\}.

By part (i) of Theorem 3.3, \mathcal{H}_{K}^{N} is a subspace of \mathcal{H}_{K}. Then, by the representation of the operator Q_{N} in the equation (3.13), we know that Q_{N}(\mathcal{H}_{K}^{N})\subseteq\mathcal{H}_{K}^{N}, that is, \mathcal{H}_{K}^{N} is an invariant subspace for the operator Q_{N}. This implies that, for any \lambda>0, (Q_{N}+\lambda I)(\mathcal{H}_{K}^{N})\subseteq\mathcal{H}_{K}^{N}. Since, by Proposition 3.10, the operator Q_{N} is positive semidefinite, the restriction (Q_{N}+\lambda I)|_{\mathcal{H}_{K}^{N}} is invertible and, since the space \mathcal{H}_{K}^{N} is finite-dimensional, it is also an invariant subspace of (Q_{N}+\lambda I)|_{\mathcal{H}_{K}^{N}}^{-1}, that is,

(Q_{N}+\lambda I)|_{\mathcal{H}_{K}^{N}}^{-1}\left(\mathcal{H}_{K}^{N}\right)\subset\mathcal{H}_{K}^{N}.

Moreover, by Proposition 3.10, the element \frac{1}{\sqrt{N}}A_{N}^{*}\mathbf{X}_{\sigma^{2},N}=\frac{1}{N}g_{N}(\mathbf{X}_{\sigma^{2},N},X_{K_{\cdot}}(\mathbf{Z}_{N})) belongs to \mathcal{H}_{K}^{N}, so the representation (3.14) implies that \widehat{h}_{\lambda,N}\in\mathcal{H}_{K}^{N}. In other words, there exists a vector \widehat{\mathbf{c}}\in T_{\mathbf{Z}_{N}}P such that

h^λ,N=gN(𝐜^,XK(𝐙N)).\widehat{h}_{\lambda,N}=g_{N}(\widehat{\mathbf{c}},X_{K_{\cdot}}(\mathbf{Z}_{N})). (3.17)

Then, applying (Q_{N}+\lambda I) to both sides of the expression (3.14) for the estimator \widehat{h}_{\lambda,N} and plugging (3.17) into the resulting identity, we obtain that

1NgN(XK(𝐙N),𝐗σ2,N)=1NgN(Xh^λ,N(𝐙N),XK(𝐙N))+λgN(𝐜^,XK(𝐙N)).\displaystyle\frac{1}{N}g_{N}\left(X_{K_{\cdot}}(\mathbf{Z}_{N}),\mathbf{X}_{\sigma^{2},N}\right)=\frac{1}{N}g_{N}\left(X_{\widehat{h}_{\lambda,N}}(\mathbf{Z}_{N}),X_{K_{\cdot}}(\mathbf{Z}_{N})\right)+\lambda g_{N}\left(\widehat{\mathbf{c}},X_{K_{\cdot}}(\mathbf{Z}_{N})\right). (3.18)

Therefore, we obtain that

GN𝐜^+λN𝐜^=𝐗σ2,N.\displaystyle G_{N}\widehat{\mathbf{c}}+\lambda N\widehat{\mathbf{c}}=\mathbf{X}_{\sigma^{2},N}.

Since the matrix G_{N}+\lambda NI is invertible, due to the positive semidefiniteness of G_{N} proved in Proposition 3.12 and the fact that \lambda>0, we can write that

𝐜^=(GN+λNI)1𝐗σ2,N.\displaystyle\widehat{\mathbf{c}}=(G_{N}+\lambda NI)^{-1}\mathbf{X}_{\sigma^{2},N}. (3.19)

This shows that the function \widehat{h}_{\lambda,N} in (3.17) with \widehat{\mathbf{c}} determined by (3.19) coincides with the minimizer (3.14) of the regularized empirical risk functional \widehat{R}_{\lambda,N} in (3.5). Since, by Proposition 3.11, this minimizer is unique, the result follows. ∎

Remark 3.14 (On the uniqueness of the estimated Hamiltonian).

Define the kernel of the operator AA as follows:

null:={hKAh=Jh=0}.\displaystyle\mathcal{H}_{\mathrm{null}}:=\{h\in\mathcal{H}_{K}\mid Ah=J\nabla h=0\}.

It can be checked that null\mathcal{H}_{\mathrm{null}} is a closed vector subspace of K\mathcal{H}_{K}. Indeed, let {fn}n\left\{f_{n}\right\}_{n\in\mathbb{N}} be a convergent sequence in null\mathcal{H}_{\mathrm{null}} such that fnfK0\left\|f_{n}-f\right\|_{{\mathcal{H}}_{K}}\rightarrow 0 as nn\rightarrow\infty, for some fKf\in{\mathcal{H}}_{K}. Then, by (A.4) in the proof of Proposition 3.8 and Assumption 3.7, it holds that

supzPXf(z)g2=supzPX(fnf)(z)g2κ2C2fnfK20,as n,\sup_{z\in P}\|X_{f}(z)\|_{g}^{2}=\sup_{z\in P}\|X_{\left(f_{n}-f\right)}(z)\|_{g}^{2}\leq\kappa^{2}C^{2}\|f_{n}-f\|_{\mathcal{H}_{K}}^{2}\rightarrow 0,\quad\mbox{as $n\rightarrow\infty$,}

which implies that Af=0Af=0 and hence null\mathcal{H}_{\mathrm{null}} is a closed subspace of K{\mathcal{H}}_{K}. Therefore, K\mathcal{H}_{K} can be decomposed as

K=nullnull,\displaystyle\mathcal{H}_{K}=\mathcal{H}_{\mathrm{null}}\oplus\mathcal{H}_{\mathrm{null}}^{\bot},

where null\mathcal{H}_{\mathrm{null}}^{\bot} denotes the orthogonal complement of null\mathcal{H}_{\mathrm{null}} with respect to the RKHS inner product ,K\left\langle\cdot,\cdot\right\rangle_{\mathcal{H}_{K}}. Using this decomposition and the expression of the kernel estimator (3.16), we conclude that

h^λ,Nnull,\displaystyle\widehat{h}_{\lambda,N}\in\mathcal{H}_{\mathrm{null}}^{\bot}, (3.20)

which is due to the fact that for any hnullh\in\mathcal{H}_{\mathrm{null}},

h^λ,N,hK\displaystyle\langle\widehat{h}_{\lambda,N},h\rangle_{\mathcal{H}_{K}} =gN(XK(𝐙N),𝐜^),hK=D(1,0)K(𝐙N,)(𝕁N𝐜^),hK\displaystyle=\langle g_{N}(X_{K_{\cdot}}(\mathbf{Z}_{N}),\widehat{\mathbf{c}}),h\rangle_{\mathcal{H}_{K}}=\langle D^{(1,0)}K(\mathbf{Z}_{N},\cdot)\cdot(-\mathbb{J}_{N}\widehat{\mathbf{c}}),h\rangle_{\mathcal{H}_{K}}
=Dh(𝐙N)(𝕁N𝐜^)=gN(Xh(𝐙N),𝐜^)=0,\displaystyle=Dh(\mathbf{Z}_{N})\cdot(-\mathbb{J}_{N}\widehat{\mathbf{c}})=g_{N}(X_{h}(\mathbf{Z}_{N}),\widehat{\mathbf{c}})=0,

where the second and fourth equalities follow by Proposition 2.18, and the third equality is due to the differential reproducing property (3.2) in Theorem 3.3.

In general, the space \mathcal{H}_{\mathrm{null}} could contain Casimir functions in \mathcal{C}(P) and this could, in principle, cause ill-posedness of the learning problem since adding Casimir functions in \mathcal{H}_{\mathrm{null}} to \widehat{h}_{\lambda,N} would not change the corresponding Hamiltonian vector field. Therefore, one may wonder why, according to Theorem 3.13, the estimator \widehat{h}_{\lambda,N} is unique, and not merely unique up to Casimir functions. The explanation lies in the regularization term. Indeed, let \widehat{h}_{\lambda,N} be the minimizer in (3.16) and let h\in\mathcal{H}_{\mathrm{null}}. Even though \widehat{h}_{\lambda,N} and \widehat{h}_{\lambda,N}+h have the same associated Hamiltonian vector field, it is easy to show that \widehat{h}_{\lambda,N}+h is a minimizer of (3.4) if and only if h\equiv 0. This is because of (3.20) and

\widehat{R}_{\lambda,N}(\widehat{h}_{\lambda,N}+h):=\frac{1}{N}\sum_{n=1}^{N}\|X_{\widehat{h}_{\lambda,N}+h}(\mathbf{Z}^{(n)})-\mathbf{X}^{(n)}_{\sigma^{2}}\|_{g}^{2}+\lambda\|\widehat{h}_{\lambda,N}+h\|_{\mathcal{H}_{K}}^{2}
=\frac{1}{N}\sum_{n=1}^{N}\|X_{\widehat{h}_{\lambda,N}}(\mathbf{Z}^{(n)})-\mathbf{X}^{(n)}_{\sigma^{2}}\|_{g}^{2}+\lambda\left(\|\widehat{h}_{\lambda,N}\|_{\mathcal{H}_{K}}^{2}+\|h\|^{2}_{\mathcal{H}_{K}}+2\langle\widehat{h}_{\lambda,N},h\rangle_{\mathcal{H}_{K}}\right)
=\frac{1}{N}\sum_{n=1}^{N}\|X_{\widehat{h}_{\lambda,N}}(\mathbf{Z}^{(n)})-\mathbf{X}^{(n)}_{\sigma^{2}}\|_{g}^{2}+\lambda\left(\|\widehat{h}_{\lambda,N}\|_{\mathcal{H}_{K}}^{2}+\|h\|^{2}_{\mathcal{H}_{K}}\right).

Local coordinate representation of the kernel estimator.  In applications, it is important to have local coordinate expressions for the kernel estimator h^λ,N\widehat{h}_{\lambda,N}. Recall that by Theorem 3.13, the kernel estimator is given by

h^λ,N=gN(𝐜^,XK(𝐙N))=gN(𝐜^,𝕁NK(𝐙N,)),\widehat{h}_{\lambda,N}=g_{N}(\widehat{\bf c},X_{K_{\cdot}}(\mathbf{Z}_{N}))=g_{N}(\widehat{\bf c},\mathbb{J}_{N}\nabla K(\mathbf{Z}_{N},\cdot)),

where 𝐜^T𝐙NP\widehat{\mathbf{c}}\in T_{\mathbf{Z}_{N}}P is given by

𝐜^=(GN+λNI)1𝐗σ2,N.\displaystyle\widehat{\mathbf{c}}=(G_{N}+\lambda NI)^{-1}\mathbf{X}_{\sigma^{2},N}.

Since JJ can be locally represented as (2.17), it suffices to write down the Gram matrix GNG_{N} locally. Let 𝐳(1),,𝐳(N)\mathbf{z}^{(1)},\ldots,\mathbf{z}^{(N)} be the available data points and let {zn1,,znd}\{z_{n}^{1},\ldots,z_{n}^{d}\}, n=1,,Nn=1,\ldots,N, be local coordinates on open sets containing the points 𝐳(n)\mathbf{z}^{(n)}, for all n=1,,Nn=1,\ldots,N. Let g(n)g^{(n)} be the matrix associated to the Riemannian metric gg in the local coordinates {zn1,,znd}\{z_{n}^{1},\ldots,z_{n}^{d}\}. We first compute the following function for each 𝐜T𝐳NP\mathbf{c}\in T_{\mathbf{z}_{N}}P,

gN(𝐜,XK(𝐳N))=gN(𝐜,𝕁NK(𝐳N,))=i=1N(1K)(𝐳(i),)g(i)1J(𝐳(i))g(i)𝐜i.\displaystyle g_{N}(\mathbf{c},X_{K_{\cdot}}(\mathbf{z}_{N}))=g_{N}(\mathbf{c},\mathbb{J}_{N}\nabla K(\mathbf{z}_{N},\cdot))=\sum_{i=1}^{N}(\partial_{1}K)^{\top}(\mathbf{z}^{(i)},\cdot){g^{(i)}}^{-1}J^{\top}(\mathbf{z}^{(i)})g^{(i)}\mathbf{c}^{i}.

By the definition of the Gram matrix in (3.15) and combining it with the local representation of JJ in (2.17), we obtain that

(G_{N}\mathbf{c})^{i}=X_{g_{N}(\mathbf{c},X_{K_{\cdot}}(\mathbf{z}_{N}))}(\mathbf{z}^{(i)})=\sum_{j=1}^{N}J(\mathbf{z}^{(i)}){g^{(i)}}^{-1}\partial_{1,2}K(\mathbf{z}^{(i)},\mathbf{z}^{(j)}){g^{(j)}}^{-1}J^{\top}(\mathbf{z}^{(j)})g^{(j)}\mathbf{c}^{j}
=\sum_{j=1}^{N}(-B(\mathbf{z}^{(i)})g^{(i)}){g^{(i)}}^{-1}\partial_{1,2}K(\mathbf{z}^{(i)},\mathbf{z}^{(j)}){g^{(j)}}^{-1}(-{g^{(j)}}^{\top}B^{\top}(\mathbf{z}^{(j)}))g^{(j)}\mathbf{c}^{j}
=\sum_{j=1}^{N}B(\mathbf{z}^{(i)})\partial_{1,2}K(\mathbf{z}^{(i)},\mathbf{z}^{(j)})B^{\top}(\mathbf{z}^{(j)})g^{(j)}\mathbf{c}^{j},

which shows that the (i,j)(i,j)-component matrix GNi,jG_{N}^{i,j} of the general Gram matrix is given by

GNi,j=B(𝐳(i))1,2K(𝐳(i),𝐳(j))B(𝐳(j))g(j).\displaystyle G_{N}^{i,j}=B(\mathbf{z}^{(i)})\partial_{1,2}K(\mathbf{z}^{(i)},\mathbf{z}^{(j)})B^{\top}(\mathbf{z}^{(j)})g^{(j)}. (3.21)
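The blocks in (3.21) can be assembled numerically and the linear system (G_{N}+\lambda NI)\widehat{\mathbf{c}}=\mathbf{X}_{\sigma^{2},N} solved directly. The following minimal Python sketch illustrates this under the simplifying assumptions of a single global chart and a Gaussian kernel K(x,y)=\exp(-\|x-y\|^{2}/(2\eta^{2})), for which \partial_{1,2}K(x,y)=K(x,y)\left(I/\eta^{2}-(x-y)(x-y)^{\top}/\eta^{4}\right); the callables J_of_z and g_of_z (returning the matrices of J and g in the chosen coordinates) and all function names are illustrative, and the blocks are written in the equivalent form J(z_{i})g_{i}^{-1}\partial_{1,2}K(z_{i},z_{j})g_{j}^{-1}J(z_{j})^{\top}g_{j} obtained in the computation above.

```python
import numpy as np

def gauss_d12(x, y, eta):
    # Mixed derivative matrix of the Gaussian kernel K(x, y) = exp(-||x - y||^2 / (2 eta^2)):
    # d_{1,2}K(x, y) = K(x, y) * ( I / eta^2 - (x - y)(x - y)^T / eta^4 )  (symmetric in x and y).
    r = x - y
    k = np.exp(-np.dot(r, r) / (2.0 * eta ** 2))
    return k * (np.eye(x.size) / eta ** 2 - np.outer(r, r) / eta ** 4)

def gram_matrix(Z, J_of_z, g_of_z, eta):
    # Blockwise assembly following the local computation above:
    # block (i, j) = J(z_i) g_i^{-1} d_{1,2}K(z_i, z_j) g_j^{-1} J(z_j)^T g_j.
    N, d = Z.shape
    Js = [J_of_z(z) for z in Z]
    gs = [g_of_z(z) for z in Z]
    ginvs = [np.linalg.inv(g) for g in gs]
    G = np.zeros((N * d, N * d))
    for i in range(N):
        for j in range(N):
            block = Js[i] @ ginvs[i] @ gauss_d12(Z[i], Z[j], eta) @ ginvs[j] @ Js[j].T @ gs[j]
            G[i * d:(i + 1) * d, j * d:(j + 1) * d] = block
    return G

def representer_coefficients(Z, X_obs, J_of_z, g_of_z, eta, lam):
    # Solve (G_N + lambda N I) c_hat = X_{sigma^2, N} for the coefficients of Theorem 3.13.
    N, d = Z.shape
    G = gram_matrix(Z, J_of_z, g_of_z, eta)
    return np.linalg.solve(G + lam * N * np.eye(N * d), X_obs.reshape(-1)).reshape(N, d)
```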

3.5 Symmetries and momentum conservation by kernel estimated Hamiltonians

Studies have been carried out to bridge invariant feature learning with kernel methods [Mrou 15]. In this subsection, we will see that if the unknown Hamiltonian function is invariant under a canonical group action, then the structure-preserving kernel estimator can also be made invariant under the same action by imposing certain symmetry constraints on the kernel KK. As a result, the fibers of any momentum map associated with that group action that are preserved by the flow of the ground-truth Hamiltonian HH will also be preserved by the flow of the estimator h^λ,N\widehat{h}_{\lambda,N}.

Definition 3.15.

Let G be a group acting on the manifold P. A function h:P\rightarrow\mathbb{R} on the Poisson manifold P is called G-invariant if

h(g\cdot x)=h(x),\quad\text{ for all }g\in G\text{ and all }x\in P.

A kernel K:P×PK:P\times P\rightarrow\mathbb{R} is called argumentwise invariant by the group action if

K(gx,gx)=K(x,x), for all g,gG and all x,xP.\displaystyle K(g\cdot x,g^{\prime}\cdot x^{\prime})=K(x,x^{\prime}),\quad\text{ for all $g,g^{\prime}\in G$ and all $x,x^{\prime}\in P$.}

Argumentwise invariant kernels can be constructed by double averaging over the group action (using the Haar measure if the group is compact) or, alternatively, out of an arbitrary Mercer kernel K and a G-invariant map f: the function K^{\prime} defined as K^{\prime}(x,y):=K(f(x),f(y)) is an argumentwise invariant kernel.
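As an illustration of these two constructions, the following Python sketch builds argumentwise invariant kernels for the action of a finite rotation group on \mathbb{R}^{2}, once by double averaging and once by composing a Gaussian kernel with the rotation-invariant feature f(x)=\|x\|^{2}; the group, the base kernel, and all function names are illustrative choices.

```python
import numpy as np

def gauss(x, y, eta=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * eta ** 2))

def rotations(m):
    # Finite cyclic group C_m acting on R^2 by rotations (an illustrative compact-group surrogate).
    thetas = 2.0 * np.pi * np.arange(m) / m
    return [np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]]) for t in thetas]

def k_double_average(x, y, group, base=gauss):
    # Argumentwise invariant kernel obtained by double averaging over the group.
    return np.mean([[base(g @ x, gp @ y) for gp in group] for g in group])

def k_invariant_feature(x, y, base=gauss):
    # Argumentwise invariant kernel K'(x, y) = K(f(x), f(y)) with the
    # rotation-invariant feature f(x) = ||x||^2.
    f = lambda z: np.array([np.dot(z, z)])
    return base(f(x), f(y))

# Both constructions are insensitive to independent rotations of the two arguments:
G = rotations(8)
x, y = np.array([1.0, 0.2]), np.array([-0.3, 0.7])
assert np.isclose(k_double_average(G[3] @ x, G[5] @ y, G), k_double_average(x, y, G))
assert np.isclose(k_invariant_feature(G[3] @ x, G[5] @ y), k_invariant_feature(x, y))
```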

Proposition 3.16 (Noether’s Theorem for the structure-preserving kernel estimator).

Let GG be a Lie group acting canonically on the Poisson manifold (P,{,})(P,\left\{\cdot,\cdot\right\}). If the Mercer kernel KCb3(P×P)K\in C_{b}^{3}(P\times P) is argumentwise invariant under the group GG then the kernel estimator h^λ,N\widehat{h}_{\lambda,N} in Theorem 3.13 is GG-invariant. If additionally, the group action has a momentum map 𝐉:P𝔤\mathbf{J}:P\longrightarrow\mathfrak{g}^{\ast} associated, then its fibers are preserved by the flow FtF_{t} of the corresponding Hamiltonian vector field Xh^λ,NX_{\widehat{h}_{\lambda,N}}, that is,

𝐉Ft=𝐉.\mathbf{J}\circ F_{t}=\mathbf{J}.
Proof.

By [Gins 12, Property 3.13], if the kernel is argumentwise invariant under the G-action, then the corresponding RKHS \mathcal{H}_{K} consists exclusively of G-invariant functions. This implies that the kernel estimator is G-invariant since \widehat{h}_{\lambda,N}\in\mathcal{H}_{\mathrm{null}}^{\bot}\subset\mathcal{H}_{K}. The statement on momentum conservation is a straightforward consequence of Noether's Theorem (see, for instance, [Mars 99, Theorem 11.4.1]). ∎

4 Estimation and approximation error analysis

We now analyze the extent to which the structure-preserving kernel estimator h^λ,N\widehat{h}_{\lambda,N} can recover the unknown Hamiltonian HH on a Poisson manifold PP. A standard approach is to decompose the reconstruction error h^λ,NH\widehat{h}_{\lambda,N}-H into the sum of what we shall call estimation and approximation errors, and further decompose the estimation error h^λ,Nhλ\widehat{h}_{\lambda,N}-h_{\lambda}^{*} into what we shall call sampling error and noisy sampling error. More specifically,

h^λ,NH\displaystyle\widehat{h}_{\lambda,N}-H =h^λ,NhλEstimation error+hλHApproximation error,\displaystyle=\underbrace{\widehat{h}_{\lambda,N}-h_{\lambda}^{*}}_{\text{Estimation error}}\quad+\underbrace{h_{\lambda}^{*}-H}_{\text{Approximation error}}, (4.1)
=h~λ,NhλSampling error+h^λ,Nh~λ,NNoisy sampling error+hλHApproximation error,\displaystyle=\underbrace{\widetilde{h}_{\lambda,N}-h_{\lambda}^{*}}_{\text{Sampling error}}\quad+\quad\underbrace{\widehat{h}_{\lambda,N}-\widetilde{h}_{\lambda,N}}_{\text{Noisy sampling error}}\quad+\quad\underbrace{h_{\lambda}^{*}-H}_{\text{Approximation error}},

where h^{*}_{\lambda}\in\mathcal{H}_{K} is the best-in-class function introduced in (3.8) that minimizes the regularized statistical risk (3.7) and \widetilde{h}_{\lambda,N}:=(Q_{N}+\lambda I)^{-1}Q_{N}H is the noise-free part of the estimator \widehat{h}_{\lambda,N}. Indeed, the noisy sampling error can be computed as

\widehat{h}_{\lambda,N}-\widetilde{h}_{\lambda,N}=\frac{1}{\sqrt{N}}(Q_{N}+\lambda I)^{-1}A_{N}^{*}\mathbf{X}_{\sigma^{2},N}-(Q_{N}+\lambda I)^{-1}Q_{N}H=\frac{1}{\sqrt{N}}(Q_{N}+\lambda I)^{-1}A_{N}^{*}\mathbf{E}_{N},

where the noise vector 𝐄N\mathbf{E}_{N} is

𝐄N=Vec(𝜺(1)||𝜺(N))T𝐙NP,\mathbf{E}_{N}=\mathrm{Vec}\left(\bm{\varepsilon}^{(1)}|\cdots|\bm{\varepsilon}^{(N)}\right)\in T_{\mathbf{Z}_{N}}P,

which, by hypothesis, follows a multivariate distribution with zero mean and variance σ2IdN\sigma^{2}I_{dN}.

Following the approach introduced in [Feng 23, Hu 24], we shall separately analyze these three errors. The estimation of the total error relies on both the operator representations in Section 3.3 and the differential kernel representations in Section 3.4 of the estimator. Since the operator representations of the estimator are similar to those in the Euclidean case, the approximation error and sampling error can be analyzed following an approach very close to the one in [Hu 24]. Therefore, to obtain the total error in (4.1), it is sufficient to analyze the noisy sampling error. As in the previous sections, we assume that KCb3(P×P)K\in C_{b}^{3}(P\times P) is a Mercer kernel. The proofs of the following results can be found in the appendix.

Lemma 4.1.

Let KCb3(P×P)K\in C_{b}^{3}(P\times P) be a Mercer kernel. For any function hKh\in\mathcal{H}_{K} and 0<δ<10<\delta<1, with probability at least 1δ1-\delta, there holds

QNhQhK(8log(2/δ)N+1)log(2/δ)NC2κ2hK.\displaystyle\|Q_{N}h-Qh\|_{\mathcal{H}_{K}}\leq\left(\sqrt{\frac{8\log(2/\delta)}{N}}+1\right)\sqrt{\frac{\log(2/\delta)}{N}}C^{2}\kappa^{2}\|h\|_{\mathcal{H}_{K}}.
Lemma 4.2 (Error bounds of noisy sampling error).

For any δ>0\delta>0, with a probability of at least 1δ/21-\delta/2, it holds that

h~λ,Nh^λ,NKσκλ1N(1+1clog(4/δ)),\displaystyle\left\|\widetilde{h}_{\lambda,N}-\widehat{h}_{\lambda,N}\right\|_{\mathcal{H}_{K}}\leq\frac{\sigma\kappa}{\lambda}\sqrt{\frac{1}{N}}\left(1+\sqrt{\frac{1}{c}\log(4/\delta)}\right),

where the constant cc is related to the Hanson-Wright inequality [Rude 13] (see the proof in Appendix A.5).

Assumption 4.3 (Source condition).

Assume that the unknown Hamiltonian function in the Poisson system (2.6) lies in the so-called source space, that is,

HΩSγ:={hKh=Qγψ,ψK,ψK<S},\displaystyle H\in\Omega_{S}^{\gamma}:=\{h\in\mathcal{H}_{K}\mid h=Q^{\gamma}\psi,\psi\in\mathcal{H}_{K},\|\psi\|_{\mathcal{H}_{K}}<S\}, (4.2)

for some γ(0,1)\gamma\in(0,1) and S>0S>0.

Proposition 4.4.

If a Hamiltonian function HH satisfies the source condition (4.2), then HnullH\in\mathcal{H}_{\mathrm{null}}^{\bot}. In other words, ΩSγnull\Omega_{S}^{\gamma}\subset\mathcal{H}_{\mathrm{null}}^{\bot}.

Proof.

Recall first that, by Proposition 3.8, the operator Q=A^{*}A is a positive compact operator. Let Q=\sum_{n=1}^{L}\lambda_{n}\langle\cdot,e_{n}\rangle e_{n} (possibly L=\infty) be the spectral decomposition of Q with 0<\lambda_{n+1}\leq\lambda_{n} and \{e_{n}\}_{n=1}^{L} an orthonormal family in \mathcal{H}_{K}. Hence, for any \gamma\in(0,1), we have Q^{\gamma}=\sum_{n=1}^{L}\lambda_{n}^{\gamma}\langle\cdot,e_{n}\rangle_{\mathcal{H}_{K}}e_{n}, which implies that Q^{\gamma} is self-adjoint. Notice that, by the representation of the operator Q given in (3.11), for an arbitrary function h\in\mathcal{H}_{\mathrm{null}} (if \mathcal{H}_{\mathrm{null}}\neq\{0\}), we have that Qh=0, which implies that \lambda_{n}\langle h,e_{n}\rangle_{\mathcal{H}_{K}}=0 for all n=1,\ldots,L. Hence, for any \gamma\in(0,1), we have \lambda_{n}^{\gamma}\langle h,e_{n}\rangle_{\mathcal{H}_{K}}=0 for all n=1,\ldots,L, and we obtain that Q^{\gamma}h=0. Finally, for arbitrary \psi\in\mathcal{H}_{K}, we compute the inner product \langle Q^{\gamma}\psi,h\rangle_{\mathcal{H}_{K}}=\langle\psi,Q^{\gamma}h\rangle_{\mathcal{H}_{K}}=0, which yields that Q^{\gamma}\psi\in\mathcal{H}_{\mathrm{null}}^{\bot} for any \psi\in\mathcal{H}_{K}. Therefore, we conclude that \Omega_{S}^{\gamma}\subset\mathcal{H}_{\mathrm{null}}^{\bot}. ∎

Proposition 4.5.

Let Q=\sum_{n=1}^{L}\lambda_{n}\langle\cdot,e_{n}\rangle e_{n} (possibly L=\infty) be a spectral decomposition of Q, where \{e_{n}\}_{n=1}^{L} is an orthonormal basis of \mathcal{H}_{\mathrm{null}}^{\bot} and 0<\lambda_{n+1}\leq\lambda_{n} for all n=1,\ldots,L-1. Then, \Omega_{S}^{\gamma} has the following characterization

ΩSγ=Sγ:={hnulln=1L1λn2γ|h,enK|2<S2}.\displaystyle\Omega_{S}^{\gamma}=\mathcal{H}_{S}^{\gamma}:=\left\{h\in\mathcal{H}_{\mathrm{null}}^{\bot}\mid\sum_{n=1}^{L}\frac{1}{\lambda_{n}^{2\gamma}}|\langle h,e_{n}\rangle_{\mathcal{H}_{K}}|^{2}<S^{2}\right\}.
Proof.

First, we show that SγΩSγ\mathcal{H}_{S}^{\gamma}\subset\Omega_{S}^{\gamma}. For an arbitrary hSγh\in\mathcal{H}_{S}^{\gamma}, we can define a function ψspan{en,n+}\psi\in\operatorname{span}\{e_{n},n\in\mathbb{N}_{+}\} such that ψ,enK=1λnγh,enK\langle\psi,e_{n}\rangle_{\mathcal{H}_{K}}=\frac{1}{\lambda_{n}^{\gamma}}\langle h,e_{n}\rangle_{\mathcal{H}_{K}} for all n+n\in\mathbb{N}^{+}. One computes

ψK2=n+1λn2γ|h,enK|2<S2.\displaystyle\|\psi\|_{\mathcal{H}_{K}}^{2}=\sum_{n\in\mathbb{N}^{+}}\frac{1}{\lambda_{n}^{2\gamma}}|\langle h,e_{n}\rangle_{\mathcal{H}_{K}}|^{2}<S^{2}.

Furthermore, since hnull=span{en,n+}h\in\mathcal{H}_{\mathrm{null}}^{\bot}=\operatorname{span}\{e_{n},n\in\mathbb{N}_{+}\}, we have that

Q^{\gamma}\psi=\sum_{n\in\mathbb{N}^{+}}\lambda_{n}^{\gamma}\langle\psi,e_{n}\rangle_{\mathcal{H}_{K}}e_{n}=\sum_{n\in\mathbb{N}^{+}}\lambda_{n}^{\gamma}\frac{1}{\lambda_{n}^{\gamma}}\langle h,e_{n}\rangle_{\mathcal{H}_{K}}e_{n}=h,

which yields that h\in\Omega_{S}^{\gamma}. Now, we show that \Omega_{S}^{\gamma}\subset\mathcal{H}_{S}^{\gamma}. For an arbitrary h\in\Omega_{S}^{\gamma}, there exists a function \psi\in\mathcal{H}_{K} with norm \|\psi\|_{\mathcal{H}_{K}}<S such that Q^{\gamma}\psi=h. Let \psi^{\prime} be the projection of \psi onto \mathcal{H}_{\mathrm{null}}^{\bot}; then h=Q^{\gamma}\psi^{\prime}. Hence, we obtain that

n+λnγψ,enKen=n+h,enKen,\displaystyle\sum_{n\in\mathbb{N}^{+}}\lambda_{n}^{\gamma}\langle\psi^{\prime},e_{n}\rangle_{\mathcal{H}_{K}}e_{n}=\sum_{n\in\mathbb{N}^{+}}\langle h,e_{n}\rangle_{\mathcal{H}_{K}}e_{n},

which implies that ψ,enK=1λnγh,enK\langle\psi^{\prime},e_{n}\rangle_{\mathcal{H}_{K}}=\frac{1}{\lambda_{n}^{\gamma}}\langle h,e_{n}\rangle_{\mathcal{H}_{K}} for all n+n\in\mathbb{N}^{+} since λn>0\lambda_{n}>0. Therefore,

ψK2=n+1λn2γ|h,enK|2<S2.\displaystyle\|\psi^{\prime}\|_{\mathcal{H}_{K}}^{2}=\sum_{n\in\mathbb{N}^{+}}\frac{1}{\lambda_{n}^{2\gamma}}|\langle h,e_{n}\rangle_{\mathcal{H}_{K}}|^{2}<S^{2}.

Combining this with Proposition 4.4, we have that h\in\mathcal{H}_{S}^{\gamma}. Therefore, we can conclude that \Omega_{S}^{\gamma}=\mathcal{H}_{S}^{\gamma}. ∎

In Section 3.4, we showed that the kernel estimator h^λ,Nnull\widehat{h}_{\lambda,N}\in\mathcal{H}_{\mathrm{null}}^{\bot}. In this section, we will prove that for an arbitrary Hamiltonian function HH in the source space ΩSγ\Omega_{S}^{\gamma}, the kernel estimator h^λ,N\widehat{h}_{\lambda,N} converges to the Hamiltonian HH as NN tends to infinity.

Notice that the bound in Lemma 4.2 is very similar to the bound on the noisy sampling error in [Hu 24]. Hence, combining Lemma 4.2 with the approximation and sampling error bounds obtained in [Hu 24], we immediately formulate probably approximately correct (PAC) bounds for the reconstruction error in which the regularization constant \lambda remains fixed and the size of the estimation sample is allowed to vary independently from it.

Theorem 4.6 (PAC bounds of the total reconstruction error).

Let h^λ,N\widehat{h}_{\lambda,N} be the unique minimizer of the optimization problem (3.4). Suppose that Assumption 4.3 holds, that is, HΩSγH\in\Omega_{S}^{\gamma} as defined in (4.2). Then, for any ε,δ>0\varepsilon,\delta>0, there exist λ>0\lambda>0 and n+n\in\mathbb{N}_{+} such that for all N>nN>n, it holds that

(h^λ,NHK>ε)<δ.\displaystyle\mathbb{P}\left(\left\|\widehat{h}_{\lambda,N}-H\right\|_{\mathcal{H}_{K}}>\varepsilon\right)<\delta.

Notice that by Theorem 4.6, we can make the total reconstruction error arbitrarily small with probability arbitrarily close to one using a fixed regularization parameter. However, this theorem provides no convergence rates. Similarly to the approach followed in [Hu 24], we now consider an adaptive regularization parameter in order to obtain convergence rates. To this end, we shall assume that

λNα,α>0,\displaystyle\lambda\propto N^{-\alpha},\quad\alpha>0, (4.3)

where the symbol \propto means that \lambda has the order N^{-\alpha} as N\to\infty. Then, combining Lemma 4.2 with [Hu 24, Theorem 4.7], we obtain convergence rates for the total reconstruction error when \alpha\in(0,\frac{1}{3}). If we further suppose that the coercivity condition in the following definition holds, we can then enlarge the convergence interval to \alpha\in(0,\frac{1}{2}).

Definition 4.7 (Coercivity condition).

In the same setup as in Theorem 3.13, we say that the coercivity condition is satisfied if there exists a constant c_{\mathcal{H}_{K}}>0 such that, for all h\in\mathcal{H}_{K},

AhL2(μ𝐙)2=XhL2(μ𝐙)2cKhK2.\displaystyle\|Ah\|^{2}_{L^{2}(\mu_{\mathbf{Z}})}=\|X_{h}\|^{2}_{L^{2}(\mu_{\mathbf{Z}})}\geq c_{\mathcal{H}_{K}}\|h\|^{2}_{\mathcal{H}_{K}}. (4.4)

The largest constant cKc_{\mathcal{H}_{K}} that satisfies (4.4) is called the coercivity constant.

Theorem 4.8 (Convergence upper rates for the total reconstruction error).

Let \widehat{h}_{\lambda,N} be the unique minimizer of the optimization problem (3.4). Suppose that \lambda satisfies (4.3) and that H satisfies the source condition (4.2), that is, H\in\Omega_{S}^{\gamma}. Then, for all \alpha\in(0,\frac{1}{3}) and for any 0<\delta<1, with probability at least 1-\delta, it holds that

h^λ,NHKC(γ,δ,κ)Nmin{αγ,12(13α)},\displaystyle\left\|\widehat{h}_{\lambda,N}-H\right\|_{\mathcal{H}_{K}}\leq C(\gamma,\delta,\kappa)~{}N^{-\min\{\alpha\gamma,\frac{1}{2}(1-3\alpha)\}},

where

C(\gamma,\delta,\kappa)=\max\left\{\|Q^{-\gamma}H\|_{\mathcal{H}_{K}},8\sqrt{4\log(8/\delta)}d^{\frac{3}{2}}\kappa^{3}\|H\|_{\mathcal{H}_{K}}\right\}.

Moreover, if the coercivity condition (4.4) holds, then for all \alpha\in(0,\frac{1}{2}) and for any 0<\delta<1, with probability at least 1-\delta, it holds that

h^λ,NHKC(γ,δ,σ,κ,c,cK)Nmin{αγ,12(12α)},\displaystyle\|\widehat{h}_{\lambda,N}-H\|_{\mathcal{H}_{K}}\leq C(\gamma,\delta,\sigma,\kappa,c,c_{\mathcal{H}_{K}})~{}N^{-\min\{\alpha\gamma,\frac{1}{2}(1-2\alpha)\}},

where

C(\gamma,\delta,\sigma,\kappa,c,c_{\mathcal{H}_{K}})=\max\left\{\|Q^{-\gamma}H\|_{\mathcal{H}_{K}},\sigma\kappa\sqrt{d}\left(1+\sqrt{\frac{1}{c}\log(4/\delta)}\right),4\sqrt{2\log(8/\delta)}d\kappa^{2}\left(2+\kappa\sqrt{\frac{d}{c_{\mathcal{H}_{K}}}}\right)\|H\|_{\mathcal{H}_{K}}\right\}.

Theorem 4.8 guarantees that the structure-preserving kernel estimator provides a function that is close to the data-generating Hamiltonian function with respect to the RKHS norm. As a consequence, we now prove that the flow of the learned Hamiltonian system will uniformly approximate that of the data-generating one.

In the following lemma and proposition, we take the Sasaki metric on the tangent bundle TP induced by the Riemannian metric g on the Poisson manifold P. To define the Sasaki metric on TP, let (x,v)\in TP and let V,W be tangent vectors to TP at (x,v). Choose curves in TP

\alpha:t\mapsto(x(t),v(t)),\quad\beta:s\mapsto(y(s),w(s)),

with x(0)=y(0)=x, v(0)=w(0)=v and V=\alpha^{\prime}(0), W=\beta^{\prime}(0). Then the Sasaki metric is defined as

g_{1}(x,v)(V,W)=g(x)(d\pi(V),d\pi(W))+g(x)\left(\frac{D}{\mathrm{d}t}\bigg{|}_{t=0}v(t),\frac{D}{\mathrm{d}s}\bigg{|}_{s=0}w(s)\right), (4.5)

where \pi:TP\to P is the canonical projection and \frac{D}{\mathrm{d}t} stands for the covariant derivative along the corresponding curve. As stated in Remark 2.14, we shall assume that this Sasaki metric (4.5) is complete. Then we obtain the following lemma.

Lemma 4.9.

Let hCb2(P)h\in C_{b}^{2}(P). Then for any fC1(P)f\in C^{1}(P) and z1,z2Pz_{1},z_{2}\in P, we have

Xh[f](z1)Xh[f](z2)hCb2d(z1,z2),\displaystyle X_{h}[f](z_{1})-X_{h}[f](z_{2})\leq\|h\|_{C_{b}^{2}}d(z_{1},z_{2}),

where dd is the geodesic distance on the Riemannian manifold (P,g)(P,g).

Proof.

Since hCb2(P)h\in C_{b}^{2}(P), we have that DhCb1(TP)Dh\in C_{b}^{1}(TP). Then by the Fundamental Theorem of Calculus (2.10), we obtain that

Dh(y1)Dh(y2)=01D2h(γ(t))γ(t)dtD2h01γ(t)g1dt,\displaystyle Dh(y_{1})-Dh(y_{2})=\int_{0}^{1}D^{2}h(\gamma(t))\cdot\gamma^{\prime}(t)\mathrm{d}t\leq\|D^{2}h\|_{\infty}\int_{0}^{1}\|\gamma^{\prime}(t)\|_{g_{1}}\mathrm{d}t,

where γ:[0,1]TP\gamma:[0,1]\to TP is a differentiable curve connecting γ(0)=y1\gamma(0)=y_{1} and γ(1)=y2\gamma(1)=y_{2} in the Riemannian manifold (TP,g1)(TP,g_{1}) with Sasaki metric g1g_{1} induced by the Riemannian metric gg on PP. Denote γ(t)=(z(t),v(t))\gamma(t)=(z(t),v(t)) with z(t)Pz(t)\in P and v(t)Tz(t)Pv(t)\in T_{z(t)}P for all t[0,1]t\in[0,1]. For each t[0,1]t\in[0,1], let αt:[0,1]TP\alpha_{t}:[0,1]\to TP be a smooth curve such that αt(0)=(z(t),v(t))\alpha_{t}(0)=(z(t),v(t)) and αt(0)=(z(t),v(t))\alpha^{\prime}_{t}(0)=(z^{\prime}(t),v^{\prime}(t)). Denote αt=(at,bt)\alpha_{t}=(a_{t},b_{t}) with at(s)Pa_{t}(s)\in P and bt(s)Tat(s)Pb_{t}(s)\in T_{a_{t}(s)}P for all s[0,1]s\in[0,1]. Notice that

Dds|s=0bt(s)=a˙t(0)bt(0)=z(t)v(t).\displaystyle\frac{D}{\mathrm{d}s}\bigg{|}_{s=0}b_{t}(s)=\nabla_{\dot{a}_{t}(0)}b_{t}(0)=\nabla_{z^{\prime}(t)}v(t).

Then, by the definition of the Sasaki metric g1g_{1} in (4.5), we obtain

γ(t)g1=a˙t(0)g+Dds|s=0bt(s)g=z(t)g+z(t)v(t)g.\displaystyle\|\gamma^{\prime}(t)\|_{g_{1}}=\|\dot{a}_{t}(0)\|_{g}+\left\|\frac{D}{\mathrm{d}s}\bigg{|}_{s=0}b_{t}(s)\right\|_{g}=\|z^{\prime}(t)\|_{g}+\left\|\nabla_{z^{\prime}(t)}v(t)\right\|_{g}.

Let y_{1}=(z_{1},X_{f}(z_{1})) and y_{2}=(z_{2},X_{f}(z_{2})) for z_{1},z_{2}\in P. Choose \{z(t),t\in[0,1]\} to be the geodesic curve in P connecting z_{1} and z_{2}. Let \{v(t),t\in[0,1]\} be a vector field which is parallel along the curve \{z(t):t\in[0,1]\}, i.e., \nabla_{z^{\prime}(t)}v(t)=0 for all t\in[0,1]. Then we obtain that \|\gamma^{\prime}(t)\|_{g_{1}}=\|z^{\prime}(t)\|_{g}. Therefore,

Xh[f](z1)Xh[f](z2)=Dh(y1)Dh(y2)D2h01z(t)gdthCb2d(z1,z2),\displaystyle X_{h}[f](z_{1})-X_{h}[f](z_{2})=Dh(y_{1})-Dh(y_{2})\leq\|D^{2}h\|_{\infty}\int_{0}^{1}\|z^{\prime}(t)\|_{g}\mathrm{d}t\leq\|h\|_{C_{b}^{2}}d(z_{1},z_{2}),

where dd is the geodesic distance on the Riemannian manifold (P,g)(P,g). The result follows. ∎

Proposition 4.10 (From discrete data to continuous-time flows).

Let \widehat{H}=\widehat{h}_{\lambda,N} be the structure-preserving kernel estimator of H using a kernel K\in C_{b}^{5}(P\times P). Let F:[0,T]\times P\longrightarrow P and \widehat{F}:[0,T]\times P\longrightarrow P be the flows over the time interval [0,T] of the Hamilton equations associated to the Hamiltonian functions H and \widehat{H}, respectively. Suppose that the geodesic distance d(\cdot,y) of the Poisson manifold P is in C_{b}^{1}(P) for each y\in P and that the C_{b}^{1} norm of the family \{d(\cdot,y),y\in P\} is uniformly bounded, that is, C_{1}=\sup_{y\in P}\|d(\cdot,y)\|_{C_{b}^{1}}<\infty. Then, for any initial condition z\in P, we have that

d(F(z),F^(z)):=maxt[0,T]d(Ft(z),F^t(z))C~HH^K,\displaystyle d^{\infty}\left(F({z}),\widehat{F}({z})\right):=\max_{t\in[0,T]}d\left(F_{t}({z}),\widehat{F}_{t}({z})\right)\leq\widetilde{C}\|H-\widehat{H}\|_{\mathcal{H}_{K}},

with the constant C~=κCC1Texp{H^Cb2T}\widetilde{C}=\kappa CC_{1}T\exp\{\|\widehat{H}\|_{C_{b}^{2}}T\}, where CC is the constant introduced in Assumption 3.7.

Proof.

Let fCb1(P)f\in C_{b}^{1}(P). For each t[0,T]t\in[0,T] and zP{z}\in P, we have

ddt(fFt)(z)=𝐝f(Ft(z))XH(Ft(z))=XH[f](Ft(z)).\displaystyle\frac{\mathrm{d}}{\mathrm{d}t}(f\circ F_{t})(z)=\mathbf{d}f(F_{t}(z))\cdot X_{H}(F_{t}(z))=X_{H}[f](F_{t}(z)).

By the equation (A.4), we have that for any zPz\in P,

XH[f](z)XH^[f](z)\displaystyle X_{H}[f](z)-X_{\widehat{H}}[f](z) =Df(z)(XH(z)XH^(z))fCb1XHH^(z)gκCfCb1HH^K.\displaystyle=Df(z)\cdot(X_{H}(z)-X_{\widehat{H}}(z))\leq\|f\|_{C_{b}^{1}}\|X_{H-\widehat{H}}(z)\|_{g}\leq\kappa C\|f\|_{C_{b}^{1}}\|H-\widehat{H}\|_{\mathcal{H}_{K}}.

Since KCb5(P×P)K\in C_{b}^{5}(P\times P), Theorem 3.3 implies that H^Cb2(P)\widehat{H}\in C_{b}^{2}(P). Then by Lemma 4.9, for any z1,z2Pz_{1},z_{2}\in P,

XH^[f](z1)XH^[f](z2)H^Cb2d(z1,z2).\displaystyle X_{\widehat{H}}[f](z_{1})-X_{\widehat{H}}[f](z_{2})\leq\|\widehat{H}\|_{C_{b}^{2}}d(z_{1},z_{2}).

Furthermore, for each t\in[0,T], we can choose y_{t}\in P such that the point \widehat{F}_{t}({z}) lies on the geodesic curve connecting F_{t}({z}) and y_{t}. Now, by hypothesis, f:=d(\cdot,y_{t})\in C_{b}^{1}(P). Therefore, we obtain that

d(Ft(z),F^t(z))\displaystyle d\left(F_{t}({z}),\widehat{F}_{t}({z})\right) =f(Ft(z))f(F^t(z))=0tXH[f](Fs(z))XH^[f](F^s(z))ds\displaystyle=f(F_{t}(z))-f(\widehat{F}_{t}(z))=\int_{0}^{t}X_{H}[f](F_{s}({z}))-X_{\widehat{H}}[f](\widehat{F}_{s}({z}))\mathrm{d}s
0t|XH[f](Fs(z))XH^[f](Fs(z))|+|XH^[f](Fs(z))XH^[f](F^s(z))|ds\displaystyle\leq\int_{0}^{t}\left|X_{H}[f](F_{s}({z}))-X_{\widehat{H}}[f](F_{s}({z}))\right|+\left|X_{\widehat{H}}[f](F_{s}({z}))-X_{\widehat{H}}[f](\widehat{F}_{s}({z}))\right|\mathrm{d}s
κCfCb1HH^Kt+H^Cb20td(Fs(z),F^s(z))ds.\displaystyle\leq\kappa C\|f\|_{C_{b}^{1}}\|H-\widehat{H}\|_{\mathcal{H}_{K}}t+\|\widehat{H}\|_{C_{b}^{2}}\int_{0}^{t}d\left(F_{s}(z),\widehat{F}_{s}(z)\right)\mathrm{d}s.

Then by the integral form of Grönwall’s inequality, for d(Ft(z),F^t(z))d\left(F_{t}({z}),\widehat{F}_{t}({z})\right), we obtain

d(Ft(z),F^t(z))κCd(,yt)Cb1HH^Ktexp{H^Cb2t}.\displaystyle d\left(F_{t}({z}),\widehat{F}_{t}({z})\right)\leq\kappa C\|d(\cdot,y_{t})\|_{C_{b}^{1}}\|H-\widehat{H}\|_{\mathcal{H}_{K}}t\exp\left\{\|\widehat{H}\|_{C_{b}^{2}}t\right\}.

Therefore, we obtain

d(F(z),F^(z)):=maxt[0,T]d(Ft(z),F^t(z))C~HH^K,\displaystyle d^{\infty}\left(F({z}),\widehat{F}({z})\right):=\max_{t\in[0,T]}d\left(F_{t}({z}),\widehat{F}_{t}({z})\right)\leq\widetilde{C}\|H-\widehat{H}\|_{\mathcal{H}_{K}},

where C~=κCC1Texp{H^Cb2T}\widetilde{C}=\kappa CC_{1}T\exp\{\|\widehat{H}\|_{C_{b}^{2}}T\}. The result follows. ∎
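Proposition 4.10 can be probed numerically in the flat case by integrating the Hamiltonian vector fields of H and \widehat{H} from a common initial condition and recording the largest distance along the trajectories. The following Python sketch does this with a classical Runge-Kutta integrator, which is used here only for the comparison of flows and is not structure-preserving; the names rk4 and sup_distance, as well as the default horizon and step count, are illustrative.

```python
import numpy as np

def rk4(field, z0, T, steps):
    # Classical Runge-Kutta integration of z' = field(z) on [0, T]; returns the full trajectory.
    dt = T / steps
    traj = [np.asarray(z0, dtype=float)]
    for _ in range(steps):
        z = traj[-1]
        k1 = field(z)
        k2 = field(z + dt * k1 / 2)
        k3 = field(z + dt * k2 / 2)
        k4 = field(z + dt * k3)
        traj.append(z + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6)
    return np.array(traj)

def sup_distance(field_true, field_est, z0, T=5.0, steps=500):
    # Empirical analogue of d^infty(F(z), F_hat(z)) in Proposition 4.10, with the Euclidean
    # distance playing the role of the geodesic distance (flat case).
    traj_true = rk4(field_true, z0, T, steps)
    traj_est = rk4(field_est, z0, T, steps)
    return float(np.max(np.linalg.norm(traj_true - traj_est, axis=1)))

# Usage (illustrative): sup_distance(X_H, X_h_hat, z0) with the true and learned vector fields.
```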

5 Test examples and numerical experiments

This section spells out the global structure-preserving kernel estimator and illustrates its performance on two important instances of Poisson systems, namely, Hamiltonian systems on non-Euclidean symplectic manifolds and Lie-Poisson systems on the dual of a Lie algebra. More precisely, we shall derive explicit expressions for the global estimator h^λ,N\widehat{h}_{\lambda,N} in (3.16) in these two scenarios.

5.1 Learning of Lie-Poisson systems

The Lie-Poisson systems introduced in Example 2.3 are very important examples of Poisson systems. In this subsection, we shall make explicit the estimator h^λ,N\widehat{h}_{\lambda,N} in this case. Let 𝔤\mathfrak{g} be a Lie algebra and 𝔤\mathfrak{g}^{*} be its dual equipped with the Lie-Poisson bracket {,}+\{\cdot,\cdot\}_{+} defined in (2.2). A straightforward computation shows that the Lie-Poisson dynamical system associated with a Hamiltonian function H:𝔤H:\mathfrak{g}^{*}\to\mathbb{R} is

μ˙=XH(μ)=adδHδμμ,\displaystyle\dot{\mu}=X_{H}(\mu)=\operatorname{ad}^{*}_{\frac{\delta H}{\delta\mu}}\mu, (5.1)

where δH/δμ𝔤\delta H/\delta\mu\in\mathfrak{g} is the functional derivative introduced in (2.3). Since 𝔤\mathfrak{g} and 𝔤\mathfrak{g}^{\ast} are vector spaces, natural Riemannian metrics can be obtained out of a non-degenerate inner product ,𝔤:𝔤×𝔤\left\langle\cdot,\cdot\right\rangle_{\mathfrak{g}}:\mathfrak{g}\times\mathfrak{g}\rightarrow\mathbb{R} on 𝔤\mathfrak{g}. The non-degeneracy of ,𝔤\left\langle\cdot,\cdot\right\rangle_{\mathfrak{g}} allows us to define musical isomorphisms :𝔤𝔤\flat:\mathfrak{g}\rightarrow\mathfrak{g}^{\ast} and :𝔤𝔤\sharp:\mathfrak{g}^{\ast}\rightarrow\mathfrak{g} by ξ:=ξ,𝔤\xi^{\flat}:=\left\langle\xi,\cdot\right\rangle_{\mathfrak{g}}, ξ𝔤\xi\in\mathfrak{g}, and :=1\sharp:=\flat^{-1}, as well as a natural non-degenerate inner product ,𝔤:𝔤×𝔤\left\langle\cdot,\cdot\right\rangle_{\mathfrak{g}^{\ast}}:\mathfrak{g}^{\ast}\times\mathfrak{g}^{\ast}\rightarrow\mathbb{R} on 𝔤\mathfrak{g}^{\ast} given by ξ,η𝔤=ξ,η𝔤\left\langle\xi^{\flat},\eta^{\flat}\right\rangle_{\mathfrak{g}^{\ast}}=\left\langle\xi,\eta\right\rangle_{\mathfrak{g}}, ξ,η𝔤\xi,\eta\in\mathfrak{g}. Notice that

ξ,η=ξ,𝔤,η=ξ,η𝔤=ξ,η𝔤for all ξ,η𝔤.\left\langle\xi^{\flat},\eta\right\rangle=\left\langle\left\langle\xi,\cdot\right\rangle_{\mathfrak{g}},\eta\right\rangle=\left\langle\xi,\eta\right\rangle_{\mathfrak{g}}=\left\langle\xi^{\flat},\eta^{\flat}\right\rangle_{\mathfrak{g}^{\ast}}\quad\mbox{for all $\xi,\eta\in\mathfrak{g}$.} (5.2)

Using these relations and (2.3), we note that

H(μ),ν𝔤=DH(μ)ν=ν,δHδμ=ν,δHδμ𝔤, for all ν,μ𝔤.\left\langle\nabla H(\mu),\nu\right\rangle_{\mathfrak{g}^{\ast}}=DH(\mu)\cdot\nu=\left\langle\nu,\frac{\delta H}{\delta\mu}\right\rangle=\left\langle\nu,\frac{\delta H}{\delta\mu}^{\flat}\right\rangle_{\mathfrak{g}^{\ast}},\mbox{ for all $\nu,\mu\in\mathfrak{g}^{\ast}$.} (5.3)

Consequently,

H(μ)=δHδμ=δHδμ,𝔤and hence δHδμ=H(μ).\nabla H(\mu)=\frac{\delta H}{\delta\mu}^{\flat}=\left\langle\frac{\delta H}{\delta\mu},\cdot\right\rangle_{\mathfrak{g}}\quad\mbox{and hence $\frac{\delta H}{\delta\mu}=\nabla H(\mu)^{\sharp}$.} (5.4)

We now compute the compatible structure JJ defined in (2.12) associated with the Lie-Poisson systems (5.1). Note first that, for any μ𝔤\mu\in\mathfrak{g}^{*} and η𝔤\eta\in\mathfrak{g}, by (5.1) and (5.4), we have that

J(μ)H(μ)=XH(μ)=adδHδμμ=adH(μ)μ.\displaystyle J(\mu)\nabla H(\mu)=X_{H}(\mu)=\operatorname{ad}^{*}_{\frac{\delta H}{\delta\mu}}\mu=\operatorname{ad}^{*}_{\nabla H(\mu)^{\sharp}}\mu.

As this can be done for any Hamiltonian function and their gradients span 𝔤\mathfrak{g}^{\ast}, this yields

J(μ)ν=adνμ, for all ν𝔤.\displaystyle J(\mu)\nu=\operatorname{ad}^{*}_{\nu^{\sharp}}\mu,\quad\text{ for all }\nu\in{\mathfrak{g}}^{\ast}. (5.5)

To compute the estimator \widehat{h}_{\lambda,N}, one first evaluates J explicitly using (5.5) and then sets

𝕁N=diag{J(𝐙(1)),,J(𝐙(N))},\mathbb{J}_{N}=\operatorname{diag}\{J(\mathbf{Z}^{(1)}),\ldots,J(\mathbf{Z}^{(N)})\},

and substitute it into (3.16). More explicitly, note that the generalized differential Gram matrix GNG_{N} defined in (3.15) reduces in this case to

G_{N}\mathbf{c}=X_{g_{N}(\mathbf{c},X_{K_{\cdot}}(\mathbf{Z}_{N}))}(\mathbf{Z}_{N})=X_{\mathbf{c}^{\top}\mathbb{J}_{N}\nabla_{1}K(\mathbf{Z}_{N},\cdot)}(\mathbf{Z}_{N})=\mathbb{J}_{N}\nabla_{1,2}K(\mathbf{Z}_{N},\mathbf{Z}_{N})\mathbb{J}_{N}^{\top}\mathbf{c},

for all \mathbf{c}\in T_{\mathbf{Z}_{N}}\mathfrak{g}^{*}. Therefore, the estimator in Theorem 3.13 becomes

\widehat{h}_{\lambda,N}=g_{N}\left((G_{N}+\lambda NI)^{-1}\mathbf{X}_{\sigma^{2},N},X_{K_{\cdot}}(\mathbf{Z}_{N})\right)=\mathbf{X}_{\sigma^{2},N}^{\top}\left(\mathbb{J}_{N}\nabla_{1,2}K(\mathbf{Z}_{N},\mathbf{Z}_{N})\mathbb{J}_{N}^{\top}+\lambda NI\right)^{-1}\mathbb{J}_{N}\nabla_{1}K(\mathbf{Z}_{N},\cdot). (5.6)
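For illustration, the expression (5.6) can be evaluated directly on \mathfrak{g}^{*}\cong\mathbb{R}^{d} with the Euclidean metric and a Gaussian kernel, for which \nabla_{1}K(x,y)=-(x-y)K(x,y)/\eta^{2} and \nabla_{1,2}K(x,y)=K(x,y)\left(I/\eta^{2}-(x-y)(x-y)^{\top}/\eta^{4}\right). The following Python sketch assembles the matrix \mathbb{J}_{N}\nabla_{1,2}K(\mathbf{Z}_{N},\mathbf{Z}_{N})\mathbb{J}_{N}^{\top}, solves the regularized linear system, and returns the estimator as a callable; the function names and the callable J_of_mu implementing (5.5) are illustrative.

```python
import numpy as np

def grad1_gauss(x, y, eta):
    # Gradient of K(x, y) = exp(-||x - y||^2 / (2 eta^2)) with respect to its first argument.
    r = x - y
    return -r / eta ** 2 * np.exp(-np.dot(r, r) / (2.0 * eta ** 2))

def d12_gauss(x, y, eta):
    # Mixed second-derivative matrix of the Gaussian kernel.
    r = x - y
    k = np.exp(-np.dot(r, r) / (2.0 * eta ** 2))
    return k * (np.eye(x.size) / eta ** 2 - np.outer(r, r) / eta ** 4)

def lie_poisson_estimator(Z, X_obs, J_of_mu, eta, lam):
    # Returns a callable mu -> h_hat(mu) implementing (5.6) on g* = R^d with the Euclidean metric.
    # Z: (N, d) samples; X_obs: (N, d) observed Hamiltonian vector fields; J_of_mu: the map (5.5).
    N, d = Z.shape
    Js = [J_of_mu(z) for z in Z]
    G = np.zeros((N * d, N * d))
    for i in range(N):
        for j in range(N):
            G[i * d:(i + 1) * d, j * d:(j + 1) * d] = Js[i] @ d12_gauss(Z[i], Z[j], eta) @ Js[j].T
    coeff = np.linalg.solve(G + lam * N * np.eye(N * d), X_obs.reshape(-1))

    def h_hat(mu):
        k_vec = np.concatenate([Js[i] @ grad1_gauss(Z[i], mu, eta) for i in range(N)])
        return float(coeff @ k_vec)

    return h_hat
```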

In what follows, we demonstrate the effectiveness of this learning scheme using two examples. The first one is the three-dimensional classical rigid body, which models the motion of solid bodies without deformation. The second example is the dynamics of an underwater vehicle, which has a nine-dimensional underlying phase space. We emphasize that the state spaces of Lie-Poisson systems are duals of Lie algebras and, hence, are isomorphic to Euclidean spaces; in such a framework, the Gaussian kernel is a natural choice of universal kernel. As such, even when H does not belong to \mathcal{H}_{K}, we can hope to learn a proxy function in \mathcal{H}_{K} that is close to the ground-truth Hamiltonian H.

We will see in these two examples that h^λ,N\widehat{h}_{\lambda,N} can only recover the ground-truth Hamiltonian function up to a Casimir function. We emphasize that the estimator h^λ,N\widehat{h}_{\lambda,N} is always unique, and the possible difference by a Casimir function is present because h^λ,Nnull\widehat{h}_{\lambda,N}\in\mathcal{H}^{\bot}_{\mathrm{null}}, while the ground-truth HH may not belong to null\mathcal{H}^{\bot}_{\mathrm{null}}, and hence the difference between HH and h^λ,N\widehat{h}_{\lambda,N} could converge not to zero, but to a Casimir function. An analytic reason for this is that the Hamiltonian functions that we shall be using for data generation are quadratic and violate the source condition (4.2) (see Example 2.9 in [Hu 24] for a characterization of the RKHS of the Gaussian kernel). To better illustrate this point, we provide a third example in which the Hamiltonian HnullH\in\mathcal{H}^{\bot}_{\mathrm{null}} satisfies the source condition (4.2) for sufficiently large S>0S>0 (SS is the constant introduced in (4.2); see also Proposition 4.5). In that case, according to Theorem 4.8, h^λ,N\widehat{h}_{\lambda,N} converges to HH in the K\mathcal{H}_{K} norm, and the difference converges to 0 as NN\rightarrow\infty. We will see that, numerically, h^λ,N\widehat{h}_{\lambda,N} accurately recovers the ground-truth Hamiltonian.

5.1.1 The rigid body

Rigid body motion has been introduced in Example 2.5. Using (5.5), the compatible structure JJ associated with the Euclidean metric in 3{\mathbb{R}}^{3} can be written in this case as

J(Π)=Π^.\displaystyle J(\Pi)=\hat{\Pi}.
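For concreteness, the following Python sketch implements the hat map, the resulting compatible structure J(\Pi)=\hat{\Pi}, and the generation of the (noiseless) training data used below, assuming the standard kinetic-energy rigid body Hamiltonian H(\Pi)=\frac{1}{2}\Pi^{\top}\mathbb{I}^{-1}\Pi of Example 2.5 (up to the sign convention of the bracket); the names hat, I_body, and X_H are illustrative.

```python
import numpy as np

def hat(v):
    # Hat map R^3 -> so(3): hat(v) @ w equals the cross product v x w.
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

I_body = np.diag([1.0, 10.0, 0.1])   # inertia tensor used in this experiment

def X_H(Pi):
    # Rigid body Hamiltonian vector field X_H(Pi) = J(Pi) grad H(Pi) with J(Pi) = hat(Pi)
    # and H(Pi) = Pi . I^{-1} Pi / 2, so that grad H(Pi) = I^{-1} Pi.
    return hat(Pi) @ np.linalg.solve(I_body, Pi)

# Noiseless training data (sigma = 0): N uniform samples on the cube [-1, 1]^3.
N = 500
Z = np.random.uniform(-1.0, 1.0, size=(N, 3))
X_obs = np.array([X_H(Pi) for Pi in Z])
# Z and X_obs, together with J_of_mu = hat, can be fed to the estimator sketch given after (5.6).
```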

Numerical results  We pick \mathbb{I}=\operatorname{diag}\{1,10,0.1\} and consider the standard Gaussian kernel with parameter \eta on \mathfrak{so}(3)^{*}\cong\mathbb{R}^{3}. In order to generate the data, we set N=500 and sample N states from the uniform distribution on the cube [-1,1]^{3}. We then obtain the Hamiltonian vector fields at these states, which, this time, we do not perturb with noise (\sigma=0). In view of (4.3), we set the regularization parameter \lambda=c\cdot N^{-\alpha} with \alpha=0.4. We then perform a grid search for the parameters \eta and c using 5-fold cross-validation, which results in the optimal parameters \eta=2.5 and c=2.5\cdot 10^{-5}. Given that \mathfrak{so}(3)^{*}\cong\mathbb{R}^{3} is 3-dimensional, we fix the third dimension, that is, \Pi_{3}=0, and visualize both the ground-truth Hamiltonian function H and the reconstructed Hamiltonian function \widehat{h}_{\lambda,N} in the dimensions (\Pi_{1},\Pi_{2}); see Figure 5.1 (a) and (b). The ground-truth Hamiltonian H is quadratic and violates the source condition (4.2), so \widehat{h}_{\lambda,N} and H could differ by a Casimir function.

From Example 2.5, all possible Casimir functions are of the form Φ(Π2)\Phi(\|\Pi\|^{2}) for some function Φ:\Phi:\mathbb{R}\rightarrow\mathbb{R}. We try to compensate for the difference via a simple Casimir of the form aΠ2a\cdot\|\Pi\|^{2}, for some constant aa\in\mathbb{R}. After trial and error, we select a=1.85a=1.85 and visualize the “Casimir-corrected” estimator, that is h^λ,N+1.85Π2\widehat{h}_{\lambda,N}+1.85\cdot\|\Pi\|^{2}, see Figure 5.1 (c). We also visualize the mean-square error of the predicted Hamiltonian vector fields in a heatmap, see Figure 5.1 (d).
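Instead of trial and error, the constant a could also be fitted by least squares against the ground truth on a set of sample points; the following Python sketch illustrates this idea (the names h_hat, H_true, and fit_casimir_constant are placeholders for the learned estimator, the ground-truth Hamiltonian, and the fitting routine, and the value a=1.85 reported above was obtained manually).

```python
import numpy as np

def fit_casimir_constant(h_hat, H_true, Pi_samples):
    # Least-squares fit of a in min_a sum_n ( h_hat(Pi_n) + a * ||Pi_n||^2 - H_true(Pi_n) )^2,
    # whose closed form is a = <r, c> / <c, c> with r = H_true - h_hat and c(Pi) = ||Pi||^2.
    c = np.array([np.dot(Pi, Pi) for Pi in Pi_samples])
    r = np.array([H_true(Pi) - h_hat(Pi) for Pi in Pi_samples])
    return float(np.dot(r, c) / np.dot(c, c))
```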

Figure 5.1: Rigid body dynamics: (a) Ground-truth Hamiltonian (b) Learned Hamiltonian with N=500N=500 (c) Learned Hamiltonian adjusted by a Casimir function (d) Squared error of the predicted Hamiltonian vector field

Analysis  In this setting, the Hamiltonian function HH is a quadratic function, which, according to Example 2.9 in [Hu 24], does not belong to K\mathcal{H}_{K}. In particular, the source condition (4.2) cannot be satisfied. Therefore, even though the Gaussian kernel is universal, the proxy function in K\mathcal{H}_{K}, which approximates the ground-truth Hamiltonian HH, can, at best, be learned up to a Casimir function. That is why we choose to manually correct the estimator by adding a Casimir function. From the numerical results, we see that without the “Casimir correction”, HH and h^λ,N\widehat{h}_{\lambda,N} can be very different, despite the fact that the Hamiltonian vector fields are very well replicated.

5.1.2 Underwater vehicle dynamics

The Lie-Poisson underwater vehicle dynamics was introduced in Example 2.6. Using again the Euclidean metric and (5.5), the compatible structure JJ is

J(Π,Q,Γ)=(Π^Q^Γ^Q^00Γ^00).\displaystyle J(\Pi,\mathrm{Q},\Gamma)=\left(\begin{array}[]{ccc}\hat{\Pi}&\hat{\mathrm{Q}}&\hat{\Gamma}\\ \hat{\mathrm{Q}}&0&0\\ \hat{\Gamma}&0&0\end{array}\right).
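In code, this block structure is assembled directly from the hat map; the following sketch is a straightforward transcription of the displayed matrix, with helper names of our choosing.

import numpy as np

def hat(p):
    return np.array([[0.0, -p[2], p[1]], [p[2], 0.0, -p[0]], [-p[1], p[0], 0.0]])

def J_vehicle(Pi, Q, Gamma):
    # Compatible structure of the underwater vehicle on R^9, assembled blockwise as displayed above.
    O = np.zeros((3, 3))
    return np.block([[hat(Pi),    hat(Q), hat(Gamma)],
                     [hat(Q),     O,      O],
                     [hat(Gamma), O,      O]])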

Numerical results  We pick m=1m=1, g=9.8g=9.8, I=diag{1,2,3}I=\operatorname{diag}\{1,2,3\}, M=diag{3,2,1}M=\operatorname{diag}\{3,2,1\} and rG=(1,1,1)r_{G}=(1,1,1)^{\top}. We consider the standard Gaussian kernel with parameter η\eta on 𝔰𝔬(3)×3×39\mathfrak{so}(3)^{*}\times{\mathbb{R}^{3}}^{*}\times{\mathbb{R}^{3}}^{*}\cong\mathbb{R}^{9}. We set N=400N=400 and sample NN states from the uniform distribution on the cube [1,1]9[-1,1]^{9}, and then obtain the Hamiltonian vector fields at these states. We set the regularization parameter λ=cNα\lambda=c\cdot N^{-\alpha} with α=0.4\alpha=0.4. We then perform a grid search over the parameters η\eta and cc using 55-fold cross-validation, which results in the optimal parameters η=5\eta=5 and c=7.5106c=7.5\cdot 10^{-6}. Given that 𝔰𝔬(3)×3×3\mathfrak{so}(3)^{*}\times{\mathbb{R}^{3}}^{*}\times{\mathbb{R}^{3}}^{*} is 99-dimensional, we fix the third to ninth dimensions, that is, Π3=0\Pi_{3}=0, Q=Γ=𝟎Q=\Gamma={\bf 0}, and visualize both the ground-truth Hamiltonian function HH and the reconstructed Hamiltonian function h^λ,N\widehat{h}_{\lambda,N} in the dimensions (Π1,Π2)(\Pi_{1},\Pi_{2}), see Figure 5.2 (a) and (b). The ground-truth Hamiltonian HH does not satisfy the source condition (4.2), so h^λ,N\widehat{h}_{\lambda,N} and HH generically differ by a Casimir function. We also visualize the mean-square error of the predicted Hamiltonian vector fields with a heatmap, see Figure 5.2 (c).

Figure 5.2: Underwater Vehicle: (a) Ground-truth Hamiltonian (b) Learned Hamiltonian with N=400N=400 (c) Squared error of the predicted Hamiltonian vector field

Analysis  In this setting, the Hamiltonian function HH is a quadratic function, which, according to Example 2.9 in [Hu 24], does not belong to K\mathcal{H}_{K}. In particular, the source condition (4.2) cannot be satisfied. Therefore, even though the Gaussian kernel is universal, the proxy function in K\mathcal{H}_{K}, which approximates the ground-truth Hamiltonian HH, can, at best, be learned up to a Casimir function. From Example 2.6, all possible Casimir functions are of the form Φ(C1,,C6)\Phi\circ(C_{1},\ldots,C_{6}) for some function Φ:6\Phi:\mathbb{R}^{6}\rightarrow\mathbb{R}. To determine such a function Φ\Phi, one could, for instance, run a polynomial regression to approximate it. From the HH-axis, we see that, although the shapes of the two Hamiltonians look alike, their values still differ by a significant constant, which is a trivial Casimir function. On the other hand, we notice that the Hamiltonian vector fields are very well replicated.

5.1.3 Exact recovery of Hamiltonians in the RKHS

In the rigid body and the underwater vehicle examples, we saw that even though the Hamiltonian vector field can be learned very well, the data-generating Hamiltonian function can only be estimated up to a Casimir function due to the violation of the source condition. We now aim to manually pick a ground-truth Hamiltonian that satisfies the source condition and reconstruct the Hamiltonian function accurately from the data. To find such a Hamiltonian, we use [Minh 10], which provides an orthonormal basis of the RKHS of the Gaussian kernel. It turns out that, according to Example 2.5, exactly one of the basis functions is a Casimir and all other basis functions belong to null\mathcal{H}^{\bot}_{\mathrm{null}}. Moreover, by Proposition 4.5, the basis functions, except for the Casimir basis element, satisfy the source condition for sufficiently large S>0S>0. In view of the above, we consider the standard Gaussian kernel on 3\mathbb{R}^{3} with fixed parameter η=2\eta=2 and choose the Hamiltonian function as

H(𝐱)=x12x23e𝐱2η2.H({\bf x})=x_{1}^{2}x_{2}^{3}e^{-\frac{\|{\bf x}\|^{2}}{\eta^{2}}}. (5.7)

The Lie-Poisson structure is the same as the one we used for the rigid body, with the only difference in the experiment being the Hamiltonian function.
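For data generation in this example, only the Hamiltonian (5.7) and its gradient are needed. The short sketch below is ours; the gradient follows from the product rule.

import numpy as np

eta = 2.0

def H(x):
    # H(x) = x1^2 * x2^3 * exp(-||x||^2 / eta^2), as in (5.7)
    return x[0] ** 2 * x[1] ** 3 * np.exp(-np.dot(x, x) / eta ** 2)

def grad_H(x):
    # product rule: derivative of the polynomial factor plus derivative of the Gaussian factor
    e = np.exp(-np.dot(x, x) / eta ** 2)
    poly = np.array([2.0 * x[0] * x[1] ** 3, 3.0 * x[0] ** 2 * x[1] ** 2, 0.0])
    return e * poly - (2.0 / eta ** 2) * H(x) * x

# Training pairs are then (z, hat(z) @ grad_H(z)), with the hat map J(Pi) = hat(Pi) of Section 5.1.1.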

Numerical results  To guarantee the matching of the RKHS, we consider the standard Gaussian kernel with the fixed parameter η=2\eta=2 (the same as the ground-truth) on 3\mathbb{R}^{3}. We set N=500N=500 and sample NN states from the uniform distribution on the cube [1,1]3[-1,1]^{3}, and then obtain the Hamiltonian vector fields at these states. In view of (4.3), we set the regularization parameter λ=cNα\lambda=c\cdot N^{-\alpha} with α=0.4\alpha=0.4. We then perform a grid search of the parameter cc using 55-fold cross-validation, which results in the optimal parameter c=7.5106c=7.5\cdot 10^{-6}. Given that 𝔰𝔬(3)3\mathfrak{so}(3)^{*}\cong\mathbb{R}^{3} is 33-dimensional, we fix the third dimension, that is, x3=0x_{3}=0, and visualize both the ground-truth Hamiltonian functions HH and the reconstructed Hamiltonian functions h^λ,N\widehat{h}_{\lambda,N} in the dimensions (x1,x2)(x_{1},x_{2}), see Figure 5.3 (a) and (b).

Analysis  In this setting, the Hamiltonian function HH is chosen to satisfy the source condition (4.2). In this case, both the ground-truth Hamiltonian HH and the estimator h^λ,N\widehat{h}_{\lambda,N} belong to null\mathcal{H}^{\bot}_{\mathrm{null}}. The possibility of the presence of Casimir functions is eliminated, and hence by Theorem 4.8, h^λ,N\widehat{h}_{\lambda,N} converges to HH in the RKHS norm. As shown in Figure 5.3 (c), the Hamiltonian function can be accurately replicated even without a “Casimir correction”, in contrast with the rigid body example.

Figure 5.3: Gaussian kernel sections: (a) Ground-truth Hamiltonian (b) Learned Hamiltonian with N=500N=500 (c) Absolute error of the predicted Hamiltonian function (d) Squared error of the predicted Hamiltonian vector field

5.2 Learning on non-Euclidean symplectic manifolds

As mentioned in Example 2.2, any symplectic manifold is a Poisson manifold. Let MM be a symplectic manifold with a symplectic form ω\omega. Then the Hamiltonian vector field associated with Hamiltonian function HH defined in (2.5) is

XH=ω(𝐝H)=JH,\displaystyle X_{H}=\omega^{\sharp}(\bm{\mathrm{d}}H)=J\nabla H,

where ω:TMTM\omega^{\sharp}:T^{*}M\to TM is the bundle isomorphism induced by the symplectic form ω\omega and J:TMTMJ:TM\to TM given by

J(x)v:=ωx(gx(v)),xM,vTxM,\displaystyle J(x)v:=\omega_{x}^{\sharp}\left(g_{x}^{\flat}(v)\right),\quad\forall x\in M,v\in T_{x}M, (5.8)

with the bundle isomorphism g:TMTMg^{\flat}:TM\to T^{*}M induced by the Riemannian metric gg. Unlike in the more general Poisson setting, JJ is a bundle isomorphism in the symplectic setup. We aim to learn the Hamiltonian function HH out of the equation

z˙=JH(z).\displaystyle\dot{z}=J\nabla H(z). (5.9)

Following Theorem 3.13, the kernel estimator in (3.16) is

h^λ,N=gN(𝐜^,XK(𝐙N,))=gN(𝐜^,𝕁NK(𝐙N,)),\widehat{h}_{\lambda,N}=g_{N}(\widehat{\bf c},X_{K_{\cdot}}(\mathbf{Z}_{N},\cdot))=g_{N}(\widehat{\bf c},\mathbb{J}_{N}\nabla K(\mathbf{Z}_{N},\cdot)),

where 𝐜^T𝐙NM\widehat{\mathbf{c}}\in T_{\mathbf{Z}_{N}}M is given by

𝐜^=(GN+λNI)1𝐗σ2,N.\displaystyle\widehat{\mathbf{c}}=(G_{N}+\lambda NI)^{-1}\mathbf{X}_{\sigma^{2},N}.

In applications, it is important to compute the kernel estimator h^λ,N\widehat{h}_{\lambda,N} in pre-determined local patches that cover the manifold MM. By the local representations (2.17) and (3.21), the compatible structure JJ and the generalized Gram matrix GNG_{N} of the Hamiltonian system (5.9) are

Jij=ωikgkj,GNi,j=ω(𝐙(i))1,2K(𝐙(i),𝐙(j))ω(𝐙(j))g(j).\displaystyle J^{ij}=-\omega^{ik}g_{kj},\quad G_{N}^{i,j}=\omega(\mathbf{Z}^{(i)})\partial_{1,2}K(\mathbf{Z}^{(i)},\mathbf{Z}^{(j)})\omega^{\top}(\mathbf{Z}^{(j)})g^{(j)}.

In what follows, we demonstrate the effectiveness of this learning scheme on two examples that share the same symplectic manifold M=S2×S2M=S^{2}\times S^{2}, the same symplectic form ω\omega, and the same Riemannian metric gg, but are governed by two different Hamiltonian functions HH. The first Hamiltonian function is the Euclidean 3-norm restricted to the manifold MM, and the second one describes the two-vortex dynamics. We will see that the first Hamiltonian is “well-behaved”, while the second one exhibits singularities.

We emphasize that, technically, there are three challenges in relation to learning these systems. First, it is difficult to verify that a given Hamiltonian function belongs to the RKHS K\mathcal{H}_{K} of a pre-determined kernel KK on the manifold, due to the lack of results in the literature characterizing the functions in this type of RKHS on manifolds. Second, in case the Hamiltonian function does not lie in the RKHS of the selected kernel KK, it is important to investigate the universality properties of kernels on manifolds so that one can hope for at least a good approximation. Third, the Hamiltonian function of the two-vortex system exhibits singularities, which pose additional difficulties in recovering HH from XHX_{H}, since the vector field diverges at the singularities. We will see that, even in this example, h^λ,N\widehat{h}_{\lambda,N} still recovers the qualitative behavior of the Hamiltonian function.

5.2.1 Metric, symplectic, and compatible structures in S2×S2S^{2}\times S^{2}

Let 𝒫¯=S2×S2\overline{\mathcal{P}}=S^{2}\times S^{2}, the product of 22 copies of unit 22-spheres in 3\mathbb{R}^{3}. The phase space for the 22-vortex problem is 𝒫=𝒫¯\Δ\mathcal{P}=\overline{\mathcal{P}}\backslash\Delta, where Δ\Delta is the diagonal, that is,

Δ={𝐱=(x1,x2)x1=x2}.\displaystyle\Delta=\left\{\mathbf{x}=(x_{1},x_{2})\mid x_{1}=x_{2}\right\}.

The symplectic structure on 𝒫\mathcal{P} is given by

ω𝒫=λ1π1ωS2+λ2π2ωS2,\displaystyle\omega_{\mathcal{P}}=\lambda_{1}\pi_{1}^{*}\omega_{S^{2}}+\lambda_{2}\pi_{2}^{*}\omega_{S^{2}}, (5.10)

where πj\pi_{j} is the Cartesian projection onto the jj-th factor, ωS2\omega_{S^{2}} the natural symplectic form on S2S^{2}, and λj\lambda_{j} the vorticity of the jj-th vortex. The Poisson structure is given by

{f,g}=λ11[d1f,d1g,x1]+λ21[d2f,d2g,x2],\{f,g\}=\lambda_{1}^{-1}\left[\mathrm{d}_{1}f,\mathrm{d}_{1}g,x_{1}\right]+\lambda_{2}^{-1}\left[\mathrm{d}_{2}f,\mathrm{d}_{2}g,x_{2}\right],

where [djf,djg,xj]\left[\mathrm{d}_{j}f,\mathrm{d}_{j}g,x_{j}\right] is the triple product [a,b,c]=a(b×c)[a,b,c]=a\cdot(b\times c) for a,b,c3a,b,c\in\mathbb{R}^{3}.

Local coordinates. We first consider the local structure of the unit sphere S2S^{2} as a Riemannian manifold. We parametrize the unit sphere S2S^{2} in terms of two angles, that is, θ\theta (the colatitude) and φ\varphi (the longitude), and use two patches (V1,ϕ1)(V_{1},\phi_{1}) and (V2,ϕ2)(V_{2},\phi_{2}) to cover the whole sphere S2S^{2}. The first patch is given by

{V1={(θ,φ)θ(0,π),φ(0,2π)},ϕ1:(θ,φ)(sinθcosφ,sinθsinφ,cosθ),\begin{cases}V_{1}=\{(\theta,\varphi)\mid\theta\in(0,\pi),\varphi\in(0,2\pi)\},\\ \phi_{1}:(\theta,\varphi)\mapsto(\sin\theta\cos\varphi,\sin\theta\sin\varphi,\cos\theta),\end{cases}

which implies that the semicircle from the north pole to the south pole lying in the xzxz-plane is not covered by this chart. In order to cover the whole sphere, we need another chart obtained by deleting the semicircle in the xyxy-plane from (0,1,0)(0,1,0) to (0,1,0)(0,-1,0) with x0x\leq 0. Since the patch (V1,ϕ1)(V_{1},\phi_{1}) is already topologically dense in S2S^{2}, we omit the explicit parameterization of (V2,ϕ2)(V_{2},\phi_{2}) for convenience and focus on the first patch.

The unit sphere S2S^{2} admits a natural symplectic form given by its area form that is locally expressed by ωS2=sinθdθdφ\omega_{S^{2}}=\sin\theta\mathrm{d}\theta\wedge\mathrm{d}\varphi and a Riemannian metric gS2=dθ2+sin2θdφ2g_{S^{2}}=\mathrm{d}\theta^{2}+\sin^{2}\theta\mathrm{d}\varphi^{2} inherited from the ambient Euclidean metric in 3{\mathbb{R}}^{3}. The corresponding matrix forms in the coordinate patch (V1,ϕ1)(V_{1},\phi_{1}) can be expressed by

ωS2(θ,φ)=[0sinθsinθ0], and gS2(θ,φ)=[100sin2θ].\omega_{S^{2}}(\theta,\varphi)=\begin{bmatrix}0&\sin\theta\\ -\sin\theta&0\end{bmatrix},\quad\text{ and }\quad g_{S^{2}}(\theta,\varphi)=\begin{bmatrix}1&0\\ 0&\sin^{2}\theta\end{bmatrix}.

These structures induce a local symplectic form and a Riemannian metric on the patch (V1×V1,ϕ1×ϕ1)(V_{1}\times V_{1},\phi_{1}\times\phi_{1}) (that is, two copies of (V1,ϕ1)(V_{1},\phi_{1})) of the product manifold S2×S2S^{2}\times S^{2}, given by

ω𝒫(θ1,φ1,θ2,φ2)=diag{λ1ωS2(θ1,φ1),λ2ωS2(θ2,φ2)},\displaystyle\omega_{\mathcal{P}}(\theta_{1},\varphi_{1},\theta_{2},\varphi_{2})=\operatorname{diag}\{\lambda_{1}\omega_{S^{2}}(\theta_{1},\varphi_{1}),\lambda_{2}\omega_{S^{2}}(\theta_{2},\varphi_{2})\},

where (θ1,φ1,θ2,φ2)V1×V1(\theta_{1},\varphi_{1},\theta_{2},\varphi_{2})\in V_{1}\times V_{1}. The matrix form of the product metric g𝒫g_{\mathcal{P}} is

g𝒫(θ1,φ1,θ2,φ2)=diag{gS2(θ1,φ1),gS2(θ2,φ2)},\displaystyle g_{\mathcal{P}}(\theta_{1},\varphi_{1},\theta_{2},\varphi_{2})=\operatorname{diag}\{g_{S^{2}}(\theta_{1},\varphi_{1}),g_{S^{2}}(\theta_{2},\varphi_{2})\},

hence, the compatible structure J𝒫J_{\mathcal{P}} of the 22-vortex system is

J𝒫(θ1,φ1,θ2,φ2)=diag{λ1ωS2(θ1,φ1)gS2(θ1,φ1),λ2ωS2(θ2,φ2)gS2(θ2,φ2)}.\displaystyle J_{\mathcal{P}}(\theta_{1},\varphi_{1},\theta_{2},\varphi_{2})=\operatorname{diag}\{\lambda_{1}\omega_{S^{2}}(\theta_{1},\varphi_{1})g_{S^{2}}(\theta_{1},\varphi_{1}),\lambda_{2}\omega_{S^{2}}(\theta_{2},\varphi_{2})g_{S^{2}}(\theta_{2},\varphi_{2})\}.
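The following sketch transcribes these local expressions directly, keeping the sign and scaling conventions of the displayed formulas; the function names are ours.

import numpy as np

def omega_S2(theta):
    # matrix of the area form sin(theta) dtheta ^ dphi in the chart (V1, phi1)
    return np.array([[0.0, np.sin(theta)], [-np.sin(theta), 0.0]])

def g_S2(theta):
    # round metric dtheta^2 + sin(theta)^2 dphi^2 in the same chart
    return np.array([[1.0, 0.0], [0.0, np.sin(theta) ** 2]])

def J_P(theta1, theta2, lam1=1.0, lam2=1.0):
    # block-diagonal compatible structure on the patch V1 x V1, transcribed from the display above
    O = np.zeros((2, 2))
    B1 = lam1 * omega_S2(theta1) @ g_S2(theta1)
    B2 = lam2 * omega_S2(theta2) @ g_S2(theta2)
    return np.block([[B1, O], [O, B2]])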

5.2.2 A universal kernel and data sampling on S2×S2S^{2}\times S^{2}

Universal kernel.  Consider the RKHS K{\mathcal{H}}_{K} defined on a Hausdorff topological space 𝒳{\cal X}. Let 𝒵𝒳{\cal Z}\subset{\cal X} be an arbitrary but fixed compact subset of 𝒳{\cal X} and denote by K(𝒵)K{\mathcal{H}}_{K}({\cal Z})\subset{\mathcal{H}}_{K} the completion in the RKHS norm of the span of kernel sections determined by the elements of 𝒵{\cal Z}. We write this as:

K(𝒵)=span{Kzz𝒵}¯.{\mathcal{H}}_{K}({\cal Z})=\overline{{\rm span}\left\{K_{z}\mid z\in{\cal Z}\right\}}. (5.11)

Denote now by K(𝒵)¯\overline{\mathcal{H}_{K}({\cal Z})} the uniform closure of K(𝒵){\mathcal{H}}_{K}({\cal Z}). A kernel KK is called universal if for any compact subset 𝒵𝒳\mathcal{Z}\subset{\cal X}, we have that K(𝒵)¯=C(𝒵)\overline{\mathcal{H}_{K}({\cal Z})}=C({\cal Z}), with C(𝒵)C({\cal Z}) the set of real-valued continuous functions on 𝒵{\cal Z}. Equivalently, this implies that for any ε>0\varepsilon>0, and any function fC(𝒵)f\in C(\mathcal{Z}), there exists a function gK(𝒵)g\in\mathcal{H}_{K}({\cal Z}), such that fg<ε\|f-g\|_{\infty}<\varepsilon. Many kernels that are used in practice are indeed universal [Micc 06, Stei 01], e.g., the Gaussian kernel on Euclidean space. For non-Euclidean spaces there is still a lack of literature in this field. The universality of kernels is a crucial feature in recovering unknown functions using the corresponding RKHS. In our learning approach, we shall consider the kernel on S2×S2S^{2}\times S^{2} defined by

Kη(x,y):=exp{1η2|ι(x)ι(y)|2}, for allx,yS2×S2,\displaystyle K_{\eta}(x,y):=\exp\left\{-\frac{1}{\eta^{2}}|\iota(x)-\iota(y)|^{2}\right\},\quad\text{ for all}\quad x,y\in S^{2}\times S^{2}, (5.12)

where η>0\eta>0 is a constant and ι:S2×S23×3\iota:S^{2}\times S^{2}\to\mathbb{R}^{3}\times\mathbb{R}^{3} is the injection map. We emphasize that since ι\iota is continuous and injective, by [Chri 10, Theorem 2.2], the kernel KηK_{\eta} in (5.12) is universal for all η>0\eta>0 on S2×S2S^{2}\times S^{2}.
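In code, the kernel (5.12) is simply the ambient Gaussian kernel evaluated on the embedded points. A minimal sketch, with local coordinates ordered as (θ1, φ1, θ2, φ2) and names of our choosing, is:

import numpy as np

def embed(theta, phi):
    # chart map phi_1: (theta, phi) -> point on the unit sphere in R^3
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

def K_eta(q1, q2, eta):
    # kernel (5.12): Gaussian of the ambient Euclidean distance between the embedded points,
    # with q = (theta1, phi1, theta2, phi2) the local coordinates on S^2 x S^2
    e1 = np.concatenate([embed(q1[0], q1[1]), embed(q1[2], q1[3])])
    e2 = np.concatenate([embed(q2[0], q2[1]), embed(q2[2], q2[3])])
    return np.exp(-np.sum((e1 - e2) ** 2) / eta ** 2)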

Sampling method.  Note that the total surface area of the unit sphere is

02π0πsinθdθdφ=4π.\displaystyle\int_{0}^{2\pi}\int_{0}^{\pi}\sin\theta\mathrm{d}\theta\mathrm{d}\varphi=4\pi.

The joint cumulative distribution function of the random angles Θ,Φ\Theta,\Phi under the normalized area measure is given by

(Θθ,Φφ)=14π0φ0θsintdtds=14πφ(1cosθ),\displaystyle\mathbb{P}(\Theta\leq\theta,\Phi\leq\varphi)=\frac{1}{4\pi}\int_{0}^{\varphi}\int_{0}^{\theta}\sin t~{}dtds=\frac{1}{4\pi}\varphi(1-\cos\theta),

which factorizes into the product of the marginals

(Θθ)=12(1cosθ),(Φφ)=12πφ.\displaystyle\mathbb{P}(\Theta\leq\theta)=\frac{1}{2}(1-\cos\theta),\quad\mathbb{P}(\Phi\leq\varphi)=\frac{1}{2\pi}\varphi.

One can hence obtain a uniform distribution on S2S^{2} by sampling U1,U2U_{1},U_{2} uniformly on the interval (0,1)(0,1) and computing

Θ=arccos(2U11),Φ=2πU2.\displaystyle\Theta=\arccos(2U_{1}-1),\quad\Phi=2\pi U_{2}.

This extends to a uniform distribution on the product space 𝒫\mathcal{P}, which we adopt to generate training data.
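The inverse-CDF recipe above translates directly into code; a short sketch (with N=1200 as in the experiments below and function names of our choosing) is:

import numpy as np

def sample_sphere_angles(n, rng):
    # inverse-CDF sampling: (theta, phi) uniformly distributed w.r.t. the area measure on S^2
    u1, u2 = rng.uniform(size=n), rng.uniform(size=n)
    return np.arccos(2.0 * u1 - 1.0), 2.0 * np.pi * u2

rng = np.random.default_rng(0)
t1, p1 = sample_sphere_angles(1200, rng)
t2, p2 = sample_sphere_angles(1200, rng)
states = np.column_stack([t1, p1, t2, p2])    # uniform training states on S^2 x S^2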

5.2.3 Exact recovery of Hamiltonians using a universal kernel

In our first example, we consider as Hamiltonian function on 𝒫¯\overline{\mathcal{P}} the Euclidean 3-norm on 3×3\mathbb{R}^{3}\times\mathbb{R}^{3}.

Formulation  We choose the globally defined Hamiltonian function

H(x1,x2)=(x133+x233)13,H^{\prime}\left(x_{1},x_{2}\right)=(\|x_{1}\|^{3}_{3}+\|x_{2}\|^{3}_{3})^{\frac{1}{3}},

where 3\|\cdot\|_{3} is the usual 3-norm on 3\mathbb{R}^{3} defined as (a,b,c)3=(a3+b3+c3)13\|(a,b,c)\|_{3}=(a^{3}+b^{3}+c^{3})^{\frac{1}{3}}. In local coordinates, we focus on the patch (V1×V1,ϕ1×ϕ1)(V_{1}\times V_{1},\phi_{1}\times\phi_{1}), so that the Hamiltonian becomes

H(θ1,φ1,θ2,φ2)=H(ϕ1(θ1,φ1),ϕ1(θ2,φ2)).\displaystyle H(\theta_{1},\varphi_{1},\theta_{2},\varphi_{2})=H^{\prime}(\phi_{1}(\theta_{1},\varphi_{1}),\phi_{1}(\theta_{2},\varphi_{2})).

We shall call any Hamiltonian function constructed in this way a spherical 3-norm because its domain is a product of spheres. The restricted Gaussian kernel KηK_{\eta} in (5.12) will be adopted for learning, again restricted to the patch (V1×V1,ϕ1×ϕ1)(V_{1}\times V_{1},\phi_{1}\times\phi_{1}) of 𝒫¯\overline{\mathcal{P}}.

Numerical results  We set N=1200N=1200 and sample NN states from the uniform distribution on 𝒫¯\overline{\mathcal{P}}, and then obtain the Hamiltonian vector fields at these states. In view of (4.3), we set the regularization parameter λ=cNα\lambda=c\cdot N^{-\alpha} with α=0.4\alpha=0.4. We perform hyperparameter tuning and select the parameters η=0.9\eta=0.9 and c=104c=10^{-4}. Given that 𝒫¯\overline{\mathcal{P}} is 44-dimensional, we fix two of the dimensions, that is, either (θ2,φ2)=(π2,0)(\theta_{2},\varphi_{2})=(\frac{\pi}{2},0) or (θ1,θ2)=(π3,π3)(\theta_{1},\theta_{2})=(\frac{\pi}{3},\frac{\pi}{3}), and visualize both the ground-truth Hamiltonian functions HH and the reconstructed Hamiltonian functions h^λ,N\widehat{h}_{\lambda,N} in the remaining 22 dimensions, see Figure 5.4 (a)(b) and (c)(d). We also visualize the absolute error of the predicted Hamiltonian function in a heatmap, after a vertical shift to offset any potential constant, see Figure 5.4 (e)(f). We also picture the Hamiltonian function globally on the first sphere and the second sphere respectively as heatmaps, while fixing (θ2,φ2)=(π2,0)(\theta_{2},\varphi_{2})=(\frac{\pi}{2},0) and (θ1,φ1)=(π2,0)(\theta_{1},\varphi_{1})=(\frac{\pi}{2},0) respectively on the other sphere, see Figure 5.5. Lastly, we picture the absolute error of the learned Hamiltonian in the setting of only one sphere and NN increased to 2000, see Figure 5.6, which demonstrates the enhanced quantitative performance of the algorithm as the number of data points becomes large.

Figure 5.4: Spherical 3-norm on S2×S2S^{2}\times S^{2}: (a)(b) Ground-truth Hamiltonian (c)(d) Learned Hamiltonian with N=1200N=1200 (e)(f) Absolute Error of the predicted Hamiltonian function adjusted by constant.
Figure 5.5: Global heatmap of the spherical 3-norm on S2×S2S^{2}\times S^{2} with N=1200N=1200: (a) left: ground-truth Hamiltonian on the first sphere; right: learned Hamiltonian on the first sphere adjusted by constant (b) left: ground-truth Hamiltonian on the second sphere; right: learned Hamiltonian on the second sphere adjusted by constant.
Figure 5.6: Spherical 3-norm on S2S^{2} with N=2000N=2000: Absolute Error of the predicted Hamiltonian function adjusted by constant.

5.2.4 Recovery of two-vortex dynamics

The Hamiltonian function of the two-vortex dynamics is

H(x1,x2)=λ1λ2log(1x1x2),H^{\prime}\left(x_{1},x_{2}\right)=-\lambda_{1}\lambda_{2}\log\left(1-x_{1}\cdot x_{2}\right), (5.13)

for x1,x23x_{1},x_{2}\in\mathbb{R}^{3}, and the Hamiltonian vector field XHX_{H^{\prime}} is given by

x˙j=XH(𝐱)j=ijλixi×xj1xixj.\dot{x}_{j}=X_{H^{\prime}}(\mathbf{x})_{j}=\sum_{i\neq j}\lambda_{i}\frac{x_{i}\times x_{j}}{1-x_{i}\cdot x_{j}}.

Moreover, in local coordinates, the Hamiltonian can be written as

H(θ1,φ1,θ2,φ2)=λ1λ2log(1sin(θ1)sin(θ2)cos(φ1φ2)cos(θ1)cos(θ2)),\displaystyle H(\theta_{1},\varphi_{1},\theta_{2},\varphi_{2})=-\lambda_{1}\lambda_{2}\log(1-\sin(\theta_{1})\sin(\theta_{2})\cos(\varphi_{1}-\varphi_{2})-\cos(\theta_{1})\cos(\theta_{2})),

and, as we have seen in (2.13), the Hamiltonian vector field can be locally written as

XH(θ1,φ1,θ2,φ2)=J𝒫(θ1,φ1,θ2,φ2)H(θ1,φ1,θ2,φ2).\displaystyle X_{H}(\theta_{1},\varphi_{1},\theta_{2},\varphi_{2})=J_{\mathcal{P}}(\theta_{1},\varphi_{1},\theta_{2},\varphi_{2})\nabla H(\theta_{1},\varphi_{1},\theta_{2},\varphi_{2}).
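For data generation, the local Hamiltonian and its vector field can be evaluated as in the following sketch, which reuses the J_P helper from Section 5.2.1, takes q as a NumPy array of the four local coordinates, and replaces the analytic gradient with central finite differences for brevity:

import numpy as np

def H_vortex(q, lam1=1.0, lam2=1.0):
    # two-vortex Hamiltonian in the local coordinates q = (theta1, phi1, theta2, phi2)
    t1, p1, t2, p2 = q
    return -lam1 * lam2 * np.log(1.0 - np.sin(t1) * np.sin(t2) * np.cos(p1 - p2)
                                 - np.cos(t1) * np.cos(t2))

def grad_H_vortex(q, lam1=1.0, lam2=1.0, h=1e-6):
    # central finite differences; an analytic gradient could be used instead
    g = np.zeros(4)
    for i in range(4):
        e = np.zeros(4)
        e[i] = h
        g[i] = (H_vortex(q + e, lam1, lam2) - H_vortex(q - e, lam1, lam2)) / (2.0 * h)
    return g

def X_H_vortex(q, lam1=1.0, lam2=1.0):
    # local Hamiltonian vector field X_H = J_P grad H, reusing J_P from Section 5.2.1
    return J_P(q[0], q[2], lam1, lam2) @ grad_H_vortex(q, lam1, lam2)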

Numerical results  For the numerical experiment, since the patch (V1×V1,ϕ1×ϕ1)(V_{1}\times V_{1},\phi_{1}\times\phi_{1}) is topologically dense in the space 𝒫\mathcal{P}, we shall assume, for computational convenience, that all the training data fall into this patch. We consider the standard Gaussian kernel with parameter η\eta on 6\mathbb{R}^{6} but restricted to 𝒫\mathcal{P}. Since the Gaussian kernel is positive-definite on the entire 6\mathbb{R}^{6}, it is also positive-definite when restricted to the manifold 𝒫\mathcal{P}, and hence constitutes a valid choice of kernel on 𝒫\mathcal{P}. We set N=1200N=1200 and sample NN states from the uniform distribution on 𝒫\mathcal{P} as elaborated above, and then obtain the Hamiltonian vector fields at these states. In view of (4.3), we set the regularization parameter λ=cNα\lambda=c\cdot N^{-\alpha} with α=0.4\alpha=0.4. We perform hyperparameter tuning and set η=0.7\eta=0.7 and c=0.01c=0.01. Given that 𝒫\mathcal{P} is 44-dimensional, we fix two of the dimensions at a time, that is, (θ2,φ2)=(π2,0)(\theta_{2},\varphi_{2})=(\frac{\pi}{2},0), (θ1,φ1)=(π2,π2)(\theta_{1},\varphi_{1})=(\frac{\pi}{2},\frac{\pi}{2}), and (θ1,θ2)=(π3,π3)(\theta_{1},\theta_{2})=(\frac{\pi}{3},\frac{\pi}{3}), and visualize the corresponding mean-square errors of the predicted Hamiltonian vector fields in heatmaps, see Figure 5.7 (a)(b)(c). Finally, we picture the Hamiltonian function globally on the first sphere as a heatmap, while fixing (θ2,φ2)=(π2,0)(\theta_{2},\varphi_{2})=(\frac{\pi}{2},0) on the second sphere, see Figure 5.8.

Figure 5.7: Two-vortex Hamiltonian dynamics: Squared error of the predicted Hamiltonian vector field. It is observed that the vector field prediction error only becomes significant around the singularities.
Figure 5.8: Global heatmap of the Two-vortex Hamiltonian with N=1200. Left: ground-truth Hamiltonian on the first sphere; right: learned Hamiltonian on the first sphere adjusted by constant.

Analysis  It can be seen that the estimator h^λ,N\widehat{h}_{\lambda,N} fully captures the qualitative behavior of the ground-truth Hamiltonian HH. Since HH exhibits singularities, by Theorem 3.2 (iii), HH cannot be in K\mathcal{H}_{K}. In this scenario, even though the Gaussian kernel restricted to the manifold 𝒫\mathcal{P} is universal, it only guarantees good approximation properties within the class of continuous functions, and not for functions with singularities.

6 Conclusion

This paper proposes a novel kernel-based machine learning method that offers a closed-form solution to the inverse problem of recovering a globally defined and potentially high-dimensional and nonlinear Hamiltonian function on Poisson manifolds. The method uses noisy observations of Hamiltonian vector fields on the phase space as data input. The approach is formulated as a kernel ridge regression problem, where the optimal candidate is sought within a reproducing kernel Hilbert space (RKHS) on the manifold, minimizing the discrepancy between the observed and candidate Hamiltonian vector fields (measured with respect to a chosen Riemannian metric) and an RKHS-based regularization term. Despite the complexities associated with optimization on a manifold, the problem remains convex, leading to an explicit solution derived by leveraging a “differential” version of the reproducing property on the manifold and considering the variations of the loss function.

The kernel method exhibits several key advantages. First, as we demonstrated in Section 3.1, the differential reproducing property allows the kernel ridge regression framework to naturally incorporate the differential equation constraints necessary for structure preservation, even on manifolds. Second, the method provides a closed-form solution, eliminating the computational burdens typically associated with iterative, gradient-based optimization. Third, the strict convexity of the Tikhonov-regularized kernel regression ensures a unique solution, addressing the ill-posedness inherent in recovering the Hamiltonian function in the presence of Poisson degeneracy and the potential presence of Casimir functions. Finally, we introduced an operator-theoretic framework and a kernel estimator representation that enable a rigorous analysis of estimation and approximation errors. Moreover, when the target function exhibits a symmetry that is a priori known, as is common in mechanical systems, an appropriate kernel can be selected to satisfy the symmetry constraints, further enhancing the method’s versatility, applicability, and structure-preservation features.

This paper tackles just one aspect of the broader class of structure-preserving inverse problems, focusing on extending prior approaches from Euclidean spaces to non-Euclidean manifolds. A significant challenge still lies in addressing systems defined in infinite-dimensional spaces, such as the space of measures. A notable example is the Schrödinger equation, which can be interpreted as a transport equation for measures under the Hamiltonian flow. Interestingly, this transport equation can also be viewed as a Hamiltonian flow within the Wasserstein space, referred to as the Wasserstein Hamiltonian flow. Extending the kernel-based framework to identify the potential function in the Schrödinger equation, thereby accommodating infinite-dimensional underlying spaces, is the focus of the authors’ research agenda.

The methodology in this paper assumes that Hamiltonian vector field data is directly available, which is a reasonable assumption when the observed data has a measurable physical interpretation, such as acceleration or force. However, in more realistic scenarios, the collected data might consist of discrete-time trajectories rather than continuous vector field observations. In this context, using variational integrators and other manifold-preserving techniques becomes crucial, yet these methods have not been fully explored in the existing literature in connection with this problem. Addressing this gap could open new avenues for applying structure-preserving kernel methods to a broader range of real-world problems.

As a final remark, a great wealth of knowledge has been accumulated on the qualitative behavior of Hamiltonian dynamical systems based on their geometry [Abra 78], symmetries [Mars 74, Orte 04], stability properties [Orte 05], or persistence [Mont 97b, Mont 97a, Orte 97] and bifurcation phenomena [Chos 02, Chos 03]. All these concepts surely have an interplay in relation to learnability that needs to be explored.

Table 1: Comparison among symplectic vector space, symplectic manifold, and Poisson manifold setups

  • Space: symplectic vector space (V,ωcan)(V,\omega_{\mathrm{can}}); symplectic manifold (M,ω)(M,\omega); Poisson manifold (P,{,})(P,\{\cdot,\cdot\}).
  • Hamiltonian vector field: Xh=JcanhX_{h}=J_{\mathrm{can}}\nabla h on a symplectic vector space; iXhω=𝐝hi_{X_{h}}\omega=\mathbf{d}h on a symplectic manifold; Xh=B𝐝hX_{h}=B^{\sharp}\mathbf{d}h on a Poisson manifold.
  • Compatible structure: Jcan:TVTVJ_{\mathrm{can}}:TV\to TV; J=ωg:TMTMJ=\omega^{\sharp}g^{\flat}:TM\to TM; J=Bg:TPTPJ=B^{\sharp}g^{\flat}:TP\to TP, respectively.
  • Differential Gram matrix: GN=𝕁can1,2K(N,N)𝕁canG_{N}=\mathbb{J}_{\mathrm{can}}\nabla_{1,2}K(\mathbb{Z}_{N},\mathbb{Z}_{N})\mathbb{J}_{\mathrm{can}}^{\top} on a symplectic vector space; GN𝐜:=XgN(𝐜,XK(𝐙N))(𝐙N)G_{N}\mathbf{c}:=X_{g_{N}(\mathbf{c},X_{K_{\cdot}}(\mathbf{Z}_{N}))}(\mathbf{Z}_{N}) on both symplectic and Poisson manifolds.
  • Estimator: 𝐗σ2,N(GN+λNI)1𝕁can1K(𝐙N,)\mathbf{X}_{\sigma^{2},N}^{\top}\left(G_{N}+\lambda NI\right)^{-1}\mathbb{J}_{\mathrm{can}}\nabla_{1}K(\mathbf{Z}_{N},\cdot) on a symplectic vector space; gN((GN+λNI)1𝐗σ2,N,XK(𝐙N))g_{N}\left((G_{N}+\lambda NI)^{-1}\mathbf{X}_{\sigma^{2},N},X_{K_{\cdot}}(\mathbf{Z}_{N})\right) on both symplectic and Poisson manifolds.

Appendix A Appendices

A.1 The proof of Theorem 3.3

We prove (i) and (ii) simultaneously by induction on ss. The case s=0s=0 is trivial since that means for any xMx\in M, D(0,0)K(x,)=KxD^{(0,0)}K(x,\cdot)=K_{x} satisfies the standard reproducing property in K\mathcal{H}_{K}. Let now 0ls10\leq l\leq s-1. Suppose that D(l,0)K(y,)vKD^{(l,0)}K(y,\cdot)\cdot v\in\mathcal{H}_{K} and (3.2) holds for any yTl1My\in T^{l-1}M and vTyTl1Mv\in T_{y}T^{l-1}M. Then (3.2) implies that for any 1k,kl11\leq k,k^{\prime}\leq l-1, xTkMx\in T^{k}M, uTxTkMu\in T_{x}T^{k}M, xTkMx^{\prime}\in T^{k^{\prime}}M, and uTxTkMu^{\prime}\in T_{x^{\prime}}T^{k^{\prime}}M,

D(k,0)K(x,)u,D(k,0)K(x,)uK=D(k,k)K(x,x)(u,u).\langle D^{(k,0)}K(x,\cdot)\cdot u,D^{(k^{\prime},0)}K(x^{\prime},\cdot)\cdot u^{\prime}\rangle_{\mathcal{H}_{K}}=D^{(k,k^{\prime})}K(x,x^{\prime})\cdot(u,u^{\prime}). (A.1)

Denote Kk(y,)=D(k,0)K(x,)uK^{k}(y,\cdot)=D^{(k,0)}K(x,\cdot)\cdot u and K(k,k)(y,y):=D(k,k)K(x,x)(u,u)K^{(k,k^{\prime})}(y,y^{\prime}):=D^{(k,k^{\prime})}K(x,x^{\prime})\cdot(u,u^{\prime}) with y=(x,u)y=(x,u) and y=(x,u)y^{\prime}=(x^{\prime},u^{\prime}).

Now, we turn to the case l+1sl+1\leq s. In the following proof, let ϕ:[1,1]TlM\phi:[-1,1]\to T^{l}M be a differentiable curve with ϕ(0)=yTlM\phi(0)=y\in T^{l}M and ϕ(0)=vTyTlM\phi^{\prime}(0)=v\in T_{y}T^{l}M. Since TlMT^{l}M with the Riemannian metric glg_{l} is complete (see Remark 2.14), by the Hopf-Rinow theorem, we can always choose ϕ\phi to be a geodesic with constant speed 1. We prove that (i) and (ii) hold for l+1l+1 in three steps.

Step 1: Proving D(l+1,0)K(y,)vKD^{(l+1,0)}K(y,\cdot)\cdot v\in\mathcal{H}_{K} for all yTlMy\in T^{l}M and vTyTlMv\in T_{y}T^{l}M. Equations (2.10) and (A.1) imply that the set {1t(Kl(ϕ(t))Kl(y)):|t|1}\{\frac{1}{t}(K^{l}(\phi(t))-K^{l}(y)):|t|\leq 1\} of functions in K\mathcal{H}_{K} satisfies

1t(\displaystyle\Big{\|}\frac{1}{t}( Kl(ϕ(t),)Kl(y,))2K=1t2(K(l,l)(ϕ(t),ϕ(t))K(l,l)(y,ϕ(t))K(l,l)(ϕ(t),y)+K(l,l)(y,y))\displaystyle K^{l}(\phi(t),\cdot)-K^{l}(y,\cdot))\Big{\|}^{2}_{\mathcal{H}_{K}}=\frac{1}{t^{2}}\Big{(}K^{(l,l)}(\phi(t),\phi(t))-K^{(l,l)}(y,\phi(t))-K^{(l,l)}(\phi(t),y)+K^{(l,l)}(y,y)\Big{)}
=1t20t(D(1,0)K(l,l)(ϕ(s),ϕ(t))ϕ(s)D(1,0)K(l,l)(ϕ(s),y)ϕ(s))ds\displaystyle=\frac{1}{t^{2}}\int_{0}^{t}\Big{(}D^{(1,0)}K^{(l,l)}(\phi(s),\phi(t))\cdot\phi^{\prime}(s)-D^{(1,0)}K^{(l,l)}(\phi(s),y)\cdot\phi^{\prime}(s)\Big{)}\mathrm{d}s
=1t20t0tD(1,1)K(l,l)(ϕ(s),ϕ(τ))(ϕ(s),ϕ(τ))dτds\displaystyle=\frac{1}{t^{2}}\int_{0}^{t}\int_{0}^{t}D^{(1,1)}K^{(l,l)}(\phi(s),\phi(\tau))\cdot(\phi^{\prime}(s),\phi^{\prime}(\tau))\mathrm{d}\tau\mathrm{d}s
1t2KCb2(l+1)0t0tϕ(τ)lϕ(s)ldτds=12KCb2(l+1),|t|1,\displaystyle\leq\frac{1}{t^{2}}\|K\|_{C_{b}^{2(l+1)}}\int_{0}^{t}\int_{0}^{t}\|\phi^{\prime}(\tau)\|_{l}\|\phi^{\prime}(s)\|_{l}\mathrm{d}\tau\mathrm{d}s=\frac{1}{2}\|K\|_{C_{b}^{2(l+1)}},\quad\forall~{}|t|\leq 1,

where l\|\cdot\|_{l} is induced by the Riemannian metric on TlMT^{l}M and we have used the assumption that KCb2s+1(M×M)K\in C_{b}^{2s+1}(M\times M). This means that {1t(Kl(ϕ(t))Kl(y)):|t|1}\{\frac{1}{t}(K^{l}(\phi(t))-K^{l}(y)):|t|\leq 1\} lies in a closed ball of the Hilbert space K\mathcal{H}_{K} with a finite radius. Since this ball is sequentially weakly compact (see [Conw 07, Theorem 4.2]), there is a sequence {ti}i=1\{t_{i}\}_{i=1}^{\infty} with |ti|1|t_{i}|\leq 1 and limiti=0\lim_{i\rightarrow\infty}t_{i}=0 such that {1ti(Kl(ϕ(ti),)Kl(y,)):|ti|1}\{\frac{1}{t_{i}}(K^{l}(\phi(t_{i}),\cdot)-K^{l}(y,\cdot)):|t_{i}|\leq 1\} converges weakly to an element hyh_{y} of K\mathcal{H}_{K} as ii\rightarrow\infty. The weak convergence tells us that

limi1ti(Kl(ϕ(ti),)Kl(y,)),fK=hy,fK,fK.\lim\limits_{i\rightarrow\infty}\Big{\langle}\frac{1}{t_{i}}(K^{l}(\phi(t_{i}),\cdot)-K^{l}(y,\cdot)),f\Big{\rangle}_{\mathcal{H}_{K}}=\langle h_{y},f\rangle_{\mathcal{H}_{K}},\quad\forall f\in\mathcal{H}_{K}. (A.2)

In particular, by taking f=K(x,)f=K(x,\cdot) with xMx\in M, it holds that

hy(x)\displaystyle h_{y}(x) =limi1ti(Kl(ϕ(ti),)Kl(y,)),K(x,)K\displaystyle=\lim\limits_{i\rightarrow\infty}\Big{\langle}\frac{1}{t_{i}}(K^{l}(\phi(t_{i}),\cdot)-K^{l}(y,\cdot)),K(x,\cdot)\Big{\rangle}_{\mathcal{H}_{K}}
=limi1ti(K(l,0)(ϕ(ti),x)K(l,0)K(y,x))=D(1,0)K(l,0)(y,x)v=D(l+1,0)K(y,x)v.\displaystyle=\lim\limits_{i\rightarrow\infty}\frac{1}{t_{i}}(K^{(l,0)}(\phi(t_{i}),x)-K^{(l,0)}K(y,x))=D^{(1,0)}K^{(l,0)}(y,x)\cdot v=D^{(l+1,0)}K(y,x)\cdot v.

This is true for an arbitrary point xMx\in M. Hence D(l+1,0)K(y,)v=hyD^{(l+1,0)}K(y,\cdot)\cdot v=h_{y} as functions on MM. Since hyKh_{y}\in\mathcal{H}_{K}, we know D(l+1,0)K(y,)vKD^{(l+1,0)}K(y,\cdot)\cdot v\in\mathcal{H}_{K}.

Step 2: Proving the convergence

1t(Kl(ϕ(t),)Kl(y,))D(l+1,0)K(y,)v,inK.\displaystyle\frac{1}{t}\Big{(}K^{l}(\phi(t),\cdot)-K^{l}(y,\cdot)\Big{)}\longrightarrow D^{(l+1,0)}K(y,\cdot)\cdot v,\quad\text{in}\quad\mathcal{H}_{K}. (A.3)

Using the notation introduced above, Kl+1((y,v),)=D(l+1,0)K(y,)vKK^{l+1}((y,v),\cdot)=D^{(l+1,0)}K(y,\cdot)\cdot v\in\mathcal{H}_{K}. Then equation (A.2) yields

Kl+1((y,v),),Kl+1((y,v),)K\displaystyle\left\langle K^{l+1}((y,v),\cdot),K^{l+1}((y,v),\cdot)\right\rangle_{\mathcal{H}_{K}} =limi1tiKl+1((y,v),),Kl(ϕ(ti),)Kl(y,)K\displaystyle=\lim\limits_{i\rightarrow\infty}\frac{1}{t_{i}}\left\langle K^{l+1}((y,v),\cdot),K^{l}(\phi(t_{i}),\cdot)-K^{l}(y,\cdot)\right\rangle_{\mathcal{H}_{K}}
=\displaystyle=\ limi1ti(D(1,0)K(l,l)(y,ϕ(ti))vD(1,0)K(l,l)(y,y)v)\displaystyle\lim\limits_{i\rightarrow\infty}\frac{1}{t_{i}}\left(D^{(1,0)}K^{(l,l)}(y,\phi(t_{i}))\cdot v-D^{(1,0)}K^{(l,l)}(y,y)\cdot v\right)
=\displaystyle=\ D(1,1)K(l,l)(y,y)(v,v)=D(l+1,l+1)K(y,y)(v,v).\displaystyle D^{(1,1)}K^{(l,l)}(y,y)\cdot(v,v)=D^{(l+1,l+1)}K(y,y)\cdot(v,v).

By equation (2.10), we obtain that

1t(Kl(ϕ(t),)Kl(y,))Kl+1((y,v),)K2\displaystyle\Big{\|}\frac{1}{t}\Big{(}K^{l}(\phi(t),\cdot)-K^{l}(y,\cdot)\Big{)}-K^{l+1}((y,v),\cdot)\Big{\|}^{2}_{\mathcal{H}_{K}}
=\displaystyle=~{} 1t2(K(l,l)(ϕ(t),ϕ(t))2K(l,l)(ϕ(t),y)+K(l,l)(y,y))+K(l+1,l+1)((y,v),(y,v))\displaystyle\frac{1}{t^{2}}\Big{(}K^{(l,l)}(\phi(t),\phi(t))-2K^{(l,l)}(\phi(t),y)+K^{(l,l)}(y,y)\Big{)}+K^{(l+1,l+1)}((y,v),(y,v))
2t(K(l+1,l)((y,v),ϕ(t))K(l+1,l)((y,v),y))\displaystyle-\frac{2}{t}\Big{(}K^{(l+1,l)}((y,v),\phi(t))-K^{(l+1,l)}((y,v),y)\Big{)}
=\displaystyle=~{} 1t20t0tK(l+1,l+1)((ϕ(s),ϕ(s)),(ϕ(τ),ϕ(τ)))dτds+K(l+1,l+1)((y,v),(y,v))\displaystyle\frac{1}{t^{2}}\int_{0}^{t}\int_{0}^{t}K^{(l+1,l+1)}((\phi(s),\phi^{\prime}(s)),(\phi(\tau),\phi^{\prime}(\tau)))\mathrm{d}\tau\mathrm{d}s+K^{(l+1,l+1)}((y,v),(y,v))
2t0tK(l+1,l+1)((ϕ(τ),ϕ(τ)),(u,v))dτ\displaystyle-\frac{2}{t}\int_{0}^{t}K^{(l+1,l+1)}((\phi(\tau),\phi^{\prime}(\tau)),(u,v))~{}\mathrm{d}\tau
=\displaystyle=~{} 1t20t0t[K(l+1,l+1)((ϕ(s),ϕ(s)),(ϕ(τ),ϕ(τ))K(l+1,l+1)((ϕ(s),ϕ(s)),(y,v))]dτds\displaystyle\frac{1}{t^{2}}\int_{0}^{t}\int_{0}^{t}\left[K^{(l+1,l+1)}((\phi(s),\phi^{\prime}(s)),(\phi(\tau),\phi^{\prime}(\tau))-K^{(l+1,l+1)}((\phi(s),\phi^{\prime}(s)),(y,v))\right]\mathrm{d}\tau\mathrm{d}s
1t0t[K(l+1,l+1)((ϕ(τ),ϕ(τ)),(y,v))K(l+1,l+1)((y,v),(y,v))]dτ\displaystyle-\frac{1}{t}\int_{0}^{t}\left[K^{(l+1,l+1)}((\phi(\tau),\phi^{\prime}(\tau)),(y,v))-K^{(l+1,l+1)}((y,v),(y,v))\right]\mathrm{d}\tau
=\displaystyle=~{} 1t20t0t0τD(0,1)K(l+1,l+1)((ϕ(s),ϕ(s)),(ϕ(z),ϕ(z)))ψ(z)dzdτds\displaystyle\frac{1}{t^{2}}\int_{0}^{t}\int_{0}^{t}\int_{0}^{\tau}D^{(0,1)}K^{(l+1,l+1)}((\phi(s),\phi^{\prime}(s)),(\phi(z),\phi^{\prime}(z)))\cdot\psi^{\prime}(z)\mathrm{d}z\mathrm{d}\tau\mathrm{d}s
1t0t0τD(1,0)K(l+1,l+1)((ϕ(z),ϕ(z)),(y,v))ψ(z)dzdτ\displaystyle-\frac{1}{t}\int_{0}^{t}\int_{0}^{\tau}D^{(1,0)}K^{(l+1,l+1)}((\phi(z),\phi^{\prime}(z)),(y,v))\cdot\psi^{\prime}(z)\mathrm{d}z\mathrm{d}\tau
=\displaystyle=~{} 1t20t0t0τD(l+1,l+2)K(x,x)((ϕ(s),ϕ(s)),(ϕ(z),ϕ(z)))ψ(z)dzdτds\displaystyle\frac{1}{t^{2}}\int_{0}^{t}\int_{0}^{t}\int_{0}^{\tau}D^{(l+1,l+2)}K(x,x)\cdot((\phi(s),\phi^{\prime}(s)),(\phi(z),\phi^{\prime}(z)))\cdot\psi^{\prime}(z)\mathrm{d}z\mathrm{d}\tau\mathrm{d}s
1t0t0τD(l+2,l+1)K(x,x)((ϕ(z),ϕ(z)),(y,v))ψ(z)dzdτ\displaystyle-\frac{1}{t}\int_{0}^{t}\int_{0}^{\tau}D^{(l+2,l+1)}K(x,x)\cdot((\phi(z),\phi^{\prime}(z)),(y,v))\cdot\psi^{\prime}(z)\mathrm{d}z\mathrm{d}\tau
\displaystyle\leq~{} 1t20t0t0τD(l+1,l+2)Kψ(z)l+1dzdτds+1t0t0sD(l+2,l+1)Kψ(z)l+1dzds\displaystyle\frac{1}{t^{2}}\int_{0}^{t}\int_{0}^{t}\int_{0}^{\tau}\|D^{(l+1,l+2)}K\|_{\infty}\|\psi^{\prime}(z)\|_{l+1}\mathrm{d}z\mathrm{d}\tau\mathrm{d}s+\frac{1}{t}\int_{0}^{t}\int_{0}^{s}\|D^{(l+2,l+1)}K\|_{\infty}\|\psi^{\prime}(z)\|_{l+1}\mathrm{d}z\mathrm{d}s
=\displaystyle=~{} 16D(l+1,l+2)Kt+12D(l+2,l+1)Kt12KCb2l+3t12KCb2s+1t,\displaystyle\frac{1}{6}\|D^{(l+1,l+2)}K\|_{\infty}~{}t+\frac{1}{2}\|D^{(l+2,l+1)}K\|_{\infty}~{}t\leq\frac{1}{2}\|K\|_{C_{b}^{2l+3}}~{}t\leq\frac{1}{2}\|K\|_{C_{b}^{2s+1}}~{}t,

where ψ:[0,t]Tl+1M\psi:[0,t]\to T^{l+1}M is chosen as a geodesic with constant speed 1 such that ψ(0)=(y,v)\psi(0)=(y,v) and ψ(z)=(ϕ(z),ϕ(z))\psi(z)=(\phi(z),\phi^{\prime}(z)), the second and fourth equalities are due to formula (2.10), and the last two inequalities follow from KCb2s+1(M×M)K\in C_{b}^{2s+1}(M\times M) and 2l+32s+12l+3\leq 2s+1. As t0t\rightarrow 0, (A.3) follows.

Step 3: Proving (3.2). Let fKf\in\mathcal{H}_{K}. By (A.3) we have

Dl+1f(y)v\displaystyle D^{l+1}f(y)\cdot v =limt01t(Dlf(ϕ(t))Dlf(y))=limt01t(Kl(ϕ(t),)Kl(y,)),fK\displaystyle=\lim\limits_{t\rightarrow 0}\frac{1}{t}\left(D^{l}f(\phi(t))-D^{l}f(y)\right)=\lim\limits_{t\rightarrow 0}\Big{\langle}\frac{1}{t}\left(K^{l}(\phi(t),\cdot)-K^{l}(y,\cdot)\right),f\Big{\rangle}_{\mathcal{H}_{K}}
=K(l+1,0)((y,v),),fK=D(l+1,0)K(y,)v,fK.\displaystyle=\langle K^{(l+1,0)}((y,v),\cdot),f\rangle_{\mathcal{H}_{K}}=\langle D^{(l+1,0)}K(y,\cdot)\cdot v,f\rangle_{\mathcal{H}_{K}}.

That is, Dl+1f(y)vD^{l+1}f(y)\cdot v exists and equals D(l+1,0)K(y,)v,fK\langle D^{(l+1,0)}K(y,\cdot)\cdot v,f\rangle_{\mathcal{H}_{K}}. This verifies (3.2) for l+1l+1. Hence, parts (i) and (ii) follow.
We conclude by proving part (iii) using (3.2) and (A.1). For fKf\in\mathcal{H}_{K}, and xMx\in M, the Cauchy-Schwarz inequality implies that,

|f(x)|=|Kx,fK|K(x,x)fKKfK,\displaystyle|f(x)|=|\langle K_{x},f\rangle_{\mathcal{H}_{K}}|\leq\sqrt{K(x,x)}~{}\|f\|_{\mathcal{H}_{K}}\leq\sqrt{\|K\|_{\infty}}~{}\|f\|_{\mathcal{H}_{K}},

and for all 1ks1\leq k\leq s, yTk1My\in T^{k-1}M and vTyTk1Mv\in T_{y}T^{k-1}M,

|Dkf(y)v|\displaystyle|D^{k}f(y)\cdot v| =|(D(k,0)K(y,)v,fK|D(k,k)K(y,y)(v,v)fKD(k,k)Kvk1fK.\displaystyle=|\langle(D^{(k,0)}K(y,\cdot)\cdot v,f\rangle_{\mathcal{H}_{K}}|\leq\sqrt{D^{(k,k)}K(y,y)\cdot(v,v)}~{}\|f\|_{\mathcal{H}_{K}}\leq\sqrt{\|D^{(k,k)}K\|_{\infty}}~{}\|v\|_{k-1}\|f\|_{\mathcal{H}_{K}}.

Therefore, fCbk(M)f\in C^{k}_{b}(M) and DkfD(k,k)KfK\|D^{k}f\|_{\infty}\leq\sqrt{\|D^{(k,k)}K\|_{\infty}}~{}\|f\|_{\mathcal{H}_{K}}. Denote κ=(s+1)KCb2\kappa=\sqrt{(s+1)\|K\|_{C_{b}^{2}}}. It follows that

fCbs\displaystyle\|f\|_{C^{s}_{b}} =f+k=1sDkf(K+k=1sD(k,k)K)fK\displaystyle=\|f\|_{\infty}+\sum_{k=1}^{s}\|D^{k}f\|_{\infty}\leq\left(\sqrt{\|K\|_{\infty}}+\sum_{k=1}^{s}\sqrt{\|D^{(k,k)}K\|_{\infty}}\right)~{}\|f\|_{\mathcal{H}_{K}}
(s+1)(K+k=1sD(k,k)K)fK(s+1)KCb2fK=κfK.\displaystyle\leq\sqrt{(s+1)\left(\|K\|_{\infty}+\sum_{k=1}^{s}\|D^{(k,k)}K\|_{\infty}\right)}~{}\|f\|_{\mathcal{H}_{K}}\leq\sqrt{(s+1)\|K\|_{C_{b}^{2}}}~{}\|f\|_{\mathcal{H}_{K}}=\kappa\|f\|_{\mathcal{H}_{K}}.

A.2 The proof of Proposition 3.8

Since KCb3(P×P)K\in C_{b}^{3}(P\times P), Theorem 3.3 yields that for any hKh\in\mathcal{H}_{K} and zPz\in P,

g(z)(h(z),h(z))hCb1(g(z)(h(z),h(z)))12κhK(g(z)(h(z),h(z)))12,\displaystyle g(z)(\nabla h(z),\nabla h(z))\leq\|h\|_{C_{b}^{1}}\left(g(z)(\nabla h(z),\nabla h(z))\right)^{\frac{1}{2}}\leq\kappa\|h\|_{\mathcal{H}_{K}}\left(g(z)(\nabla h(z),\nabla h(z))\right)^{\frac{1}{2}},

which implies that g(z)(h(z),h(z))κ2hK2g(z)(\nabla h(z),\nabla h(z))\leq\kappa^{2}\|h\|_{\mathcal{H}_{K}}^{2}. Then by Assumption 3.7 and equation (2.13), we have that

g(z)(Xh(z),Xh(z))=g(z)(J(z)h(z),J(z)h(z))γ(z)g(z)(h(z),h(z))κ2γ(z)hK2.\displaystyle g(z)(X_{h}(z),X_{h}(z))=g(z)(J(z)\nabla h(z),J(z)\nabla h(z))\leq\gamma(z)g(z)(\nabla h(z),\nabla h(z))\leq\kappa^{2}\gamma(z)\|h\|_{\mathcal{H}_{K}}^{2}. (A.4)

Thus, AA is a bounded linear operator from K\mathcal{H}_{K} to L2(P,μ𝐙)L^{2}(P,\mu_{\mathbf{Z}}), since

AhL2(μ𝐙)2=Pg(z)(Xh(z),Xh(z))dμ𝐙(z)Pκ2γ(z)hK2dμ𝐙(z)κ2C2hK2.\displaystyle\|Ah\|^{2}_{L^{2}(\mu_{\mathbf{Z}})}=\int_{P}g(z)(X_{h}(z),X_{h}(z))\mathrm{d}\mu_{\mathbf{Z}}(z)\leq\int_{P}\kappa^{2}\gamma(z)\|h\|_{\mathcal{H}_{K}}^{2}\mathrm{d}\mu_{\mathbf{Z}}(z)\leq\kappa^{2}C^{2}\|h\|_{\mathcal{H}_{K}}^{2}.

Furthermore, for any vector field YL2(P,μ𝐙)Y\in L^{2}(P,\mu_{\mathbf{Z}}), Corollary 3.5 and Proposition 2.18 imply that

Ah,YL2(μ𝐙)\displaystyle\langle Ah,Y\rangle_{L^{2}(\mu_{\mathbf{Z}})} =Pg(z)(Xh(z),Y(z))dμ𝐙(z)=Pg(z)(h(z),J(z)Y(z))dμ𝐙(z)\displaystyle=\int_{P}g(z)(X_{h}(z),Y(z))\mathrm{d}\mu_{\mathbf{Z}}(z)=\int_{P}g(z)\left(\nabla h(z),-J(z)Y(z)\right)\mathrm{d}\mu_{\mathbf{Z}}(z)
=P(JY)[h](z)dμ𝐙(z)=P(JY)[h,KK](z)dμ𝐙(z)\displaystyle=\int_{P}(-JY)[h](z)\mathrm{d}\mu_{\mathbf{Z}}(z)=\int_{P}(-JY)[\langle h,K_{\cdot}\rangle_{\mathcal{H}_{K}}](z)\mathrm{d}\mu_{\mathbf{Z}}(z)
=Pddtt=0[h,KFt(z)K]dμ𝐙(z)=h,P(JY)[K](z)dμ𝐙(z)K.\displaystyle=\int_{P}\frac{d}{dt}\mid_{t=0}[\langle h,K_{F_{t}(z)}\rangle_{\mathcal{H}_{K}}]\mathrm{d}\mu_{\mathbf{Z}}(z)=\left\langle h,\int_{P}(-JY)[K_{\cdot}](z)\mathrm{d}\mu_{\mathbf{Z}}(z)\right\rangle_{\mathcal{H}_{K}}.

where FtF_{t} is the flow of the vector field JY-JY and Remark 3.4(i) shows that (JY)[K](z)K(-JY)[K_{\cdot}](z)\in\mathcal{H}_{K}. This leads to

(AY)(y)=P(JY)[Ky](z)dμ𝐙(z)=Pg(z)(XKy(z),Y(z))dμ𝐙(z).\displaystyle(A^{*}Y)(y)=\int_{P}(-JY)[K_{y}](z)\mathrm{d}\mu_{\mathbf{Z}}(z)=\int_{P}g(z)(X_{K_{y}}(z),Y(z))\mathrm{d}\mu_{\mathbf{Z}}(z).

Hence, for each hKh\in\mathcal{H}_{K},

(Qh)(y):\displaystyle(Qh)(y): =(AAh)(y)=(AXh)(y)=Pg(z)(XKy(z),Xh(z))dμ𝐙(z).\displaystyle=(A^{*}Ah)(y)=(A^{*}X_{h})(y)=\int_{P}g(z)(X_{K_{y}}(z),X_{h}(z))\mathrm{d}\mu_{\mathbf{Z}}(z).

We now show that QQ is a trace class operator; that is, we show that Tr(|Q|)<\operatorname{Tr}(|Q|)<\infty, where |Q|=QQ|Q|=\sqrt{Q^{*}Q}. The form of the operator Q=AAQ=A^{*}A automatically guarantees that it is positive semidefinite, and hence we have that |Q|=Q|Q|=Q. Therefore, it is equivalent to show that Tr(Q)<\operatorname{Tr}(Q)<\infty. To do that, we choose a countable spanning orthonormal set {en}n\left\{e_{n}\right\}_{n\in\mathbb{N}} for K{\mathcal{H}}_{K} whose existence is guaranteed by the continuity of the canonical feature map associated to KK that we established in [Hu 24, Lemma A.3] and [Owha 17, Theorem 2.4]. Then,

Tr(Q)\displaystyle\operatorname{Tr}(Q) =Tr(AA)=nAAen,enK=nAen,AenL2(μ𝐙)\displaystyle=\operatorname{Tr}\left(A^{*}A\right)=\sum_{n}\left\langle A^{*}Ae_{n},e_{n}\right\rangle_{\mathcal{H}_{K}}=\sum_{n}\left\langle Ae_{n},Ae_{n}\right\rangle_{L^{2}\left(\mu_{\mathbf{Z}}\right)}
=nPg(z)(Xen(z),Xen(z))dμ𝐙(𝐱)\displaystyle=\sum_{n}\int_{P}g(z)(X_{e_{n}}(z),X_{e_{n}}(z))\mathrm{d}\mu_{\mathbf{Z}}(\mathbf{x})
C2nPg(z)(en(z),en(z))dμ𝐙(𝐱)\displaystyle\leq C^{2}\sum_{n}\int_{P}g(z)(\nabla{e_{n}}(z),\nabla{e_{n}}(z))\mathrm{d}\mu_{\mathbf{Z}}(\mathbf{x})
dC2KCb2=12dC2κ2<,\displaystyle\leq dC^{2}\|K\|_{C_{b}^{2}}=\frac{1}{2}dC^{2}\kappa^{2}<\infty,

where the last inequality is derived by choosing an arbitrary orthonormal basis {vi}i=1d\{v_{i}\}_{i=1}^{d} of the vector space TzPT_{z}P as follows.

ng(z)(en(z),en(z))\displaystyle\sum_{n}g(z)(\nabla{e_{n}}(z),\nabla{e_{n}}(z)) =nDen(z)en(z)=nD(1,0)K(z,)en(z),enK\displaystyle=\sum_{n}De_{n}(z)\cdot\nabla{e_{n}}(z)=\sum_{n}\langle D^{(1,0)}K(z,\cdot)\cdot\nabla{e_{n}}(z),e_{n}\rangle_{\mathcal{H}_{K}}
=nD(1,0)K(z,)(i=1dg(z)(en(z),vi)vi),enK\displaystyle=\sum_{n}\left\langle D^{(1,0)}K(z,\cdot)\cdot\left(\sum_{i=1}^{d}g(z)(\nabla{e_{n}}(z),v_{i})v_{i}\right),e_{n}\right\rangle_{\mathcal{H}_{K}}
=ni=1dD(1,0)K(z,)vi,g(z)(en(z),vi)enK\displaystyle=\sum_{n}\sum_{i=1}^{d}\left\langle D^{(1,0)}K(z,\cdot)\cdot v_{i},g(z)(\nabla{e_{n}}(z),v_{i})e_{n}\right\rangle_{\mathcal{H}_{K}}
=i=1dD(1,0)K(z,)vi,n(Den(z)vi)enK\displaystyle=\sum_{i=1}^{d}\left\langle D^{(1,0)}K(z,\cdot)\cdot v_{i},\sum_{n}(D{e_{n}}(z)\cdot v_{i})e_{n}\right\rangle_{\mathcal{H}_{K}}
=i=1dD(1,0)K(z,)vi,nD(1,0)K(z,)vi,enKenK\displaystyle=\sum_{i=1}^{d}\left\langle D^{(1,0)}K(z,\cdot)\cdot v_{i},\sum_{n}\langle D^{(1,0)}K(z,\cdot)\cdot v_{i},e_{n}\rangle_{\mathcal{H}_{K}}e_{n}\right\rangle_{\mathcal{H}_{K}}
=i=1dD(1,0)K(z,)vi,D(1,0)K(z,)viK\displaystyle=\sum_{i=1}^{d}\left\langle D^{(1,0)}K(z,\cdot)\cdot v_{i},D^{(1,0)}K(z,\cdot)\cdot v_{i}\right\rangle_{\mathcal{H}_{K}}
=i=1dD(1,1)K(z,z)(vi,vi)dKCb2.\displaystyle=\sum_{i=1}^{d}D^{(1,1)}K(z,z)\cdot(v_{i},v_{i})\leq d\|K\|_{C_{b}^{2}}.

A.3 The proof of Proposition 3.12

Firstly, we prove the symmetry of GNG_{N}. It is sufficient to show that gN(𝐜,GN𝐜)=gN(𝐜,GN𝐜)g_{N}(\mathbf{c},G_{N}\mathbf{c}^{\prime})=g_{N}(\mathbf{c}^{\prime},G_{N}\mathbf{c}) for 𝐜,𝐜T𝐙NP\mathbf{c},\mathbf{c}^{\prime}\in T_{\mathbf{Z}_{N}}P arbitrary. Denote 𝕁N=diag{J(𝐙(1)),,J(𝐙(N))}\mathbb{J}_{N}=\operatorname{diag}\{J(\mathbf{Z}^{(1)}),\ldots,J(\mathbf{Z}^{(N)})\}. Then by Remark 3.4(i), we obtain that for any 𝐜T𝐙NP\mathbf{c}\in T_{\mathbf{Z}_{N}}P,

gN(𝐜,XK(𝐙N))=D(1,0)K(𝐙N,)(𝕁N𝐜).\displaystyle g_{N}(\mathbf{c},X_{K_{\cdot}}(\mathbf{Z}_{N}))=D^{(1,0)}K(\mathbf{Z}_{N},\cdot)\cdot\left(-\mathbb{J}_{N}\mathbf{c}\right).

Therefore, again by Remark 3.4(i), we have that

gN(𝐜,GN𝐜)=gN(𝐜,XD(1,0)K(𝐙N,)(𝕁N𝐜)(𝐙N))=D(1,1)K(𝐙N,𝐙N)(𝕁N𝐜,𝕁N𝐜)\displaystyle g_{N}(\mathbf{c},G_{N}\mathbf{c}^{\prime})=g_{N}(\mathbf{c},X_{D^{(1,0)}K(\mathbf{Z}_{N},\cdot)\cdot\left(-\mathbb{J}_{N}\mathbf{c}^{\prime}\right)}(\mathbf{Z}_{N}))=D^{(1,1)}K(\mathbf{Z}_{N},\mathbf{Z}_{N})(-\mathbb{J}_{N}\mathbf{c}^{\prime},-\mathbb{J}_{N}\mathbf{c})

Since KK is a symmetric function, D(1,1)K(𝐙N,𝐙N)D^{(1,1)}K(\mathbf{Z}_{N},\mathbf{Z}_{N}) is also symmetric by definition. This proves that GNG_{N} is symmetric.

We now show the positive semi-definiteness of GNG_{N}. Let L=dNL=dN. Since GNG_{N} is real symmetric, there exists an orthogonal matrix OT𝐙NP×T𝐙NPO\in T_{\mathbf{Z}_{N}}P\times T_{\mathbf{Z}_{N}}P that diagonalizes GNG_{N}, that is,

GN=ODO=[|||f1f2fL|||][d1000d2000dL][|||f1f2fL|||],\displaystyle G_{N}=ODO^{\top}=\begin{bmatrix}|&|&\dots&|\\ f_{1}&f_{2}&\dots&f_{L}\\ |&|&\dots&|\end{bmatrix}\begin{bmatrix}d_{1}&0&\dots&0\\ 0&d_{2}&\dots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\dots&d_{L}\end{bmatrix}\begin{bmatrix}|&|&\dots&|\\ f_{1}&f_{2}&\dots&f_{L}\\ |&|&\dots&|\end{bmatrix}^{\top},

where {di}i=1L\left\{d_{i}\right\}_{i=1}^{L} and {fi}i=1L\left\{f_{i}\right\}_{i=1}^{L} are the real eigenvalues and the corresponding eigenvectors of GNG_{N}, respectively. We now define e~i=gN(fi,XK(𝐙N))\widetilde{e}_{i}=g_{N}\left(f_{i},X_{K_{\cdot}}(\mathbf{Z}_{N})\right). By Remark 3.4(i), we obtain that e~iK\widetilde{e}_{i}\in\mathcal{H}_{K} and that

e~iK2\displaystyle\|\widetilde{e}_{i}\|_{\mathcal{H}_{K}}^{2} =gN(fi,XK(𝐙N)),gN(fi,XK(𝐙N))K\displaystyle=\left\langle g_{N}\left(f_{i},X_{K_{\cdot}}(\mathbf{Z}_{N})\right),g_{N}\left(f_{i},X_{K_{\cdot}}(\mathbf{Z}_{N})\right)\right\rangle_{\mathcal{H}_{K}}
=D(1,0)K(𝐙N,)(𝕁Nfi),D(1,0)K(𝐙N,)(𝕁Nfi)K\displaystyle=\left\langle D^{(1,0)}K(\mathbf{Z}_{N},\cdot)\cdot(-\mathbb{J}_{N}f_{i}),D^{(1,0)}K(\mathbf{Z}_{N},\cdot)\cdot(-\mathbb{J}_{N}f_{i})\right\rangle_{\mathcal{H}_{K}}
=D(1,1)K(𝐙N,𝐙N)(𝕁Nfi,𝕁Nfi)=gN(fi,GNfi)=di.\displaystyle=D^{(1,1)}K(\mathbf{Z}_{N},\mathbf{Z}_{N})(-\mathbb{J}_{N}f_{i},-\mathbb{J}_{N}f_{i})=g_{N}(f_{i},G_{N}f_{i})=d_{i}.

Hence, di0d_{i}\geq 0 for all i=1,,Li=1,\ldots,L and we can conclude that GNG_{N} is positive semi-definite.

A.4 The proof of Lemma 4.1

Proof.

Since KCb3(P×P)K\in C_{b}^{3}(P\times P), then for any hKh\in\mathcal{H}_{K}, we have

QNhK2\displaystyle\|Q_{N}h\|_{\mathcal{H}_{K}}^{2} =1NgN(Xh(𝐙N),XK(𝐙N))K2=1ND(1,0)K(𝐙N,)(JXh)(𝐙N)K2\displaystyle=\frac{1}{N}\|g_{N}(X_{h}(\mathbf{Z}_{N}),X_{K_{\cdot}}(\mathbf{Z}_{N}))\|^{2}_{\mathcal{H}_{K}}=\frac{1}{N}\|D^{(1,0)}K(\mathbf{Z}_{N},\cdot)\cdot(-JX_{h})(\mathbf{Z}_{N})\|^{2}_{\mathcal{H}_{K}}
=1ND(1,1)K(𝐙N,𝐙N)((JXh)(𝐙N),(JXh)(𝐙N))\displaystyle=\frac{1}{N}D^{(1,1)}K(\mathbf{Z}_{N},\mathbf{Z}_{N})\cdot((-JX_{h})(\mathbf{Z}_{N}),(-JX_{h})(\mathbf{Z}_{N}))
1ND(1,1)KgN(JXh(𝐙N),JXh(𝐙N))C22Nκ2gN(Xh(𝐙N),Xh(𝐙N))\displaystyle\leq\frac{1}{N}\|D^{(1,1)}K\|_{\infty}g_{N}(-JX_{h}(\mathbf{Z}_{N}),-JX_{h}(\mathbf{Z}_{N}))\leq\frac{C^{2}}{2N}\kappa^{2}g_{N}(X_{h}(\mathbf{Z}_{N}),X_{h}(\mathbf{Z}_{N}))
12C4κ4hK2<,\displaystyle\leq\frac{1}{2}C^{4}\kappa^{4}\|h\|^{2}_{\mathcal{H}_{K}}<\infty,

which shows that QNhQ_{N}h is a bounded K\mathcal{H}_{K}-valued random variable for all N1N\geq 1. Moreover, we have 𝔼[QNhK2]12C4κ4hK2\mathbb{E}\left[\|Q_{N}h\|^{2}_{\mathcal{H}_{K}}\right]\leq\frac{1}{2}C^{4}\kappa^{4}\|h\|^{2}_{\mathcal{H}_{K}}. Define now the K\mathcal{H}_{K}-valued random variables

ξ(n)=g(𝐙(n))(Xh(𝐙(n)),XK(𝐙(n))),n=1,,N.\xi^{(n)}=g(\mathbf{Z}^{(n)})(X_{h}(\mathbf{Z}^{(n)}),X_{K_{\cdot}}(\mathbf{Z}^{(n)})),\quad\mbox{$n=1,\ldots,N$.}

Note that the random variables {ξ(n)}n=1N\{\xi^{(n)}\}_{n=1}^{N} are IID and that QNhQh=1Nn=1N(ξ(n)𝔼(ξ(n))).Q_{N}h-Qh=\frac{1}{N}\sum_{n=1}^{N}(\xi^{(n)}-\mathbb{E}(\xi^{(n)})). The result follows by applying [De V 05, Lemma 8] or [Yuri 95, Lemma A.2] to {ξ(n)}n=1N\{\xi^{(n)}\}_{n=1}^{N}. ∎

A.5 The proof of Lemma 4.2

Proof.

We first notice that, by the symmetry of GNG_{N}, for any 𝐜,𝐜T𝐙NP\mathbf{c},\mathbf{c}^{\prime}\in T_{\mathbf{Z}_{N}}P, it holds that

gN((GN+λNI)1𝐜,𝐜)=gN(𝐜,(GN+λNI)1𝐜).g_{N}((G_{N}+\lambda NI)^{-1}\mathbf{c},\mathbf{c}^{\prime})=g_{N}(\mathbf{c},(G_{N}+\lambda NI)^{-1}\mathbf{c}^{\prime}). (A.5)

Following an approach similar to the one in Theorem 3.13, we obtain that

h^λ,Nh~λ,NK2\displaystyle\big{\|}\widehat{h}_{\lambda,N}-\widetilde{h}_{\lambda,N}\big{\|}_{\mathcal{H}_{K}}^{2} =gN(XK(𝐙N),(GN+λNI)1𝐄N)K2\displaystyle=\|g_{N}(X_{K_{\cdot}}(\mathbf{Z}_{N}),(G_{N}+\lambda NI)^{-1}\mathbf{E}_{N})\|_{\mathcal{H}_{K}}^{2}
=D(1,1)K(𝐙N,𝐙N)(𝕁N(GN+λNI)1𝐄N,𝕁N(GN+λNI)1𝐄N)\displaystyle=D^{(1,1)}K(\mathbf{Z}_{N},\mathbf{Z}_{N})(-\mathbb{J}_{N}(G_{N}+\lambda NI)^{-1}\mathbf{E}_{N},-\mathbb{J}_{N}(G_{N}+\lambda NI)^{-1}\mathbf{E}_{N})
:=gN(𝐄N,ΣN𝐄N),\displaystyle:=g_{N}(\mathbf{E}_{N},\Sigma_{N}\mathbf{E}_{N}),

where GNG_{N} is defined in (3.15) and ΣN\Sigma_{N} is a matrix in T𝐙NP×T𝐙NPT_{\mathbf{Z}_{N}}P\times T_{\mathbf{Z}_{N}}P defined by

ΣN𝐜:=(GN+λNI)1XgN((GN+λNI)1𝐜,XK(𝐙N))(𝐙N),𝐜T𝐙NP.\displaystyle\Sigma_{N}\mathbf{c}:=(G_{N}+\lambda NI)^{-1}X_{g_{N}((G_{N}+\lambda NI)^{-1}\mathbf{c},X_{K_{\cdot}}(\mathbf{Z}_{N}))}(\mathbf{Z}_{N}),\quad\mathbf{c}\in T_{\mathbf{Z}_{N}}P.

Indeed, in light of (A.5), the above definition can be verified since

gN(𝐄N,ΣN𝐄N)\displaystyle g_{N}(\mathbf{E}_{N},\Sigma_{N}\mathbf{E}_{N}) =gN(𝐄N,(GN+λNI)1XgN((GN+λNI)1𝐄N,XK(𝐙N))(𝐙N))\displaystyle=g_{N}(\mathbf{E}_{N},(G_{N}+\lambda NI)^{-1}X_{g_{N}((G_{N}+\lambda NI)^{-1}\mathbf{E}_{N},X_{K_{\cdot}}(\mathbf{Z}_{N}))}(\mathbf{Z}_{N}))
=gN((GN+λNI)1𝐄N,XgN((GN+λNI)1𝐄N,XK(𝐙N))(𝐙N))\displaystyle=g_{N}((G_{N}+\lambda NI)^{-1}\mathbf{E}_{N},X_{g_{N}((G_{N}+\lambda NI)^{-1}\mathbf{E}_{N},X_{K_{\cdot}}(\mathbf{Z}_{N}))}(\mathbf{Z}_{N}))
=D(1,1)K(𝐙N,𝐙N)(𝕁N(GN+λNI)1𝐄N,𝕁N(GN+λNI)1𝐄N).\displaystyle=D^{(1,1)}K(\mathbf{Z}_{N},\mathbf{Z}_{N})(-\mathbb{J}_{N}(G_{N}+\lambda NI)^{-1}\mathbf{E}_{N},-\mathbb{J}_{N}(G_{N}+\lambda NI)^{-1}\mathbf{E}_{N}).

Choose a basis of the vector space T𝐙NPT_{\mathbf{Z}_{N}}P consisting of orthonormal eigenvectors {fi}i=1dN\{f_{i}\}_{i=1}^{dN} of GNG_{N}. Note that {fi}i=1dN\{f_{i}\}_{i=1}^{dN} are also eigenvectors of ΣN\Sigma_{N}. Then,

Tr(ΣN)\displaystyle\mathrm{Tr}(\Sigma_{N}) =i=1dNgN(fi,ΣNfi)1λ2N2i=1dND(1,1)K(𝐙N,𝐙N)(𝕁Nfi,𝕁Nfi)\displaystyle=\sum_{i=1}^{dN}g_{N}(f_{i},\Sigma_{N}f_{i})\leq\frac{1}{\lambda^{2}N^{2}}\sum_{i=1}^{dN}D^{(1,1)}K(\mathbf{Z}_{N},\mathbf{Z}_{N})(-\mathbb{J}_{N}f_{i},-\mathbb{J}_{N}f_{i})
1λ2N2i=1dND(1,1)KgN(𝕁Nfi,𝕁Nfi)\displaystyle\leq\frac{1}{\lambda^{2}N^{2}}\sum_{i=1}^{dN}\|D^{(1,1)}K\|_{\infty}\cdot g_{N}(-\mathbb{J}_{N}f_{i},-\mathbb{J}_{N}f_{i})
1λ2N2KCb2i=1dNgN(𝕁Nfi,𝕁Nfi)\displaystyle\leq\frac{1}{\lambda^{2}N^{2}}\|K\|_{C_{b}^{2}}\sum_{i=1}^{dN}g_{N}(-\mathbb{J}_{N}f_{i},-\mathbb{J}_{N}f_{i})
1λ2N2KCb2i=1dN(j=1Ng(Jfi(j),Jfi(j)))\displaystyle\leq\frac{1}{\lambda^{2}N^{2}}\|K\|_{C_{b}^{2}}\sum_{i=1}^{dN}\left(\sum_{j=1}^{N}g(-Jf^{(j)}_{i},-Jf^{(j)}_{i})\right)
1λ2N2KCb2i=1dN(j=1NC2g(fi(j),fi(j)))\displaystyle\leq\frac{1}{\lambda^{2}N^{2}}\|K\|_{C_{b}^{2}}\sum_{i=1}^{dN}\left(\sum_{j=1}^{N}C^{2}\cdot g(f^{(j)}_{i},f^{(j)}_{i})\right)
=C2λ2N2KCb2i=1dNgN(fi,fi)=dC2λ2NKCb2=κ2C2d2λ2N,\displaystyle=\frac{C^{2}}{\lambda^{2}N^{2}}\|K\|_{C_{b}^{2}}\sum_{i=1}^{dN}g_{N}(f_{i},f_{i})=\frac{dC^{2}}{\lambda^{2}N}\|K\|_{C_{b}^{2}}=\frac{\kappa^{2}C^{2}d}{2\lambda^{2}N},

with κ2=2KCb2\kappa^{2}=2\|K\|_{C_{b}^{2}} as introduced in the statement of Theorem 3.3. The rest of the proof follows from the same argument as in [Hu 24, Appendix C]. ∎

Acknowledgments

The authors thank Lyudmila Grigoryeva for helpful discussions and remarks and acknowledge partial financial support from the School of Physical and Mathematical Sciences of the Nanyang Technological University. DY is funded by a Nanyang President’s Graduate Scholarship of Nanyang Technological University.

References

  • [Abra 78] R. Abraham and J. E. Marsden. Foundations of Mechanics. Addison-Wesley, Reading, MA, 2nd Ed., 1978.
  • [Aron 50] N. Aronszajn. “Theory of reproducing kernels”. Transactions of the American mathematical society, Vol. 68, No. 3, pp. 337–404, 1950.
  • [Baja 23] J. Bajārs. “Locally-symplectic neural networks for learning volume-preserving dynamics”. Journal of Computational Physics, Vol. 476, p. 111911, 2023.
  • [Beck 22] T. Beckers, J. Seidman, P. Perdikaris, and G. J. Pappas. “Gaussian Process Port-Hamiltonian Systems: Bayesian Learning with Physics Prior”. In: 2022 IEEE 61st Conference on Decision and Control (CDC), pp. 1447–1453, 2022.
  • [Boro 20] V. Borovitskiy, A. Terenin, P. Mostowsky, and M. P. Deisenroth. “Matérn Gaussian processes on riemannian manifolds”. In: Proceedings of the 34th International Conference on Neural Information Processing Systems, Curran Associates Inc., Red Hook, NY, USA, 2020.
  • [Bran 23] B. Brantner and M. Kraus. “Symplectic Autoencoders for Model Reduction of Hamiltonian Systems”. 2023.
  • [Brun 16] S. L. Brunton, J. L. Proctor, and J. N. Kutz. “Discovering governing equations from data by sparse identification of nonlinear dynamical systems”. Proceedings of the National Academy of Sciences, Vol. 113, No. 15, pp. 3932–3937, 2016.
  • [Chen 19] Z. Chen, J. Zhang, M. Arjovsky, and L. Bottou. “Symplectic recurrent neural networks”. arXiv preprint arXiv:1909.13334, 2019.
  • [Chen 21] R. Chen and M. Tao. “Data-driven Prediction of General Hamiltonian Dynamics via Learning Exactly-Symplectic Maps”. In: M. Meila and T. Zhang, Eds., Proceedings of the 38th International Conference on Machine Learning, pp. 1717–1727, PMLR, 18–24 Jul 2021.
  • [Chen 22] X. Chen, J. Duan, J. Hu, and D. Li. “Data-driven method to learn the most probable transition pathway and stochastic differential equation”. Physica D: Nonlinear Phenomena, 2022.
  • [Chos 02] P. Chossat, J.-P. Ortega, and T. S. Ratiu. “Hamiltonian Hopf bifurcation with symmetry”. Archive for Rational Mechanics and Analysis, Vol. 163, pp. 1–33, 2002.
  • [Chos 03] P. Chossat, D. Lewis, J.-P. Ortega, and T. S. Ratiu. “Bifurcation of relative equilibria in mechanical systems with symmetry”. Advances in Applied Mathematics, Vol. 31, pp. 10–45, 2003.
  • [Chri 10] A. Christmann and I. Steinwart. “Universal Kernels on Non-Standard Input Spaces”. In: J. Lafferty, C. Williams, J. Shawe-Taylor, R. Zemel, and A. Culotta, Eds., Advances in Neural Information Processing Systems, Curran Associates, Inc., 2010.
  • [Conw 07] J. B. Conway. A Course in Functional Analysis. Springer Science & Business Media, second Ed., 2007.
  • [Cran 20] M. Cranmer, S. Greydanus, S. Hoyer, P. Battaglia, D. Spergel, and S. Ho. “Lagrangian neural networks”. arXiv preprint arXiv:2003.04630, 2020.
  • [Da C 23a] N. Da Costa, C. Mostajeran, and J.-P. Ortega. “The Gaussian kernel on the circle and spaces that admit isometric embeddings of the circle”. In: International Conference on Geometric Science of Information, pp. 426–435, Springer, 2023.
  • [Da C 23b] N. Da Costa, C. Mostajeran, J.-P. Ortega, and S. Said. “Geometric Learning with Positively Decomposable Kernels”. arXiv preprint arXiv:2310.13821, 2023.
  • [Davi 23] M. David and F. Méhats. “Symplectic learning for Hamiltonian neural networks”. Journal of Computational Physics, Vol. 494, p. 112495, 2023.
  • [De V 05] E. De Vito, L. Rosasco, A. Caponnetto, U. De Giovannini, F. Odone, and P. Bartlett. “Learning from Examples as an Inverse Problem”. Journal of Machine Learning Research, Vol. 6, No. 5, 2005.
  • [Doum 23] N. Doumèche, G. Biau, and C. Boyer. “Convergence and error analysis of PINNs”. 2023.
  • [Eldr 24a] C. Eldred, F. Gay-Balmaz, and V. Putkaradze. “CLPNets: Coupled Lie-Poisson Neural Networks for Multi-Part Hamiltonian Systems with Symmetries”. arXiv preprint arXiv:2408.16160, 2024.
  • [Eldr 24b] C. Eldred, F. Gay-Balmaz, S. Huraka, and V. Putkaradze. “Lie–Poisson Neural Networks (LPNets): Data-based computing of Hamiltonian systems with symmetries”. Neural Networks, Vol. 173, p. 106162, 2024.
  • [Emer 12] M. Émery. Stochastic calculus in manifolds. Springer Science & Business Media, 2012.
  • [Feng 23] J. Feng, C. Kulick, Y. Ren, and S. Tang. “Learning particle swarming models from data with Gaussian processes”. Mathematics of Computation, 2023.
  • [Fera 15] A. Feragen, F. Lauze, and S. Hauberg. “Geodesic exponential kernels: When curvature and linearity conflict”. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3032–3042, 2015.
  • [Ferr 12] J. C. Ferreira and V. A. Menegatto. “Reproducing properties of differentiable Mercer-like kernels”. Mathematische Nachrichten, Vol. 285, No. 8-9, pp. 959–973, 2012.
  • [Gins 12] D. Ginsbourger, X. Bay, O. Roustant, and L. Carraro. “Argumentwise invariant kernels for the approximation of invariant functions”. In: Annales de la Faculté des sciences de Toulouse: Mathématiques, pp. 501–527, 2012.
  • [Grey 19] S. Greydanus, M. Dzamba, and J. Yosinski. “Hamiltonian neural networks”. Advances in Neural Information Processing Systems, Vol. 32, 2019.
  • [Holm 09] D. D. Holm, T. Schmah, and C. Stoica. Geometric mechanics and symmetry: from finite to infinite dimensions. Vol. 12, Oxford University Press, 2009.
  • [Holm 11a] D. D. Holm. Geometric Mechanics. Part I: Dynamics and Symmetry. World Scientific Publishing Company, second Ed., 2011.
  • [Holm 11b] D. D. Holm. Geometric Mechanics. Part II: Rotating, Translating and Rolling. World Scientific, second Ed., 2011.
  • [Hsu 02] E. P. Hsu. Stochastic analysis on manifolds. American Mathematical Soc., 2002.
  • [Hu 24] J. Hu, J.-P. Ortega, and D. Yin. “A Structure-Preserving Kernel Method for Learning Hamiltonian Systems”. arXiv preprint arXiv:2403.10070, 2024.
  • [Jin 20] P. Jin, Z. Zhang, A. Zhu, Y. Tang, and G. E. Karniadakis. “SympNets: Intrinsic structure-preserving symplectic networks for identifying Hamiltonian systems”. Neural Networks, Vol. 132, pp. 166–179, 2020.
  • [Jin 22] P. Jin, Z. Zhang, I. G. Kevrekidis, and G. E. Karniadakis. “Learning Poisson systems and trajectories of autonomous systems via Poisson neural networks”. IEEE Transactions on Neural Networks and Learning Systems, 2022.
  • [Kost 09] B. Kostant. “Orbits, symplectic structures and representation theory”. Collected Papers, pp. 482–482, 2009.
  • [Leok 12] M. Leok and T. Shingel. “General techniques for constructing variational integrators”. Frontiers of Mathematics in China, Vol. 7, pp. 273–303, 2012.
  • [Leon 97] N. E. Leonard and J. E. Marsden. “Stability and drift of underwater vehicle dynamics: mechanical systems with rigid motion symmetry”. Physica D: Nonlinear Phenomena, Vol. 105, No. 1-3, pp. 130–162, 1997.
  • [Li 23] S. Li. “Gaussian kernels on nonsimply connected closed Riemannian manifolds are never positive definite”. Bulletin of the London Mathematical Society, 2023.
  • [Libe 12] P. Libermann and C.-M. Marle. Symplectic Geometry and Analytical Mechanics. Vol. 35, Springer Science & Business Media, 2012.
  • [Lie 93] S. Lie and F. Engel. Theorie der Transformationsgruppen. Vol. 3, Teubner, 1893.
  • [Lu 19] F. Lu, M. Zhong, S. Tang, and M. Maggioni. “Nonparametric inference of interaction laws in systems of agents from trajectory data”. Proceedings of the National Academy of Sciences, Vol. 116, No. 29, pp. 14424–14433, 2019.
  • [Magg 21] M. Maggioni, J. Miller, H. Qiu, and M. Zhong. “Learning Interaction Kernels for Agent Systems on Riemannian Manifolds”. 2021.
  • [Mars 01] J. E. Marsden and M. West. “Discrete mechanics and variational integrators”. Acta numerica, Vol. 10, pp. 357–514, 2001.
  • [Mars 74] J. E. Marsden and A. Weinstein. “Reduction of symplectic manifolds with symmetry”. Reports on Mathematical Physics, Vol. 5, No. 1, pp. 121–130, 1974.
  • [Mars 99] J. E. Marsden and T. S. Ratiu. Introduction to Mechanics and Symmetry. Springer-Verlag, New York, second Ed., 1999.
  • [Micc 06] C. A. Micchelli, Y. Xu, and H. Zhang. “Universal Kernels”. Journal of Machine Learning Research, Vol. 7, No. 12, 2006.
  • [Minh 10] H. Q. Minh. “Some properties of Gaussian reproducing kernel Hilbert spaces and their implications for function approximation and learning theory”. Constructive Approximation, Vol. 32, pp. 307–338, 2010.
  • [Mont 97a] J. A. Montaldi. “Persistance d’orbites périodiques relatives dans les systèmes hamiltoniens symétriques”. C. R. Acad. Sci. Paris Sér. I Math., Vol. 324, pp. 553–558, 1997.
  • [Mont 97b] J. A. Montaldi. “Persistence and stability of relative equilibria”. Nonlinearity, Vol. 10, pp. 449–466, 1997.
  • [Mrou 15] Y. Mroueh, S. Voinea, and T. A. Poggio. “Learning with Group Invariant Features: A Kernel Perspective”. Advances in Neural Information Processing Systems, Vol. 28, 2015.
  • [Noet 18] E. Noether. “Invariante Variationsprobleme”. Nachr. d. König. Gesellsch. d. Wiss. zu Göttingen Math. Physik, Vol. 2, pp. 235–257, 1918.
  • [Nomi 61] K. Nomizu and H. Ozeki. “The existence of complete Riemannian metrics”. Proceedings of the American Mathematical Society, Vol. 12, No. 6, pp. 889–891, 1961.
  • [Nova 18] E. Novak, M. Ullrich, H. Woźniakowski, and S. Zhang. “Reproducing kernels of Sobolev spaces on d\mathbb{R}^{d} and applications to embedding constants and tractability”. Analysis and Applications, Vol. 16, No. 05, pp. 693–715, 2018.
  • [Offe 24] C. Offen. “Machine learning of continuous and discrete variational ODEs with convergence guarantee and uncertainty quantification”. 2024.
  • [Orte 04] J.-P. Ortega and T. S. Ratiu. Momentum Maps and Hamiltonian Reduction. Birkhauser Verlag, 2004.
  • [Orte 05] J.-P. Ortega, V. Planas-Bielsa, and T. S. Ratiu. “Asymptotic and Lyapunov stability of constrained and Poisson equilibria”. Journal of Differential Equations, Vol. 214, No. 1, pp. 92–127, 2005.
  • [Orte 24] J.-P. Ortega and D. Yin. “Learnability of Linear Port-Hamiltonian Systems”. Journal of Machine Learning Research, Vol. 25, No. 68, pp. 1–56, 2024.
  • [Orte 97] J.-P. Ortega and T. S. Ratiu. “Persistence and smoothness of critical relative elements in Hamiltonian systems with symmetry”. Comptes Rendus de l’Académie des Sciences - Series I - Mathematics, Vol. 325, No. 10, pp. 1107–1111, 1997.
  • [Owha 17] H. Owhadi and C. Scovel. “Separability of reproducing kernel spaces”. Proceedings of the American Mathematical Society, Vol. 145, No. 5, pp. 2131–2138, 2017.
  • [Penn 06] X. Pennec. “Intrinsic statistics on Riemannian manifolds: Basic tools for geometric measurements”. Journal of Mathematical Imaging and Vision, Vol. 25, pp. 127–154, 2006.
  • [Penn 19] X. Pennec, S. Sommer, and T. Fletcher. Riemannian Geometric Statistics in Medical Image Analysis. Academic Press, 2019.
  • [Pfö 24] M. Pförtner, I. Steinwart, P. Hennig, and J. Wenger. “Physics-Informed Gaussian Process Regression Generalizes Linear PDE Solvers”. 2024.
  • [Rais 19] M. Raissi, P. Perdikaris, and G. Karniadakis. “Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations”. Journal of Computational Physics, Vol. 378, pp. 686–707, 2019.
  • [Rath 21] K. Rath, C. G. Albert, B. Bischl, and U. von Toussaint. “Symplectic Gaussian process regression of maps in Hamiltonian systems”. Chaos: An Interdisciplinary Journal of Nonlinear Science, Vol. 31, No. 5, p. 053121, 05 2021.
  • [Rude 13] M. Rudelson, R. Vershynin, et al. “Hanson-Wright inequality and sub-Gaussian concentration”. Electronic Communications in Probability, Vol. 18, 2013.
  • [Shin 20] Y. Shin, J. Darbon, and G. E. Karniadakis. “On the Convergence of Physics Informed Neural Networks for Linear Second-Order Elliptic and Parabolic Type PDEs”. Communications in Computational Physics, Vol. 28, No. 5, pp. 2042–2074, 2020.
  • [Sour 65] J.-M. Souriau. Géométrie de l’espace de phases: calcul des variations et mécanique quantique. Faculté des sciences de Marseille, 1965.
  • [Sour 66] J.-M. Souriau. “Quantification géométrique”. Comm. Math. Phys., Vol. 1, pp. 374–398, 1966.
  • [Sour 69] J.-M. Souriau. Structure des Systèmes Dynamiques. Dunod, Paris, 1969.
  • [Stei 01] I. Steinwart. “On the influence of the kernel on the consistency of support vector machines”. Journal of Machine Learning Research, Vol. 2, No. Nov, pp. 67–93, 2001.
  • [Tu 08] L. W. Tu. An Introduction to Manifolds. Springer, 2008.
  • [Vais 12] I. Vaisman. Lectures on the Geometry of Poisson Manifolds. Vol. 118, Birkhäuser, 2012.
  • [Van 12] C. Van Den Berg, J. P. R. Christensen, and P. Ressel. Harmonic Analysis on Semigroups: Theory of Positive Definite and Related Functions. Vol. 100, Springer Science & Business Media, 2012.
  • [Wein 83] A. Weinstein. “The local structure of Poisson manifolds”. Journal of Differential Geometry, Vol. 18, No. 3, pp. 523–557, 1983.
  • [Yuri 95] V. Yurinsky. Sums and Gaussian Vectors. Lecture Notes in Mathematics. Springer Berlin, Heidelberg, 1995.
  • [Zhou 08] D.-X. Zhou. “Derivative reproducing properties for kernel methods in learning theory”. Journal of Computational and Applied Mathematics, Vol. 220, No. 1-2, pp. 456–463, 2008.
  • [Zhu 20] A. Zhu, P. Jin, and Y. Tang. “Deep Hamiltonian networks based on symplectic integrators”. 2020.