
Adaptive tuning of Hamiltonian Monte Carlo methods

Elena Akhmatskaya eakhmatskaya@bcamath.org Lorenzo Nagar lnagar@bcamath.org Jose Antonio Carrillo Leonardo Gavira Balmacz Hristo Inouzhe Martín Parga Pazos María Xosé Rodríguez Álvarez BCAM – Basque Center for Applied Mathematics, Bilbao, Spain Ikerbasque – Basque Foundation for Science, Bilbao, Spain UPV/EHU – Universidad del País Vasco / Euskal Herriko Unibertsitatea, Basque Country, Spain Mathematical Institute, University of Oxford, Oxford, United Kingdom Departamento de Matemáticas, Universidad Autónoma de Madrid, Madrid, Spain CIC bioGUNE – Center for Cooperative Research in Biosciences, Derio, Spain Departamento de Estatística e Investigación Operativa, Universidade de Vigo, Vigo, Spain
(August 14, 2025)
Abstract

With the recently increased interest in probabilistic models, the efficiency of the underlying sampler becomes a crucial consideration. A Hamiltonian Monte Carlo (HMC) sampler is one popular option for models of this kind. The performance of HMC, however, depends strongly on the choice of parameters associated with the integration method for the Hamiltonian equations, which to date remains mainly heuristic or introduces extra computational cost. We propose a novel, computationally inexpensive and flexible approach (we call it Adaptive Tuning or ATune) that, by analyzing the data generated during the burn-in stage of an HMC simulation, detects a system-specific splitting integrator together with a set of reliable HMC hyperparameters, including their credible randomization intervals, to be readily used in a production simulation. The method automatically eliminates those values of the simulation parameters that could cause undesired extreme scenarios, such as resonance artifacts, low accuracy or poor sampling. The new approach is implemented in the in-house software package HaiCS, with no computational overhead introduced in a production simulation, and can be easily incorporated in any package for Bayesian inference with HMC. Tests on popular statistical models using original HMC and generalized Hamiltonian Monte Carlo (GHMC) reveal the superiority of the adaptively tuned methods in terms of stability, performance and accuracy over conventional HMC tuned heuristically and coupled with well-established integrators. We also claim that the generalized formulation of HMC, i.e. GHMC, is preferable for achieving high sampling performance. The efficiency of the new methodology is assessed in comparison with state-of-the-art samplers, e.g. the No-U-Turn Sampler (NUTS), in real-world applications, such as endocrine therapy resistance in cancer, modeling of cell-cell adhesion dynamics and an influenza A epidemic outbreak.

keywords:
Hamiltonian Monte Carlo, GHMC, adaptive hyperparameter tuning, adaptive multi-stage splitting integration, randomization intervals, biomedical applications

1 Introduction

Hamiltonian Monte Carlo (HMC) [1] has emerged as a powerful tool for sampling from high-dimensional complex probability distributions, making it invaluable in Bayesian inference applications [2]. By leveraging the Hamiltonian equations of motion for generating Monte Carlo proposals, HMC exploits gradient information on posterior distributions to enhance the convergence of the Markov chain and enables efficient exploration of the parameter space.

The Markov kernel based on Hamiltonian dynamics induces a set of hyperparameters and tuning blocks that critically impact sampling performance: a numerical integrator for solving the Hamiltonian equations, an integration step size \Delta t, a number of integration steps per iteration L, and a mass matrix M. For instance, while a smaller \Delta t ensures high acceptance rates and reduces computational waste, it also shortens proposal moves, increasing correlation between samples. Conversely, a larger L enables longer trajectories and more independent samples but comes at the cost of increased computational overhead due to additional gradient evaluations per iteration. Consequently, significant research efforts have been devoted to identifying optimal parameter settings for HMC that balance computational efficiency and sampling performance.

Early strategies for tuning the integration step size, \Delta t, focused on targeting an optimal acceptance rate for the Metropolis test [3, 4] or employing on-the-fly adaptation using primal-dual averaging algorithms [5]. Some sophisticated integration schemes have also been proposed, replacing the standard Verlet algorithm [6, 7] with more accurate multi-stage splitting integrators to enhance sampling performance and efficiency [8, 9, 10, 11, 12, 13, 14, 15]. Comprehensive reviews on numerical integration in HMC are available in [16, 17].

Regarding the number of integration steps per iteration, L, a common recommendation is randomization [18, 19], though more advanced algorithms have been proposed, such as automated trajectory length selection based on the double-backing criterion [5].

Additional contributions to HMC tuning include position-dependent mass matrix adaptation [20] and methods for determining the proper number of Monte Carlo iterations to ensure convergence [21].

More recent advancements in optimization algorithms for HMC hyperparameters address both the integration step size [22, 23, 24, 15, 25, 26] and the number of integration steps per iteration [27, 28, 23, 25, 26].

In this study, we propose a computationally inexpensive adaptive tuning procedure that identifies an optimal numerical integrator along with a set of HMC hyperparameters tailored to a given simulation system. We call this procedure ATune and refer to an HMC method reinforced with such tuning as Adaptively-Tuned HMC (AT-HMC).

Furthermore, we extend our focus to Generalized Hamiltonian Monte Carlo (GHMC) [29, 30], a promising but relatively unexplored variant of standard HMC.

The key distinction between HMC and GHMC lies in the update mechanism for the momenta. In HMC, momenta are completely discarded after the Metropolis test and resampled at the beginning of each iteration from a Gaussian distribution \mathcal{N}(0,M). In contrast, GHMC performs a partial momentum update (PMU) [29]. The PMU introduces an additional hyperparameter \varphi, which, if not carefully chosen, can degrade sampling performance. This likely explains why GHMC has been less commonly used than HMC.

However, when properly tuned, GHMC can offer significant improvements in sampling accuracy and efficiency. There are two primary reasons for that.

First, in contrast to HMC, GHMC produces irreversible chains by construction [31, 2, 32]. This usually implies a faster convergence to equilibrium and a reduction in asymptotic variance [33, 34, 35, 32, 36]. Second, the partial momentum update in GHMC with a small momentum mixing angle makes a choice of the number of steps per integration leg less critical than in the case of HMC, enabling efficient sampling with shorter Hamiltonian trajectories [37], thereby enhancing both the speed and efficiency of the sampling process.

Although theoretical studies recognize the potential of GHMC, little effort has been devoted to developing effective tuning strategies. Only recently has GHMC begun to attract growing interest within the scientific community [38, 39, 40, 41].

Building on these insights, we extend our Adaptive Tuning to GHMC to derive AT-GHMC. In addition to the choices of HMC and GHMC simulation parameters previously discussed in the literature, we add randomization intervals for the simulation parameters to our list of optimal settings. Indeed, randomization of simulation parameters is part of any HMC-based algorithm [2], but until now its choice has remained purely heuristic.

In summary, we aim to find optimal, or close to optimal choices for:

  1. an integration scheme for the Hamiltonian equations of motion;

  2. a dimensional integration step size \Delta t and its randomization interval;

  3. a number of integration steps L per Monte Carlo iteration and its randomization interval;

  4. a noise control parameter \varphi (along with the appropriate randomization interval) in the PMU.

Our approach relies on the adaptive s-AIA integration method [14], which provides useful guidance on the choice of integrators and optimal step size locations. According to [14], s-AIA integrators outperform the standard Velocity Verlet and other fixed-parameter multi-stage integrators in terms of both integration accuracy and their impact on the sampling efficiency of HMC, for any simulation step size within a system-specific stability interval. Additionally, a key observation in [14] is that the highest performance of HMC is achieved near the center of the stability interval, provided the stability limit is accurately estimated. This was also suggested in [10, 42]. Using these findings and the analysis underlying the s-AIA approach, we identify optimal integration schemes and step size randomization intervals for both AT-HMC and AT-GHMC, as well as an optimal randomization scheme for L in AT-GHMC.

Moreover, we revisit the optimization strategies previously proposed for Modified HMC in [12] and adapt them to GHMC, refining the momentum refreshment procedure to derive an optimal, system-dependent noise parameter \varphi. Using the proposed settings, GHMC achieves efficient phase-space exploration even with a small number of integration steps L [37].

The new tuning methodology does not incur additional computational overhead, as the optimal values are either system-specific (and can be determined a priori) or computed during a burn-in stage. Numerical experiments on standard benchmarks confirm the positive effect of proposed optimal settings on HMC and GHMC performance.

Additionally, we assess the efficiency of AT-HMC and AT-GHMC on three real-world applications, each relying on a different probabilistic model, and compare the results with those obtained using the state-of-the-art samplers commonly employed in Bayesian inference on such models. The applications include

  1. Patient resistance to endocrine therapy in breast cancer – a Bayesian Logistic Regression (BLR) model;

  2. Cell-cell adhesion dynamics – a Partial Differential Equation (PDE) model;

  3. Influenza A (H1N1) epidemic outbreak – an Ordinary Differential Equation (ODE) model.

The paper is organized as follows. Section 2 presents the essential background, covering HMC, GHMC and the numerical integration of Hamiltonian dynamics. Section 3 revisits the numerical experiments in [14], showing that optimal performance is consistently achieved near the center of a stability interval. In Section 4, we present the refined randomization interval for the optimal simulation step size \Delta t and support our choice by numerical experiments.

Sections 5 and 6 introduce the optimization procedures for the random noise parameter \varphi and the number of integration steps per iteration L. The Adaptive Tuning algorithm ATune is outlined in Section 7, validated on a set of benchmarks in Section 8 and systematically tested in three real-world application studies in Section 9. Section 10 summarizes our conclusions.

2 Hamiltonian Monte Carlo

2.1 Standard formulation

Hamiltonian Monte Carlo [1] is a Markov Chain Monte Carlo (MCMC) method for obtaining N correlated samples \{\bm{\theta}_{n}\}_{n=1,...,N} from a target probability distribution \pi(\bm{\theta}) in \mathbb{R}^{D}, D being the dimension of \bm{\theta}. HMC generates a Markov chain in the joint phase space \Omega\subseteq\mathbb{R}^{2D} with invariant distribution

\pi(\bm{\theta},\bm{p})=\pi(\bm{\theta})\,n(\bm{p})\propto\exp(-H(\bm{\theta},\bm{p})), (1)

and recovers \pi(\bm{\theta}) by marginalizing out the auxiliary momentum variable \bm{p} (distributed according to n(\bm{p})).

In (1), H(\bm{\theta},\bm{p}) is the Hamiltonian

H(\bm{\theta},\bm{p})=\frac{1}{2}\bm{p}^{T}M^{-1}\bm{p}+U(\bm{\theta}), (2)

where M is the symmetric positive definite mass matrix, and U(\bm{\theta}) is the potential energy, which contains information about the target distribution \pi(\bm{\theta}):

U(\bm{\theta})=-\log\pi(\bm{\theta})+\text{const}.
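As an illustration, for the zero-mean Gaussian benchmarks used later in this paper, U(\bm{\theta})=\frac{1}{2}\bm{\theta}^{T}\Sigma^{-1}\bm{\theta} up to an additive constant. A minimal sketch of such a potential and its gradient (the helper names are illustrative, not part of HaiCS):

```python
import numpy as np

def make_gaussian_potential(Sigma):
    """U(theta) = -log pi(theta) + const and its gradient
    for a zero-mean Gaussian target N(0, Sigma)."""
    Sigma_inv = np.linalg.inv(Sigma)

    def U(theta):
        return 0.5 * theta @ Sigma_inv @ theta

    def grad_U(theta):
        return Sigma_inv @ theta

    return U, grad_U
```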

HMC generates new proposals (\bm{\theta}',\bm{p}') by alternating momentum update steps, where \bm{p} is drawn from a Gaussian \mathcal{N}(0,M), with steps that update the position \bm{\theta} and momenta \bm{p} through the numerical integration of the Hamiltonian equations

\frac{d\bm{\theta}}{dt}=M^{-1}\bm{p},\qquad\frac{d\bm{p}}{dt}=-\nabla_{\theta}U(\bm{\theta}), (3)

performed L times using an explicit symplectic and reversible integrator \Psi_{h} [43]. Thus,

(\bm{\theta}',\bm{p}')=\underbrace{\Psi_{h}\circ\dots\circ\Psi_{h}}_{L\text{ times}}(\bm{\theta},\bm{p}). (4)

Symplecticity and reversibility of \Psi_{h}, combined with the Metropolis correction described below, guarantee that \pi(\bm{\theta},\bm{p}) is an invariant measure for the generated Markov chain.

Due to numerical integration errors, the Hamiltonian energy and hence the target density (1) are not exactly preserved. To address this issue, the invariance of the target density is secured through a Metropolis test with acceptance probability

\alpha=\min\{1,\exp(-\Delta H)\}, (5)

where

\Delta H=H(\bm{\theta}',\bm{p}')-H(\bm{\theta},\bm{p}) (6)

is the energy error resulting from the numerical integration (4). In case of acceptance, \bm{\theta}' becomes the starting point for the subsequent iteration. Conversely, if the proposal is rejected, the previous state \bm{\theta} is retained for the next iteration. Regardless of acceptance or rejection, the momentum variable is discarded, and a new momentum \bm{p} is drawn from \mathcal{N}(0,M).
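The steps above can be condensed into a schematic HMC transition. The sketch below uses the Velocity Verlet integrator of Section 2.3 as \Psi_{h}; it illustrates the algorithm only and is not the HaiCS implementation:

```python
import numpy as np

def velocity_verlet(theta, p, grad_U, h, L, M_inv):
    """L steps of the Velocity Verlet integrator (8): half kick, drift, half kick."""
    for _ in range(L):
        p = p - 0.5 * h * grad_U(theta)
        theta = theta + h * (M_inv @ p)
        p = p - 0.5 * h * grad_U(theta)
    return theta, p

def hmc_step(theta, U, grad_U, h, L, M, M_inv, rng):
    """One HMC transition: full momentum resampling, integration, Metropolis test (5)."""
    p = rng.multivariate_normal(np.zeros(theta.size), M)   # p ~ N(0, M)
    H_old = 0.5 * p @ M_inv @ p + U(theta)
    theta_new, p_new = velocity_verlet(theta, p, grad_U, h, L, M_inv)
    H_new = 0.5 * p_new @ M_inv @ p_new + U(theta_new)
    if rng.random() < min(1.0, np.exp(-(H_new - H_old))):  # alpha = min{1, exp(-dH)}
        return theta_new
    return theta  # rejected: keep the previous state; momentum is discarded either way
```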

2.2 Generalized Hamiltonian Monte Carlo

Generalized Hamiltonian Monte Carlo [30] incorporates a Partial Momentum Update (PMU) [29] into standard HMC to replace the full refreshment of the auxiliary momenta. The PMU reduces random-walk behavior by partially updating the auxiliary momenta \bm{p} through mixing with independent and identically distributed (i.i.d.) Gaussian noise \bm{u}\sim\mathcal{N}(0,M):

\bm{p}\leftarrow\sqrt{1-\varphi}\,\bm{p}+\sqrt{\varphi}\,\bm{u}. (7)

Here, \varphi\in(0,1] is a random noise parameter which determines the extent to which the momenta are refreshed. When \varphi=1, the update (7) reduces to the standard HMC momentum resampling scheme.

Notice that the PMU produces configurations that preserve the joint target distribution, as the orthogonal transformation in (7) maintains the distribution \bm{p}\sim\mathcal{N}(0,M). Since the momenta are not fully discarded, GHMC performs a momentum flip after each rejected proposal. As shown by Neal [2] and formally proved by Fang et al. [37], the three core steps of GHMC (Hamiltonian dynamics combined with the Metropolis test, the PMU, and the momentum flip) each individually satisfy detailed balance and maintain reversibility, while their combination does not. This means that the resulting Markov chains are not reversible. Nevertheless, Fang et al. [37] demonstrated that GHMC fulfils the weaker modified detailed balance condition, ensuring the stationarity of the generated Markov chains.
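A schematic GHMC transition therefore differs from the HMC sketch above only in the momentum treatment: the PMU (7) replaces full resampling, and the momentum is flipped on rejection. A sketch reusing velocity_verlet from the previous snippet:

```python
import numpy as np

def partial_momentum_update(p, phi, M, rng):
    """PMU (7): mix the current momentum with fresh Gaussian noise u ~ N(0, M).
    phi = 1 recovers the full HMC momentum resampling."""
    u = rng.multivariate_normal(np.zeros(p.size), M)
    return np.sqrt(1.0 - phi) * p + np.sqrt(phi) * u

def ghmc_step(theta, p, U, grad_U, h, L, phi, M, M_inv, rng):
    """One GHMC transition: PMU, Hamiltonian proposal with Metropolis test,
    and a momentum flip on rejection."""
    p = partial_momentum_update(p, phi, M, rng)
    H_old = 0.5 * p @ M_inv @ p + U(theta)
    theta_new, p_new = velocity_verlet(theta, p, grad_U, h, L, M_inv)
    H_new = 0.5 * p_new @ M_inv @ p_new + U(theta_new)
    if rng.random() < min(1.0, np.exp(-(H_new - H_old))):
        return theta_new, p_new
    return theta, -p  # rejection: keep the state, flip the momentum
```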

2.3 Numerical integration of Hamiltonian dynamics

The most commonly used numerical scheme for solving the Hamiltonian equations (3) is the Velocity Verlet integrator [6, 7]. This scheme is widely favored due to its simplicity, good stability, and energy conservation properties. The algorithm reads

\begin{aligned}
\bm{p} &\leftarrow \bm{p}-\frac{h}{2}\nabla_{\theta}U(\bm{\theta}),\\
\bm{\theta} &\leftarrow \bm{\theta}+hM^{-1}\bm{p},\\
\bm{p} &\leftarrow \bm{p}-\frac{h}{2}\nabla_{\theta}U(\bm{\theta}).
\end{aligned} (8)

Here, h represents the integration step size. The updates of the momenta and position in (8) are often called a momentum kick and a position drift, respectively.

More sophisticated integration schemes have been proposed in the literature, including multi-stage palindromic splitting integrators [44, 10, 45]. These methods perform multiple splits of the separable Hamiltonian system (3), applying a sequence of kicks and drifts (8) with suitably chosen step sizes, in contrast to the half-kick, drift, half-kick structure of the standard Verlet method. Denoting a position drift and a momentum kick, respectively, by

\varphi^{A}_{h}(\bm{\theta},\bm{p})=(\bm{\theta}+hM^{-1}\bm{p},\,\bm{p}),\qquad\varphi^{B}_{h}(\bm{\theta},\bm{p})=(\bm{\theta},\,\bm{p}-h\nabla_{\theta}U(\bm{\theta})), (9)

the family of k-stage splitting integrators is defined as [16]

\Psi_{h}^{k}=\varphi^{B}_{b_{1}h}\circ\varphi^{A}_{a_{1}h}\circ\dots\circ\varphi^{A}_{a_{k'}h}\circ\varphi^{B}_{b_{k'+1}h}\circ\varphi^{A}_{a_{k'}h}\circ\dots\circ\varphi^{A}_{a_{1}h}\circ\varphi^{B}_{b_{1}h},\quad b_{i},a_{j}\in\mathbb{R}^{+}, (10)

if k=2k', and

\Psi_{h}^{k}=\varphi^{B}_{b_{1}h}\circ\varphi^{A}_{a_{1}h}\circ\dots\circ\varphi^{B}_{b_{k'}h}\circ\varphi^{A}_{a_{k'}h}\circ\varphi^{B}_{b_{k'}h}\circ\dots\circ\varphi^{A}_{a_{1}h}\circ\varphi^{B}_{b_{1}h},\quad b_{i},a_{j}\in\mathbb{R}^{+}, (11)

if k=2k'-1. The coefficients b_{i}, a_{j} in (10)–(11) have to satisfy the conditions 2\sum_{i=1}^{k'}b_{i}+b_{k'+1}=2\sum_{j=1}^{k'}a_{j}=1 and 2\sum_{i=1}^{k'}b_{i}=2\sum_{j=1}^{k'-1}a_{j}+a_{k'}=1, respectively, to ensure an integration step of length h. The term stage (k) refers to the number of gradient evaluations per integration step. It is straightforward to verify that the only 1-stage splitting integrator of the form (11) is the Velocity Verlet scheme (8), which can be expressed as

\Psi_{h}^{\text{VV}}=\varphi^{B}_{h/2}\circ\varphi^{A}_{h}\circ\varphi^{B}_{h/2}. (12)
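A palindromic composition (10)–(11) can be coded generically from the lists of kick and drift coefficients; below is a minimal sketch under the coefficient conditions above (illustrative names, not the HaiCS implementation):

```python
import numpy as np

def splitting_step(theta, p, grad_U, h, bs, as_, M_inv):
    """One step of a palindromic splitting integrator (10)-(11).
    bs and as_ list the kick/drift coefficients of the full palindrome, e.g.
    1-stage (12): bs = [1/2, 1/2],                as_ = [1]
    3-stage (11): bs = [b, 1/2 - b, 1/2 - b, b],  as_ = [a, 1 - 2a, a]."""
    p = p - bs[0] * h * grad_U(theta)              # first kick  phi^B_{b_1 h}
    for a_j, b_i in zip(as_, bs[1:]):
        theta = theta + a_j * h * (M_inv @ p)      # drift phi^A_{a_j h}
        p = p - b_i * h * grad_U(theta)            # kick  phi^B_{b_i h}
    return theta, p
```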

Various choices of the integration coefficients b_{i}, a_{j} in (10)–(11) yield integrators that reach their peak performance at different specific values of the simulation step size h. To address this step size dependency, the adaptive integration approaches AIA [11] and s-AIA [14] were developed for molecular simulation and Bayesian inference applications, respectively. For a given system and simulation step size h, these methods select the most suitable 2-stage (AIA, s-AIA2) or 3-stage (s-AIA3) splitting integrator ensuring optimal energy conservation for harmonic forces. Examples of 2- and 3-stage integrators, including those used in this study, are reviewed in [10, 45, 14].

Integrator | N. of stages (k) | Feature | References
Velocity Verlet | 1 | longest stability interval | [6, 7]
BCSSk | 2, 3 | minimizes \mathbb{E}[\Delta H] at h_{\text{CoLSI}}=k | [10, 45]
MEk | 2, 3 | minimizes truncation error for h\to 0 | [46, 45]
s-AIAk | 2, 3 | minimizes \mathbb{E}[\Delta H] \forall h\in(0,2k) | [14]
Table 1: Multi-stage splitting integrators considered in this study. \mathbb{E}[\Delta H] is the expected energy error resulting from the numerical integration of Hamiltonian dynamics; h_{\text{CoLSI}}=k is the center of the longest stability interval.

3 Tracking the location of optimal step size

First, we aim to demonstrate that the top performance of HMC is typically achieved with a step size near the center of the longest stability interval (CoLSI) of the integrator in use, as previously suggested in [42] and later confirmed in [10] and [14]. We use the notations h_{\text{CoLSI}} and \Delta t_{\text{CoLSI}} for such dimensionless and dimensional step sizes, respectively. We recall that for a k-stage integrator, the length of the longest dimensionless stability interval equals 2k; hence h_{\text{CoLSI}}=k [10, 14].

We consider the numerical experiments presented in [14] for three benchmarks – the 1000-dimensional multivariate Gaussian model Gaussian 1 (for simplicity, from now on we call it G1000) and two BLR models, German and Musk – performed with the numerical integrators summarized in Table 1.

For each benchmark, we identify the integrator and integration step size which lead to the lowest ratio between the number of gradient evaluations, grad (which represents the bulk of the computational cost in an HMC simulation), and the minimum effective sample size [47] across variates, minESS, computed for HMC production chains of length N_{\text{conv}}+1000 iterations. Here, N_{\text{conv}} is the number of samples in the production chains required for reaching convergence [48, 49, 50]. We choose minESS (calculated through the effectiveSize function of the coda package of R [51]) as the most demanding of the metrics considered in [14].

We refer to the performance metric introduced in this way, i.e. grad/minESS, as gESS. In Section 8, we provide a detailed discussion of this metric and explore further performance metrics for validating the proposed methodology. We emphasize that lower values of grad and higher minESS indicate faster convergence and better sampling efficiency, respectively, meaning that a lower gESS indicates better overall performance.

Table 2 displays the best grad/minESS performance, i.e. the lowest gESS, denoted \text{gESS}^{\bigstar}, achieved for each benchmark with k-stage integrators, k=2,3, and specifies the integrators along with the corresponding step sizes (normalized by the number of stages for clarity, so that the normalized h_{\text{CoLSI}} equals 1 for all k-stage integrators) responsible for such performance. The results are visualized in Figure 1 using non-normalized step sizes.

Benchmark | k | \text{gESS}^{\bigstar} | h_{\text{gESS}^{\bigstar}}/k | Integrator
G1000 | 2 | 6402 | 1.0 | s-AIA2
G1000 | 3 | 5177 | 1.1 | BCSS3
German | 2 | 28.83 | 0.9 | AIA2
German | 3 | 26.43 | 0.8 | ME3
Musk | 2 | 249.3 | 0.8 | BCSS2
Musk | 3 | 252.8 | 0.9 | s-AIA3
Table 2: Best grad/minESS (\text{gESS}^{\bigstar}) performance, detected in the HMC simulations in [14] for the three benchmarks, using the best-performing integrator at the h_{\text{gESS}^{\bigstar}} integration step size (normalized by the number of stages k).

Figure 1: Best grad/minESS performance (\text{gESS}^{\bigstar}) obtained for the tested benchmarks – G1000 (mauve), German (dark red) and Musk (turquoise) (Table 2). k is the number of stages of an integrator. For any benchmark, top performance is achieved around the CoLSI, i.e. h=k (black dotted line).

From Table 2, one can observe that top performance is consistently achieved around the CoLSI, normalized h_{\text{CoLSI}}=1, and thus h_{\text{CoLSI}} is an obvious candidate for an optimal step size. The deviation of h_{\text{gESS}^{\bigstar}} from h_{\text{CoLSI}} is not critical at this stage, as an optimal step size is yet to be randomized, and only the performance at a randomized step size matters.

Furthermore, 3-stage integrators always outperform 2-stage ones, as already noticed in [13] and [14]. Hence, it is natural to search for an optimal integrator within the class of 3-stage integrators. Unfortunately, Table 2 does not suggest a unique candidate for the best-performing integrator in this class.

We remark that according to [14], BCSS3 and s-AIA3 around the CoLSI give rise to very similar accuracy and performance of HMC for all considered benchmarks. For the German benchmark, the efficiencies of all three 3-stage integrators, i.e. BCSS3, ME3 and s-AIA3, at the CoLSI are almost indistinguishable. Thus, it is not surprising that all those integrators appear in Table 2. However, only s-AIA3 adapts its parameters to the choice of step size, and this property becomes important when randomization of the step size is applied. In this context, s-AIA3 looks like the best pick, but this remains to be confirmed. In the next Sections, we propose a randomization interval for h_{\text{CoLSI}} and use the 3-stage integrators (BCSS3, ME3, s-AIA3) and the standard 1-stage Velocity Verlet for its validation.

4 Randomizing optimal step size

Next, we propose an approach for finding an optimal randomization interval for the integration step size h_{\text{CoLSI}} (and for its dimensional counterpart \Delta t_{\text{CoLSI}}) and compare the resulting HMC performance with the best grad/minESS performance in Table 2.

4.1 Randomization interval

To identify the endpoints of a randomization interval for an optimal step size, we refer to the upper bound \rho_{3}(h,b) of the expected energy error for 3-stage integrators derived in [14] and plotted for the range of integrators considered in this study in Figure 2.

Figure 2: The upper bound \rho_{3}(h,b) of the expected energy error for 3-stage integrators.

When searching for an optimal randomization interval for h_{\text{CoLSI}}, the following considerations have to be taken into account. First, the interval should include step sizes that guarantee high accuracy of the numerical integration, i.e. \rho_{3}(h,b)\ll 1. Second, the interval should be kept narrow enough to stay close to the estimated optimal value of the step size and to avoid including too small step sizes, which usually satisfy the aforementioned inequality but lead to correlated samples in HMC. Finally, the approximate analysis proposed in [14] may result in overestimated values of the stability limit and thus of \Delta t_{\text{CoLSI}} (see A). This makes h_{\text{CoLSI}} a reasonable choice for the right endpoint.

On the other hand, we recall that BCSS3 was derived to minimize the maximum of \rho_{3}(h,b) over 0<h<h_{\text{CoLSI}} and thus to achieve its best performance near h_{\text{CoLSI}}. Inspection of Figure 2 reveals that \rho_{3}(h,b) with b=b_{\text{BCSS3}} exhibits a local maximum h_{\text{BCSS3max}} to the left of h_{\text{CoLSI}}. Within the interval (h_{\text{BCSS3max}}, h_{\text{CoLSI}}), both s-AIA3 and BCSS3 demonstrate very high accuracy, which deteriorates once h passes the center of the stability interval. Thus, such an interval obeys all the requirements on an optimal randomization interval summarized above.

The local maximum of \rho_{\text{BCSS3}}(h), i.e. h_{\text{BCSS3max}}, can be obtained by setting \frac{\partial\rho_{3}(h,b)}{\partial h}=0 with b=b_{\text{BCSS3}} [10] (see B), which yields h_{\text{lower}}\equiv h_{\text{BCSS3max}}\approx 2.0772.

Therefore, the proposed randomization interval for the nondimensional optimal integration step size is

h\sim\mathcal{U}(h_{\text{lower}},3)\approx\mathcal{U}(2.0772,3). (13)

The corresponding interval for the dimensional step size is

\Delta t\sim\mathcal{U}(\Delta t_{\text{lower}},\Delta t_{\text{CoLSI}}), (14)

where \Delta t_{\text{lower}} and \Delta t_{\text{CoLSI}} are calculated as proposed in [14] and briefly described in C.
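In code, drawing a production step size then amounts to sampling from (14). A sketch, where CF denotes the system-specific dimensionalization factor of [14] relating h and \Delta t (so that h = CF \Delta t):

```python
import numpy as np

H_LOWER = 2.0772  # left endpoint of (13), h_BCSS3max
H_COLSI = 3.0     # center of the longest stability interval for 3-stage integrators

def sample_step_size(CF, rng):
    """Draw Delta t ~ U(Delta t_lower, Delta t_CoLSI) as in (14),
    with Delta t = h / CF."""
    return rng.uniform(H_LOWER / CF, H_COLSI / CF)
```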

Benchmark | Sampler | Integrator | L | \Delta t | \varphi
G1000 | HMC | s-AIA3, BCSS3, ME3 | \mathcal{U}\{1,...,2666\} | \mathcal{U}(0.036,0.052) | 1
G1000 | HMC | VV | \mathcal{U}\{1,...,7998\} | \mathcal{U}(0.012,0.017) | 1
G1000 | GHMC | s-AIA3, BCSS3, ME3 | \mathcal{U}\{1,...,1332\} | \mathcal{U}(0.036,0.052) | \mathcal{U}(0,0.5)
G1000 | GHMC | VV | \mathcal{U}\{1,...,3999\} | \mathcal{U}(0.012,0.017) | \mathcal{U}(0,0.5)
German | HMC | s-AIA3, BCSS3, ME3 | \mathcal{U}\{1,...,16\} | \mathcal{U}(0.111,0.160) | 1
German | HMC | VV | \mathcal{U}\{1,...,49\} | \mathcal{U}(0.037,0.053) | 1
German | GHMC | s-AIA3, BCSS3, ME3 | \mathcal{U}\{1,...,7\} | \mathcal{U}(0.111,0.160) | \mathcal{U}(0,0.5)
German | GHMC | VV | \mathcal{U}\{1,...,24\} | \mathcal{U}(0.037,0.053) | \mathcal{U}(0,0.5)
Musk | HMC | s-AIA3, BCSS3, ME3 | \mathcal{U}\{1,...,110\} | \mathcal{U}(0.083,0.11) | 1
Musk | HMC | VV | \mathcal{U}\{1,...,333\} | \mathcal{U}(0.028,0.040) | 1
Musk | GHMC | s-AIA3, BCSS3, ME3 | \mathcal{U}\{1,...,55\} | \mathcal{U}(0.084,0.121) | \mathcal{U}(0,0.5)
Musk | GHMC | VV | \mathcal{U}\{1,...,166\} | \mathcal{U}(0.028,0.040) | \mathcal{U}(0,0.5)
Table 3: HMC and GHMC parameter settings for the numerical experiments presented in Section 4.2. Integration step sizes for each randomization interval are given in dimensional units.

4.2 Validation

To validate the proposed optimal randomized step size and to choose an optimal integrator among the 3-stage integrators, we run HMC simulations with s-AIA3, BCSS3, ME3 and VV with \Delta t drawn from (14) (in 1-stage units for VV) for all benchmarks introduced in Section 3 and listed in Table 2. The remaining HMC parameters, namely the number of integration steps per Monte Carlo iteration, L, and the numbers of burn-in and production iterations, are kept the same as in the numerical experiments in [14]. In addition, we conduct GHMC simulations to evaluate the efficacy of the proposed randomization interval with this sampler in comparison with HMC. For the PMU in Eq. (7), we let \varphi be uniformly drawn from the interval (0,0.5) (a standard recommendation) and use trajectories half as long as for HMC. We remark that this choice of GHMC parameters is based on heuristic intuition. The optimal selection of these parameters is investigated in the following Sections.

The simulation parameters for HMC and GHMC used in this section are summarized in Table 3.

Figure 3 compares the performance (in terms of gESS) of HMC and GHMC using \Delta t_{\text{CoLSI}} and the randomization interval (14) with \text{gESS}^{\bigstar} in Table 2 through the ratios \text{gESS}^{\bigstar}/gESS.

Figure 3: Comparison of the \text{gESS}^{\bigstar} performance (Table 2) with the gESS observed in HMC (dashed bars) and GHMC (filled bars) with \Delta t\sim\mathcal{U}(\Delta t_{\text{lower}},\Delta t_{\text{CoLSI}}) for the G1000 (left), German (center) and Musk (right) benchmarks using the s-AIA3 (green), BCSS3 (blue), ME3 (orange) and VV (red) integrators. GHMC combined with s-AIA3 (green filled bars) always outperforms \text{gESS}^{\bigstar} in Table 2 and demonstrates the best performance in all three benchmarks. HMC (green dashed bars) also shows its best performance with s-AIA3, which is close to \text{gESS}^{\bigstar} in Table 2 but not necessarily better.

First, we observe that, as expected, for all benchmarks and for both HMC and GHMC, the adaptive s-AIA3 integrator yields better or similar sampling performance (in terms of gESS) compared with the other tested integrators. Moreover, for GHMC, this performance is visibly higher (up to 2x) than the gESS observed in HMC with the newly proposed settings (see Table 3) and with the previously reported setup (Table 2). We notice that the new settings for HMC with s-AIA3 do not necessarily improve on the results in Table 2 for each benchmark, but they always stay close to them. We recall that the \text{gESS}^{\bigstar} values in Table 2 were achieved for each benchmark with various integrators and at different step sizes found by trial and error, whereas our new settings were generated by a procedure universal for all simulated systems.

In summary, the numerical experiments not only justify the proposed optimal choices of the numerical integrator (s-AIA3), the step size (h_{\text{CoLSI}} / \Delta t_{\text{CoLSI}}) and the randomization interval ((13) / (14)), which we adopt for the rest of this study, but also reveal the performance superiority of GHMC over HMC.

We remark that, in contrast to the HMC experiments, which relied on the recommended choices of L, the results for GHMC in Figure 3 were obtained without proper tuning of L and of the additional PMU parameter \varphi. Obviously, refining these two parameters can further improve the performance of GHMC.

Our next objective is to search for optimal, or close to optimal choices of the remaining GHMC parameters.

5 Randomization interval for \varphi

We note that GHMC is not the sole HMC-based method that partially refreshes the momenta. The Modified Hamiltonian Monte Carlo (MHMC) methods for sampling with modified Hamiltonians also rely on the PMU procedure, though there the PMU is Metropolized to secure sampling in a modified ensemble [12]. Both GHMC and MHMC use Hamiltonian dynamics for generating proposals, but proposals are accepted more frequently in MHMC due to the better conservation of modified Hamiltonians by symplectic integrators. In contrast, the partially refreshed momentum is always accepted in GHMC, whereas in MHMC it is accepted according to a modified Metropolis test. Clearly, an MHMC method with very high acceptance rates in the PMU and a GHMC method with very high acceptance rates in the Metropolis test will behave almost indistinguishably.

Here, we aim to derive an optimal \varphi parameter for a GHMC method with its optimal settings, i.e. when using the s-AIA3 integrator with (13) or (14). With this in mind, we refer to the procedure proposed in [12], which links \varphi in MHMC with the target acceptance rate of the modified Metropolis test in the PMU (see equation (22) in the aforementioned study). Following this idea, we express \varphi in terms of the step size, the parameters of the integrator in use and the target acceptance rate to obtain

\varphi=-\ln\tilde{\alpha}\,\frac{1+2h^{2}\lambda}{2Dh^{4}\lambda^{2}}. (15)

Here, \tilde{\alpha} is the expected acceptance rate for the updated momentum, D is the dimension of the system, h is the dimensionless counterpart of the integration step size \Delta t in use, and \lambda=\lambda_{k}(\bm{z}) is a coefficient depending on k – the number of stages of the numerical integrator – and \bm{z} – the integrator coefficients. In particular [13],

\lambda_{2}(b)=\frac{6b-1}{24},\qquad\lambda_{3}(b,a)=\frac{1-6a(1-a)(1-2b)}{12}. (16)

We remark that, in the notation of this study, \varphi corresponds to \sin^{2}\phi in [12]. Next, we adapt (15) to the conditions of this work. More precisely, we are interested in optimizing the performance of GHMC with s-AIA3 (i.e. k=3) in (h_{\text{lower}},h_{\text{CoLSI}}). We recall that for s-AIA3, the integrator parameters a and b are step size dependent, and so is \lambda_{3}.

Figure 4: Plot of \mathcal{K}(h) (Eq. (18)) for h\in(h_{\text{lower}},h_{\text{CoLSI}}) (13).

Given that in GHMC the updated momentum is always accepted, in order to mimic this behavior through (15) we set \tilde{\alpha}=0.999. Therefore, we obtain

\varphi(h)=-\ln{0.999}\,\frac{\mathcal{K}(h)}{D}, (17)

where

\mathcal{K}(h)=\frac{1+2h^{2}\lambda_{3}(h)}{2h^{4}\lambda_{3}(h)^{2}}. (18)

Moreover, from (7), \varphi(h)\in(0,1], which yields

\varphi_{\text{opt}}(h)=\min\left\{1,\,-\ln{0.999}\,\frac{\mathcal{K}(h)}{D}\right\}. (19)

We notice that \mathcal{K}(h) is a monotonically decreasing function of h in the interval (h_{\text{lower}},h_{\text{CoLSI}}) (Figure 4) and does not depend on the simulated model. Hence, we can choose the bounds of the randomization interval for the optimal \varphi as

\varphi_{\text{lower}}=\varphi_{\text{opt}}(h_{\text{CoLSI}}),\qquad\varphi_{\text{upper}}=\varphi_{\text{opt}}(h_{\text{lower}}). (20)

Thus, an optimal randomization interval for the random noise \varphi in the PMU (7) reads

\varphi\sim\mathcal{U}\left(\varphi_{\text{lower}},\varphi_{\text{upper}}\right). (21)

Optionally, one can adapt \varphi_{\text{opt}} (19) at each Monte Carlo iteration to the value of h drawn from the randomization interval (13) and the corresponding s-AIA3 parameters. In this case, no additional randomization of \varphi is required.
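Equations (16)–(21) translate directly into code. In the sketch below, the 3-stage coefficients (a, b) are taken as fixed inputs for illustration; in practice, for s-AIA3 they depend on h through the map h -> b_opt of [52]:

```python
import numpy as np

def lambda3(b, a):
    """Coefficient lambda_3(b, a) from (16) for a 3-stage integrator."""
    return (1.0 - 6.0 * a * (1.0 - a) * (1.0 - 2.0 * b)) / 12.0

def phi_opt(h, D, b, a, alpha_tilde=0.999):
    """Optimal PMU noise (19): min{1, -ln(alpha_tilde) * K(h) / D},
    with K(h) from (18)."""
    lam = lambda3(b, a)
    K = (1.0 + 2.0 * h**2 * lam) / (2.0 * h**4 * lam**2)
    return min(1.0, -np.log(alpha_tilde) * K / D)

def phi_interval(D, b, a, h_lower=2.0772, h_colsi=3.0):
    """Randomization bounds (20): K(h) decreases in h, so the lower bound
    is attained at h_CoLSI and the upper bound at h_lower."""
    return phi_opt(h_colsi, D, b, a), phi_opt(h_lower, D, b, a)
```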

In what follows, we first describe a procedure for the optimal choice of the number of integration steps per Monte Carlo iteration, L, and then compare our proposed scheme (21), combined with various choices of L, with the commonly recommended \varphi\sim\mathcal{U}(0,0.5) and two additional randomization intervals, \varphi\sim\mathcal{U}(0,0.1) (close to Metropolized Molecular Dynamics: \varphi=0) and \varphi\sim\mathcal{U}(0,0.9) (close to HMC: \varphi=1).

6 Optimising Hamiltonian trajectory lengths

The partial momentum update (7) allows for maintaining a favorable direction of sampling, provided that (i) the acceptance rate for proposed Hamiltonian trajectories is high and (ii) the momentum mixing angle \varphi in the PMU is small. Under these conditions, choosing the smallest possible L, i.e. L=1, helps to fully explore the region containing a direction once accepted by the Metropolis test, thus mimicking a long but flexible trajectory. Hence, in contrast to HMC, GHMC may benefit from short Hamiltonian trajectories.

The optimal settings proposed in this study tend to satisfy both conditions (i) and (ii). Indeed, the randomization interval for the optimal step size h_{\text{CoLSI}} and the integrator s-AIA3 are chosen to guarantee small expected energy errors (see Figure 2) and, hence, high acceptance rates. Furthermore, as follows from (19), \varphi\sim 1/D, where D is the dimension of the simulated system. Thus, a reasonable suggestion for an optimal choice of L might be L_{\text{opt}}=1, provided that the underlying analysis for finding an optimal integrator and step size is accurate.

However, as indicated in [14], the accuracy of the approach for estimating h_{\text{CoLSI}} may suffer when it is applied to systems with clearly anharmonic behavior. Such behavior can be detected by inspecting the system-specific value of a fitting factor S_{f}, which is computed from the simulation data collected at the burn-in stage of a GHMC simulation (see Eqs. (24) and (26) in [14]):

S_{f}=S=\max\left(1,\frac{2}{\tilde{\omega}\Delta t_{\text{VV}}}\sqrt[6]{\frac{32\,\mathbb{E}_{\text{VV}}[\Delta H]}{D}}\right), (22)

where \tilde{\omega} is the highest frequency, \Delta t_{\text{VV}} is the step size used during the burn-in stage, \mathbb{E}_{\text{VV}}[\Delta H] is the expected energy error observed at burn-in, and D is the dimension of the system.

A clear deviation of S_{f} from 1 implies a deviation from harmonic behavior. As a result, for such systems, the expected energy error at the estimated h_{\text{CoLSI}} may differ from the one predicted by the harmonic oscillator analysis. Moreover, using the expected energy error of the 1-stage integrator at L=1 for the analysis of 3-stage integrators in [14] leads to an overestimation of h_{\text{CoLSI}}, as discussed in A. Even so, in the settings proposed in this study, it is possible not only to quantify the error introduced by the suggested analysis but also to compensate for it by choosing the parameter L in an appropriate way.

In particular, let \mathbb{E}^{L=1}_{S_{f}=1}[\Delta H] be the expected energy error of the 1-stage Velocity Verlet integrator derived at L=1 for systems with harmonic behavior (S_{f}=1), and let \mathbb{E}^{L}_{S_{f}>1}[\Delta H] be the expected energy error of the 1-stage Velocity Verlet integrator at an arbitrary L\geq 1 for anharmonic systems (S_{f}>1). As shown in [14],

\mathbb{E}^{L=1}_{S_{f}=1}[\Delta H]=\frac{h^{6}}{32},\quad h=S_{f}\tilde{\omega}\Delta t. (23)

We denote

X=\frac{\mathbb{E}^{L}_{S_{f}>1}[\Delta H]}{\mathbb{E}^{L=1}_{S_{f}=1}[\Delta H]}. (24)

By setting X=1 and solving the resulting equation for L\geq 1, one can identify the values of L that level out the difference between the expected errors in a GHMC simulation for harmonic and anharmonic systems. In D, we follow this idea and obtain such values of L as a function of S_{f}. Moreover, following the suggestion in [19], we randomize L>1 so that L_{\text{opt}} is the mean of the randomization interval, i.e.

L\sim\mathcal{U}\left\{2,...,L_{\text{upper}}\right\},\qquad L_{\text{upper}}=2L_{\text{opt}}-2, (25)

where (see D for details)

L_{\text{opt}}=\begin{cases}1,&\text{for}\quad 1.0\leq S_{f}<1.5,\\ 4,&\text{otherwise.}\end{cases} (26)

The resulting randomization scheme for L is

L=\begin{cases}1,&\text{for}\quad 1.0\leq S_{f}<1.5,\\ \mathcal{U}\{2,...,6\},&\text{otherwise.}\end{cases} (27)
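A sketch of the resulting selection logic, combining the fitting factor (22) with the randomization scheme (27) (illustrative names, not the HaiCS implementation):

```python
import numpy as np

def fitting_factor(omega_max, dt_vv, mean_dH, D):
    """Fitting factor S_f = S of (22), computed from burn-in data: the highest
    frequency omega_max, the burn-in step size dt_vv, the average observed
    energy error mean_dH and the system dimension D."""
    return max(1.0, (2.0 / (omega_max * dt_vv)) * (32.0 * mean_dH / D) ** (1.0 / 6.0))

def sample_L(S_f, rng):
    """Randomization scheme (27): L = 1 for near-harmonic systems (S_f < 1.5),
    otherwise L ~ U{2,...,6}, whose mean is L_opt = 4."""
    if S_f < 1.5:
        return 1
    return int(rng.integers(2, 7))  # upper bound exclusive: draws from {2,...,6}
```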

7 Adaptive Tuning algorithm

Here, we summarize the Adaptive Tuning algorithm (ATune) that, given a simulation system, generates a set of optimal parameters for a GHMC/HMC simulation to give rise to AT-GHMC/AT-HMC.

Similarly to the s-AIA algorithm presented in [14], ATune relies on the simulation data collected during the GHMC/HMC burn-in stage. The latter is run with the 1-stage Velocity Verlet integrator using L=1 and a simulation step size \Delta t_{\text{VV}} tuned on the fly in order to reach a target acceptance rate for the burn-in (more details in Appendix B of [14]). For GHMC, the random noise \varphi for the PMU (7) during the burn-in stage is selected according to (20)–(21).

On completion of the GHMC/HMC burn-in stage, the following properties are computed and stored:

  1. the acceptance rate (AR);

  2. either the full set of frequencies \omega_{i}, i=1,...,D, their maximum \tilde{\omega} and standard deviation \sigma (which require an extra computational effort due to the calculation of Hessians, see [14]), or the maximum frequency \tilde{\omega} only (less computationally expensive, see [40], Sec. 4.2).

These data are needed for the calculation of the fitting factor S_{f} of the simulated system, which is critical for the nondimensionalization procedure, as well as for finding the optimal integration coefficients of the s-AIA3 integrator and the optimal parameter settings for a GHMC/HMC production simulation.

Depending on the availability of the frequencies, we consider

S_{f}=\begin{cases}S_{\omega},&\text{if the full set of }\omega_{i},\,i=1,...,D\text{ is available},\\ S,&\text{otherwise},\end{cases} (28)

where S and S_{\omega} are calculated as described in [14] and summarized in C. Then, the parameters of the s-AIA3 integrator are computed according to the procedure introduced in [14], using the map h\to b_{\text{opt}} available in [52].

The GHMC/HMC optimal parameters for the production stage are:

  1. Step size \Delta t: picked uniformly at random from (\Delta t_{\text{lower}},\Delta t_{\text{CoLSI}}), where

     \Delta t_{\text{lower}}=\frac{h_{\text{lower}}}{\text{CF}},\qquad\Delta t_{\text{CoLSI}}=\frac{3}{\text{CF}},

     and CF and h_{\text{lower}} are defined in (52)–(53) and (13), respectively.

  2. PMU random noise \varphi (for GHMC only): similarly to the burn-in stage, \varphi is randomized following Eqs. (20)–(21).

  3. Number of integration steps per Monte Carlo iteration L (recommended for GHMC but can also be used with HMC): L is chosen following (27).

Finally, the production stage is run using the s-AIA3 integrator [14] and the set of optimal parameters for a chosen number of iterations N_{\text{prod}}.

The algorithm is outlined in Figure 5.

INPUT (burn-in): number of iterations N_{\text{burn-in}} (burn-in) and N_{\text{prod}} (production); calculation of frequencies I_{\omega}=0 (no) / 1 (yes); integrator: 1-stage Velocity Verlet; trajectory length L=1; dimensional step size \Delta t=\Delta t_{\text{VV}} tuned on the fly; PMU random noise \varphi=\varphi_{i}\sim\mathcal{U}(\varphi_{\text{lower}},\varphi_{\text{upper}}), i=1,...,N_{\text{burn-in}}, with \varphi_{\text{lower}},\varphi_{\text{upper}} in (20); integrator parameters from the map h\to b_{\text{opt}} (C, [52]).

GHMC/HMC (Burn-in)

OUTPUT: acceptance rates AR; if I_{\omega}=1, the frequencies \omega_{i}, \tilde{\omega}, \sigma; else \tilde{\omega} only, with \sigma=0.

ANALYSIS (s-AIA3): nondimensionalization – if I_{\omega}=0, S_{f}=S (51), else S_{f}=S_{\omega} (51); if \sigma<1, CF from (53), else CF from (52); integrator parameters a_{\text{opt}}, b_{\text{opt}} [52].

PRODUCTION SETTINGS: PMU random noise \varphi_{i}\sim\mathcal{U}(\varphi_{\text{lower}},\varphi_{\text{upper}}) (20); step size \Delta t_{i}\sim\mathcal{U}(\Delta t_{\text{lower}},\Delta t_{\text{upper}}) (14), with \Delta t_{\text{lower}}=h_{\text{lower}}/\text{CF} (13) and \Delta t_{\text{upper}}=3/\text{CF}; number of integration steps per iteration L_{i} from (27); i=1,...,N_{\text{prod}}.

INPUT (production): integrator s-AIA3; \Delta t=\Delta t_{i}; L=L_{i}; \varphi=\varphi_{i}, i=1,...,N_{\text{prod}}.

GHMC/HMC (Production)

OUTPUT: GHMC/HMC trajectories.

Figure 5: Schematic representation of the ATune algorithm for generating a set of optimal parameters for a GHMC/HMC simulation.

The ATune algorithm is implemented (with no computational overhead introduced in a production simulation) in the BCAM in-house software package HaiCS (Hamiltonians in Computational Statistics), designed for statistical sampling of high-dimensional, complex distributions and parameter estimation in Bayesian models using MCMC and HMC-based methods. A comprehensive overview of the package can be found in [53], whereas applications of the HaiCS software are presented in [54, 13, 55, 14].
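For concreteness, the production settings assembled by ATune can be sketched as follows. The sketch assumes CF = S_f \tilde{\omega}, as suggested by the nondimensionalization h = S_f \tilde{\omega} \Delta t in (23) (the general CF of (52)–(53) also involves the frequency spread \sigma), and fixes illustrative 3-stage coefficients (a, b) in place of the h-dependent s-AIA3 map; it reuses fitting_factor, sample_L and phi_interval from the earlier sketches:

```python
import numpy as np

def atune_production_settings(omega_max, dt_vv, mean_dH, D, n_prod, rng,
                              a=0.29620, b=0.11888):
    """Sketch of the ATune output: per-iteration (Delta t, L, phi) for a
    production run, computed from burn-in statistics."""
    S_f = fitting_factor(omega_max, dt_vv, mean_dH, D)   # Eq. (22)
    CF = S_f * omega_max                                 # assumed form of CF
    phi_lo, phi_hi = phi_interval(D, b, a)               # Eqs. (20)-(21)
    settings = []
    for _ in range(n_prod):
        settings.append(dict(
            dt=rng.uniform(2.0772 / CF, 3.0 / CF),       # Eq. (14)
            L=sample_L(S_f, rng),                        # Eq. (27)
            phi=rng.uniform(phi_lo, phi_hi),             # Eq. (21)
        ))
    return settings
```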

8 Numerical experiments

The next objective is to numerically justify the efficiency of our new adaptive tuning algorithm using a set of standard benchmarks summarized in Table 4.

8.1 Performance metrics

The irreversibility of GHMC may allow it to reach convergence faster than reversible samplers like HMC. To check this, we choose a performance metric that includes information about the speed of convergence of the Markov chains. The metric was briefly introduced in Section 3. Here we provide more details on it and extend it to alternative approaches for computing the effective sample size (ESS).

Following the recommendation in [50], we consider a chain to be converged if \max\text{PSRF}<1.01. The Potential Scale Reduction Factor (PSRF) [48] can be calculated as described in [49].

Starting from that, we define N_{1.01} as the number of samples in the production chains required for reaching \max\text{PSRF}<1.01, and evaluate the ratio grad/ESS between the number of gradient computations and the ESS for production chains of length N_{1.01}+1000 iterations. Such a metric combines information about both convergence and sampling efficiency.

The number of gradient evaluations reads

\text{grad}=(N_{1.01}+1000)\,\overline{L}\,k,

where \overline{L} is the mean number of integration steps per Monte Carlo iteration (it varies depending on the chosen randomization scheme for L) and k is the number of stages of the integrator in use (in our case, k=3). We recall that the number of stages k indicates the number of gradient evaluations the algorithm performs per integration step. Therefore, \overline{L}k represents the theoretical average number of gradient evaluations per Monte Carlo iteration.

In Section 3, we calculated the ESS using the effectiveSize function of the coda package of R [51] and considered the minimum ESS across variates, i.e. minESS. We include two additional ESS metrics in our list, which now reads:

  1. minESS (coda) – the minimum ESS across variates;

  2. meanESS (coda) – the average ESS over the sampled variables, calculated through the effectiveSize function of the coda package of R [51];

  3. multiESS – the metric calculated by the multiESS function of the mcmcse package [56] using the approach proposed in [57].

To summarize, we propose three metrics for assessing convergence and sampling performance: grad/minESS, grad/meanESS and grad/multiESS. For each metric, a lower number of gradient evaluations (grad) indicates faster convergence, while a higher ESS suggests better sampling efficiency. Consequently, the smallest grad/ESS ratio represents the best overall performance.

To enable an efficiency comparison between two samplers, a Relative Efficiency Factor (REF) is introduced as

\text{REF}(\text{Sampler}_{1},\text{Sampler}_{2})=\frac{\mathcal{M}_{2}}{\mathcal{M}_{1}},\qquad\mathcal{M}=\frac{\text{grad}}{\text{ESS}},\qquad\text{ESS}\in\{\text{minESS},\text{meanESS},\text{multiESS}\}. (29)

The REF value quantifies the extent to which \text{Sampler}_{1} outperforms \text{Sampler}_{2} for a particular \mathcal{M}=\text{grad}/\text{ESS} metric, where ESS is picked from the set \{minESS, meanESS, multiESS\}.
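A sketch of these metrics in Python (the paper uses the R packages coda and mcmcse; here the ESS estimator of the arviz package serves only as an analogue, so the numbers will not match exactly):

```python
import numpy as np
import arviz as az

def grad_ess_metrics(chains, L_mean, k=3):
    """grad/minESS and grad/meanESS for `chains` of shape
    (n_chains, n_iterations, D); grad = n_iterations * mean(L) * k."""
    n_iter = chains.shape[1]
    grad = n_iter * L_mean * k
    ess = az.ess(az.convert_to_dataset(chains)).to_array().values.ravel()
    return grad / ess.min(), grad / ess.mean()

def ref(metric_sampler1, metric_sampler2):
    """Relative Efficiency Factor (29): > 1 means Sampler 1 is more efficient."""
    return metric_sampler2 / metric_sampler1
```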

Benchmark | Model | Data | D (dimension) | K (observations)
G1000 | \mathcal{N}(0,\Sigma), \Sigma^{-1}\sim\mathcal{W}(\mathbb{I}_{D},D) [5] | – | 1000 | –
G500 | \mathcal{N}(0,\Sigma), \Sigma^{-1}\sim\mathcal{W}(\mathbb{I}_{D},D) [5] | – | 500 | –
G2000 | \mathcal{N}(0,\Sigma), \Sigma^{-1}\sim\mathcal{W}(\mathbb{I}_{D},D) [5] | – | 2000 | –
German | Bayesian Logistic Regression [58] | [59, 20] | 25 | 1000
Musk | Bayesian Logistic Regression [58] | [59] | 167 | 476
Banana [53, 20] | \mathcal{N}(0,\sigma_{\theta}^{2}), \sigma_{\theta}^{2}=1 | \bm{y}\sim\mathcal{N}(1,2) | 2 | 100
Table 4: List of benchmarks used in this study. \mathcal{W} is the Wishart distribution (more details in [5]), \mathbb{I}_{D} is the identity matrix in \mathbb{R}^{D\times D}, and \bm{y}\in\mathbb{R}^{K} is the set of observations.

8.2 Benchmark tests

We use the metrics described in Section 8.1 for the full set of simulation benchmarks (Table 4). For the biggest benchmark (G2000), we consider the ESS metrics evaluated over the first N_{1.01}+2000 iterations, in order to take into account enough iterations (>D=2000) for the multiESS calculation. We also remark that for the Gaussian benchmarks, the grad/ESS metrics for the simulations with L=1 were calculated over the first \tilde{N}_{1.01}+1000 iterations (2000 for G2000), where \tilde{N}_{1.01} is the number of iterations required to achieve \text{avgPSRF}<1.01. The relaxation of the PSRF threshold helped to avoid the extremely long simulations (N_{\text{prod}}\gg 10^{6}) needed for reaching convergence in terms of \max\text{PSRF}<1.01.

Benchmark | L | \varphi
G1000 | 1 | \mathcal{U}(0.00044,0.00264)
G500 | 1 | \mathcal{U}(0.00088,0.00527)
G2000 | 1 | \mathcal{U}(0.00022,0.00132)
German | 1 | \mathcal{U}(0.01752,0.10545)
Musk | \mathcal{U}\{2,...,6\} | \mathcal{U}(0.00262,0.01579)
Banana | \mathcal{U}\{2,...,6\} | \mathcal{U}(0.21904,1)
Table 5: Optimal randomization intervals for L (27) and \varphi (21) for each tested benchmark – G1000, G500, G2000, German, Musk, Banana.
Benchmark | \varphi | ESS | L=1 | L_{\text{opt}} | \mathcal{U}\{1,D/3\} | \mathcal{U}\{1,2D/3\} | \mathcal{U}\{1,4D/3\} | \mathcal{U}\{1,8D/3\}
G1000 \mathcal{U}(\varphi_{\text{lower}},\varphi_{\text{upper}}) minESS 39373 2387 1491 2509 5021
meanESS 0.4596 532.1 1071 2142 4285
multiESS 0.3787 299.8 402.5 895.4 3069
1 (HMC) minESS 132650 46712 21620 11740 5554
meanESS 2.530 533.6 1054 2093 4198
multiESS 3.712 301.3 1001 2000 4001
G500 \mathcal{U}(\varphi_{\text{lower}},\varphi_{\text{upper}}) minESS 31792 1553 919.8 1090 2339
meanESS 0.3429 260.0 520.7 1043 2093
multiESS 0.2553 170.6 320.8 384.0 1422
1 (HMC) minESS 116471 42111 21821 10094 4790
meanESS 2.363 261.3 515.5 1035 2046
multiESS 3.666 107.8 321.6 563.3 1325
G2000 \mathcal{U}(\varphi_{\text{lower}},\varphi_{\text{upper}}) minESS 63451 3305 2515 5036 9195
meanESS 0.6518 1104 2207 4417 8853
multiESS 0.5029 428.5 860.1 1723 7096
1 (HMC) minESS 174435 47859 22770 10870 9582
meanESS 2.791 1099 2178 4348 8671
multiESS 3.739 677.3 2000 4001 8000
German \mathcal{U}(\varphi_{\text{lower}},\varphi_{\text{upper}}) minESS 1.734 13.94 25.93 53.38 113.6
meanESS 0.3463 10.15 23.84 49.05 101.0
multiESS 0.1218 10.62 25.10 49.47 100.5
1 (HMC) minESS 24.92 13.25 26.18 53.09 112.6
meanESS 5.860 10.56 24.45 49.23 102.9
multiESS 2.334 10.88 25.44 49.44 100.8
Musk \mathcal{U}(\varphi_{\text{lower}},\varphi_{\text{upper}}) minESS 216.8 132.3 229.4 389.7 601.3 1458
meanESS 145.4 57.16 68.44 165.0 464.5 1031
multiESS 19.79 13.58 36.00 160.4 342.0 774.5
1 (HMC) minESS 6026 1476 187.7 290.3 525.7 1215
meanESS 3068 886.8 143.5 176.3 454.6 1047
multiESS 375.8 93.81 50.60 159.5 279.5 506.9
L=1 | \mathcal{U}\{1,D\} | L_{\text{opt}} | \mathcal{U}\{1,5D\} | \mathcal{U}\{1,6D\}
Banana \mathcal{U}(\varphi_{\text{lower}},\varphi_{\text{upper}}) minESS 287.9 247.2 167.3 168.0 158.3
meanESS 190.4 167.1 141.6 167.5 156.5
multiESS 202.3 165.6 151.2 183.3 169.2
1 (HMC) minESS 619.2 440.9 255.0 210.0 212.6
meanESS 403.7 283.1 197.8 192.3 208.1
multiESS 436.6 299.4 218.3 214.2 227.1
Table 6: Performance comparison between GHMC with the optimal randomization scheme \varphi\sim\mathcal{U}(\varphi_{\text{lower}},\varphi_{\text{upper}}) (21) and standard HMC, in terms of the grad/ESS metrics (\text{ESS}\in\{\text{minESS},\text{meanESS},\text{multiESS}\}) for a range of randomization schemes for L in all the tested benchmarks. The overall top values for each metric in each benchmark are highlighted in red. The best values for a sampler that are not the overall best for a metric are shown in bold. The results achieved with the most successful simulation settings for the more successful sampler are shaded in light blue, whereas the results obtained with the most successful simulation settings for the less successful sampler are highlighted in light grey.

In order to evaluate the optimal settings proposed in this study, we compare our recommended choice \varphi\sim\mathcal{U}(\varphi_{\text{lower}},\varphi_{\text{upper}}) (21) with the fixed choice \varphi=1 (which corresponds to standard HMC) and with three randomization intervals: \varphi\sim\mathcal{U}(0,0.1), \mathcal{U}(0,0.5) and \mathcal{U}(0,0.9). For each choice of \varphi, in addition to the proposed optimal randomization interval for L (27), we consider five different \mathcal{U}\{1,L_{\text{upper}}\} intervals (four for Banana), where the values of L_{\text{upper}} are proportional to the dimension of the simulated system. The optimal randomization intervals for \varphi and L are summarized in Table 5.

As put forward at the end of Section 4.2, we use s-AIA3 as the numerical integrator for Hamiltonian dynamics, with the step size randomization interval proposed in Section 4.1 (Eq. (14)).

Table 6 presents the performance comparison between GHMC with φ ∼ 𝒰(φ_lower, φ_upper) and standard HMC (φ = 1) across the tested benchmarks, with varying values of L_upper. For every benchmark and each metric, the best performance is consistently achieved by GHMC (shown in red). Combining GHMC with our proposed choice of L_opt leads to the best GHMC results for the average metrics, i.e. grad/meanESS and grad/multiESS. Moreover, for the non-Gaussian benchmarks (which also have smaller dimensions), grad/minESS is best with this choice of L. For the Gaussian benchmarks, the best grad/minESS results are reached at larger values of L_upper – around D for GHMC and above 2D for HMC.
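For concreteness, the randomization schemes compared above amount to drawing fresh values of Δt, L and φ at every Monte Carlo iteration. Below is a minimal sketch of such per-iteration draws; the bound values are placeholders standing in for the tuned intervals produced at the burn-in stage, not numbers taken from the tables.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative placeholder bounds; in practice these come from the
# tuned intervals produced at the burn-in stage (Sections 4 and 7).
dt_lower, dt_upper = 0.7, 1.0        # step size interval, upper end at Delta t_CoLSI
phi_lower, phi_upper = 0.05, 0.33    # optimal noise interval (21)
L_upper = 1                          # L_opt = 1 for harmonic-like systems (27)

def draw_hyperparameters():
    """One GHMC iteration's draws: continuous uniform for Delta t and phi,
    discrete uniform U{1, ..., L_upper} for the number of integration steps."""
    dt = rng.uniform(dt_lower, dt_upper)
    L = int(rng.integers(1, L_upper + 1))
    phi = rng.uniform(phi_lower, phi_upper)
    return dt, L, phi

print([draw_hyperparameters() for _ in range(3)])
```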

The results obtained with the other tested randomization intervals for φ, 𝒰(0, 0.1), 𝒰(0, 0.5), 𝒰(0, 0.9), support our choice of the optimal randomization interval (φ_lower, φ_upper) and can be examined in Appendix E.

Figure 6: Left: relative efficiency (REF (29)) of GHMC with the optimal parameter settings, AT-GHMC (light blue boxes in Table 6), with respect to HMC with its top parameter settings, HMC_best (light grey boxes in Table 6), across all tested benchmarks. All benchmarks show the superiority (up to 34x) of AT-GHMC over HMC_best for each performance metric.
Right: relative efficiency (REF (29)) of GHMC with respect to HMC with their best settings for minESS (see Table 6) for the Gaussian benchmarks G1000, G500, G2000. All benchmarks show the superiority (up to 10x) of GHMC_minESSbest over HMC_minESSbest for each performance metric.

Finally, in Figure 6, we provide a performance comparison in terms of the REF (29) between GHMC with our optimal settings, i.e. AT-GHMC, and HMC with its best settings found heuristically (see Table 6), using all proposed ESS metrics and all tested benchmarks. For the Gaussian benchmarks, we also present the comparison at the L_upper values that optimize the minESS metric in HMC and GHMC (see Table 6). Both plots in Figure 6 confirm the improvements (up to 34x and 10x, respectively) gained with our proposed GHMC settings across all metrics and benchmarks. They also demonstrate that greater improvements are achieved with the average metrics, i.e. meanESS and multiESS.

9 Applications – Case studies

Next, we apply ATune to three case studies:

  1. Patient resistance to endocrine therapy in breast cancer

  2. Cell-cell adhesion dynamics

  3. Influenza A (H1N1) epidemic outbreak.

9.1 Patient resistance to endocrine therapy in breast cancer

Estrogen receptor-positive (ER+) breast cancer is the most common subtype of breast cancer, accounting for over 70% of all diagnosed breast tumors. Hormone therapies such as tamoxifen are the gold-standard treatment. However, a significant proportion of patients develop resistance to the treatment, which constitutes a serious clinical problem. The development of this resistance is not yet well understood, and much clinical and laboratory research aims to identify the genes that may be relevant to the problem.

For one such clinical study, presented in [60], ER+ breast cancer patients treated with tamoxifen for 5 years were tested for the presence of a specific gene marker, SOX2. The objective of the study was to analyze the connection between SOX2 and the emergence of resistance to tamoxifen treatment. Overexpression of this gene had previously been identified in bioinformatic analyses of resistant breast cancer cells as a key contributor to resistance, but clinical evidence was presented in [60] for the first time.

The dataset used in our work stems from that clinical trial. It contains numerical data for SOX2 values and 7 other gene-related covariates taken from 75 tumor samples. The recorded response variable was the event of relapse after 5 years. Notably, the SOX2 variable perfectly separates the dataset with respect to this response variable: all cases with SOX2 values below 3 respond well to the treatment, and all those above this value present a resistant response. The presence of separation implies that maximum likelihood estimates do not exist for some binary classification models, such as logistic regression [61], in the classical frequentist setting. The perfect separation makes the Bayesian framework a natural choice for modeling such a dataset, as it can provide estimates for the parameters involved without the need for regularization techniques.

The model of choice is Bayesian logistic regression (BLR), where the dataset has 75 observations and dimension D = 8 (including the intercept). We test the effect of three priors: informative 𝒩(0,1), weakly informative 𝒩(0,2.5) (following [62]), and low-informative 𝒩(0,5).
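For reference, the sketch below shows the kind of log-posterior and gradient an HMC/GHMC sampler evaluates for this model. The helper name and the toy data are ours, under the stated Gaussian-prior assumption; the production implementation lives in HaiCS.

```python
import numpy as np

def blr_logpost_and_grad(theta, X, y, prior_sd):
    """Log-posterior (up to a constant) and its gradient for Bayesian
    logistic regression with independent N(0, prior_sd^2) priors.
    X contains an intercept column; y holds 0/1 responses."""
    z = X @ theta
    loglik = np.sum(y * z - np.logaddexp(0.0, z))   # stable log(1 + e^z)
    logprior = -0.5 * np.sum(theta**2) / prior_sd**2
    p = 1.0 / (1.0 + np.exp(-z))                    # sigmoid
    grad = X.T @ (y - p) - theta / prior_sd**2
    return loglik + logprior, grad

# Toy data with the shapes of the study: 75 samples, D = 8 (intercept + 7 covariates)
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(75), rng.normal(size=(75, 7))])
y = rng.integers(0, 2, size=75).astype(float)
lp, grad = blr_logpost_and_grad(np.zeros(8), X, y, prior_sd=2.5)
print(lp, grad[:3])
```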

We compare the performance of the AT-GHMC sampler proposed in this study against the No-U-Turn Sampler (NUTS) [5], the optimized HMC sampler implemented in Stan [63], using its R interface rstan [64]. NUTS in Stan uses the leapfrog/Verlet integrator [6] and its original optimization routine to determine both a step size, Δt, and a number of Hamiltonian dynamics integration steps per iteration, L. Additionally, we perform numerical experiments with HMC using L_opt from Eq. (27) (AT-HMC) and with L = L_NUTS (HMC_LNUTS). The latter is determined such that the integration leg ΔtL matches that found by NUTS.
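The matching rule is not spelled out as a formula here; one simple arithmetic consistent with the L intervals reported in Table 7 is to choose L_upper so that the mean leg of L ∼ 𝒰{1, L_upper}, together with the tuned Δt interval, equals the mean NUTS leg L̄·Δt̄. A sketch (the helper name is ours):

```python
def match_L_upper_to_nuts(L_bar, dt_bar, dt_lower, dt_upper):
    """Choose L_upper for L ~ U{1, L_upper} so that the mean integration leg
    E[L] * E[dt] matches the mean NUTS leg L_bar * dt_bar.
    For U{1, L_upper}, E[L] = (1 + L_upper) / 2."""
    dt_mean = 0.5 * (dt_lower + dt_upper)
    return max(1, round(2.0 * L_bar * dt_bar / dt_mean - 1.0))

# N(0,5) prior in Table 7: NUTS leg 14.8 * 0.247, tuned dt interval (0.889, 1.284)
print(match_L_upper_to_nuts(14.8, 0.247, 0.889, 1.284))  # -> 6, i.e. U{1, 6}
```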

All samplers are run with the same number of burn-in iterations, N_burn-in, to ensure a fair comparison. We assess performance using the grad/ESS metrics introduced in Section 8.1. Small values of N_1.01 in NUTS are expected, given that its optimization routine aims to bring the PSRF close to 1. The simulation settings are summarized in Table 7.

Figure 7 compares the sampling performance of AT-GHMC, AT-HMC, and HMC_LNUTS with NUTS using the REF metrics (29) for ESS ∈ {minESS, meanESS, multiESS}. AT-GHMC consistently outperforms NUTS across all metrics and priors, achieving more than an 8-fold improvement (e.g., multiESS for the 𝒩(0,2.5) prior).

Prior      | Sampler    | N_1.01 | L          | Δt                        | φ
𝒩(0,5)     | AT-GHMC    | 300    | 1          | 𝒰(0.889, 1.284)           | 𝒰(0.055, 0.330)
           | AT-HMC     | 200    | 1          | 𝒰(0.890, 1.285)           | –
           | HMC_LNUTS  | 100    | 𝒰{1, 6}    | optimal interval (Sec. 7) | –
           | NUTS       | 100    | L̄ = 14.8   | Δt̄ = 0.247                | –
𝒩(0,2.5)   | AT-GHMC    | 100    | 1          | 𝒰(0.725, 1.047)           | 𝒰(0.055, 0.330)
           | AT-HMC     | 100    | 1          | 𝒰(0.723, 1.045)           | –
           | HMC_LNUTS  | 200    | 𝒰{1, 8}    | optimal interval (Sec. 7) | –
           | NUTS       | 100    | L̄ = 11.8   | Δt̄ = 0.354                | –
𝒩(0,1)     | AT-GHMC    | 100    | 1          | 𝒰(0.545, 0.787)           | 𝒰(0.055, 0.330)
           | AT-HMC     | 200    | 1          | 𝒰(0.545, 0.788)           | –
           | HMC_LNUTS  | 200    | 𝒰{1, 11}   | optimal interval (Sec. 7) | –
           | NUTS       | 100    | L̄ = 8.1    | Δt̄ = 0.474                | –

Table 7: Simulation parameters for the numerical experiments on the breast cancer dataset with the BLR model combined with three different priors: 𝒩(0,5), 𝒩(0,2.5), 𝒩(0,1). The optimal settings for AT-GHMC and AT-HMC, as well as the optimal Δt randomization interval for HMC_LNUTS, are found using the procedure described in Section 7. AT-GHMC, AT-HMC, and HMC_LNUTS employ the s-AIA3 integrator, while NUTS uses standard Verlet. The mean values of L and Δt used by the NUTS optimization routine are denoted L̄ and Δt̄, respectively.

We emphasize that the ATune algorithm does not offer an optimal choice of L for HMC. The values of L_opt (27) used in AT-HMC are the values recommended for GHMC. However, the strong meanESS and multiESS performance of AT-HMC in all experiments confirms their applicability to HMC. In contrast, the relative efficiency of HMC_LNUTS with respect to NUTS depends strongly on the chosen prior. Specifically, HMC_LNUTS demonstrates visible improvements over NUTS with the low-informative prior 𝒩(0,5), comparable performance with NUTS for the 𝒩(0,2.5) prior, and poor sampling efficiency with the highly informative prior 𝒩(0,1). For 𝒩(0,5), the improved performance can be attributed to the higher accuracy and longer optimal step size of the 3-stage integrator (s-AIA3) employed in HMC_LNUTS compared with those achievable with the standard 1-stage Verlet integrator used in NUTS. Conversely, the poor performance of HMC_LNUTS with 𝒩(0,1) is likely caused by an underestimation of Δt_CoLSI, as hinted by the comparison of its value with Δt̄ (see Table 7). Indeed, the values are close, which is not what one expects when comparing optimal step sizes for 1- and 3-stage integrators. This is not the case for 𝒩(0,5) and 𝒩(0,2.5), where the estimated optimal step sizes are longer than the NUTS optimal step sizes (though less pronounced for 𝒩(0,2.5)), and where AT-GHMC and AT-HMC are never outperformed by NUTS. These observations are encouraging: even in the worst scenario, when the estimated optimal parameters are less accurate, they still guarantee good sampling performance (see AT-GHMC and AT-HMC for 𝒩(0,1)) and are not inferior to the optimal parameters offered by the state-of-the-art NUTS approach. Moreover, the sampling advantages of GHMC over HMC, enhanced by the stochastic procedure for finding its optimal settings, are robust to the inaccuracy in the estimation of an optimal step size; as a result, AT-GHMC demonstrates a significant improvement over NUTS, AT-HMC and HMC_LNUTS in the case of 𝒩(0,1). The fact that the performance of AT-HMC in this case is comparable with, or better than, the NUTS performance also confirms the robustness of the proposed methodology.

Figure 7: Relative efficiency REF (29) for minESS (left), meanESS (center), and multiESS (right) of AT-GHMC, AT-HMC, and HMC_LNUTS with respect to Stan NUTS, obtained in the numerical experiments run with the settings presented in Table 7. The performance of AT-GHMC (filled bars), AT-HMC (horizontal bars), and HMC_LNUTS (vertical bars) is compared with NUTS for the three examined priors: 𝒩(0,5) (blue), 𝒩(0,2.5) (yellow), 𝒩(0,1) (green). AT-GHMC significantly outperforms NUTS for every metric and prior. AT-HMC demonstrates good results for the average metrics (meanESS, multiESS) with all priors but is less convincing for minESS. Using L_NUTS instead of L_opt (27) in HMC leads to performance degradation in HMC simulations with 𝒩(0,1), whereas for 𝒩(0,5) the results are visibly better than with NUTS and with AT-HMC for all metrics.

9.2 Parameter inference on a cell-cell adhesion model

Many phenomena at the cellular level, such as tissue formation or cell sorting, can be explained by introducing differential cell-cell adhesion [65, 66, 67, 68, 69, 70]. This phenomenon has been used to explain pattern formation in zebrafish lateral lines [71, 72], cancer invasion modeling [66], and cell sorting in the early fly brain [70] and in early mouse brain development [73]. When modeling such biological systems using PDEs, adhesion can be represented by means of a non-local interaction potential [66, 65, 68, 69]. To enhance the predictive power of such models, careful calibration of the model parameters against experimental lab data is required. This task can become increasingly difficult due to the sparsity of available data or the increasing complexity of the model (caused, for example, by the addition of an interaction term). The application of state-of-the-art methodologies for parameter estimation in PDE models has received much attention in recent years [74, 75, 76], and, in this context, our goal is to investigate the feasibility of using the methodology introduced in this work for this task.

To achieve this goal, we propose a benchmark PDE model:

\frac{\partial u}{\partial t}=\nabla\cdot\left[u\,\nabla\left(\Pi'(u)+W*u\right)\right], (30)

which describes the time evolution of the density of cells u(𝐱,t) > 0, 𝐱 ∈ ℝ^d, t > 0 (d = 1, 2, 3 is the spatial dimension), as governed by a series of potentials. In (30), Π(u) is the density of the internal energy and W(𝐱) is an interaction potential (also called an interaction kernel) describing cell adhesion. The convolution W∗u gives rise to non-local interactions. Depending on the explicit form of the potentials, these models can describe a wide range of biological phenomena, such as the formation of ligands through long filopodia, modeled by nonlocal attractive potentials, and volume constraints, modeled by strongly repulsive nonlocal potentials or nonlinear diffusive terms. We refer to [77] for more information about the applications of these aggregation-diffusion models in the sciences. We consider the following explicit forms for Π′(u) and W(𝐱):

\Pi'(u)=\nu u^{m-1},\qquad W(\mathbf{x})=-\frac{a\,e^{-|\mathbf{x}|^{2}/(2r_{W})}}{\sqrt{2\pi r_{W}}}, (31)

where Π′(u) gives rise to a diffusion term governed by the diffusion exponent m. In the limit m → 1, one recovers linear diffusion, whereas for m > 1 the term represents porous-medium diffusion and leads to propagation fronts with finite speed. The diffusion coefficient ν defines the cell mobility in the medium. The explicit form of W(𝐱), in turn, leads to symmetric attraction between cells due to ligand formation on the cell membrane. The strength of the attraction is given by the parameter a > 0, whereas r_W > 0 controls the range of the interaction.

Figure 8: Relative error of the diffusion coefficient (left), diffusion exponent (second from left), noise parameter (second from right), and range of interaction (right) for the 3-parameter model (purple circles) and the 4-parameter model (green squares). All samplers are able to recover the true parameter values with low relative error; in particular, HMC and GHMC show improved precision over RW.

Our purpose is to perform Bayesian inference of the parameters of this model using three different MCMC methods and to evaluate the efficiency of each tested sampler for this class of models. Parameter inference in a Bayesian framework requires sampling from the posterior distribution of the parameters, given by π(𝜽|𝒚) ∝ L(𝒚|𝜽)p(𝜽), where 𝒚 are the experimental data and 𝜽 is the vector of model parameters. For the proposed model, the data consist of a collection of measurements of the cell density at different spatial points and different times, 𝒚 = {u_𝒚(𝐱_i, t_j)}_{i,j}. In cell modeling, it is usual to assume that the experimental data stem from the proposed model with additive Gaussian noise ε ∼ 𝒩(0,σ), i.e. u_𝒚(𝐱_i, t_j) = u(𝐱_i, t_j) + ε. Under this assumption, the likelihood takes the form:

L(\bm{y}|\bm{\theta})=\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\left(-\frac{1}{2}\sum_{j}\left(\frac{u_{\bm{y}}(\mathbf{x},t_{j})-u(\mathbf{x},t_{j})}{\sigma}\right)^{2}\right). (32)

In this work, we use synthetic data generated from model (30) in the one-dimensional case (d = 1). First, we numerically solve the model on the interval t ∈ [0,5] using the finite volume scheme proposed in [78, 79], with a spatial mesh of 50 cells and a time step of δt = 0.1, resulting in a total of 51 time points. We denote the numerical solution at point x_i and time t_j as ũ_s(x_i, t_j). We then generate the synthetic data by adding Gaussian white noise ε_ij ∼ 𝒩(0,σ) to each cell and time point, ũ_𝒚(x_i, t_j) = ũ_s(x_i, t_j) + ε_ij, where σ is the noise parameter.
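This generative recipe is straightforward to express in code. In the sketch below, the PDE solution is replaced by a random stand-in array, since the finite volume solver itself is outside the scope of this illustration; the log-likelihood is the logarithm of (32) accumulated over the grid.

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for the finite-volume solution u_s(x_i, t_j): 50 cells x 51 time points.
# Any (50, 51) array returned by the PDE solver would take its place.
u_s = rng.random((50, 51))

sigma_true = 0.05
u_data = u_s + rng.normal(0.0, sigma_true, size=u_s.shape)  # synthetic observations

def log_likelihood(u_model, u_data, sigma):
    """Gaussian log-likelihood of the data grid given a model solution,
    summed over all cells and time points."""
    resid = (u_data - u_model) / sigma
    n = u_data.size
    return -0.5 * np.sum(resid**2) - n * np.log(sigma * np.sqrt(2.0 * np.pi))

print(log_likelihood(u_s, u_data, sigma_true))
```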

We consider two different variations of the model for the numerical experiments. In the first one (3-parameter model), the sampled parameters are the diffusion coefficient ν, the diffusion exponent m from the internal energy potential in (31), and the model-independent noise parameter σ. The remaining parameters in (31) are fixed to a = 2 and r_W = 0.5. In the second variation (4-parameter model), we extend the previous model to also consider r_W as a sampled parameter, leaving a = 2 as the only fixed parameter in (31). The chosen priors are ν ∼ Exp(1), m ∼ Γ(10,3), σ ∼ Exp(0.1), and r_W ∼ Exp(1).

Three sampling methodologies are examined: Random Walk Metropolis-Hastings (RW) (the most commonly used method for Bayesian inference), AT-HMC, and AT-GHMC. We evaluate performance using the metrics introduced in Section 8.1. For this study, the metrics are computed over the first N_1.01 + 5000 iterations, where N_1.01 is the number of iterations needed for convergence. We note that the slower-converging RW requires a very large number of iterations to meet the strict threshold maxPSRF < 1.01 (N_1.01 > 10^6). Therefore, for this sampler, we relax the convergence threshold to 1.1, as suggested in [48]. Since RW does not involve gradient evaluations (which constitute the bulk of the computational cost in HMC/GHMC), we normalize the ESS metrics with respect to the simulation time T.

Figure 8 shows the relative error for each parameter and both models, defined as ε_θ = |θ_true − θ|/θ_true, where θ_true is the true parameter value used for generating the synthetic data and θ is the mean parameter value obtained from the RW, AT-HMC, and AT-GHMC simulations over the first 5000 iterations after convergence. All three samplers are able to recover the true parameter values with high precision, as evidenced by the low relative errors, meaning all parameters can be confidently identified with small variance. In particular, GHMC and HMC exhibit higher precision than RW. Nevertheless, the numbers of iterations needed to reach the stationary distribution are vastly different across samplers: AT-GHMC needs ∼10^4 samples to converge for both models, whilst AT-HMC requires ∼10^5 and RW ∼10^6 (even with the relaxed convergence threshold), confirming that AT-GHMC has a much faster convergence rate.

In Figure 9, we present the comparison of the REF metric (29) for the three samplers (Sampler1 = {AT-GHMC, AT-HMC}, Sampler2 = RW) for both models, in terms of ESS/T, ESS ∈ {minESS, meanESS, multiESS}. AT-GHMC exhibits high efficiency across all metrics and models, outperforming the popular RW and AT-HMC by up to an order of magnitude in both models. Additionally, AT-GHMC shows visibly higher relative performance in the 4-parameter model than in the 3-parameter model for two out of the three considered metrics. This may indicate the potential of the method for high-dimensional problems. These results suggest that AT-GHMC provides a suitable framework for PDE calibration. On the other hand, AT-HMC displays worse performance than RW in two out of three metrics when the dimension of the model increases. Most likely, an optimal choice of L for GHMC (27) does not necessarily translate into optimal performance for HMC, especially in higher dimensions.

Figure 9: Relative efficiency REF for minESS/T (left), meanESS/T (center), and multiESS/T (right), where T is the computational time, of AT-GHMC and AT-HMC with respect to RW. For the 3-parameter model, AT-GHMC (purple) and AT-HMC (orange) show improved performance over RW (dotted line) for all metrics. For the 4-parameter model, RW performs better than HMC for two out of three metrics. AT-GHMC demonstrates the best performance by a wide margin, outperforming RW by a factor of up to 20 in the 3-parameter model and up to 15 in the 4-parameter model.

9.3 Influenza A (H1N1) epidemic outbreak

Mechanistic disease transmission models are mathematical frameworks used to describe how infectious diseases spread through populations. These models commonly employ compartmental structures that classify individuals into distinct states (such as susceptible, infected, recovered), and define transitions between these states over time using systems of ordinary differential equations (ODEs). Classical disease transmission models are often deterministic, meaning that the system’s evolution is fully determined by its parameters and initial conditions, always yielding the same outcome for a given setup. While deterministic models offer valuable insights into average system behavior, they fail to capture the inherent variability and stochasticity of real-world disease transmission. To address this limitation, mechanistic models can be enriched with statistical components that incorporate observational data to infer the underlying parameters. This is achieved by overlaying a probabilistic (or generative) model onto the deterministic structure. Bayesian inference, in particular, has proven highly effective for this purpose (refer to [80, 55] and references therein).

As an illustrative example, we consider an outbreak of influenza A (H1N1) in 1978 at a British boarding school involving 763 students. The dataset, available in the R package outbreaks [81], records the daily number of students confined to bed over a 14-day period. Following [80], we model the outbreak dynamics using a classical SIR (Susceptible-Infected-Recovered) model [82], coupled with a negative binomial observation model to account for overdispersion in the reported case counts. The model includes the transmission rate, β, and the recovery rate, γ, which govern the transitions from susceptible to infected and from infected to recovered, respectively. For the prior distributions, we assume β ∼ 𝒩⁰(1,1) and γ ∼ 𝒩⁰(0.4,0.5), where 𝒩^a denotes a normal distribution truncated at a. The prior specification is completed by assigning an exponential prior to the inverse of the dispersion parameter, ϕ, of the negative binomial distribution, i.e. 1/ϕ ∼ Exp(5). The initial conditions are set as I₀ = 1, R₀ = 0, and thus S₀ = 762.
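A compact sketch of this generative model (deterministic SIR solved numerically, negative binomial observations) is given below. The solver choice, parameter values, and count vector are illustrative stand-ins, not the rstan setup or the outbreaks data.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.stats import nbinom

N_POP, I0, R0 = 763.0, 1.0, 0.0
S0 = N_POP - I0 - R0
t_obs = np.arange(1.0, 15.0)        # 14 daily observations

def sir_rhs(t, y, beta, gamma):
    S, I, R = y
    return [-beta * S * I / N_POP, beta * S * I / N_POP - gamma * I, gamma * I]

def log_likelihood(cases, beta, gamma, phi):
    """Negative binomial log-likelihood of the daily counts, with mean given
    by the SIR infected compartment and dispersion phi (mean/dispersion
    parameterization; scipy's nbinom(n, p) with n = phi, p = phi/(phi + mu))."""
    sol = solve_ivp(sir_rhs, (0.0, t_obs[-1]), [S0, I0, R0],
                    t_eval=t_obs, args=(beta, gamma), rtol=1e-8)
    mu = np.maximum(sol.y[1], 1e-9)
    return np.sum(nbinom.logpmf(cases, phi, phi / (phi + mu)))

# Illustrative counts only (not the outbreaks dataset)
cases = np.array([3, 8, 26, 76, 225, 298, 258, 233, 189, 128, 68, 29, 14, 4])
print(log_likelihood(cases, beta=2.0, gamma=0.5, phi=5.0))
```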

Sampler   | N_1.01 | L          | Δt               | φ
AT-GHMC   | 2250   | 11         | 𝒰(0.073, 0.105)  | 𝒰(0.088, 0.527)
NUTS      | 140    | L̄ = 5.45   | Δt̄ = 0.607       | –

Table 8: Simulation parameters for the numerical experiments on the influenza A (H1N1) epidemic dataset with the SIR model. The optimal settings for AT-GHMC are found using the algorithm described in Section 7. AT-GHMC employs the s-AIA3 integrator, while NUTS uses standard Verlet. The mean values of L and Δt used by the NUTS optimization routine are denoted L̄ and Δt̄, respectively.

As in Section 9.1, we compare the performance of two samplers: AT-GHMC and NUTS [5], as implemented in the R package rstan [64]. We note that, in both cases, the (approximate) solution to the ODE system is obtained using the CVODES component of the SUNDIALS suite [83]. Specifically, we use the ODE solver based on the backward differentiation formula (BDF) method. Similarly to the breast cancer study, both samplers are run with the same number of burn-in iterations, and their performance is assessed in terms of the grad/ESS metrics calculated over the first N_1.01 + 1000 iterations after the burn-in (see Section 8.1).

Figure 10: Relative efficiency (REF) (29) in terms of grad/minESS (violet), grad/meanESS (yellow), and grad/multiESS (green) of AT-GHMC with respect to Stan NUTS. Filled bars correspond to the performance metrics that take into account the overheads introduced by NUTS at the optimization stage (quantified by δT_NUTS). AT-GHMC demonstrates superior performance for meanESS and multiESS, while NUTS outperforms AT-GHMC for minESS. When accounting for computational time differences, AT-GHMC shows around a 5x improvement over NUTS for both meanESS and multiESS.

The simulation settings for the numerical experiments are summarized in Table 8. Figure 10 shows the relative efficiency REF (29) of AT-GHMC over NUTS, evaluated in terms of grad/ESS, ESS ∈ {minESS, meanESS, multiESS}. In addition, to account for the extra computational cost incurred solely by NUTS due to its optimization process during the burn-in phase, we multiply the grad/ESS metrics for NUTS by a factor quantifying these overheads, δT_NUTS. This factor is carefully evaluated using the rstan package.

AT-GHMC with our proposed settings outperforms NUTS in terms of both meanESS and multiESS, with improvements increasing from a factor of 2x, when no NUTS computational overheads are considered, to 5x otherwise. In contrast, NUTS achieves faster convergence and better sampling with the most restrictive metric, minESS, which is likely enabled by its use of a tuned mass matrix. This trend aligns with the findings of the breast cancer study (Figure 7), where AT-GHMC shows more significant improvements over NUTS for the average ESS metrics than for minESS. This suggests that tuning a position-dependent mass matrix during the NUTS burn-in stage may have a pronounced effect on the efficiency of an HMC algorithm in sampling complex distributions. Nevertheless, the computational time analysis shows that, in this case study, NUTS requires approximately twice the computational time of tuned GHMC to achieve, on average, half the sampling performance, as highlighted in Figure 10.

10 Conclusion

The challenge of tuning appropriate hyperparameters to enhance HMC sampling performance has been extensively addressed by the statistical community. In contrast, GHMC, despite its promising potential to improve sampling efficiency due to the inherent irreversibility of its Markov chains, has not garnered comparable attention.

Here we present a computationally inexpensive adaptive tuning approach, ATune, which generates optimal parameter settings on the fly for HMC and GHMC simulations, bringing forth the adaptively tuned AT-HMC and AT-GHMC methods. Given a simulated system, the new methodology delivers a set of system-specific optimized HMC/GHMC hyperparameters without introducing any additional computational overhead during the production simulation stage.

To determine an optimal integration step size, we build on the insights from the s-AIA adaptive integration approach introduced in [14]. We propose a system-specific randomization interval that minimizes the energy error near the center of an estimated stability interval – the region typically associated with the most efficient performance [10, 42]. Additionally, we confirm that the s-AIA3 integrator consistently outperforms fixed-parameter integrators when combined with both HMC and GHMC samplers. We notice that even following the general recommendations for GHMC settings leads to superior performance over manually tuned HMC. These findings motivate a deeper exploration of the proper tuning of the φ and L parameters for GHMC.

Leveraging the methodology proposed in [12], we develop an automated procedure for finding an optimal value of the random noise parameter in PMU (7), based solely on the size of the simulated system (assuming the simulation step size and numerical integrator are chosen as discussed). When applied to high-dimensional systems, this approach leads to smaller modifications of the momentum, which, combined with the high accuracy provided by s-AIA3 and an optimal choice of Δt, facilitates the use of shorter trajectories, thus enhancing sampling efficiency in GHMC [37]. We note that, unlike the optimization procedures proposed for the other hyperparameters, the selection of the optimal random noise φ can be performed prior to a burn-in stage, relying only on the known dimensionality of the system. As a result, we propose employing the optimal φ from the very start of a GHMC simulation, rather than limiting it to the production stage only.

The introduced optimal hyperparameter settings exhibit superiority over other combinations of hyperparameters, leading to significant GHMC performance improvements across a set of popular benchmarks. Additionally, the performance gains observed when comparing our results to state-of-the-art samplers in three distinct real-world case studies indicate the broad applicability and effectiveness of AT-GHMC for diverse probabilistic models. It is worth noting that, although the optimal choice of L is specifically tailored to GHMC, L_opt also enhances sampling efficiency when applied to HMC in some cases, especially for systems of small dimension.

The proposed ATune algorithm can be easily implemented in any software package for statistical modeling using Bayesian inference with Hamiltonian Monte Carlo to offer the advantages of AT-GHMC over conventional HMC.

Appendix A Overestimation of a dimensional stability interval for 3-stage integrators

A.1 Overestimation of the expected acceptance rate in the s-AIA analysis

We consider the high-dimensional asymptotic formula for the expected acceptance rate 𝔼[AR] [3], proven for Gaussian distributions in a general scenario [84], i.e.

\mathbb{E}[\text{AR}]=2\,\Phi\left(-\sqrt{\mu/2}\right), (33)

with μ = 𝔼^D[ΔH] and

2\,\Phi\left(-\sqrt{\mu/2}\right)=1-\frac{1}{2\sqrt{\pi}}\sqrt{\mu}+\mathcal{O}\left(\mu^{3/2}\right),\qquad\mu\to 0,\;D\to\infty. (34)

Replacing (33) with its asymptotic expression

\mathbb{E}[\alpha]=1-\frac{1}{2\sqrt{\pi}}\sqrt{\mathbb{E}^{D}[\Delta H]},\qquad\mathbb{E}^{D}[\Delta H]\to 0,\;D\to\infty, (35)

allows us to compute 𝔼^D[ΔH] as proposed in [14]:

\mathbb{E}^{D}[\Delta H]\approx 4\pi\left(1-\mathbb{E}[\alpha]\right)^{2}. (36)

Here, α is the acceptance probability of the Metropolis test. This gives rise to the system-specific fitting factor S_f (see C) and a system-specific Δt_CoLSI (for more details, see [14]). In particular, we have

S_{f}\propto\mathbb{E}^{D}[\Delta H],\qquad\Delta t_{\text{CoLSI}}\propto\frac{1}{S_{f}},

which gives

\Delta t_{\text{CoLSI}}\propto\frac{1}{\mathbb{E}^{D}[\Delta H]}. (37)
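For orientation, an acceptance rate of, say, 𝔼[α] = 0.95 observed at the burn-in stage (the value here is purely illustrative) gives, by (36),

\mathbb{E}^{D}[\Delta H]\approx 4\pi\,(1-0.95)^{2}=4\pi\cdot 0.0025\approx 0.0314.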

The impact of the approximation error 𝒪(μ^{3/2}) in (34) on the estimated Δt_CoLSI can be evaluated by substituting the right-hand side of (34) into the inequality 0 ≤ 𝔼[α] ≤ 1, which holds by definition. One then obtains:

\frac{1}{2\sqrt{\pi}}\sqrt{\mu}-1\leqslant\mathcal{O}\left(\mu^{3/2}\right)\leqslant\frac{1}{2\sqrt{\pi}}\sqrt{\mu}. (38)

For μ → 0, as in our case, the inequality (38) becomes

-1\leqslant\mathcal{O}\left(\mu^{3/2}\right)\leqslant 0.

Therefore, the expected acceptance rate in (35) overestimates the real expected acceptance rate, leading to an underestimation of the energy error in (36), which in turn results in an overestimation of Δt_CoLSI in (37). We notice, however, that this error is small, due to the functional dependencies between the variables involved in the estimation of Δt_CoLSI and the condition μ → 0; quantifying the order of this approximation is, in any case, beyond the scope of this study.

A.2 Overestimation of Δt_CoLSI for 3-stage integrators

A closed-form expression for the expected energy error produced by the 1-stage Velocity Verlet integrator with L = 1, in the case of a univariate Gaussian model, was derived in [14]:

\mathbb{E}^{1}_{\text{VV}}[\Delta H]=\frac{h^{6}}{32},\quad h\in(0,\,2), (39)

where h is the nondimensional counterpart of the integration step size Δt (see [14] and C).

Eq. (39) follows from the more general expression for the expected energy error for a univariate Gaussian model [10]:

\mathbb{E}^{1}[\Delta H]=\frac{(B_{h}+C_{h})^{2}}{2}. (40)

Here, B_h, C_h are coefficients depending on the integrator in use, and h is the nondimensional counterpart of a given dimensional simulation step size Δt (see C).

It is also possible to derive from (40) similar expressions for the 2- and 3-stage cases (see A.3 and [14], respectively).

In particular,

  1. k = 2: the coefficients for the 2-stage Velocity Verlet are (see A.3)

     B_{h}=h-\frac{h^{3}}{8},\qquad C_{h}=-h+\frac{3h^{3}}{16}-\frac{h^{5}}{128},

     and from (40) one gets

     \mathbb{E}^{1}_{\text{VV2}}[\Delta H]=\frac{h^{6}(h^{2}-8)^{2}}{2^{15}},\quad h\in(0,\,4). (41)

  2. k = 3: the coefficients for the 3-stage Velocity Verlet (see Appendix A in [14]),

     B_{h}=\frac{h(h^{2}-9)(h^{2}-27)}{243},\qquad C_{h}=\frac{h(h^{2}-9)(h^{2}-27)(h^{2}-36)}{8748},

     together with (40) give

     \mathbb{E}^{1}_{\text{VV3}}[\Delta H]=\frac{h^{6}(h^{2}-9)^{2}(h^{2}-27)^{2}}{2^{5}\,3^{14}},\quad h\in(0,\,6). (42)

One immediately notices that, contrary to the formula (39) for the standard 1-stage Velocity Verlet, neither (41) nor (42) admits a closed-form expression for h, which is needed for estimating stability intervals with the procedure proposed in [14].

However, some analysis and comparison of 𝔼^1_VVk[ΔH] in (41)–(42) with 𝔼^1_VV[ΔH] in (39) is possible. For that, we first apply the changes of variables h = h′, h = 2h′ and h = 3h′, with h′ ∈ (0, 2), in (39), (41) and (42), respectively.

Afterwards, we can equate both (41) and (42) to 𝔼^1_VV[ΔH] in (39) and see which values of h′ solve the resulting equations.

  1. k = 2: equating (41) with h = 2h′ and (39) with h = h′, one obtains

     \frac{2^{6}h'^{6}(4h'^{2}-8)^{2}}{2^{15}}=\frac{h'^{6}}{32},

     which reduces to

     (h'^{2}-2)^{2}=1, (43)

     with two nonnegative roots, h′₁ = 1 and h′₂ = √3, or, equivalently, h₁ = 2 and h₂ = 2√3 in (41). Thus, Eqs. (39) and (41) coincide at the CoLSI. Hence, we can conclude that the analysis made with (39), as proposed in [14], is reliable for the estimation of a dimensional stability limit for a 2-stage integrator.

  2. k = 3: equating (42) with h = 3h′ and (39) with h = h′, one gets

     \frac{3^{6}h'^{6}(9h'^{2}-9)^{2}(9h'^{2}-27)^{2}}{2^{5}\,3^{14}}=\frac{h'^{6}}{32},

     leading to

     (h'^{4}-4h'^{2}+3)^{2}=1. (44)

     Equation (44) has three nonnegative roots, h′₁ = √2, h′₂ = √(2+√2) and h′₃ = √(2−√2). However, h′ = 1, i.e. h = 3, is not a root of (44). The root closest to the CoLSI = 3 is 3h′₃ ≈ 2.296 (these roots are verified numerically in the sketch after this list). Clearly, the analysis carried out using (39) leads to an overestimation of the stability interval for 3-stage integrators and may not yield an accurate estimation of a dimensional stability limit or CoLSI for a 3-stage integrator. We remark that the calculation of the dimensional Δt_CoLSI does not depend solely on h_CoLSI but also on simulated properties, such as the AR, extracted at the HMC burn-in stage. Obviously, acceptance rates obtained with 1- and 3-stage integrators should differ, and thus it is not feasible to make an accurate prediction of the estimation error for Δt_CoLSI within the proposed analysis. In any case, using the randomization interval suggested in (13) resolves this problem by probing step sizes smaller than or equal to h = 3, including h = 2.296.
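The root structure described above is easy to confirm numerically; a short check of (39), (41) and (42) under the change of variables:

```python
import numpy as np

def E_vv(h):   # (39), 1-stage Velocity Verlet
    return h**6 / 32.0

def E_vv2(h):  # (41), 2-stage Velocity Verlet
    return h**6 * (h**2 - 8.0)**2 / 2.0**15

def E_vv3(h):  # (42), 3-stage Velocity Verlet
    return h**6 * (h**2 - 9.0)**2 * (h**2 - 27.0)**2 / (2.0**5 * 3.0**14)

hp = 1.0                                      # h' = 1, i.e. the CoLSI
print(np.isclose(E_vv2(2 * hp), E_vv(hp)))    # True: 2-stage matches (39) at CoLSI
print(np.isclose(E_vv3(3 * hp), E_vv(hp)))    # False: E_vv3(3) = 0, not 1/32

h3 = np.sqrt(2.0 - np.sqrt(2.0))              # root of (44) closest to the CoLSI
print(3 * h3, np.isclose(E_vv3(3 * h3), E_vv(h3)))  # ~2.296, True
```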

A.3 Calculation of 2-stage Velocity Verlet integration coefficients

Consider the harmonic oscillator with Hamiltonian

H=\frac{1}{2}(p^{2}+\theta^{2}),\qquad\theta,p\in\mathbb{R}, (45)

and equations of motion

\frac{d\theta}{dt}=p,\qquad\frac{dp}{dt}=-\theta. (46)

A 2-stage palindromic splitting integrator Ψ_h (h is the integration step size) acts on a configuration (θ_i, p_i) at the i-th iteration as

\Psi_{h}\begin{pmatrix}\theta_{i}\\ p_{i}\end{pmatrix}=\begin{pmatrix}\theta_{i+1}\\ p_{i+1}\end{pmatrix}=\begin{pmatrix}A_{h}&B_{h}\\ C_{h}&D_{h}\end{pmatrix}\begin{pmatrix}\theta_{i}\\ p_{i}\end{pmatrix}, (47)

for suitable method-dependent coefficients A_h, B_h, C_h, D_h.

Following [10] and Appendix A in [14], we find the B_h and C_h required for the calculation of 𝔼^1[ΔH] (40):

B_{h}=h-\frac{1}{2}\left(\frac{1}{2}-b\right)h^{3},\qquad C_{h}=-h+b(1-b)h^{3}-\frac{b^{2}}{2}\left(\frac{1}{2}-b\right)h^{5}.

For the 2-stage Velocity Verlet (b = 1/4, see [10, 14]), they are

B_{h}=h-\frac{h^{3}}{8},\qquad C_{h}=-h+\frac{3h^{3}}{16}-\frac{h^{5}}{128}.
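These coefficients can be cross-checked symbolically by composing the kick and drift maps of the harmonic oscillator (46). The sketch below, using sympy, is our own verification and not part of the original derivation:

```python
import sympy as sp

h, b = sp.symbols("h b")
drift = lambda t: sp.Matrix([[1, t], [0, 1]])    # theta += t * p
kick  = lambda t: sp.Matrix([[1, 0], [-t, 1]])   # p     -= t * theta

# 2-stage palindromic splitting: kicks of length b*h, (1-2b)*h, b*h
# interleaved with drifts of length h/2 (the sequence is a palindrome).
M = kick(b*h) * drift(h/2) * kick((1 - 2*b)*h) * drift(h/2) * kick(b*h)

B_expected = h - sp.Rational(1, 2) * (sp.Rational(1, 2) - b) * h**3
print(sp.simplify(M[0, 1] - B_expected))              # 0, B_h matches

# b = 1/4 recovers the 2-stage Velocity Verlet coefficient C_h quoted above
print(sp.expand(M[1, 0].subs(b, sp.Rational(1, 4))))  # -h + 3*h**3/16 - h**5/128
```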

Appendix B Local maximum of ρ_BCSS3(h) for h < 3

The upper bound of the expected energy error for 3-stage integrators reads ([14], Eq. (16))

\rho_{3}(h,b)=\frac{h^{4}\,P(h,b)^{2}}{2\,D_{1}(h,b)\,D_{2}(h,b)\,D_{3}(h,b)}, (48)

where

P(h,b)=-3b^{4}+8b^{3}-\frac{19}{4}b^{2}+b+b^{2}h^{2}\left(b^{3}-\frac{5}{4}b^{2}+\frac{b}{2}-\frac{1}{16}\right)-\frac{1}{16},

D_{1}(h,b)=3b-bh^{2}\left(b-\frac{1}{4}\right)-1,\qquad D_{2}(h,b)=1-3b-bh^{2}\left(b-\frac{1}{2}\right)^{2},

D_{3}(h,b)=-9b^{2}+6b-h^{2}\left(b^{3}-\frac{5}{4}b^{2}+\frac{b}{2}-\frac{1}{16}\right)-1.

We define ρ_BCSS3(h) = ρ₃(h, b_BCSS3) at b_BCSS3 = 0.11888010966548. The derivative of ρ₃(h,b) in (48) with respect to h is, by the quotient rule,

\frac{\partial\rho_{3}(h,b)}{\partial h}=\frac{f(h,b)}{g(h,b)}, (49)

where, denoting by P_h, D_{1,h}, D_{2,h}, D_{3,h} the derivatives with respect to h,

P_{h}=2b^{2}h\left(b^{3}-\frac{5}{4}b^{2}+\frac{b}{2}-\frac{1}{16}\right),\qquad D_{1,h}=-2bh\left(b-\frac{1}{4}\right),

D_{2,h}=-2bh\left(b-\frac{1}{2}\right)^{2},\qquad D_{3,h}=-2h\left(b^{3}-\frac{5}{4}b^{2}+\frac{b}{2}-\frac{1}{16}\right),

and

f(h,b)=\left(4h^{3}P^{2}+2h^{4}P\,P_{h}\right)2D_{1}D_{2}D_{3}-h^{4}P^{2}\cdot 2\left(D_{1,h}D_{2}D_{3}+D_{1}D_{2,h}D_{3}+D_{1}D_{2}D_{3,h}\right),\qquad g(h,b)=\left(2D_{1}D_{2}D_{3}\right)^{2}.

By setting ∂ρ₃(h,b)/∂h = 0 and b = b_BCSS3 [10] in (49), one obtains the local maximum of ρ_BCSS3(h) at h_lower ≡ h_BCSS3max ≈ 2.0772.

Appendix C h → Δt map for a simulation step size

Following the nondimensionalization approaches presented in [14], here we derive a map h → Δt which, given a dimensionless step size h, computes its dimensional counterpart Δt.

s-AIA [14] uses a multivariate Gaussian model and simulation data obtained at the HMC burn-in stage to compute a system-specific fitting factor S_f. Fitting factors quantify the step size limitations imposed by nonlinear stability in numerical schemes employed for the integration of Hamiltonian dynamics. These factors (also called safety factors (SF) in [11]) are closely related to the stability limits on products of frequencies and step sizes introduced, for resonances of order up to 6, for applications in molecular simulation in [85]. Such limitations are usually related to the frequency of the fastest oscillation, ω̃, and provide a nondimensionalization map for a simulation step size Δt, i.e.

h=S_{f}\,\tilde{\omega}\,\Delta t. (50)

In [14], Sec. 4, we introduced three alternative nondimensionalization maps Δt → h with different levels of accuracy and computational effort. First, we distinguish between the more demanding and accurate S_f = S_ω, which requires the computation of the frequencies ω_j of the harmonic forces [14], and the cheaper and less precise S_f = S (see Eq. (28)), where

S_{\omega}=\max\left(1,\frac{2}{\Delta t_{\text{VV}}}\sqrt[6]{\frac{2\pi(1-\text{AR})^{2}}{\sum_{j=1}^{D}\omega_{j}^{6}}}\right),\qquad S=\max\left(1,\frac{2}{\tilde{\omega}\Delta t_{\text{VV}}}\sqrt[6]{\frac{2\pi(1-\text{AR})^{2}}{D}}\right). (51)

Here, Δt_VV is the integration step size used during the HMC burn-in simulation, AR is the acceptance rate observed in that simulation, and D is the dimension of the simulated system.

On the other hand, if the system frequencies are available, the nondimensionalization approach adapts according to the standard deviation σ of their distribution. More explicitly, if the distribution of the frequencies is dispersed (σ > 1), we apply a correction to (50) and define the nondimensionalization factor

\text{CF}=S_{\omega}(\tilde{\omega}-\sigma), (52)

to replace S_f ω̃ in (50). Otherwise, if σ < 1, the nondimensionalization factor reads

\text{CF}=S_{f}\,\tilde{\omega}, (53)

where S_f is defined as in (28).

In any case, given a nondimensional step size h, its dimensional counterpart for a specific simulated system is computed as

\Delta t=\frac{h}{\text{CF}}.
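Putting C together, below is a minimal sketch of the resulting map, assuming for illustration ω̃ = 1 and the σ < 1 branch unless stated otherwise; the function names are ours.

```python
import numpy as np

def fitting_factor_S(dt_vv, ar, D, omega_tilde=1.0):
    """Cheap fitting factor S of Eq. (51): burn-in step size dt_vv,
    observed acceptance rate ar, system dimension D."""
    return max(1.0, (2.0 / (omega_tilde * dt_vv))
                    * (2.0 * np.pi * (1.0 - ar)**2 / D) ** (1.0 / 6.0))

def h_to_dt(h, S_f, omega_tilde=1.0, sigma=None, S_omega=None):
    """Map a nondimensional step size h to its dimensional counterpart.
    If the frequency spread sigma > 1 (and S_omega is available), use the
    corrected factor CF = S_omega * (omega_tilde - sigma) of Eq. (52);
    otherwise CF = S_f * omega_tilde, Eq. (53)."""
    if sigma is not None and sigma > 1.0 and S_omega is not None:
        cf = S_omega * (omega_tilde - sigma)
    else:
        cf = S_f * omega_tilde
    return h / cf

S = fitting_factor_S(dt_vv=0.1, ar=0.9, D=100)   # illustrative burn-in numbers
print(S, h_to_dt(h=3.0, S_f=S))
```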

Appendix D L_opt calculation

As proposed in Section 6, we consider

X=\frac{\mathbb{E}^{L}_{S_{f}>1}[\Delta H]}{\mathbb{E}^{L=1}_{S_{f}=1}[\Delta H]},

where 𝔼^L_{S_f>1}[ΔH] represents the expected energy error of 1-stage Velocity Verlet at an arbitrary L ≥ 1 for anharmonic systems (S_f > 1), and 𝔼^{L=1}_{S_f=1}[ΔH] (23) stands for the expected energy error of 1-stage Velocity Verlet derived at L = 1 for systems with harmonic behavior (S_f = 1). S_f is the fitting factor defined in (22).

One can express the quantity X as

X=\frac{\mathbb{E}^{L}_{S_{f}>1}[\Delta H]}{\mathbb{E}^{L=1}_{S_{f}>1}[\Delta H]}\cdot\frac{\mathbb{E}^{L=1}_{S_{f}>1}[\Delta H]}{\mathbb{E}^{L=1}_{S_{f}=1}[\Delta H]}=X_{1}\cdot X_{2},

where, using [10] for X₁ and (23) for X₂,

X_{1}=\frac{\mathbb{E}^{L}_{S_{f}>1}[\Delta H]}{\mathbb{E}^{L=1}_{S_{f}>1}[\Delta H]}=\frac{\sin^{2}(\eta_{h}L)\,\rho_{\text{VV}}(h,b)}{\sin^{2}(\eta_{h})\,\rho_{\text{VV}}(h,b)}=\frac{\sin^{2}(\eta_{h}L)}{\sin^{2}(\eta_{h})},\qquad X_{2}=\frac{\mathbb{E}^{L=1}_{S_{f}>1}[\Delta H]}{\mathbb{E}^{L=1}_{S_{f}=1}[\Delta H]}=\frac{(\Delta t_{\text{VV}}\tilde{\omega}S_{f})^{6}}{(\Delta t_{\text{VV}}\tilde{\omega})^{6}}=S_{f}^{6}.

Therefore, the resulting expression for X is

X=\frac{\sin^{2}(\eta_{h}L)}{\sin^{2}(\eta_{h})}\,S_{f}^{6}. (54)

Here, η_h is a coefficient that depends on the specific integrator in use (see [10]). In particular, it is determined by the number of stages of the integrator, its integration coefficients, and the step size.

We recall that L is a free parameter. Then, to minimize the difference between 𝔼^L_{S_f>1}[ΔH] and 𝔼^{L=1}_{S_f=1}[ΔH] (as suggested in Section 6), one can find a value of L ≥ 1 such that X = 1. Thus, using Eq. (54), one obtains

L=\frac{\arcsin\frac{\sin\eta_{h}}{S_{f}^{3}}\pm 2\pi n}{\eta_{h}},\qquad n=0,1,2,\dots (55)

In Section 6, we argue that an optimal value of L for GHMC in the proposed settings should be small. Therefore, we take n = 0 in (55), which simplifies to

L=\frac{\arcsin\frac{\sin\eta_{h}}{S_{f}^{3}}}{\eta_{h}}. (56)

Combining L ≥ 1 with equation (56) yields S_f ≤ 1. However, by definition, S_f ≥ 1. Therefore, S_f = 1 and L = 1. This suggests the choice L_opt = 1 for systems with harmonic behavior, i.e. with S_f = 1.

On the other hand, as discussed in A, S_f is overestimated for s-AIA3 by a factor of ≈ 1.3066 (the ratio 3/2.296 between the CoLSI and the closest root identified in A.2). Consequently, it is reasonable to consider L_opt = 1 for systems with S_f ⪅ 1.3. In practice, it might be reasonable to relax the condition on S_f and choose L_opt = 1 for systems whose S_f rounds to 1, i.e. S_f < 1.5.

Otherwise (i.e. if S_f ≥ 1.5), we refer again to Eq. (55) and set n = 1. This leads to

L_{\text{opt}}=\frac{\arcsin\frac{\sin\eta_{h}}{S_{f}^{3}}+2\pi}{\eta_{h}}.

In the settings of this study, the integration step size h is drawn from 𝒰(h_lower, h_CoLSI) (see Section 4.1). We therefore calculate η_h (as detailed in [10] and Appendix A in [14]) at the midpoint h* = (h_lower + h_CoLSI)/2, which gives η_{h*} ≈ 2.637354. Thus, we obtain

L_{\text{opt}}=\begin{cases}1,&\text{for}\quad 1.0\leq S_{f}<1.5,\\ 4,&\text{otherwise}.\end{cases}
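A direct transcription of this selection rule (the function names are ours; the continuous solutions (55) are kept only for reference):

```python
import numpy as np

ETA_H_STAR = 2.637354   # eta_h at the midpoint of U(h_lower, h_CoLSI)

def L_branch(S_f, n, eta=ETA_H_STAR):
    """Solutions (55) of X = 1 for a given branch index n (reference only)."""
    return (np.arcsin(np.sin(eta) / S_f**3) + 2.0 * np.pi * n) / eta

def L_opt(S_f):
    """Selection rule above: L_opt = 1 while S_f effectively rounds to 1
    (S_f < 1.5, absorbing the ~1.3066 s-AIA3 overestimation of S_f),
    and L_opt = 4 otherwise."""
    return 1 if S_f < 1.5 else 4

print(L_branch(2.0, n=1), L_opt(1.2), L_opt(2.0))
```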
Figure E.1: ESS/grad metrics, ESS = {minESS, meanESS, multiESS}, observed in G1000 (a), G500 (b), G2000 (c), German (d), Musk (e), and Banana (f) with different choices of φ and L̄, the mean of the randomization interval for L. For each benchmark, the optimal randomization scheme for L (27) is circled on the x-axis. Higher values of each metric indicate better performance.

Appendix E Sampling performance of GHMC with different choices of φ and L

Figure E.1 presents the comparison of the ESS/grad metrics, ESS = {minESS, meanESS, multiESS}, obtained for the tested benchmarks using GHMC with different randomization schemes for φ and L, and the optimal randomization scheme for the integration step size Δt (Section 4). The s-AIA3 integrator is used in all experiments.

Figures E.1(a)–E.1(c) illustrate that, for the Gaussian benchmarks, φ ∼ 𝒰(φ_lower, φ_upper) ensures the best (or close to the best) results for the minESS/grad metric among all tested φ randomization schemes, regardless of the choice of L.

For the average metrics, meanESS and multiESS, a significant improvement with φ ∼ 𝒰(φ_lower, φ_upper) is observed only at the optimal L_opt = 1 (indicated by a circle). With longer trajectories, GHMC with any tested choice of φ, including φ = 1, i.e. HMC, exhibits reduced performance in terms of meanESS and multiESS.

The results for the BLR benchmarks, German and Musk, are displayed in Figures E.1(d) and E.1(e), respectively. For the German benchmark (Figure E.1(d)), the 𝒰(0, 0.1) randomization scheme for φ slightly outperforms our optimal setting in terms of the minESS/grad metric and performs comparably to our approach on the average metrics. In contrast, for the Musk benchmark (Figure E.1(e)), our method achieves the highest values across all ESS metrics, demonstrating superior performance.

For the Banana benchmark (Figure E.1(f)), the proposed optimal settings for φ, L and Δt lead to the best or close to the best performance for all tested metrics.

Acknowledgments

We gratefully acknowledge Caetano Souto Maior for his valuable contributions at the early stages of this study, particularly to the implementation of the cell-cell adhesion PDE model in HaiCS.

This research was supported by MICIU/AEI/10.13039/501100011033 and by ERDF A way for Europe under Grant PID2022-136585NB-C22; by the Basque Government through the ELKARTEK Programme under Grant KK-2024/00062 and through the BERC 2022-2025 program. We acknowledge the financial support by the Ministry of Science and Innovation through BCAM Severo Ochoa accreditation CEX2021-001142-S / MICIU/ AEI / 10.13039/501100011033 and PLAN COMPLEMENTARIO MATERIALES AVANZADOS 2022-2025, PROYECTO 1101288. The authors acknowledge the financial support received from the grant BCAM-IKUR, funded by the Basque Government through the IKUR Strategy and by the European Union NextGenerationEU/PRTR. JAC was supported by the Advanced Grant Nonlocal-CPD (Nonlocal PDEs for Complex Particle Dynamics: Phase Transitions, Patterns and Synchronization) of the European Research Council Executive Agency (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 883363). M.X. R-A was partially funded by the Spanish State Research Agency through the Ramón y Cajal grant RYC2019-027534-I. This work has been possible thanks to the support of the computing infrastructure of the BCAM in-house cluster Hipatia, i2BASQUE academic network, Barcelona Supercomputing Center (RES, QHS-2025-1-0027), DIPC Computer Center and the technical and human support provided by IZO-SGI SGIker of UPV/EHU.

References

  • [1] S. Duane, A. Kennedy, B. J. Pendleton, D. Roweth, Hybrid Monte Carlo, Physics Letters B 195 (2) (1987) 216–222. doi:https://doi.org/10.1016/0370-2693(87)91197-X.
  • [2] R. M. Neal, MCMC using Hamiltonian dynamics, Handbook of Markov Chain Monte Carlo 2 (11) (2011) 2. doi:10.1201/b10905-6.
  • [3] A. Beskos, N. Pillai, G. Roberts, J. M. Sanz-Serna, A. Stuart, Optimal tuning of the hybrid Monte Carlo algorithm, Bernoulli 19 (5A) (2013) 1501–1534.
    URL http://www.jstor.org/stable/42919328
  • [4] M. Betancourt, S. Byrne, M. Girolami, Optimizing The Integrator Step Size for Hamiltonian Monte Carlo, arXiv preprint, arXiv:1411.6669 (2014). doi:10.48550/arXiv.1411.6669.
  • [5] M. D. Hoffman, A. Gelman, The No-U-Turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo., Journal of Machine Learning Research 15 (1) (2014) 1593–1623.
    URL https://www.jmlr.org/papers/volume15/hoffman14a/hoffman14a.pdf
  • [6] L. Verlet, Computer "Experiments" on Classical Fluids. I. Thermodynamical Properties of Lennard-Jones Molecules, Physical Review 159 (1967) 98–103. doi:10.1103/PhysRev.159.98.
  • [7] W. C. Swope, H. C. Andersen, P. H. Berens, K. R. Wilson, A computer simulation method for the calculation of equilibrium constants for the formation of physical clusters of molecules: Application to small water clusters, The Journal of Chemical Physics 76 (1) (1982) 637–649. doi:10.1063/1.442716.
  • [8] R. I. McLachlan, On the Numerical Integration of Ordinary Differential Equations by Symmetric Composition Methods, SIAM Journal on Scientific Computing 16 (1) (1995) 151–168. doi:10.1137/0916010.
  • [9] T. Takaishi, P. de Forcrand, Testing and tuning symplectic integrators for the hybrid Monte Carlo algorithm in lattice QCD, Physical Review E 73 (2006) 036706. doi:10.1103/PhysRevE.73.036706.
  • [10] S. Blanes, F. Casas, J. M. Sanz-Serna, Numerical integrators for the Hybrid Monte Carlo method, SIAM Journal on Scientific Computing 36 (4) (2014) A1556–A1580. doi:10.1137/130932740.
  • [11] M. Fernández-Pendás, E. Akhmatskaya, J. M. Sanz-Serna, Adaptive multi-stage integrators for optimal energy conservation in molecular simulations, Journal of Computational Physics 327 (2016) 434–449. doi:10.1016/j.jcp.2016.09.035.
  • [12] E. Akhmatskaya, M. Fernández-Pendás, T. Radivojević, J. M. Sanz-Serna, Adaptive Splitting Integrators for Enhancing Sampling Efficiency of Modified Hamiltonian Monte Carlo Methods in Molecular Simulation, Langmuir 33 (42) (2017) 11530–11542, pMID: 28689416. doi:10.1021/acs.langmuir.7b01372.
  • [13] T. Radivojević, M. Fernández-Pendás, J. M. Sanz-Serna, E. Akhmatskaya, Multi-stage splitting integrators for sampling with modified Hamiltonian Monte Carlo methods, Journal of Computational Physics 373 (2018) 900–916. doi:10.1016/j.jcp.2018.07.023.
  • [14] L. Nagar, M. Fernández-Pendás, J. M. Sanz-Serna, E. Akhmatskaya, Adaptive multi-stage integration schemes for Hamiltonian Monte Carlo, Journal of Computational Physics 502 (2024) 112800. doi:10.1016/j.jcp.2024.112800.
  • [15] C. Tamborrino, F. Diele, C. Marangi, C. Tarantino, Adaptive parameters tuning based on energy-preserving splitting integration for Hamiltonian Monte Carlo Method, Communications in Nonlinear Science and Numerical Simulation 137 (2024) 108168. doi:10.1016/j.cnsns.2024.108168.
  • [16] N. Bou-Rabee, J. M. Sanz-Serna, Geometric integrators and the Hamiltonian Monte Carlo method, Acta Numerica 27 (2018) 113–206. doi:10.1017/S0962492917000101.
  • [17] S. Blanes, F. Casas, A. Murua, Splitting methods for differential equations, Acta Numerica 33 (2024) 1–161. doi:10.1017/S0962492923000077.
  • [18] P. B. Mackenzie, An improved hybrid Monte Carlo method, Physics Letters B 226 (1989) 369–371. doi:10.1016/0370-2693(89)91212-4.
  • [19] N. Bou-Rabee, J. M. Sanz-Serna, Randomized Hamiltonian Monte Carlo, The Annals of Applied Probability 27 (2017). doi:10.1214/16-AAP1255.
  • [20] M. Girolami, B. Calderhead, Riemann Manifold Langevin and Hamiltonian Monte Carlo Methods, Journal of the Royal Statistical Society Series B: Statistical Methodology 73 (2011) 123–214. doi:10.1111/j.1467-9868.2010.00765.x.
  • [21] C. C. Margossian, A. Gelman, For how many iterations should we run Markov chain Monte Carlo?, arXiv preprint, arXiv:2311.02726 (2023). doi:10.48550/arXiv.2311.02726.
  • [22] S. Apers, S. Gribling, D. Szilágyi, Hamiltonian Monte Carlo for efficient Gaussian sampling: long and random steps, arXiv preprint, arXiv:2209.12771 (2022). doi:10.48550/arXiv.2209.12771.
  • [23] N. Bou-Rabee, B. Carpenter, M. Marsden, GIST: Gibbs self-tuning for locally adaptive Hamiltonian Monte Carlo, arXiv preprint, arXiv:2404.15253 (2024). doi:10.48550/arXiv.2404.15253.
  • [24] N. Bou-Rabee, B. Carpenter, T. S. Kleppe, M. Marsden, Incorporating Local Step-Size Adaptivity into the No-U-Turn Sampler using Gibbs Self Tuning, arXiv preprint, arXiv:2408.08259 (2024). doi:10.48550/arXiv.2408.08259.
  • [25] C. Modi, ATLAS: Adapting Trajectory Lengths and Step-Size for Hamiltonian Monte Carlo, arXiv preprint, arXiv:2410.21587 (2024). doi:10.48550/arXiv.2410.21587.
  • [26] M. Biron-Lattes, N. Surjanovic, S. Syed, T. Campbell, A. Bouchard-Côté, autoMALA: Locally adaptive Metropolis-adjusted Langevin algorithm, in: S. Dasgupta, S. Mandt, Y. Li (Eds.), Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, Vol. 238 of Proceedings of Machine Learning Research, PMLR, 2024, pp. 4600–4608.
    URL https://proceedings.mlr.press/v238/biron-lattes24a.html
  • [27] P. Sountsov, M. D. Hoffman, Focusing on Difficult Directions for Learning HMC Trajectory Lengths, arXiv preprint, arXiv:2110.11576 (2021). doi:10.48550/arXiv.2110.11576.
  • [28] Y. Chen, K. Gatmiry, When does Metropolized Hamiltonian Monte Carlo provably outperform Metropolis-adjusted Langevin algorithm?, arXiv preprint, arXiv:2304.04724 (2023). doi:10.48550/arXiv.2304.04724.
  • [29] A. M. Horowitz, A generalized guided Monte Carlo algorithm, Physics Letters B 268 (2) (1991) 247–252. doi:10.1016/0370-2693(91)90812-5.
  • [30] A. Kennedy, B. Pendleton, Cost of the generalised hybrid Monte Carlo algorithm for free field theory, Nuclear Physics B 607 (3) (2001) 456–510. doi:10.1016/S0550-3213(01)00129-8.
  • [31] P. Diaconis, S. Holmes, R. M. Neal, Analysis of a nonreversible Markov chain sampler, Annals of Applied Probability 10 (2000) 726–752. doi:10.1214/AOAP/1019487508.
  • [32] M. Ottobre, Markov Chain Monte Carlo and Irreversibility, Reports on Mathematical Physics 77 (2016) 267–292. doi:10.1016/S0034-4877(16)30031-3.
  • [33] R. M. Neal, Improving Asymptotic Variance of MCMC Estimators: Non-reversible Chains are Better, arXiv preprint, arXiv:math/0407281 (2004). doi:10.48550/arXiv.math/0407281.
  • [34] Y. Sun, F. Gomez, J. Schmidhuber, Improving the Asymptotic Performance of Markov Chain Monte-Carlo by Inserting Vortices, Advances in Neural Information Processing Systems 23 (2010).
    URL https://proceedings.neurips.cc/paper/2010/hash/819f46e52c25763a55cc642422644317-Abstract.html
  • [35] A. B. Duncan, T. Lelièvre, G. A. Pavliotis, Variance Reduction Using Nonreversible Langevin Samplers, Journal of Statistical Physics 163 (2016) 457–491. doi:10.1007/S10955-016-1491-2.
  • [36] Z. Song, Z. Tan, On Irreversible Metropolis Sampling Related to Langevin Dynamics, SIAM Journal on Scientific Computing 44 (2022) A2089–A2120. doi:10.1137/21M1423701.
  • [37] Y. Fang, J. M. Sanz-Serna, R. D. Skeel, Compressible generalized hybrid Monte Carlo, The Journal of Chemical Physics 140 (2014) 174108. doi:10.1063/1.4874000.
  • [38] N. Gouraud, P. Le Bris, A. Majka, P. Monmarché, HMC and underdamped Langevin united in the unadjusted convex smooth case, arXiv preprint, arXiv:2202.00977 (2022). doi:10.48550/arXiv.2202.00977.
  • [39] L. Riou-Durand, J. Vogrinc, Metropolis Adjusted Langevin Trajectories: a robust alternative to Hamiltonian Monte Carlo, arXiv preprint, arXiv:2202.13230 (2022). doi:10.48550/arXiv.2202.13230.
  • [40] M. D. Hoffman, P. Sountsov, Tuning-Free Generalized Hamiltonian Monte Carlo, in: Proceedings of The 25th International Conference on Artificial Intelligence and Statistics, Vol. 151 of Proceedings of Machine Learning Research, PMLR, 2022.
    URL https://proceedings.mlr.press/v151/hoffman22a.html
  • [41] L. Riou-Durand, P. Sountsov, J. Vogrinc, C. C. Margossian, S. Power, Adaptive Tuning for Metropolis Adjusted Langevin Trajectories, in: Proceedings of The 26th International Conference on Artificial Intelligence and Statistics, Vol. 206 of Proceedings of Machine Learning Research, PMLR, 2023.
    URL https://proceedings.mlr.press/v206/riou-durand23a.html
  • [42] A. K. Mazur, Common Molecular Dynamics Algorithms Revisited: Accuracy and Optimal Time Steps of Störmer–Leapfrog Integrators, Journal of Computational Physics 136 (2) (1997) 354–365. doi:10.1006/jcph.1997.5740.
  • [43] J. Sanz-Serna, M. Calvo, Numerical Hamiltonian Problems, Chapman and Hall, London, 1994. doi:10.1137/1037075.
  • [44] S. Blanes, F. Casas, A. Murua, Splitting and composition methods in the numerical integration of differential equations, arXiv preprint, arXiv:0812.0377 (2008). doi:10.48550/arXiv.0812.0377.
  • [45] C. M. Campos, J. Sanz-Serna, Palindromic 3-stage splitting integrators, a roadmap, Journal of Computational Physics 346 (2017) 340–355. doi:10.1016/j.jcp.2017.06.006.
  • [46] C. Predescu, R. A. Lippert, M. P. Eastwood, D. Ierardi, H. Xu, M. Ø. Jensen, K. J. Bowers, J. Gullingsrud, C. A. Rendleman, R. O. Dror, D. E. Shaw, Computationally efficient molecular dynamics integrators with improved sampling accuracy, Molecular Physics 110 (9-10) (2012) 967–983. doi:10.1080/00268976.2012.681311.
  • [47] C. J. Geyer, Practical Markov Chain Monte Carlo, Statistical Science 7 (4) (1992) 473–483. doi:10.1214/ss/1177011137.
  • [48] A. Gelman, D. B. Rubin, Inference from Iterative Simulation Using Multiple Sequences, Statistical Science 7 (4) (1992) 457–472. doi:10.1214/ss/1177011136.
  • [49] S. P. Brooks, A. Gelman, General Methods for Monitoring Convergence of Iterative Simulations, Journal of Computational and Graphical Statistics 7 (4) (1998) 434–455. doi:10.1080/10618600.1998.10474787.
  • [50] A. Vehtari, A. Gelman, D. Simpson, B. Carpenter, P.-C. Bürkner, Rank-Normalization, Folding, and Localization: An Improved R̂ for Assessing Convergence of MCMC (with Discussion), Bayesian Analysis 16 (2) (2021) 667–718. doi:10.1214/20-BA1221.
  • [51] M. Plummer, N. Best, K. Cowles, K. Vines, CODA: convergence diagnosis and output analysis for MCMC, R news 6 (1) (2006) 7–11.
    URL http://oro.open.ac.uk/22547/
  • [52] L. Nagar, M. Fernández-Pendás, J. M. Sanz-Serna, E. Akhmatskaya, Finding the optimal integration coefficient for a palindromic multi-stage splitting integrator in HMC applications to Bayesian inference, mendeley data (2023). doi:10.17632/5mmh4wcdd6.1.
  • [53] T. Radivojević, Enhancing Sampling in Computational Statistics Using Modified Hamiltonians, Ph.D. thesis, UPV/EHU, Bilbao (Spain) (2016). hdl:20.500.11824/323.
  • [54] T. Radivojević, E. Akhmatskaya, Modified Hamiltonian Monte Carlo for Bayesian inference, Statistics and Computing 30 (2) (2020) 377–404. doi:10.1007/s11222-019-09885-x.
  • [55] H. Inouzhe, M. X. Rodríguez-Álvarez, L. Nagar, E. Akhmatskaya, Dynamic SIR/SEIR-like models comprising a time-dependent transmission rate: Hamiltonian Monte Carlo approach with applications to COVID-19, arXiv preprint, arXiv:2301.06385 (2023). doi:10.48550/arXiv.2301.06385.
  • [56] J. M. Flegal, J. Hughes, D. Vats, N. Dai, K. Gupta, U. Maji, mcmcse: Monte Carlo Standard Errors for MCMC, accessed: 2025-04-30 (2021).
    URL https://CRAN.R-project.org/package=mcmcse
  • [57] D. Vats, J. M. Flegal, G. L. Jones, Multivariate output analysis for Markov chain Monte Carlo, Biometrika 106 (2019) 321–337. doi:10.1093/biomet/asz002.
  • [58] J. S. Liu, Monte Carlo strategies in scientific computing, Vol. 10, Springer, New York, 2001. doi:10.1007/978-0-387-76371-2.
  • [59] M. Lichman, et al., UCI Machine Learning Repository (2013).
    URL http://archive.ics.uci.edu/ml/index.php
  • [60] M. Piva, G. Domenici, O. Iriondo, M. Rábano, B. M. Simões, V. Comaills, I. Barredo, J. A. López-Ruiz, I. Zabalza, R. Kypta, M. d. M. Vivanco, Sox2 promotes tamoxifen resistance in breast cancer cells, EMBO Molecular Medicine 6 (1) (2014) 66–79. doi:10.1002/emmm.201303411.
  • [61] M. A. Mansournia, A. Geroldinger, S. Greenland, G. Heinze, Separation in Logistic Regression: Causes, Consequences, and Control, American Journal of Epidemiology 187 (2018) 864–870. doi:10.1093/aje/kwx299.
  • [62] A. Gelman, A. Jakulin, M. G. Pittau, Y.-S. Su, A weakly informative default prior distribution for logistic and other regression models, The Annals of Applied Statistics 2 (2008) 1360–1383. doi:10.1214/08-AOAS191.
  • [63] Stan Development Team, Stan Modeling Language Users Guide and Reference Manual, version 2.33 (2024).
    URL https://mc-stan.org
  • [64] Stan Development Team, RStan: the R interface to Stan, R package version 2.32.6 (2024).
    URL https://mc-stan.org/
  • [65] N. J. Armstrong, K. J. Painter, J. A. Sherratt, A continuum approach to modelling cell-cell adhesion, Journal of Theoretical Biology 243 (1) (2006) 98–113. doi:10.1016/j.jtbi.2006.05.030.
  • [66] A. Gerisch, M. A. J. Chaplain, Mathematical modelling of cancer cell invasion of tissue: Local and non-local models and the effect of adhesion, Journal of Theoretical Biology 250 (4) (2008) 684–704. doi:10.1016/j.jtbi.2007.10.026.
  • [67] H. Murakawa, H. Togashi, Continuous models for cell–cell adhesion, Journal of Theoretical Biology 374 (2015) 1–12. doi:10.1016/j.jtbi.2015.03.002.
  • [68] J. A. Carrillo, Y. Huang, M. Schmidtchen, Zoology of a Nonlocal Cross-Diffusion Model for Two Species, SIAM Journal on Applied Mathematics 78 (2) (2018) 1078–1104. doi:10.1137/17M1128782.
  • [69] J. A. Carrillo, H. Murakawa, M. Sato, H. Togashi, O. Trush, A population dynamics model of cell-cell adhesion incorporating population pressure and density saturation, Journal of Theoretical Biology 474 (2019) 14–24. doi:10.1016/j.jtbi.2019.04.023.
  • [70] O. Trush, C. Liu, X. Han, Y. Nakai, R. Takayama, H. Murakawa, J. A. Carrillo, H. Takechi, S. Hakeda-Suzuki, T. Suzuki, M. Sato, N-Cadherin Orchestrates Self-Organization of Neurons within a Columnar Unit in the Drosophila Medulla, Journal of Neuroscience 39 (30) (2019) 5861–5880. doi:10.1523/JNEUROSCI.3107-18.2019.
  • [71] A. Volkening, B. Sandstede, Modelling stripe formation in zebrafish: an agent-based approach, Journal of The Royal Society Interface 12 (2015) 20150812. doi:10.1098/rsif.2015.0812.
  • [72] W. D. Martinson, A. Volkening, M. Schmidtchen, C. Venkataraman, J. A. Carrillo, Linking discrete and continuous models of cell birth and migration, Royal Society Open Science 11 (2024). doi:10.1098/rsos.232002.
  • [73] Y. Matsunaga, M. Noda, H. Murakawa, K. Hayashi, A. Nagasaka, S. Inoue, T. Miyata, T. Miura, K.-i. Kubo, K. Nakajima, Reelin transiently promotes N-cadherin–dependent neuronal adhesion during mouse cortical development, Proceedings of the National Academy of Sciences of the United States of America 114 (8) (2017) 2048–2053.
    URL https://www.jstor.org/stable/26479312
  • [74] M. J. Simpson, R. E. Baker, S. T. Vittadello, O. J. MacLaren, Practical parameter identifiability for spatio-temporal models of cell invasion, Journal of the Royal Society Interface 17 (2020). doi:10.1098/RSIF.2020.0055.
  • [75] C. Falcó, D. J. Cohen, J. A. Carrillo, R. E. Baker, Quantifying tissue growth, shape and collision via continuum models and Bayesian inference, Journal of the Royal Society Interface 20 (2023). doi:10.1098/RSIF.2023.0184.
  • [76] M. J. Simpson, R. E. Baker, Parameter identifiability, parameter estimation and model prediction for differential equation models, arXiv preprint, arXiv:2405.08177 (2025). doi:10.48550/arXiv.2405.08177.
  • [77] R. Bailo, J. A. Carrillo, D. Gómez-Castro, Aggregation-diffusion Equations for Collective Behavior in the Sciences, https://www.siam.org/publications/siam-news/articles/aggregation-diffusion-equations-for-collective-behavior-in-the-sciences/, accessed: 2025-04-30 (2024).
  • [78] J. A. Carrillo, A. Chertock, Y. Huang, A Finite-Volume Method for Nonlinear Nonlocal Equations with a Gradient Flow Structure, Communications in Computational Physics 17 (1) (2015) 233–258. doi:10.4208/cicp.160214.010814a.
  • [79] R. Bailo, J. A. Carrillo, H. Murakawa, M. Schmidtchen, Convergence of a fully discrete and energy-dissipating finite-volume scheme for aggregation-diffusion equations, Mathematical Models and Methods in Applied Sciences 30 (13) (2020) 2487–2522. doi:10.1142/S0218202520500487.
  • [80] L. Grinsztajn, E. Semenova, C. C. Margossian, J. Riou, Bayesian workflow for disease transmission modeling in Stan, Statistics in Medicine 40 (27) (2021) 6209–6234. doi:10.1002/sim.9164.
  • [81] T. Jombart, S. Frost, P. Nouvellet, F. Campbell, B. Sudre, outbreaks: A Collection of Disease Outbreak Data, R package version 1.9.0 (2020).
    URL https://CRAN.R-project.org/package=outbreaks
  • [82] W. O. Kermack, A. G. McKendrick, A contribution to the mathematical theory of epidemics, Proceedings of the Royal Society of London 115 (772) (1927) 700–721. doi:10.1098/rspa.1927.0118.
  • [83] A. C. Hindmarsh, R. Serban, C. J. Balos, D. J. Gardner, D. R. Reynolds, C. S. Woodward, User Documentation for CVODE v5.7.0 (SUNDIALS v5.7.0), https://computing.llnl.gov/sites/default/files/cvs_guide-5.7.0.pdf, accessed: 2025-04-30 (2021).
  • [84] M. Calvo, D. Sanz-Alonso, J. Sanz-Serna, HMC: Reducing the number of rejections by not using leapfrog and some results on the acceptance rate, Journal of Computational Physics 437 (2021) 110333. doi:10.1016/j.jcp.2021.110333.
  • [85] T. Schlick, M. Mandziuk, R. D. Skeel, K. Srinivas, Nonlinear Resonance Artifacts in Molecular Dynamics Simulations, Journal of Computational Physics 140 (1) (1998) 1–29. doi:10.1006/jcph.1998.5879.