
Computational budget optimization for Bayesian parameter estimation in heavy ion collisions

Brandon Weiss Department of Physics, Duke University, Durham NC 27708.    Jean-François Paquet Department of Physics and Astronomy, Vanderbilt University, Nashville TN 37235. Department of Mathematics, Vanderbilt University, Nashville TN 37235. Department of Physics, Duke University, Durham NC 27708.    Steffen A. Bass Department of Physics, Duke University, Durham NC 27708.
(August 2021)
Abstract

Bayesian parameter estimation provides a systematic approach to compare heavy ion collision models with measurements, leading to constraints on the properties of nuclear matter with proper accounting of experimental and theoretical uncertainties. Aside from statistical and systematic model uncertainties, interpolation uncertainties can also play a role in Bayesian inference, if the model’s predictions can only be calculated at a limited set of model parameters. This uncertainty originates from using an emulator to interpolate the model’s prediction across a continuous space of parameters. In this work, we study the trade-offs between the emulator (interpolation) and statistical uncertainties. We perform the analysis using spatial eccentricities from the TRENTo model of initial conditions for nuclear collisions. Given a fixed computational budget, we study the optimal compromise between the number of parameter samples and the number of collisions simulated per parameter sample. For the observables and parameters used in the present study, we find that the best constraints are achieved when the number of parameter samples is slightly smaller than the number of collisions simulated per parameter sample.

I Introduction

One of the main goals of the heavy ion program pursued at the Relativistic Heavy Ion Collider (RHIC) and the Large Hadron Collider (LHC) is a quantitative understanding of the properties of nuclear matter under extreme conditions as described by Quantum Chromodynamics (QCD). At the high temperatures and densities reached in these nuclear collisions, a transient state of matter, the quark-gluon plasma (QGP), is formed; it decays again as the collision system cools down and disintegrates. Collisions of heavy ions leading to QGP formation are multistage processes: the initial impact of the nuclei, the formation and expansion of the QGP, reconfinement into hadrons, and subsequent hadronic interactions [1, 2, 3, 4]. A wide range of properties of nuclear matter enter numerical simulations of heavy ion collisions. Some of these properties are constrained by external means, for example lattice QCD calculations of the equation of state of nuclear matter [5]. Other properties, such as transport coefficients or parameters entering the description of the early stage of the collision, are often parametrized and constrained by comparison with data.

Measurements from RHIC and the LHC can constrain these physical parameters. Because of the heterogeneity and large number of both measurements and model parameters, it is beneficial to perform model-to-data comparisons using statistical techniques such as Bayesian parameter estimation, which allows for a systematic propagation of uncertainties from experimental measurements to physical parameters. A number of such studies have been performed over the past decade [6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20].

A key ingredient of Bayesian parameter estimation as used in heavy ion collisions is emulation: given the model’s prediction at a discrete sample of parameters, an emulator will interpolate the model’s prediction over a continuous range of parameters. Furthermore, emulators such as Gaussian processes provide an estimate of their own interpolation uncertainty; in effect, this allows emulation to be used in model-to-data comparisons even if the emulator’s interpolation uncertainty is not negligible compared to the other uncertainties in the problem, a situation that is often unavoidable in simulations with a large number of parameters. In Bayesian parameter estimation, this emulator uncertainty is included in the physical parameter’s final uncertainty determination, alongside experimental, statistical and other theoretical uncertainties.

The most straightforward approach to reducing emulator uncertainty is to increase the number of parameter samples at which the model observables are evaluated; these parameter samples are generally referred to as the emulator’s “design points”. Evidently, this increase in the number of design points must be balanced with other demands on the available computational budget. In particular, simulations of heavy ion collisions are stochastic: there are event-by-event fluctuations in the outcome of the collisions. Comparisons with measurements require simulating and averaging over a large number of collisions. At a fixed computational budget, one must determine the proper trade-off between reducing emulator interpolation uncertainties and reducing statistical ones. In this work, we use the TRENTo model of initial conditions for heavy ion collisions [21] to study this trade-off between statistical and emulator uncertainties.

II Methodology

We use the TRENTo model as a proxy for observables in heavy ion collisions; this is based on the known correlation [22] between (i) the initial spatial anisotropies of models like TRENTo and (ii) the measurable momentum anisotropies of final-state hadrons. To interpolate the output of the TRENTo model, we use Gaussian process emulators, which have been the standard emulation technique used for model-to-data comparisons in heavy ion physics [6, 7, 8, 10]. To quantify the optimal trade-off between statistical and emulator uncertainty, we use “closure tests”, which apply Bayesian parameter inference to model calculations. We summarize the overall approach and methods below.

II.1 TRENTo

TRENTo is a parametric initial condition ansatz for simulating relativistic heavy ion collisions [21]. It takes as input the type of colliding nuclei and the inelastic nucleon-nucleon cross-section corresponding to the center-of-mass energy of the collision.

We vary three parameters of the TRENTo model: the effective nucleon size $w$, the fluctuation parameter $k$ and the reduced thickness $p$. The fluctuation parameter $k$ allows for variation in the amount of energy carried by each nucleon, while the reduced thickness $p$ parametrizes how the energy or entropy density is constructed from the profile of nucleons sampled from each nucleus.

The output of TRENTo is a density profile in the plane transverse to the collision axis. We interpret TRENTo’s output as an energy density profile $\epsilon(r,\phi)$ with arbitrary normalization. From this density profile, we compute single-event spatial anisotropies $\varepsilon_{n}$ [22]:

\varepsilon_{n}e^{in\Phi_{n}}=\frac{\int_{0}^{\infty}dr\,r\int_{0}^{2\pi}d\phi\;r^{n}\epsilon(r,\phi)e^{in\phi}}{\int_{0}^{\infty}dr\,r\int_{0}^{2\pi}d\phi\;r^{n}\epsilon(r,\phi)}    (1)

Although these anisotropies can be related to the momentum anisotropy of hadrons measured at the end of the collision [23, 24, 25, 26, 27], in this work we focus on the initial spatial eccentricities themselves.

We compute the average over an ensemble of TRENTo events as an arithmetic mean:

\langle\varepsilon_{n}\rangle=\frac{1}{M_{\textrm{ev}}}\sum_{j=1}^{M_{\textrm{ev}}}\varepsilon_{n}\{\textrm{event }j\}\;.    (2)

We used up to four harmonics in this work, $n=2$ to $5$.
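For concreteness, a minimal numerical sketch of Eqs. (1) and (2) is shown below, assuming the profile is sampled on a Cartesian grid and re-centered on its energy-weighted centroid (the grid layout and helper names are our own, not part of TRENTo):

```python
import numpy as np

def eccentricity(density, xs, ys, n):
    # Single-event eps_n of Eq. (1), for a profile on a Cartesian grid.
    X, Y = np.meshgrid(xs, ys, indexing="ij")
    w = density / density.sum()
    # Re-center on the energy-weighted centroid before taking moments.
    x, y = X - (w * X).sum(), Y - (w * Y).sum()
    r, phi = np.hypot(x, y), np.arctan2(y, x)
    num = (w * r**n * np.exp(1j * n * phi)).sum()
    den = (w * r**n).sum()
    return np.abs(num / den)  # the phase of num/den gives the angle Phi_n

def mean_eccentricity(profiles, xs, ys, n):
    # Arithmetic event average <eps_n> of Eq. (2) over M_ev profiles.
    return np.mean([eccentricity(d, xs, ys, n) for d in profiles])
```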

We studied minimum bias Pb-Pb collisions with Woods-Saxon nucleon distributions. We used an inelastic nucleon-nucleon cross-section of $64$ mb, corresponding approximately to $\sqrt{s_{NN}}=2.76$ TeV collisions.

II.2 Bayesian parameter estimation

We denote by $y_{\textrm{th},j}(\mathbf{p})$ the model’s prediction for the $j$-th observable of interest, evaluated when the model’s parameters are set to $\mathbf{p}$. At $\mathbf{p}$, we compute $y_{\textrm{th},j}(\mathbf{p})=\langle\varepsilon_{n_{j}}\rangle(\mathbf{p})$, where the index $j$ runs over all the observables of interest, which in our case are the different $n$ in Eq. (2).

In general, given this information, we want to find the probability that parameters $\mathbf{p}$ are consistent with the data $\{y_{\textrm{exp},j}\}$, given the experimental and theoretical uncertainties $\{\sigma_{\textrm{exp},j}\}$ and $\{\sigma_{\textrm{th},j}(\mathbf{p})\}$. The first step is to define a metric quantifying the level of model-to-data agreement: the likelihood function. We make the typical choice of a Gaussian likelihood function:

\mathcal{L}(\{y_{\textrm{exp},j}\}|\mathbf{p})=\exp\left(-\frac{1}{2}\sum_{j}\frac{\left[y_{\textrm{th},j}(\mathbf{p})-y_{\textrm{exp},j}\right]^{2}}{\sigma_{\textrm{th},j}(\mathbf{p})^{2}+\sigma_{\textrm{exp},j}^{2}}\right)\bigg/\sqrt{(2\pi)^{n}\prod_{j}\left(\sigma_{\textrm{th},j}(\mathbf{p})^{2}+\sigma_{\textrm{exp},j}^{2}\right)}\;.    (3)

We assumed a diagonal covariance matrix. In this work, instead of comparing the model with data, we use closure tests: we replace $y_{\textrm{exp},j}$ by model calculations, as discussed later. Because these model calculations stand in for experimental data, we keep the label “exp” to denote these quantities.
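A minimal sketch of the corresponding log-likelihood, assuming the diagonal covariance above (the function name and array interface are our own):

```python
import numpy as np

def log_likelihood(y_th, sigma_th, y_exp, sigma_exp):
    # Log of the Gaussian likelihood of Eq. (3); inputs are arrays
    # indexed by the observable j, uncertainties added in quadrature.
    var = sigma_th**2 + sigma_exp**2
    chi2 = np.sum((y_th - y_exp) ** 2 / var)
    return -0.5 * (chi2 + np.sum(np.log(2.0 * np.pi * var)))
```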

Given our model of the collision, the probability that the parameter set $\mathbf{p}$ is consistent with the data and the uncertainties is given by the posterior distribution of model parameters, $\mathcal{P}(\mathbf{p}|\{y_{\textrm{exp}}\})$:

\mathcal{P}(\mathbf{p}|\{y_{\textrm{exp}}\})=\frac{\mathcal{L}(\{y_{\textrm{exp}}\}|\mathbf{p})\,\textrm{Prior}(\mathbf{p})}{\int d\mathbf{p}\,\mathcal{L}(\{y_{\textrm{exp}}\}|\mathbf{p})\,\textrm{Prior}(\mathbf{p})}    (4)

where $\textrm{Prior}(\mathbf{p})$ is the prior distribution. In this work we take the prior to be constant over a finite range, for all the parameters.

The posterior distribution $\mathcal{P}(\mathbf{p}|\{y_{\textrm{exp}}\})$ is a probability distribution whose dimensionality is equal to the number of model parameters. Different projections of the posterior distribution summarize the constraints on the model parameters. As long as the number of parameters is small and the model is computationally fast, it is straightforward to sample and marginalize the posterior. When the number of parameters increases or the model is expensive, it can become overly burdensome to evaluate the model’s prediction at the large number of parameter values $\mathbf{p}$ needed to compute $\mathcal{P}(\mathbf{p}|\{y_{\textrm{exp}}\})$. A solution is to prepare a fast proxy, such as a Gaussian process [28], to emulate the model’s prediction. We review Gaussian process emulators briefly in the next subsection. As far as the Bayesian inference is concerned, the consequence of using Gaussian process emulators is substituting the model’s prediction by the emulator’s prediction,

y_{\textrm{th},j}(\mathbf{p})\to y_{\textrm{emul},j}(\mathbf{p})\;,    (5)

as well as adding the emulator interpolation uncertainty into the sum of uncertainties,

\sigma_{\textrm{th},j}(\mathbf{p})^{2}\to\sigma_{\textrm{th},j}(\mathbf{p})^{2}+\sigma_{\textrm{emul},j}(\mathbf{p})^{2}\;.    (6)

II.3 Emulation with Gaussian processes

Figure 1: 16 design points in a two-dimensional parameter space (reduced thickness on the x-axis and nucleon width on the y-axis), as sampled with a low discrepancy sequence generator [29, 30].

The idea behind emulation is first to sample the parameter space of the model, $\{\mathbf{p}_{k}\}$; these parameter samples are the design points. The model’s predictions are then calculated at all design points: $\{y_{\textrm{th},j}(\mathbf{p}_{k})\}$. The emulator then interpolates the model’s predictions over a continuous range of model parameters $\mathbf{p}$.

To sample the parameter space, we use a low discrepancy sequence generator [29, 30]. Figure 1 shows an example for 16 design points within a two-dimensional parameter space.
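As an illustrative sketch of this step, the scrambled Sobol sequence in scipy is one common low-discrepancy generator (the exact generator of Ref. [29] may differ, and the parameter bounds below are hypothetical):

```python
from scipy.stats import qmc

# 2^4 = 16 quasi-random points in the unit square, as in Fig. 1
sampler = qmc.Sobol(d=2, scramble=True, seed=1)
unit_design = sampler.random_base2(m=4)
# Map to hypothetical prior ranges: reduced thickness p, nucleon width w [fm]
design = qmc.scale(unit_design, l_bounds=[-1.0, 0.4], u_bounds=[1.0, 1.2])
```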

Figure 2: (Left) Value of $\langle\varepsilon_{2}\rangle$ as predicted by a Gaussian process emulator trained with 16 design points (y-axis), compared to the actual value of the model at the design points (x-axis); (right) same as left, for $\langle\varepsilon_{3}\rangle$ with 256 design points. The red line corresponding to perfect emulation, $y_{\textrm{emul},j}(\mathbf{p})=y_{\textrm{th},j}(\mathbf{p})$, is shown for reference. Statistical uncertainties on each sample are not shown, for clarity.

For each observable $\langle\varepsilon_{n}\rangle$, a Gaussian process is trained on the set of values of $\langle\varepsilon_{n}\rangle(\mathbf{p})$ calculated at the design points $\{\mathbf{p}_{k}\}$. Gaussian process emulators are probabilistic interpolators, which associate to each point in parameter space a probability distribution. At the design points, the width of the probability distribution should include the statistical uncertainty of the observables. Between the design points, the width of the probability distribution should increase to account for the interpolation uncertainty. To handle stochastic simulations, Gaussian process emulators must be set up such that variations between parameter samples are divided into (i) statistical uncertainty and (ii) the actual variation from the parameter dependence of the model. A mathematical description of the approach can be found in Ref. [15, Section V-B-2]. In short, a white noise kernel accounts for the uncorrelated (“short range”) fluctuations originating from the statistical uncertainty, and a squared-exponential kernel accounts for the longer-range parameter dependence. The parameters of these kernels, for example their relative size, are determined by numerical optimization. This approach is used in most applications of Bayesian parameter inference in heavy ion physics. The result of this implementation is that the emulator uncertainty accounts for both the statistical and interpolation uncertainties.
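A minimal sketch of such an emulator with scikit-learn, using toy training data in place of TRENTo output (the kernel starting values are illustrative):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel, WhiteKernel

rng = np.random.default_rng(1)
design = rng.uniform(size=(16, 2))  # stand-in design points in [0,1]^2
y = np.sin(3 * design[:, 0]) + 0.05 * rng.standard_normal(16)  # noisy toy <eps_n>

# Squared-exponential kernel for the smooth parameter dependence, plus
# a white noise kernel that absorbs the statistical fluctuations; their
# relative size is set by maximum-likelihood optimization during the fit.
kernel = ConstantKernel(1.0) * RBF(length_scale=[0.5, 0.5]) \
    + WhiteKernel(noise_level=1e-2)
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10)
gp.fit(design, y)
mean, std = gp.predict(rng.uniform(size=(5, 2)), return_std=True)
# std combines the statistical and interpolation uncertainties
```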

Figure 2 shows the mean-value predictions of the emulator for $\langle\varepsilon_{2}\rangle$ and $\langle\varepsilon_{3}\rangle$ compared to the actual values of these observables. The points all lie close to the line $y_{\textrm{emul},j}(\mathbf{p})=y_{\textrm{th},j}(\mathbf{p})$, which shows that the emulator does indeed interpolate well. On the other hand, $\langle\varepsilon_{3}\rangle$ tends to have a larger statistical uncertainty than $\langle\varepsilon_{2}\rangle$, and as the right-hand side of Figure 2 shows, increasing the number of design points will not lead to perfect, deterministic-like predictions from the emulator, because the values of $\langle\varepsilon_{3}\rangle$ have significant statistical uncertainties. This is the trade-off that we focus on in this work.

II.4 Closure tests

Closure tests are a straightforward application of Bayesian parameter inference, with the experimental data replaced by calculations from the model itself. They are a self-consistency check made non-trivial by (i) the presence of uncertainties and (ii) the loss of information from the model to the observables. A longer discussion of closure tests in the context of heavy ion collisions can be found in Section VI of Ref. [15]. We describe them briefly here, with a focus on metrics quantifying the success of closure tests.

In closure tests, we first select a set of parameters that serve as the reference for the closure test; we refer to these as the “reference” or “truth” parameters, $\mathbf{p}^{\textrm{truth}}$. We then compute event-averaged eccentricities $\langle\varepsilon^{\textrm{truth}}_{n}\rangle=\langle\varepsilon_{n}\rangle(\mathbf{p}^{\textrm{truth}})$, which have a statistical uncertainty $\Delta\langle\varepsilon^{\textrm{truth}}_{n}\rangle$. While these uncertainties are purely statistical, they are meant to mimic the statistical and systematic uncertainties found in measurements. This uncertainty can be dialed by changing the number of TRENTo events $M_{\textrm{ev}}^{\textrm{truth}}$ used to compute $\langle\varepsilon^{\textrm{truth}}_{n}\rangle$.

Separately, we train Gaussian process emulators for $\langle\varepsilon_{n}\rangle(\mathbf{p})$ over a chosen range of model parameters, as described in the previous section. The two sources of uncertainty in the emulator are controlled by the number of design points (parameter samples) $N_{d}$, which sets the interpolation uncertainty, and the number of TRENTo events per design point, $M_{\textrm{ev}}$, which sets the statistical uncertainty of $\langle\varepsilon_{n}\rangle(\mathbf{p})$ at each design point.

Using the Gaussian process emulators as a proxy for TRENTo, we then perform Bayesian parameter estimation to attempt to recover the parameters $\mathbf{p}^{\textrm{truth}}$ from the values of the observables $\langle\varepsilon^{\textrm{truth}}_{n}\rangle\pm\Delta\langle\varepsilon^{\textrm{truth}}_{n}\rangle$. The result of the Bayesian inference is the posterior $\mathcal{P}(\mathbf{p}|\{\langle\varepsilon^{\textrm{truth}}_{n}\rangle\})$ (Eq. 4). This posterior distribution depends on three sources of uncertainty: the uncertainty on the reference observables ($\Delta\langle\varepsilon^{\textrm{truth}}_{n}\rangle$), which mimics experimental uncertainty, and the statistical and interpolation uncertainties of the emulator. If all uncertainties were small, one would expect the posterior to be a narrow peak at the parameter truth $\mathbf{p}^{\textrm{truth}}$. In practice, some of these uncertainties are significant, and the posterior distribution thus has a finite width in parameter space. If closure is successful, the posterior should enclose the parameter set $\mathbf{p}^{\textrm{truth}}$.
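Schematically, one closure test at fixed budget $N_{\textrm{tot}}=N_{d}M_{\textrm{ev}}$ can be organized as below (a sketch only: `run_trento`, `average_observables`, `sample_design`, `train_emulators` and `infer_posterior` are hypothetical stand-ins for the steps described above):

```python
def closure_test(p_truth, N_d, M_ev, M_ev_truth):
    # Reference ("truth") observables that stand in for measurements
    y_truth, dy_truth = average_observables(run_trento(p_truth, M_ev_truth))
    # One emulator per observable, trained on N_d design points with
    # M_ev TRENTo events each (total budget N_tot = N_d * M_ev)
    design = sample_design(N_d)
    training = [average_observables(run_trento(p, M_ev)) for p in design]
    emulators = train_emulators(design, training)
    # Posterior of Eq. (4), e.g. sampled by MCMC over the flat prior
    posterior = infer_posterior(emulators, y_truth, dy_truth)
    # Closure metric: the value of the posterior at the true parameters
    return posterior(p_truth)
```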

Figure 3: Schematic comparison of one-parameter posterior distributions (red solid, green dashed and blue dotted Gaussians) with a single fixed “truth” value of the parameter (vertical teal line). Compared to the blue dotted line, the green dashed line shows how the posterior at the truth increases as accuracy improves, and the red solid line shows how it increases as precision improves.

To quantify the degree of agreement of the posterior $\mathcal{P}(\mathbf{p}|\{\langle\varepsilon^{\textrm{truth}}_{n}\rangle\})$ with the known true value $\mathbf{p}^{\textrm{truth}}$ of the parameters used in the closure test, we use two different metrics. The first is the value of the posterior at the parameter truth, $\mathcal{P}(\mathbf{p}^{\textrm{truth}}|\{\langle\varepsilon^{\textrm{truth}}_{n}\rangle\})$. The posterior at the parameter truth balances accuracy and precision. As illustrated in Fig. 3, an increase in accuracy (a shift towards the true value of the parameter) will increase this posterior value. An increase in precision (a narrower posterior) will also increase the posterior at the parameter truth, but only to a certain extent: the posterior at the truth will actually start to decrease if the inference becomes too confident about an incorrect value. The posterior at the truth has the benefit of being simple to interpret and inexpensive to compute.

For completeness, we studied a second metric, the Akaike Information Criterion (AIC) [31], given by:

\textrm{AIC}\equiv-2\ln(\mathcal{L}_{\mathrm{max}})+2k    (7)

where $\mathcal{L}_{\mathrm{max}}$ is the maximum of the likelihood over the parameter space and $k$ is the number of parameters. The Akaike Information Criterion is a metric used for model selection, to help determine how successful a model is at describing measurements given its number of parameters [32, 33]. In the case of closure tests, while we know that the emulator and the reference (truth) calculations originate from the same model, the uncertainties in the problem can evidently obscure this fact; we will assess in this work how well the AIC quantifies closure. Note that the AIC is more expensive to compute, as it requires identifying the maximum of the likelihood.
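A minimal sketch of Eq. (7), assuming a `neg_log_likelihood` callable (e.g. built from the `log_likelihood` sketch above, with the emulator prediction at each $\mathbf{p}$); the choice of global optimizer is ours:

```python
from scipy.optimize import differential_evolution

def aic(neg_log_likelihood, bounds):
    # Locate the likelihood maximum over the finite prior range,
    # then apply Eq. (7): AIC = -2 ln L_max + 2k.
    result = differential_evolution(neg_log_likelihood, bounds, seed=1)
    k = len(bounds)                      # number of model parameters
    return 2.0 * result.fun + 2.0 * k    # result.fun is the minimum of -ln L
```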

III Results

As discussed in Section II, the result of Bayesian parameter inference generally depends on the experimental uncertainty as well as on the statistical and interpolation uncertainty of the emulator. Additional theoretical uncertainties can also play a role [15].

Because we use closure tests in this manuscript, the experimental uncertainty is replaced by the statistical uncertainty of the reference (“truth”) calculations, which is controlled by changing the number $M_{\textrm{ev}}^{\textrm{truth}}$ of TRENTo events that are averaged over to compute the reference observables. As for the emulator, its uncertainty is determined by the number of parameter samples $N_{d}$ and the number of collisions simulated per parameter sample, $M_{\textrm{ev}}$.

Tests are performed at a fixed computational budget: the product $N_{\textrm{tot}}=N_{d}M_{\textrm{ev}}$ is kept fixed. The pairs $(N_{d},M_{\textrm{ev}})$ are chosen such that $N_{d}$ varies by factors of 2 at fixed product (e.g. $\{8,128\}\rightarrow\{16,64\}\rightarrow\{32,32\}$). We use nine different sets of $(N_{d},M_{\textrm{ev}})$, resulting in nine different groups of emulators, each group having one Gaussian process emulator per observable.
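For instance, one way to enumerate such fixed-budget pairs (the factor-of-2 spacing and the $2^{16}$ budget are from the text; the exact nine pairs used are an assumption):

```python
N_tot = 2**16  # fixed event budget of the base case
pairs = [(2**k, N_tot // 2**k) for k in range(4, 13)]  # nine (N_d, M_ev) sets
# [(16, 4096), (32, 2048), ..., (256, 256), ..., (4096, 16)]
```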

As a base case, we first used $N_{\textrm{tot}}=N_{d}M_{\textrm{ev}}=2^{16}$ total events for Pb-Pb collisions. We used $M_{\textrm{ev}}^{\textrm{truth}}=2^{16}$ collisions to compute the reference “truth” observables. We begin with only two TRENTo parameters at a time: nucleon width and reduced thickness in one case, and nucleon width and fluctuation in the other. To test the robustness of the methodology, we later vary the data uncertainty as well as the number of parameters and observables.

Figure 4: Closure tests with $N_{d}M_{\textrm{ev}}=2^{16}$ total events, $M_{\textrm{ev}}^{\textrm{truth}}=2^{16}$ events to compute the reference observables, two parameters (nucleon width and reduced thickness on the left, nucleon width and fluctuation parameter on the right), and two observables ($\langle\varepsilon_{2}\rangle$ and $\langle\varepsilon_{3}\rangle$ in Pb-Pb collisions at $\sqrt{s_{NN}}=2.76$ TeV). The top two panels show the mode and width of the one-parameter marginalized posterior, while the bottom panel shows the value of the posterior at the true value of the parameter as well as the Akaike Information Criterion (AIC), all as a function of the ratio of the number of design points $N_{d}$ to the number of TRENTo simulations per design point $M_{\textrm{ev}}$.

After performing Bayesian parameter inference in the framework described above, we obtain a two-parameter posterior distribution. By marginalizing over each parameter, we obtain one-parameter marginalized posterior distributions, whose mode and width (variance) we calculate and plot as a function of the ratio of the number of design points $N_{d}$ to the number of TRENTo simulations per design point $M_{\textrm{ev}}$; this is shown in the top two left panels of Figure 4. The true (reference) value of the parameters used in the tests is shown as a red horizontal line. We first see that, for any number of design points, the constraints from the posterior are consistent with the true value of the parameters. This is not a trivial observation: it indicates that the emulator is properly quantifying the interpolation and statistical uncertainties, however large or small these uncertainties are. In this sense, Bayesian parameter inference provides valid constraints for any combination of the number of design points $N_{d}$ and the number of TRENTo events $M_{\textrm{ev}}$. On the other hand, one can optimize the results of the Bayesian inference by a careful compromise between emulator and statistical uncertainty. With a large number of design points (large $N_{d}/M_{\textrm{ev}}$), the interpolation uncertainty of the emulator is expected to be small, but large statistical uncertainties on the $\langle\varepsilon_{n}\rangle$ of each design point prevent accurate constraints on the parameter. Using a large number of TRENTo events with a small number of design points (small $N_{d}/M_{\textrm{ev}}$) leads to similarly sub-optimal results. To quantify this tension, we use the two metrics discussed in Section II.4: the value of the posterior at the true value of the closure test parameters, and the Akaike Information Criterion (AIC). Both are plotted in the bottom left panel of Figure 4, again as a function of the ratio $N_{d}/M_{\textrm{ev}}$. The maximum value of the posterior at the truth, and the minimum value of the AIC, are observed around $N_{d}/M_{\textrm{ev}}=0.25$, which corresponds to $N_{d}=2^{7}$ design points. Overall, the different panels all indicate that a careful apportionment of the number of design points does result in significantly better constraints on the model parameters for the same computational expense. The optimal ratio is found to be between $N_{d}/M_{\textrm{ev}}=0.1$ and $1$. This suggests that the optimal value of $N_{d}$ is of the order of $\sqrt{N_{\textrm{tot}}}$ or slightly smaller.

Repeating this exercise with one TRENTo parameter (reduced thickness) swapped for another (the fluctuation parameter) yields the right panels of Figure 4. Because the fluctuation parameter is more challenging to constrain from $\langle\varepsilon_{2}\rangle$ and $\langle\varepsilon_{3}\rangle$, the dependence of the top panel on the number of design points $N_{d}$ is modest. Yet this does not prevent the other parameter from being better constrained (middle right panel), and the optimal number of design points remains in the vicinity of $N_{d}/M_{\textrm{ev}}=0.25$.

To ensure the robustness of our conclusions, we repeat the same analysis varying the uncertainty on the reference calculations (which are a proxy for data uncertainty) as well as the number of parameters and observables.

III.1 Dependence on the uncertainty of the reference calculation (“data”)

Figure 5: Same as the left side of Fig. 4, with (left) $M_{\textrm{ev}}^{\textrm{truth}}=2^{14}$ events and (right) $M_{\textrm{ev}}^{\textrm{truth}}=2^{18}$ events.

To compute the reference (“truth”) observables, $M_{\textrm{ev}}^{\textrm{truth}}$ TRENTo events are averaged. Changing $M_{\textrm{ev}}^{\textrm{truth}}$ controls the uncertainty on the reference observables, $\Delta\langle\varepsilon^{\textrm{truth}}_{n}\rangle$. In this section, we vary $M_{\textrm{ev}}^{\textrm{truth}}$ to assess its effect on the optimal number of design points.

In the previous example, because $N_{\textrm{tot}}=N_{d}M_{\textrm{ev}}$ and $M_{\textrm{ev}}^{\textrm{truth}}$ were both equal to $2^{16}$, the interpolation and statistical uncertainties in the emulator (controlled by $N_{\textrm{tot}}$) were always large compared to the uncertainty of the reference calculations that stand in for measurements (controlled by $M_{\textrm{ev}}^{\textrm{truth}}$): at best, the statistical uncertainty of the calculations used in the emulator could equal the statistical uncertainty of the reference calculations (uncertainty $\propto 1/\sqrt{2^{16}}$) when $N_{d}=1$, which would evidently be a case with extreme interpolation uncertainty.

We first repeat the previous example with $M_{\textrm{ev}}^{\textrm{truth}}=2^{14}$ events while keeping $N_{\textrm{tot}}=N_{d}M_{\textrm{ev}}=2^{16}$. This means that, when $N_{d}=2^{2}=4$, the statistical uncertainty of the reference calculation is similar to that of the calculations used for the emulator; when $N_{d}=2^{4}=16$, the statistical uncertainty in the emulator is approximately a factor of $2$ larger than that of the reference calculations, etc. The results of the closure tests are shown in the left panel of Fig. 5. Overall, the results are similar: the constraints on the model parameters are consistent with the true value of the parameters at any ratio $N_{d}/M_{\textrm{ev}}$, but the range $N_{d}/M_{\textrm{ev}}\approx 0.1$–$1$ is found to be optimal.

We repeat this test in the opposite direction, using $M_{\textrm{ev}}^{\textrm{truth}}=2^{18}$ events to reduce the uncertainty on the reference calculations while keeping the emulator budget the same ($N_{\textrm{tot}}=N_{d}M_{\textrm{ev}}=2^{16}$). This is equivalent to a scenario where the emulator uncertainties are always much larger than the data uncertainties. This yields the right panel of Fig. 5. Again, the optimal range of design points remains $N_{d}/M_{\textrm{ev}}\approx 0.1$–$1$.

III.2 Dependence on the observables and the parameters

Figure 6: Same as the left side of Fig. 4, with (left) three observables ($\langle\varepsilon_{n}\rangle$, $n=2,3,4$) and (right) four observables ($\langle\varepsilon_{n}\rangle$, $n=2,3,4,5$).
Figure 7: Same as Fig. 4, with three TRENTo parameters instead of two, and two (left), three (center) or four (right) observables, the observables being $\langle\varepsilon_{n}\rangle$ with $n=2,3$, $n=2,3,4$ and $n=2,3,4,5$ respectively.

Emulator performance depends on multiple factors, in particular the number of parameters and the complexity of the parameter dependence of the observables. The results of Bayesian parameter inference evidently also depend on these factors, and further depend on the information carried by the different observables. We repeat the closure test shown in Fig. 4, adding new observables: first $\langle\varepsilon_{4}\rangle$, and then $\langle\varepsilon_{5}\rangle$. The results are shown respectively in the left and right panels of Fig. 6. Again, the results remain similar, with the optimal number of design points in the same range as found in previous tests.

We repeat this test with three TRENTo parameters instead of two, starting with two observables ($\langle\varepsilon_{n}\rangle$, $n=2,3$), then three ($n=2,3,4$) and four ($n=2,3,4,5$) observables. The results are shown respectively in the left, center and right panels of Fig. 7. The results are consistent with all previous ones: closure is successful, since the constraints on all parameters are consistent with the truth value of the parameters for any number of design points. Moreover, constraints on the parameters are still best when the ratio of the number of design points $N_{d}$ to the number of TRENTo simulations per design point $M_{\textrm{ev}}$ is in the range $N_{d}/M_{\textrm{ev}}\approx 0.1$–$1$.

IV Discussion

There is a consistent trend among the results shown in Figs. 4–7: closure was best when the number of design points was slightly smaller than the square root of the total number of events ($\sqrt{N_{\textrm{tot}}}$), which is equivalent to $N_{d}/M_{\textrm{ev}}\approx 0.1$–$1$. This trend holds for all the variations tested in this study.

A rule of thumb sometimes used to determine the number of parameter samples (design points) for a Gaussian process emulator is 10 times the number of parameters ($10N_{\textrm{params}}$) [34]. This guidance is evidently not meant to be precise, if only because it does not take into account the unavoidable trade-off between statistical and emulator uncertainty. In the two-parameter case, $N_{d}=10N_{\textrm{params}}\approx 2^{4}$, which is considerably smaller than the optimal $N_{d}\approx 2^{6}$–$2^{8}$ found in the previous section. We also note that the optimal number of design points did not change significantly when three model parameters were used instead of two (Fig. 7), which suggests a modest dependence of this optimum on the number of parameters.

While statistical uncertainties generally converge at the rate $1/\sqrt{M_{\textrm{ev}}}$, interpolation uncertainties depend on the details of the model’s parameter dependence and on the range of parameters (the prior) over which the emulator is trained. Rather than attempting to provide general guidance, we focus on applications to heavy ion physics, where our results will apply most directly.

V Implications for Bayesian inference in heavy ion collisions

As seen in the results from Section III, emulator uncertainty is propagated properly in the Bayesian parameter estimation: even if one uses an emulator with large interpolation or statistical uncertainties, the parameter constraints remain consistent with the true value of the parameters used in the closure test. In this sense, we expect the Gaussian process emulators used in Bayesian inference on heavy ion data to be robust. On the other hand, we saw that optimizing the balance between the interpolation and statistical uncertainties of the emulator can lead to considerably improved constraints on the model parameters at the same computational cost. In our study, we found an optimal ratio of design points to TRENTo events per design point, $N_{d}/M_{\textrm{ev}}$, around $0.1$–$1$.

An optimum around $N_{d}/M_{\textrm{ev}}\approx 1$ is expected if the interpolation uncertainty in the emulator scales approximately as $1/\sqrt{N_{d}}$: that is, if the interpolation and statistical uncertainties scale the same way, one should aim for roughly equal $N_{d}$ and $M_{\textrm{ev}}$. Tests published in Ref. [17] found some evidence of $1/\sqrt{N_{d}}$ convergence of observables in realistic models of heavy ion collisions.
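This expectation can be made explicit with a short derivation, under the assumed scalings $\sigma^{2}_{\textrm{interp}}\approx a/N_{d}$ and $\sigma^{2}_{\textrm{stat}}\approx b/M_{\textrm{ev}}$ for some constants $a$ and $b$ (our notation, not from the analysis above). At fixed budget, $M_{\textrm{ev}}=N_{\textrm{tot}}/N_{d}$, so the combined variance and its minimum are

\sigma^{2}(N_{d})=\frac{a}{N_{d}}+\frac{b\,N_{d}}{N_{\textrm{tot}}}\;,\qquad\frac{d\sigma^{2}}{dN_{d}}=0\;\Rightarrow\;N_{d}^{\textrm{opt}}=\sqrt{\frac{a}{b}N_{\textrm{tot}}}\;,\qquad\left.\frac{N_{d}}{M_{\textrm{ev}}}\right|_{\textrm{opt}}=\frac{a}{b}\;,

so an optimum slightly below $N_{d}/M_{\textrm{ev}}=1$ corresponds to an interpolation prefactor $a$ somewhat smaller than the statistical prefactor $b$.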

To apply our work’s conclusions to heavy ion collisions, one must remember that there are different sources of stochasticity in the collisions, only one of which was considered in this work: fluctuations in the initial impact geometry of the nuclei. Fluctuations in the number of particles produced also introduce statistical uncertainty, which was not studied in this work. It is common, though not universal, to reduce the latter statistical uncertainty by oversampling the particles produced for each initial geometry of the collision. The factor $M_{\textrm{ev}}$ in the ratio $N_{d}/M_{\textrm{ev}}\approx 0.1$–$1$ corresponds to the number of initial geometry samples, not the total number of oversamples.

Finally, we note that the choice of $M_{\textrm{ev}}$ might be guided by the need for accuracy for certain statistics-hungry observables, rather than by the benefits for the emulator as a whole [19, 20]. It will be useful to repeat the current study with a broader range of observables.

VI Conclusion

We studied the interplay between interpolation and statistical uncertainties in Bayesian parameter inference using Gaussian process emulators. We used the TRENTo model of initial conditions in heavy ion collisions, with observables given by the event-averaged spatial anisotropies $\langle\varepsilon_{n}\rangle$ (Eq. 2), which are analogous to momentum anisotropy observables measured in heavy ion collisions. Using $N_{d}$ samples of the TRENTo parameter space and $M_{\textrm{ev}}$ TRENTo events at each parameter point, we performed closure tests to determine the impact of the emulator uncertainty on constraining model parameters. We found that, for a budget of $N_{\textrm{tot}}$ total TRENTo events, the optimal number of design points was around $N_{d}\approx(0.25$–$1.0)\sqrt{N_{\textrm{tot}}}$, which corresponds to $N_{d}/M_{\textrm{ev}}\approx 0.1$–$1$. We found this optimum by looking at several different metrics: the maximum value and width of the marginalized posterior, the value of the posterior at the true value of the parameters, and the Akaike Information Criterion (AIC). We found the posterior at the truth to be very effective at quantifying closure.

In view of our results, we believe it would be beneficial for Bayesian inference applications in heavy ion physics to experiment with different ratios $N_{d}/M_{\textrm{ev}}$. Simulating a large number of collision events, which reduces all sources of stochastic uncertainty, can give access to statistics-hungry observables [19, 20]; yet using a large number $M_{\textrm{ev}}$ of events at each parameter sample will not provide significant benefits for the large number of observables whose emulation is limited by interpolation uncertainty. An improved balance between interpolation and statistical uncertainties can help reduce the emulator’s uncertainty below current experimental uncertainties, allowing us to make full use of heavy ion measurements from RHIC and the LHC.

VII Acknowledgements

The authors thank Simon Mak for insightful discussions, and Ron Solz and Matthew Luzum for valuable feedback. B.W. thanks Duke University and Wayne State University for financial support. S.A.B. and J.-F.P. have been supported by the U.S. Department of Energy Grant no. DE-FG02-05ER41367. This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231, using NERSC award NP-ERCAP0022230. Calculations were also performed on the Duke Compute Cluster.

References

  • [1] S. A. Bass, “Microscopic reaction dynamics at SPS and RHIC,” Nucl. Phys. A, vol. 698, pp. 164–170, 2002.
  • [2] C. Gale, S. Jeon, and B. Schenke, “Hydrodynamic Modeling of Heavy-Ion Collisions,” Int. J. Mod. Phys. A, vol. 28, p. 1340011, 2013.
  • [3] U. Heinz and R. Snellings, “Collective flow and viscosity in relativistic heavy-ion collisions,” Ann. Rev. Nucl. Part. Sci., vol. 63, pp. 123–151, 2013.
  • [4] R. Derradi de Souza, T. Koide, and T. Kodama, “Hydrodynamic Approaches in Relativistic Heavy Ion Reactions,” Prog. Part. Nucl. Phys., vol. 86, pp. 35–85, 2016.
  • [5] O. Philipsen, “The QCD equation of state from the lattice,” Prog. Part. Nucl. Phys., vol. 70, pp. 55–107, 2013.
  • [6] H. Petersen, C. Coleman-Smith, S. A. Bass, and R. Wolpert, “Constraining the initial state granularity with bulk observables in Au+Au collisions at $\sqrt{s_{\rm NN}}=200$ GeV,” J. Phys. G, vol. 38, p. 045102, 2011.
  • [7] J. Novak, K. Novak, S. Pratt, J. Vredevoogd, C. Coleman-Smith, and R. Wolpert, “Determining Fundamental Properties of Matter Created in Ultrarelativistic Heavy-Ion Collisions,” Phys. Rev. C, vol. 89, no. 3, p. 034917, 2014.
  • [8] E. Sangaline and S. Pratt, “Toward a deeper understanding of how experiments constrain the underlying physics of heavy-ion collisions,” Phys. Rev. C, vol. 93, no. 2, p. 024908, 2016.
  • [9] S. Pratt, E. Sangaline, P. Sorensen, and H. Wang, “Constraining the Eq. of State of Super-Hadronic Matter from Heavy-Ion Collisions,” Phys. Rev. Lett., vol. 114, p. 202301, 2015.
  • [10] J. E. Bernhard, P. W. Marcy, C. E. Coleman-Smith, S. Huzurbazar, R. L. Wolpert, and S. A. Bass, “Quantifying properties of hot and dense QCD matter through systematic model-to-data comparison,” Phys. Rev. C, vol. 91, no. 5, p. 054910, 2015.
  • [11] J. E. Bernhard, J. S. Moreland, S. A. Bass, J. Liu, and U. Heinz, “Applying Bayesian parameter estimation to relativistic heavy-ion collisions: simultaneous characterization of the initial state and quark-gluon plasma medium,” Phys. Rev. C, vol. 94, no. 2, p. 024907, 2016.
  • [12] J. S. Moreland, J. E. Bernhard, and S. A. Bass, “Bayesian calibration of a hybrid nuclear collision model using p-Pb and Pb-Pb data at energies available at the CERN Large Hadron Collider,” Phys. Rev. C, vol. 101, no. 2, p. 024911, 2020.
  • [13] J. E. Bernhard, Bayesian parameter estimation for relativistic heavy-ion collisions. PhD thesis, Duke U., 4 2018.
  • [14] J. E. Bernhard, J. S. Moreland, and S. A. Bass, “Bayesian estimation of the specific shear and bulk viscosity of quark–gluon plasma,” Nature Phys., vol. 15, no. 11, pp. 1113–1117, 2019.
  • [15] D. Everett et al., “Multisystem Bayesian constraints on the transport coefficients of QCD matter,” Phys. Rev. C, vol. 103, no. 5, p. 054904, 2021.
  • [16] G. Nijs, W. van der Schee, U. Gürsoy, and R. Snellings, “Transverse Momentum Differential Global Analysis of Heavy-Ion Collisions,” Phys. Rev. Lett., vol. 126, no. 20, p. 202301, 2021.
  • [17] G. Nijs, W. van der Schee, U. Gürsoy, and R. Snellings, “Bayesian analysis of heavy ion collisions with the heavy ion computational framework Trajectum,” Phys. Rev. C, vol. 103, no. 5, p. 054909, 2021.
  • [18] D. Everett et al., “Phenomenological constraints on the transport properties of QCD matter with data-driven model averaging,” Phys. Rev. Lett., vol. 126, no. 24, p. 242301, 2021.
  • [19] J. E. Parkkila, A. Onnerstad, and D. J. Kim, “Bayesian estimation of the specific shear and bulk viscosity of the quark-gluon plasma with additional flow harmonic observables,” Phys. Rev. C, vol. 104, no. 5, p. 054904, 2021.
  • [20] J. E. Parkkila, A. Onnerstad, S. F. Taghavi, C. Mordasini, A. Bilandzic, M. Virta, and D. J. Kim, “New constraints for QCD matter from improved Bayesian parameter estimation in heavy-ion collisions at LHC,” Phys. Lett. B, vol. 835, p. 137485, 2022.
  • [21] J. S. Moreland, J. E. Bernhard, and S. A. Bass, “Alternative ansatz to wounded nucleon and binary collision scaling in high-energy nuclear collisions,” Phys. Rev. C, vol. 92, no. 1, p. 011901, 2015.
  • [22] M. Luzum and H. Petersen, “Initial State Fluctuations and Final State Correlations in Relativistic Heavy-Ion Collisions,” J. Phys. G, vol. 41, p. 063102, 2014.
  • [23] J.-Y. Ollitrault, “Anisotropy as a signature of transverse collective flow,” Phys. Rev. D, vol. 46, pp. 229–245, 1992.
  • [24] M. Miller and R. Snellings, “Eccentricity fluctuations and its possible effect on elliptic flow measurements,” 12 2003.
  • [25] B. Alver et al., “System size, energy, pseudorapidity, and centrality dependence of elliptic flow,” Phys. Rev. Lett., vol. 98, p. 242302, 2007.
  • [26] B. Alver and G. Roland, “Collision geometry fluctuations and triangular flow in heavy-ion collisions,” Phys. Rev. C, vol. 81, p. 054905, 2010. [Erratum: Phys.Rev.C 82, 039903 (2010)].
  • [27] F. G. Gardim, F. Grassi, M. Luzum, and J.-Y. Ollitrault, “Mapping the hydrodynamic response to the initial geometry in heavy-ion collisions,” Phys. Rev. C, vol. 85, p. 024908, 2012.
  • [28] C. K. Williams and C. E. Rasmussen, Gaussian processes for machine learning, vol. 2. MIT press Cambridge, MA, 2006.
  • [29] M. Roberts. https://github.com/j-f-paquet/bayesian_parameter_estimation_tutorial/blob/master/parameter_space_sampling/param_sampling.ipynb.
  • [30] H. Niederreiter, Random number generation and quasi-Monte Carlo methods. SIAM, 1992.
  • [31] H. Akaike, “A new look at the statistical model identification,” IEEE Transactions on Automatic Control, vol. 19, no. 6, pp. 716–723, 1974.
  • [32] A. R. Liddle, “Information criteria for astrophysical model selection,” Mon. Not. Roy. Astron. Soc., vol. 377, pp. L74–L78, 2007.
  • [33] A. R. Liddle, “How many cosmological parameters,” Mon. Not. Roy. Astron. Soc., vol. 351, no. 3, pp. L49–L53, 2004.
  • [34] J. L. Loeppky, J. Sacks, and W. J. Welch, “Choosing the sample size of a computer experiment: A practical guide,” Technometrics, vol. 51, no. 4, pp. 366–376, 2009.