
Conditional Normalizing Flow for Monte Carlo sampling in lattice scalar field theory

Ankur Singha Department of Physics, Indian Institute of Technology Kanpur, Kanpur-208016, India Dipankar Chakrabarti Department of Physics, Indian Institute of Technology Kanpur, Kanpur-208016, India Vipul Arora Department of Electrical Engineering, Indian Institute of Technology Kanpur, Kanpur-208016, India
Abstract

The cost of Monte Carlo sampling of lattice configurations is very high in the critical region of lattice field theory due to the strong correlation between samples. This paper proposes a Conditional Normalizing Flow (C-NF) model for sampling lattice configurations in the critical region in order to address the problem of critical slowing down. We train the C-NF model using samples generated by Hybrid Monte Carlo (HMC) in non-critical regions, where the simulation cost is low. The trained C-NF model is then employed in the critical region to build a Markov chain of lattice samples with negligible autocorrelation. The C-NF model is used for both interpolation and extrapolation to the critical region of the lattice theory. Our proposed method is assessed using 1+1-dimensional scalar \phi^{4} theory. This approach enables the construction of lattice ensembles for many parameter values in the critical region, reducing simulation costs by avoiding critical slowing down.

1  Introduction

In lattice field theory, Monte Carlo simulation techniques are used to sample lattice configurations from a distribution defined by the action of the lattice theory. The parameter value at which we generate the lattice samples determines the cost of the simulation. In the non-critical region of the lattice theory, simulation costs are low for algorithms such as Hybrid Monte Carlo (HMC)[1]. However, as we attempt to sample uncorrelated lattice configurations from the critical region, the simulation cost increases rapidly. In the critical region, the integrated autocorrelation time, which measures the correlation between successive samples, grows rapidly and diverges at the critical point; for a finite-size lattice, the critical point corresponds to the peak of the autocorrelation curve. This problem is known as critical slowing down[2, 3]. Many efforts have been made to lessen the impact of critical slowing down in statistical systems and lattice QFT [4, 5, 6], but it remains a challenging task to simulate near the critical point of a lattice QFT while overcoming critical slowing down.

These days, ML-based solutions to this problem are becoming popular. Various ML algorithms have been applied to statistical physics and condensed matter problems[7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]. Some generative learning algorithms[21, 22, 23, 24, 25, 26, 27] have recently been developed to avoid this difficulty in lattice field theory. A Conditional GAN was used in a 2D lattice Gross-Neveu model [20] and shown to be effective in mitigating the influence of critical slowing down. However, explicit probability density estimation is not accessible in a GAN; thus, we cannot guarantee that the model distribution is identical to the true lattice distribution. Flow-based generative models have been found successful in avoiding critical slowing down in scalar field theory[21], fermionic systems[22], U(1) gauge theory[24] and the Schwinger model[26]. In the flow-based approach [21, 24], an NF model is trained at a single value of the action parameter with the reverse KL divergence, and the trained model then generates lattice samples at that same parameter value. Such a model is initialized from scratch for each parameter value and does not use any lattice samples. Since HMC simulation in the non-critical region is not affected by critical slowing down, we can use samples from that region to train a generative model. We present a method for sampling lattice configurations near the critical region using a Conditional Normalizing Flow (C-NF) to reduce the problem of critical slowing down. The method trains a C-NF model in a non-critical region and uses it to produce samples for several parameter values in the critical region. Our goal is to generate lattice configurations from the distribution p(\phi|\lambda_{crit})=\frac{1}{Z}e^{-S(\phi|\lambda_{crit})}, where \phi denotes the lattice field, \lambda_{crit} denotes an action parameter close to the critical point, and Z is the partition function. We train a C-NF model \tilde{p}(\phi|\lambda) with HMC samples from p(\phi|\lambda) for various non-critical \lambda values, so that the model generalizes over the parameter \lambda. The model is then interpolated or extrapolated to \lambda values in the critical region to generate lattice configurations. The interpolated or extrapolated model may not directly provide samples from the true distribution, but exactness can be guaranteed by applying the Metropolis-Hastings (MH) algorithm afterwards. So, after training, we use the interpolated/extrapolated model \tilde{p}(\phi|\lambda_{crit}) in the critical region as the proposal for constructing a Markov chain via an independent MH algorithm[21]. This method is applicable whenever the target probability distribution is known up to a normalizing factor.

The primary contributions of this study are as follows:

  1. Using Conditional Normalizing Flows, we present a new method for sampling lattice configurations near the critical region. This method mitigates the critical slowing down problem.

  2. The model learns about the lattice system across multiple \lambda values and uses this knowledge to generate samples at any given \lambda value. As a result, our model can generate samples at multiple \lambda values in the critical region, which is not possible for the existing flow-based methods in lattice theory.

  3. We also demonstrate that a Conditional Normalizing Flow can be used for both interpolation and extrapolation in lattice \phi^{4} theory. The extrapolation demonstrates the possibility of using our approach for sampling lattice gauge theory.

2  Lattice scalar \phi^{4} Theory

In 2d Euclidean space, the action for \phi^{4} theory can be written as:

S(\phi,\lambda,m)=\int d^{2}x\,\big[(\partial_{\mu}\phi(x))^{2}+m^{2}\phi(x)^{2}+\lambda\phi(x)^{4}\big] (1)

where \lambda and m are the two parameters of the theory.

On the lattice, the action becomes:

S(\phi,\lambda,m)=\sum_{x}\Big[\sum_{\mu=1,2}\big[2\phi(x)^{2}-\phi(x)\phi(x+\hat{\mu})-\phi(x)\phi(x-\hat{\mu})\big]+m^{2}\phi(x)^{2}+\lambda\phi(x)^{4}\Big] (2)

where x is a 2d discrete vector and \hat{\mu} represents the two possible directions on the lattice. The field \phi(x) is defined on each lattice site and takes only real values.
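To make Eq. (2) concrete, here is a minimal numerical sketch (our own illustration, not the authors' code) of evaluating the lattice action on an 8x8 configuration with periodic boundaries; the function and parameter names are illustrative.

```python
import numpy as np

def lattice_action(phi, m2=-4.0, lam=4.6):
    """Discretized phi^4 action of Eq. (2) on an L x L field.
    Periodic boundary conditions are implemented with np.roll (shifts x +/- mu)."""
    kinetic = 0.0
    for axis in (0, 1):                      # the two lattice directions mu = 1, 2
        fwd = np.roll(phi, -1, axis=axis)    # phi(x + mu_hat)
        bwd = np.roll(phi, +1, axis=axis)    # phi(x - mu_hat)
        kinetic += np.sum(2.0 * phi**2 - phi * fwd - phi * bwd)
    potential = np.sum(m2 * phi**2 + lam * phi**4)
    return kinetic + potential

# example: action of a random 8 x 8 configuration
phi = np.random.randn(8, 8)
print(lattice_action(phi, m2=-4.0, lam=4.6))
```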

We choose this specific form of the action for the HMC simulation because it is suited to creating datasets for training the C-NF model: with this form, we do not need to apply any further transformations to the samples during training. The action possesses the \phi(x)\rightarrow -\phi(x) symmetry, which is spontaneously broken in a specific parameter region. We set m^{2}=-4 for our numerical experiments and observe spontaneous symmetry breaking in the theory by varying \lambda. If we begin in the broken phase of the theory, the order parameter under consideration, \langle\tilde{\phi}^{2}\rangle, is nonzero; it approaches zero at the critical point and remains zero in the symmetric phase, as shown in Figure 1. In the HMC simulation we choose a parameter \lambda and produce configurations according to the probability distribution:

P(\phi|\lambda)=\frac{1}{Z}e^{-S(\phi|\lambda)} (3)
where Z=\sum_{\phi}e^{-S(\phi|\lambda)}
Figure 1: HMC simulation: order parameter \langle\tilde{\phi}^{2}\rangle vs \lambda for m^{2}=-4, showing a second-order phase transition.

Each lattice configuration is a 2d matrix with dimensions L_{x}\times L_{y}. For our experiments we use L_{x}=L_{y}=8 and impose periodic boundary conditions on the lattice in all directions. More information on lattice \phi^{4} theory can be found in ref. [28]. The observables we calculate on the lattice ensembles are listed below; a short numerical sketch of their estimators follows the list:

  1. \langle\tilde{\phi}^{2}\rangle, with \tilde{\phi}=\frac{1}{V}\sum_{x}\phi(x)

  2. Correlation function:

     G_{c}(x)=\frac{1}{V}\sum_{y}[\langle\phi(y)\phi(x+y)\rangle-\langle\phi(y)\rangle\langle\phi(x+y)\rangle]

     Zero-momentum correlation function: C(t)=\sum_{x_{1}}G_{c}(x_{1},t)

  3. Two-point susceptibility: \chi=\sum_{x}G_{c}(x)
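As an illustration of how these observables can be estimated from an ensemble, the following sketch (our own, with stand-in data and illustrative names) computes \langle\tilde{\phi}^{2}\rangle, G_{c}(x), C(t) and \chi with numpy:

```python
import numpy as np

def phibar_sq(ensemble):
    """Per-configuration values of phi_tilde^2, phi_tilde = (1/V) sum_x phi(x)."""
    phibar = ensemble.mean(axis=(1, 2))
    return phibar**2

def connected_correlator(ensemble):
    """G_c(x): connected two-point function estimated with periodic shifts."""
    N, Lx, Ly = ensemble.shape
    mean_field = ensemble.mean(axis=0)            # <phi(y)>
    G = np.zeros((Lx, Ly))
    for dx in range(Lx):
        for dy in range(Ly):
            shifted = np.roll(np.roll(ensemble, -dx, axis=1), -dy, axis=2)
            shifted_mean = np.roll(np.roll(mean_field, -dx, 0), -dy, 1)
            G[dx, dy] = np.mean(ensemble * shifted) - np.mean(mean_field * shifted_mean)
    return G

ensemble = np.random.randn(1000, 8, 8)            # stand-in for HMC / C-NF samples
Gc = connected_correlator(ensemble)
C_t = Gc.sum(axis=0)                              # zero-momentum correlator C(t)
chi = Gc.sum()                                    # two-point susceptibility
print(phibar_sq(ensemble).mean(), chi)
```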

3  Conditional Normalizing Flow

Normalizing flows[29] are generative models that construct complex distributions by transforming a simple known distribution via a series of invertible and smooth mappings f:\mathbb{R}^{d}\rightarrow\mathbb{R}^{d} with inverse f^{-1}=g. If p_{z}(z) is the prior distribution and p_{t}(x) is the complex target distribution, then the model distribution p_{m}(x;\theta) can be written using the change-of-variables formula as

p_{m}(x;\theta)=p_{z}(z)\,\Big|\det\frac{\partial f_{\theta}^{-1}(x)}{\partial x}\Big| (4)
where x=f_{\theta}(z)

Fitting a flow-based model p_{m}(x;\theta) to a target distribution p_{t}(x) can be accomplished by minimizing their KL divergence. The crucial step is to build the flow such that the Jacobian determinant \big|\det\frac{\partial f_{\theta}^{-1}(x)}{\partial x}\big| can be computed efficiently. One such construction is the affine coupling block, which divides the input z into two halves and applies an affine transformation, producing an upper or lower triangular Jacobian. The transformation rules for such a block are as follows[30]:

x_{1}=z_{1}\odot exp(s_{1}(z_{2}))+t_{1}(z_{2})
x_{2}=z_{2}\odot exp(s_{2}(x_{1}))+t_{2}(x_{1}) (5)

where \odot represents the element-wise product of two vectors.

The inverse of this coupling layer is simply computed as:

z_{2}=(x_{2}-t_{2}(x_{1}))\odot exp(-s_{2}(x_{1}))
z_{1}=(x_{1}-t_{1}(z_{2}))\odot exp(-s_{1}(z_{2})) (6)
Figure 2: One affine block of the Conditional Normalizing Flow. Here, c is the conditioning parameter.

Because inverting an affine coupling layer does not require the inverses of s_{1},s_{2},t_{1},t_{2}, these can be arbitrary non-linear functions and can therefore be represented by neural networks. Introducing a conditioning parameter in an NF is not as simple as in a GAN. However, since s and t are only ever evaluated in the forward direction, we can concatenate the conditioning parameter c with their inputs while keeping the model invertible[31], as illustrated in Figure 2. The affine coupling transformation rules then become:

x_{1}=z_{1}\odot exp(s_{1}(z_{2},c))+t_{1}(z_{2},c)
x_{2}=z_{2}\odot exp(s_{2}(x_{1},c))+t_{2}(x_{1},c) (7)

and its inverse becomes

z_{2}=(x_{2}-t_{2}(x_{1},c))\odot exp(-s_{2}(x_{1},c))
z_{1}=(x_{1}-t_{1}(z_{2},c))\odot exp(-s_{1}(z_{2},c)) (8)
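A minimal PyTorch sketch of one conditional affine coupling block implementing Eqs. (7) and (8) is given below; the equal split of the input, the subnet sizes, and all names are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class ConditionalAffineCoupling(nn.Module):
    """One conditional affine coupling block (illustrative sketch):
    the input is split into two halves and the condition c is concatenated
    with the inputs of the s and t networks, as in Eqs. (7)-(8)."""
    def __init__(self, dim, cond_dim, hidden=128):
        super().__init__()
        assert dim % 2 == 0
        self.half = dim // 2
        def subnet():
            return nn.Sequential(
                nn.Linear(self.half + cond_dim, hidden), nn.Tanh(),
                nn.Linear(hidden, self.half))
        self.s1, self.t1, self.s2, self.t2 = subnet(), subnet(), subnet(), subnet()

    def forward(self, z, c):                       # generative direction z -> x, Eq. (7)
        z1, z2 = z.chunk(2, dim=1)
        h2 = torch.cat([z2, c], dim=1)
        x1 = z1 * torch.exp(self.s1(h2)) + self.t1(h2)
        h1 = torch.cat([x1, c], dim=1)
        x2 = z2 * torch.exp(self.s2(h1)) + self.t2(h1)
        log_det = self.s1(h2).sum(dim=1) + self.s2(h1).sum(dim=1)   # log|det dx/dz|
        return torch.cat([x1, x2], dim=1), log_det

    def inverse(self, x, c):                       # normalizing direction x -> z, Eq. (8)
        x1, x2 = x.chunk(2, dim=1)
        h1 = torch.cat([x1, c], dim=1)
        z2 = (x2 - self.t2(h1)) * torch.exp(-self.s2(h1))
        h2 = torch.cat([z2, c], dim=1)
        z1 = (x1 - self.t1(h2)) * torch.exp(-self.s1(h2))
        log_det = -(self.s1(h2).sum(dim=1) + self.s2(h1).sum(dim=1))  # log|det dz/dx|
        return torch.cat([z1, z2], dim=1), log_det
```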

Let us designate the C-NF model as f(x;c,\theta) and its inverse as g(z;c,\theta). The invertibility for any fixed condition c is given by

f^{-1}(\,\cdot\,;c,\theta)=g(\,\cdot\,;c,\theta) (9)

The change-of-variables formula becomes

p_{m}(x;c,\theta)=p_{z}(f(x;c,\theta))\,\big|\det J_{f}(x)\big|,\quad J_{f}(x)=\frac{\partial f(x;c,\theta)}{\partial x} (10)

The loss function is the KL divergence between the model distribution p_{m}(x;c,\theta) and the target distribution p_{t}(x;c):

\mathcal{L}=D_{KL}[\,p_{t}(x;c)\,||\,p_{m}(x;c,\theta)\,] (11)

We can find the maximum-likelihood network parameters \theta_{ML} by minimizing this loss. Then, for a fixed c, we can perform conditional generation of x by sampling z from p_{z}(z) and applying the inverted network g(z;c,\theta_{ML}).

4  Numerical Experiments

This section discusses the dataset preparation and the model architecture used to train the C-NF model. In addition, the training details of the C-NF model and the sampling procedure in the critical region are discussed. We train the C-NF model with different datasets and model architectures for interpolation and extrapolation.

4.1 Dataset

In the non-critical region, where the autocorrelation is low, we generate lattice configurations using HMC simulation with a Molecular Dynamics (MD) step size of 0.1 and an MD trajectory length of 1. For training purposes, we generate 10,000 lattice configurations of size 8\times 8 per \lambda value for interpolation and 15,000 per \lambda value for extrapolation. For evaluating the C-NF model in the critical region, we use HMC to generate around 10^{5} lattice configurations per \lambda. These serve as a baseline for comparing observables with those produced by the proposed sampling strategy.
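For reference, a minimal sketch of a standard leapfrog HMC update for the action of Eq. (2) is shown below, using the MD step size 0.1 and unit trajectory length quoted above; it illustrates the algorithm under our own assumptions and is not the code used to produce the datasets.

```python
import numpy as np

def action(phi, m2, lam):
    S = np.sum(m2 * phi**2 + lam * phi**4)
    for ax in (0, 1):
        S += np.sum(2.0 * phi**2 - phi * np.roll(phi, -1, ax) - phi * np.roll(phi, 1, ax))
    return S

def grad_action(phi, m2, lam):
    # dS/dphi(x) = 8 phi(x) - 2 * (sum of 4 nearest neighbours) + 2 m^2 phi(x) + 4 lam phi(x)^3
    nn_sum = sum(np.roll(phi, s, ax) for ax in (0, 1) for s in (-1, 1))
    return 8.0 * phi - 2.0 * nn_sum + 2.0 * m2 * phi + 4.0 * lam * phi**3

def hmc_trajectory(phi, m2, lam, eps=0.1, n_steps=10, rng=np.random):
    """One HMC update: leapfrog with step size eps and trajectory length eps*n_steps = 1,
    followed by a Metropolis accept/reject step."""
    pi = rng.standard_normal(phi.shape)
    H_old = 0.5 * np.sum(pi**2) + action(phi, m2, lam)
    phi_new, pi_new = phi.copy(), pi.copy()
    pi_new -= 0.5 * eps * grad_action(phi_new, m2, lam)   # half step for momentum
    for _ in range(n_steps - 1):
        phi_new += eps * pi_new
        pi_new -= eps * grad_action(phi_new, m2, lam)
    phi_new += eps * pi_new
    pi_new -= 0.5 * eps * grad_action(phi_new, m2, lam)   # final half step
    H_new = 0.5 * np.sum(pi_new**2) + action(phi_new, m2, lam)
    if rng.uniform() < np.exp(H_old - H_new):
        return phi_new, True
    return phi, False

phi = np.random.randn(8, 8)
for _ in range(100):
    phi, accepted = hmc_trajectory(phi, m2=-4.0, lam=3.5)
```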

4.2 C-NF Model Architecture

The affine coupling block displayed in Figure 2 is the fundamental building component of our C-NF model. All neural networks s and t share the same architecture, depicted in Figure 3. The neural networks employ only fully convolutional layers. For the first two layers we use 64 filters for interpolation and 32 for extrapolation, all of size 3\times 3, and each convolutional layer uses the Tanh activation function. We use 8 such affine blocks for both interpolation and extrapolation. In each of the 8 affine blocks, the conditional parameter c is concatenated with the inputs to s and t as shown in Figure 2.

Figure 3: Architecture of the neural networks s and t of the i-th affine block.
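A possible PyTorch realization of the s/t subnetwork described above is sketched below; feeding the conditional \lambda as an extra constant input channel is an assumption on our part, and all names are illustrative.

```python
import torch
import torch.nn as nn

class CouplingSubnet(nn.Module):
    """Fully convolutional s/t network (illustrative sketch): two 3x3 convolutional
    layers with Tanh activations, followed by a 3x3 output layer. The channel count
    (64 for interpolation, 32 for extrapolation) follows the text; appending lambda
    as a constant channel is an assumption."""
    def __init__(self, in_channels=2, filters=64):   # field input + lambda channel
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, filters, kernel_size=3, padding=1), nn.Tanh(),
            nn.Conv2d(filters, filters, kernel_size=3, padding=1), nn.Tanh(),
            nn.Conv2d(filters, 1, kernel_size=3, padding=1))

    def forward(self, phi_half, lam):
        # broadcast the scalar condition lambda to a constant 8x8 channel
        cond = lam.view(-1, 1, 1, 1).expand_as(phi_half)
        return self.net(torch.cat([phi_half, cond], dim=1))

net = CouplingSubnet()
out = net(torch.randn(4, 1, 8, 8), torch.full((4,), 4.6))
print(out.shape)   # torch.Size([4, 1, 8, 8])
```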

4.3 Training and Sampling Procedure

The loss function used for training the C-NF model is the forward KL divergence, as in Section 3, between the model distribution p_{m}(\phi;\lambda,\theta) and the target distribution p_{t}(\phi;\lambda):

\mathcal{L}(\theta)=\int d\phi\, p_{t}(\phi;\lambda)\big(\log[p_{t}(\phi;\lambda)]-\log[p_{m}(\phi;\lambda,\theta)]\big)
=E_{\phi\sim p_{t}}\big(\log[p_{t}(\phi;\lambda)]-\log[p_{m}(\phi;\lambda,\theta)]\big) (12)
where \log[p_{t}(\phi;\lambda)]=-S(\phi|\lambda)-\log Z

The expectation E_{\phi\sim p_{t}} is evaluated using HMC samples from the non-critical region. During training, we keep the learning rate at 0.0003 and employ the Adam optimizer.
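Since the \log Z term is independent of \theta, minimizing Eq. (12) over the HMC samples is equivalent to maximum-likelihood training. A hedged sketch of this objective is shown below, assuming a `flow` object that exposes an `inverse(phi, lam)` method returning z and \log|\det J| (as in the coupling-block sketch of Section 3); all names are illustrative.

```python
import math
import torch

def nll_loss(flow, phi_batch, lam_batch):
    """Forward-KL objective of Eq. (12) up to a theta-independent constant:
    the negative log-likelihood -E_{phi~p_t}[log p_m(phi; lam, theta)]."""
    z, log_det = flow.inverse(phi_batch, lam_batch)          # phi -> z, log|det dz/dphi|
    log_prior = -0.5 * (z**2).sum(dim=1) - 0.5 * z.shape[1] * math.log(2 * math.pi)
    log_pm = log_prior + log_det                              # log p_m(phi; lam, theta)
    return -log_pm.mean()

# sketch of the optimization loop (flow and loader are assumed to exist)
# optimizer = torch.optim.Adam(flow.parameters(), lr=3e-4)
# for phi_batch, lam_batch in loader:
#     loss = nll_loss(flow, phi_batch, lam_batch)
#     optimizer.zero_grad(); loss.backward(); optimizer.step()
```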

Once training is complete, we invert the C-NF model; we call the inverted model the proposal model. The inputs to the proposal model are i) lattices drawn from the normal distribution \mathcal{N}(0,1) and ii) \lambda as a conditional parameter for sample generation. The outputs are lattice configurations and the probability density of each configuration for the given \lambda. For the critical region, we give critical \lambda values as the conditional parameter to the proposal model. We consider two scenarios in which the C-NF model is either interpolated or extrapolated to the critical region; both require different training, but the sampling technique is the same. The samples from the interpolated/extrapolated model may not exactly follow the true distribution of the lattice theory p_{t}(\phi|\lambda). We therefore use this model as the proposal for an independent MH algorithm, which generates a Markov chain with asymptotic convergence to the true distribution.
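A sketch of the independent MH step with the C-NF proposal is given below; the `flow.forward` interface, the flattened 64-dimensional lattice, and all names are assumptions for illustration only.

```python
import numpy as np
import torch

def lattice_action_t(phi, lam, m2=-4.0):
    """phi^4 lattice action of Eq. (2) for a (1, 8, 8)-shaped torch tensor."""
    S = torch.sum(m2 * phi**2 + lam * phi**4)
    for ax in (1, 2):
        S += torch.sum(2 * phi**2 - phi * torch.roll(phi, -1, ax) - phi * torch.roll(phi, 1, ax))
    return S

def log_normal(z):
    return (-0.5 * z**2 - 0.5 * np.log(2 * np.pi)).sum()

def independent_mh(flow, lam, n_samples):
    """Independent Metropolis-Hastings with the C-NF proposal: accept phi' with
    probability min(1, p(phi') q(phi) / (p(phi) q(phi'))), q being the model density."""
    lam_t = torch.full((1, 1), lam)
    def propose():
        z = torch.randn(1, 64)
        phi, log_det = flow.forward(z, lam_t)           # z -> phi, log|det dphi/dz|
        log_q = log_normal(z) - log_det.squeeze()       # model log-density of phi
        log_p = -lattice_action_t(phi.view(1, 8, 8), lam)
        return phi, log_q, log_p
    phi, log_q, log_p = propose()
    chain, n_acc = [], 0
    for _ in range(n_samples):
        phi_new, log_q_new, log_p_new = propose()
        log_alpha = (log_p_new - log_p) + (log_q - log_q_new)
        if torch.rand(()) < torch.exp(log_alpha):
            phi, log_q, log_p = phi_new, log_q_new, log_p_new
            n_acc += 1
        chain.append(phi.detach().view(8, 8).numpy())
    return chain, n_acc / n_samples
```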

4.4 Interpolation to the Critical Region

The \lambda set used to train the C-NF model for interpolation is \{3,3.2,3.5,3.6,3.7,3.8,5.8,6,6.5,7,8,9\}. During training, we bypass the critical region [4.1-5.0] so that we can interpolate the model where the autocorrelation time is large for HMC simulation. We interpolate the trained model to multiple \lambda values in the critical region (4.1, 4.2, 4.25, 4.3, 4.35, 4.4, 4.45, 4.5, 4.55, 4.6, 4.65, 4.7, 4.8, 5.0). For each \lambda value we generate one ensemble of 10^{5} configurations with the proposed method. On each ensemble we calculate different observables using the bootstrap re-sampling method.
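The binned bootstrap used for the error estimates can be sketched as follows (bin size 100, as in the figure captions); the function names and the stand-in data are illustrative.

```python
import numpy as np

def binned_bootstrap(values, bin_size=100, n_boot=1000, seed=0):
    """Bootstrap error estimate with binning: per-configuration measurements are
    first averaged in bins of `bin_size`, then bins are resampled with replacement."""
    rng = np.random.default_rng(seed)
    n_bins = len(values) // bin_size
    bins = values[:n_bins * bin_size].reshape(n_bins, bin_size).mean(axis=1)
    boot_means = np.array([
        rng.choice(bins, size=n_bins, replace=True).mean() for _ in range(n_boot)])
    return bins.mean(), boot_means.std()

# example with a stand-in observable series, e.g. phi_tilde^2 per configuration
obs = np.random.randn(10_000) * 0.02 + 0.12
mean, err = binned_bootstrap(obs)
print(f"{mean:.4f} +/- {err:.4f}")
```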

(a) \langle\tilde{\phi}^{2}\rangle
(b) \chi
Figure 4: Interpolation to the critical region: \langle\tilde{\phi}^{2}\rangle and \chi calculated on samples generated from i) HMC, ii) C-NF followed by MH, and iii) naive C-NF. Error bars indicate the standard deviation calculated using bootstrap re-sampling with bin size 100.

We compare the observables from the HMC simulation and from our proposed method. Observables from the naive C-NF without MH are also shown to demonstrate the C-NF model's proximity to the true distribution. In Figure 4 we plot the two observables \langle\tilde{\phi}^{2}\rangle and \chi for the interpolated \lambda values. Although the naive C-NF model has biases, MH eliminates them, and both observables match well within the statistical uncertainty. In the Appendix, we present a table of numerical values of the observables with errors.

(a) \lambda=4.3
(b) \lambda=4.5
(c) \lambda=4.55
(d) \lambda=4.70
Figure 5: Interpolation: zero-momentum correlation function calculated on samples generated from i) HMC, ii) C-NF followed by MH, and iii) naive C-NF. Error bars indicate the 95\% confidence interval calculated using the bootstrap re-sampling method with bin size 100.

In Figure 5, the two-point zero-momentum correlation function C(t) is shown for four different \lambda values. For these \lambda values the correlation function is non-vanishing, which is a characteristic property of the critical region. The plots for the other critical \lambda values are included in the Appendix.

In Figure 6 we also show the histogram of \tilde{\phi} for a particular critical value \lambda=4.6. This demonstrates that MH can eliminate this kind of artefact created by the C-NF model.

We have displayed plots of various observables in both phases around the critical point for the interpolation case. We find that the observables estimated using our technique and HMC match quite well.

Figure 6: Histogram of \tilde{\phi} for \lambda=4.6 from a) the naive C-NF model and b) C-NF with MH, compared against HMC results with bin size 100.

4.5 Extrapolation to the Critical Region

The \lambda set used to train the C-NF model for extrapolation is \{3,3.1,3.2,3.3,3.4,3.5,3.6,3.7,3.8,3.9\}. After training the model in the broken phase, we extrapolate it into the critical region around \lambda=4.6, which we take to be the critical region's midpoint. The model is extrapolated to five distinct \lambda values: [4.2, 4.3, 4.4, 4.5, 4.6]. For each \lambda value we generate one ensemble of 10^{5} configurations with our method. Again, we calculate several observables for each ensemble using the bootstrap re-sampling method and compare them to the HMC results. We find that the size of the training dataset needs to be increased for extrapolation in order to obtain a C-NF model that is close to the true distribution in the broken phase. We plot the observables \langle\tilde{\phi}^{2}\rangle and \chi in Figure 7, and the zero-momentum correlation function is plotted in Figure 8 for four different \lambda values in the critical region.

(a) \langle\tilde{\phi}^{2}\rangle
(b) \chi
Figure 7: Extrapolation to the critical region: \langle\tilde{\phi}^{2}\rangle and \chi calculated on samples generated from i) HMC, ii) C-NF followed by MH, and iii) naive C-NF. Error bars indicate the standard deviation calculated using bootstrap re-sampling with bin size 100.
(a) \lambda=4.20
(b) \lambda=4.30
(c) \lambda=4.40
(d) \lambda=4.45
Figure 8: Extrapolation: zero-momentum correlation function calculated on samples generated from i) HMC, ii) C-NF followed by MH, and iii) naive C-NF. Error bars indicate the 95\% confidence interval calculated using the bootstrap re-sampling method with bin size 100.

The naive C-NF model produces configurations that are inherently uncorrelated. Since we obtain a Markov chain after applying MH, the same cannot be guaranteed for the accepted samples. With a 25-40\% acceptance rate from the model, we observe negligible correlation between the samples. The integrated autocorrelation time for \chi from our proposed approach and from HMC is plotted in Figure 9. It shows that we successfully reduce the autocorrelation time of the Markov chain. The comparison is not absolute, because HMC is affected by algorithmic settings, but it shows that for the critical region there is almost no correlation between samples in the Markov chains obtained from both interpolation and extrapolation.

Figure 9: Integrated autocorrelation time calculated on samples generated from HMC and from the C-NF model with MH. The straight line represents the autocorrelation time for uncorrelated samples.
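For completeness, a simple estimator of the integrated autocorrelation time along a Markov chain is sketched below, using the normalization in which uncorrelated samples give \tau_{int}\approx 1, consistent with Tables 2 and 3; the truncation of the sum is our own simplification.

```python
import numpy as np

def integrated_autocorr_time(obs, max_lag=None):
    """tau_int = 1 + 2 * sum_t rho(t), truncated when the autocorrelation rho(t)
    becomes non-positive; uncorrelated samples give tau_int close to 1."""
    obs = np.asarray(obs, dtype=float)
    obs = obs - obs.mean()
    n = len(obs)
    max_lag = max_lag or n // 2
    var = np.mean(obs**2)
    tau = 1.0
    for t in range(1, max_lag):
        rho = np.mean(obs[:n - t] * obs[t:]) / var
        if rho <= 0:               # stop once the noise dominates
            break
        tau += 2.0 * rho
    return tau

chain = np.random.randn(100_000)   # stand-in for an observable along the Markov chain
print(integrated_autocorr_time(chain))
```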

5  Cost Analysis

The sampling algorithms for the baseline approach (HMC) and the proposed method are vastly different; thus, a direct cost comparison is not straightforward. Nonetheless, we separate the simulation cost of the proposed technique into two components: training time and sample-generation time. On a Colab Tesla P100 GPU, the training time for the C-NF model for interpolation or extrapolation is roughly 5-6 hours. However, sample generation is very fast for the C-NF model: generating one Markov chain of 10^{5} configurations takes 5-7 minutes with a 25-40\% acceptance rate. Because of the short generation time, a low acceptance rate is acceptable as long as the autocorrelation time does not increase.

Once the C-NF model has been trained, it can be employed repeatedly to generate configurations for a wide range of \lambda values. From a single training of the interpolated C-NF model, we have generated configurations for 13 values of \lambda in the critical region. Thus, our approach outperforms HMC for sampling at multiple \lambda values in the critical region.

6  Conclusion

The critical slowing down problem prevents generating a large ensemble in the critical region of a lattice theory. To resolve this, we employ a Conditional Normalizing Flow trained on HMC samples with low autocorrelation and use it to generate samples in the critical region. To learn a general distribution over the parameter \lambda, we train the C-NF model away from the critical point of lattice \phi^{4} theory. This model is interpolated into the critical region and serves as a proposal for the MH algorithm to generate a Markov chain. The degree to which the extrapolated or interpolated model resembles the true distribution determines the acceptance rate in the critical region; to achieve a high acceptance rate and prevent the build-up of autocorrelation, the C-NF model must be trained adequately. The C-NF model generates uncorrelated samples, and we trained it well enough to obtain a 25-45\% acceptance rate. At this acceptance rate, we found no correlation between configurations in the Markov chain. As a result, our method significantly mitigates the critical slowing down problem. Moreover, our method can be highly efficient when interpolation/extrapolation to numerous \lambda values in the critical region is needed. Since lattice gauge theory requires extrapolation to the critical region, we have likewise extrapolated \phi^{4} lattice theory to the critical region. We observe good agreement between observables estimated using the proposed technique and HMC simulation for both interpolation and extrapolation.

References

  • [1] Simon Duane, Anthony D Kennedy, Brian J Pendleton, and Duncan Roweth. Hybrid monte carlo. Physics letters B, 195(2):216–222, 1987.
  • [2] Ulli Wolff. Critical slowing down. Nucl. Phys. B Proc. Suppl., 17:93–102, 1990.
  • [3] Stefan Schaefer, Rainer Sommer, and Francesco Virotta. Critical slowing down and error analysis in lattice QCD simulations. Nucl. Phys. B, 845:93–119, 2011.
  • [4] Alberto Ramos. Playing with the kinetic term in the HMC. PoS, LATTICE2012:193, 2012.
  • [5] Arjun Singh Gambhir and Kostas Orginos. Improved sampling algorithms in lattice qcd, 2015.
  • [6] Michael G. Endres, Richard C. Brower, William Detmold, Kostas Orginos, and Andrew V. Pochinsky. Multiscale monte carlo equilibration: Pure yang-mills theory. Physical Review D, 92(11), Dec 2015.
  • [7] Kai Zhou, Gergely Endrődi, Long-Gang Pang, and Horst Stöcker. Regressive and generative neural networks for scalar field theory. Phys. Rev. D, 100:011501, Jul 2019.
  • [8] Dian Wu, Lei Wang, and Pan Zhang. Solving statistical mechanics using variational autoregressive networks. Phys. Rev. Lett., 122:080602, Feb 2019.
  • [9] Jan M Pawlowski and Julian M Urban. Reducing autocorrelation times in lattice simulations with generative adversarial networks. Machine Learning: Science and Technology, 1(4):045011, oct 2020.
  • [10] Kim A. Nicoli, Shinichi Nakajima, Nils Strodthoff, Wojciech Samek, Klaus-Robert Müller, and Pan Kessel. Asymptotically unbiased estimation of physical observables with neural samplers. Physical Review E, 101(2), feb 2020.
  • [11] Juan Carrasquilla. Machine learning for quantum matter. Advances in Physics: X, 5(1):1797528, jan 2020.
  • [12] Junwei Liu, Huitao Shen, Yang Qi, Zi Yang Meng, and Liang Fu. Self-learning monte carlo method and cumulative update in fermion systems. Physical Review B, 95(24), jun 2017.
  • [13] Johanna Vielhaben and Nils Strodthoff. Generative neural samplers for the quantum heisenberg chain. Physical Review E, 103(6), jun 2021.
  • [14] Chuang Chen, Xiao Yan Xu, Junwei Liu, George Batrouni, Richard Scalettar, and Zi Yang Meng. Symmetry-enforced self-learning monte carlo method applied to the holstein model. Physical Review B, 98(4), jul 2018.
  • [15] Giacomo Torlai and Roger G. Melko. Learning thermodynamics with boltzmann machines. Phys. Rev. B, 94:165134, Oct 2016.
  • [16] G. Carleo and M. Troyer. Solving the quantum many-body problem with artificial neural networks. Science, 355:602, 2017.
  • [17] Lei Wang. Discovering phase transitions with unsupervised learning. Phys. Rev. B, 94:195105, Nov 2016.
  • [18] Pengfei Zhang, Huitao Shen, and Hui Zhai. Machine learning topological invariants with neural networks. Phys. Rev. Lett., 120:066401, Feb 2018.
  • [19] Japneet Singh, Mathias Scheurer, and Vipul Arora. Conditional generative models for sampling and phase transition indication in spin systems. SciPost Physics, 11(2), Aug 2021.
  • [20] Ankur Singha, Dipankar Chakrabarti, and Vipul Arora. Generative learning for the problem of critical slowing down in lattice Gross Neveu model. arXiv: 2111.00574, 2021.
  • [21] M. S. Albergo, G. Kanwar, and P. E. Shanahan. Flow-based generative models for markov chain monte carlo in lattice field theory. Phys. Rev. D, 100:034515, Aug 2019.
  • [22] Michael S. Albergo, Gurtej Kanwar, Sébastien Racanière, Danilo J. Rezende, Julian M. Urban, Denis Boyda, Kyle Cranmer, Daniel C. Hackett, and Phiala E. Shanahan. Flow-based sampling for fermionic lattice field theories. 2021.
  • [23] Phiala E Shanahan, Daniel Trewartha, and William Detmold. Machine learning action parameters in lattice quantum chromodynamics. Physical Review D, 97(9):094506, 2018.
  • [24] Gurtej Kanwar, Michael S. Albergo, Denis Boyda, Kyle Cranmer, Daniel C. Hackett, Sébastien Racanière, Danilo Jimenez Rezende, and Phiala E. Shanahan. Equivariant flow-based sampling for lattice gauge theory. Physical Review Letters, 125(12), Sep 2020.
  • [25] Michael S. Albergo, Denis Boyda, Daniel C. Hackett, Gurtej Kanwar, Kyle Cranmer, Sébastien Racanière, Danilo Jimenez Rezende, and Phiala E. Shanahan. Introduction to normalizing flows for lattice field theory, 2021.
  • [26] Michael S. Albergo, Denis Boyda, Kyle Cranmer, Daniel C. Hackett, Gurtej Kanwar, Sébastien Racanière, Danilo J. Rezende, Fernando Romero-López, Phiala E. Shanahan, and Julian M. Urban. Flow-based sampling in the lattice Schwinger model at criticality. 2022.
  • [27] Daniel C. Hackett, Chung-Chun Hsieh, Michael S. Albergo, Denis Boyda, Jiunn-Wei Chen, Kai-Feng Chen, Kyle Cranmer, Gurtej Kanwar, and Phiala E. Shanahan. Flow-based sampling for multimodal distributions in lattice field theory. 2021.
  • [28] Ingmar Vierhaus. Simulation of phi 4 theory in the strong coupling expansion beyond the ising limit. Master’s thesis, Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät I, 2010.
  • [29] Danilo Jimenez Rezende and Shakir Mohamed. Variational inference with normalizing flows, 2015.
  • [30] Lynton Ardizzone, Carsten Lüth, Jakob Kruse, Carsten Rother, and Ullrich Köthe. Guided image generation with conditional invertible neural networks, 2019.
  • [31] Lynton Ardizzone, Carsten Lüth, Jakob Kruse, Carsten Rother, and Ullrich Köthe. Guided image generation with conditional invertible neural networks. CoRR, abs/1907.02392, 2019.

7   Appendix

\lambda | \langle\tilde{\phi}^{2}\rangle (HMC / C-NF with MH / Naive C-NF) | \chi (HMC / C-NF with MH / Naive C-NF)
4.10 | 0.2147 ± 0.0179 | 0.2169 ± 0.0169 | 0.1586 ± 0.0205 | 13.7446 ± 0.8059 | 13.7446 ± 0.8059 | 10.1557 ± 0.9181
4.20 | 0.1925 ± 0.0181 | 0.1970 ± 0.0173 | 0.1463 ± 0.0199 | 12.3272 ± 0.8103 | 12.6235 ± 0.7822 | 9.3694 ± 0.8844
4.25 | 0.1819 ± 0.0090 | 0.1872 ± 0.0087 | 0.1411 ± 0.0097 | 11.6485 ± 0.5747 | 11.9992 ± 0.5566 | 9.0395 ± 0.6191
4.30 | 0.1724 ± 0.0181 | 0.1781 ± 0.0176 | 0.1348 ± 0.0190 | 11.0331 ± 0.8188 | 11.3775 ± 0.7762 | 8.6260 ± 0.8674
4.35 | 0.1622 ± 0.0177 | 0.1681 ± 0.0174 | 0.1289 ± 0.0184 | 10.3827 ± 0.5648 | 10.7440 ± 0.5640 | 8.2559 ± 0.5894
4.40 | 0.1530 ± 0.0178 | 0.1583 ± 0.0172 | 0.1225 ± 0.0182 | 9.7920 ± 0.5529 | 10.1284 ± 0.5541 | 7.8501 ± 0.5867
4.45 | 0.1446 ± 0.0173 | 0.1490 ± 0.0170 | 0.1166 ± 0.0176 | 9.2557 ± 0.5440 | 9.5575 ± 0.5516 | 7.4716 ± 0.5637
4.50 | 0.1357 ± 0.0172 | 0.1399 ± 0.0170 | 0.1115 ± 0.0175 | 8.7005 ± 0.5446 | 8.9528 ± 0.5442 | 7.1308 ± 0.5479
4.60 | 0.1197 ± 0.0163 | 0.1227 ± 0.0163 | 0.0996 ± 0.0165 | 7.6629 ± 0.5272 | 7.8601 ± 0.5188 | 6.3753 ± 0.5271
4.65 | 0.1127 ± 0.0157 | 0.1151 ± 0.0162 | 0.0939 ± 0.0159 | 7.1991 ± 0.5041 | 7.3486 ± 0.5126 | 6.0062 ± 0.5109
4.70 | 0.1054 ± 0.0153 | 0.1069 ± 0.0157 | 0.0879 ± 0.0154 | 6.7575 ± 0.4978 | 6.8422 ± 0.5075 | 5.6177 ± 0.4966
4.80 | 0.0931 ± 0.0148 | 0.0931 ± 0.0148 | 0.0769 ± 0.0140 | 5.9530 ± 0.4717 | 5.9569 ± 0.4671 | 4.9223 ± 0.4590
5.00 | 0.0733 ± 0.0128 | 0.0704 ± 0.0128 | 0.0593 ± 0.0121 | 4.6935 ± 0.4131 | 4.4963 ± 0.4180 | 3.7920 ± 0.3886
Table 1: Two observables calculated on samples generated from i) HMC, ii) C-NF with MH and iii) naive C-NF. The errors indicate the standard deviation calculated using bootstrap re-sampling with bin size 100.
\lambda: 4.1 | 4.2 | 4.3 | 4.4 | 4.5 | 4.6 | 4.7 | 4.8 | 4.9 | 5.0
\tau_{int}: 1.033 | 1.084 | 1.071 | 1.059 | 1.048 | 1.039 | 1.041 | 1.018 | 1.030 | 1.037
Table 2: Integrated autocorrelation time for \langle\tilde{\phi}^{2}\rangle from the interpolated C-NF model after applying MH.
\lambda: 4.2 | 4.3 | 4.4 | 4.5 | 4.6
\tau_{int}: 1.046 | 1.049 | 1.027 | 1.032 | 1.010
Table 3: Integrated autocorrelation time for \langle\tilde{\phi}^{2}\rangle from the extrapolated C-NF model after applying MH.
\lambda | \langle\tilde{\phi}^{2}\rangle (HMC / C-NF with MH / Naive C-NF) | \chi (HMC / C-NF with MH / Naive C-NF)
4.20 | 0.1925 ± 0.0181 | 0.1966 ± 0.0170 | 0.1676 ± 0.0189 | 12.3272 ± 0.8103 | 12.5670 ± 0.5490 | 10.7348 ± 0.5989
4.30 | 0.1724 ± 0.0181 | 0.1756 ± 0.0171 | 0.1509 ± 0.0177 | 11.0331 ± 0.8188 | 11.2388 ± 0.5571 | 9.6593 ± 0.5868
4.40 | 0.1530 ± 0.0178 | 0.1562 ± 0.0171 | 0.1348 ± 0.0173 | 9.7920 ± 0.5529 | 9.9943 ± 0.5506 | 8.6316 ± 0.5510
4.50 | 0.1357 ± 0.0172 | 0.1393 ± 0.0167 | 0.1200 ± 0.0164 | 8.7005 ± 0.5446 | 8.9094 ± 0.5293 | 7.6643 ± 0.5342
4.60 | 0.1197 ± 0.0163 | 0.1233 ± 0.0157 | 0.1075 ± 0.0158 | 7.6629 ± 0.5272 | 7.8881 ± 0.5143 | 6.8783 ± 0.4992
Table 4: Two observables calculated on samples generated from i) HMC, ii) C-NF with MH and iii) naive C-NF. The errors indicate the standard deviation calculated using bootstrap re-sampling with bin size 100.
(a) \lambda=4.2
(b) \lambda=4.4
(c) \lambda=4.5
(d) \lambda=4.6
(e) \lambda=4.8
(f) \lambda=5.0
Figure 10: Interpolation: zero-momentum correlation function calculated on samples generated from i) HMC, ii) C-NF followed by MH, and iii) naive C-NF. Error bars indicate the 95\% confidence interval calculated using the bootstrap re-sampling method with bin size 100.