
Gauge invariant input to neural network for path optimization method

Yusuke Namekawa namekawa@yukawa.kyoto-u.ac.jp Yukawa Institute for Theoretical Physics, Kyoto University, Kyoto 606-8502, Japan    Kouji Kashiwa Fukuoka Institute of Technology, Wajiro, Fukuoka 811-0295, Japan    Akira Ohnishi Yukawa Institute for Theoretical Physics, Kyoto University, Kyoto 606-8502, Japan    Hayato Takase
Abstract

We investigate the efficiency of a gauge invariant input to a neural network for the path optimization method. While the path optimization with a completely gauge-fixed link-variable input has successfully tamed the sign problem in a simple gauge theory, the optimization does not work well when the gauge degrees of freedom remain. We propose to employ a gauge invariant input, such as a plaquette, to overcome this problem. The efficiency of the gauge invariant input to the neural network is evaluated for the 2-dimensional $U(1)$ gauge theory with a complex coupling. The average phase factor is significantly enhanced by the path optimization with the plaquette input, indicating good control of the sign problem. This opens the possibility of applying the path optimization to complicated gauge theories, including Quantum Chromodynamics, in a realistic setup.

preprint: YITP-21-101

I Introduction

Exploring the phase structure of gauge theories at finite temperature ($T$) and chemical potential ($\mu$) is an interesting and important subject not only in particle and nuclear physics but also in astrophysics. Quantitative understanding of heavy-ion experiments as well as of the equation of state for neutron stars requires non-perturbative information on Quantum Chromodynamics (QCD) in the $T$-$\mu$ plane. It is, however, a difficult task due to the sign problem. Traditional Monte Carlo approaches work at low density, but fail in the middle and high density regions.

Recently, several new methods have been developed to overcome the sign problem at high densities, e.g., $\mu/T \geq 1$. The complex Langevin method Klauder (1984); Parisi (1983) is a stochastic quantization with complexified variables. It is a non-Monte Carlo approach and thus free from the sign problem. The low computational cost allows us to apply it to 4-dimensional QCD at finite density Sexty (2014); Aarts et al. (2014); Fodor et al. (2015); Nagata et al. (2018); Kogut and Sinclair (2019); Sexty (2019); Scherzer et al. (2020); Ito et al. (2020). The tensor renormalization group method Levin and Nave (2007) is a coarse graining algorithm using a tensor network. It is also a non-Monte Carlo method. Although the computational cost is extremely high, it has been vigorously tested even in 4-dimensional theoretical models Akiyama et al. (2019, 2020, 2021a, 2021b). Recent improved algorithms considerably reduce the cost Kadoh and Nakayama (2019); Kadoh et al. (2021). The Lefschetz thimble method Witten (2011) is a Monte Carlo scheme that complexifies variables and determines the integration path by solving an anti-holomorphic flow equation from fixed points, such that the imaginary part of the action is constant along the path. Cauchy's integral theorem ensures that the integral is independent of the choice of the integration path, provided the path is obtained by continuous deformation from the original path Alexandru et al. (2016), crosses no poles, and the contribution from infinity vanishes. Numerical studies have been carried out with the Langevin algorithm Cristoforetti et al. (2012), the Metropolis algorithm Mukherjee et al. (2013), and the Hybrid Monte Carlo algorithm Fujii et al. (2013). The high computational cost is the main bottleneck of this method, but algorithm development is overcoming it Fukuma and Matsumoto (2020); Fukuma et al. (2021). The path optimization method (POM) Mori et al. (2017, 2018), also referred to as the sign-optimized manifold Alexandru et al. (2018a), is an alternative approach that modifies the integration path by use of machine learning via neural networks. The machine learning finds the best path on which the sign problem is maximally weakened. The POM successfully works for the complex $\lambda\phi^{4}$ theory Mori et al. (2017), the Polyakov-loop extended Nambu–Jona-Lasinio model Kashiwa et al. (2019a, b), the Thirring model Alexandru et al. (2018a, b), the $0+1$ dimensional Bose gas Bursa and Kroyter (2018), $0+1$ dimensional QCD Mori et al. (2019), and the 2-dimensional $U(1)$ gauge theory with a complexified coupling constant Kashiwa and Mori (2020), as well as for noise reduction in observables Detmold et al. (2021). The recent progress of the complexified path approaches is reviewed in Ref. Alexandru et al. (2020).

A key issue of the POM in gauge theories is control of the gauge degrees of freedom. In $0+1$ dimensional QCD at finite density Mori et al. (2019), the POM works both with and without gauge fixing. In higher dimensions, however, gauge fixing is required for the neural network to find an improved path. The effect of gauge fixing is discussed in the 2-dimensional $U(1)$ gauge theory with a complexified coupling constant Kashiwa and Mori (2020). The average phase factor, an indicator of the sign problem, is never improved without gauge fixing, and it improves progressively as the gauge degrees of freedom are reduced by gauge fixing.

Based on this result, we propose to adopt a gauge invariant input for the optimization process. The link variables are no longer the direct input to the neural network. We first construct a gauge invariant quantity and use it as the input. We employ the simplest gauge invariant input, the plaquette, in this study. A similar idea is employed as a part of lattice gauge equivariant convolutional neural networks Favoni et al. (2020). The performance of the POM with the gauge invariant input is demonstrated in the 2-dimensional $U(1)$ gauge theory with a complex coupling. The sign problem originates from the imaginary part of the complex coupling. Several of the above-mentioned methods have been tested for this theory Kashiwa and Mori (2020); Pawlowski et al. (2021). Since the analytic result is available Wiese (1989); Rusakov (1990); Bonati and Rossi (2019), we can utilize it for verification of the simulation results.

This paper is organized as follows. In the next section, we explain the formulation of the 2-dimensional lattice $U(1)$ gauge theory and the path optimization method. Our numerical setup and results are presented in Sec. III. Section IV summarizes this paper.

II Formulation

II.1 2-dimensional $U(1)$ gauge action

The gauge action is Wilson’s plaquette action Wilson (1974) given by

$$S_{\rm G} = -\frac{\beta}{2}\sum_{n}\left(P_{n,12}+P_{n,12}^{-1}\right), \qquad (1)$$

where $n$ represents the lattice site, and $\beta=1/(ga)^{2}$ is an overall constant consisting of the gauge coupling constant $g$ and the lattice spacing $a$. $P$ ($P^{-1}$) is the plaquette (its inverse). The definition is

$$P_{n,12} := U_{n,1}\,U_{n+\hat{1},2}\,U^{-1}_{n+\hat{2},1}\,U^{-1}_{n,2}, \qquad (2)$$

where $\hat{\mu}$ is a unit vector in the $\mu$-direction and $U_{n,\mu}$ with $\mu=1,2$ are the $U(1)$ link variables. We impose a periodic boundary condition in each direction.

In addition to the plaquette, we measure expectation values of the topological charge $Q$ defined on the lattice,

$$Q := -\frac{i}{4\pi}\sum_{n}\left(P_{n,12}-P_{n,12}^{-1}\right). \qquad (3)$$

In the continuum limit, $Q$ recovers the continuum form $Q=(1/4\pi)\int d^{2}x\,\epsilon^{\mu\nu}F_{\mu\nu}(x)$.
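For concreteness, Eqs. (1)-(3) can be evaluated directly on a stored configuration. The following is a minimal Python/numpy sketch, assuming the link variables are stored as complex phases in an array U[mu, n1, n2] with periodic boundaries; the array layout and function names are our own choices.

```python
import numpy as np

def plaquettes(U):
    """P_{n,12} = U_{n,1} U_{n+1hat,2} U^{-1}_{n+2hat,1} U^{-1}_{n,2}, Eq. (2).

    U has shape (2, N1, N2): U[mu, n1, n2] is the U(1) phase on the link.
    Periodic boundary conditions are implemented with np.roll.
    """
    U1, U2 = U[0], U[1]
    return U1 * np.roll(U2, -1, axis=0) / np.roll(U1, -1, axis=1) / U2

def gauge_action(U, beta):
    """Wilson plaquette action, Eq. (1)."""
    P = plaquettes(U)
    return -0.5 * beta * np.sum(P + 1.0 / P)

def topological_charge(U):
    """Lattice topological charge, Eq. (3); real up to rounding for U(1) links."""
    P = plaquettes(U)
    return -1j / (4.0 * np.pi) * np.sum(P - 1.0 / P)

# Example: a random U(1) configuration on a 4x4 lattice
rng = np.random.default_rng(0)
U = np.exp(1j * rng.uniform(-np.pi, np.pi, size=(2, 4, 4)))
print(gauge_action(U, beta=0.0 + 0.5j), topological_charge(U))
```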

The analytic result of this theory has been obtained Wiese (1989); Rusakov (1990); Bonati and Rossi (2019). The partition function $Z$ is represented through the modified Bessel function $I_{n}(\beta)$,

$$Z = \sum_{n=-\infty}^{+\infty} I_{n}(\beta)^{V}, \qquad (4)$$
$$I_{n}(\beta) := \frac{1}{2\pi}\int_{-\pi}^{\pi} d\phi\, e^{\beta\cos\phi - in\phi}, \qquad (5)$$

where $V=N_{1}N_{2}$ is the volume factor with $N_{\mu}$ being the lattice size in the $\mu$-direction. Since $I_{n}(\beta)$ is well-defined for all complex values of $\beta$, the analytic solution is available over the whole domain of $\beta$.
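Since Eqs. (4) and (5) involve only a one-dimensional integral and a rapidly converging sum, the analytic result is straightforward to tabulate numerically. A minimal sketch follows; the integration grid size and the truncation $|n| \le n_{\max}$ are our choices, and the sum converges quickly because $I_{n}(\beta)$ falls off rapidly with $|n|$.

```python
import numpy as np

def bessel_I(n, beta, n_phi=4096):
    """I_n(beta) of Eq. (5) by trapezoidal integration; valid for complex beta."""
    phi = np.linspace(-np.pi, np.pi, n_phi, endpoint=False)
    # mean over a uniform periodic grid approximates (1/2pi) * integral
    return np.mean(np.exp(beta * np.cos(phi) - 1j * n * phi))

def partition_function(beta, V, n_max=30):
    """Z = sum_n I_n(beta)^V, Eq. (4), truncated at |n| <= n_max."""
    return sum(bessel_I(n, beta) ** V for n in range(-n_max, n_max + 1))

# Example: Z at beta = 0.0 + 0.5i on a 4x4 lattice (V = 16)
print(partition_function(0.0 + 0.5j, V=16))
```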

Figure 1: Schematic picture of the neural network with the plaquette input. The link variable $U_{n,\mu}$ is converted to the plaquette $P_{n,\mu\nu}$ in the input layer. $h_{i}$ and $y_{i}$ are the hidden-layer and output-layer variables, respectively.

II.2 Path optimization method with plaquette and link input

The path optimization method utilizes complexified dynamical variables to tame the sign problem. In the case of the $U(1)$ gauge theory, the plaquette and the link variable are extended as

$${\cal P}_{n,12} = {\cal U}_{n,1}\,{\cal U}_{n+\hat{1},2}\,{\cal U}^{-1}_{n+\hat{2},1}\,{\cal U}^{-1}_{n,2}, \qquad (6)$$
$${\cal U}_{n,\mu} = e^{ig{\cal A}_{\mu}(n+\hat{\mu}/2)} =: U_{n,\mu}\,e^{-y_{n}}, \qquad (7)$$

where ${\cal A}_{\mu}\in\mathbb{C}$. The modification of the integration path is represented by $y_{n}\in\mathbb{R}$, originating from the imaginary part of ${\cal A}_{\mu}$, which is evaluated by a neural network consisting of an input layer, a single hidden layer, and an output layer, as follows. The neural network is a mathematical model inspired by the brain, which is often used for machine learning McCulloch and Pitts (1943); Rosenblatt (1958); Hebb (2002); Hinton and Salakhutdinov (2006). A sufficient number of hidden-layer units with a non-linear function called an activation function can reproduce any continuous function, as proven by the universal approximation theorem Cybenko (1989); Hornik (1991). The variables in the hidden layer nodes ($h_{j}$) and the output ($y_{n}$) are set as

$$h_{j} = F\left(w^{(1)}_{ji} t_{i} + b_{j}\right), \qquad (8)$$
$$y_{n} = \omega_{n} F\left(w^{(2)}_{nj} h_{j} + b_{j}\right), \qquad (9)$$

where $i,j=1,\cdots,2\times n_{\mathrm{deg}}$ with $n_{\mathrm{deg}}$ the number of degrees of freedom, and $w$, $b$, and $\omega$ are parameters of the neural network. The activation function $F$ defines the relation of data between two layers, the input and hidden layers as well as the hidden and output layers. It is taken to be the hyperbolic tangent in this work. Our choice of the input $t=\{t_{i}\}$ is the plaquette or the link variable:

$$({\rm i})\quad t = {\rm Re}\,P,\ {\rm Im}\,P, \qquad (10)$$
$$({\rm ii})\quad t = {\rm Re}\,U,\ {\rm Im}\,U. \qquad (11)$$

We compare (i) with (ii) in terms of the efficiency of the neural network. A schematic picture of our neural network with the plaquette input is given in Fig. 1.
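In the plaquette case (i), the forward pass of Eqs. (8) and (9) takes a simple form. A minimal sketch, reusing the `plaquettes` helper from the sketch in Sec. II.1; the weight shapes and function name are our own conventions.

```python
import numpy as np

def network_output(U, w1, b1, w2, b2, omega):
    """Forward pass of Eqs. (8)-(9) with the plaquette input (i).

    Shapes: w1 (N_hidden, N_input), b1 (N_hidden,),
            w2 (N_output, N_hidden), b2 (N_output,), omega (N_output,).
    """
    P = plaquettes(U).flatten()            # gauge invariant input, Eq. (2)
    t = np.concatenate([P.real, P.imag])   # t = (Re P, Im P), Eq. (10)
    h = np.tanh(w1 @ t + b1)               # hidden layer, Eq. (8), F = tanh
    return omega * np.tanh(w2 @ h + b2)    # output y_n, Eq. (9)
```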

The expectation value of a complexified observable $\mathcal{O}$ is calculated by

$$\langle{\mathcal{O}}\rangle := \frac{1}{Z}\int\mathcal{D}U\,{\mathcal{O}}(U)\,e^{-S_{\rm G}(U)} = \frac{1}{Z}\int_{\mathcal{C}}\mathcal{D}\mathcal{U}\,{\mathcal{O}}(\mathcal{U})\,e^{-S_{\rm G}(\mathcal{U})} = \frac{1}{Z}\int\mathcal{D}U\left[{\mathcal{O}}\,e^{i\theta}\,|J\,e^{-S_{\rm G}}|\right]_{\mathcal{U}\in\mathcal{C}} = \frac{\langle{\mathcal{O}}\,e^{i\theta}\rangle_{\mathrm{pq}}}{\langle e^{i\theta}\rangle_{\mathrm{pq}}}, \qquad \langle{\mathcal{O}}\rangle_{\mathrm{pq}} := \frac{1}{Z}\int\mathcal{D}U\left[{\mathcal{O}}\,|J\,e^{-S_{\rm G}}|\right]_{\mathcal{U}\in\mathcal{C}}, \qquad (12)$$

where $\mathcal{C}$ is the integration path that specifies the complexified link variables $\mathcal{U}=\mathcal{U}(U)$, and $J(\mathcal{U}(U))=\det(\partial\mathcal{U}/\partial U)$ is the Jacobian. $\theta$ is the phase of $J\,e^{-S_{\rm G}} = e^{i\theta}\,|J\,e^{-S_{\rm G}}|$. $\langle\cdots\rangle_{\mathrm{pq}}$ denotes the expectation value with the phase-quenched Boltzmann weight. In contrast to the naive reweighting, Eq. (12) is evaluated on the modified integration path, on which the sign problem is maximally weakened by the machine learning, while Cauchy's theorem guarantees that the expectation value is unchanged. This is the advantage of the path optimization method.
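In a simulation, Eq. (12) reduces to a ratio of sample averages over configurations generated with the phase-quenched weight $|J\,e^{-S_{\rm G}}|$. A minimal sketch, with function and argument names our own:

```python
import numpy as np

def reweighted_expectation(O_samples, J_times_boltzmann):
    """Eq. (12): <O> = <O e^{i theta}>_pq / <e^{i theta}>_pq.

    O_samples         : O evaluated on the optimized path, one entry per configuration
    J_times_boltzmann : complex J e^{-S_G} per configuration; only its phase
                        e^{i theta} enters, since sampling already uses |J e^{-S_G}|
    """
    phase = J_times_boltzmann / np.abs(J_times_boltzmann)  # e^{i theta}
    return np.mean(O_samples * phase) / np.mean(phase)
```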

The cost function controls the optimization through the neural network. We apply the following cost function ${\cal F}$ to minimize the sign problem,

$${\cal F}[y(t)] = |Z|\left(\left|\langle e^{i\theta(t)}\rangle_{\mathrm{pq}}\right|^{-1} - 1\right). \qquad (13)$$

We evaluate it by the exponential moving average (EMA) as in Ref. Kashiwa and Mori (2020). Using this cost function (13), the neural network finds the best path that enhances $|\langle e^{i\theta(t)}\rangle_{\mathrm{pq}}|$ as much as possible.
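A sketch of one evaluation step follows. The EMA weight $\alpha$ and the batch interface are our own choices (Ref. Kashiwa and Mori (2020) specifies the EMA itself), and since $|Z|$ is path-independent it only rescales the cost and is set to 1 here.

```python
import numpy as np

def cost_and_ema(thetas, ema_prev, alpha=0.05, absZ=1.0):
    """Cost function of Eq. (13), with <e^{i theta}>_pq tracked by an EMA.

    thetas   : phases theta of J e^{-S_G} for the current batch
    ema_prev : previous EMA estimate of <e^{i theta}>_pq
    alpha    : EMA weight (our choice); absZ only rescales the cost
    """
    batch = np.mean(np.exp(1j * thetas))            # batch average phase factor
    ema = (1.0 - alpha) * ema_prev + alpha * batch  # exponential moving average
    cost = absZ * (1.0 / np.abs(ema) - 1.0)         # Eq. (13)
    return cost, ema
```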

Figure 2: Neural network iteration dependence of the exponential moving average phase factor at $\beta=0.0+0.5i$ on a $4\times 4$ lattice.
Figure 3: Histogram of the phases with and without the path optimization at $\beta=0.0+0.5i$ on $2\times 2$ (upper panel), $4\times 4$ (middle panel), and $8\times 8$ (lower panel) lattices.
Figure 4: Volume dependence of the average phase factor at $\beta=0.0+0.5i$.
Figure 5: The average phase factors with and without the path optimization at $\beta=0.0+(0.25$–$2.25)i$ on $2\times 2$ (upper panel), $4\times 4$ (middle panel), and $8\times 8$ (lower panel) lattices.
Figure 6: Histogram of the phases with the path optimization around $\beta_{c}$ on $2\times 2$ (upper panel), $4\times 4$ (middle panel), and $8\times 8$ (lower panel) lattices.
Figure 7: Expectation values of the imaginary part of the plaquette with and without the path optimization at $\beta=0.0+(0.25$–$2.25)i$ on $2\times 2$ (upper panel), $4\times 4$ (middle panel), and $8\times 8$ (lower panel) lattices.
Figure 8: Topological charge susceptibility with and without the path optimization at $\beta=0.0+(0.25$–$2.25)i$ on $2\times 2$ (upper panel), $4\times 4$ (middle panel), and $8\times 8$ (lower panel) lattices.

III Numerical setup and result

III.1 Setup

We evaluate the performance of the POM for $\beta=0.0+(0.25$–$2.25)i$. The sign problem originates from the imaginary part of $\beta$. We generate gauge configurations by the Hybrid Monte Carlo algorithm in the POM. The total number of configurations is 50000. Statistical errors are estimated by the Jackknife method with a bin size of 250. The neural network utilizes the ADADELTA optimizer Zeiler (2012) combined with the Xavier initialization Glorot and Bengio (2010). The parameters in Eqs. (8) and (9) are optimized during learning. The flow chart is displayed in Ref. Kashiwa and Mori (2020). We set the learning rate to 1 and the decay constant to 0.95, combined with a batch size of 10. The number of units in the input layer reflects our choice of $t$ in Eqs. (10) and (11) as

$$({\rm i})\quad N_{\mathrm{input}}^{\rm plaq} = 2\,n_{\mathrm{deg}}^{\rm plaq}\ \mbox{for the plaquette input}, \qquad (14)$$
$$({\rm ii})\quad N_{\mathrm{input}}^{\rm link} = 2\,n_{\mathrm{deg}}^{\rm link}\ \mbox{for the link input}, \qquad (15)$$

where $n_{\mathrm{deg}}^{\rm link}=2N_{1}N_{2}$ and $n_{\mathrm{deg}}^{\rm plaq}=N_{1}N_{2}$. The number of units in the output layer is common to (i) and (ii), $N_{\mathrm{output}}=n_{\mathrm{deg}}^{\rm link}$. We employ a single hidden layer with $N_{\mathrm{hidden}}=10$ hidden units on the $2\times 2$ lattice, $16$ on the $4\times 4$ lattice, and $64$ on the $8\times 8$ lattice, respectively. Since we set $N_{\mathrm{hidden}}$ proportional to the volume, the cost of the neural network is $O(n_{\rm deg}^{2})$, while that of the Jacobian is $O(n_{\rm deg}^{3})$.
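For orientation, the stated architecture and optimizer settings can be reproduced in a standard framework. Below is a minimal PyTorch sketch for the $4\times 4$ lattice with the plaquette input; identifying the learning rate and decay constant with Adadelta's `lr` and `rho` arguments, and initializing $\omega_{n}$ to zero (so that the optimization starts from the original real path), are our assumptions, not details taken from the paper.

```python
import torch

class PathNet(torch.nn.Module):
    """Single-hidden-layer network of Eqs. (8)-(9)."""
    def __init__(self, n_in, n_hidden, n_out):
        super().__init__()
        self.l1 = torch.nn.Linear(n_in, n_hidden)
        self.l2 = torch.nn.Linear(n_hidden, n_out)
        self.omega = torch.nn.Parameter(torch.zeros(n_out))  # omega_n of Eq. (9)
        torch.nn.init.xavier_uniform_(self.l1.weight)         # Xavier initialization
        torch.nn.init.xavier_uniform_(self.l2.weight)

    def forward(self, t):
        h = torch.tanh(self.l1(t))                  # Eq. (8), F = tanh
        return self.omega * torch.tanh(self.l2(h))  # Eq. (9)

# 4x4 lattice, plaquette input: N_input = 2 * 16 = 32, N_hidden = 16,
# N_output = n_deg^link = 2 * 16 = 32
net = PathNet(32, 16, 32)
opt = torch.optim.Adadelta(net.parameters(), lr=1.0, rho=0.95)
```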

III.2 Result

Figure 2 exhibits the neural-network iteration dependence of the exponential moving average of the average phase factor $\langle\exp(i\theta)\rangle_{\rm EMA}$ at $\beta=0.0+0.5i$ on a $4\times 4$ lattice as a typical example. The path optimization with the plaquette input successfully enhances $\langle\exp(i\theta)\rangle_{\rm EMA}$, while that with the link variable input does not. Similar behavior is also observed at other values of $\beta$ on the $2\times 2$, $4\times 4$, and $8\times 8$ lattices. Our result verifies the advantage of the gauge invariant input to the neural network for the 2-dimensional $U(1)$ gauge theory with the complex coupling.

The enhancement with the plaquette input is confirmed in the histogram of the phases, shown in Fig. 3. While the naive reweighting gives a broad distribution of the phase factor, the path optimization significantly sharpens the peak structure. We stress that the POM works even on the $8\times 8$ lattice, where the sign problem is more severe. Although the naive reweighting has an almost flat dependence on the phase, the POM can still extract a peak structure around $\theta/\pi\sim -0.2$.

Figure 4 represents the volume dependence of the average phase factor at $\beta=0.0+0.5i$, with additional simulation results on $6\times 6$, $10\times 10$, and $12\times 12$ lattices. The naive reweighting leads to a steep exponential fall-off as a function of the volume; the sign problem becomes extremely severe toward the infinite volume limit. The path optimization evidently makes the volume dependence of the average phase factor much milder. It indicates better control of the sign problem.

We plot the average phase factors with and without the path optimization as functions of $\mathrm{Im}\,\beta$ in Fig. 5. The average phase factors are enhanced by factors of up to 4 on the $2\times 2$, 21 on the $4\times 4$, and 27 on the $8\times 8$ lattices, respectively. The enhancement decreases, however, as we approach the critical coupling $\mathrm{Im}\,\beta_{c}\sim 1.5$, where the partition function becomes zero, corresponding to a Lee-Yang zero. While clear peaks are still visible in the histogram of the phases with the path optimization even at $\beta\sim\beta_{c}$ on the $2\times 2$ lattice, no clear peak is obtained and the neural network eventually becomes unstable around $\beta_{c}$ on the $4\times 4$ and $8\times 8$ lattices, as displayed in Fig. 6. We need further improvement of the POM around $\beta_{c}$ on large lattices.

Comparison with the exact solution (4) is performed for the expectation values of the plaquette and the topological charge. The result for the plaquette is plotted in Fig. 7 and that for the topological charge in Fig. 8. The naive reweighting works in the small $\beta$ region but starts to deviate from the exact value with uncontrolled errors as $\beta$ becomes larger. Our data using the POM agree with the exact solution as long as we find a peak in the histogram of the phases. The valid region of the POM is definitely extended to larger $\beta$. Deviations from the exact solution are also observed in the path optimization case if there is no clear peak in the histogram of the phases, i.e., where the enhancement of the phase factor by the path optimization is still small or the path optimization is unstable. One reason for the failure is limited statistics: the machine learning requires more data to find the best path, especially for a system with a large number of degrees of freedom. Another possibility is the effect of multimodality. Though there can be several regions contributing to the integral (relevant thimbles), the optimized path may emphasize only a part of them. Quantitative evaluation of the systematic errors of the POM result is still difficult and is beyond the scope of this paper; it is an important future work. Nevertheless, these results demonstrate the superiority of the POM over the naive reweighting for the analysis of the $U(1)$ gauge theory with complex $\beta$.

IV Summary

We established the efficiency of the gauge invariant input in the POM for the 2-dimensional $U(1)$ gauge theory with a complex coupling. While the path optimization using the link variable input without gauge fixing shows no gain, the optimization with the gauge invariant input shows a clear increase in the average phase factor through the neural network process. The gain is up to 4 on the $2\times 2$, 21 on the $4\times 4$, and 27 on the $8\times 8$ lattices. Even on the $8\times 8$ lattice at $\beta=0.0+0.5i$, the POM using the gauge invariant input successfully identifies a peak structure in the histogram of the phases, where the naive reweighting shows no peak. We confirmed that the volume dependence of the average phase factor becomes much milder with the POM using the gauge invariant input. We also calculated expectation values of the plaquette and the topological charge for comparison with their exact solutions. Our results agree with the analytical values as long as we find a peak in the histogram of the phases. The valid region is clearly enlarged to larger $\beta$, compared with that of the naive reweighting method. It is encouraging toward the POM analysis in more realistic cases.

In the region where the sign problem is more severe, near the critical point on large volumes, the gain of the POM is less clear, and further improvement of the POM is required. One possible direction is the incorporation of larger Wilson loops and Polyakov lines as input to the neural network. Another direction is the adoption of novel approaches that respect the gauge symmetry in the neural network, such as the lattice gauge equivariant convolutional neural networks Favoni et al. (2020) and the gauge covariant neural network Tomiya and Nagai (2021). In addition, there is a different approach that modifies the action instead of the integration path Tsutsui and Doi (2016); Doi and Tsutsui (2017); Lawrence (2020). A combination of the modifications of the action and of the integration path would also be interesting.

Acknowledgements.
Our code is in part based on LTKf90 Choe et al. (2002). This work is supported by JSPS KAKENHI Grant Numbers JP18K03618, JP19H01898, and JP21K03553.

References