Machine Learning Regularization
for the Minimum Volume Formula of Toric Calabi-Yau 3-folds
Abstract
We present a collection of explicit formulas for the minimum volume of Sasaki-Einstein 5-manifolds. The cone over these 5-manifolds is a toric Calabi-Yau 3-fold. These toric Calabi-Yau 3-folds are associated with an infinite class of supersymmetric gauge theories, which are realized as worldvolume theories of D3-branes probing the toric Calabi-Yau 3-folds. Under the AdS/CFT correspondence, the minimum volume of the Sasaki-Einstein base is inversely proportional to the central charge of the corresponding superconformal field theories. The presented formulas for the minimum volume are in terms of geometric invariants of the toric Calabi-Yau 3-folds. These explicit results are derived by implementing machine learning regularization techniques that advance beyond previous applications of machine learning for determining the minimum volume. Moreover, the use of machine learning regularization allows us to present interpretable and explainable formulas for the minimum volume. Our work confirms that, even for extensive sets of toric Calabi-Yau 3-folds, the proposed formulas approximate the minimum volume with remarkable accuracy.
I Introduction
Since the introduction of machine learning techniques in He:2017aed ; Krefl:2017yox ; Ruehle:2017mzq ; Carifio:2017bov ; Cole:2019enn ; Cole:2020gkd ; Halverson:2020trp ; Gukov:2020qaj ; Abel:2021rrj ; Krippendorf:2021uxu ; Cole:2021nnt ; Berglund:2023ztk ; Demirtas:2023fir for studying problems that occur in the context of string theory, machine learning – both supervised Bull:2018uow ; Jejjala:2019kio ; Brodie:2019dfx ; He:2020lbz ; Erbin:2020tks ; Anagiannis:2021cco ; Larfors:2022nep and unsupervised Krippendorf:2020gny ; Berman:2021mcw ; Bao:2021olg ; Seong:2023njx – has led to a variety of applications in string theory. A problem that appeared particularly suited for machine learning in 2017 Krefl:2017yox was the problem of identifying a formula for the minimum volume of Sasaki-Einstein 5-manifolds Martelli:2006yb ; Martelli:2005tp . The cone over these Sasaki-Einstein 5-manifolds is a toric Calabi-Yau 3-fold fulton ; 1997hep.th…11013L . Given that there are infinitely many toric Calabi-Yau 3-folds with corresponding Sasaki-Einstein 5-manifolds and that there is an infinite class of supersymmetric gauge theories associated to them via string theory Greene:1996cy ; Douglas:1997de ; Witten:1998qj ; Klebanov:1998hh ; Douglas:1996sw ; Lawrence:1998ja ; Feng:2000mi ; Feng:2001xr , this beautiful correspondence between geometry and gauge theory was identified in Krefl:2017yox as an ideal testbed for introducing machine learning for string theory.
These supersymmetric gauge theories corresponding to toric Calabi-Yau 3-folds are realized as worldvolume theories of D3-branes probing the Calabi-Yau singularities. Via the AdS/CFT correspondence Maldacena:1997re ; Morrison:1998cs ; Acharya:1998db , the minimum volume of the Sasaki-Einstein 5-manifolds is related to the maximized $a$-function Intriligator:2003jj ; Butti:2005vn ; Butti:2005ps that gives the central charges of the corresponding superconformal field theories Gubser:1998vd ; Henningson:1998gx . The proposal in Krefl:2017yox was that machine learning techniques can be used to give a formula for the minimum volume in terms of features taken from the toric diagram of the corresponding toric Calabi-Yau 3-folds. Such a formula would significantly simplify the computation of the minimum volume, which conventionally is computed by minimizing the volume function obtained from the equivariant index Martelli:2006yb ; Martelli:2005tp or Hilbert series of the toric Calabi-Yau 3-fold Benvenuti:2006qr ; Feng:2007ur .
In Krefl:2017yox , we made use of multiple linear regression gauss1823theoria ; fisher1922mathematical ; mendenhall2003second ; freedman2009statistical ; jobson2012applied and a combination of a regression model and a convolutional neural network (CNN) lecun1998gradient ; krizhevsky2012imagenet ; lecun2015deep ; schmidhuber2015deep to learn the minimum volume for toric Calabi-Yau 3-folds. As is often the case for supervised machine learning rumelhart1986learning ; hastie2009elements , the models lacked interpretability and explainability, achieving high accuracies in estimating the minimum volume while giving only little insight into the mathematical structure and physical origin of the estimating formula.
Table 1: The Type IIB brane configuration for brane tilings. The D5-branes extend along the 012346 directions, while the NS5-brane extends along 0123 and wraps a holomorphic curve Σ in the 4567 directions.

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| D5 | × | × | × | × | × | · | × | · | · | · |
| NS5 | × | × | × | × | Σ | Σ | Σ | Σ | · | · |
In this work, we aim to highlight the pivotal role of regularization techniques in machine learning tikhonov1963regularization ; hastie2009elements . We demonstrate that employing regularized machine learning models can effectively address the limitations inherent in supervised machine learning, especially for problems that appear in string theory and, more broadly, for problems at the intersection of mathematics and physics. While the primary objective of regularization in machine learning is to prevent overfitting, certain versions of it can be employed to eliminate model parameters, echoing the spirit of regularization in quantum field theory.
By focusing on Least Absolute Shrinkage and Selection Operator (Lasso) regularization tibshirani1996regression for polynomial and logarithmic regression models, we identify several candidate formulas for the minimum volume of Sasaki-Einstein 5-manifolds corresponding to toric Calabi-Yau 3-folds. The discovered formulas depend on either 3 or 6 parameters that come from features of the corresponding toric diagrams fulton ; 1997hep.th…11013L – convex lattice polygons on $\mathbb{Z}^2$ that uniquely characterize the associated toric Calabi-Yau 3-fold. Compared to the extremely large number of parameters in the regression and CNN models used in our previous work in Krefl:2017yox , the formulas obtained in this study are presentable, interpretable and, most importantly, reusable for the computation of the minimum volume for toric Calabi-Yau 3-folds.
II Calabi-Yau 3-Folds and Quiver Gauge Theories
In this work, we concentrate on non-compact toric Calabi-Yau 3-folds $X$. These geometries can be considered as cones over Sasaki-Einstein 5-manifolds $Y$ Maldacena:1997re ; Morrison:1998cs ; Acharya:1998db ; Martelli:2004wu ; Benvenuti:2004dy ; Benvenuti:2005ja ; Butti:2005sw . The toric Calabi-Yau 3-folds are fully characterized by convex lattice polygons on $\mathbb{Z}^2$ known as toric diagrams $\Delta$ fulton ; 1997hep.th…11013L . The associated Calabi-Yau singularities can be probed by D3-branes whose worldvolume theories form a class of supersymmetric gauge theories Greene:1996cy ; Douglas:1997de ; Witten:1998qj ; Klebanov:1998hh ; Douglas:1996sw ; Lawrence:1998ja ; Feng:2000mi ; Feng:2001xr .
This class of supersymmetric gauge theories can be represented in terms of a T-dual Type IIB brane configuration known as a brane tiling Franco:2005rj ; Hanany:2005ve ; Franco:2005sm . Table 1 summarizes the Type IIB brane configuration. Brane tilings can be illustrated in terms of bipartite graphs on a 2-torus 2003math…..10326K ; kasteleyn1967graph and encapsulate both the field theory information and the information about the associated toric Calabi-Yau geometry. Figure 1 shows an example of a brane tiling and its associated toric Calabi-Yau 3-fold, which is in this case the cone over the zeroth Hirzebruch surface $F_0$ hirzebruch1968singularities ; brieskorn1966beispiele ; Morrison:1998cs ; Feng:2000mi . The mesonic moduli spaces Witten:1993yc ; Benvenuti:2006qr ; Feng:2007ur ; Butti:2007jv formed by the mesonic gauge invariant operators of these supersymmetric gauge theories with $U(1)$ gauge groups are precisely the associated toric Calabi-Yau 3-folds. When all the gauge groups of the supersymmetric gauge theory are $U(N)$, the mesonic moduli space is given by the $N$-th symmetric product of the toric Calabi-Yau 3-fold.
[Figure 1: A brane tiling and its associated toric Calabi-Yau 3-fold, here the cone over the zeroth Hirzebruch surface $F_0$.]
The gravity dual of the worldvolume theories is Type IIB string theory on $AdS_5 \times Y$, where $Y$ is the Sasaki-Einstein 5-manifold that forms the base of the associated toric Calabi-Yau 3-fold Maldacena:1997re ; Morrison:1998cs ; Acharya:1998db ; Martelli:2004wu ; Benvenuti:2004dy ; Benvenuti:2005ja ; Butti:2005sw . These supersymmetric gauge theories are known to flow at low energies to a superconformal fixed point. Under a procedure known as $a$-maximization Intriligator:2003jj ; Butti:2005vn ; Butti:2005ps , the superconformal $R$-charges of the theory are determined. This procedure involves the maximization of the trial central charge, which takes the form

$a(R) = \frac{3}{32} \left( 3\, \mathrm{Tr}\, R^3 - \mathrm{Tr}\, R \right) \ . \quad$ (II.1)

The maximization procedure gives the value of the central charge $a$ of the superconformal field theory at the conformal fixed point.

Under the AdS/CFT correspondence Maldacena:1997re ; Morrison:1998cs ; Acharya:1998db , the central charge $a$ is directly related to the minimized volume of the corresponding Sasaki-Einstein 5-manifold Gubser:1998vd ; Henningson:1998gx . We have,

$a = \frac{\pi^3 N^2}{4\, \mathrm{Vol}(Y)} \ , \quad$ (II.2)

where the $R$-charges, and as a result the volume function $V(b; Y)$, can be expressed in terms of the Reeb vector components $b_i$ of the corresponding Sasaki-Einstein 5-manifold Martelli:2006yb ; Martelli:2005tp . We can reverse the statement: computing the minimum volume,

$V_{\min} = \min_{b}\ V(b; Y) \ , \quad$ (II.3)

is equivalent to obtaining the maximum value of the central charge $a$. This correspondence holds for all theories living on a stack of $N$ D3-branes probing toric Calabi-Yau 3-folds and has been checked extensively in various examples Intriligator:2003jj ; Butti:2005vn ; Butti:2005ps .
In this work, we will focus on toric Calabi-Yau 3-folds $X$ and the corresponding Sasaki-Einstein 5-manifolds $Y$, with particular emphasis on the minimum volume $V_{\min}$ of the Sasaki-Einstein 5-manifolds. Building on the pioneering work of Krefl:2017yox , this work proposes the use of more advanced machine learning techniques. In particular, we introduce machine learning regularization by using the Least Absolute Shrinkage and Selection Operator (Lasso) tibshirani1996regression in order to identify an explicit formula for the minimum volume of Sasaki-Einstein 5-manifolds $Y$. We expect to be able to write the minimum volume formula in terms of features obtained from the toric diagrams of the corresponding toric Calabi-Yau 3-folds. The use of machine learning regularization allows us to eliminate parameters, reducing the necessary parameters for the volume formula to a manageable number that is interpretable, presentable and reusable.
Before discussing these machine learning techniques, let us first review in the following section the computation of the volume functions for toric Calabi-Yau 3-folds using Hilbert series.
III Hilbert Series and Calabi-Yau Volumes
Given a toric Calabi-Yau 3-fold $X$ realized as a cone over a projective variety $V$, where $V$ is embedded as an affine variety in $\mathbb{C}^k$, the Hilbert series Benvenuti:2006qr ; Feng:2007ur is the generating function for the dimensions of the graded pieces of the coordinate ring

$\mathbb{C}[x_1, \dots, x_k] / \langle f_1, \dots, f_n \rangle \ , \quad$ (III.4)

where $f_1, \dots, f_n$ are the defining polynomials of $V$. Accordingly, the Hilbert series takes the general form

$g(t; X) = \sum_{i=0}^{\infty} \dim_{\mathbb{C}}(X_i)\ t^i \ , \quad$ (III.5)

where $X_i$ is the $i$-th graded piece of the coordinate ring.
For supersymmetric gauge theories given by brane tilings Franco:2005rj ; Hanany:2005ve ; Franco:2005sm , we have an associated toric Calabi-Yau 3-fold $X$, which becomes the mesonic moduli space Witten:1993yc ; Benvenuti:2006qr ; Feng:2007ur ; Butti:2007jv of the supersymmetric gauge theory when the gauge groups are all $U(1)$. The corresponding Hilbert series is the generating function of the mesonic gauge invariant operators that form the mesonic moduli space. For the purpose of the remaining discussion, we will consider the supersymmetric gauge theories given by brane tilings as abelian theories with $U(1)$ gauge groups.

Following the forward algorithm for brane tilings Feng:2000mi , we can use GLSM fields Witten:1993yc given by perfect matchings $p_\alpha$ Hanany:2005ve ; Franco:2005rj of the brane tilings in order to express the mesonic moduli space of the abelian supersymmetric gauge theory as the following symplectic quotient,

$\mathcal{M}^{mes} = \mathrm{Irr}\, \mathcal{F}^{\flat} /\!/ Q_D \ , \quad \mathrm{Irr}\, \mathcal{F}^{\flat} = \mathbb{C}^c /\!/ Q_F \ , \quad$ (III.6)

where $\mathrm{Irr}\, \mathcal{F}^{\flat}$ is the largest irreducible component, also known as the coherent component, of the master space $\mathcal{F}^{\flat}$ Hanany:2010zz ; Forcella:2008bb ; Forcella:2008eh of the supersymmetric gauge theory. The master space is the spectrum of the coordinate ring generated by the chiral fields and quotiented by the F-term relations. In (III.6), $Q_F$ is the $F$-term charge matrix summarizing the charges originating from the $F$-terms, and $Q_D$ is the $D$-term charge matrix which summarizes the gauge charges on the perfect matchings $p_\alpha$.
Following the symplectic quotient description of the mesonic moduli space in (III.6), the Hilbert series can be obtained by solving the Molien integral Pouliot:1998yv ,

$g(y_\alpha; X) = \prod_{i=1}^{c-3} \oint_{|z_i|=1} \frac{\mathrm{d}z_i}{2\pi i\, z_i}\ \prod_{\alpha=1}^{c} \frac{1}{1 - y_\alpha \prod_{i=1}^{c-3} z_i^{(Q_t)_{i\alpha}}} \ , \quad$ (III.7)

where $c$ is the number of perfect matchings in the brane tiling and $Q_t = (Q_F;\ Q_D)$ is the total charge matrix.
Martelli:2006yb ; Martelli:2005tp showed that the same Hilbert series can be obtained directly from the toric diagram $\Delta$ of the toric Calabi-Yau 3-fold $X$. Given that the toric diagram $\Delta$ is a convex lattice polygon on $\mathbb{Z}^2$ with an ideal triangulation into unit sub-triangles $\tau_a$, the Hilbert series of the corresponding toric Calabi-Yau 3-fold can be written as

$g(t_1, t_2, t_3; X) = \sum_{a} \prod_{b=1}^{3} \frac{1}{1 - \mathbf{t}^{u_{a,b}}} \ , \quad$ (III.8)

where $a$ is the index for the unit triangles $\tau_a$, and $b$ is the index for the boundary edges $e_{a,b}$ of each unit triangle $\tau_a$. For each boundary edge $e_{a,b}$, we have a 3-dimensional outer normal vector $u_{a,b}$ whose components are assigned the following product of fugacities,

$\mathbf{t}^{u_{a,b}} = \prod_{i=1}^{3} t_i^{(u_{a,b})_i} \ , \quad$ (III.9)

where $(u_{a,b})_i$ indicates the $i$-th component of $u_{a,b}$. We note that $u_{a,b}$ is a 3-dimensional vector because the defining vertices of $\Delta$ and $\tau_a$ are all on a plane at height 1, such that their coordinates take the form $(x, y, 1)$. As a result, the vector $u_{a,b}$ corresponding to edge $e_{a,b}$ is normal to the plane spanned by the two vectors connecting the origin to the bounding vertices of $e_{a,b}$.
[Figure 2: (a) The toric diagram $\Delta$, its triangulation and the outer normal vectors for the cone over the zeroth Hirzebruch surface $F_0$; (b) the outer normal vectors $u_{a,b}$ for each of the four unit sub-triangles.]
It is important to note that the fugacities $t_i$ in (III.9) relate to the components of the normal vectors $u_{a,b}$, and therefore depend on the triangulation and the particular instance in a given $GL(2,\mathbb{Z})$ orbit of a toric diagram on the $\mathbb{Z}^2$ plane. In comparison, the fugacities $y_\alpha$ in (III.7) refer to the GLSM fields given by perfect matchings $p_\alpha$ of the corresponding brane tiling. Since perfect matchings can be mapped directly to chiral fields in the supersymmetric gauge theory, the fugacities $y_\alpha$ in (III.7) can be mapped to fugacities counting global symmetry charges carried by chiral fields in the theory. Because both Hilbert series from (III.7) and (III.8) refer to the same toric Calabi-Yau 3-fold $X$, there exists a fugacity map between $t_i$ and $y_\alpha$ that identifies the two Hilbert series with each other.

For the rest of the discussion, let us consider Hilbert series for toric Calabi-Yau 3-folds that are expressed in terms of the fugacities $t_i$ corresponding to coordinates of the normal vectors of the toric diagram $\Delta$. Given the Hilbert series $g(t_i; X)$, we can obtain the volume function $V(b; Y)$ Martelli:2006yb ; Martelli:2005tp of the Sasaki-Einstein 5-manifold $Y$ using,

$V(b; Y) = \lim_{\mu \to 0}\ \mu^3\ g(t_i = e^{-\mu b_i}; X) \ , \quad$ (III.10)

where $b_i$ are the Reeb vector components with $i = 1, 2, 3$. We note that the Reeb vector $b = (b_1, b_2, b_3)$ always lies in the interior of the toric diagram and can be chosen such that one of its components is set to

$b_3 = 3 \quad$ (III.11)

for toric Calabi-Yau 3-folds $X$. We further note that the limit in (III.10) takes the leading order in $\mu$ in the expansion of $g(e^{-\mu b_i}; X)$, which was shown in Martelli:2006yb ; Martelli:2005tp to give the volume of the Sasaki-Einstein base $Y$.
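Taking the limit in (III.10) term by term in (III.8) gives the volume function as a rational function of the Reeb vector, $V(b; Y) = \sum_{a} \prod_{j=1}^{3} \langle u_{a,j}, b \rangle^{-1}$. The following is a minimal sketch of this computation in Python; the representation of the toric diagram as a list of unit triangles, and the sign convention for the normals (oriented towards the third vertex of each triangle, which reproduces the known examples below), are our own illustrative choices.

```python
import numpy as np

def edge_normals(triangle):
    """Normals u_{a,b} for the three edges of a unit triangle whose
    vertices (x, y) are lifted to height 1, i.e. to (x, y, 1).
    Each normal is perpendicular to the plane spanned by the two lifted
    endpoints of its edge, oriented towards the third vertex."""
    P = [np.array([x, y, 1]) for (x, y) in triangle]
    normals = []
    for i in range(3):
        n = np.cross(P[i], P[(i + 1) % 3])
        if np.dot(n, P[(i + 2) % 3]) < 0:
            n = -n
        normals.append(n)
    return normals

def volume_function(triangles, b):
    """V(b; Y) as the mu -> 0 limit in (III.10) applied to (III.8):
    each factor 1/(1 - exp(-mu <u, b>)) contributes 1/(mu <u, b>)."""
    total = 0.0
    for tri in triangles:
        term = 1.0
        for u in edge_normals(tri):
            term /= np.dot(u, b)  # poles of single terms cancel in the sum
        total += term
    return total

# Example: the four unit triangles of the F0 toric diagram used below
F0 = [[(0, 0), (1, 0), (0, 1)], [(0, 0), (0, 1), (-1, 0)],
      [(0, 0), (-1, 0), (0, -1)], [(0, 0), (0, -1), (1, 0)]]
print(volume_function(F0, np.array([0.5, 0.3, 3.0])))  # ~ 0.32040
```

For a generic Reeb vector in the interior of the toric diagram the sum is finite, even though individual triangle contributions can have poles; numerically, one simply avoids evaluating exactly on these loci.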
Let us consider in the following paragraph an example of the computation of the volume function in terms of Reeb vector components for the Sasaki-Einstein base $Y$ of the cone over the zeroth Hirzebruch surface $F_0$ hirzebruch1968singularities ; brieskorn1966beispiele ; Morrison:1998cs ; Feng:2000mi .

Example: $F_0$. The toric diagram, its triangulation and the outer normal vectors for the cone over the zeroth Hirzebruch surface $F_0$ hirzebruch1968singularities ; brieskorn1966beispiele ; Morrison:1998cs ; Feng:2000mi are shown in Figure 2(a). The cone over the zeroth Hirzebruch surface is an interesting toric Calabi-Yau 3-fold because it has two distinct corresponding supersymmetric gauge theories, represented by two distinct brane tilings that are related by Seiberg duality Seiberg:1994pq ; 2001JHEP…12..001B ; Feng:2000mi . One of the brane tilings is shown in Figure 1.

Using the outer normal vectors $u_{a,b}$ for each of the four unit sub-triangles of the toric diagram for $F_0$ in Figure 2(b), we can use (III.8) to write down the Hilbert series,
$g(t_i; F_0) = \frac{1}{(1-t_1)(1-t_2)\left(1-\frac{t_3}{t_1 t_2}\right)} + \frac{1}{\left(1-\frac{1}{t_1}\right)(1-t_2)\left(1-\frac{t_1 t_3}{t_2}\right)} + \frac{1}{\left(1-\frac{1}{t_1}\right)\left(1-\frac{1}{t_2}\right)\left(1-t_1 t_2 t_3\right)} + \frac{1}{(1-t_1)\left(1-\frac{1}{t_2}\right)\left(1-\frac{t_2 t_3}{t_1}\right)} \ . \quad$ (III.12)
Using the limit in (III.10), we can derive the volume function of the Sasaki-Einstein base $Y$ directly from the Hilbert series as follows,

$V(b; Y) = \frac{24}{\left( 9 - (b_1 + b_2)^2 \right) \left( 9 - (b_1 - b_2)^2 \right)} \ , \quad$ (III.13)

where $b_3 = 3$. When we find the global minimum of the volume function $V(b; Y)$, we obtain

$V_{\min} = \frac{8}{27} \simeq 0.29630 \quad$ (III.14)

up to 5 decimal points, which occurs at critical Reeb vector components $(b_1^*, b_2^*) = (0, 0)$.
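As a consistency check, this minimization can also be carried out numerically. A minimal sketch using scipy on the closed-form volume function above; the optimizer choice and starting point are our own:

```python
import numpy as np
from scipy.optimize import minimize

# Volume function of the cone over F0 with b_3 = 3, cf. (III.13)
def vol_F0(b):
    return 24.0 / ((9 - (b[0] + b[1]) ** 2) * (9 - (b[0] - b[1]) ** 2))

res = minimize(vol_F0, x0=np.array([0.5, 0.5]), method="Nelder-Mead")
print(res.x, res.fun)
# -> approximately (0, 0) and 0.29630, i.e. V_min = 8/27
```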
In the remainder of this work, we will maintain a precision level of 5 decimal points for all numerical measurements.
IV Features of Toric Diagrams and Regression
The aim of this work is to identify an expression for the minimum volume $V_{\min}$ of Sasaki-Einstein 5-manifolds $Y$ in terms of parameters that we know from the corresponding toric Calabi-Yau 3-folds $X$. We refer to these parameters as features, denoted as $x_j$, of the toric Calabi-Yau 3-fold $X$.

Assuming that we have $n$ features for a given toric Calabi-Yau 3-fold, the proposal in Krefl:2017yox states that we can write down a candidate linear function for the inverse minimum volume in terms of these features as follows,

$\frac{1}{\hat{V}_{\min}^{(s)}} = c_0 + \sum_{j=1}^{n} c_j\, x_j^{(s)} \ , \quad$ (IV.15)

where $c_0$ and $c_j$ are real coefficients, and $s$ labels the particular toric Calabi-Yau 3-fold $X^{(s)}$ with its corresponding toric diagram $\Delta^{(s)}$.
Let us refer to the inverse of the actual minimum volume obtained by volume minimization as $1/V_{\min}^{(s)}$ for a given toric Calabi-Yau 3-fold $X^{(s)}$. If for a set $S$ of toric Calabi-Yau 3-folds we know the actual minimum volumes via volume minimization, then we can calculate the following residual sum of squares of the difference between the inverses of the actual and the expected minimum volumes for the entire set $S$,

$\mathcal{L} = \sum_{s \in S} \left( \frac{1}{\hat{V}_{\min}^{(s)}} - \frac{1}{V_{\min}^{(s)}} \right)^2 \ . \quad$ (IV.16)

Here, $\mathcal{L}$ can be considered as a loss function goodfellow2016deep that evaluates the performance of the candidate function for the minimum volume in (IV.15). In multiple linear regression gauss1823theoria ; fisher1922mathematical ; mendenhall2003second ; freedman2009statistical ; jobson2012applied , as initially proposed in Krefl:2017yox , the optimization task is to minimize the loss function in (IV.16) for a given dataset of toric Calabi-Yau 3-folds,

$\min_{c_0,\, c_j}\ \mathcal{L} \ . \quad$ (IV.17)
In Krefl:2017yox , multiple linear regression was used to obtain a candidate minimum volume function using the following feature set,
$(x_1, x_2, x_3) \ , \quad$ (IV.18)

where

$x_1 = I \ , \quad x_2 = E \ , \quad x_3 = V \ , \quad$ (IV.19)

corresponding respectively to the number of internal lattice points $I$ in $\Delta$, the number of boundary lattice points $E$ in $\Delta$, and the number of vertices $V$ that form the extremal corner points in $\Delta$, for a given toric Calabi-Yau 3-fold $X$. Under Pick's theorem pick1899geometrisches , these features are related as follows,

$A = I + \frac{E}{2} - 1 \ , \quad$ (IV.20)

where $A$ is the area of the toric diagram $\Delta$, with the smallest unit triangle in $\Delta$ having area $1/2$.
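For concreteness, the features $A$, $I$, $E$ and $V$ can be computed directly from the corner points of a toric diagram using the shoelace formula for the area, edge gcds for the boundary points, and Pick's theorem for the internal points. A minimal sketch in Python (our own helper names; the vertices are assumed to be the extremal corner points listed in cyclic order):

```python
import numpy as np
from math import gcd

def features(verts):
    """Return (A, I, E, V) for a convex lattice polygon given by its
    extremal corner points in cyclic order."""
    x, y = np.array(verts).T
    A2 = abs(int(x @ np.roll(y, -1) - y @ np.roll(x, -1)))  # shoelace: 2A
    # boundary lattice points: sum of gcds of the edge vectors
    E = sum(gcd(abs(x1 - x0), abs(y1 - y0))
            for (x0, y0), (x1, y1) in zip(verts, verts[1:] + verts[:1]))
    I = (A2 - E + 2) // 2        # Pick's theorem: A = I + E/2 - 1
    V = len(verts)               # extremal corner points
    return A2 / 2, I, E, V

# Example: the toric diagram of the cone over F0
print(features([(1, 0), (0, 1), (-1, 0), (0, -1)]))   # (2.0, 1, 4, 4)
```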
With a dataset of toric Calabi-Yau 3-folds, the work in Krefl:2017yox showed that the candidate linear function in (IV.15) with features given by (IV.18) is able to estimate the inverse minimum volume with an expected percentage relative error of 2.2%.
In this work, we expand upon the accomplishments of Krefl:2017yox by introducing novel features that describe toric Calabi-Yau 3-folds, augmenting the datasets for toric Calabi-Yau 3-folds, and applying machine learning techniques incorporating regularization.
These improvements are designed to address some of the shortcomings of the work in Krefl:2017yox as well as give explicit interpretable formulas for the minimum volume for toric Calabi-Yau 3-folds.
[Figure 3: A toric diagram $\Delta$ and its $n$-enlarged toric diagrams $\Delta_n$.]
New Features. We introduce several new features that describe a toric Calabi-Yau 3-fold and are obtained from the corresponding toric diagram $\Delta$. We define the $n$-enlarged toric diagram as,

$\Delta_n = \{ (n\, x_i,\ n\, y_i)\ |\ (x_i, y_i) \in \Delta \} \ , \quad$ (IV.21)

where $x_i$ and $y_i$ are the coordinates of the vertices in the original toric diagram $\Delta$. We note that $\Delta_1 = \Delta$. These $n$-enlarged toric diagrams also appeared in Berglund:2021ztg for the study of Hodge numbers of Calabi-Yau manifolds that are constructed as hypersurfaces in toric varieties.

Using the $n$-enlarged toric diagram $\Delta_n$ for a given toric Calabi-Yau 3-fold $X$, we can now refer to the area of $\Delta_n$ as $A_n$, the number of internal lattice points of $\Delta_n$ as $I_n$, and the number of boundary lattice points in $\Delta_n$ as $E_n$. We further note that the number of vertices corresponding to extremal corner points is the same in $\Delta_n$ for all $n$, i.e. $V_n = V$.
In our work, we use features of a toric Calabi-Yau 3-fold $X$ that are composed from members of the following set,

$\{ A_n,\ I_n,\ E_n,\ V \} \ , \quad$ (IV.22)

where $n \in \mathbb{Z}_{>0}$ and $A_1 = A$, $I_1 = I$, $E_1 = E$. These are defined through the corresponding toric diagram $\Delta$ and its corresponding $n$-enlarged toric diagrams $\Delta_n$; a short computational sketch follows below.
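Since scaling a lattice polygon by a factor $n$ multiplies its area by $n^2$ and its number of boundary lattice points by $n$, the enlarged features $A_n$, $I_n$ and $E_n$ follow from Pick's theorem without re-counting lattice points. A minimal sketch building on the features helper above (our own naming):

```python
def enlarged_features(verts, n):
    """Features (A_n, I_n, E_n, V) of the n-enlarged toric diagram
    Delta_n, cf. (IV.21), using A_n = n^2 A and E_n = n E."""
    A, I, E, V = features(verts)
    A_n, E_n = n * n * A, n * E
    I_n = int(A_n - E_n / 2 + 1)   # Pick's theorem applied to Delta_n
    return A_n, I_n, E_n, V

print(enlarged_features([(1, 0), (0, 1), (-1, 0), (0, -1)], 2))
# -> (8.0, 5, 8, 4): the 2-enlarged F0 diagram has 5 internal points
```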
Through the application of machine learning regularization, our objective is to differentiate between features that contribute to the expression for the minimum volume associated with a toric Calabi-Yau 3-fold and those that do not.
Table 2: The four datasets of toric Calabi-Yau 3-folds used in this work, with the number of distinct toric diagrams in each set.

| Set | Description | Number of toric diagrams |
|---|---|---|
| $S_1$ | all polytopes in a lattice box | 15,327 |
| $S_2$ | all polytopes in a circle | 31,324 |
| $S_3$ | selected polytopes in a lattice box | 202,015 |
| $S_4$ | selected polytopes in a circle | 201,895 |
New Sets of Toric Calabi-Yau 3-folds. The aim of this work is to make use of machine learning with regularization in order to identify an interpretable formula that accurately estimates the minimum volume of Sasaki-Einstein 5-manifolds corresponding to toric Calabi-Yau 3-folds. The interpretability of the minimum volume formula is achieved by the lowest possible number of features on which the formula depends. In order to train such a regularized machine learning model, we establish four sets of toric Calabi-Yau 3-folds for which the corresponding minimum volumes are known. These sets are defined as follows:
• $S_1$: This set consists of toric Calabi-Yau 3-folds whose toric diagrams fit into a lattice box in $\mathbb{Z}^2$ as illustrated in Figure 4(a). This set contains a certain degree of redundancy, given that convex lattice polygons related by a $GL(2,\mathbb{Z})$ transformation on their vertices refer to the same toric Calabi-Yau 3-fold. Accordingly, we restrict ourselves to toric diagrams that give unique combinations of features and inverse minimum volume. This results in a dataset of 15,327 distinct toric diagrams with unique inverse minimum volumes up to 6 decimal points.

• $S_2$: The second set consists of toric Calabi-Yau 3-folds whose toric diagrams fit inside a circle of fixed radius centered at the origin of the lattice, as illustrated in Figure 4(b). By imposing the condition that we want $GL(2,\mathbb{Z})$-distinct toric diagrams with unique combinations of features and inverse minimum volume, we obtain 31,324 toric diagrams for this set.

• $S_3$: For this set, we choose randomly 300,000 toric diagrams that fit into a lattice box in $\mathbb{Z}^2$. By imposing the condition that the toric diagrams have unique combinations of features and inverse minimum volume, we obtain 202,015 toric diagrams for this set (a minimal generation sketch is given after this list).

• $S_4$: For this set, we choose randomly 300,000 toric diagrams that fit into a circle of fixed radius centered at the origin of the lattice. By imposing the condition that the toric diagrams have unique combinations of features and inverse minimum volume, we obtain 201,895 toric diagrams for this set.
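A minimal sketch of how such a dataset can be generated, mimicking the construction of $S_3$: random lattice points in a box are passed to a convex hull, and the resulting diagrams are deduplicated by their feature combination. The box size, sample counts and the deduplication key are illustrative assumptions, and the features helper is the one sketched above:

```python
import numpy as np
from scipy.spatial import ConvexHull, QhullError

rng = np.random.default_rng(0)

def random_toric_diagram(box=3, n_points=6):
    """Convex hull of random lattice points in a (2*box+1)^2 lattice box."""
    pts = rng.integers(-box, box + 1, size=(n_points, 2))
    try:
        hull = ConvexHull(pts)
    except QhullError:          # degenerate (e.g. collinear) samples
        return None
    # hull.vertices are in counterclockwise order for 2-D inputs
    return [tuple(pts[i]) for i in hull.vertices]

seen, dataset = set(), []
for _ in range(300_000):
    verts = random_toric_diagram()
    if verts is None:
        continue
    key = features(verts)       # deduplicate by the feature combination
    if key not in seen:
        seen.add(key)
        dataset.append(verts)
```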
The distribution of inverse minimum volumes $1/V_{\min}$ for the above sets of toric diagrams is illustrated together with the mean inverse minimum volume in Figure 5.
In the following sections, we make use of regularized machine learning in order to identify functions that optimally estimate the inverse minimum volume in each of the above datasets.
[Figure 4: (a) The lattice box and (b) the circle into which the toric diagrams of the datasets fit.]

[Figure 5: The distribution of inverse minimum volumes $1/V_{\min}$ for the four datasets, together with the mean inverse minimum volume.]
Machine Learning Models and Regularization. In order to obtain a function for the minimum volume of Sasaki-Einstein 5-manifolds corresponding to toric Calabi-Yau 3-folds in terms of features obtained from the corresponding toric diagrams, we make use of the following machine learning models (a minimal implementation sketch follows after this list):

• Polynomial Regression (PR). We make use of polynomial regression montgomery2021introduction , where the relationship between the feature variables $x_j^{(s)}$ and the predicted variable $1/\hat{V}_{\min}^{(s)}$ is given by

$\frac{1}{\hat{V}_{\min}^{(s)}} = c_0 + \sum_{j=1}^{n} c_j\, x_j^{(s)} + \sum_{j \leq k} c_{jk}\, x_j^{(s)} x_k^{(s)} + \dots \ . \quad$ (IV.23)

Here, $c_0$, $c_j$ and $c_{jk}$ are real coefficients, $n$ is the number of features, and $s$ labels the particular sample in the data set that is used to train this machine learning model. In our case, the data set consists of toric Calabi-Yau 3-folds $X^{(s)}$, where the corresponding minimum volume is given by $V_{\min}^{(s)}$. Here we note that the features $x_j^{(s)}$ are taken from the set in (IV.22) with the enlargement factor up to a fixed maximum value.

• Logarithmic Regression (LR). We make use of logarithmic regression montgomery2021introduction in order to help linearize relationships between features that are potentially multiplicative in their contribution towards the predicted variable $1/\hat{V}_{\min}^{(s)}$. To be more precise, we make use of a $\log$-$\log$ model where we $\log$-transform both the predicted variable and the features $x_j^{(s)}$. The predicted variable is then given by,

$\log \frac{1}{\hat{V}_{\min}^{(s)}} = c_0 + \sum_{j=1}^{m} c_j \log x_j^{(s)} \ , \quad$ (IV.24)

where $c_0$ and $c_j$ are real coefficients, and $m$ is the number of $\log$-transformed features of the form $\log x_j^{(s)}$. The label $s$ corresponds to a particular toric Calabi-Yau 3-fold $X^{(s)}$ whose corresponding minimum volume is given by $V_{\min}^{(s)}$. Here we note that the $\log$-transformed features of the form $\log x_j^{(s)}$ are taken from the set in (IV.22) with the enlargement factor up to a fixed maximum value. Here, we do not make use of features that can vanish, for which the $\log$-transform is ill-defined.
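A minimal sketch, using scikit-learn, of how these two L1-regularized models can be set up; the input files, polynomial degree and regularization strengths are placeholder assumptions, not the settings used to obtain the results reported below:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# X: feature matrix with columns such as A_n, E_n, V; y: 1 / V_min
X = np.load("features.npy")     # hypothetical input files
y = np.load("inv_vmin.npy")

# L1-regularized polynomial regression (IV.23) with a degree-2 feature map
pr = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                   StandardScaler(),
                   Lasso(alpha=0.03, max_iter=100_000))
pr.fit(X, y)

# L1-regularized logarithmic (log-log) regression (IV.24);
# requires strictly positive features
lr = make_pipeline(StandardScaler(),
                   Lasso(alpha=0.001, max_iter=100_000))
lr.fit(np.log(X), np.log(y))
```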
When we introduce regularization tikhonov1963regularization ; hastie2009elements into polynomial regression and logarithmic regression, we minimize the following loss function between the predicted variable $\hat{y}^{(s)} = 1/\hat{V}_{\min}^{(s)}$ and the expected variable $y^{(s)} = 1/V_{\min}^{(s)}$,

$\mathcal{L} = \sum_{s \in S} \left( \hat{y}^{(s)} - y^{(s)} \right)^2 + R \ , \quad$ (IV.25)

where $R$ is the regularization term in the loss function.

The loss function in (IV.25) is iteratively minimized during the optimization process, and we fix the maximum number of iterative steps for all following computations.
The precise form of the regularization term in the loss function as well as the different regularization schemes in machine learning are discussed in the following section.
V Least Absolute Shrinkage and Selection Operator (Lasso) and Regularization
The Least Absolute Shrinkage and Selection Operator (Lasso) tibshirani1996regression is a machine learning regularization technique primarily employed to prevent overfitting in supervised machine learning. However, it can also be utilized for feature selection. In our work, the overarching goal in employing Lasso is to introduce a machine learning model capable of delivering optimal predictions for the minimum volume for toric Calabi-Yau 3-folds while using the fewest features from the training dataset. For problems such as the one considered in this work, it is essential to be able to obtain formulas with a small number of parameters. As a result, using Lasso is particularly suited for discovering new mathematical formulas such as the one aimed for in this work for the minimum volume for toric Calabi-Yau 3-folds.
In the following section, we give a brief overview of several regularization schemes including Lasso in the context of supervised machine learning for the minimum volume formula for toric Calabi-Yau 3-folds.
Regularization. Regularization in machine learning is a technique usually used to avoid overfitting the dataset during model training. This is done by adding a penalty term to the loss function. The introduction of the added regularization term $R$, resulting in an updated loss function of the form,

$\mathcal{L} = \sum_{s \in S} \left( \hat{y}^{(s)} - y^{(s)} \right)^2 + R \ , \quad$ (V.26)

serves the purpose of constraining the possible parameter values within the supervised machine learning model. In the case of multiple linear regression, as first introduced in Krefl:2017yox and reviewed in section §IV, these parameters would be the real coefficients $c_0$ and $c_j$ in the candidate linear function in (IV.15) for the expected minimum volume given by $1/\hat{V}_{\min}$. By restricting the values of these parameters, regularization effectively makes it harder for the supervised machine learning model to give a candidate function for the minimum volume with many terms. This prevents the machine learning model from overfitting the dataset of minimized volumes for toric Calabi-Yau 3-folds.
Let us review the following three regularization schemes (a small numerical comparison follows after this list):

• L1 Regularization (Lasso). This regularization scheme, also known as Least Absolute Shrinkage and Selection Operator (Lasso) tibshirani1996regression , adds the following linear regularization term to the loss function of the regression model,

$R_{L1} = \alpha \sum_{j} |c_j| \ , \quad$ (V.27)

where $c_j$ are the real parameters of the regression model and $\alpha$ is a real regularization parameter. Increasing the value of $\alpha$ has the effect of increasing the strength of the L1 regularization.

• L2 Regularization (Ridge). Another regularization scheme is known as Ridge regularization or L2 regularization hoerl1970ridge . It adds the following quadratic regularization term to the loss function of the regression model,

$R_{L2} = \alpha \sum_{j} c_j^2 \ , \quad$ (V.28)

where $c_j$ are the real parameters of the regression model and $\alpha$ is again the real regularization parameter.

• Elastic Net (L1 and L2). Elastic Net zou2005regularization is a combination of L1 (Lasso) and L2 (Ridge) regularization and adds the following regularization terms to the loss function,

$R_{EN} = \alpha_1 \sum_{j} |c_j| + \alpha_2 \sum_{j} c_j^2 \ , \quad$ (V.29)

where $\alpha_1$ and $\alpha_2$ are relative real regularization parameters that regulate the proportion of L1 regularization and L2 regularization in this regularization scheme.
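The qualitative difference between the three schemes can be seen in a small numerical experiment: with only two of eight synthetic features being relevant, Lasso and Elastic Net set the irrelevant coefficients exactly to zero, while Ridge merely shrinks them. A minimal sketch with synthetic data and illustrative regularization strengths:

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 8))
y = 2.0 * X[:, 0] - 1.0 * X[:, 3] + 0.1 * rng.normal(size=500)

for model in (Lasso(alpha=0.1),
              Ridge(alpha=0.1),
              ElasticNet(alpha=0.1, l1_ratio=0.5)):
    model.fit(X, y)
    n_zero = int(np.sum(np.isclose(model.coef_, 0.0)))
    print(f"{type(model).__name__:10s} zero coefficients: {n_zero}")
```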
Amongst these regularization schemes in supervised machine learning, we are going to mainly focus on Lasso and L1 regularization for the remainder of this work. While all three regularization schemes share the common goal of constraining the range of values of the model parameters $c_j$, it is noteworthy that only Lasso possesses the unique property of inducing sparsity among the model parameters, resulting in the complete elimination of certain parameters during the training process.
[Figure 6: Parametric plots of the L1 and L2 regularization constraints for two model parameters $c_1$ and $c_2$, following hastie2009elements .]
There are several arguments why Lasso enables the complete elimination of some of the model parameters and the corresponding features in the candidate function for the minimum volume for toric Calabi-Yau 3-folds. In order to illustrate this, let us consider the case with two features $x_1$ and $x_2$ with corresponding parameters $c_1$ and $c_2$, for which the L1 and L2 regularization terms take respectively the following form,

$R_{L1} = \alpha \left( |c_1| + |c_2| \right) \ , \quad R_{L2} = \alpha \left( c_1^2 + c_2^2 \right) \ . \quad$ (V.30)

If we assume that under optimization the regularization terms reach fixed values for $c_1$ and $c_2$, we can draw the parametric plots for the two regularization terms as shown in Figure 6 hastie2009elements . We can see from the plots in Figure 6 that for L1 regularization, the minimum of the total loss function is more likely achieved when one of the two parameters $c_1$ or $c_2$ approaches 0. This is in part due to the absolute values taken for the parameters in the linear L1 regularization term.
As a result, Lasso regularization is particularly suited for feature selection and parameter elimination in regression models.
In our work, we employ L1 regularization to derive a formula for the minimum volume of Sasaki-Einstein 5-manifolds corresponding to toric Calabi-Yau 3-folds that is interpretable, presentable and reusable.
VI Candidates for Minimum Volume Functions
In this work, our aim is to apply Lasso regularization in order to identify explicit formulas for the minimum volume for toric Calabi-Yau 3-folds. In doing so, we maximize the accuracy of the formulas that we find while minimizing the number of parameters the formulas depend on, making them interpretable and readily presentable.
Table 3: Optimal candidate formulas for the inverse minimum volume $1/\hat{V}_{\min}$ under L1-regularized polynomial regression for the four datasets, with the optimal regularization parameter $\alpha^*$, the number of non-zero coefficients $n_c$ and the $R^2$-score.

| data set | $\alpha^*$ | $n_c$ | $R^2$ |
|---|---|---|---|
| $S_1$ | 0.03548 | 3 | 0.98354 |
| $S_2$ | 0.01995 | 3 | 0.98697 |
| $S_3$ | 0.97724 | 3 | 0.98743 |
| $S_4$ | 0.97724 | 3 | 0.98740 |
Table 4: Optimal candidate formulas for the inverse minimum volume $1/\hat{V}_{\min}$ under L1-regularized logarithmic regression for the four datasets, with the optimal regularization parameter $\alpha^*$, the number of non-zero coefficients $n_c$ and the $R^2$-score.

| data set | $\alpha^*$ | $n_c$ | $R^2$ |
|---|---|---|---|
| $S_1$ | 0.00045 | 6 | 0.98932 |
| $S_2$ | 0.00032 | 6 | 0.98992 |
| $S_3$ | 0.00112 | 3 | 0.99281 |
| $S_4$ | 0.00112 | 3 | 0.99297 |
Parameter Sparsity vs Accuracy. Like in all regression problems, we introduce a measure of how well the model fits the observed data using the $R^2$-score montgomery2021introduction ; hastie2009elements given by,

$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} \ , \quad$ (VI.31)

where the residual sum of squares is given by,

$SS_{res} = \sum_{s \in S} \left( y^{(s)} - \hat{y}^{(s)} \right)^2 \ , \quad$ (VI.32)

and the total sum of squares is given by,

$SS_{tot} = \sum_{s \in S} \left( y^{(s)} - \bar{y} \right)^2 \ . \quad$ (VI.33)

Here, $\hat{y}^{(s)}$ denotes the predicted value for the inverse minimum volume given by $1/\hat{V}_{\min}^{(s)}$, whereas $\bar{y}$ denotes the mean of the expected values $y^{(s)} = 1/V_{\min}^{(s)}$.
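Written out, the $R^2$-score of (VI.31)-(VI.33) is a few lines of code; a minimal sketch (equivalent to sklearn.metrics.r2_score):

```python
import numpy as np

def r2_score(y, y_pred):
    """R^2 = 1 - SS_res / SS_tot, cf. (VI.31)-(VI.33)."""
    ss_res = np.sum((y - y_pred) ** 2)      # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot
```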
We recall that the optimization problem for the L1-regularized regression model is to minimize the loss function with the L1 regularization term. As we discussed in the sections above, this optimization problem focuses on minimizing the mean squared error with a penalty for non-zero coefficients $c_j$, which depends on the regularization parameter $\alpha$.
Here, we note that there is an additional optimization problem regarding the maximization of the $R^2$-score in (VI.31) and the minimization of the number of non-zero coefficients $n_c$. We can formulate this additional optimization problem as follows,

$\alpha^* = \underset{\alpha}{\mathrm{arg\,max}} \left( R^2(\alpha) - \lambda\, n_c(\alpha) \right) \ , \quad$ (VI.34)

where $n_c = |\{ j\ |\ c_j \neq 0 \}|$, and the values of the coefficients $c_j$ and the $R^2$-score all depend on the regularization parameter $\alpha$. $\lambda$ is a positive hyperparameter that regulates how much we value sparsity of feature coefficients over the accuracy of the estimate given by the $R^2$-score.
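In practice this model selection can be carried out as a grid scan over the regularization parameter. A minimal sketch, assuming the trade-off is scored as in (VI.34); the value of $\lambda$ and the grid of $\alpha$ values are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

def select_alpha(X, y, alphas, lam=1e-3):
    """Scan alpha and maximize R^2(alpha) - lam * n_c(alpha), cf. (VI.34),
    with R^2 evaluated on a fixed 80/20 train-test split."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=0)
    best = None
    for alpha in alphas:
        model = Lasso(alpha=alpha, max_iter=100_000).fit(X_tr, y_tr)
        n_c = int(np.sum(~np.isclose(model.coef_, 0.0)))
        score = model.score(X_te, y_te) - lam * n_c  # R^2 minus sparsity penalty
        if best is None or score > best[0]:
            best = (score, alpha, n_c)
    return best

# e.g. best_score, alpha_star, n_c = select_alpha(X, y, np.logspace(-4, 0, 50))
```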
Candidate Formulas. The candidate formulas for the minimum volume for toric Calabi-Yau 3-folds are identified by an optimal regularization parameter $\alpha^*$ that maximizes the $R^2$-score of the candidate formula and minimizes the number of non-zero coefficients $n_c$ corresponding to features in the chosen regression model. In order to identify the optimal regularization parameter $\alpha^*$ for the optimization problem in (VI.34), we search in a given fixed range for $\alpha$ as specified in Figure 7 and Figure 8. We do the search for the optimal regularization parameter for all four datasets in Table 2, for both L1-regularized polynomial regression and L1-regularized logarithmic regression as discussed in sections §IV and §V. The chosen L1-regularized regression models are trained for a particular value of the regularization parameter $\alpha$ under a fixed randomly chosen 80% training and 20% testing data split, where the corresponding $R^2$-score depending on $\alpha$ is obtained from the testing data.
[Figures 7 and 8: The L1 regularization parameter $\alpha$ plotted against the standardized coefficients, the number of non-zero coefficients $n_c$ and the $R^2$-score, for L1-regularized polynomial regression (Figure 7) and L1-regularized logarithmic regression (Figure 8).]
Figure 7 shows, respectively for datasets $S_1$ and $S_2$, plots of the L1 regularization parameter $\alpha$ for polynomial regression against the standardized coefficients, against the number of non-zero coefficients $n_c$, and against the $R^2$-score. Here, the standardized coefficients are obtained when the training is conducted over normalized features. When the training is completed for a specific value of $\alpha$, the candidate formula for the minimum volume given by $1/\hat{V}_{\min}$ is obtained by reversing the normalization on the features, giving us the coefficients of the candidate formula. We also have Figure 8, which shows, respectively for datasets $S_1$ and $S_2$, plots of the L1 regularization parameter $\alpha$ for logarithmic regression against the standardized coefficients, the number of non-zero coefficients $n_c$ and the $R^2$-score. Similar plots can also be obtained for datasets $S_3$ and $S_4$ for both L1-regularized polynomial regression and L1-regularized logarithmic regression.

Overall, the plots illustrate that the identified optimal regularization parameters $\alpha^*$ minimize the number of non-zero coefficients $n_c$ in the formula estimating the minimum volume given by $1/\hat{V}_{\min}$, as well as maximize the accuracy of the formulas as measured by the $R^2$-score. Table 3 and Table 4 summarize respectively the most optimal candidate formulas for the minimum volume under L1-regularized polynomial regression and L1-regularized logarithmic regression for the four datasets in Table 2, with the corresponding optimal regularization parameters $\alpha^*$, the corresponding numbers of non-zero coefficients $n_c$ and the $R^2$-scores.
A closer look reveals that for all models, the identified optimal regularization parameters $\alpha^*$ result in formulas that approximate the minimum volume extremely well for all the datasets $S_1$, $S_2$, $S_3$ and $S_4$. Overall, the L1-regularized logarithmic regression models seem to give more accurate results than the L1-regularized polynomial regression models, with higher $R^2$-scores over all datasets. In particular, the L1-regularized logarithmic regression models trained on datasets $S_3$ and $S_4$ have $R^2$-scores above $0.99$, which is exceptionally high.
Having a closer look at explicit examples of toric Calabi-Yau 3-folds in the datasets reveals, however, that the performance of the regularized regression models can vary between different toric Calabi-Yau 3-folds. For example, focusing on the L1-regularized logarithmic regression models trained on $S_1$ and $S_2$, we observe that the minimum volume formulas in Table 4 perform differently for toric diagrams with smaller areas compared to toric diagrams with larger areas, as illustrated in Figure 9. Similar observations can be made for the L1-regularized logarithmic regression models trained on $S_3$ and $S_4$, as well as for the L1-regularized polynomial regression models.
In summary, we can calculate the expected relative percentage errors of the predicted minimum volumes given by $1/\hat{V}_{\min}$ and the corresponding standard deviations for the L1-regularized logarithmic regression models as follows,

$\mathrm{err} = \left\langle \frac{\left| 1/\hat{V}_{\min}^{(s)} - 1/V_{\min}^{(s)} \right|}{1/V_{\min}^{(s)}} \right\rangle_{s \in S} \times 100\% \ . \quad$ (VI.35)
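A minimal sketch of this error measure (our own helper; y and y_pred denote the actual and predicted inverse minimum volumes):

```python
import numpy as np

def relative_percentage_error(y, y_pred):
    """Expected relative percentage error and its standard deviation,
    cf. (VI.35), for y = 1/V_min and predictions y_pred."""
    err = 100.0 * np.abs(y_pred - y) / y
    return float(err.mean()), float(err.std())
```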
We note that the models trained on $S_1$ and $S_2$ have a larger expected relative percentage error than the ones trained on $S_3$ and $S_4$. This is partly due to the fact that $S_3$ and $S_4$ contain randomly selected toric diagrams in a lattice box and a circle, respectively, whereas $S_1$ and $S_2$ contain the full set of toric diagrams in a lattice box and a circle, respectively, as defined in Table 2.
We also note that the $R^2$-scores of the L1-regularized logarithmic regression models in Table 4,

$R^2 = 0.98932,\ 0.98992,\ 0.99281,\ 0.99297 \ , \quad$ (VI.36)

for $S_1$, $S_2$, $S_3$ and $S_4$ respectively, are overall very high and close to $1$. Compared to the expected relative percentage errors in (VI.35), which measure how far off predictions of the minimum volume given by $1/\hat{V}_{\min}$ are, the $R^2$-score is a measure of the accuracy of the trained regression model. It quantifies the proportion of the variation in $1/V_{\min}$ that can be predicted using the features selected from the corresponding toric diagrams of the toric Calabi-Yau 3-folds.
VII Discussions and Conclusions
With this work, we demonstrated that employing regularization in machine learning models can effectively address the limitations posed by supervised machine learning techniques applied to problems that occur in the context of string theory. In particular, we have shown that the minimum volume for Sasaki-Einstein 5-manifolds corresponding to toric Calabi-Yau 3-folds can be expressed in terms of just 3 features of the associated toric diagrams with an $R^2$-score above $0.99$. These 3 features are the area of the toric diagram $\Delta$, the number of vertices in $\Delta$, and the number of internal points in the $n$-enlarged toric diagram $\Delta_n$.

By simultaneously maximizing the $R^2$-score and minimizing the number of surviving parameters in the candidate function for $1/\hat{V}_{\min}$ through varying the regularization strength given by the regularization parameter $\alpha$, the proposed regularized regression models in this work give far more presentable, interpretable and explainable results than our previous work in Krefl:2017yox . Above all, as suggested in Figure 9, the candidate formulas for the minimum volumes of toric Calabi-Yau 3-folds obtained in this study are concise enough to facilitate the examination of why some toric Calabi-Yau 3-folds are associated with minimum volumes that are more challenging to predict than those of certain other toric Calabi-Yau 3-folds. We plan to report on these investigations in the near future. We foresee that the application of regularization schemes to other supervised machine learning applications in string theory will open up equally promising research opportunities in the future.
Acknowledgements.
R.K.-S. would like to thank the Simons Center for Geometry and Physics at Stony Brook University, the City University of New York Graduate Center, the Institute for Basic Science Center for Geometry and Physics, as well as the Kavli Institute for the Physics and Mathematics of the Universe for hospitality during various stages of this work. He is supported by a Basic Research Grant of the National Research Foundation of Korea (NRF-2022R1F1A1073128). He is also supported by a Start-up Research Grant for new faculty at UNIST (1.210139.01), a UNIST AI Incubator Grant (1.230038.01) and UNIST UBSI Grants (1.230168.01, 1.230078.01), as well as an Industry Research Project (2.220916.01) funded by Samsung SDS in Korea. He is also partly supported by the BK21 Program ("Next Generation Education Program for Mathematical Sciences", 4299990414089) funded by the Ministry of Education in Korea and the National Research Foundation of Korea (NRF).

References
- (1) Y.-H. He, Deep-Learning the Landscape, 1706.02714.
- (2) D. Krefl and R.-K. Seong, Machine Learning of Calabi-Yau Volumes, Phys. Rev. D 96 (2017) 066014, [1706.03346].
- (3) F. Ruehle, Evolving neural networks with genetic algorithms to study the String Landscape, JHEP 08 (2017) 038, [1706.07024].
- (4) J. Carifio, J. Halverson, D. Krioukov and B. D. Nelson, Machine Learning in the String Landscape, JHEP 09 (2017) 157, [1707.00655].
- (5) A. Cole, A. Schachner and G. Shiu, Searching the Landscape of Flux Vacua with Genetic Algorithms, JHEP 11 (2019) 045, [1907.10072].
- (6) A. Cole, G. J. Loges and G. Shiu, Interpretable Phase Detection and Classification with Persistent Homology, in 34th Conference on Neural Information Processing Systems, 12, 2020. 2012.00783.
- (7) J. Halverson, A. Maiti and K. Stoner, Neural Networks and Quantum Field Theory, Mach. Learn. Sci. Tech. 2 (2021) 035002, [2008.08601].
- (8) S. Gukov, J. Halverson, F. Ruehle and P. Sułkowski, Learning to Unknot, Mach. Learn. Sci. Tech. 2 (2021) 025035, [2010.16263].
- (9) S. Abel, A. Constantin, T. R. Harvey and A. Lukas, Evolving Heterotic Gauge Backgrounds: Genetic Algorithms versus Reinforcement Learning, Fortsch. Phys. 70 (2022) 2200034, [2110.14029].
- (10) S. Krippendorf, R. Kroepsch and M. Syvaeri, Revealing systematics in phenomenologically viable flux vacua with reinforcement learning, 2107.04039.
- (11) A. Cole, S. Krippendorf, A. Schachner and G. Shiu, Probing the Structure of String Theory Vacua with Genetic Algorithms and Reinforcement Learning, in 35th Conference on Neural Information Processing Systems, 11, 2021. 2111.11466.
- (12) P. Berglund, Y.-H. He, E. Heyes, E. Hirst, V. Jejjala and A. Lukas, New Calabi-Yau Manifolds from Genetic Algorithms, 2306.06159.
- (13) M. Demirtas, J. Halverson, A. Maiti, M. D. Schwartz and K. Stoner, Neural Network Field Theories: Non-Gaussianity, Actions, and Locality, 2307.03223.
- (14) K. Bull, Y.-H. He, V. Jejjala and C. Mishra, Machine Learning CICY Threefolds, Phys. Lett. B 785 (2018) 65–72, [1806.03121].
- (15) V. Jejjala, A. Kar and O. Parrikar, Deep Learning the Hyperbolic Volume of a Knot, Phys. Lett. B 799 (2019) 135033, [1902.05547].
- (16) C. R. Brodie, A. Constantin, R. Deen and A. Lukas, Machine Learning Line Bundle Cohomology, Fortsch. Phys. 68 (2020) 1900087, [1906.08730].
- (17) Y.-H. He and A. Lukas, Machine Learning Calabi-Yau Four-folds, Phys. Lett. B 815 (2021) 136139, [2009.02544].
- (18) H. Erbin and R. Finotello, Machine learning for complete intersection Calabi-Yau manifolds: a methodological study, Phys. Rev. D 103 (2021) 126014, [2007.15706].
- (19) V. Anagiannis and M. C. N. Cheng, Entangled q-convolutional neural nets, Mach. Learn. Sci. Tech. 2 (2021) 045026, [2103.11785].
- (20) M. Larfors, A. Lukas, F. Ruehle and R. Schneider, Numerical metrics for complete intersection and Kreuzer–Skarke Calabi–Yau manifolds, Mach. Learn. Sci. Tech. 3 (2022) 035014, [2205.13408].
- (21) S. Krippendorf and M. Syvaeri, Detecting Symmetries with Neural Networks, 2003.13679.
- (22) D. S. Berman, Y.-H. He and E. Hirst, Machine learning Calabi-Yau hypersurfaces, Phys. Rev. D 105 (2022) 066002, [2112.06350].
- (23) J. Bao, Y.-H. He and E. Hirst, Neurons on Amoebae, J. Symb. Comput. 116 (2022) 1–38, [2106.03695].
- (24) R.-K. Seong, Unsupervised Machine Learning Techniques for Exploring Tropical Coamoeba, Brane Tilings and Seiberg Duality, 2309.05702.
- (25) D. Martelli, J. Sparks and S.-T. Yau, Sasaki-Einstein manifolds and volume minimisation, Commun. Math. Phys. 280 (2008) 611–673, [hep-th/0603021].
- (26) D. Martelli, J. Sparks and S.-T. Yau, The geometric dual of a-maximisation for toric Sasaki- Einstein manifolds, Commun. Math. Phys. 268 (2006) 39–65, [hep-th/0503183].
- (27) W. Fulton, Introduction to toric varieties. Annals of mathematics studies. Princeton Univ. Press, Princeton, NJ, 1993.
- (28) N. C. Leung and C. Vafa, Branes and Toric Geometry, ArXiv High Energy Physics - Theory e-prints (Nov., 1997), [hep-th/9711013].
- (29) B. R. Greene, String theory on Calabi-Yau manifolds, in Theoretical Advanced Study Institute in Elementary Particle Physics (TASI 96): Fields, Strings, and Duality, pp. 543–726, 6, 1996. hep-th/9702155.
- (30) M. R. Douglas, B. R. Greene and D. R. Morrison, Orbifold resolution by D-branes, Nucl.Phys. B506 (1997) 84–106, [hep-th/9704151].
- (31) E. Witten, Anti-de Sitter space and holography, Adv. Theor. Math. Phys. 2 (1998) 253–291, [hep-th/9802150].
- (32) I. R. Klebanov and E. Witten, Superconformal field theory on three-branes at a Calabi-Yau singularity, Nucl.Phys. B536 (1998) 199–218, [hep-th/9807080].
- (33) M. R. Douglas and G. W. Moore, D-branes, Quivers, and ALE Instantons, hep-th/9603167.
- (34) A. E. Lawrence, N. Nekrasov and C. Vafa, On conformal field theories in four-dimensions, Nucl.Phys. B533 (1998) 199–209, [hep-th/9803015].
- (35) B. Feng, A. Hanany and Y.-H. He, D-brane gauge theories from toric singularities and toric duality, Nucl. Phys. B595 (2001) 165–200, [hep-th/0003085].
- (36) B. Feng, A. Hanany and Y.-H. He, Phase structure of D-brane gauge theories and toric duality, JHEP 08 (2001) 040, [hep-th/0104259].
- (37) J. M. Maldacena, The large N limit of superconformal field theories and supergravity, Adv. Theor. Math. Phys. 2 (1998) 231–252, [hep-th/9711200].
- (38) D. R. Morrison and M. R. Plesser, Nonspherical horizons. 1., Adv.Theor.Math.Phys. 3 (1999) 1–81, [hep-th/9810201].
- (39) B. S. Acharya, J. M. Figueroa-O’Farrill, C. M. Hull and B. J. Spence, Branes at conical singularities and holography, Adv. Theor. Math. Phys. 2 (1999) 1249–1286, [hep-th/9808014].
- (40) K. A. Intriligator and B. Wecht, The Exact superconformal R symmetry maximizes a, Nucl. Phys. B 667 (2003) 183–200, [hep-th/0304128].
- (41) A. Butti and A. Zaffaroni, R-charges from toric diagrams and the equivalence of a- maximization and Z-minimization, JHEP 11 (2005) 019, [hep-th/0506232].
- (42) A. Butti and A. Zaffaroni, From toric geometry to quiver gauge theory: The Equivalence of a-maximization and Z-minimization, Fortsch.Phys. 54 (2006) 309–316, [hep-th/0512240].
- (43) S. S. Gubser, Einstein manifolds and conformal field theories, Phys. Rev. D 59 (1999) 025006, [hep-th/9807164].
- (44) M. Henningson and K. Skenderis, The Holographic Weyl anomaly, JHEP 07 (1998) 023, [hep-th/9806087].
- (45) S. Benvenuti, B. Feng, A. Hanany and Y.-H. He, Counting BPS operators in gauge theories: Quivers, syzygies and plethystics, JHEP 11 (2007) 050, [hep-th/0608050].
- (46) B. Feng, A. Hanany and Y.-H. He, Counting Gauge Invariants: the Plethystic Program, JHEP 03 (2007) 090, [hep-th/0701063].
- (47) C.-F. Gauss, Theoria combinationis observationum erroribus minimis obnoxiae. Henricus Dieterich, 1823.
- (48) R. A. Fisher, On the mathematical foundations of theoretical statistics, Philosophical transactions of the Royal Society of London. Series A, containing papers of a mathematical or physical character 222 (1922) 309–368.
- (49) W. Mendenhall, T. Sincich and N. S. Boudreau, A second course in statistics: regression analysis, vol. 6. Prentice Hall Upper Saddle River, NJ, 2003.
- (50) D. A. Freedman, Statistical models: theory and practice. cambridge university press, 2009.
- (51) J. D. Jobson, Applied multivariate data analysis: regression and experimental design. Springer Science & Business Media, 2012.
- (52) Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (1998) 2278–2324.
- (53) A. Krizhevsky, I. Sutskever and G. E. Hinton, Imagenet classification with deep convolutional neural networks, Advances in neural information processing systems 25 (2012) .
- (54) Y. LeCun, Y. Bengio and G. Hinton, Deep learning, nature 521 (2015) 436–444.
- (55) J. Schmidhuber, Deep learning in neural networks: An overview, Neural networks 61 (2015) 85–117.
- (56) D. E. Rumelhart, G. E. Hinton and R. J. Williams, Learning representations by back-propagating errors, nature 323 (1986) 533–536.
- (57) T. Hastie, R. Tibshirani, J. H. Friedman and J. H. Friedman, The elements of statistical learning: data mining, inference, and prediction, vol. 2. Springer, 2009.
- (58) K. Hori and C. Vafa, Mirror symmetry, hep-th/0002222.
- (59) B. Feng, Y.-H. He, K. D. Kennaway and C. Vafa, Dimer models from mirror symmetry and quivering amoebae, Adv. Theor. Math. Phys. 12 (2008) 489–545, [hep-th/0511287].
- (60) A. Tikhonov, Regularization of incorrectly posed problems, in Soviet Math. Dokl., pp. 1624–1627, 1963.
- (61) R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society Series B: Statistical Methodology 58 (1996) 267–288.
- (62) D. Martelli and J. Sparks, Toric geometry, Sasaki-Einstein manifolds and a new infinite class of AdS/CFT duals, Commun. Math. Phys. 262 (2006) 51–89, [hep-th/0411238].
- (63) S. Benvenuti, S. Franco, A. Hanany, D. Martelli and J. Sparks, An infinite family of superconformal quiver gauge theories with Sasaki-Einstein duals, JHEP 06 (2005) 064, [hep-th/0411264].
- (64) S. Benvenuti and M. Kruczenski, From Sasaki-Einstein spaces to quivers via BPS geodesics: $L^{p,q|r}$, JHEP 04 (2006) 033, [hep-th/0505206].
- (65) A. Butti, D. Forcella and A. Zaffaroni, The dual superconformal theory for $L^{p,q,r}$ manifolds, JHEP 09 (2005) 018, [hep-th/0505220].
- (66) S. Franco, A. Hanany, K. D. Kennaway, D. Vegh and B. Wecht, Brane Dimers and Quiver Gauge Theories, JHEP 01 (2006) 096, [hep-th/0504110].
- (67) A. Hanany and K. D. Kennaway, Dimer models and toric diagrams, hep-th/0503149.
- (68) S. Franco et al., Gauge theories from toric geometry and brane tilings, JHEP 01 (2006) 128, [hep-th/0505211].
- (69) R. Kenyon, An introduction to the dimer model, ArXiv Mathematics e-prints (Oct., 2003), [math/0310326].
- (70) P. Kasteleyn, Graph theory and crystal physics, Graph theory and theoretical physics (1967) 43–110.
- (71) F. Hirzebruch, Singularities and exotic spheres. Societe Mathematic de France, 1968.
- (72) E. Brieskorn, Beispiele zur differentialtopologie von singularitäten, Inventiones mathematicae 2 (1966) 1–14.
- (73) E. Witten, Phases of N = 2 theories in two dimensions, Nucl. Phys. B403 (1993) 159–222, [hep-th/9301042].
- (74) A. Butti, D. Forcella, A. Hanany, D. Vegh and A. Zaffaroni, Counting Chiral Operators in Quiver Gauge Theories, JHEP 0711 (2007) 092, [0705.2771].
- (75) A. Hanany and A. Zaffaroni, The master space of supersymmetric gauge theories, Adv.High Energy Phys. 2010 (2010) 427891.
- (76) D. Forcella, A. Hanany, Y.-H. He and A. Zaffaroni, The Master Space of N=1 Gauge Theories, JHEP 0808 (2008) 012, [0801.1585].
- (77) D. Forcella, A. Hanany, Y.-H. He and A. Zaffaroni, Mastering the Master Space, Lett.Math.Phys. 85 (2008) 163–171, [0801.3477].
- (78) P. Pouliot, Molien function for duality, JHEP 01 (1999) 021, [hep-th/9812015].
- (79) N. Seiberg, Electric - magnetic duality in supersymmetric nonAbelian gauge theories, Nucl. Phys. B435 (1995) 129–146, [hep-th/9411149].
- (80) C. E. Beasley and M. Ronen Plesser, Toric duality is Seiberg duality, JHEP 12 (2001) 001, [hep-th/0109053].
- (81) I. Goodfellow, Y. Bengio and A. Courville, Deep learning. MIT press, 2016.
- (82) G. Pick, Geometrisches zur zahlenlehre, Sitzenber. Lotos (Prague) 19 (1899) 311–319.
- (83) P. Berglund, B. Campbell and V. Jejjala, Machine Learning Kreuzer-Skarke Calabi-Yau Threefolds, 2112.09117.
- (84) D. C. Montgomery, E. A. Peck and G. G. Vining, Introduction to linear regression analysis. John Wiley & Sons, 2021.
- (85) A. E. Hoerl and R. W. Kennard, Ridge regression: Biased estimation for nonorthogonal problems, Technometrics 12 (1970) 55–67.
- (86) H. Zou and T. Hastie, Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society Series B: Statistical Methodology 67 (2005) 301–320.