Machine Learning Regularization
for the Minimum Volume Formula of Toric Calabi-Yau 3-folds
Abstract
We present a collection of explicit formulas for the minimum volume of Sasaki-Einstein 5-manifolds. The cone over these 5-manifolds is a toric Calabi-Yau 3-fold. These toric Calabi-Yau 3-folds are associated with an infinite class of supersymmetric gauge theories, which are realized as worldvolume theories of D3-branes probing the toric Calabi-Yau 3-folds. Under the AdS/CFT correspondence, the minimum volume of the Sasaki-Einstein base is inversely proportional to the central charge of the corresponding superconformal field theories. The presented formulas for the minimum volume are in terms of geometric invariants of the toric Calabi-Yau 3-folds. These explicit results are derived by implementing machine learning regularization techniques that advance beyond previous applications of machine learning for determining the minimum volume. Moreover, the use of machine learning regularization allows us to present interpretable and explainable formulas for the minimum volume. Our work confirms that, even for extensive sets of toric Calabi-Yau 3-folds, the proposed formulas approximate the minimum volume with remarkable accuracy.
I Introduction
Since the introduction of machine learning techniques in He:2017aed ; Krefl:2017yox ; Ruehle:2017mzq ; Carifio:2017bov ; Cole:2019enn ; Cole:2020gkd ; Halverson:2020trp ; Gukov:2020qaj ; Abel:2021rrj ; Krippendorf:2021uxu ; Cole:2021nnt ; Berglund:2023ztk ; Demirtas:2023fir for studying problems that occur in the context of string theory, machine learning – both supervised Bull:2018uow ; Jejjala:2019kio ; Brodie:2019dfx ; He:2020lbz ; Erbin:2020tks ; Anagiannis:2021cco ; Larfors:2022nep and unsupervised Krippendorf:2020gny ; Berman:2021mcw ; Bao:2021olg ; Seong:2023njx – has led to a variety of applications in string theory. A problem that appeared particularly suited for machine learning in 2017 Krefl:2017yox was the problem of identifying a formula for the minimum volume of Sasaki-Einstein 5-manifolds Martelli:2006yb ; Martelli:2005tp . The cone over these Sasaki-Einstein 5-manifolds is a toric Calabi-Yau 3-fold fulton ; 1997hep.th…11013L . Given that there are infinitely many toric Calabi-Yau 3-folds with corresponding Sasaki-Einstein 5-manifolds and that there is an infinite class of supersymmetric gauge theories associated to them via string theory Greene:1996cy ; Douglas:1997de ; Witten:1998qj ; Klebanov:1998hh ; Douglas:1996sw ; Lawrence:1998ja ; Feng:2000mi ; Feng:2001xr , this beautiful correspondence between geometry and gauge theory was identified in Krefl:2017yox as an ideal testbed for introducing machine learning for string theory.
These supersymmetric gauge theories corresponding to toric Calabi-Yau 3-folds are realized as worldvolume theories of D3-branes probing the Calabi-Yau singularities. Via the AdS/CFT correspondence Maldacena:1997re ; Morrison:1998cs ; Acharya:1998db , the minimum volume of the Sasaki-Einstein 5-manifolds is related to the maximized $a$-function Intriligator:2003jj ; Butti:2005vn ; Butti:2005ps that gives the central charges of the corresponding superconformal field theories Gubser:1998vd ; Henningson:1998gx . The proposal in Krefl:2017yox was that machine learning techniques can be used to give a formula for the minimum volume in terms of features taken from the toric diagram of the corresponding toric Calabi-Yau 3-folds. Such a formula would significantly simplify the computation of the minimum volume, which conventionally is computed by minimizing the volume function obtained from the equivariant index Martelli:2006yb ; Martelli:2005tp or Hilbert series of the toric Calabi-Yau 3-fold Benvenuti:2006qr ; Feng:2007ur .
In Krefl:2017yox , we made use of multiple linear regression gauss1823theoria ; fisher1922mathematical ; mendenhall2003second ; freedman2009statistical ; jobson2012applied and a combination of a regression model and a convolutional neural network (CNN) lecun1998gradient ; krizhevsky2012imagenet ; lecun2015deep ; schmidhuber2015deep to learn the minimum volume for toric Calabi-Yau 3-folds. As is often the case for supervised machine learning rumelhart1986learning ; hastie2009elements , the models lacked interpretability and explainability, achieving high accuracies in estimating the minimum volume while giving only little insight into the mathematical structure and physical origin of the estimating formula.
Table 1: The Type IIB brane configuration for brane tilings. The D5-branes extend along the 012346 directions, while the NS5-brane extends along 0123 and wraps a holomorphic curve Σ in the 4567 directions.

| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| D5 | × | × | × | × | × | · | × | · | · | · |
| NS5 | × | × | × | × | Σ | Σ | Σ | Σ | · | · |
In this work, we aim to highlight the pivotal role of regularization techniques in machine learning tikhonov1963regularization ; hastie2009elements . We demonstrate that employing regularized machine learning models can effectively address the limitations inherent in supervised machine learning, especially for problems that appear in string theory and, more broadly, for problems at the intersection of mathematics and physics. While the primary objective of regularization in machine learning is to prevent overfitting, certain versions of it can be employed to eliminate model parameters, echoing the spirit of regularization in quantum field theory.
By focusing on Least Absolute Shrinkage and Selection Operator (Lasso) regularization tibshirani1996regression for polynomial and logarithmic regression models, we identify several candidate formulas for the minimum volume of Sasaki-Einstein 5-manifolds corresponding to toric Calabi-Yau 3-folds. The discovered formulas depend on either 3 or 6 parameters that come from features of the corresponding toric diagrams fulton ; 1997hep.th…11013L – convex lattice polygons on $\mathbb{Z}^2$ that uniquely characterize the associated toric Calabi-Yau 3-fold. Compared to the extremely large number of parameters in the regression and CNN models used in our previous work in Krefl:2017yox , the formulas obtained in this study are presentable, interpretable and, most importantly, reusable for the computation of the minimum volume for toric Calabi-Yau 3-folds.
II Calabi-Yau 3-Folds and Quiver Gauge Theories
In this work, we concentrate on non-compact toric Calabi-Yau 3-folds $X$. These geometries can be considered as cones over Sasaki-Einstein 5-manifolds $Y$ Maldacena:1997re ; Morrison:1998cs ; Acharya:1998db ; Martelli:2004wu ; Benvenuti:2004dy ; Benvenuti:2005ja ; Butti:2005sw . The toric Calabi-Yau 3-folds are fully characterized by convex lattice polygons on $\mathbb{Z}^2$ known as toric diagrams $\Delta$ fulton ; 1997hep.th…11013L . The associated Calabi-Yau singularities can be probed by D3-branes whose worldvolume theories form a class of supersymmetric gauge theories Greene:1996cy ; Douglas:1997de ; Witten:1998qj ; Klebanov:1998hh ; Douglas:1996sw ; Lawrence:1998ja ; Feng:2000mi ; Feng:2001xr .
This class of supersymmetric gauge theories can be represented in terms of a T-dual Type IIB brane configuration known as a brane tiling Franco:2005rj ; Hanany:2005ve ; Franco:2005sm . Table 1 summarizes the Type IIB brane configuration. Brane tilings can be illustrated in terms of bipartite graphs on a 2-torus 2003math…..10326K ; kasteleyn1967graph and encapsulate both the field theory information and the information about the associated toric Calabi-Yau geometry. Figure 1 shows an example of a brane tiling and its associated toric Calabi-Yau 3-fold, which is in this case the cone over the zeroth Hirzebruch surface $F_0$ hirzebruch1968singularities ; brieskorn1966beispiele ; Morrison:1998cs ; Feng:2000mi . The mesonic moduli spaces Witten:1993yc ; Benvenuti:2006qr ; Feng:2007ur ; Butti:2007jv formed by the mesonic gauge invariant operators of these supersymmetric gauge theories with $U(1)$ gauge groups are precisely the associated toric Calabi-Yau 3-folds. When all the gauge groups of the supersymmetric gauge theory are $U(N)$, the mesonic moduli space is given by the $N$-th symmetric product of the toric Calabi-Yau 3-fold.
[Figure 1: A brane tiling and its associated toric Calabi-Yau 3-fold, here the cone over the zeroth Hirzebruch surface $F_0$.]
The gravity dual of the worldvolume theories is Type IIB string theory on $AdS_5 \times Y$, where $Y$ is the Sasaki-Einstein 5-manifold that forms the base of the associated toric Calabi-Yau 3-fold Maldacena:1997re ; Morrison:1998cs ; Acharya:1998db ; Martelli:2004wu ; Benvenuti:2004dy ; Benvenuti:2005ja ; Butti:2005sw . These supersymmetric gauge theories are known to flow at low energies to a superconformal fixed point. Under a procedure known as $a$-maximization Intriligator:2003jj ; Butti:2005vn ; Butti:2005ps , the superconformal $R$-charges of the theory are determined. This procedure involves the maximization of the trial central charge, which takes the form

$a(R) = \frac{3}{32} \left( 3\, \mathrm{Tr}\, R^3 - \mathrm{Tr}\, R \right) \ . \quad$ (II.1)

The maximization procedure gives the value of the central charge $a$ of the superconformal field theory at the conformal fixed point.

Under the AdS/CFT correspondence Maldacena:1997re ; Morrison:1998cs ; Acharya:1998db , the central charge $a$ is directly related to the minimized volume of the corresponding Sasaki-Einstein 5-manifold Gubser:1998vd ; Henningson:1998gx . We have,

$a = \frac{\pi^3 N^2}{4\, \mathrm{Vol}(Y)} \ , \quad$ (II.2)

where the $R$-charges, and as a result the volume function $V(b; Y)$, can be expressed in terms of the Reeb vector components $b_i$ of the corresponding Sasaki-Einstein 5-manifold Martelli:2006yb ; Martelli:2005tp . We can reverse the statement: computing the minimum volume,

$V_{\min} = \min_{b}\ V(b; Y) \ , \quad$ (II.3)

is equivalent to obtaining the maximum value of the central charge $a$. This correspondence holds for all theories living on a stack of $N$ D3-branes probing toric Calabi-Yau 3-folds and has been checked extensively in various examples Intriligator:2003jj ; Butti:2005vn ; Butti:2005ps .
In this work, we will focus on toric Calabi-Yau 3-folds $X$ and the corresponding Sasaki-Einstein 5-manifolds $Y$, with particular emphasis on the minimum volume $V_{\min}$ of the Sasaki-Einstein 5-manifolds. Building on the pioneering work of Krefl:2017yox , this work proposes the use of more advanced machine learning techniques. In particular, we introduce machine learning regularization by using the Least Absolute Shrinkage and Selection Operator (Lasso) tibshirani1996regression in order to identify an explicit formula for the minimum volume of Sasaki-Einstein 5-manifolds $Y$. We expect to be able to write the minimum volume formula in terms of features obtained from the toric diagrams of the corresponding toric Calabi-Yau 3-folds. The use of machine learning regularization allows us to eliminate parameters, reducing the necessary parameters for the volume formula to a manageable number that is interpretable, presentable and reusable.
Before discussing these machine learning techniques, let us first review in the following section the computation of the volume functions for toric Calabi-Yau 3-folds using Hilbert series.
III Hilbert Series and Calabi-Yau Volumes
Given a toric Calabi-Yau 3-fold $X$ realized as a cone over a projective variety $V$, where $V$ is embedded as an affine variety in $\mathbb{C}^k$, the Hilbert series Benvenuti:2006qr ; Feng:2007ur is the generating function for the dimensions of the graded pieces of the coordinate ring

$\mathbb{C}[x_1, \dots, x_k] / \langle f_1, \dots, f_n \rangle \ , \quad$ (III.4)

where $f_1, \dots, f_n$ are the defining polynomials of $V$. Accordingly, the Hilbert series takes the general form

$g(t; X) = \sum_{i=0}^{\infty} \dim_{\mathbb{C}}(X_i)\ t^i \ , \quad$ (III.5)

where $X_i$ is the $i$-th graded piece of the coordinate ring.
For supersymmetric gauge theories given by brane tilings Franco:2005rj ; Hanany:2005ve ; Franco:2005sm , we have an associated toric Calabi-Yau 3-fold $X$, which becomes the mesonic moduli space Witten:1993yc ; Benvenuti:2006qr ; Feng:2007ur ; Butti:2007jv of the supersymmetric gauge theory when the gauge groups are all $U(1)$. The corresponding Hilbert series is the generating function of the mesonic gauge invariant operators that form the mesonic moduli space. For the purpose of the remaining discussion, we will consider the supersymmetric gauge theories given by brane tilings as abelian theories with $U(1)$ gauge groups.

Following the forward algorithm for brane tilings Feng:2000mi , we can use GLSM fields Witten:1993yc given by perfect matchings $p_\alpha$ Hanany:2005ve ; Franco:2005rj of the brane tilings in order to express the mesonic moduli space of the abelian supersymmetric gauge theory as the following symplectic quotient,

$\mathcal{M}^{mes} = \mathrm{Irr}\, \mathcal{F}^{\flat} /\!/ Q_D \ , \quad \mathrm{Irr}\, \mathcal{F}^{\flat} = \mathbb{C}^c /\!/ Q_F \ , \quad$ (III.6)

where $\mathrm{Irr}\, \mathcal{F}^{\flat}$ is the largest irreducible component, also known as the coherent component, of the master space $\mathcal{F}^{\flat}$ Hanany:2010zz ; Forcella:2008bb ; Forcella:2008eh of the supersymmetric gauge theory. The master space is the spectrum of the coordinate ring generated by the chiral fields and quotiented by the F-term relations. In (III.6), $Q_F$ is the $F$-term charge matrix summarizing the charges originating from the $F$-terms, and $Q_D$ is the $D$-term charge matrix which summarizes the gauge charges on the perfect matchings $p_\alpha$.
Following the symplectic quotient description of the mesonic moduli space in (III.6), the Hilbert series can be obtained by solving the Molien integral Pouliot:1998yv ,

$g(y_\alpha; X) = \prod_{i=1}^{c-3} \oint_{|z_i|=1} \frac{\mathrm{d}z_i}{2\pi i\, z_i}\ \prod_{\alpha=1}^{c} \frac{1}{1 - y_\alpha \prod_{i=1}^{c-3} z_i^{(Q_t)_{i\alpha}}} \ , \quad$ (III.7)

where $c$ is the number of perfect matchings in the brane tiling and $Q_t = (Q_F;\ Q_D)$ is the total charge matrix.
Martelli:2006yb ; Martelli:2005tp showed that the same Hilbert series can be obtained directly from the toric diagram $\Delta$ of the toric Calabi-Yau 3-fold $X$. Given that the toric diagram $\Delta$ is a convex lattice polygon on $\mathbb{Z}^2$ with an ideal triangulation into unit sub-triangles $\tau_a$, the Hilbert series of the corresponding toric Calabi-Yau 3-fold can be written as

$g(t_1, t_2, t_3; X) = \sum_{a} \prod_{b=1}^{3} \frac{1}{1 - \mathbf{t}^{u_{a,b}}} \ , \quad$ (III.8)

where $a$ is the index for the unit triangles $\tau_a$, and $b$ is the index for the boundary edges $e_{a,b}$ of each unit triangle $\tau_a$. For each boundary edge $e_{a,b}$, we have a 3-dimensional outer normal vector $u_{a,b}$ whose components are assigned the following product of fugacities,

$\mathbf{t}^{u_{a,b}} = \prod_{i=1}^{3} t_i^{(u_{a,b})_i} \ , \quad$ (III.9)

where $(u_{a,b})_i$ indicates the $i$-th component of $u_{a,b}$. We note that $u_{a,b}$ is a 3-dimensional vector because the defining vertices of $\Delta$ and $\tau_a$ are all on a plane at height 1, such that their coordinates take the form $(x, y, 1)$. As a result, the vector $u_{a,b}$ corresponding to edge $e_{a,b}$ is normal to the plane spanned by the two vectors connecting the origin to the bounding vertices of $e_{a,b}$.
[Figure 2: (a) The toric diagram $\Delta$, its triangulation and the outer normal vectors for the cone over the zeroth Hirzebruch surface $F_0$; (b) the outer normal vectors $u_{a,b}$ for each of the four unit sub-triangles.]
It is important to note that the fugacities $t_i$ in (III.9) relate to the components of the normal vectors $u_{a,b}$, and therefore depend on the triangulation and the particular instance in a given $GL(2,\mathbb{Z})$ orbit of a toric diagram on the $\mathbb{Z}^2$ plane. In comparison, the fugacities $y_\alpha$ in (III.7) refer to the GLSM fields given by perfect matchings $p_\alpha$ of the corresponding brane tiling. Since perfect matchings can be mapped directly to chiral fields in the supersymmetric gauge theory, the fugacities $y_\alpha$ in (III.7) can be mapped to fugacities counting global symmetry charges carried by chiral fields in the theory. Because both Hilbert series from (III.7) and (III.8) refer to the same toric Calabi-Yau 3-fold $X$, there exists a fugacity map between $t_i$ and $y_\alpha$ that identifies the two Hilbert series with each other.

For the rest of the discussion, let us consider Hilbert series for toric Calabi-Yau 3-folds that are expressed in terms of the fugacities $t_i$ corresponding to coordinates of the normal vectors of the toric diagram $\Delta$. Given the Hilbert series $g(t_i; X)$, we can obtain the volume function $V(b; Y)$ Martelli:2006yb ; Martelli:2005tp of the Sasaki-Einstein 5-manifold $Y$ using,

$V(b; Y) = \lim_{\mu \to 0}\ \mu^3\ g(t_i = e^{-\mu b_i}; X) \ , \quad$ (III.10)

where $b_i$ are the Reeb vector components with $i = 1, 2, 3$. We note that the Reeb vector $b = (b_1, b_2, b_3)$ always lies in the interior of the toric diagram and can be chosen such that one of its components is set to

$b_3 = 3 \quad$ (III.11)

for toric Calabi-Yau 3-folds $X$. We further note that the limit in (III.10) takes the leading order in $\mu$ in the expansion of $g(e^{-\mu b_i}; X)$, which was shown in Martelli:2006yb ; Martelli:2005tp to give the volume of the Sasaki-Einstein base $Y$.
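Taking the limit in (III.10) term by term in (III.8) gives the volume function as a rational function of the Reeb vector, $V(b; Y) = \sum_{a} \prod_{j=1}^{3} \langle u_{a,j}, b \rangle^{-1}$. The following is a minimal sketch of this computation in Python; the representation of the toric diagram as a list of unit triangles, and the sign convention for the normals (oriented towards the third vertex of each triangle, which reproduces the known examples below), are our own illustrative choices.

```python
import numpy as np

def edge_normals(triangle):
    """Normals u_{a,b} for the three edges of a unit triangle whose
    vertices (x, y) are lifted to height 1, i.e. to (x, y, 1).
    Each normal is perpendicular to the plane spanned by the two lifted
    endpoints of its edge, oriented towards the third vertex."""
    P = [np.array([x, y, 1]) for (x, y) in triangle]
    normals = []
    for i in range(3):
        n = np.cross(P[i], P[(i + 1) % 3])
        if np.dot(n, P[(i + 2) % 3]) < 0:
            n = -n
        normals.append(n)
    return normals

def volume_function(triangles, b):
    """V(b; Y) as the mu -> 0 limit in (III.10) applied to (III.8):
    each factor 1/(1 - exp(-mu <u, b>)) contributes 1/(mu <u, b>)."""
    total = 0.0
    for tri in triangles:
        term = 1.0
        for u in edge_normals(tri):
            term /= np.dot(u, b)  # poles of single terms cancel in the sum
        total += term
    return total

# Example: the four unit triangles of the F0 toric diagram used below
F0 = [[(0, 0), (1, 0), (0, 1)], [(0, 0), (0, 1), (-1, 0)],
      [(0, 0), (-1, 0), (0, -1)], [(0, 0), (0, -1), (1, 0)]]
print(volume_function(F0, np.array([0.5, 0.3, 3.0])))  # ~ 0.32040
```

For a generic Reeb vector in the interior of the toric diagram the sum is finite, even though individual triangle contributions can have poles; numerically, one simply avoids evaluating exactly on these loci.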
Let us consider in the following paragraph an example of the computation of the volume function in terms of Reeb vector components for the Sasaki-Einstein base $Y$ of the cone over the zeroth Hirzebruch surface $F_0$ hirzebruch1968singularities ; brieskorn1966beispiele ; Morrison:1998cs ; Feng:2000mi .

Example: $F_0$. The toric diagram, its triangulation and the outer normal vectors for the cone over the zeroth Hirzebruch surface $F_0$ hirzebruch1968singularities ; brieskorn1966beispiele ; Morrison:1998cs ; Feng:2000mi are shown in Figure 2(a). The cone over the zeroth Hirzebruch surface is an interesting toric Calabi-Yau 3-fold because it has two distinct corresponding supersymmetric gauge theories, represented by two distinct brane tilings that are related by Seiberg duality Seiberg:1994pq ; 2001JHEP…12..001B ; Feng:2000mi . One of the brane tilings is shown in Figure 1.

Using the outer normal vectors $u_{a,b}$ for each of the four unit sub-triangles of the toric diagram for $F_0$ in Figure 2(b), we can use (III.8) to write down the Hilbert series,
$g(t_i; F_0) = \frac{1}{(1-t_1)(1-t_2)\left(1-\frac{t_3}{t_1 t_2}\right)} + \frac{1}{\left(1-\frac{1}{t_1}\right)(1-t_2)\left(1-\frac{t_1 t_3}{t_2}\right)} + \frac{1}{\left(1-\frac{1}{t_1}\right)\left(1-\frac{1}{t_2}\right)\left(1-t_1 t_2 t_3\right)} + \frac{1}{(1-t_1)\left(1-\frac{1}{t_2}\right)\left(1-\frac{t_2 t_3}{t_1}\right)} \ . \quad$ (III.12)
Using the limit in (III.10), we can derive the volume function of the Sasaki-Einstein base $Y$ directly from the Hilbert series as follows,

$V(b; Y) = \frac{24}{\left( 9 - (b_1 + b_2)^2 \right) \left( 9 - (b_1 - b_2)^2 \right)} \ , \quad$ (III.13)

where $b_3 = 3$. When we find the global minimum of the volume function $V(b; Y)$, we obtain

$V_{\min} = \frac{8}{27} \simeq 0.29630 \quad$ (III.14)

up to 5 decimal points, which occurs at critical Reeb vector components $(b_1^*, b_2^*) = (0, 0)$.
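As a consistency check, this minimization can also be carried out numerically. A minimal sketch using scipy on the closed-form volume function above; the optimizer choice and starting point are our own:

```python
import numpy as np
from scipy.optimize import minimize

# Volume function of the cone over F0 with b_3 = 3, cf. (III.13)
def vol_F0(b):
    return 24.0 / ((9 - (b[0] + b[1]) ** 2) * (9 - (b[0] - b[1]) ** 2))

res = minimize(vol_F0, x0=np.array([0.5, 0.5]), method="Nelder-Mead")
print(res.x, res.fun)
# -> approximately (0, 0) and 0.29630, i.e. V_min = 8/27
```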
In the remainder of this work, we will maintain a precision level of 5 decimal points for all numerical measurements.
IV Features of Toric Diagrams and Regression
The aim of this work is to identify an expression for the minimum volume $V_{\min}$ of Sasaki-Einstein 5-manifolds $Y$ in terms of parameters that we know from the corresponding toric Calabi-Yau 3-folds $X$. We refer to these parameters as features, denoted as $x_j$, of the toric Calabi-Yau 3-fold $X$.

Assuming that we have $n$ features for a given toric Calabi-Yau 3-fold, the proposal in Krefl:2017yox states that we can write down a candidate linear function for the inverse minimum volume in terms of these features as follows,

$\frac{1}{\hat{V}_{\min}^{(s)}} = c_0 + \sum_{j=1}^{n} c_j\, x_j^{(s)} \ , \quad$ (IV.15)

where $c_0$ and $c_j$ are real coefficients, and $s$ labels the particular toric Calabi-Yau 3-fold $X^{(s)}$ with its corresponding toric diagram $\Delta^{(s)}$.
Let us refer to the inverse of the actual minimum volume obtained by volume minimization as $1/V_{\min}^{(s)}$ for a given toric Calabi-Yau 3-fold $X^{(s)}$. If for a set $S$ of toric Calabi-Yau 3-folds we know the actual minimum volumes via volume minimization, then we can calculate the following residual sum of squares of the difference between the inverses of the actual and the expected minimum volumes for the entire set $S$,

$\mathcal{L} = \sum_{s \in S} \left( \frac{1}{\hat{V}_{\min}^{(s)}} - \frac{1}{V_{\min}^{(s)}} \right)^2 \ . \quad$ (IV.16)

Here, $\mathcal{L}$ can be considered as a loss function goodfellow2016deep that evaluates the performance of the candidate function for the minimum volume in (IV.15). In multiple linear regression gauss1823theoria ; fisher1922mathematical ; mendenhall2003second ; freedman2009statistical ; jobson2012applied , as initially proposed in Krefl:2017yox , the optimization task is to minimize the loss function in (IV.16) for a given dataset of toric Calabi-Yau 3-folds,

$\min_{c_0,\, c_j}\ \mathcal{L} \ . \quad$ (IV.17)
In Krefl:2017yox , multiple linear regression was used to obtain a candidate minimum volume function using the following feature set,
$(x_1, x_2, x_3) \ , \quad$ (IV.18)

where

$x_1 = I \ , \quad x_2 = E \ , \quad x_3 = V \ , \quad$ (IV.19)

corresponding respectively to the number of internal lattice points $I$ in $\Delta$, the number of boundary lattice points $E$ in $\Delta$, and the number of vertices $V$ that form the extremal corner points in $\Delta$, for a given toric Calabi-Yau 3-fold $X$. Under Pick's theorem pick1899geometrisches , these features are related as follows,

$A = I + \frac{E}{2} - 1 \ , \quad$ (IV.20)

where $A$ is the area of the toric diagram $\Delta$, with the smallest unit triangle in $\Delta$ having area $1/2$.
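For concreteness, the features $A$, $I$, $E$ and $V$ can be computed directly from the corner points of a toric diagram using the shoelace formula for the area, edge gcds for the boundary points, and Pick's theorem for the internal points. A minimal sketch in Python (our own helper names; the vertices are assumed to be the extremal corner points listed in cyclic order):

```python
import numpy as np
from math import gcd

def features(verts):
    """Return (A, I, E, V) for a convex lattice polygon given by its
    extremal corner points in cyclic order."""
    x, y = np.array(verts).T
    A2 = abs(int(x @ np.roll(y, -1) - y @ np.roll(x, -1)))  # shoelace: 2A
    # boundary lattice points: sum of gcds of the edge vectors
    E = sum(gcd(abs(x1 - x0), abs(y1 - y0))
            for (x0, y0), (x1, y1) in zip(verts, verts[1:] + verts[:1]))
    I = (A2 - E + 2) // 2        # Pick's theorem: A = I + E/2 - 1
    V = len(verts)               # extremal corner points
    return A2 / 2, I, E, V

# Example: the toric diagram of the cone over F0
print(features([(1, 0), (0, 1), (-1, 0), (0, -1)]))   # (2.0, 1, 4, 4)
```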
With a dataset of toric Calabi-Yau 3-folds, the work in Krefl:2017yox showed that the candidate linear function in (IV.15) with features given by (IV.18) is able to estimate the inverse minimum volume with an expected percentage relative error of 2.2%.
In this work, we expand upon the accomplishments of Krefl:2017yox by introducing novel features that describe toric Calabi-Yau 3-folds, augmenting the datasets for toric Calabi-Yau 3-folds, and applying machine learning techniques incorporating regularization.
These improvements are designed to address some of the shortcomings of the work in Krefl:2017yox as well as give explicit interpretable formulas for the minimum volume for toric Calabi-Yau 3-folds.
[Figure 3: A toric diagram $\Delta$ and its $n$-enlarged toric diagrams $\Delta_n$.]
New Features. We introduce several new features that describe a toric Calabi-Yau 3-fold and are obtained from the corresponding toric diagram $\Delta$. We define the $n$-enlarged toric diagram as,

$\Delta_n = \{ (n\, x_i,\ n\, y_i)\ |\ (x_i, y_i) \in \Delta \} \ , \quad$ (IV.21)

where $x_i$ and $y_i$ are the coordinates of the vertices in the original toric diagram $\Delta$. We note that $\Delta_1 = \Delta$. These $n$-enlarged toric diagrams also appeared in Berglund:2021ztg for the study of Hodge numbers of Calabi-Yau manifolds that are constructed as hypersurfaces in toric varieties.

Using the $n$-enlarged toric diagram $\Delta_n$ for a given toric Calabi-Yau 3-fold $X$, we can now refer to the area of $\Delta_n$ as $A_n$, the number of internal lattice points of $\Delta_n$ as $I_n$, and the number of boundary lattice points in $\Delta_n$ as $E_n$. We further note that the number of vertices corresponding to extremal corner points is the same in $\Delta_n$ for all $n$, i.e. $V_n = V$.
In our work, we use features of a toric Calabi-Yau 3-fold $X$ that are composed from members of the following set,

$\{ A_n,\ I_n,\ E_n,\ V \} \ , \quad$ (IV.22)

where $n \in \mathbb{Z}_{>0}$ and $A_1 = A$, $I_1 = I$, $E_1 = E$. These are defined through the corresponding toric diagram $\Delta$ and its corresponding $n$-enlarged toric diagrams $\Delta_n$; a short computational sketch follows below.
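Since scaling a lattice polygon by a factor $n$ multiplies its area by $n^2$ and its number of boundary lattice points by $n$, the enlarged features $A_n$, $I_n$ and $E_n$ follow from Pick's theorem without re-counting lattice points. A minimal sketch building on the features helper above (our own naming):

```python
def enlarged_features(verts, n):
    """Features (A_n, I_n, E_n, V) of the n-enlarged toric diagram
    Delta_n, cf. (IV.21), using A_n = n^2 A and E_n = n E."""
    A, I, E, V = features(verts)
    A_n, E_n = n * n * A, n * E
    I_n = int(A_n - E_n / 2 + 1)   # Pick's theorem applied to Delta_n
    return A_n, I_n, E_n, V

print(enlarged_features([(1, 0), (0, 1), (-1, 0), (0, -1)], 2))
# -> (8.0, 5, 8, 4): the 2-enlarged F0 diagram has 5 internal points
```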
Through the application of machine learning regularization, our objective is to differentiate between features that contribute to the expression for the minimum volume associated with a toric Calabi-Yau 3-fold and those that do not.
Table 2: The four datasets of toric Calabi-Yau 3-folds used in this work, with the number of distinct toric diagrams in each set.

| Set | Description | Number of toric diagrams |
|---|---|---|
| $S_1$ | all polytopes in a lattice box | 15,327 |
| $S_2$ | all polytopes in a circle | 31,324 |
| $S_3$ | selected polytopes in a lattice box | 202,015 |
| $S_4$ | selected polytopes in a circle | 201,895 |
New Sets of Toric Calabi-Yau 3-folds. The aim of this work is to make use of machine learning with regularization in order to identify an interpretable formula that accurately estimates the minimum volume of Sasaki-Einstein 5-manifolds corresponding to toric Calabi-Yau 3-folds. The interpretability of the minimum volume formula is achieved by the lowest possible number of features on which the formula depends. In order to train such a regularized machine learning model, we establish four sets of toric Calabi-Yau 3-folds for which the corresponding minimum volumes are known. These sets are defined as follows:
• $S_1$: This set consists of toric Calabi-Yau 3-folds whose toric diagrams fit into a lattice box in $\mathbb{Z}^2$ as illustrated in Figure 4(a). This set contains a certain degree of redundancy, given that convex lattice polygons related by a $GL(2,\mathbb{Z})$ transformation on their vertices refer to the same toric Calabi-Yau 3-fold. Accordingly, we restrict ourselves to toric diagrams that give unique combinations of features and inverse minimum volume. This results in a dataset of 15,327 distinct toric diagrams with unique inverse minimum volumes up to 6 decimal points.

• $S_2$: The second set consists of toric Calabi-Yau 3-folds whose toric diagrams fit inside a circle of fixed radius centered at the origin of the lattice, as illustrated in Figure 4(b). By imposing the condition that we want $GL(2,\mathbb{Z})$-distinct toric diagrams with unique combinations of features and inverse minimum volume, we obtain 31,324 toric diagrams for this set.

• $S_3$: For this set, we choose randomly 300,000 toric diagrams that fit into a lattice box in $\mathbb{Z}^2$. By imposing the condition that the toric diagrams have unique combinations of features and inverse minimum volume, we obtain 202,015 toric diagrams for this set (a minimal generation sketch is given after this list).

• $S_4$: For this set, we choose randomly 300,000 toric diagrams that fit into a circle of fixed radius centered at the origin of the lattice. By imposing the condition that the toric diagrams have unique combinations of features and inverse minimum volume, we obtain 201,895 toric diagrams for this set.
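A minimal sketch of how such a dataset can be generated, mimicking the construction of $S_3$: random lattice points in a box are passed to a convex hull, and the resulting diagrams are deduplicated by their feature combination. The box size, sample counts and the deduplication key are illustrative assumptions, and the features helper is the one sketched above:

```python
import numpy as np
from scipy.spatial import ConvexHull, QhullError

rng = np.random.default_rng(0)

def random_toric_diagram(box=3, n_points=6):
    """Convex hull of random lattice points in a (2*box+1)^2 lattice box."""
    pts = rng.integers(-box, box + 1, size=(n_points, 2))
    try:
        hull = ConvexHull(pts)
    except QhullError:          # degenerate (e.g. collinear) samples
        return None
    # hull.vertices are in counterclockwise order for 2-D inputs
    return [tuple(pts[i]) for i in hull.vertices]

seen, dataset = set(), []
for _ in range(300_000):
    verts = random_toric_diagram()
    if verts is None:
        continue
    key = features(verts)       # deduplicate by the feature combination
    if key not in seen:
        seen.add(key)
        dataset.append(verts)
```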
The distribution of inverse minimum volumes $1/V_{\min}$ for the above sets of toric diagrams is illustrated together with the mean inverse minimum volume in Figure 5.
In the following sections, we make use of regularized machine learning in order to identify functions that optimally estimate the inverse minimum volume in each of the above datasets.
[Figure 4: (a) The lattice box and (b) the circle into which the toric diagrams of the datasets fit.]

[Figure 5: The distribution of inverse minimum volumes $1/V_{\min}$ for the four datasets, together with the mean inverse minimum volume.]
Machine Learning Models and Regularization. In order to obtain a function for the minimum volume of Sasaki-Einstein 5-manifolds corresponding to toric Calabi-Yau 3-folds in terms of features obtained from the corresponding toric diagrams, we make use of the following machine learning models (a minimal implementation sketch follows after this list):

• Polynomial Regression (PR). We make use of polynomial regression montgomery2021introduction , where the relationship between the feature variables $x_j^{(s)}$ and the predicted variable $1/\hat{V}_{\min}^{(s)}$ is given by

$\frac{1}{\hat{V}_{\min}^{(s)}} = c_0 + \sum_{j=1}^{n} c_j\, x_j^{(s)} + \sum_{j \leq k} c_{jk}\, x_j^{(s)} x_k^{(s)} + \dots \ . \quad$ (IV.23)

Here, $c_0$, $c_j$ and $c_{jk}$ are real coefficients, $n$ is the number of features, and $s$ labels the particular sample in the data set that is used to train this machine learning model. In our case, the data set consists of toric Calabi-Yau 3-folds $X^{(s)}$, where the corresponding minimum volume is given by $V_{\min}^{(s)}$. Here we note that the features $x_j^{(s)}$ are taken from the set in (IV.22) with the enlargement factor up to a fixed maximum value.

• Logarithmic Regression (LR). We make use of logarithmic regression montgomery2021introduction in order to help linearize relationships between features that are potentially multiplicative in their contribution towards the predicted variable $1/\hat{V}_{\min}^{(s)}$. To be more precise, we make use of a $\log$-$\log$ model where we $\log$-transform both the predicted variable and the features $x_j^{(s)}$. The predicted variable is then given by,

$\log \frac{1}{\hat{V}_{\min}^{(s)}} = c_0 + \sum_{j=1}^{m} c_j \log x_j^{(s)} \ , \quad$ (IV.24)

where $c_0$ and $c_j$ are real coefficients, and $m$ is the number of $\log$-transformed features of the form $\log x_j^{(s)}$. The label $s$ corresponds to a particular toric Calabi-Yau 3-fold $X^{(s)}$ whose corresponding minimum volume is given by $V_{\min}^{(s)}$. Here we note that the $\log$-transformed features of the form $\log x_j^{(s)}$ are taken from the set in (IV.22) with the enlargement factor up to a fixed maximum value. Here, we do not make use of features that can vanish, for which the $\log$-transform is ill-defined.
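A minimal sketch, using scikit-learn, of how these two L1-regularized models can be set up; the input files, polynomial degree and regularization strengths are placeholder assumptions, not the settings used to obtain the results reported below:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# X: feature matrix with columns such as A_n, E_n, V; y: 1 / V_min
X = np.load("features.npy")     # hypothetical input files
y = np.load("inv_vmin.npy")

# L1-regularized polynomial regression (IV.23) with a degree-2 feature map
pr = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                   StandardScaler(),
                   Lasso(alpha=0.03, max_iter=100_000))
pr.fit(X, y)

# L1-regularized logarithmic (log-log) regression (IV.24);
# requires strictly positive features
lr = make_pipeline(StandardScaler(),
                   Lasso(alpha=0.001, max_iter=100_000))
lr.fit(np.log(X), np.log(y))
```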
When we introduce regularization tikhonov1963regularization ; hastie2009elements into polynomial regression and logarithmic regression, we minimize the following loss function between the predicted variable $\hat{y}^{(s)} = 1/\hat{V}_{\min}^{(s)}$ and the expected variable $y^{(s)} = 1/V_{\min}^{(s)}$,

$\mathcal{L} = \sum_{s \in S} \left( \hat{y}^{(s)} - y^{(s)} \right)^2 + R \ , \quad$ (IV.25)

where $R$ is the regularization term in the loss function.

The loss function in (IV.25) is iteratively minimized during the optimization process, and we fix the maximum number of iterative steps for all following computations.
The precise form of the regularization term in the loss function as well as the different regularization schemes in machine learning are discussed in the following section.
V Least Absolute Shrinkage and Selection Operator (Lasso) and Regularization
The Least Absolute Shrinkage and Selection Operator (Lasso) tibshirani1996regression is a machine learning regularization technique primarily employed to prevent overfitting in supervised machine learning. However, it can also be utilized for feature selection. In our work, the overarching goal in employing Lasso is to introduce a machine learning model capable of delivering optimal predictions for the minimum volume for toric Calabi-Yau 3-folds while using the fewest features from the training dataset. For problems such as the one considered in this work, it is essential to be able to obtain formulas with a small number of parameters. As a result, using Lasso is particularly suited for discovering new mathematical formulas such as the one aimed for in this work for the minimum volume for toric Calabi-Yau 3-folds.
In the following section, we give a brief overview of several regularization schemes including Lasso in the context of supervised machine learning for the minimum volume formula for toric Calabi-Yau 3-folds.
Regularization. Regularization in machine learning is a technique usually used to avoid overfitting the dataset during model training. This is done by adding a penalty term to the loss function. The introduction of the added regularization term $R$, resulting in an updated loss function of the form,

$\mathcal{L} = \sum_{s \in S} \left( \hat{y}^{(s)} - y^{(s)} \right)^2 + R \ , \quad$ (V.26)

serves the purpose of constraining the possible parameter values within the supervised machine learning model. In the case of multiple linear regression, as first introduced in Krefl:2017yox and reviewed in section §IV, these parameters would be the real coefficients $c_0$ and $c_j$ in the candidate linear function in (IV.15) for the expected minimum volume given by $1/\hat{V}_{\min}$. By restricting the values of these parameters, regularization effectively makes it harder for the supervised machine learning model to give a candidate function for the minimum volume with many terms. This prevents the machine learning model from overfitting the dataset of minimized volumes for toric Calabi-Yau 3-folds.
Let us review the following three regularization schemes (a small numerical comparison follows after this list):

• L1 Regularization (Lasso). This regularization scheme, also known as Least Absolute Shrinkage and Selection Operator (Lasso) tibshirani1996regression , adds the following linear regularization term to the loss function of the regression model,

$R_{L1} = \alpha \sum_{j} |c_j| \ , \quad$ (V.27)

where $c_j$ are the real parameters of the regression model and $\alpha$ is a real regularization parameter. Increasing the value of $\alpha$ has the effect of increasing the strength of the L1 regularization.

• L2 Regularization (Ridge). Another regularization scheme is known as Ridge regularization or L2 regularization hoerl1970ridge . It adds the following quadratic regularization term to the loss function of the regression model,

$R_{L2} = \alpha \sum_{j} c_j^2 \ , \quad$ (V.28)

where $c_j$ are the real parameters of the regression model and $\alpha$ is again the real regularization parameter.

• Elastic Net (L1 and L2). Elastic Net zou2005regularization is a combination of L1 (Lasso) and L2 (Ridge) regularization and adds the following regularization terms to the loss function,

$R_{EN} = \alpha_1 \sum_{j} |c_j| + \alpha_2 \sum_{j} c_j^2 \ , \quad$ (V.29)

where $\alpha_1$ and $\alpha_2$ are relative real regularization parameters that regulate the proportion of L1 regularization and L2 regularization in this regularization scheme.
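The qualitative difference between the three schemes can be seen in a small numerical experiment: with only two of eight synthetic features being relevant, Lasso and Elastic Net set the irrelevant coefficients exactly to zero, while Ridge merely shrinks them. A minimal sketch with synthetic data and illustrative regularization strengths:

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 8))
y = 2.0 * X[:, 0] - 1.0 * X[:, 3] + 0.1 * rng.normal(size=500)

for model in (Lasso(alpha=0.1),
              Ridge(alpha=0.1),
              ElasticNet(alpha=0.1, l1_ratio=0.5)):
    model.fit(X, y)
    n_zero = int(np.sum(np.isclose(model.coef_, 0.0)))
    print(f"{type(model).__name__:10s} zero coefficients: {n_zero}")
```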
Amongst these regularization schemes in supervised machine learning, we are going to mainly focus on Lasso and L1 regularization for the remainder of this work. While all three regularization schemes share the common goal of constraining the range of values of the model parameters $c_j$, it is noteworthy that only Lasso possesses the unique property of inducing sparsity among the model parameters, resulting in the complete elimination of certain parameters during the training process.
[Figure 6: Parametric plots of the L1 and L2 regularization constraints for two model parameters $c_1$ and $c_2$, following hastie2009elements .]
There are several arguments why Lasso enables the complete elimination of some of the model parameters and the corresponding features in the candidate function for the minimum volume for toric Calabi-Yau 3-folds. In order to illustrate this, let us consider the case with two features $x_1$ and $x_2$ with corresponding parameters $c_1$ and $c_2$, for which the L1 and L2 regularization terms take respectively the following form,

$R_{L1} = \alpha \left( |c_1| + |c_2| \right) \ , \quad R_{L2} = \alpha \left( c_1^2 + c_2^2 \right) \ . \quad$ (V.30)

If we assume that under optimization the regularization terms reach fixed values for $c_1$ and $c_2$, we can draw the parametric plots for the two regularization terms as shown in Figure 6 hastie2009elements . We can see from the plots in Figure 6 that for L1 regularization, the minimum of the total loss function is more likely achieved when one of the two parameters $c_1$ or $c_2$ approaches 0. This is in part due to the absolute values taken for the parameters in the linear L1 regularization term.
As a result, Lasso regularization is particularly suited for feature selection and parameter elimination in regression models.
In our work, we employ L1 regularization to derive a formula for the minimum volume of Sasaki-Einstein 5-manifolds corresponding to toric Calabi-Yau 3-folds that is interpretable, presentable and reusable.
VI Candidates for Minimum Volume Functions
In this work, our aim is to apply Lasso regularization in order to identify explicit formulas for the minimum volume for toric Calabi-Yau 3-folds. In doing so, we maximize the accuracy of the formulas that we find while minimizing the number of parameters the formulas depend on, making them interpretable and readily presentable.
Table 3: Optimal candidate formulas for the inverse minimum volume $1/\hat{V}_{\min}$ under L1-regularized polynomial regression for the four datasets, with the optimal regularization parameter $\alpha^*$, the number of non-zero coefficients $n_c$ and the $R^2$-score.

| data set | $\alpha^*$ | $n_c$ | $R^2$ |
|---|---|---|---|
| $S_1$ | 0.03548 | 3 | 0.98354 |
| $S_2$ | 0.01995 | 3 | 0.98697 |
| $S_3$ | 0.97724 | 3 | 0.98743 |
| $S_4$ | 0.97724 | 3 | 0.98740 |
Table 4: Optimal candidate formulas for the inverse minimum volume $1/\hat{V}_{\min}$ under L1-regularized logarithmic regression for the four datasets, with the optimal regularization parameter $\alpha^*$, the number of non-zero coefficients $n_c$ and the $R^2$-score.

| data set | $\alpha^*$ | $n_c$ | $R^2$ |
|---|---|---|---|
| $S_1$ | 0.00045 | 6 | 0.98932 |
| $S_2$ | 0.00032 | 6 | 0.98992 |
| $S_3$ | 0.00112 | 3 | 0.99281 |
| $S_4$ | 0.00112 | 3 | 0.99297 |
Parameter Sparsity vs Accuracy. Like in all regression problems, we introduce a measure of how well the model fits the observed data using the $R^2$-score montgomery2021introduction ; hastie2009elements given by,

$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} \ , \quad$ (VI.31)

where the residual sum of squares is given by,

$SS_{res} = \sum_{s \in S} \left( y^{(s)} - \hat{y}^{(s)} \right)^2 \ , \quad$ (VI.32)

and the total sum of squares is given by,

$SS_{tot} = \sum_{s \in S} \left( y^{(s)} - \bar{y} \right)^2 \ . \quad$ (VI.33)

Here, $\hat{y}^{(s)}$ denotes the predicted value for the inverse minimum volume given by $1/\hat{V}_{\min}^{(s)}$, whereas $\bar{y}$ denotes the mean of the expected values $y^{(s)} = 1/V_{\min}^{(s)}$.
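Written out, the $R^2$-score of (VI.31)-(VI.33) is a few lines of code; a minimal sketch (equivalent to sklearn.metrics.r2_score):

```python
import numpy as np

def r2_score(y, y_pred):
    """R^2 = 1 - SS_res / SS_tot, cf. (VI.31)-(VI.33)."""
    ss_res = np.sum((y - y_pred) ** 2)      # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot
```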
We recall that the optimization problem for the L1-regularized regression model is to minimize the loss function with the L1 regularization term. As we discussed in the sections above, this optimization problem focuses on minimizing the mean squared error with a penalty for non-zero coefficients $c_j$, which depends on the regularization parameter $\alpha$.
Here, we note that there is an additional optimization problem regarding the maximization of the $R^2$-score in (VI.31) and the minimization of the number of non-zero coefficients $n_c$. We can formulate this additional optimization problem as follows,

$\alpha^* = \underset{\alpha}{\mathrm{arg\,max}} \left( R^2(\alpha) - \lambda\, n_c(\alpha) \right) \ , \quad$ (VI.34)

where $n_c = |\{ j\ |\ c_j \neq 0 \}|$, and the values of the coefficients $c_j$ and the $R^2$-score all depend on the regularization parameter $\alpha$. $\lambda$ is a positive hyperparameter that regulates how much we value sparsity of feature coefficients over the accuracy of the estimate given by the $R^2$-score.
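In practice this model selection can be carried out as a grid scan over the regularization parameter. A minimal sketch, assuming the trade-off is scored as in (VI.34); the value of $\lambda$ and the grid of $\alpha$ values are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

def select_alpha(X, y, alphas, lam=1e-3):
    """Scan alpha and maximize R^2(alpha) - lam * n_c(alpha), cf. (VI.34),
    with R^2 evaluated on a fixed 80/20 train-test split."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=0)
    best = None
    for alpha in alphas:
        model = Lasso(alpha=alpha, max_iter=100_000).fit(X_tr, y_tr)
        n_c = int(np.sum(~np.isclose(model.coef_, 0.0)))
        score = model.score(X_te, y_te) - lam * n_c  # R^2 minus sparsity penalty
        if best is None or score > best[0]:
            best = (score, alpha, n_c)
    return best

# e.g. best_score, alpha_star, n_c = select_alpha(X, y, np.logspace(-4, 0, 50))
```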
Candidate Formulas. The candidate formulas for the minimum volume for toric Calabi-Yau 3-folds are identified by an optimal regularization parameter $\alpha^*$ that maximizes the $R^2$-score of the candidate formula and minimizes the number of non-zero coefficients $n_c$ corresponding to features in the chosen regression model. In order to identify the optimal regularization parameter $\alpha^*$ for the optimization problem in (VI.34), we search in a given fixed range for $\alpha$ as specified in Figure 7 and Figure 8. We do the search for the optimal regularization parameter for all four datasets in Table 2, for both L1-regularized polynomial regression and L1-regularized logarithmic regression as discussed in sections §IV and §V. The chosen L1-regularized regression models are trained for a particular value of the regularization parameter $\alpha$ under a fixed randomly chosen 80% training and 20% testing data split, where the corresponding $R^2$-score depending on $\alpha$ is obtained from the testing data.
[Figures 7 and 8: The L1 regularization parameter $\alpha$ plotted against the standardized coefficients, the number of non-zero coefficients $n_c$ and the $R^2$-score, for L1-regularized polynomial regression (Figure 7) and L1-regularized logarithmic regression (Figure 8).]
Figure 7 shows, respectively for datasets $S_1$ and $S_2$, plots of the L1 regularization parameter $\alpha$ for polynomial regression against the standardized coefficients, against the number of non-zero coefficients $n_c$, and against the $R^2$-score. Here, the standardized coefficients are obtained when the training is conducted over normalized features. When the training is completed for a specific value of $\alpha$, the candidate formula for the minimum volume given by $1/\hat{V}_{\min}$ is obtained by reversing the normalization on the features, giving us the coefficients of the candidate formula. We also have Figure 8, which shows, respectively for datasets $S_1$ and $S_2$, plots of the L1 regularization parameter $\alpha$ for logarithmic regression against the standardized coefficients, the number of non-zero coefficients $n_c$ and the $R^2$-score. Similar plots can also be obtained for datasets $S_3$ and $S_4$ for both L1-regularized polynomial regression and L1-regularized logarithmic regression.

Overall, the plots illustrate that the identified optimal regularization parameters $\alpha^*$ minimize the number of non-zero coefficients $n_c$ in the formula estimating the minimum volume given by $1/\hat{V}_{\min}$, as well as maximize the accuracy of the formulas as measured by the $R^2$-score. Table 3 and Table 4 summarize respectively the most optimal candidate formulas for the minimum volume under L1-regularized polynomial regression and L1-regularized logarithmic regression for the four datasets in Table 2, with the corresponding optimal regularization parameters $\alpha^*$, the corresponding numbers of non-zero coefficients $n_c$ and the $R^2$-scores.
A closer look reveals that for all models, the identified optimal regularization parameters $\alpha^*$ result in formulas that approximate the minimum volume extremely well for all the datasets $S_1$, $S_2$, $S_3$ and $S_4$. Overall, the L1-regularized logarithmic regression models seem to give more accurate results than the L1-regularized polynomial regression models, with higher $R^2$-scores over all datasets. In particular, the L1-regularized logarithmic regression models trained on datasets $S_3$ and $S_4$ have $R^2$-scores above $0.99$, which is exceptionally high.
Having a closer look at explicit examples of toric Calabi-Yau 3-folds in the datasets reveals, however, that the performance of the regularized regression models can vary between different toric Calabi-Yau 3-folds. For example, focusing on the L1-regularized logarithmic regression models trained on $S_1$ and $S_2$, we observe that the minimum volume formulas in Table 4 perform differently for toric diagrams with smaller areas compared to toric diagrams with larger areas, as illustrated in Figure 9. Similar observations can be made for the L1-regularized logarithmic regression models trained on $S_3$ and $S_4$, as well as for the L1-regularized polynomial regression models.
In summary, we can calculate the expected relative percentage errors of the predicted minimum volumes given by $1/\hat{V}_{\min}$ and the corresponding standard deviations for the L1-regularized logarithmic regression models as follows,

$\mathrm{err} = \left\langle \frac{\left| 1/\hat{V}_{\min}^{(s)} - 1/V_{\min}^{(s)} \right|}{1/V_{\min}^{(s)}} \right\rangle_{s \in S} \times 100\% \ . \quad$ (VI.35)
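A minimal sketch of this error measure (our own helper; y and y_pred denote the actual and predicted inverse minimum volumes):

```python
import numpy as np

def relative_percentage_error(y, y_pred):
    """Expected relative percentage error and its standard deviation,
    cf. (VI.35), for y = 1/V_min and predictions y_pred."""
    err = 100.0 * np.abs(y_pred - y) / y
    return float(err.mean()), float(err.std())
```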
We note that the models trained on $S_1$ and $S_2$ have a larger expected relative percentage error than the ones trained on $S_3$ and $S_4$. This is partly due to the fact that $S_3$ and $S_4$ contain randomly selected toric diagrams in a lattice box and a circle, respectively, whereas $S_1$ and $S_2$ contain the full set of toric diagrams in a lattice box and a circle, respectively, as defined in Table 2.
We also note that the $R^2$-scores of the L1-regularized logarithmic regression models in Table 4,

$R^2 = 0.98932,\ 0.98992,\ 0.99281,\ 0.99297 \ , \quad$ (VI.36)

for $S_1$, $S_2$, $S_3$ and $S_4$ respectively, are overall very high and close to $1$. Compared to the expected relative percentage errors in (VI.35), which measure how far off predictions of the minimum volume given by $1/\hat{V}_{\min}$ are, the $R^2$-score is a measure of the accuracy of the trained regression model. It quantifies the proportion of the variation in $1/V_{\min}$ that can be predicted using the features selected from the corresponding toric diagrams of the toric Calabi-Yau 3-folds.
VII Discussions and Conclusions
With this work, we demonstrated that employing regularization in machine learning models can effectively address the limitations posed by supervised machine learning techniques applied to problems that occur in the context of string theory. In particular, we have shown that the minimum volume for Sasaki-Einstein 5-manifolds corresponding to toric Calabi-Yau 3-folds can be expressed in terms of just 3 features of the associated toric diagrams with an $R^2$-score above $0.99$. These 3 features are the area of the toric diagram $\Delta$, the number of vertices in $\Delta$, and the number of internal points in the $n$-enlarged toric diagram $\Delta_n$.

By simultaneously maximizing the $R^2$-score and minimizing the number of surviving parameters in the candidate function for $1/\hat{V}_{\min}$ through varying the regularization strength given by the regularization parameter $\alpha$, the proposed regularized regression models in this work give far more presentable, interpretable and explainable results than our previous work in Krefl:2017yox . Above all, as suggested in Figure 9, the candidate formulas for the minimum volumes of toric Calabi-Yau 3-folds obtained in this study are concise enough to facilitate the examination of why some toric Calabi-Yau 3-folds are associated with minimum volumes that are more challenging to predict than those of certain other toric Calabi-Yau 3-folds. We plan to report on these investigations in the near future. We foresee that the application of regularization schemes to other supervised machine learning applications in string theory will open up equally promising research opportunities in the future.
Acknowledgements.
R.K.-S. would like to thank the Simons Center for Geometry and Physics at Stony Brook University, the City University of New York Graduate Center, the Institute for Basic Science Center for Geometry and Physics, as well as the Kavli Institute for the Physics and Mathematics of the Universe for hospitality during various stages of this work. He is supported by a Basic Research Grant of the National Research Foundation of Korea (NRF-2022R1F1A1073128). He is also supported by a Start-up Research Grant for new faculty at UNIST (1.210139.01), a UNIST AI Incubator Grant (1.230038.01) and UNIST UBSI Grants (1.230168.01, 1.230078.01), as well as an Industry Research Project (2.220916.01) funded by Samsung SDS in Korea. He is also partly supported by the BK21 Program ("Next Generation Education Program for Mathematical Sciences", 4299990414089) funded by the Ministry of Education in Korea and the National Research Foundation of Korea (NRF).

References
- (1) Y.-H. He, Deep-Learning the Landscape, 1706.02714.
- (2) D. Krefl and R.-K. Seong, Machine Learning of Calabi-Yau Volumes, Phys. Rev. D 96 (2017) 066014, [1706.03346].
- (3) F. Ruehle, Evolving neural networks with genetic algorithms to study the String Landscape, JHEP 08 (2017) 038, [1706.07024].
- (4) J. Carifio, J. Halverson, D. Krioukov and B. D. Nelson, Machine Learning in the String Landscape, JHEP 09 (2017) 157, [1707.00655].
- (5) A. Cole, A. Schachner and G. Shiu, Searching the Landscape of Flux Vacua with Genetic Algorithms, JHEP 11 (2019) 045, [1907.10072].
- (6) A. Cole, G. J. Loges and G. Shiu, Interpretable Phase Detection and Classification with Persistent Homology, in 34th Conference on Neural Information Processing Systems, 12, 2020. 2012.00783.
- (7) J. Halverson, A. Maiti and K. Stoner, Neural Networks and Quantum Field Theory, Mach. Learn. Sci. Tech. 2 (2021) 035002, [2008.08601].
- (8) S. Gukov, J. Halverson, F. Ruehle and P. Sułkowski, Learning to Unknot, Mach. Learn. Sci. Tech. 2 (2021) 025035, [2010.16263].
- (9) S. Abel, A. Constantin, T. R. Harvey and A. Lukas, Evolving Heterotic Gauge Backgrounds: Genetic Algorithms versus Reinforcement Learning, Fortsch. Phys. 70 (2022) 2200034, [2110.14029].
- (10) S. Krippendorf, R. Kroepsch and M. Syvaeri, Revealing systematics in phenomenologically viable flux vacua with reinforcement learning, 2107.04039.
- (11) A. Cole, S. Krippendorf, A. Schachner and G. Shiu, Probing the Structure of String Theory Vacua with Genetic Algorithms and Reinforcement Learning, in 35th Conference on Neural Information Processing Systems, 11, 2021. 2111.11466.
- (12) P. Berglund, Y.-H. He, E. Heyes, E. Hirst, V. Jejjala and A. Lukas, New Calabi-Yau Manifolds from Genetic Algorithms, 2306.06159.
- (13) M. Demirtas, J. Halverson, A. Maiti, M. D. Schwartz and K. Stoner, Neural Network Field Theories: Non-Gaussianity, Actions, and Locality, 2307.03223.
- (14) K. Bull, Y.-H. He, V. Jejjala and C. Mishra, Machine Learning CICY Threefolds, Phys. Lett. B 785 (2018) 65–72, [1806.03121].
- (15) V. Jejjala, A. Kar and O. Parrikar, Deep Learning the Hyperbolic Volume of a Knot, Phys. Lett. B 799 (2019) 135033, [1902.05547].
- (16) C. R. Brodie, A. Constantin, R. Deen and A. Lukas, Machine Learning Line Bundle Cohomology, Fortsch. Phys. 68 (2020) 1900087, [1906.08730].
- (17) Y.-H. He and A. Lukas, Machine Learning Calabi-Yau Four-folds, Phys. Lett. B 815 (2021) 136139, [2009.02544].
- (18) H. Erbin and R. Finotello, Machine learning for complete intersection Calabi-Yau manifolds: a methodological study, Phys. Rev. D 103 (2021) 126014, [2007.15706].
- (19) V. Anagiannis and M. C. N. Cheng, Entangled q-convolutional neural nets, Mach. Learn. Sci. Tech. 2 (2021) 045026, [2103.11785].
- (20) M. Larfors, A. Lukas, F. Ruehle and R. Schneider, Numerical metrics for complete intersection and Kreuzer–Skarke Calabi–Yau manifolds, Mach. Learn. Sci. Tech. 3 (2022) 035014, [2205.13408].
- (21) S. Krippendorf and M. Syvaeri, Detecting Symmetries with Neural Networks, 2003.13679.
- (22) D. S. Berman, Y.-H. He and E. Hirst, Machine learning Calabi-Yau hypersurfaces, Phys. Rev. D 105 (2022) 066002, [2112.06350].
- (23) J. Bao, Y.-H. He and E. Hirst, Neurons on Amoebae, J. Symb. Comput. 116 (2022) 1–38, [2106.03695].
- (24) R.-K. Seong, Unsupervised Machine Learning Techniques for Exploring Tropical Coamoeba, Brane Tilings and Seiberg Duality, 2309.05702.
- (25) D. Martelli, J. Sparks and S.-T. Yau, Sasaki-Einstein manifolds and volume minimisation, Commun. Math. Phys. 280 (2008) 611–673, [hep-th/0603021].
- (26) D. Martelli, J. Sparks and S.-T. Yau, The geometric dual of a-maximisation for toric Sasaki- Einstein manifolds, Commun. Math. Phys. 268 (2006) 39–65, [hep-th/0503183].
- (27) W. Fulton, Introduction to toric varieties. Annals of mathematics studies. Princeton Univ. Press, Princeton, NJ, 1993.
- (28) N. C. Leung and C. Vafa, Branes and Toric Geometry, ArXiv High Energy Physics - Theory e-prints (Nov., 1997), [hep-th/9711013].
- (29) B. R. Greene, String theory on Calabi-Yau manifolds, in Theoretical Advanced Study Institute in Elementary Particle Physics (TASI 96): Fields, Strings, and Duality, pp. 543–726, 6, 1996. hep-th/9702155.
- (30) M. R. Douglas, B. R. Greene and D. R. Morrison, Orbifold resolution by D-branes, Nucl.Phys. B506 (1997) 84–106, [hep-th/9704151].
- (31) E. Witten, Anti-de Sitter space and holography, Adv. Theor. Math. Phys. 2 (1998) 253–291, [hep-th/9802150].
- (32) I. R. Klebanov and E. Witten, Superconformal field theory on three-branes at a Calabi-Yau singularity, Nucl.Phys. B536 (1998) 199–218, [hep-th/9807080].
- (33) M. R. Douglas and G. W. Moore, D-branes, Quivers, and ALE Instantons, hep-th/9603167.
- (34) A. E. Lawrence, N. Nekrasov and C. Vafa, On conformal field theories in four-dimensions, Nucl.Phys. B533 (1998) 199–209, [hep-th/9803015].
- (35) B. Feng, A. Hanany and Y.-H. He, D-brane gauge theories from toric singularities and toric duality, Nucl. Phys. B595 (2001) 165–200, [hep-th/0003085].
- (36) B. Feng, A. Hanany and Y.-H. He, Phase structure of D-brane gauge theories and toric duality, JHEP 08 (2001) 040, [hep-th/0104259].
- (37) J. M. Maldacena, The large N limit of superconformal field theories and supergravity, Adv. Theor. Math. Phys. 2 (1998) 231–252, [hep-th/9711200].
- (38) D. R. Morrison and M. R. Plesser, Nonspherical horizons. 1., Adv.Theor.Math.Phys. 3 (1999) 1–81, [hep-th/9810201].
- (39) B. S. Acharya, J. M. Figueroa-O’Farrill, C. M. Hull and B. J. Spence, Branes at conical singularities and holography, Adv. Theor. Math. Phys. 2 (1999) 1249–1286, [hep-th/9808014].
- (40) K. A. Intriligator and B. Wecht, The Exact superconformal R symmetry maximizes a, Nucl. Phys. B 667 (2003) 183–200, [hep-th/0304128].
- (41) A. Butti and A. Zaffaroni, R-charges from toric diagrams and the equivalence of a- maximization and Z-minimization, JHEP 11 (2005) 019, [hep-th/0506232].
- (42) A. Butti and A. Zaffaroni, From toric geometry to quiver gauge theory: The Equivalence of a-maximization and Z-minimization, Fortsch.Phys. 54 (2006) 309–316, [hep-th/0512240].
- (43) S. S. Gubser, Einstein manifolds and conformal field theories, Phys. Rev. D 59 (1999) 025006, [hep-th/9807164].
- (44) M. Henningson and K. Skenderis, The Holographic Weyl anomaly, JHEP 07 (1998) 023, [hep-th/9806087].
- (45) S. Benvenuti, B. Feng, A. Hanany and Y.-H. He, Counting BPS operators in gauge theories: Quivers, syzygies and plethystics, JHEP 11 (2007) 050, [hep-th/0608050].
- (46) B. Feng, A. Hanany and Y.-H. He, Counting Gauge Invariants: the Plethystic Program, JHEP 03 (2007) 090, [hep-th/0701063].
- (47) C.-F. Gauss, Theoria combinationis observationum erroribus minimis obnoxiae. Henricus Dieterich, 1823.
- (48) R. A. Fisher, On the mathematical foundations of theoretical statistics, Philosophical transactions of the Royal Society of London. Series A, containing papers of a mathematical or physical character 222 (1922) 309–368.
- (49) W. Mendenhall, T. Sincich and N. S. Boudreau, A second course in statistics: regression analysis, vol. 6. Prentice Hall Upper Saddle River, NJ, 2003.
- (50) D. A. Freedman, Statistical models: theory and practice. cambridge university press, 2009.
- (51) J. D. Jobson, Applied multivariate data analysis: regression and experimental design. Springer Science & Business Media, 2012.
- (52) Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (1998) 2278–2324.
- (53) A. Krizhevsky, I. Sutskever and G. E. Hinton, Imagenet classification with deep convolutional neural networks, Advances in neural information processing systems 25 (2012) .
- (54) Y. LeCun, Y. Bengio and G. Hinton, Deep learning, nature 521 (2015) 436–444.
- (55) J. Schmidhuber, Deep learning in neural networks: An overview, Neural networks 61 (2015) 85–117.
- (56) D. E. Rumelhart, G. E. Hinton and R. J. Williams, Learning representations by back-propagating errors, nature 323 (1986) 533–536.
- (57) T. Hastie, R. Tibshirani, J. H. Friedman and J. H. Friedman, The elements of statistical learning: data mining, inference, and prediction, vol. 2. Springer, 2009.
- (58) K. Hori and C. Vafa, Mirror symmetry, hep-th/0002222.
- (59) B. Feng, Y.-H. He, K. D. Kennaway and C. Vafa, Dimer models from mirror symmetry and quivering amoebae, Adv. Theor. Math. Phys. 12 (2008) 489–545, [hep-th/0511287].
- (60) A. Tikhonov, Regularization of incorrectly posed problems, in Soviet Math. Dokl., pp. 1624–1627, 1963.
- (61) R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society Series B: Statistical Methodology 58 (1996) 267–288.
- (62) D. Martelli and J. Sparks, Toric geometry, Sasaki-Einstein manifolds and a new infinite class of AdS/CFT duals, Commun. Math. Phys. 262 (2006) 51–89, [hep-th/0411238].
- (63) S. Benvenuti, S. Franco, A. Hanany, D. Martelli and J. Sparks, An infinite family of superconformal quiver gauge theories with Sasaki-Einstein duals, JHEP 06 (2005) 064, [hep-th/0411264].
- (64) S. Benvenuti and M. Kruczenski, From Sasaki-Einstein spaces to quivers via BPS geodesics: $L^{p,q|r}$, JHEP 04 (2006) 033, [hep-th/0505206].
- (65) A. Butti, D. Forcella and A. Zaffaroni, The dual superconformal theory for $L^{p,q,r}$ manifolds, JHEP 09 (2005) 018, [hep-th/0505220].
- (66) S. Franco, A. Hanany, K. D. Kennaway, D. Vegh and B. Wecht, Brane Dimers and Quiver Gauge Theories, JHEP 01 (2006) 096, [hep-th/0504110].
- (67) A. Hanany and K. D. Kennaway, Dimer models and toric diagrams, hep-th/0503149.
- (68) S. Franco et al., Gauge theories from toric geometry and brane tilings, JHEP 01 (2006) 128, [hep-th/0505211].
- (69) R. Kenyon, An introduction to the dimer model, ArXiv Mathematics e-prints (Oct., 2003), [math/0310326].
- (70) P. Kasteleyn, Graph theory and crystal physics, Graph theory and theoretical physics (1967) 43–110.
- (71) F. Hirzebruch, Singularities and exotic spheres. Societe Mathematic de France, 1968.
- (72) E. Brieskorn, Beispiele zur differentialtopologie von singularitäten, Inventiones mathematicae 2 (1966) 1–14.
- (73) E. Witten, Phases of N = 2 theories in two dimensions, Nucl. Phys. B403 (1993) 159–222, [hep-th/9301042].
- (74) A. Butti, D. Forcella, A. Hanany, D. Vegh and A. Zaffaroni, Counting Chiral Operators in Quiver Gauge Theories, JHEP 0711 (2007) 092, [0705.2771].
- (75) A. Hanany and A. Zaffaroni, The master space of supersymmetric gauge theories, Adv.High Energy Phys. 2010 (2010) 427891.
- (76) D. Forcella, A. Hanany, Y.-H. He and A. Zaffaroni, The Master Space of N=1 Gauge Theories, JHEP 0808 (2008) 012, [0801.1585].
- (77) D. Forcella, A. Hanany, Y.-H. He and A. Zaffaroni, Mastering the Master Space, Lett.Math.Phys. 85 (2008) 163–171, [0801.3477].
- (78) P. Pouliot, Molien function for duality, JHEP 01 (1999) 021, [hep-th/9812015].
- (79) N. Seiberg, Electric - magnetic duality in supersymmetric nonAbelian gauge theories, Nucl. Phys. B435 (1995) 129–146, [hep-th/9411149].
- (80) C. E. Beasley and M. Ronen Plesser, Toric duality is Seiberg duality, JHEP 12 (2001) 001, [hep-th/0109053].
- (81) I. Goodfellow, Y. Bengio and A. Courville, Deep learning. MIT press, 2016.
- (82) G. Pick, Geometrisches zur zahlenlehre, Sitzenber. Lotos (Prague) 19 (1899) 311–319.
- (83) P. Berglund, B. Campbell and V. Jejjala, Machine Learning Kreuzer-Skarke Calabi-Yau Threefolds, 2112.09117.
- (84) D. C. Montgomery, E. A. Peck and G. G. Vining, Introduction to linear regression analysis. John Wiley & Sons, 2021.
- (85) A. E. Hoerl and R. W. Kennard, Ridge regression: Biased estimation for nonorthogonal problems, Technometrics 12 (1970) 55–67.
- (86) H. Zou and T. Hastie, Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society Series B: Statistical Methodology 67 (2005) 301–320.