
2021

Automated optimization of convergence parameters in plane wave density functional theory calculations via a tensor decomposition-based uncertainty quantification

Jan Janssen (1,2) (janssen@mpie.de), Edgar Makarov, Tilmann Hickel, Alexander V. Shapeev, Jörg Neugebauer

1) Max-Planck-Institut für Eisenforschung GmbH, Max-Planck-Str. 1, 40237 Düsseldorf, Germany
2) Theoretical Division, Los Alamos National Laboratory, Bikini Atoll Rd., SM 30, Los Alamos, NM, USA
3) Skolkovo Institute of Science and Technology, Skolkovo Innovation Center, Bolshoy Bulvar, 30/1, Moscow 121205, Russia
4) BAM Federal Institute for Materials Research and Testing, Richard-Willstätter-Str. 11, 12489 Berlin, Germany
Abstract

First principles approaches have revolutionized our ability to use computers to predict, explore and design materials. A major advantage commonly associated with these approaches is that they are fully parameter free. However, numerically solving the underlying equations requires choosing a set of convergence parameters. With the advent of high-throughput calculations it becomes exceedingly important to achieve a truly parameter-free approach. Utilizing uncertainty quantification (UQ) and tensor decomposition we derive a numerically highly efficient representation of the statistical and systematic error in the multidimensional space of the convergence parameters. Based on this formalism we implement a fully automated approach that requires as input the target accuracy rather than the convergence parameters. The performance and robustness of the approach are demonstrated by applying it to a large set of elements crystallizing in the cubic fcc lattice.

keywords:
Density functional theory, plane wave basis set, k-point mesh, convergence, uncertainty quantification

1 Introduction

Density functional theory (DFT) has evolved into the workhorse method for routinely computing the properties of essentially all known materials. While DFT is parameter-free in the sense that no materials-specific input parameters are needed, it is not free of numerical convergence parameters. Carefully selecting these parameters is critical: setting them too low may sacrifice predictive power, setting them too high may waste valuable computational resources. Since DFT-based calculations consume huge amounts of supercomputer resources worldwide, reducing the cost of such calculations without sacrificing their accuracy would open large opportunities for saving computational resources. In the past, the selection of these parameters was based on a few guidelines and manual benchmarks. However, new applications related to machine learning or to reliably predicting finite-temperature materials properties require a precision in energy on the order of 1 meV, a precision that goes well beyond previous criteria. To guarantee this level of precision, a detailed uncertainty quantification as well as an automated approach to determine the optimum set of convergence parameters that guarantees a predefined target error is needed.

Prominent examples of DFT convergence parameters are the number of basis functions and the k-point sampling, which reflect the need to approximate an infinite basis set such as e.g. plane waves (PW) or a continuous set of k-points in the Brillouin zone by discretized finite sets that can be represented on the computer. The accuracy of these approximations can often be described by a single scalar parameter such as the energy cutoff $\epsilon$ or the number of k-points $\kappa$. The total energy surface $E_{\rm tot}(\{\vec{R}_{I},Z_{I}\};\kappa,\epsilon,\dots)$ is thus not only a function of the atom coordinates $\vec{R}_{I}$ and species $Z_{I}$, but also of the convergence parameters $\kappa$, $\epsilon$, etc. Since the total energy surface is the key quantity to derive materials properties, any derived quantity $f[E_{\rm tot}]$ thus depends on the choice of the convergence parameters as well.

Numerically accurate, i.e. converged, results would be obtained when the convergence parameters approach infinity. In practice, this strategy is not feasible since the required computational resources also scale with the convergence parameters. Therefore, from the beginning of DFT calculations a judicious choice of the convergence parameters was mandatory chadi1973 ; monkhorst1976 ; dacosta1986 ; francis1990 . To optimally use computational resources the convergence parameters have to be chosen such that (i) the actual error $\Delta f(\epsilon,\kappa,\dots)$ is smaller than the target error $\Delta f_{\rm target}$ of the quantity of interest and (ii) the required computational resources are minimized. Since the required computational resources scale monotonously with the convergence parameters, the latter condition translates into keeping the convergence parameters as small as possible without violating (i).

Extensive convergence checks have mainly been reported for simple bulk systems by computing the total potential energy surface (PES) as function of the volume per unit cell soler2002 ; beeler2010 ; kratzer2019 ; choudhary2019 . Knowing the energy-volume potential energy surface allows for a direct computation of important materials properties such as the equilibrium lattice structure, the mechanical response (e.g. the bulk modulus) or the cohesive energy murnaghan1944 ; birch1947 ; vinet1987 . For more complex quantities derived from the PES, such as phonon spectra, surface energies, or free energies, routine benchmarks such as checking by how much a target quantity changes when a specific convergence parameter is increased are common. However, systematic convergence checks are rarely reported, e.g. grabowski2007 ; duong2015 for phonons or grabowski2011 for finite-temperature free energies.

In the present study we derive the asymptotic behavior of the systematic and statistical errors considering the energy-volume dependence and derived quantities such as the equilibrium lattice constant and the bulk modulus of cubic materials. We start by computing an extensive set of DFT data $E(V,\epsilon,\kappa)$ spanning the full range of physical (i.e. volume $V$) as well as convergence parameters (i.e., $\epsilon$, $\kappa$). By carefully analysing these data we show in a first step that the rank-three tensor can be decomposed into four rank-two matrices (see Fig. 1) with full control over the systematic and statistical errors. In a second step we develop an efficient tensor decomposition approach also for derived quantities such as the equilibrium bulk modulus $B_{eq}(\epsilon,\kappa)$. The availability of these extensive and easy-to-compute (via tensor decomposition) data sets reveals surprising and hitherto unreported correlations between these critical DFT parameters. The derived formalism also allows us to construct and implement a computationally efficient algorithm that in a fully automated fashion predicts optimum convergence parameters that minimize computational effort while simultaneously guaranteeing convergence below a user-given target error.

Refer to caption
Figure 1: Schematic representation of the discrete mesh of physical and convergence parameters used to map the energy surface $E(V,\epsilon,\kappa)$ (left side) and graphical representation of the proposed tensor decomposition (right side). The upper/lower part represents the systematic/statistical error contributions given by Eqs. (7) and (8), respectively. To reconstruct the full rank-three tensor on the left, only DFT data computed on the orange and blue colored planes is required.

2 Construction of an automated approach to determine optimal DFT convergence parameters

2.1 Analysis of the DFT convergence parameters

As a first step towards an automated uncertainty quantification of the total energy surface and derived quantities we start with an analysis of DFT convergence. As model system we consider the energy-volume curve of bulk fcc-Al constructed by changing the lattice constant $a_{lat}$ of the cubic cell at $T=0$ K. Due to crystal symmetry all atomic forces vanish exactly, so that the only degree of freedom is the lattice constant or, equivalently, the volume of the primitive cell ($V=a_{lat}^{3}/4$). The actual calculations are performed using VASP kresse1993 ; kresse1996 ; kresse19962 . The results and conclusions are not limited to this specific code but can be directly transferred to any plane wave pseudopotential DFT code.

In this study we focus on the two most relevant convergence parameters of a plane wave (PW) DFT code: the PW energy cutoff $\epsilon$ and the k-point sampling $\kappa$. The integer value $\kappa$ defines the Monkhorst-Pack mesh $(\kappa\times\kappa\times\kappa)$ used to construct the k-point set.

To discuss and derive our approach we first compute and analyze the total energy surface $E(V,\epsilon,\kappa)$ as function of volume $V$, energy cutoff $\epsilon$ and k-point sampling $\kappa$. We map the energy surface on an equidistant set of $n_{V}$ volumes $V_{i}$ ranging over an interval of $\pm 10\%$ around the equilibrium volume $V_{\rm eq}$ (see Sec. 2.3.2), $n_{\epsilon}$ energy cutoffs $\epsilon_{j}$ and $n_{\kappa}$ k-point samplings $\kappa_{k}$ (see Fig. 1). The generated shifted k-point mesh starts at a minimal k-point sampling of $3\times 3\times 3$ ($\kappa_{min}=3$) and goes up to a maximum of $91\times 91\times 91$ ($\kappa_{max}=91$); the energy cutoff ranges from $\epsilon_{min}=200$ eV up to $\epsilon_{max}=1200$ eV. All calculations are executed for primitive cubic cells with a single atom. The k-point sampling is increased in equidistant steps of $\Delta\kappa=2$ to always exclude the Gamma point. The energy cutoff is increased in equidistant steps of $\Delta\epsilon=20$ eV. For the following analysis we focus on fcc bulk aluminum. Extensive tests for other elements and pseudopotentials show qualitatively the same behavior and will be discussed in Sec. 3.5.
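For concreteness, the parameter mesh described above can be written down in a few lines of Python. The sketch below only defines the mesh; the actual DFT energies would come from a plane wave code, so the hypothetical run_dft() placeholder is left commented out.

```python
# Minimal sketch of the (V, epsilon, kappa) mesh described above. run_dft() is
# a hypothetical placeholder for a VASP calculation (e.g. managed via pyiron).
import numpy as np

volumes = np.linspace(0.9, 1.1, 21)   # n_V = 21 volumes, +/-10% around V_eq (relative units)
cutoffs = np.arange(200, 1201, 20)    # n_eps = 51 cutoffs, 200 ... 1200 eV
kpoints = np.arange(3, 92, 2)         # n_kappa = 45 shifted meshes, 3x3x3 ... 91x91x91

assert len(volumes) * len(cutoffs) * len(kpoints) == 48195

# Full rank-three benchmark tensor E[i, j, k] = E(V_i, eps_j, kappa_k):
# E = np.array([[[run_dft(V, eps, kap) for kap in kpoints]
#                for eps in cutoffs] for V in volumes])
```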

In total $n_{V}\times n_{\epsilon}\times n_{\kappa}=21\times 51\times 45=48195$ DFT calculations have been performed for a single pseudopotential. We note already here that such a large number of DFT calculations is not required for the final algorithm. The complete mapping of cutoff and k-point sampling is used here only to derive and benchmark the asymptotic behavior for $E(V,\epsilon,\kappa)$ and $B_{eq}(\epsilon_{i},\kappa_{j})\mid_{V_{eq}}$ when going towards large (i.e. extremely well converged) parameters. To analyze the energy surface $E(V,\epsilon,\kappa)$ we define the convergence error with respect to the maximum energy cutoff $\epsilon_{max}$ and k-point sampling $\kappa_{max}$, respectively:

$\Delta E_{\epsilon}(V,\epsilon,\kappa)=E(V,\epsilon,\kappa)-E(V,\epsilon_{max},\kappa)$ (1)
$\Delta E_{\kappa}(V,\epsilon,\kappa)=E(V,\epsilon,\kappa)-E(V,\epsilon,\kappa_{max})\,.$ (2)

We first analyze the volume dependence of these quantities since it directly impacts the convergence of equilibrium quantities such as the bulk modulus, the lattice constant, etc. The volume dependence of the energy cutoff convergence $\Delta E_{\epsilon}$ is shown in Fig. 2(b) for a fixed $\kappa$. As can be seen, using a non-converged energy cutoff of $\epsilon_{min}=260$ eV gives rise to a convergence error in the energy that strongly depends on the volume. In the shown example the error is largest for small volumes and monotonously decreases with increasing volume. As a consequence, the equilibrium volume will be shifted to a smaller value compared to the fully converged one.

Refer to caption
Figure 2: Decomposition of the energy surface $E(V,\epsilon,\kappa)$ into smooth (systematic) and fluctuating (statistical) contributions. (a) Energy-volume curves for a set of minimum and maximum convergence parameters $\epsilon_{min}=260$ eV to $\epsilon_{max}=1200$ eV and $\kappa_{min}=7$ to $\kappa_{max}=91$. (b) and (c) show the $\epsilon$ and $\kappa$ convergence $\Delta E_{\epsilon}(V,\epsilon,\kappa)$ and $\Delta E_{\kappa}(V,\epsilon,\kappa)$ as defined in Eqs. (1) and (2) for a fixed k-point sampling $\kappa$ and cutoff $\epsilon$, respectively. (d) Residual (statistical) error according to Eq. (4).

Fig. 2(b) also reveals a remarkable and highly useful property of the energy cutoff convergence $\Delta E_{\epsilon}$: it is to first order independent of the k-point sampling. Taking the difference

$\Delta\Delta E_{\rm noise}=\Delta E_{\epsilon}(V,\epsilon,\kappa_{2})-\Delta E_{\epsilon}(V,\epsilon,\kappa_{1})$ (3)

between any two k-point samplings $\kappa_{1}$ and $\kappa_{2}$ for a fixed energy cutoff $\epsilon$ results in a volume dependence that resembles random noise. This is shown as an example in Fig. 2(d), where $\Delta\Delta E$ is computed with $\kappa_{1}$ and $\kappa_{2}$ set to the minimum and maximum k-point sampling: the average $\left<\Delta\Delta E(V)\right>_{V}$ is approximately zero, the distribution is Gaussian-like and any smooth volume dependence is absent. Due to these characteristics we can therefore regard the root-mean-square magnitude of this contribution

$\Delta\Delta E(\epsilon,\kappa)=\sqrt{\left<\left(\Delta\Delta E_{\rm noise}(V)\right)^{2}\right>_{V}}$ (4)

as a statistical error. Its origin lies in discretization errors arising from the discrete jumps that occur whenever a continuous change in $\kappa$ or $\epsilon$ results in a discontinuous change in the integer number of k-vectors or plane waves.
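As a minimal illustration, Eqs. (3) and (4) translate into a few lines of NumPy, assuming the benchmark energies are stored in an array E of shape (n_V, n_eps, n_kappa) with the last index of each axis corresponding to the maximum parameter:

```python
import numpy as np

def statistical_error(E, j_eps, k1, k2):
    """RMS of the double difference over the volume grid (Eqs. 3 and 4)
    for cutoff index j_eps and two k-point indices k1, k2."""
    dE_eps_k2 = E[:, j_eps, k2] - E[:, -1, k2]   # Delta E_eps(V, eps_j, kappa_2), Eq. (1)
    dE_eps_k1 = E[:, j_eps, k1] - E[:, -1, k1]   # Delta E_eps(V, eps_j, kappa_1), Eq. (1)
    noise = dE_eps_k2 - dE_eps_k1                # Delta Delta E_noise(V), Eq. (3)
    return np.sqrt(np.mean(noise**2))            # Delta Delta E(eps, kappa), Eq. (4)
```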

Fig. 2(c) shows an analogous behavior for the k-point convergence $\Delta E_{\kappa}(V,\epsilon,\kappa)$: it is to first order independent of $\epsilon$. The corresponding difference $\Delta\Delta E$ is shown in Fig. 2(d) and is by construction (see Appendix A) identical to the one obtained from the cutoff convergence $\Delta E_{\epsilon}(V,\epsilon,\kappa)$. We can therefore conclude that the energy cutoff and k-point convergence can be considered separately, i.e.,

$\Delta E_{\epsilon}(V,\epsilon,\kappa)\approx\Delta E_{\epsilon}(V,\epsilon,\kappa_{max})\pm\Delta\Delta E(\epsilon,\kappa)$ (5)

and

$\Delta E_{\kappa}(V,\epsilon,\kappa)\approx\Delta E_{\kappa}(V,\epsilon_{max},\kappa)\pm\Delta\Delta E(\epsilon,\kappa).$ (6)

The above equations can be summarized in the following expression of the total energy surface:

$E(V,\epsilon,\kappa)\approx E(V,\epsilon_{max},\kappa_{max})+\Delta E_{\epsilon}(V,\epsilon,\kappa_{max})+\Delta E_{\kappa}(V,\epsilon_{max},\kappa)\pm\Delta\Delta E(\epsilon,\kappa).$ (7)

The above equation decomposes the original rank-three tensor, which would require performing $n_{V}\times n_{\epsilon}\times n_{\kappa}$ DFT calculations, into three rank-two tensors. The first two require only $n_{V}\times(n_{\epsilon}+n_{\kappa})$ DFT calculations. The last contribution, $\Delta\Delta E(\epsilon,\kappa)$, requires for each pair of $\epsilon,\kappa$ also the computation at all volumes, i.e., $n_{V}\times n_{\epsilon}\times n_{\kappa}$ DFT calculations. In the following section we will analyze the statistical error and derive a computationally efficient approach to compute it.
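The systematic part of Eq. (7) can be assembled by simple broadcasting from the two rank-two slices and the reference energy-volume curve. The sketch below assumes the slices are stored as NumPy arrays with the volume as the first index:

```python
import numpy as np

def reconstruct_energy(E_ref, E_eps_slice, E_kap_slice):
    """Systematic part of Eq. (7).
    E_ref:       (n_V,)          E(V, eps_max, kappa_max)
    E_eps_slice: (n_V, n_eps)    E(V, eps, kappa_max)
    E_kap_slice: (n_V, n_kappa)  E(V, eps_max, kappa)
    returns:     (n_V, n_eps, n_kappa) approximation of E(V, eps, kappa)."""
    dE_eps = E_eps_slice - E_ref[:, None]   # Delta E_eps(V, eps, kappa_max)
    dE_kap = E_kap_slice - E_ref[:, None]   # Delta E_kappa(V, eps_max, kappa)
    return (E_ref[:, None, None]
            + dE_eps[:, :, None]
            + dE_kap[:, None, :])
```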

2.2 Analysis of the statistical error

Having the full dataset of DFT energies as function of $V$, $\epsilon$ and $\kappa$ allows us to directly compute the statistical error as defined in Eq. (4). The results are summarized in Fig. 3(a). In the double-logarithmic plot the k-point convergence shows an almost linear dependence, with a vertical shift when changing the energy cutoff (color coded). The fact that the slope remains unchanged when changing the energy cutoff indicates that the k-point convergence of the statistical error is independent of the energy cutoff except for a proportionality factor. To verify this independence we plot the normalized statistical error $\sigma^{E}(\epsilon,\kappa)/\sigma^{E}(\epsilon,\kappa_{min})$ in Fig. 3(b). With this insight we also consider the normalized statistical error along the energy cutoff, $\sigma^{E}(\epsilon,\kappa)/\sigma^{E}(\epsilon_{min},\kappa)$. The fact that all curves coincide clearly shows that, except for a proportionality factor, the cutoff dependence is identical.

Refer to caption
Figure 3: Convergence of the statistical error (Eq. 4) for Cu. a) Convergence over the k-point mesh, with the different colors denoting the different energy cutoffs. b) Normalized k-point convergence $\sigma^{E}(\epsilon,\kappa)/\sigma^{E}(\epsilon,\kappa_{min})$. The red solid line illustrates the mean. The inset in b) relates the statistical error in the energy, $\sigma^{E}$, to an error in the bulk modulus, $\sigma^{B_{0}}$, using bootstrapping (see Sec. 3.2).

Using the above identified empirical relations we can approximate the statistical error by:

$\sigma^{E}(\epsilon,\kappa)\approx\frac{\sigma^{E}(\epsilon,\kappa_{min})\cdot\sigma^{E}(\epsilon_{min},\kappa)}{\sigma^{E}(\epsilon_{min},\kappa_{min})}\,.$ (8)

The computation of $\sigma^{E}(\epsilon,\kappa_{min})$ and $\sigma^{E}(\epsilon_{min},\kappa)$ requires $n_{V}\times(n_{\epsilon}+n_{\kappa})$ additional DFT calculations. To obtain a sufficiently large magnitude of the statistical error these calculations are performed at the minimum of the convergence parameter set, where the statistical noise is largest. As a consequence these extra calculations are computationally inexpensive. To compute the statistical error a second reference next to $\epsilon_{min}$ or $\kappa_{min}$ is needed (see Eq. 3). We find that $\epsilon_{max}$ and $\kappa_{max}$ provide an accurate estimate. These values do not require any additional DFT calculations since they are identical to the ones used to construct the systematic convergence errors $\Delta E_{\epsilon}$ and $\Delta E_{\kappa}$ in Eq. (7). The above formulation allows a highly efficient computation of the statistical error by reducing the computational effort from $n_{V}\times n_{\epsilon}\times n_{\kappa}$ DFT calculations to $2n_{V}(n_{\epsilon}+n_{\kappa})$.
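In array form, Eq. (8) is a simple outer product of the two measured edge profiles divided by their common corner value. A minimal sketch, assuming both edge arrays start at the minimum parameter:

```python
import numpy as np

def sigma_grid(sigma_vs_eps, sigma_vs_kap):
    """Separable approximation of Eq. (8).
    sigma_vs_eps: (n_eps,)   sigma^E(eps, kappa_min)
    sigma_vs_kap: (n_kappa,) sigma^E(eps_min, kappa)
    returns:      (n_eps, n_kappa) estimate of sigma^E(eps, kappa)."""
    sigma_corner = sigma_vs_eps[0]   # sigma^E(eps_min, kappa_min), shared by both edges
    return np.outer(sigma_vs_eps, sigma_vs_kap) / sigma_corner
```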

2.3 Analysis and fit of the energy-volume curve

Eq. (7) together with Eq. (8) provides a powerful and computationally highly efficient approach to compute energy-volume curves $E(V,\epsilon,\kappa)$ for any set of convergence parameters $\epsilon$ and $\kappa$ and is a key result of the present study. These relations also form the basis for the automated approach to derive optimum convergence parameters developed in the following. The underlying relations and assumptions have been carefully validated by computing and analysing an extensive set of pseudopotentials and chemical elements, as presented below.

2.3.1 Analysis of the energy-volume curve

Fig. 4 shows the computed energy-volume curves $E(V,\epsilon,\kappa)$ for two sets of convergence parameters: one with parameters as recommended by the VASP manual VASP_manual (i.e. $\epsilon_{min}=240$ eV and $\kappa_{min}=11$), the other for an extremely well converged parameter set. Looking at the results over a large volume range ($\pm 5\%$; Fig. 4(a)) the two curves appear smooth and well behaved. This may give the impression that the main impact of the convergence parameters is on the absolute energy scale, resulting only in a vertical shift. However, going to a 5 times smaller volume range (Fig. 4(b)) the curve obtained with the recommended convergence parameters shows discontinuities that divide it into segments. While the segments between two neighboring discontinuities are smooth and analytically well-behaved, their boundaries to the neighboring segments are discontinuous in absolute value and derivatives. As a consequence, even a well-defined energy minimum with zero first derivative, which is the definition of the $T=0$ K ground state, does not exist.

Refer to caption
Figure 4: Comparison of two energy-volume curves for fcc-bulk Al. The first one (marked by the blue and orange dots) has been computed using the recommended VASP_manual set of convergence parameters, i.e. an energy cutoff of $\epsilon=240$ eV and a k-point sampling of $\kappa=11$. The second one (marked by the red and green dots) has been obtained using an extremely high (i.e. well converged) set of parameters ($\epsilon=1000$ eV and $\kappa=101$). Three different volume ranges are shown: (a) $v_{range}=\pm 5\%$ to compare the absolute energies, (b) $v_{range}=\pm 1\%$ to compare the energy changes relative to the minimum energy and (c) $v_{range}=\pm 0.05\%$, again showing the energy changes. The graphs are coloured based on the number of plane waves, with the color changing whenever the total number of plane waves changes.

The discontinuous behaviour is a well-known artifact of PW-pseudopotential total energy calculations francis1990 . Its origin is that, when changing the volume, the number of basis functions (plane waves) changes. Since the number of plane waves is an integer, changing the volume continuously results in discontinuous changes in the number of PWs. Improving the convergence parameters reduces the magnitude of the discontinuity (see the line marked by red and green dots in the middle figure) but does not remove it (see Fig. 4(c), where the volume range has been reduced to $\pm 0.05\%$). Note also that in this interval the less converged curve (straight line marked by blue dots) has no resemblance at all to the expected, close to parabolic, energy minimum.

A common strategy to overcome the discontinuous behaviour is to fit the energy-volume points obtained from the DFT calculations to a smooth fitting function dacosta1986 . Two main approaches exist: the first uses as fit function analytical equation-of-state expressions that describe the relation between the volume of a body and the pressure applied to it murnaghan1944 ; birch1947 ; vinet1987 . From this set we chose the Birch-Murnaghan equation of state, since it is the most popular choice for fitting such energy-volume curves. The second approach is to use polynomial fits.

In contrast to the discontinuous energy surface, having a smooth fit to the energy-volume data points allows one to obtain the minimum as well as higher order derivatives around it. Particularly important in this respect are the energy minimum (related to the cohesive energy), the volume at which the energy becomes minimal (equilibrium volume $V_{eq}$ at $T=0$ K), as well as the second and third derivatives (related to the bulk modulus $B_{eq}$ and its derivative $B^{\prime}_{eq}$). These quantities can be measured experimentally and thus allow a direct comparison with the theoretical predictions.

Fitting the data points using either analytical or polynomial functions introduces, next to the DFT-related convergence parameters (e.g. $\epsilon$ and $\kappa$), additional parameters that need to be carefully chosen. For the energy-volume curve these are (i) the number of energy-volume pairs and (ii) the volume range. For a polynomial fit, in addition, the maximum polynomial degree is a parameter that needs to be tested. Since these parameters are related to the fit and not to the DFT calculation we call them hyper-parameters in the following.

2.3.2 Selection of suitable hyper-parameters for the fit

Similarly to the DFT convergence parameters, the fit-related hyper-parameters have to be chosen such that (i) the error related to them is smaller than the target error in the physical quantity of interest and (ii) they minimize the computational effort (number and computational expense of the necessary DFT calculations). To construct a suitable set of hyper-parameters we consider the bulk modulus:

$B_{eq}=V\left.\frac{\partial^{2}E(V)}{\partial V^{2}}\right|_{V=V_{\rm eq}}\,.$ (9)

The reason for choosing this quantity is that quantities related to higher derivatives of the total energy surface are more sensitive to fitting/convergence errors. Thus, identifying a set of hyper-parameters for this quantity guarantees that it works also for less sensitive quantities. Indeed, we checked and validated this hypothesis for quantities related to lower-order derivatives of the total energy surface such as the equilibrium volume $V_{eq}$ or the minimum energy $E_{eq}$.

We first study the dependence of the bulk modulus on the volume range. As input we use the DFT data set $E(V,\epsilon,\kappa)$ constructed in Sec. 2.1, i.e., 21 energy-volume data points equidistantly distributed over the considered volume range. For polynomial fits we also tested the impact of the polynomial degree. The accuracy of the fit increases up to a maximum degree of $d=11$, i.e., a degree which roughly corresponds to 1/2 of the number of data points. Going to higher degrees does not reduce the fitting error.
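A minimal sketch of this fitting step, assuming volumes in Å³ and energies in eV; the degree-11 polynomial and the 21 sampling points are the hyper-parameters discussed here, and the conversion factor turns eV/Å³ into GPa:

```python
import numpy as np

EV_PER_A3_TO_GPA = 160.2176634   # 1 eV/Angstrom^3 expressed in GPa

def bulk_modulus_from_fit(volumes, energies, degree=11):
    """Fit E(V) with a polynomial and evaluate Eq. (9) at the minimum."""
    coeff = np.polyfit(volumes, energies, deg=degree)
    dE = np.polyder(coeff, 1)
    d2E = np.polyder(coeff, 2)
    roots = np.roots(dE)                                  # candidates for dE/dV = 0
    roots = roots[np.isreal(roots)].real
    v_eq = roots[np.argmin(np.abs(roots - np.mean(volumes)))]
    b_eq = v_eq * np.polyval(d2E, v_eq)                   # Eq. (9), in eV/Angstrom^3
    return v_eq, b_eq * EV_PER_A3_TO_GPA
```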

The results are summarized in Fig. 5. Next to a polynomial fit we also show the results using a physics-based analytical expression (Birch-Murnaghan equation of state birch1947 ). As the lower limit for the target error we chose 0.1 GPa. We will later show that this target is much smaller than the DFT error related to the exchange-correlation functional, which is on the order of 10 GPa, and the one related to typical DFT convergence parameters ($\approx 1\dots 5$ GPa).

Fig. 5 shows that the performance of the two fitting approaches depends on whether low or high convergence parameters are used. For low convergence parameters the analytical fit based on Birch-Murnaghan (blue) is rather insensitive to the exact choice of the volume interval; it remains almost unchanged for intervals between 2 and 10%. Only when going above 10% does the underlying analytical model with its four free parameters become too inflexible, giving rise to an increasing model error. For the commonly recommended interval of $\pm$10% VASP_manual the polynomial fit (orange) shows a similar performance but deteriorates both for smaller and larger volume ranges.

For very high convergence parameters, however, the polynomial approach (red) clearly outperforms the analytical one (green). The polynomial fit is highly robust and largely independent of the chosen volume interval over the range between 0.1% and 30%. Also, the number of 21 energy-volume points is sufficient to achieve the targeted error of 0.1 GPa. In this high-convergence regime the analytical Birch-Murnaghan fit provides accurate predictions only for small volume ranges up to about 2%. The reason is that the analytical expression, which contains only four free fitting parameters, is no longer able to adjust to the actual shape of the DFT energy surface.

Based on this discussion we will use in the following a volume interval of $\pm$10% and 21 sample points. With this set of hyper-parameters we verified that the errors arising from the polynomial fit are below the target of 0.1 GPa for both the systematic and the statistical error.

Refer to caption
Figure 5: Comparison of the volume range dependence of the bulk modulus for a primitive aluminium supercell with low ($\epsilon=240$ eV, $\kappa=11$) and high ($\epsilon=1040$ eV, $\kappa=91$) convergence parameters, for the Birch-Murnaghan (BM) equation of state (blue/green) in comparison to a polynomial fit (orange/red), each with $N=21$ energy-volume pairs.

3 Uncertainty quantification of derived physical quantities

3.1 Construction of the error surface

The compact representation of the energy surface $E(V,\epsilon,\kappa)$ as function of both physical/materials parameters (i.e. volume $V$) and DFT convergence parameters ($\epsilon$, $\kappa$) by Eq. (7), together with the set of converged hyper-parameters derived in Sec. 2.3.2, allows us to analyze the convergence of important materials parameters such as the equilibrium bulk modulus, lattice constant, cohesive energy etc. with a very modest number of DFT calculations. To test the accuracy and predictive power of the approximate energy surface Eq. (7) we first compute the physical quantity from the full set (i.e. $V_{i}$, $\epsilon_{j}$, $\kappa_{k}$) of DFT data. As in the previous section we focus on the bulk modulus, which is highly sensitive to even small errors.

The deviation of the bulk modulus from its converged value $B_{0}(\epsilon_{max},\kappa_{max})$ as function of the convergence parameters is shown in Fig. 6(a). The color code shows the magnitude of the convergence error on a logarithmic scale. As expected, the error generally decreases when going towards higher convergence parameters. The actual dependence, however, is surprisingly complex, showing a non-monotonous behavior and several local minima. Performing the same fitting approach on the approximate energy surface Eq. (7) (see Fig. 6(c)) gives a convergence behavior that shows the same complexity and is virtually indistinguishable from the one shown in Fig. 6(a). Thus, Eq. (7) provides a highly accurate and computationally efficient approach for the uncertainty quantification of DFT convergence parameters.

It may be tempting to identify the local minima in the error surface as optimum convergence parameters that combine a low error with low computational effort. Unfortunately, these local minima are a spurious product of an oscillatory convergence behavior, where at the nodal points the value becomes close to the converged result. Since DFT convergence parameters should be robust against perturbations caused e.g. by changing the shape of the cell or by atomic displacements, the local minima are likely to shift, merge or disappear. Thus, selecting parameters based on such local minima would make them suitable only for the exact structure for which the uncertainty quantification has been performed. We therefore construct in the following an envelope function that connects the local maxima. The envelope represents the amplitude of the oscillatory convergence behavior and is roughly independent of the exact position of the nodes (phase shifts).

Fig. 6(b) shows the resulting envelope function. It is much smoother than the original error surface (Figs. 6(a) and 6(c)), decreases monotonously when increasing any of the convergence parameters and is free of any spurious local minima. For the further discussion and interpretation of convergence behavior and errors we will exclusively use the envelope function.
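One simple way to obtain such a monotone envelope is a running maximum taken from the well-converged corner of the parameter grid towards smaller parameters. This is only a sketch of the idea; the convex-hull-type construction actually used for Fig. 6(b) may differ in detail.

```python
import numpy as np

def error_envelope(err):
    """Monotone upper envelope of an oscillatory error surface.
    err: (n_eps, n_kappa) absolute convergence error, indices ordered from
         small to large convergence parameters."""
    env = np.abs(err).copy()
    # take the maximum over all parameter sets that are at least as well converged:
    env = np.maximum.accumulate(env[::-1, :], axis=0)[::-1, :]   # along the cutoff axis
    env = np.maximum.accumulate(env[:, ::-1], axis=1)[:, ::-1]   # along the k-point axis
    return env
```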

3.2 Error decomposition: Systematic vs statistical contribution

The energy expression given by Eq. (7) consists of two systematic contributions ($\Delta E_{\epsilon}$ and $\Delta E_{\kappa}$) that change smoothly with the convergence parameters and provide an absolute value. It also includes a statistical contribution $\Delta\Delta E$, which quantifies the magnitude of the fluctuations around this value. This decomposition into systematic and statistical contributions can be directly transferred to the physical quantities derived from the energy surface. To get the systematic contribution, only the systematic part of the energy (i.e. the first three terms in Eq. (7)) is used as input for the fit. The statistical contribution is obtained using a Monte Carlo bootstrapping approach: the last term in Eq. (7) is replaced by a normal distribution $N(\mu=0,\sigma^{2}=\Delta\Delta E)$ and the fitting is performed over a large number of such noisy realizations of the best-converged surface, $E(V,\epsilon_{max},\kappa_{max})+N(\mu=0,\sigma^{2}=\Delta\Delta E(\epsilon,\kappa))$. The inset in Fig. 3(b) shows the computed propagation of the statistical error in the total energy $E$ to the statistical error in the bulk modulus. One observes a linear relation between the error in the bulk modulus and the magnitude of the noise in the energy-volume curve. This is because the bulk modulus is obtained as the second derivative of the polynomial fitted to the DFT-computed energy-volume curve. Because both fitting and taking the second derivative are linear operations, the resulting bulk modulus is a linear functional of the energy values. Hence, the noise in the bulk modulus scales linearly with the simulated noise in the energy-volume curve.
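A sketch of this bootstrap, reusing the bulk_modulus_from_fit() helper from the sketch in Sec. 2.3.2; noise_sigma plays the role of the noise magnitude $\Delta\Delta E(\epsilon,\kappa)$:

```python
import numpy as np

def bootstrap_sigma_B(volumes, energies_ref, noise_sigma, n_samples=1000, seed=0):
    """Propagate the statistical energy error to the bulk modulus by adding
    Gaussian noise to the best-converged energy-volume curve and refitting."""
    rng = np.random.default_rng(seed)
    b_samples = []
    for _ in range(n_samples):
        noisy = energies_ref + rng.normal(0.0, noise_sigma, size=len(energies_ref))
        _, b_eq = bulk_modulus_from_fit(volumes, noisy, degree=11)
        b_samples.append(b_eq)
    return np.std(b_samples)    # sigma^{B_0} for this (eps, kappa)
```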

Fig. 6 shows the error surface for the statistical error (Fig. 6(a) and (d)), the systematic error (Fig. 6(b) and (e)) as well as the total convergence error (Fig. 6(c) and (f)) for Al and Cu. The two elements have been chosen since they represent the two most different cases that we observe for all investigated potentials.

Refer to caption
Figure 6: (a-c) Construction and reconstruction of the error surface: a) calculated (raw) error, b) applying the convex hull construction described in the text and c) the reconstruction using Eq. 7. The middle and bottom row show the convex hull of the systematic error, the statistical error and the total error for Al (d-f) and Cu (g-i). The solid red line in i) is the (phase) boundary separating regions (lower-left part) where the statistical error dominates from regions (upper-right) where the systematic one is large.

3.3 Error phase diagrams: Interpretation and Application

Generally, the statistical error becomes dominant for low convergence parameters (in the bottom-left region) while the systematic error dominates for medium and high convergence parameters. The red solid line in Fig. 6(i) shows the boundary where the magnitudes of both errors become equal. For Al (Fig. 6(c)) this line is absent since the systematic error dominates over the entire region, i.e., even for the lowest considered convergence parameters. Since we have chosen the minimum at or only slightly below the VASP-recommended settings, this region captures convergence parameters that are commonly chosen. For Cu, in contrast, the statistical error dominates even when using convergence parameters that are well above the recommended ones, i.e., at low k-point sampling even for an energy cutoff of more than 600 eV.

The boundary (red line in Fig. 6(i)) that separates the regions where either the statistical or the systematic error dominates can be interpreted like a boundary in a phase diagram: it directly provides information about the dominating error (phase) for any given set of convergence parameters (state variables). Such diagrams can thus be regarded as error phase diagrams. The knowledge of the error type has direct practical consequences. If the systematic error dominates, the convergence error becomes a simple linear superposition of the individual convergence errors (see Eq. (7)). (Strictly speaking, Eq. (7) applies only to the total energy. For derived quantities such as the bulk modulus we observe sizeable deviations, since next to the explicit dependence of e.g. the bulk modulus on the convergence parameters an implicit one via the equilibrium volume occurs as well, i.e. $B_{0}(\epsilon,\kappa,V_{0}(\epsilon,\kappa))$.) It thus allows one, for any set of ($\kappa$, $\epsilon$) values, to determine the deviation from the converged result including its sign. Due to its additive nature the total systematic error will always be dominated by the least converged parameter.

In contrast, the total statistical error can be reduced to any target by converging just a single convergence parameter, which is a direct consequence of its multiplicative rather than additive nature (see Eq. (8)). This is also the reason why the statistical error decays faster than the systematic error when both convergence parameters are increased simultaneously.

Knowledge of the error phase diagram introduced and constructed above can be directly used to find convergence parameters that minimize computational resources for a given target accuracy. In the region where the statistical error dominates, its multiplicative nature implies that the targeted accuracy can be achieved by converging only a single parameter, which in practice will be the computationally less expensive one. In typical cases this will be the k-point sampling, since the necessary CPU time scales linearly with the number of k-points and the number of k-points decreases with increasing system size. In contrast, if highly converged calculations are desired one will be in the region of the error phase diagram where the systematic error dominates. Since in this region the errors of the individual convergence parameters are additive, converging one parameter better than the other would be a waste of CPU time. Thus, the availability of such error phase diagrams allows us to provide a highly systematic and intuitive way of identifying optimum sets of DFT parameters. In a way, error phase diagrams may become what thermodynamic phase diagrams are for materials engineers today: roadmaps for identifying optimum paths (convergence parameters) in materials design (DFT calculations).
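The phase-diagram logic itself is trivial to evaluate once the two error envelopes are available; a minimal sketch, where the red/grey boundary lines of Figs. 6 and 7 correspond to the zero contour of the difference:

```python
import numpy as np

def systematic_dominates(sys_err, stat_err):
    """Boolean map over the (eps, kappa) grid, True where the systematic
    error envelope exceeds the statistical one."""
    return np.abs(sys_err) > np.abs(stat_err)

# The boundary itself can be drawn as the zero contour of the difference, e.g.
# matplotlib.pyplot.contour(kpoints, cutoffs, np.abs(sys_err) - np.abs(stat_err), levels=[0.0])
```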

3.4 Implementation of an automated approach

Using the concepts outlined in the previous sections allows us to construct an easy-to-implement automated approach. This approach computes, for a given chemical element and its pseudopotential, a set of optimum convergence parameters. These optimized parameters guarantee that the error in a user-selected quantity (e.g. bulk modulus, lattice constant etc.) is below a user-defined target error. In the present implementation the developed tool accepts only cubic structures where the unit cell can be fully described by the lattice constant as the single variable.

The key steps of the automated approach are as follows:

  • Determine the approximate lattice constant at $\epsilon_{max}$ and $\kappa_{max}$ using the experimental lattice constant, 21 volume points, a volume interval of $\pm 10\%$, and a polynomial fit of order $d=11$.

  • Perform DFT calculations around the computed equilibrium lattice constant using again 21 volume points and a volume interval of $\pm 10\%$. Compute on each of the volume points the energies along ($\epsilon_{min}$, $\kappa_{i}$), ($\epsilon_{max}$, $\kappa_{i}$), ($\epsilon_{i}$, $\kappa_{min}$) and ($\epsilon_{i}$, $\kappa_{max}$).

  • Use Eq. (7) to compute the total energy $E$ on the full 3d mesh ($V$, $\epsilon$, $\kappa$). No extra DFT calculations are needed.

  • Compute the systematic and the statistical error of the target quantity $A$ (e.g. bulk modulus) following the discussion given in Sec. 2.3. Construct the envelope function of the combined error.

  • Determine on the error surface $A(\epsilon,\kappa)$ the line $\Delta A(\epsilon_{opt},\kappa_{opt})=\Delta A_{target}$, i.e., the sets of convergence parameters where the predicted error equals the target error.

  • Take the set where the curvature of the line is maximum (i.e., closest to a 90° angle); a minimal sketch of this selection step is given below the list.
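A minimal sketch of the last two steps; the curvature criterion used in the actual implementation may be more elaborate than the simple normalized-cost heuristic chosen here:

```python
import numpy as np

def optimal_parameters(cutoffs, kpoints, total_error, target):
    """total_error: (n_eps, n_kappa) envelope of the predicted error of the
    target quantity A. Returns one (eps_opt, kappa_opt) pair on the
    iso-contour Delta A = Delta A_target."""
    candidates = []
    for j, eps in enumerate(cutoffs):
        below = np.where(total_error[j, :] <= target)[0]
        if below.size:                                  # smallest kappa meeting the target
            candidates.append((eps, kpoints[below[0]]))
    if not candidates:
        raise ValueError("target error not reachable on the sampled grid")
    # pick the 'knee' of the contour by balancing both normalized parameters
    cost = [e / max(cutoffs) + k / max(kpoints) for e, k in candidates]
    return candidates[int(np.argmin(cost))]
```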

The above algorithm has been implemented in the pyiron framework janssen2019 . The pyiron-based module requires as input only the pseudopotential and the selection of the target quantity and error. The setup of the DFT jobs, the submission to a compute cluster, the analysis etc. are done fully automatically without any user intervention.

Refer to caption
Figure 7: Comparison of convergence for nine fcc metals. The red lines mark iso-contours of constant error (see legend in panel a). The blue squares mark the parameter set recommended by VASP. The orange and magenta filled squares mark the values used in the Materials Project jain2011 ; jain2013 ; jain2015 and the delta project lejaeghere2016 , respectively. Some elements like Ca (a), Al (b), Pb (d), Ir (g), Pt (h) and Au (i) achieve a precision of up to 0.1 GPa. Others like Cu (c), Pd (e) and Ag (f) are limited to 0.5 GPa in the considered parameter space. Above each contour plot the element and its bulk modulus determined at ($\epsilon_{max}$, $\kappa_{max}$) are given. The grey line denotes the boundary where the systematic and statistical errors are equal.

3.5 Application and benchmark of the automated tool

To demonstrate the robustness and the accuracy of the proposed approach we have employed our automated tool to compute the error surface and optimized convergence parameters for a wide range of chemical elements and pseudopotentials. Fig. 7 shows the convergence behavior for 9 elements using the bulk modulus as target quantity. For each calculation the PBE-GGA pseudopotential recommended by the VASP manual VASP_manual is chosen. These are also the same pseudopotentials used in the Delta project lejaeghere2016 . To visualize the convergence, contour lines at target errors of 0.1 to 5 GPa are shown on the error surface. Some of the elements, such as e.g. Ca, allow a convergence down to 0.1 GPa even for rather modest convergence parameters. Others, such as e.g. Cu, barely achieve this lower limit at the maximum set of convergence parameters studied here. In general, systematic convergence trends that would allow extracting simple rules for finding optimum convergence parameters are lacking. This emphasizes the importance of providing automated tools for this task.

For our minimum parameter set the statistical error dominates for all elements except Al, Ca and Pb, as indicated by the grey line in Fig. 7, which marks the boundary where the statistical and systematic errors are equal. Going to errors below 1 GPa, the systematic error becomes the dominating contribution except for elements with a very high bulk modulus such as Ir and Pt. In the region dominated by the systematic error the cutoff- and k-point-related error contributions are additive (Eq. (7)). As a consequence, the contour lines in Fig. 7 are either parallel to the $\kappa$-axis (i.e. the dominating error is due to the energy cutoff $\epsilon$) or to the $\epsilon$-axis (with $\kappa$ causing the leading error).

To ’benchmark’ the choices made by our automated tool against the choices made by human experts, we include the parameters used in two large and well-established high-throughput studies: the Materials Project jain2011 ; jain2013 ; jain2015 (orange squares) and the delta project lejaeghere2016 (magenta squares). The delta project, which aims at high precision to allow a comparison between different DFT codes, systematically shows an error between 1 and 5 GPa, with a clear tendency towards the 1 GPa limit. For the Materials Project, where the focus is on computational efficiency rather than the highest precision, the error is close to the 5 GPa limit; for several elements (e.g. Ir, Pt, Au) the error becomes 10 GPa or larger.

4 Conclusions

In summary, by analyzing the dependence of the total energy not only as function of a single convergence parameter, as commonly done, but as function of a physical parameter (in the present study the volume) as well as multiple convergence parameters (energy cutoff and k-point sampling) simultaneously, we identified powerful relations to compute both the statistical and the systematic error of the total energy and derived quantities. The identified relations, summarized in Eqs. (7) and (8), provide an accurate and computationally efficient tensor decomposition (Fig. 1), thus allowing us to describe error surfaces over multiple convergence parameters for important materials quantities such as e.g. the bulk modulus.

Being able to construct such error surfaces for both the systematic and the statistical error with a modest number of simple DFT calculations opened the way to construct error phase diagrams, which tell whether for any given parameter set one or the other error type dominates. They also provide direct insight into how multiple convergence parameters together affect the errors and allowed us to construct contour lines of constant error. Having this detailed insight helps DFT practitioners to choose and validate accurate yet computationally efficient convergence parameters. It also allowed us to develop and implement a robust and computationally efficient algorithm. The resulting fully automated tool, implemented in pyiron janssen2019 , predicts an optimum set of convergence parameters with minimum user input, i.e. the choice of pseudopotential, the desired target error and the quantity of interest (e.g. bulk modulus). We expect that this approach, where explicit convergence parameters are replaced by a user-selected target error, will be particularly important for applications where a systematically high accuracy is crucial for a large number of DFT calculations, e.g. for high-throughput studies and for constructing data sets to be used in machine learning.

5 Acknowledgement

JJ and JN thank Kurt Lejaeghere and Christoph Freysoldt for stimulating discussions and the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Projektnummer 405621217 for financial support. EM and AVS acknowledge the financial support from the Russian Science Foundation (grant number 18-13-00479).

6 Competing Interests

The authors declare no competing interests.

7 Code Availability

The Jupyter Notebooks developed to run the calculations and to analyze the data will be provided in our freely accessible pyiron repository (https://github.com/pyiron/pyiron-dft-uncertainty). The fully interactive Jupyter Notebooks together with our pyiron framework contain the entire code and make it easy to reproduce all calculations and the analysis.

8 Data Availability

In addition to the Jupyter Notebooks also the corresponding data will be provided on our freely accessible pyiron repository (https://github.com/pyiron/pyiron-dft-uncertainty).

9 Author Contributions

JJ and JN developed the concepts of the automated uncertainty quantification. JJ implemented the method in the pyiron framework. JJ and EM compared the predictions with existing results. All five authors contributed to the generalization of the method and the writing of the manuscript.

Appendix A Statistical error

Here we show that Eq. (3) yields the same result whether it is evaluated via the k-point dependence $\Delta E_{\epsilon}(V,\kappa)$ or via the energy cutoff dependence $\Delta E_{\kappa}(V,\epsilon)$. For a discrete set of k-points $\kappa_{1}$, $\kappa_{2}$ and a discrete set of energy cutoffs $\epsilon_{1}$, $\epsilon_{2}$ the energy differences are defined as:

$\Delta E_{\epsilon}(V,\kappa)=E(V,\epsilon_{2},\kappa)-E(V,\epsilon_{1},\kappa)$
$\Delta E_{\kappa}(V,\epsilon)=E(V,\epsilon,\kappa_{2})-E(V,\epsilon,\kappa_{1}).$

Based on this definition the statistical error can be defined in two ways:

$\Delta\Delta E_{\rm noise}=\Delta E_{\epsilon}(V,\kappa_{2})-\Delta E_{\epsilon}(V,\kappa_{1})$
$=\big(E(V,\epsilon_{2},\kappa_{2})-E(V,\epsilon_{1},\kappa_{2})\big)-\big(E(V,\epsilon_{2},\kappa_{1})-E(V,\epsilon_{1},\kappa_{1})\big)$
$=\big(E(V,\epsilon_{2},\kappa_{2})-E(V,\epsilon_{2},\kappa_{1})\big)-\big(E(V,\epsilon_{1},\kappa_{2})-E(V,\epsilon_{1},\kappa_{1})\big)$
$=\Delta E_{\kappa}(V,\epsilon_{2})-\Delta E_{\kappa}(V,\epsilon_{1})$

The statistical error is independent of the order of the differences.
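This symmetry can also be verified numerically in a few lines; random numbers stand in for the energies at the four corners of the ($\epsilon_{1,2}$, $\kappa_{1,2}$) square:

```python
import numpy as np

rng = np.random.default_rng(1)
# E[v, i, j] = E(V_v, eps_i, kappa_j) for eps in (eps_1, eps_2), kappa in (kappa_1, kappa_2)
E = rng.normal(size=(5, 2, 2))

via_eps   = (E[:, 1, 1] - E[:, 0, 1]) - (E[:, 1, 0] - E[:, 0, 0])   # via Delta E_eps
via_kappa = (E[:, 1, 1] - E[:, 1, 0]) - (E[:, 0, 1] - E[:, 0, 0])   # via Delta E_kappa
assert np.allclose(via_eps, via_kappa)
```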

References