This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Environment-adaptive machine learning potentials

Ngoc Cuong Nguyen Department of Aeronautics and Astronautics, Massachusetts Institute of Technology
77 Massachusetts Avenue, Cambridge, MA 02139
   Dionysios Sema Department of Mechanical Engineering, Massachusetts Institute of Technology
77 Massachusetts Avenue, Cambridge, MA 02139
Abstract

The development of interatomic potentials that can accurately capture a wide range of physical phenomena and diverse environments is of significant interest, but it presents a formidable challenge. This challenge arises from the numerous structural forms, multiple phases, complex intramolecular and intermolecular interactions, and varying external conditions. In this paper, we present a method to construct environment-adaptive interatomic potentials by adapting to the local atomic environment of each atom within a system. The collection of atomic environments of interest is partitioned into several clusters of atomic environments. Each cluster represents a distinctive local environment and is used to define a corresponding local potential. We introduce a many-body many-potential expansion to smoothly blend these local potentials to ensure global continuity of the potential energy surface. This is achieved by computing the probability functions that determine the likelihood of an atom belonging to each cluster. We apply the environment-adaptive machine learning potentials to predict observable properties for Ta element and InP compound, and compare them with density functional theory calculations.

preprint: APS/123-QED

I Introduction

Molecular dynamics (MD) simulations require an accurate calculation of energies and forces to analyze the physical movements of atoms. While electronic structure calculations provide accurate energies and forces, they are restricted to analyzing small length scales and short time scales due to their high computational complexity. Interatomic potentials represent the potential energy surface (PES) of an atomic system as a function of atomic positions and thus leave out the detailed electronic structures. They can enable MD simulations of large systems with millions or even billions of atoms over microseconds.

Over the years, empirical interatomic potentials (EIPs) such as the Finnis-Sinclair potential [1], embedded atom method (EAM) [2], modified EAM (MEAM) [3], Stillinger-Weber (SW) [4], Tersoff [5], Brenner [6], EDIP [7], COMB [8], ReaxFF [9] have been developed to treat a wide variety of atomic systems with different degrees of complexity. EAM potential has its root from the Finnis-Sinclair potential [1] in which the embedding function is a square root function. The MEAM potential [3] was developed as a generalization of the EAM potential by including angular-dependent interactions in the electron density term. The SW potential takes the form of a three-body potential in which the total energy is expressed as a linear combination of two- and three-body terms. The Tersoff potential is fundamentally different from the SW potential in that the strength of individual pair interactions is affected by the presence of surrounding atoms. The Brenner potential is based directly on the Tersoff potential but has additional terms and parameters which allow it to better describe various chemical environments. EDIP is designed to more accurately represent interatomic interactions by considering the effects of the local atomic environment on these interactions. Because EAM, MEAM, Tersoff, Brenner, EDIP, ReaxFF and COMB potentials dynamically adjust the strength of the bond based on the local environment of each atom, they can describe several different bonding states and complex behaviors of atoms in various states, including defects, phase transitions, surfaces, and interfaces within materials. One of the key features of ReaxFF and COMB is their ability to handle charge equilibration in a manner that includes long-range electrostatic interactions and reflects changes in the electronic environment of atoms during chemical reactions.

The past decade has seen a tremendous interest in machine learning interatomic potentials (MLIPs) due to their promising quantum accuracy at significantly lower computational complexity than electronic structure calculations. The descriptors play a central role in the construction of accurate and efficient MLIPs. In recent years, a wide variety of descriptors has been developed to represent atomic structures. There are two main approaches to mapping a configuration of atoms onto descriptors [10]: atom density approach and internal coordinate approach. Examples of internal coordinate descriptors include permutation-invariant polynomials (PIPs) [11, 12, 13], atom-centered symmetry functions (ACSFs) [14, 15, 16], and proper orthogonal descriptors (PODs) [17, 18, 19]. These internal coordinate descriptors are intrinsically invariant with respect to translation and rotation because they are functions of angles and distances. They are made to be permutationally invariant by summing symmetry functions over all possible atomic pairs and triplets within local atomic environments. However, achieving permutation invariance by such a way leads to the exponential scaling in terms of the number of neighbors. The computational cost can be kept under control by restricting the range of interactions, the number of descriptors, and the body orders.

The atom density approach describes a local atomic environment around a central atom as an atom density function which is obtained by summing over localized functions centered on the relative positions of all atoms in the local environment. Such a density is naturally invariant to translation and permutation. The atomic neighborhood density is then expanded as a linear combination of appropriate basis functions, where the expansion coefficients are given by the inner products of the neighborhood density with the basis functions. Rotationally invariant descriptors are computed as appropriate sums of products of the density coefficients. In the atom density approach, the choices of the basis set (e.g., radial basis functions, spherical harmonics, angular monomials, hyperspherical harmonics) lead to different sets of descriptors. The power spectrum and bispectrum descriptors [20] are constructed from spherical harmonics, while the spectral neighbor analysis potential (SNAP) descriptors [21] are based on hyperspherical harmonics. The moment tensor potential (MTP) [22] projects the atomic density onto a tensor product of angular vectors to construct the moment tensors whose contraction results in invariant descriptors. The atomic cluster expansion (ACE) [23, 24] extends the power and bispectrum construction to obtain a complete set of invariant descriptors with arbitrary number of body orders. The E3 equivariant graph neural network potentials [25] use spherical harmonics. The atom density representation of POD descriptors employs angular monomials and radial basis functions constructed from the proper orthogonal decomposition [26].

The main advantage of atom density descriptors is that their computational complexity scales linearly with the number of neighbors irrespective of the body orders. The computational complexity of internal coordinate descriptors scales exponentially with the body order in terms of the number of neighbors. However, the cost of internal coordinate descriptors scales linearly with the number of basis functions, whereas that of atom density descriptors scales exponentially with the body order in terms of the number of basis functions. In general, atom density descriptors are more efficient than internal coordinate descriptors when there are many neighbors and the body order is higher than 3. Despite the rather fundamental difference in their construction, some internal coordinate descriptors and atom density descriptors can be shown to span the same descriptor space. This is the case for the POD descriptors in which the atom density representation is shown to be equivalent to the internal coordinate representation [26]. The POD formalism allows other internal coordinate descriptors like PIPs and ACSFs, as well as empirical potentials like EAM, MEAM, and SW, to be implemented using the atom density approach.

Refer to caption
Figure 1: Proper Orthogonal Descriptors can be formulated with either the internal coordinate or atom density representations with a transformation that shows their equivalency.

Despite considerable progress that has been made in recent years, there remain open problems to be addressed with regard to the accuracy, efficiency, and transferability of interatomic potentials. The development of interatomic potentials that can effectively capture a wide range of atomic environments is a complex challenge due to several reasons. Materials can exist in numerous structural forms (e.g., crystalline, amorphous, defects, interfaces) and phases (solid, liquid, gas, plasma). Atoms interact through various forces such as electrostatic, van der Waals, ionic bonding, covalent bonding, and metallic bonding, which manifest differently depending on the chemical elements and their electronic structures. Furthermore, the effective interaction among atoms can change with external conditions like temperature, pressure, and chemical environment. Consequently, creating an interatomic potential that performs well across diverse conditions is difficult because optimizing the potential for one set of conditions can lead to poorer performance in others. Each of these factors contributes to the complexity of developing interatomic potentials that are effective and efficient to capture a diverse range of local atomic environments.

In this paper, we introduce a method for the systematic construction of accurate and transferable interatomic potentials by adapting to the local atomic environment of each atom within a system. Local atomic environment of an atom comprises the positions and chemical species of the atom and its neighbors within a cutoff radius. These atom positions and chemical species can be mapped onto a vector of MM invariant descriptors by using either the internal coordinate approach or the atom density approach. For a dataset of NN atoms, we obtain a descriptor matrix of size NN by MM. Each row of the descriptor matrix encapsulates the local atomic environment of the corresponding atom. Since MM is typically large, a dimensionality reduction technique is used to compress the descriptor matrix into a lower-dimensional matrix of size NN by JJ, where JJ is considerably less than MM. A clustering method is then employed to partition the compressed data into KK separate clusters. In other words, the original dataset of NN atoms is divided into KK subsets and the atoms in any subset have similar atomic environments. The clustering scheme allows us to divide the diverse dataset into smaller subsets, each characterized by data points sharing the common attributes. This approach captures the diversity inherent in the dataset by identifying distinct atomic environments within the dataset. By training MLIPs on these subsets separately, we can obtain MLIPs that are tailored to specific atomic environments. Each MLIP may accurately predict configurations in the subset on which it is trained. However, it may not be accurate for predicting configurations in the other subsets.

The above approach raises the question: How do we combine these separately localized MLIPs to construct a global potential energy surface? To this end, we propose a many-body many-potential (MBMP) expansion designed to seamlessly blend the individual MLIPs and ensure that the potential energy surface remains continuous across cluster boundaries. This continuity is achieved by calculating probability functions that assess the likelihood of an atom belonging to specific clusters identified within the dataset. These probability functions are critical in guiding how contributions from different MLIPs are weighted and combined, providing a systematic way to maintain the integrity and accuracy of the model across different atomic environments. This integration is crucial for achieving a comprehensive model that can accurately capture diverse environments in the original dataset. This model can also capture atomic environments that are a mixture of several distinct environments when the probability functions are close to each other, thereby potentially making the model more transferable than the individual MLIPs.

Although the formulation of the environment-adaptive machine learning (EAML) potentials is descriptor agnostic and can be developed for any set of descriptors, in this work we employ the proper orthogonal descriptors [17, 26]. To this end, we extend the proper orthogonal descriptors to deal with multi-element systems. This enables us to construct EAML potentials that are finely tuned to the complexities of various material compositions under diverse conditions. We apply the EAML potentials to predict observable properties for Ta element and InP compound, and compare them with density functional theory calculations.

The paper is organized as follows. In Section II, we extend proper orthogonal descriptors to multi-element systems. In Section III, we describe our approach for constructing EAML potentials. In Section IV, we present results to demonstrate the EAML potentials for Tantalum and Indium Phosphide. Finally, we provide some concluding remarks in Section V.

II Multi-Element proper orthogonal descriptors

This section outlines a systematic approach for constructing internal coordinate and atom density descriptors to represent the local atomic environments of multi-element systems. Building on our previous work [17, 18, 19], we develop invariant descriptors for multi-element systems by leveraging orthogonal proper decomposition to generate radial basis functions and employing the trinomial expansion of angular monomials to achieve rotational symmetry. The resulting descriptors combine elements of both internal coordinates and atom density fields, as illustrated in Figure 1.

II.1 Many-body potential energy surface

We consider a multi-element system of NaN_{a} atoms with NeN_{e} unique elements. We denote by 𝒓i\bm{r}_{i} and ZiZ_{i} position vector and type of an atom ii in the system, respectively. Thus we have Zi{1,,Ne}Z_{i}\in\{1,\ldots,N_{e}\}, 𝑹=(𝒓1,𝒓2,,𝒓Na)3Na\bm{R}=(\bm{r}_{1},\bm{r}_{2},\ldots,\bm{r}_{N_{a}})\in\mathbb{R}^{3N_{a}}, and 𝒁=(Z1,Z2,,ZNa)Na\bm{Z}=(Z_{1},Z_{2},\ldots,Z_{N_{a}})\in\mathbb{N}^{N_{a}}. The potential energy surface (PES) of the system of NaN_{a} atoms can be expressed as a many-body expansion of the form

ET(𝑹,𝒁)=iV(1)(𝒓i,Zi)+i,jV(2)(𝒓i,𝒓j,Zi,Zj)+i,j,kV(3)(𝒓i,𝒓j,𝒓k,Zi,Zj,Zk)+i,j,k,lV(4)(𝒓i,𝒓j,𝒓k,𝒓l,Zi,Zj,Zk,Zl)+\begin{split}E_{\rm T}(\bm{R},\bm{Z})=&\sum_{i}V^{(1)}(\bm{r}_{i},Z_{i})+\sum_{i,j}V^{(2)}(\bm{r}_{i},\bm{r}_{j},Z_{i},Z_{j})\\ &+\sum_{i,j,k}V^{(3)}(\bm{r}_{i},\bm{r}_{j},\bm{r}_{k},Z_{i},Z_{j},Z_{k})\\ &+\sum_{i,j,k,l}V^{(4)}(\bm{r}_{i},\bm{r}_{j},\bm{r}_{k},\bm{r}_{l},Z_{i},Z_{j},Z_{k},Z_{l})+\ldots\end{split} (1)

The superscript on each potential denotes its body order. Each potential must also depend on a set of parameters used to parametrize it for a specific application. To simplify the notation, we have chosen not to explicitly denote these parameters in the potentials. A separation of the PES into atomic contributions yields

ET(𝑹,𝒁)=i=1NaEi(𝑹,𝒁)E_{\rm T}(\bm{R},\bm{Z})=\sum_{i=1}^{N_{a}}E_{i}(\bm{R},\bm{Z}) (2)

where EiE_{i} is obtained from (1) by removing the sum over index ii. To make the PES invariant with respect to translation and rotation, the potentials should depend only on internal coordinates as follows

Ei=V(1)(Zi)+jV(2)(rij,Zi,Zj)+j,kV(3)(rij,rik,wijk,Zi,Zj,Zk)+j,k,lV(4)(rij,rik,ril,wijk,wijl,wikl,Zi,Zj,Zk,Zl)+\begin{split}E_{i}=&V^{(1)}(Z_{i})+\sum_{j}V^{(2)}(r_{ij},Z_{i},Z_{j})\ +\\ &\sum_{j,k}V^{(3)}(r_{ij},r_{ik},w_{ijk},Z_{i},Z_{j},Z_{k})\ +\\ &\sum_{j,k,l}V^{(4)}(r_{ij},r_{ik},r_{il},w_{ijk},w_{ijl},w_{ikl},Z_{i},Z_{j},Z_{k},Z_{l})+\ldots\end{split} (3)

where 𝒓ij=𝒓j𝒓i\bm{r}_{ij}=\bm{r}_{j}-\bm{r}_{i}, rij=|𝒓ij|r_{ij}=|\bm{r}_{ij}|, wijk=cosθijk=𝒓^ij𝒓^ikw_{ijk}=\cos\theta_{ijk}=\hat{\bm{r}}_{ij}\cdot\hat{\bm{r}}_{ik}, 𝒓^=𝒓/|𝒓|\hat{\bm{r}}=\bm{r}/|\bm{r}|. The internal coordinates include both distances rij,rik,rilr_{ij},r_{ik},r_{il} and angles wijk,wijl,wiklw_{ijk},w_{ijl},w_{ikl}. The number of internal coordinates for V(q)V^{(q)} is equal to (q1)q/2(q-1)q/2. Typically, the one-body terms V(1)(Zi)V^{(1)}(Z_{i}) are set to the isolated energies of atom ii.

II.2 Two-body proper orthogonal descriptors

We briefly describe the construction of two-body PODs and refer to [17, 26] for further details. We assume that the direct interaction between two atoms vanishes smoothly when their distance is greater than the cutoff distance rcutr_{\rm cut}. Furthermore, we assume that two atoms can not get closer than the inner cutoff distance rinr_{\rm in}. Letting r(rin,rcut)r\in(r_{\rm in},r_{\rm cut}), we introduce the following parametrized radial functions

ϕ(r,rin,rcut,α,β)=sin(απx)α(rrin),φ(r,γ)=1rγ,\phi(r,r_{\rm in},r_{\rm cut},\alpha,\beta)=\frac{\sin(\alpha\pi x)}{\alpha(r-r_{\rm in})},\qquad\varphi(r,\gamma)=\frac{1}{r^{\gamma}}, (4)

where the scaled distance function xx is given by

x(r,rin,rcut,β)=eβ(rrin)/(rcutrin)1eβ1.x(r,r_{\rm in},r_{\rm cut},\beta)=\frac{e^{-\beta(r-r_{\rm in})/(r_{\rm cut}-r_{\rm in})}-1}{e^{-\beta}-1}. (5)

The function ϕ\phi in (4) is related to the zeroth spherical Bessel function, while the function φ\varphi is inspired by the n-m Lennard-Jones potential. Although the parameter γ\gamma can be real number, we choose a set of consecutive positive integers {1,2,,Pγ}\{1,2,\ldots,P_{\gamma}\} to compute instances of the parametrized function φ\varphi by making use of the relation φ(r,γ+1)=φ(r,γ)/r\varphi(r,\gamma+1)=\varphi(r,\gamma)/r. Similarly, we choose a set of consecutive integers {1,2,,Pα}\{1,2,\ldots,P_{\alpha}\} for α\alpha to generate instances of the parametrized function ϕ\phi by making use of the formula sin((α+1)πx)=sin(πx)Uα(cos(πx))\sin((\alpha+1)\pi x)=\sin(\pi x)U_{\alpha}(\cos(\pi x)), where UαU_{\alpha} are Chebyshev polynomials of the second kind. We take PβP_{\beta} values for the parameter β\beta such that βk=(k1)βmax/(Pβ1)\beta_{k}=(k-1)\beta_{\max}/(P_{\beta}-1) for k=1,2,,Pβk=1,2,\ldots,P_{\beta}, where βmax=4.0\beta_{\max}=4.0.

We introduce the following function as a convex combination of the two functions in (4)

ψ(r,𝝁)=κϕ(r,rin,rcut,α,β)+(1κ)φ(r,γ),\psi(r,\bm{\mu})=\kappa\phi(r,r_{\rm in},r_{\rm cut},\alpha,\beta)+(1-\kappa)\varphi(r,\gamma), (6)

where μ1=rin,μ2=rcut,μ3=α,μ4=β,μ5=γ\mu_{1}=r_{\rm in},\mu_{2}=r_{\rm cut},\mu_{3}=\alpha,\mu_{4}=\beta,\mu_{5}=\gamma, and μ6=κ\mu_{6}=\kappa. The two-body parametrized potential is defined as follows

V(2)(rij,𝝁)=fc(rij,𝝁)ψ(rij,𝝁)V^{(2)}(r_{ij},\bm{\mu})=f_{\rm c}(r_{ij},\bm{\mu})\psi(r_{ij},\bm{\mu}) (7)

where the cut-off function fc(rij,𝝁)f_{\rm c}(r_{ij},\bm{\mu}) is

fc(r,𝝁)=exp(11(1(rrin)3(rcutrin)3)2+ϵ)f_{\rm c}(r,\bm{\mu})=\exp\left(1-\frac{1}{\sqrt{\left(1-\frac{(r-r_{\rm in})^{3}}{(r_{\rm cut}-r_{\rm in})^{3}}\right)^{2}+\epsilon}}\right) (8)

with ϵ=106\epsilon=10^{-6}. This cut-off function ensures the smooth vanishing of the two-body potential and its derivative for rijrcutr_{ij}\geq r_{\rm cut}.

Given S=Pγ+PαPβS=P_{\gamma}+P_{\alpha}P_{\beta} parameter tuples 𝝁s,1sS\bm{\mu}_{s},1\leq s\leq S, we introduce the following set of snapshots

Φs(rij)=V(2)(rij,𝝁s),s=1,,S.\Phi_{s}(r_{ij})=V^{(2)}(r_{ij},\bm{\mu}_{s}),\quad s=1,\ldots,S. (9)

We next employ the proper orthogonal decomposition [17] to generate an orthogonal basis set which is known to be optimal for representation of the snapshot family {Φs}s=1S\{\Phi_{s}\}_{s=1}^{S}. In particular, the orthogonal radial basis functions are computed as follows

Rn(rij)=s=1SQsnΦs(rij),n=1,,Nr,R_{n}(r_{ij})=\sum_{s=1}^{S}Q_{sn}\,\Phi_{s}(r_{ij}),\qquad n=1,\ldots,N_{r}, (10)

where the number of radial basis functions NrN_{r} is typically in the range between 5 and 10. Note that Qsn,1sS,1nNr,Q_{sn},1\leq s\leq S,1\leq n\leq N_{r}, are a matrix whose columns are eigenvectors of the following eigenvalue problem

𝑪𝒂=λ𝒂,\bm{C}\bm{a}=\lambda\bm{a}, (11)

where the covariance matrix 𝑪\bm{C} is given by

Csp=rinrcutΦs(r)Φp(r)𝑑r,1s,pS.C_{sp}=\int_{r_{\rm in}}^{r_{\rm cut}}\Phi_{s}(r)\Phi_{p}(r)dr,\qquad 1\leq s,p\leq S. (12)

The covariance matrix is computed by using the trapezoidal rule on a grid of 2000 subintervals on the interval [rin,rcut][r_{\rm in},r_{\rm cut}]. The eigenvector matrix QsnQ_{sn} is pre-computed and stored.

Finally, the two-body proper orthogonal descriptors at each atom ii are computed by summing the orthogonal basis functions over the neighbors of atom ii and numerating on the atom types as follows

𝒟ipqn(2)={{j=1|Zj=q}NiRn(rij),if Zi=p0,if Zip\mathcal{D}^{(2)}_{ipqn}=\left\{\begin{array}[]{ll}\displaystyle\sum_{\{j=1|Z_{j}=q\}}^{N_{i}}R_{n}(r_{ij}),&\quad\mbox{if }Z_{i}=p\\ 0,&\quad\mbox{if }Z_{i}\neq p\end{array}\right. (13)

for 1iNa,1nNr,1q,pNe1\leq i\leq N_{a},1\leq n\leq N_{r},1\leq q,p\leq N_{e}. The number of two-body descriptors per atom is thus NrNe2N_{r}N_{e}^{2}.

For the purpose of complexity analysis, we assume that each atom has the same number of neighbors NiN_{i}. The total number of neighbors is thus NaNiN_{a}N_{i} for all atoms. The cost of evaluating the radial basis functions in (10) is O(NaNiNrS)O(N_{a}N_{i}N_{r}S), while the cost of evaluating the two-body descriptors in (13) is O(NaNiNr)O(N_{a}N_{i}N_{r}). The total cost is thus independent of the number of elements NeN_{e}.

II.3 Three-body proper orthogonal descriptors

For any given integer [0,Pa]\ell\in[0,P_{a}], where PaP_{a} is the highest angular degree, we introduce a basis set of angular monomials

Am(𝒓^ij)=(x^ij)lx(y^ij)ly(z^ij)lz,A_{\ell m}(\hat{\bm{r}}_{ij})=(\hat{x}_{ij})^{l_{x}}\ (\hat{y}_{ij})^{l_{y}}\ (\hat{z}_{ij})^{l_{z}}, (14)

where the exponents lx,ly,lzl_{x},l_{y},l_{z} are nonnegative integers such that lx+ly+lz={l_{x}}+l_{y}+l_{z}=\ell. Note that the index mm satisfies 0m(+1)(+2)/210\leq m\leq(\ell+1)(\ell+2)/2-1 for any given \ell, and that the total number of angular monomials is (Pa+1)(Pa+2)(Pa+3)/6(P_{a}+1)(P_{a}+2)(P_{a}+3)/6. Table 1 shows the basis set of angular monomials for Pa=4P_{a}=4. By applying the trinomial expansion to the power of the angle component wijkw_{ijk}, we obtain

(wijk)=(x^ijx^ik+y^ijy^ik+z^ijz^ik)=m=0L()CmAm(𝒓^ij)Am(𝒓^ik)\begin{split}(w_{ijk})^{\ell}&=\left(\hat{x}_{ij}\hat{x}_{ik}+\hat{y}_{ij}\hat{y}_{ik}+\hat{z}_{ij}\hat{z}_{ik}\right)^{\ell}\\ &=\sum_{m=0}^{L(\ell)}C_{\ell m}A_{\ell m}(\hat{\bm{r}}_{ij})A_{\ell m}(\hat{\bm{r}}_{ik})\end{split} (15)

where L()=(+1)(+2)/21L(\ell)=(\ell+1)(\ell+2)/2-1 and Cm=!lx!ly!lz!C_{\ell m}=\frac{\ell!}{l_{x}!l_{y}!l_{z}!} correspond to the multinomial coefficients of the trinomial expansion. Table 2 shows the multinomial coefficients for =0,1,2,3,4\ell=0,1,2,3,4.

Table 1: The angular monomials for Pa=4P_{a}=4.
\ell Am(𝒓^ij)A_{\ell m}(\hat{\bm{r}}_{ij})
0 11
1 x^ij\hat{x}_{ij},  y^ij\hat{y}_{ij},  z^ij\hat{z}_{ij}
2 x^ij2\hat{x}^{2}_{ij},  y^ij2\hat{y}^{2}_{ij},  z^ij2\hat{z}^{2}_{ij},  x^ijy^ij\hat{x}_{ij}\hat{y}_{ij},  x^ijz^ij\hat{x}_{ij}\hat{z}_{ij},  y^ijz^ij\hat{y}_{ij}\hat{z}_{ij}
3 x^ij3\hat{x}^{3}_{ij},  y^ij3\hat{y}^{3}_{ij},  z^ij3\hat{z}^{3}_{ij},  x^ij2y^ij\hat{x}^{2}_{ij}\hat{y}_{ij},  x^ij2z^ij\hat{x}^{2}_{ij}\hat{z}_{ij},  y^ij2x^ij\hat{y}^{2}_{ij}\hat{x}_{ij},  y^ij2z^ij\hat{y}^{2}_{ij}\hat{z}_{ij},
z^ij2x^ij\hat{z}^{2}_{ij}\hat{x}_{ij}, z^ij2y^ij\hat{z}^{2}_{ij}\hat{y}_{ij}, x^ijy^ijz^ij\hat{x}_{ij}\hat{y}_{ij}\hat{z}_{ij}
4 x^ij4\hat{x}^{4}_{ij},  y^ij4\hat{y}^{4}_{ij},  z^ij4\hat{z}^{4}_{ij},  x^ij3y^ij\hat{x}^{3}_{ij}\hat{y}_{ij},  x^ij3z^ij\hat{x}^{3}_{ij}\hat{z}_{ij},  y^ij3x^ij\hat{y}^{3}_{ij}\hat{x}_{ij},  y^ij3z^ij\hat{y}^{3}_{ij}\hat{z}_{ij}, z^ij3x^ij\hat{z}^{3}_{ij}\hat{x}_{ij}
z^ij3y^ij\hat{z}^{3}_{ij}\hat{y}_{ij}, x^ij2y^ij2\hat{x}^{2}_{ij}\hat{y}^{2}_{ij},  x^ij2z^ij2\hat{x}^{2}_{ij}\hat{z}^{2}_{ij},  y^ij2z^ij2\hat{y}^{2}_{ij}\hat{z}^{2}_{ij},  x^ij2y^ijz^ij\hat{x}^{2}_{ij}\hat{y}_{ij}\hat{z}_{ij}  x^ijy^ij2z^ij\hat{x}_{ij}\hat{y}^{2}_{ij}\hat{z}_{ij}
x^ijy^ijz^ij2\hat{x}_{ij}\hat{y}_{ij}\hat{z}^{2}_{ij}
Table 2: The multinomial coefficients for Pa=4P_{a}=4.
\ell CmC_{\ell m}
0 11
1 11,  11,  11
2 11,  11,  11,  22,  22,  22
3 11,  11,  11,  33,  33,  33,  33,  33,  33,  66
4 11,  11,  11,  44,  44,  44,  44,  44,  44,  66,  66,  66,  1212,  1212,  1212

Next, we define the atom basis functions at atom ii as the sum over all neighbors of atom ii of the products of radial basis functions and angular monomials

Biqnm=j=1|Zj=qNiRn(rij)Am(𝒓^ij),1qNe.B_{iqn\ell m}=\sum_{j=1|Z_{j}=q}^{N_{i}}R_{n}(r_{ij})A_{\ell m}(\hat{\bm{r}}_{ij}),\quad 1\leq q\leq N_{e}. (16)

The cost of evaluating the atom basis functions is O(NaNiNr(Pa+1)(Pa+2)(Pa+3)/6)O(N_{a}N_{i}N_{r}(P_{a}+1)(P_{a}+2)(P_{a}+3)/6), which is independent of the number of elements. These atom basis functions are used to define the atom density descriptors as follows

𝒟ipqqn(3)={m=0L()CmBiqnmBiqnm,if Zi=p0,if Zip\mathcal{D}^{(3)}_{ipqq^{\prime}n\ell}=\left\{\begin{array}[]{ll}\displaystyle\sum_{m=0}^{L(\ell)}C_{\ell m}B_{iqn\ell m}B_{iq^{\prime}n\ell m},&\quad\mbox{if }Z_{i}=p\\ 0,&\quad\mbox{if }Z_{i}\neq p\end{array}\right. (17)

for 1iNa,1nNr,0Pa,1q,pNe,1qq1\leq i\leq N_{a},1\leq n\leq N_{r},0\leq\ell\leq P_{a},1\leq q,p\leq N_{e},1\leq q^{\prime}\leq q. The number of three-body descriptors per atom is thus Nr(Pa+1)Ne2(Ne+1)/2N_{r}(P_{a}+1)N_{e}^{2}(N_{e}+1)/2. The cost of evaluating (17) is O(NaNrNe(Ne+1)(Pa+1)(Pa+2)(Pa+3)/12)O(N_{a}N_{r}N_{e}(N_{e}+1)(P_{a}+1)(P_{a}+2)(P_{a}+3)/12), which is usually less than that of evaluating the atom basis functions. Therefore, the total cost of computing the atom density descriptors is O(NaNiNr(Pa+1)(Pa+2)(Pa+3)/6)O(N_{a}N_{i}N_{r}(P_{a}+1)(P_{a}+2)(P_{a}+3)/6).

Substituting (16) into (17) yields the internal coordinate form of the three-body descriptors

𝒟ipqqn(3)=j|Zj=qNik|Zk=qNiRn(rij)Rn(rik)(wijk)\mathcal{D}^{(3)}_{ipqq^{\prime}n\ell}=\displaystyle\sum_{j|Z_{j}=q}^{N_{i}}\sum_{k|Z_{k}=q^{\prime}}^{N_{i}}R_{n}(r_{ij})R_{n}(r_{ik})\left(w_{ijk}\right)^{\ell} (18)

for Zi=pZ_{i}=p. The cost of evaluating the internal coordinate descriptors is O(NaNi2NrPa)O(N_{a}N^{2}_{i}N_{r}P_{a}). Although the atom density descriptors (17) and the internal coordinate descriptors (18) are mathematically equivalent, their computational complexity are not the same. The complexity of the atom density descriptors is linear in the number of neighbors and cubic in the angular degree, whereas that of the internal coordinate descriptors is quadratic in the number of neighbors and linear in the angular degree. Therefore, if the number of neighbors is large and the angular degree is small, it is faster to evaluate the the atom density descriptors. On the other hand, if the number of neighbors is small and the angular degree is high, it is more efficient to compute the internal coordinate descriptors.

II.4 Four-body proper orthogonal descriptors

We begin by introducing the following four-body angular functions

fs(wijk,wijl,wikl)=(wijk)a(wijl)b(wikl)c,f_{s}(w_{ijk},w_{ijl},w_{ikl})=\left(w_{ijk}\right)^{a}\left(w_{ijl}\right)^{b}\left(w_{ikl}\right)^{c}, (19)

where a,b,ca,b,c are integers such that a+b+c=a+b+c=\ell and abc0a\geq b\geq c\geq 0. The four-body angular functions fsf_{s} are listed in Table 3. The four-body internal coordinate descriptors at each atom ii are defined as

𝒟ipqqq′′ns(4)={j|Zj=q}Ni{k|Zk=q}Ni{l|Zl=q′′}NiUns\mathcal{D}^{(4)}_{ipqq^{\prime}q^{\prime\prime}ns}=\displaystyle\sum_{\{j|Z_{j}=q\}}^{N_{i}}\sum_{\{k|Z_{k}=q^{\prime}\}}^{N_{i}}\sum_{\{l|Z_{l}=q^{\prime\prime}\}}^{N_{i}}U_{ns} (20)

for Zi=pZ_{i}=p, where UnsU_{ns} are given by

Uns=Rn(rij)Rn(rik)Rn(ril)fs(wijk,wijl,wikl)U_{ns}=R_{n}(r_{ij})R_{n}(r_{ik})R_{n}(r_{il})f_{s}(w_{ijk},w_{ijl},w_{ikl}) (21)

for 1iNa,1nNr,1sKa,1p,qNe,1qq,1q′′q1\leq i\leq N_{a},1\leq n\leq N_{r},1\leq s\leq K_{a},1\leq p,q\leq N_{e},1\leq q^{\prime}\leq q,1\leq q^{\prime\prime}\leq q^{\prime}. Here KaK_{a} is the number of four-body angular basis functions, which depends on PaP_{a}. The number of four-body descriptors per atom is thus NrKaNe2(Ne+1)(Ne+2)/6N_{r}K_{a}N_{e}^{2}(N_{e}+1)(N_{e}+2)/6. The cost of evaluating the four-body internal coordinate descriptors is O(NaNi3NrKa)O(N_{a}N^{3}_{i}N_{r}K_{a}), which is independent of the number of elements.

Table 3: Four-body angular functions fsf_{s}
\ell fsf_{s}
0 f1=1f_{1}=1
1 f2=wijkf_{2}=w_{ijk}
2 f3=wijk2,f4=wijkwijlf_{3}=w^{2}_{ijk},\quad f_{4}=w_{ijk}w_{ijl}
3 f5=wijk3,f6=wijk2wijlf_{5}=w^{3}_{ijk},\quad f_{6}=w^{2}_{ijk}w_{ijl},  f7=wijkwijlwiklf_{7}=w_{ijk}w_{ijl}w_{ikl}
4 f8=wijk4,f9=wijk3wijlf_{8}=w^{4}_{ijk},\quad f_{9}=w^{3}_{ijk}w_{ijl},  f10=wijk2wijl2f_{10}=w^{2}_{ijk}w^{2}_{ijl},
f11=wijk2wijlwiklf_{11}=w^{2}_{ijk}w_{ijl}w_{ikl}

We note from the trinomial expansion that

(ξ1+ξ2+ξ3)a(η1+η2+η3)b(ζ1+ζ2+ζ3)c=α=0L(a)β=0L(b)γ=0L(c)CaαCbβCcγAaα(𝝃)Abβ(𝜼)Acγ(𝜻)\begin{split}(\xi_{1}&+\xi_{2}+\xi_{3})^{a}(\eta_{1}+\eta_{2}+\eta_{3})^{b}(\zeta_{1}+\zeta_{2}+\zeta_{3})^{c}=\\ &\sum_{\alpha=0}^{L(a)}\sum_{\beta=0}^{L(b)}\sum_{\gamma=0}^{L(c)}C_{a\alpha}C_{b\beta}C_{c\gamma}A_{a\alpha}(\bm{\xi})A_{b\beta}(\bm{\eta})A_{c\gamma}(\bm{\zeta})\end{split}

where AaαA_{a\alpha} are the angular monomials defined in (14). By considering ξ1=x^ijx^ik,ξ2=y^ijy^ik,ξ3=z^ijz^ik\xi_{1}=\hat{x}_{ij}\hat{x}_{ik},\xi_{2}=\hat{y}_{ij}\hat{y}_{ik},\xi_{3}=\hat{z}_{ij}\hat{z}_{ik}, η1=x^ijx^il,η2=y^ijy^il,η3=z^ijz^il\eta_{1}=\hat{x}_{ij}\hat{x}_{il},\eta_{2}=\hat{y}_{ij}\hat{y}_{il},\eta_{3}=\hat{z}_{ij}\hat{z}_{il}, ζ1=x^ikx^il,ζ2=y^iky^il,ζ3=z^ikz^il\zeta_{1}=\hat{x}_{ik}\hat{x}_{il},\zeta_{2}=\hat{y}_{ik}\hat{y}_{il},\zeta_{3}=\hat{z}_{ik}\hat{z}_{il}, we obtain

Aaα(𝝃)Abβ(𝜼)Acγ(𝜻)=Aaα(𝒓^ij)Abβ(𝒓^ik)Acγ(𝒓^il)A_{a\alpha}(\bm{\xi})A_{b\beta}(\bm{\eta})A_{c\gamma}(\bm{\zeta})=A_{a^{\prime}\alpha^{\prime}}(\hat{\bm{r}}_{ij})A_{b^{\prime}\beta^{\prime}}(\hat{\bm{r}}_{ik})A_{c^{\prime}\gamma^{\prime}}(\hat{\bm{r}}_{il})

where a=a+ba^{\prime}=a+b, b=a+cb^{\prime}=a+c, c=b+cc^{\prime}=b+c, and the index α\alpha^{\prime} depends on α\alpha and β\beta, β\beta^{\prime} on α\alpha and γ\gamma, γ\gamma^{\prime} on β\beta and γ\gamma. It thus follows that the four-body angular functions can be expressed as

fs=α=0L(a)β=0L(b)γ=0L(c)CabcαβγAaα(𝒓^ij)Abβ(𝒓^ik)Acγ(𝒓^il).f_{s}=\sum_{\alpha=0}^{L(a)}\sum_{\beta=0}^{L(b)}\sum_{\gamma=0}^{L(c)}C_{abc}^{\alpha\beta\gamma}A_{a^{\prime}\alpha^{\prime}}(\hat{\bm{r}}_{ij})A_{b^{\prime}\beta^{\prime}}(\hat{\bm{r}}_{ik})A_{c^{\prime}\gamma^{\prime}}(\hat{\bm{r}}_{il}).

where Cabcαβγ=CaαCbβCcγC_{abc}^{\alpha\beta\gamma}=C_{a\alpha}C_{b\beta}C_{c\gamma}. Hence, the internal coordinate form (20) is equivalent to the atom density form

𝒟ipqqq′′ns(4)=α=0L(a)β=0L(b)γ=0L(c)CabcαβγBiqnaαBiqnbβBiq′′ncγ.\begin{split}\mathcal{D}^{(4)}_{ipqq^{\prime}q^{\prime\prime}ns}=&\sum_{\alpha=0}^{L(a)}\sum_{\beta=0}^{L(b)}\sum_{\gamma=0}^{L(c)}C_{abc}^{\alpha\beta\gamma}B_{iqna^{\prime}\alpha^{\prime}}B_{iq^{\prime}nb^{\prime}\beta^{\prime}}B_{iq^{\prime\prime}nc^{\prime}\gamma^{\prime}}.\end{split} (22)

The four-body atom density descriptors are expressed in terms of the sums of the products of the atom basis functions. In general, they are more efficient to evaluate than their internal coordinate counterparts because they scale linearly with the number of neighbors. The complexity analysis of the four-body atom density descriptors is detailed in [26].

It is possible to exploit the symmetry and hierarchy of the four-body descriptors to reduce the computational cost. In particular, the four-body descriptors associated with f1=1f_{1}=1 are the products of three two-body descriptors. For b=c=0b=c=0 (e.g., f2,f3,f5,f8f_{2},f_{3},f_{5},f_{8} in Table 3) the four-body descriptors are the products of two-body descriptors and three-body descriptors. Those four-body descriptors can be computed very fast without using the atom density form (22). The remaining four-body descriptors are calculated by using (22). They share many common terms, which can be exploited to further reduce the cost. For instance, since f6=wijkf4f_{6}=w_{ijk}f_{4} and f9=wijk2f4f_{9}=w^{2}_{ijk}f_{4}, the descriptors associated with f6f_{6} and f9f_{9} have many common terms with those associated with f4f_{4}.

III Environment-Adaptive Machine Learning Potentials

This section describes a method for constructing EAML potentials from a given set of invariant descriptors. The method leverages the principal component analysis and kk-means algorithm to partition the dataset into atom clusters. The method relies on a many-body many-potential expansion that combines several different potentials to define a single potential energy surface. This is done by calculating probability functions that assess the likelihood of an atom belonging to specific clusters. These probability functions determine how contributions from different potentials are weighted and combined and provide a systematic way to maintain the continuity of the potential energy surface.

III.1 Linear regression models

Linear regression is the most simple and efficient method for building MLIP models [21, 27, 28, 22]. Let 𝒟im,1mM,{\mathcal{D}}_{im},1\leq m\leq M, be a set of MM local descriptors at atom ii. The atomic energy at an atom ii is expressed as a linear combination of the local descriptors

Ei(𝑹,𝒁)=m=1Mcm𝒟im(𝑹,𝒁)E_{i}(\bm{R},\bm{Z})=\sum_{m=1}^{M}c_{m}\mathcal{D}_{im}(\bm{R},\bm{Z}) (23)

where cmc_{m} are the coefficients to be determined by fitting against QM database. The PES is given by

ET(𝑹,𝒁)=i=1NaEi(𝑹,𝒁)=m=1Mcm𝒢m(𝑹,𝒁)E_{\rm T}(\bm{R},\bm{Z})=\sum_{i=1}^{N_{a}}E_{i}(\bm{R},\bm{Z})=\sum_{m=1}^{M}c_{m}\mathcal{G}_{m}(\bm{R},\bm{Z}) (24)

where 𝒢m=i=1Na𝒟im\mathcal{G}_{m}=\sum_{i=1}^{N_{a}}\mathcal{D}_{im} are the global descriptors. The coefficients cmc_{m} are sought as solution of a least squares problem

min𝒄α𝑮𝒄𝒆2+β𝑯𝒄𝒇2+γ𝒄2\min_{\bm{c}}\alpha\|\bm{G}\bm{c}-\bm{e}\|^{2}+\beta\|\bm{H}\bm{c}-\bm{f}\|^{2}+\gamma\|\bm{c}\|^{2} (25)

where the matrix 𝑮\bm{G} is formed from the global descriptors, while 𝑯\bm{H} is formed from the derivatives of the global descriptors for all configurations in the training database. The vector 𝒆\bm{e} is comprised of DFT energies, while 𝒇\bm{f} is comprised of DFT forces. Note that α\alpha is the energy weight parameter, β\beta is the force weight parameter, and γ\gamma is a regularization parameter. They are hyperparameters of the linear regression model.

In order to accurately model atomic forces in MD simulations, the training dataset must be diverse and rich enough to cover structural forms (e.g., crystalline, amorphous, defects, interfaces), multiple phases (solid, liquid, gas, plasma), and a wide range of temperature, pressure, and chemical environment. Training a linear model on the entire training set may not produce an accurate and efficient potential it the dataset contains very diverse environments. We partition the training dataset into KK separate subsets, whose DFT energies and forces are denoted by (𝒆k,𝒇k),1kK(\bm{e}^{k},\bm{f}^{k}),1\leq k\leq K. On each subset, we introduce an associated PES

ETk(𝑹,𝒁)=i=1Nam=1Mcmk𝒟im(𝑹,𝒁),E^{k}_{\rm T}(\bm{R},\bm{Z})=\sum_{i=1}^{N_{a}}\sum_{m=1}^{M}c_{m}^{k}\mathcal{D}_{im}(\bm{R},\bm{Z}), (26)

where the coefficient vectors 𝒄k\bm{c}^{k} are sought as solutions of the least squares problems

min𝒄kαk𝑮k𝒄k𝒆k2+βk𝑯k𝒄k𝒇k2+γk𝒄k2.\min_{\bm{c}^{k}}\alpha^{k}\|\bm{G}^{k}\bm{c}^{k}-\bm{e}^{k}\|^{2}+\beta^{k}\|\bm{H}^{k}\bm{c}^{k}-\bm{f}^{k}\|^{2}+\gamma^{k}\|\bm{c}^{k}\|^{2}. (27)

Here the superscript kk is used to indicate the quantities associated with the kkth subset. This training strategy yields an ensemble of KK separate potentials. Each potential may accurately predict configurations in the subset on which it is trained. However, it may not be accurate for predicting configurations in the other subsets.

A hypothetical issue here is to guarantee the continuity of PES when predicting forces with an ensemble of potentials. It is not obvious how to combine these separate potentials, as they are trained on different datasets. A simple strategy is to select the best potential among these potentials to predict the physical properties of a given configuration at hand, if the criterion of selecting the best potential can be defined. While this strategy may work for property prediction, it does not work for MD simulations. This is because using different potentials in an MD simulation will result in discontinuity in the PES and thus forces. The remainder of this section describes a method that allows us to combine these separate potentials to construct a global, differentiable and continuous PES.

III.2 Dataset partition

We describe a clustering method to partition the dataset into subsets of similar attributes. Local atomic environment of an atom ii comprises the positions and chemical species of the atom and its neighbors within a cutoff radius. These atom positions and chemical species can be mapped onto a vector of MM invariant descriptors, 𝒟im,1mM\mathcal{D}_{im},1\leq m\leq M, by using either the internal coordinate approach or the atom density approach. For a dataset of NN atoms, we obtain a descriptor matrix 𝓓\bm{\mathcal{D}} of size NN by MM. Each row of the descriptor matrix encapsulates the local atomic environment of the corresponding atom. The similarity between two local atomic environments can be measured by the dot product of the two corresponding descriptor vectors. Therefore, partitioning the dataset of NN atoms into KK clusters can be done by dividing the rows of the descriptor matrix into KK similarity subsets. One can use a clustering method such as kk-means clustering algorithm to partition NN vectors in MM dimensions into KK separate clusters. However, clustering in very high dimensions can be very expensive, since MM is typically large.

To reduce computational cost, we consider a dimensionality reduction method to compress the descriptor matrix into a lower-dimensional matrix of size NN by JJ, where JJ is considerably less than MM. In this paper, principal component analysis is used to obtain the low-dimensional descriptor matrix as follows

𝓑=𝓓𝑾.\bm{\mathcal{B}}=\bm{\mathcal{D}}\ \bm{W}\ . (28)

Here 𝑾M×J\bm{W}\in\mathbb{R}^{M\times J} consists of the first JJ eigenvectors of the eigenvalue decomposition 𝓓T𝓓=𝑾𝚲𝑾T\bm{\mathcal{D}}^{T}\bm{\mathcal{D}}=\bm{W}\bm{\Lambda}\bm{W}^{T}, and the eigenvalues are ordered from the largest to the smallest.

For multi-element systems, the clustering method is applied to the local descriptor matrix for each element as follows. The matrix 𝓓\bm{\mathcal{D}} is split into 𝓓e,1eNe\bm{\mathcal{D}}^{e},1\leq e\leq N_{e}, where 𝓓e\bm{\mathcal{D}}^{e} is formed by gathering the rows of 𝓓\bm{\mathcal{D}} for all atoms of the element ee. For each element ee, we compute

𝓑e=𝓓e𝑾e\bm{\mathcal{B}}^{e}=\bm{\mathcal{D}}^{e}\ \bm{W}^{e} (29)

where 𝑾e\bm{W}^{e} consists of the JJ eigenvectors of the eigenvalue decomposition (𝓓e)T𝓓e=𝑾e𝚲(𝑾e)T\left(\bm{\mathcal{D}}^{e}\right)^{T}\bm{\mathcal{D}}^{e}=\bm{W}^{e}\bm{\Lambda}\left(\bm{W}^{e}\right)^{T}.

Next, we apply kk-means clustering method to partition the rows of the matrix 𝓑e\bm{\mathcal{B}}^{e} into KK separate clusters. We denote the centroids of the clusters by 𝓒ke,1kK,1eNe\bm{\mathcal{C}}^{e}_{k},1\leq k\leq K,1\leq e\leq N_{e}. The clustering scheme allows us to divide the diverse dataset into smaller subsets, each characterized by similar data points which share the common attributes. This approach captures the diversity inherent in the dataset by identifying distinct atomic environments within the dataset. Figure 2 illustrates the process of partitioning the dataset into several atom clusters.

Refer to caption
Figure 2: Dataset of atom configurations is partitioned into atom clusters by using principal component analysis and the kk-means algorithm.

While local descriptors are used to partition NN atoms of the data set into KK subsets of atoms for each element, global descriptors can be used for partitioning the configurations of the dataset into KK subsets of configurations. For the dataset of NconfigN_{\rm config} configurations, we sum the relevant rows of the local descriptor matrix 𝓓\bm{\mathcal{D}} to obtain the global descriptor matrix 𝓖\bm{\mathcal{G}} of size NconfigN_{\rm config} by MM. We then apply the above procedure to 𝓖\bm{\mathcal{G}} to obtain the desired configuration subsets. By training potentials on these subsets separately, we can construct MLIPs that are tailored to specific atomic environments.

III.3 Many-body many-potential expansion

We begin by introducing the atomic energies associated with the partitioned subsets

ik(𝑹,𝒁)=m=1Mcmk𝒟im(𝑹,𝒁),1kK,\mathcal{E}_{ik}(\bm{R},\bm{Z})=\sum_{m=1}^{M}c_{mk}\mathcal{D}_{im}(\bm{R},\bm{Z}),\quad 1\leq k\leq K, (30)

where the coefficients cmkc_{mk} are fitted against the QM data. For simplicity of exposition, the same local descriptors are used to define the atomic energies, although we allow for different local descriptors to be used for each subset. To construct a single potential energy surface, we introduce a many-body many-potential expansion

Ei(𝑹,𝒁)=k=1K𝒫ik(𝑹,𝒁)ik(𝑹,𝒁)E_{i}(\bm{R},\bm{Z})=\sum_{k=1}^{K}\mathcal{P}_{ik}(\bm{R},\bm{Z})\mathcal{E}_{ik}(\bm{R},\bm{Z}) (31)

where

k=1K𝒫ik(𝑹,𝒁)=1,𝒫ik(𝑹,𝒁)0.\sum_{k=1}^{K}\mathcal{P}_{ik}(\bm{R},\bm{Z})=1,\quad\mathcal{P}_{ik}(\bm{R},\bm{Z})\geq 0. (32)

Thus, the atomic energy at atom ii is a weighted sum of the individual contributions from the KK subsets. Note that 𝒫ik\mathcal{P}_{ik} denotes the probability of atom ii belonging to the kkth subset. Like the local descriptors, the probabilities depend on the local atomic environment of the central atom ii.

By inserting (30) into the many-potential expansion (31) and summing over index ii, we obtain the PES as

ET(𝑹,𝒁)=i=1Nak=1Km=1Mcmk𝒫ik(𝑹,𝒁)𝒟im(𝑹,𝒁).E_{\rm T}(\bm{R},\bm{Z})=\sum_{i=1}^{N_{a}}\sum_{k=1}^{K}\sum_{m=1}^{M}c_{mk}\mathcal{P}_{ik}(\bm{R},\bm{Z})\mathcal{D}_{im}(\bm{R},\bm{Z}). (33)

The quantities 𝒬ikm(𝑹,𝒁)=𝒫ik(𝑹,𝒁)𝒟im(𝑹,𝒁)\mathcal{Q}_{ikm}(\bm{R},\bm{Z})=\mathcal{P}_{ik}(\bm{R},\bm{Z})\mathcal{D}_{im}(\bm{R},\bm{Z}) shall be called environment-adaptive descriptors. Hence, we can write the PES as follows

ET(𝑹,𝒁)=i=1Nal=1KMcl𝒬il(𝑹,𝒁),E_{\rm T}(\bm{R},\bm{Z})=\sum_{i=1}^{N_{a}}\sum_{l=1}^{KM}c_{l}\mathcal{Q}_{il}(\bm{R},\bm{Z}), (34)

where ll is a linear indexing of kk and mm. The coefficients clc_{l} are sought as solution of a least squares problem that minimizes a loss function defined as the weighted mean squared errors between the predicted energies/forces and the QM energies/forces for all configurations in the training dataset.

For the single cluster case K=1K=1, the EA model (33) reduces to the standard linear model

ET(𝑹,𝒁)=i=1Nam=1Mcm𝒟im(𝑹,𝒁).E_{\rm T}(\bm{R},\bm{Z})=\sum_{i=1}^{N_{a}}\sum_{m=1}^{M}c_{m}\mathcal{D}_{im}(\bm{R},\bm{Z}). (35)

Hence, the EA model (34) has KK times more descriptors and coefficients than the standard linear model (35). Because the size of the EA model increases with KK, it can describe diverse environments in the dataset better than the standard linear model. Nonetheless, it is necessary to assess the EA model while varying KK and compare it against the standard linear model.

III.4 The probability functions

It remains to calculate the probabilities 𝒫ik\mathcal{P}_{ik}. They are defined in terms of the local descriptors as follows. First, we compute the low-dimensional descriptors.

ij(𝑹,𝒁)=m=1MWmjZi𝒟im(𝑹,𝒁),1jJ.{\mathcal{B}}_{ij}(\bm{R},\bm{Z})=\sum_{m=1}^{M}W_{mj}^{Z_{i}}{\mathcal{D}}_{im}(\bm{R},\bm{Z}),\quad 1\leq j\leq J. (36)

where Wmje,1eNe,W_{mj}^{e},1\leq e\leq N_{e}, are the PCA matrices. Next, we calculate the inverse of the square of the distance from the kkth centroid as

𝒮ik(𝑹,𝒁)=1j=1J(ij(𝑹,𝒁)𝒞kjZi)2.{\mathcal{S}}_{ik}(\bm{R},\bm{Z})=\frac{1}{\sum_{j=1}^{J}\left({\mathcal{B}}_{ij}(\bm{R},\bm{Z})-\mathcal{C}_{kj}^{Z_{i}}\right)^{2}}. (37)

Recall that 𝒞kje\mathcal{C}_{kj}^{e} are the centroids obtained from partitioning the dataset. Finally, the probabilities are calculated as

𝒫ik(𝑹,𝒁)=𝒮ik(𝑹,𝒁)l=1K𝒮il(𝑹,𝒁),1kK.{\mathcal{P}}_{ik}(\bm{R},\bm{Z})=\frac{{\mathcal{S}}_{ik}(\bm{R},\bm{Z})}{\sum_{l=1}^{K}{\mathcal{S}}_{il}(\bm{R},\bm{Z})},\quad 1\leq k\leq K. (38)

Hence, the probabilities are high when the distances between the low-dimensional descriptor vector and the centroids are small. The additional cost of evaluating the probabilities is only O(JM+JK)O(JM+JK) per atom. This cost can be neglected, as it is considerably less than the cost of computing the local descriptors.

III.5 Force calculation

Forces on atoms are calculated by differentiating the PES (33) with respect to atom positions. To this end, we first compute the partial derivatives of the probabilities with respect to the local descriptors as

𝒫ik𝒟im=𝒫ik𝒮il𝒮ilijij𝒟im.\frac{\partial{\mathcal{P}}_{ik}}{\partial\mathcal{D}_{im}}=\frac{\partial{\mathcal{P}}_{ik}}{\partial\mathcal{S}_{il}}\frac{\partial{\mathcal{S}}_{il}}{\partial\mathcal{B}_{ij}}\frac{\partial{\mathcal{B}}_{ij}}{\partial\mathcal{D}_{im}}. (39)

Here the Einstein summation convention is used to indicate the implicit summation over repeated indices except for the index ii. The cost of evaluating the terms in (39) is O(K2MJ)O(K^{2}MJ) per atom. Next, we note that

𝒫ik(𝑹,𝒁)𝑹=𝒫ik𝒟im𝒟im𝑹.\frac{\partial{\mathcal{P}}_{ik}(\bm{R},\bm{Z})}{\partial\bm{R}}=\frac{\partial{\mathcal{P}}_{ik}}{\partial\mathcal{D}_{im}}\frac{\partial{\mathcal{D}}_{im}}{\partial\bm{R}}. (40)

Differentiating the PES (33) with respect to atom positions yields

ET(𝑹,𝒁)𝑹=cnk𝒟in𝒫ik𝑹+cml𝒫il𝒟im𝑹.\frac{\partial E_{\rm T}(\bm{R},\bm{Z})}{\partial\bm{R}}=c_{nk}\mathcal{D}_{in}\frac{\partial{\mathcal{P}}_{ik}}{\partial\bm{R}}+c_{ml}\mathcal{P}_{il}\frac{\partial{\mathcal{D}}_{im}}{\partial\bm{R}}. (41)

By inserting (40) into (41), we obtain

ET(𝑹,𝒁)𝑹=(cnk𝒟in𝒫ik𝒟im+cml𝒫il)𝒟im𝑹.\frac{\partial E_{\rm T}(\bm{R},\bm{Z})}{\partial\bm{R}}=\left(c_{nk}\mathcal{D}_{in}\frac{\partial{\mathcal{P}}_{ik}}{\partial\mathcal{D}_{im}}+c_{ml}\mathcal{P}_{il}\right)\frac{\partial{\mathcal{D}}_{im}}{\partial\bm{R}}. (42)

The additional cost of evaluating the terms in the parenthesis is only O(3MK)O(3MK) per atom.

In summary, the additional cost of evaluating the forces on atoms is O(K2MJ)O(K^{2}MJ) per atom for any K>1K>1. Since this cost is independent of the number of neighbors and linear in the number of local descriptors, it can be much smaller than the cost of computing the local descriptors and their derivatives with respect to atom positions. We can thus expect that the EA potentials are almost as fast as the standard linear potential. Therefore, the proposed method enhances the EA potentials without increasing computational cost.

IV Results and Discussions

The EA potentials will be demonstrated and compared with the standard linear potential for Tantalum element and Indium Phosphide compound. For all potentials, the hyperparameters are fixed to α=100,β=1,γ=1012\alpha=100,\beta=1,\gamma=10^{-12}. In order to assess their performance, all potentials are trained on the same training datasets and validated on the same test datasets. We evaluate the potentials using the mean absolute errors (MAEs) of the predicted energies and forces

εE=1Nconfign=1Nconfig|EnEnDFT|εF=1Nforcen=1Nforce|FnFnDFT|\begin{split}\varepsilon_{E}&=\frac{1}{N_{\rm config}}\sum_{n=1}^{N_{\rm config}}|E_{n}-E_{n}^{\rm DFT}|\\ \varepsilon_{F}&=\frac{1}{N_{\rm force}}\sum_{n=1}^{N_{\rm force}}|F_{n}-F_{n}^{\rm DFT}|\\ \end{split} (43)

where NconfigN_{\rm config} is the number of configurations in a data set, and NforceN_{\rm force} is the total number of force components for all the configurations in the same data set. Both the source code and the data are available upon request to facilitate the reproduction of our work.

IV.1 Results for Tantalum

The Ta data set contains a wide range of configurations to adequately sample the important regions of the potential energy surface [21]. The data set includes 12 different groups such as surfaces, bulk structures, defects, elastics for BCC, FCC, and A15 crystal structures, and high temperature liquid. The database was used to create a SNAP potential [21] which successfully describes a wide range of properties such as energetic properties of solid tantalum phases, the size and shape of the Peierls barrier for screw dislocation motion in BCC tantalum, as well as both the structure of molten tantalum and its melting point. We train eight EAML models on the Ta dataset for different values of MM and KK. Table 4 shows the number of descriptors for the eight EAML potentials. Note that all potentials have a one-body descriptor to account for isolated energies. The inner and outer cut-off distances are set to rin=1.0r_{\rm in}=1.0Å and rcut=5.0r_{\rm cut}=5.0Å, respectively. Furthermore, we use J=2J=2 in all cases.

Table 4: The number of radial basis functions NrN_{r}, number of angular basis functions KaK_{a}, number of descriptors NdN_{d} for two-body, three-body, and four-body POD descriptors.
Ta two-body three-body four-body All
Case NrN_{r} NdN_{d} NrN_{r} KaK_{a} NdN_{d} NrN_{r} KaK_{a} NdN_{d} MM
1 4 4 2 2 4 0 0 0 8
2 5 5 3 3 9 0 0 0 14
3 6 6 4 4 16 0 0 0 22
4 7 7 5 5 25 0 0 0 32
5 8 8 6 5 30 3 2 6 44
6 9 9 7 5 35 4 4 16 60
7 10 10 8 6 48 5 4 20 78
8 11 11 9 6 54 5 7 35 100
Table 5: Training errors in energies and forces for EAML potentials are listed in Table 4. The units for the MAEs in energies and forces are meV/atom and meV/Å, respectively.
K=1K=1 K=2K=2 K=3K=3 K=4K=4
Case εE\varepsilon_{E} εF\varepsilon_{F} εE\varepsilon_{E} εF\varepsilon_{F} εE\varepsilon_{E} εF\varepsilon_{F} εE\varepsilon_{E} εF\varepsilon_{F}
1 74.6 132.65 35.02 161.45 25.47 178.24 20.01 179.69
2 55.37 237.93 22.98 228.11 14.43 208.26 6.72 130.22
3 37.63 209.94 8.60 128.45 5.98 99.94 4.94 101.05
4 10.47 106.69 4.43 95.77 2.51 70.99 2.00 67.16
5 8.02 96.94 3.14 74.04 1.75 62.49 1.49 59.05
6 4.00 88.38 1.58 62.49 1.19 52.46 1.02 50.82
7 2.37 76.73 1.20 55.82 0.80 49.94 0.60 45.51
8 1.77 64.11 0.83 48.91 0.56 43.10 0.49 39.56

Table 5 displays training errors in energies and forces predicted by EAML potentials for different values of the number of the descriptors listed in Table 4 and for K=1,2,3,4K=1,2,3,4. We see that both the energy and force errors decrease as the number of descriptors increases. As MM increases from 88 to 100, the energy errors drop by a a factor of 20, while the force errors drop by a factor of 4. As KK increases from 11 to 44, the energy errors decrease by a factor of 4, while the force errors decrease by a factor of 1.5. The energy errors reach 1.77 meV/atom, 0.83 meV/atom, 0.56 meV/atom, and 0.48 meV/atom for K=1,2,3,K=1,2,3, and 4, respectively. These energy errors are below the typical numerical errors of DFT calculations. The force errors reach 64.11 meV/Å, 48.91 meV/Å, 43.10 meV/Å, and 39.56 meV/Å. These force errors are acceptable for most applications. The errors decrease quite rapidly as KK increases from 1 to 2. However, as KK increases from 22 to 44, the rate of error decrease slows down considerably for this dataset and we observe a smaller improvement. Hence, we may only gain marginal improvements by increasing KK beyond 4.

Table 6 provides the training errors in energy and forces for each of the 12 groups for M=60M=60. The force errors for Bulk A15, Bulk BCC, and Bulk FCC are zero, because their structures are at equilibrium states and thus have zero atomic forces. The Surface group tends to have high energy errors than other groups, while the Liquid group has the highest force errors. The liquid structures depend strongly on the repulsive interactions that occur when two atoms approach each other. Consequently, it is more difficult to predict atomic forces of the liquid phase since the liquid configurations are very different from those of the equilibrium solid crystals. It is also more difficult to predict energies of surface configurations because the surfaces of BBC crystals tend to be rather open with surface atoms exhibiting rather low coordination numbers.

Table 6: Energy and force errors for EAML potentials with M=60M=60 for different configuration groups. The units for the MAEs in energies and forces are meV/atom and meV/Å, respectively.
K=1K=1 K=2K=2 K=3K=3 K=4K=4
Group εE\varepsilon_{E} εF\varepsilon_{F} εE\varepsilon_{E} εF\varepsilon_{F} εE\varepsilon_{E} εF\varepsilon_{F} εE\varepsilon_{E} εF\varepsilon_{F}
Disp. A15 1.94 125.28 0.42 80.82 1.66 76.02 1.35 71.27
Disp. BCC 11.71 140.57 5.81 110.24 5.47 92.13 5.13 89.89
Disp. FCC 1.79 106.22 3.36 77.38 2.95 51.53 2.84 51.88
Elas. BCC 0.91 0.04 0.55 0.01 0.38 0.01 0.38 0.00
Elas. FCC 0.72 0.16 0.47 0.13 0.38 0.13 0.34 0.12
GSF 110 3.78 41.35 2.03 15.84 1.53 17.09 1.26 15.8
GSF 112 5.43 59.10 3.27 51.41 2.30 42.01 2.12 41.55
Liquid 11.28 371.38 2.40 262.04 2.12 223.8 2.81 216.9
Surface 13.66 62.00 5.66 40.02 4.67 28.97 3.60 28.06
Bulk A15 4.87 0 2.19 0 1.32 0 1.16 0
Bulk BCC 11.96 0 3.47 0 2.32 0 1.49 0
Bulk FCC 13.59 0 2.74 0 1.69 0 1.31 0

Next, we investigate the influence of training datasets on model performance. To this end, we train four potentials on different training datasets with M=14M=14. The first three potentials are standard linear models, while the fourth potential is an EAML potential with K=2K=2 clusters. Table 7 displays the MAEs in energies for the four potentials. The first potential, trained exclusively on the Bulk A15 group, demonstrates a small error of 2.54 meV/atom for this group but a very large error of 1070.7 meV/atom for the Bulk FCC group. Conversely, the second potential, trained on the Bulk FCC group, shows a small error of 11.41 meV/atom for its training group and a significant error of 218.57 meV/atom for the Bulk A15 group. While each potential performs well on the dataset it was trained on, its predictions for the other group are highly inaccurate. The third potential, trained on both the Bulk A15 and Bulk FCC groups, exhibits more balanced errors of 37.63 meV/atom and 41.57 meV/atom for the Bulk A15 and Bulk FCC groups, respectively. The fourth potential, trained on both groups with K=2K=2 clusters, achieves superior accuracy with errors of 2.06 meV/atom and 3.88 meV/atom for the Bulk A15 and Bulk FCC groups, respectively. These results underscore the importance of diverse training datasets and demonstrate the substantial improvement of the EAML model over the standard linear model.

Table 7: Energy errors for the standard linear potentials and EAML potential with M=14M=14 when they are trained on different training groups and validated on Bulk A15 and Bulk FCC groups. The unit for the MAEs in energies is meV/atom.
Training data Clusters Test Groups
Bulk A15 Bulk FCC
Bulk A15 1 2.54 1070.7
Bulk FCC 1 218.57 11.41
Bulk A15 & Bulk FCC 1 37.63 41.57
Bulk A15 & Bulk FCC 2 2.06 3.88
Refer to caption
Figure 3: Energy parity (left) and atomic force parity (right) plots for Ta for Case 5 with K=4K=4. For force parity, atoms that do not belong to one specific the environment with maxk𝒫k<0.7\displaystyle{\max_{k}\mathcal{P}_{k}<0.7} (red dots) have similar errors as atoms that belong to one specific environment with maxk𝒫k0.7\displaystyle{\max_{k}\mathcal{P}_{k}\geq 0.7} (blue dots).

Figure 3 shows the energy and atomic force parity plots for Ta for M=44M=44 with K=4K=4 environments from Table 4. We note that atoms with maxk𝒫k<0.7\displaystyle{\max_{k}\mathcal{P}_{k}<0.7} that do not belong exclusively to one environment have similar force errors as the other atoms. This shows the ability of the EAML model to capture atomic environments that are a mixture of several distinct environments, thereby making itself more transferable than the standard linear model. The transferability of the EAML model can be attributed to several factors. First, the EAML model has more capacity than the standard linear potential because it has KK times more trainable coefficients. Second, owing to the probability functions that vary with the neighborhood of the central atom, the EAML model adapts itself according to the local atomic environments to capture atomic interactions more accurately than the linear model. Third, the products of the probability functions and the descriptors contain higher body interactions than the descriptors themselves, rendering the EAML model higher body order than the linear model.

Figure 4 plots the energy per atom as a function of volume per atom for A15, BCC, FCC crystal structures. The FCC phase has a minimum energy about 0.2eV/atom above the BCC and A15 phases. We see that the energies are predicted accurately for the whole volume range with using only M=14M=14 descriptors. Figure 5 plots the close-up view of the energy curve near the minimum energy. The predicted energy curves for K=4K=4 are almost indistinguishable from the DFT energy curves for BCC, FCC and A15 phases.

Figure 6 illustrates the trade-off between computational cost and training error for K=1,2,3,4K=1,2,3,4. The computational cost is measured in terms of milisecond per time step per atom for MD simulations. These MD simulations are performed using LAMMPS [29] on a CPU core of Intel i7-1068NG7 2.3 GHz with 20×20×2020\times 20\times 20 bulk supercell containing 16000 Tantalum atoms. The EAML potentials with K>1K>1 are almost as fast as the standard linear potential for the same number of descriptors, having the computational cost almost independent of KK. This is consistent with the computational complexity analysis discussed in Subsection III.5. Furthermore, the EAML potential with M=32,K=4M=32,K=4 is more accurate and 3 times faster than the standard linear potential with M=100M=100. The results show the superior performance of the EAML potentials.

Refer to caption
(a) A15
Refer to caption
(b) BCC
Refer to caption
(c) FCC
Figure 4: Energy per atom versus volume per atom for A15, BCC, and FCC crystal structures for EAML potentials using M=14M=14 descriptors in comparison with DFT data.
Refer to caption
(a) A15
Refer to caption
(b) BCC
Refer to caption
(c) FCC
Figure 5: Close-up view near the minimum energy for the energy per atom versus volume per atom curves.
Refer to caption
(a) Energy
Refer to caption
(b) Force
Figure 6: Training errors versus the computational cost of MD simulations for the Ta system of 16000 atoms. MD simulations are performed using LAMMPS [29] on a CPU core of Intel i7-1068NG7 2.3 GHz for EAML potentials with different numbers of descriptors and numbers of clusters.

IV.2 Results for Indium Phosphide

The InP dataset contains a wide range of configurations to adequately sample the important regions of the potential energy surface. It was generated by Cusentino et. al [30] using the Vienna Ab Initio Simulation Package to demonstrate the explicit multi-element SNAP potential. The InP dataset also contains high-energy defects which are intended to study radiation damage effects where collision cascades of sufficiently high energy leave behind high formation energy point defects. Furthermore, the dataset includes configurations for uniform expansion and compression (Equation of State), random cell shape modifications (Shear group), and uniaxially strained (Strain group) unit cells for zincblende crystal structure. In total, the dataset has 1894 configurations with atom counts per configuration ranging from 8 to 216. The training set is 80% of the InP dataset, while the entire InP dataset is used as the test set. The inner and outer cut-off distances are set to rin=0.8r_{\rm in}=0.8Å and rcut=5r_{\rm cut}=5Å, respectively.

Table 8 displays the number of descriptors for six different cases. Table 9 provides test errors in energies and forces for cases listed in Table 8 and for K=1,2,3,4K=1,2,3,4. Both the energy and force errors decrease as MM and KK increases. As MM increases from 4040 to 370370, the energy errors drop more than a factor 10, while the force errors drop by a factor of 4. As KK increases from 11 to 44, the energy errors decrease by a factor of 44 and the force errors decrease by a factor 2. The MAEs in energies reach 0.97 meV/atom for K=1K=1, 0.42 meV/atom for K=2K=2, 0.31 meV/atom for K=3K=3, and 0.23 meV/atom for K=4K=4. These energy errors are generally below the limits of DFT errors. The MAEs in forces reach 16.70, 11.16, 9.05, 7.79 meV/Å for K=1,2,3,4K=1,2,3,4, respectively. These force errors are acceptable for most applications.

Table 8: The number of radial basis functions NrN_{r}, number of angular basis functions KaK_{a}, number of descriptors NdN_{d} for two-body, three-body, and four-body POD descriptors. Note that Nd=NrNe2N_{d}=N_{r}N_{e}^{2} for two-body PODs, Nd=NrKaNe2(Ne+1)/2N_{d}=N_{r}K_{a}N_{e}^{2}(N_{e}+1)/2 for three-body PODs, Nd=NrKaNe2(Ne+1)(Ne+2)/6N_{d}=N_{r}K_{a}N_{e}^{2}(N_{e}+1)(N_{e}+2)/6 for four-body PODs.
InP two-body three-body four-body All
Case NrN_{r} NdN_{d} NrN_{r} KaK_{a} NdN_{d} NrN_{r} KaK_{a} NdN_{d} MM
1 4 16 2 2 24 0 0 0 40
2 5 20 3 3 54 0 0 0 74
3 6 24 4 4 96 0 0 0 120
4 7 28 5 5 150 0 0 0 178
5 8 32 6 5 180 3 2 48 260
6 9 36 7 5 210 4 4 128 374
Table 9: Test errors in energies and forces for EAML potentials listed in Table 8. The units for the MAEs in energies and forces are meV/atom and meV/Å, respectively.
K=1K=1 K=2K=2 K=3K=3 K=4K=4
Case εE\varepsilon_{E} εF\varepsilon_{F} εE\varepsilon_{E} εF\varepsilon_{F} εE\varepsilon_{E} εF\varepsilon_{F} εE\varepsilon_{E} εF\varepsilon_{F}
1 11.63 61.00 6.44 48.13 4.67 36.76 4.11 33.30
2 4.97 38.71 3.49 30.27 2.52 24.00 2.01 22.05
3 3.30 33.80 1.55 21.28 1.11 17.30 0.91 15.34
4 2.67 27.26 1.20 17.81 0.68 14.56 0.46 12.74
5 1.71 20.91 0.70 14.28 0.52 12.03 0.38 10.98
6 0.97 16.70 0.43 11.10 0.28 8.91 0.20 7.79

Table 10 provides the test errors in energy and forces for each of the 19 groups in the dataset for M=178M=178. Point defects are created when atoms become vacant at lattice sites (vacancy defect), occupy locations in the crystal structure at which there is usually no atom (interstitial defect), or exchange positions with other atoms of different types (antisite defect). The defect groups have higher errors than the other groups. The InvS{}^{\rm S}_{\rm v} group has the highest mean absolute error in energies, while the InvS{}^{\rm S}_{\rm v}PvS{}^{\rm S}_{\rm v} group has the highest mean absolute error in forces. We see that increasing KK reduces the energy and force errors across all groups.

Table 10: Energy and force errors for EAML potentials with M=178M=178 for different configuration groups. The units for the MAEs in energies and forces are meV/atom and meV/Å, respectively. Point defects are created from an equilibrium configuration by inserting atoms (interstitial), removing atoms (vacancy), or exchanging atoms of different types (antisite). Subscripts correspond to vacancy(v), interstitial(i) and antisite(a). Superscripts correspond to large configurations of 216 atoms (L) and small configurations of 64 atoms (S).
K=1K=1 K=2K=2 K=3K=3 K=4K=4
Group εE\varepsilon_{E} εF\varepsilon_{F} εE\varepsilon_{E} εF\varepsilon_{F} εE\varepsilon_{E} εF\varepsilon_{F} εE\varepsilon_{E} εF\varepsilon_{F}
Bulk 5.79 0.00 2.26 0.00 1.50 0.00 0.91 0.00
EOS 2.60 0.70 0.96 0.83 0.80 0.89 0.49 0.42
Shear 0.26 25.56 0.50 8.45 0.17 5.23 0.11 3.64
Strain 2.53 0.02 1.24 0.02 0.90 0.03 0.91 0.05
InaL{}^{\rm L}_{\rm a} 3.05 10.77 1.05 7.49 0.55 6.30 0.23 6.48
PaL{}^{\rm L}_{\rm a} 4.09 29.6 1.30 18.22 1.03 14.32 0.72 12.10
InaL{}^{\rm L}_{\rm a}PaL{}^{\rm L}_{\rm a} 6.68 23.23 3.00 16.83 1.71 13.96 1.14 13.52
IniL{}^{\rm L}_{\rm i} 4.68 19.82 1.82 13.80 1.13 11.36 0.73 10.35
PiL{}^{\rm L}_{\rm i} 3.34 15.68 1.28 9.96 0.87 8.55 0.61 7.69
PvL{}^{\rm L}_{\rm v} 3.65 7.95 0.73 5.17 0.22 5.62 0.09 5.09
InvL{}^{\rm L}_{\rm v}PvL{}^{\rm L}_{\rm v} 4.20 27.21 1.85 21.76 1.18 18.91 0.69 17.38
InaS{}^{\rm S}_{\rm a} 3.66 15.54 0.67 10.29 0.36 7.82 0.22 7.31
PaS{}^{\rm S}_{\rm a} 4.16 42.85 1.77 33.30 0.70 26.03 0.41 23.33
InaS{}^{\rm S}_{\rm a}PaS{}^{\rm S}_{\rm a} 6.46 58.35 3.76 35.68 1.36 25.29 0.82 24.00
IniS{}^{\rm S}_{\rm i} 3.05 51.51 1.18 33.05 0.65 25.96 0.39 21.89
PiS{}^{\rm S}_{\rm i} 2.95 40.13 1.39 24.51 0.69 20.46 0.45 16.90
InvS{}^{\rm S}_{\rm v} 11.23 40.15 6.92 27.99 4.54 20.10 2.90 14.65
PvS{}^{\rm S}_{\rm v} 0.92 26.00 0.49 18.94 0.36 16.15 0.24 14.88
InvS{}^{\rm S}_{\rm v}PvS{}^{\rm S}_{\rm v} 4.33 57.06 2.69 38.85 1.16 32.05 0.74 27.61

One of the crucial requirements for interatomic potentials is that they predict the formation and cohesive energies accurately. In addition to defect formation energies, we also study cohesive energies for different low-energy crystal structures. Figure 7 plots the energy per atom as a function of volume per atom for the rocksalt (RS) and zincblende (ZB) crystal structures. We see that the predicted cohesive energies are very close to the DFT cohesive energies for both the rocksalt (RS) and zincblende (ZB) crystal structures. Furthermore, the EAML potentials correctly predict ZB as the most stable structure and reproduce the experimental cohesive energy of -3.48 eV/atom at a volume of 24.4 Å3\mbox{\AA}^{3}/atom [31]. The predicted cohesive energies for the RS structure match exactly the DFT value of -3.30eV/atom at a volume of 19.7 Å3\mbox{\AA}^{3}/atom. While not plotted in Figure 7, the predicted cohesive energies for the wurtzite ground state structure agree well with the DFT value of -3.45eV/atom at a volume of 25.1 Å3\mbox{\AA}^{3}/atom.

Refer to caption
(a) Energy per atom versus volume per atom curve
Refer to caption
(b) Close-up view near the minimum energies
Figure 7: Energy per atom versus volume per atom for RS and ZB crystal structures for EAML potentials using M=40M=40 descriptors in comparison with DFT data.

Figure 8 illustrates the trade-off between computational cost and accuracy for MD simulations of 8000 InP atoms performed on a single CPU core of Intel i7-1068NG7 2.3 GHz. The computational cost is measured in terms of second per time step per atom. We clearly see that the potentials for K>1K>1 are almost as fast as the potential for K=1K=1 for the same number of descriptors. We also see that increasing KK reduces the test errors. The energy errors for K=4K=4 are about 4 times smaller than those for K=1K=1, while the force errors for K=4K=4 are about 2 times smaller than those for K=1K=1. As a result, the EAML potential with M=120,K=4M=120,K=4 is more accurate and 3 times faster than the standard linear potential with M=374M=374.

Refer to caption
(a) Energy
Refer to caption
(b) Force
Figure 8: Test errors versus the computational cost of MD simulations for the InP system of 8000 atoms. MD simulations are performed using LAMMPS [29] on a CPU core of Intel i7-1068NG7 2.3 GHz for EAML potentials with different numbers of descriptors and numbers of clusters.

V Conclusions

We have introduced multi-element Proper Orthogonal Descriptors (PODs) for constructing machine-learned interatomic potentials. The POD descriptors incorporate elements of both internal coordinate descriptors and atom density descriptors. Our approach can be extended to arbitrary body orders and can be used to compute atom-centered symmetry functions and empirical potentials with a cost that scales only linearly with the number of neighbors. The method brings about the possibility of constructing many-body empirical potentials, while maintaining the computational cost that scales linearly with the number of neighbors. For instance, the SW and EAM potentials can be extended to include four-body terms, while atom-centered symmetry functions can be formed from the four-body POD descriptors.

We have presented an environment-decomposition method to construct accurate and transferable interatomic potentials by adapting to the local atomic environment of each atom within a system. For a dataset of NN atoms, atom positions and chemical species are mapped to a descriptor matrix by using the POD method. Principal component analysis (PCA) is used to reduce the dimension of the descriptor space. The kk-means clustering scheme is applied to the reduced matrix to partition the dataset into subsets of similar environments. Each cluster represents a distinctive local environment and is used to define a corresponding local potential. We introduce a many-body many-potential expansion to smoothly blend these local potentials to ensure global continuity of the potential energy surface. This continuity is achieved by calculating probability functions that assess the likelihood of an atom belonging to specific clusters identified within the dataset.

We have applied the EAML potentials to Ta and InP datasets. The results show that EAML models provide significantly more accurate predictions than the standard linear model for the same number of descriptors MM. There are several reasons behind the better accuracy of EAML potentials. First, EAML potentials have more capacity than the standard linear potential because they have a larger number of fitting coefficients (i.e., KMKM versus MM). Second, owing to the probability functions that vary with the neighborhood of the central atom, EAML potentials adapt their descriptors according to the local atomic environments to capture atomic interactions more accurately than the linear potential. Third, the products of the probability functions and the descriptors contain higher body interactions than the descriptors themselves. As a result, EAML potentials can capture higher-order interactions than the linear potential. Since EAML potentials have computational complexity similar to that of the linear potential, they are more accurate and efficient.

While PCA is used for its simplicity and straightforward implementation, nonlinear dimensionality reduction techniques may offer some advantage. Autoencoders, Variational Autoencoders, t-Distributed Stochastic Neighbor Embedding (t-SNE) and Isomap offer the ability to uncover and preserve intricate structures in high-dimensional data that PCA might overlook. These methods are particularly useful in scenarios where the relationships among data points involve complex patterns. While kk-means clustering is used for its simplicity and straightforward implementation, there are several other clustering techniques that can be use to deal with very large and diverse datasets. Techniques such as hierarchical clustering, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and Gaussian Mixture Models (GMMs) offer alternative methods that can yield better clusters than kk-means clustering.

In this paper, we consider linear regression to construct EAML models. Linear models are easy to understand and interpret because the relationship between the output and trainable parameters is linear. They are computationally inexpensive to train and require relatively low computational resources. Indeed, it takes only a few seconds to a few minutes to train EAML potentials on a personal computer. However, linear regression may be insufficient to capture complex atomic interactions without transformation of input features. Significant performance improvement can be achieved by using more sophisticated regression methods such as nonlinear regression, kernel regression, and neural networks. While these nonlinear models require much longer training times than linear models, they often yield more accurate predictions for the same computational cost [18].

Acknowledgements

We would like to thank Jaime Peraire, Robert M. Freund, Youssef Marzouk, Nicolas Hadjiconstantinou, and Spencer Wyant at MIT for the fruiful discussions on a wide ranging of topics related to this work. We would also like to thank Andrew Rohskopf, Axel Kohlmeyer and Aidan Thompson for fruitful discussions about LAMMPS implementation of POD potentials. We gratefully acknowledge the United States Department of Energy under contract DE-NA0003965 and the the Air Force Office of Scientific Research under Grant No. FA9550-22-1-0356 for supporting this work.

References