NewtonNet: A Newtonian message passing network for deep learning of interatomic potentials and forces
NewtonNet Modules
The core idea behind the iterative message passing over atomic environments is to update the feature array that represents each atom in its immediate environment. In NewtonNet, this level of feature representation is invariant to rotations of the initial configuration through the following operations.
Atomic Feature Aggregator. We initialize the atomic features with a trainable embedding of the atomic numbers $Z_i$, i.e., $a_i^0 = a_{\mathrm{emb}}(Z_i)$ with $a_i^l \in \mathbb{R}^F$. We next use the edge function $\phi_e$ to represent the interatomic distances using radial Bessel functions as introduced by Klicpera et al.,
where $r_c$ is the cutoff radius and $r_{ij} = \lVert \vec{r}_i - \vec{r}_j \rVert$ is the interatomic distance between any atoms $i$ and $j$. We follow Schütt et al. in using a self-interaction linear layer to combine the outputs of the radial basis functions with each other. This operation is followed by an envelope function that implements a continuous radial cutoff around each atom. For this purpose, we use the polynomial function introduced by Klicpera et al. with a suitable choice of the polynomial degree $p$. Thus, the edge operation is defined as a trainable transformation of the relative atomic position vectors within the cutoff radius,
$$e_{ij} = \phi_e(\vec{r}_{ij}) = f_c(r_{ij}) \, W_e \, e_{\mathrm{RBF}}(r_{ij}) \qquad (1)$$
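As a small numerical sketch of this edge operation, the following NumPy code implements the radial Bessel basis and a polynomial envelope of the form used by Klicpera et al.; the basis size, cutoff radius, and polynomial degree `p = 7` here are illustrative assumptions rather than the paper's actual hyperparameters.

```python
import numpy as np

def bessel_rbf(r, n_basis=20, r_cut=5.0):
    """Radial Bessel basis (Klicpera et al.): sqrt(2/r_c) * sin(n*pi*r/r_c) / r."""
    n = np.arange(1, n_basis + 1)
    return np.sqrt(2.0 / r_cut) * np.sin(n * np.pi * r / r_cut) / r

def poly_envelope(r, r_cut=5.0, p=7):
    """Smooth polynomial cutoff: equals 1 at r = 0 and decays to 0 at r = r_cut."""
    d = r / r_cut
    u = (1.0
         - (p + 1) * (p + 2) / 2.0 * d ** p
         + p * (p + 2) * d ** (p + 1)
         - p * (p + 1) / 2.0 * d ** (p + 2))
    return np.where(r < r_cut, u, 0.0)

def edge_features(r, W_e, r_cut=5.0):
    """Eq. (1): envelope-scaled, linearly mixed radial basis for one pair distance."""
    return poly_envelope(r, r_cut) * (W_e @ bessel_rbf(r, W_e.shape[1], r_cut))
```

Because the output depends on the positions only through the scalar distance `r`, the edge features are rotationally invariant by construction.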
The output of $\phi_e$ is rotationally invariant, as it depends only on the interatomic distances. Following the notation of neural message passing, we define a message function to collect neighboring information and update the atomic features. Here, we pass a symmetric message between any pair of atoms, i.e., the message passed between atom $i$ and atom $j$ is the same in both directions. Thus, we introduce our symmetric message passing as an element-wise product of all feature arrays involved in a two-body interaction,
$$m_{ij} = \phi_m(a_i^{l}) \odot \phi_m(a_j^{l}) \odot e_{ij} \qquad (2)$$
where $\phi_m$ indicates a trainable and differentiable network with a nonlinear SiLU activation after the first layer. Note that $\phi_m$ is the same function applied to all atoms. Thus, due to the weight sharing and the multiplication of the output features of both heads of the two-body interaction, the messages $m_{ij}$ remain symmetric at each layer of message passing. To complete the feature array aggregator, we simply sum all messages received by the central atom $i$ from its neighbors, $m_i = \sum_{j \in N(i)} m_{ij}$. Finally, we update the atomic features at each layer using the sum of received messages,
$$a_i^{l} \leftarrow a_i^{l} + m_i \qquad (3)$$
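The symmetric message construction and feature update can be sketched as follows. The feature width, the two-layer `phi_m` network, and the random symmetric edge features are illustrative stand-ins for the trained model, not its actual parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
F, n_atoms = 8, 4                       # feature width and atom count (illustrative)
W1, W2 = rng.normal(size=(F, F)), rng.normal(size=(F, F))

def silu(x):
    return x / (1.0 + np.exp(-x))       # SiLU(x) = x * sigmoid(x)

def phi_m(a):
    """Two-layer network with SiLU after the first layer; weights shared by all atoms."""
    return W2 @ silu(W1 @ a)

a = rng.normal(size=(n_atoms, F))       # atomic features a_i
e = rng.normal(size=(n_atoms, n_atoms, F))
e = 0.5 * (e + e.transpose(1, 0, 2))    # stand-in symmetric edge features e_ij

def message(i, j):
    """Eq. (2): element-wise product of both heads with the edge features."""
    return phi_m(a[i]) * phi_m(a[j]) * e[i, j]

# Weight sharing makes the message symmetric: m_ij == m_ji
assert np.allclose(message(0, 1), message(1, 0))

# Eq. (3): each atom sums its received messages and updates its features
m = np.stack([sum(message(i, j) for j in range(n_atoms) if j != i)
              for i in range(n_atoms)])
a_updated = a + m
```

The symmetry assertion makes the key design point explicit: because the same $\phi_m$ acts on both atoms and the edge features are symmetric, no ordering of the pair can break the symmetry of the message.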
Force Calculator. So far, we have followed a standard message passing scheme that is invariant to rotation. We begin to take advantage of directional information in the force calculator module. The core idea behind this module is to construct latent force vectors using Newton's third law. The third law states that the force that atom $i$ exerts on atom $j$ is equal in magnitude and opposite in direction to the force that atom $j$ exerts on atom $i$. This is why we introduced a symmetric message passing operator. Thus, we can estimate the symmetric force magnitude as a function of the message $m_{ij}$, i.e., $\phi_F(m_{ij})$. The product of this force magnitude with the unit distance vector gives antisymmetric interatomic forces that obey Newton's third law (note that $\hat{r}_{ij} = -\hat{r}_{ji}$),
$$\vec{f}_{ij} = \phi_F(m_{ij}) \, \hat{r}_{ij} \qquad (4)$$
where $\phi_F$ is a differentiable learned function and $\hat{r}_{ij} = (\vec{r}_i - \vec{r}_j)/r_{ij}$. The total force on atom $i$ at each layer is the sum of all forces from the neighboring atoms in its atomic environment,
$$\vec{F}_i^{\,l} = \sum_{j \in N(i)} \vec{f}_{ij} \qquad (5)$$
and we update the latent force vectors at each layer,
$$\tilde{F}_i^{\,l} = \tilde{F}_i^{\,l-1} + \vec{F}_i^{\,l} \qquad (6)$$
We ultimately use the latent force vectors from the last layer, $\tilde{F}_i^{\,L}$, in the loss function to ensure that this latent space truly mimics the underlying physical rules.
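A minimal numerical sketch of the latent force construction in equations 4 and 5 follows. The scalar function `phi_F` here is an arbitrary symmetric stand-in for the learned force magnitude (which in the model acts on the symmetric message $m_{ij}$); the positions are random.

```python
import numpy as np

rng = np.random.default_rng(1)
pos = rng.normal(size=(4, 3))           # positions of 4 atoms (illustrative)

def unit(v):
    return v / np.linalg.norm(v)

def phi_F(i, j):
    """Stand-in for the learned force magnitude; symmetric in (i, j)
    because in the model it acts on the symmetric message m_ij."""
    return np.sin(np.linalg.norm(pos[i] - pos[j]))

def pair_force(i, j):
    """Eq. (4): f_ij = phi_F(m_ij) * r_hat_ij; antisymmetric since r_hat_ij = -r_hat_ji."""
    return phi_F(i, j) * unit(pos[i] - pos[j])

# Newton's third law holds in the latent space: f_ij = -f_ji
assert np.allclose(pair_force(0, 1), -pair_force(1, 0))

# Eq. (5): total latent force on each atom; the third law makes them sum to zero
F_total = np.stack([sum(pair_force(i, j) for j in range(4) if j != i)
                    for i in range(4)])
assert np.allclose(F_total.sum(axis=0), 0.0)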
To complete the force calculator module, we borrow the idea of continuous filters from Schütt et al. to decompose and scale the latent force vectors along each dimension using another learned function $\phi_f$. This way we can featurize the vector field and avoid over-abstracting the structural information that the vectors carry,
$$\Delta \vec{f}_{ij} = \phi_f(m_{ij}) \, \vec{f}_{ij} \qquad (7)$$
As a result, the constructed latent interatomic forces are decomposed into rotationally invariant features along each dimension, i.e., $\Delta \vec{f}_{ij} \in \mathbb{R}^{F \times 3}$. We call this type of representation feature vectors. Following the message passing strategy, we update the force feature vectors with $\Delta \vec{f}_{ij}$ after each layer, while they are initialized with zero values, $\vec{f}_i^{\,0} = \vec{0}$,
$$\vec{f}_i^{\,l} = \vec{f}_i^{\,l-1} + \sum_{j \in N(i)} \Delta \vec{f}_{ij} \qquad (8)$$
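The continuous-filter decomposition and accumulation can be sketched with plain array operations. The pair forces and the filter values `filt` (standing in for the learned $\phi_f(m_{ij})$) are random placeholders; only the shapes and the broadcasting pattern are the point.

```python
import numpy as np

rng = np.random.default_rng(2)
n, F = 3, 8                                  # atoms and feature width (illustrative)
f_pair = rng.normal(size=(n, n, 3))          # latent pair forces f_ij from eq. (4)
f_pair = f_pair - f_pair.transpose(1, 0, 2)  # make them antisymmetric for the demo
filt = rng.normal(size=(n, n, F))            # stand-in for the learned filter phi_f(m_ij)

# Eq. (7): scale each 3-vector by F invariant filter values -> shape (n, n, F, 3),
# i.e., one "feature vector" of shape (F, 3) per atom pair
f_feat = filt[..., None] * f_pair[:, :, None, :]

# Eq. (8): force feature vectors start at zero and accumulate over neighbors
f_vec = np.zeros((n, F, 3))
mask = 1.0 - np.eye(n)                       # exclude i == j from the neighbor sum
f_vec = f_vec + (mask[..., None, None] * f_feat).sum(axis=1)
```

Scaling a 3-vector by invariant filter values leaves its direction untouched, which is why the resulting feature vectors remain rotationally equivariant.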
Momentum Calculator. In this step, we estimate a measure of atomic displacement due to the forces exerted on the atoms. We accumulate these displacements at each layer without updating the atomic positions. The main idea in this module is that the displacement must be along the force features updated in the previous step. Inspired by Newton's second law, we approximate a displacement factor using a learned function $\phi_{\Delta r}$ that acts on the current state of each atom as represented by its atomic features $a_i^{l}$,
$$\Delta \vec{r}_i^{\,l} = \phi_{\Delta r}(a_i^{l}) \, \vec{f}_i^{\,l} \qquad (9)$$
We finally update the displacement feature vectors with $\Delta \vec{r}_i^{\,l}$ and a weighted sum of all atomic displacements from the previous layer. The weights are estimated by a trainable function $\phi_\alpha$ of the messages $m_{ij}$ between atoms,
$$\delta \vec{r}_i^{\,l} = \Delta \vec{r}_i^{\,l} + \sum_{j \in N(i)} \phi_\alpha(m_{ij}) \, \delta \vec{r}_j^{\,l-1} \qquad (10)$$
The weight component in this step works like an attention mechanism, concentrating on the two-body interactions that cause the largest movement of the atoms. Since the forces at $l = 0$ are zero, the displacements are also initialized with zero values, i.e., $\delta \vec{r}_i^{\,0} = \vec{0}$.
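A shape-level sketch of the displacement update follows; `disp_factor` and `alpha` are random placeholders for the learned $\phi_{\Delta r}(a_i)$ and the trainable weights of the messages, and the layout (features times spatial axis) is the assumption being illustrated.

```python
import numpy as np

rng = np.random.default_rng(3)
n, F = 3, 8
f_vec = rng.normal(size=(n, F, 3))      # force feature vectors from eq. (8)
disp_factor = rng.normal(size=(n, F))   # stand-in for phi_dr(a_i) in eq. (9)
alpha = rng.normal(size=(n, n, F))      # stand-in for trainable weights of m_ij

# Displacements are initialized at zero because the forces at l = 0 are zero
d_prev = np.zeros((n, F, 3))

# Eq. (9): displacement along the force feature vectors
d_step = disp_factor[..., None] * f_vec

# Eq. (10): add the attention-like weighted sum of previous-layer displacements
d_new = d_step + (alpha[..., None] * d_prev[None, ...]).sum(axis=1)

# With zero initial displacements, the second term vanishes at the first layer
assert np.allclose(d_new, d_step)
```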
Energy Calculator. The last module contracts the directional information back into the rotationally invariant atomic features. Since we developed the previous steps based on Newton's equations of motion, one immediate idea is to approximate the potential energy change for each atom using $\vec{f}_i^{\,l}$ and $\delta \vec{r}_i^{\,l}$, considering that $dE = -\vec{F} \cdot d\vec{r}$. Thus, we find the energy change for each atom by
$$\Delta E_i^{\,l} = -\,\phi_E(a_i^{l}) \odot \left( \vec{f}_i^{\,l} \cdot \delta \vec{r}_i^{\,l} \right) \qquad (11)$$
where the dot product is taken along the spatial dimension and $\phi_E$ is a differentiable learned function that operates on the atomic features and predicts the energy coefficient for each atom. The dot product of two feature vectors contracts the features along each dimension into a single feature array. We finally update the atomic features once again using the contracted directional information, presented as the atomic potential energy change,
$$a_i^{l+1} = a_i^{l} + \Delta E_i^{\,l} \qquad (12)$$
This approach is both physically and mathematically consistent with rotationally equivariant operations and with the goals of our model development. Physically, the energy change is a meaningful addition to the atomic feature arrays, as they are eventually used to predict the atomic energies. Mathematically, the dot product of two feature vectors contracts the rotationally equivariant features into invariant features, similar to the Euclidean distance that we used in the atomic feature aggregator module.
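The contraction in equations 11 and 12 can be sketched as a per-feature dot product along the spatial axis; `coeff` is a random placeholder for the learned $\phi_E(a_i)$, and the minus sign mirrors the $dE = -\vec{F} \cdot d\vec{r}$ relation (in practice a learned coefficient can absorb the sign).

```python
import numpy as np

rng = np.random.default_rng(4)
n, F = 3, 8
f_vec = rng.normal(size=(n, F, 3))      # force feature vectors
d_vec = rng.normal(size=(n, F, 3))      # displacement feature vectors
coeff = rng.normal(size=(n, F))         # stand-in for the learned phi_E(a_i)

# Eq. (11): the dot product along the spatial axis contracts each (F, 3)
# feature vector pair to F rotationally invariant features
dE = -coeff * np.einsum('nfd,nfd->nf', f_vec, d_vec)

# Eq. (12): the invariant energy change updates the atomic features
a = rng.normal(size=(n, F))
a_updated = a + dE
```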
Proof of Equivariance and Invariance
We prove that our model is rotationally equivariant in the atomic positions $\vec{r}_i$ (the atomic numbers $Z_i$ are unaffected by rotation) for any rotation matrix $R$. In equation 1, the Euclidean distance is invariant to rotation, as can be shown by
$$\lVert R\vec{r}_i - R\vec{r}_j \rVert = \sqrt{(\vec{r}_i - \vec{r}_j)^{T} R^{T} R \, (\vec{r}_i - \vec{r}_j)} = \lVert \vec{r}_i - \vec{r}_j \rVert \qquad (13)$$
which means that the Euclidean distance is indifferent to rotations of the positions, as is well known for this feature. Consequently, the feature arrays $e_{ij}$, $m_{ij}$, and $a_i$, and all linear or nonlinear functions acting on them, produce invariant outputs. The only assumption in this proof is that a linear combination of equivariant vectors, or their product with invariant features, remains rotationally equivariant. Based on this assumption, we claim that equations 5 to 11 remain equivariant to rotations. For instance, the rotation matrix propagates through equation 5 such that
$$\sum_{j \in N(i)} \phi_F(m_{ij}) \, R\hat{r}_{ij} = R \sum_{j \in N(i)} \phi_F(m_{ij}) \, \hat{r}_{ij} = R\vec{F}_i^{\,l} \qquad (14)$$
The last operator, equation 12, remains invariant to rotations due to the use of the dot product. The proof of the invariance of the atomic energy changes is
$$\left( R\vec{f}_i^{\,l} \right) \cdot \left( R\,\delta\vec{r}_i^{\,l} \right) = \left( \vec{f}_i^{\,l} \right)^{T} R^{T} R \, \delta\vec{r}_i^{\,l} = \vec{f}_i^{\,l} \cdot \delta\vec{r}_i^{\,l} \qquad (15)$$
This is how we contract equivariant features into invariant arrays. The addition of these arrays to the atomic features preserves the invariance of the final prediction of the atomic contributions to the total potential energy.
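The three identities above can be checked numerically with a random proper rotation; the scalar coefficients and vectors below are arbitrary stand-ins for the invariant features and equivariant vectors of the model.

```python
import numpy as np

rng = np.random.default_rng(5)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal matrix
if np.linalg.det(Q) < 0:
    Q[:, 0] = -Q[:, 0]                          # force det(Q) = +1: a proper rotation

# Eq. (13): interatomic distances are invariant under rotation
ri, rj = rng.normal(size=3), rng.normal(size=3)
assert np.isclose(np.linalg.norm(Q @ ri - Q @ rj), np.linalg.norm(ri - rj))

# Eq. (14): invariant coefficients times rotated unit vectors equal the rotated sum
c = rng.normal(size=5)                          # stand-in for phi_F(m_ij)
v = rng.normal(size=(5, 3))                     # stand-in for the unit vectors r_hat_ij
assert np.allclose((c[:, None] * (v @ Q.T)).sum(axis=0),
                   Q @ (c[:, None] * v).sum(axis=0))

# Eq. (15): the per-feature dot product of two rotated feature vectors is invariant
Fv, dv = rng.normal(size=(8, 3)), rng.normal(size=(8, 3))
assert np.allclose(np.einsum('fd,fd->f', Fv @ Q.T, dv @ Q.T),
                   np.einsum('fd,fd->f', Fv, dv))
```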