
Gaussian process regression for transition state search

Alexander Denzel    Johannes Kästner kaestner@theochem.uni-stuttgart.de Institute for Theoretical Chemistry, University of Stuttgart, Pfaffenwaldring 55, 70569 Stuttgart, Germany
Abstract

We implemented a gradient-based algorithm for transition state search which uses Gaussian process regression (GPR). Besides a description of the algorithm we provide a method to find the starting point for the optimization if only the reactant and product minima are known. We perform benchmarks on 27 test systems against the dimer method and partitioned rational function optimization (P-RFO) as implemented in the DL-FIND library. We found the new optimizer to significantly decrease the number of required energy and gradient evaluations.

1 Introduction

The investigation of reaction mechanisms is a central goal in theoretical chemistry. Any reaction can be characterized by its potential energy surface (PES) E(\vec{x}), the energy depending on the nuclear coordinates of all atoms. Minima on the PES correspond to reactants and products. The minimum-energy path, the path of lowest energy that connects two minima, can be seen as an approximation to the mean reaction path. It proceeds through a first-order saddle point (SP). Such a SP is the point of highest energy along the minimum-energy path. The energy difference between a minimum and a SP connected to the minimum is the reaction barrier, which can be used in transition state theory to calculate reaction rate constants. SPs are typically located by iterative algorithms, with the energy and its derivatives calculated by electronic structure methods in each iteration. SP searches are therefore typically the computationally most expensive procedures in the theoretical study of reaction mechanisms, and efficient black-box algorithms are required to increase the efficiency of such simulations. Here, we present such an algorithm based on machine learning techniques.

A first-order saddle point is characterized by a vanishing gradient of the energy with respect to all coordinates and a single negative eigenvalue of the respective Hessian. The eigenmode \vec{v}_{\text{min}} corresponding to that negative eigenvalue is the transition mode, a tangent of the minimum-energy path. The SP can be seen as an approximation of the transition state. While a general transition state is a surface that encapsulates the reactant minimum, the lowest-energy point on such a general transition state that minimizes recrossing is in many cases a first-order SP. Thus, a SP is often referred to as transition structure or simply as transition state (TS).

Since the search for transition states is such a central task in computational chemistry, many algorithms have been proposed. The most general one is probably a full Newton search. This has the disadvantage that it converges to any SP, not necessarily first-order ones. Moreover, it requires calculating the Hessian of the potential energy with respect to the coordinates at each step, which is computationally rather expensive. An algorithm that also requires Hessian information but converges specifically to first-order SPs is the partitioned rational function optimization1 (P-RFO), which is based on the rational function approximation to the energy of the RFO method.2 It typically shows excellent convergence properties, but its requirement for Hessians renders P-RFO impractical in many cases. Algorithms that find first-order SPs without Hessian information are the so-called minimum mode following methods.3 They find \vec{v}_{\text{min}} by rotating a dimer on the PES4 or via the Lanczos method.5, 6 By reversing the component of the force \vec{F} in the direction of \vec{v}_{\text{min}} one can build a new force \vec{F}^{\text{eff}}=\vec{F}-2(\vec{F}\cdot\vec{v}_{\text{min}})\vec{v}_{\text{min}} that takes the algorithms to a first-order SP.
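The effective-force construction above can be sketched in a few lines (a minimal illustration; the function name and the toy vectors are ours, not part of DL-FIND):

```python
import numpy as np

def effective_force(F, v_min):
    """Reverse the force component along the transition mode v_min.

    Implements F_eff = F - 2 (F . v_min) v_min for a normalized v_min,
    which lets a force-following algorithm climb to a first-order SP.
    """
    v = v_min / np.linalg.norm(v_min)  # make sure the mode is normalized
    return F - 2.0 * np.dot(F, v) * v

# The component along v is reversed; the perpendicular part is untouched
F = np.array([1.0, 2.0, 0.0])
v = np.array([1.0, 0.0, 0.0])
print(effective_force(F, v))  # [-1.  2.  0.]
```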

Previous work compared P-RFO and gradient-based minimum mode following methods and found the latter to be advantageous in many cases.7  Even if they need more steps until convergence, this is compensated by the fact that no Hessians have to be calculated.

The P-RFO-based optimization technique  we present in this paper locates SPs without Hessian information. It uses energy and gradient information in the methodology of Gaussian process regression (GPR).8 This allows us to use P-RFO on an interpolated PES that is much cheaper to evaluate than the original PES without ever calculating Hessians on the original PES.  Kernel-based methods like GPR are increasingly used in theoretical chemistry to predict different kinds of chemical properties. 9, 10, 11, 12, 13, 14, 15, 16, 17, 18 Among these are minimization algorithms on the PES,19, 20 in some cases even without the requirement for analytical gradients.19  Especially interesting for our case is that GPR was already used to search for SPs: the efficiency of the nudged elastic band method (NEB)21, 22 was drastically improved using GPR.23, 24 In contrast to that approach, we use a surface walking algorithm, focusing on the SP rather than optimizing the whole path.

Sometimes it can be difficult to make a good first guess for the SP to start the algorithm. For the NEB method a procedure was introduced to provide a starting path using only geometrical properties of the system at two known minima. It is called image-dependent pair potential (IDPP).25 If the two minima that the wanted TS connects are known, we make use of that potential to find an initial guess for our TS to start the optimization.

This paper is organized as follows. We briefly introduce the methodology of GPR in the next section. Subsequently we describe in detail how our optimizations  make use of GPR. Then we show some benchmarks of the new optimizer and compare it to the well-established dimer method and P-RFO in DL-FIND.26, 27 The complete algorithm presented in this paper is implemented in DL-FIND and accessible via ChemShell. 28, 29 The code will be made publicly available.

All properties in this paper are expressed in atomic units (Bohr for positions and distances, Hartree for energies), unless other units are specified.

2 Methods

Gaussian process regression.

The idea of GPR is described at length in the literature8 and we will only briefly review the basic idea. One can build a surrogate model for the PES using N energies E_{1},E_{2},...,E_{N} at certain configurations of the molecule \vec{x}_{1},\vec{x}_{2},...,\vec{x}_{N}\in\mathbb{R}^{d}. These configurations are called training points. In Cartesian coordinates the dimension d of the system is d=3n, where n is the number of atoms in the system. The key element of the GPR scheme is the covariance function k(\vec{x}_{i},\vec{x}_{j}). The covariance function describes the covariance between two random variables at the points \vec{x}_{i} and \vec{x}_{j} that take on the value of the energies. In simplified terms, the covariance function is a similarity measure between these two energies. In our case we use a form of the Matérn covariance function30

k_{\text{M}}(r)=\left(1+\frac{\sqrt{5}r}{l}+\frac{5r^{2}}{3l^{2}}\right)\exp\left[-\frac{\sqrt{5}r}{l}\right] (1)

in which we abbreviated r=|\vec{x}_{i}-\vec{x}_{j}|. The parameter l describes a characteristic length scale of the Gaussian process (GP). It determines how strongly the covariance between two random variables (describing energies) decreases with distance.
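As an illustration, the Matérn covariance function of Eq. (1) can be written directly from the formula (a sketch; the function name and vectorization are ours):

```python
import numpy as np

def matern52(xi, xj, l=20.0):
    """Matérn covariance function of Eq. (1) with length scale l."""
    r = np.linalg.norm(xi - xj)
    s = np.sqrt(5.0) * r / l
    return (1.0 + s + 5.0 * r**2 / (3.0 * l**2)) * np.exp(-s)

# Identical points have covariance 1; the covariance decays with distance
print(matern52(np.zeros(3), np.zeros(3)))  # 1.0
```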

Given a prior estimate E_{\text{prior}}(\vec{x}) of the PES, i.e., the estimate before the training points are included in the interpolation, one can build the GPR-PES as follows.

E(\vec{x})=\sum_{n=1}^{N}w_{n}k(\vec{x},\vec{x}_{n})+E_{\text{prior}}(\vec{x}) (2)

The vector \vec{w}=(w_{1}\ w_{2}\ ...\ w_{N})^{T} is the solution of the linear system

\sum_{n=1}^{N}K_{mn}w_{n}=E_{m}-E_{\text{prior}}(\vec{x}_{m}) (3)

for all m=1,2,...,N. The elements

K_{mn}=k(\vec{x}_{m},\vec{x}_{n})+\sigma_{\text{e}}^{2}\delta_{mn} (4)

are the entries of the so-called covariance matrix K, in which \delta_{mn} is the Kronecker delta. The parameter \sigma_{\text{e}} describes a noise that one assumes on the energy data. If one includes gradients in the scheme, one can introduce an additional parameter \sigma_{\text{g}} for the noise on the gradient data.

Since the kernel function carries the only dependence on \vec{x}, we can obtain gradients and Hessians of the GPR-PES. For the first derivative in dimension k, i.e. in the direction of the k-th unit vector, we get

\frac{dE(\vec{x})}{dx^{k}}=\sum_{n=1}^{N}w_{n}\frac{dk(\vec{x},\vec{x}_{n})}{dx^{k}} (5)

and similarly

\frac{d^{2}E(\vec{x})}{dx^{k}dx^{l}}=\sum_{n=1}^{N}w_{n}\frac{d^{2}k(\vec{x},\vec{x}_{n})}{dx^{k}dx^{l}} (6)

for second derivatives, in dimensions k and l. In a previous paper we explicitly described how one can include gradient information into this scheme.20 In this paper we always use energy and gradient information at the training points. To build the GPR-PES we then have to solve a linear system with a covariance matrix of size N(d+1). We solve this linear system via Cholesky decomposition, which yields a scaling of \mathcal{O}(N^{3}d^{3}). In our case we can decrease the formal scaling to \mathcal{O}(d^{3}) with a multi-level approach described below.
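The fitting and prediction steps of Eqs. (2)-(4) can be sketched as follows. This is an energy-only toy variant: the actual implementation also includes gradient information and uses l = 20; the shorter length scale here merely keeps the toy problem well conditioned.

```python
import numpy as np

def matern52(r, l):
    """Matérn covariance of Eq. (1) as a function of the distance r."""
    s = np.sqrt(5.0) * r / l
    return (1.0 + s + 5.0 * r**2 / (3.0 * l**2)) * np.exp(-s)

def gpr_fit(X, E, l, sigma_e=1e-7):
    """Solve Eq. (3) for the weights w via Cholesky decomposition."""
    R = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    K = matern52(R, l) + sigma_e**2 * np.eye(len(X))  # Eq. (4)
    E_prior = E.mean()                                # prior mean, Eq. (11)
    L = np.linalg.cholesky(K)
    w = np.linalg.solve(L.T, np.linalg.solve(L, E - E_prior))
    return w, E_prior

def gpr_predict(x, X, w, E_prior, l):
    """Evaluate the GPR-PES of Eq. (2) at a new geometry x."""
    r = np.linalg.norm(X - x, axis=-1)
    return matern52(r, l) @ w + E_prior

# 1D toy PES E(x) = x^2 sampled at five points
X = np.linspace(0.0, 4.0, 5)[:, None]
E = X[:, 0] ** 2
w, E0 = gpr_fit(X, E, l=1.0)
print(gpr_predict(X[2], X, w, E0, l=1.0))  # close to the training value 4.0
```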

The optimization algorithm.

In our optimization algorithm we take an approach similar to the P-RFO optimizer and build up a surrogate model for the PES. In contrast to P-RFO we only use energy and gradient information; we do not need any second derivatives. We use GPR to build the surrogate model by interpolating the energy and gradient information we obtain along the optimization procedure. On the resulting GPR-PES we can cheaply obtain the Hessian and therefore also perform a P-RFO optimization. Our algorithm can be seen as a GPR-based extension to P-RFO that dispenses with the need for Hessian evaluations. The result is used to predict a SP on the real PES. We perform all optimizations in Cartesian coordinates.

Convergence criteria.

We call the vector that points from the last estimate of the TS to the current estimate of the TS the step vector \vec{s}, while the gradient at the current estimate of the TS is referred to as \vec{g}. DL-FIND uses a combination of convergence criteria, given a single tolerance value \delta. The highest absolute value of all entries and the Euclidean norm of both \vec{g} and \vec{s} have to drop below certain thresholds:

\max\limits_{i}(g_{i}) < \delta_{\text{max}(g)} \coloneqq \delta (7)
\frac{|\vec{g}|}{d} < \delta_{|g|} \coloneqq \frac{2}{3}\delta (8)
\max\limits_{i}(s_{i}) < \delta_{\text{max}(s)} \coloneqq 4\delta (9)
\frac{|\vec{s}|}{d} < \delta_{|s|} \coloneqq \frac{8}{3}\delta (10)

In these equations d stands for the number of dimensions in the system. If all these criteria are fulfilled, the TS is considered to be converged. These are the same criteria that are used for other TS optimizers in DL-FIND.
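The four criteria can be checked in one function (a sketch; the function name and example values are ours):

```python
import numpy as np

def converged(g, s, d, delta=3e-4):
    """Check the four DL-FIND criteria of Eqs. (7)-(10) at once."""
    return bool(np.max(np.abs(g)) < delta                      # Eq. (7)
                and np.linalg.norm(g) / d < 2.0 / 3.0 * delta  # Eq. (8)
                and np.max(np.abs(s)) < 4.0 * delta            # Eq. (9)
                and np.linalg.norm(s) / d < 8.0 / 3.0 * delta) # Eq. (10)

# A small gradient and step on a 9-dimensional system pass all criteria
print(converged(np.full(9, 1e-5), np.full(9, 1e-5), d=9))  # True
```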

Parameters.

We chose a length scale of l=20 during all optimization runs. The noise parameters were chosen as \sigma_{\text{e}}=\sigma_{\text{g}}=10^{-7}. Besides accounting for numerical errors in the electronic structure calculations, they also function as regularization parameters to guarantee convergence of the solution in Equation (3). We chose the noise parameters as small as possible without compromising the stability of the system. We generally found small values for the noise parameters to work better for almost all systems. As the prior mean, E_{\text{prior}}(\vec{x}), of Equation (2), we set the mean value of all training points.

E_{\text{prior}}(\vec{x})=\frac{1}{N}\sum_{i=1}^{N}E_{i} (11)

The parameter s_{\text{max}}, the maximum step size, must be specified by the user.

Converging the transition mode.

We start the optimization at an initial guess \vec{x}^{\text{trans}}_{0} for the SP. At this point we obtain an approximate Hessian from a GPR-PES constructed from gradient calculations. This is done in such a way that we try to converge the eigenvector corresponding to the smallest eigenvalue of the Hessian, the transition mode \vec{v}. An estimate of that transition mode at the point \vec{x}^{\text{trans}}_{0} is found according to the following procedure, which is the equivalent of dimer rotations in the dimer method.

  1.

    Calculate the energy and the gradient at the point \vec{x}^{\text{trans}}_{0} and also at the point

    \vec{x}^{\text{rot}}_{1}=\vec{x}^{\text{trans}}_{0}+\frac{\Delta}{|\vec{v}_{0}|}\vec{v}_{0} (12)

    with \vec{v}_{0} arbitrarily chosen. We generally chose \vec{v}_{0}=(1\ 1\ ...\ 1)^{T}. The results are included as training points in a new GPR-PES. We set \Delta=0.1 for our optimizer. Let i=1.

  2.

    Evaluate the Hessian H_{i}(\vec{x}^{\text{trans}}_{0}) of the resulting GPR-PES at the point \vec{x}^{\text{trans}}_{0}.

  3.

    Compute the smallest eigenvalue of H_{i} and the corresponding eigenvector \vec{v}_{i}.

  4.

    As soon as

    \left\lvert\frac{(\vec{v}_{i}\cdot\vec{v}_{i-1})}{|\vec{v}_{i}|\cdot|\vec{v}_{i-1}|}\right\rvert>1-\delta_{\text{rtol}} (13)

    we assume that the transition mode is converged, this procedure is terminated, and we move \vec{x}^{\text{trans}}_{0} on the PES as we describe in the next section.

  5.

    If the transition mode is not converged, calculate the energy and gradient at the point

    \vec{x}^{\text{rot}}_{i+1}=\vec{x}^{\text{trans}}_{0}+\frac{\Delta}{|\vec{v}_{i}|}\vec{v}_{i} (14)

    and include the results to build a new GPR-PES. Increment i by one and go back to step 2.

The procedure to converge the transition mode is not repeated after each movement of \vec{x}^{\text{trans}} on the PES but only after 50 such moves. In these cases we start at the second step of the described procedure with the evaluation of the Hessian at the respective point. The initial guess for the transition mode after 50 steps is the vector \vec{v}_{i} of the last optimization of the transition mode. We use \delta_{\text{rtol}}=10^{-4} (an angle smaller than 0.81°) at the very first point \vec{x}_{0} and \delta_{\text{rtol}}=10^{-3} (an angle smaller than 2.56°) at every subsequent point at which we want to converge the transition mode.
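Steps 2-5 of the procedure above can be sketched as follows, with a user-supplied Hessian function standing in for the Hessian of the current GPR-PES (in the real algorithm each iteration also adds a training point displaced by Δ along the current mode estimate, so the Hessian changes between iterations):

```python
import numpy as np

def converge_transition_mode(hessian_at, x0, delta_rtol=1e-4, max_iter=50):
    """Iterate steps 2-5: follow the smallest-eigenvalue mode until Eq. (13) holds."""
    d = x0.size
    v_prev = np.ones(d) / np.sqrt(d)          # initial guess (1 1 ... 1)^T
    for _ in range(max_iter):
        H = hessian_at(x0)                    # Hessian of the GPR-PES
        evals, evecs = np.linalg.eigh(H)
        v = evecs[:, np.argmin(evals)]        # mode of the smallest eigenvalue
        if abs(v @ v_prev) > 1.0 - delta_rtol:
            return v                          # transition mode converged, Eq. (13)
        v_prev = v
    return v_prev

# Toy PES with one negative-curvature direction along the first axis
H_fixed = np.diag([-0.5, 1.0, 2.0])
v = converge_transition_mode(lambda x: H_fixed, np.zeros(3))
print(np.abs(v))  # ~ [1. 0. 0.]
```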

Performing steps on the PES.

In agreement with the minimum-mode following methods we assume that we have enough Hessian information available to move on the PES to the SP as soon as the transition mode is found. We call the points that result from these movements on the PES \vec{x}^{\text{trans}}_{i}, with i=0,1,.... The points \vec{x}^{\text{trans}}_{i} correspond to the midpoints in the dimer method. We use a user-defined parameter s_{\text{max}} to limit the step size along the optimization; the step size can never be larger than s_{\text{max}}. The steps on the PES, starting at the point \vec{x}^{\text{trans}}_{j}, are performed according to the following procedure.

  1.

    Find the SP on the GPR-PES using a P-RFO optimizer. This optimization on the GPR-PES is stopped if one of the following criteria is fulfilled.

    • The step size of this optimization is below \delta_{\text{max}(s)}/50.

    • We found a negative eigenvalue (smaller than -10^{-10}) and the highest absolute value of all entries of the gradient on the GPR-PES drops below \delta_{\text{max}(g)}/100.

    • The Euclidean distance between the currently estimated TS and the starting point of the P-RFO optimization is larger than 2s_{\text{max}}.

    If none of these are fulfilled after 100 P-RFO steps, we use a simple dimer translation. The dimer translation is done until one of the criteria above is fulfilled or the Euclidean distance between the currently estimated TS and the starting point is larger than s_{\text{max}}. The P-RFO method converged in less than 100 steps in all of the presented test cases of this paper.

  2.

    Overshoot the estimated step according to the overshooting procedures described in the next section, resulting in \vec{x}^{\text{trans}}_{j+1}.

  3.

    Calculate the energy and gradient at \vec{x}^{\text{trans}}_{j+1} and build a new GPR-PES.

The P-RFO implementation used for this procedure is the same as in DL-FIND (with Hessians of the GPR-PES re-calculated in each step). P-RFO tries to find the modes corresponding to overall translation and rotation of the system and ignores them. Numerically, it is not always clear which modes these are. Therefore, we found it to be very beneficial to project the translational modes out of the inferred Hessian. This yields translational eigenvalues that are numerically zero. Otherwise, the TS search via P-RFO tends to translate the whole molecule, leading to an unnecessarily large step size.
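The projection of the translational modes can be sketched as follows (a minimal illustration; rotational modes, which P-RFO handles itself, are not projected here):

```python
import numpy as np

def project_out_translations(H, n_atoms):
    """Project the three overall-translation modes out of a 3n x 3n Hessian.

    Afterwards the translational eigenvalues are numerically zero, so a
    P-RFO step cannot pick an overall translation as its step direction.
    """
    d = 3 * n_atoms
    P = np.eye(d)
    for axis in range(3):
        t = np.zeros(d)
        t[axis::3] = 1.0 / np.sqrt(n_atoms)   # normalized translation mode
        P -= np.outer(t, t)                   # remove that direction
    return P @ H @ P

# Random symmetric "Hessian" for 4 atoms (12 coordinates)
rng = np.random.default_rng(0)
A = rng.standard_normal((12, 12))
H = project_out_translations(A + A.T, 4)
t_x = np.zeros(12); t_x[0::3] = 1.0
print(np.linalg.norm(H @ t_x))  # ~ 0: translation lies in the null space
```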

Overshooting.

As we have done in the optimization algorithm for minimum search,20 we try to shift the area in which we predict the SP to an interpolation regime rather than an extrapolation regime. This is because machine learning methods perform poorly in extrapolation. The overshooting is done depending on the angle between the step vectors along the optimization: let \vec{s}_{N}^{\,\prime} be the vector from the point \vec{x}^{\text{trans}}_{N-1} to the next estimate for the TS according to the first step in the procedure described in the previous section. Let \vec{s}_{N-1} be the vector pointing from \vec{x}^{\text{trans}}_{N-2} to \vec{x}^{\text{trans}}_{N-1}. If \vec{x}^{\text{trans}}_{N-1}=\vec{x}_{0}, the point \vec{x}^{\text{trans}}_{N} is simply calculated according to the procedure described in the previous section with no overshooting. We calculate an angle

\alpha_{N}=\frac{(\vec{s}_{N-1}\cdot\vec{s}^{\,\prime}_{N})}{|\vec{s}_{N-1}||\vec{s}^{\,\prime}_{N}|} (15)

and introduce a scaling factor

\lambda(\alpha_{N})=1+(\lambda_{\text{max}}-1)\left(\frac{\alpha_{N}-0.9}{1-0.9}\right)^{4} (16)

that scales

\vec{s}_{N}=\lambda(\alpha_{N})\vec{s}^{\,\prime}_{N} (17)

as soon as \alpha_{N}>0.9. The scaling limit \lambda_{\text{max}} is chosen to be 5. But it is reduced if the algorithm is close to convergence, and it is increased if \alpha_{N}>0.9 for two consecutive steps, i.e. we overshoot more than once in a row. We refer to our previous work for a more detailed description of the calculation of \lambda_{\text{max}}.20 We also use the separate dimension overshooting procedure from that paper: if the value of a coordinate along the steps \vec{x}^{\text{trans}}_{i} changed monotonically for the last 20 steps, a one-dimensional GP is used to represent the development of this coordinate along the optimization. It is optimized independently from the other coordinates. To account for coupling between the coordinates in succeeding optimization steps, this procedure is suspended for 20 steps after it was performed. After these two overshooting procedures the step is scaled down to s_{\text{max}}. The chosen overshooting procedures might seem too aggressive at first glance. But the accuracy of the GPR-PES is largely improved if we overshoot the real TS. This is a crucial difference to conventional optimizers: a step that is too large or in the wrong direction can still be used to improve the next estimate of the TS.
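Eqs. (15)-(17) amount to the following scaling rule (a sketch with a fixed λ_max; the adaptive adjustment of λ_max near convergence and the separate dimension overshooting are omitted):

```python
import numpy as np

def overshoot(s_prev, s_next, lambda_max=5.0):
    """Scale the proposed step s_next according to Eqs. (15)-(17).

    If successive steps point in nearly the same direction
    (alpha > 0.9), the step is stretched by up to lambda_max.
    """
    alpha = s_prev @ s_next / (np.linalg.norm(s_prev) * np.linalg.norm(s_next))
    if alpha <= 0.9:
        return s_next                          # no overshooting, Eq. (15)
    lam = 1.0 + (lambda_max - 1.0) * ((alpha - 0.9) / (1.0 - 0.9)) ** 4
    return lam * s_next                        # Eqs. (16) and (17)

# Parallel consecutive steps (alpha = 1) get the maximal stretch of 5
s = np.array([0.1, 0.0])
print(overshoot(s, s))  # [0.5 0. ]
```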

Multi-level GPR.

Just like in our previous work20 we include a multi-level scheme to reduce the computational effort of the GPR. A more detailed explanation of the multi-level approach can also be found there. The most demanding computational step in the GPR scheme is solving Equation (3). The computational effort to solve this system can be reduced by limiting the size of the covariance matrix. Our multi-level scheme achieves this by solving several smaller systems instead of one single large system. This eliminates the formal scaling with the number of steps in the optimization history. To achieve this, we take the oldest m training points in the GP to build a separate GP called GP_{1}. This is done as soon as the number of training points reaches N_{\text{max}}. The predicted GPR-PES from GP_{1} is used as E_{\text{prior}}(\vec{x}) in Equation (2) for the GP built with the remaining N_{\text{max}}-m training points that we call GP_{0}. The new training points are added to GP_{0}. When GP_{0} eventually has more than N_{\text{max}} training points we rename GP_{1} to GP_{2} and use the m oldest training points in GP_{0} to build a new GP_{1}. We always take GP_{i+1} as the prior to GP_{i}. The GP_{q} with the highest index q just uses the mean of all energy values at the contained training points as its prior. The number of levels increases along the optimization, but since the number of points in all GP_{i} is kept below a certain constant, the formal scaling of our GPR scheme is \mathcal{O}(d^{3}). Splitting the points which were used to converge the smallest eigenvalue of the Hessian (according to the procedure for converging the transition mode described above) leads to a loss in accuracy of the second-order information of the GPR-PES and, consequently, to inaccurate predictions of the step direction by the underlying P-RFO.
Therefore, we increase N_{\text{max}} and m by one if those points would be split into different levels and try the splitting again after including the next training point. When we have successfully performed the splitting, the original values of N_{\text{max}} and m are restored. We set the values N_{\text{max}}=60 and m=10 for all tests performed in this paper. We also want to guarantee that we always have information about the second derivative in GP_{0}. Therefore, we start our Hessian approximation via Procedure 1 after N_{\text{max}}-m=50 steps on the PES. As a result, the points that belong to the last Hessian approximation are always included in GP_{0}. The method described so far is referred to as GPRTS in the following.
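The level bookkeeping can be sketched as follows (points only; building the actual GPs, the prior chaining, and the rule that keeps Hessian-approximation points together are omitted):

```python
def multilevel_split(levels, new_point, n_max=60, m=10):
    """Bookkeeping sketch of the multi-level scheme.

    levels[0] holds the training points of the active GP_0, levels[1]
    those of GP_1, and so on. When GP_0 reaches n_max points, its m
    oldest points are pushed down into a new prior level GP_1 (the old
    GP_1 becomes GP_2, etc.), so GP_0 never grows beyond n_max.
    """
    levels[0].append(new_point)
    if len(levels[0]) >= n_max:
        oldest, rest = levels[0][:m], levels[0][m:]
        levels.insert(1, oldest)   # new GP_1; old GP_1 becomes GP_2, ...
        levels[0] = rest
    return levels

levels = [[]]
for i in range(130):               # simulate adding 130 training points
    levels = multilevel_split(levels, i)
print([len(lv) for lv in levels])  # GP_0 stays small, priors hold the rest
```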

Finding a starting point via IDPP.

If one intends to find a TS that connects two known minima, the algorithm tries to find a good guess for the starting point of the TS search automatically. To that end we optimize a nudged elastic band (NEB)21, 22 in the image-dependent pair potential (IDPP).25 The construction of this potential does not need any electronic structure calculations, but provides a good first guess for a NEB path based on geometrical considerations. Given M images \vec{x}^{\text{IDPP}}_{i},\ i=1,...,M of the optimized NEB in the IDPP, our algorithm calculates the energy and gradient of the real PES at image \vec{x}^{\text{IDPP}}_{j} with j=M/2 for even and j=(M-1)/2 for odd M. Only from this point on do we perform energy evaluations of the real PES. Then we calculate the scalar product between the gradient \vec{\gamma}_{j} at point \vec{x}^{\text{IDPP}}_{j} and the vector to the next image on the NEB path.

\beta=\vec{\gamma}_{j}\cdot(\vec{x}^{\text{IDPP}}_{j+1}-\vec{x}^{\text{IDPP}}_{j}) (18)

If \beta>0, we calculate the energy and the gradient at \vec{x}^{\text{IDPP}}_{j+1}, otherwise at \vec{x}^{\text{IDPP}}_{j-1}, in order to aim at a maximum in the energy. This is repeated until we would change the direction along the NEB and go back to a point \vec{x}^{\text{IDPP}}_{\text{best}} at which we already know the energy and the gradient. The point \vec{x}^{\text{IDPP}}_{\text{best}} found in this procedure is assumed to be the best guess for the TS on the NEB and is the starting point for our optimizer. All the energy and gradient information acquired in this procedure is used to build the GPR-PES. The optimization technique including the usage of the NEB in the IDPP potential will be abbreviated as GPRPP.
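The walk along the IDPP-optimized images can be sketched as follows (a toy 1D example; boundary handling at the ends of the path is omitted for brevity):

```python
import numpy as np

def walk_to_best_image(images, gradient_at):
    """Walk along NEB images toward an energy maximum, Eq. (18) sketch.

    images: (M, d) optimized IDPP images; gradient_at(x) returns the
    gradient of the real PES. Returns the index of the best TS guess.
    """
    M = len(images)
    j = M // 2 if M % 2 == 0 else (M - 1) // 2   # start at the middle image
    visited = {j}
    while True:
        beta = gradient_at(images[j]) @ (images[j + 1] - images[j])  # Eq. (18)
        j_new = j + 1 if beta > 0 else j - 1      # step uphill in energy
        if j_new in visited:
            return j                              # direction reversed: done
        visited.add(j_new)
        j = j_new

# Toy 1D path; the energy E = -(x - 0.6)^2 has its maximum at x = 0.6
images = np.linspace(0.0, 1.0, 11)[:, None]
grad = lambda x: np.array([-2.0 * (x[0] - 0.6)])
print(walk_to_best_image(images, grad))  # index of the image closest to the maximum
```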

3 Results

To benchmark our TS search we chose 27 test systems: first of all, the 25 test systems suggested by Baker.31 The starting points of the optimizations were chosen following Ref. 31, close to a TS at the Hartree–Fock level. However, we use the semi-empirical AM132 method for the electronic structure calculations. We plot all the TSs we found in the Baker test set in Fig. 1. These TSs were found by GPRTS except for the TSs of systems 4, 10, and 15, for which we used the dimer method. These exceptions were made since the GPRTS method finds a different transition than implied by the test set for systems 4 and 10. For system 15 GPRTS does not converge.

Furthermore, we chose two test systems on DFT level using the BP86 functional33, 34 and the 6-31G* basis set.35 Firstly, an intramolecular [1,5] H shift of 1,3(Z)-hexadiene to 2(E),4(Z)-hexadiene as investigated in Ref. 36. Secondly, an asymmetric allylation of a simple isoxazolinone as investigated in Ref. 37, see Fig. 2. The starting points for the TS searches on those two systems were chosen by chemical intuition to approximate the real TS.

In the next section we compare the new GPR-based TS search (GPRTS) against the dimer method and the P-RFO method as they are implemented in DL-FIND. This is our first benchmark. In the following section we evaluate our GPRPP approach as a second benchmark.

Every time the energy is evaluated we also evaluate the gradient, a process which will be referred to merely as energy evaluation in the following for simplicity.

Refer to caption
Figure 1: All TSs for the Baker test set.
Refer to caption
Figure 2: TSs found by GPRTS for system 26, the [1,5] H shift of 1,3(Z)-hexadiene to 2(E),4(Z)-hexadiene, (left) and system 27, isoxazolinone, (right).
Table 1: A comparison of GPRTS to dimer and P-RFO in our test systems, sorted by the number of dimensions d. The number of required energy calculations until convergence is shown, as well as the energy difference (Hartree) and RMSD values (Å) of the resulting structure compared to the structure found by GPRTS. Convergence problems of the AM1 method are marked with err, non-convergence after 1000 energy evaluations is marked nc for not converged.
         Energy evaluations            Energy differences               RMSD values
 d    GPRTS  dimer  P-RFO      dimer          P-RFO           dimer          P-RFO       ID
 57     87    218    193    1.00×10^{-7}   -2.30×10^{-6}   1.21×10^{-2}   5.01×10^{-3}   27
 48     79    155    372   -5.00×10^{-7}   -4.11×10^{-2}   3.54×10^{-3}   9.84×10^{-1}   26
 48     47    171     nc    9.97×10^{-7}        --          2.24×10^{-3}       --          9
 42     34     83    130    1.79×10^{-7}   -1.86×10^{-6}   3.83×10^{-3}   3.02×10^{-3}   17
 33    101    288    232    3.96×10^{-8}    3.33×10^{-5}   6.30×10^{-3}   7.12×10^{-2}   18
 30     47    106    175   -5.70×10^{-3}   -5.70×10^{-3}   1.23×10^{-1}   1.23×10^{-1}    6
 30     41    100    507    1.21×10^{-7}   -3.97×10^{-2}   1.81×10^{-3}   5.80×10^{-1}    7
 30     28     63     83    2.49×10^{-8}   -3.21×10^{-8}   1.69×10^{-3}   2.87×10^{-4}    8
 30     49    err     71        --         -3.05×10^{-3}       --         7.33×10^{-1}   11
 24     25     59     85   -8.90×10^{-9}   -1.60×10^{-7}   2.00×10^{-3}   6.64×10^{-4}    5
 24     25    224    151   -1.45×10^{-3}   -1.16×10^{-1}   1.03×10^{-1}   2.29×10^{-1}   10
 24     22     53     68    9.42×10^{-7}    1.50×10^{-9}   2.86×10^{-3}   1.81×10^{-4}   12
 24     31     88     73    5.80×10^{-7}   -9.05×10^{-9}   2.56×10^{-3}   8.39×10^{-5}   13
 24    100    err     55        --         -2.81×10^{-7}       --         4.40×10^{-3}   21
 21     34     76     80    4.18×10^{-8}    1.07×10^{-8}   4.21×10^{-1}   2.36×10^{-4}   14
 21     31     67     79    4.33×10^{-8}    9.00×10^{-10}  2.05×10^{-3}   5.14×10^{-5}   16
 21     24    211    142    5.15×10^{-8}   -1.06×10^{-7}   5.24×10^{-3}   3.04×10^{-1}   20
 21     18     44    174    7.39×10^{-8}   -1.55×10^{-3}   1.99×10^{-3}   2.71×10^{-1}   22
 15     36    152     66    7.97×10^{-2}    1.80×10^{-10}  2.25×10^{-1}   3.18×10^{-5}    4
 15     22     69     44   -1.66×10^{-8}    3.35×10^{-6}   2.05×10^{-3}   7.81×10^{-3}   19
 15     16     38     56    2.30×10^{-7}    1.06×10^{-8}   2.53×10^{-3}   2.20×10^{-4}   23
 15     19     58    114   -5.80×10^{-9}   -1.75×10^{-8}   2.42×10^{-3}   3.92×10^{-4}   24
 15     23     45     59    5.00×10^{-8}    1.51×10^{-7}   2.38×10^{-3}   1.11×10^{-3}   25
 12     14     34     35    1.00×10^{-9}    6.40×10^{-9}   2.91×10^{-3}   6.77×10^{-5}    2
 12     16     31    127    3.41×10^{-9}   -9.43×10^{-2}   2.61×10^{-3}   1.15            3
 12    err     59    err        --              --             --             --         15
  9     18     39     30    1.60×10^{-9}   -1.80×10^{-9}   3.03×10^{-3}   4.03×10^{-5}    1

Benchmarking the optimization.

We optimized the TSs of our test set using the new GPRTS optimizer and also the established dimer method and the P-RFO method in DL-FIND. We compare the number of energy evaluations that all three methods need until convergence in table 1. Note that analytic Hessians are available for systems 26 and 27. P-RFO uses 8 (4) analytic Hessians for system 26 (27). Those are not counted as energy evaluations. In the other systems the Hessians are calculated using a central difference approximation via gradients. The required 2d gradients are counted as energy evaluations in the table. Note that every energy evaluation in our tables always refers to evaluating the energy and the gradient of the PES. We chose a maximum step size of s_{\text{max}}=0.3 and a tolerance \delta=3\times 10^{-4} for all optimizations.

The P-RFO method calculates the Hessian at the first point and then only every 50 steps. All other Hessians are obtained via the update mechanism of Bofill. 38
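
Bofill's update combines the SR1 (Murtagh–Sargent) and PSB updates, weighted by a factor φ built from the secant residual r = Δg − HΔx. A minimal sketch (not the DL-FIND implementation):

```python
import numpy as np

def bofill_update(H, dx, dg):
    """Bofill's Hessian update: a phi-weighted mix of the SR1
    (Murtagh-Sargent) and PSB updates. dx is the geometry step,
    dg the corresponding change of the gradient."""
    r = dg - H @ dx                    # secant residual
    rx = r @ dx
    xx = dx @ dx
    phi = rx * rx / ((r @ r) * xx)     # 0 <= phi <= 1
    dH_sr1 = np.outer(r, r) / rx
    dH_psb = (np.outer(r, dx) + np.outer(dx, r)) / xx \
             - rx * np.outer(dx, dx) / (xx * xx)
    return H + phi * dH_sr1 + (1.0 - phi) * dH_psb

# Both component updates satisfy the secant condition H_new @ dx = dg,
# hence so does their convex combination.
H0 = np.eye(3)
dx = np.array([0.1, -0.2, 0.05])
dg = np.array([0.3, 0.1, -0.1])
Hn = bofill_update(H0, dx, dg)
```

The mixing factor φ approaches 1 (pure SR1) when the residual is parallel to the step and 0 (pure PSB) when it is orthogonal.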

We see that the GPRTS method generally requires fewer energy evaluations than the other methods and shows the fewest convergence problems. In table 1 we also show the energy differences (the energy found by the respective method minus the energy found by GPRTS) and the root mean square deviations (RMSD) of the located TSs relative to those found by GPRTS. For the systems with significant deviations in the located TSs we observe the following differences (compare Fig. 1):

  • System 3: P-RFO finds a different TS in which H2 is completely abstracted. Its energy is 9.34×10^-2 Hartree lower than that of the TS found by GPRTS.

  • System 4: GPRTS and P-RFO yield a smaller distance between the H atom and the remaining O atom, which corresponds to a different transition than intended: a rotation of the OH group.

  • System 6: GPRTS finds a different H–C–C angle at the carbene end.

  • System 7: P-RFO finds an opening of the ring structure which remains closed in the other cases.

  • System 10: GPRTS finds a TS in which N2 is abstracted; the dimer method finds the depicted opening of the ring. The ring opening is also found by GPRTS if a smaller step size is used. P-RFO finds a closed ring structure as the TS.

  • System 11: P-RFO finds a more planar structure.

  • System 14: The dimer method finds a mirror image of the TS found by the other methods, thus the high RMSD.

  • System 20: P-RFO finds a different orientation of NH3, rotated relative to HCO, and also a slightly different distance.

  • System 22: P-RFO finds a different angle of the attached OH group; the resulting molecule is not planar. The structures found by GPRTS and the dimer method are planar.

  • System 26: P-RFO does not find the H transfer but an opening of the ring.
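
The RMSD values quoted in the tables compare two geometries atom by atom. A minimal sketch of a plain coordinate RMSD (without the rigid-body superposition a production comparison may perform first):

```python
import numpy as np

def rmsd(a, b):
    """Root mean square deviation between two geometries given as
    (n_atoms, 3) coordinate arrays: the square root of the mean
    squared per-atom displacement. No alignment is performed here."""
    a = np.asarray(a, dtype=float).reshape(-1, 3)
    b = np.asarray(b, dtype=float).reshape(-1, 3)
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

# Two-atom example: one atom displaced by (3, 4, 0), i.e. distance 5;
# the mean squared displacement is (25 + 0)/2 = 12.5.
a = np.zeros((2, 3))
b = np.zeros((2, 3))
b[0] = [3.0, 4.0, 0.0]
value = rmsd(a, b)
```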

Figure 3: The Euclidean norm of the gradient with respect to the number of steps on the PES taken by GPRTS, the dimer method, and P-RFO in the four biggest systems of the Baker test set. We explicitly excluded the steps needed to optimize the minimum mode/calculate the Hessian. We truncated the plot for P-RFO in system 9 since it does not converge.
Figure 4: The Euclidean norm of the gradient with respect to the number of steps on the PES taken by GPRTS, the dimer method, and P-RFO in systems 26 and 27. We explicitly excluded the steps needed to optimize the minimum mode/calculate the Hessian. We truncated the plot for P-RFO in system 26 since it does not converge to the same SP as the other methods.

Furthermore, we compare the evolution of the residual error of GPRTS, P-RFO, and the dimer method during the TS search for the four largest systems in the Baker test set in Fig. 3 and for systems 26 and 27 in Fig. 4. In both plots we only count the steps taken on the PES along the optimization runs. The energy evaluations performed to converge the transition mode, to rotate the dimer, and to calculate the Hessian in P-RFO are excluded. We see a significant speedup with the new GPRTS optimizer. In addition, the new method also needs fewer gradients than the dimer method to approximate the Hessian appropriately. In our optimizer we often see jumps of the gradient norm at around 50 steps. This is where the multi-level approach creates the first separate GP. Therefore, the code includes the possibility to increase the number of training points in the last-level GP_0 to obtain better performance in high-precision calculations. We did not make use of this possibility in any of the test systems presented in this paper.

Benchmarking GPRPP.

As described in the previous section on finding a starting point via the IDPP, we use an approximated NEB path between two known minima to find a good starting point for the TS search. To prepare the input for our test systems we proceeded as follows:

  • We started from the TSs shown in Fig. 1 and Fig. 2.

  • We performed IRC path searches to find the two minima connected by the TS.

  • The geometries of these minima are used as starting points for GPRPP.

  • Additionally, GPRTS, dimer, and P-RFO searches are started from x_best^IDPP. In GPRPP, the points on the NEB path are included in the GPR, while in GPRTS they are excluded.
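
The procedure above reduces to taking the highest-energy image of an approximate path between the two minima as the starting point for the TS search. A minimal sketch, with plain linear interpolation standing in for the IDPP path and a user-supplied energy function (both assumptions for illustration):

```python
import numpy as np

def ts_start_from_path(x_react, x_prod, energy, n_images=10):
    """Return the highest-energy image on an interpolated path between
    reactant and product geometries as the TS starting point. Linear
    interpolation stands in for the IDPP path of the paper."""
    ts = np.linspace(0.0, 1.0, n_images)
    path = [x_react + t * (x_prod - x_react) for t in ts]
    energies = [energy(x) for x in path]
    best = int(np.argmax(energies))
    return path[best], energies[best]

# Toy 1D example: E(x) = (x^2 - 1)^2 has minima at x = -1 and x = +1
# and a barrier at x = 0, which the path maximum locates.
e = lambda x: float((x[0] ** 2 - 1.0) ** 2)
x_start, e_start = ts_start_from_path(np.array([-1.0]), np.array([1.0]),
                                      e, n_images=11)
```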

In the benchmarks reported in table 2 the additional points needed to find x_best^IDPP are included in the number of steps of the GPRPP optimizer, but not in the numbers for the other methods. Our approximated NEB path consists of 10 images for all test cases. The number of additional energy evaluations that GPRPP needs to find x_best^IDPP is 2 or 3 for all test systems. We set the same maximum step size and tolerance value as in the previous section. We compare the number of energy evaluations all methods need until convergence in table 2. We also show the energy differences (the energy from the respective method minus the energy from GPRPP) and RMSD values compared to GPRPP. Note that analytic Hessians are available for systems 26 and 27. P-RFO uses 2 analytic Hessians for system 26 and 5 for system 27; those are not counted as energy evaluations. In the other systems the Hessians are calculated via finite differences and the required 2d gradients are counted as energy evaluations in the table. We observe that GPRPP and GPRTS yield consistently good results. Comparing GPRPP and GPRTS we see no clear tendency whether including the points that lead to x_best^IDPP improves the speed of convergence. On the other hand, one system only converges when these points are included. We conclude that the stability may be improved by using the GPRPP method with the additional training points compared to pure GPRTS. This will strongly depend on the choice of the starting point for GPRTS.

Table 2: A comparison of GPRPP with GPRTS, the dimer method, and P-RFO in our test systems, sorted by the number of dimensions d. The number of required energy calculations until convergence is shown, as well as the energy differences (Hartree) and RMSD values (Å) of the resulting structures compared to the structure found by GPRPP. Convergence problems of the AM1 method are marked with err; non-convergence after 1000 energy evaluations is marked with nc (not converged).
d | Energy evaluations: GPRPP GPRTS dimer P-RFO | Energy differences: GPRTS dimer P-RFO | RMSD values: GPRTS dimer P-RFO | ID (n/a: no converged structure to compare)
57 | 87 111 339 210 | -4.00×10^-7 1.40×10^-6 -3.30×10^-6 | 9.06×10^-3 1.25×10^-2 7.02×10^-1 | 27
48 | 54 52 119 59 | 1.40×10^-6 9.00×10^-7 -2.00×10^-7 | 1.34×10^-2 1.27×10^-2 4.75×10^-3 | 26
48 | 40 52 nc nc | -1.51×10^-7 n/a n/a | 6.21×10^-4 n/a n/a | 9
42 | 88 115 err 225 | 2.84×10^-7 n/a -4.35×10^-7 | 6.88×10^-4 n/a 1.17×10^-3 | 17
33 | 46 52 151 190 | -8.13×10^-5 -9.10×10^-5 -9.10×10^-5 | 8.70×10^-2 1.28×10^-1 1.28×10^-1 | 18
30 | 41 47 121 174 | -1.21×10^-6 -9.49×10^-7 -1.23×10^-6 | 2.54×10^-3 2.30×10^-3 2.55×10^-3 | 6
30 | 39 40 114 175 | -1.82×10^-7 -8.00×10^-10 -2.41×10^-7 | 9.85×10^-4 1.07×10^-3 1.04×10^-3 | 7
30 | 29 28 74 71 | -5.38×10^-8 4.50×10^-8 -9.23×10^-8 | 5.52×10^-4 8.28×10^-4 4.73×10^-4 | 8
30 | 69 65 nc 69 | -1.43×10^-8 n/a -2.92×10^-3 | 2.03×10^-4 n/a 6.93×10^-1 | 11
24 | 25 26 60 75 | 9.87×10^-8 1.31×10^-7 -2.04×10^-8 | 4.77×10^-4 5.93×10^-4 2.61×10^-4 | 5
24 | 34 31 err 86 | 1.16×10^-1 n/a 5.91×10^-7 | 2.29×10^-1 n/a 1.34×10^-3 | 10
24 | 93 82 err 152 | 6.00×10^-10 n/a 1.00×10^-10 | 4.64×10^-5 n/a 3.56×10^-5 | 12
24 | 40 39 122 92 | -6.73×10^-9 7.23×10^-8 1.48×10^-7 | 1.18×10^-4 3.67×10^-4 8.80×10^-4 | 13
24 | 69 129 err 55 | 4.96×10^-9 n/a -2.77×10^-7 | 2.60×10^-4 n/a 4.49×10^-3 | 21
21 | 31 31 68 73 | 9.04×10^-8 2.84×10^-7 -4.93×10^-9 | 4.21×10^-1 4.22×10^-1 4.22×10^-1 | 14
21 | 30 31 75 68 | 9.16×10^-7 5.02×10^-8 3.80×10^-9 | 2.13×10^-3 6.52×10^-4 3.48×10^-4 | 16
21 | 51 205 nc nc | 2.45×10^-2 n/a n/a | 2.99 n/a n/a | 20
21 | 35 32 90 76 | -1.00×10^-10 3.13×10^-8 1.38×10^-8 | 1.05×10^-4 2.82×10^-4 1.50×10^-4 | 22
15 | 87 55 nc 142 | 1.43×10^-3 n/a 1.41×10^-3 | 5.76×10^-1 n/a 5.29×10^-1 | 4
15 | 23 22 68 45 | 0.00 3.07×10^-8 4.40×10^-7 | 1.31×10^-4 2.99×10^-4 3.32×10^-3 | 19
15 | 35 95 err 377 | -1.01×10^-1 n/a -1.01×10^-1 | 9.84×10^-1 n/a 7.44×10^-1 | 23
15 | 23 25 60 63 | -8.00×10^-10 5.49×10^-8 7.80×10^-9 | 3.12×10^-5 5.51×10^-4 1.48×10^-4 | 24
15 | 20 21 54 56 | 3.37×10^-8 1.16×10^-8 4.10×10^-9 | 3.61×10^-4 1.07×10^-4 2.54×10^-4 | 25
12 | 16 17 37 35 | -1.00×10^-10 -1.00×10^-10 5.50×10^-9 | 3.54×10^-5 4.31×10^-5 8.26×10^-5 | 2
12 | 29 17 29 66 | 9.50×10^-10 2.05×10^-8 3.35×10^-9 | 9.33×10^-5 1.31×10^-4 1.27×10^-4 | 3
12 | 28 err err err | n/a n/a n/a | n/a n/a n/a | 15
9 | 20 19 40 37 | 8.00×10^-10 -1.00×10^-10 0.00 | 4.47×10^-5 2.45×10^-5 2.03×10^-6 | 1

Looking at the energy differences and RMSD values, we see a few deviations between the results of the different methods that we would like to discuss (compare Fig. 1):

  • System 4: The abstracted H atom has a different angle and distance in every optimization.

  • System 10: GPRPP and P-RFO find a closed, symmetric ring structure. GPRTS finds a ring similar to the reference structure depicted in Fig. 1, only with a larger distance between the two nitrogen atoms and the carbons. The dimer method runs into a problem with the AM1 SCF convergence.

  • System 11: P-RFO finds a more planar structure.

  • System 12: The dimer method does not converge. All other methods yield the same TS, in which the two hydrogen atoms that are abstracted are moved closer to one of the carbon atoms.

  • System 14: Dimer, GPRTS and P-RFO all find the mirror image of the TS found by GPRPP.

  • System 15: GPRPP converges to the correct SP, but the suggested starting point leads all other methods to a configuration for which the AM1 SCF cycles no longer converge.

  • System 18: Dimer and P-RFO yield slightly different angles between the atoms.

  • System 20: GPRTS yields unrealistically large distances between the two molecules. The other two methods do not converge; they also start to increase the distance between the two molecules unrealistically.

  • System 23: GPRPP finds the same TS as depicted in Fig. 1, only with a different angle of the hydrogen attached to the nitrogen. All other methods find too large, and mutually different, distances for the H2 that is abstracted.

  • System 27: P-RFO finds a different orientation of the ring structure corresponding to a different TS.

The clear superiority of the new optimizer in the test systems indicates that the GPR-based representation of the Hessian is very well suited for finding the minimum mode. The suggested starting points x_best^IDPP for the TS searches are plausible in most systems. Overall, the initial guess for the TS search obtained from the IDPP seems to be sufficient for our usage in the GPRPP method but is generally not advisable for other TS-search algorithms. In some test cases it can only be used if one manually corrects chemically unintuitive TS estimates.

Timing.

Our GPR-based optimizer has a larger computational overhead than the other optimizers. Solving the linear system of Equation (3) and running P-RFO on the GPR-PES can be highly time consuming. With the multi-level approach we limit the computational effort of solving the linear system. We also parallelized the evaluation of the Hessian matrix, needed for P-RFO, with OpenMP. We examine the timing of the two DFT-based optimization runs of our first benchmark in table 3.

Table 3: Total time in the GPRTS benchmark.
Time [seconds] (% used by optimizer)
ID GPRTS dimer P-RFO
26 461 (5%) 810 (0%) 3665 (0%)
27 419 (15%) 891 (0%) 1374 (0%)

The fraction of time spent in the GPRTS optimizer varies because the P-RFO optimization on the GPR-PES converges faster or slower, depending on the system. The time spent in the runs using the dimer method and P-RFO is almost exclusively consumed by the energy evaluations along the runs.

We also examine the timing of the runs of our second benchmark in table 4. Note that P-RFO converges to a different TS in system 27, so this number is not comparable.

Table 4: Total time in the GPRPP benchmark.
Time [seconds] (% used on optimizer)
ID GPRPP GPRTS dimer P-RFO
26 366 (5%) 353 (2%) 886 (0%) 520 (0%)
27 1041 (5%) 1473 (6%) 3608 (0%) 4129 (0%)

The presented timings were obtained on an Intel i5-4570 quad-core CPU. The DFT calculations were parallelized over all four cores, as was the evaluation of the Hessian of the GPR-PES. Overall, we see that the GPR-based optimizer has an overhead that depends on the system, but this overhead is easily compensated by the smaller number of energy calculations that have to be performed. The overall time in test systems 26 and 27 is reduced by a factor of 2 by our new optimizer. The improvement will be more significant if the optimization is done at a higher level of electronic structure theory.

4 Discussion

For our optimizations we use Cartesian coordinates. In many cases it was shown that one can improve the interpolation quality of machine learning methods by choosing coordinate systems that incorporate translational, rotational, and permutational invariance. 11, 16, 39, 10 More modern descriptors like the Coulomb matrix representation or the Bag of Bonds approach increase the number of dimensions of the system; considering the scaling of our optimizer, this might lead to performance problems. It was shown that one can use a Z-matrix representation in the GPR framework for geometry optimization. 19 This introduces new hyperparameters for the length scales, which our algorithms are not capable of handling. For traditional TS optimizers the use of internal coordinate systems like the Z-matrix is not generally advisable. 40, 31 Therefore, it is not clear whether the use of internal coordinates will improve the performance of the optimizer significantly; this has to be studied in future work. With this in mind it might also be beneficial to try a different covariance function/kernel than the one we used. By simple cross-validation (using a set of additional test points to validate a reasonable choice of the hyperparameters) on several test systems we saw that the squared exponential covariance function generally yields worse results. We did not test other covariance functions/kernels. We also made no explicit use of the statistical properties that GPR yields. One can optimize the hyperparameters via maximum likelihood estimation8 and one can make uncertainty predictions of the predicted energies via the variance. The former was initially used to explore the space of hyperparameters σ_e, σ_g, and l, but the values chosen for this paper were obtained via cross-validation on several test systems. We found that optimizing the hyperparameters via maximum likelihood estimation yields no consistent improvement of the overall performance of the optimizations. The uncertainty predictions, i.e. the variance, could be used to find a maximum step size dynamically. Nevertheless, we found no advantage of a variance-based approach over using a simple distance measure as the maximum step size.
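
The cross-validation described above can be sketched as follows. The one-dimensional Matérn(5/2) GP and the candidate length scales are illustrative assumptions, not the kernel dimensionality or hyperparameter values of the paper:

```python
import numpy as np

def gp_predict(X, y, Xs, ell, sigma_n=1e-8):
    """Minimal 1D GP regression with a Matern(5/2) kernel, used only
    to illustrate selecting the length scale ell by cross-validation."""
    def k(A, B):
        d = np.sqrt(5.0) * np.abs(A[:, None] - B[None, :]) / ell
        return (1.0 + d + d ** 2 / 3.0) * np.exp(-d)
    K = k(X, X) + sigma_n * np.eye(len(X))   # jitter for stability
    return k(Xs, X) @ np.linalg.solve(K, y)

def best_length_scale(X, y, Xv, yv, candidates):
    """Return the candidate length scale with the smallest maximum
    error on a held-out validation set (Xv, yv)."""
    errs = [np.max(np.abs(gp_predict(X, y, Xv, l) - yv))
            for l in candidates]
    return candidates[int(np.argmin(errs))]

# Toy data: fit sin(x) on a coarse grid, validate on shifted points.
X = np.linspace(0.0, 2.0 * np.pi, 10)
y = np.sin(X)
Xv = np.linspace(0.3, 5.9, 9)
chosen = best_length_scale(X, y, Xv, np.sin(Xv), [0.1, 1.0, 10.0])
```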

Since the computational effort of our GPR approach scales cubically with the number of dimensions, we mainly recommend our optimizer for smaller systems. In higher-dimensional systems one may also have to increase the number of training points in the last level of our multi-level approach, which makes the optimizer less efficient. We recommend increasing the number of training points in the last level only for smaller systems for which one wants to perform high-precision calculations.

Our algorithm is apparently more sensitive to numerical errors than the dimer method: choosing different floating-point models for the compiler can lead to different performance of the optimizer. This effect can also be observed with P-RFO, while the dimer method seems to be almost immune to it. The solution of the linear system, the evaluations of the GPR-PES, the diagonalization of the Hessian, and the P-RFO method on the GPR-PES all yield slightly different results under different floating-point models. These machine-precision errors can lead to varying performance of the optimizer. In our tests the performance varied only by a few energy evaluations (mostly fewer than 5, always fewer than 20), for better or worse. In some test cases these variations might be larger, also for P-RFO. For the benchmarks presented in this paper we set no explicit compiler flag and used the default floating-point setting of Intel's Fortran compiler, version 16.0.2, that is, "more aggressive optimizations on floating-point calculations", as described in the ifort developer guide.
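
The dependence on the compiler's floating-point model stems from the non-associativity of floating-point arithmetic: reordering a sum, as aggressive floating-point models allow, changes the result at machine precision, and such differences can accumulate in the linear solve and the diagonalization. A minimal illustration:

```python
# Floating-point addition is not associative: the two orderings of
# the same sum differ in the last bit of the result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)        # False
print(abs(left - right))    # on the order of 1e-16
```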

The overall result of our benchmark is very promising. The big advantage of the new GPRTS optimizer is twofold: firstly, GPR provides a quite precise representation of the second-order information that seems to be superior to traditional Hessian update mechanisms. Secondly, our algorithm is able to take very large steps as part of the overshooting procedures. Taking large steps and overshooting the estimated solution does not hinder convergence, since the optimizer can use that information to improve the predicted GPR-PES. Therefore, even bad estimates of the next point can improve the optimization performance.

Our GPRPP method of finding a starting point for the TS search works quite well. The estimated starting point is generally not advisable for other optimization algorithms, but it is sufficiently accurate for our GPRPP method. The additional points of the NEB on the IDPP surface also seem to improve the stability of the optimization in many cases. If chemical intuition or some other method provides a starting point very close to the real TS, the pure GPRTS method might still be faster.

5 Conclusions

We presented a new black-box optimizer to find SPs on energy surfaces based on GPR. Only a maximum step size has to be set manually by the user. It outperforms both well-established methods (dimer and P-RFO). The speedup in the presented test systems is significant and will be further increased when using higher-level theory for the electronic structure calculations. We also presented an automated way of finding a starting geometry for the TS search using the reactant and product geometries. We advise using this approach for systems in which the two minima are known and the estimate of the TS is not straightforward. In the presented test systems the method is very stable and fast.

Acknowledgments

We thank Bernard Haasdonk for stimulating discussions. This work was financially supported by the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 646717, TUNNELCHEM) and the German Research Foundation (DFG) through the Cluster of Excellence in Simulation Technology (EXC 310/2) at the University of Stuttgart.

References

  • Baker 1986 Baker, J. An algorithm for the location of transition states. J. Comput. Chem. 1986, 7, 385–395
  • Banerjee et al. 1985 Banerjee, A.; Adams, N.; Simons, J.; Shepard, R. Search for stationary points on surfaces. J. Phys. Chem. 1985, 89, 52–57
  • Plasencia Gutiérrez et al. 2017 Plasencia Gutiérrez, M.; Argáez, C.; Jónsson, H. Improved Minimum Mode Following Method for Finding First Order Saddle Points. J. Chem. Theory Comput. 2017, 13, 125–134
  • Henkelman and Jónsson 1999 Henkelman, G.; Jónsson, H. A dimer method for finding saddle points on high dimensional potential surfaces using only first derivatives. J. Chem. Phys. 1999, 111, 7010–7022
  • Lanczos 1950 Lanczos, C. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J. Res. Natl. Bur. Stand. B 1950, 45, 255–282
  • Zeng et al. 2014 Zeng, Y.; Xiao, P.; Henkelman, G. Unification of algorithms for minimum mode optimization. J. Chem. Phys. 2014, 140, 044115
  • Heyden et al. 2005 Heyden, A.; Bell, A. T.; Keil, F. J. Efficient methods for finding transition states in chemical reactions: Comparison of improved dimer method and partitioned rational function optimization method. J. Chem. Phys. 2005, 123, 224101
  • Rasmussen and Williams 2006 Rasmussen, C. E.; Williams, C. K. Gaussian processes for machine learning; MIT press Cambridge, 2006; Vol. 1
  • Alborzpour et al. 2016 Alborzpour, J. P.; Tew, D. P.; Habershon, S. Efficient and accurate evaluation of potential energy matrix elements for quantum dynamics using Gaussian process regression. J. Chem. Phys. 2016, 145, 174112
  • Bartók et al. 2013 Bartók, A. P.; Kondor, R.; Csányi, G. On representing chemical environments. Phys. Rev. B 2013, 87, 184115
  • Ramakrishnan and von Lilienfeld 2017 Ramakrishnan, R.; von Lilienfeld, O. A. Reviews in Computational Chemistry; JWS, 2017; pp 225–256
  • Mills and Popelier 2011 Mills, M. J.; Popelier, P. L. Intramolecular polarisable multipolar electrostatics from the machine learning method Kriging. Comput. Theor. Chem. 2011, 975, 42 – 51
  • Handley et al. 2009 Handley, C. M.; Hawe, G. I.; Kell, D. B.; Popelier, P. L. A. Optimal construction of a fast and accurate polarisable water potential based on multipole moments trained by machine learning. Phys. Chem. Chem. Phys. 2009, 11, 6365–6376
  • Fletcher et al. 2014 Fletcher, T. L.; Kandathil, S. M.; Popelier, P. L. A. The prediction of atomic kinetic energies from coordinates of surrounding atoms using kriging machine learning. Theor. Chem. Acc. 2014, 133, 1499
  • Ramakrishnan and von Lilienfeld 2015 Ramakrishnan, R.; von Lilienfeld, O. A. Many molecular properties from one kernel in chemical space. CHIMIA 2015, 69, 182–186
  • Hansen et al. 2015 Hansen, K.; Biegler, F.; Ramakrishnan, R.; Pronobis, W.; von Lilienfeld, O. A.; Müller, K.-R.; Tkatchenko, A. Machine Learning Predictions of Molecular Properties: Accurate Many-Body Potentials and Nonlocality in Chemical Space. J. Phys. Chem. Lett. 2015, 6, 2326–2331
  • Dral et al. 2017 Dral, P.; Owens, A.; Yurchenko, S.; Thiel, W. Structure-based sampling and self-correcting machine learning for accurate calculations of potential energy surfaces and vibrational levels. J. Chem. Phys. 2017, 146, 244108
  • Deringer et al. 2018 Deringer, V. L.; Bernstein, N.; Bartók, A. P.; Cliffe, M. J.; Kerber, R. N.; Marbella, L. E.; Grey, C. P.; Elliott, S. R.; Csányi, G. Realistic Atomistic Structure of Amorphous Silicon from Machine-Learning-Driven Molecular Dynamics. J. Phys. Chem. Lett. 2018, 9, 2879–2885
  • Schmitz and Christiansen 2018 Schmitz, G.; Christiansen, O. Gaussian process regression to accelerate geometry optimizations relying on numerical differentiation. J. Chem. Phys. 2018, 148, 241704
  • Denzel and Kästner 2018 Denzel, A.; Kästner, J. Gaussian process regression for geometry optimization. J. Chem. Phys. 2018, 148, 094114
  • Mills and Jónsson 1994 Mills, G.; Jónsson, H. Quantum and thermal effects in H2 dissociative adsorption: Evaluation of free energy barriers in multidimensional quantum systems. Phys. Rev. Lett. 1994, 72, 1124–1127
  • Henkelman et al. 2000 Henkelman, G.; Uberuaga, B. P.; Jónsson, H. A climbing image nudged elastic band method for finding saddle points and minimum energy paths. J. Chem. Phys. 2000, 113, 9901–9904
  • Koistinen et al. 2016 Koistinen, O.-P.; Maras, E.; Vehtari, A.; Jónsson, H. Minimum energy path calculations with Gaussian process regression. Nanosystems: Phys. Chem. Math. 2016, 7, 925–935
  • Koistinen et al. 2017 Koistinen, O.-P.; Dagbjartsdóttir, F. B.; Ásgeirsson, V.; Vehtari, A.; Jónsson, H. Nudged elastic band calculations accelerated with Gaussian process regression. J. Chem. Phys. 2017, 147, 152720
  • Smidstrup et al. 2014 Smidstrup, S.; Pedersen, A.; Stokbro, K.; Jónsson, H. Improved initial guess for minimum energy path calculations. J. Chem. Phys. 2014, 140, 214106
  • Kästner et al. 2009 Kästner, J.; Carr, J. M.; Keal, T. W.; Thiel, W.; Wander, A.; Sherwood, P. DL-FIND: An Open-Source Geometry Optimizer for Atomistic Simulations. J. Phys. Chem. A 2009, 113, 11856–11865
  • Kästner and Sherwood 2008 Kästner, J.; Sherwood, P. Superlinearly converging dimer method for transition state search. J. Chem. Phys. 2008, 128, 014106
  • Sherwood et al. 2003 Sherwood, P.; de Vries, A. H.; Guest, M. F.; Schreckenbach, G.; Catlow, C. A.; French, S. A.; Sokol, A. A.; Bromley, S. T.; Thiel, W.; Turner, A. J. et al. QUASI: A general purpose implementation of the QM/MM approach and its application to problems in catalysis. J. Mol. Struct. Theochem. 2003, 632, 1 – 28
  • Metz et al. 2014 Metz, S.; Kästner, J.; Sokol, A. A.; Keal, T. W.; Sherwood, P. ChemShell-a modular software package for QM/MM simulations. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2014, 4, 101–110
  • Matérn 2013 Matérn, B. Spatial variation; SSBM, 2013; Vol. 36
  • Baker and Chan 1996 Baker, J.; Chan, F. The location of transition states: A comparison of Cartesian, Z-matrix, and natural internal coordinates. J. Comput. Chem. 1996, 17, 888–904
  • Dewar et al. 1985 Dewar, M. J.; Zoebisch, E. G.; Healy, E. F.; Stewart, J. J. Development and use of quantum mechanical molecular models. 76. AM1: a new general purpose quantum mechanical molecular model. J. Am. Chem. Soc. 1985, 107, 3902–3909
  • Becke 1988 Becke, A. Density-functional exchange-energy approximation with correct asymptotic behavior. Phys. Rev. A 1988, 38, 3098–3100
  • Perdew 1986 Perdew, J. P. Density-functional approximation for the correlation energy of the inhomogeneous electron gas. Phys. Rev. B 1986, 33, 8822–8824
  • Hariharan and Pople 1973 Hariharan, P. C.; Pople, J. A. The influence of polarization functions on molecular orbital hydrogenation energies. Theor. Chem. Acc. 1973, 28, 213–222
  • Meisner and Kästner 2018 Meisner, J.; Kästner, J. Dual-Level Approach to Instanton Theory. J. Chem. Theory Comput. 2018, 14, 1865–1872
  • Rieckhoff et al. 2017 Rieckhoff, S.; Meisner, J.; Kästner, J.; Frey, W.; Peters, R. Double Regioselective Asymmetric C-Allylation of Isoxazolinones: Iridium-Catalyzed N-Allylation Followed by an Aza-Cope Rearrangement. Angew. Chem. Int. Ed. 2017, 57, 1404–1408
  • Bofill 1994 Bofill, J. M. Updated Hessian matrix and the restricted step method for locating transition structures. J. Comput. Chem. 1994, 15, 1–11
  • Behler 2016 Behler, J. Perspective: Machine learning potentials for atomistic simulations. J. Chem. Phys. 2016, 145, 170901
  • Baker and Hehre 1990 Baker, J.; Hehre, W. J. Geometry optimization in cartesian coordinates: The end of the Z-matrix? J. Comput. Chem. 1990, 12, 606–610