Incorporation of Internal Coordinates Interpolation into the Freezing String Method
Abstract
We present an improved method for determining guess structures for transition state searches by incorporating internal coordinates interpolation into the freezing string method (FSM). We test our method on over 40 reactions across 3 benchmark datasets covering a diverse set of chemical reactions. Our results show that incorporation of internal coordinates interpolation improves the computational efficiency of the FSM by enabling the use of larger interpolation step sizes and fewer optimization steps per cycle while maintaining 100% success rate on benchmark chemical reaction test cases including systems where previous attempts based on linear synchronous transit interpolation have failed. We provide an open-source Python implementation of the FSM, in addition to the reactant, product, transition state structures of all reactions studied.
keywords:
Chemical calculations, Chemical reactions, Optimization, Transition statesIowa]
Department of Chemical and Biochemical Engineering
University of Iowa, Iowa City, United States
\abbreviationsMEP,TS,FSM,LST,RIC
![[Uncaptioned image]](https://cdn.awesomepapers.org/papers/6fc59143-0fd7-4171-ba00-5b927de1bef0/bicyclo.png)
Reaction coordinate diagram of the bicyclo[1.1.0]butane ring opening reaction to trans-butadiene.
1 Introduction
The determination of reaction mechanisms is an important step for chemists in the prediction of thermodynamics and kinetic properties of chemical reactions. In computational chemistry, the reaction mechanism is typically represented as a pathway on the Born-Oppenheimer potential energy surface (PES) of the system of interest given by a suitable potential energy function of the nuclear coordinates. The equilibrium states correspond to stationary points on the PES with zero first order derivatives (gradient) in all directions and all positive eigenvalues of the second order derivative (Hessian) matrix. The equilibrium states can be identified by energy minimization given a suitable initial guess structure. There exist many pathways by which equilibrium states may interconvert; however, only a small subset of these pathways are thermally accessible and relevant to thermodynamic and kinetic analysis of reaction mechanisms. The minimum energy path (MEP) is the route that needs the least amount of potential energy for the system to undergo the transition. The transition states (TSs), based on transition state theory, are first order saddle points with zero gradient and only one negative eigenvalue of the Hessian matrix. The MEP connecting two equilibrium states must go through one or more TSs and serves as a representative reaction path.
The first step in identifying the MEP is typically locating the minimum energy TS connecting the equilibrium states of interest. Determination of TS geometries is a computationally expensive task that frequently requires significant human intervention. Efforts have been made to develop automated processes for finding first order saddle points on PESs while attempting to minimize computational cost. Many modern algorithms take a chain-of-states1 approach to this problem. Geometric interpolation between the reactant and product geometries is performed, producing a string of intermediate structures, or nodes, distributed along the reaction pathway. The resulting structures are optimized using ab-initio methods to approximate the MEP of the reaction. The highest energy structure in the chain-of-states is taken as a guess of the TS geometry. A local surface walking algorithm refines the guess to exact TS geometry. 2, 3, 4, 5, 6, 7, 8, 9, 10 These algorithms require the use of local gradient and Hessian information to locate saddle points, and their success is heavily dependent on the initial guess being located within the desired basin of attraction.
Many such chain-of-states methods that provide accurate TS geometry guesses from reactant and product structures have been previously reported. Details on how intermediate structures are selected and optimized differs between methods, and consequently influences the computational cost and success of the algorithm. The nudged elastic band (NEB) method creates an initial chain-of-states by interpolation between reactant and product, and every intermediate structure along the string is iteratively optimized to lie along the reaction pathway, together with an additional penalty term to maintain an even distribution of structures along the chain-of-states. 11, 12, 13, 14, 15, 16, 17, 18 The growing string method (GSM) in most use cases incurs fewer gradient calculations than the NEB by developing a better initial chain-of-states, or string.19, 20, 21, 22, 23, 24, 25, 26, 27, 28 The GSM creates two strings; one starts at the reactant configuration and the other starts at the product configuration. At each iteration of the growing process, an intermediate structure, or node, is added to each string frontier in the direction of the opposing string, after which the entire string is optimized. This growth process is repeated until the strings meet, providing a better initial chain-of-states, and followed by additional optimization of the unified string. In addition to providing a better initial chain-of-states, this avoids simulation of non-physical structures in the interior nodes which may result from the initial interpolation.
The freezing string method (FSM) further reduces the number of gradient calls but at the cost of identifying the true MEP. 29, 30, 31 Two strings are grown from the product and reactant like the GSM. Once a frontier node is placed, optimization is performed to step the node closer to the reaction pathway, after which it is “frozen”, i.e., it will not move for the remainder of the calculation. New frontier nodes are added to both strings, and the process is repeated. Once the two strings unite, the highest energy structure is taken to be the TS guess without optimization of the unified string. In practice, the resulting pathway can deviate significantly from the MEP but often produces guess structures suitable for further refinement.
The FSM greatly improves the computational efficiency of TS guess finding compared to previously described algorithms, often resulting in an order of magnitude reduction in the required number of electronic structure calculations required to determine the true TS structure. Despite these improvements, the FSM is not a foolproof algorithm. The efficiency of the calculation and ultimately its success heavily depends on the interpolation algorithm used to generate initial guess structures at the frontier of the growing strings. There exist known issues with commonly applied Cartesian coordinate and linear synchronous transit (LST) interpolation techniques32, which can produce high energy or otherwise aphysical molecular geometry structures far from the true MEP which are then subjected to electronic structure calculation and geometry optimization.25, 33 Due to the relatively few number of optimization steps used in the FSM, the incorporation of these geometries as anchor points into future interpolation steps poisons the calculation, resulting in a failed search and wasted computational effort.
In this work, we demonstrate an improved method for initiating TS searches by incorporating internal coordinates (ICs) interpolation into the FSM. Additionally, we incorporate the L-BFGS-B with explicit line search for step size determination which improves reaction pathway optimization steps. We test our method on over 40 reactions across three benchmark datasets covering a broad set of chemical reactions. Our results show that, using previously studied LST interpolation, the incorporation of the L-BFGS-B optimization reduces the computational effort required in previous studies, even considering the additional computational cost incurred due to step size determination by line search. Incorporation of ICs interpolation further improves the computational efficiency of the FSM and successfully locates high-quality TS guess structures where previous attempts based on LST interpolation have failed. We provide an open-source Python implementation of the FSM, in addition to the reactant, product, TS structures of all reactions computed at the B97X-V/def2-TZVP level of theory. We anticipate that these resources will be of broad interest for researchers in computational chemistry studying problems where the fast and reliable location of TSs is important, or those developing algorithms who wish to evaluate their methods on a broad set of realistic chemical problems.
2 Methods
2.1 Overview of the Freezing String Method

A flowchart of the FSM algorithm is shown in Figure 1. The goal of the FSM is to produce an approximate reaction pathway connecting two given reactant and product structures. The approximate reaction pathway is represented as a chain-of-states consisting of nodes along a parameterized string. The interior nodes on the string represent intermediate geometries along the approximate reaction pathway. The method evolves the string by alternately adding nodes to the reactant and product sides of a growing string. The new reactant and product side structures are generated by taking a step inward along an interpolated path between frontier nodes. After interpolation, the structures undergo geometry optimization in the direction perpendicular to the approximate reaction pathway. Optimization of these new frontier structures is performed, and then the geometries are frozen. These steps are repeated until the reactant and product sides of growing string meet. The highest energy node along the pathway is chosen as the TS guess structure that is refined to the true TS geometry using a local surface walking optimization algorithm.
The interpolation step is performed by adding nodes at a user-defined distance, , from the reactant and product frontier nodes. During the first evolution step, the frontier nodes are the user-defined reactant and product structures, while subsequent steps use the frozen nodes from the previous iteration as anchor points for interpolation. The interpolation step size is fixed during the calculation, and in this work, we determine by dividing the arc length of an interpolated path connecting the initial reactant and product structures by a user-defined nominal number of nodes, . The interpolation can be performed using many different coordinate systems. We consider both LST interpolation and linear interpolation in redundant internal coordinates. Tangent directions at the new reactant and product frontier nodes are determined from a cubic spline fitted through the chemical structures of the interpolated pathway.
During optimization, the energy is minimized by assumption of the local quadratic approximation:
(1) |
Here, is the current geometry, is a proposed geometry, is the perpendicular gradient, and is the approximate Hessian in the space perpendicular to the approximate reaction pathway. The perpendicular gradient is given by where is the normalized tangent vector determined during the interpolation step.
The optimization of equation 1 is performed using the L-BFGS-B algorithm34 as implemented in the SciPy Python library.35 We place bounds on each Cartesian coordinate such that no coordinate is displaced by more than 0.3 Å during a single optimization step. Each optimization step is begun by performing a backtracking line search, where at most (often one) energy and gradient calculations are performed to determine an appropriate step size. The optimization proceeds until a specified convergence criteria is satisfied, or a maximum number of step has been performed.
2.2 Redundant internal coordinates
Cartesian coordinates are used to specify the positions of atoms in a molecule and are a necessary input for electronic structure calculations. Internal coordinates (ICs) are functions of the Cartesian coordinates that better describe the collective motions of atoms. There exists several choices for constructing coordinate systems when performing interpolation between two molecular geometries. The definition of the redundant internal coordinates used in this work have been reported previously 36, 37. Redundant internal coordinate sets are constructed separately for reactant and product molecules, and the union of these two sets gives the full set of redundant coordinates. Bonds, angles, linear angles, torsion angles, and out-of-plane angles are assigned based on a procedure outlined in Bakken and Helgaker 38, and the full set of coordinates is further pruned to ensure each internal coordinate is well-defined for both reactant and product molecules.
2.3 ICs interpolation
Given a set of primitive redundant coordinates , a set of structures are produced by linear interpolation
(2) |
where is the interpolation parameter and and represent the reactant and product internal coordinate values. The displacements in are related by the the well-known matrix39 valid for small Cartesian displacements . The final step is to transform the geometries along the interpolated path from internal coordinates back to Cartesians. Given a target geometry in internal coordinates, this is done iteratively using the formula
(3) |
The iteration is terminated when the Cartesian coordinates generated on the ()th iteration are identical to those on the th iteration within a tolerance Å. The transformation of all results in a chain-of-states connecting the current frontier reactant and product nodes that is used during the interpolation step of the FSM.
2.4 LST interpolation
LST is another chemically realistic interpolation method that produces a chain-of-states pathway.32 A set of structures are produced by linear interpolation in a set of internuclear distances
(4) |
where is the interpolation parameter and and represent the reactant and product internuclear distances. A set of structures are produced by linear interpolation in Cartesian coordinates
(5) |
The final interpolated pathway is determined by minimizing the objective by the method of least-squares at each interpolation point
(6) |
where is a weighting factor (nominally ) and the superscripts and denote interpolated and calculated values, respectively. The variables are those optimized during the minimization of , and the variables are derived from . The denominator in the first term ensures that important distances between bonded atoms are preserved in the final interpolated geometries. The second term is weighted by a factor , and produces forces to prevent rigid translation or rotation such that the interpolated structures align with the end structures. The weighting factor additionally weights within the objective function the relative importance of linear interpolation in internuclear distances (first term) and linear interpolation in Cartesian coordinates (second term). The optimization of equation 6 is performed using the L-BFGS-B algorithm34 as implemented in the SciPy Python library. The optimization of for all results in a chain-of-states connecting the current frontier reactant and product nodes that is used during the interpolation step of the FSM.
2.5 Computational details
Electronic structure calculations are performed to produce the optimized geometries of the reactant and product for each reaction studied, compute the quantum mechanical gradients required by the FSM algorithm, as well as to refine the geometry of the TS guess structures produced by the FSM. All electronic structure calculations were performed using the range-separated, hybrid generalized gradient approximation with non-local correlation exchange-correlation function B97X-V40 using the triple-, polarized valence def2-TZVP basis set.41 During geometry optimization, energies were converged to 10-6 Ha (Hartree) and the maximum norm of the Cartesian gradient was converged to 10-3 Ha bohr-1. Geometry optimization was terminated if the convergence criteria were not met within 250 optimization steps. All reported energy values are electronic energy without thermal or zero-point correction. The eigenvector-following local TS searches were performed using the partitioned rational function optimization method.5 Frequency analysis was performed to confirm the nature of each stationary point: there must be zero imaginary frequencies for PES minima and exactly one imaginary frequency for PES TSs. Intrinsic reaction coordinate (IRC) pathway calculations initiated at the TS were performed with the predictor-corrector algorithm of Schmidt et al. 42 to further characterize TS geometries. The FSM software is written in Python and performs file-based data exchange with an electronic structure software package to obtain the quantum mechanical gradients used in the method. All QM calculations were performed using a release version of the Q-Chem 6.0 software package.43
3 Results and Discussion
We measure benchmark performance by tracking the number of quantum mechanical gradient calculations performed as part of the FSM and TS search procedure, as well as the total number of gradient calculations. We report separately the total number of gradients computed during FSM calculation, which measures the efficiency at which a TS guess is obtained, and the total number of gradients computed during the TS calculation, which serves as an indicator for TS guess quality. We compare FSM calculations with either redundant internal coordinates (FSM-RIC) or linear synchronous transit (FSM-LST) interpolation methods. We perform FSM calculations with nominally 18 nodes () along the approximate reaction pathway, two steps per optimization cycle (), and at most three line search steps per optimization cycle () unless otherwise stated. We consider this set of parameters to be a conservative baseline where both the FSM-RIC and FSM-LST should perform well.
3.1 Sharada test set
ID | Reaction | Gradients | |||||
Gradients | (FSM-RIC) | ||||||
Gradients | |||||||
Gradients | |||||||
Gradients | |||||||
Gradients | |||||||
(TOTAL-LST) | |||||||
1 | H2CO H2 + CO | 58 | 13 | 71 | 62 | 11 | 73 |
2 | SiH2 + H2 SiH4 | 55 | 5 | 60 | 53 | 4 | 57 |
3 | acetaldehyde Keto-enol tautomerism | 63 | 3 | 66 | 61 | 4 | 65 |
4 | CH3CH3 CH2CH2 + H2 | 58 | 24 | 82 | 52 | 21 | 73 |
5 | bicyclo[1.1.0]butane trans-butandiene | 61 | 47 | 108 | 58 | 25 | 83 |
6 | parent Diels-Alder cycloaddition reaction | 62 | 15 | 77 | 53 | 33 | 86 |
7 | cis,cis-2,4-hexadiene 3,4-dimethylcyclobutene | 60 | 18 | 78 | 79 | 25 | 104 |
8 | alanine dipeptide rearrangment | 61 | 64 | 125 | 45 | 45 | 90 |
9 | silyl ketene acetal silyl ester Ireland-Claisen rearrangement | 60 | 82 | 142 | 60 | 88 | 148 |
A test suite consisting of nine reactions curated by Sharada et al. 30 is chosen as the first benchmark. The Sharada test set contains a broad set of bond formation, dissociation, ring-opening, and isomerization reactions. The description of the nine reactions in the Sharada test set is presented in Table 3.1. This choice of benchmark suite is significant due to its previous use benchmarking previous implementations of the FSM30, 24. For each reaction, we highlight in bold the method (FSM-RIC or FSM-LST) that locates the TS geometry in fewest calculations.
The comparison between the required gradients calls for FSM and TS calculations using RIC or LST interpolation on the Sharada test set is presented in Table 3.1. Both FSM-RIC and FSM-LST calculations successfully locate the exact benchmark TS structure without user intervention when performed with the conservative baseline parameters. The FSM-RIC and FSM-LST perform similarly, on average, producing a TS guess structure after 60 and 58 gradient evaluations, respectively. The resulting TS guess structures are of similar quality, as indicated by the average number of gradient evaluations required for TS structure refinement. The FSM-RIC TS guess structure requires on average 30 gradient evaluations for subsequent optimization, and the FSM-LST TS guess structures require on average 28 gradient evaluations for optimization.
Though the choice of exchange-correlation method and basis set in the original study by Sharada et al. 30 precludes the direct comparison to the current work, we can qualitatively compare their reported gradient evaluation counts to ours. The average number of gradient calls required for the FSM-LST in its original implementation is 53, noticeably fewer than required by our implementation of FSM-LST, when performed with similar FSM parameter settings. The two implementations differ notably in the how each method chooses the step size for node-level optimization. Whereas the original work uses the BFGS algorithm and a heuristic for step size determination based on available local PES information, our method uses the L-BFGS-B algorithm together with a step size determined by explicit line search. There is a small increase in the number of gradient calls required during the FSM calculation due to the explicit line search requiring additional electronic structure calculations at each optimization step. The average number of gradient calls required for subsequent TS optimization in the previous FSM implementation is 64, whereas our method requires on average 28 gradient calls for TS optimization. We can reasonably conclude that, although our optimization method requires more gradient calls due to explicit line search, our TS guesses are of higher quality and converge to the correct TS structure in overall fewer gradient calls on average.
3.2 Birkholz test set
ID | Reaction | Gradients | |||||
Gradients | (FSM-RIC) | ||||||
Gradients | |||||||
Gradients | |||||||
Gradients | |||||||
Gradients | |||||||
(TOTAL-LST) | |||||||
1 | C2H4 + N2O C2N2O | 69 | 7 | 76 | 61 | 48 | 109 |
2 | 1,3-pentadiene hydrogen transfer | 63 | 5 | 68 | 68 | 6 | 74 |
3 | HCN HNC | 78 | 3 | 81 | 72 | 9 | 81 |
4 | 1,4-hexadiene Cope rearrangement | 56 | 27 | 83 | 60 | 12 | 72 |
5 | 1,3-cyclopentadiene hydrogen shift | 62 | 5 | 67 | 64 | 7 | 71 |
6 | 1,3-butadiene cyclization | 60 | 6 | 66 | 66 | 8 | 74 |
7 | Diels-Alder endo addition of cyclopentadiene to cyclopentadiene | 62 | 50 | 112 | 57 | 52 | 109 |
8 | Diels-Alder addition of cyclopentadiene and ethylene | 62 | 11 | 73 | 63 | 11 | 74 |
9 | difluorocarbene addition to ethylene | 56 | 24 | 80 | 50 | 12 | 62 |
10 | ene reaction of ethylene and propene | 49 | 42 | 91 | 51 | 47 | 98 |
11 | Grignard addition of phenyl magnesium bromide to benzophenone | 53 | 38 | 91 | 59 | 49 | 108 |
12 | H2CO H2 + CO | 58 | 13 | 71 | 62 | 11 | 73 |
13 | CH3CH2F CH2CH2 + HF | 52 | 9 | 61 | 67 | 9 | 76 |
14 | water assisted hydrolysis of ethyl acetate | 42 | 60 | 102 | 56 | 86 | 142 |
15 | H2 + H2CO CH3OH | 52 | 8 | 60 | 61 | 17 | 78 |
16 | 2-methyl-3-phenyloxirane ring opening | 58 | 48 | 106 | 59 | 36 | 95 |
17 | CH2CHCH2CH2CHO Claisen rearrangement | 57 | 31 | 88 | 61 | 30 | 91 |
18 | SiH2 + H2 SiH4 | 55 | 5 | 60 | 53 | 4 | 57 |
19 | Cl- + CH3F CH3Cl + F- | 56 | 4 | 60 | 60 | 3 | 63 |
20 | sulfur dioxide addition to butadiene | 63 | 21 | 84 | 65 | 22 | 87 |
A test suite consisting of twenty reactions curated by Birkholz and Schlegel 44 is chosen as the second benchmark. The Birkholz test set contains examples of many types of organic reactions, including insertions, additions, eliminations, hydrolysis, ring-opening, substitutions, cycloadditions, and rearrangements. The description of the twenty reactions in the Birkholz test set is presented in Table 3.2. This data set shares two identical test cases with the Sharada benchmark set (H2CO H2 + CO and SiH2 + H2 SiH4), resulting in 18 new, unique test cases to evaluate. For each reaction, we highlight in bold the method that locates the TS geometry in fewest calculations.
The comparison between the required gradient calls for FSM and TS calculations using RIC or LST interpolation on the Birkholz test set is presented in Table 3.2. Both the FSM-RIC and FSM-LST successfully find the exact benchmark TS structure for each test case given the same conservative baseline parameters. The FSM-RIC and FSM-LST again perform similarly, on average, producing a TS guess structure after 58 and 61 gradient evaluations, respectively. The resulting TS guess structures are of similar quality; the FSM-RIC TS guess structure requires on average 21 gradient evaluations for subsequent optimization, and the FSM-LST TS guess structures require on average 24 gradient evaluations for optimization. Of the 18 test cases unique to the Birkholz benchmark set, the FSM-RIC finds the final TS structure in fewer or equal number of gradient evaluations compared to the FSM-LST in 14 of the test cases.
One notable difference between the performance of the FSM-RIC and FSM-LST on the Birkholz benchmark set occurs in the case of the water-assisted hydrolysis of ethyl acetate to acetic acid and ethanol. Here, an additional water molecule assists in the hydrolysis reaction by proton shuttling which significantly stabilizes the TS structure in comparison to the unassisted reaction mechanism. The TS search for both structures requires a greater number of additional gradient evaluations when compared to other, simpler reactions within the test set. Both the TS guess structure generation and subsequent TS optimization require fewer gradient evaluations when using internal coordinates-based interpolation steps, resulting in a total of 102 and 142 gradient calls required to locate the final TS structure for FSM-RIC and FSM-LST techniques, respectively.
The Birkholz benchmark data set was originally developed by Birkholz and Schlegel 44 to test their proposed Connectivity Transition State (CTS) algorithm for determining TS guess structure geometry and TS optimization. In the CTS method, interpolation in redundant internal coordinates limited to bond making and breaking coordinates is performed, and optimization of the interpolated structure is performed on UFF molecular mechanics45 or PM6 semiempirical electronic structure46 PESs. Their method also considers different strategies for approximating the initial Hessian used in the subsequent optimization, whereas our TS search calculations are initialized with the exact Hessian calculated at the TS guess structure. This strategy of relying on computationally cheaper levels of theory for optimization results in fewer gradient evaluations overall on the desired PES. This is in contrast to the present work, where the high-level B97X-V/def2-TZVP model chemistry is applied in all calculations. While the CTS method is reported to find the final TS structures in significantly fewer high-level gradient evaluations, we report 100% success rate for both the FSM-RIC and FSM-LST evaluated on the Birkholz benchmark set, whereas the best reported CTS methodology reaches 95% success rate.
3.3 Baker test set
ID | Reaction | Gradients | |||||
Gradients | (FSM-RIC) | ||||||
Gradients | |||||||
Gradients | |||||||
Gradients | |||||||
Gradients | |||||||
(TOTAL-LST) | |||||||
1 | HCN HNC | 78 | 3 | 81 | 72 | 9 | 81 |
2 | HCCH CCH2 | 61 | 11 | 72 | 63 | 4 | 67 |
3 | H2CO H2 + CO | 58 | 13 | 71 | 62 | 11 | 73 |
4 | CH3O CH2OH | 52 | 5 | 57 | 48 | 7 | 55 |
5 | cyclopropyl ring opening | 59 | 11 | 70 | 62 | 25 | 87 |
6 | bicyclo[1.1.0]butane trans-butandiene | 61 | 47 | 108 | 58 | 25 | 83 |
7 | formyloxyethyl 1,2-migration | 63 | 14 | 77 | 62 | 11 | 73 |
8 | parent Diels-Alder cycloaddition | 62 | 15 | 77 | 53 | 33 | 86 |
9 | s-tetrazine 2HCN + N2 | 76 | 7 | 83 | 80 | 15 | 95 |
10 | trans-butadiene cis-butadiene | 63 | 2 | 65 | 69 | 2 | 71 |
11 | CH3CH3 CH2CH2 + H2 | 58 | 24 | 82 | 52 | 21 | 73 |
12 | CH3CH2F CH2CH2 + HF | 52 | 9 | 61 | 67 | 9 | 76 |
13 | acetaldehyde keto-enol tautomerism | 63 | 3 | 66 | 61 | 4 | 65 |
14 | HCOCl HCl + CO | 65 | 5 | 70 | 67 | 6 | 73 |
15 | H2O + PO H2PO | 55 | 21 | 76 | 48 | 24 | 72 |
16 | CH2CHCH2CH2CHO Claisen rearrangement | 57 | 31 | 88 | 61 | 30 | 91 |
17 | SiH2 + CH3CH3 SiH3CH2CH3 | 55 | 5 | 60 | 53 | 10 | 63 |
18 | HNCCS HNC + CS | 72 | 8 | 80 | 53 | 9 | 62 |
19 | HCONH NH + CO | 65 | 18 | 83 | 99 | 17 | 116 |
20 | acrolein rotational TS | 63 | 2 | 65 | 59 | 13 | 72 |
21 | HCONHOH HCOHNHO | 57 | 12 | 69 | 62 | 5 | 67 |
22 | HNC + H2 H2CNH | 36 | 18 | 54 | 66 | 15 | 81 |
23 | H2CNH HCNH2 | 30 | 6 | 36 | 66 | 3 | 69 |
24 | HCNH2 HCN + H2 | 60 | 41 | 101 | 53 | 26 | 79 |
A test suite consisting of twenty-five reactions originally compiled by Baker and Chan 47 is selected as the third and final benchmark set. The reactions in the test suite again cover a broad set of chemical reaction archetypes, including dissociation, insertion, rearrangement, ring-opening, and rotation reactions. A description of the reactions contained within the test suite, together with TS optimization results, is given in Table 3.3. This data set shares eight identical test cases with the Sharada and Birkholz benchmark sets, resulting in 16 new, unique test cases for evaluation. For each reaction, we highlight in bold the method that locates the TS in fewest calculations, and highlight in italics any method resulting in a failed search or a search where the wrong TS was located.
The comparison between the required gradient calls for FSM and TS calculations using RIC or LST interpolation on the Baker test set is presented in Table 3.3. The FSM-RIC and FSM-LST perform similarly, on average, producing a TS guess structure after 59 and 62 gradient evaluations, respectively, and both methods requiring on average 14 gradient evaluations for TS optimization. Of the 16 test cases unique to the Baker benchmark set, the FSM-RIC finds the final TS molecular geometry in fewer gradient evaluations compared to the FSM-LST in 12 of the test cases. There are two test cases where the TS optimization based on the FSM-RIC requires sizably more gradient evaluations than the equivalent optimization based on FSM-LST, bicyclobutane ring-opening and methylamine dehydrogenation.
The FSM-RIC successfully finds the exact benchmark TS structure for each test case in the Baker benchmark set. The FSM-LST, however, fails to find the correct TS structure in two of the 16 test cases unique to the Baker benchmark set, as highlighted in italic text in Table 3.3. In the case of cyclopropyl radical ring opening reaction, the FSM-RIC TS guess structure is optimized to the correct TS geometry as confirmed by computed TS mode imaginary frequency -1052 cm-1 and forward barrier height 53.8 kcal/mol. The FSM-LST TS guess is optimized to a planar C3 molecule with TS mode corresponding to out-of-plane motion of the central carbon atom and associated imaginary frequency -659 cm-1 and forward barrier height 103.1 kcal/mol. In the case of methanimine isomerization to aminomethylene (H2CNH HCNH2), the FSM-RIC TS guess structure is optimized to the correct TS geometry where the migrating hydrogen atom moves slightly out-of-plane. The TS geometry is confirmed by computed TS mode imaginary frequency -2210 cm-1 and forward barrier height 85.4 kcal/mol. The FSM-LST TS guess structure is optimized in three steps to a second-order saddle point associated with a planar transition geometry. The TS geometry is confirmed to have two vibrational modes with associated imaginary frequencies -2222 cm-1 and -472 cm-1 and forward barrier height 87.3 kcal/mol. The first vibrational mode corresponds to the correct TS mode while the second vibrational mode is associated with out-of-plane motion.
3.4 Ablation Study

We use the Baker benchmark set for further investigation of the effects of modifying the FSM algorithm parameters on the performance of the FSM-RIC. Figure 2 shows the average number of gradient calls required for FSM-RIC calculation and TS optimization per Baker benchmark set test case across three separate ablation studies. In the left panel of Figure 2, we show the results of our first ablation study where the minimum number of string nodes () is varied while other optimization parameters are held fixed. We find that decreasing from 18 to nine results in a significantly reduced number of gradient calculations for TS guess structure determination, and observe little variation overall in the number of gradient evaluations required for TS optimization. The central and right panels of Figure 2 show the results of the ablation studies where is held fixed and the maximum number of optimization steps per interpolation step () and the maximum number of line search steps taken for each optimization step (), respectively, are varied around the original setting. We find that reducing gives further improvement in the number of required gradient evaluations for TS guess structure determination, while having little effect on the quality of the TS guess structure as indicated by the required gradient evaluations for TS optimization. We find that the required number of gradient evaluations for TS guess structure determination is insensitive to , indicating that though we allow up to three line search steps per optimization step, we often require fewer in practice.
The results of the ablation study suggest that we can significantly reduce the computational cost of TS finding by the FSM-RIC, while maintaining a high success rate, by taking larger interpolation steps (smaller ) and fewer optimization steps per interpolation step. We perform FSM calculations with nominally 9 nodes along the approximate reaction pathway (), one step per optimization cycle (), and at most three line search steps per optimization cycle (). The comparison between the required gradient calls for FSM and TS calculations using RIC or LST interpolation on the Baker benchmark set with these modified settings is presented in Table 3.4. The FSM-RIC successfully locates the exact benchmark TS structure in fewer required gradient evaluations for each test case. The average required number of gradient evaluations to produce a TS guess structure is 19 and, similar to the more conservative operational settings, the TS guess structures require on average 20 gradient evaluations for optimization. In contrast, the FSM-LST fails to locate the benchmark TS structure in seven of the 24 total test cases present in the Baker benchmark set. The average required number of gradient evaluations to produce a TS guess structure is 19, but TS optimization requires on average 31 gradient evaluations, indicating that the TS guess structures are of worse quality compared with FSM-LST calculations run with more conservative settings. A typical course of action upon TS calculation failure is to re-run the calculation with more conservative settings. This, in practice, would increase the average gradient evaluation requirements for successfully determining a TS structure and can possibly negate the computational cost savings obtained by running the FSM-LST algorithm at more aggressive settings. These results highlight an important finding of this work, that the incorporation of RICs permits the use of larger interpolation steps during TS guess determination, resulting in fewer gradient calculations, while maintaining a good overall success rate when compared with LST interpolation.
ID | Reaction | Gradients | |||||
Gradients | (FSM-RIC) | ||||||
Gradients | |||||||
Gradients | |||||||
Gradients | |||||||
Gradients | |||||||
(TOTAL-LST) | |||||||
1 | HCN HNC | 22 | 4 | 26 | 21 | 7 | 28 |
2 | HCCH CCH2 | 21 | 9 | 30 | 23 | 10 | 33 |
3 | H2CO H2 + CO | 18 | 37 | 55 | 17 | 53 | 70 |
4 | CH3O CH2OH | 16 | 7 | 23 | 16 | 8 | 24 |
5 | cyclopropyl ring opening | 20 | 14 | 34 | 22 | 61 | 83 |
6 | bicyclo[1.1.0]butane trans-butandiene | 20 | 74 | 94 | 20 | 32 | 52 |
7 | formyloxyethyl 1,2-migration | 24 | 11 | 35 | 14 | 62 | 76 |
8 | parent Diels-Alder cycloaddition | 23 | 21 | 44 | 16 | 30 | 46 |
9 | s-tetrazine 2HCN + N2 | 22 | 8 | 30 | 25 | 40 | 65 |
10 | trans-butadiene cis-butadiene | 22 | 3 | 25 | 22 | 44 | 66 |
11 | CH3CH3 CH2CH2 + H2 | 18 | 37 | 55 | 16 | 25 | 41 |
12 | CH3CH2F CH2CH2 + HF | 17 | 15 | 32 | 18 | 16 | 34 |
13 | acetaldehyde keto-enol tautomerism | 20 | 8 | 28 | 19 | 8 | 27 |
14 | HCOCl HCl + CO | 22 | 10 | 32 | 20 | 12 | 32 |
15 | H2O + PO H2PO | 17 | 34 | 51 | 16 | 28 | 44 |
16 | CH2CHCH2CH2CHO Claisen rearrangement | 20 | 29 | 49 | 20 | 39 | 59 |
17 | SiH2 + CH3CH3 SiH3CH2CH3 | 19 | 9 | 28 | 16 | 10 | 26 |
18 | HNCCS HNC + CS | 23 | 8 | 31 | 26 | 10 | 36 |
19 | HCONH NH + CO | 20 | 35 | 55 | 30 | 1 | 31 |
20 | acrolein rotational TS | 24 | 4 | 28 | 20 | 17 | 37 |
21 | HCONHOH HCOHNHO | 19 | 17 | 36 | 20 | 11 | 31 |
22 | HNC + H2 H2CNH | 6 | 31 | 37 | 17 | 17 | 34 |
23 | H2CNH HCNH2 | 9 | 7 | 16 | 19 | 5 | 24 |
24 | HCNH2 HCN + H2 | 20 | 46 | 66 | 19 | 196 | 215 |
4 Conclusion
We have demonstrated the application of the FSM to TS search and optimization using three diverse benchmark sets containing over 40 chemical reactions. The FSM with either LST and RIC interpolation, when run with previously recommended interpolation stepsize and optimization settings, is a reliable technique for producing TS guess geometries and achieves a high success rate for TS optimization on all chemical reactions studied. More importantly, we have shown how incorporation of RIC interpolation into the FSM algorithm, a novel contribution of this work, enables the use of larger interpolation stepsizes and fewer optimization steps per interpolation step, which in turn reduces the computational cost of TS search and optimization while maintaining 100% success rate. An open-source Python implementation of the FSM algorithm, in addition to the reactant, product, and TS structures of all reactions studied computed at the B97X-V/def2-TZVP level of theory is provided as a contribution of this work.48
J.G acknowledges start up funding from The University of Iowa. This research was supported in part through computational resources provided by The University of Iowa.
References
- Elber and Karplus 1987 Elber, R.; Karplus, M. A method for determining reaction paths in large molecules: Application to myoglobin. Chemical Physics Letters 1987, 139, 375–380
- Cerjan and Miller 1981 Cerjan, C. J.; Miller, W. H. On finding transition states. The Journal of Chemical Physics 1981, 75, 2800–2806
- Schlegel 1982 Schlegel, H. B. Optimization of equilibrium geometries and transition structures. Journal of Computational Chemistry 1982, 3, 214–218
- Simons et al. 1983 Simons, J.; Joergensen, P.; Taylor, H.; Ozment, J. Walking on potential energy surfaces. The Journal of Physical Chemistry 1983, 87, 2745–2753
- Baker 1986 Baker, J. An algorithm for the location of transition states. Journal of Computational Chemistry 1986, 7, 385–395
- Wales 1992 Wales, D. J. Basins of attraction for stationary points on a potential-energy surface. Journal of the Chemical Society, Faraday Transactions 1992, 88, 653–657
- Wales 1993 Wales, D. J. Locating stationary points for clusters in cartesian coordinates. Journal of the Chemical Society, Faraday Transactions 1993, 89, 1305–1313
- Banerjee et al. 1985 Banerjee, A.; Adams, N.; Simons, J.; Shepard, R. Search for stationary points on surfaces. The Journal of Physical Chemistry 1985, 89, 52–57
- Ayala and Schlegel 1997 Ayala, P. Y.; Schlegel, H. B. A combined method for determining reaction paths, minima, and transition state geometries. The Journal of Chemical Physics 1997, 107, 375–384
- Heyden et al. 2005 Heyden, A.; Bell, A. T.; Keil, F. J. Efficient methods for finding transition states in chemical reactions: Comparison of improved dimer method and partitioned rational function optimization method. The Journal of Chemical Physics 2005, 123, 224101
- Mills and Jónsson 1994 Mills, G.; Jónsson, H. Quantum and thermal effects in H 2 dissociative adsorption: Evaluation of free energy barriers in multidimensional quantum systems. Physical Review Letters 1994, 72, 1124
- Henkelman and Jónsson 2000 Henkelman, G.; Jónsson, H. Improved tangent estimate in the nudged elastic band method for finding minimum energy paths and saddle points. The Journal of Chemical Physics 2000, 113, 9978–9985
- Henkelman et al. 2000 Henkelman, G.; Uberuaga, B. P.; Jónsson, H. A climbing image nudged elastic band method for finding saddle points and minimum energy paths. The Journal of Chemical Physics 2000, 113, 9901–9904
- Maragakis et al. 2002 Maragakis, P.; Andreev, S. A.; Brumer, Y.; Reichman, D. R.; Kaxiras, E. Adaptive nudged elastic band approach for transition state calculation. The Journal of Chemical Physics 2002, 117, 4651–4658
- Chu et al. 2003 Chu, J.-W.; Trout, B. L.; Brooks, B. R. A super-linear minimization scheme for the nudged elastic band method. The Journal of Chemical Physics 2003, 119, 12708–12717
- Trygubenko and Wales 2004 Trygubenko, S. A.; Wales, D. J. A doubly nudged elastic band method for finding transition states. The Journal of Chemical Physics 2004, 120, 2082–2094
- Kolsbjerg et al. 2016 Kolsbjerg, E. L.; Groves, M. N.; Hammer, B. An automated nudged elastic band method. The Journal of Chemical Physics 2016, 145, 094107
- Ruttinger et al. 2022 Ruttinger, A. W.; Sharma, D.; Clancy, P. Protocol for directing nudged elastic band calculations to the minimum energy pathway: Nurturing errant calculations back to convergence. Journal of Chemical Theory and Computation 2022, 18, 2993–3005
- Peters et al. 2004 Peters, B.; Heyden, A.; Bell, A. T.; Chakraborty, A. A growing string method for determining transition states: Comparison to the nudged elastic band and string methods. The Journal of Chemical Physics 2004, 120, 7877–7886
- Quapp 2005 Quapp, W. A growing string method for the reaction pathway defined by a Newton trajectory. The Journal of Chemical Physics 2005, 122, 174106
- Goodrow et al. 2008 Goodrow, A.; Bell, A. T.; Head-Gordon, M. Development and application of a hybrid method involving interpolation and ab initio calculations for the determination of transition states. The Journal of Chemical Physics 2008, 129, 174109
- Goodrow et al. 2009 Goodrow, A.; Bell, A. T.; Head-Gordon, M. Transition state-finding strategies for use with the growing string method. The Journal of Chemical Physics 2009, 130, 244108
- Goodrow et al. 2010 Goodrow, A.; Bell, A. T.; Head-Gordon, M. A strategy for obtaining a more accurate transition state estimate using the growing string method. Chemical Physics Letters 2010, 484, 392–398
- Behn et al. 2011 Behn, A.; Zimmerman, P. M.; Bell, A. T.; Head-Gordon, M. Incorporating linear synchronous transit interpolation into the growing string method: Algorithm and applications. Journal of Chemical Theory and Computation 2011, 7, 4019–4025
- Zimmerman 2013 Zimmerman, P. M. Growing string method with interpolation and optimization in internal coordinates: Method and examples. The Journal of Chemical Physics 2013, 138, 184102
- Zimmerman 2013 Zimmerman, P. Reliable transition state searches integrated with the growing string method. Journal of Chemical Theory and Computation 2013, 9, 3043–3050
- Zimmerman 2015 Zimmerman, P. M. Single-ended transition state finding with the growing string method. Journal of Computational Chemistry 2015, 36, 601–611
- Jafari and Zimmerman 2017 Jafari, M.; Zimmerman, P. M. Reliable and efficient reaction path and transition state finding for surface reactions with the growing string method. Journal of Computational Chemistry 2017, 38, 645–658
- Behn et al. 2011 Behn, A.; Zimmerman, P. M.; Bell, A. T.; Head-Gordon, M. Efficient exploration of reaction paths via a freezing string method. The Journal of Chemical Physics 2011, 135, 224108
- Sharada et al. 2012 Sharada, S. M.; Zimmerman, P. M.; Bell, A. T.; Head-Gordon, M. Automated transition state searches without evaluating the Hessian. Journal of Chemical Theory and Computation 2012, 8, 5166–5174
- Suleimanov and Green 2015 Suleimanov, Y. V.; Green, W. H. Automated discovery of elementary chemical reaction steps using freezing string and Berny optimization methods. Journal of Chemical Theory and Computation 2015, 11, 4248–4259
- Halgren and Lipscomb 1977 Halgren, T. A.; Lipscomb, W. N. The synchronous-transit method for determining reaction pathways and locating molecular transition states. Chemical Physics Letters 1977, 49, 225–232
- Smidstrup et al. 2014 Smidstrup, S.; Pedersen, A.; Stokbro, K.; Jónsson, H. Improved initial guess for minimum energy path calculations. The Journal of Chemical Physics 2014, 140
- Byrd et al. 1995 Byrd, R. H.; Lu, P.; Nocedal, J.; Zhu, C. A limited memory algorithm for bound constrained optimization. SIAM Journal on Scientific Computing 1995, 16, 1190–1208
- Virtanen et al. 2020 Virtanen, P.; Gommers, R.; Oliphant, T. E.; Haberland, M.; Reddy, T.; Cournapeau, D.; Burovski, E.; Peterson, P.; Weckesser, W.; Bright, J.; others SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods 2020, 17, 261–272
- Peng et al. 1996 Peng, C.; Ayala, P. Y.; Schlegel, H. B.; Frisch, M. J. Using redundant internal coordinates to optimize equilibrium geometries and transition states. Journal of Computational Chemistry 1996, 17, 49–56
- Lee-Ping and Song 2016 Lee-Ping, W.; Song, C. Geometry optimization made simple with translation and rotation coordinates. The Journal of Chemical Physics 2016, 144
- Bakken and Helgaker 2002 Bakken, V.; Helgaker, T. The efficient optimization of molecular geometries using redundant internal coordinates. The Journal of Chemical Physics 2002, 117, 9160–9174
- Wilson et al. 1980 Wilson, E. B.; Decius, J. C.; Cross, P. C. Molecular vibrations: the theory of infrared and Raman vibrational spectra; Courier Corporation, 1980
- Mardirossian and Head-Gordon 2014 Mardirossian, N.; Head-Gordon, M. B97X-V: A 10-parameter, range-separated hybrid, generalized gradient approximation density functional with nonlocal correlation, designed by a survival-of-the-fittest strategy. Physical Chemistry Chemical Physics 2014, 16, 9904–9924
- Weigend and Ahlrichs 2005 Weigend, F.; Ahlrichs, R. Balanced basis sets of split valence, triple zeta valence and quadruple zeta valence quality for H to Rn: Design and assessment of accuracy. Physical Chemistry Chemical Physics 2005, 7, 3297–3305
- Schmidt et al. 1985 Schmidt, M. W.; Gordon, M. S.; Dupuis, M. The intrinsic reaction coordinate and the rotational barrier in silaethylene. Journal of the American Chemical Society 1985, 107, 2585–2589
- Epifanovsky et al. 2021 Epifanovsky, E.; Gilbert, A. T.; Feng, X.; Lee, J.; Mao, Y.; Mardirossian, N.; Pokhilko, P.; White, A. F.; Coons, M. P.; Dempwolff, A. L.; others Software for the frontiers of quantum chemistry: An overview of developments in the Q-Chem 5 package. The Journal of Chemical Physics 2021, 155, 084801
- Birkholz and Schlegel 2015 Birkholz, A. B.; Schlegel, H. B. Using bonding to guide transition state optimization. Journal of Computational Chemistry 2015, 36, 1157–1166
- Casewit et al. 1992 Casewit, C.; Colwell, K.; Rappe, A. Application of a universal force field to organic molecules. Journal of the American Chemical Society 1992, 114, 10035–10046
- Stewart 2007 Stewart, J. J. Optimization of parameters for semiempirical methods V: Modification of NDDO approximations and application to 70 elements. Journal of Molecular Modeling 2007, 13, 1173–1213
- Baker and Chan 1996 Baker, J.; Chan, F. The location of transition states: A comparison of Cartesian, Z-matrix, and natural internal coordinates. Journal of Computational Chemistry 1996, 17, 888–904
- Marks and Gomes 2024 Marks, J.; Gomes, J. thegomeslab/fsm: Data for "Incorporation of Internal Coordinates Interpolation into the Freezing String Method". 2024; \urlhttps://zenodo.org/doi/10.5281/zenodo.12735423