This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Incorporation of Internal Coordinates Interpolation into the Freezing String Method

Jonah Marks    Joseph Gomes joe-gomes@uiowa.edu [
Abstract

We present an improved method for determining guess structures for transition state searches by incorporating internal coordinates interpolation into the freezing string method (FSM). We test our method on over 40 reactions across 3 benchmark datasets covering a diverse set of chemical reactions. Our results show that incorporation of internal coordinates interpolation improves the computational efficiency of the FSM by enabling the use of larger interpolation step sizes and fewer optimization steps per cycle while maintaining 100% success rate on benchmark chemical reaction test cases including systems where previous attempts based on linear synchronous transit interpolation have failed. We provide an open-source Python implementation of the FSM, in addition to the reactant, product, transition state structures of all reactions studied.

keywords:
Chemical calculations, Chemical reactions, Optimization, Transition states

Iowa] Department of Chemical and Biochemical Engineering
University of Iowa, Iowa City, United States \abbreviationsMEP,TS,FSM,LST,RIC

{tocentry}
[Uncaptioned image]

Reaction coordinate diagram of the bicyclo[1.1.0]butane ring opening reaction to trans-butadiene.

1 Introduction

The determination of reaction mechanisms is an important step for chemists in the prediction of thermodynamics and kinetic properties of chemical reactions. In computational chemistry, the reaction mechanism is typically represented as a pathway on the Born-Oppenheimer potential energy surface (PES) of the system of interest given by a suitable potential energy function of the nuclear coordinates. The equilibrium states correspond to stationary points on the PES with zero first order derivatives (gradient) in all directions and all positive eigenvalues of the second order derivative (Hessian) matrix. The equilibrium states can be identified by energy minimization given a suitable initial guess structure. There exist many pathways by which equilibrium states may interconvert; however, only a small subset of these pathways are thermally accessible and relevant to thermodynamic and kinetic analysis of reaction mechanisms. The minimum energy path (MEP) is the route that needs the least amount of potential energy for the system to undergo the transition. The transition states (TSs), based on transition state theory, are first order saddle points with zero gradient and only one negative eigenvalue of the Hessian matrix. The MEP connecting two equilibrium states must go through one or more TSs and serves as a representative reaction path.

The first step in identifying the MEP is typically locating the minimum energy TS connecting the equilibrium states of interest. Determination of TS geometries is a computationally expensive task that frequently requires significant human intervention. Efforts have been made to develop automated processes for finding first order saddle points on PESs while attempting to minimize computational cost. Many modern algorithms take a chain-of-states1 approach to this problem. Geometric interpolation between the reactant and product geometries is performed, producing a string of intermediate structures, or nodes, distributed along the reaction pathway. The resulting structures are optimized using ab-initio methods to approximate the MEP of the reaction. The highest energy structure in the chain-of-states is taken as a guess of the TS geometry. A local surface walking algorithm refines the guess to exact TS geometry. 2, 3, 4, 5, 6, 7, 8, 9, 10 These algorithms require the use of local gradient and Hessian information to locate saddle points, and their success is heavily dependent on the initial guess being located within the desired basin of attraction.

Many such chain-of-states methods that provide accurate TS geometry guesses from reactant and product structures have been previously reported. Details on how intermediate structures are selected and optimized differs between methods, and consequently influences the computational cost and success of the algorithm. The nudged elastic band (NEB) method creates an initial chain-of-states by interpolation between reactant and product, and every intermediate structure along the string is iteratively optimized to lie along the reaction pathway, together with an additional penalty term to maintain an even distribution of structures along the chain-of-states. 11, 12, 13, 14, 15, 16, 17, 18 The growing string method (GSM) in most use cases incurs fewer gradient calculations than the NEB by developing a better initial chain-of-states, or string.19, 20, 21, 22, 23, 24, 25, 26, 27, 28 The GSM creates two strings; one starts at the reactant configuration and the other starts at the product configuration. At each iteration of the growing process, an intermediate structure, or node, is added to each string frontier in the direction of the opposing string, after which the entire string is optimized. This growth process is repeated until the strings meet, providing a better initial chain-of-states, and followed by additional optimization of the unified string. In addition to providing a better initial chain-of-states, this avoids simulation of non-physical structures in the interior nodes which may result from the initial interpolation.

The freezing string method (FSM) further reduces the number of gradient calls but at the cost of identifying the true MEP. 29, 30, 31 Two strings are grown from the product and reactant like the GSM. Once a frontier node is placed, optimization is performed to step the node closer to the reaction pathway, after which it is “frozen”, i.e., it will not move for the remainder of the calculation. New frontier nodes are added to both strings, and the process is repeated. Once the two strings unite, the highest energy structure is taken to be the TS guess without optimization of the unified string. In practice, the resulting pathway can deviate significantly from the MEP but often produces guess structures suitable for further refinement.

The FSM greatly improves the computational efficiency of TS guess finding compared to previously described algorithms, often resulting in an order of magnitude reduction in the required number of electronic structure calculations required to determine the true TS structure. Despite these improvements, the FSM is not a foolproof algorithm. The efficiency of the calculation and ultimately its success heavily depends on the interpolation algorithm used to generate initial guess structures at the frontier of the growing strings. There exist known issues with commonly applied Cartesian coordinate and linear synchronous transit (LST) interpolation techniques32, which can produce high energy or otherwise aphysical molecular geometry structures far from the true MEP which are then subjected to electronic structure calculation and geometry optimization.25, 33 Due to the relatively few number of optimization steps used in the FSM, the incorporation of these geometries as anchor points into future interpolation steps poisons the calculation, resulting in a failed search and wasted computational effort.

In this work, we demonstrate an improved method for initiating TS searches by incorporating internal coordinates (ICs) interpolation into the FSM. Additionally, we incorporate the L-BFGS-B with explicit line search for step size determination which improves reaction pathway optimization steps. We test our method on over 40 reactions across three benchmark datasets covering a broad set of chemical reactions. Our results show that, using previously studied LST interpolation, the incorporation of the L-BFGS-B optimization reduces the computational effort required in previous studies, even considering the additional computational cost incurred due to step size determination by line search. Incorporation of ICs interpolation further improves the computational efficiency of the FSM and successfully locates high-quality TS guess structures where previous attempts based on LST interpolation have failed. We provide an open-source Python implementation of the FSM, in addition to the reactant, product, TS structures of all reactions computed at the ω\omegaB97X-V/def2-TZVP level of theory. We anticipate that these resources will be of broad interest for researchers in computational chemistry studying problems where the fast and reliable location of TSs is important, or those developing algorithms who wish to evaluate their methods on a broad set of realistic chemical problems.

2 Methods

2.1 Overview of the Freezing String Method

Refer to caption
Figure 1: Algorithm flowchart describing the freezing string method. ss is the interpolation step size and NoptN_{\mathrm{opt}} is the maximum number of optimization steps per interpolation step.

A flowchart of the FSM algorithm is shown in Figure 1. The goal of the FSM is to produce an approximate reaction pathway connecting two given reactant and product structures. The approximate reaction pathway is represented as a chain-of-states consisting of nodes along a parameterized string. The interior nodes on the string represent intermediate geometries along the approximate reaction pathway. The method evolves the string by alternately adding nodes to the reactant and product sides of a growing string. The new reactant and product side structures are generated by taking a step inward along an interpolated path between frontier nodes. After interpolation, the structures undergo geometry optimization in the direction perpendicular to the approximate reaction pathway. Optimization of these new frontier structures is performed, and then the geometries are frozen. These steps are repeated until the reactant and product sides of growing string meet. The highest energy node along the pathway is chosen as the TS guess structure that is refined to the true TS geometry using a local surface walking optimization algorithm.

The interpolation step is performed by adding nodes at a user-defined distance, ss, from the reactant and product frontier nodes. During the first evolution step, the frontier nodes are the user-defined reactant and product structures, while subsequent steps use the frozen nodes from the previous iteration as anchor points for interpolation. The interpolation step size ss is fixed during the calculation, and in this work, we determine ss by dividing the arc length of an interpolated path connecting the initial reactant and product structures by a user-defined nominal number of nodes, NnodesN_{\mathrm{nodes}}. The interpolation can be performed using many different coordinate systems. We consider both LST interpolation and linear interpolation in redundant internal coordinates. Tangent directions at the new reactant and product frontier nodes are determined from a cubic spline fitted through the chemical structures of the interpolated pathway.

During optimization, the energy is minimized by assumption of the local quadratic approximation:

E(𝒙)=E(𝒙𝟎)+(𝒙𝒙𝟎)T𝒈+12(𝒙𝒙𝟎)T𝑯(𝒙𝒙𝟎)E(\boldsymbol{x})=E(\boldsymbol{x_{0}})+(\boldsymbol{x}-\boldsymbol{x_{0}})^{T}\boldsymbol{g}^{\perp}+\frac{1}{2}(\boldsymbol{x}-\boldsymbol{x_{0}})^{T}\boldsymbol{H}(\boldsymbol{x}-\boldsymbol{x_{0}}) (1)

Here, 𝒙𝟎\boldsymbol{x_{0}} is the current geometry, 𝒙\boldsymbol{x} is a proposed geometry, 𝒈\boldsymbol{g}^{\perp} is the perpendicular gradient, and 𝑯\boldsymbol{H} is the approximate Hessian in the space perpendicular to the approximate reaction pathway. The perpendicular gradient is given by 𝒈=(𝑰𝒕^𝒕^T)𝒈\boldsymbol{g}^{\perp}=(\boldsymbol{I}-\boldsymbol{\hat{t}}\boldsymbol{\hat{t}}^{T})\boldsymbol{g} where 𝒕^\boldsymbol{\hat{t}} is the normalized tangent vector determined during the interpolation step.

The optimization of equation 1 is performed using the L-BFGS-B algorithm34 as implemented in the SciPy Python library.35 We place bounds on each Cartesian coordinate such that no coordinate is displaced by more than 0.3 Å during a single optimization step. Each optimization step is begun by performing a backtracking line search, where at most NlsN_{\mathrm{ls}} (often one) energy and gradient calculations are performed to determine an appropriate step size. The optimization proceeds until a specified convergence criteria is satisfied, or a maximum number of step NoptN_{\mathrm{opt}} has been performed.

2.2 Redundant internal coordinates

Cartesian coordinates 𝒙=(x1,x2,,x3N)\boldsymbol{x}=(x_{1},x_{2},\dots,x_{3N}) are used to specify the positions of NN atoms in a molecule and are a necessary input for electronic structure calculations. Internal coordinates (ICs) 𝒒=(q1(𝒙),q2(𝒙),,qn(𝒙))\boldsymbol{q}=(q_{1}(\boldsymbol{x}),q_{2}(\boldsymbol{x}),\dots,q_{n}(\boldsymbol{x})) are functions of the Cartesian coordinates that better describe the collective motions of atoms. There exists several choices for constructing coordinate systems when performing interpolation between two molecular geometries. The definition of the redundant internal coordinates used in this work have been reported previously 36, 37. Redundant internal coordinate sets are constructed separately for reactant and product molecules, and the union of these two sets gives the full set of redundant coordinates. Bonds, angles, linear angles, torsion angles, and out-of-plane angles are assigned based on a procedure outlined in Bakken and Helgaker 38, and the full set of coordinates is further pruned to ensure each internal coordinate is well-defined for both reactant and product molecules.

2.3 ICs interpolation

Given a set of nn primitive redundant coordinates 𝒒\boldsymbol{q}, a set of structures 𝒒i(f)\boldsymbol{q}^{i}(f) are produced by linear interpolation

𝒒i(f)=(1f)𝒒R+f𝒒P\boldsymbol{q}^{i}(f)=(1-f)\boldsymbol{q}^{R}+f\boldsymbol{q}^{P} (2)

where ff is the interpolation parameter and 𝒒R\boldsymbol{q}^{R} and 𝒒P\boldsymbol{q}^{P} represent the reactant and product internal coordinate values. The displacements 𝚫𝒒\boldsymbol{\Delta q} in 𝒒\boldsymbol{q} are related by the the well-known BB matrix39 𝚫𝒒=𝑩𝚫𝒙\boldsymbol{\Delta q}=\boldsymbol{B}\boldsymbol{\Delta x} valid for small Cartesian displacements 𝚫𝒙\boldsymbol{\Delta x}. The final step is to transform the geometries along the interpolated path from internal coordinates back to Cartesians. Given a target geometry 𝒒i\boldsymbol{q}^{i} in internal coordinates, this is done iteratively using the formula

𝒙k+1=𝒙k+(𝑩(𝒙k)T)1[𝒒i𝒒(𝒙k)].\boldsymbol{x}_{k+1}=\boldsymbol{x}_{k}+(\boldsymbol{B}(\boldsymbol{x}_{k})^{T})^{-1}[\boldsymbol{q}^{i}-\boldsymbol{q}(\boldsymbol{x}_{k})]. (3)

The iteration is terminated when the Cartesian coordinates generated on the (k+1k+1)th iteration 𝒙k+1\boldsymbol{x}_{k+1} are identical to those on the kkth iteration 𝒙k\boldsymbol{x}_{k} within a tolerance 10710^{-7} Å. The transformation of all 𝒒i(f)\boldsymbol{q}^{i}(f) results in a chain-of-states 𝒙c(f)\boldsymbol{x}^{c}(f) connecting the current frontier reactant and product nodes that is used during the interpolation step of the FSM.

2.4 LST interpolation

LST is another chemically realistic interpolation method that produces a chain-of-states pathway.32 A set of structures 𝒓i(f)\boldsymbol{r}^{i}(f) are produced by linear interpolation in a set of internuclear distances 𝒓i={rab;a>b=1,2,,N}\boldsymbol{r}^{i}=\{r_{ab};a>b=1,2,\dots,N\}

𝒓i(f)=(1f)𝒓R+f𝒓P\boldsymbol{r}^{i}(f)=(1-f)\boldsymbol{r}^{R}+f\boldsymbol{r}^{P} (4)

where ff is the interpolation parameter and 𝒓R\boldsymbol{r}^{R} and 𝒓P\boldsymbol{r}^{P} represent the reactant and product internuclear distances. A set of structures 𝒙i(f)\boldsymbol{x}^{i}(f) are produced by linear interpolation in Cartesian coordinates

𝒙i(f)=(1f)𝒙R+f𝒙P\boldsymbol{x}^{i}(f)=(1-f)\boldsymbol{x}^{R}+f\boldsymbol{x}^{P} (5)

The final interpolated pathway is determined by minimizing the objective SS by the method of least-squares at each interpolation point ff

S=a>bN(rabirabc)2(rabi)4+wj=13N(xjixjc)2S=\sum_{a>b}^{N}\frac{(r_{ab}^{i}-r_{ab}^{c})^{2}}{(r_{ab}^{i})^{4}}+w\sum_{j=1}^{3N}(x_{j}^{i}-x_{j}^{c})^{2} (6)

where ww is a weighting factor (nominally 10610^{-6}) and the superscripts ii and cc denote interpolated and calculated values, respectively. The variables 𝒙c\boldsymbol{x}^{c} are those optimized during the minimization of SS, and the variables 𝒓c\boldsymbol{r}^{c} are derived from 𝒙c\boldsymbol{x}^{c}. The denominator in the first term ensures that important distances between bonded atoms are preserved in the final interpolated geometries. The second term is weighted by a factor ww, and produces forces to prevent rigid translation or rotation such that the interpolated structures align with the end structures. The weighting factor ww additionally weights within the objective function the relative importance of linear interpolation in internuclear distances (first term) and linear interpolation in Cartesian coordinates (second term). The optimization of equation 6 is performed using the L-BFGS-B algorithm34 as implemented in the SciPy Python library. The optimization of SS for all ff results in a chain-of-states 𝒙c(f)\boldsymbol{x}^{c}(f) connecting the current frontier reactant and product nodes that is used during the interpolation step of the FSM.

2.5 Computational details

Electronic structure calculations are performed to produce the optimized geometries of the reactant and product for each reaction studied, compute the quantum mechanical gradients required by the FSM algorithm, as well as to refine the geometry of the TS guess structures produced by the FSM. All electronic structure calculations were performed using the range-separated, hybrid generalized gradient approximation with non-local correlation exchange-correlation function ω\omegaB97X-V40 using the triple-ζ\zeta, polarized valence def2-TZVP basis set.41 During geometry optimization, energies were converged to 10-6 Ha (Hartree) and the maximum norm of the Cartesian gradient was converged to 10-3 Ha bohr-1. Geometry optimization was terminated if the convergence criteria were not met within 250 optimization steps. All reported energy values are electronic energy without thermal or zero-point correction. The eigenvector-following local TS searches were performed using the partitioned rational function optimization method.5 Frequency analysis was performed to confirm the nature of each stationary point: there must be zero imaginary frequencies for PES minima and exactly one imaginary frequency for PES TSs. Intrinsic reaction coordinate (IRC) pathway calculations initiated at the TS were performed with the predictor-corrector algorithm of Schmidt et al. 42 to further characterize TS geometries. The FSM software is written in Python and performs file-based data exchange with an electronic structure software package to obtain the quantum mechanical gradients used in the method. All QM calculations were performed using a release version of the Q-Chem 6.0 software package.43

3 Results and Discussion

We measure benchmark performance by tracking the number of quantum mechanical gradient calculations performed as part of the FSM and TS search procedure, as well as the total number of gradient calculations. We report separately the total number of gradients computed during FSM calculation, which measures the efficiency at which a TS guess is obtained, and the total number of gradients computed during the TS calculation, which serves as an indicator for TS guess quality. We compare FSM calculations with either redundant internal coordinates (FSM-RIC) or linear synchronous transit (FSM-LST) interpolation methods. We perform FSM calculations with nominally 18 nodes (NnodesN_{\mathrm{nodes}}) along the approximate reaction pathway, two steps per optimization cycle (NoptN_{\mathrm{opt}}), and at most three line search steps per optimization cycle (NlsN_{\mathrm{ls}}) unless otherwise stated. We consider this set of parameters to be a conservative baseline where both the FSM-RIC and FSM-LST should perform well.

3.1 Sharada test set

ID Reaction Gradients
Gradients (FSM-RIC)
Gradients
Gradients
Gradients
Gradients
(TOTAL-LST)
1 H2CO \rightarrow H2 + CO 58 13 71 62 11 73
2 SiH2 + H2 \rightarrow SiH4 55 5 60 53 4 57
3 acetaldehyde Keto-enol tautomerism 63 3 66 61 4 65
4 CH3CH3 \rightarrow CH2CH2 + H2 58 24 82 52 21 73
5 bicyclo[1.1.0]butane \rightarrow trans-butandiene 61 47 108 58 25 83
6 parent Diels-Alder cycloaddition reaction 62 15 77 53 33 86
7 cis,cis-2,4-hexadiene \rightarrow 3,4-dimethylcyclobutene 60 18 78 79 25 104
8 alanine dipeptide rearrangment 61 64 125 45 45 90
9 silyl ketene acetal \rightarrow silyl ester Ireland-Claisen rearrangement 60 82 142 60 88 148
Table 1: Comparison of performance of the FSM-RIC and FSM-LST for TS guess structure generation and TS search on the Sharada benchmark set. Performance is measured based on the total number of gradient evaluations required to achieve FSM and TS search convergence. The method (FSM-RIC or FSM-LST) requiring the fewest gradient evaluations is highlighted in bold text.

A test suite consisting of nine reactions curated by Sharada et al. 30 is chosen as the first benchmark. The Sharada test set contains a broad set of bond formation, dissociation, ring-opening, and isomerization reactions. The description of the nine reactions in the Sharada test set is presented in Table 3.1. This choice of benchmark suite is significant due to its previous use benchmarking previous implementations of the FSM30, 24. For each reaction, we highlight in bold the method (FSM-RIC or FSM-LST) that locates the TS geometry in fewest calculations.

The comparison between the required gradients calls for FSM and TS calculations using RIC or LST interpolation on the Sharada test set is presented in Table 3.1. Both FSM-RIC and FSM-LST calculations successfully locate the exact benchmark TS structure without user intervention when performed with the conservative baseline parameters. The FSM-RIC and FSM-LST perform similarly, on average, producing a TS guess structure after 60 and 58 gradient evaluations, respectively. The resulting TS guess structures are of similar quality, as indicated by the average number of gradient evaluations required for TS structure refinement. The FSM-RIC TS guess structure requires on average 30 gradient evaluations for subsequent optimization, and the FSM-LST TS guess structures require on average 28 gradient evaluations for optimization.

Though the choice of exchange-correlation method and basis set in the original study by Sharada et al. 30 precludes the direct comparison to the current work, we can qualitatively compare their reported gradient evaluation counts to ours. The average number of gradient calls required for the FSM-LST in its original implementation is 53, noticeably fewer than required by our implementation of FSM-LST, when performed with similar FSM parameter settings. The two implementations differ notably in the how each method chooses the step size for node-level optimization. Whereas the original work uses the BFGS algorithm and a heuristic for step size determination based on available local PES information, our method uses the L-BFGS-B algorithm together with a step size determined by explicit line search. There is a small increase in the number of gradient calls required during the FSM calculation due to the explicit line search requiring additional electronic structure calculations at each optimization step. The average number of gradient calls required for subsequent TS optimization in the previous FSM implementation is 64, whereas our method requires on average 28 gradient calls for TS optimization. We can reasonably conclude that, although our optimization method requires more gradient calls due to explicit line search, our TS guesses are of higher quality and converge to the correct TS structure in overall fewer gradient calls on average.

3.2 Birkholz test set

ID Reaction Gradients
Gradients (FSM-RIC)
Gradients
Gradients
Gradients
Gradients
(TOTAL-LST)
1 C2H4 + N2O \rightarrow C2N2O 69 7 76 61 48 109
2 1,3-pentadiene hydrogen transfer 63 5 68 68 6 74
3 HCN \rightarrow HNC 78 3 81 72 9 81
4 1,4-hexadiene Cope rearrangement 56 27 83 60 12 72
5 1,3-cyclopentadiene hydrogen shift 62 5 67 64 7 71
6 1,3-butadiene cyclization 60 6 66 66 8 74
7 Diels-Alder endo addition of cyclopentadiene to cyclopentadiene 62 50 112 57 52 109
8 Diels-Alder addition of cyclopentadiene and ethylene 62 11 73 63 11 74
9 difluorocarbene addition to ethylene 56 24 80 50 12 62
10 ene reaction of ethylene and propene 49 42 91 51 47 98
11 Grignard addition of phenyl magnesium bromide to benzophenone 53 38 91 59 49 108
12 H2CO \rightarrow H2 + CO 58 13 71 62 11 73
13 CH3CH2F \rightarrow CH2CH2 + HF 52 9 61 67 9 76
14 water assisted hydrolysis of ethyl acetate 42 60 102 56 86 142
15 H2 + H2CO \rightarrow CH3OH 52 8 60 61 17 78
16 2-methyl-3-phenyloxirane ring opening 58 48 106 59 36 95
17 CH2CHCH2CH2CHO Claisen rearrangement 57 31 88 61 30 91
18 SiH2 + H2 \rightarrow SiH4 55 5 60 53 4 57
19 Cl- + CH3F \rightarrow CH3Cl + F- 56 4 60 60 3 63
20 sulfur dioxide addition to butadiene 63 21 84 65 22 87
Table 2: Comparison of performance of the FSM-RIC and FSM-LST for TS guess structure generation and TS search on the Birkholz benchmark set. Performance is measured based on the total number of gradient evaluations required to achieve FSM and TS search convergence. The method (FSM-RIC or FSM-LST) requiring the fewest gradient evaluations is highlighted in bold text.

A test suite consisting of twenty reactions curated by Birkholz and Schlegel 44 is chosen as the second benchmark. The Birkholz test set contains examples of many types of organic reactions, including insertions, additions, eliminations, hydrolysis, ring-opening, substitutions, cycloadditions, and rearrangements. The description of the twenty reactions in the Birkholz test set is presented in Table 3.2. This data set shares two identical test cases with the Sharada benchmark set (H2CO \rightarrow H2 + CO and SiH2 + H2 \rightarrow SiH4), resulting in 18 new, unique test cases to evaluate. For each reaction, we highlight in bold the method that locates the TS geometry in fewest calculations.

The comparison between the required gradient calls for FSM and TS calculations using RIC or LST interpolation on the Birkholz test set is presented in Table 3.2. Both the FSM-RIC and FSM-LST successfully find the exact benchmark TS structure for each test case given the same conservative baseline parameters. The FSM-RIC and FSM-LST again perform similarly, on average, producing a TS guess structure after 58 and 61 gradient evaluations, respectively. The resulting TS guess structures are of similar quality; the FSM-RIC TS guess structure requires on average 21 gradient evaluations for subsequent optimization, and the FSM-LST TS guess structures require on average 24 gradient evaluations for optimization. Of the 18 test cases unique to the Birkholz benchmark set, the FSM-RIC finds the final TS structure in fewer or equal number of gradient evaluations compared to the FSM-LST in 14 of the test cases.

One notable difference between the performance of the FSM-RIC and FSM-LST on the Birkholz benchmark set occurs in the case of the water-assisted hydrolysis of ethyl acetate to acetic acid and ethanol. Here, an additional water molecule assists in the hydrolysis reaction by proton shuttling which significantly stabilizes the TS structure in comparison to the unassisted reaction mechanism. The TS search for both structures requires a greater number of additional gradient evaluations when compared to other, simpler reactions within the test set. Both the TS guess structure generation and subsequent TS optimization require fewer gradient evaluations when using internal coordinates-based interpolation steps, resulting in a total of 102 and 142 gradient calls required to locate the final TS structure for FSM-RIC and FSM-LST techniques, respectively.

The Birkholz benchmark data set was originally developed by Birkholz and Schlegel 44 to test their proposed Connectivity Transition State (CTS) algorithm for determining TS guess structure geometry and TS optimization. In the CTS method, interpolation in redundant internal coordinates limited to bond making and breaking coordinates is performed, and optimization of the interpolated structure is performed on UFF molecular mechanics45 or PM6 semiempirical electronic structure46 PESs. Their method also considers different strategies for approximating the initial Hessian used in the subsequent optimization, whereas our TS search calculations are initialized with the exact Hessian calculated at the TS guess structure. This strategy of relying on computationally cheaper levels of theory for optimization results in fewer gradient evaluations overall on the desired PES. This is in contrast to the present work, where the high-level ω\omegaB97X-V/def2-TZVP model chemistry is applied in all calculations. While the CTS method is reported to find the final TS structures in significantly fewer high-level gradient evaluations, we report 100% success rate for both the FSM-RIC and FSM-LST evaluated on the Birkholz benchmark set, whereas the best reported CTS methodology reaches 95% success rate.

3.3 Baker test set

ID Reaction Gradients
Gradients (FSM-RIC)
Gradients
Gradients
Gradients
Gradients
(TOTAL-LST)
1 HCN \rightarrow HNC 78 3 81 72 9 81
2 HCCH \rightarrow CCH2 61 11 72 63 4 67
3 H2CO \rightarrow H2 + CO 58 13 71 62 11 73
4 CH3O \rightarrow CH2OH 52 5 57 48 7 55
5 cyclopropyl ring opening 59 11 70 62 25 87
6 bicyclo[1.1.0]butane \rightarrow trans-butandiene 61 47 108 58 25 83
7 formyloxyethyl 1,2-migration 63 14 77 62 11 73
8 parent Diels-Alder cycloaddition 62 15 77 53 33 86
9 s-tetrazine \rightarrow 2HCN + N2 76 7 83 80 15 95
10 trans-butadiene \rightarrow cis-butadiene 63 2 65 69 2 71
11 CH3CH3 \rightarrow CH2CH2 + H2 58 24 82 52 21 73
12 CH3CH2F \rightarrow CH2CH2 + HF 52 9 61 67 9 76
13 acetaldehyde keto-enol tautomerism 63 3 66 61 4 65
14 HCOCl \rightarrow HCl + CO 65 5 70 67 6 73
15 H2O + PO3{}_{3}^{-} \rightarrow H2PO4{}_{4}^{-} 55 21 76 48 24 72
16 CH2CHCH2CH2CHO Claisen rearrangement 57 31 88 61 30 91
17 SiH2 + CH3CH3 \rightarrow SiH3CH2CH3 55 5 60 53 10 63
18 HNCCS \rightarrow HNC + CS 72 8 80 53 9 62
19 HCONH+3{}_{3}^{+} \rightarrow NH+4{}_{4}^{+} + CO 65 18 83 99 17 116
20 acrolein rotational TS 63 2 65 59 13 72
21 HCONHOH \rightarrow HCOHNHO 57 12 69 62 5 67
22 HNC + H2 \rightarrow H2CNH 36 18 54 66 15 81
23 H2CNH \rightarrow HCNH2 30 6 36 66 3 69
24 HCNH2 \rightarrow HCN + H2 60 41 101 53 26 79
Table 3: Comparison of performance of the FSM-RIC and FSM-LST for TS guess structure generation and TS search on the Baker benchmark set. Performance is measured based on the total number of gradient evaluations required to achieve FSM and TS search convergence. The method (FSM-RIC or FSM-LST) requiring the fewest gradient evaluations is highlighted in bold text. Italicized text indicates a failed search or a search where the wrong TS was located.

A test suite consisting of twenty-five reactions originally compiled by Baker and Chan 47 is selected as the third and final benchmark set. The reactions in the test suite again cover a broad set of chemical reaction archetypes, including dissociation, insertion, rearrangement, ring-opening, and rotation reactions. A description of the reactions contained within the test suite, together with TS optimization results, is given in Table 3.3. This data set shares eight identical test cases with the Sharada and Birkholz benchmark sets, resulting in 16 new, unique test cases for evaluation. For each reaction, we highlight in bold the method that locates the TS in fewest calculations, and highlight in italics any method resulting in a failed search or a search where the wrong TS was located.

The comparison between the required gradient calls for FSM and TS calculations using RIC or LST interpolation on the Baker test set is presented in Table 3.3. The FSM-RIC and FSM-LST perform similarly, on average, producing a TS guess structure after 59 and 62 gradient evaluations, respectively, and both methods requiring on average 14 gradient evaluations for TS optimization. Of the 16 test cases unique to the Baker benchmark set, the FSM-RIC finds the final TS molecular geometry in fewer gradient evaluations compared to the FSM-LST in 12 of the test cases. There are two test cases where the TS optimization based on the FSM-RIC requires sizably more gradient evaluations than the equivalent optimization based on FSM-LST, bicyclobutane ring-opening and methylamine dehydrogenation.

The FSM-RIC successfully finds the exact benchmark TS structure for each test case in the Baker benchmark set. The FSM-LST, however, fails to find the correct TS structure in two of the 16 test cases unique to the Baker benchmark set, as highlighted in italic text in Table 3.3. In the case of cyclopropyl radical ring opening reaction, the FSM-RIC TS guess structure is optimized to the correct TS geometry as confirmed by computed TS mode imaginary frequency -1052 cm-1 and forward barrier height 53.8 kcal/mol. The FSM-LST TS guess is optimized to a planar C3 molecule with TS mode corresponding to out-of-plane motion of the central carbon atom and associated imaginary frequency -659 cm-1 and forward barrier height 103.1 kcal/mol. In the case of methanimine isomerization to aminomethylene (H2CNH \rightarrow HCNH2), the FSM-RIC TS guess structure is optimized to the correct TS geometry where the migrating hydrogen atom moves slightly out-of-plane. The TS geometry is confirmed by computed TS mode imaginary frequency -2210 cm-1 and forward barrier height 85.4 kcal/mol. The FSM-LST TS guess structure is optimized in three steps to a second-order saddle point associated with a planar transition geometry. The TS geometry is confirmed to have two vibrational modes with associated imaginary frequencies -2222 cm-1 and -472 cm-1 and forward barrier height 87.3 kcal/mol. The first vibrational mode corresponds to the correct TS mode while the second vibrational mode is associated with out-of-plane motion.

3.4 Ablation Study

Refer to caption
Figure 2: Ablation study of the FSM-RIC for TS guess structure generation and TS search on the Baker benchmark set. Performance is measured based on the average number of gradient evaluations required to achieve TS optimization convergence, and is separated to indicate gradient evaluation used for guess structure generation (FSM) and TS search (TS). (Left) Performance of the FSM-RIC when the minimum number of FSM images (Nnodes) is varied from nine to 18. (Center) Performance of the FSM-RIC when the number of optimization steps per interpolation step (Nopt) is varied from one to three. (Right) Performance of the FSM-RIC when the number of line search steps per optimization step (Nls) is varied from two to four.

We use the Baker benchmark set for further investigation of the effects of modifying the FSM algorithm parameters on the performance of the FSM-RIC. Figure 2 shows the average number of gradient calls required for FSM-RIC calculation and TS optimization per Baker benchmark set test case across three separate ablation studies. In the left panel of Figure 2, we show the results of our first ablation study where the minimum number of string nodes (NnodesN_{\mathrm{nodes}}) is varied while other optimization parameters are held fixed. We find that decreasing NnodesN_{\mathrm{nodes}} from 18 to nine results in a significantly reduced number of gradient calculations for TS guess structure determination, and observe little variation overall in the number of gradient evaluations required for TS optimization. The central and right panels of Figure 2 show the results of the ablation studies where NnodesN_{\mathrm{nodes}} is held fixed and the maximum number of optimization steps per interpolation step (NoptN_{\mathrm{opt}}) and the maximum number of line search steps taken for each optimization step (NlsN_{\mathrm{ls}}), respectively, are varied around the original setting. We find that reducing NoptN_{\mathrm{opt}} gives further improvement in the number of required gradient evaluations for TS guess structure determination, while having little effect on the quality of the TS guess structure as indicated by the required gradient evaluations for TS optimization. We find that the required number of gradient evaluations for TS guess structure determination is insensitive to NlsN_{\mathrm{ls}}, indicating that though we allow up to three line search steps per optimization step, we often require fewer in practice.

The results of the ablation study suggest that we can significantly reduce the computational cost of TS finding by the FSM-RIC, while maintaining a high success rate, by taking larger interpolation steps (smaller NnodesN_{\mathrm{nodes}}) and fewer optimization steps per interpolation step. We perform FSM calculations with nominally 9 nodes along the approximate reaction pathway (NnodesN_{\mathrm{nodes}}), one step per optimization cycle (NoptN_{\mathrm{opt}}), and at most three line search steps per optimization cycle (NlsN_{\mathrm{ls}}). The comparison between the required gradient calls for FSM and TS calculations using RIC or LST interpolation on the Baker benchmark set with these modified settings is presented in Table 3.4. The FSM-RIC successfully locates the exact benchmark TS structure in fewer required gradient evaluations for each test case. The average required number of gradient evaluations to produce a TS guess structure is 19 and, similar to the more conservative operational settings, the TS guess structures require on average 20 gradient evaluations for optimization. In contrast, the FSM-LST fails to locate the benchmark TS structure in seven of the 24 total test cases present in the Baker benchmark set. The average required number of gradient evaluations to produce a TS guess structure is 19, but TS optimization requires on average 31 gradient evaluations, indicating that the TS guess structures are of worse quality compared with FSM-LST calculations run with more conservative settings. A typical course of action upon TS calculation failure is to re-run the calculation with more conservative settings. This, in practice, would increase the average gradient evaluation requirements for successfully determining a TS structure and can possibly negate the computational cost savings obtained by running the FSM-LST algorithm at more aggressive settings. These results highlight an important finding of this work, that the incorporation of RICs permits the use of larger interpolation steps during TS guess determination, resulting in fewer gradient calculations, while maintaining a good overall success rate when compared with LST interpolation.

ID Reaction Gradients
Gradients (FSM-RIC)
Gradients
Gradients
Gradients
Gradients
(TOTAL-LST)
1 HCN \rightarrow HNC 22 4 26 21 7 28
2 HCCH \rightarrow CCH2 21 9 30 23 10 33
3 H2CO \rightarrow H2 + CO 18 37 55 17 53 70
4 CH3O \rightarrow CH2OH 16 7 23 16 8 24
5 cyclopropyl ring opening 20 14 34 22 61 83
6 bicyclo[1.1.0]butane \rightarrow trans-butandiene 20 74 94 20 32 52
7 formyloxyethyl 1,2-migration 24 11 35 14 62 76
8 parent Diels-Alder cycloaddition 23 21 44 16 30 46
9 s-tetrazine \rightarrow 2HCN + N2 22 8 30 25 40 65
10 trans-butadiene \rightarrow cis-butadiene 22 3 25 22 44 66
11 CH3CH3 \rightarrow CH2CH2 + H2 18 37 55 16 25 41
12 CH3CH2F \rightarrow CH2CH2 + HF 17 15 32 18 16 34
13 acetaldehyde keto-enol tautomerism 20 8 28 19 8 27
14 HCOCl \rightarrow HCl + CO 22 10 32 20 12 32
15 H2O + PO3{}_{3}^{-} \rightarrow H2PO4{}_{4}^{-} 17 34 51 16 28 44
16 CH2CHCH2CH2CHO Claisen rearrangement 20 29 49 20 39 59
17 SiH2 + CH3CH3 \rightarrow SiH3CH2CH3 19 9 28 16 10 26
18 HNCCS \rightarrow HNC + CS 23 8 31 26 10 36
19 HCONH+3{}_{3}^{+} \rightarrow NH+4{}_{4}^{+} + CO 20 35 55 30 1 31
20 acrolein rotational TS 24 4 28 20 17 37
21 HCONHOH \rightarrow HCOHNHO 19 17 36 20 11 31
22 HNC + H2 \rightarrow H2CNH 6 31 37 17 17 34
23 H2CNH \rightarrow HCNH2 9 7 16 19 5 24
24 HCNH2 \rightarrow HCN + H2 20 46 66 19 196 215
Table 4: Comparison of performance of the FSM-RIC and FSM-LST for TS guess structure generation and TS search on the Baker benchmark set using modified FSM settings (Nnodes=9, Nopt=1, Nls=3). Performance is measured based on the total number of gradient evaluations required to achieve FSM and TS search convergence. The method (FSM-RIC or FSM-LST) requiring the fewest gradient evaluations is highlighted in bold text. Italicized text indicates a failed search or a search where the wrong TS geometry was located.

4 Conclusion

We have demonstrated the application of the FSM to TS search and optimization using three diverse benchmark sets containing over 40 chemical reactions. The FSM with either LST and RIC interpolation, when run with previously recommended interpolation stepsize and optimization settings, is a reliable technique for producing TS guess geometries and achieves a high success rate for TS optimization on all chemical reactions studied. More importantly, we have shown how incorporation of RIC interpolation into the FSM algorithm, a novel contribution of this work, enables the use of larger interpolation stepsizes and fewer optimization steps per interpolation step, which in turn reduces the computational cost of TS search and optimization while maintaining 100% success rate. An open-source Python implementation of the FSM algorithm, in addition to the reactant, product, and TS structures of all reactions studied computed at the ω\omegaB97X-V/def2-TZVP level of theory is provided as a contribution of this work.48

{acknowledgement}

J.G acknowledges start up funding from The University of Iowa. This research was supported in part through computational resources provided by The University of Iowa.

References

  • Elber and Karplus 1987 Elber, R.; Karplus, M. A method for determining reaction paths in large molecules: Application to myoglobin. Chemical Physics Letters 1987, 139, 375–380
  • Cerjan and Miller 1981 Cerjan, C. J.; Miller, W. H. On finding transition states. The Journal of Chemical Physics 1981, 75, 2800–2806
  • Schlegel 1982 Schlegel, H. B. Optimization of equilibrium geometries and transition structures. Journal of Computational Chemistry 1982, 3, 214–218
  • Simons et al. 1983 Simons, J.; Joergensen, P.; Taylor, H.; Ozment, J. Walking on potential energy surfaces. The Journal of Physical Chemistry 1983, 87, 2745–2753
  • Baker 1986 Baker, J. An algorithm for the location of transition states. Journal of Computational Chemistry 1986, 7, 385–395
  • Wales 1992 Wales, D. J. Basins of attraction for stationary points on a potential-energy surface. Journal of the Chemical Society, Faraday Transactions 1992, 88, 653–657
  • Wales 1993 Wales, D. J. Locating stationary points for clusters in cartesian coordinates. Journal of the Chemical Society, Faraday Transactions 1993, 89, 1305–1313
  • Banerjee et al. 1985 Banerjee, A.; Adams, N.; Simons, J.; Shepard, R. Search for stationary points on surfaces. The Journal of Physical Chemistry 1985, 89, 52–57
  • Ayala and Schlegel 1997 Ayala, P. Y.; Schlegel, H. B. A combined method for determining reaction paths, minima, and transition state geometries. The Journal of Chemical Physics 1997, 107, 375–384
  • Heyden et al. 2005 Heyden, A.; Bell, A. T.; Keil, F. J. Efficient methods for finding transition states in chemical reactions: Comparison of improved dimer method and partitioned rational function optimization method. The Journal of Chemical Physics 2005, 123, 224101
  • Mills and Jónsson 1994 Mills, G.; Jónsson, H. Quantum and thermal effects in H 2 dissociative adsorption: Evaluation of free energy barriers in multidimensional quantum systems. Physical Review Letters 1994, 72, 1124
  • Henkelman and Jónsson 2000 Henkelman, G.; Jónsson, H. Improved tangent estimate in the nudged elastic band method for finding minimum energy paths and saddle points. The Journal of Chemical Physics 2000, 113, 9978–9985
  • Henkelman et al. 2000 Henkelman, G.; Uberuaga, B. P.; Jónsson, H. A climbing image nudged elastic band method for finding saddle points and minimum energy paths. The Journal of Chemical Physics 2000, 113, 9901–9904
  • Maragakis et al. 2002 Maragakis, P.; Andreev, S. A.; Brumer, Y.; Reichman, D. R.; Kaxiras, E. Adaptive nudged elastic band approach for transition state calculation. The Journal of Chemical Physics 2002, 117, 4651–4658
  • Chu et al. 2003 Chu, J.-W.; Trout, B. L.; Brooks, B. R. A super-linear minimization scheme for the nudged elastic band method. The Journal of Chemical Physics 2003, 119, 12708–12717
  • Trygubenko and Wales 2004 Trygubenko, S. A.; Wales, D. J. A doubly nudged elastic band method for finding transition states. The Journal of Chemical Physics 2004, 120, 2082–2094
  • Kolsbjerg et al. 2016 Kolsbjerg, E. L.; Groves, M. N.; Hammer, B. An automated nudged elastic band method. The Journal of Chemical Physics 2016, 145, 094107
  • Ruttinger et al. 2022 Ruttinger, A. W.; Sharma, D.; Clancy, P. Protocol for directing nudged elastic band calculations to the minimum energy pathway: Nurturing errant calculations back to convergence. Journal of Chemical Theory and Computation 2022, 18, 2993–3005
  • Peters et al. 2004 Peters, B.; Heyden, A.; Bell, A. T.; Chakraborty, A. A growing string method for determining transition states: Comparison to the nudged elastic band and string methods. The Journal of Chemical Physics 2004, 120, 7877–7886
  • Quapp 2005 Quapp, W. A growing string method for the reaction pathway defined by a Newton trajectory. The Journal of Chemical Physics 2005, 122, 174106
  • Goodrow et al. 2008 Goodrow, A.; Bell, A. T.; Head-Gordon, M. Development and application of a hybrid method involving interpolation and ab initio calculations for the determination of transition states. The Journal of Chemical Physics 2008, 129, 174109
  • Goodrow et al. 2009 Goodrow, A.; Bell, A. T.; Head-Gordon, M. Transition state-finding strategies for use with the growing string method. The Journal of Chemical Physics 2009, 130, 244108
  • Goodrow et al. 2010 Goodrow, A.; Bell, A. T.; Head-Gordon, M. A strategy for obtaining a more accurate transition state estimate using the growing string method. Chemical Physics Letters 2010, 484, 392–398
  • Behn et al. 2011 Behn, A.; Zimmerman, P. M.; Bell, A. T.; Head-Gordon, M. Incorporating linear synchronous transit interpolation into the growing string method: Algorithm and applications. Journal of Chemical Theory and Computation 2011, 7, 4019–4025
  • Zimmerman 2013 Zimmerman, P. M. Growing string method with interpolation and optimization in internal coordinates: Method and examples. The Journal of Chemical Physics 2013, 138, 184102
  • Zimmerman 2013 Zimmerman, P. Reliable transition state searches integrated with the growing string method. Journal of Chemical Theory and Computation 2013, 9, 3043–3050
  • Zimmerman 2015 Zimmerman, P. M. Single-ended transition state finding with the growing string method. Journal of Computational Chemistry 2015, 36, 601–611
  • Jafari and Zimmerman 2017 Jafari, M.; Zimmerman, P. M. Reliable and efficient reaction path and transition state finding for surface reactions with the growing string method. Journal of Computational Chemistry 2017, 38, 645–658
  • Behn et al. 2011 Behn, A.; Zimmerman, P. M.; Bell, A. T.; Head-Gordon, M. Efficient exploration of reaction paths via a freezing string method. The Journal of Chemical Physics 2011, 135, 224108
  • Sharada et al. 2012 Sharada, S. M.; Zimmerman, P. M.; Bell, A. T.; Head-Gordon, M. Automated transition state searches without evaluating the Hessian. Journal of Chemical Theory and Computation 2012, 8, 5166–5174
  • Suleimanov and Green 2015 Suleimanov, Y. V.; Green, W. H. Automated discovery of elementary chemical reaction steps using freezing string and Berny optimization methods. Journal of Chemical Theory and Computation 2015, 11, 4248–4259
  • Halgren and Lipscomb 1977 Halgren, T. A.; Lipscomb, W. N. The synchronous-transit method for determining reaction pathways and locating molecular transition states. Chemical Physics Letters 1977, 49, 225–232
  • Smidstrup et al. 2014 Smidstrup, S.; Pedersen, A.; Stokbro, K.; Jónsson, H. Improved initial guess for minimum energy path calculations. The Journal of Chemical Physics 2014, 140
  • Byrd et al. 1995 Byrd, R. H.; Lu, P.; Nocedal, J.; Zhu, C. A limited memory algorithm for bound constrained optimization. SIAM Journal on Scientific Computing 1995, 16, 1190–1208
  • Virtanen et al. 2020 Virtanen, P.; Gommers, R.; Oliphant, T. E.; Haberland, M.; Reddy, T.; Cournapeau, D.; Burovski, E.; Peterson, P.; Weckesser, W.; Bright, J.; others SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods 2020, 17, 261–272
  • Peng et al. 1996 Peng, C.; Ayala, P. Y.; Schlegel, H. B.; Frisch, M. J. Using redundant internal coordinates to optimize equilibrium geometries and transition states. Journal of Computational Chemistry 1996, 17, 49–56
  • Lee-Ping and Song 2016 Lee-Ping, W.; Song, C. Geometry optimization made simple with translation and rotation coordinates. The Journal of Chemical Physics 2016, 144
  • Bakken and Helgaker 2002 Bakken, V.; Helgaker, T. The efficient optimization of molecular geometries using redundant internal coordinates. The Journal of Chemical Physics 2002, 117, 9160–9174
  • Wilson et al. 1980 Wilson, E. B.; Decius, J. C.; Cross, P. C. Molecular vibrations: the theory of infrared and Raman vibrational spectra; Courier Corporation, 1980
  • Mardirossian and Head-Gordon 2014 Mardirossian, N.; Head-Gordon, M. ω\omegaB97X-V: A 10-parameter, range-separated hybrid, generalized gradient approximation density functional with nonlocal correlation, designed by a survival-of-the-fittest strategy. Physical Chemistry Chemical Physics 2014, 16, 9904–9924
  • Weigend and Ahlrichs 2005 Weigend, F.; Ahlrichs, R. Balanced basis sets of split valence, triple zeta valence and quadruple zeta valence quality for H to Rn: Design and assessment of accuracy. Physical Chemistry Chemical Physics 2005, 7, 3297–3305
  • Schmidt et al. 1985 Schmidt, M. W.; Gordon, M. S.; Dupuis, M. The intrinsic reaction coordinate and the rotational barrier in silaethylene. Journal of the American Chemical Society 1985, 107, 2585–2589
  • Epifanovsky et al. 2021 Epifanovsky, E.; Gilbert, A. T.; Feng, X.; Lee, J.; Mao, Y.; Mardirossian, N.; Pokhilko, P.; White, A. F.; Coons, M. P.; Dempwolff, A. L.; others Software for the frontiers of quantum chemistry: An overview of developments in the Q-Chem 5 package. The Journal of Chemical Physics 2021, 155, 084801
  • Birkholz and Schlegel 2015 Birkholz, A. B.; Schlegel, H. B. Using bonding to guide transition state optimization. Journal of Computational Chemistry 2015, 36, 1157–1166
  • Casewit et al. 1992 Casewit, C.; Colwell, K.; Rappe, A. Application of a universal force field to organic molecules. Journal of the American Chemical Society 1992, 114, 10035–10046
  • Stewart 2007 Stewart, J. J. Optimization of parameters for semiempirical methods V: Modification of NDDO approximations and application to 70 elements. Journal of Molecular Modeling 2007, 13, 1173–1213
  • Baker and Chan 1996 Baker, J.; Chan, F. The location of transition states: A comparison of Cartesian, Z-matrix, and natural internal coordinates. Journal of Computational Chemistry 1996, 17, 888–904
  • Marks and Gomes 2024 Marks, J.; Gomes, J. thegomeslab/fsm: Data for "Incorporation of Internal Coordinates Interpolation into the Freezing String Method". 2024; \urlhttps://zenodo.org/doi/10.5281/zenodo.12735423