
Inverse Cubic Iteration

Robert M. Corless
Abstract

There are thousands of papers on rootfinding for nonlinear scalar equations. Here is one more, to talk about an apparently new method, which I call "Inverse Cubic Iteration" (ICI) in analogy to the Inverse Quadratic Iteration in Richard Brent's zeroin method. The possibly new method is based on a cubic blend of tangent-line approximations for the inverse function. We rewrite this iteration for numerical stability as an average of two Newton steps and a secant step: only one new function evaluation and one new derivative evaluation are needed for each step. The total cost of the method is therefore only trivially more than Newton's method, and we will see that it has order $1+\sqrt{3}=2.732\ldots$, thus ensuring that to achieve a given accuracy it usually takes fewer steps than Newton's method while using essentially the same effort per step.

1 Introduction

Recently I wrote, with Erik Postma, a paper on blends of Taylor series [3]. A "blend" is just a two-point Hermite interpolant: if we have a series for $f(x)$ at $x=a$ and another series for the same function $f(x)$ at $x=b$, then we can blend the series to get a (usually) better approximation to $f(x)$ across the interval. The paper discusses how to compute high-order blends stably and efficiently in Maple, and looks at some applications of them.

In contrast, this paper looks at a low-order blend, and a particular application, namely rootfinding. In fact all we will need is a cubic. If we just use series up to and including terms of order $1$ at each point, namely $f(a)$ and $f'(a)$ and $f(b)$ and $f'(b)$, then these four pieces of information determine, as is well known, a cubic Hermite interpolant.

What is less well-known (but still perfectly well-known, to be honest) is that from these same pieces of information one can usually construct a cubic interpolant of the inverse function $\operatorname{inv}f(y)$, also written $\breve{f}(y)$. The data for that interpolant are that the points of expansion are $y=f(a)$ and $y=f(b)$, the function values are $a$ and $b$, and the derivative values are $1/f'(a)$ and $1/f'(b)$: four pieces of information which together suffice to determine a cubic polynomial in $y$ that fits the data.

Of course we cannot have $f'(a)=0$ or $f'(b)=0$, and indeed $f'(x)=0$ anywhere near the interval is bad news for a rootfinder: the root will be ill-conditioned, if it's determined at all. If $f'(x)=0$ anywhere in the interval then the function is not guaranteed to be monotonic, and so the inverse function need not be defined on the whole neighbourhood. Throwing caution to the winds, we're not going to worry about this at all.

2 Derivation of the Formula

Nearly any numerical analysis textbook will have an expression equivalent to the following for the cubic Hermite interpolant that fits these four pieces of information:

y = (1+2\theta)(\theta-1)^2 f(a) + \theta(\theta-1)^2\, h f'(a)
    + \theta^2(3-2\theta) f(b) + \theta^2(\theta-1)\, h f'(b)   (1)

where $\theta=(x-a)/(b-a)$ and $h=b-a$. When $\theta=0$, $x=a$; when $\theta=1$, $x=b$. Since $d/d\theta=(dx/d\theta)\,d/dx=h\,d/dx$, we see that it all works out: $y=f(a)$ and $dy/d\theta=hf'(a)$ at $\theta=0$, while $y=f(b)$ and $dy/d\theta=hf'(b)$ at $\theta=1$.
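As a quick sanity check of equation (1), here is a small Python sketch (mine, not from the paper; any smooth test function would do). It confirms that the endpoint data are reproduced exactly and that the midpoint error is the familiar $O(h^4)$ one for cubic Hermite interpolation.

import math

def hermite_blend(theta, fa, dfa, fb, dfb, h):
    # Equation (1), written in the variable theta = (x - a)/h with h = b - a.
    return ((1 + 2*theta) * (theta - 1)**2 * fa
            + theta * (theta - 1)**2 * h * dfa
            + theta**2 * (3 - 2*theta) * fb
            + theta**2 * (theta - 1) * h * dfb)

a, b = 1.0, 2.0
h = b - a
f, df = math.sin, math.cos

print(hermite_blend(0.0, f(a), df(a), f(b), df(b), h) - f(a))    # 0.0: endpoint value matched
print(hermite_blend(1.0, f(a), df(a), f(b), df(b), h) - f(b))    # 0.0: endpoint value matched
print(hermite_blend(0.5, f(a), df(a), f(b), df(b), h) - f(1.5))  # about -2.6e-3, the O(h^4) error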

Now we do the inverse. Put $s=(y-f(a))/(f(b)-f(a))$ and $\Delta=f(b)-f(a)$. We see right away that we are in trouble if $f(a)=f(b)$; but then, by Rolle's theorem, there would be a place $c$ between $a$ and $b$ where $f'(c)=0$ if that were the case, and we have already said that we are going to ignore that difficulty. Indeed, both Inverse Quadratic Iteration (IQI) and the secant method also fail in that case, so at least we are in good company.

We want

x = \left((1+2s)\,a + \frac{s\,\Delta}{f'(a)}\right)(1-s)^2
    + \left((3-2s)\,b - \frac{(1-s)\,\Delta}{f'(b)}\right)s^2\,.   (2)

By similar reasoning to before, we see that at $s=0$ we have $x=a$ and $dx/dy=1/f'(a)$, while at $s=1$ we have $x=b$ and $dx/dy=1/f'(b)$.
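Here is an equally small sketch of equation (2) (again my own code and naming), using $f(x)=e^x$ on $[0,1]$ so that the exact inverse is $\ln y$: the endpoints are reproduced, and interior values of the inverse are approximated to a few digits even over this fairly wide interval.

import math

def inverse_blend(s, a, b, dfa, dfb, Delta):
    # Equation (2): the cubic Hermite interpolant of the inverse function,
    # written in s = (y - f(a))/Delta with Delta = f(b) - f(a).
    return (((1 + 2*s)*a + s*Delta/dfa) * (1 - s)**2
            + ((3 - 2*s)*b - (1 - s)*Delta/dfb) * s**2)

f, df = math.exp, math.exp
a, b = 0.0, 1.0
Delta = f(b) - f(a)

print(inverse_blend(0.0, a, b, df(a), df(b), Delta))                 # 0.0, i.e. a
print(inverse_blend(1.0, a, b, df(a), df(b), Delta))                 # 1.0, i.e. b
y = 1.5
s = (y - f(a)) / Delta
print(inverse_blend(s, a, b, df(a), df(b), Delta) - math.log(y))     # about 1.3e-2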

Why do this? The wonderfully simple trick of Inverse Quadratic Iteration, namely that to find an approximate value of $x$ that makes $y=0$ one simply substitutes $y=0$ into the inverse formula, also works here. Putting $y=0$ means $s=(0-f(a))/(f(b)-f(a))$ (a number between $0$ and $1$ when $f(a)$ and $f(b)$ have opposite signs), and that gets us our definite value of $x$. Doing this blindly, we get

\left(\left(1 - 2\,\frac{f(a)}{\Delta}\right)a - \frac{f(a)}{f'(a)}\right)\left(1 + \frac{f(a)}{\Delta}\right)^2
    + \frac{f(a)^2}{\Delta^2}\left(\left(3 + 2\,\frac{f(a)}{\Delta}\right)b - \frac{\Delta}{f'(b)}\left(1 + \frac{f(a)}{\Delta}\right)\right).   (3)

This is correct, but I suspect it is numerically useless. To be honest, I didn't even try it. Instead, I looked for a way to rewrite it to improve its numerical stability.

3 A numerically more stable expression

After a surprisingly small amount of algebraic manipulation (perhaps a generous share of luck was involved) I arrived at the following:

x_{n+1} = \frac{y_n^2}{(y_{n-1}-y_n)^2}\left(x_{n-1} - \frac{y_{n-1}}{f'(x_{n-1})}\right)
        + \frac{y_{n-1}^2}{(y_{n-1}-y_n)^2}\left(x_n - \frac{y_n}{f'(x_n)}\right)
        - \frac{2\,y_{n-1}y_n}{(y_{n-1}-y_n)^2}\left(x_n - \frac{y_n(x_n-x_{n-1})}{y_n-y_{n-1}}\right)\,.   (4)

Call this iteration Inverse Cubic Iteration, or ICI. Instead of presenting a tedious transformation of the previous formula into the one above (it is equivalent, trust me, although I did make the sleight-of-hand transformation from $a$ and $b$ to $x_{n-1}$ and $x_n$, implying iteration), I will verify that it has the appropriate behaviour in the next section; we will look there at how quickly it converges to a root (when it converges at all). The interesting part of this, to me, is that we can recognize all three terms in equation (4). There is a Newton iteration starting at $x_{n-1}$, another at $x_n$, and a secant iteration using both points. Then all three are averaged together: the weights are $y_n^2$, $y_{n-1}^2$, and $-2y_ny_{n-1}$, all divided by $(y_{n-1}-y_n)^2$, so that the weights add up to $1$, as they should. The most accurate estimate is given the most weight, because each weight is built from the residuals $y_k$ of the other estimates. I think this is a rather pretty formula. (It might be numerically better still to average the previous points $a_0x_n+a_1x_{n-1}+a_sx_n$, separately average the small updates $a_0y_n/f'(x_n)+a_1y_{n-1}/f'(x_{n-1})+a_s\,y_n\Delta x/\Delta y$, and then use this averaged update to improve the average of the previous points. Currently this is the way my implementation does it, but it may not make much difference.)
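To make the bookkeeping concrete, here is a minimal Python sketch of one step of equation (4), written exactly as that weighted average of two Newton steps and a secant step. The function name and calling convention are mine, not the paper's.

def ici_step(x_prev, y_prev, dy_prev, x_cur, y_cur, dy_cur):
    # One step of Inverse Cubic Iteration, equation (4).
    #   x_prev, x_cur   : the two most recent iterates x_{n-1}, x_n
    #   y_prev, y_cur   : f(x_{n-1}), f(x_n)
    #   dy_prev, dy_cur : f'(x_{n-1}), f'(x_n)
    d2 = (y_prev - y_cur)**2                 # assumed nonzero; see the caveats above
    w_prev = y_cur**2 / d2                   # weight on the Newton step from x_{n-1}
    w_cur = y_prev**2 / d2                   # weight on the Newton step from x_n
    w_sec = -2.0 * y_prev * y_cur / d2       # weight on the secant step; the three sum to 1
    newton_prev = x_prev - y_prev / dy_prev
    newton_cur = x_cur - y_cur / dy_cur
    secant = x_cur - y_cur * (x_cur - x_prev) / (y_cur - y_prev)
    return w_prev * newton_prev + w_cur * newton_cur + w_sec * secant

# Example: f(x) = x^2 - 2, from x0 = 1 and the Newton iterate x1 = 1.5.
f = lambda x: x*x - 2.0
df = lambda x: 2.0*x
print(ici_step(1.0, f(1.0), df(1.0), 1.5, f(1.5), df(1.5)))   # about 1.41467; sqrt(2) = 1.41421...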

The numerical stability (or lack thereof) of the secant method and of Newton’s method is well-understood; by writing the secant method as above we make a “small change” to an existing value, and similarly Newton’s method is written in as stable a way as it can be. Thus we have what is likely to be a reasonably stable iteration formula.

Because this method uses two points $x_{n-1}$ and $x_n$ to generate a third point $x_{n+1}$, it seems similar to the secant method. Because it uses derivatives $f'(x_k)$ at each point, it seems similar to Newton's method. It is not very similar to IQI, which fits an inverse quadratic to three points and uses that to generate a fourth. Halley's method and other Schröder iterations use higher derivatives and will be more expensive per step. [It might be interesting to try to average higher-order methods in a similar manner to speed up convergence even more; so far I have not been able to succeed in doing so. The simple trick of replacing $f$ by $f/\sqrt{f'}$, which transforms Newton's method to Halley's method (and speeds up secant), does not seem to work with these weights.]

I am going to assume that we start with a single initial guess $x_0$ and then generate $x_1$ by a single step of Newton's method. Then it is obvious that to carry out one step with equation (4) we need only one more function evaluation $y_1=f(x_1)$ and one more derivative evaluation $f'(x_1)$, because we can record the previous $y_0=f(x_0)$ and $f'(x_0)$ instead of recomputing them. In the standard model of rootfinding iteration, where the dominant cost is function evaluation and derivative evaluation, we see that this method is no more expensive than Newton's method itself. But instead of computing $x_{n+1}=x_n-f(x_n)/f'(x_n)$ and forgetting all about $x_{n-1}$, once we have formed this Newton iterate we average it with the previous Newton iterate together with a secant iterate: the cost of forming this average is no more than a few floating-point operations, which we may consider to be trivial compared to the assumed treasury-breaking expense of evaluating $f(x)$ and $f'(x)$. When we try this on Newton's classical example $z^3-2z-5$ with an initial guess of $z_0=1$ we get 29 Digits of accuracy after 7 iterations (10 Digits of accuracy after 6 iterations).
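As a sketch of that bookkeeping (my own transcription, using mpmath purely for the extended precision; the exact digit counts will depend on implementation details), the following runs the scheme just described, one Newton step from $x_0=1$ followed by ICI steps, on Newton's classical example:

from mpmath import mp, mpf

mp.dps = 40                                  # comfortably more than the 29 Digits quoted above

f = lambda x: x**3 - 2*x - 5
df = lambda x: 3*x**2 - 2

def ici_step(xp, yp, dp, xc, yc, dc):
    # Equation (4): a weighted average of two Newton steps and a secant step.
    d2 = (yp - yc)**2
    return ((yc**2/d2)*(xp - yp/dp) + (yp**2/d2)*(xc - yc/dc)
            - (2*yp*yc/d2)*(xc - yc*(xc - xp)/(yc - yp)))

x_prev = mpf(1)                              # initial guess x0
y_prev, d_prev = f(x_prev), df(x_prev)
x_cur = x_prev - y_prev/d_prev               # x1 by one Newton step
y_cur, d_cur = f(x_cur), df(x_cur)
for n in range(2, 8):                        # x2, ..., x7: seven iterations in all
    x_next = ici_step(x_prev, y_prev, d_prev, x_cur, y_cur, d_cur)
    x_prev, y_prev, d_prev = x_cur, y_cur, d_cur
    x_cur, y_cur, d_cur = x_next, f(x_next), df(x_next)   # one new f and one new f' per step
    print(n, mp.nstr(abs(y_cur), 5))         # residual |f(x_n)|, collapsing rapidly at the end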

I will sketch the cost of this method using the normal conventions in section 5.

4 The order of the method

If we assume that $r$ is the root we are searching for, $f(r)=0$, and that $x_{n-1}=r+\varepsilon_{n-1}$ and $x_n=r+\varepsilon_n$, then a straightforward and brutal series computation in Maple shows that

x_{n+1} = r + (\varepsilon_{n-1}\varepsilon_n)^2\,\frac{f^{(iv)}(r)\,(f'(r))^2 - 10\,f'(r)f''(r)f'''(r) + 15\,(f''(r))^3}{4!\,(f'(r))^3} + \cdots\,.   (5)

The error in the residual, which is what we will actually measure, is ever so slightly different: $y(x_k)=f(r+\varepsilon_k)=f(r)+f'(r)\varepsilon_k+\cdots$, and so we can remove one $f'(r)$ factor from the denominator above. Notice that each $x_k$ exactly solves $F(x)=f(x)-y_k$; this is a trivial observation, but often useful: by iterating, we get the exact solutions of slightly different equations. Here, though, it doesn't matter. Both formulae show the trend of the errors. That is, the error in the next iterate is (proportional to) the square of the product of the two previous errors. This is satisfyingly analogous to that of the secant method (product of the two previous errors) and Newton's method (square of the previous error). Solving the recurrence relation $\ln\varepsilon_{n+1}=2(\ln\varepsilon_n+\ln\varepsilon_{n-1})$ shows that the error is asymptotically the $1+\sqrt{3}=2.732\ldots$th power of the previous one; not quite cubic convergence, but substantially faster than quadratic convergence.
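For the record, the exponent comes from the usual characteristic-equation argument: writing $L_n=\ln\varepsilon_n$ and positing $L_{n+1}\approx p\,L_n$ for large $n$, the recurrence gives

L_{n+1} = 2L_n + 2L_{n-1} \quad\Longrightarrow\quad p^2 = 2p + 2 \quad\Longrightarrow\quad p = 1+\sqrt{3} \approx 2.732,

taking the positive root of the characteristic equation.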

Let's take the method out for a spin to see if practice matches theory. Consider the function $f(x)=(x^2+x)\exp(-x)-1/3$. This has two places where $f'(x)=0$, namely $x=(1\pm\sqrt{5})/2$. Provided we stay away from there, we should be fine. Taking our initial guess to be $x_0=2.0$, we then compute $x_1$ by Newton's method as usual. Now we have two iterates to work with; we then compute $x_2$, $x_3$, \ldots, $x_8$. Because convergence is so rapid, and I wanted to demonstrate it, I worked with Digits set to 1000. This was overkill, but it allows the plot in figure 1 to show clearly that the residual at $x_8$ is about $10^{-594}$. Newton's method (red squares) achieves only $10^{-63}$ with the same effort. The ratios $y_k/(y_{k-1}y_{k-2})^2$ for $k=2$, $3$, \ldots, $8$ were, respectively, 1.5952, 17.048, 4.5955, 4.9061, 4.9080, 4.9081, and 4.9080. These fit the curve $y_k=(0.6437)^{(1+\sqrt{3})^k}$, demonstrating the near-cubic order of the method. The constant 0.6437 was found by fitting to the final data point. This trend predicts that the residual at $x_9$ would be $4.9081\,(y_8y_7)^2=1.7383\cdot10^{-1622}$. Re-doing the computation in 1624 Digits and extending to $x_9$, we get $y_9=1.7383\cdot10^{-1622}$, just as predicted.

Figure 1: Convergence to a root of $f(x)=(x^2+x)\exp(-x)-1/3$. The initial guess was $x_0=2.0$. The next iterate was computed by Newton's method, $x_1=x_0-f(x_0)/f'(x_0)$. Thereafter equation (4) was used. Computing was carried out with 1000 Digits. What is plotted is the logarithm (base 10) of the residual, that is, $\log_{10}|y_n|=\log_{10}|f(x_n)|$. The dashed blue line is the theoretical curve $y_n=(0.6437)^{(1+\sqrt{3})^n}$, demonstrating the theoretical order of the method computed above. The constant 0.6437 was found by fitting the final data point. Newton's method (red squares) achieves only 63 Digits of accuracy instead of ICI's nearly 600.

5 The cost of the method

The best analysis—that I know—of the cost of Newton's method for solving a scalar equation by iterating at constant precision is contained in [9]. There, the author weighs the cost of function evaluations and estimates that the cost of a derivative evaluation is usually a modest factor larger than that of a function evaluation; then, balancing the greater cost of Newton's method per step against its general need for fewer iterations, he concludes that the secant method should usually win. Yes, the secant method usually takes more iterations; but each one usually costs only a fraction of a Newton step. It is only when derivatives are unusually inexpensive that Newton's method wins. For the example in figure 1 this is the case: the dominant cost of the function evaluation is the exponential, and that can be re-used in the derivative. So, for this particular problem, ICI should win over Newton's method and even more decisively over the secant method. But to win over Newton's method, ICI needs to take fewer iterations, because the cost per step is essentially identical to Newton's. ICI will never ("What, never?" "No, never!" "What, never?" "Hardly ever!", as the crew and Captain of HMS Pinafore would have it) take more iterations than Newton's method. But will it take fewer?

That is actually the issue. Will we save any iterations at all? Newton's method typically converges very rapidly, given a good initial guess. To give a very rough sketch, if the initial error is $2^{-2}$ and all the constants are $1$, we can expect a double precision answer in five iterations (actually the error then would be $2^{-64}$, which is partly wasted: rounding errors would have limited it to at best about $2^{-52}$). ICI would have the sequence of errors $\varepsilon_0=2^{-2}$, $\varepsilon_1=2^{-4}$ (the same as Newton for the $x_1$ iterate, of course), $\varepsilon_2=2^{-12}$ from squaring the product $2^{-2}\cdot 2^{-4}$, then $\varepsilon_3=2^{-2(12+4)}=2^{-32}$, which is the same error that Newton achieved in four iterations. One more step gets us $\varepsilon_4=2^{-2(32+12)}=2^{-88}$, which is vast overkill; still, we have beaten Newton by one iteration with this initial error.

If instead the initial guess is only accurate to $2^{-1}$, not $2^{-2}$, then Newton takes six iterations while ICI takes five: $\varepsilon_1=2^{-2}$, $\varepsilon_2=2^{-6}$, $\varepsilon_3=2^{-16}$, $\varepsilon_4=2^{-44}$, and $\varepsilon_5=2^{-120}$.

If instead the initial guess is accurate to $2^{-3}$ then Newton takes five iterations, while ICI takes $\varepsilon_1=2^{-6}$, $\varepsilon_2=2^{-18}$, $\varepsilon_3=2^{-48}$, and then $\varepsilon_4=2^{-132}$: at four iterations it beats Newton.
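Those back-of-the-envelope counts are easy to tabulate. The little script below (my own sketch, using the idealized recurrences with all constants equal to $1$) counts how many iterations each method needs to push the error exponent to 52 bits or beyond:

def newton_iters(e0, target=52):
    # Idealized Newton: the error exponent doubles each step.
    e, n = e0, 0
    while e < target:
        e, n = 2*e, n + 1
    return n

def ici_iters(e0, target=52):
    # Idealized ICI: one Newton step first, then e_{n+1} = 2*(e_n + e_{n-1}).
    prev, cur, n = e0, 2*e0, 1
    while cur < target:
        prev, cur, n = cur, 2*(cur + prev), n + 1
    return n

for e0 in (1, 2, 3, 6, 13):
    print(e0, newton_iters(e0), ici_iters(e0))
# e0 = 1: Newton 6, ICI 5.   e0 = 2: Newton 5, ICI 4.   e0 = 3: Newton 5, ICI 4.
# For a very accurate initial guess (e0 = 13, say) both need the same two iterations.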

The lesson seems clear: if one wants ultra-high precision, the ICI method will give the same results in fewer iterations of essentially the same cost. If one only wants a double precision result, then the number of iterations you save may not be much, if any (and if it doesn’t really beat Newton, then it won’t often beat the secant method). And both methods demand good initial guesses.

6 Initial guesses

I have been using the scheme “choose only one initial guess and let Newton’s method determine the next iterate” above. Obviously there are alternative strategies. I like this one because it’s usually hard enough to come up with one initial guess to a root, let alone two.

Using this strategy also allows one to plot basins of attraction: which initial guesses converge to which root? I took the function $f(z)=z^3-1$ and sampled a 1600 by 1600 grid on $-2\le x\le 2$, $-2\le y\le 2$, where $z=x+iy$, and took at most 13 iterations of ICI starting from each gridpoint; if the iteration converged (to $10^{-8}$) earlier, that earlier result was recorded. Plotting the argument (phase) of the answer gives an apparent fractal diagram, very similar to that for Newton's method, but different in detail. In particular, the basins of attraction are visibly disconnected. See figures 2 and 3.
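For the curious, here is a rough sketch of how such a picture can be generated (my own code, on a much coarser grid than the 1600 by 1600 used for the figures, and without any of the safeguards a serious implementation would want):

import numpy as np
import matplotlib.pyplot as plt

f = lambda z: z**3 - 1
df = lambda z: 3*z**2

def ici_step(xp, yp, dp, xc, yc, dc):
    # Equation (4), as a weighted average of two Newton steps and a secant step.
    d2 = (yp - yc)**2
    return ((yc**2/d2)*(xp - yp/dp) + (yp**2/d2)*(xc - yc/dc)
            - (2*yp*yc/d2)*(xc - yc*(xc - xp)/(yc - yp)))

N, maxit, tol = 400, 13, 1e-8
xs = np.linspace(-2, 2, N)
ys = np.linspace(-2, 2, N)
phase = np.zeros((N, N))

for i, y in enumerate(ys):
    for j, x in enumerate(xs):
        zp = complex(x, y)
        if df(zp) == 0:                      # skip a bad gridpoint, should one occur
            continue
        zc = zp - f(zp)/df(zp)               # first iterate by Newton's method
        for _ in range(maxit - 1):
            if abs(f(zc)) < tol or f(zp) == f(zc) or df(zc) == 0:
                break                        # converged, or the ICI step cannot be formed
            zp, zc = zc, ici_step(zp, f(zp), df(zp), zc, f(zc), df(zc))
        phase[i, j] = np.angle(zc)

plt.imshow(phase, extent=(-2, 2, -2, 2), origin="lower", cmap="twilight")
plt.title("Approximate basins of attraction: ICI for z^3 - 1")
plt.show()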

Figure 2: Approximate basins of attraction for the ICI method for $f(z)=z^3-1$. Each initial guess $z_0=x+iy$ is coloured with the phase of the result after 13 iterations of ICI, namely $\arg(z_{13})$ (earlier iterations may have converged, in which case the limit is recorded). The first iterate $z_1$ is obtained by Newton's method. Computation done on a 1600 by 1600 grid on $-2\le x\le 2$, $-2\le y\le 2$. Notice that the fractal structure is more complicated than that for simple Newton's method, suggesting that the choice of initial guess is more fraught for this method.
Figure 3: Zooming in, we show the region $-1.45\le x\le -1.05$ and $-0.2\le y\le 0.2$, again with a 1600 by 1600 grid, for the same function and iterations as in figure 2. We see disconnected components.

7 Multiple roots and infinite derivatives

Of course this method won't help with multiple roots. We're dividing by derivatives, which go to zero as we approach a multiple root. It's true that the method can still work, but if it does it will work more slowly; moreover, the sequence $y_k/(y_{k-1}y_{k-2})^2$ will just grow without bound. I tried this on $f(x)=(x-2)^2$, with $x_0=0.5$, and this expectation is correct. The method did converge in ten iterations to the double root $x=2$, with an error of about $10^{-14}$, but note that Newton's method also converges linearly for this problem.

Another place for failure of rootfinding is when any derivative is infinite—that is, at or near derivative singularities. The presence of up to the fourth derivative in the error formula (5) suggests that this method will be more sensitive to singularities than Newton’s method is.

8 Where to, next

The first question to ask is: can this work for nonlinear systems of equations? Even though this method, stripped of its origins, is just a weighted average of two Newton iterations and a secant iteration, all of which can be used for systems, I find that I am dubious. It's not clear to me what to use for the scalar weights: replace e.g. $y_k^2/(y_k-y_{k-1})^2$ by $\|y_k\|^2/\|y_k-y_{k-1}\|^2$ using some norm? Which norm? Perhaps it will work, with the right choice. But even if it did, for large systems the largest computational effort is usually spent in setting up and solving the linear systems: full Newton iteration is rarely used, when cheaper approximate iterations will get you there faster. Moreover, the real benefit of the higher speed of convergence of ICI isn't fully felt at double precision or lower—I think it will only matter for quad precision (double-double!) or better.

I believe that the best that can be hoped for from this method is to be used in a computer algebra system for its high-precision scalar solvers, which currently might use other high-order iterations. It is here, if anywhere, that this method will be useful.

There may be similar two-step iterations of higher order, perhaps starting with an "inverse quintic iteration" which uses terms up to the second derivative and would have order $(3+\sqrt{21})/2\approx 3.79$. I have not yet investigated any such schemes.

In ultra high-precision applications there is sometimes the option to change precision as you go. This is effective for Newton iteration; only the final function evaluation and derivative evaluation need to be done at the highest precision. I have not investigated the effects of variable precision on the efficiency of this method.

Then there are more practical matters: what to do in the inevitable case when $y_k=y_{k-1}$? [This happens when the iteration converges, for instance!] Or when $f'(x_k)$ is too small? It turns out that the details of the extremely practical zeroin method [1], which combines IQI with bisection and the secant method for increased reliability (and is implemented in the fzero command in Matlab), matter quite a bit. Further improvements to the method, which improve its worst-case behaviour, can be found in [11]. Attention needs to be paid to these details.
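To make that concrete, here is one plausible (and entirely unvalidated) shape for such a safeguard: fall back to a plain Newton step when the ICI weights cannot be formed safely. The thresholds and the fallback policy here are my own guesses, not anything taken from [1] or [11].

def safeguarded_ici_step(xp, yp, dp, xc, yc, dc, tiny=1e-30):
    # One ICI step with crude guards; falls back to a Newton step from x_n.
    if abs(dc) <= tiny:
        raise ZeroDivisionError("f' too small near the current iterate")
    if abs(yp - yc) <= tiny * max(abs(yp), abs(yc), 1.0) or abs(dp) <= tiny:
        return xc - yc/dc                    # residuals (or old derivative) unusable: plain Newton
    d2 = (yp - yc)**2
    return ((yc**2/d2)*(xp - yp/dp) + (yp**2/d2)*(xc - yc/dc)
            - (2*yp*yc/d2)*(xc - yc*(xc - xp)/(yc - yp)))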

Then there is another academic matter: in all those tens of thousands of papers on rootfinding, has nobody thought of this before? Really? Again, I find that I am dubious, although as a two-step method it is technically not one of Schröder’s iterations (which keep getting reinvented). And it’s not one of Kalantari’s methods either [5]. But it’s such a pretty method—surely someone would have noticed it before, and tried to write about it? Perhaps it’s an exercise in one of the grand old numerical analysis texts, like Rutishauser’s (which I don’t have a copy of).

There are (literally) an infinite number of iterative methods to choose from; see the references in [7] for a pointer to the literature, including to [10] and from there to G. W. Stewart's translation of Schröder's work from the 19th century, entitled "On infinitely many algorithms for solving equations". The bibliography [8] has tens of thousands of entries.

However, searching the web for "Inverse Cubic Iteration" (surely a natural name) fails. We do find papers when searching for names like "Accelerated Newton's method", such as [4]; but the ones I have found are each different to ICI (for instance, the last-cited paper finds a true third-order method; ICI is not quite third order but only $1+\sqrt{3}$). The method of [6], termed "Leap-Frog Newton", is quite similar to ICI and consists of an intermediate Newton value followed by a secant step: this costs two function evaluations and one derivative evaluation per step (so it is more expensive per step than ICI, which requires only one function evaluation and one derivative evaluation per step after the first), but it is genuinely third order. I found this last paper by searching for "combining Newton and secant iterations," so perhaps I could find papers describing closer matches to ICI, if only I could think of the correct search term. I am currently asking my friends and acquaintances if they know of any. Do you?

“Six months in the laboratory can save you three days in the library.”

—folklore as told to me by Henning Rasmussen

References

  • [1] R. P. Brent. An algorithm with guaranteed convergence for finding a zero of a function. The Computer Journal, 14(4):422–425, 1971.
  • [2] Robert M. Corless and Nicolas Fillion. A Graduate Introduction to Numerical Methods, volume 10. Springer, 2013.
  • [3] Robert M. Corless and Erik Postma. Blends in Maple. arXiv:2007.05041, 2020.
  • [4] J. M. Gutiérrez and M. A. Hernández. An acceleration of Newton's method: Super-Halley method. Appl. Math. Comput., 117(2–3):223–239, 2001.
  • [5] Bahman Kalantari. Polynomial Root-Finding and Polynomiography. World Scientific, 2008.
  • [6] A. Bathi Kasturiarachi. Leap-frogging Newton's method. International Journal of Mathematical Education in Science and Technology, 33(4):521–527, 2002.
  • [7] Ao Li and Robert M. Corless. Revisiting Gilbert Strang's "A chaotic search for $i$". ACM Communications in Computer Algebra, 53(1):1–22, 2019.
  • [8] J. M. McNamee. Numerical Methods for Roots of Polynomials. Elsevier Science Limited, 2007.
  • [9] A. Neumaier. Introduction to Numerical Analysis. Cambridge University Press, 2001.
  • [10] M. S. Petković, L. D. Petković, and D. Herceg. On Schröder's families of root-finding methods. Journal of Computational and Applied Mathematics, 233(8):1755–1762, 2010.
  • [11] Gautam Wilkins and Ming Gu. A modified Brent's method for finding zeros of functions. Numerische Mathematik, 123(1):177–188, 2013.
Figure 4: Bonus figure: basins of attraction under ICI for Kepler's equation $z-0.083\sin(z)-1=0$ on a 1600 by 1600 grid, $-30.5\le x\le -29.5$, $-17.5\le y\le -16.5$, with tolerance $10^{-8}$ and a maximum of 30 iterations. The pure white areas represent places where the iteration encountered a NaN; other colours represent different phases after 30 iterations or convergence, whichever happened first. The picture is quite a bit more complicated than the similar picture for Newton's method in Figure 3.21 of [2].