
An Elementary Method For Fast Modular Exponentiation With Factored Modulus

(Date: May 2, 2025)
Abstract.

We present a fast algorithm for modular exponentiation when the factorization of the modulus is known. Let $a,n,m$ be positive integers and suppose $m$ factors canonically as $\prod_{i=1}^{k}p_{i}^{e_{i}}$. Choose integer parameters $t_{i}\in[1,e_{i}]$ for $1\leq i\leq k$. Then we can compute the modular exponentiation $a^{n}\pmod{m}$ in $O(\max(e_{i}/t_{i})+\sum_{i=1}^{k}t_{i}\log p_{i})$ steps (i.e., modular operations). We go on to analyze this algorithm mathematically and programmatically, showing significant asymptotic improvement in specific cases. Specifically, for an infinite family of $m$ we achieve a complexity of $O(\sqrt{\log m})$ steps, much faster than the Repeated Squaring Algorithm, which has complexity $O(\log m)$. Additionally, we extend our algorithm to matrices and hence to general linear recurrences. The complexity is similar; with the same setup we can exponentiate matrices in $GL_{d}(\mathbb{Z}/m\mathbb{Z})$ in fewer than $O(\max(e_{i}/t_{i})+d^{2}\sum_{i=1}^{k}t_{i}\log p_{i})$ steps. This improves Fiduccia's algorithm and the results of Bostan and Mori in the case of $\mathbb{Z}/m\mathbb{Z}$. We prove analogous results for ring extensions of $\mathbb{Z}/p^{k}\mathbb{Z}$.

1. Introduction

An important problem at the intersection of cryptography and number theory is the Modular Exponentiation Problem: computing $a^{n}\pmod{m}$ for positive integers $a,n,m$. Solving for $n$ in the equation $a^{n}\equiv b\pmod{m}$, called the Discrete Logarithm Problem (DLP), is computationally hard. This makes modular exponentiation useful in many cryptosystems, most notably the RSA public key cryptosystem and the Diffie-Hellman key exchange. The standard algorithm for the Modular Exponentiation Problem is the Repeated Squaring Algorithm, which runs in $O(\log n)$ steps (in this paper, a step is a modular multiplication). To our knowledge, there are no existing significant asymptotic improvements to this algorithm.

In this paper, we present an efficient algorithm for the Modular Exponentiation Problem when the factorization of the modulus is known. This situation is of practical relevance: modular exponentiation is generally performed for encryption, so the user is free to choose $m$ and hence knows its factorization. In the 1984 paper [1], Bach proves the existence of a Hensel-lifting-type algorithm that lifts solutions of the discrete logarithm from modulo $p$ to modulo $p^{e}$ in polynomial time. One can intuit that, in a similar way, there is a speedup to modular exponentiation when the factorization of the modulus contains a large prime power; this is what our algorithm provides. The algorithm is quite elementary: it hinges on the binomial theorem and a clever recursive computation of inverses and binomial coefficients. It also depends on a set of parameters: if $m$ factors as $\prod_{i=1}^{k}p_{i}^{e_{i}}$, we choose for each $i$ an integer parameter $t_{i}\in[1,e_{i}]$. We then compute $a^{n}\pmod{m}$ in $O(\max(e_{i}/t_{i})+\sum_{i=1}^{k}t_{i}\log p_{i})$ steps (again, steps are modular multiplications). We additionally provide an $O(1)$-memory algorithm for memory-sensitive scenarios, where we define $O(1)$ memory to mean the memory of an integer modulo $m$, or $\log m$ bits. For general $m$ we make asymptotic improvements as measured by specific metrics. For a certain infinite family of $m$ we achieve a complexity of $O(\sqrt{\log m})$. Furthermore, our algorithm profits from the development of other fast algorithms: if one were to discover a faster modular exponentiation algorithm, it could be used as the modExp subroutine in ours, which in turn would make our algorithm faster.

In section 2 of this paper, we lay out preliminary results that our algorithm and its analysis require. In section 3, we present the algorithm with pseudocode. In section 4, we analyze our algorithm mathematically. Because its complexity depends on number-theoretic properties of $m$, we require results from analytic number theory to estimate the “average” complexity. We discuss families of $m$ for which we make significant improvements. Then, in section 5, we present a version of our algorithm for matrices, linear recurrences, and ring extensions of $\mathbb{Z}/p^{k}\mathbb{Z}$. Finally, in section 6, we test our algorithm for prime powers against Python's built-in $\mathrm{pow}$ function to show how it performs in practice. A Python implementation of the algorithm and programmatic analysis of it can be found at [2].

2. Preliminaries

The complexity of our algorithm depends on number-theoretic properties of the modulus, so we need several preliminary definitions to carry out this analysis. In general, we use the variable $p$ for a prime. We denote the canonical decomposition of $m$ as $m=\prod p_{i}^{e_{i}}$. By convention, we let $a\pmod{b}$ denote the least residue of $a$ modulo $b$. We let $\nu_{p}(n)$ denote the $p$-adic valuation of $n$, and we use this notation interchangeably with $\nu(n,p)$. We let $\zeta(\bullet)$ be the Riemann Zeta Function and let $\varphi(\bullet)$ denote Euler's Totient Function. Additionally, we let $\vartheta(\bullet)$ denote Chebyshev's Function. We must define the radical of an integer $n$, because it is intimately related to the optimal complexity that our algorithm may achieve:

Definition 2.1.

If $n=\prod_{i=1}^{k}p_{i}^{e_{i}}$, define the radical of $n$ as $\operatorname{rad}n=\prod_{i=1}^{k}p_{i}$.

A useful notation that will assist our analysis is the following:

Definition 2.2.

Let $n=\prod_{i=1}^{k}p_{i}^{e_{i}}$, and define the multiset $S=\{x_{1},x_{2},\cdots,x_{k}\}$. We define $H_{S}(n):=\max\left(\frac{e_{i}}{x_{i}}\right)$, and set $H(n):=H_{\{1,1,\ldots,1\}}(n)$.
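
For concreteness, $H_{S}(n)$ is immediate to compute from the exponents of $n$; the following small Python sketch (our own illustrative code, not part of the implementation in [2]) makes the definition explicit.

def H(exponents, S=None):
    """H_S(n) = max(e_i / x_i) over the factorization n = prod p_i^{e_i}.
    With S omitted, every x_i = 1 and this is H(n)."""
    if S is None:
        S = [1] * len(exponents)
    return max(e / x for e, x in zip(exponents, S))

# Example: n = 2^4 * 3^2 has H(n) = 4, while H_{2,1}(n) = max(4/2, 2/1) = 2.
assert H([4, 2]) == 4
assert H([4, 2], [2, 1]) == 2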

Our algorithm hinges on the following theorem, due to Euler:

Theorem 2.3.

(Euler) Let $a,n$ be relatively prime positive integers. Then $a^{\varphi(n)}\equiv 1\pmod{n}$.

Another result due to Euler which we need to use in our analysis is the Euler Product formula:

Theorem 2.4.

(Euler) Let $a:\mathbb{N}\to\mathbb{C}$ be a multiplicative function, i.e., $a(mn)=a(m)a(n)$ whenever $\gcd(m,n)=1$. Then

\sum_{n\geq 1}\frac{a(n)}{n^{s}}=\prod_{p\ \mathrm{prime}}\sum_{k=0}^{\infty}\frac{a(p^{k})}{p^{ks}}.

In particular, we have the following Euler Product for $\zeta$:

\zeta(s)=\prod_{p\ \mathrm{prime}}\frac{1}{1-p^{-s}}.

To acquire precise asymptotics for specific sums, we need the following estimates:

Theorem 2.5.

(Stirling's Approximation) $\log(x!)=x\log x-x+O(\log x)$

Theorem 2.6.

(Chebyshev) $\vartheta(x)=O(x)$

To approximate average order summations, we will need Abel’s summation formula, referred to as partial summation:

Theorem 2.7.

(Abel) Let $(a_{n})_{n=0}^{\infty}$ be a sequence of complex numbers. Define $A(t)=\sum_{0\leq n\leq t}a_{n}$. Fix real numbers $x<y$ and let $\phi$ be a continuously differentiable function on $[x,y]$. Then

\sum_{x<n\leq y}a_{n}\phi(n)=A(y)\phi(y)-A(x)\phi(x)-\int_{x}^{y}A(u)\phi^{\prime}(u)\,\mathrm{d}u.

We compare our algorithm to the standard repeated squaring method for modular exponentiation (to our knowledge, there is no general asymptotic improvement to this method). It computes $a^{n}\pmod{m}$ in $O(\log n)$ steps. By Euler's theorem, we can reduce the exponent modulo $\varphi(m)$, giving $O(\log\varphi(m))=O(\log m)$ steps.
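
For reference, a minimal Python version of the Repeated Squaring Algorithm (functionally the three-argument built-in $\mathrm{pow}$; this is the modExp subroutine referenced later):

def mod_exp(a, n, m):
    """Repeated squaring: a^n mod m in O(log n) modular multiplications."""
    result = 1
    a %= m
    while n > 0:
        if n & 1:                  # multiply the current square in when this bit of n is set
            result = result * a % m
        a = a * a % m              # square for the next bit
        n >>= 1
    return result % m

assert mod_exp(7, 123, 11**3) == pow(7, 123, 11**3)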

For a ring $R$, we let $GL_{n}(R)$ denote the general linear group over $R$. We will need the well-known fact that $|GL_{n}(\mathbb{F}_{p})|=\prod_{i=0}^{n-1}(p^{n}-p^{i})$. We will also need the following theorem due to Lagrange:

Theorem 2.8.

(Lagrange) Let $G$ be a group of order $n$. Then for every $a\in G$, $a^{n}$ is the identity.

To compute the order of general linear groups over $\mathbb{Z}/T\mathbb{Z}$, we must first compute the order of $GL_{n}(\mathbb{Z}/p^{k}\mathbb{Z})$. We do this with the following lemma:

Lemma 2.9.

For a prime $p$ and $k\geq 1$ we have

|GL_{n}(\mathbb{Z}/p^{k}\mathbb{Z})|=p^{(k-1)n^{2}}|GL_{n}(\mathbb{F}_{p})|=p^{(k-1)n^{2}}\prod_{i=0}^{n-1}(p^{n}-p^{i}).
Proof.

We proceed by induction on $k$, with $k=1$ trivial. Notice the natural ring homomorphism $\varphi:\mathbb{Z}/p^{k}\mathbb{Z}\to\mathbb{Z}/p^{k-1}\mathbb{Z}$ given by $a+p^{k}\mathbb{Z}\mapsto a+p^{k-1}\mathbb{Z}$. This induces the surjection $\varphi_{n}:GL_{n}(\mathbb{Z}/p^{k}\mathbb{Z})\to GL_{n}(\mathbb{Z}/p^{k-1}\mathbb{Z})$. Hence $|GL_{n}(\mathbb{Z}/p^{k}\mathbb{Z})|=|\ker(\varphi_{n})||GL_{n}(\mathbb{Z}/p^{k-1}\mathbb{Z})|$. Since there are $p$ choices for each entry of a matrix in the kernel of $\varphi_{n}$, the size of the kernel is $p^{n^{2}}$, and we have the desired result. ∎

This allows us to prove the following important lemma:

Lemma 2.10.

If $T=p_{1}^{t_{1}}p_{2}^{t_{2}}\cdots p_{k}^{t_{k}}$ is the canonical factorization of some $T\in\mathbb{N}$,

|GL_{n}(\mathbb{Z}/T\mathbb{Z})|=\prod_{j=1}^{k}p_{j}^{(t_{j}-1)n^{2}}\prod_{i=0}^{n-1}(p_{j}^{n}-p_{j}^{i}).
Proof.

By the Chinese Remainder Theorem, we have the ring isomorphism

\mathbb{Z}/T\mathbb{Z}\to\prod_{j=1}^{k}\mathbb{Z}/p_{j}^{t_{j}}\mathbb{Z},

and thus there is a corresponding isomorphism

GL_{n}(\mathbb{Z}/T\mathbb{Z})\to\prod_{j=1}^{k}GL_{n}(\mathbb{Z}/p_{j}^{t_{j}}\mathbb{Z}).

This implies the desired result by Lemma 2.9. ∎

Remark 2.11.

It follows from Lemma 2.10 that we may compute $|GL_{n}(\mathbb{Z}/T\mathbb{Z})|$ in $O(kn)$ steps, which is insignificant.
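
As a small illustration, the order is directly computable from the factorization of $T$ (a Python sketch of Lemma 2.10; the function name is ours):

def gl_order(n, factorization):
    """|GL_n(Z/TZ)| for T = prod p^t, given as (p, t) pairs (Lemma 2.10)."""
    order = 1
    for p, t in factorization:
        order *= p ** ((t - 1) * n * n)      # kernel contribution, Lemma 2.9
        for i in range(n):
            order *= p ** n - p ** i         # |GL_n(F_p)| factor
    return order

# Example: |GL_1(Z/12Z)| = phi(12) = 4, with 12 = 2^2 * 3.
assert gl_order(1, [(2, 2), (3, 1)]) == 4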

We use standard convention (from [3]) for the asymptotic notations $O(\bullet),o(\bullet),\ll,\gg,\sim$, and $\Omega(\bullet)$. We use $<O(\bullet)$ and $>O(\bullet)$ rather unconventionally: $f(x)<O(g(x))$ provided there is a function $h(x)=O(g(x))$ such that $f(x)<h(x)$ for sufficiently large $x$. We use a similar definition for $>O(\bullet)$.

3. The Algorithm

Our main theorem is the following:

Theorem 3.1.

Let $m$ be a positive integer with known factorization $p_{1}^{e_{1}}p_{2}^{e_{2}}\cdots p_{k}^{e_{k}}$ and let $a,n$ be positive integers such that $\gcd(a,m)=1$. For each $1\leq i\leq k$, choose an integer parameter $t_{i}\in[1,e_{i}]$. With $\mathcal{T}=\{t_{1},t_{2},\cdots,t_{k}\}$, we may compute $a^{n}\pmod{m}$ in

O\left(H_{\mathcal{T}}(m)+\sum_{i=1}^{k}t_{i}\log p_{i}\right)

steps.

First, we need a definition that will simplify our calculation of modular inverses.

Definition 3.2.

For some $a$ and $m=\prod p_{i}^{e_{i}}$, define $\{u_{a},v_{a}\}$ to be the inverse pair of $a$ modulo $m$, where $v_{a}=\prod p_{i}^{\nu(a,p_{i})}$ and $u_{a}\equiv(a/v_{a})^{-1}\pmod{m}$. Define the inverse pair of $0$ as $\{0,0\}$.

Example 1.

Let us compute the inverse pair of $12$ modulo $20$ as an example. We have $v_{12}=2^{\nu(12,2)}\cdot 5^{\nu(12,5)}=4$. Then $u_{12}\equiv(12/4)^{-1}\equiv 7\pmod{20}$. So the inverse pair of $12$ modulo $20$ is $\{4,7\}$. Observe also that $4\cdot 7^{-1}\equiv 12\pmod{20}$.

Notice that $u_{a}$ always exists because $a/v_{a}$ is clearly invertible modulo $m$. Intuitively, inverse pairs are inverses equipped with an extra parameter that allows for computation of fractions whose denominator is not necessarily relatively prime to $m$.
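
Inverse pairs can be computed directly from Definition 3.2; here is a short Python sketch (it uses the modular inverse pow(x, -1, m) available in Python 3.8+, rather than the recursive method developed below):

def inverse_pair(a, m, primes):
    """Return (u_a, v_a) for a modulo m; primes lists the prime factors of m."""
    if a % m == 0:
        return (0, 0)
    v = 1
    for p in primes:
        while a % p == 0:        # v_a collects the p-part of a for every p | m
            v *= p
            a //= p
    return (pow(a, -1, m), v)    # a has become a / v_a, which is coprime to m

# Example 1: the inverse pair of 12 modulo 20 is {4, 7}.
assert inverse_pair(12, 20, [2, 5]) == (7, 4)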

Consider the following lemma that allows for recursive computation of modular inverses.

Lemma 3.3.

If all of $1,2,\dots,a$ are invertible modulo $m$,

a^{-1}\equiv-(m\pmod{a})^{-1}\left\lfloor\frac{m}{a}\right\rfloor\pmod{m}.
Proof.

Note $m=a\lfloor\frac{m}{a}\rfloor+(m\pmod{a})$. Working in $\mathbb{Z}/m\mathbb{Z}$,

(m\pmod{a})^{-1}\cdot\left(m-\left\lfloor\frac{m}{a}\right\rfloor\right)=\frac{m-\lfloor\frac{m}{a}\rfloor}{m-a\lfloor\frac{m}{a}\rfloor}=\frac{-\lfloor\frac{m}{a}\rfloor}{-a\lfloor\frac{m}{a}\rfloor}=\frac{1}{a},

as desired. ∎
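
A quick numerical check of the lemma in Python (a sketch; for a prime modulus every $1\leq a<m$ is invertible, so the hypothesis holds):

def inv_recursive(a, m, known):
    """Lemma 3.3: a^{-1} = -(m mod a)^{-1} * floor(m/a) mod m,
    where known[r] already holds r^{-1} mod m for every needed r < a."""
    return (-known[m % a] * (m // a)) % m

m = 101                              # prime modulus
known = {1: 1}
for a in range(2, m):
    known[a] = inv_recursive(a, m, known)
    assert known[a] * a % m == 1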

We now extend this result from a recursive computation of modular inverses to a recursive computation of inverse pairs.

Lemma 3.4.

We may linearly compute $\{u_{n},v_{n}\}$ given the inverse pairs $\bigcup_{1\leq i<n}\{u_{i},v_{i}\}$.

Proof.

We claim that the following formula holds:

u_{n}=\begin{cases}u_{m\%n}\cdot\frac{-\lfloor\frac{m}{n}\rfloor}{v_{m\%n}}\pmod{m}&v_{n}=1\\ u_{n/v_{n}}\pmod{m}&v_{n}>1.\end{cases}

Note that $\gcd(n/v_{n},m)=1$ by definition, so that

u_{n/v_{n}}\equiv\left(\frac{n}{v_{n}}\right)^{-1}\equiv u_{n}\pmod{m}.

Suppose $v_{n}=1$, so that $m$ and $n$ are relatively prime. Decompose $m=qn+r$ via the division algorithm. We wish to show that

u_{n}\equiv u_{r}\cdot\frac{-q}{v_{r}}\equiv\frac{v_{r}}{r}\cdot\frac{-q}{v_{r}}\equiv\frac{-q}{r}\pmod{m}.

Since $u_{n}\equiv n^{-1}$, this is equivalent to

qn+r\equiv 0\pmod{m},

which is obvious, as $qn+r=m$. There are no inversion issues: $r/v_{r}$ is relatively prime to $m$ by the construction of $v_{r}$, so $u_{r}$ exists. ∎

We are now equipped to prove Theorem 3.1. The algorithm is as follows.

Proof.

First, define $T=p_{1}^{t_{1}}p_{2}^{t_{2}}\cdots p_{k}^{t_{k}}$ and $\Phi=\varphi(T)$. Decompose $n=M\Phi+r$ with the division algorithm. Then,

a^{n}\equiv a^{M\Phi+r}\equiv a^{r}\cdot(a^{\Phi})^{M}\pmod{m}.

Now, $a^{r}$ can be computed with the standard Repeated Squaring Algorithm. The complexity of this is $O(\log r)$. By the division algorithm, $r<\Phi<T$, so this is $O(\log T)$. For the computation of the second term, first notice that by Euler's theorem, $a^{\Phi}\equiv 1\pmod{T}$, so there exists some integer $s$ with $a^{\Phi}=Ts+1$. Now, we expand using the binomial theorem:

(a^{\Phi})^{M}=(Ts+1)^{M}
=1\cdot\binom{M}{0}+Ts\cdot\binom{M}{1}+\cdots+(Ts)^{i}\binom{M}{i}+\cdots+(Ts)^{M}\binom{M}{M}.

Let $\ell=1+\max(\lfloor e_{i}/t_{i}\rfloor)$. Only the first $\ell$ terms need to be computed, as all later terms are $0$ modulo $m$. This is because for a prime $p_{i}$,

\nu_{p_{i}}(T^{\ell})=t_{i}\ell\geq t_{i}\left(\left\lfloor e_{i}/t_{i}\right\rfloor+1\right)\geq t_{i}\cdot\frac{e_{i}}{t_{i}}=e_{i},

so $p_{i}^{e_{i}}\mid T^{\ell}$ for all $i$, and thus $m\mid T^{\ell}$ and $T^{\ell}\equiv 0\pmod{m}$.

Additionally, we can linearly compute the $\binom{M}{i}$ terms with the identity

\binom{M}{i+1}=\frac{M!}{(i+1)!(M-i-1)!}
=\frac{M-i}{i+1}\cdot\frac{M!}{i!(M-i)!}=\frac{M-i}{i+1}\binom{M}{i}.

Notice that in our binomial recursion, the one inverse we need to calculate $\binom{M}{i}$ is the inverse of $i$. Using Lemma 3.4, because we are performing a linear computation, we can compute the inverse pairs of $0$ through $\ell-1$ modulo $m$ in $O(\ell)$ time and $O(\ell)$ space. This is the only time we use more than $O(1)$ memory in this algorithm. From here, we can use our identity to compute $\binom{M}{0}$ through $\binom{M}{\ell-1}$ in $O(\ell)$ time and space. This is the final piece of the calculation, and we may now recover $a^{n}\pmod{m}$. It is now clear that the number of steps is $O(\ell+\log T)$ and the space complexity is $O(\ell)$. Because $\ell\leq 1+H_{\mathcal{T}}(m)$ and

\log T=\log\left(\prod_{i=1}^{k}p_{i}^{t_{i}}\right)=\sum_{i=1}^{k}\log(p_{i}^{t_{i}})=\sum_{i=1}^{k}t_{i}\log p_{i},

this concludes the proof of our main theorem. ∎

Example 2.

We present the details of the algorithm in the special case of $k=1$, $p_{1}=p$, $e_{1}=e$, and $t_{1}=1$ (so $T=p$). Our goal is to compute $a^{n}\pmod{p^{e}}$. We let $n=(p-1)m+r$, so that

a^{n}\equiv(a^{p-1})^{m}a^{r}.

Notice that $a^{p-1}\equiv 1\pmod{p}$, so we let $a^{p-1}=1+sp$. Compute $a^{r}$ via the standard modExp algorithm. Now compute

(a^{p-1})^{m}=(1+sp)^{m}\equiv 1+\binom{m}{1}(sp)^{1}+\binom{m}{2}(sp)^{2}+\cdots+\binom{m}{e-1}(sp)^{e-1}\pmod{p^{e}},

by the Binomial Theorem. We can recursively compute the $\binom{m}{i}$ coefficients modulo $p^{e}$ by the method outlined above. Thus we may compute each $\binom{m}{i}(sp)^{i}$ term for $0\leq i\leq e-1$. Finally, compute $a^{n}\equiv(a^{p-1})^{m}a^{r}\pmod{p^{e}}$.
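
Example 2 translates into a short Python sketch. We assume $p>e$ so that the inverses of $1,\dots,e-1$ exist modulo $p^{e}$ (letting us use the built-in pow(i, -1, m) of Python 3.8+ in place of inverse pairs), and $\gcd(a,p)=1$:

def fast_mod_exp_prime_power(a, n, p, e):
    """a^n mod p^e via Theorem 3.1 with t = 1, assuming gcd(a, p) = 1 and p > e."""
    m = p ** e
    M, r = divmod(n, p - 1)            # n = (p-1)M + r
    c = pow(a, p - 1, m) - 1           # a^(p-1) = 1 + c with p | c, by Fermat
    total, binom, c_pow = 0, 1, 1
    for i in range(min(e, M + 1)):     # only the first e binomial terms survive mod p^e
        total = (total + binom * c_pow) % m
        c_pow = c_pow * c % m
        binom = binom * ((M - i) % m) % m * pow(i + 1, -1, m) % m
    return total * pow(a, r, m) % m

# Reproduces Example 3 below: 7^123 = 1234 (mod 11^3).
assert fast_mod_exp_prime_power(7, 123, 11, 3) == 1234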

Example 3.

As a particular case, let us compute $7^{123}\pmod{11^{3}}$. Decompose $123=12\cdot 10+3$. Compute $7^{3}\equiv 343\pmod{11^{3}}$. Write $s=\frac{7^{10}-1}{11}$. Notice $s$ is an integer by Euler's theorem. Then

7^{120}\equiv(7^{10})^{12}\equiv(1+11s)^{12}\equiv\binom{12}{0}+\binom{12}{1}\cdot 11s+\binom{12}{2}\cdot(11s)^{2}\equiv 23\pmod{11^{3}}.

Therefore, $7^{123}\equiv 343\cdot 23\equiv 1234\pmod{11^{3}}$.

Notice that this algorithm uses $O(H_{\mathcal{T}}(m))$ space, which is not always ideal. For more memory-sensitive scenarios, we may compute inverses directly rather than recursively. This gives rise to the following theorem:

Theorem 3.5.

Let $m$ be a positive integer with known factorization $p_{1}^{e_{1}}p_{2}^{e_{2}}\cdots p_{k}^{e_{k}}$ and let $a,n$ be positive integers such that $\gcd(a,m)=1$. For each $1\leq i\leq k$, choose an integer parameter $t_{i}\in[1,e_{i}]$. With $\mathcal{T}=\{t_{1},t_{2},\cdots,t_{k}\}$, we may compute $a^{n}\pmod{m}$ in

O\left(H_{\mathcal{T}}(m)\log(H_{\mathcal{T}}(m))+\sum_{i=1}^{k}t_{i}\log p_{i}\right)

steps and $O(1)$ memory.

Proof.

Identical to Theorem 3.1, except that instead of recursively computing inverse pairs, we compute the modular inverses $u_{n}$ with standard inversion algorithms: $u_{n}\equiv(n/v_{n})^{-1}\pmod{m}$. If, for example, we use the Extended Euclidean Algorithm, then calculating the inverses of $1$ through $\ell$ has complexity $O(\sum_{n=1}^{\ell}\log(n/v_{n}))=O(\sum_{n=1}^{\ell}\log n)=O(\ell\log\ell)$. We do not give pseudocode for this algorithm; instead, see [2]. ∎
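
When every index $i\leq\ell$ is coprime to $m$ (e.g. $m=p^{e}$ with $p>e$), the on-the-fly inversion is a one-liner in Python 3.8+, and no table of inverse pairs is needed. A sketch of just the inner binomial recursion under that assumption:

from math import comb

def binomials_mod(M, m, count):
    """Yield C(M, i) mod m for i = 0..count-1 using O(1) extra memory.
    Each pow(..., -1, m) call is an extended-Euclid inversion, so the loop
    costs O(count * log(count)) steps, as in Theorem 3.5. Assumes every
    positive integer up to count is invertible modulo m."""
    binom = 1
    for i in range(count):
        yield binom
        binom = binom * ((M - i) % m) % m * pow(i + 1, -1, m) % m

assert list(binomials_mod(12, 11**3, 3)) == [comb(12, i) % 11**3 for i in range(3)]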

3.1. Pseudocode

Define $\textsc{modExp}(a,b,c)$ to be the standard Repeated Squaring Algorithm for computing $a^{b}\pmod{c}$. We use the integer division notation $a//b=\lfloor\frac{a}{b}\rfloor$. The pseudocode builds up to our algorithm, $\textsc{FastModExp}(a,n,P,E,T)$. Here $a$ and $n$ are positive integers, as usual. $P$ and $E$ are equal-sized arrays: $P$ is an array of primes and $E$ is an array of positive integers. $T$ has the same length as $P$ and $E$, and is an array of positive integer parameters such that $T[i]\leq E[i]$ for all $i$. We store the inverse pairs in a two-dimensional array $L$. This algorithm computes $a^{n}$ modulo $m=\prod_{i=1}^{\mathrm{len}(P)}P[i]^{E[i]}$.

Algorithm 1 Computes ν(n, p)
function ν(n, p)
     if n % p = 0 then
          return 1 + ν(n // p, p)
     end if
     return 0
end function
Algorithm 2 Computes the inverse pair of i given the inverse pairs of 0 through i − 1
function nextInversePair(i, m, L, P)
     v ← 1
     j ← 0
     while j < len(P) do
          v ← v · P[j]^ν(i, P[j])
          j ← j + 1
     end while
     if v = 1 then
          u ← (L[m % i][0] · ((m − m // i) // L[m % i][1])) % m
     end if
     if v ≠ 1 then
          u ← L[i // v][0]
     end if
     return [u, v]
end function
Algorithm 3 Computes the inverse pairs of 0 through i
function generateInversePairs(i, m, P)
     L ← [[0, 0], [1, 1]]
     j ← 2
     while j < i + 1 do
          L ← L + nextInversePair(j, m, L, P)
          j ← j + 1
     end while
     return L
end function
Algorithm 4 Computes a^n modulo m = ∏ P[i]^E[i]
function FastModExp(a, n, P, E, T)
     t ← 1
     ϕ ← 1
     m ← 1
     i ← 0
     while i < len(P) do
          temp ← P[i]^(T[i] − 1)
          ϕ ← ϕ · temp · (P[i] − 1)
          t ← t · temp · P[i]
          m ← m · P[i]^E[i]
          i ← i + 1
     end while
     r ← n % ϕ
     q ← (n − r) // ϕ
     c ← modExp(a, ϕ, m) − 1
     sum ← 0
     choose ← 1
     cExp ← 1
     ℓ ← 0
     i ← 0
     while i < len(P) do
          if ℓ < E[i] // T[i] then
               ℓ ← E[i] // T[i]
          end if
          i ← i + 1
     end while
     ℓ ← ℓ + 1
     inverses ← generateInversePairs(ℓ, m, P)
     i ← 0
     while i < min(ℓ, q + 1) do
          sum ← (sum + (choose · cExp)) % m
          cExp ← (cExp · c) % m
          choose ← (((choose · (q − i)) % m) // inverses[i + 1][1] · inverses[i + 1][0]) % m
          i ← i + 1
     end while
     ar ← modExp(a, r, m)
     return (sum · ar) % m
end function

4. Mathematical Analysis

It is important to choose the parameters $t_{i}$ optimally. For general $m$, the optimal choice is of the form $t_{i}=O(1)$ for all $i$. This is due to the following theorem:

Theorem 4.1.

[4] We have that

\lim_{n\to\infty}\frac{1}{n}\sum_{k\leq n}H(k)=C,

where $C=1+\sum_{k\geq 2}\left(1-\frac{1}{\zeta(k)}\right)\approx 1.705$ is Niven's constant.

In other words, for general $m$ we expect $H(m)$ to be $O(1)$. The choice of $t_{i}=O(1)$ for all $i$ is then clear. Let us proceed with some analysis of this choice.

By some metrics, we make an asymptotic improvement over the standard $O(\log m)$ repeated squaring method with $t_{i}=1$ (i.e., $T=\operatorname{rad}m$); by other metrics, we may not. First, let us sweep the $H(m)$ term under the rug (we can do so due to Theorem 4.1). If we compute $\sum_{m\leq x}\frac{\log m}{\log\operatorname{rad}m}$ or $\sum_{m\leq x}(\log m-\log\operatorname{rad}m)$, we get $\Omega(x)$. However, the sum $\sum_{m\leq x}\frac{m}{\operatorname{rad}m}$ is not $O(x)$, and is not even $O(x(\log x)^{A})$ for any $A$.

Theorem 4.2.

For any $x\in\mathbb{N}$, we have that

\sum_{m\leq x}\frac{\log m}{\log\operatorname{rad}m}=\Omega(x)=\sum_{m\leq x}(\log m-\log\operatorname{rad}m)
Proof.

By the simple estimate that $\frac{a}{b}\leq a-b+1$ for $a\geq b\geq 1$, we have that

\sum_{m\leq x}\frac{\log m}{\log\operatorname{rad}m}\leq O(x)+\sum_{m\leq x}(\log m-\log\operatorname{rad}m).

Note that, by Stirling’s approximation,

\sum_{m\leq x}\log m=\log(x!)=x\log x+O(x),

and

\sum_{m\leq x}\log\operatorname{rad}m=\sum_{m\leq x}\log\left(\prod_{p\mid m}p\right)=\sum_{m\leq x}\sum_{p\mid m}\log p.

Swapping the order of the summation, we may rewrite this sum as

\sum_{p\leq x}\log p\left\lfloor\frac{x}{p}\right\rfloor=\sum_{p\leq x}\frac{x}{p}\log p-O\left(\sum_{p\leq x}\log p\right)=x\sum_{p\leq x}\frac{\log p}{p}-O(\vartheta(x)).

Thus, applying Mertens' first theorem and Chebyshev's estimate for $\vartheta$, we have that

\sum_{m\leq x}\log\operatorname{rad}m=x\log x-O(x).

Therefore,

\sum_{m\leq x}\frac{\log m}{\log\operatorname{rad}m}=O(x).

On the other hand, $\log m\geq\log\operatorname{rad}m$, so that

\sum_{m\leq x}\frac{\log m}{\log\operatorname{rad}m}\geq x.

Therefore,

\sum_{m\leq x}\frac{\log m}{\log\operatorname{rad}m}=\Omega(x),

as desired. ∎

Theorem 4.3.

For any $x\in\mathbb{N}$ and $A>0$, we have that

\sum_{m\leq x}\frac{m}{\operatorname{rad}m}\neq O(x(\log x)^{A})
Proof.

Notice that $f(m):=\frac{m}{\operatorname{rad}m}$ is multiplicative, and $f(p^{k})=p^{k-1}$ for prime $p$ and $k\in\mathbb{N}$. Therefore, the Dirichlet series of $f$ is

F(s)=\sum_{n=1}^{\infty}\frac{f(n)}{n^{s}}=\prod_{p}\left(1+\sum_{k\geq 1}\frac{p^{k-1}}{p^{ks}}\right).

This expression simplifies to

F(s)=\prod_{p}\left(1+\frac{1}{p^{s}-p}\right).

Thus

\frac{F(s)}{(\zeta(s))^{A+1}}=\prod_{p}\left(\left(1+\frac{1}{p^{s}-p}\right)(1-p^{-s})^{A+1}\right)

by the Euler product for $\zeta$. Sending $s\to 1^{+}$ along the real axis, this product tends to infinity. Because $\lim_{s\to 1^{+}}\zeta(s)^{A+1}(s-1)^{A+1}=1$, it follows that $F(s)\neq O\left(\frac{1}{(s-1)^{A+1}}\right)$. Let $S(x)=\sum_{n\leq x}f(n)$ and suppose for the sake of contradiction that $S(x)=O(x(\log x)^{A})$. For $s>1$ we have, by partial summation,

F(s)=\int_{1^{-}}^{\infty}\frac{\mathrm{d}S(x)}{x^{s}}=\left[\frac{S(x)}{x^{s}}\right]_{1^{-}}^{\infty}+s\int_{1}^{\infty}\frac{S(x)}{x^{s+1}}\,\mathrm{d}x.

Note that because $S(x)=O(x(\log x)^{A})$, the bracketed term vanishes. Thus $F(s)$ is on the order of

\int_{1}^{\infty}\frac{(\log x)^{A}}{x^{s}}\,\mathrm{d}x.

To arrive at the desired contradiction, we must show that this integral is $O\left(\frac{1}{(s-1)^{A+1}}\right)$. To do this, we induct on $A$ (we may assume $A$ is an integer). For $A=0$, the result is obvious. For higher $A$, we integrate by parts. Set $u=(\log x)^{A}$ and $\mathrm{d}v=x^{-s}\,\mathrm{d}x$. Then

\int\frac{(\log x)^{A}}{x^{s}}\,\mathrm{d}x=\frac{(\log x)^{A}x^{1-s}}{1-s}-\frac{A}{1-s}\int\frac{(\log x)^{A-1}}{x^{s}}\,\mathrm{d}x,

and evaluating from $1$ to $\infty$, the boundary term vanishes for $s>1$, while the remaining term is $\frac{A}{s-1}\cdot O\left(\frac{1}{(s-1)^{A}}\right)=O\left(\frac{1}{(s-1)^{A+1}}\right)$ by the induction hypothesis. This implies the result. ∎

In fact, there is a more precise estimate than Theorem 4.3.

Theorem 4.4.

[5] For any $x\in\mathbb{N}$ we have that

\sum_{m\leq x}\frac{m}{\operatorname{rad}m}=x\exp\left((1+o(1))\sqrt{\frac{8\log x}{\log\log x}}\right)

By the metric given by Theorem 4.2, we make an $O(1)$ improvement “on average” over the repeated squaring method. However, by Theorem 4.3, we “expect” $m\gg(\log m)^{A}\operatorname{rad}m$ (very loosely), so that we “expect” $\log m\gg\log\operatorname{rad}m+A\log\log m$. By this metric, we do make an asymptotic improvement over the repeated squaring method.

Nonetheless, it is particularly fruitful to work with smooth $m$ rather than general $m$. We have the following corollary of Theorem 3.1:

Corollary 4.5.

Let $m$ be a positive integer with known factorization $p_{1}^{e_{1}}p_{2}^{e_{2}}\cdots p_{k}^{e_{k}}$ such that all primes $p_{i}$ have the same bit length as an integer $P$, and all $e_{i}$ satisfy $e_{i}\sim k\log P$. Let $a,n$ be positive integers such that $\gcd(a,m)=1$. We may then compute $a^{n}\pmod{m}$ in $O(\sqrt{\log m})$ steps.

Proof.

The idea is to use Theorem 3.1, set $H_{\mathcal{T}}(n)\approx\mathsf{c}$ for some $\mathsf{c}$, and choose $t_{i}$ such that $\frac{e_{i}}{t_{i}}\approx\mathsf{c}$ for all $i$; for example, take $t_{i}=\lfloor\frac{e_{i}}{\mathsf{c}}\rfloor$. Then $H_{\mathcal{T}}(n)=\mathsf{c}+O(1)$, and

\sum_{i=1}^{k}t_{i}\log p_{i}=\frac{1}{\mathsf{c}}\sum_{i=1}^{k}(e_{i}\log p_{i})+O\left(\sum_{i=1}^{k}\log p_{i}\right)=\frac{\log m}{\mathsf{c}}+O(\log\operatorname{rad}m).

Notice that

\sqrt{\log m}=\sqrt{\log P\sum_{i=1}^{k}e_{i}}\sim k\log P,

so we may choose $\mathsf{c}$ on the order of $\sqrt{\log m}$ (as $\mathsf{c}<e_{i}$ is necessary and sufficient). With this choice, we compute $a^{n}\pmod{m}$ in

O(\sqrt{\log m})+O(\log\operatorname{rad}m)

steps. Note that

\log\operatorname{rad}m\sim k\log P\sim\sqrt{\log m},

so that our modular exponentiation can be completed in $O(\sqrt{\log m})$ steps. ∎
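
The parameter choice in this proof is straightforward to realize in code; the following Python sketch (names are ours) picks $\mathsf{c}$ near $\sqrt{\log m}$ and rounds each $t_{i}\approx e_{i}/\mathsf{c}$ into $[1,e_{i}]$:

from math import isqrt, log

def choose_parameters(factorization):
    """Pick t_i ~ e_i / c with c ~ sqrt(log m), as in Corollary 4.5.
    factorization is a list of (p_i, e_i) pairs."""
    log_m = sum(e * log(p) for p, e in factorization)
    c = max(1, isqrt(round(log_m)))
    return [max(1, min(e, round(e / c))) for p, e in factorization]

# Example: m = p^45 for the Mersenne prime p = 2^61 - 1 gives c = 43 and t_1 = 1.
print(choose_parameters([(2**61 - 1, 45)]))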

Remark 4.6.

This shows that in general it is best to choose all $t_{i}$ as $O(1)$. When faced with the problem of optimizing these parameters, we can hence impose a relatively strong upper bound. On the other hand, it seems very difficult to compute the exactly optimal multiset $\mathcal{T}$. This matters because a smart choice of $t_{i}$ can yield a large performance boost (see Figure 3). If one could better understand the constant factors at play, then, modifying the above calculations, one could theoretically find the optimal choice of $\mathcal{T}$.

In other words, when $m$ has large exponents relative to its prime factors, our algorithm makes large improvements (as $\log\operatorname{rad}m\ll\log m$ in the case where the exponents are on the order of $\log P$). In particular, our algorithm does well for prime powers. We have the following corollary:

Corollary 4.7.

Let $m=p^{\ell}$ where $\ell=O(\log p)$. Let $a,n$ be positive integers such that $\gcd(a,m)=1$. We may then compute $a^{n}\pmod{m}$ in $O(\log p)$ steps.

Proof.

This trivially follows from Corollary 4.5 with $k=1$. ∎

This is quite striking: if we pick $\ell$ on the order of $\log p$, we may exponentiate modulo $p^{\ell}$ in the same number of steps (asymptotically) as modulo $p$. Furthermore, because each operation (modular multiplication or addition) can be taken to be $O((\log m)^{2})$, we have the following result:

Corollary 4.8.

Let the modulus $m$ be a positive integer with known factorization $p_{1}^{e_{1}}p_{2}^{e_{2}}\cdots p_{k}^{e_{k}}$ such that all primes $p_{i}$ have the same bit length as an integer $P$, and all $e_{i}$ satisfy $e_{i}\sim k\log P$. If the standard modular exponentiation algorithm takes $T$ time, our algorithm for the same values takes $cT^{5/6}$ time, for some constant $c$ and given unit of time.

Proof.

For convenience, denote $O((\log m)^{e})$ simply as $f(e)$. The standard algorithm takes $f(1)$ steps and hence $f(3)$ time. Our algorithm takes $f(1/2)$ steps and hence $f(5/2)$ time. Then, if $T_{1}$ is the time it takes our algorithm to run and $T_{2}$ is the time for the standard algorithm to run, we have for some constants $c_{1},c_{2}$ that

(\log m)^{5/2}\sim c_{1}T_{1},
(\log m)^{3}\sim c_{2}T_{2}.

Hence $T_{1}\sim c_{2}^{5/6}c_{1}^{-1}T_{2}^{5/6}$, so we obtain the desired result by taking the appropriate $c\sim c_{2}^{5/6}c_{1}^{-1}$. ∎

Remark 4.9.

One might think that for such smooth $m$, a CRT-type approach would also be fast. However, it is well known (see e.g. [6]) that CRT reconstruction is $O(\ell^{2})$, where the moduli have bit length $\ell$. Hence this is a non-issue.

5. General Modular Exponentiation

We may extend the ideas of our method to prove a more general theorem about matrix modular exponentiation, which carries over to linear recurrences. In this section, a step is a matrix multiplication.

Theorem 5.1.

Let $m=p_{1}^{e_{1}}p_{2}^{e_{2}}\cdots p_{k}^{e_{k}}$ be the canonical factorization of some $m\in\mathbb{N}$. Choose parameters $t_{i}\in[1,e_{i}]$ for each $1\leq i\leq k$. Let $d\in\mathbb{N}$, $A\in GL_{d}(\mathbb{Z}/m\mathbb{Z})$, and $n\in\mathbb{N}$. Then we may compute $A^{n}$ in $O\left(\max(e_{i}/t_{i})+\sum_{j=1}^{k}(t_{j}-1)d^{2}\log p_{j}+\sum_{j=1}^{k}\sum_{i=0}^{d-1}\log(p_{j}^{d}-p_{j}^{i})\right)$ steps.

Proof.

Let $T=\prod_{j=1}^{k}p_{j}^{t_{j}}$. The algorithm is identical to that of Theorem 3.1, except that we decompose the exponent modulo $|GL_{d}(\mathbb{Z}/T\mathbb{Z})|$ instead of $\varphi(T)$. This works because of Lagrange's Theorem 2.8. The complexity is hence $O\left(\max(e_{i}/t_{i})+\log|GL_{d}(\mathbb{Z}/T\mathbb{Z})|\right)$. By Lemma 2.10, this is the desired complexity. ∎

Remark 5.2.

A near-identical result holds for the special linear group instead. This is less practically useful but still important to note.

This asymptotic may be slightly difficult to work with, but a trivial corollary is the following:

Corollary 5.3.

Let $m=p_{1}^{e_{1}}p_{2}^{e_{2}}\cdots p_{k}^{e_{k}}$ be the canonical factorization of some $m\in\mathbb{N}$. Choose parameters $t_{i}\in[1,e_{i}]$ for each $1\leq i\leq k$. Let $d\in\mathbb{N}$, $A\in GL_{d}(\mathbb{Z}/m\mathbb{Z})$, and $n\in\mathbb{N}$. Then we may compute $A^{n}$ in $O\left(\max(e_{i}/t_{i})+d^{2}\sum_{i=1}^{k}t_{i}\log p_{i}\right)$ steps.

Proof.

This follows from Theorem 5.1 and the obvious estimate $\log(p_{j}^{d}-p_{j}^{i})<\log(p_{j}^{d})=d\log p_{j}$. Indeed, the double sum is bounded above by $d^{2}\sum_{j=1}^{k}\log p_{j}$, and the result follows. ∎

Remark 5.4.

With $d=1$, we recover Theorem 3.1. This also shows that we can exponentiate matrices of size $O(1)$ with the same complexity as natural numbers.

Another corollary is that we may quickly compute the residue of large elements of a linear recurrent sequence modulo $m$:

Corollary 5.5.

Let $m=p_{1}^{e_{1}}p_{2}^{e_{2}}\cdots p_{k}^{e_{k}}$ be the canonical factorization of some $m\in\mathbb{N}$. Choose parameters $t_{i}\in[1,e_{i}]$ for each $1\leq i\leq k$. Let $(u_{n})_{n=0}^{\infty}$ be a sequence of elements of $\mathbb{Z}/m\mathbb{Z}$ satisfying a linear recurrence relation of degree $d$:

u_{n+d}=c_{d-1}u_{n+d-1}+\cdots+c_{0}u_{n}.

Suppose that $\gcd(c_{0},m)=1$. Given sufficient initial terms, we may compute any element $u_{N}$ in $O\left(\max(e_{i}/t_{i})+\sum_{j=1}^{k}(t_{j}-1)d^{2}\log p_{j}+\sum_{j=1}^{k}\sum_{i=0}^{d-1}\log(p_{j}^{d}-p_{j}^{i})\right)$ steps.

Proof.

Linear recurrence relations of order $d$ can be represented as matrix powers. One must make sure that the companion matrix has determinant invertible in $\mathbb{Z}/m\mathbb{Z}$ (and hence in $\mathbb{Z}/T\mathbb{Z}$); this determinant has magnitude $c_{0}$, which is why we require $\gcd(c_{0},m)=1$. The matrix is then in $GL_{d}(\mathbb{Z}/m\mathbb{Z})$, and the result follows immediately from Theorem 5.1. ∎

Remark 5.6.

This shows that we may compute the $n$th Fibonacci number modulo $m$ in the same number of steps as a modular exponentiation modulo $m$. For example, modulo $p^{k}$ we may do it in $O(k+\log p)$ steps.
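
As an illustration, here is a Python sketch of Remark 5.6 for the Fibonacci sequence modulo $p^{k}$ (our own code; it assumes $p>k$ so the needed inverses exist, uses $t=1$, and relies on pow(x, -1, m) from Python 3.8+):

def fib_mod_prime_power(n, p, k):
    """F(n) mod p^k in O(k + log p) matrix steps, via Theorem 5.1 with d = 2, t = 1."""
    m = p ** k

    def mat_mul(A, B):
        return [[sum(A[i][s] * B[s][j] for s in range(2)) % m for j in range(2)]
                for i in range(2)]

    def mat_pow(A, e):                      # repeated squaring on 2x2 matrices
        R = [[1, 0], [0, 1]]
        while e:
            if e & 1:
                R = mat_mul(R, A)
            A = mat_mul(A, A)
            e >>= 1
        return R

    A = [[1, 1], [1, 0]]                    # Fibonacci companion matrix, det = -1
    phi = (p * p - 1) * (p * p - p)         # |GL_2(F_p)|, so A^phi = I (mod p)
    M, r = divmod(n, phi)
    B = mat_pow(A, phi)                     # B = I + pS (mod p^k)
    C = [[(B[i][j] - (i == j)) % m for j in range(2)] for i in range(2)]   # pS
    total, term, binom = [[0, 0], [0, 0]], [[1, 0], [0, 1]], 1
    for i in range(min(k, M + 1)):          # first k binomial terms of (I + pS)^M
        for x in range(2):
            for y in range(2):
                total[x][y] = (total[x][y] + binom * term[x][y]) % m
        term = mat_mul(term, C)
        binom = binom * ((M - i) % m) % m * pow(i + 1, -1, m) % m
    result = mat_mul(total, mat_pow(A, r))  # A^n = (A^phi)^M * A^r
    return result[0][1]                     # A^n = [[F(n+1), F(n)], [F(n), F(n-1)]]

assert fib_mod_prime_power(10, 11, 3) == 55    # F(10) = 55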

Fiduccia [7] and, more recently, Bostan and Mori [8] provide the state-of-the-art results for this problem in the case of a sequence over a general ring. The number of steps taken is on the order of $M(d)\log n$, where $M(d)=O(d\log d\log\log d)$ is the number of operations needed to multiply two polynomials over the ring. We achieve a stronger bound in the case where the ring is $\mathbb{Z}/m\mathbb{Z}$. Our bound does not depend on $n$ because of the reduction we make modulo $|GL_{d}(\mathbb{Z}/T\mathbb{Z})|$. To reduce $n$ in the same manner modulo $m$ for the bounds of Fiduccia and of Bostan and Mori, the best possible reduction is by $|GL_{d}(\mathbb{Z}/m\mathbb{Z})|$. By Lemma 2.10, $M(d)\log|GL_{d}(\mathbb{Z}/m\mathbb{Z})|\approx M(d)d^{2}\log m\gg O(d^{3}\log d\log m)$. If $\omega\in[2,3]$ is the exponent for matrix multiplication, we take fewer than $O(d^{\omega}\max(e_{i}/t_{i})+d^{2+\omega}\log T)$ operations of the same kind as those counted by Fiduccia. Therefore Fiduccia's algorithm is better as a function of $d$, but ours is better as a function of $m$. It is often the case that $m\gg d$, so we make a significant improvement. Indeed, the key optimization that Fiduccia makes (involving the characteristic polynomial) only affects the complexity as a function of $d$.

We can also apply our algorithm to ring extensions:

Theorem 5.7.

Let $p$ be a prime and let $k$ and $n$ be natural numbers. Consider a finite ring extension $R=\mathbb{Z}/p^{k}\mathbb{Z}[\alpha_{1},\alpha_{2},\cdots,\alpha_{n}]$ and the corresponding field extension $F=\mathbb{F}_{p}[\alpha_{1},\alpha_{2},\cdots,\alpha_{n}]$. We can exponentiate units in $R$ in $O(k+\log|F|)$ steps, where each step is an operation in $R$.

Proof.

The algorithm is exactly that of Theorem 3.1 for prime powers with $t=1$, except that we decompose the exponent modulo $|F^{\times}|=|F|-1$. This works due to Lagrange's Theorem 2.8. ∎

A nice example of this theorem is that we obtain fast modular exponentiation for Gaussian integers modulo prime powers! This case is also related to Theorem 5.1 via the correspondence $a+bi\leftrightarrow\begin{pmatrix}a&-b\\ b&a\end{pmatrix}$.
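
For instance, a Python sketch of Theorem 5.7 for Gaussian integers modulo $p^{k}$ (our own illustrative code; it assumes $p\equiv 3\pmod{4}$ so that $\mathbb{F}_{p}[i]$ is the field with $p^{2}$ elements, that $a+bi$ is a unit, and $p>k$ so the small inverses exist):

def gauss_pow_mod(a, b, n, p, k):
    """(a + bi)^n in Z[i] / p^k, returned as a pair (real, imag) of residues."""
    m = p ** k

    def mul(x, y):                        # Gaussian multiplication mod p^k
        return ((x[0]*y[0] - x[1]*y[1]) % m, (x[0]*y[1] + x[1]*y[0]) % m)

    def gpow(z, e):                       # repeated squaring in the ring
        r = (1, 0)
        while e:
            if e & 1:
                r = mul(r, z)
            z = mul(z, z)
            e >>= 1
        return r

    phi = p * p - 1                       # |F^x| for F = F_p[i]
    M, rem = divmod(n, phi)
    w = gpow((a, b), phi)                 # w = 1 + p*s in the ring
    c = ((w[0] - 1) % m, w[1] % m)        # the p-divisible part of w
    total, term, binom = (0, 0), (1, 0), 1
    for i in range(min(k, M + 1)):        # first k binomial terms of (1 + ps)^M
        total = ((total[0] + binom * term[0]) % m, (total[1] + binom * term[1]) % m)
        term = mul(term, c)
        binom = binom * ((M - i) % m) % m * pow(i + 1, -1, m) % m
    return mul(total, gpow((a, b), rem))

# Check (2 + i)^100 mod 7^3 against direct repeated multiplication:
z, m = (1, 0), 7**3
for _ in range(100):
    z = ((z[0]*2 - z[1]) % m, (z[0] + z[1]*2) % m)
assert gauss_pow_mod(2, 1, 100, 7, 3) == z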

6. Programmatic Analysis

In this section, we test only the integer modular exponentiation version of our algorithm, as we have proven that the generalizations have similar complexities.

Let us begin by testing our algorithm for $m=p^{k}$. We will test against Python's built-in $\mathrm{pow}$ function. We iterate over primes $p$, and choose $k$ uniformly at random in the interval $[\log p-\sqrt{\log p},\log p+\sqrt{\log p}]$. We choose $a\in[p^{k}/2,p^{k}]$ randomly (such that $\gcd(a,p)=1$), and we choose $n$ randomly in the interval $[p^{k}/2,p^{k}]$. We choose $t_{i}=1$ for simplicity. We then compare the runtime for computation of $a^{n}\pmod{p^{k}}$ via our method ($\mathcal{R}_{1}$) and via Python's built-in $\mathrm{pow}$ function ($\mathcal{R}_{2}$) over $1000$ iterations.

In Figure 1 we iterate over $0\leq n\leq 3.5\cdot 10^{4}$, plotting $p_{n+0.7\cdot 10^{5}}$ versus the respective ratio $\frac{\mathcal{R}_{2}}{\mathcal{R}_{1}}$.

Figure 1. Ratio for small primes

In Figure 2 we iterate over $0\leq n\leq 3.5\cdot 10^{4}$, plotting $p_{n+3.5\cdot 10^{5}}$ versus the respective ratio $\frac{\mathcal{R}_{2}}{\mathcal{R}_{1}}$.

Figure 2. Ratio for big primes

As seen in the figures, the ratio has high variance over small primes, whereas it steadies for larger primes. The large jump in the second figure at about $n=16000$ is likely because Python applies different optimizations to smaller inputs of the $\mathrm{pow}$ function. We still do not have an explanation for the random jumps in the data. Nonetheless, a basic implementation of our algorithm makes significant improvements over the highly optimized $\mathrm{pow}$ function.

The figure below shows why the problem of optimizing $\mathcal{T}$ is important.

Figure 3. Choosing optimal $t$

For a specific instance of computing $a^{n}\pmod{p^{e}}$ with $p=101$, $e=200$, $a=13$, and $n=\lfloor\frac{p^{e}}{3}\rfloor$ (chosen arbitrarily), we vary the choice of $\mathcal{T}$ over the set $\{\{1\},\{2\},\cdots,\{50\}\}$. We plot this against the runtime of the computation to create the blue curve. As seen in the figure, the optimal value of $\mathcal{T}$ yields a time approximately $5$ times faster than $\mathcal{T}=\{1\}$. The orange curve demonstrates how the graph follows a curve of the form $at+b/t$, as indicated by Corollary 4.5. The $R^{2}$ value is $96.5\%$.

Recall that we achieve a complexity of $O(\sqrt{\log m})$ for an infinite family of $m$, by Corollary 4.5. We aim to show this empirically. Because the complexity of the repeated squaring algorithm is $O(\log m)$, if we graph the ratio $r=\mathcal{R}_{2}/\mathcal{R}_{1}$ of the runtime of Python's built-in modular exponentiation function to ours, we expect to see $r\propto\sqrt{\log m}$. With $y=r$ and $x=\log m$, we anticipate a graph of the form $y=c\sqrt{x}$.

The setup is as follows. Let $P(n)$ be the first prime greater than $10^{n}$. For $10\leq n<50$, we let $p=P(n)$ and $e=n$. Pick $t_{i}=1$ for simplicity. Then $m=p^{e}$ is in the desired family. For each $n$, we compute the ratio $r$ and plot it against $\log_{10}m\approx n^{2}$.

Figure 4. Ratio graph with $\log m$ on the $x$-axis

The desired $y=c\sqrt{x}$ curve appears. Interestingly, the ratio for even $n$ is on average $24.8$ percent higher than for odd $n$. The curves fit quite well: the $R^{2}$ values for even and odd $n$ are $94.9$ and $85.2$ percent, respectively.

One might notice that three values of odd $n$ appear to fit on the curve for even $n$: $21$, $39$, and $45$. We do not know why this phenomenon occurs, or why the ratio is higher in general for even $n$.

Remark 6.1.

In Python, we have achieved computation times more than $200$ times faster than the built-in $\mathrm{pow}$ for specific large values of $m$.

See [2] for the Python code used to create these graphs.

7. Conclusion

We presented a fast algorithm for modular exponentiation when the factorization of the modulus is known, along with a variant that uses less memory. We analyzed this algorithm in the general case, then shifted our focus to the specific case where the exponents in the modulus's factorization are large. We showed particular interest in the case where the modulus is a prime power, and we analyzed this case programmatically, testing it against Python's built-in $\mathrm{pow}$ function. We also presented a stronger version of our algorithm for matrix modular exponentiation, which applies to the computation of large terms of linear recurrent sequences modulo $m$.

This algorithm has potential practical use in cryptography. Fast modular exponentiation is vital to fast encryption in classic schemes such as RSA and the Diffie-Hellman key exchange. It is even used in quantum algorithms: modular exponentiation is the bottleneck of Shor's algorithm. If one could construct a cryptosystem in which it is useful to have a known modulus with large prime-power exponents, our algorithm would be applicable to its encryption process. For example, a variant of Takagi's cryptosystem [9] with larger exponents has such properties. Additionally, work has been done on using matrix exponentiation and linear recurrences for error-correcting codes; for example, Matsui's 2012 paper [10] uses linear recurrent sequences in a decoding algorithm. Our algorithm is potentially useful in such settings.

There are a few things that we wish to do with this work going forward. We would like to find a framework for programmatically testing general moduli (not only prime powers). Additionally, we hope to make further progress on optimizing $\mathcal{T}$ in practice. Furthermore, we want to find explanations for some of the phenomena visible in the figures of section 6. Finally, we aim to implement further optimizations to our algorithm, such as Montgomery reduction.

8. Acknowledgments

We thank Dr. Simon Rubinstein-Salzedo for the useful discussion we had during the creation of this paper. We also thank Eva Goedhart and Nandita Sahajpal for awarding us the Lehmer Prize for this work at the 2023 West Coast Number Theory conference. We finally thank Dr. John Gorman from Jesuit High School Portland for inspiring this paper.

References

  • [1] Eric Bach. Discrete Logarithms and Factoring. Technical Report UCB/CSD-84-186, Jun 1984.
  • [2] Manu Isaacs and Anay Aggarwal. Algorithm for Fast Modular Exponentiation. https://github.com/misaacs3737/modExp/, 2023. Accessed: January 17, 2024.
  • [3] Donald E. Knuth. The Art of Computer Programming, Vol. 1: Fundamental Algorithms. Addison-Wesley, Reading, Mass., third edition, 1997.
  • [4] Ivan Niven. Averages of Exponents in Factoring Integers. Proceedings of the American Mathematical Society, 22(2):356–360, 1969.
  • [5] Olivier Robert and Gérald Tenenbaum. Sur la répartition du noyau d’un entier. Indagationes Mathematicae, 24(4):802–914, 2013. In memory of N.G. (Dick) de Bruijn (1918–2012).
  • [6] Joris van der Hoeven. Fast Chinese Remaindering in Practice. In International Conference on Mathematical Aspects of Computer and Information Sciences, 2017.
  • [7] Charles M. Fiduccia. An Efficient Formula for Linear Recurrences. SIAM Journal on Computing, 14(1):106–112, 1985.
  • [8] Alin Bostan and Ryuhei Mori. A Simple and Fast Algorithm for Computing the $n$-th Term of a Linearly Recurrent Sequence, 2020.
  • [9] Tsuyoshi Takagi. Fast RSA-type cryptosystem modulo $p^{k}q$. In Advances in Cryptology—CRYPTO '98, pages 318–326. Springer, 1998.
  • [10] Hajime Matsui. Decoding a Class of Affine Variety Codes with Fast DFT, 2012.