An Algorithm for Finding Positive Solutions to Polynomial Equations
Abstract.
We present a numerical algorithm for finding real non-negative solutions to polynomial equations. Our methods are based on the expectation maximization and iterative proportional fitting algorithms, which are used in statistics to find maximum likelihood parameters for certain classes of statistical models. Since our algorithm works by iteratively improving an approximate solution, it also finds approximate solutions in cases where no exact solution exists, such as overconstrained systems.
We present an iterative numerical method for finding non-negative solutions and approximate solutions to systems of polynomial equations. We require two assumptions about our system of equations. First, for each equation, all the coefficients other than the constant term must be non-negative. Second, there is a technical assumption on the exponents, described at the beginning of Section 1, which, for example, is satisfied if all non-constant terms have the same total degree. In Section 3, there is a discussion of the range of possible systems which can arise under these hypotheses.
Because of the assumption on signs, we can write our system of equations as,
(1)
\[
\sum_{u \in A} c_{i,u}\, x^u = b_i, \qquad i = 1, \ldots, m,
\]
where the coefficients $c_{i,u}$ are non-negative, the constants $b_i$ are positive, $A$ is a finite set of possibly non-integer multi-indices, and $x^u$ denotes the monomial $x_1^{u_1} \cdots x_n^{u_n}$. Our algorithm works by iteratively decreasing the generalized Kullback-Leibler divergence between the left-hand side and the right-hand side of (1). The generalized Kullback-Leibler divergence of two positive vectors $b$ and $c$ is defined to be
(2)
\[
D(b \,\|\, c) = \sum_i \left( b_i \log \frac{b_i}{c_i} - b_i + c_i \right).
\]
The standard Kullback-Leibler divergence consists only of the first term and is defined only for probability distributions, i.e. vectors whose entries sum to $1$. The last two terms are necessary so that the generalized divergence has, for arbitrary positive vectors $b$ and $c$, the property of being non-negative and zero exactly when $b$ and $c$ are equal (Proposition 3).
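For concreteness, the divergence (2) is straightforward to evaluate numerically; the following C fragment is a minimal sketch (it is not taken from the implementation described below) that computes $D(b \,\|\, c)$ for two positive vectors of length $m$.

#include <math.h>
#include <stddef.h>

/* Generalized Kullback-Leibler divergence of two strictly positive
 * vectors b and c of length m, as in (2).  All entries are assumed to
 * be positive; no error checking is done in this sketch. */
double generalized_kl(const double *b, const double *c, size_t m)
{
    double d = 0.0;
    for (size_t i = 0; i < m; i++)
        d += b[i] * log(b[i] / c[i]) - b[i] + c[i];
    return d;
}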
Our algorithm converges to local minima of the Kullback-Leibler divergence, including exact solutions to the system (1). In order to find multiple local minima, we can repeat the algorithm for randomly chosen starting points. For finding approximate solutions, this may be sufficient. However, there are no guarantees of completeness for the exact solutions obtained in this way. Nonetheless, we hope that in certain situations, our algorithm will find applications both for finding exact and approximate solutions.
Lee and Seung applied the EM algorithm to the problem of non-negative matrix factorization [6]. They introduced the generalized Kullback-Leibler divergence in (2) and used it to find approximate non-negative matrix factorizations. Since the product of two matrices can be expressed as polynomials in the entries of the matrices, matrix factorization is a special case of the equations in (1).
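To make the connection concrete (with matrix names chosen here only for illustration), an entrywise non-negative factorization $V \approx W H$ of an $m \times n$ matrix with inner dimension $r$ amounts to solving the system
\[
\sum_{k=1}^{r} W_{ik} H_{kj} = V_{ij}, \qquad 1 \le i \le m, \quad 1 \le j \le n,
\]
in the unknown entries of $W$ and $H$: every coefficient is non-negative and every monomial $W_{ik} H_{kj}$ is bilinear, so the system has the form (1).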
For finding exact solutions to arbitrary systems of polynomials, there are a variety of approaches which find all complex or all real solutions. Homotopy continuation methods find all complex roots of a system of equations [8]. Lasserre, Laurent, and Rostalski have applied semidefinite programming to find all real solutions to a system of equations, and a slight modification of their algorithm will find all positive real solutions [4, 5]. Even when only the positive roots are of interest, these two methods compute all complex or all real solutions, respectively. Moreover, neither of these methods has any notion of approximate solutions.
For directly finding only positive real solutions, Bates and Sottile have proposed an algorithm based on fewnomial bounds on the number of real solutions [1]. However, their method is only effective when the number of monomials (the set $A$ in our notation) is only slightly larger than the number of variables. Our method makes only weak assumptions on the set of monomials, but stronger assumptions on the coefficients.
Our inspiration comes from tools for maximum likelihood estimation in statistics. Parameters which maximize the likelihood are exactly those for which the model probabilities are closest to the empirical probabilities, in the sense of minimizing Kullback-Leibler divergence. Expectation-Maximization [7, Sec. 1.3] and Iterative Proportional Fitting [3] are well-known iterative methods for maximum likelihood estimation. We re-interpret these algorithms as methods for approximating solutions to polynomial equations, in which case their applicability can be somewhat generalized.
The impetus behind the work in this paper was the need to find approximate positive solutions to systems of bilinear equations in [2]. In that case the variables represented expression levels, which only made sense as positive parameters. Moreover, in order to accommodate noise in the data, there were more equations than variables, so it was necessary to find approximate solutions. Thus, the algorithm described in this paper was the most appropriate tool. Here, we generalize beyond bilinear equations and present proofs.
An implementation of our algorithm in the C programming language is freely available at http://math.berkeley.edu/~dustin/pos/.
In Section 1, we describe the algorithm and the connection to maximum likelihood estimation. In Section 2, we prove the necessary convergence results for our algorithm. Finally, in Section 3, we show that even with our restrictions on the form of the equations, there can be exponentially many positive real solutions.
1. Algorithm
We make the assumption that we have a non-negative $\ell \times n$ matrix $T = (t_{jk})$, with no column identically zero, and positive real numbers $d_j$ for $1 \le j \le \ell$ such that for each $u \in A$ and each $j$, the weighted degree $\sum_k t_{jk} u_k$ is either $0$ or $d_j$. For example, if all the monomials have the same total degree $d$, we can take $\ell = 1$, $t_{1k} = 1$ for all $k$, and $d_1 = d$. The other case of particular interest is multilinear systems of equations, in which each exponent $u_k$ is at most one. In this case the variables can be partitioned into $\ell$ sets such that the equations are linear in each set of variables, so, taking $T$ to be the zero-one incidence matrix of this partition, we can take $d_j = 1$ for all $j$. Note that because $d_j$ appears in the denominator in (4), convergence is fastest when the $d_j$ are small, such as in the multilinear case. We also note that, for an arbitrary set of exponents $A$, there may not exist such a matrix $T$.
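As a small illustration of the multilinear case (with names chosen here only for the example), consider a system in four variables partitioned into the two groups $\{y_1, y_2\}$ and $\{z_1, z_2\}$ whose non-constant terms are the products $y_a z_b$. Every such monomial has degree exactly one in each group, so the hypothesis is satisfied with $\ell = 2$ and $d_1 = d_2 = 1$; for instance, the pair of equations
\[
2 y_1 z_1 + y_2 z_2 = 1, \qquad y_1 z_2 + 3 y_2 z_1 = 2
\]
has the required form.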
The algorithm begins with a randomly chosen starting vector $x$ and iteratively improves it through two nested iterations:
- Initialize $x$ with randomly chosen positive real numbers.
- Loop until the vector $x$ stabilizes:
  (a) For all $i$ and all $u \in A$, compute
  (3)
  \[
  y_{i,u} = b_i \, \frac{c_{i,u}\, x^u}{\sum_{u' \in A} c_{i,u'}\, x^{u'}} .
  \]
  (b) Loop until the vector $x$ stabilizes:
    (i) Loop for $j$ from $1$ to $\ell$:
      (A) Simultaneously update all entries of $x$:
      (4)
      \[
      x_k \leftarrow x_k \left( \frac{\sum_{i} \sum_{u \in A} u_k\, y_{i,u}}{\sum_{i} \sum_{u \in A} u_k\, c_{i,u}\, x^u} \right)^{t_{jk}/d_j}, \qquad k = 1, \ldots, n .
      \]
Because there is no subtraction, it is clear that the entries of $x$ remain positive throughout this algorithm.
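The following C program is a compressed sketch of the iteration on a small hard-coded system; the system, the constants, and all names are chosen only for this illustration, and the sketch is not the released implementation. It treats the simplest case in which every monomial has the same total degree, and it performs a single multiplicative pass of the inner loop per outer iteration.

#include <math.h>
#include <stdio.h>

/* Toy instance of (1): x1^2 + 2*x1*x2 = 3 and 3*x2^2 = 3, which has the
 * positive solution (1, 1).  Term t contributes coeff[t] * x^expnt[t]
 * to equation eq[t]; every monomial has total degree DEG = 2. */
#define NVARS  2
#define NEQNS  2
#define NTERMS 3
#define DEG    2.0

static const int    eq[NTERMS]           = {0, 0, 1};
static const double coeff[NTERMS]        = {1.0, 2.0, 3.0};
static const double expnt[NTERMS][NVARS] = {{2, 0}, {1, 1}, {0, 2}};
static const double b[NEQNS]             = {3.0, 3.0};

/* Evaluate the left-hand sides f[i] of (1) at the point x. */
static void evaluate(const double *x, double *f)
{
    for (int i = 0; i < NEQNS; i++)
        f[i] = 0.0;
    for (int t = 0; t < NTERMS; t++) {
        double m = coeff[t];
        for (int k = 0; k < NVARS; k++)
            m *= pow(x[k], expnt[t][k]);
        f[eq[t]] += m;
    }
}

int main(void)
{
    double x[NVARS] = {0.3, 1.7};   /* arbitrary positive starting point */
    double f[NEQNS], num[NVARS], den[NVARS];

    for (int iter = 0; iter < 500; iter++) {
        evaluate(x, f);             /* step (a): the ratios b[i]/f[i] enter y */
        for (int k = 0; k < NVARS; k++)
            num[k] = den[k] = 0.0;
        for (int t = 0; t < NTERMS; t++) {
            double m = coeff[t];
            for (int k = 0; k < NVARS; k++)
                m *= pow(x[k], expnt[t][k]);
            for (int k = 0; k < NVARS; k++) {
                num[k] += expnt[t][k] * m * b[eq[t]] / f[eq[t]];
                den[k] += expnt[t][k] * m;
            }
        }
        /* step (b): multiplicative update, damped by 1/DEG */
        for (int k = 0; k < NVARS; k++)
            if (den[k] > 0.0)
                x[k] *= pow(num[k] / den[k], 1.0 / DEG);
    }
    evaluate(x, f);
    printf("x = (%g, %g), residuals = (%g, %g)\n",
           x[0], x[1], f[0] - b[0], f[1] - b[1]);
    return 0;
}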
Our method is inspired by interpreting the equations in (1) as a maximum likelihood problem for a statistical model and applying the well-known methods of Expectation-Maximization (EM) and Iterative Proportional Fitting (IPF). Here, we assume that all the monomials have the same total degree. Our statistical model is that a hidden process generates an integer $i$ and an exponent vector $u \in A$ with joint probability $c_{i,u} x^u$. The vector $x$ contains the positive parameters of the model, restricted such that the total probability is $1$. The empirical data consists of repeated observations of the integer $i$, but not the exponent $u$, and $b_i$ is the proportion of observations equal to $i$. In this situation, the vector $x$ which minimizes the divergence between the two sides of (1) gives the maximum likelihood parameters for the empirical distribution $b$. The inner loop of the algorithm consists of using IPF to fit the log-linear hidden model and the outer loop consists of using EM to estimate the distribution on the hidden states.
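To spell out the correspondence: the marginal probability of observing $i$ is $\sum_{u \in A} c_{i,u} x^u$, so the normalized log-likelihood of the empirical data is
\[
\ell(x) = \sum_i b_i \log \Big( \sum_{u \in A} c_{i,u}\, x^u \Big),
\]
and, because $\sum_i b_i = 1$ and the parameters are constrained so that $\sum_i \sum_{u \in A} c_{i,u} x^u = 1$, the divergence (5) equals $\sum_i b_i \log b_i - \ell(x)$. Maximizing the likelihood is therefore the same as minimizing the divergence.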
2. Proof of convergence
In this section we prove our main theorem:
Theorem 1.
The Kullback-Leibler divergence
(5)
\[
D\Big(b \,\Big\|\, \big(\textstyle\sum_{u \in A} c_{i,u}\, x^u\big)_{i}\Big) = \sum_{i} \Big( b_i \log \frac{b_i}{\sum_{u \in A} c_{i,u}\, x^u} - b_i + \sum_{u \in A} c_{i,u}\, x^u \Big)
\]
is weakly decreasing during the algorithm in Section 1. Moreover, assuming that the set $A$ contains a positive multiple of each unit vector $e_k$, i.e. some power of each variable $x_k$ appears by itself in the system of equations, the vector $x$ converges either to a critical point of the function (5) or to the boundary of the positive orthant.
Remark 2.
The condition on $A$ is necessary to ensure that the vector $x$ remains bounded during the algorithm.
We begin by establishing several basic properties of the generalized Kullback-Leibler divergence in Proposition 3 and Lemmas 4 and 5. Note that $D(b \,\|\, c) = \sum_i D(b_i \,\|\, c_i)$, and thus we will check these basic properties by reducing to the case where $b$ and $c$ are scalars.
The proof of Theorem 1 itself is divided into two parts, corresponding to the two nested iterative loops. The first step is to prove that the updates (4) in the inner loop converge to a local minimum of the divergence. The second step is to show that this implies that the outer loop strictly decreases the divergence function (5) except at a critical point.
Proposition 3.
For $b$ and $c$ vectors of positive real numbers, the divergence $D(b \,\|\, c)$ is non-negative, with $D(b \,\|\, c) = 0$ if and only if $b = c$.
Proof.
It suffices to prove this when $b$ and $c$ are scalars. We let $g(t) = t \log t - t + 1$, so that $D(b \,\|\, c) = c\, g(b/c)$, and $g'(t) = \log t$, so $g$ is strictly decreasing for $t < 1$ and strictly increasing for $t > 1$, with $g(1) = 0$.
It is clear that $D(b \,\|\, c)$ is non-negative and equal to zero if and only if $b = c$. ∎
Lemma 4.
Suppose that $b$ and $c$ are vectors of positive real numbers, and let $t$ be any positive real number. Then
\[
D(b \,\|\, t\,c) = D(b \,\|\, c) - \Big(\sum_i b_i\Big) \log t + (t - 1) \sum_i c_i .
\]
Proof.
Again, we assume that $b$ and $c$ are scalars, and then it becomes a straightforward computation. ∎
Lemma 5.
If $b$ and $c$ are vectors of positive real numbers, then we can relate their divergence to the divergence of their sums by
\[
D(b \,\|\, c) = D\Big(b \,\Big\|\, \tfrac{\sum_i b_i}{\sum_i c_i}\, c\Big) + D\Big(\sum_i b_i \,\Big\|\, \sum_i c_i\Big).
\]
Proof.
We let $s = \sum_i b_i$ and $r = \sum_i c_i$, and apply Lemma 4 to the last term:
After rearranging, we get the desired expression. ∎
Lemma 6.
Each update of the form (4) weakly decreases the divergence $D\big(y \,\|\, (c_{i,u}\, x^u)_{i,u}\big)$; moreover, the decrease is at least the divergence between (suitable powers of) the numerator and the denominator of the update factor in (4).
Proof.
First, since the statement only depends on the $j$th row of the matrix $T$, we can assume that $T$ has a single row $(t_1, \ldots, t_n)$ and we drop $j$ from future subscripts. Second, we can assume that $d_j = 1$ by dividing this row by $d_j$.
Third, we reduce to the case when for all . We define a new set of exponents and coefficients by and , where the product is taken over all indices such that . We take to be a vector indexed by those such that . Then, under the change of coordinates , we have and the update rule in (4) is the same for the new system with coefficients and exponents . Furthermore, if all entries of are zero, then for all vectors and so we can drop from our exponent set. Therefore, for the rest of the proof, we drop the tildes, and assume that for all and for all , in which case drops out of the equations.
To prove the desired inequality, we substitute the updated assignment into the definition of Kullback-Leibler divergence:
(7)
Proposition 7.
A positive vector $x$ is a fixed point of the update rule (4) for all $j$ if and only if $x$ is a critical point of the divergence function $D\big(y \,\|\, (c_{i,u}\, x^u)_{i,u}\big)$.
Proof.
For the update rule to leave $x$ unchanged means that the numerator and denominator in (4) are equal, i.e.
(10)
By our assumption that no column of the matrix $T$ is identically zero, for each $k$, some entry $t_{jk}$ is non-zero, so (10) is equivalent to
(11)
On the other hand, we compute the partial derivative
\[
\frac{\partial}{\partial x_k} D\big(y \,\|\, (c_{i,u}\, x^u)_{i,u}\big) = \frac{1}{x_k} \Big( \sum_{i} \sum_{u \in A} u_k\, c_{i,u}\, x^u - \sum_{i} \sum_{u \in A} u_k\, y_{i,u} \Big).
\]
Since each $x_k$ is assumed to be non-zero, it is clear that all partial derivatives being zero is equivalent to (11). ∎
Lemma 8.
If we define $y$ as in (3), then any positive vector which decreases the divergence $D\big(y \,\|\, (c_{i,u}\, x^u)_{i,u}\big)$, compared to its value at the current vector, also decreases the divergence (5) by at least as much.
Moreover, a positive vector is a fixed point if and only if it is a critical point for the divergence function (5).
Proof.
We consider
(12)
and apply Lemma 5 in two different ways. First, by applying Lemma 5 to each group of (12) with fixed $i$, we get
In the last term, the monomials cancel and so it is a constant independent of $x$. On the other hand, we can apply Lemma 5 to each group in (12) with fixed $u$,
We can combine these equations to get
(13)
By Proposition 3, the last term of (13) is non-negative, and by the definition of $y$ in (3), it is zero at the current vector. Therefore, any value of $x$ which decreases the first term compared to the current vector will also decrease the left-hand side by at least as much, which is the desired inequality.
In order to prove the statement about the derivative, we consider the derivative of (13) at the current vector. Because the last term is minimized there, its derivative is zero, so
By Proposition 7, a positive vector is a fixed point of the inner loop if and only if these partial derivatives on the right are zero for all indices $k$, which is the definition of a critical point. ∎
Proof of Theorem 1.
The Kullback-Leibler divergence decreases at each step of the inner loop by Lemma 6. Thus, by Lemma 8, the divergence
(14)
\[
D\Big(b \,\Big\|\, \big(\textstyle\sum_{u \in A} c_{i,u}\, x^u\big)_{i}\Big)
\]
decreases at least as much. However, the divergence (14) is non-negative according to Proposition 3. Therefore, the magnitude of the decrease in divergence must approach zero over the course of the algorithm. By Lemma 6, this means that the quantity in that lemma must approach zero. By Proposition 3, this means that the two quantities in that divergence approach each other. However, up to a power, these are the numerator and denominator of the factor in the update rule (4), so the difference between consecutive vectors approaches zero.
Thus, we just need to show that $x$ remains bounded. However, since some power of each variable occurs in some equation, as any coordinate $x_k$ gets large, the divergence for that equation also gets arbitrarily large. Therefore, each $x_k$ must remain bounded, so the vector $x$ must have a limit as the algorithm is iterated. If this limit is in the interior of the positive orthant, then it must be a fixed point. By Lemma 8 and Proposition 7, this fixed point must be a critical point of the divergence (5). ∎
3. Universality
Although the restriction on the exponents and especially the positivity of the coefficients seem like strong conditions, such systems can nonetheless be quite complex. In this section, we investigate the breadth of such equations.
Proposition 9.
For any system of $m$ real polynomial equations in $n$ variables, there exists a system of $m + 1$ equations in $n + 1$ variables, in the form (1), such that the positive solutions to the former system are in bijection with the positive solutions of the latter.
Proof.
We write our system of equations as $\sum_{u \in B} a_{i,u}\, x^u = 0$ for $1 \le i \le m$, where $B$ is an arbitrary finite set of exponents and the $a_{i,u}$ are any real numbers. We let $D$ be the maximum total degree of any monomial $x^u$ for $u \in B$. We homogenize the equations with a new variable $x_0$. Explicitly, we define $\tilde{B}$ to consist of the exponents $(D - u_1 - \cdots - u_n, u)$ for all $u$ in $B$, and we write $\tilde{x} = (x_0, x_1, \ldots, x_n)$. We add a new equation $\sum_{u \in \tilde{B}} \tilde{x}^u = 1$, with coefficient $1$ for each monomial. For this system, we can clearly take $\ell = 1$ and $d_1 = D$ to satisfy the condition on exponents. Furthermore, for any positive solution $x$ to the original system of equations, the vector $\lambda \cdot (1, x_1, \ldots, x_n)$, with $\lambda > 0$ chosen so that the new equation is satisfied, is a solution to the homogenized system of equations.
Next, we add a multiple of the last equation to each of the others in order to make all the coefficients positive. For each $i$, we choose a positive constant $\lambda_i$ large enough that every coefficient of the combined equation is positive, and replace the $i$th equation by itself plus $\lambda_i$ times the last equation, so that its right-hand side becomes $\lambda_i$. By construction, the resulting system has all positive coefficients, and since the new equations are formed from the previous equations by elementary linear transformations, the sets of solutions are the same. ∎
The practical use of the construction in the proof of Proposition 9 is mixed. The first step, of homogenizing to deal with arbitrary sets of exponents, is a straightforward way of guaranteeing that the condition on the exponents in Section 1 is satisfied. However, for large systems, the second step tends to produce an ill-conditioned coefficient matrix. In these cases, our algorithm converges very slowly. Nonetheless, Proposition 9 shows that in the worst case, systems satisfying our hypotheses can be as complicated as arbitrary polynomial systems.
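To illustrate the two steps on a small example (the numbers and the normalizing equation are chosen here only for illustration), consider the single equation $x_1^2 - 3 x_1 x_2 + 1 = 0$. Homogenizing with a new variable $x_0$ gives $x_1^2 - 3 x_1 x_2 + x_0^2 = 0$, and we may adjoin the normalizing equation $x_0^2 + x_1^2 + x_1 x_2 = 1$, whose coefficients are all positive. Adding four times the normalizing equation to the homogenized equation yields the system
\[
5 x_1^2 + x_1 x_2 + 5 x_0^2 = 4, \qquad x_0^2 + x_1^2 + x_1 x_2 = 1,
\]
which has only non-negative coefficients, positive right-hand sides, and monomials of total degree two, and whose positive solutions correspond to the positive solutions of the original equation after rescaling.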
Proposition 10.
There exist systems of bilinear equations, of the form (1), whose number of positive real solutions grows exponentially in the number of variables.
Proof.
We use a variation on the technique used to prove Proposition 9.
First, we pick a collection of generic homogeneous linear forms in $n$ variables. By generic, we mean that for any $n$ of the forms, the only simultaneous solution of the corresponding $n$ linear equations is the trivial one. This genericity implies that any $n - 1$ of the forms define a point in $\mathbb{P}^{n-1}$. By taking a linear change of coordinates in each set of variables, we can assume that all of these points are positive, i.e. have a representative consisting of all positive real numbers.
Then we consider the system of equations
(15)
\[
\ell_t(y)\, \ell_t(z) = 0 \qquad \text{for each of the chosen forms } \ell_t,
\]
(16)
(17)
where $y$ and $z$ denote the two sets of variables and where (16) and (17) are two further equations which fix the scalings of $y$ and $z$.
The equations (15) are bihomogeneous and so we can think of their solutions in $\mathbb{P}^{n-1} \times \mathbb{P}^{n-1}$. The positive real solutions correspond to the subsets $S$ of size $n - 1$ of the set of chosen forms: for any such $S$, there is a unique, distinct solution satisfying $\ell_t(y) = 0$ for all $t$ in $S$ and $\ell_t(z) = 0$ for all $t$ not in $S$. By assumption, for each solution, all the coordinates can be chosen to be positive. The last two equations (16) and (17) dehomogenize the system in a way such that each of these solutions yields a positive real solution of the full system. Finally, as in the last paragraph of the proof of Proposition 9, we can add multiples of (16) to the equations (15) in order to make all the coefficients positive. ∎
Acknowledgments
We thank Bernd Sturmfels for reading a draft of this manuscript. This work was supported by the Defense Advanced Research Projects Agency project “Microstates to Macrodynamics: A New Mathematics of Biology” and the U.S. National Science Foundation (DMS-0456960).
References
- [1] Dan Bates and Frank Sottile. Khovanskii-Rolle continuation for real solutions. arXiv:0908.4579, 2009.
- [2] Dustin A. Cartwright, Siobhan M. Brady, David A. Orlando, Bernd Sturmfels, and Philip N. Benfey. Reconstructing spatiotemporal gene expression data from partial observations. Bioinformatics, 25(19):2581–2587, 2009.
- [3] J. N. Darroch and D. Ratcliff. Generalized iterative scaling for log-linear models. Annals of Math. Stat., 43(5):1470–1480, 1972.
- [4] Jean B. Lasserre, Monique Laurent, and Philipp Rostalski. Semidefinite characterization and computation of real radical ideals. Foundations of Computational Mathematics, 8(5):607–647, 2008.
- [5] Jean B. Lasserre, Monique Laurent, and Philipp Rostalski. A prolongation-projection algorithm for computing the finite real variety of an ideal. Theoretical Computer Science, 410(27–29):2685–2700, 2009.
- [6] Daniel D. Lee and H. Sebastian Seung. Algorithms for non-negative matrix factorization. Adv. Neural Info. Proc. Syst., 13:556–562, 2001.
- [7] Lior Pachter and Bernd Sturmfels. Algebraic Statistics for Computational Biology. Cambridge University Press, 2005.
- [8] Jan Verschelde. Algorithm 795: PHCpack: a general-purpose solver for polynomial systems by homotopy continuation. ACM Transactions on Mathematical Software, 25(2):251–276, 1999.