
An Algorithm for Finding Positive
Solutions to Polynomial Equations

Dustin Cartwright Dept. of Mathematics, University of California, Berkeley, CA 94720, USA dustin@math.berkeley.edu
Abstract.

We present a numerical algorithm for finding real non-negative solutions to polynomial equations. Our methods are based on the expectation maximization and iterative proportional fitting algorithms, which are used in statistics to find maximum likelihood parameters for certain classes of statistical models. Since our algorithm works by iteratively improving an approximate solution, we find approximate solutions in the cases when there are no exact solutions, such as overconstrained systems.

We present an iterative numerical method for finding non-negative solutions and approximate solutions to systems of polynomial equations. We require two assumptions about our system of equations. First, for each equation, all the coefficients other than the constant term must be non-negative. Second, there is a technical assumption on the exponents, described at the beginning of Section 1, which, for example, is satisfied if all non-constant terms have the same total degree. In Section 3, there is a discussion of the range of possible systems which can arise under these hypotheses.

Because of the assumption on signs, we can write our system of equations as,

(1) \sum_{\alpha\in S}a_{i\alpha}x^{\alpha}=b_{i}\quad\mbox{for }i=1,\ldots,\ell,

where the coefficients $a_{i\alpha}$ are non-negative and the $b_{i}$ are positive, and $S\subset\mathbb{R}_{\geq 0}^{n}$ is a finite set of possibly non-integer multi-indices. Our algorithm works by iteratively decreasing the generalized Kullback-Leibler divergence of the left-hand side and right-hand side of (1). The generalized Kullback-Leibler divergence of two positive vectors $a$ and $b$ is defined to be

(2) D(a\|b):=\sum_{i}\left(b_{i}\log\left(\frac{b_{i}}{a_{i}}\right)-b_{i}+a_{i}\right).

The standard Kullback-Leibler divergence consists only of the first term and is defined only for probability distributions, i.e. vectors whose entries sum to $1$. The last two terms are necessary so that the generalized divergence has, for arbitrary positive vectors $a$ and $b$, the property of being non-negative and zero exactly when $a$ and $b$ are equal (Proposition 3).
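As a concrete illustration (our own, not part of the paper), the generalized divergence (2) can be evaluated in a few lines; the following Python sketch assumes NumPy and positive input vectors:

    import numpy as np

    def generalized_kl(a, b):
        # Generalized Kullback-Leibler divergence D(a || b) from equation (2);
        # a and b must be positive vectors of the same length.
        a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
        return float(np.sum(b * np.log(b / a) - b + a))

In accordance with Proposition 3, generalized_kl(v, v) is zero for any positive vector v, and the value is strictly positive otherwise.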

Our algorithm converges to local minima of the Kullback-Leibler divergence, including exact solutions to the system (1). In order to find multiple local minima, we can repeat the algorithm for randomly chosen starting points. For finding approximate solutions, this may be sufficient. However, there are no guarantees of completeness for the exact solutions obtained in this way. Nonetheless, we hope that in certain situations, our algorithm will find applications both for finding exact and approximate solutions.

Lee and Seung applied the EM algorithm to the problem of non-negative matrix factorization [6]. They introduced the generalized Kullback-Leibler divergence in (2) and used it to find approximate non-negative matrix factorizations. Since the product of two matrices can be expressed by polynomials in the entries of the matrices, matrix-factorization is a special case of the equations in (1).
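For instance (an illustration of this reduction, not spelled out in the paper), approximately factoring a $p\times q$ matrix $V$ with positive entries as $WH$, with $W\in\mathbb{R}_{\geq 0}^{p\times r}$ and $H\in\mathbb{R}_{\geq 0}^{r\times q}$, amounts to the system $\sum_{j=1}^{r}w_{ij}h_{jk}=v_{ik}$ for all $i,k$. This system has the form (1): the variables are the entries of $W$ and $H$, every coefficient is $1$, the right-hand sides are the positive entries $v_{ik}$, and every monomial $w_{ij}h_{jk}$ is multilinear of total degree two.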

For finding exact solutions to arbitrary systems of polynomials, there are a variety of approaches which find all complex or all real solutions. Homotopy continuation methods find all complex roots of a system of equations [8]. Lasserre, Laurent, and Rostalski have applied semidefinite programming to find all real solutions to a system of equations, and a slight modification of their algorithm will find all positive real solutions [4, 5]. Even when only the positive roots are wanted, these two methods find all complex or all real solutions, respectively. Moreover, neither of these methods has any notion of approximate solutions.

For directly finding only positive real solutions, Bates and Sottile have proposed an algorithm based on fewnomial bounds on the number of real solutions [1]. However, their method is only effective when the number of monomials (the set $S$ in our notation) is only slightly larger than the number of variables. Our method makes only weak assumptions on the set of monomials, but stronger assumptions on the coefficients.

Our inspiration comes from tools for maximum likelihood estimation in statistics. Parameters which maximize the likelihood are exactly the parameters such that the model probabilities are closest to the empirical probabilities, in the sense of minimizing Kullback-Leibler divergence. Expectation-Maximization [7, Sec. 1.3] and Iterative Proportional Fitting [3] are well-known iterative methods for maximum likelihood estimation. We re-interpret these algorithms as methods for approximating solutions to polynomial equations, in which case their applicability can be somewhat generalized.

The impetus behind the work in this paper was the need to find approximate positive solutions to systems of bilinear equations in [2]. In this case the variables represented expression levels, which only made sense as positive parameters. Moreover, in order to accommodate noise in the data, there were more equations than variables, so it was necessary to find approximate solutions. Thus, the algorithm described in this paper was the most appropriate tool. Here, we generalize beyond bilinear equations and present proofs.

An implementation of our algorithm in the C programming language is freely available at http://math.berkeley.edu/~dustin/pos/.

In Section 1, we describe the algorithm and the connection to maximum likelihood estimation. In Section 2, we prove the convergence results needed for our algorithm. Finally, in Section 3, we show that even with our restrictions on the form of the equations, there can be exponentially many positive real solutions.

1. Algorithm

We make the assumption that we have an $s\times n$ non-negative matrix $g$, with no column identically zero, and positive real numbers $d_{j}$ for $1\leq j\leq s$ such that for each $\alpha\in S$ and each $j\leq s$, $\sum_{i=1}^{n}g_{ji}\alpha_{i}$ is either $0$ or $d_{j}$. For example, if all the monomials $x^{\alpha}$ have the same total degree $d_{1}$, we can take $s=1$ and $g_{1i}=1$ for all $i$. The other case of particular interest is multilinear systems of equations, in which each $\alpha_{i}$ is at most one. In this case the variables can be partitioned into sets such that the equations are linear in each set of variables, so we can take $d_{j}=1$ for all $j$. Note that because $d_{j}$ is in the denominator in (4), convergence is fastest when the $d_{j}$ are small, such as in the multilinear case. We also note that, for an arbitrary set of exponents $S$, there may not exist such a $g$.
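As a small concrete instance (our own illustration), consider a bilinear system in the variables $x_{1},x_{2},y_{1},y_{2}$ whose monomials are the products $x_{i}y_{j}$. Ordering the variables as $(x_{1},x_{2},y_{1},y_{2})$, each exponent vector $\alpha$ has a single $1$ in the first block and a single $1$ in the second block, so we may take $s=2$, $d_{1}=d_{2}=1$, and

g=\begin{pmatrix}1&1&0&0\\ 0&0&1&1\end{pmatrix},

since then $\sum_{i}g_{ji}\alpha_{i}=1$ for every monomial and each $j$.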

The algorithm begins with a randomly chosen starting vector and iteratively improves it through two nested iterations:

  •  Initialize $x$ with $n$ randomly chosen positive real numbers.

  •  Loop until the vector $x$ stabilizes:

    (a)  For all $\alpha\in S$, compute

      (3) w_{\alpha}:=\sum_{i}b_{i}\frac{a_{i\alpha}x^{\alpha}}{\sum_{\beta}a_{i\beta}x^{\beta}}.

    (b)  Loop until the vector $x$ stabilizes:

      (i)  Loop for $j$ from $1$ to $s$:

        (A)  Simultaneously update all entries of $x$:

          (4) x_{i}\leftarrow x_{i}\left(\frac{\sum_{\alpha}\alpha_{i}g_{ji}w_{\alpha}}{\sum_{\alpha}\alpha_{i}g_{ji}a_{\alpha}x^{\alpha}}\right)^{g_{ji}/d_{j}}\quad\mbox{where }a_{\alpha}=\sum_{i}a_{i\alpha}.

Because there is no subtraction, it is clear that the entries of $x$ remain positive throughout this algorithm.
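To make the two nested loops concrete, here is a minimal Python/NumPy sketch of the iteration above (our own transcription, not the author's C implementation); the function name positive_solve, the dense array encoding of the data, and the stopping tolerances are our choices:

    import numpy as np

    def positive_solve(A, b, S, g, d, max_outer=500, max_inner=100, tol=1e-9, seed=None):
        # A[i, k]: non-negative coefficient a_{i,alpha} of the monomial with exponent
        #          vector S[k] in equation i; b[i]: the positive right-hand side b_i.
        # S: (number of monomials) x n array of exponent vectors alpha; g (s x n) and
        # d (length s) are the data described at the beginning of this section.
        A, b, S, g, d = (np.asarray(M, dtype=float) for M in (A, b, S, g, d))
        n = S.shape[1]
        rng = np.random.default_rng(seed)
        x = rng.uniform(0.5, 1.5, size=n)            # random positive starting vector
        a_col = A.sum(axis=0)                        # a_alpha = sum_i a_{i,alpha}
        for _ in range(max_outer):
            x_outer = x.copy()
            mono = np.prod(x ** S, axis=1)           # x^alpha for each alpha in S
            lhs = A @ mono                           # sum_alpha a_{i,alpha} x^alpha
            w = (A.T @ (b / lhs)) * mono             # equation (3)
            for _ in range(max_inner):               # inner, IPF-style loop
                x_inner = x.copy()
                for j in range(g.shape[0]):
                    mono = np.prod(x ** S, axis=1)
                    num = S.T @ w                    # sum_alpha alpha_i w_alpha
                    den = S.T @ (a_col * mono)       # sum_alpha alpha_i a_alpha x^alpha
                    # g_{ji} cancels from numerator and denominator when non-zero,
                    # and the exponent g_{ji}/d_j is zero when g_{ji} = 0.
                    ratio = np.divide(num, den, out=np.ones_like(num), where=den > 0)
                    x = x * ratio ** (g[j] / d[j])   # equation (4)
                if np.max(np.abs(x - x_inner)) < tol:
                    break
            if np.max(np.abs(x - x_outer)) < tol:
                break
        return x

When all monomials share the same total degree, one can pass g = np.ones((1, n)) and d = [deg], where deg is that common degree, as noted above.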

Our method is inspired by interpreting the equations in (1) as a maximum likelihood problem for a statistical model and applying the well-known methods of Expectation-Maximization (EM) and Iterative Proportional Fitting (IPF). Here, we assume that all the monomials $x^{\alpha}$ have the same total degree. Our statistical model is that a hidden process generates an integer $i\leq\ell$ and an exponent vector $\alpha$ with joint probability $a_{i\alpha}x^{\alpha}$. The vector $x$ contains $n$ positive parameters for the model, restricted such that the total probability $\sum_{i,\alpha}a_{i\alpha}x^{\alpha}$ is $1$. The empirical data consists of repeated observations of the integer $i$, but not the exponent $\alpha$, and $b_{i}$ is the proportion of observations of $i$. In this situation, the vector $x$ which minimizes the divergence between the two sides of (1) gives the maximum likelihood parameters for the empirical distribution $b_{i}$. The inner loop of the algorithm consists of using IPF to solve the log-linear hidden model and the outer loop consists of using EM to estimate the distribution on the hidden states.

2. Proof of convergence

In this section we prove our main theorem:

Theorem 1.

The Kullback-Leibler divergence

(5) \sum_{i=1}^{\ell}D\left(\sum_{\alpha\in S}a_{i\alpha}x^{\alpha}\|b_{i}\right)

is weakly decreasing during the algorithm in Section 1. Moreover, assuming that the set $S$ contains a multiple of each unit vector $e_{i}$, i.e. some power of each $x_{i}$ appears in the system of equations, the vector $x$ converges to a critical point of the function (5) or to the boundary of the positive orthant.

Remark 2.

The condition on $S$ is necessary to ensure that the vector $x$ remains bounded during the algorithm.

We begin by establishing several basic properties of the generalized Kullback-Leibler divergence in Proposition 3 and Lemmas 4 and 5. Note that $D\left(a\|b\right)=\sum_{i=1}^{m}D\left(a_{i}\|b_{i}\right)$ and thus, we will check these basic properties by reducing to the case where $a$ and $b$ are scalars.

The proof of Theorem 1 itself is divided into two parts, corresponding to the two nested iterative loops. The first step is to prove that the updates (4) in the inner loop converge to a local minimum of the divergence $\sum_{\alpha}D\left(a_{\alpha}x^{\alpha}\|w_{\alpha}\right)$. The second step is to show that this implies that the outer loop strictly decreases the divergence function (5) except at a critical point.

Proposition 3.

For $a$ and $b$ vectors of positive real numbers, the divergence $D\left(a\|b\right)$ is non-negative with $D\left(a\|b\right)=0$ if and only if $a=b$.

Proof.

It suffices to prove this when $a$ and $b$ are scalars. We let $t=b/a$, and

D\left(a\|b\right)=b\log\left(\frac{b}{a}\right)-b+a=a(t\log t-t+1)=a\int_{1}^{t}\log s\,ds.

It is clear that $\int_{1}^{t}\log s\,ds$ is non-negative and equal to zero if and only if $t=1$. ∎

Lemma 4.

Suppose that $a$ and $b$ are vectors of $m$ positive real numbers. Let $t$ be any positive real number, and then

D\left(ta\|b\right)=D\left(a\|b\right)+(t-1)\sum_{i=1}^{m}a_{i}-\sum_{i=1}^{m}b_{i}\log t.
Proof.

Again, we assume that $a$ and $b$ are scalars and then it becomes a straightforward computation. ∎
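Spelled out in the scalar case, the computation referred to in the proof is

D\left(ta\|b\right)=b\log\left(\frac{b}{ta}\right)-b+ta=\left(b\log\left(\frac{b}{a}\right)-b+a\right)+(t-1)a-b\log t=D\left(a\|b\right)+(t-1)a-b\log t,

and summing this identity over the coordinates gives the statement of Lemma 4.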

Lemma 5.

If $a$ and $b$ are vectors of $m$ positive real numbers, then we can relate their divergence to the divergence of their sums by

D\left(a\|b\right)=D\left(\textstyle\sum_{i=1}^{m}a_{i}\|\textstyle\sum_{i=1}^{m}b_{i}\right)+D\left(\frac{\sum_{i=1}^{m}b_{i}}{\sum_{i=1}^{m}a_{i}}a\|b\right).
Proof.

We let $A=\sum_{i=1}^{m}a_{i}$ and $B=\sum_{i=1}^{m}b_{i}$, and apply Lemma 4 to the last term:

\displaystyle D\left(\frac{B}{A}a\|b\right)=D\left(a\|b\right)+\left(\frac{B}{A}-1\right)A-B\log\frac{B}{A}
\displaystyle=D\left(a\|b\right)-B\log\frac{B}{A}+B-A=D\left(a\|b\right)-D\left(A\|B\right).

After rearranging, we get the desired expression. ∎

Lemma 6.

The update rule (4) weakly decreases the divergence. If

x_{i}^{\prime}=x_{i}\left(\frac{\sum_{\alpha}\alpha_{i}g_{ji}w_{\alpha}}{\sum_{\alpha}\alpha_{i}g_{ji}a_{\alpha}x^{\alpha}}\right)^{g_{ji}/d_{j}},

then

(6) \sum_{\alpha}D\left(a_{\alpha}(x^{\prime})^{\alpha}\|w_{\alpha}\right)\leq\sum_{\alpha}D\left(a_{\alpha}x^{\alpha}\|w_{\alpha}\right)-\frac{1}{d_{j}}\sum_{i=1}^{n}D\left(\sum_{\alpha}\alpha_{i}g_{ji}a_{\alpha}x^{\alpha}\|\sum_{\alpha}\alpha_{i}g_{ji}w_{\alpha}\right).
Proof.

First, since the statement only depends on the $j$th row of the matrix $g$, we can assume that $g$ is a row vector and we drop $j$ from future subscripts. Second, we can assume that $d=1$ by replacing $g_{i}$ with $g_{i}/d$.

Third, we reduce to the case when $g_{i}=1$ for all $i$. We define a new set of exponents $\tilde{\alpha}$ and coefficients $\tilde{a}_{\tilde{\alpha}}$ by $\tilde{\alpha}_{i}=g_{i}\alpha_{i}$ and $\tilde{a}_{\tilde{\alpha}}=a_{\alpha}\prod x_{i}^{\alpha_{i}}$, where the product is taken over all indices $i$ such that $g_{i}=0$. We take $\tilde{x}$ to be a vector indexed by those $i$ such that $g_{i}\neq 0$. Then, under the change of coordinates $\tilde{x}_{i}=x_{i}^{1/g_{i}}$, we have $a_{\alpha}x^{\alpha}=\tilde{a}_{\tilde{\alpha}}\tilde{x}^{\tilde{\alpha}}$ and the update rule in (4) is the same for the new system with coefficients $\tilde{a}_{\tilde{\alpha}}$ and exponents $\tilde{\alpha}$. Furthermore, if all entries of $\tilde{\alpha}$ are zero, then $\tilde{x}^{\tilde{\alpha}}=1$ for all vectors $\tilde{x}$ and so we can drop $\tilde{\alpha}$ from our exponent set. Therefore, for the rest of the proof, we drop the tildes, and assume that $\sum_{i}\alpha_{i}=1$ for all $\alpha\in S$ and $g_{i}=1$ for all $i$, in which case $g$ drops out of the equations.

To prove the desired inequality, we substitute the updated assignment $x^{\prime}$ into the definition of Kullback-Leibler divergence:

\displaystyle D\left(a_{\alpha}(x^{\prime})^{\alpha}\|w_{\alpha}\right)=w_{\alpha}\log\left(\frac{w_{\alpha}}{a_{\alpha}x^{\alpha}\prod_{i=1}^{n}\left(\frac{\sum_{\beta}\beta_{i}w_{\beta}}{\sum_{\beta}\beta_{i}a_{\beta}x^{\beta}}\right)^{\alpha_{i}}}\right)-w_{\alpha}+a_{\alpha}(x^{\prime})^{\alpha}
\displaystyle=w_{\alpha}\log\frac{w_{\alpha}}{a_{\alpha}x^{\alpha}}-\sum_{i=1}^{n}\alpha_{i}w_{\alpha}\log\frac{\sum_{\beta}\beta_{i}w_{\beta}}{\sum_{\beta}\beta_{i}a_{\beta}x^{\beta}}-w_{\alpha}+a_{\alpha}(x^{\prime})^{\alpha}
(7) \displaystyle=D\left(a_{\alpha}x^{\alpha}\|w_{\alpha}\right)-\sum_{i=1}^{n}\alpha_{i}w_{\alpha}\log\frac{\sum_{\beta}\beta_{i}w_{\beta}}{\sum_{\beta}\beta_{i}a_{\beta}x^{\beta}}-a_{\alpha}x^{\alpha}+a_{\alpha}(x^{\prime})^{\alpha}.

On the other hand, let $C$ denote the last term of (6), which we can expand as,

\displaystyle C=\sum_{i=1}^{n}D\left(\sum_{\alpha}\alpha_{i}a_{\alpha}x^{\alpha}\|\sum_{\alpha}\alpha_{i}w_{\alpha}\right)
\displaystyle=\sum_{i=1}^{n}\left(\left(\sum_{\alpha}\alpha_{i}w_{\alpha}\right)\log\frac{\sum_{\alpha}\alpha_{i}w_{\alpha}}{\sum_{\alpha}\alpha_{i}a_{\alpha}x^{\alpha}}-\sum_{\alpha}\alpha_{i}w_{\alpha}+\sum_{\alpha}\alpha_{i}a_{\alpha}x^{\alpha}\right)
\displaystyle=\sum_{i=1}^{n}\sum_{\alpha}\left(\alpha_{i}w_{\alpha}\log\frac{\sum_{\beta}\beta_{i}w_{\beta}}{\sum_{\beta}\beta_{i}a_{\beta}x^{\beta}}-\alpha_{i}w_{\alpha}+\alpha_{i}a_{\alpha}x^{\alpha}\right)
(8) \displaystyle=\sum_{\alpha}\left(\sum_{i=1}^{n}\alpha_{i}w_{\alpha}\log\frac{\sum_{\beta}\beta_{i}w_{\beta}}{\sum_{\beta}\beta_{i}a_{\beta}x^{\beta}}-w_{\alpha}+a_{\alpha}x^{\alpha}\right),

where the last step follows from the assumption that $\sum_{i}\alpha_{i}=1$ for all $\alpha\in S$. We take the sum of (7) over all $\alpha\in S$ and add it to (8) to get,

(9) \sum_{\alpha}D\left(a_{\alpha}(x^{\prime})^{\alpha}\|w_{\alpha}\right)+C=\sum_{\alpha}D\left(a_{\alpha}x^{\alpha}\|w_{\alpha}\right)-\sum_{\alpha}w_{\alpha}+\sum_{\alpha}a_{\alpha}(x^{\prime})^{\alpha}.

Finally, we expand the last term of (9) using the definition of $x^{\prime}$ and apply the arithmetic-geometric mean inequality,

\displaystyle\sum_{\alpha}a_{\alpha}(x^{\prime})^{\alpha}=\sum_{\alpha}a_{\alpha}x^{\alpha}\prod_{i=1}^{n}\left(\frac{\sum_{\beta}\beta_{i}w_{\beta}}{\sum_{\beta}\beta_{i}a_{\beta}x^{\beta}}\right)^{\alpha_{i}}
\displaystyle\leq\sum_{\alpha}a_{\alpha}x^{\alpha}\sum_{i=1}^{n}\alpha_{i}\frac{\sum_{\beta}\beta_{i}w_{\beta}}{\sum_{\beta}\beta_{i}a_{\beta}x^{\beta}}
\displaystyle=\sum_{i=1}^{n}\left(\sum_{\alpha}\alpha_{i}a_{\alpha}x^{\alpha}\right)\frac{\sum_{\beta}\beta_{i}w_{\beta}}{\sum_{\beta}\beta_{i}a_{\beta}x^{\beta}}
\displaystyle=\sum_{i=1}^{n}\sum_{\beta}\beta_{i}w_{\beta}=\sum_{\beta}w_{\beta}.

Together with (9), this gives the desired inequality. ∎

Proposition 7.

A positive vector $x$ is a fixed point of the update rule (4) for all $1\leq j\leq s$ if and only if $x$ is a critical point of the divergence function $\sum_{\alpha}D\left(a_{\alpha}x^{\alpha}\|w_{\alpha}\right)$.

Proof.

For $x$ to be a fixed point of the update rule means that the numerator and denominator in (4) are equal, i.e.

(10) \sum_{\alpha}\alpha_{i}g_{ji}a_{\alpha}x^{\alpha}=\sum_{\alpha}\alpha_{i}g_{ji}w_{\alpha}\quad\mbox{for all $i$ and $j$.}

By our assumption on $g$, for each $i$, some $g_{ji}$ is non-zero, so (10) is equivalent to

(11) \sum_{\alpha}\alpha_{i}a_{\alpha}x^{\alpha}=\sum_{\alpha}\alpha_{i}w_{\alpha}\quad\mbox{for all }i.

On the other hand, we compute the partial derivative

\frac{\partial}{\partial x_{i}}\sum_{\alpha}D\left(a_{\alpha}x^{\alpha}\|w_{\alpha}\right)=\sum_{\alpha}\left(-w_{\alpha}\frac{\alpha_{i}}{x_{i}}+\alpha_{i}a_{\alpha}\frac{x^{\alpha}}{x_{i}}\right).

Since each $x_{i}$ is assumed to be non-zero, it is clear that all partial derivatives being zero is equivalent to (11). ∎

Lemma 8.

If we define $w_{\alpha}$ as in (3), then

\sum_{i=1}^{\ell}D\left(\sum_{\alpha}a_{i\alpha}(x^{\prime})^{\alpha}\|b_{i}\right)-\sum_{i=1}^{\ell}D\left(\sum_{\alpha}a_{i\alpha}x^{\alpha}\|b_{i}\right)\leq\sum_{\alpha}D\left(a_{\alpha}(x^{\prime})^{\alpha}\|w_{\alpha}\right)-\sum_{\alpha}D\left(a_{\alpha}x^{\alpha}\|w_{\alpha}\right).

Moreover, a positive vector $x$ is a fixed point if and only if $x$ is a critical point of the divergence function.

Proof.

We consider

(12) \sum_{i,\alpha}D\left(a_{i\alpha}(x^{\prime})^{\alpha}\|w_{i\alpha}\right)\qquad\mbox{where }w_{i\alpha}=\frac{b_{i}a_{i\alpha}x^{\alpha}}{\sum_{\beta}a_{i\beta}x^{\beta}},

and apply Lemma 5 in two different ways. First, by applying Lemma 5 to each group of (12) with fixed $\alpha$, we get

\sum_{i,\alpha}D\left(a_{i\alpha}(x^{\prime})^{\alpha}\|w_{i\alpha}\right)=\sum_{\alpha}D\left(a_{\alpha}(x^{\prime})^{\alpha}\|w_{\alpha}\right)+\sum_{i,\alpha}D\left(\frac{\sum_{j}w_{j\alpha}}{\sum_{j}a_{j\alpha}(x^{\prime})^{\alpha}}a_{i\alpha}(x^{\prime})^{\alpha}\|w_{i\alpha}\right).

In the last term, the monomials $(x^{\prime})^{\alpha}$ cancel and so it is a constant independent of $x^{\prime}$, which we denote $E$. On the other hand, we can apply Lemma 5 to each group in (12) with fixed $i$,

\sum_{i,\alpha}D\left(a_{i\alpha}(x^{\prime})^{\alpha}\|w_{i\alpha}\right)=\sum_{i}D\left(\sum_{\alpha}a_{i\alpha}(x^{\prime})^{\alpha}\|b_{i}\right)+\sum_{i,\alpha}D\left(\frac{b_{i}a_{i\alpha}(x^{\prime})^{\alpha}}{\sum_{\beta}a_{i\beta}(x^{\prime})^{\beta}}\|w_{i\alpha}\right).

We can combine these equations to get

(13) \sum_{i}D\left(\sum_{\alpha}a_{i\alpha}(x^{\prime})^{\alpha}\|b_{i}\right)=\sum_{\alpha}D\left(a_{\alpha}(x^{\prime})^{\alpha}\|w_{\alpha}\right)+E-\sum_{i,\alpha}D\left(\frac{b_{i}a_{i\alpha}(x^{\prime})^{\alpha}}{\sum_{\beta}a_{i\beta}(x^{\prime})^{\beta}}\|w_{i\alpha}\right).

By Proposition 3, the last term of (13) is non-negative, and by the definition of $w_{i\alpha}$, it is zero for $x^{\prime}=x$. Therefore, any value of $x^{\prime}$ which decreases the first term on the right compared to $x$ will also decrease the left-hand side by at least as much, which is the desired inequality.

In order to prove the statement about the derivative, we consider the derivative of (13) at $x^{\prime}=x$. Because the last term is minimized at $x^{\prime}=x$, its derivative is zero, so

\left.\frac{\partial}{\partial x_{j}^{\prime}}\right|_{x^{\prime}=x}\sum_{i}D\left(\sum_{\alpha}a_{i\alpha}(x^{\prime})^{\alpha}\|b_{i}\right)=\left.\frac{\partial}{\partial x_{j}^{\prime}}\right|_{x^{\prime}=x}\sum_{i,\alpha}D\left(a_{i\alpha}(x^{\prime})^{\alpha}\|w_{i\alpha}\right).

By Proposition 7, a positive vector $x$ is a fixed point of the inner loop if and only if these partial derivatives on the right are zero for all indices $j$, which is the definition of a critical point. ∎

Proof of Theorem 1.

The Kullback-Leibler divergence $\sum_{\alpha}D\left(a_{\alpha}x^{\alpha}\|w_{\alpha}\right)$ decreases at each step of the inner loop by Lemma 6. Thus, by Lemma 8, the divergence

(14) \sum_{i=1}^{\ell}D\left(\sum_{\alpha}a_{i\alpha}x^{\alpha}\|b_{i}\right)

decreases at least as much. However, the divergence (14) is non-negative according to Proposition 3. Therefore, the magnitude of the decreases in divergence must approach zero over the course of the algorithm. By Lemma 6, this means that the quantity $C$ in the proof of that lemma must approach zero. By Proposition 3, this means that the quantities in that divergence approach each other. However, up to a power, these are the numerator and denominator of the factor in the update rule (4), so the difference between consecutive vectors $x$ approaches zero.

Thus, we just need to show that $x$ remains bounded. However, since some power of each variable $x_{i}$ occurs in some equation, as $x_{i}$ gets large, the divergence for that equation also gets arbitrarily large. Therefore, each $x_{i}$ must remain bounded, so the vector $x$ must have a limit as the algorithm is iterated. If this limit is in the interior of the positive orthant, then it must be a fixed point. By Lemma 8 and Proposition 7, this fixed point must be a critical point of the divergence (5). ∎
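As a practical aside (our own suggestion, not part of the paper), the monotone quantity (14) is a convenient diagnostic when repeating the algorithm from several random starting points, as described in the introduction: one keeps the run with the smallest final value. A self-contained evaluation in the same Python/NumPy style as before, with the hypothetical name divergence_to_target, is:

    import numpy as np

    def divergence_to_target(A, b, S, x):
        # Value of (14): sum_i D( sum_alpha a_{i,alpha} x^alpha || b_i ).
        A, b, S, x = (np.asarray(M, dtype=float) for M in (A, b, S, x))
        lhs = A @ np.prod(x ** S, axis=1)
        return float(np.sum(b * np.log(b / lhs) - b + lhs))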

3. Universality

Although the restriction on the exponents and especially the positivity of the coefficients seem like strong conditions, such systems can nonetheless be quite complex. In this section, we investigate the breadth of such equations.

Proposition 9.

For any system of $\ell$ real polynomials in $n$ variables, there exists a system of $\ell+1$ equations in $n+1$ variables, in the form (1), such that the positive solutions $(x_{1},\ldots,x_{n})$ to the former system are in bijection with the positive solutions $(x_{1}^{\prime},\ldots,x_{n+1}^{\prime})$ of the latter, with $x_{i}=x_{i}^{\prime}/x_{n+1}^{\prime}$.

Proof.

We write our system of equations as $\sum_{\alpha\in S}a_{i\alpha}x^{\alpha}=0$ for $1\leq i\leq\ell$, where $S\subset\mathbb{N}^{n}$ is an arbitrary finite set of exponents and the $a_{i\alpha}$ are any real numbers. We let $d$ be the maximum degree of any monomial $x^{\alpha}$ for $\alpha\in S$. We homogenize the equations with a new variable $x_{n+1}$. Explicitly, define $S^{\prime}\subset\mathbb{N}^{n+1}$ to consist of $\alpha^{\prime}=(\alpha,d-\sum_{i}\alpha_{i})$ for all $\alpha$ in $S$ and we write $a_{i\alpha^{\prime}}=a_{i\alpha}$. We add a new equation with coefficients $a_{\ell+1,\alpha}=1$ for all $\alpha\in S^{\prime}$ and $b_{\ell+1}=1$. For this system, we can clearly take $g_{1i}=1$ and $d_{1}=d$ to satisfy the condition on exponents. Furthermore, for any positive solution $(x_{1},\ldots,x_{n})$ to the original system of equations, $(x_{1}^{\prime},\ldots,x_{n+1}^{\prime})$ with $x_{i}^{\prime}=x_{i}/\big(\sum_{\alpha}x^{\alpha}\big)^{1/d}$ and $x_{n+1}^{\prime}=1/\big(\sum_{\alpha}x^{\alpha}\big)^{1/d}$ is a solution to the homogenized system of equations.

Next, we add a multiple of the last equation to each of the others in order to make all the coefficients positive. For each $1\leq i\leq\ell$, choose a positive $b_{i}>-\min_{\alpha}\{a_{i\alpha}\mid\alpha\in S^{\prime}\}$, and define $a_{i\alpha}^{\prime}=a_{i\alpha}+b_{i}$. By construction, the resulting system has all positive coefficients, and since the equations are formed from the previous equations by elementary linear transformations, the set of solutions is the same. ∎
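The construction in this proof is easy to mechanize. The following sketch (our own, with the hypothetical helper name to_positive_form and a dense array encoding) homogenizes an arbitrary system, appends the normalizing equation, and shifts the coefficients to be positive:

    import numpy as np

    def to_positive_form(A, S):
        # A[i, k]: arbitrary real coefficient of the monomial with exponent S[k]
        #          in the equation sum_k A[i, k] x^S[k] = 0, with S[k] in N^n.
        # Returns (A2, b2, S2) in the form (1): homogenize with a new variable
        # x_{n+1}, append the equation "sum of all monomials = 1", and add b_i
        # times that equation to equation i so every coefficient becomes positive.
        A, S = np.asarray(A, dtype=float), np.asarray(S, dtype=int)
        l, m = A.shape
        d = int(S.sum(axis=1).max())                       # maximum total degree
        S2 = np.hstack([S, (d - S.sum(axis=1))[:, None]])  # exponent of x_{n+1}
        b = np.maximum(1.0, 1.0 - A.min(axis=1))           # b_i > -min_alpha a_{i,alpha}
        A2 = np.vstack([A + b[:, None], np.ones((1, m))])  # shifted rows plus new equation
        b2 = np.append(b, 1.0)                             # right-hand sides
        return A2, b2, S2

    # For the transformed system one can take g = np.ones((1, n + 1)) and d_1 = d,
    # since every monomial now has total degree d.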

The practical use of the construction in the proof of Proposition 9 is mixed. The first step, of homogenizing to deal with arbitrary sets of exponents, is a straightforward way of guaranteeing the existence of the matrix gg. However, for large systems, the second step tends to produce an ill-conditioned coefficient matrix. In these cases, our algorithm converges very slowly. Nonetheless, Proposition 9 shows that in the worst case, systems satisfying our hypotheses can be as complicated as arbitrary polynomial systems.

Proposition 10.

There exist bilinear equations in $2m$ variables with ${2m-2\choose m-1}$ positive real solutions.

Proof.

We use a variation on the technique used to prove Proposition 9.

First, we pick $2m-2$ generic homogeneous linear functions $b_{1},\ldots,b_{2m-2}$ on $m$ variables. By generic, we mean that for any $m$ of the $b_{k}$, the only simultaneous solution of all $m$ linear equations is the trivial one. This genericity implies that any $m-1$ of the $b_{k}$ define a point in $\mathbb{P}^{m-1}$. By taking a linear change of coordinates in each set of variables, we can assume that all of these points are positive, i.e. have a representative consisting of all positive real numbers.

Then we consider the system of equations

(15) b_{k}(x_{1},\ldots,x_{m})\cdot b_{k}(x_{m+1},\ldots,x_{2m})=0,\quad\mbox{for }1\leq k\leq 2m-2
(16) (x_{1}+\ldots+x_{m})(x_{m+1}+\ldots+x_{2m})=1
(17) x_{1}+\ldots+x_{m}=1.

The equations (15) are bihomogeneous and so we can think of their solutions in $\mathbb{P}^{m-1}\times\mathbb{P}^{m-1}$. There are exactly ${2m-2\choose m-1}$ positive real solutions, corresponding to the subsets $A\subset[2m-2]$ of size $m-1$. For any such $A$, there is a unique, distinct solution satisfying $b_{k}(x_{1},\ldots,x_{m})=0$ for all $k$ in $A$ and $b_{k}(x_{m+1},\ldots,x_{2m})=0$ for all $k$ not in $A$. By assumption, for each solution, all the coordinates can be chosen to be positive. The last two equations (16) and (17) dehomogenize the system in a way such that there are ${2m-2\choose m-1}$ positive real solutions. Finally, as in the last paragraph of the proof of Proposition 9, we can add multiples of (16) to the equations (15) in order to make all the coefficients positive. ∎

Acknowledgments

We thank Bernd Sturmfels for reading a draft of this manuscript. This work was supported by the Defense Advanced Research Projects Agency project “Microstates to Macrodynamics: A New Mathematics of Biology” and the U.S. National Science Foundation (DMS-0456960).

References

  • [1] Dan Bates and Frank Sottile. Khovanskii-Rolle continuation for real solutions. arXiv:0908.4579, 2009.
  • [2] Dustin A. Cartwright, Siobhan M. Brady, David A. Orlando, Bernd Sturmfels, and Philip N. Benfey. Reconstructing spatiotemporal gene expression data from partial observations. Bioinformatics, 25(19):2581–2587, 2009.
  • [3] J. N. Darroch and D. Ratcliff. Generalized iterative scaling for log-linear models. Annals of Math. Stat., 43(5):1470–1480, 1972.
  • [4] Jean B. Lasserre, Monique Laurent, and Philipp Rostalski. Semidefinite characterization and computation of real radical ideals. Foundations of Computational Mathematics, 8(5):607–647, 2008.
  • [5] Jean B. Lasserre, Monique Laurent, and Philipp Rostalski. A prolongation-projection algorithm for computing the finite real variety of an ideal. Theoretical Computer Science, 410(27–29):2685–2700, 2009.
  • [6] Daniel D. Lee and H. Sebastian Seung. Algorithms for non-negative matrix factorization. Adv. Neural Info. Proc. Syst., 13:556–562, 2001.
  • [7] Lior Pachter and Bernd Sturmfels. Algebraic Statistics for Computational Biology. Cambridge University Press, 2005.
  • [8] Jan Verschelde. Algorithm 795: PHCpack: a general-purpose solver for polynomial systems by homotopy continuation. ACM Transactions on Mathematical Software, 25(2):251–276, 1999.