
Conic-Optimization Based Algorithms for
Nonnegative Matrix Factorization

\name Valentin Leplat$^{a}$, Yurii Nesterov$^{b}$, Nicolas Gillis$^{c}$, François Glineur$^{b}$

VL acknowledges the support of the European Research Council (ERC Advanced Grant no 788368) and of the Ministry of Science and Higher Education grant no 075-10-2021-068. Email: V.Leplat@skoltech.ru. NG acknowledges the support of the Fonds de la Recherche Scientifique - FNRS and the Fonds Wetenschappelijk Onderzoek - Vlaanderen (FWO) under EOS Project no O005318F-RG47, of the European Research Council (ERC Starting Grant no 679515), and of the Francqui Foundation. Email: nicolas.gillis@umons.ac.be. YN and FG acknowledge support from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (grant agreement no 788368). FG also acknowledges support from EOS project no O005318F-RG47. Emails: {yurii.nesterov,francois.glineur}@uclouvain.be.

$^{a}$Center for Artificial Intelligence Technology, Skoltech, Bolshoy Boulevard 30, bld. 1, Moscow, Russia 121205; $^{b}$CORE and ICTEAM/Mathematical Engineering (INMA), UCLouvain, Avenue Georges Lemaître 4, B-1348 Louvain-la-Neuve, Belgium; $^{c}$Department of Mathematics and Operational Research, Faculté Polytechnique, Université de Mons, Rue de Houdain 9, 7000 Mons, Belgium
Abstract

Nonnegative matrix factorization (NMF) is the following problem: given a nonnegative input matrix $V$ and a factorization rank $K$, compute two nonnegative matrices, $W$ with $K$ columns and $H$ with $K$ rows, such that $WH$ approximates $V$ as well as possible. In this paper, we propose two new approaches for computing high-quality NMF solutions using conic optimization. These approaches rely on the same two steps. First, we reformulate NMF as minimizing a concave function over a product of convex cones; one approach is based on the exponential cone, and the other on the second-order cone. Then, we solve these reformulations iteratively: at each step, we minimize exactly, over the feasible set, a majorization of the objective function obtained via linearization at the current iterate. Hence these subproblems are convex conic programs and can be solved efficiently using dedicated algorithms. We prove that our approaches reach a stationary point with an accuracy decreasing as $\mathcal{O}(\frac{1}{i})$, where $i$ denotes the iteration number. To the best of our knowledge, our analysis is the first to provide a convergence rate to stationary points for NMF. Furthermore, in the particular case of rank-one factorizations (that is, $K=1$), we show that one of our formulations can be expressed as a convex optimization problem, implying that optimal rank-one approximations can be computed efficiently. Finally, we show on several numerical examples that our approaches are able to frequently compute exact NMFs (that is, with $V=WH$), and compete favorably with the state of the art.

keywords:
nonnegative matrix factorization, nonnegative rank, exponential cone, second-order cone, concave minimization, conic optimization, Frank-Wolfe gap, convergence to stationary points

1 Introduction

Nonnegative matrix factorization (NMF) is the problem of approximating a given nonnegative matrix, $V\in\mathbb{R}_{+}^{F\times N}$, as the product of two smaller nonnegative matrices, $W\in\mathbb{R}_{+}^{F\times K}$ and $H\in\mathbb{R}_{+}^{K\times N}$, where $K$ is a given parameter known as the factorization rank. One aims at finding the best approximation, that is, the one that minimizes the discrepancy between $V$ and the product $WH$, often measured by the Frobenius norm of their difference, $\|V-WH\|_{F}$. Despite the fact that NMF is NP-hard in general [30] (see also [27]), it has been used successfully in many domains such as probability, geoscience, medical imaging, computational geometry, combinatorial optimization, analytical chemistry, and machine learning; see [11, 13] and the references therein. Many local optimization schemes have been developed to compute NMFs. They aim to identify local minima or stationary points of optimization problems that minimize the discrepancy between $V$ and the approximation $WH$. Most of these iterative algorithms rely on a two-block coordinate descent scheme that consists in (approximately) optimizing alternately over $W$ with $H$ fixed, and vice versa; see [5, 13] and the references therein. In this paper, we are interested in computing high-quality local minima for the NMF optimization problems without relying on the block coordinate descent (BCD) framework: we perform the optimization over $W$ and $H$ jointly. Moreover, our focus is on finding exact NMFs, that is, computing nonnegative factors $W$ and $H$ such that $V=WH$, although our approaches can be used to find approximate NMFs as well.

The minimum factorization rank $K$ for which an exact NMF exists is called the nonnegative rank of $V$, denoted $\operatorname{rank}_{+}(V)$; we have

\[ \operatorname{rank}_{+}(V)=\min\left\{K\in\mathbb{N}\;\middle|\;\text{there exist }W\in\mathbb{R}_{+}^{F\times K}\text{ and }H\in\mathbb{R}_{+}^{K\times N}\text{ such that }V=WH\right\}. \]

The computation of the nonnegative rank is NP-hard [30], and is a research topic on its own; see [13, Chapter 3], [9, 8] and the references therein for recent progress on this question.

1.1 Computational complexity

Solving exact NMF can be used to compute the nonnegative rank, by finding the smallest $K$ such that an exact NMF exists. Cohen and Rothblum [7] give a super-exponential time algorithm for this problem. Vavasis [30] proved that checking whether $\operatorname{rank}(V)=\operatorname{rank}_{+}(V)$, where $\operatorname{rank}(V)=K$ is part of the input, is NP-hard. Since determining the nonnegative rank is a generalization of exact NMF, the results in [30] imply that computing an exact NMF is also NP-hard. Similarly, the standard NMF problem using any norm is a generalization of exact NMF, and therefore any hardness result that applies to exact NMF also applies to most approximate NMF models [30]. Hence, unless P=NP, no algorithm can solve exact NMF using a number of arithmetic operations bounded by a polynomial in $K$ and in the size of $V$; see also [27], which gives a different proof using algebraic arguments.

More recently, Arora et al. [2] showed that no algorithm to solve this problem can run in time $(FN)^{o(K)}$ unless 3-SAT can be solved in time $2^{o(n)}$ on instances with $n$ variables. (3-SAT, or 3-satisfiability, is an instrumental problem in computational complexity used to prove NP-completeness results: it asks whether a set of disjunctions, each containing 3 Boolean variables or their negations, can be satisfied.) However, in practice, $K$ is small and it makes sense to wonder what the complexity is if $K$ is assumed to be a fixed constant. In that case, they showed that exact NMF can be solved in polynomial time in $F$ and $N$, namely in time $\mathcal{O}((FN)^{c2^{K}K^{2}})$ for some constant $c$, which Moitra [24] later improved to $\mathcal{O}((FN)^{cK^{2}})$.

The argument is based on quantifier elimination theory, using the seminal result by Basu, Pollack and Roy [3]. Unfortunately, this approach cannot be used in practice, even for small matrices, because of its high computational cost: although the term $\mathcal{O}((FN)^{cK^{2}})$ is a polynomial in $F$ and $N$ for $K$ fixed, it grows extremely fast (and the hidden constants are usually very large). Let us illustrate: for a 4-by-4 matrix with $K=3$, we have a complexity of order $16^{9}\approx 7\cdot 10^{10}$, and for a 5-by-5 matrix with $K=4$, the complexity rises to $25^{16}\approx 2\cdot 10^{22}$. Therefore, developing an effective computational technique for exact NMF of small matrices is an important direction of research. Some heuristics have been recently developed that allow solving exact NMF for matrices with up to a few dozen rows and columns [29].

1.2 Contribution and outline of the paper

In this paper, we introduce two formulations for computing an NMF using conic optimization. They rely on the same two steps. First, in Section 2, we reformulate NMF as minimizing a concave function over a product of convex cones; one approach is based on the exponential cone and leads to under-approximations, and the other on the second-order cone and leads to over-approximations. For the latter formulation, in the case of a rank-one factorization, we show that it can be cast as a convex optimization problem, leading to an efficient computation of the optimal rank-one over-approximation. Then, in Section 3, we solve these reformulations iteratively: at each step, we minimize exactly over the feasible set a majorization of the objective function obtained via linearization at the current iterate. Hence these subproblems are convex conic programs and can be solved efficiently using dedicated algorithms. In Section 4, we show that our optimization scheme relying on successive linearizations is a special case of the Frank-Wolfe (FW) algorithm. Using an appropriate measure of stationarity, namely the FW gap, we show in Theorem 4.1 that the minimal FW gap generated by our algorithm converges as $\mathcal{O}(\frac{1}{i})$, where $i$ is the iteration index. Finally, in Section 5, we use our approaches to compute exact NMFs, and show that they compete favorably with the state of the art when applied to several classes of nonnegative matrices, namely randomly generated, infinitesimally rigid and slack matrices.

Remark 1 (Focus on exact NMF).

Our two NMF formulations can be used to compute approximate factorizations (namely, under- and over-approximations). However, in this paper, we focus on exact NMF for the numerical experiments. The reason is twofold:

  1. Exact NMF problems allow us to check whether a globally optimal solution is reached, and hence to compare algorithms in terms of global optimality.

  2. Our current algorithms rely on interior-point methods that do not scale well. Therefore, they are significantly slower, at this point, than state-of-the-art algorithms for computing approximate NMFs on large data sets (such as images or documents). Making our approach scalable is a topic of further research. Moreover, because of the under/over-approximation constraints, the error obtained with the proposed algorithms would be larger, and the comparison would not be fair. Here is a simple example: let

\[ V=\begin{pmatrix}0&1\\ 1&1\end{pmatrix},\quad V_{\text{NMF}}=\begin{pmatrix}0.45&0.72\\ 0.72&1.17\end{pmatrix},\quad V_{\text{U}}=\begin{pmatrix}0&1\\ 0&1\end{pmatrix},\quad V_{\text{O}}=\begin{pmatrix}1&1\\ 1&1\end{pmatrix}. \]

The best rank-one NMF of $V$ is $V_{\text{NMF}}$ (with two digits of accuracy), with Frobenius error $\|V-V_{\text{NMF}}\|_{F}=0.62$, while a best rank-one under-approximation (resp. over-approximation) of $V$ is $V_{\text{U}}$ (resp. $V_{\text{O}}$), with Frobenius error 1. (Note however that under-/over-approximations can have some useful properties in practice [16, 28].)
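As a quick numerical check of this example (a sketch, not part of the original experiments): since the best rank-one approximation of a nonnegative matrix can be taken entrywise nonnegative (by Perron-Frobenius), $V_{\text{NMF}}$ can be computed with the SVD:

```python
import numpy as np

V = np.array([[0.0, 1.0],
              [1.0, 1.0]])

# Best rank-one approximation via the SVD; the leading singular vectors of a
# nonnegative matrix can be taken entrywise nonnegative (Perron-Frobenius),
# so this is also the best rank-one NMF.
U, s, Vt = np.linalg.svd(V)
V_nmf = s[0] * np.outer(np.abs(U[:, 0]), np.abs(Vt[0, :]))

print(np.round(V_nmf, 2))                   # [[0.45 0.72] [0.72 1.17]]
print(round(np.linalg.norm(V - V_nmf), 2))  # 0.62
```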

2 NMF formulations based on conic optimization

In this section we propose two new formulations for NMF, where the feasible set is represented using the exponential cone (Section 2.1) and the second-order cone (Section 2.2).

2.1 NMF formulation via exponential cones

Given a nonnegative matrix $V\in\mathbb{R}_{+}^{F\times N}$ and a positive integer $K\ll\min(F,N)$, we want to compute an NMF. Our first proposed formulation is the following:

\begin{align}
\max_{W\in\mathbb{R}^{F\times K},\,H\in\mathbb{R}^{K\times N}} \quad & \sum_{f=1}^{F}\sum_{n=1}^{N}\left(\sum_{k=1}^{K}W_{fk}H_{kn}\right) \tag{1}\\
\text{subject to} \quad & \sum_{k=1}^{K}W_{fk}H_{kn}\leq V_{fn}\ \text{ for } f\in\mathcal{F},\,n\in\mathcal{N},\\
& W_{fk}\geq 0,\ H_{kn}\geq 0\ \text{ for } f\in\mathcal{F},\,k\in\mathcal{K},\,n\in\mathcal{N},
\end{align}

where $\mathcal{F}=\{1,\dots,F\}$, $\mathcal{N}=\{1,\dots,N\}$ and $\mathcal{K}=\{1,\dots,K\}$. Any feasible solution $(W,H)$ of (1) provides an under-approximation of $V$, because of the elementwise constraint $WH\leq V$. The objective function of (1) maximizes the sum of the entries of $WH$. Therefore, if $V$ admits an exact NMF of size $K$, that is, $\operatorname{rank}_{+}(V)\leq K$, any optimal solution $(W^{*},H^{*})$ of (1) must satisfy $W^{*}H^{*}=V$, and hence will provide an exact NMF of $V$. Note that this problem is nonconvex because of the bilinear terms appearing in the objective and in the constraint $WH\leq V$.

Let us now reformulate (1) using exponential cones. In order to deal with the nonnegativity constraints on the entries of $W$ and $H$, we use the following change of variables: $W_{fk}=G(U_{fk})=e^{U_{fk}}$ and $H_{kn}=G(T_{kn})=e^{T_{kn}}$, where $U\in\mathbb{R}^{F\times K}$ and $T\in\mathbb{R}^{K\times N}$, with $f=1,\dots,F$, $n=1,\dots,N$, $k=1,\dots,K$ and $G(t)=e^{t}$. By applying a logarithm to the objective function on top of this change of variables, and on both sides of the inequality constraints $WH\leq V$, (1) can be rewritten nearly equivalently as follows, the difference being that zero entries in $W$ and $H$ are now excluded:

\begin{align}
\max_{U\in\mathbb{R}^{F\times K},\,T\in\mathbb{R}^{K\times N}} \quad & \log\left(\sum_{f,n,k}e^{U_{fk}+T_{kn}}\right) \tag{2}\\
\text{subject to} \quad & \log\left(\sum_{k=1}^{K}e^{U_{fk}+T_{kn}}\right)\leq\log\left(V_{fn}\right)\ \text{ for } f\in\mathcal{F},\,n\in\mathcal{N},
\end{align}

which corresponds to the maximization of a convex function (the logarithm of a sum of exponentials) over a convex set, each constraint being convex for the same reason.

We rewrite the convex feasible set of (2) with explicit conic constraints as follows:

\begin{align}
& \sum_{k=1}^{K}t_{fkn}\leq V_{fn}\ \text{ for } f\in\mathcal{F},\,n\in\mathcal{N}, \tag{3}\\
& \left(t_{fkn},1,U_{fk}+T_{kn}\right)\in K_{\exp}\ \text{ for } f\in\mathcal{F},\,k\in\mathcal{K},\,n\in\mathcal{N},
\end{align}

where $K_{\exp}\subset\mathbb{R}^{3}$ denotes the (primal) exponential cone, defined as:

\[ K_{\exp}=\left\{(x_{1},x_{2},x_{3})\in\mathbb{R}^{3}\,\middle|\,x_{1}\geq x_{2}e^{\frac{x_{3}}{x_{2}}},\,x_{2}>0\right\}\cup\left\{(x_{1},0,x_{3})\,\middle|\,x_{1}\geq 0,\,x_{3}\leq 0\right\}. \tag{4} \]

Note that the exponential cone is closed and includes the subset $\{(x_{1},0,x_{3})\,|\,x_{1}\geq 0,\,x_{3}\leq 0\}$; therefore the scenarios in which some entries $V_{fn}$ are equal to zero can be handled by exponential conic constraints, which was not possible with formulation (2) since the $\log$ function is not defined at zero. Hence the optimization problem (1) can be written completely equivalently as

\begin{align}
\max_{U\in\mathbb{R}^{F\times K},\,T\in\mathbb{R}^{K\times N},\,t\in\mathbb{R}^{F\times K\times N}} \quad & \log\left(\sum_{f,n,k}e^{U_{fk}+T_{kn}}\right) \tag{5}\\
\text{subject to} \quad & \sum_{k=1}^{K}t_{fkn}\leq V_{fn}\ \text{ for } f\in\mathcal{F},\,n\in\mathcal{N},\\
& \left(t_{fkn},1,U_{fk}+T_{kn}\right)\in K_{\exp}\ \text{ for } f\in\mathcal{F},\,k\in\mathcal{K},\,n\in\mathcal{N}.
\end{align}

This leads to $F\times N$ inequality constraints and the introduction of $F\times K\times N$ exponential cones. In Section 3, we propose an algorithm to tackle (5) using successive linearizations of the objective function.
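To make the construction concrete, the following sketch (Python with CVXPY, not the MATLAB/MOSEK implementation used in this paper) builds one such linearized subproblem, in the log-sum-exp form (2), assuming $V$ has strictly positive entries (zero entries require the explicit exponential-cone constraints of (5)); the box bounds $[-35,10]$ anticipate the compactness discussion of Section 4.1. A conic solver supporting the exponential cone (e.g., MOSEK, ECOS or SCS) is required.

```python
import numpy as np
import cvxpy as cp

def exp_subproblem(V, U_prev, T_prev):
    """One successive-linearization step for formulation (5)/(2):
    maximize the linearization of log(sum_{f,n,k} exp(U_fk + T_kn)) at
    (U_prev, T_prev) over {log(sum_k exp(U_fk + T_kn)) <= log(V_fn)}."""
    F, N = V.shape
    K = U_prev.shape[1]
    W, H = np.exp(U_prev), np.exp(T_prev)   # current (W, H) = G(U, T)
    S = np.sum(W @ H)
    # Gradient of the log-sum-exp objective at the current iterate:
    gU = W * H.sum(axis=1)[None, :] / S     # shape (F, K)
    gT = W.sum(axis=0)[:, None] * H / S     # shape (K, N)

    U = cp.Variable((F, K))
    T = cp.Variable((K, N))
    cons = [U >= -35, U <= 10, T >= -35, T <= 10]  # bounds, see Section 4.1
    cons += [cp.log_sum_exp(U[f, :] + T[:, n]) <= np.log(V[f, n])
             for f in range(F) for n in range(N)]
    obj = cp.Maximize(cp.sum(cp.multiply(gU, U)) + cp.sum(cp.multiply(gT, T)))
    cp.Problem(obj, cons).solve()           # exponential-cone program
    return U.value, T.value
```

Iterating $(W,H)=(e^{U},e^{T})$ over such subproblems gives the main loop of the algorithm presented in Section 3.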

2.2 NMF formulation via rotated second-order cones

Our second proposed NMF formulation is the following:

\begin{align}
\min_{W\in\mathbb{R}^{F\times K},\,H\in\mathbb{R}^{K\times N}} \quad & \sum_{f=1}^{F}\sum_{n=1}^{N}\left(\sum_{k=1}^{K}W_{fk}H_{kn}\right) \tag{6}\\
\text{subject to} \quad & \sum_{k=1}^{K}W_{fk}H_{kn}\geq V_{fn}\ \text{ for } f\in\mathcal{F},\,n\in\mathcal{N},\\
& W_{fk},\,H_{kn}\geq 0\ \text{ for } f\in\mathcal{F},\,k\in\mathcal{K},\,n\in\mathcal{N}.
\end{align}

Any feasible solution $(W,H)$ of (6) provides an over-approximation of $V$, because of the constraint $WH\geq V$. The objective function of (6) minimizes the sum of the entries of $WH$. Therefore, if $\operatorname{rank}_{+}(V)\leq K$, any optimal solution $(W^{*},H^{*})$ of (6) must satisfy $W^{*}H^{*}=V$, and hence will provide an exact NMF of $V$. Again, the problem is nonconvex due to the bilinear terms.

Let us use the following change of variables: we let $W_{fk}=G(U_{fk})=\sqrt{U_{fk}}$ and $H_{kn}=G(T_{kn})=\sqrt{T_{kn}}$, where $U\in\mathbb{R}_{+}^{F\times K}$ and $T\in\mathbb{R}_{+}^{K\times N}$, with $f=1,\dots,F$, $n=1,\dots,N$, $k=1,\dots,K$, this time with $G(t)=\sqrt{t}$. Thus the optimization problem (6) can be equivalently rewritten as:

\begin{align}
\min_{U\in\mathbb{R}_{+}^{F\times K},\,T\in\mathbb{R}_{+}^{K\times N}} \quad & \sum_{f=1}^{F}\sum_{n=1}^{N}\left(\sum_{k=1}^{K}\sqrt{U_{fk}}\sqrt{T_{kn}}\right) \tag{7}\\
\text{subject to} \quad & \sum_{k=1}^{K}\sqrt{U_{fk}}\sqrt{T_{kn}}\geq V_{fn}\ \text{ for } f\in\mathcal{F},\,n\in\mathcal{N},
\end{align}

which minimizes a concave function over a convex set. Indeed, the function $(x,y)\mapsto\sqrt{xy}$ is concave on the nonnegative orthant (for $x,y>0$ its Hessian has zero determinant and negative trace, hence is negative semidefinite).

This set can be written with conic constraints as follows:

\begin{align}
& \sum_{k=1}^{K}t_{fkn}\geq V_{fn}\ \text{ for } f\in\mathcal{F},\,n\in\mathcal{N}, \tag{8}\\
& \left(U_{fk},\tfrac{1}{2}T_{kn},t_{fkn}\right)\in\mathcal{Q}_{r}^{3}\ \text{ for } f\in\mathcal{F},\,k\in\mathcal{K},\,n\in\mathcal{N},
\end{align}

where $\mathcal{Q}_{r}^{3}$ denotes the 3-dimensional rotated second-order cone, defined as:

\[ \mathcal{Q}_{r}^{3}=\left\{(x_{1},x_{2},x_{3})\in\mathbb{R}^{3}\ \middle|\ 2x_{1}x_{2}\geq x_{3}^{2},\,x_{1}\geq 0,\,x_{2}\geq 0\right\}. \]

Thus, the optimization problem (7) becomes

\begin{align}
\min_{U\in\mathbb{R}_{+}^{F\times K},\,T\in\mathbb{R}_{+}^{K\times N},\,t\in\mathbb{R}^{F\times K\times N}} \quad & \sum_{f=1}^{F}\sum_{n=1}^{N}\left(\sum_{k=1}^{K}\sqrt{U_{fk}}\sqrt{T_{kn}}\right) \tag{9}\\
\text{subject to} \quad & \sum_{k=1}^{K}t_{fkn}\geq V_{fn}\ \text{ for } f\in\mathcal{F},\,n\in\mathcal{N},\\
& \left(U_{fk},\tfrac{1}{2}T_{kn},t_{fkn}\right)\in\mathcal{Q}_{r}^{3}\ \text{ for } f\in\mathcal{F},\,k\in\mathcal{K},\,n\in\mathcal{N},
\end{align}

which leads to $F\times N$ inequality constraints and the introduction of $F\times K\times N$ rotated quadratic cones. In Section 3, we present an algorithm to tackle (5) and (9).
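As a concrete illustration (again a Python/CVXPY sketch, not the MATLAB/MOSEK implementation used in this paper), one linearized subproblem of the scheme of Section 3 applied to (9) can be written as follows. The rotated-cone constraint $t_{fkn}^{2}\leq U_{fk}T_{kn}$ is expressed via quad_over_lin, which implicitly assumes the current iterate is entrywise positive; the returned quantity gap anticipates the FW gap of Section 4.

```python
import numpy as np
import cvxpy as cp

def soc_subproblem(V, W_prev, H_prev):
    """One successive-linearization step for formulation (9): minimize the
    linearization of Phi(U, T) = sum_{f,n,k} sqrt(U_fk * T_kn) at the current
    iterate (U, T) = (W_prev**2, H_prev**2), over the constraints (8).
    Returns the next (W, H) and the FW gap (see Section 4)."""
    F, N = V.shape
    K = W_prev.shape[1]
    # Gradient of Phi at the current iterate, in terms of (W, H) = (sqrt U, sqrt T):
    gU = H_prev.sum(axis=1)[None, :] / (2 * W_prev)   # shape (F, K)
    gT = W_prev.sum(axis=0)[:, None] / (2 * H_prev)   # shape (K, N)

    U = cp.Variable((F, K), nonneg=True)
    T = cp.Variable((K, N), nonneg=True)
    t = {(f, n): cp.Variable(K, nonneg=True) for f in range(F) for n in range(N)}
    cons = []
    for f in range(F):
        for n in range(N):
            cons.append(cp.sum(t[f, n]) >= V[f, n])
            # t_fkn^2 <= U_fk * T_kn, i.e., (U_fk, T_kn / 2, t_fkn) in Q_r^3:
            cons += [cp.quad_over_lin(t[f, n][k], U[f, k]) <= T[k, n]
                     for k in range(K)]
    obj = cp.sum(cp.multiply(gU, U)) + cp.sum(cp.multiply(gT, T))
    prob = cp.Problem(cp.Minimize(obj), cons)
    prob.solve()
    # FW gap = <grad, Z_prev> - <grad, Z_new>, a by-product of the subproblem:
    U_prev, T_prev = W_prev**2, H_prev**2
    gap = float(np.sum(gU * U_prev) + np.sum(gT * T_prev) - prob.value)
    return np.sqrt(U.value), np.sqrt(T.value), gap
```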

2.2.1 Rank-one Nonnegative Matrix Over-approximation

In this section, we show that our over-approximation formulation (6) can be expressed as a convex optimization problem in the case of a rank-one factorization (that is, $K=1$). Hence we will be able to compute an optimal rank-one nonnegative matrix over-approximation (NMO). For $K=1$, (6) becomes:

\begin{align}
\min_{w\in\mathbb{R}^{F},\,h\in\mathbb{R}^{N}} \quad & \sum_{f=1}^{F}\sum_{n=1}^{N}w_{f}h_{n} \tag{10}\\
\text{subject to} \quad & w_{f}h_{n}\geq V_{fn}\ \text{ for } f\in\mathcal{F},\,n\in\mathcal{N},\\
& w_{f},\,h_{n}\geq 0\ \text{ for } f\in\mathcal{F},\,n\in\mathcal{N}.
\end{align}

Any feasible solution $(w,h)$ of (10) provides a rank-one NMO of $V$, because of the constraints. The objective function of (10) minimizes the sum of the entries of $wh^{\top}$, which is equal to $\langle wh^{\top},ee^{\top}\rangle=\langle w,e\rangle\,\langle h,e\rangle$, where $e$ denotes the all-one vector of appropriate dimension. Since any solution $wh^{\top}$ can be rescaled as $(\lambda w)(h^{\top}/\lambda)$ for any $\lambda>0$, we can assume without loss of generality (w.l.o.g.) that $\langle w,e\rangle=1$, and hence (10) can be equivalently written as follows:

\begin{align}
\min_{w\in\mathbb{R}^{F},\,h\in\mathbb{R}^{N}} \quad & \langle h,e\rangle \tag{11}\\
\text{subject to} \quad & \langle w,e\rangle=1,\\
& w_{f}h_{n}\geq V_{fn}\ \text{ for } f\in\mathcal{F},\,n\in\mathcal{N},\\
& w_{f},\,h_{n}\geq 0\ \text{ for } f\in\mathcal{F},\,n\in\mathcal{N}.
\end{align}

Then, by letting each $h_{n}$ take the minimal value allowed by the constraints, that is, $h_{n}=\max_{f\in\mathcal{F}}V_{fn}/w_{f}$ for each $n\in\mathcal{N}$, and replacing each $w_{f}$ by its inverse, $u_{f}=1/w_{f}$ for $f\in\mathcal{F}$, (11) becomes (the equality constraint $\langle w,e\rangle=1$ can be relaxed to an inequality without changing the optimal value, since decreasing any $u_{f}$ decreases the objective, so the constraint is active at optimality):

\begin{align}
\min_{u\in\mathbb{R}^{F}} \quad & \sum_{n}\max_{f}\left(u_{f}V_{fn}\right) \tag{12}\\
\text{subject to} \quad & \sum_{f}1/u_{f}\leq 1,\ u_{f}\geq 0\ \text{ for } f\in\mathcal{F}.
\end{align}

The feasible set of (12) can be formulated by using various conic constraints:

  • a semidefinite programming formulation: introduce variables $y_{f}$ such that $u_{f}y_{f}\geq 1$ and $\sum_{f}y_{f}\leq 1$ to obtain
\[ \begin{pmatrix}u_{f}&1\\ 1&y_{f}\end{pmatrix}\in\mathbb{S}^{2}_{+}\ \text{ for } f\in\mathcal{F},\quad \sum_{f}y_{f}\leq 1, \]
    where $\mathbb{S}^{2}_{+}$ denotes the set of positive semidefinite matrices of dimension 2.

  • a power-cone formulation: for $p<0$ the function $g(x)=x^{p}$ is convex for $x>0$, and the inequality $z\geq x^{p}$ is equivalent to $z^{1/(1-p)}x^{-p/(1-p)}\geq 1 \iff (z,x,1)\in P^{1/(1-p)}$, where $P^{\alpha}=\{(x,z,a)\ |\ x^{\alpha}z^{1-\alpha}\geq a\}$ is a power cone. In our case, by introducing $y_{f}\geq u_{f}^{-1}$ (that is, $p=-1$ and $\alpha=1/2$), we obtain
\[ (y_{f},u_{f},1)\in P^{1/2}\ \text{ for } f\in\mathcal{F},\quad \sum_{f}y_{f}\leq 1. \]
  • a (rotated) quadratic formulation: introduce variables $y_{f}$ such that $y_{f}\geq 1/u_{f}$ for $u_{f}\geq 0$, which can be formulated as $(u_{f},y_{f},\sqrt{2})\in\mathcal{Q}_{r}^{3}$, where $\mathcal{Q}_{r}^{3}$ denotes the rotated quadratic cone of dimension 3 defined above (indeed, the cone membership reads $2u_{f}y_{f}\geq(\sqrt{2})^{2}=2$). We then have
\[ (u_{f},y_{f},\sqrt{2})\in\mathcal{Q}_{r}^{3}\ \text{ for } f\in\mathcal{F},\quad \sum_{f}y_{f}\leq 1. \]

In this paper, we consider the (rotated) quadratic formulation, which is the easiest to implement in the MOSEK software [25].

Further, the objective function of (12) is a sum of convex piecewise-linear functions. Hence, by introducing $t_{n}\geq\max_{f}\left(u_{f}V_{fn}\right)$, (12) can equivalently be formulated as follows:

\begin{align}
\min_{t\in\mathbb{R}^{N},\,u,y\in\mathbb{R}^{F}} \quad & \sum_{n}t_{n} \tag{13}\\
\text{subject to} \quad & \sum_{f}y_{f}\leq 1,\\
& (u_{f},y_{f},\sqrt{2})\in\mathcal{Q}_{r}^{3}\ \text{ for } f\in\mathcal{F},\\
& t_{n}\geq u_{f}V_{fn}\ \text{ for } f\in\mathcal{F},\,n\in\mathcal{N},
\end{align}

which involves $2F+N$ variables and $F(1+N)+1$ constraints. This problem can be solved to optimality and efficiently with an interior-point method (IPM), as available for example in MOSEK [25].
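For illustration, here is a compact sketch (Python/CVXPY rather than the MOSEK model above) that solves (12) directly; cp.inv_pos encodes $\sum_{f}1/u_{f}\leq 1$, and the modeling layer reduces the problem to a conic form equivalent to (13).

```python
import numpy as np
import cvxpy as cp

def rank_one_nmo(V):
    """Optimal rank-one nonnegative over-approximation via (12)."""
    F, N = V.shape
    u = cp.Variable(F, pos=True)
    # Objective of (12): sum_n max_f (u_f * V_fn); diag(u) @ V has rows u_f * V_f:.
    obj = cp.Minimize(cp.sum(cp.max(cp.diag(u) @ V, axis=0)))
    prob = cp.Problem(obj, [cp.sum(cp.inv_pos(u)) <= 1])
    prob.solve()
    w = 1.0 / u.value                  # recover w_f = 1/u_f (<w, e> = 1 at optimality)
    h = (V / w[:, None]).max(axis=0)   # h_n = max_f V_fn / w_f
    return w, h                        # outer(w, h) >= V entrywise

# Example usage: a random 4-by-3 nonnegative matrix.
V = np.random.rand(4, 3)
w, h = rank_one_nmo(V)
assert np.all(np.outer(w, h) >= V - 1e-8)
```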

For the formulation based on exponential cones, [12] showed that the rank-one under-approximation of a positive input matrix can be expressed as the dual of an optimal transportation problem, and hence can also be solved optimally and efficiently with polynomial-time methods [26].

3 A successive linearization algorithm

In this section, we present an iterative algorithm to tackle problems (5) and (9). Both problems can be written as the minimization of a concave function $\Phi$ over a convex set denoted by $Q$, where $Q$ designates either the feasible set of (5) or that of (9). We perform this minimization by solving a sequence of simpler problems in which the objective function is replaced by its linearization constructed at the current solution $(U,T)$. Let $Z^{(i)}=(U^{(i)},T^{(i)})$ denote the $i$th iterate of our algorithm. At each iteration $i$, we update $Z$ as follows:

\begin{align}
Z^{(i)} &\in \operatorname*{argmin}_{Z\in Q}\ \Phi(Z^{(i-1)})+\langle\nabla\Phi(Z^{(i-1)}),Z-Z^{(i-1)}\rangle \tag{14}\\
&\in \operatorname*{argmin}_{Z\in Q}\ \langle\nabla\Phi(Z^{(i-1)}),Z\rangle,
\end{align}

where $\Phi$ is the objective function of (5) or (9). Since the objective of (14) is linear in $Z$, the subproblems are convex; moreover, they are structured conic optimization problems. In this paper, we use the MOSEK software [25] to solve each successive problem (14) with an IPM. Algorithm 1 summarizes our proposed method to tackle (5) and (9).

To initialize $U$ and $T$, we randomly initialize $W$ and $H$ (using the uniform distribution on the interval $[0,1]$ for each entry of $W$ and $H$) and apply the inverse of the change of variables, $G^{-1}(\cdot)$, to obtain the initializations for $U$ and $T$.

In this paper, we use a tolerance on the relative error equal to $10^{-6}$; that is, we consider that an exact NMF $(W,H)$ is found for an input matrix $V$ as soon as $\frac{\|V-WH\|_{F}}{\|V\|_{F}}\leq 10^{-6}$, as done in [29].

The main algorithm integrates a procedure that automatically updates the optimization problems when subsets of entries of the solution tend to zero. Indeed, due to numerical limitations of the solver, the required level of accuracy cannot be reached in some numerical tests even when the solution is close to convergence. This procedure is referred to as Sparsity Pattern Integration (SPI) and is detailed in Appendix B; a minimal illustration is sketched below. Note that the update of the optimization problem is computationally costly, in particular the update of the matrix of coefficients defining the constraints. Hence, SPI is triggered only twice, at 80% and 95% of the maximum number of iterations. This practical choice was motivated by numerical experiments showing that two activations in the final iterations are sufficient to reach the tolerance when the current solution is close enough to a high-accuracy local optimum. However, for exponential cones, in some numerical tests, the relative error can get stuck in the interval $[10^{-5},10^{-4}]$. In this context only, a final refinement step further improves the output of the main algorithm, using the state-of-the-art accelerated HALS algorithm (an exact BCD method for NMF) from [14] to go below the $10^{-6}$ tolerance.
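In essence (a minimal sketch with a hypothetical helper, not the actual implementation), SPI amounts to flagging near-zero entries of the current factors and fixing the corresponding variables to zero in all subsequent conic subproblems:

```python
import numpy as np

def sparsity_pattern(W, H, tol=1e-8):
    """Flag entries of (W, H) that are numerically zero; the corresponding
    variables U_fk, T_kn are then fixed to zero (and their cones dropped)
    in the remaining subproblems, which shrinks the conic programs."""
    mask_W = W < tol * W.max()
    mask_H = H < tol * H.max()
    return mask_W, mask_H
```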

Algorithm 1 Successive Conic Convex Approximation for Exact NMF
Input: matrix $V\in\mathbb{R}^{F\times N}_{+}$, factorization rank $K$, number of iterations maxiter, and the choice of formulation: (5) (exponential cones) or (9) (second-order cones).
Output: $(W,H)\geq 0$ such that $V\approx WH$, with $WH\leq V$ for (5) (under-approximation) or $WH\geq V$ for (9) (over-approximation).
1: % Block 1: Initialization
2: $(W^{(0)},H^{(0)})\longleftarrow$ positive random initialization$(F,K,N)$
3: $(U^{(0)},T^{(0)})\longleftarrow G^{-1}(W^{(0)},H^{(0)})$, where $G$ is the change of variables
4: $Z^{(0)}\longleftarrow(U^{(0)},T^{(0)})$
5: % Block 2: iterative update of $Z$
6: for $i=1,2,\dots,$ maxiter do
7:   $Z^{(i)}\longleftarrow \operatorname*{argmin}_{Z\in Q}\,\langle\nabla\Phi(Z^{(i-1)}),Z\rangle$, solved with the IPMs available in MOSEK [25]
8: end for
9: $(W,H)\longleftarrow G(Z^{(i)})$
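For concreteness, a sketch of the main loop (in Python, using the soc_subproblem sketch from Section 2.2 for the variant based on (9); SPI and the final refinement are omitted):

```python
import numpy as np

def algorithm1_soc(V, K, maxiter=750, tol=1e-6, seed=0):
    """Sketch of Algorithm 1 for formulation (9)."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    # Block 1: positive random initialization of (W, H).
    W, H = rng.uniform(size=(F, K)), rng.uniform(size=(K, N))
    min_gap = np.inf
    for i in range(maxiter):
        # Block 2: solve the linearized conic subproblem (14).
        W, H, gap = soc_subproblem(V, W, H)
        min_gap = min(min_gap, gap)    # minimal FW gap, see Theorem 4.1
        if np.linalg.norm(V - W @ H) <= tol * np.linalg.norm(V):
            break                      # relative error below the 1e-6 tolerance
    return W, H, min_gap
```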

In Section 4, we discuss the convergence guarantees for Algorithm 1.

4 Convergence results

Let us focus on the following optimization problem

\[ \min_{Z\in Q}\ \Phi(Z), \tag{15} \]

where $\Phi$ is a concave, continuously differentiable function over the domain $Q$, which is assumed to be convex and compact. Let us first describe the convergence of the sequence of objective function values $\{\Phi(Z^{(i)})\}$ obtained with Algorithm 1. Since $\Phi(Z)$ is concave, its linearization around the current iterate $Z^{(i)}$ provides an upper approximation, that is,

\[ \Phi(Z)\leq\Phi(Z^{(i)})+\langle\nabla\Phi(Z^{(i)}),Z-Z^{(i)}\rangle\ \text{ for all }\ Z\in Q. \tag{16} \]

This upper bound is tight at the current iterate and is minimized exactly over the feasible set $Q$ at each iteration. Hence Algorithm 1 is a majorization-minimization algorithm. This implies that $\Phi(Z)$ is nonincreasing under the updates of Algorithm 1; since $\Phi(Z)$ is bounded below on $Q$, the sequence of objective function values $\{\Phi(Z^{(i)})\}$ converges.

We now focus on the convergence analysis of the sequence of iterates $\{Z^{(i)}\}$ generated by Algorithm 1, in particular convergence to a stationary point. To achieve this goal, we first recall some basics about the Frank-Wolfe (FW) algorithm. The FW algorithm [10] is a popular first-order method to solve (15) that relies on the ability to efficiently compute the so-called linear minimization oracle (LMO), that is, $\operatorname{LMO}(D):=\operatorname*{argmin}_{Z\in Q}\langle D,Z\rangle$, where $D$ denotes some search direction. The FW algorithm with adaptive step size is given in Algorithm 2. A step of this algorithm can be briefly summarized as follows: at a current iterate $Z^{(i)}$, the algorithm considers the first-order model of the objective function (its linearization) and moves towards a minimizer of this linear function, computed over the same domain $Q$.

Algorithm 2 Frank-Wolfe algorithm
1: Input: $Z^{(0)}\in Q$, number of iterations $I$.
2: for $i=1,2,\dots$ do
3:   Compute $V^{(i)}:=\operatorname*{argmin}_{Z\in Q}\,\langle\nabla\Phi(Z^{(i-1)}),Z\rangle$
4:   Choose $0<\tau^{(i)}\leq 1$. (A standard choice in the literature is $\tau^{(i)}:=\frac{2}{i+1}$.)
5:   Update $Z^{(i)}:=(1-\tau^{(i)})Z^{(i-1)}+\tau^{(i)}V^{(i)}$
6: end for

Algorithm 1 is a particular case of Algorithm 2 in which $\tau^{(i)}=1$ for all $i$. In the last decade, FW-based methods have regained interest in many fields, mainly driven by their good scalability and by the crucial property that Algorithm 2 maintains its iterates as a convex combination of a few extreme points. This results in the generation of sparse and low-rank solutions, since at most one additional extreme point of the set $Q$ is added to the convex combination at each iteration. More details and insights about the latter observations can be found in [6, 17]. FW algorithms have recently been studied in terms of convergence guarantees for the minimization of various classes of functions over convex sets, such as convex functions with Lipschitz continuous gradient [18] and nonconvex differentiable functions [22]. However, we are not aware of any convergence rates proven for Algorithm 2 when solving (15) assuming only concavity of the objective. To derive rates in the concave setting, we need a measure of stationarity for our iterates. In this paper, we consider the so-called FW gap of $\Phi$ at $Z^{(i)}$, defined as follows:

\[ \mu^{(i)}:=\max_{Z\in Q}\ \langle\nabla\Phi(Z^{(i)}),Z^{(i)}-Z\rangle. \tag{17} \]

This quantity is standard in the analysis of FW algorithms; see [18, 22] and the references therein. A point $Z^{(i)}$ is a stationary point of the constrained optimization problem (15) if and only if $\mu^{(i)}=0$. Moreover, the FW gap

  • provides a lower bound on the accuracy: $0\leq\mu^{(i)}\leq\Phi(Z^{(i)})-\Phi^{*}$ for all $i$, where $\Phi^{*}:=\min_{Z\in Q}\Phi(Z)$,

  • is affine invariant, that is, invariant with respect to an affine transformation of the domain $Q$ in problem (15) [18], and

  • is not tied to any specific choice of norm, unlike criteria such as $\|\nabla\Phi(Z^{(i)})\|$.

Let us provide a convergence rate for the FW algorithm (Algorithm 2).

Theorem 4.1.

Consider problem (15), where $\Phi$ is a continuously differentiable concave function over the compact convex domain $Q$. Let $Z^{(i)}$ denote the sequence of iterates generated by the FW algorithm (Algorithm 2) applied to (15). Assume there exists a constant $\tilde{\tau}>0$ such that $\tilde{\tau}\leq\tau^{(i)}\leq 1$ for all $i$. Then the minimal FW gap, defined as $\tilde{\mu}^{(i)}:=\min_{0\leq j\leq i}\mu^{(j)}$, satisfies, for all $i=1,2,\dots$,

\[ \tilde{\mu}^{(i)}\ \leq\ \frac{1}{\tilde{\tau}}\,\frac{\Phi(Z^{(0)})-\Phi^{*}}{i+1}, \tag{18} \]

where $\Phi(Z^{(0)})-\Phi^{*}$ is the (global) accuracy of the initial iterate.

Proof.

Using (16), any points $Z^{(i)}$ and $Z^{(i+1)}$ generated by Algorithm 2 satisfy

\[ \Phi(Z^{(i+1)})\leq\Phi(Z^{(i)})+\langle\nabla\Phi(Z^{(i)}),Z^{(i+1)}-Z^{(i)}\rangle. \tag{19} \]

Substituting the definition of $Z^{(i+1)}$ from Algorithm 2 (line 5) into (19), we obtain

\begin{align}
\Phi(Z^{(i+1)}) &\leq \Phi(Z^{(i)})+\langle\nabla\Phi(Z^{(i)}),(1-\tau^{(i)})Z^{(i)}+\tau^{(i)}V^{(i)}-Z^{(i)}\rangle \tag{20}\\
&= \Phi(Z^{(i)})-\tau^{(i)}\langle\nabla\Phi(Z^{(i)}),Z^{(i)}-V^{(i)}\rangle.
\end{align}

Let us show that $\langle\nabla\Phi(Z^{(i)}),Z^{(i)}-V^{(i)}\rangle$ is equal to $\mu^{(i)}$ defined in Equation (17):

\begin{align}
\mu^{(i)} &= \max_{Z\in Q}\ \langle\nabla\Phi(Z^{(i)}),Z^{(i)}-Z\rangle \tag{21}\\
&= \langle\nabla\Phi(Z^{(i)}),Z^{(i)}\rangle+\max_{Z\in Q}\ \langle\nabla\Phi(Z^{(i)}),-Z\rangle\\
&= \langle\nabla\Phi(Z^{(i)}),Z^{(i)}\rangle-\min_{Z\in Q}\ \langle\nabla\Phi(Z^{(i)}),Z\rangle\\
&= \langle\nabla\Phi(Z^{(i)}),Z^{(i)}\rangle-\langle\nabla\Phi(Z^{(i)}),V^{(i)}\rangle=\langle\nabla\Phi(Z^{(i)}),Z^{(i)}-V^{(i)}\rangle,
\end{align}

where the fourth equality holds by definition of $V^{(i)}$ in Algorithm 2 (line 3). Equation (20) becomes:

\[ \Phi(Z^{(i+1)})\leq\Phi(Z^{(i)})-\tau^{(i)}\mu^{(i)}. \tag{22} \]

By recursively applying (22) for the iterates generated by Algorithm 2, we obtain

\[ \Phi(Z^{(i+1)})\leq\Phi(Z^{(0)})-\sum_{j=0}^{i}\tau^{(j)}\mu^{(j)}. \tag{23} \]

Recalling the definitions of $\tilde{\mu}^{(i)}:=\min_{0\leq j\leq i}\mu^{(j)}$, the minimal FW gap encountered along the iterates, and of $\tilde{\tau}$ such that $\tilde{\tau}\leq\tau^{(i)}\leq 1$ for all $i\geq 0$, inequality (23) implies

\[ \Phi(Z^{(i+1)})\leq\Phi(Z^{(0)})-(i+1)\,\tilde{\tau}\,\tilde{\mu}^{(i)}\iff\tilde{\mu}^{(i)}\leq\frac{1}{\tilde{\tau}}\,\frac{\Phi(Z^{(0)})-\Phi(Z^{(i+1)})}{i+1}. \tag{24} \]

Finally, using the fact that $\Phi(Z^{(0)})-\Phi(Z^{(i+1)})\leq\Phi(Z^{(0)})-\Phi^{*}$, where $\Phi^{*}:=\min_{Z\in Q}\Phi(Z)$, inequality (24) becomes

\[ \tilde{\mu}^{(i)}\leq\frac{1}{\tilde{\tau}}\,\frac{\Phi(Z^{(0)})-\Phi^{*}}{i+1}, \tag{25} \]

which concludes the proof. ∎

Theorem 4.1 shows that it takes at most $\mathcal{O}(\frac{1}{\epsilon})$ iterations to find an approximate stationary point with gap smaller than $\epsilon$. Note that Theorem 4.1 requires $\min_{i}\tau^{(i)}>0$; for example, the standard choice $\tau^{(i)}=\frac{2}{i+1}$ does not satisfy this assumption.

4.1 Compactness assumption and convergence of Algorithm 1

Looking at the convergence rate given by (18), it is tempting to take $\tilde{\tau}$ as large as possible. However, since the set $Q$ is convex, the maximum allowed value for $\tilde{\tau}$ is 1, to ensure that the iterates $Z^{(i)}$ remain feasible. This setting leads to a convergence rate of $\mathcal{O}(1/i)$, provided the assumptions of Theorem 4.1 are satisfied, and corresponds to Algorithm 1. Theorem 4.1 requires the set $Q$ to be compact. Let us discuss this assumption in the context of Algorithm 1, which relies on our two formulations:

  1. For the formulation using exponential cones (5), a variable $W_{fk}$ (resp. $H_{kn}$) equal to zero corresponds to $U_{fk}$ (resp. $T_{kn}$) going to $-\infty$, which is not bounded. As recommended by MOSEK, we use additional artificial componentwise lower and upper bounds for $U$ and $T$, namely $-35$ and $10$. Therefore, our implementation actually solves a componentwise bounded version of (5). However, since $e^{-35}\approx 6\cdot 10^{-16}$ and $e^{10}\approx 2\cdot 10^{4}$, these constraints do not numerically exclude good approximations of $V$ in practice, as long as the entries of $V$ belong to a reasonable range, which can be assumed w.l.o.g. after a proper preprocessing of $V$, e.g., $V\leftarrow\frac{V}{\max_{f,n}V_{fn}}$ so that $V_{fn}\in[0,1]$ for all $f,n$; see, e.g., the discussion in [15, page 66].

  2. For the formulation using second-order cones (9), we can assume compactness w.l.o.g. For simplicity, let us consider the formulation (6) in the variables $(W,H)$, since the change of variables is the componentwise square root and keeps the feasible set compact. We can w.l.o.g. add a set of normalization constraints on the rows of $H$, such as $\sum_{n}H_{kn}=\|H_{k:}\|_{1}=1$ for all $k$, which leads to a compact set for $H$, since we can use the degree of freedom in the scaling between the columns of $W$ and the rows of $H$. It remains to show that $W$ can be assumed to be in a compact set. Let us show that the level sets of the objective function are compact, which will give the result. The objective function of (6) is the componentwise $\ell_{1}$ norm, $\|WH\|_{1}$. Let $(W^{\prime},H^{\prime})$ be an arbitrary feasible solution of (6) (such a solution can easily be constructed), and add the constraint $\|WH\|_{1}\leq\|W^{\prime}H^{\prime}\|_{1}=f^{\prime}$ to formulation (6), which does not modify its optimal solution set. We have, for all $k$,
\[ \|W_{:k}\|_{1}=\|W_{:k}\|_{1}\|H_{k:}\|_{1}=\|W_{:k}H_{k:}\|_{1}\leq\|WH\|_{1}\leq f^{\prime}, \]
    which shows that $W$ in this modified formulation also belongs to a compact set. The second equality and the first inequality follow from the nonnegativity of $W$ and $H$.

Under these modifications, which make the feasible set $Q$ compact, we have the following corollary.

Corollary 4.2.

Both variants (5) and (9) of Algorithm 1 generate a sequence of iterates $Z^{(i)}$ whose minimal FW gap converges according to $\tilde{\mu}^{(i)}\leq\frac{\Phi(Z^{(0)})-\Phi^{*}}{i+1}$.

Remark 2 (Difference-of-convex-functions optimization).

Algorithm 1 can also be interpreted as a special case of a difference-of-convex algorithm, which minimizes the difference of two convex functions, namely $f_{1}(Z)-f_{2}(Z)$ [23]. In our case, we would have $f_{1}(Z)=0$. For such problems, a convergence rate to stationary points of order $O(1/i)$ has also been derived when optimizing over a convex compact feasible set, but using a different measure of stationarity [1].

Remark 3.

Using a proper normalization (see Section 4.1), the original NMF problem can be tackled directly by the FW algorithm (Algorithm 2), as the linear minimization oracle can be computed in closed form. Hence one could use existing results on FW to derive a convergence rate. However, the FW algorithm applied to smooth nonconvex problems leads to the worse rate of $O(1/\sqrt{i})$ [22].

5 Numerical experiments

In this section, Algorithm 1, using both formulations (5) and (9), is tested for the computation of exact NMFs of particular classes of matrices usually considered in the literature: (1) 10-by-10 matrices randomly generated with nonnegative rank 5 (each matrix is generated by multiplying two random rank-5 nonnegative matrices), (2) four 5-by-5 matrices with infinitesimally rigid factorizations and nonnegative rank 4 [20], denoted $V_{\text{inf}i}$ for $i=1,2,3,4$, and (3) four 6-by-6 slack matrices corresponding to nested hexagons, denoted $V_{a=x}$, with nonnegative ranks 3, 4, 5, 5 for the parameter values $x=2,3,4,+\infty$, respectively. These matrices are described in more detail in Appendix A.

In order to make Algorithm 1 practically more effective, we incorporate two improvements in the context of exact NMF:

  1. Sparsity pattern integration: in NMF, entries of $W$ and $H$ are expected to be equal to zero at optimality. Hence, when some entries of $W$ or $H$ are sufficiently close to zero, fixing them to zero for all remaining iterations reduces the number of variables and hence accelerates the subsequent iterations of Algorithm 1; see Appendix B for the details.

  2. Final refinement: once a solution is generated by Algorithm 1 for the formulation based on exponential cones, it can typically be slightly improved by applying a standard NMF algorithm (we use a few iterations of A-HALS [14]). This is due to the bound constraints on $W$ and $H$; see Section 4.1.

For each of the matrices, we run Algorithm 1 for 750 iterations for the nested hexagons and random matrices, and for 3000 iterations for the infinitesimally rigid matrices, and compare with the state-of-the-art algorithms from [29] with the multi-start heuristic "ms1" and the rank-by-rank heuristic "rbr". For each method, we run 100 initializations with SPARSE10, as recommended in [29], with target precision $10^{-6}$. Note that a different random matrix is generated each time for the experiments on random matrices, following the procedure described in Appendix A. All tests are performed using Matlab R2021b on a laptop with an Intel Core i7-11800H CPU @ 2.3GHz and 16GB RAM. The code is available from https://bit.ly/3FqMqhD.

Table 1 reports the number of successes over 100 attempts to compute an exact NMF of the input matrices, where a success is defined as obtaining a solution for which $\frac{\|V-WH\|_{F}}{\|V\|_{F}}$ is below the target precision, namely $10^{-6}$.

Table 1: Comparison of the two variants of Algorithm 1, for (5) and (9), with the algorithm from [29] using the "ms1" and "rbr" heuristics. Each entry reports the number of exact NMFs found over 100 initializations for the matrices described in Appendix A; an asterisk marks the algorithm(s) that found the most exact NMFs.

Matrices                              Alg. 1 for (5)   Alg. 1 for (9)   [29] "ms1"   [29] "rbr"
Random matrices                            100*             100*           100*         100*
Inf. rig. fac.   $V_{\text{inf}1}$           5                7*             6            0
                 $V_{\text{inf}2}$          16               38             34           97*
                 $V_{\text{inf}3}$          14               33             14           90*
                 $V_{\text{inf}4}$          16               20*            15            0
Nested hexagons  $V_{a=2}$                 100*             100*           100*         100*
                 $V_{a=3}$                 100*             100*           100*         100*
                 $V_{a=4}$                  35               69             36          100*
                 $V_{a\rightarrow+\infty}$  17               42             20          100*

We observe the following:

  • All algorithms find exact NMFs in all runs for the random matrices. It is well known that factorizing randomly generated matrices is typically easier [29]. This shows that Algorithm 1, with both formulations (5) and (9), is also able to compute exact NMFs in this scenario, which is reassuring.

  • Looking at the nonrandom matrices, Algorithm 1 with both formulations (5) and (9), and "ms1" from [29], are the only algorithms able to compute an exact NMF, for at least some of the 100 initializations, on every matrix. Moreover, among these three algorithms, Algorithm 1 with (9) found an exact NMF most frequently.

  • For some matrices (namely, $V_{a=4}$ and $V_{a\rightarrow+\infty}$), "rbr" from [29] is able to compute an exact NMF for all initializations, which is not the case for the other algorithms. However, "rbr" is not able to compute exact NMFs for $V_{\text{inf}1}$ and $V_{\text{inf}4}$.

In summary, Algorithm 1 competes favorably with the algorithms proposed in [29], and appears to be more robust in the sense that it computes exact NMFs in all the tested cases. In addition, the second-order cone formulation, Algorithm 1 with (9), performs slightly better than the exponential cone formulation, Algorithm 1 with (5).

Computational time

In terms of computational time, Algorithm 1 performs similarly to the algorithm from [29] for the considered matrices, but it does not scale as well. The main reason is that it relies on interior-point methods, while [29] relies on first-order methods (more precisely, exact BCD). For example, for the infinitesimally rigid matrices, Algorithm 1 and [29] both take between 2 and 16 seconds, but Algorithm 1 would be slower than [29] on larger matrices. Hence a possible direction of research is to use faster methods to tackle the conic optimization problems.

Numerical validation of the rate of convergence

As a simple empirical validation of the rate of convergence established in Section 4, we report in Figure 1 the evolution of the minimal FW gap computed along the iterations of Algorithm 1 (for (5) and (9)) for one tested matrix, namely $V_{\text{inf}2}$. Similar behaviors were observed for all the tested input matrices; additional figures are given in Appendix C. Note that we also include a variant of Algorithm 1 with $\tau^{(i)}:=\frac{2}{i+1}$ (a standard choice in the literature for FW algorithms). In Figure 1, we observe that the behavior of the minimal FW gap is in line with the theoretical results from Section 4. Furthermore, the choice $\tilde{\tau}=\tau^{(i)}:=1$ leads to a faster decrease of the minimal FW gap encountered along the iterations, as expected.

Figure 1: Evolution of the minimal FW gap computed along the iterations of Algorithm 1 for (5) (left panel) and (9) (right panel) for the matrix $V_{\text{inf}2}$.
Impact of the initialization

We also investigated the impact of using an improved initialization for Algorithm 1 with (9), based on a rank-one NMO slightly perturbed with nonnegative random uniform values. More precisely, we set $Z^{(0)}=Z_{K=1}+dR\frac{\|Z_{K=1}\|_{F}}{\|R\|_{F}}$, where $Z_{K=1}$ is the rank-one NMO computed with the methodology detailed in Section 2.2.1, $R$ is a matrix of appropriate size with entries uniformly distributed in the interval $(0,1)$, and $d$ is a parameter defined by the user; in our numerical experiments, we chose values of $d$ within the interval $[0.01,0.05]$. Algorithm 1 with (9) and this dedicated initialization found 74, 65 and 52 successes for the matrices $V_{a\rightarrow+\infty}$, $V_{\text{inf}2}$ and $V_{\text{inf}4}$, respectively, a marked improvement over the default random initialization (which yielded 42, 38 and 20 successes, respectively). For the other tested matrices, it did not change the results significantly (only a few additional successes). Therefore, a possible direction of research is the design of more advanced strategies for the initialization of $(U,T)$.
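For reference, this perturbation is straightforward to implement (a sketch, applied here to a generic factor $Z$):

```python
import numpy as np

def perturb(Z, d=0.03, rng=np.random.default_rng()):
    """Z0 = Z + d * R * ||Z||_F / ||R||_F, with R uniform in (0, 1)
    and d in [0.01, 0.05] as in the experiments."""
    R = rng.uniform(size=Z.shape)
    return Z + d * R * (np.linalg.norm(Z) / np.linalg.norm(R))
```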

6 Conclusion

In this paper, we introduced two formulations for computing exact NMFs, namely (1) and (6), which are under- and over-approximation formulations for NMF, respectively. For each formulation, we used a particular change of variables that enabled the use of two convex cones, namely the exponential and second-order cones, to reformulate these problems as the minimization of a concave function over a convex feasible set. To solve the two optimization problems, we proposed Algorithm 1, which relies on the resolution of successive linear approximations of the objective functions using interior-point methods. We also showed that our optimization scheme relying on successive linearizations is a special case of the Frank-Wolfe (FW) algorithm. Using an appropriate measure of stationarity, namely the FW gap, we showed in Theorem 4.1 that the minimal FW gap generated by our algorithm converges as $\mathcal{O}(\frac{1}{i})$, where $i$ is the iteration index. We believe this type of global convergence rate to a stationary point is new for NMF. We showed that Algorithm 1 is able to compute exact NMFs for several classes of nonnegative matrices (namely, randomly generated matrices, infinitesimally rigid matrices, and slack matrices of nested hexagons), and as such demonstrated its competitiveness compared to recent methods from the literature. However, we have only tested Algorithm 1 on a limited number of nonnegative matrices for exact NMF. In the future, we plan to test it on a larger library of nonnegative matrices, and also to compute approximate NMFs in data analysis applications, in order to better understand the behavior of Algorithm 1 with the two formulations (5) and (9).

Further work also includes:

  • The design of more advanced strategies for the initialization of (U,T)(U,T).

  • The elaboration of alternative formulations for (5) and (9) to deal with the non-uniqueness of NMF models. In particular, we plan to develop new formulations that discard solutions of the form $V=\tilde{W}\tilde{H}=\left(WE\right)\left(E^{-1}H\right)$ for a given solution $(W,H)$ and invertible matrices $E$ such that $WE\geq 0$ and $E^{-1}H\geq 0$. For example, one could remove the permutation and scaling ambiguity on the columns of $W$ and the rows of $H$, to remove some degrees of freedom in the formulations. We refer the interested reader to [11] and [13, Chapter 4], and the references therein, for more information on the non-uniqueness of NMF.

  • The use of our framework for other closely related problems, in particular the computation of symmetric NMFs, which require $H=W^{\top}$; this problem is closely related to completely positive matrices [4]. Symmetric NMF can be used for data analysis, and in particular for various clustering tasks [21].

  • The use of upper approximations that are more accurate than linearizations for concave functions, such as second-order models, and the study of convergence for such models.

Acknowledgements

We thank the two reviewers for their careful reading and insightful feedback that helped us improve the paper significantly.

References

  • [1] H. Abbaszadehpeivasti, E. de Klerk, and M. Zamani, On the rate of convergence of the difference-of-convex algorithm (DCA), arXiv preprint arXiv:2109.13566 (2021).
  • [2] S. Arora, R. Ge, R. Kannan, and A. Moitra, Computing a nonnegative matrix factorization—provably, SIAM Journal on Computing 45 (2016), pp. 1582–1611.
  • [3] S. Basu, R. Pollack, and M.F. Roy, On the combinatorial and algebraic complexity of quantifier elimination, Journal of the ACM (JACM) 43 (1996), pp. 1002–1045.
  • [4] A. Berman and N. Shaked-Monderer, Completely positive matrices, World Scientific, 2003.
  • [5] A. Cichocki, R. Zdunek, A.H. Phan, and S.-i. Amari, Nonnegative matrix and tensor factorizations: applications to exploratory multi-way data analysis and blind source separation, John Wiley & Sons, 2009.
  • [6] K.L. Clarkson, Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm, ACM Trans. Algorithms 6 (2010).
  • [7] J.E. Cohen and U.G. Rothblum, Nonnegative ranks, decompositions, and factorizations of nonnegative matrices, Linear Algebra and its Applications 190 (1993), pp. 149–168.
  • [8] J. Dewez, Computational approaches for lower bounds on the nonnegative rank, Ph.D. diss., UCLouvain, 2022.
  • [9] J. Dewez, N. Gillis, and F. Glineur, A geometric lower bound on the extension complexity of polytopes based on the $f$-vector, Discrete Applied Mathematics 303 (2021), pp. 22–38.
  • [10] M. Frank and P. Wolfe, An algorithm for quadratic programming, Naval Research Logistics Quarterly 3 (1956), pp. 95–110.
  • [11] X. Fu, K. Huang, N.D. Sidiropoulos, and W.K. Ma, Nonnegative matrix factorization for signal and data analytics: Identifiability, algorithms, and applications, IEEE Signal Process. Mag. 36 (2019), pp. 59–80.
  • [12] N. Gillis, Approximation et sous-approximation de matrices par factorisation positive : algorithmes, complexité et applications, Master's thesis, Université catholique de Louvain, 2007.
  • [13] N. Gillis, Nonnegative Matrix Factorization, SIAM, Philadelphia, 2020.
  • [14] N. Gillis and F. Glineur, Accelerated multiplicative updates and hierarchical ALS algorithms for nonnegative matrix factorization, Neural Computation 24 (2012), pp. 1085–1105.
  • [15] N. Gillis, Nonnegative matrix factorization: Complexity, algorithms and applications, Ph.D. diss., Université catholique de Louvain, CORE, Louvain-la-Neuve, 2011.
  • [16] N. Gillis and F. Glineur, Using underapproximations for sparse nonnegative matrix factorization, Pattern recognition 43 (2010), pp. 1676–1687.
  • [17] M. Jaggi, Sparse convex optimization methods for machine learning, Ph.D. diss., ETH Zurich, 2011.
  • [18] M. Jaggi, Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization, in Proceedings of the 30th International Conference on Machine Learning, Proceedings of Machine Learning Research Vol. 28, 17–19 Jun, Atlanta, Georgia, USA. PMLR, 2013, pp. 427–435.
  • [19] J. Kim and H. Park, Fast nonnegative matrix factorization: An active-set-like method and comparisons, SIAM Journal on Scientific Computing 33 (2011), pp. 3261–3281.
  • [20] R. Krone and K. Kubjas, Uniqueness of nonnegative matrix factorizations by rigidity theory, SIAM Journal on Matrix Analysis and Applications 42 (2021), pp. 134–164.
  • [21] D. Kuang, C. Ding, and H. Park, Symmetric Nonnegative Matrix Factorization for Graph Clustering, in Proceedings of the 2012 SIAM International Conference on Data Mining. pp. 106–117.
  • [22] S. Lacoste-Julien, Convergence rate of Frank-Wolfe for non-convex objectives, arXiv preprint arXiv:1607.00345 (2016).
  • [23] H.A. Le Thi and T. Pham Dinh, DC programming and DCA: thirty years of developments, Mathematical Programming 169 (2018), pp. 5–68.
  • [24] A. Moitra, An Almost Optimal Algorithm for Computing Nonnegative Rank, in Proc. of the 24th Annual ACM-SIAM Symp. on Discrete Algorithms (SODA ’13). 2013, pp. 1454–1464.
  • [25] Mosek APS, Optimization software. Available at https://www.mosek.com/.
  • [26] R. Sharma and K. Sharma, A new dual based procedure for the transportation problem, European Journal of Operational Research 122 (2000), pp. 611–624.
  • [27] Y. Shitov, The nonnegative rank of a matrix: Hard problems, easy solutions, SIAM Review 59 (2017), pp. 794–800.
  • [28] M. Tepper and G. Sapiro, Nonnegative matrix underapproximation for robust multiple model fitting, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017, pp. 2059–2067.
  • [29] A. Vandaele, N. Gillis, F. Glineur, and D. Tuyttens, Heuristics for exact nonnegative matrix factorization, Journal of Global Optimization 65 (2016), pp. 369–400.
  • [30] S.A. Vavasis, On the complexity of nonnegative matrix factorization, SIAM Journal on Optimization 20 (2010), pp. 1364–1377.

Appendix A Factorized matrices

In this appendix, we describe the matrices considered for the numerical experiments in Section 5:

  • Randomly generated matrices: it is standard in the NMF literature to use randomly generated matrices to compare algorithms (see, e.g., [19]). In this paper, we used the standard approach: $V=WH\in\mathbb{R}_{+}^{F\times N}$, where each entry of $W\in\mathbb{R}_{+}^{F\times K}$ and $H\in\mathbb{R}_{+}^{K\times N}$ is generated using the uniform distribution on the interval $[0,1]$, and $K\leq\min(F,N)$. With this approach, $\operatorname{rank}(V)=\operatorname{rank}_{+}(V)=K$ with probability one. In the numerical experiments reported in Section 5, we used $F=N=10$ and $K=5$.

  • Infinitesimally rigid factorizations: in this paper, we consider four infinitesimally rigid factorizations of $5\times 5$ matrices with positive entries and with nonnegative rank equal to four, from [20]:

\[ V_{\text{inf}1}=\begin{pmatrix}573705&806520&167622&246500&531659\\ 397096&39600&299176&63720&274120\\ 131646&403260&30269&226915&264510\\ 9114&85160&311182&827468&851798\\ 147857&3200&351037&599025&697755\end{pmatrix}, \]
\[ V_{\text{inf}2}=\begin{pmatrix}30893&319912&149770&873&111428\\ 383490&87990&5580&628440&587250\\ 560076&1030324&331070&288045&350647\\ 203830&305184&277512&264376&205933\\ 90911&142936&500784&618842&609633\end{pmatrix}, \]
\[ V_{\text{inf}3}=\begin{pmatrix}948201&723609&958755&591858&397953\\ 222448&218040&30429&348793&15825\\ 329588&7189&623001&12012&469185\\ 467424&160704&115092&835504&343912\\ 1114797&932972&975775&997164&636096\end{pmatrix}, \]
\[ V_{\text{inf}4}=\begin{pmatrix}88076&294646&658787&902872&244559\\ 2216&4216&596705&652698&250465\\ 279360&180864&769506&1051380&391634\\ 553284&826606&765406&293965&883775\\ 696039&897917&148301&832169&169525\end{pmatrix}. \]

    These matrices have been shown to be challenging to factorize. We refer the reader to [20] for more details.
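    As a quick reproducibility check, one may transcribe these matrices and inspect them numerically; a sketch for $V_{\text{inf}1}$ (our own illustration):

    ```python
    import numpy as np

    # V_inf1, transcribed from the display above (see [20] for its origin).
    V_inf1 = np.array([
        [573705, 806520, 167622, 246500, 531659],
        [397096,  39600, 299176,  63720, 274120],
        [131646, 403260,  30269, 226915, 264510],
        [  9114,  85160, 311182, 827468, 851798],
        [147857,   3200, 351037, 599025, 697755],
    ], dtype=float)

    assert (V_inf1 > 0).all()             # strictly positive entries
    print(np.linalg.matrix_rank(V_inf1))  # ordinary rank; rank_+(V_inf1) = 4 per [20]
    ```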

  • Nested hexagons problem: Computing an exact NMF is equivalent to tackling a well-known problem in computational geometry referred to as the nested polytope problem. Here we consider a family of input matrices whose exact NMF corresponds to finding a polytope nested between two hexagons; see [13, Chapter 2] and the references therein. Given $x>1$, $V_{a=x}$ is defined as

    $$\frac{1}{x}\left(\begin{array}{cccccc}1&x&2x-1&2x-1&x&1\\1&1&x&2x-1&2x-1&x\\x&1&1&x&2x-1&2x-1\\2x-1&x&1&1&x&2x-1\\2x-1&2x-1&x&1&1&x\\x&2x-1&2x-1&x&1&1\end{array}\right)$$

    which satisfies $\operatorname{rank}(V_{a=x})=3$ for any $x>1$. The inner hexagon is smaller than the outer one by a ratio of $\frac{a-1}{a}$. We consider three values of $a$, as well as the limit case $a\rightarrow+\infty$ (a construction sketch is given after the list):

    • $a=2$: the inner hexagon is half the size of the outer one and a triangle can be fitted between the two, thus $\operatorname{rank}_{+}(V_a)=3$.

    • $a=3$: the inner hexagon is two thirds the size of the outer one and a rectangle can be fitted between the two, thus $\operatorname{rank}_{+}(V_a)=4$.

    • $a=4$: $\operatorname{rank}_{+}(V_a)=5$.

    • $a\rightarrow+\infty$, which gives:

      $$V=\left(\begin{array}{cccccc}0&1&2&2&1&0\\0&0&1&2&2&1\\1&0&0&1&2&2\\2&1&0&0&1&2\\2&2&1&0&0&1\\1&2&2&1&0&0\end{array}\right)$$

      with $\operatorname{rank}_{+}(V)=5$.
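    As announced above, here is a short construction sketch for $V_{a=x}$, exploiting the fact that each row is the previous one cyclically shifted to the right (our own illustration; the helper name nested_hexagons_matrix is ours):

    ```python
    import numpy as np

    def nested_hexagons_matrix(x):
        """Build the 6x6 matrix V_{a=x}: row k is the first row shifted right k times."""
        first_row = np.array([1.0, x, 2 * x - 1, 2 * x - 1, x, 1.0])
        return np.vstack([np.roll(first_row, k) for k in range(6)]) / x

    V2 = nested_hexagons_matrix(2.0)
    print(np.linalg.matrix_rank(V2))  # 3, for any x > 1 (cf. the text above)
    # As x -> +infinity, nested_hexagons_matrix(x) tends entrywise to the
    # 0/1/2 matrix displayed above.
    ```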

Appendix B Sparsity Pattern Integration

This appendix details the SPI procedure for quadratic cones; a similar rationale is followed for exponential cones. Due to the nonnegativity constraints on the entries of $W$ and $H$, one can expect sparsity patterns in the solutions $(W,H)$, and hence in the solution $(U,T)$ of (9), since $W_{fk}=G(U_{fk})=\sqrt{U_{fk}}$ and $H_{kn}=G(T_{kn})=\sqrt{T_{kn}}$. Naturally, the sparsity of the solutions is reinforced by the sparsity of the input matrix $V$. One can observe that the objective function $\Phi$ from (9) is not $L$-smooth on the interior of its domain, the nonnegative orthant. When the $(f,k)$-entry of the current iterate $U^{i-1}$ tends to zero, the corresponding entry of the gradient of $\Phi$ w.r.t. $U$ tends to $\infty$, which halts the optimization process. To tackle this issue and enable the iterates to reach the desired tolerance of $10^{-6}$, we integrate an additional stage, referred to as "Sparsity Pattern Integration", within the second building block of Algorithm 1. Let us illustrate its principle on the following case: the entry $U_{\bar{f},\bar{k}}^{i-1}$ tends to zero. We then fix $U_{\bar{f},\bar{k}}$ to zero, drop this variable from the optimization process, and examine the impact on the constraints of (7) in which the variable $U_{\bar{f},\bar{k}}$ is involved; the inequality constraints identified by index $f=\bar{f}$ are:

$$\sqrt{U_{\bar{f},1}}\sqrt{T_{1,n}}+\dots+\sqrt{U_{\bar{f},\bar{k}}}\sqrt{T_{\bar{k},n}}+\dots+\sqrt{U_{\bar{f},K}}\sqrt{T_{K,n}}\geq V_{\bar{f},n}\quad\text{for }n\in\mathcal{N}.$$

First, since $\sqrt{U_{\bar{f},\bar{k}}}=0$, the variables $\sqrt{T_{\bar{k},n}}$ are no longer constrained by the $N$ inequalities identified by index $f=\bar{f}$. Second, for problem (9) and its successive linearizations, it is then clear that the $N$ conic variables $t_{\bar{f},\bar{k},n}$ (and hence the $N$ associated conic constraints) can be dropped from the optimization process. Finally, the linear term $\left[\nabla_{U}\Phi\left(U^{i-1},T^{i-1}\right)\right]_{\bar{f},\bar{k}}U_{\bar{f},\bar{k}}$ is removed from the current linearization of $\Phi$. The same rationale is followed when entries of the current iterate for $T$ tend to zero.

From a practical point of view, at each activation of SPI, Algorithm 1 checks whether entries of the current iterates $(U^{i-1},T^{i-1})$ lie below a threshold $th$ defined by the user. The corresponding entries of $U$ and $T$ are then set to zero, and the indices of these entries define the sparsity pattern. The successive linearizations of (9) are automatically updated based on the current sparsity pattern, following the approach explained above.
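A minimal sketch of this thresholding step, assuming the iterates are stored as NumPy arrays (the actual removal of the conic variables and constraints depends on the modeling layer, e.g., Mosek [25], and is not shown):

```python
import numpy as np

def sparsity_pattern_integration(U, T, th=1e-3):
    """Fix near-zero entries of the iterates to exactly zero.

    Returns the truncated iterates and boolean masks; for every True entry
    (f, k) of mask_U, the N conic variables t_{f,k,n}, their constraints, and
    the corresponding linear term of the linearized objective are dropped
    downstream, as explained above.
    """
    mask_U = U < th
    mask_T = T < th
    return np.where(mask_U, 0.0, U), np.where(mask_T, 0.0, T), mask_U, mask_T
```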

Let us illustrate the impact of triggering the SPI procedure on the solutions obtained for the factorization of the following matrix $V$:

$$V=\left(\begin{array}{cccccc}0&1&2&2&1&0\\0&0&1&2&2&1\\1&0&0&1&2&2\\2&1&0&0&1&2\\2&2&1&0&0&1\\1&2&2&1&0&0\end{array}\right). \qquad (26)$$

The nonnegative rank of the matrix in (26) is known and equal to $5$. Algorithm 1 is used to compute an exact NMF of $V$ with the following input parameters:

  • $K=5=\operatorname{rank}_{+}(V)$,

  • $th=10^{-3}$,

  • the maximum number of iterations, defined by the parameter maxiter, is set to 500, and the SPI procedure is triggered at iteration 400.

Figure 2 displays the evolution of $\frac{\left\|V-WH\right\|_{F}}{\left\|V\right\|_{F}}$ along iterations for $V$ from (26) with a factorization rank $K=5$. One can observe that, once SPI is activated, the relative Frobenius error drops from $5\cdot 10^{-4}$ to $8\cdot 10^{-9}$, hence below the tolerance of $10^{-6}$, so that we consider that an exact NMF $(W,H)$ has been found. For this experiment, we obtain:

$$W=\left(\begin{array}{ccccc}0&1.4748&0.9259&0&0\\0.7824&0&1.8517&0&0\\0&0&0.9259&0&1.4716\\0&0&0&0.6024&1.4716\\0.7824&0&0&1.2049&0\\0&1.4748&0&0.6024&0\end{array}\right),$$
$$H=\left(\begin{array}{cccccc}0&0&1.2781&0&0&1.2781\\0&0.6780&1.3561&0.6780&0&0\\0&0&0&1.0801&1.0801&0\\1.6599&1.6599&0&0&0&0\\0.6796&0&0&0&0.6796&1.3591\end{array}\right).$$
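These factors can be checked numerically with a small helper (ours); note that the entries above are rounded to four decimals, so the recomputed error is limited by the printing precision rather than matching the $8\cdot 10^{-9}$ reached by Algorithm 1:

```python
import numpy as np

def relative_frobenius_error(V, W, H):
    """Relative error ||V - W H||_F / ||V||_F used as the stopping criterion."""
    return np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```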
Figure 2: Evolution of $\frac{\left\|V-WH\right\|_{F}}{\left\|V\right\|_{F}}$ along iterations; SPI is triggered at iteration 400.

Appendix C Additional empirical validations of the convergence rates

In this appendix, we report in Figures 3 and 4 additional empirical validations of the convergence rate established in Section 4, for the two other classes of tested matrices, namely the randomly generated matrices and the matrices related to the nested hexagons problem. For the latter, we consider the particular case $a=4$.
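For reference, validations of this kind can be produced with a simple helper once the FW gaps have been recorded at each iteration (our own plotting sketch, not the paper's code):

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_min_fw_gap(gaps):
    """Plot the running minimum of the FW gaps against a C/i reference line."""
    gaps = np.asarray(gaps, dtype=float)
    iters = np.arange(1, gaps.size + 1)
    running_min = np.minimum.accumulate(gaps)  # min FW gap up to iteration i
    plt.loglog(iters, running_min, label="min FW gap")
    plt.loglog(iters, running_min[0] / iters, "--", label="O(1/i) reference")
    plt.xlabel("iteration i")
    plt.ylabel("gap")
    plt.legend()
    plt.show()
```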

Figure 3: Evolution of the minimum FW gap computed along iterations by Algorithm 1 for (5) (left, panel (a)) and for (9) (right, panel (b)), for a randomly generated matrix.
Figure 4: Evolution of the minimum FW gap computed along iterations by Algorithm 1 for (5) (left, panel (a)) and for (9) (right, panel (b)), for the matrix $V_{a=4}$.