
Circuit imbalance measures and linear programming

This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement ScaleOpt–757481).

Farbod Ekbatani, Bento Natura, László A. Végh
London School of Economics and Political Science
{F.Ekbatani, B.Natura, L.Vegh}@lse.ac.uk
Abstract

We study properties and applications of various circuit imbalance measures associated with linear spaces. These measures describe possible ratios between nonzero entries of support-minimal nonzero vectors of the space. The fractional circuit imbalance measure turns out to be a crucial parameter in the context of linear programming, and two integer variants can be used to describe integrality properties of associated polyhedra.

We give an overview of the properties of these measures, and survey classical and recent applications, in particular, for linear programming algorithms with running time dependence on the constraint matrix only, and for circuit augmentation algorithms. We also present new bounds on the diameter and circuit diameter of polyhedra in terms of the fractional circuit imbalance measure.

1 Introduction

For a linear space $W\subset\mathbb{R}^n$, $g\in W$ is an elementary vector if $g$ is a support-minimal nonzero vector in $W$, that is, no $h\in W\setminus\{0\}$ exists such that $\mathrm{supp}(h)\subsetneq\mathrm{supp}(g)$, where $\mathrm{supp}$ denotes the support of a vector. A circuit in $W$ is the support of some elementary vector; these are precisely the circuits in the associated linear matroid $\mathcal{M}(W)$. We let $\mathcal{F}(W)\subseteq W$ and $\mathcal{C}_W\subseteq 2^{[n]}$ denote the set of elementary vectors and circuits in the space $W$, respectively.

Elementary vectors were first studied in the 1960s by Camion [Cam64], Tutte [Tut65], Fulkerson [Ful68], and Rockafellar [Roc69]. Circuits play a crucial role in matroid theory and have been extremely well studied. For regular subspaces (i.e., kernels of totally unimodular matrices), elementary vectors have $\pm 1$ entries; this fact has been at the heart of several arguments in network optimization since the 1950s.

The focus of this paper is on various circuit imbalance measures. We give an overview of classical and recent applications, and their relationship with other condition measures. We will mainly focus on applications in linear programming, mentioning in passing also their relevance to integer programming.

Three circuit imbalance measures

There are multiple ways to quantify how ‘imbalanced’ elementary vectors of a subspace can be. We define three different measures that capture various fractionality and integrality properties.

We will need some simple definitions. The linear spaces $\{0\}$ and $\mathbb{R}^n$ will be called trivial subspaces; all other subspaces are nontrivial. A linear subspace of $\mathbb{R}^n$ is a rational linear space if it admits a basis of rational vectors. Equivalently, a rational linear space can be represented as the image of a rational matrix. For an integer vector $v\in\mathbb{Z}^n$, let $\mathrm{lcm}(v)$ denote the least common multiple of the entries $|v_i|$, $i\in[n]$.

For every $C\in\mathcal{C}_W$, the elementary vectors with support $C$ form a one-dimensional subspace of $W$. We pick a representative $g^{C,W}\in\mathcal{F}(W)$ from this subspace. If $W$ is not a rational subspace, we select $g^{C,W}$ arbitrarily. For rational subspaces, we select $g^{C,W}$ as an integer vector with the largest common divisor of the coordinates being $1$; this choice is unique up to multiplication by $-1$. When clear from the context, we omit the index $W$ and simply write $g^C$. We now define the fractional circuit imbalance measure and two variants of the integer circuit imbalance measure.

Definition 1.1 (Circuit imbalances).

For a non-trivial linear subspace $W\subseteq\mathbb{R}^n$, let us define the following notions:

  • The fractional circuit imbalance measure of $W$ is

    \kappa_W:=\max\left\{\left|\frac{g^C_j}{g^C_i}\right|:\,C\in\mathcal{C}_W,\ i,j\in C\right\}\,.

  • If $W$ is a rational linear space, the lcm-circuit imbalance measure is

    \dot{\kappa}_W:=\mathrm{lcm}\left\{\mathrm{lcm}(g^C):\,C\in\mathcal{C}_W\right\}\,.

  • If $W$ is a rational linear space, the max-circuit imbalance measure is

    \bar{\kappa}_W:=\max\left\{\|g^C\|_\infty:\,C\in\mathcal{C}_W\right\}\,.

For trivial subspaces $W$, we define $\kappa_W=\dot{\kappa}_W=\bar{\kappa}_W=1$. Further, we say that the rational subspace $W$ is anchored if every vector $g^C$, $C\in\mathcal{C}_W$, has a $\pm 1$ entry.

Equivalently, in an anchored subspace every elementary vector $g\in\mathcal{F}(W)$ has a nonzero entry such that all other entries are integer multiples of this entry.

The term ‘circuit imbalance measure’ will refer to the fractional measure $\kappa_W$. Note that $1\leq\kappa_W\leq\bar{\kappa}_W\leq\dot{\kappa}_W$, and $\kappa_W=1$ implies $\bar{\kappa}_W=\dot{\kappa}_W=1$. This case plays a distinguished role and turns out to be equivalent to $W$ being a regular linear space (see Theorem 3.4).

Another important case is when $\dot{\kappa}_W=p^\alpha$ is a prime power. In this case, $W$ is anchored, and $\kappa_W=\bar{\kappa}_W=\dot{\kappa}_W$. The linear space will often be represented as $W=\ker(A)$ for a matrix $A\in\mathbb{R}^{m\times n}$. We will use $\mathcal{F}(A)$, $\mathcal{C}_A$, $\kappa_A$, $\dot{\kappa}_A$, $\bar{\kappa}_A$ to refer to the corresponding quantities in $\ker(A)$.
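To make the three measures concrete, the following small brute-force computation (a sketch, not part of the paper; it enumerates all circuits and is therefore exponential in $n$, so it is only meant for tiny instances) evaluates $\kappa_A$, $\bar{\kappa}_A$ and $\dot{\kappa}_A$ directly from Definition 1.1 for $W=\ker(A)$.

```python
# Sketch: brute-force the circuit imbalance measures of W = ker(A) for a
# small rational matrix A, by enumerating minimal dependent column sets.
from itertools import combinations
from fractions import Fraction
from math import gcd, lcm
import sympy as sp

def circuits(A):
    """Yield (support, normalized integer elementary vector) for W = ker(A)."""
    A = sp.Matrix(A)
    m, n = A.shape
    for k in range(1, A.rank() + 2):
        for C in combinations(range(n), k):
            ker = A[:, list(C)].nullspace()
            # C is a circuit iff ker(A_C) is one-dimensional with full support
            if len(ker) == 1 and all(v != 0 for v in ker[0]):
                den = lcm(*[int(sp.fraction(v)[1]) for v in ker[0]])
                g = [int(v * den) for v in ker[0]]
                d = gcd(*[abs(v) for v in g])
                yield C, [v // d for v in g]

def imbalances(A):
    kappa, kappa_bar, kappa_dot = Fraction(1), 1, 1
    for _, g in circuits(A):
        vals = [abs(v) for v in g]
        kappa = max(kappa, Fraction(max(vals), min(vals)))
        kappa_bar = max(kappa_bar, max(vals))
        kappa_dot = lcm(kappa_dot, *vals)
    return kappa, kappa_bar, kappa_dot

print(imbalances([[1, 1, 0, 2], [0, 1, 1, 3]]))   # kappa = 3, kappa_bar = 3, kappa_dot = 6 here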

An earlier systematic study of elementary vectors was done in Lee’s work [Lee89]. He mainly focused on the max-circuit imbalance measure; we give a quick comparison to the results in Section 3. The fractional circuit imbalance measure played a key role in the paper [DHNV20] on layered-least-squares interior point methods; it turns out to be a close proxy to the well-studied condition number $\bar{\chi}_W$. As far as the authors are aware, the lcm-circuit imbalance measure has not been explicitly studied previously.

Overview and contributions

Section 2 introduces some background and notation. Section 3 gives an overview of fundamental properties of $\kappa_W$ and $\dot{\kappa}_W$. In particular, Section 3.1 relates circuit imbalances to subdeterminant bounds. We note that many extensions of totally unimodular matrices focus on matrices with bounded subdeterminants. Working with circuit imbalances directly can often lead to stronger and conceptually cleaner results. Section 3.2 presents an extension of the Hoffman-Kruskal characterization of TU matrices. Section 3.3 shows an important self-duality property of $\kappa_W$ and $\dot{\kappa}_W$. Section 3.4 studies ‘nice’ matrix representations of subspaces with given lcm-circuit imbalances. Section 3.5 proves a multiplicative triangle inequality for $\kappa_W$. Many of these results were previously shown by Lee [Lee89], Appa and Kotnyek [AK04], and by Dadush et al. [DHNV20]. We present them in a unified framework, extend some of the results, and provide new proofs.

Section 4 reveals connections between $\kappa_W$ and the well-studied condition numbers $\bar{\chi}$, studied in the context of interior point methods, and $\delta$, studied—among other topics—in the analysis of the shadow simplex method. In particular, we show that previous diameter bounds for polyhedra can be translated to strong diameter bounds in terms of the condition number $\kappa_W$ (Theorem 4.8).

Section 5 studies the best possible values of $\kappa_W$ that can be achieved by rescaling the variables. We present the algorithm and min-max characterization from [DHNV20]. Further, we characterize when a subspace can be rescaled to a regular one; we also give a new proof of a theorem from [Lee89].

Section 6 shows variants of Hoffman-proximity bounds in terms of $\kappa_W$ that will be used in subsequent algorithms. In Section 7, we study algorithms for linear programming whose running time only depends on the constraint matrix $A$, and reveal the key role of $\kappa_A$ in this context. Section 7.1 shows how the Hoffman-proximity bounds can be used to obtain a black-box algorithm with $\kappa_A$-dependence as in [DNV20], and Section 7.2 discusses layered least squares interior point methods [DHNV20, VY96].

Section 8 gives an overview of circuit diameter bounds and circuit augmentation algorithms, a natural class of LP algorithms that work directly with elementary vectors. As a new result, we present an improved iteration bound on the steepest-descent circuit augmentation algorithm, by extending the analysis of the minimum mean-cycle cancelling algorithm of Goldberg and Tarjan (Theorem 8.4).

Section 9 gives an outlook to integer programming, showing the relationship between the max-circuit imbalance and Graver bases. Finally, Section 10 formulates a conjecture on circuit decompositions with bounded fractionality.

2 Preliminaries

We let $[n]:=\{1,\ldots,n\}$. For $k\in\mathbb{N}$, a number $q\in\mathbb{Q}$ is $1/k$-integral if it is an integer multiple of $1/k$. Let $\mathbb{P}\subseteq\mathbb{N}$ denote the set of primes. Let $\mathbb{R}_{++}$ denote the set of positive reals, and $\mathbb{R}_+$ the set of nonnegative reals.

For a prime number $p\in\mathbb{P}$, the $p$-adic valuation for $\mathbb{Z}$ is the function $\nu_p\colon\mathbb{Z}\to\mathbb{N}$ defined by

\nu_p(n)=\begin{cases}\max\{v\in\mathbb{N}:p^v\mid n\}&\text{if }n\neq 0\\ \infty&\text{if }n=0.\end{cases} \qquad (1)
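For concreteness, a direct transcription of (1) (a small helper, not from the paper) is:

```python
# nu_p(n): the largest v with p**v dividing n; infinity for n = 0, as in (1).
from math import inf

def nu(p, n):
    if n == 0:
        return inf
    n, v = abs(n), 0
    while n % p == 0:
        n //= p
        v += 1
    return v

assert nu(2, 12) == 2 and nu(3, 12) == 1 and nu(5, 12) == 0
```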

We denote the support of a vector $x\in\mathbb{R}^n$ by $\mathrm{supp}(x)=\{i\in[n]:x_i\neq 0\}$. We let $\mathbbm{1}_n$ denote the $n$-dimensional all-ones vector, or simply $\mathbbm{1}$, whenever the dimension is clear from the context. Let $e_i$ denote the $i$-th unit vector.

For vectors $v,w\in\mathbb{R}^n$ we denote by $\min\{v,w\}$ the vector $z\in\mathbb{R}^n$ with $z_i=\min\{v_i,w_i\}$, $i\in[n]$; analogously for $\max\{v,w\}$. Further, we use the notation $v^+=\max\{v,0_n\}$ and $v^-=\max\{-v,0_n\}$; note that both $v^+$ and $v^-$ are nonnegative vectors. For two vectors $x,y\in\mathbb{R}^n$, we let $\langle x,y\rangle=x^\top y$ denote their scalar product. For sets $S,T\subseteq\mathbb{R}$ we let $S\cdot T=\{st\,|\,s\in S,\,t\in T\}$.

We let $I_n\in\mathbb{R}^{n\times n}$ denote the $n$-dimensional identity matrix. We let $\mathbf{D}_n$ denote the set of all positive definite $n\times n$ diagonal matrices. For a vector $v\in\mathbb{R}^n$, we denote by $\operatorname{diag}(v)$ the diagonal matrix whose $i$-th diagonal entry is $v_i$. For a matrix $A\in\mathbb{R}^{m\times n}$, let $A_1,A_2,\ldots,A_n\in\mathbb{R}^m$ denote the column vectors, and $A^1,A^2,\ldots,A^m\in\mathbb{R}^n$ the row vectors, transposed. For $S\subseteq[n]$, let $A_S$ denote the submatrix formed by the columns of $A$ indexed by $S$, and for $B\subseteq[n]$, $|B|=m$, we say that $A$ is in basis form for $B$ if $A_B=I_m$.

We will use the $\ell_1,\ell_2$ and $\ell_\infty$ vector norms, denoted as $\|.\|_1$, $\|.\|_2$, and $\|.\|_\infty$, respectively. By $\|v\|$, we always mean the 2-norm $\|v\|_2$. Further, for a matrix $A\in\mathbb{R}^{m\times n}$, $\|A\|$ will refer to the $\ell_2\to\ell_2$ operator norm, and $\|A\|_{\max}=\max_{i,j}|A_{ij}|$ to the max-norm.

For an index subset $I\subseteq[n]$, we use $\pi_I:\mathbb{R}^n\rightarrow\mathbb{R}^I$ for the coordinate projection. That is, $\pi_I(x)=x_I$, and for a subset $S\subseteq\mathbb{R}^n$, $\pi_I(S)=\{x_I:\,x\in S\}$. We let $\mathbb{R}^n_I=\{x\in\mathbb{R}^n:x_{[n]\setminus I}=0\}$.

For a subspace $W\subseteq\mathbb{R}^n$, we let $W_I=\pi_I(W\cap\mathbb{R}^n_I)$. It is easy to see that $\pi_I(W)^\perp=(W^\perp)_I$. Assume we are given a matrix $A\in\mathbb{R}^{m\times n}$ such that $W=\ker(A)$. Then, $W_I=\ker(A_I)$, and we can obtain a matrix $A'$ from $A$ such that $\pi_I(W)=\ker(A')$ by performing a Gaussian elimination of the variables in $[n]\setminus I$.

For a subspace $W\subseteq\mathbb{R}^n$, we define by $\Pi_W\colon\mathbb{R}^n\to\mathbb{R}^n$ the orthogonal projection onto $W$.

For a set of vectors $V=\{v_i:\,i\in I\}$ we let $\operatorname{span}(V)$ denote the linear space spanned by the vectors in $V$. For a matrix $A\in\mathbb{R}^{m\times n}$, $\operatorname{span}(A)\subseteq\mathbb{R}^m$ is the subspace spanned by the columns of $A$. A circuit basis of a subspace $W\subseteq\mathbb{R}^n$ is a set $\mathcal{F}\subseteq\mathcal{F}(W)$ of $\operatorname{rk}(W)$ linearly independent elementary vectors, i.e., $\operatorname{span}(\mathcal{F})=W$.

Linear Programming (LP) in matrix formulation

We will use LPs in the following standard primal and dual form, for $A\in\mathbb{R}^{m\times n}$, $b\in\mathbb{R}^m$, $c\in\mathbb{R}^n$.

\begin{aligned} \min\;&\langle c,x\rangle\\ Ax&=b\\ x&\geq 0 \end{aligned}\qquad\qquad\begin{aligned} \max\;&\langle y,b\rangle\\ A^\top y+s&=c\\ s&\geq 0 \end{aligned} \tag{LP$(A,b,c)$}
Linear Programming in subspace formulation

Since our main focus is on properties of subspaces, it will be more natural to think about linear programming in the following subspace formulation. For $A$, $b$ and $c$ as above, let $W=\ker(A)\subseteq\mathbb{R}^n$ and $d\in\mathbb{R}^n$ such that $Ad=b$. We assume the existence of such a vector $d$, as otherwise the primal program is trivially infeasible. We can write LP$(A,b,c)$ in the following equivalent form:

\begin{aligned} \min\;&\langle c,x\rangle\\ x&\in W+d\\ x&\geq 0 \end{aligned}\qquad\qquad\begin{aligned} \max\;&\langle c-s,d\rangle\\ s&\in W^\perp+c\\ s&\geq 0 \end{aligned} \tag{LP$(W,d,c)$}
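As a small illustration of the two formulations (the data below is made up and not from the paper), one can pass the standard equality form LP$(A,b,c)$ directly to an off-the-shelf solver; the subspace form LP$(W,d,c)$ is recovered by taking $W=\ker(A)$ and any $d$ with $Ad=b$.

```python
# Sketch: solving LP(A, b, c) in standard equality form with SciPy.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1., 1., 0., 2.],
              [0., 1., 1., 3.]])
b = np.array([3., 4.])
c = np.array([1., 2., 0., 1.])

# min <c, x>  subject to  A x = b,  x >= 0
res = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * A.shape[1])
print(res.x, res.fun)

# Subspace form: W = ker(A) and any d with A d = b, e.g. a least-squares solution.
d = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.allclose(A @ d, b))
```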
Conformal circuit decompositions

We say that the vector $y\in\mathbb{R}^n$ conforms to $x\in\mathbb{R}^n$ if $x_iy_i>0$ whenever $y_i\neq 0$. Given a subspace $W\subseteq\mathbb{R}^n$, a conformal circuit decomposition of a vector $z\in W$ is a decomposition

z=\sum_{k=1}^{h}g^k,

where $h\leq n$ and $g^1,g^2,\ldots,g^h\in\mathcal{F}(W)$ are elementary vectors that are conformal with $z$. A fundamental result on elementary vectors asserts the existence of a conformal circuit decomposition, see e.g. [Ful68, Roc69].

Lemma 2.1.

For every subspace $W\subseteq\mathbb{R}^n$, every $z\in W$ admits a conformal circuit decomposition.

Proof.

Let $F\subseteq W$ be the set of vectors conformal with $z$. $F$ is a polyhedral cone; its faces correspond to inequalities of the form $y_k\geq 0$, $y_k\leq 0$, or $y_k=0$. The rays (edges) of $F$ are of the form $\{\alpha g:\,\alpha\geq 0\}$ for $g\in\mathcal{F}(W)$. Clearly, $z\in F$, and thus $z$ can be written as a conic combination of at most $n$ rays by the Minkowski–Weyl theorem. Such a decomposition yields a conformal circuit decomposition. ∎
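The existence proof translates into a simple (exponential-time) procedure: repeatedly find an elementary vector that is conformal with the current residual and supported inside its support, and peel off the largest conformal multiple. The sketch below (an illustration under the assumption of a tiny, rational instance; not the paper's algorithm) implements this.

```python
# Sketch: compute a conformal circuit decomposition of z in W = ker(A)
# by brute force, following the existence argument of Lemma 2.1.
from itertools import combinations
import sympy as sp

def conformal_elementary(A, z):
    """An elementary vector of ker(A), conformal with z, supported in supp(z)."""
    A = sp.Matrix(A)
    supp = [i for i, zi in enumerate(z) if zi != 0]
    for k in range(1, len(supp) + 1):
        for C in combinations(supp, k):
            ker = A[:, list(C)].nullspace()
            if len(ker) != 1 or any(v == 0 for v in ker[0]):
                continue                      # not a circuit
            g = [sp.Rational(0)] * A.shape[1]
            for pos, i in enumerate(C):
                g[i] = ker[0][pos]
            for sgn in (1, -1):               # orient the elementary vector
                if all(sp.sign(sgn * g[i]) == sp.sign(z[i]) for i in C):
                    return [sgn * v for v in g]
    return None                               # cannot happen, by Lemma 2.1

def conformal_decomposition(A, z):
    z = [sp.Rational(v) for v in z]
    terms = []
    while any(v != 0 for v in z):
        g = conformal_elementary(A, z)
        t = min(z[i] / g[i] for i in range(len(z)) if g[i] != 0)   # > 0 by conformality
        terms.append([t * v for v in g])      # each term conforms to the input z
        z = [zi - t * gi for zi, gi in zip(z, g)]
    return terms

A = [[1, 1, 0, 2], [0, 1, 1, 3]]
print(conformal_decomposition(A, [2, -4, 1, 1]))    # note: (2,-4,1,1) lies in ker(A)
```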

Linear matroids

For a linear subspace $W\subseteq\mathbb{R}^n$, let $\mathcal{M}(W)=([n],\mathcal{I})$ denote the associated linear matroid, i.e. the matroid defined by the set of circuits $\mathcal{C}_W$. Here, $\mathcal{I}$ denotes the set of independent sets; $S\in\mathcal{I}$ if and only if there exists no $z\in W\setminus\{0\}$ with $\mathrm{supp}(z)\subseteq S$; the maximal independent sets are the bases. We refer the reader to [Sch03, Chapter 39] or [Fra11, Chapter 5] for relevant definitions and background on matroid theory.

Assume $\operatorname{rk}(W)=m$ and $W=\ker(A)$ for $A\in\mathbb{R}^{m\times n}$. Then $B\subseteq[n]$, $|B|=m$ is a basis in $\mathcal{M}(A):=\mathcal{M}(W)$ if and only if $A_B$ is nonsingular; then, $A'=A_B^{-1}A$ is in basis form for $B$ such that $\ker(A')=W$.

The matroid $\mathcal{M}$ is separable if the ground set $[n]$ can be partitioned into two nonempty subsets $[n]=S\cup T$ such that $I\in\mathcal{I}$ if and only if $I\cap S,I\cap T\in\mathcal{I}$. In this case, the matroid is the direct sum of its restrictions to $S$ and $T$. In particular, every circuit is fully contained in $S$ or in $T$. For the linear matroid $\mathcal{M}(A)$, separability means that $\ker(A)=\ker(A_S)\oplus\ker(A_T)$. In this case, we have $\kappa_A=\max\{\kappa_{A_S},\kappa_{A_T}\}$ and $\dot{\kappa}_A=\mathrm{lcm}\{\dot{\kappa}_{A_S},\dot{\kappa}_{A_T}\}$; solving LP$(A,b,c)$ can be decomposed into two subproblems, restricted to the columns in $A_S$ and in $A_T$.

Thus, for most concepts and problems considered in this paper, we can focus on the non-separable components of $\mathcal{M}(W)$. The following characterization will turn out to be very useful, see e.g. [Fra11, Theorem 5.2.5].

Proposition 2.2.

A matroid $\mathcal{M}=([n],\mathcal{I})$ is non-separable if and only if for any $i,j\in[n]$, there exists a circuit containing $i$ and $j$.

3 Properties of the imbalance measures

Comparison to well-scaled frames

Lee’s work [Lee89] on ‘well-scaled frames’ investigated the following closely related concepts. For a set $S\subseteq\mathbb{Q}$, the rational linear space $W$ is $S$-regular if for every elementary vector $g\in\mathcal{F}(W)$, there exists a $\lambda\neq 0$ such that all nonzero entries of $\lambda g$ are in $S$. For $S=\{-k,\ldots,k\}$, the subspace is called $k$-regular. For $k,\Omega\in\mathbb{N}$, a subspace is $k$-adic of order $\Omega$ if it is $S$-regular for $S=\{\pm 1,\pm k,\ldots,\pm k^\Omega\}$. The frame of the subspace $W$ refers to the set of elementary vectors $\mathcal{F}(W)$.

Using our terminology, a subspace is $k$-regular if and only if $\bar{\kappa}_W\leq k$, and every $k$-adic subspace is anchored. Many of the properties in this section were explicitly or implicitly shown in Lee [Lee89]. However, it turns out that many properties are simpler and more natural to state in terms of either $\kappa_W$ or $\dot{\kappa}_W$. Roughly speaking, the fractional circuit imbalance $\kappa_W$ is the key quantity of interest for continuous properties, particularly relevant for proximity results in linear programming. On the other hand, the lcm-circuit imbalance $\dot{\kappa}_W$ captures most clearly the integrality properties. The max-circuit imbalance $\bar{\kappa}_W$ interpolates between these two, although, as already noted by Lee, it is the right quantity for proximity results in integer programming (see Section 9).

Appa and Kotnyek [AK04] also use the term $k$-regularity, in a different sense, as a natural extension of unimodularity. This turns out to be strongly related to $\dot{\kappa}_W$; see Lemma 3.3 and Corollary 3.9.

The key lemma on basis forms

The following simple proposition turns out to be extremely useful in deriving properties of $\kappa_W$ and $\dot{\kappa}_W$. The first statement is from [DNV20].

Proposition 3.1.

For every matrix $A\in\mathbb{R}^{m\times n}$ with $\operatorname{rk}(A)=m$,

\kappa_A=\max\left\{\|A_B^{-1}A\|_{\max}:A_B\text{ non-singular }m\times m\text{-submatrix of }A\right\}\,.

Moreover, for each nonsingular $A_B$, all nonzero entries of $A_B^{-1}A$ have absolute values between $1/\kappa_A$ and $\kappa_A$ and are $1/\dot{\kappa}_A$-integral.

Proof.

Consider the matrix $A'=A_B^{-1}A$ for any non-singular $m\times m$ submatrix $A_B$. Let us renumber the columns such that $B$ corresponds to the first $m$ columns. Then, for every $m+1\leq j\leq n$, the $j$th column of $A'$ corresponds to an elementary vector $g$ where $g_j=1$ and $g_i=-A'_{ij}$ for $i\in[m]$. Hence, $\|A'\|_{\max}$ gives a lower bound on $\kappa_A$. This also implies that all nonzero entries are between $1/\kappa_A$ and $\kappa_A$. To see that all entries of $A'$ are $1/\dot{\kappa}_A$-integral, note that $g=g'/\alpha$ for a vector $g'$ where all entries are integer divisors of $\dot{\kappa}_A$. Since $g_j=1$, it follows that $\alpha$ itself is an integer divisor of $\dot{\kappa}_A$.

To see that the maximum in the first statement is achieved, take the elementary vector $g^C$ that attains the maximum in the definition of $\kappa_A$; let $g^C_j$ be the minimum absolute value element. Let us select a basis $B$ such that $C\setminus\{j\}\subseteq B$. Then, the largest absolute value in the $j$-th column of $A_B^{-1}A$ will be $\kappa_A$. ∎
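The characterization is easy to test numerically on toy instances. The snippet below (a sketch, assuming $A$ has full row rank and exact rational data) evaluates the right-hand side of Proposition 3.1 by enumerating all column bases.

```python
# Sketch: kappa_A as the largest absolute entry of A_B^{-1} A over all bases.
from itertools import combinations
import sympy as sp

def kappa_via_bases(A):
    A = sp.Matrix(A)
    m, n = A.shape
    best = sp.Rational(0)
    for B in combinations(range(n), m):
        AB = A[:, list(B)]
        if AB.det() != 0:
            best = max(best, max(abs(x) for x in AB.inv() * A))
    return best

print(kappa_via_bases([[1, 1, 0, 2], [0, 1, 1, 3]]))   # 3, matching the brute force above
```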

3.1 Bounds on subdeterminants

For an integer matrix $A\in\mathbb{Z}^{m\times n}$, we define

\begin{aligned} \Delta_A&:=\max\{|\det(B)|:B\text{ is a nonsingular submatrix of }A\},\text{ and}\\ \dot{\Delta}_A&:=\mathrm{lcm}\{|\det(B)|:B\text{ is a nonsingular submatrix of }A\}. \end{aligned} \qquad (2)

The matrix is totally unimodular (TU) if $\Delta_A=1$: thus, all subdeterminants are $0$ or $\pm 1$. This class of matrices plays a foundational role in combinatorial optimization, see e.g., [Sch98, Chapters 19-20]. A significant example is the node-arc incidence matrix of a directed graph. A key property is that they define integer polyhedra, see Theorem 3.5 below. A polynomial-time algorithm is known to decide whether a matrix is TU, based on the deep decomposition theorem by Seymour from 1980 [Sey80].

The next statement is implicit in [Lee89, Proposition 5.3].

Proposition 3.2.

For every integer matrix $A\in\mathbb{Z}^{m\times n}$, $\bar{\kappa}_A\leq\Delta_A$ and $\dot{\kappa}_A\leq\dot{\Delta}_A$.

Proof.

Let $C\in\mathcal{C}_A$ be a circuit, and select a submatrix $\hat{A}\in\mathbb{Z}^{(|C|-1)\times|C|}$ of $A$ whose columns are indexed by $C$ and whose rows are linearly independent. Let $\hat{A}_{-i}$ be the square submatrix resulting from deleting the column corresponding to $i$ from $\hat{A}$. From Cramer’s rule, we see that $|g^C_i|=|\det(\hat{A}_{-i})|/\alpha$ for some $\alpha\in\mathbb{Q}$, $\alpha\geq 1$. This implies both claims $\bar{\kappa}_A\leq\Delta_A$ and $\dot{\kappa}_A\leq\dot{\Delta}_A$. ∎

In Propositions 3.18 and 3.19, we show that for any matrix $A\in\mathbb{Q}^{m\times n}$ there exists a matrix $\tilde{A}\in\mathbb{Z}^{m\times n}$ such that $\ker(A)=\ker(\tilde{A})$ and $\dot{\Delta}_{\tilde{A}}\leq(\dot{\kappa}_A)^m$.

To see an example where $\Delta_A$ can be much larger than $\kappa_A$, let $A\in\mathbb{Z}^{n\times\binom{n}{2}}$ be the node-edge incidence matrix of the complete undirected graph on $n$ nodes; assume $n$ is divisible by $3$. The determinant of the square submatrix corresponding to the nodes and edges of an odd cycle is $\pm 2$. Let $H$ be the edge set of $\frac{n}{3}$ node-disjoint triangles. Then $A_H$ is a square submatrix with determinant $\pm 2^{n/3}$. In fact, $\Delta_A=2^{n/3}$ in this case, since for a node-edge incidence matrix $\Delta_A$ equals $2^\nu$, where $\nu$ is the maximum number of node-disjoint odd cycles, see [GKS95]. On the other hand, $\kappa_A=\bar{\kappa}_A=\dot{\kappa}_A\in\{1,2\}$ for the incidence matrix $A$ of any undirected graph; see Section 3.2.
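A minimal instance showing where the value $2$ comes from (an illustration, not from the paper): for two triangles joined by a bridge edge, the kernel of the incidence matrix is spanned by an elementary vector that is $\pm 1$ on the triangle edges and $\pm 2$ on the bridge.

```python
# Sketch: incidence matrix of two triangles 1-2-3 and 4-5-6 joined by the
# edge 3-4; the columns are the edges 12, 13, 23, 34, 45, 46, 56.
import sympy as sp

A = sp.Matrix([
    [1, 1, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 1, 1],
])
print(A.nullspace())   # spanned by a vector proportional to (1, -1, -1, 2, -1, -1, 1)
```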

For TU matrices, the converse of Proposition 3.2 is also true. In 1956, Heller and Tompkins [Hel57, HT56] introduced the Dantzig property. A matrix $A\in\mathbb{R}^{m\times n}$ has the Dantzig property if $A_B^{-1}A$ is a $0,\pm 1$-matrix for every nonsingular $m\times m$ submatrix $A_B$. According to Proposition 3.1, this is equivalent to $\kappa_A=1$. Theorem 3.4 below can be attributed to Cederbaum [Ced57, Proposition (v)]; see also Camion’s PhD thesis [Cam64, Theorem 2.4.5(f)]. The key is the following lemma, which we formulate for general $1/\dot{\kappa}_A$ for later use.

Lemma 3.3.

Let A=(Im|A)m×nA=(I_{m}|A^{\prime})\in\mathbb{R}^{m\times n}. Then, for any nonsingular square submatrix MM of AA, the inverse M1M^{-1} is 1/κ˙A1/{\dot{\kappa}}_{A}-integral, with non-zero entries between 1/κA1/\kappa_{A} and κA\kappa_{A} in absolute value.

Proof.

Let $M$ be any $k\times k$ nonsingular submatrix of $A$; w.l.o.g., let us assume that it uses the first $k$ rows of $A$. Let $B$ be the set of columns of $M$, along with the $m-k$ additional columns $i\in[k+1,m]$, i.e., the last $m-k$ unit vectors from $I_m$. Thus, $A_B\in\mathbb{R}^{m\times m}$ is also nonsingular. After permuting the columns, this can be written in the form

A_B=\left(\begin{array}{c|c}M&0\\ \hline L&I_{m-k}\end{array}\right)

for some $L\in\mathbb{R}^{(m-k)\times k}$. We now use Proposition 3.1 for $\tilde{A}=A_B^{-1}A$. Note that the first $m$ columns of $\tilde{A}$ correspond to $A_B^{-1}$. Moreover, we see that

A_B^{-1}=\left(\begin{array}{c|c}M^{-1}&0\\ \hline -LM^{-1}&I_{m-k}\end{array}\right)\,.

Thus, $M^{-1}$ is $1/\dot{\kappa}_A$-integral, with non-zero entries between $1/\kappa_A$ and $\kappa_A$, completing the proof. ∎

Appa and Kotnyek define $k$-regular matrices as follows: a rational matrix $A'\in\mathbb{R}^{m\times n}$ is $k$-regular if and only if the inverse of every nonsingular submatrix is $1/k$-integral. From the above statement, it follows that $A'$ is $k$-regular in this sense for $k=\dot{\kappa}_{(I_m|A')}$. See also Corollary 3.9.

Theorem 3.4 (Cederbaum, 1957).

Let $W\subset\mathbb{R}^n$ be a linear subspace. Then, the following are equivalent.

  1. (i)

    $\kappa_W=\bar{\kappa}_W=\dot{\kappa}_W=1$.

  2. (ii)

    There exists a TU matrix $A$ such that $W=\ker(A)$.

  3. (iii)

    For any matrix $A$ in basis form such that $W=\ker(A)$, $A$ is a TU matrix.

Proof.

(iii) $\Rightarrow$ (ii) is straightforward, and (ii) $\Rightarrow$ (i) follows by Proposition 3.2. It remains to show (i) $\Rightarrow$ (iii). Let $\operatorname{rk}(W)=n-m$, and consider any $A\in\mathbb{R}^{m\times n}$ in basis form such that $W=\ker(A)$. For simplicity of notation, assume the basis is formed by the first $m$ columns, that is, $A=(I_m|A')$ for some $A'\in\mathbb{R}^{m\times(n-m)}$.

Proposition 3.1 implies that all entries of $A$ are $0$ and $\pm 1$. Consider any nonsingular square submatrix $M$ of $A$. By Lemma 3.3, $M^{-1}$ is also a $0,\pm 1$ matrix. Consequently, both $\det(M)$ and $\det(M^{-1})$ are nonzero integers, which implies that $|\det(M)|=1$, as required. ∎

3.2 Fractional integrality characterization

Hoffman and Kruskal [HK56] gave the following characterization of TU matrices. A polyhedron $P\subseteq\mathbb{R}^n$ is integral if all vertices (=basic feasible solutions) are integer.

Theorem 3.5 (Hoffman and Kruskal, 1956).

An integer matrix $A\in\mathbb{Z}^{m\times n}$ is totally unimodular if and only if for every $b\in\mathbb{Z}^m$, the polyhedron $\{x\in\mathbb{R}^n:\,Ax\leq b,\,x\geq 0\}$ is integral.

Since $\dot{\kappa}$ is a property of the subspace, it will be more convenient to work with the standard equality form of an LP. Here, as well as in Section 4.2, we use the following straightforward correspondence between the two forms. Recall that an edge of a polyhedron is a bounded one-dimensional face; every edge is incident to exactly two vertices. The following statement is standard and easy to verify.

Lemma 3.6.

Let $A\in\mathbb{R}^{m\times n}$ be of the form $A=(A'|I_m)$ for $A'\in\mathbb{R}^{m\times(n-m)}$. For a vector $b\in\mathbb{R}^m$, let

P_b=\{x\in\mathbb{R}^n:\,Ax=b,\,x\geq 0\}\quad\mbox{and}\quad P'_b=\{x'\in\mathbb{R}^{n-m}:\,A'x'\leq b,\,x'\geq 0\}\,.

Let $I=[n-m]$ denote the index set of $A'$. Then, $P'_b=\pi_I(P_b)$, i.e., $P'_b$ is the projection of $P_b$ to the coordinates in $I$. For every vertex $x$ of $P_b$, $x'=x_I$ is a vertex of $P'_b$, and conversely, for every vertex $x'$ of $P'_b$, there exists a unique vertex $x$ of $P_b$ such that $x_I=x'$. There is a one-to-one correspondence between the edges of $P_b$ and $P'_b$. Further, if $b\in\mathbb{Z}^m$, then $P_b$ is $1/k$-integral if and only if $P'_b$ is $1/k$-integral.

Using Theorem 3.4 and Lemma 3.6, we can formulate Theorem 3.5 in subspace language.

Corollary 3.7.

Let $W\subseteq\mathbb{R}^n$ be a linear space. Then, $\kappa_W=1$ if and only if for every $d\in\mathbb{Z}^n$, the polyhedron $\{x\in\mathbb{R}^n:\,x\in W+d,\,x\geq 0\}$ is integral.

Proof.

Let $n'=n-m=\dim(W)$. W.l.o.g., assume the last $m$ variables form a basis, and let us represent $W$ in basis form as $W=\ker(A)$ for $A=(A'|I_m)$, where $A'\in\mathbb{R}^{m\times n'}$. It follows by Theorem 3.4 that $\kappa_W=1$ if and only if $A$ is TU, which is further equivalent to $A'$ being TU.

Further, note that the system $\{x\in\mathbb{R}^n:\,x\in W+d,\,x\geq 0\}$ coincides with $P_b=\{x\in\mathbb{R}^n:\,(A'|I_m)x=b,\,x\geq 0\}$, where $b=Ad$.

Note that $b=Ad$ is integer whenever $d\in\mathbb{Z}^n$. Moreover, we can obtain every integer vector $b\in\mathbb{Z}^m$ this way, since $A$ contains an identity matrix. According to Lemma 3.6, $P_b$ is integral if and only if $P'_b=\{x'\in\mathbb{R}^{n-m}:\,A'x'\leq b,\,x'\geq 0\}$ is integral. The claim follows by Theorem 3.5. ∎

We provide the following natural generalization. Related statements, although in substantially more complicated forms, were given in [Lee89, Proposition 6.1 and 6.2].

Theorem 3.8.

Let $W\subseteq\mathbb{R}^n$ be a linear space. Then, $\dot{\kappa}_W$ is the smallest integer $k\in\mathbb{Z}$ such that for every $d\in\mathbb{Z}^n$, the polyhedron $\{x\in\mathbb{R}^n:\,x\in W+d,\,x\geq 0\}$ is $1/k$-integral.

Proof.

Let $\dim(W)=n-m$, and let us represent $W=\ker(A)$ for $A\in\mathbb{R}^{m\times n}$. Then, $x\in W+d$, $x\geq 0$ can be written as $Ax=Ad$, $x\geq 0$. Let $x$ be a basic feasible solution (i.e. vertex) of this system, with basis $B$. Then, $x_B=A_B^{-1}Ad$ and $x_i=0$ for $i\notin B$. By Proposition 3.1, $A_B^{-1}A$ is $1/\dot{\kappa}_W$-integral. Thus, if $d\in\mathbb{Z}^n$ then $x$ must also be $1/\dot{\kappa}_W$-integral.

Let us now show the converse direction. Assume $\{x\in\mathbb{R}^n:\,x\in W+d,\,x\geq 0\}$ is $1/k$-integral for every $d\in\mathbb{Z}^n$. For a contradiction, assume there exists a circuit $C\in\mathcal{C}_W$ such that the entries of the elementary vector are not all divisors of $k$ (or that $g^C$ is not even a rational vector, if $W$ is not a rational space). In particular, select an index $\ell\in C$ such that $g^C_\ell\nmid k$, or such that $(1/g^C_\ell)g^C$ is not rational.

Let us select a basis $B\subseteq[n]$ such that $C\setminus B=\{\ell\}$. For simplicity of notation, let $B=[m]$. We can represent $W=\ker(A)$ in basis form as $A=(I_m|A')$. Let $g\in\mathbb{R}^n$ be defined by $g_\ell=1$, $g_j=-A_{j\ell}$ for $j\in B$ and $g_j=0$ otherwise; thus, $g=(1/g^C_\ell)g^C$.

Let us pick an integer $t\in\mathbb{N}$, $t\geq\|g\|_\infty$, and define $d\in\mathbb{Z}^n$ by $d_j=t$ for $j\in B$, $d_\ell=-1$, and $d_j=0$ otherwise. Then, the basic solution of $x\in W+d$, $x\geq 0$ corresponding to the basis $B$ is obtained as $x_j=t+g_j$ for $j\in B$ and $x_j=0$ for $j\in[n]\setminus B$. The choice of $t$ guarantees $x\geq 0$. By the assumption, $x$ is $1/k$-integral, and therefore $g$ is also $1/k$-integral. Recall that $g=(1/g^C_\ell)g^C$, where either $g^C\in\mathbb{Z}^n$ with $\gcd(g^C)=1$ and $g^C_\ell\nmid k$, or $g$ is not rational. Both cases give a contradiction. ∎

Using again Lemma 3.6, we can write this theorem in a form similar to the Hoffman-Kruskal theorem.

Corollary 3.9.

Let $A=(A'|I_m)\in\mathbb{R}^{m\times n}$. Then, $\dot{\kappa}_A$ is the smallest value $k$ such that for every $b\in\mathbb{Z}^m$, the polyhedron $\{x'\in\mathbb{R}^{n-m}:\,A'x'\leq b,\,x'\geq 0\}$ is $1/k$-integral.

Appa and Kotnyek [AK04, Theorem 17] show that $k$-regularity of $A'$ (in the sense that the inverse of every square submatrix is $1/k$-integral) is equivalent to the property above.

Subspaces with $\dot{\kappa}_A=2$

The case $\dot{\kappa}_W=2$ is a particularly interesting class. As already noted, it includes incidence matrices of undirected graphs, and according to Theorem 3.8, it corresponds to half-integer polytopes. This class includes the following matrices, first studied by Edmonds and Johnson [EJ70]; the following result follows e.g. from [AK04, GS86, HMNT93].

Theorem 3.10.

Let $A\in\mathbb{Z}^{m\times n}$ be such that for each column $j\in[n]$, $\sum_{i=1}^m|A_{ij}|\leq 2$. Then $\dot{\kappa}_A\in\{1,2\}$.
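As a small illustration (not from the paper) of the half-integrality in Theorem 3.8 and Corollary 3.9 for this class, let $A'$ be the node-edge incidence matrix of a triangle, which satisfies the column condition of Theorem 3.10, so $\dot{\kappa}_{(A'|I_3)}\in\{1,2\}$. With $b=\mathbbm{1}_3$, the polyhedron $\{x'\geq 0:\,A'x'\leq b\}$ has the half-integral, non-integral vertex

x'=\left(\tfrac{1}{2},\tfrac{1}{2},\tfrac{1}{2}\right),\qquad A'=\begin{pmatrix}1&1&0\\ 1&0&1\\ 0&1&1\end{pmatrix},\qquad A'x'=\mathbbm{1}_3,

where the three tight inequality constraints are linearly independent; so for this instance the smallest $k$ in Corollary 3.9 is indeed $2$.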

Appa and Kotnyek [AK04] define binet matrices as $A'=A_B^{-1}A$ for a matrix $A$ as in Theorem 3.10 and a basis $B$. Clearly, these matrices have $\dot{\kappa}_{A'}\in\{1,2\}$ since they define the same subspace.

Deciding whether a matrix has $\dot{\kappa}_A=2$ (or more generally, $\dot{\kappa}_A=k$ for a fixed constant $k$) is an interesting open question: is it possible to extend Seymour’s decomposition [Sey80] from TU matrices? The matrices in Theorem 3.10 could be a natural building block of such a decomposition.

3.3 Self-duality

We next show that both $\kappa_W$ and $\dot{\kappa}_W$ are self-dual. These rely on the following duality property of circuits. We introduce the following more refined quantities that will also come useful later on.

Definition 3.11 (Pairwise Circuit Imbalances).

For a space $W\subseteq\mathbb{R}^n$ and variables $i,j\in[n]$ we define

\begin{aligned} \mathcal{K}_{ij}^W&:=\left\{\left|\frac{g_j}{g_i}\right|:\,\{i,j\}\subseteq\mathrm{supp}(g),\ g\in\mathcal{F}(W)\right\}\,,\qquad \kappa_{ij}^W:=\max\mathcal{K}_{ij}^W\,,\\ \dot{\mathcal{K}}_{ij}^W&:=\left\{\mathrm{lcm}(p,q):\,p,q\in\mathbb{N},\ \gcd(p,q)=1,\ \frac{p}{q}\in\mathcal{K}_{ij}^W\right\}\,. \end{aligned}

We call $\kappa_{ij}^W$ the pairwise imbalance between $i$ and $j$.

Clearly, $\kappa_W=\max_{i,j\in[n]}\kappa_{ij}^W$ for a nontrivial linear space $W$. We use the following simple lemma.

Lemma 3.12.

Consider a matrix $A\in\mathbb{R}^{m\times n}$ in basis form for $B\subseteq[n]$, i.e., $A_B=I_m$. Let $W=\ker(A)$; thus, $W^\perp=\operatorname{span}(A^\top)$. The following hold.

  1. (i)

    The rows of $A$ form a circuit basis of $W^\perp$, denoted as $\mathcal{F}_B(W^\perp)$.

  2. (ii)

    For any two rows $A^i,A^j$, $i,j\in B$, $i\neq j$, and $k\in[n]\setminus B$, the vector $h=A_{jk}A^i-A_{ik}A^j$ fulfills $h\in\mathcal{F}(W^\perp)$.

Proof.

For part (i), the rows are clearly linearly independent and span $W^\perp$. Therefore, every $g\in W^\perp$ must have $\mathrm{supp}(g)\cap B\neq\emptyset$, and if $\mathrm{supp}(g)\cap B=\{i\}$ then $g=g_iA^i$. These two facts imply that each $A^i$ is support-minimal in $W^\perp$, that is, $A^i\in\mathcal{F}(W^\perp)$.

For part (ii), there is nothing to prove if $A_{ik}=0$ or $A_{jk}=0$; for the rest, assume both are nonzero. Assume for a contradiction $h\notin\mathcal{F}(W^\perp)$; thus, there exists a $g\in W^\perp$, $g\neq 0$ with $\mathrm{supp}(g)\subsetneq\mathrm{supp}(h)$. We have $\mathrm{supp}(h)\cap B=\{i,j\}$. If $\mathrm{supp}(g)\cap B\subsetneq\{i,j\}$, as above we get that $g=g_iA^i$ or $g=g_jA^j$, a contradiction since $h_k=0$ but $A_{ik},A_{jk}\neq 0$. Hence, $\mathrm{supp}(g)\cap B=\{i,j\}$. By part (i), we have $g=g_iA^i+g_jA^j$; and since $h_k=0$ it follows that $g_i/g_j=-A_{jk}/A_{ik}$; thus, $g$ is a scalar multiple of $h$, a contradiction. ∎

Lemma 3.13.

For any $i,j\in[n]$ we have $\mathcal{K}_{ij}^W=\left\{\alpha^{-1}:\alpha\in\mathcal{K}_{ji}^{W^\perp}\right\}$. Equivalently: for every elementary vector $g\in\mathcal{F}(W)$ with indices $i,j\in\mathrm{supp}(g)$ there exists an elementary vector $h\in\mathcal{F}(W^\perp)$ such that $|h_i/h_j|=|g_j/g_i|$.

Proof.

Let $g\in\mathcal{F}(W)$ such that $i,j\in\mathrm{supp}(g)$. If $\mathrm{supp}(g)=\{i,j\}$ then any $h\in\mathcal{F}(W^\perp)$ with $i\in\mathrm{supp}(h)$ fulfills $g_ih_i+g_jh_j=\langle g,h\rangle=0$, so $j\in\mathrm{supp}(h)$ and $|h_i/h_j|=|g_j/g_i|$.

Else, there exists $k\in\mathrm{supp}(g)\setminus\{i,j\}$. Let us select a basis $B$ of $\mathcal{M}(W)$ with $\mathrm{supp}(g)\setminus B=\{k\}$. Let $A\in\mathbb{R}^{m\times n}$ be a matrix in basis form for $B$ with $\ker(A)=W$, and let $h=A_{jk}A^i-A_{ik}A^j$, an elementary vector in $\mathcal{F}(W^\perp)$ by Lemma 3.12(ii).

By the construction, $|h_i/h_j|=|A_{jk}/A_{ik}|$. On the other hand, $\langle g,A^i\rangle=0$ and $\mathrm{supp}(g)\setminus B=\{k\}$ implies $g_i=-g_kA_{ik}$, and similarly $\langle g,A^j\rangle=0$ implies $g_j=-g_kA_{jk}$. The claim follows. ∎

For $\kappa_W$, duality is immediate from the above:

Proposition 3.14 ([DHNV20]).

For any linear subspace $W\subseteq\mathbb{R}^n$, we have $\kappa_W=\kappa_{W^\perp}$.

Let us now show duality also for $\dot{\kappa}_W$; this was shown in [Lee89, Lemma 2.1] in a slightly different form.

Proposition 3.15.

For any rational linear subspace $W\subseteq\mathbb{R}^n$, we have $\dot{\kappa}_W=\dot{\kappa}_{W^\perp}$.

Proof.

Recall the $p$-adic valuation $\nu_p(n)$ defined in (1). It suffices to show that $\nu_p(\dot{\kappa}_W)=\nu_p(\dot{\kappa}_{W^\perp})$ for any prime $p\in\mathbb{P}$. We can reformulate as

\begin{aligned} \nu_p(\dot{\kappa}_W)&=\nu_p\left(\mathrm{lcm}\left\{\mathrm{lcm}(g^C):\,C\in\mathcal{C}_W\right\}\right)\\ &=\max\left\{\nu_p(\mathrm{lcm}(g^C)):\,C\in\mathcal{C}_W\right\}\\ &=\max\left\{\nu_p(\alpha):\,i,j\in[n],\ \alpha\in\dot{\mathcal{K}}_{ij}^W\right\}\,. \end{aligned}

Lemma 3.13 implies that the last expression is the same for $W$ and $W^\perp$. ∎

We next show that $\kappa_W$ and $\dot{\kappa}_W$ are monotone under projections and restrictions of the subspace.

Lemma 3.16.

For any linear subspace $W\subseteq\mathbb{R}^n$, $J\subseteq[n]$ and $i,j\in J$, we have

\mathcal{K}_{ij}^{\pi_J(W)}\subseteq\mathcal{K}_{ij}^W\,,\quad\mathcal{K}_{ij}^{W_J}\subseteq\mathcal{K}_{ij}^W\,,\quad\dot{\mathcal{K}}_{ij}^{\pi_J(W)}\subseteq\dot{\mathcal{K}}_{ij}^W\,,\quad\mbox{and}\quad\dot{\mathcal{K}}_{ij}^{W_J}\subseteq\dot{\mathcal{K}}_{ij}^W\,.
Proof.

Let $g\in\mathcal{F}(W_J)$. Then $(g,0_{[n]\setminus J})\in\mathcal{F}(W)$ and so $\mathcal{K}_{ij}^{W_J}\subseteq\mathcal{K}_{ij}^W$. Note that $\pi_J(W)=((W^\perp)_J)^\perp$ and so by Lemma 3.13,

\mathcal{K}_{ij}^{\pi_J(W)}=\left\{\alpha^{-1}:\alpha\in\mathcal{K}_{ji}^{(W^\perp)_J}\right\}\subseteq\left\{\alpha^{-1}:\alpha\in\mathcal{K}_{ji}^{W^\perp}\right\}=\mathcal{K}_{ij}^W. \qquad (3)

The same arguments extend to $\dot{\mathcal{K}}_{ij}$. ∎

Proposition 3.17.

For any linear subspace $W\subseteq\mathbb{R}^n$ and $J\subseteq[n]$, we have

\kappa_{W_J}\leq\kappa_W\,,\quad\kappa_{\pi_J(W)}\leq\kappa_W\,,\quad\dot{\kappa}_{W_J}\leq\dot{\kappa}_W\,,\quad\mbox{and}\quad\dot{\kappa}_{\pi_J(W)}\leq\dot{\kappa}_W\,.

3.4 Matrix representations

Proposition 3.1 already tells us that any rational matrix of the form $A=(I_m|A')$ is $1/\dot{\kappa}_A$-integral, and according to Lemma 3.3, the inverse of every non-singular square submatrix of $A$ is also $1/\dot{\kappa}_A$-integral. It is natural to ask whether every linear subspace $W$ can be represented as $W=\ker(A)$ for an integer matrix $A$ with the same property on the inverse matrices.

We show that this is true if the dual space is anchored, but false in general. Recall that this means that every elementary vector $g^C$, $C\in\mathcal{C}_{W^\perp}$, has a $\pm 1$ entry. In particular, $\dot{\kappa}_W=p^\alpha$ for some prime number $p\in\mathbb{P}$ implies that both $W$ and $W^\perp$ are anchored; in this case we also have $\kappa_W=\dot{\kappa}_W$.

In [Lee89, Section 7], it is shown that if $B$ is a basis minimizing $|\det(A_B)|$ for a full rank $A\in\mathbb{R}^{m\times n}$, then every nonzero entry in $A_B^{-1}A$ is at least $1$ in absolute value. Moreover, a simple greedy algorithm is proposed (called 1-OPT) that finds such a basis within $m$ pivots for $k$-adic spaces. Our next statement can be seen as the variant of this for anchored spaces, using the lcm-circuit imbalance $\dot{\kappa}_A$. We note that finding a basis minimizing $|\det(A_B)|$ is computationally hard in general [Kha95].

Proposition 3.18.

Let $W\subseteq\mathbb{R}^n$, $\dim(W)=n-m$, be a rational subspace such that $W^\perp$ is an anchored space. Then there exists an integer matrix $A\in\mathbb{Z}^{m\times n}$ such that $\ker(A)=W$, and

  1. (i)

    All entries of $A$ divide $\dot{\kappa}_W$.

  2. (ii)

    For all non-singular submatrices $M$ of $A$, $M^{-1}$ is $\frac{1}{\dot{\kappa}_W}$-integral.

  3. (iii)

    $\dot{\Delta}_A$ is an integer divisor of $(\dot{\kappa}_W)^m$.

Proof.

Let $\bar{A}\in\mathbb{Q}^{m\times n}$ be an arbitrary matrix with $\ker(\bar{A})=W$. By performing row operations we can convert $\bar{A}$ into $A=(D|A')\in\mathbb{Z}^{m\times n}$, where $D\in\mathbf{D}_m$ is positive diagonal and $A'\in\mathbb{Z}^{m\times(n-m)}$ (after possibly permuting the columns). If $D=I_m$, then we are already done. Property (i) follows by Proposition 3.1; property (ii) follows by Lemma 3.3; and property (iii) holds since $\det(M)\cdot\det(M^{-1})=1$, $\det(M)\in\mathbb{Z}$, and $\det(M^{-1})$ is $\frac{1}{(\dot{\kappa}_W)^m}$-integral.

If $D$ is not the identity matrix, then we show that $A$ can be brought to the form $(I_m|A'')$ with an integer $A''$ by performing further basis exchanges. Let us assume that $\gcd(A^i)=1$ for all rows $A^i$, $i\in[m]$. By Lemma 3.13, $A^i\in\mathcal{F}(W^\perp)$. Assume $D_{ii}=A_{ii}>1$ for some $i\in[m]$. As $A^i$ is an elementary vector and $W^\perp$ is anchored, there exists an index $k\in[n]$ such that $|A_{ik}|=1$.

Let us perform a basis exchange between columns $i$ and $k$. That is, subtract integer multiples of row $i$ from the other rows to turn column $k$ into $e_i$. We then swap columns $i$ and $k$ and obtain the matrix again in the form $(D'|A'')$. Notice that the matrix remains integral, $D'_{ii}=1$, and $D'_{jj}=D_{jj}$ for $j\in[m]$, $j\neq i$. Hence, repeating this procedure at most $m$ times, we can convert the matrix to the integer form $(I_m|A'')$, completing the proof. ∎

Note that the proof gives an algorithm to find such a basis representation using a Gaussian elimination and at most $m$ additional pivot operations. If $W^\perp$ is not anchored, we show the following weaker statement.
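A compact sketch of the pivoting loop from the proof (a hypothetical helper; it assumes the input is already in the integer form $(D|A')$ with rows of gcd $1$ and that $W^\perp$ is anchored, so a $\pm 1$ pivot entry always exists):

```python
# Sketch: basis exchanges turning (D | A') into (I_m | A'') over the integers.
# Note: columns are permuted along the way, matching the proof's convention.
import sympy as sp

def to_identity_basis_form(A):
    A = sp.Matrix(A)                 # integer matrix, A[:, :m] diagonal
    m, n = A.shape
    for i in range(m):
        if A[i, i] == 1:
            continue
        # anchoredness of the dual space guarantees a +-1 entry in row i
        k = next(j for j in range(n) if abs(A[i, j]) == 1)
        A[i, :] = A[i, k] * A[i, :]                       # make the pivot entry +1
        for r in range(m):
            if r != i:
                A[r, :] = A[r, :] - A[r, k] * A[i, :]     # integral row operations
        A.col_swap(i, k)             # the new unit column becomes the i-th basis column
    return A

# Example (made up data): D = diag(1, 2).
print(to_identity_basis_form([[1, 0, 2, 1], [0, 2, 1, 3]]))
```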

Proposition 3.19.

Let $W\subseteq\mathbb{R}^n$, $\dim(W)=n-m$, be a rational subspace. Then there exists an integer matrix $A\in\mathbb{Z}^{m\times n}$ with $\ker(A)=W$ such that

  1. (i)

    All entries of $A$ divide $\dot{\kappa}_W$;

  2. (ii)

    For all non-singular submatrices $M$ of $A$, $M^{-1}$ is $\frac{1}{(\dot{\kappa}_W)^2}$-integral.

  3. (iii)

    $\dot{\Delta}_A$ is an integer divisor of $(\dot{\kappa}_W)^m$.

Proof.

The proof is an easy consequence of Proposition 3.1 and Lemma 3.3. Consider any basis form $A=(I_m|A')$ with $\ker(A)=W$ (after possibly permuting the columns). According to Proposition 3.1, all entries of $A$ are $1/\dot{\kappa}_W$-integral. By Lemma 3.13, the rows satisfy $A^i\in\mathcal{F}(W^\perp)$ for $i\in[m]$. We can write $A^i=g^i/d_i$ for some $g^i\in\mathcal{F}(W^\perp)\cap\mathbb{Z}^n$ and $d_i\in\mathbb{Q}$ such that $\gcd(g^i)=1$ for each $i\in[m]$. By the definition of $\dot{\kappa}_A$, the entries of each $g^i$ are divisors of $\dot{\kappa}_A$. Since $A_{ii}=1$ it follows that $d_i\in\mathbb{Z}$ and $d_i\mid\dot{\kappa}_A$. Let $D\in\mathbf{D}_m$ be the diagonal matrix with entries $D_{ii}=d_i$. Then, $\bar{A}=DA$ is an integer matrix where all entries divide $\dot{\kappa}_A$, proving (i). Part (ii) follows by Lemma 3.3, noting that the inverses of the submatrices get multiplied by a submatrix of $D^{-1}$.

For part (iii), let us use a basis $B$ such that $|\det(A_B)|$ is maximal; w.l.o.g. assume $B=[m]$. Then, in the basis form $(I_m|A')$ for $B$, all subdeterminants are at most $1$ in absolute value. This holds as for any submatrix $M\in\mathbb{Q}^{k\times k}$ of $A'$ with $\det(M)\neq 0$, augmenting the columns of $M$ by the columns $i\in B$ such that $i$ is not a row of $M$ results in a basis $B_M$ with $|\det(M)|=|\det\big((I_m|A')_{B_M}\big)|\leq\det(I_m)=1$ by the assumption on $B$. After multiplying by $D$ as above, $\bar{A}=DA$, all subdeterminants will be at most $\det(D)\leq(\dot{\kappa}_A)^m$. ∎

Note that parts (i) and (ii) are true for any choice of the basis form, whereas (iii) requires one to select $A_B$ with maximum determinant. The maximum subdeterminant is NP-hard even to approximate better than $c^m$ for some $c>1$ [DSEFM14]. However, it is easy to see that even if we start with an arbitrary basis, then $\dot{\Delta}_A\mid(\dot{\kappa}_W)^{2m}$, since it follows by Lemma 3.3 that every subdeterminant of $A_B^{-1}A$ is at most $(\dot{\kappa}_W)^m$.

We now give an example to illustrate why Proposition 3.18(ii) cannot hold for arbitrary values of $\dot{\kappa}_W$. The proof is given in the Appendix.

Proposition 3.20.

Consider the matrix

A=\begin{bmatrix}1&3&4&3\\ 0&13&9&10\end{bmatrix}\,.

For this matrix, $\dot{\kappa}_A=5850=2\times 3^2\times 5^2\times 13$ holds, and there exists no $\tilde{A}\in\mathbb{Z}^{2\times 4}$ such that $\ker(\tilde{A})=\ker(A)$ and the inverse of every nonsingular $2\times 2$ submatrix of $\tilde{A}$ is $1/5850$-integral.
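The value $5850$ is easy to confirm by brute force (a sketch along the lines of the earlier enumeration; every $3$-element column subset of this matrix is a circuit):

```python
# Sketch: confirm kappa_dot = 5850 for the matrix of Proposition 3.20.
from itertools import combinations
from math import gcd, lcm
import sympy as sp

A = sp.Matrix([[1, 3, 4, 3],
               [0, 13, 9, 10]])

kappa_dot = 1
for C in combinations(range(4), 3):              # the circuits of this matrix
    g = A[:, list(C)].nullspace()[0]
    den = lcm(*[int(sp.fraction(v)[1]) for v in g])
    ints = [abs(int(v * den)) for v in g]
    d = gcd(*ints)
    kappa_dot = lcm(kappa_dot, *[v // d for v in ints])
print(kappa_dot)                                 # 5850 = 2 * 3**2 * 5**2 * 13
```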

3.5 The triangle inequality

An interesting additional fact about circuit imbalances is that the logarithms of the pairwise imbalances satisfy the triangle inequality; this was shown in [DHNV20]. Here, we formulate a stronger version and give a simpler proof. Throughout, we assume that $\mathcal{M}(W)$ is non-separable. Thus, according to Proposition 2.2, for any $i,j\in[n]$ there is a circuit $C\in\mathcal{C}_W$ with $i,j\in C$.

Theorem 3.21.

Let $W\subseteq\mathbb{R}^n$ be a linear space, and assume $\mathcal{M}(W)$ is non-separable. Then,

  1. (i)

    for any distinct $i,j,k\in[n]$, $\mathcal{K}_{ij}^W\subseteq\mathcal{K}_{ik}^W\cdot\mathcal{K}_{kj}^W$; and

  2. (ii)

    for any distinct $i,j,k\in[n]$, $\kappa_{ij}\leq\kappa_{ik}\cdot\kappa_{kj}$.

The proof relies on the following technical lemma that analyzes the scenario when almost all vectors in $W$ are elementary.

Lemma 3.22.

Let $W\subseteq\mathbb{R}^n$ be a subspace such that $\mathcal{M}(W)$ is non-separable.

  1. (i)

    If $\mathcal{F}(W)=\left\{g\in W\setminus\{0\}:\mathrm{supp}(g)\neq[n]\right\}$, then $\mathcal{K}_{ij}^W\subseteq\mathcal{K}_{ik}^W\cdot\mathcal{K}_{kj}^W$.

  2. (ii)

    If there exists $g\in\mathcal{F}(W)$ such that $|\mathrm{supp}(g)|=n-1$, then

    \mathcal{F}(W)=\left\{g\in W\setminus\{0\}:\mathrm{supp}(g)\neq[n]\right\}\,.
Proof.

For part (i), let $\delta\in\mathcal{K}_{ij}^W$ and let $g\in\mathcal{F}(W)$ such that $\{i,j\}\subseteq\mathrm{supp}(g)$ and $|g_j/g_i|=\delta$. If $k\in\mathrm{supp}(g)$, then $|g_j/g_i|=|g_k/g_i|\cdot|g_j/g_k|$ shows the claim.

Assume $k\notin\mathrm{supp}(g)$, and pick $h\in\mathcal{F}(W)$ such that $\{i,k\}\subseteq\mathrm{supp}(h)$, and let $\tilde{h}=h_jg-g_jh$; such an $h$ exists by Proposition 2.2. Then $\tilde{h}_j=0$ and $\tilde{h}_k\neq 0$, so $\tilde{h}\in\mathcal{F}(W)$ by the assumption. If $\tilde{h}_i=0$ then $h_jg_i=g_jh_i$ and so $\{i,j,k\}\subseteq\mathrm{supp}(h)$ with $h_j/h_i=g_j/g_i$; therefore $h$ certifies the statement, as $|h_j/h_i|=|h_k/h_i|\cdot|h_j/h_k|$. Otherwise, $\tilde{h}_i\neq 0$ and $h':=\tilde{h}_ig-g_i\tilde{h}$ fulfills $h'\in\mathcal{F}(W)$ as $h'_i=0$, $\{j,k\}\subseteq\mathrm{supp}(h')$. Now, using that $\tilde{h}_j=0$ and $g_k=0$, it is easy to see that

\left|\frac{\tilde{h}_k}{\tilde{h}_i}\cdot\frac{h'_j}{h'_k}\right|=\left|\frac{\tilde{h}_k}{\tilde{h}_i}\cdot\frac{\tilde{h}_ig_j-g_i\tilde{h}_j}{\tilde{h}_ig_k-g_i\tilde{h}_k}\right|=\left|\frac{\tilde{h}_k}{\tilde{h}_i}\cdot\frac{\tilde{h}_ig_j}{g_i\tilde{h}_k}\right|=\left|\frac{g_j}{g_i}\right|\,. \qquad (4)

We now turn to part (ii). Since there exists $g\in\mathcal{F}(W)$ with $\mathrm{supp}(g)\neq[n]$, we cannot have $[n]\in\mathcal{C}_W$.

Let $g\in\mathcal{F}(W)$ and $i\in[n]$ such that $\mathrm{supp}(g)=[n]\setminus\{i\}$. Consider any $h\in W$, $\mathrm{supp}(h)\neq\mathrm{supp}(g)$, such that $\mathrm{supp}(h)\neq[n]$. If $h\notin\mathcal{F}(W)$, there exists $\ell\in\mathcal{F}(W)$ such that $\mathrm{supp}(\ell)\subsetneq\mathrm{supp}(h)$. We must have $i\in\mathrm{supp}(\ell)$, since $\mathrm{supp}(\ell)\setminus\mathrm{supp}(g)\neq\emptyset$. Then $\tilde{h}:=h_i\ell-\ell_ih$ fulfills $\tilde{h}\neq 0$, $\tilde{h}_i=0$ and $\mathrm{supp}(\tilde{h})\subsetneq[n]\setminus\{i\}$, a contradiction to $g\in\mathcal{F}(W)$. ∎

Proof of Theorem 3.21.

Part (ii) immediately follows from part (i), when taking $C\in\mathcal{C}_W$ such that $|g^C_j/g^C_i|=\kappa_{ij}$. We now prove part (i).

Let $\delta\in\mathcal{K}_{ij}^W$ and $C\in\mathcal{C}_W$ such that $i,j\in C$ and, for $g=g^C$, $|g_j/g_i|=\delta$. If $k\in C$ then clearly $|g_j/g_i|=|g_k/g_i|\cdot|g_j/g_k|\in\mathcal{K}_{ik}^W\cdot\mathcal{K}_{kj}^W$. Otherwise, let us select $C'\in\mathcal{C}_W$ such that $i,k\in C'$ and $|C\cup C'|$ is minimal. Let $h=g^{C'}$ and $J=C'\setminus(C\cup\{k\})$.

Claim 3.22.1.

Let $G=(C\cup C')\setminus J$. Then for the space $\hat{W}:=\pi_G(W_{C\cup C'})$ we have that $g_G,h_G\in\mathcal{F}(\hat{W})$.

Proof.

The statement that $h_G\in\mathcal{F}(\hat{W})$ is clear, as $h_{C\cup C'}\in\mathcal{F}(W_{C\cup C'})$ and the variables in $J$ that we project out fulfill $J\subseteq\mathrm{supp}(h)$. For the statement on $g_G$, assume that there exists $\hat{g}\in\mathcal{F}(\hat{W})$ such that $\mathrm{supp}(\hat{g})\subsetneq\mathrm{supp}(g_G)$. Then there exists a lift $\tilde{g}\in\mathcal{F}(W_{C\cup C'})$ of $\hat{g}$ and some $\ell\in J$ such that $\ell\in\mathrm{supp}(\tilde{g})$; note also that $\tilde{g}_k=g_k=0$. The vector $\hat{h}:=h_\ell\tilde{g}-\tilde{g}_\ell h$ fulfills $\ell\notin\mathrm{supp}(\hat{h})$ and $k\in\mathrm{supp}(\hat{h})$.

Now pick any circuit $\tilde{h}\in\mathcal{F}(\hat{W})$ such that $k\in\mathrm{supp}(\tilde{h})$ and $\mathrm{supp}(\tilde{h})\subseteq\mathrm{supp}(\hat{h})$. Note that $J\cup\{k\}$ is independent, as $J\cup\{k\}\subseteq C'\setminus\{i\}\subsetneq C'$. Therefore, $\mathrm{supp}(\tilde{h})\cap\mathrm{supp}(g)\neq\emptyset$. Hence, for $T:=C\cup\mathrm{supp}(\tilde{h})$ we have that $\mathcal{M}(W_T)$ is non-separable. In particular, there exists a circuit $h'\in\mathcal{F}(W_T)$ such that $i,k\in\mathrm{supp}(h')$. As $T\subseteq(C\cup C')\setminus\{\ell\}$, this is a contradiction to the minimal choice of $C'$. ∎

As $\mathrm{supp}(h_G)\cup\mathrm{supp}(g_G)=G$ and $\mathrm{supp}(h_G)\cap\mathrm{supp}(g_G)\neq\emptyset$, we have that $\mathcal{M}(\hat{W})$ is non-separable. Further, $|\mathrm{supp}(g_G)|=|G|-1$, so we can apply Lemma 3.22 to learn $\delta\in\mathcal{K}_{ij}^{\hat{W}}\subseteq\mathcal{K}_{ik}^{\hat{W}}\cdot\mathcal{K}_{kj}^{\hat{W}}$. We can conclude $\delta\in\mathcal{K}_{ik}^W\cdot\mathcal{K}_{kj}^W$ from Lemma 3.16. ∎

If $\kappa_W=1$, then the reverse inclusion $\mathcal{K}_{ik}^W\cdot\mathcal{K}_{kj}^W\subseteq\mathcal{K}_{ij}^W$ trivially holds, since $1$ is the only element in these sets. In Proposition 5.4, we give a necessary and sufficient condition for $\mathcal{K}_{ij}^W=\mathcal{K}_{ik}^W\cdot\mathcal{K}_{kj}^W$.

One may ask under which circumstances an element $\alpha\in\mathcal{K}_{ik}^W\cdot\mathcal{K}_{kj}^W$ is also contained in $\mathcal{K}_{ij}^W$. We give a partial answer by stating a sufficient condition in a restrictive setting. For a basis $B$ of $\mathcal{M}(W)$, recall $\mathcal{F}_B(W^\perp)$ from Lemma 3.12. Then, Lemmas 3.12 and 3.13 together imply:

Lemma 3.23.

Let $B\subseteq[n]$ be a basis in $\mathcal{M}(W)$ and let $g,h\in\mathcal{F}_B(W^\perp)\subseteq\mathcal{F}(W^\perp)$ be such that $i\in\mathrm{supp}(g)\cap B$, $j\in\mathrm{supp}(h)\cap B$ and $k\in\mathrm{supp}(g)\cap\mathrm{supp}(h)$. Then $|h_j/h_k|\cdot|g_k/g_i|\in\mathcal{K}_{ij}^W$.

4 Connections to other condition numbers

4.1 The condition number $\bar{\chi}$ and the lifting operator

For a full row rank matrix $A\in\mathbb{R}^{m\times n}$, the condition number $\bar{\chi}_A$ can be defined in the following two equivalent ways:

\begin{aligned} \bar{\chi}_A&=\sup\left\{\left\|A^\top\left(ADA^\top\right)^{-1}AD\right\|\,:D\in\mathbf{D}_n\right\}\\ &=\sup\left\{\frac{\lVert A^\top y\rVert}{\lVert p\rVert}:\ y\text{ minimizes }\lVert D^{1/2}(A^\top y-p)\rVert\text{ for some }0\neq p\in\mathbb{R}^n\text{ and }D\in\mathbf{D}_n\right\}. \end{aligned} \qquad (5)

This condition number was first studied by Dikin [Dik67], Stewart [Ste89], and Todd [Tod90]. There is an extensive literature on the properties and applications of χ¯A\bar{\chi}_{A}, as well as its relations to other condition numbers. In particular, it plays a key role in layered-least-squares interior point methods, see Section 7.2. We refer the reader to the papers [HT02, MT03, VY96] for further results and references.

It is important to note that—similarly to κA\kappa_{A} and κ˙A\dot{\kappa}_{A}χ¯A\bar{\chi}_{A} only depends on the subspace W=ker(A)W=\ker(A). Hence, we can also write χ¯W\bar{\chi}_{W} for a subspace WnW\subseteq\mathbb{R}^{n}, defined to be equal to χ¯A\bar{\chi}_{A} for some matrix Ak×nA\in\mathbb{R}^{k\times n} with W=ker(A)W=\ker(A). We will use the notations χ¯A\bar{\chi}_{A} and χ¯W\bar{\chi}_{W} interchangeably. The following characterization reveals the connection between κA\kappa_{A} and χ¯A\bar{\chi}_{A}.

Proposition 4.1 ([TTY01]).

For a full row rank matrix Am×nA\in\mathbb{R}^{m\times n},

χ¯A=max{AB1A:AB is a non-singular m×m-submatrix of A}.\bar{\chi}_{A}=\max\{\|A_{B}^{-1}A\|:A_{B}\text{ is a non-singular $m\times m$-submatrix of }A\}\,.

Together with Proposition 3.1, this shows that the difference between χ¯A\bar{\chi}_{A} and κA\kappa_{A} is in using 2\ell_{2} instead of \ell_{\infty} norm. This immediately implies the upper bound and a slightly weaker lower bound in the next theorem.

Theorem 4.2 ([DHNV20, DNV20]).

For a matrix Am×nA\in\mathbb{R}^{m\times n} we have 1+κA2χ¯AnκA\sqrt{1+\kappa_{A}^{2}}\leq\bar{\chi}_{A}\leq n\kappa_{A}.
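Both quantities can be evaluated by brute force on tiny instances: Proposition 4.1 expresses χ̄_A as the maximum operator 2-norm of A_B^{-1}A over non-singular m×m submatrices A_B, and the remark after Proposition 4.1 suggests the analogous max-entry characterization of κ_A. A minimal Python sketch under these assumptions (exponential in n, for illustration only):

import numpy as np
from itertools import combinations

def chi_bar_and_kappa(A, tol=1e-9):
    # brute force over all m x m column submatrices; only sensible for very small matrices
    m, n = A.shape
    chi_bar, kappa = 0.0, 0.0
    for B in combinations(range(n), m):
        AB = A[:, B]
        if abs(np.linalg.det(AB)) < tol:
            continue                                  # skip singular submatrices
        M = np.linalg.solve(AB, A)                    # A_B^{-1} A
        chi_bar = max(chi_bar, np.linalg.norm(M, 2))  # operator 2-norm
        kappa = max(kappa, np.abs(M).max())           # largest absolute entry
    return chi_bar, kappa

# small example in basis form; Theorem 4.2 predicts sqrt(1 + kappa^2) <= chi_bar <= n * kappa
A = np.array([[1.0, 0.0, 1.0, 2.0],
              [0.0, 1.0, 1.0, 1.0]])
print(chi_bar_and_kappa(A))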

Approximating the condition number χ̄_A is known to be hard; by the same token, κ̄_A cannot be approximated within any polynomial factor either. The proof relies on the hardness of approximating the minimum subdeterminant, shown by Khachiyan [Kha95].

Theorem 4.3 (Tunçel [Tun99]).

Approximating χ¯A\bar{\chi}_{A} up to a factor of 2poly(n)2^{\operatorname{poly}(n)} is NP-hard.

In connection with χ¯A\bar{\chi}_{A}, it is worth mentioning the lifting map, a key concept in the algorithms presented in Section 7. The map LIW:πI(W)WL_{I}^{W}:\pi_{I}(W)\to W lifts back a vector from a coordinate projection of WW to a minimum-norm vector in WW:

LIW(p)=argmin{z:zI=p,zW}.L_{I}^{W}(p)=\arg\min\left\{\|z\|:z_{I}=p,z\in W\right\}.

Note that LIWL_{I}^{W} is the unique linear map from πI(W)\pi_{I}(W) to WW such that LIW(p)I=pL_{I}^{W}(p)_{I}=p and LIW(p)L_{I}^{W}(p) is orthogonal to W[n]InW\cap\mathbb{R}^{n}_{[n]\setminus I}. The condition number χ¯W\bar{\chi}_{W} can be equivalently defined as the maximum norm of any lifting map for an index subset.
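Computationally, the lifting map is easy to evaluate from an orthonormal basis of W: if the columns of Q are orthonormal and span W, then the minimum-norm lift of p is Qc for the minimum-norm solution c of Q_I c = p. A small numpy sketch (the helper names are ours):

import numpy as np

def kernel_basis(A, tol=1e-12):
    # orthonormal basis of W = ker(A), as columns, obtained from the SVD
    _, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    return Vt[rank:].T

def lift(A, I, p):
    # L_I^W(p): minimum-norm z in W = ker(A) with z_I = p (p is assumed to lie in pi_I(W))
    Q = kernel_basis(A)
    c = np.linalg.pinv(Q[I, :]) @ p   # minimum-norm coefficients; ||Qc|| = ||c|| as Q is orthonormal
    return Q @ c

# example: W = ker([1, 1, 1]); the unique lift of p = (1, -1) on I = {0, 1} is (1, -1, 0)
print(lift(np.array([[1.0, 1.0, 1.0]]), [0, 1], np.array([1.0, -1.0])))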

Proposition 4.4 ([DHNV20, O’L90, Ste89]).

For a linear subspace WnW\subseteq\mathbb{R}^{n},

χ¯W=max{LIW:I[n],I}.\bar{\chi}_{W}=\max\left\{\|L_{I}^{W}\|\,:{I\subseteq[n]},I\neq\emptyset\right\}\,.

Even though LIWL_{I}^{W} is defined with respect to the 2\ell_{2}-norm, it can also be used to characterize κW\kappa_{W}.

Proposition 4.5 ([DNV20]).

For a linear subspace WnW\subseteq\mathbb{R}^{n},

κW=max{LIW(p)p1:I[n],I,pπI(W){0}}.\kappa_{W}=\max\left\{\frac{\|L_{I}^{W}(p)\|_{\infty}}{\|p\|_{1}}\,:{I\subseteq[n]},I\neq\emptyset,p\in\pi_{I}(W)\setminus\{0\}\right\}\,.
Proof.

We first show that for any II\neq\emptyset, and pπI(W){0}p\in\pi_{I}(W)\setminus\{0\}, LIW(p)κWp1\|L_{I}^{W}(p)\|_{\infty}\leq\kappa_{W}\|p\|_{1} holds. Let z=LIW(p)z=L_{I}^{W}(p), and take a conformal decomposition z=k=1hgkz=\sum_{k=1}^{h}g^{k} as in Lemma 2.1. For each k[h]k\in[h], let Ck=supp(gk)C_{k}=\mathrm{supp}(g^{k}). We claim that all these circuits must intersect II. Indeed, assume for a contradiction that one of them, say C1C_{1} is disjoint from II, and let z=k=2hgkz^{\prime}=\sum_{k=2}^{h}g^{k}. Then, zWz^{\prime}\in W and zI=zI=pz^{\prime}_{I}=z_{I}=p. Thus, zz^{\prime} also lifts pp to WW, but z2<z2\|z^{\prime}\|_{2}<\|z\|_{2}, contradicting the definition of z=LIW(p)z=L_{I}^{W}(p) as the minimum-norm lift of pp.

By the definition of κW\kappa_{W}, gkκWgIk1\|g^{k}\|_{\infty}\leq\kappa_{W}\|g^{k}_{I}\|_{1} for each k[h]k\in[h]. The claim follows since p=zI=k=1hgIkp=z_{I}=\sum_{k=1}^{h}g^{k}_{I}, moreover, conformity guarantees that p1=k=1hgIk1\|p\|_{1}=\sum_{k=1}^{h}\|g^{k}_{I}\|_{1}. Therefore,

zk=1hgkκWk=1hgIk1=κWp1.\|z\|_{\infty}\leq\sum_{k=1}^{h}\|g^{k}\|_{\infty}\leq\kappa_{W}\sum_{k=1}^{h}\|g^{k}_{I}\|_{1}=\kappa_{W}\|p\|_{1}\,.

We have thus shown that the maximum value in the statement is at most κW\kappa_{W}. To show that equality holds, let C𝒞WC\in\mathcal{C}_{W} be the circuit and gCWg^{C}\in W the corresponding elementary vector and i,jCi,j\in C such that κW=|gjC/giC|\kappa_{W}=|g^{C}_{j}/g^{C}_{i}|.

Let us set I=([n]C){i}I=([n]\setminus C)\cup\{i\}, and define pk=0p_{k}=0 if k[n]Ck\in[n]\setminus C and pi=giCp_{i}=g^{C}_{i}. Then pπI(W)p\in\pi_{I}(W), and the unique extension to WW is gCg^{C}; thus, LIW(p)=gCL_{I}^{W}(p)=g^{C}. We have LIW(p)=|gjC|\|L_{I}^{W}(p)\|_{\infty}=|g^{C}_{j}|. Noting that p1=|giC|\|p\|_{1}=|g^{C}_{i}|, it follows that κW=LIW(p)/p1\kappa_{W}=\|L_{I}^{W}(p)\|_{\infty}/\|p\|_{1}. ∎

4.2 The condition number δ\delta and bounds on diameters of polyhedra

Another related condition number is δ\delta, defined as follows:

Definition 4.6.

Let VnV\subseteq\mathbb{R}^{n} be a set of vectors. Then δV\delta_{V} is the largest value such that for any set of linearly independent vectors {vi:iI}V\{v_{i}:\,i\in I\}\subseteq V and λI\lambda\in\mathbb{R}^{I},

iIλiviδVmaxiI|λi|vi.\left\|\sum_{i\in I}\lambda_{i}v_{i}\right\|\geq\delta_{V}\max_{i\in I}|\lambda_{i}|\cdot\|v_{i}\|\,.

For a matrix Mm×nM\in\mathbb{R}^{m\times n}, we let δM\delta_{M} denote the value associated with the rows M1,M2,,MmM^{1},M^{2},\ldots,M^{m} of MM.

This can be equivalently characterized as follows: for any subset {v_i : i∈I}⊆V and any v_j∈V with v_j∉W=span({v_i : i∈I}), the sine of the angle between the vector v_j and the subspace W is at least δ_V (see e.g. [DVZar] for the equivalence).
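On very small instances, this characterization can be evaluated directly by enumerating subsets (a brute-force sketch; the function below is our own illustration and is exponential):

import numpy as np
from itertools import combinations

def delta_brute_force(V, tol=1e-12):
    # rows of V are the vectors; returns the minimum sine of the angle between some v_j and
    # the span of a subset of the other vectors not containing v_j, i.e., delta_V as above
    m = V.shape[0]
    best = np.inf
    for j in range(m):
        vj = V[j]
        others = [i for i in range(m) if i != j]
        for r in range(1, m):
            for I in combinations(others, r):
                B = V[list(I)]
                coeff, *_ = np.linalg.lstsq(B.T, vj, rcond=None)
                resid = vj - B.T @ coeff              # component of v_j orthogonal to span(B)
                if np.linalg.norm(resid) <= tol * np.linalg.norm(vj):
                    continue                          # v_j lies in the span; not counted
                best = min(best, np.linalg.norm(resid) / np.linalg.norm(vj))
    return best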

A line of work studied this condition number in the context of the simplex algorithm and diameter bounds. The diameter of a polyhedron PP is the diameter of the vertex-edge graph associated with PP; Hirsch’s famous conjecture from 1957 asserted that the diameter of a polytope (a bounded polyhedron) in nn dimensions with mm facets is at most mnm-n. This was disproved by Santos in 2012 [San12], but the polynomial Hirsch conjecture, i.e., a poly(n,m)(n,m) diameter bound remains wide open.

Consider the LP in standard inequality form with nn variables and mm constraints as

maxc,xs.t.xP,P={xn:Axb},\max\left\langle c,x\right\rangle\,\mathrm{s.t.}\,x\in P\,,\quad P=\{x\in\mathbb{R}^{n}:\,Ax\leq b\}\,, (6)

for Am×nA\in\mathbb{R}^{m\times n}, bmb\in\mathbb{R}^{m}. Using a randomized dual simplex algorithm, Dyer and Frieze [DF94] showed the polynomial Hirsch conjecture for TU matrices. Bonifas et al. [BDSE+14] strengthened and extended this to the bounded subdeterminant case, showing a diameter bound of O(n4ΔA2log(nΔA))O(n^{4}\Delta_{A}^{2}\log(n\Delta_{A})) for integer constraint matrices Am×nA\in\mathbb{Z}^{m\times n}. Note that this is independent of the number of constraints mm.

Brunsch and Röglin [BR13] analyzed the shadow vertex simplex algorithm in terms of the condition number δA\delta_{A}, noting that for integer matrices δA1/(nΔA2)\delta_{A}\geq 1/(n\Delta_{A}^{2}). They gave a diameter bound O(mn2/δA2)O(mn^{2}/\delta^{2}_{A}). Eisenbrand and Vempala [EV17] used a different approach to derive a bound poly(n,1/δA)(n,1/\delta_{A}) that is independent of mm. Dadush and Hähnle [DH16] further improved these bounds to O(n3log(n/δA)/δA)O(n^{3}\log(n/\delta_{A})/\delta_{A}).

In recent work, Dadush et al. [DVZar] considered (6) in the oracle model, where for each point xnx\in\mathbb{R}^{n}, the oracle returns xPx\in P or a violated inequality ai,xbi\left\langle a_{i},x\right\rangle\leq b_{i} from the system AxbAx\leq b. Their algorithm finds exact primal and dual solutions using O(n2log(n/δM))O(n^{2}\log(n/\delta_{M})) oracle calls, where M=(𝟎1Ab)M=\begin{pmatrix}\mathbf{0}&1\\ A&b\end{pmatrix}; the running time is independent of the cost function cc. They also show the following relation between κ\kappa and δ\delta:

Lemma 4.7 ([DVZar]).
  1. (i)

    Let Am×nA\in\mathbb{R}^{m\times n} be a matrix with full row rank and m<nm<n, with ai=1\|a_{i}\|=1 for all columns i[n]i\in[n]. Then, κA1/δA\kappa_{A}\leq{1}/{\delta_{A^{\top}}}.

  2. (ii)

    Let Am×nA\in\mathbb{R}^{m\times n} be in basis form A=(Im|A)A=(I_{m}|A^{\prime}). Then, 1/δAmκA2{1}/{\delta_{A^{\top}}}\leq m\kappa_{A}^{2}.

  3. (iii)

    If BB is the basis maximizing |det(AB)||\det(A_{B})|, then for A¯=AB1A\bar{A}=A_{B}^{-1}A, it holds that 1/δA¯mκA{1}/{\delta_{\bar{A}^{\top}}}\leq m\kappa_{A}.

Proof.

Part i: Let g∈ℱ(A) be an elementary vector. Select an arbitrary i∈supp(g), and let J=supp(g)∖{i}. Then, the columns {a_j : j∈J} are linearly independent (by the support minimality of g), and −g_i a_i = ∑_{j∈J} g_j a_j. Thus,

|gi|ai=jJgjajδAmaxjJ|gj|aj,|g_{i}|\cdot\|a_{i}\|=\left\|\sum_{j\in J}g_{j}a_{j}\right\|\geq\delta_{A^{\top}}\max_{j\in J}|g_{j}|\cdot\|a_{j}\|\,,

and using that all columns have unit norm, we get |gj/gi|1/δA|g_{j}/g_{i}|\leq 1/\delta_{A^{\top}} for all jJj\in J. This shows that κA1/δA\kappa_{A}\leq 1/\delta_{A^{\top}}.

Parts ii and iii:

Let A=(I_m|A′) be in basis form, and let α=max_{i∈[n]}‖A_i‖. Let us first show

1/δAmακA.{1}/{\delta_{A^{\top}}}\leq\sqrt{m}\alpha\kappa_{A}\,. (7)

Take any set {Ai:iI}\{A_{i}:\,i\in I\} of linearly independent columns of AA, along with coefficients λI\lambda\in\mathbb{R}^{I}. Without loss of generality, assume |I|=m|I|=m, i.e., II is a basis, by allowing λi=0\lambda_{i}=0 for some coefficients. Let z=iIλiAiz=\sum_{i\in I}\lambda_{i}A_{i}. Then, λ=AI1z\lambda=A_{I}^{-1}z. Lemma 3.3 implies that every column of AI1A_{I}^{-1} has 2-norm at most mκA\sqrt{m}\kappa_{A}. Hence, |λi|mκAz|\lambda_{i}|\leq\sqrt{m}\kappa_{A}\|z\| holds for all iIi\in I, implying (7).

Then, part ii follows since ‖A_i‖≤√m·κ_A by Proposition 3.1. For part iii, let B be a basis maximizing |det(A_B)|, and let Ā=A_B^{-1}A. Then, every entry of Ā has absolute value at most 1: otherwise, if |Ā_{ij}|>1, exchanging the basis element indexing row i for column j would yield a larger subdeterminant. Hence every column of Ā has 2-norm at most √m, and applying (7) to Ā (whose kernel is ker(A), so κ_{Ā}=κ_A) gives part iii. ∎

Using this correspondence between δ and κ, we can derive from [DH16] the following bound on the diameter of polyhedra in standard equality form. This verifies the polynomial Hirsch conjecture whenever κ_A is polynomially bounded.

Theorem 4.8.

Consider a polyhedron in the standard equality form

P={xn:Ax=b,x0}P=\{x\in\mathbb{R}^{n}:\,Ax=b,x\geq 0\}

for Am×nA\in\mathbb{R}^{m\times n} and bmb\in\mathbb{R}^{m}. Then, the diameter of PP is at most O((nm)3mκAO((n-m)^{3}m\kappa_{A} log(κA+n))\log(\kappa_{A}+n)).

Proof.

Without loss of generality, we can assume that A has full row rank. Changing to a standard basis representation changes neither the geometry (in particular, the diameter) of P nor the value of κ_A. Let B be a basis maximizing |det(A_B)|, and let us replace A by A_B^{-1}A; w.l.o.g. assume that B is the set of the last m columns. Hence, A=(A′|I_m) for A′∈ℝ^{m×(n−m)}. According to Lemma 3.6, P has the same diameter as P′ defined as

P={xnm:Axb,x0},P^{\prime}=\{x^{\prime}\in\mathbb{R}^{n-m}:\,A^{\prime}x^{\prime}\leq b,x^{\prime}\geq 0\}\,,

in other words, P={xnm:Cxd}P^{\prime}=\{x^{\prime}\in\mathbb{R}^{n-m}:\,Cx^{\prime}\leq d\}, where C=(InmA)C={-I_{n-m}\choose A^{\prime}} and d=(0b)d={0\choose b}. There is a one-to-one correspondence between the vertices and edges of PP and PP^{\prime}, and hence, the two polyhedra have the same diameter. Thus, [DH16] gives a bound O((nm)3log(n/δC)/δC)O((n-m)^{3}\log(n/\delta_{C})/\delta_{C}) on the diameter of PP^{\prime}. By the choice of BB, from Lemma 4.7iii, we obtain the diameter bound O((nm)3mκCO((n-m)^{3}m\kappa_{C^{\top}} log(κC+n))\log(\kappa_{C^{\top}}+n)). We claim that κC=κA\kappa_{C^{\top}}=\kappa_{A}. Indeed, the kernels of A=(A|Im)A=(A^{\prime}|I_{m}) and C=(Inm|(A))C^{\top}=(-I_{n-m}|(A^{\prime})^{\top}) represent orthogonal complements, thus κC=κA\kappa_{C^{\top}}=\kappa_{A} by Proposition 3.14. This completes the proof. ∎

The diameter bound in [DH16] is proved constructively, using the shadow simplex method. However, the proof above chooses B maximizing |det(A_B)|, a computational problem that is hard to solve even approximately [DSEFM14]. Fortunately, we do not actually require a (near-)maximizing subdeterminant. For the argument, we only need to find a basis B⊆[n] such that, for Ā=A_B^{-1}A, ‖Ā‖_∞≤μ for some constant μ>1. Then, (7) gives 1/δ_{Ā^⊤}≤mμκ_A.

Such a basis B corresponds to an approximate local subdeterminant maximization, and can be found using the following simple algorithm proposed by Knuth [Knu85]. As long as there is an entry of Ā=A_B^{-1}A with |Ā_{ij}|>μ, exchanging the basis element indexing row i for column j increases |det(A_B)| by a factor |Ā_{ij}|>μ. Using that |det(A_B)|≤(κ̇_W)^m by Proposition 3.19, the algorithm terminates in O(m·log(κ̇_W/μ)) iterations.
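A sketch of this pivoting procedure (our own illustration, with a greedy initial basis and naive linear algebra):

import numpy as np

def locally_maximizing_basis(A, mu=2.0, max_swaps=10000):
    # returns a basis B (column indices) such that every entry of A_B^{-1} A is at most mu in
    # absolute value, following Knuth's swapping rule; A is assumed to have full row rank
    m, n = A.shape
    B = []
    for j in range(n):                                # greedy initial basis
        if len(B) < m and np.linalg.matrix_rank(A[:, B + [j]]) > len(B):
            B.append(j)
    for _ in range(max_swaps):
        Abar = np.linalg.solve(A[:, B], A)            # A_B^{-1} A
        i, j = np.unravel_index(np.argmax(np.abs(Abar)), Abar.shape)
        if abs(Abar[i, j]) <= mu:
            return B, Abar
        B[i] = j        # |det(A_B)| grows by the factor |Abar[i, j]| > mu, so this terminates
    raise RuntimeError("iteration cap reached; this should not happen for reasonable inputs")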

We note that δ_A has also been studied in the context of lattice basis reduction by Seysen [Sey93]. A related quantity has been used to characterize Hoffman constants (introduced in Section 6), see [GHR95, KT95, PVZ20].

5 Optimizing circuit imbalances

Recall that 𝐃n\mathbf{D}_{n} is the set of n×nn\times n positive definite diagonal matrices. For every D𝐃nD\in\mathbf{D}_{n}, ADAD represents a column rescaling. This is a natural symmetry in linear programming, and particularly relevant in the context of interior point algorithms, as discussed in Section 7.2.

The condition number κAD\kappa_{AD} may vastly differ from κA\kappa_{A}. In terms of the subspace W=ker(A)W=\ker(A), this amounts to rescaling the subspace by D1D^{-1}; we denote this by D1WD^{-1}W. It is natural to ask for the best possible value that can be achieved by rescaling:

κW=inf{κDW:D𝐃n}.\kappa_{W}^{*}=\inf\left\{\kappa_{DW}:\,D\in\mathbf{D}_{n}\right\}\,.

In most algorithmic and polyhedral results in this paper, the κW\kappa_{W} dependence can be replaced by κW\kappa^{*}_{W} dependence. For example, the diameter bound in Theorem 4.8 is true in the stronger form with κW\kappa^{*}_{W}, since the diagonal rescaling maintains the geometry of the polyhedron.

Even the value κ*_W can be arbitrarily large. As an example, let W⊆ℝ^4 be defined as W=span{(0,1,1,M),(1,0,M,1)}. Both generating vectors are elementary, and we see that κ_W=M. Rescaling the third and fourth coordinates has opposite effects on the two elementary vectors; therefore we also have κ*_W=M, i.e., the original subspace is already optimally rescaled.

A key result in [DHNV20] shows that an approximately optimal rescaling can be found:

Theorem 5.1 ([DHNV20]).

There is an O(n2m2+n3)O(n^{2}m^{2}+n^{3}) time algorithm that for any matrix Am×nA\in\mathbb{R}^{m\times n}, computes an estimate ξ\xi of κA\kappa_{A} such that

ξκA(κA)2ξ\xi\leq\kappa_{A}\leq(\kappa_{A}^{*})^{2}\xi

and a D𝐃nD\in\mathbf{D}_{n} such that

κAκAD(κA)3.\kappa^{*}_{A}\leq\kappa_{AD}\leq(\kappa_{A}^{*})^{3}\,.

This is in surprising contrast with the inapproximability result of Theorem 4.3. Note that there is no contradiction, since the approximation factor (κ*_A)^2 is not bounded by 2^{poly(n)} in general.

The key idea of the proof of Theorem 5.1 is to analyze the pairwise imbalances κij=κijW\kappa_{ij}=\kappa^{W}_{ij} introduced in Section 3.3. In the 4-dimensional example above, we have κ34=κ43=M\kappa_{34}=\kappa_{43}=M. Let D𝐃D\in{\mathbf{D}} and let dnd\in\mathbb{R}^{n} denote the diagonal elements; i.e. the rescaling multiplies the ii-th coordinate of every wWw\in W by did_{i}. Then, we can see that κijDW=κijdj/di\kappa^{DW}_{ij}=\kappa_{ij}d_{j}/d_{i}. In particular, for any pair of variables ii and jj, κijDWκjiDW=κijκji\kappa^{DW}_{ij}\kappa^{DW}_{ji}=\kappa_{ij}\kappa_{ji}. Consequently, we get a lower bound κijκji(κW)2\kappa_{ij}\kappa_{ji}\leq(\kappa^{*}_{W})^{2}.

Theorem 5.1 is based on a combinatorial min-max characterization that extends this idea. For the rest of this section, let us assume that the matroid (W){\cal M}(W) is non-separable. In case it is separable, we can obtain κW\kappa^{*}_{W} by taking a maximum over the non-separable components.

Let G=([n],E)G=([n],E) be the complete directed graph on nn vertices with edge weights κij\kappa_{ij}. Since (W){\cal M}(W) is assumed to be non-separable, Proposition 2.2 implies that κij>0\kappa_{ij}>0 for any i,j[n]i,j\in[n]. We will refer to this weighted digraph as the circuit ratio digraph.

Let HH be a cycle in GG, that is, a sequence of indices i1,i2,,ik,ik+1=i1i_{1},i_{2},\dots,i_{k},i_{k+1}=i_{1}. We use |H|=k|H|=k to denote the length of the cycle. (In this terminology, cycles refer to objects in GG, whereas circuits to objects in 𝒞W\mathcal{C}_{W}.)

We use the notation κ(H)=κW(H)=j=1kκijij+1W\kappa(H)=\kappa_{W}(H)=\prod_{j=1}^{k}\kappa^{W}_{i_{j}i_{j+1}}. The observation for length-2 cycles remains valid in general: κ(H)\kappa(H) is invariant under any rescaling. This leads to the lower bound (κ(H))1/|H|κW(\kappa(H))^{1/|H|}\leq\kappa^{*}_{W}. The best of these bounds turns out to be tight:

Theorem 5.2 ([DHNV20]).

For a subspace WnW\subseteq\mathbb{R}^{n}, we have

κW=max{κW(H)1/|H|:H is a cycle in G}.\kappa_{W}^{*}=\max\left\{\kappa_{W}(H)^{1/|H|}:\ \mbox{$H$ is a cycle in $G$}\right\}\,.

The proof relies on the following formulation:

κW=mintκijdj/dit(i,j)E,d>0.\displaystyle\begin{aligned} \kappa^{*}_{W}=&&\min\;&t\\ &&\kappa_{ij}d_{j}/d_{i}&\leq t\quad\forall(i,j)\in E,\\ &&d&>0.\end{aligned} (8)

Taking logarithms and substituting zi=logdiz_{i}=\log d_{i}, we can rewrite this problem equivalently as

min\displaystyle\min s\displaystyle s (9)
logκij+zjzi\displaystyle\log\kappa_{ij}+z_{j}-z_{i} s(i,j)E,\displaystyle\leq s\quad\forall(i,j)\in E,
z\displaystyle z n.\displaystyle\in\mathbb{R}^{n}.

This is the dual of the minimum-mean cycle problem with weights logκij\log\kappa_{ij}, and can be solved in polynomial time (see e.g. [AMO93, Theorem 5.8]).

Whereas this formulation verifies Theorem 5.2, it does not give a polynomial-time algorithm to compute κW\kappa^{*}_{W}. In fact, the values κij\kappa_{ij} are already NP-hard to approximate due to Theorem 4.3. Nevertheless, the bound κijκji(κW)2\kappa_{ij}\kappa_{ji}\leq(\kappa^{*}_{W})^{2} implies that for any elementary vector gCg^{C} with support i,jCi,j\in C, we have

κij(κW)21κji|gjgi|κij.\frac{\kappa_{ij}}{(\kappa^{*}_{W})^{2}}\leq\frac{1}{\kappa_{ji}}\leq\left|\frac{g_{j}}{g_{i}}\right|\leq\kappa_{ij}\,. (10)

To find an efficient algorithm as in Theorem 5.1, we replace the exact values κij\kappa_{ij} by estimates κ^ij\hat{\kappa}_{ij} obtained as |gjC/giC||g^{C}_{j}/g^{C}_{i}| for an arbitrary circuit C𝒞WC\in\mathcal{C}_{W} with i,jCi,j\in C; these can be obtained using standard techniques from linear algebra and matroid theory. Thus, we can return ξ=max(i,j)Eκ^ij\xi=\max_{(i,j)\in E}\hat{\kappa}_{ij} as the estimate on the value of κA\kappa_{A}. To estimate κA\kappa^{*}_{A}, we solve (9) with the estimates κ^ij\hat{\kappa}_{ij} in place of the κij\kappa_{ij}’s.
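To make the last step concrete, here is one way to solve (9) numerically from given estimates κ̂_ij: for a fixed s the constraints form a system of difference constraints, whose feasibility can be tested by Bellman–Ford, and a binary search over s approximates the optimum. (This is our own sketch; it replaces an exact minimum-mean cycle computation by a simple numerical search and assumes n ≥ 2.)

import numpy as np

def rescaling_from_estimates(kappa_hat, iters=60):
    # kappa_hat: (n x n) array of positive estimates of the pairwise circuit imbalances.
    # Returns (t, d): t = e^s approximates kappa*_W, and d = e^z is the corresponding rescaling.
    n = kappa_hat.shape[0]
    kh = kappa_hat.astype(float).copy()
    np.fill_diagonal(kh, 1.0)                         # diagonal is irrelevant; avoid log(0)
    w = np.log(kh)
    edges = [(i, j) for i in range(n) for j in range(n) if i != j]

    def potentials(s):
        # Bellman-Ford on edge lengths s - w[i, j]; returns z if feasible, None on a negative cycle
        z = np.zeros(n)
        for _ in range(n + 1):
            updated = False
            for (i, j) in edges:
                if z[i] + s - w[i, j] < z[j] - 1e-12:
                    z[j] = z[i] + s - w[i, j]
                    updated = True
            if not updated:
                return z
        return None

    lo = min(w[i, j] for (i, j) in edges)             # every cycle mean lies in [lo, hi]
    hi = max(w[i, j] for (i, j) in edges)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if potentials(mid) is None:
            lo = mid                                  # infeasible: some cycle has a larger mean
        else:
            hi = mid
    return np.exp(hi), np.exp(potentials(hi))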

5.1 Perfect balancing: κW=1\kappa^{*}_{W}=1

Let us now show that κA=1\kappa^{*}_{A}=1 can be efficiently checked.

Theorem 5.3.

There exists a strongly polynomial algorithm that, given a matrix A∈ℝ^{m×n}, returns one of the following outcomes:

  1. (a)

    A diagonal matrix D𝐃nD\in{\mathbf{D}}_{n} such that κAD=1\kappa_{AD}=1 showing that κA=1\kappa^{*}_{A}=1. The algorithm also returns the exact value of κA\kappa_{A}. Further, if ker(A)\ker(A) is a rational linear space, then we can select DD with integer diagonal entries that divide κ˙A{\dot{\kappa}}_{A}.

  2. (b)

    The answer κA>1\kappa^{*}_{A}>1, along with a cycle of circuits HH such that κA(H)>1\kappa_{A}(H)>1.

Proof.

As noted above, we can assume without loss of generality that the matroid (W){\cal M}(W) is non-separable, as we can reduce the problem to solving on all connected components separately.

We obtain estimates κ^ij\hat{\kappa}_{ij} for every edge (i,j)(i,j) of the circuit ratio graph using a circuit C𝒞WC\in\mathcal{C}_{W} with i,jCi,j\in C. Assuming that κW=1\kappa^{*}_{W}=1, (10) implies that κ^ij=κij\hat{\kappa}_{ij}=\kappa_{ij} holds and the rescaling factors did_{i} must satisfy

κ^ijdj=dii,j[n].\hat{\kappa}_{ij}d_{j}=d_{i}\quad\forall i,j\in[n]\,. (11)

If this system is infeasible, then using the circuits that provided the estimates κ^ij\hat{\kappa}_{ij}, we can obtain a cycle HH such that κA(H)>1\kappa_{A}(H)>1, that is, outcome b. Let us now assume that (11) is feasible; then it has a unique solution dd up to scalar multiplication. We define D𝐃nD\in\mathbf{D}_{n} with diagonal entries Dii=diD_{ii}=d_{i}.

Since (W){\cal M}(W) is non-separable, we can conclude that κA=1\kappa^{*}_{A}=1 if and only if κAD=1\kappa_{AD}=1. By Theorem 3.4, this holds if and only if A=AB1ADA^{\prime}=A_{B}^{-1}AD is a TU-matrix for any basis BB.

We run Seymour’s algorithm [Sey80] for A′. If it confirms that A′ is TU (certified by a construction sequence), then we return outcome a. In this case, |g_j^C/g_i^C| is the same for any circuit C with i,j∈C; therefore κ_ij=κ̂_ij, and we can return κ_A=max_{(i,j)∈E}κ̂_ij.

Otherwise, Seymour’s algorithm finds a k×k submatrix T of A′ with det(T)∉{0,±1}. As in the proof of Proposition 3.2, we can recover a circuit C in 𝒞_{A′}=𝒞_{AD} with two entries i,j∈C such that |ḡ_j|≠|ḡ_i| for the corresponding elementary vector ḡ∈ℱ(AD). Note that κ̂_ij d_j=d_i for the rescaled estimates. Hence, the circuit C′ with i,j∈C′ that was used to obtain the estimate κ̂_ij, together with C, certifies that κ*_A>1, as required for outcome b.

Finally, if ker(A)\ker(A) is a rational linear space and we concluded κA=1\kappa^{*}_{A}=1, then let us select the solution did_{i} to (11) such that dnd\in\mathbb{Z}^{n} and gcd(d)=1\gcd(d)=1. We claim that diκ˙Wd_{i}\mid\dot{\kappa}_{W} for all ii. Indeed, let k=lcm(d)k=\mathrm{lcm}(d). For each pair i,j[n]i,j\in[n], dj/di=r/qd_{j}/d_{i}=r/q for two integers r,qκ˙Ar,q\mid{\dot{\kappa}}_{A}. Hence, for any prime pp\in\mathbb{P}, νp(k)νp(κ˙A)\nu_{p}(k)\leq\nu_{p}({\dot{\kappa}}_{A}), implying kκ˙Ak\mid{\dot{\kappa}}_{A}. ∎
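The combinatorial core of this proof, solving (11) and either certifying balancedness or exposing a violated cycle, is easy to implement (a sketch with our own interface: estimates are given for both orientations of every edge and the circuit ratio graph is assumed to be connected):

from collections import deque

def balance_or_violation(n, kappa_hat, tol=1e-9):
    # kappa_hat: dict with kappa_hat[(i, j)] = estimate of kappa_ij for both orientations of
    # every edge of the circuit ratio graph; system (11) reads kappa_hat[(i, j)] * d[j] = d[i]
    adj = {i: [] for i in range(n)}
    for (i, j) in kappa_hat:
        adj[i].append(j)
    d, parent, queue = {0: 1.0}, {0: None}, deque([0])
    while queue:                                      # propagate d along a BFS spanning tree
        i = queue.popleft()
        for j in adj[i]:
            if j not in d:
                d[j] = d[i] / kappa_hat[(i, j)]
                parent[j] = i
                queue.append(j)
    for (i, j), k in kappa_hat.items():               # verify all remaining edges
        if abs(k * d[j] - d[i]) > tol * abs(d[i]):
            # the edge (i, j) together with the tree paths from i and j to the root closes a
            # cycle H whose kappa_hat-product differs from 1, certifying kappa*_A > 1
            return "violating edge", (i, j)
    return "balanced", d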

Let WnW\subseteq\mathbb{R}^{n} be a linear space such that (W){\cal M}(W) is non-separable. Recall from Theorem 3.21 that 𝒦ijW𝒦ikW𝒦kjW\mathcal{K}_{ij}^{W}\subseteq\mathcal{K}_{ik}^{W}\cdot\mathcal{K}_{kj}^{W} for all i,j,k[n]i,j,k\in[n]. We now characterize when equality holds for all triples.

Proposition 5.4.

Let WnW\subseteq\mathbb{R}^{n} be a linear space such that (W){\cal M}(W) is non-separable. Then, the following are equivalent:

  1. (i)

    κW=1\kappa^{*}_{W}=1,

  2. (ii)

    |𝒦ijW|=1|\mathcal{K}_{ij}^{W}|=1 for all i,j[n]i,j\in[n],

  3. (iii)

    𝒦ijW=𝒦ikW𝒦kjW\mathcal{K}_{ij}^{W}=\mathcal{K}_{ik}^{W}\cdot\mathcal{K}_{kj}^{W} holds for all distinct i,j,k[n]i,j,k\in[n].

Proof.

i ⇔ ii: Consider any rescaling D∈𝐃_n with diagonal entries d_i=D_ii. Then, 𝒦_ij^{DW}=(d_j/d_i)·𝒦_ij^W. Hence, if κ_{DW}=1 for some D∈𝐃_n, then 𝒦_ij^{DW}={1}, implying |𝒦_ij^W|=1 for every i,j∈[n]. Conversely, if |𝒦_ij^W|>1 for some i,j, then κ_{DW}≠1 for any diagonal rescaling D.

ii \Rightarrow iii: We have 𝒦ijW𝒦ikW𝒦kjW\mathcal{K}_{ij}^{W}\subseteq\mathcal{K}_{ik}^{W}\cdot\mathcal{K}_{kj}^{W} by Theorem 3.21. If all three sets are of size one, then equality must hold.

iii ⇒ i: Let i,j∈[n] be arbitrary but distinct, and let us define

Γij:={κ(H)|H closed walk in G(i,j)E(H)}.\Gamma_{ij}:=\{\kappa(H)|\,\text{$H$ closed walk in $G$, $(i,j)\in E(H)$}\}\,.

Note that either Γij={1}\Gamma_{ij}=\{1\} or Γij\Gamma_{ij} is infinite as any cycle HH can be traversed multiple times to form a closed walk. Note that by (iii) we have for any i,j[n]i,j\in[n] that

Γij{Π(k,)E(H)𝒦k,|H closed walk in G(i,j)E(H)}=𝒦ij𝒦ji.\Gamma_{ij}\subseteq\bigcup\Big{\{}\Pi_{(k,\ell)\in E(H)}\mathcal{K}_{k,\ell}|\,\text{$H$ closed walk in $G$, $(i,j)\in E(H)$}\Big{\}}=\mathcal{K}_{ij}\cdot\mathcal{K}_{ji}. (12)

The set 𝒦ij𝒦ji\mathcal{K}_{ij}\cdot\mathcal{K}_{ji} is finite, implying that Γij={1}\Gamma_{ij}=\{1\}. This, together with Theorem 5.2 gives i. ∎

A surprising finding by Lee [Lee89, Lee90] is that if κ̇_W is an odd prime power, then κ*_W=1 holds. (The statement in the paper is slightly more general, for k-adic subspaces with k>2; the proof is essentially the same.) We first present a proof sketch following the lines of the one in [Lee89, Lee90]. We also present a second, almost self-contained proof, relying only on basic results on TU matrices.

Theorem 5.5 (Lee [Lee89, Lee90]).

If κ̇_W=p^α for some p∈ℙ with p>2 and α∈ℕ, then κ*_W=1.

Proof.

A theorem by Tutte [Tut65] asserts that either W can be represented as the kernel of a unimodular matrix, i.e., κ*_W=1, or W has a minor W′ such that 𝒞(W′)≅𝒞(U_2^4), where U_2^4 is the uniform matroid on four elements whose independent sets are the sets of cardinality at most two. Here, a matroid minor corresponds to iteratively either deleting variables or projecting variables out. In the first case we are done, so let us consider the second case. Note that W′⊂ℝ^4, and by Lemma 3.16 we have 𝒦_ij^{W′}⊆𝒦_ij^W for all i,j∈[4]; in particular, κ̇_{W′}=p^β for some β≤α. An easy consequence of the proof of Proposition 3.18 and the congruence 𝒞(W′)≅𝒞(U_2^4) is that W′ can be represented by a matrix A′ with ker(A′)=W′ of the form

A=[10pγ1pγ201pγ3pγ4]A^{\prime}=\begin{bmatrix}1&0&p^{\gamma_{1}}&p^{\gamma_{2}}\\ 0&1&p^{\gamma_{3}}&p^{\gamma_{4}}\end{bmatrix} (13)

for γi{0}\gamma_{i}\in\mathbb{N}\cup\{0\} and i[4]i\in[4]. Further, by 𝒞(W)𝒞(U24)\mathcal{C}(W^{\prime})\cong\mathcal{C}(U_{2}^{4}) and ΔAκ˙W\Delta_{A^{\prime}}\mid\dot{\kappa}_{W^{\prime}} (Proposition 3.18) we have that

0det(pγ1pγ2pγ3pγ4)=pγ1+γ4pγ2+γ3pβ.0\neq\det\begin{pmatrix}p^{\gamma_{1}}&p^{\gamma_{2}}\\ p^{\gamma_{3}}&p^{\gamma_{4}}\end{pmatrix}=p^{\gamma_{1}+\gamma_{4}}-p^{\gamma_{2}+\gamma_{3}}\mid p^{\beta}. (14)

It is immediate that (14) cannot be fulfilled for p>2: factoring out the smaller of the two powers of p, the remaining factor is p^γ−1 for γ=|γ_1+γ_4−γ_2−γ_3|≥1 (as the determinant is nonzero); this factor is coprime to p, so it can divide a power of p only if p^γ−1=1, which forces p=2. ∎

Alternative Proof of Theorem 5.5.

Let A∈ℝ^{m×n} be such that ker(A)=W and A satisfies the properties in Proposition 3.18, in basis form A=(I_m|A′); for simplicity, assume the identity matrix is in the first m columns. Let G=([n],E(G)) be a directed multigraph associated with A with edge set E(G)=⋃_{k∈[m]}E_k(G), where E_k(G)={(i,j): A_{ki}A_{kj}≠0}. Further, define γ: E(G)→ℝ_+ where, for e=(i,j)∈E_k(G), we let γ(e)=|A_{kj}/A_{ki}|. For a directed cycle C in G we define γ(C):=∏_{e∈E(C)}γ(e).

Claim 5.5.1.

All cycles CC in GG fulfill γ(C)=1\gamma(C)=1.

Proof.

For a contradiction, assume that there exists a cycle C such that γ(C)≠1, and let C be a shortest cycle with this property. Then C has no chord f∈E(G), as otherwise C∪{f} contains two shorter cycles C_1, C_2 such that γ(C_1)γ(C_2)=γ(C)≠1, and so in particular γ(C_1)≠1 or γ(C_2)≠1. This also means that the support of the corresponding submatrix A_{I,J} of A, where I:={i∈[m]: E_i(G)∩E(C)≠∅} and J:=V(C), is exactly the set of non-zeros of an incidence matrix of a cycle. We have det(A_{I,J})≠0 as the corresponding cycle C has γ(C)≠1. Recall the Leibniz determinant formula. As A_{I,J} is supported on the incidence matrix of a cycle, there exist only two bijective maps φ,ψ: I→J, φ≠ψ, for which the products ∏_{i∈I}A_{i,φ(i)} and ∏_{i∈I}A_{i,ψ(i)} are non-vanishing. One of the maps corresponds to traversing the cycle forward, the other to traversing it backwards. As all the nonzero entries of A are powers of p, we therefore have 0≠det(A_{I,J})=±p^α±p^β for some α,β∈ℕ. This contradicts Proposition 3.18iii for p>2. ∎

The above claim implies the existence of a rescaling of rows and columns Ã:=LAR, where L∈𝐃_m and R∈𝐃_n, such that Ã∈{−1,0,1}^{m×n}. If Ã is TU, then we are done by Proposition 3.2, as now κ*_W=1. Otherwise, we use a result by Gomory (see [Cam65] and [Sch98, Theorem 19.3]) that states that any matrix B with entries in {−1,0,1} that is not totally unimodular has a submatrix B′ with |det(B′)|=2. Let I⊆[m] and J⊆[n] be such that |det(Ã_{I,J})|=2. Note that w.l.o.g. the diagonal entries of L and R are of the form p^α for some α∈ℤ. Therefore, |det(A_{I,J})|=∏_{i∈I}L_{ii}^{-1}∏_{j∈J}R_{jj}^{-1}|det(Ã_{I,J})|=2p^β for some β∈ℤ. As |det(A_{I,J})|∈ℕ, we must have β≥0 and 2∣|det(A_{I,J})|. This again contradicts Proposition 3.18iii for p>2. ∎

6 Hoffman proximity theorems

Hoffman’s seminal work [Hof52] has analyzed proximity of LP solutions. Given P={xn:Axb}P=\{x\in\mathbb{R}^{n}:\,Ax\leq b\}, x0nx_{0}\in\mathbb{R}^{n}, and norms .α\|.\|_{\alpha} and .β\|.\|_{\beta}, we are interested in the minimum of xx0α\|x-x_{0}\|_{\alpha} over xPx\in P. Hoffman showed that this can be bounded as Hα,β(A)(Ax0b)+βH_{\alpha,\beta}(A)\|(Ax_{0}-b)^{+}\|_{\beta}, where the Lipschitz-bound Hα,β(A)H_{\alpha,\beta}(A) is a constant that only depends on AA and the norms. Such results are known as Hoffman proximity bounds in the literature and have been extensively studied; we refer the reader to [GHR95, KT95, PVZ20] for references. In particular, they are related to δ\delta studied in Section 4.2, see e.g. [GHR95].

In this section, we show a Hoffman bound H_{∞,1}=κ_W for the system x∈W+d, x≥0. Related bounds using χ̄_A have been shown in [HT02]. We then extend it to proximity results on optimal LP solutions. These will be used in the black-box LP algorithms in Section 7.1, as well as for the improved analysis of the steepest descent circuit augmentation algorithm in Section 8.4.

A central tool in this section is the conformal decomposition into circuits as in Lemma 2.1. The next proof is similar to that of Proposition 4.5.

Lemma 6.1.

If the system xW+d,x0x\in W+d,\,x\geq 0 is feasible, then the system

x\displaystyle x W+d\displaystyle\in W+d
xd\displaystyle\|x-d\|_{\infty} κWd1\displaystyle\leq\kappa_{W}\|d^{-}\|_{1}
x\displaystyle x 0,\displaystyle\geq 0,

is also feasible.

Proof.

Let x be a feasible solution to the system x∈W+d, x≥0, chosen such that ‖x−d‖_∞ is minimal, and subject to that, ‖x−d‖_1 is minimal. Let D={i∈[n]: d_i<0=x_i}.

Take a conformal circuit decomposition of the vector xdWx-d\in W as in Lemma 2.1 in the form xd=k=1tgkx-d=\sum_{k=1}^{t}g^{k} for some t[n]t\in[n]. We claim that supp(gk)D\mathrm{supp}(g^{k})\cap D\neq\emptyset for all k[t]k\in[t]. Indeed, if supp(gk)D=\mathrm{supp}(g^{k})\cap D=\emptyset, then x=xεgkx^{\prime}=x-\varepsilon g^{k} for some ε>0\varepsilon>0 is another solution with xdxd\|x^{\prime}-d\|_{\infty}\leq\|x-d\|_{\infty} and xd1<xd1\|x^{\prime}-d\|_{1}<\|x-d\|_{1}.

Consider any index iDi\notin D. For every elementary vector gkg^{k} with isupp(gk)i\in\mathrm{supp}(g^{k}), there exists an index jDj\in D such that |gik|κW|gjk||g^{k}_{i}|\leq\kappa_{W}|g_{j}^{k}|. By conformity,

|xidi|=|k=1tgik|=k=1t|gik|κWjDk=1t|gjk|=κWjD|dj|κWd1,|x_{i}-d_{i}|=\left|\sum_{k=1}^{t}g^{k}_{i}\right|=\sum_{k=1}^{t}\left|g^{k}_{i}\right|\leq\kappa_{W}\sum_{j\in D}\sum_{k=1}^{t}\left|g^{k}_{j}\right|=\kappa_{W}\sum_{j\in D}|d_{j}|\leq\kappa_{W}\|d^{-}\|_{1}\,,

completing the proof. ∎

We note that [DNV20] also provides a strongly polynomial algorithm that, for a given zW+dz\in W+d, and an estimate κ^\hat{\kappa} on κW\kappa_{W}, either finds a solution as in Lemma 6.1, or finds an elementary vector that reveals κ^>κW\hat{\kappa}>\kappa_{W}.
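The quantity bounded in Lemma 6.1 is itself the optimum of a small linear program, so the bound is easy to check on examples. A sketch using SciPy (which we assume to be available); the right-hand side κ_W‖d^-‖_1 has to be supplied separately, e.g. from a brute-force estimate of κ_W on tiny instances:

import numpy as np
from scipy.optimize import linprog

def min_inf_norm_feasible_point(A, d):
    # computes min ||x - d||_inf over x in W + d = {x : Ax = Ad}, x >= 0, via an auxiliary
    # variable t; Lemma 6.1 asserts that this value is at most kappa_W * ||d^-||_1
    m, n = A.shape
    c = np.zeros(n + 1); c[-1] = 1.0                                 # minimize t
    A_eq = np.hstack([A, np.zeros((m, 1))])
    b_eq = A @ d
    A_ub = np.vstack([np.hstack([np.eye(n), -np.ones((n, 1))]),      #  x_i - t <= d_i
                      np.hstack([-np.eye(n), -np.ones((n, 1))])])    # -x_i - t <= -d_i
    b_ub = np.concatenate([d, -d])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (n + 1), method="highs")
    return res.x[:n], res.fun                                        # a feasible x and ||x - d||_inf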

We next provide proximity results for optimization. For vectors d,cnd,c\in\mathbb{R}^{n}, let us define the set

Λ(d,c):=supp(d)supp(c+).\Lambda(d,c):=\mathrm{supp}(d^{-})\cup\mathrm{supp}(c^{+})\,. (15)

Note that for c0c\geq 0, dΛ(d,c)1=d1+dsupp(c)+1\left\|d_{\Lambda(d,c)}\right\|_{1}=\left\|d^{-}\right\|_{1}+\left\|d_{\mathrm{supp}(c)}^{+}\right\|_{1}. Consequently, if c0c\geq 0 and dΛ(d,c)=0d_{\Lambda(d,c)}=0, then x=dx=d and s=cs=c are optimal primal and dual solutions to LP(W,d,c)(W,d,c).

Lemma 6.2.

If the system LP(W,d,c)(W,d,c) is feasible, bounded and c0c\geq 0, then there is an optimal solution such that

xdκWdΛ(d,c)1.\left\|x-d\right\|_{\infty}\leq\kappa_{W}\left\|d_{\Lambda(d,c)}\right\|_{1}.
Proof.

Similarly to the proof of Lemma 6.1, let x be an optimal solution to LP(W,d,c) chosen such that ‖x−d‖_∞ is minimal, and subject to that, ‖x−d‖_1 is minimal; let D={i∈[n]: d_i<0=x_i}. Take a conformal circuit decomposition x−d=∑_{k=1}^t g^k for some t∈[n]. With a similar argument as in the proof of Lemma 6.1, we can show that for each g^k, either supp(g^k)∩D≠∅, or g^k is an objective-reducing circuit, i.e. ⟨g^k,c⟩<0. Since c≥0, the latter requires that for some i∈supp(g^k), g_i^k<0 and c_i>0; by conformity, this gives d_i>x_i≥0, that is, i∈supp(d^+_{supp(c)}). A similar bound as in the proof of Lemma 6.1 completes the proof. ∎

Lemma 6.3.

Let WnW\subseteq\mathbb{R}^{n} be a subspace and c,dnc,d\in\mathbb{R}^{n}. Let (x~,s)(\tilde{x},s) be an optimal solution to LP(W,x~,c)(W,\tilde{x},c). Then there exists an optimal solution (x,s)(x^{*},s^{*}) to LP(W,d,c)(W,d,c) such that

xx~(κW+1)ΠW(dx~)1.\|x^{*}-\tilde{x}\|_{\infty}\leq(\kappa_{W}+1)\left\|\Pi_{W^{\perp}}(d-\tilde{x})\right\|_{1}\,.
Proof.

Let x=x~+ΠW(dx~)x=\tilde{x}+\Pi_{W^{\perp}}(d-\tilde{x}). Note that W+x=W+dW+x=W+d, and also W+s=W+cW^{\perp}+s=W^{\perp}+c. Thus, the systems LP(W,d,c)(W,d,c) and LP(W,x,s)(W,x,s) define the same problem.

We apply Lemma 6.2 to (W,x,s)(W,x,s). This guarantees the existence of an optimal (x,s)(x^{*},s^{*}) to LP(W,x,s)(W,x,s) such that

xxκWxΛ(x,s)1=κW(x1+xsupp(s)+1).\|x^{*}-x\|_{\infty}\leq\kappa_{W}\|x_{\Lambda(x,s)}\|_{1}=\kappa_{W}\left(\left\|x^{-}\right\|_{1}+\left\|x_{\mathrm{supp}(s)}^{+}\right\|_{1}\right)\ .

Since x~0\tilde{x}\geq 0, we get that x1xsupp(x)x~supp(x)1\|x^{-}\|_{1}\leq\|x_{\mathrm{supp}(x^{-})}-\tilde{x}_{\mathrm{supp}(x^{-})}\|_{1}. Second, by the optimality of (x~,s)(\tilde{x},s), we have x~supp(s+)=0\tilde{x}_{\mathrm{supp}(s^{+})}=0, and thus xsupp(s+)=xsupp(s+)x~supp(s+)x_{\mathrm{supp}(s^{+})}=x_{\mathrm{supp}(s^{+})}-\tilde{x}_{\mathrm{supp}(s^{+})}. These together imply that

xx~\displaystyle\|x^{*}-\tilde{x}\|_{\infty} xx+xx~(κW+1)xx~1\displaystyle\leq\|x^{*}-x\|_{\infty}+\|x-\tilde{x}\|_{\infty}\leq(\kappa_{W}+1)\|x-\tilde{x}\|_{1}
=(κW+1)ΠW(dx~)1.\displaystyle=(\kappa_{W}+1)\|\Pi_{W^{\perp}}(d-\tilde{x})\|_{1}\,.

We can immediately use Lemma 6.3 to derive a conclusion on the support of the optimal dual solutions to LP(W,d,c), using an optimal solution to LP(W,x̃,c).

Theorem 6.4.

Let WnW\subseteq\mathbb{R}^{n} be a subspace and c,dnc,d\in\mathbb{R}^{n}. Let (x~,s)(\tilde{x},s) be an optimal solution to LP(W,x~,c)(W,\tilde{x},c) and

R:={i[n]:x~i>(κW+1)ΠW(x~d)1}.R:=\{i\in[n]:\,\tilde{x}_{i}>(\kappa_{W}+1)\|\Pi_{W^{\perp}}(\tilde{x}-d)\|_{1}\}\,.

Then for every dual optimal solution ss^{*} to LP(W,d,c)(W,d,c), we have sR=0s^{*}_{R}=0.

Proof.

By Lemma 6.3 there exists an optimal solution (x,s)(x^{\prime},s^{\prime}) to LP(W,d,c)(W,d,c) such that xx~(κW+1)ΠW(dx~)1\|x^{\prime}-\tilde{x}\|_{\infty}\leq(\kappa_{W}+1)\|\Pi_{W^{\perp}}(d-\tilde{x})\|_{1}. Consequently, xR>0x^{\prime}_{R}>0, implying sR=0s^{*}_{R}=0 for every dual optimal ss^{*} by complementary slackness. ∎

In Section 8.4, we use a dual version of this theorem, also including upper bound constraints in the primal side. We now adapt the required proximity result to the following primal and dual LPs, and formulate it in matrix language to conform to the algorithm in Section 8.4.

minc,xAx=b,0xu.maxy,bu,tAy+st=c,s,t0.\begin{aligned} \min\;&\left\langle c,x\right\rangle\\ Ax&=b\,,\\ 0\leq x&\leq u\,.\\ \end{aligned}\quad\quad\quad\begin{aligned} \max\;\left\langle y,b\right\rangle-&\left\langle u,t\right\rangle\\ A^{\top}y+s-t&=c\,,\\ s,t&\geq 0\,.\\ \end{aligned} (16)

Note that any y∈ℝ^m induces a feasible dual solution with s_i=(c_i−⟨a_i,y⟩)^+ and t_i=(⟨a_i,y⟩−c_i)^+ for i∈[n]. A primal feasible solution x and y∈ℝ^m are optimal if and only if ⟨a_i,y⟩≤c_i whenever x_i<u_i, and ⟨a_i,y⟩≥c_i whenever x_i>0.

Theorem 6.5.

Let (x,y)(x^{\prime},y^{\prime}) be optimal primal and dual solutions to (16) for input (b,u,c)(b,u,c^{\prime}), and (x′′,y′′)(x^{\prime\prime},y^{\prime\prime}) for input (b,u,c′′)(b,u,c^{\prime\prime}). Let

R0\displaystyle R_{0} :={i[n]:ai,y<ci(κW+1)cc′′1},\displaystyle:=\{i\in[n]:\,\left\langle a_{i},y^{\prime}\right\rangle<c^{\prime}_{i}-(\kappa_{W}+1)\|c^{\prime}-c^{\prime\prime}\|_{1}\}\,,
Ru\displaystyle R_{u} :={i[n]:ai,y>ci+(κW+1)cc′′1}.\displaystyle:=\{i\in[n]:\,\left\langle a_{i},y^{\prime}\right\rangle>c^{\prime}_{i}+(\kappa_{W}+1)\|c^{\prime}-c^{\prime\prime}\|_{1}\}\,.

Then xi′′=0x^{\prime\prime}_{i}=0 for every iR0i\in R_{0} and xi′′=uix^{\prime\prime}_{i}=u_{i} for every iRui\in R_{u}.

Proof.

Let A¯=(A0InIn)\bar{A}=\begin{pmatrix}A&0\\ I_{n}&I_{n}\end{pmatrix}. It is easy to see that κA¯=κA\kappa_{\bar{A}}=\kappa_{A}. Let d¯2n\bar{d}\in\mathbb{R}^{2n} such that A¯d¯=(bu)\bar{A}\bar{d}=\begin{pmatrix}b\\ u\end{pmatrix}. With c¯=(c,0n)\bar{c}=(c,0_{n}), the primal system can be equivalently written as minc¯,x¯\min\left\langle\bar{c},\bar{x}\right\rangle, x¯ker(A¯)+d¯\bar{x}\in\ker(\bar{A})+\bar{d}, x¯0\bar{x}\geq 0. The statement follows by Theorem 6.4 applied for W=(ker(A¯))=im(A¯)W=(\ker(\bar{A}))^{\perp}=\operatorname{im}(\bar{A}^{\top}). ∎

7 Linear programming with dependence on the constraint matrix only

Recent years have seen tremendous progress in the development of more efficient LP algorithms using interior point methods, see e.g. [CLS19, LS19, vdBLN+20, vdBLL+21] and references therein. These algorithms are weakly polynomial, i.e., their running time depends on the encoding length of the input (A,b,c)(A,b,c) of LP(A,b,c)(A,b,c).

A fundamental open problem is the existence of a strongly polynomial LP algorithm; this was listed by Smale as one of the key open problems in mathematics for the 21st century [Sma98]. The number of arithmetic operations of such an algorithm would be polynomial in the number nn of variables and mm of constraints, but independent of the input length.

Towards this end, there is a line of work on developing algorithms with running time depending only on the constraint matrix AA, while removing the dependence on bb and cc. This direction was pioneered by Tardos’s 1985 paper [Tar86], giving an algorithm for LP(A,b,c)(A,b,c) with integral AA that has runtime poly(m,n,logΔA)\operatorname{poly}(m,n,\log\Delta_{A}).

A breakthrough work by Vavasis and Ye [VY96] introduced a Layered Least Squares Interior-Point Method that solves LP(A,b,c)(A,b,c) within O(n3.5log(χ¯A+n))O(n^{3.5}\log(\bar{\chi}_{A}+n)) iterations, each requiring to solve a linear system. Recall from Theorem 4.2 that log(χ¯A+n)=Θ(log(κA+n))\log(\bar{\chi}_{A}+n)=\Theta(\log(\kappa_{A}+n)); also recall from Proposition 3.2 that κAΔA\kappa_{A}\leq\Delta_{A}.

Recently, [DHNV20] improved the Vavasis–Ye bound to O(n2.5log(n)log(χ¯A+n))O(n^{2.5}\log(n)\log(\bar{\chi}_{A}^{*}+n)) linear system solves, where χ¯A\bar{\chi}_{A}^{*} is the optimized version of χ¯A\bar{\chi}_{A}, analogous to κA\kappa^{*}_{A} defined in Section 5. The key insight of this work is using the circuit imbalance measure κA\kappa_{A} as a proxy to χ¯A\bar{\chi}_{A}. These results are discussed in Section 7.2.

Section 7.1 exhibits another recent paper [DNV20] that extends Tardos’s black-box framework to solve LP in runtime poly(m,n,log(κ¯A+n))\operatorname{poly}(m,n,\log(\bar{\kappa}_{A}+n)), based on the proximity results in Section 6. We note that using an initial rescaling as in Theorem 5.1, we can obtain poly(m,n,log(κ¯A+n))\operatorname{poly}(m,n,\log(\bar{\kappa}^{*}_{A}+n)) runtimes from these algorithms.

7.1 A black box algorithm

The LP feasibility and optimization algorithms in [DNV20] rely on a black-box subroutine for approximate LP solutions, and use their outputs to find exact primal (and dual) optimal solutions in time poly(m,n,log(κ¯A+n))\operatorname{poly}(m,n,\log(\bar{\kappa}_{A}+n)). For the black-box, one can use the fast interior-point algorithms cited above.

More precisely, we require a subroutine that returns an approximately feasible and approximately optimal solution x̃ to LP(W,d,c), as specified below, in time poly(n,m)·log((κ̄_A+n)/ε). Here OPT_LP denotes the optimum value of LP(W,d,c).

c,x~\displaystyle\left\langle c,\tilde{x}\right\rangle OPTLP+εcd\displaystyle\leq\operatorname{OPT}_{\mathrm{LP}}+\varepsilon\|c\|\cdot\|d\| (APX-LP)
x~\displaystyle\tilde{x} W+d\displaystyle\in W+d
x~\displaystyle\|\tilde{x}^{-}\| εd\displaystyle\leq\varepsilon\|d\|\,
x~\displaystyle\tilde{x} n\displaystyle\in\mathbb{R}^{n}

The feasibility algorithm makes O(m)O(m) calls, and the optimization algorithm makes O(nm)O(nm) calls to such a subroutine for ε=1/(κ¯A+n)O(1)\varepsilon=1/(\bar{\kappa}_{A}+n)^{O(1)}.

We now give a high-level outline of the feasibility algorithm in [DNV20] for the system x∈W+d, x≥0. For the description, let us assume this system is feasible; in case of infeasibility, the algorithm recovers a Farkas certificate. The main progress step in the algorithm is reducing the dimension m of the linear space W by one, based on information obtained from an approximate solution x̃ to (APX-LP). We can make such an inference using the proximity result Lemma 6.1.

If x~0\tilde{x}\geq 0 for the solution returned by the solver, we can terminate with x=x~x=\tilde{x}. Otherwise, let I[n]I\subseteq[n] denote the set of ‘large’ coordinates of x~\tilde{x}, i.e., where x~i>κWx~1\tilde{x}_{i}>\kappa_{W}\|\tilde{x}^{-}\|_{1}. By Lemma 6.1, there must exist a feasible solution xW+dx\in W+d, x0x\geq 0 such that xix_{i} is still sufficiently large for iIi\in I. Therefore, one can drop the sign-constraint on II, as non-negativity can be enforced automatically. We recurse on πJ(W)\pi_{J}(W) for J=[n]IJ=[n]\setminus I, i.e. project out the variables in II.

Each recursive call decreases the dimension of the subspace, until a feasible vector is found. The feasible solution on the remaining variables now has to be lifted to the variables we projected out, to get a feasible solution to the original problem xW+dx\in W+d, x0x\geq 0. We use the lifting operator LJW(p)L_{J}^{W}(p) introduced in Section 4.1: for pπJ(W)p\in\pi_{J}(W), z=LJW(p)z=L_{J}^{W}(p) is the minimum-norm vector in WW such that zJ=pz_{J}=p. According to Proposition 4.5, zκWp1\|z\|_{\infty}\leq\kappa_{W}\|p\|_{1}; this bound can be used to guarantee that the lifted solution is nonnegative on II.

Algorithm 1 gives a simplified description of the feasibility algorithm of [DNV20]. For simplicity, we ignore the infeasibility case and the details of the subroutine Adjust(d), which may replace d by its projection to W^⊥ in certain cases. This is needed to ensure I≠∅. Further, we omit an additional proximity condition from the approximate system (APX-LP).

Input : Instance of LP(W,d,c)(W,d,c), ε>0\varepsilon>0, with c=0nc=0_{n}.
Output : Feasible solution to the system in Lemma 6.1
1 dAdjust(d)d\leftarrow\texttt{Adjust}(d) ;
2  \triangleright Occasionally applies a projection
3 if dWd\in W then  return 0n0_{n} ;
4 x~APX-LP(W,d,0n,ε)\tilde{x}\leftarrow\ref{sys:near_feas_near_opt}(W,d,0_{n},\varepsilon) ;
5  \triangleright Provided by Black box solver
6 I ← {i : x̃_i ≥ κ_W‖x̃^−‖}, J ← [n]∖I ;
7 zFeasibility-Simplified(πJ(W),dJ,ε)z\leftarrow\texttt{Feasibility-Simplified}(\pi_{J}(W),d_{J},\varepsilon);
return [x~I+[LJW(zx~J)]Iz]\begin{bmatrix}\tilde{x}_{I}+[L_{J}^{W}(z-\tilde{x}_{J})]_{I}\\ z\end{bmatrix}
Algorithm 1 Feasibility-Simplified

As stated here, the computational complexity is dominated by the at most m recursive calls to the solver for the system (APX-LP).
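The recursion can be condensed into a few lines of Python (a sketch only: apx_lp is an assumed black-box oracle for (APX-LP) with c = 0, the value kappa is taken as given, the set I is assumed to be nonempty in every call, and infeasibility as well as the Adjust step are ignored):

import numpy as np

def feasibility_simplified(A, d, kappa, apx_lp, eps):
    n = A.shape[1]
    if np.allclose(A @ d, 0):
        return np.zeros(n)                            # d lies in W, so x = 0 is feasible
    x_t = apx_lp(A, d, eps)                           # approximate solution from the black box
    I = np.where(x_t > kappa * np.linalg.norm(np.minimum(x_t, 0.0), 1))[0]   # 'large' coordinates
    J = np.setdiff1d(np.arange(n), I)
    # a matrix whose kernel is pi_J(W): project the J-columns orthogonally to range(A_I)
    P = A[:, I] @ np.linalg.pinv(A[:, I])
    z = feasibility_simplified(A[:, J] - P @ A[:, J], d[J], kappa, apx_lp, eps)
    x = np.zeros(n)
    x[J] = z
    x[I] = x_t[I] + _lift(A, J, z - x_t[J])[I]        # lift the correction on J back to W
    return x

def _lift(A, J, p):
    # minimum-norm vector z in ker(A) with z_J = p (the lifting map L_J^W of Section 4.1)
    _, s, Vt = np.linalg.svd(A)
    Q = Vt[int(np.sum(s > 1e-12)):].T
    return Q @ (np.linalg.pinv(Q[J, :]) @ p)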

In [DNV20], these techniques are also extended to solve the minimum-cost problem LP(W,d,c) for an arbitrary cost vector c∈ℝ^n. The optimization algorithm makes O(m) recursive calls to the approximate solver to identify a variable x_i that must be 0 in every optimal solution; this is deduced using Theorem 6.4. Repeating this at most n times yields the O(nm) bound on the number of calls mentioned above.

7.2 Layered least squares interior point methods

In this section, we briefly review layered least squares (LLS) interior-point methods, and highlight the role of the circuit imbalance measure κ_W in this context. The central path for the standard log-barrier function is the parametrized curve given by the solutions to the following system for μ∈ℝ_{++}:

Ax\displaystyle Ax =b\displaystyle=b (17)
Ay+s\displaystyle A^{\top}y+s =c\displaystyle=c
xisi\displaystyle x_{i}s_{i} =μi[n]\displaystyle=\mu\;\quad\forall i\in[n]
x,s\displaystyle x,s >0\displaystyle>0

A unique solution exists for each μ>0 whenever LP(A,b,c) possesses strictly feasible primal and dual solutions, i.e., primal resp. dual solutions with x>0 resp. s>0; the duality gap between these solutions is nμ. The limit point at μ→0 gives a pair of primal and dual optimal solutions. At a high level, interior point methods require an initial solution close to the central path for some large μ and proceed by following the central path within some proximity towards smaller and smaller μ, which corresponds to converging to an optimal solution. A standard variant is the Mizuno–Todd–Ye [MTY93] predictor-corrector method. This alternates between predictor and corrector steps. Each predictor step decreases the parameter μ at least by a factor (1−β/√n), but moves further away from the central path. Corrector steps maintain the same μ but restore better centrality.

Let us now focus on the predictor step at a given point (x,s); we use the subspace notation W=ker(A), W^⊥=im(A^⊤) as in LP(W,d,c). The augmentation direction is computed by the affine scaling (AS) step, which can be written as weighted least squares problems on the primal and dual sides:

Δx\displaystyle\Delta x :=argmin{i[n](xi+Δxixi)2:ΔxW}\displaystyle:=\operatorname*{arg\,min}\Big{\{}\sum_{i\in[n]}\Big{(}\frac{x_{i}+\Delta x_{i}}{x_{i}}\Big{)}^{2}:\Delta x\in W\Big{\}} (18)
Δs\displaystyle\Delta s :=argmin{i[n](si+Δsisi)2:ΔsW}\displaystyle:=\operatorname*{arg\,min}\Big{\{}\sum_{i\in[n]}\Big{(}\frac{s_{i}+\Delta s_{i}}{s_{i}}\Big{)}^{2}:\Delta s\in W^{\perp}\Big{\}}

The update is then performed by setting x←x+αΔx, s←s+αΔs for some α∈[0,1]. As such, this algorithm can find ε-approximate solutions in weakly polynomial time. However, it does not even terminate in finitely many iterations, because the weighted ℓ_2-regression problems (18) never set variables exactly to 0, as required by complementary slackness. For standard interior point methods, a final rounding step is therefore required.
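For concreteness, both directions in (18) reduce to a single positive definite linear system each; a minimal numpy sketch (not an efficient or numerically careful implementation):

import numpy as np

def affine_scaling_directions(A, x, s):
    # primal: Delta x = argmin ||X^{-1}(x + dx)|| over dx in ker(A)
    # dual:   Delta s = argmin ||S^{-1}(s + ds)|| over ds in im(A^T), parametrized as ds = A^T dy
    X2 = np.diag(x ** 2)
    dx = -x + X2 @ A.T @ np.linalg.solve(A @ X2 @ A.T, A @ x)
    S2inv = np.diag(1.0 / s ** 2)
    dy = -np.linalg.solve(A @ S2inv @ A.T, A @ (1.0 / s))
    return dx, A.T @ dy                               # A @ dx = 0 and ds in im(A^T) by construction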

The layered least squares interior-point method by Vavasis and Ye [VY96] not only terminates finitely, but has an iteration bound of O(n3.5log(χ¯W+n))O(n^{3.5}\log(\bar{\chi}_{W}+n)), depending only on AA, but independent of bb and cc. We will refer to this as the VY algorithm.

Recall from Theorem 4.2 that log(χ̄_A+n)=Θ(log(κ_A+n)). For certain predictor iterations, they use a layered least squares (LLS) step instead of affine scaling. Variables are split into layers according to the x_i values: we order the variables as x_1≥x_2≥…≥x_n, and start a new layer whenever there is a big gap between consecutive variables, i.e., x_i>O(n^2)χ̄_A x_{i+1}. For a point (x,s) on the central path, the ordering on the s_i's will be approximately reverse.

We illustrate their step based on a partition of the variable set [n] into two layers B∪N=[n]; the general step may use an arbitrary number of layers. The layered least squares step is computed in a two-stage approach via

ΔxNll\displaystyle\Delta x_{N}^{\mathrm{ll}} :=argmin{iN(xi+Δxixi)2:ΔxNπN(W)},\displaystyle:=\operatorname*{arg\,min}\Big{\{}\sum_{i\in N}\Big{(}\frac{x_{i}+\Delta x_{i}}{x_{i}}\Big{)}^{2}:\Delta x_{N}\in\pi_{N}(W)\Big{\}}, (19)
ΔxBll\displaystyle\Delta x_{B}^{\mathrm{ll}} :=argmin{iB(xi+Δxixi)2:(ΔxB,ΔxN)W},\displaystyle:=\operatorname*{arg\,min}\Big{\{}\sum_{i\in B}\Big{(}\frac{x_{i}+\Delta x_{i}}{x_{i}}\Big{)}^{2}:(\Delta x_{B},\Delta x_{N})\in W\Big{\}},
ΔsBll\displaystyle\Delta s_{B}^{\mathrm{ll}} :=argmin{iB(si+Δsisi)2:ΔsBπB(W)},\displaystyle:=\operatorname*{arg\,min}\Big{\{}\sum_{i\in B}\Big{(}\frac{s_{i}+\Delta s_{i}}{s_{i}}\Big{)}^{2}:\Delta s_{B}\in\pi_{B}(W^{\perp})\Big{\}},
ΔsNll\displaystyle\Delta s_{N}^{\mathrm{ll}} :=argmin{iN(si+Δsisi)2:(ΔsB,ΔsN)W}.\displaystyle:=\operatorname*{arg\,min}\Big{\{}\sum_{i\in N}\Big{(}\frac{s_{i}+\Delta s_{i}}{s_{i}}\Big{)}^{2}:(\Delta s_{B},\Delta s_{N})\in W^{\perp}\Big{\}}.
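As a concrete illustration of the two-stage computation, the primal part of (19) for W = ker(A) can be sketched as follows (our own naive implementation using pseudoinverses; B and N are index arrays partitioning [n]):

import numpy as np

def lls_primal_direction(A, x, B, N):
    AB, AN = A[:, B], A[:, N]
    xB, xN = x[B], x[N]
    M = AN - AB @ np.linalg.pinv(AB) @ AN             # ker(M) = pi_N(W)
    XN2 = np.diag(xN ** 2)
    # stage 1: weighted least squares over pi_N(W)
    dxN = -xN + XN2 @ M.T @ np.linalg.pinv(M @ XN2 @ M.T) @ (M @ xN)
    # stage 2: complete to a vector in W = ker(A), minimizing the weighted norm on B
    G = AB @ np.diag(xB)                              # constraint G w = -A_N dxN, with w = X_B^{-1} dx_B
    e = np.ones(len(B))
    w = -e + G.T @ np.linalg.pinv(G @ G.T) @ (-AN @ dxN + G @ e)
    dx = np.zeros_like(x)
    dx[B], dx[N] = xB * w, dxN
    return dx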

Whereas the predictor-corrector algorithm, like most standard interior point variants, is invariant under rescaling the columns of the constraint matrix, the VY algorithm is not: the layers are chosen by comparing the x_i values. For this reason, it was long sought to find a scaling-invariant version of [VY96] that would automatically improve the running time dependence from χ̄_W to the best possible value χ̄*_W achievable under column rescaling. The results of [MT08, MT03, MT05] fall in this line of work, but none of them achieves a dependence on the constraint matrix only while being scaling-invariant.

This question was finally settled in [DHNV20] in the affirmative. A key ingredient is revealing the connection between χ̄_W and κ_W. Preprocessing the instance via the algorithm in Theorem 5.1 to find a nearly optimal rescaling for κ_W (and thus for χ̄_W), and then using the VY algorithm, already achieves O(n^{3.5} log(χ̄*_W+n)) iterations. Beyond this, [DHNV20] also presents a new LLS interior point method based on the pairwise circuit imbalances κ_ij that is inherently scaling invariant, as well as an improved analysis yielding O(n^{2.5} log(n) log(χ̄*_W+n)) iterations. We give an outline next; details are omitted here, and a self-contained overview can be found in [DHNV20].

What determines a good layering?

We illustrate the LLS step in the following hypothetical situation. Assume that the partition B∪N is such that x*_N=0 and s*_B=0 for an optimal primal and dual solution pair (x*,s*) to LP(W,d,c). In particular, the LLS direction in (19) will set Δx_N^ll=−x_N. Note that this does not hold for the AS direction Δx_N that solves (18). This benefit of the LLS direction over the AS direction allows us to choose a larger step size α for the LLS step than for the AS step.

To terminate with an optimal solution in a single step, we need to be able to select the step size α=1, which requires that x_B+Δx_B^ll≥0. But since the components in B are ignored in the computation of Δx_N^ll, we need to ensure that the choice of Δx_N^ll does not impact Δx_B^ll by too much. By that we mean that there is a vector z∈ℝ^B such that |z_i/x_i|≪1 for all i∈B and (z,Δx_N^ll)∈W. The norm of this z is exactly governed by the lifting operator we introduced in Section 4.1. Let W^x=diag(x)^{-1}W={(w_i/x_i)_{i∈[n]}: w∈W} denote the space W rescaled by the 1/x_i values. Then,

(zixi)iB=LNWx((Δxillxi)iN)LNWx(Δxillxi)iN.\Big{\|}\Big{(}\frac{z_{i}}{x_{i}}\Big{)}_{i\in B}\Big{\|}=\Big{\|}L_{N}^{W^{x}}\Big{(}\Big{(}\frac{\Delta x_{i}^{\mathrm{ll}}}{x_{i}}\Big{)}_{i\in N}\Big{)}\Big{\|}\leq\|L_{N}^{W^{x}}\|\cdot\Big{\|}\Big{(}\frac{\Delta x_{i}^{\mathrm{ll}}}{x_{i}}\Big{)}_{i\in N}\Big{\|}\,. (20)

By Proposition 4.5, note that

(zixi)iBnκWx(Δxixi)iN.\Big{\|}\Big{(}\frac{z_{i}}{x_{i}}\Big{)}_{i\in B}\Big{\|}\leq n\kappa_{W^{x}}\Big{\|}\Big{(}\frac{\Delta x_{i}}{x_{i}}\Big{)}_{i\in N}\Big{\|}\,. (21)

Further, notice that the lifting cost imposed on the variables in B by Δx_N^ll is governed by the circuit imbalances in the rescaled space W^x: for i∈B and j∈N we are interested in κ_{ji}^{W^x}=κ_{ji} x_j/x_i. In particular, if these quantities are small for all i∈B and j∈N, then the low lifting cost discussed above is achieved and we can select step size α=1.

The choice of layers

The Vavasis-Ye algorithm defines the layering based on the magnitude of the elements xix_{i}. This guarantees that κjixj/xi\kappa_{ji}{x_{j}}/{x_{i}} is small since xi>O(n2)κAxjx_{i}>O(n^{2})\kappa_{A}x_{j} if ii is on a higher layer than jj. However, this choice is inherently not scaling-invariant.

The LLS algorithm in [DHNV20] directly uses the scaling invariant quantities κjixj/xi\kappa_{ji}{x_{j}}/{x_{i}} to define the layering. In the ideal version of the algorithm, the layers are selected as the strongly connected components of the directed graph formed by the edges where this value is large. Hence, κjixj/xi\kappa_{ji}{x_{j}}/{x_{i}} is small whenever ii is on a higher layer than jj.
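A sketch of this layering rule (our own illustration, using a naive Boolean transitive closure for the strongly connected components; the topological ordering of the components, which the algorithm also needs, is omitted):

import numpy as np

def layers_from_imbalances(x, kappa_hat, threshold):
    # group the variables into the strongly connected components of the digraph that has an
    # edge (j, i) whenever the rescaled estimate kappa_hat[j, i] * x[j] / x[i] exceeds threshold
    n = len(x)
    large = np.zeros((n, n), dtype=bool)
    for j in range(n):
        for i in range(n):
            if i != j and kappa_hat[j, i] * x[j] / x[i] >= threshold:
                large[j, i] = True
    reach = large | np.eye(n, dtype=bool)
    for k in range(n):                                # Boolean transitive closure
        for i in range(n):
            if reach[i, k]:
                reach[i] |= reach[k]
    comp_of, comps = [-1] * n, []
    for v in range(n):                                # mutual reachability classes = components
        if comp_of[v] == -1:
            members = [u for u in range(n) if reach[v, u] and reach[u, v]]
            for u in members:
                comp_of[u] = len(comps)
            comps.append(members)
    return comps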

This ideal version cannot be implemented since the pairwise imbalances κji\kappa_{ji} are hard to compute or even approximate. The actual algorithm instead works with lower estimates κ^ji\hat{\kappa}_{ji}. Thus, we may miss some edges from the directed graph, in which case the lifting may fail. Such failure will be detected in the algorithm, and in turn reveals better estimates for some pairs (i,j)(i,j).
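To make the ideal layering rule concrete, here is a small illustrative sketch in Python (assuming the networkx package; the matrix kappa_hat of lower estimates for the pairwise imbalances, the strictly positive iterate x, and the threshold parameter are all hypothetical inputs). It builds the directed graph of pairs with large rescaled imbalance and returns its strongly connected components in topological order. This is only an illustration of the rule described above, not the implementation used in [DHNV20].

```python
import networkx as nx

def layering(kappa_hat, x, threshold=1.0):
    """Partition indices into layers: strongly connected components of the
    directed graph with an edge j -> i whenever the rescaled imbalance
    estimate kappa_hat[j][i] * x[j] / x[i] is at least the threshold.
    The orientation and ordering conventions here are purely illustrative."""
    n = len(x)
    G = nx.DiGraph()
    G.add_nodes_from(range(n))
    for j in range(n):
        for i in range(n):
            if i != j and kappa_hat[j][i] * x[j] / x[i] >= threshold:
                G.add_edge(j, i)
    # Condense to the DAG of strongly connected components and return the
    # layers (as sorted index lists) in topological order.
    cond = nx.condensation(G)
    return [sorted(cond.nodes[v]["members"]) for v in nx.topological_sort(cond)]
```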

7.3 The curvature of the central path

The condition number χ¯A\bar{\chi}_{A}^{*} also has an interesting connection to the geometry of the central path. In this context, Sonnevend, Stoer, and Zhao [SSZ91] introduced a primal-dual curvature notion. Monteiro and Tsuchiya [MT08] reveal strong connections between the curvature integral, the Mizuno-Todd-Ye predictor-corrector algorithm, and the Vavasis-Ye algorithm. In particular, they prove a bound O(n3.5log(χ¯A+n))O(n^{3.5}\log(\bar{\chi}^{*}_{A}+n)) on the curvature integral.

Besides the above primal-dual curvature, one can also study the total curvature of the central path, a standard notion in algebraic geometry. De Loera, Sturmfels, and Vinzant [DLSV12] studied the central curve defined as the solution of the polynomial equations

Ax=b,Ays=c,xisi=λi[n],Ax=b\,,\quad A^{\top}y-s=c\,,\quad x_{i}s_{i}=\lambda\,,\quad\forall i\in[n],\quad (22)

This includes the usual central path in the region x,s>0x,s>0, as well as the central paths of all other LPs with objective cc in the hyperplane arrangement in {xn:Ax=b}\{x\in\mathbb{R}^{n}:\,Ax=b\} defined by the hyperplanes xi=0x_{i}=0; i.e., all LPs where some nonnegativity constraints xi0x_{i}\geq 0 are flipped to xi0x_{i}\leq 0. In fact, [DLSV12] shows that (22) defines the smallest algebraic variety containing the central path.

They consider the average curvature taken over the bounded regions in the hyperplane arrangement, and show a bound 2π(nm1)2\pi(n-m-1) for the primal central path (i.e., the projection of (22) to the xx space), and 2π(m1)2\pi(m-1) for the dual central path (the projection to the ss space). Their argument crucially relies on circuit polynomials defined via elementary vectors. See [DLSV12] for further pointers to the literature on the total curvature of the central path.

8 Circuit diameter bounds and circuit augmentation algorithms

Consider an LP in standard equality form with upper bounds, where Am×nA\in\mathbb{R}^{m\times n}, bmb\in\mathbb{R}^{m}, unu\in\mathbb{R}^{n}:

min\displaystyle\min c,x\displaystyle\left\langle c,x\right\rangle (LP(A,b,c,u)(A,b,c,u))
Ax\displaystyle Ax =b\displaystyle=b
0x\displaystyle 0\leq x u\displaystyle\leq u\,

In Section 4.2 we briefly mentioned the Hirsch conjecture and some progress towards the polynomial Hirsch conjecture; Theorem 4.8 shows a bound O((nm)3mκAlog(κA+n))O((n-m)^{3}m\kappa_{A}\log(\kappa_{A}+n)) on the diameter of {xn:Ax=b,x0}\{x\in\mathbb{R}^{n}:\,Ax=b,x\geq 0\} for Am×nA\in\mathbb{R}^{m\times n}.

Circuit diameter bounds were introduced by Borgwardt, Finhold, and Hemmecke [BFH15] as a relaxation of diameter bounds. Let PP denote the feasible region of LP(A,b,c,u)(A,b,c,u). A circuit walk is a sequence of feasible points x(1),x(2),,x(k+1)Px^{(1)},x^{(2)},\ldots,x^{(k+1)}\in P such that for each t=1,,kt=1,\ldots,k, x(t+1)=x(t)+g(t)x^{(t+1)}=x^{(t)}+g^{(t)} for g(t)(A)g^{(t)}\in\mathcal{F}(A), and further, x(t)+(1+ε)g(t)Px^{(t)}+(1+\varepsilon)g^{(t)}\notin P for any ε>0\varepsilon>0, i.e., each circuit step is maximal. The circuit diameter of PP is the maximum, over all pairs of vertices x,yPx,y\in P, of the minimum length of a circuit walk from xx to yy.

In contrast to walks in the vertex-edge graph, circuit walks are non-reversible and the minimum length from xx to yy may be different from the one from yy to xx; this is due to the maximality requirement. The circuit-analogue of the Hirsch conjecture, formulated in [BFH15], asserts that the circuit diameter of a polytope in dd dimensions with nn facets is at most ndn-d; this may be true even for unbounded polyhedra, see [BSY18].
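As a small illustration of the maximality requirement in the definition of a circuit walk, the following sketch (Python with numpy; the inputs are hypothetical) computes the largest step size α for which x + αg stays in the box 0 ≤ x ≤ u; since g ∈ ker(A), the equality constraints are preserved automatically.

```python
import numpy as np

def max_step(x, g, u):
    """Largest alpha >= 0 such that 0 <= x + alpha * g <= u; a circuit walk
    uses exactly this maximal alpha in each step.
    Returns np.inf if the direction is unbounded within the box."""
    alpha = np.inf
    for xi, gi, ui in zip(x, g, u):
        if gi > 0:
            alpha = min(alpha, (ui - xi) / gi)   # coordinate increases towards its upper bound
        elif gi < 0:
            alpha = min(alpha, xi / (-gi))       # coordinate decreases towards 0
    return alpha
```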

In this section we begin by showing a recent, improved bound on the circuit diameter with logκA\log\kappa_{A} dependence. Section 8.2 gives an overview of circuit augmentation algorithms. We review existing algorithms for different augmentation rules (Theorem 8.2 and Theorem 8.5), and also show a new bound for the steepest-descent direction (Theorem 8.4). The bounds in these three theorems also translate directly to circuit diameter bounds, since they all consider algorithms with maximal augmentation sequences.

8.1 An improved circuit diameter bound

In a recent paper Dadush et al. [DKNV21] gave the following bound on the circuit diameter:

Theorem 8.1 ([DKNV21]).

The circuit diameter of the feasible region of an LP in the form LP(A,b,c,u)(A,b,c,u) is O(m2log(m+κA)+nlogn)O(m^{2}\log(m+\kappa_{A})+n\log n).

Let us highlight the main ideas of the proof of Theorem 8.1. The argument is constructive but non-algorithmic in the sense that the augmentation steps are defined using the optimal solution. We first show the bound in Theorem 8.1 for LP(A,b,c)(A,b,c) (i.e., without upper bounds uu) and then extend the argument to systems of the form LP(A,b,c,u)(A,b,c,u). Let xx^{*} be a basic optimal solution to LP(A,b,c)(A,b,c) corresponding to basis BB, and let N=[n]BN=[n]\setminus B. Thus, xx^{*} is the unique optimal solution with respect to the cost vector c=(0B,𝟙N)c=(0_{B},\mathbbm{1}_{N}).

For the current iterate x(t)x^{(t)}, consider a conformal circuit decomposition h1,,hkh^{1},\ldots,h^{k} of xx(t)x^{*}-x^{(t)}, and select a circuit hi,i[k]h^{i},i\in[k] such that hNi1\|h_{N}^{i}\|_{1} is maximized. We find the next iterate x(t+1)=x(t)+αhix^{(t+1)}=x^{(t)}+\alpha h^{i} for the maximal step size α>0\alpha>0. Note that the existence of such a decomposition does not by itself yield a circuit diameter bound of nn, due to the maximality requirement in the definition of circuit walks. Nonetheless, it can be shown that we will not overshoot hih^{i} by too much: more precisely, one can show that the step length will be α[1,n]\alpha\in[1,n]. Further, the choice of hih^{i} guarantees that xN(t)1\|x^{(t)}_{N}\|_{1} decreases geometrically.

The analysis focuses on the index sets Lt={i[n]:xi>nκAxN(t)1}L_{t}=\{i\in[n]:\,x^{*}_{i}>n\kappa_{A}\|x^{(t)}_{N}\|_{1}\} and Rt={i[n]:xi(t)nxi}R_{t}=\{i\in[n]:\,x^{(t)}_{i}\leq nx^{*}_{i}\}. For every iLti\in L_{t}, xix_{i} must already be ‘large’ and cannot be set to zero later in the algorithm; RtR_{t} is the set of indices that have essentially ‘converged’ to the final value xix^{*}_{i}. Since xN(t)1\|x^{(t)}_{N}\|_{1} is decreasing, once an index enters LtL_{t}, it can never leave again. The same property can be shown for RtR_{t}. Moreover, a new index is added to either set RtR_{t} or LtL_{t} every O(mlog(m+κA))O(m\log(m+\kappa_{A})) iterations, leading to the overall bound O(m2log(m+κA))O(m^{2}\log(m+\kappa_{A})).

For a system LP(A,b,c,u)(A,b,c,u) with upper bounds uu, the above argument yields a bound O(n2log(n+κA))O(n^{2}\log(n+\kappa_{A})) using a simple reduction. To achieve a better bound, [DKNV21] gives a preprocessing sequence of O(nlogn)O(n\log n) circuit augmentations that reduces the number of variables to 2m\leq 2m. This preprocessing terminates once the columns of ADA_{D} are linearly independent for D={i[n]:xixi and xi{0,ui}}D=\{i\in[n]:x_{i}^{*}\neq x_{i}\text{ and }x_{i}^{*}\in\{0,u_{i}\}\}. Since a basic solution xx^{*} may have m\leq m entries not equal to the lower or upper bound, at this point there are 2m\leq 2m variables with xixix_{i}\neq x_{i}^{*}. This leads to a circuit diameter bound of O(m2log(m+κA)+nlogn)O(m^{2}\log(m+\kappa_{A})+n\log n).

8.2 Circuit augmentation algorithms

The generic circuit augmentation algorithm is a circuit walk x(1),x(2),,x(k+1)Px^{(1)},x^{(2)},\ldots,x^{(k+1)}\in P as defined above, such that an initial feasible solution x(1)x^{(1)} is given, and c,x(t+1)<c,x(t)\left\langle c,x^{(t+1)}\right\rangle<\left\langle c,x^{(t)}\right\rangle, i.e., the objective value decreases in every iteration. The elementary vector gg is an augmenting direction for the solution x(t)x^{(t)} if and only if c,g<0\left\langle c,g\right\rangle<0, gi0g_{i}\geq 0 for every ii with xi(t)=0x^{(t)}_{i}=0, and gi0g_{i}\leq 0 for every ii with xi(t)=uix^{(t)}_{i}=u_{i}. By LP duality, x(t)x^{(t)} is optimal if and only if no augmenting direction exists. Otherwise, the algorithm proceeds to the next iterate x(t+1)x^{(t+1)} by a maximal augmentation in an augmenting direction.
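The generic scheme can be summarized by the following sketch (Python with numpy). The oracle find_augmenting_direction is a hypothetical placeholder for any of the selection rules discussed below; it is assumed to return an augmenting elementary vector for the current iterate, or None if none exists.

```python
import numpy as np

def circuit_augmentation(x, u, c, find_augmenting_direction, max_iter=10**6):
    """Generic circuit augmentation scheme (illustrative sketch).
    x, u, c are numpy arrays; find_augmenting_direction(x) is an assumed
    oracle returning an augmenting elementary vector g for x, or None if no
    augmenting direction exists (in which case x is optimal by LP duality).
    Degenerate steps (alpha = 0) are not treated specially here."""
    for _ in range(max_iter):
        g = find_augmenting_direction(x)
        if g is None:
            return x                          # optimal solution reached
        assert np.dot(c, g) < 0               # augmenting directions improve the objective
        # maximal step size keeping 0 <= x + alpha * g <= u
        up = (u[g > 0] - x[g > 0]) / g[g > 0]
        down = x[g < 0] / (-g[g < 0])
        alpha = np.min(np.concatenate([up, down, [np.inf]]))
        if np.isinf(alpha):
            return None                       # unbounded augmenting direction
        x = x + alpha * g
    return x
```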

The simplex algorithm can be seen as a circuit augmentation algorithm that is restricted to using special elementary vectors corresponding to edges of the polyhedron. (Simplex may contain degenerate pivots where the basic solution remains the same; we do not count these as augmentation steps.) For the general framework, the iterates x(k)x^{(k)} may not be vertices. However, in the case of maximal augmentations, they must all lie on the boundary of the polyhedron.

In unpublished work, Bland [Bla76] extended the Edmonds–Karp–Dinic algorithm [Din70, EK72] to general LP, see also [Lee89, Proposition 3.1]. Circuit augmentation algorithms were revisited by De Loera, Hemmecke, and Lee in 2015 [DLHL15], analyzing different augmentation rules and extending them to integer programming. We give an overview of their results first for linear programming. In particular, they studied three augmentation rules that use maximal augmentation. Let x(t)x^{(t)} be the current feasible solution; we aim to select an augmenting direction gg as follows.

  • Dantzig-descent direction: Select gg such that c,g-\left\langle c,g\right\rangle is maximized, where g=gCg=g^{C} is the elementary vector with lcm(gC)=1\mathrm{lcm}(g^{C})=1 for a circuit C𝒞WC\in\mathcal{C}_{W}.

  • Deepest-descent direction: Select gg such that αc,g-\alpha\left\langle c,g\right\rangle is maximized, where α\alpha is the maximal stepsize for x(t)x^{(t)} and gg.

  • Steepest-descent direction: Select gg such that c,g/g1-\left\langle c,g\right\rangle/\|g\|_{1} is maximized.

Computing Dantzig- and deepest-descent directions is in general NP-hard; see [DLKS19] and the discussion below. The steepest-descent direction can be formulated as an LP; but without any restrictions on the input problem, this may not be simpler than the original one. However, it could be easier to solve in practice; Borgwardt and Viss [BV20] exhibit an implementation of a steepest-descent circuit augmentation algorithm with encouraging computational results.

8.2.1 Augmenting directions for flow problems

It is instructive to consider these algorithms for the special case of minimum-cost flows. We are given a directed graph D=(V,E)D=(V,E) with capacities uEu\in\mathbb{R}^{E}, costs cEc\in\mathbb{R}^{E}, and node demands bVb\in\mathbb{R}^{V} with b(V)=iVbi=0b(V)=\sum_{i\in V}b_{i}=0. The objective is to find a minimum cost flow xx that satisfies the capacity constraints 0xu0\leq x\leq u and the node demands: for each node iVi\in V, the total incoming minus the total outgoing flow equals bib_{i}. This can be written in the form LP(A,b,c,u)(A,b,c,u) with AA as the node-arc incidence matrix of DD, a TU matrix. Let us define the residual graph Dx=(V,Ex)D_{x}=(V,E_{x}), where for each (i,j)E(i,j)\in E we include (i,j)Ex(i,j)\in E_{x} if xij<uijx_{ij}<u_{ij} and (j,i)Ex(j,i)\in E_{x} if xij>0x_{ij}>0. The cost of a reverse arc will be defined as cji=cijc_{ji}=-c_{ij}. We will also refer to the residual capacities of arcs; these are uijxiju_{ij}-x_{ij} in the first case and xijx_{ij} in the second.
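For concreteness, a minimal sketch of the residual graph construction just described (plain Python; the arc-keyed dictionaries for capacity, cost, and the flow x are an assumed input format):

```python
def residual_graph(arcs, capacity, cost, x):
    """Residual arcs of D with respect to the flow x.
    Returns a list of (tail, head, residual_capacity, residual_cost)."""
    res = []
    for (i, j) in arcs:
        if x[(i, j)] < capacity[(i, j)]:       # forward residual arc
            res.append((i, j, capacity[(i, j)] - x[(i, j)], cost[(i, j)]))
        if x[(i, j)] > 0:                      # reverse residual arc
            res.append((j, i, x[(i, j)], -cost[(i, j)]))
    return res
```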

Let us observe that the augmenting directions correspond to directed cycles in the residual graph. Circuit augmentation algorithms for the primal and dual problems yield the rich classes of cycle cancelling and cut cancelling algorithms, see the survey [SIM00].

The maximum flow problem between a source ss and sink tt can be formulated as a special case as follows. We add a new arc (t,s)(t,s) with capacity \infty, set the demands b0b\equiv 0, and costs as cts=1c_{ts}=-1 and cij=0c_{ij}=0 otherwise. Bland’s [Bla76] observation was that the steepest-descent direction for this problem corresponds to finding a shortest residual ss-tt path, as chosen in the Edmonds–Karp–Dinic algorithm.

More generally, a steepest-descent direction amounts to finding a residual cycle CExC\subseteq E_{x} that minimizes the mean cycle cost c(C)/|C|c(C)/|C|. Thus, the steepest descent algorithm for minimum-cost flows corresponds to the classical Goldberg–Tarjan algorithm [GT89] that is strongly polynomial with running time O(|V||E|2)O(|V|\cdot|E|^{2}) [RG94].
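For illustration, the minimum mean cycle value can be computed by Karp's classical O(|V||E|) dynamic program; the following sketch (plain Python, with residual arcs given as (tail, head, cost) triples, an assumed format) returns min over cycles C of c(C)/|C|. Recovering an optimal cycle itself requires additional bookkeeping, omitted here.

```python
def min_mean_cycle(n, arcs):
    """Karp's algorithm: minimum mean cost c(C)/|C| over directed cycles.
    arcs is a list of (tail, head, cost) with vertices 0, ..., n-1.
    Returns None if the graph is acyclic."""
    INF = float("inf")
    s = n                      # artificial source with 0-cost arcs to every vertex
    arcs = list(arcs) + [(s, v, 0.0) for v in range(n)]
    N = n + 1                  # number of vertices including the source
    # d[k][v]: minimum cost of a walk from s to v using exactly k arcs
    d = [[INF] * N for _ in range(N + 1)]
    d[0][s] = 0.0
    for k in range(1, N + 1):
        for (u, v, w) in arcs:
            if d[k - 1][u] < INF and d[k - 1][u] + w < d[k][v]:
                d[k][v] = d[k - 1][u] + w
    best = INF
    for v in range(N):
        if d[N][v] == INF:
            continue           # no walk of length N ends at v
        val = max((d[N][v] - d[k][v]) / (N - k)
                  for k in range(N) if d[k][v] < INF)
        best = min(best, val)
    return None if best == INF else best
```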

Let us now consider the other two variants. A Dantzig-descent direction in this context asks for the most negative cycle, i.e., a cycle maximizing c(C)-c(C). A deepest-descent direction asks for a cycle CC of arcs that maximizes αc(C)-\alpha c(C), where α\alpha is the residual capacity of CC. Computing both these directions exactly is NP-hard, since they generalize the Hamiltonian-cycle problem: for every directed graph, we can set up a flow problem where ExE_{x} coincides with the input graph, all residual capacities are equal to 1, and all costs are 1-1. We note that De Loera, Kafer, and Sanità [DLKS19] showed that computing the Dantzig- and deepest-descent directions is also NP-hard for the fractional matching polytope.

Nevertheless, the deepest-descent direction can be suitably approximated. Wallacher [Wal89] proposed selecting a minimum ratio cycle in the residual graph. This is a cycle in ExE_{x} that minimizes c(C)/d(C)c(C)/d(C), where ded_{e} is the reciprocal of the residual capacity of the arc eExe\in E_{x}; such a cycle can be found in strongly polynomial time. It is easy to show that this cycle approximates the deepest-descent direction within a factor |Ex||E_{x}|. Wallacher’s algorithm can be naturally extended to linear programming [MS00]; it has found several combinatorial applications, e.g. [WZ99, Way02], and has also been used in the context of integer programming [SW99]. We discuss an improved new variant in Section 8.3. A different relaxation of the deepest-descent algorithm was given by Barahona and Tardos [BT89], based on Weintraub’s algorithm [Wei74].
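As an illustration of Wallacher's rule, the minimum ratio min over cycles C of c(C)/d(C) can be approximated by bisection: a cycle with c(C) − λd(C) < 0 exists if and only if the residual graph with arc weights c_e − λd_e contains a negative cycle. The sketch below (plain Python; arcs given as (tail, head, c_e, d_e) tuples with d_e > 0, an assumed format, and an assumed a priori interval [lo, hi] containing the optimal ratio) uses Bellman–Ford negative-cycle detection; the strongly polynomial methods mentioned above are more involved.

```python
def has_negative_cycle(n, arcs, lam):
    """Bellman-Ford negative-cycle detection for arc weights c_e - lam * d_e,
    with an implicit 0-cost source connected to every vertex."""
    dist = [0.0] * n
    for _ in range(n):                       # n relaxation rounds suffice
        for (u, v, c, d) in arcs:
            w = c - lam * d
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # one more pass: any further improvement certifies a negative cycle
    return any(dist[u] + (c - lam * d) < dist[v] - 1e-12
               for (u, v, c, d) in arcs)

def min_ratio_cycle_value(n, arcs, lo, hi, iters=60):
    """Bisection for the minimum cycle ratio; assumes lo <= optimum <= hi."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if has_negative_cycle(n, arcs, mid):
            hi = mid                         # a cycle with ratio below mid exists
        else:
            lo = mid
    return hi
```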

8.2.2 Convergence bounds

We now state the convergence bounds from [DLHL15]. The original statement refers to subdeterminant bounds; we paraphrase them in terms of finding approximately optimal solutions.

Theorem 8.2 (De Loera, Hemmecke, Lee [DLHL15]).

Consider a linear program in the form LP(A,b,c,u)(A,b,c,u). Assume we are given an initial feasible solution x(0)x^{(0)}, and let OPT\mathrm{OPT} denote the optimum value. By an ε\varepsilon-optimal solution we mean an iterate x(t)x^{(t)} such that c,x(t)OPT+ε\left\langle c,x^{(t)}\right\rangle\leq\mathrm{OPT}+\varepsilon.

  (a) For given ε>0\varepsilon>0, one can find an ε\varepsilon-optimal solution in 2nlog2(c,x(0)OPTε)2n\log_{2}\left(\frac{\left\langle c,x^{(0)}\right\rangle-\mathrm{OPT}}{\varepsilon}\right) deepest-descent augmentations.

  (b) For given ε>0\varepsilon>0, one can find an ε\varepsilon-optimal solution in 2n2γεlog2(c,x(0)OPTε)\frac{2n^{2}\gamma}{\varepsilon}\log_{2}\left(\frac{\left\langle c,x^{(0)}\right\rangle-\mathrm{OPT}}{\varepsilon}\right) Dantzig-descent augmentations, where γ\gamma is an upper bound on the maximum entry in any feasible solution.

  (c) One can find an exact optimal solution in min{n|𝒞A|,A}\min\{n|\mathcal{C}_{A}|,\ell_{A}\} steepest-descent augmentations, where A\ell_{A} denotes the number of distinct values of c,g/g1\left\langle c,g\right\rangle/\|g\|_{1} over g(A)g\in\mathcal{F}(A).

In general, circuit augmentation algorithms may not even terminate finitely; see [MS00] for an example on Wallacher’s rule for minimum cost flows. In parts (a) and (b), assume that all basic solutions are 1/k1/k-integral for some kk\in\mathbb{Z} and that the cost function is integral, cnc\in\mathbb{Z}^{n}. If x(t)x^{(t)} is an ε\varepsilon-optimal solution for ε<1/k\varepsilon<1/k, then we can identify an optimal vertex of the face containing x(t)x^{(t)} using a Carathéodory decomposition argument; this can be implemented by a sequence of n\leq n circuit augmentations (see [DLHL15, Lemma 5]).

According to part c, steepest descent terminates with an optimal solution in a finite number of iterations; moreover, the bound only depends on the linear space ker(A)\ker(A) and cc, and not on the parameters bb and uu. However, the bound can be exponentially large.

Bland’s original observation was that A\ell_{A} is strongly polynomially bounded for the maximum flow problem. Recall that all elementary vectors gg correspond to cycles in the residual graph. Normalizing so that gi{0,±1}g_{i}\in\{0,\pm 1\}, we have c,g=1-\left\langle c,g\right\rangle=1 for every augmenting cycle (as these must use the (t,s)(t,s) arc), and g1\|g\|_{1} is between 1 and |E||E|. In fact, the crucial argument by Edmonds and Karp [EK72] and Dinic [Din70] is showing that the length of the shortest augmenting path is non-decreasing, and must strictly increase within |E||E| consecutive iterations.

For an integer cost function cnc\in\mathbb{Z}^{n}, Lee [Lee89, Proposition 3.2] gave the following upper bound on A\ell_{A}:

Proposition 8.3.

If c1(nm+1)c\|c\|_{1}\leq(n-m+1)\|c\|_{\infty}, then

A12c(nm+1)κ¯A((nm+1)κ¯A+1).\ell_{A}\leq\frac{1}{2}{\|c\|_{\infty}}(n-m+1)\bar{\kappa}_{A}((n-m+1)\bar{\kappa}_{A}+1)\,.

In order to bound the circuit distance between vertices xx and yy let us use the following cost function. For the basis BB defining yy, let

ci={0if iB,1if i[n]B,yi=0,1if i[n]B,yi=ui.c_{i}=\begin{cases}0&\mbox{if }i\in B\,,\\ 1&\mbox{if }i\in[n]\setminus B,y_{i}=0\,,\\ -1&\mbox{if }i\in[n]\setminus B,y_{i}=u_{i}\,.\end{cases} (23)

With this cost function, Theorem 8.2c and Proposition 8.3 yield a bound O((nm)2κ¯A2)O((n-m)^{2}\bar{\kappa}_{A}^{2}) on the circuit diameter using the steepest descent algorithm.

Extending the analysis of the Goldberg–Tarjan algorithm [GT89], we present a new bound that only depends on the fractional circuit imbalance κA\kappa_{A}, and is independent of cc. The same bound was independently obtained by Gauthier and Desrosiers [GD21]. The proof is given in Section 8.4.

Theorem 8.4.

For the problem LP(A,b,c,u)(A,b,c,u) with constraint matrix Am×nA\in\mathbb{R}^{m\times n}, the steepest-descent algorithm terminates within O(n2mκAlog(κA+n))O(n^{2}m\kappa_{A}\log(\kappa_{A}+n)) augmentations starting from any feasible solution x(0)x^{(0)}.

This improves on the above bound O((nm)2κ¯A2)O((n-m)^{2}\bar{\kappa}_{A}^{2}) for most values of the parameters (recall that κAκ¯A2\kappa_{A}\leq\bar{\kappa}_{A}^{2}). Moreover, this bounds the running time for steepest descent for an arbitrary cost function cc, not necessarily of the form (23).

Both these bounds are independent of bb; however, κA\kappa_{A} and κ¯A\bar{\kappa}_{A} may be exponentially large in the encoding length LAL_{A} of the matrix AA. In contrast, Theorem 8.2a yields a polynomial bound O(nLA,b)O(nL_{A,b}) on the number of deepest-descent iterations, where LA,bL_{A,b} is the encoding length of (A,b)(A,b). In what follows, we review a new circuit augmentation algorithm from [DKNV21] that achieves a logκA\log\kappa_{A} dependence; the number of augmentations is bounded by O(n3LA)O(n^{3}L_{A}), independently of bb.

8.3 A circuit augmentation algorithm with logκA\log\kappa_{A} dependence

Recall that the diameter bound Theorem 8.1 is non-algorithmic in the sense that the augmentation steps rely on knowing the optimal solution xx^{*}. Dadush et al. [DKNV21] complemented this with an efficient circuit augmentation algorithm, assuming oracles are provided for certain circuit directions.

Theorem 8.5 ([DKNV21]).

Consider the primal of LP(A,b,c)(A,b,c). Given a feasible solution, there exists a circuit augmentation algorithm that finds an optimal solution or concludes unboundedness using O(n3log(n+κA))O(n^{3}\log(n+\kappa_{A})) circuit augmentations.

The main circuit augmentation direction used in the paper for optimization is a step called Ratio-Circuit(A,c,w)(A,c,w), a generalisation of the previously mentioned augmentation step by Wallacher [Wal89] for minimum cost flows. For a given weight vector w({})nw\in(\mathbb{R}\cup\{\infty\})^{n}, it finds a circuit that is a basic optimal solution to the following linear program:

minc,zs.t.Az=0,w,z1.\min\;\left\langle c,z\right\rangle\,\quad\mathrm{s.t.}\quad Az=0\,,\,\left\langle w,z^{-}\right\rangle\leq 1\,. (24)

Equivalently, the goal is to minimize the cost to weight ratio of a circuit, where the unit weight of increasing a variable xix_{i} is 0 and decreasing it is wiw_{i}. This can be seen as an efficiently implementable relaxation of the deepest-descent direction: for suitable weights, it achieves a geometric decrease in the objective value. For a vector x+nx\in\mathbb{R}^{n}_{+}, we let 1/x({})n1/x\in(\mathbb{R}\cup\{\infty\})^{n} denote the vector ww with wi=1/xiw_{i}=1/x_{i} (in particular, wi=w_{i}=\infty if xi=0x_{i}=0).
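The program (24) can be written as an explicit LP; the sketch below uses scipy.optimize.linprog (an assumed dependency). The variable is split as z = p − q with p, q ≥ 0, so that the constraint on ⟨w, z^-⟩ becomes ⟨w, q⟩ ≤ 1, and entries with w_i = ∞ are handled by fixing q_i = 0. A simplex-based solver is requested so that the returned solution is basic; the resulting vector corresponds to an elementary vector when the optimal basic solution is nondegenerate. This is an illustrative formulation, not the implementation of [DKNV21].

```python
import numpy as np
from scipy.optimize import linprog

def ratio_circuit(A, c, w):
    """Sketch of Ratio-Circuit(A, c, w):
        min <c, z>  s.t.  A z = 0,  <w, z^-> <= 1,
    via the split z = p - q with p, q >= 0."""
    m, n = A.shape
    obj = np.concatenate([c, -c])                       # c^T p - c^T q
    A_eq = np.hstack([A, -A])                           # A p - A q = 0
    b_eq = np.zeros(m)
    wq = np.where(np.isinf(w), 0.0, w)
    A_ub = np.concatenate([np.zeros(n), wq])[None, :]   # <w, q> <= 1
    b_ub = np.array([1.0])
    # w_i = inf (i.e. x_i = 0 for w = 1/x) forces q_i = 0, hence z_i >= 0
    bounds = [(0, None)] * n + [(0, 0) if np.isinf(wi) else (0, None) for wi in w]
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs-ds")
    if res.status != 0:
        return None          # no finite optimum (for w = 1/x this indicates an unbounded LP)
    p, q = res.x[:n], res.x[n:]
    return p - q
```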

Lemma 8.6 ([MS00]).

Let OPT\mathrm{OPT} denote the optimum value of LP(A,b,c)(A,b,c). Given a feasible solution xx to LP(A,b,c)(A,b,c), let gg be the elementary vector returned by Ratio-Circuit(A,c,1/x)(A,c,1/x), and xx^{\prime} the next iterate. Then,

c,xOPT(11n)(c,xOPT).\left\langle c,x^{\prime}\right\rangle-\mathrm{OPT}\leq\left(1-\frac{1}{n}\right)\left(\left\langle c,x\right\rangle-\mathrm{OPT}\right)\,.
Proof.

Let xx^{*} be an optimal solution to LP(A,b,c)(A,b,c), and let z=(xx)/nz=(x^{*}-x)/n. Then, zz is feasible to (24) for w=1/xw=1/x. The claim follows by noting that c,gc,z=(OPTc,x)/n\left\langle c,g\right\rangle\leq\left\langle c,z\right\rangle=(\mathrm{OPT}-\left\langle c,x\right\rangle)/n, and that x+gx+g is feasible since 1/x,g1\left\langle 1/x,g^{-}\right\rangle\leq 1; since the augmentation is maximal, c,xc,x+g\left\langle c,x^{\prime}\right\rangle\leq\left\langle c,x+g\right\rangle. ∎

Repeated application of Ratio-Circuit steps thus decreases the optimality gap by a constant factor within every O(n)O(n) iterations. However, as noted in [MS00], using only this rule need not even terminate finitely, already for minimum cost flows.

Support Circuits

For this reason, [DKNV21] occasionally uses a second circuit augmentation step called Support-Circuit. Roughly speaking, given a non-basic feasible point of the system LP(A,b,c)(A,b,c), one can efficiently augment around a circuit gg with c,g0\left\langle c,g\right\rangle\leq 0 supported on the support of the current point, and thereby reduce the support of the iterate without increasing the objective.

On a high level, the need for such an operation becomes clear when considering the following example. Assume c1c\equiv 1, and further assume that we are given an iterate xx with xκAy\|x\|\gg\kappa_{A}\|y\| for some basic solution yy. Then geometric progress in the objective c,x\left\langle c,x\right\rangle can be achieved by simply reducing the norm of xx, but geometric progress alone would not give the desired bound on the number of circuit augmentations. Note that the norms of all basic solutions lie within a factor poly(n)κA\operatorname{poly}(n)\kappa_{A} of y\|y\| by Proposition 3.1. Therefore, instead of applying Ratio-Circuit, it is helpful to first reduce the support through at most nn Support-Circuit operations until a basic solution is reached. Subsequent applications of Ratio-Circuit will then, again due to Proposition 3.1, not be able to reduce the norm of xx by more than a factor of poly(n)κA\operatorname{poly}(n)\kappa_{A}, a fact that will be exploited in the proximity arguments.
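A rough sketch of a Support-Circuit-style step (Python with numpy; not the actual subroutine of [DKNV21]): if the columns of A on the support of the current iterate are linearly dependent, we move maximally along a kernel vector supported there, oriented so that the objective does not increase. The kernel vector obtained from the SVD need not be support-minimal, and degenerate or unbounded cases are ignored here.

```python
import numpy as np

def support_circuit_step(A, c, x, tol=1e-9):
    """Move maximally along a kernel vector supported on supp(x), oriented so
    that <c, x> does not increase; this zeroes out at least one more
    coordinate when a positive maximal step exists."""
    S = np.flatnonzero(np.abs(x) > tol)
    AS = A[:, S]
    if np.linalg.matrix_rank(AS, tol=tol) == len(S):
        return x                          # columns on supp(x) independent; nothing to do
    gS = np.linalg.svd(AS)[2][-1]         # a (near-)kernel vector of A_S
    g = np.zeros_like(x, dtype=float)
    g[S] = gS
    if np.dot(c, g) > 0:
        g = -g                            # ensure <c, g> <= 0
    neg = g < -tol
    if not neg.any():
        return x                          # unbounded or degenerate direction; skipped
    alpha = np.min(x[neg] / (-g[neg]))    # maximal step keeping x + alpha * g >= 0
    return x + alpha * g
```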

The main progress in the algorithm in Theorem 8.5 is identifying a new index ii such that xi=0x_{i}=0 in the current solution and xi=0x_{i}^{*}=0 in any optimal solution xx^{*}. Such a conclusion is derived using a variant of the proximity theorem (Theorem 6.5). To implement the main subroutine that fixes a variable to xi=0x_{i}=0, a careful combination of Ratio-Circuit and Support-Circuit iterations is used. Interestingly, the Ratio-Circuit iterations do not use the original cost function cc, but a perturbed objective function cc^{\prime}. The main progress in the subroutine is identifying new ‘large’ variables, similarly to the proof of Theorem 8.1. Perturbations are performed whenever a new large variable xjx_{j} is identified.

8.4 An improved bound for steepest-descent augmentation

We now prove Theorem 8.4. The proof follows the same lines as that of the Goldberg–Tarjan algorithm; see also [AMO93, Section 10.5] for the analysis. A factor logn\log n improvement over the original bound was given in [RG94]. A key property in the original analysis is that for a flow around a cycle (i.e., an elementary vector), every edge carries at least 1/|V|1/|V| fraction of the 1\ell_{1}-norm of the flow. This can be naturally replaced by the argument that for every elementary flow gg, the minimum nonzero value of |gi||g_{i}| is at least g1/(1+(m1)κA)\|g\|_{1}/(1+(m-1)\kappa_{A}).

The Goldberg–Tarjan algorithm has been generalized to separable convex minimization with linear constraints by Karzanov and McCormick [KM97]. Instead of κW\kappa_{W}, they use the maximum entry in a Graver basis (see Section 9 below). Lemma 10.1 in their paper proves a weakly polynomial bound similar to Lemma 8.8 for the separable convex setting. However, no strongly polynomial analysis is given (which is in general not possible for the nonlinear setting).

Our arguments will be based on the dual of LP(A,b,c,u)(A,b,c,u):

maxy,b\displaystyle\max\;\left\langle y,b\right\rangle- u,t\displaystyle\left\langle u,t\right\rangle (25)
Ay+st\displaystyle A^{\top}y+s-t =c\displaystyle=c
s,t\displaystyle s,t 0.\displaystyle\geq 0\,.

Recall the primal-dual slackness conditions from Section 6: if xx is feasible to LP(A,b,c,u)(A,b,c,u) and ymy\in\mathbb{R}^{m}, they are primal and dual optimal solutions if and only if ai,yci\left\langle a_{i},y\right\rangle\leq c_{i} if xi<uix_{i}<u_{i} and ai,yci\left\langle a_{i},y\right\rangle\geq c_{i} if xi>0x_{i}>0.

Let us start by formulating the steepest-descent direction as an LP. Let A¯=(A|A)m×(2n){\bar{A}}=(A|-A)\in\mathbb{R}^{m\times(2n)} and c¯=(cc)2n\bar{c}=\binom{c}{-c}\in\mathbb{R}^{2n}. Clearly, κA¯=κA\kappa_{\bar{A}}=\kappa_{A}. For a feasible solution x=x(t)x=x^{(t)} to LP(A,b,c,u)(A,b,c,u), we define the residual variable set

N(x)={i[n]:xi<ui}{n+j:j[n],xj>0}[2n],N(x)=\{i\in[n]:x_{i}<u_{i}\}\cup\{n+j:j\in[n],\,x_{j}>0\}\subseteq[2n]\,,

and consider the system

min\displaystyle\min c¯,z\displaystyle\left\langle\bar{c},z\right\rangle (26)
A¯z\displaystyle\bar{A}z =0\displaystyle=0
𝟙2n,z\displaystyle\left\langle\mathbbm{1}_{2n},z\right\rangle =1\displaystyle=1
z[2n]N(x)\displaystyle z_{[2n]\setminus N(x)} =0\displaystyle=0
z\displaystyle z 0.\displaystyle\geq 0\,.

We can map a solution z2nz\in\mathbb{R}^{2n} to gng\in\mathbb{R}^{n} by setting gi=zizn+ig_{i}=z_{i}-z_{n+i}. We will assume that zz is chosen as a basic optimal solution. Observe that every basic feasible solution to this program maps to an elementary vector in ker(A¯)\ker(\bar{A}). The dual program can be equivalently written as

min\displaystyle\min ε\displaystyle\varepsilon (27)
a¯i,y\displaystyle\left\langle\bar{a}_{i},y\right\rangle c¯i+εiN(x).\displaystyle\leq\bar{c}_{i}+\varepsilon\quad\forall i\in N(x)\,.

For the solution xx, we let ε(x)\varepsilon(x) denote the optimal value of this dual problem; thus, the optimal value of the primal (26) is ε(x)-\varepsilon(x). If ε(x)=0\varepsilon(x)=0, then xx and yy are complementary primal and dual optimal solutions to LP(A,b,c,u)(A,b,c,u). We first show that this quantity is monotone (a key step also in the analysis in [DLHL15]).

Lemma 8.7.

At every iteration of the circuit augmentation algorithm, ε(x(t+1))ε(x(t))\varepsilon(x^{(t+1)})\leq\varepsilon(x^{(t)}).

Proof.

Let ε=ε(x(t))\varepsilon=\varepsilon(x^{(t)}) and let yy be an optimal solution to (27) for N(x(t))N(x^{(t)}). We show that the same yy is also feasible for N(x(t+1))N(x^{(t+1)}); the claim follows immediately. There is nothing to prove if N(x(t+1))N(x(t))N(x^{(t+1)})\subseteq N(x^{(t)}), so let iN(x(t+1))N(x(t))i\in N(x^{(t+1)})\setminus N(x^{(t)}).

Assume first i[n+1,2n]i\in[n+1,2n]; let i=n+ji=n+j. This means that xj(t)=0<xj(t+1)x^{(t)}_{j}=0<x_{j}^{(t+1)}; therefore, the augmenting direction gg has gj>0g_{j}>0. Thus, for the optimal solution zz to (26), we must have zj>0z_{j}>0. By primal-dual slackness, aj,y=cj+ε\left\langle a_{j},y\right\rangle=c_{j}+\varepsilon; thus,

a¯i,y=aj,y=cjε<cj=c¯i.\left\langle\bar{a}_{i},y\right\rangle=\left\langle-a_{j},y\right\rangle=-c_{j}-\varepsilon<-c_{j}=\bar{c}_{i}\,.

The case i[n]i\in[n] is analogous. ∎

The next lemma shows that within every nn iterations, ε(x(t))\varepsilon(x^{(t)}) decreases by a factor depending on κA\kappa_{A}.

Lemma 8.8.

For every iteration tt, ε(x(t+n))(111+(m1)κA)ε(x(t))\varepsilon(x^{(t+n)})\leq\left(1-\frac{1}{1+(m-1)\kappa_{A}}\right)\varepsilon(x^{(t)}).

Proof.

Let us set N=N(x(t))N=N(x^{(t)}), ε=ε(x(t))\varepsilon=\varepsilon(x^{(t)}), and let y=y(t)y=y^{(t)} be an optimal dual solution to (27) for x(t)x^{(t)}. Let

T:={iN:a¯i,y>c¯i}[2n];T:=\{i\in N:\left\langle\bar{a}_{i},y\right\rangle>\bar{c}_{i}\}\subseteq[2n]\,;

that is, if i[n]i\in[n] then ai,y>ci\left\langle a_{i},y\right\rangle>c_{i}, and if i[n+1,2n]i\in[n+1,2n], i=n+ji=n+j, then aj,y<cj\left\langle a_{j},y\right\rangle<c_{j}. In particular, |T{i,i+n}|1|T\cap\{i,i+n\}|\leq 1 for every i[n]i\in[n]. Let z(t)z^{(t)} be the basic optimal solution to (26) for x(t)x^{(t)}. By complementary slackness, every isupp(z(t))i\in\mathrm{supp}(z^{(t)}) must have a¯i,y=c¯i+ε\left\langle\bar{a}_{i},y\right\rangle=\bar{c}_{i}+\varepsilon, and thus, supp(z(t))T\mathrm{supp}(z^{(t)})\subseteq T.

Claim 8.8.1.

Let us pick k>tk>t as the first iteration when for the basic optimal solution z(k)z^{(k)} to (26), we have supp(z(k))T\mathrm{supp}(z^{(k)})\setminus T\neq\emptyset. Then kt+nk\leq t+n, and the solution (y,ε)(y,\varepsilon) is still feasible for (27) for x(k)x^{(k)}.

Proof.

For r[t,k1]r\in[t,k-1], let T(r)=TN(x(r))T^{(r)}=T\cap N(x^{(r)}). We show that T(r+1)T(r)T^{(r+1)}\subsetneq T^{(r)}. Since |T|n|T|\leq n, this implies kt+nk\leq t+n. Let z(r)z^{(r)} be the basic optimal solution for (26); recall that the augmenting direction is computed with gj=zj(r)zn+j(r)g_{j}=z^{(r)}_{j}-z^{(r)}_{n+j}. By the choice of kk, supp(z(r))T(r)\mathrm{supp}(z^{(r)})\subseteq T^{(r)}. Thus, we may only increase xix_{i} for iT[n]i\in T\cap[n] and decrease it for i=jni=j-n for jT[n+1,2n]j\in T\cap[n+1,2n]. Consequently, every index ii entering N(x(r+1))N(x^{(r+1)}) has a¯i,y<c¯i\left\langle\bar{a}_{i},y\right\rangle<\bar{c}_{i}, and therefore (y,ε)(y,\varepsilon) remains feasible throughout.

We now turn to the proof of T(r+1)T(r)T^{(r+1)}\subsetneq T^{(r)}. Since we use a maximal augmentation, at least one index leaves T(r)T^{(r)} at each iteration. We claim that T(r+1)T(r)=T^{(r+1)}\setminus T^{(r)}=\emptyset. For a contradiction, assume there exists iT(r+1)T(r)i\in T^{(r+1)}\setminus T^{(r)}. If i[n]i\in[n], then i+ni+n must be in the support of z(r)z^{(r)}; in particular, i+nT(r)i+n\in T^{(r)}. But this would mean that {i,i+n}T\{i,i+n\}\subseteq T, in contradiction with the definition of TT. Similarly, for i[n+1,2n]i\in[n+1,2n]. ∎

Let us now consider the optimal solution z=z(k)z=z^{(k)} to (26) at iteration kk; by the above claim, (y,ε)(y,\varepsilon) is still a feasible dual solution. Select an index jsupp(z)Tj\in\mathrm{supp}(z)\setminus T.

c¯,z=A¯yc¯,zεisupp(z){j}zi=(1zj)ε(111+(m1)κA)ε.\left\langle-\bar{c},z\right\rangle=\left\langle\bar{A}^{\top}y-\bar{c},z\right\rangle\leq\varepsilon\sum_{i\in\mathrm{supp}(z)\setminus\{j\}}z_{i}=(1-z_{j})\varepsilon\leq\left(1-\frac{1}{1+(m-1)\kappa_{A}}\right)\varepsilon\,.

In the first inequality, we use that a¯i,yc¯iε\left\langle\bar{a}_{i},y\right\rangle-\bar{c}_{i}\leq\varepsilon by the feasibility of (y,ε)(y,\varepsilon), and a¯j,yc¯j0\left\langle\bar{a}_{j},y\right\rangle-\bar{c}_{j}\leq 0 by the choice of jTj\notin T. In the second equality, we use the constraint izi=1\sum_{i}z_{i}=1. The final inequality uses that zz is a basic solution, and therefore, an elementary vector in ker(A¯)\ker(\bar{A}). In particular |supp(z)|m|\mathrm{supp}(z)|\leq m, and ziκA¯zj=κAzjz_{i}\leq\kappa_{\bar{A}}z_{j}=\kappa_{A}z_{j}. Consequently, zj1/(1+(m1)κA)z_{j}\geq 1/(1+(m-1)\kappa_{A}). Since the left-hand side above equals ε(x(k))\varepsilon(x^{(k)}) and kt+nk\leq t+n, the lemma follows from the monotonicity in Lemma 8.7. ∎

We say that the variable j[2n]j\in[2n] is frozen at iteration tt, if jN(x(t))j\notin N(x^{(t^{\prime})}) for any ttt^{\prime}\geq t. Thus, for j[n]j\in[n], xj=ujx_{j}=u_{j}, and for j[n+1,2n]j\in[n+1,2n], j=i+nj=i+n, xi=0x_{i}=0 for all subsequent iterations. We show that a new frozen variable can be found every O(nmκAlog(κA+n))O(nm\kappa_{A}\log(\kappa_{A}+n)) iterations; this implies Theorem 8.4.

Lemma 8.9.

For every iteration t1t\geq 1, there is a variable jN(x(t))j\in N(x^{(t)}) that is frozen at iteration kk for k=t+O(nmκAlog(κA+n))k=t+O(nm\kappa_{A}\log(\kappa_{A}+n)).

Proof.

Let ε=ε(x(t))\varepsilon=\varepsilon(x^{(t)}). By Lemma 8.8, we can choose k=t+O(nmκAlog(n+κA))k=t+O(nm\kappa_{A}\log(n+\kappa_{A})) such that ε=ε(x(k))<ε/(2n(κA+1))\varepsilon^{\prime}=\varepsilon(x^{(k)})<\varepsilon/(2n(\kappa_{A}+1)). Consider the primal and dual optimal solutions (z,y,ε)(z,y,\varepsilon) to (26) and (27) at iteration tt and (z,y,ε)(z^{\prime},y^{\prime},\varepsilon^{\prime}) at iteration kk.

Claim 8.9.1.

There exists a jsupp(z)j\in\mathrm{supp}(z) such that a¯j,y>c¯j+2n(κA+1)ε\left\langle\bar{a}_{j},y^{\prime}\right\rangle>\bar{c}_{j}+2n(\kappa_{A}+1)\varepsilon^{\prime}.

Proof.

For a contradiction, assume that a¯j,yc¯j2n(κA+1)ε\left\langle\bar{a}_{j},y^{\prime}\right\rangle-\bar{c}_{j}\leq 2n(\kappa_{A}+1)\varepsilon^{\prime} for every jsupp(z)j\in\mathrm{supp}(z). Then,

ε=c¯,z=A¯yc¯,z2n(κA+1)εjzj=2n(κA+1)ε,\varepsilon=\left\langle-\bar{c},z\right\rangle=\left\langle\bar{A}^{\top}y^{\prime}-\bar{c},z\right\rangle\leq 2n(\kappa_{A}+1)\varepsilon^{\prime}\sum_{j}z_{j}=2n(\kappa_{A}+1)\varepsilon^{\prime}\,,

contradicting the choice of ε\varepsilon^{\prime}. ∎

We now show that all such indices are frozen at iteration kk by making use of Theorem 6.5 on proximity. Let x=x(k)x^{\prime}=x^{(k)} and x′′=x(k′′)x^{\prime\prime}=x^{(k^{\prime\prime})} for any k′′>kk^{\prime\prime}>k; let (y′′,ε′′)(y^{\prime\prime},\varepsilon^{\prime\prime}) be optimal to (27) at iteration k′′k^{\prime\prime}; we have ε′′ε\varepsilon^{\prime\prime}\leq\varepsilon^{\prime} by Lemma 8.7.

Let us define the cost cnc^{\prime}\in\mathbb{R}^{n} by

ci:={a¯i,y if 0<xi<uimax{ci,a¯i,y} if xi=uimin{ci,a¯i,y} if xi=0.c^{\prime}_{i}:=\begin{cases}\left\langle\bar{a}_{i},y^{\prime}\right\rangle&\mbox{ if }0<x^{\prime}_{i}<u_{i}\\ \max\{c_{i},\left\langle\bar{a}_{i},y^{\prime}\right\rangle\}&\mbox{ if }x^{\prime}_{i}=u_{i}\\ \min\{c_{i},\left\langle\bar{a}_{i},y^{\prime}\right\rangle\}&\mbox{ if }x^{\prime}_{i}=0\,.\end{cases}

If we replace the cost cc by cc^{\prime}, then xx^{\prime} and yy^{\prime} satisfy complementary slackness, and hence are optimal solutions to LP(A,b,c′,u)(A,b,c^{\prime},u) and the corresponding dual (25). Moreover, the optimality of (y,ε)(y^{\prime},\varepsilon^{\prime}) to (27) guarantees that ccε\|c^{\prime}-c\|_{\infty}\leq\varepsilon^{\prime}.

We similarly construct c′′c^{\prime\prime} for y′′y^{\prime\prime}, and note that x′′x^{\prime\prime} and y′′y^{\prime\prime} are primal and dual optimal solutions for the costs c′′c^{\prime\prime}, c′′cε′′\|c^{\prime\prime}-c\|_{\infty}\leq\varepsilon^{\prime\prime}. Further,

cc′′1ncc′′n(cc+c′′c)n(ε+ε′′)2nε\|c^{\prime}-c^{\prime\prime}\|_{1}\leq n\|c^{\prime}-c^{\prime\prime}\|_{\infty}\leq n\left(\|c^{\prime}-c\|_{\infty}+\|c^{\prime\prime}-c\|_{\infty}\right)\leq n(\varepsilon^{\prime}+\varepsilon^{\prime\prime})\leq 2n\varepsilon^{\prime}

We can thus apply Theorem 6.5 to (x,y)(x^{\prime},y^{\prime}) with cost cc^{\prime} and to (x′′,y′′)(x^{\prime\prime},y^{\prime\prime}) with cost c′′c^{\prime\prime}, showing that every variable jj as in Claim 8.9.1 must be frozen. ∎

9 Circuits, integer proximity, and Graver bases

We now briefly discuss implications of circuit imbalances to the integer program (IP) of the form

min\displaystyle\min c,x\displaystyle\left\langle c,x\right\rangle\quad (IP)
Ax\displaystyle Ax =b\displaystyle=b
x\displaystyle x 0,\displaystyle\geq 0,
x\displaystyle x n.\displaystyle\in\mathbb{Z}^{n}.

Many algorithms for (IP) first solve the LP-relaxation and then deduce information about (IP) itself from the optimal solution of the relaxation. The following proximity lemma shows that if (IP) is feasible, the distance of an optimal integral solution from the optimal solution of the relaxation can be bounded in terms of the max-circuit imbalance κ¯A\bar{\kappa}_{A}. So, a local search within a radius of this guaranteed proximity will provide the optimal solution for the IP; see [Lee89, Proposition 4.1].

Lemma 9.1.

Let xx^{*} be an optimal solution to LP(A,b,c)(A,b,c), and assume that (IP) is feasible. Then there exists an optimal solution x^\hat{x} to (IP) such that x^xnκ¯W\|\hat{x}-x^{*}\|_{\infty}\leq n\bar{\kappa}_{W}.

Proof.

Let x^\hat{x} be an optimal solution to (IP) that minimizes x^x1\|\hat{x}-x^{*}\|_{1}, and consider w=xx^Ww=x^{*}-\hat{x}\in W and a conformal circuit decomposition w=i=1kλigCiw=\sum_{i=1}^{k}\lambda_{i}g^{C_{i}} for some knk\leq n, circuits C1,,CkC_{1},\ldots,C_{k}, and λ1,,λk0\lambda_{1},\ldots,\lambda_{k}\geq 0. Then, c,gCi0\left\langle c,g^{C_{i}}\right\rangle\leq 0 for all i[k]i\in[k], as otherwise xλigCix^{*}-\lambda_{i}g^{C_{i}} would be a feasible solution to LP(A,b,c)(A,b,c) with strictly better objective than xx^{*}. Further, note that λi1\lambda_{i}\leq 1 for all i[k]i\in[k], as otherwise x^+gCi\hat{x}+g^{C_{i}} would be a feasible solution to (IP) with objective at least as good as x^\hat{x} that is strictly closer to xx^{*} than x^\hat{x} in 1\ell_{1} norm. Therefore,

x^xi=1kgCinκ¯W.\|\hat{x}-x^{*}\|_{\infty}\leq\sum_{i=1}^{k}\|g^{C_{i}}\|_{\infty}\leq n\bar{\kappa}_{W}.

Another popular and well-studied quantity in integer programming is the Graver basis, defined as follows.

Definition 9.2 (Graver basis).

The Graver basis of a matrix AA, denoted by 𝒢(A)\mathcal{G}(A), consists of all gker(A)ng\in\ker(A)\cap\mathbb{Z}^{n} such that there exists no h(ker(A)n){g}h\in(\ker(A)\cap\mathbb{Z}^{n})\setminus\left\{g\right\} such that gg and hh are conformal and |hi||gi||h_{i}|\leq|g_{i}| for all i[n]i\in[n]. We can further define

𝔤1(A):=maxv𝒢(A)v1,𝔤(A):=maxv𝒢(A)v.\mathfrak{g}_{1}(A):=\max_{v\in\mathcal{G}(A)}\|v\|_{1},\qquad\mathfrak{g}_{\infty}(A):=\max_{v\in\mathcal{G}(A)}\|v\|_{\infty}. (28)

See [LHK12] for an extensive treatment of the Graver basis and [EHK+19] for more recent developments. Clearly, elementary vectors scaled such that their entries have greatest common divisor equal to one belong to the Graver basis: {g(W)n:gcd(g)=1}𝒢(A)\left\{g\in\mathcal{F}(W)\cap\mathbb{Z}^{n}:\gcd(g)=1\right\}\subseteq\mathcal{G}(A).
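As an illustration of Definition 9.2, the Graver basis elements lying in a small box can be enumerated by brute force (Python with numpy; only sensible for tiny integer matrices A). Since any vector dominating an element of the box also lies in the box, the output is exactly 𝒢(A) intersected with the box; elements outside the box are missed. For a node-arc incidence matrix (a TU matrix) with bound 1, one should recover the signed incidence vectors of the cycles of the graph, in line with the containment above.

```python
import itertools
import numpy as np

def graver_basis_in_box(A, bound):
    """Brute-force enumeration of the Graver basis elements of an integer
    matrix A that lie in the box [-bound, bound]^n (Definition 9.2)."""
    n = A.shape[1]
    kernel = [np.array(z) for z in
              itertools.product(range(-bound, bound + 1), repeat=n)
              if any(z) and not np.any(A @ np.array(z))]
    def dominates(h, g):
        # h != g, h conformal to g, and |h_i| <= |g_i| for all i
        return (not np.array_equal(h, g) and np.all(g * h >= 0)
                and np.all(np.abs(h) <= np.abs(g)))
    return [g for g in kernel if not any(dominates(h, g) for h in kernel)]
```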

The max-circuit imbalance measure and the Graver basis are related as follows.

Lemma 9.3.

κ¯A𝔤(A)nκ¯A\bar{\kappa}_{A}\leq\mathfrak{g}_{\infty}(A)\leq n\bar{\kappa}_{A}.

Proof.

The first inequality follows from the paragraph above, noting that

{gC:C𝒞(W)}𝒢(A),\left\{g^{C}:C\in\mathcal{C}(W)\right\}\subseteq\mathcal{G}(A),

for the normalized elementary vectors gCg^{C} with lcm(gC)=1\mathrm{lcm}(g^{C})=1. For the second inequality, let g𝒢(A)g\in\mathcal{G}(A) and let g=i=1kλigCig=\sum_{i=1}^{k}{\lambda_{i}g^{C_{i}}} be a conformal circuit decomposition with knk\leq n. Note that λi1\lambda_{i}\leq 1 for all i[k]i\in[k], as otherwise gCig^{C_{i}} would contradict g𝒢(A)g\in\mathcal{G}(A). Therefore,

gi=1kλigCinκ¯A.\|g\|_{\infty}\leq\sum_{i=1}^{k}\lambda_{i}\|g^{C_{i}}\|_{\infty}\leq n\bar{\kappa}_{A}. (29)

Using the Steinitz lemma, Eisenbrand and Weismantel [EHK18, Lemma 2] gave a bound on 𝔤1(A)\mathfrak{g}_{1}(A) that only depends on mm but is independent of nn:

Theorem 9.4.

Let Am×nA\in\mathbb{Z}^{m\times n}. Then 𝔤1(A)(2mAmax+1)m\mathfrak{g}_{1}(A)\leq(2m\|A\|_{\max}+1)^{m}.

10 A decomposition conjecture

Let WnW\subseteq\mathbb{R}^{n} be a linear space. As the analogue of maximal augmentations, we say that a conformal circuit decomposition of zWz\in W is maximal, if it can be obtained as follows. If zWz\in W is an elementary vector, return the decomposition containing the single vector zz. Otherwise, select an arbitrary g(W)g\in\mathcal{F}(W) that is conformal with zz (in particular, supp(g)supp(z)\mathrm{supp}(g)\subsetneq\mathrm{supp}(z)), and set g1=αgg^{1}=\alpha g for the largest value α>0\alpha>0 such that zg1z-g^{1} is conformal with zz. Then, recursively apply this procedure to zg1z-g^{1} to obtain the other elementary vectors g2,,ghg^{2},\ldots,g^{h}. We have hnh\leq n, since the support decreases by at least one due to the maximal choice of α\alpha. If κW=κ˙W=1\kappa_{W}={\dot{\kappa}}_{W}=1, then it is easy to verify the following.

Proposition 10.1.

Let WnW\subseteq\mathbb{R}^{n} be a linear space with κW=1\kappa_{W}=1, and let zWnz\in W\cap\mathbb{Z}^{n}. Then, for every maximal conformal circuit decomposition z=k=1hgkz=\sum_{k=1}^{h}g^{k}, we have gk(W)ng^{k}\in\mathcal{F}(W)\cap\mathbb{Z}^{n}.
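The recursive construction of a maximal conformal circuit decomposition described above can be sketched as follows (Python with numpy). The oracle find_conformal_elementary, returning some elementary vector of W conformal with its argument (and the argument itself if it is already elementary), is an assumed primitive.

```python
import numpy as np

def maximal_conformal_decomposition(z, find_conformal_elementary, tol=1e-12):
    """Sketch of the recursive procedure: repeatedly peel off a maximal
    multiple of a conformal elementary vector. Since the support strictly
    decreases in every round, at most n terms are produced."""
    decomposition = []
    z = np.array(z, dtype=float)
    while np.any(np.abs(z) > tol):
        g = find_conformal_elementary(z)
        # largest alpha such that z - alpha * g remains conformal with z,
        # i.e. no coordinate changes sign
        S = np.flatnonzero(np.abs(g) > tol)
        alpha = np.min(z[S] / g[S])
        decomposition.append(alpha * g)
        z = z - alpha * g
    return decomposition
```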

We formulate a conjecture asserting that this property generalizes to arbitrary κ˙W{\dot{\kappa}}_{W} values. Note that in the conjecture, we only require the existence of some (not necessarily maximal) conformal circuit decomposition.

Conjecture 10.1.1.

Let WnW\subseteq\mathbb{R}^{n} be a rational linear subspace. Then, for every zWnz\in W\cap\mathbb{Z}^{n}, there exists a conformal circuit decomposition z=k=1hgkz=\sum_{k=1}^{h}g^{k}, hnh\leq n such that each gkg^{k} is a 1/κ˙W1/\dot{\kappa}_{W}-integral vector in (W)\mathcal{F}(W).

Note that it is equivalent to require the same property for elements of the Graver basis z𝒢(A)z\in\mathcal{G}(A). Hence, the conjecture asserts that every vector in the Graver basis is a ‘nice’ combination of elementary vectors.

We present some preliminary evidence towards this conjecture:

Proposition 10.2.

Let WnW\subseteq\mathbb{R}^{n} be a rational linear subspace with κW=1\kappa^{*}_{W}=1. Then, for every zWnz\in W\cap\mathbb{Z}^{n}, and every maximal conformal circuit decomposition z=k=1hgkz=\sum_{k=1}^{h}g^{k}, we have that gkg^{k} is a 1/κ˙W1/\dot{\kappa}_{W}-integral vector in (W)\mathcal{F}(W).

Proof.

Assume κDW=1\kappa_{DW}=1 for some D𝐃nD\in\mathbf{D}_{n}. By Theorem 5.3, we can select DD such that all diagonal entries di=Diid_{i}=D_{ii}\in\mathbb{Z} and di|κ˙Wd_{i}|{\dot{\kappa}}_{W}. Let z=k=1hgkz=\sum_{k=1}^{h}g^{k} be any maximal conformal circuit decomposition of zWnz\in W\cap\mathbb{Z}^{n}. Clearly, Dz=k=1hDgkDz=\sum_{k=1}^{h}Dg^{k} is also a maximal conformal circuit decomposition of DzDWnDz\in DW\cap\mathbb{Z}^{n}. By Proposition 10.1, Dgk(DW)nDg^{k}\in\mathcal{F}(DW)\cap\mathbb{Z}^{n}. Since diκ˙Wd_{i}\mid{\dot{\kappa}}_{W}, this implies that gkg^{k} is 1/κ˙W1/{\dot{\kappa}}_{W}-integral. ∎

By Theorem 5.5, this implies the conjecture whenever κ˙W=pα{\dot{\kappa}}_{W}=p^{\alpha} for pp\in\mathbb{P}, p>2p>2, α\alpha\in\mathbb{N}. Let us now consider the case when κ˙W{\dot{\kappa}}_{W} is a power of 2. We verify the conjecture when the decomposition contains at most three terms.

Proposition 10.3.

Let WnW\subseteq\mathbb{R}^{n} be a rational linear subspace with κ˙W=2α{\dot{\kappa}}_{W}=2^{\alpha} for some α\alpha\in\mathbb{N}. If zWnz\in W\cap\mathbb{Z}^{n} has a maximal conformal circuit decomposition z=k=1hgkz=\sum_{k=1}^{h}g^{k} with h3h\leq 3, then each gkg^{k} is a 1/κ˙W1/\dot{\kappa}_{W}-integral vector in (W)\mathcal{F}(W).

Proof.

Let us write the maximal conformal circuit decomposition in the form z=k=1hλkgkz=\sum_{k=1}^{h}\lambda_{k}g^{k} such that lcm(gk)=1\mathrm{lcm}(g^{k})=1, and all entries gik{±1,±2,±4,,±2α}g^{k}_{i}\in\{\pm 1,\pm 2,\pm 4,\ldots,\pm 2^{\alpha}\} for k[h]k\in[h], i[n]i\in[n]. There is nothing to prove for h=1h=1. If h=2h=2, then by the maximality of the decomposition, λ1=minj{zj/gj1}\lambda_{1}=\min_{j}\{{z_{j}}/{g^{1}_{j}}\}. Hence, λ1\lambda_{1} is 1/2α1/2^{\alpha}-integral. Consequently, both λ1g1\lambda_{1}g^{1} and λ2g2=zλ1g1\lambda_{2}g^{2}=z-\lambda_{1}g^{1} are 1/2α1/2^{\alpha}-integral.

If h=3h=3, then λ1g1\lambda_{1}g^{1} is 1/2α1/2^{\alpha}-integral as above. It also follows that λ2g2\lambda_{2}g^{2} and λ3g3\lambda_{3}g^{3} are 1/2β1/2^{\beta}-integral for some βα\beta\geq\alpha. Let us choose the smallest such β\beta; we are done if β=α\beta=\alpha.

Assume for a contradiction β>α\beta>\alpha. Let μk=2βλk\mu_{k}=2^{\beta}\lambda_{k} for k=1,2,3k=1,2,3. Thus, μk\mu_{k}\in\mathbb{Z}, μ1\mu_{1} is even, and at least one of μ2\mu_{2} and μ3\mu_{3} is odd. We show that both μ2\mu_{2} and μ3\mu_{3} must be odd. Let us first assume that μ3\mu_{3} is odd. There exists an i[n]i\in[n] such that |gi3|=1|g^{3}_{i}|=1. Then, 2βzi=μ1gi1+μ2gi2+μ3gi32^{\beta}z_{i}=\mu_{1}g^{1}_{i}+\mu_{2}g^{2}_{i}+\mu_{3}g^{3}_{i} implies that μ2\mu_{2} must also be odd. Similarly, if μ2\mu_{2} is odd then μ3\mu_{3} must also be odd.

Let us take any j[n]j\in[n] such that gj1=0g^{1}_{j}=0. Then, 2βzj=μ2gj2+μ3gj32^{\beta}z_{j}=\mu_{2}g^{2}_{j}+\mu_{3}g^{3}_{j}. Noting that |gj2||g^{2}_{j}| and |gj3||g^{3}_{j}| are powers of 2 (or zero), both at most 2α2^{\alpha}, it follows that |gj2|=|gj3||g^{2}_{j}|=|g^{3}_{j}|; by conformality, we have gj2=gj3g^{2}_{j}=g^{3}_{j}.

Consequently, supp(g2g3)supp(g1)\mathrm{supp}(g^{2}-g^{3})\subseteq\mathrm{supp}(g^{1}). Clearly, g2g3W{0}g^{2}-g^{3}\in W\setminus\{0\}, and the containment is strict by the maximality of the decomposition: there exists an index jsupp(g1)j\in\mathrm{supp}(g^{1}) such that zj=λ1gj1z_{j}=\lambda_{1}g^{1}_{j}, and hence gj2=gj3=0g^{2}_{j}=g^{3}_{j}=0. This contradicts the fact that g1(W)g^{1}\in\mathcal{F}(W). ∎

Acknowledgements

The authors are grateful to Daniel Dadush for numerous inspiring discussions and joint work on circuit imbalances and linear programming, and to Luze Xu for pointing them to Jon Lee’s papers [Lee89, Lee90]. The authors would also like to thank Jesús De Loera, Martin Koutecký, and the anonymous reviewers for their helpful comments and suggestions.

References

  • [AK04] G. Appa and B. Kotnyek. Rational and integral kk-regular matrices. Discrete Mathematics, 275(1-3):1–15, 2004.
  • [AMO93] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. Network Flows: Theory, Algorithms, and Applications. Prentice-Hall, Inc., 1993.
  • [BDSE+14] N. Bonifas, M. Di Summa, F. Eisenbrand, N. Hähnle, and M. Niemeier. On sub-determinants and the diameter of polyhedra. Discrete & Computational Geometry, 52(1):102–115, 2014.
  • [BFH15] S. Borgwardt, E. Finhold, and R. Hemmecke. On the circuit diameter of dual transportation polyhedra. SIAM Journal on Discrete Mathematics, 29(1):113–121, 2015.
  • [Bla76] R. G. Bland. On the generality of network flow theory. Presented at the ORSA/TIMS Joint National Meeting, Miami, FL, 1976.
  • [BR13] T. Brunsch and H. Röglin. Finding short paths on polytopes by the shadow vertex algorithm. In Proceedings of the 40th International Colloquium on Automata, Languages, and Programming (ICALP), pages 279–290. Springer, 2013.
  • [BSY18] S. Borgwardt, T. Stephen, and T. Yusun. On the circuit diameter conjecture. Discrete & Computational Geometry, 60(3):558–587, 2018.
  • [BT89] F. Barahona and É. Tardos. Note on Weintraub’s minimum-cost circulation algorithm. SIAM Journal on Computing, 18(3):579–583, 1989.
  • [BV20] S. Borgwardt and C. Viss. An implementation of steepest-descent augmentation for linear programs. Operations Research Letters, 48(3):323–328, 2020.
  • [Cam64] P. Camion. Matrices Totalement Unimodulaires et Problemes Combinatoires. PhD thesis, Communauté europénne de l’énergie atomique (EURATOM), 1964. EUR 1632.1.
  • [Cam65] P. Camion. Characterization of totally unimodular matrices. Proceedings of the American Mathematical Society, 16(5):1068–1068, May 1965.
  • [Ced57] I. Cederbaum. Matrices all of whose elements and subdeterminants are 11, 1-1, or 0. Journal of Mathematics and Physics, 36(1-4):351–361, 1957.
  • [CLS19] M. B. Cohen, Y. T. Lee, and Z. Song. Solving linear programs in the current matrix multiplication time. In Proceedings of the 51st Annual ACM Symposium on Theory of Computing (STOC), pages 938–942, 2019.
  • [DF94] M. Dyer and A. Frieze. Random walks, totally unimodular matrices, and a randomised dual simplex algorithm. Mathematical Programming, 64(1):1–16, 1994.
  • [DH16] D. Dadush and N. Hähnle. On the shadow simplex method for curved polyhedra. Discrete & Computational Geometry, 56(4):882–909, 2016.
  • [DHNV20] D. Dadush, S. Huiberts, B. Natura, and L. A. Végh. A scaling-invariant algorithm for linear programming whose running time depends only on the constraint matrix. In Proceedings of the 52nd Annual ACM Symposium on Theory of Computing (STOC), pages 761–774, 2020.
  • [Dik67] I. Dikin. Iterative solution of problems of linear and quadratic programming. Doklady Akademii Nauk, 174(4):747–748, 1967.
  • [Din70] E. A. Dinic. Algorithm for solution of a problem of maximum flow in networks with power estimation. In Soviet Math. Doklady, volume 11, pages 1277–1280, 1970.
  • [DKNV21] D. Dadush, Z. K. Koh, B. Natura, and L. A. Végh. On circuit diameter bounds via circuit imbalances. arXiv preprint arXiv:2111.07913, 2021.
  • [DLHL15] J. A. De Loera, R. Hemmecke, and J. Lee. On augmentation algorithms for linear and integer-linear programming: From Edmonds–Karp to Bland and beyond. SIAM Journal on Optimization, 25(4):2494–2511, 2015.
  • [DLKS19] J. A. De Loera, S. Kafer, and L. Sanità. Pivot rules for circuit-augmentation algorithms in linear optimization. arXiv preprint arXiv:1909.12863, 2019.
  • [DLSV12] J. A. De Loera, B. Sturmfels, and C. Vinzant. The central curve in linear programming. Foundations of Computational Mathematics, 12(4):509–540, 2012.
  • [DNV20] D. Dadush, B. Natura, and L. A. Végh. Revisiting Tardos’s framework for linear programming: Faster exact solutions using approximate solvers. In Proceedings of the 61st Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 931–942, 2020.
  • [DSEFM14] M. Di Summa, F. Eisenbrand, Y. Faenza, and C. Moldenhauer. On largest volume simplices and sub-determinants. In Proceedings of the 26th annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 315–323, 2014.
  • [DVZar] D. Dadush, L. A. Végh, and G. Zambelli. On finding exact solutions to linear programs in the oracle model. In Proceedings of the 2022 ACM-SIAM Symposium on Discrete Algorithms (SODA), 2022 (to appear).
  • [EHK18] F. Eisenbrand, C. Hunkenschröder, and K.-M. Klein. Faster Algorithms for Integer Programs with Block Structure. In Proceedings of the 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018), pages 49:1–13, 2018.
  • [EHK+19] F. Eisenbrand, C. Hunkenschröder, K.-M. Klein, M. Koutecký, A. Levin, and S. Onn. An algorithmic theory of integer programming. arXiv preprint arXiv:1904.01361, 2019.
  • [EJ70] J. Edmonds and E. L. Johnson. Matching: A well-solved class of integer linear programs. In Combinatorial Structures and Their Applications, pages 89–92. Gordon and Breach, 1970.
  • [EK72] J. Edmonds and R. M. Karp. Theoretical improvements in algorithmic efficiency for network flow problems. Journal of the ACM (JACM), 19(2):248–264, 1972.
  • [EV17] F. Eisenbrand and S. Vempala. Geometric random edge. Mathematical Programming, 164(1-2):325–339, 2017.
  • [Fra11] A. Frank. Connections in Combinatorial Optimization. Number 38 in Oxford Lecture Series in Mathematics and its Applications. Oxford University Press, 2011.
  • [Ful68] D. Fulkerson. Networks, frames, blocking systems. Mathematics of the Decision Sciences, Part I, Lectures in Applied Mathematics, 2:303–334, 1968.
  • [GD21] J. B. Gauthier and J. Desrosiers. The minimum mean cycle-canceling algorithm for linear programs. European Journal of Operational Research, 2021. (in press).
  • [GHR95] O. Güler, A. J. Hoffman, and U. G. Rothblum. Approximations to solutions to systems of linear inequalities. SIAM Journal on Matrix Analysis and Applications, 16(2):688–696, 1995.
  • [GKS95] J. W. Grossman, D. M. Kulkarni, and I. E. Schochetman. On the minors of an incidence matrix and its Smith normal form. Linear Algebra and its Applications, 218:213–224, March 1995.
  • [GS86] A. M. Gerards and A. Schrijver. Matrices with the Edmonds–Johnson property. Combinatorica, 6(4):365–379, 1986.
  • [GT89] A. V. Goldberg and R. E. Tarjan. Finding minimum-cost circulations by canceling negative cycles. Journal of the ACM (JACM), 36(4):873–886, 1989.
  • [Hel57] I. Heller. On linear systems with integral valued solutions. Pacific Journal of Mathematics, 7(3):1351–1364, 1957.
  • [HK56] A. Hoffman and J. Kruskal. Integral boundary points of convex polyhedra. In Linear Inequalities and Related Systems, pages 223–246. Princeton University Press, 1956.
  • [HMNT93] D. S. Hochbaum, N. Megiddo, J. S. Naor, and A. Tamir. Tight bounds and 2-approximation algorithms for integer programs with two variables per inequality. Mathematical Programming, 62(1):69–83, 1993.
  • [Hof52] A. J. Hoffman. On approximate solutions of systems of linear inequalities. Journal of Research of the National Bureau of Standards, 49(4):263–265, 1952.
  • [HT56] I. Heller and C. Tompkins. An extension of a theorem of Dantzig’s. Linear inequalities and related systems, 38:247–254, 1956.
  • [HT02] J. C. Ho and L. Tunçel. Reconciliation of various complexity and condition measures for linear programming problems and a generalization of Tardos’ theorem. In Foundations of Computational Mathematics, pages 93–147. World Scientific, 2002.
  • [Kha95] L. Khachiyan. On the complexity of approximating extremal determinants in matrices. Journal of Complexity, 11(1):138–153, 1995.
  • [KM97] A. V. Karzanov and S. T. McCormick. Polynomial methods for separable convex optimization in unimodular linear spaces with applications. SIAM Journal on Computing, 26(4):1245–1275, 1997.
  • [Knu85] D. E. Knuth. Semi-optimal bases for linear dependencies. Linear and Multilinear Algebra, 17(1):1–4, 1985.
  • [KT95] D. Klatte and G. Thiere. Error bounds for solutions of linear equations and inequalities. Zeitschrift für Operations Research, 41(2):191–214, 1995.
  • [Lee89] J. Lee. Subspaces with well-scaled frames. Linear Algebra and its Applications, 114:21–56, 1989.
  • [Lee90] J. Lee. The incidence structure of subspaces with well-scaled frames. Journal of Combinatorial Theory, Series B, 50(2):265–287, 1990.
  • [LHK12] J. D. Loera, R. Hemmecke, and M. Köppe. Algebraic and Geometric Ideas in the Theory of Discrete Optimization. Society for Industrial and Applied Mathematics, USA, 2012.
  • [LS19] Y. T. Lee and A. Sidford. Solving linear programs with O~(rank)\tilde{O}(\sqrt{{\rm rank}}) linear system solves. arXiv preprint 1910.08033, 2019.
  • [MS00] S. T. McCormick and A. Shioura. Minimum ratio canceling is oracle polynomial for linear programming, but not strongly polynomial, even for networks. Operations Research Letters, 27(5):199–207, 2000.
  • [MT03] R. D. C. Monteiro and T. Tsuchiya. A variant of the Vavasis-Ye layered-step interior-point algorithm for linear programming. SIAM Journal on Optimization, 13(4):1054–1079, 2003.
  • [MT05] R. D. C. Monteiro and T. Tsuchiya. A new iteration-complexity bound for the MTY predictor-corrector algorithm. SIAM Journal on Optimization, 15(2):319–347, 2005.
  • [MT08] R. D. Monteiro and T. Tsuchiya. A strong bound on the integral of the central path curvature and its relationship with the iteration-complexity of primal-dual path-following LP algorithms. Mathematical Programming, 115(1):105–149, 2008.
  • [MTY93] S. Mizuno, M. Todd, and Y. Ye. On adaptive-step primal-dual interior-point algorithms for linear programming. Mathematics of Operations Research - MOR, 18:964–981, 11 1993.
  • [O’L90] D. P. O’Leary. On bounds for scaled projections and pseudoinverses. Linear Algebra and its Applications, 132:115–117, April 1990.
  • [PVZ20] J. Pena, J. C. Vera, and L. F. Zuluaga. New characterizations of Hoffman constants for systems of linear constraints. Mathematical Programming, 187:1–31, 2020.
  • [RG94] T. Radzik and A. V. Goldberg. Tight bounds on the number of minimum-mean cycle cancellations and related results. Algorithmica, 11(3):226–242, 1994.
  • [Roc69] R. T. Rockafellar. The elementary vectors of a subspace of RNR^{N}. In Combinatorial Mathematics and Its Applications: Proceedings North Carolina Conference, Chapel Hill, 1967, pages 104–127. The University of North Carolina Press, 1969.
  • [San12] F. Santos. A counterexample to the Hirsch conjecture. Annals of Mathematics, pages 383–412, 2012.
  • [Sch98] A. Schrijver. Theory of linear and integer programming. John Wiley & Sons, 1998.
  • [Sch03] A. Schrijver. Combinatorial Optimization – Polyhedra and Efficiency. Springer, 2003.
  • [Sey80] P. Seymour. Decomposition of regular matroids. Journal of Combinatorial Theory, Series B, 28(3):305–359, 1980.
  • [Sey93] M. Seysen. Simultaneous reduction of a lattice basis and its reciprocal basis. Combinatorica, 13(3):363–376, 1993.
  • [SIM00] M. Shigeno, S. Iwata, and S. T. McCormick. Relaxed most negative cycle and most positive cut canceling algorithms for minimum cost flow. Mathematics of Operations Research, 25(1):76–104, 2000.
  • [Sma98] S. Smale. Mathematical problems for the next century. The Mathematical Intelligencer, 20:7–15, 1998.
  • [SSZ91] G. Sonnevend, J. Stoer, and G. Zhao. On the complexity of following the central path of linear programs by linear extrapolation II. Mathematical Programming, 52(1-3):527–553, 1991.
  • [Ste89] G. Stewart. On scaled projections and pseudoinverses. Linear Algebra and its Applications, 112:189–193, 1989.
  • [SW99] A. S. Schulz and R. Weismantel. An oracle-polynomial time augmentation algorithm for integer programming. In Proceedings of the 10th ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 967–968, 1999.
  • [Tar86] É. Tardos. A strongly polynomial algorithm to solve combinatorial linear programs. Operations Research, pages 250–256, 1986.
  • [Tod90] M. J. Todd. A Dantzig–Wolfe-like variant of Karmarkar’s interior-point linear programming algorithm. Operations Research, 38(6):1006–1018, 1990.
  • [TTY01] M. J. Todd, L. Tunçel, and Y. Ye. Characterizations, bounds, and probabilistic analysis of two complexity measures for linear programming problems. Mathematical Programming, 90(1):59–69, 2001.
  • [Tun99] L. Tunçel. Approximating the complexity measure of Vavasis-Ye algorithm is NP-hard. Mathematical Programming, 86(1):219–223, 1999.
  • [Tut65] W. T. Tutte. Lectures on matroids. Journal of Research of the National Bureau of Standards (B), 69:1–47, 1965.
  • [vdBLL+21] J. van den Brand, Y. P. Liu, Y.-T. Lee, T. Saranurak, A. Sidford, Z. Song, and D. Wang. Minimum cost flows, MDPs, and L1-regression in nearly linear time for dense instances. In Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 859–869, 2021.
  • [vdBLN+20] J. van den Brand, Y.-T. Lee, D. Nanongkai, R. Peng, T. Saranurak, A. Sidford, Z. Song, and D. Wang. Bipartite matching in nearly-linear time on moderately dense graphs. In IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), pages 919–930, 2020.
  • [VY96] S. A. Vavasis and Y. Ye. A primal-dual interior point method whose running time depends only on the constraint matrix. Mathematical Programming, 74(1):79–120, 1996.
  • [Wal89] C. Wallacher. A generalization of the minimum-mean cycle selection rule in cycle canceling algorithms. Unpublished manuscript, Institut für Angewandte Mathematik, Technische Universität Braunschweig, 1989.
  • [Way02] K. D. Wayne. A polynomial combinatorial algorithm for generalized minimum cost flow. Mathematics of Operations Research, pages 445–459, 2002.
  • [Wei74] A. Weintraub. A primal algorithm to solve network flow problems with convex costs. Management Science, 21(1):87–97, 1974.
  • [WZ99] C. Wallacher and U. T. Zimmermann. A polynomial cycle canceling algorithm for submodular flows. Mathematical Programming, 86(1):1–15, 1999.

Appendix A: Proof of Proposition 3.20

See Proposition 3.20.

Proof.

Throughout the proof, $A$ denotes the matrix $\begin{bmatrix}1&3&4&3\\ 0&13&9&10\end{bmatrix}$ from Proposition 3.20. Every matrix $\tilde{A}$ with $\ker(\tilde{A})=\ker(A)$ can be written as $\tilde{A}=BA$ for an invertible $2\times 2$ matrix $B$. Since $A_{11}=1$ and $A_{21}=0$, we have $\tilde{A}_{11}=B_{11}$ and $\tilde{A}_{21}=B_{21}$; hence, for $\tilde{A}$ to be integral, $B_{11}$ and $B_{21}$ must be integers. Moreover, the nonzero entries $13,9,10$ of the second row of $A$ have greatest common divisor $1$; since $B_{i2}A_{2j}=\tilde{A}_{ij}-B_{i1}A_{1j}$ is an integer for every $i$ and every $j\in\{2,3,4\}$, it follows that $B_{12}$ and $B_{22}$ must be integers as well.

It can be verified by computer that the only integer vectors $v$ for which every nonzero entry of $v^{T}A$ is a divisor of $5850$ are

\pm\begin{bmatrix}9\\ -4\end{bmatrix},\quad\pm\begin{bmatrix}10\\ -3\end{bmatrix},\quad\pm\begin{bmatrix}13\\ -3\end{bmatrix},\quad\pm\begin{bmatrix}0\\ 1\end{bmatrix}
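The divisibility check above is a small finite computation. The following Python sketch (ours, not part of the original proof) verifies the condition for the four listed vectors; any further side conditions of the search are those specified in Proposition 3.20, so the snippet only illustrates the kind of computation involved.

```python
# Sketch of the computer check: for an integer vector v = (v1, v2),
# form v^T A and test whether every nonzero entry divides 5850.
A = [[1, 3, 4, 3],
     [0, 13, 9, 10]]

def row_vector(v):
    """Return v^T A as a list of four integers."""
    return [v[0] * A[0][j] + v[1] * A[1][j] for j in range(4)]

def nonzero_entries_divide_5850(v):
    """True iff every nonzero entry of v^T A is a divisor of 5850."""
    return all(x == 0 or 5850 % abs(x) == 0 for x in row_vector(v))

# The four vectors listed above; their negatives behave identically.
for v in [(9, -4), (10, -3), (13, -3), (0, 1)]:
    print(v, row_vector(v), nonzero_entries_divide_5850(v))

# An exhaustive search is finite: the first entry of v^T A equals v1, and the
# remaining entries then bound v2, so all candidates satisfy |v1|, |v2| <= 5850.
```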

Taking any two of these vectors as the rows of $B$ (negating or permuting the rows does not affect the claim), we obtain the following six possibilities for $\tilde{A}=BA$:

\begin{align*}
\begin{bmatrix}9&-4\\ 10&-3\end{bmatrix}\begin{bmatrix}1&3&4&3\\ 0&13&9&10\end{bmatrix} &=\begin{bmatrix}\textbf{9}&\textbf{-25}&0&-13\\ \textbf{10}&\textbf{-9}&13&0\end{bmatrix}\\
\begin{bmatrix}13&-3\\ 10&-3\end{bmatrix}\begin{bmatrix}1&3&4&3\\ 0&13&9&10\end{bmatrix} &=\begin{bmatrix}\textbf{13}&0&\textbf{25}&9\\ \textbf{10}&-9&\textbf{13}&0\end{bmatrix}\\
\begin{bmatrix}9&-4\\ 13&-3\end{bmatrix}\begin{bmatrix}1&3&4&3\\ 0&13&9&10\end{bmatrix} &=\begin{bmatrix}\textbf{9}&-25&0&\textbf{-13}\\ \textbf{13}&0&25&\textbf{9}\end{bmatrix}\\
\begin{bmatrix}0&1\\ 9&-4\end{bmatrix}\begin{bmatrix}1&3&4&3\\ 0&13&9&10\end{bmatrix} &=\begin{bmatrix}0&\textbf{13}&9&\textbf{10}\\ 9&\textbf{-25}&0&\textbf{-13}\end{bmatrix}\\
\begin{bmatrix}0&1\\ 10&-3\end{bmatrix}\begin{bmatrix}1&3&4&3\\ 0&13&9&10\end{bmatrix} &=\begin{bmatrix}0&\textbf{13}&\textbf{9}&10\\ 10&\textbf{-9}&\textbf{13}&0\end{bmatrix}\\
\begin{bmatrix}0&1\\ 13&-3\end{bmatrix}\begin{bmatrix}1&3&4&3\\ 0&13&9&10\end{bmatrix} &=\begin{bmatrix}0&13&\textbf{9}&\textbf{10}\\ 13&0&\textbf{25}&\textbf{9}\end{bmatrix}
\end{align*}

Each of these matrices contains a $2\times 2$ submatrix, highlighted in bold, whose inverse is not $\frac{1}{5850}$-integral. ∎
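The final claim can also be confirmed mechanically. The Python sketch below (ours, for illustration only) forms each of the six products above and searches all column pairs for a nonsingular $2\times 2$ submatrix whose inverse is not $\frac{1}{5850}$-integral; in every case such a submatrix is found, in agreement with the bold entries.

```python
from itertools import combinations

A = [[1, 3, 4, 3],
     [0, 13, 9, 10]]
# The B factors of the six matrices \tilde{A} = B A listed in the proof.
Bs = [[[9, -4], [10, -3]], [[13, -3], [10, -3]], [[9, -4], [13, -3]],
      [[0, 1], [9, -4]],   [[0, 1], [10, -3]],   [[0, 1], [13, -3]]]

def product(B):
    """Return the 2x4 matrix B A."""
    return [[B[i][0] * A[0][j] + B[i][1] * A[1][j] for j in range(4)]
            for i in range(2)]

def inverse_is_5850_integral(a, b, c, d):
    """The inverse of [[a, b], [c, d]] is (1/det) * [[d, -b], [-c, a]];
    it is 1/5850-integral iff det divides 5850*x for every entry x."""
    det = a * d - b * c
    return all(5850 * x % det == 0 for x in (a, b, c, d))

for B in Bs:
    M = product(B)
    bad = [(j, k) for j, k in combinations(range(4), 2)
           if M[0][j] * M[1][k] - M[0][k] * M[1][j] != 0     # nonsingular
           and not inverse_is_5850_integral(M[0][j], M[0][k],
                                            M[1][j], M[1][k])]
    print(M, "witness column pairs:", bad)  # nonempty for every matrix
```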