
Optimizing Gershgorin for Symmetric Matrices

Lee DeVille
Department of Mathematics
University of Illinois
Abstract

The Gershgorin Circle Theorem is a well-known and efficient method for bounding the eigenvalues of a matrix in terms of its entries. If $A$ is a symmetric matrix, by writing $A=B+x\mathbf{1}$, where $\mathbf{1}$ is the matrix with unit entries, we consider the problem of choosing $x$ to give the optimal Gershgorin bound on the eigenvalues of $B$, which then leads to one-sided bounds on the eigenvalues of $A$. We show that this $x$ can be found by an efficient linear program (whose solution can in many cases be written in closed form), and we show that for large classes of matrices, this shifting method beats all existing piecewise linear or quadratic bounds on the eigenvalues. We also apply this shifting paradigm to some nonlinear estimators and show that under certain symmetries this also gives rise to a tractable linear program. As an application, we give a novel bound on the lowest eigenvalue of an adjacency matrix in terms of the “top two” or “bottom two” degrees of the corresponding graph, and study the efficacy of this method in obtaining sharp eigenvalue estimates for certain classes of matrices.

AMS classification: 65F15, 15A18, 15A48

Keywords: Gershgorin Circle Theorem, Eigenvalue Bounds, Linear Program, Adjacency Matrix

1 Introduction

One of the best known bounds for the eigenvalues of a matrix is the classical Gershgorin Circle Theorem [1, 2], which gives an inclusion domain for the eigenvalues of a matrix determined solely by its entries. Specifically, given an $n\times n$ matrix $A$ whose entries are denoted by $a_{ij}$, we define

R_{i}(A):=\sum_{j\neq i}\left|a_{ij}\right|,\quad D_{i}=\{z\in\mathbb{C}:\left|z-a_{ii}\right|\leq R_{i}(A)\}.

Then the spectrum of $A$ is contained inside the union of the $D_{i}$.

If $A$ is a symmetric matrix, then the spectrum is real. In general, we might like to determine whether $A$ is positive definite, or even to determine a bound on the spectral gap of $A$, i.e. a number $\alpha>0$ such that all eigenvalues are greater than $\alpha$. Let us refer to the quantity $a_{ii}-R_{i}(A)$ as the diagonal dominance of the $i$th row of the matrix. A restatement of Gershgorin is that the smallest eigenvalue is at least as large as the smallest diagonal dominance of any row.

As an example, let us consider the matrix

A=(655565556),A=\left(\begin{array}[]{ccc}6&5&5\\ 5&6&5\\ 5&5&6\end{array}\right), (1)

and we see that $a_{ii}-R_{i}(A)=-4$ for all rows. From this and Gershgorin we know that all eigenvalues are at least $-4$, but we cannot assert that $A$ is positive definite. However, we can compute directly that $\mathrm{Spec}(A)=\{1,1,16\}$; not only is $A$ positive definite, it has a spectral gap of $1$.
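As an illustration, here is a minimal numpy sketch (ours, not part of the paper) of the row computation above, evaluating the Gershgorin lower bound $\min_i(a_{ii}-R_i(A))$ and the actual spectrum for the matrix in (1).

import numpy as np

def gershgorin_lower(A):
    # smallest diagonal dominance a_ii - R_i(A) over all rows
    A = np.asarray(A, dtype=float)
    R = np.abs(A).sum(axis=1) - np.abs(np.diag(A))   # off-diagonal absolute row sums R_i(A)
    return np.min(np.diag(A) - R)

A = np.array([[6., 5., 5.],
              [5., 6., 5.],
              [5., 5., 6.]])
print(gershgorin_lower(A))      # -4.0, the unshifted Gershgorin bound
print(np.linalg.eigvalsh(A))    # approximately [1, 1, 16]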

We then note that $A=I+5\cdot\mathbf{1}$ (here and below we use $\mathbf{1}$ to denote the matrix with all entries equal to $1$). Since $I$ is positive definite and $\mathbf{1}$ is positive semidefinite, it follows that $A$ is positive definite as well. In fact, we know that the eigenvalues of $A$ are at least as large as those of $I$, so that $A$ has a spectral gap of at least $1$, which is the exact spectral gap.

Let us consider the family of matrices $A-x\mathbf{1}$ with $x\geq 0$, and consider how the diagonal dominance changes. Subtracting $x$ from each entry moves the diagonal entry to the left by $x$, but as long as the off-diagonal terms remain positive, it decreases the off-diagonal sum by $2x$ (there are two off-diagonal entries in each row of this $3\times 3$ example). This has the effect of moving the center of each Gershgorin circle to the left by $x$ while decreasing the radius by $2x$, effectively moving the left-hand edge of each circle to the right by $x$. This gives an improved estimate for the lowest eigenvalue. It is not hard to see in this case that choosing $x=5$ gives the optimal diagonal dominance of $1$. Generalizing this idea: let $A$ be a symmetric $n\times n$ matrix with strictly positive entries. Since $\mathbf{1}$ is positive semidefinite, whenever $x\geq 0$, $A=B+x\mathbf{1}$ is positive definite whenever $B$ is. Moreover, subtracting $x$ from each entry, at least until some of the entries become negative, will only increase the diagonal dominance of each row: the center of each Gershgorin circle moves to the left by $x$, but the radius decreases by $(n-1)x$. From this it follows that we can always improve the Gershgorin bound for such a matrix with positive entries by choosing $x$ to be the smallest off-diagonal term.

However, now consider the matrix

A=(613174345).A=\left(\begin{array}[]{ccc}6&1&3\\ 1&7&4\\ 3&4&5\end{array}\right). (2)

We see directly that the bottom row has the smallest diagonal dominance, namely $-2$. Subtracting $1$ from each entry gives

A1𝟏=(502063234),A-1\cdot{\bf 1}=\left(\begin{array}[]{ccc}5&0&2\\ 0&6&3\\ 2&3&4\end{array}\right),

and now the bottom row has a diagonal dominance of $-1$. From here we are not able to deduce that $A$ is positive definite. However, as it turns out, for all $x\in(2,10/3)$, the matrix $A-x\mathbf{1}$ has a positive diagonal dominance in each row, and by the techniques described in this paper we can compute that the optimal choice is $x=3$:

A3𝟏=(320241012),A-3\cdot{\bf 1}=\left(\begin{array}[]{ccc}3&-2&0\\ -2&4&1\\ 0&1&2\end{array}\right),

and each row has a diagonal dominance of $1$. Since $3\cdot\mathbf{1}$ is positive semidefinite, this guarantees that $A$ is positive definite and, in fact, has a spectral gap of at least $1$. Indeed, we can compute directly that the eigenvalues of $A$ are $(11.4704,5.39938,1.13026)$, so this is not too bad of a bound for this case.
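As a quick illustration (ours), the optimal shift for the matrix in (2) can be located by a simple grid search over $x$; the exact optimization is treated in Section 3.

import numpy as np

def shifted_gershgorin_lower(A, xs):
    # best Gershgorin lower bound over the shifted family A - x*1, for x in the grid xs
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    best = -np.inf
    for x in xs:
        B = A - x * np.ones((n, n))
        R = np.abs(B).sum(axis=1) - np.abs(np.diag(B))
        best = max(best, np.min(np.diag(B) - R))
    return best

A = np.array([[6., 1., 3.],
              [1., 7., 4.],
              [3., 4., 5.]])
print(shifted_gershgorin_lower(A, np.arange(0, 5.01, 0.01)))   # approximately 1.0, attained near x = 3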

At first it might be counterintuitive that we should subtract off entries and cause some off-diagonal entries to become negative, since $R_{i}(A)$ is a function of the absolute values of the off-diagonal terms, and making these entries more negative can actually make the bound from the $i$th row worse. However, there is a coupling between the rows of the matrix, since we are looking for the minimal diagonal dominance across all of the rows: it can help to make some rows worse, as long as we make the worst row better. In this example, increasing $x$ past $1$ does worsen the diagonal dominance of the first row, but it continues to increase the dominance of the third row, and thus the minimum increases as we increase $x$. We should stop only when the rows pass each other, which happens at $x=3$. From this argument it is also clear that the optimal value of $x$ must occur where two rows have equal diagonal dominance.

The topic of this paper is the possibility of using this idea of shifting the entries of the matrix uniformly to obtain better eigenvalue bounds. Note that this technique is potentially applicable to any eigenvalue bounding method that is a function of the entries of the matrix, and we consider examples other than Gershgorin as well. We show that for Gershgorin, which is piecewise linear in the coefficients of the matrix, the technique boils down to the simultaneous optimization of families of piecewise-linear functions, and so can always be solved, e.g. by conversion to a linear program. We will also consider “nonlinear” versions of Gershgorin-type theorems and shift these as well, but we see that in these cases we can lose convexity, and in general the resulting problems are difficult to solve.

We show below that for many classes of matrices, the “shifted Gershgorin” gives significantly better bounds than many of the unshifted nonlinear methods that are currently known.

2 Notation and background

Throughout the remainder of this paper, all matrices are symmetric.

Definition 2.1.

Let $\mathsf{Q}$ be a method that gives upper and lower bounds on the eigenvalues of a matrix as a function of its entries. We denote by $\lambda_{\mathsf{Q}}(A)$ the greatest lower bound method $\mathsf{Q}$ can give on the spectrum of $A$, and by $\rho_{\mathsf{Q}}(A)$ the least upper bound.

Here we review several known methods. There are many methods known that we do not mention here; a very comprehensive and engaging review of many methods and their histories is contained in [2].

  1. Gershgorin’s Circles. [1] As mentioned in the introduction, define

    R_{i}(A)=\sum_{j\neq i}\left|a_{ij}\right|,\quad D_{i}=\{z\in\mathbb{C}:\left|z-a_{ii}\right|\leq R_{i}(A)\}.

    Then $\mathrm{Spec}(A)\subseteq\bigcup_{i}D_{i}$. For this method, we obtain the bounds

    \lambda_{\mathsf{G}}(A)=\min_{i}(a_{ii}-R_{i}(A)),\quad\rho_{\mathsf{G}}(A)=\max_{i}(a_{ii}+R_{i}(A)). (3)
  2. Brauer’s Ovals of Cassini. [3][4, eqn. (21)] Using the definition of $R_{i}(A)$ as above, let us define

    B_{ij}=\{z\in\mathbb{C}:\left|z-a_{ii}\right|\left|z-a_{jj}\right|\leq R_{i}(A)R_{j}(A)\}.

    Then $\mathrm{Spec}(A)\subseteq\bigcup_{i\neq j}B_{ij}$. It can be shown [2, Theorem 2.3] that this method always gives a bound no worse than Gershgorin’s (note for example that choosing $i=j$ formally recovers a Gershgorin disk). The bounds come from the left- or right-hand edges of this domain, where $z-a_{ii}$ and $z-a_{jj}$ have the same sign. This gives

    a_{ii}a_{jj}-R_{i}(A)R_{j}(A)-(a_{ii}+a_{jj})z+z^{2}\leq 0.

    The roots of this quadratic are

    \frac{a_{ii}+a_{jj}}{2}\pm\sqrt{\left(\frac{a_{ii}-a_{jj}}{2}\right)^{2}+R_{i}(A)R_{j}(A)}.

    Therefore we have

    \lambda_{\mathsf{B}}(A)=\min_{i\neq j}\left(\frac{a_{ii}+a_{jj}}{2}-\sqrt{\left(\frac{a_{ii}-a_{jj}}{2}\right)^{2}+R_{i}(A)R_{j}(A)}\right), (4)
    \rho_{\mathsf{B}}(A)=\max_{i\neq j}\left(\frac{a_{ii}+a_{jj}}{2}+\sqrt{\left(\frac{a_{ii}-a_{jj}}{2}\right)^{2}+R_{i}(A)R_{j}(A)}\right). (5)

    (A small numerical sketch of the Gershgorin and Brauer bounds (3)–(5) appears after this list.)
  3. Melman’s method. According to Theorem 2 of [5], if we define

    \Omega_{ij}(A)=\{z\in\mathbb{C}:\left|(z-a_{ii})(z-a_{jj})-a_{ij}a_{ji}\right|\leq\left|z-a_{jj}\right|\sum_{k\neq i,j}\left|a_{ik}\right|+\left|a_{ij}\right|\sum_{k\neq i,j}\left|a_{jk}\right|\},

    then

    \mathrm{Spec}(A)\subseteq\bigcup_{i=1}^{n}\bigcap_{j\neq i}\Omega_{ij}(A).

    From this we can obtain bounds $\lambda_{\mathsf{M}}(A),\rho_{\mathsf{M}}(A)$ similar to those in (4)–(5).

  4. Cvetković–Kostić–Varga method. Let $S$ be a nonempty subset of $[n]$, let $\overline{S}=[n]\setminus S$, and let

    R_{i}^{S}(A)=\sum_{j\in S,j\neq i}\left|a_{ij}\right|,\quad R_{i}^{\overline{S}}(A)=\sum_{j\notin S,j\neq i}\left|a_{ij}\right|.

    Further define

    \Gamma_{i}^{S}(A)=\left\{z\in\mathbb{C}:\left|z-a_{ii}\right|\leq R_{i}^{S}(A)\right\},
    V_{ij}^{S}(A)=\left\{z\in\mathbb{C}:(\left|z-a_{ii}\right|-R_{i}^{S}(A))(\left|z-a_{jj}\right|-R_{j}^{\overline{S}}(A))\leq R_{i}^{\overline{S}}(A)R_{j}^{S}(A)\right\}.

    According to Theorem 6 of [6],

    \mathrm{Spec}(A)\subseteq\left(\bigcup_{i\in S}\Gamma_{i}^{S}\right)\cup\left(\bigcup_{i\in S,j\in\overline{S}}V_{ij}^{S}(A)\right).
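The following is a minimal numpy sketch (ours, not from [1]–[6]) of the two bounds we focus on below, implementing (3) and (4)–(5) directly.

import numpy as np

def offdiag_row_sums(A):
    # R_i(A) = sum_{j != i} |a_ij|
    return np.abs(A).sum(axis=1) - np.abs(np.diag(A))

def gershgorin_bounds(A):
    d, R = np.diag(A), offdiag_row_sums(A)
    return np.min(d - R), np.max(d + R)            # (lambda_G(A), rho_G(A)) from (3)

def brauer_bounds(A):
    d, R = np.diag(A), offdiag_row_sums(A)
    n = len(d)
    lo, hi = np.inf, -np.inf
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            mid = 0.5 * (d[i] + d[j])
            rad = np.sqrt(0.25 * (d[i] - d[j]) ** 2 + R[i] * R[j])
            lo, hi = min(lo, mid - rad), max(hi, mid + rad)
    return lo, hi                                   # (lambda_B(A), rho_B(A)) from (4)-(5)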
Definition 2.2.

Let $\mathsf{Q}$ be a method as above. We define the shifted-$\mathsf{Q}$ method by

\lambda^{\mathsf{S}}_{\mathsf{Q}}(A)=\sup_{x\geq 0}\lambda_{\mathsf{Q}}(A-x\mathbf{1}),\quad\rho^{\mathsf{S}}_{\mathsf{Q}}(A)=\inf_{x\leq 0}\rho_{\mathsf{Q}}(A-x\mathbf{1}).

Note that the domain of optimization is $x\geq 0$ in one definition and $x\leq 0$ in the other. For example, $\lambda^{\mathsf{S}}_{\mathsf{Q}}(A)$ is the best lower bound that we can obtain by method $\mathsf{Q}$ on the family given by subtracting a positive number from the entries of $A$, whereas $\rho^{\mathsf{S}}_{\mathsf{Q}}(A)$ is the best upper bound obtained by adding a positive number.
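A brute-force sketch (ours) of Definition 2.2, with a grid search standing in for the exact optimization (which for Gershgorin is carried out in Section 3); lower_bound_fn could be, for instance, a wrapper around the gershgorin_bounds sketch above.

import numpy as np

def shifted_lower(A, lower_bound_fn, xs):
    # sup over x >= 0 (restricted to the finite grid xs) of lower_bound_fn(A - x*1)
    n = A.shape[0]
    return max(lower_bound_fn(A - x * np.ones((n, n))) for x in xs)

def shifted_upper(A, upper_bound_fn, xs):
    # inf over x <= 0 (xs should be a grid of nonpositive numbers) of upper_bound_fn(A - x*1)
    n = A.shape[0]
    return min(upper_bound_fn(A - x * np.ones((n, n))) for x in xs)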

Remark 2.3.

We will focus our study on the first two methods in the interest of brevity, i.e. we consider $\mathsf{Q}=\mathsf{G},\mathsf{B}$ for Gershgorin and Brauer. However, we present all four above as representative of a truly large class of methods that share a common feature: the bounds are always a function of the diagonal entries and a sum of (perhaps a subset of) the absolute values of the off-diagonal terms in each row. As we show below in Section 3, the shifted bounds are a function not just of sums of off-diagonal terms, but of each individual entry. As such, it is not surprising that shifted bounds can in some cases vastly outperform the unshifted bounds; we will see that shifted Gershgorin can outperform even unshifted nonlinear bounds such as Brauer.

Lemma 2.4.

If $\mathsf{Q}$ is a method that bounds the spectrum of a matrix, then the spectrum of $A$ is contained in the interval $[\lambda^{\mathsf{S}}_{\mathsf{Q}}(A),\rho^{\mathsf{S}}_{\mathsf{Q}}(A)]$, i.e. the shifted bounds are still valid upper and lower bounds for the eigenvalues.

Proof.

The proof follows from the Courant minimax theorem [7, §4.2][8, §4]: let $A$ be a symmetric $n\times n$ matrix, and let $\lambda_{1}(A)\leq\lambda_{2}(A)\leq\dots\leq\lambda_{n}(A)$ be its eigenvalues; then

\lambda_{1}(A)=\min_{v\neq 0}\frac{\langle Av,v\rangle}{\langle v,v\rangle},\quad\lambda_{n}(A)=\max_{v\neq 0}\frac{\langle Av,v\rangle}{\langle v,v\rangle}.

If we write $A=B+x\mathbf{1}$, then

\lambda_{1}(A)=\min_{v\neq 0}\frac{\langle(B+x\mathbf{1})v,v\rangle}{\langle v,v\rangle}\geq\min_{v\neq 0}\frac{\langle Bv,v\rangle}{\langle v,v\rangle}+\min_{v\neq 0}\frac{\langle x\mathbf{1}v,v\rangle}{\langle v,v\rangle}. (6)

If $x\geq 0$, then the second term is zero (in fact, $\mathbf{1}$ is a rank-one matrix with top eigenvalue $n$), and thus we have $\lambda_{1}(A)\geq\lambda_{1}(B)$. Since $B=A-x\mathbf{1}$, we have for all $x\geq 0$,

\lambda_{1}(A)\geq\lambda_{1}(A-x\mathbf{1})\geq\lambda_{\mathsf{Q}}(A-x\mathbf{1}),

and the same is true after taking the supremum. Similarly,

\lambda_{n}(A)\leq\max_{v\neq 0}\frac{\langle Bv,v\rangle}{\langle v,v\rangle}+\max_{v\neq 0}\frac{\langle x\mathbf{1}v,v\rangle}{\langle v,v\rangle},

and if $x\leq 0$, the second term is zero. Therefore, for all $x\leq 0$,

\lambda_{n}(A)\leq\lambda_{n}(A-x\mathbf{1})\leq\rho_{\mathsf{Q}}(A-x\mathbf{1}),

and the same remains true after taking the infimum of the right-hand side. ∎

It is clear from the definition that $\lambda^{\mathsf{S}}_{\mathsf{Q}}(A)\geq\lambda_{\mathsf{Q}}(A)$ and $\rho^{\mathsf{S}}_{\mathsf{Q}}(A)\leq\rho_{\mathsf{Q}}(A)$ for any method $\mathsf{Q}$ and any matrix $A$. In short, shifting cannot give a worse estimate, and by Lemma 2.4 this improved estimate is still a valid bound. Moreover, as long as

\left.\left(\frac{d}{dx}\lambda_{\mathsf{Q}}(A-x\mathbf{1})\right)\right|_{x=0+}>0,

then $\lambda^{\mathsf{S}}_{\mathsf{Q}}(A)>\lambda_{\mathsf{Q}}(A)$, and similarly for the upper bounds. Moreover, we show in many contexts that the lower bounding functions are concave (and the upper ones convex) in $x$, so that the converse holds as well in those cases.

Remark 2.5.

If we reëxamine the proof of Lemma 2.4, we might hope to obtain information on a larger domain of shifts; for example, in (6), when $x<0$ we have $\min_{v\neq 0}\langle x\mathbf{1}v,v\rangle/\langle v,v\rangle=xN$, so for $x\leq 0$,

\lambda_{1}(A)\geq\lambda_{1}(A-x\mathbf{1})+xN. (7)

We might hope that this could give useful information even when $x\leq 0$, but a short calculation shows that (at least for all of the methods described in this paper) the first term on the right-hand side cannot grow fast enough to make the right-hand side of this inequality increase as $x$ decreases, so this always gives a worse bound than simply restricting to $x\geq 0$. For example, for Gershgorin, the absolute best case is that all of the signs align and the left edge of the Gershgorin disk moves to the right at rate $N$ as $x$ decreases, but this is exactly counteracted by the second term on the right-hand side of (7).

3 Shifted Gershgorin bounds

In this section, we concentrate on the shifted Gershgorin method. Recall that the standard Gershgorin bounds are given by

\lambda_{\mathsf{G}}(A)=\min_{i}(a_{ii}-R_{i}(A)),\quad\rho_{\mathsf{G}}(A)=\max_{i}(a_{ii}+R_{i}(A)).

Let us define

d_{i}(A,x)=a_{ii}-x-\sum_{j\neq i}\left|a_{ij}-x\right|,\quad D_{i}(A,x)=a_{ii}-x+\sum_{j\neq i}\left|a_{ij}-x\right|, (8)

so that

\lambda^{\mathsf{S}}_{\mathsf{G}}(A)=\sup_{x\geq 0}\min_{i}d_{i}(A,x),\quad\rho^{\mathsf{S}}_{\mathsf{G}}(A)=\inf_{x\leq 0}\max_{i}D_{i}(A,x).
Theorem 3.1.

The main results for shifted Gershgorin are as follows:

  • (Local Improvement.) $\lambda^{\mathsf{S}}_{\mathsf{G}}(A)>\lambda_{\mathsf{G}}(A)$ if and only if, for every $i$ with $d_{i}(A,0)=\lambda_{\mathsf{G}}(A)$, row $i$ of the matrix $A$ has more than $n/2$ positive off-diagonal entries. Similarly, $\rho^{\mathsf{S}}_{\mathsf{G}}(A)<\rho_{\mathsf{G}}(A)$ if and only if, for every $i$ with $D_{i}(A,0)=\rho_{\mathsf{G}}(A)$, row $i$ of the matrix $A$ has more than $n/2$ negative off-diagonal entries.

  • (Global bounds.) Each of the functions $\lambda_{\mathsf{G}}(A-x\mathbf{1}),\rho_{\mathsf{G}}(A-x\mathbf{1})$ can be written as a single piecewise-linear function. Alternatively, we can write these functions as cut out by $n$ lines, i.e.

    \lambda_{\mathsf{G}}(A-x\mathbf{1})=\min_{k=1}^{n}(r_{k}x+s_{k}),\quad\rho_{\mathsf{G}}(A-x\mathbf{1})=\max_{k=1}^{n}(R_{k}x+S_{k}), (9)

    where the constants in the above expressions are given in Definition 3.3;

  • (Convexity.) It follows from the previous item that the functions $-\lambda_{\mathsf{G}}(A-x\mathbf{1})$ and $\rho_{\mathsf{G}}(A-x\mathbf{1})$ are convex, i.e. $\lambda_{\mathsf{G}}(A-x\mathbf{1})$ is “concave down” and $\rho_{\mathsf{G}}(A-x\mathbf{1})$ is “concave up” in $x$. From this it follows that the optimum is attained either at a unique value of $x$ or on a unique closed interval.

Remark 3.2.

The local improvement part of the theorem above is basically telling us that we need enough terms of the correct sign for the shifting to improve the estimates. If a row has many more positive terms than negative terms, for example, then subtracting the same number from each entry improves the Gershgorin bound from that row, since it decreases the off-diagonal sum; from this we see that we want the row(s) which are the limiting factor in the $\lambda_{\mathsf{G}}(A)$ calculation to have enough positive terms. In particular, it follows that if there are enough terms of both signs, shifting can improve both sides of the bound.

One quick corollary of this theorem is that if we have a matrix with all positive entries (or even all positive off-diagonal entries), then shifting is guaranteed to improve the lower bound, yet cannot improve the upper bound. In this sense, our theorem is a counterpoint to the Perron–Frobenius theorem: the bound on the spectral radius of a positive matrix is exactly the one given by Perron–Frobenius, but we always obtain an improved bound on the least eigenvalue.

The global bound part of the theorem tells us that we can write the objective function as a single piecewise-linear function, or as an extremum of at most $n$ different linear functions. This more or less allows us to write down the optimal bound in closed form; see Corollary 3.5 below. In any case, the optimal bound can be obtained by a linear program.

Definition 3.3.

Let $A$ be an $n\times n$ symmetric matrix. We define $r_{k}:=n-2k$ and $R_{k}:=2k-n-2$ for $k=1,\dots,n$. For each $i=1,\dots,n$, we define $\delta_{i,k}$ as follows: let $y_{1}\leq y_{2}\leq\dots\leq y_{n-1}$ be the off-diagonal terms of row $i$ of the matrix $A$ in nondecreasing order, and define, for $i,k=1,\dots,n$,

\delta_{i,k}:=\sum_{j<k}y_{j}-\sum_{j\geq k}y_{j},

and then

s_{i,k}=a_{ii}+\delta_{i,k},\quad S_{i,k}=a_{ii}-\delta_{i,k}.

Finally, we define

s_{k}=\min_{i}s_{i,k},\quad S_{k}=\max_{i}S_{i,k}.
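As a concrete illustration (ours, not the paper's), the constants of Definition 3.3 can be computed as follows; the printed values match Example 3.7 below.

import numpy as np

def gershgorin_lines(A):
    # slopes r_k = n - 2k and intercepts s_k from Definition 3.3, k = 1, ..., n
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    r = np.array([n - 2 * k for k in range(1, n + 1)])
    s = np.full(n, np.inf)
    for i in range(n):
        y = np.sort(np.delete(A[i], i))            # off-diagonal entries of row i, nondecreasing
        for k in range(1, n + 1):
            delta = y[:k - 1].sum() - y[k - 1:].sum()
            s[k - 1] = min(s[k - 1], A[i, i] + delta)   # s_k = min_i (a_ii + delta_{i,k})
    return r, s

# lambda_G(A - x*1) = min_k (r_k x + s_k), as in (9)
A = np.array([[6., 1., 3.], [1., 7., 4.], [3., 4., 5.]])
print(gershgorin_lines(A))    # r = [1, -1, -3], s = [-2, 4, 10]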
Lemma 3.4.

The functions $d_{i}(A,x)$ and $D_{i}(A,x)$ are piecewise linear; $d_{i}$ can be written as the minimum of a family of linear functions, and $D_{i}$ as the maximum of such a family:

d_{i}(A,x)=\min_{k}(r_{k}x+s_{i,k}),\quad D_{i}(A,x)=\max_{k}(R_{k}x+S_{i,k}).
Proof.

Define $y$ as in Definition 3.3. First note that we can write

d_{i}(A,x)=a_{ii}-x-\sum_{\ell=1}^{n-1}\left|y_{\ell}-x\right|,

since the ordering in the sum does not matter. Now, note that if $x\in(y_{k-1},y_{k})$ (which could be an empty interval), we have

d_{i}(A,x)=a_{ii}-x-\sum_{\ell=1}^{k-1}(x-y_{\ell})-\sum_{\ell=k}^{n-1}(y_{\ell}-x)
=a_{ii}+\sum_{\ell=1}^{k-1}y_{\ell}-\sum_{\ell=k}^{n-1}y_{\ell}-x-(k-1)x+(n-k)x
=s_{i,k}+(n-2k)x=s_{i,k}+r_{k}x.

In the case where $y_{k-1}=y_{k}$, this equality does not hold on an interval, but it does hold at the point $y_{k}$. Finally, noting that $\left|x\right|\geq x$ and $\left|x\right|\geq-x$, we see that $d_{i}(A,x)\leq r_{k}x+s_{i,k}$ for all $x$ (note the negative sign in front of the sum!). Thus the family $r_{k}x+s_{i,k}$ dominates $d_{i}(A,x)$ and coincides with it on a nonempty set, and this proves the result.

The argument is similar for $D_{i}$. If we write

D_{i}(A,x)=a_{ii}-x+\sum_{\ell=1}^{k-1}(x-y_{\ell})+\sum_{\ell=k}^{n-1}(y_{\ell}-x)
=a_{ii}-\sum_{\ell=1}^{k-1}y_{\ell}+\sum_{\ell=k}^{n-1}y_{\ell}-x+(k-1)x-(n-k)x
=S_{i,k}+(2k-n-2)x=S_{i,k}+R_{k}x,

the remainder follows by taking maxima. ∎

Proof of Theorem 3.1.

First we prove the “global bounds” part of the theorem. From Lemma 3.4,

\lambda^{\mathsf{S}}_{\mathsf{G}}(A)=\sup_{x\geq 0}\min_{i}d_{i}(A,x)=\sup_{x\geq 0}\min_{i}\min_{k}(r_{k}x+s_{i,k})=\sup_{x\geq 0}\min_{k}(r_{k}x+\min_{i}s_{i,k})=\sup_{x\geq 0}\min_{k}(r_{k}x+s_{k}),

and we are done. The proof is the same for $\rho^{\mathsf{S}}_{\mathsf{G}}(A)$.

The argument for convexity is similar. We have

\lambda_{\mathsf{G}}(A-x\mathbf{1})=\min_{i}d_{i}(A,x)=\min_{i}\min_{k}(r_{k}x+s_{i,k}).

As a minimum of linear functions, $\lambda_{\mathsf{G}}(A-x\mathbf{1})$ is concave down, and similarly, as $\rho_{\mathsf{G}}(A-x\mathbf{1})$ is a maximum of linear functions, it is concave up.

Finally, we consider the “local improvement” statement. Choose and fix a row $i$, and define $y$ as in Definition 3.3. Let $k$ be the index of the first strictly positive entry of $y$ (with $k=1$ if all entries are positive and $k=n$ if none are), so that exactly $n-k$ of the entries of $y$ are positive. For $x$ in a small right-neighborhood of zero, every entry with index less than $k$ lies below $x$ and every entry with index at least $k$ lies above $x$, so by the proof of Lemma 3.4 the slope of $d_{i}(A,x)$ there is $n-2k$. This slope is positive if and only if $k<n/2$, i.e. if and only if more than $n/2$ of the entries of $y$ are positive. Since $\lambda_{\mathsf{G}}(A-x\mathbf{1})$ is the minimum of the $d_{i}$, and each $d_{i}$ is concave down, $\lambda_{\mathsf{G}}(A-x\mathbf{1})$ is increasing on a right-neighborhood of zero (and hence $\lambda^{\mathsf{S}}_{\mathsf{G}}(A)>\lambda_{\mathsf{G}}(A)$) if and only if this condition holds for every row which attains the minimum at $x=0$. The argument for the upper bound is symmetric, with negative entries in place of positive ones.

Corollary 3.5.

Let $A$ be an $n\times n$ symmetric matrix and consider the set of pairs of indices defined by

Q=\{(k,l):k\leq n/2,\ l>n/2,\ s_{l}\geq s_{k}\}.

If $Q\neq\emptyset$, then $\lambda^{\mathsf{S}}_{\mathsf{G}}(A)>\lambda_{\mathsf{G}}(A)$ and

\lambda^{\mathsf{S}}_{\mathsf{G}}(A)=\min_{(k,l)\in Q}\left(\frac{(n/2-k)s_{l}+(l-n/2)s_{k}}{l-k}\right), (10)

while if $Q=\emptyset$ then $\lambda^{\mathsf{S}}_{\mathsf{G}}(A)=\lambda_{\mathsf{G}}(A)$. Moreover, the supremum of $\lambda_{\mathsf{G}}(A-x\mathbf{1})$ over $x\geq 0$ is attained at $x=(s_{l}-s_{k})/(2(l-k))$, where $k,l$ are chosen so that (10) is minimized. Also, if the off-diagonal terms of $A$ are all positive, then $Q$ is the full set $\{(k,l):k\leq n/2,\ l>n/2\}$.

Proof.

First note that if the off-diagonal entries of $A$ are positive, then $s_{i,k}<s_{i,k+1}$ for all $i$, and thus $s_{k}<s_{k+1}$. From this it follows that if $k\leq l$ then $s_{l}\geq s_{k}$, and the last sentence follows.

From Theorem 3.1, in particular (9), it follows that the supremum of $\lambda_{\mathsf{G}}(A-x\mathbf{1})$ is attained at the intersection of two lines, and moreover it is clear that this must be an intersection of a line of nonnegative slope ($k\leq n/2$) with a line of nonpositive slope ($l\geq n/2$). Finding the intersection point gives

x=\frac{s_{l}-s_{k}}{r_{k}-r_{l}}.

Since the denominator is positive, we need the numerator to be nonnegative, which gives the condition $s_{l}\geq s_{k}$. Plugging this point into the line $r_{k}x+s_{k}$ gives the expression in parentheses in (10). The minimum such intersection gives our answer. ∎

Example 3.6.

Let $A$ be a $3\times 3$ matrix and compute $s_{1},s_{2},s_{3}$. Using Corollary 3.5, there are at most $2=1\times 2$ terms to minimize over: $(k=1,l=2)$ and $(k=1,l=3)$. If $s_{1}$ is the largest, then we cannot improve the lower bound by shifting, and in fact we obtain $\lambda_{\mathsf{G}}(A)=\lambda^{\mathsf{S}}_{\mathsf{G}}(A)=\min(s_{2},s_{3})$. If $s_{1}$ is the smallest (which will always happen, for example, when the off-diagonal entries of $A$ are positive), then we obtain

\lambda^{\mathsf{S}}_{\mathsf{G}}(A)=\min\left(\frac{s_{1}+s_{2}}{2},\frac{3s_{1}+s_{3}}{4}\right).

If we consider $A$ from (1), since $s_{i,k}$ is independent of $i$, we can obtain $s_{k}$ from any row, and we have

s_{1}=6-5-5=-4,\quad s_{2}=6+5-5=6,\quad s_{3}=6+5+5=16,

and thus

\lambda^{\mathsf{S}}_{\mathsf{G}}(A)=\min\left(\frac{-4+6}{2},\frac{3(-4)+16}{4}\right)=\min(1,1)=1.

Since the minimum is attained by the $(k=1,l=2)$ pair (and also by the $(k=1,l=3)$ pair), the optimum is attained at $x=(6+4)/2=5$. This matches the analysis in the introduction.

Example 3.7.

Choosing $A$ as in (2), we have

s_{1,k}=\{2,4,10\},\quad s_{2,k}=\{2,4,12\},\quad s_{3,k}=\{-2,4,12\},

so that

s_{1}=-2,\quad s_{2}=4,\quad s_{3}=10.

Therefore

\lambda^{\mathsf{S}}_{\mathsf{G}}(A)=\min(1,1)=1.

We also see that we can choose either pair to find the $x$ at which this is attained; using the $(k=1,l=3)$ pair gives $x=(10+2)/(2\cdot 2)=3$. We could also ask for which $x$ the matrix $A-x\mathbf{1}$ is guaranteed to be positive definite; note that we have

\lambda_{\mathsf{G}}(A-x\mathbf{1})=\min\{x-2,-x+4,-3x+10\},

which is positive exactly for $x$ in the domain $(2,\infty)\cap(-\infty,4)\cap(-\infty,10/3)=(2,10/3)$.
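A short sketch (ours) of the closed form in Corollary 3.5, reusing the gershgorin_lines helper defined in the sketch after Definition 3.3 (so run that sketch first); it returns 1.0 for both matrices (1) and (2), as computed above.

import numpy as np

def shifted_gershgorin_closed_form(A):
    # lambda^S_G(A) via Corollary 3.5; falls back to lambda_G(A) = min_k s_k when Q is empty
    r, s = gershgorin_lines(A)
    n = len(s)
    best = None
    for k in range(1, n + 1):
        for l in range(1, n + 1):
            if k <= n / 2 < l and s[l - 1] >= s[k - 1]:
                val = ((n / 2 - k) * s[l - 1] + (l - n / 2) * s[k - 1]) / (l - k)
                best = val if best is None else min(best, val)
    return s.min() if best is None else best

for M in ([[6, 5, 5], [5, 6, 5], [5, 5, 6]],
          [[6, 1, 3], [1, 7, 4], [3, 4, 5]]):
    print(shifted_gershgorin_closed_form(np.array(M, dtype=float)))   # 1.0 in both cases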

4 Shifted Brauer Bounds

In this section we consider the shifted Brauer estimator. The Brauer estimate is nonlinear and, moreover, the convexity properties of shifted Gershgorin no longer apply; we show that this adds considerable technical difficulty to the problem, except in certain cases. Although we do not study the shifts of the other nonlinear estimators from Section 2, the difficulties that arise for Brauer suggest that it is unlikely that one obtains a tractable optimization problem without further assumptions.

We recall from (4) that

\lambda_{\mathsf{B}}(A)=\min_{i\neq j}\left(\frac{a_{ii}+a_{jj}}{2}-\sqrt{\left(\frac{a_{ii}-a_{jj}}{2}\right)^{2}+R_{i}(A)R_{j}(A)}\right),

and thus

\lambda^{\mathsf{S}}_{\mathsf{B}}(A)=\sup_{x\geq 0}\min_{i\neq j}\left(\frac{a_{ii}+a_{jj}}{2}-x-\sqrt{\left(\frac{a_{ii}-a_{jj}}{2}\right)^{2}+R_{i}(A-x\mathbf{1})R_{j}(A-x\mathbf{1})}\right).

Noting that

R_{i}(A-x\mathbf{1})=\sum_{k\neq i}\left|a_{ik}-x\right|,

and writing

f_{ij}(A,x)=\frac{a_{ii}+a_{jj}}{2}-x-\sqrt{\left(\frac{a_{ii}-a_{jj}}{2}\right)^{2}+\left(\sum_{k\neq i}\left|a_{ik}-x\right|\right)\left(\sum_{k\neq j}\left|a_{jk}-x\right|\right)},

this simplifies to

\lambda^{\mathsf{S}}_{\mathsf{B}}(A)=\sup_{x\geq 0}\min_{i\neq j}f_{ij}(A,x). (11)

This last expression is deceptively complicated: the functions $f_{ij}(A,x)$ are not convex, nor even particularly simple.

Example 4.1.

In Figure 1 we plot a few examples of $f_{ij}(A,x)$ to get a sense of the shape of such functions. In each case, we specify the diagonal entry and the set of off-diagonal entries of the first two rows of a matrix, and plot $f_{12}(A,x)$. For example, if

A_{1}=\left(\begin{array}{ccc}0&1&2\\ 4&0&5\\ ?&?&?\end{array}\right),\quad A_{2}=\left(\begin{array}{ccc}0&1&4\\ 2&0&5\\ ?&?&?\end{array}\right),

we obtain the plots in Figure 1. (Note that as long as we choose the diagonal entries to be the same, changing them simply shifts the function without changing its shape.) First note that neither function has the properties from the last section: although the function in the right frame is close to being concave down, it is not quite, and the function in the left frame is clearly far from concave down.

Figure 1: Two plots of $f_{12}(A,x)$ for various choices of entries.

We could, in theory, write out the expression (11) as a single minimization over a large number of functions. Consider the expression for $f_{ij}(A,x)$: the expression under the radical is piecewise quadratic on any interval between distinct off-diagonal terms of the two rows, and since $\left|x\right|\geq x$ and $\left|x\right|\geq-x$, this means that $f_{ij}(A,x)$ could be written as the minimum of such terms. However, this approach will not be as fruitful as it was in the previous section, since these functions can be both concave up and concave down. Moreover, since the bounding functions are nonlinear with a square root, it is possible that they are defined on a domain strictly smaller than $x\geq 0$. In any case, extrema do not have to occur at the breakpoints where the pieces change.

However, there is one case where an approximation gives a tractable minimization problem, which we state below.

Theorem 4.2.

Assume that all of the diagonal entries of $A$ are the same, and denote this common value by $q$. Then if we define

\widetilde{\lambda^{\mathsf{S}}_{\mathsf{B}}}(A)=\sup_{x\geq 0}\min_{i\neq j}\left(q-x-\frac{1}{2}\left(R_{i}(A-x\mathbf{1})+R_{j}(A-x\mathbf{1})\right)\right), (12)

then $\widetilde{\lambda^{\mathsf{S}}_{\mathsf{B}}}(A)$ is still a valid lower bound for the eigenvalues of $A$.

Remark 4.3.

Notice that the quantity inside the minimum in (12) is simply the average of the shifted Gershgorin bounds given by the $i$th and $j$th rows. As such, these functions are again piecewise linear and can be attacked in a manner similar to that of the previous section, which we discuss below. Also, notice in the statement of the theorem that we can assume without loss of generality that $q=0$, since adding $q$ to every diagonal entry simply shifts every eigenvalue by exactly $q$.

Proof.

If $a_{ii}=a_{jj}$, then

f_{ij}(A,x)=a_{ii}-x-\sqrt{R_{i}(A-x\mathbf{1})R_{j}(A-x\mathbf{1})}
\geq a_{ii}-x-\frac{1}{2}(R_{i}(A-x\mathbf{1})+R_{j}(A-x\mathbf{1}))=:d_{ij}(A,x),

by the AM–GM inequality. Notice that

d_{ij}(A,x)=\frac{1}{2}(d_{i}(A,x)+d_{j}(A,x)),

where the $d_{i}$ are the piecewise-linear bounding functions from (8). Thus we can define

\widetilde{\lambda^{\mathsf{S}}_{\mathsf{B}}}(A):=\sup_{x\geq 0}\min_{i\neq j}d_{ij}(A,x),

and we are guaranteed that $\widetilde{\lambda^{\mathsf{S}}_{\mathsf{B}}}(A)\leq\lambda^{\mathsf{S}}_{\mathsf{B}}(A)$, so that it is still a valid lower bound for the eigenvalues. ∎

Definition 4.4.

Let $A$ be an $n\times n$ symmetric matrix with all diagonal entries equal to $q$. Define $p_{k}=n-1-k$ for $k=1,\dots,2n-1$. For each $i\neq j$, let us define $\delta_{i,j,k}$ and $q_{i,j,k}$ as follows: let $y_{1}\leq y_{2}\leq\dots\leq y_{2n-2}$ be the off-diagonal entries of rows $i$ and $j$ of the matrix $A$ in nondecreasing order, and then for $k=1,\dots,2n-1$,

\delta_{i,j,k}=\frac{1}{2}\left(\sum_{m<k}y_{m}-\sum_{m\geq k}y_{m}\right),

$q_{i,j,k}=q+\delta_{i,j,k}$, and finally $q_{k}=\min_{i\neq j}q_{i,j,k}$.

Proposition 4.5.

If the diagonal entries of $A$ are all the same, then (cf. Theorem 3.1)

\min_{i\neq j}d_{ij}(A,x)=\min_{k=1}^{2n-1}(p_{k}x+q_{k}),\quad\text{so that}\quad\widetilde{\lambda^{\mathsf{S}}_{\mathsf{B}}}(A)=\sup_{x\geq 0}\min_{k=1}^{2n-1}(p_{k}x+q_{k}).
Proof.

The proof is similar to that of the previous section. The main difference is that while the slope of $d_{i}(A,x)$ decreases by two every time $x$ passes through an off-diagonal entry of row $i$, the slope of $d_{ij}(A,x)$ decreases by one every time $x$ passes through an off-diagonal entry of row $i$ or row $j$. From this, the potential slopes of $d_{ij}(A,x)$ are the integers between $n-2$ and $-n$, and everything else follows. ∎
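A small sketch (ours) of the equal-diagonal construction above; it builds the lines $p_kx+q_k$ of Definition 4.4 (with the factor $1/2$ as written there) and then maximizes the lower envelope over a grid of shifts, as a simple stand-in for the exact piecewise-linear optimization.

import numpy as np

def shifted_brauer_tilde(A):
    # approximate shifted Brauer bound of Theorem 4.2; assumes all diagonal entries equal
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    q = A[0, 0]
    p = np.array([n - 1 - k for k in range(1, 2 * n)])       # slopes p_k, k = 1, ..., 2n-1
    qk = np.full(2 * n - 1, np.inf)
    for i in range(n):
        for j in range(i + 1, n):
            y = np.sort(np.concatenate([np.delete(A[i], i), np.delete(A[j], j)]))
            for k in range(1, 2 * n):
                delta = 0.5 * (y[:k - 1].sum() - y[k - 1:].sum())
                qk[k - 1] = min(qk[k - 1], q + delta)
    xs = np.linspace(0.0, max(np.abs(A).max(), 1.0), 401)    # grid of candidate shifts x >= 0
    return max(np.min(p * x + qk) for x in xs)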

5 Applications and Comparison

In Section 5.1 we use the main results of this paper to give a bound on the lowest eigenvalue of the adjacency matrix of an undirected graph, which is sharp for some circulant graphs. In Section 5.2 we perform some numerics to compare the shifted Gershgorin method to other methods on a class of random matrices with nonnegative coefficients. Finally, in Sections 5.3 and 5.4 we consider how the efficacy of shifting depends on the entries of the matrix.

5.1 Adjacency matrix of a graph

Definition 5.1.

Let $G=(V,E)$ be an undirected loop-free graph; $V=[n]$ is the set of vertices of the graph and $E\subseteq V\times V$ is the set of edges. We define the adjacency matrix of the graph to be the $n\times n$ matrix $A(G)$ where

A(G)_{ij}=\begin{cases}1,&(i,j)\in E,\\ 0,&(i,j)\notin E.\end{cases}

We also write $i\sim j$ if $(i,j)\in E$. We define the degree of vertex $i$, denoted $\deg(i)$, to be the number of vertices adjacent to vertex $i$. We denote the maximal and minimal degrees of $G$ by $\Delta(G)$ and $\delta(G)$, and we write $\Delta^{2}(G)$ (resp. $\delta^{2}(G)$) for the second largest (resp. second smallest) degree of $G$.

We use the convention here that $A(G)_{ii}=0$ for all $i$. This choice varies in the literature, but taking the diagonal entries to be one instead would only shift every eigenvalue by $1$. From Remark 3.2, since all off-diagonal entries are nonnegative, we have $\rho_{\mathsf{G}}(A)=\rho^{\mathsf{S}}_{\mathsf{G}}(A)$. However, we can improve the lower bound by shifting. Intuitively, we expect the shifting to help when there are many positive off-diagonal terms, so we expect it to work best for a “dense” graph, i.e. a graph with large $\delta(G)$.

We can use the formulas from above, but in fact we can considerably simplify this computation. Note that

d_{i}(A,x)=-x-\sum_{j\sim i}\left|1-x\right|-\sum_{j\not\sim i,\,j\neq i}\left|x\right|.

This function is decreasing for $x>1$ in any case, so we can restrict the domain of consideration to $x\in[0,1]$. Restricted to this domain, the function simplifies to

d_{i}(A,x)=-x-\deg(i)(1-x)-(n-1-\deg(i))x
=-\deg(i)+(2\deg(i)-n)x.

Noting that $-\deg(i)+(2\deg(i)-n)x=-nx+\deg(i)(2x-1)$, we see that all of these functions are equal to $-n/2$ at $x=1/2$. From this it follows that the only possible optimal points are $x=0,1/2,1$, corresponding to the three cases that all of the slopes are negative, some are of each sign, and all are positive. (The simplest way to see this is to note that the family of lines given by the $d_{i}$ all “pivot” around the point $(1/2,-n/2)$.)

The case where all slopes are negative occurs when $2\deg(i)-n<0$ for all $i$, i.e. $\Delta(G)<n/2$. In this case, the unshifted Gershgorin bound of $-\Delta(G)$ is the best that we can do. If not all of the slopes are of one sign, i.e. $\delta(G)\leq n/2\leq\Delta(G)$, then the best bound is $-n/2$. Finally, if all of the slopes are positive, i.e. $\delta(G)>n/2$, then at $x=1$ we obtain the bound $\delta(G)-n$.

Noting that all of the diagonal entries of $A(G)$ are the same, we can use Theorem 4.2 and consider the $d_{ij}$. If we define $\mathrm{ddeg}(i,j)=\frac{1}{2}(\deg(i)+\deg(j))$ to be the average of the degrees of vertices $i$ and $j$, then

d_{ij}(A,x)=\frac{1}{2}(d_{i}(A,x)+d_{j}(A,x))=-\mathrm{ddeg}(i,j)+(2\,\mathrm{ddeg}(i,j)-n)x.

If we also define $\Delta_{2}(G)$ as the average of the two largest degrees of $G$, and $\delta_{2}(G)$ as the average of the two smallest degrees, then the case analysis of the previous paragraph holds analogously with $\Delta,\delta$ replaced by $\Delta_{2},\delta_{2}$; a small numerical sketch of this degree-based bound follows.
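A minimal sketch (ours) of the degree-based lower bound just described; with use_pairs=True it uses the averaged top-two/bottom-two degrees $\Delta_2(G),\delta_2(G)$, and with use_pairs=False the single extreme degrees $\Delta(G),\delta(G)$.

import numpy as np

def degree_gershgorin_bound(degrees, n, use_pairs=True):
    # lower bound on the least adjacency eigenvalue from the degree sequence alone
    deg = np.sort(np.asarray(degrees, dtype=float))
    if use_pairs:
        big, small = 0.5 * (deg[-1] + deg[-2]), 0.5 * (deg[0] + deg[1])   # Delta_2(G), delta_2(G)
    else:
        big, small = deg[-1], deg[0]                                      # Delta(G), delta(G)
    if big < n / 2:
        return -big          # all slopes negative: best shift is x = 0
    if small > n / 2:
        return small - n     # all slopes positive: best shift is x = 1
    return -n / 2            # mixed signs: best shift is x = 1/2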

Finally, note that we can apply the Brauer method directly in a similar manner. Rewriting the above to obtain

R_{i}(A-x\mathbf{1})=\deg(i)+(n-1-2\deg(i))x,

we have

f_{ij}(A,x)=-x-\sqrt{(\deg(i)+(n-1-2\deg(i))x)(\deg(j)+(n-1-2\deg(j))x)}.

Again, we see that all of these functions are equal at $x=1/2$, where they again take the value $-n/2$. At $x=0$, this gives the unshifted Brauer bound of $-\sqrt{\deg(i)\deg(j)}$, which is of course minimized at $-\sqrt{\Delta(G)\Delta^{2}(G)}$. At $x=1$, we obtain the bound

-1-\sqrt{((n-1)-\deg(i))((n-1)-\deg(j))},

which is minimized at $-1-\sqrt{(n-1-\delta(G))(n-1-\delta^{2}(G))}$. As before, the first bound $-\sqrt{\Delta(G)\Delta^{2}(G)}$ is best for graphs with small largest degree, the second bound $-1-\sqrt{(n-1-\delta(G))(n-1-\delta^{2}(G))}$ is best for graphs with large smallest degree, and the $-n/2$ bound is best for graphs with a large separation of degrees.
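The three candidate values above (at $x=0$, $x=1/2$, and $x=1$) are each valid lower bounds on the least adjacency eigenvalue, so one can simply take the best of them; a small sketch (ours) from the degree sequence:

import numpy as np

def degree_brauer_bound(degrees, n):
    # best of the three shifted-Brauer candidates evaluated at x = 0, 1/2, 1
    deg = np.sort(np.asarray(degrees, dtype=float))
    at_zero = -np.sqrt(deg[-1] * deg[-2])                         # -sqrt(Delta * Delta^2)
    at_half = -n / 2.0
    at_one = -1.0 - np.sqrt((n - 1 - deg[0]) * (n - 1 - deg[1]))  # -1 - sqrt((n-1-delta)(n-1-delta^2))
    return max(at_zero, at_half, at_one)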

One can check directly that these bounds are exact for the $(n,1)$- and $(n,n/2-1)$-circulant graphs. Of course, when the graph is regular, the Gershgorin and Brauer bounds coincide. In Figure 2 we plot numerical computations for a sample of Erdős–Rényi random graphs. We consider the case with $N=20$ vertices and an edge probability of $p=0.9$ (we are considering the case where the minimal degree is large, so the shifted bound is best). We see that for a few graphs this estimate is almost exact, and it seems to be not too bad for a large selection of graphs. Note that the unshifted bounds given by either Gershgorin or Brauer are much worse in this case: the average degree of the vertices in these graphs is $18$, and the average maximal degree is larger still (see [9] for more precise statements), so that the unshifted methods would give bounds near $-18$, which is much further from the actual values.

Figure 2: Plot of 10,000 samples of Erdős–Rényi random graphs $G(N,p)$ with $N=20$ vertices and edge probability $p=0.9$. For each random graph, we take $A$ to be the adjacency matrix of the graph, and we plot $\lambda_{1}(A)$ on the $x$-axis and $\lambda^{\mathsf{S}}_{\mathsf{B}}(A)$ on the $y$-axis.

5.2 Numerical comparisons of unshifted and shifted methods

Here we present the results of a numerical study comparing the methods studied here to existing methods; we compare the lower bounds given by several methods for matrices with nonnegative coefficients.

We present the results of the numerical experiments in Figures 3 and 4. In each case, we considered $1000$ symmetric $5\times 5$ matrices with random integer coefficients in the range $0,\dots,10$. For each random matrix $A$, we compute the unshifted Gershgorin bound $\lambda_{\mathsf{G}}(A)$, the unshifted Brauer bound $\lambda_{\mathsf{B}}(A)$, the shifted Gershgorin bound $\lambda^{\mathsf{S}}_{\mathsf{G}}(A)$, and the actual smallest eigenvalue. In terms of computation, we computed all of these directly except for $\lambda^{\mathsf{S}}_{\mathsf{G}}(A)$, for which we used (9).

In Figure 3, we plot all four numbers for each matrix. Since these are all random samples, the ordering is irrelevant, and we have chosen to order them by the value of $\lambda_{\mathsf{B}}(A)$ (yellow). We have plotted $\lambda_{\mathsf{G}}(A)$ in blue, and we confirm in this picture that $\lambda_{\mathsf{G}}(A)\leq\lambda_{\mathsf{B}}(A)$. Finally, we plot $\lambda^{\mathsf{S}}_{\mathsf{G}}(A)$ in green and the actual smallest eigenvalue $\lambda_{1}(A)$ in red. We see that $\lambda^{\mathsf{S}}_{\mathsf{G}}(A)>\lambda_{\mathsf{B}}(A)$ for most samples, and it is actually quite a bit better in many cases. We further compare $\lambda_{\mathsf{B}}(A)$ and $\lambda^{\mathsf{S}}_{\mathsf{G}}(A)$ in Figure 4, where we show a scatterplot of the errors given by the two bounds: on the $x$-axis we plot $\lambda_{1}(A)-\lambda^{\mathsf{S}}_{\mathsf{G}}(A)$ (the error given by shifted Gershgorin) and on the $y$-axis we plot $\lambda_{1}(A)-\lambda_{\mathsf{B}}(A)$ (the error given by unshifted Brauer). We also plot the line $y=x$ for comparison. We again see that shifted Gershgorin frequently beats unshifted Brauer, and typically by quite a lot. In fact, in 1000 samples we found that they were exactly the same 4 times, and unshifted Brauer was better 21 times, so that shifted Gershgorin gives a stronger estimate $97.5\%$ of the time.

We plot the same data for $10\times 10$ matrices (again with integer coefficients from $0,\dots,10$) in Figures 5 and 6. We see that for the larger matrices the separation between the estimates and the actual eigenvalue grows, but $\lambda^{\mathsf{S}}_{\mathsf{G}}(A)$ does much better than $\lambda_{\mathsf{B}}(A)$. In particular, out of 1000 samples, $\lambda^{\mathsf{S}}_{\mathsf{G}}(A)$ beats $\lambda_{\mathsf{B}}(A)$ every time.
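A self-contained sketch (ours) of one plausible reading of this experiment, with a grid search standing in for the exact shifted bound; the sampling scheme and seed are assumptions of ours, and the paper reports shifted Gershgorin winning about 97.5% of the time in the $5\times 5$ case.

import numpy as np

def gersh_lower(A):
    R = np.abs(A).sum(axis=1) - np.abs(np.diag(A))
    return np.min(np.diag(A) - R)

def brauer_lower(A):
    d = np.diag(A)
    R = np.abs(A).sum(axis=1) - np.abs(d)
    n = len(d)
    best = np.inf
    for i in range(n):
        for j in range(i + 1, n):
            best = min(best, 0.5 * (d[i] + d[j]) - np.sqrt(0.25 * (d[i] - d[j]) ** 2 + R[i] * R[j]))
    return best

rng = np.random.default_rng(0)
n, trials, wins = 5, 1000, 0
xs = np.linspace(0.0, 10.0, 201)                      # candidate shifts x >= 0
for _ in range(trials):
    M = rng.integers(0, 11, size=(n, n))
    A = (np.triu(M) + np.triu(M, 1).T).astype(float)  # random symmetric matrix, integer entries 0..10
    shiftedG = max(gersh_lower(A - x * np.ones((n, n))) for x in xs)
    wins += shiftedG > brauer_lower(A)
print(wins / trials)   # fraction of samples where shifted Gershgorin is strictly better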

Figure 3: A calculation of four quantities for each of $1000$ random $5\times 5$ matrices. We plot $\lambda_{\mathsf{G}}(A)$ in blue, $\lambda_{\mathsf{B}}(A)$ in yellow, $\lambda^{\mathsf{S}}_{\mathsf{G}}(A)$ in green, and $\lambda_{1}(A)$ in red, sorted by the value of $\lambda_{\mathsf{B}}(A)$ for easier visualization.

Figure 4: The same data as Figure 3, where we plot $\lambda_{1}(A)-\lambda^{\mathsf{S}}_{\mathsf{G}}(A)$ on the $x$-axis and $\lambda_{1}(A)-\lambda_{\mathsf{B}}(A)$ on the $y$-axis.

Figure 5: A calculation of four quantities for each of $1000$ random $10\times 10$ matrices. We plot $\lambda_{\mathsf{G}}(A)$ in blue, $\lambda_{\mathsf{B}}(A)$ in yellow, $\lambda^{\mathsf{S}}_{\mathsf{G}}(A)$ in green, and $\lambda_{1}(A)$ in red, sorted by the value of $\lambda_{\mathsf{B}}(A)$ for easier visualization.

Figure 6: The same data as Figure 5, where we plot $\lambda_{1}(A)-\lambda^{\mathsf{S}}_{\mathsf{G}}(A)$ on the $x$-axis and $\lambda_{1}(A)-\lambda_{\mathsf{B}}(A)$ on the $y$-axis.

5.3 Domain of positive definiteness

One other way to view the problem of positive definiteness is as follows. Imagine that a matrix is given where the off-diagonal entries are prescribed but the diagonal entries are free. We can then ask what conditions on the diagonal entries guarantee that the matrix is positive definite.

First note that as we send the diagonal entries to infinity, the matrix is guaranteed to be positive definite, simply because of the Gershgorin result: as long as $a_{ii}>\sum_{j\neq i}\left|a_{ij}\right|$ for all $i$, the matrix is positive definite. Thus we obtain, at worst, an unbounded box from unshifted Gershgorin.

From the fact that the shifted Gershgorin estimates are always piecewise linear, we would expect that the region from this is always an “infinite polytope”, i.e. an unbounded intersection of finitely many half-planes that is a superset of the unshifted Gershgorin box. We prove this here.

Proposition 5.2.

Let $v$ be a vector of length $(n^{2}-n)/2$, and let $\mathcal{M}(v)$ be the set of all symmetric matrices with $v$ as the off-diagonal terms (ordered in the obvious manner). Let us define $\mathcal{S}(v)$ as the subset of $\mathcal{M}(v)$ consisting of those $A$ with $\lambda^{\mathsf{S}}_{\mathsf{G}}(A)\geq 0$, and let $\mathcal{D}(v)$ be the set of all diagonals of matrices in $\mathcal{S}(v)$. Then the set $\mathcal{D}(v)$ is an unbounded domain defined by the intersection of finitely many half-planes that contains a box of the form $a_{ii}\geq\beta_{i}$; as such it can be written $\cap_{\ell}\{x:a_{\ell}\cdot x\geq b_{\ell}\}$, where the entries of each $a_{\ell}$ are nonnegative.

Proof.

We have from Theorem 3.1 that

\lambda^{\mathsf{S}}_{\mathsf{G}}(A)=\sup_{x\geq 0}\min_{k=1}^{n}(r_{k}x+s_{k}).

This means that $\lambda^{\mathsf{S}}_{\mathsf{G}}(A)\geq 0$ iff there exists $x\geq 0$ such that $r_{k}x+s_{k}\geq 0$ for all $k=1,\dots,n$. This is the same as saying that

\bigcap_{k=1}^{n}\{x\geq 0:r_{k}x+s_{k}\geq 0\}\neq\emptyset.

Under the condition that $s_{n/2}\geq 0$ (or under no additional condition if $n$ is odd), this is the same as

\bigcap_{k<n/2}\{x\geq 0:x\geq-s_{k}/r_{k}\}\cap\bigcap_{k>n/2}\{x\geq 0:x\leq-s_{k}/r_{k}\}\neq\emptyset.

This reduces to the two conditions

\max_{k<n/2}-s_{k}/r_{k}\leq\min_{k>n/2}-s_{k}/r_{k},\quad\min_{k>n/2}-s_{k}/r_{k}\geq 0,

which is clearly the intersection of half-planes in the $s_{k}$. Fixing the off-diagonal elements of $A$ fixes the $\delta_{i,k}$, and then each $s_{k}$ is the minimum of functions linear in the diagonal elements of $A$. Finally, we know that $\lambda^{\mathsf{S}}_{\mathsf{G}}(A)\geq\lambda_{\mathsf{G}}(A)$, so that any matrix satisfying $a_{ii}\geq R_{i}(A)$ for all $i$ is clearly in $\mathcal{S}(v)$. ∎

It follows from this proposition that shifted Gershgorin cannot be sharp here: the boundary of the exact region of positive semidefiniteness is cut out by an $n$th degree polynomial in the $a_{ii}$, since it corresponds to an eigenvalue passing through zero and thus must be a subset of $\det(A)=0$, whereas $\mathcal{D}(v)$ is bounded by finitely many hyperplanes.

Example 5.3.

Let us consider a $3\times 3$ example of this region; to make the regions easier to plot, we let only two of the diagonal entries vary, namely

A=\left(\begin{array}{ccc}y&a&b\\ a&z&c\\ b&c&d\end{array}\right), (13)

where $a,b,c,d$ are fixed parameters and $y,z$ are variable. From Proposition 5.2, the domain $\{(y,z):\lambda^{\mathsf{S}}_{\mathsf{G}}(A)\geq 0\}$ is an intersection of half-planes; the domain $\{(y,z):\lambda_{\mathsf{G}}(A)\geq 0\}$ is given by a box, and of course the domain $\{(y,z):A\geq 0\}$ is given by a quadratic equation in $y,z$.

Let us consider one example: choose $a=2,b=1,c=2,d=4$, which is shown in the second frame of Figure 7. We can compute the conditions on $y,z$ as follows. We have

s_{1}=\min(1,y-3,z-4),\quad s_{2}=\min(3,y-1,z),\quad s_{3}=\min(7,y+3,z+4),

and the domain is defined by $s_{2}\geq 0,s_{3}\geq 0,s_{2}\geq-s_{1},s_{3}\geq-3s_{1}$. Writing all of this out, we obtain the conditions

y\geq 2,\quad z\geq 2,\quad y+z\geq 5.

Note that unshifted Gershgorin would give the conditions $y\geq 3,z\geq 4$. The exact boundary condition from the determinant is $4yz-4y-z-8=0$. We plot all of these sets in Figure 7 for this and other choices of parameters. Note that the shifted region sometimes shares a boundary with the exact region, but being piecewise linear this is the best that can be done.
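A quick numerical check (ours) that the half-plane description above agrees with the stated conditions $y\geq 2$, $z\geq 2$, $y+z\geq 5$ on a grid of points:

import numpy as np

def in_shifted_region(y, z):
    # Example 5.3 with (a, b, c, d) = (2, 1, 2, 4)
    s1 = min(1, y - 3, z - 4)
    s2 = min(3, y - 1, z)
    s3 = min(7, y + 3, z + 4)
    return s2 >= 0 and s3 >= 0 and s2 >= -s1 and s3 >= -3 * s1

pts = [(y, z) for y in np.arange(0, 8, 0.25) for z in np.arange(0, 8, 0.25)]
print(all(in_shifted_region(y, z) == (y >= 2 and z >= 2 and y + z >= 5) for y, z in pts))   # True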

Figure 7: Positive-definiteness regions for the matrix $A$ defined in (13). In each case, the actual region of positive definiteness is plotted in gray, the region guaranteed by shifted Gershgorin in green, and the region guaranteed by unshifted Gershgorin in yellow. The three pictures correspond to the parameter choices $(2,0,2,4),(2,1,2,4),(2,2,2,4)$ respectively.

5.4 Spread of the off-diagonal terms

We might ask: given the $R_{i}(A)$, what are the best and worst choices of off-diagonal entries for the shifted bound $\lambda^{\mathsf{S}}_{\mathsf{G}}(A)$? Intuitively, it seems clear that concentrating all of the mass in a single off-diagonal element is “worst”, whereas spreading it as much as possible is “best”. In short, high off-diagonal variance is bad for the shifted estimates.

Proposition 5.4.

Let $v(N-1,\rho)$ be the set of all vectors of length $N-1$ with nonnegative entries that sum to $\rho$. Consider the quantities

\underline{d}(N-1,\rho)=\inf_{y\in v(N-1,\rho)}\sup_{x\geq 0}\left(\rho-x-\sum_{k=1}^{N-1}\left|y_{k}-x\right|\right),
\overline{d}(N-1,\rho)=\sup_{y\in v(N-1,\rho)}\sup_{x\geq 0}\left(\rho-x-\sum_{k=1}^{N-1}\left|y_{k}-x\right|\right).

Then $\underline{d}(N-1,\rho)=0$, attained at any $y$ with more than half of the entries equal to zero, and $\overline{d}(N-1,\rho)=\rho(N-2)/(N-1)$, attained at the vector whose entries are all equal to $\rho/(N-1)$.

Proof.

The claim about $\underline{d}(N-1,\rho)$ is straightforward. If more than half of the entries are zero, then it is easy to see that the function is decreasing in $x$ and its supremum is attained at $x=0$, giving $\rho-\sum_{k}y_{k}=0$. Moreover, this infimum cannot be less than zero, since for any $y$ the supremum is at least the value at $x=0$, which is again zero.

For $\overline{d}(N-1,\rho)$, we first show that for any $y$ with $\sum_{k}y_{k}=\rho$,

\min_{k}\left(y_{k}+\sum_{\ell\neq k}\left|y_{k}-y_{\ell}\right|\right)\geq\frac{\rho}{N-1}.

First note that if all $y_{k}=\rho/(N-1)$ then equality is satisfied. Assume that $y$ is not a constant vector. If $y_{k}\geq\rho/(N-1)$ then the inequality is trivial, so assume $y_{k}<\rho/(N-1)$. Since the average of the entries of $y$ is $\rho/(N-1)$, at least one $y_{\ell^{*}}>\rho/(N-1)$. Writing $\alpha=\rho/(N-1)-y_{k}$, we have $\left|y_{k}-y_{\ell^{*}}\right|\geq\alpha$ and again the inequality follows. Finally, note that since the objective is piecewise linear with breakpoints only at the $y_{k}$, we have

\sup_{x\geq 0}\left(\rho-x-\sum_{k=1}^{N-1}\left|y_{k}-x\right|\right)=\max_{k}\left(\rho-y_{k}-\sum_{\ell\neq k}\left|y_{\ell}-y_{k}\right|\right),

and we are done. ∎

This proposition verifies the intuition that “spreading” the mass in the off-diagonal terms improves the shifted Gershgorin bound while not changing the unshifted Gershgorin bound at all. For example, let us consider all symmetric matrices with nonnegative coefficients with all diagonal entries equal to $\rho$ and all off-diagonal row sums equal to $\rho$. Then for any $A$ in this class, $\lambda_{\mathsf{G}}(A)=0$. However, $\lambda^{\mathsf{S}}_{\mathsf{G}}(A)$ can be as large as $\rho(N-2)/(N-1)$ if all of the off-diagonal entries are the same. Moreover, it is not hard to see that if the off-diagonal entries differ by no more than $\epsilon$, then $\lambda^{\mathsf{S}}_{\mathsf{G}}(A)=\rho(N-2)/(N-1)-O(\epsilon)$. In this sense, spreading the off-diagonal mass as evenly as possible helps the shifted Gershgorin bound.

Conversely, concentrating the off-diagonal mass makes the shifted Gershgorin bound worse: for example, if we consider the case where each row has one off-diagonal entry equal to $\rho$ and the rest zero, then $\lambda^{\mathsf{S}}_{\mathsf{G}}(A)=\lambda_{\mathsf{G}}(A)=0$. Moreover, this bound cannot be improved without further information, since it is sharp for $\rho(I+P)$ where $P$ is any symmetric permutation matrix containing a two-cycle.
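A small numerical check (ours) of the two extreme cases in Proposition 5.4, with $N=6$ and $\rho=10$ chosen arbitrarily:

import numpy as np

def best_shift_value(y, xs):
    # sup over x >= 0 (on the grid xs) of rho - x - sum_k |y_k - x|, where rho = sum(y)
    y = np.asarray(y, dtype=float)
    rho = y.sum()
    return max(rho - x - np.abs(y - x).sum() for x in xs)

N, rho = 6, 10.0
xs = np.arange(0.0, rho + 0.5, 0.5)
spread = np.full(N - 1, rho / (N - 1))              # mass spread evenly over the N-1 entries
concentrated = np.array([rho] + [0.0] * (N - 2))    # all mass on a single entry
print(best_shift_value(spread, xs))        # 8.0 = rho*(N-2)/(N-1)
print(best_shift_value(concentrated, xs))  # 0.0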

6 Conclusions

We have shown a few cases where shifting the entries of a matrix leads to better bounds on the spectrum of the matrix. We can think of these results as a flip side of the standard Perron–Frobenius theorem: the method of this paper gives good bounds on the lowest eigenvalue of a positive matrix rather than the largest.

We might ask whether the results considered here are optimal, in the sense of obtaining the best spectral bound available from a linear program in the entries of a matrix. Of course, some such restriction is necessary: writing down the characteristic polynomial of a matrix and solving it gives excellent bounds on the eigenvalues, but this is a notoriously challenging approach. Most of the existing methods (again see [2] for a comprehensive review) use some function of the off-diagonal terms, e.g. their sum, their sum minus one distinguished element, or perhaps two subsums of the entries; this is true of all of the methods mentioned above, and is typical of results of this type. The shifted Gershgorin method uses the actual off-diagonal entries, and as we have shown above, its efficacy is a function as much of the variance of these off-diagonal terms as of their sum. As we showed numerically in Section 5.2, shifted Gershgorin beats even a nonlinear method very often in a large family of matrices (to be fair, it was a family designed to work well with shifted Gershgorin, but in that class it works very well). Although checking against every existing method is beyond the scope of this paper, it seems likely that one can construct examples where shifted Gershgorin beats any existing method that uses less information about the off-diagonal terms than their actual values.

We also might ask whether this technique can be improved by shifting by a matrix other than $\mathbf{1}$ (cf. [10] for a similar perspective on a more complicated bounding problem). Of course, if we choose any positive semidefinite matrix $C$, write $A=B+xC$, and optimize over $x$ to obtain the best bounds on the matrix $B$, this will also give estimates on the eigenvalues of $A$. It surely is true that for a given matrix $A$, there could be a choice of $C$ that does better than $\mathbf{1}$ in this regard by exploiting some feature of the original matrix $A$. It seems unlikely that there is a matrix $C$ that generically beats $\mathbf{1}$, especially in light of the results of Section 5.4. Also, there would be the challenge of knowing that $C$ is positive semidefinite in the first place: it seems unlikely that one could bootstrap this method by writing $C=D+y\mathbf{1}$, since this would give $A=(B+xD)+xy\mathbf{1}$, and one is still optimizing over multiples of $\mathbf{1}$. However, if some known structure of $A$ matches the structure of a known semidefinite $C$ (e.g. $C$ is a covariance matrix or more generally a Gram matrix), then this might allow us to obtain tighter bounds on the spectrum of $A$.

Acknowledgments

The author would like to thank Jared Bronski for useful discussions.

References

  • [1] S. A. Gershgorin. Über die Abgrenzung der Eigenwerte einer Matrix. Bulletin of the Russian Academy of Sciences, (6):749–754, 1931.
  • [2] Richard S. Varga. Geršgorin and His Circles. Springer Series in Computational Mathematics, vol. 36. Springer, 2004.
  • [3] Alexander Ostrowski. Über die Determinanten mit überwiegender Hauptdiagonale. Commentarii Mathematici Helvetici, 10(1):69–96, 1937.
  • [4] Alfred Brauer. Limits for the characteristic roots of a matrix. II. Duke Mathematical Journal, 14(1):21–26, 1947.
  • [5] Aaron Melman. Generalizations of Gershgorin disks and polynomial zeros. Proceedings of the American Mathematical Society, 138(7):2349–2364, 2010.
  • [6] Ljiljana Cvetković, Vladimir Kostić, and Richard S. Varga. A new Geršgorin-type eigenvalue inclusion set. Electronic Transactions on Numerical Analysis, 18:73–80, 2004.
  • [7] Roger A. Horn and Charles R. Johnson. Topics in Matrix Analysis. Cambridge University Press, 1991.
  • [8] Richard Courant and David Hilbert. Methods of Mathematical Physics, volume 1. CUP Archive, 1966.
  • [9] Oliver Riordan and Alex Selby. The maximum degree of a random graph. Combinatorics, Probability and Computing, 9(6):549–572, 2000.
  • [10] Nicholas J. Higham and Françoise Tisseur. Bounds for eigenvalues of matrix polynomials. Linear Algebra and its Applications, 358(1):5–22, 2003.