This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

11institutetext: Department of Engineering, University of Cambridge, Cambridge CB2 1PZ, UK

Affine-invariant midrange statisticsthanks: This work has received support from the European Research Council under the Advanced ERC Grant Agreement Switchlet n.670645. Cyrus Mostajeran is supported by the Cambridge Philosophical Society.

Cyrus Mostajeran    Christian Grussler    Rodolphe Sepulchre
Abstract

We formulate and discuss the affine-invariant matrix midrange problem on the cone of n×nn\times n positive definite Hermitian matrices (n)\mathbb{P}(n), which is based on the Thompson metric. A particular computationally efficient midpoint of this metric is investigated as a highly scalable candidate for an average of two positive definite matrices within this context, before studying the NN-point problem in the vector and matrix settings.

Keywords:
Positive definite matrices Statistics Optimization

1 Introduction

In this paper, we develop a framework for affine-invariant midrange statistics on the cone of positive definite Hermitian matrices (n)\mathbb{P}(n) of dimension nn. In Subsection 1.1, we briefly note the basic elements of Finsler geometry relevant to the problem. In Subsection 1.2, we define the affine-invariant midrange problem for a collection of NN points. In Section 2, we study a particular midrange of two positive definite matrices arising as the midpoint of a suitable geodesic curve. In Section 3, we briefly review the scalar and vector affine-invariant midrange problem before returning to the NN-point problem on (n)\mathbb{P}(n) in Section 4.

1.1 Affine-invariant Finsler metrics on (n)\mathbb{P}(n)

Consider the family of affine-invariant metric distances dΦd_{\Phi} on (n)\mathbb{P}(n) defined as

dΦ(A,B)=logA1/2BA1/2Φ,d_{\Phi}(A,B)=\|\log A^{-1/2}BA^{-1/2}\|_{\Phi}, (1)

where Φ\|\cdot\|_{\Phi} is any unitarily invariant norm on the space of Hermitian matrices of dimension nn defined by XΦ:=Φ(λ1(X),,λn(X)),\|X\|_{\Phi}:=\Phi(\lambda_{1}(X),\ldots,\lambda_{n}(X)), with λmin(X)=λn(X)λ1(X)=λmax(X)\lambda_{\min}(X)=\lambda_{n}(X)\leq\ldots\leq\lambda_{1}(X)=\lambda_{\max}(X) denoting the nn real eigenvalues of XX and Φ\Phi a symmetric gauge norm on n\mathbb{R}^{n} (i.e. a norm that is invariant under permutations and sign changes of the coordinates) [1]. For any such Φ\Phi, XX is said to be a dΦd_{\Phi}-midpoint of AA and BB if

dΦ(A,X)=dΦ(X,B)=12dΦ(A,B).d_{\Phi}(A,X)=d_{\Phi}(X,B)=\frac{1}{2}d_{\Phi}(A,B). (2)

The curve γ𝒢:[0,1](n)\gamma_{\mathcal{G}}:[0,1]\rightarrow\mathbb{P}(n) defined by

γ𝒢(t)=A1/2(A1/2BA1/2)tA1/2\gamma_{\mathcal{G}}(t)=A^{1/2}\left(A^{-1/2}BA^{-1/2}\right)^{t}A^{1/2} (3)

is geometrically significant as a minimal geodesic for any of the affine-invariant metrics dΦd_{\Phi} [1]. The midpoint of γ𝒢\gamma_{\mathcal{G}} is the matrix geometric mean A#BA\#B, which is a metric midpoint in the sense of dΦ(A,A#B)=dΦ(A#B,B)=12dΦ(A,B)d_{\Phi}(A,A\#B)=d_{\Phi}(A\#B,B)=\frac{1}{2}d_{\Phi}(A,B) for any Φ\Phi. For the choice of Φ(x1,,xn)=(ixi2)1/2\Phi(x_{1},\ldots,x_{n})=(\sum_{i}x_{i}^{2})^{1/2}, d2:=dΦd_{2}:=d_{\Phi} corresponds to the metric distance generated by the standard affine-invariant Riemannian structure given by X,YΣ=tr(Σ1XΣ1Y)\langle X,Y\rangle_{\Sigma}=\operatorname{tr}(\Sigma^{-1}X\Sigma^{-1}Y) for Σ(n)\Sigma\in\mathbb{P}(n) and Hermitian matrices X,YTΣ(n)X,Y\in T_{\Sigma}\mathbb{P}(n). For the choice of Φ(x1,,xn)=maxi|xi|\Phi(x_{1},\ldots,x_{n})=\max_{i}|x_{i}| on the other hand, d:=dΦd_{\infty}:=d_{\Phi} yields the distance function that coincides with the Thompson metric [7] on the cone (n)\mathbb{P}(n)

d(A,B)=logA1/2BA1/2=max{logλmax(BA1),logλmax(AB1)}.d_{\infty}(A,B)=\|\log A^{-1/2}BA^{-1/2}\|_{\infty}=\max\{\log\lambda_{\max}(BA^{-1}),\,\log\lambda_{\max}(AB^{-1})\}. (4)

While the minimal geodesic γ𝒢\gamma_{\mathcal{G}} in (3) and the midpoint is unique for the Riemannian distance function d2d_{2}, it is not unique with respect to the dd_{\infty} metric which generally admits infinitely many minimal geodesics and midpoints between a given pair of matrices A,B(n)A,B\in\mathbb{P}(n) [6], as expected from the analogous picture concerning the associated norms in n\mathbb{R}^{n}. As we shall see in Section 2, some of these minimal geodesics are much more readily constructible than others from a computational standpoint and yield midpoints that satisfy many of the properties expected of an affine-invariant matrix mean. Specifically, we will use the midpoint of a particular minimal geodesic of dd_{\infty} from a construction by Nussbaum as a scalable relaxation of the matrix geometric mean that is much cheaper to construct than A#BA\#B.

1.2 The affine-invariant midrange problem

Given a collection of NN points YiY_{i} in (n)\mathbb{P}(n), the midrange problem can be formulated as the following optimization problem

minX0maxid(X,Yi).\displaystyle\min_{X\succeq 0}\;\max_{i}\;d_{\infty}(X,Y_{i}). (5)

We call a solution XX^{\star} to the above problem a midrange of {Yi}\{Y_{i}\}.

Proposition 1

Let XX^{\star} be a solution to (5) with optimal cost t=f(X)=maxid(X,Yi)t^{\star}=f(X^{\star})=\max_{i}\;d_{\infty}(X^{\star},Y_{i}). We have ltul\leq t^{\star}\leq u, where the lower and upper bounds are given by

l=12diam({Yi}):=12maxi,jd(Yi,Yj),u=minimaxjd(Yi,Yj)2l.l=\frac{1}{2}\operatorname{diam}_{\infty}(\{Y_{i}\}):=\frac{1}{2}\max_{i,j}d_{\infty}(Y_{i},Y_{j}),\quad u=\min_{i}\max_{j}d_{\infty}(Y_{i},Y_{j})\leq 2l. (6)
Proof

By the triangle inequality, we have for any i,j=1,,Ni,j=1,\ldots,N,

d(Yi,Yj)d(Yi,X)+d(X,Yj)t+t=2t,d_{\infty}(Y_{i},Y_{j})\leq d_{\infty}(Y_{i},X^{\star})+d_{\infty}(X^{\star},Y_{j})\leq t^{\star}+t^{\star}=2t^{\star}, (7)

since t=maxid(X,Yi)t^{\star}=\max_{i}d_{\infty}(X^{\star},Y_{i}). Taking the maximum of the left-hand side of (7) over i,ji,j, we arrive at l=12maxi,jd(Yi,Yj)tl=\frac{1}{2}\max_{i,j}d_{\infty}(Y_{i},Y_{j})\leq t^{\star}. For the upper bound, note that taking X=YiX=Y_{i} for each ii, we obtain a cost f(Yi)=maxjd(Yi,Yj)f(Y_{i})=\max_{j}d_{\infty}(Y_{i},Y_{j}). Since the minimum of these NN cost evaluations will still yield an upper bound on the optimum cost tt^{\star}, we have tu=minimaxjd(Yi,Yj)t^{\star}\leq u=\min_{i}\max_{j}d_{\infty}(Y_{i},Y_{j}). ∎

2 The 22-point midrange problem in (n)\mathbb{P}(n)

2.1 Scalable dd_{\infty} geodesic midpoints

As γ𝒢\gamma_{\mathcal{G}} is a minimal geodesic for the dd_{\infty} metric, A#BA\#B lies in the set of midpoints of this metric. In particular,

A#BargminX(n)(maxY{A,B}d(X,Y)).A\#B\in\operatorname{argmin}_{X\in\mathbb{P}(n)}\left(\max_{Y\in\{A,B\}}d_{\infty}(X,Y)\right). (8)

Recall that the metric distance dd_{\infty} coincides with the Thompson metric dTd_{T} on the cone (n)\mathbb{P}(n). If CC is a closed, solid, pointed, convex cone in a vector space VV, then CC induces a partial order on VV given by xyx\leq y if and only if yxCy-x\in C. The Thompson metric on CC is defined as dT(x,y)=logmax{M(x/y;C),M(y/x;C)}d_{T}(x,y)=\log\max\{M(x/y;C),M(y/x;C)\}, where M(y/x;C):=inf{λ:yλx}M(y/x;C):=\inf\{\lambda\in\mathbb{R}:y\leq\lambda x\} for xC{0}x\in C\setminus\{0\} and yVy\in V. For A,B(n)A,B\in\mathbb{P}(n), we have M(A/B)=λmax(AB1)M(A/B)=\lambda_{\max}(AB^{-1}), so that

dT(A,B)=logmax{λmax(AB1),λmax(BA1)},d_{T}(A,B)=\log\max\{\lambda_{\max}(AB^{-1}),\lambda_{\max}(BA^{-1})\}, (9)

which indeed coincides with dd_{\infty} in (4).

It is known that the Thompson metric does not admit unique minimal geodesics. Indeed, a remarkable construction by Nussbaum describes a family of geodesics that generally consists of an infinite number of curves connecting a pair of points in a cone CC. In particular, setting α:=1/M(x/y;C)\alpha:=1/M(x/y;C) and β:=M(y/x;C)\beta:=M(y/x;C), the curve ϕ:[0,1]C\phi:[0,1]\rightarrow C given by

ϕ(t;x,y):={(βtαtβα)y+(βαtαβtβα)xifαβ,αtxifα=β,\phi(t;x,y):=\begin{dcases}\left(\frac{\beta^{t}-\alpha^{t}}{\beta-\alpha}\right)y+\left(\frac{\beta\alpha^{t}-\alpha\beta^{t}}{\beta-\alpha}\right)x\quad&\mathrm{if}\;\alpha\neq\beta,\\ \alpha^{t}x&\mathrm{if}\;\alpha=\beta,\end{dcases} (10)

is always a minimal geodesic from xx to yy with respect to the Thompson metric. The curve ϕ\phi defines a projective straight line in the cone [6]. If we take CC to be the cone of positive semidefinite matrices with interior intC=(n)\operatorname{int}C=\mathbb{P}(n), then for a pair of points A,B(n)A,B\in\mathbb{P}(n), we have β=M(B/A;C)=λmax(BA1)\beta=M(B/A;C)=\lambda_{\max}(BA^{-1}) and α=1/M(A/B;C)=λmin(BA1)\alpha=1/M(A/B;C)=\lambda_{\min}(BA^{-1}). Thus, the minimal geodesic described by (10) takes the form

ϕ(t):={(λmaxtλmintλmaxλmin)B+(λmaxλmintλminλmaxtλmaxλmin)Aifλminλmax,λmintAifλmin=λmax,\phi(t):=\begin{dcases}\left(\frac{\lambda_{\max}^{t}-\lambda_{\min}^{t}}{\lambda_{\max}-\lambda_{\min}}\right)B+\left(\frac{\lambda_{\max}\lambda_{\min}^{t}-\lambda_{\min}\lambda_{\max}^{t}}{\lambda_{\max}-\lambda_{\min}}\right)A&\mathrm{if}\;\lambda_{\min}\neq\lambda_{\max},\\ \lambda_{\min}^{t}A&\mathrm{if}\;\lambda_{\min}=\lambda_{\max},\end{dcases} (11)

where λmax\lambda_{\max} and λmin\lambda_{\min} denote the largest and smallest eigenvalues of BA1BA^{-1}, respectively. Taking the midpoint t=1/2t=1/2 of this geodesic, we arrive at a computationally convenient dd_{\infty}-midpoint of AA and BB, which we will denote by ABA*B.

Proposition 2

For A,B(n)A,B\in\mathbb{P}(n), we have

AB=1λmin+λmax(B+λminλmaxA).A*B=\frac{1}{\sqrt{\lambda_{\min}}+\sqrt{\lambda_{\max}}}\left(B+\sqrt{\lambda_{\min}\lambda_{\max}}A\right). (12)

The result follows from elementary algebraic simplification upon setting AB=ϕ(1/2;A,B)A*B=\phi(1/2;A,B) in the case λminλmax\lambda_{\min}\neq\lambda_{\max}. If λmin=λmax\lambda_{\min}=\lambda_{\max}, then ϕ(1/2;A,B)=λminA\phi(1/2;A,B)=\sqrt{\lambda_{\min}}A also agrees with the formula in (12). The geometry of the set of dd_{\infty}-midpoints of a given pair of points in (n)\mathbb{P}(n) is studied in detail in [5], where it is shown that there is a unique dd_{\infty} minimal geodesic from AA to BB if and only if the spectrum of BA1BA^{-1} lies in a set {λ,λ1}\{\lambda,\lambda^{-1}\} for some λ>0\lambda>0. Moreover, it is shown that otherwise there are infinitely many dd_{\infty} minimal geodesics from AA to BB, and that the set of dd_{\infty}-midpoints of AA and BB is compact and convex in both Riemannian and Euclidean senses [5].

We now consider the merits of ABA*B as a mean of AA and BB. The following are a number of properties that should be satisfied by a sensible notion of a mean μ:(n)×(n)(n)\mu:\mathbb{P}(n)\times\mathbb{P}(n)\rightarrow\mathbb{P}(n) of a pair of positive definite matrices [2, 4].

  1. (i)

    Continuity: μ\mu is a continuous map.

  2. (ii)

    Symmetry: μ(A,B)=μ(B,A)\mu(A,B)=\mu(B,A) for all A,B(n)A,B\in\mathbb{P}(n).

  3. (iii)

    Affine-invariance: μ(XAX,XBX)=Xμ(A,B)X\mu(XAX^{*},XBX^{*})=X\mu(A,B)X^{*}, for all XGL(n)X\in GL(n).

  4. (iv)

    Order property: ABAμ(A,B)BA\preceq B\implies A\preceq\mu(A,B)\preceq B.

  5. (v)

    Monotonicity: μ(A,B)\mu(A,B) is monotone in its arguments.

Note that XX^{*} in (iii) denotes the conjugate transpose of XX. It is relatively straightforward to show that the map μ(A,B):=AB\mu(A,B):=A*B satisfies properties (i)-(iii) listed above. In the remainder of this section, we will turn our attention to the order and monotonicity properties (iv) and (v).

2.2 Order and monotonicity properties of μ(A,B)=AB\mu(A,B)=A*B

Condition (iv) is a generalization of the property of means of positive numbers whereby a mean of a pair of points is expected to lie between the two points on the number line. For Hermitian matrices, a standard partial order \preceq exists according to which ABA\preceq B if and only if BAB-A is positive semidefinite. This partial order is known as the Löwner order and the monotonicity of condition (v) is also with reference to this order. It is well-known that the Löwner order is affine-invariant in the sense that for all A,B(n)A,B\in\mathbb{P}(n), XGL(n)X\in GL(n), ABA\preceq B implies that XAXXBXXAX^{*}\preceq XBX^{*}. In particular, ABA\preceq B if and only if IA1/2BA1/2I\preceq A^{-1/2}BA^{-1/2}. Thus, by affine-invariance of μ\mu, it suffices to prove (iv) in the case where A=IA=I since Aμ(A,B)BA\preceq\mu(A,B)\preceq B if and only if Iμ(I,A1/2BA1/2)A1/2BA1/2I\preceq\mu(I,A^{-1/2}BA^{-1/2})\preceq A^{-1/2}BA^{-1/2}. To establish condition (iv) for μ(A,B)=AB\mu(A,B)=A*B, we shall make use of the following lemma whose proof is elementary.

Lemma 1

If c1,c2c_{1},c_{2}\in\mathbb{R} and MM is an n×nn\times n matrix with eigenvalues λi(M)\lambda_{i}(M), then c1M+c2Ic_{1}M+c_{2}I has eigenvalues c1λi(M)+c2c_{1}\lambda_{i}(M)+c_{2}.

Let Σ(n)\Sigma\in\mathbb{P}(n) be such that IΣI\preceq\Sigma and note that this is equivalent to λi(Σ)1\lambda_{i}(\Sigma)\geq 1 for i=1,,ni=1,\ldots,n. Writing λmin=λmin(Σ)\lambda_{\min}=\lambda_{\min}(\Sigma) and λmax=λmax(Σ)\lambda_{\max}=\lambda_{\max}(\Sigma), we have by Lemma 1 that

λi(IΣ)1\displaystyle\lambda_{i}(I*\Sigma)-1 =λi(1λmin+λmax(Σ+λminλmaxI))1\displaystyle=\lambda_{i}\left(\frac{1}{\sqrt{\lambda_{\min}}+\sqrt{\lambda_{\max}}}\left(\Sigma+\sqrt{\lambda_{\min}\lambda_{\max}}I\right)\right)-1 (13)
=λi(Σ)+λminλmaxλminλmaxλmin+λmax\displaystyle=\frac{\lambda_{i}(\Sigma)+\sqrt{\lambda_{\min}\lambda_{\max}}-\sqrt{\lambda_{\min}}-\sqrt{\lambda_{\max}}}{\sqrt{\lambda_{\min}}+\sqrt{\lambda_{\max}}} (14)
(λmin+λmax)(λmin1)λmin+λmax0,\displaystyle\geq\frac{\left(\sqrt{\lambda_{\min}}+\sqrt{\lambda_{\max}}\right)\left(\sqrt{\lambda_{\min}}-1\right)}{\sqrt{\lambda_{\min}}+\sqrt{\lambda_{\max}}}\geq 0, (15)

since λi(Σ)λmin1\lambda_{i}(\Sigma)\geq\lambda_{\min}\geq 1. Thus, we have shown that IΣI\preceq\Sigma implies IIΣI\preceq I*\Sigma. To prove the other inequality, note that

λi(ΣIΣ)\displaystyle\lambda_{i}(\Sigma-I*\Sigma) =λi((λmin+λmax1λmin+λmax)Σλminλmaxλmin+λmaxI)\displaystyle=\lambda_{i}\left(\left(\frac{\sqrt{\lambda_{\min}}+\sqrt{\lambda_{\max}}-1}{\sqrt{\lambda_{\min}}+\sqrt{\lambda_{\max}}}\right)\Sigma-\frac{\sqrt{\lambda_{\min}\lambda_{\max}}}{\sqrt{\lambda_{\min}}+\sqrt{\lambda_{\max}}}\;I\right) (16)
=(λmin+λmax1λmin+λmax)λi(Σ)λminλmaxλmin+λmax\displaystyle=\left(\frac{\sqrt{\lambda_{\min}}+\sqrt{\lambda_{\max}}-1}{\sqrt{\lambda_{\min}}+\sqrt{\lambda_{\max}}}\right)\lambda_{i}(\Sigma)-\frac{\sqrt{\lambda_{\min}\lambda_{\max}}}{\sqrt{\lambda_{\min}}+\sqrt{\lambda_{\max}}} (17)
(λmin+λmax1)λminλminλmaxλmin+λmax\displaystyle\geq\frac{(\sqrt{\lambda_{\min}}+\sqrt{\lambda_{\max}}-1)\lambda_{\min}-\sqrt{\lambda_{\min}\lambda_{\max}}}{\sqrt{\lambda_{\min}}+\sqrt{\lambda_{\max}}} (18)
=λmin(λmin1)0.\displaystyle=\sqrt{\lambda_{\min}}\left(\sqrt{\lambda_{\min}}-1\right)\geq 0. (19)

Therefore, we have also shown that ΣIΣ0\Sigma-I*\Sigma\succeq 0. That is,

IΣIIΣΣ,I\preceq\Sigma\implies I\preceq I*\Sigma\preceq\Sigma, (20)

for all Σ(n)\Sigma\in\mathbb{P}(n). In particular, upon substituting Σ=A1/2BA1/2\Sigma=A^{-1/2}BA^{-1/2} in (20) and using the affine-invariance properties of both the Löwner order and the mean μ(A,B)=AB\mu(A,B)=A*B, we establish the following important property.

Proposition 3

For A,B(n)A,B\in\mathbb{P}(n), ABA\preceq B implies that AABBA\preceq A*B\preceq B.

We now consider the monotonicity of μ\mu in its arguments. First recall that a map F:(n)(n)F:\mathbb{P}(n)\rightarrow\mathbb{P}(n) is said to be monotone if Σ1Σ2\Sigma_{1}\preceq\Sigma_{2} implies that F(Σ1)F(Σ2)F(\Sigma_{1})\preceq F(\Sigma_{2}). By symmetry and affine-invariance, it is sufficient to consider monotonicity of μ(I,Σ)\mu(I,\Sigma) with respect to Σ\Sigma. That is, monotonicity is established by showing that

Σ1Σ2IΣ1IΣ2.\Sigma_{1}\preceq\Sigma_{2}\implies I*\Sigma_{1}\preceq\ I*\Sigma_{2}. (21)

Unfortunately, it turns out that F(Σ):=IΣF(\Sigma):=I*\Sigma is not monotone with respect to Σ\Sigma as we shall demonstrate below. Nonetheless, FF is seen to enjoy certain weaker monotonicity properties, which is interesting and insightful. Considering the eigenvalues of IΣI*\Sigma, we find that

λi(IΣ)=λi(Σ)+λminλmaxλmin+λmax,\displaystyle\lambda_{i}(I*\Sigma)=\frac{\lambda_{i}(\Sigma)+\sqrt{\lambda_{\min}\lambda_{\max}}}{\sqrt{\lambda_{\min}}+\sqrt{\lambda_{\max}}}, (22)

where λmin\lambda_{\min} and λmax\lambda_{\max} refer to the smallest and largest eigenvalues of Σ\Sigma.

Proposition 4

The maximum and minimum eigenvalues of F(Σ)=IΣF(\Sigma)=I*\Sigma are monotone with respect to Σ\Sigma.

Proof

Considering the cases i=1i=1 and i=ni=n, we find that (22) yields

λmin(IΣ)=λmin(Σ)andλmax(IΣ)=λmax(Σ),\lambda_{\min}(I*\Sigma)=\sqrt{\lambda_{\min}(\Sigma)}\quad\mathrm{and}\quad\lambda_{\max}(I*\Sigma)=\sqrt{\lambda_{\max}(\Sigma)}, (23)

both of which are seen to be monotone functions of Σ\Sigma. ∎

It is in the sense of the above that μ(A,B)=AB\mu(A,B)=A*B inherits a weak monotonicity property. The monotonic dependence of the extremal eigenvalues of IΣI*\Sigma on Σ\Sigma ensures that if Σ1Σ2\Sigma_{1}\preceq\Sigma_{2}, then we can at least rule out the possibility that IΣ1IΣ2I*\Sigma_{1}\succ I*\Sigma_{2}, where 0\succ 0 here denotes positive definiteness. To prove that monotonicity is generally not satisfied in the full sense of condition (v), consider a diagonal matrix Σ=diag(a,b,x)(3)\Sigma=\operatorname{diag}(a,b,x)\in\mathbb{P}(3), where λmin(Σ)=a<bx=λmax(Σ)\lambda_{\min}(\Sigma)=a<b\leq x=\lambda_{\max}(\Sigma) and xx is thought of as a variable. We have IΣ=diag(a,f(x),x)I*\Sigma=\operatorname{diag}\left(\sqrt{a},f(x),\sqrt{x}\right), where

λ2(IΣ)=f(x):=b+axa+x.\lambda_{2}(I*\Sigma)=f(x):=\frac{b+\sqrt{ax}}{\sqrt{a}+\sqrt{x}}. (24)

Taking the derivative of ff with respect to xx, we find that

f(x)=ab2x(a+x)2<0,xb,f^{\prime}(x)=\frac{a-b}{2\sqrt{x}(\sqrt{a}+\sqrt{x})^{2}}<0,\quad\forall x\geq b, (25)

which shows that the second eigenvalue of IΣI*\Sigma decreases as xx increases. Thus, we see that IΣI*\Sigma cannot depend monotonically on Σ\Sigma in this example.

2.3 A geometric scaling property

Before completing this section on the midrange μ(A,B)=AB\mu(A,B)=A*B of a pair of positive definite matrices, we note a key scaling property satisfied by μ\mu which suggests that it is a plausible candidate as a scalable substitute for the standard matrix geometric mean A#BA\#B.

Proposition 5

For any real scalars a,b>0a,b>0 and matrices A,B(n)A,B\in\mathbb{P}(n), we have

(aA)(bB)=ab(AB).(aA)*(bB)=\sqrt{ab}(A*B). (26)
Proof

The result follows upon substituting λi((bB)(aA)1)=baλi(BA1)\lambda_{i}\left((bB)(aA)^{-1}\right)=\frac{b}{a}\lambda_{i}(BA^{-1}) into the formula (12). ∎

The scaling in (26) of course does not generally hold for a mean of two matrices. Indeed, it does not generally hold for means arising as dd_{\infty}-midpoints either. For instance, [5] identifies

AB={λmax1+λmax(A+B)ifλminλmax1,λmin1+λmin(A+B)ifλminλmax1,A\diamond B=\begin{dcases}\frac{\sqrt{\lambda_{\max}}}{1+\lambda_{\max}}(A+B)\quad\mathrm{if}\quad\lambda_{\min}\lambda_{\max}\geq 1,\\ \frac{\sqrt{\lambda_{\min}}}{1+\lambda_{\min}}(A+B)\quad\mathrm{if}\quad\lambda_{\min}\lambda_{\max}\leq 1,\end{dcases} (27)

as another dd_{\infty}-midpoint of AA and BB corresponding to the midpoint of a different dd_{\infty} minimal geodesic to the one considered in this paper. It is clear that (27) does not scale geometrically as in (26). As a summary, we collect the key results of this section in the following theorem.

Theorem 2.1

The mean μ(A,B)=AB\mu(A,B)=A*B defined in (12) yields a dd_{\infty}-midpoint of A,B(n)A,B\in\mathbb{P}(n) that is continuous, symmetric, affine-invariant, and scales geometrically as in (26). Moreover, if ABA\preceq B, then Aμ(A,B)BA\preceq\mu(A,B)\preceq B, and the extremal eigenvalues of μ(I,Σ)\mu(I,\Sigma) depend monotonically on Σ(n)\Sigma\in\mathbb{P}(n).

3 The midrange problem in +\mathbb{R}_{+} and +n\mathbb{R}_{+}^{n}

It is instructive to consider the NN-point affine-invariant midrange problem in the scalar and vector cases, corresponding to the cones +\mathbb{R}_{+} and +n\mathbb{R}_{+}^{n}, respectively. In the scalar case, we are given NN positive numbers yi>0y_{i}>0 that can be ordered such that miniyiykmaxiyi\min_{i}y_{i}\leq y_{k}\leq\max_{i}y_{i} for each k=1,,Nk=1,\ldots,N. By the monotonicity of the log\log function, we have log(miniyi)logyklog(maxiyi)\log\left(\min_{i}y_{i}\right)\leq\log y_{k}\leq\log\left(\max_{i}y_{i}\right). The midrange is uniquely given by

x=exp(12[log(miniyi)+log(maxiyi)])=(miniyimaxiyi)1/2.x=\exp\left(\frac{1}{2}\left[\log\left(\min_{i}y_{i}\right)+\log\left(\max_{i}y_{i}\right)\right]\right)=\left(\min_{i}y_{i}\cdot\max_{i}y_{i}\right)^{1/2}. (28)

Note that (28) is the unique solution of the optimization problem

minx>0maxi|logxlogyi|.\min_{x>0}\;\max_{i}\;|\log x-\log y_{i}|. (29)

In the vector case, the midrange problem in +n\mathbb{R}^{n}_{+} takes the form

min𝒙>0maxilog𝒙log𝒚i:=min𝒙>0maximaxa|logxalogyia|,\min_{\boldsymbol{x}>0}\;\max_{i}\;\|\log\boldsymbol{x}-\log\boldsymbol{y}_{i}\|_{\infty}:=\min_{\boldsymbol{x}>0}\;\max_{i}\;\max_{a}\;|\log x^{a}-\log y^{a}_{i}|, (30)

where 𝒙>0\boldsymbol{x}>0 means that 𝒙=(xa)\boldsymbol{x}=(x^{a}) satsifies xa>0x^{a}>0 for a=1,,na=1,\ldots,n and 𝒚𝒊\boldsymbol{y_{i}} are a collection of NN given points in +n\mathbb{R}^{n}_{+}. As in the matrix case, the optimum cost tt^{\star} has a lower bound

l=12maxi,jlog𝒚ilog𝒚j.l=\frac{1}{2}\max_{i,j}\|\log\boldsymbol{y}_{i}-\log\boldsymbol{y}_{j}\|_{\infty}. (31)
Proposition 6

The lower bound (31) is attained by 𝐱=(xa)+n\boldsymbol{x}=(x^{a})\in\mathbb{R}^{n}_{+} given by

xa=(miniyiamaxiyia)1/2.x^{a}=\left(\min_{i}y_{i}^{a}\cdot\max_{i}y_{i}^{a}\right)^{1/2}. (32)

4 The NN-point midrange problem in (n)\mathbb{P}(n)

In the matrix setting, the midrange problem (5) takes the form

minX0maxilog(Yi1/2XYi1/2),\displaystyle\min_{X\succeq 0}\;\max_{i}\;\|\log(Y_{i}^{-1/2}XY_{i}^{-1/2})\|_{\infty}, (33)

which can be rewritten as

{minX0,ttetYiXetYi\begin{cases}\min_{X\succeq 0,t}\;t\\ e^{-t}Y_{i}\preceq X\preceq e^{t}Y_{i}\end{cases} (34)

While this problem is not convex due to the presence of the log\log function, the feasibility condition etYiXetYie^{-t}Y_{i}\preceq X\preceq e^{t}Y_{i} is convex for fixed tt and can be solved using standard convex optimization packages such as CVX [3]. Given a tt that is greater than or equal to the optimum value t=minXmaxid(X,Yi)t^{\star}=\min_{X}\max_{i}\;d_{\infty}(X,Y_{i}), we can solve (34) by successively solving the feasibility condition as we decrease tt. In the bisection method it is very desirable to have a good estimate for the initial tt as the successive reductions in tt can be quite slow. In particular, if the lower bound l=12diam({Yi})l=\frac{1}{2}\operatorname{diam}(\{Y_{i}\}) is attained as in the vector case, then we can solve (34) in one step by taking t=lt=l and solving the feasibility condition once. Unfortunately, and rather remarkably, numerical examples show that unlike the scalar and vector cases, the lower bound ll is not always attained in the affine-invariant matrix midrange problem.

Proposition 7

The lower bound l=12diam({Yi})l=\frac{1}{2}\operatorname{diam}_{\infty}(\{Y_{i}\}) is not necessarily attained in (33).

The above result suggests that the NN-point matrix midrange problem is more challenging than the vector case in fundamental ways. While the bisection method offers a solution to this problem, we expect that significantly more efficient solutions to the problem can be found. In particular, we expect to find algorithms that rely principally on the computation of dominant generalized eigenpairs that would be considerably more efficient and scalable than existing algorithms for computing matrix geometric means, as in the N=2N=2 case.

References

  • [1] Bhatia, R.: On the exponential metric increasing property. Linear Algebra and its Applications 375, 211 – 220 (2003)
  • [2] Bhatia, R.: Positive Definite Matrices. Princeton University Press (2007)
  • [3] Grant, M., Boyd, S.: CVX: Matlab software for disciplined convex programming, version 2.1. http://cvxr.com/cvx (Mar 2014)
  • [4] Kubo, F., Ando, T.: Means of positive linear operators. Mathematische Annalen 246(3), 205–224 (1980)
  • [5] Lim, Y.: Geometry of midpoint sets for Thompson’s metric. Linear Algebra and its Applications 439(1), 211 – 227 (2013)
  • [6] Nussbaum, R.D.: Finsler structures for the part metric and Hilbert’s projective metric and applications to ordinary differential equations. Differential Integral Equations 7(5-6), 1649–1707 (1994)
  • [7] Thompson, A.C.: On certain contraction mappings in a partially ordered vector space. Proceedings of the American Mathematical Society 14(3), 438–443 (1963)