
On the Exactness of SDP Relaxation for Quadratic Assignment Problem

Shuyang Ling Shanghai Frontiers Science Center of Artificial Intelligence and Deep Learning, New York University Shanghai, Shanghai, China. S.L. and Z.S.Z. are (partially) financially supported by the National Key R&D Program of China, Project Number 2021YFA1002800, National Natural Science Foundation of China (NSFC) No.12001372, Shanghai Municipal Education Commission (SMEC) via Grant 0920000112, and NYU Shanghai Boost Fund.
Abstract

The quadratic assignment problem (QAP) is a fundamental problem in combinatorial optimization with numerous applications in operations research, computer vision, and pattern recognition. However, finding the global minimizer of the QAP is a well-known NP-hard problem. In this work, we study the semidefinite relaxation (SDR) of the QAP and investigate when the SDR recovers the global minimizer. In particular, we consider the case where the two input matrices satisfy a simple signal-plus-noise model, and show that when the noise is sufficiently small compared with the signal, the SDR is exact, i.e., it recovers the global minimizer of the QAP. It is worth noting that this sufficient condition is purely algebraic and does not depend on any statistical assumptions on the input data. We apply our bound to several statistical models such as the correlated Gaussian Wigner model. Despite the theoretical sub-optimality under those models, empirical studies show the remarkable performance of the SDR. Our work could be a first step towards a deeper understanding of the exactness of the SDR for the QAP.

1 Introduction

Given two matrices 𝑨\bm{A} and 𝑪\bm{C}, how do we find a simultaneous row and column permutation of 𝑪\bm{C} such that the resulting two matrices are well aligned? This problem, known as the quadratic assignment problem (QAP), is one of the most challenging problems in optimization [5, 6, 32, 34, 38]. Moreover, it has found numerous applications including graph matching [3, 13], de-anonymization and privacy [43], protein networks [47], and the traveling salesman problem [16, 29].

One of the most common approaches to find the optimal permutation is to minimize the least squares objective:

min𝚷𝒫(n)𝑨𝚷𝚷𝑪F2max𝚷𝒫(n)𝑨𝚷,𝚷𝑪\min_{\bm{\Pi}\in{\cal P}(n)}~{}\|\bm{A}\bm{\Pi}-\bm{\Pi}\bm{C}\|_{F}^{2}~{}~{}\Longleftrightarrow~{}~{}\max_{\bm{\Pi}\in{\cal P}(n)}~{}\langle\bm{A}\bm{\Pi},\bm{\Pi}\bm{C}\rangle (1.1)

where 𝒫(n){\cal P}(n) is the set of all n×nn\times n permutation matrices. In general, it is an NP-hard problem to find its global minimizer [32]. Numerous works have been devoted to either approximating or exactly solving the QAP [5, 6, 27, 34, 38]. Among various algorithms, convex relaxation is a popular approach to the QAP [18, 24, 29, 39, 44, 51]. In [27], Gilmore proposed a famous relaxation of the quadratic assignment via linear programming, and [27, 34] derived a lower bound for the QAP. Another straightforward convex relaxation is to relax the set of permutation matrices to the set of doubly stochastic matrices 𝒟(n){\cal D}(n), which leads to a quadratic program:

min𝑿𝒟(n)𝑨𝑿𝑿𝑪F2.\min_{\bm{X}\in{\cal D}(n)}~{}\|\bm{A}\bm{X}-\bm{X}\bm{C}\|_{F}^{2}. (1.2)

One can also consider spectral relaxations of the QAP [5, 7, 46]; moreover, an estimate of the optimal value of the QAP can also be characterized in terms of the spectra of the input data matrices.

In this work, we will focus on the semidefinite relaxation (SDR) of the QAP. One well-known SDR was proposed in [51]. Since then, many variants of the SDR have been proposed [15, 16, 44] to improve the formulation in [51], and [10, 42] have studied efficient algorithms to solve the SDR. Our work is more on the theoretical side of the SDR for the QAP. In particular, we are interested in the following question:

When does the SDR recover the global minimizer to (1.1)?

Without additional assumptions, there is almost no hope of finding the exact solution to the QAP for general input data due to its NP-hardness. We therefore consider a signal-plus-noise model for the quadratic assignment. More precisely, let 𝑨\bm{A} be an n×nn\times n symmetric matrix (e.g., an adjacency matrix), and let 𝑪\bm{C} be a perturbed version of 𝑨\bm{A}:

𝑪=𝚷(𝑨+𝚫)𝚷\bm{C}=\bm{\Pi}^{\top}(\bm{A}+\bm{\Delta})\bm{\Pi} (1.3)

where 𝚷𝒫(n)\bm{\Pi}\in{\cal P}(n) is an unknown permutation matrix. Our goal is to recover 𝚷\bm{\Pi} from the two matrices 𝑨\bm{A} and 𝑪\bm{C} in an efficient way.

In the noise-free case, i.e., 𝚫=0\bm{\Delta}=0, which corresponds to graph isomorphism, the ground-truth permutation can be exactly recovered by solving (1.2) or by using spectral methods [23, 33] under some regularity conditions on the spectrum of 𝑨\bm{A}. Several other convex programs have been studied [3] to recover the ground-truth permutation for 𝚫=0\bm{\Delta}=0 or in the presence of very weak noise. Recently, the quadratic assignment problem has been extensively studied under various statistical models (average-case analysis), especially in the context of graph matching or graph alignment, such as the correlated Erdös-Rényi graph model [20, 30, 41, 50] and the correlated Wigner model [14, 19, 25]. A series of works fully exploit the statistical properties of the random weight matrices and use spectral methods [19, 21, 26] or extract features of the vertices [17, 40] to efficiently align two correlated random matrices. The core questions are: under what noise levels can one design an efficient algorithm to find the permutation [19, 20, 21], and whether the algorithm can achieve the information-theoretic threshold [14, 25].

On the other hand, the study of optimization approaches to solving these random instances is quite limited compared with spectral methods or feature-extraction-based approaches, even though optimization-based approaches often enjoy more robustness [12]. Therefore, we are interested in studying the performance of optimization methods, especially the SDR, in solving random instances of the QAP. In [39], the authors studied (1.2) for the correlated Erdös-Rényi model and proved that (1.2) never produces the exact permutation matrix, even if the noise level is extremely small. The works [19, 20] proposed a spectral method to estimate the true permutation under both the correlated Gaussian Wigner and Erdös-Rényi models. This spectral method can be viewed as a convex relaxation of (1.1): although its global minimizer is not a permutation, it produces the true permutation after a rounding procedure.

The SDR has proven itself to be powerful in tackling many challenging nonconvex problems in signal processing and data science [48]. The exactness or tightness of the SDR has been studied in various problems: k-means and data clustering [8, 31, 35, 37], community detection [1, 2], synchronization [9, 36, 52], phase retrieval [11], matrix completion [45], and blind deconvolution [4]. These works show that the SDR can recover the ground truth as long as the SNR (signal-to-noise ratio) is sufficiently large, e.g., the sample size is large enough or the noise in the data is below a certain threshold. Inspired by these observations, we study when exactness holds for the SDR of the QAP, i.e., when the SDR produces a permutation matrix that is also the global minimizer to (1.1).

In this work, we focus on two variants of the SDR for the QAP and study their exactness under the signal-plus-noise model. We provide a sufficient condition, based on the spectral gap of 𝑨\bm{A} and the noise strength 𝚫\|\bm{\Delta}\|, that guarantees the exactness of the SDR. It is worth noting that this sufficient condition is deterministic and can be applied to several statistical models. Despite the theoretical sub-optimality in the statistical examples, the SDR shows powerful numerical performance, and this could be a first step towards understanding the exactness of the SDR for the QAP.

1.1 Notation

Before proceeding, we go over some notation that will be used. We denote by boldface 𝒙\bm{x} and 𝑿\bm{X} a vector and a matrix respectively, and 𝒙\bm{x}^{\top} and 𝑿\bm{X}^{\top} are their corresponding transposes. Here 𝑰n\bm{I}_{n} and 𝑱n\bm{J}_{n} stand for the identity matrix and the constant “1” matrix of size n×nn\times n. For a vector 𝒙\bm{x}, diag(𝒙)\operatorname{diag}(\bm{x}) is a diagonal matrix whose diagonal entries are given by 𝒙.\bm{x}. For two matrices 𝑿\bm{X} and 𝒀\bm{Y} of the same size, 𝑿𝒀\bm{X}\circ\bm{Y} is the Hadamard product, 𝑿,𝒀=i,jXijYij=Tr(𝑿𝒀)\langle\bm{X},\bm{Y}\rangle=\sum_{i,j}X_{ij}Y_{ij}=\operatorname{Tr}(\bm{X}\bm{Y}^{\top}) is their inner product, and 𝑿𝒀\bm{X}\otimes\bm{Y} is their Kronecker product. For any matrix 𝑿\bm{X}, the Frobenius norm, the operator norm, and the maximum absolute value among all entries are denoted by 𝑿F\|\bm{X}\|_{F}, 𝑿\|\bm{X}\|, and 𝑿max\|\bm{X}\|_{\max} respectively. Here 𝒫(n){\cal P}(n) is the set of n×nn\times n permutation matrices, 𝒟(n){\cal D}(n) is the set of all n×nn\times n doubly stochastic matrices, 𝒆i\bm{e}_{i} is a one-hot vector, and δij=1\delta_{ij}=1 if i=ji=j and δij=0\delta_{ij}=0 if ij.i\neq j. We let vec(𝑿)\operatorname{vec}(\bm{X}) be the vectorization of 𝑿\bm{X} obtained by stacking the columns of 𝑿\bm{X}, and for a given vector 𝒙n2\bm{x}\in\hbox{\msbm{R}}^{n^{2}}, we denote by mat(𝒙)n×n\operatorname{mat}(\bm{x})\in\hbox{\msbm{R}}^{n\times n} the matricization of 𝒙\bm{x}, so that mat(vec(𝑿))=𝑿\operatorname{mat}(\operatorname{vec}(\bm{X}))=\bm{X}.
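To keep these conventions unambiguous in the numerical sketches that follow, here is a minimal numpy illustration of the vec/mat pair (column-major stacking); the helper names are ours:

```python
import numpy as np

# vec stacks the columns of X (Fortran/column-major order); mat is its inverse.
vec = lambda X: X.flatten(order="F")
mat = lambda x, n: x.reshape((n, n), order="F")

X = np.arange(9.0).reshape(3, 3)
assert np.array_equal(mat(vec(X), 3), X)   # mat(vec(X)) = X
```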

1.2 Organization

The derivation of the SDRs for the QAP, together with the main results and numerics, is presented in Section 2. The theoretical justification of the main theorems is given in Sections 3 and 4.

2 Preliminaries and main results

Without loss of generality, we assume the hidden ground-truth permutation in (1.3) is 𝑰n\bm{I}_{n}. Note that we can rewrite the least squares objective by vectorizing 𝑨𝚷\bm{A}\bm{\Pi} and 𝚷𝑪\bm{\Pi}\bm{C}:

𝑨𝚷𝚷𝑪F2\displaystyle\|\bm{A}\bm{\Pi}-\bm{\Pi}\bm{C}\|_{F}^{2} =(𝑰n𝑨𝑪𝑰n)𝒙2\displaystyle=\|(\bm{I}_{n}\otimes\bm{A}-\bm{C}\otimes\bm{I}_{n})\bm{x}\|^{2}
=𝒙(𝑰n𝑨𝑪𝑰n)2𝒙\displaystyle=\bm{x}^{\top}(\bm{I}_{n}\otimes\bm{A}-\bm{C}\otimes\bm{I}_{n})^{2}\bm{x}

where 𝑨\bm{A} and 𝑪\bm{C} are symmetric, and 𝒙\bm{x} is the vectorization of 𝚷\bm{\Pi}, i.e., for any permutation σ()\sigma(\cdot), which is bijective on {1,,n}\{1,\cdots,n\},

𝒙=[𝒆σ(1)𝒆σ(n)]\bm{x}=\begin{bmatrix}\bm{e}_{\sigma(1)}\\ \vdots\\ \bm{e}_{\sigma(n)}\end{bmatrix} (2.1)

Therefore, by letting

𝑴:=(𝑰n𝑨𝑪𝑰n)20,\bm{M}:=(\bm{I}_{n}\otimes\bm{A}-\bm{C}\otimes\bm{I}_{n})^{2}\succeq 0, (2.2)

the quadratic assignment problem is equivalent to

minTr(𝑴𝒙𝒙) s.t. 𝒙=vec(𝚷),𝚷𝒫(n).\min~{}\operatorname{Tr}(\bm{M}\bm{x}\bm{x}^{\top})\quad\text{ s.t. }\quad\bm{x}=\operatorname{vec}(\bm{\Pi}),~{}~{}\bm{\Pi}\in{\cal P}(n). (2.3)

Another equivalent form of (1.1) follows from

𝑨𝚷𝚷𝑪F2\displaystyle\|\bm{A}\bm{\Pi}-\bm{\Pi}\bm{C}\|_{F}^{2} =𝑨F22𝑨𝚷,𝚷𝑪+𝑪F2=2(𝑰n𝑨)𝒙,(𝑪𝑰n)𝒙+𝑨F2+𝑪F2\displaystyle=\|\bm{A}\|_{F}^{2}-2\langle\bm{A}\bm{\Pi},\bm{\Pi}\bm{C}\rangle+\|\bm{C}\|_{F}^{2}=-2\langle(\bm{I}_{n}\otimes\bm{A})\bm{x},(\bm{C}\otimes\bm{I}_{n})\bm{x}\rangle+\|\bm{A}\|_{F}^{2}+\|\bm{C}\|_{F}^{2}
=2𝑪𝑨,𝒙𝒙+𝑨F2+𝑪F2\displaystyle=-2\langle\bm{C}\otimes\bm{A},\bm{x}\bm{x}^{\top}\rangle+\|\bm{A}\|_{F}^{2}+\|\bm{C}\|_{F}^{2}

By letting

𝑴:=𝑪𝑨,\bm{M}:=-\bm{C}\otimes\bm{A}, (2.4)

the least squares objective equals the objective of (2.3) up to an additive constant. Throughout the discussion, we will mainly focus on (2.3) with 𝑴=(𝑰n𝑨𝑪𝑰n)2\bm{M}=(\bm{I}_{n}\otimes\bm{A}-\bm{C}\otimes\bm{I}_{n})^{2} in (2.2); all the theoretical analysis also applies to (2.4) with minor modifications.
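As a sanity check of the two formulations, the following numpy sketch (ours, not part of the paper's experiments) verifies that 𝒙⊤𝑴𝒙 with 𝑴 from (2.2) reproduces the least squares objective exactly, while 𝑴 from (2.4) reproduces it up to the additive constant ‖𝑨‖F²+‖𝑪‖F²:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n)); A = (A + A.T) / 2      # symmetric A
C = rng.standard_normal((n, n)); C = (C + C.T) / 2      # symmetric C

P = np.eye(n)[rng.permutation(n)]                        # a random permutation matrix
x = P.flatten(order="F")                                 # x = vec(P)

# M from (2.2): x^T M x equals the least squares objective exactly.
M1 = np.linalg.matrix_power(np.kron(np.eye(n), A) - np.kron(C, np.eye(n)), 2)
obj = np.linalg.norm(A @ P - P @ C, "fro") ** 2
assert np.isclose(x @ M1 @ x, obj)

# M from (2.4): x^T M x matches up to the constant ||A||_F^2 + ||C||_F^2.
M2 = -np.kron(C, A)
assert np.isclose(2 * x @ M2 @ x + np.sum(A**2) + np.sum(C**2), obj)
```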

2.1 Convex relaxation

By letting 𝑿=𝒙𝒙,\bm{X}=\bm{x}\bm{x}^{\top}, we note that (2.3) is a linear function in 𝑿\bm{X}. Therefore, the idea of a convex relaxation of (2.3) is to find a proper convex set that includes all rank-1 matrices 𝑿=𝒙𝒙\bm{X}=\bm{x}\bm{x}^{\top} where 𝒙=vec(𝚷)\bm{x}=\operatorname{vec}(\bm{\Pi}) and 𝚷𝒫(n).\bm{\Pi}\in{\cal P}(n). By (2.1), 𝑿=𝒙𝒙\bm{X}=\bm{x}\bm{x}^{\top} is highly structured:

𝑿=[𝒆σ(1)𝒆σ(1)𝒆σ(1)𝒆σ(2)𝒆σ(1)𝒆σ(n)𝒆σ(2)𝒆σ(1)𝒆σ(2)𝒆σ(2)𝒆σ(2)𝒆σ(n)𝒆σ(n)𝒆σ(1)𝒆σ(n)𝒆σ(2)𝒆σ(n)𝒆σ(n)]\bm{X}=\begin{bmatrix}\bm{e}_{\sigma(1)}\bm{e}_{\sigma(1)}^{\top}&\bm{e}_{\sigma(1)}\bm{e}_{\sigma(2)}^{\top}&\cdots&\bm{e}_{\sigma(1)}\bm{e}_{\sigma(n)}^{\top}\\ \bm{e}_{\sigma(2)}\bm{e}_{\sigma(1)}^{\top}&\bm{e}_{\sigma(2)}\bm{e}_{\sigma(2)}^{\top}&\cdots&\bm{e}_{\sigma(2)}\bm{e}_{\sigma(n)}^{\top}\\ \vdots&\vdots&\ddots&\vdots\\ \bm{e}_{\sigma(n)}\bm{e}_{\sigma(1)}^{\top}&\bm{e}_{\sigma(n)}\bm{e}_{\sigma(2)}^{\top}&\cdots&\bm{e}_{\sigma(n)}\bm{e}_{\sigma(n)}^{\top}\\ \end{bmatrix}

which consists of n2n^{2} blocks, each of which is exactly rank-1 and contains only one non-zero entry.
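A small numpy sketch (ours) makes this block structure concrete for a toy permutation:

```python
import numpy as np

n = 4
sigma = np.array([2, 0, 3, 1])          # a permutation of {0,...,n-1}
E = np.eye(n)
x = np.concatenate([E[:, sigma[i]] for i in range(n)])   # (2.1), x = vec(Pi)
X = np.outer(x, x)

# Each n x n block X_ij = e_{sigma(i)} e_{sigma(j)}^T: rank one, one nonzero entry.
for i in range(n):
    for j in range(n):
        blk = X[i*n:(i+1)*n, j*n:(j+1)*n]
        assert np.count_nonzero(blk) == 1 and np.linalg.matrix_rank(blk) == 1
```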

Now we try to find a proper convex set which contains

{𝑿:𝑿=𝒙𝒙,𝒙=vec(𝚷),𝚷𝒫(n)}.\{\bm{X}:\bm{X}=\bm{x}\bm{x}^{\top},\quad\bm{x}=\operatorname{vec}(\bm{\Pi}),\quad\bm{\Pi}\in{\cal P}(n)\}.

It is obvious that 𝑿0\bm{X}\succeq 0 and 𝑿0\bm{X}\geq 0, which will be incorporated into the constraints.

Convex relaxation I.

Note that for each n×nn\times n block, we have

𝑿ij=𝒆σ(i)𝒆σ(j)\bm{X}_{ij}=\bm{e}_{\sigma(i)}\bm{e}_{\sigma(j)}^{\top}

is exactly rank-1 and only contains one nonzero entry. Moreover, for any permutation σ\sigma, it holds that

Tr(𝑿ij)={1,i=j,0,ij,,𝑿ij,𝑱n=1\operatorname{Tr}(\bm{X}_{ij})=\begin{cases}1,&i=j,\\ 0,&i\neq j,\end{cases},~{}~{}~{}\langle\bm{X}_{ij},\bm{J}_{n}\rangle=1

and also

i=1n𝑿ii\displaystyle\sum_{i=1}^{n}\bm{X}_{ii} =i=1n𝒆σ(i)𝒆σ(i)=𝑰n,\displaystyle=\sum_{i=1}^{n}\bm{e}_{\sigma(i)}\bm{e}_{\sigma(i)}^{\top}=\bm{I}_{n},
i,j𝑿ij\displaystyle\sum_{i,j}\bm{X}_{ij} =i,j𝒆σ(i)𝒆σ(j)=i=1n𝒆σ(i)(j=1n𝒆σ(j))=𝑱n.\displaystyle=\sum_{i,j}\bm{e}_{\sigma(i)}\bm{e}_{\sigma(j)}^{\top}=\sum_{i=1}^{n}\bm{e}_{\sigma(i)}\left(\sum_{j=1}^{n}\bm{e}_{\sigma(j)}\right)^{\top}=\bm{J}_{n}.

Therefore, combining these constraints leads to the following convex relaxation:

min\displaystyle\min 𝑴,𝑿\displaystyle\langle\bm{M},\bm{X}\rangle (2.5)
s.t. 𝑿0,𝑿0,\displaystyle\bm{X}\succeq 0,~{}~{}~{}\bm{X}\geq 0,
Tr(𝑿ij)=δij,𝑿ij,𝑱n=1,1i,jn,\displaystyle\operatorname{Tr}(\bm{X}_{ij})=\delta_{ij},~{}~{}~{}\langle\bm{X}_{ij},\bm{J}_{n}\rangle=1,~{}~{}\forall 1\leq i,j\leq n,
i=1n𝑿ii=𝑰n,i,j𝑿ij=𝑱n.\displaystyle\sum_{i=1}^{n}\bm{X}_{ii}=\bm{I}_{n},~{}~{}~{}\sum_{i,j}\bm{X}_{ij}=\bm{J}_{n}.
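For reference, a minimal cvxpy sketch of (2.5) could look as follows; this is our own illustrative implementation (function and variable names are ours), assuming cvxpy with the SCS solver is available, and it is not the CVX code used in the experiments of Section 2.3:

```python
import numpy as np
import cvxpy as cp

def solve_sdr_I(A, C):
    """A minimal sketch of relaxation (2.5); X has n^2 x n^2 entries,
    so this is only practical for small n (the experiments use n = 10)."""
    n = A.shape[0]
    M = np.linalg.matrix_power(np.kron(np.eye(n), A) - np.kron(C, np.eye(n)), 2)
    X = cp.Variable((n * n, n * n), symmetric=True)
    blk = lambda i, j: X[i*n:(i+1)*n, j*n:(j+1)*n]          # the (i,j)-th n x n block
    cons = [X >> 0, X >= 0,
            sum(blk(i, i) for i in range(n)) == np.eye(n),  # sum_i X_ii = I_n
            sum(blk(i, j) for i in range(n) for j in range(n)) == np.ones((n, n))]
    for i in range(n):
        for j in range(n):
            cons.append(cp.trace(blk(i, j)) == (1.0 if i == j else 0.0))
            cons.append(cp.sum(blk(i, j)) == 1.0)           # <X_ij, J_n> = 1
    prob = cp.Problem(cp.Minimize(cp.trace(M @ X)), cons)
    prob.solve(solver=cp.SCS)
    return X.value
```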

Convex relaxation II.

Note that for 𝑿=𝒙𝒙\bm{X}=\bm{x}\bm{x}^{\top}, it holds diag(𝑿)=𝒙.\operatorname{diag}(\bm{X})=\bm{x}. Using the fact that

[𝒙𝒙𝒙𝒙1]0,\begin{bmatrix}\bm{x}\bm{x}^{\top}&\bm{x}\\ \bm{x}^{\top}&1\end{bmatrix}\succeq 0,

we have a few new constraints:

[𝑿diag(𝑿)diag(𝑿)1]0,mat(diag(𝑿))𝒟(n)\begin{bmatrix}\bm{X}&\operatorname{diag}(\bm{X})\\ \operatorname{diag}(\bm{X})^{\top}&1\end{bmatrix}\succeq 0,~{}~{}~{}\operatorname{mat}(\operatorname{diag}(\bm{X}))\in{\cal D}(n)

where 𝒟(n){\cal D}(n) is the set of n×nn\times n doubly stochastic matrices. Combining the constraints above with Tr(𝑿ij)=δij\operatorname{Tr}(\bm{X}_{ij})=\delta_{ij} and nonnegativity, we get an SDR similar to [51]:

min\displaystyle\min 𝑴,𝑿\displaystyle\langle\bm{M},\bm{X}\rangle (2.6)
s.t. [𝑿diag(𝑿)diag(𝑿)1]0,𝑿0,\displaystyle\begin{bmatrix}\bm{X}&\operatorname{diag}(\bm{X})\\ \operatorname{diag}(\bm{X})^{\top}&1\end{bmatrix}\succeq 0,~{}~{}~{}\bm{X}\geq 0,
𝑿ij,𝑱n=1,Tr(𝑿ij)=δij,1i,jn,\displaystyle\langle\bm{X}_{ij},\bm{J}_{n}\rangle=1,~{}~{}~{}\operatorname{Tr}(\bm{X}_{ij})=\delta_{ij},~{}~{}\forall 1\leq i,j\leq n,
i=1n𝑿ii=𝑰n,i,j𝑿ij=𝑱n,\displaystyle\sum_{i=1}^{n}\bm{X}_{ii}=\bm{I}_{n},~{}~{}~{}\sum_{i,j}\bm{X}_{ij}=\bm{J}_{n},
mat(diag(𝑿))𝟏n=mat(diag(𝑿))𝟏n=𝟏n.\displaystyle\operatorname{mat}(\operatorname{diag}(\bm{X}))\bm{1}_{n}=\operatorname{mat}(\operatorname{diag}(\bm{X}))^{\top}\bm{1}_{n}=\bm{1}_{n}.

For the last constraints in (2.6), the explicit form of mat(diag(𝑿))\operatorname{mat}(\operatorname{diag}(\bm{X})) is given by

mat(diag(𝑿))=[𝑿11,11𝑿11,22𝑿11,33𝑿11,nn𝑿22,11𝑿22,22𝑿22,33𝑿22,nn𝑿33,11𝑿33,22𝑿33,33𝑿33,nn𝑿nn,11𝑿nn,22𝑿nn,33𝑿nn,nn]\operatorname{mat}(\operatorname{diag}(\bm{X}))=\begin{bmatrix}\bm{X}_{11,11}&\bm{X}_{11,22}&\bm{X}_{11,33}&\cdots&\bm{X}_{11,nn}\\ \bm{X}_{22,11}&\bm{X}_{22,22}&\bm{X}_{22,33}&\cdots&\bm{X}_{22,nn}\\ \bm{X}_{33,11}&\bm{X}_{33,22}&\bm{X}_{33,33}&\cdots&\bm{X}_{33,nn}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ \bm{X}_{nn,11}&\bm{X}_{nn,22}&\bm{X}_{nn,33}&\cdots&\bm{X}_{nn,nn}\end{bmatrix}

which reshapes the diagonal elements of 𝑿\bm{X} into an n×nn\times n matrix. The relaxation (2.6) is tighter than (2.5) as it imposes a few more constraints on 𝑿.\bm{X}.
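The extra constraints of (2.6) can be sketched on top of the previous snippet as follows (again our own illustration; `X` is the cvxpy variable from the sketch of (2.5)):

```python
import numpy as np
import cvxpy as cp

def extra_constraints_II(X, n):
    """The additional constraints of (2.6) on top of those of (2.5); a sketch in
    our notation, meant to be appended to the constraint list `cons` above."""
    d = cp.reshape(cp.diag(X), (n * n, 1), order="F")      # diag(X) as a column
    D = cp.reshape(cp.diag(X), (n, n), order="F")          # mat(diag(X))
    bordered = cp.bmat([[X, d], [d.T, np.ones((1, 1))]])   # [[X, diag(X)], [diag(X)^T, 1]]
    return [bordered >> 0,
            cp.sum(D, axis=0) == np.ones(n),               # mat(diag(X))^T 1 = 1
            cp.sum(D, axis=1) == np.ones(n)]               # mat(diag(X)) 1 = 1
```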

2.2 Main theorems

With the two SDRs (2.5) and (2.6) introduced, we are ready to present our main theorems. Note that if 𝑨\bm{A} has distinct eigenvalues and no eigenvector of 𝑨\bm{A} is orthogonal to 𝟏n\bm{1}_{n}, then finding the optimal permutation is possible via simple convex relaxations [3, 33]. We are interested in whether exactness still holds in the presence of noise, and below is our main theorem.

Theorem 2.1.

Let (λi,𝐮i)(\lambda_{i},\bm{u}_{i}) be the ii-th eigenvalue-eigenvector pair of 𝐀\bm{A}. Then 𝐗=𝐱𝐱\bm{X}=\bm{x}\bm{x}^{\top} is the unique global minimizer to the SDR (2.5) if

minij(λiλj)2min1in|𝒖i,𝟏n|2n(𝚫𝑨+𝑨𝚫max)\min_{i\neq j}(\lambda_{i}-\lambda_{j})^{2}\min_{1\leq i\leq n}|\langle\bm{u}_{i},\bm{1}_{n}\rangle|^{2}\geq n(\|\bm{\Delta}\|\|\bm{A}\|+\|\bm{A}\bm{\Delta}\|_{\max})

where 𝑨𝚫max=maxi,j|𝑨i,𝚫j|\|\bm{A}\bm{\Delta}\|_{\max}=\max_{i,j}|\langle\bm{A}_{i},\bm{\Delta}_{j}\rangle|, and 𝐀i\bm{A}_{i} and 𝚫j\bm{\Delta}_{j} are the ii-th and jj-th columns of 𝐀\bm{A} and 𝚫\bm{\Delta} respectively.

A few remarks regarding Theorem 2.1 are in order. Suppose 𝚫=0\bm{\Delta}=0, minij|λiλj|>0\min_{i\neq j}|\lambda_{i}-\lambda_{j}|>0 and 𝒖i,𝟏n0\langle\bm{u}_{i},\bm{1}_{n}\rangle\neq 0; then exactness always holds. This is aligned with the results for graph isomorphism in [3, 33]. The interesting point is that even in the presence of noise, as long as the noise strength is sufficiently small compared with the minimum spectral gap and the alignment between the 𝒖i\bm{u}_{i} and 𝟏n\bm{1}_{n}, the SDR is still exact. In other words, our theorem provides a deterministic condition that guarantees the exactness of the SDR (2.5) in the presence of noise. A result of a similar flavor was derived in [3, Lemma 2], which provides an error estimate, in terms of the spectral gap and mini|𝒖i,𝟏n|2\min_{i}|\langle\bm{u}_{i},\bm{1}_{n}\rangle|^{2}, between the solution of a quadratic programming relaxation and the true permutation. However, exactness was not obtained in [3].
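The deterministic condition of Theorem 2.1 is straightforward to evaluate for a given pair (𝑨,𝚫); a small numpy sketch (ours) is:

```python
import numpy as np

def theorem21_condition(A, Delta):
    """Evaluate the sufficient condition of Theorem 2.1 for given A and Delta."""
    n = A.shape[0]
    lam, U = np.linalg.eigh(A)
    gaps = np.abs(lam[:, None] - lam[None, :])[~np.eye(n, dtype=bool)]
    align = np.min((U.T @ np.ones(n)) ** 2)          # min_i |<u_i, 1>|^2
    lhs = np.min(gaps) ** 2 * align
    op = np.linalg.norm(Delta, 2) * np.linalg.norm(A, 2)
    rhs = n * (op + np.max(np.abs(A @ Delta)))       # n(||Delta|| ||A|| + ||A Delta||_max)
    return lhs >= rhs
```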

For the SDR in (2.6), we can derive a similar deterministic sufficient condition for its exactness under the same assumptions.

Theorem 2.2.

Let (λi,𝐮i)(\lambda_{i},\bm{u}_{i}) be the ii-th eigenvalue-eigenvector pair of 𝐀\bm{A}. Then 𝐗=𝐱𝐱\bm{X}=\bm{x}\bm{x}^{\top} is the unique global minimizer to the SDR (2.6) if

minij(λiλj)2min1in|𝒖i,𝟏n|2n(𝚫𝑨+𝑨𝚫max)\min_{i\neq j}(\lambda_{i}-\lambda_{j})^{2}\min_{1\leq i\leq n}|\langle\bm{u}_{i},\bm{1}_{n}\rangle|^{2}\geq n(\|\bm{\Delta}\|\|\bm{A}\|+\|\bm{A}\bm{\Delta}\|_{\max})

where 𝑨𝚫max=maxi,j|𝑨i,𝚫j|\|\bm{A}\bm{\Delta}\|_{\max}=\max_{i,j}|\langle\bm{A}_{i},\bm{\Delta}_{j}\rangle|, and 𝐀i\bm{A}_{i} and 𝚫j\bm{\Delta}_{j} are the ii-th and jj-th columns of 𝐀\bm{A} and 𝚫\bm{\Delta} respectively.

The proofs of both theorems reduce to constructing a dual certificate (finding the dual variables) that certifies the global optimality of the solution 𝑿=𝒙𝒙.\bm{X}=\bm{x}\bm{x}^{\top}. This routine is well established, but the actual construction is highly problem-dependent. For the SDR of the QAP, the construction is not simple as the SDRs have complicated constraints. One may expect a sharper theoretical bound for the SDR (2.6) than for (2.5), as (2.6) has more constraints. However, due to the complications of constructing a proper dual certificate that certifies 𝑿=𝒙𝒙\bm{X}=\bm{x}\bm{x}^{\top}, we employ a similar construction of the dual variables for both, and this leads to the same deterministic condition.

As Theorems 2.1 and 2.2 are general, we present three concrete examples to see how our theorems work, and we compare them with numerical experiments and the best known theoretical bounds. The experiments under these models imply that our theoretical bound is sub-optimal, although the SDR performs remarkably well in numerics. As briefly mentioned before, this sub-optimality results from the construction of the dual certificate, which is quite challenging for the SDR of the QAP with general input data.

2.3 Examples and numerics

Example: diagonal matrix plus Gaussian noise.

Suppose 𝑨\bm{A} is a diagonal matrix, i.e.,

𝑨=diag(λ1,,λn)\bm{A}=\operatorname{diag}(\lambda_{1},\cdots,\lambda_{n})

where the eigenvalues are in descending order and the eigenvectors are 𝒖i=𝒆i\bm{u}_{i}=\bm{e}_{i}. Suppose 𝚫=σdiag(𝒘)\bm{\Delta}=\sigma\operatorname{diag}(\bm{w}) where 𝒘\bm{w} is a standard Gaussian random vector, so that 𝑪=𝑨+σdiag(𝒘)\bm{C}=\bm{A}+\sigma\operatorname{diag}(\bm{w}) is also a diagonal matrix. Then applying Theorems 2.1 and 2.2 implies that the SDR is exact if

minij(λiλj)2n(4σSDRlognmax1in|λi|)\min_{i\neq j}(\lambda_{i}-\lambda_{j})^{2}\geq n\left(4\sigma_{\operatorname{SDR}}\sqrt{\log n}\cdot\max_{1\leq i\leq n}|\lambda_{i}|\right)

where diag(𝒘)2logn\|\operatorname{diag}(\bm{w})\|\leq 2\sqrt{\log n} holds with high probability (so that 𝚫2σlogn\|\bm{\Delta}\|\leq 2\sigma\sqrt{\log n}) and 𝒆i,𝟏n=1\langle\bm{e}_{i},\bm{1}_{n}\rangle=1. Then the noise level σSDR\sigma_{\operatorname{SDR}} should satisfy

σSDRminij(λiλj)24nlognmaxi|λi|.\sigma_{\operatorname{SDR}}\leq\frac{\min_{i\neq j}(\lambda_{i}-\lambda_{j})^{2}}{4n\sqrt{\log n}\max_{i}|\lambda_{i}|}.

On the other hand, given two diagonal matrices 𝑨\bm{A} and 𝑪\bm{C}, the global minimizer to (1.1) is still 𝑰n\bm{I}_{n} if and only if the ordering of the eigenvalues remains unchanged, which holds with high probability if

σminij|λiλj|2logn.\sigma\leq\frac{\min_{i\neq j}|\lambda_{i}-\lambda_{j}|}{2\sqrt{\log n}}.

For the specific example λk=k\lambda_{k}=k, 1kn1\leq k\leq n, we obtain

σSDR14n2logn,σ12logn.\sigma_{\operatorname{SDR}}\leq\frac{1}{4n^{2}\sqrt{\log n}},~{}~{}~{}\sigma\leq\frac{1}{2\sqrt{\log n}}.

This indicates that our bound for the SDR is sub-optimal by a factor of n2.n^{2}.

We also examine the performance of (2.5) in numerical simulations. We choose n=10n=10 due to the high computational complexity for larger nn, and let λk=k\lambda_{k}=k and 0σ20\leq\sigma\leq 2. For each σ\sigma, we run 20 experiments and compute the correlation between 𝑿^\widehat{\bm{X}} and 𝒙𝒙:\bm{x}\bm{x}^{\top}:

corr(𝑿^,𝒙𝒙)=𝒙𝑿^𝒙n2\operatorname{corr}(\widehat{\bm{X}},\bm{x}\bm{x}^{\top})=\frac{\bm{x}^{\top}\widehat{\bm{X}}\bm{x}}{n^{2}} (2.7)

where 𝑿^\widehat{\bm{X}} is the solution to (2.5). This correlation lies between 0 and 1: the higher the correlation, the better the recovery. In particular, if corr(,)\operatorname{corr}(\cdot,\cdot) equals 1, then 𝑿^=𝒙𝒙\widehat{\bm{X}}=\bm{x}\bm{x}^{\top}. We count an instance as exact if corr(𝑿^,𝒙𝒙)1103.\operatorname{corr}(\widehat{\bm{X}},\bm{x}\bm{x}^{\top})\geq 1-10^{-3}. All simulations are performed using CVX [28], and the numerical results are presented in Figure 1.
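For concreteness, the exactness metric (2.7) can be computed as in the following numpy sketch (the helper name is ours):

```python
import numpy as np

def correlation(X_hat, x):
    """corr(X_hat, x x^T) from (2.7); x = vec(Pi), so ||x||^2 = n."""
    n = int(round(np.sqrt(x.size)))
    return float(x @ X_hat @ x) / n**2

# An instance is counted as exact if correlation(X_hat, x) >= 1 - 1e-3.
```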

Figure 1: Exactness of SDR (2.5) and (2.6) for QAP under the diagonal matrix model.

Figure 1 indicates that the SDR is exact for σ0.3\sigma\leq 0.3, and its performance nearly matches the optimal bound. The two SDRs perform similarly under this random setting.

Example: diagonal-plus-Wigner model.

Consider 𝑨=diag(λ1,,λn)\bm{A}=\operatorname{diag}(\lambda_{1},\cdots,\lambda_{n}), let 𝑾\bm{W} be a symmetric Gaussian random matrix with diag(𝑾)=0\operatorname{diag}(\bm{W})=0, and set 𝑪=𝑨+σ𝑾\bm{C}=\bm{A}+\sigma\bm{W}. Then applying Theorem 2.1 implies that

minij(λiλj)2min|𝒖i,𝟏n|2nσSDRmaxi|λi|2nσSDRminij(λiλj)22n3/2maxi|λi|\min_{i\neq j}(\lambda_{i}-\lambda_{j})^{2}\min|\langle\bm{u}_{i},\bm{1}_{n}\rangle|^{2}\geq n\sigma_{\operatorname{SDR}}\max_{i}|\lambda_{i}|\cdot 2\sqrt{n}~{}~{}\Longleftrightarrow~{}~{}\sigma_{\operatorname{SDR}}\leq\frac{\min_{i\neq j}(\lambda_{i}-\lambda_{j})^{2}}{2n^{3/2}\max_{i}|\lambda_{i}|}

is needed to ensure exactness, where 𝒖i=𝒆i\bm{u}_{i}=\bm{e}_{i} and 𝑾(2+o(1))n\|\bm{W}\|\leq(2+o(1))\sqrt{n} holds with high probability [49]. In particular, for λk=k\lambda_{k}=k, 1kn1\leq k\leq n, the exactness of the SDR holds if σSDR1/(2n5/2).\sigma_{\operatorname{SDR}}\leq 1/(2n^{5/2}).

Figure 2: Exactness of SDR (2.5) and (2.6) for QAP under the diagonal-plus-Wigner matrix model.

The numerical experiment is shown in Figure 2: exactness holds if σ0.15\sigma\leq 0.15. For n=10n=10, our bound requires σSDR0.0016\sigma_{\operatorname{SDR}}\leq 0.0016 in theory. This implies that there is a gap between the actual performance of the SDR and our results.

Example: correlated Wigner model.

The correlated Wigner model assumes 𝑨\bm{A} and 𝑪\bm{C} are two Gaussian random matrices satisfying 𝑪=𝑨+σ𝑾\bm{C}=\bm{A}+\sigma\bm{W}, where 𝑾\bm{W} is a Gaussian random matrix independent of 𝑨\bm{A}. To apply Theorems 2.1 and 2.2, it suffices to obtain a lower bound on the spectral gap and an upper bound on 𝑨.\|\bm{A}\|.

For the spectral gap, [22, Corollary 1] implies that for nn sufficiently large, it holds that

(23/2nδnt)2txex2dx=et21t2\hbox{\msbm{P}}(2^{-3/2}n\delta_{n}\geq t)\approx 2\int_{t}^{\infty}xe^{-x^{2}}\mathop{}\!\mathrm{d}x=e^{-t^{2}}\geq 1-t^{2}

where δn\delta_{n} denotes the smallest gap. Therefore, it holds with probability at least 1O(log1n)1-O(\log^{-1}n) that

δn=minij|λiλj|1nlogn.\delta_{n}=\min_{i\neq j}|\lambda_{i}-\lambda_{j}|\geq\frac{1}{n\sqrt{\log n}}.

As a result, our theorem implies that exactness holds if

12(1nlogn)2n(2σn2n)σSDR1n4logn\frac{1}{2}\left(\frac{1}{n\sqrt{\log n}}\right)^{2}\geq n(2\sigma\sqrt{n}\cdot 2\sqrt{n})\Longrightarrow\sigma_{\operatorname{SDR}}\lesssim\frac{1}{n^{4}\log n}

where 𝑾2(1+o(1))n\|\bm{W}\|\leq 2(1+o(1))\sqrt{n} and mini|𝒖i,𝟏n|21/2\min_{i}|\langle\bm{u}_{i},\bm{1}_{n}\rangle|^{2}\geq 1/2 hold with high probability. As shown in Figure 3, σ0.6\sigma\leq 0.6 suffices for exactness numerically, which is far better than the theoretical bound. Under the correlated Wigner model, (2.6) performs better than (2.5) when σ0.6.\sigma\geq 0.6.

Figure 3: Exactness of SDR (2.5) and (2.6) for QAP under the correlated Wigner model.

We conclude this section by commenting on the numerics and on future research problems. In all three examples, the SDRs perform much better in numerics than our theoretical bound, by a factor of n2n^{2} or even more. In fact, for the correlated Gaussian Wigner model, the SDRs are able to recover the true permutation even if σ\sigma is of constant order, which is comparable with the state-of-the-art works on the same model such as [19]. Therefore, there is still significant room for improvement. In particular, it would be very interesting to show theoretically that the SDR achieves exactness on random instances in a regime consistent with the numerical observations. This calls for new techniques for constructing dual certificates that exploit the statistical properties of the noise. More importantly, it is unclear what the best way is to characterize the SNR (signal-to-noise ratio) for the QAP in a manner useful for analyzing its SDR. For example, in the correlated Gaussian Wigner model, an analysis based on the spectral gap does not seem to match the numerical performance, as the spectral gap of a Gaussian random matrix can be very small for large nn. All of these questions are worthwhile to consider in the future.

3 Proof of Theorem 2.1

The proof of Theorem 2.1 follows the well-established dual certificate technique in convex relaxation. Without loss of generality, we assume 𝚷=𝑰n\bm{\Pi}=\bm{I}_{n} and thus

𝒙=[𝒆1,,𝒆n]n2,𝒙2=n.\bm{x}^{\top}=[\bm{e}_{1}^{\top},\cdots,\bm{e}_{n}^{\top}]\in\hbox{\msbm{R}}^{n^{2}},~{}~{}\|\bm{x}\|^{2}=n.

Our goal is to establish sufficient conditions that ensure the global minimizer to (2.5) is 𝒙𝒙\bm{x}\bm{x}^{\top} by constructing a dual certificate.

3.1 Dual program and optimality condition of (2.5)

For each constraint in (2.5), we assign a dual variable:

𝑸0\displaystyle\bm{Q}\succeq 0 :𝑿0,\displaystyle:\bm{X}\succeq 0,
𝑩0\displaystyle\bm{B}\geq 0 :𝑿0,\displaystyle:\bm{X}\geq 0,
𝑻n×n\displaystyle\bm{T}\in\hbox{\msbm{R}}^{n\times n} :Tr(𝑿ij)=δij,\displaystyle:\operatorname{Tr}(\bm{X}_{ij})=\delta_{ij},
𝒁n×n\displaystyle\bm{Z}\in\hbox{\msbm{R}}^{n\times n} :𝑿ij,𝑱n=1,\displaystyle:\langle\bm{X}_{ij},\bm{J}_{n}\rangle=1,
𝑲n×n\displaystyle\bm{K}\in\hbox{\msbm{R}}^{n\times n} :i=1n𝑿ii=𝑰n,\displaystyle:\sum_{i=1}^{n}\bm{X}_{ii}=\bm{I}_{n},
𝑯n×n\displaystyle\bm{H}\in\hbox{\msbm{R}}^{n\times n} :i,j𝑿ij=𝑱n.\displaystyle:\sum_{i,j}\bm{X}_{ij}=\bm{J}_{n}.

The Lagrangian function is

(𝑿,𝑸,𝑩,𝑻,𝒁,𝑲,𝑯)\displaystyle{\cal L}(\bm{X},\bm{Q},\bm{B},\bm{T},\bm{Z},\bm{K},\bm{H})
:=𝑿,𝑴𝑸𝑩i,jtij(Tr(𝑿ij)δij)i,jzij(𝑿ij,𝑱n1)\displaystyle\qquad:=\langle\bm{X},\bm{M}-\bm{Q}-\bm{B}\rangle-\sum_{i,j}t_{ij}(\operatorname{Tr}(\bm{X}_{ij})-\delta_{ij})-\sum_{i,j}z_{ij}(\langle\bm{X}_{ij},\bm{J}_{n}\rangle-1)
i=1n𝑿ii𝑰n,𝑲i,j𝑿ij𝑱n,𝑯\displaystyle\qquad\qquad-\left\langle\sum_{i=1}^{n}\bm{X}_{ii}-\bm{I}_{n},\bm{K}\right\rangle-\left\langle\sum_{i,j}\bm{X}_{ij}-\bm{J}_{n},\bm{H}\right\rangle
=𝑿,𝑴𝑸𝑩𝑻𝑰n𝒁𝑱n𝑰n𝑲𝑱n𝑯+Tr(𝑻+𝑲)+𝑱n,𝒁+𝑯.\displaystyle\qquad=\langle\bm{X},\bm{M}-\bm{Q}-\bm{B}-\bm{T}\otimes\bm{I}_{n}-\bm{Z}\otimes\bm{J}_{n}-\bm{I}_{n}\otimes\bm{K}-\bm{J}_{n}\otimes\bm{H}\rangle+\operatorname{Tr}(\bm{T}+\bm{K})+\langle\bm{J}_{n},\bm{Z}+\bm{H}\rangle.

As a result, the dual program of (2.5) is

max\displaystyle\max Tr(𝑻+𝑲)+𝑱n,𝑯+𝒁\displaystyle\operatorname{Tr}(\bm{T}+\bm{K})+\langle\bm{J}_{n},\bm{H}+\bm{Z}\rangle (3.1)
s.t. 𝑴=𝑸+𝑩+𝑻𝑰n+𝒁𝑱n+𝑰n𝑲+𝑱n𝑯,\displaystyle\bm{M}=\bm{Q}+\bm{B}+\bm{T}\otimes\bm{I}_{n}+\bm{Z}\otimes\bm{J}_{n}+\bm{I}_{n}\otimes\bm{K}+\bm{J}_{n}\otimes\bm{H},
𝑸0,𝑩0.\displaystyle\bm{Q}\succeq 0,~{}~{}\bm{B}\geq 0.

Suppose 𝑿=𝒙𝒙\bm{X}=\bm{x}\bm{x}^{\top} is the global minimizer; then the complementary slackness conditions read:

𝑩,𝑿=0,𝑸𝑿=0.\langle\bm{B},\bm{X}\rangle=0,\quad\bm{Q}\bm{X}=0. (3.2)

Note that 𝑸0\bm{Q}\succeq 0 and 𝑿=𝒙𝒙0\bm{X}=\bm{x}\bm{x}^{\top}\succeq 0, so 𝑸𝑿=0\bm{Q}\bm{X}=0 is equivalent to 𝑸𝒙=0.\bm{Q}\bm{x}=0. Therefore, the KKT conditions become:

  1. Stationarity:

     𝑴=𝑸+𝑩+𝑻𝑰n+𝑰n𝑲+𝑱n𝑯+𝒁𝑱n,\bm{M}=\bm{Q}+\bm{B}+\bm{T}\otimes\bm{I}_{n}+\bm{I}_{n}\otimes\bm{K}+\bm{J}_{n}\otimes\bm{H}+\bm{Z}\otimes\bm{J}_{n}, (3.3)

  2. Dual feasibility:

     𝑩0,𝑸0,\bm{B}\geq 0,\qquad\bm{Q}\succeq 0, (3.4)

  3. Complementary slackness:

     𝑩,𝑿=0,𝑸𝒙=0,\langle\bm{B},\bm{X}\rangle=0,\qquad\bm{Q}\bm{x}=0, (3.5)

where 𝑴=(𝑰n𝑨𝑪𝑰n)2\bm{M}=(\bm{I}_{n}\otimes\bm{A}-\bm{C}\otimes\bm{I}_{n})^{2} is the data matrix. Due to the nonnegativity of 𝑩\bm{B}, 𝑩,𝑿=0\langle\bm{B},\bm{X}\rangle=0 is equivalent to 𝑩𝒙𝒙=0\bm{B}\circ\bm{x}\bm{x}^{\top}=0 where “\circ” is the Hadamard product of two matrices.

Now we compute 𝑸𝒙=0\bm{Q}\bm{x}=0 to make the KKT conditions more explicit. Instead of computing 𝑸𝒙\bm{Q}\bm{x} directly, we will use the properties of the Kronecker product to simplify the expression. Note that

mat(𝑴𝒙)=mat((𝑰n𝑨𝑪𝑰n)2𝒙)\displaystyle\operatorname{mat}(\bm{M}\bm{x})=\operatorname{mat}((\bm{I}_{n}\otimes\bm{A}-\bm{C}\otimes\bm{I}_{n})^{2}\bm{x})
=𝑨22𝑨𝑪+𝑪2=𝚫2+𝚫𝑨𝑨𝚫,\displaystyle\qquad\qquad~{}~{}=\bm{A}^{2}-2\bm{A}\bm{C}+\bm{C}^{2}=\bm{\Delta}^{2}+\bm{\Delta}\bm{A}-\bm{A}\bm{\Delta},
mat((𝑻𝑰n)𝒙)=𝑻,\displaystyle\operatorname{mat}((\bm{T}\otimes\bm{I}_{n})\bm{x})=\bm{T},
mat((𝒁𝑱n)𝒙)=𝑱n𝒁,\displaystyle\operatorname{mat}((\bm{Z}\otimes\bm{J}_{n})\bm{x})=\bm{J}_{n}\bm{Z},
mat((𝑰n𝑲)𝒙)=𝑲,\displaystyle\operatorname{mat}((\bm{I}_{n}\otimes\bm{K})\bm{x})=\bm{K},
mat((𝑱n𝑯)𝒙)=𝑯𝑱n,\displaystyle\operatorname{mat}((\bm{J}_{n}\otimes\bm{H})\bm{x})=\bm{H}\bm{J}_{n},

where 𝒙=vec(𝑰n).\bm{x}=\operatorname{vec}(\bm{I}_{n}). We define

𝑺=𝑻𝑰n+𝑰n𝑲+𝑱n𝑯+𝒁𝑱n\bm{S}=\bm{T}\otimes\bm{I}_{n}+\bm{I}_{n}\otimes\bm{K}+\bm{J}_{n}\otimes\bm{H}+\bm{Z}\otimes\bm{J}_{n} (3.6)

as we will use it quite often, and it holds

mat(𝑺𝒙)\displaystyle\operatorname{mat}(\bm{S}\bm{x}) =𝑻+𝑱n𝒁+𝑲+𝑯𝑱n,\displaystyle=\bm{T}+\bm{J}_{n}\bm{Z}+\bm{K}+\bm{H}\bm{J}_{n},
mat((𝑴𝑺)𝒙)\displaystyle\operatorname{mat}((\bm{M}-\bm{S})\bm{x}) =𝚫2+𝚫𝑨𝑨𝚫(𝑻+𝑱n𝒁+𝑲+𝑯𝑱n).\displaystyle=\bm{\Delta}^{2}+\bm{\Delta}\bm{A}-\bm{A}\bm{\Delta}-(\bm{T}+\bm{J}_{n}\bm{Z}+\bm{K}+\bm{H}\bm{J}_{n}).
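As a sanity check, the Kronecker identities above can be verified numerically; the following numpy sketch (ours) does so on a random instance, drawing 𝑻 and 𝒁 symmetric to match the choices made later in (3.14) and (3.17):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
sym = lambda X: (X + X.T) / 2
mat = lambda v: v.reshape((n, n), order="F")
I, J = np.eye(n), np.ones((n, n))
x = I.flatten(order="F")                                  # x = vec(I_n)

A, Delta = sym(rng.standard_normal((n, n))), sym(rng.standard_normal((n, n)))
C = A + Delta
M = np.linalg.matrix_power(np.kron(I, A) - np.kron(C, I), 2)
assert np.allclose(mat(M @ x), Delta @ Delta + Delta @ A - A @ Delta)

# T and Z are drawn symmetric here, matching the choices in (3.14)/(3.17).
T, Z = sym(rng.standard_normal((n, n))), sym(rng.standard_normal((n, n)))
K, H = rng.standard_normal((n, n)), rng.standard_normal((n, n))
assert np.allclose(mat(np.kron(T, I) @ x), T)
assert np.allclose(mat(np.kron(Z, J) @ x), J @ Z)
assert np.allclose(mat(np.kron(I, K) @ x), K)
assert np.allclose(mat(np.kron(J, H) @ x), H @ J)
```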

Therefore, (3.3), (3.4) and (3.5) imply

mat((𝑸+𝑩)𝒙)\displaystyle\operatorname{mat}((\bm{Q}+\bm{B})\bm{x}) =mat((𝑴𝑺)𝒙)\displaystyle=\operatorname{mat}((\bm{M}-\bm{S})\bm{x}) (3.7)
=mat((𝑴(𝑻𝑰n+𝑰n𝑲+𝑱n𝑯+𝒁𝑱n))𝒙)\displaystyle=\operatorname{mat}((\bm{M}-(\bm{T}\otimes\bm{I}_{n}+\bm{I}_{n}\otimes\bm{K}+\bm{J}_{n}\otimes\bm{H}+\bm{Z}\otimes\bm{J}_{n}))\bm{x})
=𝚫2+𝚫𝑨𝑨𝚫(𝑻+𝑱n𝒁+𝑲+𝑯𝑱n)=mat(𝑩𝒙)\displaystyle=\bm{\Delta}^{2}+\bm{\Delta}\bm{A}-\bm{A}\bm{\Delta}-(\bm{T}+\bm{J}_{n}\bm{Z}+\bm{K}+\bm{H}\bm{J}_{n})=\operatorname{mat}(\bm{B}\bm{x})

where we used 𝑸𝒙=0\bm{Q}\bm{x}=0 from (3.5). Therefore, the KKT conditions take a simplified form:

𝑸=𝑴(𝑩+𝑻𝑰n+𝒁𝑱n+𝑰n𝑲+𝑱n𝑯)0,\displaystyle\bm{Q}=\bm{M}-\left(\bm{B}+\bm{T}\otimes\bm{I}_{n}+\bm{Z}\otimes\bm{J}_{n}+\bm{I}_{n}\otimes\bm{K}+\bm{J}_{n}\otimes\bm{H}\right)\succeq 0, (3.8)
𝚫2+𝚫𝑨𝑨𝚫=mat(𝑩𝒙)+(𝑻+𝑱n𝒁+𝑲+𝑯𝑱n),\displaystyle\bm{\Delta}^{2}+\bm{\Delta}\bm{A}-\bm{A}\bm{\Delta}=\operatorname{mat}(\bm{B}\bm{x})+(\bm{T}+\bm{J}_{n}\bm{Z}+\bm{K}+\bm{H}\bm{J}_{n}),
𝑩,𝑿=0,𝑩0.\displaystyle\langle\bm{B},\bm{X}\rangle=0,~{}~{}\bm{B}\geq 0.

In the following section, we will construct an explicit dual certificate, i.e., 𝑻,𝒁,𝑲,𝑯\bm{T},\bm{Z},\bm{K},\bm{H}, and 𝑩0\bm{B}\geq 0 such that (3.8) holds. Moreover, we can also certify that 𝑿=𝒙𝒙\bm{X}=\bm{x}\bm{x}^{\top} is the unique global minimizer under mild conditions.

Theorem 3.1.

Suppose there exist 𝐁0\bm{B}\geq 0 with 𝐁𝐱𝐱=0\bm{B}\circ\bm{x}\bm{x}^{\top}=0, and 𝐓,𝐊,𝐙\bm{T},\bm{K},\bm{Z} and 𝐇\bm{H} such that 𝐐c(𝐈n2n1𝐱𝐱)\bm{Q}\succeq c\left(\bm{I}_{n^{2}}-n^{-1}\bm{x}\bm{x}^{\top}\right) for some c0c\geq 0. Then 𝐗=𝐱𝐱\bm{X}=\bm{x}\bm{x}^{\top} is a global minimizer. In addition, if c>0c>0, then 𝐗\bm{X} is the unique global minimizer.

Proof: .

Let 𝑿^\widehat{\bm{X}} be any feasible solution that is not equal to 𝒙𝒙\bm{x}\bm{x}^{\top}. Then

𝑸,𝑿^\displaystyle\langle\bm{Q},\widehat{\bm{X}}\rangle =𝑸,𝑿^𝒙𝒙=𝑴𝑺𝑩,𝑿^𝒙𝒙\displaystyle=\langle\bm{Q},\widehat{\bm{X}}-\bm{x}\bm{x}^{\top}\rangle=\langle\bm{M}-\bm{S}-\bm{B},\widehat{\bm{X}}-\bm{x}\bm{x}^{\top}\rangle
=𝑴𝑩,𝑿^𝒙𝒙\displaystyle=\langle\bm{M}-\bm{B},\widehat{\bm{X}}-\bm{x}\bm{x}^{\top}\rangle
=𝑴,𝑿^𝒙𝒙𝑩,𝑿^\displaystyle=\langle\bm{M},\widehat{\bm{X}}-\bm{x}\bm{x}^{\top}\rangle-\langle\bm{B},\widehat{\bm{X}}\rangle

where 𝑸=𝑴𝑺𝑩\bm{Q}=\bm{M}-\bm{S}-\bm{B} and 𝑺,𝑿^𝒙𝒙=0\langle\bm{S},\widehat{\bm{X}}-\bm{x}\bm{x}^{\top}\rangle=0 holds due to the primal feasibility of 𝑿^\widehat{\bm{X}} and 𝒙𝒙\bm{x}\bm{x}^{\top}. Now we have

𝑴,𝑿^𝒙𝒙\displaystyle\langle\bm{M},\widehat{\bm{X}}-\bm{x}\bm{x}^{\top}\rangle =𝑸+𝑩,𝑿^𝑸,𝑿^=𝑸,(𝑰n2𝒙𝒙n)𝑿^(𝑰n2𝒙𝒙n)0\displaystyle=\langle\bm{Q}+\bm{B},\widehat{\bm{X}}\rangle\geq\langle\bm{Q},\widehat{\bm{X}}\rangle=\left\langle\bm{Q},\left(\bm{I}_{n^{2}}-\frac{\bm{x}\bm{x}^{\top}}{n}\right)\widehat{\bm{X}}\left(\bm{I}_{n^{2}}-\frac{\bm{x}\bm{x}^{\top}}{n}\right)\right\rangle\geq 0

where (𝑰n2n1𝒙𝒙)𝑿^(𝑰n2n1𝒙𝒙)\left(\bm{I}_{n^{2}}-n^{-1}\bm{x}\bm{x}^{\top}\right)\widehat{\bm{X}}\left(\bm{I}_{n^{2}}-n^{-1}\bm{x}\bm{x}^{\top}\right) is nonzero whenever 𝑿^𝒙𝒙\widehat{\bm{X}}\neq\bm{x}\bm{x}^{\top}. In particular, if c>0c>0, then the last inequality is strict, implying the uniqueness of the global minimizer. ∎

3.2 Construction of a dual certificate

From the first equation in (3.8) and 𝑸𝒙=0\bm{Q}\bm{x}=0, it holds

𝑩𝒙=(𝑴𝑺)𝒙0\bm{B}\bm{x}=(\bm{M}-\bm{S})\bm{x}\geq 0

where 𝑩0\bm{B}\geq 0 and 𝑺=𝑻𝑰n+𝒁𝑱n+𝑰n𝑲+𝑱n𝑯.\bm{S}=\bm{T}\otimes\bm{I}_{n}+\bm{Z}\otimes\bm{J}_{n}+\bm{I}_{n}\otimes\bm{K}+\bm{J}_{n}\otimes\bm{H}. Note that 𝑩𝒙𝒙=0\bm{B}\circ\bm{x}\bm{x}^{\top}=0, and thus diag(𝒙)𝑩𝒙=0\operatorname{diag}(\bm{x})\bm{B}\bm{x}=0, i.e., the diagonal elements of mat(𝑩𝒙)\operatorname{mat}(\bm{B}\bm{x}) equal 0. Therefore, the second equation in (3.8) gives diag(𝚫2)=diag(𝑻+𝑲)+(𝒁+𝑯)𝟏n\operatorname{diag}(\bm{\Delta}^{2})=\operatorname{diag}(\bm{T}+\bm{K})+(\bm{Z}+\bm{H})\bm{1}_{n}.

With the discussion above, we first determine 𝑩\bm{B} by solving the linear equation 𝑩𝒙=(𝑴𝑺)𝒙\bm{B}\bm{x}=(\bm{M}-\bm{S})\bm{x}, and we will find that the construction of a dual certificate finally reduces to searching for proper 𝑻,𝑲,𝒁\bm{T},\bm{K},\bm{Z} and 𝑯\bm{H}.

Proposition 3.2 (Construction of B\bm{B} and Q\bm{Q}).

Suppose

diag(𝚫2)=diag(𝑻+𝑲)+(𝒁+𝑯)𝟏n\operatorname{diag}(\bm{\Delta}^{2})=\operatorname{diag}(\bm{T}+\bm{K})+(\bm{Z}+\bm{H})\bm{1}_{n} (3.9)

for some 𝐓,𝐊\bm{T},\bm{K}, 𝐙\bm{Z}, and 𝐇\bm{H}. Let

𝑩=1n(𝑴𝑺)𝒙𝒙+1n𝒙𝒙(𝑴𝑺),\bm{B}=\frac{1}{n}\left(\bm{M}-\bm{S}\right)\bm{x}\bm{x}^{\top}+\frac{1}{n}\bm{x}\bm{x}^{\top}\left(\bm{M}-\bm{S}\right), (3.10)

then 𝐁𝐱=(𝐌𝐒)𝐱\bm{B}\bm{x}=(\bm{M}-\bm{S})\bm{x} and 𝐁𝐱𝐱=0\bm{B}\circ\bm{x}\bm{x}^{\top}=0 hold automatically. Moreover, 𝐁0\bm{B}\geq 0 if

(𝑴𝑺)𝒙0𝚫2+𝚫𝑨𝑨𝚫(𝑻+𝑱n𝒁+𝑲+𝑯𝑱n)0.(\bm{M}-\bm{S})\bm{x}\geq 0\Longleftrightarrow\bm{\Delta}^{2}+\bm{\Delta}\bm{A}-\bm{A}\bm{\Delta}-(\bm{T}+\bm{J}_{n}\bm{Z}+\bm{K}+\bm{H}\bm{J}_{n})\geq 0. (3.11)

for some 𝐓,𝐙,𝐊\bm{T},\bm{Z},\bm{K} and 𝐇.\bm{H}. As a result, it holds

𝑸=𝑴𝑺𝑩=(𝑰n2𝒙𝒙n)(𝑴𝑺)(𝑰n2𝒙𝒙n)\bm{Q}=\bm{M}-\bm{S}-\bm{B}=\left(\bm{I}_{n^{2}}-\frac{\bm{x}\bm{x}^{\top}}{n}\right)\left(\bm{M}-\bm{S}\right)\left(\bm{I}_{n^{2}}-\frac{\bm{x}\bm{x}^{\top}}{n}\right) (3.12)

with the construction of 𝐁\bm{B} in (3.10).

Proof: .

Consider 𝑩\bm{B} in (3.10), it holds that

𝑩𝒙\displaystyle\bm{B}\bm{x} =1n(𝑴𝑺)𝒙𝒙𝒙+1n𝒙𝒙(𝑴𝑺)𝒙=(𝑴𝑺)𝒙\displaystyle=\frac{1}{n}\left(\bm{M}-\bm{S}\right)\bm{x}\bm{x}^{\top}\bm{x}+\frac{1}{n}\bm{x}\cdot\bm{x}^{\top}\left(\bm{M}-\bm{S}\right)\bm{x}=\left(\bm{M}-\bm{S}\right)\bm{x}

where

𝒙(𝑴𝑺)𝒙\displaystyle\bm{x}^{\top}(\bm{M}-\bm{S})\bm{x} =𝑰n,mat((𝑴𝑺)𝒙)\displaystyle=\langle\bm{I}_{n},\operatorname{mat}((\bm{M}-\bm{S})\bm{x})\rangle
=𝑰n,(𝚫2+𝚫𝑨𝑨𝚫mat(𝑺𝒙))\displaystyle=\langle\bm{I}_{n},(\bm{\Delta}^{2}+\bm{\Delta}\bm{A}-\bm{A}\bm{\Delta}-\operatorname{mat}(\bm{S}\bm{x}))\rangle
=Tr(diag(𝚫2)(diag(𝑻+𝑲)+(𝑯+𝒁)𝟏n))=0\displaystyle=\operatorname{Tr}\left(\operatorname{diag}(\bm{\Delta}^{2})-(\operatorname{diag}(\bm{T}+\bm{K})+(\bm{H}+\bm{Z})\bm{1}_{n})\right)=0

and the expression of (𝑴𝑺)𝒙(\bm{M}-\bm{S})\bm{x} follows from (3.7) and (3.9).

We proceed to verify that 𝑩𝒙𝒙=0\bm{B}\circ\bm{x}\bm{x}^{\top}=0:

mat(diag(𝒙)𝑩𝒙)\displaystyle\operatorname{mat}(\operatorname{diag}(\bm{x})\bm{B}\bm{x}) =mat(diag(𝒙)(𝑴𝑺)𝒙)\displaystyle=\operatorname{mat}(\operatorname{diag}(\bm{x})(\bm{M}-\bm{S})\bm{x})
=𝑰nmat((𝑴𝑺)𝒙)\displaystyle=\bm{I}_{n}\circ\operatorname{mat}((\bm{M}-\bm{S})\bm{x})
=𝑰n(𝚫2+𝚫𝑨𝑨𝚫(𝑻+𝑱n𝒁+𝑲+𝑯𝑱n))\displaystyle=\bm{I}_{n}\circ(\bm{\Delta}^{2}+\bm{\Delta}\bm{A}-\bm{A}\bm{\Delta}-(\bm{T}+\bm{J}_{n}\bm{Z}+\bm{K}+\bm{H}\bm{J}_{n}))
=diag(𝚫2)(diag(𝑻+𝑲)+(𝒁+𝑯)𝟏n)=0\displaystyle=\operatorname{diag}(\bm{\Delta}^{2})-(\operatorname{diag}(\bm{T}+\bm{K})+(\bm{Z}+\bm{H})\bm{1}_{n})=0

follows from (3.9). This implies that the diagonal entries of mat(𝑩𝒙)\operatorname{mat}(\bm{B}\bm{x}) are zero, i.e., the supports of 𝑩𝒙\bm{B}\bm{x} and 𝒙\bm{x} are disjoint. Now using 𝑩\bm{B} in (3.10) leads to

𝑩𝒙𝒙\displaystyle\bm{B}\circ\bm{x}\bm{x}^{\top} =diag(𝒙)𝑩diag(𝒙)=1ndiag(𝒙)(𝑴𝑺)𝒙𝒙+1n𝒙𝒙(𝑴𝑺)diag(𝒙)\displaystyle=\operatorname{diag}(\bm{x})\bm{B}\operatorname{diag}(\bm{x})=\frac{1}{n}\operatorname{diag}(\bm{x})\left(\bm{M}-\bm{S}\right)\bm{x}\bm{x}^{\top}+\frac{1}{n}\bm{x}\bm{x}^{\top}\left(\bm{M}-\bm{S}\right)\operatorname{diag}(\bm{x})
=1ndiag(𝒙)𝑩𝒙𝒙+1n𝒙𝒙𝑩diag(𝒙)=0\displaystyle=\frac{1}{n}\operatorname{diag}(\bm{x})\bm{B}\bm{x}\bm{x}^{\top}+\frac{1}{n}\bm{x}\bm{x}^{\top}\bm{B}\operatorname{diag}(\bm{x})=0

where diag(𝒙)𝒙=𝒙\operatorname{diag}(\bm{x})\bm{x}=\bm{x} and diag(𝒙)𝑩𝒙=0.\operatorname{diag}(\bm{x})\bm{B}\bm{x}=0.

Finally, to ensure 𝑩0\bm{B}\geq 0, it suffices to have

(𝑴𝑺)𝒙=𝚫2+𝚫𝑨𝑨𝚫(𝑻+𝑱n𝒁+𝑲+𝑯𝑱n)0(\bm{M}-\bm{S})\bm{x}=\bm{\Delta}^{2}+\bm{\Delta}\bm{A}-\bm{A}\bm{\Delta}-(\bm{T}+\bm{J}_{n}\bm{Z}+\bm{K}+\bm{H}\bm{J}_{n})\geq 0

due to the construction of 𝑩\bm{B} in (3.10).

Then it holds that

𝑸\displaystyle\bm{Q} =𝑴𝑺𝑩\displaystyle=\bm{M}-\bm{S}-\bm{B}
=𝑴𝑺1n(𝑴𝑺)𝒙𝒙1n𝒙𝒙(𝑴𝑺)(𝑰n2𝒙𝒙n)\displaystyle=\bm{M}-\bm{S}-\frac{1}{n}\left(\bm{M}-\bm{S}\right)\bm{x}\bm{x}^{\top}-\frac{1}{n}\bm{x}\bm{x}^{\top}\left(\bm{M}-\bm{S}\right)\left(\bm{I}_{n^{2}}-\frac{\bm{x}\bm{x}^{\top}}{n}\right)
=(𝑰n2𝒙𝒙n)(𝑴𝑺)(𝑰n2𝒙𝒙n)\displaystyle=\left(\bm{I}_{n^{2}}-\frac{\bm{x}\bm{x}^{\top}}{n}\right)\left(\bm{M}-\bm{S}\right)\left(\bm{I}_{n^{2}}-\frac{\bm{x}\bm{x}^{\top}}{n}\right)

where 𝒙(𝑴𝑺)𝒙=0.\bm{x}^{\top}(\bm{M}-\bm{S})\bm{x}=0.

Finally, we summarize our findings: by choosing 𝑩\bm{B} in the form of (3.10), it suffices to find 𝑻\bm{T}, 𝑲\bm{K}, 𝒁\bm{Z} and 𝑯\bm{H} such that

diag(𝑻+𝑲)+(𝒁+𝑯)𝟏n=diag(𝚫2),\displaystyle\operatorname{diag}(\bm{T}+\bm{K})+(\bm{Z}+\bm{H})\bm{1}_{n}=\operatorname{diag}(\bm{\Delta}^{2}), (3.13)
𝚫2+𝚫𝑨𝑨𝚫(𝑻+𝑱n𝒁+𝑲+𝑯𝑱n)0,\displaystyle\bm{\Delta}^{2}+\bm{\Delta}\bm{A}-\bm{A}\bm{\Delta}-(\bm{T}+\bm{J}_{n}\bm{Z}+\bm{K}+\bm{H}\bm{J}_{n})\geq 0,
(𝑰n2𝒙𝒙n)(𝑴(𝑻𝑰n+𝒁𝑱n+𝑰n𝑲+𝑱n𝑯))(𝑰n2𝒙𝒙n)0.\displaystyle\left(\bm{I}_{n^{2}}-\frac{\bm{x}\bm{x}^{\top}}{n}\right)\left(\bm{M}-(\bm{T}\otimes\bm{I}_{n}+\bm{Z}\otimes\bm{J}_{n}+\bm{I}_{n}\otimes\bm{K}+\bm{J}_{n}\otimes\bm{H})\right)\left(\bm{I}_{n^{2}}-\frac{\bm{x}\bm{x}^{\top}}{n}\right)\succeq 0.

where the first two constraints ensure 𝑩𝒙𝒙=0\bm{B}\circ\bm{x}\bm{x}^{\top}=0 and 𝑩0\bm{B}\geq 0 respectively, and the third one corresponds to 𝑸0.\bm{Q}\succeq 0.
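The construction can be checked numerically. The following numpy sketch (ours) builds 𝑩 via (3.10) with the dual variables chosen as in (3.17) below, and verifies 𝑩𝒙=(𝑴−𝑺)𝒙, 𝑩∘𝒙𝒙⊤=0, 𝑸𝒙=0, and 𝑩≥0 for a small noise level:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
sym = lambda X: (X + X.T) / 2
ddiag = lambda X: np.diag(np.diag(X))
I, J = np.eye(n), np.ones((n, n))
x = I.flatten(order="F")                       # x = vec(I_n)

A = sym(rng.standard_normal((n, n)))
Delta = 1e-3 * sym(rng.standard_normal((n, n)))
C = A + Delta
M = np.linalg.matrix_power(np.kron(I, A) - np.kron(C, I), 2)

AD = A @ Delta
c = 2 * np.max(AD - ddiag(AD))                 # handles the off-diagonal sign condition
t = 100.0
T = -t * J + Delta @ Delta + Delta @ A + AD - 2 * ddiag(AD) - c * (J - I)
K, H, Z = -t * J, t * J / n, t * J / n
S = np.kron(T, I) + np.kron(I, K) + np.kron(J, H) + np.kron(Z, J)

B = (np.outer((M - S) @ x, x) + np.outer(x, (M - S) @ x)) / n       # (3.10)
Q = M - S - B                                                        # (3.12)
assert np.allclose(B @ x, (M - S) @ x)         # B x = (M - S) x
assert np.allclose(B * np.outer(x, x), 0)      # B o (x x^T) = 0
assert np.allclose(Q @ x, 0)                   # Q x = 0
assert B.min() >= -1e-8                        # B >= 0 entrywise
```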

Now we proceed to prove Theorem 2.1. The proof relies on the following theorem, which handles the noiseless case. In the noise-free case, i.e., 𝑪=𝑨\bm{C}=\bm{A} with 𝚫=0\bm{\Delta}=0, we have

𝑴\displaystyle\bm{M} =(𝑰n𝑨𝑨𝑰n)2=i=1nj=1n(λiλj)2𝒖i𝒖i𝒖j𝒖j\displaystyle=(\bm{I}_{n}\otimes\bm{A}-\bm{A}\otimes\bm{I}_{n})^{2}=\sum_{i=1}^{n}\sum_{j=1}^{n}(\lambda_{i}-\lambda_{j})^{2}\bm{u}_{i}\bm{u}_{i}^{\top}\otimes\bm{u}_{j}\bm{u}_{j}^{\top}

where (λi,𝒖i)(\lambda_{i},\bm{u}_{i}) is the ii-th eigenvalue-eigenvector pair of 𝑨.\bm{A}. The smallest eigenvalue of 𝑴\bm{M} is 0, with corresponding eigenvectors {𝒖i𝒖i,1in}.\{\bm{u}_{i}\otimes\bm{u}_{i},1\leq i\leq n\}.

Theorem 3.3 (Noise-free version of Theorem 2.1).

Suppose |𝒖i,𝟏n|2>0|\langle\bm{u}_{i},\bm{1}_{n}\rangle|^{2}>0 for all 1in1\leq i\leq n and the eigenvalues of 𝐀\bm{A} are distinct. By letting

𝑻=t𝑱n,𝑲=t𝑱n,𝑯=t𝑱n/n,𝒁=t𝑱n/n,\bm{T}=-t\bm{J}_{n},~{}\bm{K}=-t\bm{J}_{n},~{}\bm{H}=t\bm{J}_{n}/n,~{}\bm{Z}=t\bm{J}_{n}/n, (3.14)

the second smallest eigenvalue of

𝑸=𝑴𝑺=i=1nj=1n(λiλj)2𝒖i𝒖i𝒖j𝒖j+t𝑱n(𝑰n𝑱n/n)+t(𝑰n𝑱n/n)𝑱n\bm{Q}=\bm{M}-\bm{S}=\sum_{i=1}^{n}\sum_{j=1}^{n}(\lambda_{i}-\lambda_{j})^{2}\bm{u}_{i}\bm{u}_{i}^{\top}\otimes\bm{u}_{j}\bm{u}_{j}^{\top}+t\bm{J}_{n}\otimes(\bm{I}_{n}-\bm{J}_{n}/n)+t(\bm{I}_{n}-\bm{J}_{n}/n)\otimes\bm{J}_{n}

is bounded below by

λ2(𝑸)2nminij(λiλj)2min1in|𝒖i,𝟏n|2\lambda_{2}(\bm{Q})\geq\frac{2}{n}\min_{i\neq j}(\lambda_{i}-\lambda_{j})^{2}\cdot\min_{1\leq i\leq n}|\langle\bm{u}_{i},\bm{1}_{n}\rangle|^{2}

for sufficiently large t>0t>0, and 𝐐𝐱=0.\bm{Q}\bm{x}=0.

Proof: .

Under (3.14), it holds

𝑺\displaystyle\bm{S} =t(𝑱n(𝑰n𝑱n/n)+(𝑰n𝑱n/n)𝑱n),\displaystyle=-t\left(\bm{J}_{n}\otimes(\bm{I}_{n}-\bm{J}_{n}/n)+(\bm{I}_{n}-\bm{J}_{n}/n)\otimes\bm{J}_{n}\right),
𝑸\displaystyle\bm{Q} =𝑴𝑺=𝑴+t𝑱n(𝑰n𝑱n/n)+t(𝑰n𝑱n/n)𝑱n.\displaystyle=\bm{M}-\bm{S}=\bm{M}+t\bm{J}_{n}\otimes(\bm{I}_{n}-\bm{J}_{n}/n)+t(\bm{I}_{n}-\bm{J}_{n}/n)\otimes\bm{J}_{n}.

It is easy to verify that 𝑴𝒙=𝑺𝒙=0\bm{M}\bm{x}=\bm{S}\bm{x}=0, and thus 𝑸𝒙=0\bm{Q}\bm{x}=0 and 𝑩=0.\bm{B}=0.

Next, we will show λ2(𝑴𝑺)>0\lambda_{2}(\bm{M}-\bm{S})>0 under the assumptions of this theorem. Before proceeding, we introduce a few notations. Let 𝑼=[𝒖1,,𝒖n]\bm{U}=[\bm{u}_{1},\cdots,\bm{u}_{n}] consist of the nn eigenvectors of 𝑨\bm{A} with corresponding eigenvalues 𝚲=diag(λ1,,λn)\bm{\Lambda}=\operatorname{diag}(\lambda_{1},\cdots,\lambda_{n}), i.e., 𝑨=𝑼𝚲𝑼.\bm{A}=\bm{U}\bm{\Lambda}\bm{U}^{\top}. Let

𝚽=[𝒖1𝒖1,,𝒖n𝒖n]n2×n\bm{\Phi}=[\bm{u}_{1}\otimes\bm{u}_{1},\cdots,\bm{u}_{n}\otimes\bm{u}_{n}]\in\hbox{\msbm{R}}^{n^{2}\times n}

be the eigenvectors of 𝑴\bm{M} w.r.t. eigenvalue 0. Then it holds that

𝚽𝚽𝒙=i=1n(𝒖i𝒖i𝒖i𝒖i)𝒙=𝒙,𝚽𝒙=𝟏n.\bm{\Phi}\bm{\Phi}^{\top}\bm{x}=\sum_{i=1}^{n}(\bm{u}_{i}\bm{u}_{i}^{\top}\otimes\bm{u}_{i}\bm{u}_{i}^{\top})\bm{x}=\bm{x},~{}~{}~{}~{}\bm{\Phi}^{\top}\bm{x}=\bm{1}_{n}. (3.15)

Let 𝑷=𝑱n/n\bm{P}=\bm{J}_{n}/n, and

𝑷¯=(𝑷𝑰n𝑰n𝑷)2=1n(𝑱n(𝑰n𝑱nn)+(𝑰n𝑱nn)𝑱n),𝚷=𝑰n2𝒙𝒙n,\bar{\bm{P}}=(\bm{P}\otimes\bm{I}_{n}-\bm{I}_{n}\otimes\bm{P})^{2}=\frac{1}{n}\left(\bm{J}_{n}\otimes\left(\bm{I}_{n}-\frac{\bm{J}_{n}}{n}\right)+\left(\bm{I}_{n}-\frac{\bm{J}_{n}}{n}\right)\otimes\bm{J}_{n}\right),\qquad\bm{\Pi}=\bm{I}_{n^{2}}-\frac{\bm{x}\bm{x}^{\top}}{n},

where 𝑷¯\bar{\bm{P}}, 𝑷\bm{P}, 𝚷,\bm{\Pi}, and 𝚷𝑷¯\bm{\Pi}-\bar{\bm{P}} are all projection matrices satisfying 𝚷𝑷¯=𝑷¯\bm{\Pi}\bar{\bm{P}}=\bar{\bm{P}} and 𝑷¯𝒙=0.\bar{\bm{P}}\bm{x}=0.

Our goal is to show that for some sufficiently large t>0t>0, it holds

𝑴𝑺\displaystyle\bm{M}-\bm{S} =𝑴+t𝑱n(𝑰n𝑱n/n)+t(𝑰n𝑱n/n)𝑱n=𝑴+nt𝑷¯c𝚷\displaystyle=\bm{M}+t\bm{J}_{n}\otimes(\bm{I}_{n}-\bm{J}_{n}/n)+t(\bm{I}_{n}-\bm{J}_{n}/n)\otimes\bm{J}_{n}=\bm{M}+nt\bar{\bm{P}}\succeq c\bm{\Pi}

for some c>0c>0. Lemma 4.2 implies that it suffices to prove that

(𝚷𝑷¯)𝑴(𝚷𝑷¯)c(𝚷𝑷¯).(\bm{\Pi}-\bar{\bm{P}})\bm{M}(\bm{\Pi}-\bar{\bm{P}})\succ c(\bm{\Pi}-\bar{\bm{P}}).

Note that

𝑴minij(λiλj)2(𝑰n2𝚽𝚽)\bm{M}\succeq\min_{i\neq j}(\lambda_{i}-\lambda_{j})^{2}\cdot(\bm{I}_{n^{2}}-\bm{\Phi}\bm{\Phi}^{\top})

and thus it remains to show that (𝚷𝑷¯)(𝑰n2𝚽𝚽)(𝚷𝑷¯)c(𝚷𝑷¯)(\bm{\Pi}-\bar{\bm{P}})(\bm{I}_{n^{2}}-\bm{\Phi}\bm{\Phi}^{\top})(\bm{\Pi}-\bar{\bm{P}})\succeq c^{\prime}(\bm{\Pi}-\bar{\bm{P}}) for some c>0c^{\prime}>0. Since

(𝚷𝑷¯)(𝑰n2𝚽𝚽)(𝚷𝑷¯)=(𝚷𝑷¯)(𝚷𝑷¯)𝚽𝚽(𝚷𝑷¯),\displaystyle(\bm{\Pi}-\bar{\bm{P}})(\bm{I}_{n^{2}}-\bm{\Phi}\bm{\Phi}^{\top})(\bm{\Pi}-\bar{\bm{P}})=(\bm{\Pi}-\bar{\bm{P}})-(\bm{\Pi}-\bar{\bm{P}})\bm{\Phi}\bm{\Phi}^{\top}(\bm{\Pi}-\bar{\bm{P}}),

it remains to show that λmax((𝚷𝑷¯)𝚽𝚽(𝚷𝑷¯))1c\lambda_{\max}((\bm{\Pi}-\bar{\bm{P}})\bm{\Phi}\bm{\Phi}^{\top}(\bm{\Pi}-\bar{\bm{P}}))\leq 1-c^{\prime} and equivalently

λmax(𝚽(𝚷𝑷¯)𝚽)1cλmin(𝚽(𝒙𝒙n+𝑷¯)𝚽)c\lambda_{\max}(\bm{\Phi}^{\top}(\bm{\Pi}-\bar{\bm{P}})\bm{\Phi})\leq 1-c^{\prime}\Longleftrightarrow\lambda_{\min}\left(\bm{\Phi}^{\top}\left(\frac{\bm{x}\bm{x}^{\top}}{n}+\bar{\bm{P}}\right)\bm{\Phi}\right)\geq c^{\prime}

which follows from

𝚽(𝚷𝑷¯)𝚽=𝑰n𝚽(𝒙𝒙n+𝑷¯)𝚽.\bm{\Phi}^{\top}(\bm{\Pi}-\bar{\bm{P}})\bm{\Phi}=\bm{I}_{n}-\bm{\Phi}^{\top}\left(\frac{\bm{x}\bm{x}^{\top}}{n}+\bar{\bm{P}}\right)\bm{\Phi}.

Since 𝑷¯=n2(𝑱n𝑰n𝑰n𝑱n)2\bar{\bm{P}}=n^{-2}(\bm{J}_{n}\otimes\bm{I}_{n}-\bm{I}_{n}\otimes\bm{J}_{n})^{2}, we first compute

(𝑱n𝑰n𝑰n𝑱n)𝚽\displaystyle(\bm{J}_{n}\otimes\bm{I}_{n}-\bm{I}_{n}\otimes\bm{J}_{n})\bm{\Phi} =(𝑱n𝑰n𝑰n𝑱n)[𝒖1𝒖1,,𝒖n𝒖n]\displaystyle=(\bm{J}_{n}\otimes\bm{I}_{n}-\bm{I}_{n}\otimes\bm{J}_{n})[\bm{u}_{1}\otimes\bm{u}_{1},\cdots,\bm{u}_{n}\otimes\bm{u}_{n}]
=([𝟏n𝒖1,,𝟏n𝒖n][𝒖1𝟏n,,𝒖n𝟏n])diag(𝑼𝟏n)\displaystyle=\left([\bm{1}_{n}\otimes\bm{u}_{1},\cdots,\bm{1}_{n}\otimes\bm{u}_{n}]-[\bm{u}_{1}\otimes\bm{1}_{n},\cdots,\bm{u}_{n}\otimes\bm{1}_{n}]\right)\operatorname{diag}(\bm{U}^{\top}\bm{1}_{n})
=(𝟏n𝑼𝑼𝟏n)diag(𝑼𝟏n)\displaystyle=(\bm{1}_{n}\otimes\bm{U}-\bm{U}\otimes\bm{1}_{n})\operatorname{diag}(\bm{U}^{\top}\bm{1}_{n})

and then

𝚽𝑷¯𝚽\displaystyle\bm{\Phi}^{\top}\bar{\bm{P}}\bm{\Phi} =1n2diag(𝑼𝟏n)(𝟏n𝑼𝑼𝟏n)(𝟏n𝑼𝑼𝟏n)diag(𝑼𝟏n)\displaystyle=\frac{1}{n^{2}}\operatorname{diag}(\bm{U}^{\top}\bm{1}_{n})(\bm{1}_{n}^{\top}\otimes\bm{U}^{\top}-\bm{U}^{\top}\otimes\bm{1}_{n}^{\top})(\bm{1}_{n}\otimes\bm{U}-\bm{U}\otimes\bm{1}_{n})\operatorname{diag}(\bm{U}^{\top}\bm{1}_{n})
=1n2diag(𝑼𝟏n)(2n𝑰n𝑼𝟏n𝟏n𝑼𝟏n𝑼𝑼𝟏n)diag(𝑼𝟏n)\displaystyle=\frac{1}{n^{2}}\operatorname{diag}(\bm{U}^{\top}\bm{1}_{n})(2n\bm{I}_{n}-\bm{U}^{\top}\bm{1}_{n}\otimes\bm{1}_{n}^{\top}\bm{U}-\bm{1}_{n}^{\top}\bm{U}\otimes\bm{U}^{\top}\bm{1}_{n})\operatorname{diag}(\bm{U}^{\top}\bm{1}_{n})
=2ndiag(𝑼𝟏n)(𝑰n𝑼𝑷𝑼)diag(𝑼𝟏n).\displaystyle=\frac{2}{n}\operatorname{diag}(\bm{U}^{\top}\bm{1}_{n})(\bm{I}_{n}-\bm{U}^{\top}\bm{P}\bm{U})\operatorname{diag}(\bm{U}^{\top}\bm{1}_{n}).

where 𝑼𝟏n𝟏n𝑼=𝑼𝑱n𝑼.\bm{U}^{\top}\bm{1}_{n}\otimes\bm{1}_{n}^{\top}\bm{U}=\bm{U}^{\top}\bm{J}_{n}\bm{U}.

Using (3.15), we have

𝚽(𝑷¯+n1𝒙𝒙)𝚽\displaystyle\bm{\Phi}^{\top}(\bar{\bm{P}}+n^{-1}\bm{x}\bm{x}^{\top})\bm{\Phi} =2ndiag(𝑼𝟏n)(𝑰n𝑼𝑷𝑼)diag(𝑼𝟏n)+𝑱nn\displaystyle=\frac{2}{n}\operatorname{diag}(\bm{U}^{\top}\bm{1}_{n})(\bm{I}_{n}-\bm{U}^{\top}\bm{P}\bm{U})\operatorname{diag}(\bm{U}^{\top}\bm{1}_{n})+\frac{\bm{J}_{n}}{n}

where (𝑰n𝑼𝑷𝑼)diag(𝑼𝟏n)𝟏n=(𝑰n𝑼𝑷𝑼)𝑼𝟏n=0(\bm{I}_{n}-\bm{U}^{\top}\bm{P}\bm{U})\operatorname{diag}(\bm{U}^{\top}\bm{1}_{n})\bm{1}_{n}=(\bm{I}_{n}-\bm{U}^{\top}\bm{P}\bm{U})\bm{U}^{\top}\bm{1}_{n}=0. Therefore, we only need to control the second smallest eigenvalue of the first term.

Since 𝑼𝑷𝑼\bm{U}^{\top}\bm{P}\bm{U} is rank-1 and Tr(𝑼𝑷𝑼)=1\operatorname{Tr}(\bm{U}^{\top}\bm{P}\bm{U})=1, we have 𝑰n𝑼𝑷𝑼\bm{I}_{n}-\bm{U}^{\top}\bm{P}\bm{U} is a projection matrix. The second smallest eigenvalue of diag(𝑼𝟏n)(𝑰n𝑼𝑷𝑼)diag(𝑼𝟏n)\operatorname{diag}(\bm{U}^{\top}\bm{1}_{n})(\bm{I}_{n}-\bm{U}^{\top}\bm{P}\bm{U})\operatorname{diag}(\bm{U}^{\top}\bm{1}_{n}) is lower bounded by min|𝒖i,𝟏n|2.\min|\langle\bm{u}_{i},\bm{1}_{n}\rangle|^{2}. Therefore, we have

λmin(𝚽(𝑷¯+n1𝒙𝒙)𝚽)min{2|𝒖i,𝟏n|2n,1}2min|𝒖i,𝟏n|2n=:c.\lambda_{\min}(\bm{\Phi}^{\top}(\bar{\bm{P}}+n^{-1}\bm{x}\bm{x}^{\top})\bm{\Phi})\geq\min\left\{\frac{2|\langle\bm{u}_{i},\bm{1}_{n}\rangle|^{2}}{n},1\right\}\geq\frac{2\min|\langle\bm{u}_{i},\bm{1}_{n}\rangle|^{2}}{n}=:c^{\prime}. (3.16)

where we used mini|𝒖i,𝟏n|21\min_{i}|\langle\bm{u}_{i},\bm{1}_{n}\rangle|^{2}\leq 1. It follows that (𝚷𝑷¯)(𝑰n2𝚽𝚽)(𝚷𝑷¯)c(𝚷𝑷¯).(\bm{\Pi}-\bar{\bm{P}})(\bm{I}_{n^{2}}-\bm{\Phi}\bm{\Phi}^{\top})(\bm{\Pi}-\bar{\bm{P}})\succeq c^{\prime}(\bm{\Pi}-\bar{\bm{P}}). Finally, we have

(𝚷𝑷¯)𝑴(𝚷𝑷¯)\displaystyle(\bm{\Pi}-\bar{\bm{P}})\bm{M}(\bm{\Pi}-\bar{\bm{P}}) minij(λiλj)2(𝚷𝑷¯)(𝑰n2𝚽𝚽)(𝚷𝑷¯)\displaystyle\succeq\min_{i\neq j}(\lambda_{i}-\lambda_{j})^{2}(\bm{\Pi}-\bar{\bm{P}})(\bm{I}_{n^{2}}-\bm{\Phi}\bm{\Phi}^{\top})(\bm{\Pi}-\bar{\bm{P}})
cminij(λiλj)2(𝚷𝑷¯).\displaystyle\succeq c^{\prime}\min_{i\neq j}(\lambda_{i}-\lambda_{j})^{2}(\bm{\Pi}-\bar{\bm{P}}).

Lemma 4.2 implies that

𝑴+nt𝑷¯(2nmin|𝒖i,𝟏n|2minij(λiλj)2ϵ)𝚷\bm{M}+nt\bar{\bm{P}}\succeq\left(\frac{2}{n}\min|\langle\bm{u}_{i},\bm{1}_{n}\rangle|^{2}\cdot\min_{i\neq j}(\lambda_{i}-\lambda_{j})^{2}-\epsilon\right)\bm{\Pi}

for sufficiently large tt, where ϵ>0\epsilon>0 can be made arbitrarily small by taking tt large. ∎
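Theorem 3.3 can also be checked numerically; the following numpy sketch (ours) compares λ₂(𝑸) with the claimed lower bound for a random symmetric 𝑨 and a large t, under the assumption that t is large enough for the ε term to be negligible:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
A = (lambda X: (X + X.T) / 2)(rng.standard_normal((n, n)))
lam, U = np.linalg.eigh(A)
I, J = np.eye(n), np.ones((n, n))

M = np.linalg.matrix_power(np.kron(I, A) - np.kron(A, I), 2)   # noise-free M
t = 1e6                                                        # "sufficiently large" t
Q = M + t * np.kron(J, I - J / n) + t * np.kron(I - J / n, J)

gaps = np.abs(lam[:, None] - lam[None, :])[~np.eye(n, dtype=bool)]
bound = (2 / n) * np.min(gaps) ** 2 * np.min((U.T @ np.ones(n)) ** 2)
eigs = np.sort(np.linalg.eigvalsh(Q))
assert eigs[0] > -1e-6                       # Q x = 0: the smallest eigenvalue is 0
assert eigs[1] >= bound - 1e-6               # lambda_2(Q) >= the claimed bound
```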

Proof of Theorem 2.1.

Theorem 3.1 indicates that, since 𝑸𝒙=0\bm{Q}\bm{x}=0, it suffices that the second smallest eigenvalue of 𝑸\bm{Q} is nonnegative for the exactness to hold, i.e., for 𝑿=𝒙𝒙\bm{X}=\bm{x}\bm{x}^{\top} to be a global minimizer to (2.5).

In the presence of noise, 𝑴\bm{M} equals

𝑴\displaystyle\bm{M} =(𝑰n𝑨𝑨𝑰n𝚫𝑰n)2\displaystyle=(\bm{I}_{n}\otimes\bm{A}-\bm{A}\otimes\bm{I}_{n}-\bm{\Delta}\otimes\bm{I}_{n})^{2}
=(𝑰n𝑨𝑨𝑰n)22𝚫𝑨+(𝚫𝑨+𝑨𝚫+𝚫2)𝑰n.\displaystyle=(\bm{I}_{n}\otimes\bm{A}-\bm{A}\otimes\bm{I}_{n})^{2}-2\bm{\Delta}\otimes\bm{A}+(\bm{\Delta}\bm{A}+\bm{A}\bm{\Delta}+\bm{\Delta}^{2})\otimes\bm{I}_{n}.

Let

𝑻\displaystyle\bm{T} =t𝑱n+𝚫2+𝚫𝑨+𝑨𝚫2ddiag(𝑨𝚫)c(𝑱n𝑰n),\displaystyle=-t\bm{J}_{n}+\bm{\Delta}^{2}+\bm{\Delta}\bm{A}+\bm{A}\bm{\Delta}-2\operatorname{ddiag}(\bm{A}\bm{\Delta})-c(\bm{J}_{n}-\bm{I}_{n}), (3.17)
𝑲\displaystyle\bm{K} =t𝑱n,𝑯=t𝑱n/n,𝒁=t𝑱n/n,\displaystyle=-t\bm{J}_{n},~{}~{}\bm{H}=t\bm{J}_{n}/n,~{}~{}\bm{Z}=t\bm{J}_{n}/n,

for some cc and t>0t>0. Now we verify (3.9) in Proposition 3.2:

diag(𝑻+𝑲)+(𝒁+𝑯)𝟏n\displaystyle\operatorname{diag}(\bm{T}+\bm{K})+(\bm{Z}+\bm{H})\bm{1}_{n} =t𝟏n+diag(𝚫2)t𝟏n+2t𝟏n=diag(𝚫2)\displaystyle=-t\bm{1}_{n}+\operatorname{diag}(\bm{\Delta}^{2})-t\bm{1}_{n}+2t\bm{1}_{n}=\operatorname{diag}(\bm{\Delta}^{2})

where diag(𝑻)=t𝟏n+diag(𝚫2).\operatorname{diag}(\bm{T})=-t\bm{1}_{n}+\operatorname{diag}(\bm{\Delta}^{2}). Therefore, we can choose 𝑩\bm{B} in the form of (3.10). To ensure the KKT conditions (3.13), it suffices to have (3.11) hold so that 𝑩0\bm{B}\geq 0 and 𝑸\bm{Q} in (3.12) is positive semidefinite. For the first requirement, we have

𝑻+𝑱n𝒁+𝑲+𝑯𝑱n\displaystyle\bm{T}+\bm{J}_{n}\bm{Z}+\bm{K}+\bm{H}\bm{J}_{n} =c(𝑱n𝑰n)+𝚫2+𝚫𝑨+𝑨𝚫2ddiag(𝑨𝚫)\displaystyle=-c(\bm{J}_{n}-\bm{I}_{n})+\bm{\Delta}^{2}+\bm{\Delta}\bm{A}+\bm{A}\bm{\Delta}-2\operatorname{ddiag}(\bm{A}\bm{\Delta})

and thus 𝑩0\bm{B}\geq 0 is guaranteed if

𝚫2+𝚫𝑨𝑨𝚫(𝑻+𝑱n𝒁+𝑲+𝑯𝑱n)=c(𝑱n𝑰n)+2ddiag(𝑨𝚫)2𝑨𝚫0\bm{\Delta}^{2}+\bm{\Delta}\bm{A}-\bm{A}\bm{\Delta}-(\bm{T}+\bm{J}_{n}\bm{Z}+\bm{K}+\bm{H}\bm{J}_{n})=c(\bm{J}_{n}-\bm{I}_{n})+2\operatorname{ddiag}(\bm{A}\bm{\Delta})-2\bm{A}\bm{\Delta}\geq 0

provided that c2maxij𝑨i,𝚫jc\geq 2\max_{i\neq j}\langle\bm{A}_{i},\bm{\Delta}_{j}\rangle, since the (i,j)(i,j) entry of 2ddiag(𝑨𝚫)2𝑨𝚫2\operatorname{ddiag}(\bm{A}\bm{\Delta})-2\bm{A}\bm{\Delta} equals 2𝑨i,𝚫j-2\langle\bm{A}_{i},\bm{\Delta}_{j}\rangle for iji\neq j.

For 𝑸0\bm{Q}\succeq 0, we note

𝑺\displaystyle\bm{S} =(𝚫2+𝚫𝑨+𝑨𝚫2ddiag(𝑨𝚫)c(𝑱n𝑰n))𝑰nnt𝑷¯\displaystyle=(\bm{\Delta}^{2}+\bm{\Delta}\bm{A}+\bm{A}\bm{\Delta}-2\operatorname{ddiag}(\bm{A}\bm{\Delta})-c(\bm{J}_{n}-\bm{I}_{n}))\otimes\bm{I}_{n}-nt\bar{\bm{P}}

where 𝑷¯:=n2(𝑱n𝑰n𝑰n𝑱n)2.\bar{\bm{P}}:=n^{-2}(\bm{J}_{n}\otimes\bm{I}_{n}-\bm{I}_{n}\otimes\bm{J}_{n})^{2}. Then

𝑴𝑺\displaystyle\bm{M}-\bm{S} =(𝑰n𝑨𝑨𝑰n)22𝚫𝑨+(𝚫𝑨+𝑨𝚫+𝚫2)𝑰n\displaystyle=(\bm{I}_{n}\otimes\bm{A}-\bm{A}\otimes\bm{I}_{n})^{2}-2\bm{\Delta}\otimes\bm{A}+(\bm{\Delta}\bm{A}+\bm{A}\bm{\Delta}+\bm{\Delta}^{2})\otimes\bm{I}_{n}
(𝚫2+𝚫𝑨+𝑨𝚫2ddiag(𝑨𝚫)c(𝑱n𝑰n))𝑰n+nt𝑷¯\displaystyle\qquad-(\bm{\Delta}^{2}+\bm{\Delta}\bm{A}+\bm{A}\bm{\Delta}-2\operatorname{ddiag}(\bm{A}\bm{\Delta})-c(\bm{J}_{n}-\bm{I}_{n}))\otimes\bm{I}_{n}+nt\bar{\bm{P}}
=(𝑰n𝑨𝑨𝑰n)2+nt𝑷¯2𝚫𝑨+(2ddiag(𝑨𝚫)+c(𝑱n𝑰n))𝑰n.\displaystyle=(\bm{I}_{n}\otimes\bm{A}-\bm{A}\otimes\bm{I}_{n})^{2}+nt\bar{\bm{P}}-2\bm{\Delta}\otimes\bm{A}+(2\operatorname{ddiag}(\bm{A}\bm{\Delta})+c(\bm{J}_{n}-\bm{I}_{n}))\otimes\bm{I}_{n}.

By choosing 𝑩\bm{B} in the form of (3.10), we have 𝑸=(𝑰n2n1𝒙𝒙)(𝑴𝑺)(𝑰n2n1𝒙𝒙)\bm{Q}=(\bm{I}_{n^{2}}-n^{-1}\bm{x}\bm{x}^{\top})(\bm{M}-\bm{S})(\bm{I}_{n^{2}}-n^{-1}\bm{x}\bm{x}^{\top}) and the second smallest eigenvalue satisfies

λ2(𝑸)\displaystyle\lambda_{2}\left(\bm{Q}\right) λ2((𝑰n𝑨𝑨𝑰n)2+nt𝑷¯)2𝚫𝑨+2min1in𝑨i,𝚫ic\displaystyle\geq\lambda_{2}((\bm{I}_{n}\otimes\bm{A}-\bm{A}\otimes\bm{I}_{n})^{2}+nt\bar{\bm{P}})-2\|\bm{\Delta}\otimes\bm{A}\|+2\min_{1\leq i\leq n}\langle\bm{A}_{i},\bm{\Delta}_{i}\rangle-c (3.18)
(Theorem 3.3) 2nminij(λiλj)2min|𝒖i,𝟏n|22𝚫𝑨2𝑨𝚫max\displaystyle\geq\frac{2}{n}\min_{i\neq j}(\lambda_{i}-\lambda_{j})^{2}\min|\langle\bm{u}_{i},\bm{1}_{n}\rangle|^{2}-2\|\bm{\Delta}\|\|\bm{A}\|-2\|\bm{A}\bm{\Delta}\|_{\max}

where 𝚫𝑨=𝑨𝚫\|\bm{\Delta}\otimes\bm{A}\|=\|\bm{A}\|\|\bm{\Delta}\|, c=2maxij𝑨i,𝚫jc=2\max_{i\neq j}\langle\bm{A}_{i},\bm{\Delta}_{j}\rangle is chosen so that |c|2𝑨𝚫max|c|\leq 2\|\bm{A}\bm{\Delta}\|_{\max}, and tt is sufficiently large. Then by Theorem 3.1, the exactness holds if λ2(𝑸)>0\lambda_{2}(\bm{Q})>0, i.e., 𝑿=𝒙𝒙\bm{X}=\bm{x}\bm{x}^{\top} is the unique global minimizer to (2.5). ∎

4 Proof of Theorem 2.2

4.1 Dual program and optimality condition of (2.6)

We start by deriving the dual form of (2.6). For each constraint, we assign a dual variable:

[𝑸𝒒𝒒z]0\displaystyle\begin{bmatrix}\bm{Q}&\bm{q}\\ \bm{q}^{\top}&z\end{bmatrix}\succeq 0 :[𝑿diag(𝑿)diag(𝑿)1]0,\displaystyle:\begin{bmatrix}\bm{X}&\operatorname{diag}(\bm{X})\\ \operatorname{diag}(\bm{X})^{\top}&1\end{bmatrix}\succeq 0,
𝑩0\displaystyle\bm{B}\geq 0 :𝑿0,\displaystyle:\bm{X}\geq 0,
𝑻n×n\displaystyle\bm{T}\in\hbox{\msbm{R}}^{n\times n} :Tr(𝑿ij)=δij,1i,jn,\displaystyle:\operatorname{Tr}(\bm{X}_{ij})=\delta_{ij},~{}~{}1\leq i,j\leq n,
𝒁n×n\displaystyle\bm{Z}\in\hbox{\msbm{R}}^{n\times n} :𝑿ij,𝑱n=1,1i,jn,\displaystyle:\langle\bm{X}_{ij},\bm{J}_{n}\rangle=1,~{}~{}1\leq i,j\leq n,
𝑲n×n\displaystyle\bm{K}\in\hbox{\msbm{R}}^{n\times n} :i=1n𝑿ii=𝑰n,\displaystyle:\sum_{i=1}^{n}\bm{X}_{ii}=\bm{I}_{n},
𝑯n×n\displaystyle\bm{H}\in\hbox{\msbm{R}}^{n\times n} :i,j𝑿ij=𝑱n,\displaystyle:\sum_{i,j}\bm{X}_{ij}=\bm{J}_{n},
𝝁n\displaystyle\bm{\mu}\in\hbox{\msbm{R}}^{n} :mat(diag(𝑿))𝟏n=𝟏n,\displaystyle:\operatorname{mat}(\operatorname{diag}(\bm{X}))\bm{1}_{n}=\bm{1}_{n},
𝝀n\displaystyle\bm{\lambda}\in\hbox{\msbm{R}}^{n} :mat(diag(𝑿))𝟏n=𝟏n,\displaystyle:\operatorname{mat}(\operatorname{diag}(\bm{X}))^{\top}\bm{1}_{n}=\bm{1}_{n},

where diag(𝑿)\operatorname{diag}(\bm{X}) takes the diagonal entries of 𝑿\bm{X} and forms them into a column vector.

For the constraints involving mat(diag(𝑿))\operatorname{mat}(\operatorname{diag}(\bm{X})), we compute

𝝁(mat(diag(𝑿))𝟏n𝟏n)\displaystyle\bm{\mu}^{\top}(\operatorname{mat}(\operatorname{diag}(\bm{X}))\bm{1}_{n}-\bm{1}_{n}) =mat(diag(𝑿)),𝝁𝟏n𝝁,𝟏n\displaystyle=\langle\operatorname{mat}(\operatorname{diag}(\bm{X})),\bm{\mu}\bm{1}_{n}^{\top}\rangle-\langle\bm{\mu},\bm{1}_{n}\rangle
=diag(𝑿),𝟏𝝁𝝁,𝟏n=𝑰ndiag(𝝁),𝑿𝝁,𝟏n\displaystyle=\langle\operatorname{diag}(\bm{X}),\bm{1}\otimes\bm{\mu}\rangle-\langle\bm{\mu},\bm{1}_{n}\rangle=\langle\bm{I}_{n}\otimes\operatorname{diag}(\bm{\mu}),\bm{X}\rangle-\langle\bm{\mu},\bm{1}_{n}\rangle
𝝀(mat(diag(𝑿))𝟏n𝟏n)\displaystyle\bm{\lambda}^{\top}(\operatorname{mat}(\operatorname{diag}(\bm{X}))^{\top}\bm{1}_{n}-\bm{1}_{n}) =mat(diag(𝑿)),𝟏n𝝀𝝀,𝟏n\displaystyle=\langle\operatorname{mat}(\operatorname{diag}(\bm{X})),\bm{1}_{n}\bm{\lambda}^{\top}\rangle-\langle\bm{\lambda},\bm{1}_{n}\rangle
=diag(𝑿),𝝀𝟏n𝝀,𝟏n=diag(𝝀)𝑰n,𝑿𝝀,𝟏n\displaystyle=\langle\operatorname{diag}(\bm{X}),\bm{\lambda}\otimes\bm{1}_{n}\rangle-\langle\bm{\lambda},\bm{1}_{n}\rangle=\langle\operatorname{diag}(\bm{\lambda})\otimes\bm{I}_{n},\bm{X}\rangle-\langle\bm{\lambda},\bm{1}_{n}\rangle

where vec(𝒖𝒗)=𝒗𝒖.\operatorname{vec}(\bm{u}\bm{v}^{\top})=\bm{v}\otimes\bm{u}. Also

\left\langle\begin{bmatrix}\bm{X}&\operatorname{diag}(\bm{X})\\ \operatorname{diag}(\bm{X})^{\top}&1\end{bmatrix},\begin{bmatrix}\bm{Q}&\bm{q}\\ \bm{q}^{\top}&z\end{bmatrix}\right\rangle=\langle\bm{X},\bm{Q}\rangle+2\langle\bm{X},\operatorname{diag}(\bm{q})\rangle+z

where diag(𝑿),𝒒=𝑿,diag(𝒒).\langle\operatorname{diag}(\bm{X}),\bm{q}\rangle=\langle\bm{X},\operatorname{diag}(\bm{q})\rangle.
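The vectorization identities used here are easy to misstate, so a quick numpy check of the three of them (with random placeholder data and our own names) may be helpful:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
u, v, mu, lam = (rng.standard_normal(n) for _ in range(4))
q = rng.standard_normal(n * n)
X = rng.standard_normal((n * n, n * n)); X = (X + X.T) / 2

# vec(u v^T) = v (x) u, with vec(.) stacking columns
assert np.allclose(np.outer(u, v).flatten(order="F"), np.kron(v, u))

# <diag(X), 1_n (x) mu> = <I_n (x) diag(mu), X>   (the mu-term above)
assert np.isclose(np.diag(X) @ np.kron(np.ones(n), mu),
                  np.sum(np.kron(np.eye(n), np.diag(mu)) * X))

# <diag(X), lam (x) 1_n> = <diag(lam) (x) I_n, X> (the lam-term above)
assert np.isclose(np.diag(X) @ np.kron(lam, np.ones(n)),
                  np.sum(np.kron(np.diag(lam), np.eye(n)) * X))

# <diag(X), q> = <X, diag(q)>
assert np.isclose(np.diag(X) @ q, np.sum(X * np.diag(q)))
```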

Now the Lagrangian function is

(𝑿,𝑸,𝑩,𝑻,𝒁,𝑲,𝑯,𝝁,𝝀,𝒒,z)\displaystyle{\cal L}(\bm{X},\bm{Q},\bm{B},\bm{T},\bm{Z},\bm{K},\bm{H},\bm{\mu},\bm{\lambda},\bm{q},z)
:=𝑿,𝑴𝑸2diag(𝒒)𝑩(𝑰ndiag(𝝁)+diag(𝝀)𝑰n)+𝝀+𝝁,𝟏n\displaystyle\qquad:=\langle\bm{X},\bm{M}-\bm{Q}-2\operatorname{diag}(\bm{q})-\bm{B}-(\bm{I}_{n}\otimes\operatorname{diag}(\bm{\mu})+\operatorname{diag}(\bm{\lambda})\otimes\bm{I}_{n})\rangle+\langle\bm{\lambda}+\bm{\mu},\bm{1}_{n}\rangle
i,jtij(Tr(𝑿ij)δij)i,jzij(𝑿ij,𝑱n1)i=1n𝑿ii𝑰n,𝑲i,j𝑿ij𝑱n,𝑯z\displaystyle\qquad\quad-\sum_{i,j}t_{ij}(\operatorname{Tr}(\bm{X}_{ij})-\delta_{ij})-\sum_{i,j}z_{ij}(\langle\bm{X}_{ij},\bm{J}_{n}\rangle-1)-\left\langle\sum_{i=1}^{n}\bm{X}_{ii}-\bm{I}_{n},\bm{K}\right\rangle-\left\langle\sum_{i,j}\bm{X}_{ij}-\bm{J}_{n},\bm{H}\right\rangle-z
\displaystyle\qquad=\left\langle\bm{X},\bm{M}-\bm{Q}-2\operatorname{diag}(\bm{q})-\bm{B}-\operatorname{diag}(\bm{1}_{n}\otimes\bm{\mu}+\bm{\lambda}\otimes\bm{1}_{n})-\bm{T}\otimes\bm{I}_{n}-\bm{Z}\otimes\bm{J}_{n}-\bm{I}_{n}\otimes\bm{K}-\bm{J}_{n}\otimes\bm{H}\right\rangle
\displaystyle\qquad\quad+\operatorname{Tr}(\bm{T}+\bm{K})+\langle\bm{J}_{n},\bm{Z}+\bm{H}\rangle+\langle\bm{\lambda}+\bm{\mu},\bm{1}_{n}\rangle-z.

We define

𝚲=𝑰ndiag(𝝁)+diag(𝝀)𝑰n,𝑺=𝑻𝑰n+𝑰n𝑲+𝑱n𝑯+𝒁𝑱n.\bm{\Lambda}=\bm{I}_{n}\otimes\operatorname{diag}(\bm{\mu})+\operatorname{diag}(\bm{\lambda})\otimes\bm{I}_{n},~{}~{}~{}~{}\bm{S}=\bm{T}\otimes\bm{I}_{n}+\bm{I}_{n}\otimes\bm{K}+\bm{J}_{n}\otimes\bm{H}+\bm{Z}\otimes\bm{J}_{n}. (4.1)

Then the Lagrangian equals

()=𝑿,𝑴𝑸2diag(𝒒)𝑩𝚲𝑺+Tr(𝑻+𝑲)+𝑱n,𝒁+𝑯+𝝀+𝝁,𝟏nz.{\cal L}(\cdot)=\langle\bm{X},\bm{M}-\bm{Q}-2\operatorname{diag}(\bm{q})-\bm{B}-\bm{\Lambda}-\bm{S}\rangle+\operatorname{Tr}(\bm{T}+\bm{K})+\langle\bm{J}_{n},\bm{Z}+\bm{H}\rangle+\langle\bm{\lambda}+\bm{\mu},\bm{1}_{n}\rangle-z.

The resulting dual program becomes

max\displaystyle\max{} Tr(𝑻+𝑲)+𝑱n,𝒁+𝑯+𝝀+𝝁,𝟏nz,\displaystyle\operatorname{Tr}(\bm{T}+\bm{K})+\langle\bm{J}_{n},\bm{Z}+\bm{H}\rangle+\langle\bm{\lambda}+\bm{\mu},\bm{1}_{n}\rangle-z, (4.2)
s.t. 𝑴=𝑸+2diag(𝒒)+𝑩+𝚲+𝑺,\displaystyle\bm{M}=\bm{Q}+2\operatorname{diag}(\bm{q})+\bm{B}+\bm{\Lambda}+\bm{S},
\begin{bmatrix}\bm{Q}&\bm{q}\\ \bm{q}^{\top}&z\end{bmatrix}\succeq 0,~{}~{}\bm{B}\geq 0.
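Weak duality between (2.6) and (4.2) can be checked numerically: generate an arbitrary dual-feasible point, let the stationarity equation define a consistent \bm{M}, and compare objective values at the primal-feasible \bm{X}=\bm{x}\bm{x}^{\top}. A minimal numpy sketch, under the column-stacking vec convention and with all variable choices ours:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4
N = n * n

# a random dual-feasible point: bordered PSD matrix and entrywise-nonnegative B
G = rng.standard_normal((N + 1, N + 1))
bordered = G @ G.T
Q, q, z = bordered[:N, :N], bordered[:N, -1], bordered[-1, -1]
B = np.abs(rng.standard_normal((N, N))); B = (B + B.T) / 2
T, K, Z, H = (rng.standard_normal((n, n)) for _ in range(4))
lam, mu = rng.standard_normal(n), rng.standard_normal(n)
Lam = np.kron(np.eye(n), np.diag(mu)) + np.kron(np.diag(lam), np.eye(n))
S = (np.kron(T, np.eye(n)) + np.kron(np.eye(n), K)
     + np.kron(np.ones((n, n)), H) + np.kron(Z, np.ones((n, n))))
M = Q + 2 * np.diag(q) + B + Lam + S         # stationarity defines a consistent M

# a primal-feasible point X = x x^T built from a permutation
P = np.eye(n)[rng.permutation(n)]
x = P.flatten(order="F")
X = np.outer(x, x)

primal = np.sum(M * X)
dual = np.trace(T + K) + (Z + H).sum() + (lam + mu).sum() - z
assert primal >= dual - 1e-9                 # weak duality: primal value >= dual value
```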

As a result, the KKT conditions are

  1. Stationarity:

     \bm{M}=\bm{Q}+2\operatorname{diag}(\bm{q})+\bm{B}+\bm{\Lambda}+\bm{S},

  2. Dual feasibility:

     \begin{bmatrix}\bm{Q}&\bm{q}\\ \bm{q}^{\top}&z\end{bmatrix}\succeq 0,~{}~{}~{}~{}\bm{B}\geq 0,

  3. Complementary slackness:

     \left\langle\begin{bmatrix}\bm{X}&\operatorname{diag}(\bm{X})\\ \operatorname{diag}(\bm{X})^{\top}&1\end{bmatrix},\begin{bmatrix}\bm{Q}&\bm{q}\\ \bm{q}^{\top}&z\end{bmatrix}\right\rangle=0,~{}~{}~{}\langle\bm{B},\bm{X}\rangle=0.

Suppose \bm{X}=\bm{x}\bm{x}^{\top} is a global minimizer; then we can simplify the KKT optimality conditions by determining some of the dual variables. Complementary slackness gives

[𝑸𝒒𝒒z][𝒙1]=0,𝑩,𝒙𝒙=0,\displaystyle\begin{bmatrix}\bm{Q}&\bm{q}\\ \bm{q}^{\top}&z\end{bmatrix}\begin{bmatrix}\bm{x}\\ 1\end{bmatrix}=0,~{}~{}~{}~{}\langle\bm{B},\bm{x}\bm{x}^{\top}\rangle=0,

which leads to \bm{Q}\bm{x}+\bm{q}=0, z=-\langle\bm{q},\bm{x}\rangle, and \bm{B}\circ\bm{x}\bm{x}^{\top}=0. Now we revisit the KKT conditions. For dual feasibility, it holds that

[𝑸𝑸𝒙(𝑸𝒙)𝒙𝑸𝒙]0𝑸0.\begin{bmatrix}\bm{Q}&-\bm{Q}\bm{x}\\ -(\bm{Q}\bm{x})^{\top}&\bm{x}^{\top}\bm{Q}\bm{x}\end{bmatrix}\succeq 0~{}\Longleftrightarrow~{}\bm{Q}\succeq 0.
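This equivalence follows from the factorization of the bordered matrix as \bm{F}\bm{Q}\bm{F}^{\top} with \bm{F}=\begin{bmatrix}\bm{I}\\ -\bm{x}^{\top}\end{bmatrix}, which the following small numpy sketch (random placeholder data, names ours) verifies:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 10
G = rng.standard_normal((N, N)); Q = G @ G.T         # any PSD Q
x = rng.standard_normal(N)

F = np.vstack([np.eye(N), -x[None, :]])              # F = [I; -x^T]
bordered = np.block([[Q, -(Q @ x)[:, None]],
                     [-(Q @ x)[None, :], np.array([[x @ Q @ x]])]])
assert np.allclose(bordered, F @ Q @ F.T)            # the factorization behind "<=>"
assert np.linalg.eigvalsh(bordered)[0] > -1e-9       # hence bordered PSD iff Q PSD
```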

For the stationarity, we have

𝑴=𝑸2diag(𝑸𝒙)+𝑩+𝚲+𝑺.\bm{M}=\bm{Q}-2\operatorname{diag}(\bm{Q}\bm{x})+\bm{B}+\bm{\Lambda}+\bm{S}.

This linear system \bm{Q}-2\operatorname{diag}(\bm{Q}\bm{x})=\bm{M}-(\bm{B}+\bm{\Lambda}+\bm{S}) has a unique solution for \bm{Q}: applying both sides to \bm{x}, and using \operatorname{diag}(\bm{Q}\bm{x})\bm{x}=\operatorname{diag}(\bm{x})\bm{Q}\bm{x} together with (\bm{I}-2\operatorname{diag}(\bm{x}))^{2}=\bm{I} (as \bm{x} is binary), gives

𝑸𝒙2diag(𝒙)𝑸𝒙=(𝑴𝑩𝚲𝑺)𝒙𝑸𝒙=(𝑰2diag(𝒙))(𝑴𝑩𝚲𝑺)𝒙\bm{Q}\bm{x}-2\operatorname{diag}(\bm{x})\bm{Q}\bm{x}=(\bm{M}-\bm{B}-\bm{\Lambda}-\bm{S})\bm{x}\Longrightarrow\bm{Q}\bm{x}=(\bm{I}-2\operatorname{diag}(\bm{x}))(\bm{M}-\bm{B}-\bm{\Lambda}-\bm{S})\bm{x}

and then

𝑸\displaystyle\bm{Q} =𝑴(𝑩+𝚲+𝑺)+2diag(𝑸𝒙)\displaystyle=\bm{M}-(\bm{B}+\bm{\Lambda}+\bm{S})+2\operatorname{diag}(\bm{Q}\bm{x})
\displaystyle=\bm{M}-(\bm{B}+\bm{\Lambda}+\bm{S})+2(\bm{I}-2\operatorname{diag}(\bm{x}))\operatorname{diag}((\bm{M}-\bm{B}-\bm{\Lambda}-\bm{S})\bm{x}).
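As a parenthetical sanity check, the closed form above can be verified numerically: with random symmetric placeholders and a binary \bm{x}, the displayed \bm{Q} solves the linear system, and \bm{I}-2\operatorname{diag}(\bm{x}) is an involution. A minimal numpy sketch (all names ours):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 4
N = n * n
P = np.eye(n)[rng.permutation(n)]
x = P.flatten(order="F")                   # binary, so (I - 2 diag(x))^2 = I

sym = lambda G: (G + G.T) / 2
M, B, Lam, S = (sym(rng.standard_normal((N, N))) for _ in range(4))

R = M - B - Lam - S                        # right-hand side M - (B + Lambda + S)
Q = R + 2 * (np.eye(N) - 2 * np.diag(x)) @ np.diag(R @ x)

assert np.allclose(Q - 2 * np.diag(Q @ x), R)    # Q solves the linear system
I2x = np.eye(N) - 2 * np.diag(x)
assert np.allclose(I2x @ I2x, np.eye(N))         # involution for binary x
```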

Therefore, the KKT condition becomes

𝑸=𝑴𝑩𝚲𝑺+2(𝑰2diag(𝒙))diag((𝑴𝑩𝚲𝑺)𝒙)0,\displaystyle\bm{Q}=\bm{M}-\bm{B}-\bm{\Lambda}-\bm{S}+2(\bm{I}-2\operatorname{diag}(\bm{x}))\operatorname{diag}((\bm{M}-\bm{B}-\bm{\Lambda}-\bm{S})\bm{x})\succeq 0, (4.3)
𝑩0,𝑩𝒙𝒙=0,\displaystyle\bm{B}\geq 0,~{}\bm{B}\circ\bm{x}\bm{x}^{\top}=0,

where 𝑺\bm{S} is defined in (4.1). Next, we show that (4.3) implies the (unique) global optimality of 𝑿.\bm{X}.

Theorem 4.1.

Suppose that there exist \bm{B}, \bm{S}, and \bm{\Lambda} such that (4.3) holds; then \bm{X}=\bm{x}\bm{x}^{\top} is a global minimizer. Moreover, if \bm{Q}\succ 0, then \bm{X} is the unique global minimizer.

Proof of Theorem 4.1.

Suppose that

𝑸=𝑴𝑩𝚲𝑺+2diag(𝑸𝒙)0\bm{Q}=\bm{M}-\bm{B}-\bm{\Lambda}-\bm{S}+2\operatorname{diag}(\bm{Q}\bm{x})\succeq 0

which, since \bm{x} is binary and hence \bm{x}^{\top}\operatorname{diag}(\bm{Q}\bm{x})\bm{x}=\bm{x}^{\top}\bm{Q}\bm{x}, gives

𝒙𝑸𝒙=𝒙(𝑴𝑩𝚲𝑺)𝒙.\bm{x}^{\top}\bm{Q}\bm{x}=-\bm{x}^{\top}(\bm{M}-\bm{B}-\bm{\Lambda}-\bm{S})\bm{x}.

For any feasible solution 𝑿^\widehat{\bm{X}}, we have

[𝑸𝑸𝒙(𝑸𝒙)𝒙𝑸𝒙],[𝑿^diag(𝑿^)diag(𝑿^)1]=𝑿^,𝑸2diag(𝑸𝒙)+𝒙𝑸𝒙\displaystyle\left\langle\begin{bmatrix}\bm{Q}&-\bm{Q}\bm{x}\\ -(\bm{Q}\bm{x})^{\top}&\bm{x}^{\top}\bm{Q}\bm{x}\end{bmatrix},\begin{bmatrix}\widehat{\bm{X}}&\operatorname{diag}(\widehat{\bm{X}})\\ \operatorname{diag}(\widehat{\bm{X}})^{\top}&1\end{bmatrix}\right\rangle=\langle\widehat{\bm{X}},\bm{Q}-2\operatorname{diag}(\bm{Q}\bm{x})\rangle+\bm{x}^{\top}\bm{Q}\bm{x} (4.4)
=𝑿^𝒙𝒙,𝑴𝑩𝚲𝑺=𝑿^𝒙𝒙,𝑴𝑿^,𝑩0\displaystyle\qquad=\langle\widehat{\bm{X}}-\bm{x}\bm{x}^{\top},\bm{M}-\bm{B}-\bm{\Lambda}-\bm{S}\rangle=\langle\widehat{\bm{X}}-\bm{x}\bm{x}^{\top},\bm{M}\rangle-\langle\widehat{\bm{X}},\bm{B}\rangle\geq 0

where \langle\widehat{\bm{X}}-\bm{x}\bm{x}^{\top},\bm{\Lambda}+\bm{S}\rangle=0 follows from the feasibility of \widehat{\bm{X}} and \bm{x}\bm{x}^{\top}, and \langle\bm{x}\bm{x}^{\top},\bm{B}\rangle=0. Therefore, it holds that \langle\widehat{\bm{X}}-\bm{x}\bm{x}^{\top},\bm{M}\rangle\geq\langle\widehat{\bm{X}},\bm{B}\rangle\geq 0, which implies that \bm{X}=\bm{x}\bm{x}^{\top} is a global minimizer to (2.6). In particular, if \bm{Q}\succ 0, then \bm{X} is the unique global minimizer since (4.4) becomes strictly positive for any feasible \widehat{\bm{X}}\neq\bm{x}\bm{x}^{\top}. ∎

Proof of Theorem 2.2.

Consider the dual certificates \bm{T}, \bm{K}, \bm{Z}, and \bm{H} in (3.17); then it holds that

diag(𝑻+𝑲)+(𝒁+𝑯)𝟏n=diag(𝚫2),𝑩𝒙=(𝑴𝑺)𝒙\operatorname{diag}(\bm{T}+\bm{K})+(\bm{Z}+\bm{H})\bm{1}_{n}=\operatorname{diag}(\bm{\Delta}^{2}),~{}~{}~{}\bm{B}\bm{x}=(\bm{M}-\bm{S})\bm{x}

where 𝑩\bm{B} is chosen in the form of (3.10). As a result, 𝑸\bm{Q} satisfies

𝑸\displaystyle\bm{Q} =𝑴𝑩𝚲𝑺+2(𝑰2diag(𝒙))diag((𝑴𝑩𝚲𝑺)𝒙)\displaystyle=\bm{M}-\bm{B}-\bm{\Lambda}-\bm{S}+2(\bm{I}-2\operatorname{diag}(\bm{x}))\operatorname{diag}((\bm{M}-\bm{B}-\bm{\Lambda}-\bm{S})\bm{x})
=𝚷(𝑴𝑺)𝚷(𝚲+2(𝑰2diag(𝒙))diag(𝚲𝒙))\displaystyle=\bm{\Pi}(\bm{M}-\bm{S})\bm{\Pi}-\left(\bm{\Lambda}+2(\bm{I}-2\operatorname{diag}(\bm{x}))\operatorname{diag}(\bm{\Lambda}\bm{x})\right)

where 𝑩𝒙=(𝑴𝑺)𝒙\bm{B}\bm{x}=(\bm{M}-\bm{S})\bm{x} and 𝚷:=𝑰𝒙𝒙/n.\bm{\Pi}:=\bm{I}-\bm{x}\bm{x}^{\top}/n. Note that (3.18) gives

λ2(𝚷(𝑴𝑺)𝚷)2nminij(λiλj)2min|𝒖i,𝟏n|22𝚫𝑨2𝑨𝚫max.\lambda_{2}\left(\bm{\Pi}(\bm{M}-\bm{S})\bm{\Pi}\right)\geq\frac{2}{n}\min_{i\neq j}(\lambda_{i}-\lambda_{j})^{2}\min|\langle\bm{u}_{i},\bm{1}_{n}\rangle|^{2}-2\|\bm{\Delta}\|\|\bm{A}\|-2\|\bm{A}\bm{\Delta}\|_{\max}.

Note that \operatorname{mat}(\bm{\Lambda}\bm{x})=\operatorname{diag}(\bm{\lambda}+\bm{\mu}), where \bm{\Lambda} is defined in (4.1). We also have

mat(diag(𝚲)+2(𝑰2diag(𝒙))𝚲𝒙)=𝟏n𝝀+𝝁𝟏n2diag(𝝀+𝝁).\operatorname{mat}(\operatorname{diag}(\bm{\Lambda})+2(\bm{I}-2\operatorname{diag}(\bm{x}))\bm{\Lambda}\bm{x})=\bm{1}_{n}\bm{\lambda}^{\top}+\bm{\mu}\bm{1}_{n}^{\top}-2\operatorname{diag}(\bm{\lambda}+\bm{\mu}).

In particular, if we choose 𝝀\bm{\lambda} and 𝝁\bm{\mu} as constant vectors, i.e., 𝝀=𝝁=t𝟏n\bm{\lambda}=\bm{\mu}=t\bm{1}_{n}, then

mat(diag(𝚲)+2(𝑰2diag(𝒙))𝚲𝒙)=2t(𝑱n2𝑰n)\operatorname{mat}(\operatorname{diag}(\bm{\Lambda})+2(\bm{I}-2\operatorname{diag}(\bm{x}))\bm{\Lambda}\bm{x})=2t(\bm{J}_{n}-2\bm{I}_{n})

and thus

𝑸\displaystyle\bm{Q} =𝚷(𝑴𝑺)𝚷2t(𝑰diag(𝒙))+2tdiag(𝒙)\displaystyle=\bm{\Pi}(\bm{M}-\bm{S})\bm{\Pi}-2t(\bm{I}-\operatorname{diag}(\bm{x}))+2t\operatorname{diag}(\bm{x})
\displaystyle\succeq\lambda_{2}(\bm{\Pi}(\bm{M}-\bm{S})\bm{\Pi})\left(\bm{I}-\frac{\bm{x}\bm{x}^{\top}}{n}\right)-2t(\bm{I}-\operatorname{diag}(\bm{x}))+2t\operatorname{diag}(\bm{x}).

For any 0<2t<\lambda_{2}(\bm{\Pi}(\bm{M}-\bm{S})\bm{\Pi}), we have \bm{Q}\succ 0, and therefore \bm{X}=\bm{x}\bm{x}^{\top} is the unique global minimizer to (2.6) by Theorem 4.1. ∎
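Both ingredients of this argument can be checked numerically: the \operatorname{mat}(\cdot) identity for \bm{\lambda}=\bm{\mu}=t\bm{1}_{n}, and positive definiteness of \bm{Q} for 0<2t<\lambda_{2}. The sketch below assumes \bm{x}=\operatorname{vec}(\bm{I}_{n}) (the planted permutation taken to be the identity, as the identity \operatorname{mat}(\bm{\Lambda}\bm{x})=\operatorname{diag}(\bm{\lambda}+\bm{\mu}) suggests) and substitutes a random PSD surrogate \bm{W} for \bm{M}-\bm{S}:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 5
N = n * n
x = np.eye(n).flatten(order="F")           # x = vec(I_n)
Pi = np.eye(N) - np.outer(x, x) / n

t = 0.1
Lam = np.kron(np.eye(n), t * np.eye(n)) + np.kron(t * np.eye(n), np.eye(n))  # lam = mu = t 1_n
v = np.diag(Lam) + 2 * (np.eye(N) - 2 * np.diag(x)) @ (Lam @ x)
assert np.allclose(v.reshape(n, n, order="F"),
                   2 * t * (np.ones((n, n)) - 2 * np.eye(n)))  # mat(...) = 2t (J_n - 2 I_n)

# with a PSD surrogate W for M - S, Q = Pi W Pi - 2t(I - diag(x)) + 2t diag(x)
G = rng.standard_normal((N, 2 * N)); W = G @ G.T
PWP = Pi @ W @ Pi
lam2 = np.linalg.eigvalsh(PWP)[1]          # second smallest eigenvalue (smallest is 0, at x)
assert lam2 > 0
t = 0.49 * lam2                            # any 0 < 2t < lambda_2 works
Q = PWP - 2 * t * (np.eye(N) - np.diag(x)) + 2 * t * np.diag(x)
assert np.linalg.eigvalsh(Q)[0] > 0        # Q is positive definite, as claimed
```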

Appendix

Lemma 4.2.

Let 𝚷\bm{\Pi} and 𝐕\bm{V} be two matrices of same size and 𝐏Π\bm{P}_{\Pi} and 𝐏V\bm{P}_{V} be their orthogonal projection matrices respectively that satisfy 𝐏V(𝐈𝐏Π)=0\bm{P}_{V}(\bm{I}-\bm{P}_{\Pi})=0, i.e., Ran(𝐕)Ran(𝚷)\operatorname{Ran}(\bm{V})\subseteq\operatorname{Ran}(\bm{\Pi}). Suppose

(𝑷Π𝑷V)𝑨(𝑷Π𝑷V)c(𝑷Π𝑷V)(\bm{P}_{\Pi}-\bm{P}_{V})\bm{A}(\bm{P}_{\Pi}-\bm{P}_{V})\succeq c(\bm{P}_{\Pi}-\bm{P}_{V})

for some c>0c>0. Then for any 0<c′′<c0<c^{\prime\prime}<c, we have

𝑷Π𝑨𝑷Π+t𝑷Vc′′𝑷Π\bm{P}_{\Pi}\bm{A}\bm{P}_{\Pi}+t\bm{P}_{V}\succeq c^{\prime\prime}\bm{P}_{\Pi}

for sufficiently large tt.

Proof of Lemma 4.2.

We decompose 𝑷Π𝑨𝑷Π+t𝑷V\bm{P}_{\Pi}\bm{A}\bm{P}_{\Pi}+t\bm{P}_{V} into

𝑷Π𝑨𝑷Π+t𝑷V=(𝑷Π𝑷V)𝑨(𝑷Π𝑷V)+(𝑷Π𝑷V)𝑨𝑷V+𝑷V𝑨(𝑷Π𝑷V)+(𝑷V𝑨𝑷V+t𝑷V).\bm{P}_{\Pi}\bm{A}\bm{P}_{\Pi}+t\bm{P}_{V}=(\bm{P}_{\Pi}-\bm{P}_{V})\bm{A}(\bm{P}_{\Pi}-\bm{P}_{V})+(\bm{P}_{\Pi}-\bm{P}_{V})\bm{A}\bm{P}_{V}+\bm{P}_{V}\bm{A}(\bm{P}_{\Pi}-\bm{P}_{V})+(\bm{P}_{V}\bm{A}\bm{P}_{V}+t\bm{P}_{V}).

Let \bm{R}:=\bm{P}_{\Pi}-(\bm{P}_{\Pi}-\bm{P}_{V})\bm{A}(\bm{P}_{V}\bm{A}\bm{P}_{V}+t\bm{P}_{V})^{\dagger}, where “\dagger” stands for the Moore–Penrose pseudo-inverse; note that \operatorname{Ran}(\bm{R})\subseteq\operatorname{Ran}(\bm{\Pi}) and

(𝑷V𝑨𝑷V+t𝑷V)(𝑷V𝑨𝑷V+t𝑷V)=𝑷V.(\bm{P}_{V}\bm{A}\bm{P}_{V}+t\bm{P}_{V})(\bm{P}_{V}\bm{A}\bm{P}_{V}+t\bm{P}_{V})^{\dagger}=\bm{P}_{V}. (4.5)

Note that

(𝑹𝑷Π)2=(𝑷Π𝑷V)𝑨(𝑷V𝑨𝑷V+t𝑷V)(𝑷Π𝑷V)𝑨(𝑷V𝑨𝑷V+t𝑷V)=0(\bm{R}-\bm{P}_{\Pi})^{2}=(\bm{P}_{\Pi}-\bm{P}_{V})\bm{A}(\bm{P}_{V}\bm{A}\bm{P}_{V}+t\bm{P}_{V})^{\dagger}(\bm{P}_{\Pi}-\bm{P}_{V})\bm{A}(\bm{P}_{V}\bm{A}\bm{P}_{V}+t\bm{P}_{V})^{\dagger}=0

where (𝑷V𝑨𝑷V+t𝑷V)(𝑷Π𝑷V)=0(\bm{P}_{V}\bm{A}\bm{P}_{V}+t\bm{P}_{V})^{\dagger}(\bm{P}_{\Pi}-\bm{P}_{V})=0. Using 𝑷Π𝑹=𝑹𝑷Π=𝑹\bm{P}_{\Pi}\bm{R}=\bm{R}\bm{P}_{\Pi}=\bm{R} and (𝑹𝑷Π)2=0(\bm{R}-\bm{P}_{\Pi})^{2}=0 gives

𝑹22𝑹+𝑷Π=0\bm{R}^{2}-2\bm{R}+\bm{P}_{\Pi}=0

which implies 𝑹\bm{R} is invertible on Ran(𝚷)\operatorname{Ran}(\bm{\Pi}) as 1 is an eigenvalue of 𝑹\bm{R} with multiplicity equal to the rank of 𝑷Π.\bm{P}_{\Pi}. Therefore, 𝑷Π𝑨𝑷Π+t𝑷VΠ0\bm{P}_{\Pi}\bm{A}\bm{P}_{\Pi}+t\bm{P}_{V}\succ_{\Pi}0 is equivalent to 𝑹(𝑷Π𝑨𝑷Π+t𝑷V)𝑹Π0\bm{R}(\bm{P}_{\Pi}\bm{A}\bm{P}_{\Pi}+t\bm{P}_{V})\bm{R}^{\top}\succ_{\Pi}0. Now

𝑹(𝑷Π𝑨𝑷Π+t𝑷V)\displaystyle\bm{R}(\bm{P}_{\Pi}\bm{A}\bm{P}_{\Pi}+t\bm{P}_{V})
=(𝑷Π(𝑷Π𝑷V)𝑨(𝑷V𝑨𝑷V+t𝑷V))(𝑷Π𝑨𝑷Π+t𝑷V)\displaystyle=(\bm{P}_{\Pi}-(\bm{P}_{\Pi}-\bm{P}_{V})\bm{A}(\bm{P}_{V}\bm{A}\bm{P}_{V}+t\bm{P}_{V})^{\dagger})(\bm{P}_{\Pi}\bm{A}\bm{P}_{\Pi}+t\bm{P}_{V})
=𝑷Π𝑨𝑷Π+t𝑷V(𝑷Π𝑷V)𝑨(𝑷V𝑨𝑷V+t𝑷V)(𝑷V𝑨(𝑷Π𝑷V)+(𝑷V𝑨𝑷V+t𝑷V))\displaystyle=\bm{P}_{\Pi}\bm{A}\bm{P}_{\Pi}+t\bm{P}_{V}-(\bm{P}_{\Pi}-\bm{P}_{V})\bm{A}(\bm{P}_{V}\bm{A}\bm{P}_{V}+t\bm{P}_{V})^{\dagger}(\bm{P}_{V}\bm{A}(\bm{P}_{\Pi}-\bm{P}_{V})+(\bm{P}_{V}\bm{A}\bm{P}_{V}+t\bm{P}_{V}))
=𝑷Π𝑨𝑷Π+t𝑷V(𝑷Π𝑷V)𝑨(𝑷V𝑨𝑷V+t𝑷V)𝑨(𝑷Π𝑷V)(𝑷Π𝑷V)𝑨𝑷V\displaystyle=\bm{P}_{\Pi}\bm{A}\bm{P}_{\Pi}+t\bm{P}_{V}-(\bm{P}_{\Pi}-\bm{P}_{V})\bm{A}(\bm{P}_{V}\bm{A}\bm{P}_{V}+t\bm{P}_{V})^{\dagger}\bm{A}(\bm{P}_{\Pi}-\bm{P}_{V})-(\bm{P}_{\Pi}-\bm{P}_{V})\bm{A}\bm{P}_{V}
=(𝑷Π𝑷V)𝑨(𝑷Π𝑷V)+𝑷V𝑨(𝑷Π𝑷V)+(𝑷V𝑨𝑷V+t𝑷V)\displaystyle=(\bm{P}_{\Pi}-\bm{P}_{V})\bm{A}(\bm{P}_{\Pi}-\bm{P}_{V})+\bm{P}_{V}\bm{A}(\bm{P}_{\Pi}-\bm{P}_{V})+(\bm{P}_{V}\bm{A}\bm{P}_{V}+t\bm{P}_{V})
(𝑷Π𝑷V)𝑨(𝑷V𝑨𝑷V+t𝑷V)𝑨(𝑷Π𝑷V)\displaystyle\qquad-(\bm{P}_{\Pi}-\bm{P}_{V})\bm{A}(\bm{P}_{V}\bm{A}\bm{P}_{V}+t\bm{P}_{V})^{\dagger}\bm{A}(\bm{P}_{\Pi}-\bm{P}_{V})

which follows from 𝑷Π𝑷V=𝑷V\bm{P}_{\Pi}\bm{P}_{V}=\bm{P}_{V} and (4.5). Then

𝑹(𝑷Π𝑨𝑷Π+t𝑷V)𝑹\displaystyle\bm{R}(\bm{P}_{\Pi}\bm{A}\bm{P}_{\Pi}+t\bm{P}_{V})\bm{R}^{\top}
=(𝑷Π𝑷V)𝑨(𝑷Π𝑷V)+𝑷V𝑨(𝑷Π𝑷V)+(𝑷V𝑨𝑷V+t𝑷V)\displaystyle\qquad=(\bm{P}_{\Pi}-\bm{P}_{V})\bm{A}(\bm{P}_{\Pi}-\bm{P}_{V})+\bm{P}_{V}\bm{A}(\bm{P}_{\Pi}-\bm{P}_{V})+(\bm{P}_{V}\bm{A}\bm{P}_{V}+t\bm{P}_{V})
(𝑷Π𝑷V)𝑨(𝑷V𝑨𝑷V+t𝑷V)𝑨(𝑷Π𝑷V)𝑷V𝑨(𝑷Π𝑷V)\displaystyle\qquad~{}~{}~{}-(\bm{P}_{\Pi}-\bm{P}_{V})\bm{A}(\bm{P}_{V}\bm{A}\bm{P}_{V}+t\bm{P}_{V})^{\dagger}\bm{A}(\bm{P}_{\Pi}-\bm{P}_{V})-\bm{P}_{V}\bm{A}(\bm{P}_{\Pi}-\bm{P}_{V})
=(𝑷V𝑨𝑷V+t𝑷V)+(𝑷Π𝑷V)𝑨(𝑷Π𝑷V)(𝑷Π𝑷V)𝑨(𝑷V𝑨𝑷V+t𝑷V)𝑨(𝑷Π𝑷V)\displaystyle\qquad=(\bm{P}_{V}\bm{A}\bm{P}_{V}+t\bm{P}_{V})+(\bm{P}_{\Pi}-\bm{P}_{V})\bm{A}(\bm{P}_{\Pi}-\bm{P}_{V})-(\bm{P}_{\Pi}-\bm{P}_{V})\bm{A}(\bm{P}_{V}\bm{A}\bm{P}_{V}+t\bm{P}_{V})^{\dagger}\bm{A}(\bm{P}_{\Pi}-\bm{P}_{V})

where (𝑷V𝑨𝑷V+t𝑷V)(𝑷V𝑨𝑷V+t𝑷V)=𝑷V(\bm{P}_{V}\bm{A}\bm{P}_{V}+t\bm{P}_{V})(\bm{P}_{V}\bm{A}\bm{P}_{V}+t\bm{P}_{V})^{\dagger}=\bm{P}_{V} for sufficiently large tt.

Next, we show that \bm{R}(\bm{P}_{\Pi}\bm{A}\bm{P}_{\Pi}+t\bm{P}_{V})\bm{R}^{\top} is positive definite when restricted to \operatorname{Ran}(\bm{\Pi}). Suppose (\bm{P}_{\Pi}-\bm{P}_{V})\bm{A}(\bm{P}_{\Pi}-\bm{P}_{V})\succeq c(\bm{P}_{\Pi}-\bm{P}_{V}); then for a sufficiently large t>0, it holds for 0<c^{\prime}<c that

(𝑷Π𝑷V)𝑨(𝑷Π𝑷V)(𝑷Π𝑷V)𝑨(𝑷V𝑨𝑷V+t𝑷V)𝑨(𝑷Π𝑷V)c(𝑷Π𝑷V)\displaystyle(\bm{P}_{\Pi}-\bm{P}_{V})\bm{A}(\bm{P}_{\Pi}-\bm{P}_{V})-(\bm{P}_{\Pi}-\bm{P}_{V})\bm{A}(\bm{P}_{V}\bm{A}\bm{P}_{V}+t\bm{P}_{V})^{\dagger}\bm{A}(\bm{P}_{\Pi}-\bm{P}_{V})\succeq c^{\prime}(\bm{P}_{\Pi}-\bm{P}_{V})

as the second term can be arbitrarily small for a sufficiently large t>0.t>0. Then

𝑹(𝑷Π𝑨𝑷Π+t𝑷V)𝑹c𝑷V+c(𝑷Π𝑷V)c𝑷Π\bm{R}(\bm{P}_{\Pi}\bm{A}\bm{P}_{\Pi}+t\bm{P}_{V})\bm{R}^{\top}\succeq c^{\prime}\bm{P}_{V}+c^{\prime}(\bm{P}_{\Pi}-\bm{P}_{V})\succeq c^{\prime}\bm{P}_{\Pi}

where 𝑷V𝑨𝑷V+t𝑷Vc𝑷V\bm{P}_{V}\bm{A}\bm{P}_{V}+t\bm{P}_{V}\succeq c^{\prime}\bm{P}_{V} for sufficiently large tt.

Note that 𝑹\bm{R} is invertible on Ran(𝚷)\operatorname{Ran}(\bm{\Pi}) and also the range of 𝑹\bm{R} belongs to Ran(𝚷).\operatorname{Ran}(\bm{\Pi}). Therefore, we have

𝑷Π𝑨𝑷Π+t𝑷Vc(𝑹𝑹)c′′𝑷Π\bm{P}_{\Pi}\bm{A}\bm{P}_{\Pi}+t\bm{P}_{V}\succeq c^{\prime}(\bm{R}\bm{R}^{\top})^{\dagger}\succeq c^{\prime\prime}\bm{P}_{\Pi}

for any 0<c^{\prime\prime}<c^{\prime}, where \|\bm{R}\bm{R}^{\top}-\bm{P}_{\Pi}\| goes to 0 at the rate 1/t, so \bm{R}\bm{R}^{\top} is arbitrarily close to \bm{P}_{\Pi} for sufficiently large t; since c^{\prime}<c is arbitrary, the claim holds for any 0<c^{\prime\prime}<c. ∎
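A numerical illustration of Lemma 4.2 may be useful. The construction below (entirely ours) builds projections with \operatorname{Ran}(\bm{P}_{V})\subseteq\operatorname{Ran}(\bm{P}_{\Pi}) and a symmetric \bm{A} that satisfies the hypothesis with c=1 while being very negative on \operatorname{Ran}(\bm{P}_{V}), so that \bm{P}_{\Pi}\bm{A}\bm{P}_{\Pi} alone is indefinite and the t\bm{P}_{V} term is genuinely needed:

```python
import numpy as np

rng = np.random.default_rng(7)
N, kV, kP = 12, 3, 7                       # ambient dim, dim Ran(V), dim Ran(Pi)
U = np.linalg.qr(rng.standard_normal((N, N)))[0]
UV, UW = U[:, :kV], U[:, kV:kP]            # bases of Ran(V) and of Ran(Pi) minus Ran(V)
PV = UV @ UV.T
PPi = PV + UW @ UW.T                       # so PV (I - PPi) = 0

c = 1.0
G = rng.standard_normal((kP - kV, kP - kV))
Cross = rng.standard_normal((kP - kV, kV))
A = (UW @ (c * np.eye(kP - kV) + G @ G.T) @ UW.T   # block >= c I on Ran(Pi) minus Ran(V)
     - 10.0 * PV                                   # very negative block on Ran(V)
     + UW @ Cross @ UV.T + UV @ Cross.T @ UW.T)    # symmetric cross terms
assert np.linalg.eigvalsh(UW.T @ A @ UW)[0] >= c - 1e-9   # hypothesis of the lemma holds

t = 1e4                                    # "sufficiently large" t
cpp = 0.9 * c                              # a c'' with 0 < c'' < c
lhs = PPi @ A @ PPi + t * PV
restricted = U[:, :kP].T @ lhs @ U[:, :kP] # restriction to Ran(Pi)
assert np.linalg.eigvalsh(restricted)[0] >= cpp   # P_Pi A P_Pi + t P_V >= c'' P_Pi
```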
