
Blind Super-resolution via Projected Gradient Descent

Abstract

Blind super-resolution can be cast as a low rank matrix recovery problem by exploiting the inherent simplicity of the signal. In this paper, we develop a simple yet efficient non-convex method for this problem based on the low rank structure of the vectorized Hankel matrix associated with the target matrix. Theoretical guarantees are established under conditions similar to those of convex approaches. Numerical experiments are also conducted to demonstrate its performance.

Index Terms—  Blind super-resolution, non-convex optimization, projected gradient descent, vectorized Hankel lift.

1 Introduction

Blind super-resolution of point sources is the problem of simultaneously estimating the locations and amplitudes of point sources and the point spread functions from low-resolution measurements. This problem arises in various applications, including single-molecule imaging [1], medical imaging [2], multi-user communication systems [3], and so on.

Under a certain subspace assumption and after applying the lifting technique, blind super-resolution can be cast as a matrix recovery problem. Recent works in [4, 5, 6, 7, 8] exploit the intrinsic structures of the data matrix and propose different convex relaxation methods for this problem. Theoretical guarantees for these methods have been established. However, due to the limitations of convex relaxation, these methods do not scale well to high dimensional settings.

In contrast to convex relaxation, a non-convex recovery method is proposed in this paper based on the Vectorized Hankel Lift [8] framework. More precisely, harnessing the low rank structure of the vectorized Hankel matrix associated with the signal through the Burer-Monteiro factorization, we develop a projected gradient descent algorithm, named PGD-VHL, that directly recovers the low rank factors. We show that such a simple algorithm possesses a remarkable reconstruction ability. Moreover, starting from a suitable initial guess, our algorithm converges linearly to the target matrix under a sample complexity similar to that of convex approaches.

The rest of this paper is organized as follows. We begin with the problem formulation and describe the proposed algorithm in Section 2. Section 3 provides a convergence analysis of PGD-VHL. Numerical experiments to illustrate the performance of PGD-VHL are provided in Section 4. Section 5 concludes this paper.

2 Algorithm

2.1 Problem formulation

The point source signal can be represented as a superposition of $r$ spikes

x(t) = \sum_{k=1}^{r} d_k \delta(t - \tau_k),

where $d_k$ and $\tau_k$ are the amplitude and location of the $k$-th point source, respectively. Let $\{g_k(t)\}_{k=1}^{r}$ be the unknown point spread functions. The observation is a convolution between $x(t)$ and $\{g_k(t)\}_{k=1}^{r}$,

y(t) = \sum_{k=1}^{r} d_k \delta(t - \tau_k) \ast g_k(t) = \sum_{k=1}^{r} d_k g_k(t - \tau_k).

After taking the Fourier transform and sampling, we obtain the measurements

y[j] = \sum_{k=1}^{r} d_k e^{-2\pi\imath \tau_k \cdot j}\, \hat{g}_k[j] \quad \text{ for } j = 0,\cdots,n-1. \qquad (2.1)

Let $\bm{g}_k = \begin{bmatrix}\hat{g}_k[0] & \cdots & \hat{g}_k[n-1]\end{bmatrix}^{\mathsf{T}}$ be the vector corresponding to the $k$-th unknown point spread function. The goal is to estimate $\{d_k, \tau_k\}_{k=1}^{r}$ as well as $\{\bm{g}_k\}_{k=1}^{r}$ from (2.1). Since the number of unknowns is larger than $n$, this problem is ill-posed without additional assumptions. Following the same route as in [4, 5, 6, 8], we assume that $\{\bm{g}_k\}_{k=1}^{r}$ belong to a known subspace spanned by the columns of $\bm{B}\in\mathbb{C}^{n\times s}$, i.e.,

\bm{g}_k = \bm{B}\bm{h}_k.

Then, under the subspace assumption and applying the lifting technique [9], the measurements (2.1) can be rewritten as linear observations of $\bm{X}^{\natural} := \sum_{k=1}^{r} d_k \bm{h}_k \bm{a}_{\tau_k}^{\mathsf{T}} \in \mathbb{C}^{s\times n}$:

y_j = \left\langle \bm{b}_j \bm{e}_j^{\mathsf{T}}, \bm{X}^{\natural} \right\rangle \quad \text{for} \quad j = 0,\cdots,n-1, \qquad (2.2)

where $\bm{b}_j$ is the $j$-th row of $\bm{B}$, $\bm{e}_j$ is the $(j+1)$-th standard basis vector of $\mathbb{R}^n$, and $\bm{a}_\tau \in \mathbb{C}^n$ is the vector defined as

\bm{a}_\tau = \begin{bmatrix} 1 & e^{-2\pi\imath\tau\cdot 1} & \cdots & e^{-2\pi\imath\tau\cdot(n-1)} \end{bmatrix}^{\mathsf{T}}.

The measurement model (2.2) can be rewritten succinctly as

\bm{y} = \mathcal{A}(\bm{X}^{\natural}), \qquad (2.3)

where $\mathcal{A}: \mathbb{C}^{s\times n}\rightarrow\mathbb{C}^{n}$ is the linear operator defined by (2.2). Therefore, blind super-resolution can be cast as the problem of recovering the data matrix $\bm{X}^{\natural}$ from its linear measurements (2.3).
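Since $\langle\bm{b}_j\bm{e}_j^{\mathsf{T}},\bm{X}\rangle = \bm{b}_j^{\mathsf{H}}\bm{X}\bm{e}_j$, applying $\mathcal{A}$ amounts to sampling the $j$-th column of $\bm{X}$ against $\bm{b}_j$, and the adjoint places $y_j\bm{b}_j$ in the $j$-th column. The following minimal NumPy sketch (function and variable names are ours, for illustration only) implements $\mathcal{A}$ and $\mathcal{A}^{\ast}$ and checks the adjoint identity numerically.

```python
import numpy as np

def A_op(X, B):
    """Forward operator: y[j] = <b_j e_j^T, X> = b_j^H X e_j, with b_j the j-th row of B."""
    # conj(B) has rows b_j^H; pair them with the corresponding columns of X.
    return np.einsum('js,sj->j', B.conj(), X)

def A_adj(y, B):
    """Adjoint operator: A*(y) = sum_j y_j * b_j e_j^T, an s-by-n matrix."""
    return (B * y[:, None]).T          # column j equals y_j * b_j

# Sanity check of <A(X), y> = <X, A*(y)>.
rng = np.random.default_rng(0)
s, n = 3, 8
B = (rng.standard_normal((n, s)) + 1j * rng.standard_normal((n, s))) / np.sqrt(2)
X = rng.standard_normal((s, n)) + 1j * rng.standard_normal((s, n))
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
assert np.allclose(np.vdot(A_op(X, B), y), np.vdot(X, A_adj(y, B)))
```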

Let $\mathcal{H}$ be the vectorized Hankel lifting operator which maps a matrix $\bm{X}\in\mathbb{C}^{s\times n}$ into an $sn_1\times n_2$ matrix,

\mathcal{H}(\bm{X}) = \begin{bmatrix} \bm{x}_0 & \bm{x}_1 & \cdots & \bm{x}_{n_2-1}\\ \bm{x}_1 & \bm{x}_2 & \cdots & \bm{x}_{n_2}\\ \vdots & \vdots & \ddots & \vdots\\ \bm{x}_{n_1-1} & \bm{x}_{n_1} & \cdots & \bm{x}_{n-1} \end{bmatrix} \in \mathbb{C}^{sn_1\times n_2},

where $\bm{x}_i$ is the $(i+1)$-th column of $\bm{X}$ and $n_1 + n_2 = n+1$. It is shown in [8] that $\mathcal{H}(\bm{X}^{\natural})$ is a rank-$r$ matrix, and thus $\mathcal{H}(\bm{X}^{\natural})$ admits a low rank structure when $r\ll\min(sn_1, n_2)$. It is natural to recover $\bm{X}^{\natural}$ by solving the rank-constrained least squares problem

\min_{\bm{X}}~\frac{1}{2}\left\|\bm{y}-\mathcal{A}(\bm{X})\right\|_2^2 \quad \text{ s.t. } \operatorname{rank}(\mathcal{H}(\bm{X})) = r.
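For illustration, the following NumPy sketch (again with our own function names) constructs $\mathcal{H}(\bm{X})$ explicitly and verifies the rank-$r$ property on a synthetic $\bm{X}^{\natural}=\sum_{k=1}^{r} d_k\bm{h}_k\bm{a}_{\tau_k}^{\mathsf{T}}$.

```python
import numpy as np

def vhankel(X, n1):
    """Vectorized Hankel lift: block row i of H(X) is [x_i, x_{i+1}, ..., x_{i+n2-1}]."""
    s, n = X.shape
    n2 = n + 1 - n1
    H = np.zeros((s * n1, n2), dtype=X.dtype)
    for i in range(n1):
        H[i * s:(i + 1) * s, :] = X[:, i:i + n2]
    return H

# Build a synthetic data matrix X^natural and check that its lift has rank r.
rng = np.random.default_rng(1)
s, n, r = 4, 32, 3
taus = rng.random(r)
d = rng.standard_normal(r) + 1j * rng.standard_normal(r)
h = rng.standard_normal((s, r)) + 1j * rng.standard_normal((s, r))
a = np.exp(-2j * np.pi * np.outer(taus, np.arange(n)))    # rows are a_{tau_k}^T
X_nat = (h * d) @ a                                        # s-by-n data matrix
print(np.linalg.matrix_rank(vhankel(X_nat, n1=n // 2)))    # prints 3 (= r)
```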

To introduce our algorithm, we assume that $\bm{X}^{\natural}$ is $\mu_1$-incoherent, as defined below.

Assumption 2.1.

Let $\mathcal{H}(\bm{X}^{\natural}) = \bm{U}\bm{\Sigma}\bm{V}^{\mathsf{H}}$ be the singular value decomposition of $\mathcal{H}(\bm{X}^{\natural})$, where $\bm{U}\in\mathbb{C}^{sn_1\times r}$, $\bm{\Sigma}\in\mathbb{R}^{r\times r}$ and $\bm{V}\in\mathbb{C}^{n_2\times r}$. Denote $\bm{U}=\begin{bmatrix}\bm{U}_0^{\mathsf{H}} & \cdots & \bm{U}_{n_1-1}^{\mathsf{H}}\end{bmatrix}^{\mathsf{H}}$, where $\bm{U}_\ell = \bm{U}[\ell s+1:(\ell+1)s, :]$ is the $\ell$-th block of $\bm{U}$ for $\ell = 0,\cdots,n_1-1$. The matrix $\bm{X}^{\natural}$ is $\mu_1$-incoherent if $\bm{U}$ and $\bm{V}$ obey

\max_{0\leq\ell\leq n_1-1}\left\|\bm{U}_\ell\right\|_{\mathsf{F}}^2 \leq \frac{\mu_1 r}{n} \quad \text{ and } \quad \max_{0\leq j\leq n_2-1}\left\|\bm{e}_j^{\mathsf{T}}\bm{V}\right\|_2^2 \leq \frac{\mu_1 r}{n}

for some positive constant μ1\mu_{1}.

Remark 2.1.

Assumption 2.1 is the same as the one made in [10, 11] for low rank matrix recovery and is used in blind super-resolution [8]. It has been established that Assumption 2.1 is obeyed when the minimum wrap-around distance between the locations of the point sources is greater than about $1/n$.

Letting $\bm{L}^{\natural} = \bm{U}\bm{\Sigma}^{1/2}$ and $\bm{R}^{\natural} = \bm{V}\bm{\Sigma}^{1/2}$, we have

\max_{0\leq\ell\leq n_1-1}\left\|\bm{L}_\ell^{\natural}\right\|_{\mathsf{F}} \leq \sqrt{\frac{\mu_1 r\sigma_1}{n}} \quad \text{ and } \quad \left\|\bm{R}^{\natural}\right\|_{2,\infty} \leq \sqrt{\frac{\mu_1 r\sigma_1}{n}},

where $\sigma_1 = \|\mathcal{H}(\bm{X}^{\natural})\|$. Since the target data matrix $\bm{X}^{\natural}$ is $\mu_1$-incoherent and the low rank structure of the vectorized Hankel matrix can be promoted through the factorization $\mathcal{H}(\bm{X}^{\natural}) = \bm{L}^{\natural}{\bm{R}^{\natural}}^{\mathsf{H}}$, it is natural to recover the low rank factors of the ground truth matrix $\mathcal{H}(\bm{X}^{\natural})$ by solving an optimization problem of the form

\min_{\bm{M}\in\mathcal{M}}~\bigg\{ f(\bm{M}) := \frac{1}{2}\left\|\bm{y}-\mathcal{A}\mathcal{H}^{\dagger}(\bm{L}\bm{R}^{\mathsf{H}})\right\|_2^2 + \frac{1}{2}\left\|\left(\mathcal{I}-\mathcal{H}\mathcal{H}^{\dagger}\right)(\bm{L}\bm{R}^{\mathsf{H}})\right\|_{\mathsf{F}}^2 + \frac{1}{16}\left\|\bm{L}^{\mathsf{H}}\bm{L}-\bm{R}^{\mathsf{H}}\bm{R}\right\|_{\mathsf{F}}^2 \bigg\}, \qquad (2.4)

where $\bm{M}=\begin{bmatrix}\bm{L}^{\mathsf{H}} & \bm{R}^{\mathsf{H}}\end{bmatrix}^{\mathsf{H}}\in\mathbb{C}^{(sn_1+n_2)\times r}$ and $\mathcal{H}^{\dagger}$ is the Moore-Penrose pseudoinverse of $\mathcal{H}$, obeying $\mathcal{H}^{\dagger}\mathcal{H}=\mathcal{I}$. The second term in the objective encourages $\bm{L}\bm{R}^{\mathsf{H}}$ to admit the vectorized Hankel structure. The last term penalizes the mismatch between $\bm{L}$ and $\bm{R}$, which is widely used in rectangular low rank matrix recovery [12, 13, 14]. The convex feasible set $\mathcal{M}$ is defined as follows:

\mathcal{M} = \bigg\{ \begin{bmatrix}\bm{L}\\ \bm{R}\end{bmatrix} : \max_{0\leq\ell\leq n_1-1}\left\|\bm{L}_\ell\right\|_{\mathsf{F}} \leq \sqrt{\frac{\mu r\sigma}{n}},\ \ \left\|\bm{R}\right\|_{2,\infty} \leq \sqrt{\frac{\mu r\sigma}{n}} \bigg\}, \qquad (2.5)

where $\bm{L}_\ell$ is the $\ell$-th block of $\bm{L}$, and $\mu$ and $\sigma$ are two absolute constants such that $\mu\geq\mu_1$ and $\sigma\geq\sigma_1$.
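Although $\mathcal{H}^{\dagger}$ is defined abstractly as the Moore-Penrose pseudoinverse, its action is simple: since $\mathcal{H}^{\ast}\mathcal{H}$ only scales the $t$-th column of an $s\times n$ matrix by the number of blocks lying on the $t$-th anti-diagonal, $\mathcal{H}^{\dagger}(\bm{Z})$ averages the $s\times 1$ blocks of $\bm{Z}$ along each block anti-diagonal. The following NumPy sketch (our code; helper names are ours) implements this and verifies $\mathcal{H}^{\dagger}\mathcal{H}=\mathcal{I}$; together with the operator $\mathcal{A}$ sketched above, each term of (2.4) can then be evaluated directly.

```python
import numpy as np

def vhankel(X, n1):
    """Vectorized Hankel lift H(X), as in the sketch above."""
    s, n = X.shape
    n2 = n + 1 - n1
    return np.vstack([X[:, i:i + n2] for i in range(n1)])

def vhankel_pinv(Z, s, n):
    """Moore-Penrose pseudoinverse H^dagger: average the s-by-1 blocks of Z
    along each anti-diagonal (i + j = t) into the t-th column of an s-by-n matrix."""
    n1, n2 = Z.shape[0] // s, Z.shape[1]
    X = np.zeros((s, n), dtype=Z.dtype)
    w = np.zeros(n)
    for i in range(n1):
        for j in range(n2):
            X[:, i + j] += Z[i * s:(i + 1) * s, j]
            w[i + j] += 1
    return X / w

# Verify H^dagger H = I on s-by-n matrices.
rng = np.random.default_rng(2)
s, n = 3, 16
X = rng.standard_normal((s, n)) + 1j * rng.standard_normal((s, n))
assert np.allclose(vhankel_pinv(vhankel(X, n // 2), s, n), X)
```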

2.2 Projected gradient descent method

Inspired by [15], we design a projected gradient descent method for the problem (2.4), which is summarized in Algorithm 1.

Algorithm 1 PGD-VHL
Input: $\mathcal{A}, \bm{y}, n, s, r$
Initialization:
    $n_1 = n/2$, $n_2 = n + 1 - n_1$
    $\widehat{\bm{U}}_0\widehat{\bm{\Sigma}}_0\widehat{\bm{V}}_0^{\mathsf{H}} = \mathcal{P}_r\mathcal{H}\mathcal{A}^{\ast}(\bm{y})$
    $\widehat{\bm{L}}_0 = \widehat{\bm{U}}_0\widehat{\bm{\Sigma}}_0^{1/2}$, $\widehat{\bm{R}}_0 = \widehat{\bm{V}}_0\widehat{\bm{\Sigma}}_0^{1/2}$
    $(\bm{L}_0, \bm{R}_0) = \mathcal{P}_{\mathcal{M}}((\widehat{\bm{L}}_0, \widehat{\bm{R}}_0))$
    $\bm{M}_0 = \begin{bmatrix}\bm{L}_0^{\mathsf{H}} & \bm{R}_0^{\mathsf{H}}\end{bmatrix}^{\mathsf{H}}$
while not converged do
    $\bm{M}_{t+1} = \mathcal{P}_{\mathcal{M}}\left(\bm{M}_t - \eta\nabla f(\bm{M}_t)\right)$
end while

The initialization involves two steps: (1) compute the best rank-$r$ approximation of $\mathcal{H}(\mathcal{A}^{\ast}(\bm{y}))$ via one step of hard thresholding $\mathcal{P}_r(\cdot)$, where $\mathcal{A}^{\ast}$ is the adjoint of $\mathcal{A}$; (2) project the low rank factors of the best rank-$r$ approximation onto the set $\mathcal{M}$. Given a matrix $\bm{M}=\begin{bmatrix}\bm{L}^{\mathsf{H}} & \bm{R}^{\mathsf{H}}\end{bmatrix}^{\mathsf{H}}$, the projection onto $\mathcal{M}$, denoted by $\begin{bmatrix}\widehat{\bm{L}}^{\mathsf{H}} & \widehat{\bm{R}}^{\mathsf{H}}\end{bmatrix}^{\mathsf{H}}$, has a closed form solution:

[\widehat{\bm{L}}]_\ell = \begin{cases}\bm{L}_\ell & \text{ if } \left\|\bm{L}_\ell\right\|_{\mathsf{F}} \leq \sqrt{\frac{\mu r\sigma}{n}}\\ \frac{\bm{L}_\ell}{\left\|\bm{L}_\ell\right\|_{\mathsf{F}}}\cdot\sqrt{\frac{\mu r\sigma}{n}} & \text{ otherwise}\end{cases}

for $0\leq\ell\leq n_1-1$, and

\bm{e}_j^{\mathsf{T}}\widehat{\bm{R}} = \begin{cases}\bm{e}_j^{\mathsf{T}}\bm{R} & \text{ if } \left\|\bm{e}_j^{\mathsf{T}}\bm{R}\right\|_2 \leq \sqrt{\frac{\mu r\sigma}{n}}\\ \frac{\bm{e}_j^{\mathsf{T}}\bm{R}}{\left\|\bm{e}_j^{\mathsf{T}}\bm{R}\right\|_2}\cdot\sqrt{\frac{\mu r\sigma}{n}} & \text{ otherwise}\end{cases}

for $0\leq j\leq n_2-1$. Let $\bm{M}_t$ be the current estimate. The algorithm updates $\bm{M}_t$ along the gradient descent direction $-\nabla f(\bm{M}_t)$ with step size $\eta$, followed by a projection onto the set $\mathcal{M}$. The gradient of $f(\bm{M})$, computed with respect to the Wirtinger calculus, is given by $\nabla f = \begin{bmatrix}\nabla^{\mathsf{H}}_{\bm{L}}f & \nabla^{\mathsf{H}}_{\bm{R}}f\end{bmatrix}^{\mathsf{H}}$, where

\nabla_{\bm{L}} f = \left(\mathcal{H}\mathcal{D}^{-2}\mathcal{A}^{\ast}\left(\mathcal{A}\mathcal{H}^{\dagger}(\bm{L}\bm{R}^{\mathsf{H}})-\bm{y}\right)\right)\bm{R} + \left(\left(\mathcal{I}-\mathcal{H}\mathcal{H}^{\dagger}\right)(\bm{L}\bm{R}^{\mathsf{H}})\right)\bm{R} + \frac{1}{4}\bm{L}\left(\bm{L}^{\mathsf{H}}\bm{L}-\bm{R}^{\mathsf{H}}\bm{R}\right),

\nabla_{\bm{R}} f = \left(\mathcal{H}\mathcal{D}^{-2}\mathcal{A}^{\ast}\left(\mathcal{A}\mathcal{H}^{\dagger}(\bm{L}\bm{R}^{\mathsf{H}})-\bm{y}\right)\right)^{\mathsf{H}}\bm{L} + \left(\left(\mathcal{I}-\mathcal{H}\mathcal{H}^{\dagger}\right)(\bm{L}\bm{R}^{\mathsf{H}})\right)^{\mathsf{H}}\bm{L} + \frac{1}{4}\bm{R}\left(\bm{R}^{\mathsf{H}}\bm{R}-\bm{L}^{\mathsf{H}}\bm{L}\right).

Here $\mathcal{D}^2 = \mathcal{H}^{\ast}\mathcal{H}$ is the operator which scales the $t$-th column of an $s\times n$ matrix by the number of blocks on the $t$-th anti-diagonal of the vectorized Hankel matrix, so that $(\mathcal{H}^{\dagger})^{\ast} = \mathcal{H}\mathcal{D}^{-2}$.
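For concreteness, a minimal NumPy sketch of the projection $\mathcal{P}_{\mathcal{M}}$ follows (function and variable names are ours); combined with the gradient formulas above, one iteration of PGD-VHL is simply $\bm{M}_{t+1}=\mathcal{P}_{\mathcal{M}}(\bm{M}_t-\eta\nabla f(\bm{M}_t))$.

```python
import numpy as np

def project_M(L, R, s, mu, r, sigma, n):
    """Project (L, R) onto the set M in (2.5): clip the Frobenius norm of each
    s-by-r block of L and the l2 norm of each row of R to sqrt(mu*r*sigma/n)."""
    bound = np.sqrt(mu * r * sigma / n)
    L_hat, R_hat = L.copy(), R.copy()
    n1 = L.shape[0] // s
    for ell in range(n1):
        blk = L_hat[ell * s:(ell + 1) * s, :]
        nrm = np.linalg.norm(blk, 'fro')
        if nrm > bound:
            L_hat[ell * s:(ell + 1) * s, :] = blk * (bound / nrm)
    row_norms = np.linalg.norm(R_hat, axis=1)
    scale = np.minimum(1.0, bound / np.maximum(row_norms, 1e-300))
    return L_hat, R_hat * scale[:, None]

# Small usage example with arbitrary constants.
rng = np.random.default_rng(4)
s, n1, n2, r = 4, 32, 33, 3
mu, sigma, n = 1.0, 1.0, n1 + n2 - 1
L = rng.standard_normal((s * n1, r)) + 1j * rng.standard_normal((s * n1, r))
R = rng.standard_normal((n2, r)) + 1j * rng.standard_normal((n2, r))
L_hat, R_hat = project_M(L, R, s, mu, r, sigma, n)
assert np.linalg.norm(R_hat, axis=1).max() <= np.sqrt(mu * r * sigma / n) + 1e-12
```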

To assess the computational cost of $\nabla f$, we first introduce some notation. Let $\mathcal{H}_v$ be the Hankel operator which maps a vector $\bm{x}\in\mathbb{C}^{1\times n}$ into an $n_1\times n_2$ matrix,

\mathcal{H}_v(\bm{x}) = \begin{bmatrix} x_1 & \cdots & x_{n_2}\\ \vdots & \ddots & \vdots\\ x_{n_1} & \cdots & x_n \end{bmatrix},

where $x_i$ is the $i$-th entry of $\bm{x}$. The adjoint of $\mathcal{H}_v$, denoted by $\mathcal{H}_v^{\ast}$, is a linear mapping from $\mathbb{C}^{n_1\times n_2}$ to $\mathbb{C}^{1\times n}$. It is known that the computational complexity of both $\mathcal{H}_v^{\ast}(\bm{L}_\ell\bm{R}^{\mathsf{H}})$ and $\left(\mathcal{H}_v(\bm{x})\right)\bm{R}$ is $\mathcal{O}(rn\log n)$ flops [16]. Moreover, the authors in [8] show that $\mathcal{H}(\bm{X}) = \bm{P}\widetilde{\mathcal{H}}(\bm{X})$, where $\widetilde{\mathcal{H}}(\bm{X})$ is a matrix constructed by stacking all $\{\mathcal{H}_v(\bm{e}_\ell^{\mathsf{T}}\bm{X})\}_{\ell=1}^{s}$ on top of one another, and $\bm{P}$ is a permutation matrix. Therefore we can compute $\mathcal{H}^{\dagger}(\bm{L}\bm{R}^{\mathsf{H}})$ and $\mathcal{H}(\bm{X})\bm{R}$ using $\mathcal{O}(srn\log n)$ flops. Thus the implementation of our algorithm is efficient, and the main computational complexity of each iteration is $\mathcal{O}(sr^2 n + srn\log(n))$.
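The $\mathcal{O}(rn\log n)$ count stems from the fact that a Hankel matrix-vector product is a correlation, which the FFT computes in $\mathcal{O}(n\log n)$ time. The following sketch (our code, not the implementation of [16]) carries this out for a single vector and checks the result against the dense construction; applying it column by column to $\bm{R}$ and over the $s$ rows of $\bm{X}$ gives the stated complexity.

```python
import numpy as np

def hankel_vec(x, n1):
    """Dense n1-by-n2 Hankel matrix H_v(x) with H_v(x)[i, j] = x[i + j] (0-indexed)."""
    n = x.size
    n2 = n + 1 - n1
    return np.array([x[i:i + n2] for i in range(n1)])

def hankel_matvec_fft(x, r_vec, n1):
    """Compute H_v(x) @ r_vec in O(n log n): (H_v(x) r)[i] = sum_j x[i + j] r[j],
    which is a slice of the full convolution of x with the reversed r."""
    n = x.size
    n2 = n + 1 - n1
    L = n + n2 - 1                                   # length of the full linear convolution
    c = np.fft.ifft(np.fft.fft(x, L) * np.fft.fft(r_vec[::-1], L))
    return c[n2 - 1:n2 - 1 + n1]

rng = np.random.default_rng(3)
n = 33
n1 = n // 2
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
r_vec = rng.standard_normal(n + 1 - n1) + 1j * rng.standard_normal(n + 1 - n1)
assert np.allclose(hankel_vec(x, n1) @ r_vec, hankel_matvec_fft(x, r_vec, n1))
```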

3 Main results

In this section, we provide an analysis of PGD-VHL under a random subspace model.

Assumption 3.1.

The vectors $\{\bm{b}_j\}_{j=0}^{n-1}$, formed by the rows of $\bm{B}$ and viewed as column vectors in $\mathbb{C}^s$, are independently and identically drawn from a distribution $F$ which satisfies the following conditions:

\mathbb{E}\left\{\bm{b}_j\bm{b}_j^{\mathsf{H}}\right\} = \bm{I}_s, \quad j = 0,\cdots,n-1, \qquad (3.1)

\max_{0\leq\ell\leq s-1}|\bm{b}_j[\ell]|^2 \leq \mu_0, \quad j = 0,\cdots,n-1. \qquad (3.2)
Remark 3.1.

Assumption 3.1 is a standard assumption in blind super-resolution [4, 5, 6, 17, 8], and it holds with $\mu_0=1$ for many common random ensembles; for instance, when the components of $\bm{b}_j$ are Rademacher random variables taking the values $\pm 1$ with equal probability, or when $\bm{b}_j$ is uniformly sampled from the rows of a Discrete Fourier Transform (DFT) matrix.
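As a quick numerical illustration (our code), for the DFT ensemble the rows of $\bm{B}$ have unit-modulus entries, so (3.2) holds with $\mu_0=1$, and averaging $\bm{b}_j\bm{b}_j^{\mathsf{H}}$ over the rows recovers $\bm{I}_s$ as required by (3.1).

```python
import numpy as np

n, s = 1024, 4
# Rows b_j: entries of s columns of the n-by-n DFT matrix, so |b_j[l]| = 1.
B = np.exp(-2j * np.pi * np.outer(np.arange(n), np.arange(s)) / n)

print(np.max(np.abs(B) ** 2))        # (3.2): mu_0 = 1.0
cov = (B.T @ B.conj()) / n           # (1/n) * sum_j b_j b_j^H
print(np.allclose(cov, np.eye(s)))   # (3.1): True, the average equals I_s
```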

Now we present the main result of the paper.

Theorem 3.1.

Let $\mu\geq\mu_1$ and $\sigma = \sigma_1(\widehat{\bm{\Sigma}}_0)/(1-\varepsilon)$ for $0\leq\varepsilon\leq 1/3$. Let $\eta\leq\frac{\sigma_r}{4500(\mu_0\mu sr)^2\sigma^2}$ and $\bm{M}^{\natural} = \begin{bmatrix}{\bm{L}^{\natural}}^{\mathsf{H}} & {\bm{R}^{\natural}}^{\mathsf{H}}\end{bmatrix}^{\mathsf{H}}$. Suppose $\bm{X}^{\natural}$ obeys Assumption 2.1 and the subspace matrix $\bm{B}$ satisfies Assumption 3.1. If

n \geq c_0\varepsilon^{-2}\mu_0^2\mu s^2 r^2\kappa^2\log^2(sn),

then with probability at least $1-c_1(sn)^{-c_2}$, the sequence $\{\bm{M}_t\}$ returned by Algorithm 1 satisfies

\operatorname{dist}^2(\bm{M}_t, \bm{M}^{\natural}) \leq (1-\eta\sigma_r)^t\cdot\frac{\varepsilon^2\sigma_r}{\mu_0 s}, \qquad (3.3)

where $c_0, c_1, c_2$ are absolute constants, $\sigma_r = \sigma_r(\mathcal{H}(\bm{X}^{\natural}))$, $\kappa$ is the condition number of $\mathcal{H}(\bm{X}^{\natural})$, and the distance $\operatorname{dist}(\bm{M}, \bm{M}^{\natural})$ is defined as

\operatorname{dist}(\bm{M}, \bm{M}^{\natural}) = \min_{\bm{Q}\bm{Q}^{\mathsf{T}}=\bm{Q}^{\mathsf{T}}\bm{Q}=\bm{I}_r}\left\|\bm{M}-\bm{M}^{\natural}\bm{Q}\right\|_{\mathsf{F}}.
Remark 3.2.

The detailed proof of Theorem 3.1 is provided in [18]. Compared with the sample complexity established in [8] for the nuclear norm minimization method, which is $n\gtrsim\mu_0\mu_1 sr\log^4(sn)$, Theorem 3.1 implies that PGD-VHL is sub-optimal in terms of $s$ and $r$. We suspect that this is merely an artifact of our proof.

Remark 3.3.

Theorem 3.1 implies that PGD-VHL converges to $\bm{M}^{\natural}$ at a linear rate. Therefore, after $T=\mathcal{O}((\mu_0\mu sr\kappa)^2\log(1/\epsilon))$ iterations, we have $\operatorname{dist}^2(\bm{M}_T, \bm{M}^{\natural})\leq\epsilon\cdot\operatorname{dist}^2(\bm{M}_0, \bm{M}^{\natural})$. Given the iterate $\bm{M}_T$ returned by PGD-VHL, we can estimate $\bm{X}^{\natural}$ by $\bm{X}_T = \mathcal{H}^{\dagger}(\bm{L}_T\bm{R}_T^{\mathsf{H}})$.

Remark 3.4.

Once the data matrix $\bm{X}^{\natural}$ is recovered, the locations $\{\tau_k\}_{k=1}^{r}$ can be computed from $\bm{X}^{\natural}$ by the MUSIC algorithm, and the weights $\{d_k, \bm{h}_k\}_{k=1}^{r}$ can be estimated by solving an overdetermined linear system [8].

4 Numerical simulations

In this section, we provide numerical results to illustrate the performance of PGD-VHL. The locations $\{\tau_k\}$ of the point source signal are randomly generated from $[0,1)$ and the amplitudes $\{d_k\}$ are set to $d_k = (1+10^{c_k})e^{-\imath\phi_k}$, where $c_k$ is uniformly sampled from $[0,1]$ and $\phi_k$ is uniformly sampled from $[0,2\pi)$. The coefficients $\{\bm{h}_k\}_{k=1}^{r}$ are sampled i.i.d. from the standard Gaussian distribution and then normalized. The columns of $\bm{B}$ are uniformly sampled from the DFT matrix. The stepsize of PGD-VHL is chosen via backtracking line search, and PGD-VHL is terminated when $\|\bm{y}-\mathcal{A}(\bm{X}_t)\|_2\leq 10^{-5}$ or a maximum number of iterations is reached. We repeat 20 random trials and record the probability of successful recovery. A trial is declared successful if $\|\bm{X}_t-\bm{X}^{\natural}\|_{\mathsf{F}}/\|\bm{X}^{\natural}\|_{\mathsf{F}}\leq 10^{-3}$.
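A sketch of the data generation described above follows (our code; details such as the exact normalization of $\bm{h}_k$ and the column sampling of $\bm{B}$ are assumptions made for illustration).

```python
import numpy as np

def generate_data(n, s, r, rng):
    """Generate a random instance of the blind super-resolution model of Section 2.1."""
    taus = rng.random(r)                                   # locations in [0, 1)
    c = rng.random(r)
    phi = rng.random(r) * 2 * np.pi
    d = (1 + 10 ** c) * np.exp(-1j * phi)                  # amplitudes d_k = (1 + 10^{c_k}) e^{-i phi_k}
    h = rng.standard_normal((s, r)) + 1j * rng.standard_normal((s, r))
    h = h / np.linalg.norm(h, axis=0)                      # normalized subspace coefficients h_k
    F = np.exp(-2j * np.pi * np.outer(np.arange(n), np.arange(n)) / n)
    B = F[:, rng.choice(n, s, replace=False)]              # columns sampled from the DFT matrix
    a = np.exp(-2j * np.pi * np.outer(taus, np.arange(n))) # rows are a_{tau_k}^T
    X_nat = (h * d) @ a                                    # ground truth s-by-n matrix X^natural
    y = np.einsum('js,sj->j', B.conj(), X_nat)             # measurements y_j = b_j^H X e_j
    return B, X_nat, y

B, X_nat, y = generate_data(n=64, s=4, r=4, rng=np.random.default_rng(0))
```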

The first experiment studies the recovery ability of PGD-VHL through the framework of phase transition, and we compare it with two convex recovery methods: VHL [8] and ANM [5]. Both VHL and ANM are solved by CVX [19]. The tests are conducted with $n=64$ and varying $s$ and $r$. Figures 1(a), 1(c) and 1(e) show the phase transitions of VHL, ANM and PGD-VHL when the locations of the point sources are randomly generated, and Figures 1(b), 1(d) and 1(f) illustrate the phase transitions of VHL, ANM and PGD-VHL when the separation condition $\Delta := \min_{j\neq k}|\tau_j-\tau_k|\geq 1/n$ is imposed. It can be seen that PGD-VHL is less sensitive to the separation condition than ANM and has a higher phase transition curve than VHL.


Fig. 1: The phase transitions of VHL (a,b), ANM (c,d) and PGD-VHL (e,f) with (Left) or without (Right) imposing the separation condition. Here we fix $n=64$. The red curve plots the hyperbola $rs=20$.

In the second experiment, we study the phase transition of PGD-VHL when one of $r$ and $s$ is fixed. Figure 2(a) indicates an approximately linear relationship between $s$ and $n$ for successful recovery when the number of point sources is fixed to $r=4$. The same linear relationship between $r$ and $n$ can be observed when the dimension of the subspace is fixed to $s=4$; see Figure 2(b). Therefore, there exists a gap between our theory and the empirical observations, which we leave as future work.


Fig. 2: (a) The phase transition of PGD-VHL for varying $n$ and $s$ when $r=4$. The red line plots the straight line $n=2.5s$. (b) The phase transition of PGD-VHL for varying $n$ and $r$ when $s=4$. The red line plots the straight line $n=2.5r$.

In the third simulation, we investigate the convergence rate of PGD-VHL for different values of $n$. The results are shown in Figure 3. The $y$-axis denotes $\log\left(\|\bm{X}_t-\bm{X}^{\natural}\|_{\mathsf{F}}/\|\bm{X}^{\natural}\|_{\mathsf{F}}\right)$ and the $x$-axis represents the iteration number. It can be clearly seen that PGD-VHL converges linearly, as predicted by our main theorem.


Fig. 3: Convergence of PGD-VHL for $n=256, 512, 1024$. Here we fix $s=4$ and $r=4$.

5 Conclusion

In this paper, we propose an efficient algorithm named PGD-VHL for recovering the low rank matrix arising in blind super-resolution. Our theoretical analysis shows that the proposed algorithm converges linearly to the target matrix. This is also demonstrated by our numerical simulations.

References

  • [1] Sean Quirin, Sri Rama Prasanna Pavani, and Rafael Piestun, “Optimal 3D single-molecule localization for superresolution microscopy with aberrations and engineered point spread functions,” Proceedings of the National Academy of Sciences, vol. 109, no. 3, pp. 675–679, 2012.
  • [2] Xiaobo Qu, Maxim Mayzel, Jian-Feng Cai, Zhong Chen, and Vladislav Orekhov, “Accelerated NMR spectroscopy with low-rank reconstruction,” Angewandte Chemie International Edition, vol. 54, no. 3, pp. 852–854, 2015.
  • [3] Xiliang Luo and Georgios B Giannakis, “Low-complexity blind synchronization and demodulation for (ultra-) wideband multi-user ad hoc access,” IEEE Transactions on Wireless Communications, vol. 5, no. 7, pp. 1930–1941, 2006.
  • [4] Yuejie Chi, “Guaranteed blind sparse spikes deconvolution via lifting and convex optimization,” IEEE Journal of Selected Topics in Signal Processing, vol. 10, no. 4, pp. 782–794, 2016.
  • [5] Dehui Yang, Gongguo Tang, and Michael B Wakin, “Super-resolution of complex exponentials from modulations with unknown waveforms,” IEEE Transactions on Information Theory, vol. 62, no. 10, pp. 5809–5830, 2016.
  • [6] Shuang Li, Michael B Wakin, and Gongguo Tang, “Atomic norm denoising for complex exponentials with unknown waveform modulations,” IEEE Transactions on Information Theory, vol. 66, no. 6, pp. 3893–3913, 2019.
  • [7] Mohamed A Suliman and Wei Dai, “Mathematical theory of atomic norm denoising in blind two-dimensional super-resolution,” IEEE Transactions on Signal Processing, vol. 69, pp. 1681–1696, 2021.
  • [8] Jinchi Chen, Weiguo Gao, Sihan Mao, and Ke Wei, “Vectorized Hankel lift: A convex approach for blind super-resolution of point sources,” arXiv preprint arXiv:2008.05092, 2020.
  • [9] Ali Ahmed, Benjamin Recht, and Justin Romberg, “Blind deconvolution using convex programming,” IEEE Transactions on Information Theory, vol. 60, no. 3, pp. 1711–1732, 2013.
  • [10] Emmanuel J Candès and Benjamin Recht, “Exact matrix completion via convex optimization,” Foundations of Computational Mathematics, vol. 9, no. 6, pp. 717, 2009.
  • [11] Shuai Zhang, Yingshuai Hao, Meng Wang, and Joe H Chow, “Multichannel Hankel matrix completion through nonconvex optimization,” IEEE Journal of Selected Topics in Signal Processing, vol. 12, no. 4, pp. 617–632, 2018.
  • [12] Stephen Tu, Ross Boczar, Max Simchowitz, Mahdi Soltanolkotabi, and Ben Recht, “Low-rank solutions of linear matrix equations via Procrustes flow,” in International Conference on Machine Learning. PMLR, 2016, pp. 964–973.
  • [13] Qinqing Zheng and John Lafferty, “Convergence analysis for rectangular matrix completion using Burer-Monteiro factorization and gradient descent,” arXiv preprint arXiv:1605.07051, 2016.
  • [14] Yuejie Chi, Yue M Lu, and Yuxin Chen, “Nonconvex optimization meets low-rank matrix factorization: An overview,” IEEE Transactions on Signal Processing, vol. 67, no. 20, pp. 5239–5269, 2019.
  • [15] Jian-Feng Cai, Tianming Wang, and Ke Wei, “Spectral compressed sensing via projected gradient descent,” SIAM Journal on Optimization, vol. 28, no. 3, pp. 2625–2653, 2018.
  • [16] Jian-Feng Cai, Tianming Wang, and Ke Wei, “Fast and provable algorithms for spectrally sparse signal reconstruction via low-rank Hankel matrix completion,” Applied and Computational Harmonic Analysis, vol. 46, no. 1, pp. 94–121, 2019.
  • [17] Mohamed A Suliman and Wei Dai, “Blind two-dimensional super-resolution and its performance guarantee,” arXiv preprint arXiv:1811.02070, 2018.
  • [18] Sihan Mao and Jinchi Chen, “Fast blind super-resolution of point sources via projected gradient descent,” Under Preparation, 2021.
  • [19] Michael Grant and Stephen Boyd, “CVX: Matlab software for disciplined convex programming, version 2.1,” Mar. 2014.