
Blind Super-resolution via Projected Gradient Descent

Abstract

Blind super-resolution can be cast as a low rank matrix recovery problem by exploiting the inherent simplicity of the signal. In this paper, we develop a simple yet efficient non-convex method for this problem based on the low rank structure of the vectorized Hankel matrix associated with the target matrix. Theoretical guarantees are established under conditions similar to those of convex approaches. Numerical experiments are also conducted to demonstrate its performance.

Index Terms—  Blind super-resolution, non-convex optimization, projected gradient descent, vectorized Hankel lift.

1 Introduction

Blind super-resolution of point sources is the problem of simultaneously estimating the locations and amplitudes of point sources and the point spread functions from low-resolution measurements. This problem arises in various applications, including single-molecule imaging [1], medical imaging [2], multi-user communication systems [3], and so on.

Under a certain subspace assumption and after applying the lifting technique, blind super-resolution can be cast as a matrix recovery problem. Recent works in [4, 5, 6, 7, 8] exploit the intrinsic structures of the data matrix and propose different convex relaxation methods for this problem. Theoretical guarantees for these methods have been established. However, due to the limitations of convex relaxation, these methods do not scale well to high dimensional settings.

In contrast to convex relaxation, a non-convex recovery method is proposed in this paper based on the Vectorized Hankel Lift [8] framework. More precisely, harnessing the low rank structure of the vectorized Hankel matrix associated with the signal through the Burer-Monteiro factorization, we develop a projected gradient descent algorithm, named PGD-VHL, that directly recovers the low rank factors. We show that such a simple algorithm possesses a remarkable reconstruction ability. Moreover, starting from a suitable initial guess, our algorithm converges linearly to the target matrix under a sample complexity similar to that of convex approaches.

The rest of this paper is organized as follows. We begin with the problem formulation and describe the proposed algorithm in Section 2. Section 3 provides a convergence analysis of PGD-VHL. Numerical experiments to illustrate the performance of PGD-VHL are provided in Section 4. Section 5 concludes this paper.

2 Algorithm

2.1 Problem formulation

The point source signal can be represented as a superposition of $r$ spikes

x(t) = \sum_{k=1}^{r} d_k \delta(t - \tau_k),

where $d_k$ and $\tau_k$ are the amplitude and location of the $k$-th point source, respectively. Let $\{g_k(t)\}_{k=1}^{r}$ be the unknown point spread functions. The observation is a convolution between $x(t)$ and $\{g_k(t)\}_{k=1}^{r}$,

y(t) = \sum_{k=1}^{r} d_k \delta(t - \tau_k) \ast g_k(t) = \sum_{k=1}^{r} d_k g_k(t - \tau_k).

After taking the Fourier transform and sampling, we obtain the measurements

y[j] = \sum_{k=1}^{r} d_k e^{-2\pi\imath \tau_k \cdot j}\, \hat{g}_k[j] \quad \text{ for } j = 0,\cdots,n-1. \qquad (2.1)

Let $\bm{g}_k = \begin{bmatrix}\hat{g}_k[0] & \cdots & \hat{g}_k[n-1]\end{bmatrix}^{\mathsf{T}}$ be the vector corresponding to the $k$-th unknown point spread function. The goal is to estimate $\{d_k, \tau_k\}_{k=1}^{r}$ as well as $\{\bm{g}_k\}_{k=1}^{r}$ from (2.1). Since the number of unknowns is larger than $n$, this problem is ill-posed without additional assumptions. Following the same route as in [4, 5, 6, 8], we assume that $\{\bm{g}_k\}_{k=1}^{r}$ belong to a known subspace spanned by the columns of $\bm{B}\in\mathbb{C}^{n\times s}$, i.e.,

\bm{g}_k = \bm{B}\bm{h}_k.

Then, under the subspace assumption and applying the lifting technique [9], the measurements (2.1) can be rewritten as linear observations of $\bm{X}^{\natural} := \sum_{k=1}^{r} d_k \bm{h}_k \bm{a}_{\tau_k}^{\mathsf{T}} \in \mathbb{C}^{s\times n}$:

y_j = \left\langle \bm{b}_j \bm{e}_j^{\mathsf{T}}, \bm{X}^{\natural} \right\rangle \quad \text{for} \quad j = 0,\cdots,n-1, \qquad (2.2)

where $\bm{b}_j$ is the $j$-th row of $\bm{B}$, $\bm{e}_j$ is the $(j+1)$-th standard basis vector of $\mathbb{R}^n$, and $\bm{a}_\tau \in \mathbb{C}^n$ is the vector defined as

\bm{a}_\tau = \begin{bmatrix} 1 & e^{-2\pi\imath\tau\cdot 1} & \cdots & e^{-2\pi\imath\tau\cdot(n-1)} \end{bmatrix}^{\mathsf{T}}.

The measurement model (2.2) can be rewritten succinctly as

\bm{y} = \mathcal{A}(\bm{X}^{\natural}), \qquad (2.3)

where $\mathcal{A}: \mathbb{C}^{s\times n}\rightarrow\mathbb{C}^{n}$ is the linear operator defined by (2.2). Therefore, blind super-resolution can be cast as the problem of recovering the data matrix $\bm{X}^{\natural}$ from its linear measurements (2.3).
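Since $\langle\bm{b}_j\bm{e}_j^{\mathsf{T}},\bm{X}\rangle = \bm{b}_j^{\mathsf{H}}\bm{X}\bm{e}_j$, applying $\mathcal{A}$ amounts to sampling the $j$-th column of $\bm{X}$ against $\bm{b}_j$, and the adjoint places $y_j\bm{b}_j$ in the $j$-th column. The following minimal NumPy sketch (function and variable names are ours, for illustration only) implements $\mathcal{A}$ and $\mathcal{A}^{\ast}$ and checks the adjoint identity numerically.

```python
import numpy as np

def A_op(X, B):
    """Forward operator: y[j] = <b_j e_j^T, X> = b_j^H X e_j, with b_j the j-th row of B."""
    # conj(B) has rows b_j^H; pair them with the corresponding columns of X.
    return np.einsum('js,sj->j', B.conj(), X)

def A_adj(y, B):
    """Adjoint operator: A*(y) = sum_j y_j * b_j e_j^T, an s-by-n matrix."""
    return (B * y[:, None]).T          # column j equals y_j * b_j

# Sanity check of <A(X), y> = <X, A*(y)>.
rng = np.random.default_rng(0)
s, n = 3, 8
B = (rng.standard_normal((n, s)) + 1j * rng.standard_normal((n, s))) / np.sqrt(2)
X = rng.standard_normal((s, n)) + 1j * rng.standard_normal((s, n))
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
assert np.allclose(np.vdot(A_op(X, B), y), np.vdot(X, A_adj(y, B)))
```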

Let $\mathcal{H}$ be the vectorized Hankel lifting operator which maps a matrix $\bm{X}\in\mathbb{C}^{s\times n}$ into an $sn_1\times n_2$ matrix,

\mathcal{H}(\bm{X}) = \begin{bmatrix} \bm{x}_0 & \bm{x}_1 & \cdots & \bm{x}_{n_2-1}\\ \bm{x}_1 & \bm{x}_2 & \cdots & \bm{x}_{n_2}\\ \vdots & \vdots & \ddots & \vdots\\ \bm{x}_{n_1-1} & \bm{x}_{n_1} & \cdots & \bm{x}_{n-1} \end{bmatrix} \in \mathbb{C}^{sn_1\times n_2},

where $\bm{x}_i$ is the $(i+1)$-th column of $\bm{X}$ and $n_1 + n_2 = n+1$. It is shown in [8] that $\mathcal{H}(\bm{X}^{\natural})$ is a rank-$r$ matrix, and thus $\mathcal{H}(\bm{X}^{\natural})$ admits a low rank structure when $r\ll\min(sn_1, n_2)$. It is natural to recover $\bm{X}^{\natural}$ by solving the rank-constrained least squares problem

\min_{\bm{X}}~\frac{1}{2}\left\|\bm{y}-\mathcal{A}(\bm{X})\right\|_2^2 \quad \text{ s.t. } \operatorname{rank}(\mathcal{H}(\bm{X})) = r.
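For illustration, the following NumPy sketch (again with our own function names) constructs $\mathcal{H}(\bm{X})$ explicitly and verifies the rank-$r$ property on a synthetic $\bm{X}^{\natural}=\sum_{k=1}^{r} d_k\bm{h}_k\bm{a}_{\tau_k}^{\mathsf{T}}$.

```python
import numpy as np

def vhankel(X, n1):
    """Vectorized Hankel lift: block row i of H(X) is [x_i, x_{i+1}, ..., x_{i+n2-1}]."""
    s, n = X.shape
    n2 = n + 1 - n1
    H = np.zeros((s * n1, n2), dtype=X.dtype)
    for i in range(n1):
        H[i * s:(i + 1) * s, :] = X[:, i:i + n2]
    return H

# Build a synthetic data matrix X^natural and check that its lift has rank r.
rng = np.random.default_rng(1)
s, n, r = 4, 32, 3
taus = rng.random(r)
d = rng.standard_normal(r) + 1j * rng.standard_normal(r)
h = rng.standard_normal((s, r)) + 1j * rng.standard_normal((s, r))
a = np.exp(-2j * np.pi * np.outer(taus, np.arange(n)))    # rows are a_{tau_k}^T
X_nat = (h * d) @ a                                        # s-by-n data matrix
print(np.linalg.matrix_rank(vhankel(X_nat, n1=n // 2)))    # prints 3 (= r)
```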

To introduce our algorithm, we assume that $\bm{X}^{\natural}$ is $\mu_1$-incoherent, as defined below.

Assumption 2.1.

Let $\mathcal{H}(\bm{X}^{\natural}) = \bm{U}\bm{\Sigma}\bm{V}^{\mathsf{H}}$ be the singular value decomposition of $\mathcal{H}(\bm{X}^{\natural})$, where $\bm{U}\in\mathbb{C}^{sn_1\times r}$, $\bm{\Sigma}\in\mathbb{R}^{r\times r}$ and $\bm{V}\in\mathbb{C}^{n_2\times r}$. Denote $\bm{U}=\begin{bmatrix}\bm{U}_0^{\mathsf{H}} & \cdots & \bm{U}_{n_1-1}^{\mathsf{H}}\end{bmatrix}^{\mathsf{H}}$, where $\bm{U}_\ell = \bm{U}[\ell s+1:(\ell+1)s, :]$ is the $\ell$-th block of $\bm{U}$ for $\ell = 0,\cdots,n_1-1$. The matrix $\bm{X}^{\natural}$ is $\mu_1$-incoherent if $\bm{U}$ and $\bm{V}$ obey

\max_{0\leq\ell\leq n_1-1}\left\|\bm{U}_\ell\right\|_{\mathsf{F}}^2 \leq \frac{\mu_1 r}{n} \quad \text{ and } \quad \max_{0\leq j\leq n_2-1}\left\|\bm{e}_j^{\mathsf{T}}\bm{V}\right\|_2^2 \leq \frac{\mu_1 r}{n}

for some positive constant μ1\mu_{1}.

Remark 2.1.

Assumption 2.1 is the same as the one made in [10, 11] for low rank matrix recovery and is used in blind super-resolution [8]. It has been established that Assumption 2.1 is obeyed when the minimum wrap-around distance between the locations of the point sources is greater than about $1/n$.

Letting $\bm{L}^{\natural} = \bm{U}\bm{\Sigma}^{1/2}$ and $\bm{R}^{\natural} = \bm{V}\bm{\Sigma}^{1/2}$, we have

\max_{0\leq\ell\leq n_1-1}\left\|\bm{L}_\ell^{\natural}\right\|_{\mathsf{F}} \leq \sqrt{\frac{\mu_1 r\sigma_1}{n}} \quad \text{ and } \quad \left\|\bm{R}^{\natural}\right\|_{2,\infty} \leq \sqrt{\frac{\mu_1 r\sigma_1}{n}},

where $\sigma_1 = \|\mathcal{H}(\bm{X}^{\natural})\|$. Since the target data matrix $\bm{X}^{\natural}$ is $\mu_1$-incoherent and the low rank structure of the vectorized Hankel matrix can be promoted through the factorization $\mathcal{H}(\bm{X}^{\natural}) = \bm{L}^{\natural}{\bm{R}^{\natural}}^{\mathsf{H}}$, it is natural to recover the low rank factors of the ground truth matrix $\mathcal{H}(\bm{X}^{\natural})$ by solving an optimization problem of the form

\min_{\bm{M}\in\mathcal{M}}~\bigg\{ f(\bm{M}) := \frac{1}{2}\left\|\bm{y}-\mathcal{A}\mathcal{H}^{\dagger}(\bm{L}\bm{R}^{\mathsf{H}})\right\|_2^2 + \frac{1}{2}\left\|\left(\mathcal{I}-\mathcal{H}\mathcal{H}^{\dagger}\right)(\bm{L}\bm{R}^{\mathsf{H}})\right\|_{\mathsf{F}}^2 + \frac{1}{16}\left\|\bm{L}^{\mathsf{H}}\bm{L}-\bm{R}^{\mathsf{H}}\bm{R}\right\|_{\mathsf{F}}^2 \bigg\}, \qquad (2.4)

where $\bm{M}=\begin{bmatrix}\bm{L}^{\mathsf{H}} & \bm{R}^{\mathsf{H}}\end{bmatrix}^{\mathsf{H}}\in\mathbb{C}^{(sn_1+n_2)\times r}$ and $\mathcal{H}^{\dagger}$ is the Moore-Penrose pseudoinverse of $\mathcal{H}$, obeying $\mathcal{H}^{\dagger}\mathcal{H}=\mathcal{I}$. The second term in the objective encourages $\bm{L}\bm{R}^{\mathsf{H}}$ to admit the vectorized Hankel structure. The last term penalizes the mismatch between $\bm{L}$ and $\bm{R}$, which is widely used in rectangular low rank matrix recovery [12, 13, 14]. The convex feasible set $\mathcal{M}$ is defined as follows:

\mathcal{M} = \bigg\{ \begin{bmatrix}\bm{L}\\ \bm{R}\end{bmatrix} : \max_{0\leq\ell\leq n_1-1}\left\|\bm{L}_\ell\right\|_{\mathsf{F}} \leq \sqrt{\frac{\mu r\sigma}{n}},\ \ \left\|\bm{R}\right\|_{2,\infty} \leq \sqrt{\frac{\mu r\sigma}{n}} \bigg\}, \qquad (2.5)

where $\bm{L}_\ell$ is the $\ell$-th block of $\bm{L}$, and $\mu$ and $\sigma$ are two absolute constants such that $\mu\geq\mu_1$ and $\sigma\geq\sigma_1$.
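Although $\mathcal{H}^{\dagger}$ is defined abstractly as the Moore-Penrose pseudoinverse, its action is simple: since $\mathcal{H}^{\ast}\mathcal{H}$ only scales the $t$-th column of an $s\times n$ matrix by the number of blocks lying on the $t$-th anti-diagonal, $\mathcal{H}^{\dagger}(\bm{Z})$ averages the $s\times 1$ blocks of $\bm{Z}$ along each block anti-diagonal. The following NumPy sketch (our code; helper names are ours) implements this and verifies $\mathcal{H}^{\dagger}\mathcal{H}=\mathcal{I}$; together with the operator $\mathcal{A}$ sketched above, each term of (2.4) can then be evaluated directly.

```python
import numpy as np

def vhankel(X, n1):
    """Vectorized Hankel lift H(X), as in the sketch above."""
    s, n = X.shape
    n2 = n + 1 - n1
    return np.vstack([X[:, i:i + n2] for i in range(n1)])

def vhankel_pinv(Z, s, n):
    """Moore-Penrose pseudoinverse H^dagger: average the s-by-1 blocks of Z
    along each anti-diagonal (i + j = t) into the t-th column of an s-by-n matrix."""
    n1, n2 = Z.shape[0] // s, Z.shape[1]
    X = np.zeros((s, n), dtype=Z.dtype)
    w = np.zeros(n)
    for i in range(n1):
        for j in range(n2):
            X[:, i + j] += Z[i * s:(i + 1) * s, j]
            w[i + j] += 1
    return X / w

# Verify H^dagger H = I on s-by-n matrices.
rng = np.random.default_rng(2)
s, n = 3, 16
X = rng.standard_normal((s, n)) + 1j * rng.standard_normal((s, n))
assert np.allclose(vhankel_pinv(vhankel(X, n // 2), s, n), X)
```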

2.2 Projected gradient descent method

Inspired by [15], we design a projected gradient descent method for the problem (2.4), which is summarized in Algorithm 1.

Algorithm 1 PGD-VHL
Input: $\mathcal{A}, \bm{y}, n, s, r$
Initialization:
    $n_1 = n/2$, $n_2 = n + 1 - n_1$
    $\widehat{\bm{U}}_0\widehat{\bm{\Sigma}}_0\widehat{\bm{V}}_0^{\mathsf{H}} = \mathcal{P}_r\mathcal{H}\mathcal{A}^{\ast}(\bm{y})$
    $\widehat{\bm{L}}_0 = \widehat{\bm{U}}_0\widehat{\bm{\Sigma}}_0^{1/2}$, $\widehat{\bm{R}}_0 = \widehat{\bm{V}}_0\widehat{\bm{\Sigma}}_0^{1/2}$
    $(\bm{L}_0, \bm{R}_0) = \mathcal{P}_{\mathcal{M}}((\widehat{\bm{L}}_0, \widehat{\bm{R}}_0))$
    $\bm{M}_0 = \begin{bmatrix}\bm{L}_0^{\mathsf{H}} & \bm{R}_0^{\mathsf{H}}\end{bmatrix}^{\mathsf{H}}$
while not converged do
    $\bm{M}_{t+1} = \mathcal{P}_{\mathcal{M}}\left(\bm{M}_t - \eta\nabla f(\bm{M}_t)\right)$
end while

The initialization involves two steps: (1) compute the best rank-$r$ approximation of $\mathcal{H}(\mathcal{A}^{\ast}(\bm{y}))$ via one step of hard thresholding $\mathcal{P}_r(\cdot)$, where $\mathcal{A}^{\ast}$ is the adjoint of $\mathcal{A}$; (2) project the low rank factors of the best rank-$r$ approximation onto the set $\mathcal{M}$. Given a matrix $\bm{M}=\begin{bmatrix}\bm{L}^{\mathsf{H}} & \bm{R}^{\mathsf{H}}\end{bmatrix}^{\mathsf{H}}$, the projection onto $\mathcal{M}$, denoted by $\begin{bmatrix}\widehat{\bm{L}}^{\mathsf{H}} & \widehat{\bm{R}}^{\mathsf{H}}\end{bmatrix}^{\mathsf{H}}$, has a closed form solution:

[\widehat{\bm{L}}]_\ell = \begin{cases}\bm{L}_\ell & \text{ if } \left\|\bm{L}_\ell\right\|_{\mathsf{F}} \leq \sqrt{\frac{\mu r\sigma}{n}}\\ \frac{\bm{L}_\ell}{\left\|\bm{L}_\ell\right\|_{\mathsf{F}}}\cdot\sqrt{\frac{\mu r\sigma}{n}} & \text{ otherwise}\end{cases}

for $0\leq\ell\leq n_1-1$, and

\bm{e}_j^{\mathsf{T}}\widehat{\bm{R}} = \begin{cases}\bm{e}_j^{\mathsf{T}}\bm{R} & \text{ if } \left\|\bm{e}_j^{\mathsf{T}}\bm{R}\right\|_2 \leq \sqrt{\frac{\mu r\sigma}{n}}\\ \frac{\bm{e}_j^{\mathsf{T}}\bm{R}}{\left\|\bm{e}_j^{\mathsf{T}}\bm{R}\right\|_2}\cdot\sqrt{\frac{\mu r\sigma}{n}} & \text{ otherwise}\end{cases}

for $0\leq j\leq n_2-1$. Let $\bm{M}_t$ be the current estimate. The algorithm updates $\bm{M}_t$ along the gradient descent direction $-\nabla f(\bm{M}_t)$ with step size $\eta$, followed by a projection onto the set $\mathcal{M}$. The gradient of $f(\bm{M})$, computed with respect to the Wirtinger calculus, is given by $\nabla f = \begin{bmatrix}\nabla^{\mathsf{H}}_{\bm{L}}f & \nabla^{\mathsf{H}}_{\bm{R}}f\end{bmatrix}^{\mathsf{H}}$, where

\nabla_{\bm{L}} f = \left(\mathcal{H}\mathcal{D}^{-2}\mathcal{A}^{\ast}\left(\mathcal{A}\mathcal{H}^{\dagger}(\bm{L}\bm{R}^{\mathsf{H}})-\bm{y}\right)\right)\bm{R} + \left(\left(\mathcal{I}-\mathcal{H}\mathcal{H}^{\dagger}\right)(\bm{L}\bm{R}^{\mathsf{H}})\right)\bm{R} + \frac{1}{4}\bm{L}\left(\bm{L}^{\mathsf{H}}\bm{L}-\bm{R}^{\mathsf{H}}\bm{R}\right),

\nabla_{\bm{R}} f = \left(\mathcal{H}\mathcal{D}^{-2}\mathcal{A}^{\ast}\left(\mathcal{A}\mathcal{H}^{\dagger}(\bm{L}\bm{R}^{\mathsf{H}})-\bm{y}\right)\right)^{\mathsf{H}}\bm{L} + \left(\left(\mathcal{I}-\mathcal{H}\mathcal{H}^{\dagger}\right)(\bm{L}\bm{R}^{\mathsf{H}})\right)^{\mathsf{H}}\bm{L} + \frac{1}{4}\bm{R}\left(\bm{R}^{\mathsf{H}}\bm{R}-\bm{L}^{\mathsf{H}}\bm{L}\right).

Here $\mathcal{D}^2 = \mathcal{H}^{\ast}\mathcal{H}$ is the operator which scales the $t$-th column of an $s\times n$ matrix by the number of blocks on the $t$-th anti-diagonal of the vectorized Hankel matrix, so that $(\mathcal{H}^{\dagger})^{\ast} = \mathcal{H}\mathcal{D}^{-2}$.
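For concreteness, a minimal NumPy sketch of the projection $\mathcal{P}_{\mathcal{M}}$ follows (function and variable names are ours); combined with the gradient formulas above, one iteration of PGD-VHL is simply $\bm{M}_{t+1}=\mathcal{P}_{\mathcal{M}}(\bm{M}_t-\eta\nabla f(\bm{M}_t))$.

```python
import numpy as np

def project_M(L, R, s, mu, r, sigma, n):
    """Project (L, R) onto the set M in (2.5): clip the Frobenius norm of each
    s-by-r block of L and the l2 norm of each row of R to sqrt(mu*r*sigma/n)."""
    bound = np.sqrt(mu * r * sigma / n)
    L_hat, R_hat = L.copy(), R.copy()
    n1 = L.shape[0] // s
    for ell in range(n1):
        blk = L_hat[ell * s:(ell + 1) * s, :]
        nrm = np.linalg.norm(blk, 'fro')
        if nrm > bound:
            L_hat[ell * s:(ell + 1) * s, :] = blk * (bound / nrm)
    row_norms = np.linalg.norm(R_hat, axis=1)
    scale = np.minimum(1.0, bound / np.maximum(row_norms, 1e-300))
    return L_hat, R_hat * scale[:, None]

# Small usage example with arbitrary constants.
rng = np.random.default_rng(4)
s, n1, n2, r = 4, 32, 33, 3
mu, sigma, n = 1.0, 1.0, n1 + n2 - 1
L = rng.standard_normal((s * n1, r)) + 1j * rng.standard_normal((s * n1, r))
R = rng.standard_normal((n2, r)) + 1j * rng.standard_normal((n2, r))
L_hat, R_hat = project_M(L, R, s, mu, r, sigma, n)
assert np.linalg.norm(R_hat, axis=1).max() <= np.sqrt(mu * r * sigma / n) + 1e-12
```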

To assess the computational cost of $\nabla f$, we first introduce some notation. Let $\mathcal{H}_v$ be the Hankel operator which maps a vector $\bm{x}\in\mathbb{C}^{1\times n}$ into an $n_1\times n_2$ matrix,

\mathcal{H}_v(\bm{x}) = \begin{bmatrix} x_1 & \cdots & x_{n_2}\\ \vdots & \ddots & \vdots\\ x_{n_1} & \cdots & x_n \end{bmatrix},

where $x_i$ is the $i$-th entry of $\bm{x}$. The adjoint of $\mathcal{H}_v$, denoted by $\mathcal{H}_v^{\ast}$, is a linear mapping from $\mathbb{C}^{n_1\times n_2}$ to $\mathbb{C}^{1\times n}$. It is known that the computational complexity of both $\mathcal{H}_v^{\ast}(\bm{L}_\ell\bm{R}^{\mathsf{H}})$ and $\left(\mathcal{H}_v(\bm{x})\right)\bm{R}$ is $\mathcal{O}(rn\log n)$ flops [16]. Moreover, the authors in [8] show that $\mathcal{H}(\bm{X}) = \bm{P}\widetilde{\mathcal{H}}(\bm{X})$, where $\widetilde{\mathcal{H}}(\bm{X})$ is a matrix constructed by stacking all $\{\mathcal{H}_v(\bm{e}_\ell^{\mathsf{T}}\bm{X})\}_{\ell=1}^{s}$ on top of one another, and $\bm{P}$ is a permutation matrix. Therefore we can compute $\mathcal{H}^{\dagger}(\bm{L}\bm{R}^{\mathsf{H}})$ and $\mathcal{H}(\bm{X})\bm{R}$ using $\mathcal{O}(srn\log n)$ flops. Thus the implementation of our algorithm is efficient, and the main computational complexity of each iteration is $\mathcal{O}(sr^2 n + srn\log(n))$.
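The $\mathcal{O}(rn\log n)$ count stems from the fact that a Hankel matrix-vector product is a correlation, which the FFT computes in $\mathcal{O}(n\log n)$ time. The following sketch (our code, not the implementation of [16]) carries this out for a single vector and checks the result against the dense construction; applying it column by column to $\bm{R}$ and over the $s$ rows of $\bm{X}$ gives the stated complexity.

```python
import numpy as np

def hankel_vec(x, n1):
    """Dense n1-by-n2 Hankel matrix H_v(x) with H_v(x)[i, j] = x[i + j] (0-indexed)."""
    n = x.size
    n2 = n + 1 - n1
    return np.array([x[i:i + n2] for i in range(n1)])

def hankel_matvec_fft(x, r_vec, n1):
    """Compute H_v(x) @ r_vec in O(n log n): (H_v(x) r)[i] = sum_j x[i + j] r[j],
    which is a slice of the full convolution of x with the reversed r."""
    n = x.size
    n2 = n + 1 - n1
    L = n + n2 - 1                                   # length of the full linear convolution
    c = np.fft.ifft(np.fft.fft(x, L) * np.fft.fft(r_vec[::-1], L))
    return c[n2 - 1:n2 - 1 + n1]

rng = np.random.default_rng(3)
n = 33
n1 = n // 2
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
r_vec = rng.standard_normal(n + 1 - n1) + 1j * rng.standard_normal(n + 1 - n1)
assert np.allclose(hankel_vec(x, n1) @ r_vec, hankel_matvec_fft(x, r_vec, n1))
```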

3 Main results

In this section, we provide an analysis of PGD-VHL under a random subspace model.

Assumption 3.1.

The vectors $\{\bm{b}_j\}_{j=0}^{n-1}$, formed by the rows of $\bm{B}$ and viewed as column vectors in $\mathbb{C}^s$, are independently and identically drawn from a distribution $F$ which satisfies the following conditions:

\mathbb{E}\left\{\bm{b}_j\bm{b}_j^{\mathsf{H}}\right\} = \bm{I}_s, \quad j = 0,\cdots,n-1, \qquad (3.1)

\max_{0\leq\ell\leq s-1}|\bm{b}_j[\ell]|^2 \leq \mu_0, \quad j = 0,\cdots,n-1. \qquad (3.2)
Remark 3.1.

Assumption 3.1 is a standard assumption in blind super-resolution [4, 5, 6, 17, 8], and it holds with $\mu_0=1$ for many common random ensembles; for instance, when the components of $\bm{b}_j$ are Rademacher random variables taking the values $\pm 1$ with equal probability, or when $\bm{b}_j$ is uniformly sampled from the rows of a Discrete Fourier Transform (DFT) matrix.
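As a quick numerical illustration (our code), for the DFT ensemble the rows of $\bm{B}$ have unit-modulus entries, so (3.2) holds with $\mu_0=1$, and averaging $\bm{b}_j\bm{b}_j^{\mathsf{H}}$ over the rows recovers $\bm{I}_s$ as required by (3.1).

```python
import numpy as np

n, s = 1024, 4
# Rows b_j: entries of s columns of the n-by-n DFT matrix, so |b_j[l]| = 1.
B = np.exp(-2j * np.pi * np.outer(np.arange(n), np.arange(s)) / n)

print(np.max(np.abs(B) ** 2))        # (3.2): mu_0 = 1.0
cov = (B.T @ B.conj()) / n           # (1/n) * sum_j b_j b_j^H
print(np.allclose(cov, np.eye(s)))   # (3.1): True, the average equals I_s
```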

Now we present the main result of the paper.

Theorem 3.1.

Let $\mu\geq\mu_1$ and $\sigma = \sigma_1(\widehat{\bm{\Sigma}}_0)/(1-\varepsilon)$ for $0\leq\varepsilon\leq 1/3$. Let $\eta\leq\frac{\sigma_r}{4500(\mu_0\mu sr)^2\sigma^2}$ and $\bm{M}^{\natural} = \begin{bmatrix}{\bm{L}^{\natural}}^{\mathsf{H}} & {\bm{R}^{\natural}}^{\mathsf{H}}\end{bmatrix}^{\mathsf{H}}$. Suppose $\bm{X}^{\natural}$ obeys Assumption 2.1 and the subspace matrix $\bm{B}$ satisfies Assumption 3.1. If

n \geq c_0\varepsilon^{-2}\mu_0^2\mu s^2 r^2\kappa^2\log^2(sn),

then with probability at least $1-c_1(sn)^{-c_2}$, the sequence $\{\bm{M}_t\}$ returned by Algorithm 1 satisfies

\operatorname{dist}^2(\bm{M}_t, \bm{M}^{\natural}) \leq (1-\eta\sigma_r)^t\cdot\frac{\varepsilon^2\sigma_r}{\mu_0 s}, \qquad (3.3)

where $c_0, c_1, c_2$ are absolute constants, $\sigma_r = \sigma_r(\mathcal{H}(\bm{X}^{\natural}))$, $\kappa$ is the condition number of $\mathcal{H}(\bm{X}^{\natural})$, and the distance $\operatorname{dist}(\bm{M}, \bm{M}^{\natural})$ is defined as

\operatorname{dist}(\bm{M}, \bm{M}^{\natural}) = \min_{\bm{Q}\bm{Q}^{\mathsf{T}}=\bm{Q}^{\mathsf{T}}\bm{Q}=\bm{I}_r}\left\|\bm{M}-\bm{M}^{\natural}\bm{Q}\right\|_{\mathsf{F}}.
Remark 3.2.

The detailed proof of Theorem 3.1 is provided in [18]. Compared with the sample complexity established in [8] for the nuclear norm minimization method, which is $n\gtrsim\mu_0\mu_1 sr\log^4(sn)$, Theorem 3.1 implies that PGD-VHL is sub-optimal in terms of $s$ and $r$. We suspect that this is merely an artifact of our proof.

Remark 3.3.

Theorem 3.1 implies that PGD-VHL converges to $\bm{M}^{\natural}$ at a linear rate. Therefore, after $T=\mathcal{O}((\mu_0\mu sr\kappa)^2\log(1/\epsilon))$ iterations, we have $\operatorname{dist}^2(\bm{M}_T, \bm{M}^{\natural})\leq\epsilon\cdot\operatorname{dist}^2(\bm{M}_0, \bm{M}^{\natural})$. Given the iterate $\bm{M}_T$ returned by PGD-VHL, we can estimate $\bm{X}^{\natural}$ by $\bm{X}_T = \mathcal{H}^{\dagger}(\bm{L}_T\bm{R}_T^{\mathsf{H}})$.

Remark 3.4.

Once the data matrix $\bm{X}^{\natural}$ is recovered, the locations $\{\tau_k\}_{k=1}^{r}$ can be computed from $\bm{X}^{\natural}$ by the MUSIC algorithm, and the weights $\{d_k, \bm{h}_k\}_{k=1}^{r}$ can be estimated by solving an overdetermined linear system [8].

4 Numerical simulations

In this section, we provide numerical results to illustrate the performance of PGD-VHL. The locations $\{\tau_k\}$ of the point source signal are randomly generated from $[0,1)$ and the amplitudes $\{d_k\}$ are set to $d_k = (1+10^{c_k})e^{-\imath\phi_k}$, where $c_k$ is uniformly sampled from $[0,1]$ and $\phi_k$ is uniformly sampled from $[0,2\pi)$. The coefficients $\{\bm{h}_k\}_{k=1}^{r}$ are sampled i.i.d. from the standard Gaussian distribution and then normalized. The columns of $\bm{B}$ are uniformly sampled from the DFT matrix. The stepsize of PGD-VHL is chosen via backtracking line search, and PGD-VHL is terminated when $\|\bm{y}-\mathcal{A}(\bm{X}_t)\|_2\leq 10^{-5}$ or a maximum number of iterations is reached. We repeat 20 random trials and record the probability of successful recovery. A trial is declared successful if $\|\bm{X}_t-\bm{X}^{\natural}\|_{\mathsf{F}}/\|\bm{X}^{\natural}\|_{\mathsf{F}}\leq 10^{-3}$.
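A sketch of the data generation described above follows (our code; details such as the exact normalization of $\bm{h}_k$ and the column sampling of $\bm{B}$ are assumptions made for illustration).

```python
import numpy as np

def generate_data(n, s, r, rng):
    """Generate a random instance of the blind super-resolution model of Section 2.1."""
    taus = rng.random(r)                                   # locations in [0, 1)
    c = rng.random(r)
    phi = rng.random(r) * 2 * np.pi
    d = (1 + 10 ** c) * np.exp(-1j * phi)                  # amplitudes d_k = (1 + 10^{c_k}) e^{-i phi_k}
    h = rng.standard_normal((s, r)) + 1j * rng.standard_normal((s, r))
    h = h / np.linalg.norm(h, axis=0)                      # normalized subspace coefficients h_k
    F = np.exp(-2j * np.pi * np.outer(np.arange(n), np.arange(n)) / n)
    B = F[:, rng.choice(n, s, replace=False)]              # columns sampled from the DFT matrix
    a = np.exp(-2j * np.pi * np.outer(taus, np.arange(n))) # rows are a_{tau_k}^T
    X_nat = (h * d) @ a                                    # ground truth s-by-n matrix X^natural
    y = np.einsum('js,sj->j', B.conj(), X_nat)             # measurements y_j = b_j^H X e_j
    return B, X_nat, y

B, X_nat, y = generate_data(n=64, s=4, r=4, rng=np.random.default_rng(0))
```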

The first experiment studies the recovery ability of PGD-VHL through the framework of phase transition, and we compare it with two convex recovery methods: VHL [8] and ANM [5]. Both VHL and ANM are solved by CVX [19]. The tests are conducted with $n=64$ and varying $s$ and $r$. Figures 1(a), 1(c) and 1(e) show the phase transitions of VHL, ANM and PGD-VHL when the locations of the point sources are randomly generated, and Figures 1(b), 1(d) and 1(f) illustrate the phase transitions of VHL, ANM and PGD-VHL when the separation condition $\Delta := \min_{j\neq k}|\tau_j-\tau_k|\geq 1/n$ is imposed. It can be seen that PGD-VHL is less sensitive to the separation condition than ANM and has a higher phase transition curve than VHL.


Fig. 1: The phase transitions of VHL (a,b), ANM (c,d) and PGD-VHL (e,f) with (Left) or without (Right) imposing the separation condition. Here we fix $n=64$. The red curve plots the hyperbola $rs=20$.

In the second experiment, we study the phase transition of PGD-VHL when one of $r$ and $s$ is fixed. Figure 2(a) indicates an approximately linear relationship between $s$ and $n$ for successful recovery when the number of point sources is fixed to $r=4$. The same linear relationship between $r$ and $n$ can be observed when the dimension of the subspace is fixed to $s=4$; see Figure 2(b). Therefore, there exists a gap between our theory and the empirical observations, which we leave as future work.


Fig. 2: (a) The phase transition of PGD-VHL for varying $n$ and $s$ when $r=4$. The red line plots the straight line $n=2.5s$. (b) The phase transition of PGD-VHL for varying $n$ and $r$ when $s=4$. The red line plots the straight line $n=2.5r$.

In the third simulation, we investigate the convergence rate of PGD-VHL for different values of $n$. The results are shown in Figure 3. The $y$-axis denotes $\log\left(\|\bm{X}_t-\bm{X}^{\natural}\|_{\mathsf{F}}/\|\bm{X}^{\natural}\|_{\mathsf{F}}\right)$ and the $x$-axis represents the iteration number. It can be clearly seen that PGD-VHL converges linearly, as predicted by our main theorem.


Fig. 3: Convergence of PGD-VHL for $n=256, 512, 1024$. Here we fix $s=4$ and $r=4$.

5 Conclusion

In this paper, we propose an efficient algorithm named PGD-VHL for recovering the low rank matrix arising in blind super-resolution. Our theoretical analysis shows that the proposed algorithm converges linearly to the target matrix. This is also demonstrated by our numerical simulations.

References

  • [1] Sean Quirin, Sri Rama Prasanna Pavani, and Rafael Piestun, “Optimal 3D single-molecule localization for superresolution microscopy with aberrations and engineered point spread functions,” Proceedings of the National Academy of Sciences, vol. 109, no. 3, pp. 675–679, 2012.
  • [2] Xiaobo Qu, Maxim Mayzel, Jian-Feng Cai, Zhong Chen, and Vladislav Orekhov, “Accelerated NMR spectroscopy with low-rank reconstruction,” Angewandte Chemie International Edition, vol. 54, no. 3, pp. 852–854, 2015.
  • [3] Xiliang Luo and Georgios B Giannakis, “Low-complexity blind synchronization and demodulation for (ultra-) wideband multi-user ad hoc access,” IEEE Transactions on Wireless Communications, vol. 5, no. 7, pp. 1930–1941, 2006.
  • [4] Yuejie Chi, “Guaranteed blind sparse spikes deconvolution via lifting and convex optimization,” IEEE Journal of Selected Topics in Signal Processing, vol. 10, no. 4, pp. 782–794, 2016.
  • [5] Dehui Yang, Gongguo Tang, and Michael B Wakin, “Super-resolution of complex exponentials from modulations with unknown waveforms,” IEEE Transactions on Information Theory, vol. 62, no. 10, pp. 5809–5830, 2016.
  • [6] Shuang Li, Michael B Wakin, and Gongguo Tang, “Atomic norm denoising for complex exponentials with unknown waveform modulations,” IEEE Transactions on Information Theory, vol. 66, no. 6, pp. 3893–3913, 2019.
  • [7] Mohamed A Suliman and Wei Dai, “Mathematical theory of atomic norm denoising in blind two-dimensional super-resolution,” IEEE Transactions on Signal Processing, vol. 69, pp. 1681–1696, 2021.
  • [8] Jinchi Chen, Weiguo Gao, Sihan Mao, and Ke Wei, “Vectorized Hankel lift: A convex approach for blind super-resolution of point sources,” arXiv preprint arXiv:2008.05092, 2020.
  • [9] Ali Ahmed, Benjamin Recht, and Justin Romberg, “Blind deconvolution using convex programming,” IEEE Transactions on Information Theory, vol. 60, no. 3, pp. 1711–1732, 2013.
  • [10] Emmanuel J Candès and Benjamin Recht, “Exact matrix completion via convex optimization,” Foundations of Computational Mathematics, vol. 9, no. 6, pp. 717, 2009.
  • [11] Shuai Zhang, Yingshuai Hao, Meng Wang, and Joe H Chow, “Multichannel Hankel matrix completion through nonconvex optimization,” IEEE Journal of Selected Topics in Signal Processing, vol. 12, no. 4, pp. 617–632, 2018.
  • [12] Stephen Tu, Ross Boczar, Max Simchowitz, Mahdi Soltanolkotabi, and Ben Recht, “Low-rank solutions of linear matrix equations via Procrustes flow,” in International Conference on Machine Learning. PMLR, 2016, pp. 964–973.
  • [13] Qinqing Zheng and John Lafferty, “Convergence analysis for rectangular matrix completion using Burer-Monteiro factorization and gradient descent,” arXiv preprint arXiv:1605.07051, 2016.
  • [14] Yuejie Chi, Yue M Lu, and Yuxin Chen, “Nonconvex optimization meets low-rank matrix factorization: An overview,” IEEE Transactions on Signal Processing, vol. 67, no. 20, pp. 5239–5269, 2019.
  • [15] Jian-Feng Cai, Tianming Wang, and Ke Wei, “Spectral compressed sensing via projected gradient descent,” SIAM Journal on Optimization, vol. 28, no. 3, pp. 2625–2653, 2018.
  • [16] Jian-Feng Cai, Tianming Wang, and Ke Wei, “Fast and provable algorithms for spectrally sparse signal reconstruction via low-rank Hankel matrix completion,” Applied and Computational Harmonic Analysis, vol. 46, no. 1, pp. 94–121, 2019.
  • [17] Mohamed A Suliman and Wei Dai, “Blind two-dimensional super-resolution and its performance guarantee,” arXiv preprint arXiv:1811.02070, 2018.
  • [18] Sihan Mao and Jinchi Chen, “Fast blind super-resolution of point sources via projected gradient descent,” Under Preparation, 2021.
  • [19] Michael Grant and Stephen Boyd, “CVX: Matlab software for disciplined convex programming, version 2.1,” Mar. 2014.