
Sparse recovery by non-convex optimization – instance optimality

Rayan Saab Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, B.C. Canada V6T 1Z4 rayans@ece.ubc.ca Özgür Yılmaz Department of Mathematics, University of British Columbia, Vancouver, B.C. Canada V6T 1Z2 oyilmaz@math.ubc.ca
Abstract

In this note, we address the theoretical properties of Δp\Delta_{p}, a class of compressed sensing decoders that rely on p\ell^{p} minimization with 0<p<10<p<1 to recover estimates of sparse and compressible signals from incomplete and inaccurate measurements. In particular, we extend the results of Candès, Romberg and Tao [4] and Wojtaszczyk [30] regarding the decoder Δ1\Delta_{1}, based on 1\ell^{1} minimization, to Δp\Delta_{p} with 0<p<10<p<1. Our results are twofold. First, we show that under certain sufficient conditions that are weaker than the analogous sufficient conditions for Δ1\Delta_{1}, the decoders Δp\Delta_{p} are robust to noise and stable in the sense that they are (2,p)(2,p) instance optimal for a large class of encoders. Second, we extend the results of Wojtaszczyk to show that, like Δ1\Delta_{1}, the decoders Δp\Delta_{p} are (2,2)(2,2) instance optimal in probability provided the measurement matrix is drawn from an appropriate distribution.

thanks: This work was supported in part by a Discovery Grant and by a CRD Grant (DNOISE) from the Natural Sciences and Engineering Research Council of Canada. R. Saab also acknowledges a UGF award from the UBC, and a Pacific Century Graduate Scholarship from the Province of British Columbia through the Ministry of Advanced Education.

1 Introduction

The sparse recovery problem has received a great deal of attention lately, both because of its role in transform coding with redundant dictionaries (e.g., [28, 29, 9]), and perhaps more importantly because it inspired compressed sensing [13, 3, 4], a novel method of acquiring signals with certain properties more efficiently than the classical approach based on Nyquist-Shannon sampling theory. Define ΣSN\Sigma_{S}^{N} to be the set of all SS-sparse vectors, i.e.,

ΣSN:={xN:|supp(x)|S},\Sigma_{S}^{N}:=\{x\in\mathbb{R}^{N}:\ \ |\text{supp}(x)|\leq S\},

and define compressible vectors as vectors that can be well approximated in ΣSN\Sigma_{S}^{N}. Let σS(x)p\sigma_{S}(x)_{\ell^{p}} denote the best SS-term approximation error of xx in the p\ell^{p} (quasi-)norm, where p>0p>0, i.e.,

σS(x)p:=minvΣSNxvp.\sigma_{S}(x)_{\ell^{p}}:=\min_{v\in\Sigma_{S}^{N}}\|x-v\|_{p}.
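
For concreteness, σS(x)p\sigma_{S}(x)_{\ell^{p}} is straightforward to compute: for any p>0p>0, the best SS-term approximation of xx keeps its SS largest-magnitude entries, so the error is the p\ell^{p} (quasi-)norm of the remaining tail. A minimal numpy sketch (the function name is ours):

    import numpy as np

    def sigma_S(x, S, p):
        # Best S-term approximation error of x in the ell^p (quasi-)norm:
        # the minimizer keeps the S largest-magnitude entries of x, so
        # the error is the p-norm of the remaining N - S entries.
        tail = np.sort(np.abs(x))[:len(x) - S]
        return np.sum(tail ** p) ** (1.0 / p)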

Throughout the text, AA denotes an M×NM\times N real matrix where M<NM<N. Let the associated encoder be the map xAxx\mapsto Ax (also denoted by AA). The transform coding and compressed sensing problems mentioned above require the existence of decoders, say Δ:MN\Delta:\ \mathbb{R}^{M}\mapsto\mathbb{R}^{N}, with roughly the following properties:

  1. (C1)

    Δ(Ax)=x\Delta(Ax)=x whenever xΣSNx\in\Sigma_{S}^{N} with sufficiently small SS.

  2. (C2)

    xΔ(Ax+e)e+σS(x)p\|x-\Delta(Ax+e)\|\lesssim\|e\|+\sigma_{S}(x)_{\ell^{p}}, where the norms are appropriately chosen. Here ee denotes measurement error, e.g., thermal and computational noise.

  3. (C3)

    Δ(Ax)\Delta(Ax) can be computed efficiently (in some sense).

Below, we denote the (in general noisy) encoding of xx by bb, i.e.,

b=Ax+e.b=Ax+e. (1)

In general, the problem of constructing decoders with properties (C1)-(C3) is non-trivial (even in the noise-free case) as AA is overcomplete, i.e., the linear system of MM equations in (1) is underdetermined, and thus, if consistent, it admits infinitely many solutions. In order for a decoder to satisfy (C1)-(C3), it must choose the “correct solution” among these infinitely many solutions. Under the assumption that the original signal xx is sparse, one can phrase the problem of finding the desired solution as an optimization problem where the objective is to maximize an appropriate “measure of sparsity” while simultaneously satisfying the constraints defined by (1). In the noise-free case, i.e., when e=0e=0 in (1), under certain conditions on the M×NM\times N matrix AA, i.e., if AA is in general position, there is a decoder Δ0\Delta_{0} which satisfies Δ0(Ax)=x\Delta_{0}(Ax)=x for all xΣSNx\in\Sigma_{S}^{N} whenever S<M/2S<M/2, e.g., see [14]. This Δ0\Delta_{0} can be explicitly computed via the optimization problem

Δ0(b):=argminyy0 subject to b=Ay.\Delta_{0}(b):=\arg\min_{y}\|y\|_{0}\text{ subject to }b=Ay. (2)

Here y0\|y\|_{0} denotes the number of non-zero entries of the vector yy, equivalently its so-called 0\ell^{0}-norm; thus the sparsity of yy is directly reflected by its 0\ell^{0}-norm.
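
To make the combinatorial nature of (2) concrete, the following brute-force sketch of Δ0\Delta_{0} (for illustration only; the function name and tolerance are ours) enumerates candidate supports of growing size, so its cost scales like (NS)\binom{N}{S}:

    import itertools
    import numpy as np

    def delta_0(A, b, S_max, tol=1e-10):
        # Brute-force version of the decoder Delta_0 in (2): return the
        # sparsest y consistent with Ay = b, found by exhaustive search
        # over supports of size 1, 2, ..., S_max.
        N = A.shape[1]
        for S in range(1, S_max + 1):
            for T in itertools.combinations(range(N), S):
                T = list(T)
                coef = np.linalg.lstsq(A[:, T], b, rcond=None)[0]
                if np.linalg.norm(A[:, T] @ coef - b) < tol:
                    y = np.zeros(N)
                    y[T] = coef
                    return y
        return None  # no S_max-sparse solution consistent with b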

1.1 Decoding by 1\ell^{1} minimization

As mentioned above, Δ0(Ax)=x\Delta_{0}(Ax)=x exactly if xx is sufficiently sparse depending on the matrix AA. However, the associated optimization problem is combinatorial in nature, thus its complexity grows quickly as NN becomes much larger than MM. Naturally, one then seeks to modify the optimization problem so that it lends itself to solution methods that are more tractable than combinatorial search. In fact, in the noise-free setting, the decoder defined by 1\ell^{1} minimization, given by

Δ1(b):=argminyy1 subject to Ay=b,\Delta_{1}(b):=\arg\min_{y}\|y\|_{1}\text{ subject to }Ay=b, (3)

recovers xx exactly if xx is sufficiently sparse and the matrix AA has certain properties (e.g., [6, 4, 14, 15, 9, 26]). In particular, it has been shown in [4] that if xΣSNx\in\Sigma_{S}^{N} and AA satisfies a certain restricted isometry property, e.g., δ3S<1/3\delta_{3S}<1/3 or more generally δ(k+1)S<k1k+1\delta_{(k+1)S}<\frac{k-1}{k+1} for some k>1k>1 such that k1Sk\in\frac{1}{S}\mathbb{N}, then Δ1(Ax)=x\Delta_{1}(Ax)=x (in what follows, \mathbb{N} denotes the set of positive integers, i.e., 00\notin\mathbb{N}). Here δS\delta_{S} are the SS-restricted isometry constants of AA, as introduced by Candès, Romberg and Tao (see, e.g., [4]), defined as the smallest constants satisfying

(1δS)c22Ac22(1+δS)c22(1-\delta_{S})\|c\|_{2}^{2}\leq\|Ac\|_{2}^{2}\leq(1+\delta_{S})\|c\|_{2}^{2} (4)

for every cΣSNc\in\Sigma_{S}^{N}. Throughout the paper, using the notation of [30], we say that a matrix satisfies RIP(S,δ)\text{RIP}(S,\delta) if δS<δ\delta_{S}<\delta.

Checking whether a given matrix satisfies a certain RIP is computationally intensive, and becomes rapidly intractable as the size of the matrix increases. On the other hand, there are certain classes of random matrices which have favorable RIP. In fact, let AA be an M×NM\times N matrix the columns of which are independent, identically distributed (i.i.d.) random vectors with any sub-Gaussian distribution. It has been shown that AA satisfies RIP(S,δ)\text{RIP}\left(S,\delta\right) with any 0<δ<10<\delta<1 when

Sc1M/log(N/M),S\leq c_{1}M/\log(N/M), (5)

with probability greater than 12ec2M1-2e^{-c_{2}M} (see, e.g., [1],[5],[6]), where c1c_{1} and c2c_{2} are positive constants that only depend on δ\delta and on the actual distribution from which AA is drawn.
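
Since computing δS\delta_{S} exactly requires examining all (NS)\binom{N}{S} column submatrices, in practice one often settles for Monte Carlo estimates, as we do in Section 3; these are only lower bounds on the true constants. A possible sketch (the function name is ours):

    import numpy as np

    def estimate_delta_S(A, S, trials=2000, seed=0):
        # Monte Carlo lower bound on the RIP constant delta_S in (4):
        # sample random S-column submatrices of A and record the worst
        # deviation of their squared singular values from 1.
        rng = np.random.default_rng(seed)
        N = A.shape[1]
        delta = 0.0
        for _ in range(trials):
            T = rng.choice(N, size=S, replace=False)
            s = np.linalg.svd(A[:, T], compute_uv=False)
            delta = max(delta, s[0] ** 2 - 1.0, 1.0 - s[-1] ** 2)
        return delta  # a lower bound; the true delta_S may be larger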

In addition to recovering sparse vectors from error-free observations, it is important that the decoder be robust to noise and stable with regards to the “compressibility” of xx. In other words, we require that the reconstruction error scale well with the measurement error and with the “non-sparsity” of the signal (i.e., (C2) above). For matrices that satisfy RIP((k+1)S,δ)\text{RIP}((k+1)S,\delta), with δ<k1k+1\delta<\frac{k-1}{k+1} for some k>1k>1 such that k1Sk\in\frac{1}{S}\mathbb{N}, it has been shown in [4] that there exists a feasible decoder Δ1ϵ\Delta_{1}^{\epsilon} for which the approximation error Δ1ϵ(b)x2\|\Delta_{1}^{\epsilon}(b)-x\|_{2} scales linearly with the measurement error e2ϵ\|e\|_{2}\leq\epsilon and with σS(x)1\sigma_{S}(x)_{\ell^{1}}. More specifically, define the decoder

Δ1ϵ(b)=argminyy1 subject to Ayb2ϵ.\Delta_{1}^{\epsilon}(b)=\arg\min_{y}\|y\|_{1}\text{ subject to }\|Ay-b\|_{2}\leq\epsilon. (6)

The following theorem of Candès et al. in [4] provides error guarantees when xx is not sparse and when the observation is noisy.

Theorem 1

[4] Fix ϵ0\epsilon\geq 0, suppose that x{x} is arbitrary, and let b=Ax+e{b}={A}{x}+{e} where e2ϵ\|{e}\|_{2}\leq\epsilon. If AA satisfies δ3S+3δ4S<2\delta_{3S}+3\delta_{4S}<2, then

Δ1ϵ(b)x2C1,Sϵ+C2,SσS(x)1S.\|\Delta^{\epsilon}_{1}(b)-{x}\|_{2}\leq C_{1,S}\epsilon+C_{2,S}\frac{\sigma_{S}(x)_{\ell^{1}}}{\sqrt{S}}. (7)

For reasonable values of δ4S\delta_{4S}, the constants are well behaved; e.g., C1,S=12.04C_{1,S}=12.04 and C2,S=8.77C_{2,S}=8.77 for δ4S=1/5\delta_{4S}=1/5.

Remark 2

This means that if b=Ax+eb=Ax+e and xx is sufficiently sparse, then Δ1ϵ(b)\Delta_{1}^{\epsilon}(b) recovers the underlying sparse signal within the noise level. Consequently, the recovery is perfect if ϵ=0\epsilon=0.

Remark 3

By explicitly assuming xx to be sparse, Candès et al. [4] proved a version of the above result with smaller constants, i.e., for b=Ax+eb=Ax+e with xΣSNx\in\Sigma_{S}^{N} and e2ϵ\|e\|_{2}\leq\epsilon,

Δ1ϵ(b)x2CSϵ,\|\Delta^{\epsilon}_{1}(b)-{x}\|_{2}\leq C_{S}\epsilon, (8)

where CS<C1,SC_{S}<C_{1,S}.

Remark 4

Recently, Candès [2] showed that δ2S<21\delta_{2S}<\sqrt{2}-1 is sufficient to guarantee robust and stable recovery in the sense of (7) with slightly better constants.

In the noise-free case, i.e., when ϵ=0\epsilon=0, the reconstruction error in Theorem 1 is bounded above by σS(x)1/S\sigma_{S}(x)_{\ell^{1}}/\sqrt{S}, see (7). This upper bound would sharpen if one could replace σS(x)1/S\sigma_{S}(x)_{\ell^{1}}/\sqrt{S} with σS(x)2\sigma_{S}(x)_{\ell^{2}} on the right hand side of (7): note that σS(x)1\sigma_{S}(x)_{\ell^{1}} can be large even if all the entries of xx outside its best SS-term approximation are small but nonzero. Indeed, since y2y1Ny2\|y\|_{2}\leq\|y\|_{1}\leq\sqrt{N}\|y\|_{2} for any vector yNy\in\mathbb{R}^{N}, there are vectors xNx\in\mathbb{R}^{N} for which σS(x)1/SσS(x)2\sigma_{S}(x)_{\ell^{1}}/\sqrt{S}\gg\sigma_{S}(x)_{\ell^{2}}, especially when NN is large. In [10] it was shown that the term C2,SσS(x)1/SC_{2,S}\sigma_{S}(x)_{\ell^{1}}/\sqrt{S} on the right hand side of (7) cannot be replaced with CσS(x)2C\sigma_{S}(x)_{\ell^{2}} if one seeks the inequality to hold for all xNx\in\mathbb{R}^{N} with a fixed matrix AA, unless M>cNM>cN for some constant cc. This is unsatisfactory since the paradigm of compressed sensing relies on the ability to recover sparse or compressible vectors xx from significantly fewer measurements than the ambient dimension NN.

Even though one cannot obtain bounds on the approximation error in terms of σS(x)2\sigma_{S}(x)_{\ell^{2}} with constants that are uniform in xx (with a fixed matrix AA), the situation is significantly better if we relax the uniformity requirement and seek a version of (7) that holds “with high probability”. Indeed, it has recently been shown by Wojtaszczyk that for any specific xx, σS(x)2\sigma_{S}(x)_{\ell^{2}} can be placed in (7) in lieu of σS(x)1/S\sigma_{S}(x)_{\ell^{1}}/\sqrt{S} (with different constants that are still independent of xx) with high probability on the draw of AA if (i) M>cSlogNM>cS\log{N} and (ii) the entries of A{A} are drawn i.i.d. from a Gaussian distribution or the columns of AA are drawn i.i.d. from the uniform distribution on the unit sphere in M\mathbb{R}^{M} [30]. In other words, the decoder Δ1=Δ10\Delta_{1}=\Delta_{1}^{0} is “(2,2) instance optimal in probability” for encoders associated with such AA, a property which was discussed in [10].

Following the notation of [30], we say that an encoder-decoder pair (A,Δ)(A,\Delta) is (q,p)(q,p) instance optimal of order SS with constant C if

Δ(Ax)xqCσS(x)pS1/p1/q\|\Delta({Ax})-x\|_{q}\leq C\frac{\sigma_{S}(x)_{\ell^{p}}}{S^{1/p-1/q}} (9)

holds for all xNx\in\mathbb{R}^{N}. Moreover, for random matrices AωA_{\omega}, (Aω,Δ)(A_{\omega},\Delta) is said to be (q,p)(q,p) instance optimal in probability if for any xx (9) holds with high probability on the draw of AωA_{\omega}. Note that with this notation Theorem 1 implies that (A,Δ1)(A,\Delta_{1}) is (2,1) instance optimal (set ϵ=0\epsilon=0), provided AA satisfies the conditions of the theorem.
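
Numerically, the constant in (9) can be probed for a given encoder-decoder pair and a given signal; a small sketch (the helper name is ours, and it assumes xΣSNx\notin\Sigma_{S}^{N} so that the denominator is nonzero):

    import numpy as np

    def instance_optimality_ratio(decode, A, x, S, p, q=2):
        # Ratio of the decoding error to sigma_S(x)_{ell^p}/S^{1/p-1/q};
        # a (q,p) instance optimal pair of order S keeps this below C.
        err = np.linalg.norm(decode(A @ x) - x, ord=q)
        tail = np.sort(np.abs(x))[:len(x) - S]
        sigma = np.sum(tail ** p) ** (1.0 / p)
        return err / (sigma / S ** (1.0 / p - 1.0 / q))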

The preceding discussion makes it clear that Δ1\Delta_{1} satisfies conditions (C1) and (C2), at least when AA is a sub-Gaussian random matrix and SS is sufficiently small. It only remains to note that decoding by Δ1\Delta_{1} amounts to solving an 1\ell^{1} minimization problem, and is thus tractable, i.e., we also have (C3). In fact, 1\ell^{1} minimization problems as described above can be solved efficiently with solvers specifically designed for sparse recovery scenarios (e.g., [27], [16], [11]).
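
For instance, in the noise-free case the action of Δ1\Delta_{1} in (3) can be computed by any linear programming solver via the standard splitting y=uvy=u-v with u,v0u,v\geq 0. A minimal sketch using SciPy (our choice of solver, for illustration):

    import numpy as np
    from scipy.optimize import linprog

    def delta_1(A, b):
        # Basis pursuit, eq. (3): min ||y||_1 s.t. Ay = b, recast as the
        # LP  min sum(u) + sum(v)  s.t.  A(u - v) = b,  u, v >= 0.
        M, N = A.shape
        c = np.ones(2 * N)
        res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=b,
                      bounds=(0, None))
        return res.x[:N] - res.x[N:]  # y = u - v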

1.2 Decoding by p\ell^{p} minimization

We have so far seen that with appropriate encoders, the decoders Δ1ϵ\Delta_{1}^{\epsilon} provide robust and stable recovery for compressible signals even when the measurements are noisy [4], and that (Aω,Δ1)(A_{\omega},\Delta_{1}) is (2,2) instance optimal in probability [30] when AωA_{\omega} is an appropriate random matrix. In particular, stability and robustness properties are conditioned on an appropriate RIP while the instance optimality property is dependent on the draw of the encoder matrix (which is typically called the measurement matrix) from an appropriate distribution, in addition to RIP.

Recall that the decoders Δ1\Delta_{1} and Δ1ϵ\Delta_{1}^{\epsilon} were devised because their action can be computed by solving convex approximations to the combinatorial optimization problem (2) that is required to compute Δ0\Delta_{0}. The decoders defined by

Δpϵ(b)\displaystyle\Delta_{p}^{\epsilon}({b}) :=argminyyp s.t. Ayb2ϵ,and\displaystyle:=\arg\min_{y}\|{y}\|_{p}\text{ s.t. }{\|Ay-b\|_{2}\leq\epsilon},\ \text{and} (10)
Δp(b)\displaystyle\Delta_{p}(b) :=argminyyp s.t. Ay=b,\displaystyle:=\arg\min_{y}\|{y}\|_{p}\text{ s.t. }{Ay=b}, (11)

with 0<p<10<p<1 are also approximations of Δ0\Delta_{0}, the actions of which are computed via non-convex optimization problems that can still be solved, at least locally, much faster than (2). It is natural to ask whether the decoders Δp\Delta_{p} and Δpϵ\Delta^{\epsilon}_{p} possess robustness, stability, and instance optimality properties similar to those of Δ1\Delta_{1} and Δ1ϵ\Delta^{\epsilon}_{1}, and whether these are obtained under weaker conditions on the measurement matrices than the analogous ones with p=1p=1.

Early work by Gribonval and co-authors [22, 21, 20, 19] takes some initial steps toward answering these questions. In particular, they devise metrics that yield sufficient conditions under which uniqueness of Δ1(b)\Delta_{1}({b}) implies uniqueness of Δp(b)\Delta_{p}({b}), and specifically for having Δp(b)=Δ1(b)=x\Delta_{p}({b})=\Delta_{1}({b})=x. The authors also present stability conditions in terms of various norms that bound the error, and they conclude that the smaller the value of pp, the more non-zero entries can be recovered by (11). These conditions, however, are hard to check explicitly, and no class of deterministic or random matrices has been shown to satisfy them, even with high probability. On the other hand, the authors provide lower bounds for their metrics in terms of generalized mutual coherence. Still, these conditions are pessimistic in the sense that they generally guarantee recovery of only very sparse vectors.

Recently, Chartrand showed that in the noise-free setting, a sufficiently sparse signal can be recovered perfectly with Δp\Delta_{p}, where 0<p<10<p<1, under less restrictive RIP requirements than those needed to guarantee perfect recovery with Δ1\Delta_{1}. The following theorem was proved in [7].

Theorem 5

[7] Let 0<p10<p\leq 1, and let SS\in\mathbb{N}. Suppose that x{x} is SS-sparse, and set b=Ax{b}={A}{x}. If AA satisfies δkS+k2p1δ(k+1)S<k2p11\delta_{kS}+k^{\frac{2}{p}-1}\delta_{(k+1)S}<k^{\frac{2}{p}-1}-1 for some k>1k>1 such that k1Sk\in\frac{1}{S}\mathbb{N}, then Δp(b)=x\Delta_{p}({b})=x.

Note that, for example, when p=0.5p=0.5 and k=3k=3, the above theorem only requires δ3S+27δ4S<26\delta_{3S}+27\delta_{4S}<26 to guarantee perfect recovery with Δ0.5\Delta_{0.5}, a less restrictive condition than the analogous one needed to guarantee perfect reconstruction with Δ1\Delta_{1}, i.e., δ3S+3δ4S<2.\delta_{3S}+3\delta_{4S}<2. Moreover, in [8], Staneva and Chartrand study a modified RIP that is defined by replacing Ac2\|Ac\|_{2} in (4) with Acp\|Ac\|_{p}. They show that under this new definition of δS\delta_{S}, the same sufficient condition as in Theorem 5 guarantees perfect recovery. Staneva and Chartrand also show that if AA is an M×NM\times N Gaussian matrix, their sufficient condition is satisfied provided M>C1(p)S+pC2(p)Slog(N/S)M>C_{1}(p)S+pC_{2}(p)S\log(N/S), where C1(p)C_{1}(p) and C2(p)C_{2}(p) are given explicitly in [8]. It is important to note that pC2(p)pC_{2}(p) goes to zero as pp goes to zero. In other words, the dependence on NN of the required number of measurements MM (that guarantees perfect recovery for all xΣSNx\in\Sigma^{N}_{S}) disappears as pp approaches 0. This result motivates a more detailed study of the properties of the decoders Δp\Delta_{p} in terms of stability and robustness, which is the objective of this paper.

1.2.1 Algorithmic Issues

Clearly, recovery by p\ell^{p} minimization poses a non-convex optimization problem with many local minimizers. It is encouraging that simulation results from recent papers, e.g., [7, 25], strongly indicate that simple modifications to known approaches like iterated reweighted least squares algorithms and projected gradient algorithms yield points xx^{*} that are the global minimizers of the associated p\ell^{p} minimization problem (or approximate the global minimizers very well). It is also encouraging to note that even though the results presented in this work and in others [7, 22, 21, 20, 19, 25] assume that the global minimizer has been found, a significant set of these results, including all results in this paper, continue to hold if we can obtain a feasible point x~\widetilde{x}^{*} which satisfies x~pxp\|\widetilde{x}^{*}\|_{p}\leq\|x\|_{p} (where xx is the vector to be recovered). Nevertheless, it should be stated that, to our knowledge, the modified algorithms mentioned above have only been shown to converge to local minima. An example of such a scheme is sketched below.
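
For concreteness, the following is a minimal sketch of an iteratively reweighted least squares scheme for (11); the smoothing schedule and iteration count are illustrative choices of ours, and, as just noted, such schemes are only known to converge to local minima.

    import numpy as np

    def irls_lp(A, b, p, iters=100, eps=1.0):
        # Approximate min ||y||_p^p s.t. Ay = b by repeatedly solving
        # min sum_i w_i y_i^2 s.t. Ay = b, with w_i = (y_i^2+eps^2)^(p/2-1);
        # each subproblem has the closed form
        # y = W^{-1} A^T (A W^{-1} A^T)^{-1} b with W = diag(w).
        y = np.linalg.pinv(A) @ b                        # least-squares start
        for _ in range(iters):
            w_inv = (y ** 2 + eps ** 2) ** (1.0 - p / 2.0)   # 1 / w_i
            AW = A * w_inv                               # A W^{-1} (column scaling)
            y = w_inv * (A.T @ np.linalg.solve(AW @ A.T, b))
            eps *= 0.9                                   # anneal the smoothing
        return y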

1.3 Paper Outline

In what follows, we present generalizations of the above results, giving stability and robustness guarantees for p\ell^{p} minimization. In Section 2.1 we show that the decoders Δp\Delta_{p} and Δpϵ\Delta_{p}^{\epsilon} are robust to noise and (2,p) instance optimal in the case of appropriate measurement matrices. For this section we rely on, and expand, our note [25]. In Section 2.3 we extend [30] and show that for the same range of dimensions as for decoding by 1\ell^{1} minimization, i.e., when AωM×NA_{\omega}\in\mathbb{R}^{M\times N} with M>cSlog(N)M>cS\log(N), (Aω,Δp)(A_{\omega},\Delta_{p}) is also (2,2) instance optimal in probability for 0<p<10<p<1, provided the measurement matrix AωA_{\omega} is drawn from an appropriate distribution. The generalization follows the proof of Wojtaszczyk in [30]; however, it is non-trivial and requires a variant of a result by Gordon and Kalton [18] on the Banach-Mazur distance between a pp-convex body and its convex hull. In Section 3 we present some numerical results, further illustrating the possible benefits of using p\ell^{p} minimization and highlighting the behavior of the Δp\Delta_{p} decoder in terms of stability and robustness. Finally, in Section 4 we present the proofs of the main theorems and corollaries.

While writing this paper, we became aware of the work of Foucart and Lai [17], which shows similar (2,p)(2,p) instance optimality results for 0<p<10<p<1 under different sufficient conditions. In essence, one could use the (2,p)(2,p)-results of Foucart and Lai to obtain (2,2)(2,2) instance optimality in probability results similar to the ones we present in this paper, albeit with different constants. Since neither the sufficient conditions for (2,p)(2,p) instance optimality presented in [17] nor the ones in this paper are uniformly weaker, and since neither provides uniformly better constants, we simply use our estimates throughout.

2 Main Results

In this section, we present our theoretical results on the ability of p\ell^{p} minimization to recover sparse and compressible signals in the presence of noise.

2.1 Sparse recovery with Δp\Delta_{p}: stability and robustness

We begin with a deterministic stability and robustness theorem for the decoders Δp\Delta_{p} and Δpϵ\Delta_{p}^{\epsilon} when 0<p<10<p<1 that generalizes Theorem 1 of Candès et al. Note that the associated sufficient conditions on the measurement matrix, given in (12) below, are weaker for smaller values of pp than those that correspond to p=1p=1. The results in this subsection were initially reported, in part, in [25].

In what follows, we say that a matrix AA satisfies the property P(k,S,p)P(k,S,p) if it satisfies

δkS+k2p1δ(k+1)S<k2p11,\delta_{kS}+k^{\frac{2}{p}-1}\delta_{(k+1)S}<k^{\frac{2}{p}-1}-1, (12)

for SS\in\mathbb{N} and k>1k>1 such that k1Sk\in\frac{1}{S}\mathbb{N}.

Theorem 6 (General Case)

Let 0<p10<p\leq 1. Suppose that x{x} is arbitrary and b=Ax+eb=Ax+e where e2ϵ\|e\|_{2}\leq\epsilon. If AA satisfies P(k,S,p)P(k,S,p), then

Δpϵ(b)x2pC1ϵp+C2σS(x)ppS1p/2,\|\Delta_{p}^{\epsilon}({b})-{x}\|_{2}^{p}\leq C_{1}\epsilon^{p}+C_{2}\frac{\sigma_{S}(x)_{\ell^{p}}^{p}}{S^{1-p/2}}, (13)

where

C1=2p1+kp/21(2/p1)p/2(1δ(k+1)S)p/2(1+δkS)p/2kp/21,and C_{1}=2^{p}\frac{{1+{k^{p/2-1}(2/p-1)^{-p/2}}}}{(1-\delta_{(k+1)S})^{p/2}-(1+\delta_{kS})^{p/2}k^{p/2-1}},\quad\text{and } (14)
C2=2(p2p)p/2k1p/2(1+((2/p1)p2+kp/21)(1+δkS)p/2(1δ(k+1)S)p/2(1+δkS)p/2k1p/2).C_{2}=\frac{2(\frac{p}{2-p})^{p/2}}{k^{1-p/2}}\left(1+\frac{((2/p-1)^{\frac{p}{2}}+k^{p/2-1})(1+\delta_{kS})^{p/2}}{(1-\delta_{(k+1)S})^{p/2}-\frac{(1+\delta_{kS})^{p/2}}{k^{1-p/2}}}\right). (15)
Remark 7

By setting p=1p=1 and k=3k=3 in Theorem 6, we obtain Theorem 1, with precisely the same constants.

Remark 8

The constants in Theorem 6 are generally well behaved; e.g., C1=5.31C_{1}=5.31 and C2=4.31C_{2}=4.31 for δ4S=0.5\delta_{4S}=0.5 and p=0.5p=0.5. Note for δ4S=0.5\delta_{4S}=0.5 the sufficient condition (12) is not satisfied when p=1p=1, and thus Theorem 6 does not yield any upper bounds on Δ1(b)x2\|\Delta_{1}(b)-x\|_{2} in terms of σS(x)1\sigma_{S}(x)_{\ell^{1}}.

Corollary 9 ((2,p)(2,p) instance optimality)

Let 0<p10<p\leq 1. Suppose that AA satisfies P(k,S,p)P(k,S,p). Then (A,Δp)(A,\Delta_{p}) is (2,p)(2,p) instance optimal of order SS with constant C21/pC_{2}^{1/p} where C2C_{2} is as in (15).

Corollary 10 (sparse case)

Let 0<p10<p\leq 1. Suppose xΣSNx\in\Sigma^{N}_{S} and b=Ax+e{b}={Ax}+{e} where e2ϵ\|{e}\|_{2}\leq\epsilon. If AA satisfies P(k,S,p)P(k,S,p), then

Δpϵ(b)x2(C1)1/pϵ,\|\Delta_{p}^{\epsilon}({b})-{x}\|_{2}\leq\left(C_{1}\right)^{1/p}\epsilon,

where C1C_{1} is as in (14).

Remark 11

Corollaries 9 and 10 follow from Theorem 6 by setting ϵ=0\epsilon=0 and σS(x)p=0\sigma_{S}(x)_{\ell^{p}}=0, respectively. Furthermore, Corollary 10 can be proved independently of Theorem 6 leading to smaller constants. See [25] for the explicit values of these improved constants. Finally, note that setting ϵ=0\epsilon=0 in Corollary 10, we obtain Theorem 5 as a corollary.

Remark 12

In [17], Foucart and Lai give different sufficient conditions for exact recovery than those we present. In particular, they show that if

δmS<g(m):=4(21)(m/2)1/p1/24(21)(m/2)1/p1/2+2\delta_{mS}<{g}(m):=\frac{4(\sqrt{2}-1)(m/2)^{1/p-1/2}}{4(\sqrt{2}-1)(m/2)^{1/p-1/2}+2} (16)

holds for some m2,m1Sm\geq 2,m\in\frac{1}{S}\mathbb{N}, then Δp\Delta_{p} will recover signals in ΣSN\Sigma_{S}^{N} exactly. Note that the sufficient condition in this paper, i.e., (12), holds when

δmS<f(m):=(m1)2/p11(m1)2/p1+1\delta_{mS}<{f}(m):=\frac{(m-1)^{2/p-1}-1}{(m-1)^{2/p-1}+1} (17)

for some m2,m1Sm\geq 2,m\in\frac{1}{S}\mathbb{N}. In Figure 1, we compare these sufficient conditions as a function of mm for p=0.1,0.5,p=0.1,0.5, and 0.90.9. Figure 1 indicates that neither sufficient condition is weaker than the other for all values of mm. In fact, we can deduce that (16) is weaker when mm is close to 2, while (17) is weaker as mm grows larger. Since both conditions are only sufficient, if either one of them holds for an appropriate mm, then Δp\Delta_{p} recovers all signals in ΣSN\Sigma_{S}^{N}.

Figure 1: A comparison of the sufficient conditions on δmS\delta_{mS} in (17) and (16) as a function of mm, for p=0.1p=0.1 (top), p=0.5p=0.5 (center) and p=0.9p=0.9 (bottom).
Remark 13

In [12], Davies and Gribonval showed that if one chooses δ2S>δ(p)\delta_{2S}>\delta(p) (where δ(p)\delta(p) can be computed implicitly for 0<p10<p\leq 1), then there exist matrices (matrices in (N1)×N\mathbb{R}^{(N-1)\times N} that correspond to tight Parseval frames in N1\mathbb{R}^{N-1}) with the prescribed δ2S\delta_{2S} for which Δp\Delta_{p} fails to recover signals in ΣSN\Sigma_{S}^{N}. Note that this result does not contradict the results that we present in this paper: we provide sufficient conditions (e.g., (12)) in terms of δ(k+1)S\delta_{(k+1)S}, where k>1k>1 and kSkS\in\mathbb{N}, that guarantee recovery by Δp\Delta_{p}. These conditions are weaker than the corresponding conditions ensuring recovery by Δ1\Delta_{1}, which suggests that using Δp\Delta_{p} can be beneficial. Moreover, the numerical examples we provide in Section 3 indicate that by using Δp\Delta_{p}, 0<p<10<p<1, one can indeed recover signals in ΣSN\Sigma_{S}^{N}, even when Δ1\Delta_{1} fails to recover them (see Figure 2).

Remark 14

In summary, Theorem 6 states that if (12) is satisfied, then we can recover signals in ΣSN\Sigma_{S}^{N} stably by decoding with Δpϵ\Delta_{p}^{\epsilon}. It is worth mentioning that the sufficient conditions presented here reduce the gap between the conditions for exact recovery with Δ0\Delta_{0} (i.e., δ2S<1\delta_{2S}<1) and with Δ1\Delta_{1}, e.g., δ3S<1/3\delta_{3S}<1/3. For example, for k=2k=2 and p=0.5p=0.5, δ3S<7/9\delta_{3S}<7/9 is sufficient. In the next subsection, we quantify this improvement.

2.2 The relationship between S1S_{1} and SpS_{p}

Let AA be an M×NM\times N matrix and suppose δm\delta_{m}, m{1,,M/2}m\in\{1,\dots,\lfloor M/2\rfloor\} are its mm-restricted isometry constants. Define SpS_{p} for AA with 0<p10<p\leq 1 as the largest value of SS\in\mathbb{N} for which the slightly stronger version of (12) given by

δ(k+1)S<k2p11k2p1+1\delta_{(k+1)S}<\frac{k^{\frac{2}{p}-1}-1}{k^{\frac{2}{p}-1}+1} (18)

holds for some k>1k>1, k1Sk\in\frac{1}{S}\mathbb{N}. Consequently, by Theorem 6, Δp(Ax)=x\Delta_{p}(Ax)=x for all xΣSpNx\in\Sigma^{N}_{S_{p}}. We now establish a relationship between S1S_{1} and SpS_{p}.

Proposition 15

Suppose, in the above described setting, there exists S1S_{1}\in\mathbb{N} and k>1k>1, k1S1k\in\frac{1}{S_{1}}\mathbb{N} such that

δ(k+1)S1<k1k+1\delta_{(k+1)S_{1}}<\frac{k-1}{k+1} (19)

Then Δ1\Delta_{1} recovers all S1S_{1}-sparse vectors, and Δp\Delta_{p} recovers all SpS_{p}-sparse vectors with

Sp=k+1kp2p+1S1.S_{p}=\left\lfloor\frac{k+1}{k^{\frac{p}{2-p}}+1}S_{1}\right\rfloor.
Remark 16

For example, if δ5S1<3/5\delta_{5S_{1}}<3/5 then using Δ23\Delta_{\frac{2}{3}}, we can recover all S23S_{\frac{2}{3}}-sparse vectors with S23=53S1S_{\frac{2}{3}}=\lfloor\frac{5}{3}S_{1}\rfloor.
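
The bookkeeping in Proposition 15 is simple to script; a short sketch (the function name is ours) that reproduces the example in Remark 16:

    import math

    def S_p_from_S_1(S_1, k, p):
        # Sparsity level recovered by Delta_p per Proposition 15, given
        # that delta_{(k+1)S_1} < (k-1)/(k+1) holds, cf. (19).
        return math.floor((k + 1) / (k ** (p / (2 - p)) + 1) * S_1)

    # Remark 16: delta_{5 S_1} < 3/5 corresponds to k = 4, and then
    # S_{2/3} = floor((5/3) S_1).
    assert S_p_from_S_1(S_1=9, k=4, p=2 / 3) == 15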

2.3 Instance optimality in probability and Δp\Delta_{p}

In this section, we show that (Aω,Δp)(A_{\omega},\Delta_{p}) is (2,2)(2,2) instance optimal in probability when AωA_{\omega} is an appropriate random matrix. Our approach is based on that of [30], which we summarize now. A matrix AA is said to possess the LQ1(α)\text{LQ}_{1}(\alpha) property if and only if

A(B1N)αB2M,{A}(B_{1}^{N})\supset\alpha B_{2}^{M},

where BqnB_{q}^{n} denotes the q\ell^{q} unit ball in n\mathbb{R}^{n}. In [30], Wojtaszczyk shows that random Gaussian matrices of size M×NM\times N as well as matrices whose columns are drawn uniformly from the sphere possess, with high probability, the LQ1(α)\text{LQ}_{1}(\alpha) property with α=μlog(N/M)M\alpha=\mu\sqrt{\frac{\log{(N/M)}}{M}}. Noting that such matrices also satisfy RIP((k+1)S,δ)\text{RIP}((k+1)S,\delta) with S<cMlog(N/M)S<c\frac{M}{\log{(N/M)}}, again with high probability, Wojtaszczyk proves that Δ1\Delta_{1}, for these matrices, is (2,2) instance optimal in probability of order SS. Our strategy for generalizing this result to Δp\Delta_{p} with 0<p<10<p<1 relies on a generalization of the LQ1\text{LQ}_{1} property to an LQp\text{LQ}_{p} property. Specifically, we say that a matrix AA satisfies LQp(α)\text{LQ}_{p}(\alpha) if and only if

A(BpN)αB2M.{A}(B_{p}^{N})\supset\alpha B_{2}^{M}.

We first show that a random matrix AωA_{\omega}, either Gaussian or uniform as mentioned above, satisfies the LQp(α)\text{LQ}_{p}(\alpha) property with

α=1C(p)(μ2log(N/M)M)(1/p1/2).\alpha=\frac{1}{C(p)}\left(\mu^{2}{\frac{\log{(N/M)}}{M}}\right)^{(1/p-1/2)}.

Once we establish this property, the proof of instance optimality in probability for Δp\Delta_{p} proceeds largely unchanged from Wojtaszczyk’s proof with modifications to account only for the non-convexity of the p\ell^{p}-quasinorm with 0<p<10<p<1.

Next, we present our results on instance optimality of the Δp\Delta_{p} decoder, while deferring the proofs to Section 4. Throughout the rest of the paper, we focus on two classes of random matrices: Aω{A}_{\omega} denotes M×NM\times N matrices, the entries of which are drawn from a zero mean, normalized column-variance Gaussian distribution, i.e., Aω=(ai,j){A}_{\omega}=(a_{i,j}) where ai,j𝒩(0,1/M)a_{i,j}\sim\mathcal{N}(0,1/\sqrt{M}); in this case, we say that AωA_{\omega} is an M×NM\times N Gaussian random matrix. A~ω{\widetilde{A}}_{\omega}, on the other hand, denotes M×NM\times N matrices, the columns of which are drawn uniformly from the sphere; in this case we say that A~ω\widetilde{A}_{\omega} is an M×NM\times N uniform random matrix. In each case, (Ω,P)(\Omega,P) denotes the associated probability space.

We start with a lemma (which generalizes an analogous result of [30]) that shows that the matrices AωA_{\omega} and A~ω\widetilde{A}_{\omega} satisfy the LQp\text{LQ}_{p} property with high probability.

Lemma 17

Let 0<p10<p\leq 1, and let AωA_{\omega} be an M×NM\times N Gaussian random matrix. For 0<μ<1/20<\mu<1/\sqrt{2}, suppose that K1M(logM)ξNeK2MK_{1}M(\log M)^{\xi}\leq N\leq e^{K_{2}M} for some ξ>(12μ2)1\xi>(1-2\mu^{2})^{-1} and some constants K1,K2>0K_{1},K_{2}>0. Then, there exists a constant c=c(μ,ξ,K1,K2)>0c=c(\mu,\xi,K_{1},K_{2})>0, independent of pp, MM, and NN, and a set

Ωμ={ωΩ:Aω(BpN)1C(p)(μ2logN/MM)1/p1/2B2M}\Omega_{\mu}=\left\{\omega\in\Omega:{A}_{\omega}(B_{p}^{N})\supset\frac{1}{C(p)}\left(\mu^{2}{\frac{\log{N/M}}{M}}\right)^{1/p-1/2}B_{2}^{M}\right\}

such that P(Ωμ)1ecMP(\Omega_{\mu})\geq 1-e^{-cM}.

In other words, AωA_{\omega} satisfies the LQp(α){\text{LQ}}_{p}(\alpha) property, with α=1/C(p)(μ2log(N/M)M)1/p1/2\alpha=1/C(p)\left(\mu^{2}\frac{{\log{(N/M)}}}{M}\right)^{1/p-1/2}, with probability 1ecM\geq 1-e^{-cM} on the draw of the matrix. Here C(p)C(p) is a positive constant that depends only on pp; in particular, C(1)=1C(1)=1, and see (50) for the explicit value of C(p)C(p) when 0<p<10<p<1. The statement is also true for A~ω{\widetilde{A}}_{\omega}.

The above lemma for p=1p=1 can be found in [30]. As we will see in Section 4, the generalization of this result to 0<p<10<p<1 is non-trivial and requires a result from [18], cf. [23], relating certain “distances” of pp-convex bodies to their convex hulls. It is important to note that this lemma provides the machinery needed to prove the following theorem, which extends to Δp\Delta_{p}, 0<p<10<p<1, the analogous result of Wojtaszczyk [30] for Δ1\Delta_{1}.

In what follows, for a set T{1,,N}T\subseteq\{1,\dots,N\}, Tc:={1,,N}TT^{c}:=\{1,\dots,N\}\setminus T; for yNy\in\mathbb{R}^{N}, yTy_{T} denotes the vector with entries yT(j)=y(j)y_{T}(j)=y(j) for all jTj\in T, and yT(j)=0y_{T}(j)=0 for jTcj\in T^{c}.

Theorem 18

Let 0<p<10<p<1. Suppose that AM×N{A}\in\mathbb{R}^{M\times N} satisfies RIP(S,δ){\text{RIP}}(S,\delta) and LQp(1C(p)(μ2/S)1/p1/2){\text{LQ}}_{p}\left(\frac{1}{C(p)}(\mu^{2}/S)^{1/p-1/2}\right) for some μ>0\mu>0 and C(p)C(p) as in (50). Let Δ\Delta be an arbitrary decoder. If (A,Δ)({A},\Delta) is (2,p) instance optimal of order SS with constant C2,pC_{2,p}, then for any xN{x}\in\mathbb{R}^{N} and eM{e}\in\mathbb{R}^{M}, all of the following hold.

  1. (i)

    Δ(Ax+e)x2C(e2+σS(x)pS1/p1/2)\|\Delta({Ax+e})-{x}\|_{2}\leq C(\|{e}\|_{2}+\frac{\sigma_{S}(x)_{\ell^{p}}}{S^{1/p-1/2}})

  2. (ii)

    Δ(Ax)x2C(AxT0c2+σS(x)2)\|\Delta({Ax})-{x}\|_{2}\leq C(\|{Ax}_{T_{0}^{c}}\|_{2}+\sigma_{S}(x)_{\ell^{2}})

  3. (iii)

    Δ(Ax+e)x2C(e2+σS(x)2+AxT0c2)\|\Delta({Ax+e})-{x}\|_{2}\leq C(\|{e}\|_{2}+\sigma_{S}(x)_{\ell^{2}}+\|{Ax}_{T_{0}^{c}}\|_{2})

Above, T0T_{0} denotes the set of indices of the largest (in magnitude) SS coefficients of xx; the constants (all denoted by CC) depend on δ\delta, μ\mu, pp, and C2,pC_{2,p} but not on MM and NN. For the explicit values of these constants see (38) and (39).

Finally, our main theorem on the instance optimality in probability of the Δp\Delta_{p} decoder follows.

Theorem 19

Let 0<p<10<p<1, and let AωA_{\omega} be an M×NM\times N Gaussian random matrix. Suppose that NM[log(M)]2N\geq M[\log(M)]^{2}. There exist constants c1,c2,c3>0c_{1},c_{2},c_{3}>0 such that for all SS\in\mathbb{N} with Sc1M/log(N/M)S\leq c_{1}M/\log{(N/M)}, the following are true.

  1. (i)

    There exists Ω1\Omega_{1} with P(Ω1)13ec2MP(\Omega_{1})\geq 1-3e^{-c_{2}M} such that for all ωΩ1\omega\in\Omega_{1}

    Δp(Aω(x)+e)x2C(e2+σS(x)pS1/p1/2),\|\Delta_{p}(A_{\omega}(x)+e)-{x}\|_{2}\leq C(\|{e}\|_{2}+\frac{\sigma_{S}(x)_{\ell^{p}}}{S^{1/p-1/2}}), (20)

    for any xNx\in\mathbb{R}^{N} and for any eMe\in\mathbb{R}^{M}.

  2. (ii)

    For any xN{x}\in\mathbb{R}^{N}, there exists Ωx\Omega_{x} with P(Ωx)14ec3MP(\Omega_{x})\geq 1-4e^{-c_{3}M} such that for all ωΩx\omega\in\Omega_{x}

    Δp(Aω(x)+e)x2C(e2+σS(x)2),\|\Delta_{p}(A_{\omega}(x)+e)-{x}\|_{2}\leq C\left(\|{e}\|_{2}+\sigma_{S}(x)_{\ell^{2}}\right), (21)

    for any eMe\in\mathbb{R}^{M}.

The statement also holds for A~ω\widetilde{A}_{\omega}, i.e., for random matrices the columns of which are drawn independently from a uniform distribution on the sphere.

Remark 20

The constants above (both denoted by CC) depend on the parameters of the particular LQp\text{LQ}_{p} and RIP properties that the matrix satisfies, and are given explicitly in Section 4, see (38) and (41). The constants c1,c2c_{1},c_{2}, and c3c_{3} depend only on pp and the distribution of the underlying random matrix (see the proof in Section 4.5) and are independent of MM and NN.

Remark 21

Clearly, the statements do not make sense if the hypothesis of the theorem forces SS to be 0. In turn, for a given (M,N)(M,N) pair, it is possible that there is no positive integer SS for which the conclusions of Theorem 19 hold. In particular, to get a non-trivial statement, one needs M>1c1log(N/M)M>\frac{1}{c_{1}}\log(N/M).

Remark 22

Note the difference in the order of the quantifiers between conclusions (i) and (ii) of Theorem 19. Specifically, with statement (i), once the matrix is drawn from the “good” set Ω1\Omega_{1}, we obtain the error guarantee (20) for every xx and ee. In other words, after the initial draw of a good matrix AA, stability and robustness in the sense of (20) are ensured. On the other hand, statement (ii) concludes that associated with every xx is a “good” set Ωx\Omega_{x} (possibly different for different xx) such that if the matrix is drawn from Ωx\Omega_{x}, then stability and robustness in the sense of (21) are guaranteed. Thus, in (ii), for every xx, a different matrix is drawn, and with high probability on that draw (21) holds.

Remark 23

The above theorem pertains to the decoders Δp\Delta_{p} which, like the decoder Δ1\Delta_{1} treated by the analogous theorem in [30], require no knowledge of the noise level. In other words, Δp\Delta_{p} provides estimates of sparse and compressible signals from limited and noisy observations without having to explicitly account for the noise in the decoding. This provides an improvement on Theorem 6 and a practical advantage when estimates of measurement noise levels are absent.

3 Numerical Experiments

In this section, we present some numerical experiments to highlight important aspects of sparse reconstruction by decoding using Δp\Delta_{p}, 0<p10<p\leq 1. First, we compare the sufficient conditions under which decoding with Δp\Delta_{p} guarantees perfect recovery of signals in ΣSN\Sigma^{N}_{S} for different values of pp and SS. Next, we present numerical results illustrating the robustness and instance optimality of the Δp\Delta_{p} decoder. Here, we wish to observe the linear growth of the 2\ell^{2} reconstruction error Δp(Ax+e)x2\|\Delta_{p}({Ax+e})-{x}\|_{2}, as a function of σS(x)2\sigma_{S}(x)_{\ell^{2}} and of e2\|{e}\|_{2}.

To that end, we generate a 100×300100\times 300 matrix AA whose columns are drawn from a Gaussian distribution, and we estimate its RIP constants δS\delta_{S} via Monte Carlo (MC) simulations. Under the assumption that the estimated constants are the correct ones (while in fact they are only lower bounds), Figure 2 (left) shows the regions where (12) guarantees recovery for different (S,p)(S,p)-pairs. On the other hand, Figure 2 (right) shows the empirical recovery rates via p\ell^{p} quasinorm minimization: to obtain this figure, for every S=1,,49S=1,\dots,49, we chose 50 different instances of xΣS300x\in\Sigma^{300}_{S}, the non-zero coefficients of each drawn i.i.d. from the standard Gaussian distribution. These vectors were encoded using the same measurement matrix AA as above. Since there is no known algorithm that will yield the global minimizer of the optimization problem (11), we approximated the action of Δp\Delta_{p} by using a projected gradient algorithm on a sequence of smoothed versions of the p\ell^{p} minimization problem: in (11), instead of minimizing yp\|y\|_{p}, we minimized (i(yi2+ϵ2)p/2)1/p\left(\sum_{i}{({y}_{i}^{2}+\epsilon^{2})^{p/2}}\right)^{1/p}, initially with a large ϵ\epsilon. We then used the corresponding solution as the starting point of the next subproblem, obtained by decreasing the value of ϵ\epsilon according to the rule ϵn=(0.99)ϵn1\epsilon_{n}=(0.99)\epsilon_{n-1}. We continued reducing the value of ϵ\epsilon and solving the corresponding subproblem until ϵ\epsilon became very small; a sketch of this scheme is given below. Note that this approach is similar to the one described in [7]. The empirical results show that Δp\Delta_{p} (in fact, the approximation of Δp\Delta_{p} as described above) is successful in a wider range of scenarios than those predicted by Theorem 6. This can be attributed to the fact that the conditions presented in this paper are only sufficient, or to the fact that in practice what is observed is not necessarily a manifestation of uniform recovery. Rather, the practical results could be interpreted as success of Δp\Delta_{p} with high probability on either xx or A{A}.
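
The smoothing scheme just described can be sketched as follows (step size, inner iteration count, and stopping threshold are illustrative choices of ours; note that minimizing i(yi2+ϵ2)p/2\sum_{i}(y_{i}^{2}+\epsilon^{2})^{p/2} has the same minimizers as minimizing its 1/p1/p-th power):

    import numpy as np

    def smoothed_lp_decode(A, b, p, eps=1.0, eps_min=1e-4,
                           inner_iters=100, step=0.1):
        # Projected gradient descent on the smoothed objective
        # f_eps(y) = sum_i (y_i^2 + eps^2)^(p/2) over {y : Ay = b},
        # decreasing eps between subproblems by eps_n = 0.99 eps_{n-1}.
        pinv = np.linalg.pinv(A)
        project = lambda y: y - pinv @ (A @ y - b)   # onto {y : Ay = b}
        y = pinv @ b                                 # feasible start
        while eps > eps_min:
            for _ in range(inner_iters):
                grad = p * y * (y ** 2 + eps ** 2) ** (p / 2.0 - 1.0)
                y = project(y - step * grad)
            eps *= 0.99
        return y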

Figure 2: For a Gaussian matrix A100×300{A}\in{{{{\mathbb{R}}}}}^{100\times 300}, whose δS\delta_{S} values are estimated via MC simulations, we generate the theoretical (left) and practical (right) phase-diagrams for reconstruction via p\ell^{p} minimization.
Figure 3: Reconstruction error with compressible signals (left), noisy observations (right). Observe the almost linear growth of the error in both experiments and for different values of pp, highlighting the instance optimality of the decoders. The plots were generated by averaging the results of 10 experiments with the same matrix AA and randomized locations of the coefficients of xx.

Next, we generate scenarios that illustrate the conclusions of Theorem 19. To that end, we generate a signal composed of xTΣ40300x_{T}\in\Sigma_{40}^{300}, supported on an index set TT, and a signal zTcz_{T^{c}} supported on TcT^{c}, where all the coefficients are drawn from the standard Gaussian distribution. We then normalize xTx_{T} and zTcz_{T^{c}} so that xT2=zTc2=1\|x_{T}\|_{2}=\|z_{T^{c}}\|_{2}=1 and generate x=xT+λzTcx=x_{T}+\lambda z_{T^{c}} with increasing values of λ\lambda (starting from 0), thereby increasing σ40(x)2λ\sigma_{40}(x)_{\ell^{2}}\approx\lambda. For this experiment, we choose our measurement matrix A100×300A\in\mathbb{R}^{100\times 300} by drawing its columns uniformly from the sphere. For each value of λ\lambda we measure the reconstruction error Δp(Ax)x2\|\Delta_{p}(Ax)-x\|_{2}, and we repeat the process 10 times while randomizing the index set TT but preserving the coefficient values. We report the averaged results in Figure 3 (left) for different values of pp. Similarly, we generate noisy observations AxT+λeAx_{T}+\lambda e of a sparse signal xTΣ40300x_{T}\in\Sigma_{40}^{300} where xT2=e2=1\|x_{T}\|_{2}=\|e\|_{2}=1, and we increase the noise level starting from λ=0\lambda=0. Here, again, the non-zero entries of xTx_{T} and all entries of ee were chosen i.i.d. from the standard Gaussian distribution and then the vectors were properly normalized. Next, we measure Δp(AxT+λe)xT2\|\Delta_{p}(Ax_{T}+\lambda e)-x_{T}\|_{2} (for 10 realizations where we randomize TT) and report the averaged results in Figure 3 (right) for different values of pp. In both of these experiments, we observe that the error increases roughly linearly as we increase λ\lambda, i.e., σ40(x)2\sigma_{40}(x)_{\ell^{2}} and the noise power, respectively. Moreover, when the signal is highly compressible or when the noise level is low, we observe that reconstruction using Δp\Delta_{p} with 0<p<10<p<1 yields a lower approximation error than that with p=1p=1. It is also worth noting that for values of pp close to one, even in the case of sparse signals with no noise, the average reconstruction error is non-zero. This may be due to the fact that for such large pp the number of measurements is not sufficient for the recovery of signals with S=40S=40, further highlighting the benefits of using the decoder Δp\Delta_{p} with smaller values of pp.

Finally, in Figure 4, we plot the results of an experiment in which we generate signals x200x\in\mathbb{R}^{200} with sorted coefficients x(j)x(j) that decay according to a power law. In particular, for various values of 0<q<10<q<1, we set x(j)=cj1/qx(j)=cj^{-1/q} such that x2=1\|x\|_{2}=1. We then encode xx with 50 different 100×200100\times 200 measurement matrices whose columns were drawn from the uniform distribution on the sphere, and examine the approximations obtained by decoding with Δp\Delta_{p} for different values of 0<p<10<p<1. The results indicate that values of pqp\approx q provide the lowest reconstruction errors. Note that in Figure 4, we report the results in the form of signal-to-noise ratios, defined as

SNR=20log10(x2Δ(Ax)x2).SNR=20\log_{10}\left(\frac{\|x\|_{2}}{\|\Delta{(Ax)}-x\|_{2}}\right).
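
In code, assuming numpy, this is simply:

    import numpy as np

    def snr_db(x, x_hat):
        # Reconstruction signal-to-noise ratio in dB, as defined above.
        return 20 * np.log10(np.linalg.norm(x) / np.linalg.norm(x_hat - x))
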
Figure 4: Reconstruction signal-to-noise ratios (in dB) obtained by using Δp\Delta_{p} to recover signals whose sorted coefficients decay according to a power law (x(j)=cj1/q,x2=1x(j)=cj^{-1/q},\|x\|_{2}=1) as a function of qq (left) and as a function of pp (right). The presented results are averages of 50 experiments performed with different matrices in 100×200\mathbb{R}^{100\times 200}. Observe that for highly compressible signals, e.g., for q=0.4q=0.4, there is a 5 dB gain in using p<0.6p<0.6 as compared to p=1p=1. The performance advantage is about 2 dB for q=0.6q=0.6. As the signals become much less compressible, i.e., as we increase qq to 0.90.9, the performances are almost identical.

4 Proofs

4.1 Proof of Proposition 15

First, note that for any AM×NA\in\mathbb{R}^{M\times N}, δm\delta_{m} is non-decreasing in mm. Also, the map kk1k+1k\mapsto\frac{k-1}{k+1} is increasing in kk for k0k\geq 0.

Set

L:=(k+1)S1,~=kp2p,and S~p=L~+1.L:=(k+1)S_{1},\quad\widetilde{\ell}=k^{\frac{p}{2-p}},\quad\text{and $\widetilde{S}_{p}=\frac{L}{\widetilde{\ell}+1}$.}

Then

δ(~+1)S~p=δ(k+1)S1<k1k+1=~2pp1~2pp+1.\delta_{(\widetilde{\ell}+1)\widetilde{S}_{p}}=\delta_{(k+1)S_{1}}<\frac{k-1}{k+1}=\frac{\widetilde{\ell}^{\frac{2-p}{p}}-1}{\widetilde{\ell}^{\frac{2-p}{p}}+1}.

We now describe how to choose \ell and SpS_{p} such that ~\ell\geq\widetilde{\ell}, SpS_{p}\in\mathbb{N}, and (+1)Sp=L(\ell+1)S_{p}=L (this will be sufficient to complete the proof using the monotonicity observations above). First, note that this last equality is satisfied only if (,Sp)(\ell,S_{p}) is in the set

{(nLn,Ln):n=1,,L1}.\{(\frac{n}{L-n},L-n):\ n=1,\dots,L-1\}.

Let nn^{*} be such that

n1Ln+1<~nLn.\frac{n^{*}-1}{L-n^{*}+1}<\widetilde{\ell}\leq\frac{n^{*}}{L-n^{*}}. (22)

To see that such an nn^{*} exists, recall that ~=kp2p\widetilde{\ell}=k^{\frac{p}{2-p}} where 0<p<10<p<1. Also, (k+1)S1=L(k+1)S_{1}=L with S1S_{1}\in\mathbb{N}, and k>1k>1. Consequently, 1<~<kL11<\widetilde{\ell}<k\leq L-1, and k{nLn:n=L2,,L1}k\in\{\frac{n}{L-n}:\ n=\lceil\frac{L}{2}\rceil,\dots,L-1\}. Thus, we know that we can find nn^{*} as above. Furthermore, nLn>1\frac{n^{*}}{L-n^{*}}>1. It follows from (22) that

LnS~p<Ln+1.L-n^{*}\leq\widetilde{S}_{p}<L-n^{*}+1.

We now choose

=nLn,andSp=S~p=Ln.\ell=\frac{n^{*}}{L-n^{*}},\quad\text{and}\ S_{p}=\lfloor\widetilde{S}_{p}\rfloor=L-n^{*}.

Then (+1)Sp=L(\ell+1)S_{p}=L, and ~\ell\geq\widetilde{\ell}. So, we conclude that for \ell as above and

Sp=S~p=k+1kp2p+1S1,S_{p}=\lfloor\widetilde{S}_{p}\rfloor=\left\lfloor\frac{k+1}{k^{\frac{p}{2-p}}+1}S_{1}\right\rfloor,

we have

δ(+1)Sp<2pp12pp+1.\delta_{(\ell+1)S_{p}}<\frac{\ell^{\frac{2-p}{p}}-1}{\ell^{\frac{2-p}{p}}+1}.

Consequently, the condition of Corollary 10 is satisfied and we have the desired conclusion. ∎

4.2 Proof of Theorem 6

We modify the proof by Candès et al. of the analogous result for the decoder Δ1\Delta_{1} (Theorem 2 in [4]) to account for the non-convexity of the p\ell^{p} quasinorm. We give the full proof for completeness. We stick to the notation of [4] whenever possible.

Let 0<p<10<p<1, xNx\in\mathbb{R}^{N} be arbitrary, and define x:=Δpϵ(b)x^{*}:=\Delta_{p}^{\epsilon}(b) and h:=xxh:=x^{*}-x. Our goal is to obtain an upper bound on h2\|h\|_{2} given that Ah22ϵ\|Ah\|_{2}\leq 2\epsilon (by definition of Δpϵ\Delta_{p}^{\epsilon}).

Below, for a set T{1,,N}T\subseteq\{1,\dots,N\}, Tc:={1,,N}TT^{c}:=\{1,\dots,N\}\setminus T; for yNy\in\mathbb{R}^{N}, yTy_{T} denotes the vector with entries yT(j)=y(j)y_{T}(j)=y(j) for all jTj\in T, and yT(j)=0y_{T}(j)=0 for jTcj\in T^{c}.

( I )  We start by decomposing hh as a sum of sparse vectors with disjoint support. In particular, denote by T0T_{0} the set of indices of the largest (in magnitude) SS coefficients of xx (here SS is to be determined later). Next, partition T0cT_{0}^{c} into sets T1,T2,T_{1},T_{2},\dots, |Tj|=L|T_{j}|=L for j1j\geq 1 where LL\in\mathbb{N} (also to be determined later), such that T1T_{1} is the set of indices of the LL largest (in magnitude) coefficients of hT0ch_{T_{0}^{c}}, T2T_{2} is the set of indices of the second LL largest coefficients of hT0ch_{T_{0}^{c}}, and so on. Finally, let T01:=T0T1T_{01}:=T_{0}\cup T_{1}. We now obtain a lower bound for Ah2p\|Ah\|_{2}^{p} using the RIP constants of the matrix AA. In particular, we have

Ah2p\displaystyle\|Ah\|_{2}^{p} =\displaystyle= AhT01+j2AhTj2p\displaystyle\|Ah_{T_{01}}+\sum_{j\geq 2}Ah_{T_{j}}\|_{2}^{p} (23)
\displaystyle\geq AhT012pj2AhTj2p\displaystyle\|Ah_{T_{01}}\|_{2}^{p}-\sum_{j\geq 2}\|Ah_{T_{j}}\|_{2}^{p}
\displaystyle\geq (1δL+|T0|)p/2hT012p(1+δL)p/2j2hTj2p.\displaystyle(1-\delta_{L+|T_{0}|})^{p/2}\|h_{T_{01}}\|_{2}^{p}-(1+\delta_{L})^{p/2}\sum_{j\geq 2}\|h_{T_{j}}\|_{2}^{p}.

Above, together with RIP, we used the fact that 2p\|\cdot\|_{2}^{p} satisfies the triangle inequality for any 0<p<10<p<1. What now remains is to relate hT012p\|h_{T_{01}}\|_{2}^{p} and j2hTj2p\sum_{j\geq 2}\|h_{T_{j}}\|_{2}^{p} to h2\|h\|_{2}.

( II )  Next, we aim to bound j2hTj2p\sum_{j\geq 2}\|h_{T_{j}}\|_{2}^{p} from above in terms of h2\|h\|_{2}. To that end, we proceed as in [4]. First, note that |hTj+1()|p|hTj()|p|h_{T_{j+1}}(\ell)|^{p}\leq|h_{T_{j}}(\ell^{\prime})|^{p} for all Tj+1,Tj\ell\in T_{j+1},\ell^{\prime}\in T_{j}, and thus |hTj+1()|phTjpp/L|h_{T_{j+1}}(\ell)|^{p}\leq\|h_{T_{j}}\|_{p}^{p}/L. It follows that hTj+122L12phTjp2\|h_{T_{j+1}}\|_{2}^{2}\leq L^{1-\frac{2}{p}}\|h_{T_{j}}\|_{p}^{2}, and consequently

j2hTj2pLp21j1hTjpp=Lp21hT0cpp.\sum_{j\geq 2}\|h_{T_{j}}\|_{2}^{p}\leq L^{\frac{p}{2}-1}\sum_{j\geq 1}\|h_{T_{j}}\|_{p}^{p}=L^{\frac{p}{2}-1}\|h_{T_{0}^{c}}\|_{p}^{p}. (24)

Next, note that, similar to the case when p=1p=1 as shown in [4], the “error” hh is concentrated on the “essential support” of xx (in our case T0T_{0}). To quantify this claim, we repeat the analogous calculation in [4]: Note, first, that by definition of xx^{*},

xpp=x+hpp=xT0+hT0pp+xT0c+hT0cppxpp.\|x^{*}\|_{p}^{p}=\|x+h\|_{p}^{p}=\|x_{T_{0}}+h_{T_{0}}\|_{p}^{p}+\|x_{T^{c}_{0}}+h_{T^{c}_{0}}\|_{p}^{p}\leq\|x\|_{p}^{p}.

As pp\|\cdot\|_{p}^{p} satisfies the triangle inequality, we then have

xT0pphT0pp+hT0cppxT0cppxpp.\|x_{T_{0}}\|_{p}^{p}-\|h_{T_{0}}\|_{p}^{p}+\|h_{T_{0}^{c}}\|_{p}^{p}-\|x_{T_{0}^{c}}\|_{p}^{p}\leq\|x\|_{p}^{p}.

Consequently,

hT0cpphT0pp+2xT0cpp,\|h_{T_{0}^{c}}\|_{p}^{p}\leq\|h_{T_{0}}\|_{p}^{p}+2\|x_{T_{0}^{c}}\|_{p}^{p}, (25)

which, together with (24), implies

j2hTj2pLp21(hT0pp+2xT0cpp)ρ1p2(hT012p+2|T0|p21xT0cpp),\sum_{j\geq 2}\|h_{T_{j}}\|_{2}^{p}\leq L^{\frac{p}{2}-1}(\|h_{T_{0}}\|_{p}^{p}+2\|x_{T_{0}^{c}}\|_{p}^{p})\leq\rho^{1-\frac{p}{2}}(\|h_{T_{01}}\|_{2}^{p}+2|T_{0}|^{\frac{p}{2}-1}\|x_{T_{0}^{c}}\|_{p}^{p}), (26)

where ρ:=|T0|L\rho:=\frac{|T_{0}|}{L}, and we used the fact that hT0pp|T0|1p2hT02p\|h_{T_{0}}\|_{p}^{p}\leq|T_{0}|^{1-\frac{p}{2}}\|h_{T_{0}}\|_{2}^{p} (which follows as |supp(hT0)|=|T0||\text{supp}(h_{T_{0}})|=|T_{0}|). Using (26) and (23), we obtain

Ah2pCp,L,|T0|hT012p2ρ1p2|T0|p21(1+δL)p2xT0cpp,\|Ah\|_{2}^{p}\geq C_{p,L,|T_{0}|}\|h_{T_{01}}\|_{2}^{p}-2\rho^{1-\frac{p}{2}}|T_{0}|^{\frac{p}{2}-1}(1+\delta_{L})^{\frac{p}{2}}\|x_{T_{0}^{c}}\|_{p}^{p}, (27)

where

Cp,L,|T0|:=(1δL+|T0|)p2(1+δL)p2ρ1p2.C_{p,L,|T_{0}|}:=(1-\delta_{L+|T_{0}|})^{\frac{p}{2}}-(1+\delta_{L})^{\frac{p}{2}}\rho^{1-\frac{p}{2}}. (28)

At this point, using Ah22ϵ\|Ah\|_{2}\leq 2\epsilon, we obtain an upper bound on hT012\|h_{T_{01}}\|_{2} given by

hT012p1Cp,L,|T0|((2ϵ)p+2ρ1p2(1+δL)p2xT0cpp|T0|1p2),\|h_{T_{01}}\|_{2}^{p}\leq\frac{1}{C_{p,L,|T_{0}|}}\left((2\epsilon)^{p}+2\rho^{1-\frac{p}{2}}(1+\delta_{L})^{\frac{p}{2}}\frac{\|x_{T_{0}^{c}}\|_{p}^{p}}{|T_{0}|^{1-\frac{p}{2}}}\right), (29)

provided Cp,L,|T0|>0C_{p,L,|T_{0}|}>0 (this will impose the condition given in (12) on the RIP constants of the underlying matrix AA).

( III )  To complete the proof, we will show that the error vector hh is concentrated on T01T_{01}. Denote by hT0c[m]h_{T_{0}^{c}}[m] the mmth largest (in magnitude) coefficient of hT0ch_{T_{0}^{c}} and observe that |hT0c[m]|phT0cpp/m|h_{T_{0}^{c}}[m]|^{p}\leq\|h_{T_{0}^{c}}\|_{p}^{p}/m. As hT01c[m]=hT0c[L+m]h_{T_{01}^{c}}[m]=h_{T_{0}^{c}}[L+m], we then have

hT01c22=mL+1|hT0c[m]|2mL+1(hT0cppm)2phT0cp2L2p1(2/p1).\|h_{T_{01}^{c}}\|_{2}^{2}=\sum_{m\geq L+1}|h_{T_{0}^{c}}[m]|^{2}\leq\sum_{m\geq L+1}\left(\frac{\|h_{T_{0}^{c}}\|^{p}_{p}}{m}\right)^{\frac{2}{p}}\leq\frac{\|h_{T_{0}^{c}}\|_{p}^{2}}{L^{\frac{2}{p}-1}(2/p-1)}. (30)

Here, the last inequality follows because for 0<p<10<p<1

mL+1m2pLt2p𝑑t=1L2p1(2/p1).\sum_{m\geq L+1}m^{-\frac{2}{p}}\leq\int_{L}^{\infty}t^{-\frac{2}{p}}dt=\frac{1}{L^{\frac{2}{p}-1}(2/p-1)}.

Finally, we use (25) and (30) to conclude

h22\displaystyle\|h\|_{2}^{2} =\displaystyle= hT0122+hT01c22hT0122+[hT0pp+2xT0cppL1p2(2/p1)p2]2p\displaystyle\|h_{T_{01}}\|_{2}^{2}+\|h_{T_{01}^{c}}\|_{2}^{2}\leq\|h_{T_{01}}\|_{2}^{2}+\left[\frac{\|h_{T_{0}}\|_{p}^{p}+2\|x_{T^{c}_{0}}\|_{p}^{p}}{L^{1-\frac{p}{2}}(2/p-1)^{\frac{p}{2}}}\right]^{\frac{2}{p}} (31)
\displaystyle\leq [(1+ρ1p2(2/p1)p2)hT012p+2ρ1p2(2/p1)p2xT0cpp|T0|1p2]2p.\displaystyle\left[\big{(}1+\rho^{1-\frac{p}{2}}(2/p-1)^{-\frac{p}{2}}\big{)}\|h_{T_{01}}\|_{2}^{p}+2\rho^{1-\frac{p}{2}}(2/p-1)^{-\frac{p}{2}}\frac{\|x_{T^{c}_{0}}\|_{p}^{p}}{|T_{0}|^{1-\frac{p}{2}}}\right]^{\frac{2}{p}}.

Above, we used the fact that hT0pp|T0|1p2hT02p\|h_{T_{0}}\|_{p}^{p}\leq|T_{0}|^{1-\frac{p}{2}}\|h_{T_{0}}\|_{2}^{p}, and that for any a,b0a,b\geq 0, and α1\alpha\geq 1, aα+bα(a+b)αa^{\alpha}+b^{\alpha}\leq(a+b)^{\alpha}.

( IV )  We now set |T0|=S|T_{0}|=S, L=kSL=kS where kk and SS are chosen such that Cp,kS,S>0C_{p,kS,S}>0 which is equivalent to having kk, SS, and pp satisfy (12). In this case, xT0cp=σS(x)p\|x_{T_{0}^{c}}\|_{p}=\sigma_{S}(x)_{\ell^{p}}, ρ=1/k\rho=1/k, and combining (29) and (31) yields

h2pC1ϵp+C2σS(x)ppS1p2\|h\|_{2}^{p}\leq C_{1}\epsilon^{p}+C_{2}\frac{\sigma_{S}(x)_{\ell^{p}}^{p}}{S^{1-\frac{p}{2}}} (32)

where C1C_{1} and C2C_{2} are as in (14) and (15), respectively. ∎

4.3 Proof of Lemma 17.

( I ) The following result of Wojtaszczyk [30, Proposition 2.2] will be useful.

Proposition 24 ([30])

Let AωA_{\omega} be an M×NM\times N Gaussian random matrix, let 0<μ<1/20<\mu<1/\sqrt{2}, and suppose that K1M(logM)ξNeK2MK_{1}M(\log M)^{\xi}\leq N\leq e^{K_{2}M} for some ξ>(12μ2)1\xi>(1-2\mu^{2})^{-1} and some constants K1,K2>0K_{1},K_{2}>0. Then, there exists a constant c=c(μ,ξ,K1,K2)>0c=c(\mu,\xi,K_{1},K_{2})>0, independent of MM and NN, and a set

Ωμ={ω:Aω(B1N)μlogN/MMB2M}\Omega_{\mu}=\left\{\omega:{A}_{\omega}(B_{1}^{N})\supset\mu\sqrt{\frac{\log{N/M}}{M}}B_{2}^{M}\right\}

such that

P(Ωμ)1ecM.P(\Omega_{\mu})\geq 1-e^{-cM}.

The above statement is true also for A~ω{\widetilde{A}}_{\omega}.

We will also use the following adaptation of [18, Lemma 2] for which we will first introduce some notation. Define a body to be a compact set containing the origin as an interior point and star shaped with respect to the origin [23]. Below, we use conv(K)conv(K) to denote the convex-hull of a body KK. For KBK\subseteq B, we denote by d1(K,B)d_{1}(K,B) the “distance” between KK and BB given by

d1(K,B):=inf{λ>0:KBλK}=inf{λ>0:1λBKB}.d_{1}(K,B):=\inf\{\lambda>0:\ K\subset B\subset\lambda K\}=\inf\{\lambda>0:\ \frac{1}{\lambda}B\subset K\subset B\}.

Finally, we call a body KK pp-convex if for any x,yKx,y\in K, λx+μyK\lambda x+\mu y\in K whenever λ,μ[0,1]\lambda,\mu\in[0,1] such that λp+μp=1\lambda^{p}+\mu^{p}=1.

Lemma 25

Let 0<p<10<p<1, and let KK be a pp-convex body in n\mathbb{R}^{n}. If conv(K)B2nconv(K)\subset B_{2}^{n}, then

d1(K,B2n)C(p)d1(conv(K),B2n)(2/p1),d_{1}(K,B_{2}^{n})\leq C(p)d_{1}(conv(K),B_{2}^{n})^{(2/p-1)},

where

C(p)=(21p+(1p)21p/2p)2pp2(1(1p)log2)22pp2.C(p)=\left(2^{1-p}+\frac{(1-p)2^{1-p/2}}{p}\right)^{\frac{2-p}{p^{2}}}\left(\frac{1}{(1-p)\log 2}\right)^{\frac{2-2p}{p^{2}}}.

We defer the proof of this lemma to the Appendix.

( II ) Note that A~ω(B1N)B2M{\widetilde{A}}_{\omega}(B_{1}^{N})\subset B_{2}^{M}. This follows because A~ω12\|{\widetilde{A}}_{\omega}\|_{1\rightarrow 2}, which is equal to the largest column norm of A~ω\widetilde{A}_{\omega}, is 1 by construction. Thus, for xB1Nx\in B_{1}^{N},

A~ω(x)2A~ω12x11,\|\widetilde{A}_{\omega}(x)\|_{2}\leq\|{\widetilde{A}}_{\omega}\|_{1\rightarrow 2}\|x\|_{1}\leq 1,

that is, A~ω(B1N)B2M{\widetilde{A}}_{\omega}(B_{1}^{N})\subset B_{2}^{M}, and so d1(A~ω(B1N),B2M)d_{1}(\widetilde{A}_{\omega}(B_{1}^{N}),B^{M}_{2}) is well-defined. Next, by Proposition 24, we know that there exists Ωμ\Omega_{\mu} with P(Ωμ)1ecMP(\Omega_{\mu})\geq 1-e^{-cM} such that for all ωΩμ\omega\in\Omega_{\mu},

A~ω(B1N)μlogN/MMB2M{\widetilde{A}}_{\omega}(B_{1}^{N})\supset\mu\sqrt{\frac{\log{N/M}}{M}}B_{2}^{M} (33)

From this point on, let ωΩμ\omega\in\Omega_{\mu}. Then

B2MA~ω(B1N)μlogN/MMB2M,B_{2}^{M}\supset{\widetilde{A}}_{\omega}(B_{1}^{N})\supset\mu\sqrt{\frac{\log{N/M}}{M}}B_{2}^{M},

and consequently

d1(A~ω(B1N),B2M)(μlogN/MM)1.d_{1}({\widetilde{A}}_{\omega}(B_{1}^{N}),B_{2}^{M})\leq\left(\mu\sqrt{\frac{\log{N/M}}{M}}\right)^{-1}. (34)

The next step is to note that conv(BpN)=B1Nconv(B_{p}^{N})=B_{1}^{N} and consequently

conv(A~ω(BpN))=A~ω(conv(BpN))=A~ω(B1N).conv\left({\widetilde{A}}_{\omega}(B_{p}^{N})\right)={\widetilde{A}}_{\omega}\left(conv(B_{p}^{N})\right)={\widetilde{A}}_{\omega}(B_{1}^{N}).

We can now invoke Lemma 25 to conclude that

d_{1}(\widetilde{A}_{\omega}(B_{p}^{N}),B_{2}^{M})\leq C(p)\,d_{1}(conv(\widetilde{A}_{\omega}(B_{p}^{N})),B_{2}^{M})^{\frac{2-p}{p}}=C(p)\,d_{1}(\widetilde{A}_{\omega}(B_{1}^{N}),B_{2}^{M})^{\frac{2-p}{p}}. (35)

Finally, by using (34), we find that

d_{1}(\widetilde{A}_{\omega}(B_{p}^{N}),B_{2}^{M})\leq C(p)\left(\mu^{2}\frac{\log(N/M)}{M}\right)^{1/2-1/p}, (36)

and consequently

\widetilde{A}_{\omega}(B_{p}^{N})\supset\frac{1}{C(p)}\left(\mu^{2}\frac{\log(N/M)}{M}\right)^{1/p-1/2}B_{2}^{M}. (37)

In other words, the matrix $\widetilde{A}_{\omega}$ has the $\text{LQ}_{p}(\alpha)$ property with the desired value of $\alpha$ for every $\omega\in\Omega_{\mu}$, where $P(\Omega_{\mu})\geq 1-e^{-cM}$ and $c$ is as specified in Proposition 24.

To see that the same is true for $A_{\omega}$, note that there exists a set $\Omega_{0}$ with $P(\Omega_{0})>1-e^{-cM}$ such that for all $\omega\in\Omega_{0}$, $\|A_{j}(\omega)\|_{2}<2$ for every column $A_{j}$ of $A_{\omega}$; this follows from the RIP, since $\text{RIP}(1,\delta)$ with $\delta<1$ gives $\|A_{j}\|_{2}^{2}\leq 1+\delta<2$. Using this observation, one can repeat the above proof with minor modifications. ∎

4.4 Proof of Theorem 18.

We start with the following lemma; its proof for $p<1$ follows with very little modification from the proof of the analogous Lemma 3.1 in [30] and is omitted.

Lemma 26

Let $0<p<1$ and suppose that $A$ satisfies $\text{RIP}(S,\delta)$ and $\text{LQ}_{p}\left(\gamma_{p}/S^{1/p-1/2}\right)$ with $\gamma_{p}:=\mu^{2/p-1}/C(p)$. Then for every $x\in\mathbb{R}^{N}$, there exists $\widetilde{x}\in\mathbb{R}^{N}$ such that

Ax=A\widetilde{x},\quad\|\widetilde{x}\|_{p}\leq\frac{S^{1/p-1/2}}{\gamma_{p}}\|Ax\|_{2},\quad\text{and}\quad\|\widetilde{x}\|_{2}\leq C_{3}\|Ax\|_{2}.

Here, $C_{3}=\frac{1}{\gamma_{p}}+\frac{\gamma_{p}(1-\delta)+1}{(1-\delta^{2})\gamma_{p}}$; note that $C_{3}$ depends only on $\mu$, $\delta$, and $p$.

We now proceed to prove Theorem 18. Our proof follows the steps of [30], differing in the handling of the non-convexity of the $\ell^{p}$ quasinorms for $0<p<1$.

First, recall that $A$ satisfies $\text{RIP}(S,\delta)$ and $\text{LQ}_{p}(\gamma_{p}/S^{1/p-1/2})$, so by Lemma 26 there exists $z\in\mathbb{R}^{N}$ such that $Az=e$, $\|z\|_{p}\leq\frac{S^{1/p-1/2}}{\gamma_{p}}\|e\|_{2}$, and $\|z\|_{2}\leq C_{3}\|e\|_{2}$. Now, $A(x+z)=Ax+e$, and $\Delta$ is $(2,p)$ instance optimal with constant $C_{2,p}$. Thus,

\|\Delta(A(x)+e)-(x+z)\|_{2}\leq C_{2,p}\frac{\sigma_{S}(x+z)_{\ell^{p}}}{S^{1/p-1/2}},

and consequently

\|\Delta(A(x)+e)-x\|_{2} \leq \|z\|_{2}+C_{2,p}\frac{\sigma_{S}(x+z)_{\ell^{p}}}{S^{1/p-1/2}}
\leq C_{3}\|e\|_{2}+C_{2,p}\frac{\sigma_{S}(x+z)_{\ell^{p}}}{S^{1/p-1/2}}
\leq C_{3}\|e\|_{2}+2^{1/p-1}C_{2,p}\frac{\sigma_{S}(x)_{\ell^{p}}+\|z\|_{p}}{S^{1/p-1/2}}
\leq C_{3}\|e\|_{2}+2^{1/p-1}C_{2,p}\frac{\sigma_{S}(x)_{\ell^{p}}}{S^{1/p-1/2}}+2^{1/p-1}C_{2,p}\frac{\|e\|_{2}}{\gamma_{p}},

where in the third inequality we used the fact that the $\ell^{p}$ quasinorm satisfies $\|a+b\|_{p}\leq 2^{\frac{1}{p}-1}(\|a\|_{p}+\|b\|_{p})$ for all $a,b\in\mathbb{R}^{N}$. So, we conclude

\|\Delta(A(x)+e)-x\|_{2}\leq\left(C_{3}+2^{1/p-1}C_{2,p}/\gamma_{p}\right)\|e\|_{2}+2^{1/p-1}C_{2,p}\frac{\sigma_{S}(x)_{\ell^{p}}}{S^{1/p-1/2}}. (38)

That is, (i) holds with $C=C_{3}+2^{1/p-1}C_{2,p}(1/\gamma_{p}+1)$.
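For completeness, we record why the quasi-triangle inequality used in the third step holds: combining the coordinatewise bound $|s+t|^{p}\leq|s|^{p}+|t|^{p}$ with the concavity of $t\mapsto t^{p}$ on $[0,\infty)$ (which gives $u^{p}+v^{p}\leq 2^{1-p}(u+v)^{p}$ for $u,v\geq 0$), we get

\|a+b\|_{p}\leq\left(\|a\|_{p}^{p}+\|b\|_{p}^{p}\right)^{1/p}\leq 2^{1/p-1}\left(\|a\|_{p}+\|b\|_{p}\right).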

Next, we prove parts (ii) and (iii) of Theorem 18. As in the analogous proof in [30], Theorem 18 (ii) can be seen as a special case of Theorem 18 (iii) with $e=0$. We therefore turn to proving (iii). Once again, by Lemma 26, there exist $v$ and $z$ in $\mathbb{R}^{N}$ such that the following hold.

\begin{array}{lll}Av=e;&\|v\|_{p}\leq\frac{S^{1/p-1/2}}{\gamma_{p}}\|e\|_{2},&\|v\|_{2}\leq C_{3}\|e\|_{2},\ \ \text{and}\\ Az=Ax_{T_{0}^{c}};&\|z\|_{p}\leq\frac{S^{1/p-1/2}}{\gamma_{p}}\|Ax_{T_{0}^{c}}\|_{2},&\|z\|_{2}\leq C_{3}\|Ax_{T_{0}^{c}}\|_{2}.\end{array}

Here $T_{0}$ is the set of indices of the $S$ largest (in magnitude) coefficients of $x$, and $T_{0}^{c}$ and $x_{T_{0}^{c}}$ are as in the proof of Theorem 6.

As in the previous part, we see that $A(x_{T_{0}}+z+v)=Ax+e$, and by the hypothesis of $(2,p)$ instance optimality of $\Delta$ we have

\|\Delta(Ax+e)-(x_{T_{0}}+z+v)\|_{2}\leq C_{2,p}\frac{\sigma_{S}(x_{T_{0}}+z+v)_{\ell^{p}}}{S^{1/p-1/2}}.

Consequently, observing that $x_{T_{0}}=x-x_{T_{0}^{c}}$ and using the triangle inequality,

\|\Delta(A(x)+e)-x\|_{2} \leq \|x_{T_{0}^{c}}-z-v\|_{2}+C_{2,p}\frac{\sigma_{S}(x_{T_{0}}+z+v)_{\ell^{p}}}{S^{1/p-1/2}}
\leq \|x_{T_{0}^{c}}-z-v\|_{2}+2^{1/p-1}C_{2,p}\left(\frac{\|z\|_{p}+\|v\|_{p}}{S^{1/p-1/2}}\right)
\leq \sigma_{S}(x)_{\ell^{2}}+\|z\|_{2}+\|v\|_{2}+2^{1/p-1}C_{2,p}\left(\frac{\|Ax_{T_{0}^{c}}\|_{2}}{\gamma_{p}}+\frac{\|e\|_{2}}{\gamma_{p}}\right)
\leq \sigma_{S}(x)_{\ell^{2}}+\left(C_{3}+2^{1/p-1}\frac{C_{2,p}}{\gamma_{p}}\right)(\|e\|_{2}+\|Ax_{T_{0}^{c}}\|_{2}). (39)

That is, (iii) holds with $C=1+C_{3}+2^{1/p-1}\frac{C_{2,p}}{\gamma_{p}}$. Setting $e=0$ shows that this is also the constant associated with (ii). This concludes the proof of the theorem. ∎

4.5 Proof of Theorem 19.

First, we show that $(A_{\omega},\Delta_{p})$ is $(2,p)$ instance optimal of order $S$, for an appropriate range of $S$, with high probability. One of the fundamental results in compressed sensing theory states that for any $\delta\in(0,1)$, there exist $\widetilde{c}_{1},\widetilde{c}_{2}>0$ and $\Omega_{\text{RIP}}$ with $P(\Omega_{\text{RIP}})\geq 1-2e^{-\widetilde{c}_{2}M}$, all depending only on $\delta$, such that $A_{\omega}$, $\omega\in\Omega_{\text{RIP}}$, satisfies $\text{RIP}(\ell,\delta)$ for any $\ell\leq\widetilde{c}_{1}\frac{M}{\log(N/M)}$. See, e.g., [6], [1] for the proof of this statement as well as for the explicit values of the constants. Now, choose $\delta\in(0,1)$ such that $\delta<\frac{2^{2/p-1}-1}{2^{2/p-1}+1}$. Then, with $\widetilde{c}_{1},\widetilde{c}_{2}$, and $\Omega_{\text{RIP}}$ as above, for every $\omega\in\Omega_{\text{RIP}}$ and every $S<\frac{\widetilde{c}_{1}}{3}\frac{M}{\log(N/M)}$, the RIP constants of $A_{\omega}$ satisfy (18) (and hence (12)) with $k=2$. Thus, by Corollary 9, $(A_{\omega},\Delta_{p})$ is $(2,p)$ instance optimal of order $S$ with constant $C_{2}^{1/p}$ as in (15).

Now, set $S_{1}=c_{1}\frac{M}{\log(N/M)}$ with $c_{1}\leq\widetilde{c}_{1}/3$ such that $S_{1}\in\mathbb{N}$ (such a $c_{1}$ exists if $M$ and $N$ are sufficiently large). By the hypothesis of the theorem, $M$ and $N$ satisfy the hypothesis of Lemma 17 with $\xi=2$, $K_{1}=1$, some $0<\mu<1/2$, and an appropriate $K_{2}$ (determined by $\widetilde{c}_{1}$ above). Because

\left(\mu^{2}\frac{\log(N/M)}{M}\right)^{1/p-1/2}=\left(\mu^{2}\frac{c_{1}}{S_{1}}\right)^{1/p-1/2},

by Lemma 17 there exists $\Omega_{\mu}$ with $P(\Omega_{\mu})\geq 1-e^{-cM}$ such that for every $\omega\in\Omega_{\mu}$, $A_{\omega}$ satisfies $\text{LQ}_{p}\left(\frac{\gamma_{p}(\mu)}{S_{1}^{1/p-1/2}}\right)$, where $\gamma_{p}(\mu):=\frac{c_{1}^{1/p-1/2}\mu^{2/p-1}}{C(p)}$. Now set $\Omega_{1}:=\Omega_{\text{RIP}}\cap\Omega_{\mu}$. Then $P(\Omega_{1})\geq 1-2e^{-\widetilde{c}_{2}M}-e^{-cM}\geq 1-3e^{-c_{2}M}$ for $c_{2}=\min\{\widetilde{c}_{2},c\}$. Note that $c_{2}$ depends on $c$, which is now a universal constant, and on $\widetilde{c}_{2}$, which depends only on the distribution of $A_{\omega}$ (in particular, on its concentration of measure properties; see [1]). Now, if $\omega\in\Omega_{1}$, then $A_{\omega}$ satisfies $\text{RIP}(3S_{1},\delta)$, hence $\text{RIP}(S_{1},\delta)$, as well as $\text{LQ}_{p}\left(\frac{\gamma_{p}}{S_{1}^{1/p-1/2}}\right)$. Therefore we can apply part (i) of Theorem 18 to get the first part of this theorem, i.e.,

\|\Delta(A_{\omega}(x)+e)-x\|_{2}\leq C\left(\|e\|_{2}+\frac{\sigma_{S_{1}}(x)_{\ell^{p}}}{S_{1}^{1/p-1/2}}\right). (40)

Here $C$ is as in (38) with $C_{2,p}=C_{2}^{1/p}$. To finish the proof of part (i), note that for $S\leq S_{1}$, $\sigma_{S_{1}}(x)_{\ell^{p}}\leq\sigma_{S}(x)_{\ell^{p}}$ and $S^{1/p-1/2}\leq S_{1}^{1/p-1/2}$.

To prove part (ii), first define $T_{0}$ as the support of the $S_{1}$ largest (in magnitude) coefficients of $x$ and $T_{0}^{c}=\{1,\dots,N\}\setminus T_{0}$. Now, note that for any $x$ there exists a set $\widetilde{\Omega}_{x}$ with $P(\widetilde{\Omega}_{x})\geq 1-e^{-\widetilde{c}M}$ for some universal constant $\widetilde{c}>0$, such that for all $\omega\in\widetilde{\Omega}_{x}$, $\|A_{\omega}x_{T_{0}^{c}}\|_{2}\leq 2\|x_{T_{0}^{c}}\|_{2}=2\sigma_{S_{1}}(x)_{\ell^{2}}$ (this follows from the concentration of measure property of Gaussian matrices; see, e.g., [1]). Define $\Omega_{x}:=\widetilde{\Omega}_{x}\cap\Omega_{1}$. Thus, $P(\Omega_{x})\geq 1-3e^{-c_{2}M}-e^{-\widetilde{c}M}\geq 1-4e^{-c_{3}M}$, where $c_{3}=\min\{c_{2},\widetilde{c}\}$; the dependencies of $c_{3}$ are identical to those of $c_{2}$ discussed above. Recall that for $\omega\in\Omega_{1}$, $A_{\omega}$ satisfies both $\text{RIP}(S_{1},\delta)$ and $\text{LQ}_{p}\left(\frac{\gamma_{p}}{S_{1}^{1/p-1/2}}\right)$. We can now apply part (iii) of Theorem 18 to obtain, for $\omega\in\Omega_{x}$,

\|\Delta(A_{\omega}(x)+e)-x\|_{2}\leq C\left(3\sigma_{S_{1}}(x)_{\ell^{2}}+\|e\|_{2}\right). (41)

Above, the constant $C$ is as in (39). Once again, noting that $\sigma_{S_{1}}(x)_{\ell^{2}}\leq\sigma_{S}(x)_{\ell^{2}}$ for $S\leq S_{1}$ finishes the proof for any $S\leq S_{1}$. ∎

5 Appendix: Proof of Lemma 25

In this section we provide the proof of Lemma 25 for the sake of completeness and also because we explicitly calculate the optimal constants involved. Let us first introduce some notation used in [18] and [23].

For a body $K\subset\mathbb{R}^{n}$, define its gauge functional by $\|x\|_{K}:=\inf\{t>0:\ x\in tK\}$, and let $T_{q}(K)$, $q\in(1,2]$, be the smallest constant $C$ such that

\forall m\in\mathbb{N},\ x_{1},\dots,x_{m}\in K:\quad\inf_{\epsilon_{i}=\pm 1}\left\{\left\|\sum_{i=1}^{m}\epsilon_{i}x_{i}\right\|_{K}\right\}\leq Cm^{1/q}.

Given a $p$-convex body $K$ and a positive integer $r$, define

\alpha_{r}=\alpha_{r}(K):=\sup\left\{\frac{\left\|\sum_{i=1}^{r}x_{i}\right\|_{K}}{r}:\ x_{i}\in K,\ i\leq r\right\}.

Note that $\alpha_{r}\leq r^{-1+1/p}$.
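This follows by iterating the $p$-sub-additivity of the gauge (Proposition 27 below): for $x_{1},\dots,x_{r}\in K$,

\left\|\sum_{i=1}^{r}x_{i}\right\|_{K}^{p}\leq\sum_{i=1}^{r}\|x_{i}\|_{K}^{p}\leq r,\quad\text{so}\quad\frac{\left\|\sum_{i=1}^{r}x_{i}\right\|_{K}}{r}\leq\frac{r^{1/p}}{r}=r^{-1+1/p}.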

Finally, conforming with the notation used in [18] and [23], we define $\delta_{K}:=d_{1}(K,conv(K))$. This should not cause confusion, as we do not refer to the RIP constants in the rest of the paper. By a result of [24], $\delta_{K}=\sup_{r}\alpha_{r}(K)$; see [18, Lemma 1] for a proof.

We will need the following propositions.

Proposition 27 (sub-additivity of $\|\cdot\|_{K}^{p}$)

For the gauge functional $\|\cdot\|_{K}$ associated with a $p$-convex body $K\subset\mathbb{R}^{n}$, the following inequality holds for any $x,y\in\mathbb{R}^{n}$:

\|x+y\|_{K}^{p}\leq\|x\|_{K}^{p}+\|y\|_{K}^{p}. (42)
Proof. Let $r=\|x\|_{K}$ and $u=\|y\|_{K}$. If at least one of $r$ and $u$ is zero, then (42) holds trivially (note that, as $K$ is a body, $\|x\|_{K}=0$ if and only if $x=0$). So we may assume that both $r$ and $u$ are strictly positive. Since $K$ is compact, $x/r\in K$ and $y/u\in K$. Furthermore, $K$ is $p$-convex, i.e., $\alpha^{1/p}x/r+\beta^{1/p}y/u\in K$ for all $\alpha,\beta\in[0,1]$ with $\alpha+\beta=1$. In particular, choosing $\alpha=\frac{r^{p}}{r^{p}+u^{p}}$ and $\beta=\frac{u^{p}}{r^{p}+u^{p}}$ gives $\frac{x+y}{(r^{p}+u^{p})^{1/p}}\in K$. Consequently, by the definition of the gauge functional, $\left\|\frac{x+y}{(r^{p}+u^{p})^{1/p}}\right\|_{K}\leq 1$. Finally, $\left\|\frac{x+y}{(r^{p}+u^{p})^{1/p}}\right\|_{K}^{p}=\frac{\|x+y\|_{K}^{p}}{r^{p}+u^{p}}\leq 1$, so $\|x+y\|_{K}^{p}\leq r^{p}+u^{p}=\|x\|_{K}^{p}+\|y\|_{K}^{p}$. ∎

Proposition 28

$T_{2}(B_{2}^{n})=1$.

Proof. Note that $\|\cdot\|_{B_{2}^{n}}=\|\cdot\|_{2}$, and thus, by definition, $T_{2}(B_{2}^{n})$ is the smallest constant $C$ such that for every positive integer $m$ and for every choice of points $x_{1},\dots,x_{m}\in B_{2}^{n}$,

\inf_{\epsilon_{i}=\pm 1}\left\{\left\|\sum_{i=1}^{m}\epsilon_{i}x_{i}\right\|_{2}\right\}\leq C\sqrt{m}. (43)

For $m\leq n$, we can choose $\{x_{1},\dots,x_{m}\}$ to be orthonormal. Consequently,

\left\|\sum_{i=1}^{m}\epsilon_{i}x_{i}\right\|_{2}^{2}=\sum_{i=1}^{m}\epsilon_{i}^{2}=m,

and thus $T_{2}=T_{2}(B_{2}^{n})\geq 1$. On the other hand, let $m$ be an arbitrary positive integer, and suppose that $\{x_{1},\dots,x_{m}\}\subset B_{2}^{n}$. Then there exists a choice of signs $\epsilon_{i}$, $i=1,\dots,m$, such that

\inf_{\epsilon_{i}=\pm 1}\left\{\left\|\sum_{i=1}^{m}\epsilon_{i}x_{i}\right\|_{2}\right\}\leq\sqrt{m}.

Indeed, we show this by induction. First, note that $\|\epsilon_{1}x_{1}\|_{2}=\|x_{1}\|_{2}\leq\sqrt{1}$. Next, assume that there exist $\epsilon_{1},\dots,\epsilon_{k-1}$ such that

\left\|\sum_{i=1}^{k-1}\epsilon_{i}x_{i}\right\|_{2}\leq\sqrt{k-1}.

Then, since the parallelogram law gives $\|a+b\|_{2}^{2}+\|a-b\|_{2}^{2}=2\|a\|_{2}^{2}+2\|b\|_{2}^{2}$,

\min\left\{\left\|\sum_{i=1}^{k-1}\epsilon_{i}x_{i}+x_{k}\right\|_{2}^{2},\left\|\sum_{i=1}^{k-1}\epsilon_{i}x_{i}-x_{k}\right\|_{2}^{2}\right\}\leq\left\|\sum_{i=1}^{k-1}\epsilon_{i}x_{i}\right\|_{2}^{2}+\|x_{k}\|_{2}^{2}\leq k.

Choosing $\epsilon_{k}$ accordingly, we get

\left\|\sum_{i=1}^{k}\epsilon_{i}x_{i}\right\|_{2}^{2}\leq k,

which implies that $T_{2}\leq 1$. Combined with the bound $T_{2}\geq 1$ established above, this yields $T_{2}=1$. ∎
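The induction above is constructive: a single pass that picks each sign greedily realizes the bound. A minimal sketch (illustrative only; the data and sizes are arbitrary assumptions):

```python
# Greedy sign selection from the proof of Proposition 28: choose epsilon_k so
# that epsilon_k * <s, x_k> <= 0; then ||s + epsilon_k x_k||^2 <= ||s||^2 + ||x_k||^2,
# so the final signed sum has norm at most sqrt(m) when all ||x_i||_2 <= 1.
import numpy as np

def balance_signs(X):
    """X: (m, n) array of rows with ||x_i||_2 <= 1. Returns (signs, signed sum)."""
    s = np.zeros(X.shape[1])
    signs = []
    for x in X:
        e = 1.0 if np.dot(s, x) <= 0 else -1.0
        s = s + e * x
        signs.append(e)
    return np.array(signs), s

rng = np.random.default_rng(1)
m, n = 1000, 50
X = rng.normal(size=(m, n))
X /= np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1.0)  # push rows into B_2^n
_, s = balance_signs(X)
print(np.linalg.norm(s), np.sqrt(m))  # the first value never exceeds the second
```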

Proof of Lemma 25

We now present a proof of the more general form of Lemma 25 as stated in [18] and [23] (albeit for the Banach-Mazur distance in place of $d_{1}$). The proof is essentially that of [18], cf. [23], which in fact also works with the distance $d_{1}$ to establish an upper bound on the Banach-Mazur distance between a $p$-convex body and a symmetric body.

Lemma 29

Let $0<p<1$, $q\in(1,2]$, and let $K$ be a $p$-convex body. Suppose that $B$ is a body, symmetric with respect to the origin, such that $conv(K)\subset B$. Then

d_{1}(K,B)\leq C_{p,q}[T_{q}(B)]^{\phi-1}[d_{1}(conv(K),B)]^{\phi},

where $\phi=\frac{1/p-1/q}{1-1/q}$.

Proof. Note that $K\subset conv(K)\subset B$, and therefore $d_{1}(K,B)$ is well-defined. Let $d=d_{1}(K,B)$ and $T=T_{q}(B)$; thus $(1/d)B\subset K\subset B$. Let $m$ be a positive integer and let $x_{i}$, $i\in\{1,2,\dots,2^{m}\}$, be a collection of points in $K$. Then $x_{i}\in B$, and by the definition of $T$ there is a choice of signs $\epsilon_{i}$ so that $\|\sum_{i=1}^{2^{m}}\epsilon_{i}x_{i}\|_{B}\leq T2^{m/q}$. Since $B$ is symmetric, we may assume that $D=\{i:\ \epsilon_{i}=1\}$ satisfies $|D|\geq 2^{m-1}$. Now we can write

\left\|\sum_{i=1}^{2^{m}}x_{i}\right\|_{K}^{p}=\left\|\sum_{i=1}^{2^{m}}\epsilon_{i}x_{i}+2\sum_{i\notin D}x_{i}\right\|_{K}^{p}\leq d^{p}\left\|\sum_{i=1}^{2^{m}}\epsilon_{i}x_{i}\right\|_{B}^{p}+2^{p}\left\|\sum_{i\notin D}x_{i}\right\|_{K}^{p}\leq d^{p}T^{p}2^{mp/q}+2^{mp}\alpha_{2^{m-1}}^{p}, (44)

where the first inequality uses the $p$-sub-additivity of $\|\cdot\|_{K}$ (Proposition 27) together with the fact that $(1/d)B\subset K$, and the second uses $|D^{c}|\leq 2^{m-1}$ and the definition of $\alpha_{2^{m-1}}$. Taking the supremum in (44) over all possible $x_{i}$'s and dividing by $2^{mp}$, we obtain, for any $m$,

\alpha_{2^{m}}^{p}\leq d^{p}T^{p}2^{mp/q-mp}+\alpha_{2^{m-1}}^{p}.

Applying this inequality repeatedly with $m-1,m-2,\dots,k$ in place of $m$, we obtain, for any $k\leq m$,

\alpha_{2^{m}}^{p}\leq d^{p}T^{p}\sum_{i=k+1}^{\infty}2^{-ip(1-1/q)}+\alpha_{2^{k}}^{p}\leq d^{p}T^{p}\frac{2^{-kp(1-1/q)}}{p(1-1/q)\log 2}+2^{k(1-p)}. (45)
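Both estimates in (45) can be spelled out. Writing $a=p(1-1/q)$ and using $1-e^{-t}\geq te^{-t}$ with $t=a\log 2$,

\sum_{i=k+1}^{\infty}2^{-ia}=\frac{2^{-(k+1)a}}{1-2^{-a}}\leq\frac{2^{-(k+1)a}}{a\log 2\cdot 2^{-a}}=\frac{2^{-ka}}{a\log 2},

while $\alpha_{2^{k}}^{p}\leq\left(2^{k(1/p-1)}\right)^{p}=2^{k(1-p)}$ by the bound $\alpha_{r}\leq r^{-1+1/p}$ noted earlier.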

Since $\delta_{K}=\sup_{r}\alpha_{r}$, we now want to minimize the right-hand side of (45) by choosing $k$ appropriately. To that end, define

f(k):=2^{k(1-p)}+(dT)^{p}\frac{2^{-k(1-1/q)p}}{p(1-1/q)\log 2}

and

A:=\frac{(dT)^{p}}{p(1-1/q)\log 2}.

Since $\alpha_{2^{m}}^{p}\leq f(k)$ for any $k\in\{1,\dots,m-1\}$, the best bound on $\alpha_{2^{m}}^{p}$ is essentially given by $f(k^{*})$, where $f'(k^{*})=0$. However, since $k^{*}$ is not necessarily an integer (which we require), we instead use $f(k^{*}+1)\geq f(\lceil k^{*}\rceil)\geq f(k^{*})$ as a bound. Solving $f'(k^{*})=0$ yields $k^{*}=\frac{1}{1-p/q}\log_{2}\left(\frac{Ap(1-1/q)}{1-p}\right)$. Evaluating $f$ at $k^{*}+1$, we obtain $\alpha_{2^{m}}\leq\left(f(k^{*}+1)\right)^{1/p}$ for every $m\geq k^{*}+1$. In other words, for every $m\geq k^{*}+1$, we have

\alpha_{2^{m}}\leq(dT)^{\frac{1-p}{1-p/q}}\left(2^{1-p}+2^{-p(1-1/q)}\frac{1-p}{p-p/q}\right)^{1/p}\left(\frac{1}{(1-p)\log 2}\right)^{\frac{1/p-1}{p(1-p/q)}}. (46)
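For the record, the critical point computation: with $f(k)=2^{k(1-p)}+A\,2^{-kp(1-1/q)}$,

f'(k)=\log 2\left[(1-p)2^{k(1-p)}-Ap(1-1/q)2^{-kp(1-1/q)}\right]=0\iff 2^{k(1-p/q)}=\frac{Ap(1-1/q)}{1-p},

since $(1-p)+p(1-1/q)=1-p/q$; this is solved by the stated $k^{*}$.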

On the other hand, if $m\leq k^{*}$, then $\alpha_{2^{m}}^{p}\leq 2^{m(1-p)}\leq 2^{(k^{*}+1)(1-p)}$. This last bound is one of the summands on the right-hand side of (45) with $k=k^{*}+1$, which we bound in (46). Consequently, (46) holds for all $m$; in particular, it holds for the value of $m$ that achieves the supremum of $\alpha_{2^{m}}$. Since $\delta_{K}=\sup_{r}\alpha_{r}$, we obtain

\delta_{K}\leq(dT)^{\frac{1-p}{1-p/q}}\left(2^{1-p}+2^{-p(1-1/q)}\frac{1-p}{p(1-1/q)}\right)^{1/p}\left(\frac{1}{(1-p)\log 2}\right)^{\frac{1/p-1}{p(1-p/q)}}. (47)
Remark 30

In the previous step we use the fact that the derivations above remain valid when every $2^{m}$ and $2^{k}$ is replaced by $m$ and $k$, respectively (equivalently, every $m$ and $k$ by $\log_{2}m$ and $\log_{2}k$), without changing (46). This allows us to pass from the bound on $\alpha_{2^{m}}$ to $\delta_{K}=\sup_{r}\alpha_{r}$ without any problems.

Recalling the definitions of $d_{1}(conv(K),B)$ and $\delta_{K}$, note the following inclusions:

\frac{1}{\delta_{K}d_{1}(conv(K),B)}B\subset\frac{1}{\delta_{K}}conv(K)\subset K\subset conv(K)\subset B. (48)

Consequently, $\frac{1}{\delta_{K}d_{1}(conv(K),B)}B\subset K\subset B$, and the inequality

d_{1}(K,B)=d\leq\delta_{K}d_{1}(conv(K),B) (49)

follows from the definition of $d_{1}(K,B)$. Combining (49) and (47), we complete the proof with

C_{p,q}=\left(2^{1-p}+2^{-p(1-1/q)}\frac{1-p}{p(1-1/q)}\right)^{\frac{1-p/q}{p^{2}(1-1/q)}}\left(\frac{1}{(1-p)\log 2}\right)^{\frac{1/p-1}{p(1-1/q)}}.

∎ Finally, we choose $B=B_{2}^{n}$ and $q=2$ above, recall that $T=T_{2}(B_{2}^{n})=1$ (see Proposition 28), and obtain Lemma 25 as a corollary with

C(p)=\left(2^{1-p}+\frac{(1-p)2^{1-p/2}}{p}\right)^{\frac{2-p}{p^{2}}}\left(\frac{1}{(1-p)\log 2}\right)^{\frac{2-2p}{p^{2}}}. (50)
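Indeed, substituting $q=2$ (so that $1-1/q=1/2$ and $1-p/q=1-p/2$) into $C_{p,q}$ gives

\frac{1-p/q}{p^{2}(1-1/q)}=\frac{2-p}{p^{2}},\qquad\frac{1/p-1}{p(1-1/q)}=\frac{2-2p}{p^{2}},\qquad 2^{-p(1-1/q)}\frac{1-p}{p(1-1/q)}=\frac{(1-p)2^{1-p/2}}{p},

which recovers (50).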

Acknowledgment

The authors would like to thank Michael Friedlander, Gilles Hennenfent, Felix Herrmann, and Ewout Van Den Berg for many fruitful discussions. This work was finalized during an AIM workshop. We thank the American Institute of Mathematics for its hospitality. Moreover, R. Saab thanks Rabab Ward for her immense support. The authors also thank the anonymous reviewers for their constructive comments which improved the paper significantly.

References

  • [1] R. Baraniuk, M. Davenport, R. DeVore, M. Wakin, A simple proof of the restricted isometry property for random matrices, Constructive Approximation 28 (3) (2008) 253–263.
  • [2] E. J. Candès, The restricted isometry property and its implications for compressed sensing, Comptes Rendus Mathématique 346 (9-10) (2008) 589–592.
  • [3] E. J. Candès, J. Romberg, T. Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, IEEE Transactions on Information Theory 52 (2006) 489–509.
  • [4] E. J. Candès, J. Romberg, T. Tao, Stable signal recovery from incomplete and inaccurate measurements, Communications on Pure and Applied Mathematics 59 (2006) 1207–1223.
  • [5] E. J. Candès, T. Tao, Decoding by linear programming, IEEE Transactions on Information Theory 51 (12) (2005) 4203–4215.
  • [6] E. J. Candès, T. Tao, Near-optimal signal recovery from random projections: universal encoding strategies?, IEEE Transactions on Information Theory 52 (12) (2006) 5406–5425.
  • [7] R. Chartrand, Exact reconstructions of sparse signals via nonconvex minimization, IEEE Signal Processing Letters 14 (10) (2007) 707–710.
  • [8] R. Chartrand, V. Staneva, Restricted isometry properties and nonconvex compressive sensing, Inverse Problems 24 (2008) 035020.
  • [9] S. Chen, D. Donoho, M. Saunders, Atomic decomposition by basis pursuit, SIAM Journal on Scientific Computing 20 (1) (1999) 33–61.
    URL citeseer.ist.psu.edu/chen98atomic.html
  • [10] A. Cohen, W. Dahmen, R. DeVore, Compressed sensing and best k-term approximation, Journal of the American Mathematical Society 22 (1) (2009) 211–231.
  • [11] I. Daubechies, R. DeVore, M. Fornasier, S. Gunturk, Iteratively re-weighted least squares minimization for sparse recovery, Communications on Pure and Applied Mathematics (to appear).
  • [12] M. Davies, R. Gribonval, Restricted isometry constants where $\ell^p$ sparse recovery can fail for $0<p\leq 1$, IEEE Transactions on Information Theory 55 (5) (2009) 2203–2214.
  • [13] D. Donoho, Compressed sensing, IEEE Transactions on Information Theory 52 (4) (2006) 1289–1306.
  • [14] D. Donoho, M. Elad, Optimally sparse representation in general (nonorthogonal) dictionaries via $\ell^1$ minimization, Proceedings of the National Academy of Sciences of the United States of America 100 (5) (2003) 2197–2202.
  • [15] D. Donoho, X. Huo, Uncertainty principles and ideal atomic decomposition, IEEE Transactions on Information Theory 47 (2001) 2845–2862.
  • [16] M. Figueiredo, R. Nowak, S. Wright, Gradient Projection for Sparse Reconstruction: Application to Compressed Sensing and Other Inverse Problems, IEEE Journal of Selected Topics in Signal Processing 1 (4) (2007) 586–597.
  • [17] S. Foucart, M. Lai, Sparsest solutions of underdetermined linear systems via $\ell^q$-minimization for $0<q\leq 1$, Applied and Computational Harmonic Analysis 26 (3) (2009) 395–407.
  • [18] Y. Gordon, N. Kalton, Local structure theory for quasi-normed spaces, Bulletin des sciences mathématiques 118 (1994) 441–453.
  • [19] R. Gribonval, R. M. Figueras i Ventura, P. Vandergheynst, A simple test to check the optimality of sparse signal approximations, in: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’05), vol. 5, 2005.
  • [20] R. Gribonval, R. M. Figueras i Ventura, P. Vandergheynst, A simple test to check the optimality of sparse signal approximations, EURASIP Signal Processing, special issue on Sparse Approximations in Signal and Image Processing 86 (3) (2006) 496–510.
  • [21] R. Gribonval, M. Nielsen, On the strong uniqueness of highly sparse expansions from redundant dictionaries, in: International Conference on Independent Component Analysis (ICA’04), LNCS, Springer-Verlag, Granada, Spain, 2004.
  • [22] R. Gribonval, M. Nielsen, Highly sparse representations from dictionaries are unique and independent of the sparseness measure, Applied and Computational Harmonic Analysis 22 (3) (2007) 335–355.
  • [23] O. Guédon, A. Litvak, Euclidean projections of a p-convex body, GAFA, Lecture Notes in Math 1745 (2000) 95–108.
  • [24] N. Peck, Banach-Mazur distances and projections on p-convex spaces, Mathematische Zeitschrift 177 (1) (1981) 131–142.
  • [25] R. Saab, R. Chartrand, O. Yilmaz, Stable sparse approximations via nonconvex optimization, in: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2008.
  • [26] J. Tropp, Recovery of short, complex linear combinations via $l^1$ minimization, IEEE Transactions on Information Theory 51 (4) (2005) 1568–1570.
  • [27] E. van den Berg, M. Friedlander, In pursuit of a root, UBC Computer Science Technical Report TR-2007-16.
    URL http://www.optimization-online.org/DB_FILE/2007/06/1708.pdf
  • [28] P. Vandergheynst, P. Frossard, Efficient image representation by anisotropic refinement in matching pursuit, in: IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 3, 2001.
  • [29] R. M. Figueras i Ventura, P. Vandergheynst, P. Frossard, Low rate and scalable image coding with redundant representations, IEEE Transactions on Image Processing 15 (3) (2006) 726–739.
  • [30] P. Wojtaszczyk, Stability and instance optimality for gaussian measurements in compressed sensing, Foundations of Computational Mathematics (to appear).
    URL http://dx.doi.org/10.1007/s10208-009-9046-4