
An elementary method for the problem of column subset selection in a rectangular matrix

Stéphane Chrétien
National Physical Laboratory, Hampton Road, Teddington TW11 0LW, UK
stephane.chretien@npl.co.uk

Sébastien Darses
LATP, UMR 6632, Université Aix-Marseille, Technopôle Château-Gombert, 39 rue Joliot Curie, 13453 Marseille Cedex 13, France
sebastien.darses@univ-amu.fr
Abstract.

The problem of extracting a well-conditioned submatrix from any rectangular matrix (with, e.g., normalized columns) has been a subject of extensive research, with applications to rank-revealing factorizations, low-stretch spanning trees, and sparse solutions to least squares regression problems; it is also connected with problems in functional and harmonic analysis. Here, we provide a deterministic algorithm which extracts a submatrix $X_S$ from any matrix $X$ with guaranteed individual lower and upper bounds on each singular value of $X_S$. The proof of our main result is short and elementary.

Keywords: column subset selection, restricted invertibility.

1. Introduction

Let $X\in\mathbb{R}^{n\times p}$ be a matrix such that all columns of $X$ have unit Euclidean $\ell_2$-norm. We denote by $\|x\|_2$ the $\ell_2$-norm of a vector $x$ and by $\|X\|$ (resp. $\|X\|_{HS}$) the associated operator norm (resp. the Hilbert-Schmidt norm). Let $X_T$ denote the submatrix of $X$ obtained by extracting the columns of $X$ indexed by $T\subset\{1,\ldots,p\}$. For any real symmetric matrix $A$, let $\lambda_k(A)$ denote the $k$-th eigenvalue of $A$, with the eigenvalues ordered as $\lambda_1(A)\geq\lambda_2(A)\geq\cdots$. We also write $\lambda_{\min}(A)$ (resp. $\lambda_{\max}(A)$) for the smallest (resp. largest) eigenvalue of $A$. Finally, we write $|S|$ for the size of a set $S$.

The problem of well-conditioned column selection that we consider here consists in finding the largest subset of columns of $X$ such that the corresponding submatrix has all its singular values in a prescribed interval $[1-\varepsilon,1+\varepsilon]$. The one-sided problem of finding the largest possible $T$ such that $\lambda_{\min}(X_T^t X_T)\geq 1-\varepsilon$ is called the Restricted Invertibility Problem and has a long history, starting with the seminal work of Bourgain and Tzafriri [1]. Applications of such results are well known in harmonic analysis [1]. The condition number of submatrices is also a subject of extensive study in statistics and signal processing [5].

Here, we propose an elementary approach to this problem based on two simple ingredients:

  (1) choosing recursively $y\in\mathcal{V}$, the set of remaining columns of $X$, verifying

$$Q(y)\leq\frac{1}{|\mathcal{V}|}\sum_{x\in\mathcal{V}}Q(x),$$

      where $Q$ is a relevant quantity depending on the previously chosen vectors (such a $y$ always exists, since the minimum is at most the average; see the sketch after this list);

  (2) a well-known equation (sometimes called the secular equation) whose roots are the eigenvalues of a square matrix after appending a row and a column.
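
Ingredient (1) is a plain greedy sweep. The following minimal Python sketch (our addition) illustrates it; the callable `potential` is a placeholder for the specific quantity $Q_r$ defined in Section 1.2, and the name `greedy_select` is hypothetical, not from the paper.

    import numpy as np

    def greedy_select(X, R, potential):
        """Greedily pick R columns of X. At each step, take a remaining
        column whose potential does not exceed the average potential of
        the remaining columns; the argmin always satisfies this."""
        remaining = list(range(X.shape[1]))
        selected = []
        for _ in range(R):
            scores = [potential(X, selected, j) for j in remaining]
            selected.append(remaining.pop(int(np.argmin(scores))))
        return selected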

1.1. Historical background

Concerning the Restricted Invertibility problem, Bourgain and Tzafriri [1] obtained the following result for square matrices:

Theorem 1.1 ([1]).

Given a $p\times p$ matrix $X$ whose columns have unit $\ell_2$-norm, there exists $T\subset\{1,\ldots,p\}$ with $|T|\geq d\,\frac{p}{\|X\|^2}$ such that $C\leq\lambda_{\min}(X_T^t X_T)$, where $d$ and $C$ are absolute constants.

See also [4] for a simpler proof. Vershynin [6] generalized Bourgain and Tzafriri's result to rectangular matrices, and the estimate of $|T|$ was improved as follows.

Theorem 1.2 ([6]).

Let $X$ be an $n\times p$ matrix and let $\widetilde{X}$ be the matrix obtained from $X$ by $\ell_2$-normalizing its columns. Then, for any $\varepsilon\in(0,1)$, there exists $T\subset\{1,\ldots,p\}$ with

$$|T|\geq(1-\varepsilon)\frac{\|X\|_{HS}^2}{\|X\|^2}$$

such that $C_1(\varepsilon)\leq\lambda_{\min}(\widetilde{X}_T^t\widetilde{X}_T)\leq\lambda_{\max}(\widetilde{X}_T^t\widetilde{X}_T)\leq C_2(\varepsilon)$.

Recently, Spielman and Srivastava proposed in [3] a deterministic construction of $T$ which allows them to obtain the following result.

Theorem 1.3 ([3]).

Let $X$ be a $p\times p$ matrix and $\varepsilon\in(0,1)$. Then there exists $T\subset\{1,\ldots,p\}$ with $|T|\geq(1-\varepsilon)^2\frac{\|X\|_{HS}^2}{\|X\|^2}$ such that $\varepsilon^2\frac{\|X\|^2}{p}\leq\lambda_{\min}(X_T^t X_T)$.

The technique of proof relies on new constructions and inequalities which are thoroughly explained in Naor's Bourbaki seminar [2]. Using these techniques, Youssef [7] improved Vershynin's result as follows.

Theorem 1.4 ([7]).

Let $X$ be an $n\times p$ matrix and let $\widetilde{X}$ be the matrix obtained from $X$ by $\ell_2$-normalizing its columns. Then, for any $\varepsilon\in(0,1)$, there exists $T\subset\{1,\ldots,p\}$ with $|T|\geq\frac{\varepsilon^2}{9}\frac{\|X\|_{HS}^2}{\|X\|^2}$ such that $1-\varepsilon\leq\lambda_{\min}(\widetilde{X}_T^t\widetilde{X}_T)\leq\lambda_{\max}(\widetilde{X}_T^t\widetilde{X}_T)\leq 1+\varepsilon$.

1.2. Our contribution

We provide a deterministic algorithm that extracts a submatrix $Y_r$ from the matrix $X$ with guaranteed individual lower and upper bounds on each singular value of $Y_r$.

Consider the set of vectors $\mathcal{V}_0=\{x_1,\ldots,x_p\}$, where the $x_i$ are the columns of $X$. At step $r=1$, choose $y_1\in\mathcal{V}_0$. By induction, suppose that $y_1,\ldots,y_r$ have been chosen at step $r$. Let $Y_r$ denote the matrix whose columns are $y_1,\ldots,y_r$ and let $v_k$ be a unit eigenvector of $Y_r^t Y_r$ associated to $\lambda_{k,r}:=\lambda_k(Y_r^t Y_r)$.

We say that $u(\cdot,\cdot)$ satisfies Hypothesis (H) if $u$ verifies, for $r\geq 1$:

(1.1)  $$0\leq u(k,r)\leq u(k+1,r+1),\qquad k\in\{0,\ldots,r\};$$
(1.2)  $$0\leq u(k+1,r)\leq u(1,r)<u(0,r),\qquad k\in\{1,\ldots,r-1\}.$$

We now introduce the "potential" associated to $u(\cdot,\cdot)$ satisfying (H):

$$Q_r(x)=\sum_{k=1}^r\frac{(v_k^t Y_r^t x)^2}{u(0,r)-u(k,r)},\qquad x\in\mathcal{V}_0.$$

We then choose $y_{r+1}\in\mathcal{V}_r:=\{x_1,\ldots,x_p\}\setminus\{y_1,\ldots,y_r\}$ so that

(1.3)  $$Q_r(y_{r+1})\leq\frac{1}{p-r}\sum_{x\in\mathcal{V}_r}Q_r(x)=\frac{1}{p-r}\sum_{k=1}^r\frac{\sum_{x\in\mathcal{V}_r}(v_k^t Y_r^t x)^2}{u(0,r)-u(k,r)}.$$
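
For concreteness, here is a minimal Python sketch (our addition) of this selection rule; `select_columns` is a hypothetical name, the choice $u(k,r)=(2r-k)/\sqrt{r}$ is borrowed from Section 3, and taking the argmin of $Q_r$ trivially satisfies (1.3).

    import numpy as np

    def u(k, r):
        # One admissible choice of u, taken from Section 3.
        return (2 * r - k) / np.sqrt(r)

    def select_columns(X, R):
        """Greedy extraction: at step r, pick a remaining column whose
        potential Q_r is at most the average over V_r, as in (1.3)."""
        remaining = list(range(X.shape[1]))
        selected = [remaining.pop(0)]            # arbitrary first column y_1
        for r in range(1, R):
            Y = X[:, selected]                   # Y_r, with r columns
            lam, V = np.linalg.eigh(Y.T @ Y)
            lam, V = lam[::-1], V[:, ::-1]       # lambda_1 >= ... >= lambda_r
            denom = u(0, r) - u(np.arange(1, r + 1), r)  # > 0 under (H)
            proj = V.T @ (Y.T @ X[:, remaining])         # row k: v_k^t Y_r^t x
            Q = (proj ** 2 / denom[:, None]).sum(axis=0)
            selected.append(remaining.pop(int(np.argmin(Q))))
        return selected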

The following result, for which we propose a short and elementary proof, gives control of all the singular values in the column selection problem.

Theorem 1.5.

Let $u$ satisfy Hypothesis (H) and let $R\leq p/2$. Then we can extract from $X$ submatrices $Y_r$ such that for all $r$ and $k$ with $1\leq k\leq r\leq R$, we have

(1.4)  $$1-\delta_R\,u(r-k+1,r)\sqrt{\lambda_{1,r}}\ \leq\ \lambda_{k,r}\ \leq\ 1+\delta_R\,u(k,r)\sqrt{\lambda_{1,r}},$$

where

(1.5)  $$\delta_R=\sqrt{\frac{2\|X\|^2}{p}\sup_{1\leq r\leq R}\sum_{k=1}^r\frac{u(0,r)^{-1}}{u(0,r)-u(k,r)}}.$$

In particular,

$$\lambda_{1,r}\leq 1+2\delta_R\,u(1,r).$$

2. Proof of Theorem 1.5

2.1. Suitable choice of the extracted vectors

Consider the set of vectors $\mathcal{V}_0=\{x_1,\ldots,x_p\}$. At step $1$, choose $y_1\in\mathcal{V}_0$. By induction, suppose that $y_1,\ldots,y_r$ have been chosen at step $r$. Let $Y_r$ denote the matrix whose columns are $y_1,\ldots,y_r$ and let $v_k$ be a unit eigenvector of $Y_r^t Y_r$ associated to $\lambda_{k,r}:=\lambda_k(Y_r^t Y_r)$. Let us choose $y_{r+1}\in\mathcal{V}_r:=\{x_1,\ldots,x_p\}\setminus\{y_1,\ldots,y_r\}$ so that

(2.6)  $$\sum_{k=1}^r\frac{(v_k^t Y_r^t y_{r+1})^2}{u(0,r)-u(k,r)}\leq\frac{1}{p-r}\sum_{x\in\mathcal{V}_r}\sum_{k=1}^r\frac{(v_k^t Y_r^t x)^2}{u(0,r)-u(k,r)}=\frac{1}{p-r}\sum_{k=1}^r\frac{\sum_{x\in\mathcal{V}_r}(v_k^t Y_r^t x)^2}{u(0,r)-u(k,r)}.$$
Lemma 2.1.

For all $r\geq 1$, $y_{r+1}$ verifies

$$\sum_{k=1}^r\frac{(v_k^t Y_r^t y_{r+1})^2}{u(0,r)-u(k,r)}\leq\frac{\lambda_{1,r}\|X\|^2}{p-r}\sup_{1\leq j\leq r}\sum_{k=1}^j\frac{1}{u(0,j)-u(k,j)}.$$
Proof.

Let $X_r$ be the matrix whose columns are the $x\in\mathcal{V}_r$, i.e. $X_r X_r^t=\sum_{x\in\mathcal{V}_r}xx^t$. Then

$$\sum_{x\in\mathcal{V}_r}(v_k^t Y_r^t x)^2={\rm Tr}\left(Y_r v_k v_k^t Y_r^t X_r X_r^t\right)\leq{\rm Tr}(Y_r v_k v_k^t Y_r^t)\,\|X_r X_r^t\|\leq\lambda_{k,r}\|X\|^2,$$

which yields the conclusion by plugging into (2.6), since $\lambda_{k,r}\leq\lambda_{1,r}$. ∎
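
The chain of inequalities in the proof is easy to check numerically. The following small sanity test (our addition, with random data and hypothetical variable names) verifies both the trace identity and the bound $\lambda_{k,r}\|X\|^2$:

    import numpy as np

    rng = np.random.default_rng(0)
    n, p, r = 8, 20, 3
    X = rng.standard_normal((n, p))
    X /= np.linalg.norm(X, axis=0)          # unit-norm columns
    Y, Xr = X[:, :r], X[:, r:]              # play the roles of Y_r and X_r

    lam, V = np.linalg.eigh(Y.T @ Y)
    lam, V = lam[::-1], V[:, ::-1]          # lambda_1 >= ... >= lambda_r
    for k in range(r):
        lhs = np.sum((V[:, k] @ (Y.T @ Xr)) ** 2)   # sum_x (v_k^t Y_r^t x)^2
        w = Y @ V[:, k]                             # Y_r v_k
        trace_form = np.trace(np.outer(w, w) @ (Xr @ Xr.T))
        bound = lam[k] * np.linalg.norm(X, 2) ** 2  # lambda_{k,r} ||X||^2
        assert np.isclose(lhs, trace_form) and lhs <= bound + 1e-12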

2.2. Controlling the individual eigenvalues

It is clear that (1.4) holds for $r=1$: in that case $1$ is the only singular value, because the columns are assumed to be normalized.

Assume the induction hypothesis $(H_r)$: (1.4) holds for all $k$ with $1\leq k\leq r<R$.

Let us then show that $(H_{r+1})$ holds. By the Cauchy interlacing theorem, we have

$$\lambda_{k+1,r+1}\leq\lambda_{k,r},\qquad 1\leq k\leq r,$$
$$\lambda_{k+1,r+1}\geq\lambda_{k+1,r},\qquad 0\leq k\leq r-1.$$

We then deduce, from the induction hypothesis $(H_r)$ and Hypothesis (H),

(2.7)  $$\lambda_{k+1,r+1}\leq 1+\delta_R u(k,r)\sqrt{\lambda_{1,r}}\leq 1+\delta_R u(k+1,r+1)\sqrt{\lambda_{1,r+1}},\qquad 1\leq k\leq r,$$
(2.8)  $$\lambda_{k+1,r+1}\geq 1-\delta_R u(r-k,r)\sqrt{\lambda_{1,r}}\geq 1-\delta_R u(r+1-(k+1)+1,r+1)\sqrt{\lambda_{1,r+1}},\qquad 0\leq k\leq r-1.$$

It remains to obtain the upper estimate for $\lambda_{1,r+1}$ and the lower one for $\lambda_{r+1,r+1}$. We write

(2.14)  $$Y_{r+1}^t Y_{r+1}=\begin{bmatrix}y_{r+1}^t\\ Y_r^t\end{bmatrix}\begin{bmatrix}y_{r+1}&Y_r\end{bmatrix}=\begin{bmatrix}1&y_{r+1}^t Y_r\\ Y_r^t y_{r+1}&Y_r^t Y_r\end{bmatrix},$$

and it is well known that the eigenvalues of $Y_{r+1}^t Y_{r+1}$ are the zeros of the secular equation

(2.15)  $$q(\lambda):=1-\lambda+\sum_{k=1}^r\frac{(v_k^t Y_r^t y_{r+1})^2}{\lambda-\lambda_{k,r}}=0.$$
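
Equation (2.15) follows from a Schur complement computation on the bordered matrix (2.14), and it can be checked numerically. A quick illustration (our addition, with random unit-norm columns so that the eigenvalues generically avoid the poles $\lambda_{k,r}$):

    import numpy as np

    rng = np.random.default_rng(1)
    n, r = 6, 3
    Y = rng.standard_normal((n, r))
    Y /= np.linalg.norm(Y, axis=0)          # unit-norm columns of Y_r
    y = rng.standard_normal(n)
    y /= np.linalg.norm(y)                  # the appended column y_{r+1}

    lam, V = np.linalg.eigh(Y.T @ Y)
    c = (V.T @ (Y.T @ y)) ** 2              # the weights (v_k^t Y_r^t y)^2

    def q(t):
        # Secular function from (2.15); undefined at t = lambda_{k,r}.
        return 1.0 - t + np.sum(c / (t - lam))

    Y1 = np.column_stack([y, Y])            # Y_{r+1}
    for t in np.linalg.eigvalsh(Y1.T @ Y1):
        assert abs(q(t)) < 1e-6             # each eigenvalue is a zero of q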

We first estimate $\lambda_{1,r+1}$, which is the greatest zero of $q$, and assume for contradiction that

(2.16)  $$\lambda_{1,r+1}>1+\delta_R u(0,r)\sqrt{\lambda_{1,r}}.$$

From $(H_r)$, $\lambda_{k,r}\leq 1+\delta_R u(k,r)\sqrt{\lambda_{1,r}}$, so that for $\lambda\geq 1+\delta_R u(0,r)\sqrt{\lambda_{1,r}}$ we have $\lambda-\lambda_{k,r}\geq\delta_R(u(0,r)-u(k,r))\sqrt{\lambda_{1,r}}>0$, and therefore

$$q(\lambda)\leq 1-\lambda+\frac{1}{\delta_R\sqrt{\lambda_{1,r}}}\sum_{k=1}^r\frac{(v_k^t Y_r^t y_{r+1})^2}{u(0,r)-u(k,r)}=:g(\lambda).$$

Let $\lambda^0$ be the zero of $g$. We have $g(\lambda_{1,r+1})\geq q(\lambda_{1,r+1})=0=g(\lambda^0)$. But $g$ is decreasing, so

$$\lambda_{1,r+1}\leq\lambda^0=1+\frac{1}{\delta_R\sqrt{\lambda_{1,r}}}\sum_{k=1}^r\frac{(v_k^t Y_r^t y_{r+1})^2}{u(0,r)-u(k,r)}.$$

Thus, using Lemma 2.1, the equality (1.5), and the fact that $r\leq p/2$ implies $p-r\geq p/2$, we can write

(2.17)  $$\lambda_{1,r+1}\leq 1+\frac{2}{\delta_R}\frac{\sqrt{\lambda_{1,r}}\|X\|^2}{p}\sum_{k=1}^r\frac{1}{u(0,r)-u(k,r)}\ \leq\ 1+\delta_R u(0,r)\sqrt{\lambda_{1,r}},$$

which contradicts inequality (2.16). Thus, we have

(2.18)  $$\lambda_{1,r+1}\leq 1+\delta_R u(0,r)\sqrt{\lambda_{1,r}}\leq 1+\delta_R u(1,r+1)\sqrt{\lambda_{1,r+1}}.$$

This shows that the upper bound in (Hr+1)(H_{r+1}) holds.

Finally, to estimate $\lambda_{r+1,r+1}$, which is the smallest zero of $q$, we write

$$q(\lambda)\geq 1-\lambda-\frac{1}{\delta_R\sqrt{\lambda_{1,r}}}\sum_{k=1}^r\frac{(v_k^t Y_r^t y_{r+1})^2}{u(0,r)-u(k,r)}=:\widetilde{g}(\lambda).$$

By the same reasoning as above, we show that the lower bound in $(H_{r+1})$ holds.

2.3. Controlling the greatest eigenvalue

Set $\mu_{1,r}=\lambda_{1,r}-1\geq 0$.

Taking $k=1$ in (1.4), and since $u(1,r)\leq u(1,R)\leq u(0,R)$, we can write

$$\mu_{1,r}\leq\delta_R\,u(1,r)\sqrt{\mu_{1,r}+1}.$$

Hence, using that $x\leq A\sqrt{1+x}$ implies $x\leq 2A$ (for $x\geq 0$ and $0\leq A\leq 3/2$), applied with $A=\delta_R\,u(1,r)$, we reach the upper estimate for $\lambda_{1,r}$.
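
For completeness, the elementary implication used above can be verified directly (our addition, assuming $x\geq 0$):

$$x\leq A\sqrt{1+x}\;\Longrightarrow\;x^2-A^2x-A^2\leq 0\;\Longrightarrow\;x\leq\frac{A^2+A\sqrt{A^2+4}}{2}\leq 2A,$$

where the last inequality amounts to $A+\sqrt{A^2+4}\leq 4$, that is, $A\leq 3/2$.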

This concludes the proof of Theorem 1.5.

3. Two simple examples and an open question

Let us choose $u(k,r)=\frac{2r-k}{\sqrt{r}}$. Using $(r+1)(2r-k)^2\leq r(2r+1-k)^2$ and $(r+1)(r+k)^2\leq r(r+1+k)^2$, we deduce that $u$ satisfies Hypothesis (H). Applying Theorem 1.5, we obtain that we can extract a submatrix with $R$ columns and $\lambda_{1,R}\leq 1+\varepsilon$, provided that

$$R\log R\leq\frac{\varepsilon^2}{8}\frac{p}{\|X\|^2},$$

which is a slightly weaker bound than the one known from [1].
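
With this choice of $u$ one has $u(0,r)(u(0,r)-u(k,r))=2k$, so the supremum in (1.5) reduces to a harmonic sum and $\delta_R$ is easy to evaluate. A short script (our addition; the random matrix, the value of $R$, and the helper name `delta_R` are illustrative):

    import numpy as np

    def delta_R(X, R, u):
        # delta_R from (1.5)
        p = X.shape[1]
        s = max(sum(1.0 / (u(0, r) * (u(0, r) - u(k, r)))
                    for k in range(1, r + 1))
                for r in range(1, R + 1))
        return np.sqrt(2.0 * np.linalg.norm(X, 2) ** 2 / p * s)

    u = lambda k, r: (2.0 * r - k) / np.sqrt(r)

    rng = np.random.default_rng(2)
    X = rng.standard_normal((500, 500))
    X /= np.linalg.norm(X, axis=0)          # unit-norm columns
    R = 5
    d = delta_R(X, R, u)
    print("guaranteed bound on lambda_{1,R}:", 1 + 2 * d * u(1, R))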

One can also verify that $u(k,r)=\sqrt{r-k}$ satisfies Hypothesis (H) and yields a similar bound.

An open question is whether there exists a function $u$ satisfying Hypothesis (H) that allows one to reach, via our algorithm, the optimal bound known from the Bourgain-Tzafriri theorem [1].

References

  • [1] Bourgain, J. and Tzafriri, L., Invertibility of “large” submatrices with applications to the geometry of Banach spaces and harmonic analysis. Israel J. Math. 57 (1987), no. 2, 137–224.
  • [2] Naor, A., Sparse quadratic forms and their geometric applications [following Batson, Spielman and Srivastava]. Séminaire Bourbaki, Vol. 2010/2011, Exposés 1027–1042. Astérisque No. 348 (2012), Exp. No. 1033, 189–217.
  • [3] Spielman, D. A. and Srivastava, N., An elementary proof of the restricted invertibility theorem. Israel J. Math. 190 (2012), 83–91.
  • [4] Tropp, J., The random paving property for uniformly bounded matrices. Studia Math. 185 (2008), no. 1, 67–82.
  • [5] Tropp, J., Norms of random submatrices and sparse approximation. C. R. Acad. Sci. Paris, Ser. I 346 (2008), 1271–1274.
  • [6] Vershynin, R., John's decompositions: selecting a large part. Israel J. Math. 122 (2001), 253–277.
  • [7] Youssef, P., A note on column subset selection. Int. Math. Res. Not. IMRN 2014, no. 23, 6431–6447.