
An elementary method for the problem of column subset selection in a rectangular matrix

Stéphane Chrétien
National Physical Laboratory, Hampton Road, Teddington TW11 0LW, UK
stephane.chretien@npl.co.uk

Sébastien Darses
LATP, UMR 6632, Université Aix-Marseille, Technopôle Château-Gombert, 39 rue Joliot Curie, 13453 Marseille Cedex 13, France
sebastien.darses@univ-amu.fr
Abstract.

The problem of extracting a well-conditioned submatrix from any rectangular matrix (with, e.g., normalized columns) has been a subject of extensive research, with applications to rank-revealing factorizations, low-stretch spanning trees, and sparse solutions to least squares regression problems; it is also connected with problems in functional and harmonic analysis. Here, we provide a deterministic algorithm which extracts a submatrix $X_S$ from any matrix $X$ with guaranteed individual lower and upper bounds on each singular value of $X_S$. The proof of our main result is short and elementary.

Keywords: column subset selection, restricted invertibility.

1. Introduction

Let $X\in\mathbb{R}^{n\times p}$ be a matrix such that all columns of $X$ have unit Euclidean $\ell_2$-norm. We denote by $\|x\|_2$ the $\ell_2$-norm of a vector $x$ and by $\|X\|$ (resp. $\|X\|_{HS}$) the associated operator norm (resp. the Hilbert-Schmidt norm). Let $X_T$ denote the submatrix of $X$ obtained by extracting the columns of $X$ indexed by $T\subset\{1,\ldots,p\}$. For any real symmetric matrix $A$, let $\lambda_k(A)$ denote the $k$-th eigenvalue of $A$, with the eigenvalues ordered as $\lambda_1(A)\geq\lambda_2(A)\geq\cdots$. We also write $\lambda_{\min}(A)$ (resp. $\lambda_{\max}(A)$) for the smallest (resp. largest) eigenvalue of $A$. Finally, we write $|S|$ for the size of a set $S$.

The problem of well-conditioned column selection that we consider here consists in finding the largest subset of columns of $X$ such that the corresponding submatrix has all its singular values in a prescribed interval $[1-\varepsilon,1+\varepsilon]$. The one-sided problem of finding the largest possible $T$ such that $\lambda_{\min}(X_T^t X_T)\geq 1-\varepsilon$ is called the Restricted Invertibility Problem and has a long history, starting with the seminal work of Bourgain and Tzafriri [1]. Applications of such results are well known in harmonic analysis [1]. The condition number of submatrices is also a subject of extensive study in statistics and signal processing [5].

Here, we propose an elementary approach to this problem based on two simple ingredients:

  (1) choosing recursively $y\in\mathcal{V}$, the set of remaining columns of $X$, verifying

$$Q(y)\leq\frac{1}{|\mathcal{V}|}\sum_{x\in\mathcal{V}}Q(x),$$

      where $Q$ is a relevant quantity depending on the previously chosen vectors (such a $y$ always exists, since the minimum is at most the average; see the sketch after this list);

  (2) a well-known equation (sometimes called the secular equation) whose roots are the eigenvalues of a square matrix after appending a row and a column.
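
Ingredient (1) is a plain greedy sweep. The following minimal Python sketch (our addition) illustrates it; the callable `potential` is a placeholder for the specific quantity $Q_r$ defined in Section 1.2, and the name `greedy_select` is hypothetical, not from the paper.

    import numpy as np

    def greedy_select(X, R, potential):
        """Greedily pick R columns of X. At each step, take a remaining
        column whose potential does not exceed the average potential of
        the remaining columns; the argmin always satisfies this."""
        remaining = list(range(X.shape[1]))
        selected = []
        for _ in range(R):
            scores = [potential(X, selected, j) for j in remaining]
            selected.append(remaining.pop(int(np.argmin(scores))))
        return selected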

1.1. Historical background

Concerning the Restricted Invertibility problem, Bourgain and Tzafriri [1] obtained the following result for square matrices:

Theorem 1.1 ([1]).

Given a $p\times p$ matrix $X$ whose columns have unit $\ell_2$-norm, there exists $T\subset\{1,\ldots,p\}$ with $|T|\geq d\,\frac{p}{\|X\|^2}$ such that $C\leq\lambda_{\min}(X_T^t X_T)$, where $d$ and $C$ are absolute constants.

See also [4] for a simpler proof. Vershynin [6] generalized Bourgain and Tzafriri's result to rectangular matrices, and the estimate of $|T|$ was improved as follows.

Theorem 1.2 ([6]).

Let $X$ be an $n\times p$ matrix and let $\widetilde{X}$ be the matrix obtained from $X$ by $\ell_2$-normalizing its columns. Then, for any $\varepsilon\in(0,1)$, there exists $T\subset\{1,\ldots,p\}$ with

$$|T|\geq(1-\varepsilon)\frac{\|X\|_{HS}^2}{\|X\|^2}$$

such that $C_1(\varepsilon)\leq\lambda_{\min}(\widetilde{X}_T^t\widetilde{X}_T)\leq\lambda_{\max}(\widetilde{X}_T^t\widetilde{X}_T)\leq C_2(\varepsilon)$.

Recently, Spielman and Srivastava proposed in [3] a deterministic construction of $T$ which allows them to obtain the following result.

Theorem 1.3 ([3]).

Let $X$ be a $p\times p$ matrix and $\varepsilon\in(0,1)$. Then there exists $T\subset\{1,\ldots,p\}$ with $|T|\geq(1-\varepsilon)^2\frac{\|X\|_{HS}^2}{\|X\|^2}$ such that $\varepsilon^2\frac{\|X\|^2}{p}\leq\lambda_{\min}(X_T^t X_T)$.

The technique of proof relies on new constructions and inequalities which are thoroughly explained in Naor's Bourbaki seminar [2]. Using these techniques, Youssef [7] improved Vershynin's result as follows.

Theorem 1.4 ([7]).

Let $X$ be an $n\times p$ matrix and let $\widetilde{X}$ be the matrix obtained from $X$ by $\ell_2$-normalizing its columns. Then, for any $\varepsilon\in(0,1)$, there exists $T\subset\{1,\ldots,p\}$ with $|T|\geq\frac{\varepsilon^2}{9}\frac{\|X\|_{HS}^2}{\|X\|^2}$ such that $1-\varepsilon\leq\lambda_{\min}(\widetilde{X}_T^t\widetilde{X}_T)\leq\lambda_{\max}(\widetilde{X}_T^t\widetilde{X}_T)\leq 1+\varepsilon$.

1.2. Our contribution

We provide a deterministic algorithm that extracts a submatrix $Y_r$ from the matrix $X$ with guaranteed individual lower and upper bounds on each singular value of $Y_r$.

Consider the set of vectors $\mathcal{V}_0=\{x_1,\ldots,x_p\}$, where the $x_i$ are the columns of $X$. At step $r=1$, choose $y_1\in\mathcal{V}_0$. By induction, suppose that $y_1,\ldots,y_r$ have been chosen at step $r$. Let $Y_r$ denote the matrix whose columns are $y_1,\ldots,y_r$ and let $v_k$ be a unit eigenvector of $Y_r^t Y_r$ associated to $\lambda_{k,r}:=\lambda_k(Y_r^t Y_r)$.

We say that $u(\cdot,\cdot)$ satisfies Hypothesis (H) if $u$ verifies, for $r\geq 1$:

(1.1)  $$0\leq u(k,r)\leq u(k+1,r+1),\qquad k\in\{0,\ldots,r\};$$
(1.2)  $$0\leq u(k+1,r)\leq u(1,r)<u(0,r),\qquad k\in\{1,\ldots,r-1\}.$$

We now introduce the "potential" associated to $u(\cdot,\cdot)$ satisfying (H):

$$Q_r(x)=\sum_{k=1}^r\frac{(v_k^t Y_r^t x)^2}{u(0,r)-u(k,r)},\qquad x\in\mathcal{V}_0.$$

We then choose $y_{r+1}\in\mathcal{V}_r:=\{x_1,\ldots,x_p\}\setminus\{y_1,\ldots,y_r\}$ so that

(1.3)  $$Q_r(y_{r+1})\leq\frac{1}{p-r}\sum_{x\in\mathcal{V}_r}Q_r(x)=\frac{1}{p-r}\sum_{k=1}^r\frac{\sum_{x\in\mathcal{V}_r}(v_k^t Y_r^t x)^2}{u(0,r)-u(k,r)}.$$
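
For concreteness, here is a minimal Python sketch (our addition) of this selection rule; `select_columns` is a hypothetical name, the choice $u(k,r)=(2r-k)/\sqrt{r}$ is borrowed from Section 3, and taking the argmin of $Q_r$ trivially satisfies (1.3).

    import numpy as np

    def u(k, r):
        # One admissible choice of u, taken from Section 3.
        return (2 * r - k) / np.sqrt(r)

    def select_columns(X, R):
        """Greedy extraction: at step r, pick a remaining column whose
        potential Q_r is at most the average over V_r, as in (1.3)."""
        remaining = list(range(X.shape[1]))
        selected = [remaining.pop(0)]            # arbitrary first column y_1
        for r in range(1, R):
            Y = X[:, selected]                   # Y_r, with r columns
            lam, V = np.linalg.eigh(Y.T @ Y)
            lam, V = lam[::-1], V[:, ::-1]       # lambda_1 >= ... >= lambda_r
            denom = u(0, r) - u(np.arange(1, r + 1), r)  # > 0 under (H)
            proj = V.T @ (Y.T @ X[:, remaining])         # row k: v_k^t Y_r^t x
            Q = (proj ** 2 / denom[:, None]).sum(axis=0)
            selected.append(remaining.pop(int(np.argmin(Q))))
        return selected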

The following result, for which we propose a short and elementary proof, gives control of all the singular values in the column selection problem.

Theorem 1.5.

Let $u$ satisfy Hypothesis (H) and let $R\leq p/2$. Then we can extract from $X$ submatrices $Y_r$ such that for all $r$ and $k$ with $1\leq k\leq r\leq R$, we have

(1.4)  $$1-\delta_R\,u(r-k+1,r)\sqrt{\lambda_{1,r}}\ \leq\ \lambda_{k,r}\ \leq\ 1+\delta_R\,u(k,r)\sqrt{\lambda_{1,r}},$$

where

(1.5)  $$\delta_R=\sqrt{\frac{2\|X\|^2}{p}\sup_{1\leq r\leq R}\sum_{k=1}^r\frac{u(0,r)^{-1}}{u(0,r)-u(k,r)}}.$$

In particular,

$$\lambda_{1,r}\leq 1+2\delta_R\,u(1,r).$$

2. Proof of Theorem 1.5

2.1. Suitable choice of the extracted vectors

Consider the set of vectors $\mathcal{V}_0=\{x_1,\ldots,x_p\}$. At step $1$, choose $y_1\in\mathcal{V}_0$. By induction, suppose that $y_1,\ldots,y_r$ have been chosen at step $r$. Let $Y_r$ denote the matrix whose columns are $y_1,\ldots,y_r$ and let $v_k$ be a unit eigenvector of $Y_r^t Y_r$ associated to $\lambda_{k,r}:=\lambda_k(Y_r^t Y_r)$. Let us choose $y_{r+1}\in\mathcal{V}_r:=\{x_1,\ldots,x_p\}\setminus\{y_1,\ldots,y_r\}$ so that

(2.6)  $$\sum_{k=1}^r\frac{(v_k^t Y_r^t y_{r+1})^2}{u(0,r)-u(k,r)}\leq\frac{1}{p-r}\sum_{x\in\mathcal{V}_r}\sum_{k=1}^r\frac{(v_k^t Y_r^t x)^2}{u(0,r)-u(k,r)}=\frac{1}{p-r}\sum_{k=1}^r\frac{\sum_{x\in\mathcal{V}_r}(v_k^t Y_r^t x)^2}{u(0,r)-u(k,r)}.$$
Lemma 2.1.

For all $r\geq 1$, $y_{r+1}$ verifies

$$\sum_{k=1}^r\frac{(v_k^t Y_r^t y_{r+1})^2}{u(0,r)-u(k,r)}\leq\frac{\lambda_{1,r}\|X\|^2}{p-r}\sup_{1\leq j\leq r}\sum_{k=1}^j\frac{1}{u(0,j)-u(k,j)}.$$
Proof.

Let $X_r$ be the matrix whose columns are the $x\in\mathcal{V}_r$, i.e. $X_r X_r^t=\sum_{x\in\mathcal{V}_r}xx^t$. Then

$$\sum_{x\in\mathcal{V}_r}(v_k^t Y_r^t x)^2={\rm Tr}\left(Y_r v_k v_k^t Y_r^t X_r X_r^t\right)\leq{\rm Tr}(Y_r v_k v_k^t Y_r^t)\,\|X_r X_r^t\|\leq\lambda_{k,r}\|X\|^2,$$

which yields the conclusion by plugging into (2.6), since $\lambda_{k,r}\leq\lambda_{1,r}$. ∎
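
The chain of inequalities in the proof is easy to check numerically. The following small sanity test (our addition, with random data and hypothetical variable names) verifies both the trace identity and the bound $\lambda_{k,r}\|X\|^2$:

    import numpy as np

    rng = np.random.default_rng(0)
    n, p, r = 8, 20, 3
    X = rng.standard_normal((n, p))
    X /= np.linalg.norm(X, axis=0)          # unit-norm columns
    Y, Xr = X[:, :r], X[:, r:]              # play the roles of Y_r and X_r

    lam, V = np.linalg.eigh(Y.T @ Y)
    lam, V = lam[::-1], V[:, ::-1]          # lambda_1 >= ... >= lambda_r
    for k in range(r):
        lhs = np.sum((V[:, k] @ (Y.T @ Xr)) ** 2)   # sum_x (v_k^t Y_r^t x)^2
        w = Y @ V[:, k]                             # Y_r v_k
        trace_form = np.trace(np.outer(w, w) @ (Xr @ Xr.T))
        bound = lam[k] * np.linalg.norm(X, 2) ** 2  # lambda_{k,r} ||X||^2
        assert np.isclose(lhs, trace_form) and lhs <= bound + 1e-12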

2.2. Controlling the individual eigenvalues

It is clear that (1.4) holds for $r=1$: in that case $1$ is the only singular value, because the columns are assumed to be normalized.

Assume the induction hypothesis $(H_r)$: (1.4) holds for all $k$ with $1\leq k\leq r<R$.

Let us then show that $(H_{r+1})$ holds. By the Cauchy interlacing theorem, we have

$$\lambda_{k+1,r+1}\leq\lambda_{k,r},\qquad 1\leq k\leq r,$$
$$\lambda_{k+1,r+1}\geq\lambda_{k+1,r},\qquad 0\leq k\leq r-1.$$

We then deduce, from the induction hypothesis $(H_r)$ and Hypothesis (H),

(2.7)  $$\lambda_{k+1,r+1}\leq 1+\delta_R u(k,r)\sqrt{\lambda_{1,r}}\leq 1+\delta_R u(k+1,r+1)\sqrt{\lambda_{1,r+1}},\qquad 1\leq k\leq r,$$
(2.8)  $$\lambda_{k+1,r+1}\geq 1-\delta_R u(r-k,r)\sqrt{\lambda_{1,r}}\geq 1-\delta_R u(r+1-(k+1)+1,r+1)\sqrt{\lambda_{1,r+1}},\qquad 0\leq k\leq r-1.$$

It remains to obtain the upper estimate for $\lambda_{1,r+1}$ and the lower one for $\lambda_{r+1,r+1}$. We write

(2.14)  $$Y_{r+1}^t Y_{r+1}=\begin{bmatrix}y_{r+1}^t\\ Y_r^t\end{bmatrix}\begin{bmatrix}y_{r+1}&Y_r\end{bmatrix}=\begin{bmatrix}1&y_{r+1}^t Y_r\\ Y_r^t y_{r+1}&Y_r^t Y_r\end{bmatrix},$$

and it is well known that the eigenvalues of $Y_{r+1}^t Y_{r+1}$ are the zeros of the secular equation

(2.15)  $$q(\lambda):=1-\lambda+\sum_{k=1}^r\frac{(v_k^t Y_r^t y_{r+1})^2}{\lambda-\lambda_{k,r}}=0.$$
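
Equation (2.15) follows from a Schur complement computation on the bordered matrix (2.14), and it can be checked numerically. A quick illustration (our addition, with random unit-norm columns so that the eigenvalues generically avoid the poles $\lambda_{k,r}$):

    import numpy as np

    rng = np.random.default_rng(1)
    n, r = 6, 3
    Y = rng.standard_normal((n, r))
    Y /= np.linalg.norm(Y, axis=0)          # unit-norm columns of Y_r
    y = rng.standard_normal(n)
    y /= np.linalg.norm(y)                  # the appended column y_{r+1}

    lam, V = np.linalg.eigh(Y.T @ Y)
    c = (V.T @ (Y.T @ y)) ** 2              # the weights (v_k^t Y_r^t y)^2

    def q(t):
        # Secular function from (2.15); undefined at t = lambda_{k,r}.
        return 1.0 - t + np.sum(c / (t - lam))

    Y1 = np.column_stack([y, Y])            # Y_{r+1}
    for t in np.linalg.eigvalsh(Y1.T @ Y1):
        assert abs(q(t)) < 1e-6             # each eigenvalue is a zero of q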

We first estimate $\lambda_{1,r+1}$, which is the greatest zero of $q$, and assume for contradiction that

(2.16)  $$\lambda_{1,r+1}>1+\delta_R u(0,r)\sqrt{\lambda_{1,r}}.$$

From $(H_r)$, $\lambda_{k,r}\leq 1+\delta_R u(k,r)\sqrt{\lambda_{1,r}}$, so that for $\lambda\geq 1+\delta_R u(0,r)\sqrt{\lambda_{1,r}}$ we have $\lambda-\lambda_{k,r}\geq\delta_R(u(0,r)-u(k,r))\sqrt{\lambda_{1,r}}>0$, and therefore

$$q(\lambda)\leq 1-\lambda+\frac{1}{\delta_R\sqrt{\lambda_{1,r}}}\sum_{k=1}^r\frac{(v_k^t Y_r^t y_{r+1})^2}{u(0,r)-u(k,r)}=:g(\lambda).$$

Let $\lambda^0$ be the zero of $g$. We have $g(\lambda_{1,r+1})\geq q(\lambda_{1,r+1})=0=g(\lambda^0)$. But $g$ is decreasing, so

$$\lambda_{1,r+1}\leq\lambda^0=1+\frac{1}{\delta_R\sqrt{\lambda_{1,r}}}\sum_{k=1}^r\frac{(v_k^t Y_r^t y_{r+1})^2}{u(0,r)-u(k,r)}.$$

Thus, using Lemma 2.1, the equality (1.5), and the fact that $r\leq p/2$ implies $p-r\geq p/2$, we can write

(2.17)  $$\lambda_{1,r+1}\leq 1+\frac{2}{\delta_R}\frac{\sqrt{\lambda_{1,r}}\|X\|^2}{p}\sum_{k=1}^r\frac{1}{u(0,r)-u(k,r)}\ \leq\ 1+\delta_R u(0,r)\sqrt{\lambda_{1,r}},$$

which contradicts inequality (2.16). Thus, we have

(2.18)  $$\lambda_{1,r+1}\leq 1+\delta_R u(0,r)\sqrt{\lambda_{1,r}}\leq 1+\delta_R u(1,r+1)\sqrt{\lambda_{1,r+1}}.$$

This shows that the upper bound in (Hr+1)(H_{r+1}) holds.

Finally, to estimate $\lambda_{r+1,r+1}$, which is the smallest zero of $q$, we write

$$q(\lambda)\geq 1-\lambda-\frac{1}{\delta_R\sqrt{\lambda_{1,r}}}\sum_{k=1}^r\frac{(v_k^t Y_r^t y_{r+1})^2}{u(0,r)-u(k,r)}=:\widetilde{g}(\lambda).$$

By the same reasoning as above, we show that the lower bound in $(H_{r+1})$ holds.

2.3. Controlling the greatest eigenvalue

Set $\mu_{1,r}=\lambda_{1,r}-1\geq 0$.

Taking $k=1$ in (1.4), and since $u(1,r)\leq u(1,R)\leq u(0,R)$, we can write

$$\mu_{1,r}\leq\delta_R\,u(1,r)\sqrt{\mu_{1,r}+1}.$$

Hence, using that $x\leq A\sqrt{1+x}$ implies $x\leq 2A$ (for $x\geq 0$ and $0\leq A\leq 3/2$), applied with $A=\delta_R\,u(1,r)$, we reach the upper estimate for $\lambda_{1,r}$.
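
For completeness, the elementary implication used above can be verified directly (our addition, assuming $x\geq 0$):

$$x\leq A\sqrt{1+x}\;\Longrightarrow\;x^2-A^2x-A^2\leq 0\;\Longrightarrow\;x\leq\frac{A^2+A\sqrt{A^2+4}}{2}\leq 2A,$$

where the last inequality amounts to $A+\sqrt{A^2+4}\leq 4$, that is, $A\leq 3/2$.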

This concludes the proof of Theorem 1.5.

3. Two simple examples and an open question

Let us choose $u(k,r)=\frac{2r-k}{\sqrt{r}}$. Using $(r+1)(2r-k)^2\leq r(2r+1-k)^2$ and $(r+1)(r+k)^2\leq r(r+1+k)^2$, we deduce that $u$ satisfies Hypothesis (H). Applying Theorem 1.5, we obtain that we can extract a submatrix with $R$ columns and $\lambda_{1,R}\leq 1+\varepsilon$, provided that

$$R\log R\leq\frac{\varepsilon^2}{8}\frac{p}{\|X\|^2},$$

which is a slightly weaker bound than the one known from [1].
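
With this choice of $u$ one has $u(0,r)(u(0,r)-u(k,r))=2k$, so the supremum in (1.5) reduces to a harmonic sum and $\delta_R$ is easy to evaluate. A short script (our addition; the random matrix, the value of $R$, and the helper name `delta_R` are illustrative):

    import numpy as np

    def delta_R(X, R, u):
        # delta_R from (1.5)
        p = X.shape[1]
        s = max(sum(1.0 / (u(0, r) * (u(0, r) - u(k, r)))
                    for k in range(1, r + 1))
                for r in range(1, R + 1))
        return np.sqrt(2.0 * np.linalg.norm(X, 2) ** 2 / p * s)

    u = lambda k, r: (2.0 * r - k) / np.sqrt(r)

    rng = np.random.default_rng(2)
    X = rng.standard_normal((500, 500))
    X /= np.linalg.norm(X, axis=0)          # unit-norm columns
    R = 5
    d = delta_R(X, R, u)
    print("guaranteed bound on lambda_{1,R}:", 1 + 2 * d * u(1, R))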

One can also verify that $u(k,r)=\sqrt{r-k}$ satisfies Hypothesis (H) and yields a similar bound.

An open question is whether there exists a function $u$ satisfying Hypothesis (H) that allows one to reach, via our algorithm, the optimal bound known from the Bourgain-Tzafriri theorem [1].

References

  • [1] Bourgain, J. and Tzafriri, L., Invertibility of “large” submatrices with applications to the geometry of Banach spaces and harmonic analysis. Israel J. Math. 57 (1987), no. 2, 137–224.
  • [2] Naor, A., Sparse quadratic forms and their geometric applications [following Batson, Spielman and Srivastava]. Séminaire Bourbaki, Vol. 2010/2011, Exposés 1027–1042. Astérisque No. 348 (2012), Exp. No. 1033, 189–217.
  • [3] Spielman, D. A. and Srivastava, N., An elementary proof of the restricted invertibility theorem. Israel J. Math. 190 (2012), 83–91.
  • [4] Tropp, J., The random paving property for uniformly bounded matrices. Studia Math. 185 (2008), no. 1, 67–82.
  • [5] Tropp, J., Norms of random submatrices and sparse approximation. C. R. Acad. Sci. Paris, Ser. I 346 (2008), 1271–1274.
  • [6] Vershynin, R., John's decompositions: selecting a large part. Israel J. Math. 122 (2001), 253–277.
  • [7] Youssef, P., A note on column subset selection. Int. Math. Res. Not. IMRN 2014, no. 23, 6431–6447.