Fast Algorithms for ℓ_p-Regression
Abstract
The ℓ_p-norm regression problem is a classic problem in optimization with wide-ranging applications in machine learning and theoretical computer science. The goal is to compute a minimizer of an ℓ_p-norm objective determined by a given matrix and vector. Efficient high-accuracy algorithms for the problem have been challenging both in theory and practice, and the state-of-the-art algorithms require a number of linear system solves that grows polynomially with the problem dimension. In this paper, we provide new algorithms for ℓ_p-regression (and a more general formulation of the problem) that obtain a high-accuracy solution in considerably fewer linear system solves. We further propose a new inverse maintenance procedure that speeds up our algorithm so that its total runtime is comparable to the time required for matrix multiplication. Additionally, we give the first Iteratively Reweighted Least Squares (IRLS) algorithm that is guaranteed to converge to an optimum in a few iterations. Our IRLS algorithm has shown exceptional practical performance, beating the currently available implementations in MATLAB/CVX by 10-50x.
1 Introduction
(Preliminary versions of the results in this paper have appeared as conference publications [Adi+19, APS19, AS20, Adi+21]. This paper unifies and simplifies results from the preliminary versions.)

Linear regression in the ℓ_p-norm seeks to compute a vector x minimizing ‖Ax − b‖_p for a given matrix A and vector b. This is a classic convex optimization problem that captures several well-studied questions, including least squares regression (p = 2), which is equivalent to solving a system of linear equations, and linear programming (p = 1 or p = ∞). The ℓ_p-norm regression problem has found use across a wide range of applications in machine learning and theoretical computer science, including low rank matrix approximation [Chi+17], sparse recovery [CT05], graph based semi-supervised learning [AL11, Cal19, RCL19, Kyn+15], and data clustering and learning problems [ETT15, EDT17, HFE18]. In this paper, we focus on solving the ℓ_p-norm regression problem to high accuracy. The exact solution to the ℓ_p-norm regression problem may not even be expressible using rationals. Thus, the goal is often relaxed to finding an ε-approximate solution, i.e., an x whose objective value is within a (1 + ε) factor of the optimum, for some small ε. Furthermore, several applications such as graph based semi-supervised learning require that x is close to the optimal solution coordinate-wise and not just in objective value, necessitating a high-accuracy solution with very small ε. In order to find such high-accuracy solutions efficiently, we require an algorithm whose runtime dependence on ε is polylogarithmic in 1/ε rather than polynomial in 1/ε.
Fast, high-accuracy algorithms for ℓ_p-regression are challenging both in theory and practice, due to the lack of smoothness and strong convexity of the objective. The Interior Point Method framework of [NN94] can be used to compute a high-accuracy solution for all p in roughly √n iterations (here and throughout, the asymptotic notation hides constants, dependencies on p, and polylogarithmic factors unless explicitly mentioned), with each iteration requiring the solution of a system of linear equations. This was the most efficient algorithm for ℓ_p-regression until 2018. In 2018, [Bub+18] showed that roughly √n iterations are necessary for the interior point framework and proposed a new homotopy-based approach that could compute a high-accuracy solution in fewer linear system solves for all p. Their algorithms improve over the interior point method for values of p bounded away from 1 and ∞; however, for p approaching 1 or ∞, the number of linear system solves required by their algorithm approaches that required by interior point methods. Finding an algorithm for ℓ_p-regression requiring substantially fewer linear system solves has been a long-standing open problem.
Among practical implementations for the ℓ_p-norm regression problem, the Iteratively Reweighted Least Squares (IRLS) methods stand out due to their simplicity, and have been studied since 1961 [Law61]. For some range of values of p, IRLS converges rapidly. However, the method is guaranteed to converge only for a limited range of p and can diverge even for moderate values of p [RCL19]. Over the years, several empirical modifications of the algorithm have been used for various applications in practice (refer to [Bur12] for a full survey). However, an IRLS algorithm that is guaranteed to converge to the optimum in a few iterations for all values of p has again been a long-standing challenge.
1.1 Our Contributions
In this paper, we present the first algorithm for the ℓ_p-regression problem that finds a high-accuracy solution in a number of linear system solves that scales sublinearly with the problem size, which has been a long sought-after goal in optimization. Our algorithm builds on a new iterative refinement framework for ℓ_p-norm objectives that allows us to find a high-accuracy solution using low-accuracy solutions to a subproblem. The iterative refinement framework allows the subproblems to be solved to a crude approximation, and this has been useful in several follow-up works on graph optimization (see Section 1.3). We further propose a new inverse maintenance framework and show how to speed up our algorithm to solve the ℓ_p-norm problem to high accuracy in total time comparable to that of matrix multiplication. Finally, we give the first IRLS algorithm that provably converges to a high-accuracy solution in a few iterations.
Preliminary versions of the results presented in this paper have appeared in previous conference publications [Adi+19, APS19, AS20, Adi+21]. In this paper, we present our results for a more general formulation of the ℓ_p-regression problem,
(1)
for matrices . Let and, , so that the above problem has a bounded solution. Our first result is a fast, high-accuracy algorithm for Problem (1).
Theorem 1.1.
Let and . There is an algorithm that starting from satisfying , finds an -approximate solution to Problem (1) in calls to a linear system solver.
As a corollary, for the standard ℓ_p-norm regression problem, i.e., the corresponding special case of Problem (1), our algorithm converges in a sublinear number of calls to a linear system solver. This is the first algorithm that converges to a high-accuracy solution at such an asymptotic rate of convergence for all p, improving upon all previously known algorithms. As a result, we answer the long-standing question in optimization of whether such a rate of convergence can be achieved.
Our next result shows how to speed up our algorithms and solve Problem (1) (and in particular ℓ_p-regression) in time comparable to that of multiplying two matrices, where ω denotes the matrix multiplication exponent, i.e., two n × n matrices can be multiplied in time n^ω. This is almost as fast as solving a single system of linear equations. We achieve this guarantee via a new inverse maintenance procedure for ℓ_p-regression and prove the following result.
Theorem 1.2.
If are explicitly given, matrices with polynomially bounded condition numbers, and , there is an algorithm for Problem (1) that can be implemented to run in total time .
Our inverse maintenance algorithm is presented in Section 5, where we also give a more fine-grained dependence on the problem parameters in the rate of convergence (Theorem 5.1). Our algorithms and techniques for ℓ_p-regression have motivated a line of work in graph optimization and the study of accelerated width-reduced methods, which we describe in detail in Section 1.3.
Our next contribution concerns the IRLS approach. For the ℓ_p-regression problem, i.e., the corresponding special case of (1), we give an IRLS algorithm that globally converges to the optimum in a small number of linear system solves for all p ≥ 2 (Section 6). This is the first IRLS algorithm that is guaranteed to converge to the optimum for all such values of p, with a quantitative bound on the runtime. Our IRLS algorithm has proven to be very fast and robust in practice and is faster than existing implementations in MATLAB/CVX by 10-50x. These speed-ups are demonstrated in experiments performed in [APS19], and we present these results along with our algorithm in Section 6.
Theorem 1.3.
Let . Algorithm 10 returns such that and , in at most calls to a linear system solver.
The analysis of our IRLS algorithm fits into the overall framework of this paper. Such an algorithm first appeared in the conference paper by [APS19], where they also ran some experiments to demonstrate the performance of their IRLS algorithm in practice. We include some of their experimental results to show that the rate of convergence in practice is even better than the theoretical bounds.
1.2 Technical Overview
Overall Convergence
Our algorithm follows an overall iterative refinement approach: the change in the ℓ_p-norm objective under an update can be upper bounded by a function consisting of a linear term, a quadratic term, and an ℓ_p-norm term, and lower bounded by a similar function. Here, the linear and quadratic terms are determined by the current iterate and the matrices defining Problem (1). We prove that if we can solve this subproblem to a κ-approximation, then a number of such solves (iterations) that depends linearly on κ and only logarithmically on 1/ε suffices to obtain an ε-approximate solution to Problem (1) (Theorem 2.1). We call this subproblem the Residual Problem and this process Iterative Refinement for ℓ_p-norms.
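To make the outer loop concrete, here is a minimal Python sketch of the refinement scheme just described. The residual solver, the termination rule, and the scaling of the update are abstracted away; the function names are placeholders and not the paper's notation (the precise scheme is Algorithm 1 in Section 2).

```python
# Minimal sketch of iterative refinement (names are placeholders, not the
# paper's notation): `residual_solver(x)` returns a crude (kappa-approximate)
# solution to the residual problem at x, and `residual_value(x, delta)`
# evaluates its objective. The update rule and termination test are schematic.
def iterative_refinement(x0, f, residual_solver, residual_value, eps, max_iters):
    x = x0
    for _ in range(max_iters):
        delta = residual_solver(x)              # low-accuracy residual solve
        if residual_value(x, delta) <= eps * f(x):
            break                               # negligible progress: x is near-optimal
        x = x - delta                           # refinement step (sign/scaling schematic)
    return x
```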
Solving the Residual Problem
We next perform a binary search on the linear term of the residual problem and reduce it to solving problems consisting of a quadratic term plus an ℓ_p-norm term (Lemma 3.1). In order to solve these new problems, we use a multiplicative weight update routine that returns a constant-factor approximate solution using a small number of calls to a linear system solver (Theorem 3.2). We can thus find a constant-factor approximate solution to the residual problem with a comparable number of calls to a linear system solver (Corollary 3.7). Combined with iterative refinement, we obtain an algorithm that converges using few linear system solves overall.
Improving Dependence
Furthermore, we prove that given an ℓ_p-norm residual problem, we can construct a corresponding ℓ_q-norm residual problem (for a suitably chosen q) such that an approximate solution to the ℓ_q-norm residual problem, after rescaling, gives an approximate solution to the ℓ_p-norm residual problem (Theorem 4.3). As a consequence, if p is large, a constant-factor approximate solution to the corresponding ℓ_q-norm residual problem yields a sufficiently good approximate solution to the ℓ_p-norm residual problem with a better dependence on p. Combining this with the algorithm described in the previous paragraph, we obtain our final guarantees as described in Theorem 1.1.
ℓ_p-Regression in Matrix Multiplication Time
We next describe how to obtain the guarantees of Theorem 1.2. While solving the residual problem, the algorithm solves a system of linear equations at every iteration. The key observation for obtaining improved running times is that the weights determining these linear systems change slowly. Thus, we can maintain a spectral approximation to the linear system via a sequence of lazy low-rank updates. The Sherman-Morrison-Woodbury formula then allows us to update the inverse quickly. We can use the spectral approximation as a preconditioner for solving the linear system quickly at each iteration. Thus, we obtain a speed-up since the linear systems do not need to be solved from scratch at each iteration, giving Theorem 1.2.
Good Starting Solution
For pure ℓ_p-norm objectives, we further show how to find a good starting solution. The key idea is that a constant-factor approximate solution to an intermediate-norm problem is also a (weaker) approximate solution to the problem with the norm doubled (Lemma 2.9). This inspires a homotopy approach, where we first solve an ℓ_2-norm problem, followed by a sequence of intermediate-norm problems, doubling the norm at each step, each solved to a constant-factor approximation. We can thus obtain the required starting solution in a small number of additional calls to a linear system solver.
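As an illustration, here is a minimal sketch of this homotopy, assuming a norm-doubling schedule; the exact schedule and the approximation quality of each stage are those specified in Section 2.2, not the placeholders used here.

```python
# Sketch of the homotopy warm start, assuming a norm-doubling schedule:
# solve the l2 problem exactly, then solve intermediate-norm problems to a
# constant-factor approximation, doubling the norm until p is reached.
# `solve_l2` and `solve_to_constant_approx` are placeholders for the solvers
# developed in Sections 2 and 3.
def homotopy_start(p, solve_l2, solve_to_constant_approx):
    x = solve_l2()                      # one linear system solve
    q = 2.0
    while q < p:
        q = min(2.0 * q, p)             # double the norm, capping at p
        x = solve_to_constant_approx(q, x)
    return x                            # good starting point for the p-norm problem
```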
IRLS Algorithm
For the IRLS algorithm, given the residual problem at an iteration, we show how to construct a weighted least squares problem, the solution of which is an -approximate solution to the residual problem (Lemma 6.1). This result along with the overall iterative refinement culminates in our IRLS algorithm where we directly solve these weighted least squares problems in every iteration.
1.3 Related Works
-Regression
Until 2018, the fastest high-accuracy algorithms for ℓ_p-regression, including the Interior Point Method framework of [NN94] and the homotopy method of [Bub+18], asymptotically required a number of linear system solves growing polynomially with the dimension. The first algorithm for ℓ_p-regression to beat this bound was that of [Adi+19], which was faster than all previously known algorithms and required a number of iterations sublinear in the dimension. Concurrently, [Bul18] used tools from convex optimization to give an algorithm for the case p = 4 which matches the rates of [Adi+19] up to logarithmic factors. Subsequent works have improved the dependence on p [AS20, Adi+21] and proposed alternate methods for obtaining matching rates (up to logarithmic and p-dependent factors) [Car+20]. A recent work by [JLS22] shows how to solve ℓ_p-regression in a number of iterations depending only on the smaller dimension of the constraint matrix.
Width Reduced MWU Algorithms
Width reduction is a technique that has been used repeatedly in multiplicative weight update algorithms to speed up the rate of convergence, roughly from the square root of the input size to its cube root. The technique was first seen in the work of [Chr+11] in the context of the maximum flow problem, where it was used to improve the iteration complexity of the multiplicative weight update method on a graph with n vertices and m edges. Similar improvements were further seen in algorithms for regression in other norms [Chi+13, EV19], ℓ_p-regression [Adi+19], and algorithms for matrix scaling [All+17]. In a recent work, [ABS21] extended this technique to improve iteration complexities for all quasi-self-concordant objectives, which include soft-max and logistic regression among others.
Inverse Maintenance
Inverse maintenance is a technique used to speed up iterative optimization algorithms. It was first introduced by [Vai90] in the context of minimum cost and multicommodity flows and has further been used in interior point methods [LS14, LSW15]. In 2019, [Adi+19] developed an inverse maintenance method for ℓ_p-regression that exploits the controllable rates of change of the underlying variables to reuse inverses.
IRLS Algorithms
Iteratively Reweighted Least Squares algorithms are simple to implement and have thus been used in a wide range of applications, including sparse signal reconstruction [GR97], compressive sensing [CY08], and Chebyshev approximation in FIR filter design [BB94]. Refer to [Bur12] for a full survey. The works of [Osb85] and [Kar70] show convergence in the limit or under certain assumptions on the starting solution, without quantitative bounds. In some regimes, [SV16b, SV16a, SV16] show quantitative convergence bounds. In 2019, [APS19] gave the first IRLS algorithm with quantitative bounds that is guaranteed to converge with no conditions on the starting point. Their algorithm also works well in practice, as suggested by their experiments.
Follow-up Work in Graph Optimization
The ℓ_p-norm flow problem, which asks to minimize the ℓ_p-norm of a flow vector while satisfying certain demand constraints, is modeled via the ℓ_p-regression problem. The maximum flow problem is the special case p = ∞. For graphs with n vertices and m edges, the ℓ_p-norm regression algorithm of [Adi+19], when combined with fast Laplacian solvers, directly gives an algorithm for the ℓ_p-norm flow problem requiring only a sublinear number of Laplacian solves. Building on their work, specifically the iterative refinement framework, which allows one to solve these problems to high accuracy while only requiring crude approximate solutions to ℓ_p-norm subproblems, [Kyn+19] give an algorithm for unweighted graphs whose running time approaches almost-linear as p grows. Further works, including [Adi+21], also utilize the iterative refinement guarantees to give fast algorithms for weighted ℓ_p-norm flow problems by designing new sparsification algorithms that preserve ℓ_p-norm objectives of the subproblem up to approximation. For the maximum flow problem, [AS20] give an improved algorithm for the approximate maximum flow problem on unweighted graphs. [KLS20] build on these works further and give an algorithm that computes a maximum s-t flow in graphs with small integer edge capacities faster than previously known. In a recent breakthrough result, [Che+22] give an algorithm for the maximum flow problem and the ℓ_p-norm flow problem that runs in almost-linear time.
1.4 Organization of Paper
Section 2 describes the overall iterative refinement framework, first for p ≥ 2 and then for p ∈ (1, 2). At the end of the section, we show how to find good starting solutions for pure ℓ_p-norm objectives. Section 3 describes the width-reduced multiplicative weight update routine used to solve the residual problem. In Section 4 we show how to solve ℓ_p-norm residual problems using ℓ_q-norm residual problems and give our overall algorithm (Algorithm 6). Section 5 contains our new inverse maintenance algorithm that allows us to solve ℓ_p-regression almost as fast as linear regression. Finally, in Section 6 we give an IRLS algorithm and present some experimental results from [APS19].
2 Iterative Refinement for ℓ_p-norms
Recall that we would like to find a high-accuracy solution for the problem,
for matrices .
A common approach in smooth convex optimization is to upper bound the function by its first-order Taylor expansion plus a quadratic term (smoothness), and to minimize this bound repeatedly to converge to the optimum. Additionally, when the function has a similar quadratic lower bound (strong convexity), it can be shown that minimizing this upper bound a number of times that is logarithmic in 1/ε (hiding problem-dependent parameters) is sufficient to converge to an ε-approximate solution. The ℓ_p-norm function for p > 2 satisfies no such quadratic upper bound, since it grows too steeply, and no such quadratic lower bound, since it is too flat around zero. In this section we show that we can instead upper and lower bound the change in the function by a second-order expansion plus an ℓ_p-norm term. We show that it is sufficient to minimize such a bound to a κ-approximation a number of times that scales with κ and logarithmically with 1/ε. Such an iterative refinement method was previously only known for p = 2, and we thus call this scheme Iterative Refinement for ℓ_p-norms. In later sections, we show different ways to minimize this upper bound approximately to obtain fast algorithms.
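Schematically, for scalars and p ≥ 2, the bound we establish (Lemma 2.5 below) has the following shape, where c_p and C_p are constants depending only on p; this informal restatement is included here because the precise constants are deferred to the lemma:

    c_p ( |x|^{p-2} δ^2 + |δ|^p )  ≤  |x + δ|^p − |x|^p − p |x|^{p-2} x δ  ≤  C_p ( |x|^{p-2} δ^2 + |δ|^p ).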
For p < 2, we use a smoothed function which is quadratic in a small range around zero and grows as the p-th power otherwise. We use this function to give upper and lower bounds and a similar iterative refinement scheme.
We further show how to obtain a good starting solution for Problem (1) in the special case where the linear and quadratic parts vanish, i.e., the objective function is a pure ℓ_p-norm.
2.1 Iterative Refinement
We will prove that the following algorithm can be used to obtain a high-accuracy solution for ℓ_p-regression, i.e., one whose iteration count scales only logarithmically in 1/ε.
Specifically, we will prove,
Theorem 2.1.
Before we prove the above result, we will define some of the terms used in the above statement.
2.1.1 Preliminaries
Definition 2.2 (-Approximate Solution).
Definition 2.3 (Residual Problem).
Definition 2.4 (Approximation to Residual Problem).
Let and be the optimum of the residual problem. is a -approximation to the residual problem if and,
2.1.2 Bounding Change in Objective
In order to prove our result, we first show that the change in the ℓ_p-norm objective can be upper and lower bounded by a linear term plus a quadratic term plus an ℓ_p-norm term.
Lemma 2.5.
For any and , we have for vectors defined coordinate wise as and ,
Proof.
To show this, we show that the above holds for all coordinates. For a single coordinate, the above expression is equivalent to proving,
Let . Since the above clearly holds for , it remains to show for all ,
Case 1:
In this case, . So, and the right inequality directly holds. To show the other side, letWe have,
and
Since , . So is an increasing function in and .
Case 2:
Now, , and . As a result,which gives the right inequality. Consider,
Let . The above expression now becomes,
We know that . When , and . This gives us,
giving us for . When , and .
giving us for . Therefore, giving us, , thus giving the left inequality.
Case 3:
Let Now,When , we have,
and
So is an increasing function of which gives us, . Therefore is a decreasing function, and the minimum is at which is . This gives us our required inequality for . When , and . We are left with the range . Again, we have,
Therefore, is an increasing function, . This implies is an increasing function, giving, as required.
To show the other direction,
Now, since ,
We thus have, when is positive and when is negative. The minimum of is at which is . This concludes the proof of this case.
∎
2.1.3 Proof of Iterative Refinement
In this section we will prove our main result. We start by proving the following lemma which relates the objective of the residual problem defined in the preliminaries to the change in objective value when is updated by .
Lemma 2.6.
For any and and ,
and
Proof.
We now track the value of with a parameter . We will first show that, if we have a approximate solver for the residual problem, we can either take a step to obtain such that
(2)
or we need to reduce the value of by a factor of since is less than .
Lemma 2.7.
Proof.
We will first prove that by induction. For , by definition. Now, let us assume this is true for iteration . Note that, if the algorithm updates in line 7, since (solution of the residual problem is always non-negative), the relation holds for . Otherwise, the algorithm reduces to and . For such that , and from Lemma 2.6,
Since is a -approximate solution to the residual problem,
We have thus shown that for all iterates and whenever Line 9 of the algorithm is executed, 2 from the lemma statement holds. It remains to prove that if , then satisfies (2). Since, ,
Now, from Lemma 2.6,
∎
Proof.
Our starting solution satisfies and the solutions of the residual problem added in each iteration satisfy . Therefore, . For the second part, note that we always have . When we stop, . Thus,
∎
We are now ready to prove our main result. See 2.1
Proof.
From Corollary 2.8, the solution returned by the algorithm is as required. We next need to bound the runtime. From Lemma 2.7, the algorithm, either reduces or Equation (2) holds. The number of times we can reduce is bounded by . The number of times Equation (2) holds can be bounded as follows,
Therefore, the total number of iterations is bounded as . ∎
2.2 Starting Solution and Homotopy for Pure ℓ_p Objectives
In this section, we consider the case where , i.e., and .
(3)
For such cases, we show how to find a good starting solution. We note that we can solve the corresponding ℓ_2-norm problem exactly, since it is equivalent to solving a system of linear equations.
Refer to Appendix A for details on how the above is equivalent to solving a system of linear equations.
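For concreteness, here is a minimal numpy sketch of one way to reduce such an ℓ_2 problem to a single linear system, assuming an equality-constrained least squares form; the matrices N, A and vector b are placeholders for whatever instantiates Problem (3), and the actual reduction used by the paper is the one in Appendix A.

```python
# Sketch: the l2 analogue of the pure-norm problem, written as an
# equality-constrained least squares problem, is solved via its KKT system.
# Assumes the KKT matrix is non-singular; N, A, b are illustrative inputs.
import numpy as np

def solve_l2_version(N, A, b):
    # minimize ||N x||_2^2  subject to  A x = b
    n = N.shape[1]
    m = A.shape[0]
    kkt = np.block([[2 * N.T @ N, A.T],
                    [A, np.zeros((m, m))]])
    rhs = np.concatenate([np.zeros(n), b])
    sol = np.linalg.solve(kkt, rhs)       # one linear system solve
    return sol[:n]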
We next consider a homotopy on the norm. Specifically, we want to find a starting solution for the ℓ_p-norm problem by first solving an ℓ_2-norm problem, followed by a sequence of intermediate-norm problems, each solved to a constant-factor approximation. The following lemma relates these solutions.
Lemma 2.9.
Let denote the optimum of the -norm and the optimum of the -norm problem (3). Let be an -approximate solution to the -norm problem. The following relation holds,
In other words, is a -approximate solution to the -norm problem.
Proof.
The left side follows from optimality of . For the other side, we have the following relation,
∎
Consider the following procedure to obtain a starting point for the -norm problem.
Lemma 2.10.
Let be as returned by Algorithm 2. Suppose there exists an oracle that solves the residual problem for any norm , i.e., to a -approximation in time . We can then compute which is a -approximation to the -norm problem, in time at most
Proof.
In later sections, we will describe an oracle that will have for all values of and depends on linearly.
2.3 Iterative Refinement for p ∈ (1, 2)
We will consider the following pure problem here,
(4)
where the notation is as before. In the previous sections we saw an iterative refinement framework that worked for p ≥ 2. In this section, we will show a similar iterative refinement for p ∈ (1, 2). In particular, we will prove the following result from [Adi+19].
Theorem 2.11.
Let , and . Given an initial solution satisfying , we can find such that and in calls to a -approximate solver to the residual problem (Definition 2.13).
The key idea in the algorithm for p ≥ 2 was an upper and lower bound on the change in the objective in terms of a quadratic term plus an ℓ_p-norm term (Lemma 2.5). Such a bound does not hold when p < 2; however, we will show that a smoothed ℓ_p-norm function can be used to provide such bounds. Specifically, we use the following smoothed ℓ_p-norm function defined in [Bub+18].
Definition 2.12.
(Smoothed Function.) Let , and . We define,
For any vector and , we define .
We define the following residual problem for this section.
Definition 2.13.
For , we define the residual problem at any feasible to be,
where .
We will follow a similar structure as Section 2.1. We begin by proving analogues of Lemma 2.6 and Lemma 2.5.
Lemma 2.14.
Let . For any and ,
Proof.
We first show the following inequality holds for
(5)
Let us first show the left inequality, i.e. . Define the following function,
When , . The derivative of with respect to is, . Next let us see what happens when .
This implies that is an increasing function of and for which is where attains its minimum value. The only point where is 0 is . This implies . This concludes the proof of the left inequality. For the right inequality, define:
Note that and . We have,
and
Using this, we get, which says is positive for positive and negative for negative. Thus the minimum of is at 0 which is . So .
Before we prove the lemma, we will prove the following inequality for ,
(6)
for . So the claim clearly holds for since . When , , so the claim holds since,
We now prove the lemma.
Let . The term . Let us first look at the case when . We want to show,
This follows from Equation (5) and the facts and . We next look at the case when . Now, . We need to show
When it is trivially true. When , let
Now, taking the derivative with respect to we get,
We use the mean value theorem to get for ,
which implies in this range as well. When it follows from Equation (6) that . So the function is increasing for and decreasing for . The minimum value of is . It follows that which gives us the left inequality. The other side requires proving,
Define:
The derivative is non negative for and non positive for . The minimum value taken by is which is non negative. This gives us the right inequality.
∎
Lemma 2.15.
Let and . Then for any ,
and
Proof.
Applying Lemma 2.14 to all coordinates,
From the definition of the residual problem and the above equation, the first inequality of our lemma directly follows. To see the other inequality, from the above equation,
Here, we are using the following property of ,
∎
3 Fast Multiplicative Weight Update Algorithm for ℓ_p-norms
In this section, we will show how to solve the residual problem for p ≥ 2, as defined in the previous section (Definition 2.3), to a constant-factor approximation. The core of our approach is a multiplicative weight update routine with width reduction that is used to speed up the algorithm. This routine returns a constant-factor approximate solution using a number of calls to a linear system solver that is sublinear in the problem size. Such a width-reduced multiplicative weight update algorithm was first seen in the context of the maximum flow problem and related regression problems in the works of [Chr+11, Chi+13].
The first instance of such a width-reduced multiplicative weight update algorithm for ℓ_p-regression appeared in the work of [Adi+19]. In a further work, the authors improved the dependence on p in the runtime [Adi+21]. The following sections are based on the improved algorithm from [Adi+21].
3.1 Algorithm for ℓ_p-norm Regression
Recall that our residual problem for is defined as:
for some vector and matrices and . Also recall that in Algorithm 1, we used a parameter , which was used to track the value of at any iteration . We will now use this parameter to do a binary search on the linear term in and reduce the residual problem to,
(7)
for some constant . Further, we will use our multiplicative weight update solver to solve problems of this kind to a constant approximation. We start by proving the binary search results.
3.1.1 Binary Search
We first note that, if at iteration is such that , then from Lemma 2.6, the residual at has optimum value, . We now consider a parameter that has value between and such that . We have the following lemma that relates the optimum of problem of the type (7) with .
Lemma 3.1.
Let be such that the residual problem satisfies . The following problem has optimum at most .
(8)
Further, let be a solution to the above problem such that and for some . Then is a -approximation to the residual problem.
Proof.
We have assumed that,
Since the last terms are strictly non-positive, we must have, Since is the optimum and satisfies ,
Thus,
Since we get the following
Now, we know that, and . This gives,
Now, let be as described in the lemma. We have,
∎
3.1.2 Width-Reduced Approximate Solver
We are now finally ready to solve problems of the type (7). In this section, we will give an algorithm to solve the following problem,
(9)
Here , and vector . Our approach involves a multiplicative weight update method with a width reduction step which allows us to solve these problems faster.
3.1.3 Slow Multiplicative Weight Update Solver
We first give an informal analysis of the multiplicative weight update method without width reduction; a schematic code sketch follows the steps below. We will show that this method converges in a number of iterations proportional to the square root of the problem size. For simplicity, we fix the relevant parameters in Problem (9) and assume without loss of generality that the optimum is suitably normalized. Consider the following MWU algorithm, for a parameter to be set later:
1. Initialize all weights to one.
2. For each iteration t: solve the weighted quadratic subproblem defined by the current weights to obtain an update, and increase the weights multiplicatively according to the magnitudes of the update's coordinates.
3. Return the aggregate of the computed updates, suitably rescaled.
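A schematic Python sketch of this width-free loop follows. The weight-update rule, the aggregation of the iterates, and the inner solver are simplified placeholders; only the overall structure mirrors the steps above.

```python
# Schematic width-free MWU loop. `solve_weighted_quadratic(w)` stands for the
# weighted l2 subproblem solved in each iteration; the update rule and the
# rescaling of the returned aggregate are illustrative, not the paper's.
import numpy as np

def mwu_no_width_reduction(solve_weighted_quadratic, m, T, alpha):
    w = np.ones(m)                               # one weight per coordinate
    total = np.zeros(m)
    for _ in range(T):
        delta = solve_weighted_quadratic(w)      # weighted quadratic subproblem
        w = w * (1.0 + alpha * np.abs(delta))    # multiplicative weight update (schematic)
        total = total + delta
    return total / T                             # aggregated iterate (schematic rescaling)
```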
We claim that the above algorithm returns such that , i.e., a constant approximate solution to the residual problem, in iterations. We will bound the value of the returned solution, by looking at how grows with . From Lemma 2.5,
Observe that the third term on the right hand side is exactly the objective of the quadratic problem minimized to obtain . Using that must achieve a lower objective than , i.e., along with Hölder’s inequality and , we can bound this term by . We can further bound the second term in right hand side of the above inequality by the third term using Hölder’s inequality (refer to Proof of Lemma 3.3 for details). These bounds give,
Observe that the growth of is controlled by . We next see how large this quantity can be. Assume that, for all (one may verify in the end that this holds for all ). Since ,
where we used Hölder’s inequality in . This implies, . Now, for , and,
We can thus prove that,
as required. The total number of iterations is .
To obtain the improved rates of convergence via width reduction, our algorithm uses a hard threshold on the coordinates of the oracle's solution and performs a width reduction step whenever some coordinate exceeds the threshold. The analysis now additionally requires us to track how the potential changes with a width reduction step. Our analysis also tracks the value of an additional energy potential. The interplay of these two potentials, and balancing out their changes with respect to primal updates and width reduction steps, gives the improved rates of convergence.
3.1.4 Fast, Width-Reduced MWU Solver
In the previous section, we showed that a multiplicative weight update algorithm without width-reduction obtains a rate of convergence . In this section we will show how width-reduction allows for a faster rate of convergence. We now present the faster width-reduced algorithm. We will prove the following result.
Theorem 3.2.
In every iteration of the algorithm, we solve a weighted linear system. The returned solution is used to update the current iterate if it has a small norm. Otherwise, we do not update the solution, but instead increase the weights corresponding to the coordinates with large values by a constant factor. This step is referred to as the "width reduction step". The analysis is based on a potential function argument for specially defined potentials.
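The following Python sketch illustrates this primal-step/width-reduction structure. It is not a transcription of Algorithm 5: the oracle, the width threshold rho, and the update factors are placeholders for the quantities defined in the algorithm and its analysis.

```python
# Schematic width-reduced MWU loop: `oracle(w)` stands for the weighted linear
# system solve of Algorithm 4. The analysis bounds the number of width
# reduction steps, so the loop terminates; thresholds and factors here are
# illustrative.
import numpy as np

def width_reduced_mwu(oracle, m, T, alpha, rho):
    w = np.ones(m)
    x = np.zeros(m)
    t = 0
    while t < T:
        delta = oracle(w)
        large = np.abs(delta) > rho            # coordinates exceeding the width threshold
        if not large.any():
            # primal step: accept the update and adjust all weights multiplicatively
            x = x + delta
            w = w * (1.0 + alpha * np.abs(delta) / rho)
            t += 1
        else:
            # width reduction step: discard the update and boost the weights
            # of the offending coordinates by a constant factor
            w[large] = 2.0 * w[large]
    return x
```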
The following is the oracle used in the algorithm, i.e., the linear system we need to solve. We show in Appendix A how to implement the oracle using a linear system solver.
We now have the following multiplicative weight update algorithm given in Algorithm 5.
Notation
3.1.5 Analysis of Algorithm 5
Our analysis is based on tracking the following two potential functions. We will show how these potentials change with a primal step (Line 13) and a width reduction step (Line 18) of the algorithm. The proofs of these lemmas appear later in the section.
Finally, to prove our runtime bound, we will first show that if the total number of width reduction steps is not too large, then is bounded. We then prove that the number of width reduction steps cannot be too large by using the relation between and and their respective changes throughout the algorithm.
We now begin our analysis. The next two lemmas show how our potentials change with every iteration of the algorithm.
Lemma 3.3.
After primal steps, and width-reduction steps, provided , the potential is bounded as follows:
Lemma 3.4.
After primal steps and width reduction steps, if,
-
1.
, and
-
2.
,
then,
The next lemma gives a lower bound on the energy in the beginning and an upper bound on the energy at each step.
Lemma 3.5.
Let denote the number of primal steps and the number of width reduction steps. For any , we have,
3.1.6 Proof of Theorem 3.2
Proof.
Let be the solution returned by Algorithm 5. We first note that this satisfies the linear constraint required. We next bound the objective value at , i.e., and .
Suppose the algorithm terminates in primal steps and width reduction steps. We next note that our parameter values and are such that . We can now apply Lemma 3.3 to get,
We next observe from the weight and update steps in our algorithm that, . Thus,
We next bound the quadratic term. Let denote the solution returned by the oracle in iteration . Since for all iterations, we always have from Lemma 3.5 that, . We will first bound for every .
Now from convexity of , we get
We have shown that if the number of width reduction steps is bounded by then our algorithm returns the required solution. We will next prove that we cannot have more than width reduction steps.
Suppose to the contrary, the algorithm takes a width reduction step starting from step where and . Since the conditions for Lemma 3.3 hold for all preceding steps, we must have which combined with Lemma 3.5 implies . Using this bound on , we note that our parameter values satisfy the conditions of Lemma 3.4. From Lemma 3.4,
Since our parameter choices ensure ,
Since and , from Lemma 3.5,
which is a contradiction. We can thus conclude that we can never have more than width reduction steps, thus concluding the correctness of the returned solution. We next bound the number of oracle calls required. The total number of iterations is at most,
∎
3.1.7 Proof of Lemma 3.3
We first prove a simple lemma about the solution returned by the oracle, that we will use in our proof.
Lemma 3.6.
Let . For any , let be the solution returned by Algorithm 4. Then,
Proof.
Since is the solution returned by Algorithm 4, and satisfies the constraints of the oracle, we have,
In the last inequality we use,
Finally, using we have concluding the proof. ∎
See 3.3
Proof.
We prove this claim by induction. Initially, and and thus, the claim holds trivially. Assume that the claim holds for some We will use as an abbreviated notation for below.
Primal Step.
For brevity, we use to denote . If the next step is a primal step,
by Lemma 2.5 |
We next bound this term. Using the Cauchy-Schwarz inequality,
We thus have,
Using the above bound, we now have,
(since ) |
Recall Since , we have,
From the inductive assumption, we have
Thus,
proving the inductive claim.
Width Reduction Step.
Let be the solution returned by the oracle and denote the set of indices such that and , i.e., the set of indices on which the algorithm performs width reduction. We have the following:
where we use Lemma 3.6 for the second-to-last inequality. Also,
Again, since ,
proving the inductive claim. ∎
3.1.8 Proof of Lemma 3.4
See 3.4
Proof.
It will be helpful for our analysis to split the index set into three disjoint parts:
-
•
-
•
-
•
.
Firstly, we note
hence, using Assumption 2
This means,
Secondly we note that,
So then, using Assumption 1,
As , this implies . We note that in a width reduction step, the resistances change by a factor of 2. Thus, combining our last two observations, and applying Lemma C.1, we get
Finally, for the “primal step” case, we use the trivial bound from Lemma C.1, ignoring the second term,
∎
3.1.9 Proof of Lemma 3.5
See 3.5
Proof.
3.2 Complete Algorithm for ℓ_p-Regression
Recall our problem, (1),
We will now use all the tools and algorithms described so far to give a complete algorithm for the above problem. We will assume we have a starting solution satisfying the required bound, and for pure ℓ_p objectives, we will use the homotopy analysis from Section 2.2.
Our overall algorithm reduces the problem to solving the residual problem (Definition 2.3) approximately. In Sections 3.1.1 and 3.1.2, we give an algorithm to solve the residual problem by first doing a binary search on the linear term and then applying a multiplicative weight update routine to minimize these problems. We have the following result which follows from Lemma 3.1 and Theorem 3.2.
Corollary 3.7.
Proof.
Let be such that . Refer to Lemma 2.7 to see that this is the case in which we use the solution of the residual problem. Now, from Lemma 2.6 we know that the optimum of the residual problem satisfies . Since we vary the parameter to take all such values in the range, for one such value we must have the required bound. For such a value, consider problem (8). Using Algorithm 5 for this problem, from Theorem 3.2 we are guaranteed to find a solution such that and . Now from Lemma 3.1, we note that this is an approximate solution to the residual problem. Since Algorithm 5 requires a bounded number of calls to a linear system solver, and Algorithm 3 calls this algorithm a bounded number of times, we obtain the required runtime. ∎
We are now ready to prove our main result.
Theorem 3.8.
3.3 Complete Algorithm for Pure ℓ_p Objectives
Consider the special case when our problem is a pure ℓ_p-norm objective, i.e., Problem (3),
In Section 2.2 we described how to find a good starting point for such problems. Combining this algorithm with our algorithm for solving the residual problem we can obtain a complete algorithm for finding a good starting point. Specifically, we prove the following result.
Corollary 3.9.
Proof.
From Lemma 2.9 Algorithm 2 finds such a solution in time , where and denote the approximation and time to solve a norm problem. Now consider Algorithm 3 with Algorithm 5 as a subroutine. From Corollary 3.7, we can solve any -norm residual problem to a -approximation in calls to a linear system solver. We thus have for all and . Using these values, we obtain a runtime of,
∎
The following theorem gives a complete runtime for pure objectives.
Corollary 3.10.
4 Solving ℓ_p-norm Problems using ℓ_q-norm Oracles
In this section, we propose a new technique that allows us to solve ℓ_p-norm residual problems by instead solving ℓ_q-norm residual problems, without adding much to the runtime. Such a technique is not known for pure ℓ_p objectives without a large overhead in the runtime. As a consequence, we also obtain an algorithm for ℓ_p-regression with a linear runtime dependence on p, instead of the higher-order dependence of the algorithms from the previous sections, which had an extra factor resulting from solving the ℓ_p-norm residual problem directly. At a high level, we show that it is sufficient to solve an ℓ_q-norm residual problem when p is large, thus replacing this extra factor with a smaller one. We prove the following results, which are based on the proofs and results of [AS20].
Theorem 4.1.
Theorem 4.2.
4.1 Relation between Residual Problems for ℓ_p and ℓ_q Norms
In this section we prove how ℓ_q-norm residual problems can be used to solve ℓ_p-norm residual problems. This idea first appeared in the work of [AS20], where they also apply the results to the maximum flow problem. In this paper, we provide a much simpler proof of the main technical content and unify the two cases that were presented separately in previous works. We also unify the treatment of the decision versions of the residual problems (without the linear term) and of the entire objective. The results for the maximum flow problem and the ℓ_p-norm flow problem as described in the original paper still follow, and we refer the reader to the original paper for these applications. The main result of the section is as follows.
Theorem 4.3.
Let and be such that , where is the optimum of the -norm residual problem (Definition 2.3). The following -norm residual problem has optimum at least ,
(11)
Let and denote a feasible solution to the above -norm residual problem with objective value at least . For , gives a -approximate solution to the -norm residual problem .
Proof.
Consider , the optimum of the -norm residual problem. Note that is a feasible solution for all since We know that the objective is optimum for . Thus,
which gives us,
Rearranging,
Since , which implies
We also note that,
Combining these bounds, we obtain the optimum of (11) is at least,
Since the optimum of (11) is at least , there exists a feasible with objective value at least . We now prove the second part, that a scaling of gives a good approximation to the -norm residual problem. First, let us assume . Since has objective value at least ,
Thus, , and . Let , where . We will show that is a good solution to the -norm residual problem.
For the case , consider the vector where . This vector is still feasible for Problem (11) and and,
We can now repeat the same argument as above. ∎
4.2 Faster Algorithm for ℓ_p-Regression
In this section, we will combine the tools developed in the previous sections with the reduction of Section 4.1 to obtain an algorithm for Problem (1) with an improved number of calls to a linear system solver. For pure ℓ_p objectives, we can further combine our algorithm with the algorithm of Section 2.2 to obtain the claimed rate of convergence.
Lemma 4.4.
Let . Algorithm 7 returns an -approximate solution to the -residual problem at in at most calls to a linear system solver.
Proof.
Let be such that . Refer to Lemma 2.7 to see that this is the case in which we use the solution of the residual problem. Now, from Lemma 2.6 we know that the optimum of the residual problem satisfies . Since we vary the parameter to take all such values in the range, for one such value we must have the required bound. For such a value, consider the ℓ_q-norm residual problem (11). Using Algorithm 5 for this problem, from Theorem 3.2 we are guaranteed to find a solution such that and . Now from Lemma 3.1, we note that this is an approximate solution to the corresponding residual problem. We now use Theorem 4.3, which states that this yields an approximate solution to the required ℓ_p-norm residual problem.
See 4.1
Proof.
We note that Algorithm 6 is essentially Algorithm 1 which calls different residual solvers depending on the value of . If , from Theorem 3.8, we obtain the required solution in calls to a linear system solver. If , from Lemma 4.4, we obtain an approximate solution to the residual problem at any iteration in calls to a linear system solver. Combining this with Theorem 2.1, we obtain our result. ∎
See 4.2
Proof.
From Lemma 2.9 we can find an -approximation to the above problem in time
where is the approximation to which we solve the residual problem for the -norm problem and is the time required to do so. If , we use Algorithm 7 to solve such residual problems. Thus and . If , we can use Algorithm 3 and , . Thus, the total runtime is . We now combine this with Theorem 4.1 to obtain the required rates of convergence. ∎
5 Speedups for General Matrices via Inverse Maintenance
Inverse maintenance was first introduced by Vaidya in 1990 [Vai90] for speeding up algorithms for minimum cost and multicommodity flow problems. The key idea is to reuse the inverses of matrices, which is possible due to the controllable rates at which variables are updated in some algorithms. In the work of [Adi+19], the authors design a new inverse maintenance algorithm that can solve ℓ_p-regression for any p almost as fast as linear regression. This section is based on Section 6 of [Adi+19], and we give a more fine-grained and simplified analysis of the original result. In particular, we simplify the proofs and state the result with explicit dependencies on both matrix dimensions, as opposed to just the larger dimension.
Our inverse maintenance procedure is based on the same high-level ideas of combining low-rank updates and fast matrix multiplication as in [Vai90] and [LS15]. However, recall that the rate of convergence of our algorithm is controlled by two potentials which change at different rates, based on the two different kinds of weight update steps in our algorithm. In order to handle these updates, our inverse maintenance algorithm uses a new fine-grained bucketing scheme, inspired by lazy updates in data structures, and is different from previous works on inverse maintenance, which usually update weights based on fixed thresholds. Our scheme is also simpler than those used in [Vai90, LS15]. We now present our algorithm in detail.
Consider the weighted linear system being solved at each iteration of Algorithm 5. Each weighted linear system is of the form,
where . From Equation (15) in Section 3, the solution of the above linear system is given by,
In order to compute the above expression, we require the following products in order. The runtimes are considering the fact .
-
•
and : require time and respectively
-
•
: requires time
-
•
and : require time
-
•
: requires time
-
•
: requires time
The cost of solving the above problem is dominated by the first step, and we thus require time , where . This directly gives the runtime of Algorithm 5 to be . In this section, we show that we can implement Algorithm 5 in time similar to solving a system of linear equations for all . In particular, we prove the following result.
Theorem 5.1.
5.1 Inverse Maintenance Algorithm
We first note that the weights, and thus the resistances, are monotonically increasing. Our algorithm in Section 3.1.2 updates both in every iteration. Here, we will instead update them only when there is a significant increase in their values, giving a lazy update scheme. The update can be performed via the following consequence of the Woodbury matrix formula. The main idea is that we initially compute the inverse of the required matrix explicitly; afterwards, we batch-update the coordinates that have increased significantly, and since the remaining coordinates stay within a constant-factor approximation of their stored values, we can use the maintained inverse as a preconditioner to solve the new linear systems quickly.
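The following sketch illustrates the lazy update rule, assuming a drift factor of 2; the paper's actual bucketing scheme, described below, is more fine-grained than this simplification.

```python
# Minimal sketch of the lazy update rule: entries whose resistance has drifted
# by more than a constant factor since the last rebuild are batched into a
# low-rank correction of the stored inverse (see the Woodbury-based sketch
# following the proof of Lemma 5.2); all other entries keep their stored value,
# so the stored inverse remains a constant-factor preconditioner. The factor 2
# and the data layout are illustrative choices, not the paper's bucketing scheme.
import numpy as np

def entries_to_update(r_new, r_stored, factor=2.0):
    return np.flatnonzero(r_new > factor * r_stored)

def lazy_update(r_new, r_stored, apply_low_rank_update):
    idx = entries_to_update(r_new, r_stored)
    if idx.size > 0:
        apply_low_rank_update(idx, r_new[idx])   # Woodbury-style correction
        r_stored[idx] = r_new[idx]
    return r_stored
```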
5.1.1 Low Rank Update
The following lemma is the same as Lemma 6.2 of [Adi+19].
Lemma 5.2.
Given matrices , and vectors and that differ in entries, as well as the matrix , we can construct in time.
Proof.
Let denote the entries that differ in and . Then we have
This is a low-rank perturbation, so by the Woodbury matrix identity we get:
where we use because is a symmetric matrix. To explicitly compute this matrix, we need to:
-
1.
compute the matrix ,
-
2.
compute
-
3.
invert the middle term.
This cost is dominated by the first term, which can be viewed as multiplying pairs of and matrices. Each such multiplication takes time , for a total cost of . The other terms all involve matrices with dimension at most , and are thus lower order terms. ∎
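A numpy sketch of this low-rank update via the Sherman-Morrison-Woodbury identity is given below; it assumes the maintained matrix has the form (A diag(r)^{-1} A^T)^{-1}, which may differ from the paper's exact normalization.

```python
# Sketch of the low-rank inverse update of Lemma 5.2. Z is the stored inverse
# of A diag(1/r_old) A^T; the resistances change only on the index set S (all
# entries of S are assumed to have actually changed, so the diagonal below is
# nonzero).
import numpy as np

def woodbury_update(Z, A, r_old, r_new, S):
    # A diag(1/r_new) A^T = A diag(1/r_old) A^T + A_S D A_S^T, with |S| = k.
    A_S = A[:, S]                                   # d x k
    d_vec = 1.0 / r_new[S] - 1.0 / r_old[S]         # length-k diagonal of D
    ZA_S = Z @ A_S                                  # d x k
    middle = np.diag(1.0 / d_vec) + A_S.T @ ZA_S    # k x k system
    correction = ZA_S @ np.linalg.solve(middle, ZA_S.T)
    return Z - correction                           # updated inverse
```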
5.1.2 Approximation and Fast Linear Systems Solver
We now define the notion of approximation we use and how to solve linear systems fast given a good preconditioner.
Definition 5.3.
We use for positive numbers and iff , and for vectors and for vectors and we use to denote entry-wise.
In our algorithm, we only update resistances that have increased by a constant factor. We can therefore use a constant factor preconditioner to solve the new linear system. We will use the following result on solving preconditioned systems of linear equations.
Lemma 5.4.
If and are vectors such that , and we’re given the matrix explicitly, then we can solve a system of linear equations involving to accuracy in time.
Proof.
Suppose we want to solve the system,
We know and that for some constant , . The following iterative method (which is essentially gradient descent),
converges to an -approximate solution in iterations. Each iteration can be computed via matrix-vector products. Since matrix-vector products for such matrices require at most the stated time, we get the above lemma. ∎
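Here is a minimal sketch of such a preconditioned iteration, with the stored approximate inverse Z playing the role of the preconditioner; the step size and iteration count are illustrative choices.

```python
# Sketch of the preconditioned iteration from the proof above: to solve
# M x = c, where M is the current weighted matrix and Z is a constant-factor
# spectral approximation of its inverse, iterate x <- x + step * Z (c - M x).
import numpy as np

def preconditioned_solve(apply_M, Z, c, iters=50, step=0.5):
    x = np.zeros_like(c)
    for _ in range(iters):
        x = x + step * (Z @ (c - apply_M(x)))   # preconditioned residual correction
    return x
```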
5.1.3 Algorithm
The algorithm is the same as that in Section 6 of [Adi+19]. It has two parts: an initialization routine InverseInit, which is called only at the first iteration, and the inverse maintenance procedure UpdateInverse, which is called from Algorithm 4 (Oracle). Algorithm Oracle is called every time the resistances are updated in Algorithm 5. For this section, we will assume access to all variables from these routines, and maintain the following global variables:
-
1.
: resistances from the last time we updated each entry.
-
2.
: for each entry, track the number of times that it changed (relative to ) by a factor of about since the previous update.
-
3.
, an inverse of the matrix given by .
5.1.4 Analysis
We first verify that the maintained inverse is always a good preconditioner to the actual matrix, .
Lemma 5.5 (Lemma 6.5, [Adi+19]).
After each call to UpdateInverse, the vector satisfies
Proof.
First, observe that any change in resistance exceeding is reflected immediately. Otherwise, every time we update , can only increase additively by at most
Once exceeds , will be added to after at most steps. So when we start from , is added to after iterations. The maximum possible increase in resistance due to the bucket is,
Since there are only at most iterations, the contributions of buckets with are negligible. Now the change in resistance is influenced by all buckets , each contributing at most increase. The total change is at most since there are at most buckets. We therefore have
for every . ∎
It remains to bound the number and sizes of calls made to Lemma 5.2. For this we define variables to denote the number of edges added to at iteration due to the value of . Note that is non-zero only if , and
We divide our analysis into two cases: when the relative change in resistance is large and when it is small. To begin with, let us first look at the following lemma, which relates the change in weights to the relative change in resistance.
Lemma 5.6.
Proof.
We now consider the case when the relative change in resistance is at least .
Lemma 5.7.
Throughout the course of a run of Algorithm 5, the number of edges added to due to relative resistance increase of at least ,
Proof.
From Lemma C.1, we know that the change in energy over one iteration is at least,
Over all iterations, the change in energy is at least,
which is upper bounded by . When iteration is a width reduction step, the relative resistance change is always at least . In this case . When we have a primal step, Lemma 5.6 implies that when the relative change in resistance is at least then,
Using the bound is sufficient since and both kinds of iterations are accounted for. The total change in energy can now be bounded.
The Lemma follows by substituting in the above equation. ∎
Lemma 5.8.
Throughout the course of a run of Algorithm 5, the number of edges added to due to relative resistance increase between and ,
Proof.
From Lemma C.1, the total change in energy is at least,
We know that . Using Lemma 5.6, we have,
We thus obtain,
Now, in the second case, when and ,
Therefore, for both cases we have,
Using the above bound and the fact that the total change in energy is at most , gives,
The Lemma follows substituting in the above equation. ∎
We can now use the concavity of to upper bound the contribution of these terms.
Corollary 5.9.
Let be as defined. Over all iterations we have,
and for every ,
Proof.
Due to the concavity of the power, this total is maximized when it is distributed equally over all iterations. In the first sum, the number of terms is equal to the number of iterations. In the second sum, the number of terms is as stated. Distributing the sum equally over the above numbers gives,
and
∎
5.2 Proof of Theorem 5.1
See 5.1
Proof.
By Lemma 5.5, the resistances that the maintained inverse corresponds to always satisfy the required approximation. So by the iterative linear system solver method outlined in Lemma 5.4, we can implement each call to Oracle (Algorithm 4) in the stated time, in addition to the cost of performing inverse maintenance. This leads to a total cost of
across the iterations.
The cost of inverse maintenance is dominated by the calls to the low-rank update procedure outlined in Lemma 5.2. Its total cost is bounded by
Because there are only values of , and each is non-negative, we can bound the total cost by:
where the inequality follows from substituting in the result of Corollary 5.9. Depending on the sign of the exponent, this sum is dominated at one of the two endpoints. Including both terms then gives
with the exponent on the trailing term simplifying to to give,
∎
6 Iteratively Reweighted Least Squares Algorithm
Iteratively Reweighted Least Squares (IRLS) algorithms are a family of algorithms for solving ℓ_p-regression. These algorithms have been studied extensively for about 60 years [Law61, Ric64, GR97], and the classical form solves the following version of ℓ_p-regression,
min_x ‖Ax − b‖_p^p,    (12)
where A is a tall thin matrix and b is a vector. The main idea in IRLS algorithms is to solve a weighted least squares problem in every iteration to obtain the next iterate,
x^(t+1) = argmin_x (Ax − b)^⊤ R^(t) (Ax − b),    (13)
starting from an initial point x^(0), which is usually the least squares solution. Here R^(t) is picked to be diag(|Ax^(t) − b|^(p−2)), so that the above equation becomes a fixed point iteration for the ℓ_p-regression problem. It is known that the fixed point is unique for p strictly between 1 and ∞.
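For reference, here is a minimal numpy implementation of this classical IRLS scheme; the residual floor delta is an implementation convenience to avoid division issues, not part of the classical algorithm.

```python
# Classical IRLS for min_x ||Ax - b||_p: repeatedly solve a weighted least
# squares problem with weights |Ax - b|^(p-2).
import numpy as np

def irls_classical(A, b, p, iters=100, delta=1e-12):
    x = np.linalg.lstsq(A, b, rcond=None)[0]         # start from the l2 solution
    for _ in range(iters):
        r = np.abs(A @ x - b)
        w = np.maximum(r, delta) ** (p - 2)          # IRLS weights
        AW = A * w[:, None]                          # rows of A scaled by the weights
        x = np.linalg.solve(A.T @ AW, AW.T @ b)      # weighted normal equations
    return x
```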
The basic version of the above IRLS algorithm is guaranteed to converge only for a limited range of p, and even for moderate values of p the algorithm can diverge [RCL19]. Over the years there have been several studies of IRLS algorithms and attempts to show convergence [Kar70, Osb85], but these either do not show quantitative bounds or require starting solutions close enough to the optimum. Refer to [Bur12] for a complete survey of these methods.
In this section we propose an IRLS algorithm and prove that it is guaranteed to converge geometrically to the optimum. Our algorithm is based on the algorithm of [APS19], and we present some experimental results from that paper demonstrating that the algorithm works very well in practice. We provide a much simpler analysis and integrate it with the framework we have built so far.
We will focus on the following pure setting for better readability,
We note that our algorithm also works for the setting described in Equation (12). We will first describe our algorithm in the next section, and then present some experimental results from [APS19].
6.1 IRLS Algorithm
Our IRLS algorithm is based on our overall iterative refinement framework (Algorithm 1), where we directly use a weighted least squares problem to solve the residual problem. Consider Algorithm 10 and compare it with Algorithm 1. It is the same overall, except that we now have an extra LineSearch step and we update the solution (Line 7) in every iteration. These steps do not affect the overall convergence guarantees of the iterative refinement framework in Algorithm 1, since they only ensure that, given a solution from ResidualSolver-IRLS, we take the step that reduces the objective value the most, as opposed to the fixed update defined in Algorithm 1. In other words, we reduce the objective value in each iteration at least as much as in Algorithm 1. It thus remains to prove the guarantees of ResidualSolver-IRLS (Algorithm 11) and combine them with Theorem 2.1 to obtain our final convergence guarantees. We will prove the following result about our IRLS algorithm (Algorithm 10).
See 1.3. The key connection with IRLS algorithms is that we are able to show that it is sufficient to solve a weighted least squares problem in order to solve the residual problem. The two main differences from the classical scheme are that in every iteration we add a small systematic padding to the weights, and that we perform a line search. These tricks are common empirical modifications used to avoid ill-conditioning of matrices and to obtain faster convergence [Kar70, VB99].
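To illustrate the two modifications, here is a schematic numpy sketch of an IRLS iteration with padded weights and a line search; the padding rule, line-search grid, and stopping criterion are illustrative placeholders and not the parameters of Algorithm 10 or p-IRLS.

```python
# Schematic IRLS variant with weight padding and a line search over a
# geometric grid of step sizes.
import numpy as np

def irls_with_padding_and_linesearch(A, b, p, iters=50, pad=1e-3):
    f = lambda x: np.sum(np.abs(A @ x - b) ** p)
    x = np.linalg.lstsq(A, b, rcond=None)[0]
    for _ in range(iters):
        r = np.abs(A @ x - b)
        w = r ** (p - 2) + pad * np.mean(r ** (p - 2))   # padded IRLS weights
        AW = A * w[:, None]
        x_ls = np.linalg.solve(A.T @ AW, AW.T @ b)       # weighted LS solution
        d = x_ls - x
        steps = [2.0 ** (-k) for k in range(12)]         # line-search grid
        alpha = min(steps, key=lambda a: f(x + a * d))
        x = x + alpha * d
    return x
```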
We will prove the following result about solving the residual problem.
Lemma 6.1.
We note that Theorem 1.3 directly follows from Lemma 6.1, Lemma 2.7 and Theorem 2.1. Therefore, in the next section, we will prove Lemma 6.1.
6.1.1 Solving the Residual Problem
6.1.2 Proof of Lemma 6.1
Proof.
Since , from Lemma 2.6, we have that the optimum of the residual problem satisfies . We will next prove that the objective of (14) at the optimum is at most and at least . Before proving this bound, we will show how it gives the required approximation to the residual problem. We have .
It remains to prove the bound on the optimal objective of (14) and bound for which it is sufficient to find an upper bound on ,
We will first bound . Since, ,
and
Therefore it is sufficient to bound , as
To bound , we start by assuming . Now, since optimal objective of (14) is lower bounded by ,
We thus have,
Using this we get,
We thus have lower bounded by , which gives us our result. It remains to give a lower bound to the optimal objective of (14).
6.2 Experiments
In this section, we include the experimental results from [APS19], which are based on the algorithm p-IRLS described in that paper. We would like to mention that p-IRLS is similar in spirit to Algorithm 10, and we thus expect similar performance from an implementation of Algorithm 10. Algorithm p-IRLS is described for the setting of (12) and is available at https://github.com/fast-algos/pIRLS [APS19a]. We now give a brief summary of the experiments.
6.2.1 Experiments on p-IRLS
All implementations were done in MATLAB 2018b on a desktop Ubuntu machine with an Intel Core CPU and 4GB RAM. The two kinds of instances considered are Random Matrix and Graph instances for the problem .
1. Random Matrices: The matrices and are generated randomly, i.e., every entry of the matrix is chosen uniformly at random between and . (A brief instance-generation sketch follows this list.)
2. Graphs: Instances are generated as in [RCL19]. Vertices are uniform random vectors in , and edges are created by connecting the nearest neighbors. The weight of every edge is determined by a Gaussian function (Eq. 3.1, [RCL19]). Around 10 vertices have labels chosen uniformly at random between and . The problem is to minimize the Laplacian objective; Appendix B contains details on how to formulate this problem in our standard form. These instances were generated using the code by [Rio19].
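As a concrete illustration of the first family of instances, the sketch below generates a random matrix instance and runs the IRLS sketches from Section 6.1 on it. The dimensions and the sampling range [0, 1] are assumptions, since the exact values are not stated above; this is not the experimental setup of [APS19] itself.

```python
import numpy as np

def random_instance(m=1000, n=100, seed=0):
    # Entries drawn uniformly at random; the range [0, 1] and the dimensions
    # are assumptions, since the exact values are not stated above.
    rng = np.random.default_rng(seed)
    A = rng.uniform(0.0, 1.0, size=(m, n))
    b = rng.uniform(0.0, 1.0, size=m)
    return A, b

A, b = random_instance()
x = lp_irls(A, b, p=8, solve_residual=solve_residual)   # sketches from Section 6.1
```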
References
- [ABS21] Deeksha Adil, Brian Bullins and Sushant Sachdeva “Unifying Width-Reduced Methods for Quasi-Self-Concordant Optimization” In arXiv preprint arXiv:2107.02432, 2021
- [Adi+19] Deeksha Adil, Rasmus Kyng, Richard Peng and Sushant Sachdeva “Iterative Refinement for -norm Regression” In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, 2019, pp. 1405–1424 SIAM
- [Adi+21] Deeksha Adil, Brian Bullins, Rasmus Kyng and Sushant Sachdeva “Almost-Linear-Time Weighted -Norm Solvers in Slightly Dense Graphs via Sparsification” In 48th International Colloquium on Automata, Languages, and Programming (ICALP 2021), 2021 Schloss Dagstuhl-Leibniz-Zentrum für Informatik
- [AL11] Morteza Alamgir and Ulrike Luxburg “Phase transition in the family of p-resistances” In Advances in neural information processing systems 24, 2011, pp. 379–387
- [All+17] Zeyuan Allen-Zhu, Yuanzhi Li, Rafael Oliveira and Avi Wigderson “Much faster algorithms for matrix scaling” In 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS), 2017, pp. 890–901 IEEE
- [APS19] Deeksha Adil, Richard Peng and Sushant Sachdeva “Fast, Provably Convergent IRLS Algorithm for p-norm Linear Regression” In Advances in Neural Information Processing Systems, 2019, pp. 14189–14200
- [APS19a] Deeksha Adil, Richard Peng and Sushant Sachdeva “pIRLS” In https://github.com/fast-algos/pIRLS GitHub, https://github.com/fast-algos/pIRLS, 2019
- [AS20] Deeksha Adil and Sushant Sachdeva “Faster p-norm Minimizing Flows, via Smoothed q-norm Problems” In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, 2020, pp. 892–910 SIAM
- [BB94] Jose Antonio Barreto and C Sidney Burrus “Lp-complex approximation using iterative reweighted least squares for FIR digital filters” In Proceedings of ICASSP’94. IEEE International Conference on Acoustics, Speech and Signal Processing 3, 1994, pp. III–545 IEEE
- [Bub+18] Sébastien Bubeck, Michael B Cohen, Yin Tat Lee and Yuanzhi Li “An homotopy method for regression provably beyond self-concordance and in input-sparsity time” In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing, 2018, pp. 1130–1137
- [Bul18] Brian Bullins “Fast minimization of structured convex quartics” In arXiv preprint arXiv:1812.10349, 2018
- [Bur12] C Burrus “Iterative re-weighted least-squares OpenStax-CNX”, 2012
- [Cal19] Jeff Calder “Consistency of Lipschitz learning with infinite unlabeled data and finite labeled data” In SIAM Journal on Mathematics of Data Science 1.4 SIAM, 2019, pp. 780–812
- [Car+20] Yair Carmon, Arun Jambulapati, Qijia Jiang, Yujia Jin, Yin Tat Lee, Aaron Sidford and Kevin Tian “Acceleration with a Ball Optimization Oracle” In Advances in Neural Information Processing Systems 33 Curran Associates, Inc., 2020, pp. 19052–19063 URL: https://proceedings.neurips.cc/paper/2020/file/dba4c1a117472f6aca95211285d0587e-Paper.pdf
- [Che+22] Li Chen, Rasmus Kyng, Yang P Liu, Richard Peng, Maximilian Probst Gutenberg and Sushant Sachdeva “Maximum flow and minimum-cost flow in almost-linear time” In arXiv preprint arXiv:2203.00671, 2022
- [Chi+13] Hui Han Chin, Aleksander Madry, Gary L Miller and Richard Peng “Runtime guarantees for regression problems” In Proceedings of the 4th conference on Innovations in Theoretical Computer Science, 2013, pp. 269–282
- [Chi+17] Flavio Chierichetti, Sreenivas Gollapudi, Ravi Kumar, Silvio Lattanzi, Rina Panigrahy and David P Woodruff “Algorithms for low-rank approximation” In International Conference on Machine Learning, 2017, pp. 806–814 PMLR
- [Chr+11] Paul Christiano, Jonathan A Kelner, Aleksander Madry, Daniel A Spielman and Shang-Hua Teng “Electrical flows, laplacian systems, and faster approximation of maximum flow in undirected graphs” In Proceedings of the forty-third annual ACM symposium on Theory of computing, 2011, pp. 273–282
- [CT05] Emmanuel J Candes and Terence Tao “Decoding by linear programming” In IEEE transactions on information theory 51.12 IEEE, 2005, pp. 4203–4215
- [CY08] Rick Chartrand and Wotao Yin “Iteratively reweighted algorithms for compressive sensing” In 2008 IEEE international conference on acoustics, speech and signal processing, 2008, pp. 3869–3872 IEEE
- [EDT17] Abderrahim Elmoataz, X Desquesnes and M Toutain “On the game p-Laplacian on weighted graphs with applications in image processing and data clustering” In European Journal of Applied Mathematics 28.6 Cambridge University Press, 2017, pp. 922–948
- [ETT15] Abderrahim Elmoataz, Matthieu Toutain and Daniel Tenbrinck “On the p-Laplacian and -Laplacian on graphs with applications in image and data processing” In SIAM Journal on Imaging Sciences 8.4 SIAM, 2015, pp. 2412–2451
- [EV19] Alina Ene and Adrian Vladu “Improved Convergence for and Regression via Iteratively Reweighted Least Squares” In International Conference on Machine Learning, 2019, pp. 1794–1801 PMLR
- [GB08] M. Grant and S. Boyd “Graph implementations for nonsmooth convex programs” http://stanford.edu/~boyd/graph_dcp.html In Recent Advances in Learning and Control, Lecture Notes in Control and Information Sciences Springer-Verlag Limited, 2008, pp. 95–110
- [GB14] M. Grant and S. Boyd “CVX: Matlab Software for Disciplined Convex Programming, version 2.1”, http://cvxr.com/cvx, 2014
- [GR97] Irina F Gorodnitsky and Bhaskar D Rao “Sparse signal reconstruction from limited data using FOCUSS: A re-weighted minimum norm algorithm” In IEEE Transactions on signal processing 45.3 IEEE, 1997, pp. 600–616
- [HFE18] Yosra Hafiene, Jalal Fadili and Abderrahim Elmoataz “Nonlocal -Laplacian Variational problems on graphs” In arXiv preprint arXiv:1810.12817, 2018
- [JLS22] Arun Jambulapati, Yang P Liu and Aaron Sidford “Improved iteration complexities for overconstrained p-norm regression” In Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, 2022, pp. 529–542
- [Kar70] LA Karlovitz “Construction of nearest points in the , even, and norms. I” In Journal of Approximation Theory 3.2 Academic Press, 1970, pp. 123–127
- [KLS20] Tarun Kathuria, Yang P Liu and Aaron Sidford “Unit Capacity Maxflow in Almost Time” In 2020 IEEE 61st Annual Symposium on Foundations of Computer Science (FOCS), 2020, pp. 119–130 IEEE
- [Kyn+15] R. Kyng, A. Rao, S. Sachdeva and D. Spielman “Algorithms for Lipschitz learning on graphs” In COLT, 2015
- [Kyn+19] Rasmus Kyng, Richard Peng, Sushant Sachdeva and Di Wang “Flows in almost linear time via adaptive preconditioning” In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, 2019, pp. 902–913
- [Law61] Charles Lawrence Lawson “Contribution to the theory of linear least maximum approximation” In Ph. D. dissertation, Univ. Calif., 1961
- [LS14] Yin Tat Lee and Aaron Sidford “Path Finding Methods for Linear Programming: Solving Linear Programs in Iterations and Faster Algorithms for Maximum Flow” Available at http://arxiv.org/abs/1312.6677 and http://arxiv.org/abs/1312.6713 In Foundations of Computer Science (FOCS), 2014 IEEE 55th Annual Symposium on, 2014, pp. 424–433 IEEE
- [LS15] Yin Tat Lee and Aaron Sidford “Efficient Inverse Maintenance and Faster Algorithms for Linear Programming” Available at: https://arxiv.org/abs/1503.01752 In IEEE 56th Annual Symposium on Foundations of Computer Science, FOCS 2015, Berkeley, CA, USA, 17-20 October, 2015, 2015, pp. 230–249
- [LSW15] Yin Tat Lee, Aaron Sidford and Sam Chiu-wai Wong “A faster cutting plane method and its implications for combinatorial and convex optimization” In 2015 IEEE 56th Annual Symposium on Foundations of Computer Science, 2015, pp. 1049–1065 IEEE
- [NN94] Yurii Nesterov and Arkadii Nemirovskii “Interior-point polynomial algorithms in convex programming” SIAM, 1994
- [Osb85] Michael Robert Osborne “Finite algorithms in optimization and data analysis” John Wiley & Sons, Inc., 1985
- [RCL19] Mauricio Flores Rios, Jeff Calder and Gilad Lerman “Algorithms for -based semi-supervised learning on graphs” In arXiv preprint arXiv:1901.05031, 2019
- [Ric64] John Rischard Rice “The approximation of functions” Addison-Wesley Reading, Mass., 1964
- [Rio19] M. Rios “LaplacianLpGraphSSL” In GitHub repository GitHub, https://github.com/mauriciofloresML/Laplacian_Lp_Graph_SSL, 2019
- [SV16] Damian Straszak and Nisheeth K Vishnoi “IRLS and slime mold: Equivalence and convergence” In arXiv preprint arXiv:1601.02712, 2016
- [SV16a] Damian Straszak and Nisheeth K Vishnoi “Natural algorithms for flow problems” In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, 2016, pp. 1868–1883 SIAM
- [SV16b] Damian Straszak and Nisheeth K. Vishnoi “On a Natural Dynamics for Linear Programming” In Proceedings of the 2016 ACM Conference on Innovations in Theoretical Computer Science, 2016
- [Vai90] P Vaidya “Solving linear equations with diagonally dominant matrices by constructing good preconditioners”, 1990
- [VB99] Ricardo A Vargas and Charles S Burrus “Adaptive iterative reweighted least squares design of Lp FIR filters” In 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No. 99CH36258) 3, 1999, pp. 1129–1132 IEEE
Appendix A Solving Problems under Subspace Constraints
We will show how to solve general problems of the following form using a linear system solver.
We first write the Lagrangian of the problem,
Using Lagrangian duality and noting that strong duality holds, we can write the above as,
We first find the minimizer of the above objective by setting the gradient with respect to to zero. We thus have,
Using this value of we arrive at the following dual program.
which is optimized at,
Strong duality also implies that is optimized at , which gives us,
We now note that we can compute by solving the following linear systems in order:
1. Find the inverse of .
2. Solve the linear system arising from the dual program.
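Since the displayed formulation and the dual program are elided above, the following sketch fixes one concrete instance of the pattern: minimizing a diagonal quadratic x^T R x - g^T x subject to a subspace constraint C x = d. Under this assumption (the function name and the exact form are illustrative, not the paper's), the two linear solves in the enumeration above correspond to applying R^{-1} and solving the dual system in the Lagrange multipliers.

```python
import numpy as np

def constrained_weighted_solve(R, g, C, d):
    """Assumed form:  min_x  x^T diag(R) x - g^T x   subject to  C x = d.
    Lagrangian duality reduces this to two linear solves."""
    Rinv = 1.0 / R                               # step 1: invert the diagonal matrix
    # Step 2: solve the dual system  C R^{-1} C^T nu = C R^{-1} g - 2 d.
    M = C @ (Rinv[:, None] * C.T)
    nu = np.linalg.solve(M, C @ (Rinv * g) - 2.0 * d)
    # Strong duality: recover the primal optimum  x = (1/2) R^{-1} (g - C^T nu).
    return 0.5 * Rinv * (g - C.T @ nu)
```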
Appendix B Converting -Laplacian Minimization to Regression Form
Define the following terms:
- denote the number of vertices.
- denote the number of labels.
- denote the edge-vertex adjacency matrix.
- denote the vector of labels for the labelled vertices.
- denote the diagonal matrix with weights of the edges.
Set and . Now is equal to the Laplacian objective, and we can use our IRLS algorithm from Section 6 to find its minimizer.
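The symbols in the reduction above are elided, so the following sketch spells out one standard way to realize it; the argument names (edge-vertex incidence matrix B, edge weights w, labelled indices) are assumptions consistent with the definitions listed above, not the paper's exact notation.

```python
import numpy as np

def plaplacian_to_regression(B, w, labels, labeled_idx, p):
    """Assumed reduction:  sum_e w_e |x_u - x_v|^p = || W^{1/p} B x ||_p^p,
    with B the edge-vertex incidence matrix and W = diag(w). Fixing the
    labelled coordinates and moving them to the right-hand side yields an
    instance of  min ||A x - b||_p^p  over the unlabelled vertices."""
    n = B.shape[1]
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)
    S = (w ** (1.0 / p))[:, None] * B            # W^{1/p} B
    A = S[:, unlabeled_idx]                      # columns of the free vertices
    b = -S[:, labeled_idx] @ labels              # labelled vertices move to b
    return A, b
```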
Appendix C Increasing Resistances
We first prove the following lemma that shows how much changes with a change in resistance.
Lemma C.1.
Let . Then one has for any and such that ,
Proof.
For this proof, we use .
Constructing the Lagrangian and noting that strong duality holds,
Optimality conditions with respect to give us,
Substituting this in gives us,
Optimality conditions with respect to now give us,
which upon re-substitution gives,
We also note that
(15)
We now want to see what happens when we change . Let denote the diagonal matrix with entries and let , where is the diagonal matrix with the changes in the resistances. We will use the following version of the Sherman-Morrison-Woodbury formula multiple times,
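The displayed identity does not survive in the text above; for reference, a standard form of the Sherman–Morrison–Woodbury identity (the exact variant used in the proof may differ) is, for generic matrices M, U, C, V unrelated to the regression matrix,
\[(M + UCV)^{-1} \;=\; M^{-1} - M^{-1}U\left(C^{-1} + VM^{-1}U\right)^{-1}VM^{-1},\]
valid whenever M, C, and C^{-1} + VM^{-1}U are invertible.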
We begin by applying the above formula for , , and . We thus get,
(16)
We next claim that,
which gives us,
(17)
This further implies,
(18)
We apply the Sherman-Morrison-Woodbury formula again for , , and . Let us look at the term .
Using this, we get,
which on multiplying by and gives,
We note from Equation (15) that . We thus have,
∎