An active-set method for sparse approximations
Part II: General piecewise-linear terms
Abstract
In this paper we present an efficient active-set method for the solution of convex quadratic programming problems with general piecewise-linear terms in the objective, with applications to sparse approximations and risk minimization. The method exploits the structure of the piecewise-linear terms appearing in the objective in order to significantly reduce its memory requirements, and thus improve its efficiency. We showcase the robustness of the proposed solver on a variety of problems arising in risk-averse portfolio selection, quantile regression, and binary classification via linear support vector machines. We provide computational evidence, on real-world datasets, demonstrating the ability of the solver to efficiently handle a variety of problems, by comparing it against an efficient general-purpose interior point solver as well as a state-of-the-art alternating direction method of multipliers. This work complements the accompanying paper [“An active-set method for sparse approximations. Part I: Separable terms”, S. Pougkakiotis, J. Gondzio, D. S. Kalogerias], in which we discuss the case of separable terms, analyze the convergence, and propose general-purpose preconditioning strategies for the solution of the associated linear systems.
1 Introduction
In this paper we are interested in the solution of the following optimization problem
(1.1) |
where , is a positive semi-definite matrix, , , is a linear constraint matrix with a given right-hand side, is a diagonal positive semi-definite (“weight”) matrix, and is a closed convex set with , such that for all . Additionally, , while is an indicator function for the set , that is, if and , otherwise.
We introduce an auxiliary variable , and reformulate (1.1) in the following form:
(P) |
Remark 1.
Let us notice that the model in (P) can readily accommodate terms of the form of where , and . Indeed, letting and adding the constant term in the objective of (P), we notice that
where . Similarly, any piecewise-linear term of the form
where and , can also be readily modeled. Indeed, setting and adding the term in the objective yields
Finally, it is important to note that model (P) allows for multiple piecewise-linear terms of the form , or , since we can always adjust to account for more than one term.
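As a concrete illustration, in generic notation of our own choosing, both families of terms discussed above are pointwise maxima of affine functions:
\[
\|Dx - d\|_1 = \sum_{i=1}^{l} \max\{(Dx - d)_i,\; -(Dx - d)_i\},
\qquad
\tau \max\{w, 0\} + (1 - \tau)\max\{-w, 0\} = \max\{\tau w,\; (\tau - 1) w\},
\]
for any $\tau \in (0,1)$; both right-hand sides therefore fit the piecewise-linear structure admitted by (P).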
In light of the discussion in Remark 1, problem (P) can model a plethora of important problems arising in several application domains spanning, among others, operational research, machine learning, data science, and engineering. More specifically, various lasso and fused lasso instances (with applications to sparse approximations for classification and regression [19, 54], portfolio allocation [1], or medical diagnosis [23], among many others) can be readily modeled by (P). Additionally, various risk-minimization problems with linear random cost functions can be modeled by (P) (e.g. see [33, 44, 48]). Furthermore, even risk-minimization problems with nonlinear random cost functions, which are typically solved via Gauss-Newton schemes (e.g. see [10]), often require the solution of sub-problems like (P). Finally, continuous relaxations of integer programming problems with applications to operational research (e.g. [32]) often take the form of (P). Given the multitude of problems requiring easy access to (usually accurate) solutions of (P), the derivation of efficient, robust, and scalable solution methods is of paramount importance.
Problem (P) can be solved by various first- or second-order methods. In particular, using a standard reformulation, by introducing several auxiliary variables, (P) can be written as a convex quadratic programming (QP) problem and efficiently solved by, among others, an interior point method (IPM; e.g. [19, 38]), an alternating direction method of multipliers (ADMM; e.g. [20]), or a proximal point method (e.g. [28, 43]). However, the reformulation of (P) into a convex QP is not expected to lead to scalable solution methods, since the dimension of the problem is significantly increased, and hence an already large-scale instance might be very difficult to solve in this way.
Alternatively, ADMM (or general splitting) schemes can be developed for the solution of (P) without the need for additional auxiliary variables (see Section 3). However, no first-order scheme is able to consistently yield sufficiently accurate solutions (i.e. of 4-, 5- or 6-digit accuracy). If such a solution is sought, we have to employ a semismooth Newton method (SSN; e.g. [12, 24, 37]), or a combination of a proximal point method with an SSN scheme utilized for the solution of its associated sub-problems. SSN methods have been successfully applied to a plethora of problems; however, their success relies heavily on the properties of the problem at hand (e.g. the rank of the linear constraints, or the conditioning of the Hessian). On the other hand, the combination of the proximal point method with the SSN can circumvent these issues, since the associated nonsmooth sub-problems can be guaranteed to be well-defined and well-conditioned.
Various such solvers have been developed and analyzed in the literature. For example, the authors in [53] developed a dual augmented Lagrangian scheme combined with an SSN method for the solution of semidefinite programming problems, and obtained very promising results. This scheme was then utilized for the solution of linear programming problems in [35], and for lasso-regularized problems in [34]. A similar primal-dual approach for -regularized convex quadratic programming problems was developed and analyzed in our accompanying paper [39] and was shown to be especially efficient for the solution of elastic-net linear regression and -regularized partial differential equation constrained optimization problems. In fact, the active-set method developed in this work is a direct extension of the method given in [39], altered in a specific way so that it can efficiently handle most piecewise-linear terms that appear in practice, via restricting its memory requirements.
Indeed, we showcase that each of the nonsmooth terms in the objective of (P) can be utilized for reducing the memory requirements of the proposed method. In particular, when computing the Clarke subdifferential ([13]) of an appropriate augmented Lagrangian penalty function, we can show that the Clarke derivatives of such piecewise-linear terms act as projectors. As a result, we obtain an active-set scheme that reduces the sub-problems’ dimensions significantly, leading to better performance and reduced memory requirements. In particular, we observe that a thresholding operation (originating from the presence of terms in the objective) determines which of the variables are inactive, allowing us to discard entire columns from matrices and when solving the associated sub-problems. Furthermore, the terms in the objective determine which of the rows of are unimportant, allowing us to further eliminate such rows.
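To make this concrete, the following MATLAB fragment (purely illustrative; the variable names and values are ours, not those of the implementation) shows how the soft-thresholding step associated with an ℓ1 term identifies the inactive variables whose columns can be dropped:

```matlab
% Soft-thresholding (the proximity operator of lambda*||.||_1) and the
% active set it induces: illustrative values only.
v      = [0.3; -2.1; 0.05; 1.7];             % hypothetical trial point
lambda = 0.5;                                % hypothetical l1 weight
w      = sign(v) .* max(abs(v) - lambda, 0); % thresholded point
active = find(abs(v) > lambda);              % variables that remain active
% Columns of the problem matrices corresponding to the complement of
% `active` can be discarded when assembling the SSN linear system.
```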
We showcase the robustness and the efficiency of the resulting active-set scheme on various optimization problems arising in risk-averse portfolio selection, quantile regression, and binary classification via linear support vector machines. In each of these cases the proposed scheme is compared against the robust polynomially convergent regularized interior point method developed and analyzed in [38], as well as the well-known ADMM-based OSQP solver ([49]). We demonstrate the reduced memory requirements of the active set scheme (and hence its improved scalability), as compared to interior point and ADMM alternatives applied to QP reformulations, while showcasing its efficiency and robustness.
Structure of the paper
In Section 2 we briefly present the proposed inner-outer active-set method. In particular, in Section 2.1 we derive a proximal method of multipliers (outer scheme) for the solution of (P), assuming that we can find an -optimal solution of the associated sub-problems. Then, in Section 2.2, we briefly present a semismooth Newton method (inner scheme) for finding approximate solutions to sub-problems arising from the proximal method of multipliers. Focusing on the structure of problem (P), by selecting appropriate Clarke derivatives, we show that the proposed inner-outer method is in fact an active-set scheme, the associated linear systems of which are well-conditioned and stable. In Section 2.3, we discuss an extension of the method for dealing with problems having arbitrary piecewise-linear terms in the objective (with applications to, among others, robust optimization) that are not currently covered within the paper.
Subsequently, in Section 3, we derive a proximal alternating direction method of multipliers for the approximate solution of (P) in order to obtain good initial estimates for the primal and dual variables of the problem. This can then be used to warm-start the proposed second-order algorithm. A good starting point for the algorithm could mean that only a small portion of the problem data matrices is used at each inner-outer iteration of the scheme, while the outer method is expected to achieve its local linear (and potentially superlinear) convergence rate in a small number of inner-outer iterations.
Notation
Given a vector in , denotes its Euclidean norm. Given a closed set , denotes the Euclidean projection onto , that is . Given a closed set , we write . Given a convex function , we define the proximity operator as . Given an index set , denotes its cardinality. Given a rectangular matrix and an index set , we denote the columns of , the indices of which belong to , as . Given a matrix , and two index sets and , we denote the subset of rows and columns of , the indices of which belong to respectively, as . Finally, given an arbitrary vector with components as well as some indexes , we denote by the vector containing the -st up to the -nd component of this vector. To avoid confusion, indexes are always used to denote entries of vectors or matrices, while and are reserved to denote iteration counters (outer and inner, respectively).
2 An active-set method
In this section we derive an active-set method for the solution of (P). The algorithm is an inner-outer scheme, which results by combining an outer proximal method of multipliers, and an inner semismooth Newton method for the solution of the PMM sub-problems. Following the discussion in [39, Section 2], we briefly derive the outer PMM scheme. Then, we briefly present the inner semismooth Newton method and discuss the solution of its associated linear systems, which is the main bottleneck of the algorithm.
2.1 Outer scheme: A primal-dual proximal method of multipliers
In this section, following and extending the developments in [39], we derive a primal-dual proximal method of multipliers for the approximate solution of (P). A convergence analysis is not provided, and the reader is referred to [39, Section 2] for an outline of such an analysis (since it applies directly to the case under consideration).
Given a penalty parameter , and some dual multiplier estimates , we follow [39, Section 2] to obtain the augmented Lagrangian corresponding to (P), which reads
(2.1) |
Indeed, this can be verified utilizing certain standard properties of Fenchel duality and of the proximity operator (see [39, Section 2]).
During iteration of the proximal method of multipliers, we have the estimates as well as the penalty parameters , such that , where is a non-increasing positive sequence. For simplicity of exposition, let , where and . We consider the following regularized continuously differentiable function:
Notice that we introduce a primal proximal regularizer only for the variable , and not for . This is a very important algorithmic choice that departs from the developments in [39]. We want to develop a memory-efficient active-set method, and for that reason we avoid introducing a proximal term for the auxiliary variables . This choice, which does not hinder the stability of the proposed approach, leads to better memory efficiency of the resulting active-set scheme by promoting a sparsification of the associated linear systems solved at each inner-outer iteration (this point will become clear in Section 2.2.1).
The minimization of the proximal augmented Lagrangian function can be written as
and thus we need to find such that
To that end, we observe that
By introducing the dual variable , the optimality conditions of can be written as
(2.2) |
where
We now describe the proposed proximal method of multipliers in Algorithm PMM.
Input: , , , such that , .
(2.3) |
(2.4) |
(2.5) |
Let us notice that given a triple , we can easily evaluate the distance in (2.3), due to the piecewise-linear structure of the associated function . A trivial extension of the analysis in [39, Section 3] yields that Algorithm PMM is globally convergent under the standard assumption of primal-dual feasibility. Furthermore, given some additional conditions on the error sequence , one can show that Algorithm PMM in fact achieves a local linear convergence rate (which becomes superlinear if at a suitable rate). Finally, the algorithm exhibits a global linear convergence rate, assuming that the starting point is chosen properly. For more details, we refer the reader to [39, Theorems 2.2, 2.3].
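For orientation, the following MATLAB sketch runs a generic proximal method of multipliers on a toy equality-constrained QP. This is a schematic illustration in our own notation (with the toy subproblem solved exactly), not the exact updates (2.3)–(2.5); in Algorithm PMM the minimization step is only approximate and is delegated to the semismooth Newton method.

```matlab
% Schematic PMM loop: min 0.5*x'*Q*x + c'*x  s.t.  A*x = b  (toy data).
Q = diag([1 2 3]); c = [-1; 0; 1]; A = [1 1 1]; b = 1;
x = zeros(3,1); y = 0; beta = 10; rho = 1;   % beta: penalty, rho: prox weight
for k = 1:50
    % exact minimizer of the proximal augmented Lagrangian in x
    % (approximated by semismooth Newton in the actual algorithm):
    H = Q + beta*(A'*A) + rho*eye(3);
    x = H \ (rho*x - c - A'*y + beta*(A'*b));
    y = y + beta*(A*x - b);                  % multiplier update
    if norm(A*x - b) < 1e-10, break; end
end
```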
2.2 Inner scheme: A semismooth Newton method
Next, we employ a semismooth Newton (SSN) method to solve problem (2.3) appearing in Algorithm PMM, and verify that the resulting inner-outer scheme admits an active-set interpretation. Given the estimates as well as the penalties , we apply SSN to approximately solve (2.2). Given any bounded positive penalty , the optimality conditions in (2.2) can be written as
(2.6) |
We set , , and at every iteration of SSN, we solve
(2.7) |
where we have introduced the notation
assuming that we are given some arbitrary matrices
(2.8) |
The symbol denotes the Clarke subdifferential of a function (see [13]). In our case, any element of the Clarke subdifferential is a Newton derivative (see [14, Chapter 13]), since and are piecewise continuously differentiable and regular functions. For any and any , it holds that
Since and is diagonal, we have that for all and any ,
Finally, since , we have that for all and any ,
Then, we can show (e.g. see [14, Example 14.9]) that
and
We complete the derivation of the SSN by defining a primal-dual merit function for globalizing the semismooth Newton method via a backtracking line-search scheme. Following [39], we employ the following merit function
(2.9) |
This function is often used for globalizing SSN schemes applied to nonsmooth equations of the form of (2.6). In Algorithm SSN, we outline a locally superlinearly convergent semismooth Newton method for the approximate solution of (2.3). The associated linear systems can be solved approximately (e.g. by Krylov subspace methods as in [39]), although in this work a suitable factorization scheme is utilized.
Input: Let , , , , , .
Local superlinear convergence of Algorithm SSN follows directly from [37, Theorem 3], since the equation in (2.6) is BD-regular (i.e. the Bouligand subdifferential at the optimum contains nonsingular matrices) by construction, as it models “regularized” sub-problems arising from the proximal method of multipliers. On the other hand, if we assume that the directional derivative of (2.9) is continuous at the optimum, then we can mirror the analysis in [25] to show that Algorithm SSN is globally convergent. There is a wide literature on similar semismooth Newton schemes, and we refer the reader to [12, 25, 37, 40] and the references therein for more details.
2.2.1 The SSN linear systems
The major bottleneck of the previously presented inner-outer scheme is the solution of the associated linear systems given in (2.7). One could alter Algorithm SSN so that it does not require an exact solution. In turn, this would allow the use of preconditioned Krylov subspace solvers for the efficient solution of such systems (e.g. as in [39, Section 3]). In particular, any preconditioner derived in [21] can be utilized for the proposed solver. However, for simplicity of exposition, we employ a standard factorization approach. As will become evident in Section 4, the active-set nature of the proposed algorithm enables the use of factorization even for large-scale problems of interest, since most of the data are inactive at each inner-outer iteration . Indeed, in the presence of multiple piecewise-linear terms in the objective, and assuming that the method is appropriately warm-started, one can ensure that most of the problem data are not incorporated when forming the active Hessian at each inner-outer iteration.
In what follows we derive the associated linear systems. Let be an arbitrary iteration of Algorithm PMM, and an arbitrary iteration of Algorithm SSN. Firstly, let us notice that any element yields a Newton derivative (see [14, Theorem 14.8]). The same applies to any element , and . Thus, using (2.8) we can choose from the Bouligand subdifferential to improve computational efficiency, by reducing the number of active variables and constraint rows. To that end, we set as a diagonal matrix with
(2.10) |
for all , as the following diagonal matrix
(2.11) |
for all , where , and as
(2.12) |
for all .
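As a simple illustration of the projector effect (hypothetical values, in our own notation): for the scalar map t ↦ max{t, 0}, a Bouligand-subdifferential element is the 0/1 indicator of positivity, so the induced diagonal matrix zeroes out, and thus lets us drop, the inactive rows:

```matlab
% A 0/1 diagonal Clarke-derivative element for t -> max(t, 0):
t       = [-0.4; 1.3; 0.0; 2.2];   % hypothetical residual values
rows_on = t > 0;                   % choose the zero branch at the kink t_i = 0
P       = spdiags(double(rows_on), 0, numel(t), numel(t));   % sparse projector
% Rows with rows_on == false contribute nothing through P, so the
% corresponding rows of the constraint data can be left out of the system.
```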
Given the aforementioned choices for the projection matrices, we can now explicitly eliminate certain variables from the system in (2.7), in order to obtain a saddle-point system. To that end, from the second block-equation in (2.7), we have
Let . Then, we obtain
(2.13) |
where the right-hand side can be evaluated easily. Letting , we have
(2.14) |
On the other hand, from the third block-equation of (2.7), we observe that
which can be computed after solving the reduced system. We define the following index sets
From the first block equation of (2.7), we obtain
After a straightforward (if tedious) calculation, we can pivot , , and in system (2.7). This results in the following saddle-point system:
(2.15) |
where , and
Notice that the coefficient matrix of (2.15) is symmetric and quasi-definite (see [50]). As such, it is strongly factorizable, that is, each symmetric permutation of it admits an $LDL^\top$ decomposition, with $D$ diagonal. Alternatively, one could eliminate further variables in order to obtain a positive definite linear system. This could be beneficial in certain applications, but is omitted here for the sake of generality. We note that contains only a subset of the columns and rows of . Similarly, contains only a subset of the columns of . As we will verify in practice, relatively close to the solution of (P) the active-set matrices , , can be significantly smaller than and , respectively, allowing the proposed approach to solve large-scale instances without requiring excessive memory. Finally, we note that if we had included a proximal term for the auxiliary variables (i.e. if we were to directly apply the algorithm given in [39]), then would only contain a subset of the columns of but all of its rows (which, in general, are expected to be much more problematic, since it is often the case that ).
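The following MATLAB sketch factorizes and solves a toy symmetric quasi-definite system of the shape appearing in (2.15); the blocks are hypothetical stand-ins for the active-set quantities, chosen only to illustrate the mechanics:

```matlab
% LDL' factorization of a toy quasi-definite saddle-point matrix
% [ H  A' ; A  -delta ] with H positive definite (illustration only).
H = [4 1; 1 3]; A = [1 2]; delta = 1e-2;
K = [H, A'; A, -delta]; rhs = [1; 0; 2];
[L, D, p] = ldl(K, 'vector');        % K(p,p) = L*D*L'; quasi-definiteness
                                     % guarantees a factorization with D diagonal
sol = zeros(size(rhs));
sol(p) = L' \ (D \ (L \ rhs(p)));    % forward/backward substitution
```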
2.3 Extension to arbitrary piecewise-linear terms
Before closing this section, we note that there are certain piecewise-linear terms that are not captured by the model in (P). In particular, given a set of pairs (with ), where , we could consider problems of the following form
In order to do so, we would have to introduce auxiliary vectors , , and reformulate the problem in the following form
Subsequently, in the derivation of the semismooth Newton method, we would have to evaluate the proximity operator of , , which admits a closed-form solution:
Then, the Clarke derivative of the previous could be computed by utilizing [14, Theorem 14.7]. This was not considered in this work for brevity of presentation, and is left as a future research direction. Indeed, this extension paves the way for the generalization of the proposed active-set scheme to a plethora of robust optimization problems that appear in practice (e.g. see [4]), while delivering an alternative to standard cutting-plane methods (e.g. see [29]), decomposition methods (e.g. see [3, 5, 18, 46]), or specialized interior point approaches (e.g. see [22]) appearing in the literature.
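For instance, for a single scalar term $\sigma \max\{w, 0\}$ with $\sigma > 0$ (a special case, stated in our own notation), the required proximity operator is available in closed form,
\[
\operatorname{prox}_{\sigma \max\{\cdot,\,0\}}(v) =
\begin{cases}
v - \sigma, & v > \sigma,\\
0, & 0 \le v \le \sigma,\\
v, & v < 0,
\end{cases}
\]
and its Clarke derivative is again a 0/1 quantity, in line with the active-set structure discussed in Section 2.2.1.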
3 Warm-starting
Following the developments in [35, 39], we would like to find a starting point for Algorithm PMM that is relatively close to a primal-dual solution. This is crucial since, on the one hand, we can then expect to observe early linear convergence of Algorithm PMM, while on the other hand, we can expect to be close to identifying the correct active sets, which in turn implies that the memory requirements of Algorithm SSN are significantly reduced. To that end, we employ a proximal alternating direction method of multipliers (pADMM; e.g. see [20]). We reformulate (P) by introducing an artificial variable , as
(P’) |
Given a penalty , we associate the following augmented Lagrangian to (P’)
where denotes the dual multipliers associated to the linear equality constraints of (P’). Let an arbitrary positive definite matrix be given, and assume the notation , for any . Also, as in Section 2.1, we denote . Algorithm pADMM describes a proximal ADMM for the approximate solution of (P’).
Input: , , , .
Let us notice that under certain standard assumptions on (P’), Algorithm pADMM converges globally, potentially at a linear rate (see [20]). We can employ the regularization matrix as a means of ensuring memory efficiency of Algorithm pADMM. For example, we can recover the prox-linear ADMM [20, Section 1.1], where given some sufficiently large constant , one defines
Given this choice of , the second step of Algorithm pADMM consists of only matrix-vector multiplications with and , and thus no (non-diagonal) matrix inversion is required. If memory is not an issue, one could use a positive-definite diagonal regularizer , yielding a standard regularized ADMM. In our implementation, the user can specify which of the two strategies should be employed. In either case, the first step of Algorithm pADMM is trivial to solve, since we know that the proximity operator of can be computed analytically as in Section 2.2.
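A minimal sketch of the resulting matrix-free update (in our own notation and on hypothetical toy data; here the nonsmooth part is taken to be an ℓ1 term purely for illustration) reads:

```matlab
% Prox-linear x-update: with S = sigma*I - beta*(A'*A) and sigma > beta*||A||^2,
% the subproblem reduces to one prox evaluation plus matvecs with A and A'.
rng(1); A = sprandn(40, 100, 0.05); v = randn(40,1);   % toy data
xk = zeros(100,1); beta = 1; lam = 0.1;
sigma = 1.1 * beta * normest(A)^2;                     % keeps S positive definite
g = xk - (beta/sigma) * (A' * (A*xk - v));             % matrix-free gradient step
x = sign(g) .* max(abs(g) - lam/sigma, 0);             % prox of (lam/sigma)*||.||_1
```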
Finally, once an approximate solution is computed, we set the starting point of Algorithm PMM as , where
Indeed, an optimal primal-dual solution of (P’) satisfies
thus the characterization of in Algorithm PMM follows from the Appendix, where one can also find the termination criteria of Algorithm pADMM (see (A.2)).
4 Applications and numerical results
In this section we present various applications that can be modeled by problem (P), focusing on portfolio optimization, quantile regression, and binary classification. In particular, we first discuss (and numerically demonstrate) the effectiveness of the approach for the solution of single-period mean-risk portfolio optimization problems, where risk is measured via the conditional value at risk or the mean absolute semi-deviation. Subsequently, we apply the proposed scheme for the solution of quantile regression problems, demonstrating its scalability. Finally, we apply the active-set scheme for the solution of binary classification problems via linear support vector machines on certain large-scale datasets.
Implementation details
Before proceeding to the numerical results, we mention certain implementation details of the proposed algorithm. The implementation follows closely the developments in [39], and can be found on GitHub (https://github.com/spougkakiotis/Active_set_method_for_CQP_piecewise_LP). The code is written in MATLAB, and the experiments are run on a PC with a 2.2GHz Intel Core i7-8750H processor (hexa-core), 16GB of RAM, using the Windows 10 operating system.
We run Algorithm pADMM (warm-start) for at most 100 iterations (or until a 3-digit accurate solution is found). The user can choose whether the warm-starting scheme should be matrix-free or not. In the presented experiments, the matrix-free scheme was only used for the largest quantile regression and classification problems, for which standard ADMM crashed due to excessive memory requirements. Then, starting with , , we run the PMM-SSN scheme. Following [39], when solving the PMM sub-problems using Algorithm SSN we use a predictor-corrector-like heuristic, in which the first iteration is accepted without line-search, and line-search is activated for subsequent iterations. Algorithm PMM is allowed to run for at most 200 iterations, while Algorithm SSN is stopped after 20 inner iterations. An iterate is accepted as optimal if the conditions given in (A.1) are satisfied for the tolerance specified by the user. Most other implementation details follow directly from the developments in Sections 2.1–2.2. We refer the reader to the implementation on GitHub for additional details.
In the presented experiments we compare the proposed approach against the interior point solver given in [38], the implementation of which can be found on GitHub (https://github.com/spougkakiotis/IP_PMM), as well as the ADMM-based OSQP solver given in [49], whose implementation can also be found on GitHub (https://github.com/osqp/osqp-matlab). We note that most problems considered herein were especially challenging for OSQP (mostly because highly accurate solutions are requested), and for that reason we allow it to run for 50,000 iterations (its default iteration limit is 4000).
4.1 Portfolio optimization
We first consider the mean-risk portfolio selection problem (originally proposed in [36]), where we minimize some convex risk measure, while keeping the expected return of the portfolio above some desirable level. A variety of models for this problem have been intensely analyzed and solved in the literature (e.g. see [1, 31, 44]). The departure from the variance as a measure of risk often allows for great flexibility in the decision making of investors, enabling them to follow potential regulations as well as to better control the risk associated with an “optimal” portfolio. Optimality conditions and existence of solutions for several general deviation measures have been characterized in [45]. The comparison of portfolios obtained by minimizing different risk measures has also been considered (e.g. see [41, 42, 48]).
The method presented in this paper is general and not tied to a specific model choice. Indeed, we would like to showcase the efficiency of our approach for obtaining accurate solutions to portfolio optimization problems with various risk measures of practical interest. Thus, we focus on the solution of a standard portfolio selection model that has been used in the literature. All numerical results are obtained on real-world datasets; as a result, the problems are of medium scale. Nonetheless, even for such medium-scale problems, we are able to demonstrate the efficiency of the proposed approach when compared to the efficient interior point method employed in the literature [19] for similar problems. We also note that both second-order solvers (i.e. IPM and active-set) significantly outperform OSQP on these instances; however, it was included in the comparison for completeness. Some large-scale instances will be tackled in the following subsections, where the method will be applied to quantile regression and binary classification instances.
Let represent a portfolio of financial instruments, such that
This requirement indicates that short positions for each stock are restricted by some percentage () of the available wealth (assuming ), and no more than of the total wealth can be invested in instrument . Let denote a random vector, the -th entry of which represents the random return of the -th instrument. Then, the random loss (i.e. the negative of the random return) of a given portfolio is given by . In this paper we assume that follows some continuous distribution , as well as that there is a one-to-one correspondence between percentage return and monetary value (as in [44, Section 3]). Additionally, given some expected benchmark return (e.g. the market index), we only consider portfolios that yield an expected return above a certain threshold, i.e. .
Finally, given the previously stated constraints, we would like to minimize some convex risk measure of interest. However, in order to make sure that the problem is well-posed, while the transaction costs are not excessive, we include an term in the objective. This is a well-known modeling choice that yields sparse portfolios, and thus regulates the transaction costs in the single-period portfolio setting (e.g. see [1, 16, 31]). Additionally, with an appropriate tuning of the regularization parameter , one could also control the amount of short positions (see [16]). It should be mentioned here that in the multi-period setting such an term does not guarantee a reduction in the transaction costs (but mostly in the holding costs), and an additional total variation term should also be added in the objective (see [15, 19]). This is omitted here, since we focus on the single-period case, but the model in (P) could easily incorporate such an additional term (see Remark 1). By putting everything together, the model reads as
(4.1) |
There are several methods for solving such stochastic problems; let us mention two important variants: the parametric approach (e.g. as in [1, 44]), where one assumes that the returns follow some known distribution which is subsequently sampled to yield finite-dimensional optimization problems, and the sampling approach (e.g. as in [31]), where one obtains a finite number of samples (without assuming a specific distribution). Such samples are often obtained from historical observations, and this approach is also followed in this paper. It is well-known that historical data cannot fully predict the future (see [31]); however, this is a widely-used practice. The reader is referred to [27] for an extensive discussion on probabilistic models for portfolio selection problems.
Additional soft or hard constraints can be included when solving a portfolio selection problem. Such constraints can either be incorporated directly via the use of a model (e.g. see [7, Section 2]) as hard constraints or by including appropriate penalty terms in the objective (soft constraints). It is important to note that the model given in (P) is quite general and as a result has great expressive power, allowing one to incorporate various important risk measures (and their combinations), as well as modeling constraints of interest.
Real-world datasets:
In what follows, we solve two different instances of problem (4.1). In particular, we consider two risk measures: the conditional value at risk (e.g. see [44]), and the mean absolute semi-deviation (e.g. see [47, Section 6.2.2], noting that this is in fact equivalent to the mean absolute deviation originally proposed in [30]). In each of the above cases problem (4.1) has the form of (P), and thus Algorithm PMM can be directly applied.
We showcase the effectiveness of the proposed approach on 6 real datasets taken from [9]. Each dataset contains time series for weekly asset returns and market indexes for different major stock markets, namely, DowJones, NASDAQ100, FTSE100, SP500, NASDAQComp, and FF49Industries. For the first 5 markets the authors in [9] provide the market indexes, while for the last dataset the uniform allocation strategy is considered as a benchmark. Additional information on the datasets is collected in Table 1. We note that stocks with fewer than 10 years of observations have been disregarded.
Name | # of assets | # of data points | Timeline |
---|---|---|---|
DowJones | 28 | 1363 | Feb. 1990–Apr. 2016 |
NASDAQ100 | 82 | 596 | Nov. 2004–Apr. 2016 |
FTSE100 | 83 | 717 | Jul. 2002–Apr. 2016 |
FF49Industries | 49 | 2325 | Jul. 1969–Jul. 2015 |
SP500 | 442 | 595 | Nov. 2004–Apr. 2016 |
NASDAQComp | 1203 | 685 | Feb. 2003–Apr. 2016 |
4.1.1 Conditional value at risk
First, we consider portfolio optimization problems that seek a solution minimizing the conditional value at risk, a measure which is known to be coherent (see [2] for a definition of coherent risk measures). In particular, using the notation introduced earlier, we consider the following optimization problem
(4.2) |
where is the random cost function, models the linear constraint matrix of problem (4.1) (where an auxiliary variable has been introduced to transform the inequality constraint involving the expectation into an equality), and . In the above, is the confidence level. It is well-known ([44]) that given a continuous random variable , the conditional value at risk can be computed as
\[
\mathrm{CVaR}_{\alpha}(Z) \;=\; \min_{t \in \mathbb{R}} \Big\{\, t + \frac{1}{1-\alpha}\, \mathbb{E}\big[\max\{Z - t,\, 0\}\big] \Big\}.
\]
We can write problem (4.2) in the following equivalent form:
where the expectation has been substituted by summation since we assume the availability of a dataset . Introducing an auxiliary variable , the previous can be re-written as
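In generic notation that need not match the paper's own symbols, the standard sample-based form of this reformulation (a sketch of the Rockafellar–Uryasev approximation) reads
\[
\min_{x \in \mathcal{X},\; t \in \mathbb{R}} \;\; t + \frac{1}{(1-\alpha)N} \sum_{i=1}^{N} \max\{-r_i^\top x - t,\; 0\} + \tau \|x\|_1,
\]
where $r_1,\dots,r_N$ are the sampled return vectors; the max terms and the $\ell_1$ term place the problem directly in the form of (P).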
We solve problem (4.2) using the proposed active-set method (AS), the interior point-proximal method of multipliers (IP-PMM) given in [38], and OSQP ([49]). We allow any short position, that is , and restrict investing more than of the available wealth in a single stock (i.e. ). We set and , and run the three methods on each of the datasets described in Table 1 for varying confidence levels. We report the confidence parameter , the number of PMM, SSN, IP-PMM and OSQP iterations, the CPU time needed by each of the three schemes, as well as the number of factorizations used within the active-set (PMM-SSN) scheme. Indeed, it often happens that the active set is not altered from one iteration to the next, in which case the factorization does not need to be re-computed. The results are collected in Table 4.1.1. In all the numerical results that follow, the lowest running time exhibited by a solver, assuming it successfully converged, is presented in bold.
Dataset | Iterations | Time (s) | |||||
---|---|---|---|---|---|---|---|
PMM(SSN)[Fact.] | IP–PMM | OSQP | AS | IP–PMM | OSQP | ||
DowJones | 35(169)[142] | 18 | 32,575 | 0.65 | 0.71 | 10.94 | |
36(166)[145] | 21 | 42,625 | 0.75 | 0.83 | 13.61 | ||
36(144)[125] | 13 | 43,100 | 0.67 | 0.53 | 14.45 | ||
NASDAQ100 | 31(156)[138] | 19 | 30,575 | 0.66 | 0.60 | 7.92 | |
33(168)[150] | 16 | 34,700 | 0.65 | 0.50 | 8.81 | ||
33(163)[142] | 15 | 40,500 | 0.62 | 0.57 | 10.46 | ||
FTSE100 | 32(179)[164] | 17 | 21,825 | 0.77 | 0.90 | 10.96 | |
34(171)[147] | 19 | 35,825 | 0.80 | 0.99 | 17.70 | ||
35(179)[156] | 11 | 36,550 | 0.93 | 0.59 | 18.24 | ||
FF49Industries | 41(215)[186] | 27 | 1 | 1.77 | 4.65 | ||
42(224)[183] | 22 | 2.37 | 3.62 | ||||
39(205)[157] | 11 | 2.66 | 2.18 | ||||
SP500 | 30(203)[193] | 20 | 21,075 | 5.93 | 12.82 | 65.70 | |
34(199)[187] | 11 | 32,375 | 5.88 | 6.98 | 99.39 | ||
34(173)[160] | 17 | 39,225 | 5.78 | 10.17 | 119.95 | ||
NASDAQComp | 28(198)[192] | 10 | 16,100 | 22.07 | 35.52 | 143.82 | |
31(171)[169] | 14 | 31,275 | 23.31 | 48.99 | 291.97 | ||
34(173)[167] | 19 | 44,775 | 21.67 | 67.14 | 403.75 |
1: indicates that the solver reached the maximum number of iterations.
From Table 4.1.1 we observe that for the smaller instances (DowJones, NASDAQ100, FTSE100, FF49Industries) both second-order methods perform quite well and are comparable in terms of CPU time. Nonetheless, even in this case, the proposed active-set solver requires significantly less memory. Indeed, this can be seen from the fact that the two methods achieve similar times while the active-set scheme performs significantly more factorizations. On the other hand, for the larger instances (SP500, NASDAQComp) the proposed active-set scheme outperforms the interior point method significantly. This is mostly due to the efficiency gained from the lower memory requirements and cheaper factorizations of the active-set solver. Also, we observe that both methods are quite robust with respect to the confidence level and consistently outperform OSQP, which struggles to find a 5-digit accurate solution. We note that OSQP could potentially be competitive for looser tolerances (e.g. for finding a 3-digit accurate solution); however, the application under consideration dictates that an accurate solution is needed, since a small improvement in the portfolio output can translate into huge profits in practice.
In order to observe the behaviour of the three solvers under a different regularization value, we fix , , and run them for varying confidence levels. The results are collected in Table 4.1.1. Similar observations can be drawn, while noting that both second-order approaches remain efficient for different values of the regularization parameter . Notice, however, that one should be careful when choosing this regularization parameter, since otherwise the obtained portfolios could be meaningless. In light of this, we have chosen values for that yield reasonable portfolios, with controlled short positions. Additionally, we should note that both second-order solvers were able to solve all problems to higher accuracy (e.g. 6 or 7 digits of accuracy); however, the differences in the obtained portfolios (in terms of positions and/or associated risk) were negligible, and hence such accuracies were not considered here.
Dataset | Iterations | Time (s) | |||||
---|---|---|---|---|---|---|---|
PMM(SSN)[Fact.] | IP–PMM | OSQP | AS | IP–PMM | OSQP | ||
DowJones | 35(168)[144] | 21 | 24,975 | 0.57 | 0.85 | 8.31 | |
36(154)[133] | 20 | 21,925 | 0.64 | 0.79 | 7.32 | ||
36(143)[128] | 17 | 29,625 | 0.68 | 0.69 | 9.95 | ||
NASDAQ100 | 34(157)[124] | 19 | 17,225 | 0.46 | 0.57 | 4.50 | |
34(155)[133] | 16 | 18,100 | 0.59 | 0.49 | 4.80 | ||
35(157)[132] | 16 | 20,775 | 0.56 | 0.48 | 5.49 | ||
FTSE100 | 34(166)[137] | 21 | 18,900 | 0.65 | 1.17 | 10.10 | |
34(174)[157] | 16 | 24,750 | 0.92 | 0.86 | 13.15 | ||
36(166)[147] | 17 | 28,850 | 0.95 | 0.92 | 15.53 | ||
FF49Industries | 43(215)[171] | 19 | 48,000 | 1.63 | 3.00 | 59.33 | |
41(184)[146] | 22 | 40,475 | 1.57 | 3.53 | 48.48 | ||
41(160)[136] | 19 | 35,525 | 1.45 | 3.06 | 40.91 | ||
SP500 | 32(163)[144] | 20 | 21,425 | 4.68 | 12.26 | 66.29 | |
33(165)[143] | 18 | 22,100 | 5.86 | 12.43 | 73.56 | ||
35(169)[148] | 19 | 32,225 | 5.61 | 11.51 | 98.25 | ||
NASDAQComp | 32(190)[182] | 26 | 17,200 | 15.55 | 90.78 | 157.83 | |
34(180)[172] | 15 | 13,925 | 22.75 | 51.81 | 127.26 | ||
34(170)[167] | 18 | 14,675 | 24.69 | 62.40 | 137.42 |
4.1.2 Mean absolute semi-deviation
Next, we consider portfolio optimization problems that seek a solution minimizing the mean absolute semi-deviation, which is also coherent. We consider the following optimization problem
(4.3) |
where, given a continuous random variable , the associated risk is defined as
where the equivalence follows from [47, Proposition 6.1]. Given a dataset , problem (4.3) can be written as
where . Note that this model is in the form of (P).
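For reference, in generic notation, the mean absolute semi-deviation of a random loss $Z$ and its sample counterpart read
\[
\mathrm{MAsD}(Z) = \mathbb{E}\big[\max\{Z - \mathbb{E}[Z],\, 0\}\big] = \tfrac{1}{2}\,\mathbb{E}\big|Z - \mathbb{E}[Z]\big|,
\qquad
\widehat{\mathrm{MAsD}}(x) = \frac{1}{N} \sum_{i=1}^{N} \max\big\{\ell_i(x) - \bar{\ell}(x),\; 0\big\},
\]
where $\ell_i(x) = -r_i^\top x$ and $\bar{\ell}(x) = \frac{1}{N}\sum_{j=1}^{N} \ell_j(x)$; again a sum of pointwise maxima of affine functions, hence in the form of (P).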
We fix and run the three solvers on the 6 datasets for two sensible values of the regularization parameter . The results are collected in Table 4.1.2.
Dataset | Iterations | Time (s) | |||||
---|---|---|---|---|---|---|---|
PMM(SSN)[Fact.] | IP–PMM | OSQP | AS | IP–PMM | OSQP | ||
DowJones | 51(142)[139] | 10 | 32,350 | 1.05 | 0.56 | 10.47 | |
52(142)[139] | 13 | 14,050 | 0.72 | 0.54 | 4.60 | ||
NASDAQ100 | 47(167)[160] | 11 | 18,800 | 0.67 | 0.26 | 5.02 | |
48(176)[161] | 10 | 40,550 | 0.67 | 0.27 | 10.72 | ||
FTSE100 | 46(153)[150] | 10 | 18,325 | 0.79 | 0.27 | 9.56 | |
46(153)[150] | 14 | 47,125 | 0.74 | 0.35 | 23.29 | ||
FF49Industries | 53(141)[136] | 11 | 1 | 1.49 | 1.72 | ||
52(137)[132] | 9 | 12,050 | 1.35 | 1.51 | 14.24 | ||
SP500 | 42(165)[163] | 17 | 28,375 | 5.33 | 2.91 | 87.10 | |
41(157)[153] | 17 | 41,275 | 4.39 | 3.07 | 128.09 | ||
NASDAQComp | 44(143)[133] | 17 | 34,175 | 9.75 | 9.91 | 310.12 | |
44(143)[132] | 18 | 41,000 | 10.22 | 9.72 | 367.32 |
1: indicates that the solver reached the maximum number of iterations.
From Table 4.1.2 we observe that both second-order schemes are robust and comparable on all instances. In this case, the larger instances (SP500, NASDAQComp) were solved in comparable time by both solvers. Nevertheless, it is important to note that the active-set scheme is the better choice, since it has significantly lower memory requirements. Again, this can be readily observed from the fact that its factorizations are much cheaper than those of IP-PMM (the active-set scheme performs significantly more factorizations and still achieves comparable performance). Indeed, as will become clear when solving large-scale quantile regression and binary classification instances, the proposed active-set solver scales better than IP-PMM or OSQP. Additionally, we observe that the method is robust (i.e. it converges reliably to an accurate solution). Finally, as in the case of the CVaR instances, OSQP struggled to find accurate solutions within a reasonable number of iterations, making it a less efficient choice for such problems.
4.1.3 Extensions and alternative risk measures
Let us notice that the presented methodology can be easily extended to the multi-period case [7, 33]. Then, one could include an additional fused-lasso term in the objective function in order to ensure low transaction costs. It is important to note that in this case the term added in the objective has the effect of reducing holding costs as well as short positions (e.g. see [15, 16, 19]). As noted in Remark 1, the additional fused-lasso term can be easily incorporated in the objective of (P). Multi-period portfolio selection problems are not considered here, however, one can expect a very similar behaviour to that observed in the single-period case.
Finally, we could easily deal with alternative risk measures, such as the variance (e.g. [36]), combination of CVaR and MAsD (e.g. see [6]), or approximations of other risk measures via multiple CVaR measures (see [26]). These were not included here for brevity of presentation.
4.2 Penalized quantile regression
Next we consider linear regression models of the following form
where is a -dimensional vector of covariates, are the regression coefficients and is some random error. A very popular problem in statistics is the estimation of the optimal coefficients, in the sense of minimizing a model of the following form:
(4.4) |
where is some loss function and is a penalty function with an associated regularization parameter . Following [54], we consider the elastic-net penalty,
For the loss function, we employ the quantile loss
(4.5) |
where . Notice that the case yields the absolute loss. Letting , and using Remark 1, we can re-write problem (4.4) in the form of (P), as
where
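For reference, the standard quantile (check) loss and elastic-net penalty (in generic notation, following the setup of [54]) are
\[
\rho_\tau(u) = \max\{\tau u,\; (\tau - 1) u\},
\qquad
P_{\lambda_1,\lambda_2}(x) = \lambda_1 \|x\|_1 + \frac{\lambda_2}{2} \|x\|_2^2,
\qquad \tau \in (0,1),
\]
so that $\tau = 1/2$ recovers (a multiple of) the absolute loss, and each $\rho_\tau$ term is a pointwise maximum of two affine functions, hence covered by Remark 1.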
Real-world datasets:
In what follows, we solve several instances of problem (4.4). We showcase the effectiveness of the proposed approach on 5 regression problems taken from the LIBSVM library (see [11]). Additional information on the datasets is collected in Table 5.
Name | # of data points | # of features |
---|---|---|
space_ga | 3107 | 6 |
abalone | 4177 | 8 |
cpusmall | 8192 | 12 |
cadata | 20,640 | 8 |
E2006-tfidf | 16,087 | 150,360 |
We fix , , and , and run all three methods (active-set (AS), IP-PMM and OSQP) on the 5 instances for varying quantile level . The results are collected in Table 4.2.
Problem | Iterations | Time (s) | |||||
PMM(SSN)[Fact.] | IP–PMM | OSQP | AS | IP–PMM | OSQP | |
space_ga | 15(43)[30] | 19 | 2350 | 0.38 | 0.35 | 0.51 | |
15(50)[35] | 20 | 16,300 | 0.37 | 0.39 | 3.36 | ||
15(62)[48] | 19 | 16,975 | 0.46 | 0.35 | 3.46 | ||
15(78)[77] | 20 | 7400 | 0.48 | 0.36 | 1.51 | ||
abalone | 14(68)[64] | 35 | 6350 | 1.02 | 1.29 | 2.15 | |
14(68)[63] | 35 | 6300 | 0.89 | 1.36 | 2.17 | ||
14(79)[74] | 27 | 12,425 | 0.89 | 1.01 | 4.13 | ||
14(87)[79] | 18 | 2875 | 0.82 | 0.69 | 0.97 | ||
cpusmall | 15(64)[64] | 29 | 1 | 2.01 | 2.57 | ||
16(74)[74] | 30 | 2.33 | 2.81 | ||||
16(86)[85] | 26 | 2.64 | 2.37 | ||||
15(110)[110] | 22 | 2.85 | 1.96 | ||||
cadata | 3(66)[65] | 49 | 7125 | 2.96 | 14.63 | 16.10 | |
4(85)[83] | 42 | 9050 | 3.65 | 12.43 | 20.27 | ||
3(77)[76] | 45 | 7275 | 3.45 | 13.13 | 16.45 | ||
3(225)[224] | 80 | 7450 | 13.14 | 23.32 | 17.08 | ||
E2006-tfidf | 14(27)[20] | 2 | 84.82 | ||||
15(34)[26] | 95.08 | ||||||
16(46)[35] | 112.24 | ||||||
17(79)[71] | 165.78 |
1: indicates that the solver incorrectly identified the problem as infeasible.
2: indicates that the solver ran out of memory.
From Table 4.2 we observe that the three approaches are comparable for the smaller instances (space_ga, abalone, and cpusmall). The active-set scheme significantly outperforms IP-PMM and OSQP on the cadata problem, mainly due to its better numerical stability. Indeed, this problem is highly ill-conditioned, and this is reflected in the increased number of IP-PMM iterations needed to obtain a 4-digit accurate solution. Finally, we observe that for the large-scale instance (E2006-tfidf), IP-PMM and OSQP crashed due to excessive memory requirements. In this case, the active-set scheme was warm-started by a matrix-free ADMM (as discussed in Section 3), since the standard ADMM also crashed due to excessive memory requirements. The active-set scheme, however, consistently manages to solve these large-scale instances very efficiently, without running into any memory issues. The OSQP solver is competitive for certain instances, albeit slightly slower than both second-order solvers, but fails to solve the cpusmall instances due to an incorrect classification of these instances as infeasible. Additionally, we observe that it is less consistent than the second-order approaches, since its iteration counts vary greatly with the problem parameters (see instances space_ga and abalone).
Next, we fix and , and run the three methods for varying regularization parameters and . The results are collected in Table 4.2. We observe that both second-order methods are quite robust for a wide range of parameter values. OSQP is competitive for certain instances; however, its behaviour is greatly influenced by the problem parameters. Nonetheless, it was able to outperform the second-order solvers on two out of the four cadata instances. Finally, the active-set scheme consistently solved the large-scale instance (E2006-tfidf) for a wide range of parameter values, without running into any memory issues.
Problem | Iterations | Time (s) | |||||
PMM(SSN)[Fact.] | IP–PMM | OSQP | AS | IP–PMM | OSQP | ||
space_ga | 15(58)[49] | 16 | 37,125 | 0.51 | 0.30 | 8.00 | |
15(61)[48] | 26 | 12,050 | 0.46 | 0.49 | 2.48 | ||
15(67)[46] | 25 | 13,025 | 0.44 | 0.46 | 2.68 | ||
15(82)[62] | 27 | 20,575 | 0.61 | 0.49 | 4.21 | ||
abalone | 14(70)[65] | 25 | 16,225 | 0.88 | 1.32 | 5.49 | |
14(71)[67] | 34 | 15,575 | 0.86 | 1.48 | 5.16 | ||
15(80)[76] | 39 | 18,675 | 0.92 | 1.50 | 6.26 | ||
15(116)[109] | 30 | 9975 | 1.44 | 1.16 | 3.34 | ||
cpusmall | 15(83)[81] | 20 | 1 | 1.98 | 2.14 | ||
15(81)[78] | 26 | 2.10 | 2.39 | ||||
16(101)[94] | 21 | 2.79 | 1.94 | ||||
15(106)[103] | 20 | 3.16 | 1.85 | ||||
cadata | 9(147)[143] | 62 | 5875 | 15.42 | 18.05 | 13.18 | |
4(63)[62] | 50 | 7350 | 2.76 | 14.68 | 17.41 | ||
5(121)[118] | 45 | 9775 | 5.49 | 13.32 | 21.70 | ||
7(276)[276] | 33 | 875 | 14.01 | 9.91 | 2.08 | ||
E2006-tfidf | 16(47)[38] | 2 | 115.87 | ||||
16(43)[36] | 118.89 | ||||||
16(43)[34] | 115.90 | ||||||
17(48)[40] | 128.38 |
1: indicates that the solver incorrectly identified the problem as infeasible.
2: indicates that the solver ran out of memory.
4.3 Binary classification via linear support vector machines
Finally, we are interested in training a binary linear classifier using regularized soft-margin linear support vector machines (SVMs) [51]. More specifically, given a training dataset , where are the labels and are the feature vectors (with the number of features), we would like to solve the following optimization problem
(4.6) |
where is a regularization parameter, and are the weights for the and regularizers, respectively. The standard Euclidean regularization is traditionally used as a trade-off between the margin of the classifier (the larger the better) and the correct classification of , for all (e.g. [17]). However, this often leads to a dense estimate for , which has led researchers in the machine learning community to consider the regularizer instead, in order to encourage sparsity (e.g. [8]). It is well-known that both regularizers can be combined to obtain the effect of each individual one, using the elastic-net penalty (see, for example, [52]), assuming that and are appropriately tuned.
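A standard way of writing such an elastic-net soft-margin objective (a generic sketch that need not match (4.6) symbol-for-symbol) is
\[
\min_{w \in \mathbb{R}^n} \;\; \frac{\delta_2}{2}\|w\|_2^2 + \delta_1 \|w\|_1 + C \sum_{i=1}^{m} \max\{1 - y_i\, a_i^\top w,\; 0\},
\]
where the hinge terms are once more pointwise maxima of affine functions, so the problem fits the form of (P).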
Real-world datasets:
In what follows we consider elastic-net SVM instances of the form of (4.6). We showcase the effectiveness of the proposed active-set scheme on 3 large-scale binary classification datasets taken from the LIBSVM library ([11]). The problem names, as well as the number of features and training points are collected in Table 8.
Name | # of training points | # of features |
---|---|---|
rcv1 | 20,242 | 47,236 |
real-sim | 72,309 | 20,958 |
news20 | 19,996 | 1,355,191 |
In the experiments to follow, we only consider large-scale problems that neither IP-PMM nor OSQP can solve, due to excessive memory requirements. Nonetheless, we should note that, by following the developments in [19], IP-PMM can be specialized to problems of this form in order to tackle large-scale instances; the same can be said about the proposed active-set method. The schemes tested in this work employ factorization for the solution of their associated linear systems. The introduction of preconditioning, e.g. as in [39], could improve their efficiency in certain cases, but is beyond the scope of this article. In the following experiments, we employ the (matrix-free) prox-linear ADMM (as described in Section 3), to avoid any memory issues in the warm-starting phase of the proposed algorithm.
We fix , , and run the active-set solver for all datasets given in Table 8 for varying values of the regularization parameters , and . The results are collected in Table 4.3.
Problem | Iterations | Time (s) | ||
PMM(SSN)[Fact.] | AS | |||
rcv1 | 0.2 | 0.2 | 48(139)[105] | 43.36 |
0.8 | 0.2 | 47(100)[58] | 23.28 | |
0.2 | 0.8 | 47(165)[133] | 54.81 | |
5 | 5 | 39(85)[45] | 20.57 | |
real-sim | 0.2 | 0.2 | 50(130)[100] | 252.25 |
0.8 | 0.2 | 47(85)[48] | 132.20 | |
0.2 | 0.8 | 48(101)[68] | 201.56 | |
5 | 5 | 47(85)[48] | 128.75 | |
news20 | 0.2 | 0.2 | 50(142)[102] | 213.12 |
0.8 | 0.2 | 41(85)[43] | 133.94 | |
0.2 | 0.8 | 70(212)[161] | 74.42 | |
5 | 5 | 41(85)[43] | 130.33 |
We observe that the solver was able to consistently solve these instances, without running into memory issues. In this case, the behaviour of the active-set solver was affected by the parameters and (indeed, see the number of SSN iterations for different regularization values); however, we consistently obtain convergence in a very reasonable amount of time. Overall, we observe that the proposed algorithm is very general and can be applied to a plethora of very important applications arising in practice. We were able to showcase that the active-set nature of the method allows one to solve large-scale instances on a personal computer, without the need to employ iterative linear algebra (which could also complement the solver, as in [39]). The proposed method strikes a good balance between first-order methods, which are fast but unreliable, and second-order interior point methods, which are extremely robust but can struggle with the problem size, ill-conditioning, and memory requirements. We have demonstrated that -regularized convex quadratic problems with piecewise-linear terms can be solved very efficiently using the proposed active-set scheme, and we conjecture that the algorithm can be readily extended to deal with general nonlinear convex objectives, or discretizations of stochastic two-stage problems.
5 Conclusions
In this paper we derived an efficient active-set method for the solution of convex quadratic optimization problems with piecewise-linear terms in the objective. The method, which complements our developments in the accompanying paper [“An active-set method for sparse approximations. Part I: Separable terms”, S. Pougkakiotis, J. Gondzio, D. S. Kalogerias], arises by suitably combining a proximal method of multipliers with a semismooth Newton scheme, and admits an active-set interpretation. By taking advantage of the piecewise-linear terms in the objective, the method has very reasonable memory requirements, since it utilizes only a small active set at every inner-outer iteration. We warm-start the algorithm using an appropriate alternating direction method of multipliers, ensuring faster convergence and reduced memory requirements.
We showcase the efficiency and robustness of the proposed scheme on a variety of optimization problems arising in risk-averse portfolio optimization, quantile regression, and binary classification via linear support vector machines. A numerical comparison against a robust interior point method and a state-of-the-art alternating direction method of multipliers demonstrates the viability of the approach as well as its limited memory requirements. In particular, we observe a significantly better behaviour, compared to the two other solvers, when dealing with large-scale instances. Overall, the approach remains efficient for a wide range of problems and strikes a good balance between cheap but unreliable first-order methods and expensive but highly reliable interior point methods.
Appendix A Appendix: Termination criteria
References
- [1] S. Alexander, T. F. Coleman, and Y. Li, Minimizing CVaR and VaR for a portfolio of derivatives, Journal of Banking & Finance, 30 (2006), pp. 583–605, https://doi.org/10.1016/j.jbankfin.2005.04.012.
- [2] P. Artzner, F. Delbaen, J.-M. Eber, and D. Heath, Coherent measures of risk, Mathematical Finance, 9 (1999), pp. 203–228, https://doi.org/10.1111/1467-9965.00068.
- [3] J. F. Benders, Partitioning procedures for solving mixed-variables programming problems, Numerische Mathematik, 4 (1962), pp. 238–252, https://doi.org/10.1007/BF01386316.
- [4] D. Bertsimas and M. Sim, The price of robustness, Operations Research, 52 (2004), pp. 35–53, https://doi.org/10.1287/opre.1030.0065.
- [5] J. R. Birge, Decomposition and partitioning methods for multistage stochastic linear programs, Operations Research, 33 (1985), pp. 989–1007, https://doi.org/10.1287/opre.33.5.989.
- [6] C. Birungi and L. Muthoni, Analysis of risk measures in portfolio optimization for the Uganda securities exchange, Journal of Financial Risk Management, 10 (2021), pp. 135–152, https://doi.org/10.4236/jfrm.2021.102008.
- [7] S. Boyd, E. Busseti, S. Diamond, R. N. Kahn, K. Koh, P. Nystrup, and J. Speth, Multi-period trading via convex optimization, Foundations and Trends® in Optimization, 3 (2017), pp. 1–76, https://doi.org/10.1561/2400000023.
- [8] P. S. Bradley and O. L. Mangasarian, Feature selection via concave minimization and support vector machines, in Proceedings of the Fifteenth International Conference on Machine Learning, ICML ’98, San Francisco, CA, USA, 1998, Morgan Kaufmann Publishers Inc., pp. 82–90.
- [9] R. Bruni, F. Cesarone, A. Scozzari, and F. Tardella, Real-world datasets for portfolio selection and solutions of some stochastic dominance portfolio models, Data in Brief, 8 (2016), pp. 858–862, https://doi.org/10.1016/j.dib.2016.06.031.
- [10] J. V. Burke and M. C. Ferris, A Gauss-Newton method for convex composite optimization, Mathematical Programming, 71 (1995), pp. 179–194, https://doi.org/10.1007/BF01585997.
- [11] C.-C. Chang and C.-J. Lin, LIBSVM: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology, 2 (2011), pp. 27:1–27:27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
- [12] J. Chen and L. Qi, Globally and superlinearly convergent inexact Newton-Krylov algorithms for solving nonsmooth equations, Numerical Linear Algebra with Applications, 17 (2010), pp. 155–174, https://doi.org/10.1002/nla.673.
- [13] F. Clarke, Optimization and Nonsmooth Analysis, Classics in Applied Mathematics, SIAM, Philadelphia, 1990, https://doi.org/10.1137/1.9781611971309.
- [14] C. Clason and T. Valkonen, Introduction to Nonsmooth Analysis and Optimization, 2020, https://arxiv.org/abs/1912.08672.
- [15] S. Corsaro, V. De Simone, and Z. Marino, Fused Lasso approach in portfolio selection, Annals of Operations Research, 299 (2021), pp. 47–59, https://doi.org/10.1007/s10479-019-03289-w.
- [16] S. Corsaro, V. De Simone, Z. Marino, and F. Perla, $\ell_1$-Regularization for multi-period portfolio selection, Annals of Operations Research, 294 (2020), pp. 75–86, https://doi.org/10.1007/s10479-019-03308-w.
- [17] C. Cortes and V. N. Vapnik, Support-vector networks, Machine Learning, 20 (1995), pp. 273–297, https://doi.org/10.1007/BF00994018.
- [18] G. B. Dantzig and P. Wolfe, The decomposition algorithm for linear programming, Econometrica, 29 (1961), pp. 767–778, https://doi.org/10.2307/1911818.
- [19] V. De Simone, D. di Serafino, J. Gondzio, S. Pougkakiotis, and M. Viola, Sparse approximations with interior point methods, SIAM Review, 64 (2022), pp. 954–988, https://doi.org/10.1137/21M1401103.
- [20] W. Deng and W. Yin, On the global and linear convergence of the generalized alternating direction method of multipliers, Journal of Scientific Computing, 66 (2016), pp. 889–916, https://doi.org/10.1007/s10915-015-0048-x.
- [21] J. Gondzio, S. Pougkakiotis, and J. W. Pearson, General-purpose preconditioning for regularized interior point methods, Computational Optimization and Applications, (2022), https://doi.org/10.1007/s10589-022-00424-5.
- [22] B. L. Gorissen, Interior point methods can exploit structure of convex piecewise linear functions with application in radiation therapy, SIAM Journal on Optimization, 32 (2022), pp. 256–275, https://doi.org/10.1137/21M1402364.
- [23] A. Gramfort, B. Thirion, and G. Varoquaux, Identifying predictive regions from fMRI with TV-L1 prior, in 2013 International Workshop on Pattern Recognition in Neuroimaging, 2013, pp. 17–20, https://doi.org/10.1109/PRNI.2013.14.
- [24] S.-P. Han, J.-S. Pang, and N. Rangaraj, Globally convergent Newton methods for nonsmooth equations, Mathematics of Operations Research, 17 (1992), pp. 586–607, https://doi.org/10.1287/moor.17.3.586.
- [25] E. Hans and T. Raasch, Global convergence of damped semismooth Newton methods for Tikhonov regularization, Inverse Problems, 31 (2015), p. 025005, https://doi.org/10.1088/0266-5611/31/2/025005.
- [26] K. Inui and M. Kijima, On the significance of expected shortfall as a coherent risk measure, Journal of Banking & Finance, 29 (2005), pp. 853–864, https://doi.org/10.1016/j.jbankfin.2004.08.005.
- [27] P. Jorion, Risk management lessons from long-term capital management, European Financial Management, 6 (2000), pp. 277–300, https://doi.org/10.1111/1468-036X.00125.
- [28] C. Kanzow and T. Lechner, Globalized inexact proximal Newton-type methods for nonconvex composite functions, Computational Optimization and Applications, 78 (2021), pp. 377–410, https://doi.org/10.1007/s10589-020-00243-6.
- [29] J. E. Kelley, Jr., The cutting-plane method for solving convex programs, Journal of the Society for Industrial and Applied Mathematics, 8 (1960), pp. 703–712, https://doi.org/10.1137/0108053.
- [30] H. Konno and H. Yamazaki, Mean-absolute deviation portfolio optimization model and its applications to Tokyo stock market, Management Science, 37 (1991), pp. 519–531, https://doi.org/10.1287/mnsc.37.5.519.
- [31] P. Krokhmal, S. Uryasev, and J. Palmquist, Portfolio optimization with conditional value-at-risk objective and constraints, Journal of Risk, 4 (2002), pp. 43–68, https://doi.org/10.21314/JOR.2002.057.
- [32] R. Lazimy, Mixed-integer quadratic programming, Mathematical Programming, 22 (1982), pp. 332–349, https://doi.org/10.1007/BF01581047.
- [33] D. Li and W.-L. Ng, Optimal dynamic portfolio selection: Multiperiod mean-variance formulation, Mathematical Finance, 10 (2000), pp. 387–406, https://doi.org/10.1111/1467-9965.00100.
- [34] X. Li, D. Sun, and K. C. Toh, A highly efficient semismooth Newton augmented Lagrangian method for solving lasso problems, SIAM Journal on Optimization, 28 (2018), pp. 433–458, https://doi.org/10.1137/16M1097572.
- [35] X. Li, D. Sun, and K. C. Toh, An asymptotically superlinearly convergent semismooth Newton augmented Lagrangian method for linear programming, SIAM Journal on Optimization, 30 (2020), pp. 2410–2440, https://doi.org/10.1137/19M1251795.
- [36] H. M. Markowitz, Portfolio selection, The Journal of Finance, 7 (1952), pp. 77–91, https://doi.org/10.2307/2975974.
- [37] J. Martínez and L. Qi, Inexact Newton methods for solving nonsmooth equations, Journal of Computational and Applied Mathematics, 60 (1995), pp. 127–145, https://doi.org/10.1016/0377-0427(94)00088-I.
- [38] S. Pougkakiotis and J. Gondzio, An interior point-proximal method of multipliers for convex quadratic programming, Computational Optimization and Applications, 78 (2021), pp. 307–351, https://doi.org/10.1007/s10589-020-00240-9.
- [39] S. Pougkakiotis, J. Gondzio, and D. S. Kalogerias, An active-set method for sparse approximations. Part I: Separable terms, 2022, https://arxiv.org/abs/2201.10211.
- [40] L. Qi, Convergence analysis of some algorithms for solving nonsmooth equations, Mathematics of Operations Research, 18 (1993), pp. 227–244, https://doi.org/10.1287/moor.18.1.227.
- [41] H. Ramos, M. Righi, C. Guedes, and F. Müller, A comparison of risk measures for portfolio optimization with cardinality constraints, 2022, http://dx.doi.org/10.2139/ssrn.4141301.
- [42] M. B. Righi and D. Borenstein, A simulation comparison of risk measures for portfolio optimization, Finance Research Letters, 24 (2018), pp. 105–112, https://doi.org/10.1016/j.frl.2017.07.013.
- [43] R. T. Rockafellar, Monotone operators and the proximal point algorithm, SIAM Journal on Control and Optimization, 14 (1976), pp. 877–898, https://doi.org/10.1137/0314056.
- [44] R. T. Rockafellar and S. Uryasev, Optimization of conditional value-at-risk, Journal of Risk, 2 (2000), pp. 21–41, https://doi.org/10.21314/JOR.2000.038.
- [45] R. T. Rockafellar, S. Uryasev, and M. Zabarankin, Optimality conditions in portfolio analysis with general deviation measures, Mathematical Programming, 108 (2006), pp. 515–540, https://doi.org/10.1007/s10107-006-0721-9.
- [46] A. Ruszczyński, A regularized decomposition method for minimizing a sum of polyhedral functions, Mathematical Programming, 35 (1986), pp. 309–333, https://doi.org/10.1007/BF01580883.
- [47] A. Shapiro, D. Dentcheva, and A. Ruszczyński, Lectures on Stochastic Programming: Modeling and Theory, MOS-SIAM Series on Optimization, SIAM & Mathematical Optimization Society, Philadelphia, 2009, https://doi.org/10.1137/1.9780898718751.
- [48] L. P. Silva, D. Alem, and F. L. Carvalho, Portfolio optimization using mean absolute deviation (MAD) and conditional value-at-risk (CVaR), Production, 27 (2017), https://doi.org/10.1590/0103-6513.208816.
- [49] B. Stellato, G. Banjac, P. Goulart, A. Bemporad, and S. Boyd, OSQP: an operator splitting solver for quadratic programs, Mathematical Programming Computation, 12 (2020), pp. 637–672, https://doi.org/10.1007/s12532-020-00179-2.
- [50] R. J. Vanderbei, Symmetric quasidefinite matrices, SIAM Journal on Optimization, 5 (1995), pp. 100–113, https://doi.org/10.1137/0805005.
- [51] V. N. Vapnik, The Nature of Statistical Learning Theory, Information Science and Statistics, Springer, New York, NY, 2000, https://doi.org/10.1007/978-1-4757-3264-1.
- [52] L. Wang, J. Zhu, and H. Zou, The doubly regularized support vector machine, Statistica Sinica, 16 (2006), pp. 589–615, http://www.jstor.org/stable/24307560.
- [53] X. Y. Zhao, D. F. Sun, and K. C. Toh, A Newton-CG augmented Lagrangian method for semidefinite programming, SIAM Journal on Optimization, 20 (2010), pp. 1737–1765, https://doi.org/10.1137/080718206.
- [54] H. Zou and T. Hastie, Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67 (2005), pp. 301–320, https://doi.org/10.1111/j.1467-9868.2005.00503.x.