Implementing portfolio risk management and hedging in practice
Abstract
In the academic literature, portfolio risk management and hedging are often couched in the language of stochastic control and Hamilton–Jacobi–Bellman (HJB) equations in continuous time. In practice, the continuous-time framework of stochastic control may be undesirable for various business reasons. In this work we present a straightforward approach to cross-asset portfolio risk management and hedging, providing some implementation details, while rarely venturing outside the convex optimisation setting of (approximate) quadratic programming (QP). We pay particular attention to the correspondence between the economic concepts and their mathematical representations; the abstractions enabling us to handle multiple asset classes and risk models at once; the dimensional analysis of the resulting equations; and the assumptions inherent in our derivations. We demonstrate how to solve the resulting QPs with CVXOPT.
1 Introduction
In the academic literature, portfolio risk management and hedging are often (but not always [19, 22]) couched in the language of stochastic control [6, 30, 26, 20, 25, 24] and Hamilton–Jacobi–Bellman (HJB) equations in continuous time. Indeed, this is the most rigorous and general framework for such considerations. Owing to the natural relationship between stochastic control and reinforcement learning [7, 8], this framework can be extended relatively easily to the numerical methods of reinforcement learning [29, 27], which can handle the more realistic setting involving frictions.
In the author's practical experience of portfolio and risk management on the sell side and on the buy side, including at bulge bracket institutions, this approach presents several practical problems. First, some quantitative analysts and quantitative developers [13, 18] come from backgrounds that do not include stochastic control. For example, many classical computer science degrees do not cover stochastic control (or indeed stochastic analysis) as part of the syllabus. Second, the continuous-time framework is technically complex and requires discretisation at some point anyway. Third, there are relatively few industry-grade solvers for HJB-type problems.
On the other hand, quadratic programming [9, 12, 14, 10] is accessible to students of most technical majors and requires a relatively limited background in undergraduate linear algebra (rather than graduate-level stochastic analysis and measure theory). It is usually well understood by quantitative analysts and developers irrespective of their academic and professional background. Furthermore, there are fast industry-grade implementations of quadratic program (QP) solvers, including CPLEX [16], Gurobi [15], IMSL [17], NAG [2], NASOQ [11], OSQP [28] and Xpress [1]. Therefore it makes sense to develop a QP formulation of the portfolio risk management and hedging problem. Even if such an approach is imperfect in comparison with the stochastic control approach, it is practical.
Finally, in the medium- to high-frequency trading (HFT) [3] applications, the optimiser needs to be very efficient and called sparingly at discrete time intervals. In this setting, the QP formulation is yet again appealing.
In this work we describe the portfolio risk management and hedging framework with sufficient rigour, paying particular attention to the correspondence between the economic concepts and their mathematical representations; the abstractions enabling us to handle multiple asset classes and risk models at once; the dimensional analysis of the resulting equations; and the assumptions inherent in our derivations.
We demonstrate how to solve the resulting QPs with CVXOPT [4], a free software package for convex optimisation based on the Python programming language. CVXOPT is convenient in the research and development context. In production the reader is advised to use one of the industry-grade C++ or kdb+/q [23] implementations, or a specialised implementation of the solver on an FPGA or ASIC [21].
The implementations that we provide here are baseline, pedagogical implementations. In some cases the business requirements may be such that additional frictions need to be taken into account, in which case the problem ceases to be convex. There are some proprietary tricks that can be applied in these situations. These tricks can significantly impact the profitability and risk profile of a business, but unfortunately they are outside the scope of this work.
Acknowledgements
We would like to thank Berc Rustem (Department of Computing, Imperial College London) and Martin Zinkin (Qubealgo) for our constructive discussions. The factor abstraction is largely due to Attilio Meucci and follows [22].
2 Preliminaries
In what follows we assume that we are the market maker and the sides of the trades are given from our perspective: buys are when we buy (the change in our position has a positive sign), sells are when we sell (the change in our position has a negative sign).
We shall denote by $[n]$ the set $\{1, \ldots, n\}$; by $\mathbf{1}_n$ the vector in $\mathbb{R}^n$ whose elements are all ones. The vectors are column vectors by default.
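Let us establish conventions for matrix calculus. The (scalar-by-vector) derivative of a scalar $y$ with respect to a vector in $\mathbb{R}^n$,
$$\mathbf{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix},$$
with $x_1, \ldots, x_n \in \mathbb{R}$, is written, in numerator layout notation, as
$$\frac{\partial y}{\partial \mathbf{x}} = \begin{pmatrix} \dfrac{\partial y}{\partial x_1} & \cdots & \dfrac{\partial y}{\partial x_n} \end{pmatrix}.$$
Strictly speaking, the result is a matrix in $\mathbb{R}^{1 \times n}$, although we can certainly, and somewhat confusingly, think of it as a vector in $\mathbb{R}^n$. (Confusingly, because of the widespread habit of thinking of vectors as column vectors by default. To avoid this confusion, we shall favour the matrix view over the vector view whenever there is ambiguity.)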
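The (vector-by-vector) derivative of an $\mathbb{R}^m$-valued vector function (a vector whose components are functions)
$$\mathbf{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix},$$
with $y_1, \ldots, y_m$ real-valued, with respect to an input vector in $\mathbb{R}^n$,
$$\mathbf{x} = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix},$$
with $x_1, \ldots, x_n \in \mathbb{R}$, is written, in numerator layout notation, as
$$\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{pmatrix} \dfrac{\partial y_1}{\partial x_1} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial y_m}{\partial x_1} & \cdots & \dfrac{\partial y_m}{\partial x_n} \end{pmatrix}.$$
The result is a matrix in $\mathbb{R}^{m \times n}$. It is called the Jacobian matrix of $\mathbf{y}$ with respect to $\mathbf{x}$.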
We have included these definitions here because they vary across the literature, with some authors preferring the numerator layout notation (as we do throughout this document), while others prefer the denominator layout notation.
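Given the numerator layout convention that we have chosen, if $\mathbf{A} \in \mathbb{R}^{n \times n}$, $\mathbf{x} \in \mathbb{R}^n$, then $\dfrac{\partial (\mathbf{x}^\intercal \mathbf{A} \mathbf{x})}{\partial \mathbf{x}} = \mathbf{x}^\intercal (\mathbf{A} + \mathbf{A}^\intercal)$. If, moreover, $\mathbf{A}$ is symmetric, then $\dfrac{\partial (\mathbf{x}^\intercal \mathbf{A} \mathbf{x})}{\partial \mathbf{x}} = 2 \mathbf{x}^\intercal \mathbf{A}$. If $\mathbf{a} \in \mathbb{R}^n$, $\mathbf{x} \in \mathbb{R}^n$, then $\dfrac{\partial (\mathbf{a}^\intercal \mathbf{x})}{\partial \mathbf{x}} = \mathbf{a}^\intercal$.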
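These identities are easy to check numerically. The following is a minimal sketch in Python with NumPy (the helper name `numerical_gradient` is hypothetical, not part of any library) that approximates the numerator-layout derivative of a scalar function by central differences, returning a $1 \times n$ row matrix as in the convention above, and compares it with the three closed forms.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Numerator-layout derivative of a scalar function f at x, as a 1 x n row matrix."""
    n = x.size
    grad = np.zeros((1, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = h
        # Central difference in the i-th coordinate.
        grad[0, i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return grad

rng = np.random.default_rng(0)
n = 4
A = rng.normal(size=(n, n))   # a general, non-symmetric matrix
S = A + A.T                   # a symmetric matrix
a = rng.normal(size=n)
x = rng.normal(size=n)

# d(x^T A x)/dx = x^T (A + A^T)
g_quad = numerical_gradient(lambda v: v @ A @ v, x)
g_quad_exact = (x @ (A + A.T)).reshape(1, n)

# For symmetric S: d(x^T S x)/dx = 2 x^T S
g_sym = numerical_gradient(lambda v: v @ S @ v, x)
g_sym_exact = (2.0 * x @ S).reshape(1, n)

# d(a^T x)/dx = a^T
g_lin = numerical_gradient(lambda v: a @ v, x)
g_lin_exact = a.reshape(1, n)
```

Because the functions are quadratic or linear, central differences are exact up to rounding, so the agreement is tight.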
3 CVXOPT
CVXOPT's function qp is an interface to coneqp for quadratic programs. It also provides the option of using the quadratic programming solver from MOSEK [5].
cvxopt.solvers.qp(P, q[, G, h[, A, b[, solver[, initvals]]]])
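This solves the convex quadratic program
$$\begin{array}{ll} \text{minimize} & \tfrac{1}{2} \mathbf{x}^\intercal \mathbf{P} \mathbf{x} + \mathbf{q}^\intercal \mathbf{x} \\ \text{subject to} & \mathbf{G} \mathbf{x} \preceq \mathbf{h}, \\ & \mathbf{A} \mathbf{x} = \mathbf{b} \end{array}$$
and its dual.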
4 The portfolio
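Suppose that we are trading $n$ products whose prices per unit notional (we shall use the words “size” and “notional” synonymously) at time $t$ are given by the vector $\mathbf{P}_t \in \mathbb{R}^n$. We shall represent the composition of our portfolio by the vector $\boldsymbol{\alpha} \in \mathbb{R}^n$, where the $i$th element, $\alpha_i$, is the net notional amount of the $i$th product in the portfolio ($i \in [n]$). Thus the total net notional of our portfolio is given by $\mathbf{1}_n^\intercal \boldsymbol{\alpha}$. We could also consider the weights of the products in our portfolio, $\mathbf{w} = \boldsymbol{\alpha} / (\mathbf{1}_n^\intercal \boldsymbol{\alpha})$. The value of our portfolio at time $t$ is given by $V_t = \boldsymbol{\alpha}^\intercal \mathbf{P}_t$.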
This is an appropriate time to comment on the units. The value of the portfolio, $V_t$, is always expressed in units of currency, say, USD or EUR. The units of the prices, $\mathbf{P}_t$, and notionals, $\boldsymbol{\alpha}$, are subject to the conventions of the particular asset class:
- For equities, $\alpha_i$ is dimensionless; the notional is expressed as the number of shares. The price, $P_{t,i}$, is in units of currency.
- For CDS indices, $\alpha_i$ is expressed in units of currency, as this is the amount on which protection is being bought or sold (e.g. 25,000,000 USD). The price, $P_{t,i}$, on the other hand, is dimensionless, as it is expressed in units of currency per unit notional, which is itself expressed in units of currency.
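To make the dimensional analysis concrete, here is a minimal sketch in Python with NumPy. The positions and prices are made up for illustration (in particular, the CDS price per unit notional of 0.012 is hypothetical). Each product contributes currency to $V_t = \boldsymbol{\alpha}^\intercal \mathbf{P}_t$, either as shares times currency per share or as currency notional times a dimensionless price.

```python
import numpy as np

# Hypothetical portfolio: one equity and one CDS index, both valued in USD.
# Equity:    alpha is a number of shares (dimensionless), price is USD per share.
# CDS index: alpha is a USD notional, price is dimensionless (USD per USD of notional).
alpha = np.array([1_000.0, 25_000_000.0])  # 1,000 shares; 25,000,000 USD notional
prices = np.array([150.0, 0.012])          # USD per share; price per unit notional

# Each contribution alpha_i * P_i is in USD, so V_t = alpha^T P_t is in USD too.
contributions = alpha * prices
value = alpha @ prices
```

Here the equity contributes 150,000 USD and the CDS index 300,000 USD. By the same token, summing shares and USD notionals in $\mathbf{1}_n^\intercal \boldsymbol{\alpha}$ would mix units, so the total net notional is only meaningful within a single asset class.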
5 The invariants
Our market risk consists in our dependence on the prices of the products at time $T + \tau$, $\mathbf{P}_{T+\tau}$, where $T$ represents the time when the asset allocation decision is being made and $\tau$ is our investment horizon.
In order to understand our market risk, we seek to express $\mathbf{P}_{T+\tau}$ in terms of invariants: market variables that exhibit stationary behaviour over time and can be expressed in terms of independent and identically distributed random variables.
Let us denote by $\mathbf{X}_{T+\tau,\tau}$ the $\mathbb{R}^n$-valued random vector of invariants at time $T + \tau$ for a given investment horizon $\tau$. We express the prices of the products in our portfolio as
$$\mathbf{P}_{T+\tau} = \mathbf{g}(\mathbf{X}_{T+\tau,\tau}),$$
where $\mathbf{g}$ is a function that depends on the asset classes of products in our portfolio and our investment horizon. In particular,
$$\mathbf{g} \colon \mathbb{R}^n \to \mathbb{R}^n.$$
Notice that we are assuming that, like $\mathbf{P}_{T+\tau}$, $\mathbf{X}_{T+\tau,\tau}$ is $n$-dimensional. The reason why this is a sensible assumption will become clear when we consider examples of $\mathbf{X}_{T+\tau,\tau}$ and $\mathbf{g}$ for various asset classes.
6 The factor model
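To get a handle on our market risk, we shall express the vector of invariants as
$$\mathbf{X}_{T+\tau,\tau} = \mathbf{h}(\mathbf{F}_{T+\tau}) + \mathbf{U}_{T+\tau}, \tag{1}$$
where $\mathbf{F}_{T+\tau}$ is an $\mathbb{R}^k$-valued vector of common risk factors that are responsible for most of the randomness in the market, $\mathbf{U}_{T+\tau}$ is an $\mathbb{R}^n$-valued residual vector of perturbations, and $\mathbf{h} \colon \mathbb{R}^k \to \mathbb{R}^n$ is a function.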
7 Risk
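What is our risk exposure, i.e. the sensitivity of the value of our portfolio to our risk factors? It is given by the $\mathbb{R}^k$-valued vector
$$\boldsymbol{\delta}_{T+\tau} = \left( \frac{\partial V_{T+\tau}}{\partial \mathbf{F}_{T+\tau}} \right)^\intercal.$$
The dependence of our risk exposure on the notional vector $\boldsymbol{\alpha}$ is expressed by the $n \times k$ matrix of sensitivities, the Jacobian,
$$\mathbf{J}_{T+\tau} = \frac{\partial \mathbf{P}_{T+\tau}}{\partial \mathbf{F}_{T+\tau}}.$$
Thus the relationship between $\boldsymbol{\delta}_{T+\tau}$ and $\boldsymbol{\alpha}$, which follows from their definitions (since $V_{T+\tau} = \boldsymbol{\alpha}^\intercal \mathbf{P}_{T+\tau}$), is given by
$$\boldsymbol{\delta}_{T+\tau} = \mathbf{J}_{T+\tau}^\intercal \boldsymbol{\alpha}. \tag{2}$$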
8 The variance of the value of the portfolio
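We would like to minimise the variance of the value of the portfolio at time $T + \tau$, $V_{T+\tau}$, which represents our market risk. Let us regard $V_{T+\tau}$ as a function of the risk factors: $V_{T+\tau} = V(\mathbf{F}_{T+\tau})$. By Theorem A.1 and Remark A.2,
$$\operatorname{Var}[V_{T+\tau}] \approx \boldsymbol{\delta}_{T+\tau}^\intercal \boldsymbol{\Sigma}_{T+\tau} \boldsymbol{\delta}_{T+\tau} = \boldsymbol{\alpha}^\intercal \mathbf{J}_{T+\tau} \boldsymbol{\Sigma}_{T+\tau} \mathbf{J}_{T+\tau}^\intercal \boldsymbol{\alpha},$$
where $\boldsymbol{\Sigma}_{T+\tau} = \operatorname{Var}[\mathbf{F}_{T+\tau}] \in \mathbb{R}^{k \times k}$ is the covariance matrix of the risk factors, and $\boldsymbol{\delta}_{T+\tau}$ and $\mathbf{J}_{T+\tau}$ are evaluated at $\mathbb{E}[\mathbf{F}_{T+\tau}]$.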
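The approximation can be sanity-checked by simulation. The sketch below, in Python with NumPy, assumes purely for illustration two products with prices $P_i = \exp(F_i)$ (so that $\mathbf{g} \circ \mathbf{h}$ is the elementwise exponential and $\mathbf{U}_{T+\tau} = \mathbf{0}$) and jointly normal factors with small volatilities; all numbers are hypothetical. It computes $\boldsymbol{\delta}$ from (2), with $\mathbf{J}$ evaluated at the mean of the factors, and compares $\boldsymbol{\delta}^\intercal \boldsymbol{\Sigma} \boldsymbol{\delta}$ with a Monte Carlo estimate of the variance of the portfolio value.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical model: k = n = 2, prices P_i = exp(F_i), no residual U.
mu = np.array([np.log(100.0), np.log(50.0)])   # E[F]
Sigma = np.array([[1.0e-4, 5.0e-5],
                  [5.0e-5, 4.0e-4]])           # Var[F], small volatilities
alpha = np.array([10.0, -30.0])                # net notionals

# Jacobian dP/dF at E[F] (numerator layout, n x k) and the exposure from (2).
J = np.diag(np.exp(mu))
delta = J.T @ alpha
var_approx = delta @ Sigma @ delta             # delta^T Sigma delta

# Monte Carlo estimate of Var[V] with V = alpha^T P.
F = rng.multivariate_normal(mu, Sigma, size=200_000)
V = np.exp(F) @ alpha
var_mc = V.var()
rel_err = abs(var_mc - var_approx) / var_approx
```

With these inputs $\boldsymbol{\delta} = (1000, -1500)^\intercal$ and $\boldsymbol{\delta}^\intercal \boldsymbol{\Sigma} \boldsymbol{\delta} = 850$; the simulated variance agrees to well within a few per cent, because the volatilities are small enough for the neglected Taylor terms to be negligible.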
This result is approximate. We approximated in two places:
1. In Theorem A.1, where we disposed of the remainder term of the Taylor series expansion.
2. When we regarded $V_{T+\tau}$ as a function of $\mathbf{F}_{T+\tau}$ alone, whereas it is also a function of another random variable: $\mathbf{U}_{T+\tau}$. We have effectively approximated (1) by
$$\mathbf{X}_{T+\tau,\tau} \approx \mathbf{h}(\mathbf{F}_{T+\tau}),$$
which is sensible if the variance of $\mathbf{U}_{T+\tau}$ is small, i.e. our risk factors are responsible for most of the variance of $\mathbf{X}_{T+\tau,\tau}$. To better account for the variance of $\mathbf{U}_{T+\tau}$, we could have used the bivariate Taylor series expansion instead of the univariate one, as in the proof of Theorem A.1.
9 Minimising risk
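We would like to minimise the variance of the value of our portfolio at our investment horizon, $T + \tau$. Bearing in mind the caveats mentioned in Section 8, this is achieved by minimising
$$(\boldsymbol{\delta}_{T+\tau} + \mathbf{J}_{T+\tau}^\intercal \Delta\boldsymbol{\alpha})^\intercal \boldsymbol{\Sigma}_{T+\tau} (\boldsymbol{\delta}_{T+\tau} + \mathbf{J}_{T+\tau}^\intercal \Delta\boldsymbol{\alpha}),$$
where $\boldsymbol{\delta}_{T+\tau} = \mathbf{J}_{T+\tau}^\intercal \boldsymbol{\alpha}$ and $\Delta\boldsymbol{\alpha}$ is the change in position that will minimise the variance of the value of our portfolio. It is $\Delta\boldsymbol{\alpha}$ that we need to find.
Note that, at the time $T$ when the hedging decision is made, $\boldsymbol{\delta}_{T+\tau}$, $\mathbf{J}_{T+\tau}$, and $\boldsymbol{\Sigma}_{T+\tau}$ are not yet known. Therefore we have to make use of forecasts instead and minimise
$$(\hat{\boldsymbol{\delta}}_{T+\tau} + \hat{\mathbf{J}}_{T+\tau}^\intercal \Delta\boldsymbol{\alpha})^\intercal \hat{\boldsymbol{\Sigma}}_{T+\tau} (\hat{\boldsymbol{\delta}}_{T+\tau} + \hat{\mathbf{J}}_{T+\tau}^\intercal \Delta\boldsymbol{\alpha}),$$
where $\hat{\boldsymbol{\delta}}_{T+\tau} = \hat{\mathbf{J}}_{T+\tau}^\intercal \boldsymbol{\alpha}$. We shall discuss how these forecasts can be obtained later. For the time being, we shall drop the subscripts and hats to avoid notational clutter and solve the unconstrained quadratic program (QP)
$$\min_{\Delta\boldsymbol{\alpha} \in \mathbb{R}^n} f(\Delta\boldsymbol{\alpha}), \qquad f(\Delta\boldsymbol{\alpha}) = (\boldsymbol{\delta} + \mathbf{J}^\intercal \Delta\boldsymbol{\alpha})^\intercal \boldsymbol{\Sigma} (\boldsymbol{\delta} + \mathbf{J}^\intercal \Delta\boldsymbol{\alpha}), \tag{3}$$
which can be done analytically using straightforward matrix calculus. First, differentiate $f$ with respect to $\Delta\boldsymbol{\alpha}$:
$$\frac{\partial f}{\partial \Delta\boldsymbol{\alpha}} = 2 \boldsymbol{\delta}^\intercal \boldsymbol{\Sigma} \mathbf{J}^\intercal + 2 \Delta\boldsymbol{\alpha}^\intercal \mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal.$$
We set this partial derivative to zero and solve for $\Delta\boldsymbol{\alpha}$:
$$\mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal \Delta\boldsymbol{\alpha} = -\mathbf{J} \boldsymbol{\Sigma} \boldsymbol{\delta},$$
hence, provided that $\mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal$ is invertible (for instance, when $\boldsymbol{\Sigma}$ is positive definite and $\mathbf{J}$ has full row rank),
$$\Delta\boldsymbol{\alpha}^* = -(\mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal)^{-1} \mathbf{J} \boldsymbol{\Sigma} \boldsymbol{\delta}$$
and
$$f(\Delta\boldsymbol{\alpha}^*) = \boldsymbol{\delta}^\intercal \boldsymbol{\Sigma} \boldsymbol{\delta} - \boldsymbol{\delta}^\intercal \boldsymbol{\Sigma} \mathbf{J}^\intercal (\mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal)^{-1} \mathbf{J} \boldsymbol{\Sigma} \boldsymbol{\delta}.$$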
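The closed-form solution of (3) translates directly into a few lines of code. A minimal sketch in Python with NumPy, on randomly generated hypothetical data with $n = 2$ hedge products and $k = 3$ factors; the function names are illustrative. It solves the linear system rather than forming the inverse, and checks the result against the expression for the minimised variance.

```python
import numpy as np

def min_variance_hedge(delta, J, Sigma):
    """Closed-form minimiser of (delta + J^T d)^T Sigma (delta + J^T d) over d.

    delta: (k,) factor exposure; J: (n, k) Jacobian; Sigma: (k, k) covariance.
    Assumes J Sigma J^T is invertible (e.g. Sigma positive definite, J of full row rank).
    """
    M = J @ Sigma @ J.T
    # Solve M d = -J Sigma delta instead of forming the inverse explicitly.
    return -np.linalg.solve(M, J @ Sigma @ delta)

def hedged_variance(delta, J, Sigma, dalpha):
    r = delta + J.T @ dalpha          # factor exposure after the hedge
    return r @ Sigma @ r

# Hypothetical data: n = 2 hedge products, k = 3 risk factors.
rng = np.random.default_rng(1)
n, k = 2, 3
A = rng.normal(size=(k, k))
Sigma = A @ A.T + 0.1 * np.eye(k)     # positive definite covariance
J = rng.normal(size=(n, k))
delta = rng.normal(size=k)

dalpha = min_variance_hedge(delta, J, Sigma)
v0 = hedged_variance(delta, J, Sigma, np.zeros(n))   # unhedged variance
v1 = hedged_variance(delta, J, Sigma, dalpha)        # hedged variance
b = J @ Sigma @ delta
v_formula = delta @ Sigma @ delta - b @ np.linalg.solve(J @ Sigma @ J.T, b)
```

Since $n < k$ here, the hedge cannot remove all of the risk, so the hedged variance is smaller than, but typically not equal to, zero.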
10 Minimising risk and symmetric costs
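Let us now incorporate the costs into the optimisation. We now assume that it will cost us $\mathbf{c}^\intercal |\Delta\boldsymbol{\alpha}|$ to execute the hedge (change our position by $\Delta\boldsymbol{\alpha}$), where the components of $\mathbf{c} \in \mathbb{R}^n$, which are nonnegative, are the costs per unit notional. Notice that the costs are symmetric: they are the same linear factors of $|\Delta\boldsymbol{\alpha}|$ irrespective of whether we are buying or selling. This assumption may not be realistic in practice.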
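The problem becomes
$$\min_{\Delta\boldsymbol{\alpha} \in \mathbb{R}^n} \; \lambda (\boldsymbol{\delta} + \mathbf{J}^\intercal \Delta\boldsymbol{\alpha})^\intercal \boldsymbol{\Sigma} (\boldsymbol{\delta} + \mathbf{J}^\intercal \Delta\boldsymbol{\alpha}) + \mathbf{c}^\intercal |\Delta\boldsymbol{\alpha}|,$$
where $|\Delta\boldsymbol{\alpha}|$ denotes the elementwise absolute value. Because of the absolute value this is not yet a QP, but, as we show below, it can be rewritten as a constrained QP.
Here $\lambda$ is a nonnegative constant specifying how many units of cash we are prepared to pay for reducing the variance of the value of the portfolio by one unit (in units of value squared).
At first sight, one might drop the absolute value and treat the problem as an unconstrained QP, which can be solved using matrix calculus as before:
$$\frac{\partial}{\partial \Delta\boldsymbol{\alpha}} \left[ \lambda (\boldsymbol{\delta} + \mathbf{J}^\intercal \Delta\boldsymbol{\alpha})^\intercal \boldsymbol{\Sigma} (\boldsymbol{\delta} + \mathbf{J}^\intercal \Delta\boldsymbol{\alpha}) + \mathbf{c}^\intercal \Delta\boldsymbol{\alpha} \right] = 2 \lambda \boldsymbol{\delta}^\intercal \boldsymbol{\Sigma} \mathbf{J}^\intercal + 2 \lambda \Delta\boldsymbol{\alpha}^\intercal \mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal + \mathbf{c}^\intercal.$$
Setting this partial derivative to zero and solving for $\Delta\boldsymbol{\alpha}$, we get
$$2 \lambda \mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal \Delta\boldsymbol{\alpha} = -2 \lambda \mathbf{J} \boldsymbol{\Sigma} \boldsymbol{\delta} - \mathbf{c},$$
hence
$$\Delta\boldsymbol{\alpha}^* = -(\mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal)^{-1} \left( \mathbf{J} \boldsymbol{\Sigma} \boldsymbol{\delta} + \frac{1}{2\lambda} \mathbf{c} \right).$$
However, in this form the problem is misspecified: if $\mathbf{c}$ has all positive elements, then we are paying for buying, but receiving money for selling!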
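We can rewrite this problem, introducing an auxiliary vector $\mathbf{t} \in \mathbb{R}^n$ that bounds $|\Delta\boldsymbol{\alpha}|$ from above, as
$$\min_{\Delta\boldsymbol{\alpha}, \, \mathbf{t} \in \mathbb{R}^n} \; \lambda (\boldsymbol{\delta} + \mathbf{J}^\intercal \Delta\boldsymbol{\alpha})^\intercal \boldsymbol{\Sigma} (\boldsymbol{\delta} + \mathbf{J}^\intercal \Delta\boldsymbol{\alpha}) + \mathbf{c}^\intercal \mathbf{t} + \varepsilon \, \mathbf{t}^\intercal \mathbf{t}$$
subject to
$$\Delta\boldsymbol{\alpha} - \mathbf{t} \preceq \mathbf{0}_n, \qquad -\Delta\boldsymbol{\alpha} - \mathbf{t} \preceq \mathbf{0}_n,$$
where $\varepsilon > 0$ is a small constant. The last term was added to regularise the covariance matrix of the overall problem. Since $\mathbf{c}$ is nonnegative, at the optimum $\mathbf{t} = |\Delta\boldsymbol{\alpha}|$, so this term perturbs the original objective only by $\varepsilon \|\Delta\boldsymbol{\alpha}\|^2$.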
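Let us write this problem as a standard QP. Setting
$$\mathbf{x} = \begin{pmatrix} \Delta\boldsymbol{\alpha} \\ \mathbf{t} \end{pmatrix} \in \mathbb{R}^{2n},$$
the objective function can be written as
$$\frac{1}{2} \mathbf{x}^\intercal \begin{pmatrix} 2 \lambda \mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal & \mathbf{0} \\ \mathbf{0} & 2 \varepsilon \mathbf{I}_n \end{pmatrix} \mathbf{x} + \begin{pmatrix} 2 \lambda \mathbf{J} \boldsymbol{\Sigma} \boldsymbol{\delta} \\ \mathbf{c} \end{pmatrix}^\intercal \mathbf{x} + \lambda \boldsymbol{\delta}^\intercal \boldsymbol{\Sigma} \boldsymbol{\delta},$$
where the last term is a constant and can be dropped from the minimisation. We can also rewrite the constraints in block matrix form as
$$\begin{pmatrix} \mathbf{I}_n & -\mathbf{I}_n \\ -\mathbf{I}_n & -\mathbf{I}_n \end{pmatrix} \mathbf{x} \preceq \mathbf{0}_{2n}.$$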
10.1 Using CVXOPT
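Thus we can call
cvxopt.solvers.qp(P, q[, G, h[, A, b[, solver[, initvals]]]])
with (here $\mathbf{P}$ denotes the CVXOPT parameter, not the vector of prices)
$$\mathbf{P} = \begin{pmatrix} 2 \lambda \mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal & \mathbf{0} \\ \mathbf{0} & 2 \varepsilon \mathbf{I}_n \end{pmatrix}, \quad \mathbf{q} = \begin{pmatrix} 2 \lambda \mathbf{J} \boldsymbol{\Sigma} \boldsymbol{\delta} \\ \mathbf{c} \end{pmatrix}, \quad \mathbf{G} = \begin{pmatrix} \mathbf{I}_n & -\mathbf{I}_n \\ -\mathbf{I}_n & -\mathbf{I}_n \end{pmatrix}, \quad \mathbf{h} = \mathbf{0}_{2n},$$
and no equality constraints, to find the optimal $\mathbf{x}$, whose first $n$ components are $\Delta\boldsymbol{\alpha}$.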
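A minimal sketch, in Python with NumPy, of how the arrays $\mathbf{P}$, $\mathbf{q}$, $\mathbf{G}$, $\mathbf{h}$ for the symmetric-cost problem might be assembled, with decision vector $\mathbf{x} = (\Delta\boldsymbol{\alpha}, \mathbf{t})$, $\lambda \geq 0$ the price of variance and $\varepsilon > 0$ a small regularisation constant on $\mathbf{t}^\intercal \mathbf{t}$. The function names and example data are hypothetical. The solver call is wrapped in a function that imports CVXOPT lazily, so the assembly itself runs without CVXOPT installed; `cvxopt.matrix` accepts NumPy arrays, and `solvers.qp` returns a dictionary whose `"x"` entry holds the primal solution.

```python
import numpy as np

def symmetric_cost_qp(delta, J, Sigma, c, lam, eps):
    """Standard-form data (P, q, G, h) for the symmetric-cost hedging problem.

    x = (dalpha, t) in R^{2n}; the inequalities dalpha - t <= 0 and
    -dalpha - t <= 0 force t >= |dalpha|. The constant lam * delta^T Sigma delta
    is omitted from the objective.
    """
    n = J.shape[0]
    I, Z = np.eye(n), np.zeros((n, n))
    M = J @ Sigma @ J.T
    P = np.block([[2.0 * lam * M, Z], [Z, 2.0 * eps * I]])
    q = np.concatenate([2.0 * lam * (J @ Sigma @ delta), c])
    G = np.block([[I, -I], [-I, -I]])
    h = np.zeros(2 * n)
    return P, q, G, h

def solve_with_cvxopt(P, q, G, h):
    """Pass the arrays to cvxopt.solvers.qp (requires CVXOPT) and return dalpha."""
    from cvxopt import matrix, solvers
    solvers.options["show_progress"] = False
    sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h))
    x = np.array(sol["x"]).ravel()
    return x[: x.size // 2]   # first n components; the rest is t

# Hypothetical example data: n = 2 hedge products, k = 3 factors.
rng = np.random.default_rng(4)
n, k = 2, 3
A = rng.normal(size=(k, k))
Sigma = A @ A.T + 0.1 * np.eye(k)
J = rng.normal(size=(n, k))
delta = rng.normal(size=k)
c = np.array([0.01, 0.02])
lam, eps = 1.0, 1e-6
P, q, G, h = symmetric_cost_qp(delta, J, Sigma, c, lam, eps)
```

With CVXOPT available, `solve_with_cvxopt(P, q, G, h)` would return $\Delta\boldsymbol{\alpha}$; the equality-constraint arguments `A`, `b` are simply omitted because this problem has none.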
10.2 When is $\mathbf{P}$ positive definite?
In this section, and in this section only, $\mathbf{P}$ will denote the parameter of cvxopt.solvers.qp and not the vector of prices.
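By Lemma A.3, the eigenvalues of $\mathbf{P}$ are precisely those of $2 \lambda \mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal$ and $2 \varepsilon \mathbf{I}_n$, combined. Clearly the eigenvalues of $2 \varepsilon \mathbf{I}_n$ are just $2 \varepsilon$ repeated $n$ times. To find the eigenvalues of $2 \lambda \mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal$, we solve the characteristic equation
$$\det(2 \lambda \mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal - \mu \mathbf{I}_n) = 0$$
for $\mu$. On observing that, for $\lambda > 0$,
$$\det(2 \lambda \mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal - \mu \mathbf{I}_n) = (2\lambda)^n \det\left( \mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal - \frac{\mu}{2\lambda} \mathbf{I}_n \right),$$
we notice that this determinant is zero precisely when
$$\frac{\mu}{2\lambda} = \nu$$
for an eigenvalue, $\nu$, of $\mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal$. Thus, to summarise, the eigenvalues of $\mathbf{P}$ are:
- $2 \varepsilon$, repeated $n$ times;
- $\mu_i = 2 \lambda \nu_i$ for $i \in [n]$, where $\nu_i$ is the $i$th eigenvalue of $\mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal$.
It is now clear that $\mathbf{P}$ is positive definite iff $\varepsilon > 0$ and $2 \lambda \nu_{\min} > 0$ (i.e. iff $\varepsilon > 0$ and $\lambda > 0$), where $\nu_{\min}$ is the least eigenvalue of $\mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal$ (which is positive because $\mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal$ is positive definite by assumption).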
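The eigenvalue characterisation can be confirmed numerically. A minimal sketch in Python with NumPy on random hypothetical data, assuming, as in the text, a matrix of the form $\mathbf{P} = \operatorname{diag}(2 \lambda \mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal, \, 2 \varepsilon \mathbf{I}_n)$: its spectrum should be $2\varepsilon$ repeated $n$ times together with $2\lambda\nu_i$.

```python
import numpy as np

# Hypothetical data: n = 3 hedge products, k = 4 factors.
rng = np.random.default_rng(2)
n, k = 3, 4
A = rng.normal(size=(k, k))
Sigma = A @ A.T + 0.1 * np.eye(k)   # positive definite covariance
J = rng.normal(size=(n, k))         # full row rank with probability one
lam, eps = 0.5, 1e-3

M = J @ Sigma @ J.T
Z = np.zeros((n, n))
P = np.block([[2.0 * lam * M, Z], [Z, 2.0 * eps * np.eye(n)]])

eig_P = np.sort(np.linalg.eigvalsh(P))   # P is symmetric, so eigvalsh applies
nu = np.linalg.eigvalsh(M)               # eigenvalues of J Sigma J^T
predicted = np.sort(np.concatenate([np.full(n, 2.0 * eps), 2.0 * lam * nu]))
```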
11 Minimising risk and asymmetric costs
In Section 10, we assumed that the costs would be the same irrespective of the signs of the components of $\Delta\boldsymbol{\alpha}$. In this section we shall develop an approach that will allow us to provide separate costs for buying, $\mathbf{c}^+ \in \mathbb{R}^n$, and for selling, $\mathbf{c}^- \in \mathbb{R}^n$.
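To this end we also define $\mathbf{x}$ as the block vector
$$\mathbf{x} = \begin{pmatrix} \Delta\boldsymbol{\alpha}^+ \\ \Delta\boldsymbol{\alpha}^- \end{pmatrix},$$
with $\Delta\boldsymbol{\alpha}^+, \Delta\boldsymbol{\alpha}^- \in \mathbb{R}^n$ with all their components nonnegative, so that $\Delta\boldsymbol{\alpha} = \Delta\boldsymbol{\alpha}^+ - \Delta\boldsymbol{\alpha}^-$. $\Delta\boldsymbol{\alpha}^+$ specifies the notional amounts to be bought, $\Delta\boldsymbol{\alpha}^-$ specifies the notional amounts to be sold, for each product.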
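The optimisation problem now becomes
$$\min_{\Delta\boldsymbol{\alpha}^+, \, \Delta\boldsymbol{\alpha}^- \in \mathbb{R}^n} \; \lambda \left( \boldsymbol{\delta} + \mathbf{J}^\intercal (\Delta\boldsymbol{\alpha}^+ - \Delta\boldsymbol{\alpha}^-) \right)^\intercal \boldsymbol{\Sigma} \left( \boldsymbol{\delta} + \mathbf{J}^\intercal (\Delta\boldsymbol{\alpha}^+ - \Delta\boldsymbol{\alpha}^-) \right) + (\mathbf{c}^+)^\intercal \Delta\boldsymbol{\alpha}^+ + (\mathbf{c}^-)^\intercal \Delta\boldsymbol{\alpha}^-$$
subject to
$$\Delta\boldsymbol{\alpha}^+ \succeq \mathbf{0}_n, \qquad \Delta\boldsymbol{\alpha}^- \succeq \mathbf{0}_n.$$
Here $\lambda$ is a nonnegative constant specifying how many units of cash we are prepared to pay for reducing the variance of the value of the portfolio by one unit (in units of value squared).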
We note that this is now a constrained QP.
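There is a problem with this formulation: nothing guarantees that we won't be buying and selling the same product simultaneously, i.e. that, for some $i \in [n]$, $\Delta\alpha_i^+ > 0$ and $\Delta\alpha_i^- > 0$. To address this, we add an additional term, $\gamma \, (\Delta\boldsymbol{\alpha}^+)^\intercal \Delta\boldsymbol{\alpha}^-$ with a constant $\gamma > 0$, which penalises simultaneous buying and selling:
$$\min_{\Delta\boldsymbol{\alpha}^+, \, \Delta\boldsymbol{\alpha}^- \in \mathbb{R}^n} \; \lambda \left( \boldsymbol{\delta} + \mathbf{J}^\intercal (\Delta\boldsymbol{\alpha}^+ - \Delta\boldsymbol{\alpha}^-) \right)^\intercal \boldsymbol{\Sigma} \left( \boldsymbol{\delta} + \mathbf{J}^\intercal (\Delta\boldsymbol{\alpha}^+ - \Delta\boldsymbol{\alpha}^-) \right) + (\mathbf{c}^+)^\intercal \Delta\boldsymbol{\alpha}^+ + (\mathbf{c}^-)^\intercal \Delta\boldsymbol{\alpha}^- + \gamma \, (\Delta\boldsymbol{\alpha}^+)^\intercal \Delta\boldsymbol{\alpha}^-$$
subject to
$$\Delta\boldsymbol{\alpha}^+ \succeq \mathbf{0}_n, \qquad \Delta\boldsymbol{\alpha}^- \succeq \mathbf{0}_n.$$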
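Let us write this problem as a standard QP. The first term of the objective function can be written as
$$\lambda (\Delta\boldsymbol{\alpha}^+)^\intercal \mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal \Delta\boldsymbol{\alpha}^+ - 2 \lambda (\Delta\boldsymbol{\alpha}^+)^\intercal \mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal \Delta\boldsymbol{\alpha}^- + \lambda (\Delta\boldsymbol{\alpha}^-)^\intercal \mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal \Delta\boldsymbol{\alpha}^- + 2 \lambda \boldsymbol{\delta}^\intercal \boldsymbol{\Sigma} \mathbf{J}^\intercal (\Delta\boldsymbol{\alpha}^+ - \Delta\boldsymbol{\alpha}^-) + \lambda \boldsymbol{\delta}^\intercal \boldsymbol{\Sigma} \boldsymbol{\delta}.$$
The last term is a constant and can be dropped from the minimisation. The remaining terms can be rewritten as
$$\frac{1}{2} \mathbf{x}^\intercal \begin{pmatrix} 2 \lambda \mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal & -2 \lambda \mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal \\ -2 \lambda \mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal & 2 \lambda \mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal \end{pmatrix} \mathbf{x} + \begin{pmatrix} 2 \lambda \mathbf{J} \boldsymbol{\Sigma} \boldsymbol{\delta} \\ -2 \lambda \mathbf{J} \boldsymbol{\Sigma} \boldsymbol{\delta} \end{pmatrix}^\intercal \mathbf{x}.$$
The second term of the objective function can be written as
$$\begin{pmatrix} \mathbf{c}^+ \\ \mathbf{c}^- \end{pmatrix}^\intercal \mathbf{x}.$$
Finally, the third term of the objective function can be written as
$$\frac{1}{2} \mathbf{x}^\intercal \begin{pmatrix} \mathbf{0} & \gamma \mathbf{I}_n \\ \gamma \mathbf{I}_n & \mathbf{0} \end{pmatrix} \mathbf{x}.$$
Putting this together, we rewrite the objective function (up to the dropped constant) as
$$\frac{1}{2} \mathbf{x}^\intercal \begin{pmatrix} 2 \lambda \mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal & \gamma \mathbf{I}_n - 2 \lambda \mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal \\ \gamma \mathbf{I}_n - 2 \lambda \mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal & 2 \lambda \mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal \end{pmatrix} \mathbf{x} + \begin{pmatrix} 2 \lambda \mathbf{J} \boldsymbol{\Sigma} \boldsymbol{\delta} + \mathbf{c}^+ \\ -2 \lambda \mathbf{J} \boldsymbol{\Sigma} \boldsymbol{\delta} + \mathbf{c}^- \end{pmatrix}^\intercal \mathbf{x}.$$
We can also rewrite the constraints in block matrix form as
$$-\mathbf{I}_{2n} \, \mathbf{x} \preceq \mathbf{0}_{2n}.$$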
11.1 Using CVXOPT
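Thus we can call
cvxopt.solvers.qp(P, q[, G, h[, A, b[, solver[, initvals]]]])
with (here $\mathbf{P}$ denotes the CVXOPT parameter, not the vector of prices)
$$\mathbf{P} = \begin{pmatrix} 2 \lambda \mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal & \gamma \mathbf{I}_n - 2 \lambda \mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal \\ \gamma \mathbf{I}_n - 2 \lambda \mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal & 2 \lambda \mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal \end{pmatrix}, \quad \mathbf{q} = \begin{pmatrix} 2 \lambda \mathbf{J} \boldsymbol{\Sigma} \boldsymbol{\delta} + \mathbf{c}^+ \\ -2 \lambda \mathbf{J} \boldsymbol{\Sigma} \boldsymbol{\delta} + \mathbf{c}^- \end{pmatrix}, \quad \mathbf{G} = -\mathbf{I}_{2n}, \quad \mathbf{h} = \mathbf{0}_{2n},$$
and no equality constraints, to find the optimal $\mathbf{x} = \begin{pmatrix} \Delta\boldsymbol{\alpha}^+ \\ \Delta\boldsymbol{\alpha}^- \end{pmatrix}$, and hence $\Delta\boldsymbol{\alpha} = \Delta\boldsymbol{\alpha}^+ - \Delta\boldsymbol{\alpha}^-$.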
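A minimal sketch, in Python with NumPy, of assembling the standard-form data for the asymmetric-cost problem, with $\mathbf{x} = (\Delta\boldsymbol{\alpha}^+, \Delta\boldsymbol{\alpha}^-)$, $\lambda$ the price of variance and $\gamma > 0$ the weight of the term $(\Delta\boldsymbol{\alpha}^+)^\intercal \Delta\boldsymbol{\alpha}^-$. The function name and example data are hypothetical. The arrays can be handed to `cvxopt.solvers.qp` exactly as in the symmetric case.

```python
import numpy as np

def asymmetric_cost_qp(delta, J, Sigma, c_buy, c_sell, lam, gamma):
    """Standard-form data (P, q, G, h) for the asymmetric-cost hedging problem.

    x = (dalpha_plus, dalpha_minus) in R^{2n}, both nonnegative; the hedge is
    dalpha = dalpha_plus - dalpha_minus. The constant lam * delta^T Sigma delta
    is omitted from the objective.
    """
    n = J.shape[0]
    M = J @ Sigma @ J.T
    b = J @ Sigma @ delta
    off = gamma * np.eye(n) - 2.0 * lam * M
    P = np.block([[2.0 * lam * M, off], [off, 2.0 * lam * M]])
    q = np.concatenate([2.0 * lam * b + c_buy, -2.0 * lam * b + c_sell])
    G = -np.eye(2 * n)   # -x <= 0, i.e. x >= 0
    h = np.zeros(2 * n)
    return P, q, G, h

# Hypothetical example data: n = 2 hedge products, k = 3 factors.
rng = np.random.default_rng(5)
n, k = 2, 3
A = rng.normal(size=(k, k))
Sigma = A @ A.T + 0.1 * np.eye(k)
J = rng.normal(size=(n, k))
delta = rng.normal(size=k)
c_buy = np.array([0.01, 0.02])
c_sell = np.array([0.015, 0.025])
lam, gamma = 1.0, 1e-3
P, q, G, h = asymmetric_cost_qp(delta, J, Sigma, c_buy, c_sell, lam, gamma)
```

After solving, the hedge is recovered as the difference of the two halves of the solution vector.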
11.2 When is $\mathbf{P}$ positive definite?
In this section, and in this section only, $\mathbf{P}$ will denote the parameter of cvxopt.solvers.qp and not the vector of prices.
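By Lemma A.4, the eigenvalues of $\mathbf{P}$ are precisely those of $\gamma \mathbf{I}_n$ and $4 \lambda \mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal - \gamma \mathbf{I}_n$, combined. Clearly the eigenvalues of $\gamma \mathbf{I}_n$ are just $\gamma$ repeated $n$ times. To find the eigenvalues of $4 \lambda \mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal - \gamma \mathbf{I}_n$, we solve the characteristic equation
$$\det(4 \lambda \mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal - \gamma \mathbf{I}_n - \mu \mathbf{I}_n) = 0$$
for $\mu$. On observing that, for $\lambda > 0$,
$$\det\left( 4 \lambda \mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal - (\gamma + \mu) \mathbf{I}_n \right) = (4\lambda)^n \det\left( \mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal - \frac{\gamma + \mu}{4\lambda} \mathbf{I}_n \right),$$
we notice that this determinant is zero precisely when
$$\frac{\gamma + \mu}{4\lambda} = \nu$$
for an eigenvalue, $\nu$, of $\mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal$. Thus, to summarise, the eigenvalues of $\mathbf{P}$ are:
- $\gamma$, repeated $n$ times;
- $\mu_i = 4 \lambda \nu_i - \gamma$ for $i \in [n]$, where $\nu_i$ is the $i$th eigenvalue of $\mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal$.
It is now clear that $\mathbf{P}$ is positive definite iff $0 < \gamma < 4 \lambda \nu_{\min}$, where $\nu_{\min}$ is the least eigenvalue of $\mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal$ (which is positive because $\mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal$ is positive definite by assumption).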
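The admissible range for $\gamma$ can be checked numerically. A minimal sketch in Python with NumPy on random hypothetical data, assuming the block matrix $\mathbf{P}$ with diagonal blocks $2 \lambda \mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal$ and off-diagonal blocks $\gamma \mathbf{I}_n - 2 \lambda \mathbf{J} \boldsymbol{\Sigma} \mathbf{J}^\intercal$: the predicted spectrum is compared with the computed one, and a value of $\gamma$ inside $(0, 4 \lambda \nu_{\min})$ is contrasted with one outside it.

```python
import numpy as np

def asymmetric_P(J, Sigma, lam, gamma):
    """Quadratic-term matrix of the asymmetric-cost QP."""
    n = J.shape[0]
    M = J @ Sigma @ J.T
    off = gamma * np.eye(n) - 2.0 * lam * M
    return np.block([[2.0 * lam * M, off], [off, 2.0 * lam * M]])

# Hypothetical data: n = 3 hedge products, k = 4 factors.
rng = np.random.default_rng(6)
n, k = 3, 4
A = rng.normal(size=(k, k))
Sigma = A @ A.T + 0.1 * np.eye(k)
J = rng.normal(size=(n, k))
lam = 0.5

nu = np.linalg.eigvalsh(J @ Sigma @ J.T)   # ascending order, so nu[0] is nu_min
nu_min = nu[0]

def min_eig(gamma):
    return np.linalg.eigvalsh(asymmetric_P(J, Sigma, lam, gamma)).min()

gamma_ok = 2.0 * lam * nu_min    # inside (0, 4 lam nu_min)
gamma_bad = 5.0 * lam * nu_min   # outside the interval
eig_ok = np.sort(np.linalg.eigvalsh(asymmetric_P(J, Sigma, lam, gamma_ok)))
predicted = np.sort(np.concatenate([np.full(n, gamma_ok), 4.0 * lam * nu - gamma_ok]))
```

For `gamma_bad` the eigenvalue $4 \lambda \nu_{\min} - \gamma = -\lambda \nu_{\min}$ is negative, so the QP is no longer convex.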
12 Practical considerations
The universe of products that we trade may be a proper superset of the universe of products that we use to hedge. This is easily implemented within our framework: compute the risk for the overall portfolio, then restrict the set of products under consideration to the hedging universe. Then the dimension $n$ is the number of products that are used for hedging and the derivations remain valid for this restricted set of products.
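A minimal sketch of this restriction in Python with NumPy, using the unconstrained solution of (3) for simplicity. The data and the choice of hedging universe (`hedge_idx`) are hypothetical: the exposure is computed from all traded products via (2), and only the rows of the Jacobian that correspond to hedge products enter the optimisation.

```python
import numpy as np

# Hypothetical data: 5 traded products, 3 risk factors.
rng = np.random.default_rng(3)
n_all, k = 5, 3
J_all = rng.normal(size=(n_all, k))      # Jacobian for every traded product
alpha_all = rng.normal(size=n_all)       # current net notionals
A = rng.normal(size=(k, k))
Sigma = A @ A.T + 0.1 * np.eye(k)

delta = J_all.T @ alpha_all              # risk of the overall portfolio, equation (2)
hedge_idx = [0, 3]                       # hypothetical hedging universe
J_h = J_all[hedge_idx]                   # rows of J for the hedge products only

# Unconstrained minimum-variance hedge over the restricted set of products.
M = J_h @ Sigma @ J_h.T
dalpha_h = -np.linalg.solve(M, J_h @ Sigma @ delta)
residual = delta + J_h.T @ dalpha_h      # factor exposure after hedging
```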
13 Special case: $k = n$, $\mathbf{J}$ and $\boldsymbol{\Sigma}$ diagonal
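Let us now consider the special case when $k = n$ and both $\mathbf{J}$ and $\boldsymbol{\Sigma}$ are diagonal matrices:
$$\mathbf{J} = \operatorname{diag}(j_1, \ldots, j_n), \qquad \boldsymbol{\Sigma} = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_n^2).$$
Additionally, we require that $\boldsymbol{\Sigma}$ be positive definite (not just positive semidefinite), so $\sigma_i^2 > 0$ for $i \in [n]$. We shall also require $j_i \neq 0$ for $i \in [n]$.
In this setting, the unconstrained QP (3) reduces to the system of scalar optimisation problems without any coupling,
$$\min_{\Delta\alpha_i \in \mathbb{R}} \; \sigma_i^2 (\delta_i + j_i \Delta\alpha_i)^2, \qquad i \in [n].$$
Using elementary calculus (setting the derivative of $\sigma_i^2 (\delta_i + j_i \Delta\alpha_i)^2$ with respect to $\Delta\alpha_i$ to zero), we find the optimal $\Delta\alpha_i$: $\Delta\alpha_i^* = -\delta_i / j_i$.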
Note that in this case, when the correlations are absent, the sign of $\Delta\alpha_i^*$ (i.e. whether we buy or sell the $i$th product) depends entirely on the signs of $\delta_i$ and $j_i$: $\Delta\alpha_i^*$ is positive (i.e. we have to buy $|\Delta\alpha_i^*|$ units of notional of the $i$th product) when $\delta_i$ and $j_i$ are of different signs; $\Delta\alpha_i^*$ is negative (i.e. we have to sell $|\Delta\alpha_i^*|$ units of notional of the $i$th product) when $\delta_i$ and $j_i$ are of the same sign. (Of course, we are assuming that $\delta_i \neq 0$, as otherwise there is no risk to hedge.)
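For this reason there is no need to augment the costs of buying and selling as we did in Section 11. We can simply set
$$c_i = \begin{cases} c_i^+, & \delta_i j_i < 0 \text{ (we buy)}, \\ c_i^-, & \delta_i j_i > 0 \text{ (we sell)}. \end{cases}$$
The objective function then becomes $\sum_{i=1}^{n} \left[ \lambda \sigma_i^2 (\delta_i + j_i \Delta\alpha_i)^2 + c_i |\Delta\alpha_i| \right]$ and the optimisation problem decouples into
$$\min_{\Delta\alpha_i \in \mathbb{R}} \; \lambda \sigma_i^2 (\delta_i + j_i \Delta\alpha_i)^2 + c_i |\Delta\alpha_i|, \qquad i \in [n],$$
where $\lambda > 0$ is as in Section 10.
Again using elementary calculus (with $c_i \geq 0$), we find
$$\Delta\alpha_i^* = -\operatorname{sgn}\left( \frac{\delta_i}{j_i} \right) \max\left( \left| \frac{\delta_i}{j_i} \right| - \frac{c_i}{2 \lambda \sigma_i^2 j_i^2}, \; 0 \right),$$
so the costs shrink the cost-free hedge $-\delta_i / j_i$ towards zero, and we do not trade the $i$th product at all when $|\delta_i j_i| \leq c_i / (2 \lambda \sigma_i^2)$.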
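The closed form is a soft-thresholding rule, which is easy to check against a brute-force search. A minimal sketch in Python with NumPy; the function name is hypothetical and it assumes $\lambda > 0$, $\sigma_i^2 > 0$, $j_i \neq 0$ and $c_i \geq 0$, as in the text.

```python
import numpy as np

def scalar_hedge_with_costs(delta_i, j_i, sigma2_i, c_i, lam):
    """Minimise lam * sigma2_i * (delta_i + j_i * d)**2 + c_i * |d| over d."""
    target = -delta_i / j_i                          # cost-free hedge
    shrink = c_i / (2.0 * lam * sigma2_i * j_i**2)   # pull-back towards zero due to costs
    return np.sign(target) * max(abs(target) - shrink, 0.0)

# Example: delta_i = 3, j_i = 2, sigma_i^2 = 0.5, c_i = 0.4, lam = 1.
# The cost-free hedge is -1.5; costs shrink it by 0.1 to -1.4.
d_star = scalar_hedge_with_costs(3.0, 2.0, 0.5, 0.4, 1.0)
# With much higher costs it is optimal not to trade at all.
d_none = scalar_hedge_with_costs(3.0, 2.0, 0.5, 10.0, 1.0)
```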
14 Case study: CDS indices, no cross-hedging
Let us now consider the case of CDS indices. Recall that credit DV01 (CDV01) or CS01 (Credit Spread 01) is defined as the change in price of the CDS contract (of a given notional) for a one basis point increase in spread.
For the European CDS indices (the iTraxx family), we shall take EUR as our unit of notional. For the North American CDS indices (the CDX family), we shall take USD as our unit of notional. So $\boldsymbol{\alpha}$ and $\Delta\boldsymbol{\alpha}$ will have these units.
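The unit of price ($\mathbf{P}$) will be EUR for iTraxx and USD for CDX. Our invariants ($\mathbf{X}_{T+\tau,\tau}$) will be credit spreads, whose units are basis points. We shall set $\mathbf{h}$ in (1) to be the identity map, i.e. our factors will be the same as our invariants, $\mathbf{F}_{T+\tau} = \mathbf{X}_{T+\tau,\tau}$. Since our risk exposure is given by $\boldsymbol{\delta}_{T+\tau} = \left( \partial V_{T+\tau} / \partial \mathbf{F}_{T+\tau} \right)^\intercal$, its units will be EUR per basis point for iTraxx and USD per basis point for CDX. Since the Jacobian is given by $\mathbf{J}_{T+\tau} = \partial \mathbf{P}_{T+\tau} / \partial \mathbf{F}_{T+\tau}$, its entries are CS01s: $J_{ij}$ is the change in price (in EUR for iTraxx, USD for CDX) per 1,000,000 EUR for iTraxx (1,000,000 USD for CDX) notional of the $i$th index for a one basis point increase in the $j$th spread.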
15 Case study: European government bonds
Suppose that we have a portfolio of $n$ European government bonds. For $i \in [n]$, the price of the $i$th bond at time $t$ is given by
$$P_{t,i} = \sum_{j=1}^{m_i} C_{i,j} \exp\left( -\left( y_t(\theta_{t,i,j}) + s_{t,i} \right) \theta_{t,i,j} \right), \tag{4}$$
where
- $P_{t,i}$ is the dirty market price of the $i$th bond;
- $m_i$ is the number of cashflows for the $i$th bond;
- $C_{i,j}$ is the $j$th cashflow for the $i$th bond;
- $\theta_{t,i,j}$ is the duration of the time interval between the time $t$ and the time of the $j$th cashflow of the $i$th bond;
- $y_t$ is the zero curve at time $t$, which is a function that maps the maturities of the cashflows (the durations of the time intervals between the time $t$ and the times of the cashflows) to the continuously compounded interest rates;
- $s_{t,i}$ is a parallel vertical shift to the zero curve which is required to equate the right-hand side to the dirty market price of the $i$th bond. We shall refer to $s_{t,i}$ as the idiosyncratic spread of the $i$th bond at time $t$.
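We model the yield curve as
$$y_t(\theta) = \sum_{l=1}^{L} \beta_{t,l} \, \phi_l(\theta),$$
where $\boldsymbol{\beta}_t = (\beta_{t,1}, \ldots, \beta_{t,L})^\intercal \in \mathbb{R}^L$ and, for $l \in [L]$,
$$\phi_l \colon [0, \infty) \to \mathbb{R}$$
are some suitably defined basis functions.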
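As we are interested in intraday market making, our investment horizon is relatively short, so we can assume that the bond prices are our invariants,
$$\mathbf{X}_{T+\tau,\tau} = \mathbf{P}_{T+\tau},$$
so $\mathbf{g}$ is the identity map, i.e., for all $\mathbf{x} \in \mathbb{R}^n$, $\mathbf{g}(\mathbf{x}) = \mathbf{x}$, so that
$$\mathbf{P}_{T+\tau} = \mathbf{g}(\mathbf{X}_{T+\tau,\tau}) = \mathbf{X}_{T+\tau,\tau}.$$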
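The risk factors are given by
$$\mathbf{F}_{T+\tau} = \left( \hat{\beta}_{T+\tau,1}, \ldots, \hat{\beta}_{T+\tau,L}, \hat{s}_{T+\tau,1}, \ldots, \hat{s}_{T+\tau,n} \right)^\intercal \in \mathbb{R}^{L+n},$$
where
$$\hat{\beta}_{T+\tau,l}, \; l \in [L], \qquad \hat{s}_{T+\tau,i}, \; i \in [n],$$
are, respectively, our forecasts for
$$\beta_{T+\tau,l}, \; l \in [L], \qquad s_{T+\tau,i}, \; i \in [n].$$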
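Our risk factors explain all of the risk, so $\mathbf{U}_{T+\tau} = \mathbf{0}$ and
$$X_{T+\tau,\tau,i} = h_i(\mathbf{F}_{T+\tau}) = \sum_{j=1}^{m_i} C_{i,j} \exp\left( -\left( \sum_{l=1}^{L} \hat{\beta}_{T+\tau,l} \, \phi_l(\theta_{T+\tau,i,j}) + \hat{s}_{T+\tau,i} \right) \theta_{T+\tau,i,j} \right),$$
where $h_i$ is given by equation (4) with the zero curve and the idiosyncratic spread replaced by their forecasts.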
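Thus we need to find
$$J_{i,r} = \frac{\partial P_{T+\tau,i}}{\partial F_{T+\tau,r}}$$
for $i \in [n]$, $r \in [L+n]$. To lighten notation, write $\theta_{i,j} = \theta_{T+\tau,i,j}$ and $\hat{y}(\theta) = \sum_{l=1}^{L} \hat{\beta}_{T+\tau,l} \, \phi_l(\theta)$. Applying the chain rule, for $l \in [L]$,
$$\frac{\partial P_{T+\tau,i}}{\partial \hat{\beta}_{T+\tau,l}} = -\sum_{j=1}^{m_i} C_{i,j} \, \theta_{i,j} \, \phi_l(\theta_{i,j}) \exp\left( -\left( \hat{y}(\theta_{i,j}) + \hat{s}_{T+\tau,i} \right) \theta_{i,j} \right),$$
and for $r \in \{L+1, \ldots, L+n\}$, setting $i' = r - L$,
$$\frac{\partial P_{T+\tau,i}}{\partial \hat{s}_{T+\tau,i'}} = \begin{cases} -\sum_{j=1}^{m_i} C_{i,j} \, \theta_{i,j} \exp\left( -\left( \hat{y}(\theta_{i,j}) + \hat{s}_{T+\tau,i} \right) \theta_{i,j} \right), & i = i', \\ 0, & i \neq i'. \end{cases}$$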
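The chain-rule formulas can be verified against finite differences. A minimal sketch in Python with NumPy, assuming purely for illustration the hypothetical polynomial basis $\phi_l(\theta) = \theta^{l-1}$ and made-up cashflows, cashflow times, curve coefficients and spreads; the function names are also hypothetical. The Jacobian has $n$ rows and $L + n$ columns: the first $L$ for the curve coefficients, the last $n$ for the spreads.

```python
import numpy as np

def basis(theta, L):
    """Hypothetical basis phi_l(theta) = theta**(l-1), l = 1..L; returns shape (m, L)."""
    return np.stack([theta**l for l in range(L)], axis=-1)

def bond_prices(beta, s, cashflows, times):
    """Equation (4): P_i = sum_j C_ij exp(-(y(theta_ij) + s_i) theta_ij)."""
    L = beta.size
    prices = np.empty(len(cashflows))
    for i, (C, theta) in enumerate(zip(cashflows, times)):
        y = basis(theta, L) @ beta                 # zero rates at the cashflow times
        prices[i] = np.sum(C * np.exp(-(y + s[i]) * theta))
    return prices

def bond_jacobian(beta, s, cashflows, times):
    """J = dP/dF with F = (beta_1..beta_L, s_1..s_n): an n x (L + n) matrix."""
    L, n = beta.size, len(cashflows)
    J = np.zeros((n, L + n))
    for i, (C, theta) in enumerate(zip(cashflows, times)):
        phi = basis(theta, L)                      # shape (m_i, L)
        disc = C * np.exp(-(phi @ beta + s[i]) * theta)
        J[i, :L] = -(disc * theta) @ phi           # dP_i / dbeta_l
        J[i, L + i] = -np.sum(disc * theta)        # dP_i / ds_i; zero for other spreads
    return J

# Hypothetical inputs: two bonds, a three-term curve.
beta = np.array([0.025, 0.004, -0.0003])
s = np.array([0.001, -0.0005])
cashflows = [np.array([3.0, 3.0, 103.0]), np.array([2.0, 102.0])]
times = [np.array([0.5, 1.5, 2.5]), np.array([0.25, 1.25])]
L = beta.size
J = bond_jacobian(beta, s, cashflows, times)
```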
References
- [1] FICO Xpress Optimizer Reference Manual, 2023.
- [2] The Numerical Algorithms Group NAG Library Manual, Mark 29.2, 2023. https://support.nag.com/numeric/nl/nagdoc_latest/.
- [3] Irene Aldridge. High-Frequency Trading: A Practical Guide to Algorithmic Strategies and Trading Systems. Wiley, 2 edition, 2013.
- [4] Martin Andersen, Joachim Dahl, and Lieven Vandenberghe. CVXOPT: Convex optimization. Astrophysics Source Code Library, 2020.
- [5] MOSEK ApS. The MOSEK optimization toolbox for MATLAB manual. Version 9.0, 2019.
- [6] Karl J. Astrom. Introduction to Stochastic Control Theory. Dover, 2006.
- [7] Dimitri P. Bertsekas. Dynamic programming and optimal control, Volume I. Athena Scientific, Belmont, MA, 2001.
- [8] Dimitri P. Bertsekas. Dynamic programming and optimal control, Volume II. Athena Scientific, Belmont, MA, 2005.
- [9] Michael J. Best. Portfolio Optimization. CRC Press, 2010.
- [10] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
- [11] Kazem Cheshmi, Danny M. Kaufman, Shoaib Kamil, and Maryam Mehri Dehnavi. NASOQ: Numerically accurate sparsity-oriented QP solver. ACM Transactions on Graphics, 39(4), aug 2020.
- [12] Gerard Cornuejols, Javier Pena, and Reha Tutuncu. Optimization Methods in Finance. Cambridge University Press, 2 edition, 2018.
- [13] Emanuel Derman. My Life as a Quant: Reflections on Physics and Finance. Wiley, 2007.
- [14] Philip E. Gill, Walter Murray, and Margaret H. Wright. Practical Optimization. Emerald Group Publishing Limited, 1982.
- [15] Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual, 2023.
- [16] IBM. V12.1: User's manual for CPLEX. Technical report, IBM, 2009.
- [17] IMSL. IMSL STAT/LIBRARY. Visual Numerics Inc., Houston, Texas, USA, 1997. http://www.vni.com/books/dod/pdf/STATVol_2.pdf.
- [18] Mark Joshi. On becoming a quant. http://www.maths.usyd.edu.au/u/UG/SM/MATH3075/r/Joshi_2008.pdf, 2008.
- [19] Mark S. Joshi and Jane M. Paterson. Introduction to Mathematical Portfolio Theory. International Series on Actuarial Science. Cambridge University Press, 2013.
- [20] Harold J. Kushner and Paul Dupuis. Numerical Methods for Stochastic Control Problems in Continuous Time. Springer, 2 edition, 2000.
- [21] John W. Lockwood, Adwait Gupte, Nishit Mehta, Michaela Blott, Tom English, and Kees Vissers. A low-latency library in FPGA hardware for high-frequency trading (HFT). In IEEE 20th Annual Symposium on High-Performance Interconnects, pages 9–16, 2012.
- [22] Attilio Meucci. Risk and Asset Allocation. Springer Finance. Springer, 2005.
- [23] Jan Novotny, Paul Alexander Bilokon, Aris Galiotos, and Frédéric Délèze. Machine Learning and Big Data with kdb+/q. Wiley, 2019.
- [24] Bernt Øksendal. Stochastic Differential Equations: An Introduction with Applications. Universitext. Springer, 6 edition, 2000.
- [25] Bernt Øksendal and Agnes Sulem. Applied Stochastic Control of Jump Diffusions. Springer, 3 edition, 2019.
- [26] Huyen Pham. Continuous-time Stochastic Control and Optimization with Financial Applications. Springer, 2009.
- [27] Warren B. Powell. Reinforcement Learning and Stochastic Optimization: A Unified Framework for Sequential Decisions. Wiley, 2022.
- [28] B. Stellato, G. Banjac, P. Goulart, A. Bemporad, and S. Boyd. OSQP: an operator splitting solver for quadratic programs. Mathematical Programming Computation, 12(4):637–672, 2020.
- [29] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2 edition, 2018.
- [30] Jiongmin Yong and Xun Yu Zhou. Stochastic Controls: Hamiltonian Systems and HJB Equations. Springer, 1999.
Appendix A Auxiliary results
Theorem A.1 (The variance of a function of a random variable).
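Let $X$ be a real-valued random variable with known finite expected value $\mu = \mathbb{E}[X]$ and finite nonzero variance $\sigma^2 = \operatorname{Var}[X]$, and let $f$ be a sufficiently smooth function such that $f(X)$ has finite variance. Then
$$\operatorname{Var}[f(X)] \approx \left( f'(\mu) \right)^2 \sigma^2.$$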
Proof.
The following proof is due to Tomek Tarczynski (see Tomek’s post “Variance of a function of one random variable” on CrossValidated: http://stats.stackexchange.com/questions/5782/variance-of-a-function-of-one-random-variable).
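By Chebyshev's inequality for random variables with finite variance, for any real $a > 0$,
$$\mathbb{P}(|X - \mu| \geq a\sigma) \leq \frac{1}{a^2},$$
so for any $\varepsilon > 0$ we can find a large enough $a$ so that
$$\mathbb{P}\left( X \in [\mu - a\sigma, \mu + a\sigma] \right) \geq 1 - \varepsilon.$$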
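Let us estimate $\mathbb{E}[f(X)]$. We can write it as
$$\mathbb{E}[f(X)] = \int_{\mu - a\sigma}^{\mu + a\sigma} f(x) \, dF_X(x) + \int_{|x - \mu| > a\sigma} f(x) \, dF_X(x), \tag{5}$$
where $F_X$ is the distribution function of $X$.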
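Since the domain of the first integral is the bounded closed interval $[\mu - a\sigma, \mu + a\sigma]$, we can apply the Taylor series expansion:
$$f(x) = f(\mu) + f'(\mu)(x - \mu) + \frac{f''(\mu)}{2}(x - \mu)^2 + \frac{f'''(\xi)}{6}(x - \mu)^3,$$
where $\xi \in [\mu - a\sigma, \mu + a\sigma]$, and the equality holds for all $x \in [\mu - a\sigma, \mu + a\sigma]$. Here we took only four terms in the Taylor series expansion, but in general we can take as many as needed, as long as the function $f$ is smooth enough.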
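Substituting this formula into (5), we get
$$\mathbb{E}[f(X)] = \int_{\mu - a\sigma}^{\mu + a\sigma} \left( f(\mu) + f'(\mu)(x - \mu) + \frac{f''(\mu)}{2}(x - \mu)^2 \right) dF_X(x) + \int_{\mu - a\sigma}^{\mu + a\sigma} \frac{f'''(\xi)}{6}(x - \mu)^3 \, dF_X(x) + \int_{|x - \mu| > a\sigma} f(x) \, dF_X(x).$$
Increasing the domain of integration, we obtain
$$\mathbb{E}[f(X)] = f(\mu) + \frac{f''(\mu)}{2} \sigma^2 + R_3, \tag{6}$$
where
$$R_3 = \frac{f'''(\xi)}{6} \mathbb{E}\left[ (X - \mu)^3 \right] + \int_{|x - \mu| > a\sigma} \left( f(x) - f(\mu) - f'(\mu)(x - \mu) - \frac{f''(\mu)}{2}(x - \mu)^2 \right) dF_X(x).$$
Under some moment conditions, we can show that the second term of this remainder is of the order of $\mathbb{P}(|X - \mu| > a\sigma)$, which is generally small. The first term remains; therefore the quality of the approximation depends on $\mathbb{E}[(X - \mu)^3]$ and the behaviour of the third derivative of $f$ on bounded intervals. Such an approximation should work particularly well for random variables with zero third central moment, such as normally distributed ones.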
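To obtain an approximation for the variance of $f(X)$, we subtract (6) from the Taylor series expansion for $f(X)$ and square the difference:
$$\left( f(X) - \mathbb{E}[f(X)] \right)^2 = \left( f'(\mu)(X - \mu) + \frac{f''(\mu)}{2}\left( (X - \mu)^2 - \sigma^2 \right) + \ldots \right)^2.$$
Taking expectations,
$$\operatorname{Var}[f(X)] = \left( f'(\mu) \right)^2 \sigma^2 + R_4,$$
where $R_4$ involves the central moments $\mathbb{E}[(X - \mu)^r]$ for $r \geq 3$. ∎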
Remark A.2.
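Theorem A.1 generalises to the $\mathbb{R}^k$-valued random variable $\mathbf{X}$, with $\boldsymbol{\mu} = \mathbb{E}[\mathbf{X}]$, $\boldsymbol{\Sigma} = \operatorname{Var}[\mathbf{X}]$, and $f \colon \mathbb{R}^k \to \mathbb{R}$:
$$\operatorname{Var}[f(\mathbf{X})] \approx \frac{\partial f}{\partial \mathbf{x}}(\boldsymbol{\mu}) \, \boldsymbol{\Sigma} \left( \frac{\partial f}{\partial \mathbf{x}}(\boldsymbol{\mu}) \right)^\intercal.$$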
Lemma A.3.
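Let $\mathbf{P}$ be the block matrix
$$\mathbf{P} = \begin{pmatrix} \mathbf{A} & \mathbf{0} \\ \mathbf{0} & \mathbf{D} \end{pmatrix}$$
with $\mathbf{A} \in \mathbb{R}^{n \times n}$, $\mathbf{D} \in \mathbb{R}^{n \times n}$. The eigenvalues of $\mathbf{P}$ are precisely those of the matrices $\mathbf{A}$ ($n$ eigenvalues, some of them possibly repeated) and $\mathbf{D}$ ($n$ eigenvalues, some of them possibly repeated).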
Proof.
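It is well known that for all square matrices $\mathbf{A}$ and $\mathbf{D}$ of equal dimensions, the following holds:
$$\det \begin{pmatrix} \mathbf{A} & \mathbf{0} \\ \mathbf{0} & \mathbf{D} \end{pmatrix} = \det(\mathbf{A}) \det(\mathbf{D}).$$
Therefore, the characteristic polynomial of this block matrix is given by
$$\det(\mathbf{P} - \mu \mathbf{I}_{2n}) = \det(\mathbf{A} - \mu \mathbf{I}_n) \det(\mathbf{D} - \mu \mathbf{I}_n).$$
It follows that the eigenvalues are precisely those of the matrices $\mathbf{A}$ and $\mathbf{D}$, combined. ∎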
Lemma A.4.
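Let $\mathbf{P}$ be the block matrix
$$\mathbf{P} = \begin{pmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{B} & \mathbf{A} \end{pmatrix}$$
with $\mathbf{A} \in \mathbb{R}^{n \times n}$, $\mathbf{B} \in \mathbb{R}^{n \times n}$. The eigenvalues of $\mathbf{P}$ are precisely those of the matrices $\mathbf{A} + \mathbf{B}$ ($n$ eigenvalues, some of them possibly repeated) and $\mathbf{A} - \mathbf{B}$ ($n$ eigenvalues, some of them possibly repeated).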
Proof.
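It is well known that for all square matrices $\mathbf{A}$ and $\mathbf{B}$ of equal dimensions, the following holds:
$$\det \begin{pmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{B} & \mathbf{A} \end{pmatrix} = \det(\mathbf{A} + \mathbf{B}) \det(\mathbf{A} - \mathbf{B}).$$
Therefore, the characteristic polynomial of this block matrix is given by
$$\det(\mathbf{P} - \mu \mathbf{I}_{2n}) = \det \begin{pmatrix} \mathbf{A} - \mu \mathbf{I}_n & \mathbf{B} \\ \mathbf{B} & \mathbf{A} - \mu \mathbf{I}_n \end{pmatrix} = \det(\mathbf{A} + \mathbf{B} - \mu \mathbf{I}_n) \det(\mathbf{A} - \mathbf{B} - \mu \mathbf{I}_n).$$
It follows that the eigenvalues are precisely those of the matrices $\mathbf{A} + \mathbf{B}$ and $\mathbf{A} - \mathbf{B}$, combined. ∎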