
Implementing portfolio risk management and hedging in practice

Paul Alexander Bilokon
Thalesians Ltd
Level39, One Canada Square
Canary Wharf
London E14 5AB
(2023.09.27)
Abstract

In the academic literature, portfolio risk management and hedging are often expressed in the language of stochastic control and Hamilton–Jacobi–Bellman (HJB) equations in continuous time. In practice the continuous-time framework of stochastic control may be undesirable for various business reasons. In this work we present a straightforward approach for thinking of cross-asset portfolio risk management and hedging, providing some implementation details, while rarely venturing outside the convex optimisation setting of (approximate) quadratic programming (QP). We pay particular attention to the correspondence between the economic concepts and their mathematical representations; the abstractions enabling us to handle multiple asset classes and risk models at once; the dimensional analysis of the resulting equations; and the assumptions inherent in our derivations. We demonstrate how to solve the resulting QPs with CVXOPT.

1 Introduction

In the academic literature, portfolio risk management and hedging are often (but not always [19, 22]) expressed in the language of stochastic control [6, 30, 26, 20, 25, 24] and Hamilton–Jacobi–Bellman (HJB) equations in continuous time. Indeed, this is the most rigorous and general framework for such considerations. Due to the inherent, natural relationship between stochastic control and reinforcement learning [7, 8], this framework can be extended relatively easily to the numerical methods of reinforcement learning [29, 27], which take care of the more realistic setting involving frictions.

In the author's practical experience of portfolio and risk management on the sell-side and on the buy-side, including at bulge bracket institutions, this approach presents several practical problems. First, some quantitative analysts and quantitative developers [13, 18] come from backgrounds that exclude stochastic control; for example, many classical computer science degrees cover neither stochastic control nor stochastic analysis as part of the syllabus. Second, the continuous-time framework is technically complex and requires discretisation at some point anyway. Third, there are relatively few industry-grade solvers for HJB-type problems.

On the other hand, quadratic programming [9, 12, 14, 10] is accessible to most majors and requires a relatively limited background in undergraduate linear algebra (rather than graduate-level stochastic analysis and measure theory). It is usually well understood by quantitative analysts and developers irrespective of their academic and professional background. Furthermore, there are fast industry-grade implementations of quadratic program (QP) solvers, including CPLEX [16], Gurobi [15], IMSL [17], NAG [2], NASOQ [11], OSQP [28], and Xpress [1]. Therefore it makes sense to come up with a QP formulation of the portfolio risk management and hedging problem. Even if such an approach is imperfect in comparison with the stochastic control approach, it is practical.

Finally, in medium- to high-frequency trading (HFT) [3] applications, the optimiser needs to be very efficient and is called sparingly, at discrete time intervals. In this setting, the QP formulation is yet again appealing.

In this work we describe the portfolio risk management and hedging framework with sufficient rigour, paying particular attention to the correspondence between the economic concepts and their mathematical representations; the abstractions enabling us to handle multiple asset classes and risk models at once; the dimensional analysis of the resulting equations; and the assumptions inherent in our derivations.

We demonstrate how to solve the resulting QPs with CVXOPT [4], a free software package for convex optimisation based on the Python programming language. CVXOPT is convenient in the research and development context. In production the reader is advised to use one of the industry-grade C++ or kdb+/q [23] implementations, or a specialised implementation of the solver on an FPGA or ASIC [21].

The implementations that we provide here are baseline, pedagogical implementations. In some cases the business requirements may be such that additional frictions may need to be taken into account, in which case the problem ceases to be convex. There are some proprietary tricks that can be applied in these situations. These tricks can significantly impact the profitability and risk profile of a business, but unfortunately they are outside the scope of this work.

Acknowledgements

We would like to thank Berc Rustem (Department of Computing, Imperial College London) and Martin Zinkin (Qubealgo) for our constructive discussions. The factor abstraction is largely due to Attilio Meucci and follows [22].

2 Preliminaries

In what follows we assume that we are the market maker and the sides of the trades are given from our perspective: buys are when we buy (the change in our position has a positive sign), sells are when we sell (the change in our position has a negative sign).

We shall denote by $\mathbb{N}^{*}$ the set $\{1,2,3,\ldots\}$; by $\mathbf{1}_{n}$ the vector in $\mathbb{R}^{n}$ whose elements are all ones. Vectors are column vectors by default.

Let us establish conventions for matrix calculus. The (scalar-by-vector) derivative of a scalar $y$ with respect to a vector in $\mathbb{R}^{n}$,

\[\boldsymbol{x}=\begin{pmatrix}x_{1}\\ \vdots\\ x_{n}\end{pmatrix},\]

with $n\in\mathbb{N}^{*}$, is written, in numerator layout notation, as

\[\frac{\partial y}{\partial\boldsymbol{x}}=\begin{pmatrix}\frac{\partial y}{\partial x_{1}}&\cdots&\frac{\partial y}{\partial x_{n}}\end{pmatrix}.\]

Strictly speaking, the result is a matrix in $\mathbb{R}^{1\times n}$, although we can certainly, and somewhat confusingly, think of it as a vector in $\mathbb{R}^{n}$. (Confusingly, because of the widespread habit of thinking of vectors as column vectors by default. To avoid this confusion, we shall favour the matrix view over the vector view whenever there is ambiguity.)

The (vector-by-vector) derivative of an $\mathbb{R}^{m}$-valued vector function (a vector whose components are functions)

\[\boldsymbol{y}=\begin{pmatrix}y_{1}\\ \vdots\\ y_{m}\end{pmatrix},\]

with $m\in\mathbb{N}^{*}$, with respect to an input vector in $\mathbb{R}^{n}$,

\[\boldsymbol{x}=\begin{pmatrix}x_{1}\\ \vdots\\ x_{n}\end{pmatrix},\]

with $n\in\mathbb{N}^{*}$, is written, in numerator layout notation, as

\[\frac{\partial\boldsymbol{y}}{\partial\boldsymbol{x}}=\begin{pmatrix}\frac{\partial y_{1}}{\partial x_{1}}&\cdots&\frac{\partial y_{1}}{\partial x_{n}}\\ \vdots&\ddots&\vdots\\ \frac{\partial y_{m}}{\partial x_{1}}&\cdots&\frac{\partial y_{m}}{\partial x_{n}}\end{pmatrix}.\]

The result is a matrix in $\mathbb{R}^{m\times n}$. It is called the Jacobian matrix of $\boldsymbol{y}$ with respect to $\boldsymbol{x}$.

We have included these definitions here because conventions vary across the literature: some authors prefer the numerator layout notation (as we do throughout this document), while others prefer the denominator layout notation.

Given the numerator layout convention that we have chosen, if $\boldsymbol{x}\in\mathbb{R}^{n}$ and $\boldsymbol{A}\in\mathbb{R}^{n\times n}$, then $\frac{\partial}{\partial\boldsymbol{x}}\boldsymbol{x}^{\intercal}\boldsymbol{A}\boldsymbol{x}=\boldsymbol{x}^{\intercal}(\boldsymbol{A}+\boldsymbol{A}^{\intercal})$. If, moreover, $\boldsymbol{A}$ is symmetric, then $\frac{\partial}{\partial\boldsymbol{x}}\boldsymbol{x}^{\intercal}\boldsymbol{A}\boldsymbol{x}=2\boldsymbol{x}^{\intercal}\boldsymbol{A}$. If $\boldsymbol{x}\in\mathbb{R}^{n}$ and $\boldsymbol{A}\in\mathbb{R}^{m\times n}$, then $\frac{\partial}{\partial\boldsymbol{x}}\boldsymbol{A}\boldsymbol{x}=\boldsymbol{A}$.
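These identities are easy to check numerically; a minimal sketch with NumPy, where the finite-difference step and the random test data are illustrative choices rather than part of the text:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

def y(x):
    # the scalar y = x' A x
    return x @ A @ x

# Central finite differences give the numerator-layout derivative,
# a 1 x n row vector.
eps = 1e-6
dy_dx_fd = np.array([(y(x + eps * e) - y(x - eps * e)) / (2 * eps)
                     for e in np.eye(n)])

dy_dx = x @ (A + A.T)  # the identity d(x'Ax)/dx = x'(A + A')
```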

3 CVXOPT

CVXOPT's function qp is an interface to coneqp for quadratic programs. It also provides the option of using the quadratic programming solver from MOSEK [5].

cvxopt.solvers.qp(P, q[, G, h[, A, b[, solver[, initvals]]]])

This solves the convex quadratic program

\[\begin{array}{ll}\underset{x}{\text{minimise}}&(1/2)x^{T}Px+q^{T}x\\ \text{subject to}&Gx\preceq h,\\ &Ax=b\end{array}\]

and its dual.

4 The portfolio

Suppose that we are trading $n\in\mathbb{N}^{*}$ products whose prices per unit notional (we shall use the words “size” and “notional” synonymously) at time $t$ are given by the vector $\boldsymbol{P}_{t}\in\mathbb{R}^{n}$. We shall represent the composition of our portfolio, $\pi_{t}$, in terms of $\boldsymbol{N}_{t}\in\mathbb{R}^{n}$, where the $i$th element, $(\boldsymbol{N}_{t})_{i}$, is the net notional amount of the $i$th product in the portfolio ($i=1,\ldots,n$). Thus the total net notional of our portfolio is given by $N^{\pi}_{t}=\mathbf{1}_{n}^{\intercal}\boldsymbol{N}_{t}\in\mathbb{R}$. We could also consider the weights of the products in our portfolio, $\boldsymbol{w}_{t}=\frac{1}{N^{\pi}_{t}}\boldsymbol{N}_{t}\in\mathbb{R}^{n}$. The value of our portfolio at time $t$ is given by $V_{t}^{\pi}=\boldsymbol{N}_{t}^{\intercal}\boldsymbol{P}_{t}\in\mathbb{R}$.
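For concreteness, a tiny worked example of these definitions with hypothetical numbers (three products):

```python
import numpy as np

# Hypothetical three-product portfolio (illustrative numbers only).
N = np.array([100.0, -50.0, 25.0])   # net notionals N_t
P = np.array([10.0, 20.0, 40.0])     # prices per unit notional P_t

N_pi = np.ones(len(N)) @ N   # total net notional 1'N  = 75.0
w = N / N_pi                 # portfolio weights w_t (sum to 1)
V = N @ P                    # portfolio value V_t = N'P = 1000.0
```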

This is an appropriate time to comment on the units. The value of the portfolio, $V_{t}^{\pi}$, is always expressed in units of currency, say, USD or EUR. The units of the prices, $\boldsymbol{P}_{t}$, and notionals, $\boldsymbol{N}_{t}$, are subject to the conventions of the particular asset class:

  • For equities, $\boldsymbol{N}_{t}$ is dimensionless; the notional is expressed as the number of shares. The price, $\boldsymbol{P}_{t}$, is in units of currency.

  • For CDS indices, $\boldsymbol{N}_{t}$ is expressed in units of currency, as this is the amount on which protection is being bought or sold (e.g. 25,000,000 USD). The price, $\boldsymbol{P}_{t}$, on the other hand, is dimensionless, as it is expressed in units of currency per unit notional, which is itself expressed in units of currency.

5 The invariants

Our market risk consists in our dependence on the prices of the products at time $T+\tau$, where $T$ represents the time when the asset allocation decision is made and $\tau\in[0,+\infty)$ is our investment horizon.

In order to understand our market risk, we seek to express $\boldsymbol{P}_{t}\in\mathbb{R}^{n}$ in terms of invariants: market variables that exhibit stationary behaviour over time and can be expressed in terms of independent and identically distributed random variables.

Let us denote by $\boldsymbol{X}_{t,\tau}$ the $\mathbb{R}^{n}$-valued random vector of invariants at time $t$ for a given investment horizon $\tau$. We express the prices of the products in our portfolio as

\[\boldsymbol{P}_{t}=\boldsymbol{g}_{\tau}(\boldsymbol{X}_{t,\tau}),\]

where $\boldsymbol{g}_{\tau}:\mathbb{R}^{n}\to\mathbb{R}^{n}$ is a function that depends on the asset classes of the products in our portfolio and on our investment horizon. In particular,

\[\boldsymbol{P}_{T+\tau}=\boldsymbol{g}_{\tau}(\boldsymbol{X}_{T+\tau,\tau}).\]

Notice that we are assuming that, like $\boldsymbol{P}_{t}$, $\boldsymbol{X}_{t,\tau}$ is $n$-dimensional. The reason why this is a sensible assumption will become clear when we consider examples of $\boldsymbol{g}_{\tau}$ and $\boldsymbol{X}_{t,\tau}$ for various asset classes.

6 The factor model

To get a handle on our market risk, we shall express the vector of invariants as

\[\boldsymbol{X}_{t,\tau}=\boldsymbol{h}(\boldsymbol{F}_{t,\tau})+\boldsymbol{U}_{t,\tau},\tag{1}\]

where $\boldsymbol{F}_{t,\tau}$ is an $\mathbb{R}^{m}$-valued ($m\in\mathbb{N}^{*}$) vector of common risk factors that are responsible for most of the randomness in the market, $\boldsymbol{U}_{t,\tau}$ is a residual vector of perturbations, and $\boldsymbol{h}:\mathbb{R}^{m}\to\mathbb{R}^{n}$ is a function.

7 Risk

What is our risk exposure, i.e. the sensitivity of the value of our portfolio to our risk factors? It is given by the $\mathbb{R}^{m}$-valued

\[\boldsymbol{r}_{t,\tau}:=\frac{\partial V_{t}^{\pi}}{\partial\boldsymbol{F}_{t,\tau}}=\frac{\partial}{\partial\boldsymbol{F}_{t,\tau}}\boldsymbol{N}_{t}^{\intercal}\boldsymbol{P}_{t}=\boldsymbol{N}_{t}^{\intercal}\frac{\partial}{\partial\boldsymbol{F}_{t,\tau}}\boldsymbol{g}(\boldsymbol{X}_{t,\tau})=\boldsymbol{N}_{t}^{\intercal}\,\frac{\partial\boldsymbol{g}}{\partial\boldsymbol{X}_{t,\tau}}(\boldsymbol{X}_{t,\tau})\,\frac{\partial\boldsymbol{h}}{\partial\boldsymbol{F}_{t,\tau}}(\boldsymbol{F}_{t,\tau}).\]

The dependence of our risk exposure on the notional vector is expressed by the matrix of sensitivities, the $m\times n$ Jacobian,

\[\boldsymbol{H}_{t,\tau}=\frac{\partial\boldsymbol{r}_{t,\tau}}{\partial\boldsymbol{N}_{t}}=\frac{\partial}{\partial\boldsymbol{N}_{t}}\left(\boldsymbol{N}_{t}^{\intercal}\,\frac{\partial\boldsymbol{g}}{\partial\boldsymbol{X}_{t,\tau}}(\boldsymbol{X}_{t,\tau})\,\frac{\partial\boldsymbol{h}}{\partial\boldsymbol{F}_{t,\tau}}(\boldsymbol{F}_{t,\tau})\right)=\frac{\partial\boldsymbol{g}}{\partial\boldsymbol{X}_{t,\tau}}(\boldsymbol{X}_{t,\tau})\,\frac{\partial\boldsymbol{h}}{\partial\boldsymbol{F}_{t,\tau}}(\boldsymbol{F}_{t,\tau}).\]

Thus the relationship between $\boldsymbol{r}_{t,\tau}$ and $\boldsymbol{H}_{t,\tau}$, which follows from their definitions, is given by

\[\boldsymbol{r}_{t,\tau}=\boldsymbol{N}_{t}^{\intercal}\boldsymbol{H}_{t,\tau}.\tag{2}\]

8 The variance of the value of the portfolio

We would like to minimise the variance of the value of the portfolio at time $T+\tau$, which represents our market risk. Let us regard $V_{t}^{\pi}$ as a function of the risk factors: $V_{t}^{\pi}=V_{t}^{\pi}(\boldsymbol{F}_{t,\tau})$. By Theorem A.1 and Remark A.2,

\[\mathsf{Var}\left[V_{t}^{\pi}\right]\approx\left(\frac{\partial V_{t}^{\pi}}{\partial\boldsymbol{F}_{t,\tau}}\right)^{\intercal}\boldsymbol{C}\,\frac{\partial V_{t}^{\pi}}{\partial\boldsymbol{F}_{t,\tau}}=\boldsymbol{r}_{t,\tau}^{\intercal}\boldsymbol{C}\boldsymbol{r}_{t,\tau},\]

where $\boldsymbol{C}:=\mathsf{Cov}(\boldsymbol{F}_{t,\tau})\in\mathbb{R}^{m\times m}$ is the covariance matrix of the risk factors.

This result is approximate. We approximated in two places:

  1. In Theorem A.1, where we disposed of the remainder term of the Taylor series expansion.

  2. When we regarded $V_{t}^{\pi}$ as a function of $\boldsymbol{F}_{t,\tau}$ alone, whereas it is also a function of another random variable: $\boldsymbol{U}_{t,\tau}$. We have effectively approximated (1) by

     \[\boldsymbol{X}_{t,\tau}\approx\boldsymbol{h}(\boldsymbol{F}_{t,\tau}),\]

     which is sensible if the variance of $\boldsymbol{U}_{t,\tau}$ is small, i.e. our risk factors are responsible for most of the variance of $\boldsymbol{X}_{t,\tau}$. To better account for the variance of $\boldsymbol{U}_{t,\tau}$, we could have used the bivariate Taylor series expansion instead of the univariate one used in the proof of Theorem A.1.
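When $V^{\pi}$ is exactly linear in the factors, the approximation above holds with equality, which gives a convenient numerical sanity check; a sketch with a hypothetical covariance and exposure:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical factor covariance C (m = 3) and exposure r = dV/dF.
C = np.array([[1.0, 0.2, 0.0],
              [0.2, 1.5, 0.3],
              [0.0, 0.3, 2.0]])
r = np.array([1.0, -2.0, 0.5])

# Simulate factor realisations and the (linear) portfolio value V = r'F.
F = rng.multivariate_normal(np.zeros(3), C, size=200_000)
V = F @ r

sample_var = V.var()
delta_var = r @ C @ r  # the delta approximation, exact in the linear case
```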

9 Minimising risk

We would like to minimise the variance of the value of our portfolio at our investment horizon, $\mathsf{Var}\left[V_{T+\tau}^{\pi}\right]$. Bearing in mind the caveats mentioned in Section 8, this is achieved by minimising

\[\mathsf{Var}\left[V_{T+\tau}^{\pi}\right]\approx(\boldsymbol{r}_{T+\tau,\tau}+\boldsymbol{H}_{T+\tau,\tau}\boldsymbol{x})^{\intercal}\boldsymbol{C}_{T+\tau,\tau}(\boldsymbol{r}_{T+\tau,\tau}+\boldsymbol{H}_{T+\tau,\tau}\boldsymbol{x}),\]

where $\boldsymbol{C}_{T+\tau,\tau}:=\mathsf{Cov}(\boldsymbol{F}_{T+\tau,\tau})$ and $\boldsymbol{x}\in\mathbb{R}^{n}$ is the change in position that will minimise the variance of the value of our portfolio. It is $\boldsymbol{x}$ that we need to find.

Note that, at the time the hedging decision is made, $\boldsymbol{r}_{T+\tau,\tau}$, $\boldsymbol{H}_{T+\tau,\tau}$, and $\boldsymbol{C}_{T+\tau,\tau}$ are not yet known. Therefore we have to make use of forecasts instead and minimise

\[(\hat{\boldsymbol{r}}_{T+\tau,\tau}+\hat{\boldsymbol{H}}_{T+\tau,\tau}\boldsymbol{x})^{\intercal}\hat{\boldsymbol{C}}_{T+\tau,\tau}(\hat{\boldsymbol{r}}_{T+\tau,\tau}+\hat{\boldsymbol{H}}_{T+\tau,\tau}\boldsymbol{x}),\]

where $\hat{\boldsymbol{C}}_{T+\tau,\tau}:=\mathsf{Cov}(\hat{\boldsymbol{F}}_{T+\tau,\tau})$. We shall discuss how these forecasts can be obtained later. For the time being, we shall drop the subscripts and hats to avoid notational clutter and solve the unconstrained quadratic program (QP)

\[\underset{\boldsymbol{x}}{\text{minimise}}\;\;(\boldsymbol{r}+\boldsymbol{H}\boldsymbol{x})^{\intercal}\boldsymbol{C}(\boldsymbol{r}+\boldsymbol{H}\boldsymbol{x}),\tag{3}\]

which can be done analytically using straightforward matrix calculus. First, differentiate with respect to $\boldsymbol{x}$:

\[\begin{aligned}\frac{\partial}{\partial\boldsymbol{x}}(\boldsymbol{r}+\boldsymbol{H}\boldsymbol{x})^{\intercal}\boldsymbol{C}(\boldsymbol{r}+\boldsymbol{H}\boldsymbol{x})&=\frac{\partial}{\partial\boldsymbol{x}}\left(\boldsymbol{x}^{\intercal}\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}\boldsymbol{x}+\boldsymbol{r}^{\intercal}\boldsymbol{C}\boldsymbol{H}\boldsymbol{x}+\boldsymbol{x}^{\intercal}\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{r}+\boldsymbol{r}^{\intercal}\boldsymbol{C}\boldsymbol{r}\right)\\ &=\frac{\partial}{\partial\boldsymbol{x}}\left(\boldsymbol{x}^{\intercal}\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}\boldsymbol{x}+2\boldsymbol{r}^{\intercal}\boldsymbol{C}\boldsymbol{H}\boldsymbol{x}+\boldsymbol{r}^{\intercal}\boldsymbol{C}\boldsymbol{r}\right)\\ &=2\boldsymbol{x}^{\intercal}\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}+2\boldsymbol{r}^{\intercal}\boldsymbol{C}\boldsymbol{H}.\end{aligned}\]

We set this partial derivative to zero and solve for $\boldsymbol{x}$:

\[\boldsymbol{x}^{\intercal}\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}=-\boldsymbol{r}^{\intercal}\boldsymbol{C}\boldsymbol{H},\]

hence

\[\boldsymbol{x}^{\intercal}=-\boldsymbol{r}^{\intercal}\boldsymbol{C}\boldsymbol{H}\left(\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}\right)^{-1}\]

and

\[\boldsymbol{x}=-\left(\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}\right)^{-1}\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{r}\in\mathbb{R}^{n}.\]
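The closed-form solution is straightforward to implement; a minimal NumPy sketch with hypothetical inputs (a linear solve is preferable to forming the inverse explicitly):

```python
import numpy as np

def min_variance_hedge(H, C, r):
    """Unconstrained minimiser x = -(H'CH)^{-1} H'C r."""
    HCH = H.T @ C @ H
    return -np.linalg.solve(HCH, H.T @ C @ r)

# Hypothetical inputs: m = 3 risk factors, n = 2 hedge products.
H = np.array([[1.0, 0.2],
              [0.3, 1.5],
              [-0.5, 0.4]])      # sensitivities
C = np.array([[1.0, 0.2, 0.0],
              [0.2, 1.5, 0.3],
              [0.0, 0.3, 2.0]])  # factor covariance
r = np.array([2.0, -1.0, 0.5])   # current risk exposure

x = min_variance_hedge(H, C, r)
# At the optimum the gradient 2(r + Hx)'CH vanishes.
grad = 2 * (r + H @ x) @ C @ H
```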

10 Minimising risk and symmetric costs

Let us now incorporate the costs into the optimisation. We now assume that it will cost us $\boldsymbol{c}^{\intercal}|\boldsymbol{x}|$ to execute the hedge (change our position by $\boldsymbol{x}$), where $\boldsymbol{c}\in\mathbb{R}^{n}$ are the costs per unit notional and $|\cdot|$ denotes the elementwise absolute value. Notice that the costs are symmetric: the same linear factors of $|\boldsymbol{x}|$ apply irrespective of whether we are buying or selling. This assumption may not be realistic in practice.

The problem becomes

\[\underset{\boldsymbol{x}}{\text{minimise}}\;\;(\boldsymbol{r}+\boldsymbol{H}\boldsymbol{x})^{\intercal}\boldsymbol{C}(\boldsymbol{r}+\boldsymbol{H}\boldsymbol{x})+\lambda_{c}\boldsymbol{c}^{\intercal}|\boldsymbol{x}|.\]

Here $\lambda_{c}\geq 0$ is a nonnegative constant specifying how many units of cash we are prepared to pay for reducing the variance of the value of the portfolio by one unit (in units of value squared).

At first sight, we could instead solve the unconstrained QP

\[\underset{\boldsymbol{x}}{\text{minimise}}\;\;(\boldsymbol{r}+\boldsymbol{H}\boldsymbol{x})^{\intercal}\boldsymbol{C}(\boldsymbol{r}+\boldsymbol{H}\boldsymbol{x})+\lambda_{c}\boldsymbol{c}^{\intercal}\boldsymbol{x}\]

using matrix calculus as before:

\[\frac{\partial}{\partial\boldsymbol{x}}\left((\boldsymbol{r}+\boldsymbol{H}\boldsymbol{x})^{\intercal}\boldsymbol{C}(\boldsymbol{r}+\boldsymbol{H}\boldsymbol{x})+\lambda_{c}\boldsymbol{c}^{\intercal}\boldsymbol{x}\right)=2\boldsymbol{x}^{\intercal}\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}+2\boldsymbol{r}^{\intercal}\boldsymbol{C}\boldsymbol{H}+\lambda_{c}\boldsymbol{c}^{\intercal}.\]

Setting this partial derivative to zero and solving for $\boldsymbol{x}$, we get

\[\boldsymbol{x}^{\intercal}=-\left(\boldsymbol{r}^{\intercal}\boldsymbol{C}\boldsymbol{H}+\frac{1}{2}\lambda_{c}\boldsymbol{c}^{\intercal}\right)\left(\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}\right)^{-1},\]

hence

\[\boldsymbol{x}=-\left(\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}\right)^{-1}\left(\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{r}+\frac{1}{2}\lambda_{c}\boldsymbol{c}\right)\in\mathbb{R}^{n}.\]

However, in this form the problem is misspecified: if $\boldsymbol{c}$ has all positive elements, then we are paying for buying, but receiving money for selling! The elementwise absolute value keeps the costs nonnegative, but the resulting objective is no longer that of an unconstrained QP.

We can rewrite this problem as

\[\begin{array}{ll}\underset{\boldsymbol{x},\boldsymbol{v}}{\text{minimise}}&(\boldsymbol{r}+\boldsymbol{H}\boldsymbol{x})^{\intercal}\boldsymbol{C}(\boldsymbol{r}+\boldsymbol{H}\boldsymbol{x})+\lambda_{c}\boldsymbol{c}^{\intercal}\boldsymbol{v}+\lambda_{0}(\boldsymbol{v}^{\intercal}\boldsymbol{v}-\boldsymbol{x}^{\intercal}\boldsymbol{x})\\ \text{subject to}&\boldsymbol{v}\succeq\boldsymbol{x},\\ &\boldsymbol{v}\succeq-\boldsymbol{x},\end{array}\]

where the auxiliary variable $\boldsymbol{v}$ plays the role of $|\boldsymbol{x}|$. The last term was added to regularise the quadratic coefficient matrix of the overall problem.

Let us write this problem as a standard QP. Setting

\[\boldsymbol{x}^{a}=\begin{pmatrix}\boldsymbol{x}\\ \boldsymbol{v}\end{pmatrix},\]

the objective function can be written (up to a constant) as

\[\frac{1}{2}(\boldsymbol{x}^{a})^{\intercal}\begin{pmatrix}2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}-\lambda_{0}\boldsymbol{I}_{n\times n}&\boldsymbol{0}_{n\times n}\\ \boldsymbol{0}_{n\times n}&2\lambda_{0}\boldsymbol{I}_{n\times n}\end{pmatrix}\boldsymbol{x}^{a}+\begin{pmatrix}2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{r}\\ \lambda_{c}\boldsymbol{c}\end{pmatrix}^{\intercal}\boldsymbol{x}^{a}.\]

We can also rewrite the constraints in block matrix form as

\[\begin{pmatrix}\boldsymbol{I}_{n\times n}&-\boldsymbol{I}_{n\times n}\\ -\boldsymbol{I}_{n\times n}&-\boldsymbol{I}_{n\times n}\end{pmatrix}\boldsymbol{x}^{a}\preceq\begin{pmatrix}\boldsymbol{0}_{n}\\ \boldsymbol{0}_{n}\end{pmatrix}.\]

10.1 Using CVXOPT

Thus we can call

cvxopt.solvers.qp(P, q[, G, h[, A, b[, solver[, initvals]]]])

with

\[\text{P}=\begin{pmatrix}2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}-\lambda_{0}\boldsymbol{I}_{n\times n}&\boldsymbol{0}_{n\times n}\\ \boldsymbol{0}_{n\times n}&2\lambda_{0}\boldsymbol{I}_{n\times n}\end{pmatrix},\qquad\text{q}=\begin{pmatrix}2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{r}\\ \lambda_{c}\boldsymbol{c}\end{pmatrix},\]
\[\text{G}=\begin{pmatrix}\boldsymbol{I}_{n\times n}&-\boldsymbol{I}_{n\times n}\\ -\boldsymbol{I}_{n\times n}&-\boldsymbol{I}_{n\times n}\end{pmatrix},\qquad\text{h}=\begin{pmatrix}\boldsymbol{0}_{n}\\ \boldsymbol{0}_{n}\end{pmatrix}\]

to find the optimal $\boldsymbol{x}^{a}$.

10.2 When is $\boldsymbol{P}$ positive definite?

In this section, and in this section only, $\boldsymbol{P}$ will denote the parameter of cvxopt.solvers.qp and not the vector of prices.

By Lemma A.3, the eigenvalues of $\boldsymbol{P}$ are precisely those of $2\lambda_{0}\boldsymbol{I}_{n\times n}$ and $2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}-\lambda_{0}\boldsymbol{I}_{n\times n}$, combined. Clearly the eigenvalues of $2\lambda_{0}\boldsymbol{I}_{n\times n}$ are just $2\lambda_{0}$ repeated $n$ times. To find the eigenvalues of $2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}-\lambda_{0}\boldsymbol{I}_{n\times n}$, we solve the characteristic equation

\[\det((2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}-\lambda_{0}\boldsymbol{I}_{n\times n})-\lambda\boldsymbol{I}_{n\times n})=0\]

for $\lambda$. On observing that

\[\det((2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}-\lambda_{0}\boldsymbol{I}_{n\times n})-\lambda\boldsymbol{I}_{n\times n})=2^{n}\det\left(\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}-\left(\frac{1}{2}\lambda_{0}+\frac{1}{2}\lambda\right)\boldsymbol{I}_{n\times n}\right),\]

we notice that this determinant is zero precisely when

\[\frac{1}{2}\lambda_{0}+\frac{1}{2}\lambda=\lambda^{\prime}\]

for an eigenvalue, $\lambda^{\prime}$, of $\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}$. Thus, to summarise, the eigenvalues of $\boldsymbol{P}$ are:

  • $2\lambda_{0}$ repeated $n$ times;

  • for $i=1,\ldots,n$, $2\lambda^{\prime}_{i}-\lambda_{0}$, where $\lambda^{\prime}_{i}$ is the $i$th eigenvalue of $\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}$.

It is now clear that $\boldsymbol{P}$ is positive definite iff $0<\lambda_{0}<2\lambda^{\prime}_{\text{min}}$, where $\lambda^{\prime}_{\text{min}}$ is the least eigenvalue of $\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}$ (which is positive because $\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}$ is positive definite by assumption).
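This condition is easy to verify numerically; a sketch with hypothetical $\boldsymbol{H}$ and $\boldsymbol{C}$, testing one value of $\lambda_{0}$ inside the interval $(0,2\lambda^{\prime}_{\text{min}})$ and one outside:

```python
import numpy as np

# Hypothetical H (m = 3, n = 2) and factor covariance C.
H = np.array([[1.0, 0.2], [0.3, 1.5], [-0.5, 0.4]])
C = np.array([[1.0, 0.2, 0.0], [0.2, 1.5, 0.3], [0.0, 0.3, 2.0]])
HCH = H.T @ C @ H
lam_min = np.linalg.eigvalsh(HCH).min()  # least eigenvalue of H'CH

n = HCH.shape[0]
def P_matrix(lam_0):
    # Block-diagonal P as in Section 10.1.
    Z = np.zeros((n, n))
    return np.block([[2 * HCH - lam_0 * np.eye(n), Z],
                     [Z, 2 * lam_0 * np.eye(n)]])

inside = np.linalg.eigvalsh(P_matrix(0.5 * lam_min)).min() > 0   # lam_0 < 2 lam_min
outside = np.linalg.eigvalsh(P_matrix(3.0 * lam_min)).min() > 0  # lam_0 > 2 lam_min
# inside is True; outside is False
```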

11 Minimising risk and asymmetric costs

In Section 10, we assumed that the costs would be the same irrespective of the signs of the components of $\boldsymbol{x}$. In this section we shall develop an approach that allows us to provide separate costs for buying, $\boldsymbol{c}^{+}\in\mathbb{R}^{n}$, and for selling, $\boldsymbol{c}^{-}\in\mathbb{R}^{n}$.

To this end we also define $\boldsymbol{x}^{a}\in\mathbb{R}^{2n}$ as the block vector

\[\boldsymbol{x}^{a}=\begin{pmatrix}\boldsymbol{x}^{+}\\ \boldsymbol{x}^{-}\end{pmatrix},\]

with $\boldsymbol{x}^{+},\boldsymbol{x}^{-}\in\mathbb{R}^{n}$ with all their components nonnegative: $\boldsymbol{x}^{+}$ specifies the notional amounts to be bought, $\boldsymbol{x}^{-}$ the notional amounts to be sold, for each product.

The optimisation problem now becomes

\[\begin{array}{ll}\underset{\boldsymbol{x}^{+},\boldsymbol{x}^{-}}{\text{minimise}}&(\boldsymbol{r}+\boldsymbol{H}(\boldsymbol{x}^{+}-\boldsymbol{x}^{-}))^{\intercal}\boldsymbol{C}(\boldsymbol{r}+\boldsymbol{H}(\boldsymbol{x}^{+}-\boldsymbol{x}^{-}))+\lambda_{c}[(\boldsymbol{c}^{+})^{\intercal}\boldsymbol{x}^{+}+(\boldsymbol{c}^{-})^{\intercal}\boldsymbol{x}^{-}]\\ \text{subject to}&\boldsymbol{x}^{+}\succeq 0,\\ &\boldsymbol{x}^{-}\succeq 0.\end{array}\]

As before, $\lambda_{c}\geq 0$ is a nonnegative constant specifying how many units of cash we are prepared to pay for reducing the variance of the value of the portfolio by one unit (in units of value squared). We note that this is now a constrained QP.

There is a problem with this formulation: nothing guarantees that we won't be buying and selling the same product simultaneously, i.e. that, for some $i\in\{1,\ldots,n\}$, $(\boldsymbol{x}^{+})_{i}>0$ and $(\boldsymbol{x}^{-})_{i}>0$. To address this, we add an additional term, $\lambda_{0}(\boldsymbol{x}^{+})^{\intercal}\boldsymbol{x}^{-}$:

\[\begin{array}{ll}\underset{\boldsymbol{x}^{+},\boldsymbol{x}^{-}}{\text{minimise}}&(\boldsymbol{r}+\boldsymbol{H}(\boldsymbol{x}^{+}-\boldsymbol{x}^{-}))^{\intercal}\boldsymbol{C}(\boldsymbol{r}+\boldsymbol{H}(\boldsymbol{x}^{+}-\boldsymbol{x}^{-}))+\lambda_{c}[(\boldsymbol{c}^{+})^{\intercal}\boldsymbol{x}^{+}+(\boldsymbol{c}^{-})^{\intercal}\boldsymbol{x}^{-}]+\lambda_{0}(\boldsymbol{x}^{+})^{\intercal}\boldsymbol{x}^{-}\\ \text{subject to}&\boldsymbol{x}^{+}\succeq 0,\\ &\boldsymbol{x}^{-}\succeq 0.\end{array}\]

Let us write this problem as a standard QP. The first term of the objective function can be written as

\[(\boldsymbol{r}+\boldsymbol{H}(\boldsymbol{x}^{+}-\boldsymbol{x}^{-}))^{\intercal}\boldsymbol{C}(\boldsymbol{r}+\boldsymbol{H}(\boldsymbol{x}^{+}-\boldsymbol{x}^{-}))=(\boldsymbol{H}(\boldsymbol{x}^{+}-\boldsymbol{x}^{-}))^{\intercal}\boldsymbol{C}\boldsymbol{H}(\boldsymbol{x}^{+}-\boldsymbol{x}^{-})+2\boldsymbol{r}^{\intercal}\boldsymbol{C}\boldsymbol{H}(\boldsymbol{x}^{+}-\boldsymbol{x}^{-})+\boldsymbol{r}^{\intercal}\boldsymbol{C}\boldsymbol{r}.\]

The last term is a constant and can be dropped from the minimisation. The remaining terms can be rewritten as

\[(\boldsymbol{x}^{+}-\boldsymbol{x}^{-})^{\intercal}\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}(\boldsymbol{x}^{+}-\boldsymbol{x}^{-})+2\boldsymbol{r}^{\intercal}\boldsymbol{C}\boldsymbol{H}(\boldsymbol{x}^{+}-\boldsymbol{x}^{-})=(\boldsymbol{x}^{a})^{\intercal}\begin{pmatrix}\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}&-\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}\\ -\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}&\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}\end{pmatrix}\boldsymbol{x}^{a}+\begin{pmatrix}2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{r}\\ -2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{r}\end{pmatrix}^{\intercal}\boldsymbol{x}^{a}.\]

The second term of the objective function can be written as

\[\lambda_{c}[(\boldsymbol{c}^{+})^{\intercal}\boldsymbol{x}^{+}+(\boldsymbol{c}^{-})^{\intercal}\boldsymbol{x}^{-}]=\lambda_{c}\begin{pmatrix}\boldsymbol{c}^{+}\\ \boldsymbol{c}^{-}\end{pmatrix}^{\intercal}\boldsymbol{x}^{a}.\]

Finally, the third term of the objective function can be written as

\[(\boldsymbol{x}^{a})^{\intercal}\begin{pmatrix}\boldsymbol{0}_{n\times n}&\lambda_{0}\boldsymbol{I}_{n\times n}\\ \lambda_{0}\boldsymbol{I}_{n\times n}&\boldsymbol{0}_{n\times n}\end{pmatrix}\boldsymbol{x}^{a}.\]

Putting this together, we rewrite the objective function as

\frac{1}{2}(\boldsymbol{x}^{a})^{\intercal}\left(\begin{array}{cc}2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}&-2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}+2\lambda_{0}\boldsymbol{I}_{n\times n}\\ -2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}+2\lambda_{0}\boldsymbol{I}_{n\times n}&2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}\end{array}\right)\boldsymbol{x}^{a}+\left(\begin{array}{c}2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{r}+\lambda_{c}\boldsymbol{c}^{+}\\ -2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{r}+\lambda_{c}\boldsymbol{c}^{-}\end{array}\right)^{\intercal}\boldsymbol{x}^{a}.

We can also rewrite the constraints in block matrix form as

\left(\begin{array}{cc}-\boldsymbol{I}_{n\times n}&\boldsymbol{0}_{n\times n}\\ \boldsymbol{0}_{n\times n}&-\boldsymbol{I}_{n\times n}\end{array}\right)\boldsymbol{x}^{a}\preceq\left(\begin{array}{c}\boldsymbol{0}_{n}\\ \boldsymbol{0}_{n}\end{array}\right).

11.1 Using CVXOPT

Thus we can call

cvxopt.solvers.qp(P, q[, G, h[, A, b[, solver[, initvals]]]])

with

\texttt{P}=\left(\begin{array}{cc}2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}&-2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}+2\lambda_{0}\boldsymbol{I}_{n\times n}\\ -2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}+2\lambda_{0}\boldsymbol{I}_{n\times n}&2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}\end{array}\right),
\texttt{q}=\left(\begin{array}{c}2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{r}+\lambda_{c}\boldsymbol{c}^{+}\\ -2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{r}+\lambda_{c}\boldsymbol{c}^{-}\end{array}\right),
\texttt{G}=\left(\begin{array}{cc}-\boldsymbol{I}_{n\times n}&\boldsymbol{0}_{n\times n}\\ \boldsymbol{0}_{n\times n}&-\boldsymbol{I}_{n\times n}\end{array}\right),
\texttt{h}=\left(\begin{array}{c}\boldsymbol{0}_{n}\\ \boldsymbol{0}_{n}\end{array}\right)

to find the optimal

\boldsymbol{x}^{a}=\left(\begin{array}{c}\boldsymbol{x}^{+}\\ \boldsymbol{x}^{-}\end{array}\right).
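The assembly of these block matrices and the solver call can be sketched in Python as follows. The problem data below are hypothetical, chosen only to make the sketch self-contained; CVXOPT expects its arguments as its own dense matrix type, and the call is guarded in case the library is not installed.

```python
import numpy as np

# Hypothetical problem data: n = 3 products, m = 3 risk factors.
rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))
C = A @ A.T + n * np.eye(n)            # positive definite factor covariance
H = rng.standard_normal((n, n))        # matrix of sensitivities
r = rng.standard_normal(n)             # current risk exposure
c_plus = np.full(n, 0.01)              # costs of buying one unit of notional
c_minus = np.full(n, 0.01)             # costs of selling one unit of notional
lam_c = 1.0                            # cost-aversion weight lambda_c

HtCH = H.T @ C @ H
lam0 = 0.5 * np.linalg.eigvalsh(HtCH).min()   # 0 < lambda_0 < 2 lambda'_min

# Block matrices of the augmented QP in x^a = (x^+, x^-).
off = -2 * HtCH + 2 * lam0 * np.eye(n)
P = np.block([[2 * HtCH, off], [off, 2 * HtCH]])
q = np.concatenate([2 * H.T @ C @ r + lam_c * c_plus,
                    -2 * H.T @ C @ r + lam_c * c_minus])
G = -np.eye(2 * n)                     # encodes x^+ >= 0, x^- >= 0
h = np.zeros(2 * n)

try:
    from cvxopt import matrix, solvers
    solvers.options["show_progress"] = False
    sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h))
    xa = np.array(sol["x"]).ravel()
    x = xa[:n] - xa[n:]                # net hedge notionals x = x^+ - x^-
except ImportError:
    x = None                           # CVXOPT not available
```

Since $0<\lambda_{0}<2\lambda^{\prime}_{\text{min}}$ by construction, the matrix passed as P is positive definite (see the next subsection), so the QP is strictly convex.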

11.2 When is $\boldsymbol{P}$ positive definite?

In this section, and in this section only, $\boldsymbol{P}$ will denote the parameter of cvxopt.solvers.qp and not the vector of prices.

By Lemma A.4, the eigenvalues of $\boldsymbol{P}$ are precisely those of $2\lambda_{0}\boldsymbol{I}_{n\times n}$ and $4\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}-2\lambda_{0}\boldsymbol{I}_{n\times n}$, combined. Clearly the eigenvalues of $2\lambda_{0}\boldsymbol{I}_{n\times n}$ are just $2\lambda_{0}$ repeated $n$ times. To find the eigenvalues of $4\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}-2\lambda_{0}\boldsymbol{I}_{n\times n}$, we solve the characteristic equation

\det((4\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}-2\lambda_{0}\boldsymbol{I}_{n\times n})-\lambda\boldsymbol{I}_{n\times n})=0

for $\lambda$. On observing that

\det((4\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}-2\lambda_{0}\boldsymbol{I}_{n\times n})-\lambda\boldsymbol{I}_{n\times n})=4^{n}\det\left(\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}-\left(\frac{1}{2}\lambda_{0}+\frac{1}{4}\lambda\right)\boldsymbol{I}_{n\times n}\right),

we notice that this determinant is zero precisely when

\frac{1}{2}\lambda_{0}+\frac{1}{4}\lambda=\lambda^{\prime}

for an eigenvalue, $\lambda^{\prime}$, of $\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}$, i.e. when $\lambda=4\lambda^{\prime}-2\lambda_{0}$. Thus, to summarise, the eigenvalues of $\boldsymbol{P}$ are:

  • $2\lambda_{0}$ repeated $n$ times;

  • for $i=1,\ldots,n$, $4\lambda^{\prime}_{i}-2\lambda_{0}$, where $\lambda^{\prime}_{i}$ is the $i$th eigenvalue of $\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}$.

It is now clear that $\boldsymbol{P}$ is positive definite iff $0<\lambda_{0}<2\lambda^{\prime}_{\text{min}}$, where $\lambda^{\prime}_{\text{min}}$ is the least eigenvalue of $\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}$ (which is positive because $\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}$ is positive definite by assumption).
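This characterisation is easy to check numerically. The sketch below builds the block matrix from a random positive definite stand-in for $\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}$ (all data are hypothetical) and compares its spectrum with the two families of eigenvalues listed above:

```python
import numpy as np

# Numerical check of the eigenvalue characterisation of P.
rng = np.random.default_rng(1)
n = 4
B = rng.standard_normal((n, n))
HtCH = B @ B.T + np.eye(n)            # stands in for H^T C H, positive definite
lam_prime = np.linalg.eigvalsh(HtCH)  # eigenvalues lambda'_1 <= ... <= lambda'_n
lam0 = 0.5 * lam_prime.min()          # inside (0, 2 lambda'_min)

off = -2 * HtCH + 2 * lam0 * np.eye(n)
P = np.block([[2 * HtCH, off], [off, 2 * HtCH]])

eig_P = np.sort(np.linalg.eigvalsh(P))
eig_expected = np.sort(np.concatenate([np.full(n, 2 * lam0),
                                       4 * lam_prime - 2 * lam0]))
```

With $\lambda_{0}$ inside the admissible interval, all entries of eig_P come out positive, confirming positive definiteness.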

12 Practical considerations

The universe of products that we trade may be a proper superset of the universe of products that we use to hedge. This is easily implemented within our framework: compute the risk for the overall portfolio, then restrict the set of products under consideration to the hedging universe. Then the dimension nn is the number of products that are used for hedging and the derivations remain valid for this restricted set of products.
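In code this restriction amounts to keeping the full-portfolio risk exposure but selecting only the columns of the sensitivity matrix that correspond to the hedging universe; a minimal numpy sketch with hypothetical dimensions and indices:

```python
import numpy as np

# m risk factors; n_traded products overall, of which only some are hedgeable.
rng = np.random.default_rng(4)
m, n_traded = 3, 5
H_full = rng.standard_normal((m, n_traded))  # sensitivities, all traded products
r = rng.standard_normal(m)                   # risk of the overall portfolio (unchanged)

hedge_idx = [0, 2, 4]                        # hypothetical hedging universe
H = H_full[:, hedge_idx]                     # restricted sensitivity matrix
n = H.shape[1]                               # the dimension n used in the derivations
```

The risk vector r is still computed from the whole portfolio; only the optimisation variables are restricted.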

13 Special case: $m=n$, $\boldsymbol{C}$ and $\boldsymbol{H}$ diagonal

Let us now consider the special case when $m=n$ and both $\boldsymbol{C}$ and $\boldsymbol{H}$ are diagonal matrices:

\boldsymbol{C}=\left(\begin{array}{ccccc}C_{1}&0&0&\cdots&0\\ 0&C_{2}&0&\cdots&0\\ \vdots&\ddots&\ddots&\ddots&\vdots\\ 0&0&\ddots&\ddots&0\\ 0&0&\cdots&0&C_{n}\end{array}\right),\quad\boldsymbol{H}=\left(\begin{array}{ccccc}H_{1}&0&0&\cdots&0\\ 0&H_{2}&0&\cdots&0\\ \vdots&\ddots&\ddots&\ddots&\vdots\\ 0&0&\ddots&\ddots&0\\ 0&0&\cdots&0&H_{n}\end{array}\right).

Additionally, we require that $\boldsymbol{C}$ be positive definite (not just positive semidefinite), so $C_{i}>0$ for $i=1,\ldots,n$. We shall also require $H_{i}>0$ for $i=1,\ldots,n$.

In this setting, the unconstrained QP (3) reduces to a system of uncoupled scalar optimisation problems (the positive factor $C_{i}$ does not affect the minimiser and has been dropped),

\underset{x_{i}}{\text{minimise}}\,\,H_{i}^{2}x_{i}^{2}+2r_{i}H_{i}x_{i}+r_{i}^{2},\quad i=1,\ldots,n.

Using elementary calculus (equating to zero the derivative of $H_{i}^{2}x_{i}^{2}+2r_{i}H_{i}x_{i}+r_{i}^{2}$ with respect to $x_{i}$ and solving for $x_{i}$), we find the optimal $x_{i}$: $x_{i}^{*}=-r_{i}/H_{i}$.

Note that in this case, when the correlations are absent, the sign of $x_{i}^{*}$ (i.e. whether we buy or sell the $i$th product) depends entirely on the signs of $r_{i}$ and $H_{i}$: $x_{i}^{*}$ is positive (i.e. we have to buy $|x_{i}^{*}|=x_{i}^{*}$ units of notional of the $i$th product) when $r_{i}$ and $H_{i}$ are of different signs; $x_{i}^{*}$ is negative (i.e. we have to sell $|x_{i}^{*}|=-x_{i}^{*}$ units of notional of the $i$th product) when $r_{i}$ and $H_{i}$ are of the same sign. (Of course, we are assuming that $r_{i}\neq 0$, as otherwise there is no risk to hedge.)

For this reason there is no need to augment the costs of buying and selling as we did in Section 11. We can simply set

c_{i}:=\left\{\begin{array}{ll}\text{cost of buying 1 unit notional of $i$th product},&r_{i},\ H_{i}\ \text{of different signs;}\\ \text{cost of selling 1 unit notional of $i$th product},&r_{i},\ H_{i}\ \text{of the same sign.}\end{array}\right.

The objective function then becomes $(r_{i}+H_{i}x_{i})^{2}C_{i}+\lambda_{c}c_{i}x_{i}$ and the optimisation problem

\underset{x_{i}}{\text{minimise}}\,\,C_{i}H_{i}^{2}x_{i}^{2}+(2C_{i}r_{i}H_{i}+\lambda_{c}c_{i})x_{i}+C_{i}r_{i}^{2},\quad i=1,\ldots,n,

where $\lambda_{c}$ is as in Section 10.

Again using elementary calculus, we find

x_{i}^{*}=-\frac{2C_{i}r_{i}H_{i}+\lambda_{c}c_{i}}{2C_{i}H_{i}^{2}}.
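A quick numerical sanity check of this closed form (with hypothetical scalar inputs), comparing it against a brute-force grid search over the objective of the preceding display:

```python
import numpy as np

# Hypothetical scalar inputs for one product in the diagonal case.
C_i, H_i, r_i = 2.0, 0.5, -3.0    # variance weight, sensitivity, risk exposure
lam_c, c_i = 0.1, 0.02            # cost-aversion weight and transaction cost

# Closed-form minimiser derived above.
x_star = -(2 * C_i * r_i * H_i + lam_c * c_i) / (2 * C_i * H_i ** 2)

def objective(x):
    # Objective of the scalar optimisation problem.
    return C_i * H_i ** 2 * x ** 2 + (2 * C_i * r_i * H_i + lam_c * c_i) * x + C_i * r_i ** 2

grid = np.linspace(x_star - 5.0, x_star + 5.0, 100_001)
x_grid = grid[np.argmin(objective(grid))]   # brute-force minimiser
```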

14 Case study: CDS indices, no cross-hedging

Let us now consider the case of CDS indices. Recall that credit DV01 (CDV01) or CS01 (Credit Spread 01) is defined as the change in price of the CDS contract (of a given notional) for a one basis point increase in spread.

For the European CDS indices (the iTraxx family), we shall take $1{,}000{,}000$ EUR as our unit of notional. For the North American CDS indices (the CDX family), we shall take $1{,}000{,}000$ USD as our unit of notional. So $N_{i}$ and $x_{i}$ will have these units.

The unit of price ($P_{i}$) will be EUR for iTraxx and USD for CDX. Our invariants ($X_{i}$) will be credit spreads, whose units are basis points. We shall set $h_{i}$ in (1) to be the identity map, i.e. our factors will be the same as our invariants, $F_{i}\equiv X_{i}$. Since our risk exposure is given by $r_{i}:=\frac{dV^{\pi}}{dF_{i}}$, its units will be EUR per basis point for iTraxx and USD per basis point for CDX. Since the Jacobian is given by $H_{i}:=\frac{dr_{i}}{dN_{i}}$, its units will be $1{,}000{,}000^{-1}\,(\text{basis point})^{-1}$: $H_{i}$ is the change in the risk exposure $r_{i}$ (in EUR per basis point for iTraxx, USD per basis point for CDX) per $1{,}000{,}000$ EUR of iTraxx ($1{,}000{,}000$ USD of CDX) notional.

15 Case study: European government bonds

Suppose that we have a portfolio of $n$ European government bonds. For $i=1:n$, the price of the $i$th bond at time $t$ is given by

(\boldsymbol{P}_{t})_{i}=\sum_{j=1}^{l_{i}}c_{i,j}e^{-(y_{t}(\tau_{t,i,j})+\lambda_{t,i})\tau_{t,i,j}}, (4)

where

  • $(\boldsymbol{P}_{t})_{i}$ is the dirty market price of the $i$th bond;

  • $l_{i}$ is the number of cashflows for the $i$th bond;

  • $c_{i,j}$ is the $j$th cashflow for the $i$th bond;

  • $\tau_{t,i,j}$ is the duration of the time interval between the time $t$ and the time of the $j$th cashflow of the $i$th bond;

  • $y_{t}:[0,\infty)\to\mathbb{R}$ is the zero curve at time $t$, which is a function that maps the maturities of the cashflows (the durations of the time intervals between the time $t$ and the times of the cashflows) to the continuously compounded interest rates;

  • $\lambda_{t,i}$ is a parallel vertical shift to the zero curve $y_{t}$ which is required to equate the right-hand side to the dirty market price of the $i$th bond. We shall refer to $\lambda_{t,i}$ as the idiosyncratic spread of the $i$th bond at time $t$.
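Equation (4) can be implemented directly. The sketch below prices a hypothetical three-year annual coupon bond off a flat zero curve; the curve, cashflows, and spread are illustrative only:

```python
import numpy as np

def dirty_price(cashflows, taus, zero_curve, spread):
    """Dirty price per equation (4): cashflows discounted at the zero rate
    plus the bond's idiosyncratic spread, continuously compounded."""
    cashflows = np.asarray(cashflows, dtype=float)
    taus = np.asarray(taus, dtype=float)
    rates = np.array([zero_curve(tau) for tau in taus])
    return float(np.sum(cashflows * np.exp(-(rates + spread) * taus)))

# A hypothetical 3-year bond with 4% annual coupons on 100 notional.
cfs = [4.0, 4.0, 104.0]
taus = [1.0, 2.0, 3.0]
flat_curve = lambda tau: 0.03     # flat 3% continuously compounded zero curve
price = dirty_price(cfs, taus, flat_curve, spread=0.001)
```

As expected, a positive idiosyncratic spread lowers the price relative to discounting off the zero curve alone.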

We model the yield curve as

y_{t}(\tau_{\text{cf}})=\sum_{k=1}^{d}\beta_{t,k}f_{k}(\tau_{\text{cf}}),

where $d\in\mathbb{N}^{*}$ and, for $k=1:d$,

f_{k}:[0,\infty)\to\mathbb{R}

are some suitably defined basis functions.

As we are interested in intraday market making, our investment horizon $\tau$ is relatively short, so we can assume that the bond prices are our invariants,

\boldsymbol{X}_{t,\tau}=\boldsymbol{P}_{t},

so $\boldsymbol{g}_{\tau}$ is the identity map, i.e., for all $\boldsymbol{x}\in\mathbb{R}^{n}$, $\boldsymbol{g}_{\tau}(\boldsymbol{x})=\boldsymbol{x}$, so that

\frac{\partial\boldsymbol{g}}{\partial\boldsymbol{X}_{t,\tau}}(\boldsymbol{X}_{t,\tau})=\boldsymbol{I}_{n}.

The risk factors are given by

\boldsymbol{F}_{t,\tau}=\left(\begin{array}{c}\hat{\beta}_{t+T,1}\\ \vdots\\ \hat{\beta}_{t+T,d}\\ \hat{\lambda}_{t+T,1}\\ \vdots\\ \hat{\lambda}_{t+T,n}\end{array}\right),

where

\hat{\beta}_{t+T,1},\ldots,\hat{\beta}_{t+T,d},\hat{\lambda}_{t+T,1},\ldots,\hat{\lambda}_{t+T,n}

are, respectively, our forecasts for

\beta_{t+T,1},\ldots,\beta_{t+T,d},\lambda_{t+T,1},\ldots,\lambda_{t+T,n}.

Our risk factors explain all of the risk, so

\boldsymbol{X}_{t,\tau}=\boldsymbol{h}(\boldsymbol{F}_{t,\tau}),

where $\boldsymbol{h}$ is given by equation (4).

The matrix of sensitivities is given by

\boldsymbol{H}_{t,\tau}=\frac{\partial\boldsymbol{g}}{\partial\boldsymbol{X}_{t,\tau}}(\boldsymbol{X}_{t,\tau})\,\frac{\partial\boldsymbol{h}}{\partial\boldsymbol{F}_{t,\tau}}(\boldsymbol{F}_{t,\tau})=\frac{\partial\boldsymbol{h}}{\partial\boldsymbol{F}_{t,\tau}}(\boldsymbol{F}_{t,\tau})

and the risk exposure can be obtained from (2).

Thus we need to find

\left(\frac{\partial\boldsymbol{h}}{\partial\boldsymbol{F}_{t,\tau}}(\boldsymbol{F}_{t,\tau})\right)_{i,j}

for $i=1:n$, $j=1:d+n$. Applying the chain rule, for $j=1:d$,

\left(\frac{\partial\boldsymbol{h}}{\partial\boldsymbol{F}_{t,\tau}}(\boldsymbol{F}_{t,\tau})\right)_{i,j}=\frac{\partial(\boldsymbol{P}_{t})_{i}}{\partial\beta_{t,j}}(\boldsymbol{F}_{t,\tau})=-\sum_{k=1}^{l_{i}}\tau_{t,i,k}c_{i,k}e^{-(y_{t}(\tau_{t,i,k})+\lambda_{t,i})\tau_{t,i,k}}f_{j}(\tau_{t,i,k}),

and for $j=d+1:d+n$, setting $j^{\prime}=j-d$,

\left(\frac{\partial\boldsymbol{h}}{\partial\boldsymbol{F}_{t,\tau}}(\boldsymbol{F}_{t,\tau})\right)_{i,j}=\frac{\partial(\boldsymbol{P}_{t})_{i}}{\partial\lambda_{t,j^{\prime}}}(\boldsymbol{F}_{t,\tau})=\left\{\begin{array}{ll}-\sum_{k=1}^{l_{i}}\tau_{t,i,k}c_{i,k}e^{-(y_{t}(\tau_{t,i,k})+\lambda_{t,i})\tau_{t,i,k}},&i=j^{\prime};\\ 0,&\text{otherwise.}\end{array}\right.

References

  • [1] FICO Xpress Optimizer Reference Manual, 2023.
  • [2] The Numerical Algorithms Group NAG Library Manual, Mark 29.2, 2023. https://support.nag.com/numeric/nl/nagdoc_latest/.
  • [3] Irene Aldridge. High-Frequency Trading: A Practical Guide to Algorithmic Strategies and Trading Systems. Wiley, 2 edition, 2013.
  • [4] Martin Andersen, Joachim Dahl, and Lieven Vandenberghe. CVXOPT: Convex optimization. Astrophysics Source Code Library, 2020.
  • [5] MOSEK ApS. The MOSEK optimization toolbox for MATLAB manual. Version 9.0., 2019.
  • [6] Karl J. Astrom. Introduction to Stochastic Control Theory. Dover, 2006.
  • [7] Dimitri P. Bertsekas. Dynamic programming and optimal control, Volume I. Athena Scientific, Belmont, MA, 2001.
  • [8] Dimitri P. Bertsekas. Dynamic programming and optimal control, Volume II. Athena Scientific, Belmont, MA, 2005.
  • [9] Michael J. Best. Portfolio Optimization. CRC Press, 2010.
  • [10] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
  • [11] Kazem Cheshmi, Danny M. Kaufman, Shoaib Kamil, and Maryam Mehri Dehnavi. NASOQ. ACM Transactions on Graphics, 39(4), aug 2020.
  • [12] Gerard Cornuejols, Javier Pena, and Reha Tutuncu. Optimization Methods in Finance. Cambridge University Press, 2 edition, 2018.
  • [13] Emanuel Derman. My Life as a Quant: Reflections on Physics and Finance. Wiley, 2007.
  • [14] Philip E. Gill, Walter Murray, and Margaret H. Wright. Practical Optimization. Emerald Group Publishing Limited, 1982.
  • [15] Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual, 2023.
  • [16] IBM. V12.1: User's manual for CPLEX. Technical report, IBM, 2009.
  • [17] IMSL. IMSL STAT/LIBRARY. Visual Numerics Inc., Houston, Texas, USA, 1997. http://www.vni.com/books/dod/pdf/STATVol_2.pdf.
  • [18] Mark Joshi. On becoming a quant. http://www.maths.usyd.edu.au/u/UG/SM/MATH3075/r/Joshi_2008.pdf, 2008.
  • [19] Mark S. Joshi and Jane M. Paterson. Introduction to Mathematical Portfolio Theory. International Series on Actuarial Science. Cambridge University Press, 2013.
  • [20] Harold J. Kushner and Paul Dupuis. Numerical Methods for Stochastic Control Problems in Continuous Time. Springer, 2 edition, 2000.
  • [21] John W. Lockwood, Adwait Gupte, Nishit Mehta, Michaela Blott, Tom English, and Kees Vissers. A low-latency library in FPGA hardware for high-frequency trading (HFT). In IEEE 20th Annual Symposium on High-Performance Interconnects, pages 9–16, 2012.
  • [22] Attilio Meucci. Risk and Asset Allocation. Springer Finance. Springer, 2005.
  • [23] Jan Novotny, Paul Alexander Bilokon, Aris Galiotos, and Frédéric Délèze. Machine Learning and Big Data with kdb+/q. Wiley, 2019.
  • [24] Bernt Øksendal. Stochastic Differential Equations: An Introduction with Applications. Universitext. Springer, 6 edition, 2000.
  • [25] Bernt Øksendal and Agnes Sulem. Applied Stochastic Control of Jump Diffusions. Springer, 3 edition, 2019.
  • [26] Huyen Pham. Continuous-time Stochastic Control and Optimization with Financial Applications. Springer, 2009.
  • [27] Warren B. Powell. Reinforcement Learning and Stochastic Optimization: A Unified Framework for Sequential Decisions. Wiley, 2022.
  • [28] B. Stellato, G. Banjac, P. Goulart, A. Bemporad, and S. Boyd. OSQP: an operator splitting solver for quadratic programs. Mathematical Programming Computation, 12(4):637–672, 2020.
  • [29] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2 edition, 2018.
  • [30] Jiongmin Yong and Xun Yu Zhou. Stochastic Controls: Hamiltonian Systems and HJB Equations. Springer, 1999.

Appendix A Auxiliary results

Theorem A.1 (The variance of a function of a random variable).

Let $X$ be a real-valued random variable with known finite expected value and finite nonzero variance and let $f:\mathbb{R}\to\mathbb{R}$ be a sufficiently smooth function (the proof below requires three derivatives). Then

\mathsf{Var}\left[f(X)\right]\approx f^{\prime}(\mathbb{E}\left[X\right])^{2}\,\mathsf{Var}\left[X\right].
Proof.

The following proof is due to Tomek Tarczynski (see his post "Variance of a function of one random variable" on CrossValidated: http://stats.stackexchange.com/questions/5782/variance-of-a-function-of-one-random-variable).

By Chebyshev's inequality for random variables with finite variance, for any real $c>0$,

\mathbb{P}\left[|X-\mathbb{E}\left[X\right]|>c\right]\leq\frac{1}{c^{2}}\mathsf{Var}\left[X\right],

so for any $\epsilon>0$ we can find a large enough $c$ so that

\mathbb{P}\left[X\in[\mathbb{E}\left[X\right]-c,\mathbb{E}\left[X\right]+c]\right]=\mathbb{P}\left[|X-\mathbb{E}\left[X\right]|\leq c\right]>1-\epsilon.

Let us estimate $\mathbb{E}\left[f(X)\right]$. We can write it as

\mathbb{E}\left[f(X)\right]=\int_{|x-\mathbb{E}\left[X\right]|\leq c}f(x)\,dF(x)+\int_{|x-\mathbb{E}\left[X\right]|>c}f(x)\,dF(x), (5)

where $F$ is the distribution function of $X$.

Since the domain of the first integral is the bounded closed interval $[\mathbb{E}\left[X\right]-c,\mathbb{E}\left[X\right]+c]$, we can apply the Taylor series expansion:

f(x)=f(\mathbb{E}\left[X\right])+f^{\prime}(\mathbb{E}\left[X\right])(x-\mathbb{E}\left[X\right])+\frac{1}{2}f^{\prime\prime}(\mathbb{E}\left[X\right])(x-\mathbb{E}\left[X\right])^{2}+\frac{1}{3!}f^{\prime\prime\prime}(\alpha)(x-\mathbb{E}\left[X\right])^{3},

where $\alpha\in[\mathbb{E}\left[X\right]-c,\mathbb{E}\left[X\right]+c]$, and the equality holds for all $x\in[\mathbb{E}\left[X\right]-c,\mathbb{E}\left[X\right]+c]$. Here we took only four terms in the Taylor series expansion, but in general we can take as many as needed, as long as the function $f$ is smooth enough.

Substituting this formula into (5), we get

\displaystyle\mathbb{E}\left[f(X)\right]=\int_{|x-\mathbb{E}\left[X\right]|\leq c}\left(f(\mathbb{E}\left[X\right])+f^{\prime}(\mathbb{E}\left[X\right])(x-\mathbb{E}\left[X\right])+\frac{1}{2}f^{\prime\prime}(\mathbb{E}\left[X\right])(x-\mathbb{E}\left[X\right])^{2}\right)dF(x)
\displaystyle\quad+\int_{|x-\mathbb{E}\left[X\right]|\leq c}\frac{1}{3!}f^{\prime\prime\prime}(\alpha)(x-\mathbb{E}\left[X\right])^{3}\,dF(x)
\displaystyle\quad+\int_{|x-\mathbb{E}\left[X\right]|>c}f(x)\,dF(x).

Extending the domain of integration of the first integral to the whole real line (and compensating for this in the remainder), we obtain

\mathbb{E}\left[f(X)\right]=f(\mathbb{E}\left[X\right])+\frac{1}{2}f^{\prime\prime}(\mathbb{E}\left[X\right])\mathbb{E}\left[(X-\mathbb{E}\left[X\right])^{2}\right]+R_{3}, (6)

where

\displaystyle R_{3}=\int_{|x-\mathbb{E}\left[X\right]|\leq c}\frac{1}{3!}f^{\prime\prime\prime}(\alpha)(x-\mathbb{E}\left[X\right])^{3}\,dF(x)
\displaystyle\quad+\int_{|x-\mathbb{E}\left[X\right]|>c}\left(f(x)-f(\mathbb{E}\left[X\right])-f^{\prime}(\mathbb{E}\left[X\right])(x-\mathbb{E}\left[X\right])-\frac{1}{2}f^{\prime\prime}(\mathbb{E}\left[X\right])(x-\mathbb{E}\left[X\right])^{2}\right)dF(x).

Under some moment conditions, we can show that the second term of this remainder is of the same order as $\mathbb{P}[|X-\mathbb{E}\left[X\right]|>c]$, which is generally small. The first term remains, so the quality of the approximation depends on $\mathbb{E}\left[(X-\mathbb{E}\left[X\right])^{3}\right]$ and the behaviour of the third derivative of $f$ on bounded intervals. Such an approximation works particularly well for random variables with zero third central moment, such as normally distributed random variables.

To obtain an approximation for the variance of $f(X)$, we subtract (6) from the Taylor series expansion of $f(X)$, square the difference, and take expectations:

\mathsf{Var}\left[f(X)\right]=\mathbb{E}\left[(f(X)-\mathbb{E}\left[f(X)\right])^{2}\right]=(f^{\prime}(\mathbb{E}\left[X\right]))^{2}\mathsf{Var}\left[X\right]+T_{3},

where $T_{3}$ involves the central moments $\mathbb{E}\left[(X-\mathbb{E}\left[X\right])^{k}\right]$ for $k\geq 3$. ∎
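The quality of this approximation is easy to probe by Monte Carlo. The sketch below takes $f=\exp$ and $X$ normal, so that the third central moment vanishes (the favourable case noted in the proof); the sample size and parameters are illustrative:

```python
import numpy as np

# Monte Carlo check of Var[f(X)] ~ f'(E[X])^2 Var[X] for f = exp, X normal.
rng = np.random.default_rng(2)
mu, sigma = 1.0, 0.05
X = rng.normal(mu, sigma, size=1_000_000)

var_mc = np.var(np.exp(X))                     # Monte Carlo variance of f(X)
var_delta = np.exp(mu) ** 2 * sigma ** 2       # f'(E[X])^2 Var[X]
rel_err = abs(var_mc - var_delta) / var_delta  # small for small sigma
```

For small $\sigma$ the relative error is dominated by the higher-order moment terms and is of order $\sigma^{2}$.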

Remark A.2.

Theorem A.1 generalises to an $\mathbb{R}^{n}$-valued random variable $\boldsymbol{X}$, $n\in\mathbb{N}^{*}$, and $\boldsymbol{f}:\mathbb{R}^{n}\to\mathbb{R}^{n}$:

\mathsf{Var}\left[\boldsymbol{f}(\boldsymbol{X})\right]\approx\left(\frac{\partial\boldsymbol{f}}{\partial\boldsymbol{X}}(\mathbb{E}\left[\boldsymbol{X}\right])\right)^{\intercal}\mathsf{Var}\left[\boldsymbol{X}\right]\,\frac{\partial\boldsymbol{f}}{\partial\boldsymbol{X}}(\mathbb{E}\left[\boldsymbol{X}\right]).
Lemma A.3.

Let $\boldsymbol{M}$ be the block matrix

\boldsymbol{M}=\left(\begin{array}{cc}\boldsymbol{A}&\boldsymbol{0}_{n\times m}\\ \boldsymbol{0}_{m\times n}&\boldsymbol{B}\end{array}\right)

with $\boldsymbol{A}\in\mathbb{R}^{n\times n}$, $\boldsymbol{B}\in\mathbb{R}^{m\times m}$. The eigenvalues of $\boldsymbol{M}$ are precisely those of the matrices $\boldsymbol{A}$ ($n$ eigenvalues, some of them possibly repeated) and $\boldsymbol{B}$ ($m$ eigenvalues, some of them possibly repeated).

Proof.

It is well known that for any square matrices $\boldsymbol{A}\in\mathbb{R}^{n\times n}$ and $\boldsymbol{B}\in\mathbb{R}^{m\times m}$, the following holds:

\det\left(\begin{array}{cc}\boldsymbol{A}&\boldsymbol{0}_{n\times m}\\ \boldsymbol{0}_{m\times n}&\boldsymbol{B}\end{array}\right)=\det(\boldsymbol{A})\det(\boldsymbol{B}).

Therefore, the characteristic polynomial of this block matrix is given by

\det\left(\begin{array}{cc}\boldsymbol{A}-\lambda\boldsymbol{I}_{n\times n}&\boldsymbol{0}_{n\times m}\\ \boldsymbol{0}_{m\times n}&\boldsymbol{B}-\lambda\boldsymbol{I}_{m\times m}\end{array}\right)=\det(\boldsymbol{A}-\lambda\boldsymbol{I}_{n\times n})\det(\boldsymbol{B}-\lambda\boldsymbol{I}_{m\times m}).

It follows that the eigenvalues of $\boldsymbol{M}$ are precisely those of the matrices $\boldsymbol{A}$ and $\boldsymbol{B}$, combined. ∎

Lemma A.4.

Let $\boldsymbol{M}$ be the block matrix

\boldsymbol{M}=\left(\begin{array}{cc}\boldsymbol{A}&\boldsymbol{B}\\ \boldsymbol{B}&\boldsymbol{A}\end{array}\right)

with $\boldsymbol{A},\boldsymbol{B}\in\mathbb{R}^{n\times n}$, $n\in\mathbb{N}^{*}$. The eigenvalues of $\boldsymbol{M}$ are precisely those of the matrices $\boldsymbol{A}+\boldsymbol{B}$ ($n$ eigenvalues, some of them possibly repeated) and $\boldsymbol{A}-\boldsymbol{B}$ ($n$ eigenvalues, some of them possibly repeated).

Proof.

It is well known that for all square matrices $\boldsymbol{A}$ and $\boldsymbol{B}$ of equal dimensions, the following holds:

\det\left(\begin{array}{cc}\boldsymbol{A}&\boldsymbol{B}\\ \boldsymbol{B}&\boldsymbol{A}\end{array}\right)=\det(\boldsymbol{A}-\boldsymbol{B})\det(\boldsymbol{A}+\boldsymbol{B}).

Therefore, the characteristic polynomial of this block matrix is given by

\det\left(\begin{array}{cc}\boldsymbol{A}-\lambda\boldsymbol{I}_{n\times n}&\boldsymbol{B}\\ \boldsymbol{B}&\boldsymbol{A}-\lambda\boldsymbol{I}_{n\times n}\end{array}\right)=\det((\boldsymbol{A}-\boldsymbol{B})-\lambda\boldsymbol{I}_{n\times n})\det((\boldsymbol{A}+\boldsymbol{B})-\lambda\boldsymbol{I}_{n\times n}).

It follows that the eigenvalues of $\boldsymbol{M}$ are precisely those of the matrices $\boldsymbol{A}+\boldsymbol{B}$ and $\boldsymbol{A}-\boldsymbol{B}$, combined. ∎
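Both lemmas are straightforward to verify numerically; the sketch below checks them on random symmetric blocks (the data are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n)); A = A + A.T   # symmetric n x n blocks
B = rng.standard_normal((n, n)); B = B + B.T

# Lemma A.3: eigenvalues of diag(A, B) are those of A and B combined.
M3 = np.block([[A, np.zeros((n, n))], [np.zeros((n, n)), B]])
eig_M3 = np.sort(np.linalg.eigvalsh(M3))
eig_AB = np.sort(np.concatenate([np.linalg.eigvalsh(A), np.linalg.eigvalsh(B)]))

# Lemma A.4: eigenvalues of [[A, B], [B, A]] are those of A + B and A - B.
M4 = np.block([[A, B], [B, A]])
eig_M4 = np.sort(np.linalg.eigvalsh(M4))
eig_pm = np.sort(np.concatenate([np.linalg.eigvalsh(A + B),
                                 np.linalg.eigvalsh(A - B)]))
```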