
Implementing portfolio risk management and hedging in practice

Paul Alexander Bilokon
Thalesians Ltd
Level39, One Canada Square
Canary Wharf
London E14 5AB
(2023.09.27)
Abstract

In the academic literature, portfolio risk management and hedging are often expressed in the language of stochastic control and Hamilton–Jacobi–Bellman (HJB) equations in continuous time. In practice the continuous-time framework of stochastic control may be undesirable for various business reasons. In this work we present a straightforward approach for thinking of cross-asset portfolio risk management and hedging, providing some implementation details, while rarely venturing outside the convex optimisation setting of (approximate) quadratic programming (QP). We pay particular attention to the correspondence between the economic concepts and their mathematical representations; the abstractions enabling us to handle multiple asset classes and risk models at once; the dimensional analysis of the resulting equations; and the assumptions inherent in our derivations. We demonstrate how to solve the resulting QPs with CVXOPT.

1 Introduction

In the academic literature, portfolio risk management and hedging are often (but not always [19, 22]) expressed in the language of stochastic control [6, 30, 26, 20, 25, 24] and Hamilton–Jacobi–Bellman (HJB) equations in continuous time. Indeed, this is the most rigorous and general framework for such considerations. Due to the inherent, natural relationship between stochastic control and reinforcement learning [7, 8], this framework can be extended relatively easily to the numerical methods of reinforcement learning [29, 27], which take care of the more realistic setting involving frictions.

In the author's practical experience of portfolio and risk management on the sell-side and on the buy-side, including at bulge bracket institutions, this approach presents several practical problems. First, some quantitative analysts and quantitative developers [13, 18] come from backgrounds that exclude stochastic control; for example, many classical computer science degrees cover neither stochastic control nor stochastic analysis as part of the syllabus. Second, the continuous-time framework is technically complex and requires discretisation at some point anyway. Third, there are relatively few industry-grade solvers for HJB-type problems.

On the other hand, quadratic programming [9, 12, 14, 10] is accessible to most majors and requires a relatively limited background in undergraduate linear algebra (rather than graduate-level stochastic analysis and measure theory). It is usually well understood by quantitative analysts and developers irrespective of their academic and professional background. Furthermore, there are fast industry-grade implementations of quadratic program (QP) solvers, including CPLEX [16], Gurobi [15], IMSL [17], NAG [2], NASOQ [11], OSQP [28], and Xpress [1]. Therefore it makes sense to come up with a QP formulation of the portfolio risk management and hedging problem. Even if such an approach is imperfect in comparison with the stochastic control approach, it is practical.

Finally, in medium- to high-frequency trading (HFT) [3] applications, the optimiser needs to be very efficient and is called sparingly, at discrete time intervals. In this setting, the QP formulation is yet again appealing.

In this work we describe the portfolio risk management and hedging framework with sufficient rigour, paying particular attention to the correspondence between the economic concepts and their mathematical representations; the abstractions enabling us to handle multiple asset classes and risk models at once; the dimensional analysis of the resulting equations; and the assumptions inherent in our derivations.

We demonstrate how to solve the resulting QPs with CVXOPT [4], a free software package for convex optimisation based on the Python programming language. CVXOPT is convenient in the research and development context. In production the reader is advised to use one of the industry-grade C++ or kdb+/q [23] implementations, or a specialised implementation of the solver on an FPGA or ASIC [21].

The implementations that we provide here are baseline, pedagogical implementations. In some cases the business requirements may be such that additional frictions may need to be taken into account, in which case the problem ceases to be convex. There are some proprietary tricks that can be applied in these situations. These tricks can significantly impact the profitability and risk profile of a business, but unfortunately they are outside the scope of this work.

Acknowledgements

We would like to thank Berc Rustem (Department of Computing, Imperial College London) and Martin Zinkin (Qubealgo) for our constructive discussions. The factor abstraction is largely due to Attilio Meucci and follows [22].

2 Preliminaries

In what follows we assume that we are the market maker and the sides of the trades are given from our perspective: buys are when we buy (the change in our position has a positive sign), sells are when we sell (the change in our position has a negative sign).

We shall denote by $\mathbb{N}^{*}$ the set $\{1,2,3,\ldots\}$; by $\mathbf{1}_{n}$ the vector in $\mathbb{R}^{n}$ whose elements are all ones. Vectors are column vectors by default.

Let us establish conventions for matrix calculus. The (scalar-by-vector) derivative of a scalar $y$ with respect to a vector in $\mathbb{R}^{n}$,

\[\boldsymbol{x}=\begin{pmatrix}x_{1}\\ \vdots\\ x_{n}\end{pmatrix},\]

with $n\in\mathbb{N}^{*}$, is written, in numerator layout notation, as

\[\frac{\partial y}{\partial\boldsymbol{x}}=\begin{pmatrix}\frac{\partial y}{\partial x_{1}}&\cdots&\frac{\partial y}{\partial x_{n}}\end{pmatrix}.\]

Strictly speaking, the result is a matrix in $\mathbb{R}^{1\times n}$, although we can certainly, and somewhat confusingly, think of it as a vector in $\mathbb{R}^{n}$. (Confusingly, because of the widespread habit of thinking of vectors as column vectors by default. To avoid this confusion, we shall favour the matrix view over the vector view whenever there is ambiguity.)

The (vector-by-vector) derivative of an $\mathbb{R}^{m}$-valued vector function (a vector whose components are functions)

\[\boldsymbol{y}=\begin{pmatrix}y_{1}\\ \vdots\\ y_{m}\end{pmatrix},\]

with $m\in\mathbb{N}^{*}$, with respect to an input vector in $\mathbb{R}^{n}$,

\[\boldsymbol{x}=\begin{pmatrix}x_{1}\\ \vdots\\ x_{n}\end{pmatrix},\]

with $n\in\mathbb{N}^{*}$, is written, in numerator layout notation, as

\[\frac{\partial\boldsymbol{y}}{\partial\boldsymbol{x}}=\begin{pmatrix}\frac{\partial y_{1}}{\partial x_{1}}&\cdots&\frac{\partial y_{1}}{\partial x_{n}}\\ \vdots&\ddots&\vdots\\ \frac{\partial y_{m}}{\partial x_{1}}&\cdots&\frac{\partial y_{m}}{\partial x_{n}}\end{pmatrix}.\]

The result is a matrix in $\mathbb{R}^{m\times n}$. It is called the Jacobian matrix of $\boldsymbol{y}$ with respect to $\boldsymbol{x}$.

We have included these definitions here because conventions vary across the literature: some authors prefer the numerator layout notation (as we do throughout this document), while others prefer the denominator layout notation.

Given the numerator layout convention that we have chosen, if $\boldsymbol{x}\in\mathbb{R}^{n}$ and $\boldsymbol{A}\in\mathbb{R}^{n\times n}$, then $\frac{\partial}{\partial\boldsymbol{x}}\boldsymbol{x}^{\intercal}\boldsymbol{A}\boldsymbol{x}=\boldsymbol{x}^{\intercal}(\boldsymbol{A}+\boldsymbol{A}^{\intercal})$. If, moreover, $\boldsymbol{A}$ is symmetric, then $\frac{\partial}{\partial\boldsymbol{x}}\boldsymbol{x}^{\intercal}\boldsymbol{A}\boldsymbol{x}=2\boldsymbol{x}^{\intercal}\boldsymbol{A}$. If $\boldsymbol{x}\in\mathbb{R}^{n}$ and $\boldsymbol{A}\in\mathbb{R}^{m\times n}$, then $\frac{\partial}{\partial\boldsymbol{x}}\boldsymbol{A}\boldsymbol{x}=\boldsymbol{A}$.
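These identities are easy to check numerically; a minimal sketch with NumPy, where the finite-difference step and the random test data are illustrative choices rather than part of the text:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

def y(x):
    # the scalar y = x' A x
    return x @ A @ x

# Central finite differences give the numerator-layout derivative,
# a 1 x n row vector.
eps = 1e-6
dy_dx_fd = np.array([(y(x + eps * e) - y(x - eps * e)) / (2 * eps)
                     for e in np.eye(n)])

dy_dx = x @ (A + A.T)  # the identity d(x'Ax)/dx = x'(A + A')
```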

3 CVXOPT

CVXOPT's function qp is an interface to coneqp for quadratic programs. It also provides the option of using the quadratic programming solver from MOSEK [5].

cvxopt.solvers.qp(P, q[, G, h[, A, b[, solver[, initvals]]]])

This solves the convex quadratic program

\[\begin{array}{ll}\underset{x}{\text{minimise}}&(1/2)x^{T}Px+q^{T}x\\ \text{subject to}&Gx\preceq h,\\ &Ax=b\end{array}\]

and its dual.

4 The portfolio

Suppose that we are trading $n\in\mathbb{N}^{*}$ products whose prices per unit notional (we shall use the words “size” and “notional” synonymously) at time $t$ are given by the vector $\boldsymbol{P}_{t}\in\mathbb{R}^{n}$. We shall represent the composition of our portfolio, $\pi_{t}$, in terms of $\boldsymbol{N}_{t}\in\mathbb{R}^{n}$, where the $i$th element, $(\boldsymbol{N}_{t})_{i}$, is the net notional amount of the $i$th product in the portfolio ($i=1,\ldots,n$). Thus the total net notional of our portfolio is given by $N^{\pi}_{t}=\mathbf{1}_{n}^{\intercal}\boldsymbol{N}_{t}\in\mathbb{R}$. We could also consider the weights of the products in our portfolio, $\boldsymbol{w}_{t}=\frac{1}{N^{\pi}_{t}}\boldsymbol{N}_{t}\in\mathbb{R}^{n}$. The value of our portfolio at time $t$ is given by $V_{t}^{\pi}=\boldsymbol{N}_{t}^{\intercal}\boldsymbol{P}_{t}\in\mathbb{R}$.
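For concreteness, a tiny worked example of these definitions with hypothetical numbers (three products):

```python
import numpy as np

# Hypothetical three-product portfolio (illustrative numbers only).
N = np.array([100.0, -50.0, 25.0])   # net notionals N_t
P = np.array([10.0, 20.0, 40.0])     # prices per unit notional P_t

N_pi = np.ones(len(N)) @ N   # total net notional 1'N  = 75.0
w = N / N_pi                 # portfolio weights w_t (sum to 1)
V = N @ P                    # portfolio value V_t = N'P = 1000.0
```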

This is an appropriate time to comment on the units. The value of the portfolio, $V_{t}^{\pi}$, is always expressed in units of currency, say, USD or EUR. The units of the prices, $\boldsymbol{P}_{t}$, and notionals, $\boldsymbol{N}_{t}$, are subject to the conventions of the particular asset class:

  • For equities, $\boldsymbol{N}_{t}$ is dimensionless; the notional is expressed as the number of shares. The price, $\boldsymbol{P}_{t}$, is in units of currency.

  • For CDS indices, $\boldsymbol{N}_{t}$ is expressed in units of currency, as this is the amount on which protection is being bought or sold (e.g. 25,000,000 USD). The price, $\boldsymbol{P}_{t}$, on the other hand, is dimensionless, as it is expressed in units of currency per unit notional, which is itself expressed in units of currency.

5 The invariants

Our market risk consists in our dependence on the prices of the products at time $T+\tau$, where $T$ represents the time when the asset allocation decision is made and $\tau\in[0,+\infty)$ is our investment horizon.

In order to understand our market risk, we seek to express $\boldsymbol{P}_{t}\in\mathbb{R}^{n}$ in terms of invariants: market variables that exhibit stationary behaviour over time and can be expressed in terms of independent and identically distributed random variables.

Let us denote by $\boldsymbol{X}_{t,\tau}$ the $\mathbb{R}^{n}$-valued random vector of invariants at time $t$ for a given investment horizon $\tau$. We express the prices of the products in our portfolio as

\[\boldsymbol{P}_{t}=\boldsymbol{g}_{\tau}(\boldsymbol{X}_{t,\tau}),\]

where $\boldsymbol{g}_{\tau}:\mathbb{R}^{n}\to\mathbb{R}^{n}$ is a function that depends on the asset classes of the products in our portfolio and on our investment horizon. In particular,

\[\boldsymbol{P}_{T+\tau}=\boldsymbol{g}_{\tau}(\boldsymbol{X}_{T+\tau,\tau}).\]

Notice that we are assuming that, like $\boldsymbol{P}_{t}$, $\boldsymbol{X}_{t,\tau}$ is $n$-dimensional. The reason why this is a sensible assumption will become clear when we consider examples of $\boldsymbol{g}_{\tau}$ and $\boldsymbol{X}_{t,\tau}$ for various asset classes.

6 The factor model

To get a handle on our market risk, we shall express the vector of invariants as

\[\boldsymbol{X}_{t,\tau}=\boldsymbol{h}(\boldsymbol{F}_{t,\tau})+\boldsymbol{U}_{t,\tau},\tag{1}\]

where $\boldsymbol{F}_{t,\tau}$ is an $\mathbb{R}^{m}$-valued ($m\in\mathbb{N}^{*}$) vector of common risk factors that are responsible for most of the randomness in the market, $\boldsymbol{U}_{t,\tau}$ is a residual vector of perturbations, and $\boldsymbol{h}:\mathbb{R}^{m}\to\mathbb{R}^{n}$ is a function.

7 Risk

What is our risk exposure, i.e. the sensitivity of the value of our portfolio to our risk factors? It is given by the $\mathbb{R}^{m}$-valued

\[\boldsymbol{r}_{t,\tau}:=\frac{\partial V_{t}^{\pi}}{\partial\boldsymbol{F}_{t,\tau}}=\frac{\partial}{\partial\boldsymbol{F}_{t,\tau}}\boldsymbol{N}_{t}^{\intercal}\boldsymbol{P}_{t}=\boldsymbol{N}_{t}^{\intercal}\frac{\partial}{\partial\boldsymbol{F}_{t,\tau}}\boldsymbol{g}(\boldsymbol{X}_{t,\tau})=\boldsymbol{N}_{t}^{\intercal}\,\frac{\partial\boldsymbol{g}}{\partial\boldsymbol{X}_{t,\tau}}(\boldsymbol{X}_{t,\tau})\,\frac{\partial\boldsymbol{h}}{\partial\boldsymbol{F}_{t,\tau}}(\boldsymbol{F}_{t,\tau}).\]

The dependence of our risk exposure on the notional vector is expressed by the matrix of sensitivities, the $m\times n$ Jacobian,

\[\boldsymbol{H}_{t,\tau}=\frac{\partial\boldsymbol{r}_{t,\tau}}{\partial\boldsymbol{N}_{t}}=\frac{\partial}{\partial\boldsymbol{N}_{t}}\left(\boldsymbol{N}_{t}^{\intercal}\,\frac{\partial\boldsymbol{g}}{\partial\boldsymbol{X}_{t,\tau}}(\boldsymbol{X}_{t,\tau})\,\frac{\partial\boldsymbol{h}}{\partial\boldsymbol{F}_{t,\tau}}(\boldsymbol{F}_{t,\tau})\right)=\frac{\partial\boldsymbol{g}}{\partial\boldsymbol{X}_{t,\tau}}(\boldsymbol{X}_{t,\tau})\,\frac{\partial\boldsymbol{h}}{\partial\boldsymbol{F}_{t,\tau}}(\boldsymbol{F}_{t,\tau}).\]

Thus the relationship between $\boldsymbol{r}_{t,\tau}$ and $\boldsymbol{H}_{t,\tau}$, which follows from their definitions, is given by

\[\boldsymbol{r}_{t,\tau}=\boldsymbol{N}_{t}^{\intercal}\boldsymbol{H}_{t,\tau}.\tag{2}\]

8 The variance of the value of the portfolio

We would like to minimise the variance of the value of the portfolio at time $T+\tau$, which represents our market risk. Let us regard $V_{t}^{\pi}$ as a function of the risk factors: $V_{t}^{\pi}=V_{t}^{\pi}(\boldsymbol{F}_{t,\tau})$. By Theorem A.1 and Remark A.2,

\[\mathsf{Var}\left[V_{t}^{\pi}\right]\approx\left(\frac{\partial V_{t}^{\pi}}{\partial\boldsymbol{F}_{t,\tau}}\right)^{\intercal}\boldsymbol{C}\,\frac{\partial V_{t}^{\pi}}{\partial\boldsymbol{F}_{t,\tau}}=\boldsymbol{r}_{t,\tau}^{\intercal}\boldsymbol{C}\boldsymbol{r}_{t,\tau},\]

where $\boldsymbol{C}:=\mathsf{Cov}(\boldsymbol{F}_{t,\tau})\in\mathbb{R}^{m\times m}$ is the covariance matrix of the risk factors.

This result is approximate. We approximated in two places:

  1. In Theorem A.1, where we disposed of the remainder term of the Taylor series expansion.

  2. When we regarded $V_{t}^{\pi}$ as a function of $\boldsymbol{F}_{t,\tau}$ alone, whereas it is also a function of another random variable: $\boldsymbol{U}_{t,\tau}$. We have effectively approximated (1) by

     \[\boldsymbol{X}_{t,\tau}\approx\boldsymbol{h}(\boldsymbol{F}_{t,\tau}),\]

     which is sensible if the variance of $\boldsymbol{U}_{t,\tau}$ is small, i.e. our risk factors are responsible for most of the variance of $\boldsymbol{X}_{t,\tau}$. To better account for the variance of $\boldsymbol{U}_{t,\tau}$, we could have used the bivariate Taylor series expansion instead of the univariate one used in the proof of Theorem A.1.
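When $V^{\pi}$ is exactly linear in the factors, the approximation above holds with equality, which gives a convenient numerical sanity check; a sketch with a hypothetical covariance and exposure:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical factor covariance C (m = 3) and exposure r = dV/dF.
C = np.array([[1.0, 0.2, 0.0],
              [0.2, 1.5, 0.3],
              [0.0, 0.3, 2.0]])
r = np.array([1.0, -2.0, 0.5])

# Simulate factor realisations and the (linear) portfolio value V = r'F.
F = rng.multivariate_normal(np.zeros(3), C, size=200_000)
V = F @ r

sample_var = V.var()
delta_var = r @ C @ r  # the delta approximation, exact in the linear case
```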

9 Minimising risk

We would like to minimise the variance of the value of our portfolio at our investment horizon, $\mathsf{Var}\left[V_{T+\tau}^{\pi}\right]$. Bearing in mind the caveats mentioned in Section 8, this is achieved by minimising

\[\mathsf{Var}\left[V_{T+\tau}^{\pi}\right]\approx(\boldsymbol{r}_{T+\tau,\tau}+\boldsymbol{H}_{T+\tau,\tau}\boldsymbol{x})^{\intercal}\boldsymbol{C}_{T+\tau,\tau}(\boldsymbol{r}_{T+\tau,\tau}+\boldsymbol{H}_{T+\tau,\tau}\boldsymbol{x}),\]

where $\boldsymbol{C}_{T+\tau,\tau}:=\mathsf{Cov}(\boldsymbol{F}_{T+\tau,\tau})$ and $\boldsymbol{x}\in\mathbb{R}^{n}$ is the change in position that will minimise the variance of the value of our portfolio. It is $\boldsymbol{x}$ that we need to find.

Note that, at the time the hedging decision is made, $\boldsymbol{r}_{T+\tau,\tau}$, $\boldsymbol{H}_{T+\tau,\tau}$, and $\boldsymbol{C}_{T+\tau,\tau}$ are not yet known. Therefore we have to make use of forecasts instead and minimise

\[(\hat{\boldsymbol{r}}_{T+\tau,\tau}+\hat{\boldsymbol{H}}_{T+\tau,\tau}\boldsymbol{x})^{\intercal}\hat{\boldsymbol{C}}_{T+\tau,\tau}(\hat{\boldsymbol{r}}_{T+\tau,\tau}+\hat{\boldsymbol{H}}_{T+\tau,\tau}\boldsymbol{x}),\]

where $\hat{\boldsymbol{C}}_{T+\tau,\tau}:=\mathsf{Cov}(\hat{\boldsymbol{F}}_{T+\tau,\tau})$. We shall discuss how these forecasts can be obtained later. For the time being, we shall drop the subscripts and hats to avoid notational clutter and solve the unconstrained quadratic program (QP)

\[\underset{\boldsymbol{x}}{\text{minimise}}\;\;(\boldsymbol{r}+\boldsymbol{H}\boldsymbol{x})^{\intercal}\boldsymbol{C}(\boldsymbol{r}+\boldsymbol{H}\boldsymbol{x}),\tag{3}\]

which can be done analytically using straightforward matrix calculus. First, differentiate with respect to $\boldsymbol{x}$:

\[\begin{aligned}\frac{\partial}{\partial\boldsymbol{x}}(\boldsymbol{r}+\boldsymbol{H}\boldsymbol{x})^{\intercal}\boldsymbol{C}(\boldsymbol{r}+\boldsymbol{H}\boldsymbol{x})&=\frac{\partial}{\partial\boldsymbol{x}}\left(\boldsymbol{x}^{\intercal}\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}\boldsymbol{x}+\boldsymbol{r}^{\intercal}\boldsymbol{C}\boldsymbol{H}\boldsymbol{x}+\boldsymbol{x}^{\intercal}\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{r}+\boldsymbol{r}^{\intercal}\boldsymbol{C}\boldsymbol{r}\right)\\ &=\frac{\partial}{\partial\boldsymbol{x}}\left(\boldsymbol{x}^{\intercal}\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}\boldsymbol{x}+2\boldsymbol{r}^{\intercal}\boldsymbol{C}\boldsymbol{H}\boldsymbol{x}+\boldsymbol{r}^{\intercal}\boldsymbol{C}\boldsymbol{r}\right)\\ &=2\boldsymbol{x}^{\intercal}\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}+2\boldsymbol{r}^{\intercal}\boldsymbol{C}\boldsymbol{H}.\end{aligned}\]

We set this partial derivative to zero and solve for $\boldsymbol{x}$:

\[\boldsymbol{x}^{\intercal}\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}=-\boldsymbol{r}^{\intercal}\boldsymbol{C}\boldsymbol{H},\]

hence

\[\boldsymbol{x}^{\intercal}=-\boldsymbol{r}^{\intercal}\boldsymbol{C}\boldsymbol{H}\left(\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}\right)^{-1}\]

and

\[\boldsymbol{x}=-\left(\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}\right)^{-1}\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{r}\in\mathbb{R}^{n}.\]
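The closed-form solution is straightforward to implement; a minimal NumPy sketch with hypothetical inputs (a linear solve is preferable to forming the inverse explicitly):

```python
import numpy as np

def min_variance_hedge(H, C, r):
    """Unconstrained minimiser x = -(H'CH)^{-1} H'C r."""
    HCH = H.T @ C @ H
    return -np.linalg.solve(HCH, H.T @ C @ r)

# Hypothetical inputs: m = 3 risk factors, n = 2 hedge products.
H = np.array([[1.0, 0.2],
              [0.3, 1.5],
              [-0.5, 0.4]])      # sensitivities
C = np.array([[1.0, 0.2, 0.0],
              [0.2, 1.5, 0.3],
              [0.0, 0.3, 2.0]])  # factor covariance
r = np.array([2.0, -1.0, 0.5])   # current risk exposure

x = min_variance_hedge(H, C, r)
# At the optimum the gradient 2(r + Hx)'CH vanishes.
grad = 2 * (r + H @ x) @ C @ H
```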

10 Minimising risk and symmetric costs

Let us now incorporate the costs into the optimisation. We now assume that it will cost us $\boldsymbol{c}^{\intercal}|\boldsymbol{x}|$ to execute the hedge (change our position by $\boldsymbol{x}$), where $\boldsymbol{c}\in\mathbb{R}^{n}$ are the costs per unit notional and $|\cdot|$ denotes the elementwise absolute value. Notice that the costs are symmetric: the same linear factors of $|\boldsymbol{x}|$ apply irrespective of whether we are buying or selling. This assumption may not be realistic in practice.

The problem becomes

\[\underset{\boldsymbol{x}}{\text{minimise}}\;\;(\boldsymbol{r}+\boldsymbol{H}\boldsymbol{x})^{\intercal}\boldsymbol{C}(\boldsymbol{r}+\boldsymbol{H}\boldsymbol{x})+\lambda_{c}\boldsymbol{c}^{\intercal}|\boldsymbol{x}|.\]

Here $\lambda_{c}\geq 0$ is a nonnegative constant specifying how many units of cash we are prepared to pay for reducing the variance of the value of the portfolio by one unit (in units of value squared).

At first sight, we could instead solve the unconstrained QP

\[\underset{\boldsymbol{x}}{\text{minimise}}\;\;(\boldsymbol{r}+\boldsymbol{H}\boldsymbol{x})^{\intercal}\boldsymbol{C}(\boldsymbol{r}+\boldsymbol{H}\boldsymbol{x})+\lambda_{c}\boldsymbol{c}^{\intercal}\boldsymbol{x}\]

using matrix calculus as before:

\[\frac{\partial}{\partial\boldsymbol{x}}\left((\boldsymbol{r}+\boldsymbol{H}\boldsymbol{x})^{\intercal}\boldsymbol{C}(\boldsymbol{r}+\boldsymbol{H}\boldsymbol{x})+\lambda_{c}\boldsymbol{c}^{\intercal}\boldsymbol{x}\right)=2\boldsymbol{x}^{\intercal}\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}+2\boldsymbol{r}^{\intercal}\boldsymbol{C}\boldsymbol{H}+\lambda_{c}\boldsymbol{c}^{\intercal}.\]

Setting this partial derivative to zero and solving for $\boldsymbol{x}$, we get

\[\boldsymbol{x}^{\intercal}=-\left(\boldsymbol{r}^{\intercal}\boldsymbol{C}\boldsymbol{H}+\frac{1}{2}\lambda_{c}\boldsymbol{c}^{\intercal}\right)\left(\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}\right)^{-1},\]

hence

\[\boldsymbol{x}=-\left(\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}\right)^{-1}\left(\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{r}+\frac{1}{2}\lambda_{c}\boldsymbol{c}\right)\in\mathbb{R}^{n}.\]

However, in this form the problem is misspecified: if $\boldsymbol{c}$ has all positive elements, then we are paying for buying, but receiving money for selling! The elementwise absolute value keeps the costs nonnegative, but the resulting objective is no longer that of an unconstrained QP.

We can rewrite this problem as

\[\begin{array}{ll}\underset{\boldsymbol{x},\boldsymbol{v}}{\text{minimise}}&(\boldsymbol{r}+\boldsymbol{H}\boldsymbol{x})^{\intercal}\boldsymbol{C}(\boldsymbol{r}+\boldsymbol{H}\boldsymbol{x})+\lambda_{c}\boldsymbol{c}^{\intercal}\boldsymbol{v}+\lambda_{0}(\boldsymbol{v}^{\intercal}\boldsymbol{v}-\boldsymbol{x}^{\intercal}\boldsymbol{x})\\ \text{subject to}&\boldsymbol{v}\succeq\boldsymbol{x},\\ &\boldsymbol{v}\succeq-\boldsymbol{x},\end{array}\]

where the auxiliary variable $\boldsymbol{v}$ plays the role of $|\boldsymbol{x}|$. The last term was added to regularise the quadratic coefficient matrix of the overall problem.

Let us write this problem as a standard QP. Setting

\[\boldsymbol{x}^{a}=\begin{pmatrix}\boldsymbol{x}\\ \boldsymbol{v}\end{pmatrix},\]

the objective function can be written (up to a constant) as

\[\frac{1}{2}(\boldsymbol{x}^{a})^{\intercal}\begin{pmatrix}2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}-\lambda_{0}\boldsymbol{I}_{n\times n}&\boldsymbol{0}_{n\times n}\\ \boldsymbol{0}_{n\times n}&2\lambda_{0}\boldsymbol{I}_{n\times n}\end{pmatrix}\boldsymbol{x}^{a}+\begin{pmatrix}2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{r}\\ \lambda_{c}\boldsymbol{c}\end{pmatrix}^{\intercal}\boldsymbol{x}^{a}.\]

We can also rewrite the constraints in block matrix form as

\[\begin{pmatrix}\boldsymbol{I}_{n\times n}&-\boldsymbol{I}_{n\times n}\\ -\boldsymbol{I}_{n\times n}&-\boldsymbol{I}_{n\times n}\end{pmatrix}\boldsymbol{x}^{a}\preceq\begin{pmatrix}\boldsymbol{0}_{n}\\ \boldsymbol{0}_{n}\end{pmatrix}.\]

10.1 Using CVXOPT

Thus we can call

cvxopt.solvers.qp(P, q[, G, h[, A, b[, solver[, initvals]]]])

with

\[\text{P}=\begin{pmatrix}2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}-\lambda_{0}\boldsymbol{I}_{n\times n}&\boldsymbol{0}_{n\times n}\\ \boldsymbol{0}_{n\times n}&2\lambda_{0}\boldsymbol{I}_{n\times n}\end{pmatrix},\qquad\text{q}=\begin{pmatrix}2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{r}\\ \lambda_{c}\boldsymbol{c}\end{pmatrix},\]
\[\text{G}=\begin{pmatrix}\boldsymbol{I}_{n\times n}&-\boldsymbol{I}_{n\times n}\\ -\boldsymbol{I}_{n\times n}&-\boldsymbol{I}_{n\times n}\end{pmatrix},\qquad\text{h}=\begin{pmatrix}\boldsymbol{0}_{n}\\ \boldsymbol{0}_{n}\end{pmatrix}\]

to find the optimal $\boldsymbol{x}^{a}$.

10.2 When is $\boldsymbol{P}$ positive definite?

In this section, and in this section only, $\boldsymbol{P}$ will denote the parameter of cvxopt.solvers.qp and not the vector of prices.

By Lemma A.3, the eigenvalues of $\boldsymbol{P}$ are precisely those of $2\lambda_{0}\boldsymbol{I}_{n\times n}$ and $2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}-\lambda_{0}\boldsymbol{I}_{n\times n}$, combined. Clearly the eigenvalues of $2\lambda_{0}\boldsymbol{I}_{n\times n}$ are just $2\lambda_{0}$ repeated $n$ times. To find the eigenvalues of $2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}-\lambda_{0}\boldsymbol{I}_{n\times n}$, we solve the characteristic equation

\[\det((2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}-\lambda_{0}\boldsymbol{I}_{n\times n})-\lambda\boldsymbol{I}_{n\times n})=0\]

for $\lambda$. On observing that

\[\det((2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}-\lambda_{0}\boldsymbol{I}_{n\times n})-\lambda\boldsymbol{I}_{n\times n})=2^{n}\det\left(\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}-\left(\frac{1}{2}\lambda_{0}+\frac{1}{2}\lambda\right)\boldsymbol{I}_{n\times n}\right),\]

we notice that this determinant is zero precisely when

\[\frac{1}{2}\lambda_{0}+\frac{1}{2}\lambda=\lambda^{\prime}\]

for an eigenvalue, $\lambda^{\prime}$, of $\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}$. Thus, to summarise, the eigenvalues of $\boldsymbol{P}$ are:

  • $2\lambda_{0}$ repeated $n$ times;

  • for $i=1,\ldots,n$, $2\lambda^{\prime}_{i}-\lambda_{0}$, where $\lambda^{\prime}_{i}$ is the $i$th eigenvalue of $\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}$.

It is now clear that $\boldsymbol{P}$ is positive definite iff $0<\lambda_{0}<2\lambda^{\prime}_{\text{min}}$, where $\lambda^{\prime}_{\text{min}}$ is the least eigenvalue of $\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}$ (which is positive because $\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}$ is positive definite by assumption).
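This condition is easy to verify numerically; a sketch with hypothetical $\boldsymbol{H}$ and $\boldsymbol{C}$, testing one value of $\lambda_{0}$ inside the interval $(0,2\lambda^{\prime}_{\text{min}})$ and one outside:

```python
import numpy as np

# Hypothetical H (m = 3, n = 2) and factor covariance C.
H = np.array([[1.0, 0.2], [0.3, 1.5], [-0.5, 0.4]])
C = np.array([[1.0, 0.2, 0.0], [0.2, 1.5, 0.3], [0.0, 0.3, 2.0]])
HCH = H.T @ C @ H
lam_min = np.linalg.eigvalsh(HCH).min()  # least eigenvalue of H'CH

n = HCH.shape[0]
def P_matrix(lam_0):
    # Block-diagonal P as in Section 10.1.
    Z = np.zeros((n, n))
    return np.block([[2 * HCH - lam_0 * np.eye(n), Z],
                     [Z, 2 * lam_0 * np.eye(n)]])

inside = np.linalg.eigvalsh(P_matrix(0.5 * lam_min)).min() > 0   # lam_0 < 2 lam_min
outside = np.linalg.eigvalsh(P_matrix(3.0 * lam_min)).min() > 0  # lam_0 > 2 lam_min
# inside is True; outside is False
```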

11 Minimising risk and asymmetric costs

In Section 10, we assumed that the costs would be the same irrespective of the signs of the components of $\boldsymbol{x}$. In this section we shall develop an approach that allows us to provide separate costs for buying, $\boldsymbol{c}^{+}\in\mathbb{R}^{n}$, and for selling, $\boldsymbol{c}^{-}\in\mathbb{R}^{n}$.

To this end we also define $\boldsymbol{x}^{a}\in\mathbb{R}^{2n}$ as the block vector

\[\boldsymbol{x}^{a}=\begin{pmatrix}\boldsymbol{x}^{+}\\ \boldsymbol{x}^{-}\end{pmatrix},\]

with $\boldsymbol{x}^{+},\boldsymbol{x}^{-}\in\mathbb{R}^{n}$ with all their components nonnegative: $\boldsymbol{x}^{+}$ specifies the notional amounts to be bought, $\boldsymbol{x}^{-}$ the notional amounts to be sold, for each product.

The optimisation problem now becomes

\[\begin{array}{ll}\underset{\boldsymbol{x}^{+},\boldsymbol{x}^{-}}{\text{minimise}}&(\boldsymbol{r}+\boldsymbol{H}(\boldsymbol{x}^{+}-\boldsymbol{x}^{-}))^{\intercal}\boldsymbol{C}(\boldsymbol{r}+\boldsymbol{H}(\boldsymbol{x}^{+}-\boldsymbol{x}^{-}))+\lambda_{c}[(\boldsymbol{c}^{+})^{\intercal}\boldsymbol{x}^{+}+(\boldsymbol{c}^{-})^{\intercal}\boldsymbol{x}^{-}]\\ \text{subject to}&\boldsymbol{x}^{+}\succeq 0,\\ &\boldsymbol{x}^{-}\succeq 0.\end{array}\]

As before, $\lambda_{c}\geq 0$ is a nonnegative constant specifying how many units of cash we are prepared to pay for reducing the variance of the value of the portfolio by one unit (in units of value squared). We note that this is now a constrained QP.

There is a problem with this formulation: nothing guarantees that we won't be buying and selling the same product simultaneously, i.e. that, for some $i\in\{1,\ldots,n\}$, $(\boldsymbol{x}^{+})_{i}>0$ and $(\boldsymbol{x}^{-})_{i}>0$. To address this, we add an additional term, $\lambda_{0}(\boldsymbol{x}^{+})^{\intercal}\boldsymbol{x}^{-}$:

\[\begin{array}{ll}\underset{\boldsymbol{x}^{+},\boldsymbol{x}^{-}}{\text{minimise}}&(\boldsymbol{r}+\boldsymbol{H}(\boldsymbol{x}^{+}-\boldsymbol{x}^{-}))^{\intercal}\boldsymbol{C}(\boldsymbol{r}+\boldsymbol{H}(\boldsymbol{x}^{+}-\boldsymbol{x}^{-}))+\lambda_{c}[(\boldsymbol{c}^{+})^{\intercal}\boldsymbol{x}^{+}+(\boldsymbol{c}^{-})^{\intercal}\boldsymbol{x}^{-}]+\lambda_{0}(\boldsymbol{x}^{+})^{\intercal}\boldsymbol{x}^{-}\\ \text{subject to}&\boldsymbol{x}^{+}\succeq 0,\\ &\boldsymbol{x}^{-}\succeq 0.\end{array}\]

Let us write this problem as a standard QP. The first term of the objective function can be written as

\[(\boldsymbol{r}+\boldsymbol{H}(\boldsymbol{x}^{+}-\boldsymbol{x}^{-}))^{\intercal}\boldsymbol{C}(\boldsymbol{r}+\boldsymbol{H}(\boldsymbol{x}^{+}-\boldsymbol{x}^{-}))=(\boldsymbol{H}(\boldsymbol{x}^{+}-\boldsymbol{x}^{-}))^{\intercal}\boldsymbol{C}\boldsymbol{H}(\boldsymbol{x}^{+}-\boldsymbol{x}^{-})+2\boldsymbol{r}^{\intercal}\boldsymbol{C}\boldsymbol{H}(\boldsymbol{x}^{+}-\boldsymbol{x}^{-})+\boldsymbol{r}^{\intercal}\boldsymbol{C}\boldsymbol{r}.\]

The last term is a constant and can be dropped from the minimisation. The remaining terms can be rewritten as

\[(\boldsymbol{x}^{+}-\boldsymbol{x}^{-})^{\intercal}\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}(\boldsymbol{x}^{+}-\boldsymbol{x}^{-})+2\boldsymbol{r}^{\intercal}\boldsymbol{C}\boldsymbol{H}(\boldsymbol{x}^{+}-\boldsymbol{x}^{-})=(\boldsymbol{x}^{a})^{\intercal}\begin{pmatrix}\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}&-\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}\\ -\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}&\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}\end{pmatrix}\boldsymbol{x}^{a}+\begin{pmatrix}2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{r}\\ -2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{r}\end{pmatrix}^{\intercal}\boldsymbol{x}^{a}.\]

The second term of the objective function can be written as

\[\lambda_{c}[(\boldsymbol{c}^{+})^{\intercal}\boldsymbol{x}^{+}+(\boldsymbol{c}^{-})^{\intercal}\boldsymbol{x}^{-}]=\lambda_{c}\begin{pmatrix}\boldsymbol{c}^{+}\\ \boldsymbol{c}^{-}\end{pmatrix}^{\intercal}\boldsymbol{x}^{a}.\]

Finally, the third term of the objective function can be written as

\[(\boldsymbol{x}^{a})^{\intercal}\begin{pmatrix}\boldsymbol{0}_{n\times n}&\lambda_{0}\boldsymbol{I}_{n\times n}\\ \lambda_{0}\boldsymbol{I}_{n\times n}&\boldsymbol{0}_{n\times n}\end{pmatrix}\boldsymbol{x}^{a}.\]

Putting this together, we rewrite the objective function as

\frac{1}{2}(\boldsymbol{x}^{a})^{\intercal}\left(\begin{array}{cc}2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}&-2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}+2\lambda_{0}\boldsymbol{I}_{n\times n}\\ -2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}+2\lambda_{0}\boldsymbol{I}_{n\times n}&2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}\end{array}\right)\boldsymbol{x}^{a}+\left(\begin{array}{c}2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{r}+\lambda_{c}\boldsymbol{c}^{+}\\ -2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{r}+\lambda_{c}\boldsymbol{c}^{-}\end{array}\right)^{\intercal}\boldsymbol{x}^{a}.

We can also rewrite the constraints in block matrix form as

\left(\begin{array}{cc}-\boldsymbol{I}_{n\times n}&\boldsymbol{0}_{n\times n}\\ \boldsymbol{0}_{n\times n}&-\boldsymbol{I}_{n\times n}\end{array}\right)\boldsymbol{x}^{a}\preceq\left(\begin{array}{c}\boldsymbol{0}_{n}\\ \boldsymbol{0}_{n}\end{array}\right).

11.1 Using CVXOPT

Thus we can call

cvxopt.solvers.qp(P, q[, G, h[, A, b[, solver[, initvals]]]])

with

\texttt{P}=\left(\begin{array}{cc}2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}&-2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}+2\lambda_{0}\boldsymbol{I}_{n\times n}\\ -2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}+2\lambda_{0}\boldsymbol{I}_{n\times n}&2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}\end{array}\right),
\texttt{q}=\left(\begin{array}{c}2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{r}+\lambda_{c}\boldsymbol{c}^{+}\\ -2\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{r}+\lambda_{c}\boldsymbol{c}^{-}\end{array}\right),
\texttt{G}=\left(\begin{array}{cc}-\boldsymbol{I}_{n\times n}&\boldsymbol{0}_{n\times n}\\ \boldsymbol{0}_{n\times n}&-\boldsymbol{I}_{n\times n}\end{array}\right),
\texttt{h}=\left(\begin{array}{c}\boldsymbol{0}_{n}\\ \boldsymbol{0}_{n}\end{array}\right)

to find the optimal

\boldsymbol{x}^{a}=\left(\begin{array}{c}\boldsymbol{x}^{+}\\ \boldsymbol{x}^{-}\end{array}\right).
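The assembly of these block matrices and the solver call can be sketched in Python as follows. The problem data below are hypothetical, chosen only to make the sketch self-contained; CVXOPT expects its arguments as its own dense matrix type, and the call is guarded in case the library is not installed.

```python
import numpy as np

# Hypothetical problem data: n = 3 products, m = 3 risk factors.
rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))
C = A @ A.T + n * np.eye(n)            # positive definite factor covariance
H = rng.standard_normal((n, n))        # matrix of sensitivities
r = rng.standard_normal(n)             # current risk exposure
c_plus = np.full(n, 0.01)              # costs of buying one unit of notional
c_minus = np.full(n, 0.01)             # costs of selling one unit of notional
lam_c = 1.0                            # cost-aversion weight lambda_c

HtCH = H.T @ C @ H
lam0 = 0.5 * np.linalg.eigvalsh(HtCH).min()   # 0 < lambda_0 < 2 lambda'_min

# Block matrices of the augmented QP in x^a = (x^+, x^-).
off = -2 * HtCH + 2 * lam0 * np.eye(n)
P = np.block([[2 * HtCH, off], [off, 2 * HtCH]])
q = np.concatenate([2 * H.T @ C @ r + lam_c * c_plus,
                    -2 * H.T @ C @ r + lam_c * c_minus])
G = -np.eye(2 * n)                     # encodes x^+ >= 0, x^- >= 0
h = np.zeros(2 * n)

try:
    from cvxopt import matrix, solvers
    solvers.options["show_progress"] = False
    sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h))
    xa = np.array(sol["x"]).ravel()
    x = xa[:n] - xa[n:]                # net hedge notionals x = x^+ - x^-
except ImportError:
    x = None                           # CVXOPT not available
```

Since $0<\lambda_{0}<2\lambda^{\prime}_{\text{min}}$ by construction, the matrix passed as P is positive definite (see the next subsection), so the QP is strictly convex.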

11.2 When is $\boldsymbol{P}$ positive definite?

In this section, and in this section only, $\boldsymbol{P}$ will denote the parameter of cvxopt.solvers.qp and not the vector of prices.

By Lemma A.4, the eigenvalues of $\boldsymbol{P}$ are precisely those of $2\lambda_{0}\boldsymbol{I}_{n\times n}$ and $4\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}-2\lambda_{0}\boldsymbol{I}_{n\times n}$, combined. Clearly the eigenvalues of $2\lambda_{0}\boldsymbol{I}_{n\times n}$ are just $2\lambda_{0}$ repeated $n$ times. To find the eigenvalues of $4\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}-2\lambda_{0}\boldsymbol{I}_{n\times n}$, we solve the characteristic equation

\det((4\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}-2\lambda_{0}\boldsymbol{I}_{n\times n})-\lambda\boldsymbol{I}_{n\times n})=0

for $\lambda$. On observing that

\det((4\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}-2\lambda_{0}\boldsymbol{I}_{n\times n})-\lambda\boldsymbol{I}_{n\times n})=4^{n}\det\left(\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}-\left(\frac{1}{2}\lambda_{0}+\frac{1}{4}\lambda\right)\boldsymbol{I}_{n\times n}\right),

we notice that this determinant is zero precisely when

\frac{1}{2}\lambda_{0}+\frac{1}{4}\lambda=\lambda^{\prime}

for an eigenvalue, $\lambda^{\prime}$, of $\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}$, i.e. when $\lambda=4\lambda^{\prime}-2\lambda_{0}$. Thus, to summarise, the eigenvalues of $\boldsymbol{P}$ are:

  • $2\lambda_{0}$ repeated $n$ times;

  • for $i=1,\ldots,n$, $4\lambda^{\prime}_{i}-2\lambda_{0}$, where $\lambda^{\prime}_{i}$ is the $i$th eigenvalue of $\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}$.

It is now clear that $\boldsymbol{P}$ is positive definite iff $0<\lambda_{0}<2\lambda^{\prime}_{\text{min}}$, where $\lambda^{\prime}_{\text{min}}$ is the least eigenvalue of $\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}$ (which is positive because $\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}$ is positive definite by assumption).
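This characterisation is easy to check numerically. The sketch below builds the block matrix from a random positive definite stand-in for $\boldsymbol{H}^{\intercal}\boldsymbol{C}\boldsymbol{H}$ (all data are hypothetical) and compares its spectrum with the two families of eigenvalues listed above:

```python
import numpy as np

# Numerical check of the eigenvalue characterisation of P.
rng = np.random.default_rng(1)
n = 4
B = rng.standard_normal((n, n))
HtCH = B @ B.T + np.eye(n)            # stands in for H^T C H, positive definite
lam_prime = np.linalg.eigvalsh(HtCH)  # eigenvalues lambda'_1 <= ... <= lambda'_n
lam0 = 0.5 * lam_prime.min()          # inside (0, 2 lambda'_min)

off = -2 * HtCH + 2 * lam0 * np.eye(n)
P = np.block([[2 * HtCH, off], [off, 2 * HtCH]])

eig_P = np.sort(np.linalg.eigvalsh(P))
eig_expected = np.sort(np.concatenate([np.full(n, 2 * lam0),
                                       4 * lam_prime - 2 * lam0]))
```

With $\lambda_{0}$ inside the admissible interval, all entries of eig_P come out positive, confirming positive definiteness.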

12 Practical considerations

The universe of products that we trade may be a proper superset of the universe of products that we use to hedge. This is easily implemented within our framework: compute the risk for the overall portfolio, then restrict the set of products under consideration to the hedging universe. Then the dimension nn is the number of products that are used for hedging and the derivations remain valid for this restricted set of products.
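In code this restriction amounts to keeping the full-portfolio risk exposure but selecting only the columns of the sensitivity matrix that correspond to the hedging universe; a minimal numpy sketch with hypothetical dimensions and indices:

```python
import numpy as np

# m risk factors; n_traded products overall, of which only some are hedgeable.
rng = np.random.default_rng(4)
m, n_traded = 3, 5
H_full = rng.standard_normal((m, n_traded))  # sensitivities, all traded products
r = rng.standard_normal(m)                   # risk of the overall portfolio (unchanged)

hedge_idx = [0, 2, 4]                        # hypothetical hedging universe
H = H_full[:, hedge_idx]                     # restricted sensitivity matrix
n = H.shape[1]                               # the dimension n used in the derivations
```

The risk vector r is still computed from the whole portfolio; only the optimisation variables are restricted.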

13 Special case: $m=n$, $\boldsymbol{C}$ and $\boldsymbol{H}$ diagonal

Let us now consider the special case when $m=n$ and both $\boldsymbol{C}$ and $\boldsymbol{H}$ are diagonal matrices:

\boldsymbol{C}=\left(\begin{array}{ccccc}C_{1}&0&0&\cdots&0\\ 0&C_{2}&0&\cdots&0\\ \vdots&\ddots&\ddots&\ddots&\vdots\\ 0&0&\ddots&\ddots&0\\ 0&0&\cdots&0&C_{n}\end{array}\right),\quad\boldsymbol{H}=\left(\begin{array}{ccccc}H_{1}&0&0&\cdots&0\\ 0&H_{2}&0&\cdots&0\\ \vdots&\ddots&\ddots&\ddots&\vdots\\ 0&0&\ddots&\ddots&0\\ 0&0&\cdots&0&H_{n}\end{array}\right).

Additionally, we require that $\boldsymbol{C}$ be positive definite (not just positive semidefinite), so $C_{i}>0$ for $i=1,\ldots,n$. We shall also require $H_{i}>0$ for $i=1,\ldots,n$.

In this setting, the unconstrained QP (3) reduces to a system of uncoupled scalar optimisation problems (the positive factor $C_{i}$ does not affect the minimiser and has been dropped),

\underset{x_{i}}{\text{minimise}}\,\,H_{i}^{2}x_{i}^{2}+2r_{i}H_{i}x_{i}+r_{i}^{2},\quad i=1,\ldots,n.

Using elementary calculus (equating to zero the derivative of $H_{i}^{2}x_{i}^{2}+2r_{i}H_{i}x_{i}+r_{i}^{2}$ with respect to $x_{i}$ and solving for $x_{i}$), we find the optimal $x_{i}$: $x_{i}^{*}=-r_{i}/H_{i}$.

Note that in this case, when the correlations are absent, the sign of $x_{i}^{*}$ (i.e. whether we buy or sell the $i$th product) depends entirely on the signs of $r_{i}$ and $H_{i}$: $x_{i}^{*}$ is positive (i.e. we have to buy $|x_{i}^{*}|=x_{i}^{*}$ units of notional of the $i$th product) when $r_{i}$ and $H_{i}$ are of different signs; $x_{i}^{*}$ is negative (i.e. we have to sell $|x_{i}^{*}|=-x_{i}^{*}$ units of notional of the $i$th product) when $r_{i}$ and $H_{i}$ are of the same sign. (Of course, we are assuming that $r_{i}\neq 0$, as otherwise there is no risk to hedge.)

For this reason there is no need to augment the costs of buying and selling as we did in Section 11. We can simply set

c_{i}:=\left\{\begin{array}{ll}\text{cost of buying 1 unit notional of $i$th product},&r_{i},\ H_{i}\ \text{of different signs;}\\ \text{cost of selling 1 unit notional of $i$th product},&r_{i},\ H_{i}\ \text{of the same sign.}\end{array}\right.

The objective function then becomes $(r_{i}+H_{i}x_{i})^{2}C_{i}+\lambda_{c}c_{i}x_{i}$ and the optimisation problem

\underset{x_{i}}{\text{minimise}}\,\,C_{i}H_{i}^{2}x_{i}^{2}+(2C_{i}r_{i}H_{i}+\lambda_{c}c_{i})x_{i}+C_{i}r_{i}^{2},\quad i=1,\ldots,n,

where $\lambda_{c}$ is as in Section 10.

Again using elementary calculus, we find

x_{i}^{*}=-\frac{2C_{i}r_{i}H_{i}+\lambda_{c}c_{i}}{2C_{i}H_{i}^{2}}.
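A quick numerical sanity check of this closed form (with hypothetical scalar inputs), comparing it against a brute-force grid search over the objective of the preceding display:

```python
import numpy as np

# Hypothetical scalar inputs for one product in the diagonal case.
C_i, H_i, r_i = 2.0, 0.5, -3.0    # variance weight, sensitivity, risk exposure
lam_c, c_i = 0.1, 0.02            # cost-aversion weight and transaction cost

# Closed-form minimiser derived above.
x_star = -(2 * C_i * r_i * H_i + lam_c * c_i) / (2 * C_i * H_i ** 2)

def objective(x):
    # Objective of the scalar optimisation problem.
    return C_i * H_i ** 2 * x ** 2 + (2 * C_i * r_i * H_i + lam_c * c_i) * x + C_i * r_i ** 2

grid = np.linspace(x_star - 5.0, x_star + 5.0, 100_001)
x_grid = grid[np.argmin(objective(grid))]   # brute-force minimiser
```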

14 Case study: CDS indices, no cross-hedging

Let us now consider the case of CDS indices. Recall that credit DV01 (CDV01) or CS01 (Credit Spread 01) is defined as the change in price of the CDS contract (of a given notional) for a one basis point increase in spread.

For the European CDS indices (the iTraxx family), we shall take $1{,}000{,}000$ EUR as our unit of notional. For the North American CDS indices (the CDX family), we shall take $1{,}000{,}000$ USD as our unit of notional. So $N_{i}$ and $x_{i}$ will have these units.

The unit of price ($P_{i}$) will be EUR for iTraxx and USD for CDX. Our invariants ($X_{i}$) will be credit spreads, whose units are basis points. We shall set $h_{i}$ in (1) to be the identity map, i.e. our factors will be the same as our invariants, $F_{i}\equiv X_{i}$. Since our risk exposure is given by $r_{i}:=\frac{dV^{\pi}}{dF_{i}}$, its units will be EUR per basis point for iTraxx and USD per basis point for CDX. Since the Jacobian is given by $H_{i}:=\frac{dr_{i}}{dN_{i}}$, its units will be $1{,}000{,}000^{-1}\,(\text{basis point})^{-1}$: $H_{i}$ is the change in the risk exposure $r_{i}$ (in EUR per basis point for iTraxx, USD per basis point for CDX) per $1{,}000{,}000$ EUR of iTraxx ($1{,}000{,}000$ USD of CDX) notional.

15 Case study: European government bonds

Suppose that we have a portfolio of $n$ European government bonds. For $i=1:n$, the price of the $i$th bond at time $t$ is given by

(\boldsymbol{P}_{t})_{i}=\sum_{j=1}^{l_{i}}c_{i,j}e^{-(y_{t}(\tau_{t,i,j})+\lambda_{t,i})\tau_{t,i,j}}, (4)

where

  • $(\boldsymbol{P}_{t})_{i}$ is the dirty market price of the $i$th bond;

  • $l_{i}$ is the number of cashflows for the $i$th bond;

  • $c_{i,j}$ is the $j$th cashflow for the $i$th bond;

  • $\tau_{t,i,j}$ is the duration of the time interval between the time $t$ and the time of the $j$th cashflow of the $i$th bond;

  • $y_{t}:[0,\infty)\to\mathbb{R}$ is the zero curve at time $t$, which is a function that maps the maturities of the cashflows (the durations of the time intervals between the time $t$ and the times of the cashflows) to the continuously compounded interest rates;

  • $\lambda_{t,i}$ is a parallel vertical shift to the zero curve $y_{t}$ which is required to equate the right-hand side to the dirty market price of the $i$th bond. We shall refer to $\lambda_{t,i}$ as the idiosyncratic spread of the $i$th bond at time $t$.
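Equation (4) can be implemented directly. The sketch below prices a hypothetical three-year annual coupon bond off a flat zero curve; the curve, cashflows, and spread are illustrative only:

```python
import numpy as np

def dirty_price(cashflows, taus, zero_curve, spread):
    """Dirty price per equation (4): cashflows discounted at the zero rate
    plus the bond's idiosyncratic spread, continuously compounded."""
    cashflows = np.asarray(cashflows, dtype=float)
    taus = np.asarray(taus, dtype=float)
    rates = np.array([zero_curve(tau) for tau in taus])
    return float(np.sum(cashflows * np.exp(-(rates + spread) * taus)))

# A hypothetical 3-year bond with 4% annual coupons on 100 notional.
cfs = [4.0, 4.0, 104.0]
taus = [1.0, 2.0, 3.0]
flat_curve = lambda tau: 0.03     # flat 3% continuously compounded zero curve
price = dirty_price(cfs, taus, flat_curve, spread=0.001)
```

As expected, a positive idiosyncratic spread lowers the price relative to discounting off the zero curve alone.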

We model the yield curve as

y_{t}(\tau_{\text{cf}})=\sum_{k=1}^{d}\beta_{t,k}f_{k}(\tau_{\text{cf}}),

where $d\in\mathbb{N}^{*}$ and, for $k=1:d$,

f_{k}:[0,\infty)\to\mathbb{R}

are some suitably defined basis functions.

As we are interested in intraday market making, our investment horizon $\tau$ is relatively short, so we can assume that the bond prices are our invariants,

\boldsymbol{X}_{t,\tau}=\boldsymbol{P}_{t},

so $\boldsymbol{g}_{\tau}$ is the identity map, i.e., for all $\boldsymbol{x}\in\mathbb{R}^{n}$, $\boldsymbol{g}_{\tau}(\boldsymbol{x})=\boldsymbol{x}$, so that

\frac{\partial\boldsymbol{g}}{\partial\boldsymbol{X}_{t,\tau}}(\boldsymbol{X}_{t,\tau})=\boldsymbol{I}_{n}.

The risk factors are given by

\boldsymbol{F}_{t,\tau}=\left(\begin{array}{c}\hat{\beta}_{t+T,1}\\ \vdots\\ \hat{\beta}_{t+T,d}\\ \hat{\lambda}_{t+T,1}\\ \vdots\\ \hat{\lambda}_{t+T,n}\end{array}\right),

where

\hat{\beta}_{t+T,1},\ldots,\hat{\beta}_{t+T,d},\hat{\lambda}_{t+T,1},\ldots,\hat{\lambda}_{t+T,n}

are, respectively, our forecasts for

\beta_{t+T,1},\ldots,\beta_{t+T,d},\lambda_{t+T,1},\ldots,\lambda_{t+T,n}.

Our risk factors explain all of the risk, so

\boldsymbol{X}_{t,\tau}=\boldsymbol{h}(\boldsymbol{F}_{t,\tau}),

where $\boldsymbol{h}$ is given by equation (4).

The matrix of sensitivities is given by

\boldsymbol{H}_{t,\tau}=\frac{\partial\boldsymbol{g}}{\partial\boldsymbol{X}_{t,\tau}}(\boldsymbol{X}_{t,\tau})\,\frac{\partial\boldsymbol{h}}{\partial\boldsymbol{F}_{t,\tau}}(\boldsymbol{F}_{t,\tau})=\frac{\partial\boldsymbol{h}}{\partial\boldsymbol{F}_{t,\tau}}(\boldsymbol{F}_{t,\tau})

and the risk exposure can be obtained from (2).

Thus we need to find

\left(\frac{\partial\boldsymbol{h}}{\partial\boldsymbol{F}_{t,\tau}}(\boldsymbol{F}_{t,\tau})\right)_{i,j}

for $i=1:n$, $j=1:d+n$. Applying the chain rule, for $j=1:d$,

\left(\frac{\partial\boldsymbol{h}}{\partial\boldsymbol{F}_{t,\tau}}(\boldsymbol{F}_{t,\tau})\right)_{i,j}=\frac{\partial(\boldsymbol{P}_{t})_{i}}{\partial\beta_{t,j}}(\boldsymbol{F}_{t,\tau})=-\sum_{k=1}^{l_{i}}\tau_{t,i,k}c_{i,k}e^{-(y_{t}(\tau_{t,i,k})+\lambda_{t,i})\tau_{t,i,k}}f_{j}(\tau_{t,i,k}),

and for $j=d+1:d+n$, setting $j^{\prime}=j-d$,

\left(\frac{\partial\boldsymbol{h}}{\partial\boldsymbol{F}_{t,\tau}}(\boldsymbol{F}_{t,\tau})\right)_{i,j}=\frac{\partial(\boldsymbol{P}_{t})_{i}}{\partial\lambda_{t,j^{\prime}}}(\boldsymbol{F}_{t,\tau})=\left\{\begin{array}{ll}-\sum_{k=1}^{l_{i}}\tau_{t,i,k}c_{i,k}e^{-(y_{t}(\tau_{t,i,k})+\lambda_{t,i})\tau_{t,i,k}},&i=j^{\prime};\\ 0,&\text{otherwise.}\end{array}\right.

References

  • [1] FICO Xpress Optimizer Reference Manual, 2023.
  • [2] The Numerical Algorithms Group NAG Library Manual, Mark 29.2, 2023. https://support.nag.com/numeric/nl/nagdoc_latest/.
  • [3] Irene Aldridge. High-Frequency Trading: A Practical Guide to Algorithmic Strategies and Trading Systems. Wiley, 2 edition, 2013.
  • [4] Martin Andersen, Joachim Dahl, and Lieven Vandenberghe. CVXOPT: Convex optimization. Astrophysics Source Code Library, 2020.
  • [5] MOSEK ApS. The MOSEK optimization toolbox for MATLAB manual. Version 9.0., 2019.
  • [6] Karl J. Astrom. Introduction to Stochastic Control Theory. Dover, 2006.
  • [7] Dimitri P. Bertsekas. Dynamic programming and optimal control, Volume I. Athena Scientific, Belmont, MA, 2001.
  • [8] Dimitri P. Bertsekas. Dynamic programming and optimal control, Volume II. Athena Scientific, Belmont, MA, 2005.
  • [9] Michael J. Best. Portfolio Optimization. CRC Press, 2010.
  • [10] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
  • [11] Kazem Cheshmi, Danny M. Kaufman, Shoaib Kamil, and Maryam Mehri Dehnavi. NASOQ. ACM Transactions on Graphics, 39(4), aug 2020.
  • [12] Gerard Cornuejols, Javier Pena, and Reha Tutuncu. Optimization Methods in Finance. Cambridge University Press, 2 edition, 2018.
  • [13] Emanuel Derman. My Life as a Quant: Reflections on Physics and Finance. Wiley, 2007.
  • [14] Philip E. Gill, Walter Murray, and Margaret H. Wright. Practical Optimization. Emerald Group Publishing Limited, 1982.
  • [15] Gurobi Optimization, LLC. Gurobi Optimizer Reference Manual, 2023.
  • [16] IBM. V12.1: User's manual for CPLEX. Technical report, IBM, 2009.
  • [17] IMSL. IMSL STAT/LIBRARY. Visual Numerics Inc., Houston, Texas, USA, 1997. http://www.vni.com/books/dod/pdf/STATVol_2.pdf.
  • [18] Mark Joshi. On becoming a quant. http://www.maths.usyd.edu.au/u/UG/SM/MATH3075/r/Joshi_2008.pdf, 2008.
  • [19] Mark S. Joshi and Jane M. Paterson. Introduction to Mathematical Portfolio Theory. International Series on Actuarial Science. Cambridge University Press, 2013.
  • [20] Harold J. Kushner and Paul Dupuis. Numerical Methods for Stochastic Control Problems in Continuous Time. Springer, 2 edition, 2000.
  • [21] John W. Lockwood, Adwait Gupte, Nishit Mehta, Michaela Blott, Tom English, and Kees Vissers. A low-latency library in FPGA hardware for high-frequency trading (HFT). In IEEE 20th Annual Symposium on High-Performance Interconnects, pages 9–16, 2012.
  • [22] Attilio Meucci. Risk and Asset Allocation. Springer Finance. Springer, 2005.
  • [23] Jan Novotny, Paul Alexander Bilokon, Aris Galiotos, and Frédéric Délèze. Machine Learning and Big Data with kdb+/q. Wiley, 2019.
  • [24] Bernt Øksendal. Stochastic Differential Equations: An Introduction with Applications. Universitext. Springer, 6 edition, 2000.
  • [25] Bernt Øksendal and Agnes Sulem. Applied Stochastic Control of Jump Diffusions. Springer, 3 edition, 2019.
  • [26] Huyen Pham. Continuous-time Stochastic Control and Optimization with Financial Applications. Springer, 2009.
  • [27] Warren B. Powell. Reinforcement Learning and Stochastic Optimization: A Unified Framework for Sequential Decisions. Wiley, 2022.
  • [28] B. Stellato, G. Banjac, P. Goulart, A. Bemporad, and S. Boyd. OSQP: an operator splitting solver for quadratic programs. Mathematical Programming Computation, 12(4):637–672, 2020.
  • [29] Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, 2 edition, 2018.
  • [30] Jiongmin Yong and Xun Yu Zhou. Stochastic Controls: Hamiltonian Systems and HJB Equations. Springer, 1999.

Appendix A Auxiliary results

Theorem A.1 (The variance of a function of a random variable).

Let $X$ be a real-valued random variable with known finite expected value and finite nonzero variance and let $f:\mathbb{R}\to\mathbb{R}$ be a sufficiently smooth function (the proof below requires three derivatives). Then

\mathsf{Var}\left[f(X)\right]\approx f^{\prime}(\mathbb{E}\left[X\right])^{2}\,\mathsf{Var}\left[X\right].
Proof.

The following proof is due to Tomek Tarczynski (see his post "Variance of a function of one random variable" on CrossValidated: http://stats.stackexchange.com/questions/5782/variance-of-a-function-of-one-random-variable).

By Chebyshev's inequality for random variables with finite variance, for any real $c>0$,

\mathbb{P}\left[|X-\mathbb{E}\left[X\right]|>c\right]\leq\frac{1}{c^{2}}\mathsf{Var}\left[X\right],

so for any $\epsilon>0$ we can find a large enough $c$ so that

\mathbb{P}\left[X\in[\mathbb{E}\left[X\right]-c,\mathbb{E}\left[X\right]+c]\right]=\mathbb{P}\left[|X-\mathbb{E}\left[X\right]|\leq c\right]>1-\epsilon.

Let us estimate $\mathbb{E}\left[f(X)\right]$. We can write it as

\mathbb{E}\left[f(X)\right]=\int_{|x-\mathbb{E}\left[X\right]|\leq c}f(x)\,dF(x)+\int_{|x-\mathbb{E}\left[X\right]|>c}f(x)\,dF(x), (5)

where $F$ is the distribution function of $X$.

Since the domain of the first integral is the bounded closed interval $[\mathbb{E}\left[X\right]-c,\mathbb{E}\left[X\right]+c]$, we can apply the Taylor series expansion:

f(x)=f(\mathbb{E}\left[X\right])+f^{\prime}(\mathbb{E}\left[X\right])(x-\mathbb{E}\left[X\right])+\frac{1}{2}f^{\prime\prime}(\mathbb{E}\left[X\right])(x-\mathbb{E}\left[X\right])^{2}+\frac{1}{3!}f^{\prime\prime\prime}(\alpha)(x-\mathbb{E}\left[X\right])^{3},

where $\alpha\in[\mathbb{E}\left[X\right]-c,\mathbb{E}\left[X\right]+c]$, and the equality holds for all $x\in[\mathbb{E}\left[X\right]-c,\mathbb{E}\left[X\right]+c]$. Here we took only four terms in the Taylor series expansion, but in general we can take as many as needed, as long as the function $f$ is smooth enough.

Substituting this formula into (5), we get

\displaystyle\mathbb{E}\left[f(X)\right]=\int_{|x-\mathbb{E}\left[X\right]|\leq c}\left(f(\mathbb{E}\left[X\right])+f^{\prime}(\mathbb{E}\left[X\right])(x-\mathbb{E}\left[X\right])+\frac{1}{2}f^{\prime\prime}(\mathbb{E}\left[X\right])(x-\mathbb{E}\left[X\right])^{2}\right)dF(x)
\displaystyle\quad+\int_{|x-\mathbb{E}\left[X\right]|\leq c}\frac{1}{3!}f^{\prime\prime\prime}(\alpha)(x-\mathbb{E}\left[X\right])^{3}\,dF(x)
\displaystyle\quad+\int_{|x-\mathbb{E}\left[X\right]|>c}f(x)\,dF(x).

Extending the domain of integration of the first integral to the whole real line (and compensating for this in the remainder), we obtain

\mathbb{E}\left[f(X)\right]=f(\mathbb{E}\left[X\right])+\frac{1}{2}f^{\prime\prime}(\mathbb{E}\left[X\right])\mathbb{E}\left[(X-\mathbb{E}\left[X\right])^{2}\right]+R_{3}, (6)

where

\displaystyle R_{3}=\int_{|x-\mathbb{E}\left[X\right]|\leq c}\frac{1}{3!}f^{\prime\prime\prime}(\alpha)(x-\mathbb{E}\left[X\right])^{3}\,dF(x)
\displaystyle\quad+\int_{|x-\mathbb{E}\left[X\right]|>c}\left(f(x)-f(\mathbb{E}\left[X\right])-f^{\prime}(\mathbb{E}\left[X\right])(x-\mathbb{E}\left[X\right])-\frac{1}{2}f^{\prime\prime}(\mathbb{E}\left[X\right])(x-\mathbb{E}\left[X\right])^{2}\right)dF(x).

Under some moment conditions, we can show that the second term of this remainder is of the same order as $\mathbb{P}[|X-\mathbb{E}\left[X\right]|>c]$, which is generally small. The first term remains, so the quality of the approximation depends on $\mathbb{E}\left[(X-\mathbb{E}\left[X\right])^{3}\right]$ and the behaviour of the third derivative of $f$ on bounded intervals. Such an approximation works particularly well for random variables with zero third central moment, such as normally distributed random variables.

To obtain an approximation for the variance of $f(X)$, we subtract (6) from the Taylor series expansion of $f(X)$, square the difference, and take expectations:

\mathsf{Var}\left[f(X)\right]=\mathbb{E}\left[(f(X)-\mathbb{E}\left[f(X)\right])^{2}\right]=(f^{\prime}(\mathbb{E}\left[X\right]))^{2}\mathsf{Var}\left[X\right]+T_{3},

where $T_{3}$ involves the central moments $\mathbb{E}\left[(X-\mathbb{E}\left[X\right])^{k}\right]$ for $k\geq 3$. ∎
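The quality of this approximation is easy to probe by Monte Carlo. The sketch below takes $f=\exp$ and $X$ normal, so that the third central moment vanishes (the favourable case noted in the proof); the sample size and parameters are illustrative:

```python
import numpy as np

# Monte Carlo check of Var[f(X)] ~ f'(E[X])^2 Var[X] for f = exp, X normal.
rng = np.random.default_rng(2)
mu, sigma = 1.0, 0.05
X = rng.normal(mu, sigma, size=1_000_000)

var_mc = np.var(np.exp(X))                     # Monte Carlo variance of f(X)
var_delta = np.exp(mu) ** 2 * sigma ** 2       # f'(E[X])^2 Var[X]
rel_err = abs(var_mc - var_delta) / var_delta  # small for small sigma
```

For small $\sigma$ the relative error is dominated by the higher-order moment terms and is of order $\sigma^{2}$.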

Remark A.2.

Theorem A.1 generalises to an $\mathbb{R}^{n}$-valued random variable $\boldsymbol{X}$, $n\in\mathbb{N}^{*}$, and $\boldsymbol{f}:\mathbb{R}^{n}\to\mathbb{R}^{n}$:

\mathsf{Var}\left[\boldsymbol{f}(\boldsymbol{X})\right]\approx\left(\frac{\partial\boldsymbol{f}}{\partial\boldsymbol{X}}(\mathbb{E}\left[\boldsymbol{X}\right])\right)^{\intercal}\mathsf{Var}\left[\boldsymbol{X}\right]\,\frac{\partial\boldsymbol{f}}{\partial\boldsymbol{X}}(\mathbb{E}\left[\boldsymbol{X}\right]).
Lemma A.3.

Let $\boldsymbol{M}$ be the block matrix

\boldsymbol{M}=\left(\begin{array}{cc}\boldsymbol{A}&\boldsymbol{0}_{n\times m}\\ \boldsymbol{0}_{m\times n}&\boldsymbol{B}\end{array}\right)

with $\boldsymbol{A}\in\mathbb{R}^{n\times n}$, $\boldsymbol{B}\in\mathbb{R}^{m\times m}$. The eigenvalues of $\boldsymbol{M}$ are precisely those of the matrices $\boldsymbol{A}$ ($n$ eigenvalues, some of them possibly repeated) and $\boldsymbol{B}$ ($m$ eigenvalues, some of them possibly repeated).

Proof.

It is well known that for any square matrices $\boldsymbol{A}\in\mathbb{R}^{n\times n}$ and $\boldsymbol{B}\in\mathbb{R}^{m\times m}$, the following holds:

\det\left(\begin{array}{cc}\boldsymbol{A}&\boldsymbol{0}_{n\times m}\\ \boldsymbol{0}_{m\times n}&\boldsymbol{B}\end{array}\right)=\det(\boldsymbol{A})\det(\boldsymbol{B}).

Therefore, the characteristic polynomial of this block matrix is given by

\det\left(\begin{array}{cc}\boldsymbol{A}-\lambda\boldsymbol{I}_{n\times n}&\boldsymbol{0}_{n\times m}\\ \boldsymbol{0}_{m\times n}&\boldsymbol{B}-\lambda\boldsymbol{I}_{m\times m}\end{array}\right)=\det(\boldsymbol{A}-\lambda\boldsymbol{I}_{n\times n})\det(\boldsymbol{B}-\lambda\boldsymbol{I}_{m\times m}).

It follows that the eigenvalues of $\boldsymbol{M}$ are precisely those of the matrices $\boldsymbol{A}$ and $\boldsymbol{B}$, combined. ∎

Lemma A.4.

Let $\boldsymbol{M}$ be the block matrix

\boldsymbol{M}=\left(\begin{array}{cc}\boldsymbol{A}&\boldsymbol{B}\\ \boldsymbol{B}&\boldsymbol{A}\end{array}\right)

with $\boldsymbol{A},\boldsymbol{B}\in\mathbb{R}^{n\times n}$, $n\in\mathbb{N}^{*}$. The eigenvalues of $\boldsymbol{M}$ are precisely those of the matrices $\boldsymbol{A}+\boldsymbol{B}$ ($n$ eigenvalues, some of them possibly repeated) and $\boldsymbol{A}-\boldsymbol{B}$ ($n$ eigenvalues, some of them possibly repeated).

Proof.

It is well known that for all square matrices $\boldsymbol{A}$ and $\boldsymbol{B}$ of equal dimensions, the following holds:

\det\left(\begin{array}{cc}\boldsymbol{A}&\boldsymbol{B}\\ \boldsymbol{B}&\boldsymbol{A}\end{array}\right)=\det(\boldsymbol{A}-\boldsymbol{B})\det(\boldsymbol{A}+\boldsymbol{B}).

Therefore, the characteristic polynomial of this block matrix is given by

\det\left(\begin{array}{cc}\boldsymbol{A}-\lambda\boldsymbol{I}_{n\times n}&\boldsymbol{B}\\ \boldsymbol{B}&\boldsymbol{A}-\lambda\boldsymbol{I}_{n\times n}\end{array}\right)=\det((\boldsymbol{A}-\boldsymbol{B})-\lambda\boldsymbol{I}_{n\times n})\det((\boldsymbol{A}+\boldsymbol{B})-\lambda\boldsymbol{I}_{n\times n}).

It follows that the eigenvalues of $\boldsymbol{M}$ are precisely those of the matrices $\boldsymbol{A}+\boldsymbol{B}$ and $\boldsymbol{A}-\boldsymbol{B}$, combined. ∎
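Both lemmas are straightforward to verify numerically; the sketch below checks them on random symmetric blocks (the data are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n)); A = A + A.T   # symmetric n x n blocks
B = rng.standard_normal((n, n)); B = B + B.T

# Lemma A.3: eigenvalues of diag(A, B) are those of A and B combined.
M3 = np.block([[A, np.zeros((n, n))], [np.zeros((n, n)), B]])
eig_M3 = np.sort(np.linalg.eigvalsh(M3))
eig_AB = np.sort(np.concatenate([np.linalg.eigvalsh(A), np.linalg.eigvalsh(B)]))

# Lemma A.4: eigenvalues of [[A, B], [B, A]] are those of A + B and A - B.
M4 = np.block([[A, B], [B, A]])
eig_M4 = np.sort(np.linalg.eigvalsh(M4))
eig_pm = np.sort(np.concatenate([np.linalg.eigvalsh(A + B),
                                 np.linalg.eigvalsh(A - B)]))
```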