Deep Hedging: Learning to Remove the Drift under Trading Frictions with Minimal Equivalent Near-Martingale Measures

H. BUEHLER†, P. MURRAY∗†‡, M. S. PAKKANEN‡ and B. WOOD†

∗Corresponding author. Email: phillip.murray@jpmorgan.com

† JP Morgan

‡ Imperial College London

Abstract

We present a machine learning approach for finding minimal equivalent martingale measures for market simulators of tradable instruments, e.g. for a spot price and options written on the same underlying. We extend our results to markets with frictions, in which case we find “near-martingale measures” under which the prices of hedging instruments are martingales within their bid/ask spread.

By removing the drift, we are then able to learn using Deep Hedging a “clean” hedge for an exotic payoff which is not polluted by the trading strategy trying to make money from statistical arbitrage opportunities. We correspondingly highlight the robustness of this hedge vs estimation error of the original market simulator. We discuss applications to two market simulators.

1 Introduction

A long-standing challenge in quantitative finance is the development of market models for the dynamics of tradable instruments such as a spot price and options thereon. The classic approach to developing such models is to find model dynamics in a suitable parameter space under which the respective risk-neutral drift could be computed somewhat efficiently, c.f. for example [12], [15], [10] for the case of equity option markets. With this approach, realistic dynamics or estimation of statistically valid parameters are an afterthought.

This article proposes to reverse this process by starting out with training a realistic model of the market under the statistical measure – and then finding an equivalent “near-martingale” measure under which the drifts of tradable instruments are constrained by their marginal costs such that there are no statistical arbitrage opportunities, that is, trading strategies which produce positive expected gains. In the absence of trading costs, this means finding an equivalent martingale measure to “remove the drift”.

Indeed, we will show that absence of statistical arbitrage under a given measure is equivalent to the conditional expectation of the returns under this measure being constrained by their marginal bid/ask prices. This result is of independent interest.

The main motivation for the present work is the application of our Deep Hedging algorithm to construct hedging strategies for contingent claims by trading in hedging instruments which include derivatives such as options. When described first in [3], we relied on markets simulated with classic quantitative finance models. In [1] we proposed a method to build market simulators of options markets under the statistical measure. Under this measure, we will usually find statistical arbitrage in the sense that an empty initial portfolio has positive value. This reflects the realities of historic data: at the time of writing the S&P 500 had moved upwards over the last ten years, giving a machine the impression that selling puts and being long the market is a winning strategy. However, naively exploiting this observation risks falling foul of the “estimation error” of the mean returns of our hedging instruments. In the context of hedging a portfolio of exotic derivatives, the presence of statistical arbitrage is undesirable as an optimal strategy will be a combination of a true hedge, and a strategy which does not depend on our portfolio, but tries to take advantage of the opportunities seen in the market. It is therefore not robust against estimation error of said drifts. Hence we propose using the method presented here to generate a “clean” hedge by removing the drift of the market to increase robustness against errors in the estimation of returns of our hedging instruments.

While our examples focus on simulating equity option markets – in this case amounting to a stochastic implied volatility model – our approach is by no means limited to the equities case. In fact, it is entirely model agnostic and can be applied to any market simulator which generates paths of tradable instruments under the same numeraire which are free of classic arbitrage.

In particular, our approach can be applied to “black box” neural network based simulators such as those using Generative Adversarial Networks (GANs) described in [1] or Variational Autoencoders as in [5]. Such simulators use machine learning methods to generate realistic paths from the statistical measure, but clearly no analytic expression to describe the market dynamics can be written. Our method allows constructing an equivalent risk neutral measure through further applications of machine learning methods.

1.1 Summary of our Approach

Given instrument returns $DH_{t}$ across discrete time steps $t\in\{0=t_{0}<\cdots<t_{m}\}$ , and convex costs $c_{t}$ associated with trading $a_{t}$ units of each instrument, we propose using our Deep Hedging algorithm introduced in [3] to find a trading strategy $a^{*}$ and cash amount $y^{*}$ which maximize the optimized certainty equivalent of a utility $u$ ,

a\longmapsto\sup_{y\in\mathbb{R}}\ {\mathbb{E}}\left[\,{u\left(y+\sum_{t}a_{t}\cdot DH_{t}-c_{t}(a_{t})\right)-y}\,\right]\ .

Let $m$ denote the “marginal cost” of trading in our market. We then define an equivalent measure $\mathbb{Q}^{*}$ by setting

\frac{d\mathbb{Q}^{*}}{d\mathbb{P}}:=u^{\prime}\left(y^{*}+\sum_{t}a^{*}_{t}\cdot DH_{t}-m_{t}(a^{*}_{t})\right)\ .

Under $\mathbb{Q}^{*}$ the market has no statistical arbitrage opportunities in the sense that there is no strategy $a$ which has positive expected returns, i.e.

\sup_{a}\ {\mathbb{E}^{*}}\left[\,{\sum_{t}a_{t}\cdot DH_{t}-c_{t}(a_{t})}\,\right]\leq 0\ .

(1)

In the absence of transaction costs, $\mathbb{Q}^{*}$ is an equivalent martingale measure. In the presence of transaction costs, we show that removing statistical arbitrage is equivalent to the measure being an equivalent near-martingale measure, in the sense that the drift of all tradable instruments must be dominated by the transaction costs. Moreover, $\mathbb{Q}^{*}$ is minimal among all equivalent (near-) martingale measures with respect to the $\tilde{u}$ -divergence from $\mathbb{P}$ where $\tilde{u}$ is the Legendre-Fenchel transform of $u$ .

The key insight of our utility-based risk-neutral density construction is that it relies only on solving the optimization problem of finding $a^{*}$ and $y^{*}$ , not on any particular dynamics for the market under the $\mathbb{P}$ measure. Therefore, it lends itself to the application of modern machine learning methods. As mentioned above, this is particularly useful in the case of removing statistical arbitrage from a “black box” market simulator, such as the GAN based approach discussed in [1]. Through the choice of utility function, we are able to control the risk neutral measure we construct.

We demonstrate the power of this approach with two examples of option market simulators for spot and a number of volatilities. Specifically, we train a Vector Autoregressive (VAR) model of the form

dY_{t}=\left(B-A_{1}Y_{t-1}-A_{2}Y_{t-2}\right)dt+\Sigma dW_{t}

for a vector of log spot returns and log volatilities $Y_{t}=(\log{S_{t}}/{S_{t-1}},\log\sigma^{1}_{t},\ldots,\log\sigma^{K}_{t})^{\prime}$ , and also a neural network based GAN simulator, and then in both cases use our approach to construct the above measure such that the resulting spots and option prices are near-martingale, and free from statistical arbitrage.

1.2 Related Work

We are not aware of attempts to numerically solve for a risk-neutral density with the approach discussed here, as an application to stochastic implied volatility or otherwise. To our knowledge, ours is the first practical approach for implementing general statistically trained market models under risk-neutral measures.

The classic approach to stochastic implied volatilities is via the route of identifying analytically a risk-neutral drift given the other parameters of the specified model. The first applicable results for a term structure of implied volatilities are due to [12]. The first viable approach to a full stochastic implied volatility model for an entire fixed strike and fixed maturity option surface was presented in [15], which also uses discrete local volatilities as parametrization. Wissel describes the required continuous time drift adjustment for a diffusion driving a grid of such discrete local volatilities as a function of the free parameters. Unnaturally, in his approach the resulting spot diffusion takes only discrete values at the strikes of the options at each maturity date and the approach is limited to a set grid of options defined in cash strikes and fixed maturities.

More recently, a number of works have shown that when representing an option surface with a Lévy kernel we can derive suitable Heath-Jarrow-Morton conditions on the parameters of the diffusion of the Lévy kernel such that the resulting stock price is arbitrage-free, c.f. [10] and the references therein. Simulation of the respective model requires solving the respective Fourier equations for the spot price and options at each step in the path.

1.3 Outline

The rest of the article is organised as follows. In Section 2 we describe the theoretical framework and introduce the key method for constructing a risk-neutral measure in the frictionless case, which is then extended to the case with market frictions in Section 3. Then, in Section 4 we describe some of the consequences of the approach from a practical perspective, and in Section 5 we provide numerical experiments demonstrating the effectiveness of the method in practice.

2 Frictionless risk-neutral case

Consider a discrete-time simulated financial market with finite time horizon where we trade over time steps $0=t_{0}<\cdots<t_{m}=T$ where $T$ is the maximum maturity of all tradable instruments. Fix a probability space $\Omega$ and a probability measure $\mathbb{P}$ under which the market is simulated, which we will refer to as the “statistical” measure. For each $t\in\{t_{0},\ldots,t_{m}\}$ , we denote by $s_{t}$ the state of the market at time $t$ , including relevant information from the past. The state represents all information available to us, including mid-prices of all tradable instruments, trading costs, restrictions and risk limits.

The sequence of states $(s_{t})_{t=0,\ldots,T}$ generates a sequence $\mathbb{F}=({\cal F}_{t})_{t=0,\ldots,T}$ of $\sigma$ -algebras forming a filtration. Being generative means that any ${\cal F}_{t}$ -measurable function $f(\cdot)$ can be written as a function of $s_{t}$ as $f\equiv f(\cdot;s_{t})$ .

To simplify notation, we stipulate that the total number of instruments at each timestep is always $n$ . Let $H_{t}^{(t)}=(H_{t}^{(t,1)},\ldots,H_{t}^{(t,n)})$ be the $\mathbb{R}^{n}$ -valued, $\mathbb{F}$ -adapted stochastic process of mid-prices of the liquid instruments available to trade at $t$ . As above, $H_{t}^{(t)}$ is a function of $s_{t}$ , and we also assume that $H$ is in $L^{1}(\mathbb{P})$ . Note that $H_{t}$ can represent a wide class of instruments, including primary assets such as single equities, indices, and liquid options.

For each instrument we observe at time $T$ a final mark-to-market mid-value $H_{T}^{(t,i)}$ which will usually be the sum of any cashflows along the path, and which is also assumed to be a function of $s_{T}$ . That means $s_{T}$ must contain sufficient information from the past along the path: for example, if the $i$ th instrument tradable at $t$ is a call option with relative strike $k_{i}$ and time-to-maturity $\tau_{i}\leq T-t$ on a spot price process $S_{t}$ , then the final value of this $i$ th instrument is the payoff on the path, $H_{T}^{(t,i)}=(S_{t+\tau_{i}}/S_{t}-k_{i})^{+}$ . Whilst for simplicity, we assume that all options mature within the time horizon, we can easily extend our method to the case where options are allowed to mature after $T$ by valuing them in $T$ at mid-prices.
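The final value of a floating-strike call described above can be sketched as follows. This is a minimal illustration that assumes times are integer step indices into a simulated spot path; the function name `floating_call_payoff` and the spot values are hypothetical.

```python
def floating_call_payoff(spot_path, t, tau, k):
    """Final value (S_{t+tau} / S_t - k)^+ of a call with relative strike k traded at step t.

    Assumption: t and tau are integer step indices into spot_path.
    """
    return max(spot_path[t + tau] / spot_path[t] - k, 0.0)


spot_path = [100.0, 104.0, 98.0, 110.0, 107.0]  # hypothetical simulated spot path

itm = floating_call_payoff(spot_path, t=1, tau=2, k=1.0)  # spot rose from 104 to 110
otm = floating_call_payoff(spot_path, t=0, tau=2, k=1.0)  # spot fell from 100 to 98
print(itm, otm)
```

Because the payoff depends on both $S_{t}$ and $S_{t+\tau_{i}}$, the terminal state must carry the relevant part of the spot history, as noted in the text.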

We further assume that discounting rates, funding, dividends, and repo rates are zero. Extension to the case where they are non-zero and deterministic is straightforward.

At each time step $t$ we may choose an action $a_{t}\in\mathbb{R}^{n}$ to trade in the hedging instruments $H_{t}^{(t)}$ based on the information available in the state $s_{t}$ , i.e. $a_{t}\equiv a(s_{t})$ . The $\mathbb{R}^{n}$ -valued, $\mathbb{F}$ -adapted stochastic process $a=(a_{0},\ldots,a_{m-1})$ defines a trading strategy over the time horizon. To ease notation, it will be useful to define $a^{\pm}_{t}:=\max(0,\pm a_{t})$ for an action $a_{t}$ , where the operation is applied elementwise. We also use $e^{i}$ to refer to the $i$ th unit vector.

We start with the frictionless case, where the actions are unconstrained, i.e. the set of admissible actions $\mathcal{A}_{t}$ is equal to $\mathbb{R}^{n}$ for all $t$ . In this case the terminal gain of implementing a trading strategy $a=(a_{0},\ldots,a_{m-1})$ with $a_{t}\in{\cal A}_{t}$ is given by

{a\star DH_{T}}:=\sum_{t=0}^{m-1}a_{t}\cdot DH_{t}\ \ \ \mbox{with}\ \ \ DH_{t}:=H^{(t)}_{T}-H^{(t)}_{t}\ .

(2)

Our slightly unusual notation of taking the performance of each instrument to maturity reflects our ambition to look at option market simulators for “floating” implied volatility surfaces where the observed financial instruments change from step to step. If the instruments tradable at each time step are, in fact, the same fixed strike and maturity instruments, then

{a\star DH_{T}}\equiv\sum_{t=0}^{m-1}\delta_{t}\cdot dH_{t}\ \ \ \mbox{with}\ \ \ dH_{t}:=H_{t+1}-H_{t}\ ,

(3)

where $\delta_{t}:=a_{t}+\delta_{t-1}$ starting with $\delta_{-1}:=0$ .
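As a quick sanity check of the identity between (2) and (3), the following sketch compares the two expressions for the terminal gain. It assumes a single fixed instrument, so that $H^{(t)}_{T}=H_{T}$ for every $t$; the prices, actions and function names are made up for illustration.

```python
def gain_to_maturity(actions, prices):
    """Terminal gain as in (2): each trade a_t is held until the final time T."""
    final = prices[-1]
    return sum(a * (final - prices[t]) for t, a in enumerate(actions))


def gain_incremental(actions, prices):
    """Terminal gain as in (3): the position delta_t = a_t + delta_{t-1} earns H_{t+1} - H_t."""
    delta, total = 0.0, 0.0
    for t, a in enumerate(actions):
        delta += a
        total += delta * (prices[t + 1] - prices[t])
    return total


prices = [100.0, 101.5, 99.0, 102.0, 103.5]  # H_0, ..., H_m (hypothetical)
actions = [1.0, -0.5, 2.0, -1.5]             # a_0, ..., a_{m-1} (hypothetical)

g2 = gain_to_maturity(actions, prices)
g3 = gain_incremental(actions, prices)
print(g2, g3)
```

Both expressions agree because the sum over trades held to maturity telescopes into the running position times the one-step price increments.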

2.1 Optimized certainty equivalents

In order to assess the performance of a trading strategy, we are looking for risk-adjusted measures of performance instead of plain expected return. We will focus on the following case: let $u$ be a strictly concave, strictly increasing utility function which is $C^{1}$ and normalized to both $u(0)=0$ and $u^{\prime}(0)=1$ . [Footnote 1: Note that the normalization is a convenience which is always achievable: if $\tilde{u}$ is concave and strictly increasing, then $u(x):=(\tilde{u}(x)-\tilde{u}(0))/\tilde{u}^{\prime}(0)$ satisfies these assumptions.] Examples of such utility functions are the adjusted mean-volatility function $u(x):=(1+\lambda x-\sqrt{1+\lambda^{2}x^{2}})/\lambda$ proposed in [9], or the exponential utility $u(x)=(1-e^{-\lambda x})/\lambda$ . We make the further assumption that $u({a\star DH_{T}})\in L^{1}(\mathbb{P})$ for all $a$ . (This condition is not met when $u$ is the exponential utility and the market contains a Black & Scholes process with negative drift.) [Footnote 2: To see this, assume $m=1$ , $n=1$ , $H_{0}:=1$ , and let $H_{T}:=\exp((-\mu-\frac{1}{2}\sigma^{2})T+\sigma\sqrt{T}W)$ for positive $\mu$ and $\sigma$ , and $W$ standard normal. Then $\mathbb{E}[u(-(H_{T}-H_{0}))]=-\infty$ .]

For a given utility function, define now the optimized certainty equivalent (OCE) of the expected utility, introduced in [2] as

U(X):=\sup_{y\in\mathbb{R}}\Big{\{}{\mathbb{E}}\left[\,{u(y+X)}\,\right]-y\Big{\}}\ .

(4)

The functional $U$ satisfies the following properties:

(i) Monotone increasing: if $X\geq Y$ then $U(X)\geq U(Y)$ . A better payoff leads to higher expected utility.

(ii) Concave: $U(\alpha X+(1-\alpha)Y)\geq\alpha U(X)+(1-\alpha)U(Y)$ for $\alpha\in[0,1]$ . Diversification leads to higher utility.

(iii) Cash-invariant: $U(X+c)=U(X)+c$ for all $c\in\mathbb{R}$ . Adding cash to a position increases its utility by the same amount.

The above properties mean that $-U(X)$ is a convex risk measure. Note that the assumptions $u(0)=0$ and $u^{\prime}(0)=1$ imply that $u(y)\leq y$ for all $y$ and hence $U(0)=0$ . Furthermore, $U$ is finite for all bounded variables $X$ , since by monotonicity we have $U(X)\leq U(\sup X)=U(0)+\sup X<\infty$ .

Cash invariance means that $U(X-U(X))=0$ , i.e. $-U(X)$ is the minimum amount of cash that needs to be added to a position in order to make it acceptable, in the sense that $U(X+c)\geq 0$ . The cash-invariance property of the OCE means in particular that optimizing $U$ does not depend on our initial wealth.

A classic example of such an OCE measure is the case where $u$ is the exponential utility with risk aversion level $\lambda$ . In this case we obtain the entropy

U_{\lambda}(X)=-\frac{1}{\lambda}\log\mathbb{E}\left[e^{-\lambda X}\right]\ .
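The closed form above can be checked numerically. The sketch below is a minimal illustration, assuming an equally weighted finite sample in place of $\mathbb{P}$ and using a plain ternary search over $y$ (the objective in (4) is concave in $y$); the sample values and function names are hypothetical.

```python
import math


def exp_utility(x, lam):
    """Exponential utility u(x) = (1 - exp(-lam * x)) / lam, with u(0) = 0 and u'(0) = 1."""
    return (1.0 - math.exp(-lam * x)) / lam


def oce(samples, lam, lo=-50.0, hi=50.0, iters=200):
    """Approximate U(X) = sup_y E[u(y + X)] - y by ternary search over y."""
    def objective(y):
        return sum(exp_utility(y + x, lam) for x in samples) / len(samples) - y

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective(0.5 * (lo + hi))


def entropic(samples, lam):
    """Closed form -1/lam * log E[exp(-lam * X)] under the empirical measure."""
    mean_exp = sum(math.exp(-lam * x) for x in samples) / len(samples)
    return -math.log(mean_exp) / lam


X = [-1.0, -0.2, 0.1, 0.4, 1.3]  # hypothetical equally weighted outcomes
lam = 2.0

u_numeric = oce(X, lam)
u_closed = entropic(X, lam)
u_shifted = oce([x + 0.7 for x in X], lam)  # cash invariance: U(X + c) = U(X) + c
print(u_numeric, u_closed, u_shifted - u_numeric)
```

The numerical supremum should match the closed form, and shifting every outcome by a constant should shift the OCE by the same constant, in line with property (iii).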

We now consider the application of the optimized certainty equivalent to the terminal gains of a trading strategy. To this end, define

F(y,a):={\mathbb{E}}\left[\,{u\big{(}\,y+{a\star DH_{T}}\,\big{)}-y}\,\right]\ .

(5)

Lemma 2.1.

Suppose the market exhibits classic arbitrage. Then no finite maximizers of $F$ exist.

Proof.

Assume that $\tilde{a}$ is a classic arbitrage opportunity with $\mathbb{P}[\tilde{a}\star DH_{T}\geq 0]=1$ and $\mathbb{P}[A]=p>0$ for a set $A=\{\tilde{a}\star DH_{T}\geq g\}$ for some $g>0$ . Note that $\tilde{a}\star DH_{T}\geq g\mbox{{\sl 1}}_{A}\,$ . Let $n\in\mathbb{N}$ . Then

U(n\tilde{a}\star DH_{T})\geq\sup_{y}\ \mathbb{E}[\mbox{{\sl 1}}_{A}\,u(y+ng)-y]=\sup_{y}\ \{pu(y+ng)-y\}=(*)\ .

The last term is optimized by $\tilde{y}=u^{\prime}{}^{-1}(1/p)-ng$ as it solves $pu^{\prime}(\tilde{y}+ng)=1$ . Therefore, $(*)=pu(u^{\prime}{}^{-1}(1/p))-u^{\prime}{}^{-1}(1/p)+ng$ which tends to infinity as $n\uparrow\infty$ . Hence, no finite maximizer of (5) exists. ∎

2.2 Utility-based risk neutral densities

In the absence of classic arbitrage, we are now able to use the optimized certainty equivalent framework to construct an equivalent martingale measure. Before moving on to our main result, we will need the following lemma.

Lemma 2.2.

Let $f:\mathbb{R}\rightarrow\mathbb{R}$ be concave and $C^{1}$ . Assume that $f(\xi)\in L^{1}$ for all $\xi$ and that $\mathbb{E}[f(\xi^{*})]\geq\mathbb{E}[f(\xi)]$ for some $\xi^{*}$ . Then

\partial_{\epsilon}|_{\epsilon=0}\mathbb{E}[f(\epsilon\xi+\xi^{*})]=\mathbb{E}[f^{\prime}(\xi^{*})\xi].

(6)

Proof.

Define $\Delta_{\epsilon}:=\frac{1}{\epsilon}\left(f(\epsilon\xi+\xi^{*})-f(\xi^{*})\right)$ such that $\Delta_{\epsilon}\uparrow\partial_{\epsilon}|_{\epsilon=0}f(\epsilon\xi+\xi^{*})=f^{\prime}(\xi^{*})\xi$ since $f$ is concave. As a difference between two $L^{1}$ variables $\Delta_{\epsilon}\in L^{1}$ . Since $\xi^{*}$ maximizes the expectation of $f$ , we also have $\mathbb{E}[\Delta_{\epsilon}]\leq 0$ . Using the dominated convergence shows that $\Delta_{\epsilon}\uparrow f^{\prime}(\xi^{*})\xi\in L^{1}$ and therefore that taking expectations and derivatives in (6) can be exchanged. ∎

Now we give the main result allowing the construction of utility-based equivalent martingale measures.

Proposition 2.3.

Let $y^{*}$ and $a^{*}$ be finite maximizers of

y,a\longmapsto F(y,a):={\mathbb{E}}\left[\,{u\big{(}\,y+{a\star DH_{T}}\,\big{)}-y}\,\right]\ .

(7)

Then,

D^{*}:=u^{\prime}\big{(}\,y^{*}+{a^{*}\star DH_{T}}\,\big{)}

(8)

is an equivalent martingale density, i.e. the measure $\mathbb{Q}^{*}$ defined via $d\mathbb{Q}^{*}:=D^{*}d\mathbb{P}$ is an equivalent martingale measure. [Footnote 3: We note that if $u$ is not strictly increasing, then $D^{*}$ is an absolutely continuous, but possibly not equivalent density. An example is the CVaR “utility” $u(x)=\min\{x,0\}/(1-\alpha)$ .]

Proof.

We follow broadly the discussion in Section 3.1 of [6].

Show that $D^{*}({a\star DH_{T}})\in L^{1}$ with zero expectation: optimality of $y^{*}$ and $a^{*}$ implies first $0=\partial_{\epsilon}\big{|}_{\epsilon=0}F(y^{*},\epsilon a+a^{*})$ . Secondly, Lemma 2.2 shows for arbitrary $a$ that $D^{*}({a\star DH_{T}})\in L^{1}$ with

0=\partial_{\epsilon}\big{|}_{\epsilon=0}F(y^{*},\epsilon a+a^{*})={\mathbb{E}}\left[\,{D^{*}\ {a\star DH_{T}}}\,\right]\ .

(9)

Show that $\mathbb{E}_{t}[D^{*}DH_{t}]=0$ : in the previous statement, set $a:=(0,\ldots,\mbox{{\sl 1}}_{A_{t}}\,e^{i},\ldots,0)$ where $A_{t}$ is ${\cal F}_{t}$ -measurable, and where $e^{i}$ denotes the $i$ th unit vector. We obtain

0=\mathbb{E}[D^{*}DH^{i}_{t}|{\cal F}_{t}]\ .

(10)

Show that $D^{*}\in L^{1}$ : recall that ${a\star DH_{T}}=\mbox{$\sum_{it}$}a^{i}_{t}\cdot DH^{i}_{t}$ . Since $u$ is concave and strictly increasing, $u^{\prime}$ is decreasing and positive. Then,

0\leq u^{\prime}(y^{*}+{a^{*}\star DH_{T}})\leq u^{\prime}\left(y^{*}+{a^{*}\star DH_{T}}\right)\|DH\|\mbox{{\sl 1}}_{\|DH\|>1}\,+u^{\prime}\left(y^{*}-\mbox{ess $\sup_{it}$}\|a^{*}{}^{i}_{t}\|\right)\mbox{{\sl 1}}_{\|DH\|\leq 1}\,\in L^{1}

since $y^{*}$ and $a^{*}$ were assumed to be finite, and since the previous step with $a=1$ implies $u^{\prime}\left(y^{*}+{a^{*}\star DH_{T}}\right)|DH|\in L^{1}$ .

Positivity of $D^{*}$ : since $u^{\prime}$ is decreasing and positive we have $\lim_{n\uparrow\infty}u^{\prime}(n)\geq 0$ and therefore $\mathbb{P}[u^{\prime}(y^{*}+{a^{*}\star DH_{T}})>0]=\lim_{n\uparrow\infty}\mathbb{P}[u^{\prime}(y^{*}+{a^{*}\star DH_{T}})>u^{\prime}(n)]=\lim_{n\uparrow\infty}\mathbb{P}[y^{*}+{a^{*}\star DH_{T}}\leq n]=\mathbb{P}[y^{*}+{a^{*}\star DH_{T}}<\infty]=1$ , since $a^{*}$ being almost surely finite implies that ${a^{*}\star DH_{T}}\in L^{1}$ .

$D^{*}$ has unit expectation: optimality of $y^{*}$ and $a^{*}$ implies

0=\partial_{y}\big{|}_{y=y^{*}}F(y,a^{*})={\mathbb{E}}\left[\,{D^{*}}\,\right]-1

(11)

and therefore that $\mathbb{E}[D^{*}]=1$ .

∎
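The construction in Proposition 2.3 can be illustrated on a toy example. The sketch below is a minimal illustration under strong assumptions: a one-period market with a single instrument, an equally weighted finite sample standing in for $\mathbb{P}$, and exponential utility. Bisection on the first-order condition replaces the Deep Hedging optimizer purely for illustration; the return sample and all function names are hypothetical.

```python
import math


def solve_a_star(returns, lam, lo=-100.0, hi=100.0, iters=200):
    """Bisection on the first-order condition g(a) = E[exp(-lam * a * DH) * DH] = 0.

    g is strictly decreasing in a, and a root exists when the sample contains
    both positive and negative returns (no classic arbitrage in the sample).
    """
    def g(a):
        return sum(math.exp(-lam * a * r) * r for r in returns) / len(returns)

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def risk_neutral_density(returns, lam):
    """Return (a*, y*, D*) with D* = u'(y* + a* DH) for exponential utility."""
    a_star = solve_a_star(returns, lam)
    # First-order condition in y: E[exp(-lam * (y + a* DH))] = 1.
    mean_exp = sum(math.exp(-lam * a_star * r) for r in returns) / len(returns)
    y_star = math.log(mean_exp) / lam
    density = [math.exp(-lam * (y_star + a_star * r)) for r in returns]
    return a_star, y_star, density


returns = [-0.10, -0.05, 0.00, 0.05, 0.10, 0.20]  # hypothetical sample of DH
lam = 1.0

a_star, y_star, density = risk_neutral_density(returns, lam)
n = len(returns)
mean_p = sum(returns) / n                                  # drift under P
mean_q = sum(d * r for d, r in zip(density, returns)) / n  # drift under Q*
mass = sum(density) / n                                    # E[D*]
print(a_star, y_star, mean_p, mean_q, mass)
```

In this toy setting the density is strictly positive, has unit expectation, and reweights the sample so that the positive drift under the empirical measure vanishes.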

The density $D^{*}$ provides an equivalent martingale density. It is minimal among all equivalent martingale densities in the following sense: the Legendre-Fenchel transform of the convex function $f(x):=-u(-x)$ is defined as

\tilde{u}(y):=\sup_{x\in\mathbb{R}}\{yx-f(x)\}\ .

The associated $\tilde{u}$ -divergence between two distributions $\mathbb{Q}$ and $\mathbb{P}$ with $\mathbb{Q}\ll\mathbb{P}$ is then

D_{f}(\mathbb{Q}|\mathbb{P})=\mathbb{E}\left[\tilde{u}\left(\frac{d\mathbb{Q}}{d\mathbb{P}}\right)\right]\ .

It is a non-symmetric measure of the similarity between two probability distributions.

Corollary 2.4.

Let $\tilde{u}(y)$ be the Legendre-Fenchel transform of $u$ , and define $D^{*}$ as in (8). Then, $D^{*}$ is a minimizer of the $\tilde{u}$ -divergence

D\longmapsto{\mathbb{E}}\left[\,{\tilde{u}\left(D\right)}\,\right]

(12)

over all equivalent martingale densities.

Proof.

The Legendre-Fenchel transform of the convex function $-u(-x)$ is $\tilde{u}(y)=\sup_{x}(yx+u(-x))=\sup_{x}(u(x)-yx)$ which implies that for all $x$ ,

\tilde{u}(y)\geq u(x)-yx\ .

(13)

Let ${\cal D}_{e}:=\{\ D>0:\ \mathbb{E}[D]=1,\,\mathbb{E}[D\,({a\star DH_{T}})]=0\ \mbox{for all $a$}\ \}$ be the set of equivalent martingale densities. Equation (13) implies for $y\rightarrow D\in{\cal D}_{e}$ and $x\rightarrow c+{a\star DH_{T}}$ ,

\inf_{D\in{\cal D}_{e}}{\mathbb{E}}\left[\,{\tilde{u}(D)}\,\right]\geq\sup_{c,a}\Big{\{}{\mathbb{E}}\left[\,{u\big{(}c+{a\star DH_{T}}\big{)}}\,\right]-c\Big{\}}=F(y^{*},a^{*})\ .

(14)

Let $I=u^{\prime}(\mathbb{R})\subseteq(0,\infty)$ be the range of $u^{\prime}$ . For a given $y\in I$ the sup in $\tilde{u}(y)=\sup_{x}(u(x)-yx)$ is attained by $x=u^{\prime}{}^{-1}(y)$ which yields

\tilde{u}(y)=u\big{(}u^{\prime}{}^{-1}(y)\big{)}-y\ u^{\prime}{}^{-1}(y)

(15)

for all $y\in I$ . Applying (15) to $\tilde{u}(D^{*})$ yields equality of both sides of (14), proving our claim that $D^{*}$ is indeed a minimizer of (12).

∎

Thus, finding the $\tilde{u}$ -minimal equivalent martingale measures is the dual problem of maximizing the expected utility. The key observation is that we now have a numerically efficient method to solving the primal problem via the application of machine learning methods.

In the case of the exponential utility, the $\tilde{u}$ -divergence is the relative entropy of $\mathbb{Q}$ with respect to $\mathbb{P}$ ,

H(\mathbb{Q}|\mathbb{P})=\mathbb{E}\left[\frac{d\mathbb{Q}}{d\mathbb{P}}\log\frac{d\mathbb{Q}}{d\mathbb{P}}\right]\ .

The measure $\mathbb{Q}^{*}$ is the minimal entropy martingale measure (MEMM) introduced by [7], given by

\frac{d\mathbb{Q}^{*}}{d\mathbb{P}}=\frac{e^{-{a^{*}\star DH_{T}}}}{\mathbb{E}[e^{-{a^{*}\star DH_{T}}}]}

The measure is unique due to the strict convexity of the function $\tilde{u}(y)=y\log y$ .
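For completeness, the link between the exponential utility and the relative entropy can be made explicit with a short worked derivation. It assumes the normalization $u(x)=(1-e^{-\lambda x})/\lambda$ from Section 2.1 and uses the representation (15); the result agrees with $y\log y$ up to the factor $1/\lambda$ and terms affine in $y$, which are constant over densities with unit expectation.

```latex
\begin{align*}
u(x) &= \frac{1-e^{-\lambda x}}{\lambda}, \qquad
u'(x) = e^{-\lambda x}, \qquad
(u')^{-1}(y) = -\frac{\log y}{\lambda}, \\
\tilde{u}(y) &= u\big((u')^{-1}(y)\big) - y\,(u')^{-1}(y)
 = \frac{1-y}{\lambda} + \frac{y\log y}{\lambda}, \\
\mathbb{E}\big[\tilde{u}(D)\big] &= \frac{1}{\lambda}\,\mathbb{E}[D\log D] + \frac{1-\mathbb{E}[D]}{\lambda}
 = \frac{1}{\lambda}\,H(\mathbb{Q}|\mathbb{P}) \qquad \text{for } \mathbb{E}[D]=1 .
\end{align*}
```

Minimizing the $\tilde{u}$ -divergence over equivalent martingale densities is therefore the same as minimizing the relative entropy in this case.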

Remark 1.

In the case where the returns are normally distributed and the utility is the exponential utility, the optimization is easily shown to be equivalent to solving the classic mean-variance objective of Markowitz [11], $U(X)=\mathbb{E}[X]-\frac{1}{2}\lambda\mathrm{Var}[X]$ , and in this case the resulting martingale measure removes the drift while preserving the covariance of the returns.

Direct Construction of Equivalent Martingale Measures

An alternative to the above construction is described in [6] Section 3.1, as follows.

Proposition 2.5.

Define $u$ as above and fix some initial wealth $w_{0}\in\mathbb{R}$ . Let $a^{*}$ be a finite maximizer of

a\longmapsto{\mathbb{E}}\left[\,{u\big{(}\,w_{0}+{a\star DH_{T}}\,\big{)}}\,\right]\ .

(16)

Then, the measure $\mathbb{Q}^{*}$ with density

D^{*}:=\frac{u^{\prime}(w_{0}+{a^{*}\star DH_{T}})}{{\mathbb{E}}\left[\,{u^{\prime}(w_{0}+{a^{*}\star DH_{T}})}\,\right]}

(17)

is an equivalent martingale measure.

Proof.

We prove that $\mathbb{Q}^{*}$ is a martingale measure. [Footnote 4: We follow broadly Section 3.1 in [6].]

Show $\mathbb{E}[u^{\prime}(w_{0}+{a^{*}\star DH_{T}})\,{a\star DH_{T}}]=0$ for all $a$ . For an arbitrary $a$ we get

0=\partial_{\epsilon}|_{\epsilon=0}\mathbb{E}[u(\,w_{0}+(\epsilon a+a^{*})\star DH_{T}\,)]\stackrel{(*)}{=}\mathbb{E}[u^{\prime}(w_{0}+{a^{*}\star DH_{T}})\,{a\star DH_{T}}]\ ,

(18)

where $(*)$ follows from Lemma 2.2. Since $a$ was arbitrary, this also implies $\mathbb{E}[u^{\prime}(w_{0}+{a^{*}\star DH_{T}})\,DH_{t}|{\cal F}_{t}]=0$ .

We next prove that $u^{\prime}(w_{0}+{a^{*}\star DH_{T}})\in L^{1}$ : recall that ${a\star DH_{T}}=\mbox{$\sum_{it}$}a^{i}_{t}\cdot DH^{i}_{t}$ . Since $u$ is concave and increasing, $0\leq u^{\prime}(x)\leq u^{\prime}(x-\epsilon)$ for $\epsilon>0$ . Then,

0\leq u^{\prime}(w_{0}+{a^{*}\star DH_{T}})\leq u^{\prime}\left(w_{0}+{a^{*}\star DH_{T}}\right)\|DH\|\mbox{{\sl 1}}_{\|DH\|>1}\,+u^{\prime}\left(w_{0}-\mbox{ess $\sup_{it}$}\|a^{*}{}^{i}_{t}\|\right)\mbox{{\sl 1}}_{\|DH\|\leq 1}\,\in L^{1}

since $a^{*}$ was assumed to be finite.

Positivity of $D^{*}$ : since $u^{\prime}$ is decreasing and positive we have $\lim_{n\uparrow\infty}u^{\prime}(n)\geq 0$ and therefore $\mathbb{P}[u^{\prime}(w_{0}+{a^{*}\star DH_{T}})>0]=\lim_{n\uparrow\infty}\mathbb{P}[u^{\prime}(w_{0}+{a^{*}\star DH_{T}})>u^{\prime}(n)]=\lim_{n\uparrow\infty}\mathbb{P}[w_{0}+{a^{*}\star DH_{T}}\leq n]=\mathbb{P}[w_{0}+{a^{*}\star DH_{T}}<\infty]=1$ , since $a^{*}$ being almost surely finite implies that ${a^{*}\star DH_{T}}\in L^{1}$ . ∎

Remark 2.

We note that the assumption of finiteness of $a^{*}$ again excludes markets with classic arbitrage opportunities. [Footnote 5: Assume that $\tilde{a}$ is a classic arbitrage opportunity with $\mathbb{P}[\tilde{a}\star DH_{T}\geq 0]=1$ and $\mathbb{P}[A]=p>0$ for a set $A=\{\tilde{a}\star DH_{T}\geq g\}$ for $g>0$ . Then $\mathbb{E}[u(w_{0}+(n\tilde{a}1_{A})\star DH_{T})]\geq\mathbb{E}[1_{A}u(w_{0}+ng)]+(1-p)u(w_{0})\geq p\,u(w_{0}+ng)+(1-p)u(w_{0})\uparrow p\,u(\infty)+(1-p)u(w_{0})$ , i.e. no finite maximizer of $\mathbb{E}[u({a\star DH_{T}})]$ exists.] If $u$ is the exponential utility, then $D^{*}$ coincides with the previously defined density of the MEMM in (8).

We note that while this approach is somewhat more direct it depends on initial wealth – except in the case of the exponential utility – and lacks the interpretation of the density as a minimizer of some distance to $\mathbb{P}$ .

We now briefly discuss some extensions of the previous results to the cases of unbounded assets, and continuous time processes.

Unbounded Assets

As pointed out in [6] Section 3.1, the requirement $u({a\star DH_{T}})\in L^{1}$ can be enforced at the cost of interpretability of our previous results by passing over to bounded asset prices: to this end define the random variable $M:=\max_{i,t}|DH_{t}^{i}|$ and set $D\bar{H}^{i}_{t}:=DH^{i}_{t}/(1+M)$ , which are now bounded.
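The pathwise scaling can be sketched as follows. This is a minimal illustration that assumes each simulated path is stored as a flat list of returns $DH^{i}_{t}$ over all instruments and time steps; the function name `scale_returns` and the sample values are hypothetical.

```python
def scale_returns(paths):
    """Scale each path's returns by 1 / (1 + M), with M the largest absolute return on that path."""
    scaled = []
    for path in paths:
        m = max(abs(r) for r in path)  # M is a random variable: one value per path
        scaled.append([r / (1.0 + m) for r in path])
    return scaled


paths = [
    [0.5, -3.0, 12.0],    # hypothetical returns on path 1
    [1.0e6, -2.0, 0.1],   # a path with one extreme return
    [0.0, 0.0, 0.0],      # a path with no moves
]

scaled = scale_returns(paths)
print(scaled[0])
```

Every scaled return lies strictly inside $(-1,1)$, regardless of how heavy-tailed the original returns are.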

We can then show with the same steps as before that we can construct an equivalent martingale measure in this case as follows.

Proposition 2.6.

Let $y^{*}$ and $a^{*}$ be maximizers of the bounded problem

y,a\longmapsto\bar{F}(y,a):={\mathbb{E}}\left[\,{u\left(\frac{y+{a\star DH_{T}}}{1+M}\right)-y}\,\right]\ .

(19)

Then,

D^{*}:=\frac{u^{\prime}\!\left(\frac{y^{*}+{a^{*}\star DH_{T}}}{1+M}\right)}{1+M}

(20)

is an equivalent martingale density for the unscaled problem, i.e. $\mathbb{E}^{*}[{a\star DH_{T}}]\leq 0$ for all $a$ .

Moreover, $D^{*}$ minimizes the scaled $\tilde{u}$ -divergence

D\rightarrow{\mathbb{E}}\left[\,{\tilde{u}\big{(}(1+M)D\big{)}}\,\right]

(21)

over all equivalent martingale densities.

Proof.

We cover the main differences to the previous case: first, we see that

0=\partial_{y}\bar{F}(y^{*},a^{*})={\mathbb{E}}\left[\,{\frac{u^{\prime}\left(\frac{y^{*}+{a^{*}\star DH_{T}}}{1+M}\right)}{1+M}-1}\,\right]\ .

(22)

Then,

0=\partial_{\epsilon}\bar{F}(y^{*},\epsilon a+a^{*})={\mathbb{E}}\left[\,{\frac{u^{\prime}\left(\frac{y^{*}+{a^{*}\star DH_{T}}}{1+M}\right)}{1+M}{a\star DH_{T}}}\,\right]

(23)

showing that $D^{*}$ is an equivalent martingale density. Using (13) with $y\rightarrow(1+M)D$ for $D\in{\cal D}_{e}$ and $x\rightarrow(c+{a\star DH_{T}})/(1+M)$ yields as before

\inf_{D\in{\cal D}_{e}}{\mathbb{E}}\left[\,{\tilde{u}\big{(}\,(1+M)D\,\big{)}}\,\right]\geq\sup_{c,a}\Big{\{}{\mathbb{E}}\left[\,{u\left(\frac{c+{a\star DH_{T}}}{1+M}\right)}\,\right]-c\Big{\}}\ .

(24)

Equality in $D^{*}$ follows as before. ∎

Continuous Time

We note that our method of proof also works in continuous time: let $G_{t}(a):=\int_{0}^{t}\!\!\,a_{r}\,dH_{r}$ where $dH_{t}=\mu_{t}\,dt+\sigma_{t}\,dW_{t}$ (i.e. the classic setup with fixed instruments). Assume that $y^{*},a^{*}$ maximize $y,a\mapsto\mathbb{E}[u(y+G_{T}(a))-y]$ and that $\mathbb{E}[u(G_{T}(a^{*}))]<\infty$ which again excludes markets with classic arbitrage. Then the same statement as above is true with virtually the same proof.

Let $G^{*}_{t}:=G_{t}(a^{*})$ and notice that if $u\in C^{3}$ then $f(x):=\log u^{\prime}(x)$ has derivative $f^{\prime}(x)=\frac{u^{\prime\prime}(x)}{u^{\prime}(x)}$ which is the negative of the Arrow-Pratt coefficient of absolute risk aversion of $u$ , c.f. [6] section 2.3. Standard calculus shows that

D^{*}_{t}=\exp\left(\int_{0}^{t}\!\!\frac{u^{\prime\prime}(G^{*}_{r})}{u^{\prime}(G^{*}_{r})}\sigma_{r}\,dW_{r}-\frac{1}{2}\int_{0}^{t}\!\!\left(\frac{u^{\prime\prime}(G^{*}_{r})}{u^{\prime}(G^{*}_{r})}\sigma_{r}\right)^{2}\!\!dr\right)

(25)

Under $\mathbb{Q}^{*}$ our assets are driftless and satisfy $dH_{t}=\sigma_{t}\,dW^{*}_{t}$ for a $\mathbb{Q}^{*}$ -Brownian motion $W^{*}$ . This implies the well-known result

\frac{u^{\prime\prime}(G^{*}_{t})}{u^{\prime}(G^{*}_{t})}=\frac{\mu_{t}}{\sigma^{2}_{t}}\ .

(26)

3 Transaction costs and trading constraints

The previous section enables us to simulate markets from a martingale measure in the absence of trading frictions. In practice, trading strategies will be subject to trading costs and constraints such as liquidity and risk limits. Our use-case is training a Deep Hedging agent. We therefore now extend the previous results to the case of generalized cost functions which will cover both trading costs and most trading constraints.

A generalized cost function is a non-negative, $\mathcal{F}_{t}$ -measurable function $c_{t}(a_{t})\equiv c(a_{t};s_{t})$ with values in $[0,\infty]$ , which is convex in $a_{t}$ , lower semi-continuous, and normalized to $c_{t}(0)=0$ . To impose convex restrictions on our trading activity, we set transaction cost to infinity outside the admissible set. Indeed, let ${\cal A}_{t}$ be a convex set of admissible trading actions, and $\bar{c}_{t}$ an initial cost function. We then use $c_{t}(a_{t}):=\bar{c}_{t}(a_{t})+\infty\mbox{{\sl 1}}_{a_{t}\not\in{\cal A}_{t}}\,$ . (We note that this construction is lower semi-continuous.) As an example, let us assume the $i$ th instrument is not tradable in $t$ . We then impose $c_{t}(a_{t})=\infty$ whenever $|a^{i}_{t}|>0$ .

In reverse, if $c_{t}$ is a generalized cost function, we may call ${\cal A}_{t}:=\{a\in\mathbb{R}^{n}:\,c_{t}(a)<\infty\}$ the convex set of admissible actions. Note also that by construction $0\in{\cal A}_{t}$ .

Example 3.1.

The simplest trading costs are proportional. Assume that $\Delta_{t}$ and $\mathrm{V}_{t}$ are the observable Black & Scholes delta and vega of the mid-prices $H^{(t)}_{t}$ for the trading instruments available at $t$ , and that the cost of trading $a^{i}$ units of $H^{(t,i)}_{t}$ is proportional to its delta and vega with cost factors $g^{\pm}_{\Delta}$ and $g^{\pm}_{\mathrm{V}}$ for buying and selling, respectively. We also impose that we may trade at most $\mathrm{V}_{\mathrm{max}}$ units of vega per time step. The corresponding cost function is given by

c_{t}(a):=\left\{\begin{array}[]{ll}a^{+}\cdot\left(g^{+}_{\Delta}\Delta_{t}+g^{+}_{\mathrm{V}}\mathrm{V}_{t}\right)&a\cdot\mathrm{V}_{t}\leq\mathrm{V}_{\mathrm{max}}\\ \ \ \ \ \ +a^{-}\cdot\left(g^{-}_{\Delta}\Delta_{t}+g^{-}_{\mathrm{V}}\mathrm{V}_{t}\right)&\\ \infty&a\cdot\mathrm{V}_{t}>\mathrm{V}_{\mathrm{max}}\end{array}\right.

(27)

Example 3.2.

Consider trading cost which apply only to net delta and vega traded, e.g.

c_{t}(a):=\left\{\begin{array}[]{ll}g^{+}_{\Delta}\left(a\cdot\Delta_{t}\right)^{+}+g^{+}_{\mathrm{V}}\left(a\cdot\mathrm{V}_{t}\right)^{+}&a\cdot\mathrm{V}_{t}\leq\mathrm{V}_{\mathrm{max}}\\ \ \ \ \ \ +g^{-}_{\Delta}\left(a\cdot\Delta_{t}\right)^{-}+g^{-}_{\mathrm{V}}\left(a\cdot\mathrm{V}_{t}\right)^{-}&\\ \infty&a\cdot\mathrm{V}_{t}>\mathrm{V}_{\mathrm{max}}\end{array}\right.

(28)
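The two example cost functions (27) and (28) can be sketched in a few lines of NumPy. This is an illustration only: the function names and the numerical inputs are hypothetical, $a^{+}$ and $a^{-}$ are read as the positive and negative parts $\max(a,0)$ and $\max(-a,0)$, and deltas and vegas are assumed to be supplied as non-negative numbers.

```python
import numpy as np

def proportional_cost(a, delta, vega, g_delta, g_vega, vega_max=np.inf):
    """Sketch of (27): buys and sells are charged per instrument."""
    a, delta, vega = (np.asarray(v, dtype=float) for v in (a, delta, vega))
    if a @ vega > vega_max:                                # outside the admissible set
        return np.inf
    buy, sell = np.maximum(a, 0.0), np.maximum(-a, 0.0)    # assumed a^+ and a^-
    return float(buy @ (g_delta[0] * delta + g_vega[0] * vega)
                 + sell @ (g_delta[1] * delta + g_vega[1] * vega))

def net_cost(a, delta, vega, g_delta, g_vega, vega_max=np.inf):
    """Sketch of (28): only net delta and net vega traded are charged."""
    a, delta, vega = (np.asarray(v, dtype=float) for v in (a, delta, vega))
    net_delta, net_vega = a @ delta, a @ vega
    if net_vega > vega_max:                                # outside the admissible set
        return np.inf
    return float(g_delta[0] * max(net_delta, 0.0) + g_vega[0] * max(net_vega, 0.0)
                 + g_delta[1] * max(-net_delta, 0.0) + g_vega[1] * max(-net_vega, 0.0))

# Hypothetical inputs: spot (delta 1, vega 0) and one option (delta 0.5, vega 0.2).
delta, vega = [1.0, 0.5], [0.0, 0.2]
g_delta, g_vega = (0.001, 0.002), (0.01, 0.02)             # (buy, sell) cost factors
cost_27 = proportional_cost([1.0, -1.0], delta, vega, g_delta, g_vega)
cost_28 = net_cost([1.0, -1.0], delta, vega, g_delta, g_vega)
capped = net_cost([0.0, 1.0], delta, vega, g_delta, g_vega, vega_max=0.1)
```

Buying spot and selling the option is cheaper under (28) than under (27) because the deltas partly net out, while the last call violates the vega limit and therefore returns an infinite cost.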

The terminal gain of implementing a trading policy $a$ with cost function $c$ is given by

{a\star DH_{T}}-C_{T}(a)\ \ \ \mbox{where}\ \ \ C_{T}(a):=\sum_{t=0}^{T-1}c_{t}(a_{t})\ .

(29)

The marginal costs of trading small quantities of the $i$ th asset in $t$ are given by

\gamma^{i+}_{t}:=+\partial_{\epsilon>0}c_{t}(\epsilon e^{i})\ \ \ \mbox{and}\ \ \ \gamma^{i-}_{t}:=-\partial_{\epsilon>0}c_{t}(-\epsilon e^{i})\ .

(30)

They define the marginal cost function

m_{t}(a):=a^{+}\cdot\gamma^{+}_{t}-a^{-}\cdot\gamma^{-}_{t}\ .

(31)
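As a hedged numerical illustration of the marginal costs and the marginal cost function, the sketch below approximates the one-sided slopes of a cost function at zero by finite differences. The helper names and the example cost function are hypothetical. The sketch also fixes a sign convention that is an assumption on our part: both slopes are reported as non-negative magnitudes, and sells are measured by $\max(-a,0)$ , so that the resulting marginal cost is non-negative. It further assumes the cost is finite near zero.

```python
import numpy as np

def marginal_costs(cost, n, eps=1e-8):
    """One-sided slopes of `cost` at zero, both reported as non-negative numbers."""
    c0 = cost(np.zeros(n))
    gamma_plus = np.array([(cost(eps * e) - c0) / eps for e in np.eye(n)])
    gamma_minus = np.array([(cost(-eps * e) - c0) / eps for e in np.eye(n)])
    return gamma_plus, gamma_minus

def marginal_cost(a, gamma_plus, gamma_minus):
    """Marginal cost function: buys pay gamma_plus, sells pay gamma_minus."""
    a = np.asarray(a, dtype=float)
    return float(np.maximum(a, 0.0) @ gamma_plus + np.maximum(-a, 0.0) @ gamma_minus)

def full_cost(a):
    """Hypothetical full cost: proportional part plus a quadratic impact term."""
    a = np.asarray(a, dtype=float)
    return float(0.01 * np.maximum(a, 0.0).sum() + 0.02 * np.maximum(-a, 0.0).sum()
                 + 0.5 * (a ** 2).sum())

gamma_plus, gamma_minus = marginal_costs(full_cost, n=2)
a = np.array([1.5, -0.5])
m_a, c_a = marginal_cost(a, gamma_plus, gamma_minus), full_cost(a)
```

In this example the quadratic term vanishes in the slopes at zero, so the marginal cost of the trade is well below its full cost.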

3.1 Statistical arbitrage and near-martingale measures

Under the statistical measure we expect there to be statistical arbitrage opportunities, i.e. trading strategies $a$ such that we expect to make money:

\mathbb{E}[{a\star DH_{T}}]>0\ .

(32)

In the absence of transaction costs, the market will be free from statistical arbitrage if and only if we are under a martingale measure.⁶ Since the gains of trading with transaction costs are almost surely never greater than the gains in the absence of transaction costs, it is clear that if $\mathbb{Q}$ is an equivalent martingale measure for the market, then there are no statistical arbitrage opportunities under transaction cost, either, i.e. $\mathbb{E}_{\mathbb{Q}}\left[\,{a\star DH_{T}}-C_{T}(a)\right]\leq 0$ for all policies $a$ (equality is achieved with $a\equiv 0$ ). Taking the limit to small transaction cost, it becomes intuitively clear that $\mathbb{E}_{\mathbb{Q}}\left[\,{a\star DH_{T}}-M_{T}(a)\right]\leq 0$ as well for marginal cost. In fact, intuitively it makes sense that the market is free of statistical arbitrage with full cost $c$ if and only if it is free of statistical arbitrage with marginal cost $m$ .

⁶ Assume that there is a $t$ such that $f_{t}:=\mathbb{E}[DH_{t}|{\cal F}_{t}]\not=0$ . Set $a_{t}:=\mathrm{sign}\,f_{t}$ . Then the strategy $a=(0,\ldots,a_{t},\ldots,0)$ is a statistical arbitrage strategy.

Here is our formal result:

Proposition 3.3.

We call $\mathbb{Q}$ a near martingale measure if any of the following equivalent conditions hold:

• the measure $\mathbb{Q}$ is free from statistical arbitrage with full cost $c$ ;

• the measure $\mathbb{Q}$ is free from statistical arbitrage with marginal cost $m$ ; and

• the expected return from any hedging instrument is within its marginal bid/ask spread in the sense that

\underbrace{H^{(t,i)}_{t}-\gamma^{i-}_{t}}_{\begin{array}[]{c}\mbox{Marginal}\\ \mbox{bid price}\end{array}}\leq\underbrace{\mathbb{E}_{\mathbb{Q}}\big{[}H^{(t,i)}_{T}\big{|}\mathcal{F}_{t}\big{]}}_{\mbox{Expected gains}}\leq\underbrace{H^{(t,i)}_{t}+\gamma^{i+}_{t}}_{\begin{array}[]{c}\mbox{Marginal}\\ \mbox{ask price}\end{array}}\ ,

(33)

with $\gamma^{i\pm}_{t}$ defined in (30).

Proof.

Assume first that there are no statistical arbitrage opportunities with full cost $c$ . We will show (33). Let $A\in{\cal F}_{t}$ be arbitrary and let $e^{i}_{t}$ be the policy with unit vector $e^{i}$ at $t$ and zero elsewhere; for ease of notation we will also write $e^{i}_{t}$ for simply the unit vector, seen at time $t$ .

Absence of statistical arbitrage implies that $0\geq\frac{1}{\epsilon}\mathbb{E}_{\mathbb{Q}}\big{[}{(\pm\epsilon e^{i}_{t}1_{A})\star DH_{T}}-C_{T}(\pm\epsilon e^{i}_{t}1_{A})\big{]}$ for all $\epsilon>0$ , and therefore

0\geq\partial_{\epsilon>0}\mathbb{E}_{\mathbb{Q}}\big{[}1_{A}\left\{\pm\epsilon{e^{i}_{t}\star DH_{T}}-C_{T}(\pm\epsilon e^{i}_{t})\right\}\big{]}=\mathbb{E}_{\mathbb{Q}}\big{[}1_{A}\ \big{\{}DH_{t}^{i}\mp\gamma^{i\pm}_{t}\big{\}}\big{]}

which yields (33).

Assume now that (33) holds, and let $a$ be arbitrary. Then $\mathbb{E}_{\mathbb{Q}}\big{[}\sum_{t}\big{(}a_{t}\cdot DH_{t}-m_{t}(a_{t})\big{)}\big{]}\leq 0$ by construction of our marginal cost (31) and (33). Hence, there is no statistical arbitrage with cost $m$ . Since $c_{t}\geq m_{t}$ it is also clear that if there is no statistical arbitrage with marginal cost $m$ , then there is also no statistical arbitrage with full cost $c$ . ∎
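Condition (33) lends itself to a direct check on a weighted sample of paths. The following is a minimal sketch under simplifying assumptions: it tests the condition only at $t=0$ , where the conditional expectation reduces to a weighted sample mean, it treats $\gamma^{i\pm}_{t}$ as non-negative half-spreads, and the function name and simulated gains are hypothetical.

```python
import numpy as np

def within_marginal_spread(dH, weights, gamma_plus, gamma_minus):
    """Check -gamma_minus <= E_Q[H_T - H_0] <= gamma_plus for each instrument."""
    drift = weights @ dH                        # weighted expected gain per instrument
    return (-gamma_minus <= drift) & (drift <= gamma_plus), drift

rng = np.random.default_rng(0)
n_paths = 10_000
dH = 0.05 + 0.2 * rng.standard_normal((n_paths, 2))    # hypothetical drifting gains
uniform = np.full(n_paths, 1.0 / n_paths)               # weights of the statistical measure

tight_ok, drift = within_marginal_spread(dH, uniform, gamma_plus=0.001, gamma_minus=0.001)
wide_ok, _ = within_marginal_spread(dH, uniform, gamma_plus=0.1, gamma_minus=0.1)
```

With a drift of roughly 0.05 the unweighted sample fails the check for a half-spread of 0.001 but passes it for a half-spread of 0.1, which mirrors the statement that only drifts exceeding marginal cost need to be removed.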

Remark 3.

Under the conditions of the above proposition the conditional expectation $\mathbb{E}_{\mathbb{Q}}\big{[}H^{(t,i)}_{T}\big{|}\mathcal{F}_{t}\big{]}$ defines a martingale “micro-price” [14] within the bid–ask spread.

In the absence of trading costs and trading constraints, equality is achieved. That is, the market is free from statistical arbitrage if and only if

\mathbb{E}_{\mathbb{Q}}\big{[}H^{(t,i)}_{T}\big{|}\mathcal{F}_{t}\big{]}=H^{(t,i)}_{t}\ ,

resulting in the classic formulation of the price process being a martingale under $\mathbb{Q}$ .

3.2 Utility-based near-martingale measures under trading frictions

We now proceed with constructing a near-martingale measure $\mathbb{Q}^{*}$ via the same duality as in the zero transaction cost case. Define again the function

F(y,a):={\mathbb{E}}\left[\,{u\big{(}y+{a\star DH_{T}}-M_{T}(a)\big{)}-y}\,\right]

(34)

just as in (7), but this time with marginal transaction costs.

Proposition 3.4.

Let $y^{*}$ and $a^{*}$ be finite maximizers of $y,a\mapsto F(y,a)$ . Then

D^{*}:=u^{\prime}\!\left(y^{*}+{a^{*}\star DH_{T}}-M_{T}(a^{*})\right)

(35)

is an equivalent density, and the measure $\mathbb{Q}^{*}$ defined by $d\mathbb{Q}^{*}:=D^{*}d\mathbb{P}$ is a near-martingale measure. Moreover, the density $D^{*}$ minimizes the $\tilde{u}$ -divergence among all equivalent near-martingale densities.

Proof.

To show that $D^{*}$ is an equivalent near-martingale density, most of the previous proof applies as before, except of course (9), since $D^{*}$ is not an equivalent martingale density. Instead, we will show that there is no statistical arbitrage under $\mathbb{Q}^{*}$ . Let therefore $A\in{\cal F}_{t}$ be arbitrary, and denote by $e^{i}_{t}$ the strategy with unit vector $e^{i}$ in $t$ and zero otherwise; for notational simplicity we will also use $e^{i}_{t}$ to refer simply to the unit vector at $t$ .

Define $F_{\pm}(\epsilon):=\pm{\mathbb{E}}\left[\,{u\big{(}y^{*}+{(\pm\epsilon{\mbox{{\sl 1}}_{A}\,}e^{i}_{t}+a^{*})\star DH_{T}}-M_{T}(\pm\epsilon\mbox{{\sl 1}}_{A}\,e^{i}_{t}+a^{*})\big{)}-y^{*}}\,\right]$ . Consider the derivative $\partial_{\epsilon}F_{\pm}(0)={\mathbb{E}^{*}}\left[\,{\mbox{{\sl 1}}_{A}\,\left\{DH^{i}_{t}\pm\partial_{\epsilon>0}m_{t}(\pm\epsilon e^{i}_{t}+a^{*}_{t})\right\}}\,\right]$ : we recall that $m_{t}(a)=a^{+}_{t}\cdot\gamma^{+}_{t}-a^{-}_{t}\gamma^{-}_{t}$ . Therefore

(*)=\pm\partial_{\epsilon>0}m_{t}(\pm\epsilon e^{i}_{t}+a^{*}_{t})=\left\{\begin{array}[]{ll}-\gamma^{i+}_{t}&\mbox{if $a^{*i}_{t}>0$,}\\ +\gamma^{i-}_{t}&\mbox{if $a^{*i}_{t}<0$,}\\ \mp\gamma^{i\pm}_{t}&\mbox{if $a^{*i}_{t}=0$.}\\ \end{array}\right.

Since $(a^{*},y^{*})$ are optimal we must have $0\in[\min(*),\max(*)]$ . Given that $A\in{\cal F}_{t}$ was arbitrary we obtain

\left\{\begin{array}[]{rlll}&\mathbb{E}^{*}[DH^{i}_{t}|{\cal F}_{t}]&=+\gamma^{i+}_{t}&\mbox{if $a^{*i}_{t}>0$,}\\ -\gamma^{i-}_{t}\leq&\mathbb{E}^{*}[DH^{i}_{t}|{\cal F}_{t}]&\leq+\gamma^{i+}_{t}&\mbox{if $a^{*i}_{t}=0$, and }\\ -\gamma^{i-}_{t}=&\mathbb{E}^{*}[DH^{i}_{t}|{\cal F}_{t}]&&\mbox{if $a^{*i}_{t}<0$.}\\ \end{array}\right.

(36)

This is in fact a more precise statement than (33).

We now show that $D^{*}$ minimizes the $\tilde{u}$ -divergence among all measures $D\in{\cal D}_{e}:=\{\ D>0:\ \mathbb{E}[D]=1,\mathbb{E}[D\,({a\star DH_{T}}-M_{T}(a))]\leq 0\ \mbox{for all~$a$}\ \}$ . We apply (13) again with $y\rightarrow D\in{\cal D}_{e}$ and $x\rightarrow y+{a\star DH_{T}}-M_{T}(a)$ . This yields

\mathbb{E}[\tilde{u}(D)]\geq\mathbb{E}[u(y+{a\star DH_{T}}-M_{T}(a))]-\mathbb{E}[D\,(y+{a\star DH_{T}}-M_{T}(a))]\geq\mathbb{E}[u(y+{a\star DH_{T}}-M_{T}(a))-y]\ ,

(37)

where the last inequality holds since $D$ does not admit statistical arbitrage. The right hand side is maximized in $(a^{*},y^{*})$ . For the left hand side, apply again (15) which yields

\mathbb{E}[\tilde{u}(D^{*})]=\mathbb{E}[u(y^{*}+{a^{*}\star DH_{T}}-M_{T}(a^{*}))-y^{*}]-\underbrace{\mathbb{E}^{*}[{a^{*}\star DH_{T}}-M_{T}(a^{*})]}_{=0}\ .

(38)

This proves that $D^{*}$ is $\tilde{u}$ -minimal among all near-martingale measures. ∎

Considering that any equivalent true martingale measure is also a near-martingale measure, this result is a formalization of the intuitive notion that in order to avoid statistical arbitrage we do not truly have to find a full martingale measure, but that we only have to “bend” the drifts of our trading instruments enough to be dominated by prevailing trading cost.
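The “bending” of the drift can be made concrete in a one-period, one-asset toy example with exponential utility, where the density is proportional to $\exp(-\lambda a^{*}DH)$ and the optimal position solves the first-order condition in (36). The sketch below is an illustration under these assumptions, not the multi-period method used in the experiments; all names and parameter values are hypothetical, and $\gamma^{\pm}$ are taken as non-negative constants.

```python
import numpy as np

def tilted(dH, a, lam):
    """Weights proportional to exp(-lam * a * dH) and the mean of dH under them."""
    z = -lam * a * dH
    w = np.exp(z - z.max())                     # shift for numerical stability
    w /= w.sum()
    return w, float(w @ dH)

def near_martingale_weights(dH, gamma_plus, gamma_minus, lam=1.0, n_iter=50):
    """One-period position a* and path weights of the near-martingale measure."""
    _, drift = tilted(dH, 0.0, lam)
    if -gamma_minus <= drift <= gamma_plus:
        return 0.0, np.full(dH.size, 1.0 / dH.size)    # drift already inside the spread
    target = gamma_plus if drift > gamma_plus else -gamma_minus
    a = 0.0
    for _ in range(n_iter):                     # Newton iterations on E_tilted[dH] = target
        w, mean = tilted(dH, a, lam)
        var = float(w @ dH ** 2) - mean ** 2
        a += (mean - target) / (lam * var)
    return a, tilted(dH, a, lam)[0]

rng = np.random.default_rng(0)
dH = 0.05 + 0.2 * rng.standard_normal(20_000)   # hypothetical gains with positive drift
a_star, q = near_martingale_weights(dH, gamma_plus=0.01, gamma_minus=0.01)
drift_star = float(q @ dH)                      # drift under the reweighted sample
```

Here the statistical drift of about 0.05 exceeds the marginal ask cost of 0.01, so the optimal position is long and the reweighted drift is pulled down exactly to 0.01 rather than to zero.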

4 Learning to Simulate Risk-Neutral Dynamics

The key insight of our utility-based risk-neutral density construction is that it relies only on solving the optimization problem of finding $a^{*}$ and $y^{*}$ , not on specifying any particular dynamics for the market under the $\mathbb{P}$ measure. Therefore, it can be done in a data-driven, model-agnostic way, lending itself to the application of modern machine learning methods. Specifically, given a set of $N$ samples from a $\mathbb{P}$ market simulator, we may make the sample set risk neutral by numerically solving the optimization problem on the $N$ paths, and then using our formulation to reweight the paths so that the resulting weighted sample is a (near-)martingale. As mentioned above, this is particularly useful in the case of removing statistical arbitrage from a “black box” market simulator, such as the GAN based approach discussed in [1].

Our approach enables the adaptation of GAN and other advanced machine learning approaches so that they can not only simulate realistic samples from the statistical measure, but also from an equivalent risk neutral measure. Moreover, through the choice of utility function, we are able to control the risk neutral measure we construct.

We solve the stochastic control problem (7) through an application of the ‘Deep Hedging’ methods of [3]: we can pose (7) as a reinforcement learning problem and use a neural network to represent our trading policy $a$ , and since the function $F$ is fully differentiable, use stochastic gradient methods to find $a^{*},y^{*}$ , and hence $D^{*}$ .
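To illustrate the mechanics of solving (7) by gradient methods and reweighting the paths, the following toy sketch uses a single period, a single asset, zero transaction costs and the exponential utility $u(x)=(1-e^{-\lambda x})/\lambda$ . These are simplifying assumptions: the neural network policy is replaced by one scalar position, plain full-batch gradient ascent stands in for stochastic gradient training, and the simulated gains and step size are hypothetical.

```python
import numpy as np

lam = 1.0

def u_prime(x):
    """Derivative of the exponential utility u(x) = (1 - exp(-lam * x)) / lam."""
    return np.exp(-lam * x)

rng = np.random.default_rng(0)
dH = 0.05 + 0.2 * rng.standard_normal(20_000)   # hypothetical one-period gains under P

y, a, lr = 0.0, 0.0, 0.5
for _ in range(5_000):
    d = u_prime(y + a * dH)         # pathwise marginal utility
    y += lr * (d.mean() - 1.0)      # dF/dy = E[u'(.)] - 1
    a += lr * (d * dH).mean()       # dF/da = E[u'(.) * DH]

D_star = u_prime(y + a * dH)        # estimated density on each path
weights = D_star / D_star.sum()     # path weights under the estimated Q*
```

At the optimum the two first-order conditions state that the density has unit mean and that the weighted gains have zero mean, which is exactly the martingale property on the sample.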

4.1 Deep Hedging under Risk-Neutral Dynamics

Our primary application is in the pricing and hedging of exotic options via utility-based Deep Hedging. With a portfolio of derivatives represented by the random variable $Z$ to hedge, the Deep Hedging problem under the statistical measure $\mathbb{P}$ is to maximize the optimized certainty equivalent

\mathbb{U}_{\mathbb{P}}(Z):=\sup_{a,y}\ {\mathbb{E}_{\mathbb{P}}}\left[\,{u\big{(}y+Z+{a\star DH_{T}}-C_{T}(a)\big{)}-y}\,\right]

(39)

over strategies $a$ and $y\in\mathbb{R}$ . An optimal solution $a^{*}_{\mathbb{P}}$ is called an optimal hedge for $Z$ . We note that in the presence of statistical arbitrage $\mathbb{U}_{\mathbb{P}}(0)>0$ . Deep Hedging under a near-martingale measure $\mathbb{Q}^{*}$ then is

\mathbb{U}^{*}(Z):=\sup_{a,y}\ \mathbb{E}^{*}\left[u\big{(}y+Z+{a\star DH_{T}}-C_{T}(a)\big{)}-y\right].

(40)

Since under $\mathbb{Q}^{*}$ we have $\mathbb{U}^{*}(0)=0$ we note that $\mathbb{U}^{*}$ represents an indifference price for $Z$ in the sense of [3] section 3.

In the case of hedging under exponential utility with zero transaction costs, it is straightforward to show that the optimal hedge for the derivative $Z$ under the statistical measure can be written as

a^{*}_{\mathbb{P}}=a_{\mathbb{Q}}^{*}+a_{0}^{*}\ ,

(41)

where $a_{\mathbb{Q}}^{*}$ is an optimal hedge for $Z$ under the minimal entropy martingale measure (MEMM), and where $a_{0}^{*}$ is an optimal statistical arbitrage strategy, i.e. an optimal “hedge” for an empty initial portfolio. In this sense we may regard $a^{*}_{\mathbb{Q}}$ as a net hedging strategy for $Z$ .

A consequence of this is that the hedge found by solving the Deep Hedging problem under the statistical measure will be a sum of a true hedge, and a component which does not depend on $Z$ and is simply seeking profitable opportunities in the market. Solving the optimization problem under the risk neutral measure will then directly remove the statistical arbitrage component of the strategy, leaving a “clean” hedge for the derivative, which is not sensitive to the estimation of the mean returns of our hedging instruments.
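The decomposition (41) can be verified numerically in a one-period toy setting. For exponential utility, optimizing over $y$ reduces the problem to minimizing $\log\mathbb{E}[\exp(-\lambda(Z+a\,DH))]$ over $a$ , which the sketch below does by Newton iterations. The payoff, the simulated gains and the helper name are hypothetical, and the one-period, one-asset setting is an assumption made for brevity.

```python
import numpy as np

def optimal_position(dH, Z, weights, lam=1.0, n_iter=50):
    """Newton iterations for argmin_a log E_w[exp(-lam * (Z + a * dH))]."""
    a = 0.0
    for _ in range(n_iter):
        expo = -lam * (Z + a * dH)
        tilt = weights * np.exp(expo - expo.max())   # shift for numerical stability
        tilt /= tilt.sum()
        mean = tilt @ dH
        var = tilt @ dH ** 2 - mean ** 2
        a += mean / (lam * var)                      # Newton step on the convex objective
    return a

rng = np.random.default_rng(0)
n_paths, lam = 20_000, 1.0
dH = 0.05 + 0.2 * rng.standard_normal(n_paths)   # hypothetical one-period gains under P
Z = -np.maximum(dH, 0.0)                         # hypothetical short call-like payoff
p = np.full(n_paths, 1.0 / n_paths)              # statistical measure

a_0 = optimal_position(dH, np.zeros(n_paths), p, lam)   # pure statistical arbitrage
q = p * np.exp(-lam * a_0 * dH)
q /= q.sum()                                            # MEMM weights on the sample
a_P = optimal_position(dH, Z, p, lam)                   # hedge under P
a_Q = optimal_position(dH, Z, q, lam)                   # hedge under the MEMM
```

On the sample, `a_P` agrees with `a_Q + a_0` up to numerical tolerance, so the hedge trained under the reweighted measure is the statistical hedge with the arbitrage component removed.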

5 Numerical implementations

To demonstrate our approach we apply it to two market simulators. First, we discuss a simple, but usable multivariate “PCA” vector autoregressive model for spot and a form of implied volatilities. Secondly, we also present results for a Generative Adversarial Network based simulator based on the ideas presented in [1].

5.1 Vector Autoregressive market simulator

For the first numerical experiment, we build a VAR market simulator as follows. For the simulation we use discrete local volatilities (DLVs) as an arbitrage-free parametrization of option prices. We do not use the underlying model dynamics; the only use of DLVs is arbitrage-free parametrization of the option surface. We briefly recap the relevant notation: assume that $0=\tau_{0}<\tau_{1}<\cdots<\tau_{m}$ are time-to-maturities and $0<x_{1}<\ldots<1<\ldots<x_{n}$ relative strikes.⁷ Also define the additional boundary strikes $0\leq x_{0}\ll x_{1}$ and $x_{n+1}:=1+2x_{n}\gg x_{n}$ .

⁷ See [4] for the use of inhomogeneous strike grids.

For $i=1,\ldots,n$ and $j=1,\ldots,m$ we denote by $C^{j,i}$ the price of the call option with payoff $(S_{\tau_{j}}/S_{0}-x_{i})^{+}$ at maturity $\tau_{j}$ . Define $\Delta^{j,i}:=\frac{C^{j,i+1}-C^{j,i}}{x_{i+1}-x_{i}}$ , $\Gamma^{j,i}:=\Delta^{j,i}-\Delta^{j,{i-1}}$ and $\Theta^{j,i}:=\frac{C^{j,i}-C^{j,i-1}}{\tau_{j}-\tau_{j-1}}$ . The discrete local volatility surface $(\sigma^{j,i})_{j,i}$ for $j=1,\ldots,m$ and $i=1,\ldots,n$ is defined by

\sigma^{j,i}:=\sqrt{\frac{2\,\Theta^{j,i}}{x^{j,i}{}^{2}\Gamma^{j,i}}}\ ,

(42)

where we set $\sigma^{j,i}=\infty$ whenever the square root is imaginary. We also set $\sigma^{j,i}=0$ if $0/0$ occurs. We recall that a given surface of option prices is free of static arbitrage if and only if $\sigma<\infty$ . Moreover, given a surface of finite discrete local volatilities, we can reconstruct the surface of arbitrage-free call prices by solving the implicit finite difference scheme implied by (42). This involves inverting sequentially $m$ tridiagonal matrices, an operation which is “on graph” in modern automatic adjoint differentiation (AAD) machine learning packages such as TensorFlow, see [4] for further details.

For a given time series of vectors of historical log spot returns and log DLVs

Y_{r}=\left(\log\frac{S_{r}}{S_{r-1}},\log\sigma^{1,1}_{r},\ldots,\log\sigma^{m,n}_{r}\right)^{\prime}\ ,

(43)

we estimate a vector autoregression model of the form

Y_{r}=\left(B-A_{1}Y_{r-1}-A_{2}Y_{r-2}\right)dt+\sqrt{dt}Z_{r}\ ,\ \ \ Z_{r}\sim{\cal N}(0,\Sigma)

(44)

where $A_{1}$ and $A_{2}$ are $(mn+1)\times(mn+1)$ coefficient matrices, $B=(B_{0},B_{1},\ldots,B_{mn})^{\prime}$ is an intercept, and $\Sigma$ is a covariance matrix.

Constructing $\mathbb{P}$ – we train the model to historical data from EURO STOXX 50, using standard regression techniques from the Statsmodels Python package [13]. Once the model has been trained, we can simulate new sample paths of log spot returns and discrete local volatilities by sampling new noise variables $Z_{r}$ and stepping the model forward. We then convert the DLVs to option prices using the methods detailed above, so that we can simulate market states of spot and option prices.
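The simulation step can be sketched in plain NumPy by stepping the recursion (44) forward exactly as written. This is an illustration only: it does not cover the estimation with Statsmodels or the conversion of DLVs to option prices, and the function name, the dimensions and all coefficient values below are hypothetical.

```python
import numpy as np

def simulate_var2(B, A1, A2, Sigma, dt, n_steps, rng, y_prev=None, y_prev2=None):
    """Step the recursion (44) forward for n_steps; returns an (n_steps, d) array."""
    d = len(B)
    y1 = np.zeros(d) if y_prev is None else np.asarray(y_prev, dtype=float)
    y2 = np.zeros(d) if y_prev2 is None else np.asarray(y_prev2, dtype=float)
    Z = rng.multivariate_normal(np.zeros(d), Sigma, size=n_steps)   # noise Z_r ~ N(0, Sigma)
    path = np.empty((n_steps, d))
    for r in range(n_steps):
        path[r] = (B - A1 @ y1 - A2 @ y2) * dt + np.sqrt(dt) * Z[r]
        y1, y2 = path[r], y1                    # shift the two lags
    return path

# Hypothetical three-dimensional state: one log return and two log DLVs.
rng = np.random.default_rng(0)
B = np.array([0.0, -0.5, -0.5])
A1 = np.diag([0.0, 5.0, 5.0])
A2 = np.zeros((3, 3))
Sigma = np.diag([0.04, 0.01, 0.01])
paths = simulate_var2(B, A1, A2, Sigma, dt=1.0 / 252.0, n_steps=30, rng=rng)
```

Each call produces one path; repeating it with fresh noise gives the sample set on which the measure change is subsequently learned.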

We generate $10^{5}$ paths of length 30 days from the VAR model, where each path consists of spot and both put and call option prices on a grid of maturities $\{20,40,60\}$ and relative strikes $\{0.85,\ldots,1.15\}$ .

Constructing $\mathbb{Q}^{*}$ – we construct the risk neutral measure $\mathbb{Q}^{*}$ by solving (34) with the exponential utility, using proportional transaction costs for all instruments set to $\gamma=0.001$ . To parametrize our policy action, we use a two layer feedforward neural network, with 64 units in each layer and ReLU activation functions. We train for 2000 epochs on the training set of $10^{5}$ paths.
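For orientation, the forward pass of such a policy network can be sketched in NumPy. We read “two layer” as two hidden layers of 64 ReLU units followed by a linear output with one action per instrument, which is an interpretation on our part. The feature and instrument counts, the initialization and the function names are hypothetical, and training is omitted.

```python
import numpy as np

def init_policy(n_features, n_instruments, n_units=64, seed=0):
    """Random weights and zero biases for two hidden layers and a linear output."""
    rng = np.random.default_rng(seed)
    sizes = [n_features, n_units, n_units, n_instruments]
    return [(rng.standard_normal((m, n)) * np.sqrt(2.0 / m), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def policy(params, state):
    """Map a batch of market states to trading actions."""
    h = np.asarray(state, dtype=float)
    for W, b in params[:-1]:
        h = np.maximum(h @ W + b, 0.0)      # hidden layer with ReLU activation
    W, b = params[-1]
    return h @ W + b                        # linear output: one action per instrument

params = init_policy(n_features=10, n_instruments=7)
actions = policy(params, np.zeros((5, 10)))   # batch of 5 hypothetical states
```

In practice the same architecture would be expressed in an automatic differentiation framework so that the objective can be optimized by stochastic gradient methods.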

Assessing Performance – Figure 1 compares out-of-sample the expected value of the option payoffs vs. their prices for the full grid of calls and puts under both the statistical and the risk-neutral measure in relation to trading cost.

Figure 1: Average realised drift for call options (top) and put options (bottom) under the $\mathbb{P}$ market simulator (left) and $\mathbb{Q}^{*}$ simulator (right), by strike and maturity.

The expected payoff under the changed measure has been flattened to zero, and now lies within the transaction cost level, so that the tradable drift has been removed.

To further confirm that statistical arbitrage has indeed been eliminated from the market simulator under $\mathbb{Q}^{*}$ , we train a second strategy under the new measure, with identical neural network architecture and the same utility function but increased transaction cost $\gamma=0.002$ . Figure 2 shows the distributions of terminal gains of the respective estimated optimal strategies under $\mathbb{P}$ and $\mathbb{Q}^{*}$ . We compare the method using the exponential utility, and the adjusted mean-volatility utility $u(x):=(1+\lambda x-\sqrt{1+\lambda^{2}x^{2}})/\lambda$ . In both cases, the distribution of gains is now tightly centred at zero, confirming that statistical arbitrage has been removed.
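The two utility functions can be written down directly. In the sketch below the exponential utility is normalized as $u(x)=(1-e^{-\lambda x})/\lambda$ , which is our assumption since the normalization is not restated in this section; the function names and the evaluation grid are hypothetical.

```python
import numpy as np

def exp_utility(x, lam=1.0):
    """Exponential utility, normalized so that u(0) = 0 and u'(0) = 1 (assumed)."""
    return (1.0 - np.exp(-lam * x)) / lam

def mean_vol_utility(x, lam=1.0):
    """Adjusted mean-volatility utility u(x) = (1 + lam*x - sqrt(1 + lam^2 x^2)) / lam."""
    return (1.0 + lam * x - np.sqrt(1.0 + (lam * x) ** 2)) / lam

def mean_vol_utility_prime(x, lam=1.0):
    """Derivative of the adjusted mean-volatility utility."""
    return 1.0 - lam * x / np.sqrt(1.0 + (lam * x) ** 2)

x = np.linspace(-5.0, 5.0, 101)
values = mean_vol_utility(x)
slopes = mean_vol_utility_prime(x)
```

Both functions are increasing and concave with $u(0)=0$ and $u^{\prime}(0)=1$ . The derivative of the adjusted mean-volatility utility stays between 0 and 2, so the implied density $u^{\prime}(\cdot)$ is bounded, whereas the exponential density is not.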

5.2 GAN market simulator

To demonstrate the flexibility of our approach, we now apply it to a more data-driven simulator for spot and option prices based on Generative Adversarial Networks (GANs) [8] as in [1].

We illustrate the effect on Deep Hedging of changing measure with the following numerical experiment. We hedge a short position in a digital call option, with market instruments being the spot and at-the-money call options with maturities $20$ and $40$ days. We first train a network under the zero portfolio to find a maximal statistical arbitrage strategy, then use this to construct the risk neutral density. We then train two Deep Hedging networks to hedge the digital, one under the original, unweighted, market and one under the risk neutral market. All networks are trained to maximize exponential utility. Figure 3 compares the final hedged PNL of the two strategies on the left, and the PNL of the strategies, with the statistical arbitrage component subtracted, on the right (i.e. $a_{\mathbb{Q}}^{*}$ vs. $a_{\mathbb{P}}^{*}-a_{0}^{*}$ ). The distribution of PNL from the risk neutral hedge is clearly less heavy-tailed, and the right-hand plot demonstrates that we have removed the statistical arbitrage element as the distributions now align.

Robustness of $\mathbb{Q}^{*}$ – by removing the statistical arbitrage component of the strategy, the risk neutral hedge $a_{\mathbb{Q}}^{*}$ represents a more robust hedge with respect to uncertainty in the market simulator, i.e. when the future distribution of the market at model deployment differs slightly from the distribution of the training data generated by the market simulator. Consider the case where the future market returns follow a distribution $\tilde{\mathbb{P}}$ , which is similar to $\mathbb{P}$ in the sense that $H(\tilde{\mathbb{P}}|\mathbb{P})\leq c$ for some small $c$ where $H$ is the relative entropy. For illustration, we can construct such a measure by simply perturbing the weights of the simulated paths slightly.

In particular, we consider perturbations which are unfavourable for the original strategy, where the strategy was over-reliant on the perceived drift in the market, which is no longer present under $\tilde{\mathbb{P}}$ . Figure 4 shows the new PNL distributions under measures $\tilde{\mathbb{P}}$ with $H(\tilde{\mathbb{P}}|\mathbb{P})=c$ for $c=0.05$ and $c=0.5$ . What is striking is that a relatively small shift in the measure can significantly worsen the distribution of hedged PNL of the original Deep Hedging model, but that the distribution of PNL of the model trained on the risk neutral measure remains practically invariant, indicating that the model performs more consistently with respect to uncertainty. Naturally, the method provides robustness against estimation errors for mean returns of the underlying assets.
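One way to build such an unfavourable perturbation on a finite sample is to tilt the path weights exponentially against the PNL of a strategy and to choose the tilt so that the relative entropy equals a prescribed budget $c$ . The sketch below illustrates this construction; it is not necessarily the perturbation used for Figure 4, and the function names and the simulated PNL are hypothetical.

```python
import numpy as np

def relative_entropy(w_new, w_old):
    """H(P~|P) = sum_k w_new[k] * log(w_new[k] / w_old[k]) on a finite sample."""
    mask = w_new > 0
    return float(np.sum(w_new[mask] * np.log(w_new[mask] / w_old[mask])))

def tilt_weights(pnl, theta):
    """Weights proportional to exp(-theta * pnl): paths with low PNL gain mass."""
    z = -theta * pnl
    w = np.exp(z - z.max())                 # shift for numerical stability
    return w / w.sum()

def adverse_perturbation(pnl, c, n_bisect=200):
    """Exponentially tilted weights with relative entropy c to the uniform measure."""
    base = np.full(pnl.size, 1.0 / pnl.size)
    lo, hi = 0.0, 1.0
    while relative_entropy(tilt_weights(pnl, hi), base) < c:
        hi *= 2.0                           # bracket the tilt parameter
        if hi > 1e6:
            raise ValueError("entropy budget c cannot be reached on this sample")
    for _ in range(n_bisect):               # bisection on the tilt parameter
        mid = 0.5 * (lo + hi)
        if relative_entropy(tilt_weights(pnl, mid), base) < c:
            lo = mid
        else:
            hi = mid
    return tilt_weights(pnl, hi)

rng = np.random.default_rng(0)
pnl = 0.1 + rng.standard_normal(10_000)     # hypothetical hedged PNL under P
base = np.full(pnl.size, 1.0 / pnl.size)
w_tilde = adverse_perturbation(pnl, c=0.05)
entropy = relative_entropy(w_tilde, base)
shifted_mean = float(w_tilde @ pnl)         # expected PNL under the perturbed measure
```

Evaluating the PNL of both hedges under `w_tilde` instead of the uniform weights then shows how sensitive each strategy is to a shift of the measure within the entropy budget.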

Conclusion

We have presented a numerically efficient method for computing a risk-neutral density for a set of paths over a number of time steps. Our method is applicable to paths of derivatives and option prices in particular, hence we effectively provide a framework for statistically learned stochastic implied volatility via the application of machine learning tools. Our method is generic and does not depend on the market simulator itself, except that it requires that the simulator does not produce classic arbitrage opportunities. It also caters naturally for transaction costs and trading constraints, and is easily extended to multiple assets.

The method is particularly useful to introduce robustness to a utility-based machine learning approach to the hedging of derivatives, where the use of simulated data is essential to train a ‘Deep Hedging’ neural network model. If trained directly on data from the statistical measure, in addition to risk management of the derivative portfolio, the Deep Hedging agent will pursue statistical arbitrage opportunities that appear in the data, thus the hedge action will be polluted by drifts present in the simulated data. By applying our method, we remove any statistical arbitrage opportunities from the simulated data, resulting in a policy from the Deep Hedging agent that seeks to only manage the risk of the derivative portfolio, without exploiting any drifts. This in turn makes the suggested hedge more robust to any uncertainty inherent in the simulated data.

References

[1] Lanjun Bai, Hans Buehler, Magnus Wiese and Ben Wood “Deep Hedging: Learning to Simulate Equity Option Markets” In SSRN, 2019 URL: https://ssrn.com/abstract=3470756
[2] Aharon Ben-Tal and Marc Teboulle “An old-new concept of convex risk measures: The optimized certainty equivalent” In Mathematical Finance 17.3 Wiley Online Library, 2007, pp. 449–476
[3] H. Buehler, L. Gonon, J. Teichmann and B. Wood “Deep Hedging” In Quantitative Finance 0.0 Routledge, 2019, pp. 1–21 URL: https://ssrn.com/abstract=3120710
[4] Hans Buehler and Evgeny Ryskin “Discrete Local Volatility for Large Time Steps (Short Version)”, 2016 URL: https://ssrn.com/abstract=2783409
[5] Hans Buehler et al. “A data-driven market simulator for small data environments” In Available at SSRN 3632431, 2020
[6] Hans Föllmer and Alexander Schied “Stochastic Finance” De Gruyter, 2008 DOI: 10.1515/9783110212075
[7] Marco Frittelli “The minimal entropy martingale measure and the valuation problem in incomplete markets” In Mathematical finance 10.1 Wiley Online Library, 2000, pp. 39–52
[8] Ian Goodfellow et al. “Generative adversarial nets” In Advances in neural information processing systems 27, 2014
[9] V. Henderson and D. Hobson “Utility indifference pricing : an overview”, 2009 URL: https://warwick.ac.uk/fac/sci/statistics/staff/academic-research/henderson/publications/indifference_survey.pdf
[10] Jan Kallsen and Paul Krühner “On a Heath-Jarrow-Morton approach for Stock Options” In Finance and Stochastics 19, 2015, pp. 583–615 URL: https://arxiv.org/pdf/1305.5621.pdf
[11] Harry Markowitz “Portfolio Selection” In The Journal of Finance 7.1, 1952, pp. 77–91 DOI: 10.2307/2975974
[12] Philipp Schönbucher “A market model for stochastic implied volatility” In Phil. Trans. R. Soc. A., 1999, pp. 2071–2092 URL: https://ssrn.com/abstract=182775
[13] Skipper Seabold and Josef Perktold “statsmodels: Econometric and statistical modeling with python” In 9th Python in Science Conference, 2010
[14] Sasha Stoikov “The micro-price: a high-frequency estimator of future prices” In Quantitative Finance 18.12 Routledge, 2018, pp. 1959–1966
[15] Johannes Wissel “Arbitrage-free market models for option prices”, 2007 URL: http://www.nccr-finrisk.uzh.ch/media/pdf/wp/WP428_D1.pdf

Disclaimer

Opinions and estimates constitute our judgement as of the date of this Material, are for informational purposes only and are subject to change without notice. It is not a research report and is not intended as such. Past performance is not indicative of future results. This Material is not the product of J.P. Morgan’s Research Department and therefore, has not been prepared in accordance with legal requirements to promote the independence of research, including but not limited to, the prohibition on the dealing ahead of the dissemination of investment research. This Material is not intended as research, a recommendation, advice, offer or solicitation for the purchase or sale of any financial product or service, or to be used in any way for evaluating the merits of participating in any transaction. Please consult your own advisors regarding legal, tax, accounting or any other aspects including suitability implications for your particular circumstances. J.P. Morgan disclaims any responsibility or liability whatsoever for the quality, accuracy or completeness of the information herein, and for any reliance on, or use of this material in any way.
Important disclosures at: www.jpmorgan.com/disclosures

0\leq u^{\prime}(y^{*}+{a^{*}\star DH_{T}})\leq u^{\prime}\left(y^{*}+{a^{*}\star DH_{T}}\right)\|DH\|\mbox{{\sl 1}}_{\|DH\|>1}\,+u^{\prime}\left(y^{*}-\mbox{ess $\sup_{it}$}\|a^{*i}_{t}\|\right)\mbox{{\sl 1}}_{\|DH\|\leq 1}\,\in L^{1}