
Least Squares Monte Carlo applied to Dynamic Monetary Utility Functions

Hampus Engsner111Corresponding author. Department of Mathematics, Stockholm University, SE-10691 Stockholm, Sweden. e-mail: hampus.engsner@math.su.se
Abstract

In this paper we explore ways of numerically computing recursive dynamic monetary risk measures and utility functions. Computationally, this problem suffers from the curse of dimensionality and nested simulations are unfeasible if there are more than two time steps. The approach considered in this paper is to use a Least Squares Monte Carlo (LSM) algorithm to tackle this problem, a method which has been primarily considered for valuing American derivatives, or more general stopping time problems, as these also give rise to backward recursions with corresponding challenges in terms of numerical computation. We give some overarching consistency results for the LSM algorithm in a general setting as well as explore numerically its performance for recursive Cost-of-Capital valuation, a special case of a dynamic monetary utility function.

keywords: Monte Carlo algorithms, least-squares regression, multi-period valuation, dynamic utility functions

1 Introduction

Dynamic monetary risk measures and utility functions, as described for instance in [1] and [4], are time consistent if and only if they satisfy a recursive relationship (see for instance [5], [19]). In the case of time-consistent valuations of cash flows, often in an insurance setting (e.g. [12], [13], [17], [19], [20], [18], [11]), analogous recursions also appear. Recursive relationships also occur as properties of solutions to optimal stopping problems, of which valuation of American derivatives is a special case. It is well known that numerical solutions to these kinds of recursions suffer from “the curse of dimensionality”: as the underlying stochastic process generating the flow of information becomes high-dimensional, direct computations of solutions of these recursions prove unfeasible.

To make the objective of this paper clearer, consider a probability space (Ω,,)(\Omega,\mathcal{F},\mathbb{P}), a dd-dimensional Markov chain (St)t=0T(S_{t})_{t=0}^{T} in L2()L^{2}(\mathcal{F}) and its natural filtration (t)t=0T(\mathcal{F}_{t})_{t=0}^{T}. We are interested in computing V0V_{0} given as the solution to the following recursion

Vt=φt(f(St+1)+Vt+1),VT=0,\displaystyle V_{t}=\varphi_{t}(f(S_{t+1})+V_{t+1}),\quad V_{T}=0, (1)

where, for each tt, φt:L2(t+1)L2(t)\varphi_{t}:L^{2}(\mathcal{F}_{t+1})\to L^{2}(\mathcal{F}_{t}) is a given law-invariant mapping (see Section 2 and Definition 3). Recursions such as (1) arise when describing time-consistent dynamic monetary risk measures/utility functions (see e.g. [4]). Alternatively, we may be interested in computing V0V_{0} given as the solution to the following recursion

Vt=f(St)𝔼[Vt+1t],VT=f(ST),\displaystyle V_{t}=f(S_{t})\lor\mathbb{E}[V_{t+1}\mid\mathcal{F}_{t}],\quad V_{T}=f(S_{T}), (2)

where f:df:\mathbb{R}^{d}\to\mathbb{R} is a given function. Recursions such as (2) arise when solving discrete-time optimal stopping problems or valuing American-style derivatives (see e.g. [14] and [16]). In this article we will focus on the recursive expression (1). In either case, due to the Markovian assumptions, we expect VtV_{t} to be determined by some deterministic function of the state StS_{t} at time tt. The curse of dimensionality can now be succinctly put as the statement that as the dimension dd grows, direct computation of VtV_{t} often becomes unfeasible. Additionally, brute-force valuation via a nested Monte Carlo simulation, discussed in [3] and [2], is only a feasible option when T=2T=2, as the number of required simulations would grow exponentially with TT. One approach to tackle this problem is the Least Squares Monte Carlo (LSM) algorithm, notably used in [16] to value American-style derivatives, which consists of approximating Vt+1V_{t+1} in either (1) or (2) as a linear combination of basis functions of the state St+1S_{t+1} via least-squares regression. While most often considered for optimal stopping problems ([16], [22], [7], [21], [14], [23], [24], [25]), it has also been used recently in [18] for the purpose of actuarial valuation, with respect to a recursive relationship in line with (1).

The paper is organized as follows. In Section 2 we introduce the mathematical definitions and notation that allow us to describe the LSM algorithm in our setting mathematically, as well as to formulate theoretical results. Section 3 contains consistency results with respect to computing (1), both in a general setting, requiring only an assumption of continuity in L2L^{2} norm, and for the special case of a Cost-of-Capital valuation, studied in [12] and [13], under the assumption that capital requirements are given by the risk measure Value-at-Risk, in line with Solvency II. The lack of convenient continuity properties of Value-at-Risk poses certain challenges, which we address. Section 4 investigates the numerical performance of the LSM algorithm on valuation problems for a set of models for liability cash flows. Here some effort is also put into evaluating and validating the LSM algorithm’s performance, as this is not trivial for the considered cases.

2 Mathematical setup

Consider a probability space (Ω,,)(\Omega,\mathcal{F},\mathbb{P}). On this space we consider two filtrations (t)t=0T(\mathcal{F}_{t})_{t=0}^{T}, with 0={,Ω}\mathcal{F}_{0}=\{\emptyset,\Omega\}, and (t)t=0T(\mathcal{H}_{t})_{t=0}^{T}. The latter filtration is an initial expansion of the former: take 𝒟\mathcal{D}\subset\mathcal{F} and set t:=t𝒟\mathcal{H}_{t}:=\mathcal{F}_{t}\vee\mathcal{D}. 𝒟\mathcal{D} will later correspond to the σ\sigma-field generated by initially simulated data needed for numerical approximations. Define L2(t)L^{2}(\mathcal{H}_{t}) as the space of t\mathcal{H}_{t}-measurable random variables ZZ with 𝔼[Z2]<\mathbb{E}[Z^{2}]<\infty. The subspace L2(t)L2(t)L^{2}(\mathcal{F}_{t})\subset L^{2}(\mathcal{H}_{t}) is defined analogously. All equalities and inequalities between random variables should be interpreted in the \mathbb{P}-almost sure sense.

We assume that the probability space supports a Markov chain S=(St)t=0TS=(S_{t})_{t=0}^{T} on (d)T(\mathbb{R}^{d})^{T}, where S0S_{0} is constant, and an iid sequence D=(S(i))iD=(S^{(i)})_{i\in\mathbb{N}}, independent of SS, where, for each ii, S(i)=(St(i))t=0TS^{(i)}=(S^{(i)}_{t})_{t=0}^{T} has independent components with (St(i))=(St)\mathcal{L}(S^{(i)}_{t})=\mathcal{L}(S_{t}) (equal in distribution). DD will represent possible initially simulated data and we set 𝒟=σ(D)\mathcal{D}=\sigma(D). The actual simulated data will be a finite sample and we write Dn:=(S(i))i=1nD_{n}:=(S^{(i)})_{i=1}^{n}. For ZL2()Z\in L^{2}(\mathcal{F}) we write Z2:=𝔼[|Z|20]12\|Z\|_{2}:=\mathbb{E}[|Z|^{2}\mid\mathcal{H}_{0}]^{\frac{1}{2}}. Notice that Z2\|Z\|_{2} is a nonrandom number if ZZ is independent of DD.
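As a concrete illustration of this simulation setup, the following sketch (our own minimal example; the paper does not fix a model) draws non-path-dependent training data D_M for a one-dimensional setting where S is a Gaussian random walk: for each t it draws M iid copies from the marginal law of S_t, independently across t, in line with the requirement that each copy S^(i) have independent components with the correct marginal distributions.

```python
import numpy as np

def simulate_training_data(T, d, M, rng):
    """Training data D_M for a d-dimensional Gaussian random walk S with
    S_0 = 0: for each t = 1, ..., T, M iid draws from the marginal law of
    S_t (which is N(0, t*I)), drawn independently across t.  This mirrors
    the assumption that each copy S^(i) has independent components with
    the law of S_t^(i) equal to that of S_t, i.e. the data are not
    path dependent."""
    return [np.sqrt(t) * rng.normal(size=(M, d)) for t in range(1, T + 1)]

rng = np.random.default_rng(0)
T, d, M = 4, 2, 50_000
D_M = simulate_training_data(T, d, M, rng)  # D_M[t - 1] holds the draws of S_t
```

The random-walk dynamics and the sample size are illustrative assumptions; any Markov chain with samplable marginals fits the same scheme.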

The mappings ρt\rho_{t} and φt\varphi_{t} appearing in Definitions 1 and 2 below can be defined analogously as mappings from Lp(t+1)L^{p}(\mathcal{H}_{t+1}) to Lp(t)L^{p}(\mathcal{H}_{t}) for p2p\neq 2. However, p=2p=2 will be the relevant choice for the applications treated subsequently.

Definition 1.

A dynamic monetary risk measure is a sequence (ρt)t=0T1(\rho_{t})_{t=0}^{T-1} of mappings ρt:L2(t+1)L2(t)\rho_{t}:L^{2}(\mathcal{H}_{t+1})\to L^{2}(\mathcal{H}_{t}) satisfying

if λL2(t) and YL2(t+1), then ρt(Y+λ)=ρt(Y)λ,\displaystyle\textrm{if }\lambda\in L^{2}(\mathcal{H}_{t})\textrm{ and }Y\in L^{2}(\mathcal{H}_{t+1}),\textrm{ then }\rho_{t}(Y+\lambda)=\rho_{t}(Y)-\lambda, (3)
if Y,Y~L2(t+1) and YY~, then ρt(Y)ρt(Y~),\displaystyle\textrm{if }Y,\widetilde{Y}\in L^{2}(\mathcal{H}_{t+1})\textrm{ and }Y\leq\widetilde{Y},\textrm{ then }\rho_{t}(Y)\geq\rho_{t}(\widetilde{Y}), (4)
ρt(0)=0.\displaystyle\rho_{t}(0)=0. (5)

The elements ρt\rho_{t} of the dynamic monetary risk measure (ρt)t=0T1(\rho_{t})_{t=0}^{T-1} are called conditional monetary risk measures.

Definition 2.

A dynamic monetary utility function is a sequence (φt)t=0T1(\varphi_{t})_{t=0}^{T-1} of mappings φt:L2(t+1)L2(t)\varphi_{t}:L^{2}(\mathcal{H}_{t+1})\to L^{2}(\mathcal{H}_{t}) satisfying

if λL2(t) and YL2(t+1), then φt(Y+λ)=φt(Y)+λ,\displaystyle\textrm{if }\lambda\in L^{2}(\mathcal{H}_{t})\textrm{ and }Y\in L^{2}(\mathcal{H}_{t+1}),\textrm{ then }\varphi_{t}(Y+\lambda)=\varphi_{t}(Y)+\lambda, (6)
if Y,Y~L2(t+1) and YY~, then φt(Y)φt(Y~),\displaystyle\textrm{if }Y,\widetilde{Y}\in L^{2}(\mathcal{H}_{t+1})\textrm{ and }Y\leq\widetilde{Y},\textrm{ then }\varphi_{t}(Y)\leq\varphi_{t}(\widetilde{Y}), (7)
φt(0)=0.\displaystyle\varphi_{t}(0)=0. (8)

Note that if (ρt)t=0T1(\rho_{t})_{t=0}^{T-1} is a dynamic monetary risk measure, (ρt())t=0T1(\rho_{t}(-\cdot))_{t=0}^{T-1} is a dynamic monetary utility function. In what follows we will focus on dynamic monetary utility functions of the form

φt(Y)=ρt(Y)11+ηt𝔼[(ρt(Y)Y)+t],\displaystyle\varphi_{t}(Y)=\rho_{t}(-Y)-\frac{1}{1+\eta_{t}}\mathbb{E}\big{[}\big{(}\rho_{t}(-Y)-Y\big{)}^{+}\mid\mathcal{H}_{t}\big{]}, (9)

where (ρt)t=0T1(\rho_{t})_{t=0}^{T-1} is a dynamic monetary risk measure in the sense of Definition 1 and (ηt)t=0T1(\eta_{t})_{t=0}^{T-1} is a sequence of nonrandom numbers in (0,1)(0,1). We may consider a more general version of this dynamic monetary utility function by allowing (ηt)t=0T1(\eta_{t})_{t=0}^{T-1} to be an (t)t=0T1(\mathcal{F}_{t})_{t=0}^{T-1}-adapted sequence; however, we choose the simpler version here. That (φt)t=0T1(\varphi_{t})_{t=0}^{T-1} is indeed a dynamic monetary utility function is shown in [12].
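For intuition, at t = 0, where the σ-field is trivial, the map (9) can be evaluated directly from a Monte Carlo sample. The sketch below (ours, for illustration) takes ρ to be Value-at-Risk at level α, so that ρ(−Y) is the empirical (1 − α)-quantile of Y; the values α = 0.01 and η = 0.06 are illustrative choices.

```python
import numpy as np

def phi_0(y, alpha=0.01, eta=0.06):
    """Empirical version of (9) at t = 0 with rho = VaR at level alpha:
    phi_0(Y) = VaR(-Y) - E[(VaR(-Y) - Y)^+] / (1 + eta)."""
    var = np.quantile(y, 1.0 - alpha)   # rho_0(-Y): empirical (1-alpha)-quantile
    return var - np.mean(np.maximum(var - y, 0.0)) / (1.0 + eta)

rng = np.random.default_rng(1)
y = rng.normal(size=100_000)  # illustrative cash-flow sample
v0 = phi_0(y)
```

Note that cash additivity (6) and normalization (8) hold exactly for this empirical version: shifting the sample by a constant shifts phi_0 by the same constant, and phi_0 of the zero sample is zero.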

We will later consider conditional monetary risk measures that are conditionally law invariant in the sense of Definition 3 below. Conditional law invariance will then be inherited by φt\varphi_{t} in (9).

Definition 3.

A mapping φt:L2(t+1)L2(t)\varphi_{t}:L^{2}(\mathcal{H}_{t+1})\to L^{2}(\mathcal{H}_{t}) is called law invariant if φt(X)=φt(Y)\varphi_{t}(X)=\varphi_{t}(Y) whenever (Xt)=(Yt)\mathbb{P}(X\in\cdot\mid\mathcal{H}_{t})=\mathbb{P}(Y\in\cdot\mid\mathcal{H}_{t}).

We now define the value process corresponding to a dynamic monetary utility function (φt)t=0T1(\varphi_{t})_{t=0}^{T-1} in the sense of Definition 2 with respect to the filtration (t)t=0T1(\mathcal{F}_{t})_{t=0}^{T-1} instead of (t)t=0T1(\mathcal{H}_{t})_{t=0}^{T-1}. The use of the smaller filtration is due to the fact that a value process of the sort appearing in Definition 4 is the theoretical object that we aim to approximate well by the methods considered in this paper.

Definition 4.

Let L:=(Lt)t=1TL:=(L_{t})_{t=1}^{T} with LtL2(t)L_{t}\in L^{2}(\mathcal{F}_{t}) for all tt. Let (φt)t=0T1(\varphi_{t})_{t=0}^{T-1} be a dynamic monetary utility function with respect to (t)t=0T1(\mathcal{F}_{t})_{t=0}^{T-1}. Let

Vt(L,φ):=φt(Lt+1+Vt+1(L,φ)),VT(L,φ):=0.\displaystyle V_{t}(L,\varphi):=\varphi_{t}(L_{t+1}+V_{t+1}(L,\varphi)),\quad V_{T}(L,\varphi):=0. (10)

We refer to Vt(L,φ)V_{t}(L,\varphi) as the time tt φ\varphi-value of LL.

Whenever it will cause no confusion, we will suppress the argument φ\varphi in Vt(L,φ)V_{t}(L,\varphi) in order to make the expressions less notationally heavy.

Remark 1.

Letting ρ\rho be a dynamic monetary risk measure and letting φ\varphi be a dynamic monetary utility function, s=1tLsVt(L,ρ)-\sum_{s=1}^{t}L_{s}-V_{t}(-L,-\rho) will be a conditional monetary risk measure on the cash flow LL in the sense of [4] and likewise s=1tLs+Vt(L,φ)\sum_{s=1}^{t}L_{s}+V_{t}(L,\varphi) will be a conditional monetary utility function in the sense of [4], with LL being interpreted as a process of incremental cash flows. If LL is a liability cash flow, we may write the risk measure of LL as s=1tLs+Vt(L,ρ)\sum_{s=1}^{t}L_{s}+V_{t}(L,\rho). Importantly, any time-consistent dynamic monetary utility function/risk measure may be written in this way (see e.g. [5]). Often convexity or subadditivity is added to the list of desired properties in Definitions 1 and 2 (see e.g. [1], [4], [5]).

2.1 The approximation framework

For t=1,,Tt=1,\dots,T, consider a sequence of functions {1,Φt,1,Φt,2,}\{1,\Phi_{t,1},\Phi_{t,2},\dots\}, where for each ii\in\mathbb{N}, Φt,i:d\Phi_{t,i}:\mathbb{R}^{d}\to\mathbb{R} has the property Φt,i(St)L2(t)\Phi_{t,i}(S_{t})\in L^{2}(\mathcal{F}_{t}) and the set {1,Φt,1(St),Φt,2(St),}\{1,\Phi_{t,1}(S_{t}),\Phi_{t,2}(S_{t}),\dots\} consists of a.s. linearly independent random variables. We define the approximation space t,N\mathcal{B}_{t,N} and its corresponding L2L^{2} projection operator Pt,N:L2(t)t,NP_{\mathcal{B}_{t,N}}:L^{2}(\mathcal{H}_{t})\to\mathcal{B}_{t,N} as follows: for NN\in\mathbb{N} and t{0,,T}t\in\{0,\dots,T\},

t,N:=span{1,Φt,1(St),,Φt,N(St)},\displaystyle\mathcal{B}_{t,N}:=\mathrm{span}\{1,\Phi_{t,1}(S_{t}),\dots,\Phi_{t,N}(S_{t})\}, (11)
Pt,NZt:=arginfBt,NZtB2.\displaystyle P_{\mathcal{B}_{t,N}}Z_{t}:=\arg\inf_{B\in\mathcal{B}_{t,N}}\|Z_{t}-B\|_{2}. (12)

Defining 𝚽t,N:=(1,Φt,1(St),,Φt,N(St))T\mathbf{\Phi}_{t,N}:=(1,\Phi_{t,1}(S_{t}),\dots,\Phi_{t,N}(S_{t}))^{\mathrm{T}}, note that the unique minimizer in (12) is given by Pt,NZt:=βt,N,ZtT𝚽t,NP_{\mathcal{B}_{t,N}}Z_{t}:=\beta_{t,N,Z_{t}}^{\mathrm{T}}\mathbf{\Phi}_{t,N}, with

βt,N,Zt=𝔼[𝚽t,N𝚽t,NT0]1𝔼[𝚽t,NZt0],\displaystyle\beta_{t,N,Z_{t}}=\mathbb{E}\big{[}\mathbf{\Phi}_{t,N}\mathbf{\Phi}_{t,N}^{{\mathrm{T}}}\mid\mathcal{H}_{0}\big{]}^{-1}\mathbb{E}\big{[}\mathbf{\Phi}_{t,N}Z_{t}\mid\mathcal{H}_{0}\big{]}, (13)

where the expected value of a vector or matrix is interpreted elementwise. Note that if ZtZ_{t} in (13) is independent of the initial data DD, then βt,N,Zt\beta_{t,N,Z_{t}} is a nonrandom vector. Indeed, we will only apply the operator Pt,NP_{\mathcal{B}_{t,N}} to random variables ZtZ_{t} independent of DD.

For each tt, consider a nonrandom function ztz_{t} such that Zt=zt(DM,St)L2(t)Z_{t}=z_{t}(D_{M},S_{t})\in L^{2}(\mathcal{H}_{t}). For MM\in\mathbb{N}, let

𝚽t,N(M):=(1Φt,1(St(1))Φt,N(St(1))1Φt,1(St(M))Φt,N(St(M))),\displaystyle\mathbf{\Phi}^{(M)}_{t,N}:=\left(\begin{array}[]{cccc}1&\Phi_{t,1}(S^{(1)}_{t})&\dots&\Phi_{t,N}(S^{(1)}_{t})\\ \vdots&\dots&\dots&\vdots\\ 1&\Phi_{t,1}(S^{(M)}_{t})&\dots&\Phi_{t,N}(S^{(M)}_{t})\end{array}\right),
Zt(M):=(zt(DM,St(1))zt(DM,St(M)))\displaystyle Z_{t}^{(M)}:=\left(\begin{array}[]{c}z_{t}(D_{M},S^{(1)}_{t})\\ \vdots\\ z_{t}(D_{M},S^{(M)}_{t})\end{array}\right)

and define

β^t,N,Zt(M)\displaystyle\widehat{\beta}^{(M)}_{t,N,Z_{t}} :=((𝚽t,N(M))T𝚽t,N(M))1(𝚽t,N(M))TZt(M),\displaystyle:=\Big{(}\big{(}\mathbf{\Phi}^{(M)}_{t,N}\big{)}^{\mathrm{T}}\mathbf{\Phi}^{(M)}_{t,N}\Big{)}^{-1}\big{(}\mathbf{\Phi}^{(M)}_{t,N}\big{)}^{\mathrm{T}}Z_{t}^{(M)}, (14)
Pt,N(M)Zt\displaystyle P_{\mathcal{B}_{t,N}}^{(M)}Z_{t} :=(β^t,N,Zt(M))T𝚽t,N.\displaystyle:=(\widehat{\beta}^{(M)}_{t,N,Z_{t}})^{\mathrm{T}}\mathbf{\Phi}_{t,N}. (15)

Notice that β^t,N,Zt(M)\widehat{\beta}^{(M)}_{t,N,Z_{t}} is independent of SS and is the standard OLS estimator of βt,N,Zt\beta_{t,N,Z_{t}} in (13). Notice also that 𝚽t,N\mathbf{\Phi}_{t,N} is independent of DD. With the above definitions we can define the Least Squares Monte Carlo (LSM) algorithm for approximating the value V0(L,φ)V_{0}(L,\varphi) given in Definition 4.
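The OLS estimator (14) can be computed without forming the matrix inverse explicitly; a least-squares solver is numerically preferable. A minimal sketch (ours; the monomial basis functions and the response are illustrative choices not prescribed by the text):

```python
import numpy as np

rng = np.random.default_rng(2)
M = 5_000
s = rng.normal(size=M)                       # sample of the state S_t

# Design matrix Phi^(M)_{t,N} with N = 3 monomial basis functions plus 1
Phi = np.column_stack([np.ones(M), s, s**2, s**3])
z = np.sin(s) + 0.1 * rng.normal(size=M)     # illustrative response z_t(D_M, S_t)

# OLS estimator (14), beta-hat = (Phi^T Phi)^{-1} Phi^T Z, via lstsq
beta_hat, *_ = np.linalg.lstsq(Phi, z, rcond=None)
fitted = Phi @ beta_hat                      # the projection evaluated on the sample
```

By construction the residual z - fitted is orthogonal to every column of the design matrix, which is the defining property of the projection in (12).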

Let (φt)t=0T1(\varphi_{t})_{t=0}^{T-1} be a sequence of law-invariant mappings φt:L2(t+1)L2(t)\varphi_{t}:L^{2}(\mathcal{H}_{t+1})\to L^{2}(\mathcal{H}_{t}). Consider a stochastic process L:=(Lt)t=1TL:=(L_{t})_{t=1}^{T}, where Lt=gt(St)L2(t)L_{t}=g_{t}(S_{t})\in L^{2}(\mathcal{F}_{t}) for all tt for some nonrandom functions gt:dg_{t}:\mathbb{R}^{d}\to\mathbb{R}. The goal is to estimate the values (Vt(L))t=0T(V_{t}(L))_{t=0}^{T} given by Definition 4. Note that the sought values Vt(L)V_{t}(L) are independent of DD and thus, by the law-invariance property and the Markov property, Vt(L)V_{t}(L) is a function of StS_{t} for each tt. Now we may describe the LSM algorithm with respect to NN basis functions and simulation sample size MM. The LSM algorithm corresponds to the following recursion:

V^N,t(M)(L):=Pt,N(M)φt(Lt+1+V^N,t+1(M)(L)),V^N,T(M)(L):=0.\displaystyle\widehat{V}^{(M)}_{N,t}(L):=P^{(M)}_{\mathcal{B}_{t,N}}\varphi_{t}\big{(}L_{t+1}+\widehat{V}^{(M)}_{N,t+1}(L)\big{)},\quad\widehat{V}^{(M)}_{N,T}(L):=0. (16)

Notice that V^N,t(M)\widehat{V}^{(M)}_{N,t} is a function of the random variables StS_{t} and Su(i)S_{u}^{(i)} for 1iM1\leq i\leq M and t+1uTt+1\leq u\leq T. In particular, V^N,t(M)L2(t)\widehat{V}^{(M)}_{N,t}\in L^{2}(\mathcal{H}_{t}). In the section below, we will investigate when, and in what manner, V^N,t(M)L2(t)\widehat{V}^{(M)}_{N,t}\in L^{2}(\mathcal{H}_{t}) may converge to VtL2(t)L2(t)V_{t}\in L^{2}(\mathcal{F}_{t})\subset L^{2}(\mathcal{H}_{t}). For this purpose, we make the additional useful definition:

V^N,t(L):=Pt,Nφt(Lt+1+V^N,t+1(L)),V^N,T(L):=0.\displaystyle\widehat{V}_{N,t}(L):=P_{\mathcal{B}_{t,N}}\varphi_{t}\big{(}L_{t+1}+\widehat{V}_{N,t+1}(L)\big{)},\quad\widehat{V}_{N,T}(L):=0. (17)

V^N,t(L)\widehat{V}_{N,t}(L) is to be interpreted as an idealized LSM estimate, in which a least-squares optimal approximation is made in each iteration. Note that this quantity is independent of DD.
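To make the recursion (16) concrete, the following sketch runs the LSM algorithm backwards in time under entirely illustrative assumptions of our own: a one-dimensional Gaussian random walk, cash flow L_t = max(S_t, 0), a monomial basis, and the Value-at-Risk-based map phi_{t,alpha} of (21) in Section 3.2. Since phi_t is conditional, each outer draw of S_t is paired with a block of K inner transition draws of S_{t+1}, on which the map is evaluated empirically before the regression step.

```python
import numpy as np

def lsm_v0(T, M, K, alpha=0.01, eta=0.06, deg=2, seed=0):
    """Sketch of the LSM recursion (16) for the map phi_{t,alpha} in (21).
    Model (illustrative): S_{t+1} = S_t + eps with standard normal eps,
    S_0 = 0, and cash flow L_t = g(S_t) = max(S_t, 0)."""
    rng = np.random.default_rng(seed)
    g = lambda s: np.maximum(s, 0.0)
    beta = np.zeros(deg + 1)                  # coefficients of V-hat_{N,T} = 0

    def basis(s):                             # 1, s, s^2, ..., s^deg
        return np.vander(s, deg + 1, increasing=True)

    for t in range(T - 1, -1, -1):
        # Outer sample: iid draws from the marginal law of S_t (N(0, t))
        s_t = np.sqrt(t) * rng.normal(size=M) if t > 0 else np.zeros(1)
        # Inner transitions: K draws of S_{t+1} given each outer point
        s_next = s_t[:, None] + rng.normal(size=(s_t.size, K))
        # Y = L_{t+1} + V-hat_{N,t+1}(S_{t+1}), evaluated on the inner draws
        y = (g(s_next.ravel()) + basis(s_next.ravel()) @ beta).reshape(s_t.size, K)
        # Empirical phi_{t,alpha}(Y) given S_t, as in (21)
        var = np.quantile(y, 1.0 - alpha, axis=1)
        phi = var - np.mean(np.maximum(var[:, None] - y, 0.0), axis=1) / (1.0 + eta)
        if t == 0:
            return phi[0]                     # trivial sigma-field: no regression
        beta, *_ = np.linalg.lstsq(basis(s_t), phi, rcond=None)  # OLS step (14)

v0 = lsm_v0(T=3, M=2_000, K=2_000)
```

The inner-simulation step is one possible way of evaluating the conditional map empirically; the recursion (16) itself only specifies the regression structure, not how the conditional functional is computed on the sample.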

3 Consistency results

In this section we prove what are essentially two consistency results for the LSM estimator V^N,t(M)(L)\widehat{V}^{(M)}_{N,t}(L), along with conditions for these to hold. These consistency results are analogous to Theorems 3.1 and 3.2 in [7]. The first and simplest result, Lemma 1, is that if we have a flexible enough class of basis functions, V^N,t(L)\widehat{V}_{N,t}(L) will asymptotically approach the true value Vt(L)V_{t}(L). The second consistency result, Theorem 1, is that when NN is kept fixed, V^N,t(M)(L)\widehat{V}^{(M)}_{N,t}(L) will approach the least-squares optimal V^N,t(L)\widehat{V}_{N,t}(L) for each tt as MM grows to infinity. Hence, we show that the LSM estimator for a fixed number of basis functions is consistent in the sense that the simulation-based projection operator Pt,N(M)P^{(M)}_{\mathcal{B}_{t,N}} will approach Pt,NP_{\mathcal{B}_{t,N}} even in the presence of errors in a multiperiod setting. Lemma 7 and Theorem 3 furthermore extend these results to the case of a Cost-of-Capital valuation, studied in [12] and [13], which here depends on the non-continuous risk measure Value-at-Risk. Note from Section 2 that these results presume the simulated data not to be path dependent, in contrast to the results in [25].

We should note that these results do not give a rate of convergence, which is provided in the optimal stopping setting in for instance [14] and [23]. In particular, these papers provide a joint convergence rate in which MM and NN simultaneously go to infinity, something which is not done here. There are three main reasons for this. First of all, the purpose of this paper is to investigate LSM methods given by standard OLS regression, i.e., we do not want to involve a truncation operator, as we believe this would be unlikely to be implemented in practice. The use of truncation operators is necessary for the results in [14] and [23], although one can handle the case of unbounded cash flows by letting the bound based on the truncation operator suitably go to infinity along with NN and MM. Secondly, we believe that the bounds involved in the rates of convergence would be quite large in our case if we repeatedly applied the procedure in [14] or [23] (see Remark 3). Thirdly, we want to consider mappings which are L2L^{2}-continuous (Definition 5) but not necessarily Lipschitz. In this case it is not clear how convergence can be established other than at some unspecified rate.

3.1 General convergence results

We first define a useful mode of continuity that we will require to show our first results on the convergence of the LSM algorithm.

Definition 5.

The mapping φt:L2(t+1)L2(t)\varphi_{t}:L^{2}(\mathcal{H}_{t+1})\to L^{2}(\mathcal{H}_{t}) is said to be L2L^{2}-continuous if XXn20 implies φt(X)φt(Xn)20\|X-X_{n}\|_{2}\stackrel{{\scriptstyle{\small\mathbb{P}}}}{{\to}}0\text{ implies }\|\varphi_{t}(X)-\varphi_{t}(X_{n})\|_{2}\stackrel{{\scriptstyle{\small\mathbb{P}}}}{{\to}}0.

Notice that if (Xn)n=1(X_{n})_{n=1}^{\infty} and XX are independent of DD, the convergence in probability may be replaced by convergence of real numbers.

We are now ready to formulate our first result on the convergence of the LSM algorithm. It essentially says that if we make the best possible estimate in each recursion step, using NN basis functions, then, for each tt, the estimator of VtV_{t} will converge in L2L^{2} to VtV_{t} as NN\to\infty. This result is not affected by the initial data DD, as it does not require any simulation-based approximation.

Lemma 1.

For t=0,,T1t=0,\dots,T-1, let the mappings φt:L2(t+1)L2(t)\varphi_{t}:L^{2}(\mathcal{H}_{t+1})\to L^{2}(\mathcal{H}_{t}) be L2L^{2}-continuous and law invariant. For t=1,,Tt=1,\dots,T, let nt,n\bigcup_{n\in\mathbb{N}}\mathcal{B}_{t,n} be dense in the set {h(St)h:d,h(St)L2(t)}\{h(S_{t})\mid h:\mathbb{R}^{d}\to\mathbb{R},h(S_{t})\in L^{2}(\mathcal{F}_{t})\}. Then, for t=0,,T1t=0,\dots,T-1,

V^N,t(L)Vt(L)2 and limNV^N,t(L)Vt(L)2=0.\displaystyle\|\widehat{V}_{N,t}(L)-V_{t}(L)\|_{2}\in\mathbb{R}\text{ and }\lim_{N\to\infty}\|\widehat{V}_{N,t}(L)-V_{t}(L)\|_{2}=0.

The second result uses the independence assumptions of DD to prove a somewhat technical result for when Pt,N(M)P_{\mathcal{B}_{t,N}}^{(M)} given by (15) asymptotically approaches the projection Pt,NP_{\mathcal{B}_{t,N}} given by (12).

Lemma 2.

Let Zt=zt(St)L2(t)Z_{t}=z_{t}(S_{t})\in L^{2}(\mathcal{F}_{t}). For each MM\in\mathbb{N}, let ZM,t=zM,t(DM,St)L2(t)Z_{M,t}=z_{M,t}(D_{M},S_{t})\in L^{2}(\mathcal{H}_{t}), where zM,t(DM,)z_{M,t}(D_{M},\cdot) only depends on DMD_{M} through {Su(i):1iM,ut+1}\{S_{u}^{(i)}:1\leq i\leq M,u\geq t+1\}, i.e.

(zM,t(DM,St(i)))=(zM,t(DM,St)).\displaystyle\mathcal{L}\big{(}z_{M,t}(D_{M},S_{t}^{(i)})\big{)}=\mathcal{L}\big{(}z_{M,t}(D_{M},S_{t})\big{)}. (18)

Then, ZtZM,t20\|Z_{t}-Z_{M,t}\|_{2}\stackrel{{\scriptstyle{\small\mathbb{P}}}}{{\to}}0 as MM\to\infty implies

Pt,NZtPt,N(M)ZM,t20andβ^t,N,ZM,t(M)βt,N,Zt as M.\displaystyle\|P_{\mathcal{B}_{t,N}}Z_{t}-P^{(M)}_{\mathcal{B}_{t,N}}Z_{M,t}\|_{2}\stackrel{{\scriptstyle{\small\mathbb{P}}}}{{\to}}0\quad\text{and}\quad\widehat{\beta}^{(M)}_{t,N,Z_{M,t}}\stackrel{{\scriptstyle{\small\mathbb{P}}}}{{\to}}\beta_{t,N,Z_{t}}\quad\text{ as }M\to\infty.
Remark 2.

Notice that V^N,t(M)(L)\widehat{V}_{N,t}^{(M)}(L) satisfies (18) due to the backwards recursive structure of the LSM algorithm (16).

Lemma 2 essentially provides the induction step in the argument used to prove the following result:

Theorem 1.

For t=0,,T1t=0,\dots,T-1, let the mappings φt:L2(t+1)L2(t)\varphi_{t}:L^{2}(\mathcal{H}_{t+1})\to L^{2}(\mathcal{H}_{t}) be L2L^{2}-continuous and law invariant, let V^N,t(M)(L)\widehat{V}_{N,t}^{(M)}(L) be given by (16), and let V^N,t(L)\widehat{V}_{N,t}(L) be given by (17). Then, for t=0,,T1t=0,\dots,T-1,

V^N,t(L)V^N,t(M)(L)20 as M.\displaystyle\|\widehat{V}_{N,t}(L)-\widehat{V}_{N,t}^{(M)}(L)\|_{2}\stackrel{{\scriptstyle{\small\mathbb{P}}}}{{\to}}0\text{ as }M\to\infty.

To summarize, Lemma 1 says that we can theoretically/asymptotically achieve arbitrarily accurate approximations, even when applying the approximation recursively, and Theorem 1 says that we may approach this theoretical best approximation in practice, with enough simulated non-path-dependent data.

Lemma 3.

If the mapping φt:L2(t+1)L2(t)\varphi_{t}:L^{2}(\mathcal{H}_{t+1})\to L^{2}(\mathcal{H}_{t}) is Lipschitz continuous in the sense that there exists a constant K>0K>0 such that

|φt(X)φt(Y)|K𝔼[|XY|t]for all X,YL2(t+1),\displaystyle|\varphi_{t}(X)-\varphi_{t}(Y)|\leq K\mathbb{E}[|X-Y|\mid\mathcal{H}_{t}]\quad\text{for all }X,Y\in L^{2}(\mathcal{H}_{t+1}), (19)

then φt\varphi_{t} is L2L^{2}-continuous in the sense of Definition 5.

Lemma 4.

If the conditional monetary risk measure ρt:L2(t+1)L2(t)\rho_{t}:L^{2}(\mathcal{H}_{t+1})\to L^{2}(\mathcal{H}_{t}) is Lipschitz continuous in the sense of (19) with Lipschitz constant KK, then so is the mapping φt\varphi_{t} given by (9) with Lipschitz constant 2K2K.

The large class of (conditional) spectral risk measures is in fact Lipschitz continuous. These conditional monetary risk measures can be expressed as

ρt,m(Y)=01Ft,Y1(u)m(u)du,\displaystyle\rho_{t,m}(Y)=-\int_{0}^{1}F^{-1}_{t,Y}(u)m(u)\mathrm{d}u, (20)

where mm is a probability density function that is decreasing, bounded and right continuous, and Ft,Y1(u)F^{-1}_{t,Y}(u) is the conditional quantile function

Ft,Y1(u):=essinf{yL0(t):(Yyt)u}.\displaystyle F^{-1}_{t,Y}(u):=\operatorname*{ess\,inf}\{y\in L^{0}(\mathcal{H}_{t}):\mathbb{P}(Y\leq y\mid\mathcal{H}_{t})\geq u\}.

It is well known that spectral risk measures are coherent and that they include expected shortfall as a special case.

Lemma 5.

If mm is a probability density function that is decreasing, bounded and right continuous, then (ρt,m)t=0T1(\rho_{t,m})_{t=0}^{T-1} is a dynamic monetary risk measure in the sense of Definition 1. Moreover, each ρt,m\rho_{t,m} is law invariant in the sense of Definition 3 and also Lipschitz continuous with constant m(0)m(0).
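As a numerical illustration of our own, the spectral representation (20) can be evaluated empirically by averaging sample quantiles over a grid of levels. With the density m(u) proportional to the indicator of u below alpha, this recovers expected shortfall, and m(0) = 1/alpha matches the Lipschitz constant of Lemma 5.

```python
import numpy as np

def spectral_rho(y, m, grid=2000):
    """Empirical version of (20): rho_m(Y) = -int_0^1 F^{-1}(u) m(u) du,
    approximated by a midpoint rule over `grid` quantile levels."""
    u = (np.arange(grid) + 0.5) / grid
    return -np.mean(np.quantile(y, u) * m(u))

alpha = 0.05
m_es = lambda u: (u <= alpha) / alpha   # density recovering expected shortfall
rng = np.random.default_rng(3)
y = rng.normal(size=400_000)
rho = spectral_rho(y, m_es)             # expected shortfall of a standard normal
```

For a standard normal sample the result should be close to the exact expected shortfall at level 0.05, which is the standard normal density evaluated at the 5% quantile divided by 0.05, approximately 2.06.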

Remark 3.

Assume that φt\varphi_{t} is Lipschitz continuous with constant KK as in (19). Then

\displaystyle\|\widehat{V}_{t}(L)-V_{t}(L)\|_{2}
\displaystyle\quad\leq\|\widehat{V}_{t}(L)-\varphi_{t}(L_{t+1}+\widehat{V}_{t+1}(L))\|_{2}
\displaystyle\quad\quad+\|\varphi_{t}(L_{t+1}+V_{t+1}(L))-\varphi_{t}(L_{t+1}+\widehat{V}_{t+1}(L))\|_{2}
\displaystyle\quad\leq\|\widehat{V}_{t}(L)-\varphi_{t}(L_{t+1}+\widehat{V}_{t+1}(L))\|_{2}+K\|V_{t+1}(L)-\widehat{V}_{t+1}(L)\|_{2},

since Vt(L)=φt(Lt+1+Vt+1(L))V_{t}(L)=\varphi_{t}(L_{t+1}+V_{t+1}(L)). Repeating this argument gives

\displaystyle\|\widehat{V}_{t}(L)-V_{t}(L)\|_{2}\leq\sum_{s=t}^{T-1}K^{s-t}\|\widehat{V}_{s}(L)-\varphi_{s}(L_{s+1}+\widehat{V}_{s+1}(L))\|_{2}.

This bound is analogous to that in [24] (Lemma 2.3; see also Remark 3.4 for how this ties in with the main result), with the exception that the constant KstK^{s-t} appears instead of 22. As KK may be quite large, this is one of the reasons for not seeking to determine the exact rate of convergence, as is done in for instance [25] and [14]. This observation also discourages judging the accuracy of the LSM algorithm purely by estimating out-of-sample one-step estimation errors of the form V^s(L)φs(Ls+1+V^s+1(L))2\|\widehat{V}_{s}(L)-\varphi_{s}(L_{s+1}+\widehat{V}_{s+1}(L))\|_{2}, as these need to be quite small in order to obtain a satisfactory error bound.

3.2 Convergence results using Value-at-Risk

In this section, we will focus on mappings φt,α:L2(t+1)L2(t)\varphi_{t,\alpha}:L^{2}(\mathcal{H}_{t+1})\to L^{2}(\mathcal{H}_{t}) given by

φt,α(Y):=VaRt,α(Y)11+ηt𝔼[(VaRt,α(Y)Y)+t]\displaystyle\varphi_{t,\alpha}(Y):=\operatorname{VaR}_{t,\alpha}(-Y)-\frac{1}{1+\eta_{t}}\mathbb{E}[(\operatorname{VaR}_{t,\alpha}(-Y)-Y)^{+}\mid\mathcal{H}_{t}] (21)

for some α(0,1)\alpha\in(0,1) and nonnegative constants (ηt)t=0T1(\eta_{t})_{t=0}^{T-1}, and where

VaRt,α(Y)\displaystyle\operatorname{VaR}_{t,\alpha}(-Y) :=Ft,Y1(1α)\displaystyle:=F^{-1}_{t,Y}(1-\alpha)
:=essinf{yL0(t):(Yyt)1α}\displaystyle:=\operatorname*{ess\,inf}\{y\in L^{0}(\mathcal{H}_{t}):\mathbb{P}(Y\leq y\mid\mathcal{H}_{t})\geq 1-\alpha\}

is the conditional version of Value-at-Risk. Note that φt,α\varphi_{t,\alpha} is a special case of the mappings φ\varphi in (9). (VaRt,α)t=0T1(\operatorname{VaR}_{t,\alpha})_{t=0}^{T-1} is a dynamic monetary risk measure in the sense of Definition 1, and VaRt,α\operatorname{VaR}_{t,\alpha} is law invariant in the sense of Definition 3. Since VaRt,α\operatorname{VaR}_{t,\alpha} is in general not Lipschitz continuous, φt,α\varphi_{t,\alpha} cannot be guaranteed to be so without further regularity conditions. The aim of this section is to find results analogous to Lemma 1 and Theorem 1.

We will use the following lemma, and especially its corollary, in lieu of L2L^{2}-continuity for Value-at-Risk:

Lemma 6.

For any X,ZL0(t+1)X,Z\in L^{0}(\mathcal{H}_{t+1}) and any δ(0,1α)\delta\in(0,1-\alpha),

VaRt,α((X+Z))VaRt,α+δ(X)+VaRt,1δ(Z),\displaystyle\operatorname{VaR}_{t,\alpha}(-(X+Z))\leq\operatorname{VaR}_{t,\alpha+\delta}(-X)+\operatorname{VaR}_{t,1-\delta}(-Z), (22)
VaRt,α((X+Z))VaRt,αδ(X)VaRt,1δ(Z).\displaystyle\operatorname{VaR}_{t,\alpha}(-(X+Z))\geq\operatorname{VaR}_{t,\alpha-\delta}(-X)-\operatorname{VaR}_{t,1-\delta}(-Z). (23)

We get an interesting corollary from this lemma:

Corollary 1.

Let α(0,1)\alpha\in(0,1) and let δ(0,1α)\delta\in(0,1-\alpha) with δ<1/2\delta<1/2. Then, for any X,YL1(t+1)X,Y\in L^{1}(\mathcal{H}_{t+1}),

inf|ϵ|<δ|VaRt,α+ϵ(X)VaRt,α(Y)|1δ𝔼[|XY|t].\displaystyle\inf_{|\epsilon|<\delta}|\operatorname{VaR}_{t,\alpha+\epsilon}(X)-\operatorname{VaR}_{t,\alpha}(Y)|\leq\frac{1}{\delta}\mathbb{E}[|X-Y|\mid\mathcal{H}_{t}].

Using these Lipschitz-like results, we can show a Lipschitz-like result for φt,α()\varphi_{t,\alpha}(\cdot).

Theorem 2.

Let α(0,1)\alpha\in(0,1) and let δ(0,1α)\delta\in(0,1-\alpha) with δ<1/2\delta<1/2. Then, for any X,YL1(t+1)X,Y\in L^{1}(\mathcal{H}_{t+1}),

inf|ϵ|<δ|φt,α+ϵ(X)φt,α(Y)|2δ𝔼[|XY|t]\displaystyle\inf_{|\epsilon|<\delta}|\varphi_{t,\alpha+\epsilon}(X)-\varphi_{t,\alpha}(Y)|\leq\frac{2}{\delta}\mathbb{E}[|X-Y|\mid\mathcal{H}_{t}] (24)

and

φt,αδ(X)2δ𝔼[|XY|t]\displaystyle\varphi_{t,\alpha-\delta}(X)-\frac{2}{\delta}\mathbb{E}[|X-Y|\mid\mathcal{H}_{t}] φt,α(Y)\displaystyle\leq\varphi_{t,\alpha}(Y)
φt,α+δ(X)+2δ𝔼[|XY|t],\displaystyle\leq\varphi_{t,\alpha+\delta}(X)+\frac{2}{\delta}\mathbb{E}[|X-Y|\mid\mathcal{H}_{t}], (25)

and (24) and (25) are equivalent.

Theorem 2 enables us to prove L2L^{2}-continuity of φt,α\varphi_{t,\alpha} under a continuity assumption.

Corollary 2.

Consider X,XnL2(t+1)X,X_{n}\in L^{2}(\mathcal{H}_{t+1}), n1n\geq 1, with XnXX_{n}\to X in L2L^{2}. Assume that (0,1)uVaRt,u(X)(0,1)\ni u\mapsto\operatorname{VaR}_{t,u}(-X) is a.s. continuous at u=αu=\alpha. Then φt,α(Xn)φt,α(X)\varphi_{t,\alpha}(X_{n})\to\varphi_{t,\alpha}(X) in L2L^{2}.

The following remark illustrates that even a stronger requirement of a.s. continuous time tt-conditional distributions should not be a great hindrance in practice:

Remark 4.

If we add to our cash flow (Lt)t=1T(L_{t})_{t=1}^{T} an adapted process (ϵt)t=1T(\epsilon_{t})_{t=1}^{T}, independent of (Lt)t=1T(L_{t})_{t=1}^{T}, such that for each tt, ϵt\epsilon_{t} is independent of t1\mathcal{F}_{t-1} and has a continuous distribution function, then the assumptions in Corollary 2 will be satisfied.
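The smoothing device of Remark 4 is easy to apply in practice: perturbing a possibly discrete simulated cash flow by small independent continuous noise yields continuous distribution functions while changing the cash-flow values only marginally. A sketch under our own illustrative assumptions (Bernoulli cash flows, Gaussian noise of scale sigma):

```python
import numpy as np

rng = np.random.default_rng(4)
T, n = 3, 100_000
# A discrete cash flow: each L_t takes only the values 0 and 1 (atoms)
L = rng.integers(0, 2, size=(n, T)).astype(float)

sigma = 1e-3                                   # illustrative noise scale
eps = sigma * rng.normal(size=(n, T))          # independent continuous noise
L_smooth = L + eps                             # (L_t + eps_t) has a continuous law
```

The perturbed samples are almost surely all distinct, while the perturbation itself stays of order sigma, so the effect on the computed value is negligible relative to Monte Carlo error.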

We are now ready to formulate a result analogous to Lemma 1.

Lemma 7.

Let α(0,1)\alpha\in(0,1) and let δ(0,1α)\delta\in(0,1-\alpha) with δ<1/2\delta<1/2. Let (φt,α)t=0T1(\varphi_{t,\alpha})_{t=0}^{T-1} be defined by (21) and let

V^N,t,α(L):=Pt,Nφt,α(Lt+1+V^N,t+1,α(L)),V^N,T,α(L):=0.\displaystyle\widehat{V}_{N,t,\alpha}(L):=P_{\mathcal{B}_{t,N}}\varphi_{t,\alpha}\big{(}L_{t+1}+\widehat{V}_{N,t+1,\alpha}(L)\big{)},\quad\widehat{V}_{N,T,\alpha}(L):=0. (26)

Let nt,n\bigcup_{n\in\mathbb{N}}\mathcal{B}_{t,n} be dense in the set {h(St)h:d,h(St)L2(t)}\{h(S_{t})\mid h:\mathbb{R}^{d}\to\mathbb{R},h(S_{t})\in L^{2}(\mathcal{F}_{t})\} and assume that (0,1)uVaRt,u(Lt+1V^N,t+1,α(L))(0,1)\ni u\mapsto\operatorname{VaR}_{t,u}(-L_{t+1}-\widehat{V}_{N,t+1,\alpha}(L)) is a.s. continuous at u=αu=\alpha for all NN\in\mathbb{N} and t=0,,T1t=0,\dots,T-1. Then, for t=0,,T1t=0,\dots,T-1,

V^N,t,α(L)Vt,α(L)2 and limNV^N,t,α(L)Vt,α(L)2=0.\displaystyle\|\widehat{V}_{N,t,\alpha}(L)-V_{t,\alpha}(L)\|_{2}\in\mathbb{R}\text{ and }\lim_{N\to\infty}\|\widehat{V}_{N,t,\alpha}(L)-V_{t,\alpha}(L)\|_{2}=0.
Lemma 8.

Let α(0,1)\alpha\in(0,1) and let (0,1)uVaRt,u(vT𝚽t+1,N)(0,1)\ni u\mapsto\operatorname{VaR}_{t,u}(v^{\mathrm{T}}\mathbf{\Phi}_{t+1,N}) be a.s. continuous at u=αu=\alpha for any vNv\in\mathbb{R}^{N}. Then

βnβimpliesφt,α(βT𝚽t+1,N)φt,α(βnT𝚽t+1,N)20.\displaystyle\beta_{n}\stackrel{{\scriptstyle{\small\mathbb{P}}}}{{\to}}\beta\quad\text{implies}\quad\big{\|}\varphi_{t,\alpha}(\beta^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})-\varphi_{t,\alpha}(\beta_{n}^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})\big{\|}_{2}\stackrel{{\scriptstyle{\small\mathbb{P}}}}{{\to}}0.
Remark 5.

Lemma 8 can be extended to show the convergence

φt,α(Lt+1+βT𝚽t+1,N)φt,α(Lt+1+βnT𝚽t+1,N)20\displaystyle\big{\|}\varphi_{t,\alpha}(L_{t+1}+\beta^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})-\varphi_{t,\alpha}(L_{t+1}+\beta_{n}^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})\big{\|}_{2}\stackrel{{\scriptstyle{\small\mathbb{P}}}}{{\to}}0

since the vector of basis functions $\mathbf{\Phi}_{t+1,N}$ could contain $L_{t+1}$ as an element. The requirement for convergence is that $u\mapsto\operatorname{VaR}_{t,u}(-L_{t+1}-v^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})$ is a.s. continuous at $u=\alpha$. This requirement could be replaced by the stronger requirement that $x\mapsto\mathbb{P}(L_{t+1}+v^{\mathrm{T}}\mathbf{\Phi}_{t+1,N}\leq x\mid\mathcal{F}_{t})$ is a.s. continuous.

We have now fitted φt,α\varphi_{t,\alpha} into the setting of Theorem 1.

Theorem 3.

Let $u\mapsto\operatorname{VaR}_{t,u}(-L_{t+1}-v^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})$ be a.s. continuous at $u=\alpha$ for any $v\in\mathbb{R}^{N}$. For any $N\in\mathbb{N}$ and $t=0,\dots,T$, let $\widehat{V}_{N,t,\alpha}(L)$ be given by (26) and define

\displaystyle\widehat{V}^{(M)}_{N,t,\alpha}(L):=P^{(M)}_{\mathcal{B}_{t,N}}\varphi_{t,\alpha}\big{(}L_{t+1}+\widehat{V}^{(M)}_{N,t+1,\alpha}(L)\big{)},\quad\widehat{V}^{(M)}_{N,T,\alpha}(L):=0.

Then, for t=0,1,,T1t=0,1,\dots,T-1, V^N,t,α(L)V^N,t,α(M)(L)20\|\widehat{V}_{N,t,\alpha}(L)-\widehat{V}_{N,t,\alpha}^{(M)}(L)\|_{2}\stackrel{{\scriptstyle{\small\mathbb{P}}}}{{\to}}0 as MM\to\infty.

4 Implementing and validating the LSM algorithm

In this section we test the LSM algorithm empirically for the special case where the mappings $\varphi$ are given by $(\varphi_{t,\alpha})_{t=0}^{T-1}$ in (21). The LSM algorithm described below, Algorithm 1, differs slightly from the one described previously, in that it contains the small inefficiency of two regression steps: one for the $\operatorname{VaR}$ term of the mapping and one for the expected value term. The reason for introducing this split is that it significantly simplifies the validation procedures of the algorithm. Heuristically, it allows us to run a forward simulation in which we may test the accuracy of both the $\operatorname{VaR}$ term and the expected value term.

Let $\mu_{t,t+1}(\cdot,\cdot)$ be the transition kernel from time $t$ to $t+1$ of the Markov process $(S_{t})_{t=0}^{T}$, so that $\mu_{t,t+1}(S_{t},\cdot)=\mathbb{P}(S_{t+1}\in\cdot\mid S_{t})$. In order to perform the LSM algorithm below, the only requirements are the ability to efficiently sample a variate $s$ from the unconditional law $\mathcal{L}(S_{t})$ of $S_{t}$ and, given $s$, from the conditional law $\mu_{t,t+1}(s,\cdot)$. Recall that the liability cash flow $(L_{t})_{t=1}^{T}$ is assumed to be given by $L_{t}:=g_{t}(S_{t})$ for known functions $(g_{t})_{t=1}^{T}$.

Algorithm 1 LSM Algorithm
 Set β^T,N,V(M):=0\widehat{\beta}^{(M)}_{T,N,V}:=0
for t=T1:0t=T-1:0 do
  Draw independent variables St(1),St(M)S^{(1)}_{t},\dots S^{(M)}_{t} from (St)\mathcal{L}(S_{t})
  for i=1:Mi=1:M do
   Draw independent variables St+1(i,1),,St+1(i,n)S^{(i,1)}_{t+1},\dots,S^{(i,n)}_{t+1} from μt,t+1(St(i),)\mu_{t,t+1}(S^{(i)}_{t},\cdot)
   Set Yt+1(i,j):=gt+1(St+1(i,j))+(β^t+1,N,V(M))T𝚽t+1,N(St+1(i,j))Y^{(i,j)}_{t+1}:=g_{t+1}(S^{(i,j)}_{t+1})+(\widehat{\beta}^{(M)}_{t+1,N,V})^{{\mathrm{T}}}\mathbf{\Phi}_{t+1,N}(S^{(i,j)}_{t+1}), j=1,,nj=1,\dots,n
   Let F^t(i)(y):=1nj=1nI{Yt+1(i,j)y}\widehat{F}^{(i)}_{t}(y):=\frac{1}{n}\sum_{j=1}^{n}I\{Y^{(i,j)}_{t+1}\leq y\} (empirical cdf)
   Set Rt(i):=min{y:F^t(i)(y)α}R^{(i)}_{t}:=\min\{y:\widehat{F}^{(i)}_{t}(y)\geq\alpha\} (empirical α\alpha-quantile)
   Set Et(i):=1nj=1n(Rt(i)Yt+1(i,j))+E^{(i)}_{t}:=\frac{1}{n}\sum_{j=1}^{n}(R^{(i)}_{t}-Y^{(i,j)}_{t+1})_{+}
  end for
  Set β^t,N,R(M)\widehat{\beta}^{(M)}_{t,N,R} as in (14) by regressing (Rt(i))i=1M(R_{t}^{(i)})_{i=1}^{M} onto (𝚽t,N(St(i)))i=1M(\mathbf{\Phi}_{t,N}(S^{(i)}_{t}))_{i=1}^{M}
  Set β^t,N,E(M)\widehat{\beta}^{(M)}_{t,N,E} as in (14) by regressing (Et(i))i=1M(E_{t}^{(i)})_{i=1}^{M} onto (𝚽t,N(St(i)))i=1M(\mathbf{\Phi}_{t,N}(S^{(i)}_{t}))_{i=1}^{M}
  Set $\widehat{\beta}^{(M)}_{t,N,V}:=\widehat{\beta}^{(M)}_{t,N,R}-\frac{1}{1+\eta}\widehat{\beta}^{(M)}_{t,N,E}$
end for
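The backward pass of Algorithm 1 can be sketched in a few lines of Python. The state model, basis functions and parameter values below (a one-dimensional Gaussian toy state, a quadratic basis, $\alpha=0.99$, $\eta=0.06$, and small $M$ and $n$) are illustrative assumptions for the sketch, not the models or run sizes used in this paper:

```python
import numpy as np

rng = np.random.default_rng(0)
T, M, n, alpha, eta = 3, 200, 400, 0.99, 0.06  # toy sizes, not the paper's

def phi(s):
    # Basis functions Phi_{t,N}(s): constant, s, s^2 (an illustrative choice).
    return np.stack([np.ones_like(s), s, s * s], axis=-1)

def sample_S(t, size):
    # Stand-in for sampling from the unconditional law L(S_t).
    return rng.normal(0.0, np.sqrt(1.0 + t), size)

def sample_next(s, size):
    # Stand-in for sampling from the transition kernel mu_{t,t+1}(s, .).
    return 0.5 * s + rng.normal(0.0, 1.0, size)

def g(t, s):
    # Cash flow L_t = g_t(S_t); identity map as a stand-in.
    return s

beta_V = np.zeros(3)                      # beta_hat^{(M)}_{T,N,V} := 0
for t in range(T - 1, -1, -1):
    S = sample_S(t, M)                    # outer sample from L(S_t)
    R, E = np.empty(M), np.empty(M)
    for i in range(M):
        S1 = sample_next(S[i], n)         # nested sample from mu_{t,t+1}
        Y = g(t + 1, S1) + phi(S1) @ beta_V
        R[i] = np.quantile(Y, alpha, method="inverted_cdf")  # empirical alpha-quantile
        E[i] = np.mean(np.maximum(R[i] - Y, 0.0))
    X = phi(S)
    beta_R = np.linalg.lstsq(X, R, rcond=None)[0]  # regression step for R
    beta_E = np.linalg.lstsq(X, E, rcond=None)[0]  # regression step for E
    beta_V = beta_R - beta_E / (1.0 + eta)

V0_hat = phi(sample_S(0, 5)) @ beta_V     # fitted time-0 value estimates
```

The `method="inverted_cdf"` option makes `np.quantile` return $\min\{y:\widehat{F}(y)\geq\alpha\}$, matching the empirical quantile in the algorithm.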

We may assess the accuracy of the LSM implementation by computing root mean squared errors (RMSE) of quantities appearing in Algorithm 1. For each index pair $(t,i)$, set $V^{(i)}_{t}:=R^{(i)}_{t}-\frac{1}{1+\eta}E^{(i)}_{t}$. Define the RMSE and the normalized RMSE by

RMSEZ,t\displaystyle\text{RMSE}_{Z,t} :=(1Mi=1M(Zt(i)(β^t,N,Z(M))T𝚽t,N(St(i)))2)1/2,\displaystyle:=\bigg{(}\frac{1}{M}\sum_{i=1}^{M}\Big{(}Z^{(i)}_{t}-(\widehat{\beta}^{(M)}_{t,N,Z})^{\mathrm{T}}\mathbf{\Phi}_{t,N}\big{(}S^{(i)}_{t}\big{)}\Big{)}^{2}\bigg{)}^{1/2}, (27)
NRMSEZ,t\displaystyle\text{NRMSE}_{Z,t} :=RMSEZ,t×(1Mi=1MZt(i)2)1/2,\displaystyle:=\text{RMSE}_{Z,t}\times\bigg{(}\frac{1}{M}\sum_{i=1}^{M}{Z^{(i)}_{t}}^{2}\bigg{)}^{-1/2}, (28)

where ZZ is a placeholder for RR, EE or VV.
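As a minimal sketch, (27) and (28) amount to the following computation on arrays of samples $Z^{(i)}_{t}$ and fitted values $(\widehat{\beta}^{(M)}_{t,N,Z})^{\mathrm{T}}\mathbf{\Phi}_{t,N}(S^{(i)}_{t})$:

```python
import numpy as np

def rmse_nrmse(z, z_fit):
    """RMSE (27) and NRMSE (28) for samples z and fitted values z_fit."""
    rmse = np.sqrt(np.mean((z - z_fit) ** 2))
    nrmse = rmse / np.sqrt(np.mean(z ** 2))  # normalize by the root mean square of z
    return rmse, nrmse
```

Note that the NRMSE here normalizes by the root mean square of the samples, not by their standard deviation or range.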

For each index pair (t,i)(t,i) consider the actual non-default probability and actual return on capital given by

\displaystyle\text{ANDP}^{(i)}_{t}:=\widehat{F}^{(i)}_{t}\big{(}(\widehat{\beta}^{(M)}_{t,N,R})^{\mathrm{T}}\mathbf{\Phi}_{t,N}(S^{(i)}_{t})\big{)}, (29)
\displaystyle\text{AROC}^{(i)}_{t}:=(1+\eta)E^{(i)}_{t}\times\big{(}(\widehat{\beta}^{(M)}_{t,N,E})^{\mathrm{T}}\mathbf{\Phi}_{t,N}(S^{(i)}_{t})\big{)}^{-1}, (30)

and note that these random variables are expected to be centered around $\alpha$ and $1+\eta$, respectively, if the implementation is accurate. All validation procedures in this paper are performed out-of-sample, i.e. a second, independent simulation run is performed to obtain these values.

4.1 Models

In this section we introduce two model types in order to test the performance of the LSM algorithm. The first model type, introduced in Section 4.1.1, is not motivated by a specific application but is simply a sufficiently flexible and moderately complex time series model. The second model type, introduced in Section 4.1.2, aims to describe the cash flow of a life insurance portfolio paying both survival and death benefits.

4.1.1 AR(1)-GARCH(1,1) models

In the first model to be evaluated, the liability cash flow $(L_{t})_{t=1}^{T}$ is given by an AR(1) model with GARCH(1,1) residuals, with dynamics given by:

Lt+1=α0+α1Lt+σt+1ϵt+1,σt+12=α2+α3σt2+α4Lt2,L0=0,σ1=1.\displaystyle L_{t+1}=\alpha_{0}+\alpha_{1}L_{t}+\sigma_{t+1}\epsilon_{t+1},\quad\sigma^{2}_{t+1}=\alpha_{2}+\alpha_{3}\sigma^{2}_{t}+\alpha_{4}L^{2}_{t},\quad L_{0}=0,\sigma_{1}=1.

Here $\epsilon_{1},\dots,\epsilon_{T}$ are assumed to be i.i.d. standard normally distributed and $\alpha_{0},\dots,\alpha_{4}$ are known model parameters. If we put $S_{t}=(L_{t},\sigma_{t+1})$ for $t=0,\dots,T$, then $(S_{t})_{t=0}^{T}$ forms a time-homogeneous Markov chain.
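A path of this process can be simulated directly from the recursions above. The following sketch returns $(L_{t})_{t=0}^{T}$ and $(\sigma_{t+1})_{t=0}^{T}$, i.e. the two components of the state $S_{t}=(L_{t},\sigma_{t+1})$:

```python
import numpy as np

def simulate_ar_garch(T, a, rng):
    """Simulate L_{t+1} = a0 + a1*L_t + sigma_{t+1}*eps_{t+1},
    sigma_{t+1}^2 = a2 + a3*sigma_t^2 + a4*L_t^2, with L_0 = 0, sigma_1 = 1."""
    a0, a1, a2, a3, a4 = a
    L = np.zeros(T + 1)
    sig2 = np.zeros(T + 2)    # sig2[t] holds sigma_t^2; index 0 unused
    sig2[1] = 1.0             # sigma_1 = 1
    for t in range(T):
        L[t + 1] = a0 + a1 * L[t] + np.sqrt(sig2[t + 1]) * rng.normal()
        sig2[t + 2] = a2 + a3 * sig2[t + 1] + a4 * L[t + 1] ** 2
    return L, np.sqrt(sig2[1:])   # (L_t)_{t=0}^T and (sigma_{t+1})_{t=0}^T
```

The sum-of-ten model considered below is obtained by summing ten independent such paths.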

In order to contrast this model with a more complex model, we also investigate the case where the process (Lt)t=1T(L_{t})_{t=1}^{T} is given by a sum of independent AR(1)-GARCH(1,1)-processes of the above type: Lt=i=110Lt,iL_{t}=\sum_{i=1}^{10}L_{t,i}, where

Lt+1,i=α0,i+α1,iLt,i+σt+1,iϵt+1,i,σt+1,i2=α2,i+α3,iσt,i2+α4,iLt,i2.\displaystyle L_{t+1,i}=\alpha_{0,i}+\alpha_{1,i}L_{t,i}+\sigma_{t+1,i}\epsilon_{t+1,i},\quad\sigma^{2}_{t+1,i}=\alpha_{2,i}+\alpha_{3,i}\sigma^{2}_{t,i}+\alpha_{4,i}L^{2}_{t,i}.

The motivation for these choices of toy models is as follows. Firstly, a single AR(1)-GARCH(1,1) process is sufficiently low-dimensional that we may compare a brute-force approximation with the LSM approximation, thus getting a real sense of the performance of the LSM model. Secondly, despite being low-dimensional, it still has a sufficiently complex dependence structure that it cannot easily be valued other than by numerical means. The motivation for looking at a sum of AR(1)-GARCH(1,1) processes is simply to investigate whether model performance is severely hampered by an increase in dimensionality, provided a certain amount of independence between the sources of randomness.

4.1.2 Life insurance models

In order to investigate a set of models more closely resembling an insurance cash flow, we also consider an example closely inspired by that in [9]. Essentially, we will assume the liability cash flow to be given by life insurance policies where we take into account age cohorts and their sizes at each time, along with financial data relevant to the contract payouts.

We consider two risky assets YY and FF, given by the log-normal dynamics

dYt=μYYtdt+σYYtdWtY,0tT,Y0=y0,\displaystyle\mathrm{d}Y_{t}=\mu_{Y}Y_{t}\mathrm{d}t+\sigma_{Y}Y_{t}\mathrm{d}W^{Y}_{t},\quad 0\leq t\leq T,\quad Y_{0}=y_{0},
dFt=μFFtdt+σFFtdWtF,0tT,F0=f0.\displaystyle\mathrm{d}F_{t}=\mu_{F}F_{t}\mathrm{d}t+\sigma_{F}F_{t}\mathrm{d}W^{F}_{t},\quad 0\leq t\leq T,\quad F_{0}=f_{0}.

Here $W^{Y}_{t}$ and $W^{F}_{t}$ are two correlated Brownian motions, which we may rewrite as

WtY=Wt1,WtF=ρWt1+1ρ2Wt2,0tT,\displaystyle W^{Y}_{t}=W^{1}_{t},\quad W^{F}_{t}=\rho W^{1}_{t}+\sqrt{1-\rho^{2}}W^{2}_{t},\quad 0\leq t\leq T,

where $W^{1}$ and $W^{2}$ are two standard, uncorrelated Brownian motions. Here, $F$ represents the index associated with unit-linked contracts and $Y$ represents assets owned by the insurance company. Furthermore, we assume that an individual of age $a$ has probability $1-p_{a}$ of reaching age $a+1$, where the probabilities $p_{a}$ for $a=0,1,\dots$ are assumed to be nonrandom and known. All deaths are assumed to be independent of each other. We consider $k$ age-homogeneous cohorts with sizes $n_{1},\dots,n_{k}$ and ages $a_{1},\dots,a_{k}$ at time $t=0$, and we assume that all insured individuals have bought identical contracts. If death occurs at time $t$, the contract pays out the death benefit $\max(D^{*},F_{t})$, where $D^{*}$ is a nonrandom guaranteed amount. If an insured person survives until time $T$, the contract pays out the survival benefit $\max(S^{*},F_{T})$, where again $S^{*}$ is a nonrandom guaranteed amount. We finally assume that the insurance company holds the nominal amount $c(n_{1}+\dots+n_{k})$ in the risky asset $Y$, that it sells off these assets proportionally to the deaths as they occur, and that it sells off the entire remaining amount at time $T$. Let $N^{i}_{t}$ denote the number of people alive in cohort $i$ at time $t$, with the following dynamics:

Nt+1iBin(Nti,1pai+t),t=0,,T1.\displaystyle N^{i}_{t+1}\sim\text{Bin}(N^{i}_{t},1-p_{a_{i}+t}),\quad t=0,\dots,T-1.

These are the same dynamics as the life insurance example in Section 5 of [12]. Thus, the liability cash flow we consider here is given by

\displaystyle L_{t} =\big{(}\max(D^{*},F_{t})-cY_{t}\big{)}\sum_{i=1}^{k}(N^{i}_{t-1}-N^{i}_{t})
\displaystyle\quad+\mathbb{I}\{t=T\}\big{(}\max(S^{*},F_{T})-cY_{T}\big{)}\sum_{i=1}^{k}N^{i}_{T}.

If we write $S_{t}=(Y_{t},F_{t},N^{1}_{t},\dots,N^{k}_{t})$, then $S:=(S_{t})_{t=0}^{T}$ is a Markov chain with the dynamics outlined above. Note that, depending on the number $k$ of cohorts, $S$ might be a fairly high-dimensional Markov chain. Note also that, in addition to the obvious risk factors of mortality and the contractual payout amounts, there is the risk of the insurance company's risky asset $Y$ depreciating in value, which is of course a large risk factor for insurance companies in practice. Here we consider the case of $k=4$ cohorts, referred to as the small life insurance model, and the case of $k=10$ cohorts, referred to as the large life insurance model.
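A single path of the life insurance model can be simulated as follows. The parameter names, and the bundling of them into one tuple, are assumptions for the sketch; the asset steps use the exact log-normal solution of the SDEs, deaths are drawn from the binomial dynamics above, and deaths at time $t$ are counted as $N^{i}_{t-1}-N^{i}_{t}$ so that the benefit paid per death is positive:

```python
import numpy as np

def life_model_path(T, n0, ages, p, params, rng):
    """One path of the cash flow (L_t)_{t=1}^T.
    p(a) is the one-year death probability at age a (assumed callable)."""
    muY, muF, sY, sF, rho, y0, f0, c, D_star, S_star = params
    Y, F = y0, f0
    N = np.asarray(n0, dtype=np.int64)           # cohort sizes N^i_0
    L = np.zeros(T + 1)
    for t in range(1, T + 1):
        w1, w2 = rng.normal(size=2)
        wY, wF = w1, rho * w1 + np.sqrt(1.0 - rho**2) * w2  # correlated increments
        Y *= np.exp(muY - 0.5 * sY**2 + sY * wY)            # exact GBM step
        F *= np.exp(muF - 0.5 * sF**2 + sF * wF)
        surv = 1.0 - np.array([p(a + t - 1) for a in ages]) # survival probabilities
        N_new = rng.binomial(N, surv)                       # N^i_t given N^i_{t-1}
        deaths = int((N - N_new).sum())
        L[t] = (max(D_star, F) - c * Y) * deaths            # death benefits net of sold assets
        if t == T:                                          # terminal survival benefits
            L[t] += (max(S_star, F) - c * Y) * int(N_new.sum())
        N = N_new
    return L
```

For instance, `params = (0.03, 0.03, 0.1, 0.1, 0.4, 100.0, 100.0, 0.5, 100.0, 110.0)` with some mortality callable `p` produces one cash-flow path; the value of `c` here is a hypothetical choice.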

4.2 Choice of basis functions

So far, the choice of basis functions has not been addressed. As we are trying to numerically approximate unknown functions whose form we do not know, the approach used here is a combination of standard polynomial functions, complemented with functions that in some way resemble the underlying liability cash flow. A similar approach for the valuation of American derivatives is taken in, for instance, [16] and [3], where in the latter it is explicitly advised (see p. 1082) to use the values of related, simpler derivatives as basis functions to price more exotic ones.

In these examples, we will not be overly concerned with model sparsity, covariate significance or efficiency, but rather take the machine-learning approach of simply evaluating models based on out-of-sample performance. This is feasible due to the availability of simulated data for both fitting and out-of-sample validation.

4.2.1 AR(1)-GARCH(1,1) models

Since the AR(1)-GARCH(1,1) models can be considered toy models, generic basis functions were chosen. For a single AR(1)-GARCH(1,1) model, the basis functions chosen were all polynomials of the form $L_{t}^{i}\sigma_{t+1}^{j}$ with $0<i+j\leq 2$. For the sum of $10$ independent AR(1)-GARCH(1,1) models, we denote by $L_{t}$ and $\sigma_{t+1}$ the aggregated liability cash flow and standard deviation, respectively. We then consider the basis functions consisting of the state vector $(L_{t,i},\sigma_{t+1,i})_{i=1}^{10}$ along with $L_{t}^{i}\sigma_{t+1}^{j}$ for all $0<i+j\leq 2$, omitting the case $i=1$, $j=0$ to avoid collinearity. Note that the number of basis functions grows linearly, rather than quadratically, with the dimensionality of the state space.
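For the single AR(1)-GARCH(1,1) model, this basis can be written out explicitly. The inclusion of a constant (intercept) term below is an assumption of the sketch, since regressions of the form (14) commonly include one:

```python
import numpy as np

def garch_basis(L, sigma):
    """All monomials L^i * sigma^j with 0 < i + j <= 2, plus a constant."""
    feats = [np.ones_like(L)]
    for i in range(3):
        for j in range(3):
            if 0 < i + j <= 2:
                feats.append(L**i * sigma**j)
    return np.stack(feats, axis=-1)
```

For the state $(L_{t},\sigma_{t+1})$ this yields six regressors in the order $1,\ \sigma,\ \sigma^{2},\ L,\ L\sigma,\ L^{2}$.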

4.2.2 Life insurance models

For the state St=(Yt,Ft,Nt1,,Ntk)S_{t}=(Y_{t},F_{t},N^{1}_{t},\dots,N^{k}_{t}), let pt+1ip^{i}_{t+1} be the probability of death during (t,t+1)(t,t+1) for an individual in cohort ii, with qt+1i:=1pt+1iq^{i}_{t+1}:=1-p^{i}_{t+1}. We then introduce the state-dependent variables

μt+1:=i=1kNtipt+1i,σt+1:=(i=1kNtipt+1iqt+1i)1/2,Nt:=i=1kNti.\displaystyle\mu_{t+1}:=\sum_{i=1}^{k}N^{i}_{t}p^{i}_{t+1},\quad\sigma_{t+1}:=\Big{(}\sum_{i=1}^{k}N^{i}_{t}p^{i}_{t+1}q^{i}_{t+1}\Big{)}^{1/2},\quad N_{t}:=\sum_{i=1}^{k}N^{i}_{t}.

The first two quantities are the mean and standard deviation, respectively, of the number of deaths during $(t,t+1)$; the third is simply the total number of people alive at time $t$. The basis functions we choose consist of the state vector $Y_{t},F_{t},N^{1}_{t},\dots,N^{k}_{t}$ together with all products of two factors, where the first factor is an element of the set $\{\mu_{t+1},\sigma_{t+1},N_{t}\}$ and the second factor is an element of the set

{Yt,Ft,Yt2,Ft2,Ft3,YtFt,YtFt2,(FtKj)+,(FtKj)+Yt,\displaystyle\big{\{}Y_{t},F_{t},Y^{2}_{t},F^{2}_{t},F_{t}^{3},Y_{t}F_{t},Y_{t}F_{t}^{2},(F_{t}-K_{j})_{+},(F_{t}-K_{j})_{+}Y_{t},
C(Ft,S,T,t),C(Ft,D,t+1,t),C(Ft,S,T,t)Yt,C(Ft,D,t+1,t)Yt}.\displaystyle\quad C(F_{t},S^{*},T,t),C(F_{t},D^{*},t+1,t),C(F_{t},S^{*},T,t)Y_{t},C(F_{t},D^{*},t+1,t)Y_{t}\big{\}}.

$K_{j}$ can take values in $\{200,162,124,103\}$, depending on which covariates of the form $(F_{t}-K_{j})_{+}$ had the highest $R^{2}$-value at time $t=5$. Here the $R^{2}$-values were calculated based on the residuals after performing linear regression with respect to all basis functions not containing elements of the form $(F_{t}-K_{j})_{+}$. While this is a somewhat ad hoc approach that could be refined, it is a simple and easy-to-implement example of a choice of basis functions. Again, note that the number of basis functions grows linearly, rather than quadratically, with the dimensionality of the state space.

4.2.3 Run specifications

For Algorithm 1, $M=5\cdot 10^{4}$ and $n=10^{5}$ were chosen for the life insurance models, and $M=10^{4}$ and $n=10^{5}$ for the AR(1)-GARCH(1,1) models. Terminal time $T=6$ was used in all cases. For the validation run, $M=10^{4}$ and $n=10^{5}$ were chosen for all models. Due to the extreme quantile level involved, and also based on empirical observations, it was deemed necessary to keep $n$ at around this order of magnitude. Similarly, in part due to the number of basis functions involved, performance was observed to increase with $M$. Choosing $M$ and $n$ of these orders of magnitude was thus necessary for good model performance; these were also the largest orders of magnitude that were computationally feasible given the computing power available.

For the AR(1)-GARCH(1,1) model, the chosen parameters were

α0=1,α1=1,α2=0.1,α3=0.1,α4=0.1.\alpha_{0}=1,\quad\alpha_{1}=1,\quad\alpha_{2}=0.1,\quad\alpha_{3}=0.1,\quad\alpha_{4}=0.1.

The same choice was used for each of the terms in the sum of 1010 AR(1)-GARCH(1,1) processes, making the model a sum of i.i.d. processes.

For the life insurance models, the choice of parameters of the risky assets was μY=μF=0.03,σY=σF=0.1,ρ=0.4,y0=f0=100\mu_{Y}=\mu_{F}=0.03,\sigma_{Y}=\sigma_{F}=0.1,\rho=0.4,y_{0}=f_{0}=100. The benefit lower bounds were chosen as D=100,S=110D^{*}=100,S^{*}=110. The death/survival probabilities were calculated using the Makeham formula (for males):

\displaystyle p_{a}=1-\exp\Big{\{}-\int_{a}^{a+1}\mu_{x}\,\mathrm{d}x\Big{\}},\quad\mu_{x}:=0.001+0.000012\exp\{0.101314x\}.

These parameter values correspond to the Swedish mortality table M90 for males (the formula for females is identical, but with the age shifted back by $6$ years to account for the greater longevity of the female population). For the case of $4$ cohorts, the starting ages (for males) were $50$–$80$ in $10$-year increments, and for the case of $10$ cohorts the starting ages were $40$–$85$ in $5$-year increments.
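The Makeham intensity $\mu_{x}$ integrates in closed form, so the one-year probabilities can be computed directly. The sketch below uses $p_{a}=1-\exp\{-\int_{a}^{a+1}\mu_{x}\,\mathrm{d}x\}$, consistent with $p_{a}$ denoting a one-year death probability elsewhere in this section:

```python
import math

B = 0.101314  # Makeham growth parameter (M90, males)

def one_year_death_prob(a):
    """p_a = 1 - exp(-int_a^{a+1} mu_x dx) with
    mu_x = 0.001 + 0.000012 * exp(B * x), integrated in closed form."""
    integral = 0.001 + (0.000012 / B) * (math.exp(B * (a + 1)) - math.exp(B * a))
    return 1.0 - math.exp(-integral)
```

Per the text, the female table is obtained by shifting the age back by 6 years, i.e. evaluating at $a-6$.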

The algorithms were run on a computer with 8 Intel(R) Core(TM) i7-4770S 3.10GHz processors, and parallel programming was implemented in the nested simulation steps in both Algorithm 1 and the validation algorithm.

4.3 Numerical results

The RMSEs and NRMSEs of the LSM models can be seen in Table 1 and Table 2, respectively. The ANDPs and AROCs of the LSM models can be seen in Table 3, which displays quantile ranges with respect to the $2.5\%$ and $97.5\%$ quantiles of the data.

Model RMSE V RMSE R RMSE E
one single AR(1)-GARCH(1,1) 0.0114, 0.0118, 0.0115, 0.0098, 0.0061 0.0533, 0.0556, 0.0553, 0.0455, 0.0285 0.0521, 0.0542, 0.0544, 0.0444, 0.0279
a sum of 1010 AR(1)-GARCH(1,1) 0.0172, 0.0130, 0.0120, 0.0100, 0.0061 0.0525, 0.0552, 0.0546, 0.0467, 0.0278 0.0536, 0.0544, 0.0535, 0.0458, 0.0273
Life model with 4 cohorts 134.4, 120.3, 134.8, 85.1, 75.5 760.3, 682.7, 901.6, 535.9, 575.1 742.4, 665.0, 856.8, 536.0, 571.3
Life model with 10 cohorts 331.9, 307.4, 330.8, 226.2, 219.3 1730.1, 1719.4, 2148.3, 1431.0, 1928.3 1689.2, 1672.8, 2049.8, 1429.4, 1910.3
Table 1: RMSE values for the quantities V,R,EV,R,E as defined in (27). The five values in each cell are for times t=1,2,3,4,5t=1,2,3,4,5, in that order.
Model NRMSE V (%) NRMSE R (%) NRMSE E (%)
one single AR(1)-GARCH(1,1) 0.0498, 0.0583, 0.0685, 0.0797, 0.0901 0.1705, 0.1931, 0.2170, 0.2333, 0.2544 0.5810, 0.5962, 0.5930, 0.5747, 0.5885
a sum of 1010 AR(1)-GARCH(1,1) 0.0758, 0.0642, 0.0715, 0.0813, 0.0912 0.1693, 0.1913, 0.2158, 0.2393, 0.2493 0.6049, 0.5963, 0.5880, 0.5949, 0.5779
Life model with 4 cohorts 0.2567, 0.2225, 0.2603, 0.1711, 0.1647 0.5443, 0.5646, 0.8722, 0.5924, 0.7403 0.7109, 0.7535, 1.1381, 0.8306, 1.0271
Life model with 10 cohorts 0.2505, 0.2247, 0.2454, 0.1720, 0.1733 0.4911, 0.5592, 0.7948, 0.5952, 0.9056 0.6475, 0.7452, 1.0499, 0.8351, 1.2580
Table 2: NRMSE values for the quantities V,R,EV,R,E as defined in (28). The five values in each cell are for times t=1,2,3,4,5t=1,2,3,4,5, in that order.
Model QR ANDP (2.5%2.5\%, 97.5%97.5\%) QR AROC (2.5%2.5\%, 97.5%97.5\%)
one single AR(1)-GARCH(1,1) (0.457, 0.544), (0.456, 0.545), (0.457, 0.545), (0.458, 0.545), (0.457, 0.543) (4.79, 7.22), (4.76, 7.25), (4.78, 7.22), (4.84, 7.24), (4.80, 7.20)
a sum of 1010 AR(1)-GARCH(1,1) (0.458, 0.545), (0.456, 0.545), (0.457, 0.545), (0.456, 0.546), (0.457, 0.544) (4.74, 7.29), (4.77, 7.26), (4.77, 7.23), (4.79, 7.25), (4.81, 7.21)
Life model with 4 cohorts (0.454, 0.548), (0.443, 0.565), (0.385, 0.622), (0.436, 0.571), (0.387, 0.603) (4.59, 7.43), (4.32, 7.82), (2.71, 9.05), (4.10, 7.93), (2.69, 8.49)
Life model with 10 cohorts (0.457, 0.546), (0.444, 0.560), (0.391, 0.611), (0.435, 0.569), (0.394, 0.605) (4.66, 7.37), (4.33, 7.68), (2.94, 8.97), (4.08, 7.89), (2.96, 8.58)
Table 3: Quantile ranges for the samples (1ANDPt(i))i=1M(1-\text{ANDP}_{t}^{(i)})_{i=1}^{M} and (AROCt(i))i=1M(\text{AROC}_{t}^{(i)})_{i=1}^{M}, as defined in (29) and (30). The quantiles considered are 2.5%2.5\% and 97.5%97.5\%. The five intervals in each cell are for times t=1,2,3,4,5t=1,2,3,4,5, in that order.

Below, in Figure 1 we also present some histograms of the actual returns and risks of ruin, in order to get a sense of the spread of these values.

Figure 1: The top two figures correspond to the AR(1)-GARCH(1,1) model. The bottom two figures correspond to the life insurance model with 10 cohorts.

From these we observe that the quantity representing the actual returns seems to be quite sensitive to model errors, given the rather small size of the RMSE values.

Model Running time valuation (HH:MM) Running time validation (HH:MM)
one single AR(1)-GARCH(1,1) 00:06 00:10
a sum of 10 AR(1)-GARCH(1,1) 00:33 00:39
Life model with 4 cohorts 12:48 02:30
Life model with 10 cohorts 13:29 02:44
Table 4: Run time of each model in hours and minutes. Run specifications are described in Section 4.2.3.

Table 4 displays the running times of each model. As far as is known, the main factor determining the running time of Algorithm 1 is the repeated evaluation of the basis functions inside the nested simulation (required to calculate the quantities $Y_{t+1}^{(i,j)}$ in the inner for-loop). As these evaluations are numerous for models with high-dimensional state spaces, running times increase accordingly. It should be noted that, apart from the use of parallel programming, Algorithm 1 was not implemented to run as fast as possible for any specific model. Speed could potentially be gained by adapting Algorithm 1 to specific models of interest.

Some conclusions can be drawn from the numerical results. Firstly, from a mean-squared-error point of view, the LSM algorithm seems to capture the dynamics of the multiperiod cost-of-capital valuation well. It should be noted that the (N)RMSE of the value $V$ is lower than those of $R$ and $E$ across the board, for all models and times. Since the expression for $E$ depends heavily on $R$, we may suspect that the estimation errors of $R$ and $E$ are positively correlated, and thus that $V=R-\frac{1}{1+\eta}E$ obtains lower mean squared errors as a result.

We can see that increasing model complexity for the AR(1)-GARCH(1,1) and life insurance models seems to have no significant effect on LSM performance. It should be noted that model complexity is in both cases increased by introducing independent stochastic factors: a sum of i.i.d. processes in the AR(1)-GARCH(1,1) case and the addition of (independent) cohorts in the life insurance case. Thus the de facto model complexity might not have increased much, even though the dimension of the state space of the Markov process increases.

When we look at the ANDP and AROC quantities, we see that these seem to vary more than the (N)RMSEs do. Especially AROC, which is defined via a quotient, seems to be sensitive to model error.

One important thing to note with regard to the sensitivity of ANDP and AROC is the presence of errors introduced by the necessity of using Monte Carlo simulations to compute samples of $\operatorname{VaR}_{t,1-\alpha}(-\cdot)$ and $\mathbb{E}[(\cdot)_{+}]$. This can be seen in the AR(1)-GARCH(1,1) case: if we investigate what the value $V_{5}(L)$ should be, we see that in this case it has a closed form (using positive homogeneity and translation invariance):

φ5(L6+V6(L))=φ5(α0+α1L5+σ6ϵ6)=α0+α1L5+σ6φ5(ϵ6).\displaystyle\varphi_{5}(L_{6}+V_{6}(L))=\varphi_{5}(\alpha_{0}+\alpha_{1}L_{5}+\sigma_{6}\epsilon_{6})=\alpha_{0}+\alpha_{1}L_{5}+\sigma_{6}\varphi_{5}(\epsilon_{6}).

$\varphi_{5}(\epsilon_{6})$ is deterministic due to law invariance. Since $L_{t}$ and $\sigma_{t+1}$ are included among the basis functions for the AR(1)-GARCH(1,1) model, we would expect the fit in this case to be perfect. Since it is not, we conclude that errors may still appear even when optimal basis functions are among our selection of basis functions.
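This can be illustrated numerically. Using the same $R/E$ split as in Algorithm 1, a Monte Carlo estimate of $\varphi_{5}(\epsilon_{6})$ for standard normal $\epsilon_{6}$ can be compared with the closed-form value $q_{\alpha}-\frac{1}{1+\eta}\big(q_{\alpha}\Phi(q_{\alpha})+\phi(q_{\alpha})\big)$, where $q_{\alpha}$ is the standard normal $\alpha$-quantile and $\Phi,\phi$ are its cdf and density; the values $\alpha=0.99$ and $\eta=0.06$ below are illustrative assumptions:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
alpha, eta, n = 0.99, 0.06, 10**5   # illustrative assumptions

# Monte Carlo estimate using the R/E split of Algorithm 1.
eps = rng.normal(size=n)
R = np.quantile(eps, alpha, method="inverted_cdf")   # empirical alpha-quantile
E = np.mean(np.maximum(R - eps, 0.0))
V_mc = R - E / (1.0 + eta)

# Closed form: E[(q - eps)_+] = q*Phi(q) + pdf(q) for standard normal eps.
nd = NormalDist()
q = nd.inv_cdf(alpha)
V_exact = q - (q * nd.cdf(q) + nd.pdf(q)) / (1.0 + eta)
```

Even with $n=10^{5}$ inner samples, the Monte Carlo estimate of the extreme quantile carries noticeable noise, consistent with the sensitivity of ANDP and AROC discussed above.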

Finally, recalling that the main purpose of these calculations is to compute the quantity $V_{t}(L)$, a good approach for validation might be to re-balance the LSM estimates of $R^{(i)}_{t}$ and $E^{(i)}_{t}$ so that the LSM estimate of the value $V_{t}$ remains unchanged while the LSM estimates fit $R^{(i)}_{t}$ and $E^{(i)}_{t}$ better. This re-balancing would not be problematic within the economic model underlying this validation scheme. However, since in this paper we are also interested in how well the LSM algorithm captures both the $\operatorname{VaR}$ term and the expected value term separately, the quantities ANDP and AROC remain relevant.

5 Conclusion

We have studied the performance of the LSM algorithm for numerically computing recursively defined objects such as $(V_{t}(L,\varphi))_{t=0}^{T}$ given in Definition 4, where the mappings $\varphi$ are either $L^{2}$-continuous or given by (21). As a part of this study, Lipschitz-like results and conditions for $L^{2}$-continuity were established for Value-at-Risk and the associated operator $\varphi_{t,\alpha}$ in Theorem 2 and Corollary 2. Important basic consistency results were obtained, showing the convergence of the LSM estimator both as the number of basis functions goes to infinity (Lemmas 1 and 7) and as the size of the simulated data set goes to infinity for a fixed number of basis functions (Theorems 1 and 3). Furthermore, these results are applicable to a large class of conditional monetary risk measures, utility functions and various actuarial multi-period valuations, the only requirement being $L^{2}$-continuity or a property like that established in Theorem 2. We also apply and evaluate the LSM algorithm with respect to the multi-period cost-of-capital valuation considered in [12] and [13], and in doing so provide insight into practical considerations concerning implementation and validation of the LSM algorithm.

6 Proofs

Proof of Lemma 1.

Note that the quantities defined in (10) and (17) are independent of $D$; hence all norms below are a.s. constants. Define $\epsilon_{t}:=\widehat{V}_{N,t}-V_{t}$. We now show via backwards induction, starting from time $t=T$, that $\|\epsilon_{t}\|_{2}\to 0$. The induction base is trivial, since $\widehat{V}_{N,T}(L)=V_{T}(L)=0$. Now assume that $\|\epsilon_{t+1}\|_{2}\to 0$. Then

ϵt2\displaystyle||\epsilon_{t}||_{2} V^N,tφt(Lt+1+V^N,t+1(L))2\displaystyle\leq||\widehat{V}_{N,t}-\varphi_{t}(L_{t+1}+\widehat{V}_{N,t+1}(L))||_{2}
+φt(Lt+1+V^N,t+1(L))φt(Lt+1+Vt+1(L))2\displaystyle\quad+||\varphi_{t}(L_{t+1}+\widehat{V}_{N,t+1}(L))-\varphi_{t}(L_{t+1}+V_{t+1}(L))||_{2}

By the induction assumption and the continuity assumption, we know that the second summand goes to 0. We now need to show that V^N,tφt(Lt+1+V^N,t+1(L))20||\widehat{V}_{N,t}-\varphi_{t}(L_{t+1}+\widehat{V}_{N,t+1}(L))||_{2}\to 0. Now we simply note, by the definition of the projection operator and denseness of the approximating sets,

V^N,tφt(Lt+1+V^N,t+1(L))2\displaystyle||\widehat{V}_{N,t}-\varphi_{t}(L_{t+1}+\widehat{V}_{N,t+1}(L))||_{2}
=infBNBφt(Lt+1+V^N,t+1(L))2\displaystyle\quad=\inf_{B\in\mathcal{B}_{N}}||B-\varphi_{t}(L_{t+1}+\widehat{V}_{N,t+1}(L))||_{2}
\displaystyle\quad\leq\inf_{B\in\mathcal{B}_{N}}||B-\varphi_{t}(L_{t+1}+V_{t+1})||_{2}
\displaystyle\quad\quad+||\varphi_{t}(L_{t+1}+\widehat{V}_{N,t+1}(L))-\varphi_{t}(L_{t+1}+V_{t+1})||_{2}.

By our assumptions, both of these terms go to zero, as $\varphi_{t}(L_{t+1}+V_{t+1}(L))$ is a function of the state $S_{t}$ which lies in $L^{2}(\mathcal{F}_{t})$. ∎

Proof of Lemma 2.

We first note that if $\widehat{\beta}^{(M)}_{t,N,Z_{M,t}}\to\beta_{t,N,Z_{t}}$ in probability, then $||(\beta_{t,N,Z_{t}})^{\mathrm{T}}\mathbf{\Phi}_{t,N}-(\widehat{\beta}^{(M)}_{t,N,Z_{M,t}})^{\mathrm{T}}\mathbf{\Phi}_{t,N}||_{2}\to 0$ in probability, since $\widehat{\beta}^{(M)}_{t,N,Z_{M,t}}$ is $\mathcal{F}_{0}$-measurable and independent of $\mathbf{\Phi}_{t,N}$, while $\mathbf{\Phi}_{t,N}$ is independent of $D$. Hence it suffices to show that $\widehat{\beta}^{(M)}_{t,N,Z_{M,t}}\to\beta_{t,N,Z_{t}}$ in probability. Now, recalling the definition of $\widehat{\beta}^{(M)}_{t,N,Z_{M,t}}$, we rewrite (14) as

β^t,N,Zt(M)=(1M(𝚽t,N(M))T𝚽t,N(M))11M(𝚽t,N(M))TZt(M),\displaystyle\widehat{\beta}^{(M)}_{t,N,Z_{t}}=\Big{(}\frac{1}{M}\big{(}\mathbf{\Phi}^{(M)}_{t,N}\big{)}^{\mathrm{T}}\mathbf{\Phi}^{(M)}_{t,N}\Big{)}^{-1}\frac{1}{M}\big{(}\mathbf{\Phi}^{(M)}_{t,N}\big{)}^{\mathrm{T}}Z_{t}^{(M)},

Furthermore, recall the form of $\beta_{t,N,Z_{t}}$ given by (13). We first note that since, by the law of large numbers, $\frac{1}{M}\big{(}\mathbf{\Phi}^{(M)}_{t,N}\big{)}^{\mathrm{T}}\mathbf{\Phi}^{(M)}_{t,N}\to\mathbb{E}_{0}\big{[}\mathbf{\Phi}_{t,N}\mathbf{\Phi}_{t,N}^{\mathrm{T}}\big{]}$ almost surely, and thus in probability, it suffices to show that

1M(Φt,j(M)(St(i)))1iMTZt(M)𝔼0[Φt,j(St)Zt]\displaystyle\frac{1}{M}\big{(}\Phi^{(M)}_{t,j}(S_{t}^{(i)})\big{)}_{1\leq i\leq M}^{\mathrm{T}}Z_{t}^{(M)}\to\mathbb{E}_{0}[\Phi_{t,j}(S_{t})Z_{t}]

in probability for each j=1,Nj=1,\dots N. We first note that, letting ϵM(i):=ZM,t(i)zt(St(i))\epsilon_{M}^{(i)}:=Z_{M,t}^{(i)}-z_{t}(S_{t}^{(i)})

|1Mi=1MΦt,j(St(i))ZM,t(i)𝔼0[Φt,j(St)Zt]|\displaystyle\Big{|}\frac{1}{M}\sum_{i=1}^{M}\Phi_{t,j}(S_{t}^{(i)})Z_{M,t}^{(i)}-\mathbb{E}_{0}[\Phi_{t,j}(S_{t})Z_{t}]\Big{|}
|1Mi=1MΦt,j(St(i))zt(St(i))𝔼0[Φt,j(St)Zt]|+|1Mi=1MΦt,j(St(i))ϵM(i)|\displaystyle\quad\leq\Big{|}\frac{1}{M}\sum_{i=1}^{M}\Phi_{t,j}(S_{t}^{(i)})z_{t}(S_{t}^{(i)})-\mathbb{E}_{0}[\Phi_{t,j}(S_{t})Z_{t}]\Big{|}+\Big{|}\frac{1}{M}\sum_{i=1}^{M}\Phi_{t,j}(S_{t}^{(i)})\epsilon_{M}^{(i)}\Big{|}

The first summand goes to zero in probability by the law of large numbers. Thus, we investigate the second summand using Hölder’s inequality:

|1Mi=1MΦt,j(St(i))ϵM(i)|(1Mi=1M(Φt,j(St(i)))2)1/2(1Mi=1M(ϵM(i))2)1/2\displaystyle\Big{|}\frac{1}{M}\sum_{i=1}^{M}\Phi_{t,j}(S_{t}^{(i)})\epsilon_{M}^{(i)}\Big{|}\leq\Big{(}\frac{1}{M}\sum_{i=1}^{M}(\Phi_{t,j}(S_{t}^{(i)}))^{2}\Big{)}^{1/2}\Big{(}\frac{1}{M}\sum_{i=1}^{M}(\epsilon_{M}^{(i)})^{2}\Big{)}^{1/2}

We see that, again by the law of large numbers, the first factor converges to 𝔼[(Φt,j(St))2]\mathbb{E}[(\Phi_{t,j}(S_{t}))^{2}] in probability. Now we look at the second factor. By our independence assumption, ϵM(i)=𝑑Zt,MZt\epsilon_{M}^{(i)}\overset{d}{=}Z_{t,M}-Z_{t} and thus

$$\operatorname{Var}\Big(\Big(\frac{1}{M}\sum_{i=1}^M(\epsilon_M^{(i)})^2\Big)^{1/2}\,\Big|\,\mathcal{F}_0\Big)\leq\mathbb{E}_0\Big[\frac{1}{M}\sum_{i=1}^M(\epsilon_M^{(i)})^2\Big]=\|Z_{t,M}-Z_t\|_2^2,$$

which, by assumption, goes to $0$ in probability. Since, by Jensen's inequality, the conditional mean satisfies $\mathbb{E}_0\big[\big(\frac{1}{M}\sum_{i=1}^M(\epsilon_M^{(i)})^2\big)^{1/2}\big]\leq\|Z_{t,M}-Z_t\|_2\to 0$ as well, the second factor, and hence the whole expression, goes to zero in probability. This concludes the proof. ∎
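As a numerical illustration of the estimator above (not part of the proof; the two-function basis $\{1,s\}$ and the linear toy model are assumptions made purely for illustration), the following Python sketch solves the normal equations $\big(\frac{1}{M}\mathbf{\Phi}^{\mathrm{T}}\mathbf{\Phi}\big)\beta=\frac{1}{M}\mathbf{\Phi}^{\mathrm{T}}Z$ and recovers the true regression coefficients as $M$ grows, as the law-of-large-numbers argument predicts.

```python
import random

random.seed(0)

def lsm_beta(phi_rows, z):
    """Solve the 2x2 normal equations ((1/M) Phi^T Phi) beta = (1/M) Phi^T Z
    for a two-function basis, mirroring the OLS estimator above."""
    M = len(z)
    a11 = sum(p[0] * p[0] for p in phi_rows) / M
    a12 = sum(p[0] * p[1] for p in phi_rows) / M
    a22 = sum(p[1] * p[1] for p in phi_rows) / M
    b1 = sum(p[0] * zi for p, zi in zip(phi_rows, z)) / M
    b2 = sum(p[1] * zi for p, zi in zip(phi_rows, z)) / M
    det = a11 * a22 - a12 * a12
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)

# Toy model (an assumption for illustration): Z = 1 + 2*S + noise,
# basis {1, s}; the estimator should approach beta = (1, 2) as M grows.
M = 20000
S = [random.gauss(0.0, 1.0) for _ in range(M)]
Z = [1.0 + 2.0 * s + random.gauss(0.0, 0.5) for s in S]
beta = lsm_beta([(1.0, s) for s in S], Z)
```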

Proof of Theorem 1.

We prove the statement by backwards induction, starting from time $t=T$. As before, the induction base follows immediately from our assumptions. Now assume $\|\widehat{V}_{N,t+1}(L)-\widehat{V}^{(M)}_{N,t+1}(L)\|_2\to 0$ in probability, as $M\to\infty$. By $L^2$-continuity we get that $\|\varphi_t(L_{t+1}+\widehat{V}_{N,t+1}(L))-\varphi_t(L_{t+1}+\widehat{V}^{(M)}_{N,t+1}(L))\|_2\to 0$ in probability. But then by Lemma 2 we immediately get that $\|\widehat{V}_{N,t}(L)-\widehat{V}^{(M)}_{N,t}(L)\|_2\to 0$ in probability. ∎
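The backward induction analyzed in the theorem can be sketched in a few lines. In the toy example below (my own illustration; taking $\varphi_t$ to be plain conditional expectation, the basis $\{1,s\}$, and the cash flows $L_{t+1}=S_{t+1}$ of a driftless random walk are all assumptions), each step regresses $L_{t+1}+\widehat{V}_{t+1}$ on basis functions of $S_t$; the exact value functions are $v_2(s)=s$ and $v_1(s)=2s$, so the fitted slope at $t=1$ should be close to $2$ and the time-$0$ value close to $0$.

```python
import random

random.seed(0)

def ols2(phi_rows, y):
    # Least-squares fit for the two-function basis {1, s}
    # via the 2x2 normal equations.
    M = len(y)
    a11 = sum(p[0] * p[0] for p in phi_rows) / M
    a12 = sum(p[0] * p[1] for p in phi_rows) / M
    a22 = sum(p[1] * p[1] for p in phi_rows) / M
    b1 = sum(p[0] * yi for p, yi in zip(phi_rows, y)) / M
    b2 = sum(p[1] * yi for p, yi in zip(phi_rows, y)) / M
    det = a11 * a22 - a12 * a12
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)

# M paths of a driftless random walk S_0 = 0, S_{t+1} = S_t + N(0,1),
# with cash flows L_{t+1} = S_{t+1}, over T = 3 periods.
M, T = 20000, 3
S = []
for _ in range(M):
    path = [0.0]
    for _ in range(T):
        path.append(path[-1] + random.gauss(0.0, 1.0))
    S.append(path)

# Backward induction: V_T = 0, then V_t = fitted value of
# E[L_{t+1} + V_{t+1} | S_t], evaluated along each path.
V = [0.0] * M
betas = {}
for t in range(T - 1, 0, -1):
    y = [S[i][t + 1] + V[i] for i in range(M)]      # L_{t+1} + V_{t+1}
    betas[t] = ols2([(1.0, S[i][t]) for i in range(M)], y)
    b0, b1 = betas[t]
    V = [b0 + b1 * S[i][t] for i in range(M)]
V0 = sum(S[i][1] + V[i] for i in range(M)) / M      # plain average at t = 0
```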

Proof of Lemma 3.

Note that

$$
\begin{aligned}
\|\varphi_t(X)-\varphi_t(Y)\|_2^2&=\mathbb{E}_0\big[|\varphi_t(X)-\varphi_t(Y)|^2\big]\leq\mathbb{E}_0\big[K^2\,\mathbb{E}_t[|X-Y|]^2\big]\\
&\leq K^2\,\mathbb{E}_0\big[\mathbb{E}_t[|X-Y|^2]\big]=K^2\,\mathbb{E}_0[|X-Y|^2]\\
&=K^2\|X-Y\|_2^2.
\end{aligned}
$$

Here we have used Jensen's inequality at the second inequality and the tower property of the conditional expectation at the following equality. From this, $L^2$-continuity immediately follows. ∎

Proof of Lemma 4.

By Lemma 9, to construct upper and lower bounds for a quantity given by $\varphi_{t,\alpha}(\xi)$ we may find upper and lower bounds for $\rho_t(-\xi)$ and insert them into the expression for $\varphi_{t,\alpha}(\xi)$. Now take $X,Y\in L^p(\mathcal{F}_{t+1})$ and let $Z:=Y-X$. By monotonicity we get that

$$\varphi_t(X-|Z|)\leq\varphi_t(Y)\leq\varphi_t(X+|Z|).$$

We now observe that

$$
\begin{aligned}
\rho_t(-(X+|Z|))&\leq\rho_t(-X)+K\mathbb{E}_t[|Z|],\\
\rho_t(-(X-|Z|))&\geq\rho_t(-X)-K\mathbb{E}_t[|Z|].
\end{aligned}
$$

We use this to also observe that, by the subadditivity of the $(\cdot)_+$-operation,

$$
\begin{aligned}
-\mathbb{E}_t\big[(\rho_t(-X)+K\mathbb{E}_t[|Z|]-X-|Z|)_+\big]
&\leq-\mathbb{E}_t\big[(\rho_t(-X)-X)_+\big]+\mathbb{E}_t\big[(\rho_t(-|Z|)-|Z|)_+\big]\\
&\leq-\mathbb{E}_t\big[(\rho_t(-X)-X)_+\big]+\rho_t(-|Z|)\\
&\leq-\mathbb{E}_t\big[(\rho_t(-X)-X)_+\big]+K\mathbb{E}_t[|Z|].
\end{aligned}
$$

Similarly, we have that

$$
\begin{aligned}
-\mathbb{E}_t\big[(\rho_t(-X)-K\mathbb{E}_t[|Z|]-X+|Z|)_+\big]
&\geq-\mathbb{E}_t\big[(\rho_t(-X)-X)_+\big]-\mathbb{E}_t\big[(\rho_t(-|Z|)-|Z|)_+\big]\\
&\geq-\mathbb{E}_t\big[(\rho_t(-X)-X)_+\big]-\rho_t(-|Z|)\\
&\geq-\mathbb{E}_t\big[(\rho_t(-X)-X)_+\big]-K\mathbb{E}_t[|Z|].
\end{aligned}
$$

From this we get that

$$
\begin{aligned}
\varphi_t(Y)&\leq\varphi_t(X+|Z|)\leq\varphi_t(X)+K\mathbb{E}_t[|Z|]+\frac{1}{1+\eta}K\mathbb{E}_t[|Z|],\\
\varphi_t(Y)&\geq\varphi_t(X-|Z|)\geq\varphi_t(X)-K\mathbb{E}_t[|Z|]-\frac{1}{1+\eta}K\mathbb{E}_t[|Z|],
\end{aligned}
$$

from which Lipschitz continuity with respect to the constant $2K$ immediately follows, since $K+\frac{1}{1+\eta}K\leq 2K$. ∎

Proof of Lemma 5.

By subadditivity we have that

$$
\begin{aligned}
\rho_{t,M}(Y)-\rho_{t,M}(X)&\leq\rho_{t,M}(Y-X)\leq\rho_{t,M}(-|Y-X|),\\
\rho_{t,M}(X)-\rho_{t,M}(Y)&\leq\rho_{t,M}(X-Y)\leq\rho_{t,M}(-|Y-X|).
\end{aligned}
$$

Now we simply note that

$$\rho_{t,M}(-|Y-X|)=\int_0^1F^{-1}_{t,|Y-X|}(u)\,m(1-u)\,\mathrm{d}u\leq m(0)\int_0^1F^{-1}_{t,|Y-X|}(u)\,\mathrm{d}u=m(0)\,\mathbb{E}_t[|Y-X|].$$

This concludes the proof. ∎
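As a quick numerical sanity check (my own illustration, not part of the proof; the empirical quantile convention $\rho_{t,M}(-W)=\int_0^1F^{-1}_{t,W}(u)\,m(1-u)\,\mathrm{d}u$ for nonnegative $W$ and the decreasing density $m(u)=2(1-u)$ are assumptions), the bound $\rho_{t,M}(-|Y-X|)\leq m(0)\,\mathbb{E}_t[|Y-X|]$ can be verified on simulated data:

```python
import random

random.seed(1)

def spectral_rho_neg(w, m):
    """Empirical version of rho(-W) = int_0^1 F_W^{-1}(u) m(1-u) du
    for a nonnegative sample w, via its order statistics."""
    ws = sorted(w)
    n = len(ws)
    return sum(x * m(1.0 - (i + 0.5) / n) for i, x in enumerate(ws)) / n

def m(u):
    # A decreasing density on [0, 1] with m(0) = 2 (illustrative choice).
    return 2.0 * (1.0 - u)

X = [random.gauss(0.0, 1.0) for _ in range(5000)]
Y = [x + random.gauss(0.0, 0.3) for x in X]
W = [abs(y - x) for x, y in zip(X, Y)]
rho = spectral_rho_neg(W, m)
bound = m(0.0) * sum(W) / len(W)       # m(0) * E[|Y - X|]
```

Since $m(1-u)\leq m(0)$ for all $u\in[0,1]$, the inequality holds deterministically for the empirical measure, whatever the sample.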

Proof of Lemma 6.

We begin by showing (22). Let $E=\{Z\leq\operatorname{VaR}_{1-\delta}(-Z)\}$. Then:

$$
\begin{aligned}
\mathbb{P}_t(X+Z\leq y)&\geq\mathbb{P}_t(E\cap\{X+Z\leq y\})\\
&\geq\mathbb{P}_t(E\cap\{X+\operatorname{VaR}_{1-\delta}(-Z)\leq y\})\\
&\geq\mathbb{P}_t(X+\operatorname{VaR}_{1-\delta}(-Z)\leq y)-\mathbb{P}_t(E^{\complement})\\
&\geq\mathbb{P}_t(X\leq y-\operatorname{VaR}_{1-\delta}(-Z))-\delta.
\end{aligned}
$$

Putting $y=\operatorname{VaR}_{\alpha+\delta}(-X)+\operatorname{VaR}_{1-\delta}(-Z)$ yields

$$\mathbb{P}_t(X\leq y-\operatorname{VaR}_{1-\delta}(-Z))-\delta\geq\alpha+\delta-\delta=\alpha.$$

Hence $\operatorname{VaR}_{t,\alpha}(-(X+Z))\leq\operatorname{VaR}_{\alpha+\delta}(-X)+\operatorname{VaR}_{1-\delta}(-Z)$. We now prove (23) by applying (22):

$$
\begin{aligned}
\operatorname{VaR}_{t,\alpha-\delta}(-X)&=\operatorname{VaR}_{t,\alpha-\delta}\big(-((X+Z)+(-Z))\big)\\
&\leq\operatorname{VaR}_{t,\alpha}(-(X+Z))+\operatorname{VaR}_{1-\delta}(Z),
\end{aligned}
$$

from which we get (23). ∎
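Since the proof of (22) uses only elementary properties of the underlying probability measure, the inequality also holds exactly under an empirical measure, which gives a simple numerical check (my own illustration; the lower-quantile convention for $\operatorname{VaR}_u(-W)$ and the Gaussian samples are assumptions):

```python
import math
import random

random.seed(2)

def var_neg(w, u):
    """VaR_u(-W) under the empirical measure of the sample w:
    the u-quantile inf{y : P(W <= y) >= u}."""
    ws = sorted(w)
    k = max(math.ceil(u * len(ws)), 1)
    return ws[k - 1]

alpha, delta = 0.9, 0.05
n = 4000
X = [random.gauss(0.0, 1.0) for _ in range(n)]
Z = [random.gauss(0.0, 2.0) for _ in range(n)]
XZ = [x + z for x, z in zip(X, Z)]     # paired samples of X + Z
lhs = var_neg(XZ, alpha)               # VaR_alpha(-(X+Z))
rhs = var_neg(X, alpha + delta) + var_neg(Z, 1 - delta)
```

Because the argument above applies verbatim to the empirical measure on the paired sample, `lhs <= rhs` holds deterministically, not just with high probability.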

Proof of Corollary 1.

Let $Z=Y-X$. Now we simply note that, for any $\delta\in(0,1/2)$ with $\alpha+\delta<1$,

$$
\begin{aligned}
\operatorname{VaR}_{t,\alpha}(-(X+Z))&\leq\operatorname{VaR}_{t,\alpha}(-(X+|Z|))\\
&\leq\operatorname{VaR}_{t,\alpha+\delta}(-X)+\operatorname{VaR}_{t,1-\delta}(-|Z|).
\end{aligned}
$$

By Markov’s inequality, we may bound the latter summand:

$$\operatorname{VaR}_{t,1-\delta}(-|Z|)\leq\frac{1}{\delta}\mathbb{E}_t[|Z|].$$

Now for the lower bound, we similarly note

$$
\begin{aligned}
\operatorname{VaR}_{t,\alpha}(-(X+Z))&\geq\operatorname{VaR}_{t,\alpha}(-(X-|Z|))\\
&\geq\operatorname{VaR}_{t,\alpha-\delta}(-X)+\operatorname{VaR}_{t,1-\delta}(|Z|),
\end{aligned}
$$

where again we may bound the second summand using Markov’s inequality:

$$\operatorname{VaR}_{t,1-\delta}(|Z|)\geq-\frac{1}{1-\delta}\mathbb{E}_t[|Z|]\geq-\frac{1}{\delta}\mathbb{E}_t[|Z|],$$

since we have assumed δ<1/2\delta<1/2. This immediately yields that, almost surely,

$$\operatorname{VaR}_{t,\alpha}(-Y)\in\Big[\operatorname{VaR}_{t,\alpha-\delta}(-X)-\frac{1}{\delta}\mathbb{E}_t[|X-Y|],\ \operatorname{VaR}_{t,\alpha+\delta}(-X)+\frac{1}{\delta}\mathbb{E}_t[|X-Y|]\Big].$$

This immediately yields our desired result. ∎

Lemma 9.

For any $X\in L^1(\mathcal{F}_{t+1})$ and $R_1,R_2\in L^0(\mathcal{F}_t)$ with $R_1\leq R_2$ a.s.,

$$R_1-\frac{1}{1+\eta}\mathbb{E}_t[(R_1-X)_+]\leq R_2-\frac{1}{1+\eta}\mathbb{E}_t[(R_2-X)_+]\quad\text{a.s.}$$
Proof of Lemma 9.

Let $R_1\leq R_2$ a.s. and let $A_1=\{R_1-X\geq 0\}$ and $A_2=\{R_2-X\geq 0\}$. Note that $A_1\subseteq A_2$ almost surely, i.e. $\mathbb{P}_t(A_1\setminus A_2)=0$ a.s. We now note that:

$$
\begin{aligned}
R_1-\frac{1}{1+\eta}\mathbb{E}_t[(R_1-X)_+]&=\Big(1-\frac{1}{1+\eta}\mathbb{P}_t(A_1)\Big)R_1+\frac{1}{1+\eta}\mathbb{E}_t[\mathbb{I}_{A_1}X],\\
R_2-\frac{1}{1+\eta}\mathbb{E}_t[(R_2-X)_+]&=\Big(1-\frac{1}{1+\eta}\mathbb{P}_t(A_2)\Big)R_2+\frac{1}{1+\eta}\mathbb{E}_t[\mathbb{I}_{A_2}X].
\end{aligned}
$$

We look at the expectation in the first expression:

$$\mathbb{E}_t[\mathbb{I}_{A_1}X]=\mathbb{E}_t[\mathbb{I}_{A_2}X]-\mathbb{E}_t[X\mathbb{I}_{A_2\setminus A_1}]\leq\mathbb{E}_t[\mathbb{I}_{A_2}X]-\mathbb{P}_t(A_2\setminus A_1)R_1.$$

We now see that

$$
\begin{aligned}
&\Big(1-\frac{1}{1+\eta}\mathbb{P}_t(A_1)\Big)R_1+\frac{1}{1+\eta}\mathbb{E}_t[\mathbb{I}_{A_1}X]\\
&\quad\leq\Big(1-\frac{1}{1+\eta}\mathbb{P}_t(A_1)\Big)R_1+\frac{1}{1+\eta}\mathbb{E}_t[\mathbb{I}_{A_2}X]-\frac{1}{1+\eta}\mathbb{P}_t(A_2\setminus A_1)R_1\\
&\quad=\Big(1-\frac{1}{1+\eta}\mathbb{P}_t(A_2)\Big)R_1+\frac{1}{1+\eta}\mathbb{E}_t[\mathbb{I}_{A_2}X]\\
&\quad\leq\Big(1-\frac{1}{1+\eta}\mathbb{P}_t(A_2)\Big)R_2+\frac{1}{1+\eta}\mathbb{E}_t[\mathbb{I}_{A_2}X].
\end{aligned}
$$

This concludes the proof. ∎
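The monotonicity of the map $R\mapsto R-\frac{1}{1+\eta}\mathbb{E}_t[(R-X)_+]$ is also easy to confirm numerically: heuristically, its slope is $1-\mathbb{P}_t(X\leq R)/(1+\eta)>0$. A small check under an empirical measure (my own illustration; the value of $\eta$ and the distribution of $X$ are assumptions):

```python
import random

random.seed(3)

eta = 0.06                     # illustrative cost-of-capital rate
X = [random.gauss(0.0, 1.0) for _ in range(10000)]

def f(R):
    """R - (1/(1+eta)) * E[(R - X)_+] under the empirical measure."""
    return R - sum(max(R - x, 0.0) for x in X) / ((1 + eta) * len(X))

grid = [i / 10.0 - 3.0 for i in range(61)]   # R from -3 to 3
vals = [f(R) for R in grid]
```

For any sample, `vals` is nondecreasing along `grid`, matching the lemma.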

Proof of Theorem 2.

Let $Z=Y-X$. Note that

$$\varphi_{t,\alpha}(Y)=\varphi_{t,\alpha}(X+Z)\leq\varphi_{t,\alpha}(X+|Z|).$$

As for the $\operatorname{VaR}$-part of $\varphi_{t,\alpha}$, including that in the expectation, we note that by Lemma 6, $\operatorname{VaR}_{t,\alpha}(-(X+|Z|))\leq\operatorname{VaR}_{t,\alpha+\delta}(-X)+\operatorname{VaR}_{t,1-\delta}(-|Z|)$. We now note that, by subadditivity of $x\mapsto(x)_+$,

$$
\begin{aligned}
&-\mathbb{E}_t\big[(\operatorname{VaR}_{t,\alpha+\delta}(-X)+\operatorname{VaR}_{t,1-\delta}(-|Z|)-X-|Z|)_+\big]\\
&\quad\leq-\mathbb{E}_t\big[(\operatorname{VaR}_{t,\alpha+\delta}(-X)-X)_+\big]+\mathbb{E}_t\big[(\operatorname{VaR}_{t,1-\delta}(-|Z|)-|Z|)_+\big]\\
&\quad\leq-\mathbb{E}_t\big[(\operatorname{VaR}_{t,\alpha+\delta}(-X)-X)_+\big]+\operatorname{VaR}_{t,1-\delta}(-|Z|).
\end{aligned}
$$

Hence

$$
\begin{aligned}
\varphi_{t,\alpha}(X+|Z|)&\leq\varphi_{t,\alpha+\delta}(X)+\frac{2+\eta}{1+\eta}\operatorname{VaR}_{t,1-\delta}(-|Z|)\\
&\leq\varphi_{t,\alpha+\delta}(X)+\frac{2}{\delta}\mathbb{E}_t[|Z|].
\end{aligned}
$$

Here we have used the Markov inequality bound from the proof of Corollary 1.

We now similarly construct a lower bound for φt,α(Y)\varphi_{t,\alpha}(Y):

$$\varphi_{t,\alpha}(Y)=\varphi_{t,\alpha}(X+Z)\geq\varphi_{t,\alpha}(X-|Z|).$$

Again, for the $\operatorname{VaR}$-part, we note that by Lemma 6, $\operatorname{VaR}_{t,\alpha}(-(X-|Z|))\geq\operatorname{VaR}_{t,\alpha-\delta}(-X)+\operatorname{VaR}_{t,1-\delta}(|Z|)$. We now analyze the resulting expected value part, using subadditivity of $x\mapsto(x)_+$:

$$
\begin{aligned}
&-\mathbb{E}_t\big[(\operatorname{VaR}_{t,\alpha-\delta}(-X)+\operatorname{VaR}_{t,1-\delta}(|Z|)-X+|Z|)_+\big]\\
&\quad\geq-\mathbb{E}_t\big[(\operatorname{VaR}_{t,\alpha-\delta}(-X)-X)_+\big]-\mathbb{E}_t\big[(\operatorname{VaR}_{t,1-\delta}(|Z|)+|Z|)_+\big]\\
&\quad\geq-\mathbb{E}_t\big[(\operatorname{VaR}_{t,\alpha-\delta}(-X)-X)_+\big]-\mathbb{E}_t[|Z|].
\end{aligned}
$$

Hence we get the lower bound

$$
\begin{aligned}
\varphi_{t,\alpha}(X-|Z|)&\geq\varphi_{t,\alpha-\delta}(X)+\operatorname{VaR}_{t,1-\delta}(|Z|)-\frac{1}{1+\eta}\mathbb{E}_t[|Z|]\\
&\geq\varphi_{t,\alpha-\delta}(X)-\frac{2}{\delta}\mathbb{E}_t[|Z|].
\end{aligned}
$$

Here, again, we have used the Markov inequality bound from the proof of Corollary 1. Hence we have shown that

$$\varphi_{t,\alpha}(Y)\in\Big[\varphi_{t,\alpha-\delta}(X)-\frac{2}{\delta}\mathbb{E}_t[|X-Y|],\ \varphi_{t,\alpha+\delta}(X)+\frac{2}{\delta}\mathbb{E}_t[|X-Y|]\Big],$$

from which (24) immediately follows. ∎
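Since the argument only combines Lemma 6, Lemma 9 and Markov's inequality, all of which hold exactly under an empirical measure, the resulting sandwich can be checked directly on simulated data. The sketch below is my own illustration (the quantile convention for $\operatorname{VaR}$ and the parameter values are assumptions):

```python
import math
import random

random.seed(4)

eta, alpha, delta = 0.06, 0.9, 0.04    # illustrative parameters
n = 5000
X = [random.gauss(0.0, 1.0) for _ in range(n)]
Y = [x + random.gauss(0.0, 0.2) for x in X]   # Y close to X in L^1

def var_neg(w, u):
    # Empirical u-quantile inf{y : P(W <= y) >= u}.
    ws = sorted(w)
    return ws[max(math.ceil(u * len(ws)), 1) - 1]

def phi(w, a):
    """phi_a(W) = VaR_a(-W) - (1/(1+eta)) E[(VaR_a(-W) - W)_+],
    computed under the empirical measure."""
    r = var_neg(w, a)
    return r - sum(max(r - wi, 0.0) for wi in w) / ((1 + eta) * len(w))

ez = sum(abs(y - x) for x, y in zip(X, Y)) / n   # E[|X - Y|]
lo = phi(X, alpha - delta) - 2.0 * ez / delta
hi = phi(X, alpha + delta) + 2.0 * ez / delta
mid = phi(Y, alpha)
```

On the paired empirical sample, `lo <= mid <= hi` holds deterministically, mirroring the theorem.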

Proof of Corollary 2.

Choose a sequence $\delta_n\to 0$ such that $\frac{2}{\delta_n^2}\mathbb{E}[|X-X_n|^2]\to 0$, with $\alpha+\delta_n<1$ and $\delta_n<1/2$. We now use the following inequality, which follows from the monotonicity of $\varphi_{t,\alpha}$ in $\alpha$:

$$
\begin{aligned}
|\varphi_{t,\alpha}(X)-\varphi_{t,\alpha}(X_n)|
&\leq\varphi_{t,\alpha+\delta_n}(X)-\varphi_{t,\alpha-\delta_n}(X)+\inf_{|\epsilon|<\delta_n}|\varphi_{t,\alpha+\epsilon}(X)-\varphi_{t,\alpha}(X_n)|\\
&\leq\varphi_{t,\alpha+\delta_n}(X)-\varphi_{t,\alpha-\delta_n}(X)+\frac{2}{\delta_n}\mathbb{E}[|X-X_n|\mid\mathcal{H}_t].
\end{aligned}
$$

By $L^2$-convergence and our choice of $\delta_n$, the last term goes to $0$.

As for the first summand, we see that for any sequence $\delta_n\to 0$, $\varphi_{t,\alpha+\delta_n}(X)-\varphi_{t,\alpha-\delta_n}(X)\to 0$ almost surely (by the continuity assumption on $\operatorname{VaR}_{t,u}$ at $\alpha$), and furthermore it is a decreasing sequence of nonnegative random variables in $L^2(\mathcal{H}_t)$. Hence by Lebesgue's monotone convergence theorem $\|\varphi_{t,\alpha+\delta_n}(X)-\varphi_{t,\alpha-\delta_n}(X)\|_2\to 0$. This concludes the proof. ∎

Proof of Lemma 7.

By Corollary 2, $\varphi_{t,\alpha}$ is $L^2$-continuous with respect to limit objects with a.s. continuous $t$-conditional distributions. Hence the proof is completely analogous to that of Lemma 1. ∎

Proof of Lemma 8.

Fix $\epsilon>0$. We want to show that $\mathbb{P}(\|\varphi_{t,\alpha}(\beta^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})-\varphi_{t,\alpha}(\beta_n^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})\|_2>\epsilon)\to 0$ as $n\to\infty$. We first note that, for any $\delta\in(0,1-\alpha)$ with $\delta<1/2$, we have an inequality similar to that in the proof of Corollary 2:

$$
\begin{aligned}
&\|\varphi_{t,\alpha}(\beta^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})-\varphi_{t,\alpha}(\beta_n^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})\|_2\\
&\quad\leq\|\varphi_{t,\alpha-\delta}(\beta^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})-\varphi_{t,\alpha+\delta}(\beta^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})\|_2\\
&\qquad+\inf_{|\xi|\leq\delta}\|\varphi_{t,\alpha+\xi}(\beta^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})-\varphi_{t,\alpha}(\beta_n^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})\|_2.
\end{aligned}
$$

If we look at the first summand, we see that for any sequence $\delta_n\to 0$, $\varphi_{t,\alpha+\delta_n}(\beta^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})-\varphi_{t,\alpha-\delta_n}(\beta^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})\to 0$ almost surely (by a.s. continuity), and furthermore it is a decreasing sequence of nonnegative random variables. Hence by Lebesgue's monotone convergence theorem $\|\varphi_{t,\alpha+\delta_n}(\beta^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})-\varphi_{t,\alpha-\delta_n}(\beta^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})\|_2\to 0$ as a sequence of constants, since the expression is independent of $D$.

We now apply Theorem 2 to the second term to see that

$$
\begin{aligned}
\inf_{|\xi|\leq\delta}\|\varphi_{t,\alpha+\xi}(\beta^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})-\varphi_{t,\alpha}(\beta_n^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})\|_2
&\leq\frac{2}{\delta}\|(\beta-\beta_n)^{\mathrm{T}}\mathbf{\Phi}_{t+1,N}\|_2\\
&\leq\frac{2}{\delta}\|\beta-\beta_n\|_\infty\|\mathbf{\Phi}_{t+1,N}\|_2.
\end{aligned}
$$

Note that $K_\Phi:=\|\mathbf{\Phi}_{t+1,N}\|_2$ is just a constant. Since $\|\beta-\beta_n\|_\infty\to 0$ in probability, for any fixed $\epsilon>0$ it is possible to choose a sequence $\delta_n\to 0$ such that $\mathbb{P}\big(\frac{2K_\Phi}{\delta_n}\|\beta-\beta_n\|_\infty>\epsilon\big)\to 0$. Hence, for any fixed $\epsilon>0$, $\mathbb{P}(\|\varphi_{t,\alpha}(\beta^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})-\varphi_{t,\alpha}(\beta_n^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})\|_2>\epsilon)\to 0$ as $n\to\infty$. ∎

Proof of Theorem 3.

We prove the statement by backwards induction, starting from time $t=T$. The induction base is trivial. Now assume that the statement holds for time $t+1$. But then, by Lemma 8, $\|\varphi_{t,\alpha}(L_{t+1}+\widehat{V}_{N,t+1,\alpha}(L))-\varphi_{t,\alpha}(L_{t+1}+\widehat{V}^{(M)}_{N,t+1,\alpha}(L))\|_2\to 0$ in probability. Hence, we get immediately by Lemma 2 that $\|\widehat{V}_{N,t,\alpha}(L)-\widehat{V}^{(M)}_{N,t,\alpha}(L)\|_2\to 0$ in probability. This concludes the proof. ∎

References

  • [1] Philippe Artzner, Freddy Delbaen, Jean-Marc Eber, David Heath and Hyejin Ku (2007), Coherent multiperiod risk adjusted values and Bellman’s principle. Annals of Operations Research, 152, 5-22.
  • [2] D. Barrera, S. Crépey, B. Diallo, G. Fort, E. Gobet and U. Stazhynski (2018), Stochastic approximation schemes for economic capital and risk margin computations. HAL <hal-01710394>
  • [3] Mark Broadie, Yiping Du and Ciamac C. Moallemi (2015), Risk estimation via regression. Operations Research, 63 (5) 1077-1097.
  • [4] Patrick Cheridito, Freddy Delbaen and Michael Kupper (2006), Dynamic monetary risk measures for bounded discrete-time processes. Electronic Journal of Probability, 11, 57-106.
  • [5] Patrick Cheridito and Michael Kupper (2011), Composition of time-consistent dynamic monetary risk measures in discrete time. International Journal of Theoretical and Applied Finance, 14 (1), 137-162.
  • [6] Patrick Cheridito and Michael Kupper (2009), Recursiveness of indifference prices and translation-invariant preferences. Mathematics and Financial Economics, 2 (3), 173-188.
  • [7] Emmanuelle Clément, Damien Lamberton and Philip Protter (2002), An analysis of a least squares regression method for American option pricing. Finance and Stochastics, 6, 449-471.
  • [8] European Commission (2015), Commission delegated regulation (EU) 2015/35 of 10 October 2014. Official Journal of the European Union.
  • [9] Łukasz Delong, Jan Dhaene and Karim Barigou (2019), Fair valuation of insurance liability cash-flow streams in continuous time: Applications. ASTIN Bulletin, 49 (2), 299-333.
  • [10] Kai Detlefsen and Giacomo Scandolo (2005), Conditional and dynamic convex risk measures. Finance and Stochastics, 9, 539-561.
  • [11] Jan Dhaene, Ben Stassen, Karim Barigou, Daniël Linders and Ze Chen (2017), Fair valuation of insurance liabilities: Merging actuarial judgement and market-consistency. Insurance: Mathematics and Economics, 76, 14-27.
  • [12] Hampus Engsner, Mathias Lindholm and Filip Lindskog (2017), Insurance valuation: A computable multi-period cost-of-capital approach. Insurance: Mathematics and Economics, 72, 250-264.
  • [13] Hampus Engsner and Filip Lindskog (2020), Continuous-time limits of multi-period cost-of-capital margins. Statistics and Risk Modelling, forthcoming (DOI: https://doi.org/10.1515/strm-2019-0008)
  • [14] Daniel Egloff (2005), Monte Carlo algorithms for optimal stopping and statistical learning, The Annals of Applied Probability, 15 (2), 1396-1432.
  • [15] Hans Föllmer and Alexander Schied (2016), Stochastic finance: An introduction in discrete time, 4th edition, De Gruyter Graduate.
  • [16] Francis A. Longstaff and Eduardo S. Schwartz (2001), Valuing American options by simulation: A simple least-squares approach. The Review of Financial Studies, 14 (1), 113-147.
  • [17] Christoph Möhr (2011), Market-consistent valuation of insurance liabilities by cost of capital. ASTIN Bulletin, 41, 315-341.
  • [18] Antoon Pelsser and Ahmad Salahnejhad Ghalehjooghi (2020). Time-consistent and market-consistent actuarial valuation of the participating pension contract. Scandinavian Actuarial Journal, forthcoming (DOI: https://doi.org/10.1080/03461238.2020.1832911)
  • [19] Antoon Pelsser and Ahmad Salahnejhad Ghalehjooghi (2016). Time-consistent actuarial valuations. Insurance: Mathematics and Economics, 66, 97-112.
  • [20] Antoon Pelsser and Mitja Stadje (2014), Time-consistent and market-consistent evaluations. Mathematical Finance, 24 (1), 25-65.
  • [21] Lars Stentoft (2004), Convergence of the least squares Monte Carlo approach to American option valuation. Management Science, 50 (9), 1193-1203.
  • [22] John N. Tsitsiklis and Benjamin Van Roy (2001), Regression methods for pricing complex American-style options. IEEE Transactions on Neural Networks, 12 (4), 694-703.
  • [23] Daniel Z. Zanger (2009), Convergence of a least-squares Monte Carlo algorithm for bounded approximating sets. Applied Mathematical Finance, 16 (2), 123-150.
  • [24] Daniel Z. Zanger (2013), Quantitative error estimates for a least-squares Monte Carlo algorithm for American option pricing. Finance and Stochastics, 17, 503-534.
  • [25] Daniel Z. Zanger (2018), Convergence of a least-squares Monte Carlo algorithm for American option pricing with dependent sample data. Mathematical Finance, 28 (1), 447-479.