
Least Squares Monte Carlo applied to Dynamic Monetary Utility Functions

Hampus Engsner111Corresponding author. Department of Mathematics, Stockholm University, SE-10691 Stockholm, Sweden. e-mail: hampus.engsner@math.su.se
Abstract

In this paper we explore ways of numerically computing recursive dynamic monetary risk measures and utility functions. Computationally, this problem suffers from the curse of dimensionality and nested simulations are unfeasible if there are more than two time steps. The approach considered in this paper is to use a Least Squares Monte Carlo (LSM) algorithm to tackle this problem, a method which has been primarily considered for valuing American derivatives, or more general stopping time problems, as these also give rise to backward recursions with corresponding challenges in terms of numerical computation. We give some overarching consistency results for the LSM algorithm in a general setting as well as explore numerically its performance for recursive Cost-of-Capital valuation, a special case of a dynamic monetary utility function.

keywords: Monte Carlo algorithms, least-squares regression, multi-period valuation, dynamic utility functions

1 Introduction

Dynamic monetary risk measures and utility functions, as described for instance in [1] and [4], are time consistent if and only if they satisfy a recursive relationship (see for instance [5], [19]). In the case of time-consistent valuations of cash flows, often in an insurance setting (e.g. [12], [13], [17], [19], [20], [18], [11]), analogous recursions also appear. Recursive relationships also occur as properties of solutions to optimal stopping problems, of which valuation of American derivatives is a special case. It is well known that numerical solutions to these kinds of recursions suffer from “the curse of dimensionality”: as the underlying stochastic process generating the flow of information becomes high-dimensional, direct computations of solutions of these recursions prove unfeasible.

To make the objective of this paper clearer, consider a probability space (Ω,,)(\Omega,\mathcal{F},\mathbb{P}), a dd-dimensional Markov chain (St)t=0T(S_{t})_{t=0}^{T} in L2()L^{2}(\mathcal{F}) and its natural filtration (t)t=0T(\mathcal{F}_{t})_{t=0}^{T}. We are interested in computing V0V_{0} given as the solution to the following recursion

Vt=φt(f(St+1)+Vt+1),VT=0,\displaystyle V_{t}=\varphi_{t}(f(S_{t+1})+V_{t+1}),\quad V_{T}=0, (1)

where, for each tt, φt:L2(t+1)L2(t)\varphi_{t}:L^{2}(\mathcal{F}_{t+1})\to L^{2}(\mathcal{F}_{t}) is a given law-invariant mapping (see Section 2 and Definition 3). Recursions such as (1) arise when describing time-consistent dynamic monetary risk measures/utility functions (see e.g. [4]). Alternatively, we may be interested in computing V0V_{0} given as the solution to the following recursion

Vt=f(St)𝔼[Vt+1t],VT=f(ST),\displaystyle V_{t}=f(S_{t})\lor\mathbb{E}[V_{t+1}\mid\mathcal{F}_{t}],\quad V_{T}=f(S_{T}), (2)

where f:df:\mathbb{R}^{d}\to\mathbb{R} is a given function. Recursions such as (2) arise when solving discrete-time optimal stopping problems or valuing American-style derivatives (see e.g. [14] and [16]). In this article we will focus on the recursive expression (1). In either case, due to the Markovian assumptions, we expect VtV_{t} to be determined by some deterministic function of the state StS_{t} at time tt. The curse of dimensionality can now be succinctly put as the statement that as the dimension dd grows, direct computation of VtV_{t} often becomes unfeasible. Additionally, brute-force valuation via a nested Monte Carlo simulation, discussed in [3] and [2], is only a feasible option when T=2T=2, as the number of required simulations would grow exponentially with TT. One approach to tackle this problem is the Least Squares Monte Carlo (LSM) algorithm, notably used in [16] to value American-style derivatives, which consists of approximating Vt+1V_{t+1} in either (1) or (2) as a linear combination of basis functions of the state St+1S_{t+1} via least-squares regression. While most often considered for optimal stopping problems ([16], [22], [7], [21], [14], [23], [24], [25]), it has also been used recently in [18] for the purpose of actuarial valuation, with respect to a recursive relationship in line with (1).

The paper is organized as follows. In Section 2 we introduce the mathematical definitions and notation that allow us to describe the LSM algorithm in our setting mathematically, as well as to formulate theoretical results. Section 3 contains consistency results with respect to computing (1), both in a general setting, requiring only an assumption of continuity in L2L^{2} norm, and for the special case of a Cost-of-Capital valuation, studied in [12] and [13], under the assumption that capital requirements are given by the risk measure Value-at-Risk, in line with Solvency II. The lack of convenient continuity properties of Value-at-Risk poses certain challenges, which we address. Section 4 investigates the numerical performance of the LSM algorithm on valuation problems for a set of models for liability cash flows. Here some effort is also put into evaluating and validating the LSM algorithm’s performance, as this is not trivial for the considered cases.

2 Mathematical setup

Consider a probability space (Ω,,)(\Omega,\mathcal{F},\mathbb{P}). On this space we consider two filtrations (t)t=0T(\mathcal{F}_{t})_{t=0}^{T}, with 0={,Ω}\mathcal{F}_{0}=\{\emptyset,\Omega\}, and (t)t=0T(\mathcal{H}_{t})_{t=0}^{T}. The latter filtration is an initial expansion of the former: take 𝒟\mathcal{D}\subset\mathcal{F} and set t:=t𝒟\mathcal{H}_{t}:=\mathcal{F}_{t}\vee\mathcal{D}. 𝒟\mathcal{D} will later correspond to the σ\sigma-field generated by initially simulated data needed for numerical approximations. Define L2(t)L^{2}(\mathcal{H}_{t}) as the space of t\mathcal{H}_{t}-measurable random variables ZZ with 𝔼[Z2]<\mathbb{E}[Z^{2}]<\infty. The subspace L2(t)L2(t)L^{2}(\mathcal{F}_{t})\subset L^{2}(\mathcal{H}_{t}) is defined analogously. All equalities and inequalities between random variables should be interpreted in the \mathbb{P}-almost sure sense.

We assume that the probability space supports a Markov chain S=(St)t=0TS=(S_{t})_{t=0}^{T} on (d)T(\mathbb{R}^{d})^{T}, where S0S_{0} is constant, and an iid sequence D=(S(i))iD=(S^{(i)})_{i\in\mathbb{N}}, independent of SS, where, for each ii, S(i)=(St(i))t=0TS^{(i)}=(S^{(i)}_{t})_{t=0}^{T} has independent components with (St(i))=(St)\mathcal{L}(S^{(i)}_{t})=\mathcal{L}(S_{t}) (equal in distribution). DD will represent possible initially simulated data and we set 𝒟=σ(D)\mathcal{D}=\sigma(D). The actual simulated data will be a finite sample and we write Dn:=(S(i))i=1nD_{n}:=(S^{(i)})_{i=1}^{n}. For ZL2()Z\in L^{2}(\mathcal{F}) we write Z2:=𝔼[|Z|20]12\|Z\|_{2}:=\mathbb{E}[|Z|^{2}\mid\mathcal{H}_{0}]^{\frac{1}{2}}. Notice that Z2\|Z\|_{2} is a nonrandom number if ZZ is independent of DD.
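As a concrete illustration of this simulation setup, the following sketch (our own minimal example; the paper does not fix a model) draws non-path-dependent training data D_M for a one-dimensional setting where S is a Gaussian random walk: for each t it draws M iid copies from the marginal law of S_t, independently across t, in line with the requirement that each copy S^(i) have independent components with the correct marginal distributions.

```python
import numpy as np

def simulate_training_data(T, d, M, rng):
    """Training data D_M for a d-dimensional Gaussian random walk S with
    S_0 = 0: for each t = 1, ..., T, M iid draws from the marginal law of
    S_t (which is N(0, t*I)), drawn independently across t.  This mirrors
    the assumption that each copy S^(i) has independent components with
    the law of S_t^(i) equal to that of S_t, i.e. the data are not
    path dependent."""
    return [np.sqrt(t) * rng.normal(size=(M, d)) for t in range(1, T + 1)]

rng = np.random.default_rng(0)
T, d, M = 4, 2, 50_000
D_M = simulate_training_data(T, d, M, rng)  # D_M[t - 1] holds the draws of S_t
```

The random-walk dynamics and the sample size are illustrative assumptions; any Markov chain with samplable marginals fits the same scheme.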

The mappings ρt\rho_{t} and φt\varphi_{t} appearing in Definitions 1 and 2 below can be defined analogously as mappings from Lp(t+1)L^{p}(\mathcal{H}_{t+1}) to Lp(t)L^{p}(\mathcal{H}_{t}) for p2p\neq 2. However, p=2p=2 will be the relevant choice for the applications treated subsequently.

Definition 1.

A dynamic monetary risk measure is a sequence (ρt)t=0T1(\rho_{t})_{t=0}^{T-1} of mappings ρt:L2(t+1)L2(t)\rho_{t}:L^{2}(\mathcal{H}_{t+1})\to L^{2}(\mathcal{H}_{t}) satisfying

if λL2(t) and YL2(t+1), then ρt(Y+λ)=ρt(Y)λ,\displaystyle\textrm{if }\lambda\in L^{2}(\mathcal{H}_{t})\textrm{ and }Y\in L^{2}(\mathcal{H}_{t+1}),\textrm{ then }\rho_{t}(Y+\lambda)=\rho_{t}(Y)-\lambda, (3)
if Y,Y~L2(t+1) and YY~, then ρt(Y)ρt(Y~),\displaystyle\textrm{if }Y,\widetilde{Y}\in L^{2}(\mathcal{H}_{t+1})\textrm{ and }Y\leq\widetilde{Y},\textrm{ then }\rho_{t}(Y)\geq\rho_{t}(\widetilde{Y}), (4)
ρt(0)=0.\displaystyle\rho_{t}(0)=0. (5)

The elements ρt\rho_{t} of the dynamic monetary risk measure (ρt)t=0T1(\rho_{t})_{t=0}^{T-1} are called conditional monetary risk measures.

Definition 2.

A dynamic monetary utility function is a sequence (φt)t=0T1(\varphi_{t})_{t=0}^{T-1} of mappings φt:L2(t+1)L2(t)\varphi_{t}:L^{2}(\mathcal{H}_{t+1})\to L^{2}(\mathcal{H}_{t}) satisfying

if λL2(t) and YL2(t+1), then φt(Y+λ)=φt(Y)+λ,\displaystyle\textrm{if }\lambda\in L^{2}(\mathcal{H}_{t})\textrm{ and }Y\in L^{2}(\mathcal{H}_{t+1}),\textrm{ then }\varphi_{t}(Y+\lambda)=\varphi_{t}(Y)+\lambda, (6)
if Y,Y~L2(t+1) and YY~, then φt(Y)φt(Y~),\displaystyle\textrm{if }Y,\widetilde{Y}\in L^{2}(\mathcal{H}_{t+1})\textrm{ and }Y\leq\widetilde{Y},\textrm{ then }\varphi_{t}(Y)\leq\varphi_{t}(\widetilde{Y}), (7)
φt(0)=0.\displaystyle\varphi_{t}(0)=0. (8)

Note that if (ρt)t=0T1(\rho_{t})_{t=0}^{T-1} is a dynamic monetary risk measure, (ρt())t=0T1(\rho_{t}(-\cdot))_{t=0}^{T-1} is a dynamic monetary utility function. In what follows we will focus on dynamic monetary utility functions of the form

φt(Y)=ρt(Y)11+ηt𝔼[(ρt(Y)Y)+t],\displaystyle\varphi_{t}(Y)=\rho_{t}(-Y)-\frac{1}{1+\eta_{t}}\mathbb{E}\big{[}\big{(}\rho_{t}(-Y)-Y\big{)}^{+}\mid\mathcal{H}_{t}\big{]}, (9)

where (ρt)t=0T1(\rho_{t})_{t=0}^{T-1} is a dynamic monetary risk measure in the sense of Definition 1 and (ηt)t=0T1(\eta_{t})_{t=0}^{T-1} is a sequence of nonrandom numbers in (0,1)(0,1). We may consider a more general version of this dynamic monetary utility function by allowing (ηt)t=0T1(\eta_{t})_{t=0}^{T-1} to be an (t)t=0T1(\mathcal{F}_{t})_{t=0}^{T-1}-adapted sequence; however, we choose the simpler version here. That (φt)t=0T1(\varphi_{t})_{t=0}^{T-1} is indeed a dynamic monetary utility function is shown in [12].
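For intuition, at t = 0, where the σ-field is trivial, the map (9) can be evaluated directly from a Monte Carlo sample. The sketch below (ours, for illustration) takes ρ to be Value-at-Risk at level α, so that ρ(−Y) is the empirical (1 − α)-quantile of Y; the values α = 0.01 and η = 0.06 are illustrative choices.

```python
import numpy as np

def phi_0(y, alpha=0.01, eta=0.06):
    """Empirical version of (9) at t = 0 with rho = VaR at level alpha:
    phi_0(Y) = VaR(-Y) - E[(VaR(-Y) - Y)^+] / (1 + eta)."""
    var = np.quantile(y, 1.0 - alpha)   # rho_0(-Y): empirical (1-alpha)-quantile
    return var - np.mean(np.maximum(var - y, 0.0)) / (1.0 + eta)

rng = np.random.default_rng(1)
y = rng.normal(size=100_000)  # illustrative cash-flow sample
v0 = phi_0(y)
```

Note that cash additivity (6) and normalization (8) hold exactly for this empirical version: shifting the sample by a constant shifts phi_0 by the same constant, and phi_0 of the zero sample is zero.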

We will later consider conditional monetary risk measures that are conditionally law invariant in the sense of Definition 3 below. Conditional law invariance will then be inherited by φt\varphi_{t} in (9).

Definition 3.

A mapping φt:L2(t+1)L2(t)\varphi_{t}:L^{2}(\mathcal{H}_{t+1})\to L^{2}(\mathcal{H}_{t}) is called law invariant if φt(X)=φt(Y)\varphi_{t}(X)=\varphi_{t}(Y) whenever (Xt)=(Yt)\mathbb{P}(X\in\cdot\mid\mathcal{H}_{t})=\mathbb{P}(Y\in\cdot\mid\mathcal{H}_{t}).

We now define the value process corresponding to a dynamic monetary utility function (φt)t=0T1(\varphi_{t})_{t=0}^{T-1} in the sense of Definition 2 with respect to the filtration (t)t=0T1(\mathcal{F}_{t})_{t=0}^{T-1} instead of (t)t=0T1(\mathcal{H}_{t})_{t=0}^{T-1}. The use of the smaller filtration is due to the fact that a value process of the sort appearing in Definition 4 is the theoretical object that we aim to approximate well by the methods considered in this paper.

Definition 4.

Let L:=(Lt)t=1TL:=(L_{t})_{t=1}^{T} with LtL2(t)L_{t}\in L^{2}(\mathcal{F}_{t}) for all tt. Let (φt)t=0T1(\varphi_{t})_{t=0}^{T-1} be a dynamic monetary utility function with respect to (t)t=0T1(\mathcal{F}_{t})_{t=0}^{T-1}. Let

Vt(L,φ):=φt(Lt+1+Vt+1(L,φ)),VT(L,φ):=0.\displaystyle V_{t}(L,\varphi):=\varphi_{t}(L_{t+1}+V_{t+1}(L,\varphi)),\quad V_{T}(L,\varphi):=0. (10)

We refer to Vt(L,φ)V_{t}(L,\varphi) as the time tt φ\varphi-value of LL.

Whenever it will cause no confusion, we will suppress the argument φ\varphi in Vt(L,φ)V_{t}(L,\varphi) in order to make the expressions less notationally heavy.

Remark 1.

Letting ρ\rho be a dynamic monetary risk measure and letting φ\varphi be a dynamic monetary utility function, s=1tLsVt(L,ρ)-\sum_{s=1}^{t}L_{s}-V_{t}(-L,-\rho) will be a conditional monetary risk measure on the cash flow LL in the sense of [4] and likewise s=1tLs+Vt(L,φ)\sum_{s=1}^{t}L_{s}+V_{t}(L,\varphi) will be a conditional monetary utility function in the sense of [4], with LL being interpreted as a process of incremental cash flows. If LL is a liability cash flow, we may write the risk measure of LL as s=1tLs+Vt(L,ρ)\sum_{s=1}^{t}L_{s}+V_{t}(L,\rho). Importantly, any time-consistent dynamic monetary utility function/risk measure may be written in this way (see e.g. [5]). Often convexity or subadditivity is added to the list of desired properties in Definitions 1 and 2 (see e.g. [1], [4], [5]).

2.1 The approximation framework

For t=1,,Tt=1,\dots,T, consider a sequence of functions {1,Φt,1,Φt,2,}\{1,\Phi_{t,1},\Phi_{t,2},\dots\}, where for each ii\in\mathbb{N}, Φt,i:d\Phi_{t,i}:\mathbb{R}^{d}\to\mathbb{R} has the property Φt,i(St)L2(t)\Phi_{t,i}(S_{t})\in L^{2}(\mathcal{F}_{t}) and the set {1,Φt,1(St),Φt,2(St),}\{1,\Phi_{t,1}(S_{t}),\Phi_{t,2}(S_{t}),\dots\} consists of a.s. linearly independent random variables. We define the approximation space t,N\mathcal{B}_{t,N} and its corresponding L2L^{2} projection operator Pt,N:L2(t)t,NP_{\mathcal{B}_{t,N}}:L^{2}(\mathcal{H}_{t})\to\mathcal{B}_{t,N} as follows: for NN\in\mathbb{N} and t{0,,T}t\in\{0,\dots,T\},

t,N:=span{1,Φt,1(St),,Φt,N(St)},\displaystyle\mathcal{B}_{t,N}:=\mathrm{span}\{1,\Phi_{t,1}(S_{t}),\dots,\Phi_{t,N}(S_{t})\}, (11)
Pt,NZt:=arginfBt,NZtB2.\displaystyle P_{\mathcal{B}_{t,N}}Z_{t}:=\arg\inf_{B\in\mathcal{B}_{t,N}}\|Z_{t}-B\|_{2}. (12)

Defining 𝚽t,N:=(1,Φt,1(St),,Φt,N(St))T\mathbf{\Phi}_{t,N}:=(1,\Phi_{t,1}(S_{t}),\dots,\Phi_{t,N}(S_{t}))^{\mathrm{T}}, note that the unique minimizer in (12) is given by Pt,NZt:=βt,N,ZtT𝚽t,NP_{\mathcal{B}_{t,N}}Z_{t}:=\beta_{t,N,Z_{t}}^{\mathrm{T}}\mathbf{\Phi}_{t,N}, with

βt,N,Zt=𝔼[𝚽t,N𝚽t,NT0]1𝔼[𝚽t,NZt0],\displaystyle\beta_{t,N,Z_{t}}=\mathbb{E}\big{[}\mathbf{\Phi}_{t,N}\mathbf{\Phi}_{t,N}^{{\mathrm{T}}}\mid\mathcal{H}_{0}\big{]}^{-1}\mathbb{E}\big{[}\mathbf{\Phi}_{t,N}Z_{t}\mid\mathcal{H}_{0}\big{]}, (13)

where the expected value of a vector or matrix is interpreted elementwise. Note that if ZtZ_{t} in (13) is independent of the initial data DD, then βt,N,Zt\beta_{t,N,Z_{t}} is a nonrandom vector. Indeed, we will only apply the operator Pt,NP_{\mathcal{B}_{t,N}} to random variables ZtZ_{t} independent of DD.

For each tt, consider a nonrandom function ztz_{t} such that Zt=zt(DM,St)L2(t)Z_{t}=z_{t}(D_{M},S_{t})\in L^{2}(\mathcal{H}_{t}). For MM\in\mathbb{N}, let

𝚽t,N(M):=(1Φt,1(St(1))Φt,N(St(1))1Φt,1(St(M))Φt,N(St(M))),\displaystyle\mathbf{\Phi}^{(M)}_{t,N}:=\left(\begin{array}[]{cccc}1&\Phi_{t,1}(S^{(1)}_{t})&\dots&\Phi_{t,N}(S^{(1)}_{t})\\ \vdots&\dots&\dots&\vdots\\ 1&\Phi_{t,1}(S^{(M)}_{t})&\dots&\Phi_{t,N}(S^{(M)}_{t})\end{array}\right),
Zt(M):=(zt(DM,St(1))zt(DM,St(M)))\displaystyle Z_{t}^{(M)}:=\left(\begin{array}[]{c}z_{t}(D_{M},S^{(1)}_{t})\\ \vdots\\ z_{t}(D_{M},S^{(M)}_{t})\end{array}\right)

and define

β^t,N,Zt(M)\displaystyle\widehat{\beta}^{(M)}_{t,N,Z_{t}} :=((𝚽t,N(M))T𝚽t,N(M))1(𝚽t,N(M))TZt(M),\displaystyle:=\Big{(}\big{(}\mathbf{\Phi}^{(M)}_{t,N}\big{)}^{\mathrm{T}}\mathbf{\Phi}^{(M)}_{t,N}\Big{)}^{-1}\big{(}\mathbf{\Phi}^{(M)}_{t,N}\big{)}^{\mathrm{T}}Z_{t}^{(M)}, (14)
Pt,N(M)Zt\displaystyle P_{\mathcal{B}_{t,N}}^{(M)}Z_{t} :=(β^t,N,Zt(M))T𝚽t,N.\displaystyle:=(\widehat{\beta}^{(M)}_{t,N,Z_{t}})^{\mathrm{T}}\mathbf{\Phi}_{t,N}. (15)

Notice that β^t,N,Zt(M)\widehat{\beta}^{(M)}_{t,N,Z_{t}} is independent of SS and is the standard OLS estimator of βt,N,Zt\beta_{t,N,Z_{t}} in (13). Notice also that 𝚽t,N\mathbf{\Phi}_{t,N} is independent of DD. With the above definitions we can define the Least Squares Monte Carlo (LSM) algorithm for approximating the value V0(L,φ)V_{0}(L,\varphi) given in Definition 4.
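The OLS estimator (14) can be computed without forming the matrix inverse explicitly; a least-squares solver is numerically preferable. A minimal sketch (ours; the monomial basis functions and the response are illustrative choices not prescribed by the text):

```python
import numpy as np

rng = np.random.default_rng(2)
M = 5_000
s = rng.normal(size=M)                       # sample of the state S_t

# Design matrix Phi^(M)_{t,N} with N = 3 monomial basis functions plus 1
Phi = np.column_stack([np.ones(M), s, s**2, s**3])
z = np.sin(s) + 0.1 * rng.normal(size=M)     # illustrative response z_t(D_M, S_t)

# OLS estimator (14), beta-hat = (Phi^T Phi)^{-1} Phi^T Z, via lstsq
beta_hat, *_ = np.linalg.lstsq(Phi, z, rcond=None)
fitted = Phi @ beta_hat                      # the projection evaluated on the sample
```

By construction the residual z - fitted is orthogonal to every column of the design matrix, which is the defining property of the projection in (12).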

Let (φt)t=0T1(\varphi_{t})_{t=0}^{T-1} be a sequence of law-invariant mappings φt:L2(t+1)L2(t)\varphi_{t}:L^{2}(\mathcal{H}_{t+1})\to L^{2}(\mathcal{H}_{t}). Consider a stochastic process L:=(Lt)t=1TL:=(L_{t})_{t=1}^{T}, where Lt=gt(St)L2(t)L_{t}=g_{t}(S_{t})\in L^{2}(\mathcal{F}_{t}) for all tt for some nonrandom functions gt:dg_{t}:\mathbb{R}^{d}\to\mathbb{R}. The goal is to estimate the values (Vt(L))t=0T(V_{t}(L))_{t=0}^{T} given by Definition 4. Note that the sought values Vt(L)V_{t}(L) are independent of DD and thus, by the law-invariance property and the Markov property, Vt(L)V_{t}(L) is a function of StS_{t} for each tt. Now we may describe the LSM algorithm with respect to NN basis functions and simulation sample size MM. The LSM algorithm corresponds to the following recursion:

V^N,t(M)(L):=Pt,N(M)φt(Lt+1+V^N,t+1(M)(L)),V^N,T(M)(L):=0.\displaystyle\widehat{V}^{(M)}_{N,t}(L):=P^{(M)}_{\mathcal{B}_{t,N}}\varphi_{t}\big{(}L_{t+1}+\widehat{V}^{(M)}_{N,t+1}(L)\big{)},\quad\widehat{V}^{(M)}_{N,T}(L):=0. (16)

Notice that V^N,t(M)\widehat{V}^{(M)}_{N,t} is a function of the random variables StS_{t} and Su(i)S_{u}^{(i)} for 1iM1\leq i\leq M and t+1uTt+1\leq u\leq T. In particular, V^N,t(M)L2(t)\widehat{V}^{(M)}_{N,t}\in L^{2}(\mathcal{H}_{t}). In the section below, we will investigate when, and in what manner, V^N,t(M)L2(t)\widehat{V}^{(M)}_{N,t}\in L^{2}(\mathcal{H}_{t}) may converge to VtL2(t)L2(t)V_{t}\in L^{2}(\mathcal{F}_{t})\subset L^{2}(\mathcal{H}_{t}). For this purpose, we make the additional useful definition:

V^N,t(L):=Pt,Nφt(Lt+1+V^N,t+1(L)),V^N,T(L):=0.\displaystyle\widehat{V}_{N,t}(L):=P_{\mathcal{B}_{t,N}}\varphi_{t}\big{(}L_{t+1}+\widehat{V}_{N,t+1}(L)\big{)},\quad\widehat{V}_{N,T}(L):=0. (17)

V^N,t(L)\widehat{V}_{N,t}(L) is to be interpreted as an idealized LSM estimate, in which a least-squares optimal approximation is made in each iteration. Note that this quantity is independent of DD.
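To make the recursion (16) concrete, the following sketch runs the LSM algorithm backwards in time under entirely illustrative assumptions of our own: a one-dimensional Gaussian random walk, cash flow L_t = max(S_t, 0), a monomial basis, and the Value-at-Risk-based map phi_{t,alpha} of (21) in Section 3.2. Since phi_t is conditional, each outer draw of S_t is paired with a block of K inner transition draws of S_{t+1}, on which the map is evaluated empirically before the regression step.

```python
import numpy as np

def lsm_v0(T, M, K, alpha=0.01, eta=0.06, deg=2, seed=0):
    """Sketch of the LSM recursion (16) for the map phi_{t,alpha} in (21).
    Model (illustrative): S_{t+1} = S_t + eps with standard normal eps,
    S_0 = 0, and cash flow L_t = g(S_t) = max(S_t, 0)."""
    rng = np.random.default_rng(seed)
    g = lambda s: np.maximum(s, 0.0)
    beta = np.zeros(deg + 1)                  # coefficients of V-hat_{N,T} = 0

    def basis(s):                             # 1, s, s^2, ..., s^deg
        return np.vander(s, deg + 1, increasing=True)

    for t in range(T - 1, -1, -1):
        # Outer sample: iid draws from the marginal law of S_t (N(0, t))
        s_t = np.sqrt(t) * rng.normal(size=M) if t > 0 else np.zeros(1)
        # Inner transitions: K draws of S_{t+1} given each outer point
        s_next = s_t[:, None] + rng.normal(size=(s_t.size, K))
        # Y = L_{t+1} + V-hat_{N,t+1}(S_{t+1}), evaluated on the inner draws
        y = (g(s_next.ravel()) + basis(s_next.ravel()) @ beta).reshape(s_t.size, K)
        # Empirical phi_{t,alpha}(Y) given S_t, as in (21)
        var = np.quantile(y, 1.0 - alpha, axis=1)
        phi = var - np.mean(np.maximum(var[:, None] - y, 0.0), axis=1) / (1.0 + eta)
        if t == 0:
            return phi[0]                     # trivial sigma-field: no regression
        beta, *_ = np.linalg.lstsq(basis(s_t), phi, rcond=None)  # OLS step (14)

v0 = lsm_v0(T=3, M=2_000, K=2_000)
```

The inner-simulation step is one possible way of evaluating the conditional map empirically; the recursion (16) itself only specifies the regression structure, not how the conditional functional is computed on the sample.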

3 Consistency results

In this section we prove what are essentially two consistency results for the LSM estimator V^N,t(M)(L)\widehat{V}^{(M)}_{N,t}(L), along with conditions for these to hold. These consistency results are analogous to Theorems 3.1 and 3.2 in [7]. The first and simplest result, Lemma 1, is that if we have a flexible enough class of basis functions, V^N,t(L)\widehat{V}_{N,t}(L) will asymptotically approach the true value Vt(L)V_{t}(L). The second consistency result, Theorem 1, is that when NN is kept fixed, V^N,t(M)(L)\widehat{V}^{(M)}_{N,t}(L) will approach the least-squares optimal V^N,t(L)\widehat{V}_{N,t}(L) for each tt as MM grows to infinity. Hence, we show that the LSM estimator for a fixed number of basis functions is consistent in the sense that the simulation-based projection operator Pt,N(M)P^{(M)}_{\mathcal{B}_{t,N}} will approach Pt,NP_{\mathcal{B}_{t,N}} even in the presence of errors in a multiperiod setting. Lemma 7 and Theorem 3 furthermore extend these results to the case of a Cost-of-Capital valuation, studied in [12] and [13], which here depends on the non-continuous risk measure Value-at-Risk. Note from Section 2 that these results presume the simulated data not to be path dependent, in contrast to the results in [25].

We should note that these results do not give a rate of convergence, which is provided in the optimal stopping setting in for instance [14] and [23]. In particular, these papers provide a joint convergence rate in which MM and NN simultaneously go to infinity, something which is not done here. There are three main reasons for this. First of all, the purpose of this paper is to investigate LSM methods given by standard OLS regression, i.e., we do not want to involve a truncation operator, as we believe this would be unlikely to be implemented in practice. The use of truncation operators is necessary for the results in [14] and [23], although one can handle the case of unbounded cash flows by letting the bound based on the truncation operator suitably go to infinity along with NN and MM. Secondly, we believe that the bounds involved in the rates of convergence would be quite large in our case if we repeatedly applied the procedure in [14] or [23] (see Remark 3). Thirdly, we want to consider mappings which are L2L^{2}-continuous (Definition 5) but not necessarily Lipschitz. In this case it is not clear how convergence can be established other than at some unspecified rate.

3.1 General convergence results

We first define a useful mode of continuity that we will require to show our first results on the convergence of the LSM algorithm.

Definition 5.

The mapping φt:L2(t+1)L2(t)\varphi_{t}:L^{2}(\mathcal{H}_{t+1})\to L^{2}(\mathcal{H}_{t}) is said to be L2L^{2}-continuous if XXn20 implies φt(X)φt(Xn)20\|X-X_{n}\|_{2}\stackrel{{\scriptstyle{\small\mathbb{P}}}}{{\to}}0\text{ implies }\|\varphi_{t}(X)-\varphi_{t}(X_{n})\|_{2}\stackrel{{\scriptstyle{\small\mathbb{P}}}}{{\to}}0.

Notice that if (Xn)n=1(X_{n})_{n=1}^{\infty} and XX are independent of DD, the convergence in probability may be replaced by convergence of real numbers.

We are now ready to formulate our first result on the convergence of the LSM algorithm. It essentially says that if we make the best possible estimate in each recursion step, using NN basis functions, then, for each tt, the estimator of VtV_{t} will converge in L2L^{2} to VtV_{t} as NN\to\infty. This result is not affected by the initial data DD, as it does not require any simulation-based approximation.

Lemma 1.

For t=0,,T1t=0,\dots,T-1, let the mappings φt:L2(t+1)L2(t)\varphi_{t}:L^{2}(\mathcal{H}_{t+1})\to L^{2}(\mathcal{H}_{t}) be L2L^{2}-continuous and law invariant. For t=1,,Tt=1,\dots,T, let nt,n\bigcup_{n\in\mathbb{N}}\mathcal{B}_{t,n} be dense in the set {h(St)h:d,h(St)L2(t)}\{h(S_{t})\mid h:\mathbb{R}^{d}\to\mathbb{R},h(S_{t})\in L^{2}(\mathcal{F}_{t})\}. Then, for t=0,,T1t=0,\dots,T-1,

V^N,t(L)Vt(L)2 and limNV^N,t(L)Vt(L)2=0.\displaystyle\|\widehat{V}_{N,t}(L)-V_{t}(L)\|_{2}\in\mathbb{R}\text{ and }\lim_{N\to\infty}\|\widehat{V}_{N,t}(L)-V_{t}(L)\|_{2}=0.

The second result uses the independence assumptions of DD to prove a somewhat technical result for when Pt,N(M)P_{\mathcal{B}_{t,N}}^{(M)} given by (15) asymptotically approaches the projection Pt,NP_{\mathcal{B}_{t,N}} given by (12).

Lemma 2.

Let Zt=zt(St)L2(t)Z_{t}=z_{t}(S_{t})\in L^{2}(\mathcal{F}_{t}). For each MM\in\mathbb{N}, let ZM,t=zM,t(DM,St)L2(t)Z_{M,t}=z_{M,t}(D_{M},S_{t})\in L^{2}(\mathcal{H}_{t}), where zM,t(DM,)z_{M,t}(D_{M},\cdot) only depends on DMD_{M} through {Su(i):1iM,ut+1}\{S_{u}^{(i)}:1\leq i\leq M,u\geq t+1\}, i.e.

(zM,t(DM,St(i)))=(zM,t(DM,St)).\displaystyle\mathcal{L}\big{(}z_{M,t}(D_{M},S_{t}^{(i)})\big{)}=\mathcal{L}\big{(}z_{M,t}(D_{M},S_{t})\big{)}. (18)

Then, ZtZM,t20\|Z_{t}-Z_{M,t}\|_{2}\stackrel{{\scriptstyle{\small\mathbb{P}}}}{{\to}}0 as MM\to\infty implies

Pt,NZtPt,N(M)ZM,t20andβ^t,N,ZM,t(M)βt,N,Zt as M.\displaystyle\|P_{\mathcal{B}_{t,N}}Z_{t}-P^{(M)}_{\mathcal{B}_{t,N}}Z_{M,t}\|_{2}\stackrel{{\scriptstyle{\small\mathbb{P}}}}{{\to}}0\quad\text{and}\quad\widehat{\beta}^{(M)}_{t,N,Z_{M,t}}\stackrel{{\scriptstyle{\small\mathbb{P}}}}{{\to}}\beta_{t,N,Z_{t}}\quad\text{ as }M\to\infty.
Remark 2.

Notice that V^N,t(M)(L)\widehat{V}_{N,t}^{(M)}(L) satisfies (18) due to the backwards recursive structure of the LSM algorithm (16).

Lemma 2 essentially provides the induction step in the argument used to prove the following result:

Theorem 1.

For t=0,,T1t=0,\dots,T-1, let the mappings φt:L2(t+1)L2(t)\varphi_{t}:L^{2}(\mathcal{H}_{t+1})\to L^{2}(\mathcal{H}_{t}) be L2L^{2}-continuous and law invariant, let V^N,t(M)(L)\widehat{V}_{N,t}^{(M)}(L) be given by (16), and let V^N,t(L)\widehat{V}_{N,t}(L) be given by (17). Then, for t=0,,T1t=0,\dots,T-1,

V^N,t(L)V^N,t(M)(L)20 as M.\displaystyle\|\widehat{V}_{N,t}(L)-\widehat{V}_{N,t}^{(M)}(L)\|_{2}\stackrel{{\scriptstyle{\small\mathbb{P}}}}{{\to}}0\text{ as }M\to\infty.

To summarize, Lemma 1 says that we can theoretically/asymptotically achieve arbitrarily accurate approximations, even when applying the approximation recursively, and Theorem 1 says that we may approach this theoretical best approximation in practice, with enough simulated non-path-dependent data.

Lemma 3.

If the mapping φt:L2(t+1)L2(t)\varphi_{t}:L^{2}(\mathcal{H}_{t+1})\to L^{2}(\mathcal{H}_{t}) is Lipschitz continuous in the sense that there exists a constant K>0K>0 such that

|φt(X)φt(Y)|K𝔼[|XY|t]for all X,YL2(t+1),\displaystyle|\varphi_{t}(X)-\varphi_{t}(Y)|\leq K\mathbb{E}[|X-Y|\mid\mathcal{H}_{t}]\quad\text{for all }X,Y\in L^{2}(\mathcal{H}_{t+1}), (19)

then φt\varphi_{t} is L2L^{2}-continuous in the sense of Definition 5.

Lemma 4.

If the conditional monetary risk measure ρt:L2(t+1)L2(t)\rho_{t}:L^{2}(\mathcal{H}_{t+1})\to L^{2}(\mathcal{H}_{t}) is Lipschitz continuous in the sense of (19) with Lipschitz constant KK, then so is the mapping φt\varphi_{t} given by (9) with Lipschitz constant 2K2K.

The large class of (conditional) spectral risk measures is in fact Lipschitz continuous. These conditional monetary risk measures can be expressed as

ρt,m(Y)=01Ft,Y1(u)m(u)du,\displaystyle\rho_{t,m}(Y)=-\int_{0}^{1}F^{-1}_{t,Y}(u)m(u)\mathrm{d}u, (20)

where mm is a probability density function that is decreasing, bounded and right continuous, and Ft,Y1(u)F^{-1}_{t,Y}(u) is the conditional quantile function

Ft,Y1(u):=essinf{yL0(t):(Yyt)u}.\displaystyle F^{-1}_{t,Y}(u):=\operatorname*{ess\,inf}\{y\in L^{0}(\mathcal{H}_{t}):\mathbb{P}(Y\leq y\mid\mathcal{H}_{t})\geq u\}.

It is well known that spectral risk measures are coherent and that they include expected shortfall as a special case.

Lemma 5.

If mm is a probability density function that is decreasing, bounded and right continuous, then (ρt,m)t=0T1(\rho_{t,m})_{t=0}^{T-1} is a dynamic monetary risk measure in the sense of Definition 1. Moreover, each ρt,m\rho_{t,m} is law invariant in the sense of Definition 3 and also Lipschitz continuous with constant m(0)m(0).
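As a numerical illustration of our own, the spectral representation (20) can be evaluated empirically by averaging sample quantiles over a grid of levels. With the density m(u) proportional to the indicator of u below alpha, this recovers expected shortfall, and m(0) = 1/alpha matches the Lipschitz constant of Lemma 5.

```python
import numpy as np

def spectral_rho(y, m, grid=2000):
    """Empirical version of (20): rho_m(Y) = -int_0^1 F^{-1}(u) m(u) du,
    approximated by a midpoint rule over `grid` quantile levels."""
    u = (np.arange(grid) + 0.5) / grid
    return -np.mean(np.quantile(y, u) * m(u))

alpha = 0.05
m_es = lambda u: (u <= alpha) / alpha   # density recovering expected shortfall
rng = np.random.default_rng(3)
y = rng.normal(size=400_000)
rho = spectral_rho(y, m_es)             # expected shortfall of a standard normal
```

For a standard normal sample the result should be close to the exact expected shortfall at level 0.05, which is the standard normal density evaluated at the 5% quantile divided by 0.05, approximately 2.06.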

Remark 3.

Assume that φt\varphi_{t} is Lipschitz continuous with constant KK as in (19). Then

\displaystyle\|\widehat{V}_{t}(L)-V_{t}(L)\|_{2}
\displaystyle\quad\leq\|\widehat{V}_{t}(L)-\varphi_{t}(L_{t+1}+\widehat{V}_{t+1}(L))\|_{2}
\displaystyle\quad\quad+\|\varphi_{t}(L_{t+1}+V_{t+1}(L))-\varphi_{t}(L_{t+1}+\widehat{V}_{t+1}(L))\|_{2}
\displaystyle\quad\leq\|\widehat{V}_{t}(L)-\varphi_{t}(L_{t+1}+\widehat{V}_{t+1}(L))\|_{2}+K\|V_{t+1}(L)-\widehat{V}_{t+1}(L)\|_{2},

since Vt(L)=φt(Lt+1+Vt+1(L))V_{t}(L)=\varphi_{t}(L_{t+1}+V_{t+1}(L)). Repeating this argument gives

\displaystyle\|\widehat{V}_{t}(L)-V_{t}(L)\|_{2}\leq\sum_{s=t}^{T-1}K^{s-t}\|\widehat{V}_{s}(L)-\varphi_{s}(L_{s+1}+\widehat{V}_{s+1}(L))\|_{2}.

This bound is analogous to that in [24] (Lemma 2.3; see also Remark 3.4 for how this ties in with the main result), with the exception that the constant KstK^{s-t} appears instead of 22. As KK may be quite large, this is one of the reasons for not seeking to determine the exact rate of convergence, as is done in for instance [25] and [14]. This observation also discourages judging the accuracy of the LSM algorithm purely by estimating out-of-sample one-step estimation errors of the form V^s(L)φs(Ls+1+V^s+1(L))2\|\widehat{V}_{s}(L)-\varphi_{s}(L_{s+1}+\widehat{V}_{s+1}(L))\|_{2}, as these need to be quite small in order to obtain a satisfactory error bound.

3.2 Convergence results using Value-at-Risk

In this section, we will focus on mappings φt,α:L2(t+1)L2(t)\varphi_{t,\alpha}:L^{2}(\mathcal{H}_{t+1})\to L^{2}(\mathcal{H}_{t}) given by

φt,α(Y):=VaRt,α(Y)11+ηt𝔼[(VaRt,α(Y)Y)+t]\displaystyle\varphi_{t,\alpha}(Y):=\operatorname{VaR}_{t,\alpha}(-Y)-\frac{1}{1+\eta_{t}}\mathbb{E}[(\operatorname{VaR}_{t,\alpha}(-Y)-Y)^{+}\mid\mathcal{H}_{t}] (21)

for some α(0,1)\alpha\in(0,1) and nonnegative constants (ηt)t=0T1(\eta_{t})_{t=0}^{T-1}, and where

VaRt,α(Y)\displaystyle\operatorname{VaR}_{t,\alpha}(-Y) :=Ft,Y1(1α)\displaystyle:=F^{-1}_{t,Y}(1-\alpha)
:=essinf{yL0(t):(Yyt)1α}\displaystyle:=\operatorname*{ess\,inf}\{y\in L^{0}(\mathcal{H}_{t}):\mathbb{P}(Y\leq y\mid\mathcal{H}_{t})\geq 1-\alpha\}

is the conditional version of Value-at-Risk. Note that φt,α\varphi_{t,\alpha} is a special case of the mappings φ\varphi in (9). (VaRt,α)t=0T1(\operatorname{VaR}_{t,\alpha})_{t=0}^{T-1} is a dynamic monetary risk measure in the sense of Definition 1, and VaRt,α\operatorname{VaR}_{t,\alpha} is law invariant in the sense of Definition 3. Since VaRt,α\operatorname{VaR}_{t,\alpha} is in general not Lipschitz continuous, φt,α\varphi_{t,\alpha} cannot be guaranteed to be so without further regularity conditions. The aim of this section is to find results analogous to Lemma 1 and Theorem 1.

We will use the following lemma, and especially its corollary, in lieu of L2L^{2}-continuity for Value-at-Risk:

Lemma 6.

For any X,ZL0(t+1)X,Z\in L^{0}(\mathcal{H}_{t+1}) and any δ(0,1α)\delta\in(0,1-\alpha),

VaRt,α((X+Z))VaRt,α+δ(X)+VaRt,1δ(Z),\displaystyle\operatorname{VaR}_{t,\alpha}(-(X+Z))\leq\operatorname{VaR}_{t,\alpha+\delta}(-X)+\operatorname{VaR}_{t,1-\delta}(-Z), (22)
VaRt,α((X+Z))VaRt,αδ(X)VaRt,1δ(Z).\displaystyle\operatorname{VaR}_{t,\alpha}(-(X+Z))\geq\operatorname{VaR}_{t,\alpha-\delta}(-X)-\operatorname{VaR}_{t,1-\delta}(-Z). (23)

We get an interesting corollary from this lemma:

Corollary 1.

Let α(0,1)\alpha\in(0,1) and let δ(0,1α)\delta\in(0,1-\alpha) with δ<1/2\delta<1/2. Then, for any X,YL1(t+1)X,Y\in L^{1}(\mathcal{H}_{t+1}),

inf|ϵ|<δ|VaRt,α+ϵ(X)VaRt,α(Y)|1δ𝔼[|XY|t].\displaystyle\inf_{|\epsilon|<\delta}|\operatorname{VaR}_{t,\alpha+\epsilon}(X)-\operatorname{VaR}_{t,\alpha}(Y)|\leq\frac{1}{\delta}\mathbb{E}[|X-Y|\mid\mathcal{H}_{t}].

Using these Lipschitz-like results, we can show a Lipschitz-like result for φt,α()\varphi_{t,\alpha}(\cdot).

Theorem 2.

Let α(0,1)\alpha\in(0,1) and let δ(0,1α)\delta\in(0,1-\alpha) with δ<1/2\delta<1/2. Then, for any X,YL1(t+1)X,Y\in L^{1}(\mathcal{H}_{t+1}),

inf|ϵ|<δ|φt,α+ϵ(X)φt,α(Y)|2δ𝔼[|XY|t]\displaystyle\inf_{|\epsilon|<\delta}|\varphi_{t,\alpha+\epsilon}(X)-\varphi_{t,\alpha}(Y)|\leq\frac{2}{\delta}\mathbb{E}[|X-Y|\mid\mathcal{H}_{t}] (24)

and

φt,αδ(X)2δ𝔼[|XY|t]\displaystyle\varphi_{t,\alpha-\delta}(X)-\frac{2}{\delta}\mathbb{E}[|X-Y|\mid\mathcal{H}_{t}] φt,α(Y)\displaystyle\leq\varphi_{t,\alpha}(Y)
φt,α+δ(X)+2δ𝔼[|XY|t],\displaystyle\leq\varphi_{t,\alpha+\delta}(X)+\frac{2}{\delta}\mathbb{E}[|X-Y|\mid\mathcal{H}_{t}], (25)

and (24) and (25) are equivalent.

Theorem 2 enables us to prove L2L^{2}-continuity of φt,α\varphi_{t,\alpha} under a continuity assumption.

Corollary 2.

Consider X,XnL2(t+1)X,X_{n}\in L^{2}(\mathcal{H}_{t+1}), n1n\geq 1, with XnXX_{n}\to X in L2L^{2}. Assume that (0,1)uVaRt,u(X)(0,1)\ni u\mapsto\operatorname{VaR}_{t,u}(-X) is a.s. continuous at u=αu=\alpha. Then φt,α(Xn)φt,α(X)\varphi_{t,\alpha}(X_{n})\to\varphi_{t,\alpha}(X) in L2L^{2}.

The following remark illustrates that even a stronger requirement of a.s. continuous time tt-conditional distributions should not be a great hindrance in practice:

Remark 4.

If we add to our cash flow (Lt)t=1T(L_{t})_{t=1}^{T} an adapted process (ϵt)t=1T(\epsilon_{t})_{t=1}^{T}, independent of (Lt)t=1T(L_{t})_{t=1}^{T}, such that for each tt, ϵt\epsilon_{t} is independent of t1\mathcal{F}_{t-1} and has a continuous distribution function, then the assumptions in Corollary 2 will be satisfied.
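The smoothing device of Remark 4 is easy to apply in practice: perturbing a possibly discrete simulated cash flow by small independent continuous noise yields continuous distribution functions while changing the cash-flow values only marginally. A sketch under our own illustrative assumptions (Bernoulli cash flows, Gaussian noise of scale sigma):

```python
import numpy as np

rng = np.random.default_rng(4)
T, n = 3, 100_000
# A discrete cash flow: each L_t takes only the values 0 and 1 (atoms)
L = rng.integers(0, 2, size=(n, T)).astype(float)

sigma = 1e-3                                   # illustrative noise scale
eps = sigma * rng.normal(size=(n, T))          # independent continuous noise
L_smooth = L + eps                             # (L_t + eps_t) has a continuous law
```

The perturbed samples are almost surely all distinct, while the perturbation itself stays of order sigma, so the effect on the computed value is negligible relative to Monte Carlo error.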

We are now ready to formulate a result analogous to Lemma 1.

Lemma 7.

Let α(0,1)\alpha\in(0,1) and let δ(0,1α)\delta\in(0,1-\alpha) with δ<1/2\delta<1/2. Let (φt,α)t=0T1(\varphi_{t,\alpha})_{t=0}^{T-1} be defined by (21) and let

V^N,t,α(L):=Pt,Nφt,α(Lt+1+V^N,t+1,α(L)),V^N,T,α(L):=0.\displaystyle\widehat{V}_{N,t,\alpha}(L):=P_{\mathcal{B}_{t,N}}\varphi_{t,\alpha}\big{(}L_{t+1}+\widehat{V}_{N,t+1,\alpha}(L)\big{)},\quad\widehat{V}_{N,T,\alpha}(L):=0. (26)

Let nt,n\bigcup_{n\in\mathbb{N}}\mathcal{B}_{t,n} be dense in the set {h(St)h:d,h(St)L2(t)}\{h(S_{t})\mid h:\mathbb{R}^{d}\to\mathbb{R},h(S_{t})\in L^{2}(\mathcal{F}_{t})\} and assume that (0,1)uVaRt,u(Lt+1V^N,t+1,α(L))(0,1)\ni u\mapsto\operatorname{VaR}_{t,u}(-L_{t+1}-\widehat{V}_{N,t+1,\alpha}(L)) is a.s. continuous at u=αu=\alpha for all NN\in\mathbb{N} and t=0,,T1t=0,\dots,T-1. Then, for t=0,,T1t=0,\dots,T-1,

V^N,t,α(L)Vt,α(L)2 and limNV^N,t,α(L)Vt,α(L)2=0.\displaystyle\|\widehat{V}_{N,t,\alpha}(L)-V_{t,\alpha}(L)\|_{2}\in\mathbb{R}\text{ and }\lim_{N\to\infty}\|\widehat{V}_{N,t,\alpha}(L)-V_{t,\alpha}(L)\|_{2}=0.
Lemma 8.

Let α(0,1)\alpha\in(0,1) and let (0,1)uVaRt,u(vT𝚽t+1,N)(0,1)\ni u\mapsto\operatorname{VaR}_{t,u}(v^{\mathrm{T}}\mathbf{\Phi}_{t+1,N}) be a.s. continuous at u=αu=\alpha for any vNv\in\mathbb{R}^{N}. Then

βnβimpliesφt,α(βT𝚽t+1,N)φt,α(βnT𝚽t+1,N)20.\displaystyle\beta_{n}\stackrel{{\scriptstyle{\small\mathbb{P}}}}{{\to}}\beta\quad\text{implies}\quad\big{\|}\varphi_{t,\alpha}(\beta^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})-\varphi_{t,\alpha}(\beta_{n}^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})\big{\|}_{2}\stackrel{{\scriptstyle{\small\mathbb{P}}}}{{\to}}0.
Remark 5.

Lemma 8 can be extended to show the convergence

φt,α(Lt+1+βT𝚽t+1,N)φt,α(Lt+1+βnT𝚽t+1,N)20\displaystyle\big{\|}\varphi_{t,\alpha}(L_{t+1}+\beta^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})-\varphi_{t,\alpha}(L_{t+1}+\beta_{n}^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})\big{\|}_{2}\stackrel{{\scriptstyle{\small\mathbb{P}}}}{{\to}}0

since the vector of basis functions $\mathbf{\Phi}_{t+1,N}$ could contain $L_{t+1}$ as an element. The requirement for convergence is that $u\mapsto\operatorname{VaR}_{t,u}(-L_{t+1}-v^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})$ is a.s. continuous at $u=\alpha$. This requirement could be replaced by the stronger requirement that $x\mapsto\mathbb{P}(L_{t+1}+v^{\mathrm{T}}\mathbf{\Phi}_{t+1,N}\leq x\mid\mathcal{F}_{t})$ is a.s. continuous.

We have now fitted φt,α\varphi_{t,\alpha} into the setting of Theorem 1.

Theorem 3.

Let $u\mapsto\operatorname{VaR}_{t,u}(-L_{t+1}-v^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})$ be a.s. continuous at $u=\alpha$ for any $v\in\mathbb{R}^{N}$. For any $N\in\mathbb{N}$ and $t=0,\dots,T$, let $\widehat{V}_{N,t,\alpha}(L)$ be given by (26) and define

\displaystyle\widehat{V}^{(M)}_{N,t,\alpha}(L):=P^{(M)}_{\mathcal{B}_{t,N}}\varphi_{t,\alpha}\big{(}L_{t+1}+\widehat{V}^{(M)}_{N,t+1,\alpha}(L)\big{)},\quad\widehat{V}^{(M)}_{N,T,\alpha}(L):=0.

Then, for t=0,1,,T1t=0,1,\dots,T-1, V^N,t,α(L)V^N,t,α(M)(L)20\|\widehat{V}_{N,t,\alpha}(L)-\widehat{V}_{N,t,\alpha}^{(M)}(L)\|_{2}\stackrel{{\scriptstyle{\small\mathbb{P}}}}{{\to}}0 as MM\to\infty.

4 Implementing and validating the LSM algorithm

In this section we test the LSM algorithm empirically for the special case where the mappings $\varphi$ are given by $(\varphi_{t,\alpha})_{t=0}^{T-1}$ in (21). The LSM algorithm described below, Algorithm 1, differs slightly from the one described previously, in that it contains the small inefficiency of two regression steps: one for the $\operatorname{VaR}$ term of the mapping and one for the expected value term. The reason for introducing this split is that it significantly simplifies the validation procedures of the algorithm. Heuristically, it allows us to run a forward simulation in which we may test the accuracy of both the $\operatorname{VaR}$ term and the expected value term.

Let $\mu_{t,t+1}(\cdot,\cdot)$ be the transition kernel from time $t$ to $t+1$ of the Markov process $(S_{t})_{t=0}^{T}$, so that $\mu_{t,t+1}(S_{t},\cdot)=\mathbb{P}(S_{t+1}\in\cdot\mid S_{t})$. In order to perform the LSM algorithm below, the only requirements are the ability to efficiently sample a variate $s$ from the unconditional law $\mathcal{L}(S_{t})$ of $S_{t}$ and, given $s$, from the conditional law $\mu_{t,t+1}(s,\cdot)$. Recall that the liability cash flow $(L_{t})_{t=1}^{T}$ is assumed to be given by $L_{t}:=g_{t}(S_{t})$ for known functions $(g_{t})_{t=1}^{T}$.

Algorithm 1 LSM Algorithm
 Set β^T,N,V(M):=0\widehat{\beta}^{(M)}_{T,N,V}:=0
for t=T1:0t=T-1:0 do
  Draw independent variables St(1),St(M)S^{(1)}_{t},\dots S^{(M)}_{t} from (St)\mathcal{L}(S_{t})
  for i=1:Mi=1:M do
   Draw independent variables St+1(i,1),,St+1(i,n)S^{(i,1)}_{t+1},\dots,S^{(i,n)}_{t+1} from μt,t+1(St(i),)\mu_{t,t+1}(S^{(i)}_{t},\cdot)
   Set Yt+1(i,j):=gt+1(St+1(i,j))+(β^t+1,N,V(M))T𝚽t+1,N(St+1(i,j))Y^{(i,j)}_{t+1}:=g_{t+1}(S^{(i,j)}_{t+1})+(\widehat{\beta}^{(M)}_{t+1,N,V})^{{\mathrm{T}}}\mathbf{\Phi}_{t+1,N}(S^{(i,j)}_{t+1}), j=1,,nj=1,\dots,n
   Let F^t(i)(y):=1nj=1nI{Yt+1(i,j)y}\widehat{F}^{(i)}_{t}(y):=\frac{1}{n}\sum_{j=1}^{n}I\{Y^{(i,j)}_{t+1}\leq y\} (empirical cdf)
   Set Rt(i):=min{y:F^t(i)(y)α}R^{(i)}_{t}:=\min\{y:\widehat{F}^{(i)}_{t}(y)\geq\alpha\} (empirical α\alpha-quantile)
   Set Et(i):=1nj=1n(Rt(i)Yt+1(i,j))+E^{(i)}_{t}:=\frac{1}{n}\sum_{j=1}^{n}(R^{(i)}_{t}-Y^{(i,j)}_{t+1})_{+}
  end for
  Set β^t,N,R(M)\widehat{\beta}^{(M)}_{t,N,R} as in (14) by regressing (Rt(i))i=1M(R_{t}^{(i)})_{i=1}^{M} onto (𝚽t,N(St(i)))i=1M(\mathbf{\Phi}_{t,N}(S^{(i)}_{t}))_{i=1}^{M}
  Set β^t,N,E(M)\widehat{\beta}^{(M)}_{t,N,E} as in (14) by regressing (Et(i))i=1M(E_{t}^{(i)})_{i=1}^{M} onto (𝚽t,N(St(i)))i=1M(\mathbf{\Phi}_{t,N}(S^{(i)}_{t}))_{i=1}^{M}
  Set $\widehat{\beta}^{(M)}_{t,N,V}:=\widehat{\beta}^{(M)}_{t,N,R}-\frac{1}{1+\eta}\widehat{\beta}^{(M)}_{t,N,E}$
end for
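The backward pass of Algorithm 1 can be sketched in a few lines of Python. The state model, basis functions and parameter values below (a one-dimensional Gaussian toy state, a quadratic basis, $\alpha=0.99$, $\eta=0.06$, and small $M$ and $n$) are illustrative assumptions for the sketch, not the models or run sizes used in this paper:

```python
import numpy as np

rng = np.random.default_rng(0)
T, M, n, alpha, eta = 3, 200, 400, 0.99, 0.06  # toy sizes, not the paper's

def phi(s):
    # Basis functions Phi_{t,N}(s): constant, s, s^2 (an illustrative choice).
    return np.stack([np.ones_like(s), s, s * s], axis=-1)

def sample_S(t, size):
    # Stand-in for sampling from the unconditional law L(S_t).
    return rng.normal(0.0, np.sqrt(1.0 + t), size)

def sample_next(s, size):
    # Stand-in for sampling from the transition kernel mu_{t,t+1}(s, .).
    return 0.5 * s + rng.normal(0.0, 1.0, size)

def g(t, s):
    # Cash flow L_t = g_t(S_t); identity map as a stand-in.
    return s

beta_V = np.zeros(3)                      # beta_hat^{(M)}_{T,N,V} := 0
for t in range(T - 1, -1, -1):
    S = sample_S(t, M)                    # outer sample from L(S_t)
    R, E = np.empty(M), np.empty(M)
    for i in range(M):
        S1 = sample_next(S[i], n)         # nested sample from mu_{t,t+1}
        Y = g(t + 1, S1) + phi(S1) @ beta_V
        R[i] = np.quantile(Y, alpha, method="inverted_cdf")  # empirical alpha-quantile
        E[i] = np.mean(np.maximum(R[i] - Y, 0.0))
    X = phi(S)
    beta_R = np.linalg.lstsq(X, R, rcond=None)[0]  # regression step for R
    beta_E = np.linalg.lstsq(X, E, rcond=None)[0]  # regression step for E
    beta_V = beta_R - beta_E / (1.0 + eta)

V0_hat = phi(sample_S(0, 5)) @ beta_V     # fitted time-0 value estimates
```

The `method="inverted_cdf"` option makes `np.quantile` return $\min\{y:\widehat{F}(y)\geq\alpha\}$, matching the empirical quantile in the algorithm.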

We may assess the accuracy of the LSM implementation by computing root mean squared errors (RMSE) of quantities appearing in Algorithm 1. For each index pair $(t,i)$, set $V^{(i)}_{t}:=R^{(i)}_{t}-\frac{1}{1+\eta}E^{(i)}_{t}$. Define the RMSE and the normalized RMSE by

RMSEZ,t\displaystyle\text{RMSE}_{Z,t} :=(1Mi=1M(Zt(i)(β^t,N,Z(M))T𝚽t,N(St(i)))2)1/2,\displaystyle:=\bigg{(}\frac{1}{M}\sum_{i=1}^{M}\Big{(}Z^{(i)}_{t}-(\widehat{\beta}^{(M)}_{t,N,Z})^{\mathrm{T}}\mathbf{\Phi}_{t,N}\big{(}S^{(i)}_{t}\big{)}\Big{)}^{2}\bigg{)}^{1/2}, (27)
NRMSEZ,t\displaystyle\text{NRMSE}_{Z,t} :=RMSEZ,t×(1Mi=1MZt(i)2)1/2,\displaystyle:=\text{RMSE}_{Z,t}\times\bigg{(}\frac{1}{M}\sum_{i=1}^{M}{Z^{(i)}_{t}}^{2}\bigg{)}^{-1/2}, (28)

where ZZ is a placeholder for RR, EE or VV.
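As a minimal sketch, (27) and (28) amount to the following computation on arrays of samples $Z^{(i)}_{t}$ and fitted values $(\widehat{\beta}^{(M)}_{t,N,Z})^{\mathrm{T}}\mathbf{\Phi}_{t,N}(S^{(i)}_{t})$:

```python
import numpy as np

def rmse_nrmse(z, z_fit):
    """RMSE (27) and NRMSE (28) for samples z and fitted values z_fit."""
    rmse = np.sqrt(np.mean((z - z_fit) ** 2))
    nrmse = rmse / np.sqrt(np.mean(z ** 2))  # normalize by the root mean square of z
    return rmse, nrmse
```

Note that the NRMSE here normalizes by the root mean square of the samples, not by their standard deviation or range.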

For each index pair (t,i)(t,i) consider the actual non-default probability and actual return on capital given by

\displaystyle\text{ANDP}^{(i)}_{t}:=\widehat{F}^{(i)}_{t}\big{(}(\widehat{\beta}^{(M)}_{t,N,R})^{\mathrm{T}}\mathbf{\Phi}_{t,N}(S^{(i)}_{t})\big{)}, (29)
\displaystyle\text{AROC}^{(i)}_{t}:=(1+\eta)E^{(i)}_{t}\times\big{(}(\widehat{\beta}^{(M)}_{t,N,E})^{\mathrm{T}}\mathbf{\Phi}_{t,N}(S^{(i)}_{t})\big{)}^{-1}, (30)

and note that these random variables are expected to be centered around $\alpha$ and $1+\eta$, respectively, if the implementation is accurate. All validation procedures in this paper are performed out-of-sample, i.e. a second, independent simulation run is performed to obtain these values.

4.1 Models

In this section we introduce two model types in order to test the performance of the LSM algorithm. The first model type, introduced in Section 4.1.1, is not motivated by a specific application but is simply a sufficiently flexible and moderately complex time series model. The second model type, introduced in Section 4.1.2, aims to describe the cash flow of a life insurance portfolio paying both survival and death benefits.

4.1.1 AR(1)-GARCH(1,1) models

In the first model to be evaluated, the liability cash flow $(L_{t})_{t=1}^{T}$ is given by an AR(1) model with GARCH(1,1) residuals, with dynamics given by:

Lt+1=α0+α1Lt+σt+1ϵt+1,σt+12=α2+α3σt2+α4Lt2,L0=0,σ1=1.\displaystyle L_{t+1}=\alpha_{0}+\alpha_{1}L_{t}+\sigma_{t+1}\epsilon_{t+1},\quad\sigma^{2}_{t+1}=\alpha_{2}+\alpha_{3}\sigma^{2}_{t}+\alpha_{4}L^{2}_{t},\quad L_{0}=0,\sigma_{1}=1.

Here $\epsilon_{1},\dots,\epsilon_{T}$ are assumed to be i.i.d. standard normally distributed and $\alpha_{0},\dots,\alpha_{4}$ are known model parameters. If we put $S_{t}=(L_{t},\sigma_{t+1})$ for $t=0,\dots,T$, then $(S_{t})_{t=0}^{T}$ forms a time-homogeneous Markov chain.
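A path of this process can be simulated directly from the recursions above. The following sketch returns $(L_{t})_{t=0}^{T}$ and $(\sigma_{t+1})_{t=0}^{T}$, i.e. the two components of the state $S_{t}=(L_{t},\sigma_{t+1})$:

```python
import numpy as np

def simulate_ar_garch(T, a, rng):
    """Simulate L_{t+1} = a0 + a1*L_t + sigma_{t+1}*eps_{t+1},
    sigma_{t+1}^2 = a2 + a3*sigma_t^2 + a4*L_t^2, with L_0 = 0, sigma_1 = 1."""
    a0, a1, a2, a3, a4 = a
    L = np.zeros(T + 1)
    sig2 = np.zeros(T + 2)    # sig2[t] holds sigma_t^2; index 0 unused
    sig2[1] = 1.0             # sigma_1 = 1
    for t in range(T):
        L[t + 1] = a0 + a1 * L[t] + np.sqrt(sig2[t + 1]) * rng.normal()
        sig2[t + 2] = a2 + a3 * sig2[t + 1] + a4 * L[t + 1] ** 2
    return L, np.sqrt(sig2[1:])   # (L_t)_{t=0}^T and (sigma_{t+1})_{t=0}^T
```

The sum-of-ten model considered below is obtained by summing ten independent such paths.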

In order to contrast this model with a more complex model, we also investigate the case where the process (Lt)t=1T(L_{t})_{t=1}^{T} is given by a sum of independent AR(1)-GARCH(1,1)-processes of the above type: Lt=i=110Lt,iL_{t}=\sum_{i=1}^{10}L_{t,i}, where

Lt+1,i=α0,i+α1,iLt,i+σt+1,iϵt+1,i,σt+1,i2=α2,i+α3,iσt,i2+α4,iLt,i2.\displaystyle L_{t+1,i}=\alpha_{0,i}+\alpha_{1,i}L_{t,i}+\sigma_{t+1,i}\epsilon_{t+1,i},\quad\sigma^{2}_{t+1,i}=\alpha_{2,i}+\alpha_{3,i}\sigma^{2}_{t,i}+\alpha_{4,i}L^{2}_{t,i}.

The motivation for these choices of toy models is as follows. Firstly, a single AR(1)-GARCH(1,1) process is sufficiently low-dimensional that we may compare a brute-force approximation with the LSM approximation, thus getting a real sense of the performance of the LSM model. Secondly, despite being low-dimensional, it still has a sufficiently complex dependence structure that it cannot easily be valued other than by numerical means. The motivation for looking at a sum of AR(1)-GARCH(1,1) processes is simply to investigate whether model performance is severely hampered by an increase in dimensionality, provided a certain amount of independence between the sources of randomness.

4.1.2 Life insurance models

In order to investigate a set of models more closely resembling an insurance cash flow, we also consider an example closely inspired by that in [9]. Essentially, we will assume the liability cash flow to be given by life insurance policies where we take into account age cohorts and their sizes at each time, along with financial data relevant to the contract payouts.

We consider two risky assets YY and FF, given by the log-normal dynamics

dYt=μYYtdt+σYYtdWtY,0tT,Y0=y0,\displaystyle\mathrm{d}Y_{t}=\mu_{Y}Y_{t}\mathrm{d}t+\sigma_{Y}Y_{t}\mathrm{d}W^{Y}_{t},\quad 0\leq t\leq T,\quad Y_{0}=y_{0},
dFt=μFFtdt+σFFtdWtF,0tT,F0=f0.\displaystyle\mathrm{d}F_{t}=\mu_{F}F_{t}\mathrm{d}t+\sigma_{F}F_{t}\mathrm{d}W^{F}_{t},\quad 0\leq t\leq T,\quad F_{0}=f_{0}.

Here $W^{Y}_{t}$ and $W^{F}_{t}$ are two correlated Brownian motions, which we may rewrite as

WtY=Wt1,WtF=ρWt1+1ρ2Wt2,0tT,\displaystyle W^{Y}_{t}=W^{1}_{t},\quad W^{F}_{t}=\rho W^{1}_{t}+\sqrt{1-\rho^{2}}W^{2}_{t},\quad 0\leq t\leq T,

where $W^{1}$ and $W^{2}$ are two standard, uncorrelated Brownian motions. Here, $F$ represents the index associated with unit-linked contracts and $Y$ represents assets owned by the insurance company. Furthermore, we assume that an individual of age $a$ has probability $1-p_{a}$ of reaching age $a+1$, where the probabilities $p_{a}$ for $a=0,1,\dots$ are assumed to be nonrandom and known. All deaths are assumed to be independent of each other. We consider $k$ age-homogeneous cohorts with sizes $n_{1},\dots,n_{k}$ and ages $a_{1},\dots,a_{k}$ at time $t=0$, and we assume that all insured individuals have bought identical contracts. If death occurs at time $t$, the contract pays out the death benefit $\max(D^{*},F_{t})$, where $D^{*}$ is a nonrandom guaranteed amount. If an insured person survives until time $T$, the contract pays out the survival benefit $\max(S^{*},F_{T})$, where again $S^{*}$ is a nonrandom guaranteed amount. We finally assume that the insurance company holds the nominal amount $c(n_{1}+\dots+n_{k})$ in the risky asset $Y$, that it sells off these assets proportionally to the deaths as they occur, and that it sells off the entire remaining amount at time $T$. Let $N^{i}_{t}$ denote the number of people alive in cohort $i$ at time $t$, with the following dynamics:

Nt+1iBin(Nti,1pai+t),t=0,,T1.\displaystyle N^{i}_{t+1}\sim\text{Bin}(N^{i}_{t},1-p_{a_{i}+t}),\quad t=0,\dots,T-1.

These are the same dynamics as the life insurance example in Section 5 of [12]. Thus, the liability cash flow we consider here is given by

\displaystyle L_{t} =\big{(}\max(D^{*},F_{t})-cY_{t}\big{)}\sum_{i=1}^{k}(N^{i}_{t-1}-N^{i}_{t})
\displaystyle\quad+\mathbb{I}\{t=T\}\big{(}\max(S^{*},F_{T})-cY_{T}\big{)}\sum_{i=1}^{k}N^{i}_{T}.

If we write $S_{t}=(Y_{t},F_{t},N^{1}_{t},\dots,N^{k}_{t})$, then $S:=(S_{t})_{t=0}^{T}$ is a Markov chain with the dynamics outlined above. Note that, depending on the number $k$ of cohorts, $S$ might be a fairly high-dimensional Markov chain. Note also that, in addition to the obvious risk factors of mortality and the contractual payout amounts, there is the risk of the insurance company's risky asset $Y$ depreciating in value, which is of course a large risk factor for insurance companies in practice. Here we consider the case of $k=4$ cohorts, referred to as the small life insurance model, and the case of $k=10$ cohorts, referred to as the large life insurance model.
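A single path of the life insurance model can be simulated as follows. The parameter names, and the bundling of them into one tuple, are assumptions for the sketch; the asset steps use the exact log-normal solution of the SDEs, deaths are drawn from the binomial dynamics above, and deaths at time $t$ are counted as $N^{i}_{t-1}-N^{i}_{t}$ so that the benefit paid per death is positive:

```python
import numpy as np

def life_model_path(T, n0, ages, p, params, rng):
    """One path of the cash flow (L_t)_{t=1}^T.
    p(a) is the one-year death probability at age a (assumed callable)."""
    muY, muF, sY, sF, rho, y0, f0, c, D_star, S_star = params
    Y, F = y0, f0
    N = np.asarray(n0, dtype=np.int64)           # cohort sizes N^i_0
    L = np.zeros(T + 1)
    for t in range(1, T + 1):
        w1, w2 = rng.normal(size=2)
        wY, wF = w1, rho * w1 + np.sqrt(1.0 - rho**2) * w2  # correlated increments
        Y *= np.exp(muY - 0.5 * sY**2 + sY * wY)            # exact GBM step
        F *= np.exp(muF - 0.5 * sF**2 + sF * wF)
        surv = 1.0 - np.array([p(a + t - 1) for a in ages]) # survival probabilities
        N_new = rng.binomial(N, surv)                       # N^i_t given N^i_{t-1}
        deaths = int((N - N_new).sum())
        L[t] = (max(D_star, F) - c * Y) * deaths            # death benefits net of sold assets
        if t == T:                                          # terminal survival benefits
            L[t] += (max(S_star, F) - c * Y) * int(N_new.sum())
        N = N_new
    return L
```

For instance, `params = (0.03, 0.03, 0.1, 0.1, 0.4, 100.0, 100.0, 0.5, 100.0, 110.0)` with some mortality callable `p` produces one cash-flow path; the value of `c` here is a hypothetical choice.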

4.2 Choice of basis functions

So far, the choice of basis functions has not been addressed. As we are trying to numerically approximate unknown functions whose form we do not know, the approach used here is a combination of standard polynomial functions, complemented with functions that in some way resemble the underlying liability cash flow. A similar approach for the valuation of American derivatives is taken in, for instance, [16] and [3], where in the latter it is explicitly advised (see p. 1082) to use the values of related, simpler derivatives as basis functions to price more exotic ones.

In these examples, we will not be overly concerned with model sparsity, covariate significance or efficiency, but rather take the machine-learning approach of simply evaluating models based on out-of-sample performance. This is feasible due to the availability of simulated data for both fitting and out-of-sample validation.

4.2.1 AR(1)-GARCH(1,1) models

Since the AR(1)-GARCH(1,1) models can be considered toy models, generic basis functions were chosen. For a single AR(1)-GARCH(1,1) model, the basis functions chosen were all polynomials of the form $L_{t}^{i}\sigma_{t+1}^{j}$ with $0<i+j\leq 2$. For the sum of $10$ independent AR(1)-GARCH(1,1) models, we denote by $L_{t}$ and $\sigma_{t+1}$ the aggregated liability cash flow and standard deviation, respectively. We then consider the basis functions consisting of the state vector $(L_{t,i},\sigma_{t+1,i})_{i=1}^{10}$ along with $L_{t}^{i}\sigma_{t+1}^{j}$ for all $0<i+j\leq 2$, omitting the case $i=1$, $j=0$ to avoid collinearity. Note that the number of basis functions grows linearly, rather than quadratically, with the dimensionality of the state space.
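For the single AR(1)-GARCH(1,1) model, this basis can be written out explicitly. The inclusion of a constant (intercept) term below is an assumption of the sketch, since regressions of the form (14) commonly include one:

```python
import numpy as np

def garch_basis(L, sigma):
    """All monomials L^i * sigma^j with 0 < i + j <= 2, plus a constant."""
    feats = [np.ones_like(L)]
    for i in range(3):
        for j in range(3):
            if 0 < i + j <= 2:
                feats.append(L**i * sigma**j)
    return np.stack(feats, axis=-1)
```

For the state $(L_{t},\sigma_{t+1})$ this yields six regressors in the order $1,\ \sigma,\ \sigma^{2},\ L,\ L\sigma,\ L^{2}$.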

4.2.2 Life insurance models

For the state St=(Yt,Ft,Nt1,,Ntk)S_{t}=(Y_{t},F_{t},N^{1}_{t},\dots,N^{k}_{t}), let pt+1ip^{i}_{t+1} be the probability of death during (t,t+1)(t,t+1) for an individual in cohort ii, with qt+1i:=1pt+1iq^{i}_{t+1}:=1-p^{i}_{t+1}. We then introduce the state-dependent variables

μt+1:=i=1kNtipt+1i,σt+1:=(i=1kNtipt+1iqt+1i)1/2,Nt:=i=1kNti.\displaystyle\mu_{t+1}:=\sum_{i=1}^{k}N^{i}_{t}p^{i}_{t+1},\quad\sigma_{t+1}:=\Big{(}\sum_{i=1}^{k}N^{i}_{t}p^{i}_{t+1}q^{i}_{t+1}\Big{)}^{1/2},\quad N_{t}:=\sum_{i=1}^{k}N^{i}_{t}.

The first two quantities are the mean and standard deviation, respectively, of the number of deaths during $(t,t+1)$; the third is simply the total number of people alive at time $t$. The basis functions we choose consist of the state vector $Y_{t},F_{t},N^{1}_{t},\dots,N^{k}_{t}$ together with all products of two factors, where the first factor is an element of the set $\{\mu_{t+1},\sigma_{t+1},N_{t}\}$ and the second factor is an element of the set

{Yt,Ft,Yt2,Ft2,Ft3,YtFt,YtFt2,(FtKj)+,(FtKj)+Yt,\displaystyle\big{\{}Y_{t},F_{t},Y^{2}_{t},F^{2}_{t},F_{t}^{3},Y_{t}F_{t},Y_{t}F_{t}^{2},(F_{t}-K_{j})_{+},(F_{t}-K_{j})_{+}Y_{t},
C(Ft,S,T,t),C(Ft,D,t+1,t),C(Ft,S,T,t)Yt,C(Ft,D,t+1,t)Yt}.\displaystyle\quad C(F_{t},S^{*},T,t),C(F_{t},D^{*},t+1,t),C(F_{t},S^{*},T,t)Y_{t},C(F_{t},D^{*},t+1,t)Y_{t}\big{\}}.

$K_{j}$ can take values in $\{200,162,124,103\}$, depending on which covariates of the form $(F_{t}-K_{j})_{+}$ had the highest $R^{2}$-value at time $t=5$. Here the $R^{2}$-values were calculated based on the residuals after performing linear regression with respect to all basis functions not containing elements of the form $(F_{t}-K_{j})_{+}$. While this is a somewhat ad hoc approach that could be refined, it is a simple and easy-to-implement example of a choice of basis functions. Again, note that the number of basis functions grows linearly, rather than quadratically, with the dimensionality of the state space.

4.2.3 Run specifications

For Algorithm 1, $M=5\cdot 10^{4}$ and $n=10^{5}$ were chosen for the life insurance models, and $M=10^{4}$ and $n=10^{5}$ for the AR(1)-GARCH(1,1) models. Terminal time $T=6$ was used in all cases. For the validation run, $M=10^{4}$ and $n=10^{5}$ were chosen for all models. Due to the extreme quantile level involved, and also based on empirical observations, it was deemed necessary to keep $n$ at around this order of magnitude. Similarly, in part due to the number of basis functions involved, performance was observed to increase with $M$. Choosing $M$ and $n$ of these orders of magnitude was thus necessary for good model performance; these were also the largest orders of magnitude that were computationally feasible given the computing power available.

For the AR(1)-GARCH(1,1) model, the chosen parameters were

α0=1,α1=1,α2=0.1,α3=0.1,α4=0.1.\alpha_{0}=1,\quad\alpha_{1}=1,\quad\alpha_{2}=0.1,\quad\alpha_{3}=0.1,\quad\alpha_{4}=0.1.

The same choice was used for each of the terms in the sum of 1010 AR(1)-GARCH(1,1) processes, making the model a sum of i.i.d. processes.

For the life insurance models, the choice of parameters of the risky assets was μY=μF=0.03,σY=σF=0.1,ρ=0.4,y0=f0=100\mu_{Y}=\mu_{F}=0.03,\sigma_{Y}=\sigma_{F}=0.1,\rho=0.4,y_{0}=f_{0}=100. The benefit lower bounds were chosen as D=100,S=110D^{*}=100,S^{*}=110. The death/survival probabilities were calculated using the Makeham formula (for males):

\displaystyle p_{a}=1-\exp\Big{\{}-\int_{a}^{a+1}\mu_{x}\,\mathrm{d}x\Big{\}},\quad\mu_{x}:=0.001+0.000012\exp\{0.101314x\}.

These parameter values correspond to the Swedish mortality table M90 for males (the formula for females is identical, but with the age shifted back by $6$ years to account for the greater longevity of the female population). For the case of $4$ cohorts, the starting ages (for males) were $50$–$80$ in $10$-year increments, and for the case of $10$ cohorts the starting ages were $40$–$85$ in $5$-year increments.
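The Makeham intensity $\mu_{x}$ integrates in closed form, so the one-year probabilities can be computed directly. The sketch below uses $p_{a}=1-\exp\{-\int_{a}^{a+1}\mu_{x}\,\mathrm{d}x\}$, consistent with $p_{a}$ denoting a one-year death probability elsewhere in this section:

```python
import math

B = 0.101314  # Makeham growth parameter (M90, males)

def one_year_death_prob(a):
    """p_a = 1 - exp(-int_a^{a+1} mu_x dx) with
    mu_x = 0.001 + 0.000012 * exp(B * x), integrated in closed form."""
    integral = 0.001 + (0.000012 / B) * (math.exp(B * (a + 1)) - math.exp(B * a))
    return 1.0 - math.exp(-integral)
```

Per the text, the female table is obtained by shifting the age back by 6 years, i.e. evaluating at $a-6$.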

The algorithms were run on a computer with 8 Intel(R) Core(TM) i7-4770S 3.10GHz processors, and parallel programming was implemented in the nested simulation steps in both Algorithm 1 and the validation algorithm.

4.3 Numerical results

The RMSEs and NRMSEs of the LSM models can be seen in Table 1 and Table 2, respectively. The ANDPs and AROCs of the LSM models can be seen in Table 3, which displays quantile ranges with respect to the $2.5\%$ and $97.5\%$ quantiles of the data.

Model RMSE V RMSE R RMSE E
one single AR(1)-GARCH(1,1) 0.0114, 0.0118, 0.0115, 0.0098, 0.0061 0.0533, 0.0556, 0.0553, 0.0455, 0.0285 0.0521, 0.0542, 0.0544, 0.0444, 0.0279
a sum of 1010 AR(1)-GARCH(1,1) 0.0172, 0.0130, 0.0120, 0.0100, 0.0061 0.0525, 0.0552, 0.0546, 0.0467, 0.0278 0.0536, 0.0544, 0.0535, 0.0458, 0.0273
Life model with 4 cohorts 134.4, 120.3, 134.8, 85.1, 75.5 760.3, 682.7, 901.6, 535.9, 575.1 742.4, 665.0, 856.8, 536.0, 571.3
Life model with 10 cohorts 331.9, 307.4, 330.8, 226.2, 219.3 1730.1, 1719.4, 2148.3, 1431.0, 1928.3 1689.2, 1672.8, 2049.8, 1429.4, 1910.3
Table 1: RMSE values for the quantities V,R,EV,R,E as defined in (27). The five values in each cell are for times t=1,2,3,4,5t=1,2,3,4,5, in that order.
Model NRMSE V (%) NRMSE R (%) NRMSE E (%)
one single AR(1)-GARCH(1,1) 0.0498, 0.0583, 0.0685, 0.0797, 0.0901 0.1705, 0.1931, 0.2170, 0.2333, 0.2544 0.5810, 0.5962, 0.5930, 0.5747, 0.5885
a sum of 1010 AR(1)-GARCH(1,1) 0.0758, 0.0642, 0.0715, 0.0813, 0.0912 0.1693, 0.1913, 0.2158, 0.2393, 0.2493 0.6049, 0.5963, 0.5880, 0.5949, 0.5779
Life model with 4 cohorts 0.2567, 0.2225, 0.2603, 0.1711, 0.1647 0.5443, 0.5646, 0.8722, 0.5924, 0.7403 0.7109, 0.7535, 1.1381, 0.8306, 1.0271
Life model with 10 cohorts 0.2505, 0.2247, 0.2454, 0.1720, 0.1733 0.4911, 0.5592, 0.7948, 0.5952, 0.9056 0.6475, 0.7452, 1.0499, 0.8351, 1.2580
Table 2: NRMSE values for the quantities V,R,EV,R,E as defined in (28). The five values in each cell are for times t=1,2,3,4,5t=1,2,3,4,5, in that order.
Model QR ANDP (2.5%2.5\%, 97.5%97.5\%) QR AROC (2.5%2.5\%, 97.5%97.5\%)
one single AR(1)-GARCH(1,1) (0.457, 0.544), (0.456, 0.545), (0.457, 0.545), (0.458, 0.545), (0.457, 0.543) (4.79, 7.22), (4.76, 7.25), (4.78, 7.22), (4.84, 7.24), (4.80, 7.20)
a sum of 1010 AR(1)-GARCH(1,1) (0.458, 0.545), (0.456, 0.545), (0.457, 0.545), (0.456, 0.546), (0.457, 0.544) (4.74, 7.29), (4.77, 7.26), (4.77, 7.23), (4.79, 7.25), (4.81, 7.21)
Life model with 4 cohorts (0.454, 0.548), (0.443, 0.565), (0.385, 0.622), (0.436, 0.571), (0.387, 0.603) (4.59, 7.43), (4.32, 7.82), (2.71, 9.05), (4.10, 7.93), (2.69, 8.49)
Life model with 10 cohorts (0.457, 0.546), (0.444, 0.560), (0.391, 0.611), (0.435, 0.569), (0.394, 0.605) (4.66, 7.37), (4.33, 7.68), (2.94, 8.97), (4.08, 7.89), (2.96, 8.58)
Table 3: Quantile ranges for the samples (1ANDPt(i))i=1M(1-\text{ANDP}_{t}^{(i)})_{i=1}^{M} and (AROCt(i))i=1M(\text{AROC}_{t}^{(i)})_{i=1}^{M}, as defined in (29) and (30). The quantiles considered are 2.5%2.5\% and 97.5%97.5\%. The five intervals in each cell are for times t=1,2,3,4,5t=1,2,3,4,5, in that order.

Below, in Figure 1 we also present some histograms of the actual returns and risks of ruin, in order to get a sense of the spread of these values.

Figure 1: The top two figures correspond to the AR(1)-GARCH(1,1) model. The bottom two figures correspond to the life insurance model with 10 cohorts.

From these we observe that the quantity representing the actual returns seems to be quite sensitive to model errors, given the rather small size of the RMSE values.

Model Running time valuation (HH:MM) Running time validation (HH:MM)
one single AR(1)-GARCH(1,1) 00:06 00:10
a sum of 10 AR(1)-GARCH(1,1) 00:33 00:39
Life model with 4 cohorts 12:48 02:30
Life model with 10 cohorts 13:29 02:44
Table 4: Run time of each model in hours and minutes. Run specifications are described in Section 4.2.3.

Table 4 displays the running times of each model. As far as is known, the main factor determining the running time of Algorithm 1 is the repeated evaluation of the basis functions inside the nested simulation (required to calculate the quantities $Y_{t+1}^{(i,j)}$ in the inner for-loop). As these evaluations are numerous for models with high-dimensional state spaces, running times increase accordingly. It should be noted that, apart from the use of parallel programming, Algorithm 1 was not implemented to run as fast as possible for any specific model. Speed could potentially be gained by adapting Algorithm 1 to specific models of interest.

Some conclusions can be drawn from the numerical results. Firstly, from a mean-squared-error point of view, the LSM algorithm seems to capture the dynamics of the multiperiod cost-of-capital valuation well. It should be noted that the (N)RMSE of the value $V$ is lower than those of $R$ and $E$ across the board, for all models and times. Since the expression for $E$ depends heavily on $R$, we may suspect that the estimation errors of $R$ and $E$ are positively correlated, and thus that $V=R-\frac{1}{1+\eta}E$ obtains lower mean squared errors as a result.

We can see that increasing model complexity for the AR(1)-GARCH(1,1) and life insurance models seems to have no significant effect on LSM performance. It should be noted that model complexity is in both cases increased by introducing independent stochastic factors: a sum of i.i.d. processes in the AR(1)-GARCH(1,1) case and the addition of (independent) cohorts in the life insurance case. Thus the de facto model complexity might not have increased much, even though the dimension of the state space of the Markov process increases.

When we look at the ANDP and AROC quantities, we see that these seem to vary more than the (N)RMSEs do. Especially AROC, which is defined via a quotient, seems to be sensitive to model error.

One important thing to note with regard to the sensitivity of ANDP and AROC is the presence of errors introduced by the necessity of using Monte Carlo simulations to compute samples of $\operatorname{VaR}_{t,1-\alpha}(-\cdot)$ and $\mathbb{E}[(\cdot)_{+}]$. This can be seen in the AR(1)-GARCH(1,1) case: if we investigate what the value $V_{5}(L)$ should be, we see that in this case it has a closed form (using positive homogeneity and translation invariance):

φ5(L6+V6(L))=φ5(α0+α1L5+σ6ϵ6)=α0+α1L5+σ6φ5(ϵ6).\displaystyle\varphi_{5}(L_{6}+V_{6}(L))=\varphi_{5}(\alpha_{0}+\alpha_{1}L_{5}+\sigma_{6}\epsilon_{6})=\alpha_{0}+\alpha_{1}L_{5}+\sigma_{6}\varphi_{5}(\epsilon_{6}).

$\varphi_{5}(\epsilon_{6})$ is deterministic due to law invariance. Since $L_{t}$ and $\sigma_{t+1}$ are included among the basis functions for the AR(1)-GARCH(1,1) model, we would expect the fit in this case to be perfect. Since it is not, we conclude that errors may still appear even when optimal basis functions are among our selection of basis functions.
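This can be illustrated numerically. Using the same $R/E$ split as in Algorithm 1, a Monte Carlo estimate of $\varphi_{5}(\epsilon_{6})$ for standard normal $\epsilon_{6}$ can be compared with the closed-form value $q_{\alpha}-\frac{1}{1+\eta}\big(q_{\alpha}\Phi(q_{\alpha})+\phi(q_{\alpha})\big)$, where $q_{\alpha}$ is the standard normal $\alpha$-quantile and $\Phi,\phi$ are its cdf and density; the values $\alpha=0.99$ and $\eta=0.06$ below are illustrative assumptions:

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
alpha, eta, n = 0.99, 0.06, 10**5   # illustrative assumptions

# Monte Carlo estimate using the R/E split of Algorithm 1.
eps = rng.normal(size=n)
R = np.quantile(eps, alpha, method="inverted_cdf")   # empirical alpha-quantile
E = np.mean(np.maximum(R - eps, 0.0))
V_mc = R - E / (1.0 + eta)

# Closed form: E[(q - eps)_+] = q*Phi(q) + pdf(q) for standard normal eps.
nd = NormalDist()
q = nd.inv_cdf(alpha)
V_exact = q - (q * nd.cdf(q) + nd.pdf(q)) / (1.0 + eta)
```

Even with $n=10^{5}$ inner samples, the Monte Carlo estimate of the extreme quantile carries noticeable noise, consistent with the sensitivity of ANDP and AROC discussed above.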

Finally, recalling that the main purpose of these calculations is to compute the quantity $V_{t}(L)$, a good approach for validation might be to re-balance the LSM estimates of $R^{(i)}_{t}$ and $E^{(i)}_{t}$ so that the LSM estimate of the value $V_{t}$ remains unchanged while the LSM estimates fit $R^{(i)}_{t}$ and $E^{(i)}_{t}$ better. This re-balancing would not be problematic within the economic model underlying this validation scheme. However, since in this paper we are also interested in how well the LSM algorithm captures both the $\operatorname{VaR}$ term and the expected value term separately, the quantities ANDP and AROC remain relevant.

5 Conclusion

We have studied the performance of the LSM algorithm for numerically computing recursively defined objects such as $(V_{t}(L,\varphi))_{t=0}^{T}$ given in Definition 4, where the mappings $\varphi$ are either $L^{2}$-continuous or given by (21). As a part of this study, Lipschitz-like results and conditions for $L^{2}$-continuity were established for Value-at-Risk and the associated operator $\varphi_{t,\alpha}$ in Theorem 2 and Corollary 2. Important basic consistency results were obtained, showing the convergence of the LSM estimator both as the number of basis functions goes to infinity (Lemmas 1 and 7) and as the size of the simulated data set goes to infinity for a fixed number of basis functions (Theorems 1 and 3). Furthermore, these results are applicable to a large class of conditional monetary risk measures, utility functions and various actuarial multi-period valuations, the only requirement being $L^{2}$-continuity or a property like that established in Theorem 2. We also apply and evaluate the LSM algorithm with respect to the multi-period cost-of-capital valuation considered in [12] and [13], and in doing so provide insight into practical considerations concerning implementation and validation of the LSM algorithm.

6 Proofs

Proof of Lemma 1.

Note that the quantities defined in (10) and (17) are independent of $D$; hence all norms below are a.s. constants. Define $\epsilon_{t}:=\widehat{V}_{N,t}-V_{t}$. We now show via backwards induction, starting from time $t=T$, that $\|\epsilon_{t}\|_{2}\to 0$. The induction base is trivial, since $\widehat{V}_{N,T}(L)=V_{T}(L)=0$. Now assume that $\|\epsilon_{t+1}\|_{2}\to 0$. Then

ϵt2\displaystyle||\epsilon_{t}||_{2} V^N,tφt(Lt+1+V^N,t+1(L))2\displaystyle\leq||\widehat{V}_{N,t}-\varphi_{t}(L_{t+1}+\widehat{V}_{N,t+1}(L))||_{2}
+φt(Lt+1+V^N,t+1(L))φt(Lt+1+Vt+1(L))2\displaystyle\quad+||\varphi_{t}(L_{t+1}+\widehat{V}_{N,t+1}(L))-\varphi_{t}(L_{t+1}+V_{t+1}(L))||_{2}

By the induction assumption and the continuity assumption, we know that the second summand goes to 0. We now need to show that V^N,tφt(Lt+1+V^N,t+1(L))20||\widehat{V}_{N,t}-\varphi_{t}(L_{t+1}+\widehat{V}_{N,t+1}(L))||_{2}\to 0. Now we simply note, by the definition of the projection operator and denseness of the approximating sets,

V^N,tφt(Lt+1+V^N,t+1(L))2\displaystyle||\widehat{V}_{N,t}-\varphi_{t}(L_{t+1}+\widehat{V}_{N,t+1}(L))||_{2}
=infBNBφt(Lt+1+V^N,t+1(L))2\displaystyle\quad=\inf_{B\in\mathcal{B}_{N}}||B-\varphi_{t}(L_{t+1}+\widehat{V}_{N,t+1}(L))||_{2}
\displaystyle\quad\leq\inf_{B\in\mathcal{B}_{N}}||B-\varphi_{t}(L_{t+1}+V_{t+1})||_{2}
\displaystyle\quad\quad+||\varphi_{t}(L_{t+1}+\widehat{V}_{N,t+1}(L))-\varphi_{t}(L_{t+1}+V_{t+1})||_{2}.

By our assumptions, both of these terms go to zero, as $\varphi_{t}(L_{t+1}+V_{t+1}(L))$ is a function of the state $S_{t}$ which lies in $L^{2}(\mathcal{F}_{t})$. ∎

Proof of Lemma 2.

We first note that if $\widehat{\beta}^{(M)}_{t,N,Z_{M,t}}\to\beta_{t,N,Z_{t}}$ in probability, then $||(\beta_{t,N,Z_{t}})^{\mathrm{T}}\mathbf{\Phi}_{t,N}-(\widehat{\beta}^{(M)}_{t,N,Z_{M,t}})^{\mathrm{T}}\mathbf{\Phi}_{t,N}||_{2}\to 0$ in probability, since $\widehat{\beta}^{(M)}_{t,N,Z_{M,t}}$ is $\mathcal{F}_{0}$-measurable and independent of $\mathbf{\Phi}_{t,N}$, while $\mathbf{\Phi}_{t,N}$ is independent of $D$. Hence it suffices to show that $\widehat{\beta}^{(M)}_{t,N,Z_{M,t}}\to\beta_{t,N,Z_{t}}$ in probability. Now, recalling the definition of $\widehat{\beta}^{(M)}_{t,N,Z_{M,t}}$, we rewrite (14) as

β^t,N,Zt(M)=(1M(𝚽t,N(M))T𝚽t,N(M))11M(𝚽t,N(M))TZt(M),\displaystyle\widehat{\beta}^{(M)}_{t,N,Z_{t}}=\Big{(}\frac{1}{M}\big{(}\mathbf{\Phi}^{(M)}_{t,N}\big{)}^{\mathrm{T}}\mathbf{\Phi}^{(M)}_{t,N}\Big{)}^{-1}\frac{1}{M}\big{(}\mathbf{\Phi}^{(M)}_{t,N}\big{)}^{\mathrm{T}}Z_{t}^{(M)},

Furthermore, recall the form of $\beta_{t,N,Z_{t}}$ given by (13). We first note that since, by the law of large numbers, $\frac{1}{M}\big{(}\mathbf{\Phi}^{(M)}_{t,N}\big{)}^{\mathrm{T}}\mathbf{\Phi}^{(M)}_{t,N}\to\mathbb{E}_{0}\big{[}\mathbf{\Phi}_{t,N}\mathbf{\Phi}_{t,N}^{\mathrm{T}}\big{]}$ almost surely, and thus in probability, it suffices to show that

1M(Φt,j(M)(St(i)))1iMTZt(M)𝔼0[Φt,j(St)Zt]\displaystyle\frac{1}{M}\big{(}\Phi^{(M)}_{t,j}(S_{t}^{(i)})\big{)}_{1\leq i\leq M}^{\mathrm{T}}Z_{t}^{(M)}\to\mathbb{E}_{0}[\Phi_{t,j}(S_{t})Z_{t}]

in probability for each j=1,Nj=1,\dots N. We first note that, letting ϵM(i):=ZM,t(i)zt(St(i))\epsilon_{M}^{(i)}:=Z_{M,t}^{(i)}-z_{t}(S_{t}^{(i)})

|1Mi=1MΦt,j(St(i))ZM,t(i)𝔼0[Φt,j(St)Zt]|\displaystyle\Big{|}\frac{1}{M}\sum_{i=1}^{M}\Phi_{t,j}(S_{t}^{(i)})Z_{M,t}^{(i)}-\mathbb{E}_{0}[\Phi_{t,j}(S_{t})Z_{t}]\Big{|}
|1Mi=1MΦt,j(St(i))zt(St(i))𝔼0[Φt,j(St)Zt]|+|1Mi=1MΦt,j(St(i))ϵM(i)|\displaystyle\quad\leq\Big{|}\frac{1}{M}\sum_{i=1}^{M}\Phi_{t,j}(S_{t}^{(i)})z_{t}(S_{t}^{(i)})-\mathbb{E}_{0}[\Phi_{t,j}(S_{t})Z_{t}]\Big{|}+\Big{|}\frac{1}{M}\sum_{i=1}^{M}\Phi_{t,j}(S_{t}^{(i)})\epsilon_{M}^{(i)}\Big{|}

The first summand goes to zero in probability by the law of large numbers. Thus, we investigate the second summand using Hölder’s inequality:

|1Mi=1MΦt,j(St(i))ϵM(i)|(1Mi=1M(Φt,j(St(i)))2)1/2(1Mi=1M(ϵM(i))2)1/2\displaystyle\Big{|}\frac{1}{M}\sum_{i=1}^{M}\Phi_{t,j}(S_{t}^{(i)})\epsilon_{M}^{(i)}\Big{|}\leq\Big{(}\frac{1}{M}\sum_{i=1}^{M}(\Phi_{t,j}(S_{t}^{(i)}))^{2}\Big{)}^{1/2}\Big{(}\frac{1}{M}\sum_{i=1}^{M}(\epsilon_{M}^{(i)})^{2}\Big{)}^{1/2}

We see that, again by the law of large numbers, the first factor converges to 𝔼[(Φt,j(St))2]\mathbb{E}[(\Phi_{t,j}(S_{t}))^{2}] in probability. Now we look at the second factor. By our independence assumption, ϵM(i)=𝑑Zt,MZt\epsilon_{M}^{(i)}\overset{d}{=}Z_{t,M}-Z_{t} and thus

$$\operatorname{Var}\Big(\Big(\frac{1}{M}\sum_{i=1}^M(\epsilon_M^{(i)})^2\Big)^{1/2}\,\Big|\,\mathcal{F}_0\Big)\leq\mathbb{E}_0\Big[\frac{1}{M}\sum_{i=1}^M(\epsilon_M^{(i)})^2\Big]=\|Z_{t,M}-Z_t\|_2^2,$$

which, by assumption, goes to $0$ in probability. Since, by Jensen's inequality, the conditional mean satisfies $\mathbb{E}_0\big[\big(\frac{1}{M}\sum_{i=1}^M(\epsilon_M^{(i)})^2\big)^{1/2}\big]\leq\|Z_{t,M}-Z_t\|_2\to 0$ as well, the second factor, and hence the whole expression, goes to zero in probability. This concludes the proof. ∎
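As a numerical illustration of the estimator above (not part of the proof; the two-function basis $\{1,s\}$ and the linear toy model are assumptions made purely for illustration), the following Python sketch solves the normal equations $\big(\frac{1}{M}\mathbf{\Phi}^{\mathrm{T}}\mathbf{\Phi}\big)\beta=\frac{1}{M}\mathbf{\Phi}^{\mathrm{T}}Z$ and recovers the true regression coefficients as $M$ grows, as the law-of-large-numbers argument predicts.

```python
import random

random.seed(0)

def lsm_beta(phi_rows, z):
    """Solve the 2x2 normal equations ((1/M) Phi^T Phi) beta = (1/M) Phi^T Z
    for a two-function basis, mirroring the OLS estimator above."""
    M = len(z)
    a11 = sum(p[0] * p[0] for p in phi_rows) / M
    a12 = sum(p[0] * p[1] for p in phi_rows) / M
    a22 = sum(p[1] * p[1] for p in phi_rows) / M
    b1 = sum(p[0] * zi for p, zi in zip(phi_rows, z)) / M
    b2 = sum(p[1] * zi for p, zi in zip(phi_rows, z)) / M
    det = a11 * a22 - a12 * a12
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)

# Toy model (an assumption for illustration): Z = 1 + 2*S + noise,
# basis {1, s}; the estimator should approach beta = (1, 2) as M grows.
M = 20000
S = [random.gauss(0.0, 1.0) for _ in range(M)]
Z = [1.0 + 2.0 * s + random.gauss(0.0, 0.5) for s in S]
beta = lsm_beta([(1.0, s) for s in S], Z)
```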

Proof of Theorem 1.

We prove the statement by backwards induction, starting from time $t=T$. As before, the induction base follows immediately from our assumptions. Now assume $\|\widehat{V}_{N,t+1}(L)-\widehat{V}^{(M)}_{N,t+1}(L)\|_2\to 0$ in probability, as $M\to\infty$. By $L^2$-continuity we get that $\|\varphi_t(L_{t+1}+\widehat{V}_{N,t+1}(L))-\varphi_t(L_{t+1}+\widehat{V}^{(M)}_{N,t+1}(L))\|_2\to 0$ in probability. But then by Lemma 2 we immediately get that $\|\widehat{V}_{N,t}(L)-\widehat{V}^{(M)}_{N,t}(L)\|_2\to 0$ in probability. ∎
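The backward induction analyzed in the theorem can be sketched in a few lines. In the toy example below (my own illustration; taking $\varphi_t$ to be plain conditional expectation, the basis $\{1,s\}$, and the cash flows $L_{t+1}=S_{t+1}$ of a driftless random walk are all assumptions), each step regresses $L_{t+1}+\widehat{V}_{t+1}$ on basis functions of $S_t$; the exact value functions are $v_2(s)=s$ and $v_1(s)=2s$, so the fitted slope at $t=1$ should be close to $2$ and the time-$0$ value close to $0$.

```python
import random

random.seed(0)

def ols2(phi_rows, y):
    # Least-squares fit for the two-function basis {1, s}
    # via the 2x2 normal equations.
    M = len(y)
    a11 = sum(p[0] * p[0] for p in phi_rows) / M
    a12 = sum(p[0] * p[1] for p in phi_rows) / M
    a22 = sum(p[1] * p[1] for p in phi_rows) / M
    b1 = sum(p[0] * yi for p, yi in zip(phi_rows, y)) / M
    b2 = sum(p[1] * yi for p, yi in zip(phi_rows, y)) / M
    det = a11 * a22 - a12 * a12
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)

# M paths of a driftless random walk S_0 = 0, S_{t+1} = S_t + N(0,1),
# with cash flows L_{t+1} = S_{t+1}, over T = 3 periods.
M, T = 20000, 3
S = []
for _ in range(M):
    path = [0.0]
    for _ in range(T):
        path.append(path[-1] + random.gauss(0.0, 1.0))
    S.append(path)

# Backward induction: V_T = 0, then V_t = fitted value of
# E[L_{t+1} + V_{t+1} | S_t], evaluated along each path.
V = [0.0] * M
betas = {}
for t in range(T - 1, 0, -1):
    y = [S[i][t + 1] + V[i] for i in range(M)]      # L_{t+1} + V_{t+1}
    betas[t] = ols2([(1.0, S[i][t]) for i in range(M)], y)
    b0, b1 = betas[t]
    V = [b0 + b1 * S[i][t] for i in range(M)]
V0 = sum(S[i][1] + V[i] for i in range(M)) / M      # plain average at t = 0
```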

Proof of Lemma 3.

Note that

$$
\begin{aligned}
\|\varphi_t(X)-\varphi_t(Y)\|_2^2&=\mathbb{E}_0\big[|\varphi_t(X)-\varphi_t(Y)|^2\big]\leq\mathbb{E}_0\big[K^2\,\mathbb{E}_t[|X-Y|]^2\big]\\
&\leq K^2\,\mathbb{E}_0\big[\mathbb{E}_t[|X-Y|^2]\big]=K^2\,\mathbb{E}_0[|X-Y|^2]\\
&=K^2\|X-Y\|_2^2.
\end{aligned}
$$

Here we have used Jensen's inequality at the second inequality and the tower property of the conditional expectation at the following equality. From this, $L^2$-continuity immediately follows. ∎

Proof of Lemma 4.

By Lemma 9, to construct upper and lower bounds for a quantity given by $\varphi_{t,\alpha}(\xi)$ we may find upper and lower bounds for $\rho_t(-\xi)$ and insert them into the expression for $\varphi_{t,\alpha}(\xi)$. Now take $X,Y\in L^p(\mathcal{F}_{t+1})$ and let $Z:=Y-X$. By monotonicity we get that

$$\varphi_t(X-|Z|)\leq\varphi_t(Y)\leq\varphi_t(X+|Z|).$$

We now observe that

$$
\begin{aligned}
\rho_t(-(X+|Z|))&\leq\rho_t(-X)+K\mathbb{E}_t[|Z|],\\
\rho_t(-(X-|Z|))&\geq\rho_t(-X)-K\mathbb{E}_t[|Z|].
\end{aligned}
$$

We use this to also observe that, by the subadditivity of the $(\cdot)_+$-operation,

$$
\begin{aligned}
-\mathbb{E}_t\big[(\rho_t(-X)+K\mathbb{E}_t[|Z|]-X-|Z|)_+\big]
&\leq-\mathbb{E}_t\big[(\rho_t(-X)-X)_+\big]+\mathbb{E}_t\big[(\rho_t(-|Z|)-|Z|)_+\big]\\
&\leq-\mathbb{E}_t\big[(\rho_t(-X)-X)_+\big]+\rho_t(-|Z|)\\
&\leq-\mathbb{E}_t\big[(\rho_t(-X)-X)_+\big]+K\mathbb{E}_t[|Z|].
\end{aligned}
$$

Similarly, we have that

$$
\begin{aligned}
-\mathbb{E}_t\big[(\rho_t(-X)-K\mathbb{E}_t[|Z|]-X+|Z|)_+\big]
&\geq-\mathbb{E}_t\big[(\rho_t(-X)-X)_+\big]-\mathbb{E}_t\big[(\rho_t(-|Z|)-|Z|)_+\big]\\
&\geq-\mathbb{E}_t\big[(\rho_t(-X)-X)_+\big]-\rho_t(-|Z|)\\
&\geq-\mathbb{E}_t\big[(\rho_t(-X)-X)_+\big]-K\mathbb{E}_t[|Z|].
\end{aligned}
$$

From this we get that

$$
\begin{aligned}
\varphi_t(Y)&\leq\varphi_t(X+|Z|)\leq\varphi_t(X)+K\mathbb{E}_t[|Z|]+\frac{1}{1+\eta}K\mathbb{E}_t[|Z|],\\
\varphi_t(Y)&\geq\varphi_t(X-|Z|)\geq\varphi_t(X)-K\mathbb{E}_t[|Z|]-\frac{1}{1+\eta}K\mathbb{E}_t[|Z|],
\end{aligned}
$$

from which Lipschitz continuity with respect to the constant $2K$ immediately follows, since $K+\frac{1}{1+\eta}K\leq 2K$. ∎

Proof of Lemma 5.

By subadditivity we have that

$$
\begin{aligned}
\rho_{t,M}(Y)-\rho_{t,M}(X)&\leq\rho_{t,M}(Y-X)\leq\rho_{t,M}(-|Y-X|),\\
\rho_{t,M}(X)-\rho_{t,M}(Y)&\leq\rho_{t,M}(X-Y)\leq\rho_{t,M}(-|Y-X|).
\end{aligned}
$$

Now we simply note that

$$\rho_{t,M}(-|Y-X|)=\int_0^1F^{-1}_{t,|Y-X|}(u)\,m(1-u)\,\mathrm{d}u\leq m(0)\int_0^1F^{-1}_{t,|Y-X|}(u)\,\mathrm{d}u=m(0)\,\mathbb{E}_t[|Y-X|].$$

This concludes the proof. ∎
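As a quick numerical sanity check (my own illustration, not part of the proof; the empirical quantile convention $\rho_{t,M}(-W)=\int_0^1F^{-1}_{t,W}(u)\,m(1-u)\,\mathrm{d}u$ for nonnegative $W$ and the decreasing density $m(u)=2(1-u)$ are assumptions), the bound $\rho_{t,M}(-|Y-X|)\leq m(0)\,\mathbb{E}_t[|Y-X|]$ can be verified on simulated data:

```python
import random

random.seed(1)

def spectral_rho_neg(w, m):
    """Empirical version of rho(-W) = int_0^1 F_W^{-1}(u) m(1-u) du
    for a nonnegative sample w, via its order statistics."""
    ws = sorted(w)
    n = len(ws)
    return sum(x * m(1.0 - (i + 0.5) / n) for i, x in enumerate(ws)) / n

def m(u):
    # A decreasing density on [0, 1] with m(0) = 2 (illustrative choice).
    return 2.0 * (1.0 - u)

X = [random.gauss(0.0, 1.0) for _ in range(5000)]
Y = [x + random.gauss(0.0, 0.3) for x in X]
W = [abs(y - x) for x, y in zip(X, Y)]
rho = spectral_rho_neg(W, m)
bound = m(0.0) * sum(W) / len(W)       # m(0) * E[|Y - X|]
```

Since $m(1-u)\leq m(0)$ for all $u\in[0,1]$, the inequality holds deterministically for the empirical measure, whatever the sample.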

Proof of Lemma 6.

We begin by showing (22). Let $E=\{Z\leq\operatorname{VaR}_{1-\delta}(-Z)\}$. Then:

$$
\begin{aligned}
\mathbb{P}_t(X+Z\leq y)&\geq\mathbb{P}_t(E\cap\{X+Z\leq y\})\\
&\geq\mathbb{P}_t(E\cap\{X+\operatorname{VaR}_{1-\delta}(-Z)\leq y\})\\
&\geq\mathbb{P}_t(X+\operatorname{VaR}_{1-\delta}(-Z)\leq y)-\mathbb{P}_t(E^{\complement})\\
&\geq\mathbb{P}_t(X\leq y-\operatorname{VaR}_{1-\delta}(-Z))-\delta.
\end{aligned}
$$

Putting $y=\operatorname{VaR}_{\alpha+\delta}(-X)+\operatorname{VaR}_{1-\delta}(-Z)$ yields

$$\mathbb{P}_t(X\leq y-\operatorname{VaR}_{1-\delta}(-Z))-\delta\geq\alpha+\delta-\delta=\alpha.$$

Hence $\operatorname{VaR}_{t,\alpha}(-(X+Z))\leq\operatorname{VaR}_{\alpha+\delta}(-X)+\operatorname{VaR}_{1-\delta}(-Z)$. We now prove (23) by applying (22):

$$
\begin{aligned}
\operatorname{VaR}_{t,\alpha-\delta}(-X)&=\operatorname{VaR}_{t,\alpha-\delta}\big(-((X+Z)+(-Z))\big)\\
&\leq\operatorname{VaR}_{t,\alpha}(-(X+Z))+\operatorname{VaR}_{1-\delta}(Z),
\end{aligned}
$$

from which we get (23). ∎
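Since the proof of (22) uses only elementary properties of the underlying probability measure, the inequality also holds exactly under an empirical measure, which gives a simple numerical check (my own illustration; the lower-quantile convention for $\operatorname{VaR}_u(-W)$ and the Gaussian samples are assumptions):

```python
import math
import random

random.seed(2)

def var_neg(w, u):
    """VaR_u(-W) under the empirical measure of the sample w:
    the u-quantile inf{y : P(W <= y) >= u}."""
    ws = sorted(w)
    k = max(math.ceil(u * len(ws)), 1)
    return ws[k - 1]

alpha, delta = 0.9, 0.05
n = 4000
X = [random.gauss(0.0, 1.0) for _ in range(n)]
Z = [random.gauss(0.0, 2.0) for _ in range(n)]
XZ = [x + z for x, z in zip(X, Z)]     # paired samples of X + Z
lhs = var_neg(XZ, alpha)               # VaR_alpha(-(X+Z))
rhs = var_neg(X, alpha + delta) + var_neg(Z, 1 - delta)
```

Because the argument above applies verbatim to the empirical measure on the paired sample, `lhs <= rhs` holds deterministically, not just with high probability.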

Proof of Corollary 1.

Let $Z=Y-X$. Now we simply note that, for any $\delta\in(0,1/2)$ with $\alpha+\delta<1$,

$$
\begin{aligned}
\operatorname{VaR}_{t,\alpha}(-(X+Z))&\leq\operatorname{VaR}_{t,\alpha}(-(X+|Z|))\\
&\leq\operatorname{VaR}_{t,\alpha+\delta}(-X)+\operatorname{VaR}_{t,1-\delta}(-|Z|).
\end{aligned}
$$

By Markov’s inequality, we may bound the latter summand:

$$\operatorname{VaR}_{t,1-\delta}(-|Z|)\leq\frac{1}{\delta}\mathbb{E}_t[|Z|].$$

Now for the lower bound, we similarly note

$$
\begin{aligned}
\operatorname{VaR}_{t,\alpha}(-(X+Z))&\geq\operatorname{VaR}_{t,\alpha}(-(X-|Z|))\\
&\geq\operatorname{VaR}_{t,\alpha-\delta}(-X)+\operatorname{VaR}_{t,1-\delta}(|Z|),
\end{aligned}
$$

where again we may bound the second summand using Markov’s inequality:

$$\operatorname{VaR}_{t,1-\delta}(|Z|)\geq-\frac{1}{1-\delta}\mathbb{E}_t[|Z|]\geq-\frac{1}{\delta}\mathbb{E}_t[|Z|],$$

since we have assumed δ<1/2\delta<1/2. This immediately yields that, almost surely,

$$\operatorname{VaR}_{t,\alpha}(-Y)\in\Big[\operatorname{VaR}_{t,\alpha-\delta}(-X)-\frac{1}{\delta}\mathbb{E}_t[|X-Y|],\ \operatorname{VaR}_{t,\alpha+\delta}(-X)+\frac{1}{\delta}\mathbb{E}_t[|X-Y|]\Big].$$

This immediately yields our desired result. ∎

Lemma 9.

For any $X\in L^1(\mathcal{F}_{t+1})$ and $R_1,R_2\in L^0(\mathcal{F}_t)$ with $R_1\leq R_2$ a.s.,

$$R_1-\frac{1}{1+\eta}\mathbb{E}_t[(R_1-X)_+]\leq R_2-\frac{1}{1+\eta}\mathbb{E}_t[(R_2-X)_+]\quad\text{a.s.}$$
Proof of Lemma 9.

Let $R_1\leq R_2$ a.s. and let $A_1=\{R_1-X\geq 0\}$ and $A_2=\{R_2-X\geq 0\}$. Note that $A_1\subseteq A_2$ almost surely, i.e. $\mathbb{P}_t(A_1\setminus A_2)=0$ a.s. We now note that:

$$
\begin{aligned}
R_1-\frac{1}{1+\eta}\mathbb{E}_t[(R_1-X)_+]&=\Big(1-\frac{1}{1+\eta}\mathbb{P}_t(A_1)\Big)R_1+\frac{1}{1+\eta}\mathbb{E}_t[\mathbb{I}_{A_1}X],\\
R_2-\frac{1}{1+\eta}\mathbb{E}_t[(R_2-X)_+]&=\Big(1-\frac{1}{1+\eta}\mathbb{P}_t(A_2)\Big)R_2+\frac{1}{1+\eta}\mathbb{E}_t[\mathbb{I}_{A_2}X].
\end{aligned}
$$

We look at the expectation in the first expression:

$$\mathbb{E}_t[\mathbb{I}_{A_1}X]=\mathbb{E}_t[\mathbb{I}_{A_2}X]-\mathbb{E}_t[X\mathbb{I}_{A_2\setminus A_1}]\leq\mathbb{E}_t[\mathbb{I}_{A_2}X]-\mathbb{P}_t(A_2\setminus A_1)R_1.$$

We now see that

$$
\begin{aligned}
&\Big(1-\frac{1}{1+\eta}\mathbb{P}_t(A_1)\Big)R_1+\frac{1}{1+\eta}\mathbb{E}_t[\mathbb{I}_{A_1}X]\\
&\quad\leq\Big(1-\frac{1}{1+\eta}\mathbb{P}_t(A_1)\Big)R_1+\frac{1}{1+\eta}\mathbb{E}_t[\mathbb{I}_{A_2}X]-\frac{1}{1+\eta}\mathbb{P}_t(A_2\setminus A_1)R_1\\
&\quad=\Big(1-\frac{1}{1+\eta}\mathbb{P}_t(A_2)\Big)R_1+\frac{1}{1+\eta}\mathbb{E}_t[\mathbb{I}_{A_2}X]\\
&\quad\leq\Big(1-\frac{1}{1+\eta}\mathbb{P}_t(A_2)\Big)R_2+\frac{1}{1+\eta}\mathbb{E}_t[\mathbb{I}_{A_2}X].
\end{aligned}
$$

This concludes the proof. ∎
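The monotonicity of the map $R\mapsto R-\frac{1}{1+\eta}\mathbb{E}_t[(R-X)_+]$ is also easy to confirm numerically: heuristically, its slope is $1-\mathbb{P}_t(X\leq R)/(1+\eta)>0$. A small check under an empirical measure (my own illustration; the value of $\eta$ and the distribution of $X$ are assumptions):

```python
import random

random.seed(3)

eta = 0.06                     # illustrative cost-of-capital rate
X = [random.gauss(0.0, 1.0) for _ in range(10000)]

def f(R):
    """R - (1/(1+eta)) * E[(R - X)_+] under the empirical measure."""
    return R - sum(max(R - x, 0.0) for x in X) / ((1 + eta) * len(X))

grid = [i / 10.0 - 3.0 for i in range(61)]   # R from -3 to 3
vals = [f(R) for R in grid]
```

For any sample, `vals` is nondecreasing along `grid`, matching the lemma.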

Proof of Theorem 2.

Let $Z=Y-X$. Note that

$$\varphi_{t,\alpha}(Y)=\varphi_{t,\alpha}(X+Z)\leq\varphi_{t,\alpha}(X+|Z|).$$

As for the $\operatorname{VaR}$-part of $\varphi_{t,\alpha}$, including that in the expectation, we note that by Lemma 6, $\operatorname{VaR}_{t,\alpha}(-(X+|Z|))\leq\operatorname{VaR}_{t,\alpha+\delta}(-X)+\operatorname{VaR}_{t,1-\delta}(-|Z|)$. We now note that, by subadditivity of $x\mapsto(x)_+$,

$$
\begin{aligned}
&-\mathbb{E}_t\big[(\operatorname{VaR}_{t,\alpha+\delta}(-X)+\operatorname{VaR}_{t,1-\delta}(-|Z|)-X-|Z|)_+\big]\\
&\quad\leq-\mathbb{E}_t\big[(\operatorname{VaR}_{t,\alpha+\delta}(-X)-X)_+\big]+\mathbb{E}_t\big[(\operatorname{VaR}_{t,1-\delta}(-|Z|)-|Z|)_+\big]\\
&\quad\leq-\mathbb{E}_t\big[(\operatorname{VaR}_{t,\alpha+\delta}(-X)-X)_+\big]+\operatorname{VaR}_{t,1-\delta}(-|Z|).
\end{aligned}
$$

Hence

$$
\begin{aligned}
\varphi_{t,\alpha}(X+|Z|)&\leq\varphi_{t,\alpha+\delta}(X)+\frac{2+\eta}{1+\eta}\operatorname{VaR}_{t,1-\delta}(-|Z|)\\
&\leq\varphi_{t,\alpha+\delta}(X)+\frac{2}{\delta}\mathbb{E}_t[|Z|].
\end{aligned}
$$

Here we have used the Markov inequality bound from the proof of Corollary 1.

We now similarly construct a lower bound for φt,α(Y)\varphi_{t,\alpha}(Y):

$$\varphi_{t,\alpha}(Y)=\varphi_{t,\alpha}(X+Z)\geq\varphi_{t,\alpha}(X-|Z|).$$

Again, for the $\operatorname{VaR}$-part, we note that by Lemma 6, $\operatorname{VaR}_{t,\alpha}(-(X-|Z|))\geq\operatorname{VaR}_{t,\alpha-\delta}(-X)+\operatorname{VaR}_{t,1-\delta}(|Z|)$. We now analyze the resulting expected value part, using subadditivity of $x\mapsto(x)_+$:

$$
\begin{aligned}
&-\mathbb{E}_t\big[(\operatorname{VaR}_{t,\alpha-\delta}(-X)+\operatorname{VaR}_{t,1-\delta}(|Z|)-X+|Z|)_+\big]\\
&\quad\geq-\mathbb{E}_t\big[(\operatorname{VaR}_{t,\alpha-\delta}(-X)-X)_+\big]-\mathbb{E}_t\big[(\operatorname{VaR}_{t,1-\delta}(|Z|)+|Z|)_+\big]\\
&\quad\geq-\mathbb{E}_t\big[(\operatorname{VaR}_{t,\alpha-\delta}(-X)-X)_+\big]-\mathbb{E}_t[|Z|].
\end{aligned}
$$

Hence we get the lower bound

$$
\begin{aligned}
\varphi_{t,\alpha}(X-|Z|)&\geq\varphi_{t,\alpha-\delta}(X)+\operatorname{VaR}_{t,1-\delta}(|Z|)-\frac{1}{1+\eta}\mathbb{E}_t[|Z|]\\
&\geq\varphi_{t,\alpha-\delta}(X)-\frac{2}{\delta}\mathbb{E}_t[|Z|].
\end{aligned}
$$

Here, again, we have used the Markov inequality bound from the proof of Corollary 1. Hence we have shown that

$$\varphi_{t,\alpha}(Y)\in\Big[\varphi_{t,\alpha-\delta}(X)-\frac{2}{\delta}\mathbb{E}_t[|X-Y|],\ \varphi_{t,\alpha+\delta}(X)+\frac{2}{\delta}\mathbb{E}_t[|X-Y|]\Big],$$

from which (24) immediately follows. ∎
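Since the argument only combines Lemma 6, Lemma 9 and Markov's inequality, all of which hold exactly under an empirical measure, the resulting sandwich can be checked directly on simulated data. The sketch below is my own illustration (the quantile convention for $\operatorname{VaR}$ and the parameter values are assumptions):

```python
import math
import random

random.seed(4)

eta, alpha, delta = 0.06, 0.9, 0.04    # illustrative parameters
n = 5000
X = [random.gauss(0.0, 1.0) for _ in range(n)]
Y = [x + random.gauss(0.0, 0.2) for x in X]   # Y close to X in L^1

def var_neg(w, u):
    # Empirical u-quantile inf{y : P(W <= y) >= u}.
    ws = sorted(w)
    return ws[max(math.ceil(u * len(ws)), 1) - 1]

def phi(w, a):
    """phi_a(W) = VaR_a(-W) - (1/(1+eta)) E[(VaR_a(-W) - W)_+],
    computed under the empirical measure."""
    r = var_neg(w, a)
    return r - sum(max(r - wi, 0.0) for wi in w) / ((1 + eta) * len(w))

ez = sum(abs(y - x) for x, y in zip(X, Y)) / n   # E[|X - Y|]
lo = phi(X, alpha - delta) - 2.0 * ez / delta
hi = phi(X, alpha + delta) + 2.0 * ez / delta
mid = phi(Y, alpha)
```

On the paired empirical sample, `lo <= mid <= hi` holds deterministically, mirroring the theorem.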

Proof of Corollary 2.

Choose a sequence $\delta_n\to 0$ such that $\frac{2}{\delta_n^2}\mathbb{E}[|X-X_n|^2]\to 0$, with $\alpha+\delta_n<1$ and $\delta_n<1/2$. We now use the following inequality, which follows from the monotonicity of $\varphi_{t,\alpha}$ in $\alpha$:

$$
\begin{aligned}
|\varphi_{t,\alpha}(X)-\varphi_{t,\alpha}(X_n)|
&\leq\varphi_{t,\alpha+\delta_n}(X)-\varphi_{t,\alpha-\delta_n}(X)+\inf_{|\epsilon|<\delta_n}|\varphi_{t,\alpha+\epsilon}(X)-\varphi_{t,\alpha}(X_n)|\\
&\leq\varphi_{t,\alpha+\delta_n}(X)-\varphi_{t,\alpha-\delta_n}(X)+\frac{2}{\delta_n}\mathbb{E}[|X-X_n|\mid\mathcal{H}_t].
\end{aligned}
$$

By $L^2$-convergence and our choice of $\delta_n$, the last term goes to $0$.

As for the first summand, we see that for any sequence $\delta_n\to 0$, $\varphi_{t,\alpha+\delta_n}(X)-\varphi_{t,\alpha-\delta_n}(X)\to 0$ almost surely (by the continuity assumption on $\operatorname{VaR}_{t,u}$ at $\alpha$), and furthermore it is a decreasing sequence of nonnegative random variables in $L^2(\mathcal{H}_t)$. Hence by Lebesgue's monotone convergence theorem $\|\varphi_{t,\alpha+\delta_n}(X)-\varphi_{t,\alpha-\delta_n}(X)\|_2\to 0$. This concludes the proof. ∎

Proof of Lemma 7.

By Corollary 2, $\varphi_{t,\alpha}$ is $L^2$-continuous with respect to limit objects with a.s. continuous $t$-conditional distributions. Hence the proof is completely analogous to that of Lemma 1. ∎

Proof of Lemma 8.

Fix $\epsilon>0$. We want to show that $\mathbb{P}(\|\varphi_{t,\alpha}(\beta^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})-\varphi_{t,\alpha}(\beta_n^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})\|_2>\epsilon)\to 0$ as $n\to\infty$. We first note that, for any $\delta\in(0,1-\alpha)$ with $\delta<1/2$, we have an inequality similar to that in the proof of Corollary 2:

$$
\begin{aligned}
&\|\varphi_{t,\alpha}(\beta^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})-\varphi_{t,\alpha}(\beta_n^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})\|_2\\
&\quad\leq\|\varphi_{t,\alpha-\delta}(\beta^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})-\varphi_{t,\alpha+\delta}(\beta^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})\|_2\\
&\qquad+\inf_{|\xi|\leq\delta}\|\varphi_{t,\alpha+\xi}(\beta^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})-\varphi_{t,\alpha}(\beta_n^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})\|_2.
\end{aligned}
$$

If we look at the first summand, we see that for any sequence $\delta_n\to 0$, $\varphi_{t,\alpha+\delta_n}(\beta^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})-\varphi_{t,\alpha-\delta_n}(\beta^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})\to 0$ almost surely (by a.s. continuity), and furthermore it is a decreasing sequence of nonnegative random variables. Hence by Lebesgue's monotone convergence theorem $\|\varphi_{t,\alpha+\delta_n}(\beta^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})-\varphi_{t,\alpha-\delta_n}(\beta^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})\|_2\to 0$ as a sequence of constants, since the expression is independent of $D$.

We now apply Theorem 2 to the second term to see that

$$
\begin{aligned}
\inf_{|\xi|\leq\delta}\|\varphi_{t,\alpha+\xi}(\beta^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})-\varphi_{t,\alpha}(\beta_n^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})\|_2
&\leq\frac{2}{\delta}\|(\beta-\beta_n)^{\mathrm{T}}\mathbf{\Phi}_{t+1,N}\|_2\\
&\leq\frac{2}{\delta}\|\beta-\beta_n\|_\infty\|\mathbf{\Phi}_{t+1,N}\|_2.
\end{aligned}
$$

Note that $K_\Phi:=\|\mathbf{\Phi}_{t+1,N}\|_2$ is just a constant. Since $\|\beta-\beta_n\|_\infty\to 0$ in probability, for any fixed $\epsilon>0$ it is possible to choose a sequence $\delta_n\to 0$ such that $\mathbb{P}\big(\frac{2K_\Phi}{\delta_n}\|\beta-\beta_n\|_\infty>\epsilon\big)\to 0$. Hence, for any fixed $\epsilon>0$, $\mathbb{P}(\|\varphi_{t,\alpha}(\beta^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})-\varphi_{t,\alpha}(\beta_n^{\mathrm{T}}\mathbf{\Phi}_{t+1,N})\|_2>\epsilon)\to 0$ as $n\to\infty$. ∎

Proof of Theorem 3.

We prove the statement by backwards induction, starting from time $t=T$. The induction base is trivial. Now assume that the statement holds for time $t+1$. But then, by Lemma 8, $\|\varphi_{t,\alpha}(L_{t+1}+\widehat{V}_{N,t+1,\alpha}(L))-\varphi_{t,\alpha}(L_{t+1}+\widehat{V}^{(M)}_{N,t+1,\alpha}(L))\|_2\to 0$ in probability. Hence, we get immediately by Lemma 2 that $\|\widehat{V}_{N,t,\alpha}(L)-\widehat{V}^{(M)}_{N,t,\alpha}(L)\|_2\to 0$ in probability. This concludes the proof. ∎

References

  • [1] Philippe Artzner, Freddy Delbaen, Jean-Marc Eber, David Heath and Hyejin Ku (2007), Coherent multiperiod risk adjusted values and Bellman’s principle. Annals of Operations Research, 152, 5-22.
  • [2] D. Barrera, S. Crépey, B. Diallo, G. Fort, E. Gobet and U. Stazhynski (2018), Stochastic approximation schemes for economic capital and risk margin computations. HAL <hal-01710394>
  • [3] Mark Broadie, Yiping Du and Ciamac C. Moallemi (2015), Risk estimation via regression. Operations Research, 63 (5) 1077-1097.
  • [4] Patrick Cheridito, Freddy Delbaen and Michael Kupper (2006), Dynamic monetary risk measures for bounded discrete-time processes. Electronic Journal of Probability, 11, 57-106.
  • [5] Patrick Cheridito and Michael Kupper (2011), Composition of time-consistent dynamic monetary risk measures in discrete time. International Journal of Theoretical and Applied Finance, 14 (1), 137-162.
  • [6] Patrick Cheridito and Michael Kupper (2009), Recursiveness of indifference prices and translation-invariant preferences. Mathematics and Financial Economics, 2 (3), 173-188.
  • [7] Emmanuelle Clément, Damien Lamberton and Philip Protter (2002), An analysis of a least squares regression method for American option pricing. Finance and Stochastics, 6, 449-471.
  • [8] European Commission (2015), Commission delegated regulation (EU) 2015/35 of 10 October 2014. Official Journal of the European Union.
  • [9] Łukasz Delong, Jan Dhaene and Karim Barigou (2019), Fair valuation of insurance liability cash-flow streams in continuous time: Applications. ASTIN Bulletin, 49 (2), 299-333.
  • [10] Kai Detlefsen and Giacomo Scandolo (2005), Conditional and dynamic convex risk measures. Finance and Stochastics, 9, 539-561.
  • [11] Jan Dhaene, Ben Stassen, Karim Barigou, Daniël Linders and Ze Chen (2017), Fair valuation of insurance liabilities: Merging actuarial judgement and market-consistency. Insurance: Mathematics and Economics, 76, 14-27.
  • [12] Hampus Engsner, Mathias Lindholm and Filip Lindskog (2017), Insurance valuation: A computable multi-period cost-of-capital approach. Insurance: Mathematics and Economics, 72, 250-264.
  • [13] Hampus Engsner and Filip Lindskog (2020), Continuous-time limits of multi-period cost-of-capital margins. Statistics and Risk Modelling, forthcoming (DOI: https://doi.org/10.1515/strm-2019-0008)
  • [14] Daniel Egloff (2005), Monte Carlo algorithms for optimal stopping and statistical learning, The Annals of Applied Probability, 15 (2), 1396-1432.
  • [15] Hans Föllmer and Alexander Schied (2016), Stochastic finance: An introduction in discrete time, 4th edition, De Gruyter Graduate.
  • [16] Francis A. Longstaff and Eduardo S. Schwartz (2001), Valuing American options by simulation: A simple least-squares approach. The Review of Financial Studies, 14 (1), 113-147.
  • [17] Christoph Möhr (2011), Market-consistent valuation of insurance liabilities by cost of capital. ASTIN Bulletin, 41, 315-341.
  • [18] Antoon Pelsser and Ahmad Salahnejhad Ghalehjooghi (2020). Time-consistent and market-consistent actuarial valuation of the participating pension contract. Scandinavian Actuarial Journal, forthcoming (DOI: https://doi.org/10.1080/03461238.2020.1832911)
  • [19] Antoon Pelsser and Ahmad Salahnejhad Ghalehjooghi (2016). Time-consistent actuarial valuations. Insurance: Mathematics and Economics, 66, 97-112.
  • [20] Antoon Pelsser and Mitja Stadje (2014), Time-consistent and market-consistent evaluations. Mathematical Finance, 24 (1), 25-65.
  • [21] Lars Stentoft (2004), Convergence of the least squares Monte Carlo approach to American option valuation. Management Science, 50 (9), 1193-1203.
  • [22] John N. Tsitsiklis and Benjamin Van Roy (2001), Regression methods for pricing complex American-style options. IEEE Transactions on Neural Networks, 12 (4), 694-703.
  • [23] Daniel Z. Zanger (2009), Convergence of a least-squares Monte Carlo algorithm for bounded approximating sets. Applied Mathematical Finance, 16 (2), 123-150.
  • [24] Daniel Z. Zanger (2013), Quantitative error estimates for a least-squares Monte Carlo algorithm for American option pricing. Finance and Stochastics, 17, 503-534.
  • [25] Daniel Z. Zanger (2018), Convergence of a least-squares Monte Carlo algorithm for American option pricing with dependent sample data. Mathematical Finance, 28 (1), 447-479.