
A new efficient approximation scheme for solving high-dimensional semilinear PDEs: control variate method for Deep BSDE solver

Akihiko Takahashi (The University of Tokyo, Tokyo, Japan), Yoshifumi Tsuchida (Hitotsubashi University, Tokyo, Japan) and Toshihiro Yamada (Hitotsubashi University, Tokyo, Japan; Japan Science and Technology Agency (JST), Tokyo, Japan)
(First version: January 21, 2021,   This version: January 30, 2021)
Abstract

This paper introduces a new approximation scheme for solving high-dimensional semilinear partial differential equations (PDEs) and backward stochastic differential equations (BSDEs). First, we decompose a target semilinear PDE (BSDE) into two parts, namely “dominant” linear and “small” nonlinear PDEs. Then, we employ a Deep BSDE solver with a new control variate method to solve those PDEs, where approximations based on an asymptotic expansion technique are effectively applied to the linear part and also used as control variates for the nonlinear part. Moreover, our theoretical result indicates that errors of the proposed method become much smaller than those of the original Deep BSDE solver. Finally, we show numerical experiments to demonstrate the validity of our method, which is consistent with the theoretical result in this paper.

Keywords. Deep learning, Semilinear partial differential equations, Backward stochastic differential equations, Deep BSDE solver, Asymptotic expansion, Control variate method

1 Introduction

High-dimensional semilinear partial differential equations (PDEs) are often used to describe various complex, large-scale phenomena appearing in physics, applied mathematics, economics and finance. Such PDEs typically have the form:

$$\frac{\partial}{\partial t}u(t,x)+\mathcal{L}u(t,x)+f\big(t,x,u(t,x),\partial_{x}u(t,x)\sigma(t,x)\big)=0,\quad t<T,\ x\in\mathbb{R}^{d},\tag{1.1}$$
$$u(T,x)=g(x),\quad x\in\mathbb{R}^{d},$$

where $f$ is a nonlinear function and $\mathcal{L}$ is a second-order differential operator of the type:

$$\mathcal{L}\varphi(t,x)=\sum_{i}\mu^{i}(t,x)\partial_{x_{i}}\varphi(t,x)+\frac{1}{2}\sum_{i,j}[\sigma\sigma^{\top}]_{i,j}(t,x)\partial_{x_{i}}\partial_{x_{j}}\varphi(t,x),\tag{1.2}$$

and the dimension $d$ is assumed to be high. Since such nonlinear PDEs have no closed-form solutions, especially in high-dimensional cases, we have to rely on numerical schemes. Classical methods such as finite differences and finite elements fail in high-dimensional cases due to the exponential growth of their complexity. In the last two decades, probabilistic approaches based on Monte Carlo methods for backward stochastic differential equations (BSDEs) have been studied, since solutions of semilinear PDEs can be represented by those of corresponding BSDEs through the nonlinear Feynman-Kac formula (see Zhang (2017) [42] for instance).
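Concretely, the correspondence reads as follows (a standard statement of the nonlinear Feynman-Kac formula, recalled here for the reader's convenience): if $X$ is the diffusion with generator $\mathcal{L}$ and $(Y,Z)$ solves the BSDE

$$-dY_{t}=f(t,X_{t},Y_{t},Z_{t})dt-Z_{t}dW_{t},\qquad Y_{T}=g(X_{T}),$$

then, under suitable regularity conditions, $Y_{t}=u(t,X_{t})$ and $Z_{t}=\partial_{x}u(t,X_{t})\sigma(t,X_{t})$, so that $u(0,x)=Y_{0}$ when $X_{0}=x$.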

In Weinan E et al. (2017) [3], a novel computational scheme called the Deep BSDE method is proposed. In the Deep BSDE method, a stochastic target problem is considered with a forward-discretization scheme of the related BSDE, and the resulting control problem is solved with a deep learning algorithm. The Deep BSDE method has opened the door to the tractability of higher-dimensional problems, enabling us to solve high-dimensional semilinear PDEs within realistic computation time. Recently, notable related works, mostly based on neural networks, have developed new methods for solving various types of high-dimensional PDEs; see [2][4][5][10][11][12][14][15][16][17][18][23][30][41] for example.

While high-dimensional semilinear PDEs can be feasibly solved by the Deep BSDE method, the deviation of its estimated value from the true one is not necessarily small within reasonable computation time. Constructing an acceleration scheme for the Deep BSDE method is therefore desirable.

Fujii et al. (2019) [9] proposed an improved scheme for the Deep BSDE method. They used prior knowledge obtained by an asymptotic expansion method for a target BSDE as a fast approximation, and found that numerical errors become small in accordance with the fast decrease in the values of the corresponding loss function. The scheme reduces the processing load of the original Deep BSDE solver. For details of the asymptotic expansion method, the key technique applied in their article, see Takahashi (1999, 2015) [31][32], Kunitomo and Takahashi (2001, 2003) [21][22] and references therein. Moreover, Naito and Yamada (2020) [27] presented an extended scheme of Fujii et al. (2019) [9] by applying the backward Euler scheme for a BSDE with a good initial detection of the solution to a target PDE so that the Deep BSDE method works more efficiently.

In the current work, we develop a new deep learning-based approximation for solving high-dimensional semilinear PDEs by extending the schemes in Weinan E et al. (2017) [3], Fujii et al. (2019) [9] and Naito and Yamada (2020) [27]. In particular, we propose an efficient control variate method for the Deep BSDE solver in order to obtain more accurate and stable approximations. Let us briefly explain the strategy considered in this paper. We first decompose the semilinear PDE (1.1) into two parts,

$u(t,x)=\mathcal{U}^{1}(t,x)+\mathcal{U}^{2}(t,x)$, as follows:

$$\frac{\partial}{\partial t}\mathcal{U}^{1}(t,x)+\mathcal{L}\,\mathcal{U}^{1}(t,x)=0,\quad t<T,\ x\in\mathbb{R}^{d},\tag{1.3}$$
$$\mathcal{U}^{1}(T,x)=g(x),\quad x\in\mathbb{R}^{d},$$

and

$$\frac{\partial}{\partial t}\mathcal{U}^{2}(t,x)+\mathcal{L}\,\mathcal{U}^{2}(t,x)+f\big(t,x,\mathcal{U}^{1}(t,x)+\mathcal{U}^{2}(t,x),\partial_{x}\mathcal{U}^{1}(t,x)\sigma(t,x)+\partial_{x}\mathcal{U}^{2}(t,x)\sigma(t,x)\big)=0,\quad t<T,\ x\in\mathbb{R}^{d},\tag{1.4}$$
$$\mathcal{U}^{2}(T,x)=0,\quad x\in\mathbb{R}^{d}.$$

Here, we remark that the solution $u$ of the semilinear PDE (1.1) is given by the sum of the solutions $\mathcal{U}^{1}$ and $\mathcal{U}^{2}$ of the PDEs (1.3) and (1.4), respectively. Also, we note that $\mathcal{U}^{1}$ solves the “dominant” linear PDE, while $\mathcal{U}^{2}$ solves the “small” residual nonlinear PDE with null terminal condition, whose magnitude is governed by the driver $(t,x,y,z)\mapsto f(t,x,\mathcal{U}^{1}(t,x)+y,\partial_{x}\mathcal{U}^{1}(t,x)\sigma(t,x)+z)$ and which is generally expected to have only a small nonlinear effect on the solution $u$. Consequently, the decomposition of the target $u(0,\cdot)$ is represented as follows:

$$u(0,x)=\underbrace{\mathcal{U}^{1}(0,x)}_{\text{“dominant” linear PDE part}}+\underbrace{\mathcal{U}^{2}(0,x)}_{\text{“small” nonlinear PDE part}},\quad x\in\mathbb{R}^{d}.\tag{1.5}$$

We next approximate

  1. $\mathcal{U}^{1}$ by an asymptotic expansion method, denoted by $\mathcal{U}^{1,\mathrm{Asymp}}$;

  2. $\mathcal{U}^{2}$ by the Deep BSDE method, denoted by $\mathcal{U}^{2,\mathrm{Deep}}$.

We expect that $\mathcal{U}^{1,\mathrm{Asymp}}$ in the approximation

$$u(0,x)\approx\mathcal{U}^{1,\mathrm{Asymp}}(0,x)+\mathcal{U}^{2,\mathrm{Deep}}(0,x),\quad x\in\mathbb{R}^{d},\tag{1.6}$$

becomes a control variate. Furthermore, $\mathcal{U}^{1,\mathrm{Asymp}}$ and $\partial_{x}\mathcal{U}^{1,\mathrm{Asymp}}\sigma$ in the approximate driver $(t,x,y,z)\mapsto f(t,x,\mathcal{U}^{1,\mathrm{Asymp}}(t,x)+y,\partial_{x}\mathcal{U}^{1,\mathrm{Asymp}}(t,x)\sigma(t,x)+z)$ of $\mathcal{U}^{2,\mathrm{Deep}}$ act as a second layer of control variates. The current work shows how well the proposed method works as a new deep learning-based approximation, in both theoretical and numerical aspects.

The organization of this paper is as follows: the next section briefly introduces the Deep BSDE solver and acceleration schemes with asymptotic expansions; Section 3 explains our proposed method with the main theoretical result; and Section 4 presents our numerical scheme with its experiments.

2 Deep BSDE solver and acceleration scheme with asymptotic expansion

Let $T>0$ and let $(\Omega,\mathcal{F},\{\mathcal{F}_{t}\}_{0\leq t\leq T},P)$ be a filtered probability space equipped with a $d$-dimensional Brownian motion $W=\{(W_{t}^{1},\cdots,W_{t}^{d})\}_{0\leq t\leq T}$ and a square-integrable $\mathbb{R}^{d}$-valued random variable $\xi$ independent of $W$. The filtration $\{\mathcal{F}_{t}\}_{0\leq t\leq T}$ is generated by $\{W_{t}+\xi\}_{0\leq t\leq T}$. Under this setting we consider the following FBSDE:

$$dX_{t}^{\varepsilon}=\mu(t,X_{t}^{\varepsilon})dt+\varepsilon\sigma(t,X_{t}^{\varepsilon})dW_{t},\quad X_{0}^{\varepsilon}=\xi,\tag{2.1}$$
$$-dY_{t}^{\varepsilon,\alpha}=\alpha f(t,X_{t}^{\varepsilon},Y_{t}^{\varepsilon,\alpha},Z_{t}^{\varepsilon,\alpha})dt-Z_{t}^{\varepsilon,\alpha}dW_{t},\quad Y_{T}^{\varepsilon,\alpha}=g(X_{T}^{\varepsilon}),\tag{2.2}$$

where $\mu$ is an $\mathbb{R}^{d}$-valued function on $[0,T]\times\mathbb{R}^{d}$, $\sigma$ is an $\mathbb{R}^{d\otimes d}$-valued function on $[0,T]\times\mathbb{R}^{d}$, $f:[0,T]\times\mathbb{R}^{d}\times\mathbb{R}\times\mathbb{R}^{d}\to\mathbb{R}$ and $g:\mathbb{R}^{d}\to\mathbb{R}$ are functions such that the FBSDE has a unique solution, and $\varepsilon,\alpha\in(0,1)$ are small parameters. Here, we assume that $\mu$ and $\sigma$ are bounded, smooth in $x$, and have bounded derivatives of all orders. Also, $f$ is a uniformly Lipschitz continuous function with Lipschitz constant $C_{\mathrm{Lip}}[f]$ and of at most linear growth in the variables $x,y,z$. The function $g$ is assumed to be of $C^{2}_{b}$-class. The functions $\mu,\sigma,f$ are uniformly Hölder-$1/2$ continuous with respect to $t$. Furthermore, we impose the condition that there is $\varepsilon_{0}>0$ such that $\sigma(t,x)\sigma(t,x)^{\top}\geq\varepsilon_{0}I$ for all $t\in[0,T]$ and $x\in\mathbb{R}^{d}$. We sometimes omit the superscripts ${}^{\varepsilon}$ or ${}^{\varepsilon,\alpha}$ if no confusion arises.

The corresponding semilinear PDE is given by

$$\partial_{t}u(t,x)+\mathcal{L}^{\varepsilon}u(t,x)+f^{\alpha}(t,x,u(t,x),\partial_{x}u(t,x)\sigma^{\varepsilon}(t,x))=0,\quad t<T,\tag{2.3}$$
$$u(T,x)=g(x),$$

where $\sigma^{\varepsilon}=\varepsilon\sigma$, $f^{\alpha}=\alpha f$, $\partial_{x}=(\partial_{x_{1}},\cdots,\partial_{x_{d}})=(\partial/\partial x_{1},\cdots,\partial/\partial x_{d})$ and $\mathcal{L}^{\varepsilon}$ is the generator:

$$\mathcal{L}^{\varepsilon}=\sum_{i=1}^{d}\mu^{i}(t,x)\frac{\partial}{\partial x_{i}}+\frac{1}{2}\sum_{i_{1},i_{2}=1}^{d}\sigma^{\varepsilon,i_{1}}(t,x)\sigma^{\varepsilon,i_{2}}(t,x)\frac{\partial^{2}}{\partial x_{i_{1}}\partial x_{i_{2}}}.\tag{2.4}$$

The purpose of this paper is to estimate

$$u(0,X_{0}^{\varepsilon})=Y_{0}^{\varepsilon,\alpha}\tag{2.5}$$

for high-dimensional FBSDEs/semilinear PDEs. In particular, we introduce an approximation with a deep BSDE solver to propose an efficient control variate method for solving semilinear PDEs. To explain how our method works as a new scheme, we briefly review the Deep BSDE method proposed in Weinan E et al. (2017) [3] and the approximation method developed by Fujii et al. (2019) [9].

2.1 Deep BSDE method by Weinan E et al. (2017)

In Weinan E et al. (2017) [3], the authors considered the minimization problem of the loss function:

$$\inf_{Y_{0}^{\varepsilon,\alpha,(n)},\,Z^{\varepsilon,\alpha,(n)}}\Big\|g(\bar{X}_{T}^{\varepsilon,(n)})-Y_{T}^{\varepsilon,\alpha,(n)}\Big\|_{2}^{2}\tag{2.6}$$

where $\|\cdot\|_{2}=E[|\cdot|^{2}]^{1/2}$, subject to

$$Y_{t}^{\varepsilon,\alpha,(n)}=Y^{\varepsilon,\alpha,(n)}_{0}-\int_{0}^{t}f^{\alpha}(s,\bar{X}_{s}^{\varepsilon,(n)},Y_{s}^{\varepsilon,\alpha,(n)},Z_{s}^{\varepsilon,\alpha,(n)})ds+\int_{0}^{t}Z_{s}^{\varepsilon,\alpha,(n)}dW_{s},\tag{2.7}$$

where $\bar{X}^{\varepsilon,(n)}$ is the continuous Euler-Maruyama scheme with $n$ discretization time steps:

$$\bar{X}_{t}^{\varepsilon,(n)}=\xi+\int_{0}^{t}\mu(\varphi(s),\bar{X}_{\varphi(s)}^{\varepsilon,(n)})ds+\int_{0}^{t}\sigma^{\varepsilon}(\varphi(s),\bar{X}_{\varphi(s)}^{\varepsilon,(n)})dW_{s},\quad t\geq 0,\tag{2.8}$$

with $\varphi(s)=\max\{kT/n;\ kT/n\leq s\}$. They solved the problem by using a deep learning algorithm and checked the effectiveness of the method for nonlinear BSDEs/PDEs even when the dimension $d$ is high. The method is known as the Deep BSDE solver.
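To make the scheme concrete, here is a minimal sketch of the forward loop in PyTorch (our illustration, not the authors' code; `mu`, `sigma_eps`, `f_alpha`, `g` are user-supplied callables for $\mu$, $\sigma^{\varepsilon}$, $f^{\alpha}$, $g$; a diagonal $\sigma$ is assumed for brevity, and `z_nets` is a list of $n$ small networks, one per time step):

```python
import torch

def deep_bsde_loss(y0, z_nets, mu, sigma_eps, f_alpha, g, xi, T, n, batch):
    """Monte Carlo estimate of the loss (2.6) for the Deep BSDE solver.

    y0 : trainable tensor of shape (1, 1), the candidate for Y_0.
    xi : tensor of shape (d,), the initial value of the forward SDE.
    """
    dt = T / n
    x = xi.expand(batch, -1).clone()           # Euler-Maruyama state X-bar
    y = y0.expand(batch, 1)                    # candidate Y_0, broadcast over paths
    for k in range(n):
        t = k * dt
        z = z_nets[k](x)                       # Z_{t_k} approximated by a network
        dw = torch.randn_like(x) * dt ** 0.5   # Brownian increments
        # forward discretization (2.7) of the BSDE ...
        y = y - f_alpha(t, x, y, z) * dt + (z * dw).sum(1, keepdim=True)
        # ... and Euler-Maruyama step (2.8) for the forward SDE
        x = x + mu(t, x) * dt + sigma_eps(t, x) * dw
    return ((g(x) - y) ** 2).mean()            # loss (2.6)
```

Minimizing this loss over `y0` and the network weights with a stochastic-gradient method then yields the quantity $Y^{\varepsilon,\alpha,(n),\ast}_{0}$ appearing in (2.9) below.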

Then, we have

$$Y_{0}^{\varepsilon,\alpha}\approx Y^{\varepsilon,\alpha,(n),\ast}_{0},\tag{2.9}$$

where $Y_{0}^{\varepsilon,\alpha,(n),\ast}$ is obtained by solving (2.6), which is justified by the following estimate shown in Han and Long (2020) [14].

Theorem 1 (Han and Long (2020)).

There exists $C>0$ such that

$$E[|Y_{0}^{\varepsilon,\alpha}-Y_{0}^{\varepsilon,\alpha,(n)}|^{2}]\leq C\frac{1}{n}+C\big\|g(\bar{X}_{T}^{\varepsilon,(n)})-Y_{T}^{\varepsilon,\alpha,(n)}\big\|_{2}^{2},\tag{2.10}$$

for $n\geq 1$.

2.2 An approximation method by Fujii et al. (2019)

In Fujii et al. (2019) [9], the authors considered the problem

$$\inf_{\widetilde{Y}_{0}^{\varepsilon,\alpha,(n)},\,\widetilde{Z}^{\mathrm{Res},\varepsilon,\alpha,(n)}}\Big\|g(\bar{X}_{T}^{\varepsilon,(n)})-\widetilde{Y}_{T}^{\varepsilon,\alpha,(n)}\Big\|_{2}^{2}\tag{2.11}$$

subject to

$$\begin{aligned}
\widetilde{Y}_{t}^{\varepsilon,\alpha,(n)}&=\widetilde{Y}^{\varepsilon,\alpha,(n)}_{0}-\int_{0}^{t}f^{\alpha}(s,\bar{X}_{s}^{\varepsilon,(n)},\widetilde{Y}_{s}^{\varepsilon,\alpha,(n)},\widehat{Z}_{s}^{\varepsilon,\alpha,(n)}+\widetilde{Z}_{s}^{\mathrm{Res},\varepsilon,\alpha,(n)})ds\\
&\quad+\int_{0}^{t}\{\widehat{Z}_{s}^{\varepsilon,\alpha,(n)}+\widetilde{Z}_{s}^{\mathrm{Res},\varepsilon,\alpha,(n)}\}dW_{s},
\end{aligned}\tag{2.12}$$

where $\widehat{Z}^{\varepsilon,\alpha,(n)}$ is prior knowledge of $Z$, easily computed by an asymptotic expansion method, and they solve the minimization problem with respect to $\widetilde{Y}_{0}^{\varepsilon,\alpha,(n)}$ and $\widetilde{Z}^{\mathrm{Res},\varepsilon,\alpha,(n)}$ by the Deep BSDE solver. The authors showed that the scheme gives better accuracy than the original Deep BSDE solver. Furthermore, Naito and Yamada (2020) [27] proposed an acceleration scheme extending the method of Fujii et al. (2019) [9] with a good initial detection of $Y_{0}$ and the backward Euler scheme for $Z$. They confirmed that the numerical error of the method becomes smaller even when the number of iteration steps is small; in other words, the scheme gives faster computation for nonlinear BSDEs/PDEs than the original Deep BSDE method [3] and Fujii et al. (2019) [9].
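In this scheme only the residual part of $Z$ is learned; schematically, the loop body of the sketch in Section 2.1 changes as follows (again our illustration; `z_hat` is an assumed closed-form callable implementing the asymptotic-expansion prior $\widehat{Z}$, and `z_res_nets` are the residual networks):

```python
# inside the time loop of deep_bsde_loss:
z = z_hat(t, x) + z_res_nets[k](x)   # Z = Z-hat (prior) + Z-tilde^Res (learned)
y = y - f_alpha(t, x, y, z) * dt + (z * dw).sum(1, keepdim=True)
```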

3 New method

We propose a new method as an extension of Fujii et al. (2019) [9] and Naito and Yamada (2020) [27]. The new scheme is regarded as a control variate method for solving high-dimensional nonlinear BSDEs/PDEs, motivated by the perturbation scheme in Takahashi and Yamada (2015) [34]. In the following, let us explain the proposed method. We first decompose $(Y^{\varepsilon,\alpha},Z^{\varepsilon,\alpha})$ as $Y^{\varepsilon,\alpha}=\mathcal{Y}^{1,\varepsilon}+\alpha\mathcal{Y}^{2,\varepsilon}$ and $Z^{\varepsilon,\alpha}=\mathcal{Z}^{1,\varepsilon}+\alpha\mathcal{Z}^{2,\varepsilon}$ by introducing

$$-d\mathcal{Y}_{t}^{1,\varepsilon}=-\mathcal{Z}_{t}^{1,\varepsilon}dW_{t},\quad\mathcal{Y}_{T}^{1,\varepsilon}=g(X_{T}^{\varepsilon}),\tag{3.1}$$
$$-d\mathcal{Y}_{t}^{2,\varepsilon}=f(t,X_{t}^{\varepsilon},\mathcal{Y}_{t}^{1,\varepsilon}+\alpha\mathcal{Y}_{t}^{2,\varepsilon},\mathcal{Z}_{t}^{1,\varepsilon}+\alpha\mathcal{Z}_{t}^{2,\varepsilon})dt-\mathcal{Z}_{t}^{2,\varepsilon}dW_{t},\quad\mathcal{Y}_{T}^{2,\varepsilon}=0.\tag{3.2}$$

Here, we note that $(\mathcal{Y}^{1,\varepsilon},\mathcal{Z}^{1,\varepsilon})$ is the solution of a linear BSDE and that $(\alpha\mathcal{Y}^{2,\varepsilon},\alpha\mathcal{Z}^{2,\varepsilon})$ can be interpreted as the solution of a “residual (nonlinear) BSDE”.
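Indeed, summing (3.1) and $\alpha$ times (3.2) recovers (2.2):

$$-d(\mathcal{Y}_{t}^{1,\varepsilon}+\alpha\mathcal{Y}_{t}^{2,\varepsilon})=\alpha f(t,X_{t}^{\varepsilon},\mathcal{Y}_{t}^{1,\varepsilon}+\alpha\mathcal{Y}_{t}^{2,\varepsilon},\mathcal{Z}_{t}^{1,\varepsilon}+\alpha\mathcal{Z}_{t}^{2,\varepsilon})dt-(\mathcal{Z}_{t}^{1,\varepsilon}+\alpha\mathcal{Z}_{t}^{2,\varepsilon})dW_{t},$$

with terminal value $g(X_{T}^{\varepsilon})$, which is exactly the dynamics of $(Y^{\varepsilon,\alpha},Z^{\varepsilon,\alpha})$ with driver $f^{\alpha}=\alpha f$.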

Let $\mathcal{U}^{1}$ be the solution of the linear PDE corresponding to $(\mathcal{Y}^{1,\varepsilon},\mathcal{Z}^{1,\varepsilon})$:

$$\partial_{t}\mathcal{U}^{1}(t,x)+\mathcal{L}^{\varepsilon}\mathcal{U}^{1}(t,x)=0,\quad t<T,\tag{3.3}$$
$$\mathcal{U}^{1}(T,x)=g(x).$$

3.1 Deep BSDE solver for explicitly solvable $(\mathcal{Y}^{1,\varepsilon},\mathcal{Z}^{1,\varepsilon})$

We start with the case where $(\mathcal{Y}^{1,\varepsilon},\mathcal{Z}^{1,\varepsilon})$ is explicitly solvable in closed form, in order to explain the motivation of the paper. Even in this case, $(\alpha\mathcal{Y}^{2,\varepsilon},\alpha\mathcal{Z}^{2,\varepsilon})$ cannot be obtained in closed form due to the nonlinearity of the driver $f$. Hence, we apply the Deep BSDE method to the residual nonlinear BSDE for $(\alpha\mathcal{Y}^{2,\varepsilon},\alpha\mathcal{Z}^{2,\varepsilon})$. Then, the following will be an approximation for the target $Y_{0}^{\varepsilon,\alpha}$:

$$Y_{0}^{\varepsilon,\alpha}\approx\mathcal{Y}_{0}^{1,\varepsilon}+\alpha\widetilde{\mathcal{Y}}_{0}^{2,\varepsilon,(n)\ast},\tag{3.4}$$

where $\widetilde{\mathcal{Y}}^{2,\varepsilon,(n)\ast}$ is obtained as a solution of the following problem based on the Deep BSDE method with closed-form functions for $\mathcal{Y}^{1,\varepsilon}$ and $\mathcal{Z}^{1,\varepsilon}$:

$$\inf_{\widetilde{\mathcal{Y}}^{2,\varepsilon,(n)}_{0},\,\widetilde{\mathcal{Z}}^{2,\varepsilon,(n)}}\Big\|\widetilde{\mathcal{Y}}_{T}^{2,\varepsilon,(n)}\Big\|_{2}^{2}\tag{3.5}$$

subject to

$$\widetilde{\mathcal{Y}}_{t}^{2,\varepsilon,(n)}=\widetilde{\mathcal{Y}}^{2,\varepsilon,(n)}_{0}-\int_{0}^{t}f(s,\bar{X}_{s}^{\varepsilon,(n)},\overline{\mathcal{Y}}_{s}^{1,\varepsilon,(n)}+\alpha\widetilde{\mathcal{Y}}_{s}^{2,\varepsilon,(n)},\overline{\mathcal{Z}}_{s}^{1,\varepsilon,(n)}+\alpha\widetilde{\mathcal{Z}}_{s}^{2,\varepsilon,(n)})ds+\int_{0}^{t}\widetilde{\mathcal{Z}}_{s}^{2,\varepsilon,(n)}dW_{s},\tag{3.6}$$

where

$$\overline{\mathcal{Y}}_{t}^{1,\varepsilon,(n)}=\mathcal{U}^{1}(t,\bar{X}^{\varepsilon,(n)}_{t}),\quad\overline{\mathcal{Z}}_{t}^{1,\varepsilon,(n)}=(\partial_{x}\mathcal{U}^{1}\sigma^{\varepsilon})(t,\bar{X}^{\varepsilon,(n)}_{t}),\quad t\in[0,T],\tag{3.7}$$

with the continuous Euler-Maruyama scheme $\bar{X}^{\varepsilon,(n)}=\{\bar{X}^{\varepsilon,(n)}_{t}\}_{t\geq 0}(=\bar{X}^{(n)})$ and the closed-form functions $\mathcal{U}^{1}$ and $(\partial_{x}\mathcal{U}^{1}\sigma^{\varepsilon})$.
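For illustration, the problem (3.5)-(3.6) only modifies the Deep BSDE sketch of Section 2.1 in the driver, the terminal target and the control variates (3.7); a minimal PyTorch version, assuming closed-form callables `U1` and `dU1_sigma` for $\mathcal{U}^{1}$ and $\partial_{x}\mathcal{U}^{1}\sigma^{\varepsilon}$, might read:

```python
import torch

def residual_bsde_loss(y0, z_nets, mu, sigma_eps, f, alpha,
                       U1, dU1_sigma, xi, T, n, batch):
    """Monte Carlo estimate of the loss (3.5) for the residual BSDE (3.6)."""
    dt = T / n
    x = xi.expand(batch, -1).clone()
    y = y0.expand(batch, 1)                    # candidate for Y-tilde^2_0
    for k in range(n):
        t = k * dt
        y1, z1 = U1(t, x), dU1_sigma(t, x)     # control variates (3.7)
        z2 = z_nets[k](x)                      # residual Z-tilde^2 (learned)
        dw = torch.randn_like(x) * dt ** 0.5
        y = y - f(t, x, y1 + alpha * y, z1 + alpha * z2) * dt \
              + (z2 * dw).sum(1, keepdim=True)
        x = x + mu(t, x) * dt + sigma_eps(t, x) * dw
    return (y ** 2).mean()                     # null terminal: ||Y-tilde^2_T||^2
```

The approximation (3.4) is then $\mathcal{U}^{1}(0,\xi)+\alpha\,y_{0}^{\ast}$ with the optimized $y_{0}^{\ast}$.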

In this case, we have the following error estimate with a small $\alpha$-effect in the residual nonlinear BSDE. The proof will be shown as a part of the one for Theorem 3 in the next subsection; see in particular the sentences after (3.24) and (3.33).

Theorem 2.

There exists $C>0$ such that

$$E[|Y_{0}^{\varepsilon,\alpha}-\{\mathcal{Y}_{0}^{1,\varepsilon}+\alpha\widetilde{\mathcal{Y}}_{0}^{2,\varepsilon,(n)}\}|^{2}]\leq\alpha^{2}C\Big\{\frac{1}{n}+\big\|\widetilde{\mathcal{Y}}^{2,\varepsilon,(n)}_{T}\big\|_{2}^{2}\Big\},\tag{3.8}$$

for all $\varepsilon,\alpha\in(0,1)$ and $n\geq 1$.

3.2 General case: Deep BSDE solver for unsolvable $(\mathcal{Y}^{1,\varepsilon},\mathcal{Z}^{1,\varepsilon})$

In most cases, $(\mathcal{Y}^{1,\varepsilon},\mathcal{Z}^{1,\varepsilon})$ is not solvable in closed form, particularly when the dimension $d$ is high. In such cases, we need to approximate $(\mathcal{Y}^{1,\varepsilon},\mathcal{Z}^{1,\varepsilon})$. However, constructing tractable approximations of $\mathcal{Y}^{1,\varepsilon}_{t}=\mathcal{U}^{1}(t,X^{\varepsilon}_{t})$, $t\geq 0$, and especially of $\mathcal{Z}^{1,\varepsilon}_{t}=(\partial_{x}\mathcal{U}^{1}\sigma^{\varepsilon})(t,X^{\varepsilon}_{t})$, $t\geq 0$, is not an easy task because the latter involves the gradient of $\mathcal{U}^{1}$. A possible solution is to use an asymptotic expansion approach with stochastic calculus. We prepare some notation from Malliavin calculus. Let $\mathbb{D}^{\infty}$ be the space of smooth Wiener functionals in the sense of Malliavin. For a nondegenerate $F\in(\mathbb{D}^{\infty})^{d}$, $G\in\mathbb{D}^{\infty}$ and a multi-index $\gamma$, there exists $H_{\gamma}(F,G)\in\mathbb{D}^{\infty}$ such that $E[\partial^{\gamma}\varphi(F)G]=E[\varphi(F)H_{\gamma}(F,G)]$ for all $\varphi\in C^{\infty}_{b}(\mathbb{R}^{d})$. See Chapters V.8-10 in Ikeda and Watanabe (1989) [20] and Chapters 1-2 in Nualart (2006) [28] for the details.

First, we give approximations of $\mathcal{Y}^{1,\varepsilon}$ and $\mathcal{Z}^{1,\varepsilon}$. For $m\in\mathbb{N}$, we approximate $\mathcal{U}^{1}$ and $\partial_{x}\mathcal{U}^{1}\sigma^{\varepsilon}$ with asymptotic expansions up to the $m$-th order and Malliavin calculus, by applying or extending the methods in [25][32][33][34][40]. Let $X^{t,x,\varepsilon}=\{X^{t,x,\varepsilon}_{s}\}_{s\geq t}$ be the solution of

$$X_{s}^{t,x,\varepsilon}=x+\int_{t}^{s}\mu(u,X_{u}^{t,x,\varepsilon})du+\varepsilon\int_{t}^{s}\sigma(u,X_{u}^{t,x,\varepsilon})dW_{u},\quad x\in\mathbb{R}^{d},\ s\geq t.\tag{3.9}$$

Then the $d$-dimensional forward process $X^{t,x,\varepsilon}=(X^{t,x,\varepsilon,1},\cdots,X^{t,x,\varepsilon,d})$ can be expanded as follows: for $i=1,\cdots,d$,

$$X_{s}^{t,x,\varepsilon,i}\sim X^{t,x,0,i}_{s}+\varepsilon X^{t,x,i}_{1,s}+\varepsilon^{2}X^{t,x,i}_{2,s}+\cdots\quad\text{in }\mathbb{D}^{\infty},\tag{3.10}$$

for some $X^{t,x,i}_{k,s}\in\mathbb{D}^{\infty}$, $k\in\mathbb{N}$, which are independent of $\varepsilon$ (see Watanabe (1987) [38] for example). Here, $X^{t,x,0,i}_{s}$ is the solution of $X_{s}^{t,x,0,i}=x^{i}+\int_{t}^{s}\mu^{i}(u,X_{u}^{t,x,0})du$, and $X^{t,x,i}_{k,s}=\frac{1}{k!}\partial^{k}/\partial\varepsilon^{k}X_{s}^{t,x,\varepsilon,i}|_{\varepsilon=0}$, $k\in\mathbb{N}$.
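For instance, the first-order term is obtained by differentiating (3.9) in $\varepsilon$ at $\varepsilon=0$ (a standard computation), which yields the linear SDE

$$dX^{t,x}_{1,s}=\partial_{x}\mu(s,X^{t,x,0}_{s})X^{t,x}_{1,s}\,ds+\sigma(s,X^{t,x,0}_{s})\,dW_{s},\qquad X^{t,x}_{1,t}=0,$$

so that $X^{t,x}_{1,s}=J^{0,x}_{t\to s}\int_{t}^{s}(J^{0,x}_{t\to u})^{-1}\sigma(u,X^{t,x,0}_{u})dW_{u}$ is a centered Gaussian random variable, where $J^{0,x}_{t\to s}$ denotes the Jacobian of the deterministic flow $x\mapsto X^{t,x,0}_{s}$ appearing in Proposition 2 below.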

Let us define $\overline{X}^{t,x,\varepsilon}_{s}=X^{t,x,0}_{s}+\varepsilon X^{t,x}_{1,s}$ for $s\leq T$. The functions $\mathcal{U}^{1}$ and $\partial_{x}\mathcal{U}^{1}\sigma^{\varepsilon}$ are approximated by the asymptotic expansion as follows.

Proposition 1.

Let $T>0$ and $m\in\mathbb{N}$. There is a map $[0,T)\times\mathbb{R}^{d}\times(0,1)\ni(t,x,\varepsilon)\mapsto\mathcal{W}^{t,x,\varepsilon,(m)}_{T}\in\mathbb{D}^{\infty}$ such that there exist $C(T,m)>0$ and $p(m)\geq m+1$ with

$$|\mathcal{U}^{1}(t,x)-\mathcal{U}^{1,(m)}(t,x)|\leq C(T,m)\varepsilon^{m+1}(T-t)^{p(m)/2},\tag{3.11}$$

for all $\varepsilon\in(0,1)$, $t<T$ and $x\in\mathbb{R}^{d}$, where $\mathcal{U}^{1,(m)}$ is given by

$$\mathcal{U}^{1,(m)}(t,x)=E[g(\overline{X}^{t,x,\varepsilon}_{T})\mathcal{W}^{t,x,\varepsilon,(m)}_{T}],\quad t<T,\ x\in\mathbb{R}^{d},\tag{3.12}$$

which satisfies $\mathcal{U}^{1,(m)}(t,\cdot)\in C_{b}^{2}(\mathbb{R}^{d})$, $t<T$. Also, there is a map $[0,T)\times\mathbb{R}^{d}\times(0,1)\ni(t,x,\varepsilon)\mapsto\mathcal{Z}^{t,x,\varepsilon,(m)}_{T}\in\mathbb{D}^{\infty}$ such that there exist $K(T,m)>0$ and $q(m)\geq m$ with

$$|\partial_{x}\mathcal{U}^{1}(t,x)\sigma^{\varepsilon}(t,x)-\mathcal{V}^{1,(m)}(t,x)|\leq K(T,m)\varepsilon^{m+1}(T-t)^{q(m)/2},\tag{3.13}$$

for all $\varepsilon\in(0,1)$, $t<T$ and $x\in\mathbb{R}^{d}$, where $\mathcal{V}^{1,(m)}$ is given by

$$\mathcal{V}^{1,(m)}(t,x)=E[g(\overline{X}^{t,x,\varepsilon}_{T})\mathcal{Z}^{t,x,\varepsilon,(m)}_{T}],\quad t<T,\ x\in\mathbb{R}^{d},\tag{3.14}$$

which satisfies $\mathcal{V}^{1,(m)}(t,\cdot)\in C_{b}^{1}(\mathbb{R}^{d})$, $t<T$.

Proof. See Appendix A.1.

For example, the stochastic weight $\mathcal{W}^{t,x,\varepsilon,(m)}_{T}$ has the following general representation:

$$\mathcal{W}^{t,x,\varepsilon,(m)}_{T}=1+\sum_{j=1}^{m}\varepsilon^{j}\sum_{k=1}^{j}\sum_{\substack{\beta_{1}+\cdots+\beta_{k}=j,\\ \beta_{i}\geq 1}}\ \sum_{\gamma^{(k)}=(\gamma_{1},\cdots,\gamma_{k})\in\{1,\cdots,d\}^{k}}\frac{1}{k!}H_{\gamma^{(k)}}\Big(X^{t,x}_{1,T},\prod_{\ell=1}^{k}X^{t,x,\gamma_{\ell}}_{\beta_{\ell}+1,T}\Big).\tag{3.15}$$

See Section 2.2 in Takahashi and Yamada (2012) [33] and Section 6.1 in Takahashi (2015) [32] for more details. The functions $\mathcal{U}^{1,(m)}$ and $\mathcal{V}^{1,(m)}$ have more explicit representations. Actually, when $m=1$, $\mathcal{U}^{1,(1)}$ and $\mathcal{V}^{1,(1)}$ have the following forms, which are easily computed by taking advantage of the fact that $\overline{X}_{T}^{t,x,\varepsilon}$ (and $X_{1,T}^{t,x}$) is a Gaussian random variable. In particular, the representation of $\mathcal{V}^{1,(1)}$, the multidimensional expansion of $\partial_{x}\mathcal{U}^{1}\sigma^{\varepsilon}$, is new; it is an extension of [25][40].
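For instance, in our reading of the Gaussian integration-by-parts formulas behind these expressions, when $\Sigma=\mathrm{Cov}(X_{1,T}^{t,x})$ is nondegenerate the first two weights reduce to Hermite-type polynomials,

$$H_{(i)}(X_{1,T}^{t,x},1)=\big[\Sigma^{-1}X_{1,T}^{t,x}\big]_{i},\qquad H_{(i,j)}(X_{1,T}^{t,x},1)=\big[\Sigma^{-1}X_{1,T}^{t,x}\big]_{i}\big[\Sigma^{-1}X_{1,T}^{t,x}\big]_{j}-[\Sigma^{-1}]_{ij},$$

so that each expectation in (3.16)-(3.17) below is an integral of $g$ against a Gaussian density multiplied by a polynomial, computable by quadrature or, for specific $g$, in closed form.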

Proposition 2.

For $t<T$, $x\in\mathbb{R}^{d}$,

$$\begin{aligned}
\mathcal{U}^{1,(1)}(t,x)&=E[g(\overline{X}_{T}^{t,x,\varepsilon})]\\
&\quad+\varepsilon\sum_{i_{1},i_{2},i_{3},j_{1}=1}^{d}\sum_{k_{1},k_{2}=1}^{d}E[g(\overline{X}_{T}^{t,x,\varepsilon})H_{(i_{1},i_{2},i_{3})}(X_{1,T}^{t,x},1)]\,C_{i_{1},i_{2},i_{3},j_{1}}^{(1),k_{1},k_{2}}(t,T,x)\\
&\quad+\varepsilon\sum_{i_{1},i_{2},i_{3},j_{1},j_{2}=1}^{d}\sum_{k_{1},k_{2}=1}^{d}E[g(\overline{X}_{T}^{t,x,\varepsilon})H_{(i_{1},i_{2},i_{3})}(X_{1,T}^{t,x},1)]\,C_{i_{1},i_{2},i_{3},j_{1},j_{2}}^{(2),k_{1},k_{2}}(t,T,x)\\
&\quad+\frac{\varepsilon}{2}\sum_{i_{1},j_{1},j_{2}=1}^{d}\sum_{k_{1},k_{2}=1}^{d}E[g(\overline{X}_{T}^{t,x,\varepsilon})H_{(i_{1})}(X_{1,T}^{t,x},1)]\,\mathbb{1}_{k_{1}=k_{2}}\,C_{i_{1},j_{1},j_{2}}^{(3),k_{1},k_{2}}(t,T,x),
\end{aligned}\tag{3.16}$$

$$\begin{aligned}
\mathcal{V}^{1,(1)}(t,x)&=\sum_{i_{1}=1}^{d}E[g(\overline{X}_{T}^{t,x,\varepsilon})H_{(i_{1})}(X_{1,T}^{t,x},1)]\,[J_{t\to T}^{0,x}]^{i_{1}}\sigma(t,x)\\
&\quad+\varepsilon\sum_{i_{1},i_{2},i_{3},i_{4},j_{1}=1}^{d}\sum_{k_{1},k_{2}=1}^{d}E[g(\overline{X}_{T}^{t,x,\varepsilon})H_{(i_{1},i_{2},i_{3},i_{4})}(X_{1,T}^{t,x},1)]\,[J_{t\to T}^{0,x}]^{i_{1}}C_{i_{2},i_{3},i_{4},j_{1}}^{(1),k_{1},k_{2}}(t,T,x)\sigma(t,x)\\
&\quad+\varepsilon\sum_{i_{1},i_{2},i_{3},i_{4},j_{1},j_{2}=1}^{d}\sum_{k_{1},k_{2}=1}^{d}E[g(\overline{X}_{T}^{t,x,\varepsilon})H_{(i_{1},i_{2},i_{3},i_{4})}(X_{1,T}^{t,x},1)]\,[J_{t\to T}^{0,x}]^{i_{1}}C_{i_{2},i_{3},i_{4},j_{1},j_{2}}^{(2),k_{1},k_{2}}(t,T,x)\sigma(t,x)\\
&\quad+\frac{\varepsilon}{2}\sum_{i_{1},i_{2},j_{1},j_{2}=1}^{d}\sum_{k_{1},k_{2}=1}^{d}E[g(\overline{X}_{T}^{t,x,\varepsilon})H_{(i_{1},i_{2})}(X_{1,T}^{t,x},1)]\,[J_{t\to T}^{0,x}]^{i_{1}}\mathbb{1}_{k_{1}=k_{2}}\,C_{i_{2},j_{1},j_{2}}^{(3),k_{1},k_{2}}(t,T,x)\sigma(t,x)\\
&\quad+\varepsilon\sum_{i_{1},i_{2},j_{1},j_{2}=1}^{d}\sum_{k_{1}=1}^{d}E[g(\overline{X}_{T}^{t,x,\varepsilon})H_{(i_{1},i_{2})}(X_{1,T}^{t,x},1)]\,[J_{t\to T}^{0,x}]_{j_{1}}^{i_{1}}C_{i_{2},j_{1},j_{2}}^{(4),k_{1}}(t,T,x)\sigma(t,x)\\
&\quad+\varepsilon\sum_{i_{1},i_{2},j_{1}=1}^{d}\sum_{k_{1}=1}^{d}E[g(\overline{X}_{T}^{t,x,\varepsilon})H_{(i_{1},i_{2})}(X_{1,T}^{t,x},1)]\,[J_{t\to T}^{0,x}]_{j_{1}}^{i_{1}}C_{i_{2},j_{1}}^{(5),k_{1}}(t,T,x)\sigma(t,x),
\end{aligned}\tag{3.17}$$

and

$$\begin{aligned}
C_{i_{1},i_{2},i_{3},j_{1}}^{(1),k_{1},k_{2}}(t,T,x)&=\int_{t}^{T}\int_{t}^{t_{1}}a^{i_{3}}_{k_{2}}(t,t_{2},t_{1},x)a^{i_{2}}_{k_{1}}(t,t_{1},T,x)b^{i_{1},j_{1}}_{k_{1}}(t,t_{1},T,x)a^{j_{1}}_{k_{2}}(t,t_{2},t_{1},x)dt_{2}dt_{1},\\
C_{i_{1},i_{2},i_{3},j_{1},j_{2}}^{(2),k_{1},k_{2}}(t,T,x)&=\int_{t}^{T}\int_{t}^{t_{1}}\int_{t}^{t_{2}}a^{i_{3}}_{k_{1}}(t,t_{3},t_{2},x)a^{i_{2}}_{k_{2}}(t,t_{2},t_{1},x)c^{i_{1},j_{1},j_{2}}(t,t_{1},T,x)a^{j_{1}}_{k_{2}}(t,t_{2},t_{1},x)a^{j_{2}}_{k_{1}}(t,t_{3},t_{1},x)dt_{3}dt_{2}dt_{1},\\
C_{i_{1},j_{1},j_{2}}^{(3),k_{1},k_{2}}(t,T,x)&=\int_{t}^{T}\int_{t}^{t_{1}}c^{i_{1},j_{1},j_{2}}(t,t_{1},T,x)a^{j_{2}}_{k_{2}}(t,t_{2},t_{1},x)a^{j_{1}}_{k_{1}}(t,t_{2},t_{1},x)dt_{2}dt_{1},\\
C_{i_{1},j_{1},j_{2}}^{(4),k_{1}}(t,T,x)&=\int_{t}^{T}\int_{t}^{t_{1}}a^{i_{1}}_{k_{1}}(t,t_{2},T,x)[\partial^{2}\mu(t_{1},X_{t_{1}}^{t,x,0})]^{j_{1}}_{j_{2}}a^{j_{2}}_{k_{1}}(t,t_{2},t_{1},x)dt_{2}dt_{1},\\
C_{i_{1},j_{1}}^{(5),k_{1}}(t,T,x)&=\int_{t}^{T}a_{k_{1}}^{i_{1}}(t,t_{1},T,x)\partial_{j_{1}}\sigma_{k_{1}}(t_{1},X_{t_{1}}^{t,x,0})dt_{1},
\end{aligned}$$

with

$$\begin{aligned}
a^{i}_{k}(t,s,u,x)&:=\sum_{j_{1},j_{2}=1}^{d}[J_{t\to u}^{0,x}]^{i}_{j_{1}}[(J_{t\to s}^{0,x})^{-1}]^{j_{1}}_{j_{2}}\sigma_{k}^{j_{2}}(s,X_{s}^{t,x,0}),\\
b^{i,j_{3}}_{k}(t,s,u,x)&:=\sum_{j_{1},j_{2}=1}^{d}[J_{t\to u}^{0,x}]^{i}_{j_{1}}[(J_{t\to s}^{0,x})^{-1}]^{j_{1}}_{j_{2}}\partial_{j_{3}}\sigma_{k}^{j_{2}}(s,X_{s}^{t,x,0}),\\
c^{i,j_{3},j_{4}}(t,s,u,x)&:=\sum_{j_{1},j_{2}=1}^{d}[J_{t\to u}^{0,x}]^{i}_{j_{1}}[(J_{t\to s}^{0,x})^{-1}]^{j_{1}}_{j_{2}}[\partial^{2}\mu^{j_{2}}(s,X_{s}^{t,x,0})]^{j_{3}}_{j_{4}}.
\end{aligned}$$

Here, $[\,\cdot\,]^{i}_{j}$ denotes the entry in the $i$-th row and $j$-th column of a matrix, $\partial_{j}\varphi(\cdot)$ is the $j$-th element of $\partial\varphi(\cdot)=[\partial\varphi/\partial x_{i}(\cdot)]_{1\leq i\leq d}$, and $[\partial^{2}\varphi(\cdot)]^{i}_{j}=\partial^{2}\varphi(\cdot)/\partial x_{i}\partial x_{j}$ ($1\leq i,j\leq d$).

Proof. See Appendix A.2.

Using $\mathcal{U}^{1,(m)}$ and $\mathcal{V}^{1,(m)}$, we define

$$\overline{\mathcal{Y}}_{t}^{1,\varepsilon,(m)}=\mathcal{U}^{1,(m)}(t,X_{t}^{\varepsilon}),\quad\overline{\mathcal{Z}}_{t}^{1,\varepsilon,(m)}=\mathcal{V}^{1,(m)}(t,X_{t}^{\varepsilon}),\quad t\geq 0.\tag{3.18}$$

Furthermore, we compute $\mathcal{Y}^{2,\varepsilon}$ and $\mathcal{Z}^{2,\varepsilon}$ numerically by the Deep BSDE method, solving

$$\inf_{\mathcal{Y}^{2,\varepsilon,(m,n)}_{0},\,\mathcal{Z}^{2,\varepsilon,(m,n)}}\Big\|\mathcal{Y}_{T}^{2,\varepsilon,(m,n)}\Big\|_{2}^{2}\tag{3.19}$$

subject to

$$\mathcal{Y}_{t}^{2,\varepsilon,(m,n)}=\mathcal{Y}^{2,\varepsilon,(m,n)}_{0}-\int_{0}^{t}f(s,\bar{X}_{s}^{\varepsilon,(n)},\overline{\mathcal{Y}}_{s}^{1,\varepsilon,(m,n)}+\alpha\mathcal{Y}_{s}^{2,\varepsilon,(m,n)},\overline{\mathcal{Z}}_{s}^{1,\varepsilon,(m,n)}+\alpha\mathcal{Z}_{s}^{2,\varepsilon,(m,n)})ds+\int_{0}^{t}\mathcal{Z}_{s}^{2,\varepsilon,(m,n)}dW_{s},\tag{3.20}$$

where

$$\overline{\mathcal{Y}}_{t}^{1,\varepsilon,(m,n)}=\mathcal{U}^{1,(m)}(t,\bar{X}^{\varepsilon,(n)}_{t}),\quad\overline{\mathcal{Z}}_{t}^{1,\varepsilon,(m,n)}=\mathcal{V}^{1,(m)}(t,\bar{X}^{\varepsilon,(n)}_{t}),\quad t\in[0,T],\tag{3.21}$$

with the continuous Euler-Maruyama scheme $\bar{X}^{\varepsilon,(n)}=\{\bar{X}^{\varepsilon,(n)}_{t}\}_{t\geq 0}(=\bar{X}^{(n)})$.
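Numerically, the scheme (3.19)-(3.21) reuses the residual solver sketched in Section 3.1 verbatim, with the closed forms replaced by the order-$m$ expansion approximations (hypothetical callables `U1m` and `V1m` implementing $\mathcal{U}^{1,(m)}$ and $\mathcal{V}^{1,(m)}$ of Propositions 1 and 2):

```python
# inside the time loop of residual_bsde_loss:
y1, z1 = U1m(t, x), V1m(t, x)   # control variates (3.21) from the expansion
```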

We now state the main theoretical result of this paper.

Theorem 3.

There exists $C>0$ such that

$$E[|Y_{0}^{\varepsilon,\alpha}-\{\overline{\mathcal{Y}}_{0}^{1,\varepsilon,(m)}+\alpha\mathcal{Y}_{0}^{2,\varepsilon,(m,n)}\}|^{2}]\leq C\varepsilon^{2(m+1)}+\alpha^{2}C\Big\{\varepsilon^{2(m+1)}+\frac{1}{n}+\big\|\mathcal{Y}^{2,\varepsilon,(m,n)}_{T}\big\|_{2}^{2}\Big\},\tag{3.22}$$

for all $\varepsilon,\alpha\in(0,1)$ and $n\geq 1$.

Proof. In the proof, we use a generic constant $C>0$ which varies from line to line. Let $(\mathcal{Y}^{2,\varepsilon,(m)},\mathcal{Z}^{2,\varepsilon,(m)})$ be the solution of the following BSDE:

$$\mathcal{Y}_{t}^{2,\varepsilon,(m)}=\int_{t}^{T}f(s,X_{s}^{\varepsilon},\overline{\mathcal{Y}}_{s}^{1,\varepsilon,(m)}+\alpha\mathcal{Y}_{s}^{2,\varepsilon,(m)},\overline{\mathcal{Z}}_{s}^{1,\varepsilon,(m)}+\alpha\mathcal{Z}_{s}^{2,\varepsilon,(m)})ds-\int_{t}^{T}\mathcal{Z}_{s}^{2,\varepsilon,(m)}dW_{s}.\tag{3.23}$$

Then we have

$$\begin{aligned}
&E[|Y_{0}^{\varepsilon,\alpha}-\{\overline{\mathcal{Y}}_{0}^{1,\varepsilon,(m)}+\alpha\mathcal{Y}_{0}^{2,\varepsilon,(m,n)}\}|^{2}]\\
&=E[|Y_{0}^{\varepsilon,\alpha}-\{\overline{\mathcal{Y}}_{0}^{1,\varepsilon,(m)}+\alpha\mathcal{Y}_{0}^{2,\varepsilon,(m)}\}+\alpha\mathcal{Y}_{0}^{2,\varepsilon,(m)}-\alpha\mathcal{Y}_{0}^{2,\varepsilon,(m,n)}|^{2}]\\
&\leq CE[|Y_{0}^{\varepsilon,\alpha}-\{\overline{\mathcal{Y}}_{0}^{1,\varepsilon,(m)}+\alpha\mathcal{Y}_{0}^{2,\varepsilon,(m)}\}|^{2}]+\alpha^{2}CE[|\mathcal{Y}_{0}^{2,\varepsilon,(m)}-\mathcal{Y}_{0}^{2,\varepsilon,(m,n)}|^{2}].
\end{aligned}\tag{3.24}$$

First, we estimate the term $E[|Y_{0}^{\varepsilon,\alpha}-\{\overline{\mathcal{Y}}_{0}^{1,\varepsilon,(m)}+\alpha\mathcal{Y}_{0}^{2,\varepsilon,(m)}\}|^{2}]$. We note that this term vanishes in (3.8), i.e. in the error estimate of Theorem 2 for the case where $(\mathcal{Y}^{1,\varepsilon},\mathcal{Z}^{1,\varepsilon})$ is explicitly solvable in closed form.

Since we have

$$Y_{0}^{\varepsilon,\alpha}=E_{X_{0}}[g(X_{T}^{\varepsilon})]+\alpha E_{X_{0}}\Big[\int_{0}^{T}f(s,X_{s}^{\varepsilon},Y_{s}^{\varepsilon,\alpha},Z_{s}^{\varepsilon,\alpha})ds\Big]\tag{3.25}$$

and

$$\overline{\mathcal{Y}}_{0}^{1,\varepsilon,(m)}=E_{X_{0}}[g(\overline{X}^{0,\cdot,\varepsilon}_{T})\mathcal{W}^{0,\cdot,\varepsilon,(m)}_{T}],\tag{3.26}$$
$$\mathcal{Y}^{2,\varepsilon,(m)}_{0}=E_{X_{0}}\Big[\int_{0}^{T}f(s,X_{s}^{\varepsilon},\overline{\mathcal{Y}}_{s}^{1,\varepsilon,(m)}+\alpha\mathcal{Y}_{s}^{2,\varepsilon,(m)},\overline{\mathcal{Z}}_{s}^{1,\varepsilon,(m)}+\alpha\mathcal{Z}_{s}^{2,\varepsilon,(m)})ds\Big],\tag{3.27}$$

it holds that

$$\begin{aligned}
&E[|Y_{0}^{\varepsilon,\alpha}-\{\overline{\mathcal{Y}}_{0}^{1,\varepsilon,(m)}+\alpha\mathcal{Y}_{0}^{2,\varepsilon,(m)}\}|^{2}]\\
&\leq CE[|E_{X_{0}}[g(X_{T}^{\varepsilon})]-E_{X_{0}}[g(\overline{X}^{0,\cdot,\varepsilon}_{T})\mathcal{W}^{0,\cdot,\varepsilon,(m)}_{T}]|^{2}]\\
&\quad+CE\Big[\Big|E_{X_{0}}\Big[\int_{0}^{T}\alpha f(s,X_{s}^{\varepsilon},Y_{s}^{\varepsilon,\alpha},Z_{s}^{\varepsilon,\alpha})ds\Big]-E_{X_{0}}\Big[\int_{0}^{T}\alpha f(s,X_{s}^{\varepsilon},\overline{\mathcal{Y}}_{s}^{1,\varepsilon,(m)}+\alpha\mathcal{Y}_{s}^{2,\varepsilon,(m)},\overline{\mathcal{Z}}_{s}^{1,\varepsilon,(m)}+\alpha\mathcal{Z}_{s}^{2,\varepsilon,(m)})ds\Big]\Big|^{2}\Big]\\
&\leq C\varepsilon^{2(m+1)}+C\alpha^{2}C_{\mathrm{Lip}}[f]^{2}\int_{0}^{T}E[|\mathcal{Y}_{s}^{1,\varepsilon}-\overline{\mathcal{Y}}_{s}^{1,\varepsilon,(m)}|^{2}]ds+C\alpha^{2}C_{\mathrm{Lip}}[f]^{2}\int_{0}^{T}E[|\mathcal{Z}_{s}^{1,\varepsilon}-\overline{\mathcal{Z}}_{s}^{1,\varepsilon,(m)}|^{2}]ds\\
&\quad+C\alpha^{2}C_{\mathrm{Lip}}[f]^{2}\int_{0}^{T}E[|\alpha\mathcal{Y}_{s}^{2,\varepsilon}-\alpha\mathcal{Y}_{s}^{2,\varepsilon,(m)}|^{2}]ds+C\alpha^{2}C_{\mathrm{Lip}}[f]^{2}\int_{0}^{T}E[|\alpha\mathcal{Z}_{s}^{2,\varepsilon}-\alpha\mathcal{Z}_{s}^{2,\varepsilon,(m)}|^{2}]ds.
\end{aligned}\tag{3.28}$$

Here, the estimates

$$\int_{0}^{T}E[|\mathcal{Y}_{s}^{1,\varepsilon}-\overline{\mathcal{Y}}_{s}^{1,\varepsilon,(m)}|^{2}]ds\leq C\varepsilon^{2(m+1)},\tag{3.29}$$
$$\int_{0}^{T}E[|\mathcal{Z}_{s}^{1,\varepsilon}-\overline{\mathcal{Z}}_{s}^{1,\varepsilon,(m)}|^{2}]ds\leq C\varepsilon^{2(m+1)},\tag{3.30}$$

are obtained by (3.11) and (3.13). Also, by Theorem 4.2.3 in Zhang (2017) [42], we have

$$\begin{aligned}
&\int_{0}^{T}E[|\alpha\mathcal{Y}_{s}^{2,\varepsilon}-\alpha\mathcal{Y}_{s}^{2,\varepsilon,(m)}|^{2}]ds+\int_{0}^{T}E[|\alpha\mathcal{Z}_{s}^{2,\varepsilon}-\alpha\mathcal{Z}_{s}^{2,\varepsilon,(m)}|^{2}]ds\\
&\leq CE\Big[\int_{0}^{T}|\alpha f(s,X_{s}^{\varepsilon},\mathcal{Y}_{s}^{1,\varepsilon}+\alpha\mathcal{Y}_{s}^{2,\varepsilon},\mathcal{Z}_{s}^{1,\varepsilon}+\alpha\mathcal{Z}_{s}^{2,\varepsilon})-\alpha f(s,X_{s}^{\varepsilon},\overline{\mathcal{Y}}_{s}^{1,\varepsilon,(m)}+\alpha\mathcal{Y}_{s}^{2,\varepsilon},\overline{\mathcal{Z}}_{s}^{1,\varepsilon,(m)}+\alpha\mathcal{Z}_{s}^{2,\varepsilon})|^{2}ds\Big]\\
&\leq C\alpha^{2}C_{\mathrm{Lip}}[f]^{2}\Big\{\int_{0}^{T}E[|\mathcal{Y}_{s}^{1,\varepsilon}-\overline{\mathcal{Y}}_{s}^{1,\varepsilon,(m)}|^{2}]ds+\int_{0}^{T}E[|\mathcal{Z}_{s}^{1,\varepsilon}-\overline{\mathcal{Z}}_{s}^{1,\varepsilon,(m)}|^{2}]ds\Big\}\\
&\leq C\alpha^{2}\varepsilon^{2(m+1)},
\end{aligned}\tag{3.31}$$

where the estimates (3.29) and (3.30) are applied in the last inequality. Therefore, we get

$$E[|Y_{0}^{\varepsilon,\alpha}-\{\overline{\mathcal{Y}}_{0}^{1,\varepsilon,(m)}+\alpha\mathcal{Y}_{0}^{2,\varepsilon,(m)}\}|^{2}]\leq C\varepsilon^{2(m+1)}+C\alpha^{2}\varepsilon^{2(m+1)}.\tag{3.32}$$

Next, we estimate

$$E[|\mathcal{Y}_{0}^{2,\varepsilon,(m)}-\mathcal{Y}_{0}^{2,\varepsilon,(m,n)}|^{2}]\tag{3.33}$$

in (3.24). We note that only this term appears in (3.8), i.e. in the error estimate of Theorem 2 for the case where $(\mathcal{Y}^{1,\varepsilon},\mathcal{Z}^{1,\varepsilon})$ is explicitly solvable in closed form.

Since we have

$$\begin{aligned}
&\mathcal{Y}_{0}^{2,\varepsilon,(m)}-\mathcal{Y}^{2,\varepsilon,(m,n)}_{0}\\
&=\int_{0}^{T}f(s,X_{s}^{\varepsilon},\overline{\mathcal{Y}}_{s}^{1,\varepsilon,(m)}+\alpha\mathcal{Y}_{s}^{2,\varepsilon,(m)},\overline{\mathcal{Z}}_{s}^{1,\varepsilon,(m)}+\alpha\mathcal{Z}_{s}^{2,\varepsilon,(m)})ds-\int_{0}^{T}\mathcal{Z}_{s}^{2,\varepsilon,(m)}dW_{s}\\
&\quad-\mathcal{Y}_{T}^{2,\varepsilon,(m,n)}-\int_{0}^{T}f(s,\bar{X}_{s}^{\varepsilon,(n)},\overline{\mathcal{Y}}_{s}^{1,\varepsilon,(m,n)}+\alpha\mathcal{Y}_{s}^{2,\varepsilon,(m,n)},\overline{\mathcal{Z}}_{s}^{1,\varepsilon,(m,n)}+\alpha\mathcal{Z}_{s}^{2,\varepsilon,(m,n)})ds\\
&\quad+\int_{0}^{T}\mathcal{Z}_{s}^{2,\varepsilon,(m,n)}dW_{s},
\end{aligned}\tag{3.34}$$

the upper bound of $E[|\mathcal{Y}_{0}^{2,\varepsilon,(m)}-\mathcal{Y}_{0}^{2,\varepsilon,(m,n)}|^{2}]$ can be decomposed as

$$\begin{aligned}
&E[|\mathcal{Y}_{0}^{2,\varepsilon,(m)}-\mathcal{Y}_{0}^{2,\varepsilon,(m,n)}|^{2}]\\
&\leq\|\mathcal{Y}^{2,\varepsilon,(m,n)}_{T}\|_{2}^{2}+\int_{0}^{T}E[|\mathcal{Z}_{s}^{2,\varepsilon,(m)}-\mathcal{Z}_{s}^{2,\varepsilon,(m,n)}|^{2}]ds\\
&\quad+C_{\mathrm{Lip}}[f]^{2}\times E\Big[\int_{0}^{T}|X_{s}^{\varepsilon}-\bar{X}_{s}^{\varepsilon,(n)}|^{2}ds+\int_{0}^{T}|\overline{\mathcal{Y}}_{s}^{1,\varepsilon,(m)}-\overline{\mathcal{Y}}_{s}^{1,\varepsilon,(m,n)}|^{2}ds+\int_{0}^{T}|\alpha\mathcal{Y}_{s}^{2,\varepsilon,(m)}-\alpha\mathcal{Y}_{s}^{2,\varepsilon,(m,n)}|^{2}ds\\
&\qquad+\int_{0}^{T}|\overline{\mathcal{Z}}_{s}^{1,\varepsilon,(m)}-\overline{\mathcal{Z}}_{s}^{1,\varepsilon,(m,n)}|^{2}ds+\int_{0}^{T}|\alpha\mathcal{Z}_{s}^{2,\varepsilon,(m)}-\alpha\mathcal{Z}_{s}^{2,\varepsilon,(m,n)}|^{2}ds\Big].
\end{aligned}\tag{3.35}$$

Then, the following holds:

$$\int_{0}^{T}E[|\overline{\mathcal{Y}}_{s}^{1,\varepsilon,(m)}-\overline{\mathcal{Y}}_{s}^{1,\varepsilon,(m,n)}|^{2}]ds\leq CE\Big[\int_{0}^{T}|X_{s}^{\varepsilon}-\bar{X}_{s}^{\varepsilon,(n)}|^{2}ds\Big],\tag{3.36}$$
$$\int_{0}^{T}E[|\overline{\mathcal{Z}}_{s}^{1,\varepsilon,(m)}-\overline{\mathcal{Z}}_{s}^{1,\varepsilon,(m,n)}|^{2}]ds\leq CE\Big[\int_{0}^{T}|X_{s}^{\varepsilon}-\bar{X}_{s}^{\varepsilon,(n)}|^{2}ds\Big],\tag{3.37}$$

since for all $t<T$, $\mathcal{U}^{1,(m)}(t,\cdot)$ and $\mathcal{V}^{1,(m)}(t,\cdot)$ are in $C^{2}_{b}$ and $C^{1}_{b}$, respectively. Thus, we have

$$\begin{aligned}
E[|\mathcal{Y}_{0}^{2,\varepsilon,(m)}-\mathcal{Y}_{0}^{2,\varepsilon,(m,n)}|^{2}]&\leq\|\mathcal{Y}^{2,\varepsilon,(m,n)}_{T}\|_{2}^{2}+C\times\Big\{\sup_{t\in[0,T]}\big(E[|X_{t}^{\varepsilon}-\bar{X}^{\varepsilon,(n)}_{t}|^{2}]+E[|\mathcal{Y}_{t}^{2,\varepsilon,(m)}-\mathcal{Y}_{t}^{2,\varepsilon,(m,n)}|^{2}]\big)\\
&\qquad+\int_{0}^{T}E[|\mathcal{Z}_{s}^{2,\varepsilon,(m)}-\mathcal{Z}_{s}^{2,\varepsilon,(m,n)}|^{2}]ds\Big\}.
\end{aligned}\tag{3.38}$$

By Theorem 1 of Han and Long (2020) [14], it holds that

$$\sup_{t\in[0,T]}\big(E[|X_{t}^{\varepsilon}-\bar{X}^{\varepsilon,(n)}_{t}|^{2}]+E[|\mathcal{Y}_{t}^{2,\varepsilon,(m)}-\mathcal{Y}_{t}^{2,\varepsilon,(m,n)}|^{2}]\big)+\int_{0}^{T}E[|\mathcal{Z}_{s}^{2,\varepsilon,(m)}-\mathcal{Z}_{s}^{2,\varepsilon,(m,n)}|^{2}]ds\leq C\Big\{\frac{T}{n}+\big\|\mathcal{Y}^{2,\varepsilon,(m,n)}_{T}\big\|_{2}^{2}\Big\}.\tag{3.39}$$

Therefore, we get

$$E[|\mathcal{Y}_{0}^{2,\varepsilon,(m)}-\mathcal{Y}_{0}^{2,\varepsilon,(m,n)}|^{2}]\leq\frac{C}{n}+C\big\|\mathcal{Y}^{2,\varepsilon,(m,n)}_{T}\big\|_{2}^{2},\tag{3.40}$$

and the assertion is obtained as:

$$\begin{aligned}
E[|Y_{0}^{\varepsilon,\alpha}-\{\overline{\mathcal{Y}}_{0}^{1,\varepsilon,(m)}+\alpha\mathcal{Y}_{0}^{2,\varepsilon,(m,n)}\}|^{2}]&\leq CE[|Y_{0}^{\varepsilon,\alpha}-\{\overline{\mathcal{Y}}_{0}^{1,\varepsilon,(m)}+\alpha\mathcal{Y}_{0}^{2,\varepsilon,(m)}\}|^{2}]+\alpha^{2}CE[|\mathcal{Y}_{0}^{2,\varepsilon,(m)}-\mathcal{Y}_{0}^{2,\varepsilon,(m,n)}|^{2}]\\
&\leq C\varepsilon^{2(m+1)}+C\alpha^{2}\varepsilon^{2(m+1)}+\alpha^{2}C\Big\{\frac{1}{n}+\big\|\mathcal{Y}^{2,\varepsilon,(m,n)}_{T}\big\|_{2}^{2}\Big\}.\qquad\Box
\end{aligned}\tag{3.41}$$

By the theorem above, it holds that

$$Y_{0}^{\varepsilon,\alpha}\approx\overline{\mathcal{Y}}_{0}^{1,\varepsilon,(m)}+\alpha\mathcal{Y}_{0}^{2,\varepsilon,(m,n)\ast},\tag{3.42}$$

where $\mathcal{Y}_{0}^{2,\varepsilon,(m,n)\ast}$ is obtained by solving (3.19) with the Deep BSDE method. The processes $\overline{\mathcal{Y}}^{1,\varepsilon,(m,n)}$ and $\overline{\mathcal{Z}}^{1,\varepsilon,(m,n)}$ work as control variates for the nonlinear BSDE.

Here, let us briefly compare the theoretical error estimates of our proposed method, namely (3.8) in Theorem 2 for the explicitly solvable $(\mathcal{Y}^{1,\varepsilon},\mathcal{Z}^{1,\varepsilon})$ case and (3.22) in Theorem 3 for the unsolvable case, with the one provided by Han and Long (2020) for the method of Weinan E et al. (2017), i.e. (2.10) in Theorem 1. Given the number $n$ of discretized time steps for the Euler-Maruyama scheme, these estimates are listed below:

  • Proposed method (for the solvable $(\mathcal{Y}^{1,\varepsilon},\mathcal{Z}^{1,\varepsilon})$ case):

    $$E[|Y_{0}^{\varepsilon,\alpha}-\{\mathcal{Y}_{0}^{1,\varepsilon}+\alpha\widetilde{\mathcal{Y}}_{0}^{2,\varepsilon,(n)}\}|^{2}]\leq C\alpha^{2}\frac{1}{n}+C\alpha^{2}\big\|\widetilde{\mathcal{Y}}^{2,\varepsilon,(n)}_{T}\big\|_{2}^{2}\quad(3.8)$$

  • Proposed method (for the unsolvable $(\mathcal{Y}^{1,\varepsilon},\mathcal{Z}^{1,\varepsilon})$ case):

    $$E[|Y_{0}^{\varepsilon,\alpha}-\{\overline{\mathcal{Y}}_{0}^{1,\varepsilon,(m)}+\alpha\mathcal{Y}_{0}^{2,\varepsilon,(m,n)}\}|^{2}]\leq(C\varepsilon^{2(m+1)}+C\alpha^{2}\varepsilon^{2(m+1)})+C\alpha^{2}\frac{1}{n}+C\alpha^{2}\big\|\mathcal{Y}^{2,\varepsilon,(m,n)}_{T}\big\|_{2}^{2}\quad(3.22)$$

  • Method of Weinan E et al. (2017) (error estimate by Han and Long (2020)):

    $$E[|Y_{0}^{\varepsilon,\alpha}-Y_{0}^{\varepsilon,\alpha,(n)}|^{2}]\leq C\frac{1}{n}+C\big\|g(\bar{X}_{T}^{\varepsilon,(n)})-Y_{T}^{\varepsilon,\alpha,(n)}\big\|_{2}^{2}.\quad(2.10)$$

Thanks to the following advantages, our proposed method works better as a new Deep BSDE solver; more precisely, its errors are expected to be smaller:

  • (i) Decomposition into a “dominant” linear PDE with the original terminal condition $g$ and a “small” nonlinear PDE with zero terminal condition, i.e.

    $$u(0,x)=\underbrace{\mathcal{U}^{1}(0,x)}_{\text{“dominant” linear PDE part}}+\underbrace{\mathcal{U}^{2}(0,x)}_{\text{“small” nonlinear PDE part}},\quad x\in\mathbb{R}^{d},$$

    and an application of the Deep BSDE solver only to the “small” nonlinear PDE.

    (ii) Closed-form solutions/approximations for the linear PDE, which also work as control variates for the driver of the nonlinear PDE.

    Thanks to (i) and (ii), we obtain the term $C\alpha^{2}\|\widetilde{\mathcal{Y}}^{2,\varepsilon,(n)}_{T}\|_{2}^{2}$ in the error bound, rather than $C\|g(\bar{X}_{T}^{\varepsilon,(n)})-Y_{T}^{\varepsilon,\alpha,(n)}\|_{2}^{2}$.

    Moreover, we note that our method enjoys the effect of the small parameter $\alpha\in(0,1)$ in the nonlinear driver for this term, as well as for the discretization error term caused by the Euler-Maruyama scheme, which is given as $C\alpha^{2}\frac{1}{n}$ rather than $C\frac{1}{n}$.

  • Regarding the unsolvable $(\mathcal{Y}^{1,\varepsilon},\mathcal{Z}^{1,\varepsilon})$ case, our asymptotic expansions with respect to the small parameter $\varepsilon\in(0,1)$ in the diffusion coefficient enable us to obtain the closed-form approximations $\overline{\mathcal{Y}}_{t}^{1,\varepsilon,(m)}=\mathcal{U}^{1,(m)}(t,X_{t}^{\varepsilon})$ and $\overline{\mathcal{Z}}_{t}^{1,\varepsilon,(m)}=\mathcal{V}^{1,(m)}(t,X_{t}^{\varepsilon})$. In particular, in (3.22) the terms $C\varepsilon^{2(m+1)}$ and $C\alpha^{2}\varepsilon^{2(m+1)}$ are associated with the errors of the approximations for the terminal condition $g$ and the driver $\alpha f$, respectively.

We will check the effectiveness of the new method by numerical experiments in the next section.

Remark 1.

We give an important remark on the new method. While the proposed scheme already provides a fine result, we can further improve it by replacing our approximation for the linear part $\overline{\mathcal{Y}}_{0}^{1,\varepsilon,(m)}$ in the decomposition (3.42) with the methods of [36][35][39][26][29][20].

For example, based on Takahashi and Yamada (2016) [35], the following result gives an improvement of the proposed scheme. Let $t_{i}=T(1-(1-i/n_{0})^{\gamma})$, $i=0,1,\cdots,n_{0}$, with a parameter $\gamma>0$, and $\bar{X}_{t_{i}}^{0,x,\varepsilon,(n_{0})}=\overline{X}_{t_{i}}^{t_{i-1},\bar{X}_{t_{i-1}}^{0,x,\varepsilon,(n_{0})},\varepsilon}$, $i=1,\cdots,n_{0}$. Define

𝒴^01,ε,(m,n0)=E[g(X¯T0,x,ε,(n0))i=1n0𝒲titi1,X¯ti10,x,ε,(n0),ε]|x=X0,\displaystyle\displaystyle\widehat{{\cal Y}}_{0}^{1,\varepsilon,(m,n_{0})}=E[g(\bar{X}_{T}^{0,x,\varepsilon,(n_{0})})\prod_{i=1}^{n_{0}}{\cal W}_{t_{i}}^{t_{i-1},\bar{X}_{t_{i-1}}^{0,x,\varepsilon,(n_{0})},\varepsilon}]|_{x=X_{0}}, (3.43)

and consider the quantity

𝒴^01,ε,(m,n0)+α𝒴02,ε,(m,n),\displaystyle\displaystyle\widehat{{\cal Y}}_{0}^{1,\varepsilon,(m,n_{0})}+\alpha{\cal Y}_{0}^{2,\varepsilon,(m,n)\ast}, (3.44)

where 𝒴02,ε,(m,n)\displaystyle{\cal Y}_{0}^{2,\varepsilon,(m,n)\ast} is the same as in (3.42). Then, (3.44) gives an improved approximation, as

Y0ε,α\displaystyle\displaystyle Y_{0}^{\varepsilon,\alpha} \displaystyle\displaystyle\approx 𝒴^01,ε,(m,n0)+α𝒴02,ε,(m,n),\displaystyle\displaystyle\widehat{{\cal Y}}_{0}^{1,\varepsilon,(m,n_{0})}+\alpha{\cal Y}_{0}^{2,\varepsilon,(m,n)\ast}, (3.45)

in the following sense.

Corollary 1.

There exist C>0\displaystyle C>0 and r(m)>0\displaystyle r(m)>0 such that

E[|Y0ε,α{𝒴^01,ε,(m,n0)+α𝒴02,ε,(m,n)}|2]Cε2(m+1)n02r(m)+α2C{ε2(m+1)+1n+𝒴T2,ε,(m,n)22},E[|Y_{0}^{\varepsilon,\alpha}-\{\widehat{{\cal Y}}_{0}^{1,\varepsilon,(m,n_{0})}+\alpha{\cal Y}_{0}^{2,\varepsilon,(m,n)\ast}\}|^{2}]\leq C\frac{\varepsilon^{2(m+1)}}{n_{0}^{2r(m)}}+\alpha^{2}C\Big{\{}\varepsilon^{2(m+1)}+\frac{1}{n}+\norm{\mathcal{Y}^{2,\varepsilon,(m,n)}_{T}}_{2}^{2}\Big{\}}, (3.46)

for all ε,α(0,1)\displaystyle\varepsilon,\alpha\in(0,1) and n0,n1\displaystyle n_{0},n\geq 1.
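As a small illustration of the discretization grid in Remark 1, the following sketch builds the partition tᵢ = T(1−(1−i/n₀)^γ); for γ > 1 the time steps shrink toward the terminal time T. The parameter values below are ours, chosen only for illustration.

    import numpy as np

    def time_grid(T: float, n0: int, gamma: float) -> np.ndarray:
        """Non-uniform partition t_i = T*(1 - (1 - i/n0)**gamma), i = 0..n0."""
        i = np.arange(n0 + 1)
        return T * (1.0 - (1.0 - i / n0) ** gamma)

    print(time_grid(T=0.5, n0=8, gamma=2.0))  # steps cluster near t = T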

Remark 2.

If the driver of a BSDE contains a linear part, we can transform the BSDE into the one considered in the current paper, namely the equation (2.2). For instance, let us solve the FSDE (2.1) together with the following BSDE:

dYtε,α\displaystyle\displaystyle-dY_{t}^{\varepsilon,\alpha} =\displaystyle\displaystyle= [A(t,Xtε)Ytε,α+Ztε,αB(t,Xtε)+αf(t,Xtε,Ytε,α,Ztε,α)]dtZtε,αdWt,\displaystyle\displaystyle[A(t,X_{t}^{\varepsilon})Y_{t}^{\varepsilon,\alpha}+Z_{t}^{\varepsilon,\alpha}B(t,X_{t}^{\varepsilon})+\alpha f(t,X_{t}^{\varepsilon},Y_{t}^{\varepsilon,\alpha},Z_{t}^{\varepsilon,\alpha})]dt-Z_{t}^{\varepsilon,\alpha}dW_{t},
YTε,α\displaystyle\displaystyle Y_{T}^{\varepsilon,\alpha} =\displaystyle\displaystyle= g(XTε),\displaystyle\displaystyle g(X_{T}^{\varepsilon}),

where A\displaystyle A and B\displaystyle B are \displaystyle\mathbb{R}-valued and d\displaystyle\mathbb{R}^{d}-valued bounded functions, respectively.

Let Y^tε,α:=e0tA(s,Xsε)𝑑sYtε,α\displaystyle\hat{Y}_{t}^{\varepsilon,\alpha}:=e^{\int_{0}^{t}A(s,X_{s}^{\varepsilon})ds}Y_{t}^{\varepsilon,\alpha}, t0\displaystyle t\geq 0, and W^t:=Wt0tB(s,Xsε)𝑑s\displaystyle\hat{W}_{t}:=W_{t}-\int_{0}^{t}B(s,X_{s}^{\varepsilon})ds, t0\displaystyle t\geq 0, which is a Brownian motion under a probability measure P^\displaystyle\hat{P} obtained by the change of measure with the process B(,Xε)\displaystyle B(\cdot,X_{\cdot}^{\varepsilon}). Then, we have

dXtε\displaystyle\displaystyle dX_{t}^{\varepsilon} =[μ(t,Xtε)+εσ(t,Xtε)B(t,Xtε)]dt+εσ(t,Xtε)dW^t,X0εL2(Ω;d),\displaystyle\displaystyle=[\mu(t,X_{t}^{\varepsilon})+\varepsilon\sigma(t,X_{t}^{\varepsilon})B(t,X_{t}^{\varepsilon})]dt+\varepsilon\sigma(t,X_{t}^{\varepsilon})d\hat{W}_{t},\quad X_{0}^{\varepsilon}\in L^{2}(\Omega;\mathbb{R}^{d}),
dY^tε,α\displaystyle\displaystyle-d\hat{Y}_{t}^{\varepsilon,\alpha} =αe0tA(s,Xsε)𝑑sf(t,Xtε,Ytε,α,Ztε,α)dte0tA(s,Xsε)𝑑sZtε,αdW^t,Y^Tε,α=e0TA(s,Xsε)𝑑sg(XTε).\displaystyle\displaystyle=\alpha e^{\int_{0}^{t}A(s,X_{s}^{\varepsilon})ds}f(t,X_{t}^{\varepsilon},Y_{t}^{\varepsilon,\alpha},Z_{t}^{\varepsilon,\alpha})dt-e^{\int_{0}^{t}A(s,X_{s}^{\varepsilon})ds}Z_{t}^{\varepsilon,\alpha}d\hat{W}_{t},\ \hat{Y}_{T}^{\varepsilon,\alpha}=e^{\int_{0}^{T}A(s,X_{s}^{\varepsilon})ds}g(X_{T}^{\varepsilon}).
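Indeed, this is a one-line check by Itô's formula and the Girsanov transform: since dŴ_t = dW_t − B(t,X_t^ε)dt,

d\hat{Y}_{t}^{\varepsilon,\alpha}=A(t,X_{t}^{\varepsilon})\hat{Y}_{t}^{\varepsilon,\alpha}dt+e^{\int_{0}^{t}A(s,X_{s}^{\varepsilon})ds}dY_{t}^{\varepsilon,\alpha}=-\alpha e^{\int_{0}^{t}A(s,X_{s}^{\varepsilon})ds}f(t,X_{t}^{\varepsilon},Y_{t}^{\varepsilon,\alpha},Z_{t}^{\varepsilon,\alpha})dt+e^{\int_{0}^{t}A(s,X_{s}^{\varepsilon})ds}Z_{t}^{\varepsilon,\alpha}d\hat{W}_{t},

so that the AY-term is absorbed into the discount factor and the ZB-term into the change of measure.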

4 Numerical results

In the numerical examples, we demonstrate that the Deep BSDE method with the first order asymptotic expansion obtained in Proposition 2 provides sufficient accuracy in solving semilinear PDEs. The dimension d\displaystyle d in (2.2) is set to d=1\displaystyle d=1 or d=100\displaystyle d=100.

We investigate the accuracy of the new method by comparing it with the standard Deep BSDE method in Weinan E et al. (2017) [3] and the Deep BSDE method with prior knowledge in Fujii et al. (2019) [9] for the model (2.2), where the target BSDEs with FSDEs are specified later.

4.1 Numerical schemes used in experiments

In this subsection, we explain the details of the schemes used in the numerical experiments. To construct the deep neural networks for each method, we follow Weinan E et al. (2017) [3] and employ adaptive moment estimation (Adam) with mini-batches. The parameters for the networks are set as follows: each hidden layer has d+10\displaystyle d+10 neurons (batch normalization layers aside). At each learning step, 256\displaystyle 256 sample paths are generated, and the learning rate is set to 0.01\displaystyle 0.01.
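For concreteness, a minimal sketch of one subnetwork in this architecture follows, written in TensorFlow 2 style. The two-hidden-layer depth follows the convention of Weinan E et al. (2017) [3]; all names below are ours and not the authors' code.

    import tensorflow as tf

    def make_subnet(d: int) -> tf.keras.Model:
        """One subnetwork mapping x in R^d to a Z-estimate at a fixed time
        step; each hidden layer has d + 10 neurons with batch normalization,
        as described in the text."""
        return tf.keras.Sequential([
            tf.keras.layers.Dense(d + 10, use_bias=False),
            tf.keras.layers.BatchNormalization(),
            tf.keras.layers.ReLU(),
            tf.keras.layers.Dense(d + 10, use_bias=False),
            tf.keras.layers.BatchNormalization(),
            tf.keras.layers.ReLU(),
            tf.keras.layers.Dense(d),  # output layer: Z-estimate in R^d
        ])

    optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)  # rate from the text
    batch_size = 256  # sample paths generated at each learning step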

(Numerical scheme) Now, let us briefly explain the schemes used in the numerical experiment in the following subsections.

  1.

    (Deep BSDE method based on Weinan E et al. (2017)) In the forward discretization of Yε,α\displaystyle Y^{\varepsilon,\alpha}, the Euler-Maruyama scheme X¯ε,(n)\displaystyle\bar{X}^{\varepsilon,(n)} is applied with n=20\displaystyle n=20 time steps. The initial guess of Y0ε,α\displaystyle Y_{0}^{\varepsilon,\alpha} is generated by a uniform random number around 𝒴¯01,ε,(1)\displaystyle\overline{{\cal Y}}_{0}^{1,\varepsilon,(1)}, which serves as prior knowledge for the Deep BSDE method.

    From the study of Weinan E et al. (2017), it is known that the value estimated by the Deep BSDE method converges to the true value of Y0ε,α\displaystyle Y_{0}^{\varepsilon,\alpha} if we take a sufficient number of iteration steps.

    In Section 4.2 below, the estimated values based on this scheme are shown by the green lines labeled with “Deep BSDE” in the figures.

  2.

    (Deep BSDE method with an enhanced version of Fujii et al. (2019)) In the forward discretization of Yε,α\displaystyle Y^{\varepsilon,\alpha} in the Deep BSDE solver, we apply 𝒵¯t1,ε,(1,n)=𝒱1,(1)(t,X¯tε,(n))\displaystyle\overline{{\cal Z}}_{t}^{1,\varepsilon,(1,n)}={\cal V}^{1,(1)}(t,\bar{X}_{t}^{\varepsilon,(n)}), t0\displaystyle t\geq 0, as an approximation of 𝒵1,ε\displaystyle\mathcal{Z}^{1,\varepsilon}, with the function 𝒱1,(1)\displaystyle{\cal V}^{1,(1)} defined by (3.17) and the Euler-Maruyama scheme X¯ε,(n)\displaystyle\bar{X}^{\varepsilon,(n)} with n=20\displaystyle n=20 time steps, to obtain an estimate of 𝒵2,ε\displaystyle\mathcal{Z}^{2,\varepsilon} by optimization in the Deep BSDE solver. As the initial value of Y0ε,α\displaystyle Y_{0}^{\varepsilon,\alpha}, we use 𝒰1,(1)(0,x)\displaystyle{{\cal U}}^{1,(1)}(0,x) with the function 𝒰1,(1)\displaystyle{\cal U}^{1,(1)} defined by (3.16), an approximation of 𝒴1,ε\displaystyle\mathcal{Y}^{1,\varepsilon}, which appears in the linear part of our decomposition of the BSDE (Yε,α,Zε,α)\displaystyle(Y^{\varepsilon,\alpha},Z^{\varepsilon,\alpha}) with Yε,α=𝒴1,ε+α𝒴2,ε\displaystyle Y^{\varepsilon,\alpha}=\mathcal{Y}^{1,\varepsilon}+\alpha\mathcal{Y}^{2,\varepsilon} and Zε,α=𝒵1,ε+α𝒵2,ε\displaystyle Z^{\varepsilon,\alpha}=\mathcal{Z}^{1,\varepsilon}+\alpha\mathcal{Z}^{2,\varepsilon}. Thus, the scheme is an improved version of Fujii et al. (2019) [9], since it applies the higher order term 𝒵¯1,ε,(1)\displaystyle\overline{{\cal Z}}^{1,\varepsilon,(1)} rather than the leading order term 𝒵¯1,ε,(0)\displaystyle\overline{{\cal Z}}^{1,\varepsilon,(0)} used in Fujii et al. (2019) [9].

    From the study of Fujii et al. (2019) [9], it is also known that the value estimated by the enhanced Deep BSDE method converges to the true value of Y0ε,α\displaystyle Y_{0}^{\varepsilon,\alpha} with a much smaller number of iteration steps than the original Deep BSDE method of Weinan E et al. (2017).

    In Section 4.2 below, the estimated values based on this scheme are shown by the red lines labeled with “Deep BSDE[(Y,Z)\displaystyle(Y,Z)]+AE[𝒴¯01,ε\displaystyle\overline{{\cal Y}}^{1,\varepsilon}_{0} and Z¯1,ε\displaystyle\overline{Z}^{1,\varepsilon}]” in the figures.

  3.

    (New scheme) Following the main result introduced in Section 3, particularly Theorem 2, we employ our approximation (3.42) for the decomposition Y0ε,α=𝒴01,ε+α𝒴02,ε\displaystyle Y^{\varepsilon,\alpha}_{0}=\mathcal{Y}^{1,\varepsilon}_{0}+\alpha\mathcal{Y}^{2,\varepsilon}_{0} with m=1\displaystyle m=1 and n=20\displaystyle n=20, namely,

    Y0ε,α\displaystyle\displaystyle Y_{0}^{\varepsilon,\alpha} \displaystyle\displaystyle\approx 𝒴¯01,ε,(1)+α𝒴02,ε,(1,20),\displaystyle\displaystyle\overline{{\cal Y}}_{0}^{1,\varepsilon,(1)}+\alpha{\cal Y}_{0}^{2,\varepsilon,(1,20)\ast},

    where we compute the nonlinear part 𝒴02,ε,(1,20)\displaystyle{\cal Y}_{0}^{2,\varepsilon,(1,20)\ast} with (3.19)–(3.21) by the Deep BSDE solver, while 𝒴¯01,ε,(1)\displaystyle\overline{{\cal Y}}_{0}^{1,\varepsilon,(1)} is given by 𝒰1,(1)(0,x)\displaystyle{{\cal U}}^{1,(1)}(0,x) with the function 𝒰1,(1)\displaystyle{\cal U}^{1,(1)} defined by (3.16).

    Specifically, in the computation of 𝒴02,ε,(1,20)\displaystyle{\cal Y}_{0}^{2,\varepsilon,(1,20)\ast} by the Deep BSDE solver with the equation:

    d𝒴t2,ε\displaystyle\displaystyle-d\mathcal{Y}_{t}^{2,\varepsilon} =f(t,Xtε,𝒴t1,ε+α𝒴t2,ε,𝒵t1,ε+α𝒵t2,ε)dt𝒵t2,εdWt,𝒴T2,ε=0,\displaystyle\displaystyle=f(t,X_{t}^{\varepsilon},\mathcal{Y}_{t}^{1,\varepsilon}+\alpha\mathcal{Y}_{t}^{2,\varepsilon},\mathcal{Z}_{t}^{1,\varepsilon}+\alpha\mathcal{Z}_{t}^{2,\varepsilon})dt-\mathcal{Z}_{t}^{2,\varepsilon}dW_{t},\quad\mathcal{Y}_{T}^{2,\varepsilon}=0,

    we use 𝒴¯t1,ε,(1)=𝒰1,(1)(t,Xtε)\displaystyle\overline{{\cal Y}}_{t}^{1,\varepsilon,(1)}={\cal U}^{1,(1)}(t,X_{t}^{\varepsilon}) and 𝒵¯t1,ε,(1)=𝒱1,(1)(t,Xtε)\displaystyle\overline{{\cal Z}}_{t}^{1,\varepsilon,(1)}={\cal V}^{1,(1)}(t,X_{t}^{\varepsilon}) as approximations for 𝒴t1,ε\displaystyle\mathcal{Y}_{t}^{1,\varepsilon} and 𝒵t1,ε\displaystyle\mathcal{Z}_{t}^{1,\varepsilon} in the driver f\displaystyle f, respectively.

    In Section 4.2 below, the estimated values based on this new scheme are shown by the blue lines labeled with “New method [𝒴¯1,ε+𝒴2,ε,DL\displaystyle\overline{{\cal Y}}^{1,\varepsilon}+{\cal Y}^{2,\varepsilon,DL}, 𝒵¯1,ε+𝒵2,ε,DL\displaystyle\overline{{\cal Z}}^{1,\varepsilon}+{\cal Z}^{2,\varepsilon,DL}]” in the figures.
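For readability, the following sketch spells out the forward recursion behind this new scheme. It is not the authors' code: f, U1, V1, sigma and Z2_fn are placeholder callables for the driver, the expansion functions 𝒰^{1,(1)} and 𝒱^{1,(1)}, the diffusion coefficient, and the neural-network estimate of 𝒵^{2,ε}, and we assume μ = 0 as in the examples below.

    import numpy as np

    def forward_Y2(x0, T, n, alpha, eps, f, U1, V1, sigma, y2_0, Z2_fn, rng):
        """Euler-Maruyama forward recursion for the 'small' nonlinear BSDE
        -dY2 = f(t, X, U1 + alpha*Y2, V1 + alpha*Z2) dt - Z2 dW,  Y2_T = 0.
        The expansion values U1, V1 replace (Y^1, Z^1) in the driver, acting
        as control variates; the loss pushes |Y2_T|^2 toward zero."""
        dt = T / n
        x, y2 = np.array(x0, dtype=float), float(y2_0)
        for k in range(n):
            t = k * dt
            dW = rng.normal(scale=np.sqrt(dt), size=x.shape)
            z2 = Z2_fn(t, x)  # placeholder for the network output at step k
            y2 -= f(t, x, U1(t, x) + alpha * y2, V1(t, x) + alpha * z2) * dt
            y2 += float(z2 @ dW)
            x = x + eps * (sigma(t, x) @ dW)  # forward SDE with mu = 0
        return y2  # terminal value, compared with 0 in the loss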

The initial value of Z0ε,α\displaystyle Z_{0}^{\varepsilon,\alpha} or 𝒵02,ε\displaystyle\mathcal{Z}_{0}^{2,\varepsilon} is generated by a uniform random number on the range [0.01,0.01]\displaystyle[-0.01,0.01] for each method. The numerical experiments presented in the following subsections are implemented in Python with TensorFlow on Google Colaboratory.

4.2 Numerical experiments

We present numerical examples showing that our proposed method substantially outperforms the other methods in terms of terminal errors (numerical values of loss functions), variations and convergence speed.

4.2.1 The case of d=1\displaystyle d=1

This subsection presents the numerical results for the case of d=1\displaystyle d=1.

We first check the performance of our method in a model where the explicit value of the solution is obtained by Picard iteration. Specifically, we consider an option pricing model in finance that takes CVA (credit value adjustment) into account as follows:

dXtε=\displaystyle\displaystyle dX_{t}^{\varepsilon}= μ(t,Xtε)dt+εσ(t,Xtε)dWt,\displaystyle\displaystyle\mu(t,X_{t}^{\varepsilon})dt+\varepsilon\sigma(t,X_{t}^{\varepsilon})dW_{t}, (4.1)
dYtε,α=\displaystyle\displaystyle-dY_{t}^{\varepsilon,\alpha}= α(Ytε,α)+dtZtε,αdWt,YTε,α=(XTεK)+,\displaystyle\displaystyle-\alpha(Y_{t}^{\varepsilon,\alpha})^{+}dt-Z_{t}^{\varepsilon,\alpha}dW_{t},\quad Y_{T}^{\varepsilon,\alpha}=(X_{T}^{\varepsilon}-K)^{+}, (4.2)

with f(t,x,y,z)=(y)+\displaystyle f(t,x,y,z)=-(y)^{+} and g(x)=(xK)+\displaystyle g(x)=(x-K)^{+} in (2.1) and (2.2). We note that α=\displaystyle\alpha=(loss rate in default)×\displaystyle\times(default intensity) in a finance model of CVA.

In computation we set μ(t,x)=0\displaystyle\mu(t,x)=0, σ(t,x)=x\displaystyle\sigma(t,x)=x, ε=σ=0.2\displaystyle\varepsilon=\sigma=0.2, X0=100\displaystyle X_{0}=100, α=0.05\displaystyle\alpha=0.05, T=0.5\displaystyle T=0.5 with K=100\displaystyle K=100 (ATM case) and K=115\displaystyle K=115 (OTM case).

In this case an explicit value of Y0\displaystyle Y_{0} is computed as Y0=𝒴01(1+i=1(1)iαiTi1i!)\displaystyle\textstyle{Y_{0}={\cal Y}_{0}^{1}(1+\sum_{i=1}^{\infty}(-1)^{i}\alpha^{i}T^{i}\frac{1}{i!})}. More precisely, by the k\displaystyle k-Picard iteration of the backward equation:

dYtε,α,[k]=α(Ytε,α,[k1])+dtZtε,α,[k]dWt,YTε,α,[k]=(XTεK)+,\displaystyle\displaystyle-dY_{t}^{\varepsilon,\alpha,[k]}=-\alpha(Y_{t}^{\varepsilon,\alpha,[k-1]})^{+}dt-Z_{t}^{\varepsilon,\alpha,[k]}dW_{t},\ \ \ Y_{T}^{\varepsilon,\alpha,[k]}=(X_{T}^{\varepsilon}-K)^{+}, (4.3)
with (Ytε,α,[0])+=E[(XTεK)+|t]=𝒴t1,for  allt0,\displaystyle\displaystyle\ \ \ \ (Y_{t}^{\varepsilon,\alpha,[0]})^{+}=E[(X_{T}^{\varepsilon}-K)^{+}|{\cal F}_{t}]={\cal Y}_{t}^{1},\ \mbox{for \ all}\ t\geq 0, (4.4)

it is easy to see that Ytε,α,[k]=𝒴t1αtTE[Ysε,α,[k1]|t]𝑑s\displaystyle\textstyle{Y_{t}^{\varepsilon,\alpha,[k]}={\cal Y}_{t}^{1}-\alpha\int_{t}^{T}E[Y_{s}^{\varepsilon,\alpha,[k-1]}|{\cal F}_{t}]ds}, and thus one has

Y0ε,α,[k]=𝒴01(1+i=1k(1)iαiTi1i!).\displaystyle\displaystyle\textstyle{Y_{0}^{\varepsilon,\alpha,[k]}={\cal Y}_{0}^{1}(1+\sum_{i=1}^{k}(-1)^{i}\alpha^{i}T^{i}\frac{1}{i!})}.

Then, the true values are given as Y0=5.50\displaystyle Y_{0}=5.50 in the ATM case and Y0=1.26\displaystyle Y_{0}=1.26 in the OTM case by the 5\displaystyle 5-Picard iteration, which provides sufficient convergence and hence accuracy.
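These true values can be reproduced directly: 𝒴₀¹ is the zero-rate Black-Scholes call price for the lognormal model (4.1) with volatility ε = 0.2, and the series is truncated at k = 5 as in the text. A minimal verification sketch (helper names are ours):

    from math import factorial
    import numpy as np
    from scipy.stats import norm

    def bs_call(S, K, sigma, T):
        """Black-Scholes call price with zero interest rate (mu = 0)."""
        d1 = (np.log(S / K) + 0.5 * sigma ** 2 * T) / (sigma * np.sqrt(T))
        d2 = d1 - sigma * np.sqrt(T)
        return S * norm.cdf(d1) - K * norm.cdf(d2)

    alpha, T, S0, vol = 0.05, 0.5, 100.0, 0.2
    for K in (100.0, 115.0):  # ATM and OTM strikes
        y1 = bs_call(S0, K, vol, T)
        series = 1.0 + sum((-1) ** i * (alpha * T) ** i / factorial(i)
                           for i in range(1, 6))
        print(K, round(y1 * series, 2))  # -> 5.5 (ATM) and 1.26 (OTM)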

Figures 1 and 2 show the numerical values of the loss functions and the approximate values of Y0\displaystyle Y_{0}, respectively, against the number of iteration steps for the ATM case; Figures 3 and 4 show the same for the OTM case.

As Figures 2 and 4 show, the numerical values of “New method [𝒴¯1,ε+𝒴2,ε,DL\displaystyle\overline{{\cal Y}}^{1,\varepsilon}+{\cal Y}^{2,\varepsilon,DL},𝒵¯1,ε+𝒵2,ε,DL]\displaystyle\overline{{\cal Z}}^{1,\varepsilon}+{\cal Z}^{2,\varepsilon,DL}]” converge to the true values substantially faster and with smaller variations than the other schemes. Also, we can see that the errors of “New method” are much smaller, according to the behavior of the loss functions against the number of iteration steps in Figures 1 and 3.

Figure 1: Values of the loss function and number of iteration steps (1-dim option pricing model with CVA, ATM case)
Figure 2: Approximate values of Y0\displaystyle Y_{0} (true value: 5.50) and number of iteration steps (1-dim option pricing model with CVA, ATM case)
Figure 3: Values of the loss function and number of iteration steps (1-dim option pricing model with CVA, OTM case)
Figure 4: Approximate values of Y0\displaystyle Y_{0} (true value: 1.26) and number of iteration steps (1-dim option pricing model with CVA, OTM case)

Next, we present numerical examples for a model where explicit values of Y0\displaystyle Y_{0} cannot be obtained without numerical schemes such as Monte Carlo simulations. Let us consider

dXtε=\displaystyle\displaystyle dX_{t}^{\varepsilon}= μ(t,Xtε)dt+εσ(t,Xtε)dWt,\displaystyle\displaystyle\mu(t,X_{t}^{\varepsilon})dt+\varepsilon\sigma(t,X_{t}^{\varepsilon})dW_{t}, (4.5)
dYtε,α=\displaystyle\displaystyle-dY_{t}^{\varepsilon,\alpha}= αf(t,Xtε,Ytε,α,Ztε,α)dtZtε,αdWt,YTε,α=g(XTε)\displaystyle\displaystyle\alpha f(t,X_{t}^{\varepsilon},Y_{t}^{\varepsilon,\alpha},Z_{t}^{\varepsilon,\alpha})dt-Z_{t}^{\varepsilon,\alpha}dW_{t},\quad Y_{T}^{\varepsilon,\alpha}=g(X_{T}^{\varepsilon}) (4.6)

with μ(t,x)=0\displaystyle\mu(t,x)=0, σ(t,x)=x\displaystyle\sigma(t,x)=x, ε=σ=0.2\displaystyle\varepsilon=\sigma=0.2, T=0.25\displaystyle T=0.25, X0=100\displaystyle X_{0}=100, and

f(t,x,y,z)\displaystyle\displaystyle f(t,x,y,z) ={(yzσ11)(Rr)}\displaystyle\displaystyle=-\left\{\left(y-z\sigma^{-1}\mathrm{1}\right)^{-}(R-r)\right\} (4.7)

with R=0.06\displaystyle R=0.06, r=0.0\displaystyle r=0.0, and

g(x)=(xK1)+2(xK2)+,withK1=95,K2=105.g(x)=(x-K_{1})^{+}-2(x-K_{2})^{+},\ \ \mbox{with}\ K_{1}=95,\ K_{2}=105. (4.8)
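In code, the driver (4.7) and the call-spread terminal condition (4.8) can be transcribed as follows for d = 1; here the negative part a⁻ is max(−a, 0), and the function names are ours.

    def f(t, x, y, z, sigma_val, R=0.06, r=0.0):
        """Driver (4.7): the different-rates term, active when y - z/sigma < 0."""
        return -max(-(y - z / sigma_val), 0.0) * (R - r)

    def g(x, K1=95.0, K2=105.0):
        """Call spread terminal condition (4.8)."""
        return max(x - K1, 0.0) - 2.0 * max(x - K2, 0.0)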

As mentioned in Section 4.1, the values estimated by “Deep BSDE” and “Deep BSDE[(Y,Z)\displaystyle(Y,Z)]+AE[𝒴¯01,ε\displaystyle\overline{{\cal Y}}^{1,\varepsilon}_{0} and Z¯1,ε\displaystyle\overline{Z}^{1,\varepsilon}]” converge to the true value of Y0\displaystyle Y_{0}. In the experiments, we therefore check whether the value estimated by “New method [𝒴¯1,ε+𝒴2,ε,DL\displaystyle\overline{{\cal Y}}^{1,\varepsilon}+{\cal Y}^{2,\varepsilon,DL},𝒵¯1,ε+𝒵2,ε,DL]\displaystyle\overline{{\cal Z}}^{1,\varepsilon}+{\cal Z}^{2,\varepsilon,DL}]” converges faster than those computed by the other two methods.

Figure 5 shows the numerical values of the loss functions against the number of iteration steps. While “Deep BSDE[(Y,Z)\displaystyle(Y,Z)]+AE[𝒴¯01,ε\displaystyle\overline{{\cal Y}}^{1,\varepsilon}_{0} and Z¯1,ε\displaystyle\overline{Z}^{1,\varepsilon}]” is superior to the original “Deep BSDE”, we see that “New method [𝒴¯1,ε+𝒴2,ε,DL\displaystyle\overline{{\cal Y}}^{1,\varepsilon}+{\cal Y}^{2,\varepsilon,DL},𝒵¯1,ε+𝒵2,ε,DL]\displaystyle\overline{{\cal Z}}^{1,\varepsilon}+{\cal Z}^{2,\varepsilon,DL}]” gives much more stable and accurate convergence than the other schemes.

Figure 6 shows the approximate values of Y0\displaystyle Y_{0} against the number of iteration steps. It is observed that “New method [𝒴¯1,ε+𝒴2,ε,DL\displaystyle\overline{{\cal Y}}^{1,\varepsilon}+{\cal Y}^{2,\varepsilon,DL},𝒵¯1,ε+𝒵2,ε,DL]\displaystyle\overline{{\cal Z}}^{1,\varepsilon}+{\cal Z}^{2,\varepsilon,DL}]” provides the fastest convergence with the smallest standard deviation, while “Deep BSDE[(Y,Z)\displaystyle(Y,Z)]+AE[𝒴¯01,ε\displaystyle\overline{{\cal Y}}^{1,\varepsilon}_{0} and Z¯1,ε\displaystyle\overline{Z}^{1,\varepsilon}]” gives a better approximation than “Deep BSDE”.

Figure 5: Values of the loss function and number of iteration steps
Figure 6: Approximate values of Y0\displaystyle Y_{0} and number of iteration steps

4.2.2 The case of d=100\displaystyle d=100

We show the main numerical result for d=100\displaystyle d=100. The same experiment as in the case of d=1\displaystyle d=1 is performed. Let us consider

dXtε,i=\displaystyle\displaystyle dX_{t}^{\varepsilon,i}= μi(t,Xtε)dt+εj=1dσji(t,Xtε)dWtj,\displaystyle\displaystyle\mu^{i}(t,X_{t}^{\varepsilon})dt+\varepsilon\sum_{j=1}^{d}\sigma^{i}_{j}(t,X_{t}^{\varepsilon})dW^{j}_{t}, (4.9)
dYtε,α=\displaystyle\displaystyle-dY_{t}^{\varepsilon,\alpha}= αf(t,Xtε,Ytε,α,Ztε,α)dtZtε,αdWt,YTε,α=g(XTε),\displaystyle\displaystyle\alpha f(t,X_{t}^{\varepsilon},Y_{t}^{\varepsilon,\alpha},Z_{t}^{\varepsilon,\alpha})dt-Z_{t}^{\varepsilon,\alpha}dW_{t},\quad Y_{T}^{\varepsilon,\alpha}=g(X_{T}^{\varepsilon}), (4.10)

with μi(t,x)=0\displaystyle\mu^{i}(t,x)=0, σji(t,x)=xi\displaystyle\sigma^{i}_{j}(t,x)=x^{i} (i=1,,d\displaystyle i=1,\cdots,d), ε=0.4\displaystyle\varepsilon=0.4, X0i=100\displaystyle X_{0}^{i}=100, T=0.25\displaystyle T=0.25, and

f(t,x,y,z)\displaystyle\displaystyle f(t,x,y,z) ={(yk=1dj=1dzk[σ1]kj)(Rr)},\displaystyle\displaystyle=-\left\{-\left(y-\sum_{k=1}^{d}\sum_{j=1}^{d}z_{k}[\sigma^{-1}]_{kj}\right)^{-}(R-r)\right\}, (4.11)

with R=0.01\displaystyle R=0.01 and r=0.0\displaystyle r=0.0, where θ\displaystyle\theta is defined by μr1=σθ\displaystyle\mu-r\mathrm{1}=\sigma\theta, and

g(x)=(1di=1dxiK1)+2(1di=1dxiK2)+,withK1=95,K2=105.g(x)=\left(\frac{1}{d}\sum_{i=1}^{d}x_{i}-K_{1}\right)^{+}-2\left(\frac{1}{d}\sum_{i=1}^{d}x_{i}-K_{2}\right)^{+},\ \ \mbox{with}\ K_{1}=95,\ K_{2}=105. (4.12)
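The terminal condition (4.12) is the same call spread applied to the average of the d components; a direct transcription (names ours) reads:

    import numpy as np

    def g_basket(x, K1=95.0, K2=105.0):
        """Basket call spread (4.12) on the mean of the d components of x."""
        m = float(np.mean(x))
        return max(m - K1, 0.0) - 2.0 * max(m - K2, 0.0)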

The results are given in Figures 7, 8 and 9. The convergence speed of the original Deep BSDE method seems too slow to obtain a precise result. In contrast, “Deep BSDE[(Y,Z)\displaystyle(Y,Z)]+AE[𝒴¯01,ε\displaystyle\overline{{\cal Y}}^{1,\varepsilon}_{0} and Z¯1,ε\displaystyle\bar{Z}^{1,\varepsilon}]” and “New method [𝒴¯1,ε+𝒴2,ε,DL\displaystyle\overline{{\cal Y}}^{1,\varepsilon}+{\cal Y}^{2,\varepsilon,DL},𝒵¯1,ε+𝒵2,ε,DL]\displaystyle\overline{{\cal Z}}^{1,\varepsilon}+{\cal Z}^{2,\varepsilon,DL}]” work well even in this high dimensional case. In particular, “New method [𝒴¯1,ε+𝒴2,ε,DL\displaystyle\overline{{\cal Y}}^{1,\varepsilon}+{\cal Y}^{2,\varepsilon,DL},𝒵¯1,ε+𝒵2,ε,DL]\displaystyle\overline{{\cal Z}}^{1,\varepsilon}+{\cal Z}^{2,\varepsilon,DL}]” provides remarkable performance in terms of convergence speed, accuracy (numerical values of loss functions) and variations. Moreover, comparing the results of our new method and “Deep BSDE[(Y,Z)\displaystyle(Y,Z)]+AE[𝒴¯01,ε\displaystyle\overline{{\cal Y}}^{1,\varepsilon}_{0} and Z¯1,ε\displaystyle\bar{Z}^{1,\varepsilon}]” closely, Figure 9 shows that the variation of Y0\displaystyle Y_{0} by our method is much smaller, which is consistent with the much smaller values of the loss functions for the new method in Figure 7.

Figure 7: Values of the loss function and number of iteration steps
Figure 8: Approximate values of Y0\displaystyle Y_{0} and number of iteration steps
Figure 9: Approximate values of Y0\displaystyle Y_{0} and number of iteration steps (enlarged view for “Deep BSDE[(Y,Z)\displaystyle(Y,Z)]+AE[𝒴¯01,ε\displaystyle\overline{{\cal Y}}^{1,\varepsilon}_{0} and Z¯1,ε\displaystyle\overline{Z}^{1,\varepsilon}]” and “New method [𝒴¯1,ε+𝒴2,ε,DL\displaystyle\overline{{\cal Y}}^{1,\varepsilon}+{\cal Y}^{2,\varepsilon,DL}, 𝒵¯1,ε+𝒵2,ε,DL\displaystyle\overline{{\cal Z}}^{1,\varepsilon}+{\cal Z}^{2,\varepsilon,DL}]”)

5 Conclusion and future works

This paper has introduced a new control variate method for the Deep BSDE solver to improve methods such as those in Weinan E et al. (2017) [3] and Fujii et al. (2019) [9]. First, we decompose a target semilinear PDE (BSDE) into two parts, namely dominant linear and residual nonlinear PDEs (BSDEs). When the dominant part is obtained in closed form or approximated by an asymptotic expansion scheme, the small nonlinear PDE part is efficiently computed by the Deep BSDE solver, where the asymptotic expansion crucially works as a control variate. The main theorem provides the validity of our proposed method. Moreover, numerical examples for one- and 100-dimensional BSDEs corresponding to target nonlinear PDEs show the effectiveness of our scheme, which is consistent with our initial conjecture and theoretical result.

As mentioned in Remark 1, even if the accuracy of the standard asymptotic expansion scheme becomes worse, the linear PDE part can be more efficiently approximated by existing methods such as [36][35][39][26][29][20]. Their performance in such cases should be checked against various nonlinear models. Also, it will be a challenging task to examine whether the high order automatic differentiation schemes proposed in [40][37] work as efficient approximations of Z\displaystyle Z in nonlinear BSDEs or xu\displaystyle\partial_{x}u in nonlinear PDEs. These are left for future studies.

Appendix

Appendix A Proof of Propositions

A.1 Proof of Proposition 1

See Proposition 4.2 in Takahashi and Yamada (2015) [34] for (3.11)–(3.14), for instance.

Also, note that for p1\displaystyle p\geq 1 and a multi-index α\displaystyle\alpha, supxdxαX¯Tt,x,εpC(T)\displaystyle\sup{}_{x\in\mathbb{R}^{d}}\|\partial_{x}^{\alpha}\overline{X}^{t,x,\varepsilon}_{T}\|_{p}\leq C(T) and supxdxα𝒲Tt,x,ε,(m)pC(T)\displaystyle\sup{}_{x\in\mathbb{R}^{d}}\|\partial_{x}^{\alpha}{\cal W}^{t,x,\varepsilon,(m)}_{T}\|_{p}\leq C(T) hold for t<T\displaystyle t<T. Then, sup|xdx2𝒰1,(m)(t,x)|2gC(T)\displaystyle\sup{}_{x\in\mathbb{R}^{d}}|\partial_{x}^{2}{\cal U}^{1,(m)}(t,x)|\leq\|\nabla^{2}g\|_{\infty}C(T) for t<T\displaystyle t<T, i.e. 𝒰1,(m)(t,)Cb2(d)\displaystyle{\cal U}^{1,(m)}(t,\cdot)\in C_{b}^{2}(\mathbb{R}^{d}), t<T\displaystyle t<T.

For 𝒱1,(m)\displaystyle{\cal V}^{1,(m)}, we have the representation

𝒱1,(m)(t,x)=E[g(X¯Tt,x,ε)𝒵Tt,x,ε,(m)]=E[(g)(X¯Tt,x,ε)𝒬Tt,x,ε,(m)],\displaystyle\displaystyle{\cal V}^{1,(m)}(t,x)=E[g(\overline{X}^{t,x,\varepsilon}_{T}){\cal Z}^{t,x,\varepsilon,(m)}_{T}]=E[(\nabla g)(\overline{X}^{t,x,\varepsilon}_{T}){\cal Q}^{t,x,\varepsilon,(m)}_{T}],

for a matrix-valued Wiener functional 𝒬Tt,x,ε,(m)=[[𝒬Tt,x,ε,(m)]ji]1i,jd\displaystyle{\cal Q}^{t,x,\varepsilon,(m)}_{T}=[[{\cal Q}^{t,x,\varepsilon,(m)}_{T}]^{i}_{j}]_{1\leq i,j\leq d} such that [𝒬Tt,x,ε,(m)]ji𝔻\displaystyle[{\cal Q}^{t,x,\varepsilon,(m)}_{T}]^{i}_{j}\in\mathbb{D}^{\infty}, 1i,jd\displaystyle 1\leq i,j\leq d, satisfying for p1\displaystyle p\geq 1 and a multi-index α\displaystyle\alpha, supxdxα𝒬Tt,x,ε,(m)pC(T)\displaystyle\sup{}_{x\in\mathbb{R}^{d}}\|\partial_{x}^{\alpha}{\cal Q}^{t,x,\varepsilon,(m)}_{T}\|_{p}\leq C(T) for t<T\displaystyle t<T. Then, sup|xdx𝒱1,(m)(t,x)|2gC(T)\displaystyle\sup{}_{x\in\mathbb{R}^{d}}|\partial_{x}{\cal V}^{1,(m)}(t,x)|\leq\|\nabla^{2}g\|_{\infty}C(T) for t<T\displaystyle t<T, i.e. 𝒱1,(m)(t,)Cb1(d)\displaystyle{\cal V}^{1,(m)}(t,\cdot)\in C_{b}^{1}(\mathbb{R}^{d}), t<T\displaystyle t<T. \displaystyle\Box

A.2 Proof of Proposition 2

For the derivations, we use Malliavin calculus. Let 𝒯𝒮(d)\displaystyle{\cal T}\in\mathcal{S}^{\prime}(\mathbb{R}^{d}) be a tempered distribution and F(𝔻)d\displaystyle F\in(\mathbb{D}^{\infty})^{d} be a nondegenerate Wiener functional in the sense of Malliavin. Then, 𝒯(F)\displaystyle{\cal T}(F) is well-defined as an element of the space of Watanabe distributions 𝔻\displaystyle\mathbb{D}^{-\infty}, that is the dual space of 𝔻\displaystyle\mathbb{D}^{\infty}. Also, for G𝔻\displaystyle G\in\mathbb{D}^{\infty}, a (generalized) expectation E[𝒯(F)G]\displaystyle E[{\cal T}(F)G] is understood as a coupling of 𝒯(F)𝔻\displaystyle{\cal T}(F)\in\mathbb{D}^{-\infty} and G𝔻\displaystyle G\in\mathbb{D}^{\infty}, namely 𝒯(F),G𝔻𝔻\displaystyle{}_{\mathbb{D}^{-\infty}}\langle{\cal T}(F),G\rangle_{\mathbb{D}^{\infty}}.

Note that GTt,x,ε:=(XTt,x,εXTt,x,0)/ε\displaystyle G_{T}^{t,x,\varepsilon}:=(X_{T}^{t,x,\varepsilon}-X_{T}^{t,x,0})/\varepsilon and (/x)XTt,x,ε\displaystyle(\partial/\partial x)X_{T}^{t,x,\varepsilon} in

𝒰1(t,x)=E[g(XTt,x,ε)]=dg(XTt,x,0+εy)E[δy(GTt,x,ε)]𝑑y\displaystyle\displaystyle{\cal U}^{1}(t,x)=E[g(X_{T}^{t,x,\varepsilon})]=\int_{\mathbb{R}^{d}}g(X_{T}^{t,x,0}+\varepsilon y)E[\delta_{y}(G_{T}^{t,x,\varepsilon})]dy (A.1)

and

(/x)𝒰1σε(t,x)=E[(g)(XTt,x,ε)(/x)XTt,x,ε]εσ(t,x)\displaystyle\displaystyle(\partial/\partial x){\cal U}^{1}\sigma^{\varepsilon}(t,x)=E[(\nabla g)(X_{T}^{t,x,\varepsilon})(\partial/\partial x)X_{T}^{t,x,\varepsilon}]\varepsilon\sigma(t,x) (A.2)
=di=1d(ig)(XTt,x,0+εy)E[δy(GTt,x,ε)(/x)XTt,x,ε,i]dyεσ(t,x)\displaystyle\displaystyle=\int_{\mathbb{R}^{d}}\sum_{i=1}^{d}(\partial_{i}g)(X_{T}^{t,x,0}+\varepsilon y)E[\delta_{y}(G_{T}^{t,x,\varepsilon})(\partial/\partial x)X_{T}^{t,x,\varepsilon,i}]dy\varepsilon\sigma(t,x) (A.3)

have expansions in 𝔻\displaystyle\mathbb{D}^{\infty} whose expansion coefficients are given by iterated stochastic integrals: GTt,x,εX1,Tt,x+εX2,Tt,x+\displaystyle G_{T}^{t,x,\varepsilon}\sim X_{1,T}^{t,x}+\varepsilon X_{2,T}^{t,x}+\cdots and (/x)XTt,x,εJtT0,x+εJtT1,x+\displaystyle(\partial/\partial x)X_{T}^{t,x,\varepsilon}\sim J_{t\to T}^{0,x}+\varepsilon J_{t\to T}^{1,x}+\cdots. In particular,

X1,Tt,x\displaystyle\displaystyle X_{1,T}^{t,x} =k=1dtTJtT0,x(Jts0,x)1σk(Xst,x,0)𝑑Wsk,\displaystyle\displaystyle=\sum_{k=1}^{d}\int_{t}^{T}J_{t\to T}^{0,x}(J_{t\to s}^{0,x})^{-1}\sigma_{k}(X_{s}^{t,x,0})dW_{s}^{k}, (A.4)
X2,Tt,x\displaystyle\displaystyle X_{2,T}^{t,x} =k=1dtTJtT0,x(Jts0,x)1σk(Xst,x,0)X1,st,xdWsk\displaystyle\displaystyle=\sum_{k=1}^{d}\int_{t}^{T}J_{t\to T}^{0,x}(J_{t\to s}^{0,x})^{-1}\partial\sigma_{k}(X_{s}^{t,x,0})X_{1,s}^{t,x}dW_{s}^{k} (A.5)
+12tTJtT0,x(Jts0,x)12μ(Xst,x,0)X1,st,xX1,st,xds,\displaystyle\displaystyle\ \ \ \ +\frac{1}{2}\int_{t}^{T}J_{t\to T}^{0,x}(J_{t\to s}^{0,x})^{-1}\partial^{2}\mu(X_{s}^{t,x,0})\cdot X_{1,s}^{t,x}\otimes X_{1,s}^{t,x}ds, (A.6)

and

JtT0,x=(/x)XTt,x,0,\displaystyle\displaystyle J_{t\to T}^{0,x}=(\partial/\partial x)X_{T}^{t,x,0}, (A.7)
JtT1,x=JtT0,x{tT2μ(Xst,x,0)X1,st,xds+k=1dtTσk(Xst,x,0)dWsk},\displaystyle\displaystyle J_{t\to T}^{1,x}=J_{t\to T}^{0,x}\{\int_{t}^{T}\partial^{2}\mu(X_{s}^{t,x,0})X_{1,s}^{t,x}ds+\sum_{k=1}^{d}\int_{t}^{T}\partial\sigma_{k}(X_{s}^{t,x,0})dW_{s}^{k}\}, (A.8)

where the following notation is used: for a smooth function V:dd\displaystyle V:\mathbb{R}^{d}\to\mathbb{R}^{d},

2V(x)=[2Vi(x)xjxk]j,ki,\partial^{2}V(x)=\left[\partialderivative{V^{i}(x)}{x^{j}}{x^{k}}\right]^{i}_{j,k}, (A.9)
[2Vηη]i=j,k2Vixjxkηjηk,ηd.\left[\partial^{2}V\cdot\eta\otimes\eta\right]^{i}=\sum_{j,k}\partialderivative{V^{i}}{x^{j}}{x^{k}}\eta^{j}\eta^{k},\ \ \ \eta\in\mathbb{R}^{d}. (A.10)

We expand E[δy(GTt,x,ε)]\displaystyle E[\delta_{y}(G_{T}^{t,x,\varepsilon})] in (A.1) and E[δy(GTt,x,ε)(/x)XTt,x,ε]\displaystyle E[\delta_{y}(G_{T}^{t,x,\varepsilon})(\partial/\partial x)X_{T}^{t,x,\varepsilon}] in (A.3) to obtain explicit expressions of 𝒰1,(1)(t,x)\displaystyle{\cal U}^{1,(1)}(t,x) and 𝒱1,(1)(t,x)\displaystyle{\cal V}^{1,(1)}(t,x). Next, let us recall the following formulas.

Lemma 1.

Let 𝒯𝒮(d)\displaystyle{\cal T}\in\mathcal{S}^{\prime}(\mathbb{R}^{d}) be a tempered distribution.

  1.

    For an adapted process hL2([0,T]×Ω)\displaystyle h\in L^{2}([0,T]\times\Omega),

    j=1dE[j𝒯(X1,Tt,x)tT(Di,sX1,Tt,x,j)h(s)𝑑s]=E[𝒯(X1,Tt,x)tTh(s)𝑑Wsi],\displaystyle\displaystyle\sum_{j=1}^{d}E[\partial_{j}{\cal T}({X}_{1,T}^{t,x})\int_{t}^{T}(D_{i,s}{X}_{1,T}^{t,x,j})h(s)ds]=E[{\cal T}({X}_{1,T}^{t,x})\int_{t}^{T}h(s)dW_{s}^{i}], (A.11)

    where Di,F\displaystyle D_{i,\cdot}F represents the i\displaystyle i-th element of the Malliavin derivative
    DF=(D1,F,,Dd,F)\displaystyle D_{\cdot}F=(D_{1,\cdot}F,\cdots,D_{d,\cdot}F) for F𝔻\displaystyle F\in\mathbb{D}^{\infty}.

  2.

    For 1i1,,id\displaystyle 1\leq i_{1},\cdots,i_{\ell}\leq d,

    E[(i1i𝒯)(X1,Tt,x)]=E[𝒯(X1,Tt,x)H(i1,,i)(X1,Tt,x,1)].\displaystyle\displaystyle E[(\partial_{{i_{1}}}\cdots\partial_{{i_{\ell}}}{\cal T})({X}_{1,T}^{t,x})]=E[{\cal T}({X}_{1,T}^{t,x})H_{(i_{1},\cdots,i_{\ell})}({X}_{1,T}^{t,x},1)].\ \ \ \ \ (A.12)

Proof of Lemma 1. Use the duality formula (see Theorem 1.26 of [24] or Proposition 1.3.11 of [28]), with D𝒯(Ξ)=i=1d(i𝒯)(Ξ)DΞi\displaystyle\textstyle{D{\cal T}(\Xi)=\sum_{i=1}^{d}(\partial_{i}{\cal T})(\Xi)D\Xi^{i}} for Ξ=(Ξ1,,Ξd)(𝔻)d\displaystyle\Xi=(\Xi^{1},\cdots,\Xi^{d})\in(\mathbb{D}^{\infty})^{d} (see Proof of Proposition 2.1.9 of [28] or Proof of Theorem 2.6 of [33]), to obtain the first assertion. The second assertion follows immediately by integration by parts. \displaystyle\Box
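To fix ideas, consider the case d = 1: X_{1,T}^{t,x} in (A.4) is then a Wiener integral of a deterministic integrand, hence a centered Gaussian with some variance Σ(t,T,x), and the weight in (A.12) reduces to the familiar first Hermite weight H_{(1)}(X_{1,T}^{t,x},1) = X_{1,T}^{t,x}/Σ(t,T,x), so that (A.12) becomes the Gaussian integration by parts E[𝒯′(X_{1,T}^{t,x})] = E[𝒯(X_{1,T}^{t,x})X_{1,T}^{t,x}/Σ(t,T,x)].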

In the expansions of (A.1) and (A.3), iterated integrals such as

tThj1(t1)tt1hj2(t2)𝑑Wt2j2𝑑Wt1j1(hjlL2([0,T]),l=1,2,j1,j2=1,,d)\displaystyle\displaystyle\int_{t}^{T}h_{j_{1}}(t_{1})\int_{t}^{t_{1}}h_{j_{2}}(t_{2})dW_{t_{2}}^{j_{2}}dW_{t_{1}}^{j_{1}}\ \ (h_{j_{l}}\in L^{2}([0,T]),\ l=1,2,\ j_{1},j_{2}=1,\cdots,d) (A.13)

appear, for which the following calculation holds by using (A.11):

i1E[i1𝒯(X1,Tt,x)tThj1(t1)tt1hj2(t2)𝑑Wt2j2𝑑Wt1j1]\displaystyle\displaystyle\sum_{i_{1}}E[\partial_{i_{1}}{\cal T}({X}_{1,T}^{t,x})\int_{t}^{T}h_{j_{1}}(t_{1})\int_{t}^{t_{1}}h_{j_{2}}(t_{2})dW_{t_{2}}^{j_{2}}dW_{t_{1}}^{j_{1}}] (A.14)
=\displaystyle\displaystyle= i1,i2E[i2i1𝒯(X1,Tt,x)tT(Dj1,t1X1,Tt,x,i2)hj1(t1)tt1hj2(t2)𝑑Wt2j2𝑑t1]\displaystyle\displaystyle\sum_{i_{1},i_{2}}E[\partial_{i_{2}}\partial_{i_{1}}{\cal T}({X}_{1,T}^{t,x})\int_{t}^{T}(D_{j_{1},t_{1}}{X}_{1,T}^{t,x,i_{2}})h_{j_{1}}(t_{1})\int_{t}^{t_{1}}h_{j_{2}}(t_{2})dW_{t_{2}}^{j_{2}}dt_{1}]
=\displaystyle\displaystyle= i1,i2tT(Dj1,t1X1,Tt,x,i2)hj1(t1)E[i2i1𝒯(X1,Tt,x)tt1hj2(t2)𝑑Wt2j2]𝑑t1\displaystyle\displaystyle\sum_{i_{1},i_{2}}\int_{t}^{T}(D_{j_{1},t_{1}}{X}_{1,T}^{t,x,i_{2}})h_{j_{1}}(t_{1})E[\partial_{i_{2}}\partial_{i_{1}}{\cal T}({X}_{1,T}^{t,x})\int_{t}^{t_{1}}h_{j_{2}}(t_{2})dW_{t_{2}}^{j_{2}}]dt_{1}
=\displaystyle\displaystyle= i1,i2,i3tT(Dj1,t1X1,Tt,x,i2)hj1(t1)E[i3i2i1𝒯(X1,Tt,x)tt1(Dj2,t2X1,Tt,x,i3)hj2(t2)𝑑t2]𝑑t1\displaystyle\displaystyle\sum_{i_{1},i_{2},i_{3}}\int_{t}^{T}(D_{j_{1},t_{1}}{X}_{1,T}^{t,x,i_{2}})h_{j_{1}}(t_{1})E[\partial_{i_{3}}\partial_{i_{2}}\partial_{i_{1}}{\cal T}({X}_{1,T}^{t,x})\int_{t}^{t_{1}}(D_{j_{2},t_{2}}{X}_{1,T}^{t,x,i_{3}})h_{j_{2}}(t_{2})d{t_{2}}]dt_{1}
=\displaystyle\displaystyle= i1,i2,i3E[i3i2i1𝒯(X1,Tt,x)]tT(Dj1,t1X1,Tt,x,i2)hj1(t1)tt1(Dj2,t2X1,Tt,x,i3)hj2(t2)𝑑t2𝑑t1.\displaystyle\displaystyle\sum_{i_{1},i_{2},i_{3}}E[\partial_{i_{3}}\partial_{i_{2}}\partial_{i_{1}}{\cal T}({X}_{1,T}^{t,x})]\int_{t}^{T}(D_{j_{1},t_{1}}{X}_{1,T}^{t,x,i_{2}})h_{j_{1}}(t_{1})\int_{t}^{t_{1}}(D_{j_{2},t_{2}}{X}_{1,T}^{t,x,i_{3}})h_{j_{2}}(t_{2})d{t_{2}}dt_{1}.

Note that sDj,sX1,Tt,x,i\displaystyle s\mapsto D_{j,s}{X}_{1,T}^{t,x,i} is deterministic, and one has

Dj,sX1,Tt,x,i=[JtT0,xJts0,x1σj(s,Xst,x,0)]i.\displaystyle\displaystyle D_{j,s}{X}_{1,T}^{t,x,i}=[J^{0,x}_{t\to T}{J^{0,x}_{t\to s}}^{-1}\sigma_{j}(s,X_{s}^{t,x,0})]^{i}. (A.15)

Thus, we get

i1E[i1𝒯(X1,Tt,x)tThj1(t1)tt1hj2(t2)𝑑Wt2j2𝑑Wt1j1]\displaystyle\displaystyle\sum_{i_{1}}E[\partial_{i_{1}}{\cal T}({X}_{1,T}^{t,x})\int_{t}^{T}h_{j_{1}}(t_{1})\int_{t}^{t_{1}}h_{j_{2}}(t_{2})dW_{t_{2}}^{j_{2}}dW_{t_{1}}^{j_{1}}] (A.16)
=\displaystyle\displaystyle= i1,i2,i3E[i3i2i1𝒯(X1,Tt,x)]\displaystyle\displaystyle\sum_{i_{1},i_{2},i_{3}}E[\partial_{i_{3}}\partial_{i_{2}}\partial_{i_{1}}{\cal T}({X}_{1,T}^{t,x})] (A.17)
tT[JtT0,x(Jtt10,x)1σj1(t1,Xt1t,x,0)]i2hj1(t1)tt1[JtT0,x(Jtt20,x)1σj2(t2,Xt2t,x,0)]i3hj2(t2)𝑑t2𝑑t1.\displaystyle\displaystyle\int_{t}^{T}[J^{0,x}_{t\to T}(J^{0,x}_{t\to t_{1}})^{-1}\sigma_{j_{1}}(t_{1},X_{t_{1}}^{t,x,0})]^{i_{2}}h_{j_{1}}(t_{1})\int_{t}^{t_{1}}[J^{0,x}_{t\to T}(J^{0,x}_{t\to t_{2}})^{-1}\sigma_{j_{2}}(t_{2},X_{t_{2}}^{t,x,0})]^{i_{3}}h_{j_{2}}(t_{2})d{t_{2}}dt_{1}. (A.18)

Using the above calculation with (A.12), we have

i1E[i1𝒯(X1,Tt,x)εX2,Tt,x,i1]\displaystyle\displaystyle\sum_{i_{1}}E[\partial_{i_{1}}{\cal T}({X}_{1,T}^{t,x})\varepsilon X_{2,T}^{t,x,i_{1}}] (A.19)
=\displaystyle\displaystyle= εi1,i2,i3,j1,k1,k2E[i3i2i1𝒯(X1,Tt,x)]Ci1,i2,i3,j1(1),k1,k2(t,T,x)\displaystyle\displaystyle\varepsilon\sum_{i_{1},i_{2},i_{3},j_{1},k_{1},k_{2}}E[\partial_{i_{3}}\partial_{i_{2}}\partial_{i_{1}}{\cal T}({X}_{1,T}^{t,x})]C_{i_{1},i_{2},i_{3},j_{1}}^{(1),k_{1},k_{2}}(t,T,x) (A.20)
+εi1,i2,i3,j1,j2,k1,k2E[i3i2i1𝒯(X1,Tt,x)]Ci1,i2,i3,j1,j2(2),k1,k2(t,T,x)\displaystyle\displaystyle+\varepsilon\sum_{i_{1},i_{2},i_{3},j_{1},j_{2},k_{1},k_{2}}E[\partial_{i_{3}}\partial_{i_{2}}\partial_{i_{1}}{\cal T}({X}_{1,T}^{t,x})]C_{i_{1},i_{2},i_{3},j_{1},j_{2}}^{(2),k_{1},k_{2}}(t,T,x) (A.21)
+εi1,j1,j2,k1,k2E[i1𝒯(X1,Tt,x)]121k1=k2Ci1,j1,j2(3),k1,k2(t,T,x)\displaystyle\displaystyle+\varepsilon\sum_{i_{1},j_{1},j_{2},k_{1},k_{2}}E[\partial_{i_{1}}{\cal T}({X}_{1,T}^{t,x})]\frac{1}{2}1_{k_{1}=k_{2}}C_{i_{1},j_{1},j_{2}}^{(3),k_{1},k_{2}}(t,T,x) (A.22)
=\displaystyle\displaystyle= εi1,i2,i3,j1,k1,k2E[𝒯(X1,Tt,x)H(i1,i2,i3)(X1,Tt,x,1)]Ci1,i2,i3,j1(1),k1,k2(t,T,x)\displaystyle\displaystyle\varepsilon\sum_{i_{1},i_{2},i_{3},j_{1},k_{1},k_{2}}E[{\cal T}({X}_{1,T}^{t,x})H_{(i_{1},i_{2},i_{3})}({X}_{1,T}^{t,x},1)]C_{i_{1},i_{2},i_{3},j_{1}}^{(1),k_{1},k_{2}}(t,T,x) (A.23)
+εi1,i2,i3,j1,j2,k1,k2E[𝒯(X1,Tt,x)H(i1,i2,i3)(X1,Tt,x,1)]Ci1,i2,i3,j1,j2(2),k1,k2(t,T,x)\displaystyle\displaystyle+\varepsilon\sum_{i_{1},i_{2},i_{3},j_{1},j_{2},k_{1},k_{2}}E[{\cal T}({X}_{1,T}^{t,x})H_{(i_{1},i_{2},i_{3})}({X}_{1,T}^{t,x},1)]C_{i_{1},i_{2},i_{3},j_{1},j_{2}}^{(2),k_{1},k_{2}}(t,T,x) (A.24)
+εi1,j1,j2,k1,k2E[𝒯(X1,Tt,x)H(i1)(X1,Tt,x,1)]121k1=k2Ci1,j1,j2(3),k1,k2(t,T,x).\displaystyle\displaystyle+\varepsilon\sum_{i_{1},j_{1},j_{2},k_{1},k_{2}}E[{\cal T}({X}_{1,T}^{t,x})H_{(i_{1})}({X}_{1,T}^{t,x},1)]\frac{1}{2}1_{k_{1}=k_{2}}C_{i_{1},j_{1},j_{2}}^{(3),k_{1},k_{2}}(t,T,x). (A.25)

Therefore, we get (3.16) as:

𝒰1,(1)(t,x)=\displaystyle\displaystyle{\cal U}^{1,(1)}(t,x)= E[g(X¯Tt,x,ε)]\displaystyle\displaystyle E[g(\overline{X}_{T}^{t,x,\varepsilon})] (A.26)
+εi1,i2,i3,j1,k1,k2E[g(X¯Tt,x,ε)H(i1,i2,i3)(X1,Tt,x,1)]Ci1,i2,i3,j1(1),k1,k2(t,T,x)\displaystyle\displaystyle+\varepsilon\sum_{i_{1},i_{2},i_{3},j_{1},k_{1},k_{2}}E[g(\overline{X}_{T}^{t,x,\varepsilon})H_{(i_{1},i_{2},i_{3})}({X}_{1,T}^{t,x},1)]C_{i_{1},i_{2},i_{3},j_{1}}^{(1),k_{1},k_{2}}(t,T,x) (A.27)
+εi1,i2,i3,j1,j2,k1,k2E[g(X¯Tt,x,ε)H(i1,i2,i3)(X1,Tt,x,1)]Ci1,i2,i3,j1,j2(2),k1,k2(t,T,x)\displaystyle\displaystyle+\varepsilon\sum_{i_{1},i_{2},i_{3},j_{1},j_{2},k_{1},k_{2}}E[g(\overline{X}_{T}^{t,x,\varepsilon})H_{(i_{1},i_{2},i_{3})}({X}_{1,T}^{t,x},1)]C_{i_{1},i_{2},i_{3},j_{1},j_{2}}^{(2),k_{1},k_{2}}(t,T,x) (A.28)
+εi1,j1,j2,k1,k2E[g(X¯Tt,x,ε)H(i1)(X1,Tt,x,1)]121k1=k2Ci1,j1,j2(3),k1,k2(t,T,x).\displaystyle\displaystyle+\varepsilon\sum_{i_{1},j_{1},j_{2},k_{1},k_{2}}E[g(\overline{X}_{T}^{t,x,\varepsilon})H_{(i_{1})}({X}_{1,T}^{t,x},1)]\frac{1}{2}1_{k_{1}=k_{2}}C_{i_{1},j_{1},j_{2}}^{(3),k_{1},k_{2}}(t,T,x). (A.29)

Next, we give the representation (3.17). The function (/x)𝒰1σε\displaystyle(\partial/\partial x){\cal U}^{1}\sigma^{\varepsilon} given by

(/x)𝒰1σε(t,x)=E[(g)(XTt,x,ε)(/x)XTt,x,ε]εσ(t,x)\displaystyle\displaystyle(\partial/\partial x){\cal U}^{1}\sigma^{\varepsilon}(t,x)=E[(\nabla g)(X_{T}^{t,x,\varepsilon})(\partial/\partial x)X_{T}^{t,x,\varepsilon}]\varepsilon\sigma(t,x) (A.30)
=di=1d(ig)(XTt,x,0+εy)E[δy(GTt,x,ε)(/x)XTt,x,ε,i]dyεσ(t,x)\displaystyle\displaystyle=\int_{\mathbb{R}^{d}}\sum_{i=1}^{d}(\partial_{i}g)(X_{T}^{t,x,0}+\varepsilon y)E[\delta_{y}(G_{T}^{t,x,\varepsilon})(\partial/\partial x)X_{T}^{t,x,\varepsilon,i}]dy\varepsilon\sigma(t,x) (A.31)

is expanded as

𝒱1,(1)(t,x)\displaystyle\displaystyle{\cal V}^{1,(1)}(t,x) (A.32)
=1εE[i1g(XTt,x,0+εX1,Tt,x)H(i1)(X1,Tt,x,1)]JtT0,xεσ(t,x)\displaystyle\displaystyle=\frac{1}{\varepsilon}E[\sum_{i_{1}}g(X_{T}^{t,x,0}+\varepsilon X_{1,T}^{t,x})H_{(i_{1})}(X_{1,T}^{t,x},1)]J_{t\to T}^{0,x}\ \varepsilon\sigma(t,x) (A.33)
+E[g(XTt,x,0+εX1,Tt,x)H(i1)(X1,Tt,x,JtT1,x)]εσ(t,x)\displaystyle\displaystyle\quad+E[g(X_{T}^{t,x,0}+\varepsilon X_{1,T}^{t,x})H_{(i_{1})}(X_{1,T}^{t,x},J_{t\to T}^{1,x})]\ \varepsilon\sigma(t,x) (A.34)
+i1,i2E[g(XTt,x,0+εX1,Tt,x)H(i2)(X1,Tt,x,H(i1)(X1,Tt,x,X2,Tt,x))]JtT0,xεσ(t,x),\displaystyle\displaystyle\quad+\sum_{i_{1},i_{2}}E[g(X_{T}^{t,x,0}+\varepsilon X_{1,T}^{t,x})H_{(i_{2})}(X_{1,T}^{t,x},H_{(i_{1})}(X_{1,T}^{t,x},X_{2,T}^{t,x}))]J_{t\to T}^{0,x}\ \varepsilon\sigma(t,x), (A.35)

where the following relationship is taken into account:

H(i)(XTt,x,0+εX1,Tt,x,1)=H(i)(X1,Tt,x,1)/ε,i=1,,d.\displaystyle\displaystyle H_{(i)}(X_{T}^{t,x,0}+\varepsilon X_{1,T}^{t,x},1)=H_{(i)}({X}_{1,T}^{t,x},1)/\varepsilon,\ \ \ i=1,\cdots,d. (A.36)

Then, a calculation similar to (A.14), combined with (A.12), gives the representation (3.17).  \displaystyle\Box

Acknowledgements

This work is supported by JSPS KAKENHI (Grant Number 19K13736) and JST PRESTO (Grant Number JPMJPR2029), Japan.

References

  • [1] C. Beck, F. Hornung, M. Hutzenthaler, A. Jentzen and T. Kruse, Overcoming the curse of dimensionality in the numerical approximation of Allen-Cahn partial differential equations via truncated full-history recursive multilevel Picard approximations, Journal of Numerical Mathematics (2020)
  • [2] J. Berner, P. Grohs and A. Jentzen, Analysis of the generalization error: Empirical risk minimization over deep artificial neural networks overcomes the curse of dimensionality in the numerical approximation of Black–Scholes partial differential equations, SIAM Journal on Mathematics of Data Science, 2(3), 631-657 (2020)
  • [3] Weinan E, J. Han and A. Jentzen, Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations, Communications in Mathematics and Statistics, 5(4) 349-380 (2017)
  • [4] Weinan E, J. Han and A. Jentzen, Algorithms for Solving High Dimensional PDEs: From Nonlinear Monte Carlo to Machine Learning, arXiv (2020)
  • [5] D. Elbrächter, P. Grohs, A. Jentzen and C. Schwab, DNN expression rate analysis of high-dimensional PDEs: Application to option pricing, arXiv (2018)
  • [6] N. El Karoui, S. Peng and M.C. Quenez, Backward stochastic differential equations in finance, Mathematical Finance, 7(1), 1-71 (1997)
  • [7] M. Fujii and A. Takahashi, Analytical approximation for non-linear FBSDEs with perturbation scheme, International Journal of Theoretical and Applied Finance, (2011)
  • [8] M. Fujii and A. Takahashi, Solving backward stochastic differential equations with quadratic-growth drivers by connecting the short-term expansions, Stochastic Processes and their Applications, 129(5) (2019)
  • [9] M. Fujii, A. Takahashi and M. Takahashi, Asymptotic expansion as prior knowledge in deep learning method for high dimensional BSDEs, Asia-Pacific Financial Markets, (2019)
  • [10] A. Gnoatto, C. Reisinger and A. Picarelli, Deep xVA solver-A neural network based counterparty credit risk management framework, SSRN (2020)
  • [11] P. Grohs, A. Jentzen and D. Salimova, Deep neural network approximations for Monte Carlo algorithms, arXiv (2019)
  • [12] P. Grohs, F. Hornung, A. Jentzen and P. Zimmermann, Space-time error estimates for deep neural network approximations for differential equations, arXiv (2019)
  • [13] M.B. Giles, A. Jentzen and T. Welti, Generalised multilevel Picard approximations, arXiv (2020)
  • [14] J. Han and J. Long, Convergence of the Deep BSDE method for coupled FBSDEs, Probability, Uncertainty and Quantitative Risk, 5(5) (2020)
  • [15] J. Han, J. Lu and M. Zhou, Solving high-dimensional eigenvalue problems using deep neural networks: A diffusion Monte Carlo like approach, Journal of Computational Physics, 423, 109792 (2020)
  • [16] J. Han, L. Zhang and Weinan E, Solving many-electron Schrödinger equation using deep neural networks, Journal of Computational Physics, 399, 108929 (2019)
  • [17] F. Hornung, A. Jentzen and D. Salimova, Space-time deep neural network approximations for high-dimensional partial differential equations, arXiv (2020)
  • [18] M. Hutzenthaler, A. Jentzen, T. Kruse, T.A. Nguyen and P.V. Wurstemberger, Overcoming the curse of dimensionality in the numerical approximation of semilinear parabolic partial differential equations, arXiv (2018)
  • [19] N. Ikeda and S. Watanabe, Stochastic Differential Equations and Diffusion Processes, 2nd ed., North-Holland, Amsterdam, Kodansha, Tokyo (1989)
  • [20] Y. Iguchi and T. Yamada, Operator splitting around Euler-Maruyama scheme and high order discretization of heat kernels, ESAIM: Mathematical Modelling and Numerical Analysis, to appear (2020)
  • [21] N. Kunitomo and A. Takahashi, The asymptotic expansion approach to the valuation of interest rate contingent claims, Mathematical Finance, 11, 117-151 (2001)
  • [22] N. Kunitomo and A. Takahashi, On validity of the asymptotic expansion approach in contingent claim analysis, Annals of Applied Probability, 13(3), 914-952 (2003)
  • [23] Y. Li, J. Lu and A. Mao, Variational training of neural network approximations of solution maps for physical models, Journal of Computational Physics, 409, 109338 (2020)
  • [24] P. Malliavin and A. Thalmaier, Stochastic Calculus of Variations in Mathematical Finance, Springer (2006)
  • [25] R. Matsuoka, A. Takahashi and Y. Uchida, A new computational scheme for computing Greeks by the asymptotic expansion approach, Asia Pacific Financial Markets, 11, 393-430 (2006)
  • [26] R. Naito and T. Yamada, A third-order weak approximation of multidimensional Itô stochastic differential equations, Monte Carlo Methods and Applications, 25(2), 97-120 (2019)
  • [27] R. Naito and T. Yamada, An acceleration scheme for deep learning-based BSDE solver using weak expansions, International Journal of Financial Engineering, (2020)
  • [28] D. Nualart, The Malliavin Calculus and Related Topics, Springer (2006)
  • [29] Y. Okano and T. Yamada, A control variate method for weak approximation of SDEs via discretization of numerical error of asymptotic expansion, Monte Carlo Methods and Applications, 25(3) (2019)
  • [30] J. Sirignano and K. Spiliopoulos, DGM: A deep learning algorithm for solving partial differential equations, Journal of Computational Physics, 375, 1339-1364 (2018)
  • [31] A. Takahashi, An asymptotic expansion approach to pricing financial contingent claims, Asia-Pacific Financial Markets, 6(2), 115-151 (1999)
  • [32] A. Takahashi, Asymptotic expansion approach in finance, Large Deviations and Asymptotic Methods in Finance (P. Friz, J. Gatheral, A. Gulisashvili, A. Jacquier and J. Teichmann ed.), Springer Proceedings in Mathematics & Statistics (2015)
  • [33] A. Takahashi and T. Yamada, An asymptotic expansion with push-down of Malliavin weights, SIAM Journal on Financial Mathematics, 3, 95-136 (2012)
  • [34] A. Takahashi and T. Yamada, An asymptotic expansion of forward-backward SDEs with a perturbed driver, International Journal of Financial Engineering, 2(2) (2015)
  • [35] A. Takahashi and T. Yamada, A weak approximation with asymptotic expansion and multidimensional Malliavin weights, Annals of Applied Probability, 26(2), 818-856 (2016)
  • [36] A. Takahashi and N. Yoshida, Monte Carlo simulation with asymptotic method, Journal of the Japan Statistical Society 35(2), 171-203 (2005)
  • [37] K. Tokutome and T. Yamada, Acceleration of automatic differentiation of solutions to parabolic partial differential equations: a higher order discretization, Numerical Algorithms, 86, 593-635 (2021)
  • [38] S. Watanabe, Analysis of Wiener functionals (Malliavin calculus) and its applications to heat kernels, Annals of Probability, 15, 1-39 (1987)
  • [39] T. Yamada, An arbitrary high order weak approximation of SDE and Malliavin Monte Carlo: application to probability distribution functions, SIAM Journal on Numerical Analysis, 57(2), 563-591 (2019)
  • [40] T. Yamada and K. Yamamoto, Second order discretization of Bismut-Elworthy-Li formula: application to sensitivity analysis, SIAM/ASA Journal on Uncertainty Quantification, 7(1), 143-173 (2019)
  • [41] Y. Zang, G. Bao, X. Ye and H. Zhou, Weak adversarial networks for high-dimensional partial differential equations, Journal of Computational Physics, 411 (2020)
  • [42] J. Zhang, Backward Stochastic Differential Equations, Springer (2017)