
A new efficient approximation scheme for solving high-dimensional semilinear PDEs: control variate method for Deep BSDE solver

Akihiko Takahashi (The University of Tokyo, Tokyo, Japan), Yoshifumi Tsuchida (Hitotsubashi University, Tokyo, Japan) and Toshihiro Yamada (Hitotsubashi University, Tokyo, Japan; Japan Science and Technology Agency (JST), Tokyo, Japan)
(First version: January 21, 2021,   This version: January 30, 2021)
Abstract

This paper introduces a new approximation scheme for solving high-dimensional semilinear partial differential equations (PDEs) and backward stochastic differential equations (BSDEs). First, we decompose a target semilinear PDE (BSDE) into two parts, namely “dominant” linear and “small” nonlinear PDEs. Then, we employ a Deep BSDE solver with a new control variate method to solve those PDEs, where approximations based on an asymptotic expansion technique are effectively applied to the linear part and also used as control variates for the nonlinear part. Moreover, our theoretical result indicates that errors of the proposed method become much smaller than those of the original Deep BSDE solver. Finally, we show numerical experiments to demonstrate the validity of our method, which is consistent with the theoretical result in this paper.

Keywords. Deep learning, Semilinear partial differential equations, Backward stochastic differential equations, Deep BSDE solver, Asymptotic expansion, Control variate method

1 Introduction

High-dimensional semilinear partial differential equations (PDEs) are often used to describe various complex, large-scale phenomena appearing in physics, applied mathematics, economics and finance. Such PDEs typically have the form:

$$\frac{\partial}{\partial t}u(t,x)+\mathcal{L}u(t,x)+f\big(t,x,u(t,x),\partial_{x}u(t,x)\sigma(t,x)\big)=0,\quad t<T,\ x\in\mathbb{R}^{d},\tag{1.1}$$
$$u(T,x)=g(x),\quad x\in\mathbb{R}^{d},$$

where $f$ is a nonlinear function and $\mathcal{L}$ is a second-order differential operator of the type:

$$\mathcal{L}\varphi(t,x)=\sum_{i}\mu^{i}(t,x)\partial_{x_{i}}\varphi(t,x)+\frac{1}{2}\sum_{i,j}[\sigma\sigma^{\top}]_{i,j}(t,x)\partial_{x_{i}}\partial_{x_{j}}\varphi(t,x),\tag{1.2}$$

and the dimension $d$ is assumed to be high. Since such nonlinear PDEs have no closed-form solutions, especially in high-dimensional cases, we have to rely on numerical schemes. Classical methods such as finite differences and finite elements fail in high-dimensional cases due to the exponential growth of their complexity. In the last two decades, probabilistic approaches based on Monte Carlo methods for backward stochastic differential equations (BSDEs) have been studied, since solutions of semilinear PDEs can be represented by those of corresponding BSDEs through the nonlinear Feynman-Kac formula (see Zhang (2017) [42] for instance).
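Concretely, the correspondence reads as follows (a standard statement of the nonlinear Feynman-Kac formula, recalled here for the reader's convenience): if $X$ is the diffusion with generator $\mathcal{L}$ and $(Y,Z)$ solves the BSDE

$$-dY_{t}=f(t,X_{t},Y_{t},Z_{t})dt-Z_{t}dW_{t},\qquad Y_{T}=g(X_{T}),$$

then, under suitable regularity conditions, $Y_{t}=u(t,X_{t})$ and $Z_{t}=\partial_{x}u(t,X_{t})\sigma(t,X_{t})$, so that $u(0,x)=Y_{0}$ when $X_{0}=x$.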

In Weinan E et al. (2017) [3], a novel computational scheme called the Deep BSDE method is proposed. In the Deep BSDE method, a stochastic target problem is considered with a forward-discretization scheme of the related BSDE, and the resulting control problem is solved with a deep learning algorithm. The Deep BSDE method has opened the door to the tractability of higher-dimensional problems, enabling us to solve high-dimensional semilinear PDEs within realistic computation time. Recently, notable related works, mostly based on neural networks, have developed new methods for solving various types of high-dimensional PDEs; see [2][4][5][10][11][12][14][15][16][17][18][23][30][41] for example.

While high-dimensional semilinear PDEs can be feasibly solved by the Deep BSDE method, the deviation of its estimated value from the true one is not necessarily small within reasonable computation time. Constructing an acceleration scheme for the Deep BSDE method is therefore desirable.

Fujii et al. (2019) [9] proposed an improved scheme for the Deep BSDE method. They used prior knowledge obtained by an asymptotic expansion method for a target BSDE as a fast approximation, and found that numerical errors become small in accordance with the fast decrease in the values of the corresponding loss function. The scheme reduces the processing load of the original Deep BSDE solver. For details of the asymptotic expansion method, the key technique applied in their article, see Takahashi (1999, 2015) [31][32], Kunitomo and Takahashi (2001, 2003) [21][22] and references therein. Moreover, Naito and Yamada (2020) [27] presented an extended scheme of Fujii et al. (2019) [9] by applying the backward Euler scheme for a BSDE with a good initial detection of the solution to a target PDE so that the Deep BSDE method works more efficiently.

In the current work, we develop a new deep learning-based approximation for solving high-dimensional semilinear PDEs by extending the schemes in Weinan E et al. (2017) [3], Fujii et al. (2019) [9] and Naito and Yamada (2020) [27]. In particular, we propose an efficient control variate method for the Deep BSDE solver in order to obtain more accurate and stable approximations. Let us briefly explain the strategy considered in this paper. We first decompose the semilinear PDE (1.1) into two parts,

$u(t,x)=\mathcal{U}^{1}(t,x)+\mathcal{U}^{2}(t,x)$, as follows:

$$\frac{\partial}{\partial t}\mathcal{U}^{1}(t,x)+\mathcal{L}\,\mathcal{U}^{1}(t,x)=0,\quad t<T,\ x\in\mathbb{R}^{d},\tag{1.3}$$
$$\mathcal{U}^{1}(T,x)=g(x),\quad x\in\mathbb{R}^{d},$$

and

$$\frac{\partial}{\partial t}\mathcal{U}^{2}(t,x)+\mathcal{L}\,\mathcal{U}^{2}(t,x)+f\big(t,x,\mathcal{U}^{1}(t,x)+\mathcal{U}^{2}(t,x),\partial_{x}\mathcal{U}^{1}(t,x)\sigma(t,x)+\partial_{x}\mathcal{U}^{2}(t,x)\sigma(t,x)\big)=0,\quad t<T,\ x\in\mathbb{R}^{d},\tag{1.4}$$
$$\mathcal{U}^{2}(T,x)=0,\quad x\in\mathbb{R}^{d}.$$

Here, we remark that the solution $u$ of the semilinear PDE (1.1) is given by the sum of the solutions $\mathcal{U}^{1}$ and $\mathcal{U}^{2}$ of the PDEs (1.3) and (1.4), respectively. Also, we note that $\mathcal{U}^{1}$ solves the “dominant” linear PDE, while $\mathcal{U}^{2}$ solves the “small” residual nonlinear PDE with null terminal condition, whose magnitude is governed by the driver $(t,x,y,z)\mapsto f(t,x,\mathcal{U}^{1}(t,x)+y,\partial_{x}\mathcal{U}^{1}(t,x)\sigma(t,x)+z)$ and which is generally expected to have only a small nonlinear effect on the solution $u$. Consequently, the decomposition of the target $u(0,\cdot)$ is represented as follows:

$$u(0,x)=\underbrace{\mathcal{U}^{1}(0,x)}_{\text{“dominant” linear PDE part}}+\underbrace{\mathcal{U}^{2}(0,x)}_{\text{“small” nonlinear PDE part}},\quad x\in\mathbb{R}^{d}.\tag{1.5}$$

We next approximate

  1. $\mathcal{U}^{1}$ by an asymptotic expansion method, denoted by $\mathcal{U}^{1,\mathrm{Asymp}}$;

  2. $\mathcal{U}^{2}$ by the Deep BSDE method, denoted by $\mathcal{U}^{2,\mathrm{Deep}}$.

We expect that $\mathcal{U}^{1,\mathrm{Asymp}}$ in the approximation

$$u(0,x)\approx\mathcal{U}^{1,\mathrm{Asymp}}(0,x)+\mathcal{U}^{2,\mathrm{Deep}}(0,x),\quad x\in\mathbb{R}^{d},\tag{1.6}$$

becomes a control variate. Furthermore, $\mathcal{U}^{1,\mathrm{Asymp}}$ and $\partial_{x}\mathcal{U}^{1,\mathrm{Asymp}}\sigma$ in the approximate driver $(t,x,y,z)\mapsto f(t,x,\mathcal{U}^{1,\mathrm{Asymp}}(t,x)+y,\partial_{x}\mathcal{U}^{1,\mathrm{Asymp}}(t,x)\sigma(t,x)+z)$ of $\mathcal{U}^{2,\mathrm{Deep}}$ act as a second layer of control variates. The current work shows how well the proposed method works as a new deep learning-based approximation, in both theoretical and numerical aspects.

The organization of this paper is as follows: the next section briefly introduces the Deep BSDE solver and acceleration schemes with asymptotic expansions; Section 3 explains our proposed method with the main theoretical result; and Section 4 presents our numerical scheme with its experiments.

2 Deep BSDE solver and acceleration scheme with asymptotic expansion

Let $T>0$ and let $(\Omega,\mathcal{F},\{\mathcal{F}_{t}\}_{0\leq t\leq T},P)$ be a filtered probability space equipped with a $d$-dimensional Brownian motion $W=\{(W_{t}^{1},\cdots,W_{t}^{d})\}_{0\leq t\leq T}$ and a square-integrable $\mathbb{R}^{d}$-valued random variable $\xi$ independent of $W$. The filtration $\{\mathcal{F}_{t}\}_{0\leq t\leq T}$ is generated by $\{W_{t}+\xi\}_{0\leq t\leq T}$. Under this setting we consider the following FBSDE:

$$dX_{t}^{\varepsilon}=\mu(t,X_{t}^{\varepsilon})dt+\varepsilon\sigma(t,X_{t}^{\varepsilon})dW_{t},\quad X_{0}^{\varepsilon}=\xi,\tag{2.1}$$
$$-dY_{t}^{\varepsilon,\alpha}=\alpha f(t,X_{t}^{\varepsilon},Y_{t}^{\varepsilon,\alpha},Z_{t}^{\varepsilon,\alpha})dt-Z_{t}^{\varepsilon,\alpha}dW_{t},\quad Y_{T}^{\varepsilon,\alpha}=g(X_{T}^{\varepsilon}),\tag{2.2}$$

where $\mu$ is an $\mathbb{R}^{d}$-valued function on $[0,T]\times\mathbb{R}^{d}$, $\sigma$ is an $\mathbb{R}^{d\otimes d}$-valued function on $[0,T]\times\mathbb{R}^{d}$, $f:[0,T]\times\mathbb{R}^{d}\times\mathbb{R}\times\mathbb{R}^{d}\to\mathbb{R}$ and $g:\mathbb{R}^{d}\to\mathbb{R}$ are functions such that the FBSDE has a unique solution, and $\varepsilon,\alpha\in(0,1)$ are small parameters. Here, we assume that $\mu$ and $\sigma$ are bounded, smooth in $x$, and have bounded derivatives of all orders. Also, $f$ is a uniformly Lipschitz continuous function with Lipschitz constant $C_{\mathrm{Lip}}[f]$ and of at most linear growth in the variables $x,y,z$. The function $g$ is assumed to be of $C^{2}_{b}$-class. The functions $\mu,\sigma,f$ are uniformly Hölder-$1/2$ continuous with respect to $t$. Furthermore, we impose the condition that there is $\varepsilon_{0}>0$ such that $\sigma(t,x)\sigma(t,x)^{\top}\geq\varepsilon_{0}I$ for all $t\in[0,T]$ and $x\in\mathbb{R}^{d}$. We sometimes omit the superscripts ${}^{\varepsilon}$ or ${}^{\varepsilon,\alpha}$ if no confusion arises.

The corresponding semilinear PDE is given by

$$\partial_{t}u(t,x)+\mathcal{L}^{\varepsilon}u(t,x)+f^{\alpha}(t,x,u(t,x),\partial_{x}u(t,x)\sigma^{\varepsilon}(t,x))=0,\quad t<T,\tag{2.3}$$
$$u(T,x)=g(x),$$

where $\sigma^{\varepsilon}=\varepsilon\sigma$, $f^{\alpha}=\alpha f$, $\partial_{x}=(\partial_{x_{1}},\cdots,\partial_{x_{d}})=(\partial/\partial x_{1},\cdots,\partial/\partial x_{d})$ and $\mathcal{L}^{\varepsilon}$ is the generator:

$$\mathcal{L}^{\varepsilon}=\sum_{i=1}^{d}\mu^{i}(t,x)\frac{\partial}{\partial x_{i}}+\frac{1}{2}\sum_{i_{1},i_{2}=1}^{d}\sigma^{\varepsilon,i_{1}}(t,x)\sigma^{\varepsilon,i_{2}}(t,x)\frac{\partial^{2}}{\partial x_{i_{1}}\partial x_{i_{2}}}.\tag{2.4}$$

The purpose of this paper is to estimate

$$u(0,X_{0}^{\varepsilon})=Y_{0}^{\varepsilon,\alpha}\tag{2.5}$$

for high-dimensional FBSDEs/semilinear PDEs. In particular, we introduce an approximation with a deep BSDE solver to propose an efficient control variate method for solving semilinear PDEs. To explain how our method works as a new scheme, we briefly review the Deep BSDE method proposed in Weinan E et al. (2017) [3] and the approximation method developed by Fujii et al. (2019) [9].

2.1 Deep BSDE method by Weinan E et al. (2017)

In Weinan E et al. (2017) [3], the authors considered the minimization problem of the loss function:

$$\inf_{Y_{0}^{\varepsilon,\alpha,(n)},\,Z^{\varepsilon,\alpha,(n)}}\Big\|g(\bar{X}_{T}^{\varepsilon,(n)})-Y_{T}^{\varepsilon,\alpha,(n)}\Big\|_{2}^{2}\tag{2.6}$$

where $\|\cdot\|_{2}=E[|\cdot|^{2}]^{1/2}$, subject to

$$Y_{t}^{\varepsilon,\alpha,(n)}=Y^{\varepsilon,\alpha,(n)}_{0}-\int_{0}^{t}f^{\alpha}(s,\bar{X}_{s}^{\varepsilon,(n)},Y_{s}^{\varepsilon,\alpha,(n)},Z_{s}^{\varepsilon,\alpha,(n)})ds+\int_{0}^{t}Z_{s}^{\varepsilon,\alpha,(n)}dW_{s},\tag{2.7}$$

where $\bar{X}^{\varepsilon,(n)}$ is the continuous Euler-Maruyama scheme with $n$ discretization time steps:

$$\bar{X}_{t}^{\varepsilon,(n)}=\xi+\int_{0}^{t}\mu(\varphi(s),\bar{X}_{\varphi(s)}^{\varepsilon,(n)})ds+\int_{0}^{t}\sigma^{\varepsilon}(\varphi(s),\bar{X}_{\varphi(s)}^{\varepsilon,(n)})dW_{s},\quad t\geq 0,\tag{2.8}$$

with $\varphi(s)=\max\{kT/n;\ kT/n\leq s\}$. They solved the problem by using a deep learning algorithm and checked the effectiveness of the method for nonlinear BSDEs/PDEs even when the dimension $d$ is high. The method is known as the Deep BSDE solver.
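To make the scheme concrete, here is a minimal sketch of the forward loop in PyTorch (our illustration, not the authors' code; `mu`, `sigma_eps`, `f_alpha`, `g` are user-supplied callables for $\mu$, $\sigma^{\varepsilon}$, $f^{\alpha}$, $g$; a diagonal $\sigma$ is assumed for brevity, and `z_nets` is a list of $n$ small networks, one per time step):

```python
import torch

def deep_bsde_loss(y0, z_nets, mu, sigma_eps, f_alpha, g, xi, T, n, batch):
    """Monte Carlo estimate of the loss (2.6) for the Deep BSDE solver.

    y0 : trainable tensor of shape (1, 1), the candidate for Y_0.
    xi : tensor of shape (d,), the initial value of the forward SDE.
    """
    dt = T / n
    x = xi.expand(batch, -1).clone()           # Euler-Maruyama state X-bar
    y = y0.expand(batch, 1)                    # candidate Y_0, broadcast over paths
    for k in range(n):
        t = k * dt
        z = z_nets[k](x)                       # Z_{t_k} approximated by a network
        dw = torch.randn_like(x) * dt ** 0.5   # Brownian increments
        # forward discretization (2.7) of the BSDE ...
        y = y - f_alpha(t, x, y, z) * dt + (z * dw).sum(1, keepdim=True)
        # ... and Euler-Maruyama step (2.8) for the forward SDE
        x = x + mu(t, x) * dt + sigma_eps(t, x) * dw
    return ((g(x) - y) ** 2).mean()            # loss (2.6)
```

Minimizing this loss over `y0` and the network weights with a stochastic-gradient method then yields the quantity $Y^{\varepsilon,\alpha,(n),\ast}_{0}$ appearing in (2.9) below.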

Then, we have

$$Y_{0}^{\varepsilon,\alpha}\approx Y^{\varepsilon,\alpha,(n),\ast}_{0},\tag{2.9}$$

where $Y_{0}^{\varepsilon,\alpha,(n),\ast}$ is obtained by solving (2.6), which is justified by the following estimate shown in Han and Long (2020) [14].

Theorem 1 (Han and Long (2020)).

There exists $C>0$ such that

$$E[|Y_{0}^{\varepsilon,\alpha}-Y_{0}^{\varepsilon,\alpha,(n)}|^{2}]\leq C\frac{1}{n}+C\big\|g(\bar{X}_{T}^{\varepsilon,(n)})-Y_{T}^{\varepsilon,\alpha,(n)}\big\|_{2}^{2},\tag{2.10}$$

for $n\geq 1$.

2.2 An approximation method by Fujii et al. (2019)

In Fujii et al. (2019) [9], the authors considered the problem

$$\inf_{\widetilde{Y}_{0}^{\varepsilon,\alpha,(n)},\,\widetilde{Z}^{\mathrm{Res},\varepsilon,\alpha,(n)}}\Big\|g(\bar{X}_{T}^{\varepsilon,(n)})-\widetilde{Y}_{T}^{\varepsilon,\alpha,(n)}\Big\|_{2}^{2}\tag{2.11}$$

subject to

$$\begin{aligned}
\widetilde{Y}_{t}^{\varepsilon,\alpha,(n)}&=\widetilde{Y}^{\varepsilon,\alpha,(n)}_{0}-\int_{0}^{t}f^{\alpha}(s,\bar{X}_{s}^{\varepsilon,(n)},\widetilde{Y}_{s}^{\varepsilon,\alpha,(n)},\widehat{Z}_{s}^{\varepsilon,\alpha,(n)}+\widetilde{Z}_{s}^{\mathrm{Res},\varepsilon,\alpha,(n)})ds\\
&\quad+\int_{0}^{t}\{\widehat{Z}_{s}^{\varepsilon,\alpha,(n)}+\widetilde{Z}_{s}^{\mathrm{Res},\varepsilon,\alpha,(n)}\}dW_{s},
\end{aligned}\tag{2.12}$$

where $\widehat{Z}^{\varepsilon,\alpha,(n)}$ is prior knowledge of $Z$, easily computed by an asymptotic expansion method, and they solve the minimization problem with respect to $\widetilde{Y}_{0}^{\varepsilon,\alpha,(n)}$ and $\widetilde{Z}^{\mathrm{Res},\varepsilon,\alpha,(n)}$ by the Deep BSDE solver. The authors showed that the scheme gives better accuracy than the original Deep BSDE solver. Furthermore, Naito and Yamada (2020) [27] proposed an acceleration scheme extending the method of Fujii et al. (2019) [9] with a good initial detection of $Y_{0}$ and the backward Euler scheme for $Z$. They confirmed that the numerical error of the method becomes smaller even when the number of iteration steps is small; in other words, the scheme gives faster computation for nonlinear BSDEs/PDEs than the original Deep BSDE method [3] and Fujii et al. (2019) [9].
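In this scheme only the residual part of $Z$ is learned; schematically, the loop body of the sketch in Section 2.1 changes as follows (again our illustration; `z_hat` is an assumed closed-form callable implementing the asymptotic-expansion prior $\widehat{Z}$, and `z_res_nets` are the residual networks):

```python
# inside the time loop of deep_bsde_loss:
z = z_hat(t, x) + z_res_nets[k](x)   # Z = Z-hat (prior) + Z-tilde^Res (learned)
y = y - f_alpha(t, x, y, z) * dt + (z * dw).sum(1, keepdim=True)
```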

3 New method

We propose a new method as an extension of Fujii et al. (2019) [9] and Naito and Yamada (2020) [27]. The new scheme is regarded as a control variate method for solving high-dimensional nonlinear BSDEs/PDEs, motivated by the perturbation scheme in Takahashi and Yamada (2015) [34]. In the following, let us explain the proposed method. We first decompose $(Y^{\varepsilon,\alpha},Z^{\varepsilon,\alpha})$ as $Y^{\varepsilon,\alpha}=\mathcal{Y}^{1,\varepsilon}+\alpha\mathcal{Y}^{2,\varepsilon}$ and $Z^{\varepsilon,\alpha}=\mathcal{Z}^{1,\varepsilon}+\alpha\mathcal{Z}^{2,\varepsilon}$ by introducing

$$-d\mathcal{Y}_{t}^{1,\varepsilon}=-\mathcal{Z}_{t}^{1,\varepsilon}dW_{t},\quad\mathcal{Y}_{T}^{1,\varepsilon}=g(X_{T}^{\varepsilon}),\tag{3.1}$$
$$-d\mathcal{Y}_{t}^{2,\varepsilon}=f(t,X_{t}^{\varepsilon},\mathcal{Y}_{t}^{1,\varepsilon}+\alpha\mathcal{Y}_{t}^{2,\varepsilon},\mathcal{Z}_{t}^{1,\varepsilon}+\alpha\mathcal{Z}_{t}^{2,\varepsilon})dt-\mathcal{Z}_{t}^{2,\varepsilon}dW_{t},\quad\mathcal{Y}_{T}^{2,\varepsilon}=0.\tag{3.2}$$

Here, we note that $(\mathcal{Y}^{1,\varepsilon},\mathcal{Z}^{1,\varepsilon})$ is the solution of a linear BSDE and that $(\alpha\mathcal{Y}^{2,\varepsilon},\alpha\mathcal{Z}^{2,\varepsilon})$ can be interpreted as the solution of a “residual (nonlinear) BSDE”.
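Indeed, summing (3.1) and $\alpha$ times (3.2) recovers (2.2):

$$-d(\mathcal{Y}_{t}^{1,\varepsilon}+\alpha\mathcal{Y}_{t}^{2,\varepsilon})=\alpha f(t,X_{t}^{\varepsilon},\mathcal{Y}_{t}^{1,\varepsilon}+\alpha\mathcal{Y}_{t}^{2,\varepsilon},\mathcal{Z}_{t}^{1,\varepsilon}+\alpha\mathcal{Z}_{t}^{2,\varepsilon})dt-(\mathcal{Z}_{t}^{1,\varepsilon}+\alpha\mathcal{Z}_{t}^{2,\varepsilon})dW_{t},$$

with terminal value $g(X_{T}^{\varepsilon})$, which is exactly the dynamics of $(Y^{\varepsilon,\alpha},Z^{\varepsilon,\alpha})$ with driver $f^{\alpha}=\alpha f$.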

Let $\mathcal{U}^{1}$ be the solution of the linear PDE corresponding to $(\mathcal{Y}^{1,\varepsilon},\mathcal{Z}^{1,\varepsilon})$:

$$\partial_{t}\mathcal{U}^{1}(t,x)+\mathcal{L}^{\varepsilon}\mathcal{U}^{1}(t,x)=0,\quad t<T,\tag{3.3}$$
$$\mathcal{U}^{1}(T,x)=g(x).$$

3.1 Deep BSDE solver for explicitly solvable $(\mathcal{Y}^{1,\varepsilon},\mathcal{Z}^{1,\varepsilon})$

We start with the case where $(\mathcal{Y}^{1,\varepsilon},\mathcal{Z}^{1,\varepsilon})$ is explicitly solvable in closed form, in order to explain the motivation of the paper. Even in this case, $(\alpha\mathcal{Y}^{2,\varepsilon},\alpha\mathcal{Z}^{2,\varepsilon})$ cannot be obtained in closed form due to the nonlinearity of the driver $f$. Hence, we apply the Deep BSDE method to the residual nonlinear BSDE for $(\alpha\mathcal{Y}^{2,\varepsilon},\alpha\mathcal{Z}^{2,\varepsilon})$. Then, the following will be an approximation for the target $Y_{0}^{\varepsilon,\alpha}$:

$$Y_{0}^{\varepsilon,\alpha}\approx\mathcal{Y}_{0}^{1,\varepsilon}+\alpha\widetilde{\mathcal{Y}}_{0}^{2,\varepsilon,(n)\ast},\tag{3.4}$$

where $\widetilde{\mathcal{Y}}^{2,\varepsilon,(n)\ast}$ is obtained as a solution of the following problem based on the Deep BSDE method with closed-form functions for $\mathcal{Y}^{1,\varepsilon}$ and $\mathcal{Z}^{1,\varepsilon}$:

$$\inf_{\widetilde{\mathcal{Y}}^{2,\varepsilon,(n)}_{0},\,\widetilde{\mathcal{Z}}^{2,\varepsilon,(n)}}\Big\|\widetilde{\mathcal{Y}}_{T}^{2,\varepsilon,(n)}\Big\|_{2}^{2}\tag{3.5}$$

subject to

$$\widetilde{\mathcal{Y}}_{t}^{2,\varepsilon,(n)}=\widetilde{\mathcal{Y}}^{2,\varepsilon,(n)}_{0}-\int_{0}^{t}f(s,\bar{X}_{s}^{\varepsilon,(n)},\overline{\mathcal{Y}}_{s}^{1,\varepsilon,(n)}+\alpha\widetilde{\mathcal{Y}}_{s}^{2,\varepsilon,(n)},\overline{\mathcal{Z}}_{s}^{1,\varepsilon,(n)}+\alpha\widetilde{\mathcal{Z}}_{s}^{2,\varepsilon,(n)})ds+\int_{0}^{t}\widetilde{\mathcal{Z}}_{s}^{2,\varepsilon,(n)}dW_{s},\tag{3.6}$$

where

$$\overline{\mathcal{Y}}_{t}^{1,\varepsilon,(n)}=\mathcal{U}^{1}(t,\bar{X}^{\varepsilon,(n)}_{t}),\quad\overline{\mathcal{Z}}_{t}^{1,\varepsilon,(n)}=(\partial_{x}\mathcal{U}^{1}\sigma^{\varepsilon})(t,\bar{X}^{\varepsilon,(n)}_{t}),\quad t\in[0,T],\tag{3.7}$$

with the continuous Euler-Maruyama scheme $\bar{X}^{\varepsilon,(n)}=\{\bar{X}^{\varepsilon,(n)}_{t}\}_{t\geq 0}(=\bar{X}^{(n)})$ and the closed-form functions $\mathcal{U}^{1}$ and $(\partial_{x}\mathcal{U}^{1}\sigma^{\varepsilon})$.
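For illustration, the problem (3.5)-(3.6) only modifies the Deep BSDE sketch of Section 2.1 in the driver, the terminal target and the control variates (3.7); a minimal PyTorch version, assuming closed-form callables `U1` and `dU1_sigma` for $\mathcal{U}^{1}$ and $\partial_{x}\mathcal{U}^{1}\sigma^{\varepsilon}$, might read:

```python
import torch

def residual_bsde_loss(y0, z_nets, mu, sigma_eps, f, alpha,
                       U1, dU1_sigma, xi, T, n, batch):
    """Monte Carlo estimate of the loss (3.5) for the residual BSDE (3.6)."""
    dt = T / n
    x = xi.expand(batch, -1).clone()
    y = y0.expand(batch, 1)                    # candidate for Y-tilde^2_0
    for k in range(n):
        t = k * dt
        y1, z1 = U1(t, x), dU1_sigma(t, x)     # control variates (3.7)
        z2 = z_nets[k](x)                      # residual Z-tilde^2 (learned)
        dw = torch.randn_like(x) * dt ** 0.5
        y = y - f(t, x, y1 + alpha * y, z1 + alpha * z2) * dt \
              + (z2 * dw).sum(1, keepdim=True)
        x = x + mu(t, x) * dt + sigma_eps(t, x) * dw
    return (y ** 2).mean()                     # null terminal: ||Y-tilde^2_T||^2
```

The approximation (3.4) is then $\mathcal{U}^{1}(0,\xi)+\alpha\,y_{0}^{\ast}$ with the optimized $y_{0}^{\ast}$.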

In this case, we have the following error estimate with a small $\alpha$-effect in the residual nonlinear BSDE. The proof will be shown as a part of the one for Theorem 3 in the next subsection; see in particular the sentences after (3.24) and (3.33).

Theorem 2.

There exists $C>0$ such that

$$E[|Y_{0}^{\varepsilon,\alpha}-\{\mathcal{Y}_{0}^{1,\varepsilon}+\alpha\widetilde{\mathcal{Y}}_{0}^{2,\varepsilon,(n)}\}|^{2}]\leq\alpha^{2}C\Big\{\frac{1}{n}+\big\|\widetilde{\mathcal{Y}}^{2,\varepsilon,(n)}_{T}\big\|_{2}^{2}\Big\},\tag{3.8}$$

for all $\varepsilon,\alpha\in(0,1)$ and $n\geq 1$.

3.2 General case: Deep BSDE solver for unsolvable $(\mathcal{Y}^{1,\varepsilon},\mathcal{Z}^{1,\varepsilon})$

In most cases, $(\mathcal{Y}^{1,\varepsilon},\mathcal{Z}^{1,\varepsilon})$ is not solvable in closed form, particularly when the dimension $d$ is high. In such cases, we need to approximate $(\mathcal{Y}^{1,\varepsilon},\mathcal{Z}^{1,\varepsilon})$. However, constructing tractable approximations of $\mathcal{Y}^{1,\varepsilon}_{t}=\mathcal{U}^{1}(t,X^{\varepsilon}_{t})$, $t\geq 0$, and especially of $\mathcal{Z}^{1,\varepsilon}_{t}=(\partial_{x}\mathcal{U}^{1}\sigma^{\varepsilon})(t,X^{\varepsilon}_{t})$, $t\geq 0$, is not an easy task because the latter involves the gradient of $\mathcal{U}^{1}$. A possible solution is to use an asymptotic expansion approach with stochastic calculus. We prepare some notation from Malliavin calculus. Let $\mathbb{D}^{\infty}$ be the space of smooth Wiener functionals in the sense of Malliavin. For a nondegenerate $F\in(\mathbb{D}^{\infty})^{d}$, $G\in\mathbb{D}^{\infty}$ and a multi-index $\gamma$, there exists $H_{\gamma}(F,G)\in\mathbb{D}^{\infty}$ such that $E[\partial^{\gamma}\varphi(F)G]=E[\varphi(F)H_{\gamma}(F,G)]$ for all $\varphi\in C^{\infty}_{b}(\mathbb{R}^{d})$. See Chapters V.8-10 in Ikeda and Watanabe (1989) [20] and Chapters 1-2 in Nualart (2006) [28] for the details.

First, we give approximations of $\mathcal{Y}^{1,\varepsilon}$ and $\mathcal{Z}^{1,\varepsilon}$. For $m\in\mathbb{N}$, we approximate $\mathcal{U}^{1}$ and $\partial_{x}\mathcal{U}^{1}\sigma^{\varepsilon}$ with asymptotic expansions up to the $m$-th order and Malliavin calculus, by applying or extending the methods in [25][32][33][34][40]. Let $X^{t,x,\varepsilon}=\{X^{t,x,\varepsilon}_{s}\}_{s\geq t}$ be the solution of

$$X_{s}^{t,x,\varepsilon}=x+\int_{t}^{s}\mu(u,X_{u}^{t,x,\varepsilon})du+\varepsilon\int_{t}^{s}\sigma(u,X_{u}^{t,x,\varepsilon})dW_{u},\quad x\in\mathbb{R}^{d},\ s\geq t.\tag{3.9}$$

Then the $d$-dimensional forward process $X^{t,x,\varepsilon}=(X^{t,x,\varepsilon,1},\cdots,X^{t,x,\varepsilon,d})$ can be expanded as follows: for $i=1,\cdots,d$,

$$X_{s}^{t,x,\varepsilon,i}\sim X^{t,x,0,i}_{s}+\varepsilon X^{t,x,i}_{1,s}+\varepsilon^{2}X^{t,x,i}_{2,s}+\cdots\quad\text{in }\mathbb{D}^{\infty},\tag{3.10}$$

for some $X^{t,x,i}_{k,s}\in\mathbb{D}^{\infty}$, $k\in\mathbb{N}$, which are independent of $\varepsilon$ (see Watanabe (1987) [38] for example). Here, $X^{t,x,0,i}_{s}$ is the solution of $X_{s}^{t,x,0,i}=x^{i}+\int_{t}^{s}\mu^{i}(u,X_{u}^{t,x,0})du$, and $X^{t,x,i}_{k,s}=\frac{1}{k!}\partial^{k}/\partial\varepsilon^{k}X_{s}^{t,x,\varepsilon,i}|_{\varepsilon=0}$, $k\in\mathbb{N}$.
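For instance, the first-order term is obtained by differentiating (3.9) in $\varepsilon$ at $\varepsilon=0$ (a standard computation), which yields the linear SDE

$$dX^{t,x}_{1,s}=\partial_{x}\mu(s,X^{t,x,0}_{s})X^{t,x}_{1,s}\,ds+\sigma(s,X^{t,x,0}_{s})\,dW_{s},\qquad X^{t,x}_{1,t}=0,$$

so that $X^{t,x}_{1,s}=J^{0,x}_{t\to s}\int_{t}^{s}(J^{0,x}_{t\to u})^{-1}\sigma(u,X^{t,x,0}_{u})dW_{u}$ is a centered Gaussian random variable, where $J^{0,x}_{t\to s}$ denotes the Jacobian of the deterministic flow $x\mapsto X^{t,x,0}_{s}$ appearing in Proposition 2 below.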

Let us define $\overline{X}^{t,x,\varepsilon}_{s}=X^{t,x,0}_{s}+\varepsilon X^{t,x}_{1,s}$ for $s\leq T$. The functions $\mathcal{U}^{1}$ and $\partial_{x}\mathcal{U}^{1}\sigma^{\varepsilon}$ are approximated by the asymptotic expansion as follows.

Proposition 1.

Let $T>0$ and $m\in\mathbb{N}$. There is a map $[0,T)\times\mathbb{R}^{d}\times(0,1)\ni(t,x,\varepsilon)\mapsto\mathcal{W}^{t,x,\varepsilon,(m)}_{T}\in\mathbb{D}^{\infty}$ such that there exist $C(T,m)>0$ and $p(m)\geq m+1$ with

$$|\mathcal{U}^{1}(t,x)-\mathcal{U}^{1,(m)}(t,x)|\leq C(T,m)\varepsilon^{m+1}(T-t)^{p(m)/2},\tag{3.11}$$

for all $\varepsilon\in(0,1)$, $t<T$ and $x\in\mathbb{R}^{d}$, where $\mathcal{U}^{1,(m)}$ is given by

$$\mathcal{U}^{1,(m)}(t,x)=E[g(\overline{X}^{t,x,\varepsilon}_{T})\mathcal{W}^{t,x,\varepsilon,(m)}_{T}],\quad t<T,\ x\in\mathbb{R}^{d},\tag{3.12}$$

which satisfies $\mathcal{U}^{1,(m)}(t,\cdot)\in C_{b}^{2}(\mathbb{R}^{d})$, $t<T$. Also, there is a map $[0,T)\times\mathbb{R}^{d}\times(0,1)\ni(t,x,\varepsilon)\mapsto\mathcal{Z}^{t,x,\varepsilon,(m)}_{T}\in\mathbb{D}^{\infty}$ such that there exist $K(T,m)>0$ and $q(m)\geq m$ with

$$|\partial_{x}\mathcal{U}^{1}(t,x)\sigma^{\varepsilon}(t,x)-\mathcal{V}^{1,(m)}(t,x)|\leq K(T,m)\varepsilon^{m+1}(T-t)^{q(m)/2},\tag{3.13}$$

for all $\varepsilon\in(0,1)$, $t<T$ and $x\in\mathbb{R}^{d}$, where $\mathcal{V}^{1,(m)}$ is given by

$$\mathcal{V}^{1,(m)}(t,x)=E[g(\overline{X}^{t,x,\varepsilon}_{T})\mathcal{Z}^{t,x,\varepsilon,(m)}_{T}],\quad t<T,\ x\in\mathbb{R}^{d},\tag{3.14}$$

which satisfies $\mathcal{V}^{1,(m)}(t,\cdot)\in C_{b}^{1}(\mathbb{R}^{d})$, $t<T$.

Proof. See Appendix A.1.

For example, the stochastic weight $\mathcal{W}^{t,x,\varepsilon,(m)}_{T}$ has the following general representation:

$$\mathcal{W}^{t,x,\varepsilon,(m)}_{T}=1+\sum_{j=1}^{m}\varepsilon^{j}\sum_{k=1}^{j}\sum_{\substack{\beta_{1}+\cdots+\beta_{k}=j,\\ \beta_{i}\geq 1}}\ \sum_{\gamma^{(k)}=(\gamma_{1},\cdots,\gamma_{k})\in\{1,\cdots,d\}^{k}}\frac{1}{k!}H_{\gamma^{(k)}}\Big(X^{t,x}_{1,T},\prod_{\ell=1}^{k}X^{t,x,\gamma_{\ell}}_{\beta_{\ell}+1,T}\Big).\tag{3.15}$$

See Section 2.2 in Takahashi and Yamada (2012) [33] and Section 6.1 in Takahashi (2015) [32] for more details. The functions $\mathcal{U}^{1,(m)}$ and $\mathcal{V}^{1,(m)}$ have more explicit representations. Actually, when $m=1$, $\mathcal{U}^{1,(1)}$ and $\mathcal{V}^{1,(1)}$ have the following forms, which are easily computed by taking advantage of the fact that $\overline{X}_{T}^{t,x,\varepsilon}$ (and $X_{1,T}^{t,x}$) is a Gaussian random variable. In particular, the representation of $\mathcal{V}^{1,(1)}$, the multidimensional expansion of $\partial_{x}\mathcal{U}^{1}\sigma^{\varepsilon}$, is new; it is an extension of [25][40].
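For instance, in our reading of the Gaussian integration-by-parts formulas behind these expressions, when $\Sigma=\mathrm{Cov}(X_{1,T}^{t,x})$ is nondegenerate the first two weights reduce to Hermite-type polynomials,

$$H_{(i)}(X_{1,T}^{t,x},1)=\big[\Sigma^{-1}X_{1,T}^{t,x}\big]_{i},\qquad H_{(i,j)}(X_{1,T}^{t,x},1)=\big[\Sigma^{-1}X_{1,T}^{t,x}\big]_{i}\big[\Sigma^{-1}X_{1,T}^{t,x}\big]_{j}-[\Sigma^{-1}]_{ij},$$

so that each expectation in (3.16)-(3.17) below is an integral of $g$ against a Gaussian density multiplied by a polynomial, computable by quadrature or, for specific $g$, in closed form.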

Proposition 2.

For $t<T$, $x\in\mathbb{R}^{d}$,

$$\begin{aligned}
\mathcal{U}^{1,(1)}(t,x)&=E[g(\overline{X}_{T}^{t,x,\varepsilon})]\\
&\quad+\varepsilon\sum_{i_{1},i_{2},i_{3},j_{1}=1}^{d}\sum_{k_{1},k_{2}=1}^{d}E[g(\overline{X}_{T}^{t,x,\varepsilon})H_{(i_{1},i_{2},i_{3})}(X_{1,T}^{t,x},1)]\,C_{i_{1},i_{2},i_{3},j_{1}}^{(1),k_{1},k_{2}}(t,T,x)\\
&\quad+\varepsilon\sum_{i_{1},i_{2},i_{3},j_{1},j_{2}=1}^{d}\sum_{k_{1},k_{2}=1}^{d}E[g(\overline{X}_{T}^{t,x,\varepsilon})H_{(i_{1},i_{2},i_{3})}(X_{1,T}^{t,x},1)]\,C_{i_{1},i_{2},i_{3},j_{1},j_{2}}^{(2),k_{1},k_{2}}(t,T,x)\\
&\quad+\frac{\varepsilon}{2}\sum_{i_{1},j_{1},j_{2}=1}^{d}\sum_{k_{1},k_{2}=1}^{d}E[g(\overline{X}_{T}^{t,x,\varepsilon})H_{(i_{1})}(X_{1,T}^{t,x},1)]\,\mathbb{1}_{k_{1}=k_{2}}\,C_{i_{1},j_{1},j_{2}}^{(3),k_{1},k_{2}}(t,T,x),
\end{aligned}\tag{3.16}$$

$$\begin{aligned}
\mathcal{V}^{1,(1)}(t,x)&=\sum_{i_{1}=1}^{d}E[g(\overline{X}_{T}^{t,x,\varepsilon})H_{(i_{1})}(X_{1,T}^{t,x},1)]\,[J_{t\to T}^{0,x}]^{i_{1}}\sigma(t,x)\\
&\quad+\varepsilon\sum_{i_{1},i_{2},i_{3},i_{4},j_{1}=1}^{d}\sum_{k_{1},k_{2}=1}^{d}E[g(\overline{X}_{T}^{t,x,\varepsilon})H_{(i_{1},i_{2},i_{3},i_{4})}(X_{1,T}^{t,x},1)]\,[J_{t\to T}^{0,x}]^{i_{1}}C_{i_{2},i_{3},i_{4},j_{1}}^{(1),k_{1},k_{2}}(t,T,x)\sigma(t,x)\\
&\quad+\varepsilon\sum_{i_{1},i_{2},i_{3},i_{4},j_{1},j_{2}=1}^{d}\sum_{k_{1},k_{2}=1}^{d}E[g(\overline{X}_{T}^{t,x,\varepsilon})H_{(i_{1},i_{2},i_{3},i_{4})}(X_{1,T}^{t,x},1)]\,[J_{t\to T}^{0,x}]^{i_{1}}C_{i_{2},i_{3},i_{4},j_{1},j_{2}}^{(2),k_{1},k_{2}}(t,T,x)\sigma(t,x)\\
&\quad+\frac{\varepsilon}{2}\sum_{i_{1},i_{2},j_{1},j_{2}=1}^{d}\sum_{k_{1},k_{2}=1}^{d}E[g(\overline{X}_{T}^{t,x,\varepsilon})H_{(i_{1},i_{2})}(X_{1,T}^{t,x},1)]\,[J_{t\to T}^{0,x}]^{i_{1}}\mathbb{1}_{k_{1}=k_{2}}\,C_{i_{2},j_{1},j_{2}}^{(3),k_{1},k_{2}}(t,T,x)\sigma(t,x)\\
&\quad+\varepsilon\sum_{i_{1},i_{2},j_{1},j_{2}=1}^{d}\sum_{k_{1}=1}^{d}E[g(\overline{X}_{T}^{t,x,\varepsilon})H_{(i_{1},i_{2})}(X_{1,T}^{t,x},1)]\,[J_{t\to T}^{0,x}]_{j_{1}}^{i_{1}}C_{i_{2},j_{1},j_{2}}^{(4),k_{1}}(t,T,x)\sigma(t,x)\\
&\quad+\varepsilon\sum_{i_{1},i_{2},j_{1}=1}^{d}\sum_{k_{1}=1}^{d}E[g(\overline{X}_{T}^{t,x,\varepsilon})H_{(i_{1},i_{2})}(X_{1,T}^{t,x},1)]\,[J_{t\to T}^{0,x}]_{j_{1}}^{i_{1}}C_{i_{2},j_{1}}^{(5),k_{1}}(t,T,x)\sigma(t,x),
\end{aligned}\tag{3.17}$$

and

$$\begin{aligned}
C_{i_{1},i_{2},i_{3},j_{1}}^{(1),k_{1},k_{2}}(t,T,x)&=\int_{t}^{T}\int_{t}^{t_{1}}a^{i_{3}}_{k_{2}}(t,t_{2},t_{1},x)a^{i_{2}}_{k_{1}}(t,t_{1},T,x)b^{i_{1},j_{1}}_{k_{1}}(t,t_{1},T,x)a^{j_{1}}_{k_{2}}(t,t_{2},t_{1},x)dt_{2}dt_{1},\\
C_{i_{1},i_{2},i_{3},j_{1},j_{2}}^{(2),k_{1},k_{2}}(t,T,x)&=\int_{t}^{T}\int_{t}^{t_{1}}\int_{t}^{t_{2}}a^{i_{3}}_{k_{1}}(t,t_{3},t_{2},x)a^{i_{2}}_{k_{2}}(t,t_{2},t_{1},x)c^{i_{1},j_{1},j_{2}}(t,t_{1},T,x)a^{j_{1}}_{k_{2}}(t,t_{2},t_{1},x)a^{j_{2}}_{k_{1}}(t,t_{3},t_{1},x)dt_{3}dt_{2}dt_{1},\\
C_{i_{1},j_{1},j_{2}}^{(3),k_{1},k_{2}}(t,T,x)&=\int_{t}^{T}\int_{t}^{t_{1}}c^{i_{1},j_{1},j_{2}}(t,t_{1},T,x)a^{j_{2}}_{k_{2}}(t,t_{2},t_{1},x)a^{j_{1}}_{k_{1}}(t,t_{2},t_{1},x)dt_{2}dt_{1},\\
C_{i_{1},j_{1},j_{2}}^{(4),k_{1}}(t,T,x)&=\int_{t}^{T}\int_{t}^{t_{1}}a^{i_{1}}_{k_{1}}(t,t_{2},T,x)[\partial^{2}\mu(t_{1},X_{t_{1}}^{t,x,0})]^{j_{1}}_{j_{2}}a^{j_{2}}_{k_{1}}(t,t_{2},t_{1},x)dt_{2}dt_{1},\\
C_{i_{1},j_{1}}^{(5),k_{1}}(t,T,x)&=\int_{t}^{T}a_{k_{1}}^{i_{1}}(t,t_{1},T,x)\partial_{j_{1}}\sigma_{k_{1}}(t_{1},X_{t_{1}}^{t,x,0})dt_{1},
\end{aligned}$$

with

$$\begin{aligned}
a^{i}_{k}(t,s,u,x)&:=\sum_{j_{1},j_{2}=1}^{d}[J_{t\to u}^{0,x}]^{i}_{j_{1}}[(J_{t\to s}^{0,x})^{-1}]^{j_{1}}_{j_{2}}\sigma_{k}^{j_{2}}(s,X_{s}^{t,x,0}),\\
b^{i,j_{3}}_{k}(t,s,u,x)&:=\sum_{j_{1},j_{2}=1}^{d}[J_{t\to u}^{0,x}]^{i}_{j_{1}}[(J_{t\to s}^{0,x})^{-1}]^{j_{1}}_{j_{2}}\partial_{j_{3}}\sigma_{k}^{j_{2}}(s,X_{s}^{t,x,0}),\\
c^{i,j_{3},j_{4}}(t,s,u,x)&:=\sum_{j_{1},j_{2}=1}^{d}[J_{t\to u}^{0,x}]^{i}_{j_{1}}[(J_{t\to s}^{0,x})^{-1}]^{j_{1}}_{j_{2}}[\partial^{2}\mu^{j_{2}}(s,X_{s}^{t,x,0})]^{j_{3}}_{j_{4}}.
\end{aligned}$$

Here, $[\,\cdot\,]^{i}_{j}$ denotes the entry in the $i$-th row and $j$-th column of a matrix, $\partial_{j}\varphi(\cdot)$ is the $j$-th element of $\partial\varphi(\cdot)=[\partial\varphi/\partial x_{i}(\cdot)]_{1\leq i\leq d}$, and $[\partial^{2}\varphi(\cdot)]^{i}_{j}=\partial^{2}\varphi(\cdot)/\partial x_{i}\partial x_{j}$ ($1\leq i,j\leq d$).

Proof. See Appendix A.2.

Using $\mathcal{U}^{1,(m)}$ and $\mathcal{V}^{1,(m)}$, we define

$$\overline{\mathcal{Y}}_{t}^{1,\varepsilon,(m)}=\mathcal{U}^{1,(m)}(t,X_{t}^{\varepsilon}),\quad\overline{\mathcal{Z}}_{t}^{1,\varepsilon,(m)}=\mathcal{V}^{1,(m)}(t,X_{t}^{\varepsilon}),\quad t\geq 0.\tag{3.18}$$

Furthermore, we compute $\mathcal{Y}^{2,\varepsilon}$ and $\mathcal{Z}^{2,\varepsilon}$ numerically by the Deep BSDE method, solving

$$\inf_{\mathcal{Y}^{2,\varepsilon,(m,n)}_{0},\,\mathcal{Z}^{2,\varepsilon,(m,n)}}\Big\|\mathcal{Y}_{T}^{2,\varepsilon,(m,n)}\Big\|_{2}^{2}\tag{3.19}$$

subject to

$$\mathcal{Y}_{t}^{2,\varepsilon,(m,n)}=\mathcal{Y}^{2,\varepsilon,(m,n)}_{0}-\int_{0}^{t}f(s,\bar{X}_{s}^{\varepsilon,(n)},\overline{\mathcal{Y}}_{s}^{1,\varepsilon,(m,n)}+\alpha\mathcal{Y}_{s}^{2,\varepsilon,(m,n)},\overline{\mathcal{Z}}_{s}^{1,\varepsilon,(m,n)}+\alpha\mathcal{Z}_{s}^{2,\varepsilon,(m,n)})ds+\int_{0}^{t}\mathcal{Z}_{s}^{2,\varepsilon,(m,n)}dW_{s},\tag{3.20}$$

where

$$\overline{\mathcal{Y}}_{t}^{1,\varepsilon,(m,n)}=\mathcal{U}^{1,(m)}(t,\bar{X}^{\varepsilon,(n)}_{t}),\quad\overline{\mathcal{Z}}_{t}^{1,\varepsilon,(m,n)}=\mathcal{V}^{1,(m)}(t,\bar{X}^{\varepsilon,(n)}_{t}),\quad t\in[0,T],\tag{3.21}$$

with the continuous Euler-Maruyama scheme $\bar{X}^{\varepsilon,(n)}=\{\bar{X}^{\varepsilon,(n)}_{t}\}_{t\geq 0}(=\bar{X}^{(n)})$.
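Numerically, the scheme (3.19)-(3.21) reuses the residual solver sketched in Section 3.1 verbatim, with the closed forms replaced by the order-$m$ expansion approximations (hypothetical callables `U1m` and `V1m` implementing $\mathcal{U}^{1,(m)}$ and $\mathcal{V}^{1,(m)}$ of Propositions 1 and 2):

```python
# inside the time loop of residual_bsde_loss:
y1, z1 = U1m(t, x), V1m(t, x)   # control variates (3.21) from the expansion
```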

We now state the main theoretical result of this paper.

Theorem 3.

There exists $C>0$ such that

$$E[|Y_{0}^{\varepsilon,\alpha}-\{\overline{\mathcal{Y}}_{0}^{1,\varepsilon,(m)}+\alpha\mathcal{Y}_{0}^{2,\varepsilon,(m,n)}\}|^{2}]\leq C\varepsilon^{2(m+1)}+\alpha^{2}C\Big\{\varepsilon^{2(m+1)}+\frac{1}{n}+\big\|\mathcal{Y}^{2,\varepsilon,(m,n)}_{T}\big\|_{2}^{2}\Big\},\tag{3.22}$$

for all $\varepsilon,\alpha\in(0,1)$ and $n\geq 1$.

Proof. In the proof, we use a generic constant $C>0$ which varies from line to line. Let $(\mathcal{Y}^{2,\varepsilon,(m)},\mathcal{Z}^{2,\varepsilon,(m)})$ be the solution of the following BSDE:

$$\mathcal{Y}_{t}^{2,\varepsilon,(m)}=\int_{t}^{T}f(s,X_{s}^{\varepsilon},\overline{\mathcal{Y}}_{s}^{1,\varepsilon,(m)}+\alpha\mathcal{Y}_{s}^{2,\varepsilon,(m)},\overline{\mathcal{Z}}_{s}^{1,\varepsilon,(m)}+\alpha\mathcal{Z}_{s}^{2,\varepsilon,(m)})ds-\int_{t}^{T}\mathcal{Z}_{s}^{2,\varepsilon,(m)}dW_{s}.\tag{3.23}$$

Then we have

$$\begin{aligned}
&E[|Y_{0}^{\varepsilon,\alpha}-\{\overline{\mathcal{Y}}_{0}^{1,\varepsilon,(m)}+\alpha\mathcal{Y}_{0}^{2,\varepsilon,(m,n)}\}|^{2}]\\
&=E[|Y_{0}^{\varepsilon,\alpha}-\{\overline{\mathcal{Y}}_{0}^{1,\varepsilon,(m)}+\alpha\mathcal{Y}_{0}^{2,\varepsilon,(m)}\}+\alpha\mathcal{Y}_{0}^{2,\varepsilon,(m)}-\alpha\mathcal{Y}_{0}^{2,\varepsilon,(m,n)}|^{2}]\\
&\leq CE[|Y_{0}^{\varepsilon,\alpha}-\{\overline{\mathcal{Y}}_{0}^{1,\varepsilon,(m)}+\alpha\mathcal{Y}_{0}^{2,\varepsilon,(m)}\}|^{2}]+\alpha^{2}CE[|\mathcal{Y}_{0}^{2,\varepsilon,(m)}-\mathcal{Y}_{0}^{2,\varepsilon,(m,n)}|^{2}].
\end{aligned}\tag{3.24}$$

First, we estimate the term $E[|Y_{0}^{\varepsilon,\alpha}-\{\overline{\mathcal{Y}}_{0}^{1,\varepsilon,(m)}+\alpha\mathcal{Y}_{0}^{2,\varepsilon,(m)}\}|^{2}]$. We note that this term vanishes in (3.8), i.e. in the error estimate of Theorem 2 for the case where $(\mathcal{Y}^{1,\varepsilon},\mathcal{Z}^{1,\varepsilon})$ is explicitly solvable in closed form.

Since we have

$$Y_{0}^{\varepsilon,\alpha}=E_{X_{0}}[g(X_{T}^{\varepsilon})]+\alpha E_{X_{0}}\Big[\int_{0}^{T}f(s,X_{s}^{\varepsilon},Y_{s}^{\varepsilon,\alpha},Z_{s}^{\varepsilon,\alpha})ds\Big]\tag{3.25}$$

and

$$\overline{\mathcal{Y}}_{0}^{1,\varepsilon,(m)}=E_{X_{0}}[g(\overline{X}^{0,\cdot,\varepsilon}_{T})\mathcal{W}^{0,\cdot,\varepsilon,(m)}_{T}],\tag{3.26}$$
$$\mathcal{Y}^{2,\varepsilon,(m)}_{0}=E_{X_{0}}\Big[\int_{0}^{T}f(s,X_{s}^{\varepsilon},\overline{\mathcal{Y}}_{s}^{1,\varepsilon,(m)}+\alpha\mathcal{Y}_{s}^{2,\varepsilon,(m)},\overline{\mathcal{Z}}_{s}^{1,\varepsilon,(m)}+\alpha\mathcal{Z}_{s}^{2,\varepsilon,(m)})ds\Big],\tag{3.27}$$

it holds that

$$\begin{aligned}
&E[|Y_{0}^{\varepsilon,\alpha}-\{\overline{\mathcal{Y}}_{0}^{1,\varepsilon,(m)}+\alpha\mathcal{Y}_{0}^{2,\varepsilon,(m)}\}|^{2}]\\
&\leq CE[|E_{X_{0}}[g(X_{T}^{\varepsilon})]-E_{X_{0}}[g(\overline{X}^{0,\cdot,\varepsilon}_{T})\mathcal{W}^{0,\cdot,\varepsilon,(m)}_{T}]|^{2}]\\
&\quad+CE\Big[\Big|E_{X_{0}}\Big[\int_{0}^{T}\alpha f(s,X_{s}^{\varepsilon},Y_{s}^{\varepsilon,\alpha},Z_{s}^{\varepsilon,\alpha})ds\Big]-E_{X_{0}}\Big[\int_{0}^{T}\alpha f(s,X_{s}^{\varepsilon},\overline{\mathcal{Y}}_{s}^{1,\varepsilon,(m)}+\alpha\mathcal{Y}_{s}^{2,\varepsilon,(m)},\overline{\mathcal{Z}}_{s}^{1,\varepsilon,(m)}+\alpha\mathcal{Z}_{s}^{2,\varepsilon,(m)})ds\Big]\Big|^{2}\Big]\\
&\leq C\varepsilon^{2(m+1)}+C\alpha^{2}C_{\mathrm{Lip}}[f]^{2}\int_{0}^{T}E[|\mathcal{Y}_{s}^{1,\varepsilon}-\overline{\mathcal{Y}}_{s}^{1,\varepsilon,(m)}|^{2}]ds+C\alpha^{2}C_{\mathrm{Lip}}[f]^{2}\int_{0}^{T}E[|\mathcal{Z}_{s}^{1,\varepsilon}-\overline{\mathcal{Z}}_{s}^{1,\varepsilon,(m)}|^{2}]ds\\
&\quad+C\alpha^{2}C_{\mathrm{Lip}}[f]^{2}\int_{0}^{T}E[|\alpha\mathcal{Y}_{s}^{2,\varepsilon}-\alpha\mathcal{Y}_{s}^{2,\varepsilon,(m)}|^{2}]ds+C\alpha^{2}C_{\mathrm{Lip}}[f]^{2}\int_{0}^{T}E[|\alpha\mathcal{Z}_{s}^{2,\varepsilon}-\alpha\mathcal{Z}_{s}^{2,\varepsilon,(m)}|^{2}]ds.
\end{aligned}\tag{3.28}$$

Here, the estimates

$$\int_{0}^{T}E[|\mathcal{Y}_{s}^{1,\varepsilon}-\overline{\mathcal{Y}}_{s}^{1,\varepsilon,(m)}|^{2}]ds\leq C\varepsilon^{2(m+1)},\tag{3.29}$$
$$\int_{0}^{T}E[|\mathcal{Z}_{s}^{1,\varepsilon}-\overline{\mathcal{Z}}_{s}^{1,\varepsilon,(m)}|^{2}]ds\leq C\varepsilon^{2(m+1)},\tag{3.30}$$

are obtained by (3.11) and (3.13). Also, by Theorem 4.2.3 in Zhang (2017) [42], we have

$$\begin{aligned}
&\int_{0}^{T}E[|\alpha\mathcal{Y}_{s}^{2,\varepsilon}-\alpha\mathcal{Y}_{s}^{2,\varepsilon,(m)}|^{2}]ds+\int_{0}^{T}E[|\alpha\mathcal{Z}_{s}^{2,\varepsilon}-\alpha\mathcal{Z}_{s}^{2,\varepsilon,(m)}|^{2}]ds\\
&\leq CE\Big[\int_{0}^{T}|\alpha f(s,X_{s}^{\varepsilon},\mathcal{Y}_{s}^{1,\varepsilon}+\alpha\mathcal{Y}_{s}^{2,\varepsilon},\mathcal{Z}_{s}^{1,\varepsilon}+\alpha\mathcal{Z}_{s}^{2,\varepsilon})-\alpha f(s,X_{s}^{\varepsilon},\overline{\mathcal{Y}}_{s}^{1,\varepsilon,(m)}+\alpha\mathcal{Y}_{s}^{2,\varepsilon},\overline{\mathcal{Z}}_{s}^{1,\varepsilon,(m)}+\alpha\mathcal{Z}_{s}^{2,\varepsilon})|^{2}ds\Big]\\
&\leq C\alpha^{2}C_{\mathrm{Lip}}[f]^{2}\Big\{\int_{0}^{T}E[|\mathcal{Y}_{s}^{1,\varepsilon}-\overline{\mathcal{Y}}_{s}^{1,\varepsilon,(m)}|^{2}]ds+\int_{0}^{T}E[|\mathcal{Z}_{s}^{1,\varepsilon}-\overline{\mathcal{Z}}_{s}^{1,\varepsilon,(m)}|^{2}]ds\Big\}\\
&\leq C\alpha^{2}\varepsilon^{2(m+1)},
\end{aligned}\tag{3.31}$$

where the estimates (3.29) and (3.30) are applied in the last inequality. Therefore, we get

$$E[|Y_{0}^{\varepsilon,\alpha}-\{\overline{\mathcal{Y}}_{0}^{1,\varepsilon,(m)}+\alpha\mathcal{Y}_{0}^{2,\varepsilon,(m)}\}|^{2}]\leq C\varepsilon^{2(m+1)}+C\alpha^{2}\varepsilon^{2(m+1)}.\tag{3.32}$$

Next, we estimate

$$E[|\mathcal{Y}_{0}^{2,\varepsilon,(m)}-\mathcal{Y}_{0}^{2,\varepsilon,(m,n)}|^{2}]\tag{3.33}$$

in (3.24). We note that only this term appears in (3.8), i.e. in the error estimate of Theorem 2 for the case where $(\mathcal{Y}^{1,\varepsilon},\mathcal{Z}^{1,\varepsilon})$ is explicitly solvable in closed form.

Since we have

$$\begin{aligned}
&\mathcal{Y}_{0}^{2,\varepsilon,(m)}-\mathcal{Y}^{2,\varepsilon,(m,n)}_{0}\\
&=\int_{0}^{T}f(s,X_{s}^{\varepsilon},\overline{\mathcal{Y}}_{s}^{1,\varepsilon,(m)}+\alpha\mathcal{Y}_{s}^{2,\varepsilon,(m)},\overline{\mathcal{Z}}_{s}^{1,\varepsilon,(m)}+\alpha\mathcal{Z}_{s}^{2,\varepsilon,(m)})ds-\int_{0}^{T}\mathcal{Z}_{s}^{2,\varepsilon,(m)}dW_{s}\\
&\quad-\mathcal{Y}_{T}^{2,\varepsilon,(m,n)}-\int_{0}^{T}f(s,\bar{X}_{s}^{\varepsilon,(n)},\overline{\mathcal{Y}}_{s}^{1,\varepsilon,(m,n)}+\alpha\mathcal{Y}_{s}^{2,\varepsilon,(m,n)},\overline{\mathcal{Z}}_{s}^{1,\varepsilon,(m,n)}+\alpha\mathcal{Z}_{s}^{2,\varepsilon,(m,n)})ds\\
&\quad+\int_{0}^{T}\mathcal{Z}_{s}^{2,\varepsilon,(m,n)}dW_{s},
\end{aligned}\tag{3.34}$$

the upper bound of $E[|\mathcal{Y}_{0}^{2,\varepsilon,(m)}-\mathcal{Y}_{0}^{2,\varepsilon,(m,n)}|^{2}]$ can be decomposed as

$$\begin{aligned}
&E[|\mathcal{Y}_{0}^{2,\varepsilon,(m)}-\mathcal{Y}_{0}^{2,\varepsilon,(m,n)}|^{2}]\\
&\leq\|\mathcal{Y}^{2,\varepsilon,(m,n)}_{T}\|_{2}^{2}+\int_{0}^{T}E[|\mathcal{Z}_{s}^{2,\varepsilon,(m)}-\mathcal{Z}_{s}^{2,\varepsilon,(m,n)}|^{2}]ds\\
&\quad+C_{\mathrm{Lip}}[f]^{2}\times E\Big[\int_{0}^{T}|X_{s}^{\varepsilon}-\bar{X}_{s}^{\varepsilon,(n)}|^{2}ds+\int_{0}^{T}|\overline{\mathcal{Y}}_{s}^{1,\varepsilon,(m)}-\overline{\mathcal{Y}}_{s}^{1,\varepsilon,(m,n)}|^{2}ds+\int_{0}^{T}|\alpha\mathcal{Y}_{s}^{2,\varepsilon,(m)}-\alpha\mathcal{Y}_{s}^{2,\varepsilon,(m,n)}|^{2}ds\\
&\qquad+\int_{0}^{T}|\overline{\mathcal{Z}}_{s}^{1,\varepsilon,(m)}-\overline{\mathcal{Z}}_{s}^{1,\varepsilon,(m,n)}|^{2}ds+\int_{0}^{T}|\alpha\mathcal{Z}_{s}^{2,\varepsilon,(m)}-\alpha\mathcal{Z}_{s}^{2,\varepsilon,(m,n)}|^{2}ds\Big].
\end{aligned}\tag{3.35}$$

Then, the following holds:

$$\int_{0}^{T}E[|\overline{\mathcal{Y}}_{s}^{1,\varepsilon,(m)}-\overline{\mathcal{Y}}_{s}^{1,\varepsilon,(m,n)}|^{2}]ds\leq CE\Big[\int_{0}^{T}|X_{s}^{\varepsilon}-\bar{X}_{s}^{\varepsilon,(n)}|^{2}ds\Big],\tag{3.36}$$
$$\int_{0}^{T}E[|\overline{\mathcal{Z}}_{s}^{1,\varepsilon,(m)}-\overline{\mathcal{Z}}_{s}^{1,\varepsilon,(m,n)}|^{2}]ds\leq CE\Big[\int_{0}^{T}|X_{s}^{\varepsilon}-\bar{X}_{s}^{\varepsilon,(n)}|^{2}ds\Big],\tag{3.37}$$

since for all $t<T$, $\mathcal{U}^{1,(m)}(t,\cdot)$ and $\mathcal{V}^{1,(m)}(t,\cdot)$ are in $C^{2}_{b}$ and $C^{1}_{b}$, respectively. Thus, we have

$$\begin{aligned}
E[|\mathcal{Y}_{0}^{2,\varepsilon,(m)}-\mathcal{Y}_{0}^{2,\varepsilon,(m,n)}|^{2}]&\leq\|\mathcal{Y}^{2,\varepsilon,(m,n)}_{T}\|_{2}^{2}+C\times\Big\{\sup_{t\in[0,T]}\big(E[|X_{t}^{\varepsilon}-\bar{X}^{\varepsilon,(n)}_{t}|^{2}]+E[|\mathcal{Y}_{t}^{2,\varepsilon,(m)}-\mathcal{Y}_{t}^{2,\varepsilon,(m,n)}|^{2}]\big)\\
&\qquad+\int_{0}^{T}E[|\mathcal{Z}_{s}^{2,\varepsilon,(m)}-\mathcal{Z}_{s}^{2,\varepsilon,(m,n)}|^{2}]ds\Big\}.
\end{aligned}\tag{3.38}$$

By Theorem 1 of Han and Long (2020) [14], it holds that

$$\sup_{t\in[0,T]}\big(E[|X_{t}^{\varepsilon}-\bar{X}^{\varepsilon,(n)}_{t}|^{2}]+E[|\mathcal{Y}_{t}^{2,\varepsilon,(m)}-\mathcal{Y}_{t}^{2,\varepsilon,(m,n)}|^{2}]\big)+\int_{0}^{T}E[|\mathcal{Z}_{s}^{2,\varepsilon,(m)}-\mathcal{Z}_{s}^{2,\varepsilon,(m,n)}|^{2}]ds\leq C\Big\{\frac{T}{n}+\big\|\mathcal{Y}^{2,\varepsilon,(m,n)}_{T}\big\|_{2}^{2}\Big\}.\tag{3.39}$$

Therefore, we get

$$E[|\mathcal{Y}_{0}^{2,\varepsilon,(m)}-\mathcal{Y}_{0}^{2,\varepsilon,(m,n)}|^{2}]\leq\frac{C}{n}+C\big\|\mathcal{Y}^{2,\varepsilon,(m,n)}_{T}\big\|_{2}^{2},\tag{3.40}$$

and the assertion is obtained as:

$$\begin{aligned}
E[|Y_{0}^{\varepsilon,\alpha}-\{\overline{\mathcal{Y}}_{0}^{1,\varepsilon,(m)}+\alpha\mathcal{Y}_{0}^{2,\varepsilon,(m,n)}\}|^{2}]&\leq CE[|Y_{0}^{\varepsilon,\alpha}-\{\overline{\mathcal{Y}}_{0}^{1,\varepsilon,(m)}+\alpha\mathcal{Y}_{0}^{2,\varepsilon,(m)}\}|^{2}]+\alpha^{2}CE[|\mathcal{Y}_{0}^{2,\varepsilon,(m)}-\mathcal{Y}_{0}^{2,\varepsilon,(m,n)}|^{2}]\\
&\leq C\varepsilon^{2(m+1)}+C\alpha^{2}\varepsilon^{2(m+1)}+\alpha^{2}C\Big\{\frac{1}{n}+\big\|\mathcal{Y}^{2,\varepsilon,(m,n)}_{T}\big\|_{2}^{2}\Big\}.\qquad\Box
\end{aligned}\tag{3.41}$$

By the theorem above, it holds that

$$Y_{0}^{\varepsilon,\alpha}\approx\overline{\mathcal{Y}}_{0}^{1,\varepsilon,(m)}+\alpha\mathcal{Y}_{0}^{2,\varepsilon,(m,n)\ast},\tag{3.42}$$

where $\mathcal{Y}_{0}^{2,\varepsilon,(m,n)\ast}$ is obtained by solving (3.19) with the Deep BSDE method. The processes $\overline{\mathcal{Y}}^{1,\varepsilon,(m,n)}$ and $\overline{\mathcal{Z}}^{1,\varepsilon,(m,n)}$ work as control variates for the nonlinear BSDE.

Here, let us briefly compare the theoretical error estimates of our proposed method, namely (3.8) in Theorem 2 for the explicitly solvable $(\mathcal{Y}^{1,\varepsilon},\mathcal{Z}^{1,\varepsilon})$ case and (3.22) in Theorem 3 for the unsolvable case, with the one provided by Han and Long (2020) for the method of Weinan E et al. (2017), i.e. (2.10) in Theorem 1. Given the number $n$ of discretized time steps for the Euler-Maruyama scheme, these estimates are listed below:

  • Proposed method (for the solvable $(\mathcal{Y}^{1,\varepsilon},\mathcal{Z}^{1,\varepsilon})$ case):

    $$E[|Y_{0}^{\varepsilon,\alpha}-\{\mathcal{Y}_{0}^{1,\varepsilon}+\alpha\widetilde{\mathcal{Y}}_{0}^{2,\varepsilon,(n)}\}|^{2}]\leq C\alpha^{2}\frac{1}{n}+C\alpha^{2}\big\|\widetilde{\mathcal{Y}}^{2,\varepsilon,(n)}_{T}\big\|_{2}^{2}\quad(3.8)$$

  • Proposed method (for the unsolvable $(\mathcal{Y}^{1,\varepsilon},\mathcal{Z}^{1,\varepsilon})$ case):

    $$E[|Y_{0}^{\varepsilon,\alpha}-\{\overline{\mathcal{Y}}_{0}^{1,\varepsilon,(m)}+\alpha\mathcal{Y}_{0}^{2,\varepsilon,(m,n)}\}|^{2}]\leq(C\varepsilon^{2(m+1)}+C\alpha^{2}\varepsilon^{2(m+1)})+C\alpha^{2}\frac{1}{n}+C\alpha^{2}\big\|\mathcal{Y}^{2,\varepsilon,(m,n)}_{T}\big\|_{2}^{2}\quad(3.22)$$

  • Method of Weinan E et al. (2017) (error estimate by Han and Long (2020)):

    $$E[|Y_{0}^{\varepsilon,\alpha}-Y_{0}^{\varepsilon,\alpha,(n)}|^{2}]\leq C\frac{1}{n}+C\big\|g(\bar{X}_{T}^{\varepsilon,(n)})-Y_{T}^{\varepsilon,\alpha,(n)}\big\|_{2}^{2}.\quad(2.10)$$

Thanks to the following advantages, our proposed method works better as a new Deep BSDE solver; more precisely, its errors are expected to be smaller:

  • (i) Decomposition into a “dominant” linear PDE with the original terminal condition $g$ and a “small” nonlinear PDE with zero terminal condition, i.e.

    $$u(0,x)=\underbrace{\mathcal{U}^{1}(0,x)}_{\text{“dominant” linear PDE part}}+\underbrace{\mathcal{U}^{2}(0,x)}_{\text{“small” nonlinear PDE part}},\quad x\in\mathbb{R}^{d},$$

    and an application of the Deep BSDE solver only to the “small” nonlinear PDE.

    (ii) Closed-form solutions/approximations for the linear PDE, which also work as control variates for the driver of the nonlinear PDE.

    Thanks to (i) and (ii), we obtain the term $C\alpha^{2}\|\widetilde{\mathcal{Y}}^{2,\varepsilon,(n)}_{T}\|_{2}^{2}$ in the error bound, rather than $C\|g(\bar{X}_{T}^{\varepsilon,(n)})-Y_{T}^{\varepsilon,\alpha,(n)}\|_{2}^{2}$.

    Moreover, we note that our method enjoys the effect of the small parameter $\alpha\in(0,1)$ in the nonlinear driver for this term, as well as for the discretization error term caused by the Euler-Maruyama scheme, which is given as $C\alpha^{2}\frac{1}{n}$ rather than $C\frac{1}{n}$.

  • Regarding the unsolvable $(\mathcal{Y}^{1,\varepsilon},\mathcal{Z}^{1,\varepsilon})$ case, our asymptotic expansions with respect to the small parameter $\varepsilon\in(0,1)$ in the diffusion coefficient enable us to obtain the closed-form approximations $\overline{\mathcal{Y}}_{t}^{1,\varepsilon,(m)}=\mathcal{U}^{1,(m)}(t,X_{t}^{\varepsilon})$ and $\overline{\mathcal{Z}}_{t}^{1,\varepsilon,(m)}=\mathcal{V}^{1,(m)}(t,X_{t}^{\varepsilon})$. In particular, in (3.22) the terms $C\varepsilon^{2(m+1)}$ and $C\alpha^{2}\varepsilon^{2(m+1)}$ are associated with the errors of the approximations for the terminal condition $g$ and the driver $\alpha f$, respectively.

We will check the effectiveness of the new method by numerical experiments in the next section.

Remark 1.

We give an important remark on the new method. While the proposed scheme already provides a fine result, we can further improve it by replacing our approximation for the linear part $\overline{\mathcal{Y}}_{0}^{1,\varepsilon,(m)}$ in the decomposition (3.42) with the methods of [36][35][39][26][29][20].

For example, based on Takahashi and Yamada (2016) [35], the following result gives an improvement of the proposed scheme. Let $t_{i}=T(1-(1-i/n_{0})^{\gamma})$, $i=0,1,\cdots,n_{0}$, with a parameter $\gamma>0$, and $\bar{X}_{t_{i}}^{0,x,\varepsilon,(n_{0})}=\overline{X}_{t_{i}}^{t_{i-1},\bar{X}_{t_{i-1}}^{0,x,\varepsilon,(n_{0})},\varepsilon}$, $i=1,\cdots,n_{0}$. Define

𝒴^01,ε,(m,n0)=E[g(X¯T0,x,ε,(n0))i=1n0𝒲titi1,X¯ti10,x,ε,(n0),ε]|x=X0,\displaystyle\displaystyle\widehat{{\cal Y}}_{0}^{1,\varepsilon,(m,n_{0})}=E[g(\bar{X}_{T}^{0,x,\varepsilon,(n_{0})})\prod_{i=1}^{n_{0}}{\cal W}_{t_{i}}^{t_{i-1},\bar{X}_{t_{i-1}}^{0,x,\varepsilon,(n_{0})},\varepsilon}]|_{x=X_{0}}, (3.43)

and consider the quantity

𝒴^01,ε,(m,n0)+α𝒴02,ε,(m,n),\displaystyle\displaystyle\widehat{{\cal Y}}_{0}^{1,\varepsilon,(m,n_{0})}+\alpha{\cal Y}_{0}^{2,\varepsilon,(m,n)\ast}, (3.44)

where 𝒴02,ε,(m,n)\displaystyle{\cal Y}_{0}^{2,\varepsilon,(m,n)\ast} is the same as in (3.42). Then, (3.44) gives an improved approximation, as

Y0ε,α\displaystyle\displaystyle Y_{0}^{\varepsilon,\alpha} \displaystyle\displaystyle\approx 𝒴^01,ε,(m,n0)+α𝒴02,ε,(m,n),\displaystyle\displaystyle\widehat{{\cal Y}}_{0}^{1,\varepsilon,(m,n_{0})}+\alpha{\cal Y}_{0}^{2,\varepsilon,(m,n)\ast}, (3.45)

in the following sense.

Corollary 1.

There exist C>0\displaystyle C>0 and r(m)>0\displaystyle r(m)>0 such that

E[|Y0ε,α{𝒴^01,ε,(m,n0)+α𝒴02,ε,(m,n)}|2]Cε2(m+1)n02r(m)+α2C{ε2(m+1)+1n+𝒴T2,ε,(m,n)22},E[|Y_{0}^{\varepsilon,\alpha}-\{\widehat{{\cal Y}}_{0}^{1,\varepsilon,(m,n_{0})}+\alpha{\cal Y}_{0}^{2,\varepsilon,(m,n)\ast}\}|^{2}]\leq C\frac{\varepsilon^{2(m+1)}}{n_{0}^{2r(m)}}+\alpha^{2}C\Big{\{}\varepsilon^{2(m+1)}+\frac{1}{n}+\norm{\mathcal{Y}^{2,\varepsilon,(m,n)}_{T}}_{2}^{2}\Big{\}}, (3.46)

for all ε,α(0,1)\displaystyle\varepsilon,\alpha\in(0,1) and n0,n1\displaystyle n_{0},n\geq 1.
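As a small illustration of the discretization grid in Remark 1, the following sketch builds the partition tᵢ = T(1−(1−i/n₀)^γ); for γ > 1 the time steps shrink toward the terminal time T. The parameter values below are ours, chosen only for illustration.

    import numpy as np

    def time_grid(T: float, n0: int, gamma: float) -> np.ndarray:
        """Non-uniform partition t_i = T*(1 - (1 - i/n0)**gamma), i = 0..n0."""
        i = np.arange(n0 + 1)
        return T * (1.0 - (1.0 - i / n0) ** gamma)

    print(time_grid(T=0.5, n0=8, gamma=2.0))  # steps cluster near t = T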

Remark 2.

If the driver of a BSDE contains a linear part, we can transform the BSDE into the one considered in the current paper, namely the equation (2.2). For instance, let us solve the FSDE (2.1) together with the following BSDE:

dYtε,α\displaystyle\displaystyle-dY_{t}^{\varepsilon,\alpha} =\displaystyle\displaystyle= [A(t,Xtε)Ytε,α+Ztε,αB(t,Xtε)+αf(t,Xtε,Ytε,α,Ztε,α)]dtZtε,αdWt,\displaystyle\displaystyle[A(t,X_{t}^{\varepsilon})Y_{t}^{\varepsilon,\alpha}+Z_{t}^{\varepsilon,\alpha}B(t,X_{t}^{\varepsilon})+\alpha f(t,X_{t}^{\varepsilon},Y_{t}^{\varepsilon,\alpha},Z_{t}^{\varepsilon,\alpha})]dt-Z_{t}^{\varepsilon,\alpha}dW_{t},
YTε,α\displaystyle\displaystyle Y_{T}^{\varepsilon,\alpha} =\displaystyle\displaystyle= g(XTε),\displaystyle\displaystyle g(X_{T}^{\varepsilon}),

where A\displaystyle A and B\displaystyle B are \displaystyle\mathbb{R}-valued and d\displaystyle\mathbb{R}^{d}-valued bounded functions, respectively.

Let Y^tε,α:=e0tA(s,Xsε)𝑑sYtε,α\displaystyle\hat{Y}_{t}^{\varepsilon,\alpha}:=e^{\int_{0}^{t}A(s,X_{s}^{\varepsilon})ds}Y_{t}^{\varepsilon,\alpha}, t0\displaystyle t\geq 0, and W^t:=Wt0tB(s,Xsε)𝑑s\displaystyle\hat{W}_{t}:=W_{t}-\int_{0}^{t}B(s,X_{s}^{\varepsilon})ds, t0\displaystyle t\geq 0, which is a Brownian motion under a probability measure P^\displaystyle\hat{P} obtained by the change of measure with the process B(,Xε)\displaystyle B(\cdot,X_{\cdot}^{\varepsilon}). Then, we have

dXtε\displaystyle\displaystyle dX_{t}^{\varepsilon} =[μ(t,Xtε)+εσ(t,Xtε)B(t,Xtε)]dt+εσ(t,Xtε)dW^t,X0εL2(Ω;d),\displaystyle\displaystyle=[\mu(t,X_{t}^{\varepsilon})+\varepsilon\sigma(t,X_{t}^{\varepsilon})B(t,X_{t}^{\varepsilon})]dt+\varepsilon\sigma(t,X_{t}^{\varepsilon})d\hat{W}_{t},\quad X_{0}^{\varepsilon}\in L^{2}(\Omega;\mathbb{R}^{d}),
dY^tε,α\displaystyle\displaystyle-d\hat{Y}_{t}^{\varepsilon,\alpha} =αe0tA(s,Xsε)𝑑sf(t,Xtε,Ytε,α,Ztε,α)dte0tA(s,Xsε)𝑑sZtε,αdW^t,Y^Tε,α=e0TA(s,Xsε)𝑑sg(XTε).\displaystyle\displaystyle=\alpha e^{\int_{0}^{t}A(s,X_{s}^{\varepsilon})ds}f(t,X_{t}^{\varepsilon},Y_{t}^{\varepsilon,\alpha},Z_{t}^{\varepsilon,\alpha})dt-e^{\int_{0}^{t}A(s,X_{s}^{\varepsilon})ds}Z_{t}^{\varepsilon,\alpha}d\hat{W}_{t},\ \hat{Y}_{T}^{\varepsilon,\alpha}=e^{\int_{0}^{T}A(s,X_{s}^{\varepsilon})ds}g(X_{T}^{\varepsilon}).
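Indeed, this is a one-line check by Itô's formula and the Girsanov transform: since dŴ_t = dW_t − B(t,X_t^ε)dt,

d\hat{Y}_{t}^{\varepsilon,\alpha}=A(t,X_{t}^{\varepsilon})\hat{Y}_{t}^{\varepsilon,\alpha}dt+e^{\int_{0}^{t}A(s,X_{s}^{\varepsilon})ds}dY_{t}^{\varepsilon,\alpha}=-\alpha e^{\int_{0}^{t}A(s,X_{s}^{\varepsilon})ds}f(t,X_{t}^{\varepsilon},Y_{t}^{\varepsilon,\alpha},Z_{t}^{\varepsilon,\alpha})dt+e^{\int_{0}^{t}A(s,X_{s}^{\varepsilon})ds}Z_{t}^{\varepsilon,\alpha}d\hat{W}_{t},

so that the AY-term is absorbed into the discount factor and the ZB-term into the change of measure.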

4 Numerical results

In the numerical examples, we demonstrate that the Deep BSDE method with the first order asymptotic expansion obtained in Proposition 2 provides sufficient accuracy in solving semilinear PDEs. The dimension d\displaystyle d in (2.2) is set to d=1\displaystyle d=1 or d=100\displaystyle d=100.

We investigate the accuracy of the new method by comparing it with the standard Deep BSDE method in Weinan E et al. (2017) [3] and the Deep BSDE method with prior knowledge in Fujii et al. (2019) [9] for the model (2.2), where the target BSDEs with FSDEs are specified later.

4.1 Numerical schemes used in experiments

In this subsection, we explain the details of the schemes used in the numerical experiments. To construct the deep neural networks for each method, we follow Weinan E et al. (2017) [3] and employ adaptive moment estimation (Adam) with mini-batches. The parameters for the networks are set as follows: each hidden layer has d+10\displaystyle d+10 neurons (batch normalization layers aside). At each learning step, 256\displaystyle 256 sample paths are generated, and the learning rate is set to 0.01\displaystyle 0.01.
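For concreteness, a minimal sketch of one subnetwork in this architecture follows, written in TensorFlow 2 style. The two-hidden-layer depth follows the convention of Weinan E et al. (2017) [3]; all names below are ours and not the authors' code.

    import tensorflow as tf

    def make_subnet(d: int) -> tf.keras.Model:
        """One subnetwork mapping x in R^d to a Z-estimate at a fixed time
        step; each hidden layer has d + 10 neurons with batch normalization,
        as described in the text."""
        return tf.keras.Sequential([
            tf.keras.layers.Dense(d + 10, use_bias=False),
            tf.keras.layers.BatchNormalization(),
            tf.keras.layers.ReLU(),
            tf.keras.layers.Dense(d + 10, use_bias=False),
            tf.keras.layers.BatchNormalization(),
            tf.keras.layers.ReLU(),
            tf.keras.layers.Dense(d),  # output layer: Z-estimate in R^d
        ])

    optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)  # rate from the text
    batch_size = 256  # sample paths generated at each learning step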

(Numerical scheme) Now, let us briefly explain the schemes used in the numerical experiment in the following subsections.

  1.

    (Deep BSDE method based on Weinan E et al. (2017)) In the forward discretization of Yε,α\displaystyle Y^{\varepsilon,\alpha}, the Euler-Maruyama scheme X¯ε,(n)\displaystyle\bar{X}^{\varepsilon,(n)} is applied with n=20\displaystyle n=20 time steps. The initial guess of Y0ε,α\displaystyle Y_{0}^{\varepsilon,\alpha} is generated by a uniform random number around 𝒴¯01,ε,(1)\displaystyle\overline{{\cal Y}}_{0}^{1,\varepsilon,(1)}, which serves as prior knowledge for the Deep BSDE method.

    From the study of Weinan E et al. (2017), it is known that the value estimated by the Deep BSDE method converges to the true value of Y0ε,α\displaystyle Y_{0}^{\varepsilon,\alpha} if we take a sufficient number of iteration steps.

    In Section 4.2 below, the estimated values based on this scheme are shown by the green lines labeled with “Deep BSDE” in the figures.

  2.

    (Deep BSDE method with an enhanced version of Fujii et al. (2019)) In the forward discretization of Yε,α\displaystyle Y^{\varepsilon,\alpha} in the Deep BSDE solver, we apply 𝒵¯t1,ε,(1,n)=𝒱1,(1)(t,X¯tε,(n))\displaystyle\overline{{\cal Z}}_{t}^{1,\varepsilon,(1,n)}={\cal V}^{1,(1)}(t,\bar{X}_{t}^{\varepsilon,(n)}), t0\displaystyle t\geq 0, as an approximation of 𝒵1,ε\displaystyle\mathcal{Z}^{1,\varepsilon}, with the function 𝒱1,(1)\displaystyle{\cal V}^{1,(1)} defined by (3.17) and the Euler-Maruyama scheme X¯ε,(n)\displaystyle\bar{X}^{\varepsilon,(n)} with n=20\displaystyle n=20 time steps, to obtain an estimate of 𝒵2,ε\displaystyle\mathcal{Z}^{2,\varepsilon} by optimization in the Deep BSDE solver. As the initial value of Y0ε,α\displaystyle Y_{0}^{\varepsilon,\alpha}, we use 𝒰1,(1)(0,x)\displaystyle{{\cal U}}^{1,(1)}(0,x) with the function 𝒰1,(1)\displaystyle{\cal U}^{1,(1)} defined by (3.16), an approximation of 𝒴1,ε\displaystyle\mathcal{Y}^{1,\varepsilon}, which appears in the linear part of our decomposition of the BSDE (Yε,α,Zε,α)\displaystyle(Y^{\varepsilon,\alpha},Z^{\varepsilon,\alpha}) with Yε,α=𝒴1,ε+α𝒴2,ε\displaystyle Y^{\varepsilon,\alpha}=\mathcal{Y}^{1,\varepsilon}+\alpha\mathcal{Y}^{2,\varepsilon} and Zε,α=𝒵1,ε+α𝒵2,ε\displaystyle Z^{\varepsilon,\alpha}=\mathcal{Z}^{1,\varepsilon}+\alpha\mathcal{Z}^{2,\varepsilon}. Thus, the scheme is an improved version of Fujii et al. (2019) [9], since it applies the higher order term 𝒵¯1,ε,(1)\displaystyle\overline{{\cal Z}}^{1,\varepsilon,(1)} rather than the leading order term 𝒵¯1,ε,(0)\displaystyle\overline{{\cal Z}}^{1,\varepsilon,(0)} used in Fujii et al. (2019) [9].

    From the study of Fujii et al. (2019) [9], it is also known that the value estimated by the enhanced Deep BSDE method converges to the true value of Y0ε,α\displaystyle Y_{0}^{\varepsilon,\alpha} with a much smaller number of iteration steps than the original Deep BSDE method of Weinan E et al. (2017).

    In Section 4.2 below, the estimated values based on this scheme are shown by the red lines labeled with “Deep BSDE[(Y,Z)\displaystyle(Y,Z)]+AE[𝒴¯01,ε\displaystyle\overline{{\cal Y}}^{1,\varepsilon}_{0} and Z¯1,ε\displaystyle\overline{Z}^{1,\varepsilon}]” in the figures.

  3.

    (New scheme) Following the main result introduced in Section 3, particularly Theorem 2, we employ our approximation (3.42) for the decomposition Y0ε,α=𝒴01,ε+α𝒴02,ε\displaystyle Y^{\varepsilon,\alpha}_{0}=\mathcal{Y}^{1,\varepsilon}_{0}+\alpha\mathcal{Y}^{2,\varepsilon}_{0} with m=1\displaystyle m=1 and n=20\displaystyle n=20, namely,

    Y0ε,α\displaystyle\displaystyle Y_{0}^{\varepsilon,\alpha} \displaystyle\displaystyle\approx 𝒴¯01,ε,(1)+α𝒴02,ε,(1,20),\displaystyle\displaystyle\overline{{\cal Y}}_{0}^{1,\varepsilon,(1)}+\alpha{\cal Y}_{0}^{2,\varepsilon,(1,20)\ast},

    where we compute the nonlinear part 𝒴02,ε,(1,20)\displaystyle{\cal Y}_{0}^{2,\varepsilon,(1,20)\ast} with (3.19)–(3.21) by the Deep BSDE solver, while 𝒴¯01,ε,(1)\displaystyle\overline{{\cal Y}}_{0}^{1,\varepsilon,(1)} is given by 𝒰1,(1)(0,x)\displaystyle{{\cal U}}^{1,(1)}(0,x) with the function 𝒰1,(1)\displaystyle{\cal U}^{1,(1)} defined by (3.16).

    Specifically, in the computation of 𝒴02,ε,(1,20)\displaystyle{\cal Y}_{0}^{2,\varepsilon,(1,20)\ast} by the Deep BSDE solver with the equation:

    d𝒴t2,ε\displaystyle\displaystyle-d\mathcal{Y}_{t}^{2,\varepsilon} =f(t,Xtε,𝒴t1,ε+α𝒴t2,ε,𝒵t1,ε+α𝒵t2,ε)dt𝒵t2,εdWt,𝒴T2,ε=0,\displaystyle\displaystyle=f(t,X_{t}^{\varepsilon},\mathcal{Y}_{t}^{1,\varepsilon}+\alpha\mathcal{Y}_{t}^{2,\varepsilon},\mathcal{Z}_{t}^{1,\varepsilon}+\alpha\mathcal{Z}_{t}^{2,\varepsilon})dt-\mathcal{Z}_{t}^{2,\varepsilon}dW_{t},\quad\mathcal{Y}_{T}^{2,\varepsilon}=0,

    we use 𝒴¯t1,ε,(1)=𝒰1,(1)(t,Xtε)\displaystyle\overline{{\cal Y}}_{t}^{1,\varepsilon,(1)}={\cal U}^{1,(1)}(t,X_{t}^{\varepsilon}) and 𝒵¯t1,ε,(1)=𝒱1,(1)(t,Xtε)\displaystyle\overline{{\cal Z}}_{t}^{1,\varepsilon,(1)}={\cal V}^{1,(1)}(t,X_{t}^{\varepsilon}) as approximations for 𝒴t1,ε\displaystyle\mathcal{Y}_{t}^{1,\varepsilon} and 𝒵t1,ε\displaystyle\mathcal{Z}_{t}^{1,\varepsilon} in the driver f\displaystyle f, respectively.

    In Section 4.2 below, the estimated values based on this new scheme are shown by the blue lines labeled with “New method [𝒴¯1,ε+𝒴2,ε,DL\displaystyle\overline{{\cal Y}}^{1,\varepsilon}+{\cal Y}^{2,\varepsilon,DL}, 𝒵¯1,ε+𝒵2,ε,DL\displaystyle\overline{{\cal Z}}^{1,\varepsilon}+{\cal Z}^{2,\varepsilon,DL}]” in the figures.
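For readability, the following sketch spells out the forward recursion behind this new scheme. It is not the authors' code: f, U1, V1, sigma and Z2_fn are placeholder callables for the driver, the expansion functions 𝒰^{1,(1)} and 𝒱^{1,(1)}, the diffusion coefficient, and the neural-network estimate of 𝒵^{2,ε}, and we assume μ = 0 as in the examples below.

    import numpy as np

    def forward_Y2(x0, T, n, alpha, eps, f, U1, V1, sigma, y2_0, Z2_fn, rng):
        """Euler-Maruyama forward recursion for the 'small' nonlinear BSDE
        -dY2 = f(t, X, U1 + alpha*Y2, V1 + alpha*Z2) dt - Z2 dW,  Y2_T = 0.
        The expansion values U1, V1 replace (Y^1, Z^1) in the driver, acting
        as control variates; the loss pushes |Y2_T|^2 toward zero."""
        dt = T / n
        x, y2 = np.array(x0, dtype=float), float(y2_0)
        for k in range(n):
            t = k * dt
            dW = rng.normal(scale=np.sqrt(dt), size=x.shape)
            z2 = Z2_fn(t, x)  # placeholder for the network output at step k
            y2 -= f(t, x, U1(t, x) + alpha * y2, V1(t, x) + alpha * z2) * dt
            y2 += float(z2 @ dW)
            x = x + eps * (sigma(t, x) @ dW)  # forward SDE with mu = 0
        return y2  # terminal value, compared with 0 in the loss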

The initial value of Z0ε,α\displaystyle Z_{0}^{\varepsilon,\alpha} or 𝒵02,ε\displaystyle\mathcal{Z}_{0}^{2,\varepsilon} is generated by a uniform random number on the range [0.01,0.01]\displaystyle[-0.01,0.01] for each method. The numerical experiments presented in the following subsections are implemented in Python with TensorFlow on Google Colaboratory.

4.2 Numerical experiments

We present numerical examples showing that our proposed method substantially outperforms the other methods in terms of terminal errors (numerical values of loss functions), variations and convergence speed.

4.2.1 The case of d=1\displaystyle d=1

This subsection presents the numerical results for the case of d=1\displaystyle d=1.

We first check the performance of our method in a model where the explicit value of the solution is obtained by Picard iteration. Specifically, we consider an option pricing model in finance that takes CVA (credit value adjustment) into account as follows:

dXtε=\displaystyle\displaystyle dX_{t}^{\varepsilon}= μ(t,Xtε)dt+εσ(t,Xtε)dWt,\displaystyle\displaystyle\mu(t,X_{t}^{\varepsilon})dt+\varepsilon\sigma(t,X_{t}^{\varepsilon})dW_{t}, (4.1)
dYtε,α=\displaystyle\displaystyle-dY_{t}^{\varepsilon,\alpha}= α(Ytε,α)+dtZtε,αdWt,YTε,α=(XTεK)+,\displaystyle\displaystyle-\alpha(Y_{t}^{\varepsilon,\alpha})^{+}dt-Z_{t}^{\varepsilon,\alpha}dW_{t},\quad Y_{T}^{\varepsilon,\alpha}=(X_{T}^{\varepsilon}-K)^{+}, (4.2)

with f(t,x,y,z)=(y)+\displaystyle f(t,x,y,z)=-(y)^{+} and g(x)=(xK)+\displaystyle g(x)=(x-K)^{+} in (2.1) and (2.2). We note that α=\displaystyle\alpha=(loss rate in default)×\displaystyle\times(default intensity) in a finance model of CVA.

In computation we set μ(t,x)=0\displaystyle\mu(t,x)=0, σ(t,x)=x\displaystyle\sigma(t,x)=x, ε=σ=0.2\displaystyle\varepsilon=\sigma=0.2, X0=100\displaystyle X_{0}=100, α=0.05\displaystyle\alpha=0.05, T=0.5\displaystyle T=0.5 with K=100\displaystyle K=100 (ATM case) and K=115\displaystyle K=115 (OTM case).

In this case an explicit value of Y0\displaystyle Y_{0} is computed as Y0=𝒴01(1+i=1(1)iαiTi1i!)\displaystyle\textstyle{Y_{0}={\cal Y}_{0}^{1}(1+\sum_{i=1}^{\infty}(-1)^{i}\alpha^{i}T^{i}\frac{1}{i!})}. More precisely, by the k\displaystyle k-Picard iteration of the backward equation:

dYtε,α,[k]=α(Ytε,α,[k1])+dtZtε,α,[k]dWt,YTε,α,[k]=(XTεK)+,\displaystyle\displaystyle-dY_{t}^{\varepsilon,\alpha,[k]}=-\alpha(Y_{t}^{\varepsilon,\alpha,[k-1]})^{+}dt-Z_{t}^{\varepsilon,\alpha,[k]}dW_{t},\ \ \ Y_{T}^{\varepsilon,\alpha,[k]}=(X_{T}^{\varepsilon}-K)^{+}, (4.3)
with (Ytε,α,[0])+=E[(XTεK)+|t]=𝒴t1,for  allt0,\displaystyle\displaystyle\ \ \ \ (Y_{t}^{\varepsilon,\alpha,[0]})^{+}=E[(X_{T}^{\varepsilon}-K)^{+}|{\cal F}_{t}]={\cal Y}_{t}^{1},\ \mbox{for \ all}\ t\geq 0, (4.4)

it is easy to see that Ytε,α,[k]=𝒴t1αtTE[Ysε,α,[k1]|t]𝑑s\displaystyle\textstyle{Y_{t}^{\varepsilon,\alpha,[k]}={\cal Y}_{t}^{1}-\alpha\int_{t}^{T}E[Y_{s}^{\varepsilon,\alpha,[k-1]}|{\cal F}_{t}]ds}, and thus one has

Y0ε,α,[k]=𝒴01(1+i=1k(1)iαiTi1i!).\displaystyle\displaystyle\textstyle{Y_{0}^{\varepsilon,\alpha,[k]}={\cal Y}_{0}^{1}(1+\sum_{i=1}^{k}(-1)^{i}\alpha^{i}T^{i}\frac{1}{i!})}.

Then, the true values are given as Y0=5.50\displaystyle Y_{0}=5.50 in the ATM case and Y0=1.26\displaystyle Y_{0}=1.26 in the OTM case by the 5\displaystyle 5-Picard iteration, which provides sufficient convergence and hence accuracy.
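These true values can be reproduced directly: 𝒴₀¹ is the zero-rate Black-Scholes call price for the lognormal model (4.1) with volatility ε = 0.2, and the series is truncated at k = 5 as in the text. A minimal verification sketch (helper names are ours):

    from math import factorial
    import numpy as np
    from scipy.stats import norm

    def bs_call(S, K, sigma, T):
        """Black-Scholes call price with zero interest rate (mu = 0)."""
        d1 = (np.log(S / K) + 0.5 * sigma ** 2 * T) / (sigma * np.sqrt(T))
        d2 = d1 - sigma * np.sqrt(T)
        return S * norm.cdf(d1) - K * norm.cdf(d2)

    alpha, T, S0, vol = 0.05, 0.5, 100.0, 0.2
    for K in (100.0, 115.0):  # ATM and OTM strikes
        y1 = bs_call(S0, K, vol, T)
        series = 1.0 + sum((-1) ** i * (alpha * T) ** i / factorial(i)
                           for i in range(1, 6))
        print(K, round(y1 * series, 2))  # -> 5.5 (ATM) and 1.26 (OTM)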

Figures 1 and 2 show the numerical values of the loss functions and the approximate values of Y0\displaystyle Y_{0}, respectively, against the number of iteration steps for the ATM case; Figures 3 and 4 show the same for the OTM case.

As Figures 2 and 4 show, the numerical values of “New method [𝒴¯1,ε+𝒴2,ε,DL\displaystyle\overline{{\cal Y}}^{1,\varepsilon}+{\cal Y}^{2,\varepsilon,DL},𝒵¯1,ε+𝒵2,ε,DL]\displaystyle\overline{{\cal Z}}^{1,\varepsilon}+{\cal Z}^{2,\varepsilon,DL}]” converge to the true values substantially faster and with smaller variations than the other schemes. Also, we can see that the errors of “New method” are much smaller, according to the behavior of the loss functions against the number of iteration steps in Figures 1 and 3.

Figure 1: Values of the loss function and number of iteration steps (1-dim option pricing model with CVA, ATM case)
Figure 2: Approximate values of Y0\displaystyle Y_{0} (true value: 5.50) and number of iteration steps (1-dim option pricing model with CVA, ATM case)
Figure 3: Values of the loss function and number of iteration steps (1-dim option pricing model with CVA, OTM case)
Figure 4: Approximate values of Y0\displaystyle Y_{0} (true value: 1.26) and number of iteration steps (1-dim option pricing model with CVA, OTM case)

Next, we present numerical examples for a model where explicit values of Y0\displaystyle Y_{0} cannot be obtained without numerical schemes such as Monte Carlo simulations. Let us consider

dXtε=\displaystyle\displaystyle dX_{t}^{\varepsilon}= μ(t,Xtε)dt+εσ(t,Xtε)dWt,\displaystyle\displaystyle\mu(t,X_{t}^{\varepsilon})dt+\varepsilon\sigma(t,X_{t}^{\varepsilon})dW_{t}, (4.5)
dYtε,α=\displaystyle\displaystyle-dY_{t}^{\varepsilon,\alpha}= αf(t,Xtε,Ytε,α,Ztε,α)dtZtε,αdWt,YTε,α=g(XTε)\displaystyle\displaystyle\alpha f(t,X_{t}^{\varepsilon},Y_{t}^{\varepsilon,\alpha},Z_{t}^{\varepsilon,\alpha})dt-Z_{t}^{\varepsilon,\alpha}dW_{t},\quad Y_{T}^{\varepsilon,\alpha}=g(X_{T}^{\varepsilon}) (4.6)

with μ(t,x)=0\displaystyle\mu(t,x)=0, σ(t,x)=x\displaystyle\sigma(t,x)=x, ε=σ=0.2\displaystyle\varepsilon=\sigma=0.2, T=0.25\displaystyle T=0.25, X0=100\displaystyle X_{0}=100, and

f(t,x,y,z)\displaystyle\displaystyle f(t,x,y,z) ={(yzσ11)(Rr)}\displaystyle\displaystyle=-\left\{\left(y-z\sigma^{-1}\mathrm{1}\right)^{-}(R-r)\right\} (4.7)

with R=0.06\displaystyle R=0.06, r=0.0\displaystyle r=0.0, and

g(x)=(xK1)+2(xK2)+,withK1=95,K2=105.g(x)=(x-K_{1})^{+}-2(x-K_{2})^{+},\ \ \mbox{with}\ K_{1}=95,\ K_{2}=105. (4.8)
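In code, the driver (4.7) and the call-spread terminal condition (4.8) can be transcribed as follows for d = 1; here the negative part a⁻ is max(−a, 0), and the function names are ours.

    def f(t, x, y, z, sigma_val, R=0.06, r=0.0):
        """Driver (4.7): the different-rates term, active when y - z/sigma < 0."""
        return -max(-(y - z / sigma_val), 0.0) * (R - r)

    def g(x, K1=95.0, K2=105.0):
        """Call spread terminal condition (4.8)."""
        return max(x - K1, 0.0) - 2.0 * max(x - K2, 0.0)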

As mentioned in Section 4.1, the values estimated by “Deep BSDE” and “Deep BSDE[(Y,Z)\displaystyle(Y,Z)]+AE[𝒴¯01,ε\displaystyle\overline{{\cal Y}}^{1,\varepsilon}_{0} and Z¯1,ε\displaystyle\overline{Z}^{1,\varepsilon}]” converge to the true value of Y0\displaystyle Y_{0}. In the experiments, we therefore check whether the value estimated by “New method [𝒴¯1,ε+𝒴2,ε,DL\displaystyle\overline{{\cal Y}}^{1,\varepsilon}+{\cal Y}^{2,\varepsilon,DL},𝒵¯1,ε+𝒵2,ε,DL]\displaystyle\overline{{\cal Z}}^{1,\varepsilon}+{\cal Z}^{2,\varepsilon,DL}]” converges faster than those computed by the other two methods.

Figure 5 shows the numerical values of the loss functions against the number of iteration steps. While “Deep BSDE[(Y,Z)\displaystyle(Y,Z)]+AE[𝒴¯01,ε\displaystyle\overline{{\cal Y}}^{1,\varepsilon}_{0} and Z¯1,ε\displaystyle\overline{Z}^{1,\varepsilon}]” is superior to the original “Deep BSDE”, we see that “New method [𝒴¯1,ε+𝒴2,ε,DL\displaystyle\overline{{\cal Y}}^{1,\varepsilon}+{\cal Y}^{2,\varepsilon,DL},𝒵¯1,ε+𝒵2,ε,DL]\displaystyle\overline{{\cal Z}}^{1,\varepsilon}+{\cal Z}^{2,\varepsilon,DL}]” gives much more stable and accurate convergence than the other schemes.

Figure 6 shows the approximate values of Y0\displaystyle Y_{0} against the number of iteration steps. It is observed that “New method [𝒴¯1,ε+𝒴2,ε,DL\displaystyle\overline{{\cal Y}}^{1,\varepsilon}+{\cal Y}^{2,\varepsilon,DL},𝒵¯1,ε+𝒵2,ε,DL]\displaystyle\overline{{\cal Z}}^{1,\varepsilon}+{\cal Z}^{2,\varepsilon,DL}]” provides the fastest convergence with the smallest standard deviation, while “Deep BSDE[(Y,Z)\displaystyle(Y,Z)]+AE[𝒴¯01,ε\displaystyle\overline{{\cal Y}}^{1,\varepsilon}_{0} and Z¯1,ε\displaystyle\overline{Z}^{1,\varepsilon}]” gives a better approximation than “Deep BSDE”.

Figure 5: Values of the loss function and number of iteration steps
Figure 6: Approximate values of Y0\displaystyle Y_{0} and number of iteration steps

4.2.2 The case of d=100\displaystyle d=100

We show the main numerical result for d=100\displaystyle d=100. The same experiment as in the case of d=1\displaystyle d=1 is performed. Let us consider

dXtε,i=\displaystyle\displaystyle dX_{t}^{\varepsilon,i}= μi(t,Xtε)dt+εj=1dσji(t,Xtε)dWtj,\displaystyle\displaystyle\mu^{i}(t,X_{t}^{\varepsilon})dt+\varepsilon\sum_{j=1}^{d}\sigma^{i}_{j}(t,X_{t}^{\varepsilon})dW^{j}_{t}, (4.9)
dYtε,α=\displaystyle\displaystyle-dY_{t}^{\varepsilon,\alpha}= αf(t,Xtε,Ytε,α,Ztε,α)dtZtε,αdWt,YTε,α=g(XTε),\displaystyle\displaystyle\alpha f(t,X_{t}^{\varepsilon},Y_{t}^{\varepsilon,\alpha},Z_{t}^{\varepsilon,\alpha})dt-Z_{t}^{\varepsilon,\alpha}dW_{t},\quad Y_{T}^{\varepsilon,\alpha}=g(X_{T}^{\varepsilon}), (4.10)

with μi(t,x)=0\displaystyle\mu^{i}(t,x)=0, σji(t,x)=xi\displaystyle\sigma^{i}_{j}(t,x)=x^{i} (i=1,,d\displaystyle i=1,\cdots,d), ε=0.4\displaystyle\varepsilon=0.4, X0i=100\displaystyle X_{0}^{i}=100, T=0.25\displaystyle T=0.25, and

f(t,x,y,z)\displaystyle\displaystyle f(t,x,y,z) ={(yk=1dj=1dzk[σ1]kj)(Rr)},\displaystyle\displaystyle=-\left\{-\left(y-\sum_{k=1}^{d}\sum_{j=1}^{d}z_{k}[\sigma^{-1}]_{kj}\right)^{-}(R-r)\right\}, (4.11)

with R=0.01\displaystyle R=0.01 and r=0.0\displaystyle r=0.0, where θ\displaystyle\theta is defined by μr1=σθ\displaystyle\mu-r\mathrm{1}=\sigma\theta, and

g(x)=(1di=1dxiK1)+2(1di=1dxiK2)+,withK1=95,K2=105.g(x)=\left(\frac{1}{d}\sum_{i=1}^{d}x_{i}-K_{1}\right)^{+}-2\left(\frac{1}{d}\sum_{i=1}^{d}x_{i}-K_{2}\right)^{+},\ \ \mbox{with}\ K_{1}=95,\ K_{2}=105. (4.12)
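The terminal condition (4.12) is the same call spread applied to the average of the d components; a direct transcription (names ours) reads:

    import numpy as np

    def g_basket(x, K1=95.0, K2=105.0):
        """Basket call spread (4.12) on the mean of the d components of x."""
        m = float(np.mean(x))
        return max(m - K1, 0.0) - 2.0 * max(m - K2, 0.0)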

The results are given in Figures 7, 8 and 9. The convergence speed of the original Deep BSDE method seems too slow to obtain a precise result. In contrast, “Deep BSDE[(Y,Z)\displaystyle(Y,Z)]+AE[𝒴¯01,ε\displaystyle\overline{{\cal Y}}^{1,\varepsilon}_{0} and Z¯1,ε\displaystyle\bar{Z}^{1,\varepsilon}]” and “New method [𝒴¯1,ε+𝒴2,ε,DL\displaystyle\overline{{\cal Y}}^{1,\varepsilon}+{\cal Y}^{2,\varepsilon,DL},𝒵¯1,ε+𝒵2,ε,DL]\displaystyle\overline{{\cal Z}}^{1,\varepsilon}+{\cal Z}^{2,\varepsilon,DL}]” work well even in this high dimensional case. In particular, “New method [𝒴¯1,ε+𝒴2,ε,DL\displaystyle\overline{{\cal Y}}^{1,\varepsilon}+{\cal Y}^{2,\varepsilon,DL},𝒵¯1,ε+𝒵2,ε,DL]\displaystyle\overline{{\cal Z}}^{1,\varepsilon}+{\cal Z}^{2,\varepsilon,DL}]” provides remarkable performance in terms of convergence speed, accuracy (numerical values of loss functions) and variations. Moreover, comparing the results of our new method and “Deep BSDE[(Y,Z)\displaystyle(Y,Z)]+AE[𝒴¯01,ε\displaystyle\overline{{\cal Y}}^{1,\varepsilon}_{0} and Z¯1,ε\displaystyle\bar{Z}^{1,\varepsilon}]” closely, Figure 9 shows that the variation of Y0\displaystyle Y_{0} by our method is much smaller, which is consistent with the much smaller values of the loss functions for the new method in Figure 7.

Figure 7: Values of the loss function and number of iteration steps
Figure 8: Approximate values of Y0\displaystyle Y_{0} and number of iteration steps
Figure 9: Approximate values of Y0\displaystyle Y_{0} and number of iteration steps (enlarged view for “Deep BSDE[(Y,Z)\displaystyle(Y,Z)]+AE[𝒴¯01,ε\displaystyle\overline{{\cal Y}}^{1,\varepsilon}_{0} and Z¯1,ε\displaystyle\overline{Z}^{1,\varepsilon}]” and “New method [𝒴¯1,ε+𝒴2,ε,DL\displaystyle\overline{{\cal Y}}^{1,\varepsilon}+{\cal Y}^{2,\varepsilon,DL}, 𝒵¯1,ε+𝒵2,ε,DL\displaystyle\overline{{\cal Z}}^{1,\varepsilon}+{\cal Z}^{2,\varepsilon,DL}]”)

5 Conclusion and future works

This paper has introduced a new control variate method for the Deep BSDE solver to improve methods such as those in Weinan E et al. (2017) [3] and Fujii et al. (2019) [9]. First, we decompose a target semilinear PDE (BSDE) into two parts, namely dominant linear and residual nonlinear PDEs (BSDEs). When the dominant part is obtained in closed form or approximated by an asymptotic expansion scheme, the small nonlinear PDE part is efficiently computed by the Deep BSDE solver, where the asymptotic expansion crucially works as a control variate. The main theorem provides the validity of our proposed method. Moreover, numerical examples for one- and 100-dimensional BSDEs corresponding to target nonlinear PDEs show the effectiveness of our scheme, which is consistent with our initial conjecture and theoretical result.

As mentioned in Remark 1, even if the accuracy of the standard asymptotic expansion scheme becomes worse, the linear PDE part can be more efficiently approximated by existing methods such as [36][35][39][26][29][20]. Their performance in such cases should be checked against various nonlinear models. Also, it will be a challenging task to examine whether the high order automatic differentiation schemes proposed in [40][37] work as efficient approximations of Z\displaystyle Z in nonlinear BSDEs or xu\displaystyle\partial_{x}u in nonlinear PDEs. These are left for future studies.

Appendix

Appendix A Proof of Propositions

A.1 Proof of Proposition 1

See Proposition 4.2 in Takahashi and Yamada (2015) [34] for (3.11)–(3.14), for instance.

Also, note that for p1\displaystyle p\geq 1 and a multi-index α\displaystyle\alpha, supxdxαX¯Tt,x,εpC(T)\displaystyle\sup{}_{x\in\mathbb{R}^{d}}\|\partial_{x}^{\alpha}\overline{X}^{t,x,\varepsilon}_{T}\|_{p}\leq C(T) and supxdxα𝒲Tt,x,ε,(m)pC(T)\displaystyle\sup{}_{x\in\mathbb{R}^{d}}\|\partial_{x}^{\alpha}{\cal W}^{t,x,\varepsilon,(m)}_{T}\|_{p}\leq C(T) hold for t<T\displaystyle t<T. Then, sup|xdx2𝒰1,(m)(t,x)|2gC(T)\displaystyle\sup{}_{x\in\mathbb{R}^{d}}|\partial_{x}^{2}{\cal U}^{1,(m)}(t,x)|\leq\|\nabla^{2}g\|_{\infty}C(T) for t<T\displaystyle t<T, i.e. 𝒰1,(m)(t,)Cb2(d)\displaystyle{\cal U}^{1,(m)}(t,\cdot)\in C_{b}^{2}(\mathbb{R}^{d}), t<T\displaystyle t<T.

For 𝒱1,(m)\displaystyle{\cal V}^{1,(m)}, we have the representation

𝒱1,(m)(t,x)=E[g(X¯Tt,x,ε)𝒵Tt,x,ε,(m)]=E[(g)(X¯Tt,x,ε)𝒬Tt,x,ε,(m)],\displaystyle\displaystyle{\cal V}^{1,(m)}(t,x)=E[g(\overline{X}^{t,x,\varepsilon}_{T}){\cal Z}^{t,x,\varepsilon,(m)}_{T}]=E[(\nabla g)(\overline{X}^{t,x,\varepsilon}_{T}){\cal Q}^{t,x,\varepsilon,(m)}_{T}],

for a matrix-valued Wiener functional 𝒬Tt,x,ε,(m)=[[𝒬Tt,x,ε,(m)]ji]1i,jd\displaystyle{\cal Q}^{t,x,\varepsilon,(m)}_{T}=[[{\cal Q}^{t,x,\varepsilon,(m)}_{T}]^{i}_{j}]_{1\leq i,j\leq d} such that [𝒬Tt,x,ε,(m)]ji𝔻\displaystyle[{\cal Q}^{t,x,\varepsilon,(m)}_{T}]^{i}_{j}\in\mathbb{D}^{\infty}, 1i,jd\displaystyle 1\leq i,j\leq d, satisfying for p1\displaystyle p\geq 1 and a multi-index α\displaystyle\alpha, supxdxα𝒬Tt,x,ε,(m)pC(T)\displaystyle\sup{}_{x\in\mathbb{R}^{d}}\|\partial_{x}^{\alpha}{\cal Q}^{t,x,\varepsilon,(m)}_{T}\|_{p}\leq C(T) for t<T\displaystyle t<T. Then, sup|xdx𝒱1,(m)(t,x)|2gC(T)\displaystyle\sup{}_{x\in\mathbb{R}^{d}}|\partial_{x}{\cal V}^{1,(m)}(t,x)|\leq\|\nabla^{2}g\|_{\infty}C(T) for t<T\displaystyle t<T, i.e. 𝒱1,(m)(t,)Cb1(d)\displaystyle{\cal V}^{1,(m)}(t,\cdot)\in C_{b}^{1}(\mathbb{R}^{d}), t<T\displaystyle t<T. \displaystyle\Box

A.2 Proof of Proposition 2

For the derivations, we use Malliavin calculus. Let 𝒯𝒮(d)\displaystyle{\cal T}\in\mathcal{S}^{\prime}(\mathbb{R}^{d}) be a tempered distribution and F(𝔻)d\displaystyle F\in(\mathbb{D}^{\infty})^{d} be a nondegenerate Wiener functional in the sense of Malliavin. Then, 𝒯(F)\displaystyle{\cal T}(F) is well-defined as an element of the space of Watanabe distributions 𝔻\displaystyle\mathbb{D}^{-\infty}, that is the dual space of 𝔻\displaystyle\mathbb{D}^{\infty}. Also, for G𝔻\displaystyle G\in\mathbb{D}^{\infty}, a (generalized) expectation E[𝒯(F)G]\displaystyle E[{\cal T}(F)G] is understood as a coupling of 𝒯(F)𝔻\displaystyle{\cal T}(F)\in\mathbb{D}^{-\infty} and G𝔻\displaystyle G\in\mathbb{D}^{\infty}, namely 𝒯(F),G𝔻𝔻\displaystyle{}_{\mathbb{D}^{-\infty}}\langle{\cal T}(F),G\rangle_{\mathbb{D}^{\infty}}.

Note that GTt,x,ε:=(XTt,x,εXTt,x,0)/ε\displaystyle G_{T}^{t,x,\varepsilon}:=(X_{T}^{t,x,\varepsilon}-X_{T}^{t,x,0})/\varepsilon and (/x)XTt,x,ε\displaystyle(\partial/\partial x)X_{T}^{t,x,\varepsilon} in

𝒰1(t,x)=E[g(XTt,x,ε)]=dg(XTt,x,0+εy)E[δy(GTt,x,ε)]𝑑y\displaystyle\displaystyle{\cal U}^{1}(t,x)=E[g(X_{T}^{t,x,\varepsilon})]=\int_{\mathbb{R}^{d}}g(X_{T}^{t,x,0}+\varepsilon y)E[\delta_{y}(G_{T}^{t,x,\varepsilon})]dy (A.1)

and

(/x)𝒰1σε(t,x)=E[(g)(XTt,x,ε)(/x)XTt,x,ε]εσ(t,x)\displaystyle\displaystyle(\partial/\partial x){\cal U}^{1}\sigma^{\varepsilon}(t,x)=E[(\nabla g)(X_{T}^{t,x,\varepsilon})(\partial/\partial x)X_{T}^{t,x,\varepsilon}]\varepsilon\sigma(t,x) (A.2)
=di=1d(ig)(XTt,x,0+εy)E[δy(GTt,x,ε)(/x)XTt,x,ε,i]dyεσ(t,x)\displaystyle\displaystyle=\int_{\mathbb{R}^{d}}\sum_{i=1}^{d}(\partial_{i}g)(X_{T}^{t,x,0}+\varepsilon y)E[\delta_{y}(G_{T}^{t,x,\varepsilon})(\partial/\partial x)X_{T}^{t,x,\varepsilon,i}]dy\varepsilon\sigma(t,x) (A.3)

have expansions in 𝔻\displaystyle\mathbb{D}^{\infty} whose expansion coefficients are given by iterated stochastic integrals: GTt,x,εX1,Tt,x+εX2,Tt,x+\displaystyle G_{T}^{t,x,\varepsilon}\sim X_{1,T}^{t,x}+\varepsilon X_{2,T}^{t,x}+\cdots and (/x)XTt,x,εJtT0,x+εJtT1,x+\displaystyle(\partial/\partial x)X_{T}^{t,x,\varepsilon}\sim J_{t\to T}^{0,x}+\varepsilon J_{t\to T}^{1,x}+\cdots. In particular,

X1,Tt,x\displaystyle\displaystyle X_{1,T}^{t,x} =k=1dtTJtT0,x(Jts0,x)1σk(Xst,x,0)𝑑Wsk,\displaystyle\displaystyle=\sum_{k=1}^{d}\int_{t}^{T}J_{t\to T}^{0,x}(J_{t\to s}^{0,x})^{-1}\sigma_{k}(X_{s}^{t,x,0})dW_{s}^{k}, (A.4)
X2,Tt,x\displaystyle\displaystyle X_{2,T}^{t,x} =k=1dtTJtT0,x(Jts0,x)1σk(Xst,x,0)X1,st,xdWsk\displaystyle\displaystyle=\sum_{k=1}^{d}\int_{t}^{T}J_{t\to T}^{0,x}(J_{t\to s}^{0,x})^{-1}\partial\sigma_{k}(X_{s}^{t,x,0})X_{1,s}^{t,x}dW_{s}^{k} (A.5)
+12tTJtT0,x(Jts0,x)12μ(Xst,x,0)X1,st,xX1,st,xds,\displaystyle\displaystyle\ \ \ \ +\frac{1}{2}\int_{t}^{T}J_{t\to T}^{0,x}(J_{t\to s}^{0,x})^{-1}\partial^{2}\mu(X_{s}^{t,x,0})\cdot X_{1,s}^{t,x}\otimes X_{1,s}^{t,x}ds, (A.6)

and

JtT0,x=(/x)XTt,x,0,\displaystyle\displaystyle J_{t\to T}^{0,x}=(\partial/\partial x)X_{T}^{t,x,0}, (A.7)
JtT1,x=JtT0,x{tT2μ(Xst,x,0)X1,st,xds+k=1dtTσk(Xst,x,0)dWsk},\displaystyle\displaystyle J_{t\to T}^{1,x}=J_{t\to T}^{0,x}\{\int_{t}^{T}\partial^{2}\mu(X_{s}^{t,x,0})X_{1,s}^{t,x}ds+\sum_{k=1}^{d}\int_{t}^{T}\partial\sigma_{k}(X_{s}^{t,x,0})dW_{s}^{k}\}, (A.8)

where the following notation is used: for a smooth function V:dd\displaystyle V:\mathbb{R}^{d}\to\mathbb{R}^{d},

2V(x)=[2Vi(x)xjxk]j,ki,\partial^{2}V(x)=\left[\partialderivative{V^{i}(x)}{x^{j}}{x^{k}}\right]^{i}_{j,k}, (A.9)
[2Vηη]i=j,k2Vixjxkηjηk,ηd.\left[\partial^{2}V\cdot\eta\otimes\eta\right]^{i}=\sum_{j,k}\partialderivative{V^{i}}{x^{j}}{x^{k}}\eta^{j}\eta^{k},\ \ \ \eta\in\mathbb{R}^{d}. (A.10)

We expand E[δy(GTt,x,ε)]\displaystyle E[\delta_{y}(G_{T}^{t,x,\varepsilon})] in (A.1) and E[δy(GTt,x,ε)(/x)XTt,x,ε]\displaystyle E[\delta_{y}(G_{T}^{t,x,\varepsilon})(\partial/\partial x)X_{T}^{t,x,\varepsilon}] in (A.3) to obtain explicit expressions of 𝒰1,(1)(t,x)\displaystyle{\cal U}^{1,(1)}(t,x) and 𝒱1,(1)(t,x)\displaystyle{\cal V}^{1,(1)}(t,x). Next, let us recall the following formulas.

Lemma 1.

Let 𝒯𝒮(d)\displaystyle{\cal T}\in\mathcal{S}^{\prime}(\mathbb{R}^{d}) be a tempered distribution.

  1.

    For an adapted process hL2([0,T]×Ω)\displaystyle h\in L^{2}([0,T]\times\Omega),

    j=1dE[j𝒯(X1,Tt,x)tT(Di,sX1,Tt,x,j)h(s)𝑑s]=E[𝒯(X1,Tt,x)tTh(s)𝑑Wsi],\displaystyle\displaystyle\sum_{j=1}^{d}E[\partial_{j}{\cal T}({X}_{1,T}^{t,x})\int_{t}^{T}(D_{i,s}{X}_{1,T}^{t,x,j})h(s)ds]=E[{\cal T}({X}_{1,T}^{t,x})\int_{t}^{T}h(s)dW_{s}^{i}], (A.11)

    where Di,F\displaystyle D_{i,\cdot}F represents the i\displaystyle i-th element of the Malliavin derivative
    DF=(D1,F,,Dd,F)\displaystyle D_{\cdot}F=(D_{1,\cdot}F,\cdots,D_{d,\cdot}F) for F𝔻\displaystyle F\in\mathbb{D}^{\infty}.

  2.

    For 1i1,,id\displaystyle 1\leq i_{1},\cdots,i_{\ell}\leq d,

    E[(i1i𝒯)(X1,Tt,x)]=E[𝒯(X1,Tt,x)H(i1,,i)(X1,Tt,x,1)].\displaystyle\displaystyle E[(\partial_{{i_{1}}}\cdots\partial_{{i_{\ell}}}{\cal T})({X}_{1,T}^{t,x})]=E[{\cal T}({X}_{1,T}^{t,x})H_{(i_{1},\cdots,i_{\ell})}({X}_{1,T}^{t,x},1)].\ \ \ \ \ (A.12)

Proof of Lemma 1. Use the duality formula (see Theorem 1.26 of [24] or Proposition 1.3.11 of [28]), with D𝒯(Ξ)=i=1d(i𝒯)(Ξ)DΞi\displaystyle\textstyle{D{\cal T}(\Xi)=\sum_{i=1}^{d}(\partial_{i}{\cal T})(\Xi)D\Xi^{i}} for Ξ=(Ξ1,,Ξd)(𝔻)d\displaystyle\Xi=(\Xi^{1},\cdots,\Xi^{d})\in(\mathbb{D}^{\infty})^{d} (see Proof of Proposition 2.1.9 of [28] or Proof of Theorem 2.6 of [33]), to obtain the first assertion. The second assertion follows immediately by integration by parts. \displaystyle\Box
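To fix ideas, consider the case d = 1: X_{1,T}^{t,x} in (A.4) is then a Wiener integral of a deterministic integrand, hence a centered Gaussian with some variance Σ(t,T,x), and the weight in (A.12) reduces to the familiar first Hermite weight H_{(1)}(X_{1,T}^{t,x},1) = X_{1,T}^{t,x}/Σ(t,T,x), so that (A.12) becomes the Gaussian integration by parts E[𝒯′(X_{1,T}^{t,x})] = E[𝒯(X_{1,T}^{t,x})X_{1,T}^{t,x}/Σ(t,T,x)].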

In the expansions of (A.1) and (A.3), iterated integrals such as

tThj1(t1)tt1hj2(t2)𝑑Wt2j2𝑑Wt1j1(hjlL2([0,T]),l=1,2,j1,j2=1,,d)\displaystyle\displaystyle\int_{t}^{T}h_{j_{1}}(t_{1})\int_{t}^{t_{1}}h_{j_{2}}(t_{2})dW_{t_{2}}^{j_{2}}dW_{t_{1}}^{j_{1}}\ \ (h_{j_{l}}\in L^{2}([0,T]),\ l=1,2,\ j_{1},j_{2}=1,\cdots,d) (A.13)

appear, for which the following calculation holds by using (A.11):

i1E[i1𝒯(X1,Tt,x)tThj1(t1)tt1hj2(t2)𝑑Wt2j2𝑑Wt1j1]\displaystyle\displaystyle\sum_{i_{1}}E[\partial_{i_{1}}{\cal T}({X}_{1,T}^{t,x})\int_{t}^{T}h_{j_{1}}(t_{1})\int_{t}^{t_{1}}h_{j_{2}}(t_{2})dW_{t_{2}}^{j_{2}}dW_{t_{1}}^{j_{1}}] (A.14)
=\displaystyle\displaystyle= i1,i2E[i2i1𝒯(X1,Tt,x)tT(Dj1,t1X1,Tt,x,i2)hj1(t1)tt1hj2(t2)𝑑Wt2j2𝑑t1]\displaystyle\displaystyle\sum_{i_{1},i_{2}}E[\partial_{i_{2}}\partial_{i_{1}}{\cal T}({X}_{1,T}^{t,x})\int_{t}^{T}(D_{j_{1},t_{1}}{X}_{1,T}^{t,x,i_{2}})h_{j_{1}}(t_{1})\int_{t}^{t_{1}}h_{j_{2}}(t_{2})dW_{t_{2}}^{j_{2}}dt_{1}]
=\displaystyle\displaystyle= i1,i2tT(Dj1,t1X1,Tt,x,i2)hj1(t1)E[i2i1𝒯(X1,Tt,x)tt1hj2(t2)𝑑Wt2j2]𝑑t1\displaystyle\displaystyle\sum_{i_{1},i_{2}}\int_{t}^{T}(D_{j_{1},t_{1}}{X}_{1,T}^{t,x,i_{2}})h_{j_{1}}(t_{1})E[\partial_{i_{2}}\partial_{i_{1}}{\cal T}({X}_{1,T}^{t,x})\int_{t}^{t_{1}}h_{j_{2}}(t_{2})dW_{t_{2}}^{j_{2}}]dt_{1}
=\displaystyle\displaystyle= i1,i2,i3tT(Dj1,t1X1,Tt,x,i2)hj1(t1)E[i3i2i1𝒯(X1,Tt,x)tt1(Dj2,t2X1,Tt,x,i3)hj2(t2)𝑑t2]𝑑t1\displaystyle\displaystyle\sum_{i_{1},i_{2},i_{3}}\int_{t}^{T}(D_{j_{1},t_{1}}{X}_{1,T}^{t,x,i_{2}})h_{j_{1}}(t_{1})E[\partial_{i_{3}}\partial_{i_{2}}\partial_{i_{1}}{\cal T}({X}_{1,T}^{t,x})\int_{t}^{t_{1}}(D_{j_{2},t_{2}}{X}_{1,T}^{t,x,i_{3}})h_{j_{2}}(t_{2})d{t_{2}}]dt_{1}
=\displaystyle\displaystyle= i1,i2,i3E[i3i2i1𝒯(X1,Tt,x)]tT(Dj1,t1X1,Tt,x,i2)hj1(t1)tt1(Dj2,t2X1,Tt,x,i3)hj2(t2)𝑑t2𝑑t1.\displaystyle\displaystyle\sum_{i_{1},i_{2},i_{3}}E[\partial_{i_{3}}\partial_{i_{2}}\partial_{i_{1}}{\cal T}({X}_{1,T}^{t,x})]\int_{t}^{T}(D_{j_{1},t_{1}}{X}_{1,T}^{t,x,i_{2}})h_{j_{1}}(t_{1})\int_{t}^{t_{1}}(D_{j_{2},t_{2}}{X}_{1,T}^{t,x,i_{3}})h_{j_{2}}(t_{2})d{t_{2}}dt_{1}.

Note that sDj,sX1,Tt,x,i\displaystyle s\mapsto D_{j,s}{X}_{1,T}^{t,x,i} is deterministic, and one has

Dj,sX1,Tt,x,i=[JtT0,xJts0,x1σj(s,Xst,x,0)]i.\displaystyle\displaystyle D_{j,s}{X}_{1,T}^{t,x,i}=[J^{0,x}_{t\to T}{J^{0,x}_{t\to s}}^{-1}\sigma_{j}(s,X_{s}^{t,x,0})]^{i}. (A.15)

Thus, we get

i1E[i1𝒯(X1,Tt,x)tThj1(t1)tt1hj2(t2)𝑑Wt2j2𝑑Wt1j1]\displaystyle\displaystyle\sum_{i_{1}}E[\partial_{i_{1}}{\cal T}({X}_{1,T}^{t,x})\int_{t}^{T}h_{j_{1}}(t_{1})\int_{t}^{t_{1}}h_{j_{2}}(t_{2})dW_{t_{2}}^{j_{2}}dW_{t_{1}}^{j_{1}}] (A.16)
=\displaystyle\displaystyle= i1,i2,i3E[i3i2i1𝒯(X1,Tt,x)]\displaystyle\displaystyle\sum_{i_{1},i_{2},i_{3}}E[\partial_{i_{3}}\partial_{i_{2}}\partial_{i_{1}}{\cal T}({X}_{1,T}^{t,x})] (A.17)
tT[JtT0,x(Jtt10,x)1σj1(t1,Xt1t,x,0)]i2hj1(t1)tt1[JtT0,x(Jtt20,x)1σj2(t2,Xt2t,x,0)]i3hj2(t2)𝑑t2𝑑t1.\displaystyle\displaystyle\int_{t}^{T}[J^{0,x}_{t\to T}(J^{0,x}_{t\to t_{1}})^{-1}\sigma_{j_{1}}(t_{1},X_{t_{1}}^{t,x,0})]^{i_{2}}h_{j_{1}}(t_{1})\int_{t}^{t_{1}}[J^{0,x}_{t\to T}(J^{0,x}_{t\to t_{2}})^{-1}\sigma_{j_{2}}(t_{2},X_{t_{2}}^{t,x,0})]^{i_{3}}h_{j_{2}}(t_{2})d{t_{2}}dt_{1}. (A.18)

Using the above calculation with (A.12), we have

i1E[i1𝒯(X1,Tt,x)εX2,Tt,x,i1]\displaystyle\displaystyle\sum_{i_{1}}E[\partial_{i_{1}}{\cal T}({X}_{1,T}^{t,x})\varepsilon X_{2,T}^{t,x,i_{1}}] (A.19)
=\displaystyle\displaystyle= εi1,i2,i3,j1,k1,k2E[i3i2i1𝒯(X1,Tt,x)]Ci1,i2,i3,j1(1),k1,k2(t,T,x)\displaystyle\displaystyle\varepsilon\sum_{i_{1},i_{2},i_{3},j_{1},k_{1},k_{2}}E[\partial_{i_{3}}\partial_{i_{2}}\partial_{i_{1}}{\cal T}({X}_{1,T}^{t,x})]C_{i_{1},i_{2},i_{3},j_{1}}^{(1),k_{1},k_{2}}(t,T,x) (A.20)
+εi1,i2,i3,j1,j2,k1,k2E[i3i2i1𝒯(X1,Tt,x)]Ci1,i2,i3,j1,j2(2),k1,k2(t,T,x)\displaystyle\displaystyle+\varepsilon\sum_{i_{1},i_{2},i_{3},j_{1},j_{2},k_{1},k_{2}}E[\partial_{i_{3}}\partial_{i_{2}}\partial_{i_{1}}{\cal T}({X}_{1,T}^{t,x})]C_{i_{1},i_{2},i_{3},j_{1},j_{2}}^{(2),k_{1},k_{2}}(t,T,x) (A.21)
+εi1,j1,j2,k1,k2E[i1𝒯(X1,Tt,x)]121k1=k2Ci1,j1,j2(3),k1,k2(t,T,x)\displaystyle\displaystyle+\varepsilon\sum_{i_{1},j_{1},j_{2},k_{1},k_{2}}E[\partial_{i_{1}}{\cal T}({X}_{1,T}^{t,x})]\frac{1}{2}1_{k_{1}=k_{2}}C_{i_{1},j_{1},j_{2}}^{(3),k_{1},k_{2}}(t,T,x) (A.22)
=\displaystyle\displaystyle= εi1,i2,i3,j1,k1,k2E[𝒯(X1,Tt,x)H(i1,i2,i3)(X1,Tt,x,1)]Ci1,i2,i3,j1(1),k1,k2(t,T,x)\displaystyle\displaystyle\varepsilon\sum_{i_{1},i_{2},i_{3},j_{1},k_{1},k_{2}}E[{\cal T}({X}_{1,T}^{t,x})H_{(i_{1},i_{2},i_{3})}({X}_{1,T}^{t,x},1)]C_{i_{1},i_{2},i_{3},j_{1}}^{(1),k_{1},k_{2}}(t,T,x) (A.23)
+εi1,i2,i3,j1,j2,k1,k2E[𝒯(X1,Tt,x)H(i1,i2,i3)(X1,Tt,x,1)]Ci1,i2,i3,j1,j2(2),k1,k2(t,T,x)\displaystyle\displaystyle+\varepsilon\sum_{i_{1},i_{2},i_{3},j_{1},j_{2},k_{1},k_{2}}E[{\cal T}({X}_{1,T}^{t,x})H_{(i_{1},i_{2},i_{3})}({X}_{1,T}^{t,x},1)]C_{i_{1},i_{2},i_{3},j_{1},j_{2}}^{(2),k_{1},k_{2}}(t,T,x) (A.24)
+εi1,j1,j2,k1,k2E[𝒯(X1,Tt,x)H(i1)(X1,Tt,x,1)]121k1=k2Ci1,j1,j2(3),k1,k2(t,T,x).\displaystyle\displaystyle+\varepsilon\sum_{i_{1},j_{1},j_{2},k_{1},k_{2}}E[{\cal T}({X}_{1,T}^{t,x})H_{(i_{1})}({X}_{1,T}^{t,x},1)]\frac{1}{2}1_{k_{1}=k_{2}}C_{i_{1},j_{1},j_{2}}^{(3),k_{1},k_{2}}(t,T,x). (A.25)

Therefore, we get (3.16) as:

𝒰1,(1)(t,x)=\displaystyle\displaystyle{\cal U}^{1,(1)}(t,x)= E[g(X¯Tt,x,ε)]\displaystyle\displaystyle E[g(\overline{X}_{T}^{t,x,\varepsilon})] (A.26)
+εi1,i2,i3,j1,k1,k2E[g(X¯Tt,x,ε)H(i1,i2,i3)(X1,Tt,x,1)]Ci1,i2,i3,j1(1),k1,k2(t,T,x)\displaystyle\displaystyle+\varepsilon\sum_{i_{1},i_{2},i_{3},j_{1},k_{1},k_{2}}E[g(\overline{X}_{T}^{t,x,\varepsilon})H_{(i_{1},i_{2},i_{3})}({X}_{1,T}^{t,x},1)]C_{i_{1},i_{2},i_{3},j_{1}}^{(1),k_{1},k_{2}}(t,T,x) (A.27)
+εi1,i2,i3,j1,j2,k1,k2E[g(X¯Tt,x,ε)H(i1,i2,i3)(X1,Tt,x,1)]Ci1,i2,i3,j1,j2(2),k1,k2(t,T,x)\displaystyle\displaystyle+\varepsilon\sum_{i_{1},i_{2},i_{3},j_{1},j_{2},k_{1},k_{2}}E[g(\overline{X}_{T}^{t,x,\varepsilon})H_{(i_{1},i_{2},i_{3})}({X}_{1,T}^{t,x},1)]C_{i_{1},i_{2},i_{3},j_{1},j_{2}}^{(2),k_{1},k_{2}}(t,T,x) (A.28)
+εi1,j1,j2,k1,k2E[g(X¯Tt,x,ε)H(i1)(X1,Tt,x,1)]121k1=k2Ci1,j1,j2(3),k1,k2(t,T,x).\displaystyle\displaystyle+\varepsilon\sum_{i_{1},j_{1},j_{2},k_{1},k_{2}}E[g(\overline{X}_{T}^{t,x,\varepsilon})H_{(i_{1})}({X}_{1,T}^{t,x},1)]\frac{1}{2}1_{k_{1}=k_{2}}C_{i_{1},j_{1},j_{2}}^{(3),k_{1},k_{2}}(t,T,x). (A.29)

Next, we give the representation (3.17). The function (/x)𝒰1σε\displaystyle(\partial/\partial x){\cal U}^{1}\sigma^{\varepsilon} given by

(/x)𝒰1σε(t,x)=E[(g)(XTt,x,ε)(/x)XTt,x,ε]εσ(t,x)\displaystyle\displaystyle(\partial/\partial x){\cal U}^{1}\sigma^{\varepsilon}(t,x)=E[(\nabla g)(X_{T}^{t,x,\varepsilon})(\partial/\partial x)X_{T}^{t,x,\varepsilon}]\varepsilon\sigma(t,x) (A.30)
=di=1d(ig)(XTt,x,0+εy)E[δy(GTt,x,ε)(/x)XTt,x,ε,i]dyεσ(t,x)\displaystyle\displaystyle=\int_{\mathbb{R}^{d}}\sum_{i=1}^{d}(\partial_{i}g)(X_{T}^{t,x,0}+\varepsilon y)E[\delta_{y}(G_{T}^{t,x,\varepsilon})(\partial/\partial x)X_{T}^{t,x,\varepsilon,i}]dy\varepsilon\sigma(t,x) (A.31)

is expanded as

𝒱1,(1)(t,x)\displaystyle\displaystyle{\cal V}^{1,(1)}(t,x) (A.32)
=1εE[i1g(XTt,x,0+εX1,Tt,x)H(i1)(X1,Tt,x,1)]JtT0,xεσ(t,x)\displaystyle\displaystyle=\frac{1}{\varepsilon}E[\sum_{i_{1}}g(X_{T}^{t,x,0}+\varepsilon X_{1,T}^{t,x})H_{(i_{1})}(X_{1,T}^{t,x},1)]J_{t\to T}^{0,x}\ \varepsilon\sigma(t,x) (A.33)
+E[g(XTt,x,0+εX1,Tt,x)H(i1)(X1,Tt,x,JtT1,x)]εσ(t,x)\displaystyle\displaystyle\quad+E[g(X_{T}^{t,x,0}+\varepsilon X_{1,T}^{t,x})H_{(i_{1})}(X_{1,T}^{t,x},J_{t\to T}^{1,x})]\ \varepsilon\sigma(t,x) (A.34)
+i1,i2E[g(XTt,x,0+εX1,Tt,x)H(i2)(X1,Tt,x,H(i1)(X1,Tt,x,X2,Tt,x))]JtT0,xεσ(t,x),\displaystyle\displaystyle\quad+\sum_{i_{1},i_{2}}E[g(X_{T}^{t,x,0}+\varepsilon X_{1,T}^{t,x})H_{(i_{2})}(X_{1,T}^{t,x},H_{(i_{1})}(X_{1,T}^{t,x},X_{2,T}^{t,x}))]J_{t\to T}^{0,x}\ \varepsilon\sigma(t,x), (A.35)

where the following relationship is taken into account:

H(i)(XTt,x,0+εX1,Tt,x,1)=H(i)(X1,Tt,x,1)/ε,i=1,,d.\displaystyle\displaystyle H_{(i)}(X_{T}^{t,x,0}+\varepsilon X_{1,T}^{t,x},1)=H_{(i)}({X}_{1,T}^{t,x},1)/\varepsilon,\ \ \ i=1,\cdots,d. (A.36)

Then, a calculation similar to (A.14), combined with (A.12), gives the representation (3.17).  \displaystyle\Box

Acknowledgements

This work is supported by JSPS KAKENHI (Grant Number 19K13736) and JST PRESTO (Grant Number JPMJPR2029), Japan.

References

  • [1] C. Beck, F. Hornung, M. Hutzenthaler, A. Jentzen and T. Kruse, Overcoming the curse of dimensionality in the numerical approximation of Allen-Cahn partial differential equations via truncated full-history recursive multilevel Picard approximations, Journal of Numerical Mathematics (2020)
  • [2] J. Berner, P. Grohs and A. Jentzen, Analysis of the generalization error: Empirical risk minimization over deep artificial neural networks overcomes the curse of dimensionality in the numerical approximation of Black–Scholes partial differential equations, SIAM Journal on Mathematics of Data Science, 2(3), 631-657 (2020)
  • [3] Weinan E, J. Han and A. Jentzen, Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations, Communications in Mathematics and Statistics, 5(4) 349-380 (2017)
  • [4] Weinan E, J. Han and A. Jentzen, Algorithms for Solving High Dimensional PDEs: From Nonlinear Monte Carlo to Machine Learning, arXiv (2020)
  • [5] D. Elbrächter, P. Grohs, A. Jentzen and C. Schwab, DNN expression rate analysis of high-dimensional PDEs: Application to option pricing, arXiv (2018)
  • [6] N. El Karoui, S. Peng and M.C. Quenez, Backward stochastic differential equations in finance, Mathematical Finance, 7(1), 1-71 (1997)
  • [7] M. Fujii and A. Takahashi, Analytical approximation for non-linear FBSDEs with perturbation scheme, International Journal of Theoretical and Applied Finance, (2011)
  • [8] M. Fujii and A. Takahashi, Solving backward stochastic differential equations with quadratic-growth drivers by connecting the short-term expansions, Stochastic Processes and their Applications, 129(5) (2019)
  • [9] M. Fujii, A. Takahashi and M. Takahashi, Asymptotic expansion as prior knowledge in deep learning method for high dimensional BSDEs, Asia-Pacific Financial Markets, (2019)
  • [10] A. Gnoatto, C. Reisinger and A. Picarelli, Deep xVA solver-A neural network based counterparty credit risk management framework, SSRN (2020)
  • [11] P. Grohs, A. Jentzen and D. Salimova, Deep neural network approximations for Monte Carlo algorithms, arXiv (2019)
  • [12] P. Grohs, F. Hornung, A. Jentzen and P. Zimmermann, Space-time error estimates for deep neural network approximations for differential equations, arXiv (2019)
  • [13] M.B. Giles, A. Jentzen and T. Welti, Generalised multilevel Picard approximations, arXiv (2020)
  • [14] J. Han and J. Long, Convergence of the Deep BSDE method for coupled FBSDEs, Probability, Uncertainty and Quantitative Risk, 5(5) (2020)
  • [15] J. Han, J. Lu and M. Zhou, Solving high-dimensional eigenvalue problems using deep neural networks: A diffusion Monte Carlo like approach, Journal of Computational Physics, 423, 109792 (2020)
  • [16] J. Han, L. Zhang and Weinan E, Solving many-electron Schrödinger equation using deep neural networks, Journal of Computational Physics, 399, 108929 (2019)
  • [17] F. Hornung, A. Jentzen and D. Salimova, Space-time deep neural network approximations for high-dimensional partial differential equations, arXiv (2020)
  • [18] M. Hutzenthaler, A. Jentzen, T. Kruse, T.A. Nguyen and P.V. Wurstemberger, Overcoming the curse of dimensionality in the numerical approximation of semilinear parabolic partial differential equations, arXiv (2018)
  • [19] N. Ikeda and S. Watanabe, Stochastic Differential Equations and Diffusion Processes, 2nd ed., North-Holland, Amsterdam, Kodansha, Tokyo (1989)
  • [20] Y. Iguchi and T. Yamada, Operator splitting around Euler-Maruyama scheme and high order discretization of heat kernels, ESAIM: Mathematical Modelling and Numerical Analysis, to appear (2020)
  • [21] N. Kunitomo and A. Takahashi, The asymptotic expansion approach to the valuation of interest rate contingent claims, Mathematical Finance, 11, 117-151 (2001)
  • [22] N. Kunitomo and A. Takahashi, On validity of the asymptotic expansion approach in contingent claim analysis, Annals of Applied Probability, 13(3), 914-952 (2003)
  • [23] Y. Li, J. Lu and A. Mao, Variational training of neural network approximations of solution maps for physical models, Journal of Computational Physics, 409, 109338 (2020)
  • [24] P. Malliavin and A. Thalmaier, Stochastic Calculus of Variations in Mathematical Finance, Springer (2006)
  • [25] R. Matsuoka, A. Takahashi and Y. Uchida, A new computational scheme for computing Greeks by the asymptotic expansion approach, Asia Pacific Financial Markets, 11, 393-430 (2006)
  • [26] R. Naito and T. Yamada, A third-order weak approximation of multidimensional Itô stochastic differential equations, Monte Carlo Methods and Applications, 25(2), 97-120 (2019)
  • [27] R. Naito and T. Yamada, An acceleration scheme for deep learning-based BSDE solver using weak expansions, International Journal of Financial Engineering, (2020)
  • [28] D. Nualart, The Malliavin Calculus and Related Topics, Springer (2006)
  • [29] Y. Okano and T. Yamada, A control variate method for weak approximation of SDEs via discretization of numerical error of asymptotic expansion, Monte Carlo Methods and Applications, 25(3) (2019)
  • [30] J. Sirignano and K. Spiliopoulos, DGM: A deep learning algorithm for solving partial differential equations, Journal of Computational Physics, 375, 1339-1364 (2018)
  • [31] A. Takahashi, An asymptotic expansion approach to pricing financial contingent claims, Asia-Pacific Financial Markets, 6(2), 115-151 (1999)
  • [32] A. Takahashi, Asymptotic expansion approach in finance, Large Deviations and Asymptotic Methods in Finance (P. Friz, J. Gatheral, A. Gulisashvili, A. Jacquier and J. Teichmann ed.), Springer Proceedings in Mathematics & Statistics (2015)
  • [33] A. Takahashi and T. Yamada, An asymptotic expansion with push-down of Malliavin weights, SIAM Journal on Financial Mathematics, 3, 95-136 (2012)
  • [34] A. Takahashi and T. Yamada, An asymptotic expansion of forward-backward SDEs with a perturbed driver, International Journal of Financial Engineering, 2(2) (2015)
  • [35] A. Takahashi and T. Yamada, A weak approximation with asymptotic expansion and multidimensional Malliavin weights, Annals of Applied Probability, 26(2), 818-856 (2016)
  • [36] A. Takahashi and N. Yoshida, Monte Carlo simulation with asymptotic method, Journal of the Japan Statistical Society 35(2), 171-203 (2005)
  • [37] K. Tokutome and T. Yamada, Acceleration of automatic differentiation of solutions to parabolic partial differential equations: a higher order discretization, Numerical Algorithms, 86, 593-635 (2021)
  • [38] S. Watanabe, Analysis of Wiener functionals (Malliavin calculus) and its applications to heat kernels, Annals of Probability, 15, 1-39 (1987)
  • [39] T. Yamada, An arbitrary high order weak approximation of SDE and Malliavin Monte Carlo: application to probability distribution functions, SIAM Journal on Numerical Analysis, 57(2), 563-591 (2019)
  • [40] T. Yamada and K. Yamamoto, Second order discretization of Bismut-Elworthy-Li formula: application to sensitivity analysis, SIAM/ASA Journal on Uncertainty Quantification, 7(1), 143-173 (2019)
  • [41] Y. Zang, G. Bao, X. Ye and H. Zhou, Weak adversarial networks for high-dimensional partial differential equations, Journal of Computational Physics, 411 (2020)
  • [42] J. Zhang, Backward Stochastic Differential Equations, Springer (2017)