
From Monte Carlo to neural networks approximations
of boundary value problems

Lucian Beznea^{2,1} (E-mail: lucian.beznea@imar.ro), Iulian Cîmpean^{1,2} (E-mails: iulian.cimpean@unibuc.ro; iulian.cimpean@imar.ro), Oana Lupaşcu-Stamate^{3} (E-mail: oana.lupascu_stamate@yahoo.com), Ionel Popescu^{1,2} (E-mails: ioionel@gmail.com, ionel.popescu@fmi.unibuc.ro), Arghir Zarnescu^{4,2,5} (E-mail: azarnescu@bcamath.org)
(1University of Bucharest, Faculty of Mathematics and Computer Science,
14 Academiei str., 70109, Bucharest, Romania
2 Simion Stoilow Institute of Mathematics of the Romanian Academy,
P.O. Box 1-764, RO-014700 Bucharest, Romania
3“Gheorghe Mihoc – Caius Iacob” Institute of Mathematical Statistics and Applied Mathematics
of the Romanian Academy, 13 Calea 13 Septembrie, 050711 Bucharest, Romania
4 BCAM, Basque Center for Applied Mathematics, Mazarredo 14, E48009 Bilbao, Bizkaia, Spain
5 IKERBASQUE, Basque Foundation for Science, Maria Diaz de Haro 3, 48013, Bilbao, Bizkaia, Spain
)
Abstract

In this paper we study probabilistic and neural network approximations for solutions to the Poisson equation subject to Hölder data in general bounded domains of $\mathbb{R}^d$. We aim at two fundamental goals.

The first, and most important, is to show that the solution to the Poisson equation can be numerically approximated in the sup-norm by Monte Carlo methods, and that this can be done highly efficiently if we use a modified version of the walk-on-spheres algorithm as an acceleration method. This provides estimates which are efficient with respect to the prescribed approximation error and of polynomial complexity in the dimension and the reciprocal of the error. A crucial feature is that the overall number of samples does not depend on the point at which the approximation is performed.

As a second goal, we show that the obtained Monte Carlo solver renders in a constructive way ReLU deep neural network (DNN) solutions to the Poisson problem, whose sizes depend at most polynomially on the dimension $d$ and on the desired error. In fact, we show that the random DNN provides, with high probability, a small approximation error at low polynomial complexity in the dimension.

Keywords: Deep Neural Network (DNN); Walk-on-Spheres (WoS); Monte Carlo approximation; high-dimensional approximation; Poisson boundary value problem with Dirichlet boundary condition.

Mathematics Subject Classification (2020): 65C99, 68T07, 65C05.

1 Introduction

Partial differential equations provide the most commonly used framework for modelling a large variety of phenomena in science and technology. Using these models in practice requires fast, accurate and stable computations of solutions of PDEs. Broadly speaking there exist two large classes of simulations: deterministic and stochastic. The deterministic methods (e.g. finite differences, finite element methods, etc) are very effective in globally approximating the solutions but their computational effort grows exponentially with respect to the dimension of the space. On the other hand, the probabilistic methods manage to overcome the dimensionality issue, but they are usually employed to obtain approximations at a given point and changing the point requires a different approximation.

In recent years, another powerful class of methods has been developed, namely the (deep) neural network models (in short DNN). These have been able to provide a remarkable number of achievements in the technological realm, such as image classification, language processing and time series analysis, to name only a few. However, despite these achievements, their rigorous understanding is still in its infancy.

On the theoretical side it is known that DNNs are capable of providing good approximation properties for continuous functions [12, 15, 42, 52]. For a more recent in-depth analysis see [15] and the references therein. However, it should be noted that these approximations are not constructive, and indeed the issue of constructibility and error estimates is the crucial one from the practical point of view.

In what concerns the approximation of solutions of PDEs, there are numerical treatments in low dimensions, as for example in [34, 35, 37], which propose schemes for solving PDEs using some form of neural networks. However, these do not provide any error estimates. Their approaches depend on a grid discretization of the space, and for the most challenging case, namely that of simulations in high dimensions, there are no theoretical guarantees that the methods scale well. Typically, these approximations, though convincing, remain at the level of numerical experiments.

On the other hand, there is a rigorous body of literature which treats the approximation of solutions of PDEs by neural networks with error estimates, as, for example, in [43, 46, 37, 39, 23, 25], but typically these scale poorly in high dimensions. A discussion of some of these is provided in the next subsection.

Our aim in this paper is to study approximations of the Poisson equation and to provide DNNs built on stochastic approaches that address some of the shortcomings mentioned before, significantly advancing the state of the art and providing tools that can be extended to more general equations, at the cost of a suitable increase in the technical details. A discussion of our contribution is provided in Subsection 1.2.

1.1 On related previous work.

For the sake of comparison (see Subsection 1.2), let us present here a short review of existing works that we find strongly connected to our paper, trying to point out some limitations that are fundamentally addressed in our present work. For a comprehensive overview of deep learning methods for PDEs we refer to [6].

Monte Carlo methods for PDEs

The Monte Carlo method for solving linear PDEs has been well understood and intensively used for a long time. Let us mention e.g. [48] for the case of linear parabolic PDEs on $\mathbb{R}^d$, and [44] for linear elliptic PDEs in bounded domains; one issue that is encountered in all these classical works, and which is particularly crucial to us, is that the theoretical errors are point-dependent, in the sense that there is no guarantee that one can use the same Monte Carlo samples uniformly for all the locations in an open Euclidean domain where the solution is to be approximated. Recently, it was shown in [28] and [27] (see also [30, Theorem 1.1]) that a multilevel Picard Monte Carlo method can be derived in order to numerically approximate the solutions to semilinear parabolic PDEs without suffering from the curse of high dimensions; here as well, the obtained probabilistic errors are derived for a given fixed location where the exact solution is approximated. Monte Carlo methods have also been extended to fully nonlinear parabolic PDEs in $\mathbb{R}^d$, as for example in [16], where a mixture between the Monte Carlo method and the finite difference scheme is proposed; moreover, a locally uniform (in space) convergence of the proposed numerical scheme is obtained, but the issue that matters to us is that the required number of samples grows like $h^{-d}$ (see [16, Example 4.4]), where $d$ is the dimension whilst $h$ is the time discretization parameter. Also, because the space discretization is performed by a finite difference scheme on a uniform grid and the Monte Carlo sampling needs to be performed for each point in the grid, the algorithm complexity once more suffers exponentially with respect to the dimension.

DNNs for the Dirichlet problem on bounded domains

We mention two directions in the literature that aim at rigorously proving that DNNs can be used as numerical solvers without suffering from the curse of high dimensions. The first one is proposed in [23] and is the most closely related to our present work, so we shall frequently refer to it in what follows. The approach goes through the stochastic representation of the solution, and aims at designing a Monte Carlo sampler that can be used uniformly for all locations in the domain where the solution is approximated. However, some important issues remained open, such as the fact that the obtained estimates depend on the volume of the domain, which can grow exponentially with respect to the dimension; these are going to be discussed in detail later. The second approach, in [39], is rather different: it constructs the neural network progressively using a gradient descent method and then calculates the polynomial complexity of the neural network approximation. A main feature of this second approach is that the construction of the DNN solver uses the theoretical spectral decomposition of the differential operator, which is not numerically available, hence the obtained existence of the DNN approximator for the exact solution is of theoretical nature.

DNNs for (linear) Kolmogorov PDEs

In the case of linear parabolic PDEs, DNN solvers based on probabilistic representations have been proposed and numerically tested in [5]. A theoretical proof that DNNs are indeed able to approximate solutions to a class of linear Kolmogorov PDEs without suffering from the curse of dimensions has been provided in [31]. The strategy fundamentally aims at minimizing the error in $L^2(D;\lambda/\lambda(D))$ ($\lambda$ is the Lebesgue measure), hence the existence of a DNN that approximates the solution without the curse of dimensions is in fact obtained on domains whose volumes increase at most polynomially with respect to the space dimension. In [5], the authors include some numerical evidence that the $L^\infty(D)$-error also scales well with respect to the dimension, but some caution needs to be taken, as mentioned in [5, 4.7 Conclusion]. In the case of the heat equation on $\mathbb{R}^d$ it was proved in [21] that any solution with at most polynomial growth can be approximated in $L^\infty([a,b]^d)$ by a DNN whose size grows at most polynomially with respect to $d$, the reciprocal of the prescribed approximation error, and $\max(|a|,|b|)$; the authors rely heavily on the representation of the solution by a standard Brownian motion shifted to the location point where the solution is approximated. In our paper we deal with the Poisson problem in a bounded domain in $\mathbb{R}^d$, and the main difficulty, and difference at the same time, comes from the fact that the representing process (namely the Brownian motion stopped when it exits the domain) depends in a nonlinear way on the starting point where the solution is represented, and this dependence is strongly influenced by the geometry of the domain.

DNNs for semilinear parabolic PDEs on $\mathbb{R}^d$

In the case of the semilinear heat equation on $\mathbb{R}^d$ with gradient-independent nonlinearity, a rigorous proof that DNNs can be used as numerical solvers that do not suffer from the curse of dimensions can be found in [29, Theorem 1.1]. The approximation errors are considered in the $L^2([0,1]^d)$ sense, so in general, by a scaling argument, they depend on the volume of the domain where the solution is approximated. Deep learning methods for general semilinear parabolic PDEs on $\mathbb{R}^d$ have been proposed and efficiently tested in high dimensions in [25]. Rigorous proofs that these types of deep solvers are indeed capable of approximating solutions to general PDEs without suffering from the curse of dimensions are still waiting to be derived.

1.2 Our contribution

In the present work we are primarily interested in studying Monte Carlo and DNN numerical approximations for solutions to the Poisson boundary value problem (1.1) in bounded domains in $\mathbb{R}^d$, explicitly tracking the dependence of the Monte Carlo estimates, as well as the size of the corresponding neural networks, on the spatial dimension, the reciprocal of the accuracy, the regularity of the domain, and the prescribed source and boundary data. There are several key issues that we tackle throughout the paper, so let us briefly yet systematically point them out here:

Overcoming the curse of high dimensionality

We outline here that a very important consequence of our results concerns the breaking of the curse of high dimensions, in the sense that the size of the neural network approximating the solution $u$ to problem (1.1) adds at most a (low degree) polynomial complexity to the overall complexity (see Theorem (Part II) below) of the approximating networks for the distance function and the data. Moreover, as is typical in machine learning, we also show that, despite the fact that the neural network construction is random, it breaks the dimensionality curse with high probability. In terms of the dimension $d$, our main results state, in particular, that if the domain is sufficiently regular (e.g. convex) then the complexity of the Monte Carlo estimator of the exact solution to (1.1) scales at most like $d^3\log^4(d)$, whilst the DNN estimator of the same solution scales at most like $d^5\log^5(d)\,{\rm S}$, where ${\rm S}$ is a cumulative size of the DNNs used to approximate the given data and the distance function to the boundary of the domain. In contrast with the results from [23], our construction of the DNN approximators is explicit and such that their sizes do not depend on the volume of the domain. The estimates obtained herein should also be compared with the conclusion from [39], where the size of the network is $\mathcal{O}(d^{\log(1/\gamma)})$, with $\gamma$ the accuracy of the approximation, so that the degree grows with the allowed error. Also, the construction adopted in [39] is of theoretical nature, in the sense that it guarantees the existence of DNNs with the desired properties, but it is unclear how it could be implemented in practice. In contrast, our schemes can easily be implemented using GPU computing, as discussed in Section 5.

Low dimensions improvements for general bounded domains with Dirichlet data

We should point out that our approach also has interesting consequences in low dimensions. The key is that we can reuse the samples for the Monte Carlo solver to simultaneously approximate the solution at all points in the domain, and furthermore the number of steps required by the designed Walk-on-Spheres algorithm does not depend on the starting point; these two features make the proposed scheme highly parallelizable, and GPU computing can be employed very efficiently. Moreover, this works for arbitrary bounded domains with quantitative estimates, while for more regular domains one obviously obtains improved estimates.

General bounded domains and Hölder continuous data

Recall that in [23] the domain is assumed to be convex, whilst in [39] it is of class $C^\infty$. In the present paper it is shown that the curse of high dimensions can be overcome for a general class of domains, namely those that satisfy a uniform exterior ball condition. As a matter of fact, our results are even more general, covering the case of an arbitrary bounded domain in $\mathbb{R}^d$ (see 2.26), but then the estimates are given in terms of the behavior of the function $v_D$ defined in (2.3) in the proximity of the boundary of the domain. Concerning the regularity of the source and boundary data, our assumption is that they are merely Hölder continuous. Recall that in [23] the source and boundary data are assumed to be twice continuously differentiable.

$L^\infty(D)$ estimates

Recall that in [23] the accuracy is measured with respect to the $L^2$ norm. However, as pointed out in [39], the estimates actually depend on the volume of the domain $D$, hence they implicitly exhibit an exponential dependence on the dimension for sufficiently large domains. In the present work we estimate the errors in the uniform norm, which, on the one hand, gives much better results; on the other hand, we prove that the uniform norm of the error is small with large probability, whilst the approximation complexity depends on $D$ merely through its (annular) diameter. We emphasize that obtaining reliable estimates for the expectation or the tail probability of the Monte Carlo error computed in the uniform norm is in general highly nontrivial. For example, such estimates have only very recently been obtained for the (linear) heat equation in $\mathbb{R}^d$ in [22], after considerable effort. In our case, the difficulty arises mainly from the fact that we work in bounded domains. Nevertheless, our method is in some sense much simpler and could easily be transferred to other settings as well.

Walk-on-Spheres acceleration revisited

The walk on spheres (WoS) was introduced in [40] as a way of accelerating the calculation of integrals along the paths of Brownian motion. The standard walk-on-spheres uses the following scheme: take some $x\in D$; then, instead of simulating the entire Brownian motion trajectory, we start by uniformly choosing a point on the sphere of maximal radius centered at $x$ and contained in $D$. This step is then repeated until the current position enters some thin neighborhood of the boundary (the $\varepsilon$-shell), where the chain is stopped. Thus, the chain so constructed is used as an approximation for the Brownian motion started from $x$ and stopped at the exit time from the domain $D$; recall that the distribution of the latter variable is precisely the harmonic measure with pole $x$, hence it is used to represent the solution $u(x)$ to problem (1.1) with $f\equiv 0$. Here, we modify this algorithm in two respects. Firstly, our stopping rule for the walk on spheres is deterministic: we run the walk-on-spheres chain for a given number of steps, uniformly for all trajectories and all points in the domain. This is completely opposite to the stopping rule used in [23] and, in fact, to the one usually adopted in the literature. It has a number of advantages, but perhaps the most important one is that the neural network construction outlined here is explicit. In order to understand how large such a deterministic stopping time should be taken in order to achieve the desired estimates, we have to investigate various estimates, in less or more regular domains, for the number of steps needed for the walk-on-spheres chain to reach the $\varepsilon$-neighborhood of the boundary. Secondly, our walk-on-spheres scheme is performed with the maximal radius replaced by a more general radius, which is not necessarily maximal and is compatible with ReLU DNNs. Overall, we develop a generalized walk-on-spheres algorithm which is of interest in its own right and which is much more compatible with parallel computing than the classical scheme.

The core ideas are surprisingly simple and of general nature

Putting aside the WoS acceleration algorithm, the crucial ingredients are the following. The first one is that the Monte Carlo approximation $u_M^N(x)$ given by (1.5) of the exact solution $u$ to problem (1.1) is a.s. Hölder continuous with respect to $x$, yet with a Hölder constant which is exponentially large with respect to the number $M$ of WoS steps. The second one is to consider a uniform grid discretization of the domain $D$ and to approximate the solution $u$ in the sup-norm merely on this grid. The third ingredient is to employ the (otherwise very poor) Hölder regularity of $u_M^N$ to extrapolate the approximation from the grid to the entire domain. The grid needs to be taken extremely refined, and this immediately leads to an exponential complexity in terms of $M$ and the diameter of $D$. Now the fourth ingredient comes into play, namely we use Hoeffding's inequality combined with the union bound in order to efficiently compensate for the exponential complexity induced by the uniform grid. We emphasize that the grid discretization is just an instrument to prove the main result, which remains in fact grid-independent. This approach, which uses an auxiliary uniform grid whose induced complexity is compensated by a concentration inequality, is, as a matter of fact, simple and of general nature. Therefore, we expect that our approach can easily be employed for other classes of PDEs.

Universality with respect to given data

One useful feature of the estimator explicitly constructed in this paper is that it essentially approximates the operator that maps the data (source and boundary) of problem (1.1) into the corresponding solution $u$. In particular, it means that the DNN solvers constructed herein consist of the composition of two separate neural networks: one which approximates the source and boundary data, and one for the above-mentioned operator. In this light, the present method could be interpreted as an operator learning method, and once the operator is learned, the source and boundary data can be varied very easily, without any further training.

Explicit construction of the approximation

One key element of our approach is the explicit formulation of the approximation. This is reflected in formula (1.5), where all elements are fully determined. We should also stress that (1.5) is much simpler than a neural network and does not need any training. On the other hand, this structure can be exploited to initialize a DNN with significantly less complexity than the guaranteed Monte Carlo construction we provide. Once this initialization is done, we can train this network in order to further decrease the approximation error.

1.3 Brief technical description of the main results.

Our starting point is suggested by [23], which essentially builds on the stochastic representation of the solution to the Poisson equation. In turn, the stochastic representation is followed by the standard walk on spheres (WoS) method to accelerate the computation of the integrals along the Brownian trajectory. This is then used to construct neural network approximations. In the present paper we fundamentally expand, clarify, simplify and make explicit constructions starting from some of the ideas pointed out in [23]. An extension of [23] to the fractional Laplacian has been developed in [50], so the refined methods proposed here should essentially apply to [50] as well.

Now we descend into the description of our main results. We study the Poisson boundary value problem

\begin{cases}\frac{1}{2}\Delta u=-f\ \text{ in }D\\ u|_{\partial D}=g,\end{cases} \qquad (1.1)

where $D$ is a bounded domain in $\mathbb{R}^d$, whilst $f:D\to\mathbb{R}$ and $g:\overline{D}\to\mathbb{R}$ are given continuous functions. It is well known that there exists a unique solution $u\in C(D)\cap H^1_{loc}(D)$ to (1.1), see [18, Theorem 6] or 2.2 below. The fact that $u\in C(D)\cap H^1_{loc}(D)$ is a solution to (1.1) means that

\frac{1}{2}\int_D\langle\nabla u,\nabla\varphi\rangle\,dx=\int_D f\varphi\,dx\quad\forall\varphi\in C_c^\infty(D), \qquad (1.2)
\lim_{D\ni x\to x_0\in\partial D}u(x)=g(x_0),\quad\forall x_0\in\partial D\mbox{ which is a regular point for }D. \qquad (1.3)

In [23] the domain $D$ is taken to be convex. We treat several layers of generality for the domain $D$, which lead to different final results.

All the random variables employed in the sequel are assumed to be defined on the same probability space $(\Omega,\mathcal{F},\mathbb{P})$, whilst the expectation is denoted by $\mathbb{E}$. Further, let us consider the generalized walk-on-spheres process defined as

X^x_0:=x\in D,\quad X^x_{n+1}:=X^x_n+\widetilde{r}(X^x_n)U_{n+1},\;n\geq 0, \qquad (1.4)

where $x$ is the starting point in the domain $D$, the function $\widetilde{r}$ denotes the replacement of the distance function $r$ to the boundary $\partial D$, and the $U_i$ are drawn independently and identically distributed on the unit sphere in $\mathbb{R}^d$. Essentially, we use a Lipschitz $\widetilde{r}$ such that $\widetilde{r}\leq r$ on the whole domain and $\beta r\leq\widetilde{r}$ as long as $r\geq\varepsilon$. We call such a candidate a $(\beta,\varepsilon)$-distance, and these constants play an important role in the estimates below. In all cases we can take $\beta\geq 1/3$, as we point out in Remark 2.8 below. With this process at hand, we introduce the Monte Carlo estimator $u_M^N$ of the solution $u$ to problem (1.1) by

u_M^N(x):=\frac{1}{N}\sum_{i=1}^N\left[g(X^{x,i}_M)+\frac{1}{d}\sum_{k=1}^M\widetilde{r}^2(X^{x,i}_{k-1})\,f\!\left(X^{x,i}_{k-1}+\widetilde{r}(X^{x,i}_{k-1})Y^i\right)\right],\quad x\in D,\ M,N\geq 1. \qquad (1.5)

Here, the sequences $\{U_{n,i}\}_{n,i\geq 0}$ and $\{Y^i\}_{i\geq 0}$ are all independent, $U_{n,i}$ is drawn uniformly on the unit sphere, $X_n^{x,i}$ is given by (1.4) with $U_n$ replaced by $U_{n,i}$, whilst $Y^i$ is drawn on the unit ball in $\mathbb{R}^d$ from the distribution $\mu$ which has an explicit density proportional to $|y|^{2-d}-1$, $|y|<1$, if $d\geq 3$, and which is in fact the (normalized) Green kernel of the Laplacian on the unit ball with pole at $0$. It is easy to see that if $R$ and $Z$ are independent random variables such that $R$ has distribution $\mathrm{Beta}\big(\frac{4-d}{d-2},2\big)$ on $[0,1]$ and $Z$ is uniformly distributed on the unit sphere in $\mathbb{R}^d$, then $R^{\frac{1}{d-2}}Z$ has distribution $\mu$.
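To make (1.4) and (1.5) concrete, the following is a minimal NumPy sketch of the estimator at a single point; it is only an illustration under stated assumptions, not the implementation used in Section 5. The callables f, g and r_tilde (a $(\beta,\varepsilon)$-distance) are hypothetical user-supplied functions, assumed to be vectorized over batches of points, and the radial part of $\mu$ is sampled here by numerically inverting the radial cumulative distribution function derived from the density above (alternatively, one may use the representation just mentioned).

```python
import numpy as np

def sample_mu_radius(d, n, rng):
    """Sample radii for mu (density proportional to |y|^(2-d) - 1 on the unit ball,
    d >= 3) by bisecting the radial CDF F(s) = (d*s**2 - 2*s**d) / (d - 2)."""
    u = rng.random(n)
    lo, hi = np.zeros(n), np.ones(n)
    for _ in range(60):                       # bisection, accurate to ~2**-60
        mid = 0.5 * (lo + hi)
        below = (d * mid**2 - 2 * mid**d) / (d - 2) < u
        lo, hi = np.where(below, mid, lo), np.where(below, hi, mid)
    return 0.5 * (lo + hi)

def uniform_on_sphere(d, n, rng):
    """n i.i.d. points, uniformly distributed on the unit sphere of R^d."""
    G = rng.standard_normal((n, d))
    return G / np.linalg.norm(G, axis=1, keepdims=True)

def u_MN(x, f, g, r_tilde, M, N, rng=None):
    """Estimator (1.5) at a single point x: N independent r_tilde-WoS trajectories
    of exactly M steps each (deterministic stopping), one Y^i per trajectory."""
    rng = rng or np.random.default_rng()
    d = len(x)
    X = np.tile(np.asarray(x, float), (N, 1))                  # X_0 = x for all samples
    Y = sample_mu_radius(d, N, rng)[:, None] * uniform_on_sphere(d, N, rng)
    src = np.zeros(N)
    for _ in range(M):
        r = r_tilde(X)                                         # (beta, eps)-distance at X_{k-1}
        src += (r**2 / d) * f(X + r[:, None] * Y)              # source term of (1.5)
        X += r[:, None] * uniform_on_sphere(d, N, rng)         # WoS step (1.4)
    return np.mean(g(X) + src)                                 # g(X_M) + accumulated source
```

Note that the same batch of trajectories can be reused for several starting points, which is precisely the feature emphasized above.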

The first part of the main result of this paper is the following:

Theorem (Part I; see 2.26 for the full quantitative version).

Fix a small $\varepsilon_0>0$, $\beta\in(0,1]$, $\widetilde{r}$ a $(\beta,\varepsilon_0)$-distance, and consider $u_M$ and $u_M^N$ given by (2.20) and (2.29). Also, assume that $f$ and $g$ are $\alpha$-Hölder on $D$ for some $\alpha\in(0,1]$. Then, for any compact subset $F\subset D$, for all $N,M,K\geq 1$, $\gamma>0$ and $\varepsilon\in[0,\varepsilon_0]$, there are explicit quantities $A(F,M,K,d,\varepsilon)$ and $B(M,K,d)$, given in terms of the boundary regularity, the parameters $N,M,K,d,\varepsilon$, the set $F$ and the data $f,g$, such that

\mathbb{E}\left\{\sup_{x\in F}\left|u(x)-u_M^N(x)\right|\right\}\leq A(F,M,K,d,\varepsilon)+\frac{B(M,K,d)}{\sqrt{N}}. \qquad (1.6)

Moreover, for an arbitrary domain $D$ and any compact subset $F\subset D$, we have that

\lim_{M\to\infty}\lim_{N\to\infty}\mathbb{E}\left\{\sup_{x\in F}\left|u(x)-u_M^N(x)\right|\right\}=0. \qquad (1.7)

In addition, if the domain $D$ satisfies the uniform exterior ball condition, we can take $F=D$ in the estimates (1.6) and (1.7).

Remark 1.1.

A few comments are in order here.

  1. (i)

    The estimates are true for any arbitrary domain, both in expectation and also in the tail. However, in this very general case we do not get any quantitative estimates, only asymptotic convergence guarantees on compact subsets.

  2. (ii)

    The full power of the result is a little more technical and states that we have a tail estimate in the form

\mathbb{P}\left(\sup_{x\in F}\left|u(x)-u_M^N(x)\right|\geq\gamma\right)\leq 2\exp\left(C_1(M,K,d)-\frac{\left((\gamma-A(F,M,K,d,\varepsilon))^+\right)^2}{C_2(M,d)}N\right), \qquad (1.8)

    where

\begin{split}C_1(M,K,d)&:=d\left(\lceil M/\alpha\rceil\log(2+|\widetilde{r}|_1)+\log(K)\right),\qquad C_2(M,d):=|g|_\infty+\frac{M}{d}\,{\sf diam}(D)^2|f|_\infty,\\ A(F,M,K,d,\varepsilon)&:=2\left(|g|_\alpha+\frac{{\sf diam}(D)^2|f|_\alpha+2\,{\sf diam}(D)|f|_\infty}{d}\right)\left(\frac{{\sf diam}(D)}{K}\right)^\alpha\\ &\quad+d^{\alpha/2}|g|_\alpha v^{\alpha/2}(F,\varepsilon)+|f|_\infty v(F,\varepsilon)+\left(4|g|_\infty+\frac{2}{d}{\sf diam}(D)^2|f|_\infty\right)e^{-\frac{\beta^2\varepsilon^2}{4{\sf diam}(D)^2}M}.\end{split} \qquad (1.9)
  3. (iii)

In the case the function $g\in C^2(\overline{D})$, we can take

\begin{split}A(F,M,K,d,\varepsilon):=&\;2\left(|g|_\alpha+\frac{{\sf diam}(D)^2|f|_\alpha+2\,{\sf diam}(D)|f|_\infty}{d}\right)\frac{{\sf diam}(D)}{K}\\ &+\left(\frac{|\Delta g|_\infty}{2}+|f|_\infty\right)v(F,\varepsilon)+\left(8|g|_\infty+\frac{2}{d}{\sf diam}(D)^2|f|_\infty\right)e^{-\frac{\beta^2\varepsilon^2}{4{\sf diam}(D)^2}M}.\end{split}
  4. (iv)

The parameter $\varepsilon$ measures the closeness to the boundary and can be taken arbitrarily small. The functions $v(x,\varepsilon)$ and $v(F,\varepsilon)$ defined in (2.22) measure the geometry of the boundary from a rather stochastic viewpoint. We can upper-bound $v(F,\varepsilon)$ by a more tractable and analytical version of it, namely $|v|_\infty(\varepsilon)$ (see (2.22)). Moreover, if the domain $D$ satisfies the exterior ball condition, we can replace the compact set $F$ by the whole domain $D$ and $v(D,\varepsilon)$ by $\varepsilon\,{\sf adiam}(D)$.

Furthermore, in the case of $\delta$-defective convex domains $D$ (see (2.15)) we can replace $e^{-\frac{\beta^2\varepsilon^2}{4{\sf diam}(D)^2}M}$ by $\left(1-\frac{\beta^2(1-\delta)}{4d}\right)^M\sqrt{\frac{{\sf diam}(D)}{\varepsilon}}$.

  5. (v)

There are many parameters in (2.35) and (1.8). $M$ stands for the number of steps the walk on spheres is allowed to take. $N$ is the number of Monte Carlo simulations.

The mysterious constant $K$ comes from a grid discretization used for the estimate. Notice that the left-hand side of (1.6) or (1.8) does not depend on $\varepsilon$ or $K$. Furthermore, the larger $K$ and $M$ are, the larger $C_1(M,K,d)$ is; nevertheless, this is compensated by $N$ and by the dependence of $A(F,M,K,d,\varepsilon)$, which becomes smaller for large $K$ and $M$. Therefore, the strategy is to optimize the right-hand sides of (1.6) or (1.8) to obtain the best estimate; a small numerical sketch of these quantities is given right after this remark.

  6. (vi)

    We can take the constant

B(M,K,d):=C_2(M,d)\left(\sqrt{C_1(M,K,d)+\log(2)}+1\right),

which, like the constants $C_1(M,K,d)$ and $C_2(M,d)$, depends neither on $N$ nor on $\varepsilon$. Thus, taking the limit over $N$ in (1.7) and then over $M$ leaves the right-hand side of (1.6) dependent only on $K$ and $\varepsilon$. However, the limit in (1.7) is independent of $K$ and $\varepsilon$. Letting $K\to\infty$ and $\varepsilon\to 0$ yields (1.7) for general domains. Quantitative versions can be obtained by carefully tuning all the parameters $N$, $M$, $K$, $\varepsilon$.

  7. (vii)

    The estimate (1.8) is the key to guaranteeing that the error is actually small with high probability. This is more useful than the expectation result (1.7).

  8. (viii)

Finally, we point out that these rather intricate terms are the key in choosing the dependence of all the constants $N,M,K,\varepsilon$ on the dimension $d$ for large $d$. Nevertheless, the estimates are also very useful in small dimensions.

  9. (ix)

When $g\in C^2(\partial D)$, the above estimate can be improved; for more details we refer the reader to the extended arXiv version of this paper.
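Since the remark suggests tuning $N,M,K,\varepsilon$ by optimizing the right-hand sides of (1.6) and (1.8), the following small sketch simply transcribes the quantities from (1.9), (1.6) and (1.8); it is not part of the paper's code, and all problem-dependent inputs (the norms of $f$ and $g$, ${\sf diam}(D)$, $v(F,\varepsilon)$ and $\log(2+|\widetilde{r}|_1)$) are assumed to be supplied by the user.

```python
import numpy as np

def error_bounds(gamma, N, M, K, d, eps, alpha, beta,
                 diam, v_F_eps, g_sup, g_hold, f_sup, f_hold, log_r1):
    """Right-hand sides of (1.6) and (1.8), transcribed from (1.9); log_r1 stands
    for log(2 + |r~|_1) and v_F_eps for v(F, eps)."""
    C1 = d * (np.ceil(M / alpha) * log_r1 + np.log(K))
    C2 = g_sup + (M / d) * diam**2 * f_sup
    A = (2 * (g_hold + (diam**2 * f_hold + 2 * diam * f_sup) / d) * (diam / K)**alpha
         + d**(alpha / 2) * g_hold * v_F_eps**(alpha / 2)
         + f_sup * v_F_eps
         + (4 * g_sup + (2 / d) * diam**2 * f_sup)
           * np.exp(-beta**2 * eps**2 / (4 * diam**2) * M))
    B = C2 * (np.sqrt(C1 + np.log(2)) + 1)
    expectation_bound = A + B / np.sqrt(N)                    # right-hand side of (1.6)
    tail_bound = 2 * np.exp(C1 - max(gamma - A, 0.0)**2 / C2 * N)   # right-hand side of (1.8)
    return expectation_bound, min(tail_bound, 1.0)
```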

After all the remarks above, the goal is to make the right-hand side of (1.8) small. Assuming that the domain $D$ satisfies the uniform exterior ball condition and that both $f,g$ have $\alpha$-Hölder regularity, we can ensure that

\mathbb{P}\left(\sup_{x\in D}\left|u(x)-u_M^N(x)\right|\geq\gamma\right)\leq\eta, \qquad (1.10)

by sampling $N$ times a number of $M$ steps of the WoS chain (1.4), with complexity (see more details below in Remark 2.29)

M=\mathcal{O}\left(\frac{d^2\log(1/\gamma)}{\gamma^{4/\alpha}}\right),\qquad N=\mathcal{O}\left(\frac{d^2\log(1/\gamma)^2\left[d^2\log(1/\gamma)+\gamma^{2/\alpha}\log(1/\eta)\right]}{\gamma^{2+4/\alpha}}\right).

Moreover, if $D$ is defective convex (see (2.15)) then we get the significant improvement

M=\mathcal{O}\left(d\log(d/\gamma)\right),\qquad N=\mathcal{O}\left(\frac{\log^2(d/\gamma)\left(d^2\log(d/\gamma)+\log(1/\eta)\right)}{\gamma^2}\right).

The implicit constants in the $\mathcal{O}$ symbols encode the dependence on $f,g,{\sf diam}(D)$ and the geometry of $D$. As these expressions make clear, the dependence on the dimension and on $1/\gamma$ is polynomial, up to logarithmic factors, with better exponents when the domain $D$ has better geometry. We conclude by pointing out that the total number of flops needed to compute the approximation $u_M^N$ is $\mathcal{O}(d^2MN)$, thus in total a polynomial complexity in $d$.
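As a rough illustration (and only under assumptions made here), the order-of-magnitude choices of $M$ and $N$ above can be turned into a small helper; the prefactor c below is a placeholder for the implicit constants, which depend on $f,g,{\sf diam}(D)$ and the geometry of $D$ and are not specified in this sketch.

```python
import math

def wos_parameters(d, gamma, eta, alpha=1.0, defective_convex=False, c=1.0):
    """Order-of-magnitude choices of M (WoS steps) and N (samples) following the
    complexity statements above; c stands in for the unspecified implicit constants."""
    if defective_convex:
        M = c * d * math.log(d / gamma)
        N = c * math.log(d / gamma)**2 * (d**2 * math.log(d / gamma)
                                          + math.log(1 / eta)) / gamma**2
    else:  # general domain satisfying the uniform exterior ball condition
        M = c * d**2 * math.log(1 / gamma) / gamma**(4 / alpha)
        N = (c * d**2 * math.log(1 / gamma)**2
             * (d**2 * math.log(1 / gamma) + gamma**(2 / alpha) * math.log(1 / eta))
             / gamma**(2 + 4 / alpha))
    return math.ceil(M), math.ceil(N)
```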

Before we proceed to the DNN theoretical counterpart, let us briefly present a single numerical test meant to support the estimates in Theorem (Part I). For more numerical tests in this sense we refer the reader to Section 5. Figure 1 below depicts the evolution with respect to $N$ of the mean errors $\mathbb{E}\left\{|D|^{-1}\int_D|u-u_M^N|\,dx\right\}$ and $\mathbb{E}\left\{\sup_{x\in D}|u(x)-u_M^N(x)|\right\}$ for the Poisson problem (1.1), where $d=100$, $D$ is the coronal cube $D_{\sf ac}$ defined in Section 5, whilst the exact solution $u$ is given by (5.4) from Section 5. The above expectations have also been approximated by Monte Carlo simulations; for more details see Section 5, more precisely Figure 7.

Figure 1: The evolution of (Monte Carlo estimates of) (a) $\mathbb{E}\left\{1/|D|\int_D|u(x)-u_M^N(x)|\,dx\right\}$ and (b) $\mathbb{E}\left\{\sup_{x\in D}|u(x)-u_M^N(x)|\right\}$ with respect to $N$, for $D=D_{\sf ac}$ (see Section 5), $d=100$, $M=500$. The computed errors decrease to a small value as the number of WoS trajectories $N$ increases. The limit error attained as $N$ goes to infinity is not zero, since it depends on $M$, but it decreases to zero as the latter parameter is also increased to infinity; see Section 5 for more details.

The second part of the main result of this paper is dedicated to the construction of a neural network approximation. In fact, in this paper we deal only with the special class of feed-forward neural networks whose activation function is the ReLU (rectified linear unit); such a network shall further be referred to as a ReLU DNN. The essential part of this construction is formula (1.5). The key fact is that we choose $\widetilde{r}$ to be itself a ReLU DNN. This is the main reason for our modification of the walk-on-spheres algorithm with $\widetilde{r}$ instead of the usual distance to the boundary. In addition, given some ReLU DNN approximations of the data $f$ and $g$, respectively, we can use these building blocks, together with some basic facts about ReLU DNNs, in conjunction with (1.8) to get the following result.

Theorem (Part II; see Theorem 3.10 for details).

Under the same context as Theorem (Part I), assume that $D$ satisfies the uniform exterior ball condition, and that we are given ReLU DNNs $\phi_f:D\rightarrow\mathbb{R}$, $\phi_g:\overline{D}\rightarrow\mathbb{R}$, $\phi_r:D\rightarrow\mathbb{R}$ such that

|f-\phi_f|_\infty\leq\epsilon_f\leq|f|_\infty,\qquad|g-\phi_g|_\infty\leq\epsilon_g,\qquad|r-\phi_r|_\infty\leq\epsilon_r.

If $\gamma>0$ is chosen such that a.1)-a.3) from 3.10 hold, and $0<\eta<1$, then we can construct a (random) ReLU DNN $\mathbb{U}(x)$ such that

\mathbb{P}\left(\sup_{x\in D}\left|u(x)-\mathbb{U}(\cdot,x)\right|\leq\gamma\right)\geq 1-\eta,

with

{\rm size}(\mathbb{U}(\omega,\cdot))=\mathcal{O}\left(d^7\gamma^{-16/\alpha-4}\log^4\left(\frac{1}{\gamma}\right)\left[d^3\gamma^{-4/\alpha}\log\left(\frac{1}{\gamma}\right)+\log\left(\frac{1}{\eta}\right)\right]{\rm S}\right),

where

{\rm S}:=\left[\max(d,\mathcal{W}(\phi_r),\mathcal{L}(\phi_r))+{\rm size}(\phi_r)+{\rm size}(\phi_g)+{\rm size}(\phi_f)\right].

Furthermore, if DD is defective convex (see (2.15)) then we get the significant improvement

{\rm size}(\mathbb{U}(\omega,\cdot))=\mathcal{O}\left(\frac{d^3}{\gamma^2}\log^4\left(\frac{d}{\gamma}\right)\left[d^2\log\left(\frac{d}{\gamma}\right)+\log\left(\frac{1}{\eta}\right)\right]{\rm S}\right).

Here ${\rm size}$ denotes the number of non-zero parameters of the neural network, whilst $\mathcal{W}(\phi)$ and $\mathcal{L}(\phi)$ represent the width and the length, respectively, of the neural network $\phi$. The implicit constants depend on $|g|_\alpha,|g|_\infty,|f|_\infty,{\rm diam}(D),{\rm adiam}(D),\delta,\alpha,\log(2+|\phi_r|_1)$.

1.4 Further extensions

This paper has outlined a novel method with the potential to extend to a general principle for solving high-dimensional partial differential equations (PDEs) without encountering the curse of dimensionality. The approach is threefold:

  1. (i)

    Initially, we establish a probabilistic representation of the solution via a stochastic process. Notably, such processes are already known to exist for a wide range of PDEs, ranging from linear types [24] to fully non-linear models [11, 17, 7, 41, 20].

  2. (ii)

    The subsequent phase involves the rapid simulation of this stochastic process, aligning closely with the PDE solution representation. This step is harmonized with the Monte-Carlo method for approximating the solution.

  3. (iii)

    The final phase focuses on approximating the solution at specific points within an auxiliary grid of the domain. Following this, we employ a concentration inequality to extend this approximation to the continuous domain, thus compensating for the complexity introduced by the grid.

Such a strategy finds immediate application in the realm of parabolic PDEs solvable by diffusion processes, as discussed in [3] and [10], especially within time-dependent domains. It also applies to PDEs characterized by (sticky-)reflecting boundary conditions, in line with the frameworks established in [53] and [32].

Furthermore, this methodology opens new avenues in molecular dynamics, particularly in scenarios where the operator exhibits discontinuous coefficients, as explored in [9, 38, 36]. Another promising direction involves the extension of this approach to (sub-)Riemannian or metric measure spaces, utilizing tools such as the ones in [47, 26, 51, 2, 4, 49].

1.5 Organization of the paper

In Section 2 we present the main notations, the important quantities and the main results. For the sake of readability of the paper we have moved the proofs to Section 4. Section 2 in turn contains several subsections, which we discuss now because they show the main approach. Subsection 2.1 details the first probabilistic representation of the solution and various types of estimates based on the annular diameter of a domain, which is a certain one-dimensional characterization of the domain. In Subsection 2.2 the walk on spheres and its modified version enter the scene and we give the main estimates on the number of steps needed to get in the proximity of the boundary for general domains. Here we also introduce the class of $\delta$-defective convex domains, a class of domains for which we provide better estimates for the number of steps needed to reach the proximity of the boundary. Subsection 2.3 provides the main analysis of the modified walk-on-spheres chain stopped at a deterministic time (thus uniformly for all points in the domain and all samples), as opposed to the one in [23] which is random and very difficult to control. Here we first estimate the sup-norm of $u-u_M$, where $u$ is the solution to (1.1) represented probabilistically by (2.2), whilst $u_M$ is given by (2.20); the estimates are given in terms of the regularity of the data, the geometry of the boundary and the parameter $M$. If the annular diameter is finite then in fact the estimates depend only on the diameter and the annular diameter or on the parameter $\delta$ (the convex defectiveness). Furthermore, Subsection 2.4 introduces the Monte Carlo estimator $u_M^N$ (see (1.5)) for $u_M$ and contains the main results, namely Theorem 2.26 and Corollary 2.28. Subsection 2.5 contains the main extensions we need in order to deal with the walk-on-spheres algorithm. Fundamentally, in order to construct either (2.20) or (1.5) we need to extend the values of the boundary data $g$ inside the domain, and we do this under some regularity conditions.

Section 3 contains the neural network consequences of the main probabilistic results and benefits from the very careful preliminary construction of the modified walk on spheres. We should only point out the key fact, visible from (1.5), that once we replace $g$, $f$ and $\tilde{r}$ by neural networks, the function $u_M^N$ also becomes a neural network. A word is in order here about the distance function $r$, the distance to the boundary. In the original walk on spheres, one uses $r$ for the construction of the radius of the spheres. We modified this into $\tilde{r}$. The benefit is that if we have an approximation of $r$ by a neural network, we can easily construct $\tilde{r}$ which is already a ReLU neural network. This avoids complications which arise in [23] from the approximation of $r$ by a neural network after the Monte Carlo estimator is constructed. The rest of the section is a judicious counting of the size of the neural network obtained from $u_M^N$ by replacing $f$ and $g$ with their corresponding approximating networks.

Finally, Section 5 is devoted to several numerical tests based on the Monte Carlo approach proposed and analyzed in the previous sections. Some key theoretical bounds are numerically validated and the PDE (2.1) below is numerically solved for some relevant domains in $\mathbb{R}^d$ for $d=10$ and $d=100$.

2 The presentation of the main results

As we already discussed, this work concerns probabilistic representations and their DNN counterpart for the solution $u$ to the problem

\left\{\begin{array}{ll}\frac{1}{2}\Delta u=-f&\text{ in }D\subset\mathbb{R}^d\\[2pt] \phantom{\frac{1}{2}\Delta}u=g&\text{ on }\partial D.\end{array}\right. \qquad (2.1)

Throughout this paper, $D$ is a bounded domain in $\mathbb{R}^d$, $f$ is bounded on $D$ and $g$ is continuous on the boundary $\partial D$. Further regularity shall also be imposed on $D\subset\mathbb{R}^d$, $f$ and $g$, so let us fix some notations. We say that a set $D\subset\mathbb{R}^d$ is of class $C^k$ if its boundary $\partial D$ can be locally represented as the graph of a $C^k$ function. We write $h\in C(D)$ to say that $h:D\rightarrow\mathbb{R}$ is continuous on $D$. $L^p(D)$ is the standard Lebesgue space with norm denoted by $|\cdot|_{L^p(D)}$. For a bounded function $h:D\rightarrow\mathbb{R}$, that is for $h\in L^\infty(D)$, we denote by $|h|_\infty$ the essential sup-norm of $h$. For $\alpha\in[0,1]$ and $h:D\rightarrow\mathbb{R}$ an $\alpha$-Hölder function for $\alpha\in(0,1)$ (or Lipschitz, for $\alpha=1$), we set $|h|_\alpha:=\sup_{x,y\in D}\frac{|h(x)-h(y)|}{|x-y|^\alpha}$.

Before we proceed, let us remark that one could also consider the anisotropic operator $\nabla\cdot K\nabla$ instead of $\Delta$, where $K$ is a (homogeneous) positive definite symmetric matrix, without altering the forthcoming results. This is mainly due to the following straightforward change-of-variables lemma. For completeness, we also include its short proof in the smooth case.

Lemma 2.1.

For any given $d\times d$ symmetric, positive-definite matrix $K$, assume that $u$ is a classical solution to (2.1) with $\Delta$ replaced by $\nabla\cdot K\nabla$. If we take a $d\times d$ matrix $A$ such that $AA^T=K$, then, denoting $D_A:=A^{-1}(D)$ and $v(x):=u(Ax)$, $f_A(x):=f(Ax)$, $g_A(x):=g(Ax)$, $x\in D_A$, we have

\frac{1}{2}\Delta v=-f_A\ \text{ in }D_A,\qquad v=g_A\ \text{ on }\partial D_A.

Proof in Subsection 4.1.
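As a minimal illustration of how Lemma 2.1 would be used in practice (only a sketch, under the assumption that the data are given as vectorized Python callables), one can factor $K=AA^T$, e.g. by a Cholesky decomposition, and pull back the domain and the data to $D_A$:

```python
import numpy as np

def reduce_anisotropic_problem(K, f, g, indicator_D):
    """Change of variables of Lemma 2.1: given K = A A^T (here via a Cholesky
    factor), return the transformed data on D_A = A^{-1}(D), so that the
    anisotropic problem reduces to a Poisson problem for the Laplacian."""
    A = np.linalg.cholesky(K)                 # any A with A A^T = K works
    f_A = lambda x: f(x @ A.T)                # f_A(x) = f(Ax), points stored row-wise
    g_A = lambda x: g(x @ A.T)                # g_A(x) = g(Ax)
    in_D_A = lambda x: indicator_D(x @ A.T)   # x in D_A  <=>  Ax in D
    return A, f_A, g_A, in_D_A

# If v solves (1/2) Delta v = -f_A on D_A with v = g_A on the boundary,
# then u(y) = v(A^{-1} y) solves the original anisotropic problem on D.
```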

2.1 Probabilistic representation for Laplace equation and exit time estimates

We fix $B^0(t)$, $t\geq 0$, to be a standard Brownian motion on $(\Omega,\mathcal{F},\mathcal{F}_t,\mathbb{P})$ which starts from zero, that is $\mathbb{P}(B^0(0)=0)=1$. Then we set

B^x(t):=x+B^0(t),\quad t\geq 0,\ x\in\mathbb{R}^d,

and recall that the law $\mathbb{P}^x:=\mathbb{P}\circ(B^x(\cdot))^{-1}$ on the path space $C([0,\infty);\mathbb{R}^d)$ is precisely the law of the Brownian motion starting from $x\in\mathbb{R}^d$.

By $\tau_{D^c}:=\tau^x_{D^c}$ we denote the first hitting time of $D^c:=\mathbb{R}^d\setminus D$ by $(B^x(t))_{t\geq 0}$, namely

\tau_{D^c}(\omega):=\inf\{t>0:B^x(t,\omega)\in D^c\},\quad\omega\in\Omega.

The following result is the fundamental starting point of this work. It is standard for sufficiently regular data, but under the next assumptions we refer to [18, Theorem 6].

Theorem 2.2 ([18]).

Let $g\in C(\partial D)$ and $f\in L^\infty(D)$. Then there exists a unique function $u\in C(D)\cap H^1_{loc}(D)$ such that $u$ is a (weak) solution to problem (2.1). Moreover, it is given by

u(x)=\mathbb{E}\{g(B^x(\tau_{D^c}))\}+\mathbb{E}\left\{\int_0^{\tau_{D^c}}f(B^x(t))\,dt\right\},\quad x\in D. \qquad (2.2)
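As a naive baseline (and only as an illustration, with data chosen here by us, not taken from the paper), the representation (2.2) can be simulated directly by time-discretized Brownian paths stopped at the exit from $D$; below $D$ is the unit ball, $f\equiv 0$ and $g(y)=y_1$, so the exact solution is $u(x)=x_1$. The walk-on-spheres method revisited in Subsection 2.2 replaces this slow path simulation.

```python
import numpy as np

def u_rep_estimate(x, n_paths=20000, dt=1e-3, max_steps=50000, rng=None):
    """Naive Monte Carlo for the representation (2.2) on the unit ball with the
    illustrative data f = 0 and g(y) = y_1 (so the exact solution is u(x) = x_1):
    Brownian paths are advanced with an Euler time step until they leave D."""
    rng = rng or np.random.default_rng(0)
    d = len(x)
    B = np.tile(np.asarray(x, float), (n_paths, 1))
    done = np.zeros(n_paths, dtype=bool)
    for _ in range(max_steps):
        alive = ~done
        if not alive.any():
            break
        B[alive] += np.sqrt(dt) * rng.standard_normal((alive.sum(), d))
        done |= np.linalg.norm(B, axis=1) >= 1.0        # exit from the unit ball
    return B[:, 0].mean()   # g(y) = y_1 evaluated at the (approximate) exit points

# Example: u_rep_estimate([0.3, 0.0, 0.0]) should be close to 0.3, up to the
# time-discretization bias and the Monte Carlo error.
```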

Let $v_D:\mathbb{R}^d\longrightarrow\mathbb{R}_+$ be given by

v_D(x):=\mathbb{E}\{\tau^x_{D^c}\},\quad x\in\mathbb{R}^d. \qquad (2.3)

Most of the main estimates obtained in this paper are expressed in terms of the sup-norm of $v_D$ in the proximity of the boundary of $D$ (see e.g. (2.21)). In a following paragraph we shall explore such estimates for $v_D$ for domains that satisfy the uniform exterior ball condition. Before that, let us start with the following consequence of 2.2:

Corollary 2.3.

The following assertions hold for $v_D$ given by (2.3).

  1. i)

$|v_D|_\infty\leq{\sf diam}(D)^2/d$.

  2. ii)

$v_D\in C^\infty(D)$ is the solution to the Poisson problem

-\frac{1}{2}\Delta v_D=1\ \text{ in }D,\qquad v_D=0\ \text{ on }\partial D. \qquad (2.4)

Proof in Subsection 4.1.

The annular diameter of a set in $\mathbb{R}^d$ and exit time estimates

Now we explore more refined bounds for $v_D(x):=\mathbb{E}\{\tau^x_{D^c}\}$, $x\in D$, aiming at providing a general class of domains $D\subset\mathbb{R}^d$ for which $v_D(x)\leq{\rm adiam}(D)\,d(x,\partial D)$, where ${\rm adiam}(D)$ is a sort of annular diameter of $D$, which in particular is a one-dimensional parameter depending on $D$. For more details, see (2.7) below for the precise definition, and 2.6 for the precise result.

The first step here in understanding the exit problem from a domain which is not necessarily convex is driven by our first model, namely the annulus defined for $0<R_0<R_1$ as

A(a,R_0,R_1)=\{x\in\mathbb{R}^d:R_0<|x-a|<R_1\}.

We set $A(R_0,R_1):=A(0,R_0,R_1)$.

The result in this direction is the following.

Proposition 2.4.

Take $d\geq 3$ and $D:=A(R_0,R_1)$. Then, for every point $x\in D$,

v_D(x)=\mathbb{E}\{\tau^x_{D^c}\}\leq d(x,\partial D)\,\frac{(R_1-R_0)R_1}{R_0}. \qquad (2.5)

Proof in Subsection 4.1.

Now we extend the above result to a larger class of domains, namely those that satisfy the uniform exterior ball condition. To do this in a quantitative way, we first need to define a notion of annular diameter of a set, as follows:

Definition 2.1.

For a bounded domain $D\subset\mathbb{R}^d$ and $x\in\partial D$, we set

{\sf adiam}(D)_x:=\inf\left\{\frac{(R_1-R_0)R_1}{R_0}:\exists\, a\in\mathbb{R}^d\mbox{ such that }x\in\partial A(a,R_0,R_1)\mbox{ and }D\subset A(a,R_0,R_1)\right\}, \qquad (2.6)

with the convention $\inf\emptyset:=\infty$. Furthermore, we set

{\sf adiam}(D):=\sup_{x\in\partial D}{\sf adiam}(D)_x=\sup_{x\in\partial D}\frac{(R_{1,x}-R_{0,x})R_{1,x}}{R_{0,x}}. \qquad (2.7)

Now, for a point $x$, taking $R_{0,x}$ and $R_{1,x}$ such that the above infimum is attained, it is easy to see using the triangle inequality that $R_{1,x}-R_{0,x}\leq{\sf diam}(D)$ for any $x\in\partial D$, and thus we get that

{\sf adiam}(D)_x\leq{\sf diam}(D)\left(1+\frac{{\sf diam}(D)}{R_{0,x}}\right). \qquad (2.8)

This is strongly related to the exterior ball condition at $x$, since the latter holds if and only if ${\sf adiam}(D)_x<\infty$. In fact, if the exterior ball condition holds with the radius of the exterior ball equal to $r_0$, then we can choose $R_{0,x}=r_0$ and $R_{1,x}=r_0+{\sf diam}(D)$.

The above discussion leads to the following formal statement.

Proposition 2.5.

A bounded domain $D\subset\mathbb{R}^d$ satisfies the uniform exterior ball condition if and only if ${\sf adiam}(D)<\infty$. If the radius of the exterior ball is at least $r_0>0$, we can estimate

{\sf adiam}(D)\leq\frac{\big(r_0+{\sf diam}(D)\big)\,{\sf diam}(D)}{r_0}.

However, for instance, a ball centered at $0$ from which we remove a cone with vertex at the center does not have a finite ${\sf adiam}$. On the other hand, a convex bounded domain obviously has finite ${\sf adiam}$, and in fact this is equal to the diameter of the set. Indeed, one can see this by taking a tangent exterior ball of radius $R_0$ and taking $R_1=R_0+{\sf diam}(D)$. Letting $R_0$ tend to infinity, we deduce that for a convex set $D$ we actually have ${\sf adiam}(D)={\sf diam}(D)$.

Now we can present the main estimate of this paragraph.

Proposition 2.6.

If $D\subset\mathbb{R}^d$ has ${\sf adiam}(D)<\infty$, then for any $x\in D$,

v(x):=\mathbb{E}\{\tau^x_{D^c}\}\leq d(x,\partial D)\,{\sf adiam}(D).

In particular, using (2.8) and (2.7), we have

v(x)\leq 2\,d(x,\partial D)\,{\sf diam}(D)\left(1+\frac{{\sf diam}(D)}{R_0}\right),\quad x\in D,

where recall that $\tau_{D^c}$ is the first exit time from $D$ and $R_0:=\inf\{R_{0,x}:x\in\partial D\}$.

Proof in Subsection 4.1.

2.2 Walk-on-Spheres (WoS) and $\varepsilon$-shell estimates revisited

An important benefit of representation (2.2) is that the solution $u$ may be numerically approximated by the empirical mean of i.i.d. realizations of the random variables under the expectation, thanks to the law of large numbers. One way to construct such realizations is to simulate a large number of paths of a Brownian motion that starts at $x\in D$ and is stopped at the boundary $\partial D$. However, as introduced by Muller in [40], there is a much more (numerically) efficient way of constructing such realizations, based on the idea that (2.2) does not require the entire knowledge of how the Brownian motion reaches $\partial D$. This is clearer if one considers the case $f\equiv 0$, when the only information required in (2.2) is the location of the Brownian motion at the hitting point of $\partial D$. In this subsection we revisit and enhance Muller's method.

For any $x\in D$ let $r(x)\in[0,{\sf diam}(D)]$ denote the distance from $x$ to $\partial D$, or equivalently, the radius of the largest sphere centered at $x$ and contained in $D$, that is

r(x):=\inf\{|x-y|:y\in\partial D\}=\sup\{r>0:B(x,r)\subseteq D\}. \qquad (2.9)

Clearly, $r$ is a Lipschitz function.

Recall that the standard WoS algorithm introduced by Muller [40] (see also [45], [14] or [13] for more recent developments) is based on constructing a Markov chain that steps on spheres of radius $r(x)$, where $x\in D$ denotes its current position. However, it is often the case, especially in practice, that merely an approximation $\overline{r}$ of the distance function is available, and not the exact $r$. This is the case, for example, if for computational reasons $r$ is simply approximated by the (computationally cheaper) distance function to a polygonal surrogate for the domain $D$. Another situation, which is in fact central to this study, is given in Section 3 below, where $r$ is approximated by a neural network. In both cases, $r$ is approximated by a function $\overline{r}$ with a certain error. It turns out that considering the chain which walks on spheres of radius $\overline{r}(x)$, $x\in D$, exhibits certain difficulties regarding the error analysis and also the construction of the chain itself. We do not go into more detail at this point, but we refer the reader to 2.23, iii)-iv) below for a more technical explanation of these issues.

Our strategy is to resolve both of the above difficulties at once, by developing from scratch the entire analysis in terms of (a modification of) $\overline{r}$, and not in terms of $r$ as is typically done. This motivates the following concept of distance.

Definition 2.7.

Let $D\subset\mathbb{R}^d$ be a bounded open set and $r$ the distance function to the boundary. Given $\varepsilon\geq 0$ and $\beta\in(0,1]$, a Lipschitz function $\widetilde{r}:D\rightarrow[0,{\sf diam}(D)]$ is called a $(\beta,\varepsilon)$-distance on $D$ if

i)\;0\leq\widetilde{r}\leq r\ \mbox{ on }D\qquad\mbox{ and }\qquad ii)\;\widetilde{r}\geq\beta r\ \mbox{ on }D_\varepsilon:=\{x\in D:r(x)\geq\varepsilon\}.

When $\varepsilon=0$ we say that $\widetilde{r}$ is a $\beta$-distance, and if in addition $\beta=1$ then it is obvious that $\widetilde{r}=r$.

Notice that if $\tilde{r}$ is a $(\beta,\varepsilon)$-distance, then it is also a $(\beta,\varepsilon')$-distance for any smaller $\varepsilon'<\varepsilon$. Thus, if we fix $\varepsilon_0$ and $\tilde{r}$ is a $(\beta,\varepsilon_0)$-distance, then $\tilde{r}$ is also a $(\beta,\varepsilon)$-distance for any $\varepsilon<\varepsilon_0$.

Remark 2.8.

Suppose that $\phi_r:D\rightarrow[0,\infty)$ is a Lipschitz function such that $|\phi_r-r|_\infty\leq\epsilon$. If $\varepsilon>2\epsilon$ and $0<\beta\leq 1-\frac{2\epsilon}{\varepsilon}$, then a simple computation yields that

\widetilde{r}(x):=(\phi_r(x)-\epsilon)^+,\quad x\in D,

is a $(\beta,\varepsilon)$-distance on $D$. For example, if we take $\varepsilon=3\epsilon$, we can choose $\beta=1/3$; therefore, given any $\phi_r$ which is a Lipschitz $\epsilon$-approximation of $r$, there exists $\widetilde{r}$ which is a $(1/3,3\epsilon)$-distance. The moral is that we can always work with a $(\beta,\varepsilon)$-distance with $\beta\geq 1/3$.

This example already anticipates the amenability of this $\tilde{r}$ to ReLU neural networks. Indeed, the positive part function $x^+$ is precisely the nonlinear activation function and, from this standpoint, if $\phi_r$ is a ReLU neural network, then $\tilde{r}$ is also a ReLU neural network. More on this in Section 3.
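A minimal sketch of this observation (with a hypothetical $\phi_r$ standing in for a trained network): composing $\phi_r$ with one additional ReLU unit realizes the $(\beta,\varepsilon)$-distance of Remark 2.8.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def make_r_tilde(phi_r, eps):
    """Given a ReLU network phi_r approximating the distance r with sup-error eps,
    the (1/3, 3*eps)-distance of Remark 2.8 is obtained by appending one more
    ReLU unit: r_tilde(x) = relu(phi_r(x) - eps)."""
    return lambda x: relu(phi_r(x) - eps)

# Illustration on the unit ball, where r(x) = 1 - |x|; phi_r below is a hypothetical
# surrogate for a trained network, perturbed within the tolerance eps:
eps = 0.05
phi_r = lambda x: np.clip(1.0 - np.linalg.norm(x, axis=-1) + 0.8 * eps, 0.0, None)
r_tilde = make_r_tilde(phi_r, eps)
x = np.zeros((1, 3))
print(r_tilde(x))   # <= r(0) = 1 and >= (1/3) * r(0), as guaranteed by Remark 2.8
```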

r~\widetilde{r}-WoS chain.

Let $U_n:\Omega\rightarrow S(0,1)$, $n\geq 1$, be defined (for simplicity) on the same probability space $(\Omega,\mathcal{F},\mathbb{P})$, independent and uniformly distributed, where $S(0,1)\subset\mathbb{R}^d$ is the sphere centered at the origin with radius $1$. Let $(\widetilde{\mathcal{F}}_n)_{n\geq 0}$ be the filtration generated by $(U_n)_{n\geq 0}$, where $U_0=0$, namely

~n:=σ(Ui:in),n0.\widetilde{\mathcal{F}}_{n}:=\sigma(U_{i}\;:\;i\leq n),\;n\geq 0.

Also, let ε0\varepsilon\geq 0, β(0,1]\beta\in(0,1], and r~\widetilde{r} be a (β,ε)(\beta,\varepsilon)-distance on DD. For each xDx\in D, we construct the chain (Xnx)n0({X}^{x}_{n})_{n\geq 0} recursively by

X0x\displaystyle{X}^{x}_{0} :=x\displaystyle:=x (2.10)
Xn+1x\displaystyle{X}^{x}_{n+1} :=Xnx+r~(Xnx)Un+1,n0.\displaystyle:={X}^{x}_{n}+\widetilde{r}({X}^{x}_{n})U_{n+1},\;n\geq 0. (2.11)

Clearly, (Xnx)n0({X}^{x}_{n})_{n\geq 0} is a homogeneous Markov chain in DD with respect to the filtration (~n)n0(\widetilde{\mathcal{F}}_{n})_{n\geq 0}, which starts from xx and has transition kernel given by

Pf(x)=S(0,1)f(x+r~(x)z)σ(dz),xD,Pf(x)=\int_{S(0,1)}f(x+\widetilde{r}(x)z)\;\sigma(dz),\quad x\in D,

where σ\sigma is the normalized surface measure on S(0,1)S(0,1); that is, 𝔼{f(Xnx)}=Pnf(x),xD,f\mathbb{E}\{f(X_{n}^{x})\}=P^{n}f(x),x\in D,f bounded and measurable. We name it an r~\widetilde{r}-WoS chain.
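For the reader's convenience, here is a minimal Python (numpy) sketch of the recursion (2.10)-(2.11); it assumes that r_tilde is any callable (β,ε)(\beta,\varepsilon)-distance and samples the uniform distribution on S(0,1)S(0,1) by normalizing a standard Gaussian vector.

import numpy as np

def wos_step(x, r_tilde, rng):
    """One step (2.11) of the r_tilde-WoS chain: jump uniformly on the sphere of
    radius r_tilde(x) centered at the current position x."""
    u = rng.standard_normal(x.shape[-1])
    u /= np.linalg.norm(u)                  # uniform direction on S(0,1)
    return x + r_tilde(x) * u

def wos_path(x0, r_tilde, M, rng=None):
    """The positions X_0, ..., X_M of the chain started at x0, cf. (2.10)-(2.11)."""
    rng = rng or np.random.default_rng()
    path = [np.asarray(x0, dtype=float)]
    for _ in range(M):
        path.append(wos_step(path[-1], r_tilde, rng))
    return np.stack(path)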

We return now to problem (2.1)

12Δu=f in D with u=g on D,\begin{array}[]{ll}\frac{1}{2}\Delta u=-f\textrm{ in }D\text{ with }u=g\textrm{ on }\partial D,\end{array}

which by Theorem 2.2 admits the probabilistic representation (2.2). Further, we consider the following sequence of stopping times

τ0x=0,τn+1x=inf{t>τnx:|Bx(t)Bx(τnx)|r~(Bx(τnx))},n0.\tau_{0}^{x}=0,\quad\tau^{x}_{n+1}=\inf\{t>\tau_{n}^{x}:|B^{x}(t)-B^{x}(\tau^{x}_{n})|\geq\widetilde{r}(B^{x}(\tau^{x}_{n}))\},\;n\geq 0. (2.12)
Remark 2.9.

It is clear that if ε=0\varepsilon=0, i.e. r~\widetilde{r} is a β\beta-distance, then limnτnx=τDcx0-a.s.\lim\limits_{n}\tau_{n}^{x}=\tau_{D^{c}}^{x}\quad\mathbb{P}^{0}\mbox{-a.s.}

The following result is a generalization of Lemma 3.4 from [23]:

Corollary 2.10.

Let DdD\subset\mathbb{R}^{d} be a bounded open set, gC(D),fL(D)g\in C(\partial D),f\in L^{\infty}(D), and uu be the solution to (2.1). Also, for φ:B(0,1)\varphi:B(0,1)\rightarrow\mathbb{R} bounded and measurable, set

K0φ:=𝔼{0τB(0,1)c0φ(B0(t))𝑑t}.K_{0}\varphi:=\mathbb{E}\left\{\int_{0}^{\tau^{0}_{B(0,1)^{c}}}\varphi(B^{0}(t))\;dt\right\}.

If r~\widetilde{r} is a β\beta-distance ( i.e. it is a (β,0)(\beta,0)-distance) then the following assertions hold:

  1. i)

    For all xDx\in D we have

    u(x)=𝔼{g(Bx(τDcx))}+𝔼{k1r~2(Xk1x)K0Fx,k},u(x)=\mathbb{E}\{g(B^{x}(\tau^{x}_{D^{c}}))\}+\mathbb{E}\left\{\sum\limits_{k\geq 1}\widetilde{r}^{2}(X^{x}_{k-1})K_{0}F_{x,k}\right\},

    where Fx,k(y)=f(Xk1x+r~(Xk1x)y),yB(0,1)F_{x,k}(y)=f(X^{x}_{k-1}+\widetilde{r}(X^{x}_{k-1})y),y\in B(0,1), whilst K0K_{0} acts on Fx,kF_{x,k} with respect to the yy variable.

  2. ii)

The mapping (B(0,1))Aμ(A):=dK01A[0,1]\mathcal{B}(B(0,1))\ni A\mapsto\mu(A):=dK_{0}1_{A}\in[0,1] renders a probability measure μ\mu on B(0,1)B(0,1) with density dG(0,y),yB(0,1)dG(0,y),y\in B(0,1), where G(x,y)G(x,y) is the Green function associated to 12Δ-\frac{1}{2}\Delta on B(0,1)B(0,1). More explicitly, for d3d\geq 3 we have that G(0,y)G(0,y) is proportional to |y|2d1,|y|<1.|y|^{2-d}-1,|y|<1.

  3. iii)

    Let YY be a real valued random variable defined on (Ω,,)\left(\Omega,\mathcal{F},\mathbb{P}\right), with distribution μ\mu, such that YY is independent of (Un)n(U_{n})_{n}. Then for all xDx\in D we have

    u(x)=𝔼{g(Bx(τDcx))}+1d𝔼{k1r~2(Xk1x)f(Xk1x+r~(Xk1x)Y)}.u(x)=\mathbb{E}\{g(B^{x}(\tau^{x}_{D^{c}}))\}+\frac{1}{d}\mathbb{E}\left\{\sum\limits_{k\geq 1}\widetilde{r}^{2}(X^{x}_{k-1})f(X^{x}_{k-1}+\widetilde{r}(X^{x}_{k-1})Y)\right\}. (2.13)

Proof in Subsection 4.1.

Consider the r~\widetilde{r}-WoS chain described above, and for each xDx\in D let us define the required number of steps to reach the ε\varepsilon-shell of D\partial D by

Nεx:=inf{n0:r(Xnx)<ε},{N}_{\varepsilon}^{x}:=\inf\{n\geq 0\;:\;r({X}_{n}^{x})<\varepsilon\},

which is clearly an (~n)(\widetilde{\mathcal{F}}_{n})-stopping time.

The estimates to be obtained in the next subsection, and consequently the size of the DNN that we are going to construct in order to approximate the solution to (2.1), depend on how big NεxN_{\varepsilon}^{x} is. The goal of this section is to provide upper bounds for NεxN^{x}_{\varepsilon}, the number of steps the walk on spheres needs to get to the ε\varepsilon-shell. These estimates are first obtained for general domains, and then improved considerably for “defective convex” domains which are introduced in 2.12 below. We provide what are, to our knowledge, the strongest estimates when compared with the currently available literature, as well as rigorous proofs that rely on the general technique of Lyapunov functions. Some of these results have something in common with the results in [8], though the estimates in there are not clearly determined in terms of the dimension.

It is essentially known that for a general bounded domain in d\mathbb{R}^{d}, the average of NεxN_{\varepsilon}^{x} grows with respect to ε\varepsilon at most as (𝖽𝗂𝖺𝗆(D)/ε)2({\sf diam}(D)/\varepsilon)^{2} (see [33, Theorem 5.4] and its subsequent discussion), hence, by Markov's inequality, (NεxM)\mathbb{P}(N_{\varepsilon}^{x}\geq M) can be bounded by (𝖽𝗂𝖺𝗆(D)/ε)2/M({\sf diam}(D)/\varepsilon)^{2}/M. The next result shows that, in fact, (NεxM)\mathbb{P}(N_{\varepsilon}^{x}\geq M) decays exponentially with respect to MM, independently of the dimension dd.

Proposition 2.11.

Let DdD\subset\mathbb{R}^{d} be a bounded domain, ε>0\varepsilon>0, β(0,1]\beta\in(0,1] and r~\widetilde{r} be a (β,ε)(\beta,\varepsilon)-distance. Then for any xDx\in D,

𝔼{eβ2ε24𝖽𝗂𝖺𝗆(D)2Nεx}2,\mathbbm{E}\left\{e^{\frac{\beta^{2}\varepsilon^{2}}{4{\sf diam}(D)^{2}}N^{x}_{\varepsilon}}\right\}\leq 2, (2.14)

where 𝖽𝗂𝖺𝗆(D){\sf diam}(D) denotes the diameter of DD. In particular,

(NεxM)2eβ2ε24𝖽𝗂𝖺𝗆(D)2Mfor all M.\mathbb{P}(N^{x}_{\varepsilon}\geq M)\leq 2e^{-\frac{\beta^{2}\varepsilon^{2}}{4{\sf diam}(D)^{2}}M}\quad\mbox{for all }M\in\mathbb{N}.

Proof in Subsection 4.2.

If the domain DD is convex, then the average number of steps required by the WoS to reach the ε\varepsilon-shell is of order log(1/ε)\log(1/\varepsilon). This was shown by Muller in [40], and also reconsidered in [8]. However, it is of high importance to the present work to track an explicit constant in front of log(1/ε)\log(1/\varepsilon) in terms of the space dimension dd. In this subsection we aim to clarify (the proof of) this result and extend it from convex domains to a larger class of domains, by employing the technique of Lyapunov functions from ergodic theory. More importantly, as in the case of 2.11, we are strongly interested in tail estimates for NεxN_{\varepsilon}^{x}, not merely in its expected value. In a nutshell, the idea is to show that the square root of the distance function to D\partial D pushes the WoS chain towards the boundary at geometric speed. This technique is meant to be easily extended to more general operators in a future work.

The following definition settles the class of domains for which the aforementioned estimate is going to hold.

Definition 2.12.

We say that a Lipschitz bounded domain DdD\subset\mathbb{R}^{d} is “δ\delta-defective convex” if δ<1\delta<1 and

Δrδ2rad(D)weakly on D,\Delta r\leq\frac{\delta}{2\;\rm{rad}(D)}\quad\mbox{weakly on }D, (2.15)

where we recall that rr is the distance function to the boundary D\partial D, whilst rad(D):=supxDr(x)\rm{rad}(D):=\sup\limits_{x\in D}r(x).

Remark 2.13.

Recall that by [1], DD is convex if and only if the signed distance function rsr_{s} is superharmonic on d\mathbb{R}^{d}, where

rs:=r on D¯ and rs:=r on dD.r_{s}:=r\mbox{ on }\overline{D}\mbox{ and }r_{s}:=-r\mbox{ on }\mathbb{R}^{d}\setminus D.

Hence a defective convex domain as defined by (2.15) is more general than a convex domain; in fact, even if (2.15) holds (on DD) with δ=0\delta=0, it is not necessarily true that DD is convex, as explained in [1]. To give an intuition on how a defective convex domain could differ from a convex domain, imagine a ball in 3D3D which is deformed into a defective convex domain by squeezing it slightly, or a straight cylinder in 3D3D which is bent mildly. However, a defective convex domain can differ seriously from a convex domain, as revealed by the next two examples.

Example 2.14.

Let A(R1,R2)dA(R_{1},R_{2})\subset\mathbb{R}^{d} be an annulus of radii R1<R2R_{1}<R_{2}, namely A(R1,R2)={xd;R1<|x|<R2}A(R_{1},R_{2})=\{x\in\mathbb{R}^{d};R_{1}<|x|<R_{2}\}. If R2R1<1+δd1\frac{R_{2}}{R_{1}}<1+\frac{\delta}{d-1} for some δ<1\delta<1, then A(R1,R2)A(R_{1},R_{2}) is a δ\delta-defective convex domain.

Proof in Subsection 4.2.

Example 2.15.

We take Γ\Gamma to be a connected, compact orientable C2C^{2} hypersurface in d\mathbb{R}^{d}, with d2d\geq 2, endowed with the Riemannian metric gg induced by the embedding. We denote by k1(x)k2(x)kd1(x)k_{1}(x)\leq k_{2}(x)\leq\ldots\leq k_{d-1}(x) the principal curvatures at xΓx\in\Gamma. The orientation is specified by a globally defined unit normal vector field n:Γ𝕊d1n:\Gamma\to\mathbb{S}^{d-1}. Then there exists a positive thickness ε{\varepsilon}, such that the tubular neighbourhood given by

Dε:={x+εtn(x)d|(x,t)Γ×(0,1)}D_{\varepsilon}:=\left\{x+{\varepsilon}\,t\,n(x)\in\mathbb{R}^{d}\ \big{|}\ (x,t)\in\Gamma\times(0,1)\right\}\, (2.16)

is δ\delta-defective convex. In fact, ε{\varepsilon} can be chosen explicitly in terms of the principal curvatures of Γ\Gamma.

Proof in Subsection 4.2.

Proposition 2.16.

Let DdD\subset\mathbb{R}^{d} be δ\delta-defective convex as in (2.15). Let ε>0\varepsilon>0, β(0,1]\beta\in(0,1] and r~\widetilde{r} be a (β,ε)(\beta,\varepsilon)-distance, and consider PP the transition kernel of the r~\widetilde{r}-WoS Markov chain (Xn)n0(X_{n}^{\cdot})_{n\geq 0}. If we set V(x):=r(x)1/2,xDV(x):=r(x)^{1/2},x\in D, then

PV(x)(1β2(1δ)4d)V(x),xD a.e.PV(x)\leq\left(1-\frac{\beta^{2}(1-\delta)}{4d}\right)V(x),\quad x\in D\mbox{ a.e.} (2.17)

In particular,

(Nεx>M)(r(XMx)ε)(1β2(1δ)4d)MV(x)ε,xD,\mathbb{P}(N_{\varepsilon}^{x}>M)\leq\mathbb{P}(r(X_{M}^{x})\geq\varepsilon)\leq\left(1-\frac{\beta^{2}(1-\delta)}{4d}\right)^{M}\frac{V(x)}{\sqrt{\varepsilon}},\quad x\in D, (2.18)

and if δd:=1β2(1δ)4d\delta_{d}:=1-\frac{\beta^{2}(1-\delta)}{4d}, then for any 1<a<1/δd1<a<1/\delta_{d}

a𝔼{Nεx}𝔼{aNεx}1+a1aδdV(x)ε,xD.a^{\mathbb{E}\left\{N_{\varepsilon}^{x}\right\}}\leq\mathbb{E}\left\{a^{N_{\varepsilon}^{x}}\right\}\leq 1+\frac{a}{1-a\delta_{d}}\frac{V(x)}{\sqrt{\varepsilon}},\quad x\in D. (2.19)

Proof in Subsection 4.2.
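To give a rough feeling for (2.18), the following Python (numpy) sketch compares the empirical tail of NεxN_{\varepsilon}^{x} with the right-hand side of (2.18) on the unit ball (which is convex, hence δ\delta-defective convex with δ=0\delta=0), using r~=r\widetilde{r}=r and β=1\beta=1; the starting point, ε\varepsilon, MM and the number of runs are hypothetical choices made only for this illustration.

import numpy as np

# Empirical illustration of the tail estimate (2.18) on the unit ball, with r_tilde = r.
rng = np.random.default_rng(0)
d, eps, M, runs = 5, 0.01, 100, 2000
r = lambda x: 1.0 - np.linalg.norm(x)
x0 = np.zeros(d); x0[0] = 0.5                       # starting point with r(x0) = 0.5
steps = np.empty(runs, dtype=int)
for i in range(runs):
    x, n = x0.copy(), 0
    while r(x) >= eps:                              # N_eps^x: first entry into the eps-shell
        u = rng.standard_normal(d)
        x = x + r(x) * u / np.linalg.norm(u)
        n += 1
    steps[i] = n
empirical_tail = np.mean(steps > M)
bound = (1.0 - 1.0 / (4 * d)) ** M * np.sqrt(r(x0) / eps)   # right-hand side of (2.18)
print(empirical_tail, bound)                        # the empirical tail should not exceed the bound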

Remark 2.17.

It can be shown that 2.16 still holds if condition (2.15) is satisfied merely in some strict neighbourhood of the boundary. In particular, in view of 2.15, 2.16 holds for all domains with smooth boundary, but the estimates would also depend on the thickness of the neighbourhood where condition (2.15) is fulfilled. Going even further, any bounded domain that can be uniformly approximated from inside by smooth domains enjoys, for fixed ε\varepsilon, a similar estimate with respect to MM and DD as in (2.18), with δ\delta possibly depending on ε\varepsilon. This behavior is numerically confirmed by Test 1 in Section 5 for annular hypercubes, and it is going to be analyzed theoretically in a forthcoming work.

2.3 WoS stopped at deterministic time and error analysis

Throughout this subsection we assume that uu is the solution to problem (2.1), hence 2.2 and 2.10 are applicable. Moreover, we keep all the notations from the previous subsections. Before we proceed with the main results of this subsection, let us emphasize several aspects that are essential to this work. To make the explanation simple, assume that r~=r\widetilde{r}=r, that is, the WoS chain is constructed using the (1,0)(1,0)-distance rr. Given xDx\in D, a usual way to employ the WoS Markov chain in order to approximate u(x)u(x) through the representation furnished by 2.10 is to start the chain from xx and run it until it reaches the ε\varepsilon-shell, for some given 0<ε10<\varepsilon\ll 1. In other words, (Xkx)k0(X_{k}^{x})_{k\geq 0} is usually stopped at the (random) stopping time NεxN_{\varepsilon}^{x} and uu represented by (2.2) is then approximated with

uε(x):=𝔼{g(XNεxx)}+1d𝔼{k=1Nεxr2(Xk1x)f(Xk1x+r(Xk1x)Y)}.u_{\varepsilon}(x):=\mathbb{E}\{g(X^{x}_{N_{\varepsilon}^{x}})\}+\frac{1}{d}\mathbb{E}\left\{\sum\limits_{k=1}^{N_{\varepsilon}^{x}}r^{2}(X^{x}_{k-1})f(X^{x}_{k-1}+r(X^{x}_{k-1})Y)\right\}.

The intuition behind this is that stopping the WoS chain in the ε\varepsilon-proximity of the boundary should provide a good approximation of where the Brownian motion first hits the boundary D\partial D, and estimates that certify this fact are in principle well known. As discussed in the previous subsection, the number of steps required to reach the ε\varepsilon-shell is small, especially if the domain is (defective) convex, which eventually leads to a fast numerical algorithm.

From the point of view of this work (also of [23]), the fundamental inconvenience of the above stopping rule is that it depends strongly on the starting point xx, mainly through NεxN_{\varepsilon}^{x}. In other words, although computationally efficient for estimating a single value u(x)u(x), the above point estimate uεu_{\varepsilon} of uu is expected to fail at overcoming the curse of high dimensions for solving (2.1) globally in DD. Moreover, if one aims at constructing a (deep) neural network architecture based on the above representation, as considered in [23] and also in Section 3 below, xx would be the input while NεxN_{\varepsilon}^{x} would give the number of layers; however, the latter should be independent of xx, which is obviously not the case. To deal with this architectural impediment, in [23] the authors proposed supxDNεx\sup\limits_{x\in D}N_{\varepsilon}^{x} as a random time to stop the WoS chain. However, besides the measurability and the stopping time property issues for supxDNεx\sup\limits_{x\in D}N_{\varepsilon}^{x}, it is still unclear, at least to us, whether 𝔼{supxDNεx}<\mathbb{E}\left\{\sup\limits_{x\in D}N_{\varepsilon}^{x}\right\}<\infty and whether this expectation is independent of the dimension dd.

Our approach is substantially different. Instead of stopping the WoS chain at a random stopping time, be it NεxN_{\varepsilon}^{x} or supxDNεx\sup\limits_{x\in D}N_{\varepsilon}^{x}, the idea is to stop the chain after a deterministic number of steps, say MM, independently of the starting point xx. Such a choice turns out to be feasible, and it not only avoids the above-mentioned issues concerning supxDNεx\sup\limits_{x\in D}N_{\varepsilon}^{x}, but it eventually provides a way to break the curse of high dimensions for solving (2.1), merely using the WoS algorithm but in a global fashion. Furthermore, in terms of neural networks, this strategy also renders a way of explicitly constructing a corresponding DNN architecture that could be easily sampled and, why not, further trained. Also, as already mentioned in 2.8, when we deal with neural networks in Section 3 the distance to the boundary rr needs to be replaced by an approximation given by a DNN, with a certain error. Therefore, in light of 2.7 and 2.8, we shall work instead with a (β,ε)(\beta,\varepsilon)-distance r~\widetilde{r} on DD, for some ε>0\varepsilon>0 and β(0,1]\beta\in(0,1] properly chosen. Having all these in mind, the aim of this subsection is to estimate the error of approximating the solution uu with

uM(x):=𝔼{g(XMx)+k=1Mr~2(Xk1x)K0f(Xk1x+r~(Xk1x))} for all xD,u_{M}(x):=\mathbb{E}\left\{g(X_{M}^{x})+\sum\limits_{k=1}^{M}\widetilde{r}^{2}(X^{x}_{k-1})K_{0}f(X^{x}_{k-1}+\widetilde{r}(X^{x}_{k-1})\cdot)\right\}\quad\mbox{ for all }x\in D, (2.20)

for a given (deterministic) number of steps M1M\geq 1 that does not depend on xDx\in D.

To keep the assumption on the regularity of DD as general as possible, the forthcoming estimates shall be obtained in terms of the function vv defined by (2.3) or (2.4), more precisely in terms of the behavior of vv near the boundary measured for each ε>0\varepsilon>0 by

|v|(ε):=sup{v(x):r(x):=d(x,D)ε}.|v|_{\infty}(\varepsilon):=\sup\{v(x):r(x):=d(x,\partial D)\leq\varepsilon\}. (2.21)

Though this is our primary measure of the geometry of the boundary, we can in fact refine things by defining

v(x,ε):=𝔼{v(BτNεxxx)}v(F,ε):=supxFv(x,ε) for FD.\begin{split}v(x,\varepsilon):&=\mathbb{E}\left\{v(B^{x}_{\tau_{N_{\varepsilon}^{x}}^{x}})\right\}\\ v(F,\varepsilon):&=\sup_{x\in F}v(x,\varepsilon)\text{ for }F\subset D.\end{split} (2.22)

We include here a small result which reveals the main properties we need further on.

Proposition 2.18.

We have

v(x,ε)|v|(ε).v(x,\varepsilon)\leq|v|_{\infty}(\varepsilon). (2.23)

For any domain DD and any compact FDF\subset D,

limε0v(F,ε)=0.\lim_{\varepsilon\to 0}v(F,\varepsilon)=0. (2.24)

We only point out how one can prove this, using the observation that for any stopping time τ\tau, v(Bτt)v(B_{\tau\wedge t}) is a bounded right-continuous supermartingale, hence it converges as tt\to\infty. As a consequence, we obtain that, for each xx, v(x,ε)v(x,\varepsilon) converges to 0 as ε0\varepsilon\to 0, and this convergence is monotone in ε\varepsilon; thus, by Dini's theorem, we get the uniform convergence to 0 on compact sets.

Let us first consider the case of homogeneous boundary conditions, namely g0g\equiv 0.

Proposition 2.19.

Let ε>0\varepsilon>0, β(0,1]\beta\in(0,1], r~\widetilde{r} be a (β,ε)(\beta,\varepsilon)-distance, fL(D)f\in L^{\infty}(D), MM\in\mathbb{N}^{\ast} and uu be the solution to (2.1) with g0g\equiv 0. If uMu_{M} is given by (2.20) then

|u(x)uM(x)||f|[v(x,ε)+2d𝖽𝗂𝖺𝗆(D)2eβ2ε24𝖽𝗂𝖺𝗆(D)2M] for all ε>0.|u(x)-u_{M}(x)|\leq|f|_{\infty}\left[v(x,\varepsilon)+\frac{2}{d}{\sf diam}(D)^{2}e^{-\frac{\beta^{2}\varepsilon^{2}}{4{\sf diam}(D)^{2}}M}\right]\quad\mbox{ for all }\varepsilon>0. (2.25)

In particular,

supxD|u(x)uM(x)||f|[|v|(ε)+2d𝖽𝗂𝖺𝗆(D)2eβ2ε24𝖽𝗂𝖺𝗆(D)2M] for all ε>0.\sup_{x\in D}|u(x)-u_{M}(x)|\leq|f|_{\infty}\left[|v|_{\infty}(\varepsilon)+\frac{2}{d}{\sf diam}(D)^{2}e^{-\frac{\beta^{2}\varepsilon^{2}}{4{\sf diam}(D)^{2}}M}\right]\quad\mbox{ for all }\varepsilon>0. (2.26)

Proof in Subsection 4.3.

Let us treat now the in-homogeneous Dirichlet problem, this time taking f0f\equiv 0.

Proposition 2.20.

Let ε>0\varepsilon>0, β(0,1]\beta\in(0,1], r~\widetilde{r} be a (β,ε)(\beta,\varepsilon)-distance, uu be the solution to (2.1) with gC(D¯)g\in C(\overline{D}) and f0f\equiv 0. Further, for each MM\in\mathbb{N}_{\ast} consider that uMu_{M} is given by (2.20). If gg is α\alpha-Hölder on D¯\overline{D} for some α[0,1]\alpha\in[0,1], then

|u(x)uM(x)|dα/2|g|αv(x,ε)α/2+4|g|eβ2ε24𝖽𝗂𝖺𝗆(D)2M for all ε>0|u(x)-u_{M}(x)|\leq d^{\alpha/2}|g|_{\alpha}\cdot v(x,\varepsilon)^{\alpha/2}+4|g|_{\infty}e^{-\frac{\beta^{2}\varepsilon^{2}}{4{\sf diam}(D)^{2}}M}\quad\mbox{ for all }\varepsilon>0 (2.27)

and in particular,

supxD|u(x)uM(x)|dα/2|g|α|v|α/2(ε)+4|g|eβ2ε24𝖽𝗂𝖺𝗆(D)2M for all ε>0.\sup\limits_{x\in D}|u(x)-u_{M}(x)|\leq d^{\alpha/2}|g|_{\alpha}\cdot|v|^{\alpha/2}_{\infty}(\varepsilon)+4|g|_{\infty}e^{-\frac{\beta^{2}\varepsilon^{2}}{4{\sf diam}(D)^{2}}M}\quad\mbox{ for all }\varepsilon>0. (2.28)

Proof in Subsection 4.3.

We can now superpose 2.19 and 2.20 to obtain the following key result:

Theorem 2.21.

Let ε>0\varepsilon>0, β(0,1]\beta\in(0,1], r~\widetilde{r} be a (β,ε)(\beta,\varepsilon)-distance, and uu denote the solution to (2.1) with gC(D¯)g\in C(\overline{D}) and fL(D)f\in L^{\infty}(D). Further, for each MM\in\mathbb{N}_{\ast} let uMu_{M} be given by (2.20). If gg is α\alpha-Hölder on D¯\overline{D} for some α[0,1]\alpha\in[0,1], then for all ε>0\varepsilon>0 we have

|u(x)uM(x)|dα/2|g|αvα/2(x,ε)+|f|v(x,ε)+(4|g|+2d𝖽𝗂𝖺𝗆(D)2|f|)eβ2ε24𝖽𝗂𝖺𝗆(D)2M.|u(x)-u_{M}(x)|\leq d^{\alpha/2}|g|_{\alpha}\cdot v^{\alpha/2}(x,\varepsilon)+|f|_{\infty}v(x,\varepsilon)+(4|g|_{\infty}+\frac{2}{d}{\sf diam}(D)^{2}|f|_{\infty})e^{-\frac{\beta^{2}\varepsilon^{2}}{4{\sf diam}(D)^{2}}M}.

In particular we get that

supxD|u(x)uM(x)|dα/2|g|α|v|α/2(ε)+|f||v|(ε)+(4|g|+2d𝖽𝗂𝖺𝗆(D)2|f|)eβ2ε24𝖽𝗂𝖺𝗆(D)2M.\sup_{x\in D}|u(x)-u_{M}(x)|\leq d^{\alpha/2}|g|_{\alpha}\cdot|v|_{\infty}^{\alpha/2}(\varepsilon)+|f|_{\infty}|v|_{\infty}(\varepsilon)+(4|g|_{\infty}+\frac{2}{d}{\sf diam}(D)^{2}|f|_{\infty})e^{-\frac{\beta^{2}\varepsilon^{2}}{4{\sf diam}(D)^{2}}M}.

If gCb2(D¯)g\in C^{2}_{b}(\overline{D}), then for all ε>0\varepsilon>0 we have

|u(x)uM(x)|(|Δg|2+|f|)v(x,ε)+(8|g|+2d𝖽𝗂𝖺𝗆(D)2|f|)eβ2ε24𝖽𝗂𝖺𝗆(D)2M.|u(x)-u_{M}(x)|\leq\left(\frac{|\Delta g|_{\infty}}{2}+|f|_{\infty}\right)v(x,\varepsilon)+(8|g|_{\infty}+\frac{2}{d}{\sf diam}(D)^{2}|f|_{\infty})e^{-\frac{\beta^{2}\varepsilon^{2}}{4{\sf diam}(D)^{2}}M}.

and also in particular,

supxD|u(x)uM(x)|(|Δg|2+|f|)|v|(ε)+(8|g|+2d𝖽𝗂𝖺𝗆(D)2|f|)eβ2ε24𝖽𝗂𝖺𝗆(D)2M.\sup_{x\in D}|u(x)-u_{M}(x)|\leq\left(\frac{|\Delta g|_{\infty}}{2}+|f|_{\infty}\right)|v|_{\infty}(\varepsilon)+(8|g|_{\infty}+\frac{2}{d}{\sf diam}(D)^{2}|f|_{\infty})e^{-\frac{\beta^{2}\varepsilon^{2}}{4{\sf diam}(D)^{2}}M}.

Let us point out that if 𝖺𝖽𝗂𝖺𝗆(D)<{\sf adiam}(D)<\infty (see (2.6)), then |v|(ε)|v|_{\infty}(\varepsilon) involved above may be replaced by ε𝖺𝖽𝗂𝖺𝗆(D)\varepsilon\;{\sf adiam}(D). Furthermore, when the domain is defective convex (see Subsection 2.2), we can considerably improve the above error estimates with respect to the required number of WoS steps, MM. This can be done analogously to the proofs of 2.19 and 2.20, just by replacing the tail estimate given by 2.11 with the one provided by 2.16. Therefore, we give below the precise statement, but we skip its proof.

Corollary 2.22.

In the context of 2.21, the following additional assertions hold:

  1. i)

    If DD satisfies the uniform exterior ball condition, then by 2.6, |v|(ε)|v|_{\infty}(\varepsilon) involved in the above estimate may be replaced with ε𝖺𝖽𝗂𝖺𝗆(D)\varepsilon\;{\sf adiam}(D), where recall that 𝖺𝖽𝗂𝖺𝗆(D){\sf adiam}(D) is given by (2.7).

  2. ii)

    If the domain DD is δ\delta-defective convex (δ<1\delta<1, see (2.15) for the definition) so that the conclusion from 2.16 is in force, then the factor eβ2ε24𝖽𝗂𝖺𝗆(D)2Me^{-\frac{\beta^{2}\varepsilon^{2}}{4{\sf diam}(D)^{2}}M} from the above estimate can be replaced by
    (1β2(1δ)4d)M𝖽𝗂𝖺𝗆(D)ε\left(1-\frac{\beta^{2}(1-\delta)}{4d}\right)^{M}\sqrt{\frac{{\sf diam}(D)}{\varepsilon}}.

2.4 Monte-Carlo approximations: mean versus tail estimates

We place ourselves in the same framework as before, namely: DdD\subset\mathbb{R}^{d} is a bounded domain, gC(D¯)g\in C(\overline{D}), fL(D)f\in L^{\infty}(D), and uu is the solution to (2.1).

Further, let (Uni)n1,i1(U^{i}_{n})_{n\geq 1,i\geq 1} be a family of independent and uniformly distributed random variables on S(0,1)S(0,1), r~\widetilde{r} be a (β,ε)(\beta,\varepsilon)-distance on DD for some β(0,1]\beta\in(0,1] and ε>0\varepsilon>0 (see 2.7), and set:

Xn+1x,i\displaystyle X^{x,i}_{n+1} :=Xnx,i+r~(Xnx,i)Un+1i,n0,i1.\displaystyle:=X_{n}^{x,i}+\widetilde{r}(X_{n}^{x,i})\cdot U^{i}_{n+1},\;\;n\geq 0,i\geq 1.

On the same probability space as (Uni)n1,i1(U^{i}_{n})_{n\geq 1,i\geq 1}, let (Yi)i1(Y^{i})_{i\geq 1} be iid random variables with distribution μ\mu given by 2.10, ii), such that the family (Yi)i1(Y^{i})_{i\geq 1} is independent of (Uni)n1,i1(U^{i}_{n})_{n\geq 1,i\geq 1}.

For N,MN,M\in\mathbb{N}^{\ast} let uMu_{M} be given by (2.20), and consider the Monte Carlo estimator

uMN(x):=1Ni=1N[g(XMx,i)+1dk=1Mr~2(Xk1x,i)f(Xk1x,i+r~(Xk1x,i)Yi)],xD.u_{M}^{N}(x):=\frac{1}{N}\sum_{i=1}^{N}\left[g(X^{x,i}_{M})+\frac{1}{d}\sum\limits_{k=1}^{M}\widetilde{r}^{2}(X^{x,i}_{k-1})f\left(X^{x,i}_{k-1}+\widetilde{r}(X^{x,i}_{k-1})Y^{i}\right)\right],x\in D. (2.29)

As in 2.10, iii), we have

𝔼{uMN(x)}=uM(x),xD,N1.\mathbb{E}\left\{u_{M}^{N}(x)\right\}=u_{M}(x),\quad x\in D,N\geq 1.
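For illustration purposes, the following Python (numpy) sketch implements the estimator (2.29); it assumes d3d\geq 3, that f,g,r~f,g,\widetilde{r} are given as callables on DD, and it samples YY from μ\mu via a simple rejection step for the radial density proportional to s(1sd2)s(1-s^{d-2}) on (0,1)(0,1), which corresponds (up to normalization) to the Green-function density from 2.10, ii); these implementation choices are made here only for illustration.

import numpy as np

def sample_Y(d, rng):
    """Sample Y with distribution mu from 2.10, ii) (assuming d >= 3): the radial
    part has density proportional to s*(1 - s**(d-2)) on (0, 1), sampled here by
    rejection from the proposal s = sqrt(U); the direction is uniform on S(0,1)."""
    while True:
        s = np.sqrt(rng.uniform())
        if rng.uniform() < 1.0 - s ** (d - 2):
            break
    u = rng.standard_normal(d)
    return s * u / np.linalg.norm(u)

def u_M_N(x, f, g, r_tilde, M, N, rng=None):
    """Monte Carlo estimator (2.29) of u(x), a sketch assuming that f, g and the
    (beta, eps)-distance r_tilde are callables defined on D, and that d >= 3."""
    rng = rng or np.random.default_rng()
    x = np.asarray(x, dtype=float)
    d = x.size
    total = 0.0
    for _ in range(N):                       # i = 1, ..., N independent replicas
        Y = sample_Y(d, rng)                 # one Y^i per replica, independent of the walk
        X = x.copy()
        source = 0.0
        for _ in range(M):                   # k = 1, ..., M steps of the r_tilde-WoS chain
            rad = r_tilde(X)                 # X is X_{k-1} at this point
            source += rad ** 2 * f(X + rad * Y)
            u_dir = rng.standard_normal(d)
            X = X + rad * u_dir / np.linalg.norm(u_dir)   # X becomes X_k
        total += g(X) + source / d           # X is now X_M
    return total / N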
Remark 2.23.

At this point we would like to point out that the estimator uMNu_{M}^{N} is different from the one employed in [23, Proposition 4.3] in several main aspects:

  1. i)

    The first aspect was already anticipated in the beginning of Subsection 2.3, namely instead of stopping the WoS chain at supxDNεx\sup\limits_{x\in D}N_{\varepsilon}^{x}, which is a random time that is difficult to handle both theoretically and practically, we simply stop it at a deterministic time MM which is going to be chosen according to the estimates obtained in 2.26 below and its two subsequent corollaries.

  2. ii)

    The second aspect is that the estimator used in [23] considers NN iid samples drawn from μ\mu, for each of the NN iid samples drawn from (Un)n1(U_{n})_{n\geq 1}, leading to a total of N2N^{2} samples. In contrast, uMNu_{M}^{N} requires merely NN samples, because YY and (Un)n(U_{n})_{n} are sampled simultaneously (and independently), on the same probability space.

  3. iii)

    The third aspect is more subtle: in [23], the Monte Carlo estimator of type (2.29) is constructed based on a given DNN approximation r¯\overline{r} of the distance to the boundary rr, for any prescribed error, let us say η\eta. Then, the approximation error of the solution is obtained based on the error of the Monte Carlo estimator constructed with the exact distance rr, and on how such an estimator varies when rr is replaced with r¯\overline{r}. However, the latter source of error scales like 2Nεη2^{N_{\varepsilon}}\eta, where Nε:=supxDNεxN_{\varepsilon}:=\sup\limits_{x\in D}N_{\varepsilon}^{x}. To compensate for this explosion of the error, η\eta has to be taken extremely small, and to do so, in [23] it is assumed that r¯\overline{r} can be realized with complexity O(log(1/η))O(\log(1/\eta)); the authors show that such a complexity can indeed be attained for the case of a ball or a hypercube in d\mathbb{R}^{d}, and can probably be extended to other domains with a nice geometry. Our approach is different and the key ingredient is to rely on the notion of (β,ε)(\beta,\varepsilon)-distance introduced in 2.7. More precisely, using 2.8 we can replace r¯\overline{r} by some (β,ε)(\beta,\varepsilon)-distance r~\widetilde{r} at essentially no additional cost, and rely on the herein developed analysis for r~\widetilde{r}-WoS. This approach turns out to avoid the additional error of order 2Nεη2^{N_{\varepsilon}}\eta mentioned above; in particular, we shall be able to consider domains whose distance function to the boundary may be approximated by a DNN merely at a polynomial complexity with respect to the approximation error.

  4. iv)

    Another issue regards the construction of the WoS chain itself. Because r¯\overline{r} from iii) may be strictly bigger than rr, for a given position xDx\in D the sphere of radius r¯(x)\overline{r}(x) might not be contained in DD, so there is a risk that the WoS chain leaves the domain DD. In particular, if one constructs the WoS chain based on r¯\overline{r}, then in order to make the analysis rigorous the boundary data gg and the source ff should also be extended to the complement of the domain D¯\overline{D}. Fortunately, this issue is completely avoided by considering r~\widetilde{r}-WoS chains (as it is done in this work), since by definition r~r\widetilde{r}\leq r on DD.

Let us begin with the following mean estimate in L2(D)L^{2}(D):

Proposition 2.24.

Let ε>0\varepsilon>0, β(0,1]\beta\in(0,1], and r~\widetilde{r} be a (β,ε)(\beta,\varepsilon)-distance. Then for all N,MN,M\in\mathbb{N},

𝔼{|u()uMN()|L2(D)2}2λ(D)[supxD|u(x)uM(x)|2+2(|g|2+1d3M|f|2𝖽𝗂𝖺𝗆(D)4)N],\mathbb{E}\left\{\left|u(\cdot)-u_{M}^{N}(\cdot)\right|^{2}_{L^{2}(D)}\right\}\leq 2\lambda(D)\left[\sup\limits_{x\in D}|u(x)-u_{M}(x)|^{2}+\frac{2\left(|g|^{2}_{\infty}+\frac{1}{d^{3}}M|f|^{2}_{\infty}{\sf diam}(D)^{4}\right)}{N}\right], (2.30)

where λ\lambda is the Lebesgue measure on d\mathbb{R}^{d}, whilst uMu_{M} and uMNu_{M}^{N} are given by (2.20) and (2.29). In particular, the above inequality can be made more explicit by employing the estimates for supxD|u(x)uM(x)|\sup\limits_{x\in D}|u(x)-u_{M}(x)| obtained in 2.21 and 2.22, depending on the regularity of DD and gg.

Proof in Subsection 4.4.

Remark 2.25.

Note that as in [23], Section 4, the above error estimate depends on the volume λ(D)\lambda(D). When λ(D)\lambda(D) scales well with the dimension (e.g. at most polynomially), then (2.30) can be employed to overcome the curse of high dimensions; in fact, if DD is a subset of a hypercube whose side has length less than some δ<1\delta<1, then λ(D)δd\lambda(D)\leq\delta^{d}, hence, in this case, the factor λ(D)\lambda(D) improves the mean squared error exponentially with respect to dd. However, λ(D)\lambda(D) may also grow exponentially with respect to dd, and then the above estimate cannot be used to construct a neural network whose size scales at most polynomially with respect to the dimension. Therefore, our next (and in fact main) goal is to remove this inconvenience, by looking at tail estimates for the Monte-Carlo error; one key idea is to quantify the error using the sup-norm instead of the L2(D)L^{2}(D)-norm.

Before we move forward, we recall the notion of a regular domain. We say that a bounded domain DD is regular if for any continuous function gg on the boundary, the harmonic function uu with boundary data gg is continuous on D¯\bar{D}.

As announced in the above remark, we conclude now with the central result of this paper.

Theorem 2.26.

Keep the same framework and notations as in the beginning of this subsection. Fix a small ε0>0\varepsilon_{0}>0, β(0,1]\beta\in(0,1], r~\widetilde{r} a (β,ε0)(\beta,\varepsilon_{0})-distance, and consider uMu_{M} and uMNu_{M}^{N} given by (2.20) and (2.29). Also, assume that ff and gg are α\alpha-Hölder on DD for some α(0,1]\alpha\in(0,1]. Then, for any compact subset FDF\subset D, for all N,M,K1N,M,K\geq 1, γ>0\gamma>0 and ε(0,ε0]\varepsilon\in(0,\varepsilon_{0}], we have

(supxF|u(x)uMN(x)|γ)2exp(C1(M,K,d)((γA(F,M,K,d,ε))+)2C2(M,d)N),\mathbb{P}\left(\sup_{x\in F}\left|u(x)-u_{M}^{N}(x)\right|\geq\gamma\right)\leq 2\exp\left(C_{1}(M,K,d)-\frac{\left((\gamma-A(F,M,K,d,\varepsilon))^{+}\right)^{2}}{C_{2}(M,d)}N\right), (2.31)

where

C1(M,K,d):=d(M/αlog(2+|r~|1)+log(K))C2(M,d):=|g|+Md𝖽𝗂𝖺𝗆(D)2|f|\begin{split}C_{1}(M,K,d):&=d\left(\lceil M/\alpha\rceil\log(2+|\widetilde{r}|_{1})+\log(K)\right)\\ C_{2}(M,d):&=|g|_{\infty}+\frac{M}{d}{\sf diam}(D)^{2}|f|_{\infty}\\ \end{split} (2.32)

and

A(F,M,K,d,ε):=2(|g|α+diam(D)2|f|α+2diam(D)|f|d)(𝖽𝗂𝖺𝗆(D)K)α+dα/2|g|αv(F,ε)α/2+|f|v(F,ε)+(4|g|+2d𝖽𝗂𝖺𝗆(D)2|f|)eβ2ε24𝖽𝗂𝖺𝗆(D)2M\begin{split}A(F,M,K,d,\varepsilon)&:=2\left(|g|_{\alpha}+\frac{{\rm diam}(D)^{2}|f|_{\alpha}+2{\rm diam}(D)|f|_{\infty}}{d}\right)\left(\frac{{\sf diam}(D)}{K}\right)^{\alpha}\\ &\quad+d^{\alpha/2}|g|_{\alpha}\cdot v(F,\varepsilon)^{\alpha/2}+|f|_{\infty}v(F,\varepsilon)+(4|g|_{\infty}+\frac{2}{d}{\sf diam}(D)^{2}|f|_{\infty})e^{-\frac{\beta^{2}\varepsilon^{2}}{4{\sf diam}(D)^{2}}M}\end{split} (2.33)

If gC2(D¯)g\in C^{2}(\overline{D}) then in (2.31) the term M/α\lceil M/\alpha\rceil can be replaced by MM and

A(F,M,K,d,ε):=\displaystyle A(F,M,K,d,\varepsilon):= 2(|g|α+diam(D)2|f|α+2diam(D)|f|d)𝖽𝗂𝖺𝗆(D)K\displaystyle 2\left(|g|_{\alpha}+\frac{{\rm diam}(D)^{2}|f|_{\alpha}+2{\rm diam}(D)|f|_{\infty}}{d}\right)\frac{{\sf diam}(D)}{K}
+(|Δg|2+|f|)v(F,ε)+(8|g|+2d𝖽𝗂𝖺𝗆(D)2|f|)eβ2ε24𝖽𝗂𝖺𝗆(D)2M.\displaystyle+\left(\frac{|\Delta g|_{\infty}}{2}+|f|_{\infty}\right)v(F,\varepsilon)+(8|g|_{\infty}+\frac{2}{d}{\sf diam}(D)^{2}|f|_{\infty})e^{-\frac{\beta^{2}\varepsilon^{2}}{4{\sf diam}(D)^{2}}M}.

Moreover, if we set

B(M,K,d):=C2(M,d)(C1(M,K,d)+log(2)+1)=(|g|+Md𝖽𝗂𝖺𝗆(D)2|f|)(d(M/αlog(2+|r~|1)+log(K))+log(2)+1).\begin{split}B(M,K,d):&=C_{2}(M,d)(\sqrt{C_{1}(M,K,d)+\log(2)}+1)\\ &=\left(|g|_{\infty}+\frac{M}{d}{\sf diam}(D)^{2}|f|_{\infty}\right)\left(\sqrt{d\left(\lceil M/\alpha\rceil\log(2+|\widetilde{r}|_{1})+\log(K)\right)+\log(2)}+1\right).\end{split} (2.34)

then we also have the estimate on the expectation of the total error in the form

𝔼{supxF|u(x)uMN(x)|}A(F,M,K,d,ε)+B(M,K,d)N.\mathbb{E}\left\{\sup_{x\in F}\left|u(x)-u_{M}^{N}(x)\right|\right\}\leq A(F,M,K,d,\varepsilon)+\frac{B(M,K,d)}{\sqrt{N}}. (2.35)

As a consequence, from 2.18, for any compact set FDF\subset D,

limMlimN𝔼{supxF|u(x)uMN(x)|}=0.\lim_{M\to\infty}\lim_{N\to\infty}\mathbb{E}\left\{\sup_{x\in F}\left|u(x)-u_{M}^{N}(x)\right|\right\}=0. (2.36)

For regular domains, we can take F=DF=D.

Furthermore, for any domain, we can replace v(F,ε)v(F,\varepsilon) with |v|(ε)|v|_{\infty}(\varepsilon). Moreover, if the domain satisfies the exterior ball condition, then in (2.33) we can take F=DF=D and replace |v|(ε)|v|_{\infty}(\varepsilon) by εadiam(D)\varepsilon\;{\rm adiam}(D).

If the domain DD is δ\delta-defective convex, we can replace eβ2ε24𝖽𝗂𝖺𝗆(D)2Me^{-\frac{\beta^{2}\varepsilon^{2}}{4{\sf diam}(D)^{2}}M} from the definition of A in (2.33) with (1β2(1δ)4d)M𝖽𝗂𝖺𝗆(D)ε\left(1-\frac{\beta^{2}(1-\delta)}{4d}\right)^{M}\sqrt{\frac{{\sf diam}(D)}{\varepsilon}}.

Proof in Subsection 4.4.

Remark 2.27.

Note that the left-hand side of (2.31) does not depend on KK, and if r~\widetilde{r} is a β\beta-distance, then it does not depend on ε\varepsilon either. Therefore, in the right-hand side of (2.31) one may take the infimum with respect to K1K\geq 1, and if r~\widetilde{r} is a (β,ε0)(\beta,\varepsilon_{0})-distance then one may also take the infimum with respect to ε>0\varepsilon>0; but optimizing the previously obtained bounds in this way may be cumbersome. In any case, convenient bounds can be easily obtained from particular choices of KK and ε\varepsilon, so let us do so in the sequel.

Let us conclude this subsection with the following consequence obtained for some convenient choices for KK and ε\varepsilon.

Corollary 2.28.

Let DdD\subset\mathbb{R}^{d} be a bounded domain satisfying the uniform exterior ball condition. Further, let ff and gg be α\alpha-Hölder on D¯\overline{D} for some α[0,1]\alpha\in[0,1], γ>0\gamma>0 be a prescribed error, η>0\eta>0 be a prescribed confidence, and r~\widetilde{r} be a (β,ε0)(\beta,\varepsilon_{0})-distance with β(0,1]\beta\in(0,1] and ε>0\varepsilon>0 such that

ε\displaystyle\varepsilon ε0:=[1+4(|g|α+|f|)𝖺𝖽𝗂𝖺𝗆(D)1]2αγ2αd1.\displaystyle\leq\varepsilon_{0}:={\left[1+4(|g|_{\alpha}+|f|_{\infty}){\sf adiam}(D)\vee 1\right]}^{-\frac{2}{\alpha}}\gamma^{\frac{2}{\alpha}}d^{-1}. (2.37)
Also, choose
K\displaystyle K :=𝖽𝗂𝖺𝗆(D)(8(|g|α+diam(D)2|f|α+2diam(D)|f|d)+1γ)1/α.\displaystyle:=\left\lceil{\sf diam}(D)\left(\frac{8\left(|g|_{\alpha}+\frac{{\rm diam}(D)^{2}|f|_{\alpha}+2{\rm diam}(D)|f|_{\infty}}{d}\right)+1}{\gamma}\right)^{1/\alpha}\right\rceil. (2.38)

Then

(supxD|u(x)uMN(x)|γ)η\mathbb{P}\left(\sup_{x\in D}\left|u(x)-u_{M}^{N}(x)\right|\geq\gamma\right)\leq\eta (2.39)

whenever we choose

N16{d[M/αlog(2+|r~|1)+log(K)]+log(2η)}[|g|+Md𝖽𝗂𝖺𝗆(D)2|f|]29γ2N\geq\frac{16\left\{d\left[\lceil M/\alpha\rceil\log(2+|\widetilde{r}|_{1})+\log(K)\right]+\log(\frac{2}{\eta})\right\}\left[|g|_{\infty}+\frac{M}{d}{\sf diam}(D)^{2}|f|_{\infty}\right]^{2}}{9\gamma^{2}} (2.40)

and

M[log(4/γ)+log(4|g|+2d𝖽𝗂𝖺𝗆(D)2|f|)]4𝖽𝗂𝖺𝗆(D)2β2ε02.M\geq\frac{\left[\log(4/\gamma)+\log(4|g|_{\infty}+\frac{2}{d}{\sf diam}(D)^{2}|f|_{\infty})\right]4{\sf diam}(D)^{2}}{\beta^{2}\varepsilon_{0}^{2}}. (2.41)

Furthermore, if DD is δ\delta-defective convex, then MM can be chosen as

M4d[log(4γ𝖽𝗂𝖺𝗆(D)ε0)+log(4|g|+2d𝖽𝗂𝖺𝗆(D)2|f|)]β2(1δ).M\geq\frac{4d\left[\log\left(\frac{4}{\gamma}\sqrt{\frac{{\sf diam}(D)}{\varepsilon_{0}}}\right)+\log(4|g|_{\infty}+\frac{2}{d}{\sf diam}(D)^{2}|f|_{\infty})\right]}{\beta^{2}(1-\delta)}. (2.42)

Proof in Subsection 4.4.
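For illustration purposes only, the following Python (numpy) sketch evaluates the bounds (2.37), (2.38), (2.40) and (2.42) for one set of hypothetical values of the norms, the diameters and the Lipschitz seminorm of r~\widetilde{r}; the resulting numbers are not optimized and serve only to show how MM and NN are obtained from the prescribed error γ\gamma and confidence η\eta.

import numpy as np

# Hypothetical parameters (for illustration only).
d, alpha, beta, delta = 10, 1.0, 1.0 / 3.0, 0.5
gamma, eta = 0.1, 0.05                               # prescribed error and confidence
diam, adiam = 1.0, 0.5
g_sup, g_alpha, f_sup, f_alpha, r_lip = 1.0, 1.0, 1.0, 1.0, 1.0   # |g|_inf, |g|_alpha, |f|_inf, |f|_alpha, |r_tilde|_1

# (2.37): one reading of the bracket, with the maximum taken on adiam(D).
eps0 = (1.0 + 4.0 * (g_alpha + f_sup) * max(adiam, 1.0)) ** (-2.0 / alpha) * gamma ** (2.0 / alpha) / d
# (2.38)
K = np.ceil(diam * ((8.0 * (g_alpha + (diam ** 2 * f_alpha + 2.0 * diam * f_sup) / d) + 1.0) / gamma) ** (1.0 / alpha))
# (2.42): number of WoS steps in the delta-defective convex case.
M = np.ceil(4.0 * d * (np.log(4.0 / gamma * np.sqrt(diam / eps0))
                       + np.log(4.0 * g_sup + 2.0 * diam ** 2 * f_sup / d)) / (beta ** 2 * (1.0 - delta)))
# (2.40): number of Monte Carlo samples.
N = np.ceil(16.0 * (d * (np.ceil(M / alpha) * np.log(2.0 + r_lip) + np.log(K)) + np.log(2.0 / eta))
            * (g_sup + M * diam ** 2 * f_sup / d) ** 2 / (9.0 * gamma ** 2))
print(int(M), int(N))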

Remark 2.29.

By some simple computations, ε0\varepsilon_{0}, MM, and NN from 2.28 exhibit the following asymptotic behaviors: ε0𝒪(γ2αd1)\varepsilon_{0}\in\mathcal{O}(\gamma^{\frac{2}{\alpha}}d^{-1}); moreover, if DD is δ\delta-defective convex then

M𝒪(dlog(d/γ)β2(1δ)),N𝒪(log2(d/γ)[d2log(d/γ)+β2(1δ)log(1/η)]β4γ2(1δ)2),M\in\mathcal{O}\left(\frac{d\log(d/\gamma)}{\beta^{2}(1-\delta)}\right),\quad N\in\mathcal{O}\left(\frac{\log^{2}(d/\gamma)\left[d^{2}\log(d/\gamma)+\beta^{2}(1-\delta)\log(1/\eta)\right]}{\beta^{4}\gamma^{2}(1-\delta)^{2}}\right),

whilst if DD satisfies the uniform exterior ball condition then

M𝒪(d2log(1/γ)β2γ4/α),N𝒪(d2log(1/γ)2[d2log2(1/γ)+β2γ2/αlog(1/η)]β4γ2+4/α).M\in\mathcal{O}\left(\frac{d^{2}\log(1/\gamma)}{\beta^{2}\gamma^{4/\alpha}}\right),\quad N\in\mathcal{O}\left(\frac{d^{2}\log(1/\gamma)^{2}\left[d^{2}\log^{2}(1/\gamma)+\beta^{2}\gamma^{2/\alpha}\log(1/\eta)\right]}{\beta^{4}\gamma^{2+4/\alpha}}\right).

Here, the Landau symbols tacitly depend on (the regularity of) f,g,𝖽𝗂𝖺𝗆(D),𝖺𝖽𝗂𝖺𝗆(𝖣), and r~f,g,{\sf diam}(D),{\sf adiam(D)},\mbox{ and }\widetilde{r}. In particular, only in terms of the dimension dd, if the domain is δ\delta-defective convex then M𝒪(dlog(d))M\in\mathcal{O}(d\log(d)) and N𝒪(d2log3(d))N\in\mathcal{O}(d^{2}\log^{3}(d)), whilst merely under the uniform exterior ball condition, M𝒪(d2)M\in\mathcal{O}(d^{2}) and N𝒪(d4)N\in\mathcal{O}(d^{4}).

2.5 On regular extensions of the boundary data inside the domain

Recall that one assumption of the main results in the previous subsections (see e.g. 2.26) is that the boundary data gg can be extended as a regular function (Hölder or C2C^{2}) defined on the entire domain D¯\overline{D}. This is required by the fact that the data needs to be evaluated at the location where the WoS chain is stopped, see (2.29), and such a stopped position lies, in principle, in the interior of the domain DD. However, usually in practice, gg is measured (hence known) merely at the boundary D\partial D. With this issue in mind, in this subsection we address the problem of extending gg regularly from D\partial D to D¯\overline{D}, in a constructive way which is also DNN-compatible.

We take DdD\subset\mathbb{R}^{d} to be a set of class CkC^{k}, k=3k=3 or k=2k=2, hence (see [19, sec. 14.6]) there exists a neighbourhood Dϵ0:={xD;dist(x,D)<ϵ0}D_{\epsilon_{0}}:=\{x\in D;\textrm{dist}(x,\partial D)<\epsilon_{0}\} of D\partial D such that the restriction of the distance function r:Dϵ0+r:{D_{\epsilon_{0}}}\to\mathbb{R}_{+} is of class CkC^{k}, and the nearest point projection πD:Dϵ0D{\pi_{\partial D}}:{D_{\epsilon_{0}}}\to\partial D is of class Ck1C^{k-1}. We have:

Lemma 2.30.

Let DdD\subset\mathbb{R}^{d} be a set of class C2C^{2}; hence for any point xDx\in\partial D there exist a function ϕx:d1\phi_{x}:\mathbb{R}^{d-1}\to\mathbb{R} of class C2C^{2} and a radius rx>0r_{x}>0 such that

DB(x,rx)={y=(y1,,yd)B(x,rx);yd<ϕx(y1,,yd1)}.D\cap B(x,r_{x})=\{y=(y_{1},\dots,y_{d})\in B(x,r_{x});y_{d}<\phi_{x}(y_{1},\dots,y_{d-1})\}.

We denote by M:=supxDi,j,k{1,,d}|3ϕxyiyjyk(x)|M:=\sup_{\stackrel{{\scriptstyle i,j,k\in\{1,\dots,d\}}}{{x\in\partial D}}}\big{|}\frac{\partial^{3}\phi_{x}}{\partial y_{i}\partial y_{j}\partial y_{k}}(x)\big{|}. Furthermore, denoting by k1(x)k2(x)kd1(x)k_{1}(x)\leq k_{2}(x)\leq\dots\leq k_{d-1}(x) the ordered principal curvatures of D\partial D, let us take ε0:=minxDkd11(x){\varepsilon}_{0}:=\min_{x\in\partial D}k_{d-1}^{-1}(x). Take ψCc([0,),)\psi\in C^{\infty}_{c}([0,\infty),\mathbb{R}) to be such that ψ1\psi\equiv 1 on [0,1][0,1] and ψ0\psi\equiv 0 on [3,)[3,\infty), |ψ|,|ψ|,|ψ′′|1|\psi|_{\infty},|\psi^{\prime}|_{\infty},|\psi^{\prime\prime}|_{\infty}\leq 1. We define the extension GG in D¯\overline{D} of the α\alpha-Hölder function gg given on the boundary D\partial D, for α(0,1]\alpha\in(0,1], as:

G:D¯,G(x):=ψ(1ε0r(x))g(πD(x)),xD¯.G:\overline{D}\to\mathbb{R},\quad G(x):=\psi\left(\frac{1}{{\varepsilon}_{0}}r(x)\right)g({\pi_{\partial D}}(x)),\;x\in\overline{D}. (2.43)

Then GG is α\alpha-Hölder on D¯\overline{D} and |G|α|πD|α|g|α+|g|ε01𝖽𝗂𝖺𝗆(D)1α.|G|_{\alpha}\leq|\nabla{\pi_{\partial D}}|_{\infty}^{\alpha}|g|_{\alpha}+|g|_{\infty}{\varepsilon}_{0}^{-1}{\sf diam}(D)^{1-\alpha}.

If, furthermore, the domain is of class C3C^{3} and gC2(D)g\in C^{2}(\partial D), then GG is in C2(D)C(D¯)C^{2}(D)\cap C(\overline{D}) with G=gG=g on D\partial D, and we have:

|G|\displaystyle|\nabla G|_{\infty} 1ε0|g|+2|g|\displaystyle\leq\frac{1}{{\varepsilon}_{0}}|g|_{\infty}+2|\nabla g|_{\infty}
|ΔG|\displaystyle|\Delta G|_{\infty} C~,\displaystyle\leq\widetilde{C},

where C~\widetilde{C} is an explicitly computable constant in terms of |g|,|g|,|Δg||g|_{\infty},|\nabla g|_{\infty},|\Delta g|_{\infty}, MM, and ε0{\varepsilon}_{0}.

Proof in Subsection 4.5.

Remark 2.31.

One can easily provide a non-constructive α\alpha-Hölder extension G~\widetilde{G} on D¯\overline{D} of the α\alpha-Hölder boundary data gg given on D\partial D by setting G~(x):=inf{g(y)+|g|α|xy|α,yD},xD¯\widetilde{G}(x):=\inf\{g(y)+|g|_{\alpha}|x-y|^{\alpha},y\in\partial D\},x\in\overline{D}.
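For illustration purposes, here is a minimal Python (numpy) sketch of the extension from 2.31, with the infimum approximated over a finite sample of boundary points; this discretization is introduced only for the sake of the example.

import numpy as np

def holder_extension(x, boundary_pts, g_vals, g_alpha, alpha):
    """Approximate G_tilde(x) = inf_{y in dD} { g(y) + |g|_alpha |x - y|^alpha }
    (Remark 2.31), the infimum being taken over a finite sample of boundary points
    boundary_pts (an array of shape (m, d)) with values g_vals = g(boundary_pts)."""
    dists = np.linalg.norm(boundary_pts - np.asarray(x, dtype=float), axis=1)
    return np.min(g_vals + g_alpha * dists ** alpha)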

3 DNN counterpart of the main results

Let σ:\sigma:\mathbb{R}\to\mathbb{R} be the rectified linear unit (ReLU) activation function, that is σ(x):=max{0,x}\sigma(x):=\max\{0,x\}, xx\in\mathbb{R}. Let (di)i=0,,L(d_{i})_{i=0,\ldots,L} be a sequence of positive integers. Let Aidi×di1A^{i}\in\mathbb{R}^{d_{i}\times d_{i-1}} and bidib^{i}\in\mathbb{R}^{d_{i}}, i=1,,Li=1,\ldots,L, and set Wi(x):=Aix+bi,xdi1W^{i}(x):=A^{i}x+b^{i},x\in\mathbb{R}^{d_{i-1}}. We define the realization of the DNN d0xϕ(x)\mathbb{R}^{d_{0}}\ni x\mapsto\phi(x) by

d0xϕ(x):=WLσWL1σW1(x)dL,xd0,\mathbb{R}^{d_{0}}\ni x\mapsto\phi(x):=W^{L}\circ\sigma\circ W^{L-1}\cdots\circ\sigma\circ W^{1}(x)\in\mathbb{R}^{d_{L}},\quad x\in\mathbb{R}^{d_{0}}, (3.1)

where dxσ(x):=(σ(x1),,σ(xd))\mathbb{R}^{d}\ni x\mapsto\sigma(x):=(\sigma(x_{1}),\ldots,\sigma(x_{d})), dd\in\mathbb{N}, is defined coordinatewise. The weights of the ReLU DNN ϕ\phi are the entries of (Ai,bi)i=1,,L(A^{i},b^{i})_{i=1,\ldots,L}. The size of ϕ\phi, denoted by size(ϕ){\rm size}(\phi), is the number of non-zero weights. The width of ϕ\phi, denoted by 𝒲(ϕ)\mathcal{W}(\phi), is defined by 𝒲(ϕ):=max{d0,,dL}\mathcal{W}(\phi):=\max\{d_{0},\ldots,d_{L}\}, whilst LL is the depth of ϕ\phi, denoted by (ϕ)\mathcal{L}(\phi). In the sequel, we only consider DNNs with ReLU activation function.
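For concreteness, the realization (3.1) and the quantities size, width and depth can be sketched in a few lines of Python (numpy); the representation of a network as a list of pairs (Ai,bi)(A^{i},b^{i}) below is only an illustrative convention.

import numpy as np

def relu_dnn(x, weights):
    """Realization (3.1): weights is a list [(A1, b1), ..., (AL, bL)]; the ReLU is
    applied after every affine map except the last one."""
    for A, b in weights[:-1]:
        x = np.maximum(A @ x + b, 0.0)
    A, b = weights[-1]
    return A @ x + b

def dnn_size(weights):
    """Number of non-zero weights of the network."""
    return sum(np.count_nonzero(A) + np.count_nonzero(b) for A, b in weights)

def dnn_width(weights):
    """max{d_0, ..., d_L}, with d_0 the input dimension and d_i the output
    dimension of the i-th affine map."""
    return max([weights[0][0].shape[1]] + [A.shape[0] for A, _ in weights])

def dnn_depth(weights):
    """The depth L, i.e. the number of affine maps."""
    return len(weights)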

For the reader’s convenience, before we proceed to the main result of this section (see 3.10 below), we present first several technical lemmas following [15] and [52], as well as some of their consequences; all these preparatory results are meant to provide a clear and systematic way of quantifying the size of the DNN which is constructed in the forthcoming main result, namely 3.10.

The following lemma is [52, Proposition 3].

Lemma 3.1.

For every c>0c>0 and δ(0,1)\delta\in(0,1), there exists a DNN Πδc\Pi_{\delta}^{c} such that

supa,b[c,c]|abΠδc(a,b)|δandsize(Πδc)=𝒪(log(δ1)+log(c)).\sup_{a,b\in[-c,c]}|ab-\Pi_{\delta}^{c}(a,b)|\leq\delta\quad\mbox{and}\quad{\rm size}(\Pi_{\delta}^{c})=\mathcal{O}(\lceil\log(\delta^{-1})+\log(c)\rceil).

Now, we recall Lemma II.6 from [15]:

Lemma 3.2.

Let ϕi,i=1,,n\phi_{i},\;i=1,\ldots,n, be ReLU DNNs with the same input dimension d0d_{0}\in\mathbb{N} and the same depth :=(ϕi),1in\mathcal{L}:=\mathcal{L}(\phi_{i}),1\leq i\leq n. Let aia_{i}, i=1,,ni=1,\ldots,n, be scalars. Then there exists a ReLU DNN ϕ\phi such that

  • i)

    ϕ(x)=i=1naiϕi(x)\phi(x)=\sum_{i=1}^{n}a_{i}\phi_{i}(x) for every xd0x\in\mathbb{R}^{d_{0}},

  • ii)

    (ϕ)=\mathcal{L}(\phi)=\mathcal{L},

  • iii)

    𝒲(ϕ)1in𝒲(ϕi)\mathcal{W}(\phi)\leq\sum\limits_{1\leq i\leq n}\mathcal{W}(\phi_{i}),

  • iv)

    size(ϕ)1insize(ϕi){\rm size}(\phi)\leq\sum\limits_{1\leq i\leq n}{\rm size}(\phi_{i}).

The following lemma is taken from  [15, Lemma II.3], with the mention that the last assertion iv) brings some improvement which is relevant to our purpose; it is immediately entailed by the proof of the same  [15, Lemma II.3], so we skip its justification.

Lemma 3.3.

Let ϕ1:d1d2\phi_{1}:\mathbb{R}^{d_{1}}\rightarrow\mathbb{R}^{d_{2}} and ϕ2:d3d1\phi_{2}:\mathbb{R}^{d_{3}}\rightarrow\mathbb{R}^{d_{1}} be two ReLU DNNs. Then there exists a ReLU DNN ϕ:d3d2\phi:\mathbb{R}^{d_{3}}\rightarrow\mathbb{R}^{d_{2}} such that

  1. i)

    ϕ(x)=ϕ1(ϕ2(x))\phi(x)=\phi_{1}(\phi_{2}(x)) for every xd3x\in\mathbb{R}^{d_{3}},

  2. ii)

    (ϕ)=(ϕ1)+(ϕ2)\mathcal{L}(\phi)=\mathcal{L}(\phi_{1})+\mathcal{L}(\phi_{2}),

  3. iii)

    𝒲(ϕ)max(𝒲(ϕ1),𝒲(ϕ2),2d1)\mathcal{W}(\phi)\leq\max(\mathcal{W}(\phi_{1}),\mathcal{W}(\phi_{2}),2d_{1}),

  4. iv)

    size(ϕ)min(size(ϕ1)+size(ϕ2)+d1[𝒲(ϕ1)+𝒲(ϕ2)], 2size(ϕ1)+2size(ϕ2)){\rm size}(\phi)\leq\min\left({\rm size}(\phi_{1})+{\rm size}(\phi_{2})+d_{1}[\mathcal{W}(\phi_{1})+\mathcal{W}(\phi_{2})],\;2{\rm size}(\phi_{1})+2{\rm size}(\phi_{2})\right).

The next lemma is essentially [15, Lemma II.4]. As in the case of the previous lemma, assertion iv) comes with a slight modification of the original result, which can be immediately deduced from the proof of [15, Lemma II.4].

Lemma 3.4.

Let ϕ:d0d1\phi:\mathbb{R}^{d_{0}}\rightarrow\mathbb{R}^{d_{1}} be a ReLU DNN such that (ϕ)<L\mathcal{L}(\phi)<L. Then there exists a second ReLU DNN ϕ~:d0d1\widetilde{\phi}:\mathbb{R}^{d_{0}}\rightarrow\mathbb{R}^{d_{1}} such that

  1. i)

    ϕ(x)=ϕ~(x)\phi(x)=\widetilde{\phi}(x) for all xd0x\in\mathbb{R}^{d_{0}},

  2. ii)

    (ϕ~)=L\mathcal{L}(\widetilde{\phi})=L,

  3. iii)

    𝒲(ϕ~)=max(2d1,𝒲(ϕ))\mathcal{W}(\widetilde{\phi})=\max(2d_{1},\mathcal{W}(\phi)),

  4. iv)

    size(ϕ~)min(size(ϕ)+d1𝒲(ϕ), 2size(ϕ))+2d1(L(ϕ)){\rm size}(\widetilde{\phi})\leq\min\left({\rm size}(\phi)+d_{1}\mathcal{W}(\phi),\;2{\rm size}(\phi)\right)+2d_{1}(L-\mathcal{L}(\phi)).

As a direct consequence of 3.1, 3.3, 3.4, and [15, Lemma II.5], one gets the following approximation result for products of scalar ReLU DNN:

Corollary 3.5.

Let ϕ1,ϕ2:d\phi_{1},\phi_{2}:\mathbb{R}^{d}\rightarrow\mathbb{R} be two ReLU DNNs, DdD\subset\mathbb{R}^{d} be a bounded subset, and let Π:=Πϵpc\Pi:=\Pi_{\epsilon_{p}}^{c} be given by 3.1 for c:=max(supxDϕ1(x),supxDϕ2(x))c:=\max\left(\sup\limits_{x\in D}\phi_{1}(x),\sup\limits_{x\in D}\phi_{2}(x)\right) and ϵp>0\epsilon_{p}>0. Then there exists a ReLU DNN ϕ:d\phi:\mathbb{R}^{d}\rightarrow\mathbb{R} such that

  1. i)

    ϕ(x)=Π(ϕ1(x),ϕ2(x))\phi(x)=\Pi(\phi_{1}(x),\phi_{2}(x)) for every xdx\in\mathbb{R}^{d},

  2. ii)

    supxD|ϕ1(x)ϕ2(x)ϕ(x)|ϵp,\sup\limits_{x\in D}|\phi_{1}(x)\phi_{2}(x)-\phi(x)|\leq\epsilon_{p},

  3. iii)

    size(ϕ)4size(ϕ1)+4size(ϕ2)+𝒪(log(ϵp1)+log(c)){\rm size}(\phi)\leq 4{\rm size}(\phi_{1})+4{\rm size}(\phi_{2})+\mathcal{O}(\lceil\log(\epsilon_{p}^{-1})+\log(c)\rceil).

The following two lemmas are going to be employed later in order to quantify the size of one generic step of the WoS chain given by (2.10)-(2.11), regarded as an action of a ReLU DNN.

Lemma 3.6.

Let ϕ:d\phi:\mathbb{R}^{d}\rightarrow\mathbb{R} be a ReLU DNN and vdv\in\mathbb{R}^{d} be a vector. Then there exists a ReLU DNN ϕv:dd\phi_{v}:\mathbb{R}^{d}\rightarrow\mathbb{R}^{d} such that

  1. i)

    ϕv(x)=x+ϕ(x)v\phi_{v}(x)=x+\phi(x)v for all xdx\in\mathbb{R}^{d},

  2. ii)

    (ϕv)=(ϕ)+1\mathcal{L}(\phi_{v})=\mathcal{L}(\phi)+1,

  3. iii)

    𝒲(ϕv)2d+max(d,𝒲(ϕ))\mathcal{W}(\phi_{v})\leq 2d+\max(d,\mathcal{W}(\phi)),

  4. iv)

    size(ϕv)2size(ϕ)+2d[(ϕ)+2]{\rm size}(\phi_{v})\leq 2{\rm size}(\phi)+2d[\mathcal{L}(\phi)+2].

Proof in Subsection 4.5.

The following result is easily deduced by employing recursively 3.6 and 3.3, so we omit its proof.

Corollary 3.7.

Let ϕ:d\phi:\mathbb{R}^{d}\rightarrow\mathbb{R} be a ReLU DNN and vkd,k1v_{k}\in\mathbb{R}^{d},k\geq 1 be a sequence of vectors. Then there exist ReLU DNNs θk:dd,k0\theta_{k}:\mathbb{R}^{d}\rightarrow\mathbb{R}^{d},k\geq 0, such that for every k0k\geq 0

  1. i)

    θk+1(x)=ϕvk+1(θk(x))\theta_{k+1}(x)=\phi_{v_{k+1}}(\theta_{k}(x)) and θ0(x)=x\theta_{0}(x)=x for all xdx\in\mathbb{R}^{d}, where ϕvk\phi_{v_{k}} is the one constructed in 3.6,

  2. ii)

    (θk+1)=(k+1)((ϕ)+1)+1\mathcal{L}(\theta_{k+1})=(k+1)(\mathcal{L}(\phi)+1)+1,

  3. iii)

    𝒲(θk+1)2d+max(d,𝒲(ϕ))\mathcal{W}(\theta_{k+1})\leq 2d+\max(d,\mathcal{W}(\phi)),

  4. iv)

    size(θk+1)2d(k+1)[4d+𝒲(ϕ)+(ϕ)+2]+d+2(k+1)size(ϕ){\rm size}(\theta_{k+1})\leq 2d(k+1)[4d+\mathcal{W}(\phi)+\mathcal{L}(\phi)+2]+d+2(k+1){\rm size}(\phi).

We end this first paragraph with a ReLU DNN extension of Hölder continuous boundary data gg to the entire domain D¯\overline{D}.

Corollary 3.8.

Let DdD\subset\mathbb{R}^{d} be a set of class C2C^{2} and gg be α\alpha-Hölder on D\partial D. For ε0>0{\varepsilon}_{0}>0 and ψCc([0,),)\psi\in C^{\infty}_{c}([0,\infty),\mathbb{R}) as defined in Lemma 2.30, we assume that for every δr,δπ,δψ,δg(0,1)\delta_{r},\delta_{\pi},\delta_{\psi},\delta_{g}\in(0,1), there exist ReLU DNNs ϕr\phi_{r}, ϕπ\phi_{\pi}, ϕψ\phi_{\psi} and ϕg\phi_{g} such that

|rϕr|δr,|πDϕπ|δπ,|ψϕψ|δψ,|gϕg|δg.|r-\phi_{r}|_{\infty}\leq\delta_{r},\quad|{\pi_{\partial D}}-\phi_{\pi}|_{\infty}\leq\delta_{\pi},\quad|\psi-\phi_{\psi}|_{\infty}\leq\delta_{\psi},\quad|g-\phi_{g}|_{\infty}\leq\delta_{g}.

With ε0\varepsilon_{0} the one given in 2.30, set

δ¯:=2(3δψ+δrε0)|g|+2(3δg+|g|δπ)(δψ+1)𝒪(δψ+δr+δg+δπ).\overline{\delta}:=2\left(3\delta_{\psi}+\frac{\delta_{r}}{{\varepsilon}_{0}}\right)|g|_{\infty}+2\left(3\delta_{g}+|\nabla g|_{\infty}\delta_{\pi}\right)(\delta_{\psi}+1)\in\mathcal{O}(\delta_{\psi}+\delta_{r}+\delta_{g}+\delta_{\pi}).

If GG is the α\alpha-Hölder extension in DD of the boundary data gg given by (2.43), then there exists a ReLU DNN ϕG\phi_{G} such that

  1. i)

    |GϕG|δ¯,|G-\phi_{G}|_{\infty}\leq\overline{\delta},

  2. ii)

    size(ϕG)2size(ϕψ)+2size(ϕr)+2size(ϕg)+2size(ϕπ)+𝒪(log(δ¯1)+log(|g|)){\rm size}(\phi_{G})\leq 2{\rm size}(\phi_{\psi})+2{\rm size}(\phi_{r})+2{\rm size}(\phi_{g})+2{\rm size}(\phi_{\pi})+\mathcal{O}(\lceil\log(\overline{\delta}^{-1})+\log(|g|_{\infty})\rceil).

Proof in Subsection 4.5.

3.1 DNN approximations for solutions to problem (2.1)

We are now ready to present the DNN byproduct of 2.26, in fact of 2.28. First, let us state that the r~\widetilde{r}-WoS chain given by (2.10)-(2.11) renders a ReLU DNN as soon as r~\widetilde{r} is a ReLU DNN; this follows from a straightforward combination of 3.6 and 3.7, so we skip its formal proof.

Corollary 3.9.

Suppose that r~\widetilde{r} is a ReLU DNN on the bounded set DdD\subset\mathbb{R}^{d} such that 0r~r0\leq\widetilde{r}\leq r, where we recall that rr, specified by (2.9), is the distance function to the boundary of DD. Further, let M0M\geq 0 and (XMx,xD)\left(X_{M}^{x},x\in D\right) be the r~\widetilde{r}-WoS chain at step MM given by (2.10)-(2.11). Then for each ωΩ\omega\in\Omega there exists a ReLU DNN defined on DD and denoted by 𝕏Mω()\mathbb{X}_{M}^{\omega}(\cdot) such that

  1. i)

    𝕏Mω(x)=XMx(ω)\mathbb{X}_{M}^{\omega}(x)=X_{M}^{x}(\omega) for all xDx\in D,

  2. ii)

    (𝕏Mω)=M((r~)+1)+1\mathcal{L}(\mathbb{X}_{M}^{\omega})=M(\mathcal{L}(\widetilde{r})+1)+1,

  3. iii)

    𝒲(𝕏Mω)2d+max(d,𝒲(r~))\mathcal{W}(\mathbb{X}_{M}^{\omega})\leq 2d+\max(d,\mathcal{W}(\widetilde{r})),

  4. iv)

    size(𝕏Mω)2dM[4d+𝒲(r~)+(r~)+2]+d+2Msize(r~){\rm size}(\mathbb{X}_{M}^{\omega})\leq 2dM[4d+\mathcal{W}(\widetilde{r})+\mathcal{L}(\widetilde{r})+2]+d+2M{\rm size}(\widetilde{r}).

The main result of this section is the following, proving that ReLU DNNs can approximate the solution uu to problem (2.1) without the curse of high dimensions.

Theorem 3.10.

The statement requires a detailed context, so let us label the assumption and the conclusion separately.

Assumption: Let DdD\subset\mathbb{R}^{d} be a bounded domain satisfying the uniform exterior ball condition, ff and gg be α\alpha-Hölder functions on D¯\overline{D} for some α[0,1]\alpha\in[0,1], and uu be the solution to (2.1), as in 2.2. Let ϕf:D,ϕg:D¯,ϕr:D\phi_{f}:D\rightarrow\mathbb{R},\phi_{g}:\overline{D}\rightarrow\mathbb{R},\phi_{r}:D\rightarrow\mathbb{R} be ReLU DNNs such that

|fϕf|ϵf|f|,|gϕg|ϵg,|rϕr|ϵr,|f-\phi_{f}|_{\infty}\leq\epsilon_{f}\leq|f|_{\infty},\quad|g-\phi_{g}|_{\infty}\leq\epsilon_{g},\quad|r-\phi_{r}|_{\infty}\leq\epsilon_{r},

and set r~(x):=(ϕr(x)ϵr)+,xD.\widetilde{r}(x):=(\phi_{r}(x)-\epsilon_{r})^{+},\,x\in D. Also, let Π:=Πϵpc\Pi:=\Pi_{\epsilon_{p}}^{c} be the ReLU DNN given by 3.1.

Further, let γ>0\gamma>0 be a prescribed error, 0<η<10<\eta<1 be a prescribed confidence, and consider the following assumptions on the parameters ϵf,ϵg,ϵr,ϵp and c\epsilon_{f},\epsilon_{g},\epsilon_{r},\epsilon_{p}\text{ and }c:

  1. a.1)

    ϵgγ/6\epsilon_{g}\leq\gamma/6,

  2. a.2)

    ϵfdγ6Mdiam(D)2\epsilon_{f}\leq\frac{d\gamma}{6M{\rm diam}(D)^{2}},

  3. a.3)

    ϵr<ε0:=13[(4|g|α+|f|)𝖺𝖽𝗂𝖺𝗆(D)1]2α(γ/2)2αd1\epsilon_{r}<\varepsilon_{0}:=\frac{1}{3}\left[(4|g|_{\alpha}+|f|_{\infty}){\sf adiam}(D)\vee 1\right]^{-\frac{2}{\alpha}}(\gamma/2)^{\frac{2}{\alpha}}d^{-1}, so that, by 2.8, r~\widetilde{r} is a (β,ε0)(\beta,\varepsilon_{0})-distance if we choose β=1/3\beta=1/3,

  4. a.4)

    ϵp=γd6M(1+2|f|)\epsilon_{p}=\frac{\gamma d}{6M(1+2|f|_{\infty})} and c=max(diam(D),2|f|)c=\max({\rm diam}(D),2|f|_{\infty}),

where M1M\geq 1 is specified below.

Further, consider the iid pairs ((Xk,i)k1,Yi),i1((X^{\cdot,i}_{k})_{k\geq 1},Y^{i}),i\geq 1 on (Ω,,)(\Omega,\mathcal{F},\mathbb{P}), as in the beginning of Subsection 2.4, and

u~MN(x):=1Ni=1N[ϕg(XMx,i)+1dk=1MΠ(Π(r~(Xk1x,i),r~(Xk1x,i)),ϕf(Xk1x,i+r~(Xk1x,i)Yi))],xD.\widetilde{u}_{M}^{N}(x):=\frac{1}{N}\sum_{i=1}^{N}\left[\phi_{g}(X^{x,i}_{M})+\frac{1}{d}\sum\limits_{k=1}^{M}\Pi\left(\Pi\left(\widetilde{r}(X^{x,i}_{k-1}),\widetilde{r}(X^{x,i}_{k-1})\right),\phi_{f}\left(X^{x,i}_{k-1}+\widetilde{r}(X^{x,i}_{k-1})Y^{i}\right)\right)\right],x\in D. (3.2)

Let us choose

N64{d[M/αlog(2+|r~|1)+log(K)]+log(2η)}[|g|+Md𝖽𝗂𝖺𝗆(D)2|f|]29γ2,\displaystyle N\geq\frac{64\left\{d\left[\lceil M/\alpha\rceil\log(2+|\widetilde{r}|_{1})+\log(K)\right]+\log(\frac{2}{\eta})\right\}\left[|g|_{\infty}+\frac{M}{d}{\sf diam}(D)^{2}|f|_{\infty}\right]^{2}}{9\gamma^{2}}, (3.3)
M36[log(8/γ)+log(4|g|+2d𝖽𝗂𝖺𝗆(D)2|f|)]𝖽𝗂𝖺𝗆(D)2ε02,\displaystyle M\geq\frac{36\left[\log(8/\gamma)+\log(4|g|_{\infty}+\frac{2}{d}{\sf diam}(D)^{2}|f|_{\infty})\right]{\sf diam}(D)^{2}}{\varepsilon_{0}^{2}}, (3.4)

where K:=𝖽𝗂𝖺𝗆(D)(16(|g|α+diam(D)2|f|α+2diam(D)|f|d)+2γ)1/αK:=\left\lceil{\sf diam}(D)\left(\frac{16\left(|g|_{\alpha}+\frac{{\rm diam}(D)^{2}|f|_{\alpha}+2{\rm diam}(D)|f|_{\infty}}{d}\right)+2}{\gamma}\right)^{1/\alpha}\right\rceil.

Furthermore, if DD is δ\delta-defective convex then MM can be chosen such that

M36dlog(8γ𝖽𝗂𝖺𝗆(D)ε0)+log(4|g|+2d𝖽𝗂𝖺𝗆(D)2|f|)(1δ).M\geq 36\frac{d\log\left(\frac{8}{\gamma}\sqrt{\frac{{\sf diam}(D)}{\varepsilon_{0}}}\right)+\log(4|g|_{\infty}+\frac{2}{d}{\sf diam}(D)^{2}|f|_{\infty})}{(1-\delta)}.

Conclusion: Under the above assumption and keeping the same notations, there exists a measurable function 𝕌MN:Ω×D\mathbb{U}_{M}^{N}:\Omega\times D\rightarrow\mathbb{R} such that

  • c.1)

    𝕌MN(ω,)\mathbb{U}_{M}^{N}(\omega,\cdot) is a ReLU DNN for each ωΩ\omega\in\Omega, 𝕌MN(,x)=u~MN(x),xD,\mathbb{U}_{M}^{N}(\cdot,x)=\widetilde{u}_{M}^{N}(x),\quad x\in D, and

    (supxD|u(x)𝕌MN(,x)|γ)η.\mathbb{P}\left(\sup\limits_{x\in D}\left|u(x)-\mathbb{U}_{M}^{N}(\cdot,x)\right|\geq\gamma\right)\leq\eta.
  • c.2)

    For each ωΩ\omega\in\Omega we have that

    size(𝕌MN(ω,))𝒪(MN[dMmax(d,𝒲(ϕr),(ϕr))+Msize(ϕr)+size(ϕf)+log(1γd)]).{\rm size}(\mathbb{U}_{M}^{N}(\omega,\cdot))\in\mathcal{O}\left(MN\left[dM\max(d,\mathcal{W}(\phi_{r}),\mathcal{L}(\phi_{r}))+M{\rm size}(\phi_{r})+{\rm size}(\phi_{f})+\left\lceil\log\left(\frac{1}{\gamma d}\right)\right\rceil\right]\right).

In particular,

size(𝕌MN(ω,))𝒪(d7γ16/α4log4(1γ)[d3γ4/αlog(1γ)+log(1η)]S),{\rm size}(\mathbb{U}_{M}^{N}(\omega,\cdot))\in\mathcal{O}\left(d^{7}\gamma^{-16/\alpha-4}\log^{4}\left(\frac{1}{\gamma}\right)\left[d^{3}\gamma^{-4/\alpha}\log\left(\frac{1}{\gamma}\right)+\log\left(\frac{1}{\eta}\right)\right]{\rm S}\right),

where

S:=[max(d,𝒲(ϕr),(ϕr))+size(ϕr)+size(ϕg)+size(ϕf)]{\rm S}:=\left[\max(d,\mathcal{W}(\phi_{r}),\mathcal{L}(\phi_{r}))+{\rm size}(\phi_{r})+{\rm size}(\phi_{g})+{\rm size}(\phi_{f})\right]

and the tacit constant depends on |g|α,|g|,|f|,diam(D),adiam(D),δ,α,log(2+|ϕr|1)|g|_{\alpha},|g|_{\infty},|f|_{\infty},{\rm diam}(D),{\rm adiam}(D),\delta,\alpha,\log(2+|\phi_{r}|_{1}).

Furthermore, if DD is δ\delta-defective convex and MM is chosen according to the corresponding lower bound above, then

size(𝕌MN(ω,))𝒪(d3γ2log4(dγ)[d2log(dγ)+log(1η)]S).{\rm size}(\mathbb{U}_{M}^{N}(\omega,\cdot))\in\mathcal{O}\left(\frac{d^{3}}{\gamma^{2}}\log^{4}\left(\frac{d}{\gamma}\right)\left[d^{2}\log\left(\frac{d}{\gamma}\right)+\log\left(\frac{1}{\eta}\right)\right]{\rm S}\right).

Proof in Subsection 4.5.

We end the exposition of the main results with the remark that it is sufficient to prescribe the Dirichlet data gg merely on D\partial D (not necessarily extended to D¯\overline{D}), as expressed by the following direct consequence of 3.8.

Corollary 3.11.

If the domain DD is of class C2C^{2}, and if gg is given merely on D\partial D and it is α\alpha-Hölder there, then gg can be constructively extended to an α\alpha-Hölder function on D¯\overline{D}. Furthermore, a ReLU DNN approximation ϕg\phi_{g} can be constructed as in 3.8, so 2.26 and 3.10 fully apply.

4 Proofs of the main results

4.1 Proofs for Subsection 2.1

Proof of 2.1.

Let v(x):=u(Ax)v(x):=u(Ax). Then kv(x)=j,l=1nju(Ax)k(Ajlxl)=j=1nju(Ax)Ajk\partial_{k}v(x)=\sum_{j,l=1}^{n}\partial_{j}u(Ax)\partial_{k}(A_{jl}x_{l})=\sum_{j=1}^{n}\partial_{j}u(Ax)A_{jk} and

k=1nkkv(x)=i,j,k,l=1niju(Ax)Ajkk(Ailxl)=i,j,k=1niju(Ax)AjkAik.\sum_{k=1}^{n}\partial_{k}\partial_{k}v(x)=\sum_{i,j,k,l=1}^{n}\partial_{i}\partial_{j}u(Ax)A_{jk}\partial_{k}(A_{il}x_{l})=\sum_{i,j,k=1}^{n}\partial_{i}\partial_{j}u(Ax)A_{jk}A_{ik}.

Thus we need to determine A such that AA^{T}=K. We know that there exists an orthogonal matrix R\in O(n) (hence RR^{T}=\mathrm{Id}) such that RKR^{T}=\textrm{diag}(\lambda_{1},\dots,\lambda_{n}). Since K is positive definite we have \lambda_{i}>0 for i\in\{1,\dots,n\}. Then we have:

RARTRATRT=RKRT=diag(λ1,,λn).RAR^{T}RA^{T}R^{T}=RKR^{T}=\textrm{diag}(\lambda_{1},\dots,\lambda_{n}). (4.1)

We denote B:=RARTB:=RAR^{T} and observe that (4.1) can be rewritten as BBT=diag(λ1,,λn)BB^{T}=\textrm{diag}(\lambda_{1},\dots,\lambda_{n}). We can now take B:=diag(λ1,,λn)B:=\textrm{diag}(\sqrt{\lambda_{1}},\dots,\sqrt{\lambda_{n}}). Thus A:=RTdiag(λ1,,λn)RA:=R^{T}\textrm{diag}(\sqrt{\lambda_{1}},\dots,\sqrt{\lambda_{n}})R. ∎
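Numerically, the matrix A constructed above is simply the symmetric square root of K, obtained from an eigendecomposition. A minimal numpy sketch (the positive definite K below is randomly generated for illustration only):

import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
K = B @ B.T + 4.0 * np.eye(4)            # a symmetric positive definite matrix
lam, Q = np.linalg.eigh(K)               # K = Q diag(lam) Q^T with Q orthogonal
A = Q @ np.diag(np.sqrt(lam)) @ Q.T      # A = R^T diag(sqrt(lam)) R, where R = Q^T
assert np.allclose(A @ A.T, K)           # A A^T = K, as required in the proof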

Proof of 2.3.

The second assertion follows directly from 2.2 and from classical regularity theory for the Poisson equation, so let us prove the first one. To this end, note that without loss of generality we may assume that x=0\in D; by Itô's formula, (|B^{0}(t)|^{2}-dt)_{t\geq 0} is a martingale and

v(0)=𝔼{τDc0}=1d𝔼{|B0(τDc)|2}𝖽𝗂𝖺𝗆(D)2/d.v(0)=\mathbb{E}\left\{\tau^{0}_{D^{c}}\right\}=\frac{1}{d}\mathbb{E}\left\{|B^{0}(\tau_{D^{c}})|^{2}\right\}\leq{\sf diam}(D)^{2}/d.

Proof of 2.10.

i). Using 2.9 we have that,

𝔼{0τDcxf(Bx(t))𝑑t}\displaystyle\mathbb{E}\left\{\int_{0}^{\tau^{x}_{D^{c}}}f(B^{x}(t))\;dt\right\} =n0𝔼{τnxτn+1xf(Bx(t))𝑑t}\displaystyle=\sum\limits_{n\geq 0}\mathbb{E}\left\{\int_{\tau^{x}_{n}}^{\tau^{x}_{n+1}}f(B^{x}(t))\;dt\right\}
=n1𝔼{r~2(Bτn1xx)K0f(Bτn1xx+r~(Bτn1xx))}\displaystyle=\sum\limits_{n\geq 1}\mathbb{E}\left\{\widetilde{r}^{2}(B^{x}_{\tau^{x}_{n-1}})K_{0}f(B^{x}_{\tau^{x}_{n-1}}+\widetilde{r}(B^{x}_{\tau^{x}_{n-1}})\cdot)\right\}
=𝔼{k1r~2(Xk1x)K0f(Xk1x+r~(Xk1x))},xD,\displaystyle=\mathbb{E}\left\{\sum\limits_{k\geq 1}\widetilde{r}^{2}(X^{x}_{k-1})K_{0}f(X^{x}_{k-1}+\widetilde{r}(X^{x}_{k-1})\cdot)\right\},\quad x\in D,

where the second equality follows by the strong Markov and the scaling properties of Brownian motion, whilst the last equality follows from the fact that the law of (Xnx)n0(X^{x}_{n})_{n\geq 0} under ~\widetilde{\mathbb{P}} and the law of (Bτnxx)n0(B^{x}_{\tau^{x}_{n}})_{n\geq 0} under 0\mathbb{P}^{0} are the same. Therefore, the statement follows by 2.2.

ii). The claim follows from the fact that μ(A)=w(0)\mu(A)=w(0), where ww solves

{12Δw=d1A in B(0,1)w=0 on B(0,1)\left\{\begin{array}[]{ll}-\frac{1}{2}\Delta w=d1_{A}&\,\textrm{ in }B(0,1)\\[2.84526pt] \phantom{\Delta}w=0&\,\textrm{ on }\partial B(0,1)\end{array}\right.

for all A(B(0,1))A\in\mathcal{B}(B(0,1)).

iii). We use conditional expectation, namely for every xDx\in D

𝔼\displaystyle\mathbb{E} {k1r~2(Xk1x)f(Xk1x+r~(Xk1x)Y)}=k1𝔼{r~2(Xk1x)f(Xk1x+r~(Xk1x)Y)}\displaystyle\left\{\sum\limits_{k\geq 1}\widetilde{r}^{2}(X^{x}_{k-1})f(X^{x}_{k-1}+\widetilde{r}(X^{x}_{k-1})Y)\right\}=\sum\limits_{k\geq 1}\mathbb{E}\left\{\widetilde{r}^{2}(X^{x}_{k-1})f(X^{x}_{k-1}+\widetilde{r}(X^{x}_{k-1})Y)\right\}
=k1𝔼{𝔼[r~2(Xk1x)f(Xk1x+r~(Xk1x)Y)|Xk1x]}=dk1𝔼{r~2(Xk1x)K0f(Xk1x+r~(Xk1x))},\displaystyle=\sum\limits_{k\geq 1}\mathbb{E}\left\{\mathbb{E}\left[\widetilde{r}^{2}(X^{x}_{k-1})f(X^{x}_{k-1}+\widetilde{r}(X^{x}_{k-1})Y)|\,X_{k-1}^{x}\right]\right\}=d\sum\limits_{k\geq 1}\mathbb{E}\left\{\widetilde{r}^{2}(X^{x}_{k-1})K_{0}f(X^{x}_{k-1}+\widetilde{r}(X^{x}_{k-1})\cdot)\right\},

where for the last equality we used that YY has distribution μ\mu and is independent of Xk1xX_{k-1}^{x}. ∎

Proof of 2.4.

Recall that 𝔼{τDcx}=v(x),xD\mathbb{E}\{\tau^{x}_{D^{c}}\}=v(x),x\in D, where vv is given by (2.3). The idea is to explicitly solve for v(x)=2w(|x|)v(x)=-2w(|x|) in radial form as

w′′(r)+n1rw(r)=1 with w(R0)=0,w(R1)=0w^{\prime\prime}(r)+\frac{n-1}{r}w^{\prime}(r)=1\text{ with }w(R_{0})=0,w(R_{1})=0

which is explicitly solved as

w(r)=r22n+C1r2n+C2 with C1=R12R022n(R02nR12n),C2=R0nR1n2n(R1n2R0n2).w(r)=\frac{r^{2}}{2n}+C_{1}r^{2-n}+C_{2}\text{ with }C_{1}=\frac{R_{1}^{2}-R_{0}^{2}}{2n(R_{0}^{2-n}-R_{1}^{2-n})},C_{2}=\frac{R_{0}^{n}-R_{1}^{n}}{2n(R_{1}^{n-2}-R_{0}^{n-2})}.

Now, if we start from a point x we can use the mean value theorem to obtain first that

𝔼x{τDc}=2w(|x|)2d(x,A(R0,R1))|w(r0)|\mathbbm{E}_{x}\{\tau_{D^{c}}\}=-2w(|x|)\leq 2d(x,\partial A(R_{0},R_{1}))|w^{\prime}(r_{0})|

for some point r0[R0,R1]r_{0}\in[R_{0},R_{1}]. Therefore our task now is to estimate the above derivative, which we can compute explicitly as

w(r)=rn+2n2nR12R02R02nR12nr1n.w^{\prime}(r)=\frac{r}{n}+\frac{2-n}{2n}\frac{R_{1}^{2}-R_{0}^{2}}{R_{0}^{2-n}-R_{1}^{2-n}}r^{1-n}.

To estimate this in a transparent way we set R1=ρR0R_{1}=\rho R_{0} and r=tR0r=tR_{0} for 1tρ1\leq t\leq\rho. In these new notations we have

w^{\prime}(r)=\frac{R_{0}}{n}\left(t-\frac{(2-n)(\rho^{2}-1)}{2(\rho^{2-n}-1)}t^{1-n}\right).

Now, it is an elementary matter that for two functions f,g:[a,b]f,g:[a,b]\to\mathbb{R} which are differentiable on (a,b)(a,b) (and gg non-vanishing), we can find a point ξ(a,b)\xi\in(a,b) such that

f(b)f(a)g(b)g(a)=f(ξ)g(ξ).\frac{f(b)-f(a)}{g(b)-g(a)}=\frac{f^{\prime}(\xi)}{g^{\prime}(\xi)}.

Using this fact for a=1,b=ρa=1,b=\rho, f(x)=x2f(x)=x^{2} and g(x)=x2ng(x)=x^{2-n} we argue that for some point ξ(1,ρ)\xi\in(1,\rho) we have

w(r)=R0tn(1(ξ/t)n).w^{\prime}(r)=\frac{R_{0}t}{n}\left(1-(\xi/t)^{n}\right).

As a function of t[1,ρ]t\in[1,\rho] the above function is increasing, thus we have

R0n(1ξn)w(r)R0ρn(1(ξ/ρ)n).\frac{R_{0}}{n}\left(1-\xi^{n}\right)\leq w^{\prime}(r)\leq\frac{R_{0}\rho}{n}\left(1-(\xi/\rho)^{n}\right). (4.2)

Therefore in order to control |w^{\prime}(r)| it suffices to control the absolute values of the two bounds above. The right hand side bound is easy because 1-x^{n}\leq n(1-x) for any x\in(0,1), thus we obtain that

R0ρ1(ξ/ρ)nnR0ρ(1ξ/ρ)=R0(ρξ)R1R0.R_{0}\rho\frac{1-(\xi/\rho)^{n}}{n}\leq R_{0}\rho(1-\xi/\rho)=R_{0}(\rho-\xi)\leq R_{1}-R_{0}. (4.3)

The left hand side of (4.2) in absolute value is bounded by

|R0(1ξn)n|=R0(ξn1)n=R0n((n2)(ρ21)2(1ρ2n)1)=R0n((n2)(ρ21)ρn22(ρn21)1)=R0n((n2)(ρ21)2+(n2)(ρ21)2(ρn21)1){R03((ρ21)2+ρ12),n=3R02n(ρ21),n4R0(ρ1)ρ2(R1R0)R12R0,\begin{split}\left|\frac{R_{0}(1-\xi^{n})}{n}\right|&=\frac{R_{0}(\xi^{n}-1)}{n}=\frac{R_{0}}{n}\left(\frac{(n-2)(\rho^{2}-1)}{2(1-\rho^{2-n})}-1\right)=\frac{R_{0}}{n}\left(\frac{(n-2)(\rho^{2}-1)\rho^{n-2}}{2(\rho^{n-2}-1)}-1\right)\\ &=\frac{R_{0}}{n}\left(\frac{(n-2)(\rho^{2}-1)}{2}+\frac{(n-2)(\rho^{2}-1)}{2(\rho^{n-2}-1)}-1\right)\\ &\leq\begin{cases}\frac{R_{0}}{3}\left(\frac{(\rho^{2}-1)}{2}+\frac{\rho-1}{2}\right),&n=3\\ \frac{R_{0}}{2n}(\rho^{2}-1),&n\geq 4\end{cases}\\ &\leq\frac{R_{0}(\rho-1)\rho}{2}\\ &\leq\frac{(R_{1}-R_{0})R_{1}}{2R_{0}},\end{split} (4.4)

where we go back to the fact that \xi^{n}=\frac{(n-2)(\rho^{2}-1)}{2(1-\rho^{2-n})} and in the second line we used that 2(\rho^{n-2}-1)\geq(n-2)(\rho^{2}-1), which holds because \rho\geq 1 and n\geq 4. Thus combining (4.3) and (4.4) we get that

|w(r)|(R1R0)R12R0|w^{\prime}(r)|\leq\frac{(R_{1}-R_{0})R_{1}}{2R_{0}}

which is our claim. ∎

Proof of 2.6.

For x\in D, we pick a point y\in\partial D such that d(x,y)=d(x,\partial D) and notice that \tau_{D^{c}}\leq\tau_{A(y,R_{0,y},R_{1,y})^{c}}. At the same time, since d(x,\partial A(y,R_{0,y},R_{1,y}))=d(x,\partial D), we employ 2.4 to deduce that

𝔼{τDcx}𝔼{τA(y,R0,y,R1,y)cx}2d(x,D)𝖺𝖽𝗂𝖺𝗆(D).\mathbbm{E}\left\{\tau^{x}_{D^{c}}\right\}\leq\mathbbm{E}\left\{\tau^{x}_{A(y,R_{0,y},R_{1,y})^{c}}\right\}\leq 2d(x,\partial D){\sf adiam}(D).

4.2 Proofs for Subsection 2.2

Proof of 2.11.

The proof goes through several steps.

The first step observes the following two basic facts which can be easily checked by direct computations. On the ball of radius rr centered at 0 we have

if v(x)=1γ2d|x|2, for 0<γ<2dr2, then Δvγv\text{if }v(x)=1-\frac{\gamma}{2d}|x|^{2},\text{ for }0<\gamma<\frac{2d}{r^{2}},\text{ then }\Delta v\leq-\gamma v (4.5)

and

for v(x)=1γ2d+γr2|x|2 with 0<γ, then Δvγv.\text{for }v(x)=1-\frac{\gamma}{2d+\gamma r^{2}}|x|^{2}\text{ with }0<\gamma,\text{ then }\Delta v\geq-\gamma v. (4.6)

The second step consists in proving some estimates for the exit time of the Brownian motion from the ball of radius rr and centered at 0. Denote by τ\tau this exit time for the Brownian motion started at the origin. Then

𝔼{eγτ}11γ2dr2 for 0<γ<2dr2\mathbbm{E}\left\{e^{\gamma\tau}\right\}\leq\frac{1}{1-\frac{\gamma}{2d}r^{2}}\text{ for }0<\gamma<\frac{2d}{r^{2}} (4.7)

and

𝔼{eγτ}1+γ2dr2 for 0<γ.\mathbbm{E}\left\{e^{\gamma\tau}\right\}\geq 1+\frac{\gamma}{2d}r^{2}\text{ for }0<\gamma. (4.8)

The proofs of (4.7) and (4.8) are based on the previous step. For example, using (4.5) we learn that Δv+γv0\Delta v+\gamma v\leq 0 for v(x)=1γ2d|x|2v(x)=1-\frac{\gamma}{2d}|x|^{2} and this combined with Itô’s formula means that eγtv(Bt)e^{\gamma t}v(B_{t}) is a supermartingale. In particular, stopping it at time τ\tau, we obtain that

𝔼{eγτv(Bτ)}v(0)\mathbbm{E}\left\{e^{\gamma\tau}v(B_{\tau})\right\}\leq v(0)

from which we deduce (4.7).

In a similar fashion using (4.6) we can deduce (4.8).

With these two steps at hand we can move to proving the actual result. To proceed, we take U_{1},U_{2},\dots the iid sequence of uniform random variables on the unit sphere in \mathbb{R}^{d} which drives the walk on spheres. Now set N_{k}^{x} to denote the number of steps to the \epsilon-shell for the walk on spheres using the random variables U_{k},U_{k+1},\dots. Notice that, for a fixed point x, the variables N_{k}^{x}, k=1,2,\dots, all have the same distribution. Also, set T_{\beta}(x,U)=x+\widetilde{r}(x)U, the point on the sphere of radius \widetilde{r}(x) determined by the first step of the walk on spheres driven by U. The key now is the fact that

N1x=1+𝟙Tβ(x,U1)ΩϵN2Tβ(x,U1).N^{x}_{1}=1+\mathbbm{1}_{T_{\beta}(x,U_{1})\in\Omega_{\epsilon}}N_{2}^{T_{\beta}(x,U_{1})}. (4.9)

The intuitive explanation of this is rather simple: the walk on spheres starts with its first step. If we land in the \epsilon-shell we stop. Otherwise we have to start again, but this time we have already used the random variable U_{1}, so we have to base the remaining walk on spheres on U_{2},U_{3},\dots.
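The recursion (4.9) is precisely what a direct simulation of the walk on spheres implements. For illustration only, the following Python sketch counts the number of steps N^x on the unit ball, with the particular choice r̃(x)=1−|x| (the exact distance, so β=1); this concrete domain and choice of r̃ are ours and are not assumed in the proof.

import numpy as np

def wos_steps(x, eps, rng):
    # One run of the walk on spheres: jump to a uniform point on the sphere
    # of radius r~(x) = 1 - |x| around x until the eps-shell is reached.
    x = np.asarray(x, dtype=float)
    n = 0
    while 1.0 - np.linalg.norm(x) > eps:
        u = rng.standard_normal(x.shape)
        u /= np.linalg.norm(u)                   # the driving variable U_k
        x = x + (1.0 - np.linalg.norm(x)) * u    # x <- T_beta(x, U_k)
        n += 1
    return n

rng = np.random.default_rng(1)
print(np.mean([wos_steps(np.array([0.3, 0.2, 0.1]), 1e-3, rng) for _ in range(500)]))

Empirically the average number of steps grows only slowly as ε decreases, which is consistent with the exponential moment and tail bounds established in this proof.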

Using now (4.9) we can write that

𝔼{eλNx}=𝔼{𝔼[eλNx|U1]}=𝔼{eλ𝔼[eλ𝟙Tβ(x,U1)ΩϵN2Tβ(x,U1)|U1]},\mathbbm{E}\left\{e^{\lambda N^{x}}\right\}=\mathbbm{E}\left\{\mathbbm{E}[e^{\lambda N^{x}}|\,U_{1}]\right\}=\mathbbm{E}\left\{e^{\lambda}\mathbbm{E}[e^{\lambda\mathbbm{1}_{T_{\beta}(x,U_{1})\in\Omega_{\epsilon}}N^{T_{\beta}(x,U_{1})}_{2}}|\,U_{1}]\right\}, (4.10)

where we used conditioning with respect to the first random variable U1U_{1}.

Now we are going to use a λ\lambda such that

eλ𝔼{eγτ1},e^{\lambda}\leq\mathbbm{E}\left\{e^{\gamma\tau_{1}}\right\},

where τ1\tau_{1} is the first exit time of the Brownian motion from the ball of radius r~(x)βϵ\widetilde{r}(x)\geq\beta\epsilon starting at xx. This is the place where we can use the estimate (4.8) to show that λ=log(1+γβ2ϵ2/2d)\lambda=\log(1+\gamma\beta^{2}\epsilon^{2}/2d) is sufficient to guarantee the above estimate. Notice the key point here, namely the fact that U1U_{1} has the same distribution as Bτ1x|Bτ1x|\frac{B_{\tau_{1}}-x}{|B_{\tau_{1}}-x|} where BtB_{t} is the Brownian motion started at xx and τ1\tau_{1} denotes the exit time of the Brownian motion from the ball of radius r~(x)\widetilde{r}(x).

Thus now we use this to argue that

𝔼{eλNx}𝔼{eγτ1+λ𝟙Tβ(x,U1)ΩϵN2Tβ(x,U1)}=𝔼{eγτ1+λ𝟙Bτ1ΩϵN2Bτ1}.\mathbbm{E}\left\{e^{\lambda N^{x}}\right\}\leq\mathbbm{E}\left\{e^{\gamma\tau_{1}+\lambda\mathbbm{1}_{T_{\beta}(x,U_{1})\in\Omega_{\epsilon}}N^{T_{\beta}(x,U_{1})}_{2}}\right\}=\mathbbm{E}\left\{e^{\gamma\tau_{1}+\lambda\mathbbm{1}_{B_{\tau_{1}}\in\Omega_{\epsilon}}N^{B_{\tau_{1}}}_{2}}\right\}. (4.11)

Now repeating this one more step using the new starting point Bτ1B_{\tau_{1}} we will get

\mathbbm{E}\left\{e^{\lambda N^{x}}\right\}\leq\mathbbm{E}\left\{e^{\gamma(\tau_{1}+\mathbbm{1}_{T_{\beta}(x,U_{1})\in\Omega_{\epsilon}}\tau_{2})+\lambda\mathbbm{1}_{T_{\beta}(T_{\beta}(x,U_{1}),U_{2})\in\Omega_{\epsilon}}N^{T_{\beta}(T_{\beta}(x,U_{1}),U_{2})}_{3}}\right\}=\mathbbm{E}\left\{e^{\gamma(\tau_{1}+\mathbbm{1}_{B_{\tau_{1}}\in\Omega_{\epsilon}}\tau_{2})+\lambda\mathbbm{1}_{B_{\tau_{1}+\tau_{2}}\in\Omega_{\epsilon}}N^{B_{\tau_{1}+\tau_{2}}}_{3}}\right\}.

Repeating this we finally obtain that

𝔼{eλNx}𝔼{eγτΩϵ}.\mathbbm{E}\left\{e^{\lambda N^{x}}\right\}\leq\mathbbm{E}\left\{e^{\gamma\tau_{\Omega_{\epsilon}}}\right\}. (4.12)

Here we use \gamma>0 and \lambda=\log(1+\frac{\gamma\beta^{2}\epsilon^{2}}{2d}). To finish the proof, we now need to estimate the right hand side in (4.12). To do this we enclose the domain D in the ball of radius {\sf diam}(D) centered at x and use (4.7) with r={\sf diam}(D) (in the displays below we write, with a slight abuse of notation, D for {\sf diam}(D)) to get that

𝔼{eλNx}11γD22d for λ=log(1+γβ2ϵ22d) and γ<2dD2.\mathbbm{E}\left\{e^{\lambda N^{x}}\right\}\leq\frac{1}{1-\frac{\gamma D^{2}}{2d}}\text{ for }\lambda=\log(1+\frac{\gamma\beta^{2}\epsilon^{2}}{2d})\text{ and }\gamma<\frac{2d}{D^{2}}.

Finally, using that log(1+x)x/2\log(1+x)\geq x/2 for 0x10\leq x\leq 1 and choosing γ=dD2\gamma=\frac{d}{D^{2}}, we obtain (2.14).

The second inequality of the statement is obtained based on the first estimate and Markov inequality:

(NxR)\displaystyle\mathbb{P}\left(N^{x}\geq R\right) =(eβ2ε24D2Nxeβ2ε24D2R)𝔼{eβ2ε24D2Nx}eβ2ε24D2R\displaystyle=\mathbb{P}\left(e^{\frac{\beta^{2}\varepsilon^{2}}{4D^{2}}N^{x}}\geq e^{\frac{\beta^{2}\varepsilon^{2}}{4D^{2}}R}\right)\leq\mathbb{E}\left\{e^{\frac{\beta^{2}\varepsilon^{2}}{4D^{2}}N^{x}}\right\}e^{-\frac{\beta^{2}\varepsilon^{2}}{4D^{2}}R}
2eβ2ε24D2R.\displaystyle\leq 2e^{-\frac{\beta^{2}\varepsilon^{2}}{4D^{2}}R}.

Proof of 2.14.

The \delta-defective convexity condition for this region amounts to the inequality:

𝒜R1,R2φ(x)r(x)𝑑xδR2R1A(R1,R2)φ(x)𝑑x for all 0φCc(A(R1,R2)),-\int_{\mathcal{A}_{R_{1},R_{2}}}\nabla\varphi(x)\nabla r(x)\,dx\leq\frac{\delta}{R_{2}-R_{1}}\int_{A(R_{1},R_{2})}\varphi(x)\,dx\quad\mbox{ for all }0\leq\varphi\in C_{c}^{\infty}(A(R_{1},R_{2})),

where we used the fact that the distance function rr is Lipschitz, hence one can integrate by parts and discard the boundary terms due to the fact that φCc(A(R1,R2),+)\varphi\in C_{c}^{\infty}(A(R_{1},R_{2}),\mathbb{R}_{+}).

We use the fact that on A(R1,R1+R22)A\left(R_{1},\frac{R_{1}+R_{2}}{2}\right) and A(R1+R22,R2)A\left(\frac{R_{1}+R_{2}}{2},R_{2}\right) the function rr is in fact smooth and we integrate by parts on each region to obtain that the left hand side above becomes:

\displaystyle\int_{A\left(R_{1},\frac{R_{1}+R_{2}}{2}\right)}\varphi(x)\Delta r(x)\,dx+\int_{S^{2}}\int_{r=\frac{R_{1}+R_{2}}{2}}\varphi(\sigma,s)r^{\prime}(s)\,s^{d-1}\,ds\,d\sigma
\displaystyle+\int_{A\left(\frac{R_{1}+R_{2}}{2},R_{2}\right)}\varphi(x)\Delta r(x)\,dx-\int_{S^{2}}\int_{r=\frac{R_{1}+R_{2}}{2}}\varphi(\sigma,s)r^{\prime}(s)\,s^{d-1}\,ds\,d\sigma
\displaystyle=\int_{A\left(R_{1},R_{2}\right)}\varphi(x)\Delta r(x)\,dx,

where we used spherical coordinates on the boundary and denoted by prime the derivative in the radial direction. Also, the Δr(x)\Delta r(x) is well defined, in a classical sense, on A(R1,R2)A(R_{1},R_{2}) except for the set of measure zero that in spherical coordinates is given by {(r,σ);r=R1+R22,σ𝕊2}\{(r,\sigma);r=\frac{R_{1}+R_{2}}{2},\sigma\in\mathbb{S}^{2}\}. Noting that:

r(σ,s)={sR1 for all σ𝕊2,s(R1,R1+R22),R2s for all σ𝕊2,s(R1+R22,R2)r(\sigma,s)=\left\{\begin{array}[]{ll}s-R_{1}&\mbox{ for all }\sigma\in\mathbb{S}^{2},s\in(R_{1},\frac{R_{1}+R_{2}}{2}),\\[5.69054pt] R_{2}-s&\mbox{ for all }\sigma\in\mathbb{S}^{2},s\in(\frac{R_{1}+R_{2}}{2},R_{2})\end{array}\right.

we have r1r^{\prime}\equiv 1 for all xA(R1,R1+R22)x\in A\left(R_{1},\frac{R_{1}+R_{2}}{2}\right) and r1r^{\prime}\equiv-1 for all xA(R1+R22,R2)x\in A\left(\frac{R_{1}+R_{2}}{2},R_{2}\right), hence the δ\delta-defective convexity condition amounts to the inequality:

(d1)(𝕊2R1R1+R22φ(σ,s)sd2𝑑s𝕊2R1+R22R2φ(σ,s)sd2𝑑s)δR2R1𝕊2R1R2φ(σ,s)sd1𝑑s𝑑σ,(d-1)\bigg{(}\int_{\mathbb{S}^{2}}\int_{R_{1}}^{\frac{R_{1}+R_{2}}{2}}\varphi(\sigma,s)s^{d-2}ds-\int_{\mathbb{S}^{2}}\int_{\frac{R_{1}+R_{2}}{2}}^{R_{2}}\varphi(\sigma,s)s^{d-2}ds\bigg{)}\leq\frac{\delta}{R_{2}-R_{1}}\int_{\mathbb{S}^{2}}\int_{R_{1}}^{R_{2}}\varphi(\sigma,s)s^{d-1}\,ds\,d\sigma,

which (taking into account that φ0\varphi\geq 0) is satisfied if for instance R2R1<1+δd1\frac{R_{2}}{R_{1}}<1+\frac{\delta}{d-1}. ∎

Proof of 2.15.

Arguing similarly as in the previous example, the δ\delta-defective convexity condition becomes:

Dεφ(x)Δr(x)𝑑xδεDεφ(x)𝑑x for all 0φCc(Dε),\int_{D_{\varepsilon}}\varphi(x)\Delta r(x)\,dx\leq\frac{\delta}{{\varepsilon}}\int_{D_{\varepsilon}}\varphi(x)\,dx\quad\mbox{ for all }0\leq\varphi\in C_{c}^{\infty}(D_{\varepsilon}), (4.13)

where Δr(x)\Delta r(x) is well-defined, in a classical sense, except on the set of measure zero Γε:={x+12εn(x)d|xΓ}\Gamma_{\varepsilon}:=\big{\{}x+\frac{1}{2}{\varepsilon}\,\,n(x)\in\mathbb{R}^{d}\ \big{|}\ x\in\Gamma\big{\}}. We also denote Dε+:={x+εtn(x)d|(x,t)Γ×(12,1)}D_{\varepsilon}^{+}:=\big{\{}x+{\varepsilon}\,t\,n(x)\in\mathbb{R}^{d}\ \big{|}\ (x,t)\in\Gamma\times(\frac{1}{2},1)\big{\}} respectively Dε:={x+εtn(x)d|(x,t)Γ×(0,12)}D_{\varepsilon}^{-}:=\big{\{}x+{\varepsilon}\,t\,n(x)\in\mathbb{R}^{d}\ \big{|}\ (x,t)\in\Gamma\times(0,\frac{1}{2})\big{\}}.

We recall (see for instance [19], Lemma 14.17, p. 355) that

Δr(x,t)={i=1d1ki(x)1ki(x)r(x,t) if t(0,12)i=1d1ki(x)1ki(x)(εr(x,t)) if t(12,1),\Delta r(x,t)=\left\{\begin{array}[]{ll}\sum_{i=1}^{d-1}\frac{k_{i}(x)}{1-k_{i}(x)r(x,t)}&\textrm{ if }t\in(0,\frac{1}{2})\\ -\sum_{i=1}^{d-1}\frac{k_{i}(x)}{1-k_{i}(x)({\varepsilon}-r(x,t))}&\textrm{ if }t\in(\frac{1}{2},1)\end{array},\right.

hence we have:

Dεφ(x)Δr(x)𝑑xDεφ(x)(i=1d12ki(x)2ki(x)ε)𝑑xDε+φ(x)(i=1d1ki(x))𝑑x\int_{D_{\varepsilon}}\varphi(x)\Delta r(x)\,dx\leq\int_{D_{\varepsilon}^{-}}\varphi(x)\left(\sum_{i=1}^{d-1}\frac{2k_{i}(x)}{2-k_{i}(x){\varepsilon}}\right)\,dx-\int_{D_{\varepsilon}^{+}}\varphi(x)\left(\sum_{i=1}^{d-1}k_{i}(x)\right)\,dx

thus the condition (2.15) holds for suitably small ε>0{\varepsilon}>0. ∎

Proof of 2.16.

We split the proof in two steps.

Step I (Regularization). Let Dn:={xD:r(x)>1/n}D_{n}:=\{x\in D:r(x)>1/n\} for each nn0n\geq n_{0}, where n0n_{0} is such that Dn0D_{n_{0}}\neq\emptyset. Further, let ρ0\rho\geq 0 be a (smooth) mollifier on d\mathbb{R}^{d} such that SuppρB(0,1){\rm Supp}\,{\rho}\subset B(0,1), and set ρt():=tdρ(/t)\rho_{t}(\cdot):=t^{-d}\rho(\cdot/t), t>0t>0. In particular,

Suppρt+DnD for all t<1/n0.{\rm Supp}\,{\rho_{t}}+D_{n}\subset D\quad\mbox{ for all }t<1/n_{0}. (4.14)

Now we extend r from \overline{D} to \mathbb{R}^{d} by setting r\equiv 0 on \mathbb{R}^{d}\setminus\overline{D}, and set

Vt=rt, where rt:=ρtrCc2(d).V_{t}=\sqrt{r_{t}},\;\mbox{ where }\;r_{t}:=\rho_{t}\ast r\in C_{c}^{2}(\mathbb{R}^{d}).

We claim that

  1. i)

    Δrtδ2rad(D)\Delta r_{t}\leq\frac{\delta}{2\;\rm{rad}(D)} on DnD_{n} for all nn0n\geq n_{0} and 0<t<1/n00<t<1/{n_{0}}.

  2. ii)

    ΔVt14rt3/2[|rt|2δ]\Delta V_{t}\leq-\frac{1}{4}{r_{t}}^{-3/2}[|\nabla r_{t}|^{2}-\delta] on DnD_{n} for all nn0n\geq n_{0} and 0<t<1/n00<t<1/{n_{0}}.

To prove the claim, note first that by a simple calculation we get

ΔVt=14rt3/2[|rt|22rtΔrt],\Delta V_{t}=-\frac{1}{4}{r_{t}}^{-3/2}[|\nabla r_{t}|^{2}-2r_{t}\Delta r_{t}],

hence ii) follows from i) and the fact that rtrad(D)r_{t}\leq\rm{rad}(D). So, it remains to prove the first assertion of the claim: Let 0φCc(Dn),nn0,t<1/n00\leq\varphi\in C_{c}^{\infty}(D_{n}),n\geq n_{0},t<1/n_{0}, and proceed with integration by parts and Fubini’s theorem as follows

DnφΔrt𝑑x\displaystyle\int_{D_{n}}\varphi\Delta r_{t}\;dx =Dnrtφdx=dρt(y)Dnr(xy)φ(x)𝑑x𝑑y\displaystyle=-\int_{D_{n}}\nabla r_{t}\cdot\nabla\varphi\;dx=-\int_{\mathbb{R}^{d}}\rho_{t}(y)\int_{D_{n}}\nabla r(x-y)\cdot\nabla\varphi(x)\;dx\;dy
=dρt(y)dr(xy)φ(x)𝑑x𝑑y\displaystyle=-\int_{\mathbb{R}^{d}}\rho_{t}(y)\int_{\mathbb{R}^{d}}\nabla r(x-y)\cdot\nabla\varphi(x)\;dx\;dy
=Suppρtρt(y)dr(x)φ(+y)(x)dxdy,\displaystyle=-\int_{{\rm Supp}\,{\rho_{t}}}\rho_{t}(y)\int_{\mathbb{R}^{d}}\nabla r(x)\cdot\nabla\varphi(\cdot+y)(x)\;dx\;dy,
so, by (4.14) and then using (2.15) we can continue with
=Suppρtρt(y)Dr(x)φ(+y)(x)dxdyδ2rad(D)Suppρtρt(y)Dφ(x+y)dxdy\displaystyle=-\int_{{\rm Supp}\,{\rho_{t}}}\rho_{t}(y)\int_{D}\nabla r(x)\cdot\nabla\varphi(\cdot+y)(x)\;dx\;dy\leq\frac{\delta}{2\;{\rm rad}(D)}\int_{{\rm Supp}\,{\rho_{t}}}\rho_{t}(y)\int_{D}\varphi(x+y)\;dx\;dy
\displaystyle=\frac{\delta}{2\;{\rm rad}(D)}\int_{{\rm Supp}\,{\rho_{t}}}\rho_{t}(y)\int_{\mathbb{R}^{d}}\varphi(x+y)\;dx\;dy=\frac{\delta}{2\;{\rm rad}(D)}\int_{\mathbb{R}^{d}}\varphi(x)\;dx
\displaystyle=\frac{\delta}{2\;{\rm rad}(D)}\int_{D_{n}}\varphi(x)\;dx,

which proves i) and hence the entire claim.

Now, Itô's formula combined with the claim proved above yields:

For xDn we have 𝔼{Vt(Bτ(DnB(x,r~(x)))cx)}\displaystyle\mbox{For }x\in D_{n}\mbox{ we have }\;\mathbb{E}\left\{V_{t}\left(B^{x}_{\tau_{(D_{n}\cap B(x,\widetilde{r}(x)))^{c}}}\right)\right\} =Vt(x)+𝔼{0τ(DnB(x,r~(x)))cΔVt(Bsx)𝑑s}\displaystyle=V_{t}(x)+\mathbb{E}\left\{\int_{0}^{\tau_{(D_{n}\cap B(x,\widetilde{r}(x)))^{c}}}\Delta V_{t}(B_{s}^{x})\;ds\right\}
Vt(x)14𝔼{0τ(DnB(x,r~(x)))c(rt)3/2(Bsx)[|rt|2(Bsx)δ]𝑑s}.\displaystyle\leq V_{t}(x)-\frac{1}{4}\mathbb{E}\left\{\int_{0}^{\tau_{(D_{n}\cap B(x,\widetilde{r}(x)))^{c}}}(r^{t})^{-3/2}(B_{s}^{x})\left[|\nabla r^{t}|^{2}(B_{s}^{x})-\delta\right]\;ds\right\}. (4.15)

Step II (Passing to the limit in (4.15)). The next step is to let t0t\to 0 and then nn\to\infty in (4.15). To this end, note that because rC(D¯)W1,(D)r\in C(\overline{D})\cap W^{1,\infty}(D), we have

  1. iii)

    limt0rt=r\lim\limits_{t\to 0}r_{t}=r uniformly on DD,

  2. iv)

    limt0rt=r\lim\limits_{t\to 0}\nabla r_{t}=\nabla r a.e. and boundedly on DD.

In particular, because infxDnr(x)1/n\inf\limits_{x\in D_{n}}r(x)\geq 1/n for large enough n0n_{0} and nn0n\geq n_{0}, we have that

  1. v)

    limt0rt3/2=r3/2\lim\limits_{t\to 0}r_{t}^{-3/2}=r^{-3/2} boundedly on DnD_{n}, for each nn0n\geq n_{0}.

We are now in the position to let t0t\to 0 in (4.15) to get that for xDnx\in D_{n}

𝔼{V(Bτ(DnB(x,r~(x)))cx)}\displaystyle\mathbb{E}\left\{V\left(B^{x}_{\tau_{(D_{n}\cap B(x,\widetilde{r}(x)))^{c}}}\right)\right\} V(x)1δ4𝔼{0τ(DnB(x,r~(x)))cr3/2(Btx)𝑑t},\displaystyle\leq V(x)-\frac{1-\delta}{4}\mathbb{E}\left\{\int_{0}^{\tau_{(D_{n}\cap B(x,\widetilde{r}(x)))^{c}}}r^{-3/2}(B_{t}^{x})\;dt\right\},

where we have used that |\nabla r|=1 a.e. This follows from two basic facts. On the one hand, r is 1-Lipschitz, so |\nabla r|\leq 1. On the other hand, since r is Lipschitz, Rademacher's theorem guarantees that r is differentiable almost everywhere. If x is a point where r is differentiable, y\in\partial D is such that d(x,y)=r(x), and v=y-x, then it is easy to see that the derivative of r in the direction of v has absolute value 1, which proves the claim.

Now, on the one hand, since r2r(x)r\leq 2r(x) on B(x,r~(x))B(x,r(x))B(x,\widetilde{r}(x))\subset B(x,r(x)) we deduce that

𝔼{V(Bτ(DnB(x,r~(x)))cx)}\displaystyle\mathbb{E}\left\{V\left(B^{x}_{\tau_{(D_{n}\cap B(x,\widetilde{r}(x)))^{c}}}\right)\right\} V(x)1δ4r(x)3/2𝔼{τ(DnB(x,r~(x)))c},xDn.\displaystyle\leq V(x)-\frac{1-\delta}{4r(x)^{3/2}}\mathbb{E}\left\{\tau_{(D_{n}\cap B(x,\widetilde{r}(x)))^{c}}\right\},\quad x\in D_{n}. (4.16)

On the other hand, DnnDD_{n}\mathop{\nearrow}\limits_{n}D, so τ(DnB(x,r~(x)))cnτB(x,r~(x))c\tau_{(D_{n}\cap B(x,\widetilde{r}(x)))^{c}}\mathop{\longrightarrow}\limits_{n}\tau_{B(x,\widetilde{r}(x))^{c}} a.s., hence for all xDx\in D

PV(x)\displaystyle PV(x) =𝔼{V(BτB(x,r~(x))x)}=limn𝔼{V(Bτ(DnB(x,r~(x)))cx)},\displaystyle=\mathbb{E}\left\{V\left(B^{x}_{\tau_{B(x,\widetilde{r}(x))}}\right)\right\}=\lim\limits_{n}\mathbb{E}\left\{V\left(B^{x}_{\tau_{(D_{n}\cap B(x,\widetilde{r}(x)))^{c}}}\right)\right\},
hence letting nn\to\infty in (4.16) we can continue with
PV(x)\displaystyle PV(x) V(x)1δ4r(x)3/2𝔼{τB(x,r~(x))c}=V(x)1δ4r(x)3/2r~(x)2d\displaystyle\leq V(x)-\frac{1-\delta}{4r(x)^{3/2}}\mathbb{E}\left\{\tau_{B(x,\widetilde{r}(x))^{c}}\right\}=V(x)-\frac{1-\delta}{4r(x)^{3/2}}\frac{\widetilde{r}(x)^{2}}{d}
(1β2(1δ)4d)V(x),\displaystyle\leq\left(1-\frac{\beta^{2}(1-\delta)}{4d}\right)V(x),

which proves (2.17). Notice that for the last inequality we used that βr(x)r~(x),xD\beta r(x)\leq\widetilde{r}(x),x\in D.

The tail estimate (2.18) can be deduced from (2.17) and Markov inequality, as follows:

(Nεx>M)\displaystyle\mathbb{P}(N_{\varepsilon}^{x}>M) (r(XMx)ε)=(V(XMx)ε)1ε𝔼{V(XMx)}=1εPMV(x)\displaystyle\leq\mathbb{P}(r(X^{x}_{M})\geq\varepsilon)=\mathbb{P}(V(X^{x}_{M})\geq\sqrt{\varepsilon})\leq\frac{1}{\sqrt{\varepsilon}}\mathbb{E}\left\{V(X^{x}_{M})\right\}=\frac{1}{\sqrt{\varepsilon}}P^{M}V(x)
(1β2(1δ)4d)MV(x)ε,xD.\displaystyle\leq\left(1-\frac{\beta^{2}(1-\delta)}{4d}\right)^{M}\frac{V(x)}{\sqrt{\varepsilon}},\quad x\in D.

Let us finally conclude the proof by showing (2.19):

𝔼{aNεx}\displaystyle\mathbb{E}\left\{a^{N_{\varepsilon}^{x}}\right\} =k0ak(Nεx=k)1+k1ak(Nεx>k1)\displaystyle=\sum\limits_{k\geq 0}a^{k}\mathbb{P}(N_{\varepsilon}^{x}=k)\leq 1+\sum\limits_{k\geq 1}a^{k}\mathbb{P}(N_{\varepsilon}^{x}>k-1)
1+k1ak(1β2(1δ)4d)k1V(x)ε=1+aV(x)εk0ak(1β2(1δ)4d)k\displaystyle\leq 1+\sum\limits_{k\geq 1}a^{k}\left(1-\frac{\beta^{2}(1-\delta)}{4d}\right)^{k-1}\frac{V(x)}{\sqrt{\varepsilon}}=1+a\frac{V(x)}{\sqrt{\varepsilon}}\sum\limits_{k\geq 0}a^{k}\left(1-\frac{\beta^{2}(1-\delta)}{4d}\right)^{k}
=1+a1aδdV(x)ε,xD.\displaystyle=1+\frac{a}{1-a\delta_{d}}\frac{V(x)}{\sqrt{\varepsilon}},\quad x\in D.

4.3 Proofs for Subsection 2.3

Proof of 2.19.

First of all, note that by similar arguments to those used in the proof of 2.10,

uM(x)=𝔼{n=0M1τnxτn+1xf(Btx)𝑑t}=𝔼{0τMxf(Btx)𝑑t},xD.u_{M}(x)=\mathbb{E}\left\{\sum\limits_{n=0}^{M-1}\int_{\tau_{n}^{x}}^{\tau_{n+1}^{x}}f(B^{x}_{t})\;dt\right\}=\mathbb{E}\left\{\int_{0}^{\tau_{M}^{x}}f(B^{x}_{t})\;dt\right\},\quad x\in D.

Therefore,

|u(x)uM(x)|\displaystyle|u(x)-u_{M}(x)| =|𝔼{τMxτDcf(Btx)𝑑t}||f|𝔼{τDcτMx}=|f|𝔼x{𝔼BτMxx{τDc}}\displaystyle=\left|\mathbb{E}\left\{\int_{\tau_{M}^{x}}^{\tau_{D^{c}}}f(B^{x}_{t})\;dt\right\}\right|\leq|f|_{\infty}\mathbb{E}\left\{\tau_{D^{c}}-\tau_{M}^{x}\right\}=|f|_{\infty}\mathbb{E}^{x}\left\{\mathbb{E}^{B^{x}_{\tau_{M}^{x}}}\left\{\tau_{D^{c}}\right\}\right\}
=|f|𝔼{v(BτMxx)}=|f|𝔼{v(BτMxx);NεxM}+|f|𝔼{v(BτMxx);Nεx>M}\displaystyle=|f|_{\infty}\mathbb{E}\left\{v(B^{x}_{\tau_{M}^{x}})\right\}=|f|_{\infty}\mathbb{E}\left\{v(B^{x}_{\tau_{M}^{x}});N_{\varepsilon}^{x}\leq M\right\}+|f|_{\infty}\mathbb{E}\left\{v(B^{x}_{\tau_{M}^{x}});N_{\varepsilon}^{x}>M\right\}
|f|[𝔼{v(BτMNεxxx);NεxM}+|v|(Nεx>M)]\displaystyle\leq|f|_{\infty}\left[\mathbb{E}\left\{v(B^{x}_{\tau_{M\vee N_{\varepsilon}^{x}}^{x}});N_{\varepsilon}^{x}\leq M\right\}+|v|_{\infty}\mathbb{P}(N_{\varepsilon}^{x}>M)\right]
|f|[𝔼{v(BτMNεxxx);NεxM}+1d𝖽𝗂𝖺𝗆(D)22eβ2ε24𝖽𝗂𝖺𝗆(D)2M],xD,\displaystyle\leq|f|_{\infty}\left[\mathbb{E}\left\{v(B^{x}_{\tau_{M\vee N_{\varepsilon}^{x}}^{x}});N_{\varepsilon}^{x}\leq M\right\}+\frac{1}{d}{\sf diam}(D)^{2}2e^{-\frac{\beta^{2}\varepsilon^{2}}{4{\sf diam}(D)^{2}}M}\right],\quad x\in D,

where the last inequality follows by 2.3 and 2.11. Now, one can see that (v(Bτnβ))n0(v(B_{\tau_{n}^{\beta}}))_{n\geq 0} is a supermartingale, hence by Doob’s stopping theorem we get that

𝔼{v(BτMNεxxx);NεxM}=𝔼{1[NεxM]𝔼[v(BτMNεxxx)Nεx]}𝔼{v(BτNεxxx)}=v(x,ε)v(ε),\displaystyle\mathbb{E}\left\{v(B^{x}_{\tau_{M\vee N_{\varepsilon}^{x}}^{x}});N_{\varepsilon}^{x}\leq M\right\}=\mathbb{E}\left\{1_{[N_{\varepsilon}^{x}\leq M]}\mathbb{E}\left[v(B^{x}_{\tau_{M\vee N_{\varepsilon}^{x}}^{x}})\mid\mathcal{F}_{N_{\varepsilon}^{x}}\right]\right\}\leq\mathbb{E}\left\{v(B^{x}_{\tau_{N_{\varepsilon}^{x}}^{x}})\right\}=v(x,\varepsilon)\leq v_{\infty}(\varepsilon),

hence the desired estimate is now fully justified. ∎

Further, we need the following lemma:

Lemma 4.1.

Let \varepsilon>0, \beta\in(0,1], let \widetilde{r} be a (\beta,\varepsilon)-distance, x\in D, and let (X_{n}^{x})_{n\geq 0} be the corresponding \widetilde{r}-WoS chain. Further, let \tau^{\prime}\geq\tau be finite stopping times such that \tau\geq N_{\varepsilon}^{x}. If g is \alpha-Hölder on \overline{D} for some \alpha\in[0,1] then

𝔼{|g(Xτx)g(Xτx)|}dα/2|g|αv(x,ε)α/2dα/2|g|α|v|α/2(ε).\mathbb{E}\left\{\left|g(X^{x}_{\tau^{\prime}})-g(X^{x}_{\tau})\right|\right\}\leq d^{\alpha/2}|g|_{\alpha}\cdot v(x,\varepsilon)^{\alpha/2}\leq d^{\alpha/2}|g|_{\alpha}\cdot|v|^{\alpha/2}_{\infty}(\varepsilon).

If gCb2(D¯)g\in C^{2}_{b}(\overline{D}) then

|𝔼{g(Xτx)}𝔼{g(Xτx)}||Δg|2v(x,ε)|Δg|2|v|(ε).\left|\mathbb{E}\left\{g(X^{x}_{\tau^{\prime}})\right\}-\mathbb{E}\left\{g(X^{x}_{\tau})\right\}\right|\leq\dfrac{|\Delta g|_{\infty}}{2}\cdot v(x,\varepsilon)\leq\frac{|\Delta g|_{\infty}}{2}\cdot|v|_{\infty}(\varepsilon).
Proof.

It is easy to see that for zdz\in\mathbb{R}^{d}, (Xnx,zx,z)n1(\langle X^{x}_{n},z\rangle-\langle x,z\rangle)_{n\geq 1} is a bounded martingale, hence

𝔼{Xτx,Xτx}=𝔼{|Xτx|2}, and thus 𝔼{|XτxXτx|2}=𝔼{|Xτx|2}𝔼{|Xτx|2}.\mathbb{E}\{\langle X^{x}_{\tau^{\prime}},X^{x}_{\tau}\rangle\}=\mathbb{E}\left\{|X_{\tau}^{x}|^{2}\right\},\quad\mbox{ and thus }\quad\mathbb{E}\left\{|X^{x}_{\tau^{\prime}}-X^{x}_{\tau}|^{2}\right\}=\mathbb{E}\left\{|X^{x}_{\tau^{\prime}}|^{2}\right\}-\mathbb{E}\left\{|X^{x}_{\tau}|^{2}\right\}.

By employing the martingale problem for the Markov chain (Xnx)n0(X_{n}^{x})_{n\geq 0}, we get that for any finite stopping time TT, 𝔼{|XTx|2}=|x|2+𝔼{i=0T1r~2(Xix)},\mathbb{E}\left\{|X^{x}_{T}|^{2}\right\}=|x|^{2}+\mathbb{E}\left\{\sum_{i=0}^{T-1}\widetilde{r}^{2}(X^{x}_{i})\right\}, hence 𝔼{|XτxXτx|2}=𝔼{i=ττr~2(Xix)}\mathbb{E}\left\{|X^{x}_{\tau^{\prime}}-X^{x}_{\tau}|^{2}\right\}=\mathbb{E}\left\{\sum\limits_{i=\tau}^{\tau^{\prime}}\widetilde{r}^{2}(X^{x}_{i})\right\} and therefore

\displaystyle\mathbb{E}\left\{\left|g(X^{x}_{\tau^{\prime}})-g(X^{x}_{\tau})\right|\right\} \displaystyle\leq|g|_{\alpha}\mathbb{E}\left\{\sum_{i=\tau}^{\infty}\widetilde{r}^{2}(X^{x}_{i})\right\}^{\alpha/2}\leq d^{\alpha/2}|g|_{\alpha}\mathbb{E}\left\{\sum_{i=N^{x}_{\varepsilon}}^{\infty}\tau^{x}_{i+1}-\tau^{x}_{i}\right\}^{\alpha/2}
=dα/2|g|α𝔼{τDxτNεxx}α/2=dα/2|g|α𝔼{v(BτNεxxx)}α/2\displaystyle=d^{\alpha/2}|g|_{\alpha}\cdot\mathbb{E}\left\{\tau^{x}_{\partial D}-\tau^{x}_{N^{x}_{\varepsilon}}\right\}^{\alpha/2}=d^{\alpha/2}|g|_{\alpha}\cdot\mathbb{E}\left\{v\left(B^{x}_{\tau^{x}_{N^{x}_{\varepsilon}}}\right)\right\}^{\alpha/2}
dα/2|g|α|v|α/2(ε).\displaystyle\leq d^{\alpha/2}|g|_{\alpha}\cdot|v|_{\infty}^{\alpha/2}(\varepsilon).

Suppose now that gC2(D¯)g\in C^{2}(\overline{D}). Then, by the martingale problem we deduce

|𝔼{g(Xτx)}𝔼{g(Xτx)}|\displaystyle\left|\mathbb{E}\left\{g(X^{x}_{\tau^{\prime}})\right\}-\mathbb{E}\left\{g(X^{x}_{\tau})\right\}\right| =|𝔼{i=ττ1(Pgg)(Xix)}|𝔼{i=Nεx|Pgg|(Xix)}.\displaystyle=\left|\mathbb{E}\left\{\sum_{i=\tau}^{\tau^{\prime}-1}(Pg-g)(X^{x}_{i})\right\}\right|\leq\mathbb{E}\left\{\sum_{i=N^{x}_{\varepsilon}}^{\infty}|Pg-g|(X^{x}_{i})\right\}.

On the other hand, by Itô’s formula

|Pg(z)g(z)|=|𝔼{0τ1β,z12Δg(Bt)dt}|12|Δg|𝔼{τ1β,z}=|Δg|r~(z)22d,|Pg(z)-g(z)|=\left|\mathbb{E}\left\{\int_{0}^{\tau^{\beta,z}_{1}}\frac{1}{2}\Delta g(B_{t})\;\textrm{d}t\right\}\right|\leq\frac{1}{2}|\Delta g|_{\infty}\cdot\mathbb{E}\left\{\tau_{1}^{\beta,z}\right\}=|\Delta g|_{\infty}\cdot\frac{\widetilde{r}(z)^{2}}{2d},

hence

|𝔼{g(Xτx)}𝔼{g(Xτx)}|\displaystyle\left|\mathbb{E}\left\{g(X^{x}_{\tau^{\prime}})\right\}-\mathbb{E}\left\{g(X^{x}_{\tau})\right\}\right| |Δg|2d𝔼{i=Nϵxr~2(Xix)}=|Δg|2𝔼{i=Nϵxτi+1xτix}\displaystyle\leq\dfrac{|\Delta g|_{\infty}}{2d}\cdot\mathbb{E}\left\{\sum_{i=N^{x}_{\epsilon}}^{\infty}\widetilde{r}^{2}(X^{x}_{i})\right\}=\dfrac{|\Delta g|_{\infty}}{2}\cdot\mathbb{E}\left\{\sum_{i=N^{x}_{\epsilon}}^{\infty}\tau^{x}_{i+1}-\tau^{x}_{i}\right\}
=|Δg|2𝔼{τDxτNϵxx}=|Δg|2𝔼{v(BτNϵxxx)}\displaystyle=\dfrac{|\Delta g|_{\infty}}{2}\cdot\mathbb{E}\left\{\tau^{x}_{\partial D}-\tau^{x}_{N^{x}_{\epsilon}}\right\}=\dfrac{|\Delta g|_{\infty}}{2}\cdot\mathbb{E}\left\{v\left(B^{x}_{\tau^{x}_{N^{x}_{\epsilon}}}\right)\right\}
=|Δg|2v(x,ϵ)|Δg|2|v|(ϵ).\displaystyle=\dfrac{|\Delta g|_{\infty}}{2}\cdot v(x,\epsilon)\leq\dfrac{|\Delta g|_{\infty}}{2}\cdot|v|_{\infty}(\epsilon).

Proof of 2.20.

Recall that (Bτnxx)n\left(B^{x}_{\tau^{x}_{n}}\right)_{n} and (Xnx)n(X^{x}_{n})_{n} are equal in law and hence BτNϵxxxB^{x}_{\tau^{x}_{N^{x}_{\epsilon}}} and XNϵxxX^{x}_{N^{x}_{\epsilon}} are also equal in law. In particular,

𝔼{g(BτDcxx)}=limn𝔼{g(Bτnxx)}=limn𝔼{g(Xnx)} for all xD.\mathbb{E}\{g(B^{x}_{\tau^{x}_{D^{c}}})\}=\lim_{n}\mathbb{E}\{g(B^{x}_{\tau^{x}_{n}})\}=\lim_{n}\mathbb{E}\{g(X^{x}_{n})\}\quad\mbox{ for all }x\in D.

Also, if TT is a finite random time then

\lim_{n}\mathbb{E}\{g(X^{x}_{n\vee T})\}=\lim_{n}\left[\mathbb{E}\{g(X^{x}_{n});\;T<n\}+\mathbb{E}\{g(X^{x}_{T});\;T\geq n\}\right]=\mathbb{E}\{g(B^{x}_{\tau^{x}_{D^{c}}})\},\quad x\in D.

Now let us fix ε>0\varepsilon>0 and argue as follows:

|u(x)uM(x)|\displaystyle|u(x)-u_{M}(x)| limn|𝔼{g(Xnx)g(XMx)}|\displaystyle\leq\lim\limits_{n}\left|\mathbb{E}\left\{g(X^{x}_{n})-g(X^{x}_{M})\right\}\right|
\displaystyle\leq\lim\limits_{n}\left|\mathbb{E}\left\{g(X^{x}_{n})-g(X^{x}_{M});\;N_{\varepsilon}^{x}\leq M\right\}\right|+\lim\limits_{n}\left|\mathbb{E}\left\{g(X^{x}_{n})-g(X^{x}_{M});\;N_{\varepsilon}^{x}>M\right\}\right|
limn|𝔼{g(XnMNεxx)g(XMNεxx);NεxM}|+2|g|(Nεx>M).\displaystyle\leq\lim\limits_{n}\left|\mathbb{E}\left\{g(X^{x}_{n\vee M\vee N^{x}_{\varepsilon}})-g\left(X^{x}_{M\vee N_{\varepsilon}^{x}}\right);\;N_{\varepsilon}^{x}\leq M\right\}\right|+2|g|_{\infty}\mathbb{P}(N_{\varepsilon}^{x}>M).

Now, since gg is α\alpha-Hölder, for xDx\in D we have

|u(x)uM(x)|\displaystyle|u(x)-u_{M}(x)| limn𝔼{|g(XnMNεxx)g(XMNεxx)|}+2|g|(Nεx>M),\displaystyle\leq\lim\limits_{n}\mathbb{E}\left\{\left|g(X^{x}_{n\vee M\vee N^{x}_{\varepsilon}})-g\left(X^{x}_{M\vee N_{\varepsilon}^{x}}\right)\right|\right\}+2|g|_{\infty}\mathbb{P}(N_{\varepsilon}^{x}>M),
and by employing 4.1, i), we can continue with
dα/2|g|α|v|α/2(ε)+2|g|(Nεx>M)\displaystyle\leq d^{\alpha/2}|g|_{\alpha}\cdot|v|^{\alpha/2}_{\infty}(\varepsilon)+2|g|_{\infty}\mathbb{P}(N_{\varepsilon}^{x}>M)
dα/2|g|α|v|α/2(ε)+4|g|eβ2ε24𝖽𝗂𝖺𝗆(D)2M,\displaystyle\leq d^{\alpha/2}|g|_{\alpha}\cdot|v|^{\alpha/2}_{\infty}(\varepsilon)+4|g|_{\infty}e^{-\frac{\beta^{2}\varepsilon^{2}}{4{\sf diam}(D)^{2}}M},
where the last inequality is due to 2.11.

4.4 Proofs for Subsection 2.4

Proof of 2.24.

We have

𝔼\displaystyle\mathbb{E} {|u()uMN()|L2(D)2}\displaystyle\left\{\left|u(\cdot)-u_{M}^{N}(\cdot)\right|^{2}_{L^{2}(D)}\right\}
2|u()uM()|L2(D)2+2𝔼{|uM()uMN()|L2(D)2}\displaystyle\leq 2\left|u(\cdot)-u_{M}(\cdot)\right|^{2}_{L^{2}(D)}+2\mathbb{E}\left\{\left|u_{M}(\cdot)-u_{M}^{N}(\cdot)\right|^{2}_{L^{2}(D)}\right\}
\displaystyle\leq 2\lambda(D)\sup\limits_{x\in D}|u(x)-u_{M}(x)|^{2}+2\int\limits_{D}\mathbb{E}\left\{\left|u_{M}(\cdot)-u_{M}^{N}(\cdot)\right|^{2}\right\}\;dx
=2λ(D)supxD|u(x)uM(x)|2+2ND𝔼{|uM()uM1()|2}𝑑x\displaystyle=2\lambda(D)\sup\limits_{x\in D}|u(x)-u_{M}(x)|^{2}+\frac{2}{N}\int\limits_{D}\mathbb{E}\left\{\left|u_{M}(\cdot)-u_{M}^{1}(\cdot)\right|^{2}\right\}\;dx
2λ(D)supxD|u(x)uM(x)|2+4N(D[|g|2+1d2M|f|2𝖽𝗂𝖺𝗆(D)2𝔼{k=1Mr~2(Xk1x,1)}]𝑑x)\displaystyle\leq 2\lambda(D)\sup\limits_{x\in D}|u(x)-u_{M}(x)|^{2}+\frac{4}{N}\left(\int\limits_{D}\left[|g|^{2}_{\infty}+\frac{1}{d^{2}}M|f|^{2}_{\infty}{\sf diam}(D)^{2}\mathbb{E}\left\{\sum\limits_{k=1}^{M}\widetilde{r}^{2}(X_{k-1}^{x,1})\right\}\;\right]dx\right)
2λ(D)supxD|u(x)uM(x)|2+4N(D[|g|2+1d2M|f|2𝖽𝗂𝖺𝗆(D)2𝔼{τDcx}]𝑑x)\displaystyle\leq 2\lambda(D)\sup\limits_{x\in D}|u(x)-u_{M}(x)|^{2}+\frac{4}{N}\left(\int\limits_{D}\left[|g|^{2}_{\infty}+\frac{1}{d^{2}}M|f|^{2}_{\infty}{\sf diam}(D)^{2}\mathbb{E}\left\{\tau^{x}_{D^{c}}\right\}\;\right]dx\right)
and by 2.3
\displaystyle\leq 2\lambda(D)\left[\sup\limits_{x\in D}|u(x)-u_{M}(x)|^{2}+\frac{2\left(|g|^{2}_{\infty}+\frac{1}{d^{3}}M|f|^{2}_{\infty}{\sf diam}(D)^{4}\right)}{N}\right].

Before proving the main result, 2.26, we need several preliminary Lemmas.

Lemma 4.2.

Let ε>0\varepsilon>0 and β(0,1]\beta\in(0,1]. If ff and gg are α\alpha-Hölder for some α[0,1]\alpha\in[0,1], and M,N1M,N\geq 1, then

|uMN(x)uMN(y)|(|g|α+diam(D)2|f|α+2diam(D)|f|d)(2+|r~|1)M(|xy|α|xy|),|u_{M}^{N}(x)-u_{M}^{N}(y)|\leq\left(|g|_{\alpha}+\frac{{\rm diam}(D)^{2}|f|_{\alpha}+2{\rm diam}(D)|f|_{\infty}}{d}\right)(2+|\widetilde{r}|_{1})^{M}\left(|x-y|^{\alpha}\vee|x-y|\right),

for all x,yDx,y\in D almost surely. In particular,

|uM(x)uM(y)|(|g|α+diam(D)2|f|α+2diam(D)|f|d)(2+|r~|1)M(|xy|α|xy|)for all x,yD.|u_{M}(x)-u_{M}(y)|\leq\left(|g|_{\alpha}+\frac{{\rm diam}(D)^{2}|f|_{\alpha}+2{\rm diam}(D)|f|_{\infty}}{d}\right)(2+|\widetilde{r}|_{1})^{M}\left(|x-y|^{\alpha}\vee|x-y|\right)\quad\mbox{for all }x,y\in D.
Proof.

Clearly, it is sufficient to prove the estimate for |u_{M}^{i}(x)-u_{M}^{i}(y)| for a fixed i, 1\leq i\leq N. To this end, since \widetilde{r} is Lipschitz

\displaystyle|X^{x,i}_{M}-X^{y,i}_{M}| \displaystyle\leq|X^{x,i}_{M-1}-X^{y,i}_{M-1}|+|\widetilde{r}(X^{x,i}_{M-1})-\widetilde{r}(X^{y,i}_{M-1})|
(1+|r~|1)|XM1x,iXM1y,i|\displaystyle\leq(1+|\widetilde{r}|_{1})|X^{x,i}_{M-1}-X^{y,i}_{M-1}|
(1+|r~|1)M|xy| for all x,yD,M0.\displaystyle\leq(1+|\widetilde{r}|_{1})^{M}|x-y|\quad\mbox{ for all }x,y\in D,M\geq 0.

Therefore,

|uMi(x)uMi(y)|\displaystyle|u_{M}^{i}(x)-u_{M}^{i}(y)| |g|α|XMx,iXMy,i|α+1dk=1M2|f|diam(D)|r~(Xk1x,i)r~(Xk1y,i)|\displaystyle\leq|g|_{\alpha}|X^{x,i}_{M}-X^{y,i}_{M}|^{\alpha}+\frac{1}{d}\sum\limits_{k=1}^{M}2|f|_{\infty}{\rm diam}(D)|\widetilde{r}(X^{x,i}_{k-1})-\widetilde{r}(X^{y,i}_{k-1})|
\displaystyle\;\phantom{\leq|g|_{\alpha}|X^{x,i}_{M}-X^{y,i}_{M}|^{\alpha}}+\frac{1}{d}\sum\limits_{k=1}^{M}{\rm diam}(D)^{2}|f|_{\alpha}[|X^{x,i}_{k-1}-X^{y,i}_{k-1}|+|\widetilde{r}(X^{x,i}_{k-1})-\widetilde{r}(X^{y,i}_{k-1})|]^{\alpha}
|g|α(1+|r~|1)Mα|xy|α+2diam(D)|f||xy|dk=1M(1+|r~|1)k1\displaystyle\leq|g|_{\alpha}(1+|\widetilde{r}|_{1})^{M\alpha}|x-y|^{\alpha}+\frac{2{\rm diam}(D)|f|_{\infty}|x-y|}{d}\sum\limits_{k=1}^{M}(1+|\widetilde{r}|_{1})^{k-1}
+diam(D)2|f|α|xy|αdk=1M(1+|r~|1)α(k1)\displaystyle\;\phantom{\leq|g|_{\alpha}|X^{x,i}_{M}-X^{y,i}_{M}|^{\alpha}}+\frac{{\rm diam}(D)^{2}|f|_{\alpha}|x-y|^{\alpha}}{d}\sum\limits_{k=1}^{M}(1+|\widetilde{r}|_{1})^{\alpha(k-1)}
(|g|α+diam(D)2|f|α+2diam(D)|f|d)(2+|r~|1)M(|xy|α|xy|).\displaystyle\leq\left(|g|_{\alpha}+\frac{{\rm diam}(D)^{2}|f|_{\alpha}+2{\rm diam}(D)|f|_{\infty}}{d}\right)(2+|\widetilde{r}|_{1})^{M}\left(|x-y|^{\alpha}\vee|x-y|\right).

The next lemma is the well-known Hoeffding’s inequality:

Lemma 4.3.

Suppose that (Z_{i})_{i\geq 1} are iid real random variables such that a_{i}\leq Z_{i}\leq b_{i} for all i. Then for all N\in\mathbb{N} and \gamma\geq 0

(|𝔼{Z1}1Ni=1NZi|γ)2e2N2γ2i=1N(biai)2.\mathbb{P}\left(\left|\mathbb{E}\{Z_{1}\}-\frac{1}{N}\sum_{i=1}^{N}Z_{i}\right|\geq\gamma\right)\leq 2e^{-\frac{2N^{2}\gamma^{2}}{\sum_{i=1}^{N}(b_{i}-a_{i})^{2}}}.

Using Hoeffding’s inequality we immediately get the following estimate.

Corollary 4.4.

For all N,MN,M\in\mathbb{N}, and γ0\gamma\geq 0 we have

(|uM(x)uMN(x)|γ)2eNγ2(|g|+M𝖽𝗂𝖺𝗆(D)2|f|/d)2,xD.\mathbb{P}\left(\left|u_{M}(x)-u_{M}^{N}(x)\right|\geq\gamma\right)\leq 2e^{-\frac{N\gamma^{2}}{(|g|_{\infty}+M{\sf diam}(D)^{2}|f|_{\infty}/d)^{2}}},\quad x\in D.
Proof.

The result follows directly from 4.3 since

|g(XMx,i)+1dk=1Mr~2(Xk1x,i)f(Xk1x,i+r~(Xk1x,i)Yi)||g|+M𝖽𝗂𝖺𝗆(D)2|f|/d.\left|g(X^{x,i}_{M})+\frac{1}{d}\sum\limits_{k=1}^{M}\widetilde{r}^{2}(X^{x,i}_{k-1})f\left(X^{x,i}_{k-1}+\widetilde{r}(X^{x,i}_{k-1})Y^{i}\right)\right|_{\infty}\leq|g|_{\infty}+M{\sf diam}(D)^{2}|f|_{\infty}/d.
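In practice 4.4 is used in the reverse direction: given a tolerance γ and a confidence level η, one solves 2\exp(-N\gamma^{2}/C^{2})\leq\eta for the number of samples N, where C:=|g|_{\infty}+M\,{\sf diam}(D)^{2}|f|_{\infty}/d. A small Python sketch with placeholder constants (the numerical inputs below are illustrative only):

import math

def samples_needed(gamma, eta, g_inf, f_inf, diam, M, d):
    # Smallest N with 2*exp(-N*gamma**2 / C**2) <= eta, C as in Corollary 4.4.
    C = g_inf + M * diam**2 * f_inf / d
    return math.ceil(C**2 * math.log(2.0 / eta) / gamma**2)

print(samples_needed(gamma=0.05, eta=1e-3, g_inf=1.0, f_inf=1.0, diam=1.0, M=50, d=10))

This is a pointwise guarantee; the sup-norm statement of 2.26 is obtained below by combining it, via a union bound, with the grid F and the modulus of continuity from 4.2.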

Finally, we are in the position to prove the main theorem.

Proof of 2.26.

First of all, assume without loss of generality that D[0,𝖽𝗂𝖺𝗆(D)]dD\subset[0,{\sf diam}(D)]^{d}, and for each M,K1M,K\geq 1 consider the grid

F=F(K,M,\alpha,|\widetilde{r}|_{1}):=\left\{\frac{i\,{\sf diam}(D)}{K(2+|\widetilde{r}|_{1})^{\lceil M/\alpha\rceil}}:\;1\leq i\leq K(2+|\widetilde{r}|_{1})^{\lceil M/\alpha\rceil}\right\}^{d}\cap D.

For xDx\in D such that x[i1𝖽𝗂𝖺𝗆(D)K(2+|r~|1)M/α,(i1+1)𝖽𝗂𝖺𝗆(D)K(2+|r~|1)M/α)××[id𝖽𝗂𝖺𝗆(D)K(2+|r~|1)M/α,(id+1)𝖽𝗂𝖺𝗆(D)K(2+|r~|1)M/α)x\in\left[\frac{i_{1}{\sf diam}(D)}{K(2+|\widetilde{r}|_{1})^{\lceil M/\alpha\rceil}},\frac{(i_{1}+1){\sf diam}(D)}{K(2+|\widetilde{r}|_{1})^{\lceil M/\alpha\rceil}}\right)\times\cdots\times\left[\frac{i_{d}{\sf diam}(D)}{K(2+|\widetilde{r}|_{1})^{\lceil M/\alpha\rceil}},\frac{(i_{d}+1){\sf diam}(D)}{K(2+|\widetilde{r}|_{1})^{\lceil M/\alpha\rceil}}\right) we set

xF:=(i1𝖽𝗂𝖺𝗆(D)K(2+|r~|1)M/α,,id𝖽𝗂𝖺𝗆(D)K(2+|r~|1)M/α).x^{F}:=\left(\frac{i_{1}{\sf diam}(D)}{K(2+|\widetilde{r}|_{1})^{\lceil M/\alpha\rceil}},\cdots,\frac{i_{d}{\sf diam}(D)}{K(2+|\widetilde{r}|_{1})^{\lceil M/\alpha\rceil}}\right).
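The grid F and the projection x\mapsto x^{F} act componentwise and are straightforward to realize; a minimal Python sketch (all parameters below are placeholders used only for illustration):

import math
import numpy as np

def grid_projection(x, diam, K, M, alpha, lip_r):
    # Spacing of the grid F and componentwise projection onto the left grid point x^F.
    h = diam / (K * (2.0 + lip_r) ** math.ceil(M / alpha))
    xF = np.floor(np.asarray(x, dtype=float) / h) * h
    return xF, h   # each coordinate of x - x^F lies in [0, h)

xF, h = grid_projection([0.37, 0.82], diam=1.0, K=20, M=4, alpha=0.5, lip_r=1.0)
print(xF, h)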

Note that

supxD|u(x)uMN(x)|\displaystyle\sup_{x\in D}\left|u(x)-u_{M}^{N}(x)\right| supxD|u(x)uM(x)|+supxD|uM(x)uMN(x)|\displaystyle\leq\sup_{x\in D}\left|u(x)-u_{M}(x)\right|+\sup_{x\in D}\left|u_{M}(x)-u_{M}^{N}(x)\right|
supxD|u(x)uM(x)|+supxF|uM(x)uMN(x)|\displaystyle\leq\sup_{x\in D}\left|u(x)-u_{M}(x)\right|+\sup_{x\in F}\left|u_{M}(x)-u_{M}^{N}(x)\right|
+2(|g|α+diam(D)2|f|α+2diam(D)|f|d)(𝖽𝗂𝖺𝗆(D)K)α,\displaystyle\;\phantom{\leq\sup_{x\in D}\left|u(x)-u_{M}(x)\right|}+2\left(|g|_{\alpha}+\frac{{\rm diam}(D)^{2}|f|_{\alpha}+2{\rm diam}(D)|f|_{\infty}}{d}\right)\left(\frac{{\sf diam}(D)}{K}\right)^{\alpha},

where the last inequality follows from 4.2 and by the fact that

|x-x^{F}|\leq\frac{{\sf diam}(D)}{K(2+|\widetilde{r}|_{1})^{\lceil M/\alpha\rceil}}\mbox{ for all }\;x\in D.

Consequently, by setting

γ~:=γsupxD|u(x)uM(x)|2(|g|α+diam(D)2|f|α+2diam(D)|f|d)(𝖽𝗂𝖺𝗆(D)K)α\widetilde{\gamma}:=\gamma-\sup_{x\in D}\left|u(x)-u_{M}(x)\right|-2\left(|g|_{\alpha}+\frac{{\rm diam}(D)^{2}|f|_{\alpha}+2{\rm diam}(D)|f|_{\infty}}{d}\right)\left(\frac{{\sf diam}(D)}{K}\right)^{\alpha}

and using union bound inequality we have

\displaystyle\mathbb{P} (supxD|u(x)uMN(x)|γ)(supxF|uM(x)uMN(x)|γ~)xF(|uM(x)uMN(x)|γ~),\displaystyle\left(\sup_{x\in D}\left|u(x)-u_{M}^{N}(x)\right|\geq\gamma\right)\leq\mathbb{P}\left(\sup_{x\in F}\left|u_{M}(x)-u_{M}^{N}(x)\right|\geq\widetilde{\gamma}\right)\leq\sum\limits_{x\in F}\mathbb{P}\left(\left|u_{M}(x)-u_{M}^{N}(x)\right|\geq\widetilde{\gamma}\right),

so, the two desired estimates now follow by 2.21 and 4.4.

In the case of gC2(D¯)g\in C^{2}(\bar{D}), we only need to use the second part of 2.21, the rest of the argument being the same.

The assertions about the particular domains are clear.

For the statement about the expectation in (2.35), we only need the following lemma.

Lemma 4.5.

If X is a non-negative random variable with the property that there exist constants c_{1},c_{2},A\geq 0 such that

(Xt)2ec1c2((tA)+)2 for all t0,\mathbb{P}(X\geq t)\leq 2e^{c_{1}-c_{2}((t-A)^{+})^{2}}\text{ for all }t\geq 0, (4.17)

then

𝔼[X]A+c1+log(2)+1c2.\mathbb{E}[X]\leq A+\frac{\sqrt{c_{1}+\log(2)}+1}{\sqrt{c_{2}}}.
Proof.

Start with λA\lambda\geq A and write

\mathbb{E}[X]=\int_{0}^{\infty}\mathbb{P}(X\geq t)dt=\int_{0}^{\lambda}\mathbb{P}(X\geq t)dt+\int_{\lambda}^{\infty}\mathbb{P}(X\geq t)dt\leq\lambda+\int_{\lambda}^{\infty}2e^{c_{1}-c_{2}(t-A)^{2}}dt=\lambda+\int_{\lambda-A}^{\infty}2e^{c_{1}-c_{2}t^{2}}dt.

Optimizing over \lambda\geq A yields the optimum point as

λ=A+c1+log(2)c2\lambda^{*}=A+\sqrt{\frac{c_{1}+\log(2)}{c_{2}}}

which in turn yields

λA2ec1c2t2𝑑t2ec1λAtλAec2t2𝑑t=12(c1+log(2))c21c2.\int_{\lambda^{*}-A}^{\infty}2e^{c_{1}-c_{2}t^{2}}dt\leq 2e^{c_{1}}\int_{\lambda^{*}-A}^{\infty}\frac{t}{\lambda^{*}-A}e^{-c_{2}t^{2}}dt=\frac{1}{2\sqrt{(c_{1}+\log(2))c_{2}}}\leq\frac{1}{\sqrt{c_{2}}}.

This concludes the estimate. ∎

The rest of the statements in the theorem are straightforward now.
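As a quick numerical sanity check of 4.5, one can integrate the worst admissible tail \mathbb{P}(X\geq t)=\min(1,\,2e^{c_{1}-c_{2}((t-A)^{+})^{2}}) and compare the result with the bound A+(\sqrt{c_{1}+\log 2}+1)/\sqrt{c_{2}}; the constants below are arbitrary.

import numpy as np

c1, c2, A = 1.0, 4.0, 0.5                          # arbitrary admissible constants
t = np.linspace(0.0, 20.0, 200001)
tail = np.minimum(1.0, 2.0 * np.exp(c1 - c2 * np.maximum(t - A, 0.0) ** 2))
expectation = np.trapz(tail, t)                     # E[X] = int_0^infty P(X >= t) dt
bound = A + (np.sqrt(c1 + np.log(2.0)) + 1.0) / np.sqrt(c2)
print(expectation, bound)                           # approx 1.31 <= approx 1.65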

Proof of 2.28.

Note that since r~\widetilde{r} is a (β,ε)(\beta,\varepsilon)-distance, it is also a (β,ε0)(\beta,\varepsilon_{0})-distance since εε0\varepsilon\leq\varepsilon_{0}.

Now, using (2.38) we get that

2(|g|α+diam(D)2|f|α+2diam(D)|f|d)(𝖽𝗂𝖺𝗆(D)K)αγ4.2\left(|g|_{\alpha}+\frac{{\rm diam}(D)^{2}|f|_{\alpha}+2{\rm diam}(D)|f|_{\infty}}{d}\right)\left(\frac{{\sf diam}(D)}{K}\right)^{\alpha}\leq\frac{\gamma}{4}.

Also, because 𝖺𝖽𝗂𝖺𝗆(D)<{\sf adiam}(D)<\infty, by 2.6 we have that |v|(ε0)ε0𝖺𝖽𝗂𝖺𝗆(D)|v|_{\infty}(\varepsilon_{0})\leq\varepsilon_{0}{\sf adiam}(D). Therefore, since ε01\varepsilon_{0}\leq 1 is given by (2.37) and using that ε0ε0α/2\varepsilon_{0}\leq\varepsilon_{0}^{\alpha/2}, we get that

dα/2|g|α|v|α/2(ε0)+|f||v|(ε0)γ4.d^{\alpha/2}|g|_{\alpha}\cdot|v|^{\alpha/2}_{\infty}(\varepsilon_{0})+|f|_{\infty}|v|_{\infty}(\varepsilon_{0})\leq\frac{\gamma}{4}.

Now, if MM is as in (2.41), then

(4|g|+2d𝖽𝗂𝖺𝗆(D)2|f|)eβ2ε024𝖽𝗂𝖺𝗆(D)2Mγ4.(4|g|_{\infty}+\frac{2}{d}{\sf diam}(D)^{2}|f|_{\infty})e^{-\frac{\beta^{2}\varepsilon_{0}^{2}}{4{\sf diam}(D)^{2}}M}\leq\frac{\gamma}{4}.

Furthermore, if DD is δ\delta-defective convex and MM is chosen as in (2.42), we have that

(4|g|_{\infty}+\frac{2}{d}{\sf diam}(D)^{2}|f|_{\infty})a_{d}^{M}\sqrt{\frac{{\sf diam}(D)}{\varepsilon_{0}}}\leq\frac{\gamma}{4}.

All the choices above ensure that A(D,M,K,d,ε)A(D,M,K,d,\varepsilon) from 2.26 satisfies

A(D,M,K,d,ε)3γ4.A(D,M,K,d,\varepsilon)\leq\frac{3\gamma}{4}.

Taking into account the above inequality and the estimate (2.31), it is a direct check to see that the right hand side of (2.31) is less than \eta if N satisfies (3.3), which concludes the proof. ∎

4.5 Proofs for Subsection 2.5

Proof of 2.30.

We have:

\displaystyle\frac{|G(x)-G(y)|}{|x-y|^{\alpha}} \displaystyle\leq\frac{\left|\psi\left(\frac{1}{{\varepsilon}_{0}}r(x)\right)\right|\,\left|g({\pi_{\partial D}}(x))-g({\pi_{\partial D}}(y))\right|}{|x-y|^{\alpha}}+\frac{\left|\psi\left(\frac{1}{{\varepsilon}_{0}}r(x)\right)-\psi\left(\frac{1}{{\varepsilon}_{0}}r(y)\right)\right|\,\left|g({\pi_{\partial D}}(y))\right|}{|x-y|^{\alpha}}
\displaystyle\leq\frac{|{\pi_{\partial D}}(x)-{\pi_{\partial D}}(y)|^{\alpha}|g|_{\alpha}}{|x-y|^{\alpha}}+\frac{|g|_{\infty}}{{\varepsilon}_{0}}\frac{|r(x)-r(y)|}{|x-y|^{\alpha}}
\displaystyle\leq|\nabla{\pi_{\partial D}}|_{\infty}^{\alpha}|g|_{\alpha}+\frac{|g|_{\infty}}{{\varepsilon}_{0}}\frac{|x-y|^{\alpha}\,{\sf diam}(D)^{1-\alpha}}{|x-y|^{\alpha}}.

For the second part, we know that D is of class C^{3} and g\in C^{2}(\partial D), and thus G\in C^{2}(D)\cap C(\overline{D}). Using the definition of G we have

G\displaystyle\nabla G =1ε0ψ(1ε0r)rg(πD)+ψ(1ε0r)g(πD)πD,\displaystyle=\frac{1}{{\varepsilon}_{0}}\psi^{\prime}\left(\frac{1}{{\varepsilon}_{0}}r\right)\nabla rg({\pi_{\partial D}})+\psi\left(\frac{1}{{\varepsilon}_{0}}r\right)\nabla g({\pi_{\partial D}})\nabla{\pi_{\partial D}},
respectively
ΔG\displaystyle\Delta G =(ψ′′|r|2+ψΔr)g(πD)+2ε0(ψr)g(πD)πD\displaystyle=\left(\psi^{\prime\prime}|\nabla r|^{2}+\psi^{\prime}\Delta r\right)g({\pi_{\partial D}})+\frac{2}{{\varepsilon}_{0}}(\psi^{\prime}\nabla r)\nabla g({\pi_{\partial D}})\nabla{\pi_{\partial D}}
+ψ(r)(Δg(πD)|πD|2+g(πD)ΔπD).\displaystyle\phantom{(\psi^{\prime\prime}|\nabla r|^{2}}\quad\;\;+\psi(r)\left(\Delta g({\pi_{\partial D}})|\nabla{\pi_{\partial D}}|^{2}+\nabla g({\pi_{\partial D}})\Delta{\pi_{\partial D}}\right).

Taking into account that |ψ|,|ψ|,|ψ′′|1|\psi|_{\infty},|\psi^{\prime}|_{\infty},|\psi^{\prime\prime}|_{\infty}\leq 1 and |r|1|\nabla r|\leq 1 we have:

|G|\displaystyle|\nabla G|_{\infty} 1ε0|g|+|g||πD|,\displaystyle\leq\frac{1}{{\varepsilon}_{0}}|g|_{\infty}+|\nabla g|_{\infty}|\nabla{\pi_{\partial D}}|_{\infty},
|ΔG|\displaystyle|\Delta G|_{\infty} (1+|Δr|)|g|+2ϵ0|g||πD|+|Δg||πD|2+|g||ΔπD|.\displaystyle\leq\left(1+|\Delta r|_{\infty}\right)|g|_{\infty}+\frac{2}{\epsilon_{0}}|\nabla g|_{\infty}|\nabla{\pi_{\partial D}}|_{\infty}+|\Delta g|_{\infty}|\nabla{\pi_{\partial D}}|_{\infty}^{2}+|\nabla g|_{\infty}|\Delta{\pi_{\partial D}}|_{\infty}.

For any point PDP\in\partial D let ν(P)\nu(P) and TPT_{P} denote respectively the unit exterior normal to D\partial D at the point PP and the tangent hyperspace to D\partial D at PP. By a rotation of coordinates we can assume that the PdP_{d} coordinate lies in the direction ν(P)\nu(P). In some neighbourhood 𝒩=𝒩(P)\mathcal{N}=\mathcal{N}(P) of PP, D\partial D is given by Pd=φ(P)P_{d}=\varphi(P^{\prime}) where P=(P1,,Pd1)P^{\prime}=(P_{1},\dots,P_{d-1}), φC3(TP𝒩)\varphi\in C^{3}(T_{P}\cap\mathcal{N}) and Dφ(P)=0D\varphi(P^{\prime})=0. The eigenvalues of the matrix [2φ(P)][\nabla^{2}\varphi(P^{\prime})] denoted {k1,,kd1}\{k_{1},\dots,k_{d-1}\} are then the principal curvatures of D\partial D at PP . By a further rotation of coordinates we can assume that the P1,,Pd1P_{1},\dots,P_{d-1} axes lie along principal directions corresponding to k1,,kd1k_{1},\dots,k_{d-1} at PP.

The Hessian matrix [D2φ(P)][D^{2}\varphi(P^{\prime})] with respect to the principal coordinate system at PP described above is given by

[D2φ(P)]=diag[k1,,kd1].[D^{2}\varphi(P^{\prime})]=\textrm{diag}[k_{1},\dots,k_{d-1}].

As noted in the proof of Lemma 14.16 in [19], the maximal radius of the interior ball that can be associated to each point on the boundary is bounded from below by a certain \mu>0, and \mu^{-1} bounds the principal curvatures; hence our choice of \mu as \epsilon_{0}.

The unit exterior normal vector ν^(P):=ν(P)\hat{\nu}(P^{\prime}):=\nu(P) at a point P=(P,φ(P))𝒩DP=(P^{\prime},\varphi(P^{\prime}))\in\mathcal{N}\cap\partial D is given by

νi(P)=Diφ(P)1+|Dφ(P)|2,i=1,,d1,νd(P)=11+|Dφ(P)|2.\nu_{i}(P)=\frac{D_{i}\varphi(P^{\prime})}{\sqrt{1+|D\varphi(P^{\prime})|^{2}}},\,i=1,\dots,d-1,\nu_{d}(P)=\frac{1}{\sqrt{1+|D\varphi(P^{\prime})|^{2}}}.

Hence with respect to the principal coordinate system at PP we have:

ν^ixj(P)=kiδij,i,j=1,,d1.\frac{\partial\hat{\nu}_{i}}{\partial x_{j}}(P^{\prime})=k_{i}\delta_{ij},i,j=1,\dots,d-1. (4.18)

We note that for each xDϵ0x\in{D_{\epsilon_{0}}} there exists a unique πD(x)D{\pi_{\partial D}}(x)\in\partial D such that |πD(x)x|=r(x)|{\pi_{\partial D}}(x)-x|=r(x). We have:

x=πD(x)+ν(πD(x))r(x).x={\pi_{\partial D}}(x)+\nu({\pi_{\partial D}}(x))r(x). (4.19)

As pointed out in the proof of Lemma 14.6 in [19], we have {\pi_{\partial D}}\in C^{2}({D_{\epsilon_{0}}}) and r\in C^{3}({D_{\epsilon_{0}}}). Differentiating the i-th coordinate of (4.19) with respect to x_{j} we get:

δij=(πD)ixj+lνiyl(πD)lxjr+νirxj\delta_{ij}=\frac{\partial({\pi_{\partial D}})_{i}}{\partial x_{j}}+\sum_{l}\frac{\partial\nu_{i}}{\partial y_{l}}\frac{\partial({\pi_{\partial D}})_{l}}{\partial x_{j}}r+\nu_{i}\frac{\partial r}{\partial x_{j}} (4.20)

Furthermore, differentiating (4.20) with respect to xkx_{k} we get:

0=\displaystyle 0= 2πDixjxk+l,m2νiylymπDmxkπDlxjr+lνiyl2πDlxjxkr+\displaystyle\frac{\partial^{2}{{\pi_{\partial D}}}_{i}}{\partial x_{j}\partial x_{k}}+\sum_{l,m}\frac{\partial^{2}\nu_{i}}{\partial y_{l}\partial y_{m}}\frac{\partial{{\pi_{\partial D}}}_{m}}{\partial x_{k}}\frac{\partial{{\pi_{\partial D}}}_{l}}{\partial x_{j}}r+\sum_{l}\frac{\partial\nu_{i}}{\partial y_{l}}\frac{\partial^{2}{\pi_{\partial D}}_{l}}{\partial x_{j}\partial x_{k}}r+
+lνiylπDlxjrxk+lνiylπDlxkrxj+νi2rxjxk.\displaystyle+\sum_{l}\frac{\partial\nu_{i}}{\partial y_{l}}\frac{\partial{{\pi_{\partial D}}}_{l}}{\partial x_{j}}\frac{\partial r}{\partial x_{k}}+\sum_{l}\frac{\partial\nu_{i}}{\partial y_{l}}\frac{\partial{{\pi_{\partial D}}}_{l}}{\partial x_{k}}\frac{\partial r}{\partial x_{j}}+\nu_{i}\frac{\partial^{2}r}{\partial x_{j}\partial x_{k}}. (4.21)

As noted in the proof of Lemma 14.17 in [19], we have that, in terms of a principal coordinate system at {\pi_{\partial D}}(x) as chosen before,

r(x)=ν(πD(x))\nabla r(x)=\nu({\pi_{\partial D}}(x)) (4.22)

and,

2r=diag(k11+k1r,,kd11+kd1r,0),\nabla^{2}r=\textrm{diag}(\frac{k_{1}}{1+k_{1}r},\dots,\frac{k_{d-1}}{1+k_{d-1}r},0), (4.23)

hence

|r|1,|2r|maxxDkd1(x)=ε01.|\nabla r|_{\infty}\leq 1,|\nabla^{2}r|_{\infty}\leq\max_{x\in D}k_{d-1}(x)={\varepsilon}_{0}^{-1}.

Using (4.22) in (4.18) implies:

(πD)ixj=(δijνiνj)11+kir\frac{\partial({\pi_{\partial D}})_{i}}{\partial x_{j}}=(\delta_{ij}-\nu_{i}\nu_{j})\frac{1}{1+k_{i}r} (4.24)

hence

|πD|2.|\nabla{\pi_{\partial D}}|_{\infty}\leq 2.

Furthermore, using (4.22) and (4.23) in (4.21) we can bound:

|2πD||2ν|4ε0+4+ε01,|\nabla^{2}{\pi_{\partial D}}|_{\infty}\leq|\nabla^{2}\nu|_{\infty}4{\varepsilon}_{0}+4+{\varepsilon}_{0}^{-1},

where we estimated |r|_{\infty}\leq{\varepsilon}_{0} because, due to the cut-off function \psi, we only need to estimate r and {\pi_{\partial D}} in the {\varepsilon}_{0}-neighbourhood of the boundary. ∎

Proof of 3.6.

First note that if ϕ=WLσσW1\phi=W^{L}\circ\sigma\circ\cdots\sigma W^{1}, where WiW^{i} is of the form Wi(z)=Aiz+biW^{i}(z)=A^{i}z+b^{i}, then using the relation x=σ(x)σ(x)x=\sigma(x)-\sigma(-x) we have

ϕ(x)v=(vv)σ(WLWL)σσW1x,xd.\phi(x)v=\begin{pmatrix}v&v\end{pmatrix}\sigma\circ\begin{pmatrix}W^{L}\\ -W^{L}\end{pmatrix}\circ\sigma\circ\cdots\sigma\circ W^{1}x,\quad x\in\mathbb{R}^{d}.

The assertions follow now from 3.4 and 3.2. ∎

Proof of 3.8.

We first note that

|1ε0r1ε0ϕr|δrε0and|ψ(1ε0r)ϕψ(1ε0r)|δψ.\left|\frac{1}{{\varepsilon}_{0}}r-\frac{1}{{\varepsilon}_{0}}\phi_{r}\right|_{\infty}\leq\frac{\delta_{r}}{{\varepsilon}_{0}}\quad\mbox{and}\quad\left|\psi\left(\frac{1}{{\varepsilon}_{0}}r\right)-\phi_{\psi}\left(\frac{1}{{\varepsilon}_{0}}r\right)\right|_{\infty}\leq\delta_{\psi}.

Therefore, by the triangle inequality we get

|ϕψ(rε0)(x)ϕψ(ϕrε0)(x)|\displaystyle\left|\phi_{\psi}\left(\frac{r}{{\varepsilon}_{0}}\right)(x)-\phi_{\psi}\left(\frac{\phi_{r}}{{\varepsilon}_{0}}\right)(x)\right| 2δψ+|ψ||1ε0r(x)1ε0ϕr(x)|2δψ+|ψ|δrε0,\displaystyle\leq 2\delta_{\psi}+|\psi^{\prime}|_{\infty}\left|\frac{1}{{\varepsilon}_{0}}r(x)-\frac{1}{{\varepsilon}_{0}}\phi_{r}(x)\right|\leq 2\delta_{\psi}+|\psi^{\prime}|_{\infty}\frac{\delta_{r}}{{\varepsilon}_{0}},
hence
\displaystyle\left|\psi\left(\frac{1}{{\varepsilon}_{0}}r\right)-\phi_{\psi}\left(\frac{\phi_{r}}{{\varepsilon}_{0}}\right)\right|_{\infty} \displaystyle\leq 3\delta_{\psi}+\frac{\delta_{r}}{{\varepsilon}_{0}}.
Reasoning analogously we get
|g(πD)ϕgϕπ|\displaystyle|g({\pi_{\partial D}})-\phi_{g}\circ\phi_{\pi}|_{\infty} 3δg+|g|δπ,\displaystyle\leq 3\delta_{g}+|\nabla g|_{\infty}\delta_{\pi},
so again by the triangle inequality
\displaystyle\left|\psi\left(\frac{1}{{\varepsilon}_{0}}r\right)g({\pi_{\partial D}})-\phi_{\psi}\left(\frac{\phi_{r}}{{\varepsilon}_{0}}\right)\phi_{g}\circ\phi_{\pi}\right|_{\infty} \displaystyle\leq\left(3\delta_{\psi}+\frac{\delta_{r}}{{\varepsilon}_{0}}\right)|g|_{\infty}+\left(3\delta_{g}+|\nabla g|_{\infty}\delta_{\pi}\right)(\delta_{\psi}+1).

Let now \Pi be the ReLU DNN given in 3.5 with \epsilon_{p}:=\left(3\delta_{\psi}+\frac{\delta_{r}}{{\varepsilon}_{0}}\right)|g|_{\infty}+\left(3\delta_{g}+|\nabla g|_{\infty}\delta_{\pi}\right)(\delta_{\psi}+1)=\overline{\delta}/2, and c:=|g|_{\infty}, so that

|GΠ(ϕψ(ϕrε0),ϕgϕπ)|δ¯.\left|G-\Pi\left(\phi_{\psi}\left(\frac{\phi_{r}}{{\varepsilon}_{0}}\right),\phi_{g}\circ\phi_{\pi}\right)\right|_{\infty}\leq\overline{\delta}.

Now, by taking ϕG:=Π(ϕψ(ϕrε0),ϕgϕπ)\phi_{G}:=\Pi\left(\phi_{\psi}\left(\frac{\phi_{r}}{{\varepsilon}_{0}}\right),\phi_{g}\circ\phi_{\pi}\right), the statement follows directly from 3.5 and 3.3. ∎

Proof of 3.10.

i) The fact that u~MN()(ω)\widetilde{u}_{M}^{N}(\cdot)(\omega) can be realized, for each ωΩ\omega\in\Omega, as a ReLU DNN, denoted in the statement by 𝕌MN(ω,)\mathbb{U}_{M}^{N}(\omega,\cdot), is a direct consequence of 3.9, 3.5, 3.3, and 3.2.

Next, let uMNu_{M}^{N} be given by (2.29) and consider the modification of uMNu_{M}^{N} by replacing gg and ff with ϕg\phi_{g} and ϕf\phi_{f}, namely

u^MN(x):=1Ni=1N[ϕg(XMx,i)+1dk=1Mr~2(Xk1x,i)ϕf(Xk1x,i+r~(Xk1x,i)Yi)],xD.\widehat{u}_{M}^{N}(x):=\frac{1}{N}\sum_{i=1}^{N}\left[\phi_{g}(X^{x,i}_{M})+\frac{1}{d}\sum\limits_{k=1}^{M}\widetilde{r}^{2}(X^{x,i}_{k-1})\phi_{f}\left(X^{x,i}_{k-1}+\widetilde{r}(X^{x,i}_{k-1})Y^{i}\right)\right],x\in D.

Then using assumption a.4)a.4) we can easily deduce that |u^MNu~MN|Mϵpd(1+2|f|)γ/6|\widehat{u}_{M}^{N}-\widetilde{u}_{M}^{N}|_{\infty}\leq\frac{M\epsilon_{p}}{d}(1+2|f|_{\infty})\leq\gamma/6. Therefore,

\displaystyle\sup\limits_{x\in D}\left|u(x)-\mathbb{U}_{M}^{N}(\cdot,x)\right| \displaystyle=|u-\widetilde{u}_{M}^{N}|_{\infty}\leq|u-u_{M}^{N}|_{\infty}+|u_{M}^{N}-\widehat{u}_{M}^{N}|_{\infty}+|\widehat{u}_{M}^{N}-\widetilde{u}_{M}^{N}|_{\infty}
|uuMN|+ϵg+Mdiam(D)2ϵfd+γ/6\displaystyle\leq|u-u_{M}^{N}|_{\infty}+\epsilon_{g}+\frac{M{\rm diam}(D)^{2}\epsilon_{f}}{d}+\gamma/6
|uuMN|+γ/2,\displaystyle\leq|u-u_{M}^{N}|_{\infty}+\gamma/2,

where the last inequality follows from assumptions a.1) and a.2). Consequently,

(supxD|u(x)𝕌MN(,x)|γ)\displaystyle\mathbb{P}\left(\sup_{x\in D}\left|u(x)-\mathbb{U}_{M}^{N}(\cdot,x)\right|\geq\gamma\right) (|uuMN|γ/2),\displaystyle\leq\mathbb{P}\left(|u-u_{M}^{N}|_{\infty}\geq\gamma/2\right),

so we can employ 2.28 to conclude the first assertion.

Let us now prove ii). First, note that

(r~)=(ϕr)+1,𝒲(r~)=𝒲(ϕr),size(r~)size(ϕr)+2.\mathcal{L}(\widetilde{r})=\mathcal{L}(\phi_{r})+1,\;\mathcal{W}(\widetilde{r})=\mathcal{W}(\phi_{r}),\;{\rm size}(\widetilde{r})\leq{\rm size}(\phi_{r})+2.

Then, by 3.3 and 3.9 we have

\displaystyle\mathcal{L}(\phi_{g}(X^{\cdot,i}_{M}))=\mathcal{L}(\phi_{g})+\mathcal{L}(X^{\cdot,i}_{M})=\mathcal{L}(\phi_{g})+M(\mathcal{L}(\widetilde{r})+1)+1=\mathcal{L}(\phi_{g})+M(\mathcal{L}(\phi_{r})+2)+1,
𝒲(ϕg(XM,i))\displaystyle\mathcal{W}(\phi_{g}(X^{\cdot,i}_{M})) max(𝒲(ϕg),𝒲(XM,i),2d)max(𝒲(ϕg),2d+max(d,𝒲(ϕr))),\displaystyle\leq\max(\mathcal{W}(\phi_{g}),\mathcal{W}(X^{\cdot,i}_{M}),2d)\leq\max(\mathcal{W}(\phi_{g}),2d+\max(d,\mathcal{W}(\phi_{r}))),
size(ϕg(XM,i))\displaystyle{\rm size}(\phi_{g}(X^{\cdot,i}_{M})) 2size(ϕg)+2size(XM,i)2size(ϕg)+4dM[4d+𝒲(r~)+(r~)+2]+d+2Msize(r~)\displaystyle\leq 2{\rm size}(\phi_{g})+2{\rm size}(X^{\cdot,i}_{M})\leq 2{\rm size}(\phi_{g})+4dM[4d+\mathcal{W}(\widetilde{r})+\mathcal{L}(\widetilde{r})+2]+d+2M{\rm size}(\widetilde{r})
2size(ϕg)+4dM[4d+𝒲(ϕr)+(ϕr)+3]+2d+4M[size(ϕr)+2]\displaystyle\leq 2{\rm size}(\phi_{g})+4dM[4d+\mathcal{W}(\phi_{r})+\mathcal{L}(\phi_{r})+3]+2d+4M[{\rm size}(\phi_{r})+2]
𝒪(size(ϕg)+Msize(ϕr)+Mdmax(d,𝒲(ϕr),(ϕr))).\displaystyle\in\mathcal{O}({\rm size}(\phi_{g})+M{\rm size}(\phi_{r})+Md\max(d,\mathcal{W}(\phi_{r}),\mathcal{L}(\phi_{r}))). (4.25)

Further, for each k0k\geq 0, by 3.3 we get

size(ϕf(Xk,i+r~(Xk,i)Yi))\displaystyle{\rm size}\left(\phi_{f}\left(X^{\cdot,i}_{k}+\widetilde{r}(X^{\cdot,i}_{k})Y^{i}\right)\right) 2size(ϕf)+2size(Xk,i+r~(Xk,i)Yi)\displaystyle\leq 2{\rm size}\left(\phi_{f}\right)+2{\rm size}\left(X^{\cdot,i}_{k}+\widetilde{r}(X^{\cdot,i}_{k})Y^{i}\right)
and since Xk,i+r~(Xk,i)YiX^{\cdot,i}_{k}+\widetilde{r}(X^{\cdot,i}_{k})Y^{i} has the same size as Xk+1,iX^{\cdot,i}_{k+1}, we can continue with
=2size(ϕf)+2size(Xk+1,i)\displaystyle=2{\rm size}\left(\phi_{f}\right)+2{\rm size}\left(X^{\cdot,i}_{k+1}\right)
2size(ϕf)+4d(k+1)[4d+𝒲(ϕr)+(ϕr)+3]+2d+4(k+1)[size(ϕr)+2].\displaystyle\leq 2{\rm size}(\phi_{f})+4d(k+1)[4d+\mathcal{W}(\phi_{r})+\mathcal{L}(\phi_{r})+3]+2d+4(k+1)[{\rm size}(\phi_{r})+2]. (4.26)

The next step is to use 3.5 to get

size(Π(r~(Xk,i),r~(Xk,i)))\displaystyle{\rm size}\left(\Pi\left(\widetilde{r}(X^{\cdot,i}_{k}),\widetilde{r}(X^{\cdot,i}_{k})\right)\right) 8size(r~(Xk,i))+𝒪(log(ϵp1)+log(c))\displaystyle\leq 8{\rm size}(\widetilde{r}(X^{\cdot,i}_{k}))+\mathcal{O}(\lceil\log(\epsilon_{p}^{-1})+\log(c)\rceil)
16size(r~)+16size(Xk,i)+𝒪(log(ϵp1)+log(c))\displaystyle\leq 16{\rm size}(\widetilde{r})+16{\rm size}(X^{\cdot,i}_{k})+\mathcal{O}(\lceil\log(\epsilon_{p}^{-1})+\log(c)\rceil)
16size(ϕr)+32dk[4d+𝒲(ϕr)+(ϕr)+3]+16d+32k[size(ϕr)+2]\displaystyle\leq 16{\rm size}(\phi_{r})+32dk[4d+\mathcal{W}(\phi_{r})+\mathcal{L}(\phi_{r})+3]+16d+32k[{\rm size}(\phi_{r})+2]
+𝒪(log(ϵp1)+log(c)),\displaystyle\phantom{\leq 16{\rm size}(\phi_{r})\;}+\mathcal{O}(\lceil\log(\epsilon_{p}^{-1})+\log(c)\rceil), (4.27)

so that by (4.26) and (4.27) together with 3.5 we obtain

size\displaystyle{\rm size} (Π(Π(r~(Xk,i),r~(Xk,i)),ϕf(Xk,i+r~(Xk,i)Yi)))\displaystyle\left(\Pi\left(\Pi\left(\widetilde{r}(X^{\cdot,i}_{k}),\widetilde{r}(X^{\cdot,i}_{k})\right),\phi_{f}\left(X^{\cdot,i}_{k}+\widetilde{r}(X^{\cdot,i}_{k})Y^{i}\right)\right)\right)
4size(Π(r~(Xk,i),r~(Xk,i)))+4size(ϕf(Xk,i+r~(Xk,i)Yi))+𝒪(log(ϵp1)+log(c))\displaystyle\leq 4{\rm size}\left(\Pi\left(\widetilde{r}(X^{\cdot,i}_{k}),\widetilde{r}(X^{\cdot,i}_{k})\right)\right)+4{\rm size}\left(\phi_{f}\left(X^{\cdot,i}_{k}+\widetilde{r}(X^{\cdot,i}_{k})Y^{i}\right)\right)+\mathcal{O}(\lceil\log(\epsilon_{p}^{-1})+\log(c)\rceil)
24size(ϕr)+128dk[4d+𝒲(ϕr)+(ϕr)+3]+64d+128k[size(ϕr)+2]\displaystyle\leq 24{\rm size}(\phi_{r})+128dk[4d+\mathcal{W}(\phi_{r})+\mathcal{L}(\phi_{r})+3]+64d+128k[{\rm size}(\phi_{r})+2]
+8size(ϕf)+16d(k+1)[4d+𝒲(ϕr)+(ϕr)+3]+8d+16(k+1)[size(ϕr)+2]\displaystyle\phantom{\leq 24{\rm size}(\phi_{r})\;}+8{\rm size}(\phi_{f})+16d(k+1)[4d+\mathcal{W}(\phi_{r})+\mathcal{L}(\phi_{r})+3]+8d+16(k+1)[{\rm size}(\phi_{r})+2] (4.28)
+𝒪(log(ϵp1)+log(c))\displaystyle\phantom{\leq 24{\rm size}(\phi_{r})\;}+\mathcal{O}(\lceil\log(\epsilon_{p}^{-1})+\log(c)\rceil)
𝒪(dkmax(d,𝒲(ϕr),(ϕr))+ksize(ϕr)+size(ϕf)+log(ϵp1)+log(c)).\displaystyle\in\mathcal{O}\left(dk\max(d,\mathcal{W}(\phi_{r}),\mathcal{L}(\phi_{r}))+k{\rm size}(\phi_{r})+{\rm size}(\phi_{f})+\lceil\log(\epsilon_{p}^{-1})+\log(c)\rceil\right). (4.29)

Finally, combining (4.25) with (4.29) and applying 3.2, we obtain that for each \omega\in\Omega

size\displaystyle{\rm size} (𝕌MN(ω,))\displaystyle(\mathbb{U}_{M}^{N}(\omega,\cdot))
𝒪(MN[dMmax(d,𝒲(ϕr),(ϕr))+Msize(ϕr)+size(ϕf)+size(ϕg)+log(ϵp1)+log(c)]),\displaystyle\in\mathcal{O}\left(MN\left[dM\max(d,\mathcal{W}(\phi_{r}),\mathcal{L}(\phi_{r}))+M{\rm size}(\phi_{r})+{\rm size}(\phi_{f})+{\rm size}(\phi_{g})+\lceil\log(\epsilon_{p}^{-1})+\log(c)\rceil\right]\right),

and by assumption a.4)a.4) we deduce that

size(𝕌MN(ω,))𝒪(MN[dMmax(d,𝒲(ϕr),(ϕr))+Msize(ϕr)+size(ϕf)+log(1γd)]),{\rm size}(\mathbb{U}_{M}^{N}(\omega,\cdot))\in\mathcal{O}\left(MN\left[dM\max(d,\mathcal{W}(\phi_{r}),\mathcal{L}(\phi_{r}))+M{\rm size}(\phi_{r})+{\rm size}(\phi_{f})+\log\left(\frac{1}{\gamma d}\right)\right]\right),

where the implicit constant depends on \max({\rm diam}(D),|f|_{\infty}).

In particular,

M\displaystyle M 𝒪(d2γ4/αlog(1γ)),\displaystyle\in\mathcal{O}\left(d^{2}\gamma^{-4/\alpha}\log\left(\frac{1}{\gamma}\right)\right),
N\displaystyle N 𝒪(d2γ8/α2log2(1γ)[d3γ4/αlog(1γ)+log(1η)]),\displaystyle\in\mathcal{O}\left(d^{2}\gamma^{-8/\alpha-2}\log^{2}\left(\frac{1}{\gamma}\right)\left[d^{3}\gamma^{-4/\alpha}\log\left(\frac{1}{\gamma}\right)+\log\left(\frac{1}{\eta}\right)\right]\right),
size(𝕌MN(ω,))\displaystyle{\rm size}(\mathbb{U}_{M}^{N}(\omega,\cdot)) 𝒪(d7γ16/α4log4(1γ)[d3γ4/αlog(1γ)+log(1η)]S),\displaystyle\in\mathcal{O}\left(d^{7}\gamma^{-16/\alpha-4}\log^{4}\left(\frac{1}{\gamma}\right)\left[d^{3}\gamma^{-4/\alpha}\log\left(\frac{1}{\gamma}\right)+\log\left(\frac{1}{\eta}\right)\right]{\rm S}\right),

where

S:=[max(d,𝒲(ϕr),(ϕr))+size(ϕr)+size(ϕg)+size(ϕf)]{\rm S}:=\left[\max(d,\mathcal{W}(\phi_{r}),\mathcal{L}(\phi_{r}))+{\rm size}(\phi_{r})+{\rm size}(\phi_{g})+{\rm size}(\phi_{f})\right]

and the implicit constant depends on |g|_{\alpha},|g|_{\infty},|f|_{\infty},{\rm diam}(D),{\rm adiam}(D),\delta,\alpha,\log(2+|\phi_{r}|_{1}).

Now, if DD is δ\delta-defective convex, then

M\displaystyle M 𝒪(dlog(dγ)),\displaystyle\in\mathcal{O}\left(d\log\left(\frac{d}{\gamma}\right)\right),
N\displaystyle N 𝒪(1γ2log2(dγ)[d2log(dγ)log(2+|ϕr|1)+log(1η)]),\displaystyle\in\mathcal{O}\left(\frac{1}{\gamma^{2}}\log^{2}\left(\frac{d}{\gamma}\right)\left[d^{2}\log\left(\frac{d}{\gamma}\right)\log(2+|\phi_{r}|_{1})+\log\left(\frac{1}{\eta}\right)\right]\right),
size(𝕌MN(ω,))\displaystyle{\rm size}(\mathbb{U}_{M}^{N}(\omega,\cdot)) 𝒪(d3γ2log4(dγ)[d2log(dγ)+log(1η)]S),\displaystyle\in\mathcal{O}\left(\frac{d^{3}}{\gamma^{2}}\log^{4}\left(\frac{d}{\gamma}\right)\left[d^{2}\log\left(\frac{d}{\gamma}\right)+\log\left(\frac{1}{\eta}\right)\right]{\rm S}\right),

where {\rm S} is as above and the implicit constant depends on |g|_{\alpha},|g|_{\infty},|f|_{\infty},{\rm diam}(D),{\rm adiam}(D),\delta,\alpha,\log(2+|\phi_{r}|_{1}). ∎

5 Numerical results

Throughout this section we numerically test some of the key theoretical results obtained in this paper. In order to take advantage of the highly parallelizable nature of the proposed Monte Carlo methods, we have implemented the algorithms using GPU parallel computing within the PyTorch framework.

Concretely, we consider two types of domains, namely:

D𝖼:=[1,1]d,d1 and D𝖺𝖼:=[1,1]d{xd:|x|1:=|x1|++|xd|0.5},D_{\sf c}:=[-1,1]^{d},d\geq 1\quad\mbox{ and }\quad D_{\sf ac}:=[-1,1]^{d}\setminus\left\{x\in\mathbb{R}^{d}:|x|_{1}:=|x_{1}|+\dots+|x_{d}|\leq 0.5\right\},

which are illustrated by Figure 2 for the case d=2d=2.

Figure 2: D_{\sf c} and D_{\sf ac} for d=2; (a) D_{\sf c}=[-1,1]^{2}, (b) D_{\sf ac}=[-1,1]^{2}\setminus\left\{x\in\mathbb{R}^{2}:|x_{1}|+|x_{2}|\leq 0.5\right\}.
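Both test domains admit an explicitly computable distance to the boundary, which is what the WoS algorithm uses in the tests below. As a hedged illustration (the helper names are ours and are not taken from the paper's code), a vectorized PyTorch implementation for a batch x of shape (N,d) may look as follows; the distance to the removed \ell^{1}-ball is obtained from the standard sorting-based Euclidean projection onto \{y:|y|_{1}\leq 0.5\}:

import torch

def dist_cube(x):
    # exact distance from x in [-1,1]^d to the boundary of the cube: 1 - max_i |x_i|
    return 1.0 - x.abs().max(dim=1).values

def dist_l1_ball(x, c=0.5):
    # Euclidean distance from x to {|y|_1 <= c}, via projection onto the l1-ball
    u, _ = x.abs().sort(dim=1, descending=True)
    css = u.cumsum(dim=1) - c
    j = torch.arange(1, x.shape[1] + 1, device=x.device)
    rho = (u * j > css).sum(dim=1)                     # number of active coordinates (>= 1)
    theta = (css.gather(1, rho.unsqueeze(1) - 1).squeeze(1) / rho).clamp(min=0.0)
    proj = torch.sign(x) * (x.abs() - theta.unsqueeze(1)).clamp(min=0.0)
    return (x - proj).norm(dim=1)

def dist_Dac(x, c=0.5):
    # distance to the boundary of D_ac = [-1,1]^d \ {|y|_1 <= c}
    return torch.minimum(dist_cube(x), dist_l1_ball(x, c))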

We have performed the following numerical tests:

Test 1. Here we numerically verify the estimate (2.18), namely that

(r(XMx)ε)(1β2(1δ)4d)Mr(x)ε,xD,\mathbb{P}(r(X^{x}_{M})\geq\varepsilon)\leq\left(1-\frac{\beta^{2}(1-\delta)}{4d}\right)^{M}\sqrt{\frac{r(x)}{\varepsilon}},\quad x\in D, (5.1)

by fixing \varepsilon and varying d and M for both D_{\sf c} and D_{\sf ac}. Note that D_{\sf c} is 0-defective convex since it is in fact convex, so (5.1) is valid with \delta=0 by 2.16. Furthermore, for this test the WoS is constructed with the exact distance to the boundary, so we shall take \beta=1. On the other hand, D=D_{\sf ac} is not defective convex: if we denote the boundary corner (0.5,0,0,\dots,0)\in\partial D_{\sf ac} by x_{0}, then \Delta r(x)=(d-1)|x-x_{0}|^{-1} is unbounded on the cone \left\{(r,x)\in D_{\sf ac}:r\in(0.5,0.75),|x|\leq r\right\}, hence (2.15) cannot hold. Nevertheless, for \varepsilon fixed we speculated in 2.17 that an asymptotic behavior similar to (5.1) still holds with respect to M and d; this is numerically tested in Figure 5(b) below.

Let us introduce the notations

𝖴𝖻𝗈𝗎𝗇𝖽(d,M,x,ε):=(114d)Mr(x)ε,\displaystyle{\sf U}_{\sf bound}(d,M,x,\varepsilon):=\left(1-\frac{1}{4d}\right)^{M}\sqrt{\frac{r(x)}{\varepsilon}},
N(d,M,x,ε):=1N1iN1[rε](XMx,i), where XMx,1,,XMx,NXMx are independent,N1,\displaystyle\mathbb{P}_{N}(d,M,x,\varepsilon):=\frac{1}{N}\sum\limits_{1\leq i\leq N}1_{\left[r\geq\varepsilon\right]}(X_{M}^{x,i}),\quad\mbox{ where }X_{M}^{x,1},\cdots,X_{M}^{x,N}\sim X_{M}^{x}\mbox{ are independent},N\geq 1,

so that the inequality to be tested becomes

N(d,M,x,ε)(r(XMx)ε)𝖴𝖻𝗈𝗎𝗇𝖽(d,M,x,ε) for N sufficiently large.\mathbb{P}_{N}(d,M,x,\varepsilon)\approx\mathbb{P}(r(X^{x}_{M})\geq\varepsilon)\leq{\sf U}_{\sf bound}(d,M,x,\varepsilon)\quad\mbox{ for }N\mbox{ sufficiently large}. (5.2)
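As a hedged sketch (the function names are ours, not the paper's code), the two quantities in (5.2) can be simulated on the cube D_{\sf c} (so \beta=1 and \delta=0) with a few lines of PyTorch:

import torch

def dist_cube(x):                                  # exact distance to the boundary of [-1,1]^d
    return 1.0 - x.abs().max(dim=1).values

def wos_chain(x, M):                               # M vectorized walk-on-spheres steps
    for _ in range(M):
        y = torch.randn_like(x)
        y = y / y.norm(dim=1, keepdim=True)        # uniform direction on the unit sphere
        x = x + dist_cube(x).unsqueeze(1) * y
    return x

def test1(x0, M, N, eps=1e-3):
    d = x0.numel()
    xM = wos_chain(x0.repeat(N, 1), M)             # N independent WoS chains started at x0
    p_hat = (dist_cube(xM) >= eps).float().mean().item()
    u_bound = (1.0 - 1.0 / (4 * d)) ** M * (dist_cube(x0.unsqueeze(0)).item() / eps) ** 0.5
    return p_hat, u_bound                          # empirical P_N vs the bound U_bound

For instance, test1(torch.full((20,), 0.5 / 20 ** 0.5), M=600, N=5 * 10 ** 4) would produce one sample point of the comparison in Figure 3 (here d=20 and |x_{0}|=0.5).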

We obtained the following results:

Figure 3: {\sf U}_{\sf bound}(d,\cdot,x_{0},\varepsilon) (red) vs \mathbb{P}_{N}(d,\cdot,x_{0},\varepsilon) (black) for D=D_{\sf c}, d=20, \varepsilon=10^{-3}, N=5\times 10^{4}, whilst x_{0} is arbitrarily chosen in D such that |x_{0}|=0.5; (a) M=5\times i, i\in\{80,\dots,200\}; (b) M=5\times i, i\in\{120,\dots,200\}.

Figure 4: {\sf U}_{\sf bound}(\cdot,\cdot,x_{0},\varepsilon) (red) vs \mathbb{P}_{N}(\cdot,\cdot,x_{0},\varepsilon) (black) for D=D_{\sf c}, \varepsilon=10^{-3}, N=5\times 10^{4}, whilst x_{0} is arbitrarily chosen in D such that |x_{0}|=0.5, for each dimension d; (a) d\in\{2,\dots,30\}, M=300; (b) d\in\{2,\dots,30\}, M=4\times d^{1+0.5}.

Figure 5: {\sf U}_{\sf bound}(\cdot,\cdot,x_{0},\varepsilon) (red) vs \mathbb{P}_{N}(\cdot,\cdot,x_{0},\varepsilon) (black) for D=D_{\sf ac}, \varepsilon=10^{-3}, N=5\times 10^{4}, whilst x_{0} is arbitrarily chosen in D such that |x_{0}|=0.7, for each dimension d; (a) d\in\{2,\dots,30\}, M=300; (b) d\in\{2,\dots,30\}, M=4\times d^{1+0.5}.

Comments on Test 1.

  • The numerical results depicted in Figure 3 and Figure 4 validate the upper bound (5.2). Note that in Figure 3(b), namely for large values of M, inequality (5.2) becomes quite sharp and even appears slightly reversed; this apparent reversal is merely a consequence of the Monte Carlo error in the approximation \mathbb{P}_{N}(d,M,x,\varepsilon)\approx\mathbb{P}(r(X^{x}_{M})\geq\varepsilon) corresponding to N=5\times 10^{4} Monte Carlo samples.

  • The numerical evidence illustrated in Figure 5 shows that, at least for one relevant example of a non-\delta-defective convex domain, namely D_{\sf ac}, with \varepsilon fixed the estimate (5.2) is still in force; in particular, this supports the idea expressed in 2.17.

  • Finally, recall that our main estimates obtained in the previous sections in the case of defective convex domains require M\in\mathcal{O}(d\log(d/\gamma)). This requirement is also suggested by comparing Figure 4(a) with Figure 4(b), as well as Figure 5(a) with Figure 5(b); in Figure 4(b) and Figure 5(b) we have chosen M\in\mathcal{O}(d^{1+0.5}) instead of M\in\mathcal{O}(d\log(d/\gamma)) only because in the latter case it is harder to visualize the numerical results nicely.

Test 2. Here the goal is to test the approximation of the solution uu to (1.1) by simulating its Monte Carlo estimator (1.5). For simplicity we take the source term f=1f=1, hence we deal with

\begin{cases}\frac{1}{2}\Delta u=-1\,\mbox{ in }D\\ u|_{\partial D}=g,\end{cases}\quad\mbox{ where }D=D_{\sf ac}. (5.3)

In order to validate the numerical results, we consider a particular explicit solution to (5.3), namely

u(x)=x12++xk2xk+12xd2x12,d=2k,x=(x1,,xd)d,\displaystyle u(x)=x_{1}^{2}+\dots+x_{k}^{2}-x_{k+1}^{2}-\dots-x_{d}^{2}-x_{1}^{2},\quad d=2k,x=(x_{1},\cdots,x_{d})\in\mathbb{R}^{d}, (5.4)
g=u|D,D=D𝖼 or D𝖺𝖼.\displaystyle g=u|_{\partial D},\quad D=D_{\sf c}\mbox{ or }D_{\sf ac}. (5.5)
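Indeed, since d=2k, a direct computation gives \Delta u=2(k-1)-2(d-k)=-2, hence \frac{1}{2}\Delta u=-1 in D, while g=u|_{\partial D} holds by definition, so u solves (5.3).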

Further, we introduce some notation. For M,N,L,E\geq 1 and W_{i},1\leq i\leq L independent and uniformly distributed on D, we set

𝖤𝗋𝗋M,N(x):=|u(x)uMN(x)|,xD,|𝖤𝗋𝗋M,N|L1(D/|D|):=|D|1D𝖤𝗋𝗋M,N(x)𝑑x\displaystyle{\sf Err}_{M,N}(x):=\left|u(x)-u_{M}^{N}(x)\right|,x\in D,\quad|{\sf Err}_{M,N}|_{L^{1}(D/|D|)}:=|D|^{-1}\int_{D}{\sf Err}_{M,N}(x)\;dx
|𝖤𝗋𝗋M,N,L|L1(D/|D|):=1/L1iL𝖤𝗋𝗋M,N(i)(Wi),|𝖤𝗋𝗋M,N,L|L(D):=max1iL𝖤𝗋𝗋M,N(i)(Wi)\displaystyle|{\sf Err}_{M,N,L}|_{L^{1}(D/|D|)}:=1/L\sum_{1\leq i\leq L}{\sf Err}_{M,N}^{(i)}(W_{i}),\quad|{\sf Err}_{M,N,L}|_{L^{\infty}(D)}:=\max_{1\leq i\leq L}{\sf Err}_{M,N}^{(i)}(W_{i})
where 𝖤𝗋𝗋M,N(i)(),1iL{\sf Err}_{M,N}^{(i)}(\cdot),1\leq i\leq L are iid copies of 𝖤𝗋𝗋M,N(){\sf Err}_{M,N}(\cdot), independent of (Wi)1iL(W_{i})_{1\leq i\leq L},
|𝖤𝗋𝗋M,N,L,E|L(D):=1/E1jE|𝖤𝗋𝗋M,N,L(j)|L(D),\displaystyle|{\sf Err}_{M,N,L,E}|_{L^{\infty}(D)}:=1/E\sum_{1\leq j\leq E}|{\sf Err}_{M,N,L}^{(j)}|_{L^{\infty}(D)},

where |𝖤𝗋𝗋M,N,L(j)|L(D),1jE|{\sf Err}_{M,N,L}^{(j)}|_{L^{\infty}(D)},1\leq j\leq E are independent copies of |𝖤𝗋𝗋M,N,L|L(D)|{\sf Err}_{M,N,L}|_{L^{\infty}(D)}. In particular, by the law of large numbers we immediately have

limL|𝖤𝗋𝗋M,N,L|L1(D/|D|)=𝔼{|𝖤𝗋𝗋M,N|L1(D/|D|)} almost surely,\lim_{L\to\infty}|{\sf Err}_{M,N,L}|_{L^{1}(D/|D|)}=\mathbb{E}\left\{|{\sf Err}_{M,N}|_{L^{1}(D/|D|)}\right\}\mbox{ almost surely}, (5.6)

whilst by an argument similar to that of [5, Lemma 4.3] one can show that

limElimL|𝖤𝗋𝗋M,N,L,E|L(D)=𝔼{supxD|𝖤𝗋𝗋M,N(x)|} almost surely.\lim_{E\to\infty}\lim_{L\to\infty}|{\sf Err}_{M,N,L,E}|_{L^{\infty}(D)}=\mathbb{E}\left\{\sup_{x\in D}|{\sf Err}_{M,N}(x)|\right\}\mbox{ almost surely}. (5.7)

The aim of this test is to numerically validate the approximation u\approx u_{M}^{N} by computing the mean errors \mathbb{E}\left\{|{\sf Err}_{M,N}|_{L^{1}(D/|D|)}\right\} and \mathbb{E}\left\{\sup_{x\in D}|{\sf Err}_{M,N}(x)|\right\}. To this end, justified by (5.6) and (5.7), we shall simulate |{\sf Err}_{M,N,L}|_{L^{1}(D/|D|)}(\omega) and |{\sf Err}_{M,N,L,E}|_{L^{\infty}(D)}(\omega) for L large, e.g. L=1000 or L=2000. In order to reduce the computational burden, the value of E is taken relatively small, e.g. E=5 or E=10. However, we point out that choosing a small value for E does not alter the reliability of the Monte Carlo estimate, since one can show that the distribution of |{\sf Err}_{M,N,L}|_{L^{\infty}(D)} is concentrated.
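For orientation, we also give a hedged sketch of the estimator u_{M}^{N} from (1.5) for this test problem (the helper names are ours, not the paper's code); for brevity it is written for the cube D=D_{\sf c}, the case D=D_{\sf ac} only changing the distance function (cf. the sketch after Figure 2). Since f\equiv 1, the source contribution of each WoS step reduces to \widetilde{r}^{2}(X_{k-1})/d:

import torch

def u_exact(x):                                    # the explicit solution (5.4), d = 2k even
    k = x.shape[-1] // 2
    return (x[..., :k] ** 2).sum(-1) - (x[..., k:] ** 2).sum(-1) - x[..., 0] ** 2

def dist_cube(x):                                  # exact distance to the boundary of [-1,1]^d
    return 1.0 - x.abs().max(dim=1).values

def u_estimator(x0, M, N):
    d = x0.numel()
    x = x0.repeat(N, 1)                            # N independent WoS chains
    source = torch.zeros(N)
    for _ in range(M):
        r = dist_cube(x)
        source += r ** 2 / d                       # per-step source term (1/d) r^2 f with f = 1
        y = torch.randn_like(x)
        y = y / y.norm(dim=1, keepdim=True)
        x = x + r.unsqueeze(1) * y
    return (u_exact(x) + source).mean()            # g(X_M) evaluated as u(X_M), u being defined on all of R^d

x0 = torch.zeros(10); x0[0] = 0.3                  # an interior point of [-1,1]^10
print(abs(u_estimator(x0, M=500, N=10 ** 4) - u_exact(x0)).item())   # Err_{M,N}(x0)

Repeating such evaluations over L uniformly sampled points W_{i} and over E independent runs then yields the empirical errors |{\sf Err}_{M,N,L}|_{L^{1}(D/|D|)} and |{\sf Err}_{M,N,L,E}|_{L^{\infty}(D)} defined above.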

Figure 6: The histograms of {\sf Err}_{M,N}^{(i)},1\leq i\leq L and |{\sf Err}_{M,N,L}^{(j)}|_{L^{\infty}(D)},1\leq j\leq E for D=D_{\sf ac}, d=10, N=10^{5}, M=500; (a) the histogram of {\sf Err}_{M,N}^{(i)}(\omega),1\leq i\leq L=2\times 10^{4}; (b) the histogram of |{\sf Err}_{M,N,L}^{(j)}|_{L^{\infty}(D)},1\leq j\leq E=10, L=2\times 10^{3}.

Figure 7: The evolution of (a) |{\sf Err}_{M,N,L}|_{L^{1}(D/|D|)}(\omega) and (b) |{\sf Err}_{M,N,L,E}|_{L^{\infty}(D)}(\omega) w.r.t. N, for D=D_{\sf ac}, d=100, M=500, L=1000, and E=5.

Figure 8: The evolution of {\sf Err}_{M,N}(x)(\omega) w.r.t. N (with a zoom in panel (b)), for D=D_{\sf ac}, d=100, M=2000, and an arbitrarily fixed location x\in D, at which the exact solution u(x) equals -0.357.

Comments on Test 2.

  • The histograms depicted in Figure 6 for the Poisson problem (5.3) with d=10 and D=D_{\sf ac} confirm that the random variables given by the normalized L^{1}-error |{\sf Err}_{M,N}|_{L^{1}(D/|D|)} and the L^{\infty}-error |{\sf Err}_{M,N,L}|_{L^{\infty}(D)} are small and concentrated if M and N are chosen sufficiently large, as stipulated by the theoretical results 2.26 and 2.28.

  • The numerical tests depicted in Figure 7 for the Poisson problem (5.3) with d=100 and D=D_{\sf ac} confirm that the errors \mathbb{E}\left\{|{\sf Err}_{M,N}|_{L^{1}(D/|D|)}\right\} and \mathbb{E}\left\{\sup_{x\in D}|{\sf Err}_{M,N}(x)|\right\}, approximated through (5.6) and (5.7) respectively, decrease to a small value as the number N of WoS trajectories increases. The limit error attained as N goes to infinity is not zero, since it depends on M, but it decreases to zero as the latter parameter is also increased; this is discussed in the next comment.

  • Furthermore, the results illustrated in Figure 8 for d=100 show that the error {\sf Err}_{M,N}(x) at an arbitrarily chosen location x\in D=D_{\sf ac} becomes much smaller than the errors obtained in Figure 7, as the number of WoS steps M is increased.

  • Concerning the dependence of M and N on d, our numerical tests revealed that, on the one hand, the choice of M required in 2.26 or 2.28 is essentially optimal, and, on the other hand, that N can in fact be taken much smaller than required by the same theoretical results. This numerical evidence suggests that the width of the DNNs provided by 3.10 could be significantly reduced.

  • Finally, let us emphasize that the numerical results of Test 2 are obtained for the domain D_{\sf ac}, which is neither defective convex nor satisfies the uniform exterior ball condition, hence the test turned out to be successful even for a less favourable domain geometry.

Acknowledgements. Lucian Beznea and Oana Lupascu-Stamate were supported by a grant of the Ministry of Research, Innovation and Digitization, CNCS - UEFISCDI, project number PN-III-P4-PCE-2021-0921, within PNCDI III. Iulian Cimpean acknowledges support from the project PN-III-P1-1.1-PD-2019-0780, within PNCDI III. The work of Arghir Zarnescu has been partially supported by the Basque Government through the BERC 2022-2025 program and by the Spanish State Research Agency through BCAM Severo Ochoa excellence accreditation Severo Ochoa CEX2021-00114 and through project PID2020-114189RB-I00 funded by Agencia Estatal de Investigación (PID2020-114189RB-I00 / AEI / 10.13039/501100011033).

References

  • [1] D.H. Armitage and Ü. Kuran. The convexity of a domain and the superharmonicity of the signed distance function. Proceedings of the Amer. Math. Soc., 93(4):598–600, 1985.
  • [2] Marc Arnaudon and Xue-Mei Li. Reflected Brownian motion: selection, approximation and linearization. Electronic Journal of Probability, 22(none):1 – 55, 2017.
  • [3] Vlad Bally and Denis Talay. The law of the Euler scheme for stochastic differential equations: I. Convergence rate of the distribution function. Probability Theory and Related Fields, 104, 07 1994.
  • [4] Fabrice Baudoin. Stochastic analysis on sub-Riemannian manifolds with transverse symmetries. The Annals of Probability, 45(1):56 – 81, 2017.
  • [5] Christian Beck, Sebastian Becker, Philipp Grohs, Nor Jaafari, and Arnulf Jentzen. Solving the Kolmogorov PDE by means of Deep learning. J. Sci. Comput., 88(3), sep 2021.
  • [6] Christian Beck, Martin Hutzenthaler, Arnulf Jentzen, and Benno Kuckuck. An overview on deep learning-based approximation methods for partial differential equations. Discrete and Continuous Dynamical Systems - B, 28(6):3697–3746, 2023.
  • [7] Lucian Beznea and Andrei-George Oprina. Nonlinear PDEs and measure-valued branching type processes. Journal of Mathematical Analysis and Applications, 384(1):16–32, 2011. Special Issue on Stochastic PDEs in Fluid Dynamics, Particle Physics and Statistical Mechanics.
  • [8] I. Binder and M. Braverman. The rate of convergence of the walk on spheres algorithm. Geometric and Functional Analysis, 22(3):558–587, 2012.
  • [9] Mireille Bossy, Nicolas Champagnat, Sylvain Maire, and Denis Talay. Probabilistic interpretation and random walk on spheres algorithms for the Poisson-Boltzmann equation in molecular dynamics. ESAIM: Mathematical Modelling and Numerical Analysis, 44(5):997–1048, 2010.
  • [10] Krzysztof Burdzy, Zhen-Qing Chen, and John Sylvester. The heat equation and reflected Brownian motion in time-dependent domains. The Annals of Probability, 32(1B):775 – 804, 2004.
  • [11] Patrick Cheridito, H. Soner, Nizar Touzi, and Nicolas Victoir. Second order backward stochastic differential equations and fully non-linear parabolic PDEs. 10 2005.
  • [12] G. Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4):303–314, 1989.
  • [13] M. Deaconu, S. Herrmann, and S. Maire. The walk on moving spheres: A new tool for simulating Brownian motion’s exit time from a domain. Mathematics and Computers in Simulation, 135:28–38, 2017. Special Issue: 9th IMACS Seminar on Monte Carlo Methods.
  • [14] Madalina Deaconu and Antoine Lejay. A random walk on rectangles algorithm. Methodology And Computing In Applied Probability, 8:135–151, 03 2006.
  • [15] D. Elbrächter, D. Perekrestenko, P. Grohs, and H. Bölcskei. Deep neural network approximation theory. IEEE Transactions on Information Theory, 67(5):2581–2623, 2021.
  • [16] Arash Fahim, Nizar Touzi, and Xavier Warin. A probabilistic numerical method for fully nonlinear parabolic PDEs. The Annals of Applied Probability, 21(4):1322 – 1364, 2011.
  • [17] J. F. Le Gall. Spatial branching processes, random snakes, and partial differential equations. 1999.
  • [18] W.D. Gerhard. The probabilistic solution of the Dirichlet problem for 1/2Δ+a,+b1/2{\Delta}+\langle a,\nabla\rangle+b with singular coefficients. Journal of Theoretical Probability, 5(3):503–520, 1992.
  • [19] D. Gilbarg and N.S. Trudinger. Elliptic Partial Differential Equations of Second Order. Springer, 2001.
  • [20] Emmanuel Gobet. Monte-Carlo methods and stochastic processes: from linear to non-linear. Chapman and Hall/CRC, 2016.
  • [21] Lukas Gonon, Philipp Grohs, Arnulf Jentzen, David Kofler, and David Šiška. Uniform error estimates for artificial neural network approximations for heat equations. IMA Journal of Numerical Analysis, 42(3):1991–2054, 2022.
  • [22] Lukas Gonon, Philipp Grohs, Arnulf Jentzen, David Kofler, and David Šiška. Uniform error estimates for artificial neural network approximations for heat equations. IMA Journal of Numerical Analysis, 42(3):1991–2054, 2021.
  • [23] P. Grohs and L. Herrmann. Deep neural network approximation for high-dimensional elliptic PDEs with boundary conditions. IMA Journal of Numerical Analysis, 42(3):2055–2082, 05 2021.
  • [24] Haesung Lee, Wilhelm Stannat, and Gerald Trutnau. Analytic theory of Itô-stochastic differential equations with non-smooth coefficients. arXiv, 2022.
  • [25] J. Han, A. Jentzen, and E. Weinan. Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences, 115(34):8505–8510, 2018.
  • [26] Elton P Hsu. Stochastic analysis on manifolds. Number 38. American Mathematical Soc., 2002.
  • [27] Martin Hutzenthaler, Arnulf Jentzen, and Thomas Kruse. Overcoming the curse of dimensionality in the numerical approximation of parabolic partial differential equations with gradient-dependent nonlinearities. Foundations of Computational Mathematics, page 905–966, 2022.
  • [28] Martin Hutzenthaler, Arnulf Jentzen, Thomas Kruse, Tuan Anh Nguyen, and Philippe von Wurstemberger. Overcoming the curse of dimensionality in the numerical approximation of semilinear parabolic partial differential equations. Proceedings of the Royal Society A, 476(2244):20190630, 2020.
  • [29] Martin Hutzenthaler, Arnulf Jentzen, Thomas Kruse, and Tuan Nguyen. A proof that rectified deep neural networks overcome the curse of dimensionality in the numerical approximation of semilinear heat equations. SN Partial Differential Equations and Applications, 1, 04 2020.
  • [30] Martin Hutzenthaler, Arnulf Jentzen, and Philippe von Wurstemberger. Overcoming the curse of dimensionality in the approximative pricing of financial derivatives with default risks. Electronic Journal of Probability, 25(none):1 – 73, 2020.
  • [31] A. Jentzen, D. Salimova, and T. Welti. A proof that deep artificial neural networks overcome the curse of dimensionality in the numerical approximation of Kolmogorov partial differential equations with constant diffusion and nonlinear drift coefficients. Communications in Mathematical Sciences, 19(5):1167–1205, 2021.
  • [32] Vitalii Konarovskyi, Victor Marx, and Max von Renesse. Spectral gap estimates for Brownian motion on domains with sticky-reflecting boundary diffusion. arXiv e-prints, page arXiv:2106.00080, May 2021.
  • [33] A.E. Kyprianou, A. Osojnik, and T. Shardlow. Unbiased ‘walk-on-spheres’ Monte Carlo methods for the fractional Laplacian. IMA Journal of Numerical Analysis, 38, 09 2017.
  • [34] I.E. Lagaris, A. Likas, and D.I. Fotiadis. Artificial neural networks for solving ordinary and partial differential equations. IEEE transactions on neural networks, 9(5):987–1000, 1998.
  • [35] I.E Lagaris, A.C. Likas, and D.G. Papageorgiou. Neural-network methods for boundary value problems with irregular boundaries. IEEE Transactions on Neural Networks, 11(5):1041–1049, 2000.
  • [36] Antoine Lejay and Sylvain Maire. New Monte Carlo schemes for simulating diffusions in discontinuous media. Journal of Computational and Applied Mathematics, 245:97–116, 2013.
  • [37] A. Malek and R. Shekari Beidokhti. Numerical solution for high order differential equations using a hybrid neural network—optimization method. Applied Mathematics and Computation, 183(1):260–271, 2006.
  • [38] Miguel Martinez and Denis Talay. One-dimensional parabolic diffraction equations: pointwise estimates and discretization of related stochastic differential equations with weighted local times. Electronic Journal of Probability, 17:1–30, 2012.
  • [39] T. Marwah, Z.C. Lipton, and A. Risteski. Parametric complexity bounds for approximating pdes with neural networks. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 15044–15055. Curran Associates, Inc., 2021.
  • [40] M.E. Muller. Some continuous Monte Carlo methods for the Dirichlet problem. The Annals of Mathematical Statistics, pages 569–589, 1956.
  • [41] Etienne Pardoux and Shanjian Tang. Forward-backward stochastic differential equations and quasilinear parabolic PDEs. Probability Theory and Related Fields, 114(2):123–150, 1999.
  • [42] A. Pinkus. Approximation theory of the MLP model in neural networks. Acta Numerica, 8:143–195, 1999.
  • [43] M. Raissi, P. Perdikaris, and G. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.
  • [44] K.K. Sabelfeld and D. Talay. Integral formulation of the boundary value problems and the method of random walk on spheres. Monte Carlo Methods and Applications, 1(1):1–34, 1995.
  • [45] K.K. Sabelfeld and D. Talay. Integral formulation of the boundary value problems and the method of random walk on spheres. Monte Carlo Methods and Applications, 1(1):1–34, 1995.
  • [46] J. Sirignano and K. Spiliopoulos. DGM: A deep learning algorithm for solving partial differential equations. Journal of Computational Physics, 375:1339–1364, 2018.
  • [47] Daniel W Stroock. An introduction to the analysis of paths on a Riemannian manifold. Number 74. American Mathematical Soc., 2000.
  • [48] Denis Talay and Luciano Tubaro. Expansion of the global error for numerical schemes solving stochastic differential equations. Stochastic Analysis and Applications, 8(4):483–509, 1990.
  • [49] Anton Thalmaier and James Thompson. Exponential integrability and exit times of diffusions on sub-riemannian and metric measure spaces. Bernoulli, 26(3):2202–2225, 2020.
  • [50] N. Valenzuela. A new approach for the fractional Laplacian via deep neural networks. arXiv preprint arXiv:2205.05229, 2022.
  • [51] Max-K von Renesse and Karl-Theodor Sturm. Transport inequalities, gradient estimates, entropy and Ricci curvature. Communications on Pure and Applied Mathematics, 58(7):923–940, 2005.
  • [52] D. Yarotsky. Error bounds for approximations with deep ReLU networks. Neural Networks, 94:103–114, 2017.
  • [53] Yijing Zhou, Wei Cai, and Elton Hsu. Computation of local time of reflecting Brownian motion and probabilistic representation of the Neumann problem. Communications in Mathematical Sciences, 15, 02 2015.