
From Monte Carlo to neural networks approximations
of boundary value problems

Lucian Beznea^{2,1} (E-mail: lucian.beznea@imar.ro), Iulian Cîmpean^{1,2} (E-mails: iulian.cimpean@unibuc.ro; iulian.cimpean@imar.ro), Oana Lupaşcu-Stamate^{3} (E-mail: oana.lupascu_stamate@yahoo.com), Ionel Popescu^{1,2} (E-mails: ioionel@gmail.com, ionel.popescu@fmi.unibuc.ro), Arghir Zarnescu^{4,2,5} (E-mail: azarnescu@bcamath.org)
(1University of Bucharest, Faculty of Mathematics and Computer Science,
14 Academiei str., 70109, Bucharest, Romania
2 Simion Stoilow Institute of Mathematics of the Romanian Academy,
P.O. Box 1-764, RO-014700 Bucharest, Romania
3“Gheorghe Mihoc – Caius Iacob” Institute of Mathematical Statistics and Applied Mathematics
of the Romanian Academy, 13 Calea 13 Septembrie, 050711 Bucharest, Romania
4 BCAM, Basque Center for Applied Mathematics, Mazarredo 14, E48009 Bilbao, Bizkaia, Spain
5 IKERBASQUE, Basque Foundation for Science, Maria Diaz de Haro 3, 48013, Bilbao, Bizkaia, Spain
)
Abstract

In this paper we study probabilistic and neural network approximations for solutions to the Poisson equation subject to Hölder data in general bounded domains of $\mathbb{R}^d$. We aim at two fundamental goals.

The first, and most important, is to show that the solution to the Poisson equation can be numerically approximated in the sup-norm by Monte Carlo methods, and that this can be done highly efficiently if we use a modified version of the walk-on-spheres algorithm as an acceleration method. This provides estimates which are efficient with respect to the prescribed approximation error and of polynomial complexity in the dimension and the reciprocal of the error. A crucial feature is that the overall number of samples does not depend on the point at which the approximation is performed.

As a second goal, we show that the obtained Monte Carlo solver renders in a constructive way ReLU deep neural network (DNN) solutions to the Poisson problem, whose sizes depend at most polynomially on the dimension $d$ and on the desired error. In fact, we show that the random DNN provides, with high probability, a small approximation error at low polynomial complexity in the dimension.

Keywords: Deep Neural Network (DNN); Walk-on-Spheres (WoS); Monte Carlo approximation; high-dimensional approximation; Poisson boundary value problem with Dirichlet boundary condition.

Mathematics Subject Classification (2020): 65C99, 68T07, 65C05.

1 Introduction

Partial differential equations provide the most commonly used framework for modelling a large variety of phenomena in science and technology. Using these models in practice requires fast, accurate and stable computations of solutions of PDEs. Broadly speaking there exist two large classes of simulations: deterministic and stochastic. The deterministic methods (e.g. finite differences, finite element methods, etc) are very effective in globally approximating the solutions but their computational effort grows exponentially with respect to the dimension of the space. On the other hand, the probabilistic methods manage to overcome the dimensionality issue, but they are usually employed to obtain approximations at a given point and changing the point requires a different approximation.

In recent years, another powerful class of methods has been developed, namely the (deep) neural network models (in short DNN). These have been able to provide a remarkable number of achievements in the technological realm, such as image classification, language processing and time series analysis, to name only a few. However, despite these achievements, their rigorous understanding is still in its infancy.

On the theoretical side it is known that DNNs are capable of providing good approximation properties for continuous functions [12, 15, 42, 52]. For a more recent in-depth analysis see [15] and the references therein. However, it should be noted that these approximations are not constructive, and indeed the issue of constructibility and error estimates is the crucial one from the practical point of view.

In what concerns the approximation of solutions of PDEs, there are numerical treatments in low dimensions, as for example in [34, 35, 37], which propose schemes for solving PDEs using some form of neural networks. However, these do not provide any error estimates. Their approaches depend on a grid discretization of the space, and for the most challenging case, namely that of simulations in high dimensions, there are no theoretical guarantees that the methods scale well. Typically, these approximations, though convincing, remain at the level of numerical experiments.

On the other hand, there is a rigorous body of literature which treats the approximation of solutions of PDEs by neural networks with error estimates, as, for example, in [43, 46, 37, 39, 23, 25], but typically these scale poorly in high dimensions. A discussion of some of these is provided in the next subsection.

Our aim in this paper is to study approximations of the Poisson equation and to provide DNNs built on stochastic approaches that address some of the shortcomings mentioned before, significantly advancing the state of the art and providing tools that can be extended to more general equations, at the cost of a suitable increase in the technical details. A discussion of our contribution is provided in Subsection 1.2.

1.1 On related previous work.

For the sake of comparison (see Subsection 1.2), let us present here a short review of existing works that we find strongly connected to our paper, trying to point out some limitations that are fundamentally addressed in our present work. For a comprehensive overview of deep learning methods for PDEs we refer to [6].

Monte Carlo methods for PDEs

The Monte Carlo method for solving linear PDEs has been well understood and intensively used for a long time. Let us mention e.g. [48] for the case of linear parabolic PDEs on $\mathbb{R}^d$, and [44] for linear elliptic PDEs in bounded domains; one issue that is encountered in all these classical works, and which is particularly crucial to us, is that the theoretical errors are point-dependent, in the sense that there is no guarantee that one can use the same Monte Carlo samples uniformly for all the locations in an open Euclidean domain where the solution is to be approximated. Recently, it was shown in [28] and [27] (see also [30, Theorem 1.1]) that a multilevel Picard Monte Carlo method can be derived in order to numerically approximate the solutions to semilinear parabolic PDEs without suffering from the curse of high dimensions; here as well, the obtained probabilistic errors are derived for a given fixed location where the exact solution is approximated. Monte Carlo methods have also been extended to fully nonlinear parabolic PDEs in $\mathbb{R}^d$, as for example in [16], where a mixture between the Monte Carlo method and the finite difference scheme is proposed; moreover, a locally uniform (in space) convergence of the proposed numerical scheme is obtained, but the issue that matters to us is that the required number of samples grows like $h^{-d}$ (see [16, Example 4.4]), where $d$ is the dimension whilst $h$ is the time discretization parameter. Also, because the space discretization is performed by a finite difference scheme on a uniform grid and the Monte Carlo sampling needs to be performed for each point in the grid, the algorithm complexity once more suffers exponentially with respect to the dimension.

DNNs for the Dirichlet problem on bounded domains

We mention two directions in the literature that aim at rigorously proving that DNNs can be used as numerical solvers without suffering from the curse of high dimensions. The first one is proposed in [23] and is the most closely related to our present work, so we shall frequently refer to it in what follows. The approach goes through the stochastic representation of the solution, and aims at designing a Monte Carlo sampler that can be used uniformly for all locations in the domain where the solution is approximated. However, some important issues remained open, such as the fact that the obtained estimates depend on the volume of the domain, which can grow exponentially with respect to the dimension; these are going to be discussed in detail later. The second approach, in [39], is rather different: it constructs the neural network progressively using a gradient descent method and then calculates the polynomial complexity of the neural network approximation. A main feature of this second approach is that the construction of the DNN solver uses the theoretical spectral decomposition of the differential operator, which is not numerically available, hence the obtained existence of the DNN approximator for the exact solution is of theoretical nature.

DNNs for (linear) Kolmogorov PDEs

In the case of linear parabolic PDEs, DNN solvers based on probabilistic representations have been proposed and numerically tested in [5]. A theoretical proof that DNNs are indeed able to approximate solutions to a class of linear Kolmogorov PDEs without suffering from the curse of dimensions has been provided in [31]. The strategy fundamentally aims at minimizing the error in $L^2(D;\lambda/\lambda(D))$ ($\lambda$ is the Lebesgue measure), hence the existence of a DNN that approximates the solution without the curse of dimensions is in fact obtained on domains whose volumes increase at most polynomially with respect to the space dimension. In [5], the authors include some numerical evidence that the $L^\infty(D)$-error also scales well with respect to the dimension, but some caution needs to be taken, as mentioned in [5, 4.7 Conclusion]. In the case of the heat equation on $\mathbb{R}^d$ it was proved in [21] that any solution with at most polynomial growth can be approximated in $L^\infty([a,b]^d)$ by a DNN whose size grows at most polynomially with respect to $d$, the reciprocal of the prescribed approximation error, and $\max(|a|,|b|)$; the authors rely heavily on the representation of the solution by a standard Brownian motion shifted to the location point where the solution is approximated. In our paper we deal with the Poisson problem in a bounded domain in $\mathbb{R}^d$, and the main difficulty, and difference at the same time, comes from the fact that the representing process (namely the Brownian motion stopped when it exits the domain) depends in a nonlinear way on the starting point where the solution is represented, and this dependence is strongly influenced by the geometry of the domain.

DNNs for semilinear parabolic PDEs on $\mathbb{R}^d$

In the case of the semilinear heat equation on $\mathbb{R}^d$ with gradient-independent nonlinearity, a rigorous proof that DNNs can be used as numerical solvers that do not suffer from the curse of dimensions can be found in [29, Theorem 1.1]. The approximation errors are considered in the $L^2([0,1]^d)$ sense, so in general, by a scaling argument, they depend on the volume of the domain where the solution is approximated. Deep learning methods for general semilinear parabolic PDEs on $\mathbb{R}^d$ have been proposed and efficiently tested in high dimensions in [25]. Rigorous proofs that these types of deep solvers are indeed capable of approximating solutions to general PDEs without suffering from the curse of dimensions are still waiting to be derived.

1.2 Our contribution

In the present work we are primarily interested in studying Monte Carlo and DNN numerical approximations for solutions to the Poisson boundary value problem (1.1) in bounded domains in $\mathbb{R}^d$, explicitly tracking the dependence of the Monte Carlo estimates, as well as the size of the corresponding neural networks, on the spatial dimension, the reciprocal of the accuracy, the regularity of the domain, and the prescribed source and boundary data. There are several key issues that we tackle throughout the paper, so let us briefly yet systematically point them out here:

Overcoming the curse of high dimensionality

We outline here that a very important consequence of our results concerns the breaking of the curse of high dimensions, in the sense that the size of the neural network approximating the solution $u$ to problem (1.1) adds at most a (low degree) polynomial complexity to the overall complexity (see Theorem (Part II) below) of the approximating networks for the distance function and the data. Moreover, as is typical in machine learning, we also show that, despite the fact that the neural network construction is random, it breaks the dimensionality curse with high probability. In terms of the dimension $d$, our main results state, in particular, that if the domain is sufficiently regular (e.g. convex) then the complexity of the Monte Carlo estimator of the exact solution to (1.1) scales at most like $d^3\log^4(d)$, whilst the DNN estimator of the same solution scales at most like $d^5\log^5(d)\,{\rm S}$, where ${\rm S}$ is a cumulative size of the DNNs used to approximate the given data and the distance function to the boundary of the domain. In contrast with the results from [23], our construction of the DNN approximators is explicit and such that their sizes do not depend on the volume of the domain. The estimates obtained herein should also be compared with the conclusion from [39], where the size of the network is $\mathcal{O}(d^{\log(1/\gamma)})$, with $\gamma$ the accuracy of the approximation, so that the degree grows with the allowed error. Also, the construction adopted in [39] is of theoretical nature, in the sense that it guarantees the existence of DNNs with the desired properties, but it is unclear how it could be implemented in practice. In contrast, our schemes can easily be implemented using GPU computing, as discussed in Section 5.

Low dimensions improvements for general bounded domains with Dirichlet data

We should point out that our approach also has interesting consequences in low dimensions. The key is that we can reuse the samples for the Monte Carlo solver to simultaneously approximate the solution at all points in the domain, and furthermore the number of steps required by the designed Walk-on-Spheres algorithm does not depend on the starting point; these two features make the proposed scheme highly parallelizable, and GPU computing can be employed very efficiently. Moreover, this works for arbitrary bounded domains with quantitative estimates, while for more regular domains one obviously obtains improved estimates.

General bounded domains and Hölder continuous data

Recall that in [23] the domain is assumed to be convex, whilst in [39] it is of class $C^\infty$. In the present paper it is shown that the curse of high dimensions can be overcome for a general class of domains, namely those that satisfy a uniform exterior ball condition. As a matter of fact, our results are even more general, covering the case of an arbitrary bounded domain in $\mathbb{R}^d$ (see 2.26), but then the estimates are given in terms of the behavior of the function $v_D$ defined in (2.3) in the proximity of the boundary of the domain. Concerning the regularity of the source and boundary data, our assumption is that they are merely Hölder continuous. Recall that in [23] the source and boundary data are assumed to be twice continuously differentiable.

$L^\infty(D)$ estimates

Recall that in [23] the accuracy is measured with respect to the $L^2$ norm. However, as pointed out in [39], the estimates actually depend on the volume of the domain $D$, hence they implicitly exhibit an exponential dependence on the dimension for sufficiently large domains. In the present work we estimate the errors in the uniform norm, which, on the one hand, gives much better results; on the other hand, we prove that the uniform norm of the error is small with large probability, whilst the approximation complexity depends on $D$ merely through its (annular) diameter. We emphasize that obtaining reliable estimates for the expectation or the tail probability of the Monte Carlo error computed in the uniform norm is in general highly nontrivial. For example, such estimates have only very recently been obtained for the (linear) heat equation in $\mathbb{R}^d$ in [22], after considerable effort. In our case, the difficulty arises mainly from the fact that we work in bounded domains. Nevertheless, our method is in some sense much simpler and could easily be transferred to other settings as well.

Walk-on-Spheres acceleration revisited

The walk on spheres (WoS) was introduced in [40] as a way of accelerating the calculation of integrals along the paths of Brownian motion. The standard walk-on-spheres uses the following scheme: take some $x\in D$; then, instead of simulating the entire Brownian motion trajectory, we start by uniformly choosing a point on the sphere of maximal radius centered at $x$ and contained in $D$. This step is then repeated until the current position enters some thin neighborhood of the boundary (the $\varepsilon$-shell), where the chain is stopped. Thus, the chain so constructed is used as an approximation for the Brownian motion started from $x$ and stopped at the exit time from the domain $D$; recall that the distribution of the latter variable is precisely the harmonic measure with pole $x$, hence it is used to represent the solution $u(x)$ to problem (1.1) with $f\equiv 0$. Here, we modify this algorithm in two respects. Firstly, our stopping rule for the walk on spheres is deterministic: we run the walk-on-spheres chain for a given number of steps, uniformly for all trajectories and all points in the domain. This is completely opposite to the stopping rule used in [23] and, in fact, to the one usually adopted in the literature. It has a number of advantages, but perhaps the most important one is that the neural network construction outlined here is explicit. In order to understand how large such a deterministic stopping time should be taken in order to achieve the desired estimates, we have to investigate various estimates, in less or more regular domains, for the number of steps needed for the walk-on-spheres chain to reach the $\varepsilon$-neighborhood of the boundary. Secondly, our walk-on-spheres scheme is performed with the maximal radius replaced by a more general radius, which is not necessarily maximal and is compatible with ReLU DNNs. Overall, we develop a generalized walk-on-spheres algorithm which is of interest in its own right and which is much more compatible with parallel computing than the classical scheme.

The core ideas are surprisingly simple and of general nature

Putting aside the WoS acceleration algorithm, the crucial ingredients are the following. The first one is that the Monte Carlo approximation $u_M^N(x)$ given by (1.5) of the exact solution $u$ to problem (1.1) is a.s. Hölder continuous with respect to $x$, yet with a Hölder constant which is exponentially large with respect to the number $M$ of WoS steps. The second one is to consider a uniform grid discretization of the domain $D$ and to approximate the solution $u$ in the sup-norm merely on this grid. The third ingredient is to employ the (otherwise very poor) Hölder regularity of $u_M^N$ to extrapolate the approximation from the grid to the entire domain. The grid needs to be taken extremely refined, and this immediately leads to an exponential complexity in terms of $M$ and the diameter of $D$. Now the fourth ingredient comes into play, namely we use Hoeffding's inequality combined with the union bound in order to efficiently compensate for the exponential complexity induced by the uniform grid. We emphasize that the grid discretization is just an instrument to prove the main result, which remains in fact grid-independent. This approach, which uses an auxiliary uniform grid whose induced complexity is compensated by a concentration inequality, is, as a matter of fact, simple and of general nature. Therefore, we expect that our approach can easily be employed for other classes of PDEs.

Universality with respect to given data

One useful feature of the estimator explicitly constructed in this paper is that it essentially approximates the operator that maps the data (source and boundary) of problem (1.1) into the corresponding solution $u$. In particular, it means that the DNN solvers constructed herein consist of the composition of two separate neural networks: one which approximates the source and boundary data, and one for the above-mentioned operator. In this light, the present method could be interpreted as an operator learning method, and once the operator is learned, the source and boundary data can be varied very easily, without any further training.

Explicit construction of the approximation

One key element of our approach is the explicit formulation of the approximation. This is reflected in formula (1.5), where all elements are fully determined. We should also stress that (1.5) is much simpler than a neural network and does not need any training. On the other hand, this structure can be exploited to initialize a DNN with significantly less complexity than the guaranteed Monte Carlo construction we provide. Once this initialization is done, we can train this network in order to further decrease the approximation error.

1.3 Brief technical description of the main results.

Our starting point is suggested by [23], which essentially builds on the stochastic representation of the solution to the Poisson equation. In turn, the stochastic representation is followed by the standard walk on spheres (WoS) method to accelerate the computation of the integrals along the Brownian trajectory. This is then used to construct neural network approximations. In the present paper we fundamentally expand, clarify, simplify and make explicit constructions starting from some of the ideas pointed out in [23]. An extension of [23] to the fractional Laplacian has been developed in [50], so the refined methods proposed here should essentially apply to [50] as well.

Now we descend into the description of our main results. We study the Poisson boundary value problem

\begin{cases}\frac{1}{2}\Delta u=-f\ \text{ in }D\\ u|_{\partial D}=g,\end{cases} \qquad (1.1)

where $D$ is a bounded domain in $\mathbb{R}^d$, whilst $f:D\to\mathbb{R}$ and $g:\overline{D}\to\mathbb{R}$ are given continuous functions. It is well known that there exists a unique solution $u\in C(D)\cap H^1_{loc}(D)$ to (1.1), see [18, Theorem 6] or 2.2 below. The fact that $u\in C(D)\cap H^1_{loc}(D)$ is a solution to (1.1) means that

\frac{1}{2}\int_D\langle\nabla u,\nabla\varphi\rangle\,dx=\int_D f\varphi\,dx\quad\forall\varphi\in C_c^\infty(D), \qquad (1.2)
\lim_{D\ni x\to x_0\in\partial D}u(x)=g(x_0),\quad\forall x_0\in\partial D\mbox{ which is a regular point for }D. \qquad (1.3)

In [23] the domain $D$ is taken to be convex. We treat several layers of generality for the domain $D$, which lead to different final results.

All the random variables employed in the sequel are assumed to be defined on the same probability space $(\Omega,\mathcal{F},\mathbb{P})$, whilst the expectation is denoted by $\mathbb{E}$. Further, let us consider the generalized walk-on-spheres process defined as

X^x_0:=x\in D,\quad X^x_{n+1}:=X^x_n+\widetilde{r}(X^x_n)U_{n+1},\;n\geq 0, \qquad (1.4)

where $x$ is the starting point in the domain $D$, the function $\widetilde{r}$ denotes the replacement of the distance function $r$ to the boundary $\partial D$, and the $U_i$ are drawn independently and identically distributed on the unit sphere in $\mathbb{R}^d$. Essentially, we use a Lipschitz $\widetilde{r}$ such that $\widetilde{r}\leq r$ on the whole domain and $\beta r\leq\widetilde{r}$ as long as $r\geq\varepsilon$. We call such a candidate a $(\beta,\varepsilon)$-distance, and these constants play an important role in the estimates below. In all cases we can take $\beta\geq 1/3$, as we point out in Remark 2.8 below. With this process at hand, we introduce the Monte Carlo estimator $u_M^N$ of the solution $u$ to problem (1.1) by

u_M^N(x):=\frac{1}{N}\sum_{i=1}^N\left[g(X^{x,i}_M)+\frac{1}{d}\sum_{k=1}^M\widetilde{r}^2(X^{x,i}_{k-1})\,f\!\left(X^{x,i}_{k-1}+\widetilde{r}(X^{x,i}_{k-1})Y^i\right)\right],\quad x\in D,\ M,N\geq 1. \qquad (1.5)

Here, the sequences $\{U_{n,i}\}_{n,i\geq 0}$ and $\{Y^i\}_{i\geq 0}$ are all independent, $U_{n,i}$ is drawn uniformly on the unit sphere, $X_n^{x,i}$ is given by (1.4) with $U_n$ replaced by $U_{n,i}$, whilst $Y^i$ is drawn on the unit ball in $\mathbb{R}^d$ from the distribution $\mu$ which has an explicit density proportional to $|y|^{2-d}-1$, $|y|<1$, if $d\geq 3$, and which is in fact the (normalized) Green kernel of the Laplacian on the unit ball with pole at $0$. It is easy to see that if $R$ and $Z$ are independent random variables such that $R$ has distribution $\mathrm{Beta}\big(\frac{4-d}{d-2},2\big)$ on $[0,1]$ and $Z$ is uniformly distributed on the unit sphere in $\mathbb{R}^d$, then $R^{\frac{1}{d-2}}Z$ has distribution $\mu$.
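To make (1.4) and (1.5) concrete, the following is a minimal NumPy sketch of the estimator at a single point; it is only an illustration under stated assumptions, not the implementation used in Section 5. The callables f, g and r_tilde (a $(\beta,\varepsilon)$-distance) are hypothetical user-supplied functions, assumed to be vectorized over batches of points, and the radial part of $\mu$ is sampled here by numerically inverting the radial cumulative distribution function derived from the density above (alternatively, one may use the representation just mentioned).

```python
import numpy as np

def sample_mu_radius(d, n, rng):
    """Sample radii for mu (density proportional to |y|^(2-d) - 1 on the unit ball,
    d >= 3) by bisecting the radial CDF F(s) = (d*s**2 - 2*s**d) / (d - 2)."""
    u = rng.random(n)
    lo, hi = np.zeros(n), np.ones(n)
    for _ in range(60):                       # bisection, accurate to ~2**-60
        mid = 0.5 * (lo + hi)
        below = (d * mid**2 - 2 * mid**d) / (d - 2) < u
        lo, hi = np.where(below, mid, lo), np.where(below, hi, mid)
    return 0.5 * (lo + hi)

def uniform_on_sphere(d, n, rng):
    """n i.i.d. points, uniformly distributed on the unit sphere of R^d."""
    G = rng.standard_normal((n, d))
    return G / np.linalg.norm(G, axis=1, keepdims=True)

def u_MN(x, f, g, r_tilde, M, N, rng=None):
    """Estimator (1.5) at a single point x: N independent r_tilde-WoS trajectories
    of exactly M steps each (deterministic stopping), one Y^i per trajectory."""
    rng = rng or np.random.default_rng()
    d = len(x)
    X = np.tile(np.asarray(x, float), (N, 1))                  # X_0 = x for all samples
    Y = sample_mu_radius(d, N, rng)[:, None] * uniform_on_sphere(d, N, rng)
    src = np.zeros(N)
    for _ in range(M):
        r = r_tilde(X)                                         # (beta, eps)-distance at X_{k-1}
        src += (r**2 / d) * f(X + r[:, None] * Y)              # source term of (1.5)
        X += r[:, None] * uniform_on_sphere(d, N, rng)         # WoS step (1.4)
    return np.mean(g(X) + src)                                 # g(X_M) + accumulated source
```

Note that the same batch of trajectories can be reused for several starting points, which is precisely the feature emphasized above.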

The first part of the main result of this paper is the following:

Theorem (Part I; see 2.26 for the full quantitative version).

Fix a small $\varepsilon_0>0$, $\beta\in(0,1]$, $\widetilde{r}$ a $(\beta,\varepsilon_0)$-distance, and consider $u_M$ and $u_M^N$ given by (2.20) and (2.29). Also, assume that $f$ and $g$ are $\alpha$-Hölder on $D$ for some $\alpha\in(0,1]$. Then, for any compact subset $F\subset D$, for all $N,M,K\geq 1$, $\gamma>0$ and $\varepsilon\in[0,\varepsilon_0]$, there are explicit quantities $A(F,M,K,d,\varepsilon)$ and $B(M,K,d)$, given in terms of the boundary regularity, the parameters $N,M,K,d,\varepsilon$, the set $F$ and the data $f,g$, such that

\mathbb{E}\left\{\sup_{x\in F}\left|u(x)-u_M^N(x)\right|\right\}\leq A(F,M,K,d,\varepsilon)+\frac{B(M,K,d)}{\sqrt{N}}. \qquad (1.6)

Moreover, for an arbitrary domain $D$ and any compact subset $F\subset D$, we have that

\lim_{M\to\infty}\lim_{N\to\infty}\mathbb{E}\left\{\sup_{x\in F}\left|u(x)-u_M^N(x)\right|\right\}=0. \qquad (1.7)

In addition, if the domain $D$ satisfies the uniform exterior ball condition, we can take $F=D$ in the estimates (1.6) and (1.7).

Remark 1.1.

A few comments are in order here.

  1. (i)

    The estimates are true for any arbitrary domain, both in expectation and also in the tail. However, in this very general case we do not get any quantitative estimates, only asymptotic convergence guarantees on compact subsets.

  2. (ii)

    The full power of the result is a little more technical and states that we have a tail estimate in the form

\mathbb{P}\left(\sup_{x\in F}\left|u(x)-u_M^N(x)\right|\geq\gamma\right)\leq 2\exp\left(C_1(M,K,d)-\frac{\left((\gamma-A(F,M,K,d,\varepsilon))^+\right)^2}{C_2(M,d)}N\right), \qquad (1.8)

    where

\begin{split}C_1(M,K,d)&:=d\left(\lceil M/\alpha\rceil\log(2+|\widetilde{r}|_1)+\log(K)\right),\qquad C_2(M,d):=|g|_\infty+\frac{M}{d}\,{\sf diam}(D)^2|f|_\infty,\\ A(F,M,K,d,\varepsilon)&:=2\left(|g|_\alpha+\frac{{\sf diam}(D)^2|f|_\alpha+2\,{\sf diam}(D)|f|_\infty}{d}\right)\left(\frac{{\sf diam}(D)}{K}\right)^\alpha\\ &\quad+d^{\alpha/2}|g|_\alpha v^{\alpha/2}(F,\varepsilon)+|f|_\infty v(F,\varepsilon)+\left(4|g|_\infty+\frac{2}{d}{\sf diam}(D)^2|f|_\infty\right)e^{-\frac{\beta^2\varepsilon^2}{4{\sf diam}(D)^2}M}.\end{split} \qquad (1.9)
  3. (iii)

In the case the function $g\in C^2(\overline{D})$, we can take

\begin{split}A(F,M,K,d,\varepsilon):=&\;2\left(|g|_\alpha+\frac{{\sf diam}(D)^2|f|_\alpha+2\,{\sf diam}(D)|f|_\infty}{d}\right)\frac{{\sf diam}(D)}{K}\\ &+\left(\frac{|\Delta g|_\infty}{2}+|f|_\infty\right)v(F,\varepsilon)+\left(8|g|_\infty+\frac{2}{d}{\sf diam}(D)^2|f|_\infty\right)e^{-\frac{\beta^2\varepsilon^2}{4{\sf diam}(D)^2}M}.\end{split}
  4. (iv)

The parameter $\varepsilon$ measures the closeness to the boundary and can be taken arbitrarily small. The functions $v(x,\varepsilon)$ and $v(F,\varepsilon)$ defined in (2.22) measure the geometry of the boundary from a rather stochastic viewpoint. We can upper-bound $v(F,\varepsilon)$ by a more tractable and analytical version of it, namely $|v|_\infty(\varepsilon)$ (see (2.22)). Moreover, if the domain $D$ satisfies the exterior ball condition, we can replace the compact set $F$ by the whole domain $D$ and $v(D,\varepsilon)$ by $\varepsilon\,{\sf adiam}(D)$.

Furthermore, in the case of $\delta$-defective convex domains $D$ (see (2.15)) we can replace $e^{-\frac{\beta^2\varepsilon^2}{4{\sf diam}(D)^2}M}$ by $\left(1-\frac{\beta^2(1-\delta)}{4d}\right)^M\sqrt{\frac{{\sf diam}(D)}{\varepsilon}}$.

  5. (v)

There are many parameters in (2.35) and (1.8). $M$ stands for the number of steps the walk on spheres is allowed to take. $N$ is the number of Monte Carlo simulations.

The mysterious constant $K$ comes from a grid discretization used for the estimate. Notice that the left-hand side of (1.6) or (1.8) does not depend on $\varepsilon$ or $K$. Furthermore, the larger $K$ and $M$ are, the larger $C_1(M,K,d)$ is; nevertheless, this is compensated by $N$ and by the dependence of $A(F,M,K,d,\varepsilon)$, which becomes smaller for large $K$ and $M$. Therefore, the strategy is to optimize the right-hand sides of (1.6) or (1.8) to obtain the best estimate; a small numerical sketch of these quantities is given right after this remark.

  6. (vi)

    We can take the constant

B(M,K,d):=C_2(M,d)\left(\sqrt{C_1(M,K,d)+\log(2)}+1\right),

which, like the constants $C_1(M,K,d)$ and $C_2(M,d)$, depends neither on $N$ nor on $\varepsilon$. Thus, taking the limit over $N$ in (1.7) and then over $M$ leaves the right-hand side of (1.6) dependent only on $K$ and $\varepsilon$. However, the limit in (1.7) is independent of $K$ and $\varepsilon$. Letting $K\to\infty$ and $\varepsilon\to 0$ yields (1.7) for general domains. Quantitative versions can be obtained by carefully tuning all the parameters $N$, $M$, $K$, $\varepsilon$.

  7. (vii)

    The estimate (1.8) is the key to guaranteeing that the error is actually small with high probability. This is more useful than the expectation result (1.7).

  8. (viii)

Finally, we point out that these rather intricate terms are the key in choosing the dependence of all the constants $N,M,K,\varepsilon$ on the dimension $d$ for large $d$. Nevertheless, the estimates are also very useful in small dimensions.

  9. (ix)

When $g\in C^2(\partial D)$, the above estimate can be improved; for more details we refer the reader to the extended arXiv version of this paper.
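Since the remark suggests tuning $N,M,K,\varepsilon$ by optimizing the right-hand sides of (1.6) and (1.8), the following small sketch simply transcribes the quantities from (1.9), (1.6) and (1.8); it is not part of the paper's code, and all problem-dependent inputs (the norms of $f$ and $g$, ${\sf diam}(D)$, $v(F,\varepsilon)$ and $\log(2+|\widetilde{r}|_1)$) are assumed to be supplied by the user.

```python
import numpy as np

def error_bounds(gamma, N, M, K, d, eps, alpha, beta,
                 diam, v_F_eps, g_sup, g_hold, f_sup, f_hold, log_r1):
    """Right-hand sides of (1.6) and (1.8), transcribed from (1.9); log_r1 stands
    for log(2 + |r~|_1) and v_F_eps for v(F, eps)."""
    C1 = d * (np.ceil(M / alpha) * log_r1 + np.log(K))
    C2 = g_sup + (M / d) * diam**2 * f_sup
    A = (2 * (g_hold + (diam**2 * f_hold + 2 * diam * f_sup) / d) * (diam / K)**alpha
         + d**(alpha / 2) * g_hold * v_F_eps**(alpha / 2)
         + f_sup * v_F_eps
         + (4 * g_sup + (2 / d) * diam**2 * f_sup)
           * np.exp(-beta**2 * eps**2 / (4 * diam**2) * M))
    B = C2 * (np.sqrt(C1 + np.log(2)) + 1)
    expectation_bound = A + B / np.sqrt(N)                    # right-hand side of (1.6)
    tail_bound = 2 * np.exp(C1 - max(gamma - A, 0.0)**2 / C2 * N)   # right-hand side of (1.8)
    return expectation_bound, min(tail_bound, 1.0)
```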

After all the remarks above, the goal is to make the right-hand side of (1.8) small. Assuming that the domain $D$ satisfies the uniform exterior ball condition and that both $f,g$ have $\alpha$-Hölder regularity, we can ensure that

\mathbb{P}\left(\sup_{x\in D}\left|u(x)-u_M^N(x)\right|\geq\gamma\right)\leq\eta, \qquad (1.10)

by sampling $N$ times a number of $M$ steps of the WoS chain (1.4), with complexity (see more details below in Remark 2.29)

M=\mathcal{O}\left(\frac{d^2\log(1/\gamma)}{\gamma^{4/\alpha}}\right),\qquad N=\mathcal{O}\left(\frac{d^2\log(1/\gamma)^2\left[d^2\log(1/\gamma)+\gamma^{2/\alpha}\log(1/\eta)\right]}{\gamma^{2+4/\alpha}}\right).

Moreover, if $D$ is defective convex (see (2.15)) then we get the significant improvement

M=\mathcal{O}\left(d\log(d/\gamma)\right),\qquad N=\mathcal{O}\left(\frac{\log^2(d/\gamma)\left(d^2\log(d/\gamma)+\log(1/\eta)\right)}{\gamma^2}\right).

The implicit constants in the $\mathcal{O}$ symbols encode the dependence on $f,g,{\sf diam}(D)$ and the geometry of $D$. As these expressions make clear, the dependence on the dimension and on $1/\gamma$ is polynomial, up to logarithmic factors, with better exponents when the domain $D$ has better geometry. We conclude by pointing out that the total number of flops needed to compute the approximation $u_M^N$ is $\mathcal{O}(d^2MN)$, thus in total a polynomial complexity in $d$.
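As a rough illustration (and only under assumptions made here), the order-of-magnitude choices of $M$ and $N$ above can be turned into a small helper; the prefactor c below is a placeholder for the implicit constants, which depend on $f,g,{\sf diam}(D)$ and the geometry of $D$ and are not specified in this sketch.

```python
import math

def wos_parameters(d, gamma, eta, alpha=1.0, defective_convex=False, c=1.0):
    """Order-of-magnitude choices of M (WoS steps) and N (samples) following the
    complexity statements above; c stands in for the unspecified implicit constants."""
    if defective_convex:
        M = c * d * math.log(d / gamma)
        N = c * math.log(d / gamma)**2 * (d**2 * math.log(d / gamma)
                                          + math.log(1 / eta)) / gamma**2
    else:  # general domain satisfying the uniform exterior ball condition
        M = c * d**2 * math.log(1 / gamma) / gamma**(4 / alpha)
        N = (c * d**2 * math.log(1 / gamma)**2
             * (d**2 * math.log(1 / gamma) + gamma**(2 / alpha) * math.log(1 / eta))
             / gamma**(2 + 4 / alpha))
    return math.ceil(M), math.ceil(N)
```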

Before we proceed to the DNN theoretical counterpart, let us briefly present a single numerical test meant to support the estimates in Theorem (Part I). For more numerical tests in this sense we refer the reader to Section 5. Figure 1 below depicts the evolution with respect to $N$ of the mean errors $\mathbb{E}\left\{|D|^{-1}\int_D|u-u_M^N|\,dx\right\}$ and $\mathbb{E}\left\{\sup_{x\in D}|u(x)-u_M^N(x)|\right\}$ for the Poisson problem (1.1), where $d=100$, $D$ is the coronal cube $D_{\sf ac}$ defined in Section 5, whilst the exact solution $u$ is given by (5.4) from Section 5. The above expectations have also been approximated by Monte Carlo simulations; for more details see Section 5, more precisely Figure 7.

Figure 1: The evolution of (Monte Carlo estimates of) (a) $\mathbb{E}\left\{1/|D|\int_D|u(x)-u_M^N(x)|\,dx\right\}$ and (b) $\mathbb{E}\left\{\sup_{x\in D}|u(x)-u_M^N(x)|\right\}$ with respect to $N$, for $D=D_{\sf ac}$ (see Section 5), $d=100$, $M=500$. The computed errors decrease to a small value as the number of WoS trajectories $N$ increases. The limit error attained as $N$ goes to infinity is not zero, since it depends on $M$, but it decreases to zero as the latter parameter is also increased to infinity; see Section 5 for more details.

The second part of the main result of this paper is dedicated to the construction of a neural network approximation. In fact, in this paper we deal only with the special class of feed-forward neural networks whose activation function is the ReLU (rectified linear unit); such a network shall further be referred to as a ReLU DNN. The essential part of this construction is formula (1.5). The key fact is that we choose $\widetilde{r}$ to be itself a ReLU DNN. This is the main reason for our modification of the walk-on-spheres algorithm with $\widetilde{r}$ instead of the usual distance to the boundary. In addition, given some ReLU DNN approximations of the data $f$ and $g$, respectively, we can use these building blocks, together with some basic facts about ReLU DNNs, in conjunction with (1.8) to get the following result.

Theorem (Part II; see Theorem 3.10 for details).

Under the same context as Theorem (Part I), assume that $D$ satisfies the uniform exterior ball condition, and that we are given ReLU DNNs $\phi_f:D\rightarrow\mathbb{R}$, $\phi_g:\overline{D}\rightarrow\mathbb{R}$, $\phi_r:D\rightarrow\mathbb{R}$ such that

|f-\phi_f|_\infty\leq\epsilon_f\leq|f|_\infty,\qquad|g-\phi_g|_\infty\leq\epsilon_g,\qquad|r-\phi_r|_\infty\leq\epsilon_r.

If $\gamma>0$ is chosen such that a.1)-a.3) from 3.10 hold, and $0<\eta<1$, then we can construct a (random) ReLU DNN $\mathbb{U}(x)$ such that

\mathbb{P}\left(\sup_{x\in D}\left|u(x)-\mathbb{U}(\cdot,x)\right|\leq\gamma\right)\geq 1-\eta,

with

{\rm size}(\mathbb{U}(\omega,\cdot))=\mathcal{O}\left(d^7\gamma^{-16/\alpha-4}\log^4\left(\frac{1}{\gamma}\right)\left[d^3\gamma^{-4/\alpha}\log\left(\frac{1}{\gamma}\right)+\log\left(\frac{1}{\eta}\right)\right]{\rm S}\right),

where

{\rm S}:=\left[\max(d,\mathcal{W}(\phi_r),\mathcal{L}(\phi_r))+{\rm size}(\phi_r)+{\rm size}(\phi_g)+{\rm size}(\phi_f)\right].

Furthermore, if DD is defective convex (see (2.15)) then we get the significant improvement

{\rm size}(\mathbb{U}(\omega,\cdot))=\mathcal{O}\left(\frac{d^3}{\gamma^2}\log^4\left(\frac{d}{\gamma}\right)\left[d^2\log\left(\frac{d}{\gamma}\right)+\log\left(\frac{1}{\eta}\right)\right]{\rm S}\right).

Here ${\rm size}$ denotes the number of non-zero parameters of the neural network, whilst $\mathcal{W}(\phi)$ and $\mathcal{L}(\phi)$ represent the width and the length, respectively, of the neural network $\phi$. The implicit constants depend on $|g|_\alpha,|g|_\infty,|f|_\infty,{\rm diam}(D),{\rm adiam}(D),\delta,\alpha,\log(2+|\phi_r|_1)$.

1.4 Further extensions

This paper has outlined a novel method with the potential to extend to a general principle for solving high-dimensional partial differential equations (PDEs) without encountering the curse of dimensionality. The approach is threefold:

  1. (i)

    Initially, we establish a probabilistic representation of the solution via a stochastic process. Notably, such processes are already known to exist for a wide range of PDEs, ranging from linear types [24] to fully non-linear models [11, 17, 7, 41, 20].

  2. (ii)

    The subsequent phase involves the rapid simulation of this stochastic process, aligning closely with the PDE solution representation. This step is harmonized with the Monte-Carlo method for approximating the solution.

  3. (iii)

    The final phase focuses on approximating the solution at specific points within an auxiliary grid of the domain. Following this, we employ a concentration inequality to extend this approximation to the continuous domain, thus compensating for the complexity introduced by the grid.

Such a strategy finds immediate application in the realm of parabolic PDEs solvable by diffusion processes, as discussed in [3] and [10], especially within time-dependent domains. It also applies to PDEs characterized by (sticky-)reflecting boundary conditions, in line with the frameworks established in [53] and [32].

Furthermore, this methodology opens new avenues in molecular dynamics, particularly in scenarios where the operator exhibits discontinuous coefficients, as explored in [9, 38, 36]. Another promising direction involves the extension of this approach to (sub-)Riemannian or metric measure spaces, utilizing tools such as the ones in [47, 26, 51, 2, 4, 49].

1.5 Organization of the paper

In Section 2 we present the main notations, the important quantities and the main results. For the sake of readability of the paper we have moved the proofs to Section 4. Section 2 in turn contains several subsections, which we discuss now because they show the main approach. Subsection 2.1 details the first probabilistic representation of the solution and various types of estimates based on the annular diameter of a domain, which is a certain one-dimensional characterization of the domain. In Subsection 2.2 the walk on spheres and its modified version enter the scene and we give the main estimates on the number of steps needed to get in the proximity of the boundary for general domains. Here we also introduce the class of $\delta$-defective convex domains, a class of domains for which we provide better estimates for the number of steps needed to reach the proximity of the boundary. Subsection 2.3 provides the main analysis of the modified walk-on-spheres chain stopped at a deterministic time (thus uniformly for all points in the domain and all samples), as opposed to the one in [23] which is random and very difficult to control. Here we first estimate the sup-norm of $u-u_M$, where $u$ is the solution to (1.1) represented probabilistically by (2.2), whilst $u_M$ is given by (2.20); the estimates are given in terms of the regularity of the data, the geometry of the boundary and the parameter $M$. If the annular diameter is finite then in fact the estimates depend only on the diameter and the annular diameter or on the parameter $\delta$ (the convex defectiveness). Furthermore, Subsection 2.4 introduces the Monte Carlo estimator $u_M^N$ (see (1.5)) for $u_M$ and contains the main results, namely Theorem 2.26 and Corollary 2.28. Subsection 2.5 contains the main extensions we need in order to deal with the walk-on-spheres algorithm. Fundamentally, in order to construct either (2.20) or (1.5) we need to extend the values of the boundary data $g$ inside the domain, and we do this under some regularity conditions.

Section 3 contains the neural network consequences of the main probabilistic results and benefits from the very careful preliminary construction of the modified walk on spheres. We should only point out the key fact, visible from (1.5), that once we replace $g$, $f$ and $\tilde{r}$ by neural networks, the function $u_M^N$ also becomes a neural network. A word is in order here about the distance function $r$, the distance to the boundary. In the original walk on spheres, one uses $r$ for the construction of the radius of the spheres. We modified this into $\tilde{r}$. The benefit is that if we have an approximation of $r$ by a neural network, we can easily construct $\tilde{r}$ which is already a ReLU neural network. This avoids complications which arise in [23] from the approximation of $r$ by a neural network after the Monte Carlo estimator is constructed. The rest of the section is a judicious counting of the size of the neural network obtained from $u_M^N$ by replacing $f$ and $g$ with their corresponding approximating networks.

Finally, Section 5 is devoted to several numerical tests based on the Monte Carlo approach proposed and analyzed in the previous sections. Some key theoretical bounds are numerically validated and the PDE (2.1) below is numerically solved for some relevant domains in $\mathbb{R}^d$ for $d=10$ and $d=100$.

2 The presentation of the main results

As we already discussed, this work concerns probabilistic representations and their DNN counterpart for the solution $u$ to the problem

\left\{\begin{array}{ll}\frac{1}{2}\Delta u=-f&\text{ in }D\subset\mathbb{R}^d\\[2pt] \phantom{\frac{1}{2}\Delta}u=g&\text{ on }\partial D.\end{array}\right. \qquad (2.1)

Throughout this paper, $D$ is a bounded domain in $\mathbb{R}^d$, $f$ is bounded on $D$ and $g$ is continuous on the boundary $\partial D$. Further regularity shall also be imposed on $D\subset\mathbb{R}^d$, $f$ and $g$, so let us fix some notations. We say that a set $D\subset\mathbb{R}^d$ is of class $C^k$ if its boundary $\partial D$ can be locally represented as the graph of a $C^k$ function. We write $h\in C(D)$ to say that $h:D\rightarrow\mathbb{R}$ is continuous on $D$. $L^p(D)$ is the standard Lebesgue space with norm denoted by $|\cdot|_{L^p(D)}$. For a bounded function $h:D\rightarrow\mathbb{R}$, that is for $h\in L^\infty(D)$, we denote by $|h|_\infty$ the essential sup-norm of $h$. For $\alpha\in[0,1]$ and $h:D\rightarrow\mathbb{R}$ an $\alpha$-Hölder function for $\alpha\in(0,1)$ (or Lipschitz, for $\alpha=1$), we set $|h|_\alpha:=\sup_{x,y\in D}\frac{|h(x)-h(y)|}{|x-y|^\alpha}$.

Before we proceed, let us remark that one could also consider the anisotropic operator $\nabla\cdot K\nabla$ instead of $\Delta$, where $K$ is a (homogeneous) positive definite symmetric matrix, without altering the forthcoming results. This is mainly due to the following straightforward change-of-variables lemma. For completeness, we also include its short proof in the smooth case.

Lemma 2.1.

For any given $d\times d$ symmetric, positive-definite matrix $K$, assume that $u$ is a classical solution to (2.1) with $\Delta$ replaced by $\nabla\cdot K\nabla$. If we take a $d\times d$ matrix $A$ such that $AA^T=K$, then, denoting $D_A:=A^{-1}(D)$ and $v(x):=u(Ax)$, $f_A(x):=f(Ax)$, $g_A(x):=g(Ax)$, $x\in D_A$, we have

\frac{1}{2}\Delta v=-f_A\ \text{ in }D_A,\qquad v=g_A\ \text{ on }\partial D_A.

Proof in Subsection 4.1.
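As a minimal illustration of how Lemma 2.1 would be used in practice (only a sketch, under the assumption that the data are given as vectorized Python callables), one can factor $K=AA^T$, e.g. by a Cholesky decomposition, and pull back the domain and the data to $D_A$:

```python
import numpy as np

def reduce_anisotropic_problem(K, f, g, indicator_D):
    """Change of variables of Lemma 2.1: given K = A A^T (here via a Cholesky
    factor), return the transformed data on D_A = A^{-1}(D), so that the
    anisotropic problem reduces to a Poisson problem for the Laplacian."""
    A = np.linalg.cholesky(K)                 # any A with A A^T = K works
    f_A = lambda x: f(x @ A.T)                # f_A(x) = f(Ax), points stored row-wise
    g_A = lambda x: g(x @ A.T)                # g_A(x) = g(Ax)
    in_D_A = lambda x: indicator_D(x @ A.T)   # x in D_A  <=>  Ax in D
    return A, f_A, g_A, in_D_A

# If v solves (1/2) Delta v = -f_A on D_A with v = g_A on the boundary,
# then u(y) = v(A^{-1} y) solves the original anisotropic problem on D.
```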

2.1 Probabilistic representation for Laplace equation and exit time estimates

We fix $B^0(t)$, $t\geq 0$, to be a standard Brownian motion on $(\Omega,\mathcal{F},\mathcal{F}_t,\mathbb{P})$ which starts from zero, that is $\mathbb{P}(B^0(0)=0)=1$. Then we set

B^x(t):=x+B^0(t),\quad t\geq 0,\ x\in\mathbb{R}^d,

and recall that the law $\mathbb{P}^x:=\mathbb{P}\circ(B^x(\cdot))^{-1}$ on the path space $C([0,\infty);\mathbb{R}^d)$ is precisely the law of the Brownian motion starting from $x\in\mathbb{R}^d$.

By $\tau_{D^c}:=\tau^x_{D^c}$ we denote the first hitting time of $D^c:=\mathbb{R}^d\setminus D$ by $(B^x(t))_{t\geq 0}$, namely

\tau_{D^c}(\omega):=\inf\{t>0:B^x(t,\omega)\in D^c\},\quad\omega\in\Omega.

The following result is the fundamental starting point of this work. It is standard for sufficiently regular data, but under the next assumptions we refer to [18, Theorem 6].

Theorem 2.2 ([18]).

Let $g\in C(\partial D)$ and $f\in L^\infty(D)$. Then there exists a unique function $u\in C(D)\cap H^1_{loc}(D)$ such that $u$ is a (weak) solution to problem (2.1). Moreover, it is given by

u(x)=\mathbb{E}\{g(B^x(\tau_{D^c}))\}+\mathbb{E}\left\{\int_0^{\tau_{D^c}}f(B^x(t))\,dt\right\},\quad x\in D. \qquad (2.2)
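As a naive baseline (and only as an illustration, with data chosen here by us, not taken from the paper), the representation (2.2) can be simulated directly by time-discretized Brownian paths stopped at the exit from $D$; below $D$ is the unit ball, $f\equiv 0$ and $g(y)=y_1$, so the exact solution is $u(x)=x_1$. The walk-on-spheres method revisited in Subsection 2.2 replaces this slow path simulation.

```python
import numpy as np

def u_rep_estimate(x, n_paths=20000, dt=1e-3, max_steps=50000, rng=None):
    """Naive Monte Carlo for the representation (2.2) on the unit ball with the
    illustrative data f = 0 and g(y) = y_1 (so the exact solution is u(x) = x_1):
    Brownian paths are advanced with an Euler time step until they leave D."""
    rng = rng or np.random.default_rng(0)
    d = len(x)
    B = np.tile(np.asarray(x, float), (n_paths, 1))
    done = np.zeros(n_paths, dtype=bool)
    for _ in range(max_steps):
        alive = ~done
        if not alive.any():
            break
        B[alive] += np.sqrt(dt) * rng.standard_normal((alive.sum(), d))
        done |= np.linalg.norm(B, axis=1) >= 1.0        # exit from the unit ball
    return B[:, 0].mean()   # g(y) = y_1 evaluated at the (approximate) exit points

# Example: u_rep_estimate([0.3, 0.0, 0.0]) should be close to 0.3, up to the
# time-discretization bias and the Monte Carlo error.
```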

Let $v_D:\mathbb{R}^d\longrightarrow\mathbb{R}_+$ be given by

v_D(x):=\mathbb{E}\{\tau^x_{D^c}\},\quad x\in\mathbb{R}^d. \qquad (2.3)

Most of the main estimates obtained in this paper are expressed in terms of the sup-norm of $v_D$ in the proximity of the boundary of $D$ (see e.g. (2.21)). In a following paragraph we shall explore such estimates for $v_D$ for domains that satisfy the uniform exterior ball condition. Before that, let us start with the following consequence of 2.2:

Corollary 2.3.

The following assertions hold for $v_D$ given by (2.3).

  1. i)

$|v_D|_\infty\leq{\sf diam}(D)^2/d$.

  2. ii)

$v_D\in C^\infty(D)$ is the solution to the Poisson problem

-\frac{1}{2}\Delta v_D=1\ \text{ in }D,\qquad v_D=0\ \text{ on }\partial D. \qquad (2.4)

Proof in Subsection 4.1.

The annular diameter of a set in $\mathbb{R}^d$ and exit time estimates

Now we explore more refined bounds for $v_D(x):=\mathbb{E}\{\tau^x_{D^c}\}$, $x\in D$, aiming at providing a general class of domains $D\subset\mathbb{R}^d$ for which $v_D(x)\leq{\rm adiam}(D)\,d(x,\partial D)$, where ${\rm adiam}(D)$ is a sort of annular diameter of $D$, which in particular is a one-dimensional parameter depending on $D$. For more details, see (2.7) below for the precise definition, and 2.6 for the precise result.

The first step here in understanding the exit problem from a domain which is not necessarily convex is driven by our first model, namely the annulus defined for $0<R_0<R_1$ as

A(a,R_0,R_1)=\{x\in\mathbb{R}^d:R_0<|x-a|<R_1\}.

We set $A(R_0,R_1):=A(0,R_0,R_1)$.

The result in this direction is the following.

Proposition 2.4.

Take $d\geq 3$ and $D:=A(R_0,R_1)$. Then, for every point $x\in D$,

v_D(x)=\mathbb{E}\{\tau^x_{D^c}\}\leq d(x,\partial D)\,\frac{(R_1-R_0)R_1}{R_0}. \qquad (2.5)

Proof in Subsection 4.1.

Now we extend the above result to a larger class of domains, namely those that satisfy the uniform exterior ball condition. To do this in a quantitative way, we first need to define a notion of annular diameter of a set, as follows:

Definition 2.1.

For a bounded domain $D\subset\mathbb{R}^d$ and $x\in\partial D$, we set

{\sf adiam}(D)_x:=\inf\left\{\frac{(R_1-R_0)R_1}{R_0}:\exists\, a\in\mathbb{R}^d\mbox{ such that }x\in\partial A(a,R_0,R_1)\mbox{ and }D\subset A(a,R_0,R_1)\right\}, \qquad (2.6)

with the convention $\inf\emptyset:=\infty$. Furthermore, we set

{\sf adiam}(D):=\sup_{x\in\partial D}{\sf adiam}(D)_x=\sup_{x\in\partial D}\frac{(R_{1,x}-R_{0,x})R_{1,x}}{R_{0,x}}. \qquad (2.7)

Now, for a point $x$, taking $R_{0,x}$ and $R_{1,x}$ such that the above infimum is attained, it is easy to see using the triangle inequality that $R_{1,x}-R_{0,x}\leq{\sf diam}(D)$ for any $x\in\partial D$, and thus we get that

{\sf adiam}(D)_x\leq{\sf diam}(D)\left(1+\frac{{\sf diam}(D)}{R_{0,x}}\right). \qquad (2.8)

This is strongly related to the exterior ball condition at $x$, since the latter holds if and only if ${\sf adiam}(D)_x<\infty$. In fact, if the exterior ball condition holds with the radius of the exterior ball equal to $r_0$, then we can choose $R_{0,x}=r_0$ and $R_{1,x}=r_0+{\sf diam}(D)$.

The above discussion leads to the following formal statement.

Proposition 2.5.

A bounded domain $D\subset\mathbb{R}^d$ satisfies the uniform exterior ball condition if and only if ${\sf adiam}(D)<\infty$. If the radius of the exterior ball is at least $r_0>0$, we can estimate

{\sf adiam}(D)\leq\frac{\big(r_0+{\sf diam}(D)\big)\,{\sf diam}(D)}{r_0}.

However, for instance, a ball centered at $0$ from which we remove a cone with vertex at the center does not have a finite ${\sf adiam}$. On the other hand, a convex bounded domain obviously has finite ${\sf adiam}$, and in fact this is equal to the diameter of the set. Indeed, one can see this by taking a tangent exterior ball of radius $R_0$ and taking $R_1=R_0+{\sf diam}(D)$. Letting $R_0$ tend to infinity, we deduce that for a convex set $D$ we actually have ${\sf adiam}(D)={\sf diam}(D)$.

Now we can present the main estimate of this paragraph.

Proposition 2.6.

If $D\subset\mathbb{R}^d$ has ${\sf adiam}(D)<\infty$, then for any $x\in D$,

v(x):=\mathbb{E}\{\tau^x_{D^c}\}\leq d(x,\partial D)\,{\sf adiam}(D).

In particular, using (2.8) and (2.7), we have

v(x)\leq 2\,d(x,\partial D)\,{\sf diam}(D)\left(1+\frac{{\sf diam}(D)}{R_0}\right),\quad x\in D,

where recall that $\tau_{D^c}$ is the first exit time from $D$ and $R_0:=\inf\{R_{0,x}:x\in\partial D\}$.

Proof in Subsection 4.1.

2.2 Walk-on-Spheres (WoS) and $\varepsilon$-shell estimates revisited

An important benefit of representation (2.2) is that the solution $u$ may be numerically approximated by the empirical mean of i.i.d. realizations of the random variables under the expectation, thanks to the law of large numbers. One way to construct such realizations is to simulate a large number of paths of a Brownian motion that starts at $x\in D$ and is stopped at the boundary $\partial D$. However, as introduced by Muller in [40], there is a much more (numerically) efficient way of constructing such realizations, based on the idea that (2.2) does not require the entire knowledge of how the Brownian motion reaches $\partial D$. This is clearer if one considers the case $f\equiv 0$, when the only information required in (2.2) is the location of the Brownian motion at the hitting point of $\partial D$. In this subsection we revisit and enhance Muller's method.

For any $x\in D$ let $r(x)\in[0,{\sf diam}(D)]$ denote the distance from $x$ to $\partial D$, or equivalently, the radius of the largest sphere centered at $x$ and contained in $D$, that is

r(x):=\inf\{|x-y|:y\in\partial D\}=\sup\{r>0:B(x,r)\subseteq D\}. \qquad (2.9)

Clearly, $r$ is a Lipschitz function.

Recall that the standard WoS algorithm introduced by Muller [40] (see also [45], [14] or [13] for more recent developments) is based on constructing a Markov chain that steps on spheres of radius $r(x)$, where $x\in D$ denotes its current position. However, it is often the case, especially in practice, that merely an approximation $\overline{r}$ of the distance function is available, and not the exact $r$. This is the case, for example, if for computational reasons $r$ is simply approximated by the (computationally cheaper) distance function to a polygonal surrogate for the domain $D$. Another situation, which is in fact central to this study, is given in Section 3 below, where $r$ is approximated by a neural network. In both cases, $r$ is approximated by a function $\overline{r}$ with a certain error. It turns out that considering the chain which walks on spheres of radius $\overline{r}(x)$, $x\in D$, exhibits certain difficulties regarding the error analysis and also the construction of the chain itself. We do not go into more detail at this point, but we refer the reader to 2.23, iii)-iv) below for a more technical explanation of these issues.

Our strategy is to resolve both of the above difficulties at once, by developing from scratch the entire analysis in terms of (a modification of) $\overline{r}$, and not in terms of $r$ as is typically done. This motivates the following concept of distance.

Definition 2.7.

Let $D\subset\mathbb{R}^d$ be a bounded open set and $r$ the distance function to the boundary. Given $\varepsilon\geq 0$ and $\beta\in(0,1]$, a Lipschitz function $\widetilde{r}:D\rightarrow[0,{\sf diam}(D)]$ is called a $(\beta,\varepsilon)$-distance on $D$ if

i)\;0\leq\widetilde{r}\leq r\ \mbox{ on }D\qquad\mbox{ and }\qquad ii)\;\widetilde{r}\geq\beta r\ \mbox{ on }D_\varepsilon:=\{x\in D:r(x)\geq\varepsilon\}.

When $\varepsilon=0$ we say that $\widetilde{r}$ is a $\beta$-distance, and if in addition $\beta=1$ then it is obvious that $\widetilde{r}=r$.

Notice that if $\tilde{r}$ is a $(\beta,\varepsilon)$-distance, then it is also a $(\beta,\varepsilon')$-distance for any smaller $\varepsilon'<\varepsilon$. Thus, if we fix $\varepsilon_0$ and $\tilde{r}$ is a $(\beta,\varepsilon_0)$-distance, then $\tilde{r}$ is also a $(\beta,\varepsilon)$-distance for any $\varepsilon<\varepsilon_0$.

Remark 2.8.

Suppose that $\phi_r:D\rightarrow[0,\infty)$ is a Lipschitz function such that $|\phi_r-r|_\infty\leq\epsilon$. If $\varepsilon>2\epsilon$ and $0<\beta\leq 1-\frac{2\epsilon}{\varepsilon}$, then a simple computation yields that

\widetilde{r}(x):=(\phi_r(x)-\epsilon)^+,\quad x\in D,

is a $(\beta,\varepsilon)$-distance on $D$. For example, if we take $\varepsilon=3\epsilon$, we can choose $\beta=1/3$; therefore, given any $\phi_r$ which is a Lipschitz $\epsilon$-approximation of $r$, there exists $\widetilde{r}$ which is a $(1/3,3\epsilon)$-distance. The moral is that we can always work with a $(\beta,\varepsilon)$-distance with $\beta\geq 1/3$.

This example already anticipates the amenability of this $\tilde{r}$ to ReLU neural networks. Indeed, the positive part function $x^+$ is precisely the nonlinear activation function and, from this standpoint, if $\phi_r$ is a ReLU neural network, then $\tilde{r}$ is also a ReLU neural network. More on this in Section 3.
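A minimal sketch of this observation (with a hypothetical $\phi_r$ standing in for a trained network): composing $\phi_r$ with one additional ReLU unit realizes the $(\beta,\varepsilon)$-distance of Remark 2.8.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def make_r_tilde(phi_r, eps):
    """Given a ReLU network phi_r approximating the distance r with sup-error eps,
    the (1/3, 3*eps)-distance of Remark 2.8 is obtained by appending one more
    ReLU unit: r_tilde(x) = relu(phi_r(x) - eps)."""
    return lambda x: relu(phi_r(x) - eps)

# Illustration on the unit ball, where r(x) = 1 - |x|; phi_r below is a hypothetical
# surrogate for a trained network, perturbed within the tolerance eps:
eps = 0.05
phi_r = lambda x: np.clip(1.0 - np.linalg.norm(x, axis=-1) + 0.8 * eps, 0.0, None)
r_tilde = make_r_tilde(phi_r, eps)
x = np.zeros((1, 3))
print(r_tilde(x))   # <= r(0) = 1 and >= (1/3) * r(0), as guaranteed by Remark 2.8
```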

r~\widetilde{r}-WoS chain.

Let $U_n:\Omega\rightarrow S(0,1)$, $n\geq 1$, be defined (for simplicity) on the same probability space $(\Omega,\mathcal{F},\mathbb{P})$, independent and uniformly distributed, where $S(0,1)\subset\mathbb{R}^d$ is the sphere centered at the origin with radius $1$. Let $(\widetilde{\mathcal{F}}_n)_{n\geq 0}$ be the filtration generated by $(U_n)_{n\geq 0}$, where $U_0=0$, namely

~n:=σ(Ui:in),n0.\widetilde{\mathcal{F}}_{n}:=\sigma(U_{i}\;:\;i\leq n),\;n\geq 0.

Also, let ε0\varepsilon\geq 0, β(0,1]\beta\in(0,1], and r~\widetilde{r} be a (β,ε)(\beta,\varepsilon)-distance on DD. For each xDx\in D, we construct the chain (Xnx)n0({X}^{x}_{n})_{n\geq 0} recursively by

X0x\displaystyle{X}^{x}_{0} :=x\displaystyle:=x (2.10)
Xn+1x\displaystyle{X}^{x}_{n+1} :=Xnx+r~(Xnx)Un+1,n0.\displaystyle:={X}^{x}_{n}+\widetilde{r}({X}^{x}_{n})U_{n+1},\;n\geq 0. (2.11)

Clearly, (Xnx)n0({X}^{x}_{n})_{n\geq 0} is a homogeneous Markov chain in DD with respect to the filtration (~n)n0(\widetilde{\mathcal{F}}_{n})_{n\geq 0}, which starts from xx and has transition kernel given by

Pf(x)=S(0,1)f(x+r~(x)z)σ(dz),xD,Pf(x)=\int_{S(0,1)}f(x+\widetilde{r}(x)z)\;\sigma(dz),\quad x\in D,

where σ\sigma is the normalized surface measure on S(0,1)S(0,1); that is, 𝔼{f(Xnx)}=Pnf(x),xD,f\mathbb{E}\{f(X_{n}^{x})\}=P^{n}f(x),x\in D,f bounded and measurable. We name it an r~\widetilde{r}-WoS chain.
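For the reader's convenience, here is a minimal Python (numpy) sketch of the recursion (2.10)-(2.11); it assumes that r_tilde is any callable (β,ε)(\beta,\varepsilon)-distance and samples the uniform distribution on S(0,1)S(0,1) by normalizing a standard Gaussian vector.

import numpy as np

def wos_step(x, r_tilde, rng):
    """One step (2.11) of the r_tilde-WoS chain: jump uniformly on the sphere of
    radius r_tilde(x) centered at the current position x."""
    u = rng.standard_normal(x.shape[-1])
    u /= np.linalg.norm(u)                  # uniform direction on S(0,1)
    return x + r_tilde(x) * u

def wos_path(x0, r_tilde, M, rng=None):
    """The positions X_0, ..., X_M of the chain started at x0, cf. (2.10)-(2.11)."""
    rng = rng or np.random.default_rng()
    path = [np.asarray(x0, dtype=float)]
    for _ in range(M):
        path.append(wos_step(path[-1], r_tilde, rng))
    return np.stack(path)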

We return now to problem (2.1)

12Δu=f in D with u=g on D,\begin{array}[]{ll}\frac{1}{2}\Delta u=-f\textrm{ in }D\text{ with }u=g\textrm{ on }\partial D,\end{array}

which by Theorem 2.2 admits the probabilistic representation (2.2). Further, we consider the following sequence of stopping times

τ0x=0,τn+1x=inf{t>τnx:|Bx(t)Bx(τnx)|r~(Bx(τnx))},n0.\tau_{0}^{x}=0,\quad\tau^{x}_{n+1}=\inf\{t>\tau_{n}^{x}:|B^{x}(t)-B^{x}(\tau^{x}_{n})|\geq\widetilde{r}(B^{x}(\tau^{x}_{n}))\},\;n\geq 0. (2.12)
Remark 2.9.

It is clear that if ε=0\varepsilon=0, i.e. r~\widetilde{r} is a β\beta-distance, then limnτnx=τDcx0-a.s.\lim\limits_{n}\tau_{n}^{x}=\tau_{D^{c}}^{x}\quad\mathbb{P}^{0}\mbox{-a.s.}

The following result is a generalization of Lemma 3.4 from [23]:

Corollary 2.10.

Let DdD\subset\mathbb{R}^{d} be a bounded open set, gC(D),fL(D)g\in C(\partial D),f\in L^{\infty}(D), and uu be the solution to (2.1). Also, for φ:B(0,1)\varphi:B(0,1)\rightarrow\mathbb{R} bounded and measurable, set

K0φ:=𝔼{0τB(0,1)c0φ(B0(t))𝑑t}.K_{0}\varphi:=\mathbb{E}\left\{\int_{0}^{\tau^{0}_{B(0,1)^{c}}}\varphi(B^{0}(t))\;dt\right\}.

If r~\widetilde{r} is a β\beta-distance ( i.e. it is a (β,0)(\beta,0)-distance) then the following assertions hold:

  1. i)

    For all xDx\in D we have

    u(x)=𝔼{g(Bx(τDcx))}+𝔼{k1r~2(Xk1x)K0Fx,k},u(x)=\mathbb{E}\{g(B^{x}(\tau^{x}_{D^{c}}))\}+\mathbb{E}\left\{\sum\limits_{k\geq 1}\widetilde{r}^{2}(X^{x}_{k-1})K_{0}F_{x,k}\right\},

    where Fx,k(y)=f(Xk1x+r~(Xk1x)y),yB(0,1)F_{x,k}(y)=f(X^{x}_{k-1}+\widetilde{r}(X^{x}_{k-1})y),y\in B(0,1), whilst K0K_{0} acts on Fx,kF_{x,k} with respect to the yy variable.

  2. ii)

The mapping (B(0,1))Aμ(A):=dK01A[0,1]\mathcal{B}(B(0,1))\ni A\mapsto\mu(A):=dK_{0}1_{A}\in[0,1] renders a probability measure μ\mu on B(0,1)B(0,1) with density dG(0,y),yB(0,1)dG(0,y),y\in B(0,1), where G(x,y)G(x,y) is the Green function associated to 12Δ-\frac{1}{2}\Delta on B(0,1)B(0,1). More explicitly, for d3d\geq 3 we have that G(0,y)G(0,y) is proportional to |y|2d1,|y|<1.|y|^{2-d}-1,|y|<1.

  3. iii)

    Let YY be a real valued random variable defined on (Ω,,)\left(\Omega,\mathcal{F},\mathbb{P}\right), with distribution μ\mu, such that YY is independent of (Un)n(U_{n})_{n}. Then for all xDx\in D we have

    u(x)=𝔼{g(Bx(τDcx))}+1d𝔼{k1r~2(Xk1x)f(Xk1x+r~(Xk1x)Y)}.u(x)=\mathbb{E}\{g(B^{x}(\tau^{x}_{D^{c}}))\}+\frac{1}{d}\mathbb{E}\left\{\sum\limits_{k\geq 1}\widetilde{r}^{2}(X^{x}_{k-1})f(X^{x}_{k-1}+\widetilde{r}(X^{x}_{k-1})Y)\right\}. (2.13)

Proof in Subsection 4.1.

Consider the r~\widetilde{r}-WoS chain described above, and for each xDx\in D let us define the required number of steps to reach the ε\varepsilon-shell of D\partial D by

Nεx:=inf{n0:r(Xnx)<ε},{N}_{\varepsilon}^{x}:=\inf\{n\geq 0\;:\;r({X}_{n}^{x})<\varepsilon\},

which is clearly an (~n)(\widetilde{\mathcal{F}}_{n})-stopping time.

The estimates to be obtained in the next subsection, and consequently the size of the DNN that we are going to construct in order to approximate the solution to (2.1), depend on how big NεxN_{\varepsilon}^{x} is. The goal of this section is to provide upper bounds for NεxN^{x}_{\varepsilon}, the number of steps the walk on spheres needs to get to the ε\varepsilon-shell. These estimates are first obtained for general domains, and then improved considerably for “defective convex” domains which are introduced in 2.12 below. We provide what are, to our knowledge, the strongest estimates when compared with the currently available literature, as well as rigorous proofs that rely on the general technique of Lyapunov functions. Some of these results have something in common with the results in [8], though the estimates in there are not clearly determined in terms of the dimension.

It is essentially known that for a general bounded domain in d\mathbb{R}^{d}, the average of NεxN_{\varepsilon}^{x} grows with respect to ε\varepsilon at most as (𝖽𝗂𝖺𝗆(D)/ε)2({\sf diam}(D)/\varepsilon)^{2} (see [33, Theorem 5.4] and its subsequent discussion), hence, by Markov's inequality, (NεxM)\mathbb{P}(N_{\varepsilon}^{x}\geq M) can be bounded by (𝖽𝗂𝖺𝗆(D)/ε)2/M({\sf diam}(D)/\varepsilon)^{2}/M. The next result shows that, in fact, (NεxM)\mathbb{P}(N_{\varepsilon}^{x}\geq M) decays exponentially with respect to MM, independently of the dimension dd.

Proposition 2.11.

Let DdD\subset\mathbb{R}^{d} be a bounded domain, ε>0\varepsilon>0, β(0,1]\beta\in(0,1] and r~\widetilde{r} be a (β,ε)(\beta,\varepsilon)-distance. Then for any xDx\in D,

𝔼{eβ2ε24𝖽𝗂𝖺𝗆(D)2Nεx}2,\mathbbm{E}\left\{e^{\frac{\beta^{2}\varepsilon^{2}}{4{\sf diam}(D)^{2}}N^{x}_{\varepsilon}}\right\}\leq 2, (2.14)

where 𝖽𝗂𝖺𝗆(D){\sf diam}(D) denotes the diameter of DD. In particular,

(NεxM)2eβ2ε24𝖽𝗂𝖺𝗆(D)2Mfor all M.\mathbb{P}(N^{x}_{\varepsilon}\geq M)\leq 2e^{-\frac{\beta^{2}\varepsilon^{2}}{4{\sf diam}(D)^{2}}M}\quad\mbox{for all }M\in\mathbb{N}.

Proof in Subsection 4.2.

If the domain DD is convex, then the average number of steps required by the WoS to reach the ε\varepsilon-shell is of order log(1/ε)\log(1/\varepsilon). This was shown by Muller in [40], and also reconsidered in [8]. However, it is of high importance to the present work to track an explicit constant in front of log(1/ε)\log(1/\varepsilon) in terms of the space dimension dd. In this subsection we aim to clarify (the proof of) this result and extend it from convex domains to a larger class of domains, by employing the technique of Lyapunov functions from ergodic theory. More importantly, as in the case of 2.11, we are strongly interested in tail estimates for NεxN_{\varepsilon}^{x}, not merely in its expected value. In a nutshell, the idea is to show that the square root of the distance function to D\partial D pushes the WoS chain towards the boundary at geometric speed. This technique is meant to be easily extended to more general operators in a future work.

The following definition settles the class of domains for which the aforementioned estimate is going to hold.

Definition 2.12.

We say that a Lipschitz bounded domain DdD\subset\mathbb{R}^{d} is “δ\delta-defective convex” if δ<1\delta<1 and

Δrδ2rad(D)weakly on D,\Delta r\leq\frac{\delta}{2\;\rm{rad}(D)}\quad\mbox{weakly on }D, (2.15)

where we recall that rr is the distance function to the boundary D\partial D, whilst rad(D):=supxDr(x)\rm{rad}(D):=\sup\limits_{x\in D}r(x).

Remark 2.13.

Recall that by [1], DD is convex if and only if the signed distance function rsr_{s} is superharmonic on d\mathbb{R}^{d}, where

rs:=r on D¯ and rs:=r on dD.r_{s}:=r\mbox{ on }\overline{D}\mbox{ and }r_{s}:=-r\mbox{ on }\mathbb{R}^{d}\setminus D.

Hence a defective convex domain as defined by (2.15) is more general than a convex domain; in fact, even if (2.15) holds (on DD) with δ=0\delta=0, it is not necessarily true that DD is convex, as explained in [1]. To give an intuition on how a defective convex domain could differ from a convex domain, imagine a ball in 3D3D which is deformed into a defective convex domain by squeezing it slightly, or a straight cylinder in 3D3D which is bent mildly. However, a defective convex domain can differ seriously from a convex domain, as revealed by the next two examples.

Example 2.14.

Let A(R1,R2)dA(R_{1},R_{2})\subset\mathbb{R}^{d} be an annulus of radii R1<R2R_{1}<R_{2}, namely A(R1,R2)={xd;R1<|x|<R2}A(R_{1},R_{2})=\{x\in\mathbb{R}^{d};R_{1}<|x|<R_{2}\}. If R2R1<1+δd1\frac{R_{2}}{R_{1}}<1+\frac{\delta}{d-1} for some δ<1\delta<1, then A(R1,R2)A(R_{1},R_{2}) is a δ\delta-defective convex domain.

Proof in Subsection 4.2.

Example 2.15.

We take Γ\Gamma to be a connected, compact orientable C2C^{2} hypersurface in d\mathbb{R}^{d}, with d2d\geq 2, endowed with the Riemannian metric gg induced by the embedding. We denote by k1(x)k2(x)kd1(x)k_{1}(x)\leq k_{2}(x)\leq\ldots\leq k_{d-1}(x) the principal curvatures at xΓx\in\Gamma. The orientation is specified by a globally defined unit normal vector field n:Γ𝕊d1n:\Gamma\to\mathbb{S}^{d-1}. Then there exists a positive thickness ε{\varepsilon}, such that the tubular neighbourhood given by

Dε:={x+εtn(x)d|(x,t)Γ×(0,1)}D_{\varepsilon}:=\left\{x+{\varepsilon}\,t\,n(x)\in\mathbb{R}^{d}\ \big{|}\ (x,t)\in\Gamma\times(0,1)\right\}\, (2.16)

is δ\delta-defective convex. In fact, ε{\varepsilon} can be chosen explicitly in terms of the principal curvatures of Γ\Gamma.

Proof in Subsection 4.2.

Proposition 2.16.

Let DdD\subset\mathbb{R}^{d} be δ\delta-defective convex as in (2.15). Let ε>0\varepsilon>0, β(0,1]\beta\in(0,1] and r~\widetilde{r} be a (β,ε)(\beta,\varepsilon)-distance, and consider PP the transition kernel of the r~\widetilde{r}-WoS Markov chain (Xn)n0(X_{n}^{\cdot})_{n\geq 0}. If we set V(x):=r(x)1/2,xDV(x):=r(x)^{1/2},x\in D, then

PV(x)(1β2(1δ)4d)V(x),xD a.e.PV(x)\leq\left(1-\frac{\beta^{2}(1-\delta)}{4d}\right)V(x),\quad x\in D\mbox{ a.e.} (2.17)

In particular,

(Nεx>M)(r(XMx)ε)(1β2(1δ)4d)MV(x)ε,xD,\mathbb{P}(N_{\varepsilon}^{x}>M)\leq\mathbb{P}(r(X_{M}^{x})\geq\varepsilon)\leq\left(1-\frac{\beta^{2}(1-\delta)}{4d}\right)^{M}\frac{V(x)}{\sqrt{\varepsilon}},\quad x\in D, (2.18)

and if δd:=1β2(1δ)4d\delta_{d}:=1-\frac{\beta^{2}(1-\delta)}{4d}, then for any 1<a<1/δd1<a<1/\delta_{d}

a𝔼{Nεx}𝔼{aNεx}1+a1aδdV(x)ε,xD.a^{\mathbb{E}\left\{N_{\varepsilon}^{x}\right\}}\leq\mathbb{E}\left\{a^{N_{\varepsilon}^{x}}\right\}\leq 1+\frac{a}{1-a\delta_{d}}\frac{V(x)}{\sqrt{\varepsilon}},\quad x\in D. (2.19)

Proof in Subsection 4.2.
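To give a rough feeling for (2.18), the following Python (numpy) sketch compares the empirical tail of NεxN_{\varepsilon}^{x} with the right-hand side of (2.18) on the unit ball (which is convex, hence δ\delta-defective convex with δ=0\delta=0), using r~=r\widetilde{r}=r and β=1\beta=1; the starting point, ε\varepsilon, MM and the number of runs are hypothetical choices made only for this illustration.

import numpy as np

# Empirical illustration of the tail estimate (2.18) on the unit ball, with r_tilde = r.
rng = np.random.default_rng(0)
d, eps, M, runs = 5, 0.01, 100, 2000
r = lambda x: 1.0 - np.linalg.norm(x)
x0 = np.zeros(d); x0[0] = 0.5                       # starting point with r(x0) = 0.5
steps = np.empty(runs, dtype=int)
for i in range(runs):
    x, n = x0.copy(), 0
    while r(x) >= eps:                              # N_eps^x: first entry into the eps-shell
        u = rng.standard_normal(d)
        x = x + r(x) * u / np.linalg.norm(u)
        n += 1
    steps[i] = n
empirical_tail = np.mean(steps > M)
bound = (1.0 - 1.0 / (4 * d)) ** M * np.sqrt(r(x0) / eps)   # right-hand side of (2.18)
print(empirical_tail, bound)                        # the empirical tail should not exceed the bound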

Remark 2.17.

It can be shown that 2.16 still holds if condition (2.15) is satisfied merely in some strict neighbourhood of the boundary. In particular, in view of 2.15, 2.16 holds for all domains with smooth boundary, but the estimates would also depend on the thickness of the neighbourhood where condition (2.15) is fulfilled. Going even further, any bounded domain that can be uniformly approximated from inside by smooth domains enjoys, for fixed ε\varepsilon, a similar estimate with respect to MM and DD as in (2.18), with δ\delta possibly depending on ε\varepsilon. This behavior is numerically confirmed by Test 1 in Section 5 for annular hypercubes, and it is going to be analyzed theoretically in a forthcoming work.

2.3 WoS stopped at deterministic time and error analysis

Throughout this subsection we assume that uu is the solution to problem (2.1), hence 2.2 and 2.10 are applicable. Moreover, we keep all the notations from the previous subsections. Before we proceed with the main results of this subsection, let us emphasize several aspects that are essential to this work. To make the explanation simple, assume that r~=r\widetilde{r}=r, that is, the WoS chain is constructed using the (1,0)(1,0)-distance rr. Given xDx\in D, a usual way to employ the WoS Markov chain in order to approximate u(x)u(x) through the representation furnished by 2.10 is to start the chain from xx and run it until it reaches the ε\varepsilon-shell, for some given 0<ε10<\varepsilon\ll 1. In other words, (Xkx)k0(X_{k}^{x})_{k\geq 0} is usually stopped at the (random) stopping time NεxN_{\varepsilon}^{x} and uu represented by (2.2) is then approximated with

uε(x):=𝔼{g(XNεxx)}+1d𝔼{k=1Nεxr2(Xk1x)f(Xk1x+r(Xk1x)Y)}.u_{\varepsilon}(x):=\mathbb{E}\{g(X^{x}_{N_{\varepsilon}^{x}})\}+\frac{1}{d}\mathbb{E}\left\{\sum\limits_{k=1}^{N_{\varepsilon}^{x}}r^{2}(X^{x}_{k-1})f(X^{x}_{k-1}+r(X^{x}_{k-1})Y)\right\}.

The intuition behind this is that stopping the WoS chain in the ε\varepsilon-proximity of the boundary should provide a good approximation of where the Brownian motion first hits the boundary D\partial D, and estimates that certify this fact are in principle well known. As discussed in the previous subsection, the number of steps required to reach the ε\varepsilon-shell is small, especially if the domain is (defective) convex, which eventually leads to a fast numerical algorithm.

From the point of view of this work (also of [23]), the fundamental inconvenience of the above stopping rule is that it depends strongly on the starting point xx, mainly through NεxN_{\varepsilon}^{x}. In other words, although computationally efficient for estimating a single value u(x)u(x), the above point estimate uεu_{\varepsilon} of uu is expected to fail at overcoming the curse of high dimensions for solving (2.1) globally in DD. Moreover, if one aims at constructing a (deep) neural network architecture based on the above representation, as considered in [23] and also in Section 3 below, xx would be the input while NεxN_{\varepsilon}^{x} would give the number of layers; however, the latter should be independent of xx, which is obviously not the case. To deal with this architectural impediment, in [23] the authors proposed supxDNεx\sup\limits_{x\in D}N_{\varepsilon}^{x} as a random time to stop the WoS chain. However, besides the measurability and the stopping time property issues for supxDNεx\sup\limits_{x\in D}N_{\varepsilon}^{x}, it is still unclear, at least to us, whether 𝔼{supxDNεx}<\mathbb{E}\left\{\sup\limits_{x\in D}N_{\varepsilon}^{x}\right\}<\infty and whether this expectation is independent of the dimension dd.

Our approach is substantially different. Instead of stopping the WoS chain at a random stopping time, be it NεxN_{\varepsilon}^{x} or supxDNεx\sup\limits_{x\in D}N_{\varepsilon}^{x}, the idea is to stop the chain after a deterministic number of steps, say MM, independently of the starting point xx. Such a choice turns out to be feasible, and it not only avoids the above-mentioned issues concerning supxDNεx\sup\limits_{x\in D}N_{\varepsilon}^{x}, but it eventually provides a way to break the curse of high dimensions for solving (2.1), merely using the WoS algorithm but in a global fashion. Furthermore, in terms of neural networks, this strategy also renders a way of explicitly constructing a corresponding DNN architecture that could be easily sampled and, why not, further trained. Also, as already mentioned in 2.8, when we deal with neural networks in Section 3 the distance to the boundary rr needs to be replaced by an approximation given by a DNN, with a certain error. Therefore, in light of 2.7 and 2.8, we shall work instead with a (β,ε)(\beta,\varepsilon)-distance r~\widetilde{r} on DD, for some ε>0\varepsilon>0 and β(0,1]\beta\in(0,1] properly chosen. Having all these in mind, the aim of this subsection is to estimate the error of approximating the solution uu with

uM(x):=𝔼{g(XMx)+k=1Mr~2(Xk1x)K0f(Xk1x+r~(Xk1x))} for all xD,u_{M}(x):=\mathbb{E}\left\{g(X_{M}^{x})+\sum\limits_{k=1}^{M}\widetilde{r}^{2}(X^{x}_{k-1})K_{0}f(X^{x}_{k-1}+\widetilde{r}(X^{x}_{k-1})\cdot)\right\}\quad\mbox{ for all }x\in D, (2.20)

for a given (deterministic) number of steps M1M\geq 1 that does not depend on xDx\in D.

To keep the assumption on the regularity of DD as general as possible, the forthcoming estimates shall be obtained in terms of the function vv defined by (2.3) or (2.4), more precisely in terms of the behavior of vv near the boundary measured for each ε>0\varepsilon>0 by

|v|(ε):=sup{v(x):r(x):=d(x,D)ε}.|v|_{\infty}(\varepsilon):=\sup\{v(x):r(x):=d(x,\partial D)\leq\varepsilon\}. (2.21)

Though this is our primary measure of the geometry of the boundary, we can in fact refine things by defining

v(x,ε):=𝔼{v(BτNεxxx)}v(F,ε):=supxFv(x,ε) for FD.\begin{split}v(x,\varepsilon):&=\mathbb{E}\left\{v(B^{x}_{\tau_{N_{\varepsilon}^{x}}^{x}})\right\}\\ v(F,\varepsilon):&=\sup_{x\in F}v(x,\varepsilon)\text{ for }F\subset D.\end{split} (2.22)

We include here a small result which reveals the main properties we need further on.

Proposition 2.18.

We have

v(x,ε)|v|(ε).v(x,\varepsilon)\leq|v|_{\infty}(\varepsilon). (2.23)

For any domain DD and any compact FDF\subset D,

limε0v(F,ε)=0.\lim_{\varepsilon\to 0}v(F,\varepsilon)=0. (2.24)

We only point out how one can prove this, using the observation that for any stopping time τ\tau, v(Bτt)v(B_{\tau\wedge t}) is a bounded right-continuous supermartingale, hence it converges as tt\to\infty. As a consequence, we obtain that, for each xx, v(x,ε)v(x,\varepsilon) converges to 0 as ε0\varepsilon\to 0, and this convergence is monotone in ε\varepsilon; thus, by Dini's theorem, we get the uniform convergence to 0 on compact sets.

Let us first consider the case of homogeneous boundary conditions, namely g0g\equiv 0.

Proposition 2.19.

Let ε>0\varepsilon>0, β(0,1]\beta\in(0,1], r~\widetilde{r} be a (β,ε)(\beta,\varepsilon)-distance, fL(D)f\in L^{\infty}(D), MM\in\mathbb{N}^{\ast} and uu be the solution to (2.1) with g0g\equiv 0. If uMu_{M} is given by (2.20) then

|u(x)uM(x)||f|[v(x,ε)+2d𝖽𝗂𝖺𝗆(D)2eβ2ε24𝖽𝗂𝖺𝗆(D)2M] for all ε>0.|u(x)-u_{M}(x)|\leq|f|_{\infty}\left[v(x,\varepsilon)+\frac{2}{d}{\sf diam}(D)^{2}e^{-\frac{\beta^{2}\varepsilon^{2}}{4{\sf diam}(D)^{2}}M}\right]\quad\mbox{ for all }\varepsilon>0. (2.25)

In particular,

supxD|u(x)uM(x)||f|[|v|(ε)+2d𝖽𝗂𝖺𝗆(D)2eβ2ε24𝖽𝗂𝖺𝗆(D)2M] for all ε>0.\sup_{x\in D}|u(x)-u_{M}(x)|\leq|f|_{\infty}\left[|v|_{\infty}(\varepsilon)+\frac{2}{d}{\sf diam}(D)^{2}e^{-\frac{\beta^{2}\varepsilon^{2}}{4{\sf diam}(D)^{2}}M}\right]\quad\mbox{ for all }\varepsilon>0. (2.26)

Proof in Subsection 4.3.

Let us treat now the in-homogeneous Dirichlet problem, this time taking f0f\equiv 0.

Proposition 2.20.

Let ε>0\varepsilon>0, β(0,1]\beta\in(0,1], r~\widetilde{r} be a (β,ε)(\beta,\varepsilon)-distance, uu be the solution to (2.1) with gC(D¯)g\in C(\overline{D}) and f0f\equiv 0. Further, for each MM\in\mathbb{N}_{\ast} consider that uMu_{M} is given by (2.20). If gg is α\alpha-Hölder on D¯\overline{D} for some α[0,1]\alpha\in[0,1], then

|u(x)uM(x)|dα/2|g|αv(x,ε)α/2+4|g|eβ2ε24𝖽𝗂𝖺𝗆(D)2M for all ε>0|u(x)-u_{M}(x)|\leq d^{\alpha/2}|g|_{\alpha}\cdot v(x,\varepsilon)^{\alpha/2}+4|g|_{\infty}e^{-\frac{\beta^{2}\varepsilon^{2}}{4{\sf diam}(D)^{2}}M}\quad\mbox{ for all }\varepsilon>0 (2.27)

and in particular,

supxD|u(x)uM(x)|dα/2|g|α|v|α/2(ε)+4|g|eβ2ε24𝖽𝗂𝖺𝗆(D)2M for all ε>0.\sup\limits_{x\in D}|u(x)-u_{M}(x)|\leq d^{\alpha/2}|g|_{\alpha}\cdot|v|^{\alpha/2}_{\infty}(\varepsilon)+4|g|_{\infty}e^{-\frac{\beta^{2}\varepsilon^{2}}{4{\sf diam}(D)^{2}}M}\quad\mbox{ for all }\varepsilon>0. (2.28)

Proof in Subsection 4.3.

We can now superpose 2.19 and 2.20 to obtain the following key result:

Theorem 2.21.

Let ε>0\varepsilon>0, β(0,1]\beta\in(0,1], r~\widetilde{r} be a (β,ε)(\beta,\varepsilon)-distance, and uu denote the solution to (2.1) with gC(D¯)g\in C(\overline{D}) and fL(D)f\in L^{\infty}(D). Further, for each MM\in\mathbb{N}_{\ast} let uMu_{M} be given by (2.20). If gg is α\alpha-Hölder on D¯\overline{D} for some α[0,1]\alpha\in[0,1], then for all ε>0\varepsilon>0 we have

|u(x)uM(x)|dα/2|g|αvα/2(x,ε)+|f|v(x,ε)+(4|g|+2d𝖽𝗂𝖺𝗆(D)2|f|)eβ2ε24𝖽𝗂𝖺𝗆(D)2M.|u(x)-u_{M}(x)|\leq d^{\alpha/2}|g|_{\alpha}\cdot v^{\alpha/2}(x,\varepsilon)+|f|_{\infty}v(x,\varepsilon)+(4|g|_{\infty}+\frac{2}{d}{\sf diam}(D)^{2}|f|_{\infty})e^{-\frac{\beta^{2}\varepsilon^{2}}{4{\sf diam}(D)^{2}}M}.

In particular we get that

supxD|u(x)uM(x)|dα/2|g|α|v|α/2(ε)+|f||v|(ε)+(4|g|+2d𝖽𝗂𝖺𝗆(D)2|f|)eβ2ε24𝖽𝗂𝖺𝗆(D)2M.\sup_{x\in D}|u(x)-u_{M}(x)|\leq d^{\alpha/2}|g|_{\alpha}\cdot|v|_{\infty}^{\alpha/2}(\varepsilon)+|f|_{\infty}|v|_{\infty}(\varepsilon)+(4|g|_{\infty}+\frac{2}{d}{\sf diam}(D)^{2}|f|_{\infty})e^{-\frac{\beta^{2}\varepsilon^{2}}{4{\sf diam}(D)^{2}}M}.

If gCb2(D¯)g\in C^{2}_{b}(\overline{D}), then for all ε>0\varepsilon>0 we have

|u(x)uM(x)|(|Δg|2+|f|)v(x,ε)+(8|g|+2d𝖽𝗂𝖺𝗆(D)2|f|)eβ2ε24𝖽𝗂𝖺𝗆(D)2M.|u(x)-u_{M}(x)|\leq\left(\frac{|\Delta g|_{\infty}}{2}+|f|_{\infty}\right)v(x,\varepsilon)+(8|g|_{\infty}+\frac{2}{d}{\sf diam}(D)^{2}|f|_{\infty})e^{-\frac{\beta^{2}\varepsilon^{2}}{4{\sf diam}(D)^{2}}M}.

and also in particular,

supxD|u(x)uM(x)|(|Δg|2+|f|)|v|(ε)+(8|g|+2d𝖽𝗂𝖺𝗆(D)2|f|)eβ2ε24𝖽𝗂𝖺𝗆(D)2M.\sup_{x\in D}|u(x)-u_{M}(x)|\leq\left(\frac{|\Delta g|_{\infty}}{2}+|f|_{\infty}\right)|v|_{\infty}(\varepsilon)+(8|g|_{\infty}+\frac{2}{d}{\sf diam}(D)^{2}|f|_{\infty})e^{-\frac{\beta^{2}\varepsilon^{2}}{4{\sf diam}(D)^{2}}M}.

Let us point out that if 𝖺𝖽𝗂𝖺𝗆(D)<{\sf adiam}(D)<\infty (see (2.6)), then |v|(ε)|v|_{\infty}(\varepsilon) involved above may be replaced by ε𝖺𝖽𝗂𝖺𝗆(D)\varepsilon\;{\sf adiam}(D). Furthermore, when the domain is defective convex (see Subsection 2.2), we can considerably improve the above error estimates with respect to the required number of WoS steps, MM. This can be done analogously to the proofs of 2.19 and 2.20, just by replacing the tail estimate given by 2.11 with the one provided by 2.16. Therefore, we give below the precise statement, but we skip its proof.

Corollary 2.22.

In the context of 2.21, the following additional assertions hold:

  1. i)

    If DD satisfies the uniform exterior ball condition, then by 2.6, |v|(ε)|v|_{\infty}(\varepsilon) involved in the above estimate may be replaced with ε𝖺𝖽𝗂𝖺𝗆(D)\varepsilon\;{\sf adiam}(D), where recall that 𝖺𝖽𝗂𝖺𝗆(D){\sf adiam}(D) is given by (2.7).

  2. ii)

    If the domain DD is δ\delta-defective convex (δ<1\delta<1, see (2.15) for the definition) so that the conclusion from 2.16 is in force, then the factor eβ2ε24𝖽𝗂𝖺𝗆(D)2Me^{-\frac{\beta^{2}\varepsilon^{2}}{4{\sf diam}(D)^{2}}M} from the above estimate can be replaced by
    (1β2(1δ)4d)M𝖽𝗂𝖺𝗆(D)ε\left(1-\frac{\beta^{2}(1-\delta)}{4d}\right)^{M}\sqrt{\frac{{\sf diam}(D)}{\varepsilon}}.

2.4 Monte-Carlo approximations: mean versus tail estimates

We place ourselves in the same framework as before, namely: DdD\subset\mathbb{R}^{d} is a bounded domain, gC(D¯)g\in C(\overline{D}), fL(D)f\in L^{\infty}(D), and uu is the solution to (2.1).

Further, let (Uni)n1,i1(U^{i}_{n})_{n\geq 1,i\geq 1} be a family of independent and uniformly distributed random variables on S(0,1)S(0,1), r~\widetilde{r} be a (β,ε)(\beta,\varepsilon)-distance on DD for some β(0,1]\beta\in(0,1] and ε>0\varepsilon>0 (see 2.7), and set:

Xn+1x,i\displaystyle X^{x,i}_{n+1} :=Xnx,i+r~(Xnx,i)Un+1i,n0,i1.\displaystyle:=X_{n}^{x,i}+\widetilde{r}(X_{n}^{x,i})\cdot U^{i}_{n+1},\;\;n\geq 0,i\geq 1.

On the same probability space as (Uni)n1,i1(U^{i}_{n})_{n\geq 1,i\geq 1}, let (Yi)i1(Y^{i})_{i\geq 1} be iid random variables with distribution μ\mu given by 2.10, ii), such that the family (Yi)i1(Y^{i})_{i\geq 1} is independent of (Uni)n1,i1(U^{i}_{n})_{n\geq 1,i\geq 1}.

For N,MN,M\in\mathbb{N}^{\ast} let uMu_{M} be given by (2.20), and consider the Monte Carlo estimator

uMN(x):=1Ni=1N[g(XMx,i)+1dk=1Mr~2(Xk1x,i)f(Xk1x,i+r~(Xk1x,i)Yi)],xD.u_{M}^{N}(x):=\frac{1}{N}\sum_{i=1}^{N}\left[g(X^{x,i}_{M})+\frac{1}{d}\sum\limits_{k=1}^{M}\widetilde{r}^{2}(X^{x,i}_{k-1})f\left(X^{x,i}_{k-1}+\widetilde{r}(X^{x,i}_{k-1})Y^{i}\right)\right],x\in D. (2.29)

As in 2.10, iii), we have

𝔼{uMN(x)}=uM(x),xD,N1.\mathbb{E}\left\{u_{M}^{N}(x)\right\}=u_{M}(x),\quad x\in D,N\geq 1.
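For illustration purposes, the following Python (numpy) sketch implements the estimator (2.29); it assumes d3d\geq 3, that f,g,r~f,g,\widetilde{r} are given as callables on DD, and it samples YY from μ\mu via a simple rejection step for the radial density proportional to s(1sd2)s(1-s^{d-2}) on (0,1)(0,1), which corresponds (up to normalization) to the Green-function density from 2.10, ii); these implementation choices are made here only for illustration.

import numpy as np

def sample_Y(d, rng):
    """Sample Y with distribution mu from 2.10, ii) (assuming d >= 3): the radial
    part has density proportional to s*(1 - s**(d-2)) on (0, 1), sampled here by
    rejection from the proposal s = sqrt(U); the direction is uniform on S(0,1)."""
    while True:
        s = np.sqrt(rng.uniform())
        if rng.uniform() < 1.0 - s ** (d - 2):
            break
    u = rng.standard_normal(d)
    return s * u / np.linalg.norm(u)

def u_M_N(x, f, g, r_tilde, M, N, rng=None):
    """Monte Carlo estimator (2.29) of u(x), a sketch assuming that f, g and the
    (beta, eps)-distance r_tilde are callables defined on D, and that d >= 3."""
    rng = rng or np.random.default_rng()
    x = np.asarray(x, dtype=float)
    d = x.size
    total = 0.0
    for _ in range(N):                       # i = 1, ..., N independent replicas
        Y = sample_Y(d, rng)                 # one Y^i per replica, independent of the walk
        X = x.copy()
        source = 0.0
        for _ in range(M):                   # k = 1, ..., M steps of the r_tilde-WoS chain
            rad = r_tilde(X)                 # X is X_{k-1} at this point
            source += rad ** 2 * f(X + rad * Y)
            u_dir = rng.standard_normal(d)
            X = X + rad * u_dir / np.linalg.norm(u_dir)   # X becomes X_k
        total += g(X) + source / d           # X is now X_M
    return total / N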
Remark 2.23.

At this point we would like to point out that the estimator uMNu_{M}^{N} is different from the one employed in [23, Proposition 4.3] in several main aspects:

  1. i)

    The first aspect was already anticipated in the beginning of Subsection 2.3, namely instead of stopping the WoS chain at supxDNεx\sup\limits_{x\in D}N_{\varepsilon}^{x}, which is a random time that is difficult to handle both theoretically and practically, we simply stop it at a deterministic time MM which is going to be chosen according to the estimates obtained in 2.26 below and its two subsequent corollaries.

  2. ii)

    The second aspect is that the estimator used in [23] considers NN iid samples drawn from μ\mu, for each of the NN iid samples drawn from (Un)n1(U_{n})_{n\geq 1}, leading to a total of N2N^{2} samples. In contrast, uMNu_{M}^{N} requires merely NN samples, because YY and (Un)n(U_{n})_{n} are sampled simultaneously (and independently), on the same probability space.

  3. iii)

    The third aspect is more subtle: in [23], the Monte Carlo estimator of type (2.29) is constructed based on a given DNN approximation r¯\overline{r} of the distance to the boundary rr, for any prescribed error, let us say η\eta. Then, the approximation error of the solution is obtained based on the error of the Monte Carlo estimator constructed with the exact distance rr, and on how such an estimator varies when rr is replaced with r¯\overline{r}. However, the latter source of error scales like 2Nεη2^{N_{\varepsilon}}\eta, where Nε:=supxDNεxN_{\varepsilon}:=\sup\limits_{x\in D}N_{\varepsilon}^{x}. To compensate for this explosion of the error, η\eta has to be taken extremely small, and to do so, in [23] it is assumed that r¯\overline{r} can be realized with complexity O(log(1/η))O(\log(1/\eta)); the authors show that such a complexity can indeed be attained for the case of a ball or a hypercube in d\mathbb{R}^{d}, and can probably be extended to other domains with a nice geometry. Our approach is different and the key ingredient is to rely on the notion of (β,ε)(\beta,\varepsilon)-distance introduced in 2.7. More precisely, using 2.8 we can replace r¯\overline{r} by some (β,ε)(\beta,\varepsilon)-distance r~\widetilde{r} at essentially no additional cost, and rely on the herein developed analysis for r~\widetilde{r}-WoS. This approach turns out to avoid the additional error of order 2Nεη2^{N_{\varepsilon}}\eta mentioned above; in particular, we shall be able to consider domains whose distance function to the boundary may be approximated by a DNN merely at a polynomial complexity with respect to the approximation error.

  4. iv)

    Another issue regards the construction of the WoS chain itself. Because r¯\overline{r} from iii) may be strictly bigger than rr, for a given position xDx\in D the sphere of radius r¯(x)\overline{r}(x) might not be contained in DD, so there is a risk that the WoS chain leaves the domain DD. In particular, if one constructs the WoS chain based on r¯\overline{r}, then in order to make the analysis rigorous the boundary data gg and the source ff should also be extended to the complement of the domain D¯\overline{D}. Fortunately, this issue is completely avoided by considering r~\widetilde{r}-WoS chains (as it is done in this work), since by definition r~r\widetilde{r}\leq r on DD.

Let us begin with the following mean estimate in L2(D)L^{2}(D):

Proposition 2.24.

Let ε>0\varepsilon>0, β(0,1]\beta\in(0,1], and r~\widetilde{r} be a (β,ε)(\beta,\varepsilon)-distance. Then for all N,MN,M\in\mathbb{N},

𝔼{|u()uMN()|L2(D)2}2λ(D)[supxD|u(x)uM(x)|2+2(|g|2+1d3M|f|2𝖽𝗂𝖺𝗆(D)4)N],\mathbb{E}\left\{\left|u(\cdot)-u_{M}^{N}(\cdot)\right|^{2}_{L^{2}(D)}\right\}\leq 2\lambda(D)\left[\sup\limits_{x\in D}|u(x)-u_{M}(x)|^{2}+\frac{2\left(|g|^{2}_{\infty}+\frac{1}{d^{3}}M|f|^{2}_{\infty}{\sf diam}(D)^{4}\right)}{N}\right], (2.30)

where λ\lambda is the Lebesgue measure on d\mathbb{R}^{d}, whilst uMu_{M} and uMNu_{M}^{N} are given by (2.20) and (2.29). In particular, the above inequality can be made more explicit by employing the estimates for supxD|u(x)uM(x)|\sup\limits_{x\in D}|u(x)-u_{M}(x)| obtained in 2.21 and 2.22, depending on the regularity of DD and gg.

Proof in Subsection 4.4.

Remark 2.25.

Note that as in [23], Section 4, the above error estimate depends on the volume λ(D)\lambda(D). When λ(D)\lambda(D) scales well with the dimension (e.g. at most polynomially), then (2.30) can be employed to overcome the curse of high dimensions; in fact, if DD is a subset of a hypercube whose side has length less than some δ<1\delta<1, then λ(D)δd\lambda(D)\leq\delta^{d}, hence, in this case, the factor λ(D)\lambda(D) improves the mean squared error exponentially with respect to dd. However, λ(D)\lambda(D) may also grow exponentially with respect to dd, and then the above estimate cannot be used to construct a neural network whose size scales at most polynomially with respect to the dimension. Therefore, our next (and in fact main) goal is to remove this inconvenience, by looking at tail estimates for the Monte-Carlo error; one key idea is to quantify the error using the sup-norm instead of the L2(D)L^{2}(D)-norm.

Before we move forward, we recall the notion of a regular domain. We say that a bounded domain DD is regular if for any continuous function gg on the boundary, the harmonic function uu with boundary data gg is continuous on D¯\bar{D}.

As announced in the above remark, we conclude now with the central result of this paper.

Theorem 2.26.

Keep the same framework and notations as in the beginning of this subsection. Fix a small ε0>0\varepsilon_{0}>0, β(0,1]\beta\in(0,1], r~\widetilde{r} a (β,ε0)(\beta,\varepsilon_{0})-distance, and consider uMu_{M} and uMNu_{M}^{N} given by (2.20) and (2.29). Also, assume that ff and gg are α\alpha-Hölder on DD for some α(0,1]\alpha\in(0,1]. Then, for any compact subset FDF\subset D, for all N,M,K1N,M,K\geq 1, γ>0\gamma>0 and ε(0,ε0]\varepsilon\in(0,\varepsilon_{0}], we have

(supxF|u(x)uMN(x)|γ)2exp(C1(M,K,d)((γA(F,M,K,d,ε))+)2C2(M,d)N),\mathbb{P}\left(\sup_{x\in F}\left|u(x)-u_{M}^{N}(x)\right|\geq\gamma\right)\leq 2\exp\left(C_{1}(M,K,d)-\frac{\left((\gamma-A(F,M,K,d,\varepsilon))^{+}\right)^{2}}{C_{2}(M,d)}N\right), (2.31)

where

C1(M,K,d):=d(M/αlog(2+|r~|1)+log(K))C2(M,d):=|g|+Md𝖽𝗂𝖺𝗆(D)2|f|\begin{split}C_{1}(M,K,d):&=d\left(\lceil M/\alpha\rceil\log(2+|\widetilde{r}|_{1})+\log(K)\right)\\ C_{2}(M,d):&=|g|_{\infty}+\frac{M}{d}{\sf diam}(D)^{2}|f|_{\infty}\\ \end{split} (2.32)

and

A(F,M,K,d,ε):=2(|g|α+diam(D)2|f|α+2diam(D)|f|d)(𝖽𝗂𝖺𝗆(D)K)α+dα/2|g|αv(F,ε)α/2+|f|v(F,ε)+(4|g|+2d𝖽𝗂𝖺𝗆(D)2|f|)eβ2ε24𝖽𝗂𝖺𝗆(D)2M\begin{split}A(F,M,K,d,\varepsilon)&:=2\left(|g|_{\alpha}+\frac{{\rm diam}(D)^{2}|f|_{\alpha}+2{\rm diam}(D)|f|_{\infty}}{d}\right)\left(\frac{{\sf diam}(D)}{K}\right)^{\alpha}\\ &\quad+d^{\alpha/2}|g|_{\alpha}\cdot v(F,\varepsilon)^{\alpha/2}+|f|_{\infty}v(F,\varepsilon)+(4|g|_{\infty}+\frac{2}{d}{\sf diam}(D)^{2}|f|_{\infty})e^{-\frac{\beta^{2}\varepsilon^{2}}{4{\sf diam}(D)^{2}}M}\end{split} (2.33)

If gC2(D¯)g\in C^{2}(\overline{D}) then in (2.31) the term M/α\lceil M/\alpha\rceil can be replaced by MM and

A(F,M,K,d,ε):=\displaystyle A(F,M,K,d,\varepsilon):= 2(|g|α+diam(D)2|f|α+2diam(D)|f|d)𝖽𝗂𝖺𝗆(D)K\displaystyle 2\left(|g|_{\alpha}+\frac{{\rm diam}(D)^{2}|f|_{\alpha}+2{\rm diam}(D)|f|_{\infty}}{d}\right)\frac{{\sf diam}(D)}{K}
+(|Δg|2+|f|)v(F,ε)+(8|g|+2d𝖽𝗂𝖺𝗆(D)2|f|)eβ2ε24𝖽𝗂𝖺𝗆(D)2M.\displaystyle+\left(\frac{|\Delta g|_{\infty}}{2}+|f|_{\infty}\right)v(F,\varepsilon)+(8|g|_{\infty}+\frac{2}{d}{\sf diam}(D)^{2}|f|_{\infty})e^{-\frac{\beta^{2}\varepsilon^{2}}{4{\sf diam}(D)^{2}}M}.

Moreover, if we set

B(M,K,d):=C2(M,d)(C1(M,K,d)+log(2)+1)=(|g|+Md𝖽𝗂𝖺𝗆(D)2|f|)(d(M/αlog(2+|r~|1)+log(K))+log(2)+1).\begin{split}B(M,K,d):&=C_{2}(M,d)(\sqrt{C_{1}(M,K,d)+\log(2)}+1)\\ &=\left(|g|_{\infty}+\frac{M}{d}{\sf diam}(D)^{2}|f|_{\infty}\right)\left(\sqrt{d\left(\lceil M/\alpha\rceil\log(2+|\widetilde{r}|_{1})+\log(K)\right)+\log(2)}+1\right).\end{split} (2.34)

then we also have the estimate on the expectation of the total error in the form

𝔼{supxF|u(x)uMN(x)|}A(F,M,K,d,ε)+B(M,K,d)N.\mathbb{E}\left\{\sup_{x\in F}\left|u(x)-u_{M}^{N}(x)\right|\right\}\leq A(F,M,K,d,\varepsilon)+\frac{B(M,K,d)}{\sqrt{N}}. (2.35)

As a consequence, from 2.18, for any compact set FDF\subset D,

limMlimN𝔼{supxF|u(x)uMN(x)|}=0.\lim_{M\to\infty}\lim_{N\to\infty}\mathbb{E}\left\{\sup_{x\in F}\left|u(x)-u_{M}^{N}(x)\right|\right\}=0. (2.36)

For regular domains, we can take F=DF=D.

Furthermore, for any domain, we can replace v(F,ε)v(F,\varepsilon) with |v|(ε)|v|_{\infty}(\varepsilon). Moreover, if the domain satisfies the exterior ball condition, then in (2.33) we can take F=DF=D and replace |v|(ε)|v|_{\infty}(\varepsilon) by εadiam(D)\varepsilon\;{\rm adiam}(D).

If the domain DD is δ\delta-defective convex, we can replace eβ2ε24𝖽𝗂𝖺𝗆(D)2Me^{-\frac{\beta^{2}\varepsilon^{2}}{4{\sf diam}(D)^{2}}M} from the definition of A in (2.33) with (1β2(1δ)4d)M𝖽𝗂𝖺𝗆(D)ε\left(1-\frac{\beta^{2}(1-\delta)}{4d}\right)^{M}\sqrt{\frac{{\sf diam}(D)}{\varepsilon}}.

Proof in Subsection 4.4.

Remark 2.27.

Note that the left-hand side of (2.31) does not depend on KK, and if r~\widetilde{r} is a β\beta-distance, then it does not depend on ε\varepsilon either. Therefore, in the right-hand side of (2.31) one may take the infimum with respect to K1K\geq 1, and if r~\widetilde{r} is a (β,ε0)(\beta,\varepsilon_{0})-distance then one may also take the infimum with respect to ε>0\varepsilon>0; but optimizing the previously obtained bounds in this way may be cumbersome. In any case, convenient bounds can be easily obtained from particular choices of KK and ε\varepsilon, so let us do so in the sequel.

Let us conclude this subsection with the following consequence obtained for some convenient choices for KK and ε\varepsilon.

Corollary 2.28.

Let DdD\subset\mathbb{R}^{d} be a bounded domain satisfying the uniform exterior ball condition. Further, let ff and gg be α\alpha-Hölder on D¯\overline{D} for some α[0,1]\alpha\in[0,1], γ>0\gamma>0 be a prescribed error, η>0\eta>0 be a prescribed confidence, and r~\widetilde{r} be a (β,ε0)(\beta,\varepsilon_{0})-distance with β(0,1]\beta\in(0,1] and ε>0\varepsilon>0 such that

ε\displaystyle\varepsilon ε0:=[1+4(|g|α+|f|)𝖺𝖽𝗂𝖺𝗆(D)1]2αγ2αd1.\displaystyle\leq\varepsilon_{0}:={\left[1+4(|g|_{\alpha}+|f|_{\infty}){\sf adiam}(D)\vee 1\right]}^{-\frac{2}{\alpha}}\gamma^{\frac{2}{\alpha}}d^{-1}. (2.37)
Also, choose
K\displaystyle K :=𝖽𝗂𝖺𝗆(D)(8(|g|α+diam(D)2|f|α+2diam(D)|f|d)+1γ)1/α.\displaystyle:=\left\lceil{\sf diam}(D)\left(\frac{8\left(|g|_{\alpha}+\frac{{\rm diam}(D)^{2}|f|_{\alpha}+2{\rm diam}(D)|f|_{\infty}}{d}\right)+1}{\gamma}\right)^{1/\alpha}\right\rceil. (2.38)

Then

(supxD|u(x)uMN(x)|γ)η\mathbb{P}\left(\sup_{x\in D}\left|u(x)-u_{M}^{N}(x)\right|\geq\gamma\right)\leq\eta (2.39)

whenever we choose

N16{d[M/αlog(2+|r~|1)+log(K)]+log(2η)}[|g|+Md𝖽𝗂𝖺𝗆(D)2|f|]29γ2N\geq\frac{16\left\{d\left[\lceil M/\alpha\rceil\log(2+|\widetilde{r}|_{1})+\log(K)\right]+\log(\frac{2}{\eta})\right\}\left[|g|_{\infty}+\frac{M}{d}{\sf diam}(D)^{2}|f|_{\infty}\right]^{2}}{9\gamma^{2}} (2.40)

and

M[log(4/γ)+log(4|g|+2d𝖽𝗂𝖺𝗆(D)2|f|)]4𝖽𝗂𝖺𝗆(D)2β2ε02.M\geq\frac{\left[\log(4/\gamma)+\log(4|g|_{\infty}+\frac{2}{d}{\sf diam}(D)^{2}|f|_{\infty})\right]4{\sf diam}(D)^{2}}{\beta^{2}\varepsilon_{0}^{2}}. (2.41)

Furthermore, if DD is δ\delta-defective convex, then MM can be chosen as

M4d[log(4γ𝖽𝗂𝖺𝗆(D)ε0)+log(4|g|+2d𝖽𝗂𝖺𝗆(D)2|f|)]β2(1δ).M\geq\frac{4d\left[\log\left(\frac{4}{\gamma}\sqrt{\frac{{\sf diam}(D)}{\varepsilon_{0}}}\right)+\log(4|g|_{\infty}+\frac{2}{d}{\sf diam}(D)^{2}|f|_{\infty})\right]}{\beta^{2}(1-\delta)}. (2.42)

Proof in Subsection 4.4.
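For illustration purposes only, the following Python (numpy) sketch evaluates the bounds (2.37), (2.38), (2.40) and (2.42) for one set of hypothetical values of the norms, the diameters and the Lipschitz seminorm of r~\widetilde{r}; the resulting numbers are not optimized and serve only to show how MM and NN are obtained from the prescribed error γ\gamma and confidence η\eta.

import numpy as np

# Hypothetical parameters (for illustration only).
d, alpha, beta, delta = 10, 1.0, 1.0 / 3.0, 0.5
gamma, eta = 0.1, 0.05                               # prescribed error and confidence
diam, adiam = 1.0, 0.5
g_sup, g_alpha, f_sup, f_alpha, r_lip = 1.0, 1.0, 1.0, 1.0, 1.0   # |g|_inf, |g|_alpha, |f|_inf, |f|_alpha, |r_tilde|_1

# (2.37): one reading of the bracket, with the maximum taken on adiam(D).
eps0 = (1.0 + 4.0 * (g_alpha + f_sup) * max(adiam, 1.0)) ** (-2.0 / alpha) * gamma ** (2.0 / alpha) / d
# (2.38)
K = np.ceil(diam * ((8.0 * (g_alpha + (diam ** 2 * f_alpha + 2.0 * diam * f_sup) / d) + 1.0) / gamma) ** (1.0 / alpha))
# (2.42): number of WoS steps in the delta-defective convex case.
M = np.ceil(4.0 * d * (np.log(4.0 / gamma * np.sqrt(diam / eps0))
                       + np.log(4.0 * g_sup + 2.0 * diam ** 2 * f_sup / d)) / (beta ** 2 * (1.0 - delta)))
# (2.40): number of Monte Carlo samples.
N = np.ceil(16.0 * (d * (np.ceil(M / alpha) * np.log(2.0 + r_lip) + np.log(K)) + np.log(2.0 / eta))
            * (g_sup + M * diam ** 2 * f_sup / d) ** 2 / (9.0 * gamma ** 2))
print(int(M), int(N))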

Remark 2.29.

By some simple computations, ε0\varepsilon_{0}, MM, and NN from 2.28 exhibit the following asymptotic behaviors: ε0𝒪(γ2αd1)\varepsilon_{0}\in\mathcal{O}(\gamma^{\frac{2}{\alpha}}d^{-1}); moreover, if DD is δ\delta-defective convex then

M𝒪(dlog(d/γ)β2(1δ)),N𝒪(log2(d/γ)[d2log(d/γ)+β2(1δ)log(1/η)]β4γ2(1δ)2),M\in\mathcal{O}\left(\frac{d\log(d/\gamma)}{\beta^{2}(1-\delta)}\right),\quad N\in\mathcal{O}\left(\frac{\log^{2}(d/\gamma)\left[d^{2}\log(d/\gamma)+\beta^{2}(1-\delta)\log(1/\eta)\right]}{\beta^{4}\gamma^{2}(1-\delta)^{2}}\right),

whilst if DD satisfies the uniform exterior ball condition then

M𝒪(d2log(1/γ)β2γ4/α),N𝒪(d2log(1/γ)2[d2log2(1/γ)+β2γ2/αlog(1/η)]β4γ2+4/α).M\in\mathcal{O}\left(\frac{d^{2}\log(1/\gamma)}{\beta^{2}\gamma^{4/\alpha}}\right),\quad N\in\mathcal{O}\left(\frac{d^{2}\log(1/\gamma)^{2}\left[d^{2}\log^{2}(1/\gamma)+\beta^{2}\gamma^{2/\alpha}\log(1/\eta)\right]}{\beta^{4}\gamma^{2+4/\alpha}}\right).

Here, the Landau symbols tacitly depend on (the regularity of) f,g,𝖽𝗂𝖺𝗆(D),𝖺𝖽𝗂𝖺𝗆(𝖣), and r~f,g,{\sf diam}(D),{\sf adiam(D)},\mbox{ and }\widetilde{r}. In particular, only in terms of the dimension dd, if the domain is δ\delta-defective convex then M𝒪(dlog(d))M\in\mathcal{O}(d\log(d)) and N𝒪(d2log3(d))N\in\mathcal{O}(d^{2}\log^{3}(d)), whilst merely under the uniform exterior ball condition, M𝒪(d2)M\in\mathcal{O}(d^{2}) and N𝒪(d4)N\in\mathcal{O}(d^{4}).

2.5 On regular extensions of the boundary data inside the domain

Recall that one assumption of the main results in the previous subsections (see e.g. 2.26) is that the boundary data gg can be extended as a regular function (Hölder or C2C^{2}) defined on the entire domain D¯\overline{D}. This is required by the fact that the data needs to be evaluated at the location where the WoS chain is stopped, see (2.29), and such a stopped position lies, in principle, in the interior of the domain DD. However, usually in practice, gg is measured (hence known) merely at the boundary D\partial D. With this issue in mind, in this subsection we address the problem of extending gg regularly from D\partial D to D¯\overline{D}, in a constructive way which is also DNN-compatible.

We take DdD\subset\mathbb{R}^{d} to be a set of class CkC^{k}, k=3k=3 or k=2k=2, hence (see [19, sec. 14.6]) there exists a neighbourhood Dϵ0:={xD;dist(x,D)<ϵ0}D_{\epsilon_{0}}:=\{x\in D;\textrm{dist}(x,\partial D)<\epsilon_{0}\} of D\partial D such that the restriction of the distance function r:Dϵ0+r:{D_{\epsilon_{0}}}\to\mathbb{R}_{+} is of class CkC^{k}, and the nearest point projection πD:Dϵ0D{\pi_{\partial D}}:{D_{\epsilon_{0}}}\to\partial D is of class Ck1C^{k-1}. We have:

Lemma 2.30.

Let DdD\subset\mathbb{R}^{d} be a set of class C2C^{2}; hence for any point xDx\in\partial D there exist a function ϕx:d1\phi_{x}:\mathbb{R}^{d-1}\to\mathbb{R} of class C2C^{2} and a radius rx>0r_{x}>0 such that

DB(x,rx)={y=(y1,,yd)B(x,rx);yd<ϕx(y1,,yd1)}.D\cap B(x,r_{x})=\{y=(y_{1},\dots,y_{d})\in B(x,r_{x});y_{d}<\phi_{x}(y_{1},\dots,y_{d-1})\}.

We denote by M:=supxDi,j,k{1,,d}|3ϕxyiyjyk(x)|M:=\sup_{\stackrel{{\scriptstyle i,j,k\in\{1,\dots,d\}}}{{x\in\partial D}}}\big{|}\frac{\partial^{3}\phi_{x}}{\partial y_{i}\partial y_{j}\partial y_{k}}(x)\big{|}. Furthermore, denoting by k1(x)k2(x)kd1(x)k_{1}(x)\leq k_{2}(x)\leq\dots\leq k_{d-1}(x) the ordered principal curvatures of D\partial D, let us take ε0:=minxDkd11(x){\varepsilon}_{0}:=\min_{x\in\partial D}k_{d-1}^{-1}(x). Take ψCc([0,),)\psi\in C^{\infty}_{c}([0,\infty),\mathbb{R}) to be such that ψ1\psi\equiv 1 on [0,1][0,1] and ψ0\psi\equiv 0 on [3,)[3,\infty), |ψ|,|ψ|,|ψ′′|1|\psi|_{\infty},|\psi^{\prime}|_{\infty},|\psi^{\prime\prime}|_{\infty}\leq 1. We define the extension GG in D¯\overline{D} of the α\alpha-Hölder function gg given on the boundary D\partial D, for α(0,1]\alpha\in(0,1], as:

G:D¯,G(x):=ψ(1ε0r(x))g(πD(x)),xD¯.G:\overline{D}\to\mathbb{R},\quad G(x):=\psi\left(\frac{1}{{\varepsilon}_{0}}r(x)\right)g({\pi_{\partial D}}(x)),\;x\in\overline{D}. (2.43)

Then GG is α\alpha-Hölder on D¯\overline{D} and |G|α|πD|α|g|α+|g|ε01𝖽𝗂𝖺𝗆(D)1α.|G|_{\alpha}\leq|\nabla{\pi_{\partial D}}|_{\infty}^{\alpha}|g|_{\alpha}+|g|_{\infty}{\varepsilon}_{0}^{-1}{\sf diam}(D)^{1-\alpha}.

If, furthermore, the domain is of class C3C^{3} and gC2(D)g\in C^{2}(\partial D), then GG is in C2(D)C(D¯)C^{2}(D)\cap C(\overline{D}) with G=gG=g on D\partial D, and we have:

|G|\displaystyle|\nabla G|_{\infty} 1ε0|g|+2|g|\displaystyle\leq\frac{1}{{\varepsilon}_{0}}|g|_{\infty}+2|\nabla g|_{\infty}
|ΔG|\displaystyle|\Delta G|_{\infty} C~,\displaystyle\leq\widetilde{C},

where C~\widetilde{C} is an explicitly computable constant in terms of |g|,|g|,|Δg||g|_{\infty},|\nabla g|_{\infty},|\Delta g|_{\infty}, MM, and ε0{\varepsilon}_{0}.

Proof in Subsection 4.5.

Remark 2.31.

One can easily provide a non-constructive α\alpha-Hölder extension G~\widetilde{G} on D¯\overline{D} of the α\alpha-Hölder boundary data gg given on D\partial D by setting G~(x):=inf{g(y)+|g|α|xy|α,yD},xD¯\widetilde{G}(x):=\inf\{g(y)+|g|_{\alpha}|x-y|^{\alpha},y\in\partial D\},x\in\overline{D}.
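For illustration purposes, here is a minimal Python (numpy) sketch of the extension from 2.31, with the infimum approximated over a finite sample of boundary points; this discretization is introduced only for the sake of the example.

import numpy as np

def holder_extension(x, boundary_pts, g_vals, g_alpha, alpha):
    """Approximate G_tilde(x) = inf_{y in dD} { g(y) + |g|_alpha |x - y|^alpha }
    (Remark 2.31), the infimum being taken over a finite sample of boundary points
    boundary_pts (an array of shape (m, d)) with values g_vals = g(boundary_pts)."""
    dists = np.linalg.norm(boundary_pts - np.asarray(x, dtype=float), axis=1)
    return np.min(g_vals + g_alpha * dists ** alpha)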

3 DNN counterpart of the main results

Let σ:\sigma:\mathbb{R}\to\mathbb{R} be the rectified linear unit (ReLU) activation function, that is σ(x):=max{0,x}\sigma(x):=\max\{0,x\}, xx\in\mathbb{R}. Let (di)i=0,,L(d_{i})_{i=0,\ldots,L} be a sequence of positive integers. Let Aidi×di1A^{i}\in\mathbb{R}^{d_{i}\times d_{i-1}} and bidib^{i}\in\mathbb{R}^{d_{i}}, i=1,,Li=1,\ldots,L, and set Wi(x):=Aix+bi,xdi1W^{i}(x):=A^{i}x+b^{i},x\in\mathbb{R}^{d_{i-1}}. We define the realization of the DNN d0xϕ(x)\mathbb{R}^{d_{0}}\ni x\mapsto\phi(x) by

d0xϕ(x):=WLσWL1σW1(x)dL,xd0,\mathbb{R}^{d_{0}}\ni x\mapsto\phi(x):=W^{L}\circ\sigma\circ W^{L-1}\cdots\circ\sigma\circ W^{1}(x)\in\mathbb{R}^{d_{L}},\quad x\in\mathbb{R}^{d_{0}}, (3.1)

where dxσ(x):=(σ(x1),,σ(xd))\mathbb{R}^{d}\ni x\mapsto\sigma(x):=(\sigma(x_{1}),\ldots,\sigma(x_{d})), dd\in\mathbb{N}, is defined coordinatewise. The weights of the ReLU DNN ϕ\phi are the entries of (Ai,bi)i=1,,L(A^{i},b^{i})_{i=1,\ldots,L}. The size of ϕ\phi, denoted by size(ϕ){\rm size}(\phi), is the number of non-zero weights. The width of ϕ\phi, denoted by 𝒲(ϕ)\mathcal{W}(\phi), is defined by 𝒲(ϕ):=max{d0,,dL}\mathcal{W}(\phi):=\max\{d_{0},\ldots,d_{L}\}, whilst LL is the depth of ϕ\phi, denoted by (ϕ)\mathcal{L}(\phi). In the sequel, we only consider DNNs with ReLU activation function.
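For concreteness, the realization (3.1) and the quantities size, width and depth can be sketched in a few lines of Python (numpy); the representation of a network as a list of pairs (Ai,bi)(A^{i},b^{i}) below is only an illustrative convention.

import numpy as np

def relu_dnn(x, weights):
    """Realization (3.1): weights is a list [(A1, b1), ..., (AL, bL)]; the ReLU is
    applied after every affine map except the last one."""
    for A, b in weights[:-1]:
        x = np.maximum(A @ x + b, 0.0)
    A, b = weights[-1]
    return A @ x + b

def dnn_size(weights):
    """Number of non-zero weights of the network."""
    return sum(np.count_nonzero(A) + np.count_nonzero(b) for A, b in weights)

def dnn_width(weights):
    """max{d_0, ..., d_L}, with d_0 the input dimension and d_i the output
    dimension of the i-th affine map."""
    return max([weights[0][0].shape[1]] + [A.shape[0] for A, _ in weights])

def dnn_depth(weights):
    """The depth L, i.e. the number of affine maps."""
    return len(weights)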

For the reader’s convenience, before we proceed to the main result of this section (see 3.10 below), we present first several technical lemmas following [15] and [52], as well as some of their consequences; all these preparatory results are meant to provide a clear and systematic way of quantifying the size of the DNN which is constructed in the forthcoming main result, namely 3.10.

The following lemma is [52, Proposition 3].

Lemma 3.1.

For every c>0c>0 and δ(0,1)\delta\in(0,1), there exists a DNN Πδc\Pi_{\delta}^{c} such that

supa,b[c,c]|abΠδc(a,b)|δandsize(Πδc)=𝒪(log(δ1)+log(c)).\sup_{a,b\in[-c,c]}|ab-\Pi_{\delta}^{c}(a,b)|\leq\delta\quad\mbox{and}\quad{\rm size}(\Pi_{\delta}^{c})=\mathcal{O}(\lceil\log(\delta^{-1})+\log(c)\rceil).

Now, we recall Lemma II.6 from [15]:

Lemma 3.2.

Let ϕi,i=1,,n\phi_{i},\;i=1,\ldots,n, be ReLU DNNs with the same input dimension d0d_{0}\in\mathbb{N} and the same depth :=(ϕi),1in\mathcal{L}:=\mathcal{L}(\phi_{i}),1\leq i\leq n. Let aia_{i}, i=1,,ni=1,\ldots,n, be scalars. Then there exists a ReLU DNN ϕ\phi such that

  • i)

    ϕ(x)=i=1naiϕi(x)\phi(x)=\sum_{i=1}^{n}a_{i}\phi_{i}(x) for every xd0x\in\mathbb{R}^{d_{0}},

  • ii)

    (ϕ)=\mathcal{L}(\phi)=\mathcal{L},

  • iii)

    𝒲(ϕ)1in𝒲(ϕi)\mathcal{W}(\phi)\leq\sum\limits_{1\leq i\leq n}\mathcal{W}(\phi_{i}),

  • iv)

    size(ϕ)1insize(ϕi){\rm size}(\phi)\leq\sum\limits_{1\leq i\leq n}{\rm size}(\phi_{i}).

The following lemma is taken from  [15, Lemma II.3], with the mention that the last assertion iv) brings some improvement which is relevant to our purpose; it is immediately entailed by the proof of the same  [15, Lemma II.3], so we skip its justification.

Lemma 3.3.

Let ϕ1:d1d2\phi_{1}:\mathbb{R}^{d_{1}}\rightarrow\mathbb{R}^{d_{2}} and ϕ2:d3d1\phi_{2}:\mathbb{R}^{d_{3}}\rightarrow\mathbb{R}^{d_{1}} be two ReLU DNNs. Then there exists a ReLU DNN ϕ:d3d2\phi:\mathbb{R}^{d_{3}}\rightarrow\mathbb{R}^{d_{2}} such that

  1. i)

    ϕ(x)=ϕ1(ϕ2(x))\phi(x)=\phi_{1}(\phi_{2}(x)) for every xd3x\in\mathbb{R}^{d_{3}},

  2. ii)

    (ϕ)=(ϕ1)+(ϕ2)\mathcal{L}(\phi)=\mathcal{L}(\phi_{1})+\mathcal{L}(\phi_{2}),

  3. iii)

    𝒲(ϕ)max(𝒲(ϕ1),𝒲(ϕ2),2d1)\mathcal{W}(\phi)\leq\max(\mathcal{W}(\phi_{1}),\mathcal{W}(\phi_{2}),2d_{1}),

  4. iv)

    size(ϕ)min(size(ϕ1)+size(ϕ2)+d1[𝒲(ϕ1)+𝒲(ϕ2)], 2size(ϕ1)+2size(ϕ2)){\rm size}(\phi)\leq\min\left({\rm size}(\phi_{1})+{\rm size}(\phi_{2})+d_{1}[\mathcal{W}(\phi_{1})+\mathcal{W}(\phi_{2})],\;2{\rm size}(\phi_{1})+2{\rm size}(\phi_{2})\right).

The next lemma is essentially [15, Lemma II.4]. As in the case of the previous lemma, assertion iv) comes with a slight modification of the original result, which can be immediately deduced from the proof of [15, Lemma II.4].

Lemma 3.4.

Let ϕ:d0d1\phi:\mathbb{R}^{d_{0}}\rightarrow\mathbb{R}^{d_{1}} be a ReLU DNN such that (ϕ)<L\mathcal{L}(\phi)<L. Then there exists a second ReLU DNN ϕ~:d0d1\widetilde{\phi}:\mathbb{R}^{d_{0}}\rightarrow\mathbb{R}^{d_{1}} such that

  1. i)

    ϕ(x)=ϕ~(x)\phi(x)=\widetilde{\phi}(x) for all xd0x\in\mathbb{R}^{d_{0}},

  2. ii)

    (ϕ~)=L\mathcal{L}(\widetilde{\phi})=L,

  3. iii)

    𝒲(ϕ~)=max(2d1,𝒲(ϕ))\mathcal{W}(\widetilde{\phi})=\max(2d_{1},\mathcal{W}(\phi)),

  4. iv)

    size(ϕ~)min(size(ϕ)+d1𝒲(ϕ), 2size(ϕ))+2d1(L(ϕ)){\rm size}(\widetilde{\phi})\leq\min\left({\rm size}(\phi)+d_{1}\mathcal{W}(\phi),\;2{\rm size}(\phi)\right)+2d_{1}(L-\mathcal{L}(\phi)).

As a direct consequence of 3.1, 3.3, 3.4, and [15, Lemma II.5], one gets the following approximation result for products of scalar ReLU DNN:

Corollary 3.5.

Let ϕ1,ϕ2:d\phi_{1},\phi_{2}:\mathbb{R}^{d}\rightarrow\mathbb{R} be two ReLU DNNs, DdD\subset\mathbb{R}^{d} be a bounded subset, and let Π:=Πϵpc\Pi:=\Pi_{\epsilon_{p}}^{c} be given by 3.1 for c:=max(supxDϕ1(x),supxDϕ2(x))c:=\max\left(\sup\limits_{x\in D}\phi_{1}(x),\sup\limits_{x\in D}\phi_{2}(x)\right) and ϵp>0\epsilon_{p}>0. Then there exists a ReLU DNN ϕ:d\phi:\mathbb{R}^{d}\rightarrow\mathbb{R} such that

  1. i)

    ϕ(x)=Π(ϕ1(x),ϕ2(x))\phi(x)=\Pi(\phi_{1}(x),\phi_{2}(x)) for every xdx\in\mathbb{R}^{d},

  2. ii)

    supxD|ϕ1(x)ϕ2(x)ϕ(x)|ϵp,\sup\limits_{x\in D}|\phi_{1}(x)\phi_{2}(x)-\phi(x)|\leq\epsilon_{p},

  3. iii)

    size(ϕ)4size(ϕ1)+4size(ϕ2)+𝒪(log(ϵp1)+log(c)){\rm size}(\phi)\leq 4{\rm size}(\phi_{1})+4{\rm size}(\phi_{2})+\mathcal{O}(\lceil\log(\epsilon_{p}^{-1})+\log(c)\rceil).

The following two lemmas are going to be employed later in order to quantify the size of one generic step of the WoS chain given by (2.10)-(2.11), regarded as an action of a ReLU DNN.

Lemma 3.6.

Let ϕ:d\phi:\mathbb{R}^{d}\rightarrow\mathbb{R} be a ReLU DNN and vdv\in\mathbb{R}^{d} be a vector. Then there exists a ReLU DNN ϕv:dd\phi_{v}:\mathbb{R}^{d}\rightarrow\mathbb{R}^{d} such that

  1. i)

    ϕv(x)=x+ϕ(x)v\phi_{v}(x)=x+\phi(x)v for all xdx\in\mathbb{R}^{d},

  2. ii)

    (ϕv)=(ϕ)+1\mathcal{L}(\phi_{v})=\mathcal{L}(\phi)+1,

  3. iii)

    𝒲(ϕv)2d+max(d,𝒲(ϕ))\mathcal{W}(\phi_{v})\leq 2d+\max(d,\mathcal{W}(\phi)),

  4. iv)

    size(ϕv)2size(ϕ)+2d[(ϕ)+2]{\rm size}(\phi_{v})\leq 2{\rm size}(\phi)+2d[\mathcal{L}(\phi)+2].

Proof in Subsection 4.5.

The following result is easily deduced by employing recursively 3.6 and 3.3, so we omit its proof.

Corollary 3.7.

Let ϕ:d\phi:\mathbb{R}^{d}\rightarrow\mathbb{R} be a ReLU DNN and vkd,k1v_{k}\in\mathbb{R}^{d},k\geq 1 be a sequence of vectors. Then there exist ReLU DNNs θk:dd,k0\theta_{k}:\mathbb{R}^{d}\rightarrow\mathbb{R}^{d},k\geq 0, such that for every k0k\geq 0

  1. i)

    θk+1(x)=ϕvk+1(θk(x))\theta_{k+1}(x)=\phi_{v_{k+1}}(\theta_{k}(x)) and θ0(x)=x\theta_{0}(x)=x for all xdx\in\mathbb{R}^{d}, where ϕvk\phi_{v_{k}} is the one constructed in 3.6,

  2. ii)

    (θk+1)=(k+1)((ϕ)+1)+1\mathcal{L}(\theta_{k+1})=(k+1)(\mathcal{L}(\phi)+1)+1,

  3. iii)

    𝒲(θk+1)2d+max(d,𝒲(ϕ))\mathcal{W}(\theta_{k+1})\leq 2d+\max(d,\mathcal{W}(\phi)),

  4. iv)

    size(θk+1)2d(k+1)[4d+𝒲(ϕ)+(ϕ)+2]+d+2(k+1)size(ϕ){\rm size}(\theta_{k+1})\leq 2d(k+1)[4d+\mathcal{W}(\phi)+\mathcal{L}(\phi)+2]+d+2(k+1){\rm size}(\phi).

We end this first paragraph with a ReLU DNN extension of Hölder continuous boundary data gg to the entire domain D¯\overline{D}.

Corollary 3.8.

Let DdD\subset\mathbb{R}^{d} be a set of class C2C^{2} and gg be α\alpha-Hölder on D\partial D. For ε0>0{\varepsilon}_{0}>0 and ψCc([0,),)\psi\in C^{\infty}_{c}([0,\infty),\mathbb{R}) as defined in Lemma 2.30, we assume that for every δr,δπ,δψ,δg(0,1)\delta_{r},\delta_{\pi},\delta_{\psi},\delta_{g}\in(0,1), there exist ReLU DNNs ϕr\phi_{r}, ϕπ\phi_{\pi}, ϕψ\phi_{\psi} and ϕg\phi_{g} such that

|rϕr|δr,|πDϕπ|δπ,|ψϕψ|δψ,|gϕg|δg.|r-\phi_{r}|_{\infty}\leq\delta_{r},\quad|{\pi_{\partial D}}-\phi_{\pi}|_{\infty}\leq\delta_{\pi},\quad|\psi-\phi_{\psi}|_{\infty}\leq\delta_{\psi},\quad|g-\phi_{g}|_{\infty}\leq\delta_{g}.

With ε0\varepsilon_{0} the one given in 2.30, set

δ¯:=2(3δψ+δrε0)|g|+2(3δg+|g|δπ)(δψ+1)𝒪(δψ+δr+δg+δπ).\overline{\delta}:=2\left(3\delta_{\psi}+\frac{\delta_{r}}{{\varepsilon}_{0}}\right)|g|_{\infty}+2\left(3\delta_{g}+|\nabla g|_{\infty}\delta_{\pi}\right)(\delta_{\psi}+1)\in\mathcal{O}(\delta_{\psi}+\delta_{r}+\delta_{g}+\delta_{\pi}).

If GG is the α\alpha-Hölder extension in DD of the boundary data gg given by (2.43), then there exists a ReLU DNN ϕG\phi_{G} such that

  1. i)

    |GϕG|δ¯,|G-\phi_{G}|_{\infty}\leq\overline{\delta},

  2. ii)

    size(ϕG)2size(ϕψ)+2size(ϕr)+2size(ϕg)+2size(ϕπ)+𝒪(log(δ¯1)+log(|g|)){\rm size}(\phi_{G})\leq 2{\rm size}(\phi_{\psi})+2{\rm size}(\phi_{r})+2{\rm size}(\phi_{g})+2{\rm size}(\phi_{\pi})+\mathcal{O}(\lceil\log(\overline{\delta}^{-1})+\log(|g|_{\infty})\rceil).

Proof in Subsection 4.5.

3.1 DNN approximations for solutions to problem (2.1)

We are now ready to present the DNN byproduct of 2.26, in fact of 2.28. First, let us state that the r~\widetilde{r}-WoS chain given by (2.10)-(2.11) renders a ReLU DNN as soon as r~\widetilde{r} is a ReLU DNN; this follows from a straightforward combination of 3.6 and 3.7, so we skip its formal proof.

Corollary 3.9.

Suppose that r~\widetilde{r} is a ReLU DNN on the bounded set DdD\subset\mathbb{R}^{d} such that 0r~r0\leq\widetilde{r}\leq r, where we recall that rr, specified by (2.9), is the distance function to the boundary of DD. Further, let M0M\geq 0 and (XMx,xD)\left(X_{M}^{x},x\in D\right) be the r~\widetilde{r}-WoS chain at step MM given by (2.10)-(2.11). Then for each ωΩ\omega\in\Omega there exists a ReLU DNN defined on DD and denoted by 𝕏Mω()\mathbb{X}_{M}^{\omega}(\cdot) such that

  1. i)

    𝕏Mω(x)=XMx(ω)\mathbb{X}_{M}^{\omega}(x)=X_{M}^{x}(\omega) for all xDx\in D,

  2. ii)

    (𝕏Mω)=M((r~)+1)+1\mathcal{L}(\mathbb{X}_{M}^{\omega})=M(\mathcal{L}(\widetilde{r})+1)+1,

  3. iii)

    𝒲(𝕏Mω)2d+max(d,𝒲(r~))\mathcal{W}(\mathbb{X}_{M}^{\omega})\leq 2d+\max(d,\mathcal{W}(\widetilde{r})),

  4. iv)

    size(𝕏Mω)2dM[4d+𝒲(r~)+(r~)+2]+d+2Msize(r~){\rm size}(\mathbb{X}_{M}^{\omega})\leq 2dM[4d+\mathcal{W}(\widetilde{r})+\mathcal{L}(\widetilde{r})+2]+d+2M{\rm size}(\widetilde{r}).

The main result of this section is the following, proving that ReLU DNNs can approximate the solution uu to problem (2.1) without the curse of high dimensions.

Theorem 3.10.

The statement requires a detailed context, so let us label the assumption and the conclusion separately.

Assumption: Let DdD\subset\mathbb{R}^{d} be a bounded domain satisfying the uniform exterior ball condition, ff and gg be α\alpha-Hölder functions on D¯\overline{D} for some α[0,1]\alpha\in[0,1], and uu be the solution to (2.1), as in 2.2. Let ϕf:D,ϕg:D¯,ϕr:D\phi_{f}:D\rightarrow\mathbb{R},\phi_{g}:\overline{D}\rightarrow\mathbb{R},\phi_{r}:D\rightarrow\mathbb{R} be ReLU DNNs such that

|fϕf|ϵf|f|,|gϕg|ϵg,|rϕr|ϵr,|f-\phi_{f}|_{\infty}\leq\epsilon_{f}\leq|f|_{\infty},\quad|g-\phi_{g}|_{\infty}\leq\epsilon_{g},\quad|r-\phi_{r}|_{\infty}\leq\epsilon_{r},

and set r~(x):=(ϕr(x)ϵr)+,xD.\widetilde{r}(x):=(\phi_{r}(x)-\epsilon_{r})^{+},\,x\in D. Also, let Π:=Πϵpc\Pi:=\Pi_{\epsilon_{p}}^{c} be the ReLU DNN given by 3.1.

Further, let γ>0\gamma>0 be a prescribed error, 0<η<10<\eta<1 be a prescribed confidence, and consider the following assumptions on the parameters ϵf,ϵg,ϵr,ϵp and c\epsilon_{f},\epsilon_{g},\epsilon_{r},\epsilon_{p}\text{ and }c:

  1. a.1)

    ϵgγ/6\epsilon_{g}\leq\gamma/6,

  2. a.2)

    ϵfdγ6Mdiam(D)2\epsilon_{f}\leq\frac{d\gamma}{6M{\rm diam}(D)^{2}},

  3. a.3)

    ϵr<ε0:=13[(4|g|α+|f|)𝖺𝖽𝗂𝖺𝗆(D)1]2α(γ/2)2αd1\epsilon_{r}<\varepsilon_{0}:=\frac{1}{3}\left[(4|g|_{\alpha}+|f|_{\infty}){\sf adiam}(D)\vee 1\right]^{-\frac{2}{\alpha}}(\gamma/2)^{\frac{2}{\alpha}}d^{-1}, so that, by 2.8, r~\widetilde{r} is a (β,ε0)(\beta,\varepsilon_{0})-distance if we choose β=1/3\beta=1/3,

  4. a.4)

    ϵp=γd6M(1+2|f|)\epsilon_{p}=\frac{\gamma d}{6M(1+2|f|_{\infty})} and c=max(diam(D),2|f|)c=\max({\rm diam}(D),2|f|_{\infty}),

where M1M\geq 1 is specified below.

Further, consider the iid pairs ((Xk,i)k1,Yi),i1((X^{\cdot,i}_{k})_{k\geq 1},Y^{i}),i\geq 1 on (Ω,,)(\Omega,\mathcal{F},\mathbb{P}), as in the beginning of Subsection 2.4, and

u~MN(x):=1Ni=1N[ϕg(XMx,i)+1dk=1MΠ(Π(r~(Xk1x,i),r~(Xk1x,i)),ϕf(Xk1x,i+r~(Xk1x,i)Yi))],xD.\widetilde{u}_{M}^{N}(x):=\frac{1}{N}\sum_{i=1}^{N}\left[\phi_{g}(X^{x,i}_{M})+\frac{1}{d}\sum\limits_{k=1}^{M}\Pi\left(\Pi\left(\widetilde{r}(X^{x,i}_{k-1}),\widetilde{r}(X^{x,i}_{k-1})\right),\phi_{f}\left(X^{x,i}_{k-1}+\widetilde{r}(X^{x,i}_{k-1})Y^{i}\right)\right)\right],x\in D. (3.2)

Let us choose

N64{d[M/αlog(2+|r~|1)+log(K)]+log(2η)}[|g|+Md𝖽𝗂𝖺𝗆(D)2|f|]29γ2,\displaystyle N\geq\frac{64\left\{d\left[\lceil M/\alpha\rceil\log(2+|\widetilde{r}|_{1})+\log(K)\right]+\log(\frac{2}{\eta})\right\}\left[|g|_{\infty}+\frac{M}{d}{\sf diam}(D)^{2}|f|_{\infty}\right]^{2}}{9\gamma^{2}}, (3.3)
M36[log(8/γ)+log(4|g|+2d𝖽𝗂𝖺𝗆(D)2|f|)]𝖽𝗂𝖺𝗆(D)2ε02,\displaystyle M\geq\frac{36\left[\log(8/\gamma)+\log(4|g|_{\infty}+\frac{2}{d}{\sf diam}(D)^{2}|f|_{\infty})\right]{\sf diam}(D)^{2}}{\varepsilon_{0}^{2}}, (3.4)

where K:=𝖽𝗂𝖺𝗆(D)(16(|g|α+diam(D)2|f|α+2diam(D)|f|d)+2γ)1/αK:=\left\lceil{\sf diam}(D)\left(\frac{16\left(|g|_{\alpha}+\frac{{\rm diam}(D)^{2}|f|_{\alpha}+2{\rm diam}(D)|f|_{\infty}}{d}\right)+2}{\gamma}\right)^{1/\alpha}\right\rceil.

Furthermore, if DD is δ\delta-defective convex then MM can be chosen such that

M36dlog(8γ𝖽𝗂𝖺𝗆(D)ε0)+log(4|g|+2d𝖽𝗂𝖺𝗆(D)2|f|)(1δ).M\geq 36\frac{d\log\left(\frac{8}{\gamma}\sqrt{\frac{{\sf diam}(D)}{\varepsilon_{0}}}\right)+\log(4|g|_{\infty}+\frac{2}{d}{\sf diam}(D)^{2}|f|_{\infty})}{(1-\delta)}.

Conclusion: Under the above assumption and keeping the same notations, there exists a measurable function 𝕌MN:Ω×D\mathbb{U}_{M}^{N}:\Omega\times D\rightarrow\mathbb{R} such that

  • c.1)

    𝕌MN(ω,)\mathbb{U}_{M}^{N}(\omega,\cdot) is a ReLU DNN for each ωΩ\omega\in\Omega, 𝕌MN(,x)=u~MN(x),xD,\mathbb{U}_{M}^{N}(\cdot,x)=\widetilde{u}_{M}^{N}(x),\quad x\in D, and

    (supxD|u(x)𝕌MN(,x)|γ)η.\mathbb{P}\left(\sup\limits_{x\in D}\left|u(x)-\mathbb{U}_{M}^{N}(\cdot,x)\right|\geq\gamma\right)\leq\eta.
  • c.2)

    For each ωΩ\omega\in\Omega we have that

    size(𝕌MN(ω,))𝒪(MN[dMmax(d,𝒲(ϕr),(ϕr))+Msize(ϕr)+size(ϕf)+log(1γd)]).{\rm size}(\mathbb{U}_{M}^{N}(\omega,\cdot))\in\mathcal{O}\left(MN\left[dM\max(d,\mathcal{W}(\phi_{r}),\mathcal{L}(\phi_{r}))+M{\rm size}(\phi_{r})+{\rm size}(\phi_{f})+\left\lceil\log\left(\frac{1}{\gamma d}\right)\right\rceil\right]\right).

In particular,

size(𝕌MN(ω,))𝒪(d7γ16/α4log4(1γ)[d3γ4/αlog(1γ)+log(1η)]S),{\rm size}(\mathbb{U}_{M}^{N}(\omega,\cdot))\in\mathcal{O}\left(d^{7}\gamma^{-16/\alpha-4}\log^{4}\left(\frac{1}{\gamma}\right)\left[d^{3}\gamma^{-4/\alpha}\log\left(\frac{1}{\gamma}\right)+\log\left(\frac{1}{\eta}\right)\right]{\rm S}\right),

where

S:=[max(d,𝒲(ϕr),(ϕr))+size(ϕr)+size(ϕg)+size(ϕf)]{\rm S}:=\left[\max(d,\mathcal{W}(\phi_{r}),\mathcal{L}(\phi_{r}))+{\rm size}(\phi_{r})+{\rm size}(\phi_{g})+{\rm size}(\phi_{f})\right]

and the tacit constant depends on |g|α,|g|,|f|,diam(D),adiam(D),δ,α,log(2+|ϕr|1)|g|_{\alpha},|g|_{\infty},|f|_{\infty},{\rm diam}(D),{\rm adiam}(D),\delta,\alpha,\log(2+|\phi_{r}|_{1}).

Furthermore, if DD is δ\delta-defective convex and MM is chosen according to the corresponding lower bound above, then

size(𝕌MN(ω,))𝒪(d3γ2log4(dγ)[d2log(dγ)+log(1η)]S).{\rm size}(\mathbb{U}_{M}^{N}(\omega,\cdot))\in\mathcal{O}\left(\frac{d^{3}}{\gamma^{2}}\log^{4}\left(\frac{d}{\gamma}\right)\left[d^{2}\log\left(\frac{d}{\gamma}\right)+\log\left(\frac{1}{\eta}\right)\right]{\rm S}\right).

Proof in Subsection 4.5.

We end the exposition of the main results with the remark that it is sufficient to prescribe the Dirichlet data gg merely on D\partial D (not necessarily extended to D¯\overline{D}), as expressed by the following direct consequence of 3.8.

Corollary 3.11.

If the domain DD is of class C2C^{2}, and if gg is given merely on D\partial D and it is α\alpha-Hölder there, then gg can be constructively extended to an α\alpha-Hölder function on D¯\overline{D}. Furthermore, a ReLU DNN approximation ϕg\phi_{g} can be constructed as in 3.8, so 2.26 and 3.10 fully apply.

4 Proofs of the main results

4.1 Proofs for Subsection 2.1

Proof of 2.1.

Let v(x):=u(Ax)v(x):=u(Ax). Then kv(x)=j,l=1nju(Ax)k(Ajlxl)=j=1nju(Ax)Ajk\partial_{k}v(x)=\sum_{j,l=1}^{n}\partial_{j}u(Ax)\partial_{k}(A_{jl}x_{l})=\sum_{j=1}^{n}\partial_{j}u(Ax)A_{jk} and

k=1nkkv(x)=i,j,k,l=1niju(Ax)Ajkk(Ailxl)=i,j,k=1niju(Ax)AjkAik.\sum_{k=1}^{n}\partial_{k}\partial_{k}v(x)=\sum_{i,j,k,l=1}^{n}\partial_{i}\partial_{j}u(Ax)A_{jk}\partial_{k}(A_{il}x_{l})=\sum_{i,j,k=1}^{n}\partial_{i}\partial_{j}u(Ax)A_{jk}A_{ik}.

Thus we need to determine A such that AA^{T}=K. We know that there exists an orthogonal matrix R\in O(n) (hence RR^{T}=\mathrm{Id}) such that RKR^{T}=\textrm{diag}(\lambda_{1},\dots,\lambda_{n}). Since K is positive definite we have \lambda_{i}>0 for i\in\{1,\dots,n\}. Then we have:

RARTRATRT=RKRT=diag(λ1,,λn).RAR^{T}RA^{T}R^{T}=RKR^{T}=\textrm{diag}(\lambda_{1},\dots,\lambda_{n}). (4.1)

We denote B:=RARTB:=RAR^{T} and observe that (4.1) can be rewritten as BBT=diag(λ1,,λn)BB^{T}=\textrm{diag}(\lambda_{1},\dots,\lambda_{n}). We can now take B:=diag(λ1,,λn)B:=\textrm{diag}(\sqrt{\lambda_{1}},\dots,\sqrt{\lambda_{n}}). Thus A:=RTdiag(λ1,,λn)RA:=R^{T}\textrm{diag}(\sqrt{\lambda_{1}},\dots,\sqrt{\lambda_{n}})R. ∎
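Numerically, the matrix A constructed above is simply the symmetric square root of K, obtained from an eigendecomposition. A minimal numpy sketch (the positive definite K below is randomly generated for illustration only):

import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
K = B @ B.T + 4.0 * np.eye(4)            # a symmetric positive definite matrix
lam, Q = np.linalg.eigh(K)               # K = Q diag(lam) Q^T with Q orthogonal
A = Q @ np.diag(np.sqrt(lam)) @ Q.T      # A = R^T diag(sqrt(lam)) R, where R = Q^T
assert np.allclose(A @ A.T, K)           # A A^T = K, as required in the proof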

Proof of 2.3.

The second assertion follows directly from 2.2 and from classical regularity theory for the Poisson equation, so let us prove the first one. To this end, note that without loss of generality we may assume that x=0\in D; by Itô's formula, (|B^{0}(t)|^{2}-dt)_{t\geq 0} is a martingale and

v(0)=𝔼{τDc0}=1d𝔼{|B0(τDc)|2}𝖽𝗂𝖺𝗆(D)2/d.v(0)=\mathbb{E}\left\{\tau^{0}_{D^{c}}\right\}=\frac{1}{d}\mathbb{E}\left\{|B^{0}(\tau_{D^{c}})|^{2}\right\}\leq{\sf diam}(D)^{2}/d.

Proof of 2.10.

i). Using 2.9 we have that,

𝔼{0τDcxf(Bx(t))𝑑t}\displaystyle\mathbb{E}\left\{\int_{0}^{\tau^{x}_{D^{c}}}f(B^{x}(t))\;dt\right\} =n0𝔼{τnxτn+1xf(Bx(t))𝑑t}\displaystyle=\sum\limits_{n\geq 0}\mathbb{E}\left\{\int_{\tau^{x}_{n}}^{\tau^{x}_{n+1}}f(B^{x}(t))\;dt\right\}
=n1𝔼{r~2(Bτn1xx)K0f(Bτn1xx+r~(Bτn1xx))}\displaystyle=\sum\limits_{n\geq 1}\mathbb{E}\left\{\widetilde{r}^{2}(B^{x}_{\tau^{x}_{n-1}})K_{0}f(B^{x}_{\tau^{x}_{n-1}}+\widetilde{r}(B^{x}_{\tau^{x}_{n-1}})\cdot)\right\}
=𝔼{k1r~2(Xk1x)K0f(Xk1x+r~(Xk1x))},xD,\displaystyle=\mathbb{E}\left\{\sum\limits_{k\geq 1}\widetilde{r}^{2}(X^{x}_{k-1})K_{0}f(X^{x}_{k-1}+\widetilde{r}(X^{x}_{k-1})\cdot)\right\},\quad x\in D,

where the second equality follows by the strong Markov and the scaling properties of Brownian motion, whilst the last equality follows from the fact that the law of (Xnx)n0(X^{x}_{n})_{n\geq 0} under ~\widetilde{\mathbb{P}} and the law of (Bτnxx)n0(B^{x}_{\tau^{x}_{n}})_{n\geq 0} under 0\mathbb{P}^{0} are the same. Therefore, the statement follows by 2.2.

ii). The claim follows from the fact that μ(A)=w(0)\mu(A)=w(0), where ww solves

{12Δw=d1A in B(0,1)w=0 on B(0,1)\left\{\begin{array}[]{ll}-\frac{1}{2}\Delta w=d1_{A}&\,\textrm{ in }B(0,1)\\[2.84526pt] \phantom{\Delta}w=0&\,\textrm{ on }\partial B(0,1)\end{array}\right.

for all A(B(0,1))A\in\mathcal{B}(B(0,1)).

iii). We use conditional expectation, namely for every xDx\in D

𝔼\displaystyle\mathbb{E} {k1r~2(Xk1x)f(Xk1x+r~(Xk1x)Y)}=k1𝔼{r~2(Xk1x)f(Xk1x+r~(Xk1x)Y)}\displaystyle\left\{\sum\limits_{k\geq 1}\widetilde{r}^{2}(X^{x}_{k-1})f(X^{x}_{k-1}+\widetilde{r}(X^{x}_{k-1})Y)\right\}=\sum\limits_{k\geq 1}\mathbb{E}\left\{\widetilde{r}^{2}(X^{x}_{k-1})f(X^{x}_{k-1}+\widetilde{r}(X^{x}_{k-1})Y)\right\}
=k1𝔼{𝔼[r~2(Xk1x)f(Xk1x+r~(Xk1x)Y)|Xk1x]}=dk1𝔼{r~2(Xk1x)K0f(Xk1x+r~(Xk1x))},\displaystyle=\sum\limits_{k\geq 1}\mathbb{E}\left\{\mathbb{E}\left[\widetilde{r}^{2}(X^{x}_{k-1})f(X^{x}_{k-1}+\widetilde{r}(X^{x}_{k-1})Y)|\,X_{k-1}^{x}\right]\right\}=d\sum\limits_{k\geq 1}\mathbb{E}\left\{\widetilde{r}^{2}(X^{x}_{k-1})K_{0}f(X^{x}_{k-1}+\widetilde{r}(X^{x}_{k-1})\cdot)\right\},

where for the last equality we used that YY has distribution μ\mu and is independent of Xk1xX_{k-1}^{x}. ∎

Proof of 2.4.

Recall that 𝔼{τDcx}=v(x),xD\mathbb{E}\{\tau^{x}_{D^{c}}\}=v(x),x\in D, where vv is given by (2.3). The idea is to explicitly solve for v(x)=2w(|x|)v(x)=-2w(|x|) in radial form as

w′′(r)+n1rw(r)=1 with w(R0)=0,w(R1)=0w^{\prime\prime}(r)+\frac{n-1}{r}w^{\prime}(r)=1\text{ with }w(R_{0})=0,w(R_{1})=0

which is explicitly solved as

w(r)=r22n+C1r2n+C2 with C1=R12R022n(R02nR12n),C2=R0nR1n2n(R1n2R0n2).w(r)=\frac{r^{2}}{2n}+C_{1}r^{2-n}+C_{2}\text{ with }C_{1}=\frac{R_{1}^{2}-R_{0}^{2}}{2n(R_{0}^{2-n}-R_{1}^{2-n})},C_{2}=\frac{R_{0}^{n}-R_{1}^{n}}{2n(R_{1}^{n-2}-R_{0}^{n-2})}.

Now, if we start from a point x we can use the mean value theorem to obtain first that

𝔼x{τDc}=2w(|x|)2d(x,A(R0,R1))|w(r0)|\mathbbm{E}_{x}\{\tau_{D^{c}}\}=-2w(|x|)\leq 2d(x,\partial A(R_{0},R_{1}))|w^{\prime}(r_{0})|

for some point r0[R0,R1]r_{0}\in[R_{0},R_{1}]. Therefore our task now is to estimate the above derivative, which we can compute explicitly as

w(r)=rn+2n2nR12R02R02nR12nr1n.w^{\prime}(r)=\frac{r}{n}+\frac{2-n}{2n}\frac{R_{1}^{2}-R_{0}^{2}}{R_{0}^{2-n}-R_{1}^{2-n}}r^{1-n}.

To estimate this in a transparent way we set R1=ρR0R_{1}=\rho R_{0} and r=tR0r=tR_{0} for 1tρ1\leq t\leq\rho. In these new notations we have

w^{\prime}(r)=\frac{R_{0}}{n}\left(t-\frac{(2-n)(\rho^{2}-1)}{2(\rho^{2-n}-1)}t^{1-n}\right).

Now, it is an elementary matter that for two functions f,g:[a,b]f,g:[a,b]\to\mathbb{R} which are differentiable on (a,b)(a,b) (and gg non-vanishing), we can find a point ξ(a,b)\xi\in(a,b) such that

f(b)f(a)g(b)g(a)=f(ξ)g(ξ).\frac{f(b)-f(a)}{g(b)-g(a)}=\frac{f^{\prime}(\xi)}{g^{\prime}(\xi)}.

Using this fact for a=1,b=ρa=1,b=\rho, f(x)=x2f(x)=x^{2} and g(x)=x2ng(x)=x^{2-n} we argue that for some point ξ(1,ρ)\xi\in(1,\rho) we have

w(r)=R0tn(1(ξ/t)n).w^{\prime}(r)=\frac{R_{0}t}{n}\left(1-(\xi/t)^{n}\right).

As a function of t[1,ρ]t\in[1,\rho] the above function is increasing, thus we have

R0n(1ξn)w(r)R0ρn(1(ξ/ρ)n).\frac{R_{0}}{n}\left(1-\xi^{n}\right)\leq w^{\prime}(r)\leq\frac{R_{0}\rho}{n}\left(1-(\xi/\rho)^{n}\right). (4.2)

Therefore in order to control |w^{\prime}(r)| it suffices to control the absolute values of the two bounds above. The right hand side bound is easy because 1-x^{n}\leq n(1-x) for any x\in(0,1), thus we obtain that

R0ρ1(ξ/ρ)nnR0ρ(1ξ/ρ)=R0(ρξ)R1R0.R_{0}\rho\frac{1-(\xi/\rho)^{n}}{n}\leq R_{0}\rho(1-\xi/\rho)=R_{0}(\rho-\xi)\leq R_{1}-R_{0}. (4.3)

The left hand side of (4.2) in absolute value is bounded by

|R0(1ξn)n|=R0(ξn1)n=R0n((n2)(ρ21)2(1ρ2n)1)=R0n((n2)(ρ21)ρn22(ρn21)1)=R0n((n2)(ρ21)2+(n2)(ρ21)2(ρn21)1){R03((ρ21)2+ρ12),n=3R02n(ρ21),n4R0(ρ1)ρ2(R1R0)R12R0,\begin{split}\left|\frac{R_{0}(1-\xi^{n})}{n}\right|&=\frac{R_{0}(\xi^{n}-1)}{n}=\frac{R_{0}}{n}\left(\frac{(n-2)(\rho^{2}-1)}{2(1-\rho^{2-n})}-1\right)=\frac{R_{0}}{n}\left(\frac{(n-2)(\rho^{2}-1)\rho^{n-2}}{2(\rho^{n-2}-1)}-1\right)\\ &=\frac{R_{0}}{n}\left(\frac{(n-2)(\rho^{2}-1)}{2}+\frac{(n-2)(\rho^{2}-1)}{2(\rho^{n-2}-1)}-1\right)\\ &\leq\begin{cases}\frac{R_{0}}{3}\left(\frac{(\rho^{2}-1)}{2}+\frac{\rho-1}{2}\right),&n=3\\ \frac{R_{0}}{2n}(\rho^{2}-1),&n\geq 4\end{cases}\\ &\leq\frac{R_{0}(\rho-1)\rho}{2}\\ &\leq\frac{(R_{1}-R_{0})R_{1}}{2R_{0}},\end{split} (4.4)

where we go back to the fact that \xi^{n}=\frac{(n-2)(\rho^{2}-1)}{2(1-\rho^{2-n})} and in the second line we used that 2(\rho^{n-2}-1)\geq(n-2)(\rho^{2}-1), which holds because \rho\geq 1 and n\geq 4. Thus combining (4.3) and (4.4) we get that

|w(r)|(R1R0)R12R0|w^{\prime}(r)|\leq\frac{(R_{1}-R_{0})R_{1}}{2R_{0}}

which is our claim. ∎

Proof of 2.6.

For x\in D, we pick a point y\in\partial D such that d(x,y)=d(x,\partial D) and notice that \tau_{D^{c}}\leq\tau_{A(y,R_{0,y},R_{1,y})^{c}}. At the same time, since d(x,\partial A(y,R_{0,y},R_{1,y}))=d(x,\partial D), we employ 2.4 to deduce that

𝔼{τDcx}𝔼{τA(y,R0,y,R1,y)cx}2d(x,D)𝖺𝖽𝗂𝖺𝗆(D).\mathbbm{E}\left\{\tau^{x}_{D^{c}}\right\}\leq\mathbbm{E}\left\{\tau^{x}_{A(y,R_{0,y},R_{1,y})^{c}}\right\}\leq 2d(x,\partial D){\sf adiam}(D).

4.2 Proofs for Subsection 2.2

Proof of 2.11.

The proof goes through several steps.

The first step observes the following two basic facts which can be easily checked by direct computations. On the ball of radius rr centered at 0 we have

if v(x)=1γ2d|x|2, for 0<γ<2dr2, then Δvγv\text{if }v(x)=1-\frac{\gamma}{2d}|x|^{2},\text{ for }0<\gamma<\frac{2d}{r^{2}},\text{ then }\Delta v\leq-\gamma v (4.5)

and

for v(x)=1γ2d+γr2|x|2 with 0<γ, then Δvγv.\text{for }v(x)=1-\frac{\gamma}{2d+\gamma r^{2}}|x|^{2}\text{ with }0<\gamma,\text{ then }\Delta v\geq-\gamma v. (4.6)

The second step consists in proving some estimates for the exit time of the Brownian motion from the ball of radius rr and centered at 0. Denote by τ\tau this exit time for the Brownian motion started at the origin. Then

𝔼{eγτ}11γ2dr2 for 0<γ<2dr2\mathbbm{E}\left\{e^{\gamma\tau}\right\}\leq\frac{1}{1-\frac{\gamma}{2d}r^{2}}\text{ for }0<\gamma<\frac{2d}{r^{2}} (4.7)

and

𝔼{eγτ}1+γ2dr2 for 0<γ.\mathbbm{E}\left\{e^{\gamma\tau}\right\}\geq 1+\frac{\gamma}{2d}r^{2}\text{ for }0<\gamma. (4.8)

The proofs of (4.7) and (4.8) are based on the previous step. For example, using (4.5) we learn that Δv+γv0\Delta v+\gamma v\leq 0 for v(x)=1γ2d|x|2v(x)=1-\frac{\gamma}{2d}|x|^{2} and this combined with Itô’s formula means that eγtv(Bt)e^{\gamma t}v(B_{t}) is a supermartingale. In particular, stopping it at time τ\tau, we obtain that

𝔼{eγτv(Bτ)}v(0)\mathbbm{E}\left\{e^{\gamma\tau}v(B_{\tau})\right\}\leq v(0)

from which we deduce (4.7).

In a similar fashion using (4.6) we can deduce (4.8).

With these two steps at hand we can move to proving the actual result. To proceed, we take U_{1},U_{2},\dots the iid sequence of uniform random variables on the unit sphere in \mathbb{R}^{d} which drives the walk on spheres. Now set N_{k}^{x} to denote the number of steps to the \epsilon-shell for the walk on spheres using the random variables U_{k},U_{k+1},\dots. Notice that, for a fixed point x, the variables N_{k}^{x}, k=1,2,\dots, all have the same distribution. Also, set T_{\beta}(x,U)=x+\widetilde{r}(x)U, the point on the sphere of radius \widetilde{r}(x) determined by the first step of the walk on spheres driven by U. The key now is the fact that

N1x=1+𝟙Tβ(x,U1)ΩϵN2Tβ(x,U1).N^{x}_{1}=1+\mathbbm{1}_{T_{\beta}(x,U_{1})\in\Omega_{\epsilon}}N_{2}^{T_{\beta}(x,U_{1})}. (4.9)

The intuitive explanation of this is rather simple: the walk on spheres starts with its first step. If we land in the \epsilon-shell we stop. Otherwise we have to start again, but this time we have already used the random variable U_{1}, so we have to base the remaining walk on spheres on U_{2},U_{3},\dots.
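The recursion (4.9) is precisely what a direct simulation of the walk on spheres implements. For illustration only, the following Python sketch counts the number of steps N^x on the unit ball, with the particular choice r̃(x)=1−|x| (the exact distance, so β=1); this concrete domain and choice of r̃ are ours and are not assumed in the proof.

import numpy as np

def wos_steps(x, eps, rng):
    # One run of the walk on spheres: jump to a uniform point on the sphere
    # of radius r~(x) = 1 - |x| around x until the eps-shell is reached.
    x = np.asarray(x, dtype=float)
    n = 0
    while 1.0 - np.linalg.norm(x) > eps:
        u = rng.standard_normal(x.shape)
        u /= np.linalg.norm(u)                   # the driving variable U_k
        x = x + (1.0 - np.linalg.norm(x)) * u    # x <- T_beta(x, U_k)
        n += 1
    return n

rng = np.random.default_rng(1)
print(np.mean([wos_steps(np.array([0.3, 0.2, 0.1]), 1e-3, rng) for _ in range(500)]))

Empirically the average number of steps grows only slowly as ε decreases, which is consistent with the exponential moment and tail bounds established in this proof.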

Using now (4.9) we can write that

𝔼{eλNx}=𝔼{𝔼[eλNx|U1]}=𝔼{eλ𝔼[eλ𝟙Tβ(x,U1)ΩϵN2Tβ(x,U1)|U1]},\mathbbm{E}\left\{e^{\lambda N^{x}}\right\}=\mathbbm{E}\left\{\mathbbm{E}[e^{\lambda N^{x}}|\,U_{1}]\right\}=\mathbbm{E}\left\{e^{\lambda}\mathbbm{E}[e^{\lambda\mathbbm{1}_{T_{\beta}(x,U_{1})\in\Omega_{\epsilon}}N^{T_{\beta}(x,U_{1})}_{2}}|\,U_{1}]\right\}, (4.10)

where we used conditioning with respect to the first random variable U1U_{1}.

Now we are going to use a λ\lambda such that

eλ𝔼{eγτ1},e^{\lambda}\leq\mathbbm{E}\left\{e^{\gamma\tau_{1}}\right\},

where τ1\tau_{1} is the first exit time of the Brownian motion from the ball of radius r~(x)βϵ\widetilde{r}(x)\geq\beta\epsilon starting at xx. This is the place where we can use the estimate (4.8) to show that λ=log(1+γβ2ϵ2/2d)\lambda=\log(1+\gamma\beta^{2}\epsilon^{2}/2d) is sufficient to guarantee the above estimate. Notice the key point here, namely the fact that U1U_{1} has the same distribution as Bτ1x|Bτ1x|\frac{B_{\tau_{1}}-x}{|B_{\tau_{1}}-x|} where BtB_{t} is the Brownian motion started at xx and τ1\tau_{1} denotes the exit time of the Brownian motion from the ball of radius r~(x)\widetilde{r}(x).

Thus now we use this to argue that

𝔼{eλNx}𝔼{eγτ1+λ𝟙Tβ(x,U1)ΩϵN2Tβ(x,U1)}=𝔼{eγτ1+λ𝟙Bτ1ΩϵN2Bτ1}.\mathbbm{E}\left\{e^{\lambda N^{x}}\right\}\leq\mathbbm{E}\left\{e^{\gamma\tau_{1}+\lambda\mathbbm{1}_{T_{\beta}(x,U_{1})\in\Omega_{\epsilon}}N^{T_{\beta}(x,U_{1})}_{2}}\right\}=\mathbbm{E}\left\{e^{\gamma\tau_{1}+\lambda\mathbbm{1}_{B_{\tau_{1}}\in\Omega_{\epsilon}}N^{B_{\tau_{1}}}_{2}}\right\}. (4.11)

Now repeating this one more step using the new starting point Bτ1B_{\tau_{1}} we will get

\mathbbm{E}\left\{e^{\lambda N^{x}}\right\}\leq\mathbbm{E}\left\{e^{\gamma(\tau_{1}+\mathbbm{1}_{T_{\beta}(x,U_{1})\in\Omega_{\epsilon}}\tau_{2})+\lambda\mathbbm{1}_{T_{\beta}(T_{\beta}(x,U_{1}),U_{2})\in\Omega_{\epsilon}}N^{T_{\beta}(T_{\beta}(x,U_{1}),U_{2})}_{3}}\right\}=\mathbbm{E}\left\{e^{\gamma(\tau_{1}+\mathbbm{1}_{B_{\tau_{1}}\in\Omega_{\epsilon}}\tau_{2})+\lambda\mathbbm{1}_{B_{\tau_{1}+\tau_{2}}\in\Omega_{\epsilon}}N^{B_{\tau_{1}+\tau_{2}}}_{3}}\right\}.

Repeating this we finally obtain that

𝔼{eλNx}𝔼{eγτΩϵ}.\mathbbm{E}\left\{e^{\lambda N^{x}}\right\}\leq\mathbbm{E}\left\{e^{\gamma\tau_{\Omega_{\epsilon}}}\right\}. (4.12)

Here we use \gamma>0 and \lambda=\log(1+\frac{\gamma\beta^{2}\epsilon^{2}}{2d}). To finish the proof, we now need to estimate the right hand side in (4.12). To do this we enclose the domain D in the ball of radius {\sf diam}(D) centered at x and use (4.7) with r={\sf diam}(D) (in the displays below we write, with a slight abuse of notation, D for {\sf diam}(D)) to get that

𝔼{eλNx}11γD22d for λ=log(1+γβ2ϵ22d) and γ<2dD2.\mathbbm{E}\left\{e^{\lambda N^{x}}\right\}\leq\frac{1}{1-\frac{\gamma D^{2}}{2d}}\text{ for }\lambda=\log(1+\frac{\gamma\beta^{2}\epsilon^{2}}{2d})\text{ and }\gamma<\frac{2d}{D^{2}}.

Finally, using that log(1+x)x/2\log(1+x)\geq x/2 for 0x10\leq x\leq 1 and choosing γ=dD2\gamma=\frac{d}{D^{2}}, we obtain (2.14).

The second inequality of the statement is obtained based on the first estimate and Markov inequality:

(NxR)\displaystyle\mathbb{P}\left(N^{x}\geq R\right) =(eβ2ε24D2Nxeβ2ε24D2R)𝔼{eβ2ε24D2Nx}eβ2ε24D2R\displaystyle=\mathbb{P}\left(e^{\frac{\beta^{2}\varepsilon^{2}}{4D^{2}}N^{x}}\geq e^{\frac{\beta^{2}\varepsilon^{2}}{4D^{2}}R}\right)\leq\mathbb{E}\left\{e^{\frac{\beta^{2}\varepsilon^{2}}{4D^{2}}N^{x}}\right\}e^{-\frac{\beta^{2}\varepsilon^{2}}{4D^{2}}R}
2eβ2ε24D2R.\displaystyle\leq 2e^{-\frac{\beta^{2}\varepsilon^{2}}{4D^{2}}R}.

Proof of 2.14.

The \delta-defective convexity condition for this region amounts to the inequality:

𝒜R1,R2φ(x)r(x)𝑑xδR2R1A(R1,R2)φ(x)𝑑x for all 0φCc(A(R1,R2)),-\int_{\mathcal{A}_{R_{1},R_{2}}}\nabla\varphi(x)\nabla r(x)\,dx\leq\frac{\delta}{R_{2}-R_{1}}\int_{A(R_{1},R_{2})}\varphi(x)\,dx\quad\mbox{ for all }0\leq\varphi\in C_{c}^{\infty}(A(R_{1},R_{2})),

where we used the fact that the distance function rr is Lipschitz, hence one can integrate by parts and discard the boundary terms due to the fact that φCc(A(R1,R2),+)\varphi\in C_{c}^{\infty}(A(R_{1},R_{2}),\mathbb{R}_{+}).

We use the fact that on A(R1,R1+R22)A\left(R_{1},\frac{R_{1}+R_{2}}{2}\right) and A(R1+R22,R2)A\left(\frac{R_{1}+R_{2}}{2},R_{2}\right) the function rr is in fact smooth and we integrate by parts on each region to obtain that the left hand side above becomes:

\displaystyle\int_{A\left(R_{1},\frac{R_{1}+R_{2}}{2}\right)}\varphi(x)\Delta r(x)\,dx+\int_{S^{2}}\int_{r=\frac{R_{1}+R_{2}}{2}}\varphi(\sigma,s)r^{\prime}(s)\,s^{d-1}\,ds\,d\sigma
\displaystyle+\int_{A\left(\frac{R_{1}+R_{2}}{2},R_{2}\right)}\varphi(x)\Delta r(x)\,dx-\int_{S^{2}}\int_{r=\frac{R_{1}+R_{2}}{2}}\varphi(\sigma,s)r^{\prime}(s)\,s^{d-1}\,ds\,d\sigma
\displaystyle=\int_{A\left(R_{1},R_{2}\right)}\varphi(x)\Delta r(x)\,dx,

where we used spherical coordinates on the boundary and denoted by prime the derivative in the radial direction. Also, the Δr(x)\Delta r(x) is well defined, in a classical sense, on A(R1,R2)A(R_{1},R_{2}) except for the set of measure zero that in spherical coordinates is given by {(r,σ);r=R1+R22,σ𝕊2}\{(r,\sigma);r=\frac{R_{1}+R_{2}}{2},\sigma\in\mathbb{S}^{2}\}. Noting that:

r(σ,s)={sR1 for all σ𝕊2,s(R1,R1+R22),R2s for all σ𝕊2,s(R1+R22,R2)r(\sigma,s)=\left\{\begin{array}[]{ll}s-R_{1}&\mbox{ for all }\sigma\in\mathbb{S}^{2},s\in(R_{1},\frac{R_{1}+R_{2}}{2}),\\[5.69054pt] R_{2}-s&\mbox{ for all }\sigma\in\mathbb{S}^{2},s\in(\frac{R_{1}+R_{2}}{2},R_{2})\end{array}\right.

we have r1r^{\prime}\equiv 1 for all xA(R1,R1+R22)x\in A\left(R_{1},\frac{R_{1}+R_{2}}{2}\right) and r1r^{\prime}\equiv-1 for all xA(R1+R22,R2)x\in A\left(\frac{R_{1}+R_{2}}{2},R_{2}\right), hence the δ\delta-defective convexity condition amounts to the inequality:

(d1)(𝕊2R1R1+R22φ(σ,s)sd2𝑑s𝕊2R1+R22R2φ(σ,s)sd2𝑑s)δR2R1𝕊2R1R2φ(σ,s)sd1𝑑s𝑑σ,(d-1)\bigg{(}\int_{\mathbb{S}^{2}}\int_{R_{1}}^{\frac{R_{1}+R_{2}}{2}}\varphi(\sigma,s)s^{d-2}ds-\int_{\mathbb{S}^{2}}\int_{\frac{R_{1}+R_{2}}{2}}^{R_{2}}\varphi(\sigma,s)s^{d-2}ds\bigg{)}\leq\frac{\delta}{R_{2}-R_{1}}\int_{\mathbb{S}^{2}}\int_{R_{1}}^{R_{2}}\varphi(\sigma,s)s^{d-1}\,ds\,d\sigma,

which (taking into account that φ0\varphi\geq 0) is satisfied if for instance R2R1<1+δd1\frac{R_{2}}{R_{1}}<1+\frac{\delta}{d-1}. ∎

Proof of 2.15.

Arguing similarly as in the previous example, the δ\delta-defective convexity condition becomes:

Dεφ(x)Δr(x)𝑑xδεDεφ(x)𝑑x for all 0φCc(Dε),\int_{D_{\varepsilon}}\varphi(x)\Delta r(x)\,dx\leq\frac{\delta}{{\varepsilon}}\int_{D_{\varepsilon}}\varphi(x)\,dx\quad\mbox{ for all }0\leq\varphi\in C_{c}^{\infty}(D_{\varepsilon}), (4.13)

where Δr(x)\Delta r(x) is well-defined, in a classical sense, except on the set of measure zero Γε:={x+12εn(x)d|xΓ}\Gamma_{\varepsilon}:=\big{\{}x+\frac{1}{2}{\varepsilon}\,\,n(x)\in\mathbb{R}^{d}\ \big{|}\ x\in\Gamma\big{\}}. We also denote Dε+:={x+εtn(x)d|(x,t)Γ×(12,1)}D_{\varepsilon}^{+}:=\big{\{}x+{\varepsilon}\,t\,n(x)\in\mathbb{R}^{d}\ \big{|}\ (x,t)\in\Gamma\times(\frac{1}{2},1)\big{\}} respectively Dε:={x+εtn(x)d|(x,t)Γ×(0,12)}D_{\varepsilon}^{-}:=\big{\{}x+{\varepsilon}\,t\,n(x)\in\mathbb{R}^{d}\ \big{|}\ (x,t)\in\Gamma\times(0,\frac{1}{2})\big{\}}.

We recall (see for instance [19], Lemma 14.17, p. 355) that

Δr(x,t)={i=1d1ki(x)1ki(x)r(x,t) if t(0,12)i=1d1ki(x)1ki(x)(εr(x,t)) if t(12,1),\Delta r(x,t)=\left\{\begin{array}[]{ll}\sum_{i=1}^{d-1}\frac{k_{i}(x)}{1-k_{i}(x)r(x,t)}&\textrm{ if }t\in(0,\frac{1}{2})\\ -\sum_{i=1}^{d-1}\frac{k_{i}(x)}{1-k_{i}(x)({\varepsilon}-r(x,t))}&\textrm{ if }t\in(\frac{1}{2},1)\end{array},\right.

hence we have:

Dεφ(x)Δr(x)𝑑xDεφ(x)(i=1d12ki(x)2ki(x)ε)𝑑xDε+φ(x)(i=1d1ki(x))𝑑x\int_{D_{\varepsilon}}\varphi(x)\Delta r(x)\,dx\leq\int_{D_{\varepsilon}^{-}}\varphi(x)\left(\sum_{i=1}^{d-1}\frac{2k_{i}(x)}{2-k_{i}(x){\varepsilon}}\right)\,dx-\int_{D_{\varepsilon}^{+}}\varphi(x)\left(\sum_{i=1}^{d-1}k_{i}(x)\right)\,dx

thus the condition (2.15) holds for suitably small ε>0{\varepsilon}>0. ∎

Proof of 2.16.

We split the proof in two steps.

Step I (Regularization). Let Dn:={xD:r(x)>1/n}D_{n}:=\{x\in D:r(x)>1/n\} for each nn0n\geq n_{0}, where n0n_{0} is such that Dn0D_{n_{0}}\neq\emptyset. Further, let ρ0\rho\geq 0 be a (smooth) mollifier on d\mathbb{R}^{d} such that SuppρB(0,1){\rm Supp}\,{\rho}\subset B(0,1), and set ρt():=tdρ(/t)\rho_{t}(\cdot):=t^{-d}\rho(\cdot/t), t>0t>0. In particular,

Suppρt+DnD for all t<1/n0.{\rm Supp}\,{\rho_{t}}+D_{n}\subset D\quad\mbox{ for all }t<1/n_{0}. (4.14)

Now we extend r from \overline{D} to \mathbb{R}^{d} by setting r\equiv 0 on \mathbb{R}^{d}\setminus\overline{D}, and set

Vt=rt, where rt:=ρtrCc2(d).V_{t}=\sqrt{r_{t}},\;\mbox{ where }\;r_{t}:=\rho_{t}\ast r\in C_{c}^{2}(\mathbb{R}^{d}).

We claim that

  1. i)

    Δrtδ2rad(D)\Delta r_{t}\leq\frac{\delta}{2\;\rm{rad}(D)} on DnD_{n} for all nn0n\geq n_{0} and 0<t<1/n00<t<1/{n_{0}}.

  2. ii)

    ΔVt14rt3/2[|rt|2δ]\Delta V_{t}\leq-\frac{1}{4}{r_{t}}^{-3/2}[|\nabla r_{t}|^{2}-\delta] on DnD_{n} for all nn0n\geq n_{0} and 0<t<1/n00<t<1/{n_{0}}.

To prove the claim, note first that by a simple calculation we get

ΔVt=14rt3/2[|rt|22rtΔrt],\Delta V_{t}=-\frac{1}{4}{r_{t}}^{-3/2}[|\nabla r_{t}|^{2}-2r_{t}\Delta r_{t}],

hence ii) follows from i) and the fact that rtrad(D)r_{t}\leq\rm{rad}(D). So, it remains to prove the first assertion of the claim: Let 0φCc(Dn),nn0,t<1/n00\leq\varphi\in C_{c}^{\infty}(D_{n}),n\geq n_{0},t<1/n_{0}, and proceed with integration by parts and Fubini’s theorem as follows

DnφΔrt𝑑x\displaystyle\int_{D_{n}}\varphi\Delta r_{t}\;dx =Dnrtφdx=dρt(y)Dnr(xy)φ(x)𝑑x𝑑y\displaystyle=-\int_{D_{n}}\nabla r_{t}\cdot\nabla\varphi\;dx=-\int_{\mathbb{R}^{d}}\rho_{t}(y)\int_{D_{n}}\nabla r(x-y)\cdot\nabla\varphi(x)\;dx\;dy
=dρt(y)dr(xy)φ(x)𝑑x𝑑y\displaystyle=-\int_{\mathbb{R}^{d}}\rho_{t}(y)\int_{\mathbb{R}^{d}}\nabla r(x-y)\cdot\nabla\varphi(x)\;dx\;dy
=Suppρtρt(y)dr(x)φ(+y)(x)dxdy,\displaystyle=-\int_{{\rm Supp}\,{\rho_{t}}}\rho_{t}(y)\int_{\mathbb{R}^{d}}\nabla r(x)\cdot\nabla\varphi(\cdot+y)(x)\;dx\;dy,
so, by (4.14) and then using (2.15) we can continue with
=Suppρtρt(y)Dr(x)φ(+y)(x)dxdyδ2rad(D)Suppρtρt(y)Dφ(x+y)dxdy\displaystyle=-\int_{{\rm Supp}\,{\rho_{t}}}\rho_{t}(y)\int_{D}\nabla r(x)\cdot\nabla\varphi(\cdot+y)(x)\;dx\;dy\leq\frac{\delta}{2\;{\rm rad}(D)}\int_{{\rm Supp}\,{\rho_{t}}}\rho_{t}(y)\int_{D}\varphi(x+y)\;dx\;dy
\displaystyle=\frac{\delta}{2\;{\rm rad}(D)}\int_{{\rm Supp}\,{\rho_{t}}}\rho_{t}(y)\int_{\mathbb{R}^{d}}\varphi(x+y)\;dx\;dy=\frac{\delta}{2\;{\rm rad}(D)}\int_{\mathbb{R}^{d}}\varphi(x)\;dx
\displaystyle=\frac{\delta}{2\;{\rm rad}(D)}\int_{D_{n}}\varphi(x)\;dx,

which proves i) and hence the entire claim.

Now, Itô's formula combined with the claim proved above yields:

For xDn we have 𝔼{Vt(Bτ(DnB(x,r~(x)))cx)}\displaystyle\mbox{For }x\in D_{n}\mbox{ we have }\;\mathbb{E}\left\{V_{t}\left(B^{x}_{\tau_{(D_{n}\cap B(x,\widetilde{r}(x)))^{c}}}\right)\right\} =Vt(x)+𝔼{0τ(DnB(x,r~(x)))cΔVt(Bsx)𝑑s}\displaystyle=V_{t}(x)+\mathbb{E}\left\{\int_{0}^{\tau_{(D_{n}\cap B(x,\widetilde{r}(x)))^{c}}}\Delta V_{t}(B_{s}^{x})\;ds\right\}
Vt(x)14𝔼{0τ(DnB(x,r~(x)))c(rt)3/2(Bsx)[|rt|2(Bsx)δ]𝑑s}.\displaystyle\leq V_{t}(x)-\frac{1}{4}\mathbb{E}\left\{\int_{0}^{\tau_{(D_{n}\cap B(x,\widetilde{r}(x)))^{c}}}(r^{t})^{-3/2}(B_{s}^{x})\left[|\nabla r^{t}|^{2}(B_{s}^{x})-\delta\right]\;ds\right\}. (4.15)

Step II (Passing to the limit in (4.15)). The next step is to let t0t\to 0 and then nn\to\infty in (4.15). To this end, note that because rC(D¯)W1,(D)r\in C(\overline{D})\cap W^{1,\infty}(D), we have

  1. iii)

    limt0rt=r\lim\limits_{t\to 0}r_{t}=r uniformly on DD,

  2. iv)

    limt0rt=r\lim\limits_{t\to 0}\nabla r_{t}=\nabla r a.e. and boundedly on DD.

In particular, because infxDnr(x)1/n\inf\limits_{x\in D_{n}}r(x)\geq 1/n for large enough n0n_{0} and nn0n\geq n_{0}, we have that

  1. v)

    limt0rt3/2=r3/2\lim\limits_{t\to 0}r_{t}^{-3/2}=r^{-3/2} boundedly on DnD_{n}, for each nn0n\geq n_{0}.

We are now in the position to let t0t\to 0 in (4.15) to get that for xDnx\in D_{n}

𝔼{V(Bτ(DnB(x,r~(x)))cx)}\displaystyle\mathbb{E}\left\{V\left(B^{x}_{\tau_{(D_{n}\cap B(x,\widetilde{r}(x)))^{c}}}\right)\right\} V(x)1δ4𝔼{0τ(DnB(x,r~(x)))cr3/2(Btx)𝑑t},\displaystyle\leq V(x)-\frac{1-\delta}{4}\mathbb{E}\left\{\int_{0}^{\tau_{(D_{n}\cap B(x,\widetilde{r}(x)))^{c}}}r^{-3/2}(B_{t}^{x})\;dt\right\},

where we have used that |\nabla r|=1 a.e. This follows from two basic facts. On the one hand, r is 1-Lipschitz, so |\nabla r|\leq 1. On the other hand, since r is Lipschitz, Rademacher's theorem guarantees that r is differentiable almost everywhere. If x is a point where r is differentiable, y\in\partial D is such that d(x,y)=r(x), and v=y-x, then it is easy to see that the derivative of r in the direction of v has absolute value 1, which proves the claim.

Now, on the one hand, since r2r(x)r\leq 2r(x) on B(x,r~(x))B(x,r(x))B(x,\widetilde{r}(x))\subset B(x,r(x)) we deduce that

𝔼{V(Bτ(DnB(x,r~(x)))cx)}\displaystyle\mathbb{E}\left\{V\left(B^{x}_{\tau_{(D_{n}\cap B(x,\widetilde{r}(x)))^{c}}}\right)\right\} V(x)1δ4r(x)3/2𝔼{τ(DnB(x,r~(x)))c},xDn.\displaystyle\leq V(x)-\frac{1-\delta}{4r(x)^{3/2}}\mathbb{E}\left\{\tau_{(D_{n}\cap B(x,\widetilde{r}(x)))^{c}}\right\},\quad x\in D_{n}. (4.16)

On the other hand, DnnDD_{n}\mathop{\nearrow}\limits_{n}D, so τ(DnB(x,r~(x)))cnτB(x,r~(x))c\tau_{(D_{n}\cap B(x,\widetilde{r}(x)))^{c}}\mathop{\longrightarrow}\limits_{n}\tau_{B(x,\widetilde{r}(x))^{c}} a.s., hence for all xDx\in D

PV(x)\displaystyle PV(x) =𝔼{V(BτB(x,r~(x))x)}=limn𝔼{V(Bτ(DnB(x,r~(x)))cx)},\displaystyle=\mathbb{E}\left\{V\left(B^{x}_{\tau_{B(x,\widetilde{r}(x))}}\right)\right\}=\lim\limits_{n}\mathbb{E}\left\{V\left(B^{x}_{\tau_{(D_{n}\cap B(x,\widetilde{r}(x)))^{c}}}\right)\right\},
hence letting nn\to\infty in (4.16) we can continue with
PV(x)\displaystyle PV(x) V(x)1δ4r(x)3/2𝔼{τB(x,r~(x))c}=V(x)1δ4r(x)3/2r~(x)2d\displaystyle\leq V(x)-\frac{1-\delta}{4r(x)^{3/2}}\mathbb{E}\left\{\tau_{B(x,\widetilde{r}(x))^{c}}\right\}=V(x)-\frac{1-\delta}{4r(x)^{3/2}}\frac{\widetilde{r}(x)^{2}}{d}
(1β2(1δ)4d)V(x),\displaystyle\leq\left(1-\frac{\beta^{2}(1-\delta)}{4d}\right)V(x),

which proves (2.17). Notice that for the last inequality we used that βr(x)r~(x),xD\beta r(x)\leq\widetilde{r}(x),x\in D.

The tail estimate (2.18) can be deduced from (2.17) and Markov inequality, as follows:

(Nεx>M)\displaystyle\mathbb{P}(N_{\varepsilon}^{x}>M) (r(XMx)ε)=(V(XMx)ε)1ε𝔼{V(XMx)}=1εPMV(x)\displaystyle\leq\mathbb{P}(r(X^{x}_{M})\geq\varepsilon)=\mathbb{P}(V(X^{x}_{M})\geq\sqrt{\varepsilon})\leq\frac{1}{\sqrt{\varepsilon}}\mathbb{E}\left\{V(X^{x}_{M})\right\}=\frac{1}{\sqrt{\varepsilon}}P^{M}V(x)
(1β2(1δ)4d)MV(x)ε,xD.\displaystyle\leq\left(1-\frac{\beta^{2}(1-\delta)}{4d}\right)^{M}\frac{V(x)}{\sqrt{\varepsilon}},\quad x\in D.

Let us finally conclude the proof by showing (2.19):

𝔼{aNεx}\displaystyle\mathbb{E}\left\{a^{N_{\varepsilon}^{x}}\right\} =k0ak(Nεx=k)1+k1ak(Nεx>k1)\displaystyle=\sum\limits_{k\geq 0}a^{k}\mathbb{P}(N_{\varepsilon}^{x}=k)\leq 1+\sum\limits_{k\geq 1}a^{k}\mathbb{P}(N_{\varepsilon}^{x}>k-1)
1+k1ak(1β2(1δ)4d)k1V(x)ε=1+aV(x)εk0ak(1β2(1δ)4d)k\displaystyle\leq 1+\sum\limits_{k\geq 1}a^{k}\left(1-\frac{\beta^{2}(1-\delta)}{4d}\right)^{k-1}\frac{V(x)}{\sqrt{\varepsilon}}=1+a\frac{V(x)}{\sqrt{\varepsilon}}\sum\limits_{k\geq 0}a^{k}\left(1-\frac{\beta^{2}(1-\delta)}{4d}\right)^{k}
=1+a1aδdV(x)ε,xD.\displaystyle=1+\frac{a}{1-a\delta_{d}}\frac{V(x)}{\sqrt{\varepsilon}},\quad x\in D.

4.3 Proofs for Subsection 2.3

Proof of 2.19.

First of all, note that by similar arguments to those used in the proof of 2.10,

uM(x)=𝔼{n=0M1τnxτn+1xf(Btx)𝑑t}=𝔼{0τMxf(Btx)𝑑t},xD.u_{M}(x)=\mathbb{E}\left\{\sum\limits_{n=0}^{M-1}\int_{\tau_{n}^{x}}^{\tau_{n+1}^{x}}f(B^{x}_{t})\;dt\right\}=\mathbb{E}\left\{\int_{0}^{\tau_{M}^{x}}f(B^{x}_{t})\;dt\right\},\quad x\in D.

Therefore,

|u(x)uM(x)|\displaystyle|u(x)-u_{M}(x)| =|𝔼{τMxτDcf(Btx)𝑑t}||f|𝔼{τDcτMx}=|f|𝔼x{𝔼BτMxx{τDc}}\displaystyle=\left|\mathbb{E}\left\{\int_{\tau_{M}^{x}}^{\tau_{D^{c}}}f(B^{x}_{t})\;dt\right\}\right|\leq|f|_{\infty}\mathbb{E}\left\{\tau_{D^{c}}-\tau_{M}^{x}\right\}=|f|_{\infty}\mathbb{E}^{x}\left\{\mathbb{E}^{B^{x}_{\tau_{M}^{x}}}\left\{\tau_{D^{c}}\right\}\right\}
=|f|𝔼{v(BτMxx)}=|f|𝔼{v(BτMxx);NεxM}+|f|𝔼{v(BτMxx);Nεx>M}\displaystyle=|f|_{\infty}\mathbb{E}\left\{v(B^{x}_{\tau_{M}^{x}})\right\}=|f|_{\infty}\mathbb{E}\left\{v(B^{x}_{\tau_{M}^{x}});N_{\varepsilon}^{x}\leq M\right\}+|f|_{\infty}\mathbb{E}\left\{v(B^{x}_{\tau_{M}^{x}});N_{\varepsilon}^{x}>M\right\}
|f|[𝔼{v(BτMNεxxx);NεxM}+|v|(Nεx>M)]\displaystyle\leq|f|_{\infty}\left[\mathbb{E}\left\{v(B^{x}_{\tau_{M\vee N_{\varepsilon}^{x}}^{x}});N_{\varepsilon}^{x}\leq M\right\}+|v|_{\infty}\mathbb{P}(N_{\varepsilon}^{x}>M)\right]
|f|[𝔼{v(BτMNεxxx);NεxM}+1d𝖽𝗂𝖺𝗆(D)22eβ2ε24𝖽𝗂𝖺𝗆(D)2M],xD,\displaystyle\leq|f|_{\infty}\left[\mathbb{E}\left\{v(B^{x}_{\tau_{M\vee N_{\varepsilon}^{x}}^{x}});N_{\varepsilon}^{x}\leq M\right\}+\frac{1}{d}{\sf diam}(D)^{2}2e^{-\frac{\beta^{2}\varepsilon^{2}}{4{\sf diam}(D)^{2}}M}\right],\quad x\in D,

where the last inequality follows by 2.3 and 2.11. Now, one can see that (v(Bτnβ))n0(v(B_{\tau_{n}^{\beta}}))_{n\geq 0} is a supermartingale, hence by Doob’s stopping theorem we get that

𝔼{v(BτMNεxxx);NεxM}=𝔼{1[NεxM]𝔼[v(BτMNεxxx)Nεx]}𝔼{v(BτNεxxx)}=v(x,ε)v(ε),\displaystyle\mathbb{E}\left\{v(B^{x}_{\tau_{M\vee N_{\varepsilon}^{x}}^{x}});N_{\varepsilon}^{x}\leq M\right\}=\mathbb{E}\left\{1_{[N_{\varepsilon}^{x}\leq M]}\mathbb{E}\left[v(B^{x}_{\tau_{M\vee N_{\varepsilon}^{x}}^{x}})\mid\mathcal{F}_{N_{\varepsilon}^{x}}\right]\right\}\leq\mathbb{E}\left\{v(B^{x}_{\tau_{N_{\varepsilon}^{x}}^{x}})\right\}=v(x,\varepsilon)\leq v_{\infty}(\varepsilon),

hence the desired estimate is now fully justified. ∎

Further, we need the following lemma:

Lemma 4.1.

Let \varepsilon>0, \beta\in(0,1], let \widetilde{r} be a (\beta,\varepsilon)-distance, x\in D, and let (X_{n}^{x})_{n\geq 0} be the corresponding \widetilde{r}-WoS chain. Further, let \tau^{\prime}\geq\tau be finite stopping times such that \tau\geq N_{\varepsilon}^{x}. If g is \alpha-Hölder on \overline{D} for some \alpha\in[0,1] then

𝔼{|g(Xτx)g(Xτx)|}dα/2|g|αv(x,ε)α/2dα/2|g|α|v|α/2(ε).\mathbb{E}\left\{\left|g(X^{x}_{\tau^{\prime}})-g(X^{x}_{\tau})\right|\right\}\leq d^{\alpha/2}|g|_{\alpha}\cdot v(x,\varepsilon)^{\alpha/2}\leq d^{\alpha/2}|g|_{\alpha}\cdot|v|^{\alpha/2}_{\infty}(\varepsilon).

If gCb2(D¯)g\in C^{2}_{b}(\overline{D}) then

|𝔼{g(Xτx)}𝔼{g(Xτx)}||Δg|2v(x,ε)|Δg|2|v|(ε).\left|\mathbb{E}\left\{g(X^{x}_{\tau^{\prime}})\right\}-\mathbb{E}\left\{g(X^{x}_{\tau})\right\}\right|\leq\dfrac{|\Delta g|_{\infty}}{2}\cdot v(x,\varepsilon)\leq\frac{|\Delta g|_{\infty}}{2}\cdot|v|_{\infty}(\varepsilon).
Proof.

It is easy to see that for zdz\in\mathbb{R}^{d}, (Xnx,zx,z)n1(\langle X^{x}_{n},z\rangle-\langle x,z\rangle)_{n\geq 1} is a bounded martingale, hence

𝔼{Xτx,Xτx}=𝔼{|Xτx|2}, and thus 𝔼{|XτxXτx|2}=𝔼{|Xτx|2}𝔼{|Xτx|2}.\mathbb{E}\{\langle X^{x}_{\tau^{\prime}},X^{x}_{\tau}\rangle\}=\mathbb{E}\left\{|X_{\tau}^{x}|^{2}\right\},\quad\mbox{ and thus }\quad\mathbb{E}\left\{|X^{x}_{\tau^{\prime}}-X^{x}_{\tau}|^{2}\right\}=\mathbb{E}\left\{|X^{x}_{\tau^{\prime}}|^{2}\right\}-\mathbb{E}\left\{|X^{x}_{\tau}|^{2}\right\}.

By employing the martingale problem for the Markov chain (Xnx)n0(X_{n}^{x})_{n\geq 0}, we get that for any finite stopping time TT, 𝔼{|XTx|2}=|x|2+𝔼{i=0T1r~2(Xix)},\mathbb{E}\left\{|X^{x}_{T}|^{2}\right\}=|x|^{2}+\mathbb{E}\left\{\sum_{i=0}^{T-1}\widetilde{r}^{2}(X^{x}_{i})\right\}, hence 𝔼{|XτxXτx|2}=𝔼{i=ττr~2(Xix)}\mathbb{E}\left\{|X^{x}_{\tau^{\prime}}-X^{x}_{\tau}|^{2}\right\}=\mathbb{E}\left\{\sum\limits_{i=\tau}^{\tau^{\prime}}\widetilde{r}^{2}(X^{x}_{i})\right\} and therefore

\displaystyle\mathbb{E}\left\{\left|g(X^{x}_{\tau^{\prime}})-g(X^{x}_{\tau})\right|\right\} \displaystyle\leq|g|_{\alpha}\mathbb{E}\left\{\sum_{i=\tau}^{\infty}\widetilde{r}^{2}(X^{x}_{i})\right\}^{\alpha/2}\leq d^{\alpha/2}|g|_{\alpha}\mathbb{E}\left\{\sum_{i=N^{x}_{\varepsilon}}^{\infty}\tau^{x}_{i+1}-\tau^{x}_{i}\right\}^{\alpha/2}
=dα/2|g|α𝔼{τDxτNεxx}α/2=dα/2|g|α𝔼{v(BτNεxxx)}α/2\displaystyle=d^{\alpha/2}|g|_{\alpha}\cdot\mathbb{E}\left\{\tau^{x}_{\partial D}-\tau^{x}_{N^{x}_{\varepsilon}}\right\}^{\alpha/2}=d^{\alpha/2}|g|_{\alpha}\cdot\mathbb{E}\left\{v\left(B^{x}_{\tau^{x}_{N^{x}_{\varepsilon}}}\right)\right\}^{\alpha/2}
dα/2|g|α|v|α/2(ε).\displaystyle\leq d^{\alpha/2}|g|_{\alpha}\cdot|v|_{\infty}^{\alpha/2}(\varepsilon).

Suppose now that gC2(D¯)g\in C^{2}(\overline{D}). Then, by the martingale problem we deduce

|𝔼{g(Xτx)}𝔼{g(Xτx)}|\displaystyle\left|\mathbb{E}\left\{g(X^{x}_{\tau^{\prime}})\right\}-\mathbb{E}\left\{g(X^{x}_{\tau})\right\}\right| =|𝔼{i=ττ1(Pgg)(Xix)}|𝔼{i=Nεx|Pgg|(Xix)}.\displaystyle=\left|\mathbb{E}\left\{\sum_{i=\tau}^{\tau^{\prime}-1}(Pg-g)(X^{x}_{i})\right\}\right|\leq\mathbb{E}\left\{\sum_{i=N^{x}_{\varepsilon}}^{\infty}|Pg-g|(X^{x}_{i})\right\}.

On the other hand, by Itô’s formula

|Pg(z)g(z)|=|𝔼{0τ1β,z12Δg(Bt)dt}|12|Δg|𝔼{τ1β,z}=|Δg|r~(z)22d,|Pg(z)-g(z)|=\left|\mathbb{E}\left\{\int_{0}^{\tau^{\beta,z}_{1}}\frac{1}{2}\Delta g(B_{t})\;\textrm{d}t\right\}\right|\leq\frac{1}{2}|\Delta g|_{\infty}\cdot\mathbb{E}\left\{\tau_{1}^{\beta,z}\right\}=|\Delta g|_{\infty}\cdot\frac{\widetilde{r}(z)^{2}}{2d},

hence

|𝔼{g(Xτx)}𝔼{g(Xτx)}|\displaystyle\left|\mathbb{E}\left\{g(X^{x}_{\tau^{\prime}})\right\}-\mathbb{E}\left\{g(X^{x}_{\tau})\right\}\right| |Δg|2d𝔼{i=Nϵxr~2(Xix)}=|Δg|2𝔼{i=Nϵxτi+1xτix}\displaystyle\leq\dfrac{|\Delta g|_{\infty}}{2d}\cdot\mathbb{E}\left\{\sum_{i=N^{x}_{\epsilon}}^{\infty}\widetilde{r}^{2}(X^{x}_{i})\right\}=\dfrac{|\Delta g|_{\infty}}{2}\cdot\mathbb{E}\left\{\sum_{i=N^{x}_{\epsilon}}^{\infty}\tau^{x}_{i+1}-\tau^{x}_{i}\right\}
=|Δg|2𝔼{τDxτNϵxx}=|Δg|2𝔼{v(BτNϵxxx)}\displaystyle=\dfrac{|\Delta g|_{\infty}}{2}\cdot\mathbb{E}\left\{\tau^{x}_{\partial D}-\tau^{x}_{N^{x}_{\epsilon}}\right\}=\dfrac{|\Delta g|_{\infty}}{2}\cdot\mathbb{E}\left\{v\left(B^{x}_{\tau^{x}_{N^{x}_{\epsilon}}}\right)\right\}
=|Δg|2v(x,ϵ)|Δg|2|v|(ϵ).\displaystyle=\dfrac{|\Delta g|_{\infty}}{2}\cdot v(x,\epsilon)\leq\dfrac{|\Delta g|_{\infty}}{2}\cdot|v|_{\infty}(\epsilon).

Proof of 2.20.

Recall that (Bτnxx)n\left(B^{x}_{\tau^{x}_{n}}\right)_{n} and (Xnx)n(X^{x}_{n})_{n} are equal in law and hence BτNϵxxxB^{x}_{\tau^{x}_{N^{x}_{\epsilon}}} and XNϵxxX^{x}_{N^{x}_{\epsilon}} are also equal in law. In particular,

𝔼{g(BτDcxx)}=limn𝔼{g(Bτnxx)}=limn𝔼{g(Xnx)} for all xD.\mathbb{E}\{g(B^{x}_{\tau^{x}_{D^{c}}})\}=\lim_{n}\mathbb{E}\{g(B^{x}_{\tau^{x}_{n}})\}=\lim_{n}\mathbb{E}\{g(X^{x}_{n})\}\quad\mbox{ for all }x\in D.

Also, if TT is a finite random time then

\lim_{n}\mathbb{E}\{g(X^{x}_{n\vee T})\}=\lim_{n}\left[\mathbb{E}\{g(X^{x}_{n});\;T<n\}+\mathbb{E}\{g(X^{x}_{T});\;T\geq n\}\right]=\mathbb{E}\{g(B^{x}_{\tau^{x}_{D^{c}}})\},\quad x\in D.

Now let us fix ε>0\varepsilon>0 and argue as follows:

|u(x)uM(x)|\displaystyle|u(x)-u_{M}(x)| limn|𝔼{g(Xnx)g(XMx)}|\displaystyle\leq\lim\limits_{n}\left|\mathbb{E}\left\{g(X^{x}_{n})-g(X^{x}_{M})\right\}\right|
\displaystyle\leq\lim\limits_{n}\left|\mathbb{E}\left\{g(X^{x}_{n})-g(X^{x}_{M});\;N_{\varepsilon}^{x}\leq M\right\}\right|+\lim\limits_{n}\left|\mathbb{E}\left\{g(X^{x}_{n})-g(X^{x}_{M});\;N_{\varepsilon}^{x}>M\right\}\right|
limn|𝔼{g(XnMNεxx)g(XMNεxx);NεxM}|+2|g|(Nεx>M).\displaystyle\leq\lim\limits_{n}\left|\mathbb{E}\left\{g(X^{x}_{n\vee M\vee N^{x}_{\varepsilon}})-g\left(X^{x}_{M\vee N_{\varepsilon}^{x}}\right);\;N_{\varepsilon}^{x}\leq M\right\}\right|+2|g|_{\infty}\mathbb{P}(N_{\varepsilon}^{x}>M).

Now, since gg is α\alpha-Hölder, for xDx\in D we have

|u(x)uM(x)|\displaystyle|u(x)-u_{M}(x)| limn𝔼{|g(XnMNεxx)g(XMNεxx)|}+2|g|(Nεx>M),\displaystyle\leq\lim\limits_{n}\mathbb{E}\left\{\left|g(X^{x}_{n\vee M\vee N^{x}_{\varepsilon}})-g\left(X^{x}_{M\vee N_{\varepsilon}^{x}}\right)\right|\right\}+2|g|_{\infty}\mathbb{P}(N_{\varepsilon}^{x}>M),
and by employing 4.1, i), we can continue with
dα/2|g|α|v|α/2(ε)+2|g|(Nεx>M)\displaystyle\leq d^{\alpha/2}|g|_{\alpha}\cdot|v|^{\alpha/2}_{\infty}(\varepsilon)+2|g|_{\infty}\mathbb{P}(N_{\varepsilon}^{x}>M)
dα/2|g|α|v|α/2(ε)+4|g|eβ2ε24𝖽𝗂𝖺𝗆(D)2M,\displaystyle\leq d^{\alpha/2}|g|_{\alpha}\cdot|v|^{\alpha/2}_{\infty}(\varepsilon)+4|g|_{\infty}e^{-\frac{\beta^{2}\varepsilon^{2}}{4{\sf diam}(D)^{2}}M},
where the last inequality is due to 2.11.

4.4 Proofs for Subsection 2.4

Proof of 2.24.

We have

𝔼\displaystyle\mathbb{E} {|u()uMN()|L2(D)2}\displaystyle\left\{\left|u(\cdot)-u_{M}^{N}(\cdot)\right|^{2}_{L^{2}(D)}\right\}
2|u()uM()|L2(D)2+2𝔼{|uM()uMN()|L2(D)2}\displaystyle\leq 2\left|u(\cdot)-u_{M}(\cdot)\right|^{2}_{L^{2}(D)}+2\mathbb{E}\left\{\left|u_{M}(\cdot)-u_{M}^{N}(\cdot)\right|^{2}_{L^{2}(D)}\right\}
\displaystyle\leq 2\lambda(D)\sup\limits_{x\in D}|u(x)-u_{M}(x)|^{2}+2\int\limits_{D}\mathbb{E}\left\{\left|u_{M}(\cdot)-u_{M}^{N}(\cdot)\right|^{2}\right\}\;dx
=2λ(D)supxD|u(x)uM(x)|2+2ND𝔼{|uM()uM1()|2}𝑑x\displaystyle=2\lambda(D)\sup\limits_{x\in D}|u(x)-u_{M}(x)|^{2}+\frac{2}{N}\int\limits_{D}\mathbb{E}\left\{\left|u_{M}(\cdot)-u_{M}^{1}(\cdot)\right|^{2}\right\}\;dx
2λ(D)supxD|u(x)uM(x)|2+4N(D[|g|2+1d2M|f|2𝖽𝗂𝖺𝗆(D)2𝔼{k=1Mr~2(Xk1x,1)}]𝑑x)\displaystyle\leq 2\lambda(D)\sup\limits_{x\in D}|u(x)-u_{M}(x)|^{2}+\frac{4}{N}\left(\int\limits_{D}\left[|g|^{2}_{\infty}+\frac{1}{d^{2}}M|f|^{2}_{\infty}{\sf diam}(D)^{2}\mathbb{E}\left\{\sum\limits_{k=1}^{M}\widetilde{r}^{2}(X_{k-1}^{x,1})\right\}\;\right]dx\right)
2λ(D)supxD|u(x)uM(x)|2+4N(D[|g|2+1d2M|f|2𝖽𝗂𝖺𝗆(D)2𝔼{τDcx}]𝑑x)\displaystyle\leq 2\lambda(D)\sup\limits_{x\in D}|u(x)-u_{M}(x)|^{2}+\frac{4}{N}\left(\int\limits_{D}\left[|g|^{2}_{\infty}+\frac{1}{d^{2}}M|f|^{2}_{\infty}{\sf diam}(D)^{2}\mathbb{E}\left\{\tau^{x}_{D^{c}}\right\}\;\right]dx\right)
and by 2.3
\displaystyle\leq 2\lambda(D)\left[\sup\limits_{x\in D}|u(x)-u_{M}(x)|^{2}+\frac{2\left(|g|^{2}_{\infty}+\frac{1}{d^{3}}M|f|^{2}_{\infty}{\sf diam}(D)^{4}\right)}{N}\right].

Before proving the main result, 2.26, we need several preliminary Lemmas.

Lemma 4.2.

Let ε>0\varepsilon>0 and β(0,1]\beta\in(0,1]. If ff and gg are α\alpha-Hölder for some α[0,1]\alpha\in[0,1], and M,N1M,N\geq 1, then

|uMN(x)uMN(y)|(|g|α+diam(D)2|f|α+2diam(D)|f|d)(2+|r~|1)M(|xy|α|xy|),|u_{M}^{N}(x)-u_{M}^{N}(y)|\leq\left(|g|_{\alpha}+\frac{{\rm diam}(D)^{2}|f|_{\alpha}+2{\rm diam}(D)|f|_{\infty}}{d}\right)(2+|\widetilde{r}|_{1})^{M}\left(|x-y|^{\alpha}\vee|x-y|\right),

for all x,yDx,y\in D almost surely. In particular,

|uM(x)uM(y)|(|g|α+diam(D)2|f|α+2diam(D)|f|d)(2+|r~|1)M(|xy|α|xy|)for all x,yD.|u_{M}(x)-u_{M}(y)|\leq\left(|g|_{\alpha}+\frac{{\rm diam}(D)^{2}|f|_{\alpha}+2{\rm diam}(D)|f|_{\infty}}{d}\right)(2+|\widetilde{r}|_{1})^{M}\left(|x-y|^{\alpha}\vee|x-y|\right)\quad\mbox{for all }x,y\in D.
Proof.

Clearly, it is sufficient to prove the estimate for |u_{M}^{i}(x)-u_{M}^{i}(y)| for a fixed i, 1\leq i\leq N. To this end, since \widetilde{r} is Lipschitz

\displaystyle|X^{x,i}_{M}-X^{y,i}_{M}| \displaystyle\leq|X^{x,i}_{M-1}-X^{y,i}_{M-1}|+|\widetilde{r}(X^{x,i}_{M-1})-\widetilde{r}(X^{y,i}_{M-1})|
(1+|r~|1)|XM1x,iXM1y,i|\displaystyle\leq(1+|\widetilde{r}|_{1})|X^{x,i}_{M-1}-X^{y,i}_{M-1}|
(1+|r~|1)M|xy| for all x,yD,M0.\displaystyle\leq(1+|\widetilde{r}|_{1})^{M}|x-y|\quad\mbox{ for all }x,y\in D,M\geq 0.

Therefore,

|uMi(x)uMi(y)|\displaystyle|u_{M}^{i}(x)-u_{M}^{i}(y)| |g|α|XMx,iXMy,i|α+1dk=1M2|f|diam(D)|r~(Xk1x,i)r~(Xk1y,i)|\displaystyle\leq|g|_{\alpha}|X^{x,i}_{M}-X^{y,i}_{M}|^{\alpha}+\frac{1}{d}\sum\limits_{k=1}^{M}2|f|_{\infty}{\rm diam}(D)|\widetilde{r}(X^{x,i}_{k-1})-\widetilde{r}(X^{y,i}_{k-1})|
\displaystyle\;\phantom{\leq|g|_{\alpha}|X^{x,i}_{M}-X^{y,i}_{M}|^{\alpha}}+\frac{1}{d}\sum\limits_{k=1}^{M}{\rm diam}(D)^{2}|f|_{\alpha}[|X^{x,i}_{k-1}-X^{y,i}_{k-1}|+|\widetilde{r}(X^{x,i}_{k-1})-\widetilde{r}(X^{y,i}_{k-1})|]^{\alpha}
|g|α(1+|r~|1)Mα|xy|α+2diam(D)|f||xy|dk=1M(1+|r~|1)k1\displaystyle\leq|g|_{\alpha}(1+|\widetilde{r}|_{1})^{M\alpha}|x-y|^{\alpha}+\frac{2{\rm diam}(D)|f|_{\infty}|x-y|}{d}\sum\limits_{k=1}^{M}(1+|\widetilde{r}|_{1})^{k-1}
+diam(D)2|f|α|xy|αdk=1M(1+|r~|1)α(k1)\displaystyle\;\phantom{\leq|g|_{\alpha}|X^{x,i}_{M}-X^{y,i}_{M}|^{\alpha}}+\frac{{\rm diam}(D)^{2}|f|_{\alpha}|x-y|^{\alpha}}{d}\sum\limits_{k=1}^{M}(1+|\widetilde{r}|_{1})^{\alpha(k-1)}
(|g|α+diam(D)2|f|α+2diam(D)|f|d)(2+|r~|1)M(|xy|α|xy|).\displaystyle\leq\left(|g|_{\alpha}+\frac{{\rm diam}(D)^{2}|f|_{\alpha}+2{\rm diam}(D)|f|_{\infty}}{d}\right)(2+|\widetilde{r}|_{1})^{M}\left(|x-y|^{\alpha}\vee|x-y|\right).

The next lemma is the well-known Hoeffding’s inequality:

Lemma 4.3.

Suppose that (Z_{i})_{i\geq 1} are iid real random variables such that a_{i}\leq Z_{i}\leq b_{i} for all i. Then for all N\in\mathbb{N} and \gamma\geq 0

(|𝔼{Z1}1Ni=1NZi|γ)2e2N2γ2i=1N(biai)2.\mathbb{P}\left(\left|\mathbb{E}\{Z_{1}\}-\frac{1}{N}\sum_{i=1}^{N}Z_{i}\right|\geq\gamma\right)\leq 2e^{-\frac{2N^{2}\gamma^{2}}{\sum_{i=1}^{N}(b_{i}-a_{i})^{2}}}.

Using Hoeffding’s inequality we immediately get the following estimate.

Corollary 4.4.

For all N,MN,M\in\mathbb{N}, and γ0\gamma\geq 0 we have

(|uM(x)uMN(x)|γ)2eNγ2(|g|+M𝖽𝗂𝖺𝗆(D)2|f|/d)2,xD.\mathbb{P}\left(\left|u_{M}(x)-u_{M}^{N}(x)\right|\geq\gamma\right)\leq 2e^{-\frac{N\gamma^{2}}{(|g|_{\infty}+M{\sf diam}(D)^{2}|f|_{\infty}/d)^{2}}},\quad x\in D.
Proof.

The result follows directly from 4.3 since

|g(XMx,i)+1dk=1Mr~2(Xk1x,i)f(Xk1x,i+r~(Xk1x,i)Yi)||g|+M𝖽𝗂𝖺𝗆(D)2|f|/d.\left|g(X^{x,i}_{M})+\frac{1}{d}\sum\limits_{k=1}^{M}\widetilde{r}^{2}(X^{x,i}_{k-1})f\left(X^{x,i}_{k-1}+\widetilde{r}(X^{x,i}_{k-1})Y^{i}\right)\right|_{\infty}\leq|g|_{\infty}+M{\sf diam}(D)^{2}|f|_{\infty}/d.
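In practice 4.4 is used in the reverse direction: given a tolerance γ and a confidence level η, one solves 2\exp(-N\gamma^{2}/C^{2})\leq\eta for the number of samples N, where C:=|g|_{\infty}+M\,{\sf diam}(D)^{2}|f|_{\infty}/d. A small Python sketch with placeholder constants (the numerical inputs below are illustrative only):

import math

def samples_needed(gamma, eta, g_inf, f_inf, diam, M, d):
    # Smallest N with 2*exp(-N*gamma**2 / C**2) <= eta, C as in Corollary 4.4.
    C = g_inf + M * diam**2 * f_inf / d
    return math.ceil(C**2 * math.log(2.0 / eta) / gamma**2)

print(samples_needed(gamma=0.05, eta=1e-3, g_inf=1.0, f_inf=1.0, diam=1.0, M=50, d=10))

This is a pointwise guarantee; the sup-norm statement of 2.26 is obtained below by combining it, via a union bound, with the grid F and the modulus of continuity from 4.2.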

Finally, we are in the position to prove the main theorem.

Proof of 2.26.

First of all, assume without loss of generality that D[0,𝖽𝗂𝖺𝗆(D)]dD\subset[0,{\sf diam}(D)]^{d}, and for each M,K1M,K\geq 1 consider the grid

F=F(K,M,\alpha,|\widetilde{r}|_{1}):=\left\{\frac{i\,{\sf diam}(D)}{K(2+|\widetilde{r}|_{1})^{\lceil M/\alpha\rceil}}:\;1\leq i\leq K(2+|\widetilde{r}|_{1})^{\lceil M/\alpha\rceil}\right\}^{d}\cap D.

For xDx\in D such that x[i1𝖽𝗂𝖺𝗆(D)K(2+|r~|1)M/α,(i1+1)𝖽𝗂𝖺𝗆(D)K(2+|r~|1)M/α)××[id𝖽𝗂𝖺𝗆(D)K(2+|r~|1)M/α,(id+1)𝖽𝗂𝖺𝗆(D)K(2+|r~|1)M/α)x\in\left[\frac{i_{1}{\sf diam}(D)}{K(2+|\widetilde{r}|_{1})^{\lceil M/\alpha\rceil}},\frac{(i_{1}+1){\sf diam}(D)}{K(2+|\widetilde{r}|_{1})^{\lceil M/\alpha\rceil}}\right)\times\cdots\times\left[\frac{i_{d}{\sf diam}(D)}{K(2+|\widetilde{r}|_{1})^{\lceil M/\alpha\rceil}},\frac{(i_{d}+1){\sf diam}(D)}{K(2+|\widetilde{r}|_{1})^{\lceil M/\alpha\rceil}}\right) we set

xF:=(i1𝖽𝗂𝖺𝗆(D)K(2+|r~|1)M/α,,id𝖽𝗂𝖺𝗆(D)K(2+|r~|1)M/α).x^{F}:=\left(\frac{i_{1}{\sf diam}(D)}{K(2+|\widetilde{r}|_{1})^{\lceil M/\alpha\rceil}},\cdots,\frac{i_{d}{\sf diam}(D)}{K(2+|\widetilde{r}|_{1})^{\lceil M/\alpha\rceil}}\right).
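The grid F and the projection x\mapsto x^{F} act componentwise and are straightforward to realize; a minimal Python sketch (all parameters below are placeholders used only for illustration):

import math
import numpy as np

def grid_projection(x, diam, K, M, alpha, lip_r):
    # Spacing of the grid F and componentwise projection onto the left grid point x^F.
    h = diam / (K * (2.0 + lip_r) ** math.ceil(M / alpha))
    xF = np.floor(np.asarray(x, dtype=float) / h) * h
    return xF, h   # each coordinate of x - x^F lies in [0, h)

xF, h = grid_projection([0.37, 0.82], diam=1.0, K=20, M=4, alpha=0.5, lip_r=1.0)
print(xF, h)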

Note that

supxD|u(x)uMN(x)|\displaystyle\sup_{x\in D}\left|u(x)-u_{M}^{N}(x)\right| supxD|u(x)uM(x)|+supxD|uM(x)uMN(x)|\displaystyle\leq\sup_{x\in D}\left|u(x)-u_{M}(x)\right|+\sup_{x\in D}\left|u_{M}(x)-u_{M}^{N}(x)\right|
supxD|u(x)uM(x)|+supxF|uM(x)uMN(x)|\displaystyle\leq\sup_{x\in D}\left|u(x)-u_{M}(x)\right|+\sup_{x\in F}\left|u_{M}(x)-u_{M}^{N}(x)\right|
+2(|g|α+diam(D)2|f|α+2diam(D)|f|d)(𝖽𝗂𝖺𝗆(D)K)α,\displaystyle\;\phantom{\leq\sup_{x\in D}\left|u(x)-u_{M}(x)\right|}+2\left(|g|_{\alpha}+\frac{{\rm diam}(D)^{2}|f|_{\alpha}+2{\rm diam}(D)|f|_{\infty}}{d}\right)\left(\frac{{\sf diam}(D)}{K}\right)^{\alpha},

where the last inequality follows from 4.2 and by the fact that

|x-x^{F}|\leq\frac{{\sf diam}(D)}{K(2+|\widetilde{r}|_{1})^{\lceil M/\alpha\rceil}}\mbox{ for all }\;x\in D.

Consequently, by setting

γ~:=γsupxD|u(x)uM(x)|2(|g|α+diam(D)2|f|α+2diam(D)|f|d)(𝖽𝗂𝖺𝗆(D)K)α\widetilde{\gamma}:=\gamma-\sup_{x\in D}\left|u(x)-u_{M}(x)\right|-2\left(|g|_{\alpha}+\frac{{\rm diam}(D)^{2}|f|_{\alpha}+2{\rm diam}(D)|f|_{\infty}}{d}\right)\left(\frac{{\sf diam}(D)}{K}\right)^{\alpha}

and using union bound inequality we have

\displaystyle\mathbb{P} (supxD|u(x)uMN(x)|γ)(supxF|uM(x)uMN(x)|γ~)xF(|uM(x)uMN(x)|γ~),\displaystyle\left(\sup_{x\in D}\left|u(x)-u_{M}^{N}(x)\right|\geq\gamma\right)\leq\mathbb{P}\left(\sup_{x\in F}\left|u_{M}(x)-u_{M}^{N}(x)\right|\geq\widetilde{\gamma}\right)\leq\sum\limits_{x\in F}\mathbb{P}\left(\left|u_{M}(x)-u_{M}^{N}(x)\right|\geq\widetilde{\gamma}\right),

so, the two desired estimates now follow by 2.21 and 4.4.

In the case of gC2(D¯)g\in C^{2}(\bar{D}), we only need to use the second part of 2.21, the rest of the argument being the same.

The assertions about the particular domains are clear.

For the statement about the expectation in (2.35), we only need the following lemma.

Lemma 4.5.

If X is a non-negative random variable with the property that there exist constants c_{1},c_{2},A\geq 0 such that

(Xt)2ec1c2((tA)+)2 for all t0,\mathbb{P}(X\geq t)\leq 2e^{c_{1}-c_{2}((t-A)^{+})^{2}}\text{ for all }t\geq 0, (4.17)

then

𝔼[X]A+c1+log(2)+1c2.\mathbb{E}[X]\leq A+\frac{\sqrt{c_{1}+\log(2)}+1}{\sqrt{c_{2}}}.
Proof.

Start with λA\lambda\geq A and write

\mathbb{E}[X]=\int_{0}^{\infty}\mathbb{P}(X\geq t)dt=\int_{0}^{\lambda}\mathbb{P}(X\geq t)dt+\int_{\lambda}^{\infty}\mathbb{P}(X\geq t)dt\leq\lambda+\int_{\lambda}^{\infty}2e^{c_{1}-c_{2}(t-A)^{2}}dt=\lambda+\int_{\lambda-A}^{\infty}2e^{c_{1}-c_{2}t^{2}}dt.

Optimizing over \lambda\geq A yields the optimum point as

λ=A+c1+log(2)c2\lambda^{*}=A+\sqrt{\frac{c_{1}+\log(2)}{c_{2}}}

which in turn yields

λA2ec1c2t2𝑑t2ec1λAtλAec2t2𝑑t=12(c1+log(2))c21c2.\int_{\lambda^{*}-A}^{\infty}2e^{c_{1}-c_{2}t^{2}}dt\leq 2e^{c_{1}}\int_{\lambda^{*}-A}^{\infty}\frac{t}{\lambda^{*}-A}e^{-c_{2}t^{2}}dt=\frac{1}{2\sqrt{(c_{1}+\log(2))c_{2}}}\leq\frac{1}{\sqrt{c_{2}}}.

This concludes the estimate. ∎

The rest of the statements in the theorem are straightforward now.
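As a quick numerical sanity check of 4.5, one can integrate the worst admissible tail \mathbb{P}(X\geq t)=\min(1,\,2e^{c_{1}-c_{2}((t-A)^{+})^{2}}) and compare the result with the bound A+(\sqrt{c_{1}+\log 2}+1)/\sqrt{c_{2}}; the constants below are arbitrary.

import numpy as np

c1, c2, A = 1.0, 4.0, 0.5                          # arbitrary admissible constants
t = np.linspace(0.0, 20.0, 200001)
tail = np.minimum(1.0, 2.0 * np.exp(c1 - c2 * np.maximum(t - A, 0.0) ** 2))
expectation = np.trapz(tail, t)                     # E[X] = int_0^infty P(X >= t) dt
bound = A + (np.sqrt(c1 + np.log(2.0)) + 1.0) / np.sqrt(c2)
print(expectation, bound)                           # approx 1.31 <= approx 1.65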

Proof of 2.28.

Note that since r~\widetilde{r} is a (β,ε)(\beta,\varepsilon)-distance, it is also a (β,ε0)(\beta,\varepsilon_{0})-distance since εε0\varepsilon\leq\varepsilon_{0}.

Now, using (2.38) we get that

2(|g|α+diam(D)2|f|α+2diam(D)|f|d)(𝖽𝗂𝖺𝗆(D)K)αγ4.2\left(|g|_{\alpha}+\frac{{\rm diam}(D)^{2}|f|_{\alpha}+2{\rm diam}(D)|f|_{\infty}}{d}\right)\left(\frac{{\sf diam}(D)}{K}\right)^{\alpha}\leq\frac{\gamma}{4}.

Also, because 𝖺𝖽𝗂𝖺𝗆(D)<{\sf adiam}(D)<\infty, by 2.6 we have that |v|(ε0)ε0𝖺𝖽𝗂𝖺𝗆(D)|v|_{\infty}(\varepsilon_{0})\leq\varepsilon_{0}{\sf adiam}(D). Therefore, since ε01\varepsilon_{0}\leq 1 is given by (2.37) and using that ε0ε0α/2\varepsilon_{0}\leq\varepsilon_{0}^{\alpha/2}, we get that

dα/2|g|α|v|α/2(ε0)+|f||v|(ε0)γ4.d^{\alpha/2}|g|_{\alpha}\cdot|v|^{\alpha/2}_{\infty}(\varepsilon_{0})+|f|_{\infty}|v|_{\infty}(\varepsilon_{0})\leq\frac{\gamma}{4}.

Now, if MM is as in (2.41), then

(4|g|+2d𝖽𝗂𝖺𝗆(D)2|f|)eβ2ε024𝖽𝗂𝖺𝗆(D)2Mγ4.(4|g|_{\infty}+\frac{2}{d}{\sf diam}(D)^{2}|f|_{\infty})e^{-\frac{\beta^{2}\varepsilon_{0}^{2}}{4{\sf diam}(D)^{2}}M}\leq\frac{\gamma}{4}.

Furthermore, if DD is δ\delta-defective convex and MM is chosen as in (2.42), we have that

(4|g|_{\infty}+\frac{2}{d}{\sf diam}(D)^{2}|f|_{\infty})a_{d}^{M}\sqrt{\frac{{\sf diam}(D)}{\varepsilon_{0}}}\leq\frac{\gamma}{4}.

All the choices above ensure that A(D,M,K,d,ε)A(D,M,K,d,\varepsilon) from 2.26 satisfies

A(D,M,K,d,ε)3γ4.A(D,M,K,d,\varepsilon)\leq\frac{3\gamma}{4}.

Taking into account the above inequality and the estimate (2.31), it is a direct check to see that the right hand side of (2.31) is less than \eta if N satisfies (3.3), which concludes the proof. ∎

4.5 Proofs for Subsection 2.5

Proof of 2.30.

We have:

\displaystyle\frac{|G(x)-G(y)|}{|x-y|^{\alpha}} \displaystyle\leq\frac{\left|\psi\left(\frac{1}{{\varepsilon}_{0}}r(x)\right)\right|\,\left|g({\pi_{\partial D}}(x))-g({\pi_{\partial D}}(y))\right|}{|x-y|^{\alpha}}+\frac{\left|\psi\left(\frac{1}{{\varepsilon}_{0}}r(x)\right)-\psi\left(\frac{1}{{\varepsilon}_{0}}r(y)\right)\right|\,\left|g({\pi_{\partial D}}(y))\right|}{|x-y|^{\alpha}}
\displaystyle\leq\frac{|{\pi_{\partial D}}(x)-{\pi_{\partial D}}(y)|^{\alpha}|g|_{\alpha}}{|x-y|^{\alpha}}+\frac{|g|_{\infty}}{{\varepsilon}_{0}}\frac{|r(x)-r(y)|}{|x-y|^{\alpha}}
\displaystyle\leq|\nabla{\pi_{\partial D}}|_{\infty}^{\alpha}|g|_{\alpha}+\frac{|g|_{\infty}}{{\varepsilon}_{0}}\frac{|x-y|^{\alpha}\,{\sf diam}(D)^{1-\alpha}}{|x-y|^{\alpha}}.

For the second part, we know that D is of class C^{3} and g\in C^{2}(\partial D), and thus G\in C^{2}(D)\cap C(\overline{D}). Using the definition of G we have

G\displaystyle\nabla G =1ε0ψ(1ε0r)rg(πD)+ψ(1ε0r)g(πD)πD,\displaystyle=\frac{1}{{\varepsilon}_{0}}\psi^{\prime}\left(\frac{1}{{\varepsilon}_{0}}r\right)\nabla rg({\pi_{\partial D}})+\psi\left(\frac{1}{{\varepsilon}_{0}}r\right)\nabla g({\pi_{\partial D}})\nabla{\pi_{\partial D}},
respectively
ΔG\displaystyle\Delta G =(ψ′′|r|2+ψΔr)g(πD)+2ε0(ψr)g(πD)πD\displaystyle=\left(\psi^{\prime\prime}|\nabla r|^{2}+\psi^{\prime}\Delta r\right)g({\pi_{\partial D}})+\frac{2}{{\varepsilon}_{0}}(\psi^{\prime}\nabla r)\nabla g({\pi_{\partial D}})\nabla{\pi_{\partial D}}
+ψ(r)(Δg(πD)|πD|2+g(πD)ΔπD).\displaystyle\phantom{(\psi^{\prime\prime}|\nabla r|^{2}}\quad\;\;+\psi(r)\left(\Delta g({\pi_{\partial D}})|\nabla{\pi_{\partial D}}|^{2}+\nabla g({\pi_{\partial D}})\Delta{\pi_{\partial D}}\right).

Taking into account that |ψ|,|ψ|,|ψ′′|1|\psi|_{\infty},|\psi^{\prime}|_{\infty},|\psi^{\prime\prime}|_{\infty}\leq 1 and |r|1|\nabla r|\leq 1 we have:

|G|\displaystyle|\nabla G|_{\infty} 1ε0|g|+|g||πD|,\displaystyle\leq\frac{1}{{\varepsilon}_{0}}|g|_{\infty}+|\nabla g|_{\infty}|\nabla{\pi_{\partial D}}|_{\infty},
|ΔG|\displaystyle|\Delta G|_{\infty} (1+|Δr|)|g|+2ϵ0|g||πD|+|Δg||πD|2+|g||ΔπD|.\displaystyle\leq\left(1+|\Delta r|_{\infty}\right)|g|_{\infty}+\frac{2}{\epsilon_{0}}|\nabla g|_{\infty}|\nabla{\pi_{\partial D}}|_{\infty}+|\Delta g|_{\infty}|\nabla{\pi_{\partial D}}|_{\infty}^{2}+|\nabla g|_{\infty}|\Delta{\pi_{\partial D}}|_{\infty}.

For any point PDP\in\partial D let ν(P)\nu(P) and TPT_{P} denote respectively the unit exterior normal to D\partial D at the point PP and the tangent hyperspace to D\partial D at PP. By a rotation of coordinates we can assume that the PdP_{d} coordinate lies in the direction ν(P)\nu(P). In some neighbourhood 𝒩=𝒩(P)\mathcal{N}=\mathcal{N}(P) of PP, D\partial D is given by Pd=φ(P)P_{d}=\varphi(P^{\prime}) where P=(P1,,Pd1)P^{\prime}=(P_{1},\dots,P_{d-1}), φC3(TP𝒩)\varphi\in C^{3}(T_{P}\cap\mathcal{N}) and Dφ(P)=0D\varphi(P^{\prime})=0. The eigenvalues of the matrix [2φ(P)][\nabla^{2}\varphi(P^{\prime})] denoted {k1,,kd1}\{k_{1},\dots,k_{d-1}\} are then the principal curvatures of D\partial D at PP . By a further rotation of coordinates we can assume that the P1,,Pd1P_{1},\dots,P_{d-1} axes lie along principal directions corresponding to k1,,kd1k_{1},\dots,k_{d-1} at PP.

The Hessian matrix [D2φ(P)][D^{2}\varphi(P^{\prime})] with respect to the principal coordinate system at PP described above is given by

[D2φ(P)]=diag[k1,,kd1].[D^{2}\varphi(P^{\prime})]=\textrm{diag}[k_{1},\dots,k_{d-1}].

As noted in the proof of Lemma 14.16 in [19], the maximal radius of the interior ball that can be associated to each point on the boundary is bounded from below by a certain \mu>0, and \mu^{-1} bounds the principal curvatures; hence our choice of \mu as \epsilon_{0}.

The unit exterior normal vector ν^(P):=ν(P)\hat{\nu}(P^{\prime}):=\nu(P) at a point P=(P,φ(P))𝒩DP=(P^{\prime},\varphi(P^{\prime}))\in\mathcal{N}\cap\partial D is given by

νi(P)=Diφ(P)1+|Dφ(P)|2,i=1,,d1,νd(P)=11+|Dφ(P)|2.\nu_{i}(P)=\frac{D_{i}\varphi(P^{\prime})}{\sqrt{1+|D\varphi(P^{\prime})|^{2}}},\,i=1,\dots,d-1,\nu_{d}(P)=\frac{1}{\sqrt{1+|D\varphi(P^{\prime})|^{2}}}.

Hence with respect to the principal coordinate system at PP we have:

ν^ixj(P)=kiδij,i,j=1,,d1.\frac{\partial\hat{\nu}_{i}}{\partial x_{j}}(P^{\prime})=k_{i}\delta_{ij},i,j=1,\dots,d-1. (4.18)

We note that for each xDϵ0x\in{D_{\epsilon_{0}}} there exists a unique πD(x)D{\pi_{\partial D}}(x)\in\partial D such that |πD(x)x|=r(x)|{\pi_{\partial D}}(x)-x|=r(x). We have:

x=πD(x)+ν(πD(x))r(x).x={\pi_{\partial D}}(x)+\nu({\pi_{\partial D}}(x))r(x). (4.19)

As pointed out in the proof of Lemma 14.6 in [19], we have {\pi_{\partial D}}\in C^{2}({D_{\epsilon_{0}}}) and r\in C^{3}({D_{\epsilon_{0}}}). Differentiating the i-th coordinate of (4.19) with respect to x_{j} we get:

δij=(πD)ixj+lνiyl(πD)lxjr+νirxj\delta_{ij}=\frac{\partial({\pi_{\partial D}})_{i}}{\partial x_{j}}+\sum_{l}\frac{\partial\nu_{i}}{\partial y_{l}}\frac{\partial({\pi_{\partial D}})_{l}}{\partial x_{j}}r+\nu_{i}\frac{\partial r}{\partial x_{j}} (4.20)

Furthermore, differentiating (4.20) with respect to xkx_{k} we get:

0=\displaystyle 0= 2πDixjxk+l,m2νiylymπDmxkπDlxjr+lνiyl2πDlxjxkr+\displaystyle\frac{\partial^{2}{{\pi_{\partial D}}}_{i}}{\partial x_{j}\partial x_{k}}+\sum_{l,m}\frac{\partial^{2}\nu_{i}}{\partial y_{l}\partial y_{m}}\frac{\partial{{\pi_{\partial D}}}_{m}}{\partial x_{k}}\frac{\partial{{\pi_{\partial D}}}_{l}}{\partial x_{j}}r+\sum_{l}\frac{\partial\nu_{i}}{\partial y_{l}}\frac{\partial^{2}{\pi_{\partial D}}_{l}}{\partial x_{j}\partial x_{k}}r+
+lνiylπDlxjrxk+lνiylπDlxkrxj+νi2rxjxk.\displaystyle+\sum_{l}\frac{\partial\nu_{i}}{\partial y_{l}}\frac{\partial{{\pi_{\partial D}}}_{l}}{\partial x_{j}}\frac{\partial r}{\partial x_{k}}+\sum_{l}\frac{\partial\nu_{i}}{\partial y_{l}}\frac{\partial{{\pi_{\partial D}}}_{l}}{\partial x_{k}}\frac{\partial r}{\partial x_{j}}+\nu_{i}\frac{\partial^{2}r}{\partial x_{j}\partial x_{k}}. (4.21)

As noted in the proof of Lemma 14.17 in [19], we have that, in terms of a principal coordinate system at {\pi_{\partial D}}(x) as chosen before,

r(x)=ν(πD(x))\nabla r(x)=\nu({\pi_{\partial D}}(x)) (4.22)

and,

2r=diag(k11+k1r,,kd11+kd1r,0),\nabla^{2}r=\textrm{diag}(\frac{k_{1}}{1+k_{1}r},\dots,\frac{k_{d-1}}{1+k_{d-1}r},0), (4.23)

hence

|r|1,|2r|maxxDkd1(x)=ε01.|\nabla r|_{\infty}\leq 1,|\nabla^{2}r|_{\infty}\leq\max_{x\in D}k_{d-1}(x)={\varepsilon}_{0}^{-1}.

Using (4.22) in (4.18) implies:

(πD)ixj=(δijνiνj)11+kir\frac{\partial({\pi_{\partial D}})_{i}}{\partial x_{j}}=(\delta_{ij}-\nu_{i}\nu_{j})\frac{1}{1+k_{i}r} (4.24)

hence

|πD|2.|\nabla{\pi_{\partial D}}|_{\infty}\leq 2.

Furthermore, using (4.22) and (4.23) in (4.21) we can bound:

|2πD||2ν|4ε0+4+ε01,|\nabla^{2}{\pi_{\partial D}}|_{\infty}\leq|\nabla^{2}\nu|_{\infty}4{\varepsilon}_{0}+4+{\varepsilon}_{0}^{-1},

where we estimated |r|_{\infty}\leq{\varepsilon}_{0} because, due to the cut-off function \psi, we only need to estimate r and {\pi_{\partial D}} in the {\varepsilon}_{0}-neighbourhood of the boundary. ∎

Proof of 3.6.

First note that if ϕ=WLσσW1\phi=W^{L}\circ\sigma\circ\cdots\sigma W^{1}, where WiW^{i} is of the form Wi(z)=Aiz+biW^{i}(z)=A^{i}z+b^{i}, then using the relation x=σ(x)σ(x)x=\sigma(x)-\sigma(-x) we have

ϕ(x)v=(vv)σ(WLWL)σσW1x,xd.\phi(x)v=\begin{pmatrix}v&v\end{pmatrix}\sigma\circ\begin{pmatrix}W^{L}\\ -W^{L}\end{pmatrix}\circ\sigma\circ\cdots\sigma\circ W^{1}x,\quad x\in\mathbb{R}^{d}.

The assertions follow now from 3.4 and 3.2. ∎

Proof of 3.8.

We first note that

|1ε0r1ε0ϕr|δrε0and|ψ(1ε0r)ϕψ(1ε0r)|δψ.\left|\frac{1}{{\varepsilon}_{0}}r-\frac{1}{{\varepsilon}_{0}}\phi_{r}\right|_{\infty}\leq\frac{\delta_{r}}{{\varepsilon}_{0}}\quad\mbox{and}\quad\left|\psi\left(\frac{1}{{\varepsilon}_{0}}r\right)-\phi_{\psi}\left(\frac{1}{{\varepsilon}_{0}}r\right)\right|_{\infty}\leq\delta_{\psi}.

Therefore, by the triangle inequality we get

|ϕψ(rε0)(x)ϕψ(ϕrε0)(x)|\displaystyle\left|\phi_{\psi}\left(\frac{r}{{\varepsilon}_{0}}\right)(x)-\phi_{\psi}\left(\frac{\phi_{r}}{{\varepsilon}_{0}}\right)(x)\right| 2δψ+|ψ||1ε0r(x)1ε0ϕr(x)|2δψ+|ψ|δrε0,\displaystyle\leq 2\delta_{\psi}+|\psi^{\prime}|_{\infty}\left|\frac{1}{{\varepsilon}_{0}}r(x)-\frac{1}{{\varepsilon}_{0}}\phi_{r}(x)\right|\leq 2\delta_{\psi}+|\psi^{\prime}|_{\infty}\frac{\delta_{r}}{{\varepsilon}_{0}},
hence
\displaystyle\left|\psi\left(\frac{1}{{\varepsilon}_{0}}r\right)-\phi_{\psi}\left(\frac{\phi_{r}}{{\varepsilon}_{0}}\right)\right|_{\infty} \displaystyle\leq 3\delta_{\psi}+\frac{\delta_{r}}{{\varepsilon}_{0}}.
Reasoning analogously we get
|g(πD)ϕgϕπ|\displaystyle|g({\pi_{\partial D}})-\phi_{g}\circ\phi_{\pi}|_{\infty} 3δg+|g|δπ,\displaystyle\leq 3\delta_{g}+|\nabla g|_{\infty}\delta_{\pi},
so again by the triangle inequality
\displaystyle\left|\psi\left(\frac{1}{{\varepsilon}_{0}}r\right)g({\pi_{\partial D}})-\phi_{\psi}\left(\frac{\phi_{r}}{{\varepsilon}_{0}}\right)\phi_{g}\circ\phi_{\pi}\right|_{\infty} \displaystyle\leq\left(3\delta_{\psi}+\frac{\delta_{r}}{{\varepsilon}_{0}}\right)|g|_{\infty}+\left(3\delta_{g}+|\nabla g|_{\infty}\delta_{\pi}\right)(\delta_{\psi}+1).

Let now \Pi be the ReLU DNN given in 3.5 with \epsilon_{p}:=\left(3\delta_{\psi}+\frac{\delta_{r}}{{\varepsilon}_{0}}\right)|g|_{\infty}+\left(3\delta_{g}+|\nabla g|_{\infty}\delta_{\pi}\right)(\delta_{\psi}+1)=\overline{\delta}/2, and c:=|g|_{\infty}, so that

|GΠ(ϕψ(ϕrε0),ϕgϕπ)|δ¯.\left|G-\Pi\left(\phi_{\psi}\left(\frac{\phi_{r}}{{\varepsilon}_{0}}\right),\phi_{g}\circ\phi_{\pi}\right)\right|_{\infty}\leq\overline{\delta}.

Now, by taking ϕG:=Π(ϕψ(ϕrε0),ϕgϕπ)\phi_{G}:=\Pi\left(\phi_{\psi}\left(\frac{\phi_{r}}{{\varepsilon}_{0}}\right),\phi_{g}\circ\phi_{\pi}\right), the statement follows directly from 3.5 and 3.3. ∎

Proof of 3.10.

i) The fact that u~MN()(ω)\widetilde{u}_{M}^{N}(\cdot)(\omega) can be realized, for each ωΩ\omega\in\Omega, as a ReLU DNN, denoted in the statement by 𝕌MN(ω,)\mathbb{U}_{M}^{N}(\omega,\cdot), is a direct consequence of 3.9, 3.5, 3.3, and 3.2.

Next, let uMNu_{M}^{N} be given by (2.29) and consider the modification of uMNu_{M}^{N} by replacing gg and ff with ϕg\phi_{g} and ϕf\phi_{f}, namely

u^MN(x):=1Ni=1N[ϕg(XMx,i)+1dk=1Mr~2(Xk1x,i)ϕf(Xk1x,i+r~(Xk1x,i)Yi)],xD.\widehat{u}_{M}^{N}(x):=\frac{1}{N}\sum_{i=1}^{N}\left[\phi_{g}(X^{x,i}_{M})+\frac{1}{d}\sum\limits_{k=1}^{M}\widetilde{r}^{2}(X^{x,i}_{k-1})\phi_{f}\left(X^{x,i}_{k-1}+\widetilde{r}(X^{x,i}_{k-1})Y^{i}\right)\right],x\in D.

Then using assumption a.4)a.4) we can easily deduce that |u^MNu~MN|Mϵpd(1+2|f|)γ/6|\widehat{u}_{M}^{N}-\widetilde{u}_{M}^{N}|_{\infty}\leq\frac{M\epsilon_{p}}{d}(1+2|f|_{\infty})\leq\gamma/6. Therefore,

\displaystyle\sup\limits_{x\in D}\left|u(x)-\mathbb{U}_{M}^{N}(\cdot,x)\right| \displaystyle=|u-\widetilde{u}_{M}^{N}|_{\infty}\leq|u-u_{M}^{N}|_{\infty}+|u_{M}^{N}-\widehat{u}_{M}^{N}|_{\infty}+|\widehat{u}_{M}^{N}-\widetilde{u}_{M}^{N}|_{\infty}
|uuMN|+ϵg+Mdiam(D)2ϵfd+γ/6\displaystyle\leq|u-u_{M}^{N}|_{\infty}+\epsilon_{g}+\frac{M{\rm diam}(D)^{2}\epsilon_{f}}{d}+\gamma/6
|uuMN|+γ/2,\displaystyle\leq|u-u_{M}^{N}|_{\infty}+\gamma/2,

where the last inequality follows from assumptions a.1) and a.2). Consequently,

(supxD|u(x)𝕌MN(,x)|γ)\displaystyle\mathbb{P}\left(\sup_{x\in D}\left|u(x)-\mathbb{U}_{M}^{N}(\cdot,x)\right|\geq\gamma\right) (|uuMN|γ/2),\displaystyle\leq\mathbb{P}\left(|u-u_{M}^{N}|_{\infty}\geq\gamma/2\right),

so we can employ 2.28 to conclude the first assertion.

Let us now prove ii). First, note that

(r~)=(ϕr)+1,𝒲(r~)=𝒲(ϕr),size(r~)size(ϕr)+2.\mathcal{L}(\widetilde{r})=\mathcal{L}(\phi_{r})+1,\;\mathcal{W}(\widetilde{r})=\mathcal{W}(\phi_{r}),\;{\rm size}(\widetilde{r})\leq{\rm size}(\phi_{r})+2.

Then, by 3.3 and 3.9 we have

\displaystyle\mathcal{L}(\phi_{g}(X^{\cdot,i}_{M}))=\mathcal{L}(\phi_{g})+\mathcal{L}(X^{\cdot,i}_{M})=\mathcal{L}(\phi_{g})+M(\mathcal{L}(\widetilde{r})+1)+1=\mathcal{L}(\phi_{g})+M(\mathcal{L}(\phi_{r})+2)+1,
𝒲(ϕg(XM,i))\displaystyle\mathcal{W}(\phi_{g}(X^{\cdot,i}_{M})) max(𝒲(ϕg),𝒲(XM,i),2d)max(𝒲(ϕg),2d+max(d,𝒲(ϕr))),\displaystyle\leq\max(\mathcal{W}(\phi_{g}),\mathcal{W}(X^{\cdot,i}_{M}),2d)\leq\max(\mathcal{W}(\phi_{g}),2d+\max(d,\mathcal{W}(\phi_{r}))),
size(ϕg(XM,i))\displaystyle{\rm size}(\phi_{g}(X^{\cdot,i}_{M})) 2size(ϕg)+2size(XM,i)2size(ϕg)+4dM[4d+𝒲(r~)+(r~)+2]+d+2Msize(r~)\displaystyle\leq 2{\rm size}(\phi_{g})+2{\rm size}(X^{\cdot,i}_{M})\leq 2{\rm size}(\phi_{g})+4dM[4d+\mathcal{W}(\widetilde{r})+\mathcal{L}(\widetilde{r})+2]+d+2M{\rm size}(\widetilde{r})
2size(ϕg)+4dM[4d+𝒲(ϕr)+(ϕr)+3]+2d+4M[size(ϕr)+2]\displaystyle\leq 2{\rm size}(\phi_{g})+4dM[4d+\mathcal{W}(\phi_{r})+\mathcal{L}(\phi_{r})+3]+2d+4M[{\rm size}(\phi_{r})+2]
𝒪(size(ϕg)+Msize(ϕr)+Mdmax(d,𝒲(ϕr),(ϕr))).\displaystyle\in\mathcal{O}({\rm size}(\phi_{g})+M{\rm size}(\phi_{r})+Md\max(d,\mathcal{W}(\phi_{r}),\mathcal{L}(\phi_{r}))). (4.25)

Further, for each k0k\geq 0, by 3.3 we get

size(ϕf(Xk,i+r~(Xk,i)Yi))\displaystyle{\rm size}\left(\phi_{f}\left(X^{\cdot,i}_{k}+\widetilde{r}(X^{\cdot,i}_{k})Y^{i}\right)\right) 2size(ϕf)+2size(Xk,i+r~(Xk,i)Yi)\displaystyle\leq 2{\rm size}\left(\phi_{f}\right)+2{\rm size}\left(X^{\cdot,i}_{k}+\widetilde{r}(X^{\cdot,i}_{k})Y^{i}\right)
and since Xk,i+r~(Xk,i)YiX^{\cdot,i}_{k}+\widetilde{r}(X^{\cdot,i}_{k})Y^{i} has the same size as Xk+1,iX^{\cdot,i}_{k+1}, we can continue with
=2size(ϕf)+2size(Xk+1,i)\displaystyle=2{\rm size}\left(\phi_{f}\right)+2{\rm size}\left(X^{\cdot,i}_{k+1}\right)
2size(ϕf)+4d(k+1)[4d+𝒲(ϕr)+(ϕr)+3]+2d+4(k+1)[size(ϕr)+2].\displaystyle\leq 2{\rm size}(\phi_{f})+4d(k+1)[4d+\mathcal{W}(\phi_{r})+\mathcal{L}(\phi_{r})+3]+2d+4(k+1)[{\rm size}(\phi_{r})+2]. (4.26)

The next step is to use 3.5 to get

size(Π(r~(Xk,i),r~(Xk,i)))\displaystyle{\rm size}\left(\Pi\left(\widetilde{r}(X^{\cdot,i}_{k}),\widetilde{r}(X^{\cdot,i}_{k})\right)\right) 8size(r~(Xk,i))+𝒪(log(ϵp1)+log(c))\displaystyle\leq 8{\rm size}(\widetilde{r}(X^{\cdot,i}_{k}))+\mathcal{O}(\lceil\log(\epsilon_{p}^{-1})+\log(c)\rceil)
16size(r~)+16size(Xk,i)+𝒪(log(ϵp1)+log(c))\displaystyle\leq 16{\rm size}(\widetilde{r})+16{\rm size}(X^{\cdot,i}_{k})+\mathcal{O}(\lceil\log(\epsilon_{p}^{-1})+\log(c)\rceil)
16size(ϕr)+32dk[4d+𝒲(ϕr)+(ϕr)+3]+16d+32k[size(ϕr)+2]\displaystyle\leq 16{\rm size}(\phi_{r})+32dk[4d+\mathcal{W}(\phi_{r})+\mathcal{L}(\phi_{r})+3]+16d+32k[{\rm size}(\phi_{r})+2]
+𝒪(log(ϵp1)+log(c)),\displaystyle\phantom{\leq 16{\rm size}(\phi_{r})\;}+\mathcal{O}(\lceil\log(\epsilon_{p}^{-1})+\log(c)\rceil), (4.27)

so that by (4.26) and (4.27) together with 3.5 we obtain

size\displaystyle{\rm size} (Π(Π(r~(Xk,i),r~(Xk,i)),ϕf(Xk,i+r~(Xk,i)Yi)))\displaystyle\left(\Pi\left(\Pi\left(\widetilde{r}(X^{\cdot,i}_{k}),\widetilde{r}(X^{\cdot,i}_{k})\right),\phi_{f}\left(X^{\cdot,i}_{k}+\widetilde{r}(X^{\cdot,i}_{k})Y^{i}\right)\right)\right)
4size(Π(r~(Xk,i),r~(Xk,i)))+4size(ϕf(Xk,i+r~(Xk,i)Yi))+𝒪(log(ϵp1)+log(c))\displaystyle\leq 4{\rm size}\left(\Pi\left(\widetilde{r}(X^{\cdot,i}_{k}),\widetilde{r}(X^{\cdot,i}_{k})\right)\right)+4{\rm size}\left(\phi_{f}\left(X^{\cdot,i}_{k}+\widetilde{r}(X^{\cdot,i}_{k})Y^{i}\right)\right)+\mathcal{O}(\lceil\log(\epsilon_{p}^{-1})+\log(c)\rceil)
24size(ϕr)+128dk[4d+𝒲(ϕr)+(ϕr)+3]+64d+128k[size(ϕr)+2]\displaystyle\leq 24{\rm size}(\phi_{r})+128dk[4d+\mathcal{W}(\phi_{r})+\mathcal{L}(\phi_{r})+3]+64d+128k[{\rm size}(\phi_{r})+2]
+8size(ϕf)+16d(k+1)[4d+𝒲(ϕr)+(ϕr)+3]+8d+16(k+1)[size(ϕr)+2]\displaystyle\phantom{\leq 24{\rm size}(\phi_{r})\;}+8{\rm size}(\phi_{f})+16d(k+1)[4d+\mathcal{W}(\phi_{r})+\mathcal{L}(\phi_{r})+3]+8d+16(k+1)[{\rm size}(\phi_{r})+2] (4.28)
+𝒪(log(ϵp1)+log(c))\displaystyle\phantom{\leq 24{\rm size}(\phi_{r})\;}+\mathcal{O}(\lceil\log(\epsilon_{p}^{-1})+\log(c)\rceil)
𝒪(dkmax(d,𝒲(ϕr),(ϕr))+ksize(ϕr)+size(ϕf)+log(ϵp1)+log(c)).\displaystyle\in\mathcal{O}\left(dk\max(d,\mathcal{W}(\phi_{r}),\mathcal{L}(\phi_{r}))+k{\rm size}(\phi_{r})+{\rm size}(\phi_{f})+\lceil\log(\epsilon_{p}^{-1})+\log(c)\rceil\right). (4.29)

Finally, combining (4.25) with (4.29) and applying 3.2, we obtain that for each \omega\in\Omega

size\displaystyle{\rm size} (𝕌MN(ω,))\displaystyle(\mathbb{U}_{M}^{N}(\omega,\cdot))
𝒪(MN[dMmax(d,𝒲(ϕr),(ϕr))+Msize(ϕr)+size(ϕf)+size(ϕg)+log(ϵp1)+log(c)]),\displaystyle\in\mathcal{O}\left(MN\left[dM\max(d,\mathcal{W}(\phi_{r}),\mathcal{L}(\phi_{r}))+M{\rm size}(\phi_{r})+{\rm size}(\phi_{f})+{\rm size}(\phi_{g})+\lceil\log(\epsilon_{p}^{-1})+\log(c)\rceil\right]\right),

and by assumption a.4)a.4) we deduce that

size(𝕌MN(ω,))𝒪(MN[dMmax(d,𝒲(ϕr),(ϕr))+Msize(ϕr)+size(ϕf)+log(1γd)]),{\rm size}(\mathbb{U}_{M}^{N}(\omega,\cdot))\in\mathcal{O}\left(MN\left[dM\max(d,\mathcal{W}(\phi_{r}),\mathcal{L}(\phi_{r}))+M{\rm size}(\phi_{r})+{\rm size}(\phi_{f})+\log\left(\frac{1}{\gamma d}\right)\right]\right),

where the implicit constant depends on \max({\rm diam}(D),|f|_{\infty}).

In particular,

M\displaystyle M 𝒪(d2γ4/αlog(1γ)),\displaystyle\in\mathcal{O}\left(d^{2}\gamma^{-4/\alpha}\log\left(\frac{1}{\gamma}\right)\right),
N\displaystyle N 𝒪(d2γ8/α2log2(1γ)[d3γ4/αlog(1γ)+log(1η)]),\displaystyle\in\mathcal{O}\left(d^{2}\gamma^{-8/\alpha-2}\log^{2}\left(\frac{1}{\gamma}\right)\left[d^{3}\gamma^{-4/\alpha}\log\left(\frac{1}{\gamma}\right)+\log\left(\frac{1}{\eta}\right)\right]\right),
size(𝕌MN(ω,))\displaystyle{\rm size}(\mathbb{U}_{M}^{N}(\omega,\cdot)) 𝒪(d7γ16/α4log4(1γ)[d3γ4/αlog(1γ)+log(1η)]S),\displaystyle\in\mathcal{O}\left(d^{7}\gamma^{-16/\alpha-4}\log^{4}\left(\frac{1}{\gamma}\right)\left[d^{3}\gamma^{-4/\alpha}\log\left(\frac{1}{\gamma}\right)+\log\left(\frac{1}{\eta}\right)\right]{\rm S}\right),

where

S:=[max(d,𝒲(ϕr),(ϕr))+size(ϕr)+size(ϕg)+size(ϕf)]{\rm S}:=\left[\max(d,\mathcal{W}(\phi_{r}),\mathcal{L}(\phi_{r}))+{\rm size}(\phi_{r})+{\rm size}(\phi_{g})+{\rm size}(\phi_{f})\right]

and the implicit constant depends on |g|_{\alpha},|g|_{\infty},|f|_{\infty},{\rm diam}(D),{\rm adiam}(D),\delta,\alpha,\log(2+|\phi_{r}|_{1}).

Now, if DD is δ\delta-defective convex, then

M\displaystyle M 𝒪(dlog(dγ)),\displaystyle\in\mathcal{O}\left(d\log\left(\frac{d}{\gamma}\right)\right),
N\displaystyle N 𝒪(1γ2log2(dγ)[d2log(dγ)log(2+|ϕr|1)+log(1η)]),\displaystyle\in\mathcal{O}\left(\frac{1}{\gamma^{2}}\log^{2}\left(\frac{d}{\gamma}\right)\left[d^{2}\log\left(\frac{d}{\gamma}\right)\log(2+|\phi_{r}|_{1})+\log\left(\frac{1}{\eta}\right)\right]\right),
size(𝕌MN(ω,))\displaystyle{\rm size}(\mathbb{U}_{M}^{N}(\omega,\cdot)) 𝒪(d3γ2log4(dγ)[d2log(dγ)+log(1η)]S),\displaystyle\in\mathcal{O}\left(\frac{d^{3}}{\gamma^{2}}\log^{4}\left(\frac{d}{\gamma}\right)\left[d^{2}\log\left(\frac{d}{\gamma}\right)+\log\left(\frac{1}{\eta}\right)\right]{\rm S}\right),

where {\rm S} is as above and the implicit constant depends on |g|_{\alpha},|g|_{\infty},|f|_{\infty},{\rm diam}(D),{\rm adiam}(D),\delta,\alpha,\log(2+|\phi_{r}|_{1}). ∎

5 Numerical results

Throughout this section we numerically test some of the key theoretical results obtained in this paper. In order to take advantage of the highly parallelizable nature of the proposed Monte Carlo methods, we have implemented the algorithms using GPU parallel computing within the PyTorch framework.

Concretely, we consider two types of domains, namely:

D𝖼:=[1,1]d,d1 and D𝖺𝖼:=[1,1]d{xd:|x|1:=|x1|++|xd|0.5},D_{\sf c}:=[-1,1]^{d},d\geq 1\quad\mbox{ and }\quad D_{\sf ac}:=[-1,1]^{d}\setminus\left\{x\in\mathbb{R}^{d}:|x|_{1}:=|x_{1}|+\dots+|x_{d}|\leq 0.5\right\},

which are illustrated by Figure 2 for the case d=2d=2.

Figure 2: D_{\sf c} and D_{\sf ac} for d=2; (a) D_{\sf c}=[-1,1]^{2}, (b) D_{\sf ac}=[-1,1]^{2}\setminus\left\{x\in\mathbb{R}^{2}:|x_{1}|+|x_{2}|\leq 0.5\right\}.
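Both test domains admit an explicitly computable distance to the boundary, which is what the WoS algorithm uses in the tests below. As a hedged illustration (the helper names are ours and are not taken from the paper's code), a vectorized PyTorch implementation for a batch x of shape (N,d) may look as follows; the distance to the removed \ell^{1}-ball is obtained from the standard sorting-based Euclidean projection onto \{y:|y|_{1}\leq 0.5\}:

import torch

def dist_cube(x):
    # exact distance from x in [-1,1]^d to the boundary of the cube: 1 - max_i |x_i|
    return 1.0 - x.abs().max(dim=1).values

def dist_l1_ball(x, c=0.5):
    # Euclidean distance from x to {|y|_1 <= c}, via projection onto the l1-ball
    u, _ = x.abs().sort(dim=1, descending=True)
    css = u.cumsum(dim=1) - c
    j = torch.arange(1, x.shape[1] + 1, device=x.device)
    rho = (u * j > css).sum(dim=1)                     # number of active coordinates (>= 1)
    theta = (css.gather(1, rho.unsqueeze(1) - 1).squeeze(1) / rho).clamp(min=0.0)
    proj = torch.sign(x) * (x.abs() - theta.unsqueeze(1)).clamp(min=0.0)
    return (x - proj).norm(dim=1)

def dist_Dac(x, c=0.5):
    # distance to the boundary of D_ac = [-1,1]^d \ {|y|_1 <= c}
    return torch.minimum(dist_cube(x), dist_l1_ball(x, c))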

We have performed the following numerical tests:

Test 1. Here we numerically verify the estimate (2.18), namely that

(r(XMx)ε)(1β2(1δ)4d)Mr(x)ε,xD,\mathbb{P}(r(X^{x}_{M})\geq\varepsilon)\leq\left(1-\frac{\beta^{2}(1-\delta)}{4d}\right)^{M}\sqrt{\frac{r(x)}{\varepsilon}},\quad x\in D, (5.1)

by fixing \varepsilon and varying d and M for both D_{\sf c} and D_{\sf ac}. Note that D_{\sf c} is 0-defective convex since it is in fact convex, so (5.1) is valid with \delta=0 by 2.16. Furthermore, for this test the WoS is constructed with the exact distance to the boundary, so we shall take \beta=1. On the other hand, D=D_{\sf ac} is not defective convex: if we denote the boundary corner (0.5,0,0,\dots,0)\in\partial D_{\sf ac} by x_{0}, then \Delta r(x)=(d-1)|x-x_{0}|^{-1} is unbounded on the cone \left\{(r,x)\in D_{\sf ac}:r\in(0.5,0.75),|x|\leq r\right\}, hence (2.15) cannot hold. Nevertheless, for \varepsilon fixed we speculated in 2.17 that an asymptotic behavior similar to (5.1) still holds with respect to M and d; this is numerically tested in Figure 5(b) below.

Let us introduce the notations

𝖴𝖻𝗈𝗎𝗇𝖽(d,M,x,ε):=(114d)Mr(x)ε,\displaystyle{\sf U}_{\sf bound}(d,M,x,\varepsilon):=\left(1-\frac{1}{4d}\right)^{M}\sqrt{\frac{r(x)}{\varepsilon}},
N(d,M,x,ε):=1N1iN1[rε](XMx,i), where XMx,1,,XMx,NXMx are independent,N1,\displaystyle\mathbb{P}_{N}(d,M,x,\varepsilon):=\frac{1}{N}\sum\limits_{1\leq i\leq N}1_{\left[r\geq\varepsilon\right]}(X_{M}^{x,i}),\quad\mbox{ where }X_{M}^{x,1},\cdots,X_{M}^{x,N}\sim X_{M}^{x}\mbox{ are independent},N\geq 1,

so that the inequality to be tested becomes

N(d,M,x,ε)(r(XMx)ε)𝖴𝖻𝗈𝗎𝗇𝖽(d,M,x,ε) for N sufficiently large.\mathbb{P}_{N}(d,M,x,\varepsilon)\approx\mathbb{P}(r(X^{x}_{M})\geq\varepsilon)\leq{\sf U}_{\sf bound}(d,M,x,\varepsilon)\quad\mbox{ for }N\mbox{ sufficiently large}. (5.2)
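As a hedged sketch (the function names are ours, not the paper's code), the two quantities in (5.2) can be simulated on the cube D_{\sf c} (so \beta=1 and \delta=0) with a few lines of PyTorch:

import torch

def dist_cube(x):                                  # exact distance to the boundary of [-1,1]^d
    return 1.0 - x.abs().max(dim=1).values

def wos_chain(x, M):                               # M vectorized walk-on-spheres steps
    for _ in range(M):
        y = torch.randn_like(x)
        y = y / y.norm(dim=1, keepdim=True)        # uniform direction on the unit sphere
        x = x + dist_cube(x).unsqueeze(1) * y
    return x

def test1(x0, M, N, eps=1e-3):
    d = x0.numel()
    xM = wos_chain(x0.repeat(N, 1), M)             # N independent WoS chains started at x0
    p_hat = (dist_cube(xM) >= eps).float().mean().item()
    u_bound = (1.0 - 1.0 / (4 * d)) ** M * (dist_cube(x0.unsqueeze(0)).item() / eps) ** 0.5
    return p_hat, u_bound                          # empirical P_N vs the bound U_bound

For instance, test1(torch.full((20,), 0.5 / 20 ** 0.5), M=600, N=5 * 10 ** 4) would produce one sample point of the comparison in Figure 3 (here d=20 and |x_{0}|=0.5).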

We obtained the following results:

Figure 3: {\sf U}_{\sf bound}(d,\cdot,x_{0},\varepsilon) (red) vs \mathbb{P}_{N}(d,\cdot,x_{0},\varepsilon) (black) for D=D_{\sf c}, d=20, \varepsilon=10^{-3}, N=5\times 10^{4}, whilst x_{0} is arbitrarily chosen in D such that |x_{0}|=0.5; (a) M=5\times i, i\in\{80,\dots,200\}; (b) M=5\times i, i\in\{120,\dots,200\}.

Figure 4: {\sf U}_{\sf bound}(\cdot,\cdot,x_{0},\varepsilon) (red) vs \mathbb{P}_{N}(\cdot,\cdot,x_{0},\varepsilon) (black) for D=D_{\sf c}, \varepsilon=10^{-3}, N=5\times 10^{4}, whilst x_{0} is arbitrarily chosen in D such that |x_{0}|=0.5, for each dimension d; (a) d\in\{2,\dots,30\}, M=300; (b) d\in\{2,\dots,30\}, M=4\times d^{1+0.5}.

Figure 5: {\sf U}_{\sf bound}(\cdot,\cdot,x_{0},\varepsilon) (red) vs \mathbb{P}_{N}(\cdot,\cdot,x_{0},\varepsilon) (black) for D=D_{\sf ac}, \varepsilon=10^{-3}, N=5\times 10^{4}, whilst x_{0} is arbitrarily chosen in D such that |x_{0}|=0.7, for each dimension d; (a) d\in\{2,\dots,30\}, M=300; (b) d\in\{2,\dots,30\}, M=4\times d^{1+0.5}.

Comments on Test 1.

  • The numerical results depicted in Figure 3 and Figure 4 validate the upper bound (5.2). Note that in Figure 3(b), namely for large values of M, inequality (5.2) becomes quite sharp and even appears slightly reversed; this apparent reversal is merely a consequence of the Monte Carlo error in the approximation \mathbb{P}_{N}(d,M,x,\varepsilon)\approx\mathbb{P}(r(X^{x}_{M})\geq\varepsilon) corresponding to N=5\times 10^{4} Monte Carlo samples.

  • The numerical evidence illustrated in Figure 5 shows that, at least for one relevant example of a non-\delta-defective convex domain, namely D_{\sf ac}, with \varepsilon fixed the estimate (5.2) is still in force; in particular, this supports the idea expressed in 2.17.

  • Finally, recall that our main estimates obtained in the previous sections in the case of defective convex domains require M\in\mathcal{O}(d\log(d/\gamma)). This requirement is also suggested by comparing Figure 4(a) with Figure 4(b), as well as Figure 5(a) with Figure 5(b); in Figure 4(b) and Figure 5(b) we have chosen M\in\mathcal{O}(d^{1+0.5}) instead of M\in\mathcal{O}(d\log(d/\gamma)) only because in the latter case it is harder to visualize the numerical results nicely.

Test 2. Here the goal is to test the approximation of the solution uu to (1.1) by simulating its Monte Carlo estimator (1.5). For simplicity we take the source term f=1f=1, hence we deal with

\begin{cases}\frac{1}{2}\Delta u=-1\,\mbox{ in }D\\ u|_{\partial D}=g,\end{cases}\quad\mbox{ where }D=D_{\sf ac}. (5.3)

In order to validate the numerical results, we consider a particular explicit solution to (5.3), namely

u(x)=x12++xk2xk+12xd2x12,d=2k,x=(x1,,xd)d,\displaystyle u(x)=x_{1}^{2}+\dots+x_{k}^{2}-x_{k+1}^{2}-\dots-x_{d}^{2}-x_{1}^{2},\quad d=2k,x=(x_{1},\cdots,x_{d})\in\mathbb{R}^{d}, (5.4)
g=u|D,D=D𝖼 or D𝖺𝖼.\displaystyle g=u|_{\partial D},\quad D=D_{\sf c}\mbox{ or }D_{\sf ac}. (5.5)
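Indeed, since d=2k, a direct computation gives \Delta u=2(k-1)-2(d-k)=-2, hence \frac{1}{2}\Delta u=-1 in D, while g=u|_{\partial D} holds by definition, so u solves (5.3).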

Further, we introduce some notation. For M,N,L,E\geq 1 and W_{i},1\leq i\leq L independent and uniformly distributed on D, we set

𝖤𝗋𝗋M,N(x):=|u(x)uMN(x)|,xD,|𝖤𝗋𝗋M,N|L1(D/|D|):=|D|1D𝖤𝗋𝗋M,N(x)𝑑x\displaystyle{\sf Err}_{M,N}(x):=\left|u(x)-u_{M}^{N}(x)\right|,x\in D,\quad|{\sf Err}_{M,N}|_{L^{1}(D/|D|)}:=|D|^{-1}\int_{D}{\sf Err}_{M,N}(x)\;dx
|𝖤𝗋𝗋M,N,L|L1(D/|D|):=1/L1iL𝖤𝗋𝗋M,N(i)(Wi),|𝖤𝗋𝗋M,N,L|L(D):=max1iL𝖤𝗋𝗋M,N(i)(Wi)\displaystyle|{\sf Err}_{M,N,L}|_{L^{1}(D/|D|)}:=1/L\sum_{1\leq i\leq L}{\sf Err}_{M,N}^{(i)}(W_{i}),\quad|{\sf Err}_{M,N,L}|_{L^{\infty}(D)}:=\max_{1\leq i\leq L}{\sf Err}_{M,N}^{(i)}(W_{i})
where 𝖤𝗋𝗋M,N(i)(),1iL{\sf Err}_{M,N}^{(i)}(\cdot),1\leq i\leq L are iid copies of 𝖤𝗋𝗋M,N(){\sf Err}_{M,N}(\cdot), independent of (Wi)1iL(W_{i})_{1\leq i\leq L},
|𝖤𝗋𝗋M,N,L,E|L(D):=1/E1jE|𝖤𝗋𝗋M,N,L(j)|L(D),\displaystyle|{\sf Err}_{M,N,L,E}|_{L^{\infty}(D)}:=1/E\sum_{1\leq j\leq E}|{\sf Err}_{M,N,L}^{(j)}|_{L^{\infty}(D)},

where |𝖤𝗋𝗋M,N,L(j)|L(D),1jE|{\sf Err}_{M,N,L}^{(j)}|_{L^{\infty}(D)},1\leq j\leq E are independent copies of |𝖤𝗋𝗋M,N,L|L(D)|{\sf Err}_{M,N,L}|_{L^{\infty}(D)}. In particular, by the law of large numbers we immediately have

limL|𝖤𝗋𝗋M,N,L|L1(D/|D|)=𝔼{|𝖤𝗋𝗋M,N|L1(D/|D|)} almost surely,\lim_{L\to\infty}|{\sf Err}_{M,N,L}|_{L^{1}(D/|D|)}=\mathbb{E}\left\{|{\sf Err}_{M,N}|_{L^{1}(D/|D|)}\right\}\mbox{ almost surely}, (5.6)

whilst by an argument similar to that of [5, Lemma 4.3] one can show that

limElimL|𝖤𝗋𝗋M,N,L,E|L(D)=𝔼{supxD|𝖤𝗋𝗋M,N(x)|} almost surely.\lim_{E\to\infty}\lim_{L\to\infty}|{\sf Err}_{M,N,L,E}|_{L^{\infty}(D)}=\mathbb{E}\left\{\sup_{x\in D}|{\sf Err}_{M,N}(x)|\right\}\mbox{ almost surely}. (5.7)

The aim of this test is to numerically validate the approximation u\approx u_{M}^{N} by computing the mean errors \mathbb{E}\left\{|{\sf Err}_{M,N}|_{L^{1}(D/|D|)}\right\} and \mathbb{E}\left\{\sup_{x\in D}|{\sf Err}_{M,N}(x)|\right\}. To this end, justified by (5.6) and (5.7), we shall simulate |{\sf Err}_{M,N,L}|_{L^{1}(D/|D|)}(\omega) and |{\sf Err}_{M,N,L,E}|_{L^{\infty}(D)}(\omega) for L large, e.g. L=1000 or L=2000. In order to reduce the computational burden, the value of E is taken relatively small, e.g. E=5 or E=10. However, we point out that choosing a small value for E does not alter the reliability of the Monte Carlo estimate, since one can show that the distribution of |{\sf Err}_{M,N,L}|_{L^{\infty}(D)} is concentrated.
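For orientation, we also give a hedged sketch of the estimator u_{M}^{N} from (1.5) for this test problem (the helper names are ours, not the paper's code); for brevity it is written for the cube D=D_{\sf c}, the case D=D_{\sf ac} only changing the distance function (cf. the sketch after Figure 2). Since f\equiv 1, the source contribution of each WoS step reduces to \widetilde{r}^{2}(X_{k-1})/d:

import torch

def u_exact(x):                                    # the explicit solution (5.4), d = 2k even
    k = x.shape[-1] // 2
    return (x[..., :k] ** 2).sum(-1) - (x[..., k:] ** 2).sum(-1) - x[..., 0] ** 2

def dist_cube(x):                                  # exact distance to the boundary of [-1,1]^d
    return 1.0 - x.abs().max(dim=1).values

def u_estimator(x0, M, N):
    d = x0.numel()
    x = x0.repeat(N, 1)                            # N independent WoS chains
    source = torch.zeros(N)
    for _ in range(M):
        r = dist_cube(x)
        source += r ** 2 / d                       # per-step source term (1/d) r^2 f with f = 1
        y = torch.randn_like(x)
        y = y / y.norm(dim=1, keepdim=True)
        x = x + r.unsqueeze(1) * y
    return (u_exact(x) + source).mean()            # g(X_M) evaluated as u(X_M), u being defined on all of R^d

x0 = torch.zeros(10); x0[0] = 0.3                  # an interior point of [-1,1]^10
print(abs(u_estimator(x0, M=500, N=10 ** 4) - u_exact(x0)).item())   # Err_{M,N}(x0)

Repeating such evaluations over L uniformly sampled points W_{i} and over E independent runs then yields the empirical errors |{\sf Err}_{M,N,L}|_{L^{1}(D/|D|)} and |{\sf Err}_{M,N,L,E}|_{L^{\infty}(D)} defined above.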

Figure 6: The histograms of {\sf Err}_{M,N}^{(i)},1\leq i\leq L and |{\sf Err}_{M,N,L}^{(j)}|_{L^{\infty}(D)},1\leq j\leq E for D=D_{\sf ac}, d=10, N=10^{5}, M=500; (a) the histogram of {\sf Err}_{M,N}^{(i)}(\omega),1\leq i\leq L=2\times 10^{4}; (b) the histogram of |{\sf Err}_{M,N,L}^{(j)}|_{L^{\infty}(D)},1\leq j\leq E=10, L=2\times 10^{3}.

Figure 7: The evolution of (a) |{\sf Err}_{M,N,L}|_{L^{1}(D/|D|)}(\omega) and (b) |{\sf Err}_{M,N,L,E}|_{L^{\infty}(D)}(\omega) w.r.t. N, for D=D_{\sf ac}, d=100, M=500, L=1000, and E=5.

Figure 8: The evolution of {\sf Err}_{M,N}(x)(\omega) w.r.t. N (with a zoom in panel (b)), for D=D_{\sf ac}, d=100, M=2000, and an arbitrarily fixed location x\in D, at which the exact solution u(x) equals -0.357.

Comments on Test 2.

  • The histograms depicted in Figure 6 for the Poisson problem (5.3) with d=10 and D=D_{\sf ac} confirm that the random variables given by the normalized L^{1}-error |{\sf Err}_{M,N}|_{L^{1}(D/|D|)} and the L^{\infty}-error |{\sf Err}_{M,N,L}|_{L^{\infty}(D)} are small and concentrated if M and N are chosen sufficiently large, as stipulated by the theoretical results 2.26 and 2.28.

  • The numerical tests depicted in Figure 7 for the Poisson problem (5.3) with d=100 and D=D_{\sf ac} confirm that the errors \mathbb{E}\left\{|{\sf Err}_{M,N}|_{L^{1}(D/|D|)}\right\} and \mathbb{E}\left\{\sup_{x\in D}|{\sf Err}_{M,N}(x)|\right\}, approximated through (5.6) and (5.7) respectively, decrease to a small value as the number N of WoS trajectories increases. The limit error attained as N goes to infinity is not zero, since it depends on M, but it decreases to zero as the latter parameter is also increased; this is discussed in the next comment.

  • Furthermore, the results illustrated in Figure 8 for d=100 show that the error {\sf Err}_{M,N}(x) at an arbitrarily chosen location x\in D=D_{\sf ac} becomes much smaller than the errors obtained in Figure 7, as the number of WoS steps M is increased.

  • Concerning the dependence of M and N on d, our numerical tests revealed that, on the one hand, the choice of M required in 2.26 or 2.28 is essentially optimal, and, on the other hand, that N can in fact be taken much smaller than required by the same theoretical results. This numerical evidence suggests that the width of the DNNs provided by 3.10 could be significantly reduced.

  • Finally, let us emphasize that the numerical results of Test 2 are obtained for the domain D_{\sf ac}, which is neither defective convex nor satisfies the uniform exterior ball condition, hence the test turned out to be successful even for a less favourable domain geometry.

Acknowledgements. Lucian Beznea and Oana Lupascu-Stamate were supported by a grant of the Ministry of Research, Innovation and Digitization, CNCS - UEFISCDI, project number PN-III-P4-PCE-2021-0921, within PNCDI III. Iulian Cimpean acknowledges support from the project PN-III-P1-1.1-PD-2019-0780, within PNCDI III. The work of Arghir Zarnescu has been partially supported by the Basque Government through the BERC 2022-2025 program and by the Spanish State Research Agency through BCAM Severo Ochoa excellence accreditation Severo Ochoa CEX2021-00114 and through project PID2020-114189RB-I00 funded by Agencia Estatal de Investigación (PID2020-114189RB-I00 / AEI / 10.13039/501100011033).

References

  • [1] D.H. Armitage and Ü. Kuran. The convexity of a domain and the superharmonicity of the signed distance function. Proceedings of the Amer. Math. Soc., 93(4):598–600, 1985.
  • [2] Marc Arnaudon and Xue-Mei Li. Reflected Brownian motion: selection, approximation and linearization. Electronic Journal of Probability, 22(none):1 – 55, 2017.
  • [3] Vlad Bally and Denis Talay. The law of the Euler scheme for stochastic differential equations: I. Convergence rate of the distribution function. Probability Theory and Related Fields, 104, 07 1994.
  • [4] Fabrice Baudoin. Stochastic analysis on sub-Riemannian manifolds with transverse symmetries. The Annals of Probability, 45(1):56 – 81, 2017.
  • [5] Christian Beck, Sebastian Becker, Philipp Grohs, Nor Jaafari, and Arnulf Jentzen. Solving the Kolmogorov PDE by means of Deep learning. J. Sci. Comput., 88(3), sep 2021.
  • [6] Christian Beck, Martin Hutzenthaler, Arnulf Jentzen, and Benno Kuckuck. An overview on deep learning-based approximation methods for partial differential equations. Discrete and Continuous Dynamical Systems - B, 28(6):3697–3746, 2023.
  • [7] Lucian Beznea and Andrei-George Oprina. Nonlinear PDEs and measure-valued branching type processes. Journal of Mathematical Analysis and Applications, 384(1):16–32, 2011. Special Issue on Stochastic PDEs in Fluid Dynamics, Particle Physics and Statistical Mechanics.
  • [8] I. Binder and M. Braverman. The rate of convergence of the walk on spheres algorithm. Geometric and Functional Analysis, 22(3):558–587, 2012.
  • [9] Mireille Bossy, Nicolas Champagnat, Sylvain Maire, and Denis Talay. Probabilistic interpretation and random walk on spheres algorithms for the Poisson-Boltzmann equation in molecular dynamics. ESAIM: Mathematical Modelling and Numerical Analysis, 44(5):997–1048, 2010.
  • [10] Krzysztof Burdzy, Zhen-Qing Chen, and John Sylvester. The heat equation and reflected Brownian motion in time-dependent domains. The Annals of Probability, 32(1B):775 – 804, 2004.
  • [11] Patrick Cheridito, H. Soner, Nizar Touzi, and Nicolas Victoir. Second order backward stochastic differential equations and fully non-linear parabolic PDEs. 10 2005.
  • [12] G. Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4):303–314, 1989.
  • [13] M. Deaconu, S. Herrmann, and S. Maire. The walk on moving spheres: A new tool for simulating Brownian motion’s exit time from a domain. Mathematics and Computers in Simulation, 135:28–38, 2017. Special Issue: 9th IMACS Seminar on Monte Carlo Methods.
  • [14] Madalina Deaconu and Antoine Lejay. A random walk on rectangles algorithm. Methodology And Computing In Applied Probability, 8:135–151, 03 2006.
  • [15] D. Elbrächter, D. Perekrestenko, P. Grohs, and H. Bölcskei. Deep neural network approximation theory. IEEE Transactions on Information Theory, 67(5):2581–2623, 2021.
  • [16] Arash Fahim, Nizar Touzi, and Xavier Warin. A probabilistic numerical method for fully nonlinear parabolic PDEs. The Annals of Applied Probability, 21(4):1322 – 1364, 2011.
  • [17] J. F. Le Gall. Spatial branching processes, random snakes, and partial differential equations. 1999.
  • [18] W.D. Gerhard. The probabilistic solution of the Dirichlet problem for 1/2Δ+a,+b1/2{\Delta}+\langle a,\nabla\rangle+b with singular coefficients. Journal of Theoretical Probability, 5(3):503–520, 1992.
  • [19] D. Gilbarg and N.S. Trudinger. Elliptic Partial Differential Equations of Second Order. Springer, 2001.
  • [20] Emmanuel Gobet. Monte-Carlo methods and stochastic processes: from linear to non-linear. Chapman and Hall/CRC, 2016.
  • [21] Lukas Gonon, Philipp Grohs, Arnulf Jentzen, David Kofler, and David Šiška. Uniform error estimates for artificial neural network approximations for heat equations. IMA Journal of Numerical Analysis, 42(3):1991–2054, 2022.
  • [22] Lukas Gonon, Philipp Grohs, Arnulf Jentzen, David Kofler, and David Šiška. Uniform error estimates for artificial neural network approximations for heat equations. IMA Journal of Numerical Analysis, 42(3):1991–2054, 2021.
  • [23] P. Grohs and L. Herrmann. Deep neural network approximation for high-dimensional elliptic PDEs with boundary conditions. IMA Journal of Numerical Analysis, 42(3):2055–2082, 05 2021.
  • [24] Haesung Lee, Wilhelm Stannat, and Gerald Trutnau. Analytic theory of Itô-stochastic differential equations with non-smooth coefficients. arXiv, 2022.
  • [25] J. Han, A. Jentzen, and E. Weinan. Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences, 115(34):8505–8510, 2018.
  • [26] Elton P Hsu. Stochastic analysis on manifolds. Number 38. American Mathematical Soc., 2002.
  • [27] Martin Hutzenthaler, Arnulf Jentzen, and Thomas Kruse. Overcoming the curse of dimensionality in the numerical approximation of parabolic partial differential equations with gradient-dependent nonlinearities. Foundations of Computational Mathematics, page 905–966, 2022.
  • [28] Martin Hutzenthaler, Arnulf Jentzen, Thomas Kruse, Tuan Anh Nguyen, and Philippe von Wurstemberger. Overcoming the curse of dimensionality in the numerical approximation of semilinear parabolic partial differential equations. Proceedings of the Royal Society A, 476(2244):20190630, 2020.
  • [29] Martin Hutzenthaler, Arnulf Jentzen, Thomas Kruse, and Tuan Nguyen. A proof that rectified deep neural networks overcome the curse of dimensionality in the numerical approximation of semilinear heat equations. SN Partial Differential Equations and Applications, 1, 04 2020.
  • [30] Martin Hutzenthaler, Arnulf Jentzen, and Philippe von Wurstemberger. Overcoming the curse of dimensionality in the approximative pricing of financial derivatives with default risks. Electronic Journal of Probability, 25(none):1 – 73, 2020.
  • [31] A. Jentzen, D. Salimova, and T. Welti. A proof that deep artificial neural networks overcome the curse of dimensionality in the numerical approximation of Kolmogorov partial differential equations with constant diffusion and nonlinear drift coefficients. Communications in Mathematical Sciences, 19(5):1167–1205, 2021.
  • [32] Vitalii Konarovskyi, Victor Marx, and Max von Renesse. Spectral gap estimates for Brownian motion on domains with sticky-reflecting boundary diffusion. arXiv e-prints, page arXiv:2106.00080, May 2021.
  • [33] A.E. Kyprianou, A. Osojnik, and T. Shardlow. Unbiased ‘walk-on-spheres’ Monte Carlo methods for the fractional Laplacian. IMA Journal of Numerical Analysis, 38, 09 2017.
  • [34] I.E. Lagaris, A. Likas, and D.I. Fotiadis. Artificial neural networks for solving ordinary and partial differential equations. IEEE transactions on neural networks, 9(5):987–1000, 1998.
  • [35] I.E Lagaris, A.C. Likas, and D.G. Papageorgiou. Neural-network methods for boundary value problems with irregular boundaries. IEEE Transactions on Neural Networks, 11(5):1041–1049, 2000.
  • [36] Antoine Lejay and Sylvain Maire. New Monte Carlo schemes for simulating diffusions in discontinuous media. Journal of Computational and Applied Mathematics, 245:97–116, 2013.
  • [37] A. Malek and R. Shekari Beidokhti. Numerical solution for high order differential equations using a hybrid neural network—optimization method. Applied Mathematics and Computation, 183(1):260–271, 2006.
  • [38] Miguel Martinez and Denis Talay. One-dimensional parabolic diffraction equations: pointwise estimates and discretization of related stochastic differential equations with weighted local times. Electronic Journal of Probability, 17:1–30, 2012.
  • [39] T. Marwah, Z.C. Lipton, and A. Risteski. Parametric complexity bounds for approximating pdes with neural networks. In M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan, editors, Advances in Neural Information Processing Systems, volume 34, pages 15044–15055. Curran Associates, Inc., 2021.
  • [40] M.E. Muller. Some continuous Monte Carlo methods for the Dirichlet problem. The Annals of Mathematical Statistics, pages 569–589, 1956.
  • [41] Etienne Pardoux and Shanjian Tang. Forward-backward stochastic differential equations and quasilinear parabolic PDEs. Probability Theory and Related Fields, 114(2):123–150, 1999.
  • [42] A. Pinkus. Approximation theory of the MLP model in neural networks. Acta Numerica, 8:143–195, 1999.
  • [43] M. Raissi, P. Perdikaris, and G. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.
  • [44] K.K. Sabelfeld and D. Talay. Integral formulation of the boundary value problems and the method of random walk on spheres. Monte Carlo Methods and Applications, 1(1):1–34, 1995.
  • [45] K.K. Sabelfeld and D. Talay. Integral formulation of the boundary value problems and the method of random walk on spheres. Monte Carlo Methods and Applications, 1(1):1–34, 1995.
  • [46] J. Sirignano and K. Spiliopoulos. DGM: A deep learning algorithm for solving partial differential equations. Journal of Computational Physics, 375:1339–1364, 2018.
  • [47] Daniel W Stroock. An introduction to the analysis of paths on a Riemannian manifold. Number 74. American Mathematical Soc., 2000.
  • [48] Denis Talay and Luciano Tubaro. Expansion of the global error for numerical schemes solving stochastic differential equations. Stochastic Analysis and Applications, 8(4):483–509, 1990.
  • [49] Anton Thalmaier and James Thompson. Exponential integrability and exit times of diffusions on sub-riemannian and metric measure spaces. Bernoulli, 26(3):2202–2225, 2020.
  • [50] N. Valenzuela. A new approach for the fractional Laplacian via deep neural networks. arXiv preprint arXiv:2205.05229, 2022.
  • [51] Max-K von Renesse and Karl-Theodor Sturm. Transport inequalities, gradient estimates, entropy and Ricci curvature. Communications on Pure and Applied Mathematics, 58(7):923–940, 2005.
  • [52] D. Yarotsky. Error bounds for approximations with deep ReLU networks. Neural Networks, 94:103–114, 2017.
  • [53] Yijing Zhou, Wei Cai, and Elton Hsu. Computation of local time of reflecting Brownian motion and probabilistic representation of the Neumann problem. Communications in Mathematical Sciences, 15, 02 2015.