
STAR DISCREPANCY BOUNDS BASED ON HILBERT SPACE FILLING CURVE STRATIFIED SAMPLING AND ITS APPLICATIONS

Jun Xian, Xiaoda Xu

J. Xian: Department of Mathematics and Guangdong Province Key Laboratory of Computational Science, Sun Yat-sen University, 510275 Guangzhou, China. xianjun@mail.sysu.edu.cn

X. Xu: Department of Mathematics, Sun Yat-sen University, 510275 Guangzhou, China. xuxd26@mail2.sysu.edu.cn
Abstract.

In this paper, we consider an upper bound on the probabilistic star discrepancy based on Hilbert space filling curve sampling. The problem originates from multivariate integral approximation, but the main result removes the strict conditions on the sampling number required by classical grid-based jittered sampling. The main content has three parts. First, we inherit the advantages of this new sampling scheme and obtain a better upper bound on the random star discrepancy than that obtained with Monte Carlo sampling; in particular, the convergence order of the upper bound is improved from O(N^{-\frac{1}{2}}) to O(N^{-\frac{1}{2}-\frac{1}{2d}}). Second, a better uniform integral approximation error bound for functions in a weighted space is obtained. Third, further applications are given, such as a sampling theorem in Hilbert spaces and an improvement of the classical Koksma-Hlawka inequality. Finally, the idea is also applied to a proof of the strong partition principle for the star discrepancy.

Key words and phrases:
Star discrepancy; Stratified sampling; Hilbert space filling curve; Integral approximation.
2020 Mathematics Subject Classification:
11K38, 65D30.

1. Introduction

Many problems in industry, statistics, physics and finance involve multivariate integrals. In order to represent, analyze and process these integrals effectively on a computer, the integrand must be discretized at sample points. A natural question is whether these discrete sampled values can achieve an optimally fast convergent approximation of the multivariate integral. This question is investigated by means of the discrepancy function and sampling theory, which also makes it a popular research topic in applied mathematics and engineering science.

For the multivariate function integral \int_{[0,1]^{d}}f(x)dx defined on the d-dimensional unit cube, if a discrete sampling point set P_{N,d}=\{t_{1},t_{2},\ldots,t_{N}\} is selected, the integral can be approximated by the sample mean \frac{1}{N}\sum_{t\in P_{N,d}}f(t). Then the approximation error can be calculated by:

E_{N}(f)=\left|\int_{[0,1]^{d}}f(x)dx-\frac{1}{N}\sum_{t\in P_{N,d}}f(t)\right|.
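As a simple illustration of this estimator (an added sketch, not part of the original text; the test integrand below is hypothetical), the sample mean and the error E_{N}(f) can be computed as follows.

```python
import numpy as np

def sample_mean_error(points, f, exact_integral):
    """Approximate the integral of f over [0,1]^d by the sample mean over the
    point set `points` (shape (N, d)) and return the absolute error E_N(f)."""
    estimate = np.mean(f(points))
    return abs(exact_integral - estimate)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, N = 3, 1024
    # Hypothetical test integrand: f(x) = prod_j (2 x_j), exact integral 1.
    f = lambda x: np.prod(2.0 * x, axis=1)
    mc_points = rng.random((N, d))          # plain Monte Carlo points
    print(sample_mean_error(mc_points, f, 1.0))
```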

Among various techniques to estimate E_{N}(f), the star discrepancy has proven to be among the most efficient. The star discrepancy was first used to measure the irregularity of a point distribution. For a point set P_{N,d}=\{t_{1},t_{2},\ldots,t_{N}\}, the star discrepancy is defined as follows:

(1.1) D_{N}^{*}\left(P_{N,d}\right):=\sup_{x\in[0,1]^{d}}\left|\lambda([0,x])-\frac{\sum_{n=1}^{N}I_{[0,x]}\left(t_{n}\right)}{N}\right|,

where \lambda denotes the d-dimensional Lebesgue measure, and I_{[0,x]} denotes the characteristic function defined on the d-dimensional rectangle [0,x].

A well-known result in discrepancy theory is the Koksma-Hlawka inequality [1, 2]. It shows that the integral approximation error of any multivariate function can be separated into two parts: one is the star discrepancy of the point set P_{N,d}, and the other is the total variation of the function f (in the sense of Hardy and Krause), i.e.,

(1.2) E_{N}(f)\leq D_{N}^{*}\left(P_{N,d}\right)V(f).

Inequality (1.2) indicates that when f is a function of bounded variation, a smaller upper bound on the star discrepancy yields a smaller bound on the integration error. Therefore, we focus on the star discrepancy in what follows.

There are many kinds of constructive sampling sets, also known as low discrepancy point sets [3], which achieve a satisfactory convergence rate of the star discrepancy bound. For a fixed dimension d, the rate can reach O((\ln N)^{\alpha_{d}}/N), where \alpha_{d}\geq 0 is a constant depending only on the dimension d. This kind of point set is widely used in many fields, such as colored noise, learning theory and finance [4, 5, 6, 7], but it also has some limitations. First, the convergence rate depends strongly on the dimension: in high-dimensional spaces the convergence becomes slower, in some cases even worse than the rate O(N^{-\frac{1}{2}}) obtained by classical Monte Carlo sampling. The classical Monte Carlo method, by contrast, is independent of the dimension, so it is well suited to very high-dimensional data. Second, because a low discrepancy point set is constructed deterministically, it is difficult to handle the large point sets generated by random sampling in practice, such as in the processing of noise-contaminated signals. Therefore, it is necessary to explore a new sampling method that inherits the fast convergence of low discrepancy point sets in moderate dimensions and the broad applicability of Monte Carlo points in high dimensions.

In order to achieve a good approximation of multivariate integrals in higher dimensions, a large number of points must be sampled if low discrepancy point sets are used, which loses information about the moderate sample-size regime. In other words, if one wants the optimal upper bound on the star discrepancy in a higher-dimensional space with a moderate sample size, low discrepancy point sets are not applicable. Therefore, in recent years many scholars have studied this problem and tried to find suitable sampling point sets; this line of research is known as the search for the optimal pre-asymptotic star discrepancy bound. In 2001, Heinrich, Novak et al. [8] proved that the optimal pre-asymptotic upper bound can be obtained as long as the number of sampling points depends linearly on the dimension, and the upper bound is c\cdot\frac{\sqrt{d}}{\sqrt{N}}, where c is an unknown positive constant. In 2011, Aistleitner [9] showed that c is at most 10. This constant has since been improved to 2.53 [10]. From a practical point of view, the following probabilistic pre-asymptotic upper bound [11] can be obtained, namely,

(1.3) D_{N}^{*}(Y)\leq 5.7\sqrt{4.9-\frac{\ln(1-q)}{d}}\frac{\sqrt{d}}{\sqrt{N}},

holds with probability at least q for the Monte Carlo point set Y.

Combined with the K-H inequality, it is easy to achieve the optimal probabilistic approximation of bounded variation functions on the d-dimensional unit cube. Especially for high-dimensional data, this approximation overcomes some disadvantages of traditional deterministic low discrepancy point sets and gives a good approximation with a moderate sample size.

By optimizing the covering technique, the probability bound (1.3) was improved in [10], which gives

(1.4) D_{N}^{*}(Y)\leq 0.7729\sqrt{10.7042-\frac{\ln(1-q)}{d}}\frac{\sqrt{d}}{\sqrt{N}},

holds with probability at least q for the Monte Carlo point set Y.

It is worth mentioning that the above pre-asymptotic star discrepancy bounds and the optimal probabilistic approximation bounds are all obtained using the Monte Carlo point set. Although the Monte Carlo method gives a good approximation for high-dimensional data, its convergence is slow, of order O(N^{-\frac{1}{2}}). How to improve this convergence rate while keeping the approximation applicable to moderate sample sizes has been a concern of scholars in recent years. Some scholars have noticed that the main reason for the slow convergence of the Monte Carlo method is the large variance of the sample mean, and have therefore focused on variance reduction techniques and new sampling methods to improve the approximation, for example the probabilistic star discrepancy bound using Latin hypercube sampling [10] and the expected bound [12, 13] obtained using jittered sampling. Jittered sampling is formed by an isometric grid partition: [0,1]^{d} is divided into N=m^{d} small sub-cubes Q_{i}, 1\leq i\leq N, each with side length \frac{1}{m}; Figures 1 and 2 give 2-dimensional and 3-dimensional illustrations.

Figure 1. Two-dimensional stratified sampling
Figure 2. Three-dimensional stratified sampling
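For concreteness, the following Python sketch (an added illustration, not from the original paper) generates such a jittered sample: one uniformly distributed point in each of the N = m^d subcubes.

```python
import numpy as np

def jittered_sample(m, d, rng=None):
    """Jittered (grid-based stratified) sampling on [0,1]^d: the cube is split
    into N = m**d subcubes of side 1/m and one point is drawn uniformly in each."""
    rng = rng or np.random.default_rng()
    # All integer grid indices (i_1, ..., i_d) with 0 <= i_j < m.
    grid = np.stack(np.meshgrid(*[np.arange(m)] * d, indexing="ij"), axis=-1)
    grid = grid.reshape(-1, d)                     # shape (m**d, d)
    return (grid + rng.random(grid.shape)) / m     # one point per subcube

points = jittered_sample(m=4, d=2)   # N = 16 points in [0,1]^2
print(points.shape)
```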

Moreover, although classical jittered sampling inherits some advantages of stratified sampling, it is demanding with respect to the number of sampling points, namely N=m^{d}. This exponential dependence of the sampling number on d inevitably leads to the curse of dimensionality, so it is difficult to obtain a pre-asymptotic star discrepancy bound. Nevertheless, jittered sampling retains theoretical and numerical value; for example, it achieves a faster convergence rate than Monte Carlo sampling in relatively low-dimensional spaces. Latin hypercube sampling, on the other hand, overcomes the dimensionality problem and reduces the variance of the sample mean, but the convergence order of the star discrepancy remains O(N^{-\frac{1}{2}}), see [10].

If the goal is to reduce the error in (1.2) with a suitable sampling number, then for the class of bounded variation functions we have that

(1.5) E_{N}(f)\leq 0.7729\sqrt{10.7042-\frac{\ln(1-q)}{d}}\frac{\sqrt{d}}{\sqrt{N}}\cdot V(f),

holds with probability at least q.

How, then, can we improve the convergence rate of the approximation error in (1.5) while preserving the approximation information under an appropriate sampling number? Based on the above introduction and discussion, there are two main ideas. One is to use low discrepancy point sets, but note that the convergence order is O((\ln N)^{\alpha_{d}}/N), where \alpha_{d}\geq 0 is a constant depending on the dimension d; generally, for different point sets, \alpha_{d} is at least O(d), so for high-dimensional spaces the required number of samples is particularly large. The other is to use the idea of stratified sampling, but classical jittered sampling has the disadvantage of requiring N=m^{d} sampling points and thus faces the curse of dimensionality in high-dimensional spaces. Therefore, it is necessary to return to a more general stratified sampling method. We try to use a sampling method that combines the advantages of low discrepancy point sets and stratified sampling, and overcomes the dimension problem to a certain extent.

The Hilbert space filling curve sampling method happens to inherit many advantages of general stratified sampling, and we describe it in detail below. This method was developed by A. B. Owen, Z. He et al. to study the variance of the sample mean [14, 15]. In our main result, we use Hilbert space filling curve sampling to obtain an improved upper bound on the probabilistic star discrepancy; namely,

(1.6) D_{N}^{*}\left(X_{1},X_{2},\ldots,X_{N}\right)\leq\frac{c(d,q)}{N^{\frac{1}{2}+\frac{1}{2d}}},

holds with probability at least q for the Hilbert space filling curve sampling set X, where c(d,q) is an explicit and computable constant depending on d and q.

We can easily extend the probabilistic star discrepancy bound to the weighted form and give the uniform probabilistic approximation error in the weighted function space; that is, for functions in the weighted Sobolev space F_{d,1,\gamma}, the bound

(1.7) \sup_{f\in F_{d,1,\gamma},\|f\|_{F_{d,1,\gamma}}\leq 1}\left|\int_{[0,1]^{d}}f(x)dx-\frac{1}{N}\sum_{j=1}^{N}f(X_{j})\right|\leq\max_{\emptyset\neq u\subseteq I_{d}}\gamma_{u,d}\cdot\frac{c(|u|,q)}{N^{\frac{1}{2}+\frac{1}{2d}}}

holds with probability at least \epsilon for the Hilbert space filling curve sampling set X.

The strong partition principle for the star discrepancy is an open question mentioned in [12, 20]; we use the probabilistic results to prove the following conclusion:

(1.8) \mathbb{E}(D_{N}^{*}(P_{\Omega}))<\mathbb{E}(D_{N}^{*}(Y)),

where \Omega=\{\Omega_{1},\Omega_{2},\ldots,\Omega_{N}\} is an arbitrary equivolume partition, P_{\Omega} is the corresponding stratified sampling set, and Y=\{Y_{1},Y_{2},Y_{3},\ldots,Y_{N}\} is a set of simple random d-dimensional samples.

We also apply the main result to give explicit bounds for a set of sampling inequalities in a general Hilbert space.

We prove:

(1-\frac{D_{N}^{*}(X)}{\gamma})\|f\|_{L^{2}}^{2}\leq\frac{1}{N}\sum_{j=1}^{N}f^{2}(x_{j})\leq(\frac{D_{N}^{*}(X)}{\gamma}+1)\|f\|_{L^{2}}^{2},

holds with probability at least 1-\epsilon (\epsilon being the probability that the bound D_{N}^{*}(X)\leq\frac{c(d,\epsilon)}{N^{\frac{1}{2}+\frac{1}{2d}}} must hold), where 0<\gamma\leq 1, and f belongs to a Hilbert space.

Finally, we use a variant of the K-H inequality, combined with the obtained probabilistic star discrepancy bound, to give an approximation error for piecewise smooth functions f defined on a Borel convex subset of [0,1]^{d}.

We prove:

(1.9) \left|\frac{\sum_{n=1}^{N}(f\cdot\mathbf{1}_{\Omega})(X_{n})}{N}-\int_{\Omega}f(x)dx\right|\leq 2^{d}D_{N}^{*}(X_{1},X_{2},\ldots,X_{N})\cdot V(f),

holds with probability at least q, where D_{N}^{*}(X_{1},X_{2},\ldots,X_{N}) is the star discrepancy of the Hilbert space filling curve sampling set X and q denotes the probability that the upper bound D_{N}^{*}(X)\leq\frac{c(d,q)}{N^{\frac{1}{2}+\frac{1}{2d}}} must hold. Besides,

(1.10) V(f)=\sum_{u\subset\{1,2,\ldots,d\}}2^{d-|u|}\int_{[0,1]^{d}}\left|\frac{\partial^{|u|}}{\partial x_{u}}f(x)\right|dx.

The symbol \frac{\partial^{|u|}}{\partial x_{u}}f(x) denotes the partial derivative of f with respect to those components of x with index in u.

The remainder of this paper is organised as follows. Section 2 presents the preliminaries. Section 3 presents our main result. Section 4 gives some applications of the main result. Finally, in section 5, we conclude the paper with a short summary.

2. Hilbert space filling curve sampling and some estimations

2.1. Hilbert space filling curve sampling

We first introduce Hilbert space filling curve sampling, which is abbreviated as HSFC sampling. We mainly adopt the definitions and symbols in [14, 15]. HSFC sampling is actually a stratified sampling formed by a special partition method.

Let a_{i} be the first N=b^{m} points of the van der Corput sequence (van der Corput 1935) in base b\geq 2, m=0,1,\ldots. The integer i-1\geq 0 is written in base b as

(2.1) i-1=\sum_{j=1}^{\infty}a_{ij}b^{j-1}

for a_{ij}\in\{0,\ldots,b-1\}. Then, a_{i} is defined by

(2.2) a_{i}=\sum_{j=1}^{\infty}a_{ij}b^{-j}.

The scrambled version of a_{1},a_{2},\ldots,a_{N} is x_{1},x_{2},\ldots,x_{N}, written as

(2.3) x_{i}=\sum_{j=1}^{\infty}x_{ij}b^{-j},

where the x_{ij} are defined through random permutations of the a_{ij}. These permutations depend on a_{ik} for k<j. More precisely, x_{i1}=\pi(a_{i1}), x_{i2}=\pi_{a_{i1}}(a_{i2}) and generally, for j\geq 2,

(2.4) x_{ij}=\pi_{a_{i1}\ldots a_{i,j-1}}(a_{ij}).

Each random permutation is uniformly distributed over the b! permutations of \{0,\ldots,b-1\}, and the permutations are mutually independent. Thanks to the nice property of nested uniform scrambling, the values in the scrambled sequence can be reordered such that

(2.5) x_{i}\sim U(I_{i}),

independently with

I_{i}=[\frac{i-1}{N},\frac{i}{N}]

for i=1,2,\ldots,N=b^{m}.
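As an added illustration (a minimal Python sketch, not part of the original paper), the nested uniform scrambling of the first N = b^m van der Corput points can be realised by drawing one independent random permutation per digit prefix; scrambling the digits beyond position m is emulated by an independent uniform offset of size b^{-m}.

```python
import numpy as np

def scrambled_van_der_corput(b, m, rng=None):
    """First N = b**m points of the base-b van der Corput sequence with nested
    uniform (Owen) scrambling of the leading m digits.  After sorting, exactly
    one point lies in each interval I_i = [(i-1)/N, i/N]."""
    rng = rng or np.random.default_rng()
    N = b ** m
    perms = {}                      # one permutation of {0,...,b-1} per digit prefix
    x = np.empty(N)
    for i in range(N):              # i plays the role of i-1 in (2.1)
        digits, t = [], i
        for _ in range(m):
            digits.append(t % b)    # a_{i1}, a_{i2}, ... (least significant first)
            t //= b
        value, prefix = 0.0, ()
        for j, a in enumerate(digits):
            if prefix not in perms:
                perms[prefix] = rng.permutation(b)
            value += perms[prefix][a] * float(b) ** (-(j + 1))
            prefix = prefix + (int(a),)
        # the remaining (unlisted) digits are emulated by a uniform offset
        x[i] = value + rng.random() * float(b) ** (-m)
    return x

print(scrambled_van_der_corput(b=2, m=4))
```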

Let

E_{i}=H(I_{i}):=\{H(x)\,|\,x\in I_{i}\},

where H:[0,1]\to[0,1]^{d} is the Hilbert space filling curve mapping.

Then

X_{i}=H(x_{i})\sim U(E_{i}),\quad i=1,2,3,\ldots,N=b^{m}

are the corresponding stratified samples. Let r_{i} be the diameter of E_{i}; according to the properties of HSFC sampling, the following estimate holds:

(2.6) r_{i}\leq 2\sqrt{d+3}\cdot N^{-\frac{1}{d}}.
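To make the construction concrete, the sketch below (added for illustration and restricted to d = 2; it approximates H at a finite resolution by sending each point to the centre of its Hilbert-curve cell, using the standard index-to-coordinate routine) turns the scrambled one-dimensional points x_i into two-dimensional HSFC samples X_i.

```python
import numpy as np

def hilbert_d2xy(order, idx):
    """Standard Hilbert-curve routine: map an index idx in [0, 4**order)
    to integer cell coordinates (x, y) on a 2**order x 2**order grid."""
    x = y = 0
    t, s, n = idx, 1, 2 ** order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                        # rotate/flip the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hsfc_samples_2d(x1d, order=10):
    """Approximate X_i = H(x_i) for d = 2: each one-dimensional point is sent
    to the centre of its Hilbert-curve cell at resolution 4**order cells."""
    n_cells, side = 4 ** order, 2 ** order
    pts = []
    for u in x1d:
        idx = min(int(u * n_cells), n_cells - 1)
        cx, cy = hilbert_d2xy(order, idx)
        pts.append(((cx + 0.5) / side, (cy + 0.5) / side))
    return np.array(pts)

# e.g. X = hsfc_samples_2d(scrambled_van_der_corput(b=4, m=3))  # 64 stratified points
```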

2.2. Minkowski content

We use the definition of Minkowski content in [16], which is convenient for analysing the boundary properties of the test sets in (1.1). For a set \Omega\subset[0,1]^{d}, define

(2.7) \mathscr{M}(\partial\Omega)=\lim_{\epsilon\to 0}\frac{\lambda((\partial\Omega)_{\epsilon})}{2\epsilon},

where (\partial\Omega)_{\epsilon}=\{x\in\mathbb{R}^{d}\,|\,\mathrm{dist}(x,\partial\Omega)\leq\epsilon\}. If \mathscr{M}(\partial\Omega) exists and is finite, then \partial\Omega is said to admit a (d-1)-dimensional Minkowski content. If \Omega is a convex set, then it is easy to see that \partial\Omega admits a (d-1)-dimensional Minkowski content. Furthermore, \mathscr{M}(\partial\Omega)\leq 2d, since the surface area of a convex set in [0,1]^{d} is bounded by the surface area of the unit cube [0,1]^{d}, which is 2d.

2.3. \delta-covers

To discretise the star discrepancy, we use the definition of \delta-covers as in [17].

Definition 2.1.

For any \delta\in(0,1], a finite set \Gamma of points in [0,1)^{d} is called a \delta-cover of [0,1)^{d} if for every y\in[0,1)^{d} there exist x,z\in\Gamma\cup\{0\} such that x\leq y\leq z and \lambda([0,z])-\lambda([0,x])\leq\delta. The number \mathcal{N}(d,\delta) denotes the smallest cardinality of a \delta-cover of [0,1)^{d}.

From [18], combined with Stirling's formula, the following estimate holds for \mathcal{N}(d,\delta): for any d\geq 1 and \delta\in(0,1),

(2.8) \mathcal{N}(d,\delta)\leq 2^{d}\cdot\frac{e^{d}}{\sqrt{2\pi d}}\cdot(\delta^{-1}+1)^{d}.

Furthermore, the following lemma provides a convenient way to estimate the star discrepancy with \delta-covers.

Lemma 2.2.

[17] Let P=\{p_{1},p_{2},\ldots,p_{N}\}\subset[0,1]^{d} and let \Gamma be a \delta-cover; then

(2.9) D_{N}^{*}(P)\leq D_{\Gamma}(P)+\delta,

where

(2.10) D_{\Gamma}(P):=\max_{x\in\Gamma}\left|\lambda([0,x])-\frac{\sum_{n=1}^{N}I_{[0,x]}\left(p_{n}\right)}{N}\right|.
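As an added illustration of how Lemma 2.2 is used in practice, the Python sketch below brackets the star discrepancy of a small point set with the simple (non-optimal) equidistant-grid \delta-cover \Gamma=\{1/k,\ldots,1\}^{d}, for which \delta=1-(1-1/k)^{d}; it returns D_{\Gamma}(P) and D_{\Gamma}(P)+\delta, so that D_{\Gamma}(P)\leq D_{N}^{*}(P)\leq D_{\Gamma}(P)+\delta.

```python
import itertools
import numpy as np

def star_discrepancy_bounds(points, k=32):
    """Bracket the star discrepancy of `points` (shape (N, d)) using the
    equidistant grid Gamma = {1/k,...,k/k}^d as a delta-cover,
    with delta = 1 - (1 - 1/k)**d.  Returns (lower, upper)."""
    N, d = points.shape
    delta = 1.0 - (1.0 - 1.0 / k) ** d
    d_gamma = 0.0
    for idx in itertools.product(range(1, k + 1), repeat=d):
        x = np.array(idx) / k                        # grid anchor point
        volume = np.prod(x)                          # lambda([0, x])
        frac = np.mean(np.all(points <= x, axis=1))  # empirical measure of [0, x]
        d_gamma = max(d_gamma, abs(volume - frac))
    return d_gamma, d_gamma + delta                  # D_Gamma <= D_N^* <= D_Gamma + delta

rng = np.random.default_rng(0)
print(star_discrepancy_bounds(rng.random((128, 2))))
```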

2.4. Bernstein inequality

At the end of this section, we will restate the Bernstein inequality which will be used in the estimation of star discrepancy bounds.

Lemma 2.3.

[19] Let Z_{1},\ldots,Z_{N} be independent random variables with expected values \mathbb{E}(Z_{j})=\mu_{j} and variances \sigma_{j}^{2} for j=1,\ldots,N. Assume |Z_{j}-\mu_{j}|\leq C (C a constant) for each j and set \Sigma^{2}:=\sum_{j=1}^{N}\sigma_{j}^{2}; then for any \lambda\geq 0,

\mathbb{P}\left\{\Big{|}\sum_{j=1}^{N}[Z_{j}-\mu_{j}]\Big{|}\geq\lambda\right\}\leq 2\exp\left(-\frac{\lambda^{2}}{2\Sigma^{2}+\frac{2}{3}C\lambda}\right).

3. Probabilistic star discrepancy bound for HSFC sampling

Theorem 3.1.

For integers b\geq 1, m\geq 1 and N=b^{m}, and for the d-dimensional stratified samples X_{i}\sim U(E_{i}), i=1,2,\ldots,N=b^{m}, defined in Section 2, we have that

(3.1) D_{N}^{*}\left(X_{1},X_{2},\ldots,X_{N}\right)\leq\frac{6d^{\frac{3}{4}}}{N^{\frac{1}{2}+\frac{1}{2d}}}\cdot\sqrt{d\ln(N+1)+c(d,q)}+\frac{2c(d,q)}{3N},

holds with probability at least q, where

c(d,q)=\ln\frac{(2e)^{d}}{\sqrt{2\pi d}\cdot(1-q)}.
Remark 3.2.

Theorem 3.1 improves the convergence order of the probabilistic star discrepancy bound from O(N^{-\frac{1}{2}}) to O(N^{-\frac{1}{2}-\frac{1}{2d}}\cdot(\ln N)^{\frac{1}{2}}), which can be compared with (1.3) and with the star discrepancy part of (1.5). Secondly, the theorem is not as demanding as classical jittered sampling, which requires the number of sampling points to depend exponentially on the dimension, i.e., N=m^{d}; in fact, the number of sampling points in the HSFC sampling method does not depend on the dimension. Therefore, it gives a better integral approximation error in the pre-asymptotic case, which can be obtained by using the K-H inequality.
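For a quick numerical comparison (an added sketch; the constants are taken directly from (1.4) and (3.1)), the two probabilistic bounds can be evaluated as follows.

```python
import numpy as np

def mc_bound(N, d, q):
    """Probabilistic Monte Carlo star discrepancy bound (1.4)."""
    return 0.7729 * np.sqrt(10.7042 - np.log(1 - q) / d) * np.sqrt(d / N)

def hsfc_bound(N, d, q):
    """Probabilistic HSFC star discrepancy bound (3.1) of Theorem 3.1."""
    c = np.log((2 * np.e) ** d / (np.sqrt(2 * np.pi * d) * (1 - q)))
    return (6 * d ** 0.75 / N ** (0.5 + 0.5 / d)
            * np.sqrt(d * np.log(N + 1) + c) + 2 * c / (3 * N))

for N in [2 ** 10, 2 ** 14, 2 ** 18]:
    print(N, mc_bound(N, d=4, q=0.95), hsfc_bound(N, d=4, q=0.95))
```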

Proof.

Let A be a subset of E_{i}; then, according to X_{i}\sim U(E_{i}), 1\leq i\leq N, it follows that

(3.2) \mathbb{P}(X_{i}\in A)=\frac{\lambda(A)}{\lambda(E_{i})}=N\lambda(A).

Now consider an arbitrary d-dimensional rectangle R=[0,x]\subseteq[0,1]^{d} with diameter \kappa. As x ranges over the unit cube [0,1)^{d}, we can choose two points y,z such that y\leq x\leq z and \lambda([0,z])-\lambda([0,y])\leq\frac{1}{N}. Let R_{0}=[0,y] and R_{1}=[0,z]; then we have

R_{0}\subseteq R\subset R_{1},

and

(3.3) \lambda(R_{1})-\lambda(R_{0})\leq\frac{1}{N}.

Clearly R_{0} has a smaller diameter than \kappa, which we denote by \kappa_{0}, and R_{1} has a larger diameter than \kappa, which we denote by \kappa_{1}. This forms a bracketing cover for the set R, and from (2.8) and (3.3) we can bound the number of bracketing pairs (R_{0},R_{1}): their cardinality is at most 2^{d-1}\frac{e^{d}}{\sqrt{2\pi d}}(N+1)^{d}. Therefore, from Lemma 2.2, we obtain

(3.4) D_{N}^{*}(X_{1},X_{2},\ldots,X_{N};R)\leq\max_{i=0,1}D_{N}^{*}(X_{1},X_{2},\ldots,X_{N};R_{i})+\frac{1}{N}.

For an anchored box R in [0,1]^{d}, it is easy to check that R is representable as the disjoint union of the E_{i}'s entirely contained in R together with the union of l pieces which are the intersections of some E_{j}'s with R, i.e.,

(3.5) R=\bigcup_{i\in I}E_{i}\cup\bigcup_{j\in J}(E_{j}\cap R),

where I and J denote the index sets.

By the definition of Minkowski content, for any \sigma>2, there exists \epsilon_{0} such that \lambda((\partial R)_{\epsilon})\leq\sigma\epsilon\mathscr{M}(\partial R) whenever \epsilon\leq\epsilon_{0}.

From (2.6), the diameter of each E_{i} is at most 2\sqrt{d+3}\cdot N^{-\frac{1}{d}}. We may assume N>(\frac{2\sqrt{d+3}}{\epsilon_{0}})^{d}; then 2\sqrt{d+3}\cdot N^{-\frac{1}{d}}:=\epsilon<\epsilon_{0} and \bigcup_{j\in J}(E_{j}\cap R)\subseteq(\partial R)_{\epsilon}; thus

|J|\leq\frac{\lambda((\partial R)_{\epsilon})}{\lambda(E_{i})}\leq\frac{\sigma\epsilon\mathscr{M}(\partial R)}{N^{-1}}=2\sqrt{d+3}\sigma\mathscr{M}(\partial R)N^{1-\frac{1}{d}}.

Without loss of generality, we can set \sigma=3, and combined with the fact that \mathscr{M}(\partial R)\leq 2d, it follows that

(3.6) |J|\leq 12d\sqrt{d+3}\cdot N^{1-\frac{1}{d}}.

The same argument as in (3.6) applies to the test sets R_{0} and R_{1}.

For R_{0} or R_{1}, set

(3.7) D_{N}^{*}(X_{1},X_{2},\ldots,X_{N};R^{\prime})=\max_{i=0,1}D_{N}^{*}(X_{1},X_{2},\ldots,X_{N};R_{i}).

R^{\prime} is also a test rectangle, which can be divided into two parts

(3.8) R^{\prime}=\bigcup_{k\in K}E_{k}\cup\bigcup_{l\in L}(E_{l}\cap R^{\prime}),

and the number of such test sets R^{\prime}\subset[0,1)^{d} is at most 2^{d-1}\frac{e^{d}}{\sqrt{2\pi d}}(N+1)^{d} according to the \delta-cover estimate.

Let

(3.9) T=\bigcup_{l\in L}(E_{l}\cap R^{\prime}),\quad|L|=|\{1,2,\ldots,l\}|.

If we define new random variables \chi_{l}, 1\leq l\leq|L|, as follows

(3.10) \chi_{l}=\left\{\begin{aligned}&1,\quad X_{l}\in E_{l}\cap R^{\prime}\\ &0,\quad\text{otherwise},\end{aligned}\right.

then from the above discussions, we have

(3.11) N\cdot D_{N}^{*}\left(X_{1},X_{2},\ldots,X_{N};R^{\prime}\right)=N\cdot D_{N}^{*}\left(X_{1},X_{2},\ldots,X_{N};T\right)=\left|\sum_{l=1}^{|L|}\chi_{l}-N\Big{(}\sum_{l=1}^{|L|}\lambda(E_{l}\cap R^{\prime})\Big{)}\right|.

Since

(3.12) \mathbb{P}(\chi_{l}=1)=\frac{\lambda(E_{l}\cap R^{\prime})}{\lambda(E_{l})}=N\cdot\lambda(E_{l}\cap R^{\prime}),

hence

(3.13) \mathbf{E}(\chi_{l})=N\cdot\lambda(E_{l}\cap R^{\prime}).

Thus, from (3.11) and (3.13), we obtain

(3.14) N\cdot D_{N}^{*}\left(X_{1},X_{2},\ldots,X_{N};R^{\prime}\right)=\left|\sum_{l=1}^{|L|}(\chi_{l}-\mathbb{E}(\chi_{l}))\right|.

Let \sigma_{l}^{2}=\mathbb{E}(\chi_{l}-\mathbb{E}(\chi_{l}))^{2} and \Sigma^{2}=\sum_{l=1}^{|L|}\sigma_{l}^{2}; then we have

(3.15) \Sigma^{2}\leq|L|\leq 12d\sqrt{d+3}\cdot N^{1-\frac{1}{d}}.

Therefore, from Lemma 2.3, for every R^{\prime} we have

(3.16) \mathbb{P}\left(\Big{|}\sum_{l=1}^{|L|}(\chi_{l}-\mathbb{E}(\chi_{l}))\Big{|}>\lambda\right)\leq 2\cdot\exp\Big{(}-\frac{\lambda^{2}}{24d\sqrt{d+3}\cdot N^{1-\frac{1}{d}}+\frac{2\lambda}{3}}\Big{)}.

Let

(3.17) \mathscr{B}=\bigcup_{R^{\prime}}\left(\Big{|}\sum_{l=1}^{|L|}(\chi_{l}-\mathbb{E}(\chi_{l}))\Big{|}>\lambda\right).

Then using covering numbers, we have

(3.18) \mathbb{P}(\mathscr{B})\leq(2e)^{d}\cdot\frac{1}{\sqrt{2\pi d}}\cdot(N+1)^{d}\cdot\exp\Big{(}-\frac{\lambda^{2}}{24d\sqrt{d+3}\cdot N^{1-\frac{1}{d}}+\frac{2\lambda}{3}}\Big{)}.

Let A(d,q,N)=d\ln(2e)+d\ln(N+1)-\frac{\ln(2\pi d)}{2}-\ln(1-q), and we choose

(3.19) \lambda=\sqrt{24d\sqrt{d+3}\cdot A(d,q,N)+\frac{A^{2}(d,q,N)}{9N^{1-\frac{1}{d}}}}\cdot N^{\frac{1}{2}-\frac{1}{2d}}+\frac{A(d,q,N)}{3}.

By substituting \lambda into (3.18), we have

(3.20) \mathbb{P}(\mathscr{B})\leq 1-q.

Combining the above with (3.14), we obtain that

(3.21) D_{N}^{*}\left(X_{1},X_{2},\ldots,X_{N};R^{\prime}\right)\leq\frac{\sqrt{24d\sqrt{d+3}\cdot A(d,q,N)+\frac{A^{2}(d,q,N)}{9N^{1-\frac{1}{d}}}}}{N^{\frac{1}{2}+\frac{1}{2d}}}+\frac{A(d,q,N)}{3N}

holds with probability at least q.

Thus, obviously, we have

(3.22) \max_{i=0,1}D_{N}^{*}(X_{1},X_{2},\ldots,X_{N};R_{i})\leq\frac{\sqrt{24d\sqrt{d+3}\cdot A(d,q,N)+\frac{A^{2}(d,q,N)}{9N^{1-\frac{1}{d}}}}}{N^{\frac{1}{2}+\frac{1}{2d}}}+\frac{A(d,q,N)}{3N}\leq\frac{\sqrt{24d\sqrt{d+3}\cdot A(d,q,N)}}{N^{\frac{1}{2}+\frac{1}{2d}}}+\frac{2A(d,q,N)}{3N}

holds with probability at least q. Here

(3.23) A(d,q,N)=\ln\frac{(2e)^{d}}{\sqrt{2\pi d}\cdot(1-q)}+d\ln(N+1)=c(d,q)+d\ln(N+1),

where

(3.24) c(d,q)=\ln\frac{(2e)^{d}}{\sqrt{2\pi d}\cdot(1-q)}.

The proof is completed. ∎

4. Applications of the main result

The following will give four applications of Theorem 3.1. The first is uniform integral approximation in weighted function spaces. The second is a proof of the strong partition principle for the star discrepancy; the strong partition principle was proposed in [20, 21], and while it has been completely solved for the L_{2}-discrepancy, it remains open for the star discrepancy. The third is a sampling theorem in general Hilbert spaces. Finally, the main result is applied to the integral approximation problem on Borel convex subsets of [0,1]^{d}.

4.1. Uniform integral approximation for functions in weighted function space

Many high-dimensional problems in practical applications have low effective dimension [22]; that is, the function depends on the different components with different weights. Therefore, the problem is abstracted as finding the uniform integral approximation error in a weighted Sobolev space.

Let F_{d,1} be a Sobolev space; a function f\in F_{d,1} is differentiable with respect to every variable and its first-order mixed derivative has a finite L_{1}-norm. For d>1, the norm in F_{d,1} is defined as

\|f\|_{F_{d,1}}=\|D^{\vec{1}}f\|_{L_{1}([0,1]^{d})}=\int_{[0,1]^{d}}|D^{\vec{1}}f(x)|dx,

where \vec{1}=[1,1,\ldots,1] and D^{\vec{1}}=\partial^{d}/\partial x_{1}\ldots\partial x_{d}.

Then for the weighted Sobolev space F_{d,1,\gamma}, its norm is

(4.1) \|f\|_{F_{d,1,\gamma}}=\sum_{u\subseteq I_{d}}\gamma_{u,d}\int_{[0,1]^{|u|}}\Big{|}\frac{\partial^{|u|}}{\partial x_{u}}f(x_{u},1)\Big{|}dx_{u}.

We consider the problem of function approximation in the space F_{d,1,\gamma}. The sample mean method can still be used: for

I(f)=\int_{[0,1]^{d}}f(x)dx,

and sample mean function

\tilde{I}(f,\mathbf{P})=\frac{1}{N}\sum_{i=1}^{N}f(x_{i}).

We consider the worst-case error

E_{N}(f)=\left|\int_{[0,1]^{d}}f(x)dx-\frac{1}{N}\sum_{t\in P_{N,d}}f(t)\right|.

Then, according to the Hlawka-Zaremba identity [23], we have

e(E_{N}(f))=\sup_{f\in F_{d,1,\gamma},\|f\|_{F_{d,1,\gamma}}\leq 1}|I(f)-\tilde{I}(f,\mathbf{P})|=D_{N,\gamma}^{*}(t_{1},t_{2},\ldots,t_{N}).
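For clarity (an added remark; the identity below is the standard convention for the weighted star discrepancy and is the quantity bounded in Theorem 4.1), the right-hand side can be written coordinate-projection-wise as

D_{N,\gamma}^{*}(t_{1},t_{2},\ldots,t_{N})=\max_{\emptyset\neq u\subseteq I_{d}}\gamma_{u,d}\,D_{N}^{*}(\mathbf{P}_{N,d}(u)),

where \mathbf{P}_{N,d}(u) denotes the projection of the point set onto the coordinates indexed by u.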

For uniform integral approximation in weighted Sobolev spaces, we have the following theorem:

Theorem 4.1.

Let f\in F_{d,1,\gamma} be a function in the weighted Sobolev space. Let b\geq 1, m\geq 1 be integers and N=b^{m}. Let b,\lambda,\lambda_{0},c be constants, with \lambda_{0} such that b\lambda^{2}\leq e^{2\lambda^{2}} holds for all \lambda\geq\lambda_{0}, and c=\max\{2,b,\lambda_{0},\frac{1}{\log_{2}(2-\epsilon)}\}. Then for the d-dimensional Hilbert space filling curve samples X_{i}\sim U(E_{i}), i=1,2,\ldots,N=b^{m}, we have that

(4.2) \sup_{f\in F_{d,1,\gamma},\|f\|_{F_{d,1,\gamma}}\leq 1}\left|\int_{[0,1]^{d}}f(x)dx-\frac{1}{N}\sum_{j=1}^{N}f(X_{j})\right|\leq\max_{\emptyset\neq u\subseteq I_{d}}\gamma_{u,d}\Big{[}\frac{6|u|^{\frac{3}{4}}}{N^{\frac{1}{2}+\frac{1}{2|u|}}}\cdot\sqrt{|u|\ln(N+1)+c(|u|,\epsilon)}+\frac{2c(|u|,\epsilon)}{3N}\Big{]}

holds with probability at least \epsilon, where

c(|u|,\epsilon)=\ln\frac{(2e)^{|u|}}{\sqrt{2\pi|u|}\cdot(1-\epsilon)}.
Proof.

In Theorem 3.1, we choose the probability q=\epsilon=1-(b\lambda^{2}e^{-2\lambda^{2}})^{d}, which holds for some positive constant b and all \lambda\geq\max\{1,b,\lambda_{0}\}, where \lambda_{0} is a constant such that b\lambda^{2}\leq e^{2\lambda^{2}} holds for all \lambda\geq\lambda_{0}, and we choose

\lambda=c\max\{1,\sqrt{(\ln d)/(\ln 2)}\},

and c=\max\{2,b,\lambda_{0}\}.

Let

c(|u|,\epsilon)=\ln\frac{(2e)^{|u|}}{\sqrt{2\pi|u|}\cdot(1-\epsilon)}.

For a given number of sample points N and dimension d, we consider the following set

A_{d}:=\{\mathbf{P}_{N,d}\subset[0,1]^{d}:D_{N}(\mathbf{P}_{N,d}(u))\leq[\frac{6|u|^{\frac{3}{4}}}{N^{\frac{1}{2}+\frac{1}{2|u|}}}\cdot\sqrt{|u|\ln(N+1)+c(|u|,\epsilon)}+\frac{2c(|u|,\epsilon)}{3N}],\ \forall u\subseteq I_{d},u\neq\emptyset\},

where \mathbf{P}_{N,d}(u):=\{X_{1}(u),\ldots,X_{N}(u)\}. In addition, for u\subseteq I_{d},u\neq\emptyset, we define

A_{u,d}:=\{\mathbf{P}_{N,d}\subset[0,1)^{d}:D_{N}(\mathbf{P}_{N,d}(u))\leq[\frac{6|u|^{\frac{3}{4}}}{N^{\frac{1}{2}+\frac{1}{2|u|}}}\cdot\sqrt{|u|\ln(N+1)+c(|u|,\epsilon)}+\frac{2c(|u|,\epsilon)}{3N}]\}.

Then we have

A_{d}=\bigcap_{\emptyset\neq u\subseteq I_{d}}A_{u,d}.

Hence,

\mathbb{P}(A_{d})=\mathbb{P}(\bigcap_{\emptyset\neq u\subseteq I_{d}}A_{u,d})=1-\mathbb{P}(\bigcup_{\emptyset\neq u\subseteq I_{d}}A_{u,d}^{c})\geq 1-\sum_{\emptyset\neq u\subseteq I_{d}}\mathbb{P}(A_{u,d}^{c})>1-\sum_{\emptyset\neq u\subseteq I_{d}}(b\lambda^{2}e^{-2\lambda^{2}})^{|u|}=1-\sum_{u=1}^{d}\dbinom{d}{u}(b\lambda^{2}e^{-2\lambda^{2}})^{u}=2-(1+b\lambda^{2}e^{-2\lambda^{2}})^{d}.

According to \lambda=c\max\{1,\sqrt{\frac{\ln d}{\ln 2}}\} and c=\max\{2,b,\lambda_{0}\}, for all d\geq 2 and x=\frac{c^{2}}{\ln 2}>5, we have x^{2}\leq 2^{x}\leq d^{x} and \ln d\leq d^{x-1}. Thus x^{2}\ln d\leq d^{2x-1}, and then

\frac{c^{3}\ln d}{(\ln 2)d^{\frac{2c^{2}}{\ln 2}}}\leq\frac{\ln 2}{cd}.

Based on this inequality, we obtain the following bound, which holds for all d\geq 2:

\mathbb{P}(A_{d})>2-(1+b\lambda^{2}e^{-2\lambda^{2}})^{d}\geq 2-(1+\frac{c^{3}\ln d}{(\ln 2)d^{\frac{2c^{2}}{\ln 2}}})^{d}\geq 2-(1+\frac{\ln 2}{cd})^{d}>2-e^{\frac{\ln 2}{c}}=2-2^{1/c}\geq\epsilon=q\in(0,1).

Thus, for every \emptyset\neq u\subseteq I_{d}, we get that

D_{N,\gamma}\left(t_{1},t_{2},\ldots,t_{N}\right)\leq\max_{\emptyset\neq u\subseteq I_{d}}\gamma_{u,d}\Big{[}\frac{6|u|^{\frac{3}{4}}}{N^{\frac{1}{2}+\frac{1}{2|u|}}}\cdot\sqrt{|u|\ln(N+1)+c(|u|,\epsilon)}+\frac{2c(|u|,\epsilon)}{3N}\Big{]}

holds with probability at least \epsilon.

The proof is completed. ∎

4.2. Strong partition principle for star discrepancy

For stratified sampling, [13] proves that classical jittered sampling achieves a better expected star discrepancy bound than traditional Monte Carlo sampling, which means that jittered sampling is an optimization of Monte Carlo sampling when the number of sampling points is N=m^{d}. The question then is how to compare the expected star discrepancies under different sampling methods directly, rather than through optimization of the bounds; this is the partition principle for the star discrepancy. That is, the expected star discrepancy of HSFC stratified sampling (here no longer limited to the jittered case) must be smaller than the expected star discrepancy of simple random sampling.

Theorem 4.2.

For integers b\geq 1, m\geq 1 and N=b^{m}, for the well-defined Hilbert space filling curve samples X_{i}\sim U(E_{i}), i=1,2,\ldots,N=b^{m}, and for simple random d-dimensional samples Y=\{Y_{1},Y_{2},Y_{3},\ldots,Y_{N}\}, we have

(4.3) \mathbb{E}(D_{N}^{*}(X))<\mathbb{E}(D_{N}^{*}(Y)).
Proof.

For an arbitrary test set R=[0,x), we use the unified label W=\{W_{1},W_{2},\ldots,W_{N}\} for the sampling point sets formed by the different sampling models, and we consider the following discrepancy function:

(4.4) \Delta_{\mathscr{P}}(x)=\frac{1}{N}\sum_{n=1}^{N}\mathbf{1}_{R}(W_{n})-\lambda(R).

For an equivolume partition \Omega=\{\Omega_{1},\Omega_{2},\ldots,\Omega_{N}\}, we divide the test set R into two parts, one being the disjoint union of the \Omega_{i} completely contained in R, and the other being the union of the remaining parts, which are the intersections of some \Omega_{j} with R, i.e.,

(4.5) R=\bigcup_{i\in I_{0}}\Omega_{i}\cup\bigcup_{j\in J_{0}}(\Omega_{j}\cap R),

where I_{0},J_{0} are two index sets.

We set

T=\bigcup_{j\in J_{0}}(\Omega_{j}\cap R).

Furthermore, for the equivolume partition \Omega=\{\Omega_{1},\Omega_{2},\ldots,\Omega_{N}\} of [0,1]^{d} and the corresponding stratified sampling set P_{\Omega}, we have

(4.6) \text{Var}(\sum_{n=1}^{N}\mathbf{1}_{R}(P_{\Omega}))=\sum_{i=1}^{N}\frac{|\Omega_{i}\cap[0,x]|}{|\Omega_{i}|}(1-\frac{|\Omega_{i}\cap[0,x]|}{|\Omega_{i}|})=N|[0,x]|-N^{2}\sum_{i=1}^{N}(|\Omega_{i}\cap[0,x]|)^{2}.

For the sampling sets Y=\{Y_{1},Y_{2},\ldots,Y_{N}\} and X=\{X_{1},X_{2},\ldots,X_{N}\}, we have

(4.7) \text{Var}(\sum_{n=1}^{N}\mathbf{1}_{R}(X))=N|[0,x]|-N^{2}\sum_{i=1}^{N}(|\Omega^{*}_{i}\cap[0,x]|)^{2}, \quad\text{where }\Omega^{*}_{i}:=E_{i}\text{ are the cells of the HSFC partition},

and

(4.8) \text{Var}(\sum_{n=1}^{N}\mathbf{1}_{R}(Y))=N|[0,x]|-N|[0,x]|^{2}.

Hence, we have

(4.9) \text{Var}(\sum_{n=1}^{N}\mathbf{1}_{R}(X))\leq\text{Var}(\sum_{n=1}^{N}\mathbf{1}_{R}(Y)).
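As a small numerical check of (4.6)-(4.9) (an added sketch using a two-dimensional grid partition, which is one admissible equivolume partition), the two variances can be compared for a concrete anchored box.

```python
import numpy as np

def stratified_vs_mc_variance(m, x):
    """Compare Var(sum 1_R) for one-point-per-cell stratified sampling on the
    m x m grid partition of [0,1]^2 with simple random sampling, R = [0, x]."""
    N = m * m
    vol_R = x[0] * x[1]
    cell_box_areas = []
    for i in range(m):
        for j in range(m):
            # area of the intersection of cell [i/m,(i+1)/m] x [j/m,(j+1)/m] with [0, x]
            ax = max(0.0, min((i + 1) / m, x[0]) - i / m)
            ay = max(0.0, min((j + 1) / m, x[1]) - j / m)
            cell_box_areas.append(ax * ay)
    var_stratified = N * vol_R - N ** 2 * sum(a * a for a in cell_box_areas)   # (4.6)
    var_mc = N * vol_R * (1.0 - vol_R)                                         # (4.8)
    return var_stratified, var_mc

print(stratified_vs_mc_variance(m=8, x=(0.37, 0.81)))   # stratified variance is smaller
```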

Now we exclude the case of equality in (4.9). Equality would mean that

(4.10) N|\Omega^{*}_{i}\cap[0,x]|=|[0,x]|,

for i=1,2,\ldots,N and for almost all x\in[0,1]^{d}.

Hence,

(4.11) \int_{[0,x]}\mathbf{1}_{\Omega^{*}_{i}}(y)dy=\int_{[0,x]}\frac{1}{N}dy,

for almost all x\in[0,1]^{d} and all i=1,2,\ldots,N, which implies \mathbf{1}_{\Omega^{*}_{i}}=\frac{1}{N} almost everywhere. This is impossible for N\geq 2.

For the set T, we have

(4.12) \text{Var}(\frac{1}{N}\sum_{n=1}^{N}\mathbf{1}_{T}(X_{n}))=\sum_{i=1}^{|J_{0}|}\frac{|\Omega^{*}_{i}\cap[0,x]|}{|\Omega^{*}_{i}|}(1-\frac{|\Omega^{*}_{i}\cap[0,x]|}{|\Omega^{*}_{i}|}).

By the same analysis for the set T as for R, we have

(4.13) \text{Var}(\frac{1}{N}\sum_{n=1}^{N}\mathbf{1}_{T}(X_{n}))<\text{Var}(\frac{1}{N}\sum_{n=1}^{N}\mathbf{1}_{T}(Y_{n})).

For the test set R=[0,x), we choose R_{0}=[0,y) and R_{1}=[0,z) such that y\leq x\leq z and \lambda(R_{1})-\lambda(R_{0})\leq\frac{1}{N}; then (R_{0},R_{1}) form the \frac{1}{N}-cover. For R_{0} and R_{1}, we can divide them into two parts as we did in (4.5). Let

T_{0}=\bigcup_{j\in J_{0}}(\Omega_{j}\cap R_{0}),

and

T_{1}=\bigcup_{j\in J_{0}}(\Omega_{j}\cap R_{1}).

We have the same result for T_{0} and T_{1}. To unify the two cases T_{0} and T_{1} (T_{0} and T_{1} are generated from families of test sets with the same cardinality, namely the covering number), we consider a set R^{\prime} which can be divided into two parts

(4.14) R^{\prime}=\bigcup_{k\in K}\Omega_{k}\cup\bigcup_{l\in L}(\Omega_{l}\cap R^{\prime}),

where K,L are two index sets. In addition, the number of such test sets R^{\prime}\subset[0,1)^{d} is at most 2^{d-1}\frac{e^{d}}{\sqrt{2\pi d}}(N+1)^{d} (the \delta-covering number, where we choose \delta=\frac{1}{N}), and we let

T^{\prime}=\bigcup_{l\in L}(\Omega_{l}\cap R^{\prime}).

We define new random variables \chi_{j}, 1\leq j\leq|L|, as follows

\chi_{j}=\left\{\begin{aligned}&1,\quad W_{j}\in\Omega_{j}\cap R^{\prime},\\ &0,\quad\text{otherwise}.\end{aligned}\right.

Then,

(4.15) N\cdot D^{*}_{N}\left(W_{1},W_{2},\ldots,W_{N};R^{\prime}\right)=N\cdot D^{*}_{N}\left(W_{1},W_{2},\ldots,W_{N};T^{\prime}\right)=\left|\sum_{j=1}^{|L|}\chi_{j}-N\Big{(}\sum_{j=1}^{|L|}\lambda(\Omega_{j}\cap T^{\prime})\Big{)}\right|.

Since

\mathbb{P}(\chi_{j}=1)=\frac{\lambda(\Omega_{j}\cap T^{\prime})}{\lambda(\Omega_{j})}=N\cdot\lambda(\Omega_{j}\cap T^{\prime}),

we get

(4.16) \mathbf{E}(\chi_{j})=N\cdot\lambda(\Omega_{j}\cap T^{\prime}).

Thus, from (4.15) and (4.16), we obtain

(4.17) N\cdot D^{*}_{N}\left(W_{1},W_{2},\ldots,W_{N};R^{\prime}\right)=\left|\sum_{j=1}^{|L|}(\chi_{j}-\mathbb{E}(\chi_{j}))\right|.

Let

\sigma_{j}^{2}=\mathbb{E}(\chi_{j}-\mathbb{E}(\chi_{j}))^{2},\quad\Sigma=(\sum_{j=1}^{|L|}\sigma_{j}^{2})^{\frac{1}{2}}.

Therefore, from Lemma 2.3, for every R^{\prime} we have

\mathbb{P}\left(\Big{|}\sum_{j=1}^{|L|}(\chi_{j}-\mathbb{E}(\chi_{j}))\Big{|}>\lambda\right)\leq 2\cdot\exp\Big{(}-\frac{\lambda^{2}}{2\Sigma^{2}+\frac{2\lambda}{3}}\Big{)}.

Let \mathscr{B}=\bigcup\limits_{R^{\prime}}\left(\Big{|}\sum_{j=1}^{|L|}(\chi_{j}-\mathbb{E}(\chi_{j}))\Big{|}>\lambda\right); then, using the \delta-cover numbers, we have

(4.18) \mathbb{P}(\mathscr{B})\leq(2e)^{d}\cdot\frac{1}{\sqrt{2\pi d}}\cdot(N+1)^{d}\cdot\exp\Big{(}-\frac{\lambda^{2}}{2\Sigma^{2}+\frac{2\lambda}{3}}\Big{)}.

Combining with (4.17), we get

(4.19) \mathbb{P}\Big{(}\bigcup_{R^{\prime}}\left(N\cdot D_{N}^{*}\left(W_{1},W_{2},\ldots,W_{N};R^{\prime}\right)>\lambda\right)\Big{)}\leq(2e)^{d}\cdot\frac{1}{\sqrt{2\pi d}}\cdot(N+1)^{d}\cdot\exp\Big{(}-\frac{\lambda^{2}}{2\Sigma^{2}+\frac{2\lambda}{3}}\Big{)}.

For the point sets Y and X, let

\Sigma_{0}^{2}=\text{Var}(\sum_{n=1}^{N}\mathbf{1}_{T^{\prime}}(X_{n})),\quad\Sigma_{1}^{2}=\text{Var}(\sum_{n=1}^{N}\mathbf{1}_{T^{\prime}}(Y_{n})).

Then (4.13) implies

\Sigma_{0}^{2}<\Sigma_{1}^{2}.

Besides, as in (4.19), we have

(4.20) \mathbb{P}\Big{(}\bigcup_{R^{\prime}}\left(N\cdot D_{N}^{*}\left(X_{1},X_{2},\ldots,X_{N};R^{\prime}\right)>\lambda\right)\Big{)}\leq(2e)^{d}\cdot\frac{1}{\sqrt{2\pi d}}\cdot(N+1)^{d}\cdot\exp\Big{(}-\frac{\lambda^{2}}{2\Sigma_{0}^{2}+\frac{2\lambda}{3}}\Big{)},

and

\mathbb{P}\Big{(}\bigcup_{R^{\prime}}\left(N\cdot D_{N}^{*}\left(Y_{1},Y_{2},\ldots,Y_{N};R^{\prime}\right)>\lambda\right)\Big{)}\leq(2e)^{d}\cdot\frac{1}{\sqrt{2\pi d}}\cdot(N+1)^{d}\cdot\exp\Big{(}-\frac{\lambda^{2}}{2\Sigma_{1}^{2}+\frac{2\lambda}{3}}\Big{)}.

Suppose A(d,q,N)=d\ln(2e)+d\ln(N+1)-\frac{\ln(2\pi d)}{2}-\ln(1-q), and we substitute

\lambda=\sqrt{2\Sigma_{0}^{2}\cdot A(d,q,N)+\frac{A^{2}(d,q,N)}{9}}+\frac{A(d,q,N)}{3}

into (4.20), then we have

(4.21) \mathbb{P}\Big{(}\bigcup_{R^{\prime}}\left(N\cdot D_{N}^{*}\left(X_{1},X_{2},\ldots,X_{N};R^{\prime}\right)>\lambda\right)\Big{)}\leq 1-q.

Therefore, from (4.21), it can be easily verified that

(4.22) \max_{R_{i},i=0,1}D_{N}^{*}(X_{1},X_{2},\ldots,X_{N};R_{i})\leq\frac{\sqrt{2\Sigma_{0}^{2}\cdot A(d,q,N)+\frac{A^{2}(d,q,N)}{9}}}{N}+\frac{A(d,q,N)}{3N}

holds with probability at least q.

Combined with the \delta-covering numbers (where \delta=\frac{1}{N}), we get that

(4.23) D_{N}^{*}(X)\leq\frac{\sqrt{2\Sigma_{0}^{2}\cdot A(d,q,N)+\frac{A^{2}(d,q,N)}{9}}}{N}+\frac{A(d,q,N)+3}{3N}\leq(\sqrt{2}\cdot\Sigma_{0}+1)\frac{A(d,q,N)}{N}

holds with probability at least q. The last inequality in (4.23) holds because A(d,q,N)\geq 3 for all q\in(0,1).

By the same analysis with the point set Y, we have that

(4.24) D_{N}^{*}(Y)\leq\frac{\sqrt{2\Sigma_{1}^{2}\cdot A(d,q,N)+\frac{A^{2}(d,q,N)}{9}}}{N}+\frac{A(d,q,N)+3}{3N}\leq(\sqrt{2}\cdot\Sigma_{1}+1)\frac{A(d,q,N)}{N}

holds with probability at least q.

Now we fix a probability value q_{0}\in(0,1) in (4.23), i.e., we assume that (4.23) holds with probability exactly q_{0}, where q_{0}\in[q,1). If we choose this q_{0} in (4.24), we have that

D_{N}^{*}(Y)\leq(\sqrt{2}\cdot\Sigma_{1}+1)\frac{A(d,q_{0},N)}{N}

holds with probability q_{0}.

Therefore, from \Sigma_{0}<\Sigma_{1}, we obtain that

(4.25) D_{N}^{*}(Y)\leq(\sqrt{2}\cdot\Sigma_{0}+1)\frac{A(d,q_{0},N)}{N}

holds with probability q_{1}, where 0<q_{1}<q_{0}.

We use the following fact to estimate the expected star discrepancy

(4.26) \mathbb{E}[D^{*}_{N}(W)]=\int_{0}^{1}\mathbb{P}(D^{*}_{N}(W)\geq t)dt,

where D^{*}_{N}(W) denotes the star discrepancy of the point set W.

By substituting q_{0} into (4.23), we obtain that

(4.27) D_{N}^{*}\left(X\right)\leq(\sqrt{2}\cdot\Sigma_{0}+1)\frac{A(d,q_{0},N)}{N}

holds with probability q_{0}. Then (4.27) is equivalent to

\mathbb{P}\big{(}D_{N}^{*}\left(X\right)\geq(\sqrt{2}\cdot\Sigma_{0}+1)\frac{A(d,q_{0},N)}{N}\big{)}=1-q_{0}.

Now, releasing q_{0} and letting

(4.28) t=(\sqrt{2}\cdot\Sigma_{0}+1)\frac{A(d,q_{0},N)}{N},
(4.29) C_{0}(\Sigma_{0},N)=\frac{\sqrt{2}\cdot\Sigma_{0}+1}{N},

and

(4.30) C_{1}(d,\Sigma_{0},N)=\frac{\sqrt{2}\cdot\Sigma_{0}+1}{N}\cdot(d\ln(2e)+d\ln(N+1)-\frac{\ln(2\pi d)}{2}).

Then

(4.31) t=C_{1}(d,\Sigma_{0},N)-C_{0}(\Sigma_{0},N)\ln(1-q_{0}).

Thus, from (4.26) and q_{0}\in[q,1), we have

(4.32) \mathbb{E}[D^{*}_{N}(X)]=\int_{0}^{1}\mathbb{P}(D^{*}_{N}(X)\geq t)dt=\int_{1-e^{\frac{C_{1}(d,\Sigma_{0},N)}{C_{0}(\Sigma_{0},N)}}}^{1-e^{\frac{C_{1}(d,\Sigma_{0},N)-1}{C_{0}(\Sigma_{0},N)}}}\mathbb{P}\Big{(}D^{*}_{N}(X)\geq(\sqrt{2}\cdot\Sigma_{0}+1)\frac{A(d,q_{0},N)}{N}\Big{)}\cdot C_{0}(\Sigma_{0},N)\cdot\frac{1}{1-q_{0}}dq_{0}=\int_{q}^{1-e^{\frac{C_{1}(d,\Sigma_{0},N)-1}{C_{0}(\Sigma_{0},N)}}}C_{0}(\Sigma_{0},N)\cdot\frac{1-q_{0}}{1-q_{0}}dq_{0}.

Furthermore, from (4.25), we have

\mathbb{P}\big{(}D_{N}^{*}\left(Y\right)\geq(\sqrt{2}\cdot\Sigma_{0}+1)\frac{A(d,q_{0},N)}{N}\big{)}=1-q_{1}.

Following the steps from (4.28) to (4.31), we obtain,

\mathbb{E}[D^{*}_{N}(Y)]=\int_{0}^{1}\mathbb{P}(D^{*}_{N}(Y)\geq t)dt=\int_{q}^{1-e^{\frac{C_{1}(d,\Sigma_{0},N)-1}{C_{0}(\Sigma_{0},N)}}}C_{0}(\Sigma_{0},N)\cdot\frac{1-q_{1}}{1-q_{0}}dq_{0}.

From q_{1}<q_{0}, we obtain

\frac{1-q_{1}}{1-q_{0}}>\frac{1-q_{0}}{1-q_{0}}.

Hence,

(4.33) \mathbb{E}(D_{N}^{*}(X))<\mathbb{E}(D_{N}^{*}(Y)). ∎

Corollary 4.3.

For any equivolume partition \Omega=\{\Omega_{1},\Omega_{2},\ldots,\Omega_{N}\}, the corresponding stratified sampling set P_{\Omega}, and simple random d-dimensional samples Y=\{Y_{1},Y_{2},Y_{3},\ldots,Y_{N}\}, we have

(4.34) \mathbb{E}(D_{N}^{*}(P_{\Omega}))<\mathbb{E}(D_{N}^{*}(Y)).
Proof.

For \Omega=\{\Omega_{1},\Omega_{2},\ldots,\Omega_{N}\} and P_{\Omega}, it follows that

(4.35) \text{Var}(\sum_{n=1}^{N}\mathbf{1}_{[0,x]}(P_{\Omega}))=\sum_{i=1}^{N}\frac{|\Omega_{i}\cap[0,x]|}{|\Omega_{i}|}(1-\frac{|\Omega_{i}\cap[0,x]|}{|\Omega_{i}|})=N|[0,x]|-N^{2}\sum_{i=1}^{N}(|\Omega_{i}\cap[0,x]|)^{2}.

For the simple random sampling set Y=\{Y_{1},Y_{2},\ldots,Y_{N}\}, we have

(4.36) \text{Var}(\sum_{n=1}^{N}\mathbf{1}_{[0,x]}(Y))=N|[0,x]|-N|[0,x]|^{2}.

According to the proof process of Theorem 4.2, we have

(4.37) \text{Var}(\frac{1}{N}\sum_{n=1}^{N}\mathbf{1}_{[0,x]}(P_{\Omega}))<\text{Var}(\frac{1}{N}\sum_{n=1}^{N}\mathbf{1}_{T}(Y_{n})),

which yields the desired result. ∎

Remark 4.4.

In fact, Corollary 4.3 extends the result of Theorem 4.2 to the more general case of equivolume partitions. The following gives a concrete example.

An example: For a grid-based equivolume partition in two dimensions, the two squares in the upper right corner are merged to form a rectangle

I=[a_{1},a_{1}+2b]\times[a_{2},a_{2}+b],

where a_{1},a_{2},b are three positive constants.

For the merged rectangle I, we use a family of straight-line partitions to divide the rectangle into two equal-volume parts. This reduces to a one-parameter model if we let \theta denote the angle between the dividing line and the horizontal line through the center, where we suppose that 0\leq\theta\leq\frac{\pi}{2}. A simple calculation shows that any such straight line must pass through the center of the rectangle. For convenience of notation, we denote this partition model by \Omega_{\sim}=(\Omega_{1,\sim},\Omega_{2,\sim},Q_{3},\ldots,Q_{N}) in the two-dimensional case, see Figure 3.

Figure 3. A class of equivolume partitions for two dimensions

Now, we consider the d-dimensional cuboid

(4.38) I_{d}=I\times\prod_{i=3}^{d}[a_{i},a_{i}+b]

and its partition \Omega^{\prime}_{\sim}=(\Omega^{\prime}_{1,\sim},\Omega^{\prime}_{2,\sim}) into two closed, convex bodies with

(4.39) \Omega^{\prime}_{1,\sim}=\Omega_{1,\sim}\times\prod_{i=3}^{d}[a_{i},a_{i}+b].

We choose a_{1}=\frac{m-2}{m}, a_{2}=\frac{m-1}{m}, b=\frac{1}{m} in \Omega^{\prime}_{1,\sim}, which is denoted by \Omega^{*}_{1,\sim}, and obtain

(4.40) \Omega^{*}_{\sim}=(\Omega^{*}_{1,\sim},\Omega^{*}_{2,\sim},Q_{3},\ldots,Q_{N}).

Then, we have the following result.

Corollary 4.5.

For any 0\leq\theta\leq\frac{\pi}{2}, for the class of equivolume partitions \Omega^{*}_{\sim}=(\Omega^{*}_{1,\sim},\Omega^{*}_{2,\sim},Q_{3},\ldots,Q_{N}) with the corresponding stratified sampling set P_{\Omega^{*}_{\sim}}, and for simple random samples Y=\{Y_{1},Y_{2},Y_{3},\ldots,Y_{N}\}, we have

(4.41) \mathbb{E}(D_{N}^{*}(P_{\Omega^{*}_{\sim}}))<\mathbb{E}(D_{N}^{*}(Y)).

4.3. Sampling theorem in Hilbert space

The sampling problem attempts to reconstruct or approximate a function f from its sampled values on a sampling set [24]. To solve this problem, it is necessary to specify the function space, since the theoretical and numerical analysis may differ for different function spaces. Common sampling function spaces are: (i) band-limited signals [25]; (ii) signals in some shift-invariant spaces [34, 32, 33, 26, 30, 31]; (iii) non-uniform splines [28, 29]; (iv) sums of some of the above signals [27]; and so on.

Well-posedness of the sampling problem implies that the following inequalities must hold:

(4.42) A\|f\|_{2}^{2}\leq\sum_{j=1}^{N}|f(x_{j})|^{2}\leq B\|f\|_{2}^{2},

where A and B are positive constants. The inequalities (4.42) indicate that a small change in a sampled value f(x_{j}) causes only a small change in f. This implies that the sampling is stable, or equivalently, that the reconstruction of f from its samples is continuous. Due to the randomness in the choice of the sampling points, our goal is to obtain the following probability estimate:

(4.43) \mathbb{P}(A\|f\|_{2}^{2}\leq\sum_{j=1}^{N}|f(x_{j})|^{2}\leq B\|f\|_{2}^{2})\geq 1-\epsilon,

where \epsilon>0 can be taken arbitrarily small.

In this subsection, we study this problem for stratified sampling sets and give explicit bounds for a set of sampling inequalities in general Hilbert spaces.

Theorem 4.6.

For integers b\geq 1, m\geq 1 and N=b^{m}, and for the well-defined d-dimensional Hilbert space filling curve samples X_{i}\sim U(E_{i}), i=1,2,\ldots,N=b^{m}, we have that

(1-\frac{D_{N}^{*}(X)}{\gamma})\|f\|_{L^{2}}^{2}\leq\frac{1}{N}\sum_{j=1}^{N}f^{2}(x_{j})\leq(\frac{D_{N}^{*}(X)}{\gamma}+1)\|f\|_{L^{2}}^{2},

holds with probability at least 1-\epsilon, where 0<\gamma\leq 1 and f belongs to a Hilbert space.

Proof.

Consider the following kernel function,

K(x,y)=\int_{[0,1]^{d}}\mathbf{1}_{(x,1]}(t)\mathbf{1}_{(y,1]}(t)dt.

It is not realistic to measure the samples f(x_{j}), 1\leq j\leq N, exactly; see the discussion in [34]. A better assumption is that the sampled values have the following form:

(4.44) f^{2}(x)=\int_{[0,1]^{d}}K(x,y)g^{2}(y)dy,

where f,g belong to a Hilbert space. For the samples x_{j}, 1\leq j\leq N, the K(x_{j},y) are a set of functionals acting on the function g^{2} to produce the data f^{2}(x_{j}); these functionals can reflect the characteristics of the sampling devices.

Therefore, we have

(4.45) \left|\frac{1}{N}\sum_{j=1}^{N}f^{2}(x_{j})-\int_{[0,1]^{d}}f^{2}(x)dx\right|=\left|\frac{1}{N}\sum_{j=1}^{N}\int_{[0,1]^{d}}K(x_{j},y)g^{2}(y)dy-\int_{[0,1]^{d}}\int_{[0,1]^{d}}K(x,y)g^{2}(y)dydx\right|=\left|\int_{[0,1]^{d}}\sum_{j=1}^{N}\frac{1}{N}K(x_{j},y)g^{2}(y)dy-\int_{[0,1]^{d}}g^{2}(y)\int_{[0,1]^{d}}K(x,y)dxdy\right|,

and

(4.46) \left|\int_{[0,1]^{d}}\sum_{j=1}^{N}\frac{1}{N}K(x_{j},y)g^{2}(y)dy-\int_{[0,1]^{d}}g^{2}(y)\int_{[0,1]^{d}}K(x,y)dxdy\right|=\left|\int_{[0,1]^{d}}\Big{(}\sum_{j=1}^{N}\frac{1}{N}K(x_{j},y)-\int_{[0,1]^{d}}K(x,y)dx\Big{)}g^{2}(y)dy\right|\leq\Big{\|}\sum_{j=1}^{N}\frac{1}{N}K(x_{j},y)-\int_{[0,1]^{d}}K(x,y)dx\Big{\|}_{\infty}\|g\|_{L^{2}}^{2}.

Therefore, according to (4.44), we obtain \gamma\|g\|_{L^{2}}^{2}\leq\|f\|_{L^{2}}^{2}\leq\|g\|_{L^{2}}^{2}, where 0<\gamma\leq 1.

In addition, combining with Theorem 3.1, we have that

(4.47) \Big{\|}\sum_{j=1}^{N}\frac{1}{N}K(x_{j},y)-\int_{[0,1]^{d}}K(x,y)dx\Big{\|}_{\infty}\leq\sup_{y\in[0,1]^{d}}\left|\int_{[0,1]^{d}}\sum_{j=1}^{N}\frac{1}{N}\mathbf{1}_{(x_{j},1]}(t)\mathbf{1}_{(y,1]}(t)dt-\int_{[0,1]^{d}}\int_{[0,1]^{d}}\mathbf{1}_{(x,1]}(t)\mathbf{1}_{(y,1]}(t)dtdx\right|=\sup_{y\in[0,1]^{d}}\left|\int_{[0,1]^{d}}\Big{(}\sum_{j=1}^{N}\frac{1}{N}\mathbf{1}_{(x_{j},1]}(t)-\int_{[0,1]^{d}}\mathbf{1}_{(x,1]}(t)dx\Big{)}\mathbf{1}_{(y,1]}(t)dt\right|\leq\sup_{y\in[0,1]^{d}}\Big{\|}\sum_{j=1}^{N}\frac{1}{N}\mathbf{1}_{(x_{j},1]}(t)-\int_{[0,1]^{d}}\mathbf{1}_{(x,1]}(t)dx\Big{\|}_{\infty}\int_{[0,1]^{d}}\mathbf{1}_{(y,1]}(t)dt\leq\Big{\|}\sum_{j=1}^{N}\frac{1}{N}\mathbf{1}_{(x_{j},1]}(t)-\int_{[0,1]^{d}}\mathbf{1}_{(x,1]}(t)dx\Big{\|}_{\infty}\leq\sup_{t\in[0,1]^{d}}\left|\sum_{j=1}^{N}\frac{1}{N}\mathbf{1}_{(x_{j},1]}(t)-\int_{[0,1]^{d}}\mathbf{1}_{(x,1]}(t)dx\right|\leq D_{N}^{*}(x_{1},x_{2},\ldots,x_{N})

holds with probability at least \epsilon.

Combining (4.45)-(4.47), we obtain the desired result. ∎

4.4. Application on Koksma–Hlawka inequality

The classical K-H inequality is not suitable for functions with simple discontinuities, whereas [35] proposes a variant K-H type inequality that applies to piecewise smooth functions of the form f\cdot\mathbf{1}_{\Omega}, where f is smooth and \Omega is a Borel convex subset of [0,1]^{d}. In this subsection, we use the star discrepancy upper bound for Hilbert space filling curve sampling to derive an approximation error bound for this class of piecewise smooth functions. We first need the following lemma:

Lemma 4.7.

[35] Let ff be a piecewise smooth function on [0,1]d[0,1]^{d}, and Ω\Omega be a Borel subset of [0,1]d[0,1]^{d}. Then for x1,x2,,xNx_{1},x_{2},\ldots,x_{N} in [0,1]d[0,1]^{d}, we have

(4.48) |n=1N(f𝟏Ω)(xn)NΩf(x)𝑑x|DNΩ(x1,x2,,xN)V(f),|\frac{\sum_{n=1}^{N}(f\cdot\mathbf{1}_{\Omega})(x_{n})}{N}-\int_{\Omega}f(x)dx|\leq D_{N}^{\Omega}(x_{1},x_{2},\ldots,x_{N})\cdot V(f),

where

(4.49) D_{N}^{\Omega}(x_{1},x_{2},\ldots,x_{N})=2^{d}\sup_{A\subseteq[0,1]^{d}}\Big|\frac{\sum_{n=1}^{N}\mathbf{1}_{\Omega\cap A}(x_{n})}{N}-\lambda(\Omega\cap A)\Big|,

and

(4.50) V(f)=u{1,2,,d}2d|u|[0,1]d||u|xuf(x)|𝑑x.V(f)=\sum_{u\subset\{1,2,\ldots,d\}}2^{d-|u|}\int_{[0,1]^{d}}|\frac{\partial^{|u|}}{\partial x_{u}}f(x)|dx.

The symbol \frac{\partial^{|u|}}{\partial x_{u}}f(x) is the partial derivative of f with respect to those components of x with index in u, and the supremum in (4.49) is taken over all axis-parallel boxes A.
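
For intuition, the quantity D_{N}^{\Omega} in (4.49) can be approximated numerically in low dimension by scanning a finite family of axis-parallel boxes and estimating \lambda(\Omega\cap A) with a dense uniform reference cloud. The sketch below (for d=2; the grid resolution, the reference-cloud size and the test set \Omega are arbitrary illustrative choices) therefore returns only a lower bound on the supremum, not its exact value:

```python
import numpy as np

def local_discrepancy_omega(pts, in_omega, grid=8, m_ref=200_000, seed=1):
    # Coarse approximation of D_N^Omega from (4.49) for d = 2:
    # scan axis-parallel boxes A with corners on a regular grid and
    # estimate lambda(Omega ∩ A) with a dense uniform reference cloud.
    rng = np.random.default_rng(seed)
    ref = rng.random((m_ref, 2))
    ref_in = in_omega(ref)
    pts_in = in_omega(pts)
    levels = np.linspace(0.0, 1.0, grid + 1)
    worst = 0.0
    for a0 in levels:
        for b0 in levels[levels > a0]:
            for a1 in levels:
                for b1 in levels[levels > a1]:
                    in_box_p = ((pts[:, 0] >= a0) & (pts[:, 0] <= b0) &
                                (pts[:, 1] >= a1) & (pts[:, 1] <= b1))
                    in_box_r = ((ref[:, 0] >= a0) & (ref[:, 0] <= b0) &
                                (ref[:, 1] >= a1) & (ref[:, 1] <= b1))
                    emp = np.mean(in_box_p & pts_in)   # empirical measure of Omega ∩ A
                    vol = np.mean(in_box_r & ref_in)   # estimated lambda(Omega ∩ A)
                    worst = max(worst, abs(emp - vol))
    return (2 ** 2) * worst                            # factor 2^d with d = 2

# Example: Omega = {x_1 + x_2 <= 1}, a convex Borel subset of [0,1]^2
pts = np.random.default_rng(2).random((256, 2))
D_omega = local_discrepancy_omega(pts, in_omega=lambda z: z.sum(axis=1) <= 1.0)
```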

Theorem 4.8.

Let b\geq 1 and m\geq 1 be integers and set N=b^{m}. Let f be a piecewise smooth function on [0,1]^{d}, and let \Omega be a Borel convex subset of [0,1]^{d}. Then for well-defined d-dimensional Hilbert space filling curve samples X_{i}\sim U(E_{i}), i=1,2,\ldots,N, we have

(4.51) |n=1N(f𝟏Ω)(Xn)NΩf(x)𝑑x|2dDN(X1,X2,,XN)V(f),|\frac{\sum_{n=1}^{N}(f\cdot\mathbf{1}_{\Omega})(X_{n})}{N}-\int_{\Omega}f(x)dx|\leq 2^{d}D_{N}^{*}(X_{1},X_{2},\ldots,X_{N})\cdot V(f),

where DN(X1,X2,,XN)D_{N}^{*}(X_{1},X_{2},\ldots,X_{N}) was obtained in Theorem 3.1.

where

(4.52) V(f)=u{1,2,,d}2d|u|[0,1]d||u|xuf(x)|𝑑x.V(f)=\sum_{u\subset\{1,2,\ldots,d\}}2^{d-|u|}\int_{[0,1]^{d}}|\frac{\partial^{|u|}}{\partial x_{u}}f(x)|dx.

The symbol \frac{\partial^{|u|}}{\partial x_{u}}f(x) is the partial derivative of f with respect to those components of x with index in u.

Proof.

Because \Omega\subset[0,1]^{d} is convex and each test set A in (4.49) is an axis-parallel box (hence convex), the intersection \Omega\cap A is convex and \mathscr{M}(\partial(\Omega\cap A))\leq 2d. In addition, for sets A_{1}\subset A_{2} with \lambda(A_{2})-\lambda(A_{1})\leq\frac{1}{N}, we also have \lambda(\Omega\cap A_{2})-\lambda(\Omega\cap A_{1})\leq\frac{1}{N}, so the sets \Omega\cap A_{1} and \Omega\cap A_{2} form a \frac{1}{N}-cover, from which the covering number is obtained. The desired result then follows from the argument in the proof of Theorem 3.1. ∎
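
For completeness, the following is a minimal sketch of how stratified samples X_{i}\sim U(E_{i}) of the above type could be generated in the special case d=2 and N=4^{m}, where each stratum E_{i} is taken to be the i-th dyadic cell visited by the discrete Hilbert curve of order 2^{m}; the general N=b^{m} construction of He and Owen [14], which maps a one-dimensional stratified sample through the curve, is not reproduced here.

```python
import numpy as np

def d2xy(n, t):
    # Standard discrete Hilbert-curve mapping: distance t in {0, ..., n*n - 1}
    # -> integer cell coordinates (x, y), where n is a power of two.
    x = y = 0
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                       # rotate/flip the quadrant if needed
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def hsfc_stratified_2d(m, seed=0):
    # N = 4**m points; stratum E_i = i-th Hilbert cell of side length 1 / 2**m.
    rng = np.random.default_rng(seed)
    n = 2 ** m
    N = n * n
    pts = np.empty((N, 2))
    for i in range(N):
        cx, cy = d2xy(n, i)
        pts[i] = (np.array([cx, cy]) + rng.random(2)) / n   # X_i ~ U(E_i)
    return pts

X = hsfc_stratified_2d(m=4)               # 256 stratified points in [0,1]^2
```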

An example: For ϵ>0\epsilon>0, let Σ\Sigma be the simplex

Σ={(x1,x2,,xd)[0,1]d:x1xdϵ,1x1xdϵ}.\Sigma=\{(x_{1},x_{2},\ldots,x_{d})\in[0,1]^{d}:x_{1}\geq\ldots\geq x_{d}\geq\epsilon,1-x_{1}-\ldots-x_{d}\geq\epsilon\}.

Define

f(x1,x2,,xd)=1x1x2xd(1x1x2xd).f(x_{1},x_{2},\ldots,x_{d})=\frac{1}{x_{1}x_{2}\ldots x_{d}}(1-x_{1}-x_{2}-\ldots-x_{d}).

Then one can show that

\sum_{|\alpha|\leq d}\int_{\Sigma}\Big|\Big(\frac{\partial}{\partial x}\Big)^{\alpha}f(x)\Big|dx\leq c\cdot\epsilon^{-d}.

By Theorem 4.8, for Hilbert space filling curve samples X_{i}, i=1,2,\ldots,N, and all convex sets \Omega contained in \Sigma, we have

(4.53) \Big|\frac{\sum_{n=1}^{N}(f\cdot\mathbf{1}_{\Omega})(X_{n})}{N}-\int_{\Omega}f(x)dx\Big|\leq c\cdot\epsilon^{-d}\cdot 2^{d}\cdot D_{N}^{*}(X_{1},X_{2},\ldots,X_{N}).
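
As a rough numerical illustration of (4.53) (a sketch only; it assumes the hypothetical hsfc_stratified_2d routine from the sketch after Theorem 4.8 is in scope, and takes d=2, \epsilon=0.1 and \Omega=\Sigma), one can compare the stratified sample mean on the left of (4.53) with a brute-force Monte Carlo reference value of the integral:

```python
import numpy as np

eps = 0.1

def f(z):
    x1, x2 = z[:, 0], z[:, 1]
    return (1.0 - x1 - x2) / (x1 * x2)

def in_sigma(z):
    # Sigma = {x_1 >= x_2 >= eps, 1 - x_1 - x_2 >= eps} for d = 2
    x1, x2 = z[:, 0], z[:, 1]
    return (x1 >= x2) & (x2 >= eps) & (1.0 - x1 - x2 >= eps)

X = hsfc_stratified_2d(m=5)                               # N = 1024 HSFC samples
mask = in_sigma(X)
lhs = f(X[mask]).sum() / len(X)                           # (1/N) sum (f * 1_Omega)(X_n)

ref = np.random.default_rng(3).random((2_000_000, 2))     # dense reference cloud
rmask = in_sigma(ref)
ref_val = f(ref[rmask]).sum() / len(ref)                  # crude estimate of int_Omega f

print(abs(lhs - ref_val))                                 # empirical left-hand side of (4.53)
```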

5. Conclusions

The convergence rate of the star discrepancy bound for HSFC-based sampling is O(N^{-\frac{1}{2}-\frac{1}{2d}}\cdot(\ln N)^{\frac{1}{2}}), which matches the rate obtained with jittered sampling and improves on the rate of the classical MC method. Moreover, the strict condition N=m^{d} on the sampling number in the jittered case is removed, which makes this stratified sampling method more applicable in higher dimensions. Although a faster convergence of the upper bound leads to a faster integration approximation rate, a strict comparison of the random star discrepancy under different stratified sampling models remains an open problem.

References

  • [1] E. Hlawka, Funktionen von beschränkter Variation in der Theorie der Gleichverteilung, Annali di Matematica Pura ed Applicata, 54(1961), 325-333.
  • [2] J. F. Koksma, Een algemeene stelling uit de theorie der gelijkmatige verdeeling modulo 1, Mathematica B (Zutphen), 11(1942/1943), 7-11.
  • [3] J. Dick and F. Pillichshammer, Digital nets and sequences: Discrepancy theory and quasi-Monte Carlo integration, Cambridge University Press, 2010.
  • [4] A. G. M. Ahmed, H. Perrier and D. Coeurjolly, et al., Low-discrepancy blue noise sampling, ACM Trans. Graph., 35(6)(2016), 1-13.
  • [5] C. Cervellera and M. Muselli, Deterministic design for neural network learning: An approach based on discrepancy, IEEE Trans. Neural Netw., 15(3)(2004), 533-544.
  • [6] Y. Lai, Monte Carlo and Quasi-Monte carlo methods and their applications, Ph.D. Dissertation, Department of Mathematics, Claremont Graduate University, California, USA, 1998.
  • [7] Y. Lai, Intermediate rank lattice rules and applications to finance, Appl. Numer. Math., 59(2009), 1-20.
  • [8] S. Heinrich, E. Novak, G. W. Wasilkowski and H. Woźniakowski, The inverse of the star-discrepancy depends linearly on the dimension, Acta Arith., 96(3)(2001), 279-302.
  • [9] C. Aistleitner, Covering numbers, dyadic chaining and discrepancy, J. Complexity, 27(6)(2011), 531-540.
  • [10] M. Gnewuch, N. Hebbinghaus, Discrepancy bounds for a class of negatively dependent random points including Latin hypercube samples, Ann. Appl. Probab., 31(4)(2021), 1944-1965.
  • [11] C. Aistleitner and M. Hofer, Probabilistic discrepancy bound for Monte Carlo point sets, Math. Comp., 83(2014), 1373-1381.
  • [12] F. Pausinger and S. Steinerberger, On the discrepancy of jittered sampling, J. Complexity, 33(2016), 199-216.
  • [13] B. Doerr, A sharp discrepancy bound for jittered sampling, Math. Comp., 91(2022), 1871-1892.
  • [14] Z. He, A. B. Owen, Extensible grids: uniform sampling on a space filling curve, J. R. Stat. Soc. Ser. B, 78(2016), 917-931.
  • [15] Z. He, L. Zhu, Asymptotic normality of extensible grid sampling, Stat. Comput., 29(2019), 53-65.
  • [16] L. Ambrosio, A. Colesanti, and E. Villa, Outer Minkowski content for some classes of closed sets, Math. Ann., 342(4)(2008), 727–748.
  • [17] B. Doerr, M. Gnewuch, A. Srivastav, Bounds and constructions for the star-discrepancy via δ\delta-covers, J. Complexity, 21(2005), 691-709.
  • [18] M. Gnewuch, Bracketing numbers for axis-parallel boxes and applications to geometric discrepancy, J. Complexity, 24(2)(2008), 154-172.
  • [19] F. Cucker and D. X. Zhou, Learning theory: an approximation theory viewpoint, Cambridge University Press., 2007.
  • [20] M. Kiderlen and F. Pausinger, On a partition with a lower expected L2L_{2}-discrepancy than classical jittered sampling, J. Complexity 70(2022), https://doi.org/10.1016/j.jco.2021.101616.
  • [21] M. Kiderlen and F. Pausinger, Discrepancy of stratified samples from partitions of the unit cube, Monatsh. Math., 195(2021), 267-306.
  • [22] F. Y. Kuo and I. H. Sloan, Lifting the curse of dimensionality, Notices of the American Mathematical Society, 52(11)(2005), 1320–1328.
  • [23] E. Novak and H. Woźniakowski, Tractability of Multivariate Problems, Volume II: Standard Information for Functionals, European Mathematical Society, 2010.
  • [24] R. F. Bass and K. Gröchenig, Random sampling of bandlimited functions, Israel J. Math., 177(2010), 1-28.
  • [25] R. F. Bass and K. Gröchenig, Relevant sampling of band-limited functions, Illinois J. Math., 57(2013), 43-58.
  • [26] K. Gröchenig and H. Schwab, Fast local reconstruction methods for nonuniform sampling in shift-invariant spaces, SIAM J. Matrix Anal. Appl., 24(2003), 899-913.
  • [27] Q. Y. Sun, Frames in spaces with finite rate of innovation, Adv. Comput. Math., 28(2008), 301-329.
  • [28] W. C. Sun, Local sampling theorems for spaces generated by splines with arbitrary knots, Math. Comput., 78(2009), 225-239.
  • [29] W. C. Sun and X. W. Zhou, Characterization of local sampling sequences for spline subspaces, Adv. Comput. Math., 30(2009), 153-175.
  • [30] J. Xian and S. Li, Sampling set conditions in weighted finitely generated shift-invariant spaces and their applications, Appl. Comput. Harmon. Anal., 23(2007), 171-180.
  • [31] J. B. Yang, Random sampling and reconstruction in multiply generated shift-invariant spaces, Anal. Appl., 17(2019), 323-347.
  • [32] H. Führ and J. Xian, Quantifying invariance properties of shift-invariant spaces, Appl. Comput. Harmon. Anal., 36(2014), 514-521.
  • [33] H. Führ and J. Xian, Relevant sampling in finitely generated shift-invariant spaces, J. Approx. Theory, 240(2019), 1-15.
  • [34] A. Aldroubi and K. Gröchenig, Nonuniform sampling and reconstruction in shift-invariant spaces, SIAM Rev., 43(2001), 585-620.
  • [35] G. Gigante, L. Brandolini, L. Colzani and G. Travaglini, On the Koksma–Hlawka inequality, J. Complexity, 29(2)(2013), 158–172.