Subgroup-based Rank-1 Lattice Quasi-Monte Carlo
Abstract
Quasi-Monte Carlo (QMC) is an essential tool for integral approximation, Bayesian inference, and sampling in scientific simulation. In the QMC area, the rank-1 lattice is important due to its simple construction and its nice properties for point set generation. However, constructing the generating vector of a rank-1 lattice is usually time-consuming because of an exhaustive computer search. To address this issue, we propose a simple closed-form rank-1 lattice construction method based on group theory. Our method reduces the number of distinct pairwise distance values to generate a more regular lattice. We theoretically prove a lower and an upper bound of the minimum pairwise distance of any non-degenerate rank-1 lattice. Empirically, our method generates near-optimal rank-1 lattices compared with the Korobov exhaustive search in terms of the $\ell_1$-norm and $\ell_2$-norm minimum toroidal distance. Moreover, experimental results show that our method achieves superior approximation performance on benchmark integration test problems and kernel approximation problems.
1 Introduction
Integration is critical in many machine learning applications, e.g., kernel approximation with random feature maps [29], variational inference in Bayesian learning [3], and generative modeling with variational autoencoders [15]. Directly calculating an integral is usually infeasible in these applications. Instead, researchers usually seek an approximation of the integral. A simple and conventional approximation is Monte Carlo (MC) sampling, in which the integral is approximated by averaging the integrand values at i.i.d. sampled points. MC methods [12] are widely studied, with many techniques to reduce the approximation error, including importance sampling and other variance reduction techniques [1].
To further reduce the approximation error, Quasi-Monte Carlo (QMC) methods utilize a low-discrepancy point set instead of the i.i.d. sampled point set used in standard Monte Carlo. There are two main research lines in the QMC area [8, 25]: digital nets/sequences and lattice rules. The Halton sequence and the Sobol sequence are widely used representatives of digital sequences [8]. Compared with digital nets/sequences, the point set of a lattice rule preserves the structure of a lattice: the points partition the space into small repeating cells. Among previous research on lattice rules, Korobov introduced integration lattice rules in [16] for the integral approximation of periodic integrands, and [33] proves that good lattice rules also exist for non-periodic integrands. Under a general lattice rule, a point set is usually constructed by enumerating integer vectors and multiplying them with an invertible generator matrix. A general lattice rule has to check each constructed point to see whether it lies inside the unit cube and discard it if it does not; the process is repeated until the desired number of points is reached. This construction process is inefficient, since the checking step is required for every point. Note that rescaling the unchecked points by the maximum norm of all the points may lead to a non-uniform point set in the cube.
An interesting special case of the lattice rules is the rank-1 lattice, which requires only one generating vector to construct the whole point set. Given the generating vector, a rank-1 lattice can be obtained by a very simple construction form, which makes constructing the point set much more efficient. Compared with the general lattice rule, the construction form of the rank-1 lattice already guarantees that the constructed points lie inside the unit cube, so no further checks are required. We refer to [8] and [25] for a more detailed survey of QMC and rank-1 lattices.
Although the rank-1 lattice admits a simple construction form, obtaining the generating vector remains difficult. Most methods in the literature [17, 26, 9, 21, 20, 18, 27] rely on an exhaustive computer search that optimizes some criterion to find a good generating vector. Korobov [17] suggests searching for the generating vector in the form $\mathbf{z} = [1, a, a^2, \dots, a^{d-1}] \bmod n$ with $a \in \{1, \dots, n-1\}$, where $d$ is the dimension and $n$ is the number of points, such that the greatest common divisor of $a$ and $n$ equals 1. Sloan et al. study the component-by-component construction of lattice rules [32]; it is a greedy search that is faster than an exhaustive search. Nuyens et al. [26] propose a fast algorithm to construct the generating vector using a component-by-component search method. Although the per-point checking of general lattice rules is avoided, the rank-1 lattice still requires a brute-force search for the generating vector, which is very time-consuming, especially when the dimension and the number of points are large.
To address this issue, we propose a closed-form rank-1 lattice rule that directly computes a generating vector without any search process. To generate a more evenly spaced lattice, we propose to reduce the number of distinct pairwise distances in the lattice point set, which makes the lattice more regular w.r.t. the minimum toroidal distance [11]: a larger minimum toroidal distance means a more regular lattice. Based on group theory, we derive that if the generating vector $\mathbf{z} = [z_1, \dots, z_d]$ satisfies the condition that the set $\{z_1, \dots, z_d, n - z_1, \dots, n - z_d\}$ is a subgroup of the multiplicative group of integers modulo $n$, where $n$ is the number of points, then the number of distinct pairwise distance values is efficiently reduced. We construct the generating vector by ensuring this condition. With the proposed subgroup-based rank-1 lattice, we can construct a more evenly spaced lattice. An illustration of the generated lattice is shown in Figure 1.
Our contributions are summarized as follows:
- We propose a simple and efficient closed-form method for rank-1 lattice construction, which does not require the time-consuming exhaustive computer search that previous rank-1 lattice algorithms rely on. A side product is a closed-form method to generate a QMC point set on the sphere $\mathbb{S}^{d-1}$ with bounded mutual coherence, which is presented in the Appendix.
- We generate a more regular lattice by reducing the number of distinct pairwise distances. We prove a lower and an upper bound for the minimum $\ell_1$-norm-based and $\ell_2$-norm-based toroidal distance of the rank-1 lattice. Theoretically, our constructed lattice is the optimal rank-1 lattice for maximizing the minimum toroidal distance when the number of points $n$ is a prime number and $n = 2d + 1$.
- Empirically, the proposed method generates a near-optimal rank-1 lattice compared with the Korobov search method in maximizing the minimum $\ell_1$-norm-based and $\ell_2$-norm-based toroidal distance.
- Our method obtains better approximation accuracy on benchmark integration test problems and kernel approximation problems.
2 Background
We first give the definition and the properties of lattices in Section 2.1. Then we introduce the minimum distance criterion for lattice construction in Section 2.2.
2.1 The Lattice
A $d$-dimensional lattice $\Lambda$ is a set of points in $\mathbb{R}^d$ that contains no limit points and satisfies [22]
$$\forall\, \mathbf{x}, \mathbf{y} \in \Lambda:\quad \mathbf{x} + \mathbf{y} \in \Lambda \ \text{ and } \ \mathbf{x} - \mathbf{y} \in \Lambda. \tag{1}$$
A widely known lattice is the unit lattice $\mathbb{Z}^d$, whose points have all-integer components. A general lattice is constructed by a generator matrix. Given a generator matrix $\mathbf{B} \in \mathbb{R}^{d \times d}$, a $d$-dimensional lattice $\Lambda$ can be constructed as
$$\Lambda = \left\{ \mathbf{x} = \mathbf{B}\mathbf{k} \ : \ \mathbf{k} \in \mathbb{Z}^d \right\}. \tag{2}$$
A generator matrix is not unique to a lattice; namely, the same lattice $\Lambda$ can be obtained from different generator matrices.
A lattice point set for integration is constructed as $X = \Lambda \cap [0,1)^d$. This step may require an additional search (or check) over all the points to keep those inside the unit cube.
A rank-1 lattice is a special case of the general lattice that has a simple construction form instead of directly using Eq.(2). A rank-1 lattice point set can be constructed as
$$\mathbf{x}_i = \left\{ \frac{i\,\mathbf{z}}{n} \right\}, \quad i = 0, 1, \dots, n-1, \tag{3}$$
where $\mathbf{z} \in \mathbb{Z}^d$ is the so-called generating vector, and the braces $\{\cdot\}$ denote the operation of taking the fractional part of the input elementwise. Compared with the general lattice rule, the construction form of the rank-1 lattice already ensures that the constructed points lie inside the unit cube, so no further checks are required.
Given a rank-1 lattice set in the unit cube, we can also construct a randomized point set. Sampling a random shift $\mathbf{\Delta} \sim \mathcal{U}[0,1)^d$, we construct a shifted point set as [8]
$$\tilde{\mathbf{x}}_i = \left\{ \mathbf{x}_i + \mathbf{\Delta} \right\}, \quad i = 0, 1, \dots, n-1. \tag{4}$$
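The following minimal NumPy sketch constructs the point set of Eq.(3) and applies the random shift of Eq.(4); the Fibonacci lattice in the example is a standard 2-D illustration, not part of our construction, and the function names are ours.

```python
import numpy as np

def rank1_lattice(z, n):
    """Rank-1 lattice points x_i = frac(i * z / n) of Eq. (3); returns an (n, d) array."""
    i = np.arange(n).reshape(-1, 1)
    return (i * np.asarray(z) % n) / n

def random_shift(X, rng=None):
    """Randomly shifted copy of a point set, Eq. (4): one uniform shift for all points."""
    rng = np.random.default_rng() if rng is None else rng
    return (X + rng.random(X.shape[1])) % 1.0

# Example: the classic 2-D Fibonacci lattice with n = 55 points and z = (1, 34).
X = rank1_lattice([1, 34], 55)
X_shifted = random_shift(X)
```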
2.2 The separating distance of a lattice
Several criteria have been studied in the literature for good lattice construction through computer search. Worst-case error is one of the most widely used criteria for functions in a reproducing kernel Hilbert space (RKHS) [8]. However, this criterion requires prior knowledge of the integrand class and an RKHS assumption. Without such assumptions, it is reasonable to construct a good lattice by designing an evenly spaced point set. Minimizing the covering radius is a good way to construct an evenly spaced point set.
As minimizing the covering radius of the lattice is equivalent to maximizing the packing radius [7], we can construct the point set by maximizing the packing radius (separating distance) of the lattice. Define the covering radius $h_X$ and the packing radius $\rho_X$ of a set of points $X$ as Eq.(5) and Eq.(6), respectively:
$$h_X = \sup_{\mathbf{x} \in [0,1)^d}\ \min_{\mathbf{x}_i \in X} \operatorname{dist}(\mathbf{x}, \mathbf{x}_i), \tag{5}$$
$$\rho_X = \frac{1}{2}\, \min_{\mathbf{x}_i \ne \mathbf{x}_j \in X} \operatorname{dist}(\mathbf{x}_i, \mathbf{x}_j). \tag{6}$$
The $\ell_p$-norm-based toroidal distance [11] between two lattice points $\mathbf{x}$ and $\mathbf{y}$ can be defined as
$$\operatorname{dist}_{T_{\ell_p}}(\mathbf{x}, \mathbf{y}) = \left( \sum_{k=1}^{d} \min\left( |x_k - y_k|,\ 1 - |x_k - y_k| \right)^{p} \right)^{\frac{1}{p}}. \tag{7}$$
Because the difference (subtraction) between two lattice points is still a lattice point, and a rank-1 lattice has period 1 in each coordinate, the packing radius of a rank-1 lattice can be calculated as
$$\rho_X = \frac{1}{2}\, \min_{i \in \{1, \dots, n-1\}} \operatorname{dist}_{T}(\mathbf{x}_i, \mathbf{0}), \tag{8}$$
where $\operatorname{dist}_{T}(\mathbf{x}_i, \mathbf{0})$ denotes the toroidal distance between $\mathbf{x}_i$ and the origin $\mathbf{0}$, and the index set $\{1, \dots, n-1\}$ excludes the point $\mathbf{x}_0 = \mathbf{0}$. This formulation calculates the packing radius with $O(n)$ distance evaluations rather than the $O(n^2)$ of a pairwise comparison. However, the computation of the packing radius is not easily accelerated by the fast Fourier transform due to the minimum operation in Eq.(8).
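A minimal NumPy sketch of this $O(n)$ evaluation (function and variable names are ours):

```python
import numpy as np

def min_toroidal_distance(z, n, p=2):
    """Minimum pairwise l_p toroidal distance of a rank-1 lattice via Eq. (8):
    only the n - 1 distances to the origin are needed, O(n d) in total."""
    i = np.arange(1, n).reshape(-1, 1)         # skip i = 0, the origin itself
    c = i * np.asarray(z) % n                  # integer coordinates i * z mod n
    t = np.minimum(c, n - c) / n               # per-coordinate toroidal distance to 0
    return np.linalg.norm(t, ord=p, axis=1).min()

# The packing radius (separating distance) is half of this value.
```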
3 Subgroup-based Rank-1 Lattice
In this section, we derive our construction of a rank-1 lattice based on the subgroup theory. Then we analyze the properties of our method. We provide detailed proofs in the supplement.
3.1 Construction of the Generating Vector
From the definition of the rank-1 lattice, we know the packing radius of a rank-1 lattice with $n$ points can be reformulated as
$$\rho_X = \frac{1}{2}\, \min_{i \in \{1, \dots, n-1\}} \|\mathbf{x}_i\|_{T}, \tag{9}$$
where
$$\|\mathbf{x}_i\|_{T} := \operatorname{dist}_{T}(\mathbf{x}_i, \mathbf{0}). \tag{10}$$
Then, for the $\ell_p$-norm-based toroidal distance, we have
$$\|\mathbf{x}_i\|_{T_{\ell_p}} = \frac{\left\| \min\left( i\mathbf{z} \bmod n,\ n - (i\mathbf{z} \bmod n) \right) \right\|_p}{n}, \tag{11}$$
where $\min(\cdot, \cdot)$ denotes the elementwise min operation between its two inputs.
Suppose $n$ is a prime number. From number theory, we know that for a primitive root $g$, the residues of $\{g^0, g^1, \dots, g^{n-2}\}$ modulo $n$ form a cyclic group under multiplication, and this group equals $\mathbb{Z}_n^* = \{1, 2, \dots, n-1\}$. Since $g^{n-1} \equiv 1 \pmod{n}$, we know that $g^{\frac{n-1}{2}} \equiv -1 \pmod{n}$.
Because of the one-to-one correspondence between the residues $\{g^0, g^1, \dots, g^{n-2}\}$ modulo $n$ and the set $\{1, 2, \dots, n-1\}$, we can, without loss of generality, construct the generating vector as
$$\mathbf{z} = \left[ g^{k_1}, g^{k_2}, \dots, g^{k_d} \right] \bmod n, \tag{12}$$
where $k_1, \dots, k_d \in \{0, 1, \dots, n-2\}$ are integer exponents to be designed. Denote $\mathcal{D}(i) := \left\| \min\left( i\mathbf{z} \bmod n,\ n - (i\mathbf{z} \bmod n) \right) \right\|$; maximizing the separating distance is then equivalent to maximizing
$$\min_{i \in \{1, \dots, n-1\}} \mathcal{D}(i). \tag{13}$$
Suppose $2d$ divides $n-1$, i.e., $n = 2md + 1$. By setting $k_j = (j-1)m$ for $j = 1, \dots, d$, the set $\mathcal{H} = \{z_1, \dots, z_d, n - z_1, \dots, n - z_d\}$ is equivalent to $\{g^{0}, g^{m}, \dots, g^{(2d-1)m}\} \bmod n$, and it forms a subgroup of order $2d$ of the multiplicative group $\mathbb{Z}_n^*$. From Lagrange's theorem in group theory [10], we know that the cosets of the subgroup $\mathcal{H}$ partition the entire group $\mathbb{Z}_n^*$ into equal-size, non-overlapping sets, and the number of cosets of $\mathcal{H}$ is $m = \frac{n-1}{2d}$. Thus, we know that the distance $\mathcal{D}(i)$ for $i \in \{1, \dots, n-1\}$ takes only $m$ different values, and each value is attained by the same number of indices.
Thus, we can construct the generating vector as
$$\mathbf{z} = \left[ g^{0}, g^{m}, g^{2m}, \dots, g^{(d-1)m} \right] \bmod n, \quad m = \frac{n-1}{2d}. \tag{14}$$
In this way, the constructed rank-1 lattice is more regular, as it has only a few distinct pairwise distance values, and each distance value is attained by the same number of pairs. Usually, such a regular lattice is more evenly spaced and has a large minimum pairwise distance. We confirm this empirically through extensive experiments in Section 5.
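A minimal sketch of the closed-form construction of Eq.(14), using SymPy to obtain a primitive root (function names are ours):

```python
import numpy as np
from sympy import isprime, primitive_root

def subgroup_generating_vector(d, n):
    """Closed-form generating vector of Eq. (14): z = [g^0, g^m, ..., g^{(d-1)m}] mod n,
    where n is prime, n - 1 = 2*m*d, and g is a primitive root of n."""
    assert isprime(n) and (n - 1) % (2 * d) == 0, "need n prime and 2d | (n - 1)"
    m = (n - 1) // (2 * d)
    g = primitive_root(n)
    gm = pow(g, m, n)                      # g^m mod n, the subgroup generator
    z = np.empty(d, dtype=np.int64)
    z[0] = 1
    for k in range(1, d):
        z[k] = int(z[k - 1]) * gm % n      # g^{k m} mod n, computed iteratively
    return z

# Example: d = 3, n = 31 gives m = 5, g = 3, and z = [1, 26, 25];
# {±1, ±26, ±25} mod 31 = {1, 5, 6, 25, 26, 30} is a subgroup of order 6.
```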
We summarize our construction method and the properties of the constructed rank-1 lattice in Theorem 1.
Theorem 1.
Suppose $n$ is a prime number and $n = 2md + 1$. Let $g$ be a primitive root of $n$. Let $\mathbf{z} = [g^{0}, g^{m}, \dots, g^{(d-1)m}] \bmod n$. Construct a rank-1 lattice $X = \{\mathbf{x}_0, \dots, \mathbf{x}_{n-1}\}$ with $\mathbf{x}_i = \frac{i\mathbf{z} \bmod n}{n}$. Then, there are $m = \frac{n-1}{2d}$ distinct pairwise toroidal distance values among $X$, and each distance value is taken by the same number of pairs in $X$.
As shown in Theorem 1, our method can construct regular rank-1 lattice through a very simple closed-form construction, which does not require any exhaustive computer search.
3.2 Regular Property of Rank-1 Lattice
We show the regularity properties of rank-1 lattices in terms of the $\ell_1$-norm-based and $\ell_2$-norm-based toroidal distances.
Theorem 2.
Suppose $n$ is a prime number and $n = 2md + 1$. Let $\mathbf{z} = [z_1, \dots, z_d]$ with $z_i \ne z_j$ for $i \ne j$. Construct a rank-1 lattice $X$ with $\mathbf{x}_i = \frac{i\mathbf{z} \bmod n}{n}$ and $i \in \{0, \dots, n-1\}$. Then, the minimum pairwise toroidal distance can be bounded as
$$\frac{d(d+2)}{4n} \ \le\ \min_{i \ne j} \operatorname{dist}_{T_{\ell_1}}(\mathbf{x}_i, \mathbf{x}_j) \ \le\ \frac{d(n+1)}{4n}, \tag{15}$$
$$\frac{1}{n}\sqrt{\frac{d(d+1)(d+2)}{12}} \ \le\ \min_{i \ne j} \operatorname{dist}_{T_{\ell_2}}(\mathbf{x}_i, \mathbf{x}_j) \ \le\ \sqrt{\frac{d(n+1)}{12n}}, \tag{16}$$
where $\operatorname{dist}_{T_{\ell_1}}$ and $\operatorname{dist}_{T_{\ell_2}}$ denote the $\ell_1$-norm-based and the $\ell_2$-norm-based toroidal distance, respectively.
Theorem 2 gives upper and lower bounds on the minimum pairwise distance of any non-degenerate rank-1 lattice. The term 'non-degenerate' means that the elements of the generating vector are pairwise distinct, i.e., $z_i \ne z_j$ for all $i \ne j$.
We now show that our subgroup-based rank-1 lattice achieves the optimal minimum pairwise distance when $n = 2d + 1$ is a prime number.
Corollary 1.
Suppose $n = 2d + 1$ is a prime number. Let $g$ be a primitive root of $n$. Let $\mathbf{z} = [g^{0}, g^{1}, \dots, g^{d-1}] \bmod n$. Construct a rank-1 lattice with $\mathbf{x}_i = \frac{i\mathbf{z} \bmod n}{n}$. Then, the pairwise toroidal distance of the lattice attains the upper bound:
$$\operatorname{dist}_{T_{\ell_1}}(\mathbf{x}_i, \mathbf{x}_j) = \frac{d(n+1)}{4n}, \quad \forall\, i \ne j, \tag{17}$$
$$\operatorname{dist}_{T_{\ell_2}}(\mathbf{x}_i, \mathbf{x}_j) = \sqrt{\frac{d(n+1)}{12n}}, \quad \forall\, i \ne j. \tag{18}$$
Corollary 1 shows a case in which our subgroup rank-1 lattice attains the maximum possible minimum pairwise toroidal distance. This is useful for expensive black-box functions, where the number of function queries is small. Empirically, we find that our subgroup rank-1 lattice achieves a near-optimal pairwise toroidal distance in many other cases as well.
4 QMC for Kernel Approximation
Another application of our subgroup rank-1 lattice is kernel approximation, which has been widely studied. Random feature maps are a promising approach to kernel approximation. Rahimi et al. study shift-invariant kernels by Monte Carlo sampling [29]. Yang et al. suggest employing QMC for kernel approximation [35, 2]. Several previous methods construct structured feature maps for kernel approximation [19, 6, 23]. In contrast to kernel approximation methods designed for specific kernels, QMC can serve as a plug-in for any integral representation of a kernel to improve its approximation. We include this section to be self-contained.
From Bochner’s Theorem, shift invariant kernels can be expressed as an integral [29]
$$k(\mathbf{x}, \mathbf{y}) = k(\mathbf{x} - \mathbf{y}) = \int_{\mathbb{R}^d} e^{\mathrm{i}\,\boldsymbol{\omega}^\top(\mathbf{x}-\mathbf{y})}\, p(\boldsymbol{\omega})\, \mathrm{d}\boldsymbol{\omega}, \tag{19}$$
where $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$ and $p(\boldsymbol{\omega})$ is a probability density. The symmetry $p(\boldsymbol{\omega}) = p(-\boldsymbol{\omega})$ ensures that the imaginary parts of the integral vanish. Eq.(19) can be rewritten as
$$k(\mathbf{x}, \mathbf{y}) = \int_{\mathbb{R}^d} \cos\left( \boldsymbol{\omega}^\top(\mathbf{x}-\mathbf{y}) \right) p(\boldsymbol{\omega})\, \mathrm{d}\boldsymbol{\omega}. \tag{20}$$
We can approximate the integral Eq.(19) by using our subgroup rank-1 lattice according to the QMC approximation in [35, 34]
$$k(\mathbf{x}, \mathbf{y}) \approx \frac{1}{n} \sum_{i=1}^{n} \cos\left( \boldsymbol{\omega}_i^\top(\mathbf{x}-\mathbf{y}) \right) = \Phi(\mathbf{x})^\top \Phi(\mathbf{y}), \qquad \boldsymbol{\omega}_i = F^{-1}(\mathbf{x}_i), \tag{21}$$
where $F^{-1}$ denotes the inverse cumulative distribution function of $p$ applied elementwise to the lattice points $\mathbf{x}_i$, and $\Phi(\mathbf{x}) = \frac{1}{\sqrt{n}}\left[ \cos(\boldsymbol{\omega}_1^\top\mathbf{x}), \dots, \cos(\boldsymbol{\omega}_n^\top\mathbf{x}), \sin(\boldsymbol{\omega}_1^\top\mathbf{x}), \dots, \sin(\boldsymbol{\omega}_n^\top\mathbf{x}) \right]^\top$ is the feature map of the input $\mathbf{x}$.
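A minimal sketch of this QMC feature map for the Gaussian kernel $k(\mathbf{x}, \mathbf{y}) = \exp(-\|\mathbf{x}-\mathbf{y}\|^2 / (2\sigma^2))$, whose spectral density is Gaussian so the inverse CDF is available in closed form; the cos/sin stacking follows the standard real-valued random Fourier features of [29, 35], and the function name is ours:

```python
import numpy as np
from scipy.stats import norm

def gaussian_qmc_features(X, lattice, sigma=1.0, rng=None):
    """QMC feature map for the Gaussian kernel: frequencies are obtained from the
    lattice points via the inverse Gaussian CDF, as in Eq. (21)."""
    rng = np.random.default_rng() if rng is None else rng
    shifted = (lattice + rng.random(lattice.shape[1])) % 1.0  # random shift, Eq. (4);
    W = norm.ppf(shifted) / sigma     # keeps points off 0/1 almost surely
    n = lattice.shape[0]
    proj = X @ W.T                    # (num_data, n) projections omega_i^T x
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(n)

# Phi = gaussian_qmc_features(X, lattice); Phi @ Phi.T approximates the Gram matrix.
```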
5 Experiments
In this section, we first evaluate the minimum distance of the lattices generated by our subgroup rank-1 lattice in Section 5.1. We then evaluate the subgroup rank-1 lattice on integral approximation tasks and a kernel approximation task in Sections 5.2 and 5.3, respectively.
Table 1: Minimum $\ell_1$-norm-based toroidal distance (larger is better).

| d=50 | n=101 | 401 | 601 | 701 | 1201 | 1301 | 1601 | 1801 | 1901 | 2801 |
|---|---|---|---|---|---|---|---|---|---|---|
| SubGroup | 12.624 | 11.419 | 11.371 | 11.354 | 11.029 | 10.988 | 10.541 | 10.501 | 10.454 | 10.748 |
| Hua [13] | 10.426 | 10.421 | 9.8120 | 10.267 | 10.074 | 9.3982 | 9.5890 | 9.5175 | 8.9868 | 9.2260 |
| Korobov [17] | 12.624 | 11.419 | 11.371 | 11.354 | 11.029 | 10.988 | 10.665 | 10.561 | 10.701 | 10.748 |

| d=100 | n=401 | 601 | 1201 | 1601 | 1801 | 2801 | 3001 | 4001 | 4201 | 4801 |
|---|---|---|---|---|---|---|---|---|---|---|
| SubGroup | 24.097 | 23.760 | 22.887 | 23.342 | 22.711 | 23.324 | 22.233 | 22.437 | 22.573 | 21.190 |
| Hua [13] | 21.050 | 21.251 | 21.205 | 20.675 | 19.857 | 20.683 | 20.700 | 19.920 | 19.967 | 20.574 |
| Korobov [17] | 24.097 | 23.760 | 23.167 | 23.342 | 22.893 | 23.324 | 22.464 | 22.437 | 22.573 | 22.188 |

| d=200 | n=401 | 1201 | 1601 | 2801 | 4001 | 4801 | 9601 | 12401 | 14401 | 15601 |
|---|---|---|---|---|---|---|---|---|---|---|
| SubGroup | 50.125 | 48.712 | 47.500 | 47.075 | 47.810 | 45.957 | 45.819 | 46.223 | 43.982 | 45.936 |
| Hua [13] | 43.062 | 43.057 | 43.052 | 43.055 | 43.053 | 43.055 | 43.053 | 42.589 | 42.558 | 42.312 |
| Korobov [17] | 50.125 | 48.712 | 47.660 | 47.246 | 47.810 | 46.686 | 46.154 | 46.223 | 45.949 | 45.936 |

| d=500 | n=3001 | 4001 | 7001 | 9001 | 13001 | 16001 | 19001 | 21001 | 24001 | 28001 |
|---|---|---|---|---|---|---|---|---|---|---|
| SubGroup | 121.90 | 121.99 | 119.60 | 118.63 | 120.23 | 119.97 | 116.41 | 120.56 | 120.24 | 113.96 |
| Hua [13] | 108.33 | 108.33 | 108.33 | 108.33 | 108.33 | 108.33 | 108.33 | 108.33 | 108.33 | 108.33 |
| Korobov [17] | 121.90 | 121.99 | 120.46 | 120.16 | 120.23 | 119.97 | 119.41 | 120.56 | 120.24 | 118.86 |
Table 2: Minimum $\ell_2$-norm-based toroidal distance (larger is better).

| d=50 | n=101 | 401 | 601 | 701 | 1201 | 1301 | 1601 | 1801 | 1901 | 2801 |
|---|---|---|---|---|---|---|---|---|---|---|
| SubGroup | 2.0513 | 1.9075 | 1.9469 | 1.9196 | 1.8754 | 1.8019 | 1.8008 | 1.8709 | 1.7844 | 1.7603 |
| Hua [13] | 1.7862 | 1.7512 | 1.7293 | 1.7049 | 1.7326 | 1.6295 | 1.6659 | 1.6040 | 1.5629 | 1.5990 |
| Korobov [17] | 2.0513 | 1.9075 | 1.9469 | 1.9196 | 1.8754 | 1.8390 | 1.8356 | 1.8709 | 1.8171 | 1.8327 |

| d=100 | n=401 | 601 | 1201 | 1601 | 1801 | 2801 | 3001 | 4001 | 4201 | 4801 |
|---|---|---|---|---|---|---|---|---|---|---|
| SubGroup | 2.8342 | 2.8143 | 2.7077 | 2.7645 | 2.7514 | 2.6497 | 2.6337 | 2.6410 | 2.6195 | 2.5678 |
| Hua [13] | 2.5385 | 2.5739 | 2.4965 | 2.4783 | 2.4132 | 2.5019 | 2.4720 | 2.4138 | 2.4537 | 2.4937 |
| Korobov [17] | 2.8342 | 2.8143 | 2.7409 | 2.7645 | 2.7514 | 2.6956 | 2.6709 | 2.6562 | 2.6667 | 2.6858 |

| d=200 | n=401 | 1201 | 1601 | 2801 | 4001 | 4801 | 9601 | 12401 | 14401 | 15601 |
|---|---|---|---|---|---|---|---|---|---|---|
| SubGroup | 4.0876 | 3.9717 | 3.9791 | 3.8425 | 3.9276 | 3.8035 | 3.7822 | 3.8687 | 3.6952 | 3.8370 |
| Hua [13] | 3.7332 | 3.7025 | 3.6902 | 3.6944 | 3.7148 | 3.6936 | 3.6571 | 3.5625 | 3.6259 | 3.5996 |
| Korobov [17] | 4.0876 | 3.9717 | 3.9791 | 3.9281 | 3.9276 | 3.9074 | 3.8561 | 3.8687 | 3.8388 | 3.8405 |

| d=500 | n=3001 | 4001 | 7001 | 9001 | 13001 | 16001 | 19001 | 21001 | 24001 | 28001 |
|---|---|---|---|---|---|---|---|---|---|---|
| SubGroup | 6.3359 | 6.3769 | 6.3141 | 6.2131 | 6.2848 | 6.2535 | 6.0656 | 6.2386 | 6.2673 | 6.1632 |
| Hua [13] | 5.9216 | 5.9216 | 5.9215 | 5.9215 | 5.9216 | 5.9216 | 5.9215 | 5.9215 | 5.8853 | 5.9038 |
| Korobov [17] | 6.3359 | 6.3769 | 6.3146 | 6.2960 | 6.2848 | 6.2549 | 6.2611 | 6.2386 | 6.2673 | 6.2422 |
5.1 Evaluation of the minimum distance
We evaluate the minimum distance of our subgroup rank-1 lattice by comparing it with Hua's method [13] and the Korobov search method [17]. We denote by 'SubGroup' our subgroup rank-1 lattice, by 'Hua' the rank-1 lattice constructed by Hua's method [13], and by 'Korobov' the rank-1 lattice constructed by exhaustive computer search in Korobov form [17].
We set the dimension $d$ in $\{50, 100, 200, 500\}$. For each dimension $d$, we set the number of points $n$ to the first ten prime numbers such that $2d$ divides $n-1$, i.e., $n = 2md + 1$ for some integer $m$. The minimum $\ell_1$-norm-based toroidal distance and the minimum $\ell_2$-norm-based toroidal distance for each dimension are reported in Table 1 and Table 2, respectively. The larger the distance, the better.
We can observe that our subgroup rank-1 lattice achieves consistently better (larger) minimum distances than Hua's method in all cases. Moreover, the subgroup rank-1 lattice obtains, in 20 out of 40 cases, the same $\ell_2$-norm-based toroidal distance, and in 24 out of 40 cases the same $\ell_1$-norm-based toroidal distance, as the exhaustive computer search in Korobov form. That is, our subgroup rank-1 lattice matches the best toroidal distance found by the exhaustive Korobov search in over half of all cases. Furthermore, the experimental results show that our subgroup rank-1 lattice obtains a competitive distance compared with the exhaustive Korobov search in the remaining cases. Note that our subgroup rank-1 lattice is a closed-form construction that does not require any computer search, which makes our method simple and appealing to use.
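For reference, the admissible values of $n$ in the protocol above can be enumerated as follows (a small utility sketch with names of our choosing; the example output matches the first row of Table 1):

```python
from sympy import isprime

def admissible_primes(d, count=10):
    """First `count` primes n with 2d | n - 1, i.e., n = 2*m*d + 1 for m = 1, 2, ..."""
    out, n = [], 2 * d + 1
    while len(out) < count:
        if isprime(n):
            out.append(n)
        n += 2 * d
    return out

# admissible_primes(50) -> [101, 401, 601, 701, 1201, 1301, 1601, 1801, 1901, 2801]
```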
Time comparison of Korobov search and our subgroup rank-1 lattice. The table below shows the time cost (in seconds) for lattice construction. The run time of the Korobov search quickly grows to hours, while our method runs in less than one second, a speed-up of roughly three to five orders of magnitude. The speed-up increases as $d$ and $n$ become larger.
| d=500 | n=3001 | 4001 | 7001 | 9001 | 13001 | 16001 | 19001 | 21001 | 24001 | 28001 |
|---|---|---|---|---|---|---|---|---|---|---|
| SubGroup | 0.0185 | 0.0140 | 0.0289 | 0.043 | 0.0386 | 0.0320 | 0.0431 | 0.0548 | 0.0562 | 0.0593 |
| Korobov | 34.668 | 98.876 | 152.86 | 310.13 | 624.56 | 933.54 | 1308.9 | 1588.5 | 2058.5 | 2815.9 |

| d=1000 | n=4001 | 16001 | 24001 | 28001 | 54001 | 70001 | 76001 | 88001 | 90001 | 96001 |
|---|---|---|---|---|---|---|---|---|---|---|
| SubGroup | 0.0388 | 0.0618 | 0.1041 | 0.1289 | 0.2158 | 0.2923 | 0.3521 | 0.4099 | 0.5352 | 0.5663 |
| Korobov | 112.18 | 1849.4 | 4115.9 | 5754.6 | 20257 | 34842 | 43457 | 56798 | 56644 | 69323 |
5.2 Integral approximation
We evaluate our subgroup rank-1 lattice on the integration test problem given by Eqs.(22) and (23).
We compare with i.i.d. Monte Carlo, Hua's rank-1 lattice [13], the Korobov search rank-1 lattice [16], the Halton sequence, and the Sobol sequence [8]. For both the Halton and Sobol sequences, we use the scrambling technique suggested in [8]. For all the QMC methods, we use the random shift technique as in Eq.(4).
We fix the parameters of the test problem in all the experiments. We set the dimension to $d = 50$, $d = 500$, and $d = 1000$, respectively. We set the number of points $n$ to the first ten prime numbers such that $2d$ divides $n-1$.
The mean approximation error (with error bars) over 50 independent runs for each dimension is presented in Figure 2. We can see that Hua's method obtains a smaller error than i.i.d. Monte Carlo on the 50-d problem; however, it becomes worse than MC on the 500-d and 1000-d problems. Moreover, our subgroup rank-1 lattice obtains a consistently smaller error than Hua and MC on all the tested problems. In addition, our subgroup rank-1 lattice achieves slightly better performance than the Halton, Sobol, and Korobov search methods.
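The randomized-QMC protocol used here (mean error over independently shifted replicates) can be sketched as follows; the integrand `f` is a placeholder for the test problem of Eqs.(22)-(23), and the function name is ours:

```python
import numpy as np

def rqmc_estimate(f, lattice, num_shifts=50, rng=None):
    """Randomized QMC: average the integrand over independently shifted copies of
    the lattice; the spread across shifts yields an error bar for the estimate."""
    rng = np.random.default_rng() if rng is None else rng
    d = lattice.shape[1]
    vals = np.array([f((lattice + rng.random(d)) % 1.0).mean()
                     for _ in range(num_shifts)])
    return vals.mean(), vals.std(ddof=1) / np.sqrt(num_shifts)
```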
5.3 Kernel approximation
We evaluate the performance of subgroup rank-1 lattice on kernel approximation tasks by comparing with other QMC baseline methods. We test the kernel approximation of the Gaussian kernel, the zeroth-order arc-cosine kernel, and the first-order arc-cosine kernel as in [6].
We compare subgroup rank-1 lattice with a Hua’s rank-1 lattice [13], Halton sequence, Sobol sequence [8] and standard i.i.d. Monte Carlo sampling. For both the Halton sequence and Sobol sequence, we use the scrambling technique suggested in [8]. For both subgroup rank-1 lattice and Hua’s rank-1 lattice, we use the random shift as in Eq.(4). We evaluate the methods on the DNA [28] and the SIFT1M [14] dataset over 50 independent runs. Each run contains 2000 random samples to construct the Gram matrix. The bandwidth parameter of Gaussian kernel is set to 15 in all the experiments.
The mean Frobenius-norm approximation error and the maximum-norm approximation error (with error bars) on the DNA [28] dataset are plotted in Figure 3. The results on SIFT1M [14] are given in Figure 7 in the supplement. The experimental results show that the subgroup rank-1 lattice consistently obtains a smaller approximation error compared with the other baselines.
5.4 Approximation on Graphical Model
For general Boltzmann machines with continuous state $\mathbf{x} \in [0,1]^{d}$, the energy function is defined as a quadratic function $E(\mathbf{x}) = -\mathbf{x}^\top \mathbf{W}\mathbf{x} - \mathbf{b}^\top\mathbf{x}$. The normalization constant is $Z = \int_{[0,1]^d} \exp(-E(\mathbf{x}))\, \mathrm{d}\mathbf{x}$. For inference, the marginal likelihood of an observation $\mathbf{v}$ is $p(\mathbf{v}) = \frac{1}{Z}\int f(\mathbf{h})\, \mathrm{d}\mathbf{h}$ with the function $f(\mathbf{h}) = \exp(-E(\mathbf{v}, \mathbf{h}))$, where $\mathbf{h}$ denotes the hidden states.
We evaluate our method on the approximation of the normalization constant and on inference, comparing with i.i.d. Monte Carlo (MC), slice sampling (SS), and Hamiltonian Monte Carlo (HMC). We generate the elements of the coupling and bias parameters by sampling from the standard Gaussian $\mathcal{N}(0,1)$. These parameters are fixed and kept the same for all the methods in comparison. For inference, we generate an observation by uniform sampling and keep it fixed and the same for all the methods. For SS and HMC, we use the slicesample and hmcSampler functions in MATLAB, respectively. We use an i.i.d. MC approximation with a very large number of samples as the pseudo ground truth. The approximation errors for the normalization constant and for inference are shown in Fig.4(a)-4(d) and Fig.4(e)-4(h), respectively. Our method consistently outperforms MC, HMC, and SS in all cases. Moreover, our method is much cheaper than SS and HMC.
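As a sketch of how the normalization constant is approximated with a randomly shifted lattice: the quadratic energy below is a hypothetical stand-in for the model above (the exact energy did not survive extraction), and `W`, `b`, and the function name are ours.

```python
import numpy as np
from scipy.special import logsumexp

def log_partition_qmc(W, b, lattice, rng=None):
    """QMC estimate of log Z, with Z = int_{[0,1]^d} exp(-E(x)) dx, for a
    continuous-state Boltzmann machine with assumed energy E(x) = -x^T W x - b^T x."""
    rng = np.random.default_rng() if rng is None else rng
    X = (lattice + rng.random(lattice.shape[1])) % 1.0      # randomly shifted lattice
    neg_energy = np.einsum('ij,jk,ik->i', X, W, X) + X @ b  # -E(x) at each point
    return logsumexp(neg_energy) - np.log(len(X))           # log of the QMC average
```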
Comparison to sequential Monte Carlo. When the positive-density region occupies a large fraction of the entire domain, our method is very competitive. When it lies only inside a small part of a large domain, our method may not be better than sequential adaptive sampling. In this case, it is interesting to take advantage of both the lattice and adaptive sampling: for example, one can employ our subgroup rank-1 lattice as a rough partition of the domain to find high-mass regions, then run sequential adaptive sampling on the promising regions with the lattice points as starting points. It is also interesting to consider recursively applying our subgroup rank-1 lattice to refine the partition. Moreover, our subgroup-based rank-1 lattice enables black-box evaluation without the need for gradient information. In contrast, several sequential sampling methods, e.g., HMC, need the gradient of the density function for sampling.
6 Conclusion
We propose a closed-form method for rank-1 lattice construction, which is simple and efficient and requires no exhaustive computer search. Theoretically, we prove that our subgroup rank-1 lattice has few distinct pairwise distance values, which makes it more regular and evenly spaced. Moreover, we prove a lower and an upper bound for the minimum toroidal distance of a non-degenerate rank-1 lattice. Empirically, our subgroup rank-1 lattice obtains a near-optimal minimum toroidal distance compared with the Korobov exhaustive search, and it achieves a smaller integration approximation error. In addition, we propose a closed-form method to generate a QMC point set on the sphere $\mathbb{S}^{d-1}$ and prove upper bounds on the mutual coherence of the generated points. Further, we show an example of CycleGAN training in the supplement. Our subgroup rank-1 lattice sampling and QMC on the sphere can serve as alternatives for training generative models.
Acknowledgement and Funding Disclosure
We thank the reviewers for their valuable comments and suggestions. Yueming Lyu was supported by UTS President Scholarship. Prof. Ivor W. Tsang was supported by ARC DP180100106 and DP200101328.
Broader Impact
In this paper, we proposed a closed-form rank-one lattice construction based on group theory for Quasi-Monte Carlo. Our method does not require the time-consuming exhaustive computer search. Our method is a fundamental tool for integral approximation and sampling.
Our method may serve as a potential advance in QMC, which may have an impact on a wide range of applications that rely on integral approximation. It includes kernel approximation with feature map, variational inference in Bayesian learning, generative modeling, and variational autoencoders. This may bring useful applications and be beneficial to society and the community. Since our method focuses more on the theoretical side, the direct negative influences and ethical issues are negligible.
References
- [1] Bouhari Arouna. Adaptative Monte Carlo method, a variance reduction technique. Monte Carlo Methods and Applications, 10(1):1–24, 2004.
- [2] Haim Avron, Vikas Sindhwani, Jiyan Yang, and Michael W. Mahoney. Quasi-Monte Carlo feature maps for shift-invariant kernels. The Journal of Machine Learning Research, 17(1):4096–4133, 2016.
- [3] Matthew James Beal. Variational algorithms for approximate Bayesian inference. PhD thesis, University of London, 2003.
- [4] Jean Bourgain, Alexey A. Glibichuk, and Sergei Vladimirovich Konyagin. Estimates for the number of sums and products and for exponential sums in fields of prime order. Journal of the London Mathematical Society, 73(2):380–398, 2006.
- [5] Alexander Buchholz, Florian Wenzel, and Stephan Mandt. Quasi-Monte Carlo variational inference. arXiv preprint arXiv:1807.01604, 2018.
- [6] Krzysztof Choromanski and Vikas Sindhwani. Recycling randomness with structure for sublinear time kernel expansions. In International Conference on Machine Learning, 2016.
- [7] Sabrina Dammertz and Alexander Keller. Image synthesis by rank-1 lattices. In Monte Carlo and Quasi-Monte Carlo Methods 2006, 2008.
- [8] Josef Dick, Frances Y. Kuo, and Ian H. Sloan. High-dimensional integration: the quasi-Monte Carlo way. Acta Numerica, 22:133–288, 2013.
- [9] Carola Doerr and François-Michel De Rainville. Constructing low star discrepancy point sets with genetic algorithms. In Proceedings of the 15th annual conference on Genetic and evolutionary computation, pages 789–796, 2013.
- [10] David Steven Dummit and Richard M Foote. Abstract algebra, volume 3. Wiley Hoboken, 2004.
- [11] Leonhard Grünschloß, Johannes Hanika, Ronnie Schwede, and Alexander Keller. (t, m, s)-nets and maximized minimum distance. In Monte Carlo and Quasi-Monte Carlo Methods 2006, pages 397–412. Springer, 2008.
- [12] John Hammersley. Monte Carlo methods. Springer Science & Business Media, 2013.
- [13] L-K Hua and Yuan Wang. Applications of number theory to numerical analysis. Springer Science & Business Media, 2012.
- [14] Herve Jegou, Matthijs Douze, and Cordelia Schmid. Product quantization for nearest neighbor search. IEEE transactions on pattern analysis and machine intelligence, 33(1):117–128, 2010.
- [15] Diederik P Kingma and Max Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
- [16] Nikolai Mikhailovich Korobov. The approximate computation of multiple integrals. In Dokl. Akad. Nauk SSSR, volume 124, pages 1207–1210, 1959.
- [17] Nikolai Mikhailovich Korobov. Properties and calculation of optimal coefficients. In Doklady Akademii Nauk, volume 132, pages 1009–1012. Russian Academy of Sciences, 1960.
- [18] Helene Laimer. On combined component-by-component constructions of lattice point sets. Journal of Complexity, 38:22–30, 2017.
- [19] Quoc Le, Tamás Sarlós, and Alex Smola. Fastfood: approximating kernel expansions in loglinear time. In Proceedings of the International Conference on Machine Learning, 2013.
- [20] Pierre L’ecuyer and David Munger. Algorithm 958: Lattice builder: A general software tool for constructing rank-1 lattice rules. ACM Transactions on Mathematical Software (TOMS), 42(2):1–30, 2016.
- [21] Gunther Leobacher and Friedrich Pillichshammer. Introduction to quasi-Monte Carlo integration and applications. Springer, 2014.
- [22] James N Lyness. Notes on lattice rules. Journal of Complexity, 19(3):321–331, 2003.
- [23] Yueming Lyu. Spherical structured feature maps for kernel approximation. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 2256–2264, 2017.
- [24] Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow, and Brendan Frey. Adversarial autoencoders. arXiv preprint arXiv:1511.05644, 2015.
- [25] Harald Niederreiter. Random number generation and quasi-Monte Carlo methods, volume 63. SIAM, 1992.
- [26] Dirk Nuyens and Ronald Cools. Fast algorithms for component-by-component construction of rank-1 lattice rules in shift-invariant reproducing kernel hilbert spaces. Mathematics of Computation, 75(254):903–920, 2006.
- [27] Art B. Owen. Monte Carlo book: the Quasi-Monte Carlo parts. 2019.
- [28] Farhad Pourkamali-Anaraki, Stephen Becker, and Michael B. Wakin. Randomized clustered Nyström for large-scale kernel machines. In Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
- [29] Ali Rahimi and Benjamin Recht. Random features for large-scale kernel machines. In Advances in neural information processing systems, 2007.
- [30] Tim Salimans, Jonathan Ho, Xi Chen, Szymon Sidor, and Ilya Sutskever. Evolution strategies as a scalable alternative to reinforcement learning. arXiv preprint arXiv:1703.03864, 2017.
- [31] Ilya D Shkredov. On exponential sums over multiplicative subgroups of medium size. Finite Fields and Their Applications, 30:72–87, 2014.
- [32] I Sloan and A Reztsov. Component-by-component construction of good lattice rules. Mathematics of Computation, 71(237):263–273, 2002.
- [33] Ian H Sloan and Henryk Woźniakowski. Tractability of multivariate integration for weighted korobov classes. Journal of Complexity, 17(4):697–721, 2001.
- [34] Anthony Tompkins, Ransalu Senanayake, Philippe Morere, and Fabio Ramos. Black box quantiles for kernel learning. In The 22nd International Conference on Artificial Intelligence and Statistics, pages 1427–1437, 2019.
- [35] Jiyan Yang, Vikas Sindhwani, Haim Avron, and Michael Mahoney. Quasi-Monte Carlo feature maps for shift-invariant kernels. In International Conference on Machine Learning, pages 485–493, 2014.
- [36] Jun-Yan Zhu, Taesung Park, Phillip Isola, and Alexei A Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision, pages 2223–2232, 2017.
Organization: In the supplement, we present the detailed proofs of Theorem 1, Theorem 2, and Corollary 1 in Section A, Section B, and Section C, respectively. We then present a subgroup-based QMC on the sphere in Section D. We give the detailed proofs of Theorem 3 and Theorem 4 in Section E and Section F, respectively. We then present QMC for generative models and generative inference for CycleGAN in Section G and Section H. At last, we present the experimental results of kernel approximation on the SIFT1M dataset in Figure 7.
Appendix A Proof of Theorem 1
Theorem.
Suppose $n$ is a prime number and $n = 2md + 1$. Let $g$ be a primitive root of $n$. Let $\mathbf{z} = [g^{0}, g^{m}, \dots, g^{(d-1)m}] \bmod n$. Construct a rank-1 lattice $X = \{\mathbf{x}_0, \dots, \mathbf{x}_{n-1}\}$ with $\mathbf{x}_i = \frac{i\mathbf{z} \bmod n}{n}$. Then, there are $m = \frac{n-1}{2d}$ distinct pairwise toroidal distance values among $X$, and for each distance value, there are the same number of pairs that attain this value.
Proof.
From the definition of the rank-1 lattice, we know that
$$\operatorname{dist}_{T}(\mathbf{x}_i, \mathbf{x}_j) = \left\| \frac{\left( (i-j)\mathbf{z} \right) \bmod n}{n} \right\|_{T} = \left\| \mathbf{x}_{(i-j) \bmod n} \right\|_{T}, \tag{24}$$
where $\|\cdot\|_{T}$ denotes the toroidal norm, i.e., the toroidal distance to the origin $\mathbf{0}$, and $i, j \in \{0, 1, \dots, n-1\}$.
For a non-identical pair $(i, j)$ with $i \ne j$, we know $(i-j) \bmod n \in \{1, \dots, n-1\}$. Moreover, for each $t \in \{1, \dots, n-1\}$, there are exactly $n$ ordered pairs $(i, j)$ with $(i-j) \bmod n = t$. Therefore, the non-identical pairwise toroidal distances are determined by $\|\mathbf{x}_t\|_{T}$ for $t \in \{1, \dots, n-1\}$, and each $t$ corresponds to $n$ pairwise distances.
From the definition of the $\ell_p$-norm-based toroidal distance, we know that
$$\|\mathbf{x}_t\|_{T_{\ell_p}} = \frac{\left\| \min\left( t\mathbf{z} \bmod n,\ n - (t\mathbf{z} \bmod n) \right) \right\|_p}{n}, \tag{25}$$
where $\min(\cdot, \cdot)$ denotes the elementwise min operation between its two inputs.
Since $n$ is a prime number, from number theory we know that for a primitive root $g$, the residues of $\{g^0, g^1, \dots, g^{n-2}\}$ modulo $n$ form a cyclic group under multiplication, equal to $\mathbb{Z}_n^* = \{1, 2, \dots, n-1\}$. Moreover, there is a one-to-one correspondence between these residues and the set $\{1, \dots, n-1\}$. Then, we can write $t = g^{k_t} \bmod n$ for some $k_t \in \{0, \dots, n-2\}$. It follows that
$$t\mathbf{z} \bmod n = \left[ g^{k_t}, g^{k_t + m}, \dots, g^{k_t + (d-1)m} \right] \bmod n. \tag{26}$$
Since $n = 2md + 1$ and $g$ is a primitive root, we know that $g^{\frac{n-1}{2}} = g^{dm} \equiv -1 \pmod{n}$. Denote $\mathcal{H} = \{g^{km} \bmod n : k = 0, 1, \dots, 2d-1\}$. Since $g^{2dm} = g^{n-1} \equiv 1 \pmod{n}$, we know that
$$g^{km} \cdot g^{k'm} \equiv g^{\left( (k + k') \bmod 2d \right) m} \pmod{n}, \tag{27}$$
$$g^{0} \equiv 1 \pmod{n}, \tag{28}$$
$$\left( g^{km} \right)^{-1} \equiv g^{(2d - k)m} \pmod{n}. \tag{29}$$
It follows that $\mathcal{H}$ forms a subgroup of the group $\mathbb{Z}_n^*$. From Lagrange's theorem in group theory [10], we know that the cosets of the subgroup $\mathcal{H}$ partition the entire group $\mathbb{Z}_n^*$ into equal-size, non-overlapping sets, and the number of cosets of $\mathcal{H}$ is $\frac{n-1}{2d} = m$.
Together with Eq.(26), note that $t\mathbf{z} \bmod n$ consists of $d$ elements of the coset $t\mathcal{H}$: multiplying $t$ by any $h \in \mathcal{H}$ permutes these elements up to a sign flip modulo $n$ (since $-1 = g^{dm} \in \mathcal{H}$), and the elementwise toroidal value $\min(c, n-c)$ is invariant under $c \mapsto n - c$. Hence $\|\mathbf{x}_t\|_{T}$ takes the same value for all $t$ in the same coset, so it takes $m$ different values for $t \in \{1, \dots, n-1\}$, one per coset, and each value is attained by the same number ($2d$) of indices $t$. Since each $t$ corresponds to $n$ pairwise distances $\operatorname{dist}_{T}(\mathbf{x}_i, \mathbf{x}_j)$ with $(i-j) \bmod n = t$, there are $m$ distinct pairwise toroidal distance values, and for each distance value, there are the same number of pairs that attain this value.
∎
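Theorem 1 is easy to verify numerically; the sketch below counts the distinct toroidal norm values (function names are ours, and the generating vector can come from `subgroup_generating_vector` in Section 3):

```python
import numpy as np

def count_distinct_distances(z, n, decimals=9):
    """Count the distinct l_2 toroidal norm values ||x_t||, t = 1, ..., n - 1.
    Theorem 1 predicts exactly m = (n - 1) / (2d) values, each taken 2d times."""
    t = np.arange(1, n).reshape(-1, 1)
    c = t * np.asarray(z) % n
    dists = np.round(np.linalg.norm(np.minimum(c, n - c) / n, axis=1), decimals)
    values, counts = np.unique(dists, return_counts=True)
    return len(values), counts

# Example: d = 3, n = 31 (m = 5, z = [1, 26, 25]) gives 5 values, each taken 6 times.
```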
Appendix B Proof of Theorem 2
Theorem.
Suppose $n$ is a prime number and $n = 2md + 1$. Let $\mathbf{z} = [z_1, \dots, z_d]$ with $z_i \ne z_j$ for $i \ne j$. Construct a non-degenerate rank-1 lattice with $\mathbf{x}_i = \frac{i\mathbf{z} \bmod n}{n}$. Then, the minimum pairwise toroidal distance can be bounded as
$$\frac{d(d+2)}{4n} \ \le\ \min_{i \ne j} \operatorname{dist}_{T_{\ell_1}}(\mathbf{x}_i, \mathbf{x}_j) \ \le\ \frac{d(n+1)}{4n}, \tag{30}$$
$$\frac{1}{n}\sqrt{\frac{d(d+1)(d+2)}{12}} \ \le\ \min_{i \ne j} \operatorname{dist}_{T_{\ell_2}}(\mathbf{x}_i, \mathbf{x}_j) \ \le\ \sqrt{\frac{d(n+1)}{12n}}, \tag{31}$$
where $\operatorname{dist}_{T_{\ell_1}}$ and $\operatorname{dist}_{T_{\ell_2}}$ denote the $\ell_1$-norm-based and the $\ell_2$-norm-based toroidal distance, respectively.
Proof.
From the definition of the rank-1 lattice, we know that
$$\operatorname{dist}_{T}(\mathbf{x}_i, \mathbf{x}_j) = \left\| \mathbf{x}_{(i-j) \bmod n} \right\|_{T}, \tag{32}$$
where $\|\cdot\|_{T}$ denotes the toroidal norm (the toroidal distance to $\mathbf{0}$), and $i, j \in \{0, 1, \dots, n-1\}$.
Thus, the minimum pairwise toroidal distance is equivalent to
$$\min_{i \ne j} \operatorname{dist}_{T}(\mathbf{x}_i, \mathbf{x}_j) = \min_{t \in \{1, \dots, n-1\}} \|\mathbf{x}_t\|_{T}. \tag{33}$$
Since the minimum value is smaller than the average value, it follows that
$$\min_{t \in \{1, \dots, n-1\}} \|\mathbf{x}_t\|_{T} \ \le\ \frac{1}{n-1}\sum_{t=1}^{n-1} \|\mathbf{x}_t\|_{T}. \tag{34}$$
Since $n$ is a prime number, from number theory we know that for a primitive root $g$, the residues of $\{g^0, \dots, g^{n-2}\}$ modulo $n$ form a cyclic group under multiplication, equal to $\{1, \dots, n-1\}$. Then, for each component $z_k$ of $\mathbf{z}$, we have $\gcd(z_k, n) = 1$, so that $t z_k \bmod n$ ranges over all of $\{1, \dots, n-1\}$ as $t$ does. Therefore, the set $\{t z_k \bmod n : t = 1, \dots, n-1\}$ is a permutation of the set $\{1, \dots, n-1\}$.
From the definition of the $\ell_1$-norm-based toroidal distance, each component of $\|\mathbf{x}_t\|_{T_{\ell_1}}$ is determined by $\min(t z_k \bmod n,\ n - (t z_k \bmod n))/n$. Because the set $\{t z_k \bmod n\}_{t=1}^{n-1}$ is a permutation of $\{1, \dots, n-1\}$, the multiset $\{\min(t z_k \bmod n,\ n - (t z_k \bmod n))\}_{t=1}^{n-1}$ consists of two copies of $\{1, 2, \dots, \frac{n-1}{2}\}$. It follows that
$$\frac{1}{n-1}\sum_{t=1}^{n-1} \|\mathbf{x}_t\|_{T_{\ell_1}} = \frac{d}{n-1} \cdot \frac{2}{n} \sum_{j=1}^{\frac{n-1}{2}} j = \frac{d(n+1)}{4n}. \tag{35}$$
Similarly, for the $\ell_2$-norm-based toroidal distance, we have that
$$\frac{1}{n-1}\sum_{t=1}^{n-1} \|\mathbf{x}_t\|_{T_{\ell_2}}^2 = \frac{d}{n-1} \cdot \frac{2}{n^2} \sum_{j=1}^{\frac{n-1}{2}} j^2 = \frac{d(n+1)}{12n}. \tag{36}$$
By the Cauchy–Schwarz inequality, we know that
$$\frac{1}{n-1}\sum_{t=1}^{n-1} \|\mathbf{x}_t\|_{T_{\ell_2}} \ \le\ \sqrt{\frac{1}{n-1}\sum_{t=1}^{n-1} \|\mathbf{x}_t\|_{T_{\ell_2}}^2}. \tag{37}$$
Together with Eq.(34), it follows that
$$\min_{i \ne j}\operatorname{dist}_{T_{\ell_1}}(\mathbf{x}_i, \mathbf{x}_j) \ \le\ \frac{d(n+1)}{4n}, \tag{38}$$
$$\min_{i \ne j}\operatorname{dist}_{T_{\ell_2}}(\mathbf{x}_i, \mathbf{x}_j) \ \le\ \sqrt{\frac{d(n+1)}{12n}}. \tag{39}$$
Now, we prove the lower bound. For a non-degenerate rank-1 lattice, the components of the generating vector $\mathbf{z}$ are all different; then, for each $t$, the components of $t\mathbf{z} \bmod n$ are all different. Since each toroidal value $\min(c, n-c)$ with $c \in \{1, \dots, n-1\}$ is attained by at most two distinct residues $c$, the smallest possible norm is achieved when the toroidal component values are $\frac{1}{n}, \frac{1}{n}, \frac{2}{n}, \frac{2}{n}, \dots$. Since $2d \le n-1$, it follows (shown here for even $d$; the odd-$d$ case gives a slightly larger value) that
$$\min_{i \ne j}\operatorname{dist}_{T_{\ell_1}}(\mathbf{x}_i, \mathbf{x}_j) \ \ge\ \frac{2}{n}\sum_{k=1}^{d/2} k = \frac{d(d+2)}{4n}, \tag{40}$$
$$\min_{i \ne j}\operatorname{dist}_{T_{\ell_2}}(\mathbf{x}_i, \mathbf{x}_j) \ \ge\ \sqrt{\frac{2}{n^2}\sum_{k=1}^{d/2} k^2} = \frac{1}{n}\sqrt{\frac{d(d+1)(d+2)}{12}}. \tag{41}$$
∎
Appendix C Proof of Corollary 1
Corollary 1.
Suppose $n = 2d + 1$ is a prime number. Let $g$ be a primitive root of $n$. Let $\mathbf{z} = [g^{0}, g^{1}, \dots, g^{d-1}] \bmod n$. Construct a rank-1 lattice with $\mathbf{x}_i = \frac{i\mathbf{z} \bmod n}{n}$. Then, the pairwise toroidal distance of the lattice attains the upper bound:
$$\operatorname{dist}_{T_{\ell_1}}(\mathbf{x}_i, \mathbf{x}_j) = \frac{d(n+1)}{4n}, \quad \forall\, i \ne j, \tag{42}$$
$$\operatorname{dist}_{T_{\ell_2}}(\mathbf{x}_i, \mathbf{x}_j) = \sqrt{\frac{d(n+1)}{12n}}, \quad \forall\, i \ne j. \tag{43}$$
Proof.
From the definition of the rank-1 lattice, we know that
$$\operatorname{dist}_{T}(\mathbf{x}_i, \mathbf{x}_j) = \left\| \mathbf{x}_{(i-j) \bmod n} \right\|_{T}, \tag{44}$$
where $\|\cdot\|_{T}$ denotes the toroidal norm, and $i, j \in \{0, 1, \dots, n-1\}$.
From Theorem 1, we know that $\|\mathbf{x}_t\|_{T}$ takes $m = \frac{n-1}{2d}$ different values. Since $n = 2d + 1$, we have $m = 1$, so all the pairwise toroidal distances take the same value. Therefore, we know that
$$\operatorname{dist}_{T}(\mathbf{x}_i, \mathbf{x}_j) = \min_{i' \ne j'} \operatorname{dist}_{T}(\mathbf{x}_{i'}, \mathbf{x}_{j'}), \quad \forall\, i \ne j. \tag{45}$$
From the proof of Theorem 2, we know that
$$\frac{1}{n-1}\sum_{t=1}^{n-1}\|\mathbf{x}_t\|_{T_{\ell_1}} = \frac{d(n+1)}{4n} \tag{46}$$
and
$$\frac{1}{n-1}\sum_{t=1}^{n-1}\|\mathbf{x}_t\|_{T_{\ell_2}}^2 = \frac{d(n+1)}{12n}. \tag{47}$$
Since all the norms $\|\mathbf{x}_t\|_{T_{\ell_1}}$ are equal, each of them equals the average in Eq.(46), and it follows that
$$\operatorname{dist}_{T_{\ell_1}}(\mathbf{x}_i, \mathbf{x}_j) = \frac{d(n+1)}{4n}, \quad \forall\, i \ne j. \tag{49}$$
Together with Eq.(47), since all the norms $\|\mathbf{x}_t\|_{T_{\ell_2}}$ are equal, each squared norm equals the average, and we know that
$$\operatorname{dist}_{T_{\ell_2}}(\mathbf{x}_i, \mathbf{x}_j) = \sqrt{\frac{d(n+1)}{12n}}, \quad \forall\, i \ne j. \tag{50}$$
From Theorem 2, we know that both the $\ell_1$-norm-based and the $\ell_2$-norm-based pairwise toroidal distances of the lattice attain the corresponding upper bounds.
∎
Appendix D Subgroup-based QMC on Sphere
In this section, we propose a closed-form subgroup-based QMC method on the sphere $\mathbb{S}^{d-1}$ instead of the unit cube $[0,1)^d$. QMC points distributed uniformly on the sphere can be used to construct samples from isotropic distributions, which is helpful for variance reduction of gradient estimators in evolution strategies for reinforcement learning [30].
Lyu [23] constructs a structured sampling matrix on $\mathbb{S}^{d-1}$ by minimizing the discrete Riesz energy. In contrast, we construct samples by a closed-form construction without the time-consuming optimization procedure. Our construction can achieve a small mutual coherence.
Without loss of generality, we assume that $d = 2m$, and that $n$ is a prime such that $m \mid n - 1$. Let $\mathbf{F} \in \mathbb{C}^{n \times n}$ be a discrete Fourier matrix, whose $(j, k)$ entry is $F_{jk} = e^{2\pi\mathrm{i}\, jk/n}$, where $\mathrm{i} = \sqrt{-1}$. Let $I \subset \{1, \dots, n-1\}$ be a subset of indices with $|I| = m$.
The structured sampling matrix in [23] can be defined as equation (52):
$$\mathbf{V} = \sqrt{\frac{2}{d}} \begin{bmatrix} \operatorname{Re}(\mathbf{F}_I) \\ \operatorname{Im}(\mathbf{F}_I) \end{bmatrix} \in \mathbb{R}^{d \times n}, \tag{52}$$
where Re and Im denote the real and imaginary parts of a complex number, and $\mathbf{F}_I$ in equation (53) is the matrix constructed by the rows of $\mathbf{F}$ indexed by $I$:
$$\mathbf{F}_I = \left[ F_{tk} \right]_{t \in I,\ 0 \le k \le n-1} \in \mathbb{C}^{m \times n}. \tag{53}$$
With the $\mathbf{V}$ given in equation (52), we know that $\|\mathbf{v}_k\|_2 = 1$ for $k \in \{0, \dots, n-1\}$. Thus, each column of the matrix $\mathbf{V}$ is a point on $\mathbb{S}^{d-1}$.
Let $g$ denote a primitive root modulo $n$. We construct the index set as
$$I = \left\{ g^{k \frac{n-1}{m}} \bmod n \ : \ k = 0, 1, \dots, m-1 \right\}. \tag{54}$$
The set $I$ forms a subgroup of order $m$ of the multiplicative group $\mathbb{Z}_n^*$ (mod $n$). Based on this, we derive upper bounds on the mutual coherence of the point set $\{\mathbf{v}_0, \dots, \mathbf{v}_{n-1}\}$. The results are summarized in Theorem 3 and Theorem 4.
Theorem 3.
Theorem 4.
Theorem 3 and Theorem 4 show that our construction can achieve a bounded mutual coherence. A smaller mutual coherence means that the points are more evenly spread on the sphere $\mathbb{S}^{d-1}$.
Remark: Our construction does not place a restrictive constraint on the data dimension. The only assumption is that the dimension $d$ is an even number, i.e., $d = 2m$, which is commonly satisfied in practice. Moreover, the product $\mathbf{V}^\top \mathbf{x}$ can be accelerated by the fast Fourier transform as in [23].
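A sketch of the core construction under the stated assumptions ($d = 2m$, a prime $p$ with $m \mid p - 1$, and the order-$m$ subgroup as the index set); the experiments in Section D.1 use $n = 2p$ points, whose exact indexing we do not reproduce here, and the function name is ours.

```python
import numpy as np
from sympy import isprime, primitive_root

def subgroup_sphere_points(d, p):
    """Stack the real and imaginary parts of d/2 DFT rows whose indices form a
    multiplicative subgroup; the columns are unit vectors, i.e., points on S^{d-1}."""
    m = d // 2
    assert d % 2 == 0 and isprime(p) and (p - 1) % m == 0
    g = primitive_root(p)
    step = (p - 1) // m
    I = np.array([pow(g, k * step, p) for k in range(m)])  # order-m subgroup of Z_p^*
    phases = 2 * np.pi * np.outer(I, np.arange(p)) / p     # (m, p) phases of F_I
    return np.sqrt(2.0 / d) * np.vstack([np.cos(phases), np.sin(phases)])

# Mutual coherence: V = subgroup_sphere_points(d, p); G = V.T @ V;
# max |G[j, k]| over j != k is bounded via exponential sums over the subgroup.
```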
D.1 Evaluation of the mutual coherence
We evaluate our subgroup-based spherical QMC by comparing it with the construction in [23] and with i.i.d. Gaussian sampling.
We set the dimension $d$ in $\{50, 100, 200, 500, 1000\}$. For each dimension $d$, we set the number of points to $n = 2p$, with $p$ taken as the first ten primes such that $\frac{d}{2}$ divides $p - 1$. Both subgroup-based QMC and Lyu's method are deterministic. For the Gaussian sampling method, we report the mean and standard deviation of the mutual coherence over 50 independent runs. The mutual coherence for each dimension is reported in Table 3. The smaller the mutual coherence, the better.
We can observe that our subgroup-based spherical QMC achieves a competitive mutual coherence compared with Lyu's method [23]. Note that our method does not require a time-consuming optimization procedure; it is thus appealing for applications that demand a fast construction. Moreover, both our subgroup-based QMC and Lyu's method obtain a significantly smaller coherence than i.i.d. Gaussian sampling.
Table 3: Mutual coherence (smaller is better). For Gaussian sampling, the row below the mean gives the standard deviation over 50 runs.

| d=50 | n=202 | 302 | 502 | 802 | 1202 | 1402 | 1502 | 2102 | 2302 | 2402 |
|---|---|---|---|---|---|---|---|---|---|---|
| SubGroup | 0.1490 | 0.2289 | 0.1923 | 0.2930 | 0.2608 | 0.3402 | 0.3358 | 0.3211 | 0.4534 | 0.3353 |
| Lyu [23] | 0.2313 | 0.2377 | 0.2901 | 0.2902 | 0.3005 | 0.3154 | 0.3155 | 0.3209 | 0.3595 | 0.3718 |
| Gaussian | 0.5400 | 0.5738 | 0.5904 | 0.6158 | 0.6270 | 0.6254 | 0.6328 | 0.6447 | 0.6520 | 0.6517 |
| (± std) | 0.0254 | 0.0291 | 0.0257 | 0.0249 | 0.0209 | 0.0184 | 0.0219 | 0.0184 | 0.0204 | 0.0216 |

| d=100 | n=202 | 302 | 502 | 802 | 1202 | 1402 | 1502 | 2102 | 2302 | 2402 |
|---|---|---|---|---|---|---|---|---|---|---|
| SubGroup | 0.1105 | 0.1529 | 0.1923 | 0.1764 | 0.2397 | 0.2749 | 0.2513 | 0.2679 | 0.4534 | 0.3353 |
| Lyu [23] | 0.1234 | 0.1581 | 0.1586 | 0.1870 | 0.2041 | 0.2191 | 0.1976 | 0.2047 | 0.2244 | 0.2218 |
| Gaussian | 0.4033 | 0.4210 | 0.4422 | 0.4577 | 0.4616 | 0.4734 | 0.4716 | 0.4878 | 0.4866 | 0.4947 |
| (± std) | 0.0272 | 0.0274 | 0.0225 | 0.0230 | 0.0170 | 0.0174 | 0.0234 | 0.0167 | 0.0172 | 0.0192 |

| d=200 | n=202 | 802 | 1202 | 1402 | 2402 | 2602 | 3202 | 3602 | 3802 | 5602 |
|---|---|---|---|---|---|---|---|---|---|---|
| SubGroup | 0.0100 | 0.1251 | 0.1835 | 0.1966 | 0.2365 | 0.1553 | 0.1910 | 0.1914 | 0.2529 | 0.2457 |
| Lyu [23] | 0.0100 | 0.1108 | 0.1223 | 0.1262 | 0.1417 | 0.1444 | 0.1505 | 0.1648 | 0.1624 | 0.1679 |
| Gaussian | 0.2887 | 0.3295 | 0.3362 | 0.3447 | 0.3564 | 0.3578 | 0.3645 | 0.3648 | 0.3689 | 0.3768 |
| (± std) | 0.0163 | 0.0155 | 0.0148 | 0.0182 | 0.0140 | 0.0142 | 0.0143 | 0.0142 | 0.0140 | 0.0151 |

| d=500 | n=502 | 1502 | 4502 | 6002 | 6502 | 8002 | 9502 | 11002 | 14002 | 17002 |
|---|---|---|---|---|---|---|---|---|---|---|
| SubGroup | 0.0040 | 0.0723 | 0.1051 | 0.1209 | 0.1107 | 0.1168 | 0.1199 | 0.1425 | 0.1587 | 0.1273 |
| Lyu [23] | 0.0040 | 0.0650 | 0.0946 | 0.0934 | 0.0930 | 0.1004 | 0.0980 | 0.1022 | 0.1077 | 0.1110 |
| Gaussian | 0.2040 | 0.2218 | 0.2388 | 0.2425 | 0.2448 | 0.2498 | 0.2528 | 0.2527 | 0.2579 | 0.2607 |
| (± std) | 0.0111 | 0.0099 | 0.0092 | 0.0081 | 0.0113 | 0.0110 | 0.0100 | 0.0084 | 0.0113 | 0.0092 |

| d=1000 | n=6002 | 8002 | 11002 | 14002 | 17002 | 18002 | 21002 | 26002 | 32002 | 38002 |
|---|---|---|---|---|---|---|---|---|---|---|
| SubGroup | 0.0754 | 0.0778 | 0.0819 | 0.0921 | 0.0935 | 0.0764 | 0.1065 | 0.0931 | 0.0908 | 0.1125 |
| Lyu [23] | 0.0594 | 0.0637 | 0.0662 | 0.0680 | 0.0684 | 0.0744 | 0.0774 | 0.0815 | 0.0781 | 0.0814 |
| Gaussian | 0.1736 | 0.1764 | 0.1797 | 0.1828 | 0.1846 | 0.1840 | 0.1869 | 0.1888 | 0.1909 | 0.1920 |
| (± std) | 0.0067 | 0.0059 | 0.0060 | 0.0062 | 0.0052 | 0.0057 | 0.0052 | 0.0055 | 0.0067 | 0.0056 |
Appendix E Proof of Theorem 3
Proof.
Let $\mathbf{f}_j$ denote the $j$-th column of the matrix $\mathbf{F}_I$ in Eq.(53), and let $\mathbf{v}_j$ denote the $j$-th column of the matrix $\mathbf{V}$ in Eq.(52). For $j \ne k$, we know that
$$\mathbf{v}_j^\top \mathbf{v}_k = \frac{2}{d}\left( \operatorname{Re}(\mathbf{f}_j)^\top\operatorname{Re}(\mathbf{f}_k) + \operatorname{Im}(\mathbf{f}_j)^\top\operatorname{Im}(\mathbf{f}_k) \right) \tag{55}$$
$$= \frac{2}{d}\operatorname{Re}\left( \mathbf{f}_j^{H}\mathbf{f}_k \right) \tag{56}$$
$$= \frac{2}{d}\operatorname{Re}\left( \sum_{t \in I} e^{2\pi\mathrm{i}\, t(k-j)/n} \right), \tag{57}$$
where $(\cdot)^{H}$ denotes the conjugate transpose, and Re and Im denote the real and imaginary parts of the input complex number.
It follows that
$$\left| \mathbf{v}_j^\top\mathbf{v}_k \right| \ \le\ \frac{2}{d}\left| \sum_{t \in I} e^{2\pi\mathrm{i}\, t(k-j)/n} \right|. \tag{58}$$
From the definition of $\mathbf{F}_I$ in Eq.(53), we know that
$$\mathbf{f}_j^{H}\mathbf{f}_k = \sum_{t \in I} \overline{e^{2\pi\mathrm{i}\, tj/n}}\, e^{2\pi\mathrm{i}\, tk/n} \tag{59}$$
$$= \sum_{t \in I} e^{2\pi\mathrm{i}\, t(k-j)/n}. \tag{60}$$
Because $I$ is a subgroup of the multiplicative group $\mathbb{Z}_n^*$, the exponential-sum estimate for multiplicative subgroups from [4] gives, for $k \ne j$,
$$\left| \sum_{t \in I} e^{2\pi\mathrm{i}\, t(k-j)/n} \right| \ \le\ B(m, n), \tag{61}$$
where $B(m, n)$ denotes the bound stated in Theorem 3, which follows from [4].
Finally, we know that
$$\max_{j \ne k}\left| \mathbf{v}_j^\top\mathbf{v}_k \right| \ \le\ \frac{2}{d}\, B(m, n). \tag{62}$$
∎
Appendix F Proof of Theorem 4
Appendix G QMC for Generative models
Our subgroup rank-1 lattice can be used for generative models. Buchholz et al. [5] suggest using QMC for variational inference to maximize the evidence lower bound (ELBO). We present a new method by directly learning the inverse of the cumulative distribution function (CDF).
In a variational autoencoder, the objective is the evidence lower bound (ELBO) [15], defined as
$$\mathcal{L}(\theta, \phi; \mathbf{x}) = \mathbb{E}_{q_\phi(\mathbf{z}|\mathbf{x})}\left[ \log p_\theta(\mathbf{x}|\mathbf{z}) \right] - \mathrm{KL}\left( q_\phi(\mathbf{z}|\mathbf{x}) \,\|\, p(\mathbf{z}) \right). \tag{65}$$
The ELBO consists of two terms: the reconstruction term $\mathbb{E}_{q_\phi(\mathbf{z}|\mathbf{x})}[\log p_\theta(\mathbf{x}|\mathbf{z})]$ and the regularization term $\mathrm{KL}(q_\phi(\mathbf{z}|\mathbf{x}) \,\|\, p(\mathbf{z}))$. The reconstruction term learns to fit the data, while the regularization term controls the distance between the distribution $q_\phi(\mathbf{z}|\mathbf{x})$ and the prior distribution $p(\mathbf{z})$.
The reconstruction term can be reformulated as
$$\mathbb{E}_{q_\phi(\mathbf{z}|\mathbf{x})}\left[ \log p_\theta(\mathbf{x}|\mathbf{z}) \right] = \int q_\phi(\mathbf{z}|\mathbf{x}) \log p_\theta(\mathbf{x}|\mathbf{z})\, \mathrm{d}\mathbf{z} \tag{66}$$
$$= \int_{[0,1]^{d}} \log p_\theta\left( \mathbf{x} \,\big|\, \Psi_\phi(\boldsymbol{\epsilon}, \mathbf{x}) \right) \mathrm{d}\boldsymbol{\epsilon}, \tag{67}$$
where $\Psi_\phi(\cdot, \mathbf{x})$ denotes the inverse cumulative distribution function with respect to the density $q_\phi(\mathbf{z}|\mathbf{x})$.
Eq.(67) provides an alternative training scheme: we directly learn the inverse CDF $\Psi_\phi(\boldsymbol{\epsilon}, \mathbf{x})$ instead of the density $q_\phi(\mathbf{z}|\mathbf{x})$. We parameterize $\Psi_\phi$ as a neural network with noise input $\boldsymbol{\epsilon}$ and data $\mathbf{x}$. The inverse CDF can be seen as an encoder of $\mathbf{x}$ for inference. It is worth noting that learning the inverse CDF brings more flexibility, without an assumption on the distribution family, e.g., Gaussian.
To keep the resulting distribution close to the prior distribution $p(\mathbf{z})$, we can use other regularization terms instead of the KL divergence, which apply to any implicit distribution, e.g., the maximum mean discrepancy. Besides this, we can also use a discriminator-based adversarial loss similar to adversarial autoencoders [24]:
$$\min_{\phi}\max_{D}\ \mathbb{E}_{\mathbf{z} \sim p(\mathbf{z})}\left[ \log D(\mathbf{z}) \right] + \mathbb{E}_{\boldsymbol{\epsilon} \sim \mathcal{U}[0,1]^{d}}\left[ \log\left( 1 - D\left( \Psi_\phi(\boldsymbol{\epsilon}, \mathbf{x}) \right) \right) \right], \tag{68}$$
where $\mathcal{U}[0,1]^d$ denotes the uniform distribution on the unit cube, $D$ is the discriminator, and $\Psi_\phi$ denotes the inverse CDF mapping.
When the latent domain coincides with a target data domain, we can use the empirical data distribution of the target domain as the prior. This leads to a training scheme similar to CycleGAN [36]. In contrast to CycleGAN, the encoder depends on both the data in the source domain and the noise $\boldsymbol{\epsilon}$ in the unit cube. The expectation term can be approximated by QMC methods.
Appendix H Generative Inference for CycleGAN
We evaluate our subgroup rank-1 lattice on training a generative model. As shown in Section G, we can learn the inverse CDF as a generator from one domain to the other in CycleGAN. We set the generator to be the sum of two neural networks: one maps the input image to a target mean, while the other maps the noise input to a residue. The generator in the reverse direction is defined in the same way.
We implement the model based on the open-sourced PyTorch code of [36]. All generator networks employ the default ResNet architecture with 9 blocks in [36], and both generators take inputs of the default size. We keep all the hyperparameters the same for all the methods, at their default values in [36].
We compare our subgroup rank-1 lattice with Monte Carlo sampling for training the generative model. For the subgroup rank-1 lattice, we fix the number of points $n$. We do not store all the points; instead, we sample the index $i$ uniformly and construct the noise inputs of the two generators based on Eq.(3) during the training process. For Monte Carlo sampling, the noise inputs are sampled from the uniform distribution on the unit cube.
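A hypothetical sketch of this on-the-fly sampling (the actual training code builds on the PyTorch CycleGAN implementation of [36]; the per-call random shift is our design choice):

```python
import numpy as np

def sample_lattice_noise(z, n, batch_size, rng):
    """Draw lattice indices uniformly and return randomly shifted rank-1 lattice
    points (Eq. (3) plus Eq. (4)) as the noise inputs of the generators."""
    i = rng.integers(0, n, size=(batch_size, 1))
    eps = (i * np.asarray(z) % n) / n          # rank-1 lattice points, (batch, d)
    return (eps + rng.random(len(z))) % 1.0    # fresh random shift per call
```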
We train generative models on the Vangogh2photo and maps datasets employed in [36]. We present the generated images from models trained with subgroup-based rank-1 lattice sampling, Monte Carlo sampling, and the standard version of CycleGAN. The experimental results on the Vangogh2photo and maps datasets are shown in Figure 5 and Figure 6, respectively. From Figure 5, we can observe that the images generated by the model trained with Monte Carlo sampling have some blurred patches. This phenomenon may arise because the additional flexibility of the randomness makes training more difficult to converge to a good model. In contrast, the model trained with subgroup-based rank-1 lattice sampling generates clearer images. This may be because the rank-1 lattice sampling has finitely many possible choices, i.e., $n$ possible points in the experiments, which is much smaller than in the case of Monte Carlo uniform sampling; the rank-1 lattice sampling is more deterministic than Monte Carlo sampling, which alleviates the difficulty of fitting a good model. Since it is very simple to construct new samples with our subgroup rank-1 lattice, it can serve as a good alternative to Monte Carlo sampling for generative model training.