This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Unbounded Slice Sampling

Daichi Mochihashi
The Institute of Statistical Mathematics
daichi@ism.ac.jp
( 2020-1-5 (Sun) )
Abstract

Slice sampling is an efficient Markov Chain Monte Carlo algorithm to sample from an unnormalized density with acceptance ratio always 11. However, when the variable to sample is unbounded, its “stepping-out” heuristic works only locally, making it difficult to uniformly explore possible candidates. This paper proposes a simple change-of-variable method to slice sample an unbounded variable equivalently from [0,1)[\kern 1.1pt0,1).

1 Introduction

Slice sampling [1] is one of the Markov Chain Monte Carlo (MCMC) methods to sample from one-dimensional distribution. Because its acceptance probability is always 11 as opposed to famous Metropolis-Hastings algorithm [2], slice sampling is widely employed in modern statistics and machine learning problems, especially for sampling hyperparameters of a statistical model where each hyperparameter is often univariate but has a nontrivial likelihood surface.

When the variable to sample is not bounded, an algorithm called “stepping-out” [3] is usually employed to adaptively adjust the interval to sample from based on the current value of the variable. However, this “stepping-out” works only locally, thus will be potentially trapped to local modes in multimodal likelihoods. In contrast, in this paper we propose a simple method to uniformly sample from an unbounded variable through an appropriate reparametrization with a unit interval [0,1)[\kern 1.1pt0,1).

2 Slice sampling

Refer to caption
(a) Sigmoid map p=11+ex\displaystyle p=\frac{1}{1+e^{-x}}
Refer to caption
(b) Logit map x=p1p\displaystyle x=\frac{p}{1-p}
Figure 1: One-to-one map between p[0,1)p\in[\kern 1.1pt0,1) and unbounded xx.

Slice sampling iteratively draws xx from a probability distribution

p(x)=f(x)Zp(x)=\frac{f(x)}{Z} (1)

without knowing the normalizing constant Z=f(x)𝑑x\displaystyle Z\!=\!\int\!f(x)dx by the following general algorithm, starting from an initial value x(0)x^{(0)}:

Algorithm 1 General slice sampling
1:  for t=1Tt=1\cdots T do
2:     =f(x(t1))\ell=f(x^{(t-1)})
3:     ρUniform[0,)\rho\sim{\rm Uniform}[0,\ell)
4:     x(t)Uniform({x:f(x)>ρ})x^{(t)}\sim{\rm Uniform}(\{x:f(x)>\rho\})
5:  end for

In practice, Step 44 of the Algorithm 1 is not trivial, and usually “stepping-out” heuristic [1] is employed to determine the interval from which xx is uniformly drawn, starting from an initial interval enclosing xx that is adaptively expanded.

However, the efficiency of “stepping-out” depends on setting the initial interval whose appropriate scale for the problem is not known in advance. Moreover, it works only locally: if f(x)f(x) has multiple and distant modes, it tends to wander around a single mode and will spend a very long time to explore different modes that might have higher probabilities.

On the other hand, if xx is bounded and known to always reside in the interval [𝑠𝑡,𝑒𝑑][{\it st},{\it ed}], slice sampling is very easy using the following algorithm:

Algorithm 2  Bounded slice sampling
1:  for t=1Tt=1\cdots T do
2:     =f(x(t1))\ell=f(x^{(t-1)})
3:     ρUniform[0,)\rho\sim{\rm Uniform}[0,\ell)
4:     while true do
5:        xUniform[𝑠𝑡,𝑒𝑑]x\sim{\rm Uniform}[{\it st},{\it ed}]   /* generate a candidate */
6:        if f(x)>ρf(x)>\rho then
7:           x(t)=xx^{(t)}=x; break
8:        else                           /* modify interval */
9:           if x<x(t1)x<x^{(t-1)} then
10:              𝑠𝑡:=x{\it st}:=x
11:           else
12:              𝑒𝑑:=x{\it ed}:=x
13:           end if
14:        end if
15:     end while
16:  end for

This is an efficient binary search that quickly draws xx uniformly over the region [𝑠𝑡,𝑒𝑑][{\it st},{\it ed}] where f(x)>ρf(x)>\rho.

3 Unbounded slice sampling

Algorithm 2 is effective for a bounded xx that is known to reside in [𝑠𝑡,𝑒𝑑][{\it st},{\it ed}]. However, even when xx is unbounded, we can sample xx through an one-to-one map between [0,1)[\kern 1.1pt0,1) and \mathbb{R}.

General unbounded variate:

For example, using a sigmoid map

p\displaystyle p =11+ex\displaystyle=\frac{1}{1+e^{-x}} (2)
i.e.x\displaystyle\text{\it i.e.}\qquad x =log(1p1)\displaystyle=-\log\left(\frac{1}{\,p\,}-1\right) (3)

for p[0,1)p\in[\kern 1.1pt0,1) shown in Figure 1(a), for each xx\in\mathbb{R} we can associate p[0,1)p\in[\kern 1.1pt0,1) and vice versa.

Since

dxdp=1p(1p),\frac{dx}{dp}=\frac{1}{p(1-p)}~{}, (4)

we can alternatively sample pp in place of xx using the exchange of variables:

f(x)dx=f(x)dxdpdp=f(log(1p1))1p(1p)dpf(x)dx=f(x)\frac{dx}{dp}dp=f\left(-\log\left(\frac{1}{\,p\,}-1\right)\right)\frac{1}{p(1-p)}dp (5)

Using (5) for the Algorithm 2 in place of f(x)f(x) can sample p[0,1)p\in[\kern 1.1pt0,1), from which we can recover xx by the relationship (3).

Positive variate:

When x>0x>0, for example we can exploit a map

x=p1px=\frac{p}{1-p} (6)

therefore

dxdp=1(1p)2,\frac{dx}{dp}=\frac{1}{(1-p)^{2}}~{}, (7)

leading

f(x)dx=f(x)dxdpdp=f(p1p)1(1p)2dpf(x)dx=f(x)\frac{dx}{dp}dp=f\left(\frac{p}{1-p}\right)\frac{1}{(1-p)^{2}}dp (8)

to sample p[0,1)p\in[\kern 1.1pt0,1) and obtain xx by the relationship (6).

Using this reparameterization, we can slice sample an unbounded xx essentially by a “probabilistic binary search” always on [0,1)[\kern 1.1pt0,1). We included sample of MATLAB and C codes for our unbounded slice sampling in Appendix B and C.

In practice, sometimes we cannot compute a corresponding pp for very large (or small) xx using equation (2). Therefore, instead of (2) we can use

p=σ(x/A)=11+ex/Ap=\sigma(x/A)=\frac{1}{1+e^{-x/A}} (9)

for some constant A>0A>0 as shown in Figure 2. While every xx\in\mathbb{R} is mathematically equivalent, notice that usually the parameter to sample from a statistical model roughly111 This is important, because we do not enforce a strict interval to sample xx in this paper. resides in this regime; even if xx will exceed over this regime, we can safely rescale xx to fit into a decent interval. We empirically confirmed that A=100A=100 usually works fine for 1000<x<1000-1000<x<1000 (see Section 4).

4 Experiments

Refer to caption
Figure 2: Modified sigmoid map p=11+ex/100\displaystyle p=\frac{1}{1+e^{-x/100}}.

We conducted some experiments to confirm that our unbounded slice sampler correctly samples from a given distribution. Using the C code shown in Appendix A, we can easily sample from p(x)p(x).

For a general unbounded case xx\in\mathbb{R}, Figure 3 shows the plots of sampled xx from (a) f(x)=exp((x500)2/10)f(x)=\exp(-(x\!-\!500)^{2}/10), (b) f(x)=exp((x500)2/10)f(x)=\exp(-(x\!-\!500)^{2}/10), and (c) f(x)=exp((x1000)2/100)f(x)=\exp(-(x\!-\!1000)^{2}/100). Note that p(x)p(x) will be high on different regimes for each case: roughly 0<x<40<x<4 for (a), x500x\approx 500 for (b), and x1000x\approx 1000 for (c), while these regimes are not known in advance, and they have different variances. Figure 3 clearly shows we can correctly sample from p(x)p(x), even when xx is very large using the map (9) with A=100A\!=\!100. The average step of “binary search”, i.e. the number of function evaluations for the loop in Step 4 of the Algorithm 2, is 11.4411.44 for (a), 16.4816.48 for (b), and 9.349.34 for (c).

Refer to caption
(a) f(x)=exp(x(x1)(x2)(x3.5))f(x)=\exp(-x(x\!-\!1)(x\!-\!2)(x\!-\!3.5))
Refer to caption
(b) f(x)=exp((x500)2/10)f(x)=\exp(-(x\!-\!500)^{2}/10)
Refer to caption
(c) f(x)=exp((x1000)2/100)f(x)=\exp(-(x\!-\!1000)^{2}/100)
Figure 3: Unbounded slice sampling of xx\in\mathbb{R}.

For a positive case x+x\in\mathbb{R}^{+}, Figure 4 shows the results for (a) f(x)=x4exf(x)=x^{4}e^{-x} and (b) f(x)=exp((x1000)2/100)f(x)=\exp(-(x\!-\!1000)^{2}/100); (a) means p(x)Gamma(5,1)p(x)\sim{\rm Gamma}(5,1).

Refer to caption
(a) f(x)=x4exf(x)=x^{4}e^{-x} (=Gamma(5,1))(={\rm Gamma}(5,1))
Refer to caption
(b) f(x)=exp((x1000)2/100)f(x)=\exp(-(x\!-\!1000)^{2}/100)
Figure 4: Unbounded slice sampling of x>0x>0.

Comparison with stepping-out

To confirm the advantage of our unbounded slice sampling, we also conducted an experiment to sample from a multimodal distribution. Specifically, Figure 5 shows a draw of 10,00010,000 samples from a Gaussian mixture model p(x)=0.8𝒩(x|0,12)+0.2𝒩(x|10,12)p(x)=0.8\mathcal{N}(x|0,1^{2})+0.2\mathcal{N}(x|10,1^{2}) using unbounded slice sampler and “stepping-out” algorithm, respectively.

Clearly, “stepping-out” is stuck into the first mode because the sampling starts from x=1x=1, and this tendency remained the same over multiple simulations. Moreover, because “stepping-out” linearly increases the interval to sample from, sampling from a distant density shown in Figure 3(c) required about 2,0002,000 function evaluations at first to reach the mode, while unbounded slice sampling required only 9.349.34 as described before, thanks to the efficient “probabilistic binary search” on \mathbb{R}.

Refer to caption
(a) Unbounded slice sampling
Refer to caption
(b) Stepping-out algorithm (width=11)
Figure 5: Comparison with “stepping-out” on a Gaussian mixture model p(x)=0.8𝒩(x|0,12)+0.2𝒩(x|10,12)p(x)=0.8\mathcal{N}(x|0,1^{2})+0.2\mathcal{N}(x|10,1^{2}) using 10,00010,000 samples.

References

  • [1] Radford M. Neal. Slice sampling. Annals of Statistics, pages 705–741, 2003.
  • [2] W. R. Gilks, S. Richardson, and D. J. Spiegelhalter. Markov Chain Monte Carlo in Practice. Chapman & Hall / CRC, 1996.
  • [3] David J. C. MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003.

Appendix

Appendix A C code of easy slice sampling

#include "slice.h"

double
f (double x, void *arg)
{
        return - x * (x - 1) * (x - 2) * (x - 3.5);
}

main () {
        int i, N = atoi (argv[1]);
        double x = 1;

        for (i = 0; i < N; i++) {
                x = slice_sample (x, f, NULL);
                printf("%lf\n", x);
        }
}

Appendix B C code for unbounded slice sampling

/*
 *    slice.c
 *    Unbounded slice sampling with ease.
 *    $Id: unbounded.tex,v 1.17 2020/01/03 01:23:08 daichi Exp $
 *
 */
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include "random.h"

static double A = 100.0;
typedef double (*likfun)(double x, void *args);

static double
expand (double p)        /* p -> x */
{
        return - A * log (1 / p - 1);
}

static double
shrink (double x)        /* x -> p */
{
        return 1 / (1 + exp(- x / A));
}

static double
expandp (double p)        /* p -> (x > 0) */
{
        return p / (1 - p);
}

static double
shrinkp (double x)        /* (x > 0) -> p */
{
        return  x / (1 + x);
}

double
slice_sample (double x, likfun loglik, void *arg)
{
        double st = 0, ed = 1;
        double r, rnew, slice, newlik;
        int iter, maxiter = 1000;

        r = shrink (x);
        slice = (*loglik)(x, arg) - log (A * r * (1 - r)) + log (RANDOM);

        for (iter = 0; iter < maxiter; iter++)
        {
                rnew = unif (st, ed);
                newlik = (*loglik)(expand(rnew), arg)
                                        - log (A * rnew * (1 - rnew));

                if (newlik > slice)
                        return expand (rnew);
                else if (rnew > r)
                        ed = rnew;
                else if (rnew < r)
                        st = rnew;
                else
                        return x;
        }
        fprintf(stderr, "slice_sample: max iteration %d reached.\n", maxiter);
        return x;
}

double
slice_sample_positive (double x, likfun loglik, void *arg)
{
        double st = 0, ed = 1;
        double r, rnew, slice, newlik;
        int iter, maxiter = 1000;

        r = shrinkp (x);
        slice = (*loglik)(x, arg) - 2 * log (1 - r) + log (RANDOM);

        for (iter = 0; iter < maxiter; iter++)
        {
                rnew = unif (st, ed);
                newlik = (*loglik)(expandp(rnew), arg) - 2 * log (1 - rnew);

                if (newlik > slice)
                        return expandp (rnew);
                else if (rnew > r)
                        ed = rnew;
                else if (rnew < r)
                        st = rnew;
                else
                        return x;
        }
        fprintf(stderr, "slice_sample: max iteration %d reached.\n", maxiter);
        return x;
}

#if 1

double
f (double x, void *arg)
{
        return - x * (x - 1) * (x - 2) * (x - 3.5);
}

double
g (double x, void *arg)
{
        return - 10 * (x + 1000) * (x + 1000);
}

double
h (double p, void *arg)
{
        double a = 2, b = 3;
        return (a - 1) * log (p) + (b - 1) * log (1 - p);
}

int
main (int argc, char *argv[])
{
        int i, N = atoi(argv[1]);
        double x = 0.5;

        for (i = 0; i < N; i++)
        {
                x = slice_sample (x, g, NULL);
                // x = slice_sample_positive (x, f, NULL);
                printf("%.4lf\n", x);
        }
}

#endif

Appendix C MATLAB code for unbounded slice sampling

function xnew = slice_sample (x,likfun,varargin)
% xnew = slice_sample (x,likfun,varargin)
% Unbounded slice sampling in MATLAB.
% $Id: unbounded.tex,v 1.17 2020/01/03 01:23:08 daichi Exp $
global A;
A  = 100;
st = 0; ed = 1;
maxiter = 1000;
% function body
r = shrink (x);
slice = feval (likfun,x,varargin{:}) - log (A * r * (r - 1)) + log (rand());
% slice sampling
for iter = 1:maxiter
  rnew = unif (st,ed);
  xnew = expand (rnew);
  newlik = feval (likfun,xnew,varargin{:}) - log (A * rnew * (rnew - 1));
  if (newlik > slice)
    return;
  elseif (rnew > r)
    ed = rnew;
  elseif (rnew < r)
    st = rnew;
  else
    return;
  end
end
fprintf(stderr,’slice_sample: max iteration %d reached\n’,maxiter);
return;

function p = shrink(x)Ψ% x -> p
global A;
p = 1 / (1 + exp (- x / A));

function x = expand(p)Ψ% p -> x
global A;
x = - A * log (1 / p - 1);

function xnew = slice_sample_positive (x,likfun,varargin)
% xnew = slice_sample_positive (x,likfun,varargin)
% Unbounded slice sampling in MATLAB. (only positive)
% $Id: unbounded.tex,v 1.17 2020/01/03 01:23:08 daichi Exp $
st = 0; ed = 1;
maxiter = 1000;
% function body
r = shrinkp (x);
slice = feval (likfun,x,varargin{:}) - 2 * log (1 - r) + log (rand());
% slice sampling
for iter = 1:maxiter
  rnew = unif (st,ed);
  xnew = expandp (rnew);
  newlik = feval (likfun,xnew,varargin{:}) - 2 * log (1 - rnew);
  if (newlik > slice)
    return;
  elseif (rnew > r)
    ed = rnew;
  elseif (rnew < r)
    st = rnew;
  else
    return;
  end
end
fprintf(stderr,’slice_sample: max iteration %d reached\n’,maxiter);
return;

function p = shrinkp (x)Ψ% (x > 0) -> p
p = x / (1 + x);

function x = expandp (p)Ψ% p -> (x > 0)
x = p / (1 - p);