
Supplement to “Markov Chain Monte Carlo Based on Deterministic Transformations”

Somak Dutta
Department of Statistics
University of Chicago
and
Sourabh Bhattacharya
Bayesian and Interdisciplinary Research Unit
Indian Statistical Institute
Corresponding e-mail: sdutta@galton.uchicago.edu

Throughout, we refer to our main article Dutta and Bhattacharya (2013) as DB.

S-1 Proof of detailed balance for TMCMC

The detailed balance condition is proved as follows. Suppose $\mathbf{y}=T_{\mathbf{z}}(\mathbf{x},\boldsymbol{\epsilon})\in T_{\mathbf{z}}(\mathbf{x},\mathcal{Y})$; then $\mathbf{x}=T_{\mathbf{z}^{c}}(\mathbf{y},\boldsymbol{\epsilon})$. Hence, the kernel $K$ satisfies

\begin{align*}
\pi(\mathbf{x})K(\mathbf{x}\to\mathbf{y}) &= \pi(\mathbf{x})\,P(T_{\mathbf{z}})\,g(\boldsymbol{\epsilon})\min\left\{1,\dfrac{P(T_{\mathbf{z}^{c}})\pi(\mathbf{y})}{P(T_{\mathbf{z}})\pi(\mathbf{x})}\,J_{\mathbf{z}}(\mathbf{x},\boldsymbol{\epsilon})\right\}\\
&= g(\boldsymbol{\epsilon})\,\min\left\{\pi(\mathbf{x})\,P(T_{\mathbf{z}}),\ \pi(\mathbf{y})\,P(T_{\mathbf{z}^{c}})\,J_{\mathbf{z}}(\mathbf{x},\boldsymbol{\epsilon})\right\}
\end{align*}

and

\begin{align*}
\pi(\mathbf{y})K(\mathbf{y}\to\mathbf{x}) &= \pi(\mathbf{y})\,P(T_{\mathbf{z}^{c}})\,g(\boldsymbol{\epsilon})\,J_{\mathbf{z}}(\mathbf{x},\boldsymbol{\epsilon})\min\left\{1,\dfrac{P(T_{\mathbf{z}})\pi(\mathbf{x})}{P(T_{\mathbf{z}^{c}})\pi(\mathbf{y})}\,J_{\mathbf{z}^{c}}(\mathbf{y},\boldsymbol{\epsilon})\right\}\\
&= g(\boldsymbol{\epsilon})\,\min\left\{\pi(\mathbf{y})\,P(T_{\mathbf{z}^{c}})\,J_{\mathbf{z}}(\mathbf{x},\boldsymbol{\epsilon}),\ \pi(\mathbf{x})\,P(T_{\mathbf{z}})\right\}
\end{align*}

where $J_{\mathbf{z}}(\mathbf{x},\boldsymbol{\epsilon})=\left|\partial(T_{\mathbf{z}}(\mathbf{x},\boldsymbol{\epsilon}),\boldsymbol{\epsilon})/\partial(\mathbf{x},\boldsymbol{\epsilon})\right|$ satisfies

\[
J_{\mathbf{z}^{c}}(T_{\mathbf{z}}(\mathbf{x},\boldsymbol{\epsilon}),\boldsymbol{\epsilon})\times J_{\mathbf{z}}(\mathbf{x},\boldsymbol{\epsilon})=1\quad\text{since}\quad T_{\mathbf{z}^{c}}(T_{\mathbf{z}}(\mathbf{x},\boldsymbol{\epsilon}),\boldsymbol{\epsilon})=\mathbf{x}.
\]
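The two expressions above are identical, which establishes detailed balance. As a minimal numerical sanity check, the identity can be verified directly for a one-dimensional additive move $y=x+z\epsilon$, $z\in\{+1,-1\}$, where the Jacobian is identically 1; the standard-normal target, half-normal $g$, and move probabilities below are our own illustrative choices, not DB's.

```python
import numpy as np

def pi(x):                       # target: standard normal, unnormalized
    return np.exp(-0.5 * x**2)

def g(eps):                      # half-normal density of eps on (0, inf)
    return np.sqrt(2.0 / np.pi) * np.exp(-0.5 * eps**2) * (eps > 0)

def kernel_density(x, y, p_fwd, p_bwd):
    """Density of the move x -> y = x + z*eps, z in {+1, -1}; the Jacobian
    is identically 1 for the additive transformation."""
    eps = abs(y - x)             # the eps that carries x to y
    p = p_fwd if y > x else p_bwd            # P(T_z) for the move used
    q = p_bwd if y > x else p_fwd            # P(T_{z^c}) for the reverse move
    alpha = min(1.0, (q * pi(y)) / (p * pi(x)))
    return p * g(eps) * alpha

x, y, p_fwd, p_bwd = 0.3, 1.7, 0.6, 0.4
lhs = pi(x) * kernel_density(x, y, p_fwd, p_bwd)
rhs = pi(y) * kernel_density(y, x, p_fwd, p_bwd)
print(lhs, rhs)                  # the two sides agree up to round-off
```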

S-2 General TMCMC algorithm based on a single $\epsilon$

Algorithm S-2.1
 

General TMCMC algorithm based on a single $\epsilon$.

  • Input: Initial value $\mathbf{x}^{(0)}$, and number of iterations $N$.

  • For $t=0,\ldots,N-1$

    1. Generate $\epsilon\sim g(\cdot)$ and an index $i\sim\mathcal{M}(1;p_{1},\ldots,p_{3^{k}-1})$ independently. Again, actual simulation from the high-dimensional multinomial distribution is not necessary; see Section 3.1 of DB.

    2. Compute
\[
\mathbf{x}^{\prime}=T_{\mathbf{z}_{i}}(\mathbf{x}^{(t)},\epsilon)\quad\text{ and }\quad\alpha(\mathbf{x}^{(t)},\epsilon)=\min\left(1,\dfrac{P(T_{\mathbf{z}^{c}_{i}})}{P(T_{\mathbf{z}_{i}})}\,\dfrac{\pi(\mathbf{x}^{\prime})}{\pi(\mathbf{x}^{(t)})}\,\left|\frac{\partial(T_{\mathbf{z}_{i}}(\mathbf{x}^{(t)},\epsilon),\epsilon)}{\partial(\mathbf{x}^{(t)},\epsilon)}\right|\right)
\]
    3. Set
\[
\mathbf{x}^{(t+1)}=\begin{cases}\mathbf{x}^{\prime}&\text{with probability }\alpha(\mathbf{x}^{(t)},\epsilon)\\ \mathbf{x}^{(t)}&\text{with probability }1-\alpha(\mathbf{x}^{(t)},\epsilon)\end{cases}
\]
  • End for

 
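The following is a minimal runnable sketch of the above algorithm for the additive transformation $T_{\mathbf{z}}(\mathbf{x},\epsilon)=\mathbf{x}+\mathrm{diag}(a_{1},\ldots,a_{k})\,\mathbf{z}\epsilon$, under the simplifying assumption $p_{i}=q_{i}=1/2$ for all $i$, so that every coordinate moves and the move-type probabilities cancel from $\alpha$; the target `log_pi` and all the names are our own illustrative choices.

```python
import numpy as np

def additive_tmcmc(log_pi, x0, n_iter, a=None, rng=None):
    """Additive TMCMC with a single eps (a sketch of Algorithm S-2.1).

    With the additive transformation the Jacobian is 1, and with all
    move-type probabilities equal (p_i = q_i = 1/2, so z_i = +1 or -1)
    the ratio P(T_{z^c}) / P(T_z) cancels from alpha.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    a = np.ones_like(x) if a is None else np.asarray(a, dtype=float)
    chain = np.empty((n_iter, x.size))
    for t in range(n_iter):
        eps = abs(rng.standard_normal())           # eps ~ N(0,1) I(eps > 0)
        z = rng.choice([-1.0, 1.0], size=x.size)   # move type, drawn cheaply
        x_prop = x + z * a * eps                   # T_z(x, eps)
        if np.log(rng.uniform()) < log_pi(x_prop) - log_pi(x):
            x = x_prop
        chain[t] = x
    return chain

# Example: 10-dimensional standard normal target.
chain = additive_tmcmc(lambda x: -0.5 * x @ x, np.zeros(10), 5000)
```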

S-3 Convergence properties of additive TMCMC

In this section we prove some convergence properties of TMCMC in the case of the additive transformation. Before presenting our main result, we first recall some definitions from the MCMC literature.

Definition 1 (Irreducibility)

A Markov transition kernel $K$ is $\varphi$-irreducible, where $\varphi$ is a nontrivial measure, if for every $x\in\mathcal{X}$ and every measurable set $A\subseteq\mathcal{X}$ with $\varphi(A)>0$, there exists $n\in\mathbb{N}$ such that $K^{n}(x,A)>0$.

Definition 2 (Small set)

A measurable subset $E$ of $\mathcal{X}$ is said to be small if there exist an $n\in\mathbb{N}$, a constant $c>0$ (possibly depending on $E$), and a finite measure $\nu$ such that
\[
K^{n}(x,A)\geq c\,\nu(A)\qquad\forall\,A\in\mathcal{B}(\mathcal{X}),\ \forall\,x\in E.
\]
Definition 3 (Aperiodicity)

A Markov kernel $K$ is said to be periodic with period $d>1$ if the state space $\mathcal{X}$ can be partitioned into $d$ disjoint subsets $\mathcal{X}_{1},\mathcal{X}_{2},\ldots,\mathcal{X}_{d}$ with
\[
K(x,\mathcal{X}_{i+1})=1\quad\forall\,x\in\mathcal{X}_{i},\ i=1,2,\ldots,d-1,
\]
and $K(x,\mathcal{X}_{1})=1$ $\forall\,x\in\mathcal{X}_{d}$.

A Markov kernel $K$ is aperiodic if it is periodic with period $d$ for no $d\in\mathbb{N}$.

S-3.1 Additive transformation with singleton $\epsilon$

Consider now the case where $\mathcal{X}=\mathbb{R}^{k}$, $\mathcal{D}=\mathbb{R}$, and $T_{\mathbf{z}}(\mathbf{x},\epsilon)=(x_{1}+z_{1}a_{1}\epsilon,\,x_{2}+z_{2}a_{2}\epsilon,\ldots,x_{k}+z_{k}a_{k}\epsilon)$, where, for $i=1,\ldots,k$, $z_{i}=\pm 1$ and $a_{i}>0$. In this case $\mathcal{Y}=[0,\infty)$. Suppose that $g$ is a density on $\mathcal{Y}$.

Theorem 1

Suppose that $\pi$ is bounded and positive on every compact subset of $\mathbb{R}^{k}$ and that $g$ is positive on every compact subset of $(0,\infty)$. Then the chain is $\lambda$-irreducible and aperiodic. Moreover, every nonempty compact subset of $\mathbb{R}^{k}$ is small.

Proof 1

Without loss of generality we may assume that $a_{i}=1$, $i=1,\ldots,k$. For notational convenience we shall prove the theorem for $k=2$; the general case can be seen to hold with suitably defined `rotational' matrices on $\mathbb{R}^{k}$ similar to (S-3.1).

Suppose $E$ is a nonempty compact subset of $\mathbb{R}^{2}$. Let $C$ be a compact rectangle containing $E$, whose sides are parallel to the diagonals $\{(x,y):|y|=|x|\}$ and for which $\lambda(C)>0$. We shall show that $E$ is small, i.e., that there exists $c>0$ such that
\[
K^{2}(\mathbf{x},A)\geq c\,\lambda_{C}(A)\qquad\forall\,A\in\mathcal{B}(\mathbb{R}^{2})\ \text{and}\ \forall\,\mathbf{x}\in E.
\]

It is clear that the points reachable from $\mathbf{x}$ in two steps are of the form
\[
\begin{pmatrix}x_{1}\pm\epsilon_{1}\pm\epsilon_{2}\\ x_{2}\pm\epsilon_{1}\pm\epsilon_{2}\end{pmatrix},\qquad\epsilon_{1}\geq 0,\ \epsilon_{2}\geq 0.
\]

Thus, if we define the matrices
\begin{equation}
\begin{split}
&M_{1}=\begin{pmatrix}1&1\\ 1&-1\end{pmatrix}\quad M_{2}=\begin{pmatrix}-1&1\\ 1&1\end{pmatrix}\quad M_{3}=\begin{pmatrix}1&-1\\ -1&-1\end{pmatrix}\quad M_{4}=\begin{pmatrix}-1&-1\\ -1&1\end{pmatrix}\\
&\tilde{M}_{1}=\begin{pmatrix}1&1\\ -1&1\end{pmatrix}\quad\tilde{M}_{2}=\begin{pmatrix}1&-1\\ 1&1\end{pmatrix}\quad\tilde{M}_{3}=\begin{pmatrix}-1&1\\ -1&-1\end{pmatrix}\quad\tilde{M}_{4}=\begin{pmatrix}-1&-1\\ 1&-1\end{pmatrix}
\end{split}
\tag{S-3.1}
\end{equation}

then the points reachable from $\mathbf{x}$ in two steps, other than the points lying on the diagonals passing through $\mathbf{x}$ itself, are of the form
\[
\mathbf{x}+M_{i}\begin{pmatrix}\epsilon_{1}\\ \epsilon_{2}\end{pmatrix}\quad\text{ and }\quad\mathbf{x}+\tilde{M}_{i}\begin{pmatrix}\epsilon_{1}\\ \epsilon_{2}\end{pmatrix},\qquad\epsilon_{1}>0,\ \epsilon_{2}>0,\ i=1,\ldots,4.
\]

Define
\[
m=\inf_{\mathbf{y}\in C}\pi(\mathbf{y})>0,\qquad M=\sup_{\mathbf{y}\in C}\pi(\mathbf{y})<\infty,\qquad a=\inf_{0<\epsilon<R}g(\epsilon)>0,
\]
where $R$ is the length of the diagonal of the rectangle $C$ (actually $R/\sqrt{2}$ suffices). Fix an element $\mathbf{x}\in E$. For any set $A\in\mathcal{B}(\mathbb{R}^{2})$, let $A^{*}=A\cap C$ and define

\begin{equation}
\begin{split}
A_{i}&=\{\boldsymbol{\epsilon}\in(0,\infty)^{2}:\mathbf{x}+M_{i}\boldsymbol{\epsilon}\in A^{*}\}\\
\tilde{A}_{i}&=\{\boldsymbol{\epsilon}\in(0,\infty)^{2}:\mathbf{x}+\tilde{M}_{i}\boldsymbol{\epsilon}\in A^{*}\}.
\end{split}
\tag{S-3.2}
\end{equation}

The need for defining such sets is illustrated by the following example: moving from the state $\mathbf{x}$ to a state in $A^{*}$ in two steps, by first making a forward transition in both coordinates and then a forward transition in the first coordinate and a backward transition in the second, is the same as applying the transformation $\mathbf{x}\to\mathbf{x}+M_{1}\boldsymbol{\epsilon}$ for some $\boldsymbol{\epsilon}\in A_{1}$ in two steps, i.e., first
\[
\mathbf{x}\to\mathbf{x}+M_{1}(\epsilon_{1},0)^{T}=\mathbf{x}+(\epsilon_{1},\epsilon_{1})^{T}\quad\text{ and then }\quad\mathbf{x}+M_{1}(\epsilon_{1},0)^{T}\to\mathbf{x}+M_{1}\boldsymbol{\epsilon}.
\]
Also note that for any $\boldsymbol{\epsilon}=(\epsilon_{1},\epsilon_{2})\in A_{i}$, $A^{*}\subset C$ implies that the intermediate point $\mathbf{x}+M_{i}(\epsilon_{1},0)^{T}\in C$, and similarly for $\tilde{A}_{i}$ $(i=1,\ldots,4)$. Now, with $\underline{p}$ and $\bar{p}$ denoting the minimum and maximum of the move probabilities,

\begin{align}
K^{2}(\mathbf{x},A)&\geq K^{2}(\mathbf{x},A^{*})\nonumber\\
&\geq\underline{p}^{2}\sum_{i=1}^{4}\int_{A_{i}}g(\epsilon_{1})g(\epsilon_{2})\min\left\{\frac{\underline{p}\,\pi(\mathbf{x}+M_{i}(\epsilon_{1},0)^{T})}{\bar{p}\,\pi(\mathbf{x})},1\right\}\min\left\{\frac{\underline{p}\,\pi(\mathbf{x}+M_{i}(\epsilon_{1},\epsilon_{2})^{T})}{\bar{p}\,\pi(\mathbf{x}+M_{i}(\epsilon_{1},0)^{T})},1\right\}d\epsilon_{1}\,d\epsilon_{2}\nonumber\\
&\quad+\underline{p}^{2}\sum_{i=1}^{4}\int_{\tilde{A}_{i}}g(\epsilon_{1})g(\epsilon_{2})\min\left\{\frac{\underline{p}\,\pi(\mathbf{x}+\tilde{M}_{i}(\epsilon_{1},0)^{T})}{\bar{p}\,\pi(\mathbf{x})},1\right\}\min\left\{\frac{\underline{p}\,\pi(\mathbf{x}+\tilde{M}_{i}(\epsilon_{1},\epsilon_{2})^{T})}{\bar{p}\,\pi(\mathbf{x}+\tilde{M}_{i}(\epsilon_{1},0)^{T})},1\right\}d\epsilon_{1}\,d\epsilon_{2}\nonumber\\
&\geq\underline{p}^{2}a^{2}\left(\min\left\{\frac{\underline{p}\,m}{\bar{p}\,M},1\right\}\right)^{2}\left(\sum_{i=1}^{4}\lambda(A_{i})+\sum_{i=1}^{4}\lambda(\tilde{A}_{i})\right)\nonumber\\
&=\underline{p}^{2}a^{2}\left(\min\left\{\frac{\underline{p}\,m}{\bar{p}\,M},1\right\}\right)^{2}\times 2\times\sum_{i=1}^{4}\lambda(A_{i}),
\tag{S-3.3}
\end{align}

where the last equality holds because $(\epsilon_{1},\epsilon_{2})\in A_{i}\iff(\epsilon_{2},\epsilon_{1})\in\tilde{A}_{i}$, so that $\lambda(A_{i})=\lambda(\tilde{A}_{i})$. Now notice that if we define, for $i=1,\ldots,4$,

\[
f_{i}:(0,\infty)^{2}\to\mathbb{R}^{2},\qquad\boldsymbol{\epsilon}\mapsto\mathbf{x}+M_{i}\boldsymbol{\epsilon},
\]

and

\[
A_{\mathbf{x}}=\{(\epsilon,0)^{T}:\epsilon>0,\ (x_{1}\pm\epsilon,x_{2}\pm\epsilon)\in A^{*}\},
\]

then,

\[
A^{*}=\bigcup_{i=1}^{4}f_{i}(A_{i}\cup A_{\mathbf{x}})\quad\Longrightarrow\quad\lambda(A^{*})=\sum_{i=1}^{4}\lambda(f_{i}(A_{i}))=2\times\sum_{i=1}^{4}\lambda(A_{i}),
\]

since the $f_{i}(A_{i})$ are pairwise disjoint, $\lambda(f_{i}(A_{\mathbf{x}}))=0$, and $\lambda(f_{i}(A_{i}))=2\lambda(A_{i})$ for $1\leq i\leq 4$ (each $M_{i}$ has $|\det M_{i}|=2$). It follows from (S-3.3) that

\[
K^{2}(\mathbf{x},A)\geq\underline{p}^{2}a^{2}\left(\min\left\{\frac{\underline{p}\,m}{\bar{p}\,M},1\right\}\right)^{2}\lambda(A^{*})=c\,\lambda_{C}(A),
\]
where $c=\underline{p}^{2}a^{2}\left(\min\left\{\frac{\underline{p}\,m}{\bar{p}\,M},1\right\}\right)^{2}>0$. This completes the proof that $E$ is small.

That the chain is irreducible follows easily: for any $\mathbf{x}$, the set $\{\mathbf{x}\}$ is compact, and for a measurable set $A$ with $\lambda(A)>0$ we may choose $C$ in the first part of the proof such that $\lambda(C\cap A)>0$. Then
\[
K^{2}(\mathbf{x},A)\geq c\,\lambda(C\cap A)>0.
\]

Aperiodicity also follows trivially, from the observation that any set with positive $\lambda$-measure can be reached in at most 2 steps.

S-4 General TMCMC algorithm with single $\epsilon$ and dependent $\mathbf{z}$

Let $h_{1}(\mathbf{p})$ and $h_{2}(\mathbf{q})$ be the specified joint distributions of $\mathbf{p}$ and $\mathbf{q}$ induced by the Gaussian distributions of $\mathbf{w}_{1},\mathbf{w}_{2},\mathbf{w}_{3}$, and let $P(\mathbf{z}|\mathbf{p},\mathbf{q})=\prod_{i=1}^{k}f_{i}(z_{i}|p_{i},q_{i})$ denote the conditional probability of $\mathbf{z}$ given $\mathbf{p},\mathbf{q}$, where $f_{i}(\cdot|p_{i},q_{i})$ is the conditional probability of $z_{i}$ given $p_{i}$ and $q_{i}$. The general TMCMC algorithm with singleton $\epsilon$ and dependent $\mathbf{z}$ is then given as follows.

Algorithm S-4.1
 

General TMCMC algorithm based on single $\epsilon$ and dependent $\mathbf{z}$.

  • Input: Initial value $\mathbf{x}^{(0)}$, and number of iterations $N$.

    1. For $t=0,\ldots,N-1$

      (a) Generate $\mathbf{w}_{1}\sim N_{k}(\boldsymbol{\mu}_{1},\boldsymbol{\Sigma}_{1})$, $\mathbf{w}_{2}\sim N_{k}(\boldsymbol{\mu}_{2},\boldsymbol{\Sigma}_{2})$, and $\mathbf{w}_{3}\sim N_{k}(\boldsymbol{\mu}_{3},\boldsymbol{\Sigma}_{3})$.

      (b) For $i=1,\ldots,k$, set $p_{i}=\exp\left(w_{1i}\right)/\sum_{j=1}^{3}\exp\left(w_{ji}\right)$, $q_{i}=\exp\left(w_{2i}\right)/\sum_{j=1}^{3}\exp\left(w_{ji}\right)$, and $1-p_{i}-q_{i}=\exp\left(w_{3i}\right)/\sum_{j=1}^{3}\exp\left(w_{ji}\right)$.

      (c) Generate $\epsilon\sim g(\cdot)$ and an index $i\sim\mathcal{M}(1;p_{1},\ldots,p_{3^{k}-1})$ independently.

    2. Compute
\[
\mathbf{x}^{\prime}=T_{\mathbf{z}_{i}}(\mathbf{x}^{(t)},\epsilon)\quad\text{ and }\quad\alpha(\mathbf{x}^{(t)},\epsilon)=\min\left(1,\dfrac{P(\mathbf{z}^{c}_{i}|\mathbf{p},\mathbf{q})}{P(\mathbf{z}_{i}|\mathbf{p},\mathbf{q})}\,\dfrac{\pi(\mathbf{x}^{\prime})}{\pi(\mathbf{x}^{(t)})}\,\left|\frac{\partial(T_{\mathbf{z}_{i}}(\mathbf{x}^{(t)},\epsilon),\epsilon)}{\partial(\mathbf{x}^{(t)},\epsilon)}\right|\right)
\]
    3. Set
\[
\mathbf{x}^{(t+1)}=\begin{cases}\mathbf{x}^{\prime}&\text{with probability }\alpha(\mathbf{x}^{(t)},\epsilon)\\ \mathbf{x}^{(t)}&\text{with probability }1-\alpha(\mathbf{x}^{(t)},\epsilon)\end{cases}
\]
  • End for

 
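A sketch of steps (a)-(c) above, showing how the dependent move-type probabilities $(p_{i},q_{i},1-p_{i}-q_{i})$ arise from the logistic transformation of three Gaussian vectors. The parameter values are hypothetical, and the index draw is done coordinate-wise rather than by explicitly forming the $(3^{k}-1)$-cell multinomial (the excluded all-`no change' $\mathbf{z}$ can simply be redrawn).

```python
import numpy as np

def draw_move_type(mu, chol, rng):
    """Steps (a)-(b): draw w_j ~ N_k(mu_j, Sigma_j), j = 1, 2, 3, and map
    them to (p_i, q_i, 1 - p_i - q_i) by the logistic transformation; then
    draw z_i in {+1, -1, 0} coordinate-wise with those probabilities."""
    k = mu.shape[1]
    w = np.stack([mu[j] + chol[j] @ rng.standard_normal(k) for j in range(3)])
    ew = np.exp(w - w.max(axis=0))       # numerically stabilized exponentials
    probs = ew / ew.sum(axis=0)          # rows: (p_i), (q_i), (1 - p_i - q_i)
    u = rng.uniform(size=k)
    z = np.where(u < probs[0], 1.0,                      # forward, prob p_i
        np.where(u < probs[0] + probs[1], -1.0, 0.0))    # backward / no change
    return z, probs[0], probs[1]

rng = np.random.default_rng(0)
k = 5
mu = np.zeros((3, k))                    # hypothetical mu_j
chol = [np.eye(k)] * 3                   # hypothetical Sigma_j = I_k
z, p, q = draw_move_type(mu, chol, rng)
```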

S-5 Proof of detailed balance for TMCMC with dependent $\mathbf{z}$

Let $\mathbf{y}=T_{\mathbf{z}}(\mathbf{x},\epsilon)\in T_{\mathbf{z}}(\mathbf{x},\mathcal{Y})$; then $\mathbf{x}=T_{\mathbf{z}^{c}}(\mathbf{y},\epsilon)$. The kernel $K$ satisfies

\begin{align*}
\pi(\mathbf{x})K(\mathbf{x}\to\mathbf{y}) &= \pi(\mathbf{x})\,h_{1}(\mathbf{p})\,h_{2}(\mathbf{q})\,P(\mathbf{z}|\mathbf{p},\mathbf{q})\,g(\epsilon)\min\left\{1,\dfrac{P(\mathbf{z}^{c}|\mathbf{p},\mathbf{q})\pi(\mathbf{y})}{P(\mathbf{z}|\mathbf{p},\mathbf{q})\pi(\mathbf{x})}\,J_{\mathbf{z}}(\mathbf{x},\epsilon)\right\}\\
&= h_{1}(\mathbf{p})\,h_{2}(\mathbf{q})\,g(\epsilon)\min\left\{\pi(\mathbf{x})P(\mathbf{z}|\mathbf{p},\mathbf{q}),\ \pi(\mathbf{y})P(\mathbf{z}^{c}|\mathbf{p},\mathbf{q})\,J_{\mathbf{z}}(\mathbf{x},\epsilon)\right\}
\end{align*}

and

\begin{align*}
\pi(\mathbf{y})K(\mathbf{y}\to\mathbf{x}) &= \pi(\mathbf{y})\,h_{1}(\mathbf{p})\,h_{2}(\mathbf{q})\,P(\mathbf{z}^{c}|\mathbf{p},\mathbf{q})\,g(\epsilon)\,J_{\mathbf{z}}(\mathbf{x},\epsilon)\min\left\{1,\dfrac{P(\mathbf{z}|\mathbf{p},\mathbf{q})\pi(\mathbf{x})}{P(\mathbf{z}^{c}|\mathbf{p},\mathbf{q})\pi(\mathbf{y})}\,J_{\mathbf{z}^{c}}(\mathbf{y},\epsilon)\right\}\\
&= h_{1}(\mathbf{p})\,h_{2}(\mathbf{q})\,g(\epsilon)\min\left\{\pi(\mathbf{y})P(\mathbf{z}^{c}|\mathbf{p},\mathbf{q})\,J_{\mathbf{z}}(\mathbf{x},\epsilon),\ \pi(\mathbf{x})P(\mathbf{z}|\mathbf{p},\mathbf{q})\right\}
\end{align*}

S-6 Improved acceptance rates of additive TMCMC with singleton $\epsilon$ compared to joint updating using RWMH

The joint RWMH algorithm generates $\boldsymbol{\epsilon}=(\epsilon_{1},\ldots,\epsilon_{k})^{\prime}$ with components drawn independently from $N(0,1)$, and then uses a transformation of the form $x^{\prime}_{i}=x_{i}+a_{i}\epsilon_{i}$, where the $a_{i}>0$ are appropriate scaling constants. For large $k$, the so-called “curse of dimensionality” can force the acceptance rate to be close to zero. On the other hand, additive-transformation based TMCMC also updates $(x_{1},\ldots,x_{k})$ simultaneously in a single block, but instead of using $k$ different $\epsilon_{i}$, it uses a single $\epsilon$ for updating all the $x_{i}$ variables. In other words, for TMCMC based on the additive transformation, $\boldsymbol{\epsilon}$ is of the form $(\pm\epsilon,\ldots,\pm\epsilon)^{\prime}$, where $\epsilon\sim N(0,1)\mathbb{I}_{\{\epsilon>0\}}$. Thus, relative to RWMH, the effective dimension in the TMCMC case is reduced to 1, avoiding the curse of dimensionality, so additive TMCMC is expected to have a much higher acceptance rate than RWMH. In this section we formalize and compare the acceptance rates of additive TMCMC and RWMH.

S-6.1 Discussion on optimal scaling and optimal acceptance rate of additive TMCMC and RWMH

A reasonable approach to comparing the acceptance rates of additive TMCMC and RWMH is to develop the optimal scaling theory for additive TMCMC, obtain the optimal acceptance rate, and then compare the latter with the optimal acceptance rates for RWMH, which are already established in the MCMC literature. Indeed, the optimal scaling and optimal acceptance rate of additive TMCMC, and their comparison with those of RWMH, are the subject of Dey and Bhattacharya (2013), where it is shown that additive TMCMC has a much higher optimal acceptance rate than RWMH. Before summarizing the results of Dey and Bhattacharya (2013), we first provide a brief overview of optimal scaling and the optimal acceptance rate of RWMH.

S-6.1.1 Brief overview of optimal scaling and optimal acceptance rate for RWMH

Roughly, the optimal random walk proposal variance, represented as an inverse function of the dimension $k$, is the one that maximizes the speed of convergence to the stationary distribution of a relevant diffusion process to which a `sped-up' version of RWMH converges weakly as the dimension $k$ increases to infinity. The optimal acceptance rate corresponds to the optimal proposal variance. Under various assumptions on the form of the target distribution $\pi$, ranging from the $iid$ assumption (Roberts et al. (1997)), through the independent but non-identical set-up (Bedard (2007)), to a more general dependence structure (Mattingly et al. (2011)), the optimal acceptance rate turns out to be 0.234.

S-6.1.2 Optimal scaling and optimal acceptance rate for additive TMCMC

In Dey and Bhattacharya (2013) it has been proved, in the case of additive TMCMC with $p_{i}=q_{i}=1/2$, that the optimal acceptance rate as $k\rightarrow\infty$ is 0.439 under the set-ups ($iid$, independent but non-identical, and dependent) for which the optimal acceptance rate for RWMH has been studied and established to be 0.234. Thus, the optimal acceptance rate for additive TMCMC is much higher than that of RWMH. The optimal scalings, that is, the optimal values of the scales $a_{1},\ldots,a_{k}$, are also available from the optimal scaling theory. As shown in Dey and Bhattacharya (2013), all these results for additive TMCMC and RWMH remain true in all the aforementioned set-ups even if only some of the co-ordinates of $\mathbf{x}$ are updated at random, conditioning on the remaining co-ordinates.

S-6.2 Comparison between the asymptotic forms of the acceptance rates of additive TMCMC and RWMH for strongly log-concave target densities

The results on optimal scaling and optimal acceptance rate discussed in Sections S-6.1.1 and S-6.1.2 are available only for special forms of the target distribution $\pi$. In this section we obtain the asymptotic forms of the acceptance rates associated with RWMH and additive TMCMC assuming that the target density is strongly log-concave. In particular, under suitable conditions we show that as the dimension increases, the acceptance rate of RWMH converges to zero at a much faster rate than that of additive TMCMC.

Assuming without loss of generality that the marginal variances of the target density $\pi$ are all unity (achieved by suitable scaling, perhaps), for RWMH we consider the following proposed value $\mathbf{x}^{\prime}$ given the current value $\mathbf{x}$: $\mathbf{x}^{\prime}=\mathbf{x}+\boldsymbol{\epsilon}$, where $\boldsymbol{\epsilon}\sim N_{k}(\mathbf{0},\mathbf{I}_{k})$. On the other hand, for additive TMCMC, we consider $\mathbf{x}^{\prime}=\mathbf{x}+\epsilon\boldsymbol{\delta}$, where $\epsilon\sim N(0,1)\mathbb{I}(\epsilon>0)$ and the components $\delta_{i}$ of $\boldsymbol{\delta}$ are $iid$, taking values $\pm 1$ with probability $1/2$ each.

To proceed, we consider the following form of the acceptance rate for our asymptotic framework. Letting $R(\mathbf{x}^{\prime}|\mathbf{x})$ denote the acceptance probability of $\mathbf{x}^{\prime}$ given the current value $\mathbf{x}$, and letting $U\sim\text{Uniform}(0,1)$, the acceptance rate is given by

\begin{align}
AR&=\int R(\mathbf{x}^{\prime}|\mathbf{x})\,q(\mathbf{x}^{\prime}|\mathbf{x})\,\pi(\mathbf{x})\,d\mathbf{x}\,d\mathbf{x}^{\prime}\nonumber\\
&=\int Pr\left(U<R(\mathbf{x}^{\prime}|\mathbf{x})\right)q(\mathbf{x}^{\prime}|\mathbf{x})\,\pi(\mathbf{x})\,d\mathbf{x}\,d\mathbf{x}^{\prime}\nonumber\\
&=\int\left[\int Pr\left(U<R(\mathbf{x}^{\prime}|\mathbf{x})\right)q(\mathbf{x}^{\prime}|\mathbf{x})\,d\mathbf{x}^{\prime}\right]\pi(\mathbf{x})\,d\mathbf{x}\nonumber\\
&=\int\left[\int_{0}^{1}Pr\left(R(\mathbf{x}^{\prime}|\mathbf{x})>u\right)du\right]\pi(\mathbf{x})\,d\mathbf{x}.
\tag{S-6.1}
\end{align}

In the above formula for the acceptance rate, note that $Pr\left(R(\mathbf{x}^{\prime}|\mathbf{x})>u\right)\rightarrow 1$ as $u\rightarrow 0$ and $Pr\left(R(\mathbf{x}^{\prime}|\mathbf{x})>u\right)\rightarrow 0$ as $u\rightarrow 1$. Hence, given any $\eta_{1}>0,\eta_{2}>0$, we can choose $\psi_{1},\psi_{2}\in(0,1)$ sufficiently small such that $\int_{0}^{\psi_{1}}Pr\left(R(\mathbf{x}^{\prime}|\mathbf{x})>u\right)du<\eta_{1}$ and $\int_{1-\psi_{2}}^{1}Pr\left(R(\mathbf{x}^{\prime}|\mathbf{x})>u\right)du<\eta_{2}$. Hence, re-writing (S-6.1) as

\begin{align*}
AR&=\int\left[\int_{0}^{\psi_{1}}Pr\left(R(\mathbf{x}^{\prime}|\mathbf{x})>u\right)du\right]\pi(\mathbf{x})\,d\mathbf{x}+\int\left[\int_{\psi_{1}}^{1-\psi_{2}}Pr\left(R(\mathbf{x}^{\prime}|\mathbf{x})>u\right)du\right]\pi(\mathbf{x})\,d\mathbf{x}\\
&\quad+\int\left[\int_{1-\psi_{2}}^{1}Pr\left(R(\mathbf{x}^{\prime}|\mathbf{x})>u\right)du\right]\pi(\mathbf{x})\,d\mathbf{x},
\end{align*}

we find that the first and third terms on the right-hand side are negligible for any algorithm. So, for the purpose of comparing algorithms with respect to their acceptance rates, we consider only the middle term; in all that follows we denote

\begin{equation}
AR=\int\left[\int_{\psi_{1}}^{1-\psi_{2}}Pr\left(R(\mathbf{x}^{\prime}|\mathbf{x})>u\right)du\right]\pi(\mathbf{x})\,d\mathbf{x}.
\tag{S-6.2}
\end{equation}
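Up to the truncation at $\psi_{1}$ and $1-\psi_{2}$, (S-6.2) is simply the average acceptance probability under stationarity, and can therefore be estimated by plain Monte Carlo. The sketch below, with a spherical normal target and illustrative scalings of our own choosing (the classical $2.4/\sqrt{k}$ factor is used purely for illustration), estimates the untruncated rate for RWMH and additive TMCMC:

```python
import numpy as np

def acceptance_rate(log_pi, propose, k, n_mc=20000, rng=None):
    """Monte Carlo estimate of the mean acceptance probability
    E_pi E_q min{1, pi(x')/pi(x)}: draw x ~ pi, then x' ~ q(.|x),
    and average R(x'|x)."""
    rng = np.random.default_rng(0) if rng is None else rng
    acc = 0.0
    for _ in range(n_mc):
        x = rng.standard_normal(k)        # exact draw from the N_k(0, I) target
        x_new = propose(x, rng)
        acc += min(1.0, float(np.exp(log_pi(x_new) - log_pi(x))))
    return acc / n_mc

log_pi = lambda x: -0.5 * x @ x           # standard normal target (log density)
rwmh  = lambda x, rng: x + 2.4 / np.sqrt(x.size) * rng.standard_normal(x.size)
tmcmc = lambda x, rng: x + (2.4 / np.sqrt(x.size)) * abs(rng.standard_normal()) \
                         * rng.choice([-1.0, 1.0], size=x.size)

for k in (5, 50, 500):
    print(k, acceptance_rate(log_pi, rwmh, k), acceptance_rate(log_pi, tmcmc, k))
```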

For our purpose, we consider a target density $\pi(\mathbf{x})$ of $k$ variables that is strongly log-concave, that is,
\begin{equation}
-M_{k}\mathbf{I}_{k}\leq\nabla^{2}\log\pi(\mathbf{x})\leq-m_{k}\mathbf{I}_{k},
\tag{S-6.3}
\end{equation}

where we assume that $M_{k}=c_{k}+m_{k}$, with $m_{k},c_{k}>0$ for every $k$. We further assume that $m_{k}\rightarrow\infty$ and that the sequence $\{c_{k}\}$ satisfies $c_{k}/m_{k}\rightarrow 0$ as $k\rightarrow\infty$. Then clearly $M_{k}\asymp m_{k}$, meaning $M_{k}/m_{k}\rightarrow 1$ as $k\rightarrow\infty$. In fact, we assume that $M_{k}/m_{k}$ approaches 1 at a sufficiently fast rate, so that $k\left|\frac{M_{k}}{m_{k}}-1\right|\rightarrow 0$. For our purpose we assume that $c_{k}=O(k^{s})$, $s\geq 1$, and $m_{k}=O(k^{t})$, $t>s+1\geq 2$, so that $M_{k}=O(k^{t})$. It is easy to verify that these choices satisfy the above conditions.
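Indeed, for the exact-order choices $c_{k}=k^{s}$ and $m_{k}=k^{t}$ the verification is immediate:
\[
\frac{c_{k}}{m_{k}}=k^{s-t}\rightarrow 0,\qquad
k\left|\frac{M_{k}}{m_{k}}-1\right|=\frac{k\,c_{k}}{m_{k}}=k^{1+s-t}\rightarrow 0,
\]
since $t>s+1$, while $m_{k}=k^{t}\rightarrow\infty$ because $t>2$.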

It is important to note that our assumption $m_{k},M_{k}\rightarrow\infty$ need not hold for all strongly log-concave distributions. For instance, when $\pi$ is the $iid$ product of standard normals, that is, when $\mathbf{x}\sim N_{k}\left(\mathbf{0},\mathbf{I}_{k}\right)$ under $\pi$, $\nabla^{2}\log\pi(\mathbf{x})=-\mathbf{I}_{k}$. In this case, $m_{k}=M_{k}=1$ for every $k\geq 1$. In general, even if $m_{k}$ and $M_{k}$ remain finite as $k$ grows to infinity, our proofs remain valid provided that $M_{k}\asymp m_{k}$ and $k\left|\frac{M_{k}}{m_{k}}-1\right|\rightarrow 0$. The case of $\pi$ an $iid$ product of standard normals clearly satisfies these conditions.

S-6.2.1 Asymptotic form of the acceptance rate for RWMH

Let $\mathbf{x}^{*}$ denote the mode of the target density $\pi(\cdot)$. Then for every $r\in(0,1)$,

\begin{align*}
Pr\left(R(\mathbf{x}^{\prime}|\mathbf{x})<r\right)&=Pr\left(\pi(\mathbf{x}^{\prime})/\pi(\mathbf{x})<r\right)=Pr\left(\log\pi(\mathbf{x}^{\prime})-\log\pi(\mathbf{x})<\log r\right)\\
&=Pr\left(\left[\log\pi(\mathbf{x}^{\prime})-\log\pi(\mathbf{x}^{*})\right]-\left[\log\pi(\mathbf{x})-\log\pi(\mathbf{x}^{*})\right]<\log r\right)\\
&=Pr\left(\left[\nabla\log\pi(\mathbf{x}^{*})^{T}(\mathbf{x}^{\prime}-\mathbf{x}^{*})+(1/2)(\mathbf{x}^{\prime}-\mathbf{x}^{*})^{T}\nabla^{2}\log\pi(\boldsymbol{\xi}_{1}(\mathbf{x}^{\prime},\mathbf{x}^{*}))(\mathbf{x}^{\prime}-\mathbf{x}^{*})\right]\right.\\
&\qquad\left.-\left[\nabla\log\pi(\mathbf{x}^{*})^{T}(\mathbf{x}-\mathbf{x}^{*})+(1/2)(\mathbf{x}-\mathbf{x}^{*})^{T}\nabla^{2}\log\pi(\boldsymbol{\xi}_{2}(\mathbf{x},\mathbf{x}^{*}))(\mathbf{x}-\mathbf{x}^{*})\right]<\log r\right),\\
&\qquad\text{for some }\boldsymbol{\xi}_{1}(\mathbf{x}^{\prime},\mathbf{x}^{*}),\ \boldsymbol{\xi}_{2}(\mathbf{x},\mathbf{x}^{*})\text{ depending upon }(\mathbf{x}^{\prime},\mathbf{x}^{*})\text{ and }(\mathbf{x},\mathbf{x}^{*})\text{ respectively;}\\
&=Pr\left(\left[(1/2)(\mathbf{x}^{\prime}-\mathbf{x}^{*})^{T}\nabla^{2}\log\pi(\boldsymbol{\xi}_{1}(\mathbf{x}^{\prime},\mathbf{x}^{*}))(\mathbf{x}^{\prime}-\mathbf{x}^{*})\right]\right.\\
&\qquad\left.-\left[(1/2)(\mathbf{x}-\mathbf{x}^{*})^{T}\nabla^{2}\log\pi(\boldsymbol{\xi}_{2}(\mathbf{x},\mathbf{x}^{*}))(\mathbf{x}-\mathbf{x}^{*})\right]<\log r\right),
\end{align*}
since $\nabla\log\pi(\mathbf{x}^{*})=\mathbf{0}$.

Thus, from the assumption in (S-6.3), and noting that $(\mathbf{x}^{\prime}-\mathbf{x}^{*})^{T}(\mathbf{x}^{\prime}-\mathbf{x}^{*})=(\mathbf{x}-\mathbf{x}^{*})^{T}(\mathbf{x}-\mathbf{x}^{*})+2(\mathbf{x}-\mathbf{x}^{*})^{T}\boldsymbol{\epsilon}+\boldsymbol{\epsilon}^{T}\boldsymbol{\epsilon}$, it follows that

\begin{equation}
\begin{split}
&Pr\left(\frac{(M_{k}-m_{k})}{2}(\mathbf{x}-\mathbf{x}^{*})^{T}(\mathbf{x}-\mathbf{x}^{*})-m_{k}(\mathbf{x}-\mathbf{x}^{*})^{T}\boldsymbol{\epsilon}-\frac{m_{k}}{2}\boldsymbol{\epsilon}^{T}\boldsymbol{\epsilon}<\log r\right)\\
&\qquad\leq Pr\left(R(\mathbf{x}^{\prime}|\mathbf{x})<r\right)\\
&\qquad\leq Pr\left(-\frac{(M_{k}-m_{k})}{2}(\mathbf{x}-\mathbf{x}^{*})^{T}(\mathbf{x}-\mathbf{x}^{*})-M_{k}(\mathbf{x}-\mathbf{x}^{*})^{T}\boldsymbol{\epsilon}-\frac{M_{k}}{2}\boldsymbol{\epsilon}^{T}\boldsymbol{\epsilon}<\log r\right),
\end{split}
\tag{S-6.4}
\end{equation}

so that

\begin{equation}
\begin{split}
&Pr\left(\frac{(M_{k}-m_{k})}{2}(\mathbf{x}-\mathbf{x}^{*})^{T}(\mathbf{x}-\mathbf{x}^{*})-m_{k}(\mathbf{x}-\mathbf{x}^{*})^{T}\boldsymbol{\epsilon}-\frac{m_{k}}{2}\boldsymbol{\epsilon}^{T}\boldsymbol{\epsilon}>\log r\right)\\
&\qquad\geq Pr\left(R(\mathbf{x}^{\prime}|\mathbf{x})>r\right)\\
&\qquad\geq Pr\left(-\frac{(M_{k}-m_{k})}{2}(\mathbf{x}-\mathbf{x}^{*})^{T}(\mathbf{x}-\mathbf{x}^{*})-M_{k}(\mathbf{x}-\mathbf{x}^{*})^{T}\boldsymbol{\epsilon}-\frac{M_{k}}{2}\boldsymbol{\epsilon}^{T}\boldsymbol{\epsilon}>\log r\right).
\end{split}
\tag{S-6.5}
\end{equation}

Hence, using (S-6.2), the acceptance rate can be bounded above and below as follows:

\begin{equation}
\begin{split}
&\int\left[\int_{\psi_{1}}^{1-\psi_{2}}\left\{\int_{A^{k}_{2,\boldsymbol{\epsilon},u}}\frac{1}{(2\pi)^{k/2}}\exp\left\{-\frac{1}{2}\boldsymbol{\epsilon}^{T}\boldsymbol{\epsilon}\right\}d\boldsymbol{\epsilon}\right\}du\right]\pi(\mathbf{x})\,d\mathbf{x}\\
&\qquad\geq AR^{(RWMH)}\\
&\qquad\geq\int\left[\int_{\psi_{1}}^{1-\psi_{2}}\left\{\int_{A^{k}_{1,\boldsymbol{\epsilon},u}}\frac{1}{(2\pi)^{k/2}}\exp\left\{-\frac{1}{2}\boldsymbol{\epsilon}^{T}\boldsymbol{\epsilon}\right\}d\boldsymbol{\epsilon}\right\}du\right]\pi(\mathbf{x})\,d\mathbf{x},
\end{split}
\tag{S-6.6}
\end{equation}

where

\[
A^{k}_{1,\boldsymbol{\epsilon},u}=\left\{\mathbf{x}:-\frac{(M_{k}-m_{k})}{2}(\mathbf{x}-\mathbf{x}^{*})^{T}(\mathbf{x}-\mathbf{x}^{*})-M_{k}(\mathbf{x}-\mathbf{x}^{*})^{T}\boldsymbol{\epsilon}-\frac{M_{k}}{2}\boldsymbol{\epsilon}^{T}\boldsymbol{\epsilon}>\log u\right\}
\]

and

\[
A^{k}_{2,\boldsymbol{\epsilon},u}=\left\{\mathbf{x}:\frac{(M_{k}-m_{k})}{2}(\mathbf{x}-\mathbf{x}^{*})^{T}(\mathbf{x}-\mathbf{x}^{*})-m_{k}(\mathbf{x}-\mathbf{x}^{*})^{T}\boldsymbol{\epsilon}-\frac{m_{k}}{2}\boldsymbol{\epsilon}^{T}\boldsymbol{\epsilon}>\log u\right\}.
\]

Now, note that for some $\boldsymbol{\xi}(\mathbf{x},\mathbf{x}^{*})$ depending upon $\mathbf{x}$ and $\mathbf{x}^{*}$,

\begin{align*}
\pi(\mathbf{x})&=\exp\left\{\log\pi(\mathbf{x})\right\}\\
&=\exp\left\{\log\pi(\mathbf{x}^{*})+\frac{1}{2}(\mathbf{x}-\mathbf{x}^{*})^{T}\nabla^{2}\log\pi(\boldsymbol{\xi}(\mathbf{x},\mathbf{x}^{*}))(\mathbf{x}-\mathbf{x}^{*})\right\},
\end{align*}

so that the inequalities expressing strong log-concavity, given by (S-6.3), yield

\begin{equation}
\begin{split}
&\frac{(2\pi)^{k/2}}{m^{k}_{k}}\pi(\mathbf{x}^{*})\,\frac{m^{k}_{k}}{(2\pi)^{k/2}}\exp\left\{-\frac{m_{k}}{2}(\mathbf{x}-\mathbf{x}^{*})^{T}(\mathbf{x}-\mathbf{x}^{*})\right\}\geq\pi(\mathbf{x})\\
&\qquad\geq\frac{(2\pi)^{k/2}}{M^{k}_{k}}\pi(\mathbf{x}^{*})\,\frac{M^{k}_{k}}{(2\pi)^{k/2}}\exp\left\{-\frac{M_{k}}{2}(\mathbf{x}-\mathbf{x}^{*})^{T}(\mathbf{x}-\mathbf{x}^{*})\right\}.
\end{split}
\tag{S-6.8}
\end{equation}

Using the lower bound of $\pi(\mathbf{x})$ given by (S-6.8) and Fubini's theorem, the lower bound of the acceptance rate given by (S-6.6) can be further bounded below as

\begin{align}
AR^{(RWMH)}&\geq\int\int_{\psi_{1}}^{1-\psi_{2}}\int_{A^{k}_{1,\boldsymbol{\epsilon},u}}\frac{1}{(2\pi)^{k/2}}\exp\left\{-\frac{1}{2}\boldsymbol{\epsilon}^{T}\boldsymbol{\epsilon}\right\}\pi(\mathbf{x})\,d\mathbf{x}\,du\,d\boldsymbol{\epsilon}\nonumber\\
&\geq\frac{(2\pi)^{k/2}}{M^{k}_{k}}\pi(\mathbf{x}^{*})\int\int_{\psi_{1}}^{1-\psi_{2}}\int_{A^{k}_{1,\boldsymbol{\epsilon},u}}\frac{1}{(2\pi)^{k/2}}\exp\left\{-\frac{1}{2}\boldsymbol{\epsilon}^{T}\boldsymbol{\epsilon}\right\}\frac{M^{k}_{k}}{(2\pi)^{k/2}}\exp\left\{-\frac{M_{k}}{2}(\mathbf{x}-\mathbf{x}^{*})^{T}(\mathbf{x}-\mathbf{x}^{*})\right\}d\mathbf{x}\,du\,d\boldsymbol{\epsilon}\nonumber\\
&\geq\frac{(2\pi)^{k/2}}{M^{k}_{k}}\pi(\mathbf{x}^{*})\inf_{u\in(\psi_{1},1-\psi_{2})}Pr\left(A^{k}_{1,\boldsymbol{\epsilon},u}\right),
\tag{S-6.9}
\end{align}

where $Pr\left(A^{k}_{1,\boldsymbol{\epsilon},u}\right)$ must be calculated with respect to $\boldsymbol{\epsilon}\sim N_{k}(\mathbf{0},\mathbf{I}_{k})$ and, independently, $\mathbf{x}-\mathbf{x}^{*}\sim N_{k}(\mathbf{0},M^{-1}_{k}\mathbf{I}_{k})$.

Similarly, using the upper bound of $\pi(\mathbf{x})$ given by (S-6.8), the upper bound of the acceptance rate given by (S-6.6) can be further bounded above as

\begin{align}
AR^{(RWMH)}&\leq\int\int_{\psi_{1}}^{1-\psi_{2}}\int_{A^{k}_{2,\boldsymbol{\epsilon},u}}\frac{1}{(2\pi)^{k/2}}\exp\left\{-\frac{1}{2}\boldsymbol{\epsilon}^{T}\boldsymbol{\epsilon}\right\}\pi(\mathbf{x})\,d\mathbf{x}\,du\,d\boldsymbol{\epsilon}\nonumber\\
&\leq\frac{(2\pi)^{k/2}}{m^{k}_{k}}\pi(\mathbf{x}^{*})\int_{\psi_{1}}^{1-\psi_{2}}Pr\left(A^{k}_{2,\boldsymbol{\epsilon},u}\right)du\nonumber\\
&\leq\frac{(2\pi)^{k/2}}{m^{k}_{k}}\pi(\mathbf{x}^{*})\sup_{u\in(\psi_{1},1-\psi_{2})}Pr\left(A^{k}_{2,\boldsymbol{\epsilon},u}\right).
\tag{S-6.10}
\end{align}

The probability $Pr\left(A^{k}_{2,\boldsymbol{\epsilon},u}\right)$ must be calculated with respect to $\boldsymbol{\epsilon}\sim N_{k}(\mathbf{0},\mathbf{I}_{k})$ and, independently, $\mathbf{x}-\mathbf{x}^{*}\sim N_{k}(\mathbf{0},m^{-1}_{k}\mathbf{I}_{k})$. Thus, we have

\begin{equation}
\frac{(2\pi)^{k/2}}{M^{k}_{k}}\pi(\mathbf{x}^{*})\inf_{u\in(\psi_{1},1-\psi_{2})}Pr\left(A^{k}_{1,\boldsymbol{\epsilon},u}\right)\leq AR^{(RWMH)}\leq\frac{(2\pi)^{k/2}}{m^{k}_{k}}\pi(\mathbf{x}^{*})\sup_{u\in(\psi_{1},1-\psi_{2})}Pr\left(A^{k}_{2,\boldsymbol{\epsilon},u}\right).
\tag{S-6.11}
\end{equation}

We first focus on the lower bound in (S-6.11). As $k\rightarrow\infty$,
\begin{equation}
-\frac{(M_{k}-m_{k})}{2}(\mathbf{x}-\mathbf{x}^{*})^{T}(\mathbf{x}-\mathbf{x}^{*})-M_{k}(\mathbf{x}-\mathbf{x}^{*})^{T}\boldsymbol{\epsilon}-\frac{M_{k}}{2}\boldsymbol{\epsilon}^{T}\boldsymbol{\epsilon}\sim AN\left(-\frac{k}{2}\left[\left(\frac{M_{k}-m_{k}}{M_{k}}\right)+M_{k}\right],\ \frac{k}{2}\left[\left(\frac{M_{k}-m_{k}}{M_{k}}\right)^{2}+2M_{k}+M^{2}_{k}\right]\right),
\tag{S-6.12}
\end{equation}

where $AN(\mu,\sigma^{2})$ denotes the asymptotic normal distribution with mean $\mu$ and variance $\sigma^{2}$. From (S-6.12) it follows that

\begin{align}
\inf_{u\in(\psi_{1},1-\psi_{2})}Pr\left(A^{k}_{1,\boldsymbol{\epsilon},u}\right)&\asymp 1-\sup_{u\in(\psi_{1},1-\psi_{2})}\Phi\left(\frac{\log u+\frac{k}{2}\left[\left(\frac{M_{k}-m_{k}}{M_{k}}\right)+M_{k}\right]}{\sqrt{\frac{k}{2}\left[\left(\frac{M_{k}-m_{k}}{M_{k}}\right)^{2}+2M_{k}+M^{2}_{k}\right]}}\right)\nonumber\\
&=1-\Phi\left(\frac{\log(1-\psi_{2})+\frac{k}{2}\left[\left(\frac{M_{k}-m_{k}}{M_{k}}\right)+M_{k}\right]}{\sqrt{\frac{k}{2}\left[\left(\frac{M_{k}-m_{k}}{M_{k}}\right)^{2}+2M_{k}+M^{2}_{k}\right]}}\right).
\tag{S-6.13}
\end{align}

Combining (S-6.9) and (S-6.13) we obtain

\begin{equation}
AR^{(RWMH)}\geq\frac{(2\pi)^{k/2}}{M^{k}_{k}}\pi(\mathbf{x}^{*})\left\{1-\Phi\left(\frac{\log(1-\psi_{2})+\frac{k}{2}\left[\left(\frac{M_{k}-m_{k}}{M_{k}}\right)+M_{k}\right]}{\sqrt{\frac{k}{2}\left[\left(\frac{M_{k}-m_{k}}{M_{k}}\right)^{2}+2M_{k}+M^{2}_{k}\right]}}\right)\right\}.
\tag{S-6.14}
\end{equation}

Focusing now on the upper bound of $AR^{(RWMH)}$, we similarly obtain

\begin{equation*}
AR^{(RWMH)}\leq\frac{(2\pi)^{k/2}}{m^{k}_{k}}\pi(\mathbf{x}^{*})\left\{1-\Phi\left(\frac{\log\psi_{1}-\frac{k}{2}\left[\left(\frac{M_{k}-m_{k}}{m_{k}}\right)-m_{k}\right]}{\sqrt{\frac{k}{2}\left[\left(\frac{M_{k}-m_{k}}{m_{k}}\right)^{2}+2m_{k}+m^{2}_{k}\right]}}\right)\right\}.
\end{equation*}

In other words,

\begin{align*}
\frac{(2\pi)^{k/2}}{M^{k}_{k}}\pi(\mathbf{x}^{*})\left\{1-\Phi\left(\frac{\log(1-\psi_{2})+\frac{k}{2}\left[\left(\frac{M_{k}-m_{k}}{M_{k}}\right)+M_{k}\right]}{\sqrt{\frac{k}{2}\left[\left(\frac{M_{k}-m_{k}}{M_{k}}\right)^{2}+2M_{k}+M^{2}_{k}\right]}}\right)\right\}&\leq AR^{(RWMH)}\\
\leq\frac{(2\pi)^{k/2}}{m^{k}_{k}}\pi(\mathbf{x}^{*})\left\{1-\Phi\left(\frac{\log\psi_{1}-\frac{k}{2}\left[\left(\frac{M_{k}-m_{k}}{m_{k}}\right)-m_{k}\right]}{\sqrt{\frac{k}{2}\left[\left(\frac{M_{k}-m_{k}}{m_{k}}\right)^{2}+2m_{k}+m^{2}_{k}\right]}}\right)\right\}.&
\end{align*}

Since $m_{k}\asymp M_{k}$, it is easy to see that

\begin{align*}
\frac{\log(1-\psi_{2})+\frac{k}{2}\left[\left(\frac{M_{k}-m_{k}}{M_{k}}\right)+M_{k}\right]}{\sqrt{\frac{k}{2}\left[\left(\frac{M_{k}-m_{k}}{M_{k}}\right)^{2}+2M_{k}+M^{2}_{k}\right]}}&\asymp\sqrt{\frac{k}{2}},\quad\text{and}\\
\frac{\log\psi_{1}-\frac{k}{2}\left[\left(\frac{M_{k}-m_{k}}{m_{k}}\right)-m_{k}\right]}{\sqrt{\frac{k}{2}\left[\left(\frac{M_{k}-m_{k}}{m_{k}}\right)^{2}+2m_{k}+m^{2}_{k}\right]}}&\asymp\sqrt{\frac{k}{2}}.
\end{align*}

Hence, it follows that

\begin{equation}
AR^{(RWMH)}\asymp\frac{(2\pi)^{k/2}}{M^{k}_{k}}\left\{1-\Phi\left(\sqrt{\frac{k}{2}}\right)\right\}.
\tag{S-6.17}
\end{equation}
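By the standard Mills-ratio asymptotics $1-\Phi(z)\sim\phi(z)/z$ as $z\rightarrow\infty$, the tail factor in (S-6.17) can be made explicit:
\[
1-\Phi\left(\sqrt{\frac{k}{2}}\right)\sim\frac{\phi\left(\sqrt{k/2}\right)}{\sqrt{k/2}}=\frac{e^{-k/4}}{\sqrt{\pi k}},
\]
so the RWMH acceptance rate decays exponentially fast in $k$, on top of the $(2\pi)^{k/2}/M^{k}_{k}$ factor.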

S-6.2.2 Asymptotic bounds of the acceptance rate for additive TMCMC

Next, let us obtain lower and upper bounds of $AR^{(TMCMC)}$ associated with TMCMC based on the additive transformation. Recall that in this case $\mathbf{x}^{\prime}=\mathbf{x}+\epsilon\boldsymbol{\delta}$, where $\epsilon\sim N(0,1)\mathbb{I}(\epsilon>0)$ and the components $\delta_{i}$ of $\boldsymbol{\delta}$ are $iid$, taking values $\pm 1$ with probability $1/2$ each. In this set-up, (S-6.5) becomes

\begin{equation}
\begin{split}
&Pr\left(\frac{(M_{k}-m_{k})}{2}(\mathbf{x}-\mathbf{x}^{*})^{T}(\mathbf{x}-\mathbf{x}^{*})-m_{k}\epsilon(\mathbf{x}-\mathbf{x}^{*})^{T}\boldsymbol{\delta}-\frac{m_{k}}{2}k\epsilon^{2}>\log r\right)\\
&\qquad\geq Pr\left(R(\mathbf{x}^{\prime}|\mathbf{x})>r\right)\\
&\qquad\geq Pr\left(-\frac{(M_{k}-m_{k})}{2}(\mathbf{x}-\mathbf{x}^{*})^{T}(\mathbf{x}-\mathbf{x}^{*})-M_{k}\epsilon(\mathbf{x}-\mathbf{x}^{*})^{T}\boldsymbol{\delta}-\frac{M_{k}}{2}k\epsilon^{2}>\log r\right).
\end{split}
\tag{S-6.18}
\end{equation}

Now notice that, under the lower bound of $\pi(\mathbf{x})$ provided in (S-6.8), as $k\rightarrow\infty$,
\[
\frac{M_{k}(\mathbf{x}-\mathbf{x}^{*})^{T}(\mathbf{x}-\mathbf{x}^{*})}{k}\stackrel{a.s.}{\longrightarrow}1,
\]

and

\[
\frac{\sqrt{M_{k}}(\mathbf{x}-\mathbf{x}^{*})^{T}\boldsymbol{\delta}}{k}\stackrel{a.s.}{\longrightarrow}0.
\]

Similarly, under the upper bound of $\pi(\mathbf{x})$ in (S-6.8), the above hold with $M_{k}$ replaced by $m_{k}$. From these results it follows that the asymptotic forms of the lower and upper bounds in (S-6.18), integrated against $\pi(\mathbf{x})$, are given by

\[
\int Pr\left(-\frac{(M_{k}-m_{k})}{2}(\mathbf{x}-\mathbf{x}^{*})^{T}(\mathbf{x}-\mathbf{x}^{*})-M_{k}\epsilon(\mathbf{x}-\mathbf{x}^{*})^{T}\boldsymbol{\delta}-\frac{M_{k}}{2}k\epsilon^{2}>\log r\right)\pi(\mathbf{x})\,d\mathbf{x}\asymp\frac{(2\pi)^{k/2}}{M^{k}_{k}}\pi(\mathbf{x}^{*})\left\{2\Phi\left(\sqrt{-\frac{2}{kM_{k}}\log r-\left(\frac{M_{k}-m_{k}}{M^{2}_{k}}\right)}\right)-1\right\}
\]

and

\[
\int Pr\left(\frac{(M_{k}-m_{k})}{2}(\mathbf{x}-\mathbf{x}^{*})^{T}(\mathbf{x}-\mathbf{x}^{*})-m_{k}\epsilon(\mathbf{x}-\mathbf{x}^{*})^{T}\boldsymbol{\delta}-\frac{m_{k}}{2}k\epsilon^{2}>\log r\right)\pi(\mathbf{x})\,d\mathbf{x}\asymp\frac{(2\pi)^{k/2}}{m^{k}_{k}}\pi(\mathbf{x}^{*})\left\{2\Phi\left(\sqrt{-\frac{2}{km_{k}}\log r+\left(\frac{M_{k}-m_{k}}{m^{2}_{k}}\right)}\right)-1\right\}.
\]

Using the above results, it follows as in the case of $AR^{(RWMH)}$ that

\begin{align*}
&\frac{(2\pi)^{k/2}}{M^{k}_{k}}\pi(\mathbf{x}^{*})\left\{2\inf_{u\in(\psi_{1},1-\psi_{2})}\Phi\left(\sqrt{-\frac{2}{kM_{k}}\log u-\left(\frac{M_{k}-m_{k}}{M^{2}_{k}}\right)}\right)-1\right\}\\
&\qquad\leq AR^{(TMCMC)}\leq\frac{(2\pi)^{k/2}}{m^{k}_{k}}\pi(\mathbf{x}^{*})\left\{2\sup_{u\in(\psi_{1},1-\psi_{2})}\Phi\left(\sqrt{-\frac{2}{km_{k}}\log u-\left(\frac{m_{k}-M_{k}}{m^{2}_{k}}\right)}\right)-1\right\}.
\end{align*}

Substituting the infimum and supremum over $(\psi_{1},1-\psi_{2})$, we obtain

\begin{align*}
&\frac{(2\pi)^{k/2}}{M^{k}_{k}}\pi(\mathbf{x}^{*})\left\{2\Phi\left(\sqrt{-\frac{2}{kM_{k}}\log(1-\psi_{2})-\left(\frac{M_{k}-m_{k}}{M^{2}_{k}}\right)}\right)-1\right\}\\
&\qquad\leq AR^{(TMCMC)}\leq\frac{(2\pi)^{k/2}}{m^{k}_{k}}\pi(\mathbf{x}^{*})\left\{2\Phi\left(\sqrt{-\frac{2}{km_{k}}\log\psi_{1}-\left(\frac{m_{k}-M_{k}}{m^{2}_{k}}\right)}\right)-1\right\}.
\end{align*}

Since $k\left|\frac{M_{k}}{m_{k}}-1\right|\rightarrow 0$ and $m_{k}\asymp M_{k}$, it follows that

\begin{align*}
-\frac{2}{kM_{k}}\log(1-\psi_{2})-\left(\frac{M_{k}-m_{k}}{M^{2}_{k}}\right)&\asymp-\frac{2}{kM_{k}}\log(1-\psi_{2}),\quad\text{and}\\
-\frac{2}{km_{k}}\log\psi_{1}-\left(\frac{m_{k}-M_{k}}{m^{2}_{k}}\right)&\asymp-\frac{2}{km_{k}}\log\psi_{1}\asymp-\frac{2}{kM_{k}}\log\psi_{1}.
\end{align*}

Hence,

\begin{equation}
\frac{(2\pi)^{k/2}}{M^{k}_{k}}\pi(\mathbf{x}^{*})\left\{2\Phi\left(\sqrt{-\frac{2}{kM_{k}}\log(1-\psi_{2})}\right)-1\right\}\leq AR^{(TMCMC)}\leq\frac{(2\pi)^{k/2}}{M^{k}_{k}}\pi(\mathbf{x}^{*})\left\{2\Phi\left(\sqrt{-\frac{2}{kM_{k}}\log\psi_{1}}\right)-1\right\}.
\tag{S-6.19}
\end{equation}

Comparing (S-6.19) with (S-6.17), where $M_{k}=O\left(k^{t}\right)$ with $t>2$, it can easily be verified using L'Hospital's rule that for any $\zeta_{1}>0$, $\zeta_{2}>0$,

\begin{equation}
\frac{2\Phi\left(\frac{\zeta_{1}}{\sqrt{kM_{k}}}\right)-1}{1-\Phi\left(\zeta_{2}\sqrt{k}\right)}\rightarrow\infty.
\tag{S-6.20}
\end{equation}

The above result continues to hold if, instead of $M_{k}=O\left(k^{t}\right)$ with $t>2$, we have $M_{k}\rightarrow a$ for some constant $a>0$. Hence, $AR^{(TMCMC)}$ converges to zero at a much slower rate than $AR^{(RWMH)}$.
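To spell out the verification of (S-6.20): a first-order Taylor expansion of $\Phi$ at 0 and the Mills ratio give
\[
2\Phi\left(\frac{\zeta_{1}}{\sqrt{kM_{k}}}\right)-1\sim\frac{2\phi(0)\,\zeta_{1}}{\sqrt{kM_{k}}},\qquad
1-\Phi\left(\zeta_{2}\sqrt{k}\right)\sim\frac{\phi\left(\zeta_{2}\sqrt{k}\right)}{\zeta_{2}\sqrt{k}},
\]
so the ratio in (S-6.20) grows like $\sqrt{k/M_{k}}\,e^{\zeta^{2}_{2}k/2}$, which diverges for any polynomially growing (or constant) $M_{k}$.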

S-7 Comparison of TMCMC with HMC

Motivated by Hamiltonian dynamics, Duane et al. (1987) introduced HMC, an MCMC algorithm with deterministic proposals based on approximations of the Hamiltonian equations. We will show that this algorithm is a special case of TMCMC, but first we provide a brief overview of HMC. More details can be found in Liu (2001), Cheung and Beck (2009) and the references therein.

S-7.1 Overview of HMC

If $\pi(\mathbf{x})$ is the target distribution, a fictitious dynamical system may be considered, where $\mathbf{x}(t)\in\mathbb{R}^{d}$ can be thought of as the $d$-dimensional position vector of a body of particles at time $t$. If $\mathbf{v}(t)=\dot{\mathbf{x}}(t)=\frac{d\mathbf{x}}{dt}$ is the velocity vector of the particles, then $\dot{\mathbf{v}}(t)=\frac{d\mathbf{v}}{dt}$ is its acceleration vector, and if $\vec{F}$ is the force exerted on the particles, then by Newton's law of motion $\vec{F}=\mathbf{m}\dot{\mathbf{v}}(t)=(m_{1}\dot{v}_{1},\ldots,m_{d}\dot{v}_{d})(t)$, where $\mathbf{m}\in\mathbb{R}^{d}$ is a mass vector. The momentum vector $\mathbf{p}=\mathbf{m}\mathbf{v}$, often used in classical mechanics, can be thought of as a vector of auxiliary variables introduced to facilitate simulation from $\pi(\mathbf{x})$. The kinetic energy of the system is defined as $W(\mathbf{p})=\mathbf{p}^{\prime}\mathbf{M}^{-1}\mathbf{p}/2$, $\mathbf{M}$ being the mass matrix; usually, $\mathbf{M}$ is taken as $\mathbf{M}=\mathrm{diag}\{m_{1},\ldots,m_{d}\}$.

The target density $\pi(\mathbf{x})$ is linked to the dynamical system via the potential energy of the system, defined as $U(\mathbf{x})=-\log\pi(\mathbf{x})$. The total energy (the Hamiltonian function) is given by $H(\mathbf{x},\mathbf{p})=U(\mathbf{x})+W(\mathbf{p})$. A joint distribution over the phase space $(\mathbf{x},\mathbf{p})$ is then considered, given by

\begin{equation}
f(\mathbf{x},\mathbf{p})\propto\exp\left\{-H(\mathbf{x},\mathbf{p})\right\}=\pi(\mathbf{x})\exp\left(-\mathbf{p}^{\prime}\mathbf{M}^{-1}\mathbf{p}/2\right).
\tag{S-7.1}
\end{equation}

Since the marginal density of $\mathbf{x}$ under $f(\mathbf{x},\mathbf{p})$ is $\pi(\mathbf{x})$, it now remains to provide a proposal mechanism for simulating $(\mathbf{x},\mathbf{p})$ jointly; ignoring $\mathbf{p}$ then yields $\mathbf{x}$ marginally from $\pi(\cdot)$.

For the joint proposal mechanism, HMC makes use of Newton’s law of motion, derived from the law of conservation of energy, and often written in the form of Hamiltonian equations, given by

\begin{align*}
\dot{\mathbf{x}}(t)&=\frac{\partial H(\mathbf{x},\mathbf{p})}{\partial\mathbf{p}}=\mathbf{M}^{-1}\mathbf{p},\\
\dot{\mathbf{p}}(t)&=-\frac{\partial H(\mathbf{x},\mathbf{p})}{\partial\mathbf{x}}=-\nabla U(\mathbf{x}),
\end{align*}

where $\nabla U(\mathbf{x})=\frac{\partial U(\mathbf{x})}{\partial\mathbf{x}}$. The Hamiltonian equations can be approximated by the commonly used leap-frog algorithm (Hockney (1970)), given by

\begin{align}
\mathbf{x}(t+\delta t)&=\mathbf{x}(t)+\delta t\,\mathbf{M}^{-1}\left\{\mathbf{p}(t)-\frac{\delta t}{2}\nabla U\left(\mathbf{x}(t)\right)\right\}
\tag{S-7.2}\\
\mathbf{p}(t+\delta t)&=\mathbf{p}(t)-\frac{\delta t}{2}\left\{\nabla U\left(\mathbf{x}(t)\right)+\nabla U\left(\mathbf{x}(t+\delta t)\right)\right\}
\tag{S-7.3}
\end{align}

Given choices of $\mathbf{M}$, $\delta t$, and $L$, HMC is given by the following algorithm:

Algorithm S-7.1
 

HMC  

  • Initialise $\mathbf{x}$ and draw $\mathbf{p}\sim N(\mathbf{0},\mathbf{M})$.

  • Assuming the current state to be $(\mathbf{x},\mathbf{p})$, do the following:

    1. Generate $\mathbf{p}^{\prime}\sim N\left(\mathbf{0},\mathbf{M}\right)$;

    2. Letting $(\mathbf{x}(0),\mathbf{p}(0))=(\mathbf{x},\mathbf{p}^{\prime})$, run the leap-frog algorithm for $L$ time steps, to yield $(\mathbf{x}^{\prime\prime},\mathbf{p}^{\prime\prime})=\left(\mathbf{x}(L\delta t),\mathbf{p}(L\delta t)\right)$;

    3. Accept $(\mathbf{x}^{\prime\prime},\mathbf{p}^{\prime\prime})$ with probability
\begin{equation}
\min\left\{1,\exp\left\{-H(\mathbf{x}^{\prime\prime},\mathbf{p}^{\prime\prime})+H(\mathbf{x},\mathbf{p}^{\prime})\right\}\right\},
\tag{S-7.4}
\end{equation}

      and accept $(\mathbf{x},\mathbf{p}^{\prime})$ with the remaining probability.

 
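A compact runnable sketch of Algorithm S-7.1 with $\mathbf{M}=\mathbf{I}_{k}$, using the leap-frog updates (S-7.2)-(S-7.3); the quadratic example potential and all tuning values ($\delta t$, $L$) are our own illustrative choices.

```python
import numpy as np

def leapfrog(x, p, grad_U, dt, L):
    """L leap-frog steps (S-7.2)-(S-7.3), with M = I."""
    for _ in range(L):
        p_half = p - 0.5 * dt * grad_U(x)   # half step in momentum
        x = x + dt * p_half                 # full step in position (S-7.2)
        p = p_half - 0.5 * dt * grad_U(x)   # second half step (S-7.3)
    return x, p

def hmc(U, grad_U, x0, n_iter, dt=0.1, L=20, rng=None):
    """Sketch of Algorithm S-7.1 with M = I_k: refresh p ~ N(0, I),
    run the leap-frog map, and accept with probability
    min{1, exp(-H(x'', p'') + H(x, p'))}."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    chain = np.empty((n_iter, x.size))
    for t in range(n_iter):
        p = rng.standard_normal(x.size)
        x_new, p_new = leapfrog(x, p, grad_U, dt, L)
        dH = (U(x_new) + 0.5 * p_new @ p_new) - (U(x) + 0.5 * p @ p)
        if np.log(rng.uniform()) < -dH:
            x = x_new
        chain[t] = x
    return chain

# Example: U(x) = -log pi(x) for a standard normal target.
chain = hmc(lambda x: 0.5 * x @ x, lambda x: x, np.zeros(10), 2000)
```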

In the above algorithm, it is not required to store the simulations of $\mathbf{p}$. Next we show that HMC is a special case of TMCMC.

S-7.2 HMC is a special case of TMCMC

To see that HMC is a special case of TMCMC, note that the leap-frog step of the HMC algorithm (Algorithm S-7.1) is actually a deterministic transformation of the form $g^{L}:(\mathbf{x}(0),\mathbf{p}(0))\to(\mathbf{x}(L),\mathbf{p}(L))$ (see Liu (2001)). This transformation satisfies the following: if $(\mathbf{x}^{\prime},\mathbf{p}^{\prime})=g^{L}(\mathbf{x},\mathbf{p})$, then $(\mathbf{x},-\mathbf{p})=g^{L}(\mathbf{x}^{\prime},-\mathbf{p}^{\prime})$.

The Jacobian of this transformation is 1 because of the volume-preservation property, which says that if $V(0)$ is a subset of the phase space, and if $V(t)=\left\{(\mathbf{x}(t),\mathbf{p}(t)):(\mathbf{x}(0),\mathbf{p}(0))\in V(0)\right\}$, then the volume $|V(t)|=\int\!\!\int_{V(t)}d\mathbf{x}\,d\mathbf{p}=\int\!\!\int_{V(0)}d\mathbf{x}\,d\mathbf{p}=|V(0)|$. As a result, the Jacobian does not feature in the HMC acceptance probability (S-7.4).
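The volume-preservation claim is easy to check numerically: differentiate one leap-frog step, viewed as a map on phase space, and confirm that the determinant of its Jacobian is 1. The sketch below does this by finite differences for the hypothetical one-dimensional potential $U(x)=x^{2}/2$:

```python
import numpy as np

def leapfrog_step(state, dt=0.1):
    """One leap-frog step for the hypothetical potential U(x) = x^2 / 2
    in one dimension, viewed as a map (x, p) -> (x', p') on phase space."""
    x, p = state
    grad_U = lambda x: x                 # gradient of U(x) = x^2 / 2
    p_half = p - 0.5 * dt * grad_U(x)
    x_new = x + dt * p_half
    p_new = p_half - 0.5 * dt * grad_U(x_new)
    return np.array([x_new, p_new])

# Finite-difference Jacobian of the phase-space map at a test point.
s0, h = np.array([0.7, -0.3]), 1e-6
J = np.column_stack([(leapfrog_step(s0 + h * e) - leapfrog_step(s0 - h * e))
                     / (2 * h) for e in np.eye(2)])
print(np.linalg.det(J))                  # 1.0 up to round-off: volume preserved
```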

For any dimension, there is only one move type defined for HMC, namely the forward transformation $g^{L}$. Hence this move type has probability one of selection, and all other move types, which we defined in general terms in connection with TMCMC, have zero probability of selection. As a result, the corresponding TMCMC acceptance ratio needs a slight modification: it must be made free of the move-type probabilities, which is exactly the case in (S-7.4).

The momentum vector $\mathbf{p}$ can be likened to $\boldsymbol{\epsilon}$ of TMCMC, but note that $\mathbf{p}$ must always be of the same dimensionality as $\mathbf{x}$; this is, of course, permitted by TMCMC as a special case.

S-7.3 Comparison of acceptance rate for $L=1$ with RWMH and TMCMC

For $L=1$, the proposal corresponding to HMC is given by (see Cheung and Beck (2009))

\begin{equation}
q(\mathbf{x}^{\prime}\mid\mathbf{x}(t))=N\left(\mathbf{x}^{\prime}:\boldsymbol{\mu}(t),\boldsymbol{\Sigma}(t)\right),
\tag{S-7.5}
\end{equation}

where (S-7.5) is a normal distribution with mean and variance given, respectively, by the following:

\begin{align}
\boldsymbol{\mu}(t)&=\mathbf{x}(t)+\frac{1}{2}\mathbf{M}^{-1}\delta t^{2}\,\nabla\log\left(\pi(\mathbf{x}(t))\right)
\tag{S-7.6}\\
\boldsymbol{\Sigma}(t)&=\delta t^{2}\,\mathbf{M}^{-1}
\tag{S-7.7}
\end{align}

Assuming diagonal $\mathbf{M}$ with $m_{i}$ being the $i$-th diagonal element, the proposal can be re-written in the following more convenient manner: for $i=1,\ldots,k$,

\begin{equation}
x^{\prime}_{i}=x_{i}(t)+\epsilon_{i},
\tag{S-7.8}
\end{equation}

where $s_{i}(t)$ denotes the $i$-th component of $\nabla\log\left(\pi(\mathbf{x}(t))\right)$, and $\epsilon_{i}\sim N\left(\frac{1}{2}\frac{\delta t^{2}s_{i}(t)}{m_{i}},\frac{\delta t^{2}}{m_{i}}\right)$. Assuming, as is usual, that $m_{i}=1$ for each $i$, it follows that

\begin{equation}
\frac{\parallel\mathbf{x}^{\prime}-\mathbf{x}\parallel^{2}}{\delta t^{2}}=\sum_{i=1}^{k}\left(\frac{\epsilon_{i}}{\delta t}\right)^{2}=\sum_{i=1}^{k}\epsilon^{\prime 2}_{i}\sim\chi^{2}_{k}(\lambda),
\tag{S-7.9}
\end{equation}

where $\chi^{2}_{k}(\lambda)$ is the non-central $\chi^{2}$ distribution with $k$ degrees of freedom and non-centrality parameter $\lambda=\frac{\delta t^{2}}{4}\sum_{i=1}^{k}s^{2}_{i}(t)$. Since, as either $k\rightarrow\infty$ or $\lambda\rightarrow\infty$,

\begin{equation}
\frac{\chi^{2}_{k}(\lambda)-(k+\lambda)}{\sqrt{2(k+2\lambda)}}\stackrel{\mathcal{L}}{\rightarrow}N(0,1),
\tag{S-7.10}
\end{equation}

assuming the same strong log-concavity conditions on the target density $\pi$ as in Section S-6.2, it follows as in Section S-6.2.1 that

\begin{align*}
&\frac{(2\pi)^{k/2}}{M^{k}_{k}}\pi(\mathbf{x}^{*})\left\{1-\Phi\left(\frac{\log(1-\psi_{2})+\frac{k}{2}\left[\left(\frac{M_{k}-m_{k}}{M_{k}}\right)+M_{k}\delta t^{2}\left(1+\frac{\lambda}{k}\right)\right]}{\sqrt{\frac{k}{2}\left[\left(\frac{M_{k}-m_{k}}{M_{k}}\right)^{2}+2M_{k}\delta t^{2}\left(1+\frac{\lambda}{k}\right)+M^{2}_{k}\delta t^{4}\left(1+\frac{2\lambda}{k}\right)\right]}}\right)\right\}\leq AR^{(HMC)}\\
&\qquad\leq\frac{(2\pi)^{k/2}}{m^{k}_{k}}\pi(\mathbf{x}^{*})\left\{1-\Phi\left(\frac{\log\psi_{1}-\frac{k}{2}\left[\left(\frac{M_{k}-m_{k}}{m_{k}}\right)-m_{k}\delta t^{2}\left(1+\frac{\lambda}{k}\right)\right]}{\sqrt{\frac{k}{2}\left[\left(\frac{M_{k}-m_{k}}{m_{k}}\right)^{2}+2m_{k}\delta t^{2}\left(1+\frac{\lambda}{k}\right)+m^{2}_{k}\delta t^{4}\left(1+\frac{2\lambda}{k}\right)\right]}}\right)\right\}.
\end{align*}

If $\lambda/k\rightarrow 0$ as $k\rightarrow\infty$, it follows as in Section S-6.2.1 that

\begin{equation}
AR^{(HMC)}\asymp\frac{(2\pi)^{k/2}}{M^{k}_{k}}\left\{1-\Phi\left(\sqrt{\frac{k}{2}}\right)\right\},
\tag{S-7.12}
\end{equation}

which is of the same asymptotic form as (S-6.17), corresponding to the RWMH acceptance rate. On the other hand, if $\lambda/k\rightarrow\infty$ as $k\rightarrow\infty$, then it follows that

\begin{equation}
AR^{(HMC)}\asymp\frac{(2\pi)^{k/2}}{M^{k}_{k}}\left\{1-\Phi\left(\frac{\sqrt{\frac{k}{2}\left(1+\frac{\lambda}{k}\right)}}{\sqrt{2}\sqrt{\frac{1}{M_{k}\delta t^{2}}+1}}\right)\right\},
\tag{S-7.13}
\end{equation}

which clearly tends to zero at a much faster rate than (S-7.12).

To summarize: if $\lambda/k\rightarrow 0$ as $k\rightarrow\infty$, then HMC and RWMH have the same asymptotic acceptance rate, tending to zero much faster than that of additive TMCMC. On the other hand, if $\lambda/k\rightarrow\infty$ as $k\rightarrow\infty$, the acceptance rate of HMC tends to zero much faster than that of RWMH, while that of additive TMCMC retains its slowest convergence rate to zero. Also observe that the above conclusions continue to hold if $m_{k}$ and $M_{k}$ tend to finite positive constants satisfying $M_{k}\asymp m_{k}$ and $k\left|\frac{m_{k}}{M_{k}}-1\right|\rightarrow 0$ as $k\rightarrow\infty$.

S-8 Generalized Gibbs/Metropolis approaches and comparisons with TMCMC

It is important to make clear at the outset of this discussion that the goals of TMCMC and the generalized Gibbs/Metropolis methods are different, even though both use moves based on transformations. While the strength of the latter lies in improving the mixing of standard Gibbs/MH algorithms by adding transformation-based steps to the underlying collection of usual Gibbs/MH steps, TMCMC is an altogether general method of simulating from the target distribution that does not require any underlying Gibbs or MH step.

The generalized Gibbs/MH methods work in the following manner. Suppose that an underlying Gibbs or MH algorithm for exploring a target distribution has poor mixing properties. Then, in order to improve mixing, one may consider some suitable transformation of the random variables being updated such that mixing is improved under the transformation. Such a transformation needs to be chosen carefully, since it is important to ensure that invariance of the Markov chain is preserved under the transformation. It is convenient to begin with an overview of the generalized Gibbs method with a sequential updating scheme, and then proceed to the issues and the importance of the block-updating idea in the context of improving the mixing of standard Gibbs/MH methods.

Liu and Sabatti (2000) (see also Liu and Wu (1999)) propose simulating a transformation from some appropriate probability distribution and then applying it to the coordinate to be updated. For example, in a $d$-dimensional target distribution, for updating $\mathbf{x}=(x_1,x_2,\ldots,x_d)$ to $\mathbf{x}'=(x'_1,x_2,\ldots,x_d)$ using an additive transformation, one can draw $\epsilon$ from some appropriate distribution and set $x'_1=x_1+\epsilon$. Similarly, if a scale transformation is desired, one can set $x'_1=\gamma x_1$, where $\gamma$ must be sampled from some suitable distribution. The distributions of $\epsilon$ and $\gamma$ are chosen so that the target distribution is invariant with respect to the move to $\mathbf{x}'$; their forms are provided in Liu and Sabatti (2000). For instance, if $\pi(\cdot)$ denotes the target distribution, then for the additive transformation $\epsilon$ may be sampled from a density proportional to $\pi(x_1+\epsilon,x_2,\ldots,x_d)$, and for the multiplicative transformation $\gamma$ may be sampled from a density proportional to $|\gamma|\pi(\gamma x_1,x_2,\ldots,x_d)$. Since direct sampling from such distributions may be impossible, Liu and Sabatti (2000) suggest a Metropolis-type move with respect to a transformation-invariant transition kernel; a sketch of both moves follows.
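The following is a minimal sketch of these two moves, assuming a generic user-supplied log-density log_pi; the Gaussian proposal scales and the restriction to $\gamma>0$ are simplifying choices of ours, not part of Liu and Sabatti (2000). Note that for the additive group, one such Metropolis step started at $\epsilon=0$ coincides with an ordinary single-coordinate random-walk MH update.

# Minimal sketches of the additive and scale moves described above,
# for a generic log-density `log_pi` (assumed supplied by the user);
# proposal scales and the restriction to gamma > 0 are simplifying
# choices, not prescribed by Liu and Sabatti (2000).
import numpy as np

rng = np.random.default_rng(0)

def additive_move(x, log_pi, s=0.5):
    """One Metropolis step targeting the density of eps, proportional
    to pi(x1 + eps, x2, ..., xd), started at eps = 0."""
    eps = rng.normal(0.0, s)                 # symmetric proposal around eps = 0
    x_prop = x.copy()
    x_prop[0] += eps
    if np.log(rng.uniform()) < log_pi(x_prop) - log_pi(x):
        return x_prop
    return x

def scale_move(x, log_pi, s=0.3):
    """One Metropolis step targeting the density of gamma, proportional
    to |gamma| * pi(gamma * x1, x2, ..., xd); a symmetric random walk
    on log(gamma), started at gamma = 1, restricted to gamma > 0."""
    gamma = np.exp(rng.normal(0.0, s))
    x_prop = x.copy()
    x_prop[0] *= gamma
    # the |gamma| factor in the target and the Jacobian of the
    # log-scale proposal together contribute 2 * log(gamma)
    log_acc = log_pi(x_prop) - log_pi(x) + 2.0 * np.log(gamma)
    if np.log(rng.uniform()) < log_acc:
        return x_prop
    return x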

Thus, in the generalized Gibbs method all the variables must be updated sequentially, unlike TMCMC, where all the variables can be updated simultaneously in a single block. We note here that, because of irreducibility issues, the generalized Gibbs approach is not suitable for updating the variables blockwise using some transformation that acts on all the variables in a given block. To consider a simple example with, say, $d=2$ and a single block consisting of both variables: under the additive transformation, starting from $\mathbf{x}=(x_1,x_2)$ with $x_1<x_2$, one can never reach $\mathbf{x}'=(x'_1,x'_2)$ with $x'_1>0$ and $x'_2<0$. This is because $x'_1=x_1+z$ and $x'_2=x_2+z$ for some $z$, and $x'_1>0$, $x'_2<0$ imply $z>-x_1$ and $z<-x_2$, which contradicts $-x_1>-x_2$. The scale transformation entails the move $\mathbf{x}=(x_1,\ldots,x_d)\rightarrow(\gamma x_1,\ldots,\gamma x_d)=\mathbf{x}'$. If one initializes the Markov chain with all components positive, for instance, then in every iteration all the variables will have the same sign. The regions where some variables are positive and some negative will never be visited, even if those regions have positive (in fact, high) probabilities under the target distribution, so the Markov chain is not irreducible. Indeed, with the aforementioned approach, no transformation, whatever distribution it is generated from, can guarantee irreducibility in general if blockwise updates are made using the transformation strategy of generalized Gibbs; the sketch below illustrates the scale case.
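The sign-pattern argument is easy to check by simulation. The sketch below simply iterates the blockwise scale move from a positive starting point, ignoring acceptance steps (which cannot enlarge the set of reachable states), and verifies that the two coordinates always carry the same sign; the distribution of $\gamma$ is an arbitrary choice.

# A quick numerical check of the reducibility argument: under the
# blockwise scale move x -> gamma * x, the two coordinates always
# share a sign, so mixed-sign quadrants are never reached.  The
# distribution of gamma is an arbitrary choice; acceptance steps are
# irrelevant here because they cannot enlarge the reachable set.
import numpy as np

rng = np.random.default_rng(1)
x = np.array([1.0, 2.0])                   # both coordinates positive
for _ in range(10_000):
    gamma = rng.normal(1.0, 0.5)           # any proposal distribution for gamma
    if gamma != 0.0:
        x = gamma * x
    x = x / np.abs(x).max()                # rescale to avoid overflow; signs unchanged
    assert np.sign(x[0]) == np.sign(x[1])  # signs flip together, never split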

Although blockwise transformations are proposed in Liu and Sabatti (2000) (see also Kou et al. (2005), who propose an MH-based rule for blockwise transformation), they are meant for a different purpose than the one discussed above. The strength of such blockwise transformations lies in improving the mixing behaviour of standard Gibbs or MH algorithms: when an underlying Gibbs or MH algorithm mixes poorly, an additional step that transforms the block of random variables being updated can improve mixing. This additional step is obtained by selecting a transformation from the appropriate probability distribution provided in Liu and Sabatti (2000), which guarantees that stationarity of the transformed block of random variables is preserved. Examples reported in Liu and Sabatti (2000), Müller and Czado (2005), Kou et al. (2005), and elsewhere demonstrate that this transformation also improves the mixing behaviour of the chain, as desired.

Thus, to improve mixing using the methods of Liu and Sabatti (2000) or Kou et al. (2005), one needs to run the usual Gibbs/MH steps together with an additional step involving transformations, as discussed above. This additional step adds computational burden relative to the standard Gibbs/MH steps, but improved mixing may compensate for the extra labour. In very high dimensions, of course, this need not be a convenient approach, since computational complexity usually makes standard Gibbs/MH infeasible. Because the additional transformation-based step operates on the samples generated by standard Gibbs/MH, impracticality of the latter implies that the extra transformation-based step of Liu and Sabatti (2000) for improving mixing is of little value in such cases.

It is important to point out that the generalized Gibbs/MH methods can be usefully employed even by TMCMC to further improve its mixing properties. In other words, a generalized Gibbs/MH step can be added to the computationally fast TMCMC, and this additional step can significantly improve the mixing properties of TMCMC. That TMCMC is much faster computationally than standard Gibbs/MH methods implies that, even in very high-dimensional situations, the generalized Gibbs/MH step can be very successful when working in conjunction with TMCMC.

Flight no.   Failure   Temp      Flight no.   Failure   Temp
    14          1       53            2          1       70
     9          1       57           11          1       70
    23          1       58            6          0       72
    10          1       63            7          0       73
     1          0       66           16          0       75
     5          0       67           21          1       75
    13          0       67           19          0       76
    15          0       67           22          0       76
     4          0       68           12          0       78
     3          0       69           20          0       79
     8          0       70           18          0       81
    17          0       70

Table S-1: Challenger data. Temperature at flight time (degrees F) and failure of O-rings (1 stands for failure, 0 for success).

S-9 Examples of TMCMC for discrete state spaces

The ideas developed in this paper are not confined to continuous target distributions, but apply to discrete cases as well. For the sake of illustration, we consider two examples below.

  • (i)

    Consider an Ising model where, for $i=1,\ldots,k$ ($k\geq 1$), the discrete random variable $x_i$ takes the value $+1$ or $-1$ with positive probabilities, so that $\mathcal{X}=\{-1,1\}$. To implement TMCMC, consider the forward transformation $T(x_i,\epsilon)=\mathrm{sgn}(x_i+\epsilon)$ with probability $p_i$, and the backward transformation $T^b(x_i,\epsilon)=\mathrm{sgn}(x_i-\epsilon)$ with probability $1-p_i$, where $\mathrm{sgn}(a)=\pm 1$ according as $a>0$ or $a<0$, and $\mathcal{Y}=(1,\infty)$. Note the difference from the continuous cases: here, even though neither transformation is one-to-one or onto, TMCMC works because of discreteness, and the algorithm can easily be seen to satisfy detailed balance, irreducibility and aperiodicity. However, if $k=1$ with $x_1$ the only variable, then when $x_1=1$ it is possible to choose the backward move-type with probability one, yielding $T^b(x_1,\epsilon)=-1$; and when $x_1=-1$, the forward move-type can be chosen with probability one, yielding $T(x_1,\epsilon)=1$. Only $2^k$ move-types are necessary in the $k$-dimensional case for one-step irreducibility. In discrete cases, moreover, there is no Jacobian of transformation, which simplifies the acceptance ratio; a sketch of this example is given after the list.

  • (ii)

    For discrete state spaces like $\mathbb{Z}^k$ (where $\mathbb{Z}=\{0,\pm 1,\pm 2,\ldots\}$), the additive transformation with a single epsilon does not work. For example, with $k=2$, if the starting state is $(1,2)$, then the chain will never reach any state $(x,y)$ in which $x$ and $y$ have the same parity (i.e., both even or both odd): under the $2^k$ forward/backward move-types each coordinate changes by $\pm[\epsilon]$, so $x-y$ changes by an even number and its parity is preserved, resulting in a reducible Markov chain. Thus, in this case we need more move-types than $2^k$. For example, with some positive probability (say $r$) we may select a random coordinate and update it, leaving the other coordinates unchanged; with the remaining probability $1-r$ we may apply the analogous version of the additive transformation:
    Let $\mathcal{Y}=[1,\infty)$. Then one can choose the forward transformation for each coordinate as $T_i(x_i,\epsilon)=x_i+[\epsilon]$ and the backward transformation as $T^b_i(x_i,\epsilon)=x_i-[\epsilon]$, where $[a]$ denotes the largest integer not exceeding $a$.

    This modified chain is clearly ergodic, and a single epsilon still suffices to update all the coordinates; a sketch of this mixture scheme follows the next paragraph.
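To make the construction concrete, the following is a minimal sketch of example (i), assuming $p_i=1/2$ for every coordinate (so that the move-type probabilities cancel in the acceptance ratio) and, purely for illustration, a one-dimensional Ising chain with coupling $J$; the values of $k$, $J$ and the distribution of $\epsilon$ are arbitrary choices, not prescribed here.

# A minimal sketch of discrete TMCMC for example (i), assuming
# p_i = 1/2 for all i and an illustrative one-dimensional Ising
# target; k, J and the distribution of epsilon are arbitrary choices.
import numpy as np

rng = np.random.default_rng(2)
k, J = 20, 0.4

def log_pi(x):
    # unnormalized Ising chain: log pi(x) = J * sum_i x_i * x_{i+1}
    return J * np.sum(x[:-1] * x[1:])

x = rng.choice([-1, 1], size=k)
for _ in range(5_000):
    eps = 1.0 + rng.exponential(1.0)        # any epsilon in Y = (1, infinity)
    forward = rng.uniform(size=k) < 0.5     # move-type for each coordinate
    # since eps > 1, sgn(x_i + eps) = +1 and sgn(x_i - eps) = -1,
    # whatever the current x_i; there is no Jacobian in the discrete case
    x_prop = np.where(forward, 1, -1)
    # with p_i = 1/2, P(T_{z^c}) / P(T_z) = 1, so the acceptance
    # probability reduces to min(1, pi(x') / pi(x))
    if np.log(rng.uniform()) < log_pi(x_prop) - log_pi(x):
        x = x_prop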

In discrete cases, TMCMC thus reduces to Metropolis-Hastings with a mixture proposal. It is important to note, however, that the implementation is much more efficient and computationally cheaper when the TMCMC-based methodologies developed in this paper are used.
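As a concrete illustration of this mixture-proposal view, here is a minimal sketch of example (ii) on $\mathbb{Z}^k$; the discrete Gaussian target and the constants $r$, $\sigma$ and the distribution of $\epsilon$ are hypothetical choices made only for the demonstration. Both proposal components are symmetric, so with forward/backward probabilities $1/2$ the acceptance probability again reduces to $\min(1,\pi(\mathbf{x}')/\pi(\mathbf{x}))$.

# A minimal sketch of example (ii): TMCMC on Z^k mixing a single-
# coordinate update (probability r) with the blockwise additive move
# x_i -> x_i +/- [eps].  The discrete Gaussian target and the
# constants r, sigma are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(3)
k, r, sigma = 5, 0.3, 3.0

def log_pi(x):
    # unnormalized discrete Gaussian on Z^k (assumed target)
    return -np.dot(x, x) / (2.0 * sigma ** 2)

x = np.zeros(k, dtype=int)
for _ in range(10_000):
    eps = 1.0 + rng.exponential(1.0)         # eps in Y = [1, infinity)
    step = int(np.floor(eps))                # [eps] >= 1, and [eps] = 1 has positive probability
    if rng.uniform() < r:
        # single-coordinate move: restores irreducibility across parities
        x_prop = x.copy()
        i = rng.integers(k)
        x_prop[i] += step if rng.uniform() < 0.5 else -step
    else:
        # blockwise additive move: forward or backward per coordinate
        signs = np.where(rng.uniform(size=k) < 0.5, 1, -1)
        x_prop = x + signs * step
    # both proposals are symmetric, so the move-type probabilities cancel
    if np.log(rng.uniform()) < log_pi(x_prop) - log_pi(x):
        x = x_prop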

References

  • Bedard (2007) M. Bédard. Weak Convergence of Metropolis Algorithms for Non-i.i.d. Target Distributions. The Annals of Applied Probability, 17:1222–1244, 2007.
  • Cheung and Beck (2009) S. H. Cheung and J. L. Beck. Bayesian Model Updating Using Hybrid Monte Carlo Simulation with Application to Structural Dynamic Models with Many Uncertain Parameters. Journal of Engineering Mechanics, 135:243–255, 2009.
  • Dey and Bhattacharya (2013) K. K. Dey and S. Bhattacharya. On Optimal Scaling of Additive Transformation Based Markov Chain Monte Carlo. Technical report, Indian Statistical Institute, 2013.
  • Duane et al. (1987) S. Duane, A. D. Kennedy, B. J. Pendleton, and D. Roweth. Hybrid Monte Carlo. Physics Letters B, 195:216–222, 1987.
  • Dutta and Bhattacharya (2013) S. Dutta and S. Bhattacharya. Markov Chain Monte Carlo Based on Deterministic Transformations, 2013. Submitted.
  • Hockney (1970) R. W. Hockney. The Potential Calculation and some Applications. Methods in Computational Physics, 9:136–211, 1970.
  • Kou et al. (2005) S. C. Kou, X. S. Xie, and J. S. Liu. Bayesian Analysis of Single-Molecule Experimental Data. Applied Statistics, 54:469–506, 2005.
  • Liu (2001) J. Liu. Monte Carlo Strategies in Scientific Computing. Springer-Verlag, New York, 2001.
  • Liu and Sabatti (2000) J. S. Liu and C. Sabatti. Generalized Gibbs Sampler and Multigrid Monte Carlo for Bayesian Computation. Biometrika, 87:353–369, 2000.
  • Liu and Wu (1999) J. S. Liu and Y. N. Wu. Parameter Expansion for Data Augmentation. Journal of the American Statistical Association, 94:1264–1274, 1999.
  • Mattingly et al. (2011) J. C. Mattingly, N. S. Pillai, and A. M. Stuart. Diffusion Limits of the Random Walk Metropolis Algorithm in High Dimensions. The Annals of Applied Probability, 22:881–930, 2011.
  • Müller and Czado (2005) G. Müller and C. Czado. An Autoregressive Ordered Probit Model with Application to High-Frequency Financial Data. Journal of Computational and Graphical Statistics, 14:320–338, 2005.
  • Roberts et al. (1997) G. O. Roberts, A. Gelman, and W. R. Gilks. Weak Convergence and Optimal Scaling of Random Walk Metropolis Algorithms. The Annals of Applied Probability, 7:110–120, 1997.