
Information divergences and likelihood ratios of Poisson processes and point patterns

Lasse Leskelä
Abstract

This article develops an analytical framework for studying information divergences and likelihood ratios associated with Poisson processes and point patterns on general measurable spaces. The main results include explicit analytical formulas for Kullback–Leibler divergences, Rényi divergences, Hellinger distances, and likelihood ratios of the laws of Poisson point patterns in terms of their intensity measures. The general results yield similar formulas for inhomogeneous Poisson processes, compound Poisson processes, as well as spatial and marked Poisson point patterns. Additional results include simple characterisations of absolute continuity, mutual singularity, and the existence of common dominating measures. The analytical toolbox is based on Tsallis divergences of sigma-finite measures on abstract measurable spaces. The treatment is purely information-theoretic and free of topological assumptions.

Keywords: Poisson random measure, inhomogeneous Poisson process, point process, spatial point pattern, Rényi divergence, Tsallis divergence, Hellinger distance, mutual information, Chernoff information, Bhattacharyya distance

1 Introduction

A point pattern or a point process (PP) represents a countable set of points in space or time. Poisson PPs are fundamental statistical models for generating randomly scattered points on the real line, in a Euclidean space, or in an abstract measurable space $S$. They are encountered in a wide range of applications such as archeology [Bev20, PRBR18], astronomy [SMSS02], forestry statistics [SP00, KL13], machine learning [BDK+21, AS15, ABS21, AKL24], neuroscience and genomics [RBS10, PRBR18], and queueing systems [ABDW21, Les22]. The law of a Poisson PP is a probability measure $P_{\lambda}$ characterised by an intensity measure $\lambda$, so that $\lambda(B)$ indicates the expected number of points in $B\subset S$. In statistical inference, it is important to understand how the intensity measure can be identified from data. Statistical research devoted to this question has a long history, well summarised in standard textbooks [Kar91, DVJ03, IPSS08, Dig13]. The main analytical approaches include computing and estimating likelihood ratios $\frac{dP_{\lambda}}{dP_{\mu}}$ and information divergences of laws of Poisson PPs with intensity measures $\lambda$ and $\mu$.

Likelihood ratios are easy to compute for standard families of probability distributions on finite-dimensional spaces, but not so for probability measures of infinite-dimensional objects such as paths of stochastic processes or spaces of point patterns. In fact, even verifying the absolute continuity of a pair of probability measures, a necessary condition for the existence of a likelihood ratio, can be nontrivial. For Poisson PPs with finite intensity measures, a classical result [Kar91, Rei93] states that $P_{\lambda}\ll P_{\mu}$ if and only if $\lambda\ll\mu$, in which case a likelihood ratio is given by

$$\frac{dP_{\lambda}}{dP_{\mu}}(\eta) = \exp\left(\int_{S}\log\phi\,d\eta+\int_{S}(1-\phi)\,d\mu\right) \tag{1.1}$$

with $\phi=\frac{d\lambda}{d\mu}$ being a density of the intensity measures. Using this formula, it is easy to compute various types of information divergences and distances for $P_{\lambda}$ and $P_{\mu}$. For Poisson PPs with general sigma-finite intensity measures, the description and even the existence of a likelihood ratio is far less obvious. To see why, note that the rightmost integral in (1.1) equals $\int_{S}(1-\phi)\,d\mu=\mu(S)-\lambda(S)$ for finite intensity measures, but for infinite intensity measures this integral might not exist. For Poisson PPs with general sigma-finite intensity measures, most of the known results [Sko57, Lie75, Kar83, Tak90] are restricted to locally compact Polish spaces, thereby ruling out e.g. infinite-dimensional Hilbert spaces.

1.1 Main contributions

This article develops a framework for computing likelihood ratios and information divergences in the most general natural setting, for Poisson PPs with sigma-finite intensity measures on a general measurable space. The purely information-theoretic approach makes no topological assumptions, and allows one to work with point patterns in high- and infinite-dimensional spaces without worrying about topological regularity properties. A key contribution is an explicit formula (Theorem 4.4) for the likelihood ratio $\frac{dP_{\lambda}}{dP_{\mu}}$ that is applicable to all Poisson PP distributions with $P_{\lambda}\ll P_{\mu}$. This result facilitates the derivation of a characterisation for pairs of Poisson PPs whose laws are dominated by a Poisson PP distribution (Theorem 5.7).

Furthermore, the article provides a comprehensive characterisation of Rényi and Kullback–Leibler divergences of Poisson PPs (Theorems 5.1–5.2), showing that these divergences can be expressed as generalised Tsallis divergences of associated intensity measures. It also extends the definition of Tsallis divergences from probability measures to sigma-finite measures, representing them as linear combinations of Rényi divergences of Poisson distributions (Theorem 3.1). These Poisson–Rényi–Tsallis relationships yield a simplified characterisation for the absolute continuity and mutual singularity of general Poisson PP distributions.

The practical applicability of these results is demonstrated in various contexts, including Poisson processes, compound Poisson processes, marked Poisson point patterns, and Chernoff information of Poisson vectors.

1.2 Outline

The rest of the article is organised as follows. Section 2 introduces notations and definitions. Section 3 develops theoretical foundations for Tsallis divergences of sigma-finite measures. Section 4 presents the main results concerning likelihood ratios of Poisson PP distributions. Section 5 presents the main results about information divergences of Poisson PPs. Section 6 illustrates how the main results can be applied to analyse Poisson processes, compound Poisson processes, marked Poisson point patterns, and Chernoff information of Poisson vectors. Section 7 contains the technical proofs of the main results, and Section 8 concludes.

2 Preliminaries

2.1 Measures

Standard conventions of measure theory and Lebesgue integration [Kal02] are used. The sets of nonnegative integers and nonnegative real numbers are denoted by $\mathbb{Z}_{+}$ and $\mathbb{R}_{+}$, respectively. For measures $\lambda,\mu$ on a measurable space $(S,\mathcal{S})$, the notation $\lambda\ll\mu$ means that $\lambda$ is absolutely continuous with respect to $\mu$, that is, $\lambda(A)=0$ for all $A\in\mathcal{S}$ such that $\mu(A)=0$. We denote $\lambda\perp\mu$ and say that $\lambda,\mu$ are mutually singular if there exists a measurable set $B$ such that $\lambda(B^{c})=0$ and $\mu(B)=0$. A measurable function $f\colon S\to\mathbb{R}_{+}$ is called a density (or Radon–Nikodym derivative) of $\lambda$ with respect to $\mu$ when $\lambda(A)=\int_{A}f\,d\mu$ for all $A\in\mathcal{S}$; in this case we denote $f=\frac{d\lambda}{d\mu}$. A density of a probability measure with respect to another probability measure is called a likelihood ratio.

The symbol $\delta_{x}$ refers to the Dirac measure at $x$. For a number $c\geq 0$, the symbol $\operatorname{Poi}(c)$ denotes the Poisson probability distribution with mean $c$ and density $k\mapsto e^{-c}\frac{c^{k}}{k!}$ with respect to the counting measure on $\mathbb{Z}_{+}$, with the standard conventions that $\operatorname{Poi}(0)=\delta_{0}$ and $\operatorname{Poi}(\infty)=\delta_{\infty}$.

2.2 Point patterns

A point pattern is a countable collection of points, possibly with multiplicities, in a measurable space $(S,\mathcal{S})$. Such a collection is naturally represented as a measure $\eta$ on $(S,\mathcal{S})$, so that $\eta(B)$ equals the number of points in $B\in\mathcal{S}$. The countability requirement is guaranteed when $\eta=\sum_{n=1}^{\infty}\eta_{n}$ for some finite measures $\eta_{n}$. Following [LP18], we define $N(S)$ as the set of all measures that can be written as a countable sum of integer-valued finite measures on $(S,\mathcal{S})$, and equip it with the sigma-algebra $\mathcal{N}(S)$ generated by the evaluation maps $\operatorname{ev}_{B}\colon\eta\mapsto\eta(B)$, $B\in\mathcal{S}$. An element of $N(S)$ is called a point pattern or a point process (PP).

2.3 Poisson PPs

Poisson PPs are defined in the standard manner; see [Kin67, Kal02, LP18] for general background. A PP distribution is a probability measure on $(N(S),\mathcal{N}(S))$. The Laplace functional of a PP distribution $P$ is the map that assigns to every measurable function $u\colon S\to[0,\infty]$ a number $L_{P}(u)=\int_{N(S)}\exp(-\int_{S}u\,d\eta)\,P(d\eta)\in[0,1]$. Given a sigma-finite measure $\lambda$ on $(S,\mathcal{S})$, the Poisson PP distribution with intensity measure $\lambda$ is the unique probability measure $P_{\lambda}$ on $(N(S),\mathcal{N}(S))$ such that

(i) $P_{\lambda}\circ(\operatorname{ev}_{B_{1}},\dots,\operatorname{ev}_{B_{n}})^{-1}=\bigotimes_{i=1}^{n}\big(P_{\lambda}\circ\operatorname{ev}_{B_{i}}^{-1}\big)$ for all integers $n\geq 1$ and mutually disjoint $B_{1},\dots,B_{n}\in\mathcal{S}$;

(ii) $P_{\lambda}\circ\operatorname{ev}_{B}^{-1}=\operatorname{Poi}(\lambda(B))$ for all $B\in\mathcal{S}$.

For the existence and uniqueness, see e.g. [LP18, Proposition 2.10, Theorem 3.6] or [Kal02, Lemmas 12.1–12.2, Theorem 12.7]. Samples $\eta$ from $P_{\lambda}$ are called Poisson PPs.

2.4 Rényi divergences

Rényi divergences were introduced in [Rén61]. The Rényi divergence of order $\alpha\in[0,\infty)$ for probability measures $P,Q$ on a measurable space $(S,\mathcal{S})$ is defined [vH14, PW24] by

$$R_{\alpha}(P\|Q) = \begin{cases}-\log Q(p>0),&\alpha=0,\\ \frac{1}{\alpha-1}\log\int_{S}p^{\alpha}q^{1-\alpha}\,d\nu,&\alpha\notin\{0,1\},\\ \int_{S}p\log\frac{p}{q}\,d\nu,&\alpha=1,\end{cases} \tag{2.1}$$

where $p=\frac{dP}{d\nu}$ and $q=\frac{dQ}{d\nu}$ are densities of $P,Q$ with respect to a sigma-finite measure $\nu$ on $S$ (the definition does not depend on the choice of reference measure or densities; we may take $\nu=\frac{1}{2}(P+Q)$, for example [vH14]); and for $\alpha>1$ we read $p^{\alpha}q^{1-\alpha}=\frac{p^{\alpha}}{q^{\alpha-1}}$ and adopt the conventions [LV06, vH14] that $\frac{0}{0}=0$ and $\frac{t}{0}=\infty$ for $t>0$, together with $0\log\frac{0}{t}=0$ for $t\geq 0$ and $t\log\frac{t}{0}=\infty$ for $t>0$. For $\alpha\in(1,\infty)$, note that $\int_{S}p^{\alpha}q^{1-\alpha}\,d\nu=\int_{\{p>0,\,q>0\}}p^{\alpha}q^{1-\alpha}\,d\nu+\infty\cdot P\{q=0\}$, so that

$$R_{\alpha}(P\|Q) = \begin{cases}\frac{1}{\alpha-1}\log\int_{\{p>0,\,q>0\}}p^{\alpha}q^{1-\alpha}\,d\nu,&P\ll Q,\\ \infty,&P\not\ll Q.\end{cases}$$

Rényi divergences also admit several variational characterisations, see for example [Ana18, BDK+21, vH14].

Rényi divergences of orders $\frac{1}{2},1,2$ are connected to other important information quantities as follows: $R_{1}(P\|Q)$ equals the Kullback–Leibler divergence or relative entropy, $R_{1/2}(P\|Q)=-2\log(1-\frac{\operatorname{Hel}^{2}(P,Q)}{2})$ where $\operatorname{Hel}(P,Q)$ is the Hellinger distance, and $R_{2}(P\|Q)=\log(1+\chi^{2}(P,Q))$ where $\chi^{2}(P,Q)$ refers to the $\chi^{2}$-divergence [GS02].
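To make these identities concrete, the following minimal sketch (in Python, with illustrative distributions not taken from this article) evaluates (2.1) for two discrete distributions and checks the order-$\frac{1}{2}$ and order-$2$ relations; here $\operatorname{Hel}^{2}(P,Q)$ is computed as $\int(\sqrt{p}-\sqrt{q})^{2}\,d\nu$, the convention under which the displayed identity holds:

```python
import numpy as np

def renyi(p, q, alpha):
    """Rényi divergence R_alpha(P||Q) of (2.1) for strictly positive pmfs."""
    if alpha == 1:
        return float(np.sum(p * np.log(p / q)))
    return float(np.log(np.sum(p**alpha * q**(1 - alpha))) / (alpha - 1))

p = np.array([0.2, 0.5, 0.3])   # illustrative pmf of P
q = np.array([0.4, 0.4, 0.2])   # illustrative pmf of Q

hel2 = np.sum((np.sqrt(p) - np.sqrt(q))**2)   # squared Hellinger distance
chi2 = np.sum((p - q)**2 / q)                 # chi-squared divergence

assert np.isclose(renyi(p, q, 0.5), -2 * np.log(1 - hel2 / 2))  # order 1/2
assert np.isclose(renyi(p, q, 2.0), np.log(1 + chi2))           # order 2
```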

3 Tsallis divergences of sigma-finite measures

This section introduces a theoretical framework of Tsallis divergences of sigma-finite measures. Section 3.1 provides a definition and a representation formula as a Poisson–Rényi integral, and Section 3.2 summarises some basic properties. Section 3.3 demonstrates how Tsallis divergences characterise absolute continuity and mutual singularity. Section 3.4 establishes a connection with Hellinger distances, and Section 3.5 presents a disintegration formula.

3.1 Definition

Tsallis divergences of probability measures were introduced in [Tsa98] (see also [NN11]). The following definition generalises the notion of Tsallis divergence from probability measures to arbitrary sigma-finite measures. Let $\lambda,\mu$ be sigma-finite measures on a measurable space $S$ admitting densities $f=\frac{d\lambda}{d\nu}$ and $g=\frac{d\mu}{d\nu}$ with respect to a sigma-finite measure $\nu$. The Tsallis divergence of order $\alpha\in\mathbb{R}_{+}$ is defined by

$$T_{\alpha}(\lambda\|\mu)=\begin{cases}\mu\{f=0\},&\alpha=0,\\ \int_{S}\big(\frac{\alpha f+(1-\alpha)g-f^{\alpha}g^{1-\alpha}}{1-\alpha}\big)\,d\nu,&\alpha\notin\{0,1\},\\ \int_{S}\big(f\log\frac{f}{g}+g-f\big)\,d\nu,&\alpha=1,\end{cases} \tag{3.1}$$

where for $\alpha>1$ we read $f^{\alpha}g^{1-\alpha}$ as $\frac{f^{\alpha}}{g^{\alpha-1}}$ and adopt the conventions that $\frac{0}{0}=0$ and $\frac{t}{0}=\infty$ for $t>0$, as well as $t\log\frac{t}{0}=\infty$ for $t>0$ and $0\log\frac{0}{t}=0$ for $t\geq 0$.

Theorem 3.1.

$\alpha\mapsto T_{\alpha}(\lambda\|\mu)$ is a well-defined nondecreasing function from $\mathbb{R}_{+}$ into $[0,\infty]$ that is continuous on the interval $\{\alpha\colon T_{\alpha}(\lambda\|\mu)<\infty\}$, and admits the representation

$$T_{\alpha}(\lambda\|\mu) = \int_{S}R_{\alpha}(p_{f(x)}\|p_{g(x)})\,\nu(dx), \tag{3.2}$$

where $p_{s}$ refers to the Poisson distribution $k\mapsto e^{-s}\frac{s^{k}}{k!}$ on the nonnegative integers with mean $s$. Furthermore, the value of $T_{\alpha}(\lambda\|\mu)$ does not depend on the choice of the densities $f,g$ nor the measure $\nu$.

Proof.

Section 7.1. ∎

Remark 3.2.

In the special case with $\lambda(S)=\mu(S)=1$, we find that $T_{\alpha}(\lambda\|\mu)=\frac{1-\int_{S}f^{\alpha}g^{1-\alpha}\,d\nu}{1-\alpha}$ for $\alpha\in(0,1)$ agrees with the classical definition of the Tsallis divergence for probability measures [Tsa98]. In this case $T_{1}(\lambda\|\mu)=\operatorname{KL}(\lambda\|\mu)$ equals the Kullback–Leibler divergence, and the Tsallis divergence of order $\alpha\notin\{0,1\}$ is related to the Rényi divergence by $T_{\alpha}(\lambda\|\mu)=\frac{1-e^{-(1-\alpha)R_{\alpha}(\lambda\|\mu)}}{1-\alpha}$. In the case of finite measures, the formulas on the right side of (3.1) can be simplified by substituting $\int_{S}f\,d\nu=\lambda(S)$ and $\int_{S}g\,d\nu=\mu(S)$. Such simplifications are not possible for general sigma-finite measures, but Theorem 3.1 guarantees that the integrals in (3.1) are nevertheless well defined.
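As a numerical sanity check of the representation (3.2), the following sketch (assuming a finite base space with $\nu$ the counting measure; the density values are illustrative) compares the Tsallis divergence (3.1) with the corresponding sum of Rényi divergences of Poisson distributions, the latter evaluated directly from truncated pmfs:

```python
import numpy as np
from scipy.stats import poisson

def tsallis(f, g, alpha):
    """T_alpha(lambda||mu) of (3.1) with nu = counting measure, f, g > 0."""
    if alpha == 1:
        return np.sum(f * np.log(f / g) + g - f)
    return np.sum((alpha*f + (1 - alpha)*g - f**alpha * g**(1 - alpha))
                  / (1 - alpha))

def renyi_poisson(s, t, alpha, kmax=200):
    """R_alpha(Poi(s)||Poi(t)) from a truncated sum over the pmfs."""
    k = np.arange(kmax)
    ps, pt = poisson.pmf(k, s), poisson.pmf(k, t)
    return np.log(np.sum(ps**alpha * pt**(1 - alpha))) / (alpha - 1)

f = np.array([1.5, 0.7, 2.0])   # density of lambda (illustrative)
g = np.array([1.0, 1.0, 1.0])   # density of mu (illustrative)
alpha = 0.7

lhs = tsallis(f, g, alpha)
rhs = sum(renyi_poisson(fx, gx, alpha) for fx, gx in zip(f, g))
assert np.isclose(lhs, rhs)     # representation (3.2)
```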

Remark 3.3.

$T_{1/2}(\lambda\|\mu)=2H^{2}(\lambda,\mu)$, where $H(\lambda,\mu)$ refers to the Hellinger distance (see Section 3.4), and $T_{1}(\lambda\|\mu)$ corresponds to the generalised KL divergence discussed in [MFWF23].

Remark 3.4.

When $\lambda,\mu$ admit strictly positive densities $f,g$ with respect to a sigma-finite measure $\nu$, the Tsallis divergence of order $\alpha\neq 1$ can be written as $T_{\alpha}(\lambda\|\mu)=\int g\,\Phi(\frac{f}{g})\,d\nu$, where $\Phi(t)=\frac{t^{\alpha}-1-\alpha(t-1)}{\alpha-1}$ is a convex function such that $\Phi(1)=0$. In this sense $T_{\alpha}(\lambda\|\mu)$ corresponds to an instance of an $f$-divergence between sigma-finite measures $\lambda,\mu$. Tsallis divergences restricted to probability measures may therefore be analysed using the rich theory of $f$-divergences [PW24, Sas18].

3.2 Properties

Tsallis divergences share several properties in common with Rényi divergences. This section summarises some of the most important ones.

Proposition 3.5.

The Tsallis divergence for sigma-finite measures $\lambda\ll\mu$ with density $\phi=\frac{d\lambda}{d\mu}$ can be written as

$$T_{\alpha}(\lambda\|\mu)=\begin{cases}\mu\{\phi=0\},&\alpha=0,\\ \int_{S}\big(\frac{\alpha\phi+1-\alpha-\phi^{\alpha}}{1-\alpha}\big)\,d\mu,&\alpha\notin\{0,1\},\\ \int_{S}\big(\phi\log\phi+1-\phi\big)\,d\mu,&\alpha=1.\end{cases} \tag{3.3}$$

Proof.

Theorem 3.1 indicates that we are free to choose densities and reference measures in (3.1). The claim follows by choosing densities $f=\phi$, $g=1$ of $\lambda,\mu$ with respect to the reference measure $\nu=\mu$. ∎

Proposition 3.6.

$(1-\alpha)\,T_{\alpha}(\mu\|\lambda)=\alpha\,T_{1-\alpha}(\lambda\|\mu)$ for all $\alpha\in(0,1)$.

Proof.

Immediate from formula (3.1). ∎

Proposition 3.7.

$\frac{\alpha}{\beta}\,\frac{1-\beta}{1-\alpha}\,T_{\beta}(\lambda\|\mu)\leq T_{\alpha}(\lambda\|\mu)\leq T_{\beta}(\lambda\|\mu)$ for all $\alpha\leq\beta$ in $(0,1)$.

Proof.

The second inequality follows by the monotonicity of Tsallis divergences (Theorem 3.1). The first inequality follows by applying Proposition 3.6 and monotonicity to conclude that

$$\frac{\alpha}{\beta}\frac{1-\beta}{1-\alpha}T_{\beta}(\lambda\|\mu) = \frac{\alpha}{1-\alpha}T_{1-\beta}(\mu\|\lambda) \leq \frac{\alpha}{1-\alpha}T_{1-\alpha}(\mu\|\lambda) = T_{\alpha}(\lambda\|\mu). \qquad\blacksquare$$

3.3 Absolute continuity and mutual singularity

The following is a characterisation of absolute continuity for sigma-finite measures in terms of Tsallis divergences. It is similar in spirit to an analogous characterisation of probability measures using Rényi divergences: $P\ll Q$ iff $R_{0}(Q\|P)=0$ (see [Shi96, Theorem III.9.2], [vH14, Theorem 23]).

Proposition 3.8.

The following are equivalent for all sigma-finite measures:

(i) $\lambda\ll\mu$.

(ii) $T_{0}(\mu\|\lambda)=0$.

Proof.

Let $\lambda,\mu$ be measures on a measurable space $(S,\mathcal{S})$ admitting densities $f=\frac{d\lambda}{d\nu}$ and $g=\frac{d\mu}{d\nu}$ with respect to a measure $\nu$. The equivalence of (i) and (ii) follows by applying Lemma A.1 and noting that $\lambda\{g=0\}=T_{0}(\mu\|\lambda)$ by definition (3.1). ∎

The following is a characterisation of mutual singularity for finite measures in terms of Tsallis divergences. It is similar in spirit to an analogous characterisation of probability measures using Rényi divergences: $P\perp Q$ iff $R_{0}(P\|Q)=\infty$ iff $R_{0}(Q\|P)=\infty$ (see [Shi96, Theorem III.9.3], [vH14, Theorem 24]).

Proposition 3.9.

The following are equivalent for all finite measures:

(i) $\lambda\perp\mu$.

(ii) $T_{0}(\lambda\|\mu)=\mu(S)$.

(iii) $T_{0}(\mu\|\lambda)=\lambda(S)$.

Proof.

Let $\lambda,\mu$ be measures on a measurable space $(S,\mathcal{S})$ admitting densities $f=\frac{d\lambda}{d\nu}$ and $g=\frac{d\mu}{d\nu}$ with respect to a measure $\nu$.

(i)$\Leftrightarrow$(ii): By (3.1), $T_{0}(\lambda\|\mu)=\mu\{f=0\}$. Hence $\mu(S)-T_{0}(\lambda\|\mu)=\mu\{f>0\}$. The claim follows because $\lambda\perp\mu$ is equivalent to $\mu\{f>0\}=0$ (Lemma A.1).

(i)$\Leftrightarrow$(iii): Analogously, $\lambda(S)-T_{0}(\mu\|\lambda)=\lambda(S)-\lambda\{g=0\}=\lambda\{g>0\}$. The claim follows because $\lambda\perp\mu$ is equivalent to $\lambda\{g>0\}=0$ (Lemma A.1). ∎

3.4 Hellinger distances of sigma-finite measures

The Hellinger distance between sigma-finite measures $\lambda$ and $\mu$ is defined by

$$H(\lambda,\mu) = \left(\frac{1}{2}\int_{S}\big(\sqrt{f}-\sqrt{g}\big)^{2}\,d\nu\right)^{1/2}, \tag{3.4}$$

where $f=\frac{d\lambda}{d\nu}$ and $g=\frac{d\mu}{d\nu}$ are densities with respect to a sigma-finite measure $\nu$. Hellinger distances take values in $[0,1]$ for probability measures, and in $[0,\infty]$ for general sigma-finite measures. By writing $(\sqrt{f}-\sqrt{g})^{2}=f+g-2f^{1/2}g^{1/2}$, we see by comparing (3.1) and (3.4) that

$$T_{1/2}(\lambda\|\mu) = 2H^{2}(\lambda,\mu). \tag{3.5}$$

In particular, we see that Tsallis divergences of order $\frac{1}{2}$ are symmetric: $T_{1/2}(\lambda\|\mu)=T_{1/2}(\mu\|\lambda)$. When $\lambda,\mu$ are probability measures, we note that $T_{1/2}(\lambda\|\mu)=2(1-\int_{S}f^{1/2}g^{1/2}\,d\nu)=2(1-\exp(-\frac{1}{2}R_{1/2}(\lambda\|\mu)))$. Hence for probability measures $\lambda,\mu$,

$$H^{2}(\lambda,\mu) = 1-\exp\left(-\tfrac{1}{2}R_{1/2}(\lambda\|\mu)\right). \tag{3.6}$$
Proposition 3.10.

The right side of (3.4) does not depend on the choice of the densities $f,g$ nor the reference measure $\nu$. Furthermore, if $\lambda\ll\mu$, then the Hellinger distance can also be written as

$$H(\lambda,\mu) = \left(\frac{1}{2}\int_{S}\big(\sqrt{\phi}-1\big)^{2}\,d\mu\right)^{1/2},$$

where $\phi\colon S\to\mathbb{R}_{+}$ is a density of $\lambda$ with respect to $\mu$.

Proof.

The first claim follows by applying Theorem 3.1 with (3.5). The second claim follows by applying (3.4) with $f=\phi$, $g=1$, and $\nu=\mu$. ∎

Proposition 3.11.

$H(\lambda,\xi)\leq H(\lambda,\mu)+H(\mu,\xi)$ for all sigma-finite measures.

Proof.

Let $f,g,h$ be densities of $\lambda,\mu,\xi$ with respect to the sigma-finite measure $\nu=\lambda+\mu+\xi$. We note that

$$H(\lambda,\xi) = \frac{1}{\sqrt{2}}\,\big\lVert\sqrt{f}-\sqrt{h}\big\rVert_{L^{2}(\nu)} = \frac{1}{\sqrt{2}}\,\big\lVert(\sqrt{f}-\sqrt{g})+(\sqrt{g}-\sqrt{h})\big\rVert_{L^{2}(\nu)}.$$

Therefore, the claim follows by applying Minkowski's inequality $\lVert u+v\rVert_{L^{2}(\nu)}\leq\lVert u\rVert_{L^{2}(\nu)}+\lVert v\rVert_{L^{2}(\nu)}$, which is true for all measurable functions $u,v$, regardless of whether $\lVert u\rVert_{L^{2}(\nu)},\lVert v\rVert_{L^{2}(\nu)}$ are finite or not. ∎

3.5 Disintegration of Tsallis divergences

The disintegration of a measure $\Lambda$ on a product space $(S_{1}\times S_{2},\mathcal{S}_{1}\otimes\mathcal{S}_{2})$ refers to a representation $\Lambda=\lambda\otimes K$, where $\lambda$ is a measure on $(S_{1},\mathcal{S}_{1})$ corresponding to the first marginal of $\Lambda$, and $K$ is a kernel from $(S_{1},\mathcal{S}_{1})$ into $(S_{2},\mathcal{S}_{2})$. Equivalently,

$$\Lambda(C) = \int_{S_{1}}\int_{S_{2}}1_{C}(x,y)\,K_{x}(dy)\,\lambda(dx),\quad C\in\mathcal{S}_{1}\otimes\mathcal{S}_{2}.$$

Informally, we write

$$\Lambda(dx,dy) = \lambda(dx)\,K_{x}(dy).$$

If $\Lambda$ disintegrates according to $\Lambda=\lambda\otimes K$, then it also disintegrates according to $\Lambda=\tilde{\lambda}\otimes\tilde{K}$ where $\tilde{\lambda}=2\lambda$ and $\tilde{K}=\frac{1}{2}K$. To rule out such unidentifiability issues, in applications it is natural to require $K$ to be a probability kernel, that is, a kernel such that $K_{x}(S_{2})=1$ for all $x\in S_{1}$. The following result characterises Tsallis divergences of disintegrated measures, which helps to compute information divergences of compound Poisson processes and marked Poisson PPs (see Sections 6.3–6.4). See [PW24, Theorem 2.13, Equation 7.71] for similar results concerning Rényi divergences.

Theorem 3.12.

Let $\lambda,\mu$ be sigma-finite measures on a measurable space $(S_{1},\mathcal{S}_{1})$, and let $K,L$ be probability kernels from $(S_{1},\mathcal{S}_{1})$ into a measurable space $(S_{2},\mathcal{S}_{2})$. Assume that there exist measurable functions $k,\ell\colon S_{1}\times S_{2}\to\mathbb{R}_{+}$ and a kernel $M$ from $(S_{1},\mathcal{S}_{1})$ into $(S_{2},\mathcal{S}_{2})$ such that

$$K_{t}(dx) = k_{t}(x)\,M_{t}(dx), \qquad L_{t}(dx) = \ell_{t}(x)\,M_{t}(dx), \tag{3.7}$$

for all $t\in S_{1}$. Then the Tsallis divergence of order $\alpha\in\mathbb{R}_{+}$ equals

$$T_{\alpha}(\lambda\otimes K\|\mu\otimes L) = \begin{cases}T_{0}(\lambda\|\mu)+\int_{\{f\neq 0\}}T_{0}(K_{t}\|L_{t})\,\mu(dt),&\alpha=0,\\ T_{\alpha}(\lambda\|\mu)+\int_{S_{1}}T_{\alpha}(K_{t}\|L_{t})\,f_{t}^{\alpha}g_{t}^{1-\alpha}\,\nu(dt),&\alpha\notin\{0,1\},\\ T_{1}(\lambda\|\mu)+\int_{S_{1}}T_{1}(K_{t}\|L_{t})\,\lambda(dt),&\alpha=1,\end{cases} \tag{3.8}$$

where $f_{t}=\frac{d\lambda}{d\nu}(t)$ and $g_{t}=\frac{d\mu}{d\nu}(t)$ are densities of $\lambda$ and $\mu$ with respect to a sigma-finite measure $\nu$.

Proof.

Section 7.2. ∎

A sufficient condition for (3.7) is to assume that $(S_{2},\mathcal{S}_{2})$ is separable in the sense that $\mathcal{S}_{2}$ is generated by a countable set family. In this case there exist [DM82, Theorem 58] measurable functions $k,\ell\colon S_{1}\times S_{2}\to\mathbb{R}_{+}$ such that (3.7) holds for the probability kernel $M=\frac{1}{2}(K+L)$. It might be that (3.7) is not needed for Theorem 3.12. Proving this might require a measurable selection theorem that is different from the usual ones, for which it is assumed that at least one of the sigma-algebras is generated by a regular topology [Les10, LV17].
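The disintegration formula (3.8) can also be checked numerically on a toy example. The sketch below uses a hypothetical discrete setting with $S_{1}=S_{2}=\{0,1\}$ and $\nu$ the counting measure (so $f=\lambda$ and $g=\mu$), and verifies the case $\alpha\notin\{0,1\}$:

```python
import numpy as np

def tsallis(f, g, alpha):
    # T_alpha for densities w.r.t. the counting measure, alpha not in {0, 1}
    return np.sum((alpha*f + (1-alpha)*g - f**alpha * g**(1-alpha)) / (1-alpha))

lam = np.array([2.0, 0.5])               # intensity lambda on S1 (illustrative)
mu  = np.array([1.0, 1.5])               # intensity mu on S1 (illustrative)
K = np.array([[0.3, 0.7], [0.6, 0.4]])   # probability kernel: row t is K_t
L = np.array([[0.5, 0.5], [0.2, 0.8]])
alpha = 0.4

# Left side of (3.8): Tsallis divergence of the product-space intensities.
lhs = tsallis((lam[:, None] * K).ravel(), (mu[:, None] * L).ravel(), alpha)
# Right side of (3.8).
rhs = tsallis(lam, mu, alpha) + sum(
    tsallis(K[t], L[t], alpha) * lam[t]**alpha * mu[t]**(1 - alpha)
    for t in range(2))
assert np.isclose(lhs, rhs)
```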

4 Likelihood ratios of Poisson PPs

This section presents a general likelihood ratio formula for Poisson PPs. Section 4.1 first focuses on the case with finite intensity measures, and Section 4.2 then provides a general formula.

4.1 Finite Poisson PPs

Poisson PPs with finite intensity measures are almost surely finite. Likelihood ratio formulas for Poisson PP distributions with finite intensity measures are classical (e.g. [LS78, Kar91, Rei93, Bir07]), although they are usually restricted to Polish spaces. The proof of the following general result is included in Section 7.3 for completeness.

Theorem 4.1.

Any Poisson PP distributions with finite intensity measures $\lambda\ll\mu$ satisfy $P_{\lambda}\ll P_{\mu}$, and a likelihood ratio is given by

$$\frac{dP_{\lambda}}{dP_{\mu}}(\eta) = 1_{M_{\lambda,\mu}}(\eta)\,\exp\bigg(\int_{S}(1-\phi)\,d\mu+\int_{S}\log\phi\,d\eta\bigg), \tag{4.1}$$

where $\phi=\frac{d\lambda}{d\mu}$ is a density of $\lambda$ with respect to $\mu$, and

$$M_{\lambda,\mu} = \big\{\eta\in N(S)\colon \eta\{\phi=0\}=0\big\}. \tag{4.2}$$
Remark 4.2.

For every finite point pattern $\eta\in M_{\lambda,\mu}$, the integral $\int_{S}\log\phi\,d\eta$ in (4.1) is a well-defined real number because $\log\phi\in\mathbb{R}$ outside the set $\{\phi=0\}$ of $\eta$-measure zero. Also recall that every point pattern generated by a Poisson PP distribution with a finite intensity measure is finite almost surely. Therefore, the right side in (4.1) is well defined for $P_{\mu}$-almost every $\eta$.

Remark 4.3.

The set $M_{\lambda,\mu}$ in (4.2) indicates the set of point patterns that contain no points in the region where $\phi=\frac{d\lambda}{d\mu}$ vanishes. If we assume that $\lambda$ and $\mu$ are mutually absolutely continuous, then we may omit $1_{M_{\lambda,\mu}}(\eta)$ from (4.1). To see why, observe that $\lambda\{\phi=0\}=\int_{\{\phi=0\}}\phi\,d\mu=0$ together with $\mu\ll\lambda$ implies that $\mu\{\phi=0\}=0$. Therefore, by Markov's inequality,

$$P_{\mu}(M_{\lambda,\mu}^{c}) = P_{\mu}\{\eta\{\phi=0\}\geq 1\} \leq E_{\mu}\,\eta\{\phi=0\} = \mu\{\phi=0\} = 0.$$
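As an illustration of Theorem 4.1, the following Monte Carlo sketch works on the two-point space $S=\{0,1\}$ with illustrative intensities, so that $P_{\mu}$ is a product of two Poisson distributions; reweighting samples from $P_{\mu}$ by the likelihood ratio (4.1) recovers a $P_{\lambda}$-expectation:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = np.array([0.8, 2.0])          # intensity lambda (illustrative)
mu  = np.array([1.5, 1.0])          # intensity mu (illustrative)
phi = lam / mu                      # density d(lambda)/d(mu)

n = 200_000
eta = rng.poisson(mu, size=(n, 2))  # point counts sampled under P_mu
# Likelihood ratio (4.1); the indicator 1_M is identically 1 here since phi > 0.
lr = np.exp(np.sum((1 - phi) * mu) + eta @ np.log(phi))

est = np.mean(lr * eta.sum(axis=1))  # E_mu[LR * eta(S)]
print(est, lam.sum())                # both approximate E_lambda[eta(S)] = 2.8
```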

4.2 Sigma-finite PPs

The simple density formula of Theorem 4.1 is not in general valid for Poisson PPs with infinite intensity measures, because the integral $\int_{S}\log\phi\,d\eta$ might not converge for infinite point patterns $\eta$. In the general setting, we need to work with carefully compensated integrals. A key observation is that the compensated Poisson integral $\int_{\{|\log\phi|\leq 1\}}\log\phi\,d(\eta-\mu)$ and the ordinary Poisson integral $\int_{\{|\log\phi|>1\}}\log\phi\,d\eta$ of the logarithm of $\phi=\frac{d\lambda}{d\mu}$ converge for $P_{\mu}$-almost every $\eta$ whenever $P_{\lambda}\ll P_{\mu}$. See Appendix B.2 for the definition of the compensated integral and details. The following theorem confirms that a density of $P_{\lambda}$ with respect to $P_{\mu}$ can be written using these integrals.

Theorem 4.4.

Any Poisson PP distributions with sigma-finite intensity measures such that $\lambda\ll\mu$ and $H(\lambda,\mu)<\infty$ satisfy $P_{\lambda}\ll P_{\mu}$, and a likelihood ratio is given by

$$\frac{dP_{\lambda}}{dP_{\mu}}(\eta) = 1_{M_{\lambda,\mu}}(\eta)\,\exp(\ell_{\lambda,\mu}(\eta)), \tag{4.3}$$

where

$$\ell_{\lambda,\mu}(\eta) = \int_{\{|\log\phi|\leq 1\}}\log\phi\,d(\eta-\mu) + \int_{\{|\log\phi|>1\}}\log\phi\,d\eta + \int_{\{|\log\phi|\leq 1\}}(\log\phi+1-\phi)\,d\mu + \int_{\{|\log\phi|>1\}}(1-\phi)\,d\mu \tag{4.4}$$

and

$$M_{\lambda,\mu} = \{\eta\in N(S)\colon \eta\{\phi=0\}=0\} \tag{4.5}$$

are defined in terms of a density $\phi=\frac{d\lambda}{d\mu}$.

Proof.

Section 7.4. ∎

Remark 4.5.

When $\lambda$ and $\mu$ are mutually absolutely continuous, the factor $1_{M_{\lambda,\mu}}(\eta)$ may be omitted from (4.3), as explained in Remark 4.3.

5 Information divergences of Poisson PPs

In principle, most information divergences for Poisson PP distributions can be computed using the likelihood ratio $\frac{dP_{\lambda}}{dP_{\mu}}$. Unfortunately, the general likelihood ratio formula in Theorem 4.4 involves a rather complicated stochastic integral that renders it difficult to obtain explicit analytical expressions. However, the fact that the laws of Poisson PPs are infinitely divisible suggests that simple formulas should be available for information divergences that are additive with respect to product measures. It is well known that Rényi divergences, including the Kullback–Leibler divergence, enjoy this tensorisation property [vH14, PW24]. Indeed, it was recently confirmed that linear combinations of Rényi divergences are the only divergences satisfying the tensorisation property and the data processing inequality [MPST21, Theorem 2].

This section demonstrates how Rényi divergences and related quantities of general Poisson PP distributions can be computed from their associated intensity measures as generalised Tsallis divergences introduced in Section 3. The section is outlined as follows. Section 5.1 summarises formulas for Rényi divergences, Kullback–Leibler divergences, and Hellinger distances. Section 5.2 characterises the absolute continuity of Poisson PP distributions using Tsallis divergences of their intensity measures. Section 5.3 characterises pairs of Poisson PPs whose laws admit a common dominating measure corresponding to a Poisson PP.

5.1 Divergences and distances

In what follows, $P_{\lambda}$ and $P_{\mu}$ are Poisson PP distributions with sigma-finite intensity measures $\lambda$ and $\mu$ on a measurable space $S$.

Theorem 5.1.

The Rényi divergence of order $\alpha\in(0,\infty)$ for Poisson PP distributions $P_{\lambda}$ and $P_{\mu}$ is given by the Tsallis divergence of their intensity measures according to

$$R_{\alpha}(P_{\lambda}\|P_{\mu}) = T_{\alpha}(\lambda\|\mu). \tag{5.1}$$

If $T_{\alpha}(\lambda\|\mu)<\infty$ for some $\alpha>0$, then (5.1) also holds for $\alpha=0$.

Proof.

Section 7.5. ∎
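For a finite space $S$, the law $P_{\lambda}$ is a product of Poisson distributions, so (5.1) can be verified directly. A small numerical sketch (with $S=\{0,1\}$ and illustrative intensities; the pmfs are truncated at 60 terms, which is ample for these means):

```python
import numpy as np
from scipy.stats import poisson

lam = np.array([1.2, 0.4])   # intensity lambda (illustrative)
mu  = np.array([0.6, 0.9])   # intensity mu (illustrative)
alpha = 0.3

k = np.arange(60)
P = np.outer(poisson.pmf(k, lam[0]), poisson.pmf(k, lam[1]))  # pmf of P_lambda
Q = np.outer(poisson.pmf(k, mu[0]), poisson.pmf(k, mu[1]))    # pmf of P_mu

renyi = np.log(np.sum(P**alpha * Q**(1 - alpha))) / (alpha - 1)
tsallis = np.sum((alpha*lam + (1 - alpha)*mu - lam**alpha * mu**(1 - alpha))
                 / (1 - alpha))
assert np.isclose(renyi, tsallis)   # Theorem 5.1
```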

Theorem 5.2.

The Kullback–Leibler divergence for Poisson PP distributions $P_{\lambda}$ and $P_{\mu}$ is given by

$$\operatorname{KL}(P_{\lambda}\|P_{\mu}) = \int_{S}\left(f\log\frac{f}{g}+g-f\right)d\nu, \tag{5.2}$$

where $f=\frac{d\lambda}{d\nu}$ and $g=\frac{d\mu}{d\nu}$ are densities with respect to a measure $\nu$.

Proof.

Immediate corollary of Theorem 5.1. ∎

As another corollary of Theorem 5.1, we obtain a simple formula for the Hellinger distance between Poisson PP distributions. This result was proved in [Tak90] for locally finite intensity measures on locally compact Polish spaces.

Theorem 5.3.

The Hellinger distance between Poisson PPs $P_{\lambda}$ and $P_{\mu}$ is given by $H^{2}(P_{\lambda},P_{\mu})=1-e^{-H^{2}(\lambda,\mu)}$.

Proof.

Theorem 5.1 combined with formula (3.5) implies that $R_{1/2}(P_{\lambda}\|P_{\mu})=T_{1/2}(\lambda\|\mu)=2H^{2}(\lambda,\mu)$. For probability measures $P_{\lambda}$ and $P_{\mu}$, we find by applying (3.6) that $H^{2}(P_{\lambda},P_{\mu})=1-e^{-\frac{1}{2}R_{1/2}(P_{\lambda}\|P_{\mu})}$. By combining these findings, we conclude that $H^{2}(P_{\lambda},P_{\mu})=1-e^{-H^{2}(\lambda,\mu)}$. ∎

5.2 Absolute continuity

Theorem 5.4.

The following are equivalent for any Poisson PPs with sigma-finite intensity measures:

(i) $P_{\lambda}\ll P_{\mu}$.

(ii) $T_{0}(\mu\|\lambda)=0$ and $T_{\alpha}(\mu\|\lambda)<\infty$ for all $\alpha\in\mathbb{R}_{+}$.

(iii) $T_{0}(\mu\|\lambda)=0$ and $T_{\alpha}(\mu\|\lambda)<\infty$ for some $0<\alpha<\infty$.

Proof.

(i)$\Rightarrow$(ii): Assume that $P_{\lambda}\ll P_{\mu}$. Fix a measurable set $A\subset S$ such that $\mu(A)=0$. Let $C=\{\eta\in N(S)\colon\eta(A)>0\}$. Note that $\eta(A)$ is Poisson-distributed with mean $\lambda(A)$ (resp. $\mu(A)$) when $\eta$ is sampled from $P_{\lambda}$ (resp. $P_{\mu}$). Therefore $P_{\mu}(C)=1-e^{-\mu(A)}=0$. Because $P_{\lambda}\ll P_{\mu}$, it follows that $0=P_{\lambda}(C)=1-e^{-\lambda(A)}$, and we conclude that $\lambda(A)=0$. Hence $\lambda\ll\mu$, and Proposition 3.8 implies that $T_{0}(\mu\|\lambda)=0$. Furthermore, because $P_{\lambda}$ and $P_{\mu}$ are not mutually singular, we know [vH14, Theorem 24] that $R_{\alpha}(P_{\mu}\|P_{\lambda})<\infty$ for all $\alpha\in\mathbb{R}_{+}$. By Theorem 5.1, it follows that $T_{\alpha}(\mu\|\lambda)<\infty$ for all $\alpha\in\mathbb{R}_{+}$.

(ii)$\Rightarrow$(iii): Immediate.

(iii)$\Rightarrow$(i): Theorem 5.1 implies that $R_{0}(P_{\mu}\|P_{\lambda})=T_{0}(\mu\|\lambda)=0$. By formula (2.1), we see that $-\log P_{\lambda}\{G>0\}=0$, where $G=\frac{dP_{\mu}}{dm}$ is a density of $P_{\mu}$ with respect to an arbitrary measure $m$ such that $P_{\lambda},P_{\mu}\ll m$. Hence $P_{\lambda}\{G=0\}=0$, and Lemma A.1 confirms that $P_{\lambda}\ll P_{\mu}$. ∎

As a corollary of Theorem 5.4, we obtain a simple proof of the following result extending [Tak90] to general nontopological spaces.

Theorem 5.5.

The following are equivalent for Poisson PPs with sigma-finite intensity measures:

(i) $P_{\lambda}\ll P_{\mu}$.

(ii) $\lambda\ll\mu$ and $H(\lambda,\mu)<\infty$.

Proof.

Assume that $\lambda\ll\mu$ and $H(\lambda,\mu)<\infty$. Then $T_{0}(\mu\|\lambda)=0$ by Proposition 3.8. Formula (3.5) implies that $T_{1/2}(\mu\|\lambda)=2H^{2}(\mu,\lambda)=2H^{2}(\lambda,\mu)$ is finite. Theorem 5.4 now implies that $P_{\lambda}\ll P_{\mu}$.

Assume that $P_{\lambda}\ll P_{\mu}$. Theorem 5.4 then implies that $T_{0}(\mu\|\lambda)=0$ and $T_{1/2}(\mu\|\lambda)<\infty$. Then Proposition 3.8 implies that $\lambda\ll\mu$, and formula (3.5) implies that $H(\lambda,\mu)<\infty$. ∎

Kakutani's famous dichotomy [Kak48] states that infinite products of probability measures $\prod_{i}P_{i}$ and $\prod_{i}Q_{i}$, such that $P_{i}$ and $Q_{i}$ are mutually absolutely continuous for all $i$, are either mutually absolutely continuous or mutually singular; there is no middle ground. Earlier results in this and the previous section yield a simple proof of an analogue of Kakutani's dichotomy for Poisson PPs. This result is well known for Polish spaces [Lie75, Kar91], and has also been presented in [Bro71] in terms of a more complicated criterion for intensity measures that is equivalent to $H(\lambda,\mu)<\infty$.

Theorem 5.6.

Let $\lambda$ and $\mu$ be mutually absolutely continuous. Then $P_{\lambda}$ and $P_{\mu}$ are either mutually absolutely continuous or mutually singular, according to whether $H(\lambda,\mu)$ is finite or infinite.

Proof.

Assume that $H(\lambda,\mu)<\infty$. Theorem 5.5 then shows that $P_{\lambda}\ll P_{\mu}$ and $P_{\mu}\ll P_{\lambda}$, so that $P_{\lambda}$ and $P_{\mu}$ are mutually absolutely continuous.

Assume next that $H(\lambda,\mu)=\infty$. Theorem 5.3 then implies that $H(P_{\lambda},P_{\mu})=1$. In light of (3.6) we see that $H^{2}(P_{\lambda},P_{\mu})=1-\exp\left(-\frac{1}{2}R_{1/2}(P_{\lambda}\|P_{\mu})\right)$, from which we conclude that $R_{1/2}(P_{\lambda}\|P_{\mu})=\infty$. It follows [vH14, Theorem 24] that $P_{\lambda}$ and $P_{\mu}$ are mutually singular. ∎

5.3 Existence of a dominating Poisson PP

For any pair of Poisson PP distributions $P_{\lambda},P_{\mu}$, there always exists a probability measure $Q$ on $(N(S),\mathcal{N}(S))$ such that $P_{\lambda}\ll Q$ and $P_{\mu}\ll Q$. For example, we may choose $Q=\frac{1}{2}(P_{\lambda}+P_{\mu})$. For practical purposes (e.g. Monte Carlo simulation), it would be helpful to find a Poisson PP distribution, such as $Q=P_{\lambda+\mu}$, that would also serve as a common dominating probability measure. This is always possible for finite intensity measures but fails in general (Remark 5.8). Remarkably, even when a dominating Poisson PP exists, $\lambda+\mu$ might not be a feasible choice for its intensity.

Theorem 5.7.

Given Poisson PP distributions $P_{\lambda},P_{\mu}$ with sigma-finite intensity measures $\lambda,\mu$, there exists a Poisson PP distribution $P_{\xi}$ such that $P_{\lambda},P_{\mu}\ll P_{\xi}$ if and only if $H(\lambda,\mu)<\infty$.

Proof.

Assume that there exists a Poisson PP distribution $P_{\xi}$ with a sigma-finite intensity measure $\xi$ such that $P_{\lambda},P_{\mu}\ll P_{\xi}$. Theorem 5.5 implies that $H(\lambda,\xi)<\infty$ and $H(\xi,\mu)<\infty$. The triangle inequality (Proposition 3.11) implies that $H(\lambda,\mu)\leq H(\lambda,\xi)+H(\xi,\mu)$. Hence $H(\lambda,\mu)<\infty$.

Assume that $H(\lambda,\mu)<\infty$. Let $f,g$ be densities of $\lambda,\mu$ with respect to the sigma-finite measure $\nu=\lambda+\mu$. Define a measure $\xi(dx)=h(x)\,\nu(dx)$ where $h=\frac{1}{4}(\sqrt{f}+\sqrt{g})^{2}$. Observe that $f\leq 4h$ and $g\leq 4h$ pointwise, and therefore we see that $\lambda\ll\xi$ and $\mu\ll\xi$. Furthermore, because $\sqrt{h}=\frac{1}{2}(\sqrt{f}+\sqrt{g})$, we see that $\sqrt{h}-\sqrt{f}=\frac{1}{2}(\sqrt{g}-\sqrt{f})$ and $\sqrt{h}-\sqrt{g}=\frac{1}{2}(\sqrt{f}-\sqrt{g})$. Then by formula (3.4),

$$H^{2}(\lambda,\xi) = \frac{1}{2}\int(\sqrt{h}-\sqrt{f})^{2}\,d\nu = \frac{1}{8}\int(\sqrt{f}-\sqrt{g})^{2}\,d\nu = \frac{1}{4}H^{2}(\lambda,\mu).$$

Hence $H(\lambda,\xi)=\frac{1}{2}H(\lambda,\mu)$. By symmetry, $H(\mu,\xi)=\frac{1}{2}H(\lambda,\mu)$. We conclude that $H(\lambda,\xi)$ and $H(\mu,\xi)$ are finite. Theorem 5.5 now confirms that $P_{\lambda}\ll P_{\xi}$ and $P_{\mu}\ll P_{\xi}$. ∎

Remark 5.8.

$P_{\lambda},P_{\mu}\ll P_{\lambda+\mu}$ if and only if $H(\lambda,\lambda+\mu)+H(\mu,\lambda+\mu)<\infty$. The latter requirement is stronger than $H(\lambda,\mu)<\infty$. To see why, observe that (see Proposition 3.10) $H^{2}(\lambda,2\lambda)=\frac{1}{2}\int_{S}(\sqrt{2}-\sqrt{1})^{2}\,d\lambda=\frac{1}{2}(\sqrt{2}-1)^{2}\lambda(S)$. Therefore $H(\lambda,\mu)=0$ and $H(\lambda,\lambda+\mu)=\infty$ whenever $\lambda$ is an infinite measure and $\mu=\lambda$. Let us also note that $P_{\lambda},P_{\mu}\ll P_{\lambda+\mu}$ is always true for finite intensity measures $\lambda,\mu$, because Hellinger distances between finite measures are finite, as is seen from (3.4) combined with the inequality $(\sqrt{f}-\sqrt{g})^{2}\leq f+g$.

6 Applications

This section lists various applications of the general formulas derived in the previous sections. Section 6.1 discusses Poisson processes, Section 6.2 Chernoff information of Poisson vectors, Section 6.3 marked Poisson PPs, and Section 6.4 concludes with compound Poisson processes.

6.1 Poisson processes

The counting process of a point pattern $\eta\in N(\mathbb{R}_{+})$ is a function $X\colon\mathbb{R}_{+}\to\mathbb{Z}_{+}$ defined by

$$X_{t} = \int_{\mathbb{R}_{+}}1(s\leq t)\,\eta(ds) = \eta([0,t]). \tag{6.1}$$

Denote by $\operatorname{cou}\colon\eta\mapsto X$ the map induced by (6.1). A Poisson process with intensity measure $\lambda$ is the counting process $X=\operatorname{cou}(\eta)$ of a point pattern $\eta$ sampled from a Poisson PP distribution $P_{\lambda}$ with a sigma-finite intensity measure $\lambda$ on $\mathbb{R}_{+}$. The law of the Poisson process is the pushforward measure $\mathcal{L}(X)=P_{\lambda}\circ\operatorname{cou}^{-1}$. When $\lambda$ admits a density $f$ with respect to the Lebesgue measure on $\mathbb{R}_{+}$, the process $X=(X_{t})_{t\in\mathbb{R}_{+}}$ is called an inhomogeneous Poisson process with intensity function $f$.

Theorem 6.1.

For Poisson processes $X=(X_{t})_{t\in\mathbb{R}_{+}}$ and $Y=(Y_{t})_{t\in\mathbb{R}_{+}}$ with intensity functions $f$ and $g$, the Kullback–Leibler divergence is given by

$$\operatorname{KL}(\mathcal{L}(X)\|\mathcal{L}(Y)) = \int_{0}^{\infty}\Big(f_{t}\log\frac{f_{t}}{g_{t}}+g_{t}-f_{t}\Big)\,dt,$$

the Rényi divergence of order $\alpha\neq 1$ is given by

$$R_{\alpha}(\mathcal{L}(X)\|\mathcal{L}(Y)) = \int_{0}^{\infty}\Big(\frac{\alpha f_{t}+(1-\alpha)g_{t}-f_{t}^{\alpha}g_{t}^{1-\alpha}}{1-\alpha}\Big)\,dt,$$

and the Hellinger distance is given by

$$\operatorname{Hel}(\mathcal{L}(X),\mathcal{L}(Y)) = \sqrt{1-e^{-\frac{1}{2}T_{1/2}(\lambda\|\mu)}},$$

where $T_{1/2}(\lambda\|\mu)=\int_{0}^{\infty}\big(\sqrt{f_{t}}-\sqrt{g_{t}}\big)^{2}\,dt$.

Proof.

Denote by $F(\mathbb{R}_{+},\mathbb{Z}_{+})$ the set of nondecreasing functions $X\colon\mathbb{R}_{+}\to\mathbb{Z}_{+}$ that are right-continuous with left limits (càdlàg), equipped with the sigma-algebra generated by the evaluation maps $X\mapsto X_{t}$. It follows from (6.1) that the map $\operatorname{cou}\colon N(\mathbb{R}_{+})\to F(\mathbb{R}_{+},\mathbb{Z}_{+})$ is a measurable bijection with a measurable inverse. Therefore (Lemma A.4), $R_{\alpha}(\mathcal{L}(X)\|\mathcal{L}(Y))=R_{\alpha}(P_{\lambda}\|P_{\mu})$ with $\lambda(dt)=f(t)\,dt$ and $\mu(dt)=g(t)\,dt$. Theorem 5.1 implies that $R_{\alpha}(P_{\lambda}\|P_{\mu})=T_{\alpha}(\lambda\|\mu)$, and the first two claims follow by (3.1).

Next, we note by (3.6) that $H^{2}(\mathcal{L}(X),\mathcal{L}(Y))=1-\exp\big(-\frac{1}{2}R_{1/2}(\mathcal{L}(X)\|\mathcal{L}(Y))\big)$, so that the last claim follows from $R_{1/2}(\mathcal{L}(X)\|\mathcal{L}(Y))=R_{1/2}(P_{\lambda}\|P_{\mu})=T_{1/2}(\lambda\|\mu)$. ∎
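As a numerical illustration of Theorem 6.1, the following sketch evaluates the Kullback–Leibler divergence and the Hellinger distance by quadrature for two hypothetical intensity functions $f_{t}=2e^{-t}$ and $g_{t}=e^{-t/2}$, chosen only so that the integrals converge:

```python
import numpy as np
from scipy.integrate import quad

f = lambda t: 2.0 * np.exp(-t)      # intensity function of X (illustrative)
g = lambda t: np.exp(-t / 2)        # intensity function of Y (illustrative)

kl, _ = quad(lambda t: f(t) * np.log(f(t) / g(t)) + g(t) - f(t), 0, np.inf)
t_half, _ = quad(lambda t: (np.sqrt(f(t)) - np.sqrt(g(t)))**2, 0, np.inf)
hel = np.sqrt(1 - np.exp(-0.5 * t_half))   # Hellinger distance of the two laws
print(kl, hel)
```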

6.2 Chernoff information for Poisson vectors

In classical binary hypothesis testing, the task is to estimate a parameter $\theta\in\{0,1\}$ from $n$ independent samples from a probability distribution $F_{\theta}$ on a measurable space $S$. The error rate (i.e. Bayes risk) of a decision rule $\hat{\theta}_{n}\colon S^{n}\to\{0,1\}$, averaged with respect to prior probabilities $\pi_{0},\pi_{1}>0$, is given by $R_{\pi}(\hat{\theta}_{n})=\pi_{0}F_{0}^{\otimes n}(\hat{\theta}_{n}=1)+\pi_{1}F_{1}^{\otimes n}(\hat{\theta}_{n}=0)$. Chernoff's famous theorem [Che52] states that for $n\gg 1$, the minimum error rate scales as

$$\inf_{\hat{\theta}_{n}}R_{\pi}(\hat{\theta}_{n}) = e^{-(1+o(1))\,C(F_{0}\|F_{1})\,n},$$

where

$$C(F_{0}\|F_{1}) = \sup_{\alpha\in(0,1)}(1-\alpha)\,R_{\alpha}(F_{0}\|F_{1})$$

is called the Chernoff information of the test [Nie13].

In a network community detection problem related to a stochastic block model, Abbe and Sandon [AS15] reduced a key estimation task into a binary hypothesis test for the law of a random vector with independent Poisson-distributed components, either having mean vector $H_{0}\colon(\lambda_{1},\dots,\lambda_{K})$ or $H_{1}\colon(\mu_{1},\dots,\mu_{K})$. Equivalently, the law of this random vector can be seen as a Poisson PP distribution on the finite set $\{1,\dots,K\}$ with intensity measure admitting density $k\mapsto\lambda_{k}$ or $k\mapsto\mu_{k}$ with respect to the counting measure. Hence we are looking at a hypothesis test between Poisson PP distributions $F_{0}=P_{\lambda}$ and $F_{1}=P_{\mu}$. With the help of Theorem 5.1 and formula (3.1), we see that the Chernoff information of the test is given by

$$C(P_{\lambda}\|P_{\mu}) = \sup_{\alpha\in(0,1)}\sum_{k=1}^{K}\big(\alpha\lambda_{k}+(1-\alpha)\mu_{k}-\lambda_{k}^{\alpha}\mu_{k}^{1-\alpha}\big).$$

This is what is called the Chernoff–Hellinger divergence in [AS15, RB17, ZT23].
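Since the objective is concave in $\alpha$ (each summand has nonpositive second derivative $-\lambda_{k}^{\alpha}\mu_{k}^{1-\alpha}(\log\lambda_{k}-\log\mu_{k})^{2}$), the supremum is easy to compute numerically. A sketch with illustrative mean vectors:

```python
import numpy as np
from scipy.optimize import minimize_scalar

lam = np.array([3.0, 1.0, 0.5])   # mean vector under H_0 (illustrative)
mu  = np.array([1.0, 2.0, 1.5])   # mean vector under H_1 (illustrative)

def objective(alpha):
    # (1 - alpha) * R_alpha(P_lam||P_mu) = (1 - alpha) * T_alpha(lam||mu)
    return np.sum(alpha*lam + (1 - alpha)*mu - lam**alpha * mu**(1 - alpha))

res = minimize_scalar(lambda a: -objective(a), bounds=(1e-9, 1 - 1e-9),
                      method='bounded')
print(res.x, -res.fun)   # maximising alpha and C(P_lambda||P_mu)
```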

6.3 Marked Poisson PPs

A marking of a (random or nonrandom) set of points with locations $x_{1},x_{2},\dots$ in $S_{1}$ associates with each point $x_{i}$ a random variable $y_{i}$ in $S_{2}$, called a mark, so that the conditional distribution of $y_{i}$ given $x_{i}=x$ is determined by $x$, and the marks are conditionally independent given the locations. The statistics of the marking mechanism are parameterised by the collection of conditional distributions $K\colon x\mapsto K_{x}=\mathcal{L}(y_{i}\,|\,x_{i}=x)$, which constitutes a probability kernel from $S_{1}$ into $S_{2}$. Such a marking mechanism can also be defined for random point patterns that admit a proper enumeration. This is the case for point patterns sampled from a Poisson PP distribution [LP18, Corollary 3.7].

Let $P_{\lambda}$ be a Poisson PP distribution with a sigma-finite intensity measure $\lambda$ on a measurable space $(S_{1},\mathcal{S}_{1})$, and let $K$ be a probability kernel from $(S_{1},\mathcal{S}_{1})$ into $(S_{2},\mathcal{S}_{2})$. A marked Poisson PP with intensity measure $\lambda$ and mark kernel $K$ is then defined by sampling a point pattern $\xi$ on $S_{1}\times S_{2}$ from a Poisson PP distribution $P_{\lambda\otimes K}$ with intensity measure $\lambda\otimes K$ defined by

$$(\lambda\otimes K)(C) = \int_{S_{1}}\int_{S_{2}}1_{C}(x,y)\,K_{x}(dy)\,\lambda(dx),\qquad C\in\mathcal{S}_{1}\otimes\mathcal{S}_{2}.$$

Then the Poisson PP distribution $P_{\lambda}$ equals the law of the point pattern $A\mapsto\xi(A\times S_{2})$ corresponding to the locations of the points in $S_{1}$, and the probability kernel $K$ yields the conditional distributions of the marks [LP18, Section 5.2]. Hence $\xi$ is a marked Poisson PP with intensity measure $\lambda$ and marking kernel $K$.

Theorem 6.2.

If $\xi,\zeta$ are marked Poisson PPs with sigma-finite intensity measures $\lambda,\mu$ and mark kernels $K,L$, respectively, then the Rényi divergence of order $\alpha>0$ is given by $R_{\alpha}(\mathcal{L}(\xi)\|\mathcal{L}(\zeta))=T_{\alpha}(\lambda\otimes K\|\mu\otimes L)$.

Proof.

Because $\mathcal{L}(\xi)=P_{\lambda\otimes K}$ and $\mathcal{L}(\zeta)=P_{\mu\otimes L}$, the claim follows by Theorem 5.1. ∎

Theorem 6.2 combined with Theorem 3.12 now allows one to compute Kullback–Leibler and Rényi divergences of marked Poisson PPs in terms of the intensity measures $\lambda,\mu$ and the kernels $K,L$.

6.4 Compound Poisson processes

A set of points $(t_{i},x_{i})$ associated with time stamps $t_{i}\in\mathbb{R}_{+}$ and labels $x_{i}\in\mathbb{R}^{d}$ can be modelled as a point pattern on $\mathbb{R}_{+}\times\mathbb{R}^{d}$, or as a marked point pattern on $\mathbb{R}_{+}$ with mark space $\mathbb{R}^{d}$. Denote by $N(\mathbb{R}_{+}\times\mathbb{R}^{d})$ the set of point patterns $\eta$ on $\mathbb{R}_{+}\times\mathbb{R}^{d}$ such that $\eta([0,t]\times\mathbb{R}^{d})<\infty$ for all $t\in\mathbb{R}_{+}$. The cumulative process of such a point pattern is a function $X\colon\mathbb{R}_{+}\to\mathbb{R}^{d}$ defined by

$$X_{t} = \int_{[0,t]\times\mathbb{R}^{d}}x\,\eta(ds,dx). \tag{6.2}$$

Denote by $\operatorname{agg}\colon\eta\mapsto X$ the map induced by (6.2). A compound Poisson process with event intensity measure $\lambda$ and increment probability kernel $K$ is the cumulative process $X=\operatorname{agg}(\eta)$ of a point pattern $\eta$ sampled from a Poisson PP distribution $P_{\lambda\otimes K}$ on $N(\mathbb{R}_{+}\times\mathbb{R}^{d})$ with intensity measure $(\lambda\otimes K)(dt,dx)=\lambda(dt)\,K_{t}(dx)$, where $\lambda$ is a locally finite measure on $\mathbb{R}_{+}$ and $K$ is a probability kernel from $\mathbb{R}_{+}$ into $\mathbb{R}^{d}$. The law of the compound Poisson process is the pushforward measure $\mathcal{L}(X)=P_{\lambda\otimes K}\circ\operatorname{agg}^{-1}$. The assumption that $\lambda$ is locally finite implies that $P_{\lambda\otimes K}(N(\mathbb{R}_{+}\times\mathbb{R}^{d}))=1$, so that the right side of (6.2) is well defined for $P_{\lambda\otimes K}$-almost every $\eta$.

The map $\operatorname{agg}\colon\eta\mapsto X$ induced by (6.2) maps $N(\mathbb{R}_{+}\times\mathbb{R}^{d})$ into the set $F(\mathbb{R}_{+},\mathbb{R}^{d})$ of right-continuous and piecewise constant functions from $\mathbb{R}_{+}$ into $\mathbb{R}^{d}$. However, this map is not bijective because: (i) two simultaneous events $(t,x)$ and $(t,y)$ have the same contribution as a single event $(t,x+y)$ to the cumulative process; and (ii) points $(t,0)$ with zero mark do not contribute anything. To rule out such identifiability issues, we will restrict to the set $N_{s}(\mathbb{R}_{+}\times\mathbb{R}^{d})$ of point patterns $\eta\in N(\mathbb{R}_{+}\times\mathbb{R}^{d})$ such that $\eta(\{t\}\times\mathbb{R}^{d})\in\{0,1\}$ for all $t$, and $\eta(\mathbb{R}_{+}\times\{0\})=0$. Accordingly, we assume that $\lambda$ is locally finite, diffuse in the sense that $\lambda(\{t\})=0$ for all $t$, and that the probability kernel $K$ satisfies $K_{t}(\{0\})=0$ for all $t$. Under these assumptions, it follows [LP18, Proposition 6.9] that $P_{\lambda\otimes K}(N_{s}(\mathbb{R}_{+}\times\mathbb{R}^{d}))=1$.

Theorem 6.3.

For compound Poisson processes $X=(X_{t})_{t\in\mathbb{R}_{+}}$ and $Y=(Y_{t})_{t\in\mathbb{R}_{+}}$ with diffuse locally finite event intensity measures $\lambda$ and $\mu$, and increment probability kernels $K$ and $L$ such that $K_{t}(\{0\})=0$ and $L_{t}(\{0\})=0$, the Rényi divergence of order $\alpha>0$ is given by $R_{\alpha}(\mathcal{L}(X)\|\mathcal{L}(Y))=T_{\alpha}(\lambda\otimes K\|\mu\otimes L)$.

Proof.

Let us equip $F(\mathbb{R}_{+},\mathbb{R}^{d})$ with the sigma-algebra generated by the evaluation maps $X\mapsto X_{t}$. The aggregation map $\operatorname{agg}\colon N(\mathbb{R}_{+}\times\mathbb{R}^{d})\to F(\mathbb{R}_{+},\mathbb{R}^{d})$ restricted to $N_{s}(\mathbb{R}_{+}\times\mathbb{R}^{d})$ is bijective, with inverse map given by

$$(\operatorname{agg}^{-1}(X))(C) = \#\{(t,x)\in C\colon X_{t}-X_{t-}=x,\ x\neq 0\},$$

where $X_{t-}=\lim_{s\uparrow t}X_{s}$ for $t>0$ and $X_{t-}=X_{0}$ for $t=0$. Standard techniques (as in the proof of Theorem 5.1) imply that $\operatorname{agg}\colon N_{s}(\mathbb{R}_{+}\times\mathbb{R}^{d})\to F(\mathbb{R}_{+},\mathbb{R}^{d})$ is measurable with a measurable inverse. Because $P_{\lambda\otimes K}$ and $P_{\mu\otimes L}$ have all their mass supported on $N_{s}(\mathbb{R}_{+}\times\mathbb{R}^{d})$, it follows (Lemma A.4) that $R_{\alpha}(\mathcal{L}(X)\|\mathcal{L}(Y))=R_{\alpha}(P_{\lambda\otimes K}\|P_{\mu\otimes L})$. Theorem 5.1 then implies that $R_{\alpha}(\mathcal{L}(X)\|\mathcal{L}(Y))=T_{\alpha}(\lambda\otimes K\|\mu\otimes L)$. ∎

By combining Theorem 6.3 with Theorem 3.12, we may compute information divergences of compound Poisson processes. For example, the Rényi divergence of order $\alpha\notin\{0,1\}$ for compound Poisson processes $X=(X_{t})_{t\in\mathbb{R}_{+}}$ and $Y=(Y_{t})_{t\in\mathbb{R}_{+}}$ with event intensity measures $\lambda(dt)=f_{t}\,dt$ and $\mu(dt)=g_{t}\,dt$ and increment probability kernels $K$ and $L$ is given by

$$R_{\alpha}(\mathcal{L}(X)\|\mathcal{L}(Y)) = R_{\alpha}(P_{\lambda}\|P_{\mu})+\int_{0}^{\infty}T_{\alpha}(K_{t}\|L_{t})\,f_{t}^{\alpha}g_{t}^{1-\alpha}\,dt.$$

This formula demonstrates how the information content decomposes into two parts: $R_{\alpha}(P_{\lambda}\|P_{\mu})$, associated with only observing the jump instants of the compound Poisson processes, and the additional term $\int_{0}^{\infty}T_{\alpha}(K_{t}\|L_{t})\,f_{t}^{\alpha}g_{t}^{1-\alpha}\,dt$, characterising the information gain when we also observe the jump sizes.
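A numerical illustration of this decomposition, assuming constant event intensities $f_{t}=2$, $g_{t}=3$ on $[0,1]$ and time-homogeneous two-point increment kernels (all values illustrative):

```python
import numpy as np

alpha, T = 0.6, 1.0
f, g = 2.0, 3.0                 # constant event intensities on [0, T]
K = np.array([0.7, 0.3])        # increment kernel K_t (time-homogeneous)
L = np.array([0.4, 0.6])        # increment kernel L_t

# R_alpha(P_lambda||P_mu): Tsallis divergence of the jump-time intensities.
r_times = T * (alpha*f + (1 - alpha)*g - f**alpha * g**(1 - alpha)) / (1 - alpha)
# Additional information gain from observing the jump sizes.
t_marks = np.sum((alpha*K + (1 - alpha)*L - K**alpha * L**(1 - alpha))
                 / (1 - alpha))
gain = T * t_marks * f**alpha * g**(1 - alpha)
print(r_times + gain)   # R_alpha(L(X)||L(Y)) by the displayed formula
```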

7 Proofs

This section contains the proofs of the main results, with some of the technical parts postponed to the appendix.

7.1 Proof of Theorem 3.1

Lemma 7.1.

The Rényi divergence of Poisson distributions $p_{s}$ and $p_{t}$ with means $s,t\in\mathbb{R}_{+}$ is given by

$$R_{\alpha}(p_{s}\|p_{t}) = \begin{cases}1(s=0)\,t,&\alpha=0,\\ \frac{\alpha s+(1-\alpha)t-s^{\alpha}t^{1-\alpha}}{1-\alpha},&\alpha\notin\{0,1\},\\ s\log\frac{s}{t}+t-s,&\alpha=1.\end{cases} \tag{7.1}$$

Note that $R_{\alpha}(p_{s}\|p_{t})=\infty$ when $\alpha\geq 1$, $s>0$, and $t=0$, by our 0-division conventions.
Proof.

Assume first that $s,t>0$. By definition, the Rényi divergence of order $\alpha\notin\{0,1\}$ equals $R_{\alpha}(p_{s}\|p_{t})=\frac{1}{\alpha-1}\log Z_{\alpha}(p_{s}\|p_{t})$, where

$$Z_{\alpha}(p_{s}\|p_{t}) = \sum_{x=0}^{\infty}\Big(e^{-s}\frac{s^{x}}{x!}\Big)^{\alpha}\Big(e^{-t}\frac{t^{x}}{x!}\Big)^{1-\alpha} = \exp\Big(-\big(\alpha s+(1-\alpha)t-s^{\alpha}t^{1-\alpha}\big)\Big).$$

By taking logarithms, (7.1) follows for $\alpha\notin\{0,1\}$. We also note that $\log\frac{e^{-s}s^{x}/x!}{e^{-t}t^{x}/x!}=x\log\frac{s}{t}+t-s$, and taking expectations with respect to $p_{s}$ yields (7.1) for $\alpha=1$. Furthermore, when $s,t>0$, both $p_{s}$ and $p_{t}$ assign strictly positive probabilities to all nonnegative integers. Therefore, $R_{0}(p_{s}\|p_{t})=-\log p_{t}(\mathbb{Z}_{+})=0$, confirming (7.1) for $\alpha=0$.

We also note that the Poisson distribution with mean 0 equals the Dirac measure at 0. Therefore, a simple computation shows that for all $s,t>0$,

$$R_{\alpha}(p_{s}\|\delta_{0}) = \begin{cases}0,&\alpha=0,\\ \frac{\alpha}{1-\alpha}s,&\alpha\in(0,1),\\ \infty,&\alpha\geq 1,\end{cases}$$

and $R_{\alpha}(\delta_{0}\|p_{t})=t$ for all $\alpha\in[0,\infty)$. Finally, $R_{\alpha}(\delta_{0}\|\delta_{0})=0$ for all $\alpha\in[0,\infty)$. By recalling our conventions for dividing by 0, we may conclude that (7.1) holds for all $s,t,\alpha\in[0,\infty)$. ∎

Proof of Theorem 3.1.

Let λ,μ\lambda,\mu be sigma-finite measures admitting densities f,g:S+f,g\colon S\to\mathbb{R}_{+} with respect to a sigma-finite measure ν\nu on a measurable space (S,𝒮)(S,\mathcal{S}). Let pf(x)p_{f(x)} and pg(x)p_{g(x)} be Poisson distributions with means f(x)f(x) and g(x)g(x). By applying the Rényi divergence formula for Poisson distributions in Lemma 7.1, we find that

SRα(pf(x)pg(x))ν(dx)\displaystyle\int_{S}R_{\alpha}(p_{f(x)}\|p_{g(x)})\,\nu(dx)
={S1(f=0)g𝑑ν,α=0,Sαf+(1α)gfαg1α1α𝑑ν,α{0,1},S(flogfg+gf)𝑑ν,α=1.\displaystyle\quad\ =\ \begin{cases}\int_{S}1(f=0)g\,d\nu,&\quad\alpha=0,\\ \int_{S}\frac{\alpha f+(1-\alpha)g-f^{\alpha}g^{1-\alpha}}{1-\alpha}\,d\nu,&\quad\alpha\notin\{0,1\},\\ \int_{S}\left(f\log\frac{f}{g}+g-f\right)\,d\nu,&\quad\alpha=1.\end{cases}

By comparing this with the definition of the Tsallis divergence (3.1), we find that

Tα(λμ)=SRα(pf(x)pg(x))ν(dx).T_{\alpha}(\lambda\|\mu)\ =\ \int_{S}R_{\alpha}(p_{f(x)}\|p_{g(x)})\,\nu(dx). (7.2)

Because Rényi divergences of probability measures are nonnegative, (7.2) shows that Tα(λμ)T_{\alpha}(\lambda\|\mu) is a well-defined element in [0,][0,\infty] for all α+\alpha\in\mathbb{R}_{+}. We also know [vH14, Theorem 3] that Rényi divergences of probability measures are nondecreasing in α\alpha, so it also follows from (7.2) that Tα(λμ)T_{\alpha}(\lambda\|\mu) is nondecreasing in α\alpha.

Let us next verify that αTα(λμ)\alpha\mapsto T_{\alpha}(\lambda\|\mu) is continuous on A={α+:Tα(λμ)<}A=\{\alpha\in\mathbb{R}_{+}\colon T_{\alpha}(\lambda\|\mu)<\infty\}. To this end, assume that AA is nonempty and αnα\alpha_{n}\to\alpha for some αn,αA\alpha_{n},\alpha\in A. Because Tα(λμ)T_{\alpha}(\lambda\|\mu) is nondecreasing in α\alpha, we see that AA is an interval, and that there exists a number βA\beta\in A such that αn,αβ\alpha_{n},\alpha\leq\beta for all nn. Denote ra(x)=Ra(pf(x)pg(x))r_{a}(x)=R_{a}(p_{f(x)}\|p_{g(x)}) for a,x+a,x\in\mathbb{R}_{+}, and let U={xS:rβ(x)<}U=\{x\in S\colon r_{\beta}(x)<\infty\}. Because Tβ(λμ)=Srβ𝑑νT_{\beta}(\lambda\|\mu)=\int_{S}r_{\beta}\,d\nu is finite, it follows that ν(Uc)=0\nu(U^{c})=0. Therefore,

Tαn(λμ)=Srαn𝑑ν=Urαn𝑑ν.T_{\alpha_{n}}(\lambda\|\mu)\ =\ \int_{S}r_{\alpha_{n}}\,d\nu\ =\ \int_{U}r_{\alpha_{n}}\,d\nu.

For any xUx\in U, the monotonicity of Rényi divergences [vH14, Theorem 3] implies that rα(x),rαn(x)rβ(x)r_{\alpha}(x),r_{\alpha_{n}}(x)\leq r_{\beta}(x). In particular, rα(x),rαn(x)r_{\alpha}(x),r_{\alpha_{n}}(x) are finite for xUx\in U. The continuity of Rényi divergences [vH14, Theorem 7] then implies that rαnrαr_{\alpha_{n}}\to r_{\alpha} pointwise on UU. Lebesgue’s dominated convergence theorem then implies that

Tαn(λμ)=Urαn𝑑νUrα𝑑ν=Tα(λμ).T_{\alpha_{n}}(\lambda\|\mu)\ =\ \int_{U}r_{\alpha_{n}}\,d\nu\ \to\ \int_{U}r_{\alpha}\,d\nu\ =\ T_{\alpha}(\lambda\|\mu).

Finally, let us verify that the right side of (3.1) depends neither on the choice of the densities nor on the reference measure. Assume that $\lambda,\mu$ admit densities $f_{1},g_{1}\colon S\to\mathbb{R}_{+}$ with respect to a sigma-finite measure $\nu_{1}$, and densities $f_{2},g_{2}\colon S\to\mathbb{R}_{+}$ with respect to a sigma-finite measure $\nu_{2}$. Define $\nu=\nu_{1}+\nu_{2}$. Then $\nu_{1},\nu_{2}\ll\nu$ and $\nu$ is sigma-finite. The Radon–Nikodym theorem [Kal02, Theorem 2.10] implies that there exist densities $h_{1},h_{2}\colon S\to\mathbb{R}_{+}$ of $\nu_{1},\nu_{2}$ with respect to $\nu$. Then

λ(A)=Afi𝑑νi=Af~i𝑑νfor i=1,2,\lambda(A)\ =\ \int_{A}f_{i}\,d\nu_{i}\ =\ \int_{A}\tilde{f}_{i}\,d\nu\quad\text{for $i=1,2$},

where $\tilde{f}_{i}=f_{i}h_{i}$. We see that both $\tilde{f}_{1}$ and $\tilde{f}_{2}$ are densities of $\lambda$ with respect to $\nu$. The Radon–Nikodym theorem [Kal02, Theorem 2.10] implies that $\tilde{f}_{1}=\tilde{f}_{2}$ $\nu$-almost everywhere. Similarly, writing $\tilde{g}_{i}=g_{i}h_{i}$, we see that both $\tilde{g}_{1}$ and $\tilde{g}_{2}$ are densities of $\mu$ with respect to $\nu$, so that $\tilde{g}_{1}=\tilde{g}_{2}$ $\nu$-almost everywhere. Formula (7.1) shows that Rényi divergences of Poisson distributions are homogeneous in the sense that $R_{\alpha}(p_{cs}\|p_{ct})=cR_{\alpha}(p_{s}\|p_{t})$ for all $c\in\mathbb{R}_{+}$. As a consequence, we see that

SRα(pf~1(x)pg~1(x))ν(dx)\displaystyle\int_{S}R_{\alpha}(p_{\tilde{f}_{1}(x)}\|p_{\tilde{g}_{1}(x)})\,\nu(dx) =SRα(pf1(x)pg1(x))ν1(dx),\displaystyle\ =\ \int_{S}R_{\alpha}(p_{f_{1}(x)}\|p_{g_{1}(x)})\,\nu_{1}(dx),
SRα(pf~2(x)pg~2(x))ν(dx)\displaystyle\int_{S}R_{\alpha}(p_{\tilde{f}_{2}(x)}\|p_{\tilde{g}_{2}(x)})\,\nu(dx) =SRα(pf2(x)pg2(x))ν2(dx).\displaystyle\ =\ \int_{S}R_{\alpha}(p_{f_{2}(x)}\|p_{g_{2}(x)})\,\nu_{2}(dx).

Because f~1=f~2\tilde{f}_{1}=\tilde{f}_{2} and g~1=g~2\tilde{g}_{1}=\tilde{g}_{2} ν\nu-almost everywhere, it follows that all of the above integrals are equal to each other. In particular,

SRα(pf1(x)pg1(x))ν1(dx)=SRα(pf2(x)pg2(x))ν2(dx).\int_{S}R_{\alpha}(p_{f_{1}(x)}\|p_{g_{1}(x)})\,\nu_{1}(dx)\ =\ \int_{S}R_{\alpha}(p_{f_{2}(x)}\|p_{g_{2}(x)})\,\nu_{2}(dx).

In light of (7.2), we conclude that the value of $T_{\alpha}(\lambda\|\mu)$ as defined by formula (7.2) is the same for both triples $(f_{1},g_{1},\nu_{1})$ and $(f_{2},g_{2},\nu_{2})$. ∎
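The invariance just proved can also be observed numerically. The following Python sketch (assuming NumPy; the three-point ground space and all weights are illustrative) evaluates the integral in (3.1) with two different reference measures and correspondingly rescaled densities, and the two values coincide:

import numpy as np

lam = np.array([0.2, 1.5, 3.0])   # weights of the intensity measure lambda
mu  = np.array([0.8, 0.5, 2.0])   # weights of the intensity measure mu
alpha = 0.6

def tsallis(f, g, nu):
    # formula (3.1) on a finite space, for alpha not in {0, 1}
    integrand = (alpha * f + (1 - alpha) * g - f ** alpha * g ** (1 - alpha)) / (1 - alpha)
    return float(np.sum(nu * integrand))

nu1 = np.array([1.0, 1.0, 1.0])   # counting measure as reference
nu2 = np.array([2.0, 0.5, 4.0])   # a different reference measure
print(tsallis(lam / nu1, mu / nu1, nu1), tsallis(lam / nu2, mu / nu2, nu2))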

7.2 Proof of Theorem 3.12

Proof of Theorem 3.12.

Because λ(dt)=ftν(dt)\lambda(dt)=f_{t}\nu(dt) and Kt(dx)=kt(x)Mt(dx)K_{t}(dx)=k_{t}(x)M_{t}(dx), we see that

(λK)(C)\displaystyle(\lambda\otimes K)(C) =S1(S21C(t,x)Kt(dx))λ(dt)\displaystyle\ =\ \int_{S_{1}}\left(\int_{S_{2}}1_{C}(t,x)\,K_{t}(dx)\right)\lambda(dt)
=S1(S21C(t,x)kt(x)Mt(dx))ftν(dt)\displaystyle\ =\ \int_{S_{1}}\left(\int_{S_{2}}1_{C}(t,x)\,k_{t}(x)\,M_{t}(dx)\right)f_{t}\nu(dt)
=S1(S21C(t,x)ftkt(x)Mt(dx))ν(dt)\displaystyle\ =\ \int_{S_{1}}\left(\int_{S_{2}}1_{C}(t,x)\,f_{t}k_{t}(x)\,M_{t}(dx)\right)\nu(dt)
=Cftkt(x)(νM)(dt,dx)\displaystyle\ =\ \int_{C}f_{t}k_{t}(x)\,(\nu\otimes M)(dt,dx)

for all measurable $C\subset S_{1}\times S_{2}$, so that $(t,x)\mapsto f_{t}k_{t}(x)$ is a density of $\lambda\otimes K$ with respect to $\nu\otimes M$. Similarly, $(t,x)\mapsto g_{t}\ell_{t}(x)$ is a density of $\mu\otimes L$ with respect to $\nu\otimes M$.

By (3.1), the Tsallis divergence of order $\alpha\notin\{0,1\}$ is given by

Tα(λKμL)=S1(S2τα(t,x)Mt(dx))ν(dt),T_{\alpha}(\lambda\otimes K\|\mu\otimes L)\ =\ \int_{S_{1}}\left(\int_{S_{2}}\tau_{\alpha}(t,x)\,M_{t}(dx)\right)\,\nu(dt), (7.3)

where the integrand equals

τα(t,x)=αftkt(x)+(1α)gtt(x)ftαgt1αkt(x)αt(x)1α1α.\tau_{\alpha}(t,x)\ =\ \frac{\alpha f_{t}k_{t}(x)+(1-\alpha)g_{t}\ell_{t}(x)-f_{t}^{\alpha}g_{t}^{1-\alpha}k_{t}(x)^{\alpha}\ell_{t}(x)^{1-\alpha}}{1-\alpha}.

The integrand may also be written as

τα(t,x)\displaystyle\tau_{\alpha}(t,x) =αftkt(x)+(1α)gtt(x)1α\displaystyle\ =\ \frac{\alpha f_{t}k_{t}(x)+(1-\alpha)g_{t}\ell_{t}(x)}{1-\alpha}
(αkt(x)+(1α)t(x)1α)ftαgt1α\displaystyle\qquad-\left(\frac{\alpha k_{t}(x)+(1-\alpha)\ell_{t}(x)}{1-\alpha}\right)f_{t}^{\alpha}g_{t}^{1-\alpha}
+(αkt(x)+(1α)t(x)kt(x)αt(x)1α1α)ftαgt1α.\displaystyle\qquad+\left(\frac{\alpha k_{t}(x)+(1-\alpha)\ell_{t}(x)-k_{t}(x)^{\alpha}\ell_{t}(x)^{1-\alpha}}{1-\alpha}\right)f_{t}^{\alpha}g_{t}^{1-\alpha}.

We note that S2kt(x)Mt(dx)=1\int_{S_{2}}k_{t}(x)\,M_{t}(dx)=1 and S2t(x)Mt(dx)=1\int_{S_{2}}\ell_{t}(x)\,M_{t}(dx)=1, and that

S2(αkt(x)+(1α)t(x)kt(x)αt(x)1α1α)Mt(dx)=Tα(KtLt).\int_{S_{2}}\left(\frac{\alpha k_{t}(x)+(1-\alpha)\ell_{t}(x)-k_{t}(x)^{\alpha}\ell_{t}(x)^{1-\alpha}}{1-\alpha}\right)M_{t}(dx)\ =\ T_{\alpha}(K_{t}\|L_{t}).

It follows that the inner integral in (7.3) equals

S2τα(t,x)Mt(dx)\displaystyle\int_{S_{2}}\tau_{\alpha}(t,x)\,M_{t}(dx) =αft+(1α)gtftαgt1α1α+Tα(KtLt)ftαgt1α.\displaystyle\ =\ \frac{\alpha f_{t}+(1-\alpha)g_{t}-f_{t}^{\alpha}g_{t}^{1-\alpha}}{1-\alpha}+T_{\alpha}(K_{t}\|L_{t})\,f_{t}^{\alpha}g_{t}^{1-\alpha}.

By integrating both sides against ν(dt)\nu(dt), we obtain (3.8) for α{0,1}\alpha\notin\{0,1\}.

Let us now consider the case with α=1\alpha=1. In this case Tα(λKμL)T_{\alpha}(\lambda\otimes K\|\mu\otimes L) is again given by (7.3), but now we replace τα\tau_{\alpha} by

τ1(t,x)\displaystyle\tau_{1}(t,x) =ftkt(x)logftkt(x)gtt(x)+gtt(x)ftkt(x)\displaystyle\ =\ f_{t}k_{t}(x)\log\frac{f_{t}k_{t}(x)}{g_{t}\ell_{t}(x)}+g_{t}\ell_{t}(x)-f_{t}k_{t}(x)
=ftkt(x)logftgt+ftkt(x)logkt(x)t(x)+gtt(x)ftkt(x).\displaystyle\ =\ f_{t}k_{t}(x)\log\frac{f_{t}}{g_{t}}+f_{t}k_{t}(x)\log\frac{k_{t}(x)}{\ell_{t}(x)}+g_{t}\ell_{t}(x)-f_{t}k_{t}(x).

By integrating the above equation against Mt(dx)M_{t}(dx), we find that

S2τ1(t,x)Mt(dx)\displaystyle\int_{S_{2}}\tau_{1}(t,x)\,M_{t}(dx) =ftlogftgt+gtft+ftT1(KtLt).\displaystyle\ =\ f_{t}\log\frac{f_{t}}{g_{t}}+g_{t}-f_{t}+f_{t}T_{1}(K_{t}\|L_{t}).

By further integrating this against ν(dt)\nu(dt), it follows that

S1(S2τ1(t,x)Mt(dx))ν(dt)=T1(λμ)+S1T1(KtLt)λ(dt),\displaystyle\int_{S_{1}}\left(\int_{S_{2}}\tau_{1}(t,x)\,M_{t}(dx)\right)\nu(dt)\ =\ T_{1}(\lambda\|\mu)+\int_{S_{1}}T_{1}(K_{t}\|L_{t})\,\lambda(dt),

from which we conclude the validity of (3.8) for α=1\alpha=1.

Finally, for α=0\alpha=0 we note that

{(t,x):ftkt(x)=0}\displaystyle\{(t,x)\colon f_{t}k_{t}(x)=0\}
=({f=0}×S2){(t,x):ft0,kt(x)=0}.\displaystyle\ =\ (\{f=0\}\times S_{2})\cup\{(t,x)\colon f_{t}\neq 0,\,k_{t}(x)=0\}.

Hence by (3.1),

T0(λKμL)\displaystyle T_{0}(\lambda\otimes K\|\mu\otimes L) =(μL){(t,x):ftkt(x)=0}\displaystyle\ =\ (\mu\otimes L)\{(t,x)\colon f_{t}k_{t}(x)=0\}
=μ{f=0}+f0Lt{kt=0}μ(dt)\displaystyle\ =\ \mu\{f=0\}+\int_{f\neq 0}L_{t}\{k_{t}=0\}\,\mu(dt)
=\ T_{0}(\lambda\|\mu)+\int_{f\neq 0}T_{0}(K_{t}\|L_{t})\,\mu(dt). ∎

7.3 Proof of Theorem 4.1

The following result shows that the embeddings $(x_{1},\dots,x_{n})\mapsto\sum_{i=1}^{n}\delta_{x_{i}}$ are measurable even when singleton sets in $(S,\mathcal{S})$ might be nonmeasurable. This is why there is no need to deal with ‘chunks’ as in [Bro71], which yields a conceptually simplified proof of Theorem 4.1.

Lemma 7.2.

For any $n\geq 1$, the function $\iota_{n}\colon S^{n}\to N(S)$ defined by $\iota_{n}(x_{1},\dots,x_{n})=\sum_{i=1}^{n}\delta_{x_{i}}$ is measurable.

Proof.

Fix a set A𝒮A\in\mathcal{S} and an integer k0k\geq 0. Let C={η:η(A)=k}C=\{\eta\colon\eta(A)=k\}. Then the set

ιn1(C)\displaystyle\iota_{n}^{-1}(C) ={(x1,,xn)Sn:i=1nδxi(A)=k}\displaystyle\ =\ \Big{\{}(x_{1},\dots,x_{n})\in S^{n}\colon\sum_{i=1}^{n}\delta_{x_{i}}(A)=k\Big{\}}

consists of the $n$-tuples in $S^{n}$ for which exactly $k$ coordinates belong to $A$ and the remaining coordinates belong to $A^{c}$. The set of all such $n$-tuples can be written as

ιn1(C)=b{0,1}n:bi=k1A1({b1})××1A1({bn}),\iota_{n}^{-1}(C)\ =\ \bigcup_{b\in\{0,1\}^{n}:\sum b_{i}=k}1_{A}^{-1}(\{b_{1}\})\times\cdots\times 1_{A}^{-1}(\{b_{n}\}),

where $1_{A}^{-1}(\{b_{k}\})$ denotes the preimage of $\{b_{k}\}$ under the indicator function $1_{A}\colon S\to\{0,1\}$, and equals $A$ for $b_{k}=1$ and $A^{c}$ for $b_{k}=0$. Because all sets appearing in the finite union on the right are products of $A$ and $A^{c}$, we conclude that $\iota_{n}^{-1}(C)\in\mathcal{S}^{\otimes n}$. Because $\mathcal{N}(S)$ is generated by sets of this form, it follows that $\iota_{n}^{-1}(C)\in\mathcal{S}^{\otimes n}$ for all $C\in\mathcal{N}(S)$. ∎

Proof of Theorem 4.1.

Assume that Pλ,PμP_{\lambda},P_{\mu} are Poisson PP distributions with finite intensity measures λ,μ\lambda,\mu such that λμ\lambda\ll\mu. To rule out trivialities, we assume that λ,μ\lambda,\mu are nonzero. The Poisson PP distributions may [LP18, Proposition 3.5] then be represented as

Pλ=n0eλ(S)λ(S)nn!λ1nιn1,Pμ=n0eμ(S)μ(S)nn!μ1nιn1,P_{\lambda}\ =\ \sum_{n\geq 0}e^{-\lambda(S)}\frac{\lambda(S)^{n}}{n!}\,\lambda_{1}^{\otimes n}\!\circ\!\iota_{n}^{-1},\qquad P_{\mu}\ =\ \sum_{n\geq 0}e^{-\mu(S)}\frac{\mu(S)^{n}}{n!}\,\mu_{1}^{\otimes n}\!\circ\!\iota_{n}^{-1}, (7.4)

where λ1n,μ1n\lambda_{1}^{\otimes n},\mu_{1}^{\otimes n} are nn-fold products of probability measures λ1=λ/λ(S)\lambda_{1}=\lambda/\lambda(S) and μ1=μ/μ(S)\mu_{1}=\mu/\mu(S) on SS, and ιn(x1,,xn)=i=1nδxi\iota_{n}(x_{1},\dots,x_{n})=\sum_{i=1}^{n}\delta_{x_{i}}. (Lemma 7.2 guarantees that the maps ιn:SnN(S)\iota_{n}\colon S^{n}\to N(S) are measurable, and therefore Pλ,PμP_{\lambda},P_{\mu} are well-defined probability measures on N(S)N(S).)

Recall that ϕ:S+\phi\colon S\to\mathbb{R}_{+} is a density of λ\lambda with respect to μ\mu, and consider the function Φ:N(S)+\Phi\colon N(S)\to\mathbb{R}_{+} such that

Φ(η)=eμ(S)λ(S)i=1nϕ(xi)forη=i=1nδxi,\Phi(\eta)=e^{\mu(S)-\lambda(S)}\prod_{i=1}^{n}\phi(x_{i})\qquad\text{for}\quad\eta=\sum_{i=1}^{n}\delta_{x_{i}}, (7.5)

and Φ(η)=0\Phi(\eta)=0 for η(S)=\eta(S)=\infty. We will show that Φ\Phi is a density of PλP_{\lambda} with respect to PμP_{\mu}. To do this, fix a measurable set CN(S)C\subset N(S), and note that

CΦ(η)Pμ(dη)\displaystyle\int_{C}\Phi(\eta)\,P_{\mu}(d\eta) =n0eμ(S)μ(S)nn!CΦ(η)μ1nιn1(dη)\displaystyle\ =\ \sum_{n\geq 0}e^{-\mu(S)}\frac{\mu(S)^{n}}{n!}\int_{C}\Phi(\eta)\,\mu_{1}^{\otimes n}\!\circ\!\iota_{n}^{-1}(d\eta)
=n0eμ(S)μ(S)nn!ιn1(C)Φ(ιn(x))μ1n(dx)\displaystyle\ =\ \sum_{n\geq 0}e^{-\mu(S)}\frac{\mu(S)^{n}}{n!}\int_{\iota_{n}^{-1}(C)}\Phi(\iota_{n}(x))\,\mu_{1}^{\otimes n}(dx)
=n0eλ(S)μ(S)nn!ιn1(C)ϕn(x)μ1n(dx),\displaystyle\ =\ \sum_{n\geq 0}e^{-\lambda(S)}\frac{\mu(S)^{n}}{n!}\int_{\iota_{n}^{-1}(C)}\phi^{\otimes n}(x)\,\mu_{1}^{\otimes n}(dx),

where ϕn(x1,,xn)=i=1nϕ(xi)\phi^{\otimes n}(x_{1},\dots,x_{n})=\prod_{i=1}^{n}\phi(x_{i}). We also note that μ(S)λ(S)ϕ\frac{\mu(S)}{\lambda(S)}\phi is a density of λ1\lambda_{1} with respect to μ1\mu_{1}, and therefore, (μ(S)λ(S))nϕn(\frac{\mu(S)}{\lambda(S)})^{n}\phi^{\otimes n} is a density of λ1n\lambda_{1}^{\otimes n} with respect to μ1n\mu_{1}^{\otimes n}. Then

λ1n(ιn1(C))=ιn1(C)(μ(S)λ(S))nϕn(x)μ1n(dx),\lambda_{1}^{\otimes n}(\iota_{n}^{-1}(C))\ =\ \int_{\iota_{n}^{-1}(C)}\left(\frac{\mu(S)}{\lambda(S)}\right)^{n}\phi^{\otimes n}(x)\,\mu_{1}^{\otimes n}(dx),

and it follows that

CΦ(η)Pμ(dη)\displaystyle\int_{C}\Phi(\eta)\,P_{\mu}(d\eta) =n0eλ(S)μ(S)nn!(λ(S)μ(S))nλ1n(ιn1(C))\displaystyle\ =\ \sum_{n\geq 0}e^{-\lambda(S)}\frac{\mu(S)^{n}}{n!}\left(\frac{\lambda(S)}{\mu(S)}\right)^{n}\lambda_{1}^{\otimes n}(\iota_{n}^{-1}(C))
=n0eλ(S)λ(S)nn!λ1n(ιn1(C)).\displaystyle\ =\ \sum_{n\geq 0}e^{-\lambda(S)}\frac{\lambda(S)^{n}}{n!}\lambda_{1}^{\otimes n}(\iota_{n}^{-1}(C)).

In light of (7.4), we conclude that CΦ(η)Pμ(dη)=Pλ(C)\int_{C}\Phi(\eta)\,P_{\mu}(d\eta)=P_{\lambda}(C). Hence Φ\Phi is a density of PλP_{\lambda} with respect to PμP_{\mu}. In particular, PλPμP_{\lambda}\ll P_{\mu}.

Finally, let us verify that the function Φ\Phi can be written in form (4.1). By definition (7.5), we see that Φ(η)=0\Phi(\eta)=0 whenever η(S)=\eta(S)=\infty or η{ϕ=0}>0\eta\{\phi=0\}>0. Hence Φ\Phi vanishes outside the set MΩM\cap\Omega where M={ηN(S):η{ϕ=0}=0}M=\{\eta\in N(S)\colon\eta\{\phi=0\}=0\} and Ω={ηN(S):η(S)<}\Omega=\{\eta\in N(S)\colon\eta(S)<\infty\}. On the other hand, i=1nϕ(xi)=exp(Slogϕdη)\prod_{i=1}^{n}\phi(x_{i})=\exp(\int_{S}\log\phi\,d\eta) for every point pattern in MΩM\cap\Omega of form η=i=1nδxi\eta=\sum_{i=1}^{n}\delta_{x_{i}}. By noting that μ(S)λ(S)=S(1ϕ)𝑑μ\mu(S)-\lambda(S)=\int_{S}(1-\phi)\,d\mu, we see that Φ\Phi may be written as in (4.1), but with 1M1_{M} replaced by 1MΩ1_{M\cap\Omega}. Because Campbell’s theorem implies that Pμ(Ω)=1P_{\mu}(\Omega)=1, we see that Φ\Phi is equal to the function in (4.1) as an element of L1(N(S),𝒩(S),Pμ)L_{1}(N(S),\mathcal{N}(S),P_{\mu}). ∎
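The density (7.5) lends itself to a Monte Carlo check: weighting samples from $P_{\mu}$ by $\Phi$ should reproduce expectations under $P_{\lambda}$, for instance the Laplace functional $\int e^{-\eta(u)}\,P_{\lambda}(d\eta)=\exp(-\int_{S}(1-e^{-u})\,d\lambda)$. The Python sketch below (assuming NumPy) does this for the illustrative choices $f(x)=2+\sin(2\pi x)$, $g\equiv 2$, and $u(x)=3x$ on $S=[0,1]$:

import numpy as np

rng = np.random.default_rng(1)

def trapz(y, x):
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

f = lambda x: 2.0 + np.sin(2.0 * np.pi * x)   # density of lambda w.r.t. Lebesgue
g = 2.0                                        # constant density of mu
u = lambda x: 3.0 * x                          # test function

xg = np.linspace(0.0, 1.0, 20001)
lam_S, mu_S = trapz(f(xg), xg), g

acc, n_rep = 0.0, 100_000
for _ in range(n_rep):
    pts = rng.random(rng.poisson(mu_S))               # one sample from P_mu
    Phi = np.exp(mu_S - lam_S) * np.prod(f(pts) / g)  # density (7.5)
    acc += Phi * np.exp(-np.sum(u(pts)))

exact = np.exp(-trapz((1.0 - np.exp(-u(xg))) * f(xg), xg))
print(acc / n_rep, exact)   # both approximate the Laplace functional of P_lambda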

7.4 Proof of Theorem 4.4

We start by proving the following simple upper bound that confirms that λ\lambda is finite whenever λμ\lambda\ll\mu and H(λ,μ)<H(\lambda,\mu)<\infty for some finite measure μ\mu.

Lemma 7.3.

Assume that $\lambda\ll\mu$ and that $\phi\colon S\to\mathbb{R}_{+}$ is a density of $\lambda$ with respect to $\mu$. Then $\lambda(B)\leq 4\mu(B)+3\int_{B}(\sqrt{\phi}-1)^{2}\,d\mu$ for any measurable $B\subset S$. In particular, $\lambda(S)\leq 4\mu(S)+6H(\lambda,\mu)^{2}$.

Proof.

By writing ϕ=1+(ϕ1)1+(ϕ1)+\phi=1+(\phi-1)\leq 1+(\phi-1)_{+}, we see that λ(B)=Bϕ𝑑μ\lambda(B)=\int_{B}\phi\,d\mu is bounded by

λ(B)μ(B)+B(ϕ1)+𝑑μ.\lambda(B)\ \leq\ \mu(B)+\int_{B}(\phi-1)_{+}\,d\mu. (7.6)

We also note that tt1t+1t\mapsto\frac{t-1}{t+1} is increasing on +\mathbb{R}_{+}, so that

(ϕ1)2=ϕ1ϕ+1(ϕ1)13(ϕ1)for ϕ>4.(\sqrt{\phi}-1)^{2}\ =\ \frac{\sqrt{\phi}-1}{\sqrt{\phi}+1}(\phi-1)\ \geq\ \frac{1}{3}(\phi-1)\qquad\text{for $\phi>4$}.

Because (ϕ1)+3(\phi-1)_{+}\leq 3 for ϕ4\phi\leq 4, it follows that

B(ϕ1)+𝑑μ\displaystyle\int\displaylimits_{B}(\phi-1)_{+}\,d\mu =B{ϕ4}(ϕ1)+𝑑μ+B{ϕ>4}(ϕ1)+𝑑μ\displaystyle\ =\ \int\displaylimits_{B\cap\{\phi\leq 4\}}(\phi-1)_{+}\,d\mu\ +\int\displaylimits_{B\cap\{\phi>4\}}(\phi-1)_{+}\,d\mu
3μ(B)+3B(ϕ1)2𝑑μ.\displaystyle\ \leq\ 3\mu(B)+3\int\displaylimits_{B}(\sqrt{\phi}-1)^{2}\,d\mu.

The first claim follows by combining this with (7.6). The second claim follows by noting that S(ϕ1)2𝑑μ=2H(λ,μ)2\int_{S}(\sqrt{\phi}-1)^{2}\,d\mu=2H(\lambda,\mu)^{2} due to Proposition 3.10. ∎
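A quick numerical stress test of the bound, assuming the identity $2H(\lambda,\mu)^{2}=\int_{S}(\sqrt{\phi}-1)^{2}\,d\mu$ from Proposition 3.10 and randomly generated finite examples (Python, assuming NumPy):

import numpy as np

rng = np.random.default_rng(7)
for _ in range(1000):
    n = int(rng.integers(1, 20))
    mu = rng.gamma(1.0, 1.0, size=n)     # weights of a finite measure mu
    phi = rng.gamma(0.5, 4.0, size=n)    # density of lambda w.r.t. mu
    lam_S = np.sum(phi * mu)
    H2 = 0.5 * np.sum((np.sqrt(phi) - 1.0) ** 2 * mu)   # H(lambda, mu)^2
    assert lam_S <= 4.0 * np.sum(mu) + 6.0 * H2 + 1e-9
print('bound holds on all sampled examples')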

Proof of Theorem 4.4.

Consider sigma-finite intensity measures such that λμ\lambda\ll\mu and H(λ,μ)<H(\lambda,\mu)<\infty. Fix a density ϕ=dλdμ\phi=\frac{d\lambda}{d\mu}, and a sequence SnSS_{n}\uparrow S such that μ(Sn)<\mu(S_{n})<\infty for all nn.

(i) Computing a truncated density. Let $P_{\lambda_{n}}$ and $P_{\mu_{n}}$ be Poisson PP distributions with truncated intensity measures $\lambda_{n}(B)=\lambda(B\cap S_{n})$ and $\mu_{n}(B)=\mu(B\cap S_{n})$ on $(S,\mathcal{S})$. We also employ the same notation $\eta_{n}(B)=\eta(B\cap S_{n})$ for truncations of point patterns $\eta\in N(S)$. We note that $\lambda_{n}\ll\mu_{n}$, and that $\phi$ serves also as a density of $\lambda_{n}$ with respect to $\mu_{n}$. Our choice of $S_{n}$ implies that $\mu_{n}$ is a finite measure. We also note that $\lambda_{n}$ is finite because $\lambda_{n}(S)\leq 4\mu_{n}(S)+3\int_{S}(\sqrt{\phi}-1)^{2}\,d\mu_{n}\leq 4\mu(S_{n})+6H^{2}(\lambda,\mu)$ due to Lemma 7.3. Theorem 4.1 now implies that $P_{\lambda_{n}}\ll P_{\mu_{n}}$, with a likelihood ratio given by

Φn(η)= 1M(η)exp(S(1ϕ)𝑑μn+Slogϕdη),\Phi_{n}(\eta)\ =\ 1_{M}(\eta)\,\exp\bigg{(}\int_{S}(1-\phi)\,d\mu_{n}+\int_{S}\log\phi\,d\eta\bigg{)}, (7.7)

where

M={ηN(S):η{ϕ=0}=0}.M\ =\ \{\eta\in N(S)\colon\eta\{\phi=0\}=0\}. (7.8)

(ii) Approximate Laplace functional. Fix a measurable function u:S+u\colon S\to\mathbb{R}_{+}. Monotone convergence of integrals then implies that ηn(u)=η(u1Sn)η(u)\eta_{n}(u)=\eta(u1_{S_{n}})\uparrow\eta(u) for all ηN(S)\eta\in N(S). Lebesgue’s dominated convergence theorem then implies that

N(S)eη(u)Pλ(dη)=limnN(S)eηn(u)Pλ(dη).\int_{N(S)}e^{-\eta(u)}P_{\lambda}(d\eta)\ =\ \lim_{n\to\infty}\int_{N(S)}e^{-\eta_{n}(u)}P_{\lambda}(d\eta). (7.9)

By [LP18, Theorem 5.2],

N(S)eηn(u)Pλ(dη)\displaystyle\int_{N(S)}e^{-\eta_{n}(u)}P_{\lambda}(d\eta) =N(S)eη(u)Pλn(dη)\displaystyle\ =\ \int_{N(S)}e^{-\eta(u)}P_{\lambda_{n}}(d\eta)
=N(S)eη(u)Φn(η)Pμn(dη)\displaystyle\ =\ \int_{N(S)}e^{-\eta(u)}\Phi_{n}(\eta)P_{\mu_{n}}(d\eta)
=N(S)eηn(u)Φn(ηn)Pμ(dη).\displaystyle\ =\ \int_{N(S)}e^{-\eta_{n}(u)}\Phi_{n}(\eta_{n})P_{\mu}(d\eta).

Together with (7.9), we conclude that

N(S)eη(u)Pλ(dη)=limnN(S)eηn(u)Φn(ηn)Pμ(dη).\int_{N(S)}e^{-\eta(u)}\,P_{\lambda}(d\eta)\ =\ \lim_{n\to\infty}\int_{N(S)}e^{-\eta_{n}(u)}\Phi_{n}(\eta_{n})\,P_{\mu}(d\eta). (7.10)

(iii) Identifying the limiting density. Let us next identify the limit of Φn(ηn)\Phi_{n}(\eta_{n}) as nn\to\infty. In light of (7.7), we see that

Φn(ηn)= 1M(ηn)exp(Sn(1ϕ)𝑑μ+Snlogϕdη).\Phi_{n}(\eta_{n})\ =\ 1_{M}(\eta_{n})\exp\bigg{(}\int_{S_{n}}(1-\phi)\,d\mu+\int_{S_{n}}\log\phi\,d\eta\bigg{)}. (7.11)

Even though $S_{n}\uparrow S$, the integrals on the right side above need not converge separately, because $\int_{S}(1-\phi)\,d\mu$ and $\int_{S}\log\phi\,d\eta$ are not necessarily well defined. Also, the compensated integral $\int_{S}\log\phi\,d(\eta-\mu)$ might diverge. A key observation (proved below) is that the compensated integral $\int_{A}\log\phi\,d(\eta-\mu)$ converges for $P_{\mu}$-almost every $\eta$, where

A={xS:|logϕ(x)|1}.A\ =\ \{x\in S\colon{\lvert\log\phi(x)\rvert}\leq 1\}.

With this target in mind, we will reorganise the integral terms of (7.11) according to

Sn(1ϕ)𝑑μ+Snlogϕdη=Wn(η)+Zn(η)+wn+zn,\int_{S_{n}}(1-\phi)\,d\mu+\int_{S_{n}}\log\phi\,d\eta\ =\ W_{n}(\eta)+Z_{n}(\eta)+w_{n}+z_{n}, (7.12)

where

Wn(η)=ASnlogϕdηASnlogϕdμ,Zn(η)=AcSnlogϕdη,wn=ASn(logϕ+1ϕ)𝑑μ,zn=AcSn(1ϕ)𝑑μ.\begin{aligned} W_{n}(\eta)&=\int_{A\cap S_{n}}\log\phi\,d\eta-\int_{A\cap S_{n}}\log\phi\,d\mu,\\ Z_{n}(\eta)&=\int_{A^{c}\cap S_{n}}\log\phi\,d\eta,\end{aligned}\qquad\begin{aligned} w_{n}&=\int_{A\cap S_{n}}(\log\phi+1-\phi)\,d\mu,\\ z_{n}&=\int_{A^{c}\cap S_{n}}(1-\phi)\,d\mu.\end{aligned}

We will show that all terms on the right side of (7.12) converge for all $\eta\in M\cap\Omega$, where $M$ is defined by (7.8) and $\Omega=\Omega_{1}\cap\Omega_{2}$, with

Ω1={ηN(S):Ac{ϕ>0}|logϕ|𝑑η<,η(Ac)<}\Omega_{1}\ =\ \left\{\eta\in N(S)\colon\int_{A^{c}\cap\{\phi>0\}}{\lvert\log\phi\rvert}\,d\eta<\infty,\ \eta(A^{c})<\infty\right\}

and

Ω2={ηN(S):Alogϕd(ημ) converges,η(Sn)<for all n}.\Omega_{2}\ =\ \left\{\eta\in N(S)\colon\text{$\int_{A}\log\phi\,d(\eta-\mu)$ converges},\ \eta(S_{n})<\infty\ \text{for all $n$}\right\}.

First, the functions 1AcSnlogϕ1_{A^{c}\cap S_{n}}\log\phi are dominated in absolute value by 1Ac|logϕ|1_{A^{c}}{\lvert\log\phi\rvert} for all nn. The dominating function is integrable with respect to any ηMΩ\eta\in M\cap\Omega by the definition of Ω1\Omega_{1}. Lebesgue’s dominated convergence theorem then implies that

Zn(η)AclogϕdηZ_{n}(\eta)\to\int_{A^{c}}\log\phi\,d\eta (7.13)

for every ηMΩ1\eta\in M\cap\Omega_{1}. The definition of Ω2\Omega_{2} in turn implies that

Wn(η)Alogϕd(ημ)W_{n}(\eta)\to\int_{A}\log\phi\,d(\eta-\mu) (7.14)

for all ηΩ2\eta\in\Omega_{2}. We also note that the functions associated with the definitions of znz_{n} and wnw_{n} converge pointwise according to

(1ϕ)1AcSn\displaystyle(1-\phi)1_{A^{c}\cap S_{n}} (1ϕ)1Ac,\displaystyle\to(1-\phi)1_{A^{c}},
(logϕ+1ϕ)1ASn\displaystyle(\log\phi+1-\phi)1_{A\cap S_{n}} (logϕ+1ϕ)1A.\displaystyle\to(\log\phi+1-\phi)1_{A}.

By (C.6), ϕ+1e+1(e1/21)2(ϕ1)2\phi+1\ \leq\ \frac{e+1}{(e^{1/2}-1)^{2}}(\sqrt{\phi}-1)^{2} on AcA^{c}, so that the functions (1ϕ)1AcSn(1-\phi)1_{A^{c}\cap S_{n}} are dominated in absolute value by e+1(e1/21)2(ϕ1)2\frac{e+1}{(e^{1/2}-1)^{2}}(\sqrt{\phi}-1)^{2}. Similarly, by (C.7), the functions (logϕ+1ϕ)1ASn(\log\phi+1-\phi)1_{A\cap S_{n}} are dominated in absolute value by 2e3(ϕ1)22e^{3}(\sqrt{\phi}-1)^{2}. Both dominating functions are integrable due to S(ϕ1)2𝑑μ=2H2(λ,μ)<\int_{S}(\sqrt{\phi}-1)^{2}\,d\mu=2H^{2}(\lambda,\mu)<\infty (recall Proposition 3.10). Therefore, by dominated convergence, we see that

zn\displaystyle z_{n} Ac(1ϕ)𝑑μ,\displaystyle\ \to\ \int_{A^{c}}(1-\phi)\,d\mu, (7.15)
wn\displaystyle w_{n} A(logϕ+1ϕ)𝑑μ.\displaystyle\ \to\ \int_{A}(\log\phi+1-\phi)\,d\mu. (7.16)

By plugging (7.13)–(7.16) into (7.12), we find that for all ηMΩ\eta\in M\cap\Omega,

limn(Sn(1ϕ)𝑑μ+Snlogϕdη)=(η)\lim_{n\to\infty}\left(\int_{S_{n}}(1-\phi)\,d\mu+\int_{S_{n}}\log\phi\,d\eta\right)\ =\ \ell(\eta)

where

(η)\displaystyle\ell(\eta) =Aclogϕdη+Alogϕd(ημ)\displaystyle\ =\ \int_{A^{c}}\log\phi\,d\eta+\int_{A}\log\phi\,d(\eta-\mu) (7.17)
+Ac(1ϕ)𝑑μ+A(logϕ+1ϕ)𝑑μ.\displaystyle\qquad+\int_{A^{c}}(1-\phi)\,d\mu+\int_{A}(\log\phi+1-\phi)\,d\mu.

By noting that ηnM\eta_{n}\in M whenever ηMΩ\eta\in M\cap\Omega, we see in light of (7.11) that

limnΦn(ηn)=e(η)\lim_{n\to\infty}\Phi_{n}(\eta_{n})\ =\ e^{\ell(\eta)}

for all $\eta\in M\cap\Omega$. Furthermore, for any $\eta\in M^{c}\cap\Omega$, we note that $\eta\{\phi=0\}\geq 1$, and the fact that $\{\phi=0\}\cap S_{n}\uparrow\{\phi=0\}$ then implies that $\eta(\{\phi=0\}\cap S_{n})\geq 1$, and hence $1_{M}(\eta_{n})=0$, for all sufficiently large $n$. We also note that $\eta_{n}(u)=\eta(u1_{S_{n}})\uparrow\eta(u)$ by monotone convergence of integrals. By denoting $\Phi(\eta)=1_{M\cap\Omega}(\eta)e^{\ell(\eta)}$, we see that

limn1Ω(η)eηn(u)Φn(ηn)=eη(u)Φ(η)for all η.\lim_{n\to\infty}1_{\Omega}(\eta)e^{-\eta_{n}(u)}\Phi_{n}(\eta_{n})\ =\ e^{-\eta(u)}\Phi(\eta)\qquad\text{for all $\eta$}. (7.18)

(iv) Exchanging the limit and integral. Let us justify that we may interchange the limit and the integral in (7.10). We know by Lemma B.3 that Pμ(Ω)=1P_{\mu}(\Omega)=1. Therefore, (7.10) can be written as

N(S)eη(u)Pλ(dη)=limnN(S)1Ω(η)eηn(u)Φn(ηn)Pμ(dη).\int_{N(S)}e^{-\eta(u)}\,P_{\lambda}(d\eta)\ =\ \lim_{n\to\infty}\int_{N(S)}1_{\Omega}(\eta)e^{-\eta_{n}(u)}\Phi_{n}(\eta_{n})\,P_{\mu}(d\eta). (7.19)

We wish to take the limit inside the integral in (7.19). To justify this, we note that the functions $f_{n}=1_{\Omega}(\eta)e^{-\eta_{n}(u)}\Phi_{n}(\eta_{n})$ satisfy $0\leq f_{n}\leq g_{n}$, where $g_{n}=\Phi_{n}(\eta_{n})$. We also note [LP18, Theorem 5.2] that

N(S)gn𝑑Pμ=N(S)Φn(η)Pμn(dη)=N(S)Pλn(dη)= 1,\int_{N(S)}g_{n}\,dP_{\mu}\ =\ \int_{N(S)}\Phi_{n}(\eta)\,P_{\mu_{n}}(d\eta)\ =\ \int_{N(S)}P_{\lambda_{n}}(d\eta)\ =\ 1,

because $\Phi_{n}=\frac{dP_{\lambda_{n}}}{dP_{\mu_{n}}}$. In particular, ${\lvert f_{n}\rvert}\leq g_{n}$ for all $n$, and $\sup_{n}\int_{N(S)}g_{n}\,dP_{\mu}<\infty$. A modified version of Lebesgue’s dominated convergence theorem (Lemma A.3) then justifies exchanging the limit and integral on the right side of (7.19), and plugging in the limit of (7.18) shows that for all measurable $u\colon S\to\mathbb{R}_{+}$,

N(S)eη(u)Pλ(dη)=N(S)eη(u)Φ(η)Pμ(dη).\int_{N(S)}e^{-\eta(u)}\,P_{\lambda}(d\eta)\ =\ \int_{N(S)}e^{-\eta(u)}\,\Phi(\eta)P_{\mu}(d\eta). (7.20)

(v) Conclusion. Finally, we note that the formula Q(dη)=Φ(η)Pμ(dη)Q(d\eta)=\Phi(\eta)P_{\mu}(d\eta) defines a measure on (N(S),𝒩(S))(N(S),\mathcal{N}(S)). By applying (7.20) with u=0u=0, we see that Q(N(S))=N(S)Φ(η)Pμ(dη)=Pλ(N(S))=1Q(N(S))=\int_{N(S)}\Phi(\eta)P_{\mu}(d\eta)=P_{\lambda}(N(S))=1, so that QQ is a probability measure. Because the Laplace functional uniquely characterises [LP18, Proposition 2.10] a probability measure on (N(S),𝒩(S))(N(S),\mathcal{N}(S)), we conclude from (7.20) that Q=PλQ=P_{\lambda}. In other words, Φ\Phi is a density of PλP_{\lambda} with respect to PμP_{\mu}. As an element of L1(N(S),𝒩(S),Pμ)L_{1}(N(S),\mathcal{N}(S),P_{\mu}), we see that Φ(η)=1M(η)e(η)\Phi(\eta)=1_{M}(\eta)e^{\ell(\eta)}, because Pμ(Ω)=1P_{\mu}(\Omega)=1. ∎

7.5 Proof of Theorem 5.1

First, Lemma 7.5 proves the claim under an additional condition that λ\lambda and μ\mu are finite measures on SS. This proof is different from the usual topological approach that is based on approximating the measurable sets of SS by a finite sigma-algebra [Lie75, Kar83, Kar91], which usually requires SS to be a separable metric space. Instead, the following proof is based on (i) representing a Poisson PP distribution with a finite intensity measure using a Poisson-distributed number of IID random variables (see [Kin67, Rei93, LP18]); and (ii) representing a Poisson PP distribution with a sigma-finite intensity measure using a decomposition with respect to a countable partition.

Lemma 7.4.

PλPμP_{\lambda}\ll P_{\mu} \implies λμ\lambda\ll\mu for any Poisson PP distributions with sigma-finite intensity measures. Furthermore, the converse implication λμ\lambda\ll\mu \implies PλPμP_{\lambda}\ll P_{\mu} holds when the intensity measures are finite.

Proof.

Assume that PλPμP_{\lambda}\ll P_{\mu}. Consider a set BSB\subset S such that μ(B)=0\mu(B)=0. Define C={η:η(B)>0}C=\{\eta\colon\eta(B)>0\}. Recall that η(B)\eta(B) is Poisson distributed with mean μ(B)\mu(B) when η\eta is sampled from PμP_{\mu}. Therefore, Pμ(C)=1eμ(B)=0P_{\mu}(C)=1-e^{-\mu(B)}=0. Now PλPμP_{\lambda}\ll P_{\mu} implies that 0=Pλ(C)=1eλ(B)0=P_{\lambda}(C)=1-e^{-\lambda(B)}, from which we conclude that λ(B)=0\lambda(B)=0. Hence λμ\lambda\ll\mu.

For finite intensity measures λ,μ\lambda,\mu, Theorem 4.1 confirms the reverse implication λμ\lambda\ll\mu \implies PλPμP_{\lambda}\ll P_{\mu}. ∎

Lemma 7.5.

Rα(PλPμ)=Tα(λμ)R_{\alpha}(P_{\lambda}\|P_{\mu})=T_{\alpha}(\lambda\|\mu) for all α+\alpha\in\mathbb{R}_{+} and all Poisson PP distributions Pλ,PμP_{\lambda},P_{\mu} with finite intensity measures λ,μ\lambda,\mu.

Proof.

Let PνP_{\nu} be a Poisson PP distribution with intensity measure ν=λ+μ\nu=\lambda+\mu. Let f=dλdν,g=dμdνf=\frac{d\lambda}{d\nu},g=\frac{d\mu}{d\nu} be densities of λ,μ\lambda,\mu with respect to ν\nu. Such functions exist by the Radon–Nikodym theorem [Kal02, Theorem 2.10]. Theorem 4.1 implies that Pλ,PμP_{\lambda},P_{\mu} are absolutely continuous with respect to PνP_{\nu}, admitting likelihood ratios F=dPλdPνF=\frac{dP_{\lambda}}{dP_{\nu}} and G=dPμdPνG=\frac{dP_{\mu}}{dP_{\nu}} given by

F(η)\displaystyle F(\eta) = 1Mf(η)eν(1f)+η(logf),\displaystyle\ =\ 1_{M_{f}}(\eta)\,e^{\nu(1-f)+\eta(\log f)}, (7.21)
G(η)\displaystyle G(\eta) = 1Mg(η)eν(1g)+η(logg),\displaystyle\ =\ 1_{M_{g}}(\eta)\,e^{\nu(1-g)+\eta(\log g)},

where Mf={ηN(S):η{f=0}=0,η(S)<}M_{f}=\{\eta\in N(S)\colon\eta\{f=0\}=0,\,\eta(S)<\infty\} and MgM_{g} is defined similarly, and we abbreviate ν(f)=f𝑑ν\nu(f)=\int f\,d\nu.

Let us first consider the case with α{0,1}\alpha\notin\{0,1\}. Then Rα(PλPμ)=1α1logZαR_{\alpha}(P_{\lambda}\|P_{\mu})=\frac{1}{\alpha-1}\log Z_{\alpha} where Zα=N(S)FαG1α𝑑PνZ_{\alpha}=\int_{N(S)}F^{\alpha}G^{1-\alpha}\,dP_{\nu}. By the standard conventions 0=00=00\cdot\infty=\frac{0}{0}=0 and 10=\frac{1}{0}=\infty, we see that

Zα={Z~α,α(0,1),Z~α+Pν{F>0,G=0},α(1,),Z_{\alpha}\ =\ \begin{cases}\tilde{Z}_{\alpha},&\quad\alpha\in(0,1),\\ \tilde{Z}_{\alpha}+\infty\cdot P_{\nu}\{F>0,G=0\},&\quad\alpha\in(1,\infty),\end{cases} (7.22)

where

Z~α=F>0,G>0FαG1α𝑑Pν.\tilde{Z}_{\alpha}\ =\ \int_{F>0,\,G>0}F^{\alpha}G^{1-\alpha}\,dP_{\nu}.

To derive a simplified expression for Z~α\tilde{Z}_{\alpha}, define U={f>0,g>0}U=\{f>0,\,g>0\} and consider a set of point patterns N(U)={ηN(S):η(S)<,η(Uc)=0}N(U)=\{\eta\in N(S)\colon\eta(S)<\infty,\,\eta(U^{c})=0\}. In light of (7.21), we see that {F>0,G>0}{η:η(S)<}=MfMg=N(U)\{F>0,\,G>0\}\cap\{\eta\colon\eta(S)<\infty\}=M_{f}\cap M_{g}=N(U). Because Pν{η:η(S)<}=1P_{\nu}\{\eta\colon\eta(S)<\infty\}=1, it follows that

Z~α=N(U)FαG1α𝑑Pν.\tilde{Z}_{\alpha}\ =\ \int_{N(U)}F^{\alpha}G^{1-\alpha}\,dP_{\nu}.

We also note that for ηN(U)\eta\in N(U),

F(η)αG(η)1α\displaystyle F(\eta)^{\alpha}G(\eta)^{1-\alpha} =eαν(1f)+(1α)ν(1g)eη(loghα),\displaystyle\ =\ e^{\alpha\nu(1-f)+(1-\alpha)\nu(1-g)}e^{\eta(\log h_{\alpha})}, (7.23)

where hα=fαg1αh_{\alpha}=f^{\alpha}g^{1-\alpha}. The conditional distribution of a point pattern η\eta sampled from PνP_{\nu} given ηN(U)\eta\in N(U) equals (Proposition B.4) PνUP_{\nu_{U}}. By also noting that Pν(N(U))=eν(Uc)P_{\nu}(N(U))=e^{-\nu(U^{c})}, it follows that

N(U)eη(loghα)Pν(dη)=eν(Uc)N(S)eη(loghα)PνU(dη).\int_{N(U)}e^{\eta(\log h_{\alpha})}\,P_{\nu}(d\eta)\ =\ e^{-\nu(U^{c})}\int_{N(S)}e^{\eta(\log h_{\alpha})}\,P_{\nu_{U}}(d\eta).

Because $\int_{S}({\lvert\log h_{\alpha}\rvert}\wedge 1)\,d\nu_{U}\leq\nu(U)<\infty$ and $x\mapsto\log h_{\alpha}(x)$ restricted to $U$ is $\mathbb{R}$-valued, the Laplace functional formula of Poisson point patterns [Kal02, Lemma 12.2] implies that $\int_{N(S)}e^{\eta(\log h_{\alpha})}\,P_{\nu_{U}}(d\eta)=e^{\nu_{U}(h_{\alpha}-1)}$. By integrating (7.23) with respect to $P_{\nu}$, it follows that

Z~α\displaystyle\tilde{Z}_{\alpha} =eαν(1f)+(1α)ν(1g)eν(Uc)eνU(hα1)\displaystyle\ =\ e^{\alpha\nu(1-f)+(1-\alpha)\nu(1-g)}e^{-\nu(U^{c})}e^{\nu_{U}(h_{\alpha}-1)} (7.24)
=eαν(f)(1α)ν(g)+νU(hα).\displaystyle\ =\ e^{-\alpha\nu(f)-(1-\alpha)\nu(g)+\nu_{U}(h_{\alpha})}.

We are now ready to verify the claim by considering the following five cases one by one:

  1. (i)

    Assume now that α(0,1)\alpha\in(0,1). Then by (7.22), we see that Rα(PλPμ)=1α1logZ~αR_{\alpha}(P_{\lambda}\|P_{\mu})=\frac{1}{\alpha-1}\log\tilde{Z}_{\alpha}. We also note that νU(hα)=ν(hα)\nu_{U}(h_{\alpha})=\nu(h_{\alpha}), so that by (7.24), we conclude that

    Rα(PλPμ)=ν(αf+(1α)ghα)1α.R_{\alpha}(P_{\lambda}\|P_{\mu})\ =\ \frac{\nu(\alpha f+(1-\alpha)g-h_{\alpha})}{1-\alpha}. (7.25)

    The claim follows because the right side equals Tα(λμ)T_{\alpha}(\lambda\|\mu) by (3.1).

  2. (ii)

    Assume now that α(1,)\alpha\in(1,\infty) and λμ\lambda\ll\mu. Then PλPμP_{\lambda}\ll P_{\mu} by Lemma 7.4. Then ν{f>0,g=0}=0\nu\{f>0,g=0\}=0 and Pν{F>0,G=0}=0P_{\nu}\{F>0,G=0\}=0 (Lemma A.1). In this case we again find that νU(hα)=ν(hα)\nu_{U}(h_{\alpha})=\nu(h_{\alpha}). Therefore, in light of (7.22) and (7.24), we see that (7.25) holds also in this case, and the claim follows.

  3. (iii)

    Assume now that α(1,)\alpha\in(1,\infty) and λ≪̸μ\lambda\not\ll\mu. Then Pλ≪̸PμP_{\lambda}\not\ll P_{\mu} by Lemma 7.4. Then ν{f>0,g=0}>0\nu\{f>0,g=0\}>0 and Pν{F>0,G=0}>0P_{\nu}\{F>0,G=0\}>0 (Lemma A.1). Hence by (7.22), Rα(PλPμ)=R_{\alpha}(P_{\lambda}\|P_{\mu})=\infty. The assumption α>1\alpha>1 implies that fαg1α=f^{\alpha}g^{1-\alpha}=\infty on the set {f>0,g=0}\{f>0,\,g=0\}. Therefore Sfαg1α𝑑ν=\int_{S}f^{\alpha}g^{1-\alpha}\,d\nu=\infty, and we find that Tα(λμ)=T_{\alpha}(\lambda\|\mu)=\infty by (3.1), and the claim follows.

  4. (iv)

Assume that $\alpha=1$. Now $R_{1}(P_{\lambda}\|P_{\mu})=\lim_{\alpha\uparrow 1}R_{\alpha}(P_{\lambda}\|P_{\mu})$ [vH14]. By (i), we know that $R_{\alpha}(P_{\lambda}\|P_{\mu})=T_{\alpha}(\lambda\|\mu)$ for all $\alpha\in(0,1)$. The claim follows by letting $\alpha\uparrow 1$ and noting that $T_{1}(\lambda\|\mu)=\lim_{\alpha\uparrow 1}T_{\alpha}(\lambda\|\mu)$ by Theorem 3.1.

  5. (v)

Assume that $\alpha=0$. Formula (2.1) shows that $R_{0}(P_{\lambda}\|P_{\mu})=-\log P_{\mu}(F>0)=-\log P_{\mu}(M_{f})$. Because $P_{\mu}(M_{f})=e^{-\mu\{f=0\}}$, it follows that $R_{0}(P_{\lambda}\|P_{\mu})=\mu\{f=0\}$. By formula (3.1), we see that $R_{0}(P_{\lambda}\|P_{\mu})=T_{0}(\lambda\|\mu)$. ∎

With the help of Lemma 7.5 we will prove Theorem 5.1 in the general case where λ,μ\lambda,\mu are sigma-finite measures on SS.

Proof of Theorem 5.1.

(i) Fix α(0,)\alpha\in(0,\infty). Define ν=λ+μ\nu=\lambda+\mu. Then ν\nu is sigma-finite. Select a partition (see Lemma A.2) S=n1SnS=\cup_{n\geq 1}S_{n} such that ν(Sn)<\nu(S_{n})<\infty for all nn. Denote N(Sn)={ηN(S):η(Snc)=0}N(S_{n})=\{\eta\in N(S)\colon\eta(S_{n}^{c})=0\}. Define τ:N(S)n=1N(Sn)\tau\colon N(S)\to\prod_{n=1}^{\infty}N(S_{n}) by

τ(η)=(ηS1,ηS2,),\tau(\eta)\ =\ (\eta_{S_{1}},\eta_{S_{2}},\dots), (7.26)

where ηSnN(Sn)\eta_{S_{n}}\in N(S_{n}) is defined by ηSn(B)=η(BSn)\eta_{S_{n}}(B)=\eta(B\cap S_{n}). A restriction theorem [LP18, Theorem 5.2] implies that when η\eta is sampled from PλP_{\lambda}, then the restrictions ηS1,ηS2,\eta_{S_{1}},\eta_{S_{2}},\dots are mutually independent Poisson PPs with intensity measures λS1,λS2,\lambda_{S_{1}},\lambda_{S_{2}},\dots defined by λSn(B)=λ(BSn)\lambda_{S_{n}}(B)=\lambda(B\cap S_{n}). Therefore, the pushforward probability measure Pλτ1P_{\lambda}\circ\tau^{-1} can be written as a product of Poisson PP distributions PλSnP_{\lambda_{S_{n}}}, n1n\geq 1. The same reasoning is valid also for PμP_{\mu}. Hence

Pλτ1=n=1PλSnandPμτ1=n=1PμSn.P_{\lambda}\circ\tau^{-1}\ =\ \bigotimes_{n=1}^{\infty}P_{\lambda_{S_{n}}}\qquad\text{and}\qquad P_{\mu}\circ\tau^{-1}\ =\ \bigotimes_{n=1}^{\infty}P_{\mu_{S_{n}}}.

Because the sets S1,S2,S_{1},S_{2},\dots form a partition of SS, we see that the map τ\tau defined by (7.26) is a bijection with inverse τ1(η1,η2,)=n1ηn\tau^{-1}(\eta_{1},\eta_{2},\dots)=\sum_{n\geq 1}\eta_{n}. Standard arguments show that τ\tau and τ1\tau^{-1} are measurable mappings (see Section B.1). Lemma A.4 then implies that Rα(PλPμ)=Rα(Pλτ1Pμτ1)R_{\alpha}(P_{\lambda}\|P_{\mu})=R_{\alpha}(P_{\lambda}\circ\tau^{-1}\|P_{\mu}\circ\tau^{-1}). Because Rényi divergences of order α>0\alpha>0 factorise over tensor products [vH14, Theorem 28] it follows that

Rα(PλPμ)=n=1Rα(PλSnPμSn).R_{\alpha}(P_{\lambda}\|P_{\mu})\ =\ \sum_{n=1}^{\infty}R_{\alpha}(P_{\lambda_{S_{n}}}\|P_{\mu_{S_{n}}}). (7.27)

Because λ,μν\lambda,\mu\ll\nu and ν\nu is sigma-finite, the Radon–Nikodym theorem [Kal02, Theorem 2.10] implies that there exist densities f=dλdνf=\frac{d\lambda}{d\nu} and g=dμdνg=\frac{d\mu}{d\nu} of λ\lambda and μ\mu with respect to ν\nu. Observe now that λSn(A)=ASnf𝑑ν=Af𝑑νSn\lambda_{S_{n}}(A)=\int_{A\cap S_{n}}f\,d\nu=\int_{A}f\,d\nu_{S_{n}} for all measurable ASA\subset S. Similarly, μSn(A)=Ag𝑑νSn\mu_{S_{n}}(A)=\int_{A}g\,d\nu_{S_{n}}. We conclude that the functions ff and gg also act as densities f=dλSndνSnf=\frac{d\lambda_{S_{n}}}{d\nu_{S_{n}}} and g=dμSndνSng=\frac{d\mu_{S_{n}}}{d\nu_{S_{n}}} of the finite measures λSn,μSn\lambda_{S_{n}},\mu_{S_{n}} with respect to νSn\nu_{S_{n}}. Lemma 7.5 now implies that

Rα(PλSnPμSn)=Tα(λSnμSn).R_{\alpha}(P_{\lambda_{S_{n}}}\|P_{\mu_{S_{n}}})=T_{\alpha}(\lambda_{S_{n}}\|\mu_{S_{n}}). (7.28)

Furthermore, by (3.2) in Theorem 3.1, we see that

Tα(λSnμSn)=SRα(pf(x)pg(x))νSn(dx),T_{\alpha}(\lambda_{S_{n}}\|\mu_{S_{n}})\ =\ \int_{S}R_{\alpha}(p_{f(x)}\|p_{g(x)})\,\nu_{S_{n}}(dx),

where $p_{s}$ refers to the Poisson distribution $k\mapsto e^{-s}\frac{s^{k}}{k!}$ with mean $s$. Observe that the integrand on the right side above is nonnegative, $d\nu_{S_{n}}=1_{S_{n}}d\nu$, and $\sum_{n}1_{S_{n}}=1$. Fubini’s theorem combined with (3.2) then implies that

n=1Tα(λSnμSn)=SRα(pf(x)pg(x))ν(dx)=Tα(λμ).\displaystyle\sum_{n=1}^{\infty}T_{\alpha}(\lambda_{S_{n}}\|\mu_{S_{n}})\ =\ \int_{S}R_{\alpha}(p_{f(x)}\|p_{g(x)})\,\nu(dx)\ =\ T_{\alpha}(\lambda\|\mu).

By combining this with (7.27) and (7.28), it follows that Rα(PλPμ)=Tα(λμ)R_{\alpha}(P_{\lambda}\|P_{\mu})=T_{\alpha}(\lambda\|\mu).

(ii) Finally, let us verify that $R_{0}(P_{\lambda}\|P_{\mu})=T_{0}(\lambda\|\mu)$ under the additional assumption that $T_{\beta}(\lambda\|\mu)<\infty$ for some $\beta>0$. We saw in part (i) of the proof that

R_{\alpha}(P_{\lambda}\|P_{\mu})\ =\ T_{\alpha}(\lambda\|\mu)\quad\text{for all $\alpha\in(0,\infty)$}. (7.29)

Now [vH14, Theorem 7] implies that $\alpha\mapsto R_{\alpha}(P_{\lambda}\|P_{\mu})$ is continuous on $[0,1]$, and Theorem 3.1 implies that $\alpha\mapsto T_{\alpha}(\lambda\|\mu)$ is continuous on $[0,\beta]$. Hence the claim follows by letting $\alpha\downarrow 0$ in (7.29). ∎
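Theorem 5.1 can be illustrated by importance sampling. For finite intensity measures with $\lambda\ll\mu$, we have $Z_{\alpha}=\int\Phi^{\alpha}\,dP_{\mu}$ with $\Phi$ as in (4.1), so a Monte Carlo estimate of $R_{\alpha}(P_{\lambda}\|P_{\mu})=\frac{1}{\alpha-1}\log Z_{\alpha}$ should match $T_{\alpha}(\lambda\|\mu)$ computed by quadrature. A Python sketch (assuming NumPy; the densities $f(x)=1+x$ and $g\equiv 2$ on $[0,1]$ are illustrative):

import numpy as np

rng = np.random.default_rng(3)

def trapz(y, x):
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

alpha = 0.5
f = lambda x: 1.0 + x        # density of lambda w.r.t. Lebesgue
g = 2.0                      # constant density of mu
xg = np.linspace(0.0, 1.0, 20001)
lam_S, mu_S = trapz(f(xg), xg), g

z_acc, n_rep = 0.0, 100_000
for _ in range(n_rep):
    pts = rng.random(rng.poisson(mu_S))               # one sample from P_mu
    Phi = np.exp(mu_S - lam_S) * np.prod(f(pts) / g)  # likelihood ratio (4.1)
    z_acc += Phi ** alpha

R_mc = np.log(z_acc / n_rep) / (alpha - 1.0)
T_ex = trapz((alpha * f(xg) + (1 - alpha) * g - f(xg) ** alpha * g ** (1 - alpha)) / (1 - alpha), xg)
print(R_mc, T_ex)   # Monte Carlo estimate vs T_alpha(lambda||mu)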

8 Conclusions

By developing an analytical toolbox of generalised Tsallis divergences for sigma-finite measures, this article has derived a framework for analysing likelihood ratios and Rényi divergences of Poisson PPs on general measurable spaces. The main advantage of this approach is that it is purely information-theoretic and free of topological assumptions. The framework yields explicit descriptions of Kullback–Leibler divergences, Rényi divergences, Hellinger distances, and likelihood ratios for statistical models that admit a measurable one-to-one map into a space of point patterns governed by a Poisson PP distribution. Marked Poisson PPs, corresponding to Poisson PPs on abstract product spaces, provide a rich context for various applications, and the disintegrated Tsallis divergence formula in Section 3.5 is key to understanding their information-theoretic features. To complete the general theory, it remains an important open problem to determine whether the technical condition (3.7) in Theorem 3.12 is necessary.

Future directions for extending this work include deriving similar results for a wider class of statistical models derived from Poisson processes and marked point patterns, for example Cox processes, Hawkes processes, Poisson shot noise random measures, and Matérn point patterns. A challenge here is that the mechanisms used to derive such models from a marked point pattern tend to lose information; for example, Poisson shot noise models in certain limiting regimes reduce to Gaussian white noises [KLNS07]. The thorough characterisation of Poisson PP distributions derived in this article is expected to serve as a cornerstone for further studies of this type.

Acknowledgments

The author thanks Venkat Anantharam for insightful discussions and two anonymous reviewers for valuable remarks that have helped to improve the presentation.

Appendix A Measure theory

A.1 Densities

Recall the notation of Section 2.1.

Lemma A.1.

Let λ,μ\lambda,\mu be measures on a measurable space (S,𝒮)(S,\mathcal{S}) admitting densities f=dλdνf=\frac{d\lambda}{d\nu} and g=dμdνg=\frac{d\mu}{d\nu} with respect to a measure ν\nu.

  1. (i)

    λ{f=0}=0\lambda\{f=0\}=0 and μ{g=0}=0\mu\{g=0\}=0.

  2. (ii)

    For any measurable set BB, λ(B)=0\lambda(B)=0 if and only if ν({f>0}B)=0\nu(\{f>0\}\cap B)=0.

  3. (iii)

    λμ\lambda\ll\mu if and only if λ{g=0}=0\lambda\{g=0\}=0.

  4. (iv)

    λμ\lambda\ll\mu if and only if ν{f>0,g=0}=0\nu\{f>0,\,g=0\}=0.

  5. (v)

    λμ\lambda\perp\mu if and only if ν{f>0,g>0}=0\nu\{f>0,\,g>0\}=0.

  6. (vi)

    λμ\lambda\perp\mu if and only if λ{g>0}=0\lambda\{g>0\}=0.

Proof.

(i) λ{f=0}={f=0}f𝑑ν=0.\lambda\{f=0\}=\int_{\{f=0\}}f\,d\nu=0. Analogously we see that μ{g=0}=0\mu\{g=0\}=0.

(ii) Fix a measurable set BB, and denote A={f>0}BA=\{f>0\}\cap B. Note that (i) implies that λ(B)=λ(A)=Af𝑑ν\lambda(B)=\lambda(A)=\int_{A}f\,d\nu. Hence ν(A)=0\nu(A)=0 implies that λ(B)=0\lambda(B)=0. Assume next that ν(A)>0\nu(A)>0. Then ν(An)>0\nu(A_{n})>0 for some integer n1n\geq 1, where An={fn1}BA_{n}=\{f\geq n^{-1}\}\cap B. Hence λ(B)λ(An)=Anf𝑑νn1ν(An)>0\lambda(B)\geq\lambda(A_{n})=\int_{A_{n}}f\,d\nu\geq n^{-1}\nu(A_{n})>0.

(iii) Assume that $\lambda\{g=0\}=0$, and consider a set $B$ such that $\mu(B)=0$. By applying (ii) to $\mu$, we find that $\nu(\{g>0\}\cap B)=0$, and $\lambda\ll\nu$ then implies that $\lambda(\{g>0\}\cap B)=0$. Because $\lambda\{g=0\}=0$, it follows that $\lambda(B)=\lambda(\{g>0\}\cap B)+\lambda(\{g=0\}\cap B)=0$. Hence $\lambda\ll\mu$. The converse implication $\lambda\ll\mu\implies\lambda\{g=0\}=0$ is immediate from (i).

(iv) By applying (ii) with $B=\{g=0\}$, we find that $\lambda\{g=0\}=0$ is equivalent to $\nu\{f>0,\,g=0\}=0$. The claim now follows by (iii).

(v) Assume that $\nu\{f>0,\,g>0\}=0$. Let $B=\{f>0\}$. Then $\lambda(B^{c})=0$ due to (i). Furthermore by (i), $\mu(B)=\mu(B\cap\{g>0\})=\mu\{f>0,\,g>0\}$. Hence $\mu(B)=0$ due to $\mu\ll\nu$. Hence $\lambda\perp\mu$. Assume now that $\lambda\perp\mu$. Then there exists a set $B$ such that $\lambda(B^{c})=0$ and $\mu(B)=0$. Then (ii) implies that $\nu(\{f>0\}\cap B^{c})=0$ and $\nu(\{g>0\}\cap B)=0$. Then

ν({f>0,g>0})ν({f>0}Bc)+ν({g>0}B)= 0.\displaystyle\nu(\{f>0,g>0\})\ \leq\ \nu(\{f>0\}\cap B^{c})+\nu(\{g>0\}\cap B)\ =\ 0.

(vi) Assume that $\lambda\{g>0\}=0$. Let $B=\{g=0\}$. Then $\lambda(B^{c})=0$, and $\mu(B)=0$ due to (i). Hence $\lambda\perp\mu$. Assume now that $\lambda\perp\mu$. Then there exists a set $B$ such that $\lambda(B^{c})=0$ and $\mu(B)=0$, and (ii) implies that $\nu(\{g>0\}\cap B)=0$. Because $\lambda\ll\nu$, it follows that $\lambda(\{g>0\}\cap B)=0$. Then

λ{g>0}=λ({g>0}B)+λ({g>0}Bc)= 0.\displaystyle\lambda\{g>0\}\ =\ \lambda(\{g>0\}\cap B)+\lambda(\{g>0\}\cap B^{c})\ =\ 0.

A.2 Basic measure theory

Lemma A.2.

If ν\nu is a sigma-finite measure on a measurable space (S,𝒮)(S,\mathcal{S}), then there exists a partition S=n1SnS=\cup_{n\geq 1}S_{n} such that ν(Sn)<\nu(S_{n})<\infty for all nn.

Proof.

Because $\nu$ is sigma-finite, there exist measurable sets $C_{1},C_{2},\dots$ such that $\cup_{n\geq 1}C_{n}=S$ and $\nu(C_{n})<\infty$ for all $n$. Define $S_{n}=C_{n}\setminus(C_{1}\cup\dots\cup C_{n-1})$ for $n\geq 1$, so that $S_{1}=C_{1}$. Then the sets $S_{1},S_{2},\dots$ are mutually disjoint, $S=\cup_{n\geq 1}S_{n}$, and $\nu(S_{n})\leq\nu(C_{n})<\infty$ for all $n$. ∎
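The disjointification step is mechanical; the following small Python illustration (with finite sets standing in for measurable sets) mirrors the construction:

def disjointify(cover):
    seen, parts = set(), []
    for C in cover:
        parts.append(set(C) - seen)   # S_n = C_n minus everything seen so far
        seen |= set(C)
    return parts

print(disjointify([{1, 2}, {2, 3}, {1, 3, 4}]))   # [{1, 2}, {3}, {4}]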

The following result is a convenient alternative form of Lebesgue’s dominated convergence theorem that quantifies uniform integrability (boundedness in the increasing convex stochastic order [LV13]) in a flexible manner.

Lemma A.3.

Let f,f1,f2,f,f_{1},f_{2},\dots and g1,g2,g_{1},g_{2},\dots be measurable real-valued functions on (S,𝒮)(S,\mathcal{S}) such that fnff_{n}\to f, gngg_{n}\to g, |fn|gn{\lvert f_{n}\rvert}\leq g_{n} for all nn, and supnSgn𝑑μ<\sup_{n}\int_{S}g_{n}\,d\mu<\infty. Then Sfn𝑑μSf𝑑μ\int_{S}f_{n}\,d\mu\to\int_{S}f\,d\mu, and S|f|𝑑μsupnSgn𝑑μ\int_{S}{\lvert f\rvert}\,d\mu\leq\sup_{n}\int_{S}g_{n}\,d\mu.

Proof.

We note that g=lim infngng=\liminf_{n\to\infty}g_{n}, and that Fatou’s lemma [Kal02, Lemma 1.20] implies g𝑑μ=(lim infngn)𝑑μlim infngn𝑑μsupngn𝑑μ\int g\,d\mu=\int(\liminf_{n\to\infty}g_{n})\,d\mu\leq\liminf_{n\to\infty}\int g_{n}\,d\mu\leq\sup_{n}\int g_{n}\,d\mu. Then g𝑑μ\int g\,d\mu is finite, and Kallenberg’s version of Lebesgue’s dominated convergence theorem [Kal02, Theorem 1.21] yields the first claim. For the second claim, we note by Fatou’s lemma that S|f|𝑑μ=Slim inf|fn|dμlim infS|fn|𝑑μsupnSgn𝑑μ\int_{S}{\lvert f\rvert}\,d\mu=\int_{S}\liminf{\lvert f_{n}\rvert}\,d\mu\leq\liminf\int_{S}{\lvert f_{n}\rvert}\,d\mu\leq\sup_{n}\int_{S}g_{n}\,d\mu. ∎

A.3 Measurable bijections

Lemma A.4.

Let $S,T$ be measurable spaces, and let $\phi\colon S\to T$ be a measurable bijection with a measurable inverse. Then $R_{\alpha}(P\circ\phi^{-1}\|\,Q\circ\phi^{-1})=R_{\alpha}(P\|Q)$ for all $\alpha>0$ and all probability measures $P,Q$ on $S$.

Proof.

Fix densities $p=\frac{dP}{dm}$ and $q=\frac{dQ}{dm}$ with respect to $m=P+Q$. Define a function $\tilde{p}\colon T\to\mathbb{R}_{+}$ by $\tilde{p}=p\circ\phi^{-1}$ and a measure $\tilde{m}$ on $T$ by $\tilde{m}=m\circ\phi^{-1}$. Note that for any measurable $A\subset T$,

Ap~𝑑m~=T1A(y)p~(y)m~(dy)=S1A(ϕ(x))p~(ϕ(x))m(dx).\displaystyle\int_{A}\tilde{p}\,d\tilde{m}\ =\ \int_{T}1_{A}(y)\tilde{p}(y)\,\tilde{m}(dy)\ =\ \int_{S}1_{A}(\phi(x))\tilde{p}(\phi(x))\,m(dx).

Because p~(ϕ(x))=p(x)\tilde{p}(\phi(x))=p(x) for all xx, we see that

Ap~𝑑m~=ϕ1(A)p(x)m(dx)=P(ϕ1(A)).\displaystyle\int_{A}\tilde{p}\,d\tilde{m}\ =\ \int_{\phi^{-1}(A)}p(x)\,m(dx)\ =\ P(\phi^{-1}(A)).

We conclude that p~=dPϕ1dmϕ1\tilde{p}=\frac{dP\circ\phi^{-1}}{dm\circ\phi^{-1}} is a density of Pϕ1P\circ\phi^{-1} with respect to m~\tilde{m}. Similarly, we see that q~=qϕ1\tilde{q}=q\circ\phi^{-1} is a density of Qϕ1Q\circ\phi^{-1} with respect to m~\tilde{m}. Hence,

\displaystyle\int_{T}(p\circ\phi^{-1})^{\alpha}(q\circ\phi^{-1})^{1-\alpha}\,d\tilde{m}\ =\ \int_{S}\big(p\circ\phi^{-1}(\phi(x))\big)^{\alpha}\big(q\circ\phi^{-1}(\phi(x))\big)^{1-\alpha}\,m(dx)
=S(p(x))α(q(x))1αm(dx)\displaystyle\ =\ \int_{S}(p(x))^{\alpha}(q(x))^{1-\alpha}m(dx)
=Spαq1α𝑑m.\displaystyle\ =\ \int_{S}p^{\alpha}q^{1-\alpha}\,dm.

From this the claim follows for α1\alpha\neq 1. The case with α=1\alpha=1 is similar. ∎

Appendix B Point patterns

B.1 Measurability of sigma-finite decompositions

This section discusses a decomposition of a point pattern with respect to a countable partition of the ground space SS. Let (S,𝒮)(S,\mathcal{S}) be a measurable space. Let N(S)N(S) be the set of point patterns (measures with values in +{}\mathbb{Z}_{+}\cup\{\infty\}) on (S,𝒮)(S,\mathcal{S}) equipped with the sigma-algebra 𝒩(S)=σ(evB:B𝒮)\mathcal{N}(S)=\sigma(\operatorname{ev}_{B}\colon B\in\mathcal{S}) generated by the evaluation maps evB:ηη(B)\operatorname{ev}_{B}\colon\eta\mapsto\eta(B).

Assume that S1,S2,𝒮S_{1},S_{2},\dots\in\mathcal{S} are disjoint and such that S=n=1SnS=\cup_{n=1}^{\infty}S_{n}. We define N(Sn)={ηN(S):η(Snc)=0}N(S_{n})=\{\eta\in N(S)\colon\eta(S_{n}^{c})=0\} and equip this set with the trace sigma-algebra 𝒩(Sn)=𝒩(S)N(Sn)\mathcal{N}(S_{n})=\mathcal{N}(S)\cap N(S_{n}). We define the truncation map τSn:N(S)N(Sn)\tau_{S_{n}}\colon N(S)\to N(S_{n}) by

(τSnη)(B)=η(BSn),B𝒮.(\tau_{S_{n}}\eta)(B)\ =\ \eta(B\cap S_{n}),\quad B\in\mathcal{S}.

Then we define τ:N(S)n=1N(Sn)\tau\colon N(S)\to\prod_{n=1}^{\infty}N(S_{n}) by

τ(η)=(τS1(η),τS2(η),).\tau(\eta)\ =\ (\tau_{S_{1}}(\eta),\tau_{S_{2}}(\eta),\dots). (B.1)

We find that τ\tau is a bijection with inverse

τ1(η1,η2,)=n=1ηn.\tau^{-1}(\eta_{1},\eta_{2},\dots)\ =\ \sum_{n=1}^{\infty}\eta_{n}.

We equip n=1N(Sn)\prod_{n=1}^{\infty}N(S_{n}) with the product sigma-algebra n=1𝒩(Sn)\bigotimes_{n=1}^{\infty}\mathcal{N}(S_{n}).

Lemma B.1.

τ:(N(S),𝒩(S))(n=1N(Sn),n=1𝒩(Sn))\tau\colon(N(S),\mathcal{N}(S))\to(\prod_{n=1}^{\infty}N(S_{n}),\bigotimes_{n=1}^{\infty}\mathcal{N}(S_{n})) is a measurable bijection with a measurable inverse.

Proof.

Denote 𝒞(S)={evB1({k}):B𝒮,k+}\mathcal{C}(S)=\{\operatorname{ev}_{B}^{-1}(\{k\})\colon B\in\mathcal{S},\,k\in\mathbb{Z}_{+}\} and note that this set family generates the sigma-algebra 𝒩(S)\mathcal{N}(S). By [Shi96, Lemma II.3.3], we know that the set family 𝒞(S)N(Sn)\mathcal{C}(S)\cap N(S_{n}) generates the trace sigma-algebra 𝒩(Sn)=𝒩(S)N(Sn)\mathcal{N}(S_{n})=\mathcal{N}(S)\cap N(S_{n}).

We start by verifying that $\tau_{S_{n}}\colon N(S)\to N(S_{n})$ is measurable. Fix a set $B\in\mathcal{S}$ and an integer $k\geq 0$, and consider a set in $\mathcal{C}(S)\cap N(S_{n})$ of the form

C={ηN(S):η(Snc)=0,η(B)=k}.C\ =\ \{\eta\in N(S)\colon\eta(S_{n}^{c})=0,\,\eta(B)=k\}.

Then

τSn1(C)\displaystyle\tau_{S_{n}}^{-1}(C) ={ηN(S):η(SncSn)=0,η(BSn)=k}\displaystyle\ =\ \{\eta\in N(S)\colon\eta(S_{n}^{c}\cap S_{n})=0,\,\eta(B\cap S_{n})=k\}
={ηN(S):η(BSn)=k}\displaystyle\ =\ \{\eta\in N(S)\colon\eta(B\cap S_{n})=k\}

shows that τSn1(C)𝒩(S)\tau_{S_{n}}^{-1}(C)\in\mathcal{N}(S). Because such sets CC generate 𝒩(Sn)\mathcal{N}(S_{n}), it follows [Kal02, Lemma 1.4] that τSn\tau_{S_{n}} is measurable. Because each coordinate map of τ\tau is measurable, it follows [Kal02, Lemma 1.8] that τ\tau is measurable.

Let us now verify that the inverse map τ1:n=1N(Sn)N(S)\tau^{-1}\colon\prod_{n=1}^{\infty}N(S_{n})\to N(S) is measurable. Fix a set B𝒮B\in\mathcal{S} and an integer k0k\geq 0, and consider a set in 𝒞(S)\mathcal{C}(S) of form

C={ηN(S):η(B)=k}.C\ =\ \{\eta\in N(S)\colon\eta(B)=k\}.

Then

(τ1)1(C)\displaystyle(\tau^{-1})^{-1}(C) ={(η1,η2,):n=1ηn(B)=k}.\displaystyle\ =\ \{(\eta_{1},\eta_{2},\dots)\colon\sum_{n=1}^{\infty}\eta_{n}(B)=k\}.

Let ZkZ_{k} be the collection of integer-valued measures z=n=1znδnz=\sum_{n=1}^{\infty}z_{n}\delta_{n} on ={1,2,}\mathbb{N}=\{1,2,\dots\} with total mass n=1zn=k\sum_{n=1}^{\infty}z_{n}=k. Then

(τ1)1(C)\displaystyle(\tau^{-1})^{-1}(C) =zZk{(η1,η2,):ηn(B)=znfor all n}\displaystyle\ =\ \bigcup_{z\in Z_{k}}\{(\eta_{1},\eta_{2},\dots)\colon\eta_{n}(B)=z_{n}\ \text{for all $n$}\}
=zZkn=1{(η1,η2,):ηn(B)=zn}\displaystyle\ =\ \bigcup_{z\in Z_{k}}\bigcap_{n=1}^{\infty}\{(\eta_{1},\eta_{2},\dots)\colon\eta_{n}(B)=z_{n}\}

shows that (τ1)1(C)n=1𝒩(Sn)(\tau^{-1})^{-1}(C)\in\bigotimes_{n=1}^{\infty}\mathcal{N}(S_{n}). Because such sets CC generate 𝒩(S)\mathcal{N}(S), it follows [Kal02, Lemma 1.4] that τ1\tau^{-1} is measurable. ∎

B.2 Compensated Poisson integrals

Given measurable sets SnSS_{n}\uparrow S and measures η,μ\eta,\mu on a measurable space (S,𝒮)(S,\mathcal{S}), we say that the compensated integral

Afd(ημ)=limn(ASnf𝑑ηASnf𝑑μ)\int_{A}f\,d(\eta-\mu)\ =\ \lim_{n\to\infty}\left(\int_{A\cap S_{n}}f\,d\eta-\int_{A\cap S_{n}}f\,d\mu\right) (B.2)

of a measurable function f:Sf\colon S\to\mathbb{R} over a measurable set ASA\subset S converges if ASn|f|𝑑η+ASn|f|𝑑μ<\int_{A\cap S_{n}}{\lvert f\rvert}\,d\eta+\int_{A\cap S_{n}}{\lvert f\rvert}\,d\mu<\infty for all nn, and the limit in (B.2) exists in \mathbb{R}. The following two results characterise the convergence of compensated integrals when η\eta is sampled from a Poisson PP distribution with intensity measure μ\mu. These are needed for proving Theorem 4.4.

Lemma B.2.

Let PμP_{\mu} be a Poisson PP distribution with a sigma-finite intensity measure μ\mu such that μ(Sn)<\mu(S_{n})<\infty for all nn. For any bounded function f:Sf\colon S\to\mathbb{R} such that Af2𝑑μ<\int_{A}f^{2}\,d\mu<\infty, the compensated integral Afd(ημ)\int_{A}f\,d(\eta-\mu) converges for PμP_{\mu}-almost every ηN(S)\eta\in N(S).

Proof.

By replacing $f$ with $f1_{A}$, we may assume that $A=S$. Define $U_{n}=S_{n}\setminus S_{n-1}$ for $n\geq 1$, where $S_{0}=\emptyset$. Because $f$ is bounded, we see that $\int_{U_{n}}{\lvert f\rvert}\,d\mu\leq\|f\|_{\infty}\mu(S_{n})<\infty$. Campbell’s theorem [Kin93, Section 3.2] then implies that

Wn(η)=Unf𝑑ηUnf𝑑μW_{n}(\eta)\ =\ \int_{U_{n}}f\,d\eta-\int_{U_{n}}f\,d\mu

defines a real-valued random variable on the probability space $(N(S),\mathcal{N}(S),P_{\mu})$ with mean $E_{\mu}W_{n}=0$ and variance $E_{\mu}W_{n}^{2}=\int_{U_{n}}f^{2}\,d\mu$. Because the sets $U_{n}$ are disjoint, the random variables $W_{n}$ are independent. Furthermore, $E_{\mu}\sum_{n=1}^{\infty}W_{n}^{2}=\int_{S}f^{2}\,d\mu<\infty$. The Khinchin–Kolmogorov variance criterion [Kal02, Lemma 4.16] then implies that the sum $W=\sum_{n=1}^{\infty}W_{n}$ converges almost surely. Hence $W$, or equivalently the right side of formula (B.2), is a well-defined real-valued random variable on the probability space $(N(S),\mathcal{N}(S),P_{\mu})$. In particular, $W(\eta)\in\mathbb{R}$ for $P_{\mu}$-almost every $\eta$. ∎
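The mean and variance computations in the proof are easy to observe in simulation. A Python sketch (assuming NumPy), with the illustrative choices $\mu=3\cdot\mathrm{Leb}$ on $[0,1]$ and $f(x)=\cos(2\pi x)$, for which $\int f\,d\mu=0$ and $\int f^{2}\,d\mu=3/2$:

import numpy as np

rng = np.random.default_rng(5)
rate, n_rep = 3.0, 100_000
f = lambda x: np.cos(2.0 * np.pi * x)

comp = np.empty(n_rep)
for i in range(n_rep):
    pts = rng.random(rng.poisson(rate))   # Poisson PP with intensity 3 on [0,1]
    comp[i] = np.sum(f(pts)) - 0.0        # the compensator int f dmu vanishes here
print(comp.mean(), comp.var())            # approx 0 and int f^2 dmu = 1.5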

Lemma B.3.

Let PμP_{\mu} be a Poisson PP distribution with a sigma-finite intensity measure μ\mu. Let ϕ:S+\phi\colon S\to\mathbb{R}_{+} be such that S(ϕ1)2𝑑μ<\int_{S}(\sqrt{\phi}-1)^{2}\,d\mu<\infty, and denote A={xS:|logϕ(x)|1}A=\{x\in S\colon{\lvert\log\phi(x)\rvert}\leq 1\}. Assume that SnSS_{n}\uparrow S and μ(Sn)<\mu(S_{n})<\infty for all nn. Then the sets

Ω1={ηN(S):Ac{ϕ>0}|logϕ|𝑑η<,η(Ac)<}\Omega_{1}\ =\ \left\{\eta\in N(S)\colon\int_{A^{c}\cap\{\phi>0\}}{\lvert\log\phi\rvert}\,d\eta<\infty,\ \eta(A^{c})<\infty\right\}

and

Ω2={ηN(S):Alogϕd(ημ) converges,η(Sn)<for all n}\Omega_{2}\ =\ \left\{\eta\in N(S)\colon\text{$\int_{A}\log\phi\,d(\eta-\mu)$ converges},\ \eta(S_{n})<\infty\ \text{for all $n$}\right\}

satisfy Pμ(Ω1)=1P_{\mu}(\Omega_{1})=1 and Pμ(Ω2)=1P_{\mu}(\Omega_{2})=1.

Proof.

We note that $\mu(A^{c})=\int_{A^{c}}d\mu\leq\int_{A^{c}}(\phi+1)\,d\mu$. By (C.6), $\phi+1\leq\frac{e+1}{(e^{1/2}-1)^{2}}(\sqrt{\phi}-1)^{2}$ on $A^{c}$. It follows that $\mu(A^{c})$ is finite. Campbell’s formula then implies $\int_{N(S)}\eta(A^{c})P_{\mu}(d\eta)=\mu(A^{c})<\infty$. Hence the set $\Omega_{1}^{\prime}=\{\eta\colon\eta(A^{c})<\infty\}$ satisfies $P_{\mu}(\Omega_{1}^{\prime})=1$. We also note that for any $\eta\in\Omega_{1}^{\prime}$, the restriction of $\eta$ to $A^{c}$ can be written as a finite sum $\eta_{A^{c}}=\sum_{i}\delta_{x_{i}}$, so that $\int_{A^{c}\cap\{\phi>0\}}{\lvert\log\phi\rvert}\,d\eta=\sum_{i\colon\phi(x_{i})>0}{\lvert\log\phi(x_{i})\rvert}$ is a finite sum of finite terms. Therefore, $\Omega_{1}^{\prime}=\Omega_{1}$, and we conclude that $P_{\mu}(\Omega_{1})=1$.

Let

Ω2={ηN(S):η(Sn)<for all n}.\Omega_{2}^{\prime}\ =\ \Big{\{}\eta\in N(S)\colon\eta(S_{n})<\infty\ \text{for all $n$}\Big{\}}.

Because μ(Sn)<\mu(S_{n})<\infty for all nn, we see that Pμ(Ω2)=1P_{\mu}(\Omega_{2}^{\prime})=1. Observe that f=1Alogϕf=1_{A}\log\phi is bounded, and by (C.8),

\int_{S}f^{2}\,d\mu\ =\ \int_{A}\log^{2}\phi\,d\mu\ \leq\ 4e^{3}\int_{S}(\sqrt{\phi}-1)^{2}\,d\mu\ <\ \infty.

Together with $P_{\mu}(\Omega_{2}^{\prime})=1$, Lemma B.2 implies that $P_{\mu}(\Omega_{2})=1$. ∎

B.3 Truncated Poisson PPs

Let (S,𝒮)(S,\mathcal{S}) be a measurable space. Given a set U𝒮U\in\mathcal{S} and a measure λ\lambda, define λU(A)=λ(AU)\lambda_{U}(A)=\lambda(A\cap U) for A𝒮A\in\mathcal{S}. Then λU\lambda_{U} is a measure on (S,𝒮)(S,\mathcal{S}). We denote the truncation map by πU:λλU\pi_{U}\colon\lambda\mapsto\lambda_{U}.

When $\eta$ is sampled from a Poisson PP distribution $P_{\lambda}$ with a sigma-finite intensity measure $\lambda$ on $S$, the probability distribution of the random point pattern $\eta_{U}$ is given by $P_{\lambda}\circ\pi_{U}^{-1}$. The following proposition confirms that the law of $\eta_{U}$ is a Poisson PP distribution with truncated intensity measure $\lambda_{U}$, and that $P_{\lambda_{U}}$ is also the conditional distribution of $\eta$ sampled from $P_{\lambda}$ given that $\eta(U^{c})=0$.

Proposition B.4.

For any measurable set $U\subset S$:

  1. (i)

    $P_{\lambda}\circ\pi_{U}^{-1}=P_{\lambda_{U}}$.

  2. (ii)

    $P_{\lambda}\{\eta\colon\eta\in A,\ \eta(U^{c})=0\}=e^{-\lambda(U^{c})}\,P_{\lambda_{U}}(A)$ for all measurable $A\subset N(S)$.

Proof.

(i) This follows from [LP18, Theorem 5.2].

(ii) We note that $\eta=\eta_{U}$ for any $\eta$ such that $\eta(U^{c})=0$, and that $\eta(U^{c})=0$ if and only if $\eta_{U^{c}}=0$. We also note, again by [LP18, Theorem 5.2], that the random point patterns $\eta_{U}$ and $\eta_{U^{c}}$ are independent when $\eta$ is sampled from $P_{\lambda}$. Therefore, by (i),

P_{\lambda}\{\eta\colon\eta\in A,\ \eta(U^{c})=0\}\ =\ P_{\lambda}\{\eta\colon\eta_{U}\in A,\ \eta_{U^{c}}=0\}\ =\ P_{\lambda_{U}}(A)\,P_{\lambda_{U^{c}}}(\{0\}).

Because $\eta(S)$ is Poisson distributed with mean $\lambda(U^{c})$ when $\eta$ is sampled from $P_{\lambda_{U^{c}}}$, we see that $P_{\lambda_{U^{c}}}(\{0\})=P_{\lambda_{U^{c}}}\{\eta\colon\eta(S)=0\}=e^{-\lambda(U^{c})}$. Hence the claim follows. ∎
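For example, taking $A=N(S)$ in (ii) recovers the familiar void probability formula

P_{\lambda}\{\eta\colon\eta(U^{c})=0\}\ =\ e^{-\lambda(U^{c})}.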

Appendix C Elementary analysis

Lemma C.1.

For all $x>-1$,

\frac{x}{1+x}\ \leq\ \log(1+x)\ \leq\ x, (C.1)
x-\frac{x^{2}}{2(1-x_{-})^{2}}\ \leq\ \log(1+x)\ \leq\ x-\frac{x^{2}}{2(1+x_{+})^{2}}, (C.2)

and for all $y>0$,

\lvert\log y\rvert\ \leq\ \frac{\lvert y-1\rvert}{y\wedge 1} (C.3)

and

0\ \leq\ y-1-\log y\ \leq\ \frac{(y-1)^{2}}{2(y\wedge 1)^{2}}. (C.4)
Proof.

Fix a number $x>-1$. Define $f(t)=\log(1+tx)$ for $t\in[0,1]$. Note that $f^{\prime}(t)=x(1+tx)^{-1}$ and $f^{\prime\prime}(t)=-x^{2}(1+tx)^{-2}$. Then the formula $f(1)=f(0)+\int_{0}^{1}f^{\prime}(r)\,dr$ implies that

\log(1+x)\ =\ \int_{0}^{1}\frac{x}{1+rx}\,dr.

Then (C.1) follows by noting that $\frac{x}{1+x}\leq\frac{x}{1+rx}\leq x$ for all $0\leq r\leq 1$. Similarly, the formula $f(1)=f(0)+f^{\prime}(0)+\int_{0}^{1}\int_{0}^{s}f^{\prime\prime}(r)\,dr\,ds$ implies that

\log(1+x)\ =\ x-\int_{0}^{1}\int_{0}^{s}\frac{x^{2}}{(1+rx)^{2}}\,dr\,ds.

Then (C.2) follows by noting that $1-x_{-}\leq 1+rx\leq 1+x_{+}$ for all $0\leq r\leq 1$.

Fix a number $y>0$. By substituting $x=y-1$ into (C.1), we see that $\frac{y-1}{y}\leq\log y\leq y-1$. Hence $-\frac{\lvert y-1\rvert}{y}\leq\log y\leq\lvert y-1\rvert$, and (C.3) follows. By substituting $x=y-1$ into (C.2), we see that

\frac{(y-1)^{2}}{2(1+(y-1)_{+})^{2}}\ \leq\ y-1-\log y\ \leq\ \frac{(y-1)^{2}}{2(1-(y-1)_{-})^{2}}.

Now (C.4) follows by noting that $1-(y-1)_{-}=y\wedge 1$. ∎
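As a quick numerical sanity check: at $y=2$, inequality (C.3) reads $\log 2\approx 0.693\leq\frac{1}{2\wedge 1}=1$, and (C.4) reads $0\leq 1-\log 2\approx 0.307\leq\frac{1}{2}$; at $y=1/2$, (C.3) reads $\log 2\approx 0.693\leq\frac{1/2}{1/2}=1$.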

Lemma C.2.

If $t\geq 0$ satisfies $\lvert t-1\rvert\geq c$ for some $c>0$, then $t+1\leq C(\sqrt{t}-1)^{2}$, where

C\ =\ \frac{2+c}{(\sqrt{1+c}-1)^{2}}.
Proof.

Differentiation shows that $r(t)=\frac{(\sqrt{t}-1)^{2}}{t+1}=1-\frac{2\sqrt{t}}{t+1}$ is strictly decreasing on $[0,1]$ and strictly increasing on $[1,\infty)$. Moreover, $r(1/t)=r(t)$ for all $t>0$, as a direct computation shows.

(i) Assume that $0<c\leq 1$. Then $\lvert t-1\rvert\geq c$ implies that either $t\leq 1-c$ or $t\geq 1+c$. In the former case $r(t)\geq r(1-c)$, and in the latter case $r(t)\geq r(1+c)$. Because $(1-c)^{-1}\geq 1+c$, the symmetry $r(1-c)=r((1-c)^{-1})$ gives $r(1-c)\geq r(1+c)$ for $0<c<1$, while $r(0)=1\geq r(2)$ covers the case $c=1$. Hence in both cases $r(t)\geq r(1+c)$, so that $t+1\leq r(1+c)^{-1}(\sqrt{t}-1)^{2}$.

(ii) Assume that $c>1$. Then $\lvert t-1\rvert\geq c$ implies that $t\geq 1+c$, so that $r(t)\geq r(1+c)$, and again $t+1\leq r(1+c)^{-1}(\sqrt{t}-1)^{2}$. Because $r(1+c)^{-1}=\frac{2+c}{(\sqrt{1+c}-1)^{2}}$, the claim follows. ∎
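The constant $C$ above is sharp: at $t=1+c$, where $\lvert t-1\rvert=c$, the bound holds with equality, since

C(\sqrt{1+c}-1)^{2}\ =\ \frac{2+c}{(\sqrt{1+c}-1)^{2}}(\sqrt{1+c}-1)^{2}\ =\ 2+c\ =\ t+1.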

Lemma C.3.

For all $0\leq t\leq c$,

\lvert t-1\rvert\ \leq\ (1+c^{1/2})\lvert\sqrt{t}-1\rvert. (C.5)

For all $0\leq t\leq c^{-1}$ and all $t\geq c$, with $c>1$,

t+1\ \leq\ \frac{c+1}{(c^{1/2}-1)^{2}}(\sqrt{t}-1)^{2}. (C.6)
Proof.

(i) Because $\lvert t-1\rvert=\lvert\sqrt{t}+1\rvert\,\lvert\sqrt{t}-1\rvert$, we see that (C.5) follows by noting that $\lvert\sqrt{t}+1\rvert\leq 1+c^{1/2}$ for all $0\leq t\leq c$.

(ii) As noted in the proof of Lemma C.2, $r(t)=\frac{(\sqrt{t}-1)^{2}}{t+1}$ is decreasing on $[0,1]$ and increasing on $[1,\infty)$. Hence $r(t)\geq r(c)$ for $t\geq c$ and $r(t)\geq r(c^{-1})$ for $0\leq t\leq c^{-1}$. Because $r(c)=r(c^{-1})=\frac{(c^{1/2}-1)^{2}}{c+1}$, we conclude (C.6). ∎
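In the proof of Lemma B.3, inequality (C.6) is applied with $c=e$: for all $t\leq e^{-1}$ and all $t\geq e$,

t+1\ \leq\ \frac{e+1}{(e^{1/2}-1)^{2}}(\sqrt{t}-1)^{2}\ \approx\ 8.84\,(\sqrt{t}-1)^{2},

with equality at $t=e$ and at $t=e^{-1}$.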

Lemma C.4.

For all $t$ such that $\lvert\log t\rvert\leq 1$,

\lvert\log t+1-t\rvert\ \leq\ 2e^{3}(\sqrt{t}-1)^{2}, (C.7)
\log^{2}t\ \leq\ 4e^{3}(\sqrt{t}-1)^{2}. (C.8)
Proof.

Fix a number $t$ such that $\lvert\log t\rvert\leq 1$, so that $e^{-1}\leq t\leq e$. Then (C.4) implies that $\lvert\log t+1-t\rvert\leq\frac{(t-1)^{2}}{2(t\wedge 1)^{2}}\leq\frac{1}{2}e^{2}(t-1)^{2}$. Combining this with (C.5), applied with $c=e$, gives $\lvert\log t+1-t\rvert\leq\frac{1}{2}e^{2}(1+e^{1/2})^{2}(\sqrt{t}-1)^{2}$; because $1+e^{1/2}\leq 2e^{1/2}$, we have $(1+e^{1/2})^{2}\leq 4e$, and (C.7) follows. Furthermore, (C.3) implies that $\lvert\log t\rvert\leq\frac{\lvert t-1\rvert}{t\wedge 1}\leq e\lvert t-1\rvert$. Combining this with (C.5) in the same way, we see that $\log^{2}t\leq e^{2}(1+e^{1/2})^{2}(\sqrt{t}-1)^{2}\leq 4e^{3}(\sqrt{t}-1)^{2}$, so that (C.8) is valid. ∎

References

  • [ABDW21] Rami Atar, Amarjit Budhiraja, Paul Dupuis, and Ruoyu Wu. Robust bounds and optimization at the large deviations scale for queueing models via Rényi divergence. Annals of Applied Probability, 31(3):1061–1099, 2021.
  • [ABS21] Emmanuel Abbe, François Baccelli, and Abishek Sankararaman. Community detection on Euclidean random graphs. Information and Inference: A Journal of the IMA, 10(1):109–160, 2021.
  • [AKL24] Konstantin Avrachenkov, B. R. Vinay Kumar, and Lasse Leskelä. Community detection on block models with geometric kernels, 2024. https://arxiv.org/abs/2403.02802.
  • [Ana18] Venkat Anantharam. A variational characterization of Rényi divergences. IEEE Transactions on Information Theory, 64(11):6979–6989, 2018.
  • [AS15] Emmanuel Abbe and Colin Sandon. Community detection in general stochastic block models: Fundamental limits and efficient algorithms for recovery. In IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS), 2015.
  • [BDK+21] Jeremiah Birrell, Paul Dupuis, Markos A. Katsoulakis, Luc Rey-Bellet, and Jie Wang. Variational representations and neural network estimation of Rényi divergences. SIAM Journal on Mathematics of Data Science, 3(4):1093–1116, 2021.
  • [Bev20] Andrew Bevan. Spatial point patterns and processes. In Mark Gillings, Piraye Hacıgüzeller, and Gary Lock, editors, Archaeological Spatial Analysis, pages 60–76. Routledge, 2020.
  • [Bir07] Lucien Birgé. Model selection for Poisson processes. In Asymptotics: particles, processes and inverse problems. Festschrift for Piet Groeneboom, pages 32–64. Beachwood, OH: IMS, Institute of Mathematical Statistics, 2007.
  • [Bro71] Mark Brown. Discrimination of Poisson processes. Annals of Mathematical Statistics, 42(2):773–776, 1971.
  • [Che52] Herman Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Annals of Mathematical Statistics, 23(4):493–507, 1952.
  • [Dig13] Peter J. Diggle. Statistical Analysis of Spatial and Spatio-Temporal Point Patterns. Chapman and Hall/CRC, third edition, 2013.
  • [DM82] Claude Dellacherie and Paul-André Meyer. Probabilities and Potential B. North-Holland Publishing Company, 1982.
  • [DVJ03] Daryl J. Daley and David Vere-Jones. An Introduction to the Theory of Point Processes. Springer, second edition, 2003.
  • [GS02] Alison L. Gibbs and Francis Edward Su. On choosing and bounding probability metrics. International Statistical Review, 70(3):419–435, 2002.
  • [IPSS08] Janine Illian, Antti Penttinen, Helga Stoyan, and Dietrich Stoyan. Statistical Analysis and Modelling of Spatial Point Patterns. Wiley, 2008.
  • [Kak48] Shizuo Kakutani. On equivalence of infinite product measures. Annals of Mathematics, 49(1):214–224, 1948.
  • [Kal02] Olav Kallenberg. Foundations of Modern Probability. Springer, second edition, 2002.
  • [Kar83] Alan F. Karr. State estimation for Cox processes on general spaces. Stochastic Processes and their Applications, 14(3):209–232, 1983.
  • [Kar91] Alan F. Karr. Point Processes and Their Statistical Inference. Marcel Dekker, second edition, 1991.
  • [Kin67] John Frank Charles Kingman. Completely random measures. Pacific Journal of Mathematics, 21(1):59–78, 1967.
  • [Kin93] John Frank Charles Kingman. Poisson Processes. Oxford University Press, 1993.
  • [KL13] Mikko Kuronen and Lasse Leskelä. Hard-core thinnings of germ–grain models with power-law grain sizes. Advances in Applied Probability, 45(3):595–625, 2013.
  • [KLNS07] Ingemar Kaj, Lasse Leskelä, Ilkka Norros, and Volker Schmidt. Scaling limits for random fields with long-range dependence. Annals of Probability, 35(2):528–550, 2007.
  • [Les10] Lasse Leskelä. Stochastic relations of random variables and processes. Journal of Theoretical Probability, 23(2):523–546, 2010.
  • [Les22] Lasse Leskelä. Ross’s second conjecture and supermodular stochastic ordering. Queueing Systems, 100(3):213–215, 2022.
  • [Lie75] Friedrich Liese. Eine informationstheoretische Bedingung für die Äquivalenz unbegrenzt teilbarer Punktprozesse. Mathematische Nachrichten, 70(1):183–196, 1975.
  • [LP18] Günter Last and Mathew D. Penrose. Lectures on the Poisson Process. Cambridge University Press, 2018.
  • [LS78] Robert S. Liptser and Albert N. Shiryaev. Statistics of Random Processes II. Springer, 1978.
  • [LV06] Friedrich Liese and Igor Vajda. On divergences and informations in statistics and information theory. IEEE Transactions on Information Theory, 52(10):4394–4412, 2006.
  • [LV13] Lasse Leskelä and Matti Vihola. Stochastic order characterization of uniform integrability and tightness. Statistics & Probability Letters, 83(1):382–389, 2013.
  • [LV17] Lasse Leskelä and Matti Vihola. Conditional convex orders and measurable martingale couplings. Bernoulli, 23(4A):2784–2807, 2017.
  • [MFWF23] Benjamin Kurt Miller, Marco Federici, Christoph Weniger, and Patrick Forré. Simulation-based inference with the generalized Kullback-Leibler divergence, 2023. ICML Workshop on Synergy of Scientific and Machine Learning Modeling.
  • [MPST21] Xiaosheng Mu, Luciano Pomatto, Philipp Strack, and Omer Tamuz. From Blackwell dominance in large samples to Rényi divergences and back again. Econometrica, 89(1):475–506, 2021.
  • [Nie13] Frank Nielsen. An information-geometric characterization of Chernoff information. IEEE Signal Processing Letters, 20(3):269–272, 2013.
  • [NN11] Frank Nielsen and Richard Nock. A closed-form expression for the Sharma–Mittal entropy of exponential families. Journal of Physics A: Mathematical and Theoretical, 45(3):032003, 2011.
  • [PRBR18] Franck Picard, Patricia Reynaud-Bouret, and Etienne Roquain. Continuous testing for Poisson process intensities: a new perspective on scanning statistics. Biometrika, 105(4):931–944, 2018.
  • [PW24] Yury Polyanskiy and Yihong Wu. Information Theory. Cambridge University Press, 2024.
  • [RB17] Miklós Z. Rácz and Sébastien Bubeck. Basic models and questions in statistical network analysis. Statistics Surveys, 11:1–47, 2017.
  • [RBS10] Patricia Reynaud-Bouret and Sophie Schbath. Adaptive estimation for Hawkes processes; application to genome analysis. Annals of Statistics, 38(5):2781–2822, 2010.
  • [Rei93] Rolf-Dieter Reiss. A Course on Point Processes. Springer, 1993.
  • [Rén61] Alfréd Rényi. On measures of entropy and information. In 4th Berkeley Symposium on Mathematics, Statistics and Probability, 1961.
  • [Sas18] Igal Sason. On ff-divergences: Integral representations, local behavior, and inequalities. Entropy, 20(5):1–32, 2018.
  • [Shi96] Albert N. Shiryaev. Probability. Springer, second edition, 1996.
  • [Sko57] Anatoliy Volodymyrovych Skorohod. On the differentiability of measures which correspond to stochastic processes. I. Processes with independent increments. Theory of Probability & Its Applications, 2(4):407–432, 1957.
  • [SMSS02] Martin Snethlage, Vicent J. Martínez, Dietrich Stoyan, and Enn Saar. Point field models for the galaxy point pattern. Astronomy & Astrophysics, 388(3):758–765, 2002.
  • [SP00] Dietrich Stoyan and Antti Penttinen. Recent applications of point process methods in forestry statistics. Statistical Science, 15(1):61–78, 2000.
  • [Tak90] Yoichiro Takahashi. Absolute continuity of Poisson random fields. Publications of the Research Institute for Mathematical Sciences, Kyoto University, 26(4):629–647, 1990.
  • [Tsa98] Constantino Tsallis. Generalized entropy-based criterion for consistent testing. Physical Review E, 58:1442–1445, 1998.
  • [vH14] Tim van Erven and Peter Harremoës. Rényi divergence and Kullback–Leibler divergence. IEEE Transactions on Information Theory, 60(7):3797–3820, July 2014.
  • [ZT23] Qiaosheng Zhang and Vincent Y. F. Tan. Exact recovery in the general hypergraph stochastic block model. IEEE Transactions on Information Theory, 69(1):453–471, 2023.