
Information divergences and likelihood ratios of Poisson processes and point patterns

Lasse Leskelä
Abstract

This article develops an analytical framework for studying information divergences and likelihood ratios associated with Poisson processes and point patterns on general measurable spaces. The main results include explicit analytical formulas for Kullback–Leibler divergences, Rényi divergences, Hellinger distances, and likelihood ratios of the laws of Poisson point patterns in terms of their intensity measures. The general results yield similar formulas for inhomogeneous Poisson processes, compound Poisson processes, as well as spatial and marked Poisson point patterns. Additional results include simple characterisations of absolute continuity, mutual singularity, and the existence of common dominating measures. The analytical toolbox is based on Tsallis divergences of sigma-finite measures on abstract measurable spaces. The treatment is purely information-theoretic and free of topological assumptions.

Keywords: Poisson random measure, inhomogeneous Poisson process, point process, spatial point pattern, Rényi divergence, Tsallis divergence, Hellinger distance, mutual information, Chernoff information, Bhattacharyya distance

1 Introduction

A point pattern or a point process (PP) represents a countable set of points in space or time. Poisson PPs are fundamental statistical models for generating randomly scattered points on the real line, in a Euclidean space, or in an abstract measurable space $S$. They are encountered in a wide range of applications such as archeology [Bev20, PRBR18], astronomy [SMSS02], forestry statistics [SP00, KL13], machine learning [BDK+21, AS15, ABS21, AKL24], neuroscience and genomics [RBS10, PRBR18], and queueing systems [ABDW21, Les22]. The law of a Poisson PP is a probability measure $P_{\lambda}$ characterised by an intensity measure $\lambda$, so that $\lambda(B)$ indicates the expected number of points in $B\subset S$. In statistical inference, it is important to understand how the intensity measure can be identified from data. Statistical research devoted to this question has a long history, well summarised in standard textbooks [Kar91, DVJ03, IPSS08, Dig13]. The main analytical approaches include computing and estimating likelihood ratios $\frac{dP_{\lambda}}{dP_{\mu}}$ and information divergences of laws of Poisson PPs with intensity measures $\lambda$ and $\mu$.

Likelihood ratios are easy to compute for standard families of probability distributions on finite-dimensional spaces, but not so for probability measures of infinite-dimensional objects such as paths of stochastic processes or spaces of point patterns. In fact, even verifying the absolute continuity of a pair of probability measures, a necessary condition for the existence of a likelihood ratio, can be nontrivial. For Poisson PPs with finite intensity measures, a classical result [Kar91, Rei93] states that $P_{\lambda}\ll P_{\mu}$ if and only if $\lambda\ll\mu$, in which case a likelihood ratio is given by

$$\frac{dP_{\lambda}}{dP_{\mu}}(\eta) = \exp\left(\int_{S}\log\phi\,d\eta+\int_{S}(1-\phi)\,d\mu\right) \tag{1.1}$$

with $\phi=\frac{d\lambda}{d\mu}$ being a density of the intensity measures. Using this formula, it is easy to compute various types of information divergences and distances for $P_{\lambda}$ and $P_{\mu}$. For Poisson PPs with general sigma-finite intensity measures, the description and even the existence of a likelihood ratio is far less obvious. To see why, note that the rightmost integral in (1.1) equals $\int_{S}(1-\phi)\,d\mu=\mu(S)-\lambda(S)$ for finite intensity measures, but for infinite intensity measures this integral might not exist. For Poisson PPs with general sigma-finite intensity measures, most of the known results [Sko57, Lie75, Kar83, Tak90] are restricted to locally compact Polish spaces, thereby ruling out e.g. infinite-dimensional Hilbert spaces.

1.1 Main contributions

This article develops a framework for computing likelihood ratios and information divergences in the most general natural setting, for Poisson PPs with sigma-finite intensity measures on a general measurable space. The purely information-theoretic approach makes no topological assumptions, and allows one to work with point patterns in high- and infinite-dimensional spaces without worrying about topological regularity properties. A key contribution is an explicit formula (Theorem 4.4) for the likelihood ratio $\frac{dP_{\lambda}}{dP_{\mu}}$ that is applicable to all Poisson PP distributions with $P_{\lambda}\ll P_{\mu}$. This result facilitates the derivation of a characterisation for pairs of Poisson PPs whose laws are dominated by a Poisson PP distribution (Theorem 5.7).

Furthermore, the article provides a comprehensive characterisation of Rényi and Kullback–Leibler divergences of Poisson PPs (Theorems 5.1–5.2), showing that these divergences can be expressed as generalised Tsallis divergences of associated intensity measures. It also extends the definition of Tsallis divergences from probability measures to sigma-finite measures, representing them as linear combinations of Rényi divergences of Poisson distributions (Theorem 3.1). These Poisson–Rényi–Tsallis relationships yield a simplified characterisation for the absolute continuity and mutual singularity of general Poisson PP distributions.

The practical applicability of these results is demonstrated in various contexts, including Poisson processes, compound Poisson processes, marked Poisson point patterns, and Chernoff information of Poisson vectors.

1.2 Outline

The rest of the article is organised as follows. Section 2 introduces notations and definitions. Section 3 develops theoretical foundations for Tsallis divergences of sigma-finite measures. Section 4 presents the main results concerning likelihood ratios of Poisson PP distributions. Section 5 presents the main results about information divergences of Poisson PPs. Section 6 illustrates how the main results can be applied to analyse Poisson processes, compound Poisson processes, marked Poisson point patterns, and Chernoff information of Poisson vectors. Section 7 contains the technical proofs of the main results, and Section 8 concludes.

2 Preliminaries

2.1 Measures

Standard conventions of measure theory and Lebesgue integration [Kal02] are used. The sets of nonnegative integers and nonnegative real numbers are denoted by $\mathbb{Z}_{+}$ and $\mathbb{R}_{+}$, respectively. For measures $\lambda,\mu$ on a measurable space $(S,\mathcal{S})$, the notation $\lambda\ll\mu$ means that $\lambda$ is absolutely continuous with respect to $\mu$, that is, $\lambda(A)=0$ for all $A\in\mathcal{S}$ such that $\mu(A)=0$. We denote $\lambda\perp\mu$ and say that $\lambda,\mu$ are mutually singular if there exists a measurable set $B$ such that $\lambda(B^{c})=0$ and $\mu(B)=0$. A measurable function $f\colon S\to\mathbb{R}_{+}$ is called a density (or Radon–Nikodym derivative) of $\lambda$ with respect to $\mu$ when $\lambda(A)=\int_{A}f\,d\mu$ for all $A\in\mathcal{S}$; in this case we denote $f=\frac{d\lambda}{d\mu}$. A density of a probability measure with respect to another probability measure is called a likelihood ratio.

The symbol $\delta_{x}$ refers to the Dirac measure at $x$. For a number $c\geq 0$, the symbol $\operatorname{Poi}(c)$ denotes the Poisson probability distribution with mean $c$ and density $k\mapsto e^{-c}\frac{c^{k}}{k!}$ with respect to the counting measure on $\mathbb{Z}_{+}$, with the standard conventions that $\operatorname{Poi}(0)=\delta_{0}$ and $\operatorname{Poi}(\infty)=\delta_{\infty}$.

2.2 Point patterns

A point pattern is a countable collection of points, possibly with multiplicities, in a measurable space $(S,\mathcal{S})$. Such a collection is naturally represented as a measure $\eta$ on $(S,\mathcal{S})$, so that $\eta(B)$ equals the number of points in $B\in\mathcal{S}$. The countability requirement is guaranteed when $\eta=\sum_{n=1}^{\infty}\eta_{n}$ for some finite measures $\eta_{n}$. Following [LP18], we define $N(S)$ as the set of all measures that can be written as a countable sum of integer-valued finite measures on $(S,\mathcal{S})$, and equip it with the sigma-algebra $\mathcal{N}(S)$ generated by the evaluation maps $\operatorname{ev}_{B}\colon\eta\mapsto\eta(B)$, $B\in\mathcal{S}$. An element of $N(S)$ is called a point pattern or a point process (PP).

2.3 Poisson PPs

Poisson PPs are defined in the standard manner; see [Kin67, Kal02, LP18] for general background. A PP distribution is a probability measure on $(N(S),\mathcal{N}(S))$. The Laplace functional of a PP distribution $P$ is the map that assigns to every measurable function $u\colon S\to[0,\infty]$ a number $L_{P}(u)=\int_{N(S)}\exp(-\int_{S}u\,d\eta)\,P(d\eta)\in[0,1]$. Given a sigma-finite measure $\lambda$ on $(S,\mathcal{S})$, the Poisson PP distribution with intensity measure $\lambda$ is the unique probability measure $P_{\lambda}$ on $(N(S),\mathcal{N}(S))$ such that

(i) $P_{\lambda}\circ(\operatorname{ev}_{B_{1}},\dots,\operatorname{ev}_{B_{n}})^{-1}=\bigotimes_{i=1}^{n}\big(P_{\lambda}\circ\operatorname{ev}_{B_{i}}^{-1}\big)$ for all integers $n\geq 1$ and mutually disjoint $B_{1},\dots,B_{n}\in\mathcal{S}$;

(ii) $P_{\lambda}\circ\operatorname{ev}_{B}^{-1}=\operatorname{Poi}(\lambda(B))$ for all $B\in\mathcal{S}$.

For the existence and uniqueness, see e.g. [LP18, Proposition 2.10, Theorem 3.6] or [Kal02, Lemmas 12.1–12.2, Theorem 12.7]. Samples $\eta$ from $P_{\lambda}$ are called Poisson PPs.

2.4 Rényi divergences

Rényi divergences were introduced in [Rén61]. The Rényi divergence of order $\alpha\in[0,\infty)$ for probability measures $P,Q$ on a measurable space $(S,\mathcal{S})$ is defined [vH14, PW24] by

$$R_{\alpha}(P\|Q) = \begin{cases}-\log Q(p>0),&\alpha=0,\\ \frac{1}{\alpha-1}\log\int_{S}p^{\alpha}q^{1-\alpha}\,d\nu,&\alpha\notin\{0,1\},\\ \int_{S}p\log\frac{p}{q}\,d\nu,&\alpha=1,\end{cases} \tag{2.1}$$

where $p=\frac{dP}{d\nu}$ and $q=\frac{dQ}{d\nu}$ are densities of $P,Q$ with respect to a sigma-finite measure $\nu$ on $S$ (the definition does not depend on the choice of reference measure or densities; we may take $\nu=\frac{1}{2}(P+Q)$, for example [vH14]); and for $\alpha>1$ we read $p^{\alpha}q^{1-\alpha}=\frac{p^{\alpha}}{q^{\alpha-1}}$ and adopt the conventions [LV06, vH14] that $\frac{0}{0}=0$ and $\frac{t}{0}=\infty$ for $t>0$, together with $0\log\frac{0}{t}=0$ for $t\geq 0$ and $t\log\frac{t}{0}=\infty$ for $t>0$. For $\alpha\in(1,\infty)$, note that $\int_{S}p^{\alpha}q^{1-\alpha}\,d\nu=\int_{\{p>0,\,q>0\}}p^{\alpha}q^{1-\alpha}\,d\nu+\infty\cdot P\{q=0\}$, so that

$$R_{\alpha}(P\|Q) = \begin{cases}\frac{1}{\alpha-1}\log\int_{\{p>0,\,q>0\}}p^{\alpha}q^{1-\alpha}\,d\nu,&P\ll Q,\\ \infty,&P\not\ll Q.\end{cases}$$

Rényi divergences also admit several variational characterisations, see for example [Ana18, BDK+21, vH14].

Rényi divergences of orders $\frac{1}{2},1,2$ are connected to other important information quantities as follows: $R_{1}(P\|Q)$ equals the Kullback–Leibler divergence or relative entropy, $R_{1/2}(P\|Q)=-2\log(1-\frac{\operatorname{Hel}^{2}(P,Q)}{2})$ where $\operatorname{Hel}(P,Q)$ is the Hellinger distance, and $R_{2}(P\|Q)=\log(1+\chi^{2}(P,Q))$ where $\chi^{2}(P,Q)$ refers to the $\chi^{2}$-divergence [GS02].
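To make these identities concrete, the following minimal sketch (in Python, with illustrative distributions not taken from this article) evaluates (2.1) for two discrete distributions and checks the order-$\frac{1}{2}$ and order-$2$ relations; here $\operatorname{Hel}^{2}(P,Q)$ is computed as $\int(\sqrt{p}-\sqrt{q})^{2}\,d\nu$, the convention under which the displayed identity holds:

```python
import numpy as np

def renyi(p, q, alpha):
    """Rényi divergence R_alpha(P||Q) of (2.1) for strictly positive pmfs."""
    if alpha == 1:
        return float(np.sum(p * np.log(p / q)))
    return float(np.log(np.sum(p**alpha * q**(1 - alpha))) / (alpha - 1))

p = np.array([0.2, 0.5, 0.3])   # illustrative pmf of P
q = np.array([0.4, 0.4, 0.2])   # illustrative pmf of Q

hel2 = np.sum((np.sqrt(p) - np.sqrt(q))**2)   # squared Hellinger distance
chi2 = np.sum((p - q)**2 / q)                 # chi-squared divergence

assert np.isclose(renyi(p, q, 0.5), -2 * np.log(1 - hel2 / 2))  # order 1/2
assert np.isclose(renyi(p, q, 2.0), np.log(1 + chi2))           # order 2
```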

3 Tsallis divergences of sigma-finite measures

This section introduces a theoretical framework of Tsallis divergences of sigma-finite measures. Section 3.1 provides a definition and a representation formula as a Poisson–Rényi integral, and Section 3.2 summarises some basic properties. Section 3.3 demonstrates how Tsallis divergences characterise absolute continuity and mutual singularity. Section 3.4 establishes a connection with Hellinger distances, and Section 3.5 presents a disintegration formula.

3.1 Definition

Tsallis divergences of probability measures were introduced in [Tsa98] (see also [NN11]). The following definition generalises the notion of Tsallis divergence from probability measures to arbitrary sigma-finite measures. Let $\lambda,\mu$ be sigma-finite measures on a measurable space $S$ admitting densities $f=\frac{d\lambda}{d\nu}$ and $g=\frac{d\mu}{d\nu}$ with respect to a sigma-finite measure $\nu$. The Tsallis divergence of order $\alpha\in\mathbb{R}_{+}$ is defined by

$$T_{\alpha}(\lambda\|\mu)=\begin{cases}\mu\{f=0\},&\alpha=0,\\ \int_{S}\big(\frac{\alpha f+(1-\alpha)g-f^{\alpha}g^{1-\alpha}}{1-\alpha}\big)\,d\nu,&\alpha\notin\{0,1\},\\ \int_{S}\big(f\log\frac{f}{g}+g-f\big)\,d\nu,&\alpha=1,\end{cases} \tag{3.1}$$

where for $\alpha>1$ we read $f^{\alpha}g^{1-\alpha}$ as $\frac{f^{\alpha}}{g^{\alpha-1}}$ and adopt the conventions that $\frac{0}{0}=0$ and $\frac{t}{0}=\infty$ for $t>0$, as well as $t\log\frac{t}{0}=\infty$ for $t>0$ and $0\log\frac{0}{t}=0$ for $t\geq 0$.

Theorem 3.1.

$\alpha\mapsto T_{\alpha}(\lambda\|\mu)$ is a well-defined nondecreasing function from $\mathbb{R}_{+}$ into $[0,\infty]$ that is continuous on the interval $\{\alpha\colon T_{\alpha}(\lambda\|\mu)<\infty\}$, and admits the representation

$$T_{\alpha}(\lambda\|\mu) = \int_{S}R_{\alpha}(p_{f(x)}\|p_{g(x)})\,\nu(dx), \tag{3.2}$$

where $p_{s}$ refers to the Poisson distribution $k\mapsto e^{-s}\frac{s^{k}}{k!}$ on the nonnegative integers with mean $s$. Furthermore, the value of $T_{\alpha}(\lambda\|\mu)$ does not depend on the choice of the densities $f,g$ nor the measure $\nu$.

Proof.

Section 7.1. ∎

Remark 3.2.

In the special case with $\lambda(S)=\mu(S)=1$, we find that $T_{\alpha}(\lambda\|\mu)=\frac{1-\int_{S}f^{\alpha}g^{1-\alpha}\,d\nu}{1-\alpha}$ for $\alpha\in(0,1)$ agrees with the classical definition of the Tsallis divergence for probability measures [Tsa98]. In this case $T_{1}(\lambda\|\mu)=\operatorname{KL}(\lambda\|\mu)$ equals the Kullback–Leibler divergence, and the Tsallis divergence of order $\alpha\notin\{0,1\}$ is related to the Rényi divergence by $T_{\alpha}(\lambda\|\mu)=\frac{1-e^{-(1-\alpha)R_{\alpha}(\lambda\|\mu)}}{1-\alpha}$. In the case of finite measures, the formulas on the right side of (3.1) can be simplified by substituting $\int_{S}f\,d\nu=\lambda(S)$ and $\int_{S}g\,d\nu=\mu(S)$. Such simplifications are not possible for general sigma-finite measures, but Theorem 3.1 guarantees that the integrals in (3.1) are nevertheless well defined.
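As a numerical sanity check of the representation (3.2), the following sketch (assuming a finite base space with $\nu$ the counting measure; the density values are illustrative) compares the Tsallis divergence (3.1) with the corresponding sum of Rényi divergences of Poisson distributions, the latter evaluated directly from truncated pmfs:

```python
import numpy as np
from scipy.stats import poisson

def tsallis(f, g, alpha):
    """T_alpha(lambda||mu) of (3.1) with nu = counting measure, f, g > 0."""
    if alpha == 1:
        return np.sum(f * np.log(f / g) + g - f)
    return np.sum((alpha*f + (1 - alpha)*g - f**alpha * g**(1 - alpha))
                  / (1 - alpha))

def renyi_poisson(s, t, alpha, kmax=200):
    """R_alpha(Poi(s)||Poi(t)) from a truncated sum over the pmfs."""
    k = np.arange(kmax)
    ps, pt = poisson.pmf(k, s), poisson.pmf(k, t)
    return np.log(np.sum(ps**alpha * pt**(1 - alpha))) / (alpha - 1)

f = np.array([1.5, 0.7, 2.0])   # density of lambda (illustrative)
g = np.array([1.0, 1.0, 1.0])   # density of mu (illustrative)
alpha = 0.7

lhs = tsallis(f, g, alpha)
rhs = sum(renyi_poisson(fx, gx, alpha) for fx, gx in zip(f, g))
assert np.isclose(lhs, rhs)     # representation (3.2)
```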

Remark 3.3.

$T_{1/2}(\lambda\|\mu)=2H^{2}(\lambda,\mu)$, where $H(\lambda,\mu)$ refers to the Hellinger distance (see Section 3.4), and $T_{1}(\lambda\|\mu)$ corresponds to the generalised KL divergence discussed in [MFWF23].

Remark 3.4.

When $\lambda,\mu$ admit strictly positive densities $f,g$ with respect to a sigma-finite measure $\nu$, the Tsallis divergence of order $\alpha\neq 1$ can be written as $T_{\alpha}(\lambda\|\mu)=\int g\,\Phi(\frac{f}{g})\,d\nu$, where $\Phi(t)=\frac{t^{\alpha}-1-\alpha(t-1)}{\alpha-1}$ is a convex function such that $\Phi(1)=0$. In this sense $T_{\alpha}(\lambda\|\mu)$ corresponds to an instance of an $f$-divergence between sigma-finite measures $\lambda,\mu$. Tsallis divergences restricted to probability measures may therefore be analysed using the rich theory of $f$-divergences [PW24, Sas18].

3.2 Properties

Tsallis divergences share several properties in common with Rényi divergences. This section summarises some of the most important ones.

Proposition 3.5.

The Tsallis divergence for sigma-finite measures $\lambda\ll\mu$ with density $\phi=\frac{d\lambda}{d\mu}$ can be written as

$$T_{\alpha}(\lambda\|\mu)=\begin{cases}\mu\{\phi=0\},&\alpha=0,\\ \int_{S}\big(\frac{\alpha\phi+1-\alpha-\phi^{\alpha}}{1-\alpha}\big)\,d\mu,&\alpha\notin\{0,1\},\\ \int_{S}\big(\phi\log\phi+1-\phi\big)\,d\mu,&\alpha=1.\end{cases} \tag{3.3}$$

Proof.

Theorem 3.1 indicates that we are free to choose densities and reference measures in (3.1). The claim follows by choosing densities $f=\phi$, $g=1$ of $\lambda,\mu$ with respect to the reference measure $\nu=\mu$. ∎

Proposition 3.6.

$(1-\alpha)\,T_{\alpha}(\mu\|\lambda)=\alpha\,T_{1-\alpha}(\lambda\|\mu)$ for all $\alpha\in(0,1)$.

Proof.

Immediate from formula (3.1). ∎

Proposition 3.7.

$\frac{\alpha}{\beta}\,\frac{1-\beta}{1-\alpha}\,T_{\beta}(\lambda\|\mu)\leq T_{\alpha}(\lambda\|\mu)\leq T_{\beta}(\lambda\|\mu)$ for all $\alpha\leq\beta$ in $(0,1)$.

Proof.

The second inequality follows by the monotonicity of Tsallis divergences (Theorem 3.1). The first inequality follows by applying Proposition 3.6 and monotonicity to conclude that

$$\frac{\alpha}{\beta}\frac{1-\beta}{1-\alpha}T_{\beta}(\lambda\|\mu) = \frac{\alpha}{1-\alpha}T_{1-\beta}(\mu\|\lambda) \leq \frac{\alpha}{1-\alpha}T_{1-\alpha}(\mu\|\lambda) = T_{\alpha}(\lambda\|\mu). \qquad\blacksquare$$

3.3 Absolute continuity and mutual singularity

The following is a characterisation of absolute continuity for sigma-finite measures in terms of Tsallis divergences. It is similar in spirit to an analogous characterisation of probability measures using Rényi divergences: $P\ll Q$ iff $R_{0}(Q\|P)=0$ (see [Shi96, Theorem III.9.2], [vH14, Theorem 23]).

Proposition 3.8.

The following are equivalent for all sigma-finite measures:

(i) $\lambda\ll\mu$.

(ii) $T_{0}(\mu\|\lambda)=0$.

Proof.

Let $\lambda,\mu$ be measures on a measurable space $(S,\mathcal{S})$ admitting densities $f=\frac{d\lambda}{d\nu}$ and $g=\frac{d\mu}{d\nu}$ with respect to a measure $\nu$. The equivalence of (i) and (ii) follows by applying Lemma A.1 and noting that $\lambda\{g=0\}=T_{0}(\mu\|\lambda)$ by definition (3.1). ∎

The following is a characterisation of mutual singularity for finite measures in terms of Tsallis divergences. It is similar in spirit to an analogous characterisation of probability measures using Rényi divergences: $P\perp Q$ iff $R_{0}(P\|Q)=\infty$ iff $R_{0}(Q\|P)=\infty$ (see [Shi96, Theorem III.9.3], [vH14, Theorem 24]).

Proposition 3.9.

The following are equivalent for all finite measures:

(i) $\lambda\perp\mu$.

(ii) $T_{0}(\lambda\|\mu)=\mu(S)$.

(iii) $T_{0}(\mu\|\lambda)=\lambda(S)$.

Proof.

Let $\lambda,\mu$ be measures on a measurable space $(S,\mathcal{S})$ admitting densities $f=\frac{d\lambda}{d\nu}$ and $g=\frac{d\mu}{d\nu}$ with respect to a measure $\nu$.

(i)$\Leftrightarrow$(ii): By (3.1), $T_{0}(\lambda\|\mu)=\mu\{f=0\}$. Hence $\mu(S)-T_{0}(\lambda\|\mu)=\mu\{f>0\}$. The claim follows because $\lambda\perp\mu$ is equivalent to $\mu\{f>0\}=0$ (Lemma A.1).

(i)$\Leftrightarrow$(iii): Analogously, $\lambda(S)-T_{0}(\mu\|\lambda)=\lambda(S)-\lambda\{g=0\}=\lambda\{g>0\}$. The claim follows because $\lambda\perp\mu$ is equivalent to $\lambda\{g>0\}=0$ (Lemma A.1). ∎

3.4 Hellinger distances of sigma-finite measures

The Hellinger distance between sigma-finite measures $\lambda$ and $\mu$ is defined by

$$H(\lambda,\mu) = \left(\frac{1}{2}\int_{S}\big(\sqrt{f}-\sqrt{g}\big)^{2}\,d\nu\right)^{1/2}, \tag{3.4}$$

where $f=\frac{d\lambda}{d\nu}$ and $g=\frac{d\mu}{d\nu}$ are densities with respect to a sigma-finite measure $\nu$. Hellinger distances take values in $[0,1]$ for probability measures, and in $[0,\infty]$ for general sigma-finite measures. By writing $(\sqrt{f}-\sqrt{g})^{2}=f+g-2f^{1/2}g^{1/2}$, we see by comparing (3.1) and (3.4) that

$$T_{1/2}(\lambda\|\mu) = 2H^{2}(\lambda,\mu). \tag{3.5}$$

In particular, we see that Tsallis divergences of order $\frac{1}{2}$ are symmetric: $T_{1/2}(\lambda\|\mu)=T_{1/2}(\mu\|\lambda)$. When $\lambda,\mu$ are probability measures, we note that $T_{1/2}(\lambda\|\mu)=2(1-\int_{S}f^{1/2}g^{1/2}\,d\nu)=2(1-\exp(-\frac{1}{2}R_{1/2}(\lambda\|\mu)))$. Hence for probability measures $\lambda,\mu$,

$$H^{2}(\lambda,\mu) = 1-\exp\left(-\tfrac{1}{2}R_{1/2}(\lambda\|\mu)\right). \tag{3.6}$$
Proposition 3.10.

The right side of (3.4) does not depend on the choice of the densities $f,g$ nor the reference measure $\nu$. Furthermore, if $\lambda\ll\mu$, then the Hellinger distance can also be written as

$$H(\lambda,\mu) = \left(\frac{1}{2}\int_{S}\big(\sqrt{\phi}-1\big)^{2}\,d\mu\right)^{1/2},$$

where $\phi\colon S\to\mathbb{R}_{+}$ is a density of $\lambda$ with respect to $\mu$.

Proof.

The first claim follows by applying Theorem 3.1 with (3.5). The second claim follows by applying (3.4) with $f=\phi$, $g=1$, and $\nu=\mu$. ∎

Proposition 3.11.

$H(\lambda,\xi)\leq H(\lambda,\mu)+H(\mu,\xi)$ for all sigma-finite measures.

Proof.

Let $f,g,h$ be densities of $\lambda,\mu,\xi$ with respect to the sigma-finite measure $\nu=\lambda+\mu+\xi$. We note that

$$H(\lambda,\xi) = \frac{1}{\sqrt{2}}\,\big\lVert\sqrt{f}-\sqrt{h}\big\rVert_{L^{2}(\nu)} = \frac{1}{\sqrt{2}}\,\big\lVert(\sqrt{f}-\sqrt{g})+(\sqrt{g}-\sqrt{h})\big\rVert_{L^{2}(\nu)}.$$

Therefore, the claim follows by applying Minkowski's inequality $\lVert u+v\rVert_{L^{2}(\nu)}\leq\lVert u\rVert_{L^{2}(\nu)}+\lVert v\rVert_{L^{2}(\nu)}$, which is true for all measurable functions $u,v$, regardless of whether $\lVert u\rVert_{L^{2}(\nu)},\lVert v\rVert_{L^{2}(\nu)}$ are finite or not. ∎

3.5 Disintegration of Tsallis divergences

The disintegration of a measure $\Lambda$ on a product space $(S_{1}\times S_{2},\mathcal{S}_{1}\otimes\mathcal{S}_{2})$ refers to a representation $\Lambda=\lambda\otimes K$, where $\lambda$ is a measure on $(S_{1},\mathcal{S}_{1})$ corresponding to the first marginal of $\Lambda$, and $K$ is a kernel from $(S_{1},\mathcal{S}_{1})$ into $(S_{2},\mathcal{S}_{2})$. Equivalently,

$$\Lambda(C) = \int_{S_{1}}\int_{S_{2}}1_{C}(x,y)\,K_{x}(dy)\,\lambda(dx),\quad C\in\mathcal{S}_{1}\otimes\mathcal{S}_{2}.$$

Informally, we write

$$\Lambda(dx,dy) = \lambda(dx)\,K_{x}(dy).$$

If $\Lambda$ disintegrates according to $\Lambda=\lambda\otimes K$, then it also disintegrates according to $\Lambda=\tilde{\lambda}\otimes\tilde{K}$ where $\tilde{\lambda}=2\lambda$ and $\tilde{K}=\frac{1}{2}K$. To rule out such unidentifiability issues, in applications it is natural to require $K$ to be a probability kernel, that is, a kernel such that $K_{x}(S_{2})=1$ for all $x\in S_{1}$. The following result characterises Tsallis divergences of disintegrated measures, which helps to compute information divergences of compound Poisson processes and marked Poisson PPs (see Sections 6.3–6.4). See [PW24, Theorem 2.13, Equation 7.71] for similar results concerning Rényi divergences.

Theorem 3.12.

Let $\lambda,\mu$ be sigma-finite measures on a measurable space $(S_{1},\mathcal{S}_{1})$, and let $K,L$ be probability kernels from $(S_{1},\mathcal{S}_{1})$ into a measurable space $(S_{2},\mathcal{S}_{2})$. Assume that there exist measurable functions $k,\ell\colon S_{1}\times S_{2}\to\mathbb{R}_{+}$ and a kernel $M$ from $(S_{1},\mathcal{S}_{1})$ into $(S_{2},\mathcal{S}_{2})$ such that

$$K_{t}(dx) = k_{t}(x)\,M_{t}(dx), \qquad L_{t}(dx) = \ell_{t}(x)\,M_{t}(dx), \tag{3.7}$$

for all $t\in S_{1}$. Then the Tsallis divergence of order $\alpha\in\mathbb{R}_{+}$ equals

$$T_{\alpha}(\lambda\otimes K\|\mu\otimes L) = \begin{cases}T_{0}(\lambda\|\mu)+\int_{\{f\neq 0\}}T_{0}(K_{t}\|L_{t})\,\mu(dt),&\alpha=0,\\ T_{\alpha}(\lambda\|\mu)+\int_{S_{1}}T_{\alpha}(K_{t}\|L_{t})\,f_{t}^{\alpha}g_{t}^{1-\alpha}\,\nu(dt),&\alpha\notin\{0,1\},\\ T_{1}(\lambda\|\mu)+\int_{S_{1}}T_{1}(K_{t}\|L_{t})\,\lambda(dt),&\alpha=1,\end{cases} \tag{3.8}$$

where $f_{t}=\frac{d\lambda}{d\nu}(t)$ and $g_{t}=\frac{d\mu}{d\nu}(t)$ are densities of $\lambda$ and $\mu$ with respect to a sigma-finite measure $\nu$.

Proof.

Section 7.2. ∎

A sufficient condition for (3.7) is to assume that $(S_{2},\mathcal{S}_{2})$ is separable in the sense that $\mathcal{S}_{2}$ is generated by a countable set family. In this case there exist [DM82, Theorem 58] measurable functions $k,\ell\colon S_{1}\times S_{2}\to\mathbb{R}_{+}$ such that (3.7) holds for the probability kernel $M=\frac{1}{2}(K+L)$. It might be that (3.7) is not needed for Theorem 3.12. Proving this might require a measurable selection theorem that is different from the usual ones, for which it is assumed that at least one of the sigma-algebras is generated by a regular topology [Les10, LV17].
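The disintegration formula (3.8) can also be checked numerically on a toy example. The sketch below uses a hypothetical discrete setting with $S_{1}=S_{2}=\{0,1\}$ and $\nu$ the counting measure (so $f=\lambda$ and $g=\mu$), and verifies the case $\alpha\notin\{0,1\}$:

```python
import numpy as np

def tsallis(f, g, alpha):
    # T_alpha for densities w.r.t. the counting measure, alpha not in {0, 1}
    return np.sum((alpha*f + (1-alpha)*g - f**alpha * g**(1-alpha)) / (1-alpha))

lam = np.array([2.0, 0.5])               # intensity lambda on S1 (illustrative)
mu  = np.array([1.0, 1.5])               # intensity mu on S1 (illustrative)
K = np.array([[0.3, 0.7], [0.6, 0.4]])   # probability kernel: row t is K_t
L = np.array([[0.5, 0.5], [0.2, 0.8]])
alpha = 0.4

# Left side of (3.8): Tsallis divergence of the product-space intensities.
lhs = tsallis((lam[:, None] * K).ravel(), (mu[:, None] * L).ravel(), alpha)
# Right side of (3.8).
rhs = tsallis(lam, mu, alpha) + sum(
    tsallis(K[t], L[t], alpha) * lam[t]**alpha * mu[t]**(1 - alpha)
    for t in range(2))
assert np.isclose(lhs, rhs)
```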

4 Likelihood ratios of Poisson PPs

This section presents a general likelihood ratio formula for Poisson PPs. Section 4.1 first focuses on the case with finite intensity measures, and Section 4.2 then provides a general formula.

4.1 Finite Poisson PPs

Poisson PPs with finite intensity measures are almost surely finite. Likelihood ratio formulas for Poisson PP distributions with finite intensity measures are classical (e.g. [LS78, Kar91, Rei93, Bir07]), although they are usually restricted to Polish spaces. The proof of the following general result is included in Section 7.3 for completeness.

Theorem 4.1.

Any Poisson PP distributions with finite intensity measures $\lambda\ll\mu$ satisfy $P_{\lambda}\ll P_{\mu}$, and a likelihood ratio is given by

$$\frac{dP_{\lambda}}{dP_{\mu}}(\eta) = 1_{M_{\lambda,\mu}}(\eta)\,\exp\bigg(\int_{S}(1-\phi)\,d\mu+\int_{S}\log\phi\,d\eta\bigg), \tag{4.1}$$

where $\phi=\frac{d\lambda}{d\mu}$ is a density of $\lambda$ with respect to $\mu$, and

$$M_{\lambda,\mu} = \big\{\eta\in N(S)\colon \eta\{\phi=0\}=0\big\}. \tag{4.2}$$
Remark 4.2.

For every finite point pattern $\eta\in M_{\lambda,\mu}$, the integral $\int_{S}\log\phi\,d\eta$ in (4.1) is a well-defined real number because $\log\phi\in\mathbb{R}$ outside the set $\{\phi=0\}$ of $\eta$-measure zero. Also recall that every point pattern generated by a Poisson PP distribution with a finite intensity measure is finite almost surely. Therefore, the right side in (4.1) is well defined for $P_{\mu}$-almost every $\eta$.

Remark 4.3.

The set $M_{\lambda,\mu}$ in (4.2) indicates the set of point patterns that contain no points in the region where $\phi=\frac{d\lambda}{d\mu}$ vanishes. If we assume that $\lambda$ and $\mu$ are mutually absolutely continuous, then we may omit $1_{M_{\lambda,\mu}}(\eta)$ from (4.1). To see why, observe that $\lambda\{\phi=0\}=\int_{\{\phi=0\}}\phi\,d\mu=0$ together with $\mu\ll\lambda$ implies that $\mu\{\phi=0\}=0$. Therefore, by Markov's inequality,

$$P_{\mu}(M_{\lambda,\mu}^{c}) = P_{\mu}\{\eta\{\phi=0\}\geq 1\} \leq E_{\mu}\,\eta\{\phi=0\} = \mu\{\phi=0\} = 0.$$
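As an illustration of Theorem 4.1, the following Monte Carlo sketch works on the two-point space $S=\{0,1\}$ with illustrative intensities, so that $P_{\mu}$ is a product of two Poisson distributions; reweighting samples from $P_{\mu}$ by the likelihood ratio (4.1) recovers a $P_{\lambda}$-expectation:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = np.array([0.8, 2.0])          # intensity lambda (illustrative)
mu  = np.array([1.5, 1.0])          # intensity mu (illustrative)
phi = lam / mu                      # density d(lambda)/d(mu)

n = 200_000
eta = rng.poisson(mu, size=(n, 2))  # point counts sampled under P_mu
# Likelihood ratio (4.1); the indicator 1_M is identically 1 here since phi > 0.
lr = np.exp(np.sum((1 - phi) * mu) + eta @ np.log(phi))

est = np.mean(lr * eta.sum(axis=1))  # E_mu[LR * eta(S)]
print(est, lam.sum())                # both approximate E_lambda[eta(S)] = 2.8
```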

4.2 Sigma-finite PPs

The simple density formula of Theorem 4.1 is not in general valid for Poisson PPs with infinite intensity measures, because the integral $\int_{S}\log\phi\,d\eta$ might not converge for infinite point patterns $\eta$. In the general setting, we need to work with carefully compensated integrals. A key observation is that the compensated Poisson integral $\int_{\{|\log\phi|\leq 1\}}\log\phi\,d(\eta-\mu)$ and the ordinary Poisson integral $\int_{\{|\log\phi|>1\}}\log\phi\,d\eta$ of the logarithm of $\phi=\frac{d\lambda}{d\mu}$ converge for $P_{\mu}$-almost every $\eta$ whenever $P_{\lambda}\ll P_{\mu}$. See Appendix B.2 for the definition of the compensated integral and details. The following theorem confirms that a density of $P_{\lambda}$ with respect to $P_{\mu}$ can be written using these integrals.

Theorem 4.4.

Any Poisson PP distributions with sigma-finite intensity measures such that $\lambda\ll\mu$ and $H(\lambda,\mu)<\infty$ satisfy $P_{\lambda}\ll P_{\mu}$, and a likelihood ratio is given by

$$\frac{dP_{\lambda}}{dP_{\mu}}(\eta) = 1_{M_{\lambda,\mu}}(\eta)\,\exp(\ell_{\lambda,\mu}(\eta)), \tag{4.3}$$

where

$$\ell_{\lambda,\mu}(\eta) = \int_{\{|\log\phi|\leq 1\}}\log\phi\,d(\eta-\mu) + \int_{\{|\log\phi|>1\}}\log\phi\,d\eta + \int_{\{|\log\phi|\leq 1\}}(\log\phi+1-\phi)\,d\mu + \int_{\{|\log\phi|>1\}}(1-\phi)\,d\mu \tag{4.4}$$

and

$$M_{\lambda,\mu} = \{\eta\in N(S)\colon \eta\{\phi=0\}=0\} \tag{4.5}$$

are defined in terms of a density $\phi=\frac{d\lambda}{d\mu}$.

Proof.

Section 7.4. ∎

Remark 4.5.

When $\lambda$ and $\mu$ are mutually absolutely continuous, the factor $1_{M_{\lambda,\mu}}(\eta)$ may be omitted from (4.3), as explained in Remark 4.3.

5 Information divergences of Poisson PPs

In principle, most information divergences for Poisson PP distributions can be computed using the likelihood ratio $\frac{dP_{\lambda}}{dP_{\mu}}$. Unfortunately, the general likelihood ratio formula in Theorem 4.4 involves a rather complicated stochastic integral that renders it difficult to obtain explicit analytical expressions. However, the fact that the laws of Poisson PPs are infinitely divisible suggests that simple formulas should be available for information divergences that are additive with respect to product measures. It is well known that Rényi divergences, including the Kullback–Leibler divergence, enjoy this tensorisation property [vH14, PW24]. Indeed, it was recently confirmed that linear combinations of Rényi divergences are the only divergences satisfying the tensorisation property and the data processing inequality [MPST21, Theorem 2].

This section demonstrates how Rényi divergences and related quantities of general Poisson PP distributions can be computed from their associated intensity measures as generalised Tsallis divergences introduced in Section 3. The section is outlined as follows. Section 5.1 summarises formulas for Rényi divergences, Kullback–Leibler divergences, and Hellinger distances. Section 5.2 characterises the absolute continuity of Poisson PP distributions using Tsallis divergences of their intensity measures. Section 5.3 characterises pairs of Poisson PPs whose laws admit a common dominating measure corresponding to a Poisson PP.

5.1 Divergences and distances

In what follows, $P_{\lambda}$ and $P_{\mu}$ are Poisson PP distributions with sigma-finite intensity measures $\lambda$ and $\mu$ on a measurable space $S$.

Theorem 5.1.

The Rényi divergence of order $\alpha\in(0,\infty)$ for Poisson PP distributions $P_{\lambda}$ and $P_{\mu}$ is given by the Tsallis divergence of their intensity measures according to

$$R_{\alpha}(P_{\lambda}\|P_{\mu}) = T_{\alpha}(\lambda\|\mu). \tag{5.1}$$

If $T_{\alpha}(\lambda\|\mu)<\infty$ for some $\alpha>0$, then (5.1) also holds for $\alpha=0$.

Proof.

Section 7.5. ∎
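For a finite space $S$, the law $P_{\lambda}$ is a product of Poisson distributions, so (5.1) can be verified directly. A small numerical sketch (with $S=\{0,1\}$ and illustrative intensities; the pmfs are truncated at 60 terms, which is ample for these means):

```python
import numpy as np
from scipy.stats import poisson

lam = np.array([1.2, 0.4])   # intensity lambda (illustrative)
mu  = np.array([0.6, 0.9])   # intensity mu (illustrative)
alpha = 0.3

k = np.arange(60)
P = np.outer(poisson.pmf(k, lam[0]), poisson.pmf(k, lam[1]))  # pmf of P_lambda
Q = np.outer(poisson.pmf(k, mu[0]), poisson.pmf(k, mu[1]))    # pmf of P_mu

renyi = np.log(np.sum(P**alpha * Q**(1 - alpha))) / (alpha - 1)
tsallis = np.sum((alpha*lam + (1 - alpha)*mu - lam**alpha * mu**(1 - alpha))
                 / (1 - alpha))
assert np.isclose(renyi, tsallis)   # Theorem 5.1
```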

Theorem 5.2.

The Kullback–Leibler divergence for Poisson PP distributions $P_{\lambda}$ and $P_{\mu}$ is given by

$$\operatorname{KL}(P_{\lambda}\|P_{\mu}) = \int_{S}\left(f\log\frac{f}{g}+g-f\right)d\nu, \tag{5.2}$$

where $f=\frac{d\lambda}{d\nu}$ and $g=\frac{d\mu}{d\nu}$ are densities with respect to a measure $\nu$.

Proof.

Immediate corollary of Theorem 5.1. ∎

As another corollary of Theorem 5.1, we obtain a simple formula for the Hellinger distance between Poisson PP distributions. This result was proved in [Tak90] for locally finite intensity measures on locally compact Polish spaces.

Theorem 5.3.

The Hellinger distance between Poisson PPs $P_{\lambda}$ and $P_{\mu}$ is given by $H^{2}(P_{\lambda},P_{\mu})=1-e^{-H^{2}(\lambda,\mu)}$.

Proof.

Theorem 5.1 combined with formula (3.5) implies that $R_{1/2}(P_{\lambda}\|P_{\mu})=T_{1/2}(\lambda\|\mu)=2H^{2}(\lambda,\mu)$. For probability measures $P_{\lambda}$ and $P_{\mu}$, we find by applying (3.6) that $H^{2}(P_{\lambda},P_{\mu})=1-e^{-\frac{1}{2}R_{1/2}(P_{\lambda}\|P_{\mu})}$. By combining these findings, we conclude that $H^{2}(P_{\lambda},P_{\mu})=1-e^{-H^{2}(\lambda,\mu)}$. ∎

5.2 Absolute continuity

Theorem 5.4.

The following are equivalent for any Poisson PPs with sigma-finite intensity measures:

(i) $P_{\lambda}\ll P_{\mu}$.

(ii) $T_{0}(\mu\|\lambda)=0$ and $T_{\alpha}(\mu\|\lambda)<\infty$ for all $\alpha\in\mathbb{R}_{+}$.

(iii) $T_{0}(\mu\|\lambda)=0$ and $T_{\alpha}(\mu\|\lambda)<\infty$ for some $0<\alpha<\infty$.

Proof.

(i)$\Rightarrow$(ii): Assume that $P_{\lambda}\ll P_{\mu}$. Fix a measurable set $A\subset S$ such that $\mu(A)=0$. Let $C=\{\eta\in N(S)\colon\eta(A)>0\}$. Note that $\eta(A)$ is Poisson-distributed with mean $\lambda(A)$ (resp. $\mu(A)$) when $\eta$ is sampled from $P_{\lambda}$ (resp. $P_{\mu}$). Therefore $P_{\mu}(C)=1-e^{-\mu(A)}=0$. Because $P_{\lambda}\ll P_{\mu}$, it follows that $0=P_{\lambda}(C)=1-e^{-\lambda(A)}$, and we conclude that $\lambda(A)=0$. Hence $\lambda\ll\mu$, and Proposition 3.8 implies that $T_{0}(\mu\|\lambda)=0$. Furthermore, because $P_{\lambda}$ and $P_{\mu}$ are not mutually singular, we know [vH14, Theorem 24] that $R_{\alpha}(P_{\mu}\|P_{\lambda})<\infty$ for all $\alpha\in\mathbb{R}_{+}$. By Theorem 5.1, it follows that $T_{\alpha}(\mu\|\lambda)<\infty$ for all $\alpha\in\mathbb{R}_{+}$.

(ii)$\Rightarrow$(iii): Immediate.

(iii)$\Rightarrow$(i): Theorem 5.1 implies that $R_{0}(P_{\mu}\|P_{\lambda})=T_{0}(\mu\|\lambda)=0$. By formula (2.1), we see that $-\log P_{\lambda}\{G>0\}=0$, where $G=\frac{dP_{\mu}}{dm}$ is a density of $P_{\mu}$ with respect to an arbitrary measure $m$ such that $P_{\lambda},P_{\mu}\ll m$. Hence $P_{\lambda}\{G=0\}=0$, and Lemma A.1 confirms that $P_{\lambda}\ll P_{\mu}$. ∎

As a corollary of Theorem 5.4, we obtain a simple proof of the following result extending [Tak90] to general nontopological spaces.

Theorem 5.5.

The following are equivalent for Poisson PPs with sigma-finite intensity measures:

(i) $P_{\lambda}\ll P_{\mu}$.

(ii) $\lambda\ll\mu$ and $H(\lambda,\mu)<\infty$.

Proof.

Assume that $\lambda\ll\mu$ and $H(\lambda,\mu)<\infty$. Then $T_{0}(\mu\|\lambda)=0$ by Proposition 3.8. Formula (3.5) implies that $T_{1/2}(\mu\|\lambda)=2H^{2}(\mu,\lambda)=2H^{2}(\lambda,\mu)$ is finite. Theorem 5.4 now implies that $P_{\lambda}\ll P_{\mu}$.

Assume that $P_{\lambda}\ll P_{\mu}$. Theorem 5.4 then implies that $T_{0}(\mu\|\lambda)=0$ and $T_{1/2}(\mu\|\lambda)<\infty$. Then Proposition 3.8 implies that $\lambda\ll\mu$, and formula (3.5) implies that $H(\lambda,\mu)<\infty$. ∎

Kakutani's famous dichotomy [Kak48] states that infinite products of probability measures $\prod_{i}P_{i}$ and $\prod_{i}Q_{i}$, such that $P_{i}$ and $Q_{i}$ are mutually absolutely continuous for all $i$, are either mutually absolutely continuous or mutually singular; there is no middle ground. Earlier results in this and the previous section yield a simple proof of an analogue of Kakutani's dichotomy for Poisson PPs. This result is well known for Polish spaces [Lie75, Kar91], and has also been presented in [Bro71] in terms of a more complicated criterion for intensity measures that is equivalent to $H(\lambda,\mu)<\infty$.

Theorem 5.6.

Let $\lambda$ and $\mu$ be mutually absolutely continuous. Then $P_{\lambda}$ and $P_{\mu}$ are either mutually absolutely continuous or mutually singular, according to whether $H(\lambda,\mu)$ is finite or infinite.

Proof.

Assume that $H(\lambda,\mu)<\infty$. Theorem 5.5 then shows that $P_{\lambda}\ll P_{\mu}$ and $P_{\mu}\ll P_{\lambda}$, so that $P_{\lambda}$ and $P_{\mu}$ are mutually absolutely continuous.

Assume next that $H(\lambda,\mu)=\infty$. Theorem 5.3 then implies that $H(P_{\lambda},P_{\mu})=1$. In light of (3.6) we see that $H^{2}(P_{\lambda},P_{\mu})=1-\exp\left(-\frac{1}{2}R_{1/2}(P_{\lambda}\|P_{\mu})\right)$, from which we conclude that $R_{1/2}(P_{\lambda}\|P_{\mu})=\infty$. It follows [vH14, Theorem 24] that $P_{\lambda}$ and $P_{\mu}$ are mutually singular. ∎

5.3 Existence of a dominating Poisson PP

For any pair of Poisson PP distributions $P_{\lambda},P_{\mu}$, there always exists a probability measure $Q$ on $(N(S),\mathcal{N}(S))$ such that $P_{\lambda}\ll Q$ and $P_{\mu}\ll Q$. For example, we may choose $Q=\frac{1}{2}(P_{\lambda}+P_{\mu})$. For practical purposes (e.g. Monte Carlo simulation), it would be helpful to find a Poisson PP distribution, such as $Q=P_{\lambda+\mu}$, that would also serve as a common dominating probability measure. This is always possible for finite intensity measures but fails in general (Remark 5.8). Remarkably, even when a dominating Poisson PP exists, $\lambda+\mu$ might not be a feasible choice for its intensity.

Theorem 5.7.

Given Poisson PP distributions $P_{\lambda},P_{\mu}$ with sigma-finite intensity measures $\lambda,\mu$, there exists a Poisson PP distribution $P_{\xi}$ such that $P_{\lambda},P_{\mu}\ll P_{\xi}$ if and only if $H(\lambda,\mu)<\infty$.

Proof.

Assume that there exists a Poisson PP distribution $P_{\xi}$ with a sigma-finite intensity measure $\xi$ such that $P_{\lambda},P_{\mu}\ll P_{\xi}$. Theorem 5.5 implies that $H(\lambda,\xi)<\infty$ and $H(\xi,\mu)<\infty$. The triangle inequality (Proposition 3.11) implies that $H(\lambda,\mu)\leq H(\lambda,\xi)+H(\xi,\mu)$. Hence $H(\lambda,\mu)<\infty$.

Assume that $H(\lambda,\mu)<\infty$. Let $f,g$ be densities of $\lambda,\mu$ with respect to the sigma-finite measure $\nu=\lambda+\mu$. Define a measure $\xi(dx)=h(x)\,\nu(dx)$ where $h=\frac{1}{4}(\sqrt{f}+\sqrt{g})^{2}$. Observe that $f\leq 4h$ and $g\leq 4h$ pointwise, and therefore we see that $\lambda\ll\xi$ and $\mu\ll\xi$. Furthermore, because $\sqrt{h}=\frac{1}{2}(\sqrt{f}+\sqrt{g})$, we see that $\sqrt{h}-\sqrt{f}=\frac{1}{2}(\sqrt{g}-\sqrt{f})$ and $\sqrt{h}-\sqrt{g}=\frac{1}{2}(\sqrt{f}-\sqrt{g})$. Then by formula (3.4),

$$H^{2}(\lambda,\xi) = \frac{1}{2}\int(\sqrt{h}-\sqrt{f})^{2}\,d\nu = \frac{1}{8}\int(\sqrt{f}-\sqrt{g})^{2}\,d\nu = \frac{1}{4}H^{2}(\lambda,\mu).$$

Hence $H(\lambda,\xi)=\frac{1}{2}H(\lambda,\mu)$. By symmetry, $H(\mu,\xi)=\frac{1}{2}H(\lambda,\mu)$. We conclude that $H(\lambda,\xi)$ and $H(\mu,\xi)$ are finite. Theorem 5.5 now confirms that $P_{\lambda}\ll P_{\xi}$ and $P_{\mu}\ll P_{\xi}$. ∎

Remark 5.8.

$P_{\lambda},P_{\mu}\ll P_{\lambda+\mu}$ if and only if $H(\lambda,\lambda+\mu)+H(\mu,\lambda+\mu)<\infty$. The latter requirement is stronger than $H(\lambda,\mu)<\infty$. To see why, observe that (see Proposition 3.10) $H^{2}(\lambda,2\lambda)=\frac{1}{2}\int_{S}(\sqrt{2}-\sqrt{1})^{2}\,d\lambda=\frac{1}{2}(\sqrt{2}-1)^{2}\lambda(S)$. Therefore $H(\lambda,\mu)=0$ and $H(\lambda,\lambda+\mu)=\infty$ whenever $\lambda$ is an infinite measure and $\mu=\lambda$. Let us also note that $P_{\lambda},P_{\mu}\ll P_{\lambda+\mu}$ is always true for finite intensity measures $\lambda,\mu$, because Hellinger distances between finite measures are finite, as is seen from (3.4) combined with the inequality $(\sqrt{f}-\sqrt{g})^{2}\leq f+g$.

6 Applications

This section lists various applications of the general formulas derived in the previous sections. Section 6.1 discusses Poisson processes, Section 6.2 Chernoff information of Poisson vectors, Section 6.3 marked Poisson PPs, and Section 6.4 concludes with compound Poisson processes.

6.1 Poisson processes

The counting process of a point pattern $\eta\in N(\mathbb{R}_{+})$ is a function $X\colon\mathbb{R}_{+}\to\mathbb{Z}_{+}$ defined by

$$X_{t} = \int_{\mathbb{R}_{+}}1(s\leq t)\,\eta(ds) = \eta([0,t]). \tag{6.1}$$

Denote by $\operatorname{cou}\colon\eta\mapsto X$ the map induced by (6.1). A Poisson process with intensity measure $\lambda$ is the counting process $X=\operatorname{cou}(\eta)$ of a point pattern $\eta$ sampled from a Poisson PP distribution $P_{\lambda}$ with a sigma-finite intensity measure $\lambda$ on $\mathbb{R}_{+}$. The law of the Poisson process is the pushforward measure $\mathcal{L}(X)=P_{\lambda}\circ\operatorname{cou}^{-1}$. When $\lambda$ admits a density $f$ with respect to the Lebesgue measure on $\mathbb{R}_{+}$, the process $X=(X_{t})_{t\in\mathbb{R}_{+}}$ is called an inhomogeneous Poisson process with intensity function $f$.

Theorem 6.1.

For Poisson processes $X=(X_{t})_{t\in\mathbb{R}_{+}}$ and $Y=(Y_{t})_{t\in\mathbb{R}_{+}}$ with intensity functions $f$ and $g$, the Kullback–Leibler divergence is given by

$$\operatorname{KL}(\mathcal{L}(X)\|\mathcal{L}(Y)) = \int_{0}^{\infty}\Big(f_{t}\log\frac{f_{t}}{g_{t}}+g_{t}-f_{t}\Big)\,dt,$$

the Rényi divergence of order $\alpha\neq 1$ is given by

$$R_{\alpha}(\mathcal{L}(X)\|\mathcal{L}(Y)) = \int_{0}^{\infty}\Big(\frac{\alpha f_{t}+(1-\alpha)g_{t}-f_{t}^{\alpha}g_{t}^{1-\alpha}}{1-\alpha}\Big)\,dt,$$

and the Hellinger distance is given by

$$\operatorname{Hel}(\mathcal{L}(X),\mathcal{L}(Y)) = \sqrt{1-e^{-\frac{1}{2}T_{1/2}(\lambda\|\mu)}},$$

where $T_{1/2}(\lambda\|\mu)=\int_{0}^{\infty}\big(\sqrt{f_{t}}-\sqrt{g_{t}}\big)^{2}\,dt$.

Proof.

Denote by $F(\mathbb{R}_{+},\mathbb{Z}_{+})$ the set of nondecreasing functions $X\colon\mathbb{R}_{+}\to\mathbb{Z}_{+}$ that are right-continuous with left limits (càdlàg), equipped with the sigma-algebra generated by the evaluation maps $X\mapsto X_{t}$. It follows from (6.1) that the map $\operatorname{cou}\colon N(\mathbb{R}_{+})\to F(\mathbb{R}_{+},\mathbb{Z}_{+})$ is a measurable bijection with a measurable inverse. Therefore (Lemma A.4), $R_{\alpha}(\mathcal{L}(X)\|\mathcal{L}(Y))=R_{\alpha}(P_{\lambda}\|P_{\mu})$ with $\lambda(dt)=f(t)\,dt$ and $\mu(dt)=g(t)\,dt$. Theorem 5.1 implies that $R_{\alpha}(P_{\lambda}\|P_{\mu})=T_{\alpha}(\lambda\|\mu)$, and the first two claims follow by (3.1).

Next, we note by (3.6) that $H^{2}(\mathcal{L}(X),\mathcal{L}(Y))=1-\exp\big(-\frac{1}{2}R_{1/2}(\mathcal{L}(X)\|\mathcal{L}(Y))\big)$, so that the last claim follows from $R_{1/2}(\mathcal{L}(X)\|\mathcal{L}(Y))=R_{1/2}(P_{\lambda}\|P_{\mu})=T_{1/2}(\lambda\|\mu)$. ∎
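As a numerical illustration of Theorem 6.1, the following sketch evaluates the Kullback–Leibler divergence and the Hellinger distance by quadrature for two hypothetical intensity functions $f_{t}=2e^{-t}$ and $g_{t}=e^{-t/2}$, chosen only so that the integrals converge:

```python
import numpy as np
from scipy.integrate import quad

f = lambda t: 2.0 * np.exp(-t)      # intensity function of X (illustrative)
g = lambda t: np.exp(-t / 2)        # intensity function of Y (illustrative)

kl, _ = quad(lambda t: f(t) * np.log(f(t) / g(t)) + g(t) - f(t), 0, np.inf)
t_half, _ = quad(lambda t: (np.sqrt(f(t)) - np.sqrt(g(t)))**2, 0, np.inf)
hel = np.sqrt(1 - np.exp(-0.5 * t_half))   # Hellinger distance of the two laws
print(kl, hel)
```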

6.2 Chernoff information for Poisson vectors

In classical binary hypothesis testing, the task is to estimate a parameter $\theta\in\{0,1\}$ from $n$ independent samples from a probability distribution $F_{\theta}$ on a measurable space $S$. The error rate (i.e. Bayes risk) of a decision rule $\hat{\theta}_{n}\colon S^{n}\to\{0,1\}$, averaged with respect to prior probabilities $\pi_{0},\pi_{1}>0$, is given by $R_{\pi}(\hat{\theta}_{n})=\pi_{0}F_{0}^{\otimes n}(\hat{\theta}_{n}=1)+\pi_{1}F_{1}^{\otimes n}(\hat{\theta}_{n}=0)$. Chernoff's famous theorem [Che52] states that for $n\gg 1$, the minimum error rate scales as

$$\inf_{\hat{\theta}_{n}}R_{\pi}(\hat{\theta}_{n}) = e^{-(1+o(1))\,C(F_{0}\|F_{1})\,n},$$

where

$$C(F_{0}\|F_{1}) = \sup_{\alpha\in(0,1)}(1-\alpha)\,R_{\alpha}(F_{0}\|F_{1})$$

is called the Chernoff information of the test [Nie13].

In a network community detection problem related to a stochastic block model, Abbe and Sandon [AS15] reduced a key estimation task into a binary hypothesis test for the law of a random vector with independent Poisson-distributed components, either having mean vector $H_{0}\colon(\lambda_{1},\dots,\lambda_{K})$ or $H_{1}\colon(\mu_{1},\dots,\mu_{K})$. Equivalently, the law of this random vector can be seen as a Poisson PP distribution on the finite set $\{1,\dots,K\}$ with intensity measure admitting density $k\mapsto\lambda_{k}$ or $k\mapsto\mu_{k}$ with respect to the counting measure. Hence we are looking at a hypothesis test between Poisson PP distributions $F_{0}=P_{\lambda}$ and $F_{1}=P_{\mu}$. With the help of Theorem 5.1 and formula (3.1), we see that the Chernoff information of the test is given by

$$C(P_{\lambda}\|P_{\mu}) = \sup_{\alpha\in(0,1)}\sum_{k=1}^{K}\big(\alpha\lambda_{k}+(1-\alpha)\mu_{k}-\lambda_{k}^{\alpha}\mu_{k}^{1-\alpha}\big).$$

This is what is called the Chernoff–Hellinger divergence in [AS15, RB17, ZT23].
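Since the objective is concave in $\alpha$ (each summand has nonpositive second derivative $-\lambda_{k}^{\alpha}\mu_{k}^{1-\alpha}(\log\lambda_{k}-\log\mu_{k})^{2}$), the supremum is easy to compute numerically. A sketch with illustrative mean vectors:

```python
import numpy as np
from scipy.optimize import minimize_scalar

lam = np.array([3.0, 1.0, 0.5])   # mean vector under H_0 (illustrative)
mu  = np.array([1.0, 2.0, 1.5])   # mean vector under H_1 (illustrative)

def objective(alpha):
    # (1 - alpha) * R_alpha(P_lam||P_mu) = (1 - alpha) * T_alpha(lam||mu)
    return np.sum(alpha*lam + (1 - alpha)*mu - lam**alpha * mu**(1 - alpha))

res = minimize_scalar(lambda a: -objective(a), bounds=(1e-9, 1 - 1e-9),
                      method='bounded')
print(res.x, -res.fun)   # maximising alpha and C(P_lambda||P_mu)
```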

6.3 Marked Poisson PPs

A marking of a (random or nonrandom) set of points with locations $x_{1},x_{2},\dots$ in $S_{1}$ associates with each point $x_{i}$ a random variable $y_{i}$ in $S_{2}$, called a mark, so that the conditional distribution of $y_{i}$ given $x_{i}=x$ is determined by $x$, and the marks are conditionally independent given the locations. The statistics of the marking mechanism are parameterised by the collection of conditional distributions $K\colon x\mapsto K_{x}=\mathcal{L}(y_{i}\,|\,x_{i}=x)$, which constitutes a probability kernel from $S_{1}$ into $S_{2}$. Such a marking mechanism can also be defined for random point patterns that admit a proper enumeration. This is the case for point patterns sampled from a Poisson PP distribution [LP18, Corollary 3.7].

Let $P_{\lambda}$ be a Poisson PP distribution with a sigma-finite intensity measure $\lambda$ on a measurable space $(S_{1},\mathcal{S}_{1})$, and let $K$ be a probability kernel from $(S_{1},\mathcal{S}_{1})$ into $(S_{2},\mathcal{S}_{2})$. A marked Poisson PP with intensity measure $\lambda$ and mark kernel $K$ is then defined by sampling a point pattern $\xi$ on $S_{1}\times S_{2}$ from a Poisson PP distribution $P_{\lambda\otimes K}$ with intensity measure $\lambda\otimes K$ defined by

$$(\lambda\otimes K)(C) = \int_{S_{1}}\int_{S_{2}}1_{C}(x,y)\,K_{x}(dy)\,\lambda(dx),\qquad C\in\mathcal{S}_{1}\otimes\mathcal{S}_{2}.$$

Then the Poisson PP distribution $P_{\lambda}$ equals the law of the point pattern $A\mapsto\xi(A\times S_{2})$ corresponding to the locations of the points in $S_{1}$, and the probability kernel $K$ yields the conditional distributions of the marks [LP18, Section 5.2]. Hence $\xi$ is a marked Poisson PP with intensity measure $\lambda$ and marking kernel $K$.

Theorem 6.2.

If $\xi,\zeta$ are marked Poisson PPs with sigma-finite intensity measures $\lambda,\mu$ and mark kernels $K,L$, respectively, then the Rényi divergence of order $\alpha>0$ is given by $R_{\alpha}(\mathcal{L}(\xi)\|\mathcal{L}(\zeta))=T_{\alpha}(\lambda\otimes K\|\mu\otimes L)$.

Proof.

Because $\mathcal{L}(\xi)=P_{\lambda\otimes K}$ and $\mathcal{L}(\zeta)=P_{\mu\otimes L}$, the claim follows by Theorem 5.1. ∎

Theorem 6.2 combined with Theorem 3.12 now allows one to compute Kullback–Leibler and Rényi divergences of marked Poisson PPs in terms of the intensity measures $\lambda,\mu$ and the kernels $K,L$.

6.4 Compound Poisson processes

A set of points $(t_{i},x_{i})$ associated with time stamps $t_{i}\in\mathbb{R}_{+}$ and labels $x_{i}\in\mathbb{R}^{d}$ can be modelled as a point pattern on $\mathbb{R}_{+}\times\mathbb{R}^{d}$, or as a marked point pattern on $\mathbb{R}_{+}$ with mark space $\mathbb{R}^{d}$. Denote by $N(\mathbb{R}_{+}\times\mathbb{R}^{d})$ the set of point patterns $\eta$ on $\mathbb{R}_{+}\times\mathbb{R}^{d}$ such that $\eta([0,t]\times\mathbb{R}^{d})<\infty$ for all $t\in\mathbb{R}_{+}$. The cumulative process of such a point pattern is a function $X\colon\mathbb{R}_{+}\to\mathbb{R}^{d}$ defined by

$$X_{t} = \int_{[0,t]\times\mathbb{R}^{d}}x\,\eta(ds,dx). \tag{6.2}$$

Denote by $\operatorname{agg}\colon\eta\mapsto X$ the map induced by (6.2). A compound Poisson process with event intensity measure $\lambda$ and increment probability kernel $K$ is the cumulative process $X=\operatorname{agg}(\eta)$ of a point pattern $\eta$ sampled from a Poisson PP distribution $P_{\lambda\otimes K}$ on $N(\mathbb{R}_{+}\times\mathbb{R}^{d})$ with intensity measure $(\lambda\otimes K)(dt,dx)=\lambda(dt)\,K_{t}(dx)$, where $\lambda$ is a locally finite measure on $\mathbb{R}_{+}$ and $K$ is a probability kernel from $\mathbb{R}_{+}$ into $\mathbb{R}^{d}$. The law of the compound Poisson process is the pushforward measure $\mathcal{L}(X)=P_{\lambda\otimes K}\circ\operatorname{agg}^{-1}$. The assumption that $\lambda$ is locally finite implies that $P_{\lambda\otimes K}(N(\mathbb{R}_{+}\times\mathbb{R}^{d}))=1$, so that the right side of (6.2) is well defined for $P_{\lambda\otimes K}$-almost every $\eta$.

The map $\operatorname{agg}\colon\eta\mapsto X$ induced by (6.2) maps $N(\mathbb{R}_{+}\times\mathbb{R}^{d})$ into the set $F(\mathbb{R}_{+},\mathbb{R}^{d})$ of right-continuous and piecewise constant functions from $\mathbb{R}_{+}$ into $\mathbb{R}^{d}$. However, this map is not bijective because: (i) two simultaneous events $(t,x)$ and $(t,y)$ have the same contribution as a single event $(t,x+y)$ to the cumulative process; and (ii) points $(t,0)$ with zero mark do not contribute anything. To rule out such identifiability issues, we will restrict to the set $N_{s}(\mathbb{R}_{+}\times\mathbb{R}^{d})$ of point patterns $\eta\in N(\mathbb{R}_{+}\times\mathbb{R}^{d})$ such that $\eta(\{t\}\times\mathbb{R}^{d})\in\{0,1\}$ for all $t$, and $\eta(\mathbb{R}_{+}\times\{0\})=0$. Accordingly, we assume that $\lambda$ is locally finite, diffuse in the sense that $\lambda(\{t\})=0$ for all $t$, and that the probability kernel $K$ satisfies $K_{t}(\{0\})=0$ for all $t$. Under these assumptions, it follows [LP18, Proposition 6.9] that $P_{\lambda\otimes K}(N_{s}(\mathbb{R}_{+}\times\mathbb{R}^{d}))=1$.

Theorem 6.3.

For compound Poisson processes $X=(X_{t})_{t\in\mathbb{R}_{+}}$ and $Y=(Y_{t})_{t\in\mathbb{R}_{+}}$ with diffuse locally finite event intensity measures $\lambda$ and $\mu$, and increment probability kernels $K$ and $L$ such that $K_{t}(\{0\})=0$ and $L_{t}(\{0\})=0$, the Rényi divergence of order $\alpha>0$ is given by $R_{\alpha}(\mathcal{L}(X)\|\mathcal{L}(Y))=T_{\alpha}(\lambda\otimes K\|\mu\otimes L)$.

Proof.

Let us equip $F(\mathbb{R}_{+},\mathbb{R}^{d})$ with the sigma-algebra generated by the evaluation maps $X\mapsto X_{t}$. The aggregation map $\operatorname{agg}\colon N(\mathbb{R}_{+}\times\mathbb{R}^{d})\to F(\mathbb{R}_{+},\mathbb{R}^{d})$ restricted to $N_{s}(\mathbb{R}_{+}\times\mathbb{R}^{d})$ is bijective, with inverse map given by

$$(\operatorname{agg}^{-1}(X))(C) = \#\{(t,x)\in C\colon X_{t}-X_{t-}=x,\ x\neq 0\},$$

where $X_{t-}=\lim_{s\uparrow t}X_{s}$ for $t>0$ and $X_{t-}=X_{0}$ for $t=0$. Standard techniques (as in the proof of Theorem 5.1) imply that $\operatorname{agg}\colon N_{s}(\mathbb{R}_{+}\times\mathbb{R}^{d})\to F(\mathbb{R}_{+},\mathbb{R}^{d})$ is measurable with a measurable inverse. Because $P_{\lambda\otimes K}$ and $P_{\mu\otimes L}$ have all their mass supported on $N_{s}(\mathbb{R}_{+}\times\mathbb{R}^{d})$, it follows (Lemma A.4) that $R_{\alpha}(\mathcal{L}(X)\|\mathcal{L}(Y))=R_{\alpha}(P_{\lambda\otimes K}\|P_{\mu\otimes L})$. Theorem 5.1 then implies that $R_{\alpha}(\mathcal{L}(X)\|\mathcal{L}(Y))=T_{\alpha}(\lambda\otimes K\|\mu\otimes L)$. ∎

By combining Theorem 6.3 with Theorem 3.12, we may compute information divergences of compound Poisson processes. For example, the Rényi divergence of order $\alpha\notin\{0,1\}$ for compound Poisson processes $X=(X_{t})_{t\in\mathbb{R}_{+}}$ and $Y=(Y_{t})_{t\in\mathbb{R}_{+}}$ with event intensity measures $\lambda(dt)=f_{t}\,dt$ and $\mu(dt)=g_{t}\,dt$ and increment probability kernels $K$ and $L$ is given by

$$R_{\alpha}(\mathcal{L}(X)\|\mathcal{L}(Y)) = R_{\alpha}(P_{\lambda}\|P_{\mu})+\int_{0}^{\infty}T_{\alpha}(K_{t}\|L_{t})\,f_{t}^{\alpha}g_{t}^{1-\alpha}\,dt.$$

This formula demonstrates how the information content decomposes into two parts: $R_{\alpha}(P_{\lambda}\|P_{\mu})$, associated with only observing the jump instants of the compound Poisson processes, and the additional term $\int_{0}^{\infty}T_{\alpha}(K_{t}\|L_{t})\,f_{t}^{\alpha}g_{t}^{1-\alpha}\,dt$, characterising the information gain when we also observe the jump sizes.
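A numerical illustration of this decomposition, assuming constant event intensities $f_{t}=2$, $g_{t}=3$ on $[0,1]$ and time-homogeneous two-point increment kernels (all values illustrative):

```python
import numpy as np

alpha, T = 0.6, 1.0
f, g = 2.0, 3.0                 # constant event intensities on [0, T]
K = np.array([0.7, 0.3])        # increment kernel K_t (time-homogeneous)
L = np.array([0.4, 0.6])        # increment kernel L_t

# R_alpha(P_lambda||P_mu): Tsallis divergence of the jump-time intensities.
r_times = T * (alpha*f + (1 - alpha)*g - f**alpha * g**(1 - alpha)) / (1 - alpha)
# Additional information gain from observing the jump sizes.
t_marks = np.sum((alpha*K + (1 - alpha)*L - K**alpha * L**(1 - alpha))
                 / (1 - alpha))
gain = T * t_marks * f**alpha * g**(1 - alpha)
print(r_times + gain)   # R_alpha(L(X)||L(Y)) by the displayed formula
```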

7 Proofs

This section contains the proofs of the main results, with some of the technical parts postponed to the appendix.

7.1 Proof of Theorem 3.1

Lemma 7.1.

The Rényi divergence of Poisson distributions $p_{s}$ and $p_{t}$ with means $s,t\in\mathbb{R}_{+}$ is given by

$$R_{\alpha}(p_{s}\|p_{t}) = \begin{cases}1(s=0)\,t,&\alpha=0,\\ \frac{\alpha s+(1-\alpha)t-s^{\alpha}t^{1-\alpha}}{1-\alpha},&\alpha\notin\{0,1\},\\ s\log\frac{s}{t}+t-s,&\alpha=1.\end{cases} \tag{7.1}$$

Note that $R_{\alpha}(p_{s}\|p_{t})=\infty$ when $\alpha\geq 1$, $s>0$, and $t=0$, by our 0-division conventions.
Proof.

Assume first that $s,t>0$. By definition, the Rényi divergence of order $\alpha\notin\{0,1\}$ equals $R_{\alpha}(p_{s}\|p_{t})=\frac{1}{\alpha-1}\log Z_{\alpha}(p_{s}\|p_{t})$, where

$$Z_{\alpha}(p_{s}\|p_{t}) = \sum_{x=0}^{\infty}\Big(e^{-s}\frac{s^{x}}{x!}\Big)^{\alpha}\Big(e^{-t}\frac{t^{x}}{x!}\Big)^{1-\alpha} = \exp\Big(-\big(\alpha s+(1-\alpha)t-s^{\alpha}t^{1-\alpha}\big)\Big).$$

By taking logarithms, (7.1) follows for $\alpha\notin\{0,1\}$. We also note that $\log\frac{e^{-s}s^{x}/x!}{e^{-t}t^{x}/x!}=x\log\frac{s}{t}+t-s$, and taking expectations with respect to $p_{s}$ yields (7.1) for $\alpha=1$. Furthermore, when $s,t>0$, both $p_{s}$ and $p_{t}$ assign strictly positive probabilities to all nonnegative integers. Therefore, $R_{0}(p_{s}\|p_{t})=-\log p_{t}(\mathbb{Z}_{+})=0$, confirming (7.1) for $\alpha=0$.

We also note that the Poisson distribution with mean 0 equals the Dirac measure at 0. Therefore, a simple computation shows that for all $s,t>0$,

$$R_{\alpha}(p_{s}\|\delta_{0}) = \begin{cases}0,&\alpha=0,\\ \frac{\alpha}{1-\alpha}s,&\alpha\in(0,1),\\ \infty,&\alpha\geq 1,\end{cases}$$

and $R_{\alpha}(\delta_{0}\|p_{t})=t$ for all $\alpha\in[0,\infty)$. Finally, $R_{\alpha}(\delta_{0}\|\delta_{0})=0$ for all $\alpha\in[0,\infty)$. By recalling our conventions for dividing by 0, we may conclude that (7.1) holds for all $s,t,\alpha\in[0,\infty)$. ∎

Proof of Theorem 3.1.

Let λ,μ\lambda,\mu be sigma-finite measures admitting densities f,g:S+f,g\colon S\to\mathbb{R}_{+} with respect to a sigma-finite measure ν\nu on a measurable space (S,𝒮)(S,\mathcal{S}). Let pf(x)p_{f(x)} and pg(x)p_{g(x)} be Poisson distributions with means f(x)f(x) and g(x)g(x). By applying the Rényi divergence formula for Poisson distributions in Lemma 7.1, we find that

SRα(pf(x)pg(x))ν(dx)\displaystyle\int_{S}R_{\alpha}(p_{f(x)}\|p_{g(x)})\,\nu(dx)
={S1(f=0)g𝑑ν,α=0,Sαf+(1α)gfαg1α1α𝑑ν,α{0,1},S(flogfg+gf)𝑑ν,α=1.\displaystyle\quad\ =\ \begin{cases}\int_{S}1(f=0)g\,d\nu,&\quad\alpha=0,\\ \int_{S}\frac{\alpha f+(1-\alpha)g-f^{\alpha}g^{1-\alpha}}{1-\alpha}\,d\nu,&\quad\alpha\notin\{0,1\},\\ \int_{S}\left(f\log\frac{f}{g}+g-f\right)\,d\nu,&\quad\alpha=1.\end{cases}

By comparing this with the definition of the Tsallis divergence (3.1), we find that

Tα(λμ)=SRα(pf(x)pg(x))ν(dx).T_{\alpha}(\lambda\|\mu)\ =\ \int_{S}R_{\alpha}(p_{f(x)}\|p_{g(x)})\,\nu(dx). (7.2)

Because Rényi divergences of probability measures are nonnegative, (7.2) shows that Tα(λμ)T_{\alpha}(\lambda\|\mu) is a well-defined element in [0,][0,\infty] for all α+\alpha\in\mathbb{R}_{+}. We also know [vH14, Theorem 3] that Rényi divergences of probability measures are nondecreasing in α\alpha, so it also follows from (7.2) that Tα(λμ)T_{\alpha}(\lambda\|\mu) is nondecreasing in α\alpha.

Let us next verify that αTα(λμ)\alpha\mapsto T_{\alpha}(\lambda\|\mu) is continuous on A={α+:Tα(λμ)<}A=\{\alpha\in\mathbb{R}_{+}\colon T_{\alpha}(\lambda\|\mu)<\infty\}. To this end, assume that AA is nonempty and αnα\alpha_{n}\to\alpha for some αn,αA\alpha_{n},\alpha\in A. Because Tα(λμ)T_{\alpha}(\lambda\|\mu) is nondecreasing in α\alpha, we see that AA is an interval, and that there exists a number βA\beta\in A such that αn,αβ\alpha_{n},\alpha\leq\beta for all nn. Denote ra(x)=Ra(pf(x)pg(x))r_{a}(x)=R_{a}(p_{f(x)}\|p_{g(x)}) for a,x+a,x\in\mathbb{R}_{+}, and let U={xS:rβ(x)<}U=\{x\in S\colon r_{\beta}(x)<\infty\}. Because Tβ(λμ)=Srβ𝑑νT_{\beta}(\lambda\|\mu)=\int_{S}r_{\beta}\,d\nu is finite, it follows that ν(Uc)=0\nu(U^{c})=0. Therefore,

Tαn(λμ)=Srαn𝑑ν=Urαn𝑑ν.T_{\alpha_{n}}(\lambda\|\mu)\ =\ \int_{S}r_{\alpha_{n}}\,d\nu\ =\ \int_{U}r_{\alpha_{n}}\,d\nu.

For any xUx\in U, the monotonicity of Rényi divergences [vH14, Theorem 3] implies that rα(x),rαn(x)rβ(x)r_{\alpha}(x),r_{\alpha_{n}}(x)\leq r_{\beta}(x). In particular, rα(x),rαn(x)r_{\alpha}(x),r_{\alpha_{n}}(x) are finite for xUx\in U. The continuity of Rényi divergences [vH14, Theorem 7] then implies that rαnrαr_{\alpha_{n}}\to r_{\alpha} pointwise on UU. Lebesgue’s dominated convergence theorem then implies that

Tαn(λμ)=Urαn𝑑νUrα𝑑ν=Tα(λμ).T_{\alpha_{n}}(\lambda\|\mu)\ =\ \int_{U}r_{\alpha_{n}}\,d\nu\ \to\ \int_{U}r_{\alpha}\,d\nu\ =\ T_{\alpha}(\lambda\|\mu).

Finally, let us verify that the right side of (3.1) depends neither on the choice of the densities nor on the reference measure. Assume that $\lambda,\mu$ admit densities $f_{1},g_{1}\colon S\to\mathbb{R}_{+}$ with respect to a sigma-finite measure $\nu_{1}$, and densities $f_{2},g_{2}\colon S\to\mathbb{R}_{+}$ with respect to a sigma-finite measure $\nu_{2}$. Define $\nu=\nu_{1}+\nu_{2}$. Then $\nu_{1},\nu_{2}\ll\nu$ and $\nu$ is sigma-finite. The Radon–Nikodym theorem [Kal02, Theorem 2.10] implies that there exist densities $h_{1},h_{2}\colon S\to\mathbb{R}_{+}$ of $\nu_{1},\nu_{2}$ with respect to $\nu$. Then

λ(A)=Afi𝑑νi=Af~i𝑑νfor i=1,2,\lambda(A)\ =\ \int_{A}f_{i}\,d\nu_{i}\ =\ \int_{A}\tilde{f}_{i}\,d\nu\quad\text{for $i=1,2$},

where $\tilde{f}_{i}=f_{i}h_{i}$. We see that both $\tilde{f}_{1}$ and $\tilde{f}_{2}$ are densities of $\lambda$ with respect to $\nu$. The Radon–Nikodym theorem [Kal02, Theorem 2.10] implies that $\tilde{f}_{1}=\tilde{f}_{2}$ $\nu$-almost everywhere. Similarly, writing $\tilde{g}_{i}=g_{i}h_{i}$, we see that both $\tilde{g}_{1}$ and $\tilde{g}_{2}$ are densities of $\mu$ with respect to $\nu$, so that $\tilde{g}_{1}=\tilde{g}_{2}$ $\nu$-almost everywhere. Formula (7.1) shows that Rényi divergences of Poisson distributions are homogeneous in the sense that $R_{\alpha}(p_{cs}\|p_{ct})=cR_{\alpha}(p_{s}\|p_{t})$ for all $c\in\mathbb{R}_{+}$. As a consequence, we see that

SRα(pf~1(x)pg~1(x))ν(dx)\displaystyle\int_{S}R_{\alpha}(p_{\tilde{f}_{1}(x)}\|p_{\tilde{g}_{1}(x)})\,\nu(dx) =SRα(pf1(x)pg1(x))ν1(dx),\displaystyle\ =\ \int_{S}R_{\alpha}(p_{f_{1}(x)}\|p_{g_{1}(x)})\,\nu_{1}(dx),
SRα(pf~2(x)pg~2(x))ν(dx)\displaystyle\int_{S}R_{\alpha}(p_{\tilde{f}_{2}(x)}\|p_{\tilde{g}_{2}(x)})\,\nu(dx) =SRα(pf2(x)pg2(x))ν2(dx).\displaystyle\ =\ \int_{S}R_{\alpha}(p_{f_{2}(x)}\|p_{g_{2}(x)})\,\nu_{2}(dx).

Because f~1=f~2\tilde{f}_{1}=\tilde{f}_{2} and g~1=g~2\tilde{g}_{1}=\tilde{g}_{2} ν\nu-almost everywhere, it follows that all of the above integrals are equal to each other. In particular,

SRα(pf1(x)pg1(x))ν1(dx)=SRα(pf2(x)pg2(x))ν2(dx).\int_{S}R_{\alpha}(p_{f_{1}(x)}\|p_{g_{1}(x)})\,\nu_{1}(dx)\ =\ \int_{S}R_{\alpha}(p_{f_{2}(x)}\|p_{g_{2}(x)})\,\nu_{2}(dx).

In light of (7.2), we conclude that the value of $T_{\alpha}(\lambda\|\mu)$ as defined by formula (7.2) is the same for both triples $(f_{1},g_{1},\nu_{1})$ and $(f_{2},g_{2},\nu_{2})$. ∎
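The invariance just proved can also be observed numerically. The following Python sketch (assuming NumPy; the three-point ground space and all weights are illustrative) evaluates the integral in (3.1) with two different reference measures and correspondingly rescaled densities, and the two values coincide:

import numpy as np

lam = np.array([0.2, 1.5, 3.0])   # weights of the intensity measure lambda
mu  = np.array([0.8, 0.5, 2.0])   # weights of the intensity measure mu
alpha = 0.6

def tsallis(f, g, nu):
    # formula (3.1) on a finite space, for alpha not in {0, 1}
    integrand = (alpha * f + (1 - alpha) * g - f ** alpha * g ** (1 - alpha)) / (1 - alpha)
    return float(np.sum(nu * integrand))

nu1 = np.array([1.0, 1.0, 1.0])   # counting measure as reference
nu2 = np.array([2.0, 0.5, 4.0])   # a different reference measure
print(tsallis(lam / nu1, mu / nu1, nu1), tsallis(lam / nu2, mu / nu2, nu2))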

7.2 Proof of Theorem 3.12

Proof of Theorem 3.12.

Because λ(dt)=ftν(dt)\lambda(dt)=f_{t}\nu(dt) and Kt(dx)=kt(x)Mt(dx)K_{t}(dx)=k_{t}(x)M_{t}(dx), we see that

(λK)(C)\displaystyle(\lambda\otimes K)(C) =S1(S21C(t,x)Kt(dx))λ(dt)\displaystyle\ =\ \int_{S_{1}}\left(\int_{S_{2}}1_{C}(t,x)\,K_{t}(dx)\right)\lambda(dt)
=S1(S21C(t,x)kt(x)Mt(dx))ftν(dt)\displaystyle\ =\ \int_{S_{1}}\left(\int_{S_{2}}1_{C}(t,x)\,k_{t}(x)\,M_{t}(dx)\right)f_{t}\nu(dt)
=S1(S21C(t,x)ftkt(x)Mt(dx))ν(dt)\displaystyle\ =\ \int_{S_{1}}\left(\int_{S_{2}}1_{C}(t,x)\,f_{t}k_{t}(x)\,M_{t}(dx)\right)\nu(dt)
=Cftkt(x)(νM)(dt,dx)\displaystyle\ =\ \int_{C}f_{t}k_{t}(x)\,(\nu\otimes M)(dt,dx)

for all measurable $C\subset S_{1}\times S_{2}$, so that $(t,x)\mapsto f_{t}k_{t}(x)$ is a density of $\lambda\otimes K$ with respect to $\nu\otimes M$. Similarly, $(t,x)\mapsto g_{t}\ell_{t}(x)$ is a density of $\mu\otimes L$ with respect to $\nu\otimes M$.

By (3.1), the Tsallis divergence of order $\alpha\notin\{0,1\}$ is given by

Tα(λKμL)=S1(S2τα(t,x)Mt(dx))ν(dt),T_{\alpha}(\lambda\otimes K\|\mu\otimes L)\ =\ \int_{S_{1}}\left(\int_{S_{2}}\tau_{\alpha}(t,x)\,M_{t}(dx)\right)\,\nu(dt), (7.3)

where the integrand equals

τα(t,x)=αftkt(x)+(1α)gtt(x)ftαgt1αkt(x)αt(x)1α1α.\tau_{\alpha}(t,x)\ =\ \frac{\alpha f_{t}k_{t}(x)+(1-\alpha)g_{t}\ell_{t}(x)-f_{t}^{\alpha}g_{t}^{1-\alpha}k_{t}(x)^{\alpha}\ell_{t}(x)^{1-\alpha}}{1-\alpha}.

The integrand may also be written as

τα(t,x)\displaystyle\tau_{\alpha}(t,x) =αftkt(x)+(1α)gtt(x)1α\displaystyle\ =\ \frac{\alpha f_{t}k_{t}(x)+(1-\alpha)g_{t}\ell_{t}(x)}{1-\alpha}
(αkt(x)+(1α)t(x)1α)ftαgt1α\displaystyle\qquad-\left(\frac{\alpha k_{t}(x)+(1-\alpha)\ell_{t}(x)}{1-\alpha}\right)f_{t}^{\alpha}g_{t}^{1-\alpha}
+(αkt(x)+(1α)t(x)kt(x)αt(x)1α1α)ftαgt1α.\displaystyle\qquad+\left(\frac{\alpha k_{t}(x)+(1-\alpha)\ell_{t}(x)-k_{t}(x)^{\alpha}\ell_{t}(x)^{1-\alpha}}{1-\alpha}\right)f_{t}^{\alpha}g_{t}^{1-\alpha}.

We note that S2kt(x)Mt(dx)=1\int_{S_{2}}k_{t}(x)\,M_{t}(dx)=1 and S2t(x)Mt(dx)=1\int_{S_{2}}\ell_{t}(x)\,M_{t}(dx)=1, and that

S2(αkt(x)+(1α)t(x)kt(x)αt(x)1α1α)Mt(dx)=Tα(KtLt).\int_{S_{2}}\left(\frac{\alpha k_{t}(x)+(1-\alpha)\ell_{t}(x)-k_{t}(x)^{\alpha}\ell_{t}(x)^{1-\alpha}}{1-\alpha}\right)M_{t}(dx)\ =\ T_{\alpha}(K_{t}\|L_{t}).

It follows that the inner integral in (7.3) equals

S2τα(t,x)Mt(dx)\displaystyle\int_{S_{2}}\tau_{\alpha}(t,x)\,M_{t}(dx) =αft+(1α)gtftαgt1α1α+Tα(KtLt)ftαgt1α.\displaystyle\ =\ \frac{\alpha f_{t}+(1-\alpha)g_{t}-f_{t}^{\alpha}g_{t}^{1-\alpha}}{1-\alpha}+T_{\alpha}(K_{t}\|L_{t})\,f_{t}^{\alpha}g_{t}^{1-\alpha}.

By integrating both sides against ν(dt)\nu(dt), we obtain (3.8) for α{0,1}\alpha\notin\{0,1\}.

Let us now consider the case with α=1\alpha=1. In this case Tα(λKμL)T_{\alpha}(\lambda\otimes K\|\mu\otimes L) is again given by (7.3), but now we replace τα\tau_{\alpha} by

τ1(t,x)\displaystyle\tau_{1}(t,x) =ftkt(x)logftkt(x)gtt(x)+gtt(x)ftkt(x)\displaystyle\ =\ f_{t}k_{t}(x)\log\frac{f_{t}k_{t}(x)}{g_{t}\ell_{t}(x)}+g_{t}\ell_{t}(x)-f_{t}k_{t}(x)
=ftkt(x)logftgt+ftkt(x)logkt(x)t(x)+gtt(x)ftkt(x).\displaystyle\ =\ f_{t}k_{t}(x)\log\frac{f_{t}}{g_{t}}+f_{t}k_{t}(x)\log\frac{k_{t}(x)}{\ell_{t}(x)}+g_{t}\ell_{t}(x)-f_{t}k_{t}(x).

By integrating the above equation against Mt(dx)M_{t}(dx), we find that

S2τ1(t,x)Mt(dx)\displaystyle\int_{S_{2}}\tau_{1}(t,x)\,M_{t}(dx) =ftlogftgt+gtft+ftT1(KtLt).\displaystyle\ =\ f_{t}\log\frac{f_{t}}{g_{t}}+g_{t}-f_{t}+f_{t}T_{1}(K_{t}\|L_{t}).

By further integrating this against ν(dt)\nu(dt), it follows that

S1(S2τ1(t,x)Mt(dx))ν(dt)=T1(λμ)+S1T1(KtLt)λ(dt),\displaystyle\int_{S_{1}}\left(\int_{S_{2}}\tau_{1}(t,x)\,M_{t}(dx)\right)\nu(dt)\ =\ T_{1}(\lambda\|\mu)+\int_{S_{1}}T_{1}(K_{t}\|L_{t})\,\lambda(dt),

from which we conclude the validity of (3.8) for α=1\alpha=1.

Finally, for α=0\alpha=0 we note that

{(t,x):ftkt(x)=0}\displaystyle\{(t,x)\colon f_{t}k_{t}(x)=0\}
=({f=0}×S2){(t,x):ft0,kt(x)=0}.\displaystyle\ =\ (\{f=0\}\times S_{2})\cup\{(t,x)\colon f_{t}\neq 0,\,k_{t}(x)=0\}.

Hence by (3.1),

T0(λKμL)\displaystyle T_{0}(\lambda\otimes K\|\mu\otimes L) =(μL){(t,x):ftkt(x)=0}\displaystyle\ =\ (\mu\otimes L)\{(t,x)\colon f_{t}k_{t}(x)=0\}
=μ{f=0}+f0Lt{kt=0}μ(dt)\displaystyle\ =\ \mu\{f=0\}+\int_{f\neq 0}L_{t}\{k_{t}=0\}\,\mu(dt)
=\ T_{0}(\lambda\|\mu)+\int_{f\neq 0}T_{0}(K_{t}\|L_{t})\,\mu(dt). ∎

7.3 Proof of Theorem 4.1

The following result shows that the embeddings $(x_{1},\dots,x_{n})\mapsto\sum_{i=1}^{n}\delta_{x_{i}}$ are measurable even when singleton sets in $(S,\mathcal{S})$ might be nonmeasurable. This is why there is no need to deal with ‘chunks’ as in [Bro71], which yields a conceptually simplified proof of Theorem 4.1.

Lemma 7.2.

For any $n\geq 1$, the function $\iota_{n}\colon S^{n}\to N(S)$ defined by $\iota_{n}(x_{1},\dots,x_{n})=\sum_{i=1}^{n}\delta_{x_{i}}$ is measurable.

Proof.

Fix a set A𝒮A\in\mathcal{S} and an integer k0k\geq 0. Let C={η:η(A)=k}C=\{\eta\colon\eta(A)=k\}. Then the set

ιn1(C)\displaystyle\iota_{n}^{-1}(C) ={(x1,,xn)Sn:i=1nδxi(A)=k}\displaystyle\ =\ \Big{\{}(x_{1},\dots,x_{n})\in S^{n}\colon\sum_{i=1}^{n}\delta_{x_{i}}(A)=k\Big{\}}

consists of the $n$-tuples in $S^{n}$ for which exactly $k$ coordinates belong to $A$ and the remaining coordinates belong to $A^{c}$. The set of all such $n$-tuples can be written as

ιn1(C)=b{0,1}n:bi=k1A1({b1})××1A1({bn}),\iota_{n}^{-1}(C)\ =\ \bigcup_{b\in\{0,1\}^{n}:\sum b_{i}=k}1_{A}^{-1}(\{b_{1}\})\times\cdots\times 1_{A}^{-1}(\{b_{n}\}),

where $1_{A}^{-1}(\{b_{k}\})$ denotes the preimage of $\{b_{k}\}$ under the indicator function $1_{A}\colon S\to\{0,1\}$, and equals $A$ for $b_{k}=1$ and $A^{c}$ for $b_{k}=0$. Because all sets appearing in the finite union on the right are products of $A$ and $A^{c}$, we conclude that $\iota_{n}^{-1}(C)\in\mathcal{S}^{\otimes n}$. Because $\mathcal{N}(S)$ is generated by sets of this form, it follows that $\iota_{n}^{-1}(C)\in\mathcal{S}^{\otimes n}$ for all $C\in\mathcal{N}(S)$. ∎

Proof of Theorem 4.1.

Assume that Pλ,PμP_{\lambda},P_{\mu} are Poisson PP distributions with finite intensity measures λ,μ\lambda,\mu such that λμ\lambda\ll\mu. To rule out trivialities, we assume that λ,μ\lambda,\mu are nonzero. The Poisson PP distributions may [LP18, Proposition 3.5] then be represented as

Pλ=n0eλ(S)λ(S)nn!λ1nιn1,Pμ=n0eμ(S)μ(S)nn!μ1nιn1,P_{\lambda}\ =\ \sum_{n\geq 0}e^{-\lambda(S)}\frac{\lambda(S)^{n}}{n!}\,\lambda_{1}^{\otimes n}\!\circ\!\iota_{n}^{-1},\qquad P_{\mu}\ =\ \sum_{n\geq 0}e^{-\mu(S)}\frac{\mu(S)^{n}}{n!}\,\mu_{1}^{\otimes n}\!\circ\!\iota_{n}^{-1}, (7.4)

where λ1n,μ1n\lambda_{1}^{\otimes n},\mu_{1}^{\otimes n} are nn-fold products of probability measures λ1=λ/λ(S)\lambda_{1}=\lambda/\lambda(S) and μ1=μ/μ(S)\mu_{1}=\mu/\mu(S) on SS, and ιn(x1,,xn)=i=1nδxi\iota_{n}(x_{1},\dots,x_{n})=\sum_{i=1}^{n}\delta_{x_{i}}. (Lemma 7.2 guarantees that the maps ιn:SnN(S)\iota_{n}\colon S^{n}\to N(S) are measurable, and therefore Pλ,PμP_{\lambda},P_{\mu} are well-defined probability measures on N(S)N(S).)

Recall that ϕ:S+\phi\colon S\to\mathbb{R}_{+} is a density of λ\lambda with respect to μ\mu, and consider the function Φ:N(S)+\Phi\colon N(S)\to\mathbb{R}_{+} such that

Φ(η)=eμ(S)λ(S)i=1nϕ(xi)forη=i=1nδxi,\Phi(\eta)=e^{\mu(S)-\lambda(S)}\prod_{i=1}^{n}\phi(x_{i})\qquad\text{for}\quad\eta=\sum_{i=1}^{n}\delta_{x_{i}}, (7.5)

and Φ(η)=0\Phi(\eta)=0 for η(S)=\eta(S)=\infty. We will show that Φ\Phi is a density of PλP_{\lambda} with respect to PμP_{\mu}. To do this, fix a measurable set CN(S)C\subset N(S), and note that

CΦ(η)Pμ(dη)\displaystyle\int_{C}\Phi(\eta)\,P_{\mu}(d\eta) =n0eμ(S)μ(S)nn!CΦ(η)μ1nιn1(dη)\displaystyle\ =\ \sum_{n\geq 0}e^{-\mu(S)}\frac{\mu(S)^{n}}{n!}\int_{C}\Phi(\eta)\,\mu_{1}^{\otimes n}\!\circ\!\iota_{n}^{-1}(d\eta)
=n0eμ(S)μ(S)nn!ιn1(C)Φ(ιn(x))μ1n(dx)\displaystyle\ =\ \sum_{n\geq 0}e^{-\mu(S)}\frac{\mu(S)^{n}}{n!}\int_{\iota_{n}^{-1}(C)}\Phi(\iota_{n}(x))\,\mu_{1}^{\otimes n}(dx)
=n0eλ(S)μ(S)nn!ιn1(C)ϕn(x)μ1n(dx),\displaystyle\ =\ \sum_{n\geq 0}e^{-\lambda(S)}\frac{\mu(S)^{n}}{n!}\int_{\iota_{n}^{-1}(C)}\phi^{\otimes n}(x)\,\mu_{1}^{\otimes n}(dx),

where ϕn(x1,,xn)=i=1nϕ(xi)\phi^{\otimes n}(x_{1},\dots,x_{n})=\prod_{i=1}^{n}\phi(x_{i}). We also note that μ(S)λ(S)ϕ\frac{\mu(S)}{\lambda(S)}\phi is a density of λ1\lambda_{1} with respect to μ1\mu_{1}, and therefore, (μ(S)λ(S))nϕn(\frac{\mu(S)}{\lambda(S)})^{n}\phi^{\otimes n} is a density of λ1n\lambda_{1}^{\otimes n} with respect to μ1n\mu_{1}^{\otimes n}. Then

λ1n(ιn1(C))=ιn1(C)(μ(S)λ(S))nϕn(x)μ1n(dx),\lambda_{1}^{\otimes n}(\iota_{n}^{-1}(C))\ =\ \int_{\iota_{n}^{-1}(C)}\left(\frac{\mu(S)}{\lambda(S)}\right)^{n}\phi^{\otimes n}(x)\,\mu_{1}^{\otimes n}(dx),

and it follows that

CΦ(η)Pμ(dη)\displaystyle\int_{C}\Phi(\eta)\,P_{\mu}(d\eta) =n0eλ(S)μ(S)nn!(λ(S)μ(S))nλ1n(ιn1(C))\displaystyle\ =\ \sum_{n\geq 0}e^{-\lambda(S)}\frac{\mu(S)^{n}}{n!}\left(\frac{\lambda(S)}{\mu(S)}\right)^{n}\lambda_{1}^{\otimes n}(\iota_{n}^{-1}(C))
=n0eλ(S)λ(S)nn!λ1n(ιn1(C)).\displaystyle\ =\ \sum_{n\geq 0}e^{-\lambda(S)}\frac{\lambda(S)^{n}}{n!}\lambda_{1}^{\otimes n}(\iota_{n}^{-1}(C)).

In light of (7.4), we conclude that CΦ(η)Pμ(dη)=Pλ(C)\int_{C}\Phi(\eta)\,P_{\mu}(d\eta)=P_{\lambda}(C). Hence Φ\Phi is a density of PλP_{\lambda} with respect to PμP_{\mu}. In particular, PλPμP_{\lambda}\ll P_{\mu}.

Finally, let us verify that the function Φ\Phi can be written in form (4.1). By definition (7.5), we see that Φ(η)=0\Phi(\eta)=0 whenever η(S)=\eta(S)=\infty or η{ϕ=0}>0\eta\{\phi=0\}>0. Hence Φ\Phi vanishes outside the set MΩM\cap\Omega where M={ηN(S):η{ϕ=0}=0}M=\{\eta\in N(S)\colon\eta\{\phi=0\}=0\} and Ω={ηN(S):η(S)<}\Omega=\{\eta\in N(S)\colon\eta(S)<\infty\}. On the other hand, i=1nϕ(xi)=exp(Slogϕdη)\prod_{i=1}^{n}\phi(x_{i})=\exp(\int_{S}\log\phi\,d\eta) for every point pattern in MΩM\cap\Omega of form η=i=1nδxi\eta=\sum_{i=1}^{n}\delta_{x_{i}}. By noting that μ(S)λ(S)=S(1ϕ)𝑑μ\mu(S)-\lambda(S)=\int_{S}(1-\phi)\,d\mu, we see that Φ\Phi may be written as in (4.1), but with 1M1_{M} replaced by 1MΩ1_{M\cap\Omega}. Because Campbell’s theorem implies that Pμ(Ω)=1P_{\mu}(\Omega)=1, we see that Φ\Phi is equal to the function in (4.1) as an element of L1(N(S),𝒩(S),Pμ)L_{1}(N(S),\mathcal{N}(S),P_{\mu}). ∎
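The density (7.5) lends itself to a Monte Carlo check: weighting samples from $P_{\mu}$ by $\Phi$ should reproduce expectations under $P_{\lambda}$, for instance the Laplace functional $\int e^{-\eta(u)}\,P_{\lambda}(d\eta)=\exp(-\int_{S}(1-e^{-u})\,d\lambda)$. The Python sketch below (assuming NumPy) does this for the illustrative choices $f(x)=2+\sin(2\pi x)$, $g\equiv 2$, and $u(x)=3x$ on $S=[0,1]$:

import numpy as np

rng = np.random.default_rng(1)

def trapz(y, x):
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

f = lambda x: 2.0 + np.sin(2.0 * np.pi * x)   # density of lambda w.r.t. Lebesgue
g = 2.0                                        # constant density of mu
u = lambda x: 3.0 * x                          # test function

xg = np.linspace(0.0, 1.0, 20001)
lam_S, mu_S = trapz(f(xg), xg), g

acc, n_rep = 0.0, 100_000
for _ in range(n_rep):
    pts = rng.random(rng.poisson(mu_S))               # one sample from P_mu
    Phi = np.exp(mu_S - lam_S) * np.prod(f(pts) / g)  # density (7.5)
    acc += Phi * np.exp(-np.sum(u(pts)))

exact = np.exp(-trapz((1.0 - np.exp(-u(xg))) * f(xg), xg))
print(acc / n_rep, exact)   # both approximate the Laplace functional of P_lambda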

7.4 Proof of Theorem 4.4

We start by proving the following simple upper bound that confirms that λ\lambda is finite whenever λμ\lambda\ll\mu and H(λ,μ)<H(\lambda,\mu)<\infty for some finite measure μ\mu.

Lemma 7.3.

Assume that $\lambda\ll\mu$ and that $\phi\colon S\to\mathbb{R}_{+}$ is a density of $\lambda$ with respect to $\mu$. Then $\lambda(B)\leq 4\mu(B)+3\int_{B}(\sqrt{\phi}-1)^{2}\,d\mu$ for any measurable $B\subset S$. In particular, $\lambda(S)\leq 4\mu(S)+6H(\lambda,\mu)^{2}$.

Proof.

By writing ϕ=1+(ϕ1)1+(ϕ1)+\phi=1+(\phi-1)\leq 1+(\phi-1)_{+}, we see that λ(B)=Bϕ𝑑μ\lambda(B)=\int_{B}\phi\,d\mu is bounded by

λ(B)μ(B)+B(ϕ1)+𝑑μ.\lambda(B)\ \leq\ \mu(B)+\int_{B}(\phi-1)_{+}\,d\mu. (7.6)

We also note that tt1t+1t\mapsto\frac{t-1}{t+1} is increasing on +\mathbb{R}_{+}, so that

(ϕ1)2=ϕ1ϕ+1(ϕ1)13(ϕ1)for ϕ>4.(\sqrt{\phi}-1)^{2}\ =\ \frac{\sqrt{\phi}-1}{\sqrt{\phi}+1}(\phi-1)\ \geq\ \frac{1}{3}(\phi-1)\qquad\text{for $\phi>4$}.

Because (ϕ1)+3(\phi-1)_{+}\leq 3 for ϕ4\phi\leq 4, it follows that

B(ϕ1)+𝑑μ\displaystyle\int\displaylimits_{B}(\phi-1)_{+}\,d\mu =B{ϕ4}(ϕ1)+𝑑μ+B{ϕ>4}(ϕ1)+𝑑μ\displaystyle\ =\ \int\displaylimits_{B\cap\{\phi\leq 4\}}(\phi-1)_{+}\,d\mu\ +\int\displaylimits_{B\cap\{\phi>4\}}(\phi-1)_{+}\,d\mu
3μ(B)+3B(ϕ1)2𝑑μ.\displaystyle\ \leq\ 3\mu(B)+3\int\displaylimits_{B}(\sqrt{\phi}-1)^{2}\,d\mu.

The first claim follows by combining this with (7.6). The second claim follows by noting that S(ϕ1)2𝑑μ=2H(λ,μ)2\int_{S}(\sqrt{\phi}-1)^{2}\,d\mu=2H(\lambda,\mu)^{2} due to Proposition 3.10. ∎
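A quick numerical stress test of the bound, assuming the identity $2H(\lambda,\mu)^{2}=\int_{S}(\sqrt{\phi}-1)^{2}\,d\mu$ from Proposition 3.10 and randomly generated finite examples (Python, assuming NumPy):

import numpy as np

rng = np.random.default_rng(7)
for _ in range(1000):
    n = int(rng.integers(1, 20))
    mu = rng.gamma(1.0, 1.0, size=n)     # weights of a finite measure mu
    phi = rng.gamma(0.5, 4.0, size=n)    # density of lambda w.r.t. mu
    lam_S = np.sum(phi * mu)
    H2 = 0.5 * np.sum((np.sqrt(phi) - 1.0) ** 2 * mu)   # H(lambda, mu)^2
    assert lam_S <= 4.0 * np.sum(mu) + 6.0 * H2 + 1e-9
print('bound holds on all sampled examples')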

Proof of Theorem 4.4.

Consider sigma-finite intensity measures such that λμ\lambda\ll\mu and H(λ,μ)<H(\lambda,\mu)<\infty. Fix a density ϕ=dλdμ\phi=\frac{d\lambda}{d\mu}, and a sequence SnSS_{n}\uparrow S such that μ(Sn)<\mu(S_{n})<\infty for all nn.

(i) Computing a truncated density. Let $P_{\lambda_{n}}$ and $P_{\mu_{n}}$ be Poisson PP distributions with truncated intensity measures $\lambda_{n}(B)=\lambda(B\cap S_{n})$ and $\mu_{n}(B)=\mu(B\cap S_{n})$ on $(S,\mathcal{S})$. We also employ the same notation $\eta_{n}(B)=\eta(B\cap S_{n})$ for truncations of point patterns $\eta\in N(S)$. We note that $\lambda_{n}\ll\mu_{n}$, and that $\phi$ serves also as a density of $\lambda_{n}$ with respect to $\mu_{n}$. Our choice of $S_{n}$ implies that $\mu_{n}$ is a finite measure. We also note that $\lambda_{n}$ is finite because $\lambda_{n}(S)\leq 4\mu_{n}(S)+3\int_{S}(\sqrt{\phi}-1)^{2}\,d\mu_{n}\leq 4\mu(S_{n})+6H^{2}(\lambda,\mu)$ due to Lemma 7.3. Theorem 4.1 now implies that $P_{\lambda_{n}}\ll P_{\mu_{n}}$, with a likelihood ratio given by

Φn(η)= 1M(η)exp(S(1ϕ)𝑑μn+Slogϕdη),\Phi_{n}(\eta)\ =\ 1_{M}(\eta)\,\exp\bigg{(}\int_{S}(1-\phi)\,d\mu_{n}+\int_{S}\log\phi\,d\eta\bigg{)}, (7.7)

where

M={ηN(S):η{ϕ=0}=0}.M\ =\ \{\eta\in N(S)\colon\eta\{\phi=0\}=0\}. (7.8)

(ii) Approximate Laplace functional. Fix a measurable function u:S+u\colon S\to\mathbb{R}_{+}. Monotone convergence of integrals then implies that ηn(u)=η(u1Sn)η(u)\eta_{n}(u)=\eta(u1_{S_{n}})\uparrow\eta(u) for all ηN(S)\eta\in N(S). Lebesgue’s dominated convergence theorem then implies that

N(S)eη(u)Pλ(dη)=limnN(S)eηn(u)Pλ(dη).\int_{N(S)}e^{-\eta(u)}P_{\lambda}(d\eta)\ =\ \lim_{n\to\infty}\int_{N(S)}e^{-\eta_{n}(u)}P_{\lambda}(d\eta). (7.9)

By [LP18, Theorem 5.2],

N(S)eηn(u)Pλ(dη)\displaystyle\int_{N(S)}e^{-\eta_{n}(u)}P_{\lambda}(d\eta) =N(S)eη(u)Pλn(dη)\displaystyle\ =\ \int_{N(S)}e^{-\eta(u)}P_{\lambda_{n}}(d\eta)
=N(S)eη(u)Φn(η)Pμn(dη)\displaystyle\ =\ \int_{N(S)}e^{-\eta(u)}\Phi_{n}(\eta)P_{\mu_{n}}(d\eta)
=N(S)eηn(u)Φn(ηn)Pμ(dη).\displaystyle\ =\ \int_{N(S)}e^{-\eta_{n}(u)}\Phi_{n}(\eta_{n})P_{\mu}(d\eta).

Together with (7.9), we conclude that

N(S)eη(u)Pλ(dη)=limnN(S)eηn(u)Φn(ηn)Pμ(dη).\int_{N(S)}e^{-\eta(u)}\,P_{\lambda}(d\eta)\ =\ \lim_{n\to\infty}\int_{N(S)}e^{-\eta_{n}(u)}\Phi_{n}(\eta_{n})\,P_{\mu}(d\eta). (7.10)

(iii) Identifying the limiting density. Let us next identify the limit of Φn(ηn)\Phi_{n}(\eta_{n}) as nn\to\infty. In light of (7.7), we see that

Φn(ηn)= 1M(ηn)exp(Sn(1ϕ)𝑑μ+Snlogϕdη).\Phi_{n}(\eta_{n})\ =\ 1_{M}(\eta_{n})\exp\bigg{(}\int_{S_{n}}(1-\phi)\,d\mu+\int_{S_{n}}\log\phi\,d\eta\bigg{)}. (7.11)

Even though $S_{n}\uparrow S$, the integrals on the right side above need not converge separately, because $\int_{S}(1-\phi)\,d\mu$ and $\int_{S}\log\phi\,d\eta$ are not necessarily well defined. Also, the compensated integral $\int_{S}\log\phi\,d(\eta-\mu)$ might diverge. A key observation (proved below) is that the compensated integral $\int_{A}\log\phi\,d(\eta-\mu)$ converges for $P_{\mu}$-almost every $\eta$, where

A={xS:|logϕ(x)|1}.A\ =\ \{x\in S\colon{\lvert\log\phi(x)\rvert}\leq 1\}.

With this target in mind, we will reorganise the integral terms of (7.11) according to

Sn(1ϕ)𝑑μ+Snlogϕdη=Wn(η)+Zn(η)+wn+zn,\int_{S_{n}}(1-\phi)\,d\mu+\int_{S_{n}}\log\phi\,d\eta\ =\ W_{n}(\eta)+Z_{n}(\eta)+w_{n}+z_{n}, (7.12)

where

Wn(η)=ASnlogϕdηASnlogϕdμ,Zn(η)=AcSnlogϕdη,wn=ASn(logϕ+1ϕ)𝑑μ,zn=AcSn(1ϕ)𝑑μ.\begin{aligned} W_{n}(\eta)&=\int_{A\cap S_{n}}\log\phi\,d\eta-\int_{A\cap S_{n}}\log\phi\,d\mu,\\ Z_{n}(\eta)&=\int_{A^{c}\cap S_{n}}\log\phi\,d\eta,\end{aligned}\qquad\begin{aligned} w_{n}&=\int_{A\cap S_{n}}(\log\phi+1-\phi)\,d\mu,\\ z_{n}&=\int_{A^{c}\cap S_{n}}(1-\phi)\,d\mu.\end{aligned}

We will show that all terms on the right side of (7.12) converge for all $\eta\in M\cap\Omega$, where $M$ is defined by (7.8) and $\Omega=\Omega_{1}\cap\Omega_{2}$, with

Ω1={ηN(S):Ac{ϕ>0}|logϕ|𝑑η<,η(Ac)<}\Omega_{1}\ =\ \left\{\eta\in N(S)\colon\int_{A^{c}\cap\{\phi>0\}}{\lvert\log\phi\rvert}\,d\eta<\infty,\ \eta(A^{c})<\infty\right\}

and

Ω2={ηN(S):Alogϕd(ημ) converges,η(Sn)<for all n}.\Omega_{2}\ =\ \left\{\eta\in N(S)\colon\text{$\int_{A}\log\phi\,d(\eta-\mu)$ converges},\ \eta(S_{n})<\infty\ \text{for all $n$}\right\}.

First, the functions 1AcSnlogϕ1_{A^{c}\cap S_{n}}\log\phi are dominated in absolute value by 1Ac|logϕ|1_{A^{c}}{\lvert\log\phi\rvert} for all nn. The dominating function is integrable with respect to any ηMΩ\eta\in M\cap\Omega by the definition of Ω1\Omega_{1}. Lebesgue’s dominated convergence theorem then implies that

Zn(η)AclogϕdηZ_{n}(\eta)\to\int_{A^{c}}\log\phi\,d\eta (7.13)

for every ηMΩ1\eta\in M\cap\Omega_{1}. The definition of Ω2\Omega_{2} in turn implies that

Wn(η)Alogϕd(ημ)W_{n}(\eta)\to\int_{A}\log\phi\,d(\eta-\mu) (7.14)

for all ηΩ2\eta\in\Omega_{2}. We also note that the functions associated with the definitions of znz_{n} and wnw_{n} converge pointwise according to

(1ϕ)1AcSn\displaystyle(1-\phi)1_{A^{c}\cap S_{n}} (1ϕ)1Ac,\displaystyle\to(1-\phi)1_{A^{c}},
(logϕ+1ϕ)1ASn\displaystyle(\log\phi+1-\phi)1_{A\cap S_{n}} (logϕ+1ϕ)1A.\displaystyle\to(\log\phi+1-\phi)1_{A}.

By (C.6), ϕ+1e+1(e1/21)2(ϕ1)2\phi+1\ \leq\ \frac{e+1}{(e^{1/2}-1)^{2}}(\sqrt{\phi}-1)^{2} on AcA^{c}, so that the functions (1ϕ)1AcSn(1-\phi)1_{A^{c}\cap S_{n}} are dominated in absolute value by e+1(e1/21)2(ϕ1)2\frac{e+1}{(e^{1/2}-1)^{2}}(\sqrt{\phi}-1)^{2}. Similarly, by (C.7), the functions (logϕ+1ϕ)1ASn(\log\phi+1-\phi)1_{A\cap S_{n}} are dominated in absolute value by 2e3(ϕ1)22e^{3}(\sqrt{\phi}-1)^{2}. Both dominating functions are integrable due to S(ϕ1)2𝑑μ=2H2(λ,μ)<\int_{S}(\sqrt{\phi}-1)^{2}\,d\mu=2H^{2}(\lambda,\mu)<\infty (recall Proposition 3.10). Therefore, by dominated convergence, we see that

zn\displaystyle z_{n} Ac(1ϕ)𝑑μ,\displaystyle\ \to\ \int_{A^{c}}(1-\phi)\,d\mu, (7.15)
wn\displaystyle w_{n} A(logϕ+1ϕ)𝑑μ.\displaystyle\ \to\ \int_{A}(\log\phi+1-\phi)\,d\mu. (7.16)

By plugging (7.13)–(7.16) into (7.12), we find that for all ηMΩ\eta\in M\cap\Omega,

limn(Sn(1ϕ)𝑑μ+Snlogϕdη)=(η)\lim_{n\to\infty}\left(\int_{S_{n}}(1-\phi)\,d\mu+\int_{S_{n}}\log\phi\,d\eta\right)\ =\ \ell(\eta)

where

(η)\displaystyle\ell(\eta) =Aclogϕdη+Alogϕd(ημ)\displaystyle\ =\ \int_{A^{c}}\log\phi\,d\eta+\int_{A}\log\phi\,d(\eta-\mu) (7.17)
+Ac(1ϕ)𝑑μ+A(logϕ+1ϕ)𝑑μ.\displaystyle\qquad+\int_{A^{c}}(1-\phi)\,d\mu+\int_{A}(\log\phi+1-\phi)\,d\mu.

By noting that ηnM\eta_{n}\in M whenever ηMΩ\eta\in M\cap\Omega, we see in light of (7.11) that

limnΦn(ηn)=e(η)\lim_{n\to\infty}\Phi_{n}(\eta_{n})\ =\ e^{\ell(\eta)}

for all $\eta\in M\cap\Omega$. Furthermore, for any $\eta\in M^{c}\cap\Omega$, we note that $\eta\{\phi=0\}\geq 1$, and the fact that $\{\phi=0\}\cap S_{n}\uparrow\{\phi=0\}$ then implies that $\eta(\{\phi=0\}\cap S_{n})\geq 1$, and hence $1_{M}(\eta_{n})=0$, for all sufficiently large $n$. We also note that $\eta_{n}(u)=\eta(u1_{S_{n}})\uparrow\eta(u)$ by monotone convergence of integrals. By denoting $\Phi(\eta)=1_{M\cap\Omega}(\eta)e^{\ell(\eta)}$, we see that

limn1Ω(η)eηn(u)Φn(ηn)=eη(u)Φ(η)for all η.\lim_{n\to\infty}1_{\Omega}(\eta)e^{-\eta_{n}(u)}\Phi_{n}(\eta_{n})\ =\ e^{-\eta(u)}\Phi(\eta)\qquad\text{for all $\eta$}. (7.18)

(iv) Exchanging the limit and integral. Let us justify that we may interchange the limit and the integral in (7.10). We know by Lemma B.3 that Pμ(Ω)=1P_{\mu}(\Omega)=1. Therefore, (7.10) can be written as

N(S)eη(u)Pλ(dη)=limnN(S)1Ω(η)eηn(u)Φn(ηn)Pμ(dη).\int_{N(S)}e^{-\eta(u)}\,P_{\lambda}(d\eta)\ =\ \lim_{n\to\infty}\int_{N(S)}1_{\Omega}(\eta)e^{-\eta_{n}(u)}\Phi_{n}(\eta_{n})\,P_{\mu}(d\eta). (7.19)

We wish to take the limit inside the integral in (7.19). To justify this, we note that the functions $f_{n}=1_{\Omega}(\eta)e^{-\eta_{n}(u)}\Phi_{n}(\eta_{n})$ satisfy $0\leq f_{n}\leq g_{n}$, where $g_{n}=\Phi_{n}(\eta_{n})$. We also note [LP18, Theorem 5.2] that

N(S)gn𝑑Pμ=N(S)Φn(η)Pμn(dη)=N(S)Pλn(dη)= 1,\int_{N(S)}g_{n}\,dP_{\mu}\ =\ \int_{N(S)}\Phi_{n}(\eta)\,P_{\mu_{n}}(d\eta)\ =\ \int_{N(S)}P_{\lambda_{n}}(d\eta)\ =\ 1,

because $\Phi_{n}=\frac{dP_{\lambda_{n}}}{dP_{\mu_{n}}}$. In particular, ${\lvert f_{n}\rvert}\leq g_{n}$ for all $n$, and $\sup_{n}\int_{N(S)}g_{n}\,dP_{\mu}<\infty$. A modified version of Lebesgue’s dominated convergence theorem (Lemma A.3) then justifies exchanging the limit and integral on the right side of (7.19), and plugging in the limit of (7.18) shows that for all measurable $u\colon S\to\mathbb{R}_{+}$,

N(S)eη(u)Pλ(dη)=N(S)eη(u)Φ(η)Pμ(dη).\int_{N(S)}e^{-\eta(u)}\,P_{\lambda}(d\eta)\ =\ \int_{N(S)}e^{-\eta(u)}\,\Phi(\eta)P_{\mu}(d\eta). (7.20)

(v) Conclusion. Finally, we note that the formula Q(dη)=Φ(η)Pμ(dη)Q(d\eta)=\Phi(\eta)P_{\mu}(d\eta) defines a measure on (N(S),𝒩(S))(N(S),\mathcal{N}(S)). By applying (7.20) with u=0u=0, we see that Q(N(S))=N(S)Φ(η)Pμ(dη)=Pλ(N(S))=1Q(N(S))=\int_{N(S)}\Phi(\eta)P_{\mu}(d\eta)=P_{\lambda}(N(S))=1, so that QQ is a probability measure. Because the Laplace functional uniquely characterises [LP18, Proposition 2.10] a probability measure on (N(S),𝒩(S))(N(S),\mathcal{N}(S)), we conclude from (7.20) that Q=PλQ=P_{\lambda}. In other words, Φ\Phi is a density of PλP_{\lambda} with respect to PμP_{\mu}. As an element of L1(N(S),𝒩(S),Pμ)L_{1}(N(S),\mathcal{N}(S),P_{\mu}), we see that Φ(η)=1M(η)e(η)\Phi(\eta)=1_{M}(\eta)e^{\ell(\eta)}, because Pμ(Ω)=1P_{\mu}(\Omega)=1. ∎

7.5 Proof of Theorem 5.1

First, Lemma 7.5 proves the claim under an additional condition that λ\lambda and μ\mu are finite measures on SS. This proof is different from the usual topological approach that is based on approximating the measurable sets of SS by a finite sigma-algebra [Lie75, Kar83, Kar91], which usually requires SS to be a separable metric space. Instead, the following proof is based on (i) representing a Poisson PP distribution with a finite intensity measure using a Poisson-distributed number of IID random variables (see [Kin67, Rei93, LP18]); and (ii) representing a Poisson PP distribution with a sigma-finite intensity measure using a decomposition with respect to a countable partition.

Lemma 7.4.

PλPμP_{\lambda}\ll P_{\mu} \implies λμ\lambda\ll\mu for any Poisson PP distributions with sigma-finite intensity measures. Furthermore, the converse implication λμ\lambda\ll\mu \implies PλPμP_{\lambda}\ll P_{\mu} holds when the intensity measures are finite.

Proof.

Assume that PλPμP_{\lambda}\ll P_{\mu}. Consider a set BSB\subset S such that μ(B)=0\mu(B)=0. Define C={η:η(B)>0}C=\{\eta\colon\eta(B)>0\}. Recall that η(B)\eta(B) is Poisson distributed with mean μ(B)\mu(B) when η\eta is sampled from PμP_{\mu}. Therefore, Pμ(C)=1eμ(B)=0P_{\mu}(C)=1-e^{-\mu(B)}=0. Now PλPμP_{\lambda}\ll P_{\mu} implies that 0=Pλ(C)=1eλ(B)0=P_{\lambda}(C)=1-e^{-\lambda(B)}, from which we conclude that λ(B)=0\lambda(B)=0. Hence λμ\lambda\ll\mu.

For finite intensity measures λ,μ\lambda,\mu, Theorem 4.1 confirms the reverse implication λμ\lambda\ll\mu \implies PλPμP_{\lambda}\ll P_{\mu}. ∎

Lemma 7.5.

Rα(PλPμ)=Tα(λμ)R_{\alpha}(P_{\lambda}\|P_{\mu})=T_{\alpha}(\lambda\|\mu) for all α+\alpha\in\mathbb{R}_{+} and all Poisson PP distributions Pλ,PμP_{\lambda},P_{\mu} with finite intensity measures λ,μ\lambda,\mu.

Proof.

Let PνP_{\nu} be a Poisson PP distribution with intensity measure ν=λ+μ\nu=\lambda+\mu. Let f=dλdν,g=dμdνf=\frac{d\lambda}{d\nu},g=\frac{d\mu}{d\nu} be densities of λ,μ\lambda,\mu with respect to ν\nu. Such functions exist by the Radon–Nikodym theorem [Kal02, Theorem 2.10]. Theorem 4.1 implies that Pλ,PμP_{\lambda},P_{\mu} are absolutely continuous with respect to PνP_{\nu}, admitting likelihood ratios F=dPλdPνF=\frac{dP_{\lambda}}{dP_{\nu}} and G=dPμdPνG=\frac{dP_{\mu}}{dP_{\nu}} given by

F(η)\displaystyle F(\eta) = 1Mf(η)eν(1f)+η(logf),\displaystyle\ =\ 1_{M_{f}}(\eta)\,e^{\nu(1-f)+\eta(\log f)}, (7.21)
G(η)\displaystyle G(\eta) = 1Mg(η)eν(1g)+η(logg),\displaystyle\ =\ 1_{M_{g}}(\eta)\,e^{\nu(1-g)+\eta(\log g)},

where Mf={ηN(S):η{f=0}=0,η(S)<}M_{f}=\{\eta\in N(S)\colon\eta\{f=0\}=0,\,\eta(S)<\infty\} and MgM_{g} is defined similarly, and we abbreviate ν(f)=f𝑑ν\nu(f)=\int f\,d\nu.

Let us first consider the case with α{0,1}\alpha\notin\{0,1\}. Then Rα(PλPμ)=1α1logZαR_{\alpha}(P_{\lambda}\|P_{\mu})=\frac{1}{\alpha-1}\log Z_{\alpha} where Zα=N(S)FαG1α𝑑PνZ_{\alpha}=\int_{N(S)}F^{\alpha}G^{1-\alpha}\,dP_{\nu}. By the standard conventions 0=00=00\cdot\infty=\frac{0}{0}=0 and 10=\frac{1}{0}=\infty, we see that

Zα={Z~α,α(0,1),Z~α+Pν{F>0,G=0},α(1,),Z_{\alpha}\ =\ \begin{cases}\tilde{Z}_{\alpha},&\quad\alpha\in(0,1),\\ \tilde{Z}_{\alpha}+\infty\cdot P_{\nu}\{F>0,G=0\},&\quad\alpha\in(1,\infty),\end{cases} (7.22)

where

Z~α=F>0,G>0FαG1α𝑑Pν.\tilde{Z}_{\alpha}\ =\ \int_{F>0,\,G>0}F^{\alpha}G^{1-\alpha}\,dP_{\nu}.

To derive a simplified expression for Z~α\tilde{Z}_{\alpha}, define U={f>0,g>0}U=\{f>0,\,g>0\} and consider a set of point patterns N(U)={ηN(S):η(S)<,η(Uc)=0}N(U)=\{\eta\in N(S)\colon\eta(S)<\infty,\,\eta(U^{c})=0\}. In light of (7.21), we see that {F>0,G>0}{η:η(S)<}=MfMg=N(U)\{F>0,\,G>0\}\cap\{\eta\colon\eta(S)<\infty\}=M_{f}\cap M_{g}=N(U). Because Pν{η:η(S)<}=1P_{\nu}\{\eta\colon\eta(S)<\infty\}=1, it follows that

Z~α=N(U)FαG1α𝑑Pν.\tilde{Z}_{\alpha}\ =\ \int_{N(U)}F^{\alpha}G^{1-\alpha}\,dP_{\nu}.

We also note that for ηN(U)\eta\in N(U),

F(η)αG(η)1α\displaystyle F(\eta)^{\alpha}G(\eta)^{1-\alpha} =eαν(1f)+(1α)ν(1g)eη(loghα),\displaystyle\ =\ e^{\alpha\nu(1-f)+(1-\alpha)\nu(1-g)}e^{\eta(\log h_{\alpha})}, (7.23)

where hα=fαg1αh_{\alpha}=f^{\alpha}g^{1-\alpha}. The conditional distribution of a point pattern η\eta sampled from PνP_{\nu} given ηN(U)\eta\in N(U) equals (Proposition B.4) PνUP_{\nu_{U}}. By also noting that Pν(N(U))=eν(Uc)P_{\nu}(N(U))=e^{-\nu(U^{c})}, it follows that

N(U)eη(loghα)Pν(dη)=eν(Uc)N(S)eη(loghα)PνU(dη).\int_{N(U)}e^{\eta(\log h_{\alpha})}\,P_{\nu}(d\eta)\ =\ e^{-\nu(U^{c})}\int_{N(S)}e^{\eta(\log h_{\alpha})}\,P_{\nu_{U}}(d\eta).

Because $\int_{S}({\lvert\log h_{\alpha}\rvert}\wedge 1)\,d\nu_{U}\leq\nu(U)<\infty$ and $x\mapsto\log h_{\alpha}(x)$ restricted to $U$ is $\mathbb{R}$-valued, the Laplace functional formula of Poisson point patterns [Kal02, Lemma 12.2] implies that $\int_{N(S)}e^{\eta(\log h_{\alpha})}\,P_{\nu_{U}}(d\eta)=e^{\nu_{U}(h_{\alpha}-1)}$. By integrating (7.23) with respect to $P_{\nu}$, it follows that

Z~α\displaystyle\tilde{Z}_{\alpha} =eαν(1f)+(1α)ν(1g)eν(Uc)eνU(hα1)\displaystyle\ =\ e^{\alpha\nu(1-f)+(1-\alpha)\nu(1-g)}e^{-\nu(U^{c})}e^{\nu_{U}(h_{\alpha}-1)} (7.24)
=eαν(f)(1α)ν(g)+νU(hα).\displaystyle\ =\ e^{-\alpha\nu(f)-(1-\alpha)\nu(g)+\nu_{U}(h_{\alpha})}.

We are now ready to verify the claim by considering the following five cases one by one:

  1. (i)

    Assume now that α(0,1)\alpha\in(0,1). Then by (7.22), we see that Rα(PλPμ)=1α1logZ~αR_{\alpha}(P_{\lambda}\|P_{\mu})=\frac{1}{\alpha-1}\log\tilde{Z}_{\alpha}. We also note that νU(hα)=ν(hα)\nu_{U}(h_{\alpha})=\nu(h_{\alpha}), so that by (7.24), we conclude that

    Rα(PλPμ)=ν(αf+(1α)ghα)1α.R_{\alpha}(P_{\lambda}\|P_{\mu})\ =\ \frac{\nu(\alpha f+(1-\alpha)g-h_{\alpha})}{1-\alpha}. (7.25)

    The claim follows because the right side equals Tα(λμ)T_{\alpha}(\lambda\|\mu) by (3.1).

  2. (ii)

    Assume now that α(1,)\alpha\in(1,\infty) and λμ\lambda\ll\mu. Then PλPμP_{\lambda}\ll P_{\mu} by Lemma 7.4. Then ν{f>0,g=0}=0\nu\{f>0,g=0\}=0 and Pν{F>0,G=0}=0P_{\nu}\{F>0,G=0\}=0 (Lemma A.1). In this case we again find that νU(hα)=ν(hα)\nu_{U}(h_{\alpha})=\nu(h_{\alpha}). Therefore, in light of (7.22) and (7.24), we see that (7.25) holds also in this case, and the claim follows.

  3. (iii)

    Assume now that α(1,)\alpha\in(1,\infty) and λ≪̸μ\lambda\not\ll\mu. Then Pλ≪̸PμP_{\lambda}\not\ll P_{\mu} by Lemma 7.4. Then ν{f>0,g=0}>0\nu\{f>0,g=0\}>0 and Pν{F>0,G=0}>0P_{\nu}\{F>0,G=0\}>0 (Lemma A.1). Hence by (7.22), Rα(PλPμ)=R_{\alpha}(P_{\lambda}\|P_{\mu})=\infty. The assumption α>1\alpha>1 implies that fαg1α=f^{\alpha}g^{1-\alpha}=\infty on the set {f>0,g=0}\{f>0,\,g=0\}. Therefore Sfαg1α𝑑ν=\int_{S}f^{\alpha}g^{1-\alpha}\,d\nu=\infty, and we find that Tα(λμ)=T_{\alpha}(\lambda\|\mu)=\infty by (3.1), and the claim follows.

  4. (iv)

Assume that $\alpha=1$. Now $R_{1}(P_{\lambda}\|P_{\mu})=\lim_{\alpha\uparrow 1}R_{\alpha}(P_{\lambda}\|P_{\mu})$ [vH14]. By (i), we know that $R_{\alpha}(P_{\lambda}\|P_{\mu})=T_{\alpha}(\lambda\|\mu)$ for all $\alpha\in(0,1)$. The claim follows by letting $\alpha\uparrow 1$ and noting that $T_{1}(\lambda\|\mu)=\lim_{\alpha\uparrow 1}T_{\alpha}(\lambda\|\mu)$ by Theorem 3.1.

  5. (v)

Assume that $\alpha=0$. Formula (2.1) shows that $R_{0}(P_{\lambda}\|P_{\mu})=-\log P_{\mu}(F>0)=-\log P_{\mu}(M_{f})$. Because $P_{\mu}(M_{f})=e^{-\mu\{f=0\}}$, it follows that $R_{0}(P_{\lambda}\|P_{\mu})=\mu\{f=0\}$. By formula (3.1), we see that $R_{0}(P_{\lambda}\|P_{\mu})=T_{0}(\lambda\|\mu)$. ∎

With the help of Lemma 7.5 we will prove Theorem 5.1 in the general case where λ,μ\lambda,\mu are sigma-finite measures on SS.

Proof of Theorem 5.1.

(i) Fix α(0,)\alpha\in(0,\infty). Define ν=λ+μ\nu=\lambda+\mu. Then ν\nu is sigma-finite. Select a partition (see Lemma A.2) S=n1SnS=\cup_{n\geq 1}S_{n} such that ν(Sn)<\nu(S_{n})<\infty for all nn. Denote N(Sn)={ηN(S):η(Snc)=0}N(S_{n})=\{\eta\in N(S)\colon\eta(S_{n}^{c})=0\}. Define τ:N(S)n=1N(Sn)\tau\colon N(S)\to\prod_{n=1}^{\infty}N(S_{n}) by

τ(η)=(ηS1,ηS2,),\tau(\eta)\ =\ (\eta_{S_{1}},\eta_{S_{2}},\dots), (7.26)

where ηSnN(Sn)\eta_{S_{n}}\in N(S_{n}) is defined by ηSn(B)=η(BSn)\eta_{S_{n}}(B)=\eta(B\cap S_{n}). A restriction theorem [LP18, Theorem 5.2] implies that when η\eta is sampled from PλP_{\lambda}, then the restrictions ηS1,ηS2,\eta_{S_{1}},\eta_{S_{2}},\dots are mutually independent Poisson PPs with intensity measures λS1,λS2,\lambda_{S_{1}},\lambda_{S_{2}},\dots defined by λSn(B)=λ(BSn)\lambda_{S_{n}}(B)=\lambda(B\cap S_{n}). Therefore, the pushforward probability measure Pλτ1P_{\lambda}\circ\tau^{-1} can be written as a product of Poisson PP distributions PλSnP_{\lambda_{S_{n}}}, n1n\geq 1. The same reasoning is valid also for PμP_{\mu}. Hence

Pλτ1=n=1PλSnandPμτ1=n=1PμSn.P_{\lambda}\circ\tau^{-1}\ =\ \bigotimes_{n=1}^{\infty}P_{\lambda_{S_{n}}}\qquad\text{and}\qquad P_{\mu}\circ\tau^{-1}\ =\ \bigotimes_{n=1}^{\infty}P_{\mu_{S_{n}}}.

Because the sets S1,S2,S_{1},S_{2},\dots form a partition of SS, we see that the map τ\tau defined by (7.26) is a bijection with inverse τ1(η1,η2,)=n1ηn\tau^{-1}(\eta_{1},\eta_{2},\dots)=\sum_{n\geq 1}\eta_{n}. Standard arguments show that τ\tau and τ1\tau^{-1} are measurable mappings (see Section B.1). Lemma A.4 then implies that Rα(PλPμ)=Rα(Pλτ1Pμτ1)R_{\alpha}(P_{\lambda}\|P_{\mu})=R_{\alpha}(P_{\lambda}\circ\tau^{-1}\|P_{\mu}\circ\tau^{-1}). Because Rényi divergences of order α>0\alpha>0 factorise over tensor products [vH14, Theorem 28] it follows that

Rα(PλPμ)=n=1Rα(PλSnPμSn).R_{\alpha}(P_{\lambda}\|P_{\mu})\ =\ \sum_{n=1}^{\infty}R_{\alpha}(P_{\lambda_{S_{n}}}\|P_{\mu_{S_{n}}}). (7.27)

Because λ,μν\lambda,\mu\ll\nu and ν\nu is sigma-finite, the Radon–Nikodym theorem [Kal02, Theorem 2.10] implies that there exist densities f=dλdνf=\frac{d\lambda}{d\nu} and g=dμdνg=\frac{d\mu}{d\nu} of λ\lambda and μ\mu with respect to ν\nu. Observe now that λSn(A)=ASnf𝑑ν=Af𝑑νSn\lambda_{S_{n}}(A)=\int_{A\cap S_{n}}f\,d\nu=\int_{A}f\,d\nu_{S_{n}} for all measurable ASA\subset S. Similarly, μSn(A)=Ag𝑑νSn\mu_{S_{n}}(A)=\int_{A}g\,d\nu_{S_{n}}. We conclude that the functions ff and gg also act as densities f=dλSndνSnf=\frac{d\lambda_{S_{n}}}{d\nu_{S_{n}}} and g=dμSndνSng=\frac{d\mu_{S_{n}}}{d\nu_{S_{n}}} of the finite measures λSn,μSn\lambda_{S_{n}},\mu_{S_{n}} with respect to νSn\nu_{S_{n}}. Lemma 7.5 now implies that

Rα(PλSnPμSn)=Tα(λSnμSn).R_{\alpha}(P_{\lambda_{S_{n}}}\|P_{\mu_{S_{n}}})=T_{\alpha}(\lambda_{S_{n}}\|\mu_{S_{n}}). (7.28)

Furthermore, by (3.2) in Theorem 3.1, we see that

Tα(λSnμSn)=SRα(pf(x)pg(x))νSn(dx),T_{\alpha}(\lambda_{S_{n}}\|\mu_{S_{n}})\ =\ \int_{S}R_{\alpha}(p_{f(x)}\|p_{g(x)})\,\nu_{S_{n}}(dx),

where $p_{s}$ refers to the Poisson distribution $k\mapsto e^{-s}\frac{s^{k}}{k!}$ with mean $s$. Observe that the integrand on the right side above is nonnegative, $d\nu_{S_{n}}=1_{S_{n}}d\nu$, and $\sum_{n}1_{S_{n}}=1$. Fubini’s theorem combined with (3.2) then implies that

n=1Tα(λSnμSn)=SRα(pf(x)pg(x))ν(dx)=Tα(λμ).\displaystyle\sum_{n=1}^{\infty}T_{\alpha}(\lambda_{S_{n}}\|\mu_{S_{n}})\ =\ \int_{S}R_{\alpha}(p_{f(x)}\|p_{g(x)})\,\nu(dx)\ =\ T_{\alpha}(\lambda\|\mu).

By combining this with (7.27) and (7.28), it follows that Rα(PλPμ)=Tα(λμ)R_{\alpha}(P_{\lambda}\|P_{\mu})=T_{\alpha}(\lambda\|\mu).

(ii) Finally, let us verify that $R_{0}(P_{\lambda}\|P_{\mu})=T_{0}(\lambda\|\mu)$ under the additional assumption that $T_{\beta}(\lambda\|\mu)<\infty$ for some $\beta>0$. We saw in part (i) of the proof that

R_{\alpha}(P_{\lambda}\|P_{\mu})\ =\ T_{\alpha}(\lambda\|\mu)\quad\text{for all $\alpha\in(0,\infty)$}. (7.29)

Now [vH14, Theorem 7] implies that $\alpha\mapsto R_{\alpha}(P_{\lambda}\|P_{\mu})$ is continuous on $[0,1]$, and Theorem 3.1 implies that $\alpha\mapsto T_{\alpha}(\lambda\|\mu)$ is continuous on $[0,\beta]$. Hence the claim follows by letting $\alpha\downarrow 0$ in (7.29). ∎
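Theorem 5.1 can be illustrated by importance sampling. For finite intensity measures with $\lambda\ll\mu$, we have $Z_{\alpha}=\int\Phi^{\alpha}\,dP_{\mu}$ with $\Phi$ as in (4.1), so a Monte Carlo estimate of $R_{\alpha}(P_{\lambda}\|P_{\mu})=\frac{1}{\alpha-1}\log Z_{\alpha}$ should match $T_{\alpha}(\lambda\|\mu)$ computed by quadrature. A Python sketch (assuming NumPy; the densities $f(x)=1+x$ and $g\equiv 2$ on $[0,1]$ are illustrative):

import numpy as np

rng = np.random.default_rng(3)

def trapz(y, x):
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

alpha = 0.5
f = lambda x: 1.0 + x        # density of lambda w.r.t. Lebesgue
g = 2.0                      # constant density of mu
xg = np.linspace(0.0, 1.0, 20001)
lam_S, mu_S = trapz(f(xg), xg), g

z_acc, n_rep = 0.0, 100_000
for _ in range(n_rep):
    pts = rng.random(rng.poisson(mu_S))               # one sample from P_mu
    Phi = np.exp(mu_S - lam_S) * np.prod(f(pts) / g)  # likelihood ratio (4.1)
    z_acc += Phi ** alpha

R_mc = np.log(z_acc / n_rep) / (alpha - 1.0)
T_ex = trapz((alpha * f(xg) + (1 - alpha) * g - f(xg) ** alpha * g ** (1 - alpha)) / (1 - alpha), xg)
print(R_mc, T_ex)   # Monte Carlo estimate vs T_alpha(lambda||mu)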

8 Conclusions

By developing an analytical toolbox of generalised Tsallis divergences for sigma-finite measures, this article has derived a framework for analysing likelihood ratios and Rényi divergences of Poisson PPs on general measurable spaces. The main advantage of this approach is that it is purely information-theoretic and free of topological assumptions. The framework yields explicit descriptions of Kullback–Leibler divergences, Rényi divergences, Hellinger distances, and likelihood ratios for statistical models that admit a measurable one-to-one map into a space of point patterns governed by a Poisson PP distribution. Marked Poisson PPs, corresponding to Poisson PPs on abstract product spaces, provide a rich context for various applications, and the disintegrated Tsallis divergence formula in Section 3.5 is key to understanding their information-theoretic features. To complete the general theory, it remains an important open problem to determine whether the technical condition (3.7) in Theorem 3.12 is necessary.

Future directions for extending this work include deriving similar results for a wider class of statistical models derived from Poisson processes and marked point patterns, for example Cox processes, Hawkes processes, Poisson shot noise random measures, and Matérn point patterns. A challenge here is that the mechanisms used to derive such models from a marked point pattern tend to lose information; for example, Poisson shot noise models in certain limiting regimes reduce to Gaussian white noises [KLNS07]. The thorough characterisation of Poisson PP distributions derived in this article is expected to serve as a cornerstone for further studies of this type.

Acknowledgments

The author thanks Venkat Anantharam for insightful discussions and two anonymous reviewers for valuable remarks that have helped to improve the presentation.

Appendix A Measure theory

A.1 Densities

Recall the notation of Section 2.1.

Lemma A.1.

Let λ,μ\lambda,\mu be measures on a measurable space (S,𝒮)(S,\mathcal{S}) admitting densities f=dλdνf=\frac{d\lambda}{d\nu} and g=dμdνg=\frac{d\mu}{d\nu} with respect to a measure ν\nu.

  1. (i)

    λ{f=0}=0\lambda\{f=0\}=0 and μ{g=0}=0\mu\{g=0\}=0.

  2. (ii)

    For any measurable set BB, λ(B)=0\lambda(B)=0 if and only if ν({f>0}B)=0\nu(\{f>0\}\cap B)=0.

  3. (iii)

    λμ\lambda\ll\mu if and only if λ{g=0}=0\lambda\{g=0\}=0.

  4. (iv)

    λμ\lambda\ll\mu if and only if ν{f>0,g=0}=0\nu\{f>0,\,g=0\}=0.

  5. (v)

    λμ\lambda\perp\mu if and only if ν{f>0,g>0}=0\nu\{f>0,\,g>0\}=0.

  6. (vi)

    λμ\lambda\perp\mu if and only if λ{g>0}=0\lambda\{g>0\}=0.

Proof.

(i) λ{f=0}={f=0}f𝑑ν=0.\lambda\{f=0\}=\int_{\{f=0\}}f\,d\nu=0. Analogously we see that μ{g=0}=0\mu\{g=0\}=0.

(ii) Fix a measurable set BB, and denote A={f>0}BA=\{f>0\}\cap B. Note that (i) implies that λ(B)=λ(A)=Af𝑑ν\lambda(B)=\lambda(A)=\int_{A}f\,d\nu. Hence ν(A)=0\nu(A)=0 implies that λ(B)=0\lambda(B)=0. Assume next that ν(A)>0\nu(A)>0. Then ν(An)>0\nu(A_{n})>0 for some integer n1n\geq 1, where An={fn1}BA_{n}=\{f\geq n^{-1}\}\cap B. Hence λ(B)λ(An)=Anf𝑑νn1ν(An)>0\lambda(B)\geq\lambda(A_{n})=\int_{A_{n}}f\,d\nu\geq n^{-1}\nu(A_{n})>0.

(iii) Assume that $\lambda\{g=0\}=0$, and consider a set $B$ such that $\mu(B)=0$. By applying (ii) to $\mu$, we find that $\nu(\{g>0\}\cap B)=0$, and $\lambda\ll\nu$ then implies that $\lambda(\{g>0\}\cap B)=0$. Because $\lambda\{g=0\}=0$, it follows that $\lambda(B)=\lambda(\{g>0\}\cap B)+\lambda(\{g=0\}\cap B)=0$. Hence $\lambda\ll\mu$. The converse implication $\lambda\ll\mu\implies\lambda\{g=0\}=0$ is immediate from (i).

(iv) By applying (ii) with $B=\{g=0\}$, we find that $\lambda\{g=0\}=0$ is equivalent to $\nu\{f>0,\,g=0\}=0$. The claim now follows by (iii).

(v) Assume that $\nu\{f>0,\,g>0\}=0$. Let $B=\{f>0\}$. Then $\lambda(B^{c})=0$ due to (i). Furthermore by (i), $\mu(B)=\mu(B\cap\{g>0\})=\mu\{f>0,\,g>0\}$. Hence $\mu(B)=0$ due to $\mu\ll\nu$. Hence $\lambda\perp\mu$. Assume now that $\lambda\perp\mu$. Then there exists a set $B$ such that $\lambda(B^{c})=0$ and $\mu(B)=0$. Then (ii) implies that $\nu(\{f>0\}\cap B^{c})=0$ and $\nu(\{g>0\}\cap B)=0$. Then

ν({f>0,g>0})ν({f>0}Bc)+ν({g>0}B)= 0.\displaystyle\nu(\{f>0,g>0\})\ \leq\ \nu(\{f>0\}\cap B^{c})+\nu(\{g>0\}\cap B)\ =\ 0.

(vi) Assume that $\lambda\{g>0\}=0$. Let $B=\{g=0\}$. Then $\lambda(B^{c})=0$, and $\mu(B)=0$ due to (i). Hence $\lambda\perp\mu$. Assume now that $\lambda\perp\mu$. Then there exists a set $B$ such that $\lambda(B^{c})=0$ and $\mu(B)=0$, and (ii) implies that $\nu(\{g>0\}\cap B)=0$. Because $\lambda\ll\nu$, it follows that $\lambda(\{g>0\}\cap B)=0$. Then

λ{g>0}=λ({g>0}B)+λ({g>0}Bc)= 0.\displaystyle\lambda\{g>0\}\ =\ \lambda(\{g>0\}\cap B)+\lambda(\{g>0\}\cap B^{c})\ =\ 0.

A.2 Basic measure theory

Lemma A.2.

If ν\nu is a sigma-finite measure on a measurable space (S,𝒮)(S,\mathcal{S}), then there exists a partition S=n1SnS=\cup_{n\geq 1}S_{n} such that ν(Sn)<\nu(S_{n})<\infty for all nn.

Proof.

Because $\nu$ is sigma-finite, there exist measurable sets $C_{1},C_{2},\dots$ such that $\cup_{n\geq 1}C_{n}=S$ and $\nu(C_{n})<\infty$ for all $n$. Define $S_{n}=C_{n}\setminus(C_{1}\cup\dots\cup C_{n-1})$ for $n\geq 1$, so that $S_{1}=C_{1}$. Then the sets $S_{1},S_{2},\dots$ are mutually disjoint, $S=\cup_{n\geq 1}S_{n}$, and $\nu(S_{n})\leq\nu(C_{n})<\infty$ for all $n$. ∎
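The disjointification step is mechanical; the following small Python illustration (with finite sets standing in for measurable sets) mirrors the construction:

def disjointify(cover):
    seen, parts = set(), []
    for C in cover:
        parts.append(set(C) - seen)   # S_n = C_n minus everything seen so far
        seen |= set(C)
    return parts

print(disjointify([{1, 2}, {2, 3}, {1, 3, 4}]))   # [{1, 2}, {3}, {4}]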

The following result is a convenient alternative form of Lebesgue’s dominated convergence theorem that quantifies uniform integrability (boundedness in the increasing convex stochastic order [LV13]) in a flexible manner.

Lemma A.3.

Let f,f1,f2,f,f_{1},f_{2},\dots and g1,g2,g_{1},g_{2},\dots be measurable real-valued functions on (S,𝒮)(S,\mathcal{S}) such that fnff_{n}\to f, gngg_{n}\to g, |fn|gn{\lvert f_{n}\rvert}\leq g_{n} for all nn, and supnSgn𝑑μ<\sup_{n}\int_{S}g_{n}\,d\mu<\infty. Then Sfn𝑑μSf𝑑μ\int_{S}f_{n}\,d\mu\to\int_{S}f\,d\mu, and S|f|𝑑μsupnSgn𝑑μ\int_{S}{\lvert f\rvert}\,d\mu\leq\sup_{n}\int_{S}g_{n}\,d\mu.

Proof.

We note that g=lim infngng=\liminf_{n\to\infty}g_{n}, and that Fatou’s lemma [Kal02, Lemma 1.20] implies g𝑑μ=(lim infngn)𝑑μlim infngn𝑑μsupngn𝑑μ\int g\,d\mu=\int(\liminf_{n\to\infty}g_{n})\,d\mu\leq\liminf_{n\to\infty}\int g_{n}\,d\mu\leq\sup_{n}\int g_{n}\,d\mu. Then g𝑑μ\int g\,d\mu is finite, and Kallenberg’s version of Lebesgue’s dominated convergence theorem [Kal02, Theorem 1.21] yields the first claim. For the second claim, we note by Fatou’s lemma that S|f|𝑑μ=Slim inf|fn|dμlim infS|fn|𝑑μsupnSgn𝑑μ\int_{S}{\lvert f\rvert}\,d\mu=\int_{S}\liminf{\lvert f_{n}\rvert}\,d\mu\leq\liminf\int_{S}{\lvert f_{n}\rvert}\,d\mu\leq\sup_{n}\int_{S}g_{n}\,d\mu. ∎

A.3 Measurable bijections

Lemma A.4.

Let $S,T$ be measurable spaces, and let $\phi\colon S\to T$ be a measurable bijection with a measurable inverse. Then $R_{\alpha}(P\circ\phi^{-1}\|\,Q\circ\phi^{-1})=R_{\alpha}(P\|Q)$ for all $\alpha>0$ and all probability measures $P,Q$ on $S$.

Proof.

Fix densities $p=\frac{dP}{dm}$ and $q=\frac{dQ}{dm}$ with respect to $m=P+Q$. Define a function $\tilde{p}\colon T\to\mathbb{R}_{+}$ by $\tilde{p}=p\circ\phi^{-1}$ and a measure $\tilde{m}$ on $T$ by $\tilde{m}=m\circ\phi^{-1}$. Note that for any measurable $A\subset T$,

Ap~𝑑m~=T1A(y)p~(y)m~(dy)=S1A(ϕ(x))p~(ϕ(x))m(dx).\displaystyle\int_{A}\tilde{p}\,d\tilde{m}\ =\ \int_{T}1_{A}(y)\tilde{p}(y)\,\tilde{m}(dy)\ =\ \int_{S}1_{A}(\phi(x))\tilde{p}(\phi(x))\,m(dx).

Because p~(ϕ(x))=p(x)\tilde{p}(\phi(x))=p(x) for all xx, we see that

Ap~𝑑m~=ϕ1(A)p(x)m(dx)=P(ϕ1(A)).\displaystyle\int_{A}\tilde{p}\,d\tilde{m}\ =\ \int_{\phi^{-1}(A)}p(x)\,m(dx)\ =\ P(\phi^{-1}(A)).

We conclude that p~=dPϕ1dmϕ1\tilde{p}=\frac{dP\circ\phi^{-1}}{dm\circ\phi^{-1}} is a density of Pϕ1P\circ\phi^{-1} with respect to m~\tilde{m}. Similarly, we see that q~=qϕ1\tilde{q}=q\circ\phi^{-1} is a density of Qϕ1Q\circ\phi^{-1} with respect to m~\tilde{m}. Hence,

\displaystyle\int_{T}(p\circ\phi^{-1})^{\alpha}(q\circ\phi^{-1})^{1-\alpha}\,d\tilde{m}\ =\ \int_{S}\big(p\circ\phi^{-1}(\phi(x))\big)^{\alpha}\big(q\circ\phi^{-1}(\phi(x))\big)^{1-\alpha}\,m(dx)
=S(p(x))α(q(x))1αm(dx)\displaystyle\ =\ \int_{S}(p(x))^{\alpha}(q(x))^{1-\alpha}m(dx)
=Spαq1α𝑑m.\displaystyle\ =\ \int_{S}p^{\alpha}q^{1-\alpha}\,dm.

From this the claim follows for α1\alpha\neq 1. The case with α=1\alpha=1 is similar. ∎

Appendix B Point patterns

B.1 Measurability of sigma-finite decompositions

This section discusses a decomposition of a point pattern with respect to a countable partition of the ground space SS. Let (S,𝒮)(S,\mathcal{S}) be a measurable space. Let N(S)N(S) be the set of point patterns (measures with values in +{}\mathbb{Z}_{+}\cup\{\infty\}) on (S,𝒮)(S,\mathcal{S}) equipped with the sigma-algebra 𝒩(S)=σ(evB:B𝒮)\mathcal{N}(S)=\sigma(\operatorname{ev}_{B}\colon B\in\mathcal{S}) generated by the evaluation maps evB:ηη(B)\operatorname{ev}_{B}\colon\eta\mapsto\eta(B).

Assume that S1,S2,𝒮S_{1},S_{2},\dots\in\mathcal{S} are disjoint and such that S=n=1SnS=\cup_{n=1}^{\infty}S_{n}. We define N(Sn)={ηN(S):η(Snc)=0}N(S_{n})=\{\eta\in N(S)\colon\eta(S_{n}^{c})=0\} and equip this set with the trace sigma-algebra 𝒩(Sn)=𝒩(S)N(Sn)\mathcal{N}(S_{n})=\mathcal{N}(S)\cap N(S_{n}). We define the truncation map τSn:N(S)N(Sn)\tau_{S_{n}}\colon N(S)\to N(S_{n}) by

(τSnη)(B)=η(BSn),B𝒮.(\tau_{S_{n}}\eta)(B)\ =\ \eta(B\cap S_{n}),\quad B\in\mathcal{S}.

Then we define τ:N(S)n=1N(Sn)\tau\colon N(S)\to\prod_{n=1}^{\infty}N(S_{n}) by

τ(η)=(τS1(η),τS2(η),).\tau(\eta)\ =\ (\tau_{S_{1}}(\eta),\tau_{S_{2}}(\eta),\dots). (B.1)

We find that τ\tau is a bijection with inverse

τ1(η1,η2,)=n=1ηn.\tau^{-1}(\eta_{1},\eta_{2},\dots)\ =\ \sum_{n=1}^{\infty}\eta_{n}.

We equip n=1N(Sn)\prod_{n=1}^{\infty}N(S_{n}) with the product sigma-algebra n=1𝒩(Sn)\bigotimes_{n=1}^{\infty}\mathcal{N}(S_{n}).

Lemma B.1.

τ:(N(S),𝒩(S))(n=1N(Sn),n=1𝒩(Sn))\tau\colon(N(S),\mathcal{N}(S))\to(\prod_{n=1}^{\infty}N(S_{n}),\bigotimes_{n=1}^{\infty}\mathcal{N}(S_{n})) is a measurable bijection with a measurable inverse.

Proof.

Denote 𝒞(S)={evB1({k}):B𝒮,k+}\mathcal{C}(S)=\{\operatorname{ev}_{B}^{-1}(\{k\})\colon B\in\mathcal{S},\,k\in\mathbb{Z}_{+}\} and note that this set family generates the sigma-algebra 𝒩(S)\mathcal{N}(S). By [Shi96, Lemma II.3.3], we know that the set family 𝒞(S)N(Sn)\mathcal{C}(S)\cap N(S_{n}) generates the trace sigma-algebra 𝒩(Sn)=𝒩(S)N(Sn)\mathcal{N}(S_{n})=\mathcal{N}(S)\cap N(S_{n}).

We start by verifying that $\tau_{S_{n}}\colon N(S)\to N(S_{n})$ is measurable. Fix a set $B\in\mathcal{S}$ and an integer $k\geq 0$, and consider a set in $\mathcal{C}(S)\cap N(S_{n})$ of the form

C={ηN(S):η(Snc)=0,η(B)=k}.C\ =\ \{\eta\in N(S)\colon\eta(S_{n}^{c})=0,\,\eta(B)=k\}.

Then

τSn1(C)\displaystyle\tau_{S_{n}}^{-1}(C) ={ηN(S):η(SncSn)=0,η(BSn)=k}\displaystyle\ =\ \{\eta\in N(S)\colon\eta(S_{n}^{c}\cap S_{n})=0,\,\eta(B\cap S_{n})=k\}
={ηN(S):η(BSn)=k}\displaystyle\ =\ \{\eta\in N(S)\colon\eta(B\cap S_{n})=k\}

shows that τSn1(C)𝒩(S)\tau_{S_{n}}^{-1}(C)\in\mathcal{N}(S). Because such sets CC generate 𝒩(Sn)\mathcal{N}(S_{n}), it follows [Kal02, Lemma 1.4] that τSn\tau_{S_{n}} is measurable. Because each coordinate map of τ\tau is measurable, it follows [Kal02, Lemma 1.8] that τ\tau is measurable.

Let us now verify that the inverse map τ1:n=1N(Sn)N(S)\tau^{-1}\colon\prod_{n=1}^{\infty}N(S_{n})\to N(S) is measurable. Fix a set B𝒮B\in\mathcal{S} and an integer k0k\geq 0, and consider a set in 𝒞(S)\mathcal{C}(S) of form

C={ηN(S):η(B)=k}.C\ =\ \{\eta\in N(S)\colon\eta(B)=k\}.

Then

(τ1)1(C)\displaystyle(\tau^{-1})^{-1}(C) ={(η1,η2,):n=1ηn(B)=k}.\displaystyle\ =\ \{(\eta_{1},\eta_{2},\dots)\colon\sum_{n=1}^{\infty}\eta_{n}(B)=k\}.

Let ZkZ_{k} be the collection of integer-valued measures z=n=1znδnz=\sum_{n=1}^{\infty}z_{n}\delta_{n} on ={1,2,}\mathbb{N}=\{1,2,\dots\} with total mass n=1zn=k\sum_{n=1}^{\infty}z_{n}=k. Then

(τ1)1(C)\displaystyle(\tau^{-1})^{-1}(C) =zZk{(η1,η2,):ηn(B)=znfor all n}\displaystyle\ =\ \bigcup_{z\in Z_{k}}\{(\eta_{1},\eta_{2},\dots)\colon\eta_{n}(B)=z_{n}\ \text{for all $n$}\}
=zZkn=1{(η1,η2,):ηn(B)=zn}\displaystyle\ =\ \bigcup_{z\in Z_{k}}\bigcap_{n=1}^{\infty}\{(\eta_{1},\eta_{2},\dots)\colon\eta_{n}(B)=z_{n}\}

shows that (τ1)1(C)n=1𝒩(Sn)(\tau^{-1})^{-1}(C)\in\bigotimes_{n=1}^{\infty}\mathcal{N}(S_{n}). Because such sets CC generate 𝒩(S)\mathcal{N}(S), it follows [Kal02, Lemma 1.4] that τ1\tau^{-1} is measurable. ∎

B.2 Compensated Poisson integrals

Given measurable sets SnSS_{n}\uparrow S and measures η,μ\eta,\mu on a measurable space (S,𝒮)(S,\mathcal{S}), we say that the compensated integral

Afd(ημ)=limn(ASnf𝑑ηASnf𝑑μ)\int_{A}f\,d(\eta-\mu)\ =\ \lim_{n\to\infty}\left(\int_{A\cap S_{n}}f\,d\eta-\int_{A\cap S_{n}}f\,d\mu\right) (B.2)

of a measurable function f:Sf\colon S\to\mathbb{R} over a measurable set ASA\subset S converges if ASn|f|𝑑η+ASn|f|𝑑μ<\int_{A\cap S_{n}}{\lvert f\rvert}\,d\eta+\int_{A\cap S_{n}}{\lvert f\rvert}\,d\mu<\infty for all nn, and the limit in (B.2) exists in \mathbb{R}. The following two results characterise the convergence of compensated integrals when η\eta is sampled from a Poisson PP distribution with intensity measure μ\mu. These are needed for proving Theorem 4.4.

Lemma B.2.

Let PμP_{\mu} be a Poisson PP distribution with a sigma-finite intensity measure μ\mu such that μ(Sn)<\mu(S_{n})<\infty for all nn. For any bounded function f:Sf\colon S\to\mathbb{R} such that Af2𝑑μ<\int_{A}f^{2}\,d\mu<\infty, the compensated integral Afd(ημ)\int_{A}f\,d(\eta-\mu) converges for PμP_{\mu}-almost every ηN(S)\eta\in N(S).

Proof.

By replacing $f$ with $f1_{A}$, we may assume that $A=S$. Define $U_{n}=S_{n}\setminus S_{n-1}$ for $n\geq 1$, where $S_{0}=\emptyset$. Because $f$ is bounded, we see that $\int_{U_{n}}{\lvert f\rvert}\,d\mu\leq\|f\|_{\infty}\mu(S_{n})<\infty$. Campbell’s theorem [Kin93, Section 3.2] then implies that

Wn(η)=Unf𝑑ηUnf𝑑μW_{n}(\eta)\ =\ \int_{U_{n}}f\,d\eta-\int_{U_{n}}f\,d\mu

defines a real-valued random variable on the probability space $(N(S),\mathcal{N}(S),P_{\mu})$ with mean $E_{\mu}W_{n}=0$ and variance $E_{\mu}W_{n}^{2}=\int_{U_{n}}f^{2}\,d\mu$. Because the sets $U_{n}$ are disjoint, the random variables $W_{n}$ are independent. Furthermore, $E_{\mu}\sum_{n=1}^{\infty}W_{n}^{2}=\int_{S}f^{2}\,d\mu<\infty$. The Khinchin–Kolmogorov variance criterion [Kal02, Lemma 4.16] then implies that the sum $W=\sum_{n=1}^{\infty}W_{n}$ converges almost surely. Hence $W$, or equivalently the right side of formula (B.2), is a well-defined real-valued random variable on the probability space $(N(S),\mathcal{N}(S),P_{\mu})$. In particular, $W(\eta)\in\mathbb{R}$ for $P_{\mu}$-almost every $\eta$. ∎
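The mean and variance computations in the proof are easy to observe in simulation. A Python sketch (assuming NumPy), with the illustrative choices $\mu=3\cdot\mathrm{Leb}$ on $[0,1]$ and $f(x)=\cos(2\pi x)$, for which $\int f\,d\mu=0$ and $\int f^{2}\,d\mu=3/2$:

import numpy as np

rng = np.random.default_rng(5)
rate, n_rep = 3.0, 100_000
f = lambda x: np.cos(2.0 * np.pi * x)

comp = np.empty(n_rep)
for i in range(n_rep):
    pts = rng.random(rng.poisson(rate))   # Poisson PP with intensity 3 on [0,1]
    comp[i] = np.sum(f(pts)) - 0.0        # the compensator int f dmu vanishes here
print(comp.mean(), comp.var())            # approx 0 and int f^2 dmu = 1.5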

Lemma B.3.

Let PμP_{\mu} be a Poisson PP distribution with a sigma-finite intensity measure μ\mu. Let ϕ:S+\phi\colon S\to\mathbb{R}_{+} be such that S(ϕ1)2𝑑μ<\int_{S}(\sqrt{\phi}-1)^{2}\,d\mu<\infty, and denote A={xS:|logϕ(x)|1}A=\{x\in S\colon{\lvert\log\phi(x)\rvert}\leq 1\}. Assume that SnSS_{n}\uparrow S and μ(Sn)<\mu(S_{n})<\infty for all nn. Then the sets

Ω1={ηN(S):Ac{ϕ>0}|logϕ|𝑑η<,η(Ac)<}\Omega_{1}\ =\ \left\{\eta\in N(S)\colon\int_{A^{c}\cap\{\phi>0\}}{\lvert\log\phi\rvert}\,d\eta<\infty,\ \eta(A^{c})<\infty\right\}

and

Ω2={ηN(S):Alogϕd(ημ) converges,η(Sn)<for all n}\Omega_{2}\ =\ \left\{\eta\in N(S)\colon\text{$\int_{A}\log\phi\,d(\eta-\mu)$ converges},\ \eta(S_{n})<\infty\ \text{for all $n$}\right\}

satisfy Pμ(Ω1)=1P_{\mu}(\Omega_{1})=1 and Pμ(Ω2)=1P_{\mu}(\Omega_{2})=1.

Proof.

We note that $\mu(A^{c})=\int_{A^{c}}d\mu\leq\int_{A^{c}}(\phi+1)\,d\mu$. By (C.6), $\phi+1\leq\frac{e+1}{(e^{1/2}-1)^{2}}(\sqrt{\phi}-1)^{2}$ on $A^{c}$. It follows that $\mu(A^{c})$ is finite. Campbell’s formula then implies $\int_{N(S)}\eta(A^{c})P_{\mu}(d\eta)=\mu(A^{c})<\infty$. Hence the set $\Omega_{1}^{\prime}=\{\eta\colon\eta(A^{c})<\infty\}$ satisfies $P_{\mu}(\Omega_{1}^{\prime})=1$. We also note that for any $\eta\in\Omega_{1}^{\prime}$, the restriction of $\eta$ to $A^{c}$ can be written as a finite sum $\eta_{A^{c}}=\sum_{i}\delta_{x_{i}}$, so that $\int_{A^{c}\cap\{\phi>0\}}{\lvert\log\phi\rvert}\,d\eta=\sum_{i\colon\phi(x_{i})>0}{\lvert\log\phi(x_{i})\rvert}$ is a finite sum of finite terms. Therefore, $\Omega_{1}^{\prime}=\Omega_{1}$, and we conclude that $P_{\mu}(\Omega_{1})=1$.

Let

Ω2={ηN(S):η(Sn)<for all n}.\Omega_{2}^{\prime}\ =\ \Big{\{}\eta\in N(S)\colon\eta(S_{n})<\infty\ \text{for all $n$}\Big{\}}.

Because μ(Sn)<\mu(S_{n})<\infty for all nn, we see that Pμ(Ω2)=1P_{\mu}(\Omega_{2}^{\prime})=1. Observe that f=1Alogϕf=1_{A}\log\phi is bounded, and by (C.8),

\int_{S}f^{2}\,d\mu\ =\ \int_{A}\log^{2}\phi\,d\mu\ \leq\ 4e^{3}\int_{S}(\sqrt{\phi}-1)^{2}\,d\mu\ <\ \infty.

Together with $P_{\mu}(\Omega_{2}^{\prime})=1$, Lemma B.2 implies that $P_{\mu}(\Omega_{2})=1$. ∎

B.3 Truncated Poisson PPs

Let (S,𝒮)(S,\mathcal{S}) be a measurable space. Given a set U𝒮U\in\mathcal{S} and a measure λ\lambda, define λU(A)=λ(AU)\lambda_{U}(A)=\lambda(A\cap U) for A𝒮A\in\mathcal{S}. Then λU\lambda_{U} is a measure on (S,𝒮)(S,\mathcal{S}). We denote the truncation map by πU:λλU\pi_{U}\colon\lambda\mapsto\lambda_{U}.

When $\eta$ is sampled from a Poisson PP distribution $P_{\lambda}$ with a sigma-finite intensity measure $\lambda$ on $S$, the probability distribution of the random point pattern $\eta_{U}$ is given by $P_{\lambda}\circ\pi_{U}^{-1}$. The following proposition confirms that the law of $\eta_{U}$ is a Poisson PP distribution with truncated intensity measure $\lambda_{U}$, and that $P_{\lambda_{U}}$ is also the conditional distribution of $\eta$ sampled from $P_{\lambda}$ given that $\eta(U^{c})=0$.

Proposition B.4.

For any measurable set $U\subset S$:

  1. (i)

    $P_{\lambda}\circ\pi_{U}^{-1}=P_{\lambda_{U}}$.

  2. (ii)

    $P_{\lambda}\{\eta\colon\eta\in A,\ \eta(U^{c})=0\}=e^{-\lambda(U^{c})}\,P_{\lambda_{U}}(A)$ for all measurable $A\subset N(S)$.

Proof.

(i) This follows from [LP18, Theorem 5.2].

(ii) We note that $\eta=\eta_{U}$ for any $\eta$ such that $\eta(U^{c})=0$, and that $\eta(U^{c})=0$ if and only if $\eta_{U^{c}}=0$. We also note, again by [LP18, Theorem 5.2], that the random point patterns $\eta_{U}$ and $\eta_{U^{c}}$ are independent when $\eta$ is sampled from $P_{\lambda}$. Therefore, by (i),

P_{\lambda}\{\eta\colon\eta\in A,\ \eta(U^{c})=0\}\ =\ P_{\lambda}\{\eta\colon\eta_{U}\in A,\ \eta_{U^{c}}=0\}\ =\ P_{\lambda_{U}}(A)\,P_{\lambda_{U^{c}}}(\{0\}).

Because $\eta(S)$ is Poisson distributed with mean $\lambda(U^{c})$ when $\eta$ is sampled from $P_{\lambda_{U^{c}}}$, we see that $P_{\lambda_{U^{c}}}(\{0\})=P_{\lambda_{U^{c}}}\{\eta\colon\eta(S)=0\}=e^{-\lambda(U^{c})}$. Hence the claim follows. ∎
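For example, taking $A=N(S)$ in (ii) recovers the familiar void probability formula

P_{\lambda}\{\eta\colon\eta(U^{c})=0\}\ =\ e^{-\lambda(U^{c})}.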

Appendix C Elementary analysis

Lemma C.1.

For all $x>-1$,

\frac{x}{1+x}\ \leq\ \log(1+x)\ \leq\ x, (C.1)
x-\frac{x^{2}}{2(1-x_{-})^{2}}\ \leq\ \log(1+x)\ \leq\ x-\frac{x^{2}}{2(1+x_{+})^{2}}, (C.2)

and for all $y>0$,

\lvert\log y\rvert\ \leq\ \frac{\lvert y-1\rvert}{y\wedge 1} (C.3)

and

0\ \leq\ y-1-\log y\ \leq\ \frac{(y-1)^{2}}{2(y\wedge 1)^{2}}. (C.4)
Proof.

Fix a number $x>-1$. Define $f(t)=\log(1+tx)$ for $t\in[0,1]$. Note that $f^{\prime}(t)=x(1+tx)^{-1}$ and $f^{\prime\prime}(t)=-x^{2}(1+tx)^{-2}$. Then the formula $f(1)=f(0)+\int_{0}^{1}f^{\prime}(r)\,dr$ implies that

\log(1+x)\ =\ \int_{0}^{1}\frac{x}{1+rx}\,dr.

Then (C.1) follows by noting that $\frac{x}{1+x}\leq\frac{x}{1+rx}\leq x$ for all $0\leq r\leq 1$. Similarly, the formula $f(1)=f(0)+f^{\prime}(0)+\int_{0}^{1}\int_{0}^{s}f^{\prime\prime}(r)\,dr\,ds$ implies that

\log(1+x)\ =\ x-\int_{0}^{1}\int_{0}^{s}\frac{x^{2}}{(1+rx)^{2}}\,dr\,ds.

Then (C.2) follows by noting that $1-x_{-}\leq 1+rx\leq 1+x_{+}$ for all $0\leq r\leq 1$.

Fix a number $y>0$. By substituting $x=y-1$ into (C.1), we see that $\frac{y-1}{y}\leq\log y\leq y-1$. Hence $-\frac{\lvert y-1\rvert}{y}\leq\log y\leq\lvert y-1\rvert$, and (C.3) follows. By substituting $x=y-1$ into (C.2), we see that

\frac{(y-1)^{2}}{2(1+(y-1)_{+})^{2}}\ \leq\ y-1-\log y\ \leq\ \frac{(y-1)^{2}}{2(1-(y-1)_{-})^{2}}.

Now (C.4) follows by noting that $1-(y-1)_{-}=y\wedge 1$. ∎
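As a quick numerical sanity check: at $y=2$, inequality (C.3) reads $\log 2\approx 0.693\leq\frac{1}{2\wedge 1}=1$, and (C.4) reads $0\leq 1-\log 2\approx 0.307\leq\frac{1}{2}$; at $y=1/2$, (C.3) reads $\log 2\approx 0.693\leq\frac{1/2}{1/2}=1$.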

Lemma C.2.

If $t\geq 0$ satisfies $\lvert t-1\rvert\geq c$ for some $c>0$, then $t+1\leq C(\sqrt{t}-1)^{2}$, where

C\ =\ \frac{2+c}{(\sqrt{1+c}-1)^{2}}.
Proof.

Differentiation shows that $r(t)=\frac{(\sqrt{t}-1)^{2}}{t+1}=1-\frac{2\sqrt{t}}{t+1}$ is strictly decreasing on $[0,1]$ and strictly increasing on $[1,\infty)$. Moreover, $r(1/t)=r(t)$ for all $t>0$, as a direct computation shows.

(i) Assume that $0<c\leq 1$. Then $\lvert t-1\rvert\geq c$ implies that either $t\leq 1-c$ or $t\geq 1+c$. In the former case $r(t)\geq r(1-c)$, and in the latter case $r(t)\geq r(1+c)$. Because $(1-c)^{-1}\geq 1+c$, the symmetry $r(1-c)=r((1-c)^{-1})$ gives $r(1-c)\geq r(1+c)$ for $0<c<1$, while $r(0)=1\geq r(2)$ covers the case $c=1$. Hence in both cases $r(t)\geq r(1+c)$, so that $t+1\leq r(1+c)^{-1}(\sqrt{t}-1)^{2}$.

(ii) Assume that $c>1$. Then $\lvert t-1\rvert\geq c$ implies that $t\geq 1+c$, so that $r(t)\geq r(1+c)$, and again $t+1\leq r(1+c)^{-1}(\sqrt{t}-1)^{2}$. Because $r(1+c)^{-1}=\frac{2+c}{(\sqrt{1+c}-1)^{2}}$, the claim follows. ∎
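The constant $C$ above is sharp: at $t=1+c$, where $\lvert t-1\rvert=c$, the bound holds with equality, since

C(\sqrt{1+c}-1)^{2}\ =\ \frac{2+c}{(\sqrt{1+c}-1)^{2}}(\sqrt{1+c}-1)^{2}\ =\ 2+c\ =\ t+1.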

Lemma C.3.

For all $0\leq t\leq c$,

\lvert t-1\rvert\ \leq\ (1+c^{1/2})\lvert\sqrt{t}-1\rvert. (C.5)

For all $0\leq t\leq c^{-1}$ and all $t\geq c$, with $c>1$,

t+1\ \leq\ \frac{c+1}{(c^{1/2}-1)^{2}}(\sqrt{t}-1)^{2}. (C.6)
Proof.

(i) Because $\lvert t-1\rvert=\lvert\sqrt{t}+1\rvert\,\lvert\sqrt{t}-1\rvert$, we see that (C.5) follows by noting that $\lvert\sqrt{t}+1\rvert\leq 1+c^{1/2}$ for all $0\leq t\leq c$.

(ii) As noted in the proof of Lemma C.2, $r(t)=\frac{(\sqrt{t}-1)^{2}}{t+1}$ is decreasing on $[0,1]$ and increasing on $[1,\infty)$. Hence $r(t)\geq r(c)$ for $t\geq c$ and $r(t)\geq r(c^{-1})$ for $0\leq t\leq c^{-1}$. Because $r(c)=r(c^{-1})=\frac{(c^{1/2}-1)^{2}}{c+1}$, we conclude (C.6). ∎
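In the proof of Lemma B.3, inequality (C.6) is applied with $c=e$: for all $t\leq e^{-1}$ and all $t\geq e$,

t+1\ \leq\ \frac{e+1}{(e^{1/2}-1)^{2}}(\sqrt{t}-1)^{2}\ \approx\ 8.84\,(\sqrt{t}-1)^{2},

with equality at $t=e$ and at $t=e^{-1}$.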

Lemma C.4.

For all $t$ such that $\lvert\log t\rvert\leq 1$,

\lvert\log t+1-t\rvert\ \leq\ 2e^{3}(\sqrt{t}-1)^{2}, (C.7)
\log^{2}t\ \leq\ 4e^{3}(\sqrt{t}-1)^{2}. (C.8)
Proof.

Fix a number $t$ such that $\lvert\log t\rvert\leq 1$, so that $e^{-1}\leq t\leq e$. Then (C.4) implies that $\lvert\log t+1-t\rvert\leq\frac{(t-1)^{2}}{2(t\wedge 1)^{2}}\leq\frac{1}{2}e^{2}(t-1)^{2}$. Combining this with (C.5), applied with $c=e$, gives $\lvert\log t+1-t\rvert\leq\frac{1}{2}e^{2}(1+e^{1/2})^{2}(\sqrt{t}-1)^{2}$; because $1+e^{1/2}\leq 2e^{1/2}$, we have $(1+e^{1/2})^{2}\leq 4e$, and (C.7) follows. Furthermore, (C.3) implies that $\lvert\log t\rvert\leq\frac{\lvert t-1\rvert}{t\wedge 1}\leq e\lvert t-1\rvert$. Combining this with (C.5) in the same way, we see that $\log^{2}t\leq e^{2}(1+e^{1/2})^{2}(\sqrt{t}-1)^{2}\leq 4e^{3}(\sqrt{t}-1)^{2}$, so that (C.8) is valid. ∎

References

  • [ABDW21] Rami Atar, Amarjit Budhiraja, Paul Dupuis, and Ruoyu Wu. Robust bounds and optimization at the large deviations scale for queueing models via Rényi divergence. Annals of Applied Probability, 31(3):1061–1099, 2021.
  • [ABS21] Emmanuel Abbe, François Baccelli, and Abishek Sankararaman. Community detection on Euclidean random graphs. Information and Inference: A Journal of the IMA, 10(1):109–160, 2021.
  • [AKL24] Konstantin Avrachenkov, B. R. Vinay Kumar, and Lasse Leskelä. Community detection on block models with geometric kernels, 2024. https://arxiv.org/abs/2403.02802.
  • [Ana18] Venkat Anantharam. A variational characterization of Rényi divergences. IEEE Transactions on Information Theory, 64(11):6979–6989, 2018.
  • [AS15] Emmanuel Abbe and Colin Sandon. Community detection in general stochastic block models: Fundamental limits and efficient algorithms for recovery. In IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS), 2015.
  • [BDK+21] Jeremiah Birrell, Paul Dupuis, Markos A. Katsoulakis, Luc Rey-Bellet, and Jie Wang. Variational representations and neural network estimation of Rényi divergences. SIAM Journal on Mathematics of Data Science, 3(4):1093–1116, 2021.
  • [Bev20] Andrew Bevan. Spatial point patterns and processes. In Mark Gillings, Piraye Hacıgüzeller, and Gary Lock, editors, Archaeological Spatial Analysis, pages 60–76. Routledge, 2020.
  • [Bir07] Lucien Birgé. Model selection for Poisson processes. In Asymptotics: particles, processes and inverse problems. Festschrift for Piet Groeneboom, pages 32–64. Beachwood, OH: IMS, Institute of Mathematical Statistics, 2007.
  • [Bro71] Mark Brown. Discrimination of Poisson processes. Annals of Mathematical Statistics, 42(2):773–776, 1971.
  • [Che52] Herman Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Annals of Mathematical Statistics, 23(4):493–507, 1952.
  • [Dig13] Peter J. Diggle. Statistical Analysis of Spatial and Spatio-Temporal Point Patterns. Chapman and Hall/CRC, third edition, 2013.
  • [DM82] Claude Dellacherie and Paul-André Meyer. Probabilities and Potential B. North-Holland Publishing Company, 1982.
  • [DVJ03] Daryl J. Daley and David Vere-Jones. An Introduction to the Theory of Point Processes. Springer, second edition, 2003.
  • [GS02] Alison L. Gibbs and Francis Edward Su. On choosing and bounding probability metrics. International Statistical Review, 70(3):419–435, 2002.
  • [IPSS08] Janine Illian, Antti Penttinen, Helga Stoyan, and Dietrich Stoyan. Statistical Analysis and Modelling of Spatial Point Patterns. Wiley, 2008.
  • [Kak48] Shizuo Kakutani. On equivalence of infinite product measures. Annals of Mathematics, 49(1):214–224, 1948.
  • [Kal02] Olav Kallenberg. Foundations of Modern Probability. Springer, second edition, 2002.
  • [Kar83] Alan F. Karr. State estimation for Cox processes on general spaces. Stochastic Processes and their Applications, 14(3):209–232, 1983.
  • [Kar91] Alan F. Karr. Point Processes and Their Statistical Inference. Marcel Dekker, second edition, 1991.
  • [Kin67] John Frank Charles Kingman. Completely random measures. Pacific Journal of Mathematics, 21(1):59–78, 1967.
  • [Kin93] John Frank Charles Kingman. Poisson Processes. Oxford University Press, 1993.
  • [KL13] Mikko Kuronen and Lasse Leskelä. Hard-core thinnings of germ–grain models with power-law grain sizes. Advances in Applied Probability, 45(3):595–625, 2013.
  • [KLNS07] Ingemar Kaj, Lasse Leskelä, Ilkka Norros, and Volker Schmidt. Scaling limits for random fields with long-range dependence. Annals of Probability, 35(2):528–550, 2007.
  • [Les10] Lasse Leskelä. Stochastic relations of random variables and processes. Journal of Theoretical Probability, 23(2):523–546, 2010.
  • [Les22] Lasse Leskelä. Ross’s second conjecture and supermodular stochastic ordering. Queueing Systems, 100(3):213–215, 2022.
  • [Lie75] Friedrich Liese. Eine informationstheoretische Bedingung für die Äquivalenz unbegrenzt teilbarer Punktprozesse. Mathematische Nachrichten, 70(1):183–196, 1975.
  • [LP18] Günter Last and Mathew D. Penrose. Lectures on the Poisson Process. Cambridge University Press, 2018.
  • [LS78] Robert S. Liptser and Albert N. Shiryaev. Statistics of Random Processes II. Springer, 1978.
  • [LV06] Friedrich Liese and Igor Vajda. On divergences and informations in statistics and information theory. IEEE Transactions on Information Theory, 52(10):4394–4412, 2006.
  • [LV13] Lasse Leskelä and Matti Vihola. Stochastic order characterization of uniform integrability and tightness. Statistics & Probability Letters, 83(1):382–389, 2013.
  • [LV17] Lasse Leskelä and Matti Vihola. Conditional convex orders and measurable martingale couplings. Bernoulli, 23(4A):2784–2807, 2017.
  • [MFWF23] Benjamin Kurt Miller, Marco Federici, Christoph Weniger, and Patrick Forré. Simulation-based inference with the generalized Kullback-Leibler divergence, 2023. ICML Workshop on Synergy of Scientific and Machine Learning Modeling.
  • [MPST21] Xiaosheng Mu, Luciano Pomatto, Philipp Strack, and Omer Tamuz. From Blackwell dominance in large samples to Rényi divergences and back again. Econometrica, 89(1):475–506, 2021.
  • [Nie13] Frank Nielsen. An information-geometric characterization of Chernoff information. IEEE Signal Processing Letters, 20(3):269–272, 2013.
  • [NN11] Frank Nielsen and Richard Nock. A closed-form expression for the Sharma–Mittal entropy of exponential families. Journal of Physics A: Mathematical and Theoretical, 45(3):032003, 2011.
  • [PRBR18] Franck Picard, Patricia Reynaud-Bouret, and Etienne Roquain. Continuous testing for Poisson process intensities: a new perspective on scanning statistics. Biometrika, 105(4):931–944, 2018.
  • [PW24] Yury Polyanskiy and Yihong Wu. Information Theory. Cambridge University Press, 2024.
  • [RB17] Miklós Z. Rácz and Sébastien Bubeck. Basic models and questions in statistical network analysis. Statistics Surveys, 11:1–47, 2017.
  • [RBS10] Patricia Reynaud-Bouret and Sophie Schbath. Adaptive estimation for Hawkes processes; application to genome analysis. Annals of Statistics, 38(5):2781–2822, 2010.
  • [Rei93] Rolf-Dieter Reiss. A Course on Point Processes. Springer, 1993.
  • [Rén61] Alfréd Rényi. On measures of entropy and information. In 4th Berkeley Symposium on Mathematics, Statistics and Probability, 1961.
  • [Sas18] Igal Sason. On ff-divergences: Integral representations, local behavior, and inequalities. Entropy, 20(5):1–32, 2018.
  • [Shi96] Albert N. Shiryaev. Probability. Springer, second edition, 1996.
  • [Sko57] Anatoliy Volodymyrovych Skorohod. On the differentiability of measures which correspond to stochastic processes. I. Processes with independent increments. Theory of Probability & Its Applications, 2(4):407–432, 1957.
  • [SMSS02] Martin Snethlage, Vicent J. Martínez, Dietrich Stoyan, and Enn Saar. Point field models for the galaxy point pattern. Astronomy & Astrophysics, 388(3):758–765, 2002.
  • [SP00] Dietrich Stoyan and Antti Penttinen. Recent applications of point process methods in forestry statistics. Statistical Science, 15(1):61–78, 2000.
  • [Tak90] Yoichiro Takahashi. Absolute continuity of Poisson random fields. Publications of the Research Institute for Mathematical Sciences, Kyoto University, 26(4):629–647, 1990.
  • [Tsa98] Constantino Tsallis. Generalized entropy-based criterion for consistent testing. Physical Review E, 58:1442–1445, 1998.
  • [vH14] Tim van Erven and Peter Harremoës. Rényi divergence and Kullback–Leibler divergence. IEEE Transactions on Information Theory, 60(7):3797–3820, July 2014.
  • [ZT23] Qiaosheng Zhang and Vincent Y. F. Tan. Exact recovery in the general hypergraph stochastic block model. IEEE Transactions on Information Theory, 69(1):453–471, 2023.