
Functional Covering of Point Processes

Nirmal V. Shende and Aaron B. Wagner N. V. Shende is with Marvell Technology, Inc., Santa Clara, CA 95054 (email: nshende@marvell.com). A. B. Wagner is with the School of Electrical and Computer Engineering, Cornell University, Ithaca, NY 14853 (email:wagner@cornell.edu). This work was performed when N. V. Shende was a student at Cornell University. This paper was presented in part at the IEEE International Symposium on Information Theory, Paris, Jul. 2019 [1]. This research was supported by the US National Science Foundation under grants CCF-1956192, CCF-2008266, and CCF-1934985.
Abstract

We introduce a new distortion measure for point processes called functional-covering distortion. It is inspired by intensity theory and is related to both the covering of point processes and logarithmic-loss distortion. We obtain the distortion-rate function with feedforward under this distortion measure for a large class of point processes. For Poisson processes, the rate-distortion function is obtained under a more general distortion measure called constrained functional-covering distortion, of which both covering and functional-covering are special cases. Also for Poisson processes, we characterize the rate-distortion region for a two-encoder CEO problem and show that feedforward does not enlarge this region.

I Introduction

The classical theory of compression [2] focuses on discrete-time, sequential sources. The theory is thus well-suited to text, audio, speech, genomic data, and the like. Continuous-time signals are typically handled by reducing to discrete-time via projection onto a countable basis. Multi-dimensional extensions enable application to images and video.

Point processes model a distinct data type that appears in diverse domains such as neuroscience [3, 4, 5, 6, 7, 8], communication networks [9, 10, 11], imaging [12, 13], blockchains [14, 15, 16, 17], and photonics [18, 19, 20, 21, 22]. Formally, a point process can be viewed as a random counting measure on some space of interest [23], or if the space is a real line, a random counting function; we shall adopt the latter view. Informally, it may be viewed as simply a random collection of points representing epochs in time or points in space.

Compression of point processes emerges naturally in several of the above domains. Sub-cranial implants need to communicate the timing of neural firings to a monitoring station over a wireless link that is low-rate because it must traverse the skull [24, 25]. In network flow correlation analysis, one cross-correlates packet timings from different links in the network [11]; this requires communication of the packet timings from one place to another. Compressing point process realizations in 2-D (also known as point clouds) arises in computer vision [26, 27, 28], and so on.

Various specialized approaches have been developed for compressing point processes, and in particular for measuring distortion. One natural approach is for the compressed representation to be itself a point-process realization. In this case, the distortion can be the sum of the absolute values of the differences between the actual and reconstructed epochs, with the constraint that the two processes have the same number of points. For the Poisson point process, Gallager [29] obtained a lower bound on the rate-distortion function by insisting on causal reconstruction of the points while allowing them to be reordered. Bedekar [30] determined the rate-distortion function with the additional constraint that the order of the epochs be preserved in the reconstruction. Verdú [31] allowed the reconstruction to be non-causal. Coleman et al. [32] introduced the queueing distortion function, where the reproduced epochs lead the actual epochs. Rubin [33] used the L_{1} distance between the counting functions as a distortion measure. Koliander et al. [34] gave upper and lower bounds on the rate-distortion function under a generic distortion measure defined between pairs of point processes.

Most relevant to the present paper, Lapidoth et al. [35] introduced a covering distortion measure, where the reconstruction of a point process on [0,T][0,T] is a subset YY of [0,T][0,T] that must contain all the points, and the distortion is the Lebesgue measure of the covering set (see also Shen et al. [36]).

If we encode the subset YY as an indicator function

Yt={1if tY0otherwise,Y_{t}=\begin{cases}1&\text{if $t\in Y$}\\ 0&\text{otherwise,}\end{cases}

then Yt=0Y_{t}=0 guarantees that no point occurred at time tt, while Yt=1Y_{t}=1 indicates that a point may occur at tt. More generally, YtY_{t} could encode the relative belief that there is a point at tt. Inspired by this observation, and the notion of logarithmic-loss distortion [37, 38], we consider the following formulation. For a realization of a counting (or point) process y0T=(yt:t[0,T])y_{0}^{T}=(y_{t}:t\in[0,T]) (i.e., yty_{t} is integer-valued, non-decreasing, and has unit jumps) and a non-negative reconstruction y^0T\hat{y}_{0}^{T}, we define the functional-covering distortion as

d(y^0T,y0T)0Ty^t𝑑tlog(y^t)dyt.\displaystyle d(\hat{y}_{0}^{T},y_{0}^{T})\triangleq\int_{0}^{T}\hat{y}_{t}\,dt-\log(\hat{y}_{t})\,dy_{t}. (1)

This is related to the covering distortion measure in the following sense. If we impose that \hat{y}_{t}\in\{0,1\}, then (1) reduces to the covering distortion measure. Yet it is natural to consider the distortion in (1) without such a restriction, or with a more general set of allowable values for \hat{y}_{t}. In fact, there are advantages to not restricting \hat{y}_{0}^{T} to the set \{0,1\}. Consider a remote-source setting in which the encoder cannot access the point-process source directly, but instead observes a thinned version in which some of the points of the source process are deleted at random. In the case of covering distortion, the only viable reconstruction is then the entire interval [0,T] (i.e., \hat{y}_{t}=1 for all t\in[0,T]). Under functional-covering distortion, on the other hand, the problem has a nontrivial solution.
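To fix ideas, the following minimal sketch (with made-up values for the reconstruction and the arrival epochs) evaluates (1) for a piecewise-constant reconstruction; when \hat{y}_{t}\in\{0,1\} and every epoch is covered, the value is exactly the Lebesgue measure of the covering set.

```python
import numpy as np

def functional_covering_distortion(breaks, values, arrivals):
    """d(yhat, y) = int_0^T yhat_t dt - sum_i log(yhat_{t_i}) for a
    piecewise-constant yhat taking value values[k] on (breaks[k], breaks[k+1]]
    and arrival epochs t_i in (0, T], with breaks[0] = 0 and breaks[-1] = T."""
    breaks = np.asarray(breaks, dtype=float)
    values = np.asarray(values, dtype=float)
    integral = np.sum(values * np.diff(breaks))
    idx = np.searchsorted(breaks, arrivals, side="left") - 1   # interval of each epoch
    with np.errstate(divide="ignore"):
        logs = np.log(values[idx])            # -inf if an epoch is not "covered"
    return integral - np.sum(logs)

arrivals = [0.7, 2.1, 2.3]                    # epochs of y_0^T on (0, 4]
# Binary-valued reconstruction: reduces to the covering distortion (= 1.5 here).
print(functional_covering_distortion([0, 0.5, 1, 2, 3, 4], [0, 1, 0, 1, 0], arrivals))
# General non-negative reconstruction (a graded belief about where points lie).
print(functional_covering_distortion([0, 1, 3, 4], [0.2, 1.5, 0.2], arrivals))
```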

The relation of the functional-covering distortion measure to logarithmic loss is as follows. If we constrain \hat{y}_{0}^{T} to be bounded, then we can use a Girsanov-type transformation [39, Chapter VI, Theorems T2-T4] to define a probability measure on the set of all counting processes using \hat{y}_{0}^{T}, and the distortion can be defined as the expectation of the negative logarithm of the Radon-Nikodym derivative between this probability measure and an appropriately chosen reference measure, evaluated at the source realization; this is equivalent to (1). However, we will allow \hat{Y}_{0}^{T} to be unbounded, requiring only integrability: \mathbb{E}[\int_{0}^{T}\hat{Y}_{t}\,dt]<\infty.

The relation to intensity theory is as follows. Heuristically, given a random variable MM, the intensity of a point process represented by a counting function Y0TY_{0}^{T} is a non-negative process Γ0T\Gamma_{0}^{T} such that P(Yt+ΔYt=1|M,Y0t)ΓtΔP(Y_{t+\Delta}-Y_{t}=1|M,Y_{0}^{t})\approx\Gamma_{t}\Delta (see Definition 2 for the precise statement). From (1), we expect any optimal Y^0T\hat{Y}_{0}^{T} (in the rate-distortion trade-off sense) to be related to the intensity of Y0TY_{0}^{T}. In fact, we will see in the proof of Theorem 4 that an optimal reconstruction Y^0T\hat{Y}_{0}^{T} is the intensity of Y0TY_{0}^{T} given the encoder’s output.

Beyond the introduction of the functional-covering distortion measure and the accompanying coding theorems, the paper provides a collection of results for the information-theoretic analysis of point processes, which may be of independent use. One such contribution is Theorem 1, in which we derive the mutual information between point processes with intensities and arbitrary random variables. This is the most general expression available for mutual informations involving point processes with intensities; Theorem 1 subsumes the existing formulae for mutual informations involving doubly stochastic Poisson processes [40, 41, 42] and queuing processes [43] as special cases. The other contributions of the paper are as follows. We obtain the rate-distortion trade-off with feedforward under the functional-covering distortion measure for point processes that admit intensities (see Theorem 4). For Poisson processes, we obtain the rate-distortion region when the reconstruction function \hat{y}_{0}^{T} is constrained to take values in a subset of the reals (Theorem 5). The covering distortion in [35, Theorem 1] is a special case of this constrained functional-covering distortion, hence the rate-distortion function in [35] can be obtained as a special case of this theorem. We characterize the rate-distortion region for a two-encoder Poisson CEO problem (see Figure 1) under functional-covering distortion in Theorem 6. To prove the converse of the CEO problem, we derive a strong data processing inequality for Poisson processes under superposition (see Theorem 2), which complements the strong data processing inequality for Poisson processes under thinning due to Wang [44]. We also provide a self-contained proof of Wang's theorem in Theorem 3. The solution to the CEO problem gives the rate-distortion trade-off for remote Poisson sources as an immediate corollary.

II Preliminaries

We will consider a probability space (\Omega,\mathcal{F},P) on which all stochastic processes considered here are defined. For a finite T>0, let (\mathcal{F}_{t}:t\in[0,T]) be an increasing family of \sigma-fields with \mathcal{F}_{T}\subseteq\mathcal{F}. We will assume that the given filtration (\mathcal{F}_{t}:t\in[0,T]), P, and \mathcal{F} satisfy the “usual conditions” [39, Chapter III, p. 75]: \mathcal{F} is complete with respect to P, the filtration (\mathcal{F}_{t}:t\in[0,T]) is right continuous, and \mathcal{F}_{0} contains all the P-null sets of \mathcal{F}. Stochastic processes are denoted as \hat{Y}_{0}^{T}=\{\hat{Y}_{t}:0\leq t\leq T\}. The process X_{0}^{T} is said to be adapted to the history (\mathcal{F}_{t}:t\in[0,T]) if X_{t} is \mathcal{F}_{t}-measurable for all t\in[0,T]. The internal history recorded by the process X_{0}^{T} is denoted by \mathcal{F}^{X}_{t}=\sigma(X_{s}:s\in[0,t]), where \sigma(A) denotes the \sigma-field generated by A.

A process X0TX_{0}^{T} is called (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-predictable if X0X_{0} is 0\mathcal{F}_{0} measurable and the mapping (t,ω)Xt(ω)(t,\omega)\to X_{t}(\omega) defined from (0,T)×Ω(0,T)\times\Omega into \mathbb{R} (the set of real numbers) is measurable with respect to the σ\sigma-field over (0,T)×Ω(0,T)\times\Omega generated by rectangles of the form

(s,t]×A;0<stT,As.\displaystyle(s,t]\times A;\quad 0<s\leq t\leq T,\quad A\in\mathcal{F}_{s}. (2)

For two measurable spaces (Ω1,1)(\Omega_{1},\mathcal{F}_{1}) and (Ω2,2)(\Omega_{2},\mathcal{F}_{2}), the product space is denoted by (Ω1×Ω2,12)(\Omega_{1}\times\Omega_{2},\mathcal{F}_{1}\otimes\mathcal{F}_{2}). We say that ABCA\rightleftarrows B\rightleftarrows C forms a Markov chain under measure PP if AA and CC are conditionally independent given BB under PP. PQP\ll Q denotes that the probability measure PP is absolutely continuous with respect to the measure QQ. 1{𝖤}\textbf{1}\{\mathsf{E}\} denotes the indicator function for an event 𝖤\mathsf{E}. log(x)\log(x) is the natural logarithm of xx. (x)+(x)^{+} and (x)(x)^{-} denote the positive (max(x,0)\max(x,0)) and the negative part (min(x,0)-\min(x,0)) of xx respectively. x\lceil x\rceil denotes the ceiling of xx. Throughout this paper we will adopt the convention that 0log(0)=00\log(0)=0, exp(log(0))=0\exp(\log(0))=0, and 00=10^{0}=1.

Definition 1

ϕ(x)=xlog(x)\phi(x)=x\log(x) with convention that 0log(0)=00\log(0)=0.

We note that ϕ(x)\phi(x) is convex.

We will use the following form of Jensen’s inequality [45, Theorem 7.9, p. 149] and [45, Theorem 8.20, p. 177].

Lemma 1

If f(x)f(x) is a convex function and 𝔼|X|<\mathbb{E}|X|<\infty then 𝔼[f(X)]\mathbb{E}[f(X)] exists and for any two σ\sigma-fields AA and BB,

𝔼[f(X)]𝔼[f(𝔼[X|A,B])]𝔼[f(𝔼[X|A])]f(𝔼[X]).\displaystyle\mathbb{E}[f(X)]\geq\mathbb{E}[f(\mathbb{E}[X|A,B])]\geq\mathbb{E}[f(\mathbb{E}[X|A])]\geq f(\mathbb{E}[X]).

We now recall the definition of mutual information for general ensembles and its properties. Let AA, BB, and CC be measurable mappings defined on a given probability space (Ω,,P)(\Omega,\mathcal{F},P), taking values in (𝒜,𝔉A)(\mathcal{A},\mathfrak{F}^{A}), (,𝔉B)(\mathcal{B},\mathfrak{F}^{B}), and (𝒞,𝔉C)(\mathcal{C},\mathfrak{F}^{C}) respectively. Consider partitions of Ω\Omega, 𝔔A={𝙰i,1iNA}σ(A)\mathfrak{Q}_{A}=\left\{\mathtt{A}_{i},1\leq i\leq N_{A}\right\}\subseteq\sigma(A) and 𝔔B={𝙱j,1jNB}σ(B)\mathfrak{Q}_{B}=\left\{\mathtt{B}_{j},1\leq j\leq N_{B}\right\}\subseteq\sigma(B). Wyner defined the conditional mutual information I(A;B|C)I(A;B|C) as [46]

I(A;B|C)=sup𝔔A,𝔔B𝔼[i,j=1,1NA,NBP(𝙰i,𝙱j|C)log(P(𝙰i,𝙱j|C)P(𝙰i|C)P(𝙱j|C))],\displaystyle I(A;B|C)=\sup_{\mathfrak{Q}_{A},\mathfrak{Q}_{B}}\mathbb{E}\left[\sum_{i,j=1,1}^{N_{A},N_{B}}P(\mathtt{A}_{i},\mathtt{B}_{j}|C)\log\left(\frac{P(\mathtt{A}_{i},\mathtt{B}_{j}|C)}{P(\mathtt{A}_{i}|C)P(\mathtt{B}_{j}|C)}\right)\right], (3)

where the supremum is over all such partitions of \Omega. Wyner showed that I(A;B|C)\geq 0 with equality if and only if A\rightleftarrows C\rightleftarrows B forms a Markov chain [46, Lemma 3.1], and that (what is generally referred to as) Kolmogorov's formula holds [46, Lemma 3.2]

I(A,C;B)=I(A;B)+I(C;B|A).\displaystyle I(A,C;B)=I(A;B)+I(C;B|A). (4)

Hence if I(A;B)<I(A;B)<\infty, then I(C;B|A)=I(A,C;B)I(A;B)I(C;B|A)=I(A,C;B)-I(A;B). The data processing inequality can be obtained from (4) as well: if ACBA\rightleftarrows C\rightleftarrows B forms a Markov chain, then I(A;B)I(C;B)I(A;B)\leq I(C;B).

Denote by PA,BP^{A,B}, the joint distribution of AA and BB on the space (𝒜×,𝔉A𝔉B\mathcal{A}\times\mathcal{B},\mathfrak{F}^{A}\otimes\mathfrak{F}^{B} ), i.e.,

\displaystyle P^{A,B}(dA\times dB)=P(A^{-1}(dA)\cap B^{-1}(dB)),\quad dA\in\mathfrak{F}^{A},dB\in\mathfrak{F}^{B}.

Similarly, PAP^{A} and PBP^{B} denote the marginal distributions. Gelfand and Yaglom [47] proved that if PA,BPA×PBP^{A,B}\ll P^{A}\times P^{B}, then the mutual information I(A;B)I(A;B) (defined via (3) by taking σ(C)\sigma(C) to be the trivial σ\sigma-field) can be computed as:

I(A;B)=𝔼[log(dPA,Bd(PA×PB))].\displaystyle I(A;B)=\mathbb{E}\left[\log\left(\frac{dP^{A,B}}{d(P^{A}\times P^{B})}\right)\right]. (5)

A sufficient condition for PA,BPA×PBP^{A,B}\ll P^{A}\times P^{B} is that I(A;B)<I(A;B)<\infty [48, Lemma 5.2.3, p. 92]. We will also require the following result [46, Lemma 2.1]:

Lemma 2 (Wyner’s Lemma)

If MM is a finite alphabet random variable, then

I(M;U0T)=H(M)𝔼[H(M|U0T)],\displaystyle I(M;U_{0}^{T})=H(M)-\mathbb{E}\left[H(M|U_{0}^{T})\right],

where

H(M|U0T)=mP(M=m|U0T)log(P(M=m|U0T)),\displaystyle H(M|U_{0}^{T})=-\sum_{m}P(M=m|U_{0}^{T})\log\left(P(M=m|U_{0}^{T})\right),

and H(M)H(M) is the entropy of MM.

III Point Processes, Intensities, and Mutual Information

Let \mathcal{N}_{0}^{T} denote the set of counting realizations (or point-process realizations) on [0,T]; i.e., if {N}^{T}_{0}\in\mathcal{N}_{0}^{T}, then N_{t}\in\mathbf{N} (the set of non-negative integers) for every t\in[0,T], and N_{0}^{T} is right continuous, has unit increasing jumps, and satisfies N_{0}=0. Let \mathfrak{F}^{N} be the restriction to \mathcal{N}_{0}^{T} of the \sigma-field generated by the Skorohod topology on D[0,T].

Definition 2

If N0TN_{0}^{T} is a counting process adapted to the history (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T]), then N0TN_{0}^{T} is said to have (P,t:t[0,T])(P,\mathcal{F}_{t}:t\in[0,T])-intensity Γ0T=(Γt:t[0,T]){\Gamma}_{0}^{T}=(\Gamma_{t}:t\in[0,T]), where Γ0T\Gamma_{0}^{T} is a non-negative measurable process if

  • Γ0T\Gamma_{0}^{T} is (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-predictable,

  • 0TΓt𝑑t<\int_{0}^{T}\Gamma_{t}\,dt<\infty, PP-a.s.,

  • and for all non-negative (\mathcal{F}_{t}:t\in[0,T])-predictable processes C_{0}^{T} (the limits of the Lebesgue-Stieltjes integral \int_{a}^{b} should be interpreted as \int_{(a,b]}):

    𝔼[0TCs𝑑Ns]=𝔼[0TCsΓs𝑑s].\displaystyle\mathbb{E}\left[\int_{0}^{T}C_{s}\,d{N}_{s}\right]=\mathbb{E}\left[\int_{0}^{T}C_{s}\Gamma_{s}\,ds\right].

When it is clear from the context, we will drop the probability measure PP from the notation and say N0TN_{0}^{T} has (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-intensity Γ0T\Gamma_{0}^{T}.

Definition 3

A point process Y_{0}^{T} is said to be a Poisson process with rate \lambda if its (\mathcal{F}_{t}^{Y}:t\in[0,T])-intensity is (\lambda:t\in[0,T]).

The above definition can be shown to imply the usual definition of Poisson process [39, Theorem T4, Chapter II, p. 25] and vice versa [39, Section 2, Chapter II, p. 23].
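The equivalence with the usual construction also gives a direct way to simulate such a process; a minimal sketch (arbitrary rate and horizon) that generates a rate-\lambda realization from i.i.d. Exponential(\lambda) inter-arrival times follows.

```python
import numpy as np

def sample_poisson_process(lam, T, rng):
    """Arrival epochs of a rate-lam Poisson process on (0, T], generated from
    i.i.d. Exponential(lam) inter-arrival times (the "usual" definition)."""
    arrivals = []
    t = rng.exponential(1.0 / lam)
    while t <= T:
        arrivals.append(t)
        t += rng.exponential(1.0 / lam)
    return np.array(arrivals)

rng = np.random.default_rng(1)
y = sample_poisson_process(lam=2.0, T=5.0, rng=rng)
print(len(y), y[:3])      # the count Y_T is Poisson(lam * T) distributed
```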

Definition 4

P0Y0TP_{0}^{Y_{0}^{T}} denotes the distribution of a point process Y0TY_{0}^{T} (on the space (𝒩0T,𝔉N)(\mathcal{N}_{0}^{T},\mathfrak{F}^{N})) under which Y0TY_{0}^{T} is a Poisson process with unit rate.

A point process with a stochastic intensity and the unit-rate Poisson process are linked via the following result.

Lemma 3

Let PY0TP^{Y_{0}^{T}} be the distribution of a point process Y0TY_{0}^{T} such that PY0TP0Y0TP^{Y_{0}^{T}}\ll P_{0}^{Y_{0}^{T}}. Then there exists a non-negative predictable process Λ0T\Lambda_{0}^{T} such that

dPY0TdP0Y0T=exp(0Tlog(Λt)𝑑YtΛt+1dt).\displaystyle\frac{dP^{Y_{0}^{T}}}{dP_{0}^{Y_{0}^{T}}}=\exp\left(\int_{0}^{T}\log(\Lambda_{t})\,dY_{t}-\Lambda_{t}+1\,dt\right).

Moreover, the (PY0T,(tY:t[0,T]))(P^{Y_{0}^{T}},(\mathcal{F}_{t}^{Y}:t\in[0,T]))-intensity of Y0TY_{0}^{T} is Λ0T\Lambda_{0}^{T}. Conversely, if the (PY0T,tY:t[0,T])(P^{Y_{0}^{T}},\mathcal{F}_{t}^{Y}:t\in[0,T])-intensity of Y0TY_{0}^{T} is Γ0T\Gamma_{0}^{T} and 𝔼PY0T[0T|ϕ(Γt)|𝑑t]<\mathbb{E}_{P^{Y_{0}^{T}}}[\int_{0}^{T}|\phi(\Gamma_{t})|\,dt]<\infty, then PY0TP0Y0TP^{Y_{0}^{T}}\ll P_{0}^{Y_{0}^{T}}, and the corresponding Radon-Nikodym derivative is given by the above expression, where

𝔼PY0T[0T|ΓtΛt|𝑑t]=0,𝔼PY0T[0T𝟏{ΓtΛt}𝑑Yt]=0.\displaystyle\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}|\Gamma_{t}-\Lambda_{t}|\,dt\right]=0,\quad\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}\mathbf{1}\{\Gamma_{t}\neq\Lambda_{t}\}\,dY_{t}\right]=0.

In the latter case,

𝔼PY0T[log(dPY0TdP0Y0T)]=𝔼PY0T[0Tϕ(Γt)Γt+1dt].\displaystyle\mathbb{E}_{P^{Y_{0}^{T}}}\left[\log\left(\frac{dP^{Y_{0}^{T}}}{dP_{0}^{Y_{0}^{T}}}\right)\right]=\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}\phi(\Gamma_{t})-\Gamma_{t}+1\,dt\right].
Proof:

Please see the supplementary material. ∎
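As a quick numerical check of the last identity in Lemma 3 (a sketch assuming a constant intensity \Lambda_{t}\equiv\lambda, with arbitrary parameter values), note that for a homogeneous rate-\lambda path the Radon-Nikodym derivative above reduces to \exp(Y_{T}\log\lambda+(1-\lambda)T), so the empirical mean of its logarithm should match T(\phi(\lambda)-\lambda+1).

```python
import numpy as np

rng = np.random.default_rng(0)
lam, T, trials = 2.0, 10.0, 200000

# For Lambda_t = lam, Lemma 3 gives log(dP/dP0) = Y_T*log(lam) + (1 - lam)*T.
counts = rng.poisson(lam * T, size=trials)
log_lr = counts * np.log(lam) + (1.0 - lam) * T

phi = lambda x: x * np.log(x)
print(log_lr.mean())                   # Monte Carlo estimate of E[log(dP/dP0)]
print(T * (phi(lam) - lam + 1.0))      # predicted value T*(phi(lam) - lam + 1)
```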

The following theorem allows us to express the mutual information between a point process that admits an intensity and an arbitrary random variable in terms of the intensity functions. The proof of the theorem is similar to the proof of Theorem 1 in [42].

Theorem 1

Let Y0TY_{0}^{T} be a point process with (tY:t[0,T])(\mathcal{F}_{t}^{Y}:t\in[0,T])-intensity Λ0T\Lambda_{0}^{T} such that

𝔼[0T|ϕ(Λt)|𝑑t]<,\mathbb{E}[\int_{0}^{T}|\phi(\Lambda_{t})|\,dt]<\infty,

and let MM be a measurable mapping on the given probability space satisfying I(M;Y0T)<I(M;Y_{0}^{T})<\infty. Then there exists a process Γ0T\Gamma_{0}^{T} such that Γ0T\Gamma_{0}^{T} is the (𝒢t=σ(M,Y0t):t[0,T])(\mathcal{G}_{t}=\sigma(M,Y_{0}^{t}):t\in[0,T]) intensity of Y0TY_{0}^{T} and

I(M;Y0T)=𝔼[0Tϕ(Γt)ϕ(Λt)dt].\displaystyle I(M;Y_{0}^{T})=\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})-\phi(\Lambda_{t})\,dt\right].
Proof:

Let PM,Y0TP^{M,Y_{0}^{T}} denote the joint distribution of MM and Y0TY_{0}^{T}, and PMP^{M} and PY0TP^{Y_{0}^{T}} denote their marginals, respectively. Since I(M;Y0T)<I(M;Y_{0}^{T})<\infty, we get that PM,Y0TPM×PY0TP^{M,Y_{0}^{T}}\ll P^{M}\times P^{Y_{0}^{T}} [48, Lemma 5.2.3, p. 92]. Lemma 3 says that PY0TP0Y0TP^{Y_{0}^{T}}\ll P_{0}^{Y_{0}^{T}}, which together with [49, Chapter 1, Exercise 19, p. 22] gives PM,Y0TPM×PY0TPM×P0Y0TP^{M,Y_{0}^{T}}\ll P^{M}\times P^{Y_{0}^{T}}\ll P^{M}\times P_{0}^{Y_{0}^{T}}.

Let P~M,Y0TPM×P0Y0T\tilde{P}^{M,Y_{0}^{T}}\triangleq{P}^{M}\times P_{0}^{Y_{0}^{T}} and

=dPM,Y0TdP~M,Y0T\displaystyle\mathcal{L}=\frac{d{P}^{M,Y_{0}^{T}}}{d\tilde{P}^{M,Y_{0}^{T}}} (6)

denote the Radon-Nikodym derivative. Since under P~M,Y0T\tilde{P}^{M,Y_{0}^{T}}, MM and Y0TY_{0}^{T} are independent, we note that the (P~M,Y0T,(𝒢t:t[0,T]))(\tilde{P}^{M,Y_{0}^{T}},(\mathcal{G}_{t}:t\in[0,T]))-intensity of Y0TY_{0}^{T} is 1 [39, E5 Exercise, Chapter II, p. 28]. Define the process L0TL_{0}^{T} as

Lt=𝔼P~[|𝒢t],t[0,T],\displaystyle L_{t}=\mathbb{E}_{\tilde{P}}[\mathcal{L}|\mathcal{G}_{t}],\quad t\in[0,T], (7)

where 𝔼P~\mathbb{E}_{\tilde{P}} denotes that the conditional expectation is taken with respect to the measure P~M,Y0T\tilde{P}^{M,Y_{0}^{T}}. Then L0TL_{0}^{T} is a (P~M,Y0T,𝒢t)(\tilde{P}^{M,Y_{0}^{T}},\mathcal{G}_{t}) non-negative absolutely-integrable martingale.

By the martingale representation theorem, the process L0TL_{0}^{T} can be written as [39, Chapter III, Theorem T17, p. 76] (where we have taken σ(M)\sigma(M) to be the “germ σ\sigma-field”):

Lt=1+0tKs(dYsds),\displaystyle L_{t}=1+\int_{0}^{t}K_{s}(d{Y}_{s}-ds),

where K0TK_{0}^{T} is a (𝒢t:t[0,T])(\mathcal{G}_{t}:t\in[0,T])-predictable process which satisfies 0T|Kt|𝑑t<\int_{0}^{T}|K_{t}|\,dt<\infty P~M,Y0T\tilde{P}^{M,Y_{0}^{T}}-a.s. Applying [50, Lemma 19.5, p. 315], we can write L0TL_{0}^{T} as

Lt=exp(0tlog(Γs)𝑑Ys+(1Γs)ds),t[0,T],\displaystyle L_{t}=\exp\left(\int_{0}^{t}\log(\Gamma_{s})\,dY_{s}+(1-\Gamma_{s})\,ds\right),\quad t\in[0,T], (8)

where Γ0T\Gamma_{0}^{T} is a non-negative (𝒢t:t[0,T])(\mathcal{G}_{t}:t\in[0,T])-predictable process, and Γt<\Gamma_{t}<\infty P~M,Y0T\tilde{P}^{M,Y_{0}^{T}}-a.s. for t[0,T]t\in[0,T].

Now we can mimic the proof of [39, Chapter VI, Theorems T3, p. 166] to deduce:

Lemma 4

For all non-negative (𝒢t:t[0,T])(\mathcal{G}_{t}:t\in[0,T])-predictable processes C0TC_{0}^{T}

𝔼[0TCtΓt𝑑t]=𝔼[0TCt𝑑Yt],\displaystyle\mathbb{E}\left[\int_{0}^{T}C_{t}\Gamma_{t}\,dt\right]=\mathbb{E}\left[\int_{0}^{T}C_{t}\,dY_{t}\right],

where the expectation is taken with respect to measure PP.

Proof:

Please see the supplementary material. ∎

Taking Ct=1C_{t}=1 in the above equality yields

𝔼[0TΓt𝑑t]=𝔼[0T𝑑Yt]=𝔼[0TΛt𝑑t]<.\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt\right]=\mathbb{E}\left[\int_{0}^{T}\,dY_{t}\right]=\mathbb{E}\left[\int_{0}^{T}\Lambda_{t}\,dt\right]<\infty. (9)

Hence 0TΓt𝑑t<\int_{0}^{T}\Gamma_{t}\,dt<\infty PP-a.s. and we conclude that the (PM,Y0T,𝒢t:t[0,T])({P}^{M,Y_{0}^{T}},\mathcal{G}_{t}:t\in[0,T])-intensity of Y0TY_{0}^{T} is Γ0T\Gamma_{0}^{T}.

Now we will use:

Lemma 5
𝔼[0Tlog(Γt)𝑑Yt]=𝔼[0Tϕ(Γt)𝑑t].\displaystyle\mathbb{E}\left[\int_{0}^{T}\log(\Gamma_{t})\,dY_{t}\right]=\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]. (10)
Proof:

Please see the supplementary material. ∎

Since \mathbb{E}\left[\log\left(\frac{dP^{M,Y_{0}^{T}}}{d\tilde{P}^{M,Y_{0}^{T}}}\right)\right] is well-defined, (6), (7), and (8) yield

\displaystyle\mathbb{E}\left[\log\left(\frac{dP^{M,Y_{0}^{T}}}{d\tilde{P}^{M,Y_{0}^{T}}}\right)\right] =\mathbb{E}\left[\log(L_{T})\right]
=𝔼[0Tlog(Γt)𝑑Yt+(1Γt)dt]\displaystyle=\mathbb{E}\left[\int_{0}^{T}\log(\Gamma_{t})\,dY_{t}+(1-\Gamma_{t})\,dt\right]
=𝔼[0Tϕ(Γt)𝑑t]+𝔼[0T(1Λt)𝑑t],\displaystyle=\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]+\mathbb{E}\left[\int_{0}^{T}(1-\Lambda_{t})\,dt\right], (11)

where in the last line we have used Lemma 5 and 𝔼[0TΓt𝑑t]=𝔼[0TΛt𝑑t]<\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt\right]=\mathbb{E}\left[\int_{0}^{T}\Lambda_{t}\,dt\right]<\infty from (9). Also,

\displaystyle\mathbb{E}\left[\log\left(\frac{d(P^{M}\times P^{Y_{0}^{T}})}{d\tilde{P}^{M,Y_{0}^{T}}}\right)\right] =\mathbb{E}\left[\log\left(\frac{dP^{Y_{0}^{T}}}{dP_{0}^{Y_{0}^{T}}}\right)\right]
=(a)𝔼[0Tϕ(Λt)+1Λtdt]\displaystyle\overset{(a)}{=}\mathbb{E}\left[\int_{0}^{T}\phi(\Lambda_{t})+1-\Lambda_{t}\,dt\right] (12)
<,\displaystyle<\infty,

where we have used Lemma 3 for (a). Using the above inequality and the fact that

\mathbb{E}\left[\log\left(\frac{dP^{M,Y_{0}^{T}}}{d\tilde{P}^{M,Y_{0}^{T}}}\right)\right]

is well-defined, we can express the mutual information as

I(M;Y0T)\displaystyle I(M;Y_{0}^{T}) =𝔼[log(dPM,Y0Td(PM×PY0T))]\displaystyle=\mathbb{E}\left[\log\left(\frac{d{P}^{M,Y_{0}^{T}}}{d({P}^{M}\times P^{Y_{0}^{T}})}\right)\right]
\displaystyle=\mathbb{E}\left[\log\left(\frac{dP^{M,Y_{0}^{T}}}{d\tilde{P}^{M,Y_{0}^{T}}}\right)\right]-\mathbb{E}\left[\log\left(\frac{d(P^{M}\times P^{Y_{0}^{T}})}{d\tilde{P}^{M,Y_{0}^{T}}}\right)\right]. (13)

Now we can compute the mutual information from (11), (12), and (13),

I(M;Y0T)\displaystyle I(M;Y_{0}^{T}) =𝔼[0Tϕ(Γt)𝑑t]+𝔼[0T(1Λt)𝑑t]𝔼[0Tϕ(Λt)𝑑t]𝔼[0T1Λtdt]\displaystyle=\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]+\mathbb{E}\left[\int_{0}^{T}(1-\Lambda_{t})\,dt\right]-\mathbb{E}\left[\int_{0}^{T}\phi(\Lambda_{t})\,dt\right]-\mathbb{E}\left[\int_{0}^{T}1-\Lambda_{t}\,dt\right]
=𝔼[0Tϕ(Γt)𝑑t]𝔼[0Tϕ(Λt)𝑑t]\displaystyle=\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]-\mathbb{E}\left[\int_{0}^{T}\phi(\Lambda_{t})\,dt\right]
=𝔼[0Tϕ(Γt)ϕ(Λt)dt].\displaystyle=\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})-\phi(\Lambda_{t})\,dt\right].
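Theorem 1 can be checked numerically in a simple special case. In the sketch below (the parameter choices are arbitrary), M is uniform on two values and, given M, Y_{0}^{T} is Poisson with rate M, so that \Gamma_{t}=M and \Lambda_{t}=\mathbb{E}[M|\mathcal{F}^{Y}_{t-}]; the Monte Carlo estimate of \mathbb{E}[\int_{0}^{T}\phi(\Gamma_{t})-\phi(\Lambda_{t})\,dt] is compared against a direct computation of I(M;Y_{T}), which equals I(M;Y_{0}^{T}) here because Y_{T} is a sufficient statistic for M.

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(0)
a, b, T = 0.5, 2.0, 4.0                  # M equals a or b with probability 1/2 each
grid = np.linspace(0.0, T, 2001)         # time grid for the integral in Theorem 1
phi = lambda x: x * np.log(x)

def theorem1_integral(rng):
    """One sample of int_0^T (phi(Gamma_t) - phi(Lambda_t)) dt for the mixed
    Poisson process: Gamma_t = M and Lambda_t = E[M | Y_0^{t-}]."""
    m = a if rng.random() < 0.5 else b
    arrivals = np.sort(rng.uniform(0.0, T, size=rng.poisson(m * T)))
    counts = np.searchsorted(arrivals, grid, side="left")       # Y_{t-} on the grid
    # posterior weight of {M = a} given Y_0^{t-} (log-likelihoods w.r.t. unit rate)
    la = counts * np.log(a) - a * grid
    lb = counts * np.log(b) - b * grid
    wa = 1.0 / (1.0 + np.exp(lb - la))
    lam = wa * a + (1.0 - wa) * b                                # Lambda_t
    return np.mean(phi(m) - phi(lam)) * T

mc = np.mean([theorem1_integral(rng) for _ in range(20000)])

# Direct value of I(M; Y_T) from the two Poisson pmfs (truncated at n = 60).
n = np.arange(0, 61)
log_fact = np.array([lgamma(k + 1.0) for k in n])
pa = np.exp(n * np.log(a * T) - a * T - log_fact)
pb = np.exp(n * np.log(b * T) - b * T - log_fact)
pmix = 0.5 * (pa + pb)
direct = 0.5 * np.sum(pa * np.log(pa / pmix)) + 0.5 * np.sum(pb * np.log(pb / pmix))
print(mc, direct)                        # the two numbers should be close
```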

We shall require several strong data processing inequalities, for which purpose we now derive some ancillary results regarding the intensity of a point process. Combining [39, T8 Theorem, Chapter II, p. 27] and [39, T9 Theorem, Chapter II, p. 28], we can conclude the following result.

Lemma 6

Let Γ0T\Gamma_{0}^{T} be a (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-predictable non-negative process satisfying

0TΓt𝑑t<a.s.\int_{0}^{T}\Gamma_{t}\,dt<\infty\quad\text{a.s.}

Let Y0TY_{0}^{T} be a point process adapted to (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T]). Then Γ0T\Gamma_{0}^{T} is the (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-intensity of Y0TY_{0}^{T} if and only if

MtYt0tΓs𝑑st[0,T]M_{t}\triangleq Y_{t}-\int_{0}^{t}\Gamma_{s}\,ds\quad t\in[0,T]

is a (\mathcal{F}_{t}:t\in[0,T])-local martingale. (A process Y_{0}^{T} is called a local martingale with respect to a filtration (\mathcal{F}_{t}:t\geq 0) if Y_{t} is \mathcal{F}_{t}-measurable for each t\in[0,T] and there exists an increasing sequence of stopping times T_{n} such that T_{n}\to\infty and the stopped and shifted processes (Y_{\min\{t,T_{n}\}}-Y_{0}:t\in[0,T]) are (\mathcal{F}_{t}:t\in[0,T])-martingales for each n.)

If we impose the stricter condition of finite expectation 𝔼[0TΓt𝑑t]<\mathbb{E}[\int_{0}^{T}\Gamma_{t}\,dt]<\infty, the local martingale condition in the above statement can be replaced by the martingale condition.

Lemma 7

Let Γ0T\Gamma_{0}^{T} be a (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-predictable non-negative process satisfying

𝔼[0TΓt𝑑t]<.\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt\right]<\infty.

Let Y0TY_{0}^{T} be a point process adapted to (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T]). Then Γ0T\Gamma_{0}^{T} is the (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-intensity of Y0TY_{0}^{T} if and only if

MtYt0tΓs𝑑st[0,T]M_{t}\triangleq Y_{t}-\int_{0}^{t}\Gamma_{s}\,ds\quad t\in[0,T]

is a (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-martingale.

Proof:

Please see the supplementary material. ∎

Lemma 8

If a point process N0TN_{0}^{T} has (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-intensity Λ0T{\Lambda}_{0}^{T}, and (𝒢t:t[0,T])(\mathcal{G}_{t}:t\in[0,T]) is another history for N0TN_{0}^{T} such that 𝒢tt\mathcal{G}_{t}\subseteq\mathcal{F}_{t} for each t[0,T]t\in[0,T], then there exists a process Π0T\Pi_{0}^{T} such that Π0T\Pi_{0}^{T} is the (𝒢t:t[0,T])(\mathcal{G}_{t}:t\in[0,T])-intensity of N0T{N}_{0}^{T}, and for each t[0,T]t\in[0,T], Πt=𝔼[Λt|𝒢t]\Pi_{t}=\mathbb{E}[{\Lambda}_{t}|\mathcal{G}_{t-}] PP-a.s.

Proof:

Please see the supplementary material. ∎

Lemma 9

Let Y_{0}^{T} be a point process with (\mathcal{G}_{t}\triangleq\sigma(M,Y_{0}^{t}):t\in[0,T])-intensity \Gamma_{0}^{T} for some M. Let Z_{0}^{T} be obtained by adding to Y_{0}^{T} an independent (of both M and Y_{0}^{T}) point process N_{0}^{T} with (\mathcal{F}_{t}^{N}:t\in[0,T])-intensity \Pi_{0}^{T}. Then Z_{0}^{T} has a (\mathcal{F}_{t}\triangleq\sigma(M,Z_{0}^{t}):t\in[0,T])-intensity \Theta_{0}^{T} which satisfies \Theta_{t}=\mathbb{E}[(\Gamma_{t}+\Pi_{t})|\mathcal{F}_{t-}] P-a.s. for each t\in[0,T].

Proof:

Please see the supplementary material. ∎

Theorem 2

Let Y0TY_{0}^{T} be a Poisson process with rate λ\lambda, MM be such that I(M;Y0T)<I(M;Y_{0}^{T})<\infty, and Γ0T\Gamma_{0}^{T} be the (σ(M;Y0t):t[0,T])(\sigma(M;Y_{0}^{t}):t\in[0,T])-intensity of Y0TY_{0}^{T}. Suppose Z0TZ_{0}^{T} is obtained by adding an independent (of Y0TY_{0}^{T} and MM) Poisson process with rate μ\mu to Y0TY_{0}^{T}. Then,

I(M;Y0T)\displaystyle I(M;Y_{0}^{T}) =𝔼[0Tϕ(Γt)ϕ(λ)dt],\displaystyle=\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})-\phi(\lambda)\,dt\right],
I(M;Z0T)\displaystyle I(M;Z_{0}^{T}) 𝔼[0Tϕ(Γt+μ)ϕ(λ+μ)dt].\displaystyle\leq\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t}+\mu)-\phi(\lambda+\mu)\,dt\right].
Proof:

Since MY0TZ0TM\leftrightarrows Y_{0}^{T}\leftrightarrows Z_{0}^{T} forms a Markov chain, the data processing inequality gives I(M;Z0T)I(M;Y0T)<I(M;Z_{0}^{T})\leq I(M;Y_{0}^{T})<\infty. Applying Theorem 1 and using the uniqueness of intensities,

I(M;Y0T)\displaystyle I(M;Y_{0}^{T}) =𝔼[0Tϕ(Γt)ϕ(λ)dt],and\displaystyle=\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})-\phi(\lambda)\,dt\right],\quad\text{and}
I(M;Z0T)\displaystyle I(M;Z_{0}^{T}) =𝔼[0Tϕ(Γ^t)ϕ(λ^t)dt],\displaystyle=\mathbb{E}\left[\int_{0}^{T}\phi(\hat{\Gamma}_{t})-\phi(\hat{\lambda}_{t})\,dt\right], (14)

where Γ^0T\hat{\Gamma}_{0}^{T} and λ^0T\hat{\lambda}_{0}^{T} are the (σ(M;Z0t):t[0,T])(\sigma(M;Z_{0}^{t}):t\in[0,T]) and (tZ:t[0,T])(\mathcal{F}_{t}^{Z}:t\in[0,T])-intensities of Z0TZ_{0}^{T}. Due to the uniqueness of the intensities and Lemma 9, we get for each t[0,T]t\in[0,T], Γ^t=𝔼[Γt|M,Z0t]+μ\hat{\Gamma}_{t}=\mathbb{E}[\Gamma_{t}|M,Z_{0}^{t-}]+\mu, and λ^t=λ+μ\hat{\lambda}_{t}=\lambda+\mu. Substituting this in (14) and applying Jensen’s inequality yields

I(M;Z0T)\displaystyle I(M;Z_{0}^{T}) =𝔼[0Tϕ(𝔼[Γt|M,Z0t]+μ)ϕ(λ+μ)dt],\displaystyle=\mathbb{E}\left[\int_{0}^{T}\phi(\mathbb{E}[\Gamma_{t}|M,Z_{0}^{t-}]+\mu)-\phi(\lambda+\mu)\,dt\right],
𝔼[0Tϕ(Γt+μ)ϕ(λ+μ)dt].\displaystyle\leq\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t}+\mu)-\phi(\lambda+\mu)\,dt\right].

Definition 5

A point process Z0TZ_{0}^{T} is said to be obtained from pp-thinning of a point process Y0TY_{0}^{T}, if each point in Y0TY_{0}^{T} is deleted with probability pp, independent of all other points and deletions.

Lemma 10

Suppose that Y0TY_{0}^{T} is a point process with 𝒢tσ(M,Y0t)\mathcal{G}_{t}\triangleq\sigma(M,Y_{0}^{t})-intensity Γ0T\Gamma_{0}^{T} such that 𝔼[0TΓt𝑑t]<\mathbb{E}[\int_{0}^{T}\Gamma_{t}\,dt]<\infty and Z0TZ_{0}^{T} is obtained from pp-thinning Y0TY_{0}^{T}. Then the (tσ(M,Z0t):t[0,T])(\mathcal{F}_{t}\triangleq\sigma(M,Z_{0}^{t}):t\in[0,T])-intensity of Z0TZ_{0}^{T} is given by Θ0T\Theta_{0}^{T}, where PP-a.s. Θt=(1p)𝔼[Γt|t],t[0,T]\Theta_{t}=(1-p)\mathbb{E}[\Gamma_{t}|\mathcal{F}_{t-}],t\in[0,T].

Proof:

Please see the supplementary material. ∎

The following theorem was first proven by Wang in [44] using a property of a certain “contraction coefficient” used in strong data processing inequalities [51]. Here, we provide a self-contained proof which uses Theorem 1 and Lemma 10.

Theorem 3

Let Y_{0}^{T} be a Poisson process with rate \lambda, and let M be such that I(M;Y_{0}^{T})<\infty. Let Z_{0}^{T} be obtained from p-thinning of Y_{0}^{T}, where the thinning operation is independent of M. Then

I(M;Z0T)(1p)I(M;Y0T).\displaystyle I(M;Z_{0}^{T})\leq(1-p)I(M;Y_{0}^{T}).
Proof:

The data processing inequality gives I(M;Z0T)I(M;Y0T)<I(M;Z_{0}^{T})\leq I(M;Y_{0}^{T})<\infty. Applying Theorem 1,

I(M;Y0T)=𝔼[0Tϕ(Γt)ϕ(λ)dt],\displaystyle I(M;Y_{0}^{T})=\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})-\phi(\lambda)\,dt\right], (15)

and

I(M;Z0T)=𝔼[0Tϕ(Γ^t)ϕ(λ^t)dt],\displaystyle I(M;Z_{0}^{T})=\mathbb{E}\left[\int_{0}^{T}\phi(\hat{\Gamma}_{t})-\phi(\hat{\lambda}_{t})\,dt\right], (16)

where Γ0T\Gamma_{0}^{T} and λ0T\lambda_{0}^{T} (respectively Γ^0T\hat{\Gamma}_{0}^{T} and λ^0T\hat{\lambda}_{0}^{T}) are the (σ(M;Y0t):t[0,T])(\sigma(M;Y_{0}^{t}):t\in[0,T]) and (σ(Y0t):t[0,T])(\sigma(Y_{0}^{t}):t\in[0,T])-intensities (respectively (σ(M;Z0t):t[0,T])(\sigma(M;Z_{0}^{t}):t\in[0,T]) and (σ(Z0t):t[0,T])(\sigma(Z_{0}^{t}):t\in[0,T])-intensities) of Y0TY_{0}^{T} (respectively Z0TZ_{0}^{T}). Due to the uniqueness of the intensities and Lemma 10, we can take for each t[0,T]t\in[0,T],

Γ^t=(1p)𝔼[Γt|M,Z0t],λ^t=(1p)λ.\hat{\Gamma}_{t}=(1-p)\mathbb{E}[\Gamma_{t}|M,Z_{0}^{t-}],\quad\hat{\lambda}_{t}=(1-p)\lambda.

Noting that ϕ((1p)x)=(1p)ϕ(x)+xϕ(1p)\phi((1-p)x)=(1-p)\phi(x)+x\phi(1-p), (16) yields

I(M;Z0T)=\displaystyle I(M;Z_{0}^{T})= (1p)𝔼[0Tϕ(𝔼[Γt|M,Z0t])ϕ(λ)dt]\displaystyle(1-p)\mathbb{E}\left[\int_{0}^{T}\phi(\mathbb{E}[\Gamma_{t}|M,Z_{0}^{t-}])-\phi(\lambda)\,dt\right]
+ϕ(1p)𝔼[0TΓtλdt]\displaystyle+\phi(1-p)\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}-\lambda\,dt\right]
=(a)\displaystyle\overset{(a)}{=} (1p)𝔼[0Tϕ(𝔼[Γt|M,Z0t])ϕ(λ)dt]\displaystyle(1-p)\mathbb{E}\left[\int_{0}^{T}\phi(\mathbb{E}[\Gamma_{t}|M,Z_{0}^{t-}])-\phi(\lambda)\,dt\right]
(b)\displaystyle\overset{(b)}{\leq} (1p)𝔼[0Tϕ(Γt)ϕ(λ)dt]\displaystyle(1-p)\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})-\phi(\lambda)\,dt\right]
=\displaystyle= (1p)I(M;Y0T),\displaystyle(1-p)I(M;Y_{0}^{T}),

where for (a) we have used the fact that 𝔼[0TΓt𝑑t]=𝔼[0T1𝑑Yt]=𝔼[0Tλ𝑑t]\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt\right]=\mathbb{E}\left[\int_{0}^{T}1\,dY_{t}\right]=\mathbb{E}\left[\int_{0}^{T}\lambda\,dt\right], and
for (b) we have used Jensen’s inequality. ∎

We will require the following result [52, Theorem 2.11, p. 106].

Lemma 11

Suppose that Y0TY_{0}^{T} is a Poisson process with rate λ\lambda and Z0TZ_{0}^{T} is obtained from pp-thinning of Y0TY_{0}^{T}. Let

Z^t=YtZtt[0,T].\displaystyle\hat{Z}_{t}=Y_{t}-Z_{t}\quad t\in[0,T].

Then Z^0T\hat{Z}_{0}^{T} and Z0TZ_{0}^{T} are independent Poisson processes with rates pλp\lambda and (1p)λ(1-p)\lambda respectively.
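Lemma 11 is easy to see empirically; a minimal simulation sketch (arbitrary rate, deletion probability, and horizon) compares the empirical means of the retained and deleted counts with (1-p)\lambda T and p\lambda T and checks that the two counts are uncorrelated, which is consistent with (though of course does not prove) independence.

```python
import numpy as np

rng = np.random.default_rng(2)
lam, p, T, trials = 3.0, 0.4, 5.0, 20000

kept = np.empty(trials)
deleted = np.empty(trials)
for i in range(trials):
    n = rng.poisson(lam * T)             # Y_T for a rate-lam Poisson process
    keep = rng.random(n) >= p            # each point deleted w.p. p, independently
    kept[i] = keep.sum()
    deleted[i] = n - kept[i]

print(kept.mean(), (1 - p) * lam * T)    # Z_T has mean (1-p)*lam*T
print(deleted.mean(), p * lam * T)       # (Y - Z)_T has mean p*lam*T
print(np.corrcoef(kept, deleted)[0, 1])  # approximately 0
```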

The following lemma will be used repeatedly in the converse proofs of the rate-distortion function.

Lemma 12

Let a point process Y0TY_{0}^{T} have an (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-intensity Γ0T\Gamma_{0}^{T} such that

𝔼[0Tϕ(Γt)𝑑t]<.\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]<\infty.

Let \hat{Y}_{0}^{T} be a non-negative (\mathcal{F}_{t}:t\in[0,T])-predictable process satisfying \mathbb{E}\left[\int_{0}^{T}\hat{Y}_{t}\,dt\right]<\infty. Then

𝔼[0Tlog(Y^t)𝑑Yt]=𝔼[0Tlog(Y^t)Γt𝑑t].\displaystyle\mathbb{E}\left[\int_{0}^{T}\log(\hat{Y}_{t})\,dY_{t}\right]=\mathbb{E}\left[\int_{0}^{T}\log(\hat{Y}_{t})\Gamma_{t}\,dt\right].
Proof:

Please see the supplementary material. ∎

IV Functional Covering of Point Processes

In this section, we will consider general point processes and obtain the rate-distortion function under the functional-covering distortion when feedforward is present. Stronger results are obtained for Poisson processes in the next sections.

Definition 6

Given a point process y0T𝒩0Ty_{0}^{T}\in\mathcal{N}_{0}^{T}, and a non-negative function y^0T\hat{y}_{0}^{T}, the functional-covering distortion dd is

d(y^0T,y0T)(0Ty^t𝑑tlog(y^t)dyt),\displaystyle d(\hat{y}_{0}^{T},y_{0}^{T})\triangleq\left(\int_{0}^{T}\hat{y}_{t}\,dt-\log(\hat{y}_{t})\,dy_{t}\right),

whenever the expression on the right is well-defined.

We will allow the reconstruction function Y^0T\hat{Y}_{0}^{T} to depend on Y0TY_{0}^{T} as well as the message, constrained via predictability. In particular, we will call Y^0T\hat{Y}_{0}^{T} an allowable reconstruction with feedforward if it is non-negative and (σ(Y0t):t[0,T])(\sigma(Y_{0}^{t}):t\in[0,T])-predictable. Let 𝒴^0,FFT\mathcal{\hat{Y}}_{0,\text{FF}}^{T} denote the set of all y^0T\hat{y}_{0}^{T} processes which are allowable reconstructions with feedforward.

Definition 7

A (T,R,D)(T,R,D) code with feedforward consists of an encoder ff

\displaystyle f:\mathcal{N}_{0}^{T}\rightarrow\{1,\dots,\lceil\exp(RT)\rceil\}

and a decoder gg

g:{1,,exp(RT)}×𝒩0T𝒴^0,FFT\displaystyle g:\{1,\dots,\lceil\exp(RT)\rceil\}\times\mathcal{N}_{0}^{T}\rightarrow\mathcal{\hat{Y}}_{0,\text{FF}}^{T}

satisfying

𝔼[0TY^t𝑑t]<\displaystyle\mathbb{E}\left[\int_{0}^{T}\hat{Y}_{t}\,dt\right]<\infty

and the distortion constraint

𝔼[1Td(Y^0T,Y0T)]D.\displaystyle\mathbb{E}\left[\frac{1}{T}d(\hat{Y}_{0}^{T},Y_{0}^{T})\right]\leq D.

We will call the encoder’s output M=f(Y0T)M=f(Y_{0}^{T}) the message and the decoder’s output Y^0T\hat{Y}_{0}^{T} the reconstruction.

Definition 8

The minimum achievable distortion with feedforward at rate RR and blocklength TT is

DF(R,T)inf{D:there exists a (T,R,D) code with feedforward}.\displaystyle D_{F}^{*}(R,T)\triangleq\inf\{D:\text{there exists a $(T,R,D)$ code with feedforward}\}.
Definition 9

The distortion-rate function with feedforward is

DF(R)lim supTDF(R,T).\displaystyle D_{F}(R)\triangleq\limsup_{T\to\infty}D_{F}^{*}(R,T).

The minimum achievable rate at distortion DD and blocklength TT with feedforward RF(D,T)R_{F}^{*}(D,T) and the rate-distortion function with feedforward RF(D)R_{F}(D) can be defined similarly.

DF(R,T)D_{F}^{*}(R,T) can be characterized via the following theorem for certain point processes.

Theorem 4

Let Y0TY_{0}^{T} be a point process with (tY:t[0,T])(\mathcal{F}_{t}^{Y}:t\in[0,T])-intensity Λ0T\Lambda_{0}^{T} such that

𝔼[0T|ϕ(Λt)|𝑑t]<.\mathbb{E}\left[\int_{0}^{T}|\phi(\Lambda_{t})|\,dt\right]<\infty.

Let

Ξ(Y0T)1T𝔼[0TΛtϕ(Λt)dt],\displaystyle\Xi(Y_{0}^{T})\triangleq\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\Lambda_{t}-\phi(\Lambda_{t})\,dt\right],

and

δTP(YT=0)<1.\delta_{T}\triangleq P(Y_{T}=0)<1.

Then DF(R,T)D_{F}^{*}(R,T) satisfies

Ξ(Y0T)R1TDF(R,T)Ξ(Y0T)(1δT)R+1T.\displaystyle\Xi(Y_{0}^{T})-R-\frac{1}{T}\leq D_{F}^{*}(R,T)\leq\Xi(Y_{0}^{T})-(1-\delta_{T})R+\frac{1}{T}.
Proof:

Achievability:
Recall that since Λ0T\Lambda_{0}^{T} is the (tY:t[0,T])(\mathcal{F}_{t}^{Y}:t\in[0,T])-intensity of Y0TY_{0}^{T}, it is (tY:t[0,T])(\mathcal{F}_{t}^{Y}:t\in[0,T])-predictable, and 𝔼[0T|ϕ(Λt)|𝑑t]<\mathbb{E}[\int_{0}^{T}|\phi(\Lambda_{t})|\,dt]<\infty implies 𝔼[0TΛt𝑑t]<\mathbb{E}[\int_{0}^{T}\Lambda_{t}\,dt]<\infty. If the decoder outputs Λ0T\Lambda_{0}^{T}, this leads to distortion

1T𝔼[d(Λ0T,Y0T)]\displaystyle\frac{1}{T}\mathbb{E}[d(\Lambda_{0}^{T},Y_{0}^{T})] =1T𝔼[0TΛt𝑑tlog(Λt)dYt]\displaystyle=\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\Lambda_{t}\,dt-\log(\Lambda_{t})\,dY_{t}\right]
=1T𝔼[0TΛtϕ(Λt)dt]\displaystyle=\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\Lambda_{t}-\phi(\Lambda_{t})\,dt\right]
=Ξ(Y0T).\displaystyle=\Xi(Y_{0}^{T}).

Thus DF(0,T)Ξ(Y0T)D_{F}^{*}(0,T)\leq\Xi(Y_{0}^{T}), and the upper bound in the statement of the theorem holds at R=0R=0.

Now consider the case R>0. Fix T>0 and let J=\lceil\exp(RT)\rceil. If Y_{T}=0, then the encoder sends the index M=1. Otherwise, let \Theta denote the first arrival instant of the observed point process Y_{0}^{T}. From Lemma 3, we have that P^{Y^{T}_{0}}\ll P_{0}^{Y^{T}_{0}}. Since under P_{0}^{Y^{T}_{0}}, Y_{0}^{T} is a Poisson process with unit rate, it holds that P_{0}^{Y^{T}_{0}}(\Theta=t,Y_{T}>0)=0 for any fixed t\in[0,T]. This gives us P(\Theta=t,Y_{T}>0)=0 for t\in[0,T]. Thus, conditioned on the event Y_{T}>0, \Theta has a continuous distribution function F_{\Theta}. The encoder computes F_{\Theta}(\Theta), which is uniformly distributed over [0,1], and suitably quantizes it to obtain a message M that is uniform over \{2,\dots,J\}. From Theorem 1, there exists a (\sigma(M,Y_{0}^{t}):t\in[0,T])-predictable process \Gamma_{0}^{T} which is the (\sigma(M,Y_{0}^{t}):t\in[0,T])-intensity of Y_{0}^{T}. We note that \mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt\right]=\mathbb{E}\left[\int_{0}^{T}\Lambda_{t}\,dt\right]<\infty, and from Theorem 1, \mathbb{E}\left[\int_{0}^{T}\log(\Gamma_{t})\,dY_{t}\right]<\infty. Hence

\frac{1}{T}\mathbb{E}[d(\Gamma_{0}^{T},Y_{0}^{T})]=\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt-\log(\Gamma_{t})\,dY_{t}\right]

is well-defined. The decoder outputs Γ0T\Gamma_{0}^{T} as its reconstruction. Then we have

1TH(M)\displaystyle\frac{1}{T}H(M) =1T(δTlog(δT)+(1δT)log(1δT))+1δTT(log(J1))\displaystyle=-\frac{1}{T}\left(\delta_{T}\log(\delta_{T})+(1-\delta_{T})\log(1-\delta_{T})\right)+\frac{1-\delta_{T}}{T}(\log(J-1))
(a)1δTT(log(J1))\displaystyle\overset{(a)}{\geq}\frac{1-\delta_{T}}{T}(\log(J-1))
(b)1δTTlog(J/exp(1))\displaystyle\overset{(b)}{\geq}\frac{1-\delta_{T}}{T}\log(J/\exp(1))
(c)(1δT)R1T,\displaystyle\overset{(c)}{\geq}(1-\delta_{T})R-\frac{1}{T}, (17)

where for (a), we have used the bound δTlog(δT)(1δT)log(1δT)0-\delta_{T}\log(\delta_{T})-(1-\delta_{T})\log(1-\delta_{T})\geq 0,
for (b), we have used the inequality J1J/exp(1)J-1\geq J/\exp(1) when J2J\geq 2, and
for (c), we used the fact that RTlog(J)RT\leq\log(J).

H(M)H(M) also satisfies

1TH(M)\displaystyle\frac{1}{T}H(M) =(a)1TI(M;Y0T)\displaystyle\overset{(a)}{=}\frac{1}{T}I(M;Y_{0}^{T})
=(b)1T𝔼[0Tlog(Γt)𝑑Yt]1T𝔼[0Tϕ(Λt)𝑑t],\displaystyle\overset{(b)}{=}\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\log(\Gamma_{t})\,dY_{t}\right]-\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\phi(\Lambda_{t})\,dt\right], (18)

where, for (a) we have used Lemma 2,
for (b) we have used Theorem 1.
The average distortion can be bounded as follows:

\displaystyle\frac{1}{T}\mathbb{E}[d(\Gamma_{0}^{T},Y_{0}^{T})] =\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt-\log(\Gamma_{t})\,dY_{t}\right]
=(a)1T𝔼[0TΓt𝑑t]1T𝔼[0Tlog(Γt)𝑑Yt]\displaystyle\overset{(a)}{=}\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt\right]-\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\log(\Gamma_{t})\,dY_{t}\right]
=(b)1T𝔼[0TΛt𝑑t]1T𝔼[0Tlog(Γt)𝑑Yt]\displaystyle\overset{(b)}{=}\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\Lambda_{t}\,dt\right]-\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\log(\Gamma_{t})\,dY_{t}\right]
=(c)1T𝔼[0TΛt𝑑t]1TH(M)1T𝔼[0Tϕ(Λt)𝑑t]\displaystyle\overset{(c)}{=}\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\Lambda_{t}\,dt\right]-\frac{1}{T}H(M)-\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\phi(\Lambda_{t})\,dt\right]
(d)1T𝔼[0TΛtϕ(Λt)dt](1δT)R+1T\displaystyle\overset{(d)}{\leq}\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\Lambda_{t}-\phi(\Lambda_{t})\,dt\right]-(1-\delta_{T})R+\frac{1}{T}
=Ξ(Y0T)(1δT)R+1T,\displaystyle=\Xi(Y_{0}^{T})-(1-\delta_{T})R+\frac{1}{T},

where, for (a), we have used the fact that 𝔼[0Tlog(Γt)𝑑Yt]<\mathbb{E}\left[\int_{0}^{T}\log(\Gamma_{t})\,dY_{t}\right]<\infty due to Theorem 1,
for (b), we used the equality 𝔼[0TΓt𝑑t]=𝔼[0TΛt𝑑t]\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt\right]=\mathbb{E}\left[\int_{0}^{T}\Lambda_{t}\,dt\right],
for (c), we used (18), and
for (d), we used (17).

Thus we have shown the existence of a (T,R,D)(T,R,D) code with feedforward such that D=Ξ(Y0T)(1δT)R+1TD=\Xi(Y_{0}^{T})-(1-\delta_{T})R+\frac{1}{T}. This gives the upper bound on DF(R,T)D_{F}^{*}(R,T).

Converse:

For the given (T,R,D)(T,R,D) code with feedforward, let J=exp(RT)J=\lceil\exp(RT)\rceil. Then Jexp(RT)+1exp(RT+1)J\leq\exp(RT)+1\leq\exp(RT+1). Thus we have

R+1T1Tlog(J)1TH(M)=(a)1TI(M;Y0T),\displaystyle R+\frac{1}{T}\geq\frac{1}{T}\log(J)\geq\frac{1}{T}H(M)\overset{(a)}{=}\frac{1}{T}I(M;Y_{0}^{T}), (19)

where (a) follows because of Lemma 2.

Since I(M;Y0T)<I(M;Y_{0}^{T})<\infty, we conclude from Theorem 1 that there exists a process Γ0T\Gamma_{0}^{T} such that Γ0T\Gamma_{0}^{T} is the (t=σ(M,Y0t):t[0,T])(\mathcal{F}_{t}=\sigma(M,Y_{0}^{t}):t\in[0,T]) intensity of Y0TY_{0}^{T} and

I(M;Y0T)=𝔼[0Tϕ(Γt)𝑑t]𝔼[0Tϕ(Λt)𝑑t].\displaystyle I(M;Y_{0}^{T})=\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]-\mathbb{E}\left[\int_{0}^{T}\phi(\Lambda_{t})\,dt\right].

Hence from (19)

R1T𝔼[0Tϕ(Γt)𝑑t]1T𝔼[0Tϕ(Λt)𝑑t]1T.\displaystyle R\geq\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]-\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\phi(\Lambda_{t})\,dt\right]-\frac{1}{T}. (20)

Let Y^0T\hat{Y}_{0}^{T} denote the decoder’s output. The distortion constraint DD satisfies

D1T𝔼[d(Y^0T,Y0T)]\displaystyle D\geq\frac{1}{T}\mathbb{E}\left[d(\hat{Y}_{0}^{T},Y_{0}^{T})\right] =1T𝔼[0TY^t𝑑tlog(Y^t)dYt]\displaystyle=\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\hat{Y}_{t}\,dt-\log(\hat{Y}_{t})\,dY_{t}\right]
=1T𝔼[0TY^tlog(Y^t)Γtdt]\displaystyle=\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\hat{Y}_{t}-\log(\hat{Y}_{t})\Gamma_{t}\,dt\right] (21)

where in the last line we have used Lemma 12.

Using the inequality ulog(v)ϕ(u)u+vu\log(v)\leq\phi(u)-u+v, and noting that the individual terms have finite expectations,

𝔼[0Tlog(Y^t)Γt𝑑t]\displaystyle\mathbb{E}\left[\int_{0}^{T}\log(\hat{Y}_{t})\Gamma_{t}\,dt\right] 𝔼[0Tϕ(Γt)Γt+Y^tdt]\displaystyle\leq\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})-\Gamma_{t}+\hat{Y}_{t}\,dt\right]
=𝔼[0Tϕ(Γt)𝑑t]𝔼[0TΓt𝑑t]+𝔼[0TY^t𝑑t].\displaystyle=\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]-\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt\right]+\mathbb{E}\left[\int_{0}^{T}\hat{Y}_{t}\,dt\right]. (22)

From (21) and (20), we deduce

R+D\displaystyle R+D 1T𝔼[0Tϕ(Γt)𝑑t]1T𝔼[0Tϕ(Λt)𝑑t]+1T𝔼[0TY^t𝑑t]\displaystyle\geq\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]-\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\phi(\Lambda_{t})\,dt\right]+\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\hat{Y}_{t}\,dt\right]
1T𝔼[0Tlog(Y^t)𝑑Yt]1T\displaystyle\phantom{====}-\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\log(\hat{Y}_{t})\,dY_{t}\right]-\frac{1}{T}
(a)1T𝔼[0TΓt𝑑t]1T𝔼[0Tϕ(Λt)𝑑t]1T\displaystyle\overset{(a)}{\geq}\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt\right]-\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\phi(\Lambda_{t})\,dt\right]-\frac{1}{T}
(b)1T𝔼[0TΛt𝑑t]1T𝔼[0Tϕ(Λt)𝑑t]1T\displaystyle\overset{(b)}{\geq}\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\Lambda_{t}\,dt\right]-\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\phi(\Lambda_{t})\,dt\right]-\frac{1}{T}
=Ξ(Y0T)1T,\displaystyle\overset{}{=}\Xi(Y_{0}^{T})-\frac{1}{T},

where, for (a) we have used (22), and
for (b) we used the fact that 𝔼[0TΓt𝑑t]=𝔼[0T𝑑Yt]=𝔼[0TΛt𝑑t]\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt\right]=\mathbb{E}\left[\int_{0}^{T}\,dY_{t}\right]=\mathbb{E}\left[\int_{0}^{T}\Lambda_{t}\,dt\right].
Hence we have shown that for any (T,R,D) code with feedforward, D\geq\Xi(Y_{0}^{T})-R-1/T. This gives us the lower bound on D_{F}^{*}(R,T). ∎

Corollary 1

Let Y0TY_{0}^{T} be a point process with (tY:t[0,T])(\mathcal{F}_{t}^{Y}:t\in[0,T])-intensity Λ0T\Lambda_{0}^{T} such that

  • 𝔼[0T|ϕ(Λt)|𝑑t]<\mathbb{E}[\int_{0}^{T}|\phi(\Lambda_{t})|\,dt]<\infty,

  • Ξ¯(Y)lim supT1T𝔼[0TΛtϕ(Λt)dt]\bar{\Xi}(Y)\triangleq\limsup_{T\to\infty}\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\Lambda_{t}-\phi(\Lambda_{t})\,dt\right] is finite.

  • limTP(YT=0)=0\lim_{T\to\infty}P(Y_{T}=0)=0.

Then

DF(R)=Ξ¯(Y)R.\displaystyle D_{F}(R)=\bar{\Xi}(Y)-R.
Proof:

The corollary follows from the definition D_{F}(R)=\limsup_{T\to\infty}D_{F}^{*}(R,T) and from the bounds on D_{F}^{*}(R,T) in Theorem 4. ∎

Remark 1

The above distortion-rate function is reminiscent of the logarithmic-loss distortion-rate function for a DMS. Specifically, for a DMS YY on alphabet 𝒴\mathcal{Y} let the reconstruction be a probability distribution function QQ on 𝒴\mathcal{Y}. The logarithmic loss distortion is defined as dLL(y,Q)log(Q(y))d_{LL}(y,Q)\triangleq-\log(Q(y)) and the distortion-rate function is then given by D(R)=(H(Y)R)+D(R)=(H(Y)-R)^{+} [38].

If the reconstruction \hat{y}_{0}^{T} is assumed to be bounded, then it can be used to define a probability measure on the space of point processes (\mathcal{N}_{0}^{T},\mathfrak{F}^{N}) via the following Radon-Nikodym derivative.

dPy^0TdP0(y0T)=exp(0Tlog(y^t)𝑑yt(y^t1)dt),\frac{dP_{\hat{y}_{0}^{T}}}{dP_{0}}(y_{0}^{T})=\exp\left(\int_{0}^{T}\log(\hat{y}_{t})\,dy_{t}-(\hat{y}_{t}-1)\,dt\right),

where P0P_{0} is the measure under which Y0TY_{0}^{T} is a Poisson process with unit rate. Then the intensity of Y0TY_{0}^{T} under this measure is y^0T\hat{y}_{0}^{T} [39, Chapter VI, Theorems T2-T4] and the functional-covering distortion is related to the above Radon-Nikodym derivative as

d(y^0T,y0T)=log(dPy^0TdP0(y0T))+T.d(\hat{y}_{0}^{T},y_{0}^{T})=-\log\left(\frac{dP_{\hat{y}_{0}^{T}}}{dP_{0}}(y_{0}^{T})\right)+T.

\Diamond

Applying the above corollary to a Poisson process with rate λ>0\lambda>0, we get that DF(R)=λλlog(λ)RD_{F}(R)=\lambda-\lambda\log(\lambda)-R. As we will see in the next section, this distortion-rate function can be achieved without feedforward.

V Constrained Functional-Covering of Poisson Processes

In this and the next section we focus on Poisson processes. Let \hat{\mathcal{Y}}_{0}^{T} denote the set of all functions \hat{y}_{0}^{T} which are non-negative and left-continuous with right limits. We assume that we are given a set \mathcal{A}\subseteq\mathbb{R}_{+} with at least one positive element. We will constrain the reconstruction function \hat{Y}_{0}^{T} to take values in \mathcal{A}, so that \hat{Y}_{t}\in\mathcal{A} for all t\in[0,T].

Definition 10

A (T,R,D)(T,R,D) code consists of an encoder ff

f:𝒩0T{1,,exp(RT)}\displaystyle f:\mathcal{N}_{0}^{T}\rightarrow\{1,\dots,\lceil\exp(RT)\rceil\}

and a decoder gg

g:{1,,exp(RT)}𝒴^0T\displaystyle g:\{1,\dots,\lceil\exp(RT)\rceil\}\rightarrow\hat{\mathcal{Y}}_{0}^{T}

satisfying

Y^t𝒜,𝔼[0TY^t𝑑t]<\displaystyle\hat{Y}_{t}\in\mathcal{A},\,\mathbb{E}\left[\int_{0}^{T}\hat{Y}_{t}\,dt\right]<\infty

and the distortion constraint

1T𝔼[d(Y^0T,Y0T)]D.\displaystyle\frac{1}{T}\mathbb{E}\left[d(\hat{Y}_{0}^{T},Y_{0}^{T})\right]\leq D.

As before, we will call the encoder’s output M=f(Y0T)M=f(Y_{0}^{T}) the message and the decoder’s output Y^0T=g(M)\hat{Y}_{0}^{T}=g(M) the reconstruction.

Definition 11

A rate-distortion vector (R,D)(R,D) is said to be achievable if for any ϵ>0\epsilon>0, there exists a sequence of (Tn,R+ϵ,D+ϵ)(T_{n},R+\epsilon,D+\epsilon) codes such that limnTn=\lim_{n\to\infty}T_{n}=\infty.

Definition 12

The rate-distortion region \mathfrak{RD}^{\mathcal{P}}_{\mathcal{A}} is the set of all achievable rate-distortion vectors (R,D).

The rate-distortion region 𝔇𝒜𝒫,F\mathfrak{RD}^{\mathcal{P},\text{F}}_{\mathcal{A}} with feedforward is defined as in Definitions 11 and 12.

Theorem 5

The rate-distortion region for the constrained functional-covering of a Poisson process with rate λ>0\lambda>0 is given by

𝔇𝒜𝒫=𝔇𝒜𝒫,F=𝔇,\mathfrak{RD}^{\mathcal{P}}_{\mathcal{A}}=\mathfrak{RD}^{\mathcal{P},\text{F}}_{\mathcal{A}}=\mathfrak{RD},

where 𝔇\mathfrak{RD} is the convex hull of the union of sets of rate-distortion vectors (R,D)(R,D) such that

Rλk=14βklog(βkαk)\displaystyle R\geq\lambda\sum_{k=1}^{4}\beta_{k}\log\left(\frac{\beta_{k}}{\alpha_{k}}\right)
Dk=14αkΨ𝒜(λβkαk),\displaystyle D\geq\sum_{k=1}^{4}\alpha_{k}\Psi_{\mathcal{A}}\left(\frac{\lambda\beta_{k}}{\alpha_{k}}\right),

where

Ψ𝒜(u)infv𝒜vulog(v)\displaystyle\Psi_{\mathcal{A}}(u)\triangleq\inf_{v\in\mathcal{A}}v-u\log(v)

with the convention that 0Ψ(0/0)=00\Psi(0/0)=0, and [αk]k=14[\alpha_{k}]_{k=1}^{4} and [βk]k=14[\beta_{k}]_{k=1}^{4} are probability vectors over {1,2,3,4}\{1,2,3,4\} satisfying αk=0βk=0\alpha_{k}=0\Rightarrow\beta_{k}=0.
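The region in Theorem 5 is straightforward to evaluate numerically. The sketch below (with arbitrary choices of \lambda, \mathcal{A}, and the probability vectors) computes \Psi_{\mathcal{A}} over a finite candidate set and evaluates one (R,D) pair. For \mathcal{A}=\{0,1\}, \Psi_{\mathcal{A}}(u)=1 for u>0 and \Psi_{\mathcal{A}}(0)=0, recovering the covering distortion; for \mathcal{A}=(0,\infty), the infimum is attained at v=u, so \Psi_{\mathcal{A}}(u)=u-\phi(u).

```python
import numpy as np

def psi(A, u):
    """Psi_A(u) = inf_{v in A} v - u*log(v), evaluated over a finite candidate set A."""
    A = np.asarray(A, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        vals = A - u * np.log(A)                        # v = 0 contributes +inf when u > 0
    vals = np.where((A == 0) & (u == 0), 0.0, vals)     # v = u = 0 gives 0 (0*log(0) = 0)
    return vals.min()

def rate_distortion_point(lam, A, alpha, beta):
    """One (R, D) pair from Theorem 5; alpha, beta are probability vectors over
    {1,2,3,4} with alpha_k = 0 implying beta_k = 0."""
    R = lam * sum(bk * np.log(bk / ak) for ak, bk in zip(alpha, beta) if bk > 0)
    D = sum(ak * psi(A, lam * bk / ak) for ak, bk in zip(alpha, beta) if ak > 0)
    return R, D

lam = 1.0
alpha = [0.3, 0.3, 0.2, 0.2]
beta = [0.5, 0.3, 0.1, 0.1]
print(rate_distortion_point(lam, [0.0, 1.0], alpha, beta))                    # A = {0, 1}
print(rate_distortion_point(lam, np.linspace(0.01, 5.0, 500), alpha, beta))   # grid ~ (0, inf)
```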

Proof:

Achievability

Let

Rλk=14βklog(βkαk)\displaystyle{R}\triangleq\lambda\sum_{k=1}^{4}\beta_{k}\log\left(\frac{\beta_{k}}{\alpha_{k}}\right)
Dk=14αkΨ𝒜(λβkαk).\displaystyle{D}\triangleq\sum_{k=1}^{4}\alpha_{k}\Psi_{\mathcal{A}}\left(\frac{\lambda\beta_{k}}{\alpha_{k}}\right).

We will show achievability using a (T,{R}+\epsilon,{D}+\epsilon) code without feedforward. We will use discretization and results from the rate-distortion theory for discrete memoryless sources (DMS). Partition [0,T] into n intervals of width \Delta (so that T=n\Delta) and define a binary-valued discrete-time process (\bar{Y}_{j}:j\in\{1,\dots,n\}) as follows. If there are one or more arrivals of the process Y_{0}^{T} in the interval ((j-1)\Delta,j\Delta], then set \bar{Y}_{j} to 1; otherwise set it to zero. Since Y_{0}^{T} is a Poisson process with rate \lambda, the components of (\bar{Y}_{j}:j\in\{1,\dots,n\}) are independent and identically distributed with P(\bar{Y}=1)=1-\exp(-\lambda\Delta). Consider the following “test” channel for k\in\{1,2,3,4\}:

P(U¯=k|Y¯=1)\displaystyle P(\bar{U}=k|\bar{Y}=1) =βk,\displaystyle=\beta_{k},
P(U¯=k|Y¯=0)\displaystyle P(\bar{U}=k|\bar{Y}=0) =αk.\displaystyle=\alpha_{k}.

Define the discretized distortion function

d¯(y¯^,y¯)y¯^log(y¯^)Δ𝟏{y¯=1}y¯^𝒜,y¯{0,1}.\bar{d}(\hat{\bar{y}},\bar{y})\triangleq\hat{\bar{y}}-\frac{\log(\hat{\bar{y}})}{\Delta}\mathbf{1}\{\bar{y}=1\}\quad\hat{\bar{y}}\in\mathcal{A},\bar{y}\in\{0,1\}.

The reconstruction Y¯^(k)\hat{\bar{Y}}(k) is taken as a v𝒜v\in\mathcal{A} satisfying

|Ψ𝒜(λβkαk)(vλβkαklog(v))|ϵ4,\displaystyle\left|\Psi_{\mathcal{A}}\left(\frac{\lambda\beta_{k}}{\alpha_{k}}\right)-\left(v-\frac{\lambda\beta_{k}}{\alpha_{k}}\log(v)\right)\right|\leq\frac{\epsilon}{4}, (23)

where such a vv exists due to the definition of Ψ𝒜\Psi_{\mathcal{A}}. We recall that if αk=0\alpha_{k}=0 then βk=0\beta_{k}=0, and hence P(U¯=k)=0P(\bar{U}=k)=0 for such a kk. The scaling of the mutual information I(U¯;Y¯)I(\bar{U};\bar{Y}) and the distortion function d¯(Y¯^,Y¯)\bar{d}(\hat{\bar{Y}},\bar{Y}) with respect to Δ\Delta is given by the following lemma.

Lemma 13
limΔ0I(U¯;Y¯)Δ\displaystyle\lim_{\Delta\to 0}\frac{I(\bar{U};\bar{Y})}{\Delta} =R\displaystyle={R}
limΔ0𝔼[d¯(Y¯^,Y¯)]\displaystyle\lim_{\Delta\to 0}\mathbb{E}[\bar{d}(\hat{\bar{Y}},\bar{Y})] D+ϵ4\displaystyle\leq{D}+\frac{\epsilon}{4}
Proof:

Please see the supplementary material. ∎

Let

κmaxk{1,2,3,4}Y¯^(k)>0|log(Y¯^(k))|.\displaystyle\kappa\triangleq\max_{\begin{subarray}{c}k\in\{1,2,3,4\}\\ \hat{\bar{Y}}(k)>0\end{subarray}}\left|\log\left(\hat{\bar{Y}}(k)\right)\right|. (24)

Due to [53, Theorem 9.3.2, p. 455], for a given Δ>0\Delta>0, ϵ¯>0\bar{\epsilon}>0, and all sufficiently large nn, there exists an encoder f¯\bar{f} and a decoder g¯\bar{g} such that

f¯:(Y¯j:j{1,,n}){1,,L}\displaystyle\bar{f}:(\bar{Y}_{j}:j\in\{1,\dots,n\})\to\{1,\dots,L\}
g¯:{1,,L}(Y¯^j:j{1,,n})\displaystyle\bar{g}:\{1,\dots,L\}\to(\hat{\bar{Y}}_{j}:j\in\{1,\dots,n\})

satisfying

1nlog(L)I(U¯;Y¯)+ϵ¯,\displaystyle\frac{1}{n}\log(L)\leq I(\bar{U};\bar{Y})+\bar{\epsilon},
𝔼[1nj=1nd¯(Y¯^j,Yj¯)]𝔼[d¯(Y¯^,Y¯)]+ϵ¯.\displaystyle\mathbb{E}\left[\frac{1}{n}\sum_{j=1}^{n}\bar{d}(\hat{\bar{Y}}_{j},\bar{Y_{j}})\right]\leq\mathbb{E}[\bar{d}(\hat{\bar{Y}},\bar{Y})]+\bar{\epsilon}. (25)

Given the above setup, the encoder ff upon observing Y0TY_{0}^{T} obtains the binary valued discrete time process (Y¯j:j{1,,n})(\bar{Y}_{j}:j\in\{1,\dots,n\}), and sends M=f¯(Y¯j:j{1,,n})M=\bar{f}(\bar{Y}_{j}:j\in\{1,\dots,n\}) to the decoder. The decoder outputs the reconstruction Y^0T\hat{Y}_{0}^{T} as

Y^tj=1nY¯^j𝟏{t((j1)Δ,jΔ]}t[0,T].\displaystyle\hat{Y}_{t}\triangleq\sum_{j=1}^{n}\hat{\bar{Y}}_{j}\mathbf{1}\left\{t\in((j-1)\Delta,j\Delta]\right\}\quad t\in[0,T].

Let Y¯¯j\bar{\bar{Y}}_{j} denote the actual number of arrivals of Y0TY_{0}^{T} in an interval ((j1)Δ,jΔ]((j-1)\Delta,j\Delta]. Then d¯\bar{d} is related to the original distortion function via the above reconstruction as follows:

1Td(Y^0T;Y0T)\displaystyle\frac{1}{T}d(\hat{Y}_{0}^{T};Y_{0}^{T}) =1T0TY^t𝑑t1T0Tlog(Y^t)𝑑Yt\displaystyle=\frac{1}{T}\int_{0}^{T}\hat{Y}_{t}\,dt-\frac{1}{T}\int_{0}^{T}\log(\hat{Y}_{t})\,dY_{t}
=1nj=1nY¯^j1Tj=1nlog(Y¯^j)Y¯¯j\displaystyle=\frac{1}{n}\sum_{j=1}^{n}\hat{\bar{Y}}_{j}-\frac{1}{T}\sum_{j=1}^{n}\log(\hat{\bar{Y}}_{j})\bar{\bar{Y}}_{j}
=1nj=1nY¯^j1nΔj=1nlog(Y¯^j)Y¯j1Tj=1nlog(Y¯^j)(Y¯¯j1)𝟏{Y¯¯j>1}.\displaystyle=\frac{1}{n}\sum_{j=1}^{n}\hat{\bar{Y}}_{j}-\frac{1}{n\Delta}\sum_{j=1}^{n}\log(\hat{\bar{Y}}_{j})\bar{Y}_{j}-\frac{1}{T}\sum_{j=1}^{n}\log(\hat{\bar{Y}}_{j})(\bar{\bar{Y}}_{j}-1)\mathbf{1}\{\bar{\bar{Y}}_{j}>1\}.
=1nj=1nd¯(Y¯^j,Y¯j)1Tj=1nlog(Y¯^j)(Y¯¯j1)𝟏{Y¯¯j>1}\displaystyle=\frac{1}{n}\sum_{j=1}^{n}\bar{d}(\hat{\bar{Y}}_{j},\bar{Y}_{j})-\frac{1}{T}\sum_{j=1}^{n}\log(\hat{\bar{Y}}_{j})(\bar{\bar{Y}}_{j}-1)\mathbf{1}\{\bar{\bar{Y}}_{j}>1\}
(a)1nj=1nd¯(Y¯^j,Y¯j)+κTj=1n(Y¯¯j1)𝟏{Y¯¯j>1}\displaystyle\overset{(a)}{\leq}\frac{1}{n}\sum_{j=1}^{n}\bar{d}(\hat{\bar{Y}}_{j},\bar{Y}_{j})+\frac{\kappa}{T}\sum_{j=1}^{n}(\bar{\bar{Y}}_{j}-1)\mathbf{1}\{\bar{\bar{Y}}_{j}>1\}
1nj=1nd¯(Y¯^j,Y¯j)+κTj=1nY¯¯j𝟏{Y¯¯j>1},\displaystyle\leq\frac{1}{n}\sum_{j=1}^{n}\bar{d}(\hat{\bar{Y}}_{j},\bar{Y}_{j})+\frac{\kappa}{T}\sum_{j=1}^{n}\bar{\bar{Y}}_{j}\mathbf{1}\{\bar{\bar{Y}}_{j}>1\},

where for (a), we have used the definition of κ\kappa in (24): if Y¯¯j>1\bar{\bar{Y}}_{j}>1 then Y¯j=1\bar{Y}_{j}=1, and for d¯(Y¯^j,1)<\bar{d}(\hat{\bar{Y}}_{j},1)<\infty we must have Y¯^j>0\hat{\bar{Y}}_{j}>0; the latter holds a.s. because 𝔼[d¯(Y¯^,Y¯)]<\mathbb{E}[\bar{d}(\hat{\bar{Y}},\bar{Y})]<\infty for all sufficiently small Δ\Delta.
Hence taking expectations, we get

𝔼[1Td(Y^0T,Y0T)]\displaystyle\mathbb{E}\left[\frac{1}{T}d(\hat{Y}_{0}^{T},Y_{0}^{T})\right] 𝔼[1nj=1nd¯(Y¯^j,Y¯j)]+κ𝔼[1Tj=1nY¯¯j𝟏{Y¯¯j>1}]\displaystyle\leq\mathbb{E}\left[\frac{1}{n}\sum_{j=1}^{n}\bar{d}(\hat{\bar{Y}}_{j},\bar{Y}_{j})\right]+\kappa\mathbb{E}\left[\frac{1}{T}\sum_{j=1}^{n}\bar{\bar{Y}}_{j}\mathbf{1}\{\bar{\bar{Y}}_{j}>1\}\right]
(a)𝔼[d¯(Y¯^,Y¯)]+κ𝔼[1Tj=1nY¯¯j𝟏{Y¯¯j>1}]+ϵ¯\displaystyle\overset{(a)}{\leq}\mathbb{E}[\bar{d}(\hat{\bar{Y}},\bar{Y})]+\kappa\mathbb{E}\left[\frac{1}{T}\sum_{j=1}^{n}\bar{\bar{Y}}_{j}\mathbf{1}\{\bar{\bar{Y}}_{j}>1\}\right]+\bar{\epsilon}
=(b)𝔼[d¯(Y¯^,Y¯)]+κ(λλexp(λΔ))+ϵ¯\displaystyle\overset{(b)}{=}\mathbb{E}[\bar{d}(\hat{\bar{Y}},\bar{Y})]+\kappa(\lambda-\lambda\exp(-\lambda\Delta))+\bar{\epsilon}
(c)𝔼[d¯(Y¯^,Y¯)]+κλ2Δ+ϵ¯,\displaystyle\overset{(c)}{\leq}\mathbb{E}[\bar{d}(\hat{\bar{Y}},\bar{Y})]+\kappa\lambda^{2}\Delta+\bar{\epsilon}, (26)

where, for (a), we have used (25),
for (b) we note that 𝔼[Y¯¯j𝟏{Y¯¯j>1}]=λΔλΔexp(λΔ)\mathbb{E}[\bar{\bar{Y}}_{j}\mathbf{1}\{\bar{\bar{Y}}_{j}>1\}]=\lambda\Delta-\lambda\Delta\exp(-\lambda\Delta), and
for (c), we have used the inequality 1uexp(u)1-u\leq\exp(-u).
Moreover using (25),

1Tlog(L)=1nΔlog(L)I(U¯;Y¯)Δ+ϵ¯Δ.\displaystyle\frac{1}{T}\log(L)=\frac{1}{n\Delta}\log(L)\leq\frac{I(\bar{U};\bar{Y})}{\Delta}+\frac{\bar{\epsilon}}{\Delta}. (27)

Now given the rate-distortion vector (R,D)({R},{D}) and ϵ>0\epsilon>0, first choose Δ<1\Delta<1 sufficiently small so that

I(U¯;Y¯)Δ\displaystyle\frac{I(\bar{U};\bar{Y})}{\Delta} R+ϵ4\displaystyle\leq{R}+\frac{\epsilon}{4}
𝔼[d¯(Y¯^,Y¯)]\displaystyle\mathbb{E}[\bar{d}(\hat{\bar{Y}},\bar{Y})] D+ϵ2,\displaystyle\leq{D}+\frac{\epsilon}{2},
κλ2Δ\displaystyle\kappa\lambda^{2}\Delta ϵ/2.\displaystyle\leq\epsilon/2.

Then let ϵ¯=Δϵ/4\bar{\epsilon}=\Delta\epsilon/4, and choose a sufficiently large nn so that (25) is satisfied. From (26) and (27) we conclude that a sequence of (Tn,R+ϵ,D+ϵ)(T_{n},{R}+\epsilon,{D}+\epsilon) codes exists with Tn=nΔT_{n}=n\Delta and TnT_{n}\to\infty as nn\to\infty.
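
To make the discretization step concrete, the following Monte-Carlo sketch simulates it for the functional-covering case, in which 𝒜 is the set of all non-negative reals so that the minimizer in (23) is simply λβ_k/α_k. It is only illustrative: the block code (f¯,g¯) is replaced by drawing U¯ directly from the assumed test channel, so only the distortion side of the argument is exercised, and all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative Monte-Carlo sketch of the discretization step (hypothetical
# parameters; functional-covering case, so the v in (23) is lam*beta_k/alpha_k).
# The block code is replaced by sampling U directly from the test channel,
# so only the distortion behaviour is illustrated, not the coding rate.
lam, T, Delta = 2.0, 200.0, 1e-3
alpha = np.array([0.4, 0.3, 0.2, 0.1])
beta = np.array([0.7, 0.1, 0.1, 0.1])
v = lam * beta / alpha                         # reconstruction levels

n = int(T / Delta)
counts = rng.poisson(lam * Delta, size=n)      # arrivals of Y_0^T per bin
Ybar = (counts > 0).astype(int)                # discretized binary process
U = np.where(Ybar == 1,
             rng.choice(4, size=n, p=beta),    # test channel when Ybar = 1
             rng.choice(4, size=n, p=alpha))   # test channel when Ybar = 0
Yhat = v[U]                                    # piecewise-constant reconstruction

# (1/T) d(Yhat, Y) = (1/T) int Yhat dt - (1/T) int log(Yhat) dY
empirical_distortion = Yhat.mean() - np.sum(np.log(Yhat) * counts) / T
D_target = lam - lam * np.log(lam) - lam * np.sum(beta * np.log(beta / alpha))
print(empirical_distortion, D_target)
```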

Converse

We will prove the converse when feedforward is present. For the given (T,R+ϵ,D+ϵ)(T,R+\epsilon,D+\epsilon) code with feedforward, let MM denote the encoder’s output. Since I(M;Y0T)<I(M;Y_{0}^{T})<\infty, we conclude from Theorem 1 that there exists a process Γ0T\Gamma_{0}^{T} such that Γ0T\Gamma_{0}^{T} is the (t=σ(M,Y0t):t[0,T])(\mathcal{F}_{t}=\sigma(M,Y_{0}^{t}):t\in[0,T]) intensity of Y0TY_{0}^{T} and

I(M;Y0T)=𝔼[0Tϕ(Γt)𝑑t]Tϕ(λ).I(M;Y_{0}^{T})=\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]-T\phi(\lambda). (28)

We also have

1TI(M;Y0T)=1TH(M)1Tlog(exp((R+ϵ)T))R+ϵ+1T.\displaystyle\frac{1}{T}I(M;{Y}^{T}_{0})=\frac{1}{T}H(M)\leq\frac{1}{T}\log\left(\lceil\exp((R+\epsilon)T)\rceil\right)\leq R+\epsilon+\frac{1}{T}.

This gives

R1T𝔼[0Tϕ(Γt)𝑑t]ϕ(λ)ϵ1T.\displaystyle R\geq\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]-\phi(\lambda)-\epsilon-\frac{1}{T}.

Let Y^0T\hat{Y}_{0}^{T} denote the decoder’s output. The distortion constraint DD satisfies

D\displaystyle D 1T𝔼[d(Y^0T,Y0T)]ϵ\displaystyle\geq\frac{1}{T}\mathbb{E}\left[d(\hat{Y}_{0}^{T},Y_{0}^{T})\right]-\epsilon
=1T𝔼[0TY^t𝑑tlog(Y^t)dYt]ϵ\displaystyle=\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\hat{Y}_{t}\,dt-\log(\hat{Y}_{t})\,dY_{t}\right]-\epsilon
=(a)1T𝔼[0TY^tΓtlog(Y^t)dt]ϵ\displaystyle\overset{(a)}{=}\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\hat{Y}_{t}-\Gamma_{t}\log(\hat{Y}_{t})\,\,dt\right]-\epsilon
1T𝔼[0Tinfv𝒜vΓtlog(v)dt]ϵ\displaystyle\geq\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\inf_{v\in\mathcal{A}}v-\Gamma_{t}\log(v)\,dt\right]-\epsilon
=(b)1T𝔼[0TΨ𝒜(Γt)𝑑t]ϵ,\displaystyle\overset{(b)}{=}\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\Psi_{\mathcal{A}}(\Gamma_{t})\,dt\right]-\epsilon, (29)

where, for (a) we have used Lemma 12, and
for (b), we have used the definition of Ψ𝒜\Psi_{\mathcal{A}}. Defining SS to be uniformly distributed on [0,T][0,T] and independent of all other random variables, we have

R\displaystyle R 𝔼[ϕ(ΓS)]ϕ(λ)ϵ1T\displaystyle\geq\mathbb{E}\left[\phi(\Gamma_{S})\right]-\phi(\lambda)-\epsilon-\frac{1}{T} (30)
D\displaystyle D 𝔼[Ψ𝒜(ΓS)]ϵ,\displaystyle\geq\mathbb{E}\left[\Psi_{\mathcal{A}}(\Gamma_{S})\right]-\epsilon, (31)

Now we use Carathéodory’s theorem [54, Theorem 17.1]. There exist non-negative [ηk]k=14[\eta_{k}]_{k=1}^{4} and [αk]k=14[\alpha_{k}]_{k=1}^{4}, such that k=14αk=1\sum_{k=1}^{4}\alpha_{k}=1 and

𝔼[ϕ(ΓS)]\displaystyle\mathbb{E}\left[\phi({\Gamma}_{S})\right] =k=14αkϕ(ηk),\displaystyle=\sum_{k=1}^{4}\alpha_{k}\phi(\eta_{k}), (32)
𝔼[Ψ𝒜(ΓS)]\displaystyle\mathbb{E}\left[\Psi_{\mathcal{A}}({\Gamma}_{S})\right] =k=14αkΨ𝒜(ηk),\displaystyle=\sum_{k=1}^{4}\alpha_{k}\Psi_{\mathcal{A}}(\eta_{k}), (33)
𝔼[ΓS]\displaystyle\mathbb{E}\left[{\Gamma}_{S}\right] =k=14αkηk=λ,\displaystyle=\sum_{k=1}^{4}\alpha_{k}\eta_{k}=\lambda, (34)

where in the last line we have used the fact that since Γ0T{\Gamma}_{0}^{T} is the (σ(M,Y0t):t[0,T])(\sigma(M,{Y}_{0}^{t}):t\in[0,T])-intensity of Y0T{Y}_{0}^{T}, we have 𝔼[0TΓt𝑑t]=𝔼[YT]=Tλ\mathbb{E}\left[\int_{0}^{T}{\Gamma}_{t}\,dt\right]=\mathbb{E}[{Y}_{T}]=T\lambda. Now define

βkαkηkλ.\beta_{k}\triangleq\frac{\alpha_{k}\eta_{k}}{\lambda}.

We note that βk=0\beta_{k}=0 if αk=0\alpha_{k}=0, and k=14βk=1\sum_{k=1}^{4}\beta_{k}=1. Substituting the above definitions in (30)-(31), we obtain

R\displaystyle R (k=14αkηklog(ηk)λlog(λ))ϵ1T\displaystyle\geq\left(\sum_{k=1}^{4}\alpha_{k}\eta_{k}\log(\eta_{k})-\lambda\log(\lambda)\right)-\epsilon-\frac{1}{T}
=λ(k=14βklog(βkλαk)𝟏{αk>0}log(λ))ϵ1T\displaystyle=\lambda\left(\sum_{k=1}^{4}\beta_{k}\log\left(\frac{\beta_{k}\lambda}{\alpha_{k}}\right)\mathbf{1}\{\alpha_{k}>0\}-\log(\lambda)\right)-\epsilon-\frac{1}{T}
=λk=14βklog(βkαk)ϵ1T.\displaystyle=\lambda\sum_{k=1}^{4}\beta_{k}\log\left(\frac{\beta_{k}}{\alpha_{k}}\right)-\epsilon-\frac{1}{T}. (35)

Likewise,

Dk=14αkΨ𝒜(λβkαk)ϵ.\displaystyle D\geq\sum_{k=1}^{4}\alpha_{k}\Psi_{\mathcal{A}}\left(\frac{\lambda\beta_{k}}{\alpha_{k}}\right)-\epsilon.

Since ϵ\epsilon is arbitrary and TT can be made arbitrarily large, we obtain the rate-distortion region in the statement of the theorem. ∎

If we do not place any restrictions on 𝒜\mathcal{A}, i.e., if 𝒜\mathcal{A} is the set of all non-negative reals, then we obtain the functional-covering distortion.

Corollary 2 (Functional Covering of Poisson Processes)

The rate-distortion function for functional-covering distortion is given by RFC(D)=(λλlog(λ)D)+R_{\text{FC}}(D)=(\lambda-\lambda\log(\lambda)-D)^{+}.

Proof:

For the functional-covering distortion, 𝒜\mathcal{A} is the set of non-negative reals. Hence

Ψ𝒜(u)=infv0vulog(v)=uulog(u).\displaystyle\Psi_{\mathcal{A}}(u)=\inf_{v\geq 0}v-u\log(v)=u-u\log(u).

For any achievable (R,D)(R,D) we have

R\displaystyle R λk=14βklog(βkαk),\displaystyle\geq\lambda\sum_{k=1}^{4}\beta_{k}\log\left(\frac{\beta_{k}}{\alpha_{k}}\right), (36)

and

D\displaystyle D k=14αkΨ𝒜(λβkαk)\displaystyle\geq\sum_{k=1}^{4}\alpha_{k}\Psi_{\mathcal{A}}\left(\frac{\lambda\beta_{k}}{\alpha_{k}}\right)
=k=14αk(λβkαkλβkαklog(λβkαk))\displaystyle=\sum_{k=1}^{4}\alpha_{k}\left(\frac{\lambda\beta_{k}}{\alpha_{k}}-\frac{\lambda\beta_{k}}{\alpha_{k}}\log\left(\frac{\lambda\beta_{k}}{\alpha_{k}}\right)\right)
=λλlog(λ)λk=14βklog(βkαk).\displaystyle=\lambda-\lambda\log(\lambda)-\lambda\sum_{k=1}^{4}\beta_{k}\log\left(\frac{\beta_{k}}{\alpha_{k}}\right).

Hence

R+Dλλlog(λ),\displaystyle R+D\geq\lambda-\lambda\log(\lambda),

and this is achieved by [αk]k=14[\alpha_{k}]_{k=1}^{4} and [βk]k=14[\beta_{k}]_{k=1}^{4} that yield equality in (36). ∎
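
The proof above shows that, for any probability vectors, the pair of lower bounds R = λ∑_k β_k log(β_k/α_k) and D = ∑_k α_k Ψ_𝒜(λβ_k/α_k) lies exactly on the line R + D = λ − λ log(λ). The following sketch checks this identity numerically for hypothetical random vectors.

```python
import numpy as np

rng = np.random.default_rng(1)

# Check of the identity in the proof of Corollary 2 for hypothetical random
# probability vectors: with Psi(u) = u - u*log(u),
#   R = lam * sum_k beta_k log(beta_k/alpha_k),
#   D = sum_k alpha_k * Psi(lam*beta_k/alpha_k),
# one always has R + D = lam - lam*log(lam).
lam = 2.0

def psi(u):
    return u - u * np.log(u)

for _ in range(3):
    alpha = rng.dirichlet(np.ones(4))
    beta = rng.dirichlet(np.ones(4))
    R = lam * np.sum(beta * np.log(beta / alpha))
    D = np.sum(alpha * psi(lam * beta / alpha))
    print(R + D, lam - lam * np.log(lam))
```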

If we take 𝒜={0,1}\mathcal{A}=\{0,1\}, then we recover the covering distortion in [35, Theorem 1].

Corollary 3 (Covering Distortion [35])

The rate-distortion function for the covering distortion is given by RC(D)=(λlog(D))+R_{\text{C}}(D)=(-\lambda\log(D))^{+}.

Proof:

For the covering distortion, 𝒜={0,1}\mathcal{A}=\{0,1\}. Hence

Ψ𝒜(u)=infv{0,1}vulog(v)=𝟏{u>0}.\displaystyle\Psi_{\mathcal{A}}(u)=\inf_{v\in\{0,1\}}v-u\log(v)=\mathbf{1}\{u>0\}.

Suppose (R,D)(R,D) is in 𝔇\mathfrak{RD}. Then

D\displaystyle D k=14αkΨ𝒜(λβkαk)\displaystyle\geq\sum_{k=1}^{4}\alpha_{k}\Psi_{\mathcal{A}}\left(\frac{\lambda\beta_{k}}{\alpha_{k}}\right)
=k=14αk𝟏{βk>0}\displaystyle=\sum_{k=1}^{4}\alpha_{k}\mathbf{1}\{\beta_{k}>0\}
=kαk,\displaystyle=\sum_{k\in\mathcal{B}}\alpha_{k},

where we have defined ={k:βk>0}\mathcal{B}=\{k:\beta_{k}>0\}. Similarly,

R\displaystyle R λk=14βklog(βkαk)\displaystyle\geq\lambda\sum_{k=1}^{4}\beta_{k}\log\left(\frac{\beta_{k}}{\alpha_{k}}\right)
=λkβklog(βkαk)\displaystyle=\lambda\sum_{k\in\mathcal{B}}\beta_{k}\log\left(\frac{\beta_{k}}{\alpha_{k}}\right)
(a)λ(kβk)log(kβkkαk)\displaystyle\overset{(a)}{\geq}\lambda\left(\sum_{k\in\mathcal{B}}\beta_{k}\right)\log\left(\frac{\sum_{k\in\mathcal{B}}\beta_{k}}{\sum_{k\in\mathcal{B}}\alpha_{k}}\right)
=λlog(1kαk)\displaystyle=\lambda\log\left(\frac{1}{\sum_{k\in\mathcal{B}}\alpha_{k}}\right)
(λlog(D))+,\displaystyle\geq\left(-\lambda\log(D)\right)^{+},

where (a) is due to the log-sum inequality. Equality throughout can be achieved by setting α1=min(1,D)\alpha_{1}=\min(1,D), α2=1α1\alpha_{2}=1-\alpha_{1}, β1=1\beta_{1}=1, and β2=0\beta_{2}=0. ∎
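
The achieving choice at the end of the proof can be checked numerically; the sketch below (hypothetical values of λ and D) evaluates the rate and distortion bounds for α₁ = min(1, D), β₁ = 1 and compares the rate with (−λ log D)⁺.

```python
import numpy as np

# Sketch of the achieving choice in Corollary 3 (hypothetical lam and D):
# alpha_1 = min(1, D), alpha_2 = 1 - alpha_1, beta_1 = 1, beta_2 = 0.
lam = 2.0
for D in [0.1, 0.5, 1.0]:
    alpha = np.array([min(1.0, D), 1.0 - min(1.0, D), 0.0, 0.0])
    beta = np.array([1.0, 0.0, 0.0, 0.0])
    support = beta > 0                               # the set B = {k : beta_k > 0}
    R = lam * np.sum(beta[support] * np.log(beta[support] / alpha[support]))
    distortion_bound = np.sum(alpha[support])        # sum of alpha_k over B
    print(D, R, max(-lam * np.log(D), 0.0), distortion_bound)
```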

Remark 2

As in the general case in Theorem 4 (see Remark 1), the reconstruction y^0T\hat{y}_{0}^{T} (assuming it is bounded) can be used to define a probability measure on the input space (𝒩0T,𝔉N)(\mathcal{N}_{0}^{T},\mathfrak{F}^{N}) via

dPy^0TdP0(y0T)=exp(0Tlog(y^t)𝑑yt(y^t1)dt),\frac{dP_{\hat{y}_{0}^{T}}}{dP_{0}}(y_{0}^{T})=\exp\left(\int_{0}^{T}\log(\hat{y}_{t})\,dy_{t}-(\hat{y}_{t}-1)\,dt\right),

where P0P_{0} is the measure under which Y0TY_{0}^{T} is a Poisson process with unit rate. Moreover, in the absence of feedforward, y^0T\hat{y}_{0}^{T} is deterministic (it depends only on the encoder’s output). Thus the input point process Y0TY_{0}^{T} is a non-homogeneous Poisson process with rate y^0T\hat{y}_{0}^{T} under Py^0TP_{\hat{y}_{0}^{T}}. As in the general case, the functional-covering distortion is related to the above Radon-Nikodym derivative via

d(y^0T,y0T)=log(dPy^0TdP0(y0T))+Td(\hat{y}_{0}^{T},y_{0}^{T})=-\log\left(\frac{dP_{\hat{y}_{0}^{T}}}{dP_{0}}(y_{0}^{T})\right)+T

\Diamond
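
As a small numerical illustration of the relation in Remark 2, the following sketch evaluates both sides for a piecewise-constant reconstruction y^₀ᵀ and one simulated realization of the arrival times; the breakpoints and levels are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Numeric illustration of Remark 2 (hypothetical piecewise-constant yhat):
# d(yhat, y) should equal -log(dP_yhat/dP0 (y)) + T.
T = 10.0
arrivals = np.sort(rng.uniform(0.0, T, size=rng.poisson(2.0 * T)))
edges = np.array([0.0, 3.0, 7.0, T])           # breakpoints of yhat
levels = np.array([1.5, 0.5, 2.0])             # value of yhat on each piece

def int_dt(f_levels):
    # integral over [0, T] of a piecewise-constant function
    return np.sum(f_levels * np.diff(edges))

def int_dY(f_levels):
    # sum of the function values at the arrival times (integral against dY)
    idx = np.clip(np.searchsorted(edges, arrivals, side="left") - 1,
                  0, len(levels) - 1)
    return np.sum(f_levels[idx])

d = int_dt(levels) - int_dY(np.log(levels))
log_rn = int_dY(np.log(levels)) - int_dt(levels - 1.0)   # log of dP_yhat/dP0
print(d, -log_rn + T)                                    # the two agree
```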

VI The Poisson CEO Problem

Figure 1: Poisson CEO Problem.

We now consider the distributed problem shown in Figure 1. Our goal is to compress Y0TY_{0}^{T}, which is a Poisson process with rate λ>0\lambda>0. Each of the two encoders observes a degraded version of Y0TY_{0}^{T}, denoted by Y0(i),TY_{0}^{(i),T}, i{1,2}i\in\{1,2\}. Y0TY_{0}^{T} is first p(i)p^{(i)}-thinned to obtain Y~0(i),T\tilde{Y}_{0}^{(i),T}, and then an independent Poisson process N0(i),TN_{0}^{(i),T} with rate μ(i)\mu^{(i)} is added to Y~0(i),T\tilde{Y}_{0}^{(i),T} to obtain Y0(i),TY_{0}^{(i),T}.

Recall that 𝒴^0T\hat{\mathcal{Y}}_{0}^{T} is the set of all non-negative functions y^0T\hat{y}_{0}^{T} which are left-continuous with right-limits, and

d(y^0T,y0T)=0Ty^t𝑑tlog(y^t)dyt.d(\hat{y}_{0}^{T},y_{0}^{T})=\int_{0}^{T}\hat{y}_{t}\,dt-\log(\hat{y}_{t})\,dy_{t}.
Definition 13

A (T,R(1),R(2),D)(T,R^{(1)},R^{(2)},D) code for the Poisson CEO problem consists of encoders f(1)f^{(1)} and f(2)f^{(2)},

f(1):𝒩0T{1,,exp(R(1)T)}\displaystyle f^{(1)}:\mathcal{N}_{0}^{T}\rightarrow\{1,\dots,\lceil\exp(R^{(1)}T)\rceil\}
f(2):𝒩0T{1,,exp(R(2)T)},\displaystyle f^{(2)}:\mathcal{N}_{0}^{T}\rightarrow\{1,\dots,\lceil\exp(R^{(2)}T)\rceil\},

and a decoder gg,

g:{1,,exp(R(1)T)}×{1,,exp(R(2)T)}𝒴^0T,\displaystyle g:\{1,\dots,\lceil\exp(R^{(1)}T)\rceil\}\times\{1,\dots,\lceil\exp(R^{(2)}T)\rceil\}\rightarrow\hat{\mathcal{Y}}_{0}^{T},

satisfying

𝔼[0TY^t𝑑t]<,\displaystyle\mathbb{E}\left[\int_{0}^{T}\hat{Y}_{t}\,dt\right]<\infty,

and the distortion constraint

1T𝔼[d(Y^0T,Y0T)]D.\displaystyle\frac{1}{T}\mathbb{E}\left[d(\hat{Y}_{0}^{T},Y_{0}^{T})\right]\leq D.
Definition 14

A rate-distortion vector (R(1),R(2),D)(R^{(1)},R^{(2)},D) is said to be achievable for the Poisson CEO problem if for any ϵ>0\epsilon>0, there exists a sequence of (Tn,R(1)+ϵ,R(2)+ϵ,D+ϵ)(T_{n},R^{(1)}+\epsilon,R^{(2)}+\epsilon,D+\epsilon) codes with TnT_{n}\to\infty.

Definition 15

The rate-distortion region for the Poisson CEO problem 𝔇𝒫\mathfrak{RD}^{\mathcal{P}} is the set of all achievable rate-distortion vectors (R(1),R(2),D)(R^{(1)},R^{(2)},D).

The rate-distortion region for the Poisson CEO problem with feedforward, denoted by 𝔇F𝒫\mathfrak{RD}^{\mathcal{P}}_{F}, is defined analogously.

Theorem 6

The rate-distortion region for the Poisson CEO problem is given by

𝔇𝒫=𝔇F𝒫=𝔇,\mathfrak{RD}^{\mathcal{P}}=\mathfrak{RD}^{\mathcal{P}}_{F}=\mathfrak{RD},

where 𝔇\mathfrak{RD} is the convex hull of the union of sets of rate-distortion vectors (R(1),R(2),D)(R^{(1)},R^{(2)},D) such that

R(1)((1p(1))λ+μ(1))k=14βk(1)log(βk(1)αk(1)),\displaystyle R^{(1)}\geq\left((1-p^{(1)})\lambda+\mu^{(1)}\right)\sum_{k=1}^{4}\beta^{(1)}_{k}\log\left(\frac{\beta^{(1)}_{k}}{\alpha^{(1)}_{k}}\right),
R(2)((1p(2))λ+μ(2))k=14βk(2)log(βk(2)αk(2)),\displaystyle R^{(2)}\geq\left((1-p^{(2)})\lambda+\mu^{(2)}\right)\sum_{k=1}^{4}\beta^{(2)}_{k}\log\left(\frac{\beta^{(2)}_{k}}{\alpha^{(2)}_{k}}\right),
Dλϕ(λ)λ(k=14γk(1)log(γk(1)αk(1))+k=14γk(2)log(γk(2)αk(2)))\displaystyle D\geq\lambda-\phi(\lambda)-\lambda\left(\sum_{k=1}^{4}\gamma^{(1)}_{k}\log\left(\frac{\gamma^{(1)}_{k}}{\alpha^{(1)}_{k}}\right)+\sum_{k=1}^{4}\gamma^{(2)}_{k}\log\left(\frac{\gamma^{(2)}_{k}}{\alpha^{(2)}_{k}}\right)\right)

for some probability vectors [αk(i)]k=14[\alpha^{(i)}_{k}]_{k=1}^{4}, [βk(i)]k=14[\beta^{(i)}_{k}]_{k=1}^{4}, and [γk(i)]k=14[\gamma^{(i)}_{k}]_{k=1}^{4}, where for k{1,2,3,4}k\in\{1,2,3,4\} and i{1,2}i\in\{1,2\}

γk(i)=p(i)αk(i)+(1p(i))βk(i)αk(i)=0βk(i)=0}if p(i)<1,\displaystyle\begin{rcases}\gamma^{(i)}_{k}=p^{(i)}\alpha^{(i)}_{k}+(1-p^{(i)})\beta^{(i)}_{k}\\ \alpha^{(i)}_{k}=0\Rightarrow\beta^{(i)}_{k}=0\end{rcases}\quad\text{if }p^{(i)}<1,
αk(i)=βk(i)=γk(i)if p(i)=1.\displaystyle\alpha^{(i)}_{k}=\beta^{(i)}_{k}=\gamma^{(i)}_{k}\quad\qquad\qquad\quad\quad\,\,\,\text{if }p^{(i)}=1.
Proof:

Please see the supplementary material. ∎

Remark 3

Note that there is no sum-rate constraint in the rate-distortion region of the above theorem. This occurs due to the sparsity of points in a Poisson process. After discretizing a Poisson process with rate λ\lambda, the expected number of ones in the resulting binary process is roughly λT\lambda T, and the remaining T/ΔλTT/\Delta-\lambda T bits are zeroes. When such a sparse binary process is sent via two independent parallel channels as in (46)-(47), the resulting output processes are almost independent. This implies that the encoders do not need to bin their messages in the achievability argument.

Corollary 4 (Poisson CEO Problem without Thinning)

If p(1)=p(2)=0p^{(1)}=p^{(2)}=0, then the rate-distortion region in Theorem 6 takes a simple form

λλ+μ(1)R(1)+λλ+μ(2)R(2)+Dλϕ(λ).\frac{\lambda}{\lambda+\mu^{(1)}}R^{(1)}+\frac{\lambda}{\lambda+\mu^{(2)}}R^{(2)}+D\geq\lambda-\phi(\lambda).
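
For concreteness, the half-space of Corollary 4 can be evaluated directly; the sketch below (hypothetical λ, μ⁽¹⁾, μ⁽²⁾) computes the smallest distortion on the boundary of this half-space for a few rate pairs.

```python
import numpy as np

# Boundary of the half-space in Corollary 4 (hypothetical lam, mu1, mu2):
#   lam/(lam+mu1)*R1 + lam/(lam+mu2)*R2 + D >= lam - phi(lam),  phi(u) = u*log(u).
lam, mu1, mu2 = 2.0, 1.0, 0.5

def boundary_distortion(R1, R2):
    return (lam - lam * np.log(lam)
            - lam / (lam + mu1) * R1
            - lam / (lam + mu2) * R2)

for R1, R2 in [(0.0, 0.0), (0.3, 0.0), (0.3, 0.3)]:
    print(R1, R2, boundary_distortion(R1, R2))
```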
Corollary 5 (Remote Poisson Source)

Consider a scenario where an encoder wishes to compress a Poisson process with rate λ>0\lambda>0, but observes a degraded version of it, in which each point is first erased independently with probability pp and then an independent Poisson process with rate μ\mu is added. Then the rate-distortion region is the convex hull of the union of all rate-distortion vectors (R,D)(R,D) satisfying

R((1p)λ+μ)k=14βklog(βkαk),\displaystyle R\geq\left((1-p)\lambda+\mu\right)\sum_{k=1}^{4}\beta_{k}\log\left(\frac{\beta_{k}}{\alpha_{k}}\right),
Dλϕ(λ)λk=14γklog(γkαk),\displaystyle D\geq\lambda-\phi(\lambda)-\lambda\cdot\sum_{k=1}^{4}\gamma_{k}\log\left(\frac{\gamma_{k}}{\alpha_{k}}\right),

for some probability vectors [αk]k=14[\alpha_{k}]_{k=1}^{4}, [βk]k=14[\beta_{k}]_{k=1}^{4}, and [γk]k=14[\gamma_{k}]_{k=1}^{4}, where for k{1,2,3,4}k\in\{1,2,3,4\}

γk=pαk+(1p)βk,αk=0βk=0.\displaystyle\gamma_{k}=p\alpha_{k}+(1-p)\beta_{k},\quad\alpha_{k}=0\Rightarrow\beta_{k}=0.
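
The region in Corollary 5 can be explored numerically by sampling probability vectors; the rough sketch below (hypothetical λ, p, μ; a crude random search rather than an exact characterization of the frontier) records the smallest sampled distortion compatible with a given rate budget.

```python
import numpy as np

rng = np.random.default_rng(3)

# Crude random search over the region of Corollary 5 (hypothetical lam, p, mu);
# this only samples points of the region, it does not compute the exact frontier.
lam, p, mu = 2.0, 0.3, 0.5
pairs = []
for _ in range(20000):
    alpha = rng.dirichlet(np.ones(4))
    beta = rng.dirichlet(np.ones(4))
    gamma = p * alpha + (1.0 - p) * beta
    R = ((1.0 - p) * lam + mu) * np.sum(beta * np.log(beta / alpha))
    D = lam - lam * np.log(lam) - lam * np.sum(gamma * np.log(gamma / alpha))
    pairs.append((R, D))
pairs = np.array(pairs)

for R_budget in [0.5, 1.0, 2.0]:
    feasible = pairs[pairs[:, 0] <= R_budget]
    if feasible.size:
        print(R_budget, feasible[:, 1].min())
```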

References

  • [1] N. V. Shende and A. B. Wagner, “Functional covering of point processes,” in IEEE Int. Symp. Info. Theory, 2019, pp. 2039–2043.
  • [2] T. Berger, Rate Distortion Theory: A Mathematical Basis for Data Compression.   Englewood Cliffs, NJ: Prentice Hall, 1971.
  • [3] D. H. Johnson, “Point process models of single-neuron discharges,” Journal of Computational Neuroscience, vol. 3, no. 4, pp. 275–299, 1996.
  • [4] J. H. Goldwyn, J. T. Rubinstein, and E. Shea-Brown, “A point process framework for modeling electrical stimulation of the auditory nerve,” Journal of Neurophysiology, vol. 108, no. 5, pp. 1430–1452, 2012.
  • [5] F. Farkhooi, M. F. Strube-Bloss, and M. P. Nawrot, “Serial correlation in neural spike trains: Experimental evidence, stochastic modeling, and single neuron variability,” Physical Review E, vol. 79, no. 2, p. 021905, 2009.
  • [6] S. V. Sarma, U. T. Eden, M. L. Cheng, Z. M. Williams, R. Hu, E. Eskandar, and E. N. Brown, “Using point process models to compare neural spiking activity in the subthalamic nucleus of Parkinson’s patients and a healthy primate,” IEEE Transactions on Biomedical Engineering, vol. 57, no. 6, pp. 1297–1305, 2010.
  • [7] E. N. Brown, R. E. Kass, and P. P. Mitra, “Multiple neural spike train data analysis: state-of-the-art and future challenges,” Nature Neuroscience, vol. 7, no. 5, pp. 456–461, 2004.
  • [8] F. Rieke, D. Warland, R. de Ruyter van Steveninck, and W. Bialek, Spikes: Exploring the Neural Code.   MIT Press, 1997.
  • [9] J. Giles and B. Hajek, “An information-theoretic and game-theoretic study of timing channels,” IEEE Trans. on Inf. Theory, vol. 48, no. 9, pp. 2455–2477, 2002.
  • [10] M. Shahzad and A. X. Liu, “Accurate and efficient per-flow latency measurement without probing and time stamping,” IEEE/ACM Trans. Networking, vol. 24, no. 6, pp. 3477–3492, 2016.
  • [11] Y. Zhu, X. Fu, B. Graham, R. Bettati, and W. Zhao, “On flow correlation attacks and countermeasures in mix networks,” in Proc. 4th Privacy Enhancement Technology Workshop (PET), 2004.
  • [12] C. B. Attila Börcs, “A marked point process model for vehicle detection in aerial LIDAR point clouds,” in ISPRS Ann. Photogrammetry, Remote Sens. and Spatial Inf. Sci, vol. 1-3, 2012, pp. 93–98.
  • [13] Y. Yu, J. Li, H. Guan, C. Wang, and M. Cheng, “A marked point process for automated tree detection from mobile laser scanning point cloud data,” in 2012 Intl. Conf. Comp. Vision in Remote Sensing, 2012, pp. 140–145.
  • [14] S. Nakamoto, “Bitcoin: A peer-to-peer electronic cash system,” 2008. [Online]. Available: bitcoin.org/bitcoin
  • [15] Y. Lewenberg, Y. Bachrach, Y. Sompolinsky, A. Zohar, and J. S. Rosenschein, “Bitcoin mining pools: A cooperative game theoretic analysis,” in Proc. 2015 Int. Conf. Autonomous Agents and Multiagent Sys., 2015, p. 919–927.
  • [16] Y. Kawase and S. Kasahara, “Transaction-confirmation time for bitcoin: A queueing analytical approach to blockchain mechanism,” in Intl. Conf. on Queueing Theory and Network App., 2017, p. 75–88.
  • [17] C. Decker and R. Wattenhofer, “Information propagation in the bitcoin network,” in IEEE P2P 2013 Proc., 2013, p. 1–10.
  • [18] A. Laourine and A. B. Wagner, “Secrecy capacity of the degraded Poisson wiretap channel,” in Proc. IEEE Intl. Symp. Inf. Theory, Jun. 2010, pp. 2553–2557.
  • [19] A. D. Wyner, “Capacity and error exponent for the direct detection photon channel—Part I,” IEEE Trans. Inf. Theory, vol. 34, no. 6, pp. 1449–1461, Nov. 1988.
  • [20] ——, “Capacity and error exponent for the direct detection photon channel—Part II,” IEEE Trans. Inf. Theory, vol. 34, no. 6, pp. 1462–1471, Nov. 1988.
  • [21] A. Lapidoth, “On the reliability function of the ideal Poisson channel with noiseless feedback,” IEEE Trans. Inf. Theory, vol. 39, no. 2, pp. 491–503, Mar. 1993.
  • [22] N. Shende and A. B. Wagner, “The stochastic-calculus approach to multiple-decoder Poisson channels,” IEEE Trans. Inf. Theory, vol. 65, no. 8, pp. 5007–5027, Aug. 2019.
  • [23] F. Baccelli and P. Brémaud, Palm Probabilities and Stationary Queues.   Springer-Verlag, 1987.
  • [24] C. Sutardja and J. M. Rabaey, “Isolator-less near-field RFID reader for sub-cranial powering/data link of millimeter-sized implants,” IEEE Journal of Solid-State Circuits, vol. 53, no. 7, pp. 2032–2042, 2018.
  • [25] A. K. Skrivervik, A. J. M. Montes, I. V. Trivino, M. Bosiljevac, M. Veljovic, and Z. Sipus, “Antenna design for a cranial implant,” in 2020 Intl. Work. Antenna Tech. (iWAT), 2020, pp. 1–4.
  • [26] R. L. de Queiroz and P. A. Chou, “Compression of 3D point clouds using a region-adaptive hierarchical transform,” IEEE Trans. on Image Proc., vol. 25, no. 8, pp. 3947–3956, 2016.
  • [27] T. Golla and R. Klein, “Real-time point cloud compression,” in 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2015, pp. 5087–5092.
  • [28] C. Tu, E. Takeuchi, C. Miyajima, and K. Takeda, “Compressing continuous point cloud data using image compression methods,” in 2016 IEEE 19th Intl. Conf. Intell. Transportation Sys. (ITSC), 2016, pp. 1712–1719.
  • [29] R. Gallager, “Basic limits on protocol information in data communication networks,” IEEE Trans. Info. Theory, vol. 22, no. 4, pp. 385–398, 1976.
  • [30] A. S. Bedekar, “On the information about message arrival times required for in-order decoding,” in IEEE Int. Symp. Inf. Theory, June 2001, p. 227.
  • [31] S. Verdú, “The exponential distribution in information theory.” Probl. Inf. Transm., vol. 32, no. 1, pp. 86–95, 1996.
  • [32] T. P. Coleman, N. Kiyavash, and V. G. Subramanian, “The rate-distortion function of a Poisson process with a queueing distortion measure,” in Data Compress. Conf, Mar 2008, pp. 63–72.
  • [33] I. Rubin, “Information rates and data-compression schemes for Poisson processes,” IEEE Trans. Info. Theory, vol. 20, no. 2, pp. 200–210, 1974.
  • [34] G. Koliander, D. Schuhmacher, and F. Hlawatsch, “Rate-distortion theory of finite point processes,” IEEE Trans. Info. Theory, vol. 64, no. 8, pp. 5832–5861, 2018.
  • [35] A. Lapidoth, A. Malar, and L. Wang, “Covering point patterns,” IEEE Trans. Info. Theory, vol. 61, no. 9, pp. 4521–4533, 2015.
  • [36] H.-A. Shen, S. M. Moser, and J.-P. Pfister, “Rate-distortion problems of the Poisson process based on a group-theoretic approach,” 2022. [Online]. Available: https://arxiv.org/abs/2202.13684
  • [37] T. A. Courtade and R. D. Wesel, “Multiterminal source coding with an entropy-based distortion measure,” in IEEE Int. Symp. Info. Theory, Jul 2011, pp. 2040–2044.
  • [38] T. A. Courtade and T. Weissman, “Multiterminal source coding under logarithmic loss,” IEEE Trans. Info. Theory, vol. 60, no. 1, pp. 740–761, 2014.
  • [39] P. Brémaud, Point Processes and Queues: Martingale Dynamics.   Springer-Verlag, 1981.
  • [40] Y. Kabanov, “The capacity of a channel of the Poisson type,” Theory of Probability and Applications, vol. 23, pp. 143–147, 1978.
  • [41] M. Davis, “Capacity and cutoff rate for Poisson-type channels,” IEEE Trans. Info. Theory, vol. 26, no. 6, pp. 710–715, Nov 1980.
  • [42] N. V. Shende and A. B. Wagner, “The stochastic-calculus approach to multi-receiver Poisson channels,” IEEE Trans. Info. Theory, vol. 65, no. 8, pp. 5007–5027, Aug 2019.
  • [43] R. Sundaresan and S. Verdú, “Capacity of queues via point-process channels,” IEEE Trans. Info. Theory, vol. 52, no. 6, pp. 2697–2709, June 2006.
  • [44] L. Wang, “A strong data processing inequality for thinning Poisson processes and some applications,” in IEEE Int. Symp. Info. Theory, June 2017, pp. 3180–3184.
  • [45] A. Klenke, Probability Theory: A Comprehensive Course, 2nd ed.   Springer London, 2013.
  • [46] A. Wyner, “A definition of conditional mutual information for arbitrary ensembles,” Information and Control, vol. 38, no. 1, pp. 51 – 59, 1978.
  • [47] I. M. Gel’fand and A. M. Yaglom, “Computation of the amount of information about a stochastic function contained in another such function,” Uspekhi Mat. Nauk, vol. 12, no. 1, pp. 3–52, 1957.
  • [48] R. M. Gray, Entropy and Information Theory.   Springer-Verlag, 1990.
  • [49] O. Kallenberg, Foundations of Modern Probability, 2nd ed.   Springer-Verlag, New York, 2002.
  • [50] R. S. Liptser and A. N. Shiryaev, Statistics of Random Processes II, 2nd ed.   Springer-Verlag Berlin Heidelberg, 2001.
  • [51] Y. Polyanskiy and Y. Wu, “Strong data-processing inequalities for channels and Bayesian networks,” in Convexity and Concentration.   New York, NY: Springer New York, 2017, pp. 211–249.
  • [52] R. Durrett, Essentials of Stochastic Processes, ser. Springer Texts in Statistics.   Springer International Publishing, 2016.
  • [53] R. G. Gallager, Information Theory and Reliable Communication.   New York, NY, USA: John Wiley & Sons, Inc., 1968.
  • [54] R. Rockafellar, Convex Analysis.   Princeton University Press, 1997.
  • [55] C. Dellacherie and P. A. Meyer, Probabilities and Potential B: Theory of Martingales, ser. North-Holland Mathematics Studies.   North-Holland, 1982, vol. 72.
  • [56] A. El Gamal and Y.-H. Kim, Network Information Theory.   Cambridge University Press, 2011.
Proof:

The first part of the lemma is due to [39, T12 Theorem, Chapter VI, p. 187]. To prove the second part we note that 𝔼PY0T[0T|ϕ(Γt)|𝑑t]<\mathbb{E}_{P^{Y_{0}^{T}}}[\int_{0}^{T}|\phi(\Gamma_{t})|\,dt]<\infty implies 𝔼PY0T[0TΓt𝑑t]<\mathbb{E}_{P^{Y_{0}^{T}}}[\int_{0}^{T}\Gamma_{t}\,dt]<\infty, which in turn gives

\int_{0}^{T}(1-\sqrt{\Gamma_{t}})^{2}\,dt\leq\int_{0}^{T}(\Gamma_{t}+1)\,dt<\infty,

PY0T{P^{Y_{0}^{T}}}-a.s. Thus applying [50, Theorem 19.7, p. 343], we conclude that PY0TP0Y0TP^{Y_{0}^{T}}\ll P_{0}^{Y_{0}^{T}}. Hence, from the first part of the lemma

dPY0TdP0Y0T=exp(0Tlog(Λt)𝑑YtΛt+1dt),\displaystyle\frac{dP^{Y_{0}^{T}}}{dP_{0}^{Y_{0}^{T}}}=\exp\left(\int_{0}^{T}\log(\Lambda_{t})\,dY_{t}-\Lambda_{t}+1\,dt\right),

where the uniqueness of intensity [39, T12 Theorem, Chapter II, p. 31] gives us

𝔼PY0T[0T|ΓtΛt|𝑑t]=0,𝔼PY0T[0T𝟏{ΓtΛt}𝑑Yt]=0.\displaystyle\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}|\Gamma_{t}-\Lambda_{t}|\,dt\right]=0,\quad\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}\mathbf{1}\{\Gamma_{t}\neq\Lambda_{t}\}\,dY_{t}\right]=0.

Since

𝔼PY0T[0T|ϕ(Γt)|𝑑t]<,\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}|\phi(\Gamma_{t})|\,dt\right]<\infty,

we have

𝔼PY0T[0T(log(Γt))+𝑑Yt]=𝔼PY0T[0T(log(Γt))+Γt𝑑t]=𝔼PY0T[0T(ϕ(Γt))+𝑑t]<,\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}(\log(\Gamma_{t}))^{+}\,dY_{t}\right]=\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}(\log(\Gamma_{t}))^{+}\Gamma_{t}\,dt\right]=\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}(\phi(\Gamma_{t}))^{+}\,dt\right]<\infty,

and

𝔼PY0T[0T(log(Γt))𝑑Yt]=𝔼PY0T[0T(log(Γt))Γt𝑑t]=𝔼PY0T[0T(ϕ(Γt))𝑑t]<.\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}(\log(\Gamma_{t}))^{-}\,dY_{t}\right]=\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}(\log(\Gamma_{t}))^{-}\Gamma_{t}\,dt\right]=\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}(\phi(\Gamma_{t}))^{-}\,dt\right]<\infty.

Hence

𝔼PY0T[0Tlog(Γt)𝑑Yt]=𝔼PY0T[0Tϕ(Γt)𝑑t]<.\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}\log(\Gamma_{t})\,dY_{t}\right]=\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]<\infty.

Finally,

𝔼PY0T[log(dPY0TdP0Y0T)]\displaystyle\mathbb{E}_{P^{Y_{0}^{T}}}\left[\log\left(\frac{dP^{Y_{0}^{T}}}{dP_{0}^{Y_{0}^{T}}}\right)\right] =𝔼PY0T[0Tlog(Λt)𝑑Yt+Λt1dt]\displaystyle=\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}\log(\Lambda_{t})\,dY_{t}+\Lambda_{t}-1\,dt\right]
=(𝐚)𝔼PY0T[0Tlog(Γt)𝑑Yt+Γt1dt]\displaystyle\mathbf{\overset{(a)}{=}}\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}\log(\Gamma_{t})\,dY_{t}+\Gamma_{t}-1\,dt\right]
=𝔼PY0T[0Tlog(Γt)𝑑Yt]𝔼[0TΓt1dt]\displaystyle=\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}\log(\Gamma_{t})\,dY_{t}\right]-\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}-1\,dt\right]
=𝔼PY0T[0Tϕ(Γt)𝑑t]𝔼[0TΓt1dt]\displaystyle=\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]-\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}-1\,dt\right]
=𝔼PY0T[0Tϕ(Γt)Γt+1dt].\displaystyle=\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}\phi(\Gamma_{t})-\Gamma_{t}+1\,dt\right].

Here, for (a) we have used the uniqueness of the intensity, and in the remaining equalities we have used the finiteness of the expectations 𝔼[0Tϕ(Γt)𝑑t]\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right] and 𝔼[0TΓt𝑑t]\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt\right]. ∎

Proof:

Recall that L0TL_{0}^{T} can be written as

Lt=exp(0tlog(Γs)𝑑Ys+(1Γs)ds),t[0,T].\displaystyle L_{t}=\exp\left(\int_{0}^{t}\log(\Gamma_{s})\,dY_{s}+(1-\Gamma_{s})\,ds\right),\quad t\in[0,T].

We note that for t[0,T]t\in[0,T], LtL_{t} satisfies

Lt={Ltif YtYt=0,ΓtLtif YtYt=1.\displaystyle L_{t}=\begin{cases}L_{t-}&\text{if }Y_{t}-Y_{t-}=0,\\ \Gamma_{t}L_{t-}&\text{if }Y_{t}-Y_{t-}=1.\end{cases} (37)

Let C0TC_{0}^{T} be a non-negative (𝒢t:t[0,T])(\mathcal{G}_{t}:t\in[0,T])-predictable process. Then

𝔼[0TCt𝑑Yt]\displaystyle\mathbb{E}\left[\int_{0}^{T}C_{t}\,dY_{t}\right] =(a)𝔼P~M,Y0T[LT0TCt𝑑Yt]\displaystyle\overset{(a)}{=}\mathbb{E}_{\tilde{P}^{M,Y_{0}^{T}}}\left[L_{T}\int_{0}^{T}C_{t}\,dY_{t}\right]
=(b)𝔼P~M,Y0T[0TLtCt𝑑Yt]\displaystyle\overset{(b)}{=}\mathbb{E}_{\tilde{P}^{M,Y_{0}^{T}}}\left[\int_{0}^{T}L_{t}C_{t}\,dY_{t}\right]
=(c)𝔼P~M,Y0T[0TΓtLtCt𝑑Yt]\displaystyle\overset{(c)}{=}\mathbb{E}_{\tilde{P}^{M,Y_{0}^{T}}}\left[\int_{0}^{T}\Gamma_{t}L_{t-}C_{t}\,dY_{t}\right]
=(d)𝔼P~M,Y0T[0TΓtLtCt𝑑t]\displaystyle\overset{(d)}{=}\mathbb{E}_{\tilde{P}^{M,Y_{0}^{T}}}\left[\int_{0}^{T}\Gamma_{t}L_{t-}C_{t}\,dt\right]
=(e)𝔼P~M,Y0T[0TΓtLtCt𝑑t]\displaystyle\overset{(e)}{=}\mathbb{E}_{\tilde{P}^{M,Y_{0}^{T}}}\left[\int_{0}^{T}\Gamma_{t}L_{t}C_{t}\,dt\right]
=(f)𝔼P~M,Y0T[LT0TΓtCt𝑑t]\displaystyle\overset{(f)}{=}\mathbb{E}_{\tilde{P}^{M,Y_{0}^{T}}}\left[L_{T}\int_{0}^{T}\Gamma_{t}C_{t}\,dt\right]
=(g)𝔼[0TΓtCt𝑑t],\displaystyle\overset{(g)}{=}\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}C_{t}\,dt\right],

where, (a) follows since LTL_{T} is the Radon-Nikodym derivative dPM,Y0TdP~M,Y0T\frac{d{P}^{M,Y_{0}^{T}}}{d\tilde{P}^{M,Y_{0}^{T}}},
(b) follows due to [39, T19 Theorem, Appendix A2, p. 302],
(c) follows due to (37),
(d) follows since the (P~M,Y0T,𝒢t:t[0,T])(\tilde{P}^{M,Y_{0}^{T}},\mathcal{G}_{t}:t\in[0,T])-intensity of Y0TY_{0}^{T} is 1, and LtL_{t-} being a left-continuous adapted process is (𝒢t:t[0,T])(\mathcal{G}_{t}:t\in[0,T])-predictable,
(e) follows since the Lebesgue measure of the set {t:t[0,T],LtLt}\{t:t\in[0,T],L_{t-}\neq L_{t}\} is zero due to (37),
(f) again follows due to [39, T19 Theorem, Appendix A2, p. 302], and
(g) again follows since LTL_{T} is the Radon-Nikodym derivative dPM,Y0TdP~M,Y0T\frac{d{P}^{M,Y_{0}^{T}}}{d\tilde{P}^{M,Y_{0}^{T}}}. ∎

Proof:

We will first show that

𝔼[0T(log(Γt))𝑑Yt]=𝔼[0T(log(Γt))Γt𝑑t]<.\mathbb{E}\left[\int_{0}^{T}\left(\log(\Gamma_{t})\right)^{-}\,dY_{t}\right]=\mathbb{E}\left[\int_{0}^{T}\left(\log(\Gamma_{t})\right)^{-}\Gamma_{t}\,dt\right]<\infty.

Define Γt1+max(Γt,1)\Gamma^{1+}_{t}\triangleq\max(\Gamma_{t},1) and Γt1min(Γt,1)\Gamma^{1-}_{t}\triangleq\min(\Gamma_{t},1). We note that ΓtΓt1+Γt+1\Gamma_{t}\leq\Gamma^{1+}_{t}\leq\Gamma_{t}+1 and Γt=Γt1+Γt1\Gamma_{t}=\Gamma^{1+}_{t}\Gamma^{1-}_{t}. Define the process μ0T\mu_{0}^{T} as

μtΓt1+Γt𝟏{Γt>0},t[0,T].\displaystyle\mu_{t}\triangleq\frac{\Gamma^{1+}_{t}}{\Gamma_{t}}\mathbf{1}\{\Gamma_{t}>0\},\quad t\in[0,T].

Then μ0T\mu_{0}^{T} is a non-negative (𝒢t:t[0,T])(\mathcal{G}_{t}:t\in[0,T])-predictable process and

0TμtΓt𝑑t=0TΓt1+𝟏{Γt>0}𝑑t0T(Γt+1)𝑑t<\int_{0}^{T}\mu_{t}\Gamma_{t}\,dt=\int_{0}^{T}\Gamma^{1+}_{t}\mathbf{1}\{\Gamma_{t}>0\}\,dt\leq\int_{0}^{T}(\Gamma_{t}+1)\,dt<\infty

PP-a.s. since 𝔼[0TΓt𝑑t]<\mathbb{E}[\int_{0}^{T}\Gamma_{t}\,dt]<\infty. Hence the process L^0T\hat{L}_{0}^{T} defined as

\displaystyle\hat{L}_{t}\triangleq\exp\left(\int_{0}^{t}\log(\mu_{s})\,dY_{s}+(1-\mu_{s})\Gamma_{s}\,ds\right),\quad t\in[0,T]

is a non-negative (P,𝒢t:t[0,T])(P,\mathcal{G}_{t}:t\in[0,T]) super-martingale [39, T2 Theorem, Chapter VI, p. 165]. Hence the following chain of inequalities holds

𝔼[log(L^T)]\displaystyle\mathbb{E}\left[\log(\hat{L}_{T})\right] (a)log(𝔼[L^T])\displaystyle\overset{(a)}{\leq}\log\left(\mathbb{E}[\hat{L}_{T}]\right)
(b)log(𝔼[L^0])\displaystyle\overset{(b)}{\leq}\log\left(\mathbb{E}[\hat{L}_{0}]\right)
=0.\displaystyle=0. (38)

Here, for (a) we have used the fact that since L^0T\hat{L}_{0}^{T} is a super-martingale, L^T\hat{L}_{T} is integrable, and then Jensen’s inequality, and
for (b), we have used the fact that L^0T\hat{L}_{0}^{T} is a super-martingale, hence 𝔼[L^T]𝔼[L^0]\mathbb{E}[\hat{L}_{T}]\leq\mathbb{E}[\hat{L}_{0}].
Let τk\tau_{k} denote the kkth arrival instant of the process Y0TY_{0}^{T}, i.e.,

τk=inf{t[0,T]:Yt=k},\displaystyle\tau_{k}=\inf\{t\in[0,T]:Y_{t}=k\},

where the infimum over the empty set is taken to be \infty. Then if τkT\tau_{k}\leq T, Γτk>0\Gamma_{\tau_{k}}>0 PP-a.s. [39, T12 Theorem, Chapter II, p. 31]. Hence for τkT\tau_{k}\leq T,

log(μτk)=log(Γτk1+)log(Γτk)=log(Γτk1)=(log(Γτk))Pa.s.,\displaystyle\log(\mu_{\tau_{k}})=\log(\Gamma^{1+}_{\tau_{k}})-\log(\Gamma_{\tau_{k}})=-\log(\Gamma^{1-}_{\tau_{k}})=\left(\log(\Gamma_{\tau_{k}})\right)^{-}\quad P-\text{a.s.},

Thus we can write

log(L^T)=0T(log(Γt))𝑑Yt+0T(ΓtΓt1+)𝟏{Γt>0}𝑑t.\displaystyle\log(\hat{L}_{T})=\int_{0}^{T}\left(\log(\Gamma_{t})\right)^{-}\,dY_{t}+\int_{0}^{T}(\Gamma_{t}-\Gamma^{1+}_{t})\mathbf{1}\{\Gamma_{t}>0\}\,dt.

Using (38) we obtain

𝔼[0T(log(Γt))𝑑Yt+0T(ΓtΓt1+)𝟏{Γt>0}𝑑t]=𝔼[log(LT^)]0.\displaystyle\mathbb{E}\left[\int_{0}^{T}\left(\log(\Gamma_{t})\right)^{-}\,dY_{t}+\int_{0}^{T}(\Gamma_{t}-\Gamma^{1+}_{t})\mathbf{1}\{\Gamma_{t}>0\}\,dt\right]=\mathbb{E}[\log(\hat{L_{T}})]\leq 0.

We note that 0T(log(Γt))𝑑Yt\int_{0}^{T}\left(\log(\Gamma_{t})\right)^{-}\,dY_{t} is a non-negative random variable, and

|𝔼[0T(ΓtΓt1+)𝟏{Γt>0}𝑑t]|𝔼[0T(Γt+Γt1+)𝑑t]𝔼[0T(2Γt+1)𝑑t]<.\left|\mathbb{E}\left[\int_{0}^{T}(\Gamma_{t}-\Gamma^{1+}_{t})\mathbf{1}\{\Gamma_{t}>0\}\,dt\right]\right|\leq\mathbb{E}\left[\int_{0}^{T}(\Gamma_{t}+\Gamma^{1+}_{t})\,dt\right]\leq\mathbb{E}\left[\int_{0}^{T}(2\Gamma_{t}+1)\,dt\right]<\infty.

Hence we can split the expectation to get

𝔼[0T(log(Γt))𝑑Yt]+𝔼[0T(ΓtΓt1+)𝟏{Γt>0}𝑑t]0,\displaystyle\mathbb{E}\left[\int_{0}^{T}\left(\log(\Gamma_{t})\right)^{-}\,dY_{t}\right]+\mathbb{E}\left[\int_{0}^{T}(\Gamma_{t}-\Gamma^{1+}_{t})\mathbf{1}\{\Gamma_{t}>0\}\,dt\right]\leq 0,

which gives

𝔼[0T(log(Γt))𝑑Yt]𝔼[0T(ΓtΓt1+)𝟏{Γt>0}𝑑t]<.\displaystyle\mathbb{E}\left[\int_{0}^{T}\left(\log(\Gamma_{t})\right)^{-}\,dY_{t}\right]\leq-\mathbb{E}\left[\int_{0}^{T}(\Gamma_{t}-\Gamma^{1+}_{t})\mathbf{1}\{\Gamma_{t}>0\}\,dt\right]<\infty. (39)

Hence

𝔼[0Tlog(Γt)𝑑Yt]\displaystyle\mathbb{E}\left[\int_{0}^{T}\log(\Gamma_{t})\,dY_{t}\right] =𝔼[0T(log(Γt))+𝑑Yt]𝔼[0T(log(Γt))𝑑Yt]\displaystyle=\mathbb{E}\left[\int_{0}^{T}\left(\log(\Gamma_{t})\right)^{+}\,dY_{t}\right]-\mathbb{E}\left[\int_{0}^{T}\left(\log(\Gamma_{t})\right)^{-}\,dY_{t}\right]
=𝔼[0T(log(Γt))+Γt𝑑t]𝔼[0T(log(Γt))Γt𝑑t]\displaystyle=\mathbb{E}\left[\int_{0}^{T}(\log(\Gamma_{t}))^{+}\Gamma_{t}\,dt\right]-\mathbb{E}\left[\int_{0}^{T}\left(\log(\Gamma_{t})\right)^{-}\Gamma_{t}\,dt\right]
=𝔼[0Tϕ(Γt)𝑑t].\displaystyle=\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]. (40)

Proof:

Suppose that Γ0T\Gamma_{0}^{T} is the (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-intensity of Y0TY_{0}^{T}. Then applying [39, T8 Theorem, Chapter II, p. 27] with Xs=1X_{s}=1 proves M0TM_{0}^{T} is a (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-martingale. Now suppose that M0TM_{0}^{T} is a (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-martingale. Consider a simple (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-predictable process C0TC_{0}^{T} of the form

Ct=𝟏{}𝟏{u<tvT}u.\displaystyle C_{t}=\mathbf{1}\{\mathcal{E}\}\mathbf{1}\{u<t\leq v\leq T\}\quad\mathcal{E}\in\mathcal{F}_{u}.

Then

𝔼[0TCs𝑑Ys]\displaystyle\mathbb{E}\left[\int_{0}^{T}C_{s}\,d{Y}_{s}\right] =𝔼[𝟏{}(YvYu)]\displaystyle=\mathbb{E}\left[\mathbf{1}\{\mathcal{E}\}(Y_{v}-Y_{u})\right]
=𝔼[𝟏{}𝔼[(YvYu)|u]]\displaystyle=\mathbb{E}\left[\mathbf{1}\{\mathcal{E}\}\mathbb{E}[(Y_{v}-Y_{u})|\mathcal{F}_{u}]\right]
=(a)𝔼[𝟏{}𝔼[uvΓsds|u]]\displaystyle\overset{(a)}{=}\mathbb{E}\left[\mathbf{1}\{\mathcal{E}\}\mathbb{E}\left[\int_{u}^{v}\Gamma_{s}\,ds\middle|\mathcal{F}_{u}\right]\right]
=𝔼[0TCsΓs𝑑s],\displaystyle=\mathbb{E}\left[\int_{0}^{T}C_{s}\Gamma_{s}\,d{s}\right], (41)

where for (a) we have used the martingale property of M0TM_{0}^{T}. Thus by the monotone class theorem, for all bounded (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-predictable processes C0TC_{0}^{T}, (41) holds (see [39, App. A1, Theorem T5, p. 264]). Then by applying the monotone convergence theorem, we can show that (41) holds for all non-negative (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-predictable processes as well, so that Γ0T\Gamma_{0}^{T} is the (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-intensity of Y0TY_{0}^{T}. ∎

Proof:

There exists a (𝒢t:t[0,T])(\mathcal{G}_{t}:t\in[0,T])-predictable process Π0T\Pi_{0}^{T} such that PP-a.s. Πt=𝔼[Λt|𝒢t]\Pi_{t}=\mathbb{E}[{\Lambda}_{t}|\mathcal{G}_{t-}], t[0,T]t\in[0,T] [55, Chapter 6, Theorem 43, p. 103]. We will show that Π0T\Pi_{0}^{T} is the (𝒢t:t[0,T])(\mathcal{G}_{t}:t\in[0,T])-intensity of N0T{N}_{0}^{T}. Let D0T{D}_{0}^{T} be a non-negative (𝒢t:t[0,T])(\mathcal{G}_{t}:t\in[0,T])-predictable process. As 𝒢tt\mathcal{G}_{t}\subseteq\mathcal{F}_{t}, it is also (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-predictable. Thus

𝔼[0TDs𝑑Ns]=𝔼[0TDsΛs𝑑s].\displaystyle\mathbb{E}\left[\int_{0}^{T}D_{s}\,d{{N}}_{s}\right]=\mathbb{E}\left[\int_{0}^{T}D_{s}{\Lambda}_{s}\,ds\right]. (42)

Hence

𝔼[0TDsΠs𝑑s]\displaystyle\mathbb{E}\left[\int_{0}^{T}D_{s}\Pi_{s}\,ds\right] =𝔼[0TDs𝔼[Λs|𝒢s]𝑑s]\displaystyle=\mathbb{E}\left[\int_{0}^{T}D_{s}\mathbb{E}[{\Lambda}_{s}|\mathcal{G}_{s-}]\,ds\right]
=(a)𝔼[0T𝔼[DsΛs|𝒢s]𝑑s]\displaystyle\stackrel{{\scriptstyle(a)}}{{=}}\mathbb{E}\left[\int_{0}^{T}\mathbb{E}[D_{s}{\Lambda}_{s}|\mathcal{G}_{s-}]\,ds\right]
=𝔼[0TDsΛs𝑑s]\displaystyle=\mathbb{E}\left[\int_{0}^{T}D_{s}{\Lambda}_{s}\,ds\right]
=(b)𝔼[0TDs𝑑Ns].\displaystyle\stackrel{{\scriptstyle(b)}}{{=}}\mathbb{E}\left[\int_{0}^{T}D_{s}\,d{{N}}_{s}\right].

Here, (a) is due to the fact that DsD_{s} is 𝒢s\mathcal{G}_{s-} measurable [39, Exercise E10, Chapter I, p. 9], and
(b) is due to (42).

Hence the (𝒢t:t[0,T])\left(\mathcal{G}_{t}:t\in[0,T]\right)-intensity of N0T{N}_{0}^{{T}} is Π0T\Pi_{0}^{{T}}. ∎

Proof:

We first note that since Y0TY_{0}^{T} and N0TN_{0}^{T} are independent, trajectories of Z0TZ_{0}^{T} are a.s. in 𝒩0T\mathcal{N}_{0}^{T}. The (~tσ(M,Y0t,N0t):t[0,T])(\tilde{\mathcal{F}}_{t}\triangleq\sigma(M,Y_{0}^{t},N_{0}^{t}):t\in[0,T])-intensities of Y0TY_{0}^{T} and N0TN_{0}^{T} are Γ0T\Gamma_{0}^{T} and Π0T\Pi_{0}^{T} respectively [39, E5 Exercise, Chapter II, p. 28]. Then for a non-negative (~t:t[0,T])(\tilde{\mathcal{F}}_{t}:t\in[0,T])-predictable process C0TC_{0}^{T}:

𝔼[0TCs𝑑Zs]\displaystyle\mathbb{E}\left[\int_{0}^{T}C_{s}\,d{Z}_{s}\right] =𝔼[0TCs𝑑Ys]+𝔼[0TCs𝑑Ns]\displaystyle=\mathbb{E}\left[\int_{0}^{T}C_{s}\,d{Y}_{s}\right]+\mathbb{E}\left[\int_{0}^{T}C_{s}\,d{N}_{s}\right]
=𝔼[0TCsΓs𝑑s]+𝔼[0TCsΠs𝑑s]\displaystyle=\mathbb{E}\left[\int_{0}^{T}C_{s}\Gamma_{s}\,d{s}\right]+\mathbb{E}\left[\int_{0}^{T}C_{s}\Pi_{s}\,ds\right]
=𝔼[0TCs(Γs+Πs)𝑑s].\displaystyle=\mathbb{E}\left[\int_{0}^{T}C_{s}(\Gamma_{s}+\Pi_{s})\,d{s}\right].

Hence the (~t:t[0,T])(\tilde{\mathcal{F}}_{t}:t\in[0,T])-intensity of Z0TZ_{0}^{T} is (Γt+Πt:t[0,T])(\Gamma_{t}+\Pi_{t}:t\in[0,T]). Since t~t\mathcal{F}_{t}\subseteq\tilde{\mathcal{F}}_{t} the statement of the lemma follows from an application of Lemma 8. ∎

Proof:

Let tσ(M,Y0t,Z0t)\mathcal{H}_{t}\triangleq\sigma(M,Y_{0}^{t},Z_{0}^{t}). We note that the (t:t[0,T])(\mathcal{H}_{t}:t\in[0,T])-intensity of Y0TY_{0}^{T} is Γ0T\Gamma_{0}^{T}. Now we will compute the (t:t[0,T])(\mathcal{H}_{t}:t\in[0,T])-intensity of Z0TZ_{0}^{T}. Let (χi:i{1,})(\chi_{i}:i\in\{1,\dots\}) denote the sequence of independent and identically distributed Bernoulli random variables which indicate if a particular point in point process Y0T{Y}_{0}^{T} is erased or not. In particular, if χj=1\chi_{j}=1, then the jjth point in Y0TY_{0}^{T} is retained, so that 𝔼[χj]=1p\mathbb{E}[\chi_{j}]=1-p. Then for 0u<vT0\leq u<v\leq T

ZvZu=k=Yu+1Yvχk=k=1χk𝟏{Yu<kYv}.\displaystyle Z_{v}-Z_{u}=\sum_{k=Y_{u}+1}^{Y_{v}}\chi_{k}=\sum_{k=1}^{\infty}\chi_{k}\mathbf{1}\{Y_{u}<k\leq Y_{v}\}.

Using the monotone convergence theorem for the conditional expectation,

𝔼[(ZvZu)|u]\displaystyle\mathbb{E}\left[(Z_{v}-Z_{u})|\mathcal{H}_{u}\right] =k=1𝔼[χk𝟏{Yu<kYv}|u]\displaystyle=\sum_{k=1}^{\infty}\mathbb{E}[\chi_{k}\mathbf{1}\{Y_{u}<k\leq Y_{v}\}|\mathcal{H}_{u}]
=(a)k=1𝔼[χk|u]𝔼[𝟏{Yu<kYv}|u]\displaystyle\overset{(a)}{=}\sum_{k=1}^{\infty}\mathbb{E}[\chi_{k}|\mathcal{H}_{u}]\mathbb{E}[\mathbf{1}\{Y_{u}<k\leq Y_{v}\}|\mathcal{H}_{u}]
=(b)(1p)𝔼[(YvYu)|u]\displaystyle\overset{(b)}{=}(1-p)\mathbb{E}[(Y_{v}-Y_{u})|\mathcal{H}_{u}]
=(c)(1p)𝔼[uvΓsds|u],\displaystyle\overset{(c)}{=}(1-p)\mathbb{E}\left[\int_{u}^{v}\Gamma_{s}\,ds\middle|\mathcal{H}_{u}\right],

where, for (a) we have used the fact that given u\mathcal{H}_{u}, χk\chi_{k} is independent of Y0TY_{0}^{T},
for (b), we note that 𝔼[χk|u]=χk𝟏{kYu}+(1p)𝟏{k>Yu}\mathbb{E}[\chi_{k}|\mathcal{H}_{u}]=\chi_{k}\mathbf{1}\{k\leq Y_{u}\}+(1-p)\mathbf{1}\{k>Y_{u}\}, and
for (c), we have used the martingale property of M0TM_{0}^{T}.
Then

M~tZt0t(1p)Γs𝑑st[0,T]\tilde{M}_{t}\triangleq Z_{t}-\int_{0}^{t}(1-p)\Gamma_{s}\,ds\quad t\in[0,T]

is a (t:t[0,T])(\mathcal{H}_{t}:t\in[0,T])-martingale. Hence from Lemma 7, the (t:t[0,T])(\mathcal{H}_{t}:t\in[0,T])-intensity of Z0TZ_{0}^{T} is ((1p)Γt:t[0,T])((1-p)\Gamma_{t}:t\in[0,T]). An application of Lemma 8 then proves the statement of the lemma. ∎

Proof:

We will require the following inequality

ulog(v)ϕ(u)u+v,0u,v<.\displaystyle u\log(v)\leq\phi(u)-u+v,\quad 0\leq u,v<\infty. (43)

The inequality clearly holds if uu or vv (or both) is zero. If u,v>0u,v>0, the inequality follows from log(u/v)(1v/u)\log(u/v)\geq(1-v/u).
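
A quick numerical check of (43) over a grid of strictly positive (u, v) pairs is given below; it is only a sanity check, not part of the proof.

```python
import numpy as np

# Numerical check of inequality (43) on a grid of strictly positive (u, v):
# u*log(v) <= phi(u) - u + v with phi(u) = u*log(u).
u = np.linspace(0.01, 5.0, 200)
v = np.linspace(0.01, 5.0, 200)
U, V = np.meshgrid(u, v)
gap = (U * np.log(U) - U + V) - U * np.log(V)
print(gap.min() >= -1e-12)   # True: the gap equals U*(log(U/V) - 1 + V/U) >= 0
```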

Defining Y^t1+max(1,Y^t)\hat{Y}^{1+}_{t}\triangleq\max(1,\hat{Y}_{t}), we note that Y^t1+Y^t+1\hat{Y}^{1+}_{t}\leq\hat{Y}_{t}+1. Consider

𝔼[0T(log(Y^t))+𝑑Yt]\displaystyle\mathbb{E}\left[\int_{0}^{T}(\log(\hat{Y}_{t}))^{+}\,dY_{t}\right] =𝔼[0Tlog(Y^t1+)𝑑Yt]\displaystyle=\mathbb{E}\left[\int_{0}^{T}\log(\hat{Y}^{1+}_{t})\,dY_{t}\right]
=(a)𝔼[0Tlog(Y^t1+)Γt𝑑t]\displaystyle\overset{(a)}{=}\mathbb{E}\left[\int_{0}^{T}\log(\hat{Y}^{1+}_{t})\Gamma_{t}\,dt\right]
(b)𝔼[0Tϕ(Γt)Γt+Y^t1+dt]\displaystyle\overset{(b)}{\leq}\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})-\Gamma_{t}+\hat{Y}^{1+}_{t}\,dt\right]
=(c)𝔼[0Tϕ(Γt)𝑑t]𝔼[0TΓt𝑑t]+𝔼[0TY^t1+𝑑t]\displaystyle\overset{(c)}{=}\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]-\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt\right]+\mathbb{E}\left[\int_{0}^{T}\hat{Y}^{1+}_{t}\,dt\right]
<,\displaystyle<\infty, (44)

where, for (a), we have used the facts that (Y^t1+:t[0,T])(\hat{Y}^{1+}_{t}:t\in[0,T]) is (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-predictable, log(Y^t1+)\log(\hat{Y}^{1+}_{t}) is non-negative, and Γ0T\Gamma_{0}^{T} is the (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-intensity of Y0TY_{0}^{T},
for (b), we note that Y^t1+\hat{Y}_{t}^{1+} and Γt\Gamma_{t} are PP-a.s finite, and then use the inequality in (43), and
for (c), we have used the facts that 𝔼[0Tϕ(Γt)𝑑t]<\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]<\infty (via Theorem 1), 𝔼[0TΓt𝑑t]<\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt\right]<\infty, and 𝔼[0TY^t1+𝑑t]𝔼[0TY^t+1dt]<\mathbb{E}\left[\int_{0}^{T}\hat{Y}^{1+}_{t}\,dt\right]\leq\mathbb{E}\left[\int_{0}^{T}\hat{Y}_{t}+1\,dt\right]<\infty.
Hence we can write

𝔼[0Tlog(Y^t)𝑑Yt]\displaystyle\mathbb{E}\left[\int_{0}^{T}\log(\hat{Y}_{t})\,dY_{t}\right] =𝔼[0T(log(Y^t))+𝑑Yt]𝔼[0T(log(Y^t))𝑑Yt]\displaystyle=\mathbb{E}\left[\int_{0}^{T}(\log(\hat{Y}_{t}))^{+}\,dY_{t}\right]-\mathbb{E}\left[\int_{0}^{T}(\log(\hat{Y}_{t}))^{-}\,dY_{t}\right]
=𝔼[0T(log(Y^t))+Γt𝑑t]𝔼[0T(log(Y^t))Γt𝑑t]\displaystyle=\mathbb{E}\left[\int_{0}^{T}(\log(\hat{Y}_{t}))^{+}\Gamma_{t}\,dt\right]-\mathbb{E}\left[\int_{0}^{T}(\log(\hat{Y}_{t}))^{-}\Gamma_{t}\,dt\right]
=𝔼[0Tlog(Y^t)Γt𝑑t].\displaystyle=\mathbb{E}\left[\int_{0}^{T}\log(\hat{Y}_{t})\Gamma_{t}\,dt\right]. (45)

Proof:

The first part of the lemma follows directly from L’Hôpital’s rule. For the second part

\displaystyle\lim_{\Delta\to 0}\mathbb{E}[\bar{d}(\hat{\bar{Y}},\bar{Y})] =\lim_{\Delta\to 0}\sum_{k=1}^{4}\hat{\bar{Y}}(k)\exp(-\lambda\Delta)\alpha_{k}+\left(\hat{\bar{Y}}(k)-\frac{\log(\hat{\bar{Y}}(k))}{\Delta}\right)\left(1-\exp(-\lambda\Delta)\right)\beta_{k}
=k=14Y¯^(k)αkλlog(Y¯^(k))βk\displaystyle=\sum_{k=1}^{4}\hat{\bar{Y}}(k)\alpha_{k}-\lambda\log(\hat{\bar{Y}}(k))\beta_{k}
=k=14αk(Y¯^(k)λβkαklog(Y¯^(k)))𝟏{αk>0}\displaystyle=\sum_{k=1}^{4}\alpha_{k}\left(\hat{\bar{Y}}(k)-\frac{\lambda\beta_{k}}{\alpha_{k}}\log(\hat{\bar{Y}}(k))\right)\mathbf{1}\{\alpha_{k}>0\}
(a)k=14αk(Ψ𝒜(λβkαk)+ϵ4)\displaystyle\overset{(a)}{\leq}\sum_{k=1}^{4}\alpha_{k}\left(\Psi_{\mathcal{A}}\left(\frac{\lambda\beta_{k}}{\alpha_{k}}\right)+\frac{\epsilon}{4}\right)
=D+ϵ4,\displaystyle={D}+\frac{\epsilon}{4},

where for (a), we have used the definition in (23).

Proof:

The first limit can be evaluated using L’Hôpital’s rule. To compute the second limit, consider

limΔ0P(U¯(i)=k|Y¯=1)\displaystyle\lim_{\Delta\to 0}{P(\bar{U}^{(i)}=k|\bar{Y}=1)} =limΔ0l=01P(U¯(i)=k,Y¯(i)=l|Y¯=1)\displaystyle=\lim_{\Delta\to 0}\sum_{l=0}^{1}{P(\bar{U}^{(i)}=k,\bar{Y}^{(i)}=l|\bar{Y}=1)}
\displaystyle=\lim_{\Delta\to 0}\sum_{l=0}^{1}{P(\bar{Y}^{(i)}=l|\bar{Y}=1)}P(\bar{U}^{(i)}=k|\bar{Y}^{(i)}=l)
=p(i)αk(i)+(1p(i))βk(i)\displaystyle=p^{(i)}\alpha^{(i)}_{k}+(1-p^{(i)})\beta^{(i)}_{k}
=γk(i).\displaystyle={\gamma^{(i)}_{k}}.

Then we have

\displaystyle\lim_{\Delta\to 0}P(\bar{U}_{1}^{(1)}=k_{1},\bar{U}_{1}^{(2)}=k_{2}|\bar{Y}=1) =\lim_{\Delta\to 0}P(\bar{U}_{1}^{(1)}=k_{1}|\bar{Y}=1)P(\bar{U}_{1}^{(2)}=k_{2}|\bar{Y}=1)
=γk1(1)γk2(2).\displaystyle=\gamma^{(1)}_{k_{1}}\gamma^{(2)}_{k_{2}}.

Recalling that αk(i)=0\alpha^{(i)}_{k}=0 implies βk(i)=γk(i)=0\beta^{(i)}_{k}=\gamma^{(i)}_{k}=0, we have

limΔ0𝔼[log(Y¯^)𝟏{Y¯=1}]Δ\displaystyle\lim_{\Delta\to 0}\frac{\mathbb{E}[\log(\hat{\bar{Y}})\mathbf{1}\{\bar{Y}=1\}]}{\Delta} =limΔ0P(Y¯=1)ΔlimΔ0𝔼[log(Y¯^)|Y¯=1]\displaystyle=\lim_{\Delta\to 0}\frac{P(\bar{Y}=1)}{\Delta}\lim_{\Delta\to 0}\mathbb{E}[\log(\hat{\bar{Y}})|\bar{Y}=1]
\displaystyle=\lambda\sum_{k_{1},k_{2}}\lim_{\Delta\to 0}P(\bar{U}_{1}^{(1)}=k_{1},\bar{U}_{1}^{(2)}=k_{2}|\bar{Y}=1)\log\left(\hat{\bar{Y}}(k_{1},k_{2})\right)
=λk1,k2γk1(1)γk2(2)log(λγk1(1)γk2(2)αk1(1)αk2(2))𝟏{γk1(1)γk2(2)>0}\displaystyle=\lambda\sum_{k_{1},k_{2}}\gamma^{(1)}_{k_{1}}\gamma^{(2)}_{k_{2}}\log\left(\lambda\frac{\gamma^{(1)}_{k_{1}}\gamma^{(2)}_{k_{2}}}{\alpha^{(1)}_{k_{1}}\alpha^{(2)}_{k_{2}}}\right)\mathbf{1}\{\gamma^{(1)}_{k_{1}}\gamma^{(2)}_{k_{2}}>0\}
\displaystyle=\lambda\sum_{k_{1}=1}^{4}\gamma^{(1)}_{k_{1}}\log\left(\frac{\gamma^{(1)}_{k_{1}}}{\alpha^{(1)}_{k_{1}}}\right)+\lambda\sum_{k_{2}=1}^{4}\gamma^{(2)}_{k_{2}}\log\left(\frac{\gamma^{(2)}_{k_{2}}}{\alpha^{(2)}_{k_{2}}}\right)+\phi(\lambda).

Now to compute limΔ0𝔼[Y¯^]\lim_{\Delta\to 0}\mathbb{E}[\hat{\bar{Y}}], we first calculate

limΔ0P(U¯1(1)=k1,U¯1(2)=k2)\displaystyle\lim_{\Delta\to 0}P(\bar{U}_{1}^{(1)}=k_{1},\bar{U}_{1}^{(2)}=k_{2}) =limΔ0P(U¯1(1)=k1,U¯1(2)=k2|Y¯=0)P(Y¯=0)\displaystyle=\lim_{\Delta\to 0}P(\bar{U}_{1}^{(1)}=k_{1},\bar{U}_{1}^{(2)}=k_{2}|\bar{Y}=0)P(\bar{Y}=0)
+limΔ0P(U¯1(1)=k1,U¯1(2)=k2|Y¯=1)P(Y¯=1)\displaystyle\quad+\lim_{\Delta\to 0}P(\bar{U}_{1}^{(1)}=k_{1},\bar{U}_{1}^{(2)}=k_{2}|\bar{Y}=1)P(\bar{Y}=1)
=limΔ0P(U¯1(1)=k1,U¯1(2)=k2|Y¯=0)\displaystyle=\lim_{\Delta\to 0}P(\bar{U}_{1}^{(1)}=k_{1},\bar{U}_{1}^{(2)}=k_{2}|\bar{Y}=0)
=limΔ0P(U¯1(1)=k1|Y¯=0)P(U¯1(2)=k2|Y¯=0)\displaystyle=\lim_{\Delta\to 0}P(\bar{U}_{1}^{(1)}=k_{1}|\bar{Y}=0)P(\bar{U}_{1}^{(2)}=k_{2}|\bar{Y}=0)
=αk1(1)αk2(2).\displaystyle=\alpha^{(1)}_{k_{1}}\alpha^{(2)}_{k_{2}}.

This gives

limΔ0𝔼[Y¯^]\displaystyle\lim_{\Delta\to 0}\mathbb{E}[\hat{\bar{Y}}] =k1,k2limΔ0P(U¯=k1,U¯2=k2)Y¯^(k1,k2)\displaystyle=\sum_{k_{1},k_{2}}\lim_{\Delta\to 0}P(\bar{U}=k_{1},\bar{U}_{2}=k_{2})\hat{\bar{Y}}(k_{1},k_{2})
=λk1,k2αk1(1)αk2(2)γk1(1)γk2(2)αk1(1)αk2(2)𝟏{αk1(1)αk2(2)>0}\displaystyle=\lambda\sum_{k_{1},k_{2}}\alpha^{(1)}_{k_{1}}\alpha^{(2)}_{k_{2}}\frac{\gamma^{(1)}_{k_{1}}\gamma^{(2)}_{k_{2}}}{\alpha^{(1)}_{k_{1}}\alpha^{(2)}_{k_{2}}}\mathbf{1}\{\alpha^{(1)}_{k_{1}}\alpha^{(2)}_{k_{2}}>0\}
=λ.\displaystyle=\lambda.

Thus

limΔ0𝔼[d¯(Y¯^,Y¯)]\displaystyle\lim_{\Delta\to 0}\mathbb{E}[\bar{d}(\hat{\bar{Y}},\bar{Y})] =λϕ(λ)λ(k1=14γk1(1)log(γk1(1)αk1(1))+k2=14γk2(2)log(γk2(2)αk2(2)))\displaystyle=\lambda-\phi(\lambda)-\lambda\left(\sum_{k_{1}=1}^{4}\gamma^{(1)}_{k_{1}}\log\left(\frac{\gamma^{(1)}_{k_{1}}}{\alpha^{(1)}_{k_{1}}}\right)+\sum_{k_{2}=1}^{4}\gamma^{(2)}_{k_{2}}\log\left(\frac{\gamma^{(2)}_{k_{2}}}{\alpha^{(2)}_{k_{2}}}\right)\right)
=D.\displaystyle={D}.

Proof:

Achievability:

Let

R(1)((1p(1))λ+μ(1))k=14βk(1)log(βk(1)αk(1))\displaystyle{R}^{(1)}\triangleq\left((1-p^{(1)})\lambda+\mu^{(1)}\right)\sum_{k=1}^{4}\beta^{(1)}_{k}\log\left(\frac{\beta^{(1)}_{k}}{\alpha^{(1)}_{k}}\right)
R(2)((1p(2))λ+μ(2))k=14βk(2)log(βk(2)αk(2))\displaystyle{R}^{(2)}\triangleq\left((1-p^{(2)})\lambda+\mu^{(2)}\right)\sum_{k=1}^{4}\beta^{(2)}_{k}\log\left(\frac{\beta^{(2)}_{k}}{\alpha^{(2)}_{k}}\right)
\displaystyle{D}\triangleq\lambda-\phi(\lambda)-\lambda\left(\sum_{k=1}^{4}\gamma^{(1)}_{k}\log\left(\frac{\gamma^{(1)}_{k}}{\alpha^{(1)}_{k}}\right)+\sum_{k=1}^{4}\gamma^{(2)}_{k}\log\left(\frac{\gamma^{(2)}_{k}}{\alpha^{(2)}_{k}}\right)\right).

We will show achievability using a (T,R(1)+ϵ,R(2)+ϵ,D+ϵ)(T,{R}^{(1)}+\epsilon,{R}^{(2)}+\epsilon,{D}+\epsilon) code without feedforward. We will use discretization and results from the rate-distortion theory for discrete memoryless sources (DMS).

First consider the case when for each i{1,2}i\in\{1,2\}, at least one of the following conditions is satisfied

  1. C.1

    βk(i)>0\beta^{(i)}_{k}>0 for all kk,

  2. C.2

    p(i)>0p^{(i)}>0.

Fix Δ>0\Delta>0, and let TnΔT\triangleq n\Delta for an integer nn. For each i{1,2}i\in\{1,2\}, define a binary valued discrete time process (Y¯j(i):j{1,,n})(\bar{Y}^{(i)}_{j}:j\in\{1,\dots,n\}) as follows. If there are one or more arrivals in the interval ((j1)Δ,jΔ]((j-1)\Delta,j\Delta] of the process Y0(i),TY_{0}^{(i),T}, then set Y¯j(i)\bar{Y}^{(i)}_{j} to 11, otherwise set it equal to zero. Since Y0(i),TY_{0}^{(i),T} is a Poisson process with rate λ(i)(1p(i))λ+μ(i)\lambda^{(i)}\triangleq(1-p^{(i)})\lambda+\mu^{(i)}, the components of (Y¯j(i):j{1,,n})(\bar{Y}^{(i)}_{j}:j\in\{1,\dots,n\}) are independent and identically distributed with P(Y¯(i)=1)=1exp(λ(i)Δ)P(\bar{Y}^{(i)}=1)=1-\exp(-\lambda^{(i)}\Delta). Similarly, if (Y¯j:j{1,,n})(\bar{Y}_{j}:j\in\{1,\dots,n\}) denotes the discretized process Y0TY_{0}^{T}, then we have

P(Y¯j(i):j{1,,n}|Y¯j:j{1,,n})=j=1nP(Y¯j(i)|Y¯j)\displaystyle P\left(\bar{Y}^{(i)}_{j}:j\in\{1,\dots,n\}\middle|\bar{Y}_{j}:j\in\{1,\dots,n\}\right)=\prod_{j=1}^{n}P(\bar{Y}^{(i)}_{j}|\bar{Y}_{j})

due to the memoryless property of Poisson processes and independent thinning. Consider the following “test”-channel for k{1,2,3,4}k\in\{1,2,3,4\},

P(U¯(i)=k|Y¯(i)=1)=βk(i),\displaystyle P(\bar{U}^{(i)}=k|\bar{Y}^{(i)}=1)=\beta^{(i)}_{k}, (46)
P(U¯(i)=k|Y¯(i)=0)=αk(i).\displaystyle P(\bar{U}^{(i)}=k|\bar{Y}^{(i)}=0)=\alpha^{(i)}_{k}. (47)

Define the discretized distortion function

d¯(y¯^,y¯)y¯^log(y¯^)Δ𝟏{y¯=1}y¯^0,y¯{0,1}.\displaystyle\bar{d}(\hat{\bar{y}},\bar{y})\triangleq\hat{\bar{y}}-\frac{\log(\hat{\bar{y}})}{\Delta}\mathbf{1}\{\bar{y}=1\}\quad\hat{\bar{y}}\geq 0,\bar{y}\in\{0,1\}. (48)

The reconstruction Y¯^\hat{\bar{Y}} is taken as

Y¯^(U¯(1),U¯(2))=λY¯^(1)(U¯(1))Y¯^(2)(U¯(2)),\displaystyle\hat{\bar{Y}}(\bar{U}^{(1)},\bar{U}^{(2)})=\lambda\hat{\bar{Y}}^{(1)}(\bar{U}^{(1)})\hat{\bar{Y}}^{(2)}(\bar{U}^{(2)}),

where

Y¯^(i)(k)={γk(i)αk(i)ifαk(i)>0,1otherwise.\displaystyle\hat{\bar{Y}}^{(i)}(k)=\begin{cases}\frac{\gamma^{(i)}_{k}}{\alpha^{(i)}_{k}}&\text{if}\,\alpha^{(i)}_{k}>0,\\ 1&\text{otherwise}.\end{cases}

We note that since γk(i)=p(i)αk(i)+(1p(i))βk(i)\gamma^{(i)}_{k}=p^{(i)}\alpha^{(i)}_{k}+(1-p^{(i)})\beta^{(i)}_{k}, and at least one of C.1-C.2 is satisfied, Y¯^(i)(k)>0\hat{\bar{Y}}^{(i)}(k)>0, and hence Y¯^>0\hat{\bar{Y}}>0. Thus the distortion function d¯(Y¯^,Y¯)\bar{d}(\hat{\bar{Y}},\bar{Y}) in (48) is bounded. Let

κmaxk1,k2|log(Y¯^(k1,k2))|.\displaystyle\kappa\triangleq\max_{k_{1},k_{2}}\left|\log\left(\hat{\bar{Y}}(k_{1},k_{2})\right)\right|. (49)
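
The construction of the reconstruction map and of κ in (49) can be made concrete with a small sketch; the parameter values below are hypothetical, with p⁽²⁾ = 0 and β⁽²⁾ strictly positive so that condition C.1 holds for i = 2.

```python
import numpy as np

# Sketch of the CEO reconstruction map (hypothetical parameters):
#   Yhat(k1, k2) = lam * Yhat1(k1) * Yhat2(k2),  Yhat_i(k) = gamma_i(k)/alpha_i(k).
# Here p2 = 0 but beta2 is strictly positive, so C.1 holds for i = 2.
lam = 2.0
p = [0.2, 0.0]
alpha = [np.array([0.4, 0.3, 0.2, 0.1]), np.array([0.25, 0.25, 0.25, 0.25])]
beta = [np.array([0.7, 0.1, 0.1, 0.1]), np.array([0.4, 0.3, 0.2, 0.1])]

Yhat_i = []
for i in range(2):
    gamma = p[i] * alpha[i] + (1.0 - p[i]) * beta[i]
    Yhat_i.append(np.where(alpha[i] > 0, gamma / alpha[i], 1.0))

Yhat = lam * np.outer(Yhat_i[0], Yhat_i[1])     # Yhat[k1, k2]
kappa = np.max(np.abs(np.log(Yhat)))            # as in (49)
print(Yhat.min() > 0, kappa)
```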

Due to the Berger-Tung inner bound [56, Theorem 12.1, p. 295], for a given Δ>0\Delta>0, ϵ¯>0\bar{\epsilon}>0, and all sufficiently large nn, there exists encoders f¯(1)\bar{f}^{(1)} and f¯(2)\bar{f}^{(2)}, and a decoder g¯\bar{g} such that for i{1,2}i\in\{1,2\}

f¯(i):(Y¯j(i):j{1,,n}){1,,L(i)}\displaystyle\bar{f}^{(i)}:(\bar{Y}^{(i)}_{j}:j\in\{1,\dots,n\})\to\{1,\dots,L^{(i)}\}
g¯:{1,,L(1)}×{1,,L(2)}(Y¯^j:j{1,,n}),\displaystyle\bar{g}:\{1,\dots,L^{(1)}\}\times\{1,\dots,L^{(2)}\}\to(\hat{\bar{Y}}_{j}:j\in\{1,\dots,n\}),

satisfying

1nlog(L(i))I(U¯(i);Y¯(i))+ϵ¯,\displaystyle\frac{1}{n}\log(L^{(i)})\leq I(\bar{U}^{(i)};\bar{Y}^{(i)})+\bar{\epsilon}, (50)
𝔼[1nj=1nd¯(Y¯^j,Yj¯)]𝔼[d¯(Y¯^,Y¯)]+ϵ¯.\displaystyle\mathbb{E}\left[\frac{1}{n}\sum_{j=1}^{n}\bar{d}(\hat{\bar{Y}}_{j},\bar{Y_{j}})\right]\leq\mathbb{E}[\bar{d}(\hat{\bar{Y}},\bar{Y})]+\bar{\epsilon}. (51)

It is noteworthy that the Berger-Tung inner bound includes a conditioning term in the mutual-information expression, which in general yields a stronger bound than the one stated here. However, in our setting this conditioning can be dropped, as explained in Remark 3 in the main paper.

Given the above setup, each encoder f(i)f^{(i)} upon observing Y0(i),TY_{0}^{(i),T} obtains the binary valued discrete-time process (Y¯j(i):j{1,,n})(\bar{Y}^{(i)}_{j}:j\in\{1,\dots,n\}), and sends M(i)=f¯(i)(Y¯j(i):j{1,,n})M^{(i)}=\bar{f}^{(i)}(\bar{Y}^{(i)}_{j}:j\in\{1,\dots,n\}) to the decoder. The decoder outputs the reconstruction Y^0T\hat{Y}_{0}^{T} as

Y^tj=1nY¯^j𝟏{t((j1)Δ,jΔ]}t[0,T].\displaystyle\hat{Y}_{t}\triangleq\sum_{j=1}^{n}\hat{\bar{Y}}_{j}\mathbf{1}\left\{t\in((j-1)\Delta,j\Delta]\right\}\quad t\in[0,T].

Let Y¯¯j\bar{\bar{Y}}_{j} denote the actual number of arrivals of Y0TY_{0}^{T} in an interval ((j1)Δ,jΔ]((j-1)\Delta,j\Delta]. Then d¯\bar{d} is related to the original distortion function via the above reconstruction as follows:

1Td(Y^0T;Y0T)\displaystyle\frac{1}{T}d(\hat{Y}_{0}^{T};Y_{0}^{T}) =1T0TY^t𝑑t1T0Tlog(Y^t)𝑑Yt\displaystyle=\frac{1}{T}\int_{0}^{T}\hat{Y}_{t}\,dt-\frac{1}{T}\int_{0}^{T}\log(\hat{Y}_{t})\,dY_{t}
=1nj=1nY¯^j1Tj=1nlog(Y¯^j)Y¯¯j\displaystyle=\frac{1}{n}\sum_{j=1}^{n}\hat{\bar{Y}}_{j}-\frac{1}{T}\sum_{j=1}^{n}\log(\hat{\bar{Y}}_{j})\bar{\bar{Y}}_{j}
=1nj=1nY¯^j1nΔj=1nlog(Y¯^j)Y¯j1Tj=1nlog(Y¯^j)(Y¯¯j1)𝟏{Y¯¯j>1}.\displaystyle=\frac{1}{n}\sum_{j=1}^{n}\hat{\bar{Y}}_{j}-\frac{1}{n\Delta}\sum_{j=1}^{n}\log(\hat{\bar{Y}}_{j})\bar{Y}_{j}-\frac{1}{T}\sum_{j=1}^{n}\log(\hat{\bar{Y}}_{j})(\bar{\bar{Y}}_{j}-1)\mathbf{1}\{\bar{\bar{Y}}_{j}>1\}.
=1nj=1nd¯(Y¯^j,Y¯j)1Tj=1nlog(Y¯^j)(Y¯¯j1)𝟏{Y¯¯j>1}\displaystyle=\frac{1}{n}\sum_{j=1}^{n}\bar{d}(\hat{\bar{Y}}_{j},\bar{Y}_{j})-\frac{1}{T}\sum_{j=1}^{n}\log(\hat{\bar{Y}}_{j})(\bar{\bar{Y}}_{j}-1)\mathbf{1}\{\bar{\bar{Y}}_{j}>1\}
(a)1nj=1nd¯(Y¯^j,Y¯j)+κTj=1n(Y¯¯j1)𝟏{Y¯¯j>1}\displaystyle\overset{(a)}{\leq}\frac{1}{n}\sum_{j=1}^{n}\bar{d}(\hat{\bar{Y}}_{j},\bar{Y}_{j})+\frac{\kappa}{T}\sum_{j=1}^{n}(\bar{\bar{Y}}_{j}-1)\mathbf{1}\{\bar{\bar{Y}}_{j}>1\}
1nj=1nd¯(Y¯^j,Y¯j)+κTj=1nY¯¯j𝟏{Y¯¯j>1},\displaystyle\leq\frac{1}{n}\sum_{j=1}^{n}\bar{d}(\hat{\bar{Y}}_{j},\bar{Y}_{j})+\frac{\kappa}{T}\sum_{j=1}^{n}\bar{\bar{Y}}_{j}\mathbf{1}\{\bar{\bar{Y}}_{j}>1\},

where for (a), we have used the definition of κ\kappa in (49).
Hence taking the expectation, we get

𝔼[1Td(Y^0T;Y0T)]\displaystyle\mathbb{E}\left[\frac{1}{T}d(\hat{Y}_{0}^{T};Y_{0}^{T})\right] 𝔼[1nj=1nd¯(Y¯^j,Y¯j)]+κ𝔼[1Tj=1nY¯¯j𝟏{Y¯¯j>1}]\displaystyle\leq\mathbb{E}\left[\frac{1}{n}\sum_{j=1}^{n}\bar{d}(\hat{\bar{Y}}_{j},\bar{Y}_{j})\right]+\kappa\mathbb{E}\left[\frac{1}{T}\sum_{j=1}^{n}\bar{\bar{Y}}_{j}\mathbf{1}\{\bar{\bar{Y}}_{j}>1\}\right]
(a)𝔼[d¯(Y¯^,Y¯)]+κ𝔼[1Tj=1nY¯¯j𝟏{Y¯¯j>1}]+ϵ¯\displaystyle\overset{(a)}{\leq}\mathbb{E}[\bar{d}(\hat{\bar{Y}},\bar{Y})]+\kappa\mathbb{E}\left[\frac{1}{T}\sum_{j=1}^{n}\bar{\bar{Y}}_{j}\mathbf{1}\{\bar{\bar{Y}}_{j}>1\}\right]+\bar{\epsilon}
=(b)𝔼[d¯(Y¯^,Y¯)]+κ(λλexp(λΔ))+ϵ¯\displaystyle\overset{(b)}{=}\mathbb{E}[\bar{d}(\hat{\bar{Y}},\bar{Y})]+\kappa(\lambda-\lambda\exp(-\lambda\Delta))+\bar{\epsilon}
(c)𝔼[d¯(Y¯^,Y¯)]+κλ2Δ+ϵ¯,\displaystyle\overset{(c)}{\leq}\mathbb{E}[\bar{d}(\hat{\bar{Y}},\bar{Y})]+\kappa\lambda^{2}\Delta+\bar{\epsilon}, (52)

where, for (a), we have used (51),
for (b), we note that Y¯¯j\bar{\bar{Y}}_{j} is Poisson with mean λΔ\lambda\Delta, so 𝔼[Y¯¯j𝟏{Y¯¯j>1}]=𝔼[Y¯¯j](Y¯¯j=1)=λΔλΔexp(λΔ)\mathbb{E}[\bar{\bar{Y}}_{j}\mathbf{1}\{\bar{\bar{Y}}_{j}>1\}]=\mathbb{E}[\bar{\bar{Y}}_{j}]-\mathbb{P}(\bar{\bar{Y}}_{j}=1)=\lambda\Delta-\lambda\Delta\exp(-\lambda\Delta), and
for (c), we have used the inequality 1uexp(u)1-u\leq\exp(-u).
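
As a numerical aside (not part of the proof), both the pathwise decomposition above and the expectation computation in step (b) can be checked directly. The sketch below uses placeholder values for λ\lambda, Δ\Delta, and nn, an arbitrary positive reconstruction, and reads off the per-bin form d¯(y^,y¯)=y^y¯log(y^)/Δ\bar{d}(\hat{y},\bar{y})=\hat{y}-\bar{y}\log(\hat{y})/\Delta implied by the second-to-last equality above.

import numpy as np

rng = np.random.default_rng(0)
lam, Delta, n = 4.0, 0.05, 400                    # placeholder parameters; T = n*Delta
T = n * Delta

# One realization of Y on [0, T]: arrival epochs, per-bin counts, and the binary per-bin process.
arrivals = np.sort(rng.uniform(0.0, T, rng.poisson(lam * T)))
bins = np.minimum((arrivals // Delta).astype(int), n - 1)
Ybb = np.bincount(bins, minlength=n)              # actual counts  (Y-double-bar)
Yb = np.minimum(Ybb, 1)                           # binary counts  (Y-bar)
Yhat = rng.uniform(0.5, 2.0, n)                   # arbitrary positive reconstruction values

# Pathwise decomposition: (1/T) d(Y_hat, Y) versus the right-hand side of the last equality.
lhs = np.sum(Yhat) / n - np.sum(np.log(Yhat[bins])) / T
d_bar = Yhat - Yb * np.log(Yhat) / Delta          # per-bin distortion, read off from the display
rhs = np.mean(d_bar) - np.sum(np.log(Yhat) * (Ybb - 1) * (Ybb > 1)) / T
print(lhs, rhs)                                   # agree up to floating-point error

# Step (b): E[N 1{N>1}] = lam*Delta - lam*Delta*exp(-lam*Delta) for N ~ Poisson(lam*Delta),
# which is at most (lam*Delta)^2 by 1 - u <= exp(-u) (step (c), per bin).
N = rng.poisson(lam * Delta, size=1_000_000)
print(np.mean(N * (N > 1)), lam * Delta * (1 - np.exp(-lam * Delta)), (lam * Delta) ** 2)
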
Moreover using (50), for i{1,2}i\in\{1,2\}

1Tlog(L(i))=1nΔlog(L(i))I(U¯(i);Y¯(i))Δ+ϵ¯Δ.\displaystyle\frac{1}{T}\log(L^{(i)})=\frac{1}{n\Delta}\log(L^{(i)})\leq\frac{I(\bar{U}^{(i)};\bar{Y}^{(i)})}{\Delta}+\frac{\bar{\epsilon}}{\Delta}. (53)

The scaling of the mutual information I(U¯(i);Y¯(i))I(\bar{U}^{(i)};\bar{Y}^{(i)}) and the distortion function d¯(Y¯^,Y¯)\bar{d}(\hat{\bar{Y}},\bar{Y}) with respect to Δ\Delta is given by the following lemma.

Lemma 14

For i{1,2}i\in\{1,2\}

limΔ0I(U¯(i);Y¯(i))Δ\displaystyle\lim_{\Delta\to 0}\frac{I(\bar{U}^{(i)};\bar{Y}^{(i)})}{\Delta} =R(i),\displaystyle={R}^{(i)},
limΔ0𝔼[d¯(Y¯^,Y¯)]\displaystyle\lim_{\Delta\to 0}\mathbb{E}[\bar{d}(\hat{\bar{Y}},\bar{Y})] =D.\displaystyle={D}.
Proof:

Please see the supplementary material. ∎

Now given the rate-distortion vector (R(1),R(2),D)({R}^{(1)},{R}^{(2)},{D}) and ϵ>0\epsilon>0, first choose Δ\Delta sufficiently small so that

I(U¯(i);Y¯(i))Δ\displaystyle\frac{I(\bar{U}^{(i)};\bar{Y}^{(i)})}{\Delta} R(i)+ϵ4,\displaystyle\leq{R}^{(i)}+\frac{\epsilon}{4},
𝔼[d¯(Y¯^,Y¯)]\displaystyle\mathbb{E}[\bar{d}(\hat{\bar{Y}},\bar{Y})] D+ϵ4,\displaystyle\leq{D}+\frac{\epsilon}{4},
κλ2Δ\displaystyle\kappa\lambda^{2}\Delta ϵ/4.\displaystyle\leq\epsilon/4.

Then let ϵ¯=Δϵ/4\bar{\epsilon}=\Delta\epsilon/4, and choose a sufficiently large nn so that (50) and (51) are satisfied. From (52) and (53) we conclude that a sequence of (Tn,R(1)+ϵ,R(2)+ϵ,D+ϵ)(T_{n},{R}^{(1)}+\epsilon,{R}^{(2)}+\epsilon,{D}+\epsilon) codes exists with Tn=nΔT_{n}=n\Delta when at least one of the conditions C.1 or C.2 is satisfied.

Now consider the case when p(i)=0p^{(i)}=0 for some i{1,2}i\in\{1,2\} and, for that ii, βk(i)=0\beta^{(i)}_{k}=0 for some values of kk. Say p(1)=0p^{(1)}=0 and p(2)>0p^{(2)}>0. This gives us γk(1)=βk(1)\gamma^{(1)}_{k}=\beta^{(1)}_{k} for k{1,2,3,4}k\in\{1,2,3,4\}. Then we need to show that the rate-distortion vector

R(1)=(λ+μ(1))k=14βk(1)log(βk(1)αk(1))\displaystyle{R}^{(1)}=\left(\lambda+\mu^{(1)}\right)\sum_{k=1}^{4}\beta^{(1)}_{k}\log\left(\frac{\beta^{(1)}_{k}}{\alpha^{(1)}_{k}}\right)
R(2)=((1p(2))λ+μ(2))k=14βk(2)log(βk(2)αk(2))\displaystyle{R}^{(2)}=\left((1-p^{(2)})\lambda+\mu^{(2)}\right)\sum_{k=1}^{4}\beta^{(2)}_{k}\log\left(\frac{\beta^{(2)}_{k}}{\alpha^{(2)}_{k}}\right)
D=λϕ(λ)λ(k=14βk(1)log(βk(1)αk(1))+k=14γk(2)log(γk(2)αk(2)))\displaystyle{D}=\lambda-\phi(\lambda)-\lambda\left(\sum_{k=1}^{4}\beta^{(1)}_{k}\log\left(\frac{\beta^{(1)}_{k}}{\alpha^{(1)}_{k}}\right)+\sum_{k=1}^{4}\gamma^{(2)}_{k}\log\left(\frac{\gamma^{(2)}_{k}}{\alpha^{(2)}_{k}}\right)\right) (54)

is achievable. Let [β^k(1)]k=14=[1/4,1/4,1/4][\hat{\beta}^{(1)}_{k}]_{k=1}^{4}=[1/4,1/4,1/4] and [α^k(1)]k=14=[1/4,1/4,1/4ν,1/4+ν][\hat{\alpha}^{(1)}_{k}]_{k=1}^{4}=[1/4,1/4,1/4-\nu,1/4+\nu] for some ν[0,1/3)\nu\in[0,1/3). Then the term

k=14β^k(1)log(β^k(1)α^k(1))\sum_{k=1}^{4}\hat{\beta}^{(1)}_{k}\log\left(\frac{\hat{\beta}^{(1)}_{k}}{\hat{\alpha}^{(1)}_{k}}\right)

is continuous in ν\nu and goes from zero to infinity as ν\nu is increased from zero to 1/41/4, hence there exists some ν^[0,1/4)\hat{\nu}\in[0,1/4) such that with [α^k(1)]k=14=[1/4,1/4,1/4ν^,1/4+ν^][\hat{\alpha}^{(1)}_{k}]_{k=1}^{4}=[1/4,1/4,1/4-\hat{\nu},1/4+\hat{\nu}],

k=14β^k(1)log(β^k(1)α^k(1))=k=14βk(1)log(βk(1)αk(1)).\displaystyle\sum_{k=1}^{4}\hat{\beta}^{(1)}_{k}\log\left(\frac{\hat{\beta}^{(1)}_{k}}{\hat{\alpha}^{(1)}_{k}}\right)=\sum_{k=1}^{4}\beta^{(1)}_{k}\log\left(\frac{\beta^{(1)}_{k}}{\alpha^{(1)}_{k}}\right). (55)

We note that this [β^k(1)]k=14[\hat{\beta}^{(1)}_{k}]_{k=1}^{4} satisfies condition C.1. Hence the rate-distortion vector in (54) is achievable by using [α^k(1)]k=14[\hat{\alpha}^{(1)}_{k}]_{k=1}^{4} and [β^k(1)]k=14[\hat{\beta}^{(1)}_{k}]_{k=1}^{4} satisfying (55) in place of [αk(1)]k=14[\alpha^{(1)}_{k}]_{k=1}^{4} and [βk(1)]k=14[\beta^{(1)}_{k}]_{k=1}^{4}. The case when p(2)=0p^{(2)}=0 or both p(1)=p(2)=0p^{(1)}=p^{(2)}=0 can be handled similarly.
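
For concreteness, the value ν^\hat{\nu} in (55) can be located numerically by exploiting the continuity and monotonicity of the above sum in ν\nu. The sketch below is illustrative only; the variable target is a placeholder for the right-hand side of (55).

import math

def kl_term(nu):
    beta_hat = [0.25, 0.25, 0.25, 0.25]
    alpha_hat = [0.25, 0.25, 0.25 - nu, 0.25 + nu]
    return sum(b * math.log(b / a) for b, a in zip(beta_hat, alpha_hat))

def find_nu_hat(target, tol=1e-12):
    lo, hi = 0.0, 0.25 - 1e-15            # kl_term is continuous and increasing on [0, 1/4)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kl_term(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

target = 0.3                               # placeholder for sum_k beta_k log(beta_k / alpha_k)
nu_hat = find_nu_hat(target)
print(nu_hat, kl_term(nu_hat))             # kl_term(nu_hat) matches target to high accuracy
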

Converse:

We will prove the converse when feedforward is present. For the given (T,R(1)+ϵ,R(2)+ϵ,D+ϵ)(T,R^{(1)}+\epsilon,R^{(2)}+\epsilon,D+\epsilon) code with feedforward, let M(1)M^{(1)} and M(2)M^{(2)} denote the first and second encoder’s output respectively. We essentially repeat the steps in the converse proof of Theorem 4 to show that

1TI(M(1),M(2);Y0T)+Dλϕ(λ)ϵ.\displaystyle\frac{1}{T}I(M^{(1)},M^{(2)};Y_{0}^{T})+D\geq\lambda-\phi(\lambda)-\epsilon.

Since I(M(1),M(2);Y0T)<I(M^{(1)},M^{(2)};Y_{0}^{T})<\infty, we conclude from Theorem 1 that there exists a process Γ0T\Gamma_{0}^{T} such that Γ0T\Gamma_{0}^{T} is the (t=σ(M(1),M(2),Y0t):t[0,T])(\mathcal{F}_{t}=\sigma(M^{(1)},M^{(2)},Y_{0}^{t}):t\in[0,T]) intensity of Y0TY_{0}^{T} and

I(M(1),M(2);Y0T)=𝔼[0Tϕ(Γt)𝑑t]Tϕ(λ).\displaystyle I(M^{(1)},M^{(2)};Y_{0}^{T})=\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]-T\phi(\lambda). (56)

Let Y^0T\hat{Y}_{0}^{T} denote the decoder’s output. The distortion constraint DD satisfies

D1T𝔼[d(Y^0T,Y0T)]ϵ\displaystyle D\geq\frac{1}{T}\mathbb{E}\left[d(\hat{Y}_{0}^{T},Y_{0}^{T})\right]-\epsilon =1T𝔼[0TY^t𝑑tlog(Y^t)dYt]ϵ\displaystyle=\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\hat{Y}_{t}\,dt-\log(\hat{Y}_{t})\,dY_{t}\right]-\epsilon
=1T𝔼[0TY^tlog(Y^t)Γtdt]ϵ,\displaystyle=\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\hat{Y}_{t}-\log(\hat{Y}_{t})\Gamma_{t}\,dt\right]-\epsilon, (57)

where for the last equality we have used Lemma 12. Once again using the inequality ulog(v)ϕ(u)u+v,0u,v<u\log(v)\leq\phi(u)-u+v,\quad 0\leq u,v<\infty, and noting that the individual terms have finite expectations,

𝔼[0Tlog(Y^t)Γt𝑑t]\displaystyle\mathbb{E}\left[\int_{0}^{T}\log(\hat{Y}_{t})\Gamma_{t}\,dt\right] 𝔼[0Tϕ(Γt)Γt+Y^tdt]\displaystyle\leq\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})-\Gamma_{t}+\hat{Y}_{t}\,dt\right]
=𝔼[0Tϕ(Γt)𝑑t]𝔼[0TΓt𝑑t]+𝔼[0TY^t𝑑t].\displaystyle=\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]-\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt\right]+\mathbb{E}\left[\int_{0}^{T}\hat{Y}_{t}\,dt\right]. (58)

Combining these inequalities, we obtain

1TI(M(1),M(2);Y0T)+D\displaystyle\frac{1}{T}I(M^{(1)},M^{(2)};Y_{0}^{T})+D 1T𝔼[0Tϕ(Γt)𝑑t]ϕ(λ)\displaystyle\geq\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]-\phi(\lambda)
+1T𝔼[0TY^t𝑑t]1T𝔼[0Tlog(Y^t)𝑑Yt]ϵ\displaystyle\qquad+\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\hat{Y}_{t}\,dt\right]-\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\log(\hat{Y}_{t})\,dY_{t}\right]-\epsilon
(a)1T𝔼[0TΓt𝑑t]ϕ(λ)ϵ\displaystyle\overset{(a)}{\geq}\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt\right]-\phi(\lambda)-\epsilon
=(b)λϕ(λ)ϵ,\displaystyle\overset{(b)}{=}\lambda-\phi(\lambda)-\epsilon, (59)

where, for (a) we have used (57) and (58) and
for (b) we use the fact that 𝔼[0TΓt𝑑t]=𝔼[0T𝑑Yt]=λT\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt\right]=\mathbb{E}\left[\int_{0}^{T}\,dY_{t}\right]=\lambda T.

We can upper bound the term I(M(1),M(2);Y0T)I(M^{(1)},M^{(2)};Y_{0}^{T}) as

I(M(1),M(2);Y0T)\displaystyle I(M^{(1)},M^{(2)};Y_{0}^{T}) =(a)H(M(1),M(2))𝔼[H(M(1),M(2)|Y0T)]\displaystyle\overset{(a)}{=}H(M^{(1)},M^{(2)})-\mathbb{E}\left[H(M^{(1)},M^{(2)}|Y_{0}^{T})\right]
=(b)H(M(1),M(2))𝔼[H(M(1)|Y0T)]𝔼[H(M(2)|Y0T)]\displaystyle\overset{(b)}{=}H(M^{(1)},M^{(2)})-\mathbb{E}\left[H(M^{(1)}|Y_{0}^{T})\right]-\mathbb{E}\left[H(M^{(2)}|Y_{0}^{T})\right]
H(M(1))+H(M(2))𝔼[H(M(1)|Y0T)]𝔼[H(M(2)|Y0T)]\displaystyle\overset{}{\leq}H(M^{(1)})+H(M^{(2)})-\mathbb{E}\left[H(M^{(1)}|Y_{0}^{T})\right]-\mathbb{E}\left[H(M^{(2)}|Y_{0}^{T})\right]
=I(M(1);Y0T)+I(M(2);Y0T),\displaystyle=I(M^{(1)};Y_{0}^{T})+I(M^{(2)};Y_{0}^{T}), (60)

where, for (a) we have used Lemma 2 and
for (b), we used the Markov chain M(1)Y0TM(2)M^{(1)}\leftrightarrows{Y}_{0}^{T}\leftrightarrows M^{(2)}.
Combining (59) and (60) we get

Dλϕ(λ)1TI(M(1);Y0T)1TI(M(2);Y0T)ϵ.\displaystyle D\geq\lambda-\phi(\lambda)-\frac{1}{T}I(M^{(1)};Y_{0}^{T})-\frac{1}{T}I(M^{(2)};Y_{0}^{T})-\epsilon. (61)

For i{1,2}i\in\{1,2\}, using Lemma 2

1TI(M(i);Y0(i),T)=1TH(M(i))1Tlog(exp((R(i)+ϵ)T))R(i)+ϵ+1T.\displaystyle\frac{1}{T}I(M^{(i)};{Y}^{(i),T}_{0})=\frac{1}{T}H(M^{(i)})\leq\frac{1}{T}\log\left(\lceil\exp((R^{(i)}+\epsilon)T)\rceil\right)\leq R^{(i)}+\epsilon+\frac{1}{T}. (62)
Figure 2: Thinning and superposition operations defined in the proof for the first encoder. Note that the joint distribution of (Y0(1),T,Y~0(1),T,Y0T)(Y_{0}^{(1),T},\tilde{Y}_{0}^{(1),T},Y_{0}^{T}) is the same as that of (Y0(1),T,Z~0(1),T,Z0T)(Y_{0}^{(1),T},\tilde{Z}_{0}^{(1),T},Z_{0}^{T}).

We will first consider the case when p(i)<1p^{(i)}<1 for i{1,2}i\in\{1,2\}. We shall proceed by defining certain auxiliary processes (see Figure 2). Let Z~0(i),T\tilde{Z}_{0}^{(i),T} be obtained by p~(i)\tilde{p}^{(i)}-thinning of Y0(i),TY_{0}^{(i),T}, where

p~(i)=μ(i)((1p(i))λ+μ(i)).\tilde{p}^{(i)}=\frac{\mu^{(i)}}{((1-p^{(i)})\lambda+\mu^{(i)})}.

Then using Lemma 11 we can write

Yt(i)=Z~t(i)+Z^t(i)t[0,T],Y_{t}^{(i)}=\tilde{Z}_{t}^{(i)}+\hat{Z}_{t}^{(i)}\quad t\in[0,T],

where Z~0(i),T\tilde{Z}_{0}^{(i),T} and Z^0(i),T\hat{Z}_{0}^{(i),T} are independent Poisson processes with rates (1p(i))λ(1-p^{(i)})\lambda and μ(i)\mu^{(i)} respectively. On the other hand, by definition,

Yt(i)=Y~t(i)+Nt(i)t[0,T],Y_{t}^{(i)}=\tilde{Y}_{t}^{(i)}+{N}_{t}^{(i)}\quad t\in[0,T],

where Y~t(i)\tilde{Y}_{t}^{(i)} and Nt(i)N_{t}^{(i)} are independent Poisson processes with rates (1p(i))λ(1-p^{(i)})\lambda and μ(i)\mu^{(i)} respectively. Hence we conclude that the joint distribution of (Y0(i),T,Z~0(i),T)(Y_{0}^{(i),T},\tilde{Z}_{0}^{(i),T}) is identical to the joint distribution of (Y0(i),T,Y~0(i),T)(Y_{0}^{(i),T},\tilde{Y}_{0}^{(i),T}). Let Z0(i),TZ_{0}^{(i),T} be obtained by adding an independent Poisson process N^0(i),T\hat{N}_{0}^{(i),T} with rate p(i)λp^{(i)}\lambda to Z~0(i),T\tilde{Z}^{(i),T}_{0},

Zt(i)=Z~t(i)+N^t(i)t[0,T].Z_{t}^{(i)}=\tilde{Z}_{t}^{(i)}+\hat{N}_{t}^{(i)}\quad t\in[0,T].

Also using Lemma 11 we have

Yt=Y~t(i)+Y~~t(i)t[0,T],Y_{t}=\tilde{Y}^{(i)}_{t}+\tilde{\tilde{Y}}^{(i)}_{t}\quad t\in[0,T],

where Y~0(i),T\tilde{Y}_{0}^{(i),T} and Y~~0(i),T\tilde{\tilde{Y}}^{(i),T}_{0} are independent Poisson processes with rates (1p(i))λ(1-p^{(i)})\lambda and p(i)λp^{(i)}\lambda respectively. Hence the joint distributions of (Z0(i),T,Z~0(i),T)(Z_{0}^{(i),T},\tilde{Z}^{(i),T}_{0}) and (Y0T,Y~0(i),T)(Y_{0}^{T},\tilde{Y}^{(i),T}_{0}) are identical. Moreover, M(i)Y0(i),TY~0(i),TY0TM^{(i)}\rightleftarrows Y_{0}^{(i),T}\rightleftarrows\tilde{Y}_{0}^{(i),T}\rightleftarrows Y_{0}^{T} forms a Markov chain and M(i)Y0(i),TZ~0(i),TZ0(i),TM^{(i)}\rightleftarrows Y_{0}^{(i),T}\rightleftarrows\tilde{Z}_{0}^{(i),T}\rightleftarrows Z_{0}^{(i),T} forms a Markov chain. This allows us to write

I(M(i);Z~0(i),T)\displaystyle I\left(M^{(i)};\tilde{Z}_{0}^{(i),T}\right) =I(M(i);Y~0(i),T),\displaystyle=I\left(M^{(i)};\tilde{Y}_{0}^{(i),T}\right),
I(M(i);Z0(i),T)\displaystyle I\left(M^{(i)};{Z}_{0}^{(i),T}\right) =I(M(i);Y0T).\displaystyle=I(M^{(i)};Y_{0}^{T}). (63)
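
The thinning and superposition construction above (and in Figure 2) can be mimicked in a simple simulation. The sketch below is illustrative only: lam, mu, p, and T are placeholder values, each process is represented by its sorted arrival epochs, and p-thinning is taken to delete each point independently with probability p, consistent with the rates used in the proof.

import numpy as np

rng = np.random.default_rng(1)
lam, mu, p, T = 5.0, 2.0, 0.3, 10_000.0    # placeholder rates and horizon

def poisson_process(rate, T):
    return np.sort(rng.uniform(0.0, T, rng.poisson(rate * T)))

def p_thin(points, p_del):
    # p-thinning: delete each point independently with probability p_del, keep the rest.
    return points[rng.random(len(points)) >= p_del]

# Source side: Y has rate lam; Y_tilde is the part of Y retained after p-thinning,
# and Y1 = Y_tilde + N^(1) is the first encoder's observation, rate (1-p)*lam + mu.
Y = poisson_process(lam, T)
Y_tilde = p_thin(Y, p)
Y1 = np.sort(np.concatenate([Y_tilde, poisson_process(mu, T)]))

# Auxiliary processes: Z_tilde is a p_tilde-thinning of Y1, and Z = Z_tilde + N_hat.
p_tilde = mu / ((1.0 - p) * lam + mu)
Z_tilde = p_thin(Y1, p_tilde)
Z = np.sort(np.concatenate([Z_tilde, poisson_process(p * lam, T)]))

# Empirical rates: Z_tilde matches Y_tilde (rate (1-p)*lam) and Z matches Y (rate lam).
print(len(Z_tilde) / T, (1 - p) * lam, len(Z) / T, lam)
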

Since Z~0(i),T\tilde{Z}_{0}^{(i),T} is a μ(i)((1p(i))λ+μ(i))\frac{\mu^{(i)}}{((1-p^{(i)})\lambda+\mu^{(i)})}-thinning of Y0(i),T{Y}^{(i),T}_{0}, Theorem 3 gives

I(M(i);Z~0(i),T)(1μ(i)(1p(i))λ+μ(i))I(M(i);Y0(i),T).\displaystyle I(M^{(i)};\tilde{Z}_{0}^{(i),T})\leq\left(1-\frac{\mu^{(i)}}{(1-p^{(i)})\lambda+\mu^{(i)}}\right)I(M^{(i)};{Y}^{(i),T}_{0}). (64)

Also, since Z0(i),TZ_{0}^{(i),T} is obtained by adding an independent Poisson process with rate p(i)λp^{(i)}\lambda to Z~0(i),T\tilde{Z}^{(i),T}_{0}, Theorem 2 yields

I(M(i);Z~0(i),T)\displaystyle I(M^{(i)};\tilde{Z}_{0}^{(i),T}) =𝔼[0Tϕ(Γ~t(i))ϕ((1p(i))λ)dt],\displaystyle=\mathbb{E}\left[\int_{0}^{T}\phi(\tilde{\Gamma}_{t}^{(i)})-\phi((1-p^{(i)})\lambda)\,dt\right],
I(M(i);Z0(i),T)\displaystyle I(M^{(i)};{Z}_{0}^{(i),T}) 𝔼[0Tϕ(Γ~t(i)+p(i)λ)ϕ(λ)dt],\displaystyle\leq\mathbb{E}\left[\int_{0}^{T}\phi(\tilde{\Gamma}_{t}^{(i)}+p^{(i)}\lambda)-\phi(\lambda)\,dt\right], (65)

where Γ~0(i),T\tilde{\Gamma}_{0}^{(i),T} is the (σ(M(i),Z~0(i),t):t[0,T])(\sigma(M^{(i)},\tilde{Z}_{0}^{(i),t}):t\in[0,T])-intensity of Z~0(i),T\tilde{Z}_{0}^{(i),T}. Then we can further lower bound DD in (61) as

D\displaystyle D λϕ(λ)1TI(M(1);Y0T)1TI(M(2);Y0T)ϵ\displaystyle\geq\lambda-\phi(\lambda)-\frac{1}{T}I(M^{(1)};Y_{0}^{T})-\frac{1}{T}I(M^{(2)};Y_{0}^{T})-\epsilon
=(a)λϕ(λ)1TI(M(1);Z0(1),T)1TI(M(2);Z0(2),T)ϵ\displaystyle\overset{(a)}{=}\lambda-\phi(\lambda)-\frac{1}{T}I(M^{(1)};{Z}_{0}^{(1),T})-\frac{1}{T}I(M^{(2)};{Z}_{0}^{(2),T})-\epsilon
(b)λϕ(λ)1T(𝔼[0Tϕ(Γ~t(1)+p(1)λ)ϕ(λ)dt])\displaystyle\overset{(b)}{\geq}\lambda-\phi(\lambda)-\frac{1}{T}\left(\mathbb{E}\left[\int_{0}^{T}\phi(\tilde{\Gamma}_{t}^{(1)}+p^{(1)}\lambda)-\phi(\lambda)\,dt\right]\right)
1T(𝔼[0Tϕ(Γ~t(2)+p(2)λ)ϕ(λ)dt])ϵ\displaystyle\phantom{===}-\frac{1}{T}\left(\mathbb{E}\left[\int_{0}^{T}\phi(\tilde{\Gamma}_{t}^{(2)}+p^{(2)}\lambda)-\phi(\lambda)\,dt\right]\right)-\epsilon
=(c)λ+ϕ(λ)𝔼[ϕ(Γ~S1(1)+p(1)λ)]𝔼[ϕ(Γ~S2(2)+p(2)λ)]ϵ,\displaystyle\overset{(c)}{=}\lambda+\phi(\lambda)-\mathbb{E}\left[\phi(\tilde{\Gamma}_{S_{1}}^{(1)}+p^{(1)}\lambda)\right]-\mathbb{E}\left[\phi(\tilde{\Gamma}^{(2)}_{S_{2}}+p^{(2)}\lambda)\right]-\epsilon,

where for (a), we have used (63),
for (b), we have used (65), and
for (c), we define S1S_{1} and S2S_{2} to be uniformly distributed on [0,T][0,T], independent of all other random variables and independent of each other as well.
For each i{1,2}i\in\{1,2\}, R(i)R^{(i)} in (62) can be lower bounded as

R(i)\displaystyle R^{(i)} 1TI(M(i);Y0(i),T)ϵ1T\displaystyle\geq\frac{1}{T}I(M^{(i)};{Y}^{(i),T}_{0})-\epsilon-\frac{1}{T}
(a)(1p(i))λ+μ(i)(1p(i))λ1TI(M(i);Z~0(i),T)ϵ1T\displaystyle\overset{(a)}{\geq}\frac{(1-p^{(i)})\lambda+\mu^{(i)}}{(1-p^{(i)})\lambda}\frac{1}{T}I(M^{(i)};\tilde{Z}_{0}^{(i),T})-\epsilon-\frac{1}{T}
=(b)(1p(i))λ+μ(i)(1p(i))λ1T𝔼[0Tϕ(Γ~t(i))ϕ((1p(i))λ)dt]ϵ1T\displaystyle\overset{(b)}{=}\frac{(1-p^{(i)})\lambda+\mu^{(i)}}{(1-p^{(i)})\lambda}\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\phi(\tilde{\Gamma}_{t}^{(i)})-\phi((1-p^{(i)})\lambda)\,dt\right]-\epsilon-\frac{1}{T}
=(c)(1p(i))λ+μ(i)(1p(i))λ𝔼[ϕ(Γ~Si(i))ϕ((1p(i))λ)]ϵ1T,\displaystyle\overset{(c)}{=}\frac{(1-p^{(i)})\lambda+\mu^{(i)}}{(1-p^{(i)})\lambda}\mathbb{E}\left[\phi(\tilde{\Gamma}_{S_{i}}^{(i)})-\phi((1-p^{(i)})\lambda)\right]-\epsilon-\frac{1}{T},

where for (a), we have used (64),
for (b), we have used (65), and
for (c), recall that S1S_{1} and S2S_{2} are uniformly distributed on [0,T][0,T], independent of all other random variables and independent of each other.
Now we use Carathéodory’s theorem [54, Theorem 17.1]. For each i{1,2}i\in\{1,2\}, there exist non-negative [ηk(i)]k=14[\eta^{(i)}_{k}]_{k=1}^{4} and [αk(i)]k=14[\alpha^{(i)}_{k}]_{k=1}^{4}, such that k=14αk(i)=1\sum_{k=1}^{4}\alpha^{(i)}_{k}=1 and

𝔼[ϕ(Γ~Si(i))]\displaystyle\mathbb{E}\left[\phi(\tilde{\Gamma}_{S_{i}}^{(i)})\right] =k=14αk(i)ϕ(ηk(i)),\displaystyle=\sum_{k=1}^{4}\alpha^{(i)}_{k}\phi(\eta^{(i)}_{k}),
𝔼[ϕ(Γ~Si(i)+p(i)λ)]\displaystyle\mathbb{E}\left[\phi(\tilde{\Gamma}_{S_{i}}^{(i)}+p^{(i)}\lambda)\right] =k=14αk(i)ϕ(ηk(i)+p(i)λ),\displaystyle=\sum_{k=1}^{4}\alpha^{(i)}_{k}\phi(\eta^{(i)}_{k}+p^{(i)}\lambda),
𝔼[Γ~Si(i)]\displaystyle\mathbb{E}\left[\tilde{\Gamma}_{S_{i}}^{(i)}\right] =k=14αk(i)ηk(i)=(1p(i))λ,\displaystyle=\sum_{k=1}^{4}\alpha^{(i)}_{k}\eta^{(i)}_{k}=(1-p^{(i)})\lambda,

where in the last line we have used the fact that, since Γ~0(i),T\tilde{\Gamma}_{0}^{(i),T} is the (σ(M(i),Z~0(i),t):t[0,T])(\sigma(M^{(i)},\tilde{Z}_{0}^{(i),t}):t\in[0,T])-intensity of Z~0(i),T\tilde{Z}_{0}^{(i),T}, we have 𝔼[0TΓ~t(i)𝑑t]=𝔼[Z~T(i)]=T(1p(i))λ\mathbb{E}\left[\int_{0}^{T}\tilde{\Gamma}_{t}^{(i)}\,dt\right]=\mathbb{E}[\tilde{Z}^{(i)}_{T}]=T(1-p^{(i)})\lambda. Hence we have

R(i)\displaystyle R^{(i)} (1p(i))λ+μ(i)(1p(i))λ(k=14αk(i)ϕ(ηk(i))ϕ((1p(i))λ))ϵ1T,\displaystyle\geq\frac{(1-p^{(i)})\lambda+\mu^{(i)}}{(1-p^{(i)})\lambda}\left(\sum_{k=1}^{4}\alpha^{(i)}_{k}\phi(\eta^{(i)}_{k})-\phi((1-p^{(i)})\lambda)\right)-\epsilon-\frac{1}{T}, (66)
D\displaystyle D λ+ϕ(λ)k=14αk(1)ϕ(ηk(1)+p(1)λ)k=14αk(2)ϕ(ηk(2)+p(2)λ)ϵ.\displaystyle\geq\lambda+\phi(\lambda)-\sum_{k=1}^{4}\alpha^{(1)}_{k}\phi(\eta^{(1)}_{k}+p^{(1)}\lambda)-\sum_{k=1}^{4}\alpha^{(2)}_{k}\phi(\eta^{(2)}_{k}+p^{(2)}\lambda)-\epsilon. (67)

Now define

βk(i)αk(i)ηk(i)(1p(i))λ,γk(i)p(i)αk(i)+(1p(i))βk(i).\beta^{(i)}_{k}\triangleq\frac{\alpha^{(i)}_{k}\eta^{(i)}_{k}}{(1-p^{(i)})\lambda},\quad\gamma^{(i)}_{k}\triangleq p^{(i)}\alpha^{(i)}_{k}+(1-p^{(i)})\beta^{(i)}_{k}.

We note that βk(i)=0\beta^{(i)}_{k}=0 if αk(i)=0\alpha^{(i)}_{k}=0, and k=14βk(i)=1\sum_{k=1}^{4}\beta^{(i)}_{k}=1. Substituting the above definitions in (66)

R(i)\displaystyle R^{(i)} (1p(i))λ+μ(i)(1p(i))λ(k=14αk(i)ηk(i)log(ηk(i))ϕ((1p(i))λ))ϵ1T\displaystyle\geq\frac{(1-p^{(i)})\lambda+\mu^{(i)}}{(1-p^{(i)})\lambda}\left(\sum_{k=1}^{4}\alpha^{(i)}_{k}\eta^{(i)}_{k}\log(\eta^{(i)}_{k})-\phi((1-p^{(i)})\lambda)\right)-\epsilon-\frac{1}{T}
=((1p(i))λ+μ(i))(k=14βk(i)log(βk(i)(1p(i))λαk(i))𝟏{αk(i)>0}log((1p(i))λ))\displaystyle=((1-p^{(i)})\lambda+\mu^{(i)})\left(\sum_{k=1}^{4}\beta^{(i)}_{k}\log\left(\frac{\beta^{(i)}_{k}(1-p^{(i)})\lambda}{\alpha^{(i)}_{k}}\right)\mathbf{1}\{\alpha^{(i)}_{k}>0\}-\log((1-p^{(i)})\lambda)\right) (68)
ϵ1T\displaystyle\phantom{=========}-\epsilon-\frac{1}{T}
=((1p(i))λ+μ(i))k=14βk(i)log(βk(i)αk(i))ϵ1T.\displaystyle=((1-p^{(i)})\lambda+\mu^{(i)})\sum_{k=1}^{4}\beta^{(i)}_{k}\log\left(\frac{\beta^{(i)}_{k}}{\alpha^{(i)}_{k}}\right)-\epsilon-\frac{1}{T}. (69)

Likewise,

k=14αk(i)ϕ(ηk(i)+p(i)λ)\displaystyle\sum_{k=1}^{4}\alpha^{(i)}_{k}\phi(\eta^{(i)}_{k}+p^{(i)}\lambda) =k=14αk(i)ϕ(βk(i)(1p(i))λαk(i)+p(i)λ)𝟏{αk(i)>0}\displaystyle=\sum_{k=1}^{4}\alpha^{(i)}_{k}\phi\left(\frac{\beta^{(i)}_{k}(1-p^{(i)})\lambda}{\alpha^{(i)}_{k}}+p^{(i)}\lambda\right)\mathbf{1}\{\alpha^{(i)}_{k}>0\}
=k=14αk(i)ϕ(γk(i)αk(i)λ)𝟏{αk(i)>0}\displaystyle=\sum_{k=1}^{4}\alpha^{(i)}_{k}\phi\left(\frac{\gamma^{(i)}_{k}}{\alpha^{(i)}_{k}}\lambda\right)\mathbf{1}\{\alpha^{(i)}_{k}>0\}
=λk=14γk(i)log(γk(i)αk(i))+ϕ(λ).\displaystyle=\lambda\sum_{k=1}^{4}\gamma^{(i)}_{k}\log\left(\frac{\gamma^{(i)}_{k}}{\alpha^{(i)}_{k}}\right)+\phi(\lambda).

Substituting the above in (67), we get

Dλϕ(λ)λk=14γk(1)log(γk(1)αk(1))λk=14γk(2)log(γk(2)αk(2))ϵ.\displaystyle D\geq\lambda-\phi(\lambda)-\lambda\sum_{k=1}^{4}\gamma^{(1)}_{k}\log\left(\frac{\gamma^{(1)}_{k}}{\alpha^{(1)}_{k}}\right)-\lambda\sum_{k=1}^{4}\gamma^{(2)}_{k}\log\left(\frac{\gamma^{(2)}_{k}}{\alpha^{(2)}_{k}}\right)-\epsilon. (70)
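
As a sanity check on the algebra leading to (69) and (70) (illustrative only, and not part of the proof), the two substitution identities can be verified numerically for randomly generated αk(i)\alpha^{(i)}_{k} and ηk(i)\eta^{(i)}_{k} satisfying the mean constraint; lam and p below are placeholder values.

import numpy as np

rng = np.random.default_rng(2)
lam, p = 3.0, 0.4
phi = lambda u: u * np.log(u)                      # phi(u) = u log u

alpha = rng.dirichlet(np.ones(4))
eta = rng.uniform(0.5, 2.0, 4)
eta *= (1 - p) * lam / np.dot(alpha, eta)          # enforce sum_k alpha_k*eta_k = (1-p)*lam

beta = alpha * eta / ((1 - p) * lam)
gamma = p * alpha + (1 - p) * beta

# Identity behind (69): sum_k alpha_k*phi(eta_k) - phi((1-p)*lam)
#                       = (1-p)*lam * sum_k beta_k*log(beta_k/alpha_k)
lhs1 = np.dot(alpha, phi(eta)) - phi((1 - p) * lam)
rhs1 = (1 - p) * lam * np.dot(beta, np.log(beta / alpha))

# Identity behind (70): sum_k alpha_k*phi(eta_k + p*lam)
#                       = lam * sum_k gamma_k*log(gamma_k/alpha_k) + phi(lam)
lhs2 = np.dot(alpha, phi(eta + p * lam))
rhs2 = lam * np.dot(gamma, np.log(gamma / alpha)) + phi(lam)

print(lhs1, rhs1)                                  # equal up to floating-point error
print(lhs2, rhs2)
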

If either p(i)p^{(i)}, say p(1)p^{(1)}, equals 1, then M(1)M^{(1)} and Y0TY_{0}^{T} are independent so that I(M(1);Y0T)=0I(M^{(1)};Y_{0}^{T})=0, and we can repeat the above steps to show that

R(2)\displaystyle R^{(2)} ((1p(2))λ+μ(2))k=14βk(2)log(βk(2)αk(2))ϵ1T,\displaystyle\geq((1-p^{(2)})\lambda+\mu^{(2)})\sum_{k=1}^{4}\beta^{(2)}_{k}\log\left(\frac{\beta^{(2)}_{k}}{\alpha^{(2)}_{k}}\right)-\epsilon-\frac{1}{T},
D\displaystyle D λϕ(λ)λk=14γk(2)log(γk(2)αk(2))ϵ,\displaystyle\geq\lambda-\phi(\lambda)-\lambda\sum_{k=1}^{4}\gamma^{(2)}_{k}\log\left(\frac{\gamma^{(2)}_{k}}{\alpha^{(2)}_{k}}\right)-\epsilon,

which is the region in (69)-(70) with αk(1)=βk(1)=γk(1)\alpha^{(1)}_{k}=\beta^{(1)}_{k}=\gamma^{(1)}_{k} for k{1,2,3,4}k\in\{1,2,3,4\}.

Since ϵ\epsilon is arbitrary, taking ϵ0\epsilon\to 0 and TT\to\infty gives us the rate region in the statement of the theorem.