
Functional Covering of Point Processes

Nirmal V. Shende and Aaron B. Wagner N. V. Shende is with Marvell Technology, Inc., Santa Clara, CA 95054 (email: nshende@marvell.com). A. B. Wagner is with the School of Electrical and Computer Engineering, Cornell University, Ithaca, NY 14853 (email:wagner@cornell.edu). This work was performed when N. V. Shende was a student at Cornell University. This paper was presented in part at the IEEE International Symposium on Information Theory, Paris, Jul. 2019 [1]. This research was supported by the US National Science Foundation under grants CCF-1956192, CCF-2008266, and CCF-1934985.
Abstract

We introduce a new distortion measure for point processes called functional-covering distortion. It is inspired by intensity theory and is related to both the covering of point processes and logarithmic-loss distortion. We obtain the distortion-rate function with feedforward under this distortion measure for a large class of point processes. For Poisson processes, the rate-distortion function is obtained under a more general distortion measure called constrained functional-covering distortion, of which both covering and functional-covering are special cases. Also for Poisson processes, we characterize the rate-distortion region for a two-encoder CEO problem and show that feedforward does not enlarge this region.

I Introduction

The classical theory of compression [2] focuses on discrete-time, sequential sources. The theory is thus well-suited to text, audio, speech, genomic data, and the like. Continuous-time signals are typically handled by reducing to discrete-time via projection onto a countable basis. Multi-dimensional extensions enable application to images and video.

Point processes model a distinct data type that appears in diverse domains such as neuroscience [3, 4, 5, 6, 7, 8], communication networks [9, 10, 11], imaging [12, 13], blockchains [14, 15, 16, 17], and photonics [18, 19, 20, 21, 22]. Formally, a point process can be viewed as a random counting measure on some space of interest [23], or if the space is a real line, a random counting function; we shall adopt the latter view. Informally, it may be viewed as simply a random collection of points representing epochs in time or points in space.

Compression of point processes emerges naturally in several of the above domains. Sub-cranial implants need to communicate the timing of neural firings to a monitoring station over a wireless link that is low-rate because it must traverse the skull [24, 25]. In network flow correlation analysis, one cross-correlates packet timings from different links in the network [11]; this requires communication of the packet timings from one place to another. Compressing point process realizations in 2-D (also known as point clouds) arises in computer vision [26, 27, 28], and so on.

Various specialized approaches have been developed for compressing point processes, and in particular for measuring distortion. One natural approach is for the compressed representation to be itself a point-process realization. In this case, the distortion can be the sum of the absolute values of the differences between the actual and reconstructed epochs, with the constraint that the two processes have the same number of points. For the Poisson point process, Gallager [29] obtained a lower bound on the rate-distortion function by insisting on causal reconstruction of the points while allowing them to be reordered. Bedekar [30] determined the rate-distortion function with the additional constraint that the order of the epochs be preserved in the reconstruction. Verdú [31] allowed the reconstruction to be non-causal. Coleman et al. [32] introduced the queueing distortion function, where the reproduced epochs lead the actual epochs. Rubin [33] used the L_{1} distance between the counting functions as a distortion measure. Koliander et al. [34] gave upper and lower bounds on the rate-distortion function under a generic distortion measure defined between pairs of point processes.

Most relevant to the present paper, Lapidoth et al. [35] introduced a covering distortion measure, where the reconstruction of a point process on [0,T][0,T] is a subset YY of [0,T][0,T] that must contain all the points, and the distortion is the Lebesgue measure of the covering set (see also Shen et al. [36]).

If we encode the subset YY as an indicator function

Yt={1if tY0otherwise,Y_{t}=\begin{cases}1&\text{if $t\in Y$}\\ 0&\text{otherwise,}\end{cases}

then Yt=0Y_{t}=0 guarantees that no point occurred at time tt, while Yt=1Y_{t}=1 indicates that a point may occur at tt. More generally, YtY_{t} could encode the relative belief that there is a point at tt. Inspired by this observation, and the notion of logarithmic-loss distortion [37, 38], we consider the following formulation. For a realization of a counting (or point) process y0T=(yt:t[0,T])y_{0}^{T}=(y_{t}:t\in[0,T]) (i.e., yty_{t} is integer-valued, non-decreasing, and has unit jumps) and a non-negative reconstruction y^0T\hat{y}_{0}^{T}, we define the functional-covering distortion as

d(y^0T,y0T)0Ty^t𝑑tlog(y^t)dyt.\displaystyle d(\hat{y}_{0}^{T},y_{0}^{T})\triangleq\int_{0}^{T}\hat{y}_{t}\,dt-\log(\hat{y}_{t})\,dy_{t}. (1)

This is related to the covering distortion measure in the following sense. If we impose that \hat{y}_{t}\in\{0,1\}, then (1) reduces to the covering distortion measure. Yet it is natural to consider the distortion in (1) without such a restriction, or with a more general set of allowable values for \hat{y}_{t}. In fact, there are advantages to not restricting \hat{y}_{0}^{T} to the set \{0,1\}. Consider a remote-source setting in which the encoder cannot access the point-process source directly, but instead observes a thinned version in which some of the points of the source process are deleted at random. In the case of covering distortion, the only viable reconstruction is then the entire interval [0,T] (i.e., \hat{y}_{t}=1 for all t\in[0,T]). Under functional-covering distortion, on the other hand, the problem has a nontrivial solution.
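To fix ideas, the following minimal sketch (with made-up values for the reconstruction and the arrival epochs) evaluates (1) for a piecewise-constant reconstruction; when \hat{y}_{t}\in\{0,1\} and every epoch is covered, the value is exactly the Lebesgue measure of the covering set.

```python
import numpy as np

def functional_covering_distortion(breaks, values, arrivals):
    """d(yhat, y) = int_0^T yhat_t dt - sum_i log(yhat_{t_i}) for a
    piecewise-constant yhat taking value values[k] on (breaks[k], breaks[k+1]]
    and arrival epochs t_i in (0, T], with breaks[0] = 0 and breaks[-1] = T."""
    breaks = np.asarray(breaks, dtype=float)
    values = np.asarray(values, dtype=float)
    integral = np.sum(values * np.diff(breaks))
    idx = np.searchsorted(breaks, arrivals, side="left") - 1   # interval of each epoch
    with np.errstate(divide="ignore"):
        logs = np.log(values[idx])            # -inf if an epoch is not "covered"
    return integral - np.sum(logs)

arrivals = [0.7, 2.1, 2.3]                    # epochs of y_0^T on (0, 4]
# Binary-valued reconstruction: reduces to the covering distortion (= 1.5 here).
print(functional_covering_distortion([0, 0.5, 1, 2, 3, 4], [0, 1, 0, 1, 0], arrivals))
# General non-negative reconstruction (a graded belief about where points lie).
print(functional_covering_distortion([0, 1, 3, 4], [0.2, 1.5, 0.2], arrivals))
```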

The relation of the functional-covering distortion measure to logarithmic loss is as follows. If we constrain \hat{y}_{0}^{T} to be bounded, then we can use a Girsanov-type transformation [39, Chapter VI, Theorems T2-T4] to define a probability measure on the set of all counting processes using \hat{y}_{0}^{T}, and the distortion can be defined as the expectation of the negative logarithm of the Radon-Nikodym derivative between this probability measure and an appropriately chosen reference measure, evaluated at the source realization; this is equivalent to (1). However, we will allow \hat{Y}_{0}^{T} to be unbounded, requiring only integrability: \mathbb{E}[\int_{0}^{T}\hat{Y}_{t}\,dt]<\infty.

The relation to intensity theory is as follows. Heuristically, given a random variable MM, the intensity of a point process represented by a counting function Y0TY_{0}^{T} is a non-negative process Γ0T\Gamma_{0}^{T} such that P(Yt+ΔYt=1|M,Y0t)ΓtΔP(Y_{t+\Delta}-Y_{t}=1|M,Y_{0}^{t})\approx\Gamma_{t}\Delta (see Definition 2 for the precise statement). From (1), we expect any optimal Y^0T\hat{Y}_{0}^{T} (in the rate-distortion trade-off sense) to be related to the intensity of Y0TY_{0}^{T}. In fact, we will see in the proof of Theorem 4 that an optimal reconstruction Y^0T\hat{Y}_{0}^{T} is the intensity of Y0TY_{0}^{T} given the encoder’s output.

Beyond the introduction of the functional-covering distortion measure and the accompanying coding theorems, the paper provides a collection of results for the information-theoretic analysis of point processes, which may be of independent use. One such contribution is Theorem 1, in which we derive the mutual information between point processes with intensities and arbitrary random variables. This is the most general expression available for mutual informations involving point processes with intensities; Theorem 1 subsumes the existing formulae for mutual informations involving doubly stochastic Poisson processes [40, 41, 42] and queuing processes [43] as special cases. The other contributions of the paper are as follows. We obtain the rate-distortion trade-off with feedforward under the functional-covering distortion measure for point processes that admit intensities (see Theorem 4). For Poisson processes, we obtain the rate-distortion region when the reconstruction function \hat{y}_{0}^{T} is constrained to take values in a subset of the reals (Theorem 5). The covering distortion in [35, Theorem 1] is a special case of this constrained functional-covering distortion, hence the rate-distortion function in [35] can be obtained as a special case of this theorem. We characterize the rate-distortion region for a two-encoder Poisson CEO problem (see Figure 1) under functional-covering distortion in Theorem 6. To prove the converse of the CEO problem, we derive a strong data processing inequality for Poisson processes under superposition (see Theorem 2), which complements the strong data processing inequality for Poisson processes under thinning due to Wang [44]. We also provide a self-contained proof of Wang's theorem in Theorem 3. The solution to the CEO problem gives the rate-distortion trade-off for remote Poisson sources as an immediate corollary.

II Preliminaries

We will consider a probability space (\Omega,\mathcal{F},P) on which all stochastic processes considered here are defined. For a finite T>0, let (\mathcal{F}_{t}:t\in[0,T]) be an increasing family of \sigma-fields with \mathcal{F}_{T}\subseteq\mathcal{F}. We will assume that the given filtration (\mathcal{F}_{t}:t\in[0,T]), P, and \mathcal{F} satisfy the “usual conditions” [39, Chapter III, p. 75]: \mathcal{F} is complete with respect to P, the filtration (\mathcal{F}_{t}:t\in[0,T]) is right continuous, and \mathcal{F}_{0} contains all the P-null sets of \mathcal{F}. Stochastic processes are denoted as \hat{Y}_{0}^{T}=\{\hat{Y}_{t}:0\leq t\leq T\}. The process X_{0}^{T} is said to be adapted to the history (\mathcal{F}_{t}:t\in[0,T]) if X_{t} is \mathcal{F}_{t}-measurable for all t\in[0,T]. The internal history recorded by the process X_{0}^{T} is denoted by \mathcal{F}^{X}_{t}=\sigma(X_{s}:s\in[0,t]), where \sigma(A) denotes the \sigma-field generated by A.

A process X0TX_{0}^{T} is called (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-predictable if X0X_{0} is 0\mathcal{F}_{0} measurable and the mapping (t,ω)Xt(ω)(t,\omega)\to X_{t}(\omega) defined from (0,T)×Ω(0,T)\times\Omega into \mathbb{R} (the set of real numbers) is measurable with respect to the σ\sigma-field over (0,T)×Ω(0,T)\times\Omega generated by rectangles of the form

(s,t]×A;0<stT,As.\displaystyle(s,t]\times A;\quad 0<s\leq t\leq T,\quad A\in\mathcal{F}_{s}. (2)

For two measurable spaces (Ω1,1)(\Omega_{1},\mathcal{F}_{1}) and (Ω2,2)(\Omega_{2},\mathcal{F}_{2}), the product space is denoted by (Ω1×Ω2,12)(\Omega_{1}\times\Omega_{2},\mathcal{F}_{1}\otimes\mathcal{F}_{2}). We say that ABCA\rightleftarrows B\rightleftarrows C forms a Markov chain under measure PP if AA and CC are conditionally independent given BB under PP. PQP\ll Q denotes that the probability measure PP is absolutely continuous with respect to the measure QQ. 1{𝖤}\textbf{1}\{\mathsf{E}\} denotes the indicator function for an event 𝖤\mathsf{E}. log(x)\log(x) is the natural logarithm of xx. (x)+(x)^{+} and (x)(x)^{-} denote the positive (max(x,0)\max(x,0)) and the negative part (min(x,0)-\min(x,0)) of xx respectively. x\lceil x\rceil denotes the ceiling of xx. Throughout this paper we will adopt the convention that 0log(0)=00\log(0)=0, exp(log(0))=0\exp(\log(0))=0, and 00=10^{0}=1.

Definition 1

ϕ(x)=xlog(x)\phi(x)=x\log(x) with convention that 0log(0)=00\log(0)=0.

We note that ϕ(x)\phi(x) is convex.

We will use the following form of Jensen’s inequality [45, Theorem 7.9, p. 149] and [45, Theorem 8.20, p. 177].

Lemma 1

If f(x)f(x) is a convex function and 𝔼|X|<\mathbb{E}|X|<\infty then 𝔼[f(X)]\mathbb{E}[f(X)] exists and for any two σ\sigma-fields AA and BB,

𝔼[f(X)]𝔼[f(𝔼[X|A,B])]𝔼[f(𝔼[X|A])]f(𝔼[X]).\displaystyle\mathbb{E}[f(X)]\geq\mathbb{E}[f(\mathbb{E}[X|A,B])]\geq\mathbb{E}[f(\mathbb{E}[X|A])]\geq f(\mathbb{E}[X]).

We now recall the definition of mutual information for general ensembles and its properties. Let AA, BB, and CC be measurable mappings defined on a given probability space (Ω,,P)(\Omega,\mathcal{F},P), taking values in (𝒜,𝔉A)(\mathcal{A},\mathfrak{F}^{A}), (,𝔉B)(\mathcal{B},\mathfrak{F}^{B}), and (𝒞,𝔉C)(\mathcal{C},\mathfrak{F}^{C}) respectively. Consider partitions of Ω\Omega, 𝔔A={𝙰i,1iNA}σ(A)\mathfrak{Q}_{A}=\left\{\mathtt{A}_{i},1\leq i\leq N_{A}\right\}\subseteq\sigma(A) and 𝔔B={𝙱j,1jNB}σ(B)\mathfrak{Q}_{B}=\left\{\mathtt{B}_{j},1\leq j\leq N_{B}\right\}\subseteq\sigma(B). Wyner defined the conditional mutual information I(A;B|C)I(A;B|C) as [46]

I(A;B|C)=sup𝔔A,𝔔B𝔼[i,j=1,1NA,NBP(𝙰i,𝙱j|C)log(P(𝙰i,𝙱j|C)P(𝙰i|C)P(𝙱j|C))],\displaystyle I(A;B|C)=\sup_{\mathfrak{Q}_{A},\mathfrak{Q}_{B}}\mathbb{E}\left[\sum_{i,j=1,1}^{N_{A},N_{B}}P(\mathtt{A}_{i},\mathtt{B}_{j}|C)\log\left(\frac{P(\mathtt{A}_{i},\mathtt{B}_{j}|C)}{P(\mathtt{A}_{i}|C)P(\mathtt{B}_{j}|C)}\right)\right], (3)

where the supremum is over all such partitions of \Omega. Wyner showed that I(A;B|C)\geq 0 with equality if and only if A\rightleftarrows C\rightleftarrows B forms a Markov chain [46, Lemma 3.1], and that (what is generally referred to as) Kolmogorov's formula holds [46, Lemma 3.2]

I(A,C;B)=I(A;B)+I(C;B|A).\displaystyle I(A,C;B)=I(A;B)+I(C;B|A). (4)

Hence if I(A;B)<I(A;B)<\infty, then I(C;B|A)=I(A,C;B)I(A;B)I(C;B|A)=I(A,C;B)-I(A;B). The data processing inequality can be obtained from (4) as well: if ACBA\rightleftarrows C\rightleftarrows B forms a Markov chain, then I(A;B)I(C;B)I(A;B)\leq I(C;B).

Denote by PA,BP^{A,B}, the joint distribution of AA and BB on the space (𝒜×,𝔉A𝔉B\mathcal{A}\times\mathcal{B},\mathfrak{F}^{A}\otimes\mathfrak{F}^{B} ), i.e.,

\displaystyle P^{A,B}(dA\times dB)=P(A^{-1}(dA)\cap B^{-1}(dB)),\quad dA\in\mathfrak{F}^{A},dB\in\mathfrak{F}^{B}.

Similarly, PAP^{A} and PBP^{B} denote the marginal distributions. Gelfand and Yaglom [47] proved that if PA,BPA×PBP^{A,B}\ll P^{A}\times P^{B}, then the mutual information I(A;B)I(A;B) (defined via (3) by taking σ(C)\sigma(C) to be the trivial σ\sigma-field) can be computed as:

I(A;B)=𝔼[log(dPA,Bd(PA×PB))].\displaystyle I(A;B)=\mathbb{E}\left[\log\left(\frac{dP^{A,B}}{d(P^{A}\times P^{B})}\right)\right]. (5)

A sufficient condition for PA,BPA×PBP^{A,B}\ll P^{A}\times P^{B} is that I(A;B)<I(A;B)<\infty [48, Lemma 5.2.3, p. 92]. We will also require the following result [46, Lemma 2.1]:

Lemma 2 (Wyner’s Lemma)

If MM is a finite alphabet random variable, then

I(M;U0T)=H(M)𝔼[H(M|U0T)],\displaystyle I(M;U_{0}^{T})=H(M)-\mathbb{E}\left[H(M|U_{0}^{T})\right],

where

H(M|U0T)=mP(M=m|U0T)log(P(M=m|U0T)),\displaystyle H(M|U_{0}^{T})=-\sum_{m}P(M=m|U_{0}^{T})\log\left(P(M=m|U_{0}^{T})\right),

and H(M)H(M) is the entropy of MM.

III Point Processes, Intensities, and Mutual Information

Let \mathcal{N}_{0}^{T} denote the set of counting realizations (or point-process realizations) on [0,T]; i.e., if {N}^{T}_{0}\in\mathcal{N}_{0}^{T}, then N_{t}\in\mathbf{N} (the set of non-negative integers) for every t\in[0,T], and N_{0}^{T} is right continuous, has unit increasing jumps, and satisfies N_{0}=0. Let \mathfrak{F}^{N} be the restriction to \mathcal{N}_{0}^{T} of the \sigma-field generated by the Skorohod topology on D[0,T].

Definition 2

If N0TN_{0}^{T} is a counting process adapted to the history (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T]), then N0TN_{0}^{T} is said to have (P,t:t[0,T])(P,\mathcal{F}_{t}:t\in[0,T])-intensity Γ0T=(Γt:t[0,T]){\Gamma}_{0}^{T}=(\Gamma_{t}:t\in[0,T]), where Γ0T\Gamma_{0}^{T} is a non-negative measurable process if

  • Γ0T\Gamma_{0}^{T} is (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-predictable,

  • 0TΓt𝑑t<\int_{0}^{T}\Gamma_{t}\,dt<\infty, PP-a.s.,

  • and for all non-negative (\mathcal{F}_{t}:t\in[0,T])-predictable processes C_{0}^{T} (the limits of the Lebesgue-Stieltjes integral \int_{a}^{b} should be interpreted as \int_{(a,b]}):

    𝔼[0TCs𝑑Ns]=𝔼[0TCsΓs𝑑s].\displaystyle\mathbb{E}\left[\int_{0}^{T}C_{s}\,d{N}_{s}\right]=\mathbb{E}\left[\int_{0}^{T}C_{s}\Gamma_{s}\,ds\right].

When it is clear from the context, we will drop the probability measure PP from the notation and say N0TN_{0}^{T} has (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-intensity Γ0T\Gamma_{0}^{T}.

Definition 3

A point process Y_{0}^{T} is said to be a Poisson process with rate \lambda if its (\mathcal{F}_{t}^{Y}:t\in[0,T])-intensity is (\lambda:t\in[0,T]).

The above definition can be shown to imply the usual definition of Poisson process [39, Theorem T4, Chapter II, p. 25] and vice versa [39, Section 2, Chapter II, p. 23].
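The equivalence with the usual construction also gives a direct way to simulate such a process; a minimal sketch (arbitrary rate and horizon) that generates a rate-\lambda realization from i.i.d. Exponential(\lambda) inter-arrival times follows.

```python
import numpy as np

def sample_poisson_process(lam, T, rng):
    """Arrival epochs of a rate-lam Poisson process on (0, T], generated from
    i.i.d. Exponential(lam) inter-arrival times (the "usual" definition)."""
    arrivals = []
    t = rng.exponential(1.0 / lam)
    while t <= T:
        arrivals.append(t)
        t += rng.exponential(1.0 / lam)
    return np.array(arrivals)

rng = np.random.default_rng(1)
y = sample_poisson_process(lam=2.0, T=5.0, rng=rng)
print(len(y), y[:3])      # the count Y_T is Poisson(lam * T) distributed
```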

Definition 4

P0Y0TP_{0}^{Y_{0}^{T}} denotes the distribution of a point process Y0TY_{0}^{T} (on the space (𝒩0T,𝔉N)(\mathcal{N}_{0}^{T},\mathfrak{F}^{N})) under which Y0TY_{0}^{T} is a Poisson process with unit rate.

A point process with a stochastic intensity and the unit-rate Poisson process are linked via the following result.

Lemma 3

Let PY0TP^{Y_{0}^{T}} be the distribution of a point process Y0TY_{0}^{T} such that PY0TP0Y0TP^{Y_{0}^{T}}\ll P_{0}^{Y_{0}^{T}}. Then there exists a non-negative predictable process Λ0T\Lambda_{0}^{T} such that

dPY0TdP0Y0T=exp(0Tlog(Λt)𝑑YtΛt+1dt).\displaystyle\frac{dP^{Y_{0}^{T}}}{dP_{0}^{Y_{0}^{T}}}=\exp\left(\int_{0}^{T}\log(\Lambda_{t})\,dY_{t}-\Lambda_{t}+1\,dt\right).

Moreover, the (PY0T,(tY:t[0,T]))(P^{Y_{0}^{T}},(\mathcal{F}_{t}^{Y}:t\in[0,T]))-intensity of Y0TY_{0}^{T} is Λ0T\Lambda_{0}^{T}. Conversely, if the (PY0T,tY:t[0,T])(P^{Y_{0}^{T}},\mathcal{F}_{t}^{Y}:t\in[0,T])-intensity of Y0TY_{0}^{T} is Γ0T\Gamma_{0}^{T} and 𝔼PY0T[0T|ϕ(Γt)|𝑑t]<\mathbb{E}_{P^{Y_{0}^{T}}}[\int_{0}^{T}|\phi(\Gamma_{t})|\,dt]<\infty, then PY0TP0Y0TP^{Y_{0}^{T}}\ll P_{0}^{Y_{0}^{T}}, and the corresponding Radon-Nikodym derivative is given by the above expression, where

𝔼PY0T[0T|ΓtΛt|𝑑t]=0,𝔼PY0T[0T𝟏{ΓtΛt}𝑑Yt]=0.\displaystyle\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}|\Gamma_{t}-\Lambda_{t}|\,dt\right]=0,\quad\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}\mathbf{1}\{\Gamma_{t}\neq\Lambda_{t}\}\,dY_{t}\right]=0.

In the latter case,

𝔼PY0T[log(dPY0TdP0Y0T)]=𝔼PY0T[0Tϕ(Γt)Γt+1dt].\displaystyle\mathbb{E}_{P^{Y_{0}^{T}}}\left[\log\left(\frac{dP^{Y_{0}^{T}}}{dP_{0}^{Y_{0}^{T}}}\right)\right]=\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}\phi(\Gamma_{t})-\Gamma_{t}+1\,dt\right].
Proof:

Please see the supplementary material. ∎
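As a quick numerical check of the last identity in Lemma 3 (a sketch assuming a constant intensity \Lambda_{t}\equiv\lambda, with arbitrary parameter values), note that for a homogeneous rate-\lambda path the Radon-Nikodym derivative above reduces to \exp(Y_{T}\log\lambda+(1-\lambda)T), so the empirical mean of its logarithm should match T(\phi(\lambda)-\lambda+1).

```python
import numpy as np

rng = np.random.default_rng(0)
lam, T, trials = 2.0, 10.0, 200000

# For Lambda_t = lam, Lemma 3 gives log(dP/dP0) = Y_T*log(lam) + (1 - lam)*T.
counts = rng.poisson(lam * T, size=trials)
log_lr = counts * np.log(lam) + (1.0 - lam) * T

phi = lambda x: x * np.log(x)
print(log_lr.mean())                   # Monte Carlo estimate of E[log(dP/dP0)]
print(T * (phi(lam) - lam + 1.0))      # predicted value T*(phi(lam) - lam + 1)
```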

The following theorem allows us to express the mutual information between a point process that admits an intensity and an arbitrary random variable in terms of the intensity functions. The proof of the theorem is similar to the proof of Theorem 1 in [42].

Theorem 1

Let Y0TY_{0}^{T} be a point process with (tY:t[0,T])(\mathcal{F}_{t}^{Y}:t\in[0,T])-intensity Λ0T\Lambda_{0}^{T} such that

𝔼[0T|ϕ(Λt)|𝑑t]<,\mathbb{E}[\int_{0}^{T}|\phi(\Lambda_{t})|\,dt]<\infty,

and let MM be a measurable mapping on the given probability space satisfying I(M;Y0T)<I(M;Y_{0}^{T})<\infty. Then there exists a process Γ0T\Gamma_{0}^{T} such that Γ0T\Gamma_{0}^{T} is the (𝒢t=σ(M,Y0t):t[0,T])(\mathcal{G}_{t}=\sigma(M,Y_{0}^{t}):t\in[0,T]) intensity of Y0TY_{0}^{T} and

I(M;Y0T)=𝔼[0Tϕ(Γt)ϕ(Λt)dt].\displaystyle I(M;Y_{0}^{T})=\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})-\phi(\Lambda_{t})\,dt\right].
Proof:

Let PM,Y0TP^{M,Y_{0}^{T}} denote the joint distribution of MM and Y0TY_{0}^{T}, and PMP^{M} and PY0TP^{Y_{0}^{T}} denote their marginals, respectively. Since I(M;Y0T)<I(M;Y_{0}^{T})<\infty, we get that PM,Y0TPM×PY0TP^{M,Y_{0}^{T}}\ll P^{M}\times P^{Y_{0}^{T}} [48, Lemma 5.2.3, p. 92]. Lemma 3 says that PY0TP0Y0TP^{Y_{0}^{T}}\ll P_{0}^{Y_{0}^{T}}, which together with [49, Chapter 1, Exercise 19, p. 22] gives PM,Y0TPM×PY0TPM×P0Y0TP^{M,Y_{0}^{T}}\ll P^{M}\times P^{Y_{0}^{T}}\ll P^{M}\times P_{0}^{Y_{0}^{T}}.

Let P~M,Y0TPM×P0Y0T\tilde{P}^{M,Y_{0}^{T}}\triangleq{P}^{M}\times P_{0}^{Y_{0}^{T}} and

=dPM,Y0TdP~M,Y0T\displaystyle\mathcal{L}=\frac{d{P}^{M,Y_{0}^{T}}}{d\tilde{P}^{M,Y_{0}^{T}}} (6)

denote the Radon-Nikodym derivative. Since under P~M,Y0T\tilde{P}^{M,Y_{0}^{T}}, MM and Y0TY_{0}^{T} are independent, we note that the (P~M,Y0T,(𝒢t:t[0,T]))(\tilde{P}^{M,Y_{0}^{T}},(\mathcal{G}_{t}:t\in[0,T]))-intensity of Y0TY_{0}^{T} is 1 [39, E5 Exercise, Chapter II, p. 28]. Define the process L0TL_{0}^{T} as

Lt=𝔼P~[|𝒢t],t[0,T],\displaystyle L_{t}=\mathbb{E}_{\tilde{P}}[\mathcal{L}|\mathcal{G}_{t}],\quad t\in[0,T], (7)

where 𝔼P~\mathbb{E}_{\tilde{P}} denotes that the conditional expectation is taken with respect to the measure P~M,Y0T\tilde{P}^{M,Y_{0}^{T}}. Then L0TL_{0}^{T} is a (P~M,Y0T,𝒢t)(\tilde{P}^{M,Y_{0}^{T}},\mathcal{G}_{t}) non-negative absolutely-integrable martingale.

By the martingale representation theorem, the process L0TL_{0}^{T} can be written as [39, Chapter III, Theorem T17, p. 76] (where we have taken σ(M)\sigma(M) to be the “germ σ\sigma-field”):

Lt=1+0tKs(dYsds),\displaystyle L_{t}=1+\int_{0}^{t}K_{s}(d{Y}_{s}-ds),

where K0TK_{0}^{T} is a (𝒢t:t[0,T])(\mathcal{G}_{t}:t\in[0,T])-predictable process which satisfies 0T|Kt|𝑑t<\int_{0}^{T}|K_{t}|\,dt<\infty P~M,Y0T\tilde{P}^{M,Y_{0}^{T}}-a.s. Applying [50, Lemma 19.5, p. 315], we can write L0TL_{0}^{T} as

Lt=exp(0tlog(Γs)𝑑Ys+(1Γs)ds),t[0,T],\displaystyle L_{t}=\exp\left(\int_{0}^{t}\log(\Gamma_{s})\,dY_{s}+(1-\Gamma_{s})\,ds\right),\quad t\in[0,T], (8)

where Γ0T\Gamma_{0}^{T} is a non-negative (𝒢t:t[0,T])(\mathcal{G}_{t}:t\in[0,T])-predictable process, and Γt<\Gamma_{t}<\infty P~M,Y0T\tilde{P}^{M,Y_{0}^{T}}-a.s. for t[0,T]t\in[0,T].

Now we can mimic the proof of [39, Chapter VI, Theorems T3, p. 166] to deduce:

Lemma 4

For all non-negative (𝒢t:t[0,T])(\mathcal{G}_{t}:t\in[0,T])-predictable processes C0TC_{0}^{T}

𝔼[0TCtΓt𝑑t]=𝔼[0TCt𝑑Yt],\displaystyle\mathbb{E}\left[\int_{0}^{T}C_{t}\Gamma_{t}\,dt\right]=\mathbb{E}\left[\int_{0}^{T}C_{t}\,dY_{t}\right],

where the expectation is taken with respect to measure PP.

Proof:

Please see the supplementary material. ∎

Taking Ct=1C_{t}=1 in the above equality yields

𝔼[0TΓt𝑑t]=𝔼[0T𝑑Yt]=𝔼[0TΛt𝑑t]<.\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt\right]=\mathbb{E}\left[\int_{0}^{T}\,dY_{t}\right]=\mathbb{E}\left[\int_{0}^{T}\Lambda_{t}\,dt\right]<\infty. (9)

Hence 0TΓt𝑑t<\int_{0}^{T}\Gamma_{t}\,dt<\infty PP-a.s. and we conclude that the (PM,Y0T,𝒢t:t[0,T])({P}^{M,Y_{0}^{T}},\mathcal{G}_{t}:t\in[0,T])-intensity of Y0TY_{0}^{T} is Γ0T\Gamma_{0}^{T}.

Now we will use:

Lemma 5
𝔼[0Tlog(Γt)𝑑Yt]=𝔼[0Tϕ(Γt)𝑑t].\displaystyle\mathbb{E}\left[\int_{0}^{T}\log(\Gamma_{t})\,dY_{t}\right]=\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]. (10)
Proof:

Please see the supplementary material. ∎

Since \mathbb{E}\left[\log\left(\frac{dP^{M,Y_{0}^{T}}}{d\tilde{P}^{M,Y_{0}^{T}}}\right)\right] is well-defined, (6), (7), and (8) yield

\displaystyle\mathbb{E}\left[\log\left(\frac{dP^{M,Y_{0}^{T}}}{d\tilde{P}^{M,Y_{0}^{T}}}\right)\right] =\mathbb{E}\left[\log(L_{T})\right]
=𝔼[0Tlog(Γt)𝑑Yt+(1Γt)dt]\displaystyle=\mathbb{E}\left[\int_{0}^{T}\log(\Gamma_{t})\,dY_{t}+(1-\Gamma_{t})\,dt\right]
=𝔼[0Tϕ(Γt)𝑑t]+𝔼[0T(1Λt)𝑑t],\displaystyle=\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]+\mathbb{E}\left[\int_{0}^{T}(1-\Lambda_{t})\,dt\right], (11)

where in the last line we have used Lemma 5 and 𝔼[0TΓt𝑑t]=𝔼[0TΛt𝑑t]<\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt\right]=\mathbb{E}\left[\int_{0}^{T}\Lambda_{t}\,dt\right]<\infty from (9). Also,

\displaystyle\mathbb{E}\left[\log\left(\frac{d(P^{M}\times P^{Y_{0}^{T}})}{d\tilde{P}^{M,Y_{0}^{T}}}\right)\right] =\mathbb{E}\left[\log\left(\frac{dP^{Y_{0}^{T}}}{dP_{0}^{Y_{0}^{T}}}\right)\right]
=(a)𝔼[0Tϕ(Λt)+1Λtdt]\displaystyle\overset{(a)}{=}\mathbb{E}\left[\int_{0}^{T}\phi(\Lambda_{t})+1-\Lambda_{t}\,dt\right] (12)
<,\displaystyle<\infty,

where we have used Lemma 3 for (a). Using the above inequality and the fact that

\mathbb{E}\left[\log\left(\frac{dP^{M,Y_{0}^{T}}}{d\tilde{P}^{M,Y_{0}^{T}}}\right)\right]

is well-defined, we can express the mutual information as

I(M;Y0T)\displaystyle I(M;Y_{0}^{T}) =𝔼[log(dPM,Y0Td(PM×PY0T))]\displaystyle=\mathbb{E}\left[\log\left(\frac{d{P}^{M,Y_{0}^{T}}}{d({P}^{M}\times P^{Y_{0}^{T}})}\right)\right]
\displaystyle=\mathbb{E}\left[\log\left(\frac{dP^{M,Y_{0}^{T}}}{d\tilde{P}^{M,Y_{0}^{T}}}\right)\right]-\mathbb{E}\left[\log\left(\frac{d(P^{M}\times P^{Y_{0}^{T}})}{d\tilde{P}^{M,Y_{0}^{T}}}\right)\right]. (13)

Now we can compute the mutual information from (11), (12), and (13),

I(M;Y0T)\displaystyle I(M;Y_{0}^{T}) =𝔼[0Tϕ(Γt)𝑑t]+𝔼[0T(1Λt)𝑑t]𝔼[0Tϕ(Λt)𝑑t]𝔼[0T1Λtdt]\displaystyle=\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]+\mathbb{E}\left[\int_{0}^{T}(1-\Lambda_{t})\,dt\right]-\mathbb{E}\left[\int_{0}^{T}\phi(\Lambda_{t})\,dt\right]-\mathbb{E}\left[\int_{0}^{T}1-\Lambda_{t}\,dt\right]
=𝔼[0Tϕ(Γt)𝑑t]𝔼[0Tϕ(Λt)𝑑t]\displaystyle=\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]-\mathbb{E}\left[\int_{0}^{T}\phi(\Lambda_{t})\,dt\right]
=𝔼[0Tϕ(Γt)ϕ(Λt)dt].\displaystyle=\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})-\phi(\Lambda_{t})\,dt\right].
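Theorem 1 can be checked numerically in a simple special case. In the sketch below (the parameter choices are arbitrary), M is uniform on two values and, given M, Y_{0}^{T} is Poisson with rate M, so that \Gamma_{t}=M and \Lambda_{t}=\mathbb{E}[M|\mathcal{F}^{Y}_{t-}]; the Monte Carlo estimate of \mathbb{E}[\int_{0}^{T}\phi(\Gamma_{t})-\phi(\Lambda_{t})\,dt] is compared against a direct computation of I(M;Y_{T}), which equals I(M;Y_{0}^{T}) here because Y_{T} is a sufficient statistic for M.

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(0)
a, b, T = 0.5, 2.0, 4.0                  # M equals a or b with probability 1/2 each
grid = np.linspace(0.0, T, 2001)         # time grid for the integral in Theorem 1
phi = lambda x: x * np.log(x)

def theorem1_integral(rng):
    """One sample of int_0^T (phi(Gamma_t) - phi(Lambda_t)) dt for the mixed
    Poisson process: Gamma_t = M and Lambda_t = E[M | Y_0^{t-}]."""
    m = a if rng.random() < 0.5 else b
    arrivals = np.sort(rng.uniform(0.0, T, size=rng.poisson(m * T)))
    counts = np.searchsorted(arrivals, grid, side="left")       # Y_{t-} on the grid
    # posterior weight of {M = a} given Y_0^{t-} (log-likelihoods w.r.t. unit rate)
    la = counts * np.log(a) - a * grid
    lb = counts * np.log(b) - b * grid
    wa = 1.0 / (1.0 + np.exp(lb - la))
    lam = wa * a + (1.0 - wa) * b                                # Lambda_t
    return np.mean(phi(m) - phi(lam)) * T

mc = np.mean([theorem1_integral(rng) for _ in range(20000)])

# Direct value of I(M; Y_T) from the two Poisson pmfs (truncated at n = 60).
n = np.arange(0, 61)
log_fact = np.array([lgamma(k + 1.0) for k in n])
pa = np.exp(n * np.log(a * T) - a * T - log_fact)
pb = np.exp(n * np.log(b * T) - b * T - log_fact)
pmix = 0.5 * (pa + pb)
direct = 0.5 * np.sum(pa * np.log(pa / pmix)) + 0.5 * np.sum(pb * np.log(pb / pmix))
print(mc, direct)                        # the two numbers should be close
```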

We shall require several strong data processing inequalities, for which purpose we now derive some ancillary results regarding the intensity of a point process. Combining [39, T8 Theorem, Chapter II, p. 27] and [39, T9 Theorem, Chapter II, p. 28], we can conclude the following result.

Lemma 6

Let Γ0T\Gamma_{0}^{T} be a (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-predictable non-negative process satisfying

0TΓt𝑑t<a.s.\int_{0}^{T}\Gamma_{t}\,dt<\infty\quad\text{a.s.}

Let Y0TY_{0}^{T} be a point process adapted to (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T]). Then Γ0T\Gamma_{0}^{T} is the (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-intensity of Y0TY_{0}^{T} if and only if

MtYt0tΓs𝑑st[0,T]M_{t}\triangleq Y_{t}-\int_{0}^{t}\Gamma_{s}\,ds\quad t\in[0,T]

is a (\mathcal{F}_{t}:t\in[0,T])-local martingale. (A process Y_{0}^{T} is called a local martingale with respect to a filtration (\mathcal{F}_{t}:t\geq 0) if Y_{t} is \mathcal{F}_{t}-measurable for each t\in[0,T] and there exists an increasing sequence of stopping times T_{n} such that T_{n}\to\infty and the stopped and shifted processes (Y_{\min\{t,T_{n}\}}-Y_{0}:t\in[0,T]) are (\mathcal{F}_{t}:t\in[0,T])-martingales for each n.)

If we impose the stricter condition of finite expectation 𝔼[0TΓt𝑑t]<\mathbb{E}[\int_{0}^{T}\Gamma_{t}\,dt]<\infty, the local martingale condition in the above statement can be replaced by the martingale condition.

Lemma 7

Let Γ0T\Gamma_{0}^{T} be a (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-predictable non-negative process satisfying

𝔼[0TΓt𝑑t]<.\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt\right]<\infty.

Let Y0TY_{0}^{T} be a point process adapted to (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T]). Then Γ0T\Gamma_{0}^{T} is the (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-intensity of Y0TY_{0}^{T} if and only if

MtYt0tΓs𝑑st[0,T]M_{t}\triangleq Y_{t}-\int_{0}^{t}\Gamma_{s}\,ds\quad t\in[0,T]

is a (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-martingale.

Proof:

Please see the supplementary material. ∎

Lemma 8

If a point process N0TN_{0}^{T} has (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-intensity Λ0T{\Lambda}_{0}^{T}, and (𝒢t:t[0,T])(\mathcal{G}_{t}:t\in[0,T]) is another history for N0TN_{0}^{T} such that 𝒢tt\mathcal{G}_{t}\subseteq\mathcal{F}_{t} for each t[0,T]t\in[0,T], then there exists a process Π0T\Pi_{0}^{T} such that Π0T\Pi_{0}^{T} is the (𝒢t:t[0,T])(\mathcal{G}_{t}:t\in[0,T])-intensity of N0T{N}_{0}^{T}, and for each t[0,T]t\in[0,T], Πt=𝔼[Λt|𝒢t]\Pi_{t}=\mathbb{E}[{\Lambda}_{t}|\mathcal{G}_{t-}] PP-a.s.

Proof:

Please see the supplementary material. ∎

Lemma 9

Let Y_{0}^{T} be a point process with (\mathcal{G}_{t}\triangleq\sigma(M,Y_{0}^{t}):t\in[0,T])-intensity \Gamma_{0}^{T} for some M. Let Z_{0}^{T} be obtained by adding to Y_{0}^{T} an independent (of both M and Y_{0}^{T}) point process N_{0}^{T} with (\mathcal{F}_{t}^{N}:t\in[0,T])-intensity \Pi_{0}^{T}. Then Z_{0}^{T} has a (\mathcal{F}_{t}\triangleq\sigma(M,Z_{0}^{t}):t\in[0,T])-intensity \Theta_{0}^{T} which satisfies \Theta_{t}=\mathbb{E}[(\Gamma_{t}+\Pi_{t})|\mathcal{F}_{t-}] P-a.s. for each t\in[0,T].

Proof:

Please see the supplementary material. ∎

Theorem 2

Let Y0TY_{0}^{T} be a Poisson process with rate λ\lambda, MM be such that I(M;Y0T)<I(M;Y_{0}^{T})<\infty, and Γ0T\Gamma_{0}^{T} be the (σ(M;Y0t):t[0,T])(\sigma(M;Y_{0}^{t}):t\in[0,T])-intensity of Y0TY_{0}^{T}. Suppose Z0TZ_{0}^{T} is obtained by adding an independent (of Y0TY_{0}^{T} and MM) Poisson process with rate μ\mu to Y0TY_{0}^{T}. Then,

I(M;Y0T)\displaystyle I(M;Y_{0}^{T}) =𝔼[0Tϕ(Γt)ϕ(λ)dt],\displaystyle=\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})-\phi(\lambda)\,dt\right],
I(M;Z0T)\displaystyle I(M;Z_{0}^{T}) 𝔼[0Tϕ(Γt+μ)ϕ(λ+μ)dt].\displaystyle\leq\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t}+\mu)-\phi(\lambda+\mu)\,dt\right].
Proof:

Since MY0TZ0TM\leftrightarrows Y_{0}^{T}\leftrightarrows Z_{0}^{T} forms a Markov chain, the data processing inequality gives I(M;Z0T)I(M;Y0T)<I(M;Z_{0}^{T})\leq I(M;Y_{0}^{T})<\infty. Applying Theorem 1 and using the uniqueness of intensities,

I(M;Y0T)\displaystyle I(M;Y_{0}^{T}) =𝔼[0Tϕ(Γt)ϕ(λ)dt],and\displaystyle=\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})-\phi(\lambda)\,dt\right],\quad\text{and}
I(M;Z0T)\displaystyle I(M;Z_{0}^{T}) =𝔼[0Tϕ(Γ^t)ϕ(λ^t)dt],\displaystyle=\mathbb{E}\left[\int_{0}^{T}\phi(\hat{\Gamma}_{t})-\phi(\hat{\lambda}_{t})\,dt\right], (14)

where Γ^0T\hat{\Gamma}_{0}^{T} and λ^0T\hat{\lambda}_{0}^{T} are the (σ(M;Z0t):t[0,T])(\sigma(M;Z_{0}^{t}):t\in[0,T]) and (tZ:t[0,T])(\mathcal{F}_{t}^{Z}:t\in[0,T])-intensities of Z0TZ_{0}^{T}. Due to the uniqueness of the intensities and Lemma 9, we get for each t[0,T]t\in[0,T], Γ^t=𝔼[Γt|M,Z0t]+μ\hat{\Gamma}_{t}=\mathbb{E}[\Gamma_{t}|M,Z_{0}^{t-}]+\mu, and λ^t=λ+μ\hat{\lambda}_{t}=\lambda+\mu. Substituting this in (14) and applying Jensen’s inequality yields

I(M;Z0T)\displaystyle I(M;Z_{0}^{T}) =𝔼[0Tϕ(𝔼[Γt|M,Z0t]+μ)ϕ(λ+μ)dt],\displaystyle=\mathbb{E}\left[\int_{0}^{T}\phi(\mathbb{E}[\Gamma_{t}|M,Z_{0}^{t-}]+\mu)-\phi(\lambda+\mu)\,dt\right],
𝔼[0Tϕ(Γt+μ)ϕ(λ+μ)dt].\displaystyle\leq\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t}+\mu)-\phi(\lambda+\mu)\,dt\right].

Definition 5

A point process Z0TZ_{0}^{T} is said to be obtained from pp-thinning of a point process Y0TY_{0}^{T}, if each point in Y0TY_{0}^{T} is deleted with probability pp, independent of all other points and deletions.

Lemma 10

Suppose that Y0TY_{0}^{T} is a point process with 𝒢tσ(M,Y0t)\mathcal{G}_{t}\triangleq\sigma(M,Y_{0}^{t})-intensity Γ0T\Gamma_{0}^{T} such that 𝔼[0TΓt𝑑t]<\mathbb{E}[\int_{0}^{T}\Gamma_{t}\,dt]<\infty and Z0TZ_{0}^{T} is obtained from pp-thinning Y0TY_{0}^{T}. Then the (tσ(M,Z0t):t[0,T])(\mathcal{F}_{t}\triangleq\sigma(M,Z_{0}^{t}):t\in[0,T])-intensity of Z0TZ_{0}^{T} is given by Θ0T\Theta_{0}^{T}, where PP-a.s. Θt=(1p)𝔼[Γt|t],t[0,T]\Theta_{t}=(1-p)\mathbb{E}[\Gamma_{t}|\mathcal{F}_{t-}],t\in[0,T].

Proof:

Please see the supplementary material. ∎

The following theorem was first proven by Wang in [44] using a property of a certain “contraction coefficient” used in strong data processing inequalities [51]. Here, we provide a self-contained proof which uses Theorem 1 and Lemma 10.

Theorem 3

Let Y_{0}^{T} be a Poisson process with rate \lambda, and let M be such that I(M;Y_{0}^{T})<\infty. Let Z_{0}^{T} be obtained from p-thinning of Y_{0}^{T}, where the thinning operation is independent of M. Then

I(M;Z0T)(1p)I(M;Y0T).\displaystyle I(M;Z_{0}^{T})\leq(1-p)I(M;Y_{0}^{T}).
Proof:

The data processing inequality gives I(M;Z0T)I(M;Y0T)<I(M;Z_{0}^{T})\leq I(M;Y_{0}^{T})<\infty. Applying Theorem 1,

I(M;Y0T)=𝔼[0Tϕ(Γt)ϕ(λ)dt],\displaystyle I(M;Y_{0}^{T})=\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})-\phi(\lambda)\,dt\right], (15)

and

I(M;Z0T)=𝔼[0Tϕ(Γ^t)ϕ(λ^t)dt],\displaystyle I(M;Z_{0}^{T})=\mathbb{E}\left[\int_{0}^{T}\phi(\hat{\Gamma}_{t})-\phi(\hat{\lambda}_{t})\,dt\right], (16)

where Γ0T\Gamma_{0}^{T} and λ0T\lambda_{0}^{T} (respectively Γ^0T\hat{\Gamma}_{0}^{T} and λ^0T\hat{\lambda}_{0}^{T}) are the (σ(M;Y0t):t[0,T])(\sigma(M;Y_{0}^{t}):t\in[0,T]) and (σ(Y0t):t[0,T])(\sigma(Y_{0}^{t}):t\in[0,T])-intensities (respectively (σ(M;Z0t):t[0,T])(\sigma(M;Z_{0}^{t}):t\in[0,T]) and (σ(Z0t):t[0,T])(\sigma(Z_{0}^{t}):t\in[0,T])-intensities) of Y0TY_{0}^{T} (respectively Z0TZ_{0}^{T}). Due to the uniqueness of the intensities and Lemma 10, we can take for each t[0,T]t\in[0,T],

Γ^t=(1p)𝔼[Γt|M,Z0t],λ^t=(1p)λ.\hat{\Gamma}_{t}=(1-p)\mathbb{E}[\Gamma_{t}|M,Z_{0}^{t-}],\quad\hat{\lambda}_{t}=(1-p)\lambda.

Noting that ϕ((1p)x)=(1p)ϕ(x)+xϕ(1p)\phi((1-p)x)=(1-p)\phi(x)+x\phi(1-p), (16) yields

I(M;Z0T)=\displaystyle I(M;Z_{0}^{T})= (1p)𝔼[0Tϕ(𝔼[Γt|M,Z0t])ϕ(λ)dt]\displaystyle(1-p)\mathbb{E}\left[\int_{0}^{T}\phi(\mathbb{E}[\Gamma_{t}|M,Z_{0}^{t-}])-\phi(\lambda)\,dt\right]
+ϕ(1p)𝔼[0TΓtλdt]\displaystyle+\phi(1-p)\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}-\lambda\,dt\right]
=(a)\displaystyle\overset{(a)}{=} (1p)𝔼[0Tϕ(𝔼[Γt|M,Z0t])ϕ(λ)dt]\displaystyle(1-p)\mathbb{E}\left[\int_{0}^{T}\phi(\mathbb{E}[\Gamma_{t}|M,Z_{0}^{t-}])-\phi(\lambda)\,dt\right]
(b)\displaystyle\overset{(b)}{\leq} (1p)𝔼[0Tϕ(Γt)ϕ(λ)dt]\displaystyle(1-p)\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})-\phi(\lambda)\,dt\right]
=\displaystyle= (1p)I(M;Y0T),\displaystyle(1-p)I(M;Y_{0}^{T}),

where for (a) we have used the fact that 𝔼[0TΓt𝑑t]=𝔼[0T1𝑑Yt]=𝔼[0Tλ𝑑t]\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt\right]=\mathbb{E}\left[\int_{0}^{T}1\,dY_{t}\right]=\mathbb{E}\left[\int_{0}^{T}\lambda\,dt\right], and
for (b) we have used Jensen’s inequality. ∎

We will require the following result [52, Theorem 2.11, p. 106].

Lemma 11

Suppose that Y0TY_{0}^{T} is a Poisson process with rate λ\lambda and Z0TZ_{0}^{T} is obtained from pp-thinning of Y0TY_{0}^{T}. Let

Z^t=YtZtt[0,T].\displaystyle\hat{Z}_{t}=Y_{t}-Z_{t}\quad t\in[0,T].

Then Z^0T\hat{Z}_{0}^{T} and Z0TZ_{0}^{T} are independent Poisson processes with rates pλp\lambda and (1p)λ(1-p)\lambda respectively.
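Lemma 11 is easy to see empirically; a minimal simulation sketch (arbitrary rate, deletion probability, and horizon) compares the empirical means of the retained and deleted counts with (1-p)\lambda T and p\lambda T and checks that the two counts are uncorrelated, which is consistent with (though of course does not prove) independence.

```python
import numpy as np

rng = np.random.default_rng(2)
lam, p, T, trials = 3.0, 0.4, 5.0, 20000

kept = np.empty(trials)
deleted = np.empty(trials)
for i in range(trials):
    n = rng.poisson(lam * T)             # Y_T for a rate-lam Poisson process
    keep = rng.random(n) >= p            # each point deleted w.p. p, independently
    kept[i] = keep.sum()
    deleted[i] = n - kept[i]

print(kept.mean(), (1 - p) * lam * T)    # Z_T has mean (1-p)*lam*T
print(deleted.mean(), p * lam * T)       # (Y - Z)_T has mean p*lam*T
print(np.corrcoef(kept, deleted)[0, 1])  # approximately 0
```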

The following lemma will be used repeatedly in the converse proofs of the rate-distortion function.

Lemma 12

Let a point process Y0TY_{0}^{T} have an (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-intensity Γ0T\Gamma_{0}^{T} such that

𝔼[0Tϕ(Γt)𝑑t]<.\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]<\infty.

Let \hat{Y}_{0}^{T} be a non-negative (\mathcal{F}_{t}:t\in[0,T])-predictable process satisfying \mathbb{E}\left[\int_{0}^{T}\hat{Y}_{t}\,dt\right]<\infty. Then

𝔼[0Tlog(Y^t)𝑑Yt]=𝔼[0Tlog(Y^t)Γt𝑑t].\displaystyle\mathbb{E}\left[\int_{0}^{T}\log(\hat{Y}_{t})\,dY_{t}\right]=\mathbb{E}\left[\int_{0}^{T}\log(\hat{Y}_{t})\Gamma_{t}\,dt\right].
Proof:

Please see the supplementary material. ∎

IV Functional Covering of Point Processes

In this section, we will consider general point processes and obtain the rate-distortion function under the functional-covering distortion when feedforward is present. Stronger results are obtained for Poisson processes in the next sections.

Definition 6

Given a point process y0T𝒩0Ty_{0}^{T}\in\mathcal{N}_{0}^{T}, and a non-negative function y^0T\hat{y}_{0}^{T}, the functional-covering distortion dd is

d(y^0T,y0T)(0Ty^t𝑑tlog(y^t)dyt),\displaystyle d(\hat{y}_{0}^{T},y_{0}^{T})\triangleq\left(\int_{0}^{T}\hat{y}_{t}\,dt-\log(\hat{y}_{t})\,dy_{t}\right),

whenever the expression on the right is well-defined.

We will allow the reconstruction function Y^0T\hat{Y}_{0}^{T} to depend on Y0TY_{0}^{T} as well as the message, constrained via predictability. In particular, we will call Y^0T\hat{Y}_{0}^{T} an allowable reconstruction with feedforward if it is non-negative and (σ(Y0t):t[0,T])(\sigma(Y_{0}^{t}):t\in[0,T])-predictable. Let 𝒴^0,FFT\mathcal{\hat{Y}}_{0,\text{FF}}^{T} denote the set of all y^0T\hat{y}_{0}^{T} processes which are allowable reconstructions with feedforward.

Definition 7

A (T,R,D)(T,R,D) code with feedforward consists of an encoder ff

\displaystyle f:\mathcal{N}_{0}^{T}\rightarrow\{1,\dots,\lceil\exp(RT)\rceil\}

and a decoder gg

g:{1,,exp(RT)}×𝒩0T𝒴^0,FFT\displaystyle g:\{1,\dots,\lceil\exp(RT)\rceil\}\times\mathcal{N}_{0}^{T}\rightarrow\mathcal{\hat{Y}}_{0,\text{FF}}^{T}

satisfying

𝔼[0TY^t𝑑t]<\displaystyle\mathbb{E}\left[\int_{0}^{T}\hat{Y}_{t}\,dt\right]<\infty

and the distortion constraint

𝔼[1Td(Y^0T,Y0T)]D.\displaystyle\mathbb{E}\left[\frac{1}{T}d(\hat{Y}_{0}^{T},Y_{0}^{T})\right]\leq D.

We will call the encoder’s output M=f(Y0T)M=f(Y_{0}^{T}) the message and the decoder’s output Y^0T\hat{Y}_{0}^{T} the reconstruction.

Definition 8

The minimum achievable distortion with feedforward at rate RR and blocklength TT is

DF(R,T)inf{D:there exists a (T,R,D) code with feedforward}.\displaystyle D_{F}^{*}(R,T)\triangleq\inf\{D:\text{there exists a $(T,R,D)$ code with feedforward}\}.
Definition 9

The distortion-rate function with feedforward is

DF(R)lim supTDF(R,T).\displaystyle D_{F}(R)\triangleq\limsup_{T\to\infty}D_{F}^{*}(R,T).

The minimum achievable rate at distortion DD and blocklength TT with feedforward RF(D,T)R_{F}^{*}(D,T) and the rate-distortion function with feedforward RF(D)R_{F}(D) can be defined similarly.

DF(R,T)D_{F}^{*}(R,T) can be characterized via the following theorem for certain point processes.

Theorem 4

Let Y0TY_{0}^{T} be a point process with (tY:t[0,T])(\mathcal{F}_{t}^{Y}:t\in[0,T])-intensity Λ0T\Lambda_{0}^{T} such that

𝔼[0T|ϕ(Λt)|𝑑t]<.\mathbb{E}\left[\int_{0}^{T}|\phi(\Lambda_{t})|\,dt\right]<\infty.

Let

Ξ(Y0T)1T𝔼[0TΛtϕ(Λt)dt],\displaystyle\Xi(Y_{0}^{T})\triangleq\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\Lambda_{t}-\phi(\Lambda_{t})\,dt\right],

and

δTP(YT=0)<1.\delta_{T}\triangleq P(Y_{T}=0)<1.

Then DF(R,T)D_{F}^{*}(R,T) satisfies

Ξ(Y0T)R1TDF(R,T)Ξ(Y0T)(1δT)R+1T.\displaystyle\Xi(Y_{0}^{T})-R-\frac{1}{T}\leq D_{F}^{*}(R,T)\leq\Xi(Y_{0}^{T})-(1-\delta_{T})R+\frac{1}{T}.
Proof:

Achievability:
Recall that since Λ0T\Lambda_{0}^{T} is the (tY:t[0,T])(\mathcal{F}_{t}^{Y}:t\in[0,T])-intensity of Y0TY_{0}^{T}, it is (tY:t[0,T])(\mathcal{F}_{t}^{Y}:t\in[0,T])-predictable, and 𝔼[0T|ϕ(Λt)|𝑑t]<\mathbb{E}[\int_{0}^{T}|\phi(\Lambda_{t})|\,dt]<\infty implies 𝔼[0TΛt𝑑t]<\mathbb{E}[\int_{0}^{T}\Lambda_{t}\,dt]<\infty. If the decoder outputs Λ0T\Lambda_{0}^{T}, this leads to distortion

1T𝔼[d(Λ0T,Y0T)]\displaystyle\frac{1}{T}\mathbb{E}[d(\Lambda_{0}^{T},Y_{0}^{T})] =1T𝔼[0TΛt𝑑tlog(Λt)dYt]\displaystyle=\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\Lambda_{t}\,dt-\log(\Lambda_{t})\,dY_{t}\right]
=1T𝔼[0TΛtϕ(Λt)dt]\displaystyle=\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\Lambda_{t}-\phi(\Lambda_{t})\,dt\right]
=Ξ(Y0T).\displaystyle=\Xi(Y_{0}^{T}).

Thus DF(0,T)Ξ(Y0T)D_{F}^{*}(0,T)\leq\Xi(Y_{0}^{T}), and the upper bound in the statement of the theorem holds at R=0R=0.

Now consider the case R>0. Fix T>0 and let J=\lceil\exp(RT)\rceil. If Y_{T}=0, then the encoder sends the index M=1. Otherwise, let \Theta denote the first arrival instant of the observed point process Y_{0}^{T}. From Lemma 3, we have that P^{Y^{T}_{0}}\ll P_{0}^{Y^{T}_{0}}. Since under P_{0}^{Y^{T}_{0}}, Y_{0}^{T} is a Poisson process with unit rate, it holds that P_{0}^{Y^{T}_{0}}(\Theta=t,Y_{T}>0)=0 for any fixed t\in[0,T]. This gives us P(\Theta=t,Y_{T}>0)=0 for t\in[0,T]. Thus, conditioned on the event Y_{T}>0, \Theta has a continuous distribution function F_{\Theta}. The encoder computes F_{\Theta}(\Theta), which is uniformly distributed over [0,1], and suitably quantizes it to obtain a message M that is uniform over \{2,\dots,J\}. From Theorem 1, there exists a (\sigma(M,Y_{0}^{t}):t\in[0,T])-predictable process \Gamma_{0}^{T} which is the (\sigma(M,Y_{0}^{t}):t\in[0,T])-intensity of Y_{0}^{T}. We note that \mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt\right]=\mathbb{E}\left[\int_{0}^{T}\Lambda_{t}\,dt\right]<\infty, and from Theorem 1, \mathbb{E}\left[\int_{0}^{T}\log(\Gamma_{t})\,dY_{t}\right]<\infty. Hence

\frac{1}{T}\mathbb{E}[d(\Gamma_{0}^{T},Y_{0}^{T})]=\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt-\log(\Gamma_{t})\,dY_{t}\right]

is well-defined. The decoder outputs Γ0T\Gamma_{0}^{T} as its reconstruction. Then we have

1TH(M)\displaystyle\frac{1}{T}H(M) =1T(δTlog(δT)+(1δT)log(1δT))+1δTT(log(J1))\displaystyle=-\frac{1}{T}\left(\delta_{T}\log(\delta_{T})+(1-\delta_{T})\log(1-\delta_{T})\right)+\frac{1-\delta_{T}}{T}(\log(J-1))
(a)1δTT(log(J1))\displaystyle\overset{(a)}{\geq}\frac{1-\delta_{T}}{T}(\log(J-1))
(b)1δTTlog(J/exp(1))\displaystyle\overset{(b)}{\geq}\frac{1-\delta_{T}}{T}\log(J/\exp(1))
(c)(1δT)R1T,\displaystyle\overset{(c)}{\geq}(1-\delta_{T})R-\frac{1}{T}, (17)

where for (a), we have used the bound δTlog(δT)(1δT)log(1δT)0-\delta_{T}\log(\delta_{T})-(1-\delta_{T})\log(1-\delta_{T})\geq 0,
for (b), we have used the inequality J1J/exp(1)J-1\geq J/\exp(1) when J2J\geq 2, and
for (c), we used the fact that RTlog(J)RT\leq\log(J).

H(M)H(M) also satisfies

1TH(M)\displaystyle\frac{1}{T}H(M) =(a)1TI(M;Y0T)\displaystyle\overset{(a)}{=}\frac{1}{T}I(M;Y_{0}^{T})
=(b)1T𝔼[0Tlog(Γt)𝑑Yt]1T𝔼[0Tϕ(Λt)𝑑t],\displaystyle\overset{(b)}{=}\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\log(\Gamma_{t})\,dY_{t}\right]-\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\phi(\Lambda_{t})\,dt\right], (18)

where, for (a) we have used Lemma 2,
for (b) we have used Theorem 1.
The average distortion can be bounded as follows:

\displaystyle\frac{1}{T}\mathbb{E}[d(\Gamma_{0}^{T},Y_{0}^{T})] =\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt-\log(\Gamma_{t})\,dY_{t}\right]
=(a)1T𝔼[0TΓt𝑑t]1T𝔼[0Tlog(Γt)𝑑Yt]\displaystyle\overset{(a)}{=}\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt\right]-\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\log(\Gamma_{t})\,dY_{t}\right]
=(b)1T𝔼[0TΛt𝑑t]1T𝔼[0Tlog(Γt)𝑑Yt]\displaystyle\overset{(b)}{=}\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\Lambda_{t}\,dt\right]-\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\log(\Gamma_{t})\,dY_{t}\right]
=(c)1T𝔼[0TΛt𝑑t]1TH(M)1T𝔼[0Tϕ(Λt)𝑑t]\displaystyle\overset{(c)}{=}\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\Lambda_{t}\,dt\right]-\frac{1}{T}H(M)-\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\phi(\Lambda_{t})\,dt\right]
(d)1T𝔼[0TΛtϕ(Λt)dt](1δT)R+1T\displaystyle\overset{(d)}{\leq}\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\Lambda_{t}-\phi(\Lambda_{t})\,dt\right]-(1-\delta_{T})R+\frac{1}{T}
=Ξ(Y0T)(1δT)R+1T,\displaystyle=\Xi(Y_{0}^{T})-(1-\delta_{T})R+\frac{1}{T},

where, for (a), we have used the fact that 𝔼[0Tlog(Γt)𝑑Yt]<\mathbb{E}\left[\int_{0}^{T}\log(\Gamma_{t})\,dY_{t}\right]<\infty due to Theorem 1,
for (b), we used the equality 𝔼[0TΓt𝑑t]=𝔼[0TΛt𝑑t]\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt\right]=\mathbb{E}\left[\int_{0}^{T}\Lambda_{t}\,dt\right],
for (c), we used (18), and
for (d), we used (17).

Thus we have shown the existence of a (T,R,D)(T,R,D) code with feedforward such that D=Ξ(Y0T)(1δT)R+1TD=\Xi(Y_{0}^{T})-(1-\delta_{T})R+\frac{1}{T}. This gives the upper bound on DF(R,T)D_{F}^{*}(R,T).

Converse:

For the given (T,R,D)(T,R,D) code with feedforward, let J=exp(RT)J=\lceil\exp(RT)\rceil. Then Jexp(RT)+1exp(RT+1)J\leq\exp(RT)+1\leq\exp(RT+1). Thus we have

R+1T1Tlog(J)1TH(M)=(a)1TI(M;Y0T),\displaystyle R+\frac{1}{T}\geq\frac{1}{T}\log(J)\geq\frac{1}{T}H(M)\overset{(a)}{=}\frac{1}{T}I(M;Y_{0}^{T}), (19)

where (a) follows because of Lemma 2.

Since I(M;Y0T)<I(M;Y_{0}^{T})<\infty, we conclude from Theorem 1 that there exists a process Γ0T\Gamma_{0}^{T} such that Γ0T\Gamma_{0}^{T} is the (t=σ(M,Y0t):t[0,T])(\mathcal{F}_{t}=\sigma(M,Y_{0}^{t}):t\in[0,T]) intensity of Y0TY_{0}^{T} and

I(M;Y0T)=𝔼[0Tϕ(Γt)𝑑t]𝔼[0Tϕ(Λt)𝑑t].\displaystyle I(M;Y_{0}^{T})=\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]-\mathbb{E}\left[\int_{0}^{T}\phi(\Lambda_{t})\,dt\right].

Hence from (19)

R1T𝔼[0Tϕ(Γt)𝑑t]1T𝔼[0Tϕ(Λt)𝑑t]1T.\displaystyle R\geq\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]-\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\phi(\Lambda_{t})\,dt\right]-\frac{1}{T}. (20)

Let Y^0T\hat{Y}_{0}^{T} denote the decoder’s output. The distortion constraint DD satisfies

D1T𝔼[d(Y^0T,Y0T)]\displaystyle D\geq\frac{1}{T}\mathbb{E}\left[d(\hat{Y}_{0}^{T},Y_{0}^{T})\right] =1T𝔼[0TY^t𝑑tlog(Y^t)dYt]\displaystyle=\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\hat{Y}_{t}\,dt-\log(\hat{Y}_{t})\,dY_{t}\right]
=1T𝔼[0TY^tlog(Y^t)Γtdt]\displaystyle=\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\hat{Y}_{t}-\log(\hat{Y}_{t})\Gamma_{t}\,dt\right] (21)

where in the last line we have used Lemma 12.

Using the inequality ulog(v)ϕ(u)u+vu\log(v)\leq\phi(u)-u+v, and noting that the individual terms have finite expectations,

𝔼[0Tlog(Y^t)Γt𝑑t]\displaystyle\mathbb{E}\left[\int_{0}^{T}\log(\hat{Y}_{t})\Gamma_{t}\,dt\right] 𝔼[0Tϕ(Γt)Γt+Y^tdt]\displaystyle\leq\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})-\Gamma_{t}+\hat{Y}_{t}\,dt\right]
=𝔼[0Tϕ(Γt)𝑑t]𝔼[0TΓt𝑑t]+𝔼[0TY^t𝑑t].\displaystyle=\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]-\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt\right]+\mathbb{E}\left[\int_{0}^{T}\hat{Y}_{t}\,dt\right]. (22)

From (21) and (20), we deduce

R+D\displaystyle R+D 1T𝔼[0Tϕ(Γt)𝑑t]1T𝔼[0Tϕ(Λt)𝑑t]+1T𝔼[0TY^t𝑑t]\displaystyle\geq\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]-\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\phi(\Lambda_{t})\,dt\right]+\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\hat{Y}_{t}\,dt\right]
1T𝔼[0Tlog(Y^t)𝑑Yt]1T\displaystyle\phantom{====}-\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\log(\hat{Y}_{t})\,dY_{t}\right]-\frac{1}{T}
(a)1T𝔼[0TΓt𝑑t]1T𝔼[0Tϕ(Λt)𝑑t]1T\displaystyle\overset{(a)}{\geq}\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt\right]-\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\phi(\Lambda_{t})\,dt\right]-\frac{1}{T}
(b)1T𝔼[0TΛt𝑑t]1T𝔼[0Tϕ(Λt)𝑑t]1T\displaystyle\overset{(b)}{\geq}\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\Lambda_{t}\,dt\right]-\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\phi(\Lambda_{t})\,dt\right]-\frac{1}{T}
=Ξ(Y0T)1T,\displaystyle\overset{}{=}\Xi(Y_{0}^{T})-\frac{1}{T},

where, for (a) we have used (22), and
for (b) we used the fact that 𝔼[0TΓt𝑑t]=𝔼[0T𝑑Yt]=𝔼[0TΛt𝑑t]\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt\right]=\mathbb{E}\left[\int_{0}^{T}\,dY_{t}\right]=\mathbb{E}\left[\int_{0}^{T}\Lambda_{t}\,dt\right].
Hence we have shown that for any (T,R,D) code with feedforward, D\geq\Xi(Y_{0}^{T})-R-1/T. This gives us the lower bound on D_{F}^{*}(R,T). ∎

Corollary 1

Let Y0TY_{0}^{T} be a point process with (tY:t[0,T])(\mathcal{F}_{t}^{Y}:t\in[0,T])-intensity Λ0T\Lambda_{0}^{T} such that

  • 𝔼[0T|ϕ(Λt)|𝑑t]<\mathbb{E}[\int_{0}^{T}|\phi(\Lambda_{t})|\,dt]<\infty,

  • Ξ¯(Y)lim supT1T𝔼[0TΛtϕ(Λt)dt]\bar{\Xi}(Y)\triangleq\limsup_{T\to\infty}\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\Lambda_{t}-\phi(\Lambda_{t})\,dt\right] is finite.

  • limTP(YT=0)=0\lim_{T\to\infty}P(Y_{T}=0)=0.

Then

DF(R)=Ξ¯(Y)R.\displaystyle D_{F}(R)=\bar{\Xi}(Y)-R.
Proof:

The corollary follows from the definition D_{F}(R)=\limsup_{T\to\infty}D_{F}^{*}(R,T) and from the bounds on D_{F}^{*}(R,T) in Theorem 4. ∎

Remark 1

The above distortion-rate function is reminiscent of the logarithmic-loss distortion-rate function for a DMS. Specifically, for a DMS YY on alphabet 𝒴\mathcal{Y} let the reconstruction be a probability distribution function QQ on 𝒴\mathcal{Y}. The logarithmic loss distortion is defined as dLL(y,Q)log(Q(y))d_{LL}(y,Q)\triangleq-\log(Q(y)) and the distortion-rate function is then given by D(R)=(H(Y)R)+D(R)=(H(Y)-R)^{+} [38].

If the reconstruction \hat{y}_{0}^{T} is assumed to be bounded, then it can be used to define a probability measure on the space of point processes (\mathcal{N}_{0}^{T},\mathfrak{F}^{N}) via the following Radon-Nikodym derivative.

dPy^0TdP0(y0T)=exp(0Tlog(y^t)𝑑yt(y^t1)dt),\frac{dP_{\hat{y}_{0}^{T}}}{dP_{0}}(y_{0}^{T})=\exp\left(\int_{0}^{T}\log(\hat{y}_{t})\,dy_{t}-(\hat{y}_{t}-1)\,dt\right),

where P0P_{0} is the measure under which Y0TY_{0}^{T} is a Poisson process with unit rate. Then the intensity of Y0TY_{0}^{T} under this measure is y^0T\hat{y}_{0}^{T} [39, Chapter VI, Theorems T2-T4] and the functional-covering distortion is related to the above Radon-Nikodym derivative as

d(y^0T,y0T)=log(dPy^0TdP0(y0T))+T.d(\hat{y}_{0}^{T},y_{0}^{T})=-\log\left(\frac{dP_{\hat{y}_{0}^{T}}}{dP_{0}}(y_{0}^{T})\right)+T.

\Diamond

Applying the above corollary to a Poisson process with rate λ>0\lambda>0, we get that DF(R)=λλlog(λ)RD_{F}(R)=\lambda-\lambda\log(\lambda)-R. As we will see in the next section, this distortion-rate function can be achieved without feedforward.

V Constrained Functional-Covering of Poisson Processes

In this and the next section we focus on Poisson processes. Let \hat{\mathcal{Y}}_{0}^{T} denote the set of all functions \hat{y}_{0}^{T} which are non-negative and left-continuous with right limits. We assume that we are given a set \mathcal{A}\subseteq\mathbb{R}_{+} with at least one positive element. We will constrain the reconstruction function \hat{Y}_{0}^{T} to take values in \mathcal{A}, so that \hat{Y}_{t}\in\mathcal{A} for all t\in[0,T].

Definition 10

A (T,R,D)(T,R,D) code consists of an encoder ff

f:𝒩0T{1,,exp(RT)}\displaystyle f:\mathcal{N}_{0}^{T}\rightarrow\{1,\dots,\lceil\exp(RT)\rceil\}

and a decoder gg

g:{1,,exp(RT)}𝒴^0T\displaystyle g:\{1,\dots,\lceil\exp(RT)\rceil\}\rightarrow\hat{\mathcal{Y}}_{0}^{T}

satisfying

Y^t𝒜,𝔼[0TY^t𝑑t]<\displaystyle\hat{Y}_{t}\in\mathcal{A},\,\mathbb{E}\left[\int_{0}^{T}\hat{Y}_{t}\,dt\right]<\infty

and the distortion constraint

1T𝔼[d(Y^0T,Y0T)]D.\displaystyle\frac{1}{T}\mathbb{E}\left[d(\hat{Y}_{0}^{T},Y_{0}^{T})\right]\leq D.

As before, we will call the encoder’s output M=f(Y0T)M=f(Y_{0}^{T}) the message and the decoder’s output Y^0T=g(M)\hat{Y}_{0}^{T}=g(M) the reconstruction.

Definition 11

A rate-distortion vector (R,D)(R,D) is said to be achievable if for any ϵ>0\epsilon>0, there exists a sequence of (Tn,R+ϵ,D+ϵ)(T_{n},R+\epsilon,D+\epsilon) codes such that limnTn=\lim_{n\to\infty}T_{n}=\infty.

Definition 12

The rate-distortion region \mathfrak{RD}^{\mathcal{P}}_{\mathcal{A}} is the set of all achievable rate-distortion vectors (R,D).

The rate-distortion region 𝔇𝒜𝒫,F\mathfrak{RD}^{\mathcal{P},\text{F}}_{\mathcal{A}} with feedforward is defined as in Definitions 11 and 12.

Theorem 5

The rate-distortion region for the constrained functional-covering of a Poisson process with rate λ>0\lambda>0 is given by

𝔇𝒜𝒫=𝔇𝒜𝒫,F=𝔇,\mathfrak{RD}^{\mathcal{P}}_{\mathcal{A}}=\mathfrak{RD}^{\mathcal{P},\text{F}}_{\mathcal{A}}=\mathfrak{RD},

where 𝔇\mathfrak{RD} is the convex hull of the union of sets of rate-distortion vectors (R,D)(R,D) such that

Rλk=14βklog(βkαk)\displaystyle R\geq\lambda\sum_{k=1}^{4}\beta_{k}\log\left(\frac{\beta_{k}}{\alpha_{k}}\right)
Dk=14αkΨ𝒜(λβkαk),\displaystyle D\geq\sum_{k=1}^{4}\alpha_{k}\Psi_{\mathcal{A}}\left(\frac{\lambda\beta_{k}}{\alpha_{k}}\right),

where

Ψ𝒜(u)infv𝒜vulog(v)\displaystyle\Psi_{\mathcal{A}}(u)\triangleq\inf_{v\in\mathcal{A}}v-u\log(v)

with the convention that 0Ψ(0/0)=00\Psi(0/0)=0, and [αk]k=14[\alpha_{k}]_{k=1}^{4} and [βk]k=14[\beta_{k}]_{k=1}^{4} are probability vectors over {1,2,3,4}\{1,2,3,4\} satisfying αk=0βk=0\alpha_{k}=0\Rightarrow\beta_{k}=0.
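The region in Theorem 5 is straightforward to evaluate numerically. The sketch below (with arbitrary choices of \lambda, \mathcal{A}, and the probability vectors) computes \Psi_{\mathcal{A}} over a finite candidate set and evaluates one (R,D) pair. For \mathcal{A}=\{0,1\}, \Psi_{\mathcal{A}}(u)=1 for u>0 and \Psi_{\mathcal{A}}(0)=0, recovering the covering distortion; for \mathcal{A}=(0,\infty), the infimum is attained at v=u, so \Psi_{\mathcal{A}}(u)=u-\phi(u).

```python
import numpy as np

def psi(A, u):
    """Psi_A(u) = inf_{v in A} v - u*log(v), evaluated over a finite candidate set A."""
    A = np.asarray(A, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        vals = A - u * np.log(A)                        # v = 0 contributes +inf when u > 0
    vals = np.where((A == 0) & (u == 0), 0.0, vals)     # v = u = 0 gives 0 (0*log(0) = 0)
    return vals.min()

def rate_distortion_point(lam, A, alpha, beta):
    """One (R, D) pair from Theorem 5; alpha, beta are probability vectors over
    {1,2,3,4} with alpha_k = 0 implying beta_k = 0."""
    R = lam * sum(bk * np.log(bk / ak) for ak, bk in zip(alpha, beta) if bk > 0)
    D = sum(ak * psi(A, lam * bk / ak) for ak, bk in zip(alpha, beta) if ak > 0)
    return R, D

lam = 1.0
alpha = [0.3, 0.3, 0.2, 0.2]
beta = [0.5, 0.3, 0.1, 0.1]
print(rate_distortion_point(lam, [0.0, 1.0], alpha, beta))                    # A = {0, 1}
print(rate_distortion_point(lam, np.linspace(0.01, 5.0, 500), alpha, beta))   # grid ~ (0, inf)
```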

Proof:

Achievability

Let

Rλk=14βklog(βkαk)\displaystyle{R}\triangleq\lambda\sum_{k=1}^{4}\beta_{k}\log\left(\frac{\beta_{k}}{\alpha_{k}}\right)
Dk=14αkΨ𝒜(λβkαk).\displaystyle{D}\triangleq\sum_{k=1}^{4}\alpha_{k}\Psi_{\mathcal{A}}\left(\frac{\lambda\beta_{k}}{\alpha_{k}}\right).

We will show achievability using a (T,{R}+\epsilon,{D}+\epsilon) code without feedforward. We will use discretization and results from the rate-distortion theory for discrete memoryless sources (DMS). Partition [0,T] into n intervals of width \Delta (so that T=n\Delta) and define a binary-valued discrete-time process (\bar{Y}_{j}:j\in\{1,\dots,n\}) as follows. If there are one or more arrivals of the process Y_{0}^{T} in the interval ((j-1)\Delta,j\Delta], then set \bar{Y}_{j} to 1; otherwise set it to zero. Since Y_{0}^{T} is a Poisson process with rate \lambda, the components of (\bar{Y}_{j}:j\in\{1,\dots,n\}) are independent and identically distributed with P(\bar{Y}=1)=1-\exp(-\lambda\Delta). Consider the following “test” channel for k\in\{1,2,3,4\}:

P(U¯=k|Y¯=1)\displaystyle P(\bar{U}=k|\bar{Y}=1) =βk,\displaystyle=\beta_{k},
P(U¯=k|Y¯=0)\displaystyle P(\bar{U}=k|\bar{Y}=0) =αk.\displaystyle=\alpha_{k}.

Define the discretized distortion function

d¯(y¯^,y¯)y¯^log(y¯^)Δ𝟏{y¯=1}y¯^𝒜,y¯{0,1}.\bar{d}(\hat{\bar{y}},\bar{y})\triangleq\hat{\bar{y}}-\frac{\log(\hat{\bar{y}})}{\Delta}\mathbf{1}\{\bar{y}=1\}\quad\hat{\bar{y}}\in\mathcal{A},\bar{y}\in\{0,1\}.

The reconstruction Y¯^(k)\hat{\bar{Y}}(k) is taken as a v𝒜v\in\mathcal{A} satisfying

|Ψ𝒜(λβkαk)(vλβkαklog(v))|ϵ4,\displaystyle\left|\Psi_{\mathcal{A}}\left(\frac{\lambda\beta_{k}}{\alpha_{k}}\right)-\left(v-\frac{\lambda\beta_{k}}{\alpha_{k}}\log(v)\right)\right|\leq\frac{\epsilon}{4}, (23)

where such a vv exists due to the definition of Ψ𝒜\Psi_{\mathcal{A}}. We recall that if αk=0\alpha_{k}=0 then βk=0\beta_{k}=0, and hence P(U¯=k)=0P(\bar{U}=k)=0 for such a kk. The scaling of the mutual information I(U¯;Y¯)I(\bar{U};\bar{Y}) and the distortion function d¯(Y¯^,Y¯)\bar{d}(\hat{\bar{Y}},\bar{Y}) with respect to Δ\Delta is given by the following lemma.

Lemma 13
limΔ0I(U¯;Y¯)Δ\displaystyle\lim_{\Delta\to 0}\frac{I(\bar{U};\bar{Y})}{\Delta} =R\displaystyle={R}
limΔ0𝔼[d¯(Y¯^,Y¯)]\displaystyle\lim_{\Delta\to 0}\mathbb{E}[\bar{d}(\hat{\bar{Y}},\bar{Y})] D+ϵ4\displaystyle\leq{D}+\frac{\epsilon}{4}
Proof:

Please see the supplementary material. ∎

Let

κmaxk{1,2,3,4}Y¯^(k)>0|log(Y¯^(k))|.\displaystyle\kappa\triangleq\max_{\begin{subarray}{c}k\in\{1,2,3,4\}\\ \hat{\bar{Y}}(k)>0\end{subarray}}\left|\log\left(\hat{\bar{Y}}(k)\right)\right|. (24)

Due to [53, Theorem 9.3.2, p. 455], for a given Δ>0\Delta>0, ϵ¯>0\bar{\epsilon}>0, and all sufficiently large nn, there exists an encoder f¯\bar{f} and a decoder g¯\bar{g} such that

f¯:(Y¯j:j{1,,n}){1,,L}\displaystyle\bar{f}:(\bar{Y}_{j}:j\in\{1,\dots,n\})\to\{1,\dots,L\}
g¯:{1,,L}(Y¯^j:j{1,,n})\displaystyle\bar{g}:\{1,\dots,L\}\to(\hat{\bar{Y}}_{j}:j\in\{1,\dots,n\})

satisfying

1nlog(L)I(U¯;Y¯)+ϵ¯,\displaystyle\frac{1}{n}\log(L)\leq I(\bar{U};\bar{Y})+\bar{\epsilon},
𝔼[1nj=1nd¯(Y¯^j,Yj¯)]𝔼[d¯(Y¯^,Y¯)]+ϵ¯.\displaystyle\mathbb{E}\left[\frac{1}{n}\sum_{j=1}^{n}\bar{d}(\hat{\bar{Y}}_{j},\bar{Y_{j}})\right]\leq\mathbb{E}[\bar{d}(\hat{\bar{Y}},\bar{Y})]+\bar{\epsilon}. (25)

Given the above setup, the encoder ff upon observing Y0TY_{0}^{T} obtains the binary valued discrete time process (Y¯j:j{1,,n})(\bar{Y}_{j}:j\in\{1,\dots,n\}), and sends M=f¯(Y¯j:j{1,,n})M=\bar{f}(\bar{Y}_{j}:j\in\{1,\dots,n\}) to the decoder. The decoder outputs the reconstruction Y^0T\hat{Y}_{0}^{T} as

Y^tj=1nY¯^j𝟏{t((j1)Δ,jΔ]}t[0,T].\displaystyle\hat{Y}_{t}\triangleq\sum_{j=1}^{n}\hat{\bar{Y}}_{j}\mathbf{1}\left\{t\in((j-1)\Delta,j\Delta]\right\}\quad t\in[0,T].

Let Y¯¯j\bar{\bar{Y}}_{j} denote the actual number of arrivals of Y0TY_{0}^{T} in an interval ((j1)Δ,jΔ]((j-1)\Delta,j\Delta]. Then d¯\bar{d} is related to the original distortion function via the above reconstruction as follows:

1Td(Y^0T;Y0T)\displaystyle\frac{1}{T}d(\hat{Y}_{0}^{T};Y_{0}^{T}) =1T0TY^t𝑑t1T0Tlog(Y^t)𝑑Yt\displaystyle=\frac{1}{T}\int_{0}^{T}\hat{Y}_{t}\,dt-\frac{1}{T}\int_{0}^{T}\log(\hat{Y}_{t})\,dY_{t}
=1nj=1nY¯^j1Tj=1nlog(Y¯^j)Y¯¯j\displaystyle=\frac{1}{n}\sum_{j=1}^{n}\hat{\bar{Y}}_{j}-\frac{1}{T}\sum_{j=1}^{n}\log(\hat{\bar{Y}}_{j})\bar{\bar{Y}}_{j}
=1nj=1nY¯^j1nΔj=1nlog(Y¯^j)Y¯j1Tj=1nlog(Y¯^j)(Y¯¯j1)𝟏{Y¯¯j>1}.\displaystyle=\frac{1}{n}\sum_{j=1}^{n}\hat{\bar{Y}}_{j}-\frac{1}{n\Delta}\sum_{j=1}^{n}\log(\hat{\bar{Y}}_{j})\bar{Y}_{j}-\frac{1}{T}\sum_{j=1}^{n}\log(\hat{\bar{Y}}_{j})(\bar{\bar{Y}}_{j}-1)\mathbf{1}\{\bar{\bar{Y}}_{j}>1\}.
=1nj=1nd¯(Y¯^j,Y¯j)1Tj=1nlog(Y¯^j)(Y¯¯j1)𝟏{Y¯¯j>1}\displaystyle=\frac{1}{n}\sum_{j=1}^{n}\bar{d}(\hat{\bar{Y}}_{j},\bar{Y}_{j})-\frac{1}{T}\sum_{j=1}^{n}\log(\hat{\bar{Y}}_{j})(\bar{\bar{Y}}_{j}-1)\mathbf{1}\{\bar{\bar{Y}}_{j}>1\}
(a)1nj=1nd¯(Y¯^j,Y¯j)+κTj=1n(Y¯¯j1)𝟏{Y¯¯j>1}\displaystyle\overset{(a)}{\leq}\frac{1}{n}\sum_{j=1}^{n}\bar{d}(\hat{\bar{Y}}_{j},\bar{Y}_{j})+\frac{\kappa}{T}\sum_{j=1}^{n}(\bar{\bar{Y}}_{j}-1)\mathbf{1}\{\bar{\bar{Y}}_{j}>1\}
1nj=1nd¯(Y¯^j,Y¯j)+κTj=1nY¯¯j𝟏{Y¯¯j>1},\displaystyle\leq\frac{1}{n}\sum_{j=1}^{n}\bar{d}(\hat{\bar{Y}}_{j},\bar{Y}_{j})+\frac{\kappa}{T}\sum_{j=1}^{n}\bar{\bar{Y}}_{j}\mathbf{1}\{\bar{\bar{Y}}_{j}>1\},

where for (a), we have used the definition of κ\kappa in (24): if Y¯¯j>1\bar{\bar{Y}}_{j}>1 then Y¯j=1\bar{Y}_{j}=1, and for d¯(Y¯^j,1)<\bar{d}(\hat{\bar{Y}}_{j},1)<\infty we must have Y¯^j>0\hat{\bar{Y}}_{j}>0; the latter holds a.s. because 𝔼[d¯(Y¯^,Y¯)]<\mathbb{E}[\bar{d}(\hat{\bar{Y}},\bar{Y})]<\infty for all sufficiently small Δ\Delta.
Hence taking expectations, we get

𝔼[1Td(Y^0T,Y0T)]\displaystyle\mathbb{E}\left[\frac{1}{T}d(\hat{Y}_{0}^{T},Y_{0}^{T})\right] 𝔼[1nj=1nd¯(Y¯^j,Y¯j)]+κ𝔼[1Tj=1nY¯¯j𝟏{Y¯¯j>1}]\displaystyle\leq\mathbb{E}\left[\frac{1}{n}\sum_{j=1}^{n}\bar{d}(\hat{\bar{Y}}_{j},\bar{Y}_{j})\right]+\kappa\mathbb{E}\left[\frac{1}{T}\sum_{j=1}^{n}\bar{\bar{Y}}_{j}\mathbf{1}\{\bar{\bar{Y}}_{j}>1\}\right]
(a)𝔼[d¯(Y¯^,Y¯)]+κ𝔼[1Tj=1nY¯¯j𝟏{Y¯¯j>1}]+ϵ¯\displaystyle\overset{(a)}{\leq}\mathbb{E}[\bar{d}(\hat{\bar{Y}},\bar{Y})]+\kappa\mathbb{E}\left[\frac{1}{T}\sum_{j=1}^{n}\bar{\bar{Y}}_{j}\mathbf{1}\{\bar{\bar{Y}}_{j}>1\}\right]+\bar{\epsilon}
=(b)𝔼[d¯(Y¯^,Y¯)]+κ(λλexp(λΔ))+ϵ¯\displaystyle\overset{(b)}{=}\mathbb{E}[\bar{d}(\hat{\bar{Y}},\bar{Y})]+\kappa(\lambda-\lambda\exp(-\lambda\Delta))+\bar{\epsilon}
(c)𝔼[d¯(Y¯^,Y¯)]+κλ2Δ+ϵ¯,\displaystyle\overset{(c)}{\leq}\mathbb{E}[\bar{d}(\hat{\bar{Y}},\bar{Y})]+\kappa\lambda^{2}\Delta+\bar{\epsilon}, (26)

where, for (a), we have used (25),
for (b) we note that 𝔼[Y¯¯j𝟏{Y¯¯j>1}]=λΔλΔexp(λΔ)\mathbb{E}[\bar{\bar{Y}}_{j}\mathbf{1}\{\bar{\bar{Y}}_{j}>1\}]=\lambda\Delta-\lambda\Delta\exp(-\lambda\Delta), and
for (c), we have used the inequality 1uexp(u)1-u\leq\exp(-u).
Moreover using (25),

1Tlog(L)=1nΔlog(L)I(U¯;Y¯)Δ+ϵ¯Δ.\displaystyle\frac{1}{T}\log(L)=\frac{1}{n\Delta}\log(L)\leq\frac{I(\bar{U};\bar{Y})}{\Delta}+\frac{\bar{\epsilon}}{\Delta}. (27)

Now given the rate-distortion vector (R,D)({R},{D}) and ϵ>0\epsilon>0, first choose Δ<1\Delta<1 sufficiently small so that

I(U¯;Y¯)Δ\displaystyle\frac{I(\bar{U};\bar{Y})}{\Delta} R+ϵ4\displaystyle\leq{R}+\frac{\epsilon}{4}
𝔼[d¯(Y¯^,Y¯)]\displaystyle\mathbb{E}[\bar{d}(\hat{\bar{Y}},\bar{Y})] D+ϵ2,\displaystyle\leq{D}+\frac{\epsilon}{2},
κλ2Δ\displaystyle\kappa\lambda^{2}\Delta ϵ/2.\displaystyle\leq\epsilon/2.

Then let ϵ¯=Δϵ/4\bar{\epsilon}=\Delta\epsilon/4, and choose a sufficiently large nn so that (25) is satisfied. From (26) and (27) we conclude that a sequence of (Tn,R+ϵ,D+ϵ)(T_{n},{R}+\epsilon,{D}+\epsilon) codes exists with Tn=nΔT_{n}=n\Delta and TnT_{n}\to\infty as nn\to\infty.
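
To make the discretization step concrete, the following Monte-Carlo sketch simulates it for the functional-covering case, in which 𝒜 is the set of all non-negative reals so that the minimizer in (23) is simply λβ_k/α_k. It is only illustrative: the block code (f¯,g¯) is replaced by drawing U¯ directly from the assumed test channel, so only the distortion side of the argument is exercised, and all parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative Monte-Carlo sketch of the discretization step (hypothetical
# parameters; functional-covering case, so the v in (23) is lam*beta_k/alpha_k).
# The block code is replaced by sampling U directly from the test channel,
# so only the distortion behaviour is illustrated, not the coding rate.
lam, T, Delta = 2.0, 200.0, 1e-3
alpha = np.array([0.4, 0.3, 0.2, 0.1])
beta = np.array([0.7, 0.1, 0.1, 0.1])
v = lam * beta / alpha                         # reconstruction levels

n = int(T / Delta)
counts = rng.poisson(lam * Delta, size=n)      # arrivals of Y_0^T per bin
Ybar = (counts > 0).astype(int)                # discretized binary process
U = np.where(Ybar == 1,
             rng.choice(4, size=n, p=beta),    # test channel when Ybar = 1
             rng.choice(4, size=n, p=alpha))   # test channel when Ybar = 0
Yhat = v[U]                                    # piecewise-constant reconstruction

# (1/T) d(Yhat, Y) = (1/T) int Yhat dt - (1/T) int log(Yhat) dY
empirical_distortion = Yhat.mean() - np.sum(np.log(Yhat) * counts) / T
D_target = lam - lam * np.log(lam) - lam * np.sum(beta * np.log(beta / alpha))
print(empirical_distortion, D_target)
```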

Converse

We will prove the converse when feedforward is present. For the given (T,R+ϵ,D+ϵ)(T,R+\epsilon,D+\epsilon) code with feedforward, let MM denote the encoder’s output. Since I(M;Y0T)<I(M;Y_{0}^{T})<\infty, we conclude from Theorem 1 that there exists a process Γ0T\Gamma_{0}^{T} such that Γ0T\Gamma_{0}^{T} is the (t=σ(M,Y0t):t[0,T])(\mathcal{F}_{t}=\sigma(M,Y_{0}^{t}):t\in[0,T]) intensity of Y0TY_{0}^{T} and

I(M;Y0T)=𝔼[0Tϕ(Γt)𝑑t]Tϕ(λ).I(M;Y_{0}^{T})=\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]-T\phi(\lambda). (28)

We also have

1TI(M;Y0T)=1TH(M)1Tlog(exp((R+ϵ)T))R+ϵ+1T.\displaystyle\frac{1}{T}I(M;{Y}^{T}_{0})=\frac{1}{T}H(M)\leq\frac{1}{T}\log\left(\lceil\exp((R+\epsilon)T)\rceil\right)\leq R+\epsilon+\frac{1}{T}.

This gives

R1T𝔼[0Tϕ(Γt)𝑑t]ϕ(λ)ϵ1T.\displaystyle R\geq\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]-\phi(\lambda)-\epsilon-\frac{1}{T}.

Let Y^0T\hat{Y}_{0}^{T} denote the decoder’s output. The distortion constraint DD satisfies

D\displaystyle D 1T𝔼[d(Y^0T,Y0T)]ϵ\displaystyle\geq\frac{1}{T}\mathbb{E}\left[d(\hat{Y}_{0}^{T},Y_{0}^{T})\right]-\epsilon
=1T𝔼[0TY^t𝑑tlog(Y^t)dYt]ϵ\displaystyle=\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\hat{Y}_{t}\,dt-\log(\hat{Y}_{t})\,dY_{t}\right]-\epsilon
=(a)1T𝔼[0TY^tΓtlog(Y^t)dt]ϵ\displaystyle\overset{(a)}{=}\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\hat{Y}_{t}-\Gamma_{t}\log(\hat{Y}_{t})\,\,dt\right]-\epsilon
1T𝔼[0Tinfv𝒜vΓtlog(v)dt]ϵ\displaystyle\geq\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\inf_{v\in\mathcal{A}}v-\Gamma_{t}\log(v)\,dt\right]-\epsilon
=(b)1T𝔼[0TΨ𝒜(Γt)𝑑t]ϵ,\displaystyle\overset{(b)}{=}\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\Psi_{\mathcal{A}}(\Gamma_{t})\,dt\right]-\epsilon, (29)

where, for (a) we have used Lemma 12, and
for (b), we have used the definition of Ψ𝒜\Psi_{\mathcal{A}}. Defining SS to be uniformly distributed on [0,T][0,T] and independent of all other random variables, we have

R\displaystyle R 𝔼[ϕ(ΓS)]ϕ(λ)ϵ1T\displaystyle\geq\mathbb{E}\left[\phi(\Gamma_{S})\right]-\phi(\lambda)-\epsilon-\frac{1}{T} (30)
D\displaystyle D 𝔼[Ψ𝒜(ΓS)]ϵ,\displaystyle\geq\mathbb{E}\left[\Psi_{\mathcal{A}}(\Gamma_{S})\right]-\epsilon, (31)

Now we use Carathéodory’s theorem [54, Theorem 17.1]. There exist non-negative [ηk]k=14[\eta_{k}]_{k=1}^{4} and [αk]k=14[\alpha_{k}]_{k=1}^{4}, such that k=14αk=1\sum_{k=1}^{4}\alpha_{k}=1 and

𝔼[ϕ(ΓS)]\displaystyle\mathbb{E}\left[\phi({\Gamma}_{S})\right] =k=14αkϕ(ηk),\displaystyle=\sum_{k=1}^{4}\alpha_{k}\phi(\eta_{k}), (32)
𝔼[Ψ𝒜(ΓS)]\displaystyle\mathbb{E}\left[\Psi_{\mathcal{A}}({\Gamma}_{S})\right] =k=14αkΨ𝒜(ηk),\displaystyle=\sum_{k=1}^{4}\alpha_{k}\Psi_{\mathcal{A}}(\eta_{k}), (33)
𝔼[ΓS]\displaystyle\mathbb{E}\left[{\Gamma}_{S}\right] =k=14αkηk=λ,\displaystyle=\sum_{k=1}^{4}\alpha_{k}\eta_{k}=\lambda, (34)

where in the last line we have used the fact that since Γ0T{\Gamma}_{0}^{T} is the (σ(M,Y0t):t[0,T])(\sigma(M,{Y}_{0}^{t}):t\in[0,T])-intensity of Y0T{Y}_{0}^{T}, we have 𝔼[0TΓt𝑑t]=𝔼[YT]=Tλ\mathbb{E}\left[\int_{0}^{T}{\Gamma}_{t}\,dt\right]=\mathbb{E}[{Y}_{T}]=T\lambda. Now define

βkαkηkλ.\beta_{k}\triangleq\frac{\alpha_{k}\eta_{k}}{\lambda}.

We note that βk=0\beta_{k}=0 if αk=0\alpha_{k}=0, and k=14βk=1\sum_{k=1}^{4}\beta_{k}=1. Substituting the above definitions in (30)-(31), we obtain

R\displaystyle R (k=14αkηklog(ηk)λlog(λ))ϵ1T\displaystyle\geq\left(\sum_{k=1}^{4}\alpha_{k}\eta_{k}\log(\eta_{k})-\lambda\log(\lambda)\right)-\epsilon-\frac{1}{T}
=λ(k=14βklog(βkλαk)𝟏{αk>0}log(λ))ϵ1T\displaystyle=\lambda\left(\sum_{k=1}^{4}\beta_{k}\log\left(\frac{\beta_{k}\lambda}{\alpha_{k}}\right)\mathbf{1}\{\alpha_{k}>0\}-\log(\lambda)\right)-\epsilon-\frac{1}{T}
=λk=14βklog(βkαk)ϵ1T.\displaystyle=\lambda\sum_{k=1}^{4}\beta_{k}\log\left(\frac{\beta_{k}}{\alpha_{k}}\right)-\epsilon-\frac{1}{T}. (35)

Likewise,

Dk=14αkΨ𝒜(λβkαk)ϵ.\displaystyle D\geq\sum_{k=1}^{4}\alpha_{k}\Psi_{\mathcal{A}}\left(\frac{\lambda\beta_{k}}{\alpha_{k}}\right)-\epsilon.

Since ϵ\epsilon is arbitrary and TT can be made arbitrarily large, we obtain the rate-distortion region in the statement of the theorem. ∎

If we do not place any restrictions on 𝒜\mathcal{A}, i.e., if 𝒜\mathcal{A} is the set of all non-negative reals, then we obtain the functional-covering distortion.

Corollary 2 (Functional Covering of Poisson Processes)

The rate-distortion function for functional-covering distortion is given by RFC(D)=(λλlog(λ)D)+R_{\text{FC}}(D)=(\lambda-\lambda\log(\lambda)-D)^{+}.

Proof:

For the functional-covering distortion, 𝒜\mathcal{A} is the set of non-negative reals. Hence

Ψ𝒜(u)=infv0vulog(v)=uulog(u).\displaystyle\Psi_{\mathcal{A}}(u)=\inf_{v\geq 0}v-u\log(v)=u-u\log(u).

For any achievable (R,D)(R,D) we have

R\displaystyle R λk=14βklog(βkαk),\displaystyle\geq\lambda\sum_{k=1}^{4}\beta_{k}\log\left(\frac{\beta_{k}}{\alpha_{k}}\right), (36)

and

D\displaystyle D k=14αkΨ𝒜(λβkαk)\displaystyle\geq\sum_{k=1}^{4}\alpha_{k}\Psi_{\mathcal{A}}\left(\frac{\lambda\beta_{k}}{\alpha_{k}}\right)
=k=14αk(λβkαkλβkαklog(λβkαk))\displaystyle=\sum_{k=1}^{4}\alpha_{k}\left(\frac{\lambda\beta_{k}}{\alpha_{k}}-\frac{\lambda\beta_{k}}{\alpha_{k}}\log\left(\frac{\lambda\beta_{k}}{\alpha_{k}}\right)\right)
=λλlog(λ)λk=14βklog(βkαk).\displaystyle=\lambda-\lambda\log(\lambda)-\lambda\sum_{k=1}^{4}\beta_{k}\log\left(\frac{\beta_{k}}{\alpha_{k}}\right).

Hence

R+Dλλlog(λ),\displaystyle R+D\geq\lambda-\lambda\log(\lambda),

and this is achieved by [αk]k=14[\alpha_{k}]_{k=1}^{4} and [βk]k=14[\beta_{k}]_{k=1}^{4} that yield equality in (36). ∎
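
The proof above shows that, for any probability vectors, the pair of lower bounds R = λ∑_k β_k log(β_k/α_k) and D = ∑_k α_k Ψ_𝒜(λβ_k/α_k) lies exactly on the line R + D = λ − λ log(λ). The following sketch checks this identity numerically for hypothetical random vectors.

```python
import numpy as np

rng = np.random.default_rng(1)

# Check of the identity in the proof of Corollary 2 for hypothetical random
# probability vectors: with Psi(u) = u - u*log(u),
#   R = lam * sum_k beta_k log(beta_k/alpha_k),
#   D = sum_k alpha_k * Psi(lam*beta_k/alpha_k),
# one always has R + D = lam - lam*log(lam).
lam = 2.0

def psi(u):
    return u - u * np.log(u)

for _ in range(3):
    alpha = rng.dirichlet(np.ones(4))
    beta = rng.dirichlet(np.ones(4))
    R = lam * np.sum(beta * np.log(beta / alpha))
    D = np.sum(alpha * psi(lam * beta / alpha))
    print(R + D, lam - lam * np.log(lam))
```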

If we take 𝒜={0,1}\mathcal{A}=\{0,1\}, then we recover the covering distortion in [35, Theorem 1].

Corollary 3 (Covering Distortion [35])

The rate-distortion function for the covering distortion is given by RC(D)=(λlog(D))+R_{\text{C}}(D)=(-\lambda\log(D))^{+}.

Proof:

For the covering distortion, 𝒜={0,1}\mathcal{A}=\{0,1\}. Hence

Ψ𝒜(u)=infv{0,1}vulog(v)=𝟏{u>0}.\displaystyle\Psi_{\mathcal{A}}(u)=\inf_{v\in\{0,1\}}v-u\log(v)=\mathbf{1}\{u>0\}.

Suppose (R,D)(R,D) is in 𝔇\mathfrak{RD}. Then

D\displaystyle D k=14αkΨ𝒜(λβkαk)\displaystyle\geq\sum_{k=1}^{4}\alpha_{k}\Psi_{\mathcal{A}}\left(\frac{\lambda\beta_{k}}{\alpha_{k}}\right)
=k=14αk𝟏{βk>0}\displaystyle=\sum_{k=1}^{4}\alpha_{k}\mathbf{1}\{\beta_{k}>0\}
=kαk,\displaystyle=\sum_{k\in\mathcal{B}}\alpha_{k},

where we have defined ={k:βk>0}\mathcal{B}=\{k:\beta_{k}>0\}. Similarly,

R\displaystyle R λk=14βklog(βkαk)\displaystyle\geq\lambda\sum_{k=1}^{4}\beta_{k}\log\left(\frac{\beta_{k}}{\alpha_{k}}\right)
=λkβklog(βkαk)\displaystyle=\lambda\sum_{k\in\mathcal{B}}\beta_{k}\log\left(\frac{\beta_{k}}{\alpha_{k}}\right)
(a)λ(kβk)log(kβkkαk)\displaystyle\overset{(a)}{\geq}\lambda\left(\sum_{k\in\mathcal{B}}\beta_{k}\right)\log\left(\frac{\sum_{k\in\mathcal{B}}\beta_{k}}{\sum_{k\in\mathcal{B}}\alpha_{k}}\right)
=λlog(1kαk)\displaystyle=\lambda\log\left(\frac{1}{\sum_{k\in\mathcal{B}}\alpha_{k}}\right)
(λlog(D))+,\displaystyle\geq\left(-\lambda\log(D)\right)^{+},

where (a) is due to the log-sum inequality. Equality throughout can be achieved by setting α1=min(1,D)\alpha_{1}=\min(1,D), α2=1α1\alpha_{2}=1-\alpha_{1}, β1=1\beta_{1}=1, and β2=0\beta_{2}=0. ∎
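
The achieving choice at the end of the proof can be checked numerically; the sketch below (hypothetical values of λ and D) evaluates the rate and distortion bounds for α₁ = min(1, D), β₁ = 1 and compares the rate with (−λ log D)⁺.

```python
import numpy as np

# Sketch of the achieving choice in Corollary 3 (hypothetical lam and D):
# alpha_1 = min(1, D), alpha_2 = 1 - alpha_1, beta_1 = 1, beta_2 = 0.
lam = 2.0
for D in [0.1, 0.5, 1.0]:
    alpha = np.array([min(1.0, D), 1.0 - min(1.0, D), 0.0, 0.0])
    beta = np.array([1.0, 0.0, 0.0, 0.0])
    support = beta > 0                               # the set B = {k : beta_k > 0}
    R = lam * np.sum(beta[support] * np.log(beta[support] / alpha[support]))
    distortion_bound = np.sum(alpha[support])        # sum of alpha_k over B
    print(D, R, max(-lam * np.log(D), 0.0), distortion_bound)
```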

Remark 2

As in the general case in Theorem 4 (see Remark 1), the reconstruction y^0T\hat{y}_{0}^{T} (assuming it is bounded) can be used to define a probability measure on the input space (𝒩0T,𝔉N)(\mathcal{N}_{0}^{T},\mathfrak{F}^{N}) via

dPy^0TdP0(y0T)=exp(0Tlog(y^t)𝑑yt(y^t1)dt),\frac{dP_{\hat{y}_{0}^{T}}}{dP_{0}}(y_{0}^{T})=\exp\left(\int_{0}^{T}\log(\hat{y}_{t})\,dy_{t}-(\hat{y}_{t}-1)\,dt\right),

where P0P_{0} is the measure under which Y0TY_{0}^{T} is a Poisson process with unit rate. Moreover, in the absence of feedforward, y^0T\hat{y}_{0}^{T} is deterministic (it depends only on the encoder’s output). Thus the input point process Y0TY_{0}^{T} is a non-homogeneous Poisson process with rate y^0T\hat{y}_{0}^{T} under Py^0TP_{\hat{y}_{0}^{T}}. As in the general case, the functional-covering distortion is related to the above Radon-Nikodym derivative via

d(y^0T,y0T)=log(dPy^0TdP0(y0T))+Td(\hat{y}_{0}^{T},y_{0}^{T})=-\log\left(\frac{dP_{\hat{y}_{0}^{T}}}{dP_{0}}(y_{0}^{T})\right)+T

\Diamond
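
As a small numerical illustration of the relation in Remark 2, the following sketch evaluates both sides for a piecewise-constant reconstruction y^₀ᵀ and one simulated realization of the arrival times; the breakpoints and levels are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Numeric illustration of Remark 2 (hypothetical piecewise-constant yhat):
# d(yhat, y) should equal -log(dP_yhat/dP0 (y)) + T.
T = 10.0
arrivals = np.sort(rng.uniform(0.0, T, size=rng.poisson(2.0 * T)))
edges = np.array([0.0, 3.0, 7.0, T])           # breakpoints of yhat
levels = np.array([1.5, 0.5, 2.0])             # value of yhat on each piece

def int_dt(f_levels):
    # integral over [0, T] of a piecewise-constant function
    return np.sum(f_levels * np.diff(edges))

def int_dY(f_levels):
    # sum of the function values at the arrival times (integral against dY)
    idx = np.clip(np.searchsorted(edges, arrivals, side="left") - 1,
                  0, len(levels) - 1)
    return np.sum(f_levels[idx])

d = int_dt(levels) - int_dY(np.log(levels))
log_rn = int_dY(np.log(levels)) - int_dt(levels - 1.0)   # log of dP_yhat/dP0
print(d, -log_rn + T)                                    # the two agree
```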

VI The Poisson CEO Problem

Figure 1: Poisson CEO Problem.

We now consider the distributed problem shown in Figure 1. Our goal is to compress Y0TY_{0}^{T}, which is a Poisson process with rate λ>0\lambda>0. Each of the two encoders observes a degraded version of Y0TY_{0}^{T}, denoted by Y0(i),TY_{0}^{(i),T}, i{1,2}i\in\{1,2\}. Y0TY_{0}^{T} is first p(i)p^{(i)}-thinned to obtain Y~0(i),T\tilde{Y}_{0}^{(i),T}, and then an independent Poisson process N0(i),TN_{0}^{(i),T} with rate μ(i)\mu^{(i)} is added to Y~0(i),T\tilde{Y}_{0}^{(i),T} to obtain Y0(i),TY_{0}^{(i),T}.

Recall that 𝒴^0T\hat{\mathcal{Y}}_{0}^{T} is the set of all non-negative functions y^0T\hat{y}_{0}^{T} which are left-continuous with right-limits, and

d(y^0T,y0T)=0Ty^t𝑑tlog(y^t)dyt.d(\hat{y}_{0}^{T},y_{0}^{T})=\int_{0}^{T}\hat{y}_{t}\,dt-\log(\hat{y}_{t})\,dy_{t}.
Definition 13

A (T,R(1),R(2),D)(T,R^{(1)},R^{(2)},D) code for the Poisson CEO problem consists of encoders f(1)f^{(1)} and f(2)f^{(2)},

f(1):𝒩0T{1,,exp(R(1)T)}\displaystyle f^{(1)}:\mathcal{N}_{0}^{T}\rightarrow\{1,\dots,\lceil\exp(R^{(1)}T)\rceil\}
f(2):𝒩0T{1,,exp(R(2)T)},\displaystyle f^{(2)}:\mathcal{N}_{0}^{T}\rightarrow\{1,\dots,\lceil\exp(R^{(2)}T)\rceil\},

and a decoder gg,

g:{1,,exp(R(1)T)}×{1,,exp(R(2)T)}𝒴^0T,\displaystyle g:\{1,\dots,\lceil\exp(R^{(1)}T)\rceil\}\times\{1,\dots,\lceil\exp(R^{(2)}T)\rceil\}\rightarrow\hat{\mathcal{Y}}_{0}^{T},

satisfying

𝔼[0TY^t𝑑t]<,\displaystyle\mathbb{E}\left[\int_{0}^{T}\hat{Y}_{t}\,dt\right]<\infty,

and the distortion constraint

1T𝔼[d(Y^0T,Y0T)]D.\displaystyle\frac{1}{T}\mathbb{E}\left[d(\hat{Y}_{0}^{T},Y_{0}^{T})\right]\leq D.
Definition 14

A rate-distortion vector (R(1),R(2),D)(R^{(1)},R^{(2)},D) is said to be achievable for the Poisson CEO problem if for any ϵ>0\epsilon>0, there exists a sequence of (Tn,R(1)+ϵ,R(2)+ϵ,D+ϵ)(T_{n},R^{(1)}+\epsilon,R^{(2)}+\epsilon,D+\epsilon) codes with TnT_{n}\to\infty.

Definition 15

The rate-distortion region for the Poisson CEO problem 𝔇𝒫\mathfrak{RD}^{\mathcal{P}} is the set of all achievable rate-distortion vectors (R(1),R(2),D)(R^{(1)},R^{(2)},D).

The rate-distortion region for the Poisson CEO problem with feedforward, denoted by 𝔇F𝒫\mathfrak{RD}^{\mathcal{P}}_{F}, is defined analogously.

Theorem 6

The rate-distortion region for the Poisson CEO problem is given by

𝔇𝒫=𝔇F𝒫=𝔇,\mathfrak{RD}^{\mathcal{P}}=\mathfrak{RD}^{\mathcal{P}}_{F}=\mathfrak{RD},

where 𝔇\mathfrak{RD} is the convex hull of the union of sets of rate-distortion vectors (R(1),R(2),D)(R^{(1)},R^{(2)},D) such that

R(1)((1p(1))λ+μ(1))k=14βk(1)log(βk(1)αk(1)),\displaystyle R^{(1)}\geq\left((1-p^{(1)})\lambda+\mu^{(1)}\right)\sum_{k=1}^{4}\beta^{(1)}_{k}\log\left(\frac{\beta^{(1)}_{k}}{\alpha^{(1)}_{k}}\right),
R(2)((1p(2))λ+μ(2))k=14βk(2)log(βk(2)αk(2)),\displaystyle R^{(2)}\geq\left((1-p^{(2)})\lambda+\mu^{(2)}\right)\sum_{k=1}^{4}\beta^{(2)}_{k}\log\left(\frac{\beta^{(2)}_{k}}{\alpha^{(2)}_{k}}\right),
Dλϕ(λ)λ(k=14γk(1)log(γk(1)αk(1))+k=14γk(2)log(γk(2)αk(2)))\displaystyle D\geq\lambda-\phi(\lambda)-\lambda\left(\sum_{k=1}^{4}\gamma^{(1)}_{k}\log\left(\frac{\gamma^{(1)}_{k}}{\alpha^{(1)}_{k}}\right)+\sum_{k=1}^{4}\gamma^{(2)}_{k}\log\left(\frac{\gamma^{(2)}_{k}}{\alpha^{(2)}_{k}}\right)\right)

for some probability vectors [αk(i)]k=14[\alpha^{(i)}_{k}]_{k=1}^{4}, [βk(i)]k=14[\beta^{(i)}_{k}]_{k=1}^{4}, and [γk(i)]k=14[\gamma^{(i)}_{k}]_{k=1}^{4}, where for k{1,2,3,4}k\in\{1,2,3,4\} and i{1,2}i\in\{1,2\}

γk(i)=p(i)αk(i)+(1p(i))βk(i)αk(i)=0βk(i)=0}if p(i)<1,\displaystyle\begin{rcases}\gamma^{(i)}_{k}=p^{(i)}\alpha^{(i)}_{k}+(1-p^{(i)})\beta^{(i)}_{k}\\ \alpha^{(i)}_{k}=0\Rightarrow\beta^{(i)}_{k}=0\end{rcases}\quad\text{if }p^{(i)}<1,
αk(i)=βk(i)=γk(i)if p(i)=1.\displaystyle\alpha^{(i)}_{k}=\beta^{(i)}_{k}=\gamma^{(i)}_{k}\quad\qquad\qquad\quad\quad\,\,\,\text{if }p^{(i)}=1.
Proof:

Please see the supplementary material. ∎

Remark 3

Note that there is no sum-rate constraint in the rate-distortion region of the above theorem. This occurs due to the sparsity of points in a Poisson process. After discretizing a Poisson process with rate λ\lambda, the expected number of ones in the resulting binary process is roughly λT\lambda T, and the remaining T/ΔλTT/\Delta-\lambda T bits are zeroes. When such a sparse binary process is sent via two independent parallel channels as in (46)-(47), the resulting output processes are almost independent. This implies that the encoders do not need to bin their messages in the achievability argument.

Corollary 4 (Poisson CEO Problem without Thinning)

If p(1)=p(2)=0p^{(1)}=p^{(2)}=0, then the rate-distortion region in Theorem 6 takes a simple form

λλ+μ(1)R(1)+λλ+μ(2)R(2)+Dλϕ(λ).\frac{\lambda}{\lambda+\mu^{(1)}}R^{(1)}+\frac{\lambda}{\lambda+\mu^{(2)}}R^{(2)}+D\geq\lambda-\phi(\lambda).
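
For concreteness, the half-space of Corollary 4 can be evaluated directly; the sketch below (hypothetical λ, μ⁽¹⁾, μ⁽²⁾) computes the smallest distortion on the boundary of this half-space for a few rate pairs.

```python
import numpy as np

# Boundary of the half-space in Corollary 4 (hypothetical lam, mu1, mu2):
#   lam/(lam+mu1)*R1 + lam/(lam+mu2)*R2 + D >= lam - phi(lam),  phi(u) = u*log(u).
lam, mu1, mu2 = 2.0, 1.0, 0.5

def boundary_distortion(R1, R2):
    return (lam - lam * np.log(lam)
            - lam / (lam + mu1) * R1
            - lam / (lam + mu2) * R2)

for R1, R2 in [(0.0, 0.0), (0.3, 0.0), (0.3, 0.3)]:
    print(R1, R2, boundary_distortion(R1, R2))
```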
Corollary 5 (Remote Poisson Source)

Consider a scenario where an encoder wishes to compress a Poisson process with rate λ>0\lambda>0, but observes a degraded version of it, in which each point is first erased independently with probability pp and then an independent Poisson process with rate μ\mu is added. Then the rate-distortion region is the convex hull of the union of all rate-distortion vectors (R,D)(R,D) satisfying

R((1p)λ+μ)k=14βklog(βkαk),\displaystyle R\geq\left((1-p)\lambda+\mu\right)\sum_{k=1}^{4}\beta_{k}\log\left(\frac{\beta_{k}}{\alpha_{k}}\right),
Dλϕ(λ)λk=14γklog(γkαk),\displaystyle D\geq\lambda-\phi(\lambda)-\lambda\cdot\sum_{k=1}^{4}\gamma_{k}\log\left(\frac{\gamma_{k}}{\alpha_{k}}\right),

for some probability vectors [αk]k=14[\alpha_{k}]_{k=1}^{4}, [βk]k=14[\beta_{k}]_{k=1}^{4}, and [γk]k=14[\gamma_{k}]_{k=1}^{4}, where for k{1,2,3,4}k\in\{1,2,3,4\}

γk=pαk+(1p)βk,αk=0βk=0.\displaystyle\gamma_{k}=p\alpha_{k}+(1-p)\beta_{k},\quad\alpha_{k}=0\Rightarrow\beta_{k}=0.
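
The region in Corollary 5 can be explored numerically by sampling probability vectors; the rough sketch below (hypothetical λ, p, μ; a crude random search rather than an exact characterization of the frontier) records the smallest sampled distortion compatible with a given rate budget.

```python
import numpy as np

rng = np.random.default_rng(3)

# Crude random search over the region of Corollary 5 (hypothetical lam, p, mu);
# this only samples points of the region, it does not compute the exact frontier.
lam, p, mu = 2.0, 0.3, 0.5
pairs = []
for _ in range(20000):
    alpha = rng.dirichlet(np.ones(4))
    beta = rng.dirichlet(np.ones(4))
    gamma = p * alpha + (1.0 - p) * beta
    R = ((1.0 - p) * lam + mu) * np.sum(beta * np.log(beta / alpha))
    D = lam - lam * np.log(lam) - lam * np.sum(gamma * np.log(gamma / alpha))
    pairs.append((R, D))
pairs = np.array(pairs)

for R_budget in [0.5, 1.0, 2.0]:
    feasible = pairs[pairs[:, 0] <= R_budget]
    if feasible.size:
        print(R_budget, feasible[:, 1].min())
```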

References

  • [1] N. V. Shende and A. B. Wagner, “Functional covering of point processes,” in IEEE Int. Symp. Info. Theory, 2019, pp. 2039–2043.
  • [2] T. Berger, Rate Distortion Theory: A Mathematical Basis for Data Compression.   Englewood Cliffs, NJ: Prentice Hall, 1971.
  • [3] D. H. Johnson, “Point process models of single-neuron discharges,” Journal of Computational Neuroscience, vol. 3, no. 4, pp. 275–299, 1996.
  • [4] J. H. Goldwyn, J. T. Rubinstein, and E. Shea-Brown, “A point process framework for modeling electrical stimulation of the auditory nerve,” Journal of Neurophysiology, vol. 108, no. 5, pp. 1430–1452, 2012.
  • [5] F. Farkhooi, M. F. Strube-Bloss, and M. P. Nawrot, “Serial correlation in neural spike trains: Experimental evidence, stochastic modeling, and single neuron variability,” Physical Review E, vol. 79, no. 2, p. 021905, 2009.
  • [6] S. V. Sarma, U. T. Eden, M. L. Cheng, Z. M. Williams, R. Hu, E. Eskandar, and E. N. Brown, “Using point process models to compare neural spiking activity in the subthalamic nucleus of Parkinson’s patients and a healthy primate,” IEEE Transactions on Biomedical Engineering, vol. 57, no. 6, pp. 1297–1305, 2010.
  • [7] E. N. Brown, R. E. Kass, and P. P. Mitra, “Multiple neural spike train data analysis: state-of-the-art and future challenges,” Nature Neuroscience, vol. 7, no. 5, pp. 456–461, 2004.
  • [8] F. Rieke, D. Warland, R. de Ruyter van Steveninck, and W. Bialek, Spikes: Exploring the Neural Code.   MIT Press, 1997.
  • [9] J. Giles and B. Hajek, “An information-theoretic and game-theoretic study of timing channels,” IEEE Trans. on Inf. Theory, vol. 48, no. 9, pp. 2455–2477, 2002.
  • [10] M. Shahzad and A. X. Liu, “Accurate and efficient per-flow latency measurement without probing and time stamping,” IEEE/ACM Trans. Networking, vol. 24, no. 6, pp. 3477–3492, 2016.
  • [11] Y. Zhu, X. Fu, B. Graham, R. Bettati, and W. Zhao, “On flow correlation attacks and countermeasures in mix networks,” in Proc. 4th Privacy Enhancement Technology Workshop (PET), 2004.
  • [12] C. B. Attila Börcs, “A marked point process model for vehicle detection in aerial LIDAR point clouds,” in ISPRS Ann. Photogrammetry, Remote Sens. and Spatial Inf. Sci, vol. 1-3, 2012, pp. 93–98.
  • [13] Y. Yu, J. Li, H. Guan, C. Wang, and M. Cheng, “A marked point process for automated tree detection from mobile laser scanning point cloud data,” in 2012 Intl. Conf. Comp. Vision in Remote Sensing, 2012, pp. 140–145.
  • [14] S. Nakamoto, “Bitcoin: A peer-to-peer electronic cash system,” 2008. [Online]. Available: bitcoin.org/bitcoin
  • [15] Y. Lewenberg, Y. Bachrach, Y. Sompolinsky, A. Zohar, and J. S. Rosenschein, “Bitcoin mining pools: A cooperative game theoretic analysis,” in Proc. 2015 Int. Conf. Autonomous Agents and Multiagent Sys., 2015, p. 919–927.
  • [16] Y. Kawase and S. Kasahara, “Transaction-confirmation time for bitcoin: A queueing analytical approach to blockchain mechanism,” in Intl. Conf. on Queueing Theory and Network App., 2017, p. 75–88.
  • [17] C. Decker and R. Wattenhofer, “Information propagation in the bitcoin network,” in IEEE P2P 2013 Proc., 2013, p. 1–10.
  • [18] A. Laourine and A. B. Wagner, “Secrecy capacity of the degraded Poisson wiretap channel,” in Proc. IEEE Intl. Symp. Inf. Theory, Jun. 2010, pp. 2553–2557.
  • [19] A. D. Wyner, “Capacity and error exponent for the direct detection photon channel—Part I,” IEEE Trans. Inf. Theory, vol. 34, no. 6, pp. 1449–1461, Nov. 1988.
  • [20] ——, “Capacity and error exponent for the direct detection photon channel—Part II,” IEEE Trans. Inf. Theory, vol. 34, no. 6, pp. 1462–1471, Nov. 1988.
  • [21] A. Lapidoth, “On the reliability function of the ideal Poisson channel with noiseless feedback,” IEEE Trans. Inf. Theory, vol. 39, no. 2, pp. 491–503, Mar. 1993.
  • [22] N. Shende and A. B. Wagner, “The stochastic-calculus approach to multiple-decoder Poisson channels,” IEEE Trans. Inf. Theory, vol. 65, no. 8, pp. 5007–5027, Aug. 2019.
  • [23] F. Baccelli and P. Brémaud, Palm Probabilities and Stationary Queues.   Springer-Verlag, 1987.
  • [24] C. Sutardja and J. M. Rabaey, “Isolator-less near-field RFID reader for sub-cranial powering/data link of millimeter-sized implants,” IEEE Journal of Solid-State Circuits, vol. 53, no. 7, pp. 2032–2042, 2018.
  • [25] A. K. Skrivervik, A. J. M. Montes, I. V. Trivino, M. Bosiljevac, M. Veljovic, and Z. Sipus, “Antenna design for a cranial implant,” in 2020 Intl. Work. Antenna Tech. (iWAT), 2020, pp. 1–4.
  • [26] R. L. de Queiroz and P. A. Chou, “Compression of 3D point clouds using a region-adaptive hierarchical transform,” IEEE Trans. on Image Proc., vol. 25, no. 8, pp. 3947–3956, 2016.
  • [27] T. Golla and R. Klein, “Real-time point cloud compression,” in 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2015, pp. 5087–5092.
  • [28] C. Tu, E. Takeuchi, C. Miyajima, and K. Takeda, “Compressing continuous point cloud data using image compression methods,” in 2016 IEEE 19th Intl. Conf. Intell. Transportation Sys. (ITSC), 2016, pp. 1712–1719.
  • [29] R. Gallager, “Basic limits on protocol information in data communication networks,” IEEE Trans. Info. Theory, vol. 22, no. 4, pp. 385–398, 1976.
  • [30] A. S. Bedekar, “On the information about message arrival times required for in-order decoding,” in IEEE Int. Symp. Inf. Theory, June 2001, p. 227.
  • [31] S. Verdú, “The exponential distribution in information theory.” Probl. Inf. Transm., vol. 32, no. 1, pp. 86–95, 1996.
  • [32] T. P. Coleman, N. Kiyavash, and V. G. Subramanian, “The rate-distortion function of a Poisson process with a queueing distortion measure,” in Data Compress. Conf, Mar 2008, pp. 63–72.
  • [33] I. Rubin, “Information rates and data-compression schemes for Poisson processes,” IEEE Trans. Info. Theory, vol. 20, no. 2, pp. 200–210, 1974.
  • [34] G. Koliander, D. Schuhmacher, and F. Hlawatsch, “Rate-distortion theory of finite point processes,” IEEE Trans. Info. Theory, vol. 64, no. 8, pp. 5832–5861, 2018.
  • [35] A. Lapidoth, A. Malar, and L. Wang, “Covering point patterns,” IEEE Trans. Info. Theory, vol. 61, no. 9, pp. 4521–4533, 2015.
  • [36] H.-A. Shen, S. M. Moser, and J.-P. Pfister, “Rate-distortion problems of the Poisson process based on a group-theoretic approach,” 2022. [Online]. Available: https://arxiv.org/abs/2202.13684
  • [37] T. A. Courtade and R. D. Wesel, “Multiterminal source coding with an entropy-based distortion measure,” in IEEE Int. Symp. Info. Theory, Jul 2011, pp. 2040–2044.
  • [38] T. A. Courtade and T. Weissman, “Multiterminal source coding under logarithmic loss,” IEEE Trans. Info. Theory, vol. 60, no. 1, pp. 740–761, 2014.
  • [39] P. Brémaud, Point Processes and Queues: Martingale Dynamics.   Springer-Verlag, 1981.
  • [40] Y. Kabanov, “The capacity of a channel of the Poisson type,” Theory of Probability and Applications, vol. 23, pp. 143–147, 1978.
  • [41] M. Davis, “Capacity and cutoff rate for Poisson-type channels,” IEEE Trans. Info. Theory, vol. 26, no. 6, pp. 710–715, Nov 1980.
  • [42] N. V. Shende and A. B. Wagner, “The stochastic-calculus approach to multi-receiver Poisson channels,” IEEE Trans. Info. Theory, vol. 65, no. 8, pp. 5007–5027, Aug 2019.
  • [43] R. Sundaresan and S. Verdú, “Capacity of queues via point-process channels,” IEEE Trans. Info. Theory, vol. 52, no. 6, pp. 2697–2709, June 2006.
  • [44] L. Wang, “A strong data processing inequality for thinning Poisson processes and some applications,” in IEEE Int. Symp. Info. Theory, June 2017, pp. 3180–3184.
  • [45] A. Klenke, Probability Theory: A Comprehensive Course, 2nd ed.   Springer London, 2013.
  • [46] A. Wyner, “A definition of conditional mutual information for arbitrary ensembles,” Information and Control, vol. 38, no. 1, pp. 51 – 59, 1978.
  • [47] I. M. Gel’fand and A. M. Yaglom, “Computation of the amount of information about a stochastic function contained in another such function,” Uspekhi Mat. Nauk, vol. 12, no. 1, pp. 3–52, 1957.
  • [48] R. M. Gray, Entropy and Information Theory.   Springer-Verlag, 1990.
  • [49] O. Kallenberg, Foundations of Modern Probability, 2nd ed.   Springer-Verlag, New York, 2002.
  • [50] R. S. Liptser and A. N. Shiryaev, Statistics of Random Processes II, 2nd ed.   Springer-Verlag Berlin Heidelberg, 2001.
  • [51] Y. Polyanskiy and Y. Wu, “Strong data-processing inequalities for channels and Bayesian networks,” in Convexity and Concentration.   New York, NY: Springer New York, 2017, pp. 211–249.
  • [52] R. Durrett, Essentials of Stochastic Processes, ser. Springer Texts in Statistics.   Springer International Publishing, 2016.
  • [53] R. G. Gallager, Information Theory and Reliable Communication.   New York, NY, USA: John Wiley & Sons, Inc., 1968.
  • [54] R. Rockafellar, Convex Analysis.   Princeton University Press, 1997.
  • [55] C. Dellacherie and P. A. Meyer, Probabilities and Potential B: Theory of Martingales, ser. North-Holland Mathematics Studies.   North-Holland, 1982, vol. 72.
  • [56] A. El Gamal and Y.-H. Kim, Network Information Theory.   Cambridge University Press, 2011.
Proof:

The first part of the lemma is due to [39, T12 Theorem, Chapter VI, p. 187]. To prove the second part we note that 𝔼PY0T[0T|ϕ(Γt)|𝑑t]<\mathbb{E}_{P^{Y_{0}^{T}}}[\int_{0}^{T}|\phi(\Gamma_{t})|\,dt]<\infty implies 𝔼PY0T[0TΓt𝑑t]<\mathbb{E}_{P^{Y_{0}^{T}}}[\int_{0}^{T}\Gamma_{t}\,dt]<\infty, which in turn gives

\int_{0}^{T}(1-\sqrt{\Gamma_{t}})^{2}\,dt\leq\int_{0}^{T}(\Gamma_{t}+1)\,dt<\infty,

PY0T{P^{Y_{0}^{T}}}-a.s. Thus applying [50, Theorem 19.7, p. 343], we conclude that PY0TP0Y0TP^{Y_{0}^{T}}\ll P_{0}^{Y_{0}^{T}}. Hence, from the first part of the lemma

dPY0TdP0Y0T=exp(0Tlog(Λt)𝑑YtΛt+1dt),\displaystyle\frac{dP^{Y_{0}^{T}}}{dP_{0}^{Y_{0}^{T}}}=\exp\left(\int_{0}^{T}\log(\Lambda_{t})\,dY_{t}-\Lambda_{t}+1\,dt\right),

where the uniqueness of intensity [39, T12 Theorem, Chapter II, p. 31] gives us

𝔼PY0T[0T|ΓtΛt|𝑑t]=0,𝔼PY0T[0T𝟏{ΓtΛt}𝑑Yt]=0.\displaystyle\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}|\Gamma_{t}-\Lambda_{t}|\,dt\right]=0,\quad\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}\mathbf{1}\{\Gamma_{t}\neq\Lambda_{t}\}\,dY_{t}\right]=0.

Since

𝔼PY0T[0T|ϕ(Γt)|𝑑t]<,\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}|\phi(\Gamma_{t})|\,dt\right]<\infty,

we have

𝔼PY0T[0T(log(Γt))+𝑑Yt]=𝔼PY0T[0T(log(Γt))+Γt𝑑t]=𝔼PY0T[0T(ϕ(Γt))+𝑑t]<,\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}(\log(\Gamma_{t}))^{+}\,dY_{t}\right]=\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}(\log(\Gamma_{t}))^{+}\Gamma_{t}\,dt\right]=\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}(\phi(\Gamma_{t}))^{+}\,dt\right]<\infty,

and

𝔼PY0T[0T(log(Γt))𝑑Yt]=𝔼PY0T[0T(log(Γt))Γt𝑑t]=𝔼PY0T[0T(ϕ(Γt))𝑑t]<.\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}(\log(\Gamma_{t}))^{-}\,dY_{t}\right]=\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}(\log(\Gamma_{t}))^{-}\Gamma_{t}\,dt\right]=\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}(\phi(\Gamma_{t}))^{-}\,dt\right]<\infty.

Hence

𝔼PY0T[0Tlog(Γt)𝑑Yt]=𝔼PY0T[0Tϕ(Γt)𝑑t]<.\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}\log(\Gamma_{t})\,dY_{t}\right]=\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]<\infty.

Finally,

𝔼PY0T[log(dPY0TdP0Y0T)]\displaystyle\mathbb{E}_{P^{Y_{0}^{T}}}\left[\log\left(\frac{dP^{Y_{0}^{T}}}{dP_{0}^{Y_{0}^{T}}}\right)\right] =𝔼PY0T[0Tlog(Λt)𝑑Yt+Λt1dt]\displaystyle=\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}\log(\Lambda_{t})\,dY_{t}+\Lambda_{t}-1\,dt\right]
=(𝐚)𝔼PY0T[0Tlog(Γt)𝑑Yt+Γt1dt]\displaystyle\mathbf{\overset{(a)}{=}}\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}\log(\Gamma_{t})\,dY_{t}+\Gamma_{t}-1\,dt\right]
=𝔼PY0T[0Tlog(Γt)𝑑Yt]𝔼[0TΓt1dt]\displaystyle=\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}\log(\Gamma_{t})\,dY_{t}\right]-\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}-1\,dt\right]
=𝔼PY0T[0Tϕ(Γt)𝑑t]𝔼[0TΓt1dt]\displaystyle=\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]-\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}-1\,dt\right]
=𝔼PY0T[0Tϕ(Γt)Γt+1dt].\displaystyle=\mathbb{E}_{P^{Y_{0}^{T}}}\left[\int_{0}^{T}\phi(\Gamma_{t})-\Gamma_{t}+1\,dt\right].

Here, for (a) we have used the uniqueness of the intensity, and in the remaining equalities we have used the finiteness of the expectations 𝔼[0Tϕ(Γt)𝑑t]\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right] and 𝔼[0TΓt𝑑t]\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt\right]. ∎

Proof:

Recall that L0TL_{0}^{T} can be written as

Lt=exp(0tlog(Γs)𝑑Ys+(1Γs)ds),t[0,T].\displaystyle L_{t}=\exp\left(\int_{0}^{t}\log(\Gamma_{s})\,dY_{s}+(1-\Gamma_{s})\,ds\right),\quad t\in[0,T].

We note that for t[0,T]t\in[0,T], LtL_{t} satisfies

Lt={Ltif YtYt=0,ΓtLtif YtYt=1.\displaystyle L_{t}=\begin{cases}L_{t-}&\text{if }Y_{t}-Y_{t-}=0,\\ \Gamma_{t}L_{t-}&\text{if }Y_{t}-Y_{t-}=1.\end{cases} (37)

Let C0TC_{0}^{T} be a non-negative (𝒢t:t[0,T])(\mathcal{G}_{t}:t\in[0,T])-predictable process. Then

𝔼[0TCt𝑑Yt]\displaystyle\mathbb{E}\left[\int_{0}^{T}C_{t}\,dY_{t}\right] =(a)𝔼P~M,Y0T[LT0TCt𝑑Yt]\displaystyle\overset{(a)}{=}\mathbb{E}_{\tilde{P}^{M,Y_{0}^{T}}}\left[L_{T}\int_{0}^{T}C_{t}\,dY_{t}\right]
=(b)𝔼P~M,Y0T[0TLtCt𝑑Yt]\displaystyle\overset{(b)}{=}\mathbb{E}_{\tilde{P}^{M,Y_{0}^{T}}}\left[\int_{0}^{T}L_{t}C_{t}\,dY_{t}\right]
=(c)𝔼P~M,Y0T[0TΓtLtCt𝑑Yt]\displaystyle\overset{(c)}{=}\mathbb{E}_{\tilde{P}^{M,Y_{0}^{T}}}\left[\int_{0}^{T}\Gamma_{t}L_{t-}C_{t}\,dY_{t}\right]
=(d)𝔼P~M,Y0T[0TΓtLtCt𝑑t]\displaystyle\overset{(d)}{=}\mathbb{E}_{\tilde{P}^{M,Y_{0}^{T}}}\left[\int_{0}^{T}\Gamma_{t}L_{t-}C_{t}\,dt\right]
=(e)𝔼P~M,Y0T[0TΓtLtCt𝑑t]\displaystyle\overset{(e)}{=}\mathbb{E}_{\tilde{P}^{M,Y_{0}^{T}}}\left[\int_{0}^{T}\Gamma_{t}L_{t}C_{t}\,dt\right]
=(f)𝔼P~M,Y0T[LT0TΓtCt𝑑t]\displaystyle\overset{(f)}{=}\mathbb{E}_{\tilde{P}^{M,Y_{0}^{T}}}\left[L_{T}\int_{0}^{T}\Gamma_{t}C_{t}\,dt\right]
=(g)𝔼[0TΓtCt𝑑t],\displaystyle\overset{(g)}{=}\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}C_{t}\,dt\right],

where, (a) follows since LTL_{T} is the Radon-Nikodym derivative dPM,Y0TdP~M,Y0T\frac{d{P}^{M,Y_{0}^{T}}}{d\tilde{P}^{M,Y_{0}^{T}}},
(b) follows due to [39, T19 Theorem, Appendix A2, p. 302],
(c) follows due to (37),
(d) follows since the (P~M,Y0T,𝒢t:t[0,T])(\tilde{P}^{M,Y_{0}^{T}},\mathcal{G}_{t}:t\in[0,T])-intensity of Y0TY_{0}^{T} is 1, and LtL_{t-} being a left-continuous adapted process is (𝒢t:t[0,T])(\mathcal{G}_{t}:t\in[0,T])-predictable,
(e) follows since the Lebesgue measure of the set {t:t[0,T],LtLt}\{t:t\in[0,T],L_{t-}\neq L_{t}\} is zero due to (37),
(f) again follows due to [39, T19 Theorem, Appendix A2, p. 302], and
(g) again follows since LTL_{T} is the Radon-Nikodym derivative dPM,Y0TdP~M,Y0T\frac{d{P}^{M,Y_{0}^{T}}}{d\tilde{P}^{M,Y_{0}^{T}}}. ∎

Proof:

We will first show that

𝔼[0T(log(Γt))𝑑Yt]=𝔼[0T(log(Γt))Γt𝑑t]<.\mathbb{E}\left[\int_{0}^{T}\left(\log(\Gamma_{t})\right)^{-}\,dY_{t}\right]=\mathbb{E}\left[\int_{0}^{T}\left(\log(\Gamma_{t})\right)^{-}\Gamma_{t}\,dt\right]<\infty.

Define Γt1+max(Γt,1)\Gamma^{1+}_{t}\triangleq\max(\Gamma_{t},1) and Γt1min(Γt,1)\Gamma^{1-}_{t}\triangleq\min(\Gamma_{t},1). We note that ΓtΓt1+Γt+1\Gamma_{t}\leq\Gamma^{1+}_{t}\leq\Gamma_{t}+1 and Γt=Γt1+Γt1\Gamma_{t}=\Gamma^{1+}_{t}\Gamma^{1-}_{t}. Define the process μ0T\mu_{0}^{T} as

μtΓt1+Γt𝟏{Γt>0},t[0,T].\displaystyle\mu_{t}\triangleq\frac{\Gamma^{1+}_{t}}{\Gamma_{t}}\mathbf{1}\{\Gamma_{t}>0\},\quad t\in[0,T].

Then μ0T\mu_{0}^{T} is a non-negative (𝒢t:t[0,T])(\mathcal{G}_{t}:t\in[0,T])-predictable process and

0TμtΓt𝑑t=0TΓt1+𝟏{Γt>0}𝑑t0T(Γt+1)𝑑t<\int_{0}^{T}\mu_{t}\Gamma_{t}\,dt=\int_{0}^{T}\Gamma^{1+}_{t}\mathbf{1}\{\Gamma_{t}>0\}\,dt\leq\int_{0}^{T}(\Gamma_{t}+1)\,dt<\infty

PP-a.s. since 𝔼[0TΓt𝑑t]<\mathbb{E}[\int_{0}^{T}\Gamma_{t}\,dt]<\infty. Hence the process L^0T\hat{L}_{0}^{T} defined as

\displaystyle\hat{L}_{t}\triangleq\exp\left(\int_{0}^{t}\log(\mu_{s})\,dY_{s}+(1-\mu_{s})\Gamma_{s}\,ds\right),\quad t\in[0,T]

is a non-negative (P,𝒢t:t[0,T])(P,\mathcal{G}_{t}:t\in[0,T]) super-martingale [39, T2 Theorem, Chapter VI, p. 165]. Hence the following chain of inequalities holds

𝔼[log(L^T)]\displaystyle\mathbb{E}\left[\log(\hat{L}_{T})\right] (a)log(𝔼[L^T])\displaystyle\overset{(a)}{\leq}\log\left(\mathbb{E}[\hat{L}_{T}]\right)
(b)log(𝔼[L^0])\displaystyle\overset{(b)}{\leq}\log\left(\mathbb{E}[\hat{L}_{0}]\right)
=0.\displaystyle=0. (38)

Here, for (a) we have used the fact that since L^0T\hat{L}_{0}^{T} is a super-martingale, L^T\hat{L}_{T} is integrable, and then Jensen’s inequality, and
for (b), we have used the fact that L^0T\hat{L}_{0}^{T} is a super-martingale, hence 𝔼[L^T]𝔼[L^0]\mathbb{E}[\hat{L}_{T}]\leq\mathbb{E}[\hat{L}_{0}].
Let τk\tau_{k} denote the kkth arrival instant of the process Y0TY_{0}^{T}, i.e.,

τk=inf{t[0,T]:Yt=k},\displaystyle\tau_{k}=\inf\{t\in[0,T]:Y_{t}=k\},

where the infimum over the empty set is taken to be \infty. Then if τkT\tau_{k}\leq T, Γτk>0\Gamma_{\tau_{k}}>0 PP-a.s. [39, T12 Theorem, Chapter II, p. 31]. Hence for τkT\tau_{k}\leq T,

log(μτk)=log(Γτk1+)log(Γτk)=log(Γτk1)=(log(Γτk))Pa.s.,\displaystyle\log(\mu_{\tau_{k}})=\log(\Gamma^{1+}_{\tau_{k}})-\log(\Gamma_{\tau_{k}})=-\log(\Gamma^{1-}_{\tau_{k}})=\left(\log(\Gamma_{\tau_{k}})\right)^{-}\quad P-\text{a.s.},

Thus we can write

log(L^T)=0T(log(Γt))𝑑Yt+0T(ΓtΓt1+)𝟏{Γt>0}𝑑t.\displaystyle\log(\hat{L}_{T})=\int_{0}^{T}\left(\log(\Gamma_{t})\right)^{-}\,dY_{t}+\int_{0}^{T}(\Gamma_{t}-\Gamma^{1+}_{t})\mathbf{1}\{\Gamma_{t}>0\}\,dt.

Using (38) we obtain

𝔼[0T(log(Γt))𝑑Yt+0T(ΓtΓt1+)𝟏{Γt>0}𝑑t]=𝔼[log(LT^)]0.\displaystyle\mathbb{E}\left[\int_{0}^{T}\left(\log(\Gamma_{t})\right)^{-}\,dY_{t}+\int_{0}^{T}(\Gamma_{t}-\Gamma^{1+}_{t})\mathbf{1}\{\Gamma_{t}>0\}\,dt\right]=\mathbb{E}[\log(\hat{L_{T}})]\leq 0.

We note that 0T(log(Γt))𝑑Yt\int_{0}^{T}\left(\log(\Gamma_{t})\right)^{-}\,dY_{t} is a non-negative random variable, and

|𝔼[0T(ΓtΓt1+)𝟏{Γt>0}𝑑t]|𝔼[0T(Γt+Γt1+)𝑑t]𝔼[0T(2Γt+1)𝑑t]<.\left|\mathbb{E}\left[\int_{0}^{T}(\Gamma_{t}-\Gamma^{1+}_{t})\mathbf{1}\{\Gamma_{t}>0\}\,dt\right]\right|\leq\mathbb{E}\left[\int_{0}^{T}(\Gamma_{t}+\Gamma^{1+}_{t})\,dt\right]\leq\mathbb{E}\left[\int_{0}^{T}(2\Gamma_{t}+1)\,dt\right]<\infty.

Hence we can split the expectation to get

𝔼[0T(log(Γt))𝑑Yt]+𝔼[0T(ΓtΓt1+)𝟏{Γt>0}𝑑t]0,\displaystyle\mathbb{E}\left[\int_{0}^{T}\left(\log(\Gamma_{t})\right)^{-}\,dY_{t}\right]+\mathbb{E}\left[\int_{0}^{T}(\Gamma_{t}-\Gamma^{1+}_{t})\mathbf{1}\{\Gamma_{t}>0\}\,dt\right]\leq 0,

which gives

𝔼[0T(log(Γt))𝑑Yt]𝔼[0T(ΓtΓt1+)𝟏{Γt>0}𝑑t]<.\displaystyle\mathbb{E}\left[\int_{0}^{T}\left(\log(\Gamma_{t})\right)^{-}\,dY_{t}\right]\leq-\mathbb{E}\left[\int_{0}^{T}(\Gamma_{t}-\Gamma^{1+}_{t})\mathbf{1}\{\Gamma_{t}>0\}\,dt\right]<\infty. (39)

Hence

𝔼[0Tlog(Γt)𝑑Yt]\displaystyle\mathbb{E}\left[\int_{0}^{T}\log(\Gamma_{t})\,dY_{t}\right] =𝔼[0T(log(Γt))+𝑑Yt]𝔼[0T(log(Γt))𝑑Yt]\displaystyle=\mathbb{E}\left[\int_{0}^{T}\left(\log(\Gamma_{t})\right)^{+}\,dY_{t}\right]-\mathbb{E}\left[\int_{0}^{T}\left(\log(\Gamma_{t})\right)^{-}\,dY_{t}\right]
=𝔼[0T(log(Γt))+Γt𝑑t]𝔼[0T(log(Γt))Γt𝑑t]\displaystyle=\mathbb{E}\left[\int_{0}^{T}(\log(\Gamma_{t}))^{+}\Gamma_{t}\,dt\right]-\mathbb{E}\left[\int_{0}^{T}\left(\log(\Gamma_{t})\right)^{-}\Gamma_{t}\,dt\right]
=𝔼[0Tϕ(Γt)𝑑t].\displaystyle=\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]. (40)

Proof:

Suppose that Γ0T\Gamma_{0}^{T} is the (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-intensity of Y0TY_{0}^{T}. Then applying [39, T8 Theorem, Chapter II, p. 27] with Xs=1X_{s}=1 proves M0TM_{0}^{T} is a (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-martingale. Now suppose that M0TM_{0}^{T} is a (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-martingale. Consider a simple (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-predictable process C0TC_{0}^{T} of the form

Ct=𝟏{}𝟏{u<tvT}u.\displaystyle C_{t}=\mathbf{1}\{\mathcal{E}\}\mathbf{1}\{u<t\leq v\leq T\}\quad\mathcal{E}\in\mathcal{F}_{u}.

Then

𝔼[0TCs𝑑Ys]\displaystyle\mathbb{E}\left[\int_{0}^{T}C_{s}\,d{Y}_{s}\right] =𝔼[𝟏{}(YvYu)]\displaystyle=\mathbb{E}\left[\mathbf{1}\{\mathcal{E}\}(Y_{v}-Y_{u})\right]
=𝔼[𝟏{}𝔼[(YvYu)|u]]\displaystyle=\mathbb{E}\left[\mathbf{1}\{\mathcal{E}\}\mathbb{E}[(Y_{v}-Y_{u})|\mathcal{F}_{u}]\right]
=(a)𝔼[𝟏{}𝔼[uvΓsds|u]]\displaystyle\overset{(a)}{=}\mathbb{E}\left[\mathbf{1}\{\mathcal{E}\}\mathbb{E}\left[\int_{u}^{v}\Gamma_{s}\,ds\middle|\mathcal{F}_{u}\right]\right]
=𝔼[0TCsΓs𝑑s],\displaystyle=\mathbb{E}\left[\int_{0}^{T}C_{s}\Gamma_{s}\,d{s}\right], (41)

where for (a) we have used the martingale property of M0TM_{0}^{T}. Thus by the monotone class theorem, for all bounded (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-predictable processes C0TC_{0}^{T}, (41) holds (see [39, App. A1, Theorem T5, p. 264]). Then by applying the monotone convergence theorem, we can show that (41) holds for all non-negative (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-predictable processes as well, so that Γ0T\Gamma_{0}^{T} is the (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-intensity of Y0TY_{0}^{T}. ∎

Proof:

There exists a (𝒢t:t[0,T])(\mathcal{G}_{t}:t\in[0,T])-predictable process Π0T\Pi_{0}^{T} such that PP-a.s. Πt=𝔼[Λt|𝒢t]\Pi_{t}=\mathbb{E}[{\Lambda}_{t}|\mathcal{G}_{t-}], t[0,T]t\in[0,T] [55, Chapter 6, Theorem 43, p. 103]. We will show that Π0T\Pi_{0}^{T} is the (𝒢t:t[0,T])(\mathcal{G}_{t}:t\in[0,T])-intensity of N0T{N}_{0}^{T}. Let D0T{D}_{0}^{T} be a non-negative (𝒢t:t[0,T])(\mathcal{G}_{t}:t\in[0,T])-predictable process. As 𝒢tt\mathcal{G}_{t}\subseteq\mathcal{F}_{t}, it is also (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-predictable. Thus

𝔼[0TDs𝑑Ns]=𝔼[0TDsΛs𝑑s].\displaystyle\mathbb{E}\left[\int_{0}^{T}D_{s}\,d{{N}}_{s}\right]=\mathbb{E}\left[\int_{0}^{T}D_{s}{\Lambda}_{s}\,ds\right]. (42)

Hence

𝔼[0TDsΠs𝑑s]\displaystyle\mathbb{E}\left[\int_{0}^{T}D_{s}\Pi_{s}\,ds\right] =𝔼[0TDs𝔼[Λs|𝒢s]𝑑s]\displaystyle=\mathbb{E}\left[\int_{0}^{T}D_{s}\mathbb{E}[{\Lambda}_{s}|\mathcal{G}_{s-}]\,ds\right]
=(a)𝔼[0T𝔼[DsΛs|𝒢s]𝑑s]\displaystyle\stackrel{{\scriptstyle(a)}}{{=}}\mathbb{E}\left[\int_{0}^{T}\mathbb{E}[D_{s}{\Lambda}_{s}|\mathcal{G}_{s-}]\,ds\right]
=𝔼[0TDsΛs𝑑s]\displaystyle=\mathbb{E}\left[\int_{0}^{T}D_{s}{\Lambda}_{s}\,ds\right]
=(b)𝔼[0TDs𝑑Ns].\displaystyle\stackrel{{\scriptstyle(b)}}{{=}}\mathbb{E}\left[\int_{0}^{T}D_{s}\,d{{N}}_{s}\right].

Here, (a) is due to the fact that DsD_{s} is 𝒢s\mathcal{G}_{s-} measurable [39, Exercise E10, Chapter I, p. 9], and
(b) is due to (42).

Hence the (𝒢t:t[0,T])\left(\mathcal{G}_{t}:t\in[0,T]\right)-intensity of N0T{N}_{0}^{{T}} is Π0T\Pi_{0}^{{T}}. ∎

Proof:

We first note that since Y0TY_{0}^{T} and N0TN_{0}^{T} are independent, trajectories of Z0TZ_{0}^{T} are a.s. in 𝒩0T\mathcal{N}_{0}^{T}. The (~tσ(M,Y0t,N0t):t[0,T])(\tilde{\mathcal{F}}_{t}\triangleq\sigma(M,Y_{0}^{t},N_{0}^{t}):t\in[0,T])-intensities of Y0TY_{0}^{T} and N0TN_{0}^{T} are Γ0T\Gamma_{0}^{T} and Π0T\Pi_{0}^{T} respectively [39, E5 Exercise, Chapter II, p. 28]. Then for a non-negative (~t:t[0,T])(\tilde{\mathcal{F}}_{t}:t\in[0,T])-predictable process C0TC_{0}^{T}:

𝔼[0TCs𝑑Zs]\displaystyle\mathbb{E}\left[\int_{0}^{T}C_{s}\,d{Z}_{s}\right] =𝔼[0TCs𝑑Ys]+𝔼[0TCs𝑑Ns]\displaystyle=\mathbb{E}\left[\int_{0}^{T}C_{s}\,d{Y}_{s}\right]+\mathbb{E}\left[\int_{0}^{T}C_{s}\,d{N}_{s}\right]
=𝔼[0TCsΓs𝑑s]+𝔼[0TCsΠs𝑑s]\displaystyle=\mathbb{E}\left[\int_{0}^{T}C_{s}\Gamma_{s}\,d{s}\right]+\mathbb{E}\left[\int_{0}^{T}C_{s}\Pi_{s}\,ds\right]
=𝔼[0TCs(Γs+Πs)𝑑s].\displaystyle=\mathbb{E}\left[\int_{0}^{T}C_{s}(\Gamma_{s}+\Pi_{s})\,d{s}\right].

Hence the (~t:t[0,T])(\tilde{\mathcal{F}}_{t}:t\in[0,T])-intensity of Z0TZ_{0}^{T} is (Γt+Πt:t[0,T])(\Gamma_{t}+\Pi_{t}:t\in[0,T]). Since t~t\mathcal{F}_{t}\subseteq\tilde{\mathcal{F}}_{t} the statement of the lemma follows from an application of Lemma 8. ∎

Proof:

Let tσ(M,Y0t,Z0t)\mathcal{H}_{t}\triangleq\sigma(M,Y_{0}^{t},Z_{0}^{t}). We note that the (t:t[0,T])(\mathcal{H}_{t}:t\in[0,T])-intensity of Y0TY_{0}^{T} is Γ0T\Gamma_{0}^{T}. Now we will compute the (t:t[0,T])(\mathcal{H}_{t}:t\in[0,T])-intensity of Z0TZ_{0}^{T}. Let (χi:i{1,})(\chi_{i}:i\in\{1,\dots\}) denote the sequence of independent and identically distributed Bernoulli random variables which indicate if a particular point in point process Y0T{Y}_{0}^{T} is erased or not. In particular, if χj=1\chi_{j}=1, then the jjth point in Y0TY_{0}^{T} is retained, so that 𝔼[χj]=1p\mathbb{E}[\chi_{j}]=1-p. Then for 0u<vT0\leq u<v\leq T

ZvZu=k=Yu+1Yvχk=k=1χk𝟏{Yu<kYv}.\displaystyle Z_{v}-Z_{u}=\sum_{k=Y_{u}+1}^{Y_{v}}\chi_{k}=\sum_{k=1}^{\infty}\chi_{k}\mathbf{1}\{Y_{u}<k\leq Y_{v}\}.

Using the monotone convergence theorem for the conditional expectation,

𝔼[(ZvZu)|u]\displaystyle\mathbb{E}\left[(Z_{v}-Z_{u})|\mathcal{H}_{u}\right] =k=1𝔼[χk𝟏{Yu<kYv}|u]\displaystyle=\sum_{k=1}^{\infty}\mathbb{E}[\chi_{k}\mathbf{1}\{Y_{u}<k\leq Y_{v}\}|\mathcal{H}_{u}]
=(a)k=1𝔼[χk|u]𝔼[𝟏{Yu<kYv}|u]\displaystyle\overset{(a)}{=}\sum_{k=1}^{\infty}\mathbb{E}[\chi_{k}|\mathcal{H}_{u}]\mathbb{E}[\mathbf{1}\{Y_{u}<k\leq Y_{v}\}|\mathcal{H}_{u}]
=(b)(1p)𝔼[(YvYu)|u]\displaystyle\overset{(b)}{=}(1-p)\mathbb{E}[(Y_{v}-Y_{u})|\mathcal{H}_{u}]
=(c)(1p)𝔼[uvΓsds|u],\displaystyle\overset{(c)}{=}(1-p)\mathbb{E}\left[\int_{u}^{v}\Gamma_{s}\,ds\middle|\mathcal{H}_{u}\right],

where, for (a) we have used the fact that given u\mathcal{H}_{u}, χk\chi_{k} is independent of Y0TY_{0}^{T},
for (b), we note that 𝔼[χk|u]=χk𝟏{kYu}+(1p)𝟏{k>Yu}\mathbb{E}[\chi_{k}|\mathcal{H}_{u}]=\chi_{k}\mathbf{1}\{k\leq Y_{u}\}+(1-p)\mathbf{1}\{k>Y_{u}\}, and
for (c), we have used the martingale property of M0TM_{0}^{T}.
Then

M~tZt0t(1p)Γs𝑑st[0,T]\tilde{M}_{t}\triangleq Z_{t}-\int_{0}^{t}(1-p)\Gamma_{s}\,ds\quad t\in[0,T]

is a (t:t[0,T])(\mathcal{H}_{t}:t\in[0,T])-martingale. Hence from Lemma 7, the (t:t[0,T])(\mathcal{H}_{t}:t\in[0,T])-intensity of Z0TZ_{0}^{T} is ((1p)Γt:t[0,T])((1-p)\Gamma_{t}:t\in[0,T]). An application of Lemma 8 then proves the statement of the lemma. ∎

Proof:

We will require the following inequality

ulog(v)ϕ(u)u+v,0u,v<.\displaystyle u\log(v)\leq\phi(u)-u+v,\quad 0\leq u,v<\infty. (43)

The inequality clearly holds if uu or vv (or both) is zero. If u,v>0u,v>0, the inequality follows from log(u/v)(1v/u)\log(u/v)\geq(1-v/u).
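
A quick numerical check of (43) over a grid of strictly positive (u, v) pairs is given below; it is only a sanity check, not part of the proof.

```python
import numpy as np

# Numerical check of inequality (43) on a grid of strictly positive (u, v):
# u*log(v) <= phi(u) - u + v with phi(u) = u*log(u).
u = np.linspace(0.01, 5.0, 200)
v = np.linspace(0.01, 5.0, 200)
U, V = np.meshgrid(u, v)
gap = (U * np.log(U) - U + V) - U * np.log(V)
print(gap.min() >= -1e-12)   # True: the gap equals U*(log(U/V) - 1 + V/U) >= 0
```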

Defining Y^t1+max(1,Y^t)\hat{Y}^{1+}_{t}\triangleq\max(1,\hat{Y}_{t}), we note that Y^t1+Y^t+1\hat{Y}^{1+}_{t}\leq\hat{Y}_{t}+1. Consider

𝔼[0T(log(Y^t))+𝑑Yt]\displaystyle\mathbb{E}\left[\int_{0}^{T}(\log(\hat{Y}_{t}))^{+}\,dY_{t}\right] =𝔼[0Tlog(Y^t1+)𝑑Yt]\displaystyle=\mathbb{E}\left[\int_{0}^{T}\log(\hat{Y}^{1+}_{t})\,dY_{t}\right]
=(a)𝔼[0Tlog(Y^t1+)Γt𝑑t]\displaystyle\overset{(a)}{=}\mathbb{E}\left[\int_{0}^{T}\log(\hat{Y}^{1+}_{t})\Gamma_{t}\,dt\right]
(b)𝔼[0Tϕ(Γt)Γt+Y^t1+dt]\displaystyle\overset{(b)}{\leq}\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})-\Gamma_{t}+\hat{Y}^{1+}_{t}\,dt\right]
=(c)𝔼[0Tϕ(Γt)𝑑t]𝔼[0TΓt𝑑t]+𝔼[0TY^t1+𝑑t]\displaystyle\overset{(c)}{=}\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]-\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt\right]+\mathbb{E}\left[\int_{0}^{T}\hat{Y}^{1+}_{t}\,dt\right]
<,\displaystyle<\infty, (44)

where, for (a), we have used the facts that (Y^t1+:t[0,T])(\hat{Y}^{1+}_{t}:t\in[0,T]) is (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-predictable, log(Y^t1+)\log(\hat{Y}^{1+}_{t}) is non-negative, and Γ0T\Gamma_{0}^{T} is the (t:t[0,T])(\mathcal{F}_{t}:t\in[0,T])-intensity of Y0TY_{0}^{T},
for (b), we note that Y^t1+\hat{Y}_{t}^{1+} and Γt\Gamma_{t} are PP-a.s finite, and then use the inequality in (43), and
for (c), we have used the facts that 𝔼[0Tϕ(Γt)𝑑t]<\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]<\infty (via Theorem 1), 𝔼[0TΓt𝑑t]<\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt\right]<\infty, and 𝔼[0TY^t1+𝑑t]𝔼[0TY^t+1dt]<\mathbb{E}\left[\int_{0}^{T}\hat{Y}^{1+}_{t}\,dt\right]\leq\mathbb{E}\left[\int_{0}^{T}\hat{Y}_{t}+1\,dt\right]<\infty.
Hence we can write

𝔼[0Tlog(Y^t)𝑑Yt]\displaystyle\mathbb{E}\left[\int_{0}^{T}\log(\hat{Y}_{t})\,dY_{t}\right] =𝔼[0T(log(Y^t))+𝑑Yt]𝔼[0T(log(Y^t))𝑑Yt]\displaystyle=\mathbb{E}\left[\int_{0}^{T}(\log(\hat{Y}_{t}))^{+}\,dY_{t}\right]-\mathbb{E}\left[\int_{0}^{T}(\log(\hat{Y}_{t}))^{-}\,dY_{t}\right]
=𝔼[0T(log(Y^t))+Γt𝑑t]𝔼[0T(log(Y^t))Γt𝑑t]\displaystyle=\mathbb{E}\left[\int_{0}^{T}(\log(\hat{Y}_{t}))^{+}\Gamma_{t}\,dt\right]-\mathbb{E}\left[\int_{0}^{T}(\log(\hat{Y}_{t}))^{-}\Gamma_{t}\,dt\right]
=𝔼[0Tlog(Y^t)Γt𝑑t].\displaystyle=\mathbb{E}\left[\int_{0}^{T}\log(\hat{Y}_{t})\Gamma_{t}\,dt\right]. (45)

Proof:

The first part of the lemma follows directly from L’Hôpital’s rule. For the second part

\displaystyle\lim_{\Delta\to 0}\mathbb{E}[\bar{d}(\hat{\bar{Y}},\bar{Y})] =\lim_{\Delta\to 0}\sum_{k=1}^{4}\hat{\bar{Y}}(k)\exp(-\lambda\Delta)\alpha_{k}+\left(\hat{\bar{Y}}(k)-\frac{\log(\hat{\bar{Y}}(k))}{\Delta}\right)\left(1-\exp(-\lambda\Delta)\right)\beta_{k}
=k=14Y¯^(k)αkλlog(Y¯^(k))βk\displaystyle=\sum_{k=1}^{4}\hat{\bar{Y}}(k)\alpha_{k}-\lambda\log(\hat{\bar{Y}}(k))\beta_{k}
=k=14αk(Y¯^(k)λβkαklog(Y¯^(k)))𝟏{αk>0}\displaystyle=\sum_{k=1}^{4}\alpha_{k}\left(\hat{\bar{Y}}(k)-\frac{\lambda\beta_{k}}{\alpha_{k}}\log(\hat{\bar{Y}}(k))\right)\mathbf{1}\{\alpha_{k}>0\}
(a)k=14αk(Ψ𝒜(λβkαk)+ϵ4)\displaystyle\overset{(a)}{\leq}\sum_{k=1}^{4}\alpha_{k}\left(\Psi_{\mathcal{A}}\left(\frac{\lambda\beta_{k}}{\alpha_{k}}\right)+\frac{\epsilon}{4}\right)
=D+ϵ4,\displaystyle={D}+\frac{\epsilon}{4},

where for (a), we have used the definition in (23).

Proof:

The first limit can be evaluated using L’Hôpital’s rule. To compute the second limit, consider

limΔ0P(U¯(i)=k|Y¯=1)\displaystyle\lim_{\Delta\to 0}{P(\bar{U}^{(i)}=k|\bar{Y}=1)} =limΔ0l=01P(U¯(i)=k,Y¯(i)=l|Y¯=1)\displaystyle=\lim_{\Delta\to 0}\sum_{l=0}^{1}{P(\bar{U}^{(i)}=k,\bar{Y}^{(i)}=l|\bar{Y}=1)}
\displaystyle=\lim_{\Delta\to 0}\sum_{l=0}^{1}{P(\bar{Y}^{(i)}=l|\bar{Y}=1)}P(\bar{U}^{(i)}=k|\bar{Y}^{(i)}=l)
=p(i)αk(i)+(1p(i))βk(i)\displaystyle=p^{(i)}\alpha^{(i)}_{k}+(1-p^{(i)})\beta^{(i)}_{k}
=γk(i).\displaystyle={\gamma^{(i)}_{k}}.

Then we have

\displaystyle\lim_{\Delta\to 0}P(\bar{U}_{1}^{(1)}=k_{1},\bar{U}_{1}^{(2)}=k_{2}|\bar{Y}=1) =\lim_{\Delta\to 0}P(\bar{U}_{1}^{(1)}=k_{1}|\bar{Y}=1)P(\bar{U}_{1}^{(2)}=k_{2}|\bar{Y}=1)
=γk1(1)γk2(2).\displaystyle=\gamma^{(1)}_{k_{1}}\gamma^{(2)}_{k_{2}}.

Recalling that αk(i)=0\alpha^{(i)}_{k}=0 implies βk(i)=γk(i)=0\beta^{(i)}_{k}=\gamma^{(i)}_{k}=0, we have

limΔ0𝔼[log(Y¯^)𝟏{Y¯=1}]Δ\displaystyle\lim_{\Delta\to 0}\frac{\mathbb{E}[\log(\hat{\bar{Y}})\mathbf{1}\{\bar{Y}=1\}]}{\Delta} =limΔ0P(Y¯=1)ΔlimΔ0𝔼[log(Y¯^)|Y¯=1]\displaystyle=\lim_{\Delta\to 0}\frac{P(\bar{Y}=1)}{\Delta}\lim_{\Delta\to 0}\mathbb{E}[\log(\hat{\bar{Y}})|\bar{Y}=1]
\displaystyle=\lambda\sum_{k_{1},k_{2}}\lim_{\Delta\to 0}P(\bar{U}_{1}^{(1)}=k_{1},\bar{U}_{1}^{(2)}=k_{2}|\bar{Y}=1)\log\left(\hat{\bar{Y}}(k_{1},k_{2})\right)
=λk1,k2γk1(1)γk2(2)log(λγk1(1)γk2(2)αk1(1)αk2(2))𝟏{γk1(1)γk2(2)>0}\displaystyle=\lambda\sum_{k_{1},k_{2}}\gamma^{(1)}_{k_{1}}\gamma^{(2)}_{k_{2}}\log\left(\lambda\frac{\gamma^{(1)}_{k_{1}}\gamma^{(2)}_{k_{2}}}{\alpha^{(1)}_{k_{1}}\alpha^{(2)}_{k_{2}}}\right)\mathbf{1}\{\gamma^{(1)}_{k_{1}}\gamma^{(2)}_{k_{2}}>0\}
\displaystyle=\lambda\sum_{k_{1}=1}^{4}\gamma^{(1)}_{k_{1}}\log\left(\frac{\gamma^{(1)}_{k_{1}}}{\alpha^{(1)}_{k_{1}}}\right)+\lambda\sum_{k_{2}=1}^{4}\gamma^{(2)}_{k_{2}}\log\left(\frac{\gamma^{(2)}_{k_{2}}}{\alpha^{(2)}_{k_{2}}}\right)+\phi(\lambda).

Now to compute limΔ0𝔼[Y¯^]\lim_{\Delta\to 0}\mathbb{E}[\hat{\bar{Y}}], we first calculate

limΔ0P(U¯1(1)=k1,U¯1(2)=k2)\displaystyle\lim_{\Delta\to 0}P(\bar{U}_{1}^{(1)}=k_{1},\bar{U}_{1}^{(2)}=k_{2}) =limΔ0P(U¯1(1)=k1,U¯1(2)=k2|Y¯=0)P(Y¯=0)\displaystyle=\lim_{\Delta\to 0}P(\bar{U}_{1}^{(1)}=k_{1},\bar{U}_{1}^{(2)}=k_{2}|\bar{Y}=0)P(\bar{Y}=0)
+limΔ0P(U¯1(1)=k1,U¯1(2)=k2|Y¯=1)P(Y¯=1)\displaystyle\quad+\lim_{\Delta\to 0}P(\bar{U}_{1}^{(1)}=k_{1},\bar{U}_{1}^{(2)}=k_{2}|\bar{Y}=1)P(\bar{Y}=1)
=limΔ0P(U¯1(1)=k1,U¯1(2)=k2|Y¯=0)\displaystyle=\lim_{\Delta\to 0}P(\bar{U}_{1}^{(1)}=k_{1},\bar{U}_{1}^{(2)}=k_{2}|\bar{Y}=0)
=limΔ0P(U¯1(1)=k1|Y¯=0)P(U¯1(2)=k2|Y¯=0)\displaystyle=\lim_{\Delta\to 0}P(\bar{U}_{1}^{(1)}=k_{1}|\bar{Y}=0)P(\bar{U}_{1}^{(2)}=k_{2}|\bar{Y}=0)
=αk1(1)αk2(2).\displaystyle=\alpha^{(1)}_{k_{1}}\alpha^{(2)}_{k_{2}}.

This gives

limΔ0𝔼[Y¯^]\displaystyle\lim_{\Delta\to 0}\mathbb{E}[\hat{\bar{Y}}] =k1,k2limΔ0P(U¯=k1,U¯2=k2)Y¯^(k1,k2)\displaystyle=\sum_{k_{1},k_{2}}\lim_{\Delta\to 0}P(\bar{U}=k_{1},\bar{U}_{2}=k_{2})\hat{\bar{Y}}(k_{1},k_{2})
=λk1,k2αk1(1)αk2(2)γk1(1)γk2(2)αk1(1)αk2(2)𝟏{αk1(1)αk2(2)>0}\displaystyle=\lambda\sum_{k_{1},k_{2}}\alpha^{(1)}_{k_{1}}\alpha^{(2)}_{k_{2}}\frac{\gamma^{(1)}_{k_{1}}\gamma^{(2)}_{k_{2}}}{\alpha^{(1)}_{k_{1}}\alpha^{(2)}_{k_{2}}}\mathbf{1}\{\alpha^{(1)}_{k_{1}}\alpha^{(2)}_{k_{2}}>0\}
=λ.\displaystyle=\lambda.

Thus

limΔ0𝔼[d¯(Y¯^,Y¯)]\displaystyle\lim_{\Delta\to 0}\mathbb{E}[\bar{d}(\hat{\bar{Y}},\bar{Y})] =λϕ(λ)λ(k1=14γk1(1)log(γk1(1)αk1(1))+k2=14γk2(2)log(γk2(2)αk2(2)))\displaystyle=\lambda-\phi(\lambda)-\lambda\left(\sum_{k_{1}=1}^{4}\gamma^{(1)}_{k_{1}}\log\left(\frac{\gamma^{(1)}_{k_{1}}}{\alpha^{(1)}_{k_{1}}}\right)+\sum_{k_{2}=1}^{4}\gamma^{(2)}_{k_{2}}\log\left(\frac{\gamma^{(2)}_{k_{2}}}{\alpha^{(2)}_{k_{2}}}\right)\right)
=D.\displaystyle={D}.

Proof:

Achievability:

Let

R(1)((1p(1))λ+μ(1))k=14βk(1)log(βk(1)αk(1))\displaystyle{R}^{(1)}\triangleq\left((1-p^{(1)})\lambda+\mu^{(1)}\right)\sum_{k=1}^{4}\beta^{(1)}_{k}\log\left(\frac{\beta^{(1)}_{k}}{\alpha^{(1)}_{k}}\right)
R(2)((1p(2))λ+μ(2))k=14βk(2)log(βk(2)αk(2))\displaystyle{R}^{(2)}\triangleq\left((1-p^{(2)})\lambda+\mu^{(2)}\right)\sum_{k=1}^{4}\beta^{(2)}_{k}\log\left(\frac{\beta^{(2)}_{k}}{\alpha^{(2)}_{k}}\right)
\displaystyle{D}\triangleq\lambda-\phi(\lambda)-\lambda\left(\sum_{k=1}^{4}\gamma^{(1)}_{k}\log\left(\frac{\gamma^{(1)}_{k}}{\alpha^{(1)}_{k}}\right)+\sum_{k=1}^{4}\gamma^{(2)}_{k}\log\left(\frac{\gamma^{(2)}_{k}}{\alpha^{(2)}_{k}}\right)\right).

We will show achievability using a (T,R(1)+ϵ,R(2)+ϵ,D+ϵ)(T,{R}^{(1)}+\epsilon,{R}^{(2)}+\epsilon,{D}+\epsilon) code without feedforward. We will use discretization and results from the rate-distortion theory for discrete memoryless sources (DMS).

First consider the case when for each i{1,2}i\in\{1,2\}, at least one of the following conditions is satisfied

  1. C.1

    βk(i)>0\beta^{(i)}_{k}>0 for all kk,

  2. C.2

    p(i)>0p^{(i)}>0.

Fix Δ>0\Delta>0, and let TnΔT\triangleq n\Delta for an integer nn. For each i{1,2}i\in\{1,2\}, define a binary valued discrete time process (Y¯j(i):j{1,,n})(\bar{Y}^{(i)}_{j}:j\in\{1,\dots,n\}) as follows. If there are one or more arrivals in the interval ((j1)Δ,jΔ]((j-1)\Delta,j\Delta] of the process Y0(i),TY_{0}^{(i),T}, then set Y¯j(i)\bar{Y}^{(i)}_{j} to 11, otherwise set it equal to zero. Since Y0(i),TY_{0}^{(i),T} is a Poisson process with rate λ(i)(1p(i))λ+μ(i)\lambda^{(i)}\triangleq(1-p^{(i)})\lambda+\mu^{(i)}, the components of (Y¯j(i):j{1,,n})(\bar{Y}^{(i)}_{j}:j\in\{1,\dots,n\}) are independent and identically distributed with P(Y¯(i)=1)=1exp(λ(i)Δ)P(\bar{Y}^{(i)}=1)=1-\exp(-\lambda^{(i)}\Delta). Similarly, if (Y¯j:j{1,,n})(\bar{Y}_{j}:j\in\{1,\dots,n\}) denotes the discretized process Y0TY_{0}^{T}, then we have

P(Y¯j(i):j{1,,n}|Y¯j:j{1,,n})=j=1nP(Y¯j(i)|Y¯j)\displaystyle P\left(\bar{Y}^{(i)}_{j}:j\in\{1,\dots,n\}\middle|\bar{Y}_{j}:j\in\{1,\dots,n\}\right)=\prod_{j=1}^{n}P(\bar{Y}^{(i)}_{j}|\bar{Y}_{j})

due to the memoryless property of Poisson processes and independent thinning. Consider the following “test”-channel for k{1,2,3,4}k\in\{1,2,3,4\},

P(U¯(i)=k|Y¯(i)=1)=βk(i),\displaystyle P(\bar{U}^{(i)}=k|\bar{Y}^{(i)}=1)=\beta^{(i)}_{k}, (46)
P(U¯(i)=k|Y¯(i)=0)=αk(i).\displaystyle P(\bar{U}^{(i)}=k|\bar{Y}^{(i)}=0)=\alpha^{(i)}_{k}. (47)

Define the discretized distortion function

d¯(y¯^,y¯)y¯^log(y¯^)Δ𝟏{y¯=1}y¯^0,y¯{0,1}.\displaystyle\bar{d}(\hat{\bar{y}},\bar{y})\triangleq\hat{\bar{y}}-\frac{\log(\hat{\bar{y}})}{\Delta}\mathbf{1}\{\bar{y}=1\}\quad\hat{\bar{y}}\geq 0,\bar{y}\in\{0,1\}. (48)

The reconstruction Y¯^\hat{\bar{Y}} is taken as

Y¯^(U¯(1),U¯(2))=λY¯^(1)(U¯(1))Y¯^(2)(U¯(2)),\displaystyle\hat{\bar{Y}}(\bar{U}^{(1)},\bar{U}^{(2)})=\lambda\hat{\bar{Y}}^{(1)}(\bar{U}^{(1)})\hat{\bar{Y}}^{(2)}(\bar{U}^{(2)}),

where

Y¯^(i)(k)={γk(i)αk(i)ifαk(i)>0,1otherwise.\displaystyle\hat{\bar{Y}}^{(i)}(k)=\begin{cases}\frac{\gamma^{(i)}_{k}}{\alpha^{(i)}_{k}}&\text{if}\,\alpha^{(i)}_{k}>0,\\ 1&\text{otherwise}.\end{cases}

We note that since γk(i)=p(i)αk(i)+(1p(i))βk(i)\gamma^{(i)}_{k}=p^{(i)}\alpha^{(i)}_{k}+(1-p^{(i)})\beta^{(i)}_{k}, and at least one of C.1-C.2 is satisfied, Y¯^(i)(k)>0\hat{\bar{Y}}^{(i)}(k)>0, and hence Y¯^>0\hat{\bar{Y}}>0. Thus the distortion function d¯(Y¯^,Y¯)\bar{d}(\hat{\bar{Y}},\bar{Y}) in (48) is bounded. Let

κmaxk1,k2|log(Y¯^(k1,k2))|.\displaystyle\kappa\triangleq\max_{k_{1},k_{2}}\left|\log\left(\hat{\bar{Y}}(k_{1},k_{2})\right)\right|. (49)
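
The construction of the reconstruction map and of κ in (49) can be made concrete with a small sketch; the parameter values below are hypothetical, with p⁽²⁾ = 0 and β⁽²⁾ strictly positive so that condition C.1 holds for i = 2.

```python
import numpy as np

# Sketch of the CEO reconstruction map (hypothetical parameters):
#   Yhat(k1, k2) = lam * Yhat1(k1) * Yhat2(k2),  Yhat_i(k) = gamma_i(k)/alpha_i(k).
# Here p2 = 0 but beta2 is strictly positive, so C.1 holds for i = 2.
lam = 2.0
p = [0.2, 0.0]
alpha = [np.array([0.4, 0.3, 0.2, 0.1]), np.array([0.25, 0.25, 0.25, 0.25])]
beta = [np.array([0.7, 0.1, 0.1, 0.1]), np.array([0.4, 0.3, 0.2, 0.1])]

Yhat_i = []
for i in range(2):
    gamma = p[i] * alpha[i] + (1.0 - p[i]) * beta[i]
    Yhat_i.append(np.where(alpha[i] > 0, gamma / alpha[i], 1.0))

Yhat = lam * np.outer(Yhat_i[0], Yhat_i[1])     # Yhat[k1, k2]
kappa = np.max(np.abs(np.log(Yhat)))            # as in (49)
print(Yhat.min() > 0, kappa)
```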

Due to the Berger-Tung inner bound [56, Theorem 12.1, p. 295], for a given Δ>0\Delta>0, ϵ¯>0\bar{\epsilon}>0, and all sufficiently large nn, there exists encoders f¯(1)\bar{f}^{(1)} and f¯(2)\bar{f}^{(2)}, and a decoder g¯\bar{g} such that for i{1,2}i\in\{1,2\}

f¯(i):(Y¯j(i):j{1,,n}){1,,L(i)}\displaystyle\bar{f}^{(i)}:(\bar{Y}^{(i)}_{j}:j\in\{1,\dots,n\})\to\{1,\dots,L^{(i)}\}
g¯:{1,,L(1)}×{1,,L(2)}(Y¯^j:j{1,,n}),\displaystyle\bar{g}:\{1,\dots,L^{(1)}\}\times\{1,\dots,L^{(2)}\}\to(\hat{\bar{Y}}_{j}:j\in\{1,\dots,n\}),

satisfying

1nlog(L(i))I(U¯(i);Y¯(i))+ϵ¯,\displaystyle\frac{1}{n}\log(L^{(i)})\leq I(\bar{U}^{(i)};\bar{Y}^{(i)})+\bar{\epsilon}, (50)
𝔼[1nj=1nd¯(Y¯^j,Yj¯)]𝔼[d¯(Y¯^,Y¯)]+ϵ¯.\displaystyle\mathbb{E}\left[\frac{1}{n}\sum_{j=1}^{n}\bar{d}(\hat{\bar{Y}}_{j},\bar{Y_{j}})\right]\leq\mathbb{E}[\bar{d}(\hat{\bar{Y}},\bar{Y})]+\bar{\epsilon}. (51)

It is noteworthy that the Berger-Tung inner bound includes a conditioning term in the mutual-information expression, which in general yields a stronger bound than the one stated here. However, in our setting this conditioning can be dropped, as explained in Remark 3 in the main paper.

Given the above setup, each encoder f(i)f^{(i)} upon observing Y0(i),TY_{0}^{(i),T} obtains the binary valued discrete-time process (Y¯j(i):j{1,,n})(\bar{Y}^{(i)}_{j}:j\in\{1,\dots,n\}), and sends M(i)=f¯(i)(Y¯j(i):j{1,,n})M^{(i)}=\bar{f}^{(i)}(\bar{Y}^{(i)}_{j}:j\in\{1,\dots,n\}) to the decoder. The decoder outputs the reconstruction Y^0T\hat{Y}_{0}^{T} as

Y^tj=1nY¯^j𝟏{t((j1)Δ,jΔ]}t[0,T].\displaystyle\hat{Y}_{t}\triangleq\sum_{j=1}^{n}\hat{\bar{Y}}_{j}\mathbf{1}\left\{t\in((j-1)\Delta,j\Delta]\right\}\quad t\in[0,T].

Let Y¯¯j\bar{\bar{Y}}_{j} denote the actual number of arrivals of Y0TY_{0}^{T} in an interval ((j1)Δ,jΔ]((j-1)\Delta,j\Delta]. Then d¯\bar{d} is related to the original distortion function via the above reconstruction as follows:

1Td(Y^0T;Y0T)\displaystyle\frac{1}{T}d(\hat{Y}_{0}^{T};Y_{0}^{T}) =1T0TY^t𝑑t1T0Tlog(Y^t)𝑑Yt\displaystyle=\frac{1}{T}\int_{0}^{T}\hat{Y}_{t}\,dt-\frac{1}{T}\int_{0}^{T}\log(\hat{Y}_{t})\,dY_{t}
=1nj=1nY¯^j1Tj=1nlog(Y¯^j)Y¯¯j\displaystyle=\frac{1}{n}\sum_{j=1}^{n}\hat{\bar{Y}}_{j}-\frac{1}{T}\sum_{j=1}^{n}\log(\hat{\bar{Y}}_{j})\bar{\bar{Y}}_{j}
=1nj=1nY¯^j1nΔj=1nlog(Y¯^j)Y¯j1Tj=1nlog(Y¯^j)(Y¯¯j1)𝟏{Y¯¯j>1}.\displaystyle=\frac{1}{n}\sum_{j=1}^{n}\hat{\bar{Y}}_{j}-\frac{1}{n\Delta}\sum_{j=1}^{n}\log(\hat{\bar{Y}}_{j})\bar{Y}_{j}-\frac{1}{T}\sum_{j=1}^{n}\log(\hat{\bar{Y}}_{j})(\bar{\bar{Y}}_{j}-1)\mathbf{1}\{\bar{\bar{Y}}_{j}>1\}.
=1nj=1nd¯(Y¯^j,Y¯j)1Tj=1nlog(Y¯^j)(Y¯¯j1)𝟏{Y¯¯j>1}\displaystyle=\frac{1}{n}\sum_{j=1}^{n}\bar{d}(\hat{\bar{Y}}_{j},\bar{Y}_{j})-\frac{1}{T}\sum_{j=1}^{n}\log(\hat{\bar{Y}}_{j})(\bar{\bar{Y}}_{j}-1)\mathbf{1}\{\bar{\bar{Y}}_{j}>1\}
(a)1nj=1nd¯(Y¯^j,Y¯j)+κTj=1n(Y¯¯j1)𝟏{Y¯¯j>1}\displaystyle\overset{(a)}{\leq}\frac{1}{n}\sum_{j=1}^{n}\bar{d}(\hat{\bar{Y}}_{j},\bar{Y}_{j})+\frac{\kappa}{T}\sum_{j=1}^{n}(\bar{\bar{Y}}_{j}-1)\mathbf{1}\{\bar{\bar{Y}}_{j}>1\}
1nj=1nd¯(Y¯^j,Y¯j)+κTj=1nY¯¯j𝟏{Y¯¯j>1},\displaystyle\leq\frac{1}{n}\sum_{j=1}^{n}\bar{d}(\hat{\bar{Y}}_{j},\bar{Y}_{j})+\frac{\kappa}{T}\sum_{j=1}^{n}\bar{\bar{Y}}_{j}\mathbf{1}\{\bar{\bar{Y}}_{j}>1\},

where for (a), we have used the definition of κ\kappa in (49).
Hence taking the expectation, we get

𝔼[1Td(Y^0T;Y0T)]\displaystyle\mathbb{E}\left[\frac{1}{T}d(\hat{Y}_{0}^{T};Y_{0}^{T})\right] 𝔼[1nj=1nd¯(Y¯^j,Y¯j)]+κ𝔼[1Tj=1nY¯¯j𝟏{Y¯¯j>1}]\displaystyle\leq\mathbb{E}\left[\frac{1}{n}\sum_{j=1}^{n}\bar{d}(\hat{\bar{Y}}_{j},\bar{Y}_{j})\right]+\kappa\mathbb{E}\left[\frac{1}{T}\sum_{j=1}^{n}\bar{\bar{Y}}_{j}\mathbf{1}\{\bar{\bar{Y}}_{j}>1\}\right]
(a)𝔼[d¯(Y¯^,Y¯)]+κ𝔼[1Tj=1nY¯¯j𝟏{Y¯¯j>1}]+ϵ¯\displaystyle\overset{(a)}{\leq}\mathbb{E}[\bar{d}(\hat{\bar{Y}},\bar{Y})]+\kappa\mathbb{E}\left[\frac{1}{T}\sum_{j=1}^{n}\bar{\bar{Y}}_{j}\mathbf{1}\{\bar{\bar{Y}}_{j}>1\}\right]+\bar{\epsilon}
=(b)𝔼[d¯(Y¯^,Y¯)]+κ(λλexp(λΔ))+ϵ¯\displaystyle\overset{(b)}{=}\mathbb{E}[\bar{d}(\hat{\bar{Y}},\bar{Y})]+\kappa(\lambda-\lambda\exp(-\lambda\Delta))+\bar{\epsilon}
(c)𝔼[d¯(Y¯^,Y¯)]+κλ2Δ+ϵ¯,\displaystyle\overset{(c)}{\leq}\mathbb{E}[\bar{d}(\hat{\bar{Y}},\bar{Y})]+\kappa\lambda^{2}\Delta+\bar{\epsilon}, (52)

where, for (a), we have used (51),
for (b), we note that Y¯¯j\bar{\bar{Y}}_{j} is Poisson with mean λΔ\lambda\Delta, so 𝔼[Y¯¯j𝟏{Y¯¯j>1}]=𝔼[Y¯¯j](Y¯¯j=1)=λΔλΔexp(λΔ)\mathbb{E}[\bar{\bar{Y}}_{j}\mathbf{1}\{\bar{\bar{Y}}_{j}>1\}]=\mathbb{E}[\bar{\bar{Y}}_{j}]-\mathbb{P}(\bar{\bar{Y}}_{j}=1)=\lambda\Delta-\lambda\Delta\exp(-\lambda\Delta), and
for (c), we have used the inequality 1uexp(u)1-u\leq\exp(-u).
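
As a numerical aside (not part of the proof), both the pathwise decomposition above and the expectation computation in step (b) can be checked directly. The sketch below uses placeholder values for λ\lambda, Δ\Delta, and nn, an arbitrary positive reconstruction, and reads off the per-bin form d¯(y^,y¯)=y^y¯log(y^)/Δ\bar{d}(\hat{y},\bar{y})=\hat{y}-\bar{y}\log(\hat{y})/\Delta implied by the second-to-last equality above.

import numpy as np

rng = np.random.default_rng(0)
lam, Delta, n = 4.0, 0.05, 400                    # placeholder parameters; T = n*Delta
T = n * Delta

# One realization of Y on [0, T]: arrival epochs, per-bin counts, and the binary per-bin process.
arrivals = np.sort(rng.uniform(0.0, T, rng.poisson(lam * T)))
bins = np.minimum((arrivals // Delta).astype(int), n - 1)
Ybb = np.bincount(bins, minlength=n)              # actual counts  (Y-double-bar)
Yb = np.minimum(Ybb, 1)                           # binary counts  (Y-bar)
Yhat = rng.uniform(0.5, 2.0, n)                   # arbitrary positive reconstruction values

# Pathwise decomposition: (1/T) d(Y_hat, Y) versus the right-hand side of the last equality.
lhs = np.sum(Yhat) / n - np.sum(np.log(Yhat[bins])) / T
d_bar = Yhat - Yb * np.log(Yhat) / Delta          # per-bin distortion, read off from the display
rhs = np.mean(d_bar) - np.sum(np.log(Yhat) * (Ybb - 1) * (Ybb > 1)) / T
print(lhs, rhs)                                   # agree up to floating-point error

# Step (b): E[N 1{N>1}] = lam*Delta - lam*Delta*exp(-lam*Delta) for N ~ Poisson(lam*Delta),
# which is at most (lam*Delta)^2 by 1 - u <= exp(-u) (step (c), per bin).
N = rng.poisson(lam * Delta, size=1_000_000)
print(np.mean(N * (N > 1)), lam * Delta * (1 - np.exp(-lam * Delta)), (lam * Delta) ** 2)
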
Moreover using (50), for i{1,2}i\in\{1,2\}

1Tlog(L(i))=1nΔlog(L(i))I(U¯(i);Y¯(i))Δ+ϵ¯Δ.\displaystyle\frac{1}{T}\log(L^{(i)})=\frac{1}{n\Delta}\log(L^{(i)})\leq\frac{I(\bar{U}^{(i)};\bar{Y}^{(i)})}{\Delta}+\frac{\bar{\epsilon}}{\Delta}. (53)

The scaling of the mutual information I(U¯(i);Y¯(i))I(\bar{U}^{(i)};\bar{Y}^{(i)}) and the distortion function d¯(Y¯^,Y¯)\bar{d}(\hat{\bar{Y}},\bar{Y}) with respect to Δ\Delta is given by the following lemma.

Lemma 14

For i{1,2}i\in\{1,2\}

limΔ0I(U¯(i);Y¯(i))Δ\displaystyle\lim_{\Delta\to 0}\frac{I(\bar{U}^{(i)};\bar{Y}^{(i)})}{\Delta} =R(i),\displaystyle={R}^{(i)},
limΔ0𝔼[d¯(Y¯^,Y¯)]\displaystyle\lim_{\Delta\to 0}\mathbb{E}[\bar{d}(\hat{\bar{Y}},\bar{Y})] =D.\displaystyle={D}.
Proof:

Please see the supplementary material. ∎

Now given the rate-distortion vector (R(1),R(2),D)({R}^{(1)},{R}^{(2)},{D}) and ϵ>0\epsilon>0, first choose Δ\Delta sufficiently small so that

I(U¯(i);Y¯(i))Δ\displaystyle\frac{I(\bar{U}^{(i)};\bar{Y}^{(i)})}{\Delta} R(i)+ϵ4,\displaystyle\leq{R}^{(i)}+\frac{\epsilon}{4},
𝔼[d¯(Y¯^,Y¯)]\displaystyle\mathbb{E}[\bar{d}(\hat{\bar{Y}},\bar{Y})] D+ϵ4,\displaystyle\leq{D}+\frac{\epsilon}{4},
κλ2Δ\displaystyle\kappa\lambda^{2}\Delta ϵ/4.\displaystyle\leq\epsilon/4.

Then let ϵ¯=Δϵ/4\bar{\epsilon}=\Delta\epsilon/4, and choose a sufficiently large nn so that (50) and (51) are satisfied. From (52) and (53) we conclude that a sequence of (Tn,R(1)+ϵ,R(2)+ϵ,D+ϵ)(T_{n},{R}^{(1)}+\epsilon,{R}^{(2)}+\epsilon,{D}+\epsilon) codes exists with Tn=nΔT_{n}=n\Delta when at least one of the conditions C.1 or C.2 is satisfied.

Now consider the case when p(i)=0p^{(i)}=0 for some i{1,2}i\in\{1,2\} and, for that ii, βk(i)=0\beta^{(i)}_{k}=0 for some values of kk. Say p(1)=0p^{(1)}=0 and p(2)>0p^{(2)}>0. This gives us γk(1)=βk(1)\gamma^{(1)}_{k}=\beta^{(1)}_{k} for k{1,2,3,4}k\in\{1,2,3,4\}. Then we need to show that the rate-distortion vector

R(1)=(λ+μ(1))k=14βk(1)log(βk(1)αk(1))\displaystyle{R}^{(1)}=\left(\lambda+\mu^{(1)}\right)\sum_{k=1}^{4}\beta^{(1)}_{k}\log\left(\frac{\beta^{(1)}_{k}}{\alpha^{(1)}_{k}}\right)
R(2)=((1p(2))λ+μ(2))k=14βk(2)log(βk(2)αk(2))\displaystyle{R}^{(2)}=\left((1-p^{(2)})\lambda+\mu^{(2)}\right)\sum_{k=1}^{4}\beta^{(2)}_{k}\log\left(\frac{\beta^{(2)}_{k}}{\alpha^{(2)}_{k}}\right)
D=λϕ(λ)λ(k=14βk(1)log(βk(1)αk(1))+k=14γk(2)log(γk(2)αk(2)))\displaystyle{D}=\lambda-\phi(\lambda)-\lambda\left(\sum_{k=1}^{4}\beta^{(1)}_{k}\log\left(\frac{\beta^{(1)}_{k}}{\alpha^{(1)}_{k}}\right)+\sum_{k=1}^{4}\gamma^{(2)}_{k}\log\left(\frac{\gamma^{(2)}_{k}}{\alpha^{(2)}_{k}}\right)\right) (54)

is achievable. Let [β^k(1)]k=14=[1/4,1/4,1/4][\hat{\beta}^{(1)}_{k}]_{k=1}^{4}=[1/4,1/4,1/4] and [α^k(1)]k=14=[1/4,1/4,1/4ν,1/4+ν][\hat{\alpha}^{(1)}_{k}]_{k=1}^{4}=[1/4,1/4,1/4-\nu,1/4+\nu] for some ν[0,1/3)\nu\in[0,1/3). Then the term

k=14β^k(1)log(β^k(1)α^k(1))\sum_{k=1}^{4}\hat{\beta}^{(1)}_{k}\log\left(\frac{\hat{\beta}^{(1)}_{k}}{\hat{\alpha}^{(1)}_{k}}\right)

is continuous in ν\nu and goes from zero to infinity as ν\nu is increased from zero to 1/41/4, hence there exists some ν^[0,1/4)\hat{\nu}\in[0,1/4) such that with [α^k(1)]k=14=[1/4,1/4,1/4ν^,1/4+ν^][\hat{\alpha}^{(1)}_{k}]_{k=1}^{4}=[1/4,1/4,1/4-\hat{\nu},1/4+\hat{\nu}],

k=14β^k(1)log(β^k(1)α^k(1))=k=14βk(1)log(βk(1)αk(1)).\displaystyle\sum_{k=1}^{4}\hat{\beta}^{(1)}_{k}\log\left(\frac{\hat{\beta}^{(1)}_{k}}{\hat{\alpha}^{(1)}_{k}}\right)=\sum_{k=1}^{4}\beta^{(1)}_{k}\log\left(\frac{\beta^{(1)}_{k}}{\alpha^{(1)}_{k}}\right). (55)

We note that this [β^k(1)]k=14[\hat{\beta}^{(1)}_{k}]_{k=1}^{4} satisfies condition C.1. Hence the rate-distortion vector in (54) is achievable by using [α^k(1)]k=14[\hat{\alpha}^{(1)}_{k}]_{k=1}^{4} and [β^k(1)]k=14[\hat{\beta}^{(1)}_{k}]_{k=1}^{4} satisfying (55) in place of [αk(1)]k=14[\alpha^{(1)}_{k}]_{k=1}^{4} and [βk(1)]k=14[\beta^{(1)}_{k}]_{k=1}^{4}. The case when p(2)=0p^{(2)}=0 or both p(1)=p(2)=0p^{(1)}=p^{(2)}=0 can be handled similarly.
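
For concreteness, the value ν^\hat{\nu} in (55) can be located numerically by exploiting the continuity and monotonicity of the above sum in ν\nu. The sketch below is illustrative only; the variable target is a placeholder for the right-hand side of (55).

import math

def kl_term(nu):
    beta_hat = [0.25, 0.25, 0.25, 0.25]
    alpha_hat = [0.25, 0.25, 0.25 - nu, 0.25 + nu]
    return sum(b * math.log(b / a) for b, a in zip(beta_hat, alpha_hat))

def find_nu_hat(target, tol=1e-12):
    lo, hi = 0.0, 0.25 - 1e-15            # kl_term is continuous and increasing on [0, 1/4)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kl_term(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

target = 0.3                               # placeholder for sum_k beta_k log(beta_k / alpha_k)
nu_hat = find_nu_hat(target)
print(nu_hat, kl_term(nu_hat))             # kl_term(nu_hat) matches target to high accuracy
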

Converse:

We will prove the converse when feedforward is present. For the given (T,R(1)+ϵ,R(2)+ϵ,D+ϵ)(T,R^{(1)}+\epsilon,R^{(2)}+\epsilon,D+\epsilon) code with feedforward, let M(1)M^{(1)} and M(2)M^{(2)} denote the first and second encoder’s output respectively. We essentially repeat the steps in the converse proof of Theorem 4 to show that

1TI(M(1),M(2);Y0T)+Dλϕ(λ)ϵ.\displaystyle\frac{1}{T}I(M^{(1)},M^{(2)};Y_{0}^{T})+D\geq\lambda-\phi(\lambda)-\epsilon.

Since I(M(1),M(2);Y0T)<I(M^{(1)},M^{(2)};Y_{0}^{T})<\infty, we conclude from Theorem 1 that there exists a process Γ0T\Gamma_{0}^{T} such that Γ0T\Gamma_{0}^{T} is the (t=σ(M(1),M(2),Y0t):t[0,T])(\mathcal{F}_{t}=\sigma(M^{(1)},M^{(2)},Y_{0}^{t}):t\in[0,T]) intensity of Y0TY_{0}^{T} and

I(M(1),M(2);Y0T)=𝔼[0Tϕ(Γt)𝑑t]Tϕ(λ).\displaystyle I(M^{(1)},M^{(2)};Y_{0}^{T})=\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]-T\phi(\lambda). (56)

Let Y^0T\hat{Y}_{0}^{T} denote the decoder’s output. The distortion constraint DD satisfies

D1T𝔼[d(Y^0T,Y0T)]ϵ\displaystyle D\geq\frac{1}{T}\mathbb{E}\left[d(\hat{Y}_{0}^{T},Y_{0}^{T})\right]-\epsilon =1T𝔼[0TY^t𝑑tlog(Y^t)dYt]ϵ\displaystyle=\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\hat{Y}_{t}\,dt-\log(\hat{Y}_{t})\,dY_{t}\right]-\epsilon
=1T𝔼[0TY^tlog(Y^t)Γtdt]ϵ,\displaystyle=\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\hat{Y}_{t}-\log(\hat{Y}_{t})\Gamma_{t}\,dt\right]-\epsilon, (57)

where for the last equality we have used Lemma 12. Once again using the inequality ulog(v)ϕ(u)u+v,0u,v<u\log(v)\leq\phi(u)-u+v,\quad 0\leq u,v<\infty, and noting that the individual terms have finite expectations,

𝔼[0Tlog(Y^t)Γt𝑑t]\displaystyle\mathbb{E}\left[\int_{0}^{T}\log(\hat{Y}_{t})\Gamma_{t}\,dt\right] 𝔼[0Tϕ(Γt)Γt+Y^tdt]\displaystyle\leq\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})-\Gamma_{t}+\hat{Y}_{t}\,dt\right]
=𝔼[0Tϕ(Γt)𝑑t]𝔼[0TΓt𝑑t]+𝔼[0TY^t𝑑t].\displaystyle=\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]-\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt\right]+\mathbb{E}\left[\int_{0}^{T}\hat{Y}_{t}\,dt\right]. (58)

Combining these inequalities, we obtain

1TI(M(1),M(2);Y0T)+D\displaystyle\frac{1}{T}I(M^{(1)},M^{(2)};Y_{0}^{T})+D 1T𝔼[0Tϕ(Γt)𝑑t]ϕ(λ)\displaystyle\geq\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\phi(\Gamma_{t})\,dt\right]-\phi(\lambda)
+1T𝔼[0TY^t𝑑t]1T𝔼[0Tlog(Y^t)𝑑Yt]ϵ\displaystyle\qquad+\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\hat{Y}_{t}\,dt\right]-\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\log(\hat{Y}_{t})\,dY_{t}\right]-\epsilon
(a)1T𝔼[0TΓt𝑑t]ϕ(λ)ϵ\displaystyle\overset{(a)}{\geq}\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt\right]-\phi(\lambda)-\epsilon
=(b)λϕ(λ)ϵ,\displaystyle\overset{(b)}{=}\lambda-\phi(\lambda)-\epsilon, (59)

where, for (a) we have used (57) and (58) and
for (b) we use the fact that 𝔼[0TΓt𝑑t]=𝔼[0T𝑑Yt]=λT\mathbb{E}\left[\int_{0}^{T}\Gamma_{t}\,dt\right]=\mathbb{E}\left[\int_{0}^{T}\,dY_{t}\right]=\lambda T.

We can upper bound the term I(M(1),M(2);Y0T)I(M^{(1)},M^{(2)};Y_{0}^{T}) as

I(M(1),M(2);Y0T)\displaystyle I(M^{(1)},M^{(2)};Y_{0}^{T}) =(a)H(M(1),M(2))𝔼[H(M(1),M(2)|Y0T)]\displaystyle\overset{(a)}{=}H(M^{(1)},M^{(2)})-\mathbb{E}\left[H(M^{(1)},M^{(2)}|Y_{0}^{T})\right]
=(b)H(M(1),M(2))𝔼[H(M(1)|Y0T)]𝔼[H(M(2)|Y0T)]\displaystyle\overset{(b)}{=}H(M^{(1)},M^{(2)})-\mathbb{E}\left[H(M^{(1)}|Y_{0}^{T})\right]-\mathbb{E}\left[H(M^{(2)}|Y_{0}^{T})\right]
H(M(1))+H(M(2))𝔼[H(M(1)|Y0T)]𝔼[H(M(2)|Y0T)]\displaystyle\overset{}{\leq}H(M^{(1)})+H(M^{(2)})-\mathbb{E}\left[H(M^{(1)}|Y_{0}^{T})\right]-\mathbb{E}\left[H(M^{(2)}|Y_{0}^{T})\right]
=I(M(1);Y0T)+I(M(2);Y0T),\displaystyle=I(M^{(1)};Y_{0}^{T})+I(M^{(2)};Y_{0}^{T}), (60)

where, for (a) we have used Lemma 2 and
for (b), we used the Markov chain M(1)Y0TM(2)M^{(1)}\leftrightarrows{Y}_{0}^{T}\leftrightarrows M^{(2)}.
Combining (59) and (60) we get

Dλϕ(λ)1TI(M(1);Y0T)1TI(M(2);Y0T)ϵ.\displaystyle D\geq\lambda-\phi(\lambda)-\frac{1}{T}I(M^{(1)};Y_{0}^{T})-\frac{1}{T}I(M^{(2)};Y_{0}^{T})-\epsilon. (61)

For i{1,2}i\in\{1,2\}, using Lemma 2

1TI(M(i);Y0(i),T)=1TH(M(i))1Tlog(exp((R(i)+ϵ)T))R(i)+ϵ+1T.\displaystyle\frac{1}{T}I(M^{(i)};{Y}^{(i),T}_{0})=\frac{1}{T}H(M^{(i)})\leq\frac{1}{T}\log\left(\lceil\exp((R^{(i)}+\epsilon)T)\rceil\right)\leq R^{(i)}+\epsilon+\frac{1}{T}. (62)
Figure 2: Thinning and superposition operations defined in the proof for the first encoder. Note that the joint distribution of (Y0(1),T,Y~0(1),T,Y0T)(Y_{0}^{(1),T},\tilde{Y}_{0}^{(1),T},Y_{0}^{T}) is the same as that of (Y0(1),T,Z~0(1),T,Z0T)(Y_{0}^{(1),T},\tilde{Z}_{0}^{(1),T},Z_{0}^{T}).

We will first consider the case when p(i)<1p^{(i)}<1 for i{1,2}i\in\{1,2\}. We shall proceed by defining certain auxiliary processes (see Figure 2). Let Z~0(i),T\tilde{Z}_{0}^{(i),T} be obtained by p~(i)\tilde{p}^{(i)}-thinning of Y0(i),TY_{0}^{(i),T}, where

p~(i)=μ(i)((1p(i))λ+μ(i)).\tilde{p}^{(i)}=\frac{\mu^{(i)}}{((1-p^{(i)})\lambda+\mu^{(i)})}.

Then using Lemma 11 we can write

Yt(i)=Z~t(i)+Z^t(i)t[0,T],Y_{t}^{(i)}=\tilde{Z}_{t}^{(i)}+\hat{Z}_{t}^{(i)}\quad t\in[0,T],

where Z~0(i),T\tilde{Z}_{0}^{(i),T} and Z^0(i),T\hat{Z}_{0}^{(i),T} are independent Poisson processes with rates (1p(i))λ(1-p^{(i)})\lambda and μ(i)\mu^{(i)} respectively. On the other hand, by definition,

Yt(i)=Y~t(i)+Nt(i)t[0,T],Y_{t}^{(i)}=\tilde{Y}_{t}^{(i)}+{N}_{t}^{(i)}\quad t\in[0,T],

where Y~t(i)\tilde{Y}_{t}^{(i)} and Nt(i)N_{t}^{(i)} are independent Poisson processes with rates (1p(i))λ(1-p^{(i)})\lambda and μ(i)\mu^{(i)} respectively. Hence we conclude that the joint distribution of (Y0(i),T,Z~0(i),T)(Y_{0}^{(i),T},\tilde{Z}_{0}^{(i),T}) is identical to the joint distribution of (Y0(i),T,Y~0(i),T)(Y_{0}^{(i),T},\tilde{Y}_{0}^{(i),T}). Let Z0(i),TZ_{0}^{(i),T} be obtained by adding an independent Poisson process N^0(i),T\hat{N}_{0}^{(i),T} with rate p(i)λp^{(i)}\lambda to Z~0(i),T\tilde{Z}^{(i),T}_{0},

Zt(i)=Z~t(i)+N^t(i)t[0,T].Z_{t}^{(i)}=\tilde{Z}_{t}^{(i)}+\hat{N}_{t}^{(i)}\quad t\in[0,T].

Also using Lemma 11 we have

Yt=Y~t(i)+Y~~t(i)t[0,T],Y_{t}=\tilde{Y}^{(i)}_{t}+\tilde{\tilde{Y}}^{(i)}_{t}\quad t\in[0,T],

where Y~0(i),T\tilde{Y}_{0}^{(i),T} and Y~~0(i),T\tilde{\tilde{Y}}^{(i),T}_{0} are independent Poisson processes with rates (1p(i))λ(1-p^{(i)})\lambda and p(i)λp^{(i)}\lambda respectively. Hence the joint distributions of (Z0(i),T,Z~0(i),T)(Z_{0}^{(i),T},\tilde{Z}^{(i),T}_{0}) and (Y0T,Y~0(i),T)(Y_{0}^{T},\tilde{Y}^{(i),T}_{0}) are identical. Moreover, M(i)Y0(i),TY~0(i),TY0TM^{(i)}\rightleftarrows Y_{0}^{(i),T}\rightleftarrows\tilde{Y}_{0}^{(i),T}\rightleftarrows Y_{0}^{T} forms a Markov chain and M(i)Y0(i),TZ~0(i),TZ0(i),TM^{(i)}\rightleftarrows Y_{0}^{(i),T}\rightleftarrows\tilde{Z}_{0}^{(i),T}\rightleftarrows Z_{0}^{(i),T} forms a Markov chain. This allows us to write

I(M(i);Z~0(i),T)\displaystyle I\left(M^{(i)};\tilde{Z}_{0}^{(i),T}\right) =I(M(i);Y~0(i),T),\displaystyle=I\left(M^{(i)};\tilde{Y}_{0}^{(i),T}\right),
I(M(i);Z0(i),T)\displaystyle I\left(M^{(i)};{Z}_{0}^{(i),T}\right) =I(M(i);Y0T).\displaystyle=I(M^{(i)};Y_{0}^{T}). (63)
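
The thinning and superposition construction above (and in Figure 2) can be mimicked in a simple simulation. The sketch below is illustrative only: lam, mu, p, and T are placeholder values, each process is represented by its sorted arrival epochs, and p-thinning is taken to delete each point independently with probability p, consistent with the rates used in the proof.

import numpy as np

rng = np.random.default_rng(1)
lam, mu, p, T = 5.0, 2.0, 0.3, 10_000.0    # placeholder rates and horizon

def poisson_process(rate, T):
    return np.sort(rng.uniform(0.0, T, rng.poisson(rate * T)))

def p_thin(points, p_del):
    # p-thinning: delete each point independently with probability p_del, keep the rest.
    return points[rng.random(len(points)) >= p_del]

# Source side: Y has rate lam; Y_tilde is the part of Y retained after p-thinning,
# and Y1 = Y_tilde + N^(1) is the first encoder's observation, rate (1-p)*lam + mu.
Y = poisson_process(lam, T)
Y_tilde = p_thin(Y, p)
Y1 = np.sort(np.concatenate([Y_tilde, poisson_process(mu, T)]))

# Auxiliary processes: Z_tilde is a p_tilde-thinning of Y1, and Z = Z_tilde + N_hat.
p_tilde = mu / ((1.0 - p) * lam + mu)
Z_tilde = p_thin(Y1, p_tilde)
Z = np.sort(np.concatenate([Z_tilde, poisson_process(p * lam, T)]))

# Empirical rates: Z_tilde matches Y_tilde (rate (1-p)*lam) and Z matches Y (rate lam).
print(len(Z_tilde) / T, (1 - p) * lam, len(Z) / T, lam)
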

Since Z~0(i),T\tilde{Z}_{0}^{(i),T} is a μ(i)((1p(i))λ+μ(i))\frac{\mu^{(i)}}{((1-p^{(i)})\lambda+\mu^{(i)})}-thinning of Y0(i),T{Y}^{(i),T}_{0}, Theorem 3 gives

I(M(i);Z~0(i),T)(1μ(i)(1p(i))λ+μ(i))I(M(i);Y0(i),T).\displaystyle I(M^{(i)};\tilde{Z}_{0}^{(i),T})\leq\left(1-\frac{\mu^{(i)}}{(1-p^{(i)})\lambda+\mu^{(i)}}\right)I(M^{(i)};{Y}^{(i),T}_{0}). (64)

Also, since Z0(i),TZ_{0}^{(i),T} is obtained by adding an independent Poisson process with rate p(i)λp^{(i)}\lambda to Z~0(i),T\tilde{Z}^{(i),T}_{0}, Theorem 2 yields

I(M(i);Z~0(i),T)\displaystyle I(M^{(i)};\tilde{Z}_{0}^{(i),T}) =𝔼[0Tϕ(Γ~t(i))ϕ((1p(i))λ)dt],\displaystyle=\mathbb{E}\left[\int_{0}^{T}\phi(\tilde{\Gamma}_{t}^{(i)})-\phi((1-p^{(i)})\lambda)\,dt\right],
I(M(i);Z0(i),T)\displaystyle I(M^{(i)};{Z}_{0}^{(i),T}) 𝔼[0Tϕ(Γ~t(i)+p(i)λ)ϕ(λ)dt],\displaystyle\leq\mathbb{E}\left[\int_{0}^{T}\phi(\tilde{\Gamma}_{t}^{(i)}+p^{(i)}\lambda)-\phi(\lambda)\,dt\right], (65)

where Γ~0(i),T\tilde{\Gamma}_{0}^{(i),T} is the (σ(M(i),Z~0(i),t):t[0,T])(\sigma(M^{(i)},\tilde{Z}_{0}^{(i),t}):t\in[0,T])-intensity of Z~0(i),T\tilde{Z}_{0}^{(i),T}. Then we can further lower bound DD in (61) as

D\displaystyle D λϕ(λ)1TI(M(1);Y0T)1TI(M(2);Y0T)ϵ\displaystyle\geq\lambda-\phi(\lambda)-\frac{1}{T}I(M^{(1)};Y_{0}^{T})-\frac{1}{T}I(M^{(2)};Y_{0}^{T})-\epsilon
=(a)λϕ(λ)1TI(M(1);Z0(1),T)1TI(M(2);Z0(2),T)ϵ\displaystyle\overset{(a)}{=}\lambda-\phi(\lambda)-\frac{1}{T}I(M^{(1)};{Z}_{0}^{(1),T})-\frac{1}{T}I(M^{(2)};{Z}_{0}^{(2),T})-\epsilon
(b)λϕ(λ)1T(𝔼[0Tϕ(Γ~t(1)+p(1)λ)ϕ(λ)dt])\displaystyle\overset{(b)}{\geq}\lambda-\phi(\lambda)-\frac{1}{T}\left(\mathbb{E}\left[\int_{0}^{T}\phi(\tilde{\Gamma}_{t}^{(1)}+p^{(1)}\lambda)-\phi(\lambda)\,dt\right]\right)
1T(𝔼[0Tϕ(Γ~t(2)+p(2)λ)ϕ(λ)dt])ϵ\displaystyle\phantom{===}-\frac{1}{T}\left(\mathbb{E}\left[\int_{0}^{T}\phi(\tilde{\Gamma}_{t}^{(2)}+p^{(2)}\lambda)-\phi(\lambda)\,dt\right]\right)-\epsilon
=(c)λ+ϕ(λ)𝔼[ϕ(Γ~S1(1)+p(1)λ)]𝔼[ϕ(Γ~S2(2)+p(2)λ)]ϵ,\displaystyle\overset{(c)}{=}\lambda+\phi(\lambda)-\mathbb{E}\left[\phi(\tilde{\Gamma}_{S_{1}}^{(1)}+p^{(1)}\lambda)\right]-\mathbb{E}\left[\phi(\tilde{\Gamma}^{(2)}_{S_{2}}+p^{(2)}\lambda)\right]-\epsilon,

where for (a), we have used (63),
for (b), we have used (65), and
for (c), we define S1S_{1} and S2S_{2} to be uniformly distributed on [0,T][0,T], independent of all other random variables and independent of each other as well.
For each i{1,2}i\in\{1,2\}, R(i)R^{(i)} in (62) can be lower bounded as

R(i)\displaystyle R^{(i)} 1TI(M(i);Y0(i),T)ϵ1T\displaystyle\geq\frac{1}{T}I(M^{(i)};{Y}^{(i),T}_{0})-\epsilon-\frac{1}{T}
(a)(1p(i))λ+μ(i)(1p(i))λ1TI(M(i);Z~0(i),T)ϵ1T\displaystyle\overset{(a)}{\geq}\frac{(1-p^{(i)})\lambda+\mu^{(i)}}{(1-p^{(i)})\lambda}\frac{1}{T}I(M^{(i)};\tilde{Z}_{0}^{(i),T})-\epsilon-\frac{1}{T}
=(b)(1p(i))λ+μ(i)(1p(i))λ1T𝔼[0Tϕ(Γ~t(i))ϕ((1p(i))λ)dt]ϵ1T\displaystyle\overset{(b)}{=}\frac{(1-p^{(i)})\lambda+\mu^{(i)}}{(1-p^{(i)})\lambda}\frac{1}{T}\mathbb{E}\left[\int_{0}^{T}\phi(\tilde{\Gamma}_{t}^{(i)})-\phi((1-p^{(i)})\lambda)\,dt\right]-\epsilon-\frac{1}{T}
=(c)(1p(i))λ+μ(i)(1p(i))λ𝔼[ϕ(Γ~Si(i))ϕ((1p(i))λ)]ϵ1T,\displaystyle\overset{(c)}{=}\frac{(1-p^{(i)})\lambda+\mu^{(i)}}{(1-p^{(i)})\lambda}\mathbb{E}\left[\phi(\tilde{\Gamma}_{S_{i}}^{(i)})-\phi((1-p^{(i)})\lambda)\right]-\epsilon-\frac{1}{T},

where for (a), we have used (64),
for (b), we have used (65), and
for (c), recall that S1S_{1} and S2S_{2} are uniformly distributed on [0,T][0,T], independent of all other random variables and independent of each other.
Now we use Carathéodory’s theorem [54, Theorem 17.1]. For each i{1,2}i\in\{1,2\}, there exist non-negative [ηk(i)]k=14[\eta^{(i)}_{k}]_{k=1}^{4} and [αk(i)]k=14[\alpha^{(i)}_{k}]_{k=1}^{4}, such that k=14αk(i)=1\sum_{k=1}^{4}\alpha^{(i)}_{k}=1 and

𝔼[ϕ(Γ~Si(i))]\displaystyle\mathbb{E}\left[\phi(\tilde{\Gamma}_{S_{i}}^{(i)})\right] =k=14αk(i)ϕ(ηk(i)),\displaystyle=\sum_{k=1}^{4}\alpha^{(i)}_{k}\phi(\eta^{(i)}_{k}),
𝔼[ϕ(Γ~Si(i)+p(i)λ)]\displaystyle\mathbb{E}\left[\phi(\tilde{\Gamma}_{S_{i}}^{(i)}+p^{(i)}\lambda)\right] =k=14αk(i)ϕ(ηk(i)+p(i)λ),\displaystyle=\sum_{k=1}^{4}\alpha^{(i)}_{k}\phi(\eta^{(i)}_{k}+p^{(i)}\lambda),
𝔼[Γ~Si(i)]\displaystyle\mathbb{E}\left[\tilde{\Gamma}_{S_{i}}^{(i)}\right] =k=14αk(i)ηk(i)=(1p(i))λ,\displaystyle=\sum_{k=1}^{4}\alpha^{(i)}_{k}\eta^{(i)}_{k}=(1-p^{(i)})\lambda,

where in the last line we have used the fact that, since Γ~0(i),T\tilde{\Gamma}_{0}^{(i),T} is the (σ(M(i),Z~0(i),t):t[0,T])(\sigma(M^{(i)},\tilde{Z}_{0}^{(i),t}):t\in[0,T])-intensity of Z~0(i),T\tilde{Z}_{0}^{(i),T}, we have 𝔼[0TΓ~t(i)𝑑t]=𝔼[Z~T(i)]=T(1p(i))λ\mathbb{E}\left[\int_{0}^{T}\tilde{\Gamma}_{t}^{(i)}\,dt\right]=\mathbb{E}[\tilde{Z}^{(i)}_{T}]=T(1-p^{(i)})\lambda. Hence we have

R(i)\displaystyle R^{(i)} (1p(i))λ+μ(i)(1p(i))λ(k=14αk(i)ϕ(ηk(i))ϕ((1p(i))λ))ϵ1T,\displaystyle\geq\frac{(1-p^{(i)})\lambda+\mu^{(i)}}{(1-p^{(i)})\lambda}\left(\sum_{k=1}^{4}\alpha^{(i)}_{k}\phi(\eta^{(i)}_{k})-\phi((1-p^{(i)})\lambda)\right)-\epsilon-\frac{1}{T}, (66)
D\displaystyle D λ+ϕ(λ)k=14αk(1)ϕ(ηk(1)+p(1)λ)k=14αk(2)ϕ(ηk(2)+p(2)λ)ϵ.\displaystyle\geq\lambda+\phi(\lambda)-\sum_{k=1}^{4}\alpha^{(1)}_{k}\phi(\eta^{(1)}_{k}+p^{(1)}\lambda)-\sum_{k=1}^{4}\alpha^{(2)}_{k}\phi(\eta^{(2)}_{k}+p^{(2)}\lambda)-\epsilon. (67)

Now define

βk(i)αk(i)ηk(i)(1p(i))λ,γk(i)p(i)αk(i)+(1p(i))βk(i).\beta^{(i)}_{k}\triangleq\frac{\alpha^{(i)}_{k}\eta^{(i)}_{k}}{(1-p^{(i)})\lambda},\quad\gamma^{(i)}_{k}\triangleq p^{(i)}\alpha^{(i)}_{k}+(1-p^{(i)})\beta^{(i)}_{k}.

We note that βk(i)=0\beta^{(i)}_{k}=0 if αk(i)=0\alpha^{(i)}_{k}=0, and k=14βk(i)=1\sum_{k=1}^{4}\beta^{(i)}_{k}=1. Substituting the above definitions in (66)

R(i)\displaystyle R^{(i)} (1p(i))λ+μ(i)(1p(i))λ(k=14αk(i)ηk(i)log(ηk(i))ϕ((1p(i))λ))ϵ1T\displaystyle\geq\frac{(1-p^{(i)})\lambda+\mu^{(i)}}{(1-p^{(i)})\lambda}\left(\sum_{k=1}^{4}\alpha^{(i)}_{k}\eta^{(i)}_{k}\log(\eta^{(i)}_{k})-\phi((1-p^{(i)})\lambda)\right)-\epsilon-\frac{1}{T}
=((1p(i))λ+μ(i))(k=14βk(i)log(βk(i)(1p(i))λαk(i))𝟏{αk(i)>0}log((1p(i))λ))\displaystyle=((1-p^{(i)})\lambda+\mu^{(i)})\left(\sum_{k=1}^{4}\beta^{(i)}_{k}\log\left(\frac{\beta^{(i)}_{k}(1-p^{(i)})\lambda}{\alpha^{(i)}_{k}}\right)\mathbf{1}\{\alpha^{(i)}_{k}>0\}-\log((1-p^{(i)})\lambda)\right) (68)
ϵ1T\displaystyle\phantom{=========}-\epsilon-\frac{1}{T}
=((1p(i))λ+μ(i))k=14βk(i)log(βk(i)αk(i))ϵ1T.\displaystyle=((1-p^{(i)})\lambda+\mu^{(i)})\sum_{k=1}^{4}\beta^{(i)}_{k}\log\left(\frac{\beta^{(i)}_{k}}{\alpha^{(i)}_{k}}\right)-\epsilon-\frac{1}{T}. (69)

Likewise,

k=14αk(i)ϕ(ηk(i)+p(i)λ)\displaystyle\sum_{k=1}^{4}\alpha^{(i)}_{k}\phi(\eta^{(i)}_{k}+p^{(i)}\lambda) =k=14αk(i)ϕ(βk(i)(1p(i))λαk(i)+p(i)λ)𝟏{αk(i)>0}\displaystyle=\sum_{k=1}^{4}\alpha^{(i)}_{k}\phi\left(\frac{\beta^{(i)}_{k}(1-p^{(i)})\lambda}{\alpha^{(i)}_{k}}+p^{(i)}\lambda\right)\mathbf{1}\{\alpha^{(i)}_{k}>0\}
=k=14αk(i)ϕ(γk(i)αk(i)λ)𝟏{αk(i)>0}\displaystyle=\sum_{k=1}^{4}\alpha^{(i)}_{k}\phi\left(\frac{\gamma^{(i)}_{k}}{\alpha^{(i)}_{k}}\lambda\right)\mathbf{1}\{\alpha^{(i)}_{k}>0\}
=λk=14γk(i)log(γk(i)αk(i))+ϕ(λ).\displaystyle=\lambda\sum_{k=1}^{4}\gamma^{(i)}_{k}\log\left(\frac{\gamma^{(i)}_{k}}{\alpha^{(i)}_{k}}\right)+\phi(\lambda).

Substituting the above in (67), we get

Dλϕ(λ)λk=14γk(1)log(γk(1)αk(1))λk=14γk(2)log(γk(2)αk(2))ϵ.\displaystyle D\geq\lambda-\phi(\lambda)-\lambda\sum_{k=1}^{4}\gamma^{(1)}_{k}\log\left(\frac{\gamma^{(1)}_{k}}{\alpha^{(1)}_{k}}\right)-\lambda\sum_{k=1}^{4}\gamma^{(2)}_{k}\log\left(\frac{\gamma^{(2)}_{k}}{\alpha^{(2)}_{k}}\right)-\epsilon. (70)
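
As a sanity check on the algebra leading to (69) and (70) (illustrative only, and not part of the proof), the two substitution identities can be verified numerically for randomly generated αk(i)\alpha^{(i)}_{k} and ηk(i)\eta^{(i)}_{k} satisfying the mean constraint; lam and p below are placeholder values.

import numpy as np

rng = np.random.default_rng(2)
lam, p = 3.0, 0.4
phi = lambda u: u * np.log(u)                      # phi(u) = u log u

alpha = rng.dirichlet(np.ones(4))
eta = rng.uniform(0.5, 2.0, 4)
eta *= (1 - p) * lam / np.dot(alpha, eta)          # enforce sum_k alpha_k*eta_k = (1-p)*lam

beta = alpha * eta / ((1 - p) * lam)
gamma = p * alpha + (1 - p) * beta

# Identity behind (69): sum_k alpha_k*phi(eta_k) - phi((1-p)*lam)
#                       = (1-p)*lam * sum_k beta_k*log(beta_k/alpha_k)
lhs1 = np.dot(alpha, phi(eta)) - phi((1 - p) * lam)
rhs1 = (1 - p) * lam * np.dot(beta, np.log(beta / alpha))

# Identity behind (70): sum_k alpha_k*phi(eta_k + p*lam)
#                       = lam * sum_k gamma_k*log(gamma_k/alpha_k) + phi(lam)
lhs2 = np.dot(alpha, phi(eta + p * lam))
rhs2 = lam * np.dot(gamma, np.log(gamma / alpha)) + phi(lam)

print(lhs1, rhs1)                                  # equal up to floating-point error
print(lhs2, rhs2)
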

If either p(i)p^{(i)}, say p(1)p^{(1)}, equals 1, then M(1)M^{(1)} and Y0TY_{0}^{T} are independent so that I(M(1);Y0T)=0I(M^{(1)};Y_{0}^{T})=0, and we can repeat the above steps to show that

R(2)\displaystyle R^{(2)} ((1p(2))λ+μ(2))k=14βk(2)log(βk(2)αk(2))ϵ1T,\displaystyle\geq((1-p^{(2)})\lambda+\mu^{(2)})\sum_{k=1}^{4}\beta^{(2)}_{k}\log\left(\frac{\beta^{(2)}_{k}}{\alpha^{(2)}_{k}}\right)-\epsilon-\frac{1}{T},
D\displaystyle D λϕ(λ)λk=14γk(2)log(γk(2)αk(2))ϵ,\displaystyle\geq\lambda-\phi(\lambda)-\lambda\sum_{k=1}^{4}\gamma^{(2)}_{k}\log\left(\frac{\gamma^{(2)}_{k}}{\alpha^{(2)}_{k}}\right)-\epsilon,

which is the region in (69)-(70) with αk(1)=βk(1)=γk(1)\alpha^{(1)}_{k}=\beta^{(1)}_{k}=\gamma^{(1)}_{k} for k{1,2,3,4}k\in\{1,2,3,4\}.

Since ϵ\epsilon is arbitrary, taking ϵ0\epsilon\to 0 and TT\to\infty gives us the rate region in the statement of the theorem.