Functional Covering of Point Processes
Abstract
We introduce a new distortion measure for point processes called functional-covering distortion. It is inspired by intensity theory and is related to both the covering of point processes and logarithmic-loss distortion. We obtain the distortion-rate function with feedforward under this distortion measure for a large class of point processes. For Poisson processes, the rate-distortion function is obtained under a more general distortion measure called constrained functional-covering distortion, of which both covering and functional covering are special cases. Also for Poisson processes, we characterize the rate-distortion region for a two-encoder CEO problem and show that feedforward does not enlarge this region.
I Introduction
The classical theory of compression [2] focuses on discrete-time, sequential sources. The theory is thus well-suited to text, audio, speech, genomic data, and the like. Continuous-time signals are typically handled by reducing to discrete-time via projection onto a countable basis. Multi-dimensional extensions enable application to images and video.
Point processes model a distinct data type that appears in diverse domains such as neuroscience [3, 4, 5, 6, 7, 8], communication networks [9, 10, 11], imaging [12, 13], blockchains [14, 15, 16, 17], and photonics [18, 19, 20, 21, 22]. Formally, a point process can be viewed as a random counting measure on some space of interest [23], or, if the space is the real line, as a random counting function; we shall adopt the latter view. Informally, it may be viewed as simply a random collection of points representing epochs in time or points in space.
Compression of point processes emerges naturally in several of the above domains. Sub-cranial implants need to communicate the timing of neural firings to a monitoring station over a wireless link that is low-rate because it must traverse the skull [24, 25]. In network flow correlation analysis, one cross-correlates packet timings from different links in the network [11]; this requires communication of the packet timings from one place to another. Compressing point-process realizations in 2-D (also known as point clouds) arises in computer vision [26, 27, 28].
Various specialized approaches have been developed for compressing point processes, and in particular for measuring distortion. One natural approach is for the compressed representation to be itself a point-process realization. In this case, the distortion can be the sum of the absolute differences between the actual and reconstructed epochs, with the constraint that the two processes have the same number of points. For the Poisson point process, Gallager [29] obtained a lower bound on the rate-distortion function by insisting on causal reconstruction of the points but allowing them to be reordered. Bedekar [30] determined the rate-distortion function with the additional constraint that the reconstruction preserve the exact order of the epochs. Verdú [31] allowed the reconstruction to be non-causal. Coleman et al. [32] introduced the queueing distortion function, where the reproduced epochs lead the actual epochs. Rubin [33] used the distance between the counting functions as a distortion measure. In a more general setting, Koliander et al. [34] gave upper and lower bounds on the rate-distortion function under a generic distortion defined between pairs of point processes.
Most relevant to the present paper, Lapidoth et al. [35] introduced a covering distortion measure, where the reconstruction of a point process on is a subset of that must contain all the points, and the distortion is the Lebesgue measure of the covering set (see also Shen et al. [36]).
If we encode the subset as an indicator function
then guarantees that no point occurred at time , while indicates that a point may occur at . More generally, could encode the relative belief that there is a point at . Inspired by this observation, and the notion of logarithmic-loss distortion [37, 38], we consider the following formulation. For a realization of a counting (or point) process (i.e., is integer-valued, non-decreasing, and has unit jumps) and a non-negative reconstruction , we define the functional-covering distortion as
$$d\bigl(x^T,\hat{\lambda}^T\bigr) \;=\; \int_0^T \hat{\lambda}_t\,dt \;-\; \int_0^T \log \hat{\lambda}_t\,dx_t. \qquad (1)$$
This is related to the covering distortion measure in the following sense. If we impose that , then (1) reduces to the covering distortion measure. Yet it is natural to consider the distortion in (1) without such a restriction, or with a more general set of allowable values for . In fact, there are advantages to not restricting to the set . Consider a remote source setting where the encoder cannot access the point-process source directly, but instead observes a thinned version where some of the points in the source point process are deleted randomly. Then, in case of the covering distortion the reconstruction can only be the entire interval (i.e. ). On the other hand, under functional covering distortion the problem has a nontrivial solution.
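As a numerical sketch (not from the paper itself), the functional-covering distortion can be computed for a simulated realization. We assume here that (1) has the form given above: the integral of the reconstruction over the horizon minus the sum of its logarithm over the arrival epochs. With the all-ones indicator reconstruction, the log term vanishes and the distortion reduces to the Lebesgue measure of the covering set, exactly as the covering special case requires. All parameter values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
T, lam = 10.0, 2.0  # horizon and Poisson rate (illustrative values)

# Sample Poisson arrival epochs on [0, T] via i.i.d. exponential interarrivals.
gaps = rng.exponential(1 / lam, size=int(5 * lam * T))
epochs = np.cumsum(gaps)
epochs = epochs[epochs < T]

def fc_distortion(epochs, lam_hat, T, n=200_000):
    """Assumed form of (1): integral of lam_hat over [0, T]
    minus the sum of log(lam_hat) over the arrival epochs."""
    dt = T / n
    grid = (np.arange(n) + 0.5) * dt          # midpoint rule for the integral
    return lam_hat(grid).sum() * dt - np.log(lam_hat(epochs)).sum()

# Indicator reconstruction covering all of [0, T]: log term is zero, and the
# distortion equals the Lebesgue measure of the covering set, i.e. T.
d_cover = fc_distortion(epochs, lambda t: np.ones_like(t), T)

# A constant reconstruction equal to the true rate, for comparison.
d_rate = fc_distortion(epochs, lambda t: np.full_like(t, lam), T)
print(d_cover, d_rate)
```

The same routine accepts any non-negative reconstruction, which is what makes the unconstrained formulation (and the remote-source setting described above) nontrivial.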
The relation of the functional-covering distortion measure to logarithmic loss is as follows. If we constrain to be bounded, then we can use a Girsanov-type transformation [39, Chapter VI, Theorems T2-T4] to define a probability measure on the set of all counting processes using , and the distortion can be defined as the expectation of the negative logarithm of the Radon-Nikodym derivative between this probability measure and an appropriately chosen reference measure, evaluated at the source realization, which is equivalent to (1). However, we will allow to be unbounded but integrable .
The relation to intensity theory is as follows. Heuristically, given a random variable , the intensity of a point process represented by a counting function is a non-negative process such that (see Definition 2 for the precise statement). From (1), we expect any optimal (in the rate-distortion trade-off sense) to be related to the intensity of . In fact, we will see in the proof of Theorem 4 that an optimal reconstruction is the intensity of given the encoder’s output.
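The pointwise optimality of the intensity can be checked numerically. Under the assumed form of (1), each time instant contributes a term of the form (reconstruction value) minus (conditional intensity times its logarithm), and this is minimized exactly when the reconstruction equals the conditional intensity. A minimal sketch with a hypothetical intensity value:

```python
import numpy as np

mu = 1.7  # hypothetical conditional intensity value at a fixed time
a = np.linspace(0.01, 5.0, 500_000)  # candidate reconstruction values

# Pointwise contribution to the expected distortion under the assumed
# form of (1): a - mu*log(a). The minimizer should be a = mu.
objective = a - mu * np.log(a)
a_star = a[np.argmin(objective)]
print(a_star)  # ≈ mu
```

This is the calculus behind the claim that an optimal reconstruction is the intensity of the source given the encoder's output.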
Beyond the introduction of the functional-covering distortion measure and the accompanying coding theorems, the paper provides a collection of results for the information-theoretic analysis of point processes, which may be of independent use. One such contribution is Theorem 1, in which we derive the mutual information between point processes with intensities and arbitrary random variables. This is the most general expression available for mutual information involving point processes with intensities. Theorem 1 subsumes the existing formulas for mutual information involving doubly stochastic Poisson processes [40, 41, 42] and queueing processes [43] as special cases. The remaining results are as follows. We obtain the rate-distortion trade-off with feedforward under the functional-covering distortion measure for point processes that admit intensities (see Theorem 4). For Poisson processes, we obtain the rate-distortion region when the reconstruction function is constrained to take values in a subset of the reals (Theorem 5). The covering distortion in [35, Theorem 1] is a special case of this constrained functional-covering distortion, hence the rate-distortion function in [35] can be obtained as a special case of this theorem. We characterize the rate-distortion region for a two-encoder Poisson CEO problem (see Figure 1) under functional-covering distortion in Theorem 6. To prove the converse for the CEO problem, we derive a strong data processing inequality for Poisson processes under superposition (see Theorem 2), which complements the strong data processing inequality for Poisson processes under thinning due to Wang [44]. We also provide a self-contained proof of Wang’s theorem in Theorem 3. The solution to the CEO problem yields the rate-distortion trade-off for remote Poisson sources as an immediate corollary.
II Preliminaries
We will consider a probability space on which all stochastic processes in this paper are defined. For a finite , let be an increasing family of -fields with . We will assume that the given filtration , , and satisfy the “usual conditions” [39, Chapter III, p. 75]: is complete with respect to , is right continuous, and contains all the -null sets of . Stochastic processes are denoted as . The process is said to be adapted to the history if is measurable for all . The internal history recorded by the process is denoted by , where denotes the -field generated by .
A process is called -predictable if is measurable and the mapping defined from into (the set of real numbers) is measurable with respect to the -field over generated by rectangles of the form
(2) |
For two measurable spaces and , the product space is denoted by . We say that forms a Markov chain under measure if and are conditionally independent given under . denotes that the probability measure is absolutely continuous with respect to the measure . denotes the indicator function for an event . is the natural logarithm of . and denote the positive () and the negative part () of respectively. denotes the ceiling of . Throughout this paper we will adopt the convention that , , and .
Definition 1
with convention that .
We note that is convex.
We will use the following form of Jensen’s inequality [45, Theorem 7.9, p. 149] and [45, Theorem 8.20, p. 177].
Lemma 1
If is a convex function and then exists and for any two -fields and ,
We now recall the definition of mutual information for general ensembles and its properties. Let , , and be measurable mappings defined on a given probability space , taking values in , , and respectively. Consider partitions of , and . Wyner defined the conditional mutual information as [46]
(3) |
where the supremum is over all such partitions of . Wyner showed that with equality if and only if forms a Markov chain [46, Lemma 3.1], and that (what is generally referred to as) Kolmogorov’s formula holds [46, Lemma 3.2]
(4) |
Hence if , then . The data processing inequality can be obtained from (4) as well: if forms a Markov chain, then .
Denote by , the joint distribution of and on the space ( ), i.e.,
Similarly, and denote the marginal distributions. Gelfand and Yaglom [47] proved that if , then the mutual information (defined via (3) by taking to be the trivial -field) can be computed as:
(5) |
A sufficient condition for is that [48, Lemma 5.2.3, p. 92]. We will also require the following result [46, Lemma 2.1]:
Lemma 2 (Wyner’s Lemma)
If is a finite alphabet random variable, then
where
and is the entropy of .
III Point Processes, Intensities, and Mutual Information
Let denote the set of counting realizations (or point-process realizations) on , i.e., if , then for , (the set of non-negative integers), is right continuous, and has unit increasing jumps with . Let be the restriction of the -field generated by the Skorohod topology on to .
Definition 2
If is a counting process adapted to the history , then is said to have -intensity , where is a non-negative measurable process if
-
•
is -predictable,
-
•
, -a.s.,
-
•
and for all non-negative -predictable processes :111The limits of the Lebesgue-Stieltjes integral should be interpreted as .
When it is clear from the context, we will drop the probability measure from the notation and say has -intensity .
Definition 3
A point process is said to be a Poisson process with rate if its -intensity is .
The above definition can be shown to imply the usual definition of Poisson process [39, Theorem T4, Chapter II, p. 25] and vice versa [39, Section 2, Chapter II, p. 23].
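The equivalence with the usual definition of a Poisson process can be checked empirically: sampling the process via i.i.d. exponential interarrivals yields counts over a fixed horizon whose mean and variance both match the rate times the horizon length. A Monte Carlo sketch with illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
lam, T, n_trials = 3.0, 2.0, 50_000

# For each trial, draw exponential interarrivals and count how many arrival
# epochs fall in [0, T]. (40 interarrivals suffice: their expected total
# length is 40/lam ≈ 13.3, far beyond T, so truncation is negligible.)
gaps = rng.exponential(1 / lam, size=(n_trials, 40))
epochs = np.cumsum(gaps, axis=1)
counts = (epochs < T).sum(axis=1)

print(counts.mean(), counts.var())  # both should be close to lam*T = 6
```

Equal mean and variance is the signature Poisson property that distinguishes these counts from, say, binomial counts.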
Definition 4
denotes the distribution of a point process (on the space ) under which is a Poisson process with unit rate.
A point process with a stochastic intensity and the Poisson process with unit rate are linked via the following result.
Lemma 3
Let be the distribution of a point process such that . Then there exists a non-negative predictable process such that
Moreover, the -intensity of is . Conversely, if the -intensity of is and , then , and the corresponding Radon-Nikodym derivative is given by the above expression, where
In the latter case,
Proof:
Please see the supplementary material. ∎
The following theorem allows us to express the mutual information involving a point process with an intensity and other random variables in terms of the intensity functions. The proof of the theorem is similar to the proof of Theorem 1 in [42].
Theorem 1
Let be a point process with -intensity such that
and let be a measurable mapping on the given probability space satisfying . Then there exists a process such that is the intensity of and
Proof:
Let denote the joint distribution of and , and and denote their marginals, respectively. Since , we get that [48, Lemma 5.2.3, p. 92]. Lemma 3 says that , which together with [49, Chapter 1, Exercise 19, p. 22] gives .
Let and
(6) |
denote the Radon-Nikodym derivative. Since under , and are independent, we note that the -intensity of is 1 [39, E5 Exercise, Chapter II, p. 28]. Define the process as
(7) |
where denotes that the conditional expectation is taken with respect to the measure . Then is a non-negative absolutely-integrable martingale.
By the martingale representation theorem, the process can be written as [39, Chapter III, Theorem T17, p. 76] (where we have taken to be the “germ -field”):
where is a -predictable process which satisfies -a.s. Applying [50, Lemma 19.5, p. 315], we can write as
(8) |
where is a non-negative -predictable process, and -a.s. for .
Now we can mimic the proof of [39, Chapter VI, Theorem T3, p. 166] to deduce:
Lemma 4
For all non-negative -predictable processes
where the expectation is taken with respect to measure .
Proof:
Please see the supplementary material. ∎
Taking in the above equality yields
(9) |
Hence -a.s. and we conclude that the -intensity of is .
Now we will use:
Lemma 5
(10) |
Proof:
Please see the supplementary material. ∎
Since is well-defined, (6), (7), and (8) yield
(11) |
where in the last line we have used Lemma 5 and from (9). Also,
(12) | ||||
where we have used Lemma 3 for (a). Using the above inequality and the fact that
is well-defined, we can express the mutual information as
(13) |
Now we can compute the mutual information from (11), (12), and (13),
∎
We shall require several strong data processing inequalities, for which purpose we now derive some ancillary results regarding the intensity of a point process. Combining [39, T8 Theorem, Chapter II, p. 27] and [39, T9 Theorem, Chapter II, p. 28], we can conclude the following result.
Lemma 6
Let be a -predictable non-negative process satisfying
Let be a point process adapted to . Then is the -intensity of if and only if
is a -local martingale222 A process is called a local martingale with respect to a filtration if is -measurable for each and there exists an increasing sequence of stopping times , such that and the stopped and shifted processes are -martingales for each ..
If we impose the stricter condition of finite expectation , the local martingale condition in the above statement can be replaced by the martingale condition.
Lemma 7
Let be a -predictable non-negative process satisfying
Let be a point process adapted to . Then is the -intensity of if and only if
is a -martingale.
Proof:
Please see the supplementary material. ∎
Lemma 8
If a point process has -intensity , and is another history for such that for each , then there exists a process such that is the -intensity of , and for each , -a.s.
Proof:
Please see the supplementary material. ∎
Lemma 9
Let be a point process with -intensity for some . Let be obtained by adding an independent (of both and ) point process with -intensity to . Then has a -intensity which satisfies -a.s. for each .
Proof:
Please see the supplementary material. ∎
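As a quick sanity check on the superposition setting of Lemma 9 (restricted to the Poisson special case, with illustrative rates): adding an independent Poisson process to a Poisson process yields counts whose mean and variance are both the sum of the two rates times the horizon length.

```python
import numpy as np

rng = np.random.default_rng(2)
lam, lam0, T, n = 2.0, 0.5, 1.0, 100_000

# Counts on [0, T] of the original process and of the added independent one.
n_orig = rng.poisson(lam * T, size=n)
n_added = rng.poisson(lam0 * T, size=n)
n_sup = n_orig + n_added  # counts of the superposed process

print(n_sup.mean(), n_sup.var())  # both ≈ (lam + lam0) * T = 2.5
```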
Theorem 2
Let be a Poisson process with rate , be such that , and be the -intensity of . Suppose is obtained by adding an independent (of and ) Poisson process with rate to . Then,
Proof:
Since forms a Markov chain, the data processing inequality gives . Applying Theorem 1 and using the uniqueness of intensities,
(14) |
where and are the and -intensities of . Due to the uniqueness of the intensities and Lemma 9, we get for each , , and . Substituting this in (14) and applying Jensen’s inequality yields
∎
Definition 5
A point process is said to be obtained from -thinning of a point process , if each point in is deleted with probability , independent of all other points and deletions.
Lemma 10
Suppose that is a point process with -intensity such that and is obtained from -thinning . Then the -intensity of is given by , where -a.s. .
Proof:
Please see the supplementary material. ∎
The following theorem was first proven by Wang in [44] using a property of a certain “contraction coefficient” used in strong data processing inequalities [51]. Here, we provide a self-contained proof which uses Theorem 1 and Lemma 10.
Theorem 3
Let be a Poisson process with rate , and be such that . Let be obtained from -thinning of such that the thinning operation is independent of . Then
Proof:
The data processing inequality gives . Applying Theorem 1,
(15) |
and
(16) |
where and (respectively and ) are the and -intensities (respectively and -intensities) of (respectively ). Due to the uniqueness of the intensities and Lemma 10, we can take for each ,
Noting that , (16) yields
where for (a) we have used the fact that , and
for (b) we have used Jensen’s inequality.
∎
We will require the following result [52, Theorem 2.11, p. 106].
Lemma 11
Suppose that is a Poisson process with rate and is obtained from -thinning of . Let
Then and are independent Poisson processes with rates and respectively.
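Lemma 11 can be illustrated numerically (a sketch with illustrative parameters): -thinning a Poisson process splits its points into a kept stream and a deleted stream whose counts have the expected split of the original rate and are uncorrelated, consistent with independence.

```python
import numpy as np

rng = np.random.default_rng(3)
lam, T, p, n = 4.0, 1.0, 0.3, 100_000

total = rng.poisson(lam * T, size=n)   # points of the original process on [0, T]
deleted = rng.binomial(total, p)       # each point deleted independently w.p. p
kept = total - deleted                 # the p-thinned process

corr = np.corrcoef(kept, deleted)[0, 1]
print(kept.mean(), deleted.mean(), corr)
# kept ≈ (1-p)*lam*T = 2.8, deleted ≈ p*lam*T = 1.2, corr ≈ 0
```

Zero correlation alone does not prove independence, of course; Lemma 11 supplies the full statement.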
The following lemma will be used repeatedly in the converse proofs of the rate-distortion function.
Lemma 12
Let a point process have an -intensity such that
Let be a non-negative -predictable process satisfying . Then
Proof:
Please see the supplementary material. ∎
IV Functional Covering of Point Processes
In this section, we will consider general point processes and obtain the rate-distortion function under the functional-covering distortion when feedforward is present. Stronger results are obtained for Poisson processes in the next sections.
Definition 6
Given a point process , and a non-negative function , the functional-covering distortion is
whenever the expression on the right is well-defined.
We will allow the reconstruction function to depend on as well as the message, constrained via predictability. In particular, we will call an allowable reconstruction with feedforward if it is non-negative and -predictable. Let denote the set of all processes which are allowable reconstructions with feedforward.
Definition 7
A code with feedforward consists of an encoder
and a decoder
satisfying
and the distortion constraint
We will call the encoder’s output the message and the decoder’s output the reconstruction.
Definition 8
The minimum achievable distortion with feedforward at rate and blocklength is
Definition 9
The distortion-rate function with feedforward is
The minimum achievable rate at distortion and blocklength with feedforward and the rate-distortion function with feedforward can be defined similarly.
can be characterized via the following theorem for certain point processes.
Theorem 4
Let be a point process with -intensity such that
Let
and
Then satisfies
Proof:
Achievability:
Recall that since is the -intensity of , it is -predictable, and implies .
If the decoder outputs , this leads to distortion
Thus , and the upper bound in the statement of the theorem holds at .
Now consider the case . Fix and let . If , then the encoder sends index . Otherwise, let denote the first arrival instant of the observed point process . From Lemma 3, we have that . Since under , is a Poisson process with unit rate, it holds that for any fixed . This gives us for . Thus, conditioned on the event , has a continuous distribution function . The encoder computes , which is uniformly distributed over , and suitably quantizes it to obtain , which is uniform in . From Theorem 1, there exists a -predictable process which is the -intensity of . We note that , and from Theorem 1, . Hence
is well-defined. The decoder outputs as its reconstruction. Then we have
(17) |
where for (a), we have used the bound ,
for (b), we have used the inequality when , and
for (c), we used the fact that .
also satisfies
(18) |
where, for (a) we have used Lemma 2,
for (b) we have used Theorem 1.
The average distortion can be bounded as follows:
where, for (a), we have used the fact that due to Theorem 1,
for (b), we used the equality ,
for (c), we used (18), and
for (d), we used (17).
Thus we have shown the existence of a code with feedforward such that . This gives the upper bound on .
Converse:
For the given code with feedforward, let . Then . Thus we have
(19) |
where (a) follows because of Lemma 2.
Since , we conclude from Theorem 1 that there exists a process such that is the intensity of and
Hence from (19)
(20) |
Let denote the decoder’s output. The distortion constraint satisfies
(21) |
where in the last line we have used Lemma 12.
Using the inequality , and noting that the individual terms have finite expectations,
(22) |
where, for (a) we have used (22), and
for (b) we used the fact that .
Hence we have shown that for any code with feedforward, . This gives us the lower bound on
∎
Corollary 1
Let be a point process with -intensity such that
-
•
,
-
•
is finite.
-
•
.
Then
Proof:
The corollary follows from the definition and from the bounds on in Theorem 4. ∎
Remark 1
The above distortion-rate function is reminiscent of the logarithmic-loss distortion-rate function for a DMS. Specifically, for a DMS on alphabet , let the reconstruction be a probability distribution on . The logarithmic loss distortion is defined as and the distortion-rate function is then given by [38].
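The DMS analogy can be made concrete. With logarithmic loss, the expected loss of any reconstruction distribution q equals the source entropy plus the KL divergence from the source pmf to q, so it is minimized at q equal to the source pmf, with minimum value the entropy — mirroring how the optimal point-process reconstruction is the conditional intensity. A minimal check with a hypothetical pmf:

```python
import numpy as np

p = np.array([0.2, 0.5, 0.3])      # hypothetical source pmf
H = -(p * np.log(p)).sum()         # entropy in nats

def expected_log_loss(q):
    """E[-log q(X)] when X ~ p; equals H(X) + KL(p || q)."""
    return -(p * np.log(q)).sum()

uniform = np.full(3, 1 / 3)
print(expected_log_loss(p), H)          # equal: the loss at q = p is H(X)
print(expected_log_loss(uniform) > H)   # any other q does strictly worse
```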
If the reconstruction is assumed to be bounded, then it can be used to define a probability measure on the space of point processes via the following Radon-Nikodym derivative:
where is the measure under which is a Poisson process with unit rate. Then the intensity of under this measure is [39, Chapter VI, Theorems T2-T4] and the functional-covering distortion is related to the above Radon-Nikodym derivative as
Applying the above corollary to a Poisson process with rate , we get that . As we will see in the next section, this distortion-rate function can be achieved without feedforward.
V Constrained Functional-Covering of Poisson Processes
In this and the next section we focus on Poisson processes. Let denote the set of all functions which are non-negative and left-continuous with right-limits. We assume that we are given a set with at least one positive element. We will constrain the reconstruction function to take values in , so that for all , .
Definition 10
A code consists of an encoder
and a decoder
satisfying
and the distortion constraint
As before, we will call the encoder’s output the message and the decoder’s output the reconstruction.
Definition 11
A rate-distortion vector is said to be achievable if for any , there exists a sequence of codes such that .
Definition 12
The rate-distortion region is the closure of the set of all achievable rate-distortion vectors .
Theorem 5
The rate-distortion region for the constrained functional-covering of a Poisson process with rate is given by
where is the convex hull of the union of sets of rate-distortion vectors such that
where
with the convention that , and and are probability vectors over satisfying .
Proof:
Achievability
Let
We will show achievability using a code without feedforward. We will use discretization and results from the rate-distortion theory for discrete memoryless sources (DMS). Define a binary-valued discrete-time process as follows. If there are one or more arrivals in the interval of the process , then set to , otherwise it equals zero. Since is a Poisson process with rate , the components of are independent and identically distributed with . Consider the following “test”-channel for ,
Define the discretized distortion function
The reconstruction is taken as a satisfying
(23) |
where such a exists due to the definition of . We recall that if then , and hence for such a . The scaling of the mutual information and the distortion function with respect to is given by the following lemma.
Lemma 13
Proof:
Please see the supplementary material. ∎
Let
(24) |
Due to [53, Theorem 9.3.2, p. 455], for a given , , and all sufficiently large , there exists an encoder and a decoder such that
satisfying
(25) |
Given the above setup, the encoder upon observing obtains the binary valued discrete time process , and sends to the decoder. The decoder outputs the reconstruction as
Let denote the actual number of arrivals of in an interval . Then is related to the original distortion function via the above reconstruction as follows:
where for (a), we have used the definition of in (24), since implies which implies in order for , which occurs a.s. so long as is sufficiently small.
Hence taking expectations, we get
(26) |
where, for (a), we have used (25),
for (b) we note that , and
for (c), we have used the inequality .
Moreover using (25),
(27) |
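The discretization step underlying the achievability argument can be sketched numerically (illustrative parameters): bin counts of a Poisson process over disjoint intervals of width delta are i.i.d. Poisson, and the induced binary process marks bins with at least one arrival, so each bit is one with probability 1 − exp(−rate·delta).

```python
import numpy as np

rng = np.random.default_rng(4)
lam, delta, n_bins = 2.0, 0.01, 2_000_000

# Independent bin counts of a Poisson process over intervals of width delta.
bin_counts = rng.poisson(lam * delta, size=n_bins)
X = (bin_counts > 0).astype(int)  # binary process: 1 iff >= 1 arrival in bin

p_hat = X.mean()
p_true = 1.0 - np.exp(-lam * delta)
print(p_hat, p_true)  # empirical bit frequency matches 1 - exp(-lam*delta)
```

Note how sparse the binary process is for small delta, which is also the intuition behind Remark 3 in Section VI.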
Converse
We will prove the converse when feedforward is present. For the given code with feedforward, let denote the encoder’s output. Since , we conclude from Theorem 1 that there exists a process such that is the intensity of and
(28) |
We also have
This gives
Let denote the decoder’s output. The distortion constraint satisfies
(29) |
where, for (a) we have used Lemma 12, and
for (b), we have used the definition of .
Defining to be uniformly distributed on , and independent of all other random variables we have
(30) | ||||
(31) |
Now we use Carathéodory’s theorem [54, Theorem 17.1]. There exist non-negative and , such that and
(32) | ||||
(33) | ||||
(34) |
where in the last line we have used the fact that since is the -intensity of , . Now define
We note that if , and . Substituting the above definitions in (30)-(31), we obtain
(35) |
Likewise,
Since is arbitrary and can be made arbitrarily large, we obtain the rate-distortion region in the statement of the theorem. ∎
If we do not place any restrictions on , i.e., if is the set of all non-negative reals, then we obtain the functional-covering distortion.
Corollary 2 (Functional Covering of Poisson Processes)
The rate-distortion function for functional-covering distortion is given by .
Proof:
For the functional-covering distortion, is the set of non-negative reals. Hence
For any achievable we have
(36) |
and
Hence
and this is achieved by and that yield equality in (36). ∎
If we take , then we recover the covering distortion in [35, Theorem 1].
Corollary 3 (Covering Distortion [35])
The rate-distortion function for the covering distortion is given by .
Proof:
For the covering distortion, . Hence
Suppose is in . Then
where we have defined . Similarly,
where (a) is due to the log-sum inequality; equality is achieved by setting , , , . ∎
Remark 2
As in the general case in Theorem 4 (see Remark 1), the reconstruction (assuming it is bounded) can be used to define a probability measure on the input space via
where is the measure under which is a Poisson process with unit rate. Moreover, in the absence of feedforward, is deterministic (it depends only on the encoder’s output). Thus the input point process is a non-homogeneous Poisson process with rate under . As in the general case, the functional-covering distortion is related to the above Radon-Nikodym derivative via
VI The Poisson CEO Problem
[Figure 1: The two-encoder Poisson CEO problem.]
We now consider the distributed problem shown in Figure 1. Our goal is to compress , which is a Poisson process with rate . Each of the two encoders observes a degraded version of , denoted by , . is first -thinned to obtain , and then an independent Poisson process with rate is added to to obtain .
Recall that is the set of all non-negative functions which are left-continuous with right-limits, and
Definition 13
A code for the Poisson CEO problem consists of encoders and ,
and a decoder ,
satisfying
and the distortion constraint
Definition 14
A rate-distortion vector is said to be achievable for the Poisson CEO problem if for any , there exists a sequence of codes .
Definition 15
The rate-distortion region for the Poisson CEO problem is the closure of the set of all achievable rate-distortion vectors .
The rate-distortion region for the Poisson CEO problem with feedforward, denoted by , is defined analogously.
Theorem 6
The rate-distortion region for the Poisson CEO problem is given by
where is the convex hull of the union of sets of rate-distortion vectors such that
for some probability vectors , , and , where for and
Proof:
Please see the supplementary material. ∎
Remark 3
Note that there is no sum-rate constraint in the rate-distortion region of the above theorem. This occurs due to the sparsity of points in a Poisson process. After discretizing a Poisson process with rate , the expected number of ones in the resulting binary process is roughly , and the remaining bits are zeroes. When such a sparse binary process is sent via two independent parallel channels as in (46)-(47), the resulting output processes are almost independent. This implies that the encoders do not need to bin their messages in the achievability argument.
Corollary 4 (Poisson CEO Problem without Thinning)
If , then the rate-distortion region in Theorem 6 takes a simple form
Corollary 5 (Remote Poisson Source)
Consider a scenario where an encoder wishes to compress a Poisson process with rate , but observes a degraded version of it, where the points are first independently erased with probability and then an independent Poisson process with rate is added. Then the rate-distortion region is the convex hull of the union of all rate-distortion vectors satisfying
for some probability vectors , , and , where for
References
- [1] N. V. Shende and A. B. Wagner, “Functional covering of point processes,” in IEEE Int. Symp. Info. Theory, 2019, pp. 2039–2043.
- [2] T. Berger, Rate Distortion Theory: A Mathematical Basis for Data Compression. Englewood Cliffs, NJ: Prentice Hall, 1971.
- [3] D. H. Johnson, “Point process models of single-neuron discharges,” Journal of Computational Neuroscience, vol. 3, no. 4, pp. 275–299, 1996.
- [4] J. H. Goldwyn, J. T. Rubinstein, and E. Shea-Brown, “A point process framework for modeling electrical stimulation of the auditory nerve,” Journal of Neurophysiology, vol. 108, no. 5, pp. 1430–1452, 2012.
- [5] F. Farkhooi, M. F. Strube-Bloss, and M. P. Nawrot, “Serial correlation in neural spike trains: Experimental evidence, stochastic modeling, and single neuron variability,” Physical Review E, vol. 79, no. 2, p. 021905, 2009.
- [6] S. V. Sarma, U. T. Eden, M. L. Cheng, Z. M. Williams, R. Hu, E. Eskandar, and E. N. Brown, “Using point process models to compare neural spiking activity in the subthalamic nucleus of Parkinson’s patients and a healthy primate,” IEEE Transactions on Biomedical Engineering, vol. 57, no. 6, pp. 1297–1305, 2010.
- [7] E. N. Brown, R. E. Kass, and P. P. Mitra, “Multiple neural spike train data analysis: state-of-the-art and future challenges,” Nature Neuroscience, vol. 7, no. 5, pp. 456–461, 2004.
- [8] F. Rieke, D. Warland, R. de Ruyter van Steveninck, and W. Bialek, Spikes: Exploring the Neural Code. MIT Press, 1997.
- [9] J. Giles and B. Hajek, “An information-theoretic and game-theoretic study of timing channels,” IEEE Trans. on Inf. Theory, vol. 48, no. 9, pp. 2455–2477, 2002.
- [10] M. Shahzad and A. X. Liu, “Accurate and efficient per-flow latency measurement without probing and time stamping,” IEEE/ACM Trans. Networking, vol. 24, no. 6, pp. 3477–3492, 2016.
- [11] Y. Zhu, X. Fu, B. Graham, R. Bettati, and W. Zhao, “On flow correlation attacks and countermeasures in mix networks,” in Proc. 4th Privacy Enhancement Technology Workshop (PET), 2004.
- [12] A. Börcs and C. Benedek, “A marked point process model for vehicle detection in aerial LIDAR point clouds,” in ISPRS Ann. Photogrammetry, Remote Sens. and Spatial Inf. Sci, vol. 1-3, 2012, pp. 93–98.
- [13] Y. Yu, J. Li, H. Guan, C. Wang, and M. Cheng, “A marked point process for automated tree detection from mobile laser scanning point cloud data,” in 2012 Intl. Conf. Comp. Vision in Remote Sensing, 2012, pp. 140–145.
- [14] S. Nakamoto, “Bitcoin: A peer-to-peer electronic cash system,” 2008. [Online]. Available: bitcoin.org/bitcoin
- [15] Y. Lewenberg, Y. Bachrach, Y. Sompolinsky, A. Zohar, and J. S. Rosenschein, “Bitcoin mining pools: A cooperative game theoretic analysis,” in Proc. 2015 Int. Conf. Autonomous Agents and Multiagent Sys., 2015, p. 919–927.
- [16] Y. Kawase and S. Kasahara, “Transaction-confirmation time for bitcoin: A queueing analytical approach to blockchain mechanism,” in Intl. Conf. on Queueing Theory and Network App., 2017, p. 75–88.
- [17] C. Decker and R. Wattenhofer, “Information propagation in the bitcoin network,” in IEEE P2P 2013 Proc., 2013, p. 1–10.
- [18] A. Laourine and A. B. Wagner, “Secrecy capacity of the degraded Poisson wiretap channel,” in Proc. IEEE Intl. Symp. Inf. Theory, Jun. 2010, pp. 2553–2557.
- [19] A. D. Wyner, “Capacity and error exponent for the direct detection photon channel—Part I,” IEEE Trans. Inf. Theory, vol. 34, no. 6, pp. 1449–1461, Nov. 1988.
- [20] ——, “Capacity and error exponent for the direct detection photon channel—Part II,” IEEE Trans. Inf. Theory, vol. 34, no. 6, pp. 1462–1471, Nov. 1988.
- [21] A. Lapidoth, “On the reliability function of the ideal Poisson channel with noiseless feedback,” IEEE Trans. Inf. Theory, vol. 39, no. 2, pp. 491–503, Mar. 1993.
- [22] N. Shende and A. B. Wagner, “The stochastic-calculus approach to multiple-decoder Poisson channels,” IEEE Trans. Inf. Theory, vol. 65, no. 8, pp. 5007–5027, Aug. 2019.
- [23] F. Baccelli and P. Brémaud, Palm Probabilities and Stationary Queues. Springer-Verlag, 1987.
- [24] C. Sutardja and J. M. Rabaey, “Isolator-less near-field RFID reader for sub-cranial powering/data link of millimeter-sized implants,” IEEE Journal of Solid-State Circuits, vol. 53, no. 7, pp. 2032–2042, 2018.
- [25] A. K. Skrivervik, A. J. M. Montes, I. V. Trivino, M. Bosiljevac, M. Veljovic, and Z. Sipus, “Antenna design for a cranial implant,” in 2020 Intl. Work. Antenna Tech. (iWAT), 2020, pp. 1–4.
- [26] R. L. de Queiroz and P. A. Chou, “Compression of 3D point clouds using a region-adaptive hierarchical transform,” IEEE Trans. on Image Proc., vol. 25, no. 8, pp. 3947–3956, 2016.
- [27] T. Golla and R. Klein, “Real-time point cloud compression,” in 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2015, pp. 5087–5092.
- [28] C. Tu, E. Takeuchi, C. Miyajima, and K. Takeda, “Compressing continuous point cloud data using image compression methods,” in 2016 IEEE 19th Intl. Conf. Intell. Transportation Sys. (ITSC), 2016, pp. 1712–1719.
- [29] R. Gallager, “Basic limits on protocol information in data communication networks,” IEEE Trans. Info. Theory, vol. 22, no. 4, pp. 385–398, 1976.
- [30] A. S. Bedekar, “On the information about message arrival times required for in-order decoding,” in IEEE Int. Symp. Inf. Theory, June 2001, p. 227.
- [31] S. Verdú, “The exponential distribution in information theory.” Probl. Inf. Transm., vol. 32, no. 1, pp. 86–95, 1996.
- [32] T. P. Coleman, N. Kiyavash, and V. G. Subramanian, “The rate-distortion function of a Poisson process with a queueing distortion measure,” in Data Compress. Conf, Mar 2008, pp. 63–72.
- [33] I. Rubin, “Information rates and data-compression schemes for Poisson processes,” IEEE Trans. Info. Theory, vol. 20, no. 2, pp. 200–210, 1974.
- [34] G. Koliander, D. Schuhmacher, and F. Hlawatsch, “Rate-distortion theory of finite point processes,” IEEE Trans. Info. Theory, vol. 64, no. 8, pp. 5832–5861, 2018.
- [35] A. Lapidoth, A. Malar, and L. Wang, “Covering point patterns,” IEEE Trans. Info. Theory, vol. 61, no. 9, pp. 4521–4533, 2015.
- [36] H.-A. Shen, S. M. Moser, and J.-P. Pfister, “Rate-distortion problems of the Poisson process based on a group-theoretic approach,” 2022. [Online]. Available: https://arxiv.org/abs/2202.13684
- [37] T. A. Courtade and R. D. Wesel, “Multiterminal source coding with an entropy-based distortion measure,” in IEEE Int. Symp. Info. Theory, Jul 2011, pp. 2040–2044.
- [38] T. A. Courtade and T. Weissman, “Multiterminal source coding under logarithmic loss,” IEEE Trans. Info. Theory, vol. 60, no. 1, pp. 740–761, 2014.
- [39] P. Brémaud, Point Processes and Queues: Martingale Dynamics. Springer-Verlag, 1981.
- [40] Y. Kabanov, “The capacity of a channel of the Poisson type,” Theory of Probability and its Applications, vol. 23, pp. 143–147, 1978.
- [41] M. Davis, “Capacity and cutoff rate for Poisson-type channels,” IEEE Trans. Info. Theory, vol. 26, no. 6, pp. 710–715, Nov 1980.
- [42] N. V. Shende and A. B. Wagner, “The stochastic-calculus approach to multi-receiver Poisson channels,” IEEE Trans. Info. Theory, vol. 65, no. 8, pp. 5007–5027, Aug 2019.
- [43] R. Sundaresan and S. Verdú, “Capacity of queues via point-process channels,” IEEE Trans. Info. Theory, vol. 52, no. 6, pp. 2697–2709, June 2006.
- [44] L. Wang, “A strong data processing inequality for thinning Poisson processes and some applications,” in IEEE Int. Symp. Info. Theory, June 2017, pp. 3180–3184.
- [45] A. Klenke, Probability Theory: A Comprehensive Course, 2nd ed. Springer London, 2013.
- [46] A. Wyner, “A definition of conditional mutual information for arbitrary ensembles,” Information and Control, vol. 38, no. 1, pp. 51–59, 1978.
- [47] I. M. Gel’fand and A. M. Yaglom, “Computation of the amount of information about a stochastic function contained in another such function,” Uspekhi Mat. Nauk, vol. 12, no. 1, pp. 3–52, 1957.
- [48] R. M. Gray, Entropy and Information Theory. Springer-Verlag, 1990.
- [49] O. Kallenberg, Foundations of Modern Probability, 2nd ed. Springer-Verlag, New York, 2002.
- [50] R. S. Liptser and A. N. Shiryaev, Statistics of Random Processes II, 2nd ed. Springer-Verlag Berlin Heidelberg, 2001.
- [51] Y. Polyanskiy and Y. Wu, “Strong data-processing inequalities for channels and Bayesian networks,” in Convexity and Concentration. New York, NY: Springer New York, 2017, pp. 211–249.
- [52] R. Durrett, Essentials of Stochastic Processes, ser. Springer Texts in Statistics. Springer International Publishing, 2016.
- [53] R. G. Gallager, Information Theory and Reliable Communication. New York, NY, USA: John Wiley & Sons, Inc., 1968.
- [54] R. Rockafellar, Convex Analysis. Princeton University Press, 1997.
- [55] C. Dellacherie and P. A. Meyer, Probabilities and Potential B: Theory of Martingales, ser. North-Holland Mathematics Studies. North-Holland, 1982, vol. 72.
- [56] A. E. Gamal and Y.-H. Kim, Network Information Theory. Cambridge University Press, 2011.
Proof:
The first part of the lemma is due to [39, T12 Theorem, Chapter VI, p. 187]. To prove the second part we note that implies , which in turn gives
-a.s. Thus applying [50, Theorem 19.7, p. 343], we conclude that . Hence, from the first part of the lemma
where the uniqueness of intensity [39, T12 Theorem, Chapter II, p. 31] gives us
Since
we have
and
Hence
Finally,
Here, for (a) we have used the uniqueness of the intensity, and in the remaining equalities we have used the finiteness of the expectations , . ∎
Proof:
Recall that can be written as
We note that for satisfies
(37)
Let be a non-negative -predictable process. Then
where, (a) follows since is the Radon-Nikodym derivative ,
(b) follows due to [39, T19 Theorem, Appendix A2, p. 302],
(c) follows due to (37),
(d) follows since the -intensity of is 1, and being a left-continuous adapted process is -predictable,
(e) follows since the Lebesgue measure of the set is zero due to (37),
(f) again follows due to [39, T19 Theorem, Appendix A2, p. 302], and
(g) again follows since is the Radon-Nikodym derivative .
∎
Proof:
We will first show that
Define and . We note that and . Define the process as
Then is a non-negative -predictable process and
-a.s. since . Hence the process defined as
is a non-negative super-martingale [39, T2 Theorem, Chapter VI, p. 165]. Hence the following chain of inequalities holds
(38)
Here, for (a) we have used the fact that since is a super-martingale, is integrable, and then Jensen’s inequality and
for (b), we have used the fact that is a super-martingale, hence .
Let denote the th arrival instant of the process , i.e.,
where the infimum of the null set is taken as . Then if , -a.s. [39, T12 Theorem, Chapter II, p. 31]. Hence for ,
Thus we can write
Using (38) we obtain
We note that is a non-negative random variable, and
Hence we can split the expectation to get
which gives
(39)
Hence
(40)
∎
Proof:
Suppose that is the -intensity of . Then applying [39, T8 Theorem, Chapter II, p. 27] with proves is a -martingale. Now suppose that is a -martingale. Consider a simple -predictable process of the form
Then
(41)
where for (a) we have used the martingale property of . Thus by the monotone class theorem, for all bounded -predictable processes , (41) holds (see [39, App. A1, Theorem T5, p. 264]). Then by applying the monotone convergence theorem, we can show that (41) holds for all non-negative -predictable processes as well, so that is the -intensity of . ∎
Proof:
There exists a -predictable process such that -a.s. , [55, Chapter 6, Theorem 43, p. 103]. We will show that is the -intensity of . Let be a non-negative -predictable process. As , it is also -predictable. Thus
(42)
Hence
Here, (a) is due to the fact that is measurable [39, Exercise E10, Chapter I, p. 9], and
(b) is due to (42).
Hence the -intensity of is . ∎
Proof:
We first note that since and are independent, trajectories of are a.s. in . The -intensities of and are and respectively [39, E5 Exercise, Chapter II, p. 28]. Then for a non-negative -predictable process :
Hence the -intensity of is . Since the statement of the lemma follows from an application of Lemma 8. ∎
Proof:
Let . We note that the -intensity of is . Now we will compute the -intensity of . Let denote the sequence of independent and identically distributed Bernoulli random variables which indicate if a particular point in point process is erased or not. In particular, if , then the th point in is retained, so that . Then for
Using the monotone convergence theorem for the conditional expectation,
where, for (a) we have used the fact that given , is independent of ,
for (b), we note that , and
for (c), we have used the martingale property of .
Then
is a -martingale. Hence from Lemma 7, the -intensity of is . An application of Lemma 8 then proves the statement of the lemma. ∎
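The thinning property underlying this lemma (independently erasing each point of a rate-`lam` Poisson process with probability `1 - p` leaves a Poisson process of rate `p * lam`) can likewise be checked by simulation. The names `lam` and `p` below are illustrative assumptions.

```python
import numpy as np

# Erase each point of a Poisson(lam) process i.i.d. with probability 1 - p;
# the retained count on [0, T] is then Poisson with mean p * lam * T.
rng = np.random.default_rng(1)
lam, p, T, n_trials = 4.0, 0.3, 1.0, 200_000

n = rng.poisson(lam * T, size=n_trials)  # total points per trial
kept = rng.binomial(n, p)                # Bernoulli(p) retention of each point

# Mean and variance of the retained count should both be near p * lam * T = 1.2.
print(kept.mean(), kept.var())
```

The agreement of mean and variance again reflects the Poisson character of the thinned process, matching the intensity computed in the proof.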
Proof:
We will require the following inequality
(43)
The inequality can be verified directly if either or both of , are zero. If , the inequality follows from .
Defining , we note that . Consider
(44)
where, for (a), we have used the facts that is -predictable, is non-negative, and is the -intensity of ,
for (b), we note that and are -a.s. finite, and then use the inequality in (43), and
for (c), we have used the facts that (via Theorem 1), , and .
Hence we can write
(45)
∎
Proof:
The first part of the lemma follows directly from L’Hôpital’s rule. For the second part
where for (a), we have used the definition in (23).
∎
Proof:
The first limit can be evaluated using L’Hôpital’s rule. To compute the second limit, consider
Then we have
Recalling that implies , we have
Now to compute , we first calculate
This gives
Thus
∎
Proof:
Achievability:
Let
We will show achievability using a code without feedforward. We will use discretization and results from the rate-distortion theory for discrete memoryless sources (DMS).
First consider the case when for each , at least one of the following conditions is satisfied
- C.1: for all ,
- C.2: .
Fix , and let for an integer . For each , define a binary-valued discrete-time process as follows. If there are one or more arrivals in the interval of the process , then set to ; otherwise set it to zero. Since is a Poisson process with rate , the components of are independent and identically distributed with .
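The discretization step can be illustrated numerically: binning a Poisson process into intervals of width `delta` and recording whether each bin contains at least one arrival produces i.i.d. Bernoulli indicators with success probability 1 - e^{-lam·delta}. A sketch with assumed values `lam` and `delta` (illustrative names, not the paper's symbols):

```python
import numpy as np

# Per-bin arrival counts of a rate-lam Poisson process over bins of width
# delta are i.i.d. Poisson(lam * delta); the indicator "bin is nonempty"
# is therefore Bernoulli(1 - exp(-lam * delta)).
rng = np.random.default_rng(2)
lam, delta, n_bins = 5.0, 0.01, 1_000_000

counts = rng.poisson(lam * delta, size=n_bins)  # arrivals per bin
x = (counts >= 1).astype(float)                 # binary discretized process

# Empirical frequency vs. the theoretical probability 1 - exp(-lam * delta).
print(x.mean(), 1 - np.exp(-lam * delta))
```

As `delta` shrinks, the probability 1 - e^{-lam·delta} approaches lam·delta, which is what drives the scaling arguments used later in the achievability proof.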
due to the memoryless property of Poisson processes and independent thinning. Consider the following “test”-channel for ,
(46)
(47)
Define the discretized distortion function
(48)
The reconstruction is taken as
where
We note that since , and at least one of C.1-C.2 is satisfied, , and hence . Thus the distortion function in (48) is bounded. Let
(49)
Due to the Berger-Tung inner bound [56, Theorem 12.1, p. 295], for a given , , and all sufficiently large , there exist encoders and , and a decoder such that for
satisfying
(50)
(51)
It is noteworthy that the Berger-Tung inner bound has a conditioning term in the mutual-information expression, which in general yields a stronger bound than the one presented here. However, in our setting we can drop this conditioning, as explained in Remark 3 in the main paper.
Given the above setup, each encoder, upon observing , obtains the binary-valued discrete-time process and sends to the decoder. The decoder outputs the reconstruction as
Let denote the actual number of arrivals of in an interval . Then is related to the original distortion function via the above reconstruction as follows:
where for (a), we have used the definition of in (49).
Hence taking the expectation, we get
(52)
where, for (a), we have used (51),
for (b) we note that , and
for (c), we have used the inequality .
Moreover using (50), for
(53)
The scaling of the mutual information and the distortion function with respect to is given by the following lemma.
Lemma 14
For
Proof:
Please see the supplementary material. ∎
Now given the rate-distortion vector and , first choose sufficiently small so that
Then let , and choose a sufficiently large so that (50) and (51) are satisfied. From (52) and (53) we conclude that a sequence of codes exists with when at least one of the conditions C.1 or C.2 is satisfied.
Now consider the case when some , and for that , for some ’s. Say and . This gives us for . Then we need to show that the rate-distortion vector
(54)
is achievable. Let and for some . Then the term
is continuous in and increases from zero to infinity as is increased from zero to ; hence there exists some such that, with ,
(55)
We note that this satisfies condition C.1. Hence the rate-distortion vector in (54) is achievable by using that satisfies (55). The case when or both can be handled similarly.
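The existence argument above amounts to root-finding for a continuous function that increases from zero to infinity in the free rate parameter. A hedged sketch using bisection; the function `g` below is a stand-in chosen for illustration, not the paper's actual expression in (55):

```python
import math

# Stand-in for the continuous, increasing quantity in the existence
# argument: g(0) = 0 and g grows without bound, so every positive
# target value is attained at a unique parameter.
def g(lam):
    return lam * math.log(1 + lam)

def bisect(target, lo=0.0, hi=1.0, tol=1e-10):
    # Grow the bracket until it covers the target, then bisect.
    while g(hi) < target:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if g(mid) < target else (lo, mid)
    return (lo + hi) / 2

lam_star = bisect(target=2.0)
print(g(lam_star))  # ≈ 2.0
```

The intermediate value theorem guarantees the solution exists; monotonicity makes bisection converge to it, mirroring how the proof selects the rate satisfying (55).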
Converse:
We will prove the converse when feedforward is present. For the given code with feedforward, let and denote the outputs of the first and second encoders, respectively. We essentially repeat the steps in the converse proof of Theorem 4 to show that
Since , we conclude from Theorem 1 that there exists a process such that is the intensity of and
(56)
Let denote the decoder’s output. The distortion constraint satisfies
(57)
where for the last equality we have used Lemma 12. Once again using the inequality , and noting that the individual terms have finite expectations,
(58)
Combining these inequalities, we obtain
(59)
where, for (a) we have used (57) and (58) and
for (b) we use the fact that .
We can upper bound the term as
(60)
where, for (a) we have used Lemma 2 and
for (b), we used the Markov chain .
Combining (59) and (60) we get
(61)
For , using Lemma 2
(62)

We will first consider the case when for . We shall proceed by defining certain auxiliary processes (see Figure 2). Let be obtained from -thinning of , where
Then using Lemma 11 we can write
where and are independent Poisson processes with rates and respectively. On the other hand, by definition
where and are independent Poisson processes with rates and respectively. Hence we conclude that the joint distribution of is identical to the joint distribution of . Let be obtained by adding an independent Poisson process with rate to ,
Also using Lemma 11 we have
where and are independent Poisson processes with rates and . Hence the joint distribution of and are identical. Moreover, forms a Markov chain and forms a Markov chain. This allows us to write
(63)
Since is a -thinning of , Theorem 3 gives
(64)
Also, since is obtained by adding an independent Poisson process with rate to , Theorem 2 yields
(65)
where, is the -intensity of . Then we can further lower bound in (61) as
where for (a), we have used (63),
for (b), we have used (65), and
for (c), we define and to be uniformly distributed on , independent of all other random variables and independent of each other as well.
For each , in (62) can be lower bounded as
where for (a), we have used (64),
for (b), we have used (65), and
for (c), recall that and are uniformly distributed on , independent of all other random variables and independent of each other.
Now we use Carathéodory’s theorem [54, Theorem 17.1]. For each , there exist non-negative and , such that and
where in the last line we have used the fact that since is the -intensity of , . Hence we have
(66)
(67)
Now define
We note that if , and . Substituting the above definitions in (66)
(68)
(69)
Likewise,
Substituting the above in (67), we get
(70)
If either , say , equals 1, then and are independent so that , and we can repeat the above steps to show that
Since is arbitrary, taking and gives us the rate region in the statement of the theorem.
∎