Continuity and Additivity Properties
of Information Decompositions
Abstract
Information decompositions quantify how the Shannon information about a given random variable is distributed among several other random variables. Various requirements have been proposed that such a decomposition should satisfy, leading to different candidate solutions. Curiously, however, only two of the original requirements that characterize Shannon information have been considered in this context, namely monotonicity and normalization. Two other important properties, continuity and additivity, have not been considered. In this contribution, we focus on the mutual information of two finite variables about a third finite variable and check which of the decompositions satisfy these two properties. While most of them satisfy continuity, only one of them is both continuous and additive.
1 Introduction
The fundamental concept of Shannon information is uniquely determined by four simple requirements: continuity, strong additivity, monotonicity, and normalization (Shannon, 1948). We refer to Csiszár (2008) for a discussion of axiomatic characterizations. Continuity implies that small perturbations of the underlying probability distribution have only small effects on the information measure, and this is of course very appealing. Strong additivity refers to the requirement that the chain rule holds. Similar conditions are also satisfied, mutatis mutandis, by the derived concepts of conditional and mutual information, as well as by other information measures, such as interaction information/co-information (McGill, 1954; Bell, 2003) or total correlation/multi-information (Watanabe, 1960; Studený and Vejnarová, 1998).
Williams and Beer (2010) proposed to decompose the mutual information that several random variables have about a target variable into various components that quantify how much information these variables possess individually, how much they share and how much they need to combine to become useful. That is, one wants to disentangle how the information about the target variable is distributed over the other variables. Again, various requirements can be imposed, with varying degrees of plausibility, upon such a decomposition. There are several candidate solutions, and not all of them satisfy all those requirements. Curiously, however, previous considerations did not include continuity and strong additivity. While Bertschinger et al. (2013) did consider chain rule-type properties, none of the information measures defined within the context of information decompositions satisfies any of these chain rule properties (Rauh et al., 2014).
In this contribution, we evaluate which of the various proposed decompositions satisfy continuity and additivity. Here, additivity (as opposed to strong additivity) is required only for independent variables (see Definition 8 below). Additivity (together with other properties) may replace strong additivity when defining Shannon information axiomatically (see (Csiszár, 2008) for an overview). The importance of additivity is also discussed by Matveev and Portegies (2017).
We consider the case where all random variables are finite, and we restrict ourselves to the bivariate case of a target variable $S$ and two predictor variables $X_1$ and $X_2$. We think that this simplest possible setting is the most important one to understand conceptually and in practical applications. Already here there are important differences between the measures that have been proposed in the literature. A bivariate information decomposition consists of three functions $SI$, $UI$ and $CI$ that depend on the joint distribution of the three variables $(S, X_1, X_2)$ and that satisfy:
$I(S; X_1 X_2) = SI(S; X_1, X_2) + UI(S; X_1 \setminus X_2) + UI(S; X_2 \setminus X_1) + CI(S; X_1, X_2),$
$I(S; X_1) = SI(S; X_1, X_2) + UI(S; X_1 \setminus X_2),$
$I(S; X_2) = SI(S; X_1, X_2) + UI(S; X_2 \setminus X_1). \qquad (1)$
Hence, $I(S; X_1 X_2)$ is decomposed into a shared part $SI$ that is contained in both $X_1$ and $X_2$, a complementary (or synergistic) part $CI$ that is only available from $X_1$ and $X_2$ together, and unique parts $UI(S; X_1 \setminus X_2)$ and $UI(S; X_2 \setminus X_1)$ contained exclusively in either $X_1$ or $X_2$. The different terms are functions of the joint probability distribution of the three random variables $(S, X_1, X_2)$, commonly written with suggestive arguments as in (1).
To define a bivariate information decomposition in this sense, it suffices to define either of $SI$, $UI$ or $CI$. The other functions are then determined from (1). The linear system (1) consists of three equations in the four unknowns $SI$, $UI(S; X_1 \setminus X_2)$, $UI(S; X_2 \setminus X_1)$ and $CI$, where the two unknowns $UI(S; X_1 \setminus X_2)$ and $UI(S; X_2 \setminus X_1)$ are values of the same function $UI$. Thus, when starting with a function $UI$ to define an information decomposition, the following consistency condition must be satisfied:
$I(S; X_1) - UI(S; X_1 \setminus X_2) = I(S; X_2) - UI(S; X_2 \setminus X_1). \qquad (2)$
If consistency does not hold, one may try to adjust the proposed measure of unique information to enforce consistency using a construction from Banerjee et al. (2018a) (see Section 2).
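To make the bookkeeping in (1) and (2) concrete, the following minimal sketch completes a decomposition from a candidate pair of unique informations and checks the consistency condition (2). The function and variable names are illustrative and not taken from any reference implementation.

```python
# A minimal sketch (illustrative names) of how the linear system (1) determines
# SI and CI once a candidate pair of unique informations is given, and of the
# consistency condition (2) that such a pair must satisfy.
def complete_decomposition(I_S_X1, I_S_X2, I_S_X1X2, UI_1, UI_2, tol=1e-9):
    """Return (SI, UI_1, UI_2, CI), or raise if condition (2) is violated."""
    # Consistency condition (2): both expressions for SI must give the same value.
    if abs((I_S_X1 - UI_1) - (I_S_X2 - UI_2)) > tol:
        raise ValueError("candidate unique informations violate consistency (2)")
    SI = I_S_X1 - UI_1                       # from the second equation in (1)
    CI = I_S_X1X2 - SI - UI_1 - UI_2         # from the first equation in (1)
    return SI, UI_1, UI_2, CI

# Example with hypothetical values of the three mutual informations (in bits):
I1, I2, I12 = 0.5, 0.8, 1.0
print(complete_decomposition(I1, I2, I12, UI_1=0.0, UI_2=0.3))
# -> (0.5, 0.0, 0.3, 0.2)
```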
As mentioned above, several bivariate information decompositions have been proposed (see Section 2 for a list). However, there are still holes in our understanding of the properties of those decompositions that have been proposed so far. This paper investigates the continuity and additivity properties of some of these decompositions.
Continuity is understood with respect to the canonical topology of the set of joint distributions of finite variables of fixed sizes. When $(P_n)$ is a sequence of joint distributions with $P_n \to P$, does $SI(P_n) \to SI(P)$? Most, but not all, proposed information decompositions are continuous (i.e. $SI$, $UI$ and $CI$ are all continuous). If an information decomposition is continuous, one may ask whether it is differentiable, at least at probability distributions of full support. Among the information decompositions that we consider, only the decomposition of Niu and Quinn (2019) is differentiable. Continuity and smoothness are discussed in detail in Section 3.
The second property that we focus on is additivity, by which we mean that $SI$, $UI$ and $CI$ behave additively when a system can be decomposed into (marginally) independent subsystems (see Definition 8 in Section 4). This property corresponds to the notion of extensivity as used in thermodynamics. Among the decompositions in our list, this property is satisfied by the decomposition of Bertschinger et al. (2014) and by the decompositions based on the Gács-Körner common information (the latter, however, are not continuous). A weak form of additivity, the identity axiom proposed by Harder et al. (2013), is well-studied and is satisfied by several other bivariate information decompositions.
2 Proposed information decompositions
We now list the bivariate information decompositions that we want to investigate. The last paragraph mentions further related information measures. We denote information decompositions by their shared information measure $SI$, with sub- or superscripts distinguishing the different proposals. The corresponding measures $UI$ and $CI$ inherit these decorations.
We use the following notation: $S$, $X_1$, $X_2$ are random variables with finite state spaces $\mathcal{S}$, $\mathcal{X}_1$, $\mathcal{X}_2$. The set of all probability distributions on a finite set $\mathcal{X}$ (i.e. the probability simplex over $\mathcal{X}$) is denoted by $\mathbb{P}(\mathcal{X})$. The joint distribution of $(S, X_1, X_2)$ is then an element of $\mathbb{P}(\mathcal{S} \times \mathcal{X}_1 \times \mathcal{X}_2)$.
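All quantities in this paper are functions of such a finite joint distribution. The following minimal sketch (function names are illustrative, not taken from any particular library) computes the entropies and mutual informations appearing in (1) from a joint distribution stored as a NumPy array indexed by $(s, x_1, x_2)$; later sketches use the same conventions.

```python
# Shannon entropy and mutual information (in bits) from a joint distribution
# P(s, x1, x2) stored as a NumPy array indexed by (s, x1, x2).
import numpy as np

def entropy(p):
    """Shannon entropy (in bits) of a probability array of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mutual_information(joint, axes_a, axes_b):
    """I(A;B) for a joint array; axes_a/axes_b list the axes that form A and B."""
    all_axes = tuple(range(joint.ndim))
    keep_a = tuple(ax for ax in all_axes if ax not in axes_a)
    keep_b = tuple(ax for ax in all_axes if ax not in axes_b)
    keep_ab = tuple(ax for ax in all_axes if ax not in axes_a + axes_b)
    p_a = joint.sum(axis=keep_a)
    p_b = joint.sum(axis=keep_b)
    p_ab = joint.sum(axis=keep_ab) if keep_ab else joint
    return entropy(p_a) + entropy(p_b) - entropy(p_ab)

# Example: S = XOR(X1, X2) with independent uniform inputs.
P = np.zeros((2, 2, 2))                       # indexed by (s, x1, x2)
for x1 in (0, 1):
    for x2 in (0, 1):
        P[x1 ^ x2, x1, x2] = 0.25
print(mutual_information(P, (0,), (1,)))      # I(S;X1)   = 0.0
print(mutual_information(P, (0,), (1, 2)))    # I(S;X1X2) = 1.0 bit
```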
Together with the information decomposition framework, Williams and Beer (2010) also defined an information decomposition $SI_{WB}$. Let
$I(S = s; X_i) = \sum_{x_i} P(x_i \mid s) \log \frac{P(s \mid x_i)}{P(s)}, \qquad i = 1, 2,$
be the specific information of the outcome $S = s$ about $X_1$ and $X_2$, respectively. Then
$SI_{WB}(S; X_1, X_2) = \sum_{s} P(s) \, \min\bigl\{ I(S = s; X_1),\, I(S = s; X_2) \bigr\}.$
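The following sketch evaluates these formulas numerically; function names are illustrative. For the AND gate with independent uniform inputs it reproduces the familiar value of approximately 0.311 bits.

```python
# A sketch of the Williams-Beer measure: the shared information is the expected
# minimum, over the two predictors, of the specific information of each outcome s.
import numpy as np

def specific_information(joint_sx, s):
    """I(S=s; X) for a joint array P(s, x) indexed by (s, x); requires P(s) > 0."""
    p_s = joint_sx.sum(axis=1)            # P(s)
    p_x = joint_sx.sum(axis=0)            # P(x)
    p_x_given_s = joint_sx[s] / p_s[s]    # P(x | s)
    p_s_given_x = np.divide(joint_sx[s], p_x,
                            out=np.zeros_like(p_x), where=p_x > 0)  # P(s | x)
    mask = p_x_given_s > 0
    return float(np.sum(p_x_given_s[mask] *
                        np.log2(p_s_given_x[mask] / p_s[s])))

def SI_williams_beer(P):
    """P is a joint array indexed by (s, x1, x2)."""
    p_s = P.sum(axis=(1, 2))
    P_sx1 = P.sum(axis=2)                 # joint of (S, X1)
    P_sx2 = P.sum(axis=1)                 # joint of (S, X2)
    return sum(p_s[s] * min(specific_information(P_sx1, s),
                            specific_information(P_sx2, s))
               for s in range(P.shape[0]) if p_s[s] > 0)

# "AND" example: S = X1 AND X2 with independent uniform inputs.
P = np.zeros((2, 2, 2))
for x1 in (0, 1):
    for x2 in (0, 1):
        P[x1 & x2, x1, x2] = 0.25
print(SI_williams_beer(P))    # ~0.311 bits
```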
$SI_{WB}$ has been criticized, because it assigns relatively large values of shared information, conflating "the same amount of information" with "the same information" (Harder et al., 2013; Griffith and Koch, 2014).
A related information decomposition is the minimum mutual information (MMI) decomposition given by
$SI_{MMI}(S; X_1, X_2) = \min\bigl\{ I(S; X_1),\, I(S; X_2) \bigr\}.$
Even more severely than $SI_{WB}$, this information decomposition conflates "the same amount of information" with "the same information." Still, formally, this definition produces a valid bivariate information decomposition and thus serves as a useful benchmark. The axioms imply that $SI \le SI_{MMI}$ for any other bivariate information decomposition. For multivariate Gaussian variables, many information decompositions actually agree with $SI_{MMI}$ (Barrett, 2015).
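A minimal sketch of the MMI decomposition, with the remaining terms obtained from (1); names are illustrative. The Copy example in the comments illustrates the criticism above: $X_1$ and $X_2$ carry entirely different information about $S$, yet $SI_{MMI}$ equals one bit.

```python
# MMI decomposition: shared information is the smaller of the two mutual
# informations; the remaining terms follow from the linear system (1).
import numpy as np

def _H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def mmi_decomposition(P):
    """P indexed by (s, x1, x2); returns (SI, UI1, UI2, CI) in bits."""
    p_s, p_x1, p_x2 = P.sum(axis=(1, 2)), P.sum(axis=(0, 2)), P.sum(axis=(0, 1))
    I_S_X1 = _H(p_s) + _H(p_x1) - _H(P.sum(axis=2))
    I_S_X2 = _H(p_s) + _H(p_x2) - _H(P.sum(axis=1))
    I_S_X1X2 = _H(p_s) + _H(P.sum(axis=0)) - _H(P)
    SI = min(I_S_X1, I_S_X2)
    return SI, I_S_X1 - SI, I_S_X2 - SI, I_S_X1X2 - max(I_S_X1, I_S_X2)

# Copy example: S = (X1, X2) with independent uniform bits.  SI_MMI = 1 bit,
# even though X1 and X2 carry entirely different information about S.
P = np.zeros((4, 2, 2))
for x1 in (0, 1):
    for x2 in (0, 1):
        P[2 * x1 + x2, x1, x2] = 0.25
print(mmi_decomposition(P))   # (1.0, 0.0, 0.0, 1.0)
```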
To address the criticism of $SI_{WB}$, Harder et al. (2013) introduced a bivariate information decomposition $SI_{red}$ based on a notion of redundant information as follows. For $x_2$ in the support of the marginal distribution of $X_2$, consider the conditional distributions $P(S \mid X_2 = x_2)$, and let
$C_{\langle X_2 \rangle} = \operatorname{conv}\bigl\{ P(S \mid X_2 = x_2) : x_2 \in \operatorname{supp}(P_{X_2}) \bigr\} \subseteq \mathbb{P}(\mathcal{S}) \qquad (3)$
be their convex hull. The redundant information $SI_{red}(S; X_1, X_2)$ is then defined in terms of projected informations, which measure how much of $I(S; X_1)$ survives when the conditional distributions $P(S \mid X_1 = x_1)$ are projected onto the set (3) (and vice versa, with the roles of $X_1$ and $X_2$ exchanged); we refer to Harder et al. (2013) for the explicit formulas.
Motivated by decision-theoretic considerations, Bertschinger et al. (2014) introduced the bivariate information decomposition $\widetilde{UI}$ (often called the BROJA decomposition, eponymously named after the authors of (Bertschinger et al., 2014)). Given $P \in \mathbb{P}(\mathcal{S} \times \mathcal{X}_1 \times \mathcal{X}_2)$, let $\Delta_P$ denote the set of joint distributions of $(S, X_1, X_2)$ that have the same marginals on $(S, X_1)$ and on $(S, X_2)$ as $P$. Then define the unique information that $X_1$ conveys about $S$ with respect to $X_2$ as
$\widetilde{UI}(S; X_1 \setminus X_2) = \min_{Q \in \Delta_P} I_Q(S; X_1 \mid X_2),$
where the subscript in $I_Q$ denotes the joint distribution on which the function is computed. Computation of this decomposition was investigated by Banerjee et al. (2018b). $\widetilde{UI}$ leads to a concept of synergy $\widetilde{CI}$ that agrees with the synergy measure defined by Griffith and Koch (2014).
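The following rough sketch illustrates the optimization behind $\widetilde{UI}$ with a generic SciPy solver; it is not one of the dedicated algorithms of Banerjee et al. (2018b), and names are illustrative. It minimizes $I_Q(S; X_1 \mid X_2)$ over the polytope $\Delta_P$ of joint distributions with fixed $(S, X_1)$- and $(S, X_2)$-marginals.

```python
# A rough numerical sketch of the optimization defining UI~(S; X1 \ X2):
# minimize I_Q(S; X1 | X2) over all joints Q with the same (S,X1)- and
# (S,X2)-marginals as P.  A dedicated convex solver would be preferable.
import numpy as np
from scipy.optimize import minimize

def _H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 1e-15]
    return float(-np.sum(p * np.log2(p)))

def UI_broja(P):
    """Sketch of UI(S; X1 \\ X2) for a joint array P indexed by (s, x1, x2)."""
    shape, n = P.shape, P.size

    def cond_mi(q):                       # I_Q(S; X1 | X2)
        Q = q.reshape(shape)
        return (_H(Q.sum(axis=1)) + _H(Q.sum(axis=0))
                - _H(Q) - _H(Q.sum(axis=(0, 1))))

    # Linear equality constraints: fixed (S,X1)- and (S,X2)-marginals
    # (these already force normalization).
    A, b = [], []
    for s in range(shape[0]):
        for x1 in range(shape[1]):
            row = np.zeros(shape); row[s, x1, :] = 1
            A.append(row.ravel()); b.append(P[s, x1, :].sum())
        for x2 in range(shape[2]):
            row = np.zeros(shape); row[s, :, x2] = 1
            A.append(row.ravel()); b.append(P[s, :, x2].sum())
    A_mat, b_vec = np.array(A), np.array(b)
    cons = {"type": "eq", "fun": lambda q: A_mat @ q - b_vec}
    res = minimize(cond_mi, P.ravel(), method="SLSQP",
                   bounds=[(0.0, 1.0)] * n, constraints=[cons],
                   options={"maxiter": 500, "ftol": 1e-10})
    return float(res.fun)

# XOR example: both unique informations vanish; all information is synergistic.
P = np.zeros((2, 2, 2))
for x1 in (0, 1):
    for x2 in (0, 1):
        P[x1 ^ x2, x1, x2] = 0.25
print(round(UI_broja(P), 4))   # close to 0.0 (up to solver tolerance)
```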
James et al. (2018) define the following bivariate decomposition $SI_{dep}$: Given the joint distribution $P$ of $(S, X_1, X_2)$, let $Q$ be the probability distribution in $\mathbb{P}(\mathcal{S} \times \mathcal{X}_1 \times \mathcal{X}_2)$ that maximizes the entropy among all distributions that agree with $P$ on a certain subset of the pairwise marginals. Similarly, let $Q'$ be the probability distribution in $\mathbb{P}(\mathcal{S} \times \mathcal{X}_1 \times \mathcal{X}_2)$ that maximizes the entropy among all distributions that agree with $P$ on a strictly larger set of pairwise marginals (unlike for $Q$, we do not have an explicit formula for $Q'$). The unique information is then obtained by comparing mutual informations evaluated at $Q$ and at $Q'$; we refer to James et al. (2018) for the precise definition.
This definition is motivated in terms of a lattice of all sensible marginal constraints when maximizing the entropy, as in the definitions of $Q$ and $Q'$ (see (James et al., 2018) for the details).
Decompositions based on the Gács-Körner common information
The information decompositions of Griffith et al. (2014), of Griffith and Ho (2015) and of Kolchinsky (2022) are motivated by the notion of common information due to Gács and Körner (1973) and present three different approaches that try to represent the shared information in terms of a random variable $Q$:
$SI_{(4)}(S; X_1, X_2) = \max\bigl\{ I(S; Q) : Q = f_1(X_1) = f_2(X_2) \text{ almost surely} \bigr\}, \qquad (4)$
$SI_{(5)}(S; X_1, X_2) = \max\, I(S; Q), \qquad (5)$
$SI_{(6)}(S; X_1, X_2) = \max\, I(S; Q), \qquad (6)$
where the optimization runs over all pairs of (deterministic) functions $f_1, f_2$ (for (4)), over all joint distributions of the four random variables $(S, X_1, X_2, Q)$ that extend the joint distribution of $(S, X_1, X_2)$ and satisfy the constraints of Griffith and Ho (2015) (for (5)), and over all pairs of stochastic matrices $\kappa_1, \kappa_2$ that induce the same conditional distribution of $Q$ given $S$ when applied to $X_1$ and to $X_2$, respectively (for (6)). One can show that $SI_{(4)} \le SI_{(5)} \le SI_{(6)}$ (Kolchinsky, 2022).
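The first of these measures can be computed exactly for small systems, assuming (as a data-processing argument suggests) that the maximum in (4) is attained by the Gács-Körner common part of $X_1$ and $X_2$, i.e. by the labelling of the connected components of the bipartite graph on the support of $P(x_1, x_2)$. The following sketch takes this route; names are illustrative.

```python
# Sketch of SI_(4): compute the Gacs-Korner common variable of X1 and X2 as the
# connected components of the bipartite support graph of P(x1, x2), then take
# its mutual information with S.
import numpy as np

def _H(p):
    p = np.asarray(p, dtype=float).ravel(); p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def gk_common_labels(p_x1x2):
    """Component label for each x1 in the support graph of P(x1, x2)."""
    n1, n2 = p_x1x2.shape
    lab1, lab2, comp = [-1] * n1, [-1] * n2, 0
    for start in range(n1):
        if lab1[start] >= 0 or p_x1x2[start].sum() == 0:
            continue
        stack, lab1[start] = [("r", start)], comp
        while stack:
            side, i = stack.pop()
            if side == "r":
                for j in range(n2):
                    if p_x1x2[i, j] > 0 and lab2[j] < 0:
                        lab2[j] = comp; stack.append(("c", j))
            else:
                for k in range(n1):
                    if p_x1x2[k, i] > 0 and lab1[k] < 0:
                        lab1[k] = comp; stack.append(("r", k))
        comp += 1
    return lab1, comp

def SI_gk(P):
    """I(S; Q), where Q is the Gacs-Korner common variable of X1 and X2."""
    p_x1x2 = P.sum(axis=0)
    lab1, ncomp = gk_common_labels(p_x1x2)
    joint_sq = np.zeros((P.shape[0], ncomp))   # accumulate P(s, x1) by component
    for x1, c in enumerate(lab1):
        if c >= 0:
            joint_sq[:, c] += P[:, x1, :].sum(axis=1)
    return _H(joint_sq.sum(axis=1)) + _H(joint_sq.sum(axis=0)) - _H(joint_sq)

P = np.zeros((2, 2, 2))
P[0, 0, 0] = 0.5; P[1, 1, 1] = 0.5     # X1 = X2 = S: the common part is everything
print(SI_gk(P))                         # 1.0
P2 = np.zeros((2, 2, 2))
for x1 in (0, 1):
    for x2 in (0, 1):
        P2[x1 ^ x2, x1, x2] = 0.25      # XOR: the support graph is connected
print(SI_gk(P2))                        # 0.0
```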
Niu and Quinn (2019) presented a bivariate information decomposition, which we denote by $SI_{NQ}$, based on information geometric ideas. While their construction is very elegant, it only works for joint distributions of full support (i.e. $P(s, x_1, x_2) > 0$ for all $(s, x_1, x_2)$). It is unknown whether it can be extended meaningfully to all joint distributions. Numerical evidence exists that a unique continuous extension is possible at least to some joint distributions with restricted support (see examples in (Niu and Quinn, 2019)).
For a given joint distribution $P$ of full support, the construction considers an auxiliary one-parameter family of joint distributions, obtained by reweighting $P$ with a suitable normalizing constant, and defines the unique, shared and complementary informations in terms of Kullback-Leibler divergences between $P$ and members of this family; we refer to Niu and Quinn (2019) for the explicit formulas.
An interesting aspect of these definitions is that, by the Generalized Pythagorean Theorem (see (Amari, 2018)), the defining Kullback-Leibler divergences decompose additively.
The UI construction
Given an information measure $f(S; X_1 \setminus X_2)$ that captures some aspect of unique information but that fails to satisfy the consistency condition (2), one may construct a corresponding bivariate information decomposition as follows:
Lemma 1.
Let $f(S; X_1 \setminus X_2)$ be a non-negative function of the joint distribution of $(S, X_1, X_2)$ that satisfies
$f(S; X_1 \setminus X_2) \le \min\bigl\{ I(S; X_1),\, I(S; X_1 \mid X_2) \bigr\}$
(and likewise with the roles of $X_1$ and $X_2$ exchanged). Then a bivariate information decomposition is given by
$UI_f(S; X_1 \setminus X_2) = \max\bigl\{ f(S; X_1 \setminus X_2),\, f(S; X_2 \setminus X_1) + I(S; X_1) - I(S; X_2) \bigr\},$
with $SI_f$ and $CI_f$ determined by (1).
Proof.
The proof follows just as the proof of (Banerjee and Montúfar, 2020, Proposition 9). ∎
We call the construction of Lemma 1 the UI construction. The unique information $UI_f$ returned by the UI construction is the smallest $UI$-function of any bivariate information decomposition with $UI \ge f$.
As Banerjee et al. (2018a) show, a decomposition based on the (weighted) deficiency is an example of this construction. As another example, as Banerjee et al. (2018a) and Rauh et al. (2019) suggested, the UI construction can be used to obtain bivariate information decompositions from the one- or two-way secret key rates and related information functions that have been defined as bounds on the secret key rates, such as the intrinsic information (Maurer and Wolf, 1997), the reduced intrinsic information (Renner and Wolf, 2003), or the minimum intrinsic information (Gohari and Anantharam, 2010).
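The arithmetic of the UI construction is easy to implement. The following minimal sketch follows the max-form stated in Lemma 1 above; names are illustrative.

```python
# A minimal sketch of the UI construction (Lemma 1): given candidate values
# f(S;X1\X2) and f(S;X2\X1) that may violate (2), lift them to the smallest
# consistent pair of unique informations and complete the decomposition via (1).
def ui_construction(I_S_X1, I_S_X2, I_S_X1X2, f_1, f_2):
    """f_1, f_2: candidate unique informations of X1 and X2."""
    UI_1 = max(f_1, f_2 + I_S_X1 - I_S_X2)   # enforce consistency from above
    UI_2 = UI_1 - I_S_X1 + I_S_X2            # condition (2) now holds exactly
    SI = I_S_X1 - UI_1
    CI = I_S_X1X2 - SI - UI_1 - UI_2
    return SI, UI_1, UI_2, CI

# Example with hypothetical values: an inconsistent candidate pair is lifted.
print(ui_construction(0.5, 0.8, 1.2, f_1=0.1, f_2=0.5))
# -> (0.3, 0.2, 0.5, 0.2)
```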
Other decompositions
Several other measures have been proposed that are motivated by the framework of Williams and Beer (2010) but that leave the framework. Ince (2017) defines a decomposition $I_{ccs}$, which satisfies (1), but in which $SI$, $UI$ and $CI$ may take negative values. The SPAM decomposition of Finn and Lizier (2018) consists of non-negative information measures that decompose the mutual information, but this decomposition has a different structure, with alternating signs and twice as many terms. Both approaches construct "pointwise" decompositions, in the sense that $SI$, $UI$ and $CI$ can be naturally expressed as expectations, in a similar way that entropy and mutual information can be written as expectations (see (Finn and Lizier, 2018) for details). Recent works have proposed other decompositions based on a different lattice (Ay et al., 2020) or singling out features of the target variable (Magri, 2021).
Since these measures do not lie in our direct focus, we omit their definitions. Nevertheless, one can ask the same questions: Are the corresponding information measures continuous, and are they additive? For the constructions in (Finn and Lizier, 2018), both continuity and additivity (as a consequence of a chain rule) are actually postulated. The decomposition in (Ay et al., 2020) is additive. On the other hand, the decomposition of Ince (2017) is neither continuous (as can be seen from its definition) nor additive (since it does not satisfy the identity property).
3 Continuity
Most of the information decompositions that we consider are continuous. Moreover, the UI construction preserves continuity: if the underlying function $f$ is continuous, then so are $UI_f$, $SI_f$ and $CI_f$.
The notable exceptions to continuity are $SI_{red}$ and the decompositions based on the Gács-Körner common information (see Lemmas 2 and 4 below). For $SI_{red}$, this is due to its definition in terms of conditional probabilities. Thus, $SI_{red}$ is continuous when restricted to probability distributions of full support. For the latter decompositions, discontinuities also appear along sequences where the support does not change.
For the information decomposition $SI_{NQ}$, one should keep in mind that it is only defined for probability distributions with full support. It is currently unknown whether it can be continuously extended to all probability distributions.
Clearly, continuity is a desirable property, but is it essential? A discontinuous information measure might still be useful, if the discontinuity is not too severe. For example, the Gács-Körner common information (Gács and Körner, 1973) is an information measure that vanishes except on a set of measure zero (certain distributions that do not have full support). Clearly, such an information measure is difficult to estimate. The decompositions defined in (4), (5) and (6) are related to the Gács-Körner common information, and so their discontinuity is almost as severe (see Lemma 4). Similarly, the $SI_{red}$-decomposition is continuous at distributions of full support. If the discontinuity is well-behaved and well understood, then such a decomposition may still be useful for certain applications. Still, a discontinuous information decomposition challenges the intuition, and any discontinuity must be interpreted (just as the discontinuity of the Gács-Körner common information can be explained and interpreted (Gács and Körner, 1973)).
If an information decomposition is continuous, one may ask whether it is differentiable, at least at probability distributions of full support. For almost all information decompositions that we consider, the answer is no. This is easy to see for those information decompositions that involve a minimum of finitely many smooth functions, such as $SI_{WB}$, $SI_{MMI}$ and $SI_{red}$. For $\widetilde{UI}$, we refer to Rauh et al. (2021). Only $SI_{NQ}$ is differentiable at distributions of full support (personal communication with the authors of Niu and Quinn (2019)).
Lemma 2.
$SI_{red}$ is not continuous.
Proof.
$SI_{red}$ and the corresponding unique informations are defined in terms of the conditional probabilities $P(s \mid x_1)$ and $P(s \mid x_2)$, which are only defined for those $x_1$ and $x_2$ with $P(x_1) > 0$ and $P(x_2) > 0$. Therefore, these functions can be discontinuous when probabilities tend to zero. A concrete example is given below. ∎
Example 3 ($SI_{red}$ is not continuous).
For $t \ge 0$, suppose that the joint distribution $P_t$ of $(S, X_1, X_2)$ has marginal distributions of $(X_1, S)$ and of $(X_2, S)$ that are supported on the state pairs $(1,0)$, $(1,1)$, $(0,1)$, $(0,2)$ and $(0,0)$, $(0,1)$, $(1,1)$, $(1,2)$, respectively, with probabilities depending continuously on $t$ and with some of them tending to zero as $t \to 0$. Observe the symmetry of the construction: the marginal distribution of $(X_1, S)$ equals that of $(X_2, S)$ up to a relabeling of the states.
For $t > 0$, the conditional distributions of $S$ given $X_1$ and given $X_2$ are defined for all states, and the convex hull (3) contains all probability distributions on the relevant states of $S$. Therefore, the two projected informations entering $SI_{red}$ coincide with the corresponding mutual informations: the first equality follows from the definition in (3), the second holds because the marginal distribution of $(X_1, S)$ equals that of $(X_2, S)$ up to relabeling, and the third follows by the same considerations as the first.
For $t = 0$, some of the conditional distributions $P(S \mid X_1 = x_1)$ and $P(S \mid X_2 = x_2)$ are no longer defined, and the convex hull (3) no longer contains all probability distributions on the relevant states of $S$, so that the projected informations become strictly smaller. In total, $\lim_{t \to 0} SI_{red}(P_t) \neq SI_{red}(P_0)$, and $SI_{red}$ is not continuous.
Lemma 4.
The shared information measures $SI_{(4)}$, $SI_{(5)}$ and $SI_{(6)}$ are not continuous.
Proof.
A concrete example is given below.
Example 5 ($SI_{(6)}$ is not continuous).
Suppose that, for $t \ge 0$, the joint distribution $P_t$ of $(S, X_1, X_2)$ has marginal distributions of the pairs $(S, X_1)$ and $(S, X_2)$ that are both supported on the same four state pairs $(0,0)$, $(1,0)$, $(1,1)$ and $(2,1)$, with probabilities depending continuously on $t$ in such a way that the two marginal distributions coincide exactly when $t = 0$.
Recall the definition of $SI_{(6)}$ in (6). For $t = 0$, the marginal distributions of the pairs $(S, X_1)$ and $(S, X_2)$ are identical, whence $SI_{(6)}(P_0) = I(S; X_1) = I(S; X_2) > 0$.
Now let $t > 0$. According to the definition of $SI_{(6)}$, we need to find stochastic matrices $\kappa_1$ and $\kappa_2$ that satisfy the condition
$\sum_{x_1} \kappa_1(q \mid x_1)\, P_t(x_1 \mid s) = \sum_{x_2} \kappa_2(q \mid x_2)\, P_t(x_2 \mid s) \quad \text{for all } q \text{ and all } s \text{ with } P_t(s) > 0. \qquad (7)$
Working through condition (7) for the states of this example shows that any feasible $Q$ must be independent of $S$ (and of $X_1$ and $X_2$). Therefore, $SI_{(6)}(P_t) = 0$ for $t > 0$, and $SI_{(6)}$ is not continuous at $P_0$.
Asymptotic continuity and locking
We discuss two further related properties, namely asymptotic continuity and locking (Banerjee et al., 2018a; Rauh et al., 2019), which we briefly explain below. $\widetilde{UI}$ is asymptotically continuous and does not exhibit locking. It is not known whether other information decompositions satisfy these properties.
Operational quantities in information theory such as channel capacities and compression rates are usually defined in the spirit of Shannon (1948), that is, in the asymptotic regime of many independent uses of the channel or many independent realizations of the underlying source distribution. In this regime, real-valued functionals of distributions that are asymptotically continuous are especially useful, as they often provide lower or upper bounds for operational quantities of interest (Cerf et al., 2002; Banerjee et al., 2018a; Chitambar and Gour, 2019).
Asymptotic continuity is a stronger notion of continuity that controls the deviation of the function relative to the dimension of the underlying state space (Synak-Radtke and Horodecki, 2006; Chitambar and Gour, 2019; Fannes, 1973; Winter, 2016). Concretely, a function $F$ on $\mathbb{P}(\mathcal{X})$ is said to be asymptotically continuous if
$|F(P) - F(Q)| \le c\, \|P - Q\|_1 \log|\mathcal{X}| + g\bigl(\|P - Q\|_1\bigr)$
for all joint distributions $P, Q \in \mathbb{P}(\mathcal{X})$, where $c$ is some constant and $g$ is any continuous function converging to zero as $\|P - Q\|_1 \to 0$ (Chitambar and Gour, 2019).
As an example, entropy is asymptotically continuous (see, e.g., (Csiszár and Körner, 2011, Lemma 2.7)): for any $P, Q \in \mathbb{P}(\mathcal{X})$ with total variation distance $\epsilon = \frac{1}{2}\|P - Q\|_1$,
$|H(P) - H(Q)| \le \epsilon \log|\mathcal{X}| + h(\epsilon),$
where $h$ is the binary entropy function, $h(x) = -x \log x - (1 - x)\log(1 - x)$ for $0 < x < 1$ and $h(0) = h(1) = 0$. Likewise, the conditional mutual information satisfies asymptotic continuity in the following sense (Renner and Wolf, 2003; Christandl and Winter, 2004): for any two joint distributions $P, Q$ of random variables $(X, Y, Z)$ with $\frac{1}{2}\|P - Q\|_1 \le \epsilon$,
$|I_P(X; Y \mid Z) - I_Q(X; Y \mid Z)| \le c\,\epsilon \log|\mathcal{X}| + g(\epsilon)$
for a constant $c$ and a continuous function $g$ with $g(\epsilon) \to 0$ as $\epsilon \to 0$.
Note that the right-hand side of the above inequality does not depend explicitly on the cardinality of $\mathcal{Z}$.
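The following is a small numerical illustration of the entropy bound above; it is a generic check with random distributions, not code from any of the cited references.

```python
# Numerical illustration: |H(P) - H(Q)| stays below eps*log|X| + h(eps),
# so the bound grows only logarithmically with the alphabet size.
import numpy as np

rng = np.random.default_rng(0)

def H(p):
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def h(x):                                   # binary entropy
    return 0.0 if x <= 0 or x >= 1 else float(-x*np.log2(x) - (1-x)*np.log2(1-x))

for size in (4, 64, 1024):
    P = rng.dirichlet(np.ones(size))
    Q = rng.dirichlet(np.ones(size))
    eps = 0.5 * float(np.abs(P - Q).sum())  # total variation distance
    lhs = abs(H(P) - H(Q))
    rhs = eps * np.log2(size) + h(eps)
    print(f"|X|={size:5d}  eps={eps:.3f}  |dH|={lhs:.3f}  bound={rhs:.3f}")
```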
As Banerjee et al. (2018a) show, $\widetilde{UI}$ is asymptotically continuous:
Lemma 6.
For any two joint distributions $P$ and $P'$ of $(S, X_1, X_2)$ and any $\epsilon \ge 0$, if $\frac{1}{2}\|P - P'\|_1 \le \epsilon$, then
$\bigl|\widetilde{UI}_P(S; X_1 \setminus X_2) - \widetilde{UI}_{P'}(S; X_1 \setminus X_2)\bigr| \le c\,\epsilon \log|\mathcal{S}| + g(\epsilon)$
for some constant $c$ and some bounded, continuous function $g$ that converges to zero as $\epsilon \to 0$.
Locking is motivated by the following property of the conditional mutual information (Renner and Wolf, 2003; Christandl et al., 2007): For arbitrary discrete random variables $X$, $Y$, $Z$ and $U$,
$I(X; Y \mid Z U) \ge I(X; Y \mid Z) - H(U). \qquad (8)$
Thus, the conditional mutual information does not exhibit "locking," in the sense that any additional side information $U$ given to $Z$ cannot reduce the conditional mutual information by more than the entropy $H(U)$ of the side information.
As Rauh et al. (2019) show, $\widetilde{UI}$ does not exhibit locking:
Lemma 7.
For jointly distributed random variables $(S, X_1, X_2, U)$,
$\widetilde{UI}\bigl(S; X_1 \setminus (X_2, U)\bigr) \ge \widetilde{UI}(S; X_1 \setminus X_2) - H(U). \qquad (9)$
This property is useful, for example, in a cryptographic context (Rauh et al., 2019), where it ensures that the unique information that $X_1$ has about $S$ w.r.t. an adversary $X_2$ cannot "unlock", i.e., drop by an arbitrarily large amount, upon giving away a bit of side information $U$ to the adversary.
4 Additivity
Definition 8.
An information measure $f$ (i.e. a function of the joint distribution of the random variables $(S, X_1, X_2)$) is additive if and only if the following holds: If $(S, X_1, X_2)$ is independent of $(S', X_1', X_2')$, then
$f\bigl((S, S'); (X_1, X_1'), (X_2, X_2')\bigr) = f(S; X_1, X_2) + f(S'; X_1', X_2').$
The information measure $f$ is superadditive if, under the same assumptions,
$f\bigl((S, S'); (X_1, X_1'), (X_2, X_2')\bigr) \ge f(S; X_1, X_2) + f(S'; X_1', X_2'),$
and subadditive if the reverse inequality holds.
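Definition 8 can be checked numerically: the following sketch builds the product of two marginally independent systems and compares a given measure on the product with the sum of its values on the parts, using $SI_{MMI}$ as a test case (this already anticipates Theorem 12 below). Names are illustrative.

```python
# Numerical check of (non-)additivity in the sense of Definition 8 for SI_MMI.
import numpy as np

def _H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def SI_mmi(P):                        # P indexed by (s, x1, x2)
    I1 = _H(P.sum(axis=(1, 2))) + _H(P.sum(axis=(0, 2))) - _H(P.sum(axis=2))
    I2 = _H(P.sum(axis=(1, 2))) + _H(P.sum(axis=(0, 1))) - _H(P.sum(axis=1))
    return min(I1, I2)

def product_system(P, Q):
    """Joint of ((S,S'), (X1,X1'), (X2,X2')) for independent systems P and Q."""
    R = np.einsum("abc,def->adbecf", P, Q)     # interleave the two systems
    s  = P.shape[0] * Q.shape[0]
    x1 = P.shape[1] * Q.shape[1]
    x2 = P.shape[2] * Q.shape[2]
    return R.reshape(s, x1, x2)

# System 1: S is a copy of X1 (X2 is independent noise); system 2: the reverse.
P1 = np.zeros((2, 2, 2)); P2 = np.zeros((2, 2, 2))
for a in (0, 1):
    for b in (0, 1):
        P1[a, a, b] = 0.25        # S = X1
        P2[a, b, a] = 0.25        # S = X2
print(SI_mmi(P1), SI_mmi(P2))                 # 0.0 0.0
print(SI_mmi(product_system(P1, P2)))         # 1.0 -> strictly superadditive
```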
The decomposition $\widetilde{SI}$ of Bertschinger et al. (2014) is additive:
Lemma 9.
$\widetilde{SI}$ is additive.
Proof.
This is (Bertschinger et al., 2014, Lemma 19). ∎
The information decompositions motivated by the Gács-Körner common information as defined in (4), (5) and (6) are additive (Theorem 13). All other information decompositions that we consider are not additive. However, in all information decompositions that we consider, $SI$ is superadditive and $UI$ is subadditive (Theorem 12).
Again, additivity is a desirable property, but is it essential? As in the case of continuity, we argue that non-additivity challenges the intuition, and any non-additivity must be interpreted. Why is it plausible that the shared information contained in two independent pairs is more than the sum of the individual shared informations, and how can one explain that the unique information is subadditive?
A related weaker property is additivity under i.i.d. sequences, i.e. when, in the definition of additivity, the vectors $(S, X_1, X_2)$ and $(S', X_1', X_2')$ are identically distributed. One can show that several, but not all, of the decompositions above (and, of course, $\widetilde{SI}$) are additive under i.i.d. sequences. The UI construction yields additivity under i.i.d. sequences whenever the underlying function $f$ is additive under i.i.d. sequences. The proofs of these statements are similar to the proof of additivity given below and are omitted. For the remaining decompositions, additivity under i.i.d. sequences is not as easy to see, and so we currently do not know whether it holds.
Lemma 10.
1. If $f$ and $g$ are superadditive, then $\min\{f, g\}$ is superadditive.
2. If, in addition, there exist distributions $P$ and $P'$ with $f(P) < g(P)$ and $g(P') < f(P')$, then $\min\{f, g\}$ is not additive.
Proof.
1. With $P$ and $P'$ as in the definition of superadditivity, and writing $P \otimes P'$ for the joint distribution of the two independent systems,
$\min\{f, g\}(P \otimes P') = \min\bigl\{f(P \otimes P'),\, g(P \otimes P')\bigr\} \ge \min\bigl\{f(P) + f(P'),\, g(P) + g(P')\bigr\} \ge \min\{f(P), g(P)\} + \min\{f(P'), g(P')\}.$
2. In this inequality, if $f(P) < g(P)$ and $g(P') < f(P')$, then the right-hand side equals $f(P) + g(P')$, which is strictly smaller than both $f(P) + f(P')$ and $g(P) + g(P')$; this makes the inequality strict.∎
As a consequence:
Lemma 11.
If $SI$ is superadditive but not additive, then the corresponding $UI$ is subadditive and $CI$ is superadditive, and neither of them is additive.
Theorem 12.
The shared information measures $SI_{WB}$, $SI_{MMI}$, $SI_{red}$, $SI_{dep}$ and $SI_{NQ}$ are superadditive, but not additive.
Proof.
For $SI_{MMI}$, the claim follows directly from Lemma 10, applied to the additive functions $I(S; X_1)$ and $I(S; X_2)$. A similar argument applies to $SI_{red}$ and to $SI_{dep}$, since the quantities entering the respective definitions are additive or superadditive. For $SI_{WB}$, the same argument as in the proof of Lemma 10 applies, since the specific information is additive, in the sense that the specific information of a pair of outcomes in a product of two independent systems is the sum of the corresponding specific informations.
Next, consider $SI_{NQ}$. For a product of two independent systems, the quantities in its definition factorize over the two subsystems, where each factor only involves the marginal distribution of the corresponding subsystem. It follows that $SI_{NQ}$ of the product is at least the sum of the values on the two subsystems, and if the two subsystems are chosen suitably, strict inequality holds. ∎
Theorem 13.
The shared information measures $SI_{(4)}$, $SI_{(5)}$ and $SI_{(6)}$ defined in (4), (5) and (6) are additive.
Proof.
In the following, recall that $SI_{(4)}$, $SI_{(5)}$ and $SI_{(6)}$ denote the shared information measures defined in (4), (5) and (6), respectively.
• For $SI_{(4)}$, observe that the Gács-Körner common part of the pair $(X_1, X_1')$ and the pair $(X_2, X_2')$ of two independent systems is the pair of the two individual common parts. Additivity then follows from the additivity of mutual information for independent systems.
• To see that $SI_{(5)}$ is superadditive, let $Q$ and $Q'$ be optimal for the two subsystems. The joint distribution of $\bigl((S, S'), (X_1, X_1'), (X_2, X_2'), (Q, Q')\bigr)$ obtained by combining the two extensions independently is feasible for the optimization problem in the definition of $SI_{(5)}$ for the product system, whence $SI_{(5)}$ of the product is at least $I(S; Q) + I(S'; Q')$. To prove subadditivity, let $Q$ be optimal in the definition (5) for the product system, with $(S, X_1, X_2)$ and $(S', X_1', X_2')$ as in Definition 8. Using the chain rule, $I(S S'; Q)$ splits into two contributions, and from $Q$ one can construct random variables that are feasible for the two subsystems separately and whose mutual informations with $S$ and with $S'$ add up to at least $I(S S'; Q)$. The statement follows.
• The proof of superadditivity for $SI_{(6)}$ follows line by line the proof for $SI_{(5)}$. To prove subadditivity for $SI_{(6)}$, one shows that every random variable $Q$ that is feasible for the product system in the sense of (6) can be replaced by a random variable arising from an extension of the joint distribution as in (5), without decreasing the relevant mutual informations; here the first inequality follows from (5), and the second one was discussed following (6). The claim then follows from the subadditivity of $SI_{(5)}$.
∎
5 Conclusions
We have studied measures that have been defined for bivariate information decompositions, asking whether they are continuous and/or additive. The only information decomposition that is both continuous and additive is $\widetilde{SI}$, the decomposition of Bertschinger et al. (2014).
While there are many continuous information decompositions, it seems difficult to construct differentiable information decompositions: currently, the only differentiable example is $SI_{NQ}$ (which, however, is only defined in the interior of the probability simplex). It would also be interesting to know which further regularity properties, such as asymptotic continuity and the absence of locking, are satisfied by the other proposed information decompositions.
It also seems to be difficult to construct additive information decompositions, with $\widetilde{SI}$ and the Gács-Körner-based measures $SI_{(4)}$, $SI_{(5)}$ and $SI_{(6)}$ being the only ones. In contrast, many known information decompositions are additive under i.i.d. sequences. In the other direction, it would be worthwhile to have another look at stronger versions of additivity, such as chain rule-type properties. Bertschinger et al. (2013) concluded that such chain rules prevent a straightforward extension of decompositions to the non-bivariate case along the lines of Williams and Beer (2010). It has been argued (see, e.g., Rauh (2017)) that a general information decomposition likely needs a structure that differs from the proposal of Williams and Beer (2010), which makes chain rules worth revisiting. Recent work (Ay et al., 2020) has proposed an additive decomposition based on a different lattice.
Acknowledgement
PB and GM have been supported by the ERC under the European Union’s Horizon 2020 research and innovation programme (grant agreement no 757983).
References
- Amari (2018) S. Amari. Information Geometry and Its Applications. Springer Publishing Company, Incorporated, 2018.
- Ay et al. (2020) N. Ay, D. Polani, and N. Virgo. Information decomposition based on cooperative game theory. Kybernetika, 56(5):979–1014, 2020.
- Banerjee and Montúfar (2020) P. K. Banerjee and G. Montúfar. The variational deficiency bottleneck. In 2020 International Joint Conference on Neural Networks (IJCNN), pages 1–8, 2020.
- Banerjee et al. (2018a) P. K. Banerjee, E. Olbrich, J. Jost, and J. Rauh. Unique informations and deficiencies. In 56th Annual Allerton Conference on Communication, Control, and Computing, pages 32–38, 2018a.
- Banerjee et al. (2018b) P. K. Banerjee, J. Rauh, and G. Montúfar. Computing the unique information. In IEEE International Symposium on Information Theory (ISIT), pages 141–145, June 2018b.
- Barrett (2015) A. B. Barrett. Exploration of synergistic and redundant information sharing in static and dynamical Gaussian systems. Phys. Rev. E, 91:052802, 2015.
- Bell (2003) A. J. Bell. The co-information lattice. In Proc. Fourth Int. Symp. Independent Component Analysis and Blind Signal Separation (ICA 03), 2003.
- Bertschinger and Rauh (2014) N. Bertschinger and J. Rauh. The Blackwell relation defines no lattice. In IEEE International Symposium on Information Theory (ISIT), pages 2479–2483, 2014.
- Bertschinger et al. (2013) N. Bertschinger, J. Rauh, E. Olbrich, and J. Jost. Shared information — new insights and problems in decomposing information in complex systems. In Proc. ECCS 2012, pages 251–269. Springer, 2013.
- Bertschinger et al. (2014) N. Bertschinger, J. Rauh, E. Olbrich, J. Jost, and N. Ay. Quantifying unique information. Entropy, 16(4):2161–2183, 2014.
- Cerf et al. (2002) N. J. Cerf, S. Massar, and S. Schneider. Multipartite classical and quantum secrecy monotones. Physical Review A, 66(4):042309, 2002.
- Chitambar and Gour (2019) E. Chitambar and G. Gour. Quantum resource theories. Reviews of Modern Physics, 91(2):025001, 2019.
- Christandl and Winter (2004) M. Christandl and A. Winter. “Squashed entanglement” - An additive entanglement measure. Journal of Mathematical Physics, 45(3):829–840, 2004.
- Christandl et al. (2007) M. Christandl, A. Ekert, M. Horodecki, P. Horodecki, J. Oppenheim, and R. Renner. Unifying classical and quantum key distillation. In 4th Theory of Cryptography Conference (TCC), pages 456–478, 2007.
- Csiszár and Körner (2011) I. Csiszár and J. Körner. Information theory: Coding theorems for discrete memoryless systems. Cambridge University Press, 2011.
- Csiszár (2008) I. Csiszár. Axiomatic characterizations of information measures. Entropy, 10:261–273, 2008.
- Fannes (1973) M. Fannes. A continuity property of the entropy density for spin lattice systems. Communications in Mathematical Physics, 31:291–294, 1973.
- Finn and Lizier (2018) C. Finn and J. T. Lizier. Pointwise partial information decomposition using the specificity and ambiguity lattices. Entropy, 20(4), 2018.
- Gács and Körner (1973) P. Gács and J. Körner. Common information is far less than mutual information. Problems of Control and Information Theory, 2(2):149–162, 1973.
- Gohari and Anantharam (2010) A. A. Gohari and V. Anantharam. Information-theoretic key agreement of multiple terminals-Part I. IEEE Transactions on Information Theory, 56(8):3973–3996, 2010.
- Griffith and Ho (2015) V. Griffith and T. Ho. Quantifying redundant information in predicting a target random variable. Entropy, 17(7):4644–4653, 2015.
- Griffith and Koch (2014) V. Griffith and C. Koch. Quantifying synergistic mutual information. In M. Prokopenko, editor, Guided Self-Organization: Inception, volume 9, pages 159–190. Springer Berlin Heidelberg, 2014.
- Griffith et al. (2014) V. Griffith, E. K. P. Chong, R. G. James, C. J. Ellison, and J. P. Crutchfield. Intersection information based on common randomness. Entropy, 16(4):1985–2000, 2014.
- Harder et al. (2013) M. Harder, C. Salge, and D. Polani. A bivariate measure of redundant information. Phys. Rev. E, 87:012130, Jan 2013.
- Ince (2017) R. Ince. Measuring multivariate redundant information with pointwise common change in surprisal. Entropy, 19(7):318, 2017.
- James et al. (2018) R. James, J. Emenheiser, and J. Crutchfield. Unique information via dependency constraints. Journal of Physics A, 52(1):014002, 2018.
- Kolchinsky (2022) A. Kolchinsky. A novel approach to the partial information decomposition. Entropy, 24(3):403, 2022.
- Magri (2021) C. Magri. On shared and multiple information (version 4). arXiv:2107.11032, 2021.
- Matveev and Portegies (2017) R. Matveev and J. W. Portegies. Tropical limits of probability spaces, Part I: The intrinsic Kolmogorov-Sinai distance and the asymptotic equipartition property for configurations. arXiv:1704.00297, 2017.
- Maurer and Wolf (1997) U. Maurer and S. Wolf. The intrinsic conditional mutual information and perfect secrecy. In IEEE International Symposium on Information Theory (ISIT), 1997.
- McGill (1954) W. McGill. Multivariate information transmission. IRE Transactions on Information Theory, 4(4):93–111, 1954.
- Niu and Quinn (2019) X. Niu and C. Quinn. A measure of synergy, redundancy, and unique information using information geometry. In IEEE International Symposium on Information Theory (ISIT), 2019.
- Raginsky (2011) M. Raginsky. Shannon meets Blackwell and Le Cam: Channels, codes, and statistical experiments. In IEEE International Symposium on Information Theory, pages 1220–1224, July 2011.
- Rauh (2017) J. Rauh. Secret sharing and shared information. Entropy, 19(11):601, 2017.
- Rauh et al. (2014) J. Rauh, N. Bertschinger, E. Olbrich, and J. Jost. Reconsidering unique information: Towards a multivariate information decomposition. In IEEE International Symposium on Information Theory (ISIT), pages 2232–2236, 2014.
- Rauh et al. (2019) J. Rauh, P. K. Banerjee, E. Olbrich, and J. Jost. Unique information and secret key decompositions. In IEEE International Symposium on Information Theory (ISIT), pages 3042–3046, 2019.
- Rauh et al. (2021) J. Rauh, M. Schünemann, and J. Jost. Properties of unique information. Kybernetika, 57:383–403, 2021.
- Renner and Wolf (2003) R. Renner and S. Wolf. New bounds in secret-key agreement: The gap between formation and secrecy extraction. In Advances in Cryptology - EUROCRYPT, pages 562–577, 2003.
- Shannon (1948) C. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379–423 and 623–656, 1948.
- Studený and Vejnarová (1998) M. Studený and J. Vejnarová. The multiinformation function as a tool for measuring stochastic dependence. In Learning in graphical models, pages 261–297. Springer, 1998.
- Synak-Radtke and Horodecki (2006) B. Synak-Radtke and M. Horodecki. On asymptotic continuity of functions of quantum states. Journal of Physics A: Mathematical and General, 39(26):L423, 2006.
- Watanabe (1960) S. Watanabe. Information theoretical analysis of multivariate correlation. IBM Journal of Research and Development, 4(1):66–82, 1960.
- Williams and Beer (2010) P. Williams and R. Beer. Nonnegative decomposition of multivariate information. arXiv:1004.2515v1, 2010.
- Winter (2016) A. Winter. Tight uniform continuity bounds for quantum entropies: Conditional entropy, relative entropy distance and energy constraints. Communications in Mathematical Physics, 347:291–313, 2016.