Supplemental material for Visualizing theory space: Isometric embedding of probabilistic predictions, from the Ising model to the cosmic microwave background
This is the supplemental material accompanying Visualizing theory space: Isometric embedding of probabilistic predictions, from the Ising model to the cosmic microwave background.
Section I discusses the general non-Euclidean embedding provided by models whose errors depend upon parameter values, with particular reference to the cosmic microwave background correlation function, where the fluctuations themselves are the predictions of the model. Section II shows that $2\sqrt{P(x\,|\,\theta)}$, twice the square root of the likelihood of a fit, acts as an isometric embedding of a general probabilistic model onto an $N$-sphere of radius two. Section II.1 illustrates the failures of this $N$-sphere embedding as a practical tool whenever the data provide good discrimination between the predictions of different models. (The intensive embedding proposed in the main text bypasses this challenge by taking a formal limit of the embedding as the amount of data goes to zero.) Here we focus on the $\Lambda$CDM model and cosmic microwave background (CMB) anisotropy predictions, complementing the discussion of the Ising model in the main text. Section III shows that our intensive manifold embedding is isometric, faithfully representing the distances between predictions of nearby models as characterized by the Fisher information. Finally, Section IV describes in detail how one generalizes principal component analysis to the intensive manifold embedding, fleshing out the discussion in the main text by explicitly centering the model predictions before taking the limit of zero replicas and implementing the singular value decomposition.
I Cosmic microwave correlations as a non-Euclidean embedding
The anisotropy in the CMB radiation can be characterized by a direction-dependent intensity matrix whose components can be recognized as three of the four Stokes parameters: I, Q, and U. The Q and U polarization maps can be made independent of the Stokes parameter measurement basis by separating them into divergence (E) and curl (B) components, generating three maps of cosmological interest: the temperature fluctuation map T and the two polarization maps, E and B. These can be expanded in spherical harmonics,
X(\hat{n}) = \sum_{\ell m} a^{X}_{\ell m}\, Y_{\ell m}(\hat{n}), \qquad X \in \{T, E, B\}. \qquad (S1)
The anisotropies are expected to be (approximately) Gaussian. All of the Gaussian information can be extracted from the angular power spectra, which are defined as the cross-correlations of the coefficients in the expansion and written as
\big\langle a^{X\,*}_{\ell m}\, a^{Y}_{\ell' m'} \big\rangle = C_\ell^{XY}\, \delta_{\ell \ell'}\, \delta_{m m'}. \qquad (S2)
Using this, we can construct a correlation matrix for the fluctuations,
\Sigma_\ell = \begin{pmatrix} C_\ell^{TT} & C_\ell^{TE} & C_\ell^{TB} \\ C_\ell^{TE} & C_\ell^{EE} & C_\ell^{EB} \\ C_\ell^{TB} & C_\ell^{EB} & C_\ell^{BB} \end{pmatrix}. \qquad (S3)
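As an illustration, here is a minimal sketch (in Python/NumPy, added for this presentation and not part of the original analysis) of assembling the per-$\ell$ covariance matrix of Eq. (S3) from theory spectra; the input arrays cl_tt, cl_te, cl_ee, cl_bb are assumed to be indexed by multipole, for instance as produced by a Boltzmann code such as CAMB (Lewis et al., 2000).

import numpy as np

def covariance_matrix(cl_tt, cl_te, cl_ee, cl_bb, ell):
    # 3x3 covariance of (a^T_lm, a^E_lm, a^B_lm) at a single multipole;
    # the TB and EB cross-spectra vanish in standard LambdaCDM and are set to zero here.
    return np.array([[cl_tt[ell], cl_te[ell], 0.0],
                     [cl_te[ell], cl_ee[ell], 0.0],
                     [0.0,        0.0,        cl_bb[ell]]])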
The values of $C_\ell^{XY}$ depend on the $\Lambda$CDM parameters, and likelihood analyses of CMB data fit with such a correlation function have been extensively studied, as they are invaluable for fitting and forecasting CMB measurements (e.g., Planck Collaboration (2015); Hamimeche and Lewis (2008); Tegmark (1997); Abazajian et al. (2016)). The probability of a fit to these data can be expressed as
P(\{a_{\ell m}\}\,|\,\theta) = \prod_{\ell m} \frac{\exp\!\big(-\tfrac{1}{2}\, a_{\ell m}^{\dagger}\, \Sigma_\ell^{-1}(\theta)\, a_{\ell m}\big)}{\sqrt{(2\pi)^3\, |\Sigma_\ell(\theta)|}}, \qquad a_{\ell m} = \big(a^{T}_{\ell m},\, a^{E}_{\ell m},\, a^{B}_{\ell m}\big). \qquad (S4)
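A hedged sketch of how the logarithm of Eq. (S4) might be evaluated numerically at a single multipole, treating the $2\ell+1$ harmonic coefficient vectors as real for simplicity (the variable names are illustrative, not from the original pipeline):

import numpy as np

def log_likelihood_ell(alm, sigma_ell):
    # alm: array of shape (2*ell+1, 3) holding (a^T, a^E, a^B) for each m
    # sigma_ell: 3x3 model covariance from Eq. (S3)
    inv = np.linalg.inv(sigma_ell)
    _, logdet = np.linalg.slogdet(sigma_ell)
    quad = np.einsum('mi,ij,mj->', alm, inv, alm)   # sum over m of a^T Sigma^{-1} a
    return -0.5 * (quad + alm.shape[0] * logdet)    # up to an additive constant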
This conditional probability defines the likelihood, $\mathcal{L}(\theta) = P(\{a_{\ell m}\}\,|\,\theta)$ (Planck Collaboration, 2015; Tegmark, 1997). The metric is given by the Fisher Information Matrix (FIM),
g_{\mu\nu}(\theta) = -\int \mathcal{D}a\; P(\{a_{\ell m}\}\,|\,\theta)\, \frac{\partial^2 \log P(\{a_{\ell m}\}\,|\,\theta)}{\partial \theta^\mu\, \partial \theta^\nu}. \qquad (S5)
We can evaluate this integral by examining the second derivatives of $\log P$:
-\frac{\partial^2 \log P}{\partial \theta^\mu \partial \theta^\nu} = \frac{1}{2} \sum_{\ell m} \Big[ \operatorname{Tr}\!\big(\Sigma_\ell^{-1}\, \partial_\mu \partial_\nu \Sigma_\ell\big) - \operatorname{Tr}\!\big(\Sigma_\ell^{-1}\, \partial_\mu \Sigma_\ell\, \Sigma_\ell^{-1}\, \partial_\nu \Sigma_\ell\big) + a_{\ell m}^{\dagger}\, \partial_\mu \partial_\nu \big(\Sigma_\ell^{-1}\big)\, a_{\ell m} \Big]. \qquad (S6)
This expansion can be combined with Eq. (S5) to extract all terms independent of the data. Thus, the first two terms in the sum can be pulled out of the integral entirely. The remaining term is harder; to evaluate it we make use of the following integral for a symmetric, positive-definite matrix $A$ and a symmetric matrix $B$:
\int d^{n}x\; \frac{\exp\!\big(-\tfrac{1}{2}\, x^{T} A\, x\big)}{\sqrt{(2\pi)^n\, |A^{-1}|}}\; x^{T} B\, x = \operatorname{Tr}\!\big(A^{-1} B\big). \qquad (S7)
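The identity (S7) is easily checked by Monte Carlo; a small sketch we add here purely as an illustration:

import numpy as np

rng = np.random.default_rng(0)
n = 4
M = rng.normal(size=(n, n)); A = M @ M.T + n * np.eye(n)   # symmetric, positive definite
S = rng.normal(size=(n, n)); B = 0.5 * (S + S.T)           # symmetric
x = rng.multivariate_normal(np.zeros(n), np.linalg.inv(A), size=200_000)
mc = np.einsum('ki,ij,kj->k', x, B, x).mean()              # <x^T B x> under N(0, A^{-1})
print(mc, np.trace(np.linalg.inv(A) @ B))                  # the two should agree to ~1%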
The identity (S7) allows us to solve Eq. (S5), setting $A = \Sigma_\ell^{-1}$ and $B = \partial_\mu \partial_\nu \big(\Sigma_\ell^{-1}\big)$. We can now combine all the pieces and obtain a formula for the FIM:
g_{\mu\nu}(\theta) = \sum_{\ell} \frac{2\ell + 1}{2}\, \operatorname{Tr}\!\left[ \Sigma_\ell^{-1}\, \frac{\partial \Sigma_\ell}{\partial \theta^\mu}\, \Sigma_\ell^{-1}\, \frac{\partial \Sigma_\ell}{\partial \theta^\nu} \right]. \qquad (S8)
We can compare this to previous results for FIM derivations (Perotto et al., 2006; Tegmark, 1997) and confirm that we obtain the same result. We can decompose this as a sum over $\ell$ and the different spectra to obtain:
g_{\mu\nu} = \sum_{\ell} \sum_{A, B} \frac{\partial C_\ell^{A}}{\partial \theta^\mu}\; \mathcal{G}^{\ell}_{AB}\; \frac{\partial C_\ell^{B}}{\partial \theta^\nu} = \big(J^{T} \mathcal{G}\, J\big)_{\mu\nu}, \qquad (S9)
where $J^{\ell}_{A\mu} = \partial C_\ell^{A} / \partial \theta^\mu$ is the tensor of partial derivatives (with $A$, $B$ running over the spectra $TT$, $TE$, $EE$, ...), and $\mathcal{G}^{\ell}_{AB}$ is given by Eq. (S8) with derivatives taken with respect to the $C_\ell^{A}$. We can express $\mathcal{G}$ as a block-diagonal matrix because the $a_{\ell m}$ are uncorrelated for different values of $\ell$, and in this form $\mathcal{G}$ represents the FIM when the parameters of interest are the $C_\ell^{A}$ themselves. Note that $\mathcal{G}$ is not constant in the coordinates given by the $C_\ell^{A}$.
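For concreteness, a minimal sketch of evaluating Eq. (S8) numerically; the containers sigma (per-$\ell$ covariances) and dsigma (their parameter derivatives, e.g. from finite differences of Boltzmann-code outputs) are assumed inputs, not part of the original pipeline.

import numpy as np

def fisher_matrix(sigma, dsigma, ells):
    # sigma[i]       : 3x3 covariance Sigma_ell at multipole ells[i]
    # dsigma[mu][i]  : derivative of Sigma_ell with respect to parameter mu
    npar = len(dsigma)
    g = np.zeros((npar, npar))
    for i, ell in enumerate(ells):
        inv = np.linalg.inv(sigma[i])
        for mu in range(npar):
            for nu in range(npar):
                g[mu, nu] += 0.5 * (2 * ell + 1) * np.trace(
                    inv @ dsigma[mu][i] @ inv @ dsigma[nu][i])
    return g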
For regular least-squares fitting, $\mathcal{G}$ would simply be a constant matrix representing experimental uncertainty. However, for CMB spectra it is parameter dependent; it varies with the $C_\ell$. Geometrically, $\mathcal{G}$ can be interpreted as the metric of the embedding space. Since it varies with the $C_\ell$, it produces a non-Euclidean embedding. Visualizing the model manifold in this space is therefore problematic, since the space is warped and distorted and distances are not faithfully represented, as shown in Fig. S1(a). This problem can be solved if we instead consider the probability distributions that the $C_\ell$ correspond to for different parameters. In Table S1 we present the range of $\Lambda$CDM parameters explored in our model manifold. In the following section we explore these probability distributions for different parameter values.
II $N$-sphere isometric embedding and its failings
The set of probability distributions from a model generates a ‘probability simplex’, since they are all normalized to one. The Fisher Information Matrix in this space is non-Euclidean; it is diagonal with entries that are parameter dependent. For instance, in this space the Fisher Information for the Ising model is given by
g_{s s'} = \frac{\delta_{s s'}}{P(s\,|\,\theta)}, \qquad P(s\,|\,\theta) = \frac{e^{-\beta H(s)}}{Z(\theta)}. \qquad (S10)
This is a non-Euclidean metric, since it is not proportional to the identity $\delta_{s s'}$: it has a parameter-dependent component given by the Boltzmann likelihood of being in a given spin state. This is similar to the manifold illustrated in Fig. S1(a), which shows the non-Euclidean embedding of CMB spectra in a space whose metric is also point dependent, as detailed in Section I.
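To make Eq. (S10) concrete, here is a toy sketch that pulls this simplex metric back to parameter space for a small one-dimensional Ising chain (our illustration only; the main text uses the two-dimensional model), recovering the parameter-space FIM $g_{\mu\nu} = \sum_s \partial_\mu P(s)\, \partial_\nu P(s) / P(s)$.

import numpy as np
from itertools import product

def boltzmann(beta, h, n=4):
    # Boltzmann probabilities of all 2^n states of a periodic 1D Ising chain
    states = np.array(list(product([-1, 1], repeat=n)))
    energy = -np.sum(states * np.roll(states, 1, axis=1), axis=1) - h * states.sum(axis=1)
    w = np.exp(-beta * energy)
    return w / w.sum()

def fisher(beta, h, eps=1e-5):
    # pull back g_{ss'} = delta_{ss'} / P(s) to the parameters (beta, h)
    p = boltzmann(beta, h)
    dp = np.array([(boltzmann(beta + eps, h) - boltzmann(beta - eps, h)) / (2 * eps),
                   (boltzmann(beta, h + eps) - boltzmann(beta, h - eps)) / (2 * eps)])
    return dp @ np.diag(1.0 / p) @ dp.T

print(fisher(0.5, 0.1))   # 2x2 Fisher information in (beta, h)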
If instead we consider the square root of the normalized probability distribution of a fit to data $x$, embedding each model as the vector $z(\theta) = 2\sqrt{P(x\,|\,\theta)}$, then we generate a model manifold embedded on an $N$-sphere of radius two, such that the induced metric is the Fisher information metric. Since $P(x\,|\,\theta)$ is normalized to one and always positive, the dot product between $\sqrt{P(x\,|\,\theta_1)}$ and $\sqrt{P(x\,|\,\theta_2)}$ must be less than one, or equal to one if the likelihood functions are the same. The distance between two points on this $N$-sphere is proportional to the Hellinger distance (Hellinger, 1909), an $f$-divergence similar to the Kullback-Leibler divergence. It is straightforward to show that the metric of this radius-two $N$-sphere embedding is given by the Fisher Information Matrix, by considering the distance for some small perturbation $\delta\theta$:
\big| z(\theta + \delta\theta) - z(\theta) \big|^2 = \sum_x \Big( 2\sqrt{P(x\,|\,\theta + \delta\theta)} - 2\sqrt{P(x\,|\,\theta)} \Big)^2 \simeq \sum_x \frac{1}{P(x\,|\,\theta)} \frac{\partial P}{\partial \theta^\mu} \frac{\partial P}{\partial \theta^\nu}\, \delta\theta^\mu \delta\theta^\nu = g_{\mu\nu}\, \delta\theta^\mu \delta\theta^\nu. \qquad (S11)
This embedding is therefore isometric, preserving distances as given by the Hellinger divergence (Hellinger, 1909). Unfortunately, there is a maximum distance any two points can be apart in this embedding, set by points at the poles bounding the positive orthant of the $N$-sphere. These are a distance $2\sqrt{2}$ apart if the radius is two. Therefore, as more and more data are collected, creating increasingly orthogonal points, the manifold 'winds around' the $N$-sphere. The image generated by this is not 'faithful', in the sense that it does not allow for low-dimensional representations, as shown in Fig. S1(c).
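The saturation of distances can be seen in a one-parameter toy example we add here for illustration: a coin with heads probability $\theta$ observed $N$ independent times. The Bhattacharyya overlap per observation is raised to the $N$th power, so the chord distance on the radius-two sphere approaches $2\sqrt{2}$ as data accumulate.

import numpy as np

def sphere_distance(theta1, theta2, N):
    # per-observation Bhattacharyya coefficient for a Bernoulli model
    bc = np.sqrt(theta1 * theta2) + np.sqrt((1 - theta1) * (1 - theta2))
    # chord distance between z = 2 sqrt(P) embeddings after N observations
    return np.sqrt(8.0 - 8.0 * bc**N)

for N in (1, 10, 100, 1000):
    print(N, sphere_distance(0.4, 0.6, N))   # saturates at 2*sqrt(2) ~ 2.83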
II.1 Cosmic microwave background $N$-sphere embedding
There are three important measures by which the CMB probability distributions can be compared. The first is a scaled Hellinger distance (Hellinger, 1909), which generates the $N$-sphere embedding (Fig. S1(c)); the second is our intensive distance, shown in Fig. S1(b); and the third is the Kullback-Leibler divergence, derived from a normalization of the least-squares distance over all possible data that could generate a given set of spectra (which cannot be as easily visualized because it is asymmetric). Note that the manifolds shown in Fig. S1 are presented with no lensing or polarization. The full model manifold for all polarizations and including lensing, up to the maximum $\ell$ considered, is insufficiently sampled by the data used in this manuscript and so is not presented.
When embedding CMB predictions on the $N$-sphere of radius two, our squared distance between predictions for parameter sets $\theta_1$ and $\theta_2$ is the rescaled Hellinger distance $d_H^2(\theta_1, \theta_2)$, which can be expressed as:
d_H^2(\theta_1, \theta_2) = 8 - 8 \int \mathcal{D}a\; \sqrt{P(a\,|\,\theta_1)\, P(a\,|\,\theta_2)} = 8 - 8 \prod_\ell \left( \frac{\big|\Sigma_\ell^{(1)}\big|^{1/4}\, \big|\Sigma_\ell^{(2)}\big|^{1/4}}{\big|\tfrac{1}{2}\big(\Sigma_\ell^{(1)} + \Sigma_\ell^{(2)}\big)\big|^{1/2}} \right)^{2\ell + 1}, \qquad (S12)
where the last expression is derived by taking a straightforward Gaussian integral using Eq. (S4). As higher multipoles are included, the product rapidly converges to zero, resulting in a distance of $2\sqrt{2}$ for all but very small changes in parameters, as shown in the distance histogram of Fig. S1(f), which illustrates the huge peak at this limiting value. The model manifold for CMB spectra embedded on the $N$-sphere is shown in Fig. S1(c). Our intensive distance, $d_I$, is a nonlinear function of the Hellinger distance:
d_I^2 = -8 \ln\!\left( 1 - \frac{d_H^2}{8} \right). \qquad (S13)
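A minimal numerical sketch of Eqs. (S12), (S13), and (S15) for two CMB models specified by lists of per-$\ell$ covariance matrices (assumed inputs, e.g. built with the covariance_matrix sketch above); working with log-determinants keeps the rapidly vanishing product under control.

import numpy as np

def log_bhattacharyya(sigmas1, sigmas2, ells):
    # log of the overlap integral of two zero-mean Gaussian CMB likelihoods (Eq. S4)
    logbc = 0.0
    for s1, s2, ell in zip(sigmas1, sigmas2, ells):
        _, ld1 = np.linalg.slogdet(s1)
        _, ld2 = np.linalg.slogdet(s2)
        _, ldm = np.linalg.slogdet(0.5 * (s1 + s2))
        logbc += (2 * ell + 1) * (0.25 * ld1 + 0.25 * ld2 - 0.5 * ldm)
    return logbc

def hellinger_sq(logbc):
    return 8.0 - 8.0 * np.exp(logbc)   # Eq. (S12), saturates at 8

def intensive_sq(logbc):
    return -8.0 * logbc                # Eq. (S15), grows without bound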
It would be natural from an information-geometry point of view to minimize such a distance in finding best fits of the model to data. In practice, the astrophysics community minimizes a least-squares measure normalized over all possible data that could yield the same results (Planck Collaboration, 2015; Hamimeche and Lewis, 2008; Amari, 2016). One can easily show that this is in fact the Kullback-Leibler divergence, which can be expressed as
D_{KL}\big(\hat{C}_\ell \,\big\|\, C_\ell(\theta)\big) = \sum_\ell \frac{2\ell + 1}{2} \left[ \operatorname{Tr}\!\big( \Sigma_\ell^{-1}(\theta)\, \hat{\Sigma}_\ell \big) - 3 + \ln \frac{|\Sigma_\ell(\theta)|}{|\hat{\Sigma}_\ell|} \right], \qquad (S14)
where $\hat{\Sigma}_\ell$ represents the measured CMB spectra from experimental data, and $\Sigma_\ell(\theta)$ are the spectra predicted for parameters $\theta$. In both cases, the Kullback-Leibler and the Hellinger divergences measure a distance between probability distributions. These two divergences belong to a broader class of $f$-divergences, where $f$ is a convex function. In all these cases, the distance between two probability distributions is characterized by the choice of function, and the metric (the divergence between nearby parameter sets) is proportional to the Fisher Information Matrix (Amari and Nagaoka, 2000).
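For completeness, a sketch of Eq. (S14) in the same (assumed) per-$\ell$ covariance representation, with sigma_hat built from the measured spectra and sigma_model from the predicted ones:

import numpy as np

def kl_divergence(sigma_hat, sigma_model, ells):
    d = 0.0
    for sh, sm, ell in zip(sigma_hat, sigma_model, ells):
        inv = np.linalg.inv(sm)
        _, ldh = np.linalg.slogdet(sh)
        _, ldm = np.linalg.slogdet(sm)
        # trace term, dimension term (3 spectra), and log-determinant ratio
        d += 0.5 * (2 * ell + 1) * (np.trace(inv @ sh) - 3.0 + ldm - ldh)
    return d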
III Intensive manifold as an isometric embedding
Distances between predictions for two parameter combinations $\theta_1$ and $\theta_2$ in our intensive embedding are given by:
d_I^2(\theta_1, \theta_2) = -8 \ln \int dx\, \sqrt{P(x\,|\,\theta_1)\, P(x\,|\,\theta_2)}. \qquad (S15)
To determine the metric of this embedding, we consider a small parameter perturbation $\delta\theta$ around some parameter combination $\theta$:
d_I^2(\theta, \theta + \delta\theta) = -8 \ln \int dx\, \sqrt{P(x\,|\,\theta)\, P(x\,|\,\theta + \delta\theta)} \simeq g_{\mu\nu}\, \delta\theta^\mu \delta\theta^\nu, \qquad (S16)
producing the same Fisher information metric shown in Eq. (S11). For simplicity, we have dropped all terms of the expansion that are equal to zero. By preserving the local metric, our intensive embedding is isometric.
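A quick numerical check of Eq. (S16) on a toy model we introduce for illustration, a unit-variance Gaussian with unknown mean (Fisher information $g = 1$): the intensive squared distance divided by $\delta\mu^2$ should approach one.

import numpy as np

def intensive_sq(mu1, mu2, n=200001):
    x = np.linspace(-20.0, 20.0, n)
    dx = x[1] - x[0]
    p1 = np.exp(-0.5 * (x - mu1) ** 2) / np.sqrt(2 * np.pi)
    p2 = np.exp(-0.5 * (x - mu2) ** 2) / np.sqrt(2 * np.pi)
    bc = np.sum(np.sqrt(p1 * p2)) * dx   # overlap integral
    return -8.0 * np.log(bc)             # Eq. (S15)

for dmu in (0.5, 0.1, 0.01):
    print(dmu, intensive_sq(0.0, dmu) / dmu**2)   # -> 1, the Fisher information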
IV Principal component analysis for the intensive manifold embedding
In order to visualize the various model manifolds in Fig. S1, we performed a principal component analysis on the data. This process rotates the data into an orthogonal basis such that the first component lies along the direction of greatest variation, the second along the direction of second-greatest variation, and so on. To accomplish this, a data set is produced, indexed by the superscript $i$. A data matrix $X^{i}_{k}$ can then be constructed, where the index $k$ labels the vector components of the data. These components can be discrete, such as the probability state vector for the Ising model, or continuous, such as the likelihood of observing a certain fluctuation in a CMB map.
The columns of $X$ are centered, producing a matrix $\hat{X}^{i}_{k} = X^{i}_{k} - \frac{1}{m} \sum_{j} X^{j}_{k}$, where $m$ is the number of data points. A singular value decomposition, $\hat{X} = U S V^{T}$, is normally performed on $\hat{X}$, and the $j$th principal component of data point $i$ is then given by $(U S)_{ij}$. This works for discrete data, but in the case of continuous data (such as likelihood functions) we must take a slightly different approach.
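In code, the discrete-data version is just centering followed by an SVD (a generic sketch, with illustrative names):

import numpy as np

def pca_scores(X):
    # rows of X are data points, columns are vector components
    Xhat = X - X.mean(axis=0)                       # center the columns
    U, S, Vt = np.linalg.svd(Xhat, full_matrices=False)
    return U * S                                    # j-th column = j-th principal component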
We construct a matrix of dot products,
M^{ij} = \sum_{k} \hat{X}^{i}_{k}\, \hat{X}^{j}_{k}. \qquad (S17)
Since this matrix can also be expressed as $M = \hat{X} \hat{X}^{T} = U S^{2} U^{T}$, we can find the principal components of our data by finding the eigenvalues and eigenvectors of the matrix of dot products. If we consider the case where the data points are the square roots of the probability distributions $P_i(x) = P(x\,|\,\theta_i)$ predicted from parameter combinations $\theta_i$, then, using the dot product for our replicated system of $N$ replicas, we can write out the components of $M$ as:
M^{ij} = \frac{4}{N} \left[ \left( \int\! dx\, \sqrt{P_i P_j} \right)^{\!N} - \frac{1}{m} \sum_{k} \left( \int\! dx\, \sqrt{P_i P_k} \right)^{\!N} - \frac{1}{m} \sum_{k} \left( \int\! dx\, \sqrt{P_k P_j} \right)^{\!N} + \frac{1}{m^2} \sum_{k l} \left( \int\! dx\, \sqrt{P_k P_l} \right)^{\!N} \right]. \qquad (S18)
In the limit where the number of replicas goes to zero, the matrix of dot products becomes:
M^{ij} \xrightarrow{\,N \to 0\,} 4 \left[ \ln\!\int\! dx\, \sqrt{P_i P_j} - \frac{1}{m} \sum_{k} \ln\!\int\! dx\, \sqrt{P_i P_k} - \frac{1}{m} \sum_{k} \ln\!\int\! dx\, \sqrt{P_k P_j} + \frac{1}{m^2} \sum_{k l} \ln\!\int\! dx\, \sqrt{P_k P_l} \right]. \qquad (S19)
When the number of replicas is not a whole number, and in the limit where it tends to zero, the matrix of dot products is no longer positive semi-definite. As a result, we can obtain both positive and negative eigenvalues from its decomposition, leading to real and imaginary principal components.
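The whole construction can be summarized in a short sketch (our illustration; the matrix log_bc of log overlaps $\ln \int \sqrt{P_i P_j}\, dx$ is an assumed input): double-center the log-overlap matrix as in Eq. (S19), diagonalize, and read off real or imaginary components according to the sign of each eigenvalue.

import numpy as np

def inpca(log_bc):
    # log_bc[i, j] = ln of the overlap integral between predictions i and j (symmetric)
    m = log_bc.shape[0]
    J = np.eye(m) - np.ones((m, m)) / m              # centering projector
    W = 4.0 * J @ log_bc @ J                         # Eq. (S19): centered dot products
    lam, vec = np.linalg.eigh(W)
    keep = np.abs(lam) > 1e-12 * np.abs(lam).max()   # drop the null centering direction
    lam, vec = lam[keep], vec[:, keep]
    order = np.argsort(-np.abs(lam))                 # sort by |eigenvalue|
    lam, vec = lam[order], vec[:, order]
    coords = vec * np.sqrt(np.abs(lam))              # components; negative lam => imaginary axes
    return coords, lam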
In order to include experimental data in these plots, a probability distribution for the data must be generated. The dot products between this distribution and the predicted distributions can then be calculated, and the results used in the matrix of dot products of Eq. (S19) to find its projection along the principal components. The distance from the data point to the manifold is given by our intensive distance from the point to the best fit.
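One way to carry out this projection, following the standard kernel-PCA out-of-sample formula (a sketch under our assumptions, not necessarily the exact procedure used for the figures): compute the data's log overlaps with every model prediction, center them consistently with Eq. (S19), and project onto the eigenvectors found above.

import numpy as np

def project_datum(log_bc_data, log_bc, coords, lam):
    # log_bc_data[i] = ln overlap between the measured distribution and prediction i
    w = 4.0 * (log_bc_data - log_bc_data.mean()
               - log_bc.mean(axis=0) + log_bc.mean())   # center like Eq. (S19)
    vec = coords / np.sqrt(np.abs(lam))                 # recover the eigenvectors
    return (w @ vec) / np.sqrt(np.abs(lam))             # scores along each component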
References
- Planck Collaboration (2015) Planck Collaboration, arXiv:1507.02704 (2015).
- Hamimeche and Lewis (2008) S. Hamimeche and A. Lewis, Phys. Rev. D 77, 103013 (2008).
- Tegmark (1997) M. Tegmark, Phys. Rev. D 55 (1997).
- Abazajian et al. (2016) K. N. Abazajian et al., ArXiv e-prints (2016), arXiv:1610.02743 .
- Perotto et al. (2006) L. Perotto, J. Lesgourgues, S. Hannestad, H. Tu, and Y. Wong, JCAP 0610 (2006).
- Lewis et al. (2000) A. Lewis, A. Challinor, and A. Lasenby, Astrophys. J. 538, 473 (2000), arXiv:astro-ph/9911177 [astro-ph] .
- Hellinger (1909) E. Hellinger, J. Reine Angew. Math. 136, 210 (1909).
- Amari (2016) S. Amari, Information Geometry and its Applications, Vol. 194 (Springer, 2016).
- Amari and Nagaoka (2000) S. Amari and H. Nagaoka, Methods of Information Geometry, Translations of Mathematical Monographs, Vol. 191 (Oxford University Press, 2000).