
Supplemental material for Visualizing theory space: Isometric embedding of probabilistic predictions, from the Ising model to the cosmic microwave background

Katherine N. Quinn, Francesco De Bernardis, Michael D. Niemack, and James P. Sethna
Physics Department, Cornell University, Ithaca, NY 14853-2501, United States
(July 30, 2025)

This is the supplemental material accompanying Visualizing theory space: Isometric embedding of probabilistic predictions, from the Ising model to the cosmic microwave background.

Section I discusses the general non-Euclidean embedding provided by models whose errors depend upon parameter values, with particular reference to the cosmic microwave background correlation function, where the fluctuations are the predictions of the model. Section II shows that $2\sqrt{\mathcal{L}(\mathbf{x}|\boldsymbol{\theta})}$, two times the square root of the likelihood of a fit, acts as an isometric embedding of a general probabilistic model onto an $n$-sphere of radius two. Section II.1 illustrates the failures of this $n$-sphere embedding as a practical tool whenever the data provide good discrimination between different model predictions. (The intensive embedding proposed in the main text bypasses this challenge by taking a formal limit of the embedding as the amount of data goes to zero.) Here we focus on the $\Lambda$CDM model and cosmic microwave background (CMB) anisotropy predictions, complementing the discussion of the Ising model in the main text. Section III shows that our intensive manifold embedding is isometric, faithfully representing the distances between predictions of nearby models as characterized by the Fisher information. Finally, Section IV describes in detail how one generalizes principal component analysis to the intensive manifold embedding, fleshing out the discussion in the main text by explicitly centering the model predictions before taking the limit of zero replicas and implementing the singular value decomposition.

I Cosmic microwave correlations as a non-Euclidean embedding

The anisotropy in CMB radiation can be characterized by a $2\times 2$ direction-dependent intensity matrix $I_{ij}(\hat{n})$ whose components can be recognized as three of the four Stokes parameters: I, Q, and U. The Q and U polarization maps can be made independent of the Stokes parameter measurement basis by separating them into divergence (E) and curl (B) components, generating three maps of cosmological interest: the temperature fluctuation map T and two polarization maps, E and B. These can be expanded in spherical harmonics,

X(n^)=mamXYlm(n^)whereX=T,E,B.X(\hat{n})=\sum_{\ell m}a_{\ell m}^{X}Y_{lm}(\hat{n})\quad\text{where}~X={T,E,B}. (S1)

The anisotropies are expected to be (approximately) Gaussian. All of the Gaussian information can be extracted from the angular power spectra, defined as the cross-correlations of the coefficients in the expansion and written as

C_{\ell}^{XY}\equiv\frac{1}{2\ell+1}\sum_{m}\left<a_{\ell m}^{X}a_{\ell m}^{Y}\right>\quad\text{where}\quad X,Y=T,E,B. (S2)
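As a minimal numerical sketch of Eq. S2 (our illustration, not from the paper; the spectrum values and the restriction to a real-valued $2\times 2$ T/E block are assumptions), one can draw simulated mode coefficients with a known covariance and average their outer products over $m$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2x2 (T, E) spectrum at a single multipole; values are made up.
ell = 500
C_true = np.array([[1.0, 0.3],
                   [0.3, 0.5]])

# Real-valued stand-ins for the 2*ell + 1 coefficients a_{ell m} = (a^T, a^E),
# drawn with covariance C_true.
a = rng.multivariate_normal(np.zeros(2), C_true, size=2 * ell + 1)

# Eq. (S2): average the outer products a_{ell m} a_{ell m}^T over m.
C_hat = a.T @ a / (2 * ell + 1)
```

The estimate $\hat{C}_{\ell}$ fluctuates about the true spectrum with cosmic-variance scatter of order $1/\sqrt{2\ell+1}$.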

Using this, we can construct a correlation matrix for the fluctuations,

C_{\ell}=\begin{pmatrix}C_{\ell}^{TT}&C_{\ell}^{TE}&C_{\ell}^{TB}\\ C_{\ell}^{TE}&C_{\ell}^{EE}&C_{\ell}^{EB}\\ C_{\ell}^{TB}&C_{\ell}^{EB}&C_{\ell}^{BB}\end{pmatrix}. (S3)

The values of $C_{\ell}$ depend on the $\Lambda$CDM parameters, and likelihood analyses of CMB data fit with such a correlation function have been extensively studied, as they are invaluable for fitting and forecasting CMB measurements (e.g., Planck Collaboration (2015); Hamimeche and Lewis; Tegmark (1997); Abazajian et al. (2016)). The probability of a fit for these data can be expressed as

p(\{\hat{a}_{\ell m}\}|\boldsymbol{\theta})=\prod_{\ell m}\frac{1}{\sqrt{(2\pi)^{3}|C_{\ell}|}}\exp\left(-\frac{1}{2}\hat{a}_{\ell m}^{\dagger}C_{\ell}^{-1}\hat{a}_{\ell m}\right). (S4)

This conditional probability defines the likelihood (Planck Collaboration, 2015; Tegmark, 1997), $\mathcal{L}(\{\hat{a}_{\ell m}\}|\boldsymbol{\theta})=p(\{\hat{a}_{\ell m}\}|\boldsymbol{\theta})$. The metric is given by the Fisher Information Matrix (FIM),

g_{\alpha\beta}(\boldsymbol{\theta})=-\int\left(\partial_{\alpha}\partial_{\beta}\log\mathcal{L}(\mathbf{x}|\boldsymbol{\theta})\right)\mathcal{L}(\mathbf{x}|\boldsymbol{\theta})\,\text{d}\mathbf{x}. (S5)

We can evaluate this integral by examining the second derivatives of $\log\mathcal{L}$:

-\partial_{\alpha}\partial_{\beta}\log\mathcal{L}(\{\hat{a}_{\ell m}\}|\boldsymbol{\theta}) =\frac{1}{2}\sum_{\ell m}\partial_{\alpha}\partial_{\beta}\left(\log|C_{\ell}|+\hat{a}_{\ell m}^{\dagger}C_{\ell}^{-1}\hat{a}_{\ell m}\right) (S6)
=\frac{1}{2}\sum_{\ell m}\left(\frac{\partial_{\alpha}\partial_{\beta}|C_{\ell}|}{|C_{\ell}|}-\frac{\partial_{\alpha}|C_{\ell}|\,\partial_{\beta}|C_{\ell}|}{|C_{\ell}|^{2}}\right)
+\frac{1}{2}\sum_{\ell m}\hat{a}_{\ell m}^{\dagger}\,\partial_{\alpha}\partial_{\beta}C_{\ell}^{-1}\,\hat{a}_{\ell m}.

This expansion can be combined with Eq. S5 to extract all terms independent of the data: the first two terms in the sum can be pulled out of the integral completely. The remaining term is harder; to evaluate it, we make use of the following integral for a symmetric, positive-definite $M\times M$ matrix $A$ and a symmetric $M\times M$ matrix $B$:

\sqrt{\frac{|A|}{(2\pi)^{M}}}\int\mathbf{x}^{T}B\,\mathbf{x}\,\exp\left(-\frac{1}{2}\mathbf{x}^{T}A\,\mathbf{x}\right)\text{d}\mathbf{x}=\text{Tr}(A^{-1}B). (S7)
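This identity is easy to verify by Monte Carlo. The following sketch (our illustration with arbitrary matrices, not from the paper) samples $\mathbf{x}\sim\mathcal{N}(0,A^{-1})$ and compares the average of $\mathbf{x}^{T}B\,\mathbf{x}$ with $\text{Tr}(A^{-1}B)$:

```python
import numpy as np

rng = np.random.default_rng(1)
M = 3

# Random symmetric positive-definite A and symmetric B (illustrative values).
R = rng.standard_normal((M, M))
A = R @ R.T + M * np.eye(M)
S = rng.standard_normal((M, M))
B = S + S.T

# x^T A x in the exponent means x ~ N(0, A^{-1}): sample using the
# Cholesky factor of A^{-1}.
cov = np.linalg.inv(A)
L = np.linalg.cholesky(cov)
x = rng.standard_normal((200_000, M)) @ L.T

# Monte Carlo estimate of E[x^T B x] versus the closed form Tr(A^{-1} B).
mc = np.einsum('ni,ij,nj->n', x, B, x).mean()
exact = np.trace(cov @ B)
```

With $2\times 10^{5}$ samples the two numbers agree to a few parts per thousand.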

This allows us to solve Eq. S5, setting $A=C_{\ell}^{-1}$ and $B=\partial_{\alpha}\partial_{\beta}C_{\ell}^{-1}$. Combining all the pieces, we obtain a formula for the FIM:

g_{\alpha\beta}(\boldsymbol{\theta})= \sum_{\ell}\frac{2\ell+1}{2}\left(\frac{\partial_{\alpha}\partial_{\beta}|C_{\ell}|}{|C_{\ell}|}-\frac{\partial_{\alpha}|C_{\ell}|\,\partial_{\beta}|C_{\ell}|}{|C_{\ell}|^{2}}\right) (S8)
+\sum_{\ell}\frac{2\ell+1}{2}\text{Tr}\left(C_{\ell}\,\partial_{\alpha}\partial_{\beta}C_{\ell}^{-1}\right).

We can compare this with previous FIM derivations (Perotto et al. (2006); Tegmark (1997)) and confirm that we obtain the same result. We can decompose this as a sum over $\ell$ and the different spectra to obtain

g_{\mu\nu}=\sum_{\ell,XY,X^{\prime}Y^{\prime}}J^{\ell}_{XY,\mu}\,\Omega^{\ell}_{XY,X^{\prime}Y^{\prime}}\,J^{\ell}_{X^{\prime}Y^{\prime},\nu}=(J^{T}\Omega J)_{\mu\nu} (S9)

where $J^{\ell}_{XY,\mu}=\partial C^{XY}_{\ell}/\partial\theta_{\mu}$ is a tensor of partial derivatives, and $\Omega$ is given by Eq. S8 with derivatives taken with respect to the $C_{\ell}^{XY}$. We can express $\Omega$ as a block-diagonal matrix because the $C_{\ell}^{XY}$ are uncorrelated for different values of $\ell$; in this form it represents the FIM when the parameters of interest are the $C_{\ell}^{XY}$ themselves. Note that $g_{\mu\nu}$ is not constant in the coordinates given by $C_{\ell}$.

For regular least-squares fitting, $\Omega$ would simply be a constant matrix representing experimental uncertainty. For CMB spectra, however, it is parameter dependent: it varies with the $C_{\ell}$. Geometrically, $\Omega$ can be interpreted as the metric in the $C_{\ell}$ embedding space. Since it varies with the $C_{\ell}$, it produces a non-Euclidean embedding. Visualizing the model manifold in this space is therefore problematic, since the space is warped and distorted, and distances are not faithfully represented, as shown in Fig. S1 a. This problem can be solved if we instead consider the probability distributions that the $C_{\ell}$ correspond to for different parameters. Table S1 presents the range of $\Lambda$CDM parameters explored in our model manifold. In the following section we explore these probability distributions for different $C_{\ell}$.

Figure S1: Model manifolds for the $\Lambda$CDM cosmological model predictions of the cosmic microwave background radiation up to $\ell=1000$. All manifolds are plotted for the same data and colored by $A_{s}$, the primordial fluctuation amplitude. Our universe is indicated in green in all plots. All manifolds are plotted in their first three principal components. (a) The manifold embedded in the $C_{\ell}$ spectra space, which is non-Euclidean and distorted. (b) The manifold in our intensive embedding, with a histogram of distances between points. The distances are spread out over a wide range, making the visualization useful. (c) The manifold embedded on an $n$-sphere of radius two, where all points become effectively orthogonal. (d) Temperature spectrum for our universe, connected to its location in the three different embedding spaces. (e) Histogram of distances in our intensive embedding space. (f) Histogram of distances in the $n$-sphere embedding space, illustrating that most points are almost a distance $\sqrt{8}$ apart, as far apart as possible on the positive orthant of the $n$-sphere of radius two. The distances in the two histograms are related, since the intensive distance $d_{I}$ can be expressed as a function of the extensive $n$-sphere Hellinger distance $d_{H}$. As a result, minimizing the distance $d_{I}$ will also minimize $d_{H}$.
Table S1: Parameter ranges used to create the model manifolds illustrated in Fig. S1. Spectra were generated using the CAMB software package (Lewis et al., 2000).

Parameter          Min. Value            Max. Value
$\tau$             0.01                  0.16
$\eta$             0.01                  0.999
$A_{s}$            $1.0\times10^{-15}$   $1.0\times10^{-7}$
$h_{0}$            0.091                 10.0
$\Omega_{b}h^{2}$  0.0005                99.0
$\Omega_{c}h^{2}$  0.0002                98.0

II $n$-Sphere isometric embedding and its failings

The set of probability distributions from a model generates a 'probability simplex', since they are all normalized to one. The Fisher Information Matrix in this space is non-Euclidean: it is diagonal with entries that are parameter dependent. For instance, in this space the Fisher Information for the Ising model is given by

g_{\mu\nu}(\boldsymbol{\theta}) =\sum_{\mathbf{S}_{i}}\frac{\partial_{\mu}\mathcal{L}(\mathbf{S}_{i}|\boldsymbol{\theta})\,\partial_{\nu}\mathcal{L}(\mathbf{S}_{i}|\boldsymbol{\theta})}{\mathcal{L}(\mathbf{S}_{i}|\boldsymbol{\theta})}
=\sum_{\mathbf{S}_{i}}\frac{\delta_{\mathbf{S}_{i},\mathbf{S}_{\mu}}\,\delta_{\mathbf{S}_{i},\mathbf{S}_{\nu}}}{\mathcal{L}(\mathbf{S}_{i}|\boldsymbol{\theta})}
=\frac{\delta_{\mathbf{S}_{\mu},\mathbf{S}_{\nu}}}{\mathcal{L}(\mathbf{S}_{\mu}|\boldsymbol{\theta})}. (S10)

This is a non-Euclidean metric, since it is not proportional to $\delta_{\mathbf{S}_{\mu},\mathbf{S}_{\nu}}$: it has a parameter-dependent component given by the Boltzmann likelihood of being in a given spin state. This is similar to the manifold illustrated in Fig. S1 a, which shows the non-Euclidean embedding of CMB spectra in a space whose metric is also point dependent, as detailed in Section I.

If instead we consider two times the square root of a normalized probability distribution of a fit to data $\mathbf{x}$, $2\sqrt{\mathcal{L}(\mathbf{x}|\boldsymbol{\theta})}$, then we can generate a model manifold embedded on an $n$-sphere of radius two, such that the metric is the Fisher Information Metric. Since $\mathcal{L}(\mathbf{x}|\boldsymbol{\theta})$ is normalized to one and always positive, the dot product between $\sqrt{\mathcal{L}(\mathbf{x}|\boldsymbol{\theta}_{1})}$ and $\sqrt{\mathcal{L}(\mathbf{x}|\boldsymbol{\theta}_{2})}$ must be less than one, with equality when the likelihood functions are the same. The distance between two points on this $n$-sphere is proportional to the Hellinger distance (Hellinger, 1909), an $f$-divergence similar to the Kullback-Leibler divergence. It is straightforward to show that the metric for this embedding on the $n$-sphere of radius two is given by the Fisher Information Matrix, by considering the distance for some small perturbation $\delta\boldsymbol{\theta}$:

\left<2\sqrt{\mathcal{L}(\mathbf{x}|\boldsymbol{\theta})}-2\sqrt{\mathcal{L}(\mathbf{x}|\boldsymbol{\theta}+\delta\boldsymbol{\theta})},\;2\sqrt{\mathcal{L}(\mathbf{x}|\boldsymbol{\theta})}-2\sqrt{\mathcal{L}(\mathbf{x}|\boldsymbol{\theta}+\delta\boldsymbol{\theta})}\right>
=8\left(1-\int\sqrt{\mathcal{L}(\mathbf{x}|\boldsymbol{\theta})}\sqrt{\mathcal{L}(\mathbf{x}|\boldsymbol{\theta}+\delta\boldsymbol{\theta})}\,\text{d}\mathbf{x}\right)
=-4\,\delta\theta^{\alpha}\int\partial_{\alpha}\mathcal{L}(\mathbf{x}|\boldsymbol{\theta})\,\text{d}\mathbf{x}-2\,\delta\theta^{\alpha}\delta\theta^{\beta}\int\partial_{\alpha}\partial_{\beta}\mathcal{L}(\mathbf{x}|\boldsymbol{\theta})\,\text{d}\mathbf{x}+\delta\theta^{\alpha}\delta\theta^{\beta}\int\frac{\partial_{\alpha}\mathcal{L}(\mathbf{x}|\boldsymbol{\theta})\,\partial_{\beta}\mathcal{L}(\mathbf{x}|\boldsymbol{\theta})}{\mathcal{L}(\mathbf{x}|\boldsymbol{\theta})}\,\text{d}\mathbf{x}+\mathcal{O}(\delta\boldsymbol{\theta}^{3})
=\delta\theta^{\alpha}\delta\theta^{\beta}\underbrace{\int\partial_{\alpha}\log\left[\mathcal{L}(\mathbf{x}|\boldsymbol{\theta})\right]\partial_{\beta}\log\left[\mathcal{L}(\mathbf{x}|\boldsymbol{\theta})\right]\mathcal{L}(\mathbf{x}|\boldsymbol{\theta})\,\text{d}\mathbf{x}}_{\text{Fisher Information Matrix}} (S11)

where the first two integrals vanish because $\mathcal{L}$ is normalized to one for every $\boldsymbol{\theta}$.

This embedding is therefore isometric, preserving distances as given by the Hellinger divergence (Hellinger, 1909). Unfortunately, there is a maximum distance any two points can be apart in this embedding, given by points at the poles of the positive orthant of the $n$-sphere. These are a distance $\sqrt{8}$ apart when the radius is two. Therefore, as more and more data are collected, creating increasingly orthogonal points, the manifold 'winds around' the $n$-sphere. The image generated this way is not 'faithful', in the sense that it does not allow for low-dimensional representations, as shown in Fig. S1 c.
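The winding can be seen in a one-dimensional toy example (our construction, not from the paper): for two unit-variance Gaussian models whose means differ by $\delta\mu$, the per-observation Bhattacharyya overlap is $\rho=e^{-\delta\mu^{2}/8}$, so for $N$ independent observations $d_{H}=\sqrt{8(1-\rho^{N})}\to\sqrt{8}$ exponentially fast, while the intensive-style squared distance $-8\log\rho^{N}=N\,\delta\mu^{2}$ keeps discriminating:

```python
import numpy as np

dmu = 0.5                           # difference in means (illustrative)
rho = np.exp(-dmu**2 / 8)           # per-observation Bhattacharyya overlap

for N in (1, 10, 100, 1000):        # number of independent data points
    overlap = rho**N                # <sqrt(L1), sqrt(L2)> for N iid samples
    d_H = np.sqrt(8 * (1 - overlap))
    d_I2 = -8 * np.log(overlap)     # grows linearly: N * dmu**2
    print(N, round(d_H, 4), round(d_I2, 4))
```

By $N=1000$ the Hellinger distance has saturated at $\sqrt{8}\approx 2.83$, while the log-based distance has grown to $N\,\delta\mu^{2}=250$.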

II.1 Cosmic microwave $n$-sphere embedding

There are three important measures by which the CMB probability distributions can be compared. The first is a scaled Hellinger distance (Hellinger, 1909) $d_{H}$, which generates the $n$-sphere embedding (Fig. S1 c); the second is our intensive distance $d_{I}$, shown in Fig. S1 b; and the third is the Kullback-Leibler divergence, derived from a normalization of the least-squares distance over all possible data that could generate a given set of spectra (which cannot be as easily visualized because it is asymmetric). Note that the manifolds shown in Fig. S1 are presented with no lensing or B polarization. The full model manifold for all polarizations including lensing, up to $\ell=2800$, is insufficiently sampled by the data used in this manuscript and so is not presented.

When embedding CMB predictions on the $n$-sphere of radius two, our squared distance between predictions for parameter sets $\boldsymbol{\theta}_{1}$ and $\boldsymbol{\theta}_{2}$ is $d_{H}^{2}$, the rescaled Hellinger distance, and can be expressed as

d_{H}^{2} =8\left(1-\int\text{d}a_{\ell m}\sqrt{p(\{\hat{a}_{\ell m}\}|\boldsymbol{\theta}_{1})}\sqrt{p(\{\hat{a}_{\ell m}\}|\boldsymbol{\theta}_{2})}\right)
=8-8\prod_{\ell m}\frac{2^{3/2}}{\left(|C_{\ell}(\boldsymbol{\theta}_{1})|\,|C_{\ell}(\boldsymbol{\theta}_{2})|\,|C_{\ell}(\boldsymbol{\theta}_{1})^{-1}+C_{\ell}(\boldsymbol{\theta}_{2})^{-1}|^{2}\right)^{1/4}} (S12)

where the last expression is derived by taking a straightforward Gaussian integral using Eq. S4. As more multipoles $\ell$ are included, the product rapidly converges to zero, resulting in a distance of $\sqrt{8}$ for all but very small changes in parameters, as shown by the large peak in the distance histogram of Fig. S1 f. The model manifold for CMB spectra embedded on the $n$-sphere is shown in Fig. S1 c. Our intensive distance, $d_{I}$, is a non-linear function of the Hellinger distance:

d_{I}^{2} =-8\log\left(1-\frac{d_{H}^{2}}{8}\right)
=8\sum_{\ell}\frac{2\ell+1}{4}\log\left(\frac{|C_{\ell}(\boldsymbol{\theta}_{1})+C_{\ell}(\boldsymbol{\theta}_{2})|^{2}}{64\,|C_{\ell}(\boldsymbol{\theta}_{1})|\,|C_{\ell}(\boldsymbol{\theta}_{2})|}\right). (S13)

It would be natural from an information-geometry point of view to minimize $d_{I}^{2}$ when finding best fits of the model to data. In practice, the astrophysics community minimizes a least-squares measure normalized over all possible data that could yield the same results (Planck Collaboration (2015); Hamimeche and Lewis; Amari (2016)). One can easily show that this is in fact the Kullback-Leibler divergence, expressed as

\frac{1}{2}\sum_{\ell}(2\ell+1)\left(\text{Tr}\left(\hat{C}_{\ell}C_{\ell}(\boldsymbol{\theta})^{-1}\right)+\log\frac{|C_{\ell}(\boldsymbol{\theta})|}{|\hat{C}_{\ell}|}-3\right) (S14)

where $\hat{C}_{\ell}$ represents the measured CMB spectra from experimental data, and $C_{\ell}(\boldsymbol{\theta})$ are the spectra predicted for parameters $\boldsymbol{\theta}$. Both the Kullback-Leibler and the Hellinger divergences measure distance between probability distributions. These two divergences belong to a broader class of $f$-divergences, where $f$ is a convex function. In all these cases, the distance between two probability distributions is characterized by the choice of the function $f$, and the metric (the divergence between nearby parameter sets) is proportional to the Fisher Information Matrix (Amari and Nagaoka, 2000).
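A quick numerical sanity check of Eq. S14 (our sketch, with illustrative diagonal matrices rather than real spectra) confirms the expected behavior of a Kullback-Leibler divergence: zero for a perfect fit and positive otherwise:

```python
import numpy as np

def kl_divergence(C_hat, C_theta, ells):
    # Eq. (S14): divergence between the Gaussian distributions implied by
    # measured spectra C_hat[l] and model spectra C_theta[l].
    return 0.5 * sum(
        (2 * l + 1) * (np.trace(C_hat[l] @ np.linalg.inv(C_theta[l]))
                       + np.log(np.linalg.det(C_theta[l])
                                / np.linalg.det(C_hat[l]))
                       - 3)
        for l in ells)

ells = range(2, 10)
measured = {l: np.diag([1.0, 0.5, 0.1]) for l in ells}   # made-up "data"
shifted = {l: np.diag([1.2, 0.4, 0.1]) for l in ells}    # made-up "model"

perfect = kl_divergence(measured, measured, ells)        # ~ 0: perfect fit
off = kl_divergence(measured, shifted, ells)             # > 0: imperfect fit
```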

III Intensive manifold as an isometric embedding

Distances between predictions for two parameter combinations 𝜽1{\mbox{\boldmath$\theta$}}_{1} and 𝜽2{\mbox{\boldmath$\theta$}}_{2} in our intensive embedding are given by:

d_{I}^{2}(\boldsymbol{\theta}_{1},\boldsymbol{\theta}_{2})=-8\log\left<\sqrt{\mathcal{L}(\mathbf{x}|\boldsymbol{\theta}_{1})},\sqrt{\mathcal{L}(\mathbf{x}|\boldsymbol{\theta}_{2})}\right> (S15)

To determine the metric for this embedding, we consider a small parameter perturbation $\delta\boldsymbol{\theta}$ around some parameter combination $\boldsymbol{\theta}$:

d_{I}^{2}(\boldsymbol{\theta},\boldsymbol{\theta}+\delta\boldsymbol{\theta}) =-8\log\int\text{d}\mathbf{x}\sqrt{\mathcal{L}(\mathbf{x}|\boldsymbol{\theta})}\sqrt{\mathcal{L}(\mathbf{x}|\boldsymbol{\theta}+\delta\boldsymbol{\theta})}
=\int\text{d}\mathbf{x}\,\frac{\partial_{\alpha}\mathcal{L}(\mathbf{x}|\boldsymbol{\theta})\,\partial_{\beta}\mathcal{L}(\mathbf{x}|\boldsymbol{\theta})}{\mathcal{L}(\mathbf{x}|\boldsymbol{\theta})}\,\delta\theta^{\alpha}\delta\theta^{\beta}+\mathcal{O}(\delta\boldsymbol{\theta}^{3}) (S16)

producing the same Fisher information metric as in Eq. S11. For simplicity, we have dropped all terms in the expansion that are equal to zero. By preserving the local metric, our intensive embedding is isometric.

IV Principal component analysis for the intensive manifold embedding

In order to visualize the various model manifolds in Fig. S1, we performed a principal component analysis on the data. This process rotates the data into an orthogonal basis such that the first component lies along the direction of greatest variation, the second along the direction of second-greatest variation, and so on. To accomplish this, a data set $\{\mathbf{d}^{(J)}\}$ is produced, indexed by the superscript $(J)$. A data matrix can be formed, $D_{iJ}=d^{(J)}_{i}$, where the index $i$ indicates the vector component of the data. This component can be discrete, such as the probability state vector for the Ising model, or continuous, such as the likelihood of observing a certain fluctuation $a_{\ell m}$ in a CMB map.

The columns of $D$ are centered, producing a matrix $M_{Ji}=d^{(J)}_{i}-\tilde{d}_{i}$ (with rows indexed by data points), where $\tilde{\mathbf{d}}=\frac{1}{n}\sum_{J}\mathbf{d}^{(J)}$ and $n$ is the number of data points. A singular value decomposition $M=U\Sigma V^{T}$ is normally performed, and the $i$th principal component of data point $J$ is then given by $\Sigma_{ii}U_{Ji}$. This works for discrete data, but in the case of continuous data (such as likelihood functions) we must take a slightly different approach.
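As a minimal sketch of this standard step (our toy data; we adopt the convention that rows of $M$ are indexed by the data point $J$), the SVD route reads:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data matrix: n = 200 data points (rows J), 5 vector components
# (columns i), with most of the variance in the first few directions.
D = rng.standard_normal((200, 5)) * np.array([3.0, 2.0, 1.0, 0.1, 0.1])

M = D - D.mean(axis=0)       # center each component over the data set
U, S, Vt = np.linalg.svd(M, full_matrices=False)
pcs = U * S                  # row J holds the principal components of point J
```

The same components are recovered from the eigendecomposition of the matrix of dot products $MM^{T}$ alone, which is what the continuous case requires.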

We construct a matrix of dot products,

(MM^{T})_{JK} =\sum_{i}(d^{(J)}_{i}-\tilde{d}_{i})(d^{(K)}_{i}-\tilde{d}_{i})
={\mathbf{d}}^{(J)}\cdot{\mathbf{d}}^{(K)}-\left({\mathbf{d}}^{(J)}\cdot\tilde{{\mathbf{d}}}+{\mathbf{d}}^{(K)}\cdot\tilde{{\mathbf{d}}}\right)+\tilde{{\mathbf{d}}}\cdot\tilde{{\mathbf{d}}}
={\mathbf{d}}^{(J)}\cdot{\mathbf{d}}^{(K)}-\frac{1}{n}\sum_{L}\left({\mathbf{d}}^{(J)}\cdot{\mathbf{d}}^{(L)}+{\mathbf{d}}^{(K)}\cdot{\mathbf{d}}^{(L)}\right)
+\frac{1}{n^{2}}\sum_{L,L^{\prime}}{\mathbf{d}}^{(L)}\cdot{\mathbf{d}}^{(L^{\prime})}. (S17)

Since this matrix can also be expressed as $MM^{T}=U\Sigma V^{T}V\Sigma U^{T}=U\Sigma^{2}U^{T}$, we can find the principal components of our data from the eigenvalues and eigenvectors of the matrix of dot products. If we consider the case where the data points $\mathbf{d}^{(J)}$ are the square roots of the probability distributions predicted from parameter combinations $\boldsymbol{\theta}_{J}$ then, using the dot product for our $N$-times replicated system, we can write the components of $MM^{T}$ as

(MM^{T})_{JK}= \frac{\left<\boldsymbol{\theta}_{J};\boldsymbol{\theta}_{K}\right>^{N}}{N}-\sum_{L}\left(\frac{\left<\boldsymbol{\theta}_{J};\boldsymbol{\theta}_{L}\right>^{N}}{nN}+\frac{\left<\boldsymbol{\theta}_{K};\boldsymbol{\theta}_{L}\right>^{N}}{nN}\right)
+\sum_{L,L^{\prime}}\frac{\left<\boldsymbol{\theta}_{L};\boldsymbol{\theta}_{L^{\prime}}\right>^{N}}{n^{2}N} (S18)

In the limit where the number of replicas goes to zero, the matrix of dot products becomes

(MM^{T})_{JK}= \log\left<\boldsymbol{\theta}_{J};\boldsymbol{\theta}_{K}\right>
-\frac{1}{n}\sum_{L}\left(\log\left<\boldsymbol{\theta}_{J};\boldsymbol{\theta}_{L}\right>+\log\left<\boldsymbol{\theta}_{K};\boldsymbol{\theta}_{L}\right>\right)
+\frac{1}{n^{2}}\sum_{L,L^{\prime}}\log\left<\boldsymbol{\theta}_{L};\boldsymbol{\theta}_{L^{\prime}}\right> (S19)

Because the number of replicas need not be a whole number, in the limit where it tends to zero the matrix of dot products is no longer positive definite. As a result, its decomposition can yield both positive and negative eigenvalues, leading to both real and imaginary principal components.
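The whole construction fits in a few lines. The sketch below (with a randomly generated stand-in for the log dot products $\log\left<\boldsymbol{\theta}_{J};\boldsymbol{\theta}_{K}\right>$, not actual model output) performs the double centering of Eq. S19, diagonalizes, and assigns real or imaginary components according to the sign of each eigenvalue:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50

# Hypothetical symmetric log dot products log<theta_J; theta_K>: random
# overlaps in (0, 1], with log<theta; theta> = log 1 = 0 on the diagonal.
G = rng.uniform(0.1, 1.0, size=(n, n))
G = np.log((G + G.T) / 2)
np.fill_diagonal(G, 0.0)

# Double centering as in Eq. (S19): subtract row and column means and add
# back the grand mean.
row = G.mean(axis=1, keepdims=True)
MMT = G - row - row.T + G.mean()

# In the zero-replica limit MMT need not be positive semi-definite, so
# eigenvalues of both signs appear.
w, U = np.linalg.eigh(MMT)
order = np.argsort(-np.abs(w))      # sort by magnitude of variance
w, U = w[order], U[:, order]

# Principal components: real where w > 0, imaginary where w < 0.
pcs = U * np.sqrt(np.abs(w)) * np.where(w >= 0, 1, 1j)
```

With this sign convention the (non-conjugated) outer product of the components reproduces the centered dot-product matrix exactly, so squared distances computed in the embedding match those of Eq. S19.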

In order to include experimental data in such a plot, a probability distribution must be generated from the data. The dot products between these measurements and the predicted distributions can then be calculated, and the results used in the matrix of dot products in Eq. S19 to find the data's projection along the principal components. The distance from the data point to the manifold is given by our intensive distance from the point to the best fit.

References

  • Planck Collaboration (2015) Planck Collaboration, arXiv:1507.02704 (2015).
  • Hamimeche and Lewis S. Hamimeche and A. Lewis, Phys. Rev. D 77.
  • Tegmark (1997) M. Tegmark, Phys. Rev. D 55 (1997).
  • Abazajian et al. (2016) K. N. Abazajian et al., ArXiv e-prints  (2016), arXiv:1610.02743 .
  • Perotto et al. (2006) L. Perotto, J. Lesgourgues, S. Hannestad, H. Tu,  and Y. Wong, JCAP 0610 (2006).
  • Lewis et al. (2000) A. Lewis, A. Challinor, and A. Lasenby, Astrophys. J. 538, 473 (2000), arXiv:astro-ph/9911177 [astro-ph].
  • Hellinger (1909) E. Hellinger, J. Reine Angew. Math. 136, 210 (1909).
  • Amari (2016) S. Amari, Information Geometry and its Applications, Vol. 194 (Springer, 2016).
  • Amari and Nagaoka (2000) S. Amari and H. Nagaoka, Methods of Information Geometry, Translations of Mathematical Monographs, Vol. 191 (Oxford University Press, 2000).