This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

The non-Gaussian likelihood of weak lensing power spectra

Alex Hall \scalerel* 1 ahall@roe.ac.uk    Andy Taylor Institute for Astronomy, University of Edinburgh, Royal Observatory, Blackford Hill, Edinburgh, EH9 3HJ, U.K.
Abstract

The power spectrum of weak lensing fluctuations has a non-Gaussian distribution due to its quadratic nature. On small scales the Central Limit Theorem acts to Gaussianize this distribution but non-Gaussianity in the signal due to gravitational collapse is increasing and the functional form of the likelihood is unclear. Analyses have traditionally assumed a Gaussian likelihood with non-linearity incorporated into the covariance matrix; here we provide the theory underpinning this assumption. We calculate, for the first time, the leading-order correction to the distribution of angular power spectra from non-Gaussianity in the underlying signal and study the transition to Gaussianity. Our expressions are valid for an arbitrary number of correlated maps and correct the Wishart distribution in the presence of weak (but otherwise arbitrary) non-Gaussianity in the signal. Surprisingly, the resulting distribution is not equivalent to an Edgeworth expansion. The leading-order effect is to broaden the covariance matrix by the usual trispectrum term, with residual skewness sourced by the trispectrum and the square of the bispectrum. Using lognormal lensing maps we demonstrate that our likelihood is uniquely able to model both large and mildly non-linear scales. We provide easy-to-compute statistics to quantify the size of the non-Gaussian corrections. We show that the full non-Gaussian likelihood can be accurately modelled as a Gaussian on small, non-linear scales. On large angular scales non-linearity in the lensing signal imparts a negligible correction to the likelihood, which takes the Wishart form in the full-sky case. Our formalism is equally applicable to any kind of projected field.

I Introduction

In the coming decade several large astronomy projects aim to place percent-level constraints on cosmic acceleration, testing the Λ\LambdaCDM cosmological model and its extensions. These include the European Space Agency’s Euclid mission111https://sci.esa.int/web/euclid, the Vera C. Rubin Observatory222https://www.lsst.org/about, and the Nancy Grace Roman Space Telescope333https://roman.gsfc.nasa.gov/. These ‘Stage-IV’ facilities will each image roughly a billion galaxies, with redshifts derived from photometry calibrated against deep spectroscopic training samples.

Upcoming surveys will use these galaxies to measure the weak gravitational lensing signal (see Refs. Kilbinger (2015); Mandelbaum (2018) for recent reviews), which directly probes the underlying matter distribution and provides the opportunity to derive cosmological constraints from its statistical properties. Current lensing surveys are now achieving precision on certain Λ\LambdaCDM cosmological parameter combinations at the level of a few percent, and providing useful constraints on the properties of dark energy Hamana et al. (2020); Heymans et al. (2021); Abbott et al. (2022). The influence of weak lensing in constraining cosmological parameters is likely to increase with the advent of Stage-IV surveys, if systematic errors can be kept under control.

It is now fairly common practice to combine measurements of the cosmic shear signal with maps of the angular positions of galaxies, sometimes referred to as ‘3×\times2 pt’ analysis Abbott et al. (2018). The relatively coarse precision of photometric redshifts means that binning galaxies into broad redshift bins is reasonably lossless, a technique called ‘tomography’ Hu (1999); Schrabback et al. (2010); Heymans et al. (2013); Jee et al. (2016). Combining tomographic galaxy clustering and weak lensing are examples of the use of projected maps of dark matter (or discrete tracers thereof) to constrain cosmological models.

The baseline parameter constraints from current and upcoming surveys using projected fields will be derived from two-point statistics, such as correlation functions or power spectra, combined with a likelihood function. Accurate cosmological inference using this approach requires the likelihood to be specified accurately. The functional form of the likelihood has traditionally been assumed to be Gaussian in the measured summary statistics, the rationale being that the Central Limit Theorem (CLT) drives the distribution of these statistics towards Gaussianity. However, it has long been known that the distribution of weak lensing or galaxy clustering two-point statistics is in fact not Gaussian, for several reasons. Firstly, two-point statistics are non-linear (quadratic) functions of the data. If the data are Gaussian distributed, power spectra and correlation functions are expected to be non-Gaussian as a result, which can easily be seen by noting that the variance of the field (the correlation function at zero lag) cannot be negative. On the full sky the distribution of power spectra of Gaussian fields can be written down analytically (e.g. Ref. Percival and Brown (2006)), but in the presence of a survey mask (required to mitigate foreground contamination) the naive estimator (i.e. the pseudo-ClC_{l} estimator) has a distribution that is complicated to compute and must typically be approximated Hamimeche and Lewis (2008); Upham et al. (2020).

A second source of non-Gaussianity in the likelihood arises from the fact that the fields themselves are not Gaussian due to non-linear gravitational collapse. This issue is inextricably tied to the efficacy of the CLT to Gaussianize the estimator, because both the number of degrees of freedom being compressed by the estimator (e.g. the number of galaxy pairs of a given angular separation) and the strength of the signal non-linearity increase as the minimum angular scale is lowered. Non-Gaussianity of the signal (the assumption of Gaussian noise is typically safe) is usually incorporated by modifying the covariance matrix of the two-point statistic with a term derived from the connected four-point function (trispectrum) of the signal, while keeping the Gaussian functional form (e.g. Hikage et al. (2019); Nicola et al. (2021); Loureiro et al. (2021)).

Additionally, non-Gaussianity in the likelihood can also arise due to the presence of non-Gaussian systematic residuals, or when the covariance has been estimated from simulations Sellentin and Heavens (2016) or combined with an analytic model Hall and Taylor (2019).

Recognising that the approximation of Gaussianity may contribute bias or imprecision to parameter constraints, several works have successfully measured non-Gaussianity in the distribution of two-point statistics from simulations. Ref. Scoccimarro (2000) was an early work to find evidence for non-Gaussian moments of power spectrum estimates, but Ref. Hartlap et al. (2009) and Ref. Schneider and Hartlap (2009) represent the first attempts to incorporate non-Gaussianity into the likelihood of a weak lensing survey. The approach taken in Ref. Hartlap et al. (2009) was to model the data vector (correlation functions in their case) as the sum of independent non-Gaussian components, using the fastICA Independent Component Analysis method to construct the likelihood. Subsequent analytic work by Ref. Keitel and Schneider (2011) and Ref. Wilking and Schneider (2013) further established the non-Gaussianity of the weak lensing correlation functions. Ref. Sellentin and Heavens (2018) introduced diagnostic statistics to identify non-Gaussianity in cosmological likelihood functions, and evidence for non-Gaussianity in a weak lensing likelihood was found in Ref. Sellentin et al. (2018). Ref. Hahn et al. (2019) studied non-Gaussianity in galaxy clustering power spectra, building on and extending the work of Ref. Hartlap et al. (2009), and found non-negligible (albeit sub-1σ1\sigma) biases in parameters. In the context of weak lensing correlation functions from Stage-IV surveys, Ref. Lin et al. (2020) identified significant skewness and kurtosis in the shear correlation functions and used a Principal Component Analysis to model the resulting non-Gaussian likelihood, but found that a Gaussian approximation was sufficient for Λ\LambdaCDM parameters and an LSST-like survey. This conclusion is supported by the work of Ref. Taylor et al. (2019) who used likelihood-free methods to establish the validity of a Gaussian approximation for Stage-IV surveys and Λ\LambdaCDM parameters. Recently, Ref. Diaz Rivero and Dvorkin (2020) demonstrated clear non-Gaussianity in the likelihood of weak lensing power spectra, and presented a method to model this using normalizing flows. Ref. Friedrich et al. (2021) have also studied the impact of non-Gaussian likelihoods on parameter constraints, finding a negligible impact for the Dark Energy Survey data vector.

What these works have established is that while non-Gaussianity is clearly present in the weak lensing likelihood, the impact of the Gaussian assumption on the posterior of parameters appears to be very modest, at least in Λ\LambdaCDM models. However, there has been relatively little attempt to distinguish between the non-Gaussianity that should arise on survey scales due to the finite number of modes (that makes a Gamma distribution more accurate than a Gaussian) from the non-Gaussianity that could arise on small scales due to the CLT being inefficient at Gaussianizing the summary statistics in the face of increasing non-linearity in the signal. The use of the power spectrum as the statistic of choice aids in distinguishing between these two effects, as well as allowing the modelling to be simplified due to the diagonal covariance that arises for Gaussian fields. As well as being unable to distinguish between various sources of non-Gaussianity, most of the works modelling non-Gaussian likelihoods have used numerical techniques or ad-hoc approximations to capture the corrections, often with little physical justification. This is partly because the full non-Gaussian distribution of the shear or galaxy clustering signal cannot be written down.

In this work, we incorporate the effects of a non-Gaussian signal into the likelihood of the angular power spectrum, and study how this impacts various posterior distributions of interest. To make analytic progress we assume that the signal non-Gaussianity is sufficiently weak that it can be treated perturbatively. While this is a constraining assumption, it will allow us to study in detail how the various sources of non-Gaussianity interact and make sense of the results of the numerical studies referenced above. Our approach is more in the spirit of Ref. Smith et al. (2006), who studied similar kinds of effects in the power spectra of the weakly lensed cosmic microwave background, than the numerical works that have sought to model the full non-perturbative likelihood. We argue that an analytic approach building from the well-understood case of Gaussian fields and full sky coverage offers a valuable way of understanding the Gaussianity, or otherwise, of the likelihood function. Having a tractable model that can be quickly computed (and sampled from) also allows the effects of non-Gaussianity to be studied for any cosmological model, rather than being limited to Λ\LambdaCDM.

This paper is structured as follows. In Section II we present the main expression used in this work, Equation (22), and study its properties. In Section III we detail the simulations used to validate our likelihood and quantify the corrections to the posterior distributions. We present our conclusions in Section IV. In a series of appendices we present the derivation of the likelihood function, study the impact of alternative angular binning choices, and test the impact of the source redshift.

II The likelihood of angular power spectrum estimates

Consider a survey that has made noisy maps di(𝐧^)d^{i}(\hat{\mathbf{n}}) of the large-scale scale structure across the full sky in pp tomographic redshift bins labelled by the index ii. Taking the spherical harmonic coefficients almia^{i}_{lm} of these maps and arranging them into a vector of length pp, the angular power spectra including all cross-bin pairs can be packaged into a p×pp\times p matrix 𝐂^l\hat{\mathbf{C}}_{l} given by

ν𝐂^lm=ll𝐚lm𝐚lm,\nu\hat{\mathbf{C}}_{l}\equiv\sum_{m=-l}^{l}\mathbf{a}_{lm}\mathbf{a}_{lm}^{\dagger}, (1)

where ν2l+1\nu\equiv 2l+1 is the number of angular degrees of freedom at each ll, and the spherical multipoles of the data consist of signal plus noise as almi=slmi+nlmia^{i}_{lm}=s^{i}_{lm}+n^{i}_{lm}. We will assume the maps are spin-0 (i.e. scalar), although this could easily be relaxed; in the case of shear data for example we can define the vector 𝐚lm\mathbf{a}_{lm} to include two entries per redshift bin corresponding to the shear EE and BB modes. Similarly, projected galaxy number counts may be included in the data vector for a full 3×23\times 2pt analysis. The hatted power spectrum in Equation (1) denotes a measured, noisy quantity. We will assume that the maps are purely real, and hence all multipoles obey

𝐚lm=(1)m𝐚lm.\mathbf{a}_{l-m}=(-1)^{m}\mathbf{a}^{*}_{lm}. (2)

Our goal is to compute the sampling distribution of these power spectrum estimates, p({𝐂^l})p(\{\hat{\mathbf{C}}_{l}\}), with braces indicating the full set of angular multipoles ll, given knowledge of the underlying statistical properties of the noise and the signal. The sampling distribution considered as a function of these properties is the likelihood function. The power spectra are quadratic in the data and so are expected to obey Gaussian statistics only when the Central Limit Theorem is effective. The other source of non-Gaussianity is from the signal itself, which when measured in the late Universe on small scales has higher-order moments due to non-linear structure formation.

As discussed in Appendix A we cannot proceed by performing an Edgeworth expansion of the power spectrum distribution around its zero-order Wishart form, so instead we develop a new approach and begin by considering the statistics of the underlying data. Assuming the signal and noise are independent, the joint distribution of the power spectra (multiplied by ν\nu) is

p({ν𝐂^l})=d{𝐬lm}d{𝐧lm}[l=lminlmaxδD(ν𝐂^lm𝐚lm𝐚lm)]ps({𝐬lm})pn({𝐧lm}),p(\{\nu\hat{\mathbf{C}}_{l}\})=\int\mathrm{d}\{\mathbf{s}_{lm}\}\int\mathrm{d}\{\mathbf{n}_{lm}\}\left[\prod_{l=l_{{\rm min}}}^{l_{{\rm max}}}\delta_{D}\left(\nu\hat{\mathbf{C}}_{l}-\sum_{m}\mathbf{a}_{lm}\mathbf{a}_{lm}^{\dagger}\right)\right]p_{s}(\{\mathbf{s}_{lm}\})p_{n}(\{\mathbf{n}_{lm}\}), (3)

where the probability density p({ν𝐂^l})p(\{\nu\hat{\mathbf{C}}_{l}\}) is defined with a measure over real and symmetric matrices, which have np(p+1)/2n\equiv p(p+1)/2 independent elements. We will expand the Dirac delta in terms of real and symmetric pp-dimensional matrices 𝐉l\mathbf{J}_{l} as

δD(ν𝐂^lm𝐚lm𝐚lm)=2p(p1)2d𝐉l(2π)neiTr[𝐉l(ν𝐂^lm𝐚lm𝐚lm)],\delta_{D}\left(\nu\hat{\mathbf{C}}_{l}-\sum_{m}\mathbf{a}_{lm}\mathbf{a}_{lm}^{\dagger}\right)=2^{\frac{p(p-1)}{2}}\int\frac{\mathrm{d}\mathbf{J}_{l}}{(2\pi)^{n}}e^{i\mathrm{Tr}\left[\mathbf{J}_{l}\left(\nu\hat{\mathbf{C}}_{l}-\sum_{m}\mathbf{a}_{lm}\mathbf{a}_{lm}^{\dagger}\right)\right]}, (4)

where we integrate over all real symmetric matrices and the volume element is

d𝐉l=dJl1,1dJl1,2dJl1,pdJl2,2dJl2,3dJlp,p.\mathrm{d}\mathbf{J}_{l}=\mathrm{d}J_{l}^{1,1}\mathrm{d}J_{l}^{1,2}\dots\mathrm{d}J_{l}^{1,p}\mathrm{d}J_{l}^{2,2}\mathrm{d}J_{l}^{2,3}\dots\mathrm{d}J_{l}^{p,p}. (5)

Note that the 2p(p1)22^{\frac{p(p-1)}{2}} factor is needed since the off-diagonal elements of 𝐉l\mathbf{J}_{l} are double counted in the summation in the exponential, and hence we need to redefine each of the p(p1)/2p(p-1)/2 off-diagonal terms in the integration with a factor of 22.

We expand the signal distribution in Fourier modes as

p({𝐬lm})=d{𝐤lm}(2π)pNϕs({𝐤lm})eilm𝐤lm𝐬lm,p(\{\mathbf{s}_{lm}\})=\int\frac{\mathrm{d}\{\mathbf{k}_{lm}\}}{(2\pi)^{pN}}\,\phi_{s}(\{\mathbf{k}_{lm}\})\,e^{i\sum_{lm}\mathbf{k}_{lm}^{\dagger}\mathbf{s}_{lm}}, (6)

where Nl=lminlmax2l+1N\equiv\sum_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}2l+1 is the total number of modes per tomographic bin, and ϕs\phi_{s} is the characteristic function of the signal444We will define characteristic functions this way throughout, but we caution the reader that other authors sometimes define it with a complex conjugate.. We can, and will, define the ‘wave numbers’ 𝐤lm\mathbf{k}_{lm} to obey reality conditions as in Equation (2). The corresponding characteristic function of the noise is ϕn\phi_{n}.

Inserting Equations (4) and (6) into Equation (3), changing integration variables and doing an integral over 𝐚lm\mathbf{a}_{lm} assuming 𝐉l\mathbf{J}_{l} is non-singular555The set of singular 𝐉l\mathbf{J}_{l} has zero measure, and as the integrand is finite for this set the contribution from singular modes is vanishingly small in the limit. gives

p({ν𝐂^l})=2λp(p1)2d{𝐉l}(2π)nλeilTr(ν𝐉l𝐂^l)(l=lminlmax|2i𝐉l|ν2)d{𝐤lm}(2π)pN2ϕs({𝐤lm})e12lm𝐤lm(2i𝐉l)1𝐤lmϕn({𝐤lm}),p(\{\nu\hat{\mathbf{C}}_{l}\})=2^{\frac{\lambda p(p-1)}{2}}\int\frac{\mathrm{d}\{\mathbf{J}_{l}\}}{(2\pi)^{n\lambda}}e^{i\sum_{l}\mathrm{Tr}(\nu\mathbf{J}_{l}\hat{\mathbf{C}}_{l})}\left(\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\lvert 2i\mathbf{J}_{l}\rvert^{-\frac{\nu}{2}}\right)\int\frac{\mathrm{d}\{\mathbf{k}_{lm}\}}{(2\pi)^{\frac{pN}{2}}}\phi_{s}(\{\mathbf{k}_{lm}\})e^{-\frac{1}{2}\sum_{lm}\mathbf{k}_{lm}^{\dagger}(2i\mathbf{J}_{l})^{-1}\mathbf{k}_{lm}}\phi_{n}(\{\mathbf{k}_{lm}\}), (7)

where λlmaxlmin+1\lambda\equiv l_{\mathrm{max}}-l_{\mathrm{min}}+1.

Now, we define the characteristic function of {ν𝐂^l}\{\nu\hat{\mathbf{C}}_{l}\} over its independent elements as

ϕ{ν𝐂^l}({𝐉l})=d{ν𝐂^l}eilTr(ν𝐉l𝐂^l)p({ν𝐂^l}).\phi_{\{\nu\hat{\mathbf{C}}_{l}\}}(\{\mathbf{J}_{l}\})=\int\mathrm{d}\{\nu\hat{\mathbf{C}}_{l}\}e^{-i\sum_{l}\mathrm{Tr}(\nu\mathbf{J}_{l}\hat{\mathbf{C}}_{l})}p(\{\nu\hat{\mathbf{C}}_{l}\}). (8)

This gives, finally,

ϕ{ν𝐂^l}({𝐉l})=(l=lminlmax|2i𝐉l|ν2)d{𝐤lm}(2π)pN2ϕs({𝐤lm})e12lm𝐤lm(2i𝐉l)1𝐤lmϕn({𝐤lm}),\phi_{\{\nu\hat{\mathbf{C}}_{l}\}}(\{\mathbf{J}_{l}\})=\left(\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\lvert 2i\mathbf{J}_{l}\rvert^{-\frac{\nu}{2}}\right)\int\frac{\mathrm{d}\{\mathbf{k}_{lm}\}}{(2\pi)^{\frac{pN}{2}}}\phi_{s}(\{\mathbf{k}_{lm}\})e^{-\frac{1}{2}\sum_{lm}\mathbf{k}_{lm}^{\dagger}(2i\mathbf{J}_{l})^{-1}\mathbf{k}_{lm}}\phi_{n}(\{\mathbf{k}_{lm}\}), (9)

i.e. a mode coupling integral between the characteristic functions of the signal, the noise, and a Gaussian in the ‘wave number’ of the field 𝐤lm\mathbf{k}_{lm} having covariance 2i𝐉2i\mathbf{J}, where 𝐉\mathbf{J} is the ‘wave number’ of the power spectra.

II.1 The known case of a Gaussian signal and Gaussian noise

So far, we have not assumed anything about the statistics of the signal or the noise, only that they are independent. In the case of Gaussian signal and noise, the characteristic functions ϕs\phi_{s} and ϕn\phi_{n} also take a Gaussian form, and the integrals in Equation (9) all decouple. Each integration has a Gaussian form that can be done analytically, with the result being a product of Wishart characteristic functions

ϕ{ν𝐂^l}({𝐉l})=l=lminlmax|𝐈+2i𝐉l𝐂l|ν2,\phi_{\{\nu\hat{\mathbf{C}}_{l}\}}(\{\mathbf{J}_{l}\})=\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\lvert\mathbf{I}+2i\mathbf{J}_{l}\mathbf{C}_{l}\rvert^{-\frac{\nu}{2}}, (10)

where 𝐂l\mathbf{C}_{l} is the expected total (signal plus noise) covariance matrix of the map. The inverse Fourier transform of this can be computed analytically (see Appendix B), and gives the well-known distribution of the power spectra as a product of Wishart distributions (e.g., Ref. Percival and Brown (2006))

p({𝐂^l})\displaystyle p(\{\hat{\mathbf{C}}_{l}\}) =l=lminlmax|𝐂^l|(νp1)/22pν2Γp(ν2)|𝐂l/ν|ν2eν2Tr(𝐂l1𝐂^l)\displaystyle=\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\frac{\lvert\hat{\mathbf{C}}_{l}\rvert^{(\nu-p-1)/2}}{2^{\frac{p\nu}{2}}\Gamma_{p}\left(\frac{\nu}{2}\right)\left|\mathbf{C}_{l}/\nu\right|^{\frac{\nu}{2}}}e^{-\frac{\nu}{2}\mathrm{Tr}(\mathbf{C}_{l}^{-1}\hat{\mathbf{C}}_{l})}
l=lminlmaxWp(𝐂^l;𝐂l/ν,ν),\displaystyle\equiv\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}W_{p}(\hat{\mathbf{C}}_{l};\mathbf{C}_{l}/\nu,\nu), (11)

where WpW_{p} is the pp-dimensional Wishart density and Γp\Gamma_{p} is the multivariate Gamma function.

Each 𝐂^l\hat{\mathbf{C}}_{l} estimate thus has a Wishart distribution with scale matrix 𝐂l/ν\mathbf{C}_{l}/\nu and degrees of freedom ν\nu with ν>p1\nu>p-1. As expected, the mean of each 𝐂^l\hat{\mathbf{C}}_{l} is 𝐂l\mathbf{C}_{l} (the estimator is unbiased), and the variance of a given map power spectrum is 2Cl2/ν2C_{l}^{2}/\nu. The latter also follows immediately from squaring the definition Equation (1) and using Wick’s theorem.

When p=1p=1 this distribution is equivalent to a Gamma distribution. The Gamma distribution has non-zero skewness, kurtosis, and higher-order cumulants. For example, the dimensionless skewness is equal to S=22/νS=2\sqrt{2/\nu}, and the dimensionless excess kurtosis is κ=12/ν\kappa=12/\nu. It is easy to see that higher-order cumulants come with successive factors of ν1/2\nu^{-1/2}; this is how the Central Limit Theorem manifests itself when ν1\nu\gg 1, since these higher-order cumulants are all formally zero for a Gaussian field. From Equation (1) we see that ν\nu counts the azimuthal modes contributing to a given wave number, such that the power spectrum estimator is the average of an increasingly large number of independent variables as ν\nu increases.

The convergence of any probability distribution to a Gaussian can be established formally by showing that its characteristic function tends pointwise to that of a Gaussian (Lèvy’s continuity theorem, e.g. Ref. Durrett (2010)). Inspection of Equation (10) makes clear how this happens, where the convergence to a Gaussian can be established by Taylor-expanding around ν=\nu=\infty. One can also establish Gaussianity from the density directly. In the p=1p=1 case one first has to change variables to C^l=Cl(1+n2/ν)\hat{C}_{l}=C_{l}(1+n\sqrt{2/\nu}) then take the limit ν\nu\to\infty at fixed nn, using Stirling’s approximation to deal with the Gamma function in the denominator. Whichever way one chooses to go, the resulting limiting distribution is a product of Gaussians, each with mean ClC_{l} and variance 2Cl2/ν2C_{l}^{2}/\nu.

We close this section by noting that in general one has to be careful about assuming a Gaussian sampling distribution for power spectrum estimates, as this does not necessarily guarantee a Gaussian posterior for parameters at the same level of accuracy. For example, if one has an informative and non-Gaussian prior on a parameter, the amount of constraining data required for the likelihood to dominate the posterior can be well above that needed to Gaussianize the likelihood. Secondly, and more importantly, posterior standard errors on parameters can shrink as fast as non-Gaussian contributions to the likelihood, meaning relative corrections from non-Gaussianity can remain important for posteriors even on small angular scales. In general these effects are most robustly studied with simulations. Ref. Hamimeche and Lewis (2008) provides a thorough discussion of residual non-Gaussian effects from failure of the Central Limit Theorem for Gaussian fields in the CMB context, finding that sufficiently broad multipole bins can reduce the error from approximating the likelihood as Gaussian to acceptable levels. We will return to this question in the cosmic shear context in Section III.

II.2 Corrections to the tomographic shear power distribution from a non-Gaussian signal

In the previous section we derived the distribution of angular power spectrum estimates assuming both signal and noise were Gaussian. In this section we will consider the more realistic scenario of a non-Gaussian signal, but will keep the assumption of Gaussian noise.

In the Gaussian case, the mode coupling integral in Equation (9) can be performed analytically due to the Gaussian form of the signal characteristic function ϕs\phi_{s}. The fully general non-Gaussian characteristic function of the signal has no analytic expression that can be written down, so to make progress we must restrict to weak non-Gaussianity, in the sense that the higher-order cumulants of the signal are progressively suppressed. This is the Edgeworth expansion Bouchet et al. (1992); Juszkiewicz et al. (1995); Bernardeau and Kofman (1995); Amendola (1996); Blinnikov and Moessner (1998); Taylor and Watts (2001), and truncating at leading order gives

ϕs({𝐤lm})=\displaystyle\phi_{s}(\{\mathbf{k}_{lm}\})= (1i16κlm¯ijkkl1m1ikl2m2jkl3m3k\displaystyle\left(1-i\frac{1}{6}\kappa^{ijk}_{\underline{lm}}k^{i}_{l_{1}m_{1}}k^{j}_{l_{2}m_{2}}k^{k}_{l_{3}m_{3}}\vphantom{\frac{1}{2}}\right.
+124κlm¯ijkmkl1m1ikl2m2jkl3m3kkl4m4m172κlm¯ijkκlm¯lmnkl1m1ikl2m2jkl3m3kkl1m1lkl2m2mkl3m3n)ϕsG({𝐤lm}),\displaystyle\left.\vphantom{\frac{1}{2}}+\frac{1}{24}\kappa^{ijkm}_{\underline{lm}}k^{i}_{l_{1}m_{1}}k^{j}_{l_{2}m_{2}}k^{k}_{l_{3}m_{3}}k^{m}_{l_{4}m_{4}}-\frac{1}{72}\kappa^{ijk}_{\underline{lm}}\kappa^{lmn}_{\underline{l^{\prime}m^{\prime}}}k^{i}_{l_{1}m_{1}}k^{j}_{l_{2}m_{2}}k^{k}_{l_{3}m_{3}}k^{l}_{l_{1}^{\prime}m_{1}^{\prime}}k^{m}_{l_{2}^{\prime}m_{2}^{\prime}}k^{n}_{l_{3}^{\prime}m_{3}^{\prime}}\right)\phi_{s}^{G}(\{\mathbf{k}_{lm}\}), (12)

where all repeated indices are summed over and ϕsG\phi_{s}^{G} is the Gaussian characteristic function. The compact notation lm¯\underline{lm} denotes l1m1,l2m2,l3m3l_{1}m_{1},l_{2}m_{2},l_{3}m_{3} when attached to a three-point cumulant, and l1m1,l2m2,l3m3,l4m4l_{1}m_{1},l_{2}m_{2},l_{3}m_{3},l_{4}m_{4} when attached to a four-point cumulant. These are defined as

κlm¯ijk\displaystyle\kappa^{ijk}_{\underline{lm}} sl1m1isl2m2jsl3m3k\displaystyle\equiv\langle s^{i}_{l_{1}m_{1}}s^{j}_{l_{2}m_{2}}s^{k}_{l_{3}m_{3}}\rangle (13)
κlm¯ijkm\displaystyle\kappa^{ijkm}_{\underline{lm}} sl1m1isl2m2jsl3m3ksl4m4mc,\displaystyle\equiv\langle s^{i}_{l_{1}m_{1}}s^{j}_{l_{2}m_{2}}s^{k}_{l_{3}m_{3}}s^{m}_{l_{4}m_{4}}\rangle_{c}, (14)

where the subscript cc denotes the fully connected part.

In Appendix A we discuss how the signal cumulants are related to those of the power spectrum estimates themselves. The covariance and leading order dimensionless skewness of C^l\hat{C}_{l} in the p=1p=1 case are given by

ΔC^lΔC^l\displaystyle\langle\Delta\hat{C}_{l}\Delta\hat{C}_{l^{\prime}}\rangle =2νCl2δll+Tll,\displaystyle=\frac{2}{\nu}C_{l}^{2}\delta_{ll^{\prime}}+T_{ll^{\prime}}, (15)
ΔC^l3ΔCl23/2\displaystyle\frac{\langle\Delta\hat{C}_{l}^{3}\rangle}{\langle\Delta C_{l}^{2}\rangle^{3/2}} 8ν+2ν3B~lllCl3+32ν2TllCl2.\displaystyle\approx\sqrt{\frac{8}{\nu}}+\sqrt{2\nu^{3}}\frac{\tilde{B}_{lll}}{C_{l}^{3}}+\frac{3\sqrt{2\nu}}{2}\frac{T_{ll}}{C_{l}^{2}}. (16)

The skewness has a term coming from the Gaussian part of the signal (8/ν\sqrt{8/\nu}), plus a term proportional to the squared bispectrum of the signal B~lll\tilde{B}_{lll}, defined later in Equation (21), and a term proportional to its trispectrum TllT_{ll^{\prime}}. These latter two quantities are exactly those present in Equation (12), and are also the leading order non-Gaussian corrections to the dimensionless kurtosis of C^l\hat{C}_{l}. Indeed, as discussed in Appendix A, all of the dimensionless cumulants of C^l\hat{C}_{l} have leading-order corrections from the trispectrum and squared bispectrum. Since these two quantities are of the same perturbative order this means that the cumulants of C^l\hat{C}_{l} do not progressively smaller as their order increases, and an Edgeworth expansion of the distribution of C^l\hat{C}_{l} around a Wishart will not work. This justifies our approach of expanding the underlying signal as an Edgeworth expansion, and then propagating to get the distribution of the power spectrum.

Substituting Equation (12) into Equation (9), we immediately see that the leading order bispectrum term vanishes due to this term carrying an odd power of klmk_{lm}. The leading order contribution to the distribution is therefore from terms of order the trispectrum or bispectrum squared, whose computation formally requires going to third order in perturbation theory. We will therefore truncate the expansion in Equation (12) at the order shown. Since the probability density is linear in the characteristic function, we can simply add the trispectrum correction to that coming from bispectrum squared. We present the derivation of the characteristic function and the probability density of the power spectrum in Appendix B, with the final result given in Equations (81) and (102). We will simply quote the results here.

II.2.1 Non-Gaussian power distribution from a signal with non-zero trispectrum

In the p=1p=1 case the distribution of C^l\hat{C}_{l} corrected by a signal trispectrum given in Equation (81) reduces to

p({C^l})={l=lminlmaxΓ[C^l;ν/2,ν/(2Cl)]}[1+18l1,l2ν1ν2Tl1l2Cl1Cl2(ΔC^l1ΔC^l2Cl1Cl22ν1+2δl1l2C^l12Cl12)],p(\{\hat{C}_{l}\})=\left\{\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\Gamma[\hat{C}_{l};\nu/2,\nu/(2C_{l})]\right\}\left[1+\frac{1}{8}\sum_{l_{1},l_{2}}\nu_{1}\nu_{2}\frac{T_{l_{1}l_{2}}}{C_{l_{1}}C_{l_{2}}}\left(\frac{\Delta\hat{C}_{l_{1}}\Delta\hat{C}_{l_{2}}}{C_{l_{1}}C_{l_{2}}}-\frac{2}{\nu_{1}+2}\delta_{l_{1}l_{2}}\frac{\hat{C}_{l_{1}}^{2}}{C_{l_{1}}^{2}}\right)\right], (17)

where TllT_{ll^{\prime}} is the mm-averaged version of the signal trispectrum that contributes to the power spectrum covariance, see Equation (15). One can verify that the distribution Equation (17) is correctly normalised, has a mean given by C^l=Cl\langle\hat{C}_{l}\rangle=C_{l} (unchanged from the Gamma case) and a covariance matrix given by ΔC^l1ΔC^l2=2δl1l2Cl12/ν1+Tl1l2\langle\Delta\hat{C}_{l_{1}}\Delta\hat{C}_{l_{2}}\rangle=2\delta_{l_{1}l_{2}}C_{l_{1}}^{2}/\nu_{1}+T_{l_{1}l_{2}}, as expected.

Despite possessing the correct normalisation, mean and covariance, Equation (17) is not a valid probability density since it is not everywhere positive. This arises for sufficiently large C^l\hat{C}_{l}. This is a ubiquitous feature of densities based on the Edgeworth expansion, and can be remedied by writing the correction as an exponential, i.e. 1+xex1+x\approx e^{x}. This is valid since the correction is assumed to perturbatively small.

The first correcting term in Equation (17), proportional to ΔC^l1ΔC^l2\Delta\hat{C}_{l_{1}}\Delta\hat{C}_{l_{2}}, can be written in a suggestive way. It is the leading-order term in the expansion of

12l1,l2ΔC^l1(2Cl12ν1δl1l2+Tl1l2)1ΔC^l2,-\frac{1}{2}\sum_{l_{1},l_{2}}\Delta\hat{C}_{l_{1}}\left(\frac{2C_{l_{1}}^{2}}{\nu_{1}}\delta_{l_{1}l_{2}}+T_{l_{1}l_{2}}\right)^{-1}\Delta\hat{C}_{l_{2}}, (18)

where the notation (Al1l2)1(A_{l_{1}l_{2}})^{-1} refers to the l1,l2l_{1},l_{2} element of the inverse of the matrix with elements Al1l2A_{l_{1}l_{2}}, as should be clear from context. Equation (18) is a χ2\chi^{2}-like functional having the inverse of the total covariance matrix. Once the Gaussian part of this expansion is ‘available’ from the zero-order Gamma term when the ν\nu\to\infty limit is taken, we will be left a Gaussian possessing the correct inverse covariance matrix in its exponent. Similarly, the second term in Equation (17) is the leading-order correction to the determinant pre-factor of a Gaussian due to the non-Gaussian covariance Tl1l2T_{l_{1}l_{2}}. Thus, in the ν\nu\to\infty limit, our distribution is the leading-order correction to a Gaussian due to Tl1l2T_{l_{1}l_{2}}. This can also be seen directly from the characteristic function, Equation (71).

Considered as a function of the model power spectrum ClC_{l} and the trispectrum Tl1l2T_{l_{1}l_{2}}, Equation (17) gives the likelihood function for fixed data C^l\hat{C}_{l}. In the next section we will compute the correction to the likelihood numerically, but there are a few analytic results we can derive from this likelihood. Most significantly, we can compute its expected curvature around a fiducial power spectrum Cf,lC_{f,l} and non-Gaussian covariance Tf,l1l2T_{f,l_{1}l_{2}} i.e. the Fisher matrix. Assuming a model with parameters θα\theta_{\alpha}, this is given by

Fαβ\displaystyle F_{\alpha\beta} =12l=lminlmaxν1Cf,l2ClθαClθβ14l1,l2=lminlmaxν1ν2Tf,l1l2Cf,l12Cf,l22Cl1θαCl2θβ\displaystyle=\frac{1}{2}\sum_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\nu\frac{1}{C_{f,l}^{2}}\frac{\partial C_{l}}{\partial\theta_{\alpha}}\frac{\partial C_{l}}{\partial\theta_{\beta}}-\frac{1}{4}\sum_{l_{1},l_{2}=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\nu_{1}\nu_{2}\frac{T_{f,l_{1}l_{2}}}{C_{f,l_{1}}^{2}C_{f,l_{2}}^{2}}\frac{\partial C_{l_{1}}}{\partial\theta_{\alpha}}\frac{\partial C_{l_{2}}}{\partial\theta_{\beta}}
l1,l2=lminlmaxCl1θα(2Cf,l12νδl1l2+Tf,l1l2)1Cl2θβ,\displaystyle\approx\sum_{l_{1},l_{2}=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\frac{\partial C_{l_{1}}}{\partial\theta_{\alpha}}\left(\frac{2C_{f,l_{1}}^{2}}{\nu}\delta_{l_{1}l_{2}}+T_{f,l_{1}l_{2}}\right)^{-1}\frac{\partial C_{l_{2}}}{\partial\theta_{\beta}}, (19)

where the derivatives are evaluated at the fiducial model. Equation (19) is the usual expression for the power spectrum Fisher matrix but with the proper non-Gaussian covariance matrix. This matches the expression for the Fisher matrix derived from a Gaussian likelihood with the covariance fixed to a fiducial model and including the non-Gaussian term, and tells us that to recover the full-sky Fisher matrix from a likelihood approximated by its Gaussian limit we have to fix the Gaussian covariance to a fiducial model. This is the main result of Ref. Carron (2013), and here we recover it in the presence of signal non-Gaussianity. It is interesting to note that in principle one can allow the non-Gaussian part of the covariance matrix, Tl1l2T_{l_{1}l_{2}}, to depend on parameters in a Gaussian likelihood even when fixing the Gaussian part, as required to keep the approximation consistent. In our perturbative framework the additional information contributed by Tl1l2(θ)T_{l_{1}l_{2}}(\theta) is higher order, but whether this is consistent with the likelihood approximation for general Tl1l2T_{l_{1}l_{2}} is still an open question, and covered by neither our analysis nor that of Ref. Carron (2013).

We note finally that we do not recover the field-level Fisher matrix from our power spectrum Fisher matrix, which now contains new terms due to non-Gaussianity. This is an expression of the fact that the power spectrum does not now contain all the information.

II.2.2 Non-Gaussian power distribution from a signal with non-zero bispectrum

We have argued that a consistent expansion of distribution of the power spectrum requires including terms of order the signal bispectrum squared. In the p=1p=1 case, the correction given in Equation (102) reduces to

p({C^l})=\displaystyle p(\{\hat{C}_{l}\})= {l=lminlmaxΓ[C^l;ν/2,ν/(2Cl)]}\displaystyle\left\{\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\Gamma[\hat{C}_{l};\nu/2,\nu/(2C_{l})]\right\}
×{1+112l1,l2,l3ν1ν2ν3B~l1l2l3Cl12Cl22Cl32[ΔC^l1ΔC^l2ΔC^l3[3]2δl1l2(ν1+2)C^l12ΔC^l3+16δl1l2δl2l3(ν1+4)(ν1+2)C^l13]},\displaystyle\times\left\{1+\frac{1}{12}\sum_{l_{1},l_{2},l_{3}}\nu_{1}\nu_{2}\nu_{3}\frac{\tilde{B}_{l_{1}l_{2}l_{3}}}{C^{2}_{l_{1}}C^{2}_{l_{2}}C^{2}_{l_{3}}}\vphantom{\frac{1}{2}}\left[\Delta\hat{C}_{l_{1}}\Delta\hat{C}_{l_{2}}\Delta\hat{C}_{l_{3}}-[3]\frac{2\delta_{l_{1}l_{2}}}{(\nu_{1}+2)}\hat{C}_{l_{1}}^{2}\Delta\hat{C}_{l_{3}}+\frac{16\delta_{l_{1}l_{2}}\delta_{l_{2}l_{3}}}{(\nu_{1}+4)(\nu_{1}+2)}\hat{C}_{l_{1}}^{3}\right]\right\}, (20)

where B~\tilde{B} is proportional to the square of the reduced bispectrum bl1l2l3b_{l_{1}l_{2}l_{3}} of the signal and is given by

B~l1l2l3=14π(l1l2l3000)2bl1l2l32,\tilde{B}_{l_{1}l_{2}l_{3}}=\frac{1}{4\pi}\begin{pmatrix}l_{1}&l_{2}&l_{3}\\ 0&0&0\end{pmatrix}^{2}b_{l_{1}l_{2}l_{3}}^{2}, (21)

with the quantity in parentheses a Wigner 3j3j symbol. The notation [n][n] in Equation (20) refers to the nn terms that follow from symmetry considerations. It is straightforward to verify that the correction leaves the normalisation, mean, and covariance matrix unchanged, but corrects the three-point function by a term equal to 4B~l,l,l′′4\tilde{B}_{l,l^{\prime},l^{\prime\prime}}, as expected from Equation (38).

One can verify that the Fisher matrix is unaffected by the squared bispectrum correction at leading order. In other words, the effect of B~l1l2l3\tilde{B}_{l_{1}l_{2}l_{3}} is to modify the skewness of the likelihood, while leaving its mean and variance approximately unaffected.

As claimed in Section A.1, the correction to the density is 𝒪(B~/Cl3)\mathcal{O}(\tilde{B}/C_{l}^{3}), the same order as the trispectrum correction. This justifies our decision to include terms of order the bispectrum squared.

In summary, the leading-order correction to the distribution of power spectrum estimates from signal non-Gaussianity is

p({C^l}|{Cl,Tl,l,B~l,l,l′′})\displaystyle p(\{\hat{C}_{l}\}|\{C_{l},T_{l,l^{\prime}},\tilde{B}_{l,l^{\prime},l^{\prime\prime}}\}) ={l=lminlmaxΓ[C^l;ν/2,ν/(2Cl)]}{1+18l1,l2ν1ν2Tl1l2Cl1Cl2(ΔC^l1ΔC^l2Cl1Cl22ν1+2δl1l2C^l12Cl12)\displaystyle=\left\{\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\Gamma[\hat{C}_{l};\nu/2,\nu/(2C_{l})]\right\}\left\{1+\frac{1}{8}\sum_{l_{1},l_{2}}\nu_{1}\nu_{2}\frac{T_{l_{1}l_{2}}}{C_{l_{1}}C_{l_{2}}}\left(\frac{\Delta\hat{C}_{l_{1}}\Delta\hat{C}_{l_{2}}}{C_{l_{1}}C_{l_{2}}}-\frac{2}{\nu_{1}+2}\delta_{l_{1}l_{2}}\frac{\hat{C}_{l_{1}}^{2}}{C_{l_{1}}^{2}}\right)\right.
+112l1,l2,l3ν1ν2ν3B~l1l2l3Cl12Cl22Cl32[ΔC^l1ΔC^l2ΔC^l3[3]2δl1l2(ν1+2)C^l12ΔC^l3+16δl1l2δl2l3(ν1+4)(ν1+2)C^l13]}.\displaystyle\left.+\frac{1}{12}\sum_{l_{1},l_{2},l_{3}}\nu_{1}\nu_{2}\nu_{3}\frac{\tilde{B}_{l_{1}l_{2}l_{3}}}{C^{2}_{l_{1}}C^{2}_{l_{2}}C^{2}_{l_{3}}}\vphantom{\frac{1}{2}}\left[\Delta\hat{C}_{l_{1}}\Delta\hat{C}_{l_{2}}\Delta\hat{C}_{l_{3}}-[3]\frac{2\delta_{l_{1}l_{2}}}{(\nu_{1}+2)}\hat{C}_{l_{1}}^{2}\Delta\hat{C}_{l_{3}}+\frac{16\delta_{l_{1}l_{2}}\delta_{l_{2}l_{3}}}{(\nu_{1}+4)(\nu_{1}+2)}\hat{C}_{l_{1}}^{3}\right]\right\}. (22)

Equation (22) is the first major result of this work. It is the leading-order likelihood function for the power spectrum that incorporates a non-Gaussian signal. Unlike the commonly assumed Gaussian approximation, this distribution applies equally well to the largest angular scales. This distribution has the correct normalization, mean, covariance, and three-point function. In Appendix B we provide the analogous expression in the case of multiple redshift bins, in which case the distribution is a perturbation around a Wishart distribution for positive definite matrices. A practical application of the distribution may require ensuring positivity by taking the logarithm, but in practice we have not found this to be necessary. In Section III we present numerical results for the transition of the power spectrum distribution from Wishart to post-Wishart (our Equation (22)) and finally to Gaussian as the number of degrees of freedom increases.

We close this section by noting that while we have restricted ourselves to spin-0 fields (such as lensing convergence or projected galaxy number counts) for simplicity, our formalism may be easily generalized to the spin-2 case (i.e. weak lensing shear), as mentioned at the beginning of Section II. Since we work in harmonic space on the full sky, one can simply work with the EE and BB multipoles derived in the usual way from the shear field. This augments the data vector to have length 2×Nz2\times N_{z}, but all the matrix expressions presented here can be used after making this modification. Redshift bin indices must now be interpreted as also running over EE and BB indices, and the appropriate cross-bispectra and trispectra must be used (many of which will be zero due to parity invariance).

On the cut sky our expression must be modified to account for the mask, which we have not attempted to include. This induces additional mode mixing (as well as the usual EE/BB ambiguity), which could possibly be included using the techniques of either Ref. Hamimeche and Lewis (2008) or Ref. Upham et al. (2020). We defer this to a future work.

II.3 The transition to Gaussianity

A key goal of this paper is to understand and quantify how efficiently the Central Limit Theorem (CLT) Gaussianizes the distribution of C^l\hat{C}_{l} in the presence of signal non-Gaussianity. One could well imagine a situation for example where the strength of non-Gaussianity is increasing with lmaxl_{\mathrm{max}} faster than the CLT is able to Gaussianize, in which the Gaussian limit may never be achieved. In this section we use our new distribution to study how the CLT operates in detail. We specialise to the single redshift bin case, p=1p=1.

Convergence of a probability density to a limiting distribution can be established by studying the pointwise convergence of its characteristic function. We have already computed this, and we will additionally change variables from C^l\hat{C}_{l} to the ‘standardised’ variable Z^lΔC^l/2Cl2/ν\hat{Z}_{l}\equiv\Delta\hat{C}_{l}/\sqrt{2C_{l}^{2}/\nu}. The characteristic function of the set of Z^l\hat{Z}_{l} variables is

ϕ{Z^l}({Jl})\displaystyle\phi_{\{\hat{Z}_{l}\}}(\{J_{l}\}) =[l=lminlmaxeiJlν/2(1+iJl2/ν)ν/2][112l1,l2εl1l2Jl1Jl2(1+iJl12/ν1)(1+iJl22/ν2)\displaystyle=\left[\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}e^{iJ_{l}\sqrt{\nu/2}}(1+iJ_{l}\sqrt{2/\nu})^{-\nu/2}\right]\left[1-\frac{1}{2}\sum_{l_{1},l_{2}}\varepsilon_{l_{1}l_{2}}\frac{J_{l_{1}}J_{l_{2}}}{(1+iJ_{l_{1}}\sqrt{2/\nu_{1}})(1+iJ_{l_{2}}\sqrt{2/\nu_{2}})}\right.
+i6l1,l2,l3ε~l1l2l3Jl1Jl2Jl3(1+iJl12/ν1)(1+iJl22/ν2)(1+iJl32/ν3)],\displaystyle\left.+\frac{i}{6}\sum_{l_{1},l_{2},l_{3}}\tilde{\varepsilon}_{l_{1}l_{2}l_{3}}\frac{J_{l_{1}}J_{l_{2}}J_{l_{3}}}{(1+iJ_{l_{1}}\sqrt{2/\nu_{1}})(1+iJ_{l_{2}}\sqrt{2/\nu_{2}})(1+iJ_{l_{3}}\sqrt{2/\nu_{3}})}\right], (23)

where we have defined the quantities

εl1l2\displaystyle\varepsilon_{l_{1}l_{2}} Tl1l22Cl12/ν12Cl22/ν2,\displaystyle\equiv\frac{T_{l_{1}l_{2}}}{\sqrt{2C_{l_{1}}^{2}/\nu_{1}}\sqrt{2C_{l_{2}}^{2}/\nu_{2}}}, (24)
ε~l1l2l3\displaystyle\tilde{\varepsilon}_{l_{1}l_{2}l_{3}} 4B~l1l2l32Cl12/ν12Cl22/ν22Cl32/ν3.\displaystyle\equiv\frac{4\tilde{B}_{l_{1}l_{2}l_{3}}}{\sqrt{2C_{l_{1}}^{2}/\nu_{1}}\sqrt{2C_{l_{2}}^{2}/\nu_{2}}\sqrt{2C_{l_{3}}^{2}/\nu_{3}}}. (25)

The quantity εl1l2\varepsilon_{l_{1}l_{2}} has the simple interpretation that it is the relative correction to the power spectrum covariance matrix from non-Gaussianity. Its diagonal elements can also be interpreted as the correction to the ‘effective degrees of freedom’ through νeff=ν/(1+εll)\nu_{\mathrm{eff}}=\nu/(1+\varepsilon_{ll}) such that the variance of the power spectrum is 2Cl2/νeff2C_{l}^{2}/\nu_{\mathrm{eff}}, although simply making this correction in the likelihood is not sufficient as it does not capture, among other things, mode coupling.

Consider now taking every ν1\nu\gg 1, i.e. consider the limit lmin1l_{\mathrm{min}}\gg 1. The zero-order prefactor in Equation (23) converges to a Gaussian, with the ‘linear’ part of the exponent getting cancelled by the complex exponential already present. Likewise, all the denominators in the non-Gaussian correction terms tend to unity at the same rate, leaving a quadratic correction from the trispectrum and a cubic correction from the bispectrum. In this limit, the correction is of the Edgeworth form. At the order to which we work, we may just as well replace the quadratic term by a Gaussian using eλ21λ2e^{-\lambda^{2}}\approx 1-\lambda^{2}, in which case the trispectrum term is absorbed into the covariance matrix of the Gaussian prefactor. This leaves the correction in the lmin1l_{\mathrm{min}}\gg 1 limit as

ϕ{Z^l}({Jl})exp[12l1l2Jl1(δl1l2+εl1l2)Jl2](1+i6l1,l2,l3ε~l1l2l3Jl1Jl2Jl3).\phi_{\{\hat{Z}_{l}\}}(\{J_{l}\})\approx\exp\left[-\frac{1}{2}\sum_{l_{1}l_{2}}J_{l_{1}}(\delta_{l_{1}l_{2}}+\varepsilon_{l_{1}l_{2}})J_{l_{2}}\right]\left(1+\frac{i}{6}\sum_{l_{1},l_{2},l_{3}}\tilde{\varepsilon}_{l_{1}l_{2}l_{3}}J_{l_{1}}J_{l_{2}}J_{l_{3}}\right). (26)

For this to be valid, we clearly need l1l2Jl1Jl2εl1l21\sum_{l_{1}l_{2}}J_{l_{1}}J_{l_{2}}\varepsilon_{l_{1}l_{2}}\ll 1. A necessary condition for this is that every element of ε\varepsilon is ‘small’ in a sense that depends on the quantity of interest. For example, if one was only interested in the marginal distribution of a single C^l\hat{C}_{l}, it would be sufficient to demand that only a single εll1\varepsilon_{ll}\ll 1.

The dominant effect of the trispectrum in this limit is clearly a change to the covariance matrix. Residual ‘post-Gaussian’ effects come from the squared bispectrum through the quantity ε~l1l2l3\tilde{\varepsilon}_{l_{1}l_{2}l_{3}} affecting the three-point function of the power spectrum. These are negligible when ε~l1l2l31\tilde{\varepsilon}_{l_{1}l_{2}l_{3}}\ll 1 at all multipoles.

One guaranteed way to make both ε\varepsilon and ε~\tilde{\varepsilon} small is to increase ClC_{l} at fixed Tl1l2T_{l_{1}l_{2}} or fixed B~l1l2l3\tilde{B}_{l_{1}l_{2}l_{3}}. This may be achieved by simply increasing the (assumed Gaussian) noise power, since ClC_{l} here is the total power spectrum. This is equivalent to saying that the noise ‘Gaussianizes’ the distribution. One must be careful however to check this Gaussianization also operates on not just the likelihood but the posterior, as we will do shortly.

The quantity εl1l2\varepsilon_{l_{1}l_{2}} is known to be significant on non-linear scales and is routinely measured in simulations (usually in a binned form, see Figure 1 for a related quantity). In contrast, the size of ε~l1l2l3\tilde{\varepsilon}_{l_{1}l_{2}l_{3}} is less obvious. We can make an order-of-magnitude estimate of its size using the asymptotic limit of the Wigner-3j3j symbol Ponzano and Regge (1968); Bernardeau et al. (2011)

limν1,ν2,ν3(l1l2l3000)2=12πA(L1,L2,L3),\lim_{\nu_{1},\nu_{2},\nu_{3}\to\infty}\begin{pmatrix}l_{1}&l_{2}&l_{3}\\ 0&0&0\end{pmatrix}^{2}=\frac{1}{2\pi A(L_{1},L_{2},L_{3})}, (27)

where Ll+1/2L\equiv l+1/2 and A(L1,L2,L3)A(L_{1},L_{2},L_{3}) is the area of the triangle having side lengths L1L_{1}, L2L_{2}, L3L_{3}. The squared 3j3j symbol thus decays as 1/ν21/\nu^{2} for large ν=min(ν1,ν2,ν3)\nu=\mathrm{min}(\nu_{1},\nu_{2},\nu_{3}). A necessary condition for negligible non-Gaussianity is therefore

[ν1ν2ν3/24π2A(L1,L2,L3)]1/2|bl1l2l3|Cl1Cl2Cl31\left[\frac{\sqrt{\nu_{1}\nu_{2}\nu_{3}/2}}{4\pi^{2}A(L_{1},L_{2},L_{3})}\right]^{1/2}\frac{\lvert b_{l_{1}l_{2}l_{3}}\rvert}{\sqrt{C_{l_{1}}C_{l_{2}}C_{l_{3}}}}\ll 1 (28)

when lmin1l_{\mathrm{min}}\gg 1. The pre-factor on the left-hand side of Equation (28) is a weak function of ν\nu, and is typically 102\sim 10^{-2} for the relevant scales and for most triangle configurations. The reduced bispectrum for weak lensing has been measured for a few triangle configurations from NN-body simulations in Ref. Munshi et al. (2020), who find that it decays with ll for most configurations, with the slowest decay being a roughly l1l^{-1} drop-off in their ‘squeezed’ configuration. This suggests that for the lensing convergence, the bispectrum-squared term should become rapidly negligible with ll. At l2000l\gtrsim 2000 where shape noise starts to dominate the power spectrum, Munshi et al. (2020) find that the reduced bispectrum for an equilateral configuration is 1017\sim 10^{-17}, and therefore the left-hand side of Equation (28) is 105\sim 10^{-5}. At l=100l=100 we have Cl109C_{l}\sim 10^{-9} and blll1017b_{lll}\sim 10^{-17}, so the left-hand side of Equation (28) is again 105\sim 10^{-5}. For a squeezed shape with l1=50l_{1}=50, l2=l3=2000l_{2}=l_{3}=2000, Munshi et al. (2020) finds b1016b\sim 10^{-16}. We find that Cl1108C_{l_{1}}~\sim 10^{-8}, and the left-hand side of Equation (28) is 104\sim 10^{-4} i.e. slightly more significant. A typical element of ε~l1l2l3\tilde{\varepsilon}_{l_{1}l_{2}l_{3}} is therefore around 10910^{-9}.

In practice the quantity that should be small is the sum of ε\varepsilon across the scales most relevant to the quantity of interest. The order-of-magnitude analysis above suggests that for a small subset of the ClC_{l} the correction from non-Gaussianity is negligible, but if the number of relevant scales is large (as might be the case for massive data compression in order to infer cosmological parameters) then the conditions on ε\varepsilon are more strict. Assuming an approximately constant ε~l1l2l3\tilde{\varepsilon}_{l_{1}l_{2}l_{3}}, using all available scales up to lmaxl_{\mathrm{max}} would require ε~l1l2l3lmax3\tilde{\varepsilon}_{l_{1}l_{2}l_{3}}\ll l_{\mathrm{max}}^{-3} for example. For ε~l1l2l3109\tilde{\varepsilon}_{l_{1}l_{2}l_{3}}\sim 10^{-9} corrections become large around lmax1000l_{\mathrm{max}}\approx 1000. This is within the range of scale useable angular scales for Stage-IV lensing surveys, which motivates the more quantitative study with simulations in Section III.

II.3.1 Non-Gaussian effects on posterior distributions

The sampling distribution of C^l\hat{C}_{l} itself is usually not of interest. What matters more is the posterior distribution on the ClC_{l}, or on model parameters that generate the ClC_{l}. In this section we study analytically how non-Gaussianity affects the posterior, using our corrected likelihood Equation (22).

First we study the joint posterior distribution of the set of theory ClC_{l}. This is simply given by p({Cl}|{C^l})p({C^l}|{Cl})π({Cl})p(\{C_{l}\}|\{\hat{C}_{l}\})\propto p(\{\hat{C}_{l}\}|\{C_{l}\})\pi(\{C_{l}\}), where π({Cl})\pi(\{C_{l}\}) is the prior. Natural priors for the ClC_{l} are the uniform prior, which isolates the dependence on the likelihood, and the Jeffreys prior, which is uninformative. For a Gaussian likelihood the two coincide, but this is not the case for either the Gamma likelihood nor our Equation (22).

The Jeffreys prior is given by the square-root of the determinant of the Fisher matrix. We have computed this for our likelihood, and at leading order we have

π({Cl})(l=lminlmaxCl1)(114lνTllCl2)\pi(\{C_{l}\})\propto\left(\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}C_{l}^{-1}\right)\left(1-\frac{1}{4}\sum_{l}\nu\frac{T_{ll}}{C_{l}^{2}}\right) (29)

The zero-order posterior for each ClC_{l} is thus an Inverse-Gamma distribution, denoted by Γ1\Gamma^{-1} given by

pΓ1(Cl|C^l)=(νC^l/2)ν/2Γ(ν/2)Clν/21exp(νC^l2Cl).p_{\Gamma^{-1}}(C_{l}|\hat{C}_{l})=\frac{(\nu\hat{C}_{l}/2)^{\nu/2}}{\Gamma(\nu/2)}\,C_{l}^{-\nu/2-1}\exp\left(-\frac{\nu\hat{C}_{l}}{2C_{l}}\right). (30)

Weak signal non-Gaussianity perturbs away from this zero-order distribution.

In what follows we will only include trispectrum terms, which we have found to be the dominant source of non-Gaussianity in the power spectrum distribution. Substituting in the Jeffreys prior and normalizing, the correction to the Inverse-Gamma posterior is

p({Cl}|{C^l})=\displaystyle p(\{C_{l}\}|\{\hat{C}_{l}\})= [l=lminlmaxΓ1(Cl;ν/2,νCl/2)]\displaystyle\left[\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\Gamma^{-1}(C_{l};\nu/2,\nu C_{l}/2)\right]
×{1+14l1l2ε^l1l2[C^l1C^l2Cl1Cl2(ΔC^l1ΔC^l2Cl1Cl22δl1l2ν1+2C^l12Cl122α~δl1l2ν1)+δl1l2[2α~(ν1+2)4]4ν1ν2]}\displaystyle\times\left\{1+\frac{1}{4}\sum_{l_{1}l_{2}}\hat{\varepsilon}_{l_{1}l_{2}}\left[\frac{\hat{C}_{l_{1}}\hat{C}_{l_{2}}}{C_{l_{1}}C_{l_{2}}}\left(\frac{\Delta\hat{C}_{l_{1}}\Delta\hat{C}_{l_{2}}}{C_{l_{1}}C_{l_{2}}}-\frac{2\delta_{l_{1}l_{2}}}{\nu_{1}+2}\frac{\hat{C}_{l_{1}}^{2}}{C_{l_{1}}^{2}}-\frac{2\tilde{\alpha}\delta_{l_{1}l_{2}}}{\nu_{1}}\right)+\frac{\delta_{l_{1}l_{2}}[2\tilde{\alpha}(\nu_{1}+2)-4]-4}{\nu_{1}\nu_{2}}\right]\right\} (31)

where ε^\hat{\varepsilon} is defined as in Equation (24) but with C^l\hat{C}_{l} in place of ClC_{l}, and α~\tilde{\alpha} is a book-keeping parameter that keeps track of the influence of the correction to the Jeffreys prior; α~=1\tilde{\alpha}=1 represents the first-order-corrected Jeffreys prior, and α~=0\tilde{\alpha}=0 uses only the zero-order Jeffreys prior.

Corrections to the posterior moments from Tl1l2T_{l_{1}l_{2}} can now be computed easily using the properties of the Inverse-Gamma distribution. For example, the mean of a single CLC_{L} after marginalising over all other ClC_{l} receives a fractional correction of order 𝒪(ε^/ν)\mathcal{O}(\hat{\varepsilon}/\nu), i.e.  suppressed at large ν\nu. The leading-order fractional correction to the marginal variance of a single CLC_{L} is ε^LL\hat{\varepsilon}_{LL}, as expected, and is entirely captured by a Gaussian having the full non-Gaussian covariance matrix. The leading-order ‘post-Gaussian’ correction from signal non-Gaussianity is 𝒪(ε^/ν)\mathcal{O}(\hat{\varepsilon}/\nu). Note the variance under a Gaussian approximation also differs from that of the Inverse-Gamma by a fractional amount 𝒪(1/ν)\mathcal{O}(1/\nu) when ν\nu is finite. Our formalism is valid at all orders in 1/ν1/\nu, but does assume small ε\varepsilon. Corrections to the mean and variance not captured by a Gaussian with non-Gaussian covariance matrix are thus 𝒪(ε^/ν)\mathcal{O}(\hat{\varepsilon}/\nu). It turns out that these are also the corrections to the conditional mean and variance when all but one ClC_{l} are fixed to their measured values, with the mean correction further suppressed if α~=1\tilde{\alpha}=1.

To study the Gaussian limit in the ClC_{l} posterior, we write Cl=C^l(1+n2/ν)C_{l}=\hat{C}_{l}(1+n\sqrt{2/\nu}) and take the ν\nu\to\infty limit. The leading-order term, quadratic in nn, is precisely that specified by a Gaussian with non-Gaussian covariance. Leading-order post-Gaussian terms at fixed nn are 𝒪(ε/ν)\mathcal{O}(\varepsilon/\sqrt{\nu}), i.e. larger than the correction to the mean and variance. This correction is a cubic polynomial in nn, suggesting that it corresponds to a change in the skewness of the posterior ClC_{l}, not captured by a Gaussian. It turns out that this correction can be accounted for by a slight modification to the Gaussian where we replace Tl1l2T_{l_{1}l_{2}} in the covariance matrix with Tl1l2(C^l12/Cl12)(C^l22/Cl22)T_{l_{1}l_{2}}(\hat{C}_{l_{1}}^{2}/C_{l_{1}}^{2})(\hat{C}_{l_{2}}^{2}/C_{l_{2}}^{2}), but in practice this post-Gaussian term is small and the correction does not lead to noticeable improvement. Note that we have neglected the bispectrum-squared term in this derivation, which will also modify the ClC_{l} skewness.

We can also study the distribution of an amplitude parameter by setting Cl=AC0,lC_{l}=AC_{0,l} for some fiducial power spectrum C0,lC_{0,l} (we neglect noise here for simplicity). We find a fractional correction to the mean of order 𝒪(ε/ν)\mathcal{O}(\langle\varepsilon\rangle/\sum\nu), where we define the ll-averaged quantity

ε\displaystyle\langle\varepsilon\rangle l1l2ν1ν2εl1l2lν\displaystyle\equiv\frac{\sum_{l_{1}l_{2}}\sqrt{\nu_{1}\nu_{2}}\varepsilon_{l_{1}l_{2}}}{\sum_{l}\nu}
=l1l2ν1ν2Tl1l22Cl1Cl2lν,\displaystyle=\frac{\sum_{l_{1}l_{2}}\nu_{1}\nu_{2}\frac{T_{l_{1}l_{2}}}{2C_{l_{1}}C_{l_{2}}}}{\sum_{l}\nu}, (32)

with ε\varepsilon evaluated at the fiducial model. ε\langle\varepsilon\rangle has the simple interpretation that it is the leading-order fractional correction to the posterior parameter variance from the non-Gaussian covariance term. Similarly, the leading-order post-Gaussian fractional correction to the variance of AA is 𝒪(ε/ν)\mathcal{O}(\langle\varepsilon\rangle/\sum\nu). Corrections to the dimensionless skew of AA are 𝒪(ε/ν)\mathcal{O}(\langle\varepsilon\rangle/\sqrt{\sum\nu}), as are corrections to the pointwise posterior distribution of AA.

To summarise this section, the biggest effect of a signal trispectrum is to correct the posterior variance of parameters through the non-Gaussian covariance term. This is typically of order ε\langle\varepsilon\rangle, where the average runs over the scales of interest. We require this correction to be small in order for our perturbative series to converge, i.e. the validity of our distributions require ε1\langle\varepsilon\rangle\ll 1. A good rule of thumb is that we require the correction to the Fisher information from signal non-Gaussianity to be much less than unity. The broadening of parameter errors is mostly accounted for by a Gaussian likelihood with non-Gaussian covariance - the genuinely new terms we have computed here are ‘post-Gaussian’, and give a skewness and fractional pointwise correction to the probability density which are both of order ε/ν\langle\varepsilon\rangle/\sqrt{\sum\nu} at large ν\nu, which by Equation (44) is also the leading correction to the dimensionless skewness of the measured power spectrum. At large ν\nu this is strongly suppressed due to the Central Limit Theorem if ε\langle\varepsilon\rangle does not grow faster than ν\sqrt{\sum\nu}. In Section III we quantify this with simulations. Corrections due to the squared bispectrum also affect this skewness. Corrections to the posterior mean and variance are smaller, of order ε/ν\langle\varepsilon\rangle/\sum\nu at large ν\nu, which is of order the correction to the dimensionless kurtosis of the measured power spectrum.

II.4 Bandpowers

It is usually advantageous to bin the power spectrum in angular scale in order to reduce dimensionality of the required covariance matrix, account for mode mixing due to the survey mask, or to speed up likelihood calculations. Binning has a strong effect on the apparent impact of non-Gaussianity in the covariance matrix, since the Gaussian term will be suppressed due to the increasing mode count while the non-Gaussian term will stay roughly constant.

Our binning scheme is defined by

C^b=lbWlC^llbWl,\hat{C}_{b}=\frac{\sum_{l\in b}W_{l}\hat{C}_{l}}{\sum_{l\in b}W_{l}}, (33)

for a window function WlW_{l}, where bb labels the bin. For narrow bins, an optimal binning scheme would weight with the inverse covariance of C^l\hat{C}_{l}. To avoid dependence on the model we weight by the mode count and take Wl=2l+1W_{l}=2l+1. We explore different binning choices in Appendix C.

Assuming ClC_{l} is roughly constant over the extent of the bin, the zero-order characteristic function of the C^b\hat{C}_{b} is

ϕ{νbC^b}({Jb})\displaystyle\phi_{\{\nu_{b}\hat{C}_{b}\}}(\{J_{b}\}) =blb(1+2iJbCl)ν/2\displaystyle=\prod_{b}\prod_{l\in b}(1+2iJ_{b}C_{l})^{-\nu/2}
b(1+2iJbCb)νb/2,\displaystyle\approx\prod_{b}(1+2iJ_{b}C_{b})^{-\nu_{b}/2}, (34)

where νblbν\nu_{b}\equiv\sum_{l\in b}\nu. In other words, the zero-order distribution of the binned power spectra is approximately a Gamma distribution. In practice we can eliminate the need to approximate a constant ClC_{l} over the bin by working with C^l/Cl\hat{C}_{l}/C_{l}, but for sufficiently narrow bins the above approximation is accurate.

In the presence of signal non-Gaussianity, we can make a similar approximation that ClC_{l} is flat across each bin and simply replace l1,2,3l_{1,2,3} with b1,2,3b_{1,2,3} and ν1,2,3\nu_{1,2,3} with νb1,b2,b3\nu_{b_{1},b_{2},b_{3}} everywhere in the likelihood. Quantities such as Tl1l2T_{l_{1}l_{2}} and B~l1l2l3\tilde{B}_{l_{1}l_{2}l_{3}} can be replaced with their binned versions.

Since the number of modes in a bin scales with its width Δlb\Delta l_{b}, the fractional correction to the covariance scales as εbbΔlb\varepsilon_{bb}\propto\Delta l_{b}. The number of terms in the sum over εb1b2\varepsilon_{b_{1}b_{2}} in the denominator of ε\langle\varepsilon\rangle scales between 1/Δlb21/\Delta l_{b}^{2} for non-sparse εb1b2\varepsilon_{b_{1}b_{2}} and 1/Δlb1/\Delta l_{b} for diagonal εb1b2\varepsilon_{b_{1}b_{2}}. Therefore, corrections from the signal trispectrum scale somewhere between εΔlb\langle\varepsilon\rangle\propto\Delta l_{b} for diagonal ε\varepsilon and εconstant\langle\varepsilon\rangle\approx\mathrm{constant} for non-sparse ε\varepsilon. In practice the ε\varepsilon is not diagonal, and so we can expect our likelihood corrections to be only mildly sensitive to the choice of binning, with the generic feature that the corrections are larger for broader bins. We confirm this in Appendix C.

III Quantifying corrections from non-Gaussianity

In the previous section we derived the leading order correction to the probability density of power spectrum measurements in the presence of signal non-Gaussianity. Considered as a function of the unknown true power spectrum ClC_{l}, the signal trispectrum TllT_{ll^{\prime}}, and the squared bispectrum B~lll′′\tilde{B}_{ll^{\prime}l^{\prime\prime}}, this distribution gives the likelihood function. It is the impact of non-Gaussianity on the likelihood (or posterior) that is of most interest to us here.

We have seen that the Fisher information contributed by the parameter dependence of TllT_{ll^{\prime}} and B~lll′′\tilde{B}_{ll^{\prime}l^{\prime\prime}} is formally of higher order in the (assumed weak) non-Gaussianity, so in this section we will fix these to a fiducial model. Our approach is to use approximate simulations to compute these quantities in a fiducial cosmology, but we could just as well compute them using the halo model or fitting functions. Once we have computed these non-Gaussian signal cumulants, we can use Equation (22) to compute the correction to the likelihood of the ClC_{l}, or any parameters on which ClC_{l} depends.

III.1 Simulations

To compute TllT_{ll^{\prime}} and B~lll′′\tilde{B}_{ll^{\prime}l^{\prime\prime}} we create a large suite of lognormal lensing convergence maps using the flask software package Xavier et al. (2016). Lognormal fields are known to give a reasonable approximation for the covariance of lensing two-point functions Taruya et al. (2002); Hilbert et al. (2011), although their ability to predict configurations of the bispectrum is less well explored Martin et al. (2012). We will cross-validate our results with a small sample of lensing maps from ray-traced from NN-body simulations.

We generated 10510^{5} lognormal convergence maps with a fixed power spectrum generated with camb Lewis et al. (2000); Howlett et al. (2012). Our fiducial galaxy sample is modelled with a Euclid-like lensing sample in mind. We assume a single broad redshift distribution for lensing sources, peaked around z0.9z\approx 0.9, having the same functional form as Ref. Euclid Collaboration et al. (2020), with cosmological parameters set to the best-fit values of Ref. Planck Collaboration et al. (2020). The alternative choice of a narrow source redshift bin peaked at zs=0.325z_{s}=0.325, corresponding to the lowest redshift bin of a Euclid-like survey with 10 tomographic bins, is explored in Appendix D. We compute the power spectrum to lmax=6143l_{\mathrm{max}}=6143, applying non-linear and baryon feedback corrections according to the default specifications of Ref. Mead et al. (2021). For simplicity we do not include intrinsic alignments in the model; this will not affect our results. The ‘shift parameter’ corresponding to the (negative) minimum value of the convergence fed to flask is set to 0.012140.01214, which was chosen by hand to get a good match to NN-body simulations, and is comparable to the value found in Ref. Friedrich et al. (2021) by fitting the 1-point kurtosis of the convergence. Convergence fields are rendered on a healpix666https://healpix.sourceforge.io Górski et al. (2005) grid with Nside=2048N_{\mathrm{side}}=2048. In our actual likelihood computations we only use scales down to lmax=1000l_{\mathrm{max}}=1000, which is sufficiently small to ensure the input power spectrum is recovered with negligible bias777The exponentiation of the Gaussian field needed to produce a lognormal field transfers power to sub-grid scales where it cannot be recovered from the discrete healpix grid. This causes a negative bias in the power spectrum on all scales that increases in severity as the Nyquist frequency lNy3Nside1l_{\mathrm{Ny}}\approx 3N_{\mathrm{side}}-1 is approached., but sufficiently large to cover a realistic range of scales for lensing surveys. We apply the healpix pixel window function to the map to mimic the effect of coarse binning of a higher resolution galaxy catalogue.

To check that the lognormal fields are producing sensible results for the signal cumulants, we compare the convergence statistics with those from a suite of ray-traced NN-body simulations from Ref. Takahashi et al. (2017). These simulations assume a thin source redshift plane at zs=1.0334z_{s}=1.0334, but the resulting lensing power spectrum only differs from a fiducial ClC_{l} by a few percent across most scales.

Refer to caption
Figure 1: Ratio of the diagonal elements of the ClC_{l} covariance matrix to that of Gaussian fields for our lognormal mocks (orange) and the ray-traced NN-body simulations of Ref. Takahashi et al. (2017) (blue). We have binned the power spectra with Δlog10l=0.1\Delta\log_{10}l=0.1. Ref. Takahashi et al. (2017) speculate that the excess variance at large angular scales in the NN-body mocks is due to the finite thickness of their lensing shells.

In Figures 1 and 2 we compare the covariance matrix of the measured Cl^\hat{C_{l}} from the lognormal mocks with those of the NN-body mocks. Since the latter contains much fewer realizations (100\sim 100) the recovered statistics are much noisier, but a meaningful comparison is still possible. Figure 1 shows the ratio of the diagonal elements of the covariance matrix compared with a Gaussian prediction based on the input power spectrum (we have binned the C^l\hat{C}_{l} to emphasise the differences). The agreement between the two is reasonable, with the NN-body mocks having excess variance on large scales, as discussed in Ref. Takahashi et al. (2017). The lognormal mocks overestimate the variance on small scales, but since we are only seeking approximations to the trispectrum here the difference is sufficiently small. The non-Gaussian contribution to the covariance dramatically kicks up when l>200l>200, due to the signal trispectrum. Note that our lognormal maps are on the full sky and are produced by a local non-linear transform of the linear field, so do not capture the super-sample covariance, but the agreement with the NN-body mocks is nonetheless acceptable.

Refer to caption
Figure 2: Slices through the correlation matrix for a few multipole bins containing the ll values indicated above each panel. The diagonal elements (unity) has been omitted for clarity, and are indicated by the red dashed vertical lines.

In Figure 2 we plot slices through the correlation matrix. The agreement of the lognormal mocks with the NN-body mocks is not perfect, but acceptable. Above l200l\approx 200 mode coupling in the NN-body mocks kicks up dramatically, much in the same way as in the diagonal elements. The lognormal mocks do not capture this well, instead predicting a smooth increase, but the agreement is within an order of magnitude and acceptable.

To measure the bispectrum from the simulations we use the estimator

b^l1l2l3=1Nl1l2l3m1=l1l1m2=l2l2m3=l3l3(l1l2l3m1m2m3)al1m1al2m2al3m3,\hat{b}_{l_{1}l_{2}l_{3}}=\frac{1}{N_{l_{1}l_{2}l_{3}}}\sum_{m_{1}=-l_{1}}^{l_{1}}\sum_{m_{2}=-l_{2}}^{l_{2}}\sum_{m_{3}=-l_{3}}^{l_{3}}\begin{pmatrix}l_{1}&l_{2}&l_{3}\\ m_{1}&m_{2}&m_{3}\end{pmatrix}a_{l_{1}m_{1}}a_{l_{2}m_{2}}a_{l_{3}m_{3}}, (35)

where

Nl1l2l3=ν1ν2ν34π(l1l2l3000),N_{l_{1}l_{2}l_{3}}=\sqrt{\frac{\nu_{1}\nu_{2}\nu_{3}}{4\pi}}\begin{pmatrix}l_{1}&l_{2}&l_{3}\\ 0&0&0\end{pmatrix}, (36)

which is optimal for Gaussian fields. We use the Python package spherical888https://pypi.org/project/spherical/ to precompute the Wigner 3j3j symbols to allow for rapid summation over the azimuthal wave numbers in Equation (35), but the implementation is slow for l>100l>100 so we limit our bispectrum measurements to this maximum multipole.

Refer to caption
Figure 3: Bispectra of our lognormal mocks (orange) and the NN-body mocks of Ref. Takahashi et al. (2017) (blue). We show an equilateral configuration (top left), a squeezed configuration with long wavelength mode lL=2l_{\mathrm{L}}=2 (top right), a folded configuration with l=2l=2l′′l=2l^{\prime}=2l^{\prime\prime} (bottom left), and an isosceles configuration with common side length lcommon=50l_{\mathrm{common}}=50 (bottom right). The bispectra have not been binned in multipole.

In Figure 3 we show the reduced bispectrum measured from our lognormal convergence maps and from the NN-body mocks of Ref. Takahashi et al. (2017). The agreement is reasonable, although the noise in the NN-body measurement is large. Note that the strength of lognormality was chosen to give a good match to the non-Gaussian part of the power spectrum covariance, so the agreement in the bispectrum is not guaranteed by construction. We also find qualitative agreement with the measurements of Ref. Munshi et al. (2020).

To mitigate the impact of noise in our likelihood we fit the non-Gaussian signal polyspectra with smooth multivariate fifth-order polynomials. To make these low-order polynomials good fits we first bin the polyspectra, as described in Section II.4, in equally spaced bins in logl\log l. We separately fit a univariate polynomial to the ratio of the diagonal elements of the covariance to the Gaussian prediction and a bivariate polynomial to the correlation matrix (excluding the diagonal elements). We fit a trivariate polynomial to the reduced bispectrum before squaring and forming the quantity B~l1l2l3\tilde{B}_{l_{1}l_{2}l_{3}}. We take care to include only the non-zero elements of the bispectrum in the fit.

With TllT_{ll^{\prime}} and B~lll′′\tilde{B}_{ll^{\prime}l^{\prime\prime}} in hand we can now compute the size of the correction to weak lensing likelihoods and posteriors from signal non-Gaussianity, using Equation (22).

III.2 Results

Refer to caption
Figure 4: Fractional correction to the Gamma likelihood from a non-Gaussian signal. We assume a fixed measurement of the power spectrum C^l\hat{C}_{l} and look at the correction to the likelihood as function of the model ClC_{l}. Additionally all values of the model ClC_{l} have been scaled relative to C^l\hat{C}_{l} by an amplitude parameter, Cl=AC^lC_{l}=A\hat{C}_{l}. The corrections are shown as a function of AA scaled by its Gamma 1σ1\sigma uncertainty σ\sigma, i.e. its signal-to-noise. The non-Gaussian covariance (trispectrum) term (black solid) and squared bispectrum terms (dot-dashed green) are shown separately, along with the fractional difference of a Gaussian with covariance containing only a Gaussian part (red) or additionally the non-Gaussian part (blue).

The zero-order likelihood function for fixed C^l\hat{C}_{l} is proportional to an Inverse-Gamma distribution. In Figure 4 we show the fractional correction away from this due to signal non-Gaussianity, using C^l\hat{C}_{l} that have been binned in multipole with Δlogl=0.1\Delta\log l=0.1999The sensitivity to binning choices is mild, and discussed in Section II.4 and Appendix C.. In this figure we assume a fixed measured C^l\hat{C}_{l} and show the correction to the likelihood when the model ClC_{l} are uniformly scaled relative to C^l\hat{C}_{l}, i.e. we take Cl=AC^lC_{l}=A\hat{C}_{l} for an amplitude parameter AA. The width of the likelihood function is typically σ1/ν/2\sigma\equiv 1/\sqrt{\sum\nu/2}, the zero-order Gamma (or Wishart when p1p\neq 1) uncertainty, so we normalise the horizontal scale in Figure 4 by this quantity. The best-fit value of AA is close to unity, so the actual values plotted on the horizontal axis reflect the lmax=100l_{\mathrm{max}}=100 that we have assumed for this plot. We plot the contributions from the trispectrum and the squared bispectrum separately. We have verified that the ‘binned’ version of our model for the sampling distribution provides a close match to the histogram of measured power spectra from the lognormal mocks across the range of scales contributing to the all the figures in the section, although we note that the mean and covariance are both matched by construction.

Figure 4 shows that the contribution from the squared bispectrum B~lll′′\tilde{B}_{ll^{\prime}l^{\prime\prime}} is sub-dominant to that of the trispectrum in the vicinity of a few σ\sigma around the best-fit model, which is the range of interest. As expected due to the high mode count contributing to information on AA the zero-order distribution is close to Gaussian, as demonstrated by the red curve - note that uniform contributions to the fractional difference will get normalised away when we compute posteriors, as shown shortly. In blue we show the fractional correction of a Gaussian having the correct covariance matrix including the trispectrum contribution. In agreement with the discussion in Section II.3, this provides an accurate approximation to our distribution, with the main effect being a quadratic correction corresponding to the broadening of parameter error bars.

Refer to caption
Figure 5: Posteriors of individual power spectrum bandpowers, assuming all others are fixed to their measured values, labelled by their weighted central values above each panel. A Jeffreys prior has been assumed for all distributions, and we use scales up to lmax=100l_{\mathrm{max}}=100. Our model (black) is indistinguishable from the Gamma likelihood (yellow) in the upper panels. We also show a Gaussian with Gaussian covariance (red) and non-Gaussian covariance (blue). Differences between these are indistinguishable in the upper panels, and the red curve is very close to the yellow curve in the lower panels. Corrections are shown as a function of the model band power scaled by its Gamma 1σ1\sigma uncertainty σ(Cb)\sigma(C_{b}), i.e. its signal-to-noise. Our model transitions from Gamma at large angular scales to a Gaussian with non-Gaussian covariance at small angular scales. Below each panel we show the fractional differences with respect to the Gamma likelihood scenario.

In practice we are not so much interested in the likelihood as the posterior. The easiest posterior to compute from our likelihood is the conditional posterior of a single ClC_{l} assuming all others are fixed to their measured values. This is shown in Figure 5 for lmax=100l_{\mathrm{max}}=100 and again using binned power spectra with Δlogl=0.1\Delta\log l=0.1, which is the same binning choice used to fit the polyspectra from the simulations101010Note that the binning implementation can differ between that used in the likelihood and that used to fit the polyspectra, but we found using different choices gave negligible effect on the results.. The bin barycenters are shown above each panel in Figure 5. On large angular scales (top left panel), non-Gaussianity in the signal is weak and our distribution matches the Gamma distribution very closely (black and yellow curves are indistinguishable). A Gaussian approximation, in contrast, is poor on these scales. As the angular scale of the multipole bin is decreased, the Gaussian and Gamma approximations match ever more closely, with very little difference between any of the distributions at l30l\approx 30. As the angular scale decreases further, the effects of signal non-Gaussianity start to become important and broaden the distribution through the trispectrum contribution to the covariance matrix. Thus, our distribution starts to deviate from the Gamma and Gaussian-with-Gaussian-covariance distributions (themselves now indistinguishable) visibly at l100l\approx 100. Most of this deviation is captured by a Gaussian with a non-Gaussian covariance (blue curve), with the residual ‘post-Gaussian’ effect being a small amount of skewness in the distribution. In contrast to the Gamma and Gaussian approximations, our distribution is the only model that correctly matches the large-scale and weakly non-linear-scale limits of the true ClC_{l} distribution. Similar statements can be made of the marginal distribution of the ClC_{l}, whose leading-order properties were discussed in Section II.3.1.

It is interesting to note from Figure 5 that the regime where Gamma and Gaussian differ (l30l\lesssim 30 in this case) is distinct from the regime where signal non-Gaussianity starts to become important (l100l\gtrsim 100). That these regimes are well separated has traditionally been the motivation for considering large-scale likelihood non-Gaussianity from the finite number of modes as a separate effect to the small-scale modification of the covariance from signal non-Gaussianity. The distinction is of course redshift dependent; as the redshift of a source bin is lowered, the angular scale at which signal non-Gaussianity becomes important will increase. We therefore expect that our new likelihood expression will be of unique value when these two sources of likelihood non-Gaussianity must be treated simultaneously, i.e. for low-redshift sources. The precise impact will additionally depend on the parameters of interest. We explore a realistic low-zsz_{s} scenario in Appendix D, finding that skewness corrections to the posterior can become sizeable (although still <10%<10\%) for lmax=1000l_{\mathrm{max}}=1000, but are sub-percent once shape noise is included.

Refer to caption
Figure 6: Posterior of an amplitude parameter AA linearly scaling the model power spectrum, as a function of AA scaled by its Gamma 1σ1\sigma uncertainty. A Jeffreys prior has been assumed, and we have included scales up to lmax=100l_{\mathrm{max}}=100. From top to bottom in the centre of the figure we plot the Gamma likelihood (yellow), a Gaussian likelihood with Gaussian covariance matrix (yellow, overlapping with red), a Gaussian likelihood with non-Gaussian covariance matrix (blue), and our model (black).

It is also straightforward to compute the posterior of an amplitude parameter scaling the signal power spectrum (we again neglect noise for simplicity) as Cl=AC^lC_{l}=A\hat{C}_{l}, shown in Figure 6. Most of the constraining power on the amplitude comes from small scales, and hence the Gamma and Gaussian-with-Gaussian-covariance (yellow and red curves respectively) overlap closely. Our distribution (black curve) correctly captures the significant increase in parameter variance from the trispectrum. As we have seen in Figure 5, a Gaussian with appropriately modified covariance (blue curve) captures most of the effect of signal non-Gaussianity, with our distribution imparting a small residual skewness into the posterior.

Refer to caption
Figure 7: Leading-order fractional corrections to the posterior from a non-Gaussian signal as a function of angular wave number, valid in the l1l\gg 1 limit. Left panel: corrections to the Gamma posterior (conditional or marginal) of the model ClC_{l}. Shown are the spectral radius (i.e. largest eigenvalue) of the εll\varepsilon_{ll^{\prime}} matrix defined in Equations (24) and (32) (solid blue) and its diagonal elements (dashed blue). These quantities (blue top set of curves) are required to be small for higher-order non-Gaussian terms to be negligible. Leading-order corrections to the Gamma posterior not captured by a Gaussian functional form (‘post-Gaussian’ effects) are represented by the orange curves (middle set), and sub-dominant corrections to the mean and variance by the green curves (lower set). These are suppressed by factors of ν1/2\nu^{-1/2} and ν1\nu^{-1} respectively. Right panel: Equivalent corrections to the posterior of an amplitude parameter, now showing the cumulative statistic ε\langle\varepsilon\rangle (see text) up to lmaxl_{\mathrm{max}} (blue, top curve) and leading post-Gaussian corrections (orange and green). The interpretation of these curves is the same as in the left panel, i.e. they represent the total correction to the Gamma posterior from a non-Gaussian signal (blue), and the corrections not accounted for by a Gaussian at leading order in ν1/2\nu^{-1/2} (orange), and next-to-leading order (green).

Since our formalism is based on a perturbative series expansion of the signal distribution, it is important to check that we are working in a regime where higher-order terms can be safely neglected. In Section II.3.1 we showed that the leading-order correction to the posterior from signal non-Gaussianity is ε\langle\varepsilon\rangle, defined in Equation (32). This quantity is the trispectrum contribution to the C^l\hat{C}_{l} covariance matrix divided by the Gaussian contribution, εl1l2\varepsilon_{l_{1}l_{2}}, averaged over the scales contributing to the parameter of interest with a ν\nu-dependent weight. It has a simple interpretation that it is equal to the perturbation to the Fisher information from signal non-Gaussianity. We require ε1\langle\varepsilon\rangle\ll 1 for our perturbative series to be valid, with the leading-order post-Gaussian (i.e. beyond that predicted by a Gaussian functional form) skewness of order ε/ν\langle\varepsilon\rangle/\sqrt{\sum\nu}. We have found bispectrum-squared terms to be subdominant, but they are potentially non-negligible when l1000l\gtrsim 1000 by the arguments of Section II.3.

In Figure 7 we show a few convergence statistics that can be derived from εl1l2\varepsilon_{l_{1}l_{2}}. The left panel shows quantities relevant to the posterior of a single ClC_{l}, in which case ε\langle\varepsilon\rangle is equivalent to εll\varepsilon_{ll}, the diagonal elements of the matrix εl1l2\varepsilon_{l_{1}l_{2}}. We also show the largest eigenvalue (i.e. the spectral radius) of this matrix. A spectral radius much less than unity is sufficient to ensure a valid perturbative expansion of the inverse covariance matrix around its Gaussian term, and must hold if our expression is to give a valid approximation to the pairwise distributions of ClC_{l} with different ll.

As the blue curves in Figure 7 show, the fractional correction from signal non-Gaussianity is much less than unity out to l100l\approx 100, which justifies our use of this maximum scale in the posteriors computed in this section and explains why the correction appears so small in Figure 5. The leading-order corrections not captured by a Gaussian functional form are given by the orange curves and are never more than 10210^{-2} even out to l1000l\approx 1000, i.e. sub-percent across all scales of interest for upcoming lensing surveys, an important result. Beyond l100l\approx 100 our perturbative series breaks down, but the quantity ε/ν\langle\varepsilon\rangle/\sqrt{\sum\nu} is still indicative of the relevance of residual non-Gaussianity in the likelihood, so it is reasonable to draw useful conclusions from Figure 7 even beyond the regime where our likelihood expression is formally valid. We note that ε/ν\langle\varepsilon\rangle/\sqrt{\sum\nu} is the leading-order correction to the dimensionless skewness of the measured power spectrum from signal non-Gaussianity, valid at all (sufficiently large) ll, so it is reasonable to expect that this quantity also quantifies non-Gaussian corrections to the likelihood function. Corrections to the posterior mean and variance not captured by a Gaussian are even more subdominant, suppressed by another factor of ν\sqrt{\nu}.

The right-hand panel of Figure 7 shows ε\langle\varepsilon\rangle for an amplitude parameter scaling the model power spectrum. The corrections are comparable in size with those of the individual ClC_{l}, with lmax=100l_{\mathrm{max}}=100 just about in the regime where our expression is valid. Non-Gaussian effects are still sub-percent even out to lmax=1000l_{\mathrm{max}}=1000. In Appendix C we study the (mild) impact of choosing a different binning scheme for the power spectrum bandpowers, shown in Figure 9.

Refer to caption
Figure 8: Same as Figure 7 but with shape noise corresponding to n¯=3sq.arcmin1\bar{n}=3\,\mathrm{sq.\,arcmin}^{-1}.

Taken together, Figures 5 and 7 give the following picture for how the likelihood changes with angular multipole. On the largest scales a Gamma distribution is accurate as the signal is close to Gaussian. By l30l\approx 30 the Gamma distribution has Gaussianized, with corrections being percent-level or small in the central 2σ2\sigma region (albeit larger in the tails). As l100l\approx 100 is approached the non-Gaussianity in the signal starts to become important, and a Gaussian with appropriately broadened covariance is a good approximation. Corrections to the Gamma distribution not captured by the Gaussian approximation are sub-percent across all angular scales, but can be accounted for by using our expression Equation (22) on large scales and smoothly matching it to a Gaussian around l100l\lesssim 100.

One question we wish to answer in this work is whether the strength of signal non-Gaussianity grows faster than the CLT is able to Gaussianize the power spectrum distribution. From Figure 7 it appears that the struggle between these two competing effects is delicately balanced, with a very slow growth in the post-Gaussian correction from signal non-Gaussianity (orange curve) visible as lmaxl_{\mathrm{max}} increases. One might wonder whether this correction becomes non-negligible at very high lmaxl_{\mathrm{max}}. At some point as lmaxl_{\mathrm{max}} is increased the effect of the noise will become important, which suppresses ε\varepsilon by boosting ClC_{l} while not adding any non-Gaussianity (we assume the noise is Gaussian here). In Figure 8 we show the size of post-Gaussian effects (i.e. corrections to the Gamma distribution not captured by a Gaussian likelihood) on the posterior with shape noise given by σγ2=σe2/n¯\sigma_{\gamma}^{2}=\sigma_{e}^{2}/\bar{n}, i.e. the model power spectrum is Cl=AS0,l+NlC_{l}=AS_{0,l}+N_{l}. We assume a per-component ellipticity r.m.s. of σe=0.21\sigma_{e}=0.21, and set the source number density to n¯=3arcmin2\bar{n}=3\,\mathrm{arcmin}^{-2}, as might be expected in a single tomographic bin in a Euclid-like survey with 10 redshift bins Euclid Collaboration et al. (2020). The Gaussianizing effects of noise start to become important at l100l\gtrsim 100, and ensure that post-Gaussian effects in the likelihood are never more than sub-percent. For n¯=30arcmin2\bar{n}=30\,\mathrm{arcmin}^{-2} this turn-over happens around lmax1000l_{\mathrm{max}}\approx 1000, suggesting that post-Gaussian effects are safely negligible on small scales for Stage-IV surveys.

As discussed above, it will likely be the case that post-Gaussian effects are most relevant when the largest angular scales in the survey are mildly non-linear, i.e. for low-redshift tomographic bins. These carry little statistical weight in a Euclid-like survey where most of the sources are around z1z\approx 1, but could be relevant in other surveys. In Appendix D we explore the corrections for the lowest source redshift bin of a 10-bin lensing survey having intrinsic source redshifts comparable to that of a Euclid-like survey, showing that corrections are sub-percent once the effects of shape noise are included. This is partly because the amplitude of the shear power spectrum is suppressed for nearby sources due to the lensing geometry, so it may be that effects are larger for projected galaxy clustering. The quantity ε/ν\langle\varepsilon\rangle/\sqrt{\sum\nu}, or alternatively the dimensionless skewness of the measured power spectrum, provides an easy-to-compute diagnostic of whether one needs to worry about non-Gaussianity in the likelihood when low-redshift bins are important. If the effects are large, one will also need to account for effects of the mask in a full pseudo-ClC_{l} analysis to build an accurate likelihood model, as in Ref. Hamimeche and Lewis (2008); Benabed et al. (2009); Upham et al. (2020); de Belsunce et al. (2021), or resort to map-level or likelihood-free methods.

IV Conclusions

We have computed, for the first time, the leading-order correction to the distribution of angular power spectrum estimates of a projected field from non-Gaussianity in the underlying signal. Our expressions are given in Equation (22) in the case of a single map, and by the sum of Equation (81) and Equation (102) in the case of multiple correlated maps. These expressions correct the Wishart distribution for the matrix of power spectra, and consist of a term proportional to the signal trispectrum and a term proportional to the square of its bispectrum. The expression can easily be generalised to include non-Gaussian noise.

Our expressions are correctly normalised, and have the correct mean, covariance, and three-point function. We have shown that, in the limit of a large number of degrees of freedom and sufficiently well behaved non-Gaussianity, the main effect of signal non-Gaussianity is to correct the covariance matrix of the power spectra by the standard trispectrum term. The residual ‘post-Gaussian’ effect of non-Gaussianity is to impart some skewness to the likelihood, on top of that present due to the residual effects of finite degrees of freedom. On small scales, the size of the correction to the posterior of a parameter is given approximately by the fractional correction to the Fisher information from the trispectrum, scaled by the inverse square-root of the total degrees of freedom that contribute. For a given ClC_{l} and l1l\gg 1 the relevant quantity is Δll/2Tll/Cl2\Delta_{l}\equiv\sqrt{l/2}T_{ll}/C_{l}^{2}, where TllT_{ll} is the trispectrum contribution to the variance. When Δl1\Delta_{l}\ll 1, corrections to the likelihood of the ClC_{l} that are not captured by a Gaussian with appropriately modified covariance are suppressed.

We have provided diagnostic statistics, given in Equation (24) and Equation (32) that quantify corrections to likelihoods and posteriors. These can be computed straightforwardly from a model for the non-Gaussian part of the covariance matrix. Such a model is already a requirement when using a likelihood for two-point statistics, so these statistics should be easy to compute for a given observational setup.

We quantified corrections to the likelihood function using mock non-Gaussian weak lensing maps created assuming lognormal statistics, with input power spectra and noise levels representative of an upcoming Stage-IV survey. The bispectrum-squared term is subdominant to the correction from the trispectrum at least to l100l\approx 100, but we do not rule out stronger effects on small angular scales. Corrections to the likelihood not captured by a Gaussian are strongly suppressed on small scales in all our tests, which provides evidence that a Gaussian likelihood is sufficient on small scales even in the presence of realistic signal non-Gaussianity.

On large angular scales our likelihood correctly tends to a Gamma/Wishart distribution. We have found that the angular scales where the Gamma transitions to a Gaussian are sufficiently well separated from those where the trispectrum becomes important that it is sufficient to approximate the functional form of the likelihood as Gaussian when incorporating signal non-Gaussianity. This is the approach that has been taken traditionally, albeit without rigorous justification. This property remains true for the lowest redshift bin of a 10-bin Euclid-like lensing survey, but may break down for galaxy clustering power spectra at low redshifts. The likelihood function we have derived is unique amongst commonly used likelihoods in that it has the correct behaviour on both large and small angular scales, being non-perturbative in the degrees of freedom.

Our model relies on the signal non-Gaussianity being weak in a certain, well-defined, sense. We have taken care to identify the regime where this approximation is valid, but have argued that requiring Δl1\Delta_{l}\ll 1 is a good diagnostic for the post-Gaussian effects even beyond the limiting scale where our expansion is valid. Use of our likelihood in a practical situation will require ensuring that (2l+1)Δl1(2l+1)\Delta_{l}\ll 1, or more generally that the corrections from signal non-Gaussianity are small in the likelihood or posterior of interest. This can also be checked by ensuring that non-Gaussianity in the sampling distribution of the relevant combinations of the measured power spectrum is small, which can be checked with simulations (which are in any case necessary to validate or produce covariance matrix estimates).

We have elucidated how the Central Limit Theorem competes with signal non-Gaussianity in suppressing non-Gaussian corrections to the likelihood. The balance is surprisingly delicate, but our conclusion is that our Universe is not non-Gaussian enough on the relevant scales that post-Gaussian terms in the likelihood are necessary in the high-ll regime – the Central Limit Theorem wins. This conclusion is strengthened once Gaussian noise (e.g. from galaxy intrinsic shapes or shot noise) is present in the data. This essentially explains why previous works such as Ref. Taylor et al. (2019); Lin et al. (2020) have found a Gaussian likelihood to be sufficient for Λ\LambdaCDM parameter inference in the presence of realistic signal non-Gaussianity. Our findings are also consistent with the results of Ref. Diaz Rivero and Dvorkin (2020), who found that by far the dominant source of non-Gaussianity in the power spectrum likelihood comes from large angular scales where signal non-Gaussianity is negligible. Our formalism has allowed us to cleanly separate the effects of non-Gaussianity in the signal with that from the quadratic nature of the power spectrum estimator, improving upon previous numerical works that have been unable to distinguish between these distinct effects.

An important limitation of our likelihood is that it is only valid for full-sky power spectrum estimates. For the standard pseudo-ClC_{l} estimator even the zero-order Gaussian-field likelihood is difficult to compute Upham et al. (2020), and accounting for the mode coupling entailed by a survey mask in the non-Gaussian case seems intractable at first sight. The main effect of the mask is to reduce the number of degrees of freedom, increasing parameter variances. As the size of the survey is reduced, we can expect that at some point the largest observable scale is non-linear, in which case a suitably modified version of our likelihood might be a good place from which to build an approximation. The effects of non-Gaussianity from the signal in the case of small surveys may be behind the fairly large posterior corrections found in Ref. Hartlap et al. (2009). In the case that the full-sky power spectra are estimated from the data directly (as in quadratic maximum likelihood approaches) our distribution is likely to be more accurate, and bears some resemblance to the approximations developed in Ref. Seljak et al. (2017). For small survey cuts the mode coupling induced by the mask may only be relevant in the regime where the signal is linear, in which case an fskyf_{\mathrm{sky}} correction to the number of degrees of freedom may be sufficient to incorporate the survey mask into our likelihood. However, it may be the case that a map-level likelihood such as that used in cosmic microwave background analyses Wandelt et al. (2004), a likelihood-free approach Alsing et al. (2019), or suitable data transformations Wang et al. (2019) may be the way forward for modelling these effects in the distribution of projected fields.

While this work has studied the power spectrum likelihood, most weak lensing surveys up to date have used the shear correlation functions to derive their cosmological constraints. An analytic expression for the correlation function likelihood is difficult to write down even in the case of linear shear fields, but may be inferred numerically by sampling power spectra and subsequently transforming to real space. We defer an analytic study of this to a future work, but note that the results of the numerical studies Ref. Lin et al. (2020) and Ref. Taylor et al. (2019) suggest that non-Gaussian corrections are likely negligible for Λ\LambdaCDM models.

This work represents the first rigorous analytic study of the use of a Gaussian approximation for likelihood function of the power spectrum of non-Gaussian fields. While residual non-Gaussian effects appear small on small angular scales, we recommend that this be checked on a survey-by-survey basis using the diagnostic statistics defined in Equations (24) and (32). With the huge increase in statistical constraining power offered by near-future wide imaging surveys, it will be increasingly important to ensure that no biases arise from an incorrect statistical description of the data. Since we have found no evidence that signal non-Gaussianity imparts significant non-Gaussianity into the likelihood beyond that coming from the finite degrees of freedom, our recommendation for upcoming Stage-IV lensing surveys is to model the power spectrum likelihood as Gaussian (with the correct non-linear covariance matrix) on small scales l30l\gtrsim 30. On large scales in the case that mode coupling from the mask is irrelevant the Gamma (or Wishart in the case of multiple maps) distribution should be used, and when mode coupling is relevant one should resort to map-level, likelihood-free, or existing likelihood approximations. The likelihood we have derived in this work may be used as a bridge between these two regimes.

Acknowledgements.
We thank Karim Benabed, Michael Brown, Alan Heavens, and Robin Upham for useful conversations. We thank Eric Tittley for his maintenance of and help with cuillin, the IfA’s computing cluster ( http://cuillin.roe.ac.uk) partially funded by the Science and Technology Facilities Council (STFC) and the European Research Council. Some of the results in this paper have been derived using the HEALPix Górski et al. (2005) package. AH and ANT acknowledge the support of a UK STFC Consolidated Grant.

Appendix A Power spectrum cumulants and the Edgeworth expansion

When considering how non-Gaussianity in a signal affects the distribution of the power spectrum, it is useful to start by studying the first few cumulants. In this section we will keep the assumption of Gaussian noise, and specialise to a single redshift bin (p=1p=1) for simplicity.

Firstly, let us compute the covariance of the power spectrum estimates. Non-Gaussianity contributes via the trispectrum of the signal as

ΔC^lΔC^l=2νCl2δll+Tll,\langle\Delta\hat{C}_{l}\Delta\hat{C}_{l^{\prime}}\rangle=\frac{2}{\nu}C_{l}^{2}\delta_{ll^{\prime}}+T_{ll^{\prime}}, (37)

where TllT_{ll^{\prime}} is an mm-averaged version of the signal trispectrum; explicit forms for this are given in Ref. Okamoto and Hu (2002). We remind the reader here that ClC_{l} is the sum of signal and noise power. The trispectrum term TllT_{ll^{\prime}} is in general difficult to compute, with current strategies in weak lensing focused around halo models Cooray and Sheth (2002); Krause and Eifler (2017) or NN-body simulations Harnois-Déraps et al. (2012); Sato and Nishimichi (2013).

Now consider the three-point function of C^l\hat{C}_{l}. This consists of terms having six alma_{lm} factors. We can compute this from the definition Equation (1), using Wick’s theorem to pick out the various contributions. This gives

ΔC^l1ΔC^l2ΔC^l3=8δl1l2δl2l3ν12Cl13+4B~l1l2l3+2S~l1l2l3+4[3]δl1l2ν1Cl1Tl1l3+Pl1l2l3,\langle\Delta\hat{C}_{l_{1}}\Delta\hat{C}_{l_{2}}\Delta\hat{C}_{l_{3}}\rangle=8\frac{\delta_{l_{1}l_{2}}\delta_{l_{2}l_{3}}}{\nu_{1}^{2}}C_{l_{1}}^{3}+4\tilde{B}_{l_{1}l_{2}l_{3}}+2\tilde{S}_{l_{1}l_{2}l_{3}}+4[3]\frac{\delta_{l_{1}l_{2}}}{\nu_{1}}C_{l_{1}}T_{l_{1}l_{3}}+P_{l_{1}l_{2}l_{3}}, (38)

where the notation [3][3] denotes the 33 terms resulting from symmetrization over wave numbers in terms to the right, Pl1l2l3P_{l_{1}l_{2}l_{3}} is an mm-averaged connected signal six-point function (‘pentaspectrum’) and the bispectrum-squared quantities B~\tilde{B} and S~\tilde{S} are defined as

B~l1l2l3\displaystyle\tilde{B}_{l_{1}l_{2}l_{3}} 1ν1ν2ν3m1,m2,m3sl1m1sl2m2sl3m3sl1m1sl2m2sl3m3,\displaystyle\equiv\frac{1}{\nu_{1}\nu_{2}\nu_{3}}\sum_{m_{1},m_{2},m_{3}}\langle s_{l_{1}m_{1}}s_{l_{2}m_{2}}s_{l_{3}m_{3}}\rangle\langle s_{l_{1}m_{1}}^{*}s_{l_{2}m_{2}}^{*}s_{l_{3}m_{3}}^{*}\rangle, (39)
S~l1l2l3\displaystyle\tilde{S}_{l_{1}l_{2}l_{3}} 1ν1ν2ν3m1,m2,m3sl1m1sl1m1sl2m2sl2m2sl3m3sl3m3+sym.\displaystyle\equiv\frac{1}{\nu_{1}\nu_{2}\nu_{3}}\sum_{m_{1},m_{2},m_{3}}\langle s_{l_{1}m_{1}}s_{l_{1}m_{1}}^{*}s_{l_{2}m_{2}}\rangle\langle s_{l_{2}m_{2}}^{*}s_{l_{3}m_{3}}s_{l_{3}m_{3}}^{*}\rangle+\mathrm{sym.} (40)

By statistical isotropy the signal bispectra must be invariant under rotations, which constrains the terms in angle brackets in Equations (39) and (40) to be proportional to 3j3j-symbols that carry all the mm-dependence Cooray and Hu (2000). We will further impose that the signal statistics are invariant under a parity inversion, which introduces a second 3j3j-symbol imposing that all the wave numbers sum to an even integer. Explicitly we have

al1m1al2m2al3m3=ν1ν2ν34π(l1l1l3000)(l1l1l3m1m2m3)bl1l2l3,\langle a_{l_{1}m_{1}}a_{l_{2}m_{2}}a_{l_{3}m_{3}}\rangle=\sqrt{\frac{\nu_{1}\nu_{2}\nu_{3}}{4\pi}}\begin{pmatrix}l_{1}&l_{1}&l_{3}\\ 0&0&0\end{pmatrix}\begin{pmatrix}l_{1}&l_{1}&l_{3}\\ m_{1}&m_{2}&m_{3}\end{pmatrix}b_{l_{1}l_{2}l_{3}}, (41)

where bl1l2l3b_{l_{1}l_{2}l_{3}} is the reduced bispectrum. Substituting this expression into Equations (39) and (40), we see that S~\tilde{S} is proportional to terms like

m1(1)m1(l1l1l3m1m1m3)=δl30δm30(1)l12l1+1,\sum_{m_{1}}(-1)^{m_{1}}\begin{pmatrix}l_{1}&l_{1}&l_{3}\\ m_{1}&-m_{1}&m_{3}\end{pmatrix}=\delta_{l_{3}0}\delta_{m_{3}0}(-1)^{l_{1}}\sqrt{2l_{1}+1}, (42)

where the identity follows from the orthogonality of the 3j3j-symbols. Since the signal l=0l=0 mode is unobservable, the overall contribution from terms of this form is zero, and hence S~=0\tilde{S}=0. For B~\tilde{B} we have

B~l1l2l3=14π(l1l2l3000)2bl1l2l32.\tilde{B}_{l_{1}l_{2}l_{3}}=\frac{1}{4\pi}\begin{pmatrix}l_{1}&l_{2}&l_{3}\\ 0&0&0\end{pmatrix}^{2}b_{l_{1}l_{2}l_{3}}^{2}. (43)

As mentioned above, the 3j3j-symbol imposes that l1+l2+l3l_{1}+l_{2}+l_{3} is an even integer, as well as the triangle conditions |l1l2|l3l1+l2\lvert l_{1}-l_{2}\rvert\leq l_{3}\leq l_{1}+l_{2}.

The three-point function of power spectrum estimates thus receives contributions from squared bispectra, products of trispectra and power spectra, and the connected six-point function. Similarly, the connected four-point function will receive contributions from squared trispectra, squared bispectra multiplied by power spectra, bispectra multiplied by the connected five-point function, pentaspectra multiplied by power spectra, and the connected eight-point function.

We want to keep our model for non-Gaussianity as general as possible, so in principle we would like to keep track of the whole cumulant hierarchy of the power spectrum estimates. This is clearly going to be computationally unfeasible however, so we will hereafter assume that the signal non-Gaussianity is weak.

A.1 Weak non-Gaussianity

We will assume that non-Gaussianity in the signal is weak in the sense that each term in its cumulant hierarchy is successively smaller than the last. Introducing the order-counting parameter λ\lambda, we will assume that Cl𝒪(λ)C_{l}\sim\mathcal{O}(\lambda), bl1l2l3𝒪(λ2)b_{l_{1}l_{2}l_{3}}\sim\mathcal{O}(\lambda^{2}), Tl1l2𝒪(λ3)T_{l_{1}l_{2}}\sim\mathcal{O}(\lambda^{3}), etc., with λ1\lambda\ll 1. This is justified on sufficiently large scales where non-linearity in the underlying field is weak, and perturbation theory is accurate. We will precisely define the regime in which this perturbative approach is valid later on.

In this scenario, bispectrum-squared terms like B~l1l2l3\tilde{B}_{l_{1}l_{2}l_{3}} are the same perturbative order as power-trispectrum cross terms like Cl1Tl1l3C_{l_{1}}T_{l_{1}l_{3}}, which can also be seen by expanding the fields to second order in the linear signal; the leading-order (tree level) bispectrum features two factors of the linear power spectrum Bernardeau et al. (2002), meaning the squared bispectrum contains four such factors. The trispectrum contains three power spectra at leading order, and hence Cl1Tl1l3C_{l_{1}}T_{l_{1}l_{3}} also contains four factors of the linear power spectrum. The connected six-point function is suppressed as Pl1l2l3𝒪(λ5)P_{l_{1}l_{2}l_{3}}\sim\mathcal{O}(\lambda^{5}). We will retain leading-order ‘non-Gamma’ terms throughout. The 3-pt function of C^l\hat{C}_{l} is thus Equation (38) with S~=0\tilde{S}=0 and the final Pl1l2l3P_{l_{1}l_{2}l_{3}} term dropped.

The dimensionless 1-pt skewness of C^l\hat{C}_{l} can be derived from Equation (38) as

ΔC^l3ΔCl23/2\displaystyle\frac{\langle\Delta\hat{C}_{l}^{3}\rangle}{\langle\Delta C_{l}^{2}\rangle^{3/2}} =8Cl3/ν2+4B~lll+12ClTll/ν(2Cl2/ν+Tll)3/2\displaystyle=\frac{8C_{l}^{3}/\nu^{2}+4\tilde{B}_{lll}+12C_{l}T_{ll}/\nu}{\left(2C_{l}^{2}/\nu+T_{ll}\right)^{3/2}}
=8ν+2ν3B~lllCl3+32ν2TllCl2+𝒪(λ2).\displaystyle=\sqrt{\frac{8}{\nu}}+\sqrt{2\nu^{3}}\frac{\tilde{B}_{lll}}{C_{l}^{3}}+\frac{3\sqrt{2\nu}}{2}\frac{T_{ll}}{C_{l}^{2}}+\mathcal{O}(\lambda^{2}). (44)

It is important to note here that we could consistently neglect the 𝒪(λ5)\mathcal{O}(\lambda^{5}) term Pl1l2l3P_{l_{1}l_{2}l_{3}} in both the 3-pt function and its dimensionless version, since the variance itself is 𝒪(λ2)\mathcal{O}(\lambda^{2}) meaning that connected 6-pt terms contribute at 𝒪(λ2)\mathcal{O}(\lambda^{2}) in the dimensionless skew. This is in contrast to the connected four-point function of the power spectrum estimates. The 1-pt excess kurtosis for example is

ΔC^l4c=48Cl4ν3+96ClB~lllν+144Cl2Tllν2,\langle\Delta\hat{C}_{l}^{4}\rangle_{c}=48\frac{C_{l}^{4}}{\nu^{3}}+96\frac{C_{l}\tilde{B}_{lll}}{\nu}+144\frac{C_{l}^{2}T_{ll}}{\nu^{2}}, (45)

where we have kept terms at 𝒪(λ5)\mathcal{O}(\lambda^{5}). Working to a consistent 𝒪(λ4)\mathcal{O}(\lambda^{4}) order would mean dropping all but the first term from this expression. In contrast, the dimensionless kurtosis of C^l\hat{C}_{l} is

ΔC^l4cΔC^l22\displaystyle\frac{\langle\Delta\hat{C}_{l}^{4}\rangle_{c}}{\langle\Delta\hat{C}_{l}^{2}\rangle^{2}} =48Cl4/ν3+96ClB~lll/ν+144Cl2Tll/ν2(2Cl2/ν+Tll)2\displaystyle=\frac{48C_{l}^{4}/\nu^{3}+96C_{l}\tilde{B}_{lll}/\nu+144C_{l}^{2}T_{ll}/\nu^{2}}{\left(2C_{l}^{2}/\nu+T_{ll}\right)^{2}}
=12ν+24νB~lllCl3+24TllCl2+𝒪(λ2).\displaystyle=\frac{12}{\nu}+24\nu\frac{\tilde{B}_{lll}}{C_{l}^{3}}+24\frac{T_{ll}}{C_{l}^{2}}+\mathcal{O}(\lambda^{2}). (46)

Therefore, while the raw non-Gaussian cumulant hierarchy of the power spectrum estimates can be truncated consistently in perturbation theory, all of the dimensionless cumulants, i.e. the cumulants of a normalised C^l\hat{C}_{l} estimator with unit variance, receive corrections from a non-Gaussian signal. The cumulant hierarchy of the normalised estimator cannot therefore be truncated consistently, since terms of order 𝒪(λ)\mathcal{O}(\lambda) always contribute.

The higher-order non-Gaussian cumulants of a normalised C^l\hat{C}_{l} estimator thus do not get successively smaller, in contrast to those of the signal. This observation has important consequences, since successively smaller cumulants are a requirement for the Edgeworth expansion to converge. One might be tempted to speculate that a good way of incorporating the effects of a non-Gaussian signal into the C^l\hat{C}_{l} distribution would be to treat the non-Gaussianity as weak and then Edgeworth expand around a Gamma distribution, but the argument above tells us this will not work; we have to instead expand the signal distribution around a Gaussian. Attempts to use an Edgeworth expansion directly on the distribution of C^l\hat{C}_{l} (or a decorrelated version of it, as in Ref. Lin et al. (2020)) in order to incorporate signal non-Gaussianity should be treated with caution.

Appendix B Derivation of the tomographic redshift bin likelihood

In this Section we derive the correction to the likelihood function of tomographic redshift bin power spectra. It is straightforward to write down the correct likelihood for full-sky spectra when the signal and noise are both Gaussian - this is the Wishart distribution for positive-definite matrices. Similarly, it is straightforward to write down a Gaussian approximation to the Wishart distribution. Here we present the derivation for the case of multiple redshift bins.

Our strategy will be to substitute the Edgeworth-expanded signal characteristic function Equation (12) into the mode coupling integral Equation (9) and perform the integrations to derive the characteristic function of the power spectrum. We will then inverse Fourier-transform this to get the distribution of the power spectrum.

We will compute the trispectrum and bispectrum-squared corrections separately, but start with the simple case of Gaussian signal and noise.

B.1 Gaussian signal, Gaussian noise

As a warm-up, consider the situation where both signal and noise are Gaussian. The former should be a good approximation on large scales where the signal is a linearly-evolved version of the (close to) Gaussian initial conditions, and the latter is practically guaranteed for maps made by compression from large numbers of underlying tracers (e.g. shear maps constructed from averaged galaxy shapes, or density maps made by counting many galaxies).

We assume throughout that the underlying signal and noise obey statistical isotropy, such that modes with different wave vectors are uncorrelated. When the signal and noise are both Gaussian, the covariance of different modes captures all the statistical information of the field, implying that modes with different wave vectors are not only uncorrelated but statistically independent. The characteristic functions of signal and noise thus factorize, and since the Fourier transform of a Gaussian is also a Gaussian we have

ϕs({𝐤lm})\displaystyle\phi_{s}(\{\mathbf{k}_{lm}\}) =e12lm𝐤lm𝐒l𝐤lm,\displaystyle=e^{-\frac{1}{2}\sum_{lm}\mathbf{k}_{lm}^{\dagger}\mathbf{S}_{l}\mathbf{k}_{lm}}, (47)
ϕn({𝐤lm})\displaystyle\phi_{n}(\{\mathbf{k}_{lm}\}) =e12lm𝐤lm𝐍l𝐤lm,\displaystyle=e^{-\frac{1}{2}\sum_{lm}\mathbf{k}_{lm}^{\dagger}\mathbf{N}_{l}\mathbf{k}_{lm}}, (48)

where 𝐒l\mathbf{S}_{l} and 𝐍l\mathbf{N}_{l} are the signal and noise covariance matrices.

Equation (9) factorizes in each ll and mm, with each integral a Gaussian integral. Making these substitutions immediately gives

ϕ{ν𝐂^l}({𝐉l})=l=lminlmax|𝐈+2i𝐉l𝐂l|ν2,\phi_{\{\nu\hat{\mathbf{C}}_{l}\}}(\{\mathbf{J}_{l}\})=\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\lvert\mathbf{I}+2i\mathbf{J}_{l}\mathbf{C}_{l}\rvert^{-\frac{\nu}{2}}, (49)

where 𝐂l\mathbf{C}_{l} is the total (signal plus noise) covariance matrix of the map. We recognise this as a product of characteristic functions of Wishart matrices.

The joint distribution of the power spectra is thus

p({ν𝐂^l})=l=lminlmax2p(p1)2d𝐉l(2π)neiTr(ν𝐉l𝐂^l)|𝐈+2i𝐉l𝐂l|ν2,p(\{\nu\hat{\mathbf{C}}_{l}\})=\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}2^{\frac{p(p-1)}{2}}\int\frac{\mathrm{d}\mathbf{J}_{l}}{(2\pi)^{n}}e^{i\mathrm{Tr}(\nu\mathbf{J}_{l}\hat{\mathbf{C}}_{l})}\lvert\mathbf{I}+2i\mathbf{J}_{l}\mathbf{C}_{l}\rvert^{-\frac{\nu}{2}}, (50)

i.e. each 𝐂^l\hat{\mathbf{C}}_{l} is independent.

The integral in Equation (50) is almost the standard integral that defines the multivariate Gamma function, with the exception that it is over real symmetric matrices rather than positive definite matrices. Now, let 𝐓2𝐂l12𝐉l𝐂l12\mathbf{T}\equiv 2\mathbf{C}_{l}^{\frac{1}{2}}\mathbf{J}_{l}\mathbf{C}_{l}^{\frac{1}{2}} and change integration variables from 𝐉l\mathbf{J}_{l} to 𝐓\mathbf{T}. We’ll assume the total covariance matrix is positive definite. The Jacobian of this transformation is

d𝐉l=2p(p+1)2|𝐂l|(p+1)2d𝐓,\mathrm{d}\mathbf{J}_{l}=2^{-\frac{p(p+1)}{2}}\left|\mathbf{C}_{l}\right|^{-\frac{(p+1)}{2}}\mathrm{d}\mathbf{T}, (51)

which follows from a standard result. Each term in the product is therefore

2p|𝐂l|(p+1)2(2π)p(p+1)2d𝐓ei2Tr(𝛀𝐓)|𝐈+i𝐓|ν2\frac{2^{-p}\left|\mathbf{C}_{l}\right|^{-\frac{(p+1)}{2}}}{(2\pi)^{\frac{p(p+1)}{2}}}\int\mathrm{d}\mathbf{T}\;e^{\frac{i}{2}\mathrm{Tr}(\bm{\Omega}\mathbf{T})}\lvert\mathbf{I}+i\mathbf{T}\rvert^{-\frac{\nu}{2}} (52)

where 𝛀ν𝐂l12𝐂^l𝐂l12\bm{\Omega}\equiv\nu\mathbf{C}_{l}^{-\frac{1}{2}}\hat{\mathbf{C}}_{l}\mathbf{C}_{l}^{-\frac{1}{2}} is symmetric and the integration is over all real symmetric p×pp\times p matrices 𝐓\mathbf{T}. Our task is therefore to compute the integral

Pp,ν(𝛀)d𝐓ei2Tr(𝛀𝐓)|𝐈+i𝐓|ν2.P_{p,\nu}(\bm{\Omega})\equiv\int\mathrm{d}\mathbf{T}\;e^{\frac{i}{2}\mathrm{Tr}(\bm{\Omega}\mathbf{T})}\lvert\mathbf{I}+i\mathbf{T}\rvert^{-\frac{\nu}{2}}. (53)

There are several ways of solving this integral, as discussed in, e.g. Ref. Janik and Nowak (2003). Making the substitution 𝐙=𝐈+i𝐓\mathbf{Z}=\mathbf{I}+i\mathbf{T} in the definition of Pp,ν(𝛀)P_{p,\nu}(\bm{\Omega}) gives

Pp,ν(𝛀)=ip(p+1)2e12Tr(𝛀)Re(𝐙)=𝐈d𝐙e12Tr(𝛀𝐙)|𝐙|ν2,P_{p,\nu}(\bm{\Omega})=i^{-\frac{p(p+1)}{2}}e^{-\frac{1}{2}\mathrm{Tr}(\bm{\Omega})}\int_{\mathrm{Re}(\mathbf{Z})=\mathbf{I}}\mathrm{d}\mathbf{Z}\,e^{\frac{1}{2}\mathrm{Tr}(\bm{\Omega}\mathbf{Z})}\lvert\mathbf{Z}\rvert^{-\frac{\nu}{2}}, (54)

where the integral is over a fixed real part and a symmetric imaginary part. The integral is an inverse Laplace transform, and can be derived from the result Herz (1955)

212p(p1)(2πi)12p(p+1)Re(Z)=X0>0d𝐙eTr(𝐙𝚲)|𝐙|a12(p+1)=|𝚲|aΓp[a+12(p+1)],\frac{2^{\frac{1}{2}p(p-1)}}{(2\pi i)^{\frac{1}{2}p(p+1)}}\int_{\mathrm{Re}(Z)=X_{0}>0}\mathrm{d}\mathbf{Z}\,e^{\mathrm{Tr}(\mathbf{Z}\bm{\Lambda})}\lvert\mathbf{Z}\rvert^{-a-\frac{1}{2}(p+1)}=\frac{\lvert\bm{\Lambda}\rvert^{a}}{\Gamma_{p}\left[a+\frac{1}{2}(p+1)\right]}, (55)

where 𝚲>0\bm{\Lambda}>0 and the integral is over symmetric imaginary parts and a fixed positive definite real part. We require 𝛀>0\bm{\Omega}>0, i.e. νp\nu\geq p. Since X0=IX_{0}=I is positive definite, we get, after some rearranging,

Pp,ν(𝛀)=2p(pν+3)2πp2(p+1)Γp(ν2)|𝛀|νp12e12Tr(𝛀).P_{p,\nu}(\bm{\Omega})=\frac{2^{\frac{p(p-\nu+3)}{2}}\pi^{\frac{p}{2}(p+1)}}{\Gamma_{p}\left(\frac{\nu}{2}\right)}\left|\bm{\Omega}\right|^{\frac{\nu-p-1}{2}}e^{-\frac{1}{2}\mathrm{Tr}(\bm{\Omega})}. (56)

Substituting Pp,ν(𝛀)P_{p,\nu}(\bm{\Omega}) back in to Equation (50) gives

p({ν𝐂^l})=l=lminlmax|ν𝐂^l|(νp1)/22pν2Γp(ν2)|𝐂l|ν2eν2Tr(𝐂l1𝐂^l),p(\{\nu\hat{\mathbf{C}}_{l}\})=\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\frac{\lvert\nu\hat{\mathbf{C}}_{l}\rvert^{(\nu-p-1)/2}}{2^{\frac{p\nu}{2}}\Gamma_{p}\left(\frac{\nu}{2}\right)\left|\mathbf{C}_{l}\right|^{\frac{\nu}{2}}}e^{-\frac{\nu}{2}\mathrm{Tr}(\mathbf{C}_{l}^{-1}\hat{\mathbf{C}}_{l})}, (57)

i.e.

p({𝐂^l})\displaystyle p(\{\hat{\mathbf{C}}_{l}\}) =l=lminlmax|𝐂^l|(νp1)/22pν2Γp(ν2)|𝐂l/ν|ν2eν2Tr(𝐂l1𝐂^l)\displaystyle=\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\frac{\lvert\hat{\mathbf{C}}_{l}\rvert^{(\nu-p-1)/2}}{2^{\frac{p\nu}{2}}\Gamma_{p}\left(\frac{\nu}{2}\right)\left|\mathbf{C}_{l}/\nu\right|^{\frac{\nu}{2}}}e^{-\frac{\nu}{2}\mathrm{Tr}(\mathbf{C}_{l}^{-1}\hat{\mathbf{C}}_{l})}
l=lminlmaxWp(𝐂^l;𝐂l/ν,ν),\displaystyle\equiv\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}W_{p}(\hat{\mathbf{C}}_{l};\mathbf{C}_{l}/\nu,\nu), (58)

where WpW_{p} is the pp-dimensional Wishart density.

B.2 Convergence of the Wishart distribution to Gaussianity

The convergence of probability distributions is most easily studied at the level of the characteristic function. A single 𝐂^l\hat{\mathbf{C}}_{l} has the characteristic function

ϕ𝐂^l(𝐉l)=|𝐈+2i𝐉l𝐂lν|ν2.\phi_{\hat{\mathbf{C}}_{l}}(\mathbf{J}_{l})=\left|\mathbf{I}+\frac{2i\mathbf{J}_{l}\mathbf{C}_{l}}{\nu}\right|^{-\frac{\nu}{2}}. (59)

Consider the characteristic function of 𝐌l𝐂1/2𝐂^l𝐂l1/2\mathbf{M}_{l}\equiv\mathbf{C}^{-1/2}\hat{\mathbf{C}}_{l}\mathbf{C}_{l}^{-1/2}. This is

ϕ𝐌l(𝐉l)=|𝐈+2i𝐉lν|ν2.\phi_{\mathbf{M}_{l}}(\mathbf{J}_{l})=\left|\mathbf{I}+\frac{2i\mathbf{J}_{l}}{\nu}\right|^{-\frac{\nu}{2}}. (60)

We can rewrite this as

ϕ𝐌l(𝐉l)=|𝐈+2i𝐑𝐉l𝐑Tν|ν2.\phi_{\mathbf{M}_{l}}(\mathbf{J}_{l})=\left|\mathbf{I}+\frac{2i\mathbf{R}\mathbf{J}_{l}\mathbf{R}^{T}}{\nu}\right|^{-\frac{\nu}{2}}. (61)

where 𝐑\mathbf{R} is any orthogonal matrix. Choosing 𝐑\mathbf{R} as the matrix that diagonalizes 𝐉l\mathbf{J}_{l}, this gives

ϕ𝐌l(𝐉l)=i=1p[1+2iλiν]ν2.\phi_{\mathbf{M}_{l}}(\mathbf{J}_{l})=\prod_{i=1}^{p}\left[1+\frac{2i\lambda_{i}}{\nu}\right]^{-\frac{\nu}{2}}. (62)

where λi\lambda_{i} are the eigenvalues of 𝐉l\mathbf{J}_{l}.

Now fixing 𝐉l\mathbf{J}_{l}, taking ν\nu\to\infty and keeping terms up to 𝒪(ν1)\mathcal{O}(\nu^{-1}) gives

limνϕ𝐌l(𝐉l)\displaystyle\lim_{\nu\to\infty}\phi_{\mathbf{M}_{l}}(\mathbf{J}_{l}) =exp(ii=1pλii=1pλi2ν)\displaystyle=\exp\left(-i\sum_{i=1}^{p}\lambda_{i}-\sum_{i=1}^{p}\frac{\lambda_{i}^{2}}{\nu}\right)
=exp[iTr(𝐉l)1νTr(𝐉l𝐉l)]\displaystyle=\exp\left[-i\mathrm{Tr}(\mathbf{J}_{l})-\frac{1}{\nu}\mathrm{Tr}(\mathbf{J}_{l}\mathbf{J}_{l})\right] (63)

This implies that the characteristic function of our original variable is

limνϕ𝐂^l(𝐉l)=exp[iTr(𝐉l𝐂l)1νTr(𝐉l𝐂l𝐉l𝐂l)].\lim_{\nu\to\infty}\phi_{\hat{\mathbf{C}}_{l}}(\mathbf{J}_{l})=\exp\left[-i\mathrm{Tr}(\mathbf{J}_{l}\mathbf{C}_{l})-\frac{1}{\nu}\mathrm{Tr}(\mathbf{J}_{l}\mathbf{C}_{l}\mathbf{J}_{l}\mathbf{C}_{l})\right]. (64)

The inverse Fourier transform is a Gaussian integral, giving

limνp(𝐂^l)=2p(p1)2d𝐉l(2π)p(p+1)2eiTr[𝐉l(𝐂^l𝐂l)]e1νTr(𝐉l𝐂l𝐉l𝐂l).\lim_{\nu\to\infty}p(\hat{\mathbf{C}}_{l})=2^{\frac{p(p-1)}{2}}\int\frac{\mathrm{d}\mathbf{J}_{l}}{(2\pi)^{\frac{p(p+1)}{2}}}e^{i\mathrm{Tr}[\mathbf{J}_{l}(\hat{\mathbf{C}}_{l}-\mathbf{C}_{l})]}e^{-\frac{1}{\nu}\mathrm{Tr}(\mathbf{J}_{l}\mathbf{C}_{l}\mathbf{J}_{l}\mathbf{C}_{l})}. (65)

To do this integral, it is advantageous to move to vectorized notation Gupta and Nagar (2000). For a general p×pp\times p matrix YY define vec(Y)\mathrm{vec}(Y) as the p2p^{2}-dimensional vector whose elements are (Y1,1,,Yp,1,Y1,2,,Yp,2,)(Y_{1,1},\dots,Y_{p,1},Y_{1,2},\dots,Y_{p,2},\dots). For a symmetric p×pp\times p matrix XX, define vecp(X)\mathrm{vecp}(X) as the 12p(p+1)\frac{1}{2}p(p+1)-dimensional vector whose elements are (X1,1,X1,2,,X1,p,X2,2,X2,3,)(X_{1,1},X_{1,2},\dots,X_{1,p},X_{2,2},X_{2,3},\dots). Then some standard matrix identities give

Tr(𝐗𝐃𝐗𝐄)\displaystyle\mathrm{Tr}(\mathbf{X}\mathbf{D}\mathbf{X}\mathbf{E}) =vecT(𝐗)(𝐄𝐃)vec(𝐗)\displaystyle=\mathrm{vec}^{T}(\mathbf{X})(\mathbf{E}\otimes\mathbf{D})\mathrm{vec}(\mathbf{X})
=vecpT(𝐗)𝐁p+(𝐄𝐃)𝐁p+Tvecp(𝐗),\displaystyle=\mathrm{vecp}^{T}(\mathbf{X})\mathbf{B}_{p}^{+}(\mathbf{E}\otimes\mathbf{D})\mathbf{B}_{p}^{+T}\mathrm{vecp}(\mathbf{X}), (66)

where \otimes is the Kronecker product, and 𝐁p+=(𝐁pT𝐁p)1𝐁pT\mathbf{B}_{p}^{+}=(\mathbf{B}_{p}^{T}\mathbf{B}_{p})^{-1}\mathbf{B}_{p}^{T} is the Moore-Penrose inverse of 𝐁p\mathbf{B}_{p}, the p2×12p(p+1)p^{2}\times\frac{1}{2}p(p+1) transition matrix. This matrix satisfies vec(𝐗)=𝐁p+Tvecp(𝐗)\mathrm{vec}(\mathbf{X})=\mathbf{B}_{p}^{+T}\mathrm{vecp}(\mathbf{X}) and vec(𝐗)=𝐁pTvecp(𝐗)\mathrm{vec}(\mathbf{X})=\mathbf{B}_{p}^{T}\mathrm{vecp}(\mathbf{X}), and has typical elements given by Gupta and Nagar (2000)

(𝐁p)ij,gh=12(δigδjh+δihδjg),ip,jp,ghp.(\mathbf{B}_{p})_{ij,gh}=\frac{1}{2}(\delta_{ig}\delta_{jh}+\delta_{ih}\delta_{jg}),\,i\leq p,\,j\leq p,\,g\leq h\leq p. (67)

Note that 𝐁p+\mathbf{B}_{p}^{+} is a 12p(p+1)×p2\frac{1}{2}p(p+1)\times p^{2} matrix. We also have

Tr(𝐗𝐀)\displaystyle\mathrm{Tr}(\mathbf{X}\mathbf{A}) =vecT(𝐗)vec(𝐀)\displaystyle=\mathrm{vec}^{T}(\mathbf{X})\mathrm{vec}(\mathbf{A})
=vecpT(𝐗)𝐁p+𝐁p+Tvecp(𝐀).\displaystyle=\mathrm{vecp}^{T}(\mathbf{X})\mathbf{B}_{p}^{+}\mathbf{B}_{p}^{+T}\mathrm{vecp}(\mathbf{A}). (68)

Making these substitutions, defining Δ𝐂^l=𝐂^l𝐂l\Delta\hat{\mathbf{C}}_{l}=\hat{\mathbf{C}}_{l}-\mathbf{C}_{l}, and using that |𝐁pT𝐁p|=212p(p1)\lvert\mathbf{B}_{p}^{T}\mathbf{B}_{p}\rvert=2^{-\frac{1}{2}p(p-1)} gives

limνp(𝐂^l)\displaystyle\lim_{\nu\to\infty}p(\hat{\mathbf{C}}_{l}) =2p(p1)2dvecp(𝐉l)(2π)p(p+1)2eivecpT(𝐉l)𝐁p+𝐁p+Tvecp(Δ𝐂^l)e1νvecpT(𝐉l)𝐁p+(𝐂l𝐂l)𝐁p+Tvecp(𝐉l)\displaystyle=2^{\frac{p(p-1)}{2}}\int\frac{\mathrm{dvecp}(\mathbf{J}_{l})}{(2\pi)^{\frac{p(p+1)}{2}}}e^{i\mathrm{vecp}^{T}(\mathbf{J}_{l})\mathbf{B}_{p}^{+}\mathbf{B}_{p}^{+T}\mathrm{vecp}(\Delta\hat{\mathbf{C}}_{l})}e^{-\frac{1}{\nu}\mathrm{vecp}^{T}(\mathbf{J}_{l})\mathbf{B}_{p}^{+}(\mathbf{C}_{l}\otimes\mathbf{C}_{l})\mathbf{B}_{p}^{+T}\mathrm{vecp}(\mathbf{J}_{l})}
=exp{12vecpT(Δ𝐂^l)[2ν𝐁pT(𝐂l𝐂l)𝐁p]1vecp(Δ𝐂^l)}(2π)12p(p+1)|2ν𝐁pT(𝐂l𝐂l)𝐁p|,\displaystyle=\frac{\exp\left\{-\frac{1}{2}\mathrm{vecp}^{T}(\Delta\hat{\mathbf{C}}_{l})\left[\frac{2}{\nu}\mathbf{B}_{p}^{T}(\mathbf{C}_{l}\otimes\mathbf{C}_{l})\mathbf{B}_{p}\right]^{-1}\mathrm{vecp}(\Delta\hat{\mathbf{C}}_{l})\right\}}{\sqrt{(2\pi)^{\frac{1}{2}p(p+1)}\left|\frac{2}{\nu}\mathbf{B}_{p}^{T}(\mathbf{C}_{l}\otimes\mathbf{C}_{l})\mathbf{B}_{p}\right|}}, (69)

i.e. vecp(𝐂^l)\mathrm{vecp}(\hat{\mathbf{C}}_{l}) is 12p(p+1)\frac{1}{2}p(p+1)-dimensional Gaussian with mean vecp(𝐂l)\mathrm{vecp}(\mathbf{C}_{l}) and covariance matrix 2ν𝐁pT(𝐂l𝐂l)𝐁p\frac{2}{\nu}\mathbf{B}_{p}^{T}(\mathbf{C}_{l}\otimes\mathbf{C}_{l})\mathbf{B}_{p}. The matrix 𝐂^l\hat{\mathbf{C}}_{l} is thus symmetric matrix normal. Note that the mean and covariance matrix do not change compared with their Wishart forms when the limit is taken.

We have shown that the characteristic function of the Wishart distribution tends pointwise to that of a symmetric matrix Gaussian in the limit ν\nu\to\infty. This is enough to establish that Wishart matrices converge in distribution to symmetric Gaussian matrices, by Lévy’s continuity theorem.

B.3 Signal with non-zero trispectrum

We write the signal characteristic function as

ϕs({𝐤lm})=(1+124κlm¯ijkmkl1m1ikl2m2jkl3m3kkl4m4m)ϕsG({𝐤lm}).\phi_{s}(\{\mathbf{k}_{lm}\})=\left(1+\frac{1}{24}\kappa^{ijkm}_{\underline{lm}}k^{i}_{l_{1}m_{1}}k^{j}_{l_{2}m_{2}}k^{k}_{l_{3}m_{3}}k^{m}_{l_{4}m_{4}}\right)\phi_{s}^{G}(\{\mathbf{k}_{lm}\}). (70)

Substituting this into Equation (9) gives a Gaussian integral, as in the Gaussian signal case, but now with four factors of the wave number in the integrand. This integral can be done analytically, which gives the power spectrum characteristic function as

ϕ{ν𝐂^l}({𝐉l})=(l=lminlmax|𝐈+2i𝐉l𝐂l|ν2)[1+18l1,l2(2l1+1)(2l2+1)ΔC^l1ijΔC^l2kmNGMl1ijMl2km].\phi_{\{\nu\hat{\mathbf{C}}_{l}\}}(\{\mathbf{J}_{l}\})=\left(\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\lvert\mathbf{I}+2i\mathbf{J}_{l}\mathbf{C}_{l}\rvert^{-\frac{\nu}{2}}\right)\left[1+\frac{1}{8}\sum_{l_{1},l_{2}}(2l_{1}+1)(2l_{2}+1)\langle\Delta\hat{C}^{ij}_{l_{1}}\Delta\hat{C}^{km}_{l_{2}}\rangle_{\mathrm{NG}}M^{ij}_{l_{1}}M^{km}_{l_{2}}\right]. (71)

A few consequences of the corrected distribution are immediately evident from Equation (71). Firstly, the leading-order correction in the limit Jl0J_{l}\to 0 is quadratic in JlJ_{l}, which tells us that both the normalization and the mean of p(C^l)p(\hat{C}_{l}) are unaffected by our correction (the latter is true by construction), but the covariance does receive a correction, given by ΔC^l1ijΔC^l2kmNG\langle\Delta\hat{C}^{ij}_{l_{1}}\Delta\hat{C}^{km}_{l_{2}}\rangle_{\mathrm{NG}}. This is as expected, agreeing with the non-perturbative expression Equation (37).

Inverse Fourier-transforming Equation (71) gives the correction to the probability density as

Δp({ν𝐂^l})=182λp(p1)2d{𝐉l}(2π)nλeilTr(ν𝐉l𝐂^l)(l=lminlmax|𝐈+2i𝐉l𝐂l|ν2)l1,l2ν1ν2ΔC^l1ijΔC^l2kmNG(𝐌l1)ij(𝐌l2)km,\Delta p(\{\nu\hat{\mathbf{C}}_{l}\})=\frac{1}{8}2^{\frac{\lambda p(p-1)}{2}}\int\frac{\mathrm{d}\{\mathbf{J}_{l}\}}{(2\pi)^{n\lambda}}e^{i\sum_{l}\mathrm{Tr}(\nu\mathbf{J}_{l}\hat{\mathbf{C}}_{l})}\left(\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\lvert\mathbf{I}+2i\mathbf{J}_{l}\mathbf{C}_{l}\rvert^{-\frac{\nu}{2}}\right)\sum_{l_{1},l_{2}}\nu_{1}\nu_{2}\langle\Delta\hat{C}^{ij}_{l_{1}}\Delta\hat{\mathrm{C}}^{km}_{l_{2}}\rangle_{\mathrm{NG}}(\mathbf{M}_{l_{1}})^{ij}(\mathbf{M}_{l_{2}})^{km}, (72)

with 𝐌l=[𝐂l+(2i𝐉l)1]1\mathbf{M}_{l}=[\mathbf{C}_{l}+(2i\mathbf{J}_{l})^{-1}]^{-1} and λlmaxlmin+1\lambda\equiv l_{\mathrm{max}}-l_{\mathrm{min}}+1. The quantity ΔC^l1ijΔC^l2kmNG\langle\Delta\hat{C}^{ij}_{l_{1}}\Delta\hat{C}^{km}_{l_{2}}\rangle_{\mathrm{NG}} is the non-Gaussian contribution to the covariance matrix. In the p=1p=1 case we call this Tl1l2T_{l_{1}l_{2}}. The crucial effect of non-Gaussianity here is to couple C^l\hat{C}_{l} with different ll, a result of non-zero off-diagonal terms in the non-Gaussian covariance matrix; the probability density no longer factorizes in ll.

The integrals in Equation (72) are of the Wishart type when l1l2l_{1}\neq l_{2}, and the only new term is that with l1=l2l_{1}=l_{2}. After some rearranging and relabelling, one can show that it is necessary to compute the integral

Hp,νab,cd(𝛀)d𝐓ei2Tr(𝛀𝐓)|𝐈+i𝐓|ν2[(𝐈+i𝐓)1]ab[(𝐈+i𝐓)1]cd.H^{ab,cd}_{p,\nu}(\bm{\Omega})\equiv\int\mathrm{d}\mathbf{T}\,e^{\frac{i}{2}\mathrm{Tr}(\bm{\Omega}\mathbf{T})}\,\lvert\mathbf{I}+i\mathbf{T}\rvert^{-\frac{\nu}{2}}\left[(\mathbf{I}+i\mathbf{T})^{-1}\right]_{ab}\left[(\mathbf{I}+i\mathbf{T})^{-1}\right]_{cd}. (73)

We’ll evaluate this using derivatives, but first we need to establish a few rules for taking derivatives with respect to symmetric matrices. These can be found in Ref. Petersen and Pedersen (2008).

For a symmetric matrix 𝐗\mathbf{X}, we have

𝐗Xij\displaystyle\frac{\partial\mathbf{X}}{\partial X_{ij}} =𝐉ij+𝐉ji𝐉ij𝐉ij\displaystyle=\mathbf{J}^{ij}+\mathbf{J}^{ji}-\mathbf{J}^{ij}\mathbf{J}^{ij} (74)
Tr(𝐀𝐗)𝐗\displaystyle\frac{\partial\mathrm{Tr}(\mathbf{A}\mathbf{X})}{\partial\mathbf{X}} =𝐀+𝐀T(𝐀𝐈)\displaystyle=\mathbf{A}+\mathbf{A}^{T}-(\mathbf{A}\circ\mathbf{I}) (75)
det𝐗𝐗\displaystyle\frac{\partial\mathrm{det}\mathbf{X}}{\partial\mathbf{X}} =(det𝐗)[2𝐗1(𝐗1𝐈)],\displaystyle=(\mathrm{det}\mathbf{X})[2\mathbf{X}^{-1}-(\mathbf{X}^{-1}\circ\mathbf{I})], (76)

where \circ denotes the Hadamard (element-wise) product, and 𝐉ij\mathbf{J}^{ij} is the matrix whose kmkm element is δikδjm\delta_{ik}\delta_{jm}, i.e. the matrix whose only non-zero values are in cell ijij.

We can therefore write the following identity holding for symmetric matrices 𝐓\mathbf{T}:

|𝐈+i𝐓|ν2(𝐈+i𝐓)ij1=iνTij|𝐈+i𝐓|ν2+12(𝐈+i𝐓)ij1δij,\lvert\mathbf{I}+i\mathbf{T}\rvert^{-\frac{\nu}{2}}(\mathbf{I}+i\mathbf{T})^{-1}_{ij}=\frac{i}{\nu}\frac{\partial}{\partial T_{ij}}\lvert\mathbf{I}+i\mathbf{T}\rvert^{-\frac{\nu}{2}}+\frac{1}{2}(\mathbf{I}+i\mathbf{T})^{-1}_{ij}\delta_{ij}, (77)

with no summation over repeated indices. Using this identity one can establish the identity

|𝐈+i𝐓|ν2(𝐈+i𝐓)ij1(𝐈+i𝐓)km1\displaystyle\lvert\mathbf{I}+i\mathbf{T}\rvert^{-\frac{\nu}{2}}(\mathbf{I}+i\mathbf{T})^{-1}_{ij}(\mathbf{I}+i\mathbf{T})^{-1}_{km} =iνTij[|𝐈+i𝐓|ν2(𝐈+i𝐓)km1]+12|𝐈+i𝐓|ν2(𝐈+i𝐓)ij1δij(𝐈+i𝐓)km1\displaystyle=\frac{i}{\nu}\frac{\partial}{\partial T_{ij}}\left[\lvert\mathbf{I}+i\mathbf{T}\rvert^{-\frac{\nu}{2}}(\mathbf{I}+i\mathbf{T})^{-1}_{km}\right]+\frac{1}{2}\lvert\mathbf{I}+i\mathbf{T}\rvert^{-\frac{\nu}{2}}(\mathbf{I}+i\mathbf{T})^{-1}_{ij}\delta_{ij}(\mathbf{I}+i\mathbf{T})^{-1}_{km}
1ν|𝐈+i𝐓|ν2[2(𝐈+i𝐓)i(k1(𝐈+i𝐓)m)j1δij(𝐈+i𝐓)kj1(𝐈+i𝐓)mi1],\displaystyle-\frac{1}{\nu}\lvert\mathbf{I}+i\mathbf{T}\rvert^{-\frac{\nu}{2}}\left[2(\mathbf{I}+i\mathbf{T})^{-1}_{i\left(k\right.}(\mathbf{I}+i\mathbf{T})^{-1}_{m\left.\right)j}-\delta_{ij}(\mathbf{I}+i\mathbf{T})^{-1}_{kj}(\mathbf{I}+i\mathbf{T})^{-1}_{mi}\right], (78)

where again there is no summation over repeated indices, and parentheses denote symmetrization over enclosed indices. Substituting this in to Equation (73), integrating by parts and noting that the boundary term vanishes (which it must do for the Fourier transforms to converge) gives

Hp,νijkm(𝛀)=12ν(2𝛀𝛀𝐈)ijPp,ν(𝛀)Ωkmν+12δijHp,νijkm(𝛀)2νHp,νi(km)j(𝛀)+1νδijHp,νkjmi(𝛀).H^{ijkm}_{p,\nu}(\bm{\Omega})=\frac{1}{2\nu}(2\bm{\Omega}-\bm{\Omega}\circ\mathbf{I})_{ij}P_{p,\nu}(\bm{\Omega})\frac{\Omega_{km}}{\nu}+\frac{1}{2}\delta_{ij}H^{ijkm}_{p,\nu}(\bm{\Omega})-\frac{2}{\nu}H^{i(km)j}_{p,\nu}(\bm{\Omega})+\frac{1}{\nu}\delta_{ij}H^{kjmi}_{p,\nu}(\bm{\Omega}). (79)

Permuting indices we can get three simultaneous equations for Hp,νijkmH^{ijkm}_{p,\nu}. The solution is

Hp,νijkm(𝛀)=(ν+1)ν(ν+2)(ν1)[ΩijΩkm2(ν+1)Ωi(kΩm)j]Pp,ν(𝛀).H^{ijkm}_{p,\nu}(\bm{\Omega})=\frac{(\nu+1)}{\nu(\nu+2)(\nu-1)}\left[\Omega_{ij}\Omega_{km}-\frac{2}{(\nu+1)}\Omega_{i\left(k\right.}\Omega_{\left.m\right)j}\right]P_{p,\nu}(\bm{\Omega}). (80)

Putting everything together gives the corrected density for the 𝐂^l\hat{\mathbf{C}}_{l} as

p({𝐂^l})\displaystyle p(\{\hat{\mathbf{C}}_{l}\}) =[l=lminlmaxWp(𝐂^l;𝐂l/ν,ν)]{1+18l1,l2ν1ν2ΔC^l1abΔC^l2cdNGCl1,ai1Cl1,bj1Cl2,ck1Cl2,dm1\displaystyle=\left[\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}W_{p}(\hat{\mathbf{C}}_{l};\mathbf{C}_{l}/\nu,\nu)\right]\left\{1+\frac{1}{8}\sum_{l_{1},l_{2}}\nu_{1}\nu_{2}\langle\Delta\hat{C}^{ab}_{l_{1}}\Delta\hat{C}^{cd}_{l_{2}}\rangle_{\mathrm{NG}}C^{-1}_{l_{1},ai}C^{-1}_{l_{1},bj}C^{-1}_{l_{2},ck}C^{-1}_{l_{2},dm}\vphantom{\frac{1}{2}}\right.
×[ΔC^l1,ijΔC^l2,km+2δl1l2(ν+2)(ν1)(C^l1,ijC^l1,kmν1C^l1,i(kC^l1,m)j)]},\displaystyle\times\left.\left[\Delta\hat{C}_{l_{1},ij}\Delta\hat{C}_{l_{2},km}+\frac{2\delta_{l_{1}l_{2}}}{(\nu+2)(\nu-1)}\left(\hat{C}_{l_{1},ij}\hat{C}_{l_{1},km}-\nu_{1}\hat{C}_{l_{1},i\left(k\right.}\hat{C}_{l_{1},\left.m\right)j}\right)\right]\right\}, (81)

with implicit summation over repeated indices. One can verify that the distribution Equation (81) is correctly normalised and has a mean given by 𝐂^l=𝐂l\langle\hat{\mathbf{C}}_{l}\rangle=\mathbf{C}_{l}, i.e. unchanged from the Wishart case.

We can write the correction in a more compact vectorial notation as follows. We define the Gaussian covariance matrix

𝚺lvecp(Δ𝐂^l)vecpT(Δ𝐂^l)G=2ν𝐁pT(𝐂l𝐂l)𝐁p,\bm{\Sigma}_{l}\equiv\langle\mathrm{vecp}(\Delta\hat{\mathbf{C}}_{l})\mathrm{vecp}^{T}(\Delta\hat{\mathbf{C}}_{l})\rangle_{G}=\frac{2}{\nu}\mathbf{B}_{p}^{T}(\mathbf{C}_{l}\otimes\mathbf{C}_{l})\mathbf{B}_{p}, (82)

which implies that 𝚺l1=ν2𝐁p+(𝐂l1𝐂l1)𝐁l+T\bm{\Sigma}^{-1}_{l}=\frac{\nu}{2}\mathbf{B}_{p}^{+}(\mathbf{C}_{l}^{-1}\otimes\mathbf{C}_{l}^{-1})\mathbf{B}_{l}^{+T}. After some manipulations, the term quadratic in ΔC^l\Delta\hat{C}_{l} in Equation (17) can be written as

12l1,l2vecpT(Δ𝐂^l1)𝚺l11vecp(Δ𝐂^l1)vecp(Δ𝐂^l2)NG𝚺l21vecpT(Δ𝐂^l2),\frac{1}{2}\sum_{l_{1},l_{2}}\mathrm{vecp}^{T}(\Delta\hat{\mathbf{C}}_{l_{1}})\bm{\Sigma}_{l_{1}}^{-1}\langle\mathrm{vecp}(\Delta\hat{\mathbf{C}}_{l_{1}})\mathrm{vecp}(\Delta\hat{\mathbf{C}}_{l_{2}})\rangle_{\mathrm{NG}}\bm{\Sigma}_{l_{2}}^{-1}\mathrm{vecp}^{T}(\Delta\hat{\mathbf{C}}_{l_{2}}), (83)

which is the next-to-leading-order term in the expansion of

12l1,l2vecpT(Δ𝐂^l1)[𝚺l1δl1l2+vecp(Δ𝐂^l1)vecpT(Δ𝐂^l2)NG]1vecpT(Δ𝐂^l2),-\frac{1}{2}\sum_{l_{1},l_{2}}\mathrm{vecp}^{T}(\Delta\hat{\mathbf{C}}_{l_{1}})\left[\bm{\Sigma}_{l_{1}}\delta_{l_{1}l_{2}}+\langle\mathrm{vecp}(\Delta\hat{\mathbf{C}}_{l_{1}})\mathrm{vecp}^{T}(\Delta\hat{\mathbf{C}}_{l_{2}})\rangle_{\mathrm{NG}}\right]^{-1}\mathrm{vecp}^{T}(\Delta\hat{\mathbf{C}}_{l_{2}}), (84)

i.e. an expansion of a chi-squared having the inverse of the total covariance matrix. Once the Gaussian part of this expansion is ‘available’ when the ν\nu\to\infty limit is taken, we will be left a Gaussian possessing the correct inverse covariance matrix in its exponent.

The first of the two terms quadratic in C^l\hat{C}_{l} in Equation (81) has the same form as the term above, and can be written as a quadratic in vecp(𝐂^l)\mathrm{vecp}(\hat{\mathbf{C}}_{l}). Note that this term is suppressed with respect to the second term quadratic in C^l\hat{C}_{l} by a factor of ν\nu. This second term can be written as, with 𝐀l𝐂l1𝐂^l𝐂l1\mathbf{A}_{l}\equiv\mathbf{C}_{l}^{-1}\hat{\mathbf{C}}_{l}\mathbf{C}_{l}^{-1}

14l1ν13(ν1+2)(ν11)Tr[(𝐀l1𝐀l1)vec(Δ𝐂^l1)vecT(Δ𝐂^l1)NG]\displaystyle-\frac{1}{4}\sum_{l_{1}}\frac{\nu_{1}^{3}}{(\nu_{1}+2)(\nu_{1}-1)}\mathrm{Tr}\left[(\mathbf{A}_{l_{1}}\otimes\mathbf{A}_{l_{1}})\langle\mathrm{vec}(\Delta\hat{\mathbf{C}}_{l_{1}})\mathrm{vec}^{T}(\Delta\hat{\mathbf{C}}_{l_{1}})\rangle_{\mathrm{NG}}\right]
=14l1ν13(ν1+2)(ν11)Tr[𝐁p+(𝐀l1𝐀l1)𝐁p+Tvecp(Δ𝐂^l1)vecpT(Δ𝐂^l1)NG].\displaystyle=-\frac{1}{4}\sum_{l_{1}}\frac{\nu_{1}^{3}}{(\nu_{1}+2)(\nu_{1}-1)}\mathrm{Tr}\left[\mathbf{B}_{p}^{+}(\mathbf{A}_{l_{1}}\otimes\mathbf{A}_{l_{1}})\mathbf{B}_{p}^{+T}\langle\mathrm{vecp}(\Delta\hat{\mathbf{C}}_{l_{1}})\mathrm{vecp}^{T}(\Delta\hat{\mathbf{C}}_{l_{1}})\rangle_{\mathrm{NG}}\right]. (85)

This looks like the next-to-leading-order term in an expansion of the determinant of the total covariance, with the exception that it depends on the measured power. In the limit ν1\nu_{1}\to\infty we can write 𝐂^l=𝐂l+𝒪(ν1)\hat{\mathbf{C}}_{l}=\mathbf{C}_{l}+\mathcal{O}(\nu^{-1}), so we set 𝐂^l=𝐂l\hat{\mathbf{C}}_{l}=\mathbf{C}_{l} in this term and hence 𝐀l=𝐂l1\mathbf{A}_{l}=\mathbf{C}_{l}^{-1}. In that limit, this term dominates over the first term quadratic in C^l\hat{C}_{l} and is precisely the expansion of the determinant of the total covariance, i.e. the prefactor in the asymptotic Gaussian density.

B.4 Signal with non-zero bispectrum

We now consider a signal having non-zero bispectrum. The signal characteristic function reads

ϕs({𝐤lm})=(1172κlm¯ijkκlm¯lmnkl1m1ikl2m2jkl3m3kkl1m1lkl2m2mkl3m3n)ϕsG({𝐤lm}).\phi_{s}(\{\mathbf{k}_{lm}\})=\left(1-\frac{1}{72}\kappa^{ijk}_{\underline{lm}}\kappa^{lmn}_{\underline{l^{\prime}m^{\prime}}}k^{i}_{l_{1}m_{1}}k^{j}_{l_{2}m_{2}}k^{k}_{l_{3}m_{3}}k^{l}_{l_{1}^{\prime}m_{1}^{\prime}}k^{m}_{l_{2}^{\prime}m_{2}^{\prime}}k^{n}_{l_{3}^{\prime}m_{3}^{\prime}}\right)\phi_{s}^{G}(\{\mathbf{k}_{lm}\}). (86)

Following the same steps as the trispectrum case, the correction to the characteristic function is

Δϕ{ν𝐂^l}({𝐉l})=\displaystyle\Delta\phi_{\{\nu\hat{\mathbf{C}}_{l}\}}(\{\mathbf{J}_{l}\})= (l=lminlmax|𝐈+2i𝐉l𝐂l|ν2)\displaystyle\left(\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\lvert\mathbf{I}+2i\mathbf{J}_{l}\mathbf{C}_{l}\rvert^{-\frac{\nu}{2}}\right)
×[1112l1,l2,l3ν1ν2ν3B~l1l2l3ijk,lmn(𝐂l1i2𝐉l11)il1(𝐂l2i2𝐉l21)jm1(𝐂l3i2𝐉l31)kn1].\displaystyle\times\left[1-\frac{1}{12}\sum_{l_{1},l_{2},l_{3}}\nu_{1}\nu_{2}\nu_{3}\tilde{B}^{ijk,lmn}_{l_{1}l_{2}l_{3}}\left(\mathbf{C}_{l_{1}}-\frac{i}{2}\mathbf{J}_{l_{1}}^{-1}\right)^{-1}_{il}\left(\mathbf{C}_{l_{2}}-\frac{i}{2}\mathbf{J}_{l_{2}}^{-1}\right)^{-1}_{jm}\left(\mathbf{C}_{l_{3}}-\frac{i}{2}\mathbf{J}_{l_{3}}^{-1}\right)^{-1}_{kn}\right]. (87)

In the limit that Jl0J_{l}\to 0 this is a cubic correction to the Wishart characteristic function, telling us that the effect of B~\tilde{B} is to modify the three-point function or skewness of C^l\hat{C}_{l}, as expected. Just as in the trispectrum case, the correction is not a polynomial in JlJ_{l} and hence not an Edgeworth expansion, even though it is perturbatively ‘close’ to the zero-order Wishart distribution111111Note that to get the characteristic function of C^l\hat{C}_{l} rather than νC^l\nu\hat{C}_{l} one has to replace JlJ_{l} with Jl/νJ_{l}/\nu..

Inverse-Fourier transforming Equation (87) gives the correction to the density as

Δp({ν𝐂^l})=1122λp(p1)2d{𝐉l}(2π)nλeilTr(ν𝐉l𝐂^l)(l=lminlmax|𝐈+2i𝐉l𝐂l|ν2)l1,l2,l3ν1ν2ν3B~l1l2l3ijk,lmn(𝐌l1)il(𝐌l2)jm(𝐌l3)kn,\Delta p(\{\nu\hat{\mathbf{C}}_{l}\})=-\frac{1}{12}2^{\frac{\lambda p(p-1)}{2}}\int\frac{\mathrm{d}\{\mathbf{J}_{l}\}}{(2\pi)^{n\lambda}}e^{i\sum_{l}\mathrm{Tr}(\nu\mathbf{J}_{l}\hat{\mathbf{C}}_{l})}\left(\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\lvert\mathbf{I}+2i\mathbf{J}_{l}\mathbf{C}_{l}\rvert^{-\frac{\nu}{2}}\right)\sum_{l_{1},l_{2},l_{3}}\nu_{1}\nu_{2}\nu_{3}\tilde{B}^{ijk,lmn}_{l_{1}l_{2}l_{3}}(\mathbf{M}_{l_{1}})^{il}(\mathbf{M}_{l_{2}})^{jm}(\mathbf{M}_{l_{3}})^{kn}, (88)

with 𝐌l=[𝐂l+(2i𝐉l)1]1\mathbf{M}_{l}=[\mathbf{C}_{l}+(2i\mathbf{J}_{l})^{-1}]^{-1}.

Once again the only new term is the case l1=l2=l3l_{1}=l_{2}=l_{3}. The new integral we need is

Kp,νab,cd,ef(𝛀)d𝐓ei2Tr(𝛀𝐓)|𝐈+i𝐓|ν2[(𝐈+i𝐓)1]ab[(𝐈+i𝐓)1]cd[(𝐈+i𝐓)1]ef.K^{ab,cd,ef}_{p,\nu}(\bm{\Omega})\equiv\int\mathrm{d}\mathbf{T}\,e^{\frac{i}{2}\mathrm{Tr}(\bm{\Omega}\mathbf{T})}\,\lvert\mathbf{I}+i\mathbf{T}\rvert^{-\frac{\nu}{2}}\left[(\mathbf{I}+i\mathbf{T})^{-1}\right]_{ab}\left[(\mathbf{I}+i\mathbf{T})^{-1}\right]_{cd}\left[(\mathbf{I}+i\mathbf{T})^{-1}\right]_{ef}. (89)

Our strategy will again be to take derivatives. Letting 𝐀(𝐈+i𝐓)1\mathbf{A}\equiv(\mathbf{I}+i\mathbf{T})^{-1}, we have the identity

|𝐀|ν2AabAijAkm\displaystyle\lvert\mathbf{A}\rvert^{\frac{\nu}{2}}A_{ab}A_{ij}A_{km} =iνTij(|𝐀|ν2AkmAab)+12δij|𝐀|ν2AabAijAkm\displaystyle=\frac{i}{\nu}\frac{\partial}{\partial T_{ij}}\left(\lvert\mathbf{A}\rvert^{\frac{\nu}{2}}A_{km}A_{ab}\right)+\frac{1}{2}\delta_{ij}\lvert\mathbf{A}\rvert^{\frac{\nu}{2}}A_{ab}A_{ij}A_{km}
2ν|𝐀|ν2AabAk(iAj)m+1νδij|𝐀|ν2AabAkiAmj2ν|𝐀|ν2AkmAa(iAj)b+1νδij|𝐀|ν2AkmAaiAjb,\displaystyle-\frac{2}{\nu}\lvert\mathbf{A}\rvert^{\frac{\nu}{2}}A_{ab}A_{k\left(i\right.}A_{\left.j\right)m}+\frac{1}{\nu}\delta_{ij}\lvert\mathbf{A}\rvert^{\frac{\nu}{2}}A_{ab}A_{ki}A_{mj}-\frac{2}{\nu}\lvert\mathbf{A}\rvert^{\frac{\nu}{2}}A_{km}A_{a\left(i\right.}A_{\left.j\right)b}+\frac{1}{\nu}\delta_{ij}\lvert\mathbf{A}\rvert^{\frac{\nu}{2}}A_{km}A_{ai}A_{jb}, (90)

with no summation over repeated indices.

Integrating by parts, substituting Equation (80), and defining Kp,νab,ij,kmMab,ij,kmPp,νK^{ab,ij,km}_{p,\nu}\equiv M^{ab,ij,km}P_{p,\nu} (where Pp,νP_{p,\nu} is defined in Equation (56)) gives

Mab,ij,km12δijMab,ij,km+2νMab,k(i,j)m1νδijMab,ki,mj+2νMkm,a(i,j)b1νδijMkm,ai,jb\displaystyle M^{ab,ij,km}-\frac{1}{2}\delta_{ij}M^{ab,ij,km}+\frac{2}{\nu}M^{ab,k(i,j)m}-\frac{1}{\nu}\delta_{ij}M^{ab,ki,mj}+\frac{2}{\nu}M^{km,a(i,j)b}-\frac{1}{\nu}\delta_{ij}M^{km,ai,jb}
=(ν+1)ν2(ν+2)(ν1)(Ωij12Ωijδij)(ΩkmΩab2ν+1Ωk(aΩb)m).\displaystyle=\frac{(\nu+1)}{\nu^{2}(\nu+2)(\nu-1)}\left(\Omega_{ij}-\frac{1}{2}\Omega_{ij}\delta_{ij}\right)\left(\Omega_{km}\Omega_{ab}-\frac{2}{\nu+1}\Omega_{k\left(a\right.}\Omega_{\left.b\right)m}\right). (91)

Multiplying by δij\delta_{ij} and subtracting the result gives

Mab,ij,km+2νMab,k(i,j)m+2νMkm,a(i,j)b=(ν+1)ν2(ν+2)(ν1)(ΩijΩkmΩab2ν+1ΩijΩk(aΩb)m).M^{ab,ij,km}+\frac{2}{\nu}M^{ab,k(i,j)m}+\frac{2}{\nu}M^{km,a(i,j)b}=\frac{(\nu+1)}{\nu^{2}(\nu+2)(\nu-1)}\left(\Omega_{ij}\Omega_{km}\Omega_{ab}-\frac{2}{\nu+1}\Omega_{ij}\Omega_{k\left(a\right.}\Omega_{\left.b\right)m}\right). (92)

We can write two further equations like Equation (92) by taking derivatives with respect to TkmT_{km} and TabT_{ab}. Adding all three equations gives an expression manifestly symmetric in its indices

3Mab,ij,km+2ν(Mab,ki,jm+Mab,kj,im+Mkm,aj,ib+Mkm,ai,jb+Mij,ak,mb+Mij,am,kb)\displaystyle 3M^{ab,ij,km}+\frac{2}{\nu}\left(M^{ab,ki,jm}+M^{ab,kj,im}+M^{km,aj,ib}+M^{km,ai,jb}+M^{ij,ak,mb}+M^{ij,am,kb}\right)
=(ν+1)ν2(ν+2)(ν1)[3ΩijΩkmΩab(ΩijΩkaΩbm+ΩijΩkbΩam+ΩkmΩiaΩbj+ΩkmΩibΩaj+ΩabΩkiΩjm+ΩabΩkjΩim)1+ν].\displaystyle=\frac{(\nu+1)}{\nu^{2}(\nu+2)(\nu-1)}\left[3\Omega_{ij}\Omega_{km}\Omega_{ab}-\frac{\left(\Omega_{ij}\Omega_{ka}\Omega_{bm}+\Omega_{ij}\Omega_{kb}\Omega_{am}+\Omega_{km}\Omega_{ia}\Omega_{bj}+\Omega_{km}\Omega_{ib}\Omega_{aj}+\Omega_{ab}\Omega_{ki}\Omega_{jm}+\Omega_{ab}\Omega_{kj}\Omega_{im}\right)}{1+\nu}\right]. (93)

From the p=1p=1 case we know that the solution must be cubic in the elements of 𝛀\bm{\Omega}. There are three kinds of term that we can use to build Mab,ij,kmM^{ab,ij,km} dictated by the symmetries in play, given by

Mab,ij,km=\displaystyle M^{ab,ij,km}= XΩijΩkmΩab\displaystyle X\Omega_{ij}\Omega_{km}\Omega_{ab}
+2\displaystyle+2 Y[ΩijΩk(aΩb)m+ΩkmΩi(aΩb)j+ΩabΩk(iΩj)m]\displaystyle Y[\Omega_{ij}\Omega_{k\left(a\right.}\Omega_{\left.b\right)m}+\Omega_{km}\Omega_{i\left(a\right.}\Omega_{\left.b\right)j}+\Omega_{ab}\Omega_{k\left(i\right.}\Omega_{\left.j\right)m}]
+\displaystyle+ 2Z[ΩjmΩk(aΩb)i+ΩikΩj(aΩb)m+ΩjkΩi(aΩb)m+ΩimΩk(aΩb)j],\displaystyle 2Z[\Omega_{jm}\Omega_{k\left(a\right.}\Omega_{\left.b\right)i}+\Omega_{ik}\Omega_{j\left(a\right.}\Omega_{\left.b\right)m}+\Omega_{jk}\Omega_{i\left(a\right.}\Omega_{\left.b\right)m}+\Omega_{im}\Omega_{k\left(a\right.}\Omega_{\left.b\right)j}], (94)

for unknown coefficients X,Y,ZX,Y,Z. The three terms here are closed under the symmetries of Mab,ij,kmM^{ab,ij,km}. Substituting in to Equation (93) and identifying coefficients of these terms gives the following equations for the unknown coefficients

3X+12νY\displaystyle 3X+\frac{12}{\nu}Y =3(ν+1)ν2(ν+2)(ν1)\displaystyle=\frac{3(\nu+1)}{\nu^{2}(\nu+2)(\nu-1)} (95)
6Y+2ν(2X+2Y+8Z)\displaystyle 6Y+\frac{2}{\nu}(2X+2Y+8Z) =2ν2(ν+2)(ν1)\displaystyle=-\frac{2}{\nu^{2}(\nu+2)(\nu-1)} (96)
6Z+2ν(6Y+6Z)\displaystyle 6Z+\frac{2}{\nu}(6Y+6Z) =0.\displaystyle=0. (97)

The solution is

X\displaystyle X =ν2+3ν2ν(ν1)(ν2)(ν+2)(ν+4)\displaystyle=\frac{\nu^{2}+3\nu-2}{\nu(\nu-1)(\nu-2)(\nu+2)(\nu+4)} (98)
Y\displaystyle Y =1ν(ν1)(ν2)(ν+4)\displaystyle=\frac{-1}{\nu(\nu-1)(\nu-2)(\nu+4)} (99)
Z\displaystyle Z =2ν(ν1)(ν2)(ν+2)(ν+4).\displaystyle=\frac{2}{\nu(\nu-1)(\nu-2)(\nu+2)(\nu+4)}. (100)

The integral Equation (89) is thus

Kp,nuab,cd,ef(𝛀)=Pp,ν(𝛀)ν(ν+4)(ν2)(ν1){ν2+3ν2ν+2ΩabΩcdΩef2[ΩcdΩe(aΩb)f+ΩefΩc(aΩb)d+ΩabΩe(cΩd)f]\displaystyle K_{p,nu}^{ab,cd,ef}(\bm{\Omega})=\frac{P_{p,\nu}(\bm{\Omega})}{\nu(\nu+4)(\nu-2)(\nu-1)}\left\{\frac{\nu^{2}+3\nu-2}{\nu+2}\Omega_{ab}\Omega_{cd}\Omega_{ef}-2[\Omega_{cd}\Omega_{e\left(a\right.}\Omega_{\left.b\right)f}+\Omega_{ef}\Omega_{c\left(a\right.}\Omega_{\left.b\right)d}+\Omega_{ab}\Omega_{e\left(c\right.}\Omega_{\left.d\right)f}]\right.
+4ν+2[ΩdfΩe(aΩb)c+ΩceΩd(aΩb)f+ΩdeΩc(aΩb)f+ΩcfΩe(aΩb)d]},\displaystyle\left.+\frac{4}{\nu+2}[\Omega_{df}\Omega_{e\left(a\right.}\Omega_{\left.b\right)c}+\Omega_{ce}\Omega_{d\left(a\right.}\Omega_{\left.b\right)f}+\Omega_{de}\Omega_{c\left(a\right.}\Omega_{\left.b\right)f}+\Omega_{cf}\Omega_{e\left(a\right.}\Omega_{\left.b\right)d}]\right\}, (101)

Putting everything together gives the correction to the density as

p({𝐂^l})=[l=lminlmaxWp(𝐂^l;𝐂l/ν,ν)]{1+112l1,l2,l3ν1ν2ν3B~l1l2l3abc,defCl1,ai1Cl2,bj1Cl3,ck1Cl1,dl1Cl2,em1Cl3,fn1\displaystyle p(\{\hat{\mathbf{C}}_{l}\})=\left[\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}W_{p}(\hat{\mathbf{C}}_{l};\mathbf{C}_{l}/\nu,\nu)\right]\left\{1+\frac{1}{12}\sum_{l_{1},l_{2},l_{3}}\nu_{1}\nu_{2}\nu_{3}\tilde{B}_{l_{1}l_{2}l_{3}}^{abc,def}C^{-1}_{l_{1},ai}C^{-1}_{l_{2},bj}C^{-1}_{l_{3},ck}C^{-1}_{l_{1},dl}C^{-1}_{l_{2},em}C^{-1}_{l_{3},fn}\vphantom{\frac{1}{2}}\right.
×[ΔC^l1,ilΔC^l2,jmΔC^l3,kn+[3]2δl1l2(ν1+2)(ν11)C^l1,ilC^l1,jmΔC^l3,kn[3]2δl1l2ν1(ν1+2)(ν11)C^l1,i(jC^l1,m)lΔC^l3,kn\displaystyle\times\left[\vphantom{\frac{1}{2}}\Delta\hat{C}_{l_{1},il}\Delta\hat{C}_{l_{2},jm}\Delta\hat{C}_{l_{3},kn}+[3]\frac{2\delta_{l_{1}l_{2}}}{(\nu_{1}+2)(\nu_{1}-1)}\hat{C}_{l_{1},il}\hat{C}_{l_{1},jm}\Delta\hat{C}_{l_{3},kn}-[3]\frac{2\delta_{l_{1}l_{2}}\nu_{1}}{(\nu_{1}+2)(\nu_{1}-1)}\hat{C}_{l_{1},i\left(j\right.}\hat{C}_{l_{1},\left.m\right)l}\Delta\hat{C}_{l_{3},kn}\right.
+32δl1l2δl2l3(ν1+2)(ν11)(ν1+4)(ν12)C^l1,ilC^l1,jmC^l1,kn[3]16ν1δl1l2δl2l3(ν1+2)(ν11)(ν1+4)(ν12)C^l1,jmC^l1,k(iC^l1,l)n\displaystyle+\frac{32\delta_{l_{1}l_{2}}\delta_{l_{2}l_{3}}}{(\nu_{1}+2)(\nu_{1}-1)(\nu_{1}+4)(\nu_{1}-2)}\hat{C}_{l_{1},il}\hat{C}_{l_{1},jm}\hat{C}_{l_{1},kn}-[3]\frac{16\nu_{1}\delta_{l_{1}l_{2}}\delta_{l_{2}l_{3}}}{(\nu_{1}+2)(\nu_{1}-1)(\nu_{1}+4)(\nu_{1}-2)}\hat{C}_{l_{1},jm}\hat{C}_{l_{1},k\left(i\right.}\hat{C}_{l_{1},\left.l\right)n}
+[4]4ν12δl1l2δl2l3(ν1+2)(ν11)(ν1+4)(ν12)C^l1,mnC^l1,k(iC^l1,l)j]},\displaystyle\left.\left.+[4]\frac{4\nu_{1}^{2}\delta_{l_{1}l_{2}}\delta_{l_{2}l_{3}}}{(\nu_{1}+2)(\nu_{1}-1)(\nu_{1}+4)(\nu_{1}-2)}\hat{C}_{l_{1},mn}\hat{C}_{l_{1},k\left(i\right.}\hat{C}_{l_{1},\left.l\right)j}\right]\vphantom{\frac{1}{2}}\right\}, (102)

with implicit summation over repeated indices. We have verified that this distribution is correctly normalised. Computing the corrections to the mean, covariance, and three-point function is tedious so we have only checked for the p=1p=1 case, confirming that these three cumulants are recovered correctly.

Equation (81) and Equation (102) give the leading order correction to the Wishart distribution from signal non-Gaussian. The expressions are clearly quite cumbersome when p1p\neq 1, but may be simplified using the symmetries of the bin indices. It may also be possible to write these expressions in a more covariant way, although we were not able to find a more compact expression.

Appendix C Alternative binning choices

The default results of Section III.2 are made with evenly space logarithmically binned power spectra with Δlogl=0.1\Delta\log l=0.1 and a weight function Wl=2l+1W_{l}=2l+1. The mathematical details of the binning are described in Section II.4.

The main use of binning for our purposes is to allow us to easily measure the covariance matrix of our lognormal maps from a finite number of simulations. Binning reduces the Monte Carlo noise in this measurement and increases the relative strength of the non-Gaussian part.

Refer to caption
Figure 9: The leading-order fractional corrections to the Gamma posterior with bandpowers binned with Δlogl=0.2\Delta\log l=0.2. Curves have the same meaning as in Figure 7.

As discussed in Section II.4, the correction to posteriors from signal non-Gaussianity in expected to depend only mildly on the bin width, as we verify numerically in Figure 9 where we plot the corrections to ClC_{l} and amplitude posteriors for logarithmic bins with Δlogl=0.2\Delta\log l=0.2. This plot should be compared with Figure 7. The impact of these broader bins is very mild. The broader bins roughly preserve the total mode count when Wl=2l+1W_{l}=2l+1, so the effect on the posterior is naturally expected to be small.

It is beyond the scope of this paper to provide an exhaustive investigation of the different binning choices for the power spectrum, but a few obvious schemes deserve discussion. Instead of logarithmic bins we can also choose linearly-spaced bins. This choice compresses many of the large-scale modes into a single bin and hence does not allow the low-ll behaviour of the likelihood to be clearly seen, so we chose not to adopt this choice in our baseline results. Another alternative is to use a weight function Wl=l(l+1)W_{l}=l(l+1), which is a common choice for power spectrum bandpowers. This upweights the smallest scales in each bin and consequently causes our likelihood approximation to break down at smaller lbl_{b} (the bin barycentre) for logarithmic bins with Δl=0.1\Delta l=0.1. An advantage of our formalism is that the likelihood correction diagnostic ε\langle\varepsilon\rangle can be computed for any binning choice, something which we advocate when analysing power spectra binned with this weight function.

We close by noting that any binning choice may be adopted so long as the bins are sufficiently small that the power spectrum can be approximated as flat over the extent of the bin. This assumption can be avoided entirely by working with C^l/Cl\hat{C}_{l}/C_{l} rather than ClC_{l}, as discussed in Section II.4.

Appendix D Dependence on source redshift

The baseline results in this paper have assumed a broad source redshift bin following a Euclid-like distribution, peaking at zs1z_{s}\approx 1. Lensing surveys typically place galaxies in tomographic redshift bins - the fiducial choice of Ref. Euclid Collaboration et al. (2020) uses 10 redshift bins, which is representative of upcoming Stage-IV surveys. Assuming equipopulated bins, the redshift distribution of the lowest bin is approximately Gaussian (after accounting for photometric redshift errors) with a peak around zs=0.325z_{s}=0.325 and width σz=0.1\sigma_{z}=0.1. Non-Gaussianity in the shear field is relatively stronger at a fixed angular scale at lower redshift due to that scale subtending smaller, more non-linear spatial scales and the evolution of structure formation towards lower redshift. This is offset to an extent by the overall signal strength being suppressed by the lensing kernel compared with a bin at higher source redshift. This also makes the relative effect of any Gaussianizing shape noise stronger.

To test our likelihood approximation in what is likely the most extreme case for a Euclid-like survey, we generate an ensemble of lognormal fields for this low-redshift source bin. We scale the ‘shift’ parameter quantifying the strength of the lognormality according to the redshift-scaling formula proposed by Ref. Xavier et al. (2016), confirming this provides a reasonable match to the covariance matrix of the ray-traced N-body simulations of Ref. Takahashi et al. (2017).

Refer to caption
Figure 10: Leading-order fractional corrections to the Gamma posterior for a Gaussian source redshift distribution centered at zs=0.325z_{s}=0.325 with width σz=0.1\sigma_{z}=0.1. Power spectra are assumed to be binned with Δlogl=0.1\Delta\log l=0.1. Curves have the same meaning as in Figure 7.

Figure 10 shows the leading-order corrections to the posterior and the convergence statistics for this redshift bin, which should be compared with Figure 7. The stronger non-Gaussianity is clear to see, with the blue curves quantifying the correction to the Fisher matrix on an individual ClC_{l} (left panel) or an amplitude parameter (right panel) passing unity around l100l\approx 100. Our assumption of perturbative non-Gaussianity in the signal is valid only up to around l30l\approx 30. At this angular scale the Gamma distribution is reasonably well approximated by a Gaussian, and passing the full non-Gaussian covariance matrix to this Gaussian absorbs most of the correction to the posterior that our approximation implies. Residual post-Gaussian effects are at most 1%, and extrapolating out to l=1000l=1000 we see that corrections are likely to be no more than 10%. It therefore appears that even for this low source redshift bin the scale of non-Gaussianity (l30l\approx 30) and the scale where Gamma-type corrections are necessary (l10l\approx 10) are separated enough that the likelihood can be approximated as transitioning from a Gamma to a Gaussian with the correct covariance matrix.

Refer to caption
Figure 11: Same as Figure 10 but with shape noise corresponding to n¯=3sq.arcmin1\bar{n}=3\,\mathrm{sq.\,arcmin}^{-1}.

When shape noise is added the effects of non-Gaussianity are strongly suppressed. Figure 11 shows leading-order corrections to the posterior for a model in which Cl=A(S0,l+Nl)C_{l}=A(S_{0,l}+N_{l}) when applying shape noise with a galaxy number density of n¯=3sq.arcmin1\bar{n}=3\,\mathrm{sq.\,arcmin}^{-1}, appropriate for a single redshift bin in a Euclid-like survey. Our approximation is now convergent out to much higher ll, and post-Gaussian corrections are negligible. It thus appears that although the strength of the signal non-Gaussianity may be stronger at low redshift, the relatively higher noise level suppresses non-Gaussianity in the likelihood.

We close this section by noting that these conclusions may need to be modified when considering galaxy clustering at low redshift. There the geometric suppression suffered by lensing is smaller, and hence the Gaussianizing effect of noise may be expected to be relatively weaker. We have also neglected Intrinsic Alignments, which may be expected to play a greater role in cross-bin combinations at low redshift. We advocate that the statistics plotted in Figure 7 and Figure 10 be computed whenever a likelihood approximation needs to be made.

References

  • Kilbinger (2015) M. Kilbinger, Reports on Progress in Physics 78, 086901 (2015), 1411.0115.
  • Mandelbaum (2018) R. Mandelbaum, ARA&A 56, 393 (2018), 1710.03235.
  • Hamana et al. (2020) T. Hamana, M. Shirasaki, S. Miyazaki, C. Hikage, M. Oguri, S. More, R. Armstrong, A. Leauthaud, R. Mandelbaum, H. Miyatake, et al., PASJ 72, 16 (2020), 1906.06041.
  • Heymans et al. (2021) C. Heymans, T. Tröster, M. Asgari, C. Blake, H. Hildebrandt, B. Joachimi, K. Kuijken, C.-A. Lin, A. G. Sánchez, J. L. van den Busch, et al., A&A 646, A140 (2021), 2007.15632.
  • Abbott et al. (2022) T. M. C. Abbott, M. Aguena, A. Alarcon, S. Allam, O. Alves, A. Amon, F. Andrade-Oliveira, J. Annis, S. Avila, D. Bacon, et al., Phys. Rev. D 105, 023520 (2022), 2105.13549.
  • Abbott et al. (2018) T. M. C. Abbott, F. B. Abdalla, A. Alarcon, J. Aleksić, S. Allam, S. Allen, A. Amara, J. Annis, J. Asorey, S. Avila, et al., Phys. Rev. D 98, 043526 (2018), 1708.01530.
  • Hu (1999) W. Hu, ApJ. Lett. 522, L21 (1999), astro-ph/9904153.
  • Schrabback et al. (2010) T. Schrabback, J. Hartlap, B. Joachimi, M. Kilbinger, P. Simon, K. Benabed, M. Bradač, T. Eifler, T. Erben, C. D. Fassnacht, et al., A&A 516, A63 (2010), 0911.0053.
  • Heymans et al. (2013) C. Heymans, E. Grocutt, A. Heavens, M. Kilbinger, T. D. Kitching, F. Simpson, J. Benjamin, T. Erben, H. Hildebrandt, H. Hoekstra, et al., Mon. Not. R. Astron. Soc. 432, 2433 (2013), 1303.1808.
  • Jee et al. (2016) M. J. Jee, J. A. Tyson, S. Hilbert, M. D. Schneider, S. Schmidt, and D. Wittman, Astrophys. J.  824, 77 (2016), 1510.03962.
  • Percival and Brown (2006) W. J. Percival and M. L. Brown, Mon. Not. R. Astron. Soc. 372, 1104 (2006), astro-ph/0604547.
  • Hamimeche and Lewis (2008) S. Hamimeche and A. Lewis, Phys. Rev. D 77, 103013 (2008), 0801.0554.
  • Upham et al. (2020) R. E. Upham, L. Whittaker, and M. L. Brown, Mon. Not. R. Astron. Soc. 491, 3165 (2020), 1908.00795.
  • Hikage et al. (2019) C. Hikage, M. Oguri, T. Hamana, S. More, R. Mandelbaum, M. Takada, F. Köhlinger, H. Miyatake, A. J. Nishizawa, H. Aihara, et al., PASJ 71, 43 (2019), 1809.09148.
  • Nicola et al. (2021) A. Nicola, C. García-García, D. Alonso, J. Dunkley, P. G. Ferreira, A. Slosar, and D. N. Spergel, J. Cosmol. Astropart. Phys. 2021, 067 (2021), 2010.09717.
  • Loureiro et al. (2021) A. Loureiro, L. Whittaker, A. Spurio Mancini, B. Joachimi, A. Cuceu, M. Asgari, B. Stölzner, T. Tröster, A. H. Wright, M. Bilicki, et al., arXiv e-prints arXiv:2110.06947 (2021), 2110.06947.
  • Sellentin and Heavens (2016) E. Sellentin and A. F. Heavens, Mon. Not. R. Astron. Soc. 456, L132 (2016), 1511.05969.
  • Hall and Taylor (2019) A. Hall and A. Taylor, Mon. Not. R. Astron. Soc. 483, 189 (2019), 1807.06875.
  • Scoccimarro (2000) R. Scoccimarro, Astrophys. J.  544, 597 (2000), astro-ph/0004086.
  • Hartlap et al. (2009) J. Hartlap, T. Schrabback, P. Simon, and P. Schneider, A&A 504, 689 (2009), 0901.3269.
  • Schneider and Hartlap (2009) P. Schneider and J. Hartlap, A&A 504, 705 (2009), 0905.0577.
  • Keitel and Schneider (2011) D. Keitel and P. Schneider, A&A 534, A76 (2011), 1105.3672.
  • Wilking and Schneider (2013) P. Wilking and P. Schneider, A&A 556, A70 (2013), 1304.4781.
  • Sellentin and Heavens (2018) E. Sellentin and A. F. Heavens, Mon. Not. R. Astron. Soc. 473, 2355 (2018), 1707.04488.
  • Sellentin et al. (2018) E. Sellentin, C. Heymans, and J. Harnois-Déraps, Mon. Not. R. Astron. Soc. 477, 4879 (2018), 1712.04923.
  • Hahn et al. (2019) C. Hahn, F. Beutler, M. Sinha, A. Berlind, S. Ho, and D. W. Hogg, Mon. Not. R. Astron. Soc. 485, 2956 (2019), 1803.06348.
  • Lin et al. (2020) C.-H. Lin, J. Harnois-Déraps, T. Eifler, T. Pospisil, R. Mandelbaum, A. B. Lee, S. Singh, and LSST Dark Energy Science Collaboration, Mon. Not. R. Astron. Soc. 499, 2977 (2020), 1905.03779.
  • Taylor et al. (2019) P. L. Taylor, T. D. Kitching, J. Alsing, B. D. Wandelt, S. M. Feeney, and J. D. McEwen, Phys. Rev. D 100, 023519 (2019), 1904.05364.
  • Diaz Rivero and Dvorkin (2020) A. Diaz Rivero and C. Dvorkin, Phys. Rev. D 102, 103507 (2020), 2007.05535.
  • Friedrich et al. (2021) O. Friedrich, F. Andrade-Oliveira, H. Camacho, O. Alves, R. Rosenfeld, J. Sanchez, X. Fang, T. F. Eifler, E. Krause, C. Chang, et al., Mon. Not. R. Astron. Soc. 508, 3125 (2021), 2012.08568.
  • Smith et al. (2006) S. Smith, A. Challinor, and G. Rocha, Phys. Rev. D 73, 023517 (2006), astro-ph/0511703.
  • Durrett (2010) R. Durrett, Probability: theory and examples. Volume 31 of Cambridge Series in Statistical and Probabilistic Mathematics (Cambridge University Press, Cambridge, UK, 2010).
  • Bouchet et al. (1992) F. R. Bouchet, R. Juszkiewicz, S. Colombi, and R. Pellat, ApJ. Lett. 394, L5 (1992).
  • Juszkiewicz et al. (1995) R. Juszkiewicz, D. H. Weinberg, P. Amsterdamski, M. Chodorowski, and F. Bouchet, Astrophys. J.  442, 39 (1995), astro-ph/9308012.
  • Bernardeau and Kofman (1995) F. Bernardeau and L. Kofman, Astrophys. J.  443, 479 (1995), astro-ph/9403028.
  • Amendola (1996) L. Amendola, Mon. Not. R. Astron. Soc. 283, 983 (1996).
  • Blinnikov and Moessner (1998) S. Blinnikov and R. Moessner, A&AS 130, 193 (1998), astro-ph/9711239.
  • Taylor and Watts (2001) A. N. Taylor and P. I. R. Watts, Mon. Not. R. Astron. Soc. 328, 1027 (2001), astro-ph/0010014.
  • Carron (2013) J. Carron, A&A 551, A88 (2013), 1204.4724.
  • Ponzano and Regge (1968) G. Ponzano and T. Regge, in Spectroscopy and group theoretical methods in Physics, edited by F. Bloch (Amsterdam: North-Holland, 1968).
  • Bernardeau et al. (2011) F. Bernardeau, C. Pitrou, and J.-P. Uzan, J. Cosmol. Astropart. Phys. 2011, 015 (2011), 1012.2652.
  • Munshi et al. (2020) D. Munshi, T. Namikawa, T. D. Kitching, J. D. McEwen, R. Takahashi, F. R. Bouchet, A. Taruya, and B. Bose, Mon. Not. R. Astron. Soc. 493, 3985 (2020), 1910.04627.
  • Xavier et al. (2016) H. S. Xavier, F. B. Abdalla, and B. Joachimi, Mon. Not. R. Astron. Soc. 459, 3693 (2016), 1602.08503.
  • Taruya et al. (2002) A. Taruya, M. Takada, T. Hamana, I. Kayo, and T. Futamase, Astrophys. J.  571, 638 (2002), astro-ph/0202090.
  • Hilbert et al. (2011) S. Hilbert, J. Hartlap, and P. Schneider, A&A 536, A85 (2011), 1105.3980.
  • Martin et al. (2012) S. Martin, P. Schneider, and P. Simon, A&A 540, A9 (2012), 1109.0944.
  • Lewis et al. (2000) A. Lewis, A. Challinor, and A. Lasenby, Astrophys. J.  538, 473 (2000), astro-ph/9911177.
  • Howlett et al. (2012) C. Howlett, A. Lewis, A. Hall, and A. Challinor, J. Cosmol. Astropart. Phys. 2012, 027 (2012), 1201.3654.
  • Euclid Collaboration et al. (2020) Euclid Collaboration, A. Blanchard, S. Camera, C. Carbone, V. F. Cardone, S. Casas, S. Clesse, S. Ilić, M. Kilbinger, T. Kitching, et al., A&A 642, A191 (2020), 1910.09273.
  • Planck Collaboration et al. (2020) Planck Collaboration, N. Aghanim, Y. Akrami, M. Ashdown, J. Aumont, C. Baccigalupi, M. Ballardini, A. J. Banday, R. B. Barreiro, N. Bartolo, et al., A&A 641, A6 (2020), 1807.06209.
  • Mead et al. (2021) A. J. Mead, S. Brieden, T. Tröster, and C. Heymans, Mon. Not. R. Astron. Soc. 502, 1401 (2021), 2009.01858.
  • Górski et al. (2005) K. M. Górski, E. Hivon, A. J. Banday, B. D. Wandelt, F. K. Hansen, M. Reinecke, and M. Bartelmann, Astrophys. J.  622, 759 (2005), astro-ph/0409513.
  • Takahashi et al. (2017) R. Takahashi, T. Hamana, M. Shirasaki, T. Namikawa, T. Nishimichi, K. Osato, and K. Shiroyama, Astrophys. J.  850, 24 (2017), 1706.01472.
  • Benabed et al. (2009) K. Benabed, J. F. Cardoso, S. Prunet, and E. Hivon, Mon. Not. R. Astron. Soc. 400, 219 (2009), 0901.4537.
  • de Belsunce et al. (2021) R. de Belsunce, S. Gratton, W. Coulton, and G. Efstathiou, Mon. Not. R. Astron. Soc. 507, 1072 (2021), 2103.14378.
  • Seljak et al. (2017) U. Seljak, G. Aslanyan, Y. Feng, and C. Modi, J. Cosmol. Astropart. Phys. 2017, 009 (2017), 1706.06645.
  • Wandelt et al. (2004) B. D. Wandelt, D. L. Larson, and A. Lakshminarayanan, Phys. Rev. D 70, 083511 (2004), astro-ph/0310080.
  • Alsing et al. (2019) J. Alsing, T. Charnock, S. Feeney, and B. Wandelt, Mon. Not. R. Astron. Soc. 488, 4440 (2019), 1903.00007.
  • Wang et al. (2019) M. S. Wang, W. J. Percival, S. Avila, R. Crittenden, and D. Bianchi, Mon. Not. R. Astron. Soc. 486, 951 (2019), 1811.08155.
  • Okamoto and Hu (2002) T. Okamoto and W. Hu, Phys. Rev. D 66, 063008 (2002), astro-ph/0206155.
  • Cooray and Sheth (2002) A. Cooray and R. Sheth, Physics Reports 372, 1 (2002), astro-ph/0206508.
  • Krause and Eifler (2017) E. Krause and T. Eifler, Mon. Not. R. Astron. Soc. 470, 2100 (2017), 1601.05779.
  • Harnois-Déraps et al. (2012) J. Harnois-Déraps, S. Vafaei, and L. Van Waerbeke, Mon. Not. R. Astron. Soc. 426, 1262 (2012), 1202.2332.
  • Sato and Nishimichi (2013) M. Sato and T. Nishimichi, Phys. Rev. D 87, 123538 (2013), 1301.3588.
  • Cooray and Hu (2000) A. Cooray and W. Hu, Astrophys. J.  534, 533 (2000), astro-ph/9910397.
  • Bernardeau et al. (2002) F. Bernardeau, S. Colombi, E. Gaztañaga, and R. Scoccimarro, Physics Reports 367, 1 (2002), astro-ph/0112551.
  • Janik and Nowak (2003) R. A. Janik and M. A. Nowak, Journal of Physics A: Mathematical and General 36, 3629 (2003), URL https://doi.org/10.1088/0305-4470/36/12/343.
  • Herz (1955) C. S. Herz, Annals of Mathematics 61, 474 (1955), ISSN 0003486X, URL http://www.jstor.org/stable/1969810.
  • Gupta and Nagar (2000) A. K. Gupta and D. K. Nagar, Matrix Variate Distributions (Chapman and Hall, CRC Press, Boca Raton, FL, USA, 2000).
  • Petersen and Pedersen (2008) K. B. Petersen and M. S. Pedersen, The matrix cookbook (2008), version 20081110, URL http://www2.imm.dtu.dk/pubdb/p.php?3274.