The non-Gaussian likelihood of weak lensing power spectra

Alex Hall \scalerel* 1 ahall@roe.ac.uk Andy Taylor Institute for Astronomy, University of Edinburgh, Royal Observatory, Blackford Hill, Edinburgh, EH9 3HJ, U.K.

Abstract

The power spectrum of weak lensing fluctuations has a non-Gaussian distribution due to its quadratic nature. On small scales the Central Limit Theorem acts to Gaussianize this distribution but non-Gaussianity in the signal due to gravitational collapse is increasing and the functional form of the likelihood is unclear. Analyses have traditionally assumed a Gaussian likelihood with non-linearity incorporated into the covariance matrix; here we provide the theory underpinning this assumption. We calculate, for the first time, the leading-order correction to the distribution of angular power spectra from non-Gaussianity in the underlying signal and study the transition to Gaussianity. Our expressions are valid for an arbitrary number of correlated maps and correct the Wishart distribution in the presence of weak (but otherwise arbitrary) non-Gaussianity in the signal. Surprisingly, the resulting distribution is not equivalent to an Edgeworth expansion. The leading-order effect is to broaden the covariance matrix by the usual trispectrum term, with residual skewness sourced by the trispectrum and the square of the bispectrum. Using lognormal lensing maps we demonstrate that our likelihood is uniquely able to model both large and mildly non-linear scales. We provide easy-to-compute statistics to quantify the size of the non-Gaussian corrections. We show that the full non-Gaussian likelihood can be accurately modelled as a Gaussian on small, non-linear scales. On large angular scales non-linearity in the lensing signal imparts a negligible correction to the likelihood, which takes the Wishart form in the full-sky case. Our formalism is equally applicable to any kind of projected field.

I Introduction

In the coming decade several large astronomy projects aim to place percent-level constraints on cosmic acceleration, testing the $\Lambda$ CDM cosmological model and its extensions. These include the European Space Agency’s Euclid mission¹¹1https://sci.esa.int/web/euclid, the Vera C. Rubin Observatory²²2https://www.lsst.org/about, and the Nancy Grace Roman Space Telescope³³3https://roman.gsfc.nasa.gov/. These ‘Stage-IV’ facilities will each image roughly a billion galaxies, with redshifts derived from photometry calibrated against deep spectroscopic training samples.

Upcoming surveys will use these galaxies to measure the weak gravitational lensing signal (see Refs. Kilbinger (2015); Mandelbaum (2018) for recent reviews), which directly probes the underlying matter distribution and provides the opportunity to derive cosmological constraints from its statistical properties. Current lensing surveys are now achieving precision on certain $\Lambda$ CDM cosmological parameter combinations at the level of a few percent, and providing useful constraints on the properties of dark energy Hamana et al. (2020); Heymans et al. (2021); Abbott et al. (2022). The influence of weak lensing in constraining cosmological parameters is likely to increase with the advent of Stage-IV surveys, if systematic errors can be kept under control.

It is now fairly common practice to combine measurements of the cosmic shear signal with maps of the angular positions of galaxies, sometimes referred to as ‘3 $\times$ 2 pt’ analysis Abbott et al. (2018). The relatively coarse precision of photometric redshifts means that binning galaxies into broad redshift bins is reasonably lossless, a technique called ‘tomography’ Hu (1999); Schrabback et al. (2010); Heymans et al. (2013); Jee et al. (2016). Combining tomographic galaxy clustering and weak lensing are examples of the use of projected maps of dark matter (or discrete tracers thereof) to constrain cosmological models.

The baseline parameter constraints from current and upcoming surveys using projected fields will be derived from two-point statistics, such as correlation functions or power spectra, combined with a likelihood function. Accurate cosmological inference using this approach requires the likelihood to be specified accurately. The functional form of the likelihood has traditionally been assumed to be Gaussian in the measured summary statistics, the rationale being that the Central Limit Theorem (CLT) drives the distribution of these statistics towards Gaussianity. However, it has long been known that the distribution of weak lensing or galaxy clustering two-point statistics is in fact not Gaussian, for several reasons. Firstly, two-point statistics are non-linear (quadratic) functions of the data. If the data are Gaussian distributed, power spectra and correlation functions are expected to be non-Gaussian as a result, which can easily be seen by noting that the variance of the field (the correlation function at zero lag) cannot be negative. On the full sky the distribution of power spectra of Gaussian fields can be written down analytically (e.g. Ref. Percival and Brown (2006)), but in the presence of a survey mask (required to mitigate foreground contamination) the naive estimator (i.e. the pseudo- $C_{l}$ estimator) has a distribution that is complicated to compute and must typically be approximated Hamimeche and Lewis (2008); Upham et al. (2020).

A second source of non-Gaussianity in the likelihood arises from the fact that the fields themselves are not Gaussian due to non-linear gravitational collapse. This issue is inextricably tied to the efficacy of the CLT to Gaussianize the estimator, because both the number of degrees of freedom being compressed by the estimator (e.g. the number of galaxy pairs of a given angular separation) and the strength of the signal non-linearity increase as the minimum angular scale is lowered. Non-Gaussianity of the signal (the assumption of Gaussian noise is typically safe) is usually incorporated by modifying the covariance matrix of the two-point statistic with a term derived from the connected four-point function (trispectrum) of the signal, while keeping the Gaussian functional form (e.g. Hikage et al. (2019); Nicola et al. (2021); Loureiro et al. (2021)).

Additionally, non-Gaussianity in the likelihood can also arise due to the presence of non-Gaussian systematic residuals, or when the covariance has been estimated from simulations Sellentin and Heavens (2016) or combined with an analytic model Hall and Taylor (2019).

Recognising that the approximation of Gaussianity may contribute bias or imprecision to parameter constraints, several works have successfully measured non-Gaussianity in the distribution of two-point statistics from simulations. Ref. Scoccimarro (2000) was an early work to find evidence for non-Gaussian moments of power spectrum estimates, but Ref. Hartlap et al. (2009) and Ref. Schneider and Hartlap (2009) represent the first attempts to incorporate non-Gaussianity into the likelihood of a weak lensing survey. The approach taken in Ref. Hartlap et al. (2009) was to model the data vector (correlation functions in their case) as the sum of independent non-Gaussian components, using the fastICA Independent Component Analysis method to construct the likelihood. Subsequent analytic work by Ref. Keitel and Schneider (2011) and Ref. Wilking and Schneider (2013) further established the non-Gaussianity of the weak lensing correlation functions. Ref. Sellentin and Heavens (2018) introduced diagnostic statistics to identify non-Gaussianity in cosmological likelihood functions, and evidence for non-Gaussianity in a weak lensing likelihood was found in Ref. Sellentin et al. (2018). Ref. Hahn et al. (2019) studied non-Gaussianity in galaxy clustering power spectra, building on and extending the work of Ref. Hartlap et al. (2009), and found non-negligible (albeit sub- $1\sigma$ ) biases in parameters. In the context of weak lensing correlation functions from Stage-IV surveys, Ref. Lin et al. (2020) identified significant skewness and kurtosis in the shear correlation functions and used a Principal Component Analysis to model the resulting non-Gaussian likelihood, but found that a Gaussian approximation was sufficient for $\Lambda$ CDM parameters and an LSST-like survey. This conclusion is supported by the work of Ref. Taylor et al. (2019) who used likelihood-free methods to establish the validity of a Gaussian approximation for Stage-IV surveys and $\Lambda$ CDM parameters. Recently, Ref. Diaz Rivero and Dvorkin (2020) demonstrated clear non-Gaussianity in the likelihood of weak lensing power spectra, and presented a method to model this using normalizing flows. Ref. Friedrich et al. (2021) have also studied the impact of non-Gaussian likelihoods on parameter constraints, finding a negligible impact for the Dark Energy Survey data vector.

What these works have established is that while non-Gaussianity is clearly present in the weak lensing likelihood, the impact of the Gaussian assumption on the posterior of parameters appears to be very modest, at least in $\Lambda$ CDM models. However, there has been relatively little attempt to distinguish between the non-Gaussianity that should arise on survey scales due to the finite number of modes (that makes a Gamma distribution more accurate than a Gaussian) from the non-Gaussianity that could arise on small scales due to the CLT being inefficient at Gaussianizing the summary statistics in the face of increasing non-linearity in the signal. The use of the power spectrum as the statistic of choice aids in distinguishing between these two effects, as well as allowing the modelling to be simplified due to the diagonal covariance that arises for Gaussian fields. As well as being unable to distinguish between various sources of non-Gaussianity, most of the works modelling non-Gaussian likelihoods have used numerical techniques or ad-hoc approximations to capture the corrections, often with little physical justification. This is partly because the full non-Gaussian distribution of the shear or galaxy clustering signal cannot be written down.

In this work, we incorporate the effects of a non-Gaussian signal into the likelihood of the angular power spectrum, and study how this impacts various posterior distributions of interest. To make analytic progress we assume that the signal non-Gaussianity is sufficiently weak that it can be treated perturbatively. While this is a constraining assumption, it will allow us to study in detail how the various sources of non-Gaussianity interact and make sense of the results of the numerical studies referenced above. Our approach is more in the spirit of Ref. Smith et al. (2006), who studied similar kinds of effects in the power spectra of the weakly lensed cosmic microwave background, than the numerical works that have sought to model the full non-perturbative likelihood. We argue that an analytic approach building from the well-understood case of Gaussian fields and full sky coverage offers a valuable way of understanding the Gaussianity, or otherwise, of the likelihood function. Having a tractable model that can be quickly computed (and sampled from) also allows the effects of non-Gaussianity to be studied for any cosmological model, rather than being limited to $\Lambda$ CDM.

This paper is structured as follows. In Section II we present the main expression used in this work, Equation (22), and study its properties. In Section III we detail the simulations used to validate our likelihood and quantify the corrections to the posterior distributions. We present our conclusions in Section IV. In a series of appendices we present the derivation of the likelihood function, study the impact of alternative angular binning choices, and test the impact of the source redshift.

II The likelihood of angular power spectrum estimates

Consider a survey that has made noisy maps $d^{i}(\hat{\mathbf{n}})$ of the large-scale scale structure across the full sky in $p$ tomographic redshift bins labelled by the index $i$ . Taking the spherical harmonic coefficients $a^{i}_{lm}$ of these maps and arranging them into a vector of length $p$ , the angular power spectra including all cross-bin pairs can be packaged into a $p\times p$ matrix $\hat{\mathbf{C}}_{l}$ given by

\nu\hat{\mathbf{C}}_{l}\equiv\sum_{m=-l}^{l}\mathbf{a}_{lm}\mathbf{a}_{lm}^{\dagger},

(1)

where $\nu\equiv 2l+1$ is the number of angular degrees of freedom at each $l$ , and the spherical multipoles of the data consist of signal plus noise as $a^{i}_{lm}=s^{i}_{lm}+n^{i}_{lm}$ . We will assume the maps are spin-0 (i.e. scalar), although this could easily be relaxed; in the case of shear data for example we can define the vector $\mathbf{a}_{lm}$ to include two entries per redshift bin corresponding to the shear $E$ and $B$ modes. Similarly, projected galaxy number counts may be included in the data vector for a full $3\times 2$ pt analysis. The hatted power spectrum in Equation (1) denotes a measured, noisy quantity. We will assume that the maps are purely real, and hence all multipoles obey

\mathbf{a}_{l-m}=(-1)^{m}\mathbf{a}^{*}_{lm}.

(2)

Our goal is to compute the sampling distribution of these power spectrum estimates, $p(\{\hat{\mathbf{C}}_{l}\})$ , with braces indicating the full set of angular multipoles $l$ , given knowledge of the underlying statistical properties of the noise and the signal. The sampling distribution considered as a function of these properties is the likelihood function. The power spectra are quadratic in the data and so are expected to obey Gaussian statistics only when the Central Limit Theorem is effective. The other source of non-Gaussianity is from the signal itself, which when measured in the late Universe on small scales has higher-order moments due to non-linear structure formation.

As discussed in Appendix A we cannot proceed by performing an Edgeworth expansion of the power spectrum distribution around its zero-order Wishart form, so instead we develop a new approach and begin by considering the statistics of the underlying data. Assuming the signal and noise are independent, the joint distribution of the power spectra (multiplied by $\nu$ ) is

p(\{\nu\hat{\mathbf{C}}_{l}\})=\int\mathrm{d}\{\mathbf{s}_{lm}\}\int\mathrm{d}\{\mathbf{n}_{lm}\}\left[\prod_{l=l_{{\rm min}}}^{l_{{\rm max}}}\delta_{D}\left(\nu\hat{\mathbf{C}}_{l}-\sum_{m}\mathbf{a}_{lm}\mathbf{a}_{lm}^{\dagger}\right)\right]p_{s}(\{\mathbf{s}_{lm}\})p_{n}(\{\mathbf{n}_{lm}\}),

(3)

where the probability density $p(\{\nu\hat{\mathbf{C}}_{l}\})$ is defined with a measure over real and symmetric matrices, which have $n\equiv p(p+1)/2$ independent elements. We will expand the Dirac delta in terms of real and symmetric $p$ -dimensional matrices $\mathbf{J}_{l}$ as

\delta_{D}\left(\nu\hat{\mathbf{C}}_{l}-\sum_{m}\mathbf{a}_{lm}\mathbf{a}_{lm}^{\dagger}\right)=2^{\frac{p(p-1)}{2}}\int\frac{\mathrm{d}\mathbf{J}_{l}}{(2\pi)^{n}}e^{i\mathrm{Tr}\left[\mathbf{J}_{l}\left(\nu\hat{\mathbf{C}}_{l}-\sum_{m}\mathbf{a}_{lm}\mathbf{a}_{lm}^{\dagger}\right)\right]},

(4)

where we integrate over all real symmetric matrices and the volume element is

\mathrm{d}\mathbf{J}_{l}=\mathrm{d}J_{l}^{1,1}\mathrm{d}J_{l}^{1,2}\dots\mathrm{d}J_{l}^{1,p}\mathrm{d}J_{l}^{2,2}\mathrm{d}J_{l}^{2,3}\dots\mathrm{d}J_{l}^{p,p}.

(5)

Note that the $2^{\frac{p(p-1)}{2}}$ factor is needed since the off-diagonal elements of $\mathbf{J}_{l}$ are double counted in the summation in the exponential, and hence we need to redefine each of the $p(p-1)/2$ off-diagonal terms in the integration with a factor of $2$ .

We expand the signal distribution in Fourier modes as

p(\{\mathbf{s}_{lm}\})=\int\frac{\mathrm{d}\{\mathbf{k}_{lm}\}}{(2\pi)^{pN}}\,\phi_{s}(\{\mathbf{k}_{lm}\})\,e^{i\sum_{lm}\mathbf{k}_{lm}^{\dagger}\mathbf{s}_{lm}},

(6)

where $N\equiv\sum_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}2l+1$ is the total number of modes per tomographic bin, and $\phi_{s}$ is the characteristic function of the signal⁴⁴4We will define characteristic functions this way throughout, but we caution the reader that other authors sometimes define it with a complex conjugate.. We can, and will, define the ‘wave numbers’ $\mathbf{k}_{lm}$ to obey reality conditions as in Equation (2). The corresponding characteristic function of the noise is $\phi_{n}$ .

Inserting Equations (4) and (6) into Equation (3), changing integration variables and doing an integral over $\mathbf{a}_{lm}$ assuming $\mathbf{J}_{l}$ is non-singular⁵⁵5The set of singular $\mathbf{J}_{l}$ has zero measure, and as the integrand is finite for this set the contribution from singular modes is vanishingly small in the limit. gives

p(\{\nu\hat{\mathbf{C}}_{l}\})=2^{\frac{\lambda p(p-1)}{2}}\int\frac{\mathrm{d}\{\mathbf{J}_{l}\}}{(2\pi)^{n\lambda}}e^{i\sum_{l}\mathrm{Tr}(\nu\mathbf{J}_{l}\hat{\mathbf{C}}_{l})}\left(\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\lvert 2i\mathbf{J}_{l}\rvert^{-\frac{\nu}{2}}\right)\int\frac{\mathrm{d}\{\mathbf{k}_{lm}\}}{(2\pi)^{\frac{pN}{2}}}\phi_{s}(\{\mathbf{k}_{lm}\})e^{-\frac{1}{2}\sum_{lm}\mathbf{k}_{lm}^{\dagger}(2i\mathbf{J}_{l})^{-1}\mathbf{k}_{lm}}\phi_{n}(\{\mathbf{k}_{lm}\}),

(7)

where $\lambda\equiv l_{\mathrm{max}}-l_{\mathrm{min}}+1$ .

Now, we define the characteristic function of $\{\nu\hat{\mathbf{C}}_{l}\}$ over its independent elements as

\phi_{\{\nu\hat{\mathbf{C}}_{l}\}}(\{\mathbf{J}_{l}\})=\int\mathrm{d}\{\nu\hat{\mathbf{C}}_{l}\}e^{-i\sum_{l}\mathrm{Tr}(\nu\mathbf{J}_{l}\hat{\mathbf{C}}_{l})}p(\{\nu\hat{\mathbf{C}}_{l}\}).

(8)

This gives, finally,

\phi_{\{\nu\hat{\mathbf{C}}_{l}\}}(\{\mathbf{J}_{l}\})=\left(\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\lvert 2i\mathbf{J}_{l}\rvert^{-\frac{\nu}{2}}\right)\int\frac{\mathrm{d}\{\mathbf{k}_{lm}\}}{(2\pi)^{\frac{pN}{2}}}\phi_{s}(\{\mathbf{k}_{lm}\})e^{-\frac{1}{2}\sum_{lm}\mathbf{k}_{lm}^{\dagger}(2i\mathbf{J}_{l})^{-1}\mathbf{k}_{lm}}\phi_{n}(\{\mathbf{k}_{lm}\}),

(9)

i.e. a mode coupling integral between the characteristic functions of the signal, the noise, and a Gaussian in the ‘wave number’ of the field $\mathbf{k}_{lm}$ having covariance $2i\mathbf{J}$ , where $\mathbf{J}$ is the ‘wave number’ of the power spectra.

II.1 The known case of a Gaussian signal and Gaussian noise

So far, we have not assumed anything about the statistics of the signal or the noise, only that they are independent. In the case of Gaussian signal and noise, the characteristic functions $\phi_{s}$ and $\phi_{n}$ also take a Gaussian form, and the integrals in Equation (9) all decouple. Each integration has a Gaussian form that can be done analytically, with the result being a product of Wishart characteristic functions

\phi_{\{\nu\hat{\mathbf{C}}_{l}\}}(\{\mathbf{J}_{l}\})=\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\lvert\mathbf{I}+2i\mathbf{J}_{l}\mathbf{C}_{l}\rvert^{-\frac{\nu}{2}},

(10)

where $\mathbf{C}_{l}$ is the expected total (signal plus noise) covariance matrix of the map. The inverse Fourier transform of this can be computed analytically (see Appendix B), and gives the well-known distribution of the power spectra as a product of Wishart distributions (e.g., Ref. Percival and Brown (2006))

	$\displaystyle p(\{\hat{\mathbf{C}}_{l}\})$	$\displaystyle=\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\frac{\lvert\hat{\mathbf{C}}_{l}\rvert^{(\nu-p-1)/2}}{2^{\frac{p\nu}{2}}\Gamma_{p}\left(\frac{\nu}{2}\right)\left\|\mathbf{C}_{l}/\nu\right\|^{\frac{\nu}{2}}}e^{-\frac{\nu}{2}\mathrm{Tr}(\mathbf{C}_{l}^{-1}\hat{\mathbf{C}}_{l})}$
		$\displaystyle\equiv\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}W_{p}(\hat{\mathbf{C}}_{l};\mathbf{C}_{l}/\nu,\nu),$		(11)

where $W_{p}$ is the $p$ -dimensional Wishart density and $\Gamma_{p}$ is the multivariate Gamma function.

Each $\hat{\mathbf{C}}_{l}$ estimate thus has a Wishart distribution with scale matrix $\mathbf{C}_{l}/\nu$ and degrees of freedom $\nu$ with $\nu>p-1$ . As expected, the mean of each $\hat{\mathbf{C}}_{l}$ is $\mathbf{C}_{l}$ (the estimator is unbiased), and the variance of a given map power spectrum is $2C_{l}^{2}/\nu$ . The latter also follows immediately from squaring the definition Equation (1) and using Wick’s theorem.

When $p=1$ this distribution is equivalent to a Gamma distribution. The Gamma distribution has non-zero skewness, kurtosis, and higher-order cumulants. For example, the dimensionless skewness is equal to $S=2\sqrt{2/\nu}$ , and the dimensionless excess kurtosis is $\kappa=12/\nu$ . It is easy to see that higher-order cumulants come with successive factors of $\nu^{-1/2}$ ; this is how the Central Limit Theorem manifests itself when $\nu\gg 1$ , since these higher-order cumulants are all formally zero for a Gaussian field. From Equation (1) we see that $\nu$ counts the azimuthal modes contributing to a given wave number, such that the power spectrum estimator is the average of an increasingly large number of independent variables as $\nu$ increases.

The convergence of any probability distribution to a Gaussian can be established formally by showing that its characteristic function tends pointwise to that of a Gaussian (Lèvy’s continuity theorem, e.g. Ref. Durrett (2010)). Inspection of Equation (10) makes clear how this happens, where the convergence to a Gaussian can be established by Taylor-expanding around $\nu=\infty$ . One can also establish Gaussianity from the density directly. In the $p=1$ case one first has to change variables to $\hat{C}_{l}=C_{l}(1+n\sqrt{2/\nu})$ then take the limit $\nu\to\infty$ at fixed $n$ , using Stirling’s approximation to deal with the Gamma function in the denominator. Whichever way one chooses to go, the resulting limiting distribution is a product of Gaussians, each with mean $C_{l}$ and variance $2C_{l}^{2}/\nu$ .

We close this section by noting that in general one has to be careful about assuming a Gaussian sampling distribution for power spectrum estimates, as this does not necessarily guarantee a Gaussian posterior for parameters at the same level of accuracy. For example, if one has an informative and non-Gaussian prior on a parameter, the amount of constraining data required for the likelihood to dominate the posterior can be well above that needed to Gaussianize the likelihood. Secondly, and more importantly, posterior standard errors on parameters can shrink as fast as non-Gaussian contributions to the likelihood, meaning relative corrections from non-Gaussianity can remain important for posteriors even on small angular scales. In general these effects are most robustly studied with simulations. Ref. Hamimeche and Lewis (2008) provides a thorough discussion of residual non-Gaussian effects from failure of the Central Limit Theorem for Gaussian fields in the CMB context, finding that sufficiently broad multipole bins can reduce the error from approximating the likelihood as Gaussian to acceptable levels. We will return to this question in the cosmic shear context in Section III.

II.2 Corrections to the tomographic shear power distribution from a non-Gaussian signal

In the previous section we derived the distribution of angular power spectrum estimates assuming both signal and noise were Gaussian. In this section we will consider the more realistic scenario of a non-Gaussian signal, but will keep the assumption of Gaussian noise.

In the Gaussian case, the mode coupling integral in Equation (9) can be performed analytically due to the Gaussian form of the signal characteristic function $\phi_{s}$ . The fully general non-Gaussian characteristic function of the signal has no analytic expression that can be written down, so to make progress we must restrict to weak non-Gaussianity, in the sense that the higher-order cumulants of the signal are progressively suppressed. This is the Edgeworth expansion Bouchet et al. (1992); Juszkiewicz et al. (1995); Bernardeau and Kofman (1995); Amendola (1996); Blinnikov and Moessner (1998); Taylor and Watts (2001), and truncating at leading order gives

	$\displaystyle\phi_{s}(\{\mathbf{k}_{lm}\})=$	$\displaystyle\left(1-i\frac{1}{6}\kappa^{ijk}_{\underline{lm}}k^{i}_{l_{1}m_{1}}k^{j}_{l_{2}m_{2}}k^{k}_{l_{3}m_{3}}\vphantom{\frac{1}{2}}\right.$
		$\displaystyle\left.\vphantom{\frac{1}{2}}+\frac{1}{24}\kappa^{ijkm}_{\underline{lm}}k^{i}_{l_{1}m_{1}}k^{j}_{l_{2}m_{2}}k^{k}_{l_{3}m_{3}}k^{m}_{l_{4}m_{4}}-\frac{1}{72}\kappa^{ijk}_{\underline{lm}}\kappa^{lmn}_{\underline{l^{\prime}m^{\prime}}}k^{i}_{l_{1}m_{1}}k^{j}_{l_{2}m_{2}}k^{k}_{l_{3}m_{3}}k^{l}_{l_{1}^{\prime}m_{1}^{\prime}}k^{m}_{l_{2}^{\prime}m_{2}^{\prime}}k^{n}_{l_{3}^{\prime}m_{3}^{\prime}}\right)\phi_{s}^{G}(\{\mathbf{k}_{lm}\}),$		(12)

where all repeated indices are summed over and $\phi_{s}^{G}$ is the Gaussian characteristic function. The compact notation $\underline{lm}$ denotes $l_{1}m_{1},l_{2}m_{2},l_{3}m_{3}$ when attached to a three-point cumulant, and $l_{1}m_{1},l_{2}m_{2},l_{3}m_{3},l_{4}m_{4}$ when attached to a four-point cumulant. These are defined as

	$\displaystyle\kappa^{ijk}_{\underline{lm}}$	$\displaystyle\equiv\langle s^{i}_{l_{1}m_{1}}s^{j}_{l_{2}m_{2}}s^{k}_{l_{3}m_{3}}\rangle$		(13)
	$\displaystyle\kappa^{ijkm}_{\underline{lm}}$	$\displaystyle\equiv\langle s^{i}_{l_{1}m_{1}}s^{j}_{l_{2}m_{2}}s^{k}_{l_{3}m_{3}}s^{m}_{l_{4}m_{4}}\rangle_{c},$		(14)

where the subscript $c$ denotes the fully connected part.

In Appendix A we discuss how the signal cumulants are related to those of the power spectrum estimates themselves. The covariance and leading order dimensionless skewness of $\hat{C}_{l}$ in the $p=1$ case are given by

	$\displaystyle\langle\Delta\hat{C}_{l}\Delta\hat{C}_{l^{\prime}}\rangle$	$\displaystyle=\frac{2}{\nu}C_{l}^{2}\delta_{ll^{\prime}}+T_{ll^{\prime}},$		(15)
	$\displaystyle\frac{\langle\Delta\hat{C}_{l}^{3}\rangle}{\langle\Delta C_{l}^{2}\rangle^{3/2}}$	$\displaystyle\approx\sqrt{\frac{8}{\nu}}+\sqrt{2\nu^{3}}\frac{\tilde{B}_{lll}}{C_{l}^{3}}+\frac{3\sqrt{2\nu}}{2}\frac{T_{ll}}{C_{l}^{2}}.$		(16)

The skewness has a term coming from the Gaussian part of the signal ( $\sqrt{8/\nu}$ ), plus a term proportional to the squared bispectrum of the signal $\tilde{B}_{lll}$ , defined later in Equation (21), and a term proportional to its trispectrum $T_{ll^{\prime}}$ . These latter two quantities are exactly those present in Equation (12), and are also the leading order non-Gaussian corrections to the dimensionless kurtosis of $\hat{C}_{l}$ . Indeed, as discussed in Appendix A, all of the dimensionless cumulants of $\hat{C}_{l}$ have leading-order corrections from the trispectrum and squared bispectrum. Since these two quantities are of the same perturbative order this means that the cumulants of $\hat{C}_{l}$ do not progressively smaller as their order increases, and an Edgeworth expansion of the distribution of $\hat{C}_{l}$ around a Wishart will not work. This justifies our approach of expanding the underlying signal as an Edgeworth expansion, and then propagating to get the distribution of the power spectrum.

Substituting Equation (12) into Equation (9), we immediately see that the leading order bispectrum term vanishes due to this term carrying an odd power of $k_{lm}$ . The leading order contribution to the distribution is therefore from terms of order the trispectrum or bispectrum squared, whose computation formally requires going to third order in perturbation theory. We will therefore truncate the expansion in Equation (12) at the order shown. Since the probability density is linear in the characteristic function, we can simply add the trispectrum correction to that coming from bispectrum squared. We present the derivation of the characteristic function and the probability density of the power spectrum in Appendix B, with the final result given in Equations (81) and (102). We will simply quote the results here.

II.2.1 Non-Gaussian power distribution from a signal with non-zero trispectrum

In the $p=1$ case the distribution of $\hat{C}_{l}$ corrected by a signal trispectrum given in Equation (81) reduces to

p(\{\hat{C}_{l}\})=\left\{\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\Gamma[\hat{C}_{l};\nu/2,\nu/(2C_{l})]\right\}\left[1+\frac{1}{8}\sum_{l_{1},l_{2}}\nu_{1}\nu_{2}\frac{T_{l_{1}l_{2}}}{C_{l_{1}}C_{l_{2}}}\left(\frac{\Delta\hat{C}_{l_{1}}\Delta\hat{C}_{l_{2}}}{C_{l_{1}}C_{l_{2}}}-\frac{2}{\nu_{1}+2}\delta_{l_{1}l_{2}}\frac{\hat{C}_{l_{1}}^{2}}{C_{l_{1}}^{2}}\right)\right],

(17)

where $T_{ll^{\prime}}$ is the $m$ -averaged version of the signal trispectrum that contributes to the power spectrum covariance, see Equation (15). One can verify that the distribution Equation (17) is correctly normalised, has a mean given by $\langle\hat{C}_{l}\rangle=C_{l}$ (unchanged from the Gamma case) and a covariance matrix given by $\langle\Delta\hat{C}_{l_{1}}\Delta\hat{C}_{l_{2}}\rangle=2\delta_{l_{1}l_{2}}C_{l_{1}}^{2}/\nu_{1}+T_{l_{1}l_{2}}$ , as expected.

Despite possessing the correct normalisation, mean and covariance, Equation (17) is not a valid probability density since it is not everywhere positive. This arises for sufficiently large $\hat{C}_{l}$ . This is a ubiquitous feature of densities based on the Edgeworth expansion, and can be remedied by writing the correction as an exponential, i.e. $1+x\approx e^{x}$ . This is valid since the correction is assumed to perturbatively small.

The first correcting term in Equation (17), proportional to $\Delta\hat{C}_{l_{1}}\Delta\hat{C}_{l_{2}}$ , can be written in a suggestive way. It is the leading-order term in the expansion of

-\frac{1}{2}\sum_{l_{1},l_{2}}\Delta\hat{C}_{l_{1}}\left(\frac{2C_{l_{1}}^{2}}{\nu_{1}}\delta_{l_{1}l_{2}}+T_{l_{1}l_{2}}\right)^{-1}\Delta\hat{C}_{l_{2}},

(18)

where the notation $(A_{l_{1}l_{2}})^{-1}$ refers to the $l_{1},l_{2}$ element of the inverse of the matrix with elements $A_{l_{1}l_{2}}$ , as should be clear from context. Equation (18) is a $\chi^{2}$ -like functional having the inverse of the total covariance matrix. Once the Gaussian part of this expansion is ‘available’ from the zero-order Gamma term when the $\nu\to\infty$ limit is taken, we will be left a Gaussian possessing the correct inverse covariance matrix in its exponent. Similarly, the second term in Equation (17) is the leading-order correction to the determinant pre-factor of a Gaussian due to the non-Gaussian covariance $T_{l_{1}l_{2}}$ . Thus, in the $\nu\to\infty$ limit, our distribution is the leading-order correction to a Gaussian due to $T_{l_{1}l_{2}}$ . This can also be seen directly from the characteristic function, Equation (71).

Considered as a function of the model power spectrum $C_{l}$ and the trispectrum $T_{l_{1}l_{2}}$ , Equation (17) gives the likelihood function for fixed data $\hat{C}_{l}$ . In the next section we will compute the correction to the likelihood numerically, but there are a few analytic results we can derive from this likelihood. Most significantly, we can compute its expected curvature around a fiducial power spectrum $C_{f,l}$ and non-Gaussian covariance $T_{f,l_{1}l_{2}}$ i.e. the Fisher matrix. Assuming a model with parameters $\theta_{\alpha}$ , this is given by

	$\displaystyle F_{\alpha\beta}$	$\displaystyle=\frac{1}{2}\sum_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\nu\frac{1}{C_{f,l}^{2}}\frac{\partial C_{l}}{\partial\theta_{\alpha}}\frac{\partial C_{l}}{\partial\theta_{\beta}}-\frac{1}{4}\sum_{l_{1},l_{2}=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\nu_{1}\nu_{2}\frac{T_{f,l_{1}l_{2}}}{C_{f,l_{1}}^{2}C_{f,l_{2}}^{2}}\frac{\partial C_{l_{1}}}{\partial\theta_{\alpha}}\frac{\partial C_{l_{2}}}{\partial\theta_{\beta}}$
		$\displaystyle\approx\sum_{l_{1},l_{2}=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\frac{\partial C_{l_{1}}}{\partial\theta_{\alpha}}\left(\frac{2C_{f,l_{1}}^{2}}{\nu}\delta_{l_{1}l_{2}}+T_{f,l_{1}l_{2}}\right)^{-1}\frac{\partial C_{l_{2}}}{\partial\theta_{\beta}},$		(19)

where the derivatives are evaluated at the fiducial model. Equation (19) is the usual expression for the power spectrum Fisher matrix but with the proper non-Gaussian covariance matrix. This matches the expression for the Fisher matrix derived from a Gaussian likelihood with the covariance fixed to a fiducial model and including the non-Gaussian term, and tells us that to recover the full-sky Fisher matrix from a likelihood approximated by its Gaussian limit we have to fix the Gaussian covariance to a fiducial model. This is the main result of Ref. Carron (2013), and here we recover it in the presence of signal non-Gaussianity. It is interesting to note that in principle one can allow the non-Gaussian part of the covariance matrix, $T_{l_{1}l_{2}}$ , to depend on parameters in a Gaussian likelihood even when fixing the Gaussian part, as required to keep the approximation consistent. In our perturbative framework the additional information contributed by $T_{l_{1}l_{2}}(\theta)$ is higher order, but whether this is consistent with the likelihood approximation for general $T_{l_{1}l_{2}}$ is still an open question, and covered by neither our analysis nor that of Ref. Carron (2013).

We note finally that we do not recover the field-level Fisher matrix from our power spectrum Fisher matrix, which now contains new terms due to non-Gaussianity. This is an expression of the fact that the power spectrum does not now contain all the information.

II.2.2 Non-Gaussian power distribution from a signal with non-zero bispectrum

We have argued that a consistent expansion of distribution of the power spectrum requires including terms of order the signal bispectrum squared. In the $p=1$ case, the correction given in Equation (102) reduces to

	$\displaystyle p(\{\hat{C}_{l}\})=$	$\displaystyle\left\{\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\Gamma[\hat{C}_{l};\nu/2,\nu/(2C_{l})]\right\}$
		$\displaystyle\times\left\{1+\frac{1}{12}\sum_{l_{1},l_{2},l_{3}}\nu_{1}\nu_{2}\nu_{3}\frac{\tilde{B}_{l_{1}l_{2}l_{3}}}{C^{2}_{l_{1}}C^{2}_{l_{2}}C^{2}_{l_{3}}}\vphantom{\frac{1}{2}}\left[\Delta\hat{C}_{l_{1}}\Delta\hat{C}_{l_{2}}\Delta\hat{C}_{l_{3}}-[3]\frac{2\delta_{l_{1}l_{2}}}{(\nu_{1}+2)}\hat{C}_{l_{1}}^{2}\Delta\hat{C}_{l_{3}}+\frac{16\delta_{l_{1}l_{2}}\delta_{l_{2}l_{3}}}{(\nu_{1}+4)(\nu_{1}+2)}\hat{C}_{l_{1}}^{3}\right]\right\},$		(20)

where $\tilde{B}$ is proportional to the square of the reduced bispectrum $b_{l_{1}l_{2}l_{3}}$ of the signal and is given by

\tilde{B}_{l_{1}l_{2}l_{3}}=\frac{1}{4\pi}\begin{pmatrix}l_{1}&l_{2}&l_{3}\\ 0&0&0\end{pmatrix}^{2}b_{l_{1}l_{2}l_{3}}^{2},

(21)

with the quantity in parentheses a Wigner $3j$ symbol. The notation $[n]$ in Equation (20) refers to the $n$ terms that follow from symmetry considerations. It is straightforward to verify that the correction leaves the normalisation, mean, and covariance matrix unchanged, but corrects the three-point function by a term equal to $4\tilde{B}_{l,l^{\prime},l^{\prime\prime}}$ , as expected from Equation (38).

One can verify that the Fisher matrix is unaffected by the squared bispectrum correction at leading order. In other words, the effect of $\tilde{B}_{l_{1}l_{2}l_{3}}$ is to modify the skewness of the likelihood, while leaving its mean and variance approximately unaffected.

As claimed in Section A.1, the correction to the density is $\mathcal{O}(\tilde{B}/C_{l}^{3})$ , the same order as the trispectrum correction. This justifies our decision to include terms of order the bispectrum squared.

In summary, the leading-order correction to the distribution of power spectrum estimates from signal non-Gaussianity is

	$\displaystyle p(\{\hat{C}_{l}\}\|\{C_{l},T_{l,l^{\prime}},\tilde{B}_{l,l^{\prime},l^{\prime\prime}}\})$	$\displaystyle=\left\{\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\Gamma[\hat{C}_{l};\nu/2,\nu/(2C_{l})]\right\}\left\{1+\frac{1}{8}\sum_{l_{1},l_{2}}\nu_{1}\nu_{2}\frac{T_{l_{1}l_{2}}}{C_{l_{1}}C_{l_{2}}}\left(\frac{\Delta\hat{C}_{l_{1}}\Delta\hat{C}_{l_{2}}}{C_{l_{1}}C_{l_{2}}}-\frac{2}{\nu_{1}+2}\delta_{l_{1}l_{2}}\frac{\hat{C}_{l_{1}}^{2}}{C_{l_{1}}^{2}}\right)\right.$
		$\displaystyle\left.+\frac{1}{12}\sum_{l_{1},l_{2},l_{3}}\nu_{1}\nu_{2}\nu_{3}\frac{\tilde{B}_{l_{1}l_{2}l_{3}}}{C^{2}_{l_{1}}C^{2}_{l_{2}}C^{2}_{l_{3}}}\vphantom{\frac{1}{2}}\left[\Delta\hat{C}_{l_{1}}\Delta\hat{C}_{l_{2}}\Delta\hat{C}_{l_{3}}-[3]\frac{2\delta_{l_{1}l_{2}}}{(\nu_{1}+2)}\hat{C}_{l_{1}}^{2}\Delta\hat{C}_{l_{3}}+\frac{16\delta_{l_{1}l_{2}}\delta_{l_{2}l_{3}}}{(\nu_{1}+4)(\nu_{1}+2)}\hat{C}_{l_{1}}^{3}\right]\right\}.$		(22)

Equation (22) is the first major result of this work. It is the leading-order likelihood function for the power spectrum that incorporates a non-Gaussian signal. Unlike the commonly assumed Gaussian approximation, this distribution applies equally well to the largest angular scales. This distribution has the correct normalization, mean, covariance, and three-point function. In Appendix B we provide the analogous expression in the case of multiple redshift bins, in which case the distribution is a perturbation around a Wishart distribution for positive definite matrices. A practical application of the distribution may require ensuring positivity by taking the logarithm, but in practice we have not found this to be necessary. In Section III we present numerical results for the transition of the power spectrum distribution from Wishart to post-Wishart (our Equation (22)) and finally to Gaussian as the number of degrees of freedom increases.

We close this section by noting that while we have restricted ourselves to spin-0 fields (such as lensing convergence or projected galaxy number counts) for simplicity, our formalism may be easily generalized to the spin-2 case (i.e. weak lensing shear), as mentioned at the beginning of Section II. Since we work in harmonic space on the full sky, one can simply work with the $E$ and $B$ multipoles derived in the usual way from the shear field. This augments the data vector to have length $2\times N_{z}$ , but all the matrix expressions presented here can be used after making this modification. Redshift bin indices must now be interpreted as also running over $E$ and $B$ indices, and the appropriate cross-bispectra and trispectra must be used (many of which will be zero due to parity invariance).

On the cut sky our expression must be modified to account for the mask, which we have not attempted to include. This induces additional mode mixing (as well as the usual $E$ / $B$ ambiguity), which could possibly be included using the techniques of either Ref. Hamimeche and Lewis (2008) or Ref. Upham et al. (2020). We defer this to a future work.

II.3 The transition to Gaussianity

A key goal of this paper is to understand and quantify how efficiently the Central Limit Theorem (CLT) Gaussianizes the distribution of $\hat{C}_{l}$ in the presence of signal non-Gaussianity. One could well imagine a situation for example where the strength of non-Gaussianity is increasing with $l_{\mathrm{max}}$ faster than the CLT is able to Gaussianize, in which the Gaussian limit may never be achieved. In this section we use our new distribution to study how the CLT operates in detail. We specialise to the single redshift bin case, $p=1$ .

Convergence of a probability density to a limiting distribution can be established by studying the pointwise convergence of its characteristic function. We have already computed this, and we will additionally change variables from $\hat{C}_{l}$ to the ‘standardised’ variable $\hat{Z}_{l}\equiv\Delta\hat{C}_{l}/\sqrt{2C_{l}^{2}/\nu}$ . The characteristic function of the set of $\hat{Z}_{l}$ variables is

	$\displaystyle\phi_{\{\hat{Z}_{l}\}}(\{J_{l}\})$	$\displaystyle=\left[\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}e^{iJ_{l}\sqrt{\nu/2}}(1+iJ_{l}\sqrt{2/\nu})^{-\nu/2}\right]\left[1-\frac{1}{2}\sum_{l_{1},l_{2}}\varepsilon_{l_{1}l_{2}}\frac{J_{l_{1}}J_{l_{2}}}{(1+iJ_{l_{1}}\sqrt{2/\nu_{1}})(1+iJ_{l_{2}}\sqrt{2/\nu_{2}})}\right.$
		$\displaystyle\left.+\frac{i}{6}\sum_{l_{1},l_{2},l_{3}}\tilde{\varepsilon}_{l_{1}l_{2}l_{3}}\frac{J_{l_{1}}J_{l_{2}}J_{l_{3}}}{(1+iJ_{l_{1}}\sqrt{2/\nu_{1}})(1+iJ_{l_{2}}\sqrt{2/\nu_{2}})(1+iJ_{l_{3}}\sqrt{2/\nu_{3}})}\right],$		(23)

where we have defined the quantities

	$\displaystyle\varepsilon_{l_{1}l_{2}}$	$\displaystyle\equiv\frac{T_{l_{1}l_{2}}}{\sqrt{2C_{l_{1}}^{2}/\nu_{1}}\sqrt{2C_{l_{2}}^{2}/\nu_{2}}},$		(24)
	$\displaystyle\tilde{\varepsilon}_{l_{1}l_{2}l_{3}}$	$\displaystyle\equiv\frac{4\tilde{B}_{l_{1}l_{2}l_{3}}}{\sqrt{2C_{l_{1}}^{2}/\nu_{1}}\sqrt{2C_{l_{2}}^{2}/\nu_{2}}\sqrt{2C_{l_{3}}^{2}/\nu_{3}}}.$		(25)

The quantity $\varepsilon_{l_{1}l_{2}}$ has the simple interpretation that it is the relative correction to the power spectrum covariance matrix from non-Gaussianity. Its diagonal elements can also be interpreted as the correction to the ‘effective degrees of freedom’ through $\nu_{\mathrm{eff}}=\nu/(1+\varepsilon_{ll})$ such that the variance of the power spectrum is $2C_{l}^{2}/\nu_{\mathrm{eff}}$ , although simply making this correction in the likelihood is not sufficient as it does not capture, among other things, mode coupling.

Consider now taking every $\nu\gg 1$ , i.e. consider the limit $l_{\mathrm{min}}\gg 1$ . The zero-order prefactor in Equation (23) converges to a Gaussian, with the ‘linear’ part of the exponent getting cancelled by the complex exponential already present. Likewise, all the denominators in the non-Gaussian correction terms tend to unity at the same rate, leaving a quadratic correction from the trispectrum and a cubic correction from the bispectrum. In this limit, the correction is of the Edgeworth form. At the order to which we work, we may just as well replace the quadratic term by a Gaussian using $e^{-\lambda^{2}}\approx 1-\lambda^{2}$ , in which case the trispectrum term is absorbed into the covariance matrix of the Gaussian prefactor. This leaves the correction in the $l_{\mathrm{min}}\gg 1$ limit as

\phi_{\{\hat{Z}_{l}\}}(\{J_{l}\})\approx\exp\left[-\frac{1}{2}\sum_{l_{1}l_{2}}J_{l_{1}}(\delta_{l_{1}l_{2}}+\varepsilon_{l_{1}l_{2}})J_{l_{2}}\right]\left(1+\frac{i}{6}\sum_{l_{1},l_{2},l_{3}}\tilde{\varepsilon}_{l_{1}l_{2}l_{3}}J_{l_{1}}J_{l_{2}}J_{l_{3}}\right).

(26)

For this to be valid, we clearly need $\sum_{l_{1}l_{2}}J_{l_{1}}J_{l_{2}}\varepsilon_{l_{1}l_{2}}\ll 1$ . A necessary condition for this is that every element of $\varepsilon$ is ‘small’ in a sense that depends on the quantity of interest. For example, if one was only interested in the marginal distribution of a single $\hat{C}_{l}$ , it would be sufficient to demand that only a single $\varepsilon_{ll}\ll 1$ .

The dominant effect of the trispectrum in this limit is clearly a change to the covariance matrix. Residual ‘post-Gaussian’ effects come from the squared bispectrum through the quantity $\tilde{\varepsilon}_{l_{1}l_{2}l_{3}}$ affecting the three-point function of the power spectrum. These are negligible when $\tilde{\varepsilon}_{l_{1}l_{2}l_{3}}\ll 1$ at all multipoles.

One guaranteed way to make both $\varepsilon$ and $\tilde{\varepsilon}$ small is to increase $C_{l}$ at fixed $T_{l_{1}l_{2}}$ or fixed $\tilde{B}_{l_{1}l_{2}l_{3}}$ . This may be achieved by simply increasing the (assumed Gaussian) noise power, since $C_{l}$ here is the total power spectrum. This is equivalent to saying that the noise ‘Gaussianizes’ the distribution. One must be careful however to check this Gaussianization also operates on not just the likelihood but the posterior, as we will do shortly.

The quantity $\varepsilon_{l_{1}l_{2}}$ is known to be significant on non-linear scales and is routinely measured in simulations (usually in a binned form, see Figure 1 for a related quantity). In contrast, the size of $\tilde{\varepsilon}_{l_{1}l_{2}l_{3}}$ is less obvious. We can make an order-of-magnitude estimate of its size using the asymptotic limit of the Wigner- $3j$ symbol Ponzano and Regge (1968); Bernardeau et al. (2011)

\lim_{\nu_{1},\nu_{2},\nu_{3}\to\infty}\begin{pmatrix}l_{1}&l_{2}&l_{3}\\ 0&0&0\end{pmatrix}^{2}=\frac{1}{2\pi A(L_{1},L_{2},L_{3})},

(27)

where $L\equiv l+1/2$ and $A(L_{1},L_{2},L_{3})$ is the area of the triangle having side lengths $L_{1}$ , $L_{2}$ , $L_{3}$ . The squared $3j$ symbol thus decays as $1/\nu^{2}$ for large $\nu=\mathrm{min}(\nu_{1},\nu_{2},\nu_{3})$ . A necessary condition for negligible non-Gaussianity is therefore

\left[\frac{\sqrt{\nu_{1}\nu_{2}\nu_{3}/2}}{4\pi^{2}A(L_{1},L_{2},L_{3})}\right]^{1/2}\frac{\lvert b_{l_{1}l_{2}l_{3}}\rvert}{\sqrt{C_{l_{1}}C_{l_{2}}C_{l_{3}}}}\ll 1

(28)

when $l_{\mathrm{min}}\gg 1$ . The pre-factor on the left-hand side of Equation (28) is a weak function of $\nu$ , and is typically $\sim 10^{-2}$ for the relevant scales and for most triangle configurations. The reduced bispectrum for weak lensing has been measured for a few triangle configurations from $N$ -body simulations in Ref. Munshi et al. (2020), who find that it decays with $l$ for most configurations, with the slowest decay being a roughly $l^{-1}$ drop-off in their ‘squeezed’ configuration. This suggests that for the lensing convergence, the bispectrum-squared term should become rapidly negligible with $l$ . At $l\gtrsim 2000$ where shape noise starts to dominate the power spectrum, Munshi et al. (2020) find that the reduced bispectrum for an equilateral configuration is $\sim 10^{-17}$ , and therefore the left-hand side of Equation (28) is $\sim 10^{-5}$ . At $l=100$ we have $C_{l}\sim 10^{-9}$ and $b_{lll}\sim 10^{-17}$ , so the left-hand side of Equation (28) is again $\sim 10^{-5}$ . For a squeezed shape with $l_{1}=50$ , $l_{2}=l_{3}=2000$ , Munshi et al. (2020) finds $b\sim 10^{-16}$ . We find that $C_{l_{1}}~\sim 10^{-8}$ , and the left-hand side of Equation (28) is $\sim 10^{-4}$ i.e. slightly more significant. A typical element of $\tilde{\varepsilon}_{l_{1}l_{2}l_{3}}$ is therefore around $10^{-9}$ .

In practice the quantity that should be small is the sum of $\varepsilon$ across the scales most relevant to the quantity of interest. The order-of-magnitude analysis above suggests that for a small subset of the $C_{l}$ the correction from non-Gaussianity is negligible, but if the number of relevant scales is large (as might be the case for massive data compression in order to infer cosmological parameters) then the conditions on $\varepsilon$ are more strict. Assuming an approximately constant $\tilde{\varepsilon}_{l_{1}l_{2}l_{3}}$ , using all available scales up to $l_{\mathrm{max}}$ would require $\tilde{\varepsilon}_{l_{1}l_{2}l_{3}}\ll l_{\mathrm{max}}^{-3}$ for example. For $\tilde{\varepsilon}_{l_{1}l_{2}l_{3}}\sim 10^{-9}$ corrections become large around $l_{\mathrm{max}}\approx 1000$ . This is within the range of scale useable angular scales for Stage-IV lensing surveys, which motivates the more quantitative study with simulations in Section III.

II.3.1 Non-Gaussian effects on posterior distributions

The sampling distribution of $\hat{C}_{l}$ itself is usually not of interest. What matters more is the posterior distribution on the $C_{l}$ , or on model parameters that generate the $C_{l}$ . In this section we study analytically how non-Gaussianity affects the posterior, using our corrected likelihood Equation (22).

First we study the joint posterior distribution of the set of theory $C_{l}$ . This is simply given by $p(\{C_{l}\}|\{\hat{C}_{l}\})\propto p(\{\hat{C}_{l}\}|\{C_{l}\})\pi(\{C_{l}\})$ , where $\pi(\{C_{l}\})$ is the prior. Natural priors for the $C_{l}$ are the uniform prior, which isolates the dependence on the likelihood, and the Jeffreys prior, which is uninformative. For a Gaussian likelihood the two coincide, but this is not the case for either the Gamma likelihood nor our Equation (22).

The Jeffreys prior is given by the square-root of the determinant of the Fisher matrix. We have computed this for our likelihood, and at leading order we have

\pi(\{C_{l}\})\propto\left(\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}C_{l}^{-1}\right)\left(1-\frac{1}{4}\sum_{l}\nu\frac{T_{ll}}{C_{l}^{2}}\right)

(29)

The zero-order posterior for each $C_{l}$ is thus an Inverse-Gamma distribution, denoted by $\Gamma^{-1}$ given by

p_{\Gamma^{-1}}(C_{l}|\hat{C}_{l})=\frac{(\nu\hat{C}_{l}/2)^{\nu/2}}{\Gamma(\nu/2)}\,C_{l}^{-\nu/2-1}\exp\left(-\frac{\nu\hat{C}_{l}}{2C_{l}}\right).

(30)

Weak signal non-Gaussianity perturbs away from this zero-order distribution.

In what follows we will only include trispectrum terms, which we have found to be the dominant source of non-Gaussianity in the power spectrum distribution. Substituting in the Jeffreys prior and normalizing, the correction to the Inverse-Gamma posterior is

	$\displaystyle p(\{C_{l}\}\|\{\hat{C}_{l}\})=$	$\displaystyle\left[\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\Gamma^{-1}(C_{l};\nu/2,\nu C_{l}/2)\right]$
		$\displaystyle\times\left\{1+\frac{1}{4}\sum_{l_{1}l_{2}}\hat{\varepsilon}_{l_{1}l_{2}}\left[\frac{\hat{C}_{l_{1}}\hat{C}_{l_{2}}}{C_{l_{1}}C_{l_{2}}}\left(\frac{\Delta\hat{C}_{l_{1}}\Delta\hat{C}_{l_{2}}}{C_{l_{1}}C_{l_{2}}}-\frac{2\delta_{l_{1}l_{2}}}{\nu_{1}+2}\frac{\hat{C}_{l_{1}}^{2}}{C_{l_{1}}^{2}}-\frac{2\tilde{\alpha}\delta_{l_{1}l_{2}}}{\nu_{1}}\right)+\frac{\delta_{l_{1}l_{2}}[2\tilde{\alpha}(\nu_{1}+2)-4]-4}{\nu_{1}\nu_{2}}\right]\right\}$		(31)

where $\hat{\varepsilon}$ is defined as in Equation (24) but with $\hat{C}_{l}$ in place of $C_{l}$ , and $\tilde{\alpha}$ is a book-keeping parameter that keeps track of the influence of the correction to the Jeffreys prior; $\tilde{\alpha}=1$ represents the first-order-corrected Jeffreys prior, and $\tilde{\alpha}=0$ uses only the zero-order Jeffreys prior.

Corrections to the posterior moments from $T_{l_{1}l_{2}}$ can now be computed easily using the properties of the Inverse-Gamma distribution. For example, the mean of a single $C_{L}$ after marginalising over all other $C_{l}$ receives a fractional correction of order $\mathcal{O}(\hat{\varepsilon}/\nu)$ , i.e. suppressed at large $\nu$ . The leading-order fractional correction to the marginal variance of a single $C_{L}$ is $\hat{\varepsilon}_{LL}$ , as expected, and is entirely captured by a Gaussian having the full non-Gaussian covariance matrix. The leading-order ‘post-Gaussian’ correction from signal non-Gaussianity is $\mathcal{O}(\hat{\varepsilon}/\nu)$ . Note the variance under a Gaussian approximation also differs from that of the Inverse-Gamma by a fractional amount $\mathcal{O}(1/\nu)$ when $\nu$ is finite. Our formalism is valid at all orders in $1/\nu$ , but does assume small $\varepsilon$ . Corrections to the mean and variance not captured by a Gaussian with non-Gaussian covariance matrix are thus $\mathcal{O}(\hat{\varepsilon}/\nu)$ . It turns out that these are also the corrections to the conditional mean and variance when all but one $C_{l}$ are fixed to their measured values, with the mean correction further suppressed if $\tilde{\alpha}=1$ .

To study the Gaussian limit in the $C_{l}$ posterior, we write $C_{l}=\hat{C}_{l}(1+n\sqrt{2/\nu})$ and take the $\nu\to\infty$ limit. The leading-order term, quadratic in $n$ , is precisely that specified by a Gaussian with non-Gaussian covariance. Leading-order post-Gaussian terms at fixed $n$ are $\mathcal{O}(\varepsilon/\sqrt{\nu})$ , i.e. larger than the correction to the mean and variance. This correction is a cubic polynomial in $n$ , suggesting that it corresponds to a change in the skewness of the posterior $C_{l}$ , not captured by a Gaussian. It turns out that this correction can be accounted for by a slight modification to the Gaussian where we replace $T_{l_{1}l_{2}}$ in the covariance matrix with $T_{l_{1}l_{2}}(\hat{C}_{l_{1}}^{2}/C_{l_{1}}^{2})(\hat{C}_{l_{2}}^{2}/C_{l_{2}}^{2})$ , but in practice this post-Gaussian term is small and the correction does not lead to noticeable improvement. Note that we have neglected the bispectrum-squared term in this derivation, which will also modify the $C_{l}$ skewness.

We can also study the distribution of an amplitude parameter by setting $C_{l}=AC_{0,l}$ for some fiducial power spectrum $C_{0,l}$ (we neglect noise here for simplicity). We find a fractional correction to the mean of order $\mathcal{O}(\langle\varepsilon\rangle/\sum\nu)$ , where we define the $l$ -averaged quantity

	$\displaystyle\langle\varepsilon\rangle$	$\displaystyle\equiv\frac{\sum_{l_{1}l_{2}}\sqrt{\nu_{1}\nu_{2}}\varepsilon_{l_{1}l_{2}}}{\sum_{l}\nu}$
		$\displaystyle=\frac{\sum_{l_{1}l_{2}}\nu_{1}\nu_{2}\frac{T_{l_{1}l_{2}}}{2C_{l_{1}}C_{l_{2}}}}{\sum_{l}\nu},$		(32)

with $\varepsilon$ evaluated at the fiducial model. $\langle\varepsilon\rangle$ has the simple interpretation that it is the leading-order fractional correction to the posterior parameter variance from the non-Gaussian covariance term. Similarly, the leading-order post-Gaussian fractional correction to the variance of $A$ is $\mathcal{O}(\langle\varepsilon\rangle/\sum\nu)$ . Corrections to the dimensionless skew of $A$ are $\mathcal{O}(\langle\varepsilon\rangle/\sqrt{\sum\nu})$ , as are corrections to the pointwise posterior distribution of $A$ .

To summarise this section, the biggest effect of a signal trispectrum is to correct the posterior variance of parameters through the non-Gaussian covariance term. This is typically of order $\langle\varepsilon\rangle$ , where the average runs over the scales of interest. We require this correction to be small in order for our perturbative series to converge, i.e. the validity of our distributions require $\langle\varepsilon\rangle\ll 1$ . A good rule of thumb is that we require the correction to the Fisher information from signal non-Gaussianity to be much less than unity. The broadening of parameter errors is mostly accounted for by a Gaussian likelihood with non-Gaussian covariance - the genuinely new terms we have computed here are ‘post-Gaussian’, and give a skewness and fractional pointwise correction to the probability density which are both of order $\langle\varepsilon\rangle/\sqrt{\sum\nu}$ at large $\nu$ , which by Equation (44) is also the leading correction to the dimensionless skewness of the measured power spectrum. At large $\nu$ this is strongly suppressed due to the Central Limit Theorem if $\langle\varepsilon\rangle$ does not grow faster than $\sqrt{\sum\nu}$ . In Section III we quantify this with simulations. Corrections due to the squared bispectrum also affect this skewness. Corrections to the posterior mean and variance are smaller, of order $\langle\varepsilon\rangle/\sum\nu$ at large $\nu$ , which is of order the correction to the dimensionless kurtosis of the measured power spectrum.

II.4 Bandpowers

It is usually advantageous to bin the power spectrum in angular scale in order to reduce dimensionality of the required covariance matrix, account for mode mixing due to the survey mask, or to speed up likelihood calculations. Binning has a strong effect on the apparent impact of non-Gaussianity in the covariance matrix, since the Gaussian term will be suppressed due to the increasing mode count while the non-Gaussian term will stay roughly constant.

Our binning scheme is defined by

\hat{C}_{b}=\frac{\sum_{l\in b}W_{l}\hat{C}_{l}}{\sum_{l\in b}W_{l}},

(33)

for a window function $W_{l}$ , where $b$ labels the bin. For narrow bins, an optimal binning scheme would weight with the inverse covariance of $\hat{C}_{l}$ . To avoid dependence on the model we weight by the mode count and take $W_{l}=2l+1$ . We explore different binning choices in Appendix C.

Assuming $C_{l}$ is roughly constant over the extent of the bin, the zero-order characteristic function of the $\hat{C}_{b}$ is

	$\displaystyle\phi_{\{\nu_{b}\hat{C}_{b}\}}(\{J_{b}\})$	$\displaystyle=\prod_{b}\prod_{l\in b}(1+2iJ_{b}C_{l})^{-\nu/2}$
		$\displaystyle\approx\prod_{b}(1+2iJ_{b}C_{b})^{-\nu_{b}/2},$		(34)

where $\nu_{b}\equiv\sum_{l\in b}\nu$ . In other words, the zero-order distribution of the binned power spectra is approximately a Gamma distribution. In practice we can eliminate the need to approximate a constant $C_{l}$ over the bin by working with $\hat{C}_{l}/C_{l}$ , but for sufficiently narrow bins the above approximation is accurate.

In the presence of signal non-Gaussianity, we can make a similar approximation that $C_{l}$ is flat across each bin and simply replace $l_{1,2,3}$ with $b_{1,2,3}$ and $\nu_{1,2,3}$ with $\nu_{b_{1},b_{2},b_{3}}$ everywhere in the likelihood. Quantities such as $T_{l_{1}l_{2}}$ and $\tilde{B}_{l_{1}l_{2}l_{3}}$ can be replaced with their binned versions.

Since the number of modes in a bin scales with its width $\Delta l_{b}$ , the fractional correction to the covariance scales as $\varepsilon_{bb}\propto\Delta l_{b}$ . The number of terms in the sum over $\varepsilon_{b_{1}b_{2}}$ in the denominator of $\langle\varepsilon\rangle$ scales between $1/\Delta l_{b}^{2}$ for non-sparse $\varepsilon_{b_{1}b_{2}}$ and $1/\Delta l_{b}$ for diagonal $\varepsilon_{b_{1}b_{2}}$ . Therefore, corrections from the signal trispectrum scale somewhere between $\langle\varepsilon\rangle\propto\Delta l_{b}$ for diagonal $\varepsilon$ and $\langle\varepsilon\rangle\approx\mathrm{constant}$ for non-sparse $\varepsilon$ . In practice the $\varepsilon$ is not diagonal, and so we can expect our likelihood corrections to be only mildly sensitive to the choice of binning, with the generic feature that the corrections are larger for broader bins. We confirm this in Appendix C.

III Quantifying corrections from non-Gaussianity

In the previous section we derived the leading order correction to the probability density of power spectrum measurements in the presence of signal non-Gaussianity. Considered as a function of the unknown true power spectrum $C_{l}$ , the signal trispectrum $T_{ll^{\prime}}$ , and the squared bispectrum $\tilde{B}_{ll^{\prime}l^{\prime\prime}}$ , this distribution gives the likelihood function. It is the impact of non-Gaussianity on the likelihood (or posterior) that is of most interest to us here.

We have seen that the Fisher information contributed by the parameter dependence of $T_{ll^{\prime}}$ and $\tilde{B}_{ll^{\prime}l^{\prime\prime}}$ is formally of higher order in the (assumed weak) non-Gaussianity, so in this section we will fix these to a fiducial model. Our approach is to use approximate simulations to compute these quantities in a fiducial cosmology, but we could just as well compute them using the halo model or fitting functions. Once we have computed these non-Gaussian signal cumulants, we can use Equation (22) to compute the correction to the likelihood of the $C_{l}$ , or any parameters on which $C_{l}$ depends.

III.1 Simulations

To compute $T_{ll^{\prime}}$ and $\tilde{B}_{ll^{\prime}l^{\prime\prime}}$ we create a large suite of lognormal lensing convergence maps using the flask software package Xavier et al. (2016). Lognormal fields are known to give a reasonable approximation for the covariance of lensing two-point functions Taruya et al. (2002); Hilbert et al. (2011), although their ability to predict configurations of the bispectrum is less well explored Martin et al. (2012). We will cross-validate our results with a small sample of lensing maps from ray-traced from $N$ -body simulations.

We generated $10^{5}$ lognormal convergence maps with a fixed power spectrum generated with camb Lewis et al. (2000); Howlett et al. (2012). Our fiducial galaxy sample is modelled with a Euclid-like lensing sample in mind. We assume a single broad redshift distribution for lensing sources, peaked around $z\approx 0.9$ , having the same functional form as Ref. Euclid Collaboration et al. (2020), with cosmological parameters set to the best-fit values of Ref. Planck Collaboration et al. (2020). The alternative choice of a narrow source redshift bin peaked at $z_{s}=0.325$ , corresponding to the lowest redshift bin of a Euclid-like survey with 10 tomographic bins, is explored in Appendix D. We compute the power spectrum to $l_{\mathrm{max}}=6143$ , applying non-linear and baryon feedback corrections according to the default specifications of Ref. Mead et al. (2021). For simplicity we do not include intrinsic alignments in the model; this will not affect our results. The ‘shift parameter’ corresponding to the (negative) minimum value of the convergence fed to flask is set to $0.01214$ , which was chosen by hand to get a good match to $N$ -body simulations, and is comparable to the value found in Ref. Friedrich et al. (2021) by fitting the 1-point kurtosis of the convergence. Convergence fields are rendered on a healpix⁶⁶6https://healpix.sourceforge.io Górski et al. (2005) grid with $N_{\mathrm{side}}=2048$ . In our actual likelihood computations we only use scales down to $l_{\mathrm{max}}=1000$ , which is sufficiently small to ensure the input power spectrum is recovered with negligible bias⁷⁷7The exponentiation of the Gaussian field needed to produce a lognormal field transfers power to sub-grid scales where it cannot be recovered from the discrete healpix grid. This causes a negative bias in the power spectrum on all scales that increases in severity as the Nyquist frequency $l_{\mathrm{Ny}}\approx 3N_{\mathrm{side}}-1$ is approached., but sufficiently large to cover a realistic range of scales for lensing surveys. We apply the healpix pixel window function to the map to mimic the effect of coarse binning of a higher resolution galaxy catalogue.

To check that the lognormal fields are producing sensible results for the signal cumulants, we compare the convergence statistics with those from a suite of ray-traced $N$ -body simulations from Ref. Takahashi et al. (2017). These simulations assume a thin source redshift plane at $z_{s}=1.0334$ , but the resulting lensing power spectrum only differs from a fiducial $C_{l}$ by a few percent across most scales.

Refer to caption — Figure 1: Ratio of the diagonal elements of the $C_{l}$ covariance matrix to that of Gaussian fields for our lognormal mocks (orange) and the ray-traced $N$ -body simulations of Ref. Takahashi et al. (2017) (blue). We have binned the power spectra with $\Delta\log_{10}l=0.1$ . Ref. Takahashi et al. (2017) speculate that the excess variance at large angular scales in the $N$ -body mocks is due to the finite thickness of their lensing shells.

In Figures 1 and 2 we compare the covariance matrix of the measured $\hat{C_{l}}$ from the lognormal mocks with those of the $N$ -body mocks. Since the latter contains much fewer realizations ( $\sim 100$ ) the recovered statistics are much noisier, but a meaningful comparison is still possible. Figure 1 shows the ratio of the diagonal elements of the covariance matrix compared with a Gaussian prediction based on the input power spectrum (we have binned the $\hat{C}_{l}$ to emphasise the differences). The agreement between the two is reasonable, with the $N$ -body mocks having excess variance on large scales, as discussed in Ref. Takahashi et al. (2017). The lognormal mocks overestimate the variance on small scales, but since we are only seeking approximations to the trispectrum here the difference is sufficiently small. The non-Gaussian contribution to the covariance dramatically kicks up when $l>200$ , due to the signal trispectrum. Note that our lognormal maps are on the full sky and are produced by a local non-linear transform of the linear field, so do not capture the super-sample covariance, but the agreement with the $N$ -body mocks is nonetheless acceptable.

In Figure 2 we plot slices through the correlation matrix. The agreement of the lognormal mocks with the $N$ -body mocks is not perfect, but acceptable. Above $l\approx 200$ mode coupling in the $N$ -body mocks kicks up dramatically, much in the same way as in the diagonal elements. The lognormal mocks do not capture this well, instead predicting a smooth increase, but the agreement is within an order of magnitude and acceptable.

To measure the bispectrum from the simulations we use the estimator

\hat{b}_{l_{1}l_{2}l_{3}}=\frac{1}{N_{l_{1}l_{2}l_{3}}}\sum_{m_{1}=-l_{1}}^{l_{1}}\sum_{m_{2}=-l_{2}}^{l_{2}}\sum_{m_{3}=-l_{3}}^{l_{3}}\begin{pmatrix}l_{1}&l_{2}&l_{3}\\ m_{1}&m_{2}&m_{3}\end{pmatrix}a_{l_{1}m_{1}}a_{l_{2}m_{2}}a_{l_{3}m_{3}},

(35)

where

N_{l_{1}l_{2}l_{3}}=\sqrt{\frac{\nu_{1}\nu_{2}\nu_{3}}{4\pi}}\begin{pmatrix}l_{1}&l_{2}&l_{3}\\ 0&0&0\end{pmatrix},

(36)

which is optimal for Gaussian fields. We use the Python package spherical⁸⁸8https://pypi.org/project/spherical/ to precompute the Wigner $3j$ symbols to allow for rapid summation over the azimuthal wave numbers in Equation (35), but the implementation is slow for $l>100$ so we limit our bispectrum measurements to this maximum multipole.

In Figure 3 we show the reduced bispectrum measured from our lognormal convergence maps and from the $N$ -body mocks of Ref. Takahashi et al. (2017). The agreement is reasonable, although the noise in the $N$ -body measurement is large. Note that the strength of lognormality was chosen to give a good match to the non-Gaussian part of the power spectrum covariance, so the agreement in the bispectrum is not guaranteed by construction. We also find qualitative agreement with the measurements of Ref. Munshi et al. (2020).

To mitigate the impact of noise in our likelihood we fit the non-Gaussian signal polyspectra with smooth multivariate fifth-order polynomials. To make these low-order polynomials good fits we first bin the polyspectra, as described in Section II.4, in equally spaced bins in $\log l$ . We separately fit a univariate polynomial to the ratio of the diagonal elements of the covariance to the Gaussian prediction and a bivariate polynomial to the correlation matrix (excluding the diagonal elements). We fit a trivariate polynomial to the reduced bispectrum before squaring and forming the quantity $\tilde{B}_{l_{1}l_{2}l_{3}}$ . We take care to include only the non-zero elements of the bispectrum in the fit.

With $T_{ll^{\prime}}$ and $\tilde{B}_{ll^{\prime}l^{\prime\prime}}$ in hand we can now compute the size of the correction to weak lensing likelihoods and posteriors from signal non-Gaussianity, using Equation (22).

III.2 Results

The zero-order likelihood function for fixed $\hat{C}_{l}$ is proportional to an Inverse-Gamma distribution. In Figure 4 we show the fractional correction away from this due to signal non-Gaussianity, using $\hat{C}_{l}$ that have been binned in multipole with $\Delta\log l=0.1$ ⁹⁹9The sensitivity to binning choices is mild, and discussed in Section II.4 and Appendix C.. In this figure we assume a fixed measured $\hat{C}_{l}$ and show the correction to the likelihood when the model $C_{l}$ are uniformly scaled relative to $\hat{C}_{l}$ , i.e. we take $C_{l}=A\hat{C}_{l}$ for an amplitude parameter $A$ . The width of the likelihood function is typically $\sigma\equiv 1/\sqrt{\sum\nu/2}$ , the zero-order Gamma (or Wishart when $p\neq 1$ ) uncertainty, so we normalise the horizontal scale in Figure 4 by this quantity. The best-fit value of $A$ is close to unity, so the actual values plotted on the horizontal axis reflect the $l_{\mathrm{max}}=100$ that we have assumed for this plot. We plot the contributions from the trispectrum and the squared bispectrum separately. We have verified that the ‘binned’ version of our model for the sampling distribution provides a close match to the histogram of measured power spectra from the lognormal mocks across the range of scales contributing to the all the figures in the section, although we note that the mean and covariance are both matched by construction.

Figure 4 shows that the contribution from the squared bispectrum $\tilde{B}_{ll^{\prime}l^{\prime\prime}}$ is sub-dominant to that of the trispectrum in the vicinity of a few $\sigma$ around the best-fit model, which is the range of interest. As expected due to the high mode count contributing to information on $A$ the zero-order distribution is close to Gaussian, as demonstrated by the red curve - note that uniform contributions to the fractional difference will get normalised away when we compute posteriors, as shown shortly. In blue we show the fractional correction of a Gaussian having the correct covariance matrix including the trispectrum contribution. In agreement with the discussion in Section II.3, this provides an accurate approximation to our distribution, with the main effect being a quadratic correction corresponding to the broadening of parameter error bars.

In practice we are not so much interested in the likelihood as the posterior. The easiest posterior to compute from our likelihood is the conditional posterior of a single $C_{l}$ assuming all others are fixed to their measured values. This is shown in Figure 5 for $l_{\mathrm{max}}=100$ and again using binned power spectra with $\Delta\log l=0.1$ , which is the same binning choice used to fit the polyspectra from the simulations¹⁰¹⁰10Note that the binning implementation can differ between that used in the likelihood and that used to fit the polyspectra, but we found using different choices gave negligible effect on the results.. The bin barycenters are shown above each panel in Figure 5. On large angular scales (top left panel), non-Gaussianity in the signal is weak and our distribution matches the Gamma distribution very closely (black and yellow curves are indistinguishable). A Gaussian approximation, in contrast, is poor on these scales. As the angular scale of the multipole bin is decreased, the Gaussian and Gamma approximations match ever more closely, with very little difference between any of the distributions at $l\approx 30$ . As the angular scale decreases further, the effects of signal non-Gaussianity start to become important and broaden the distribution through the trispectrum contribution to the covariance matrix. Thus, our distribution starts to deviate from the Gamma and Gaussian-with-Gaussian-covariance distributions (themselves now indistinguishable) visibly at $l\approx 100$ . Most of this deviation is captured by a Gaussian with a non-Gaussian covariance (blue curve), with the residual ‘post-Gaussian’ effect being a small amount of skewness in the distribution. In contrast to the Gamma and Gaussian approximations, our distribution is the only model that correctly matches the large-scale and weakly non-linear-scale limits of the true $C_{l}$ distribution. Similar statements can be made of the marginal distribution of the $C_{l}$ , whose leading-order properties were discussed in Section II.3.1.

It is interesting to note from Figure 5 that the regime where Gamma and Gaussian differ ( $l\lesssim 30$ in this case) is distinct from the regime where signal non-Gaussianity starts to become important ( $l\gtrsim 100$ ). That these regimes are well separated has traditionally been the motivation for considering large-scale likelihood non-Gaussianity from the finite number of modes as a separate effect to the small-scale modification of the covariance from signal non-Gaussianity. The distinction is of course redshift dependent; as the redshift of a source bin is lowered, the angular scale at which signal non-Gaussianity becomes important will increase. We therefore expect that our new likelihood expression will be of unique value when these two sources of likelihood non-Gaussianity must be treated simultaneously, i.e. for low-redshift sources. The precise impact will additionally depend on the parameters of interest. We explore a realistic low- $z_{s}$ scenario in Appendix D, finding that skewness corrections to the posterior can become sizeable (although still $<10\%$ ) for $l_{\mathrm{max}}=1000$ , but are sub-percent once shape noise is included.

It is also straightforward to compute the posterior of an amplitude parameter scaling the signal power spectrum (we again neglect noise for simplicity) as $C_{l}=A\hat{C}_{l}$ , shown in Figure 6. Most of the constraining power on the amplitude comes from small scales, and hence the Gamma and Gaussian-with-Gaussian-covariance (yellow and red curves respectively) overlap closely. Our distribution (black curve) correctly captures the significant increase in parameter variance from the trispectrum. As we have seen in Figure 5, a Gaussian with appropriately modified covariance (blue curve) captures most of the effect of signal non-Gaussianity, with our distribution imparting a small residual skewness into the posterior.

Since our formalism is based on a perturbative series expansion of the signal distribution, it is important to check that we are working in a regime where higher-order terms can be safely neglected. In Section II.3.1 we showed that the leading-order correction to the posterior from signal non-Gaussianity is $\langle\varepsilon\rangle$ , defined in Equation (32). This quantity is the trispectrum contribution to the $\hat{C}_{l}$ covariance matrix divided by the Gaussian contribution, $\varepsilon_{l_{1}l_{2}}$ , averaged over the scales contributing to the parameter of interest with a $\nu$ -dependent weight. It has a simple interpretation that it is equal to the perturbation to the Fisher information from signal non-Gaussianity. We require $\langle\varepsilon\rangle\ll 1$ for our perturbative series to be valid, with the leading-order post-Gaussian (i.e. beyond that predicted by a Gaussian functional form) skewness of order $\langle\varepsilon\rangle/\sqrt{\sum\nu}$ . We have found bispectrum-squared terms to be subdominant, but they are potentially non-negligible when $l\gtrsim 1000$ by the arguments of Section II.3.

In Figure 7 we show a few convergence statistics that can be derived from $\varepsilon_{l_{1}l_{2}}$ . The left panel shows quantities relevant to the posterior of a single $C_{l}$ , in which case $\langle\varepsilon\rangle$ is equivalent to $\varepsilon_{ll}$ , the diagonal elements of the matrix $\varepsilon_{l_{1}l_{2}}$ . We also show the largest eigenvalue (i.e. the spectral radius) of this matrix. A spectral radius much less than unity is sufficient to ensure a valid perturbative expansion of the inverse covariance matrix around its Gaussian term, and must hold if our expression is to give a valid approximation to the pairwise distributions of $C_{l}$ with different $l$ .

As the blue curves in Figure 7 show, the fractional correction from signal non-Gaussianity is much less than unity out to $l\approx 100$ , which justifies our use of this maximum scale in the posteriors computed in this section and explains why the correction appears so small in Figure 5. The leading-order corrections not captured by a Gaussian functional form are given by the orange curves and are never more than $10^{-2}$ even out to $l\approx 1000$ , i.e. sub-percent across all scales of interest for upcoming lensing surveys, an important result. Beyond $l\approx 100$ our perturbative series breaks down, but the quantity $\langle\varepsilon\rangle/\sqrt{\sum\nu}$ is still indicative of the relevance of residual non-Gaussianity in the likelihood, so it is reasonable to draw useful conclusions from Figure 7 even beyond the regime where our likelihood expression is formally valid. We note that $\langle\varepsilon\rangle/\sqrt{\sum\nu}$ is the leading-order correction to the dimensionless skewness of the measured power spectrum from signal non-Gaussianity, valid at all (sufficiently large) $l$ , so it is reasonable to expect that this quantity also quantifies non-Gaussian corrections to the likelihood function. Corrections to the posterior mean and variance not captured by a Gaussian are even more subdominant, suppressed by another factor of $\sqrt{\nu}$ .

The right-hand panel of Figure 7 shows $\langle\varepsilon\rangle$ for an amplitude parameter scaling the model power spectrum. The corrections are comparable in size with those of the individual $C_{l}$ , with $l_{\mathrm{max}}=100$ just about in the regime where our expression is valid. Non-Gaussian effects are still sub-percent even out to $l_{\mathrm{max}}=1000$ . In Appendix C we study the (mild) impact of choosing a different binning scheme for the power spectrum bandpowers, shown in Figure 9.

Taken together, Figures 5 and 7 give the following picture for how the likelihood changes with angular multipole. On the largest scales a Gamma distribution is accurate as the signal is close to Gaussian. By $l\approx 30$ the Gamma distribution has Gaussianized, with corrections being percent-level or small in the central $2\sigma$ region (albeit larger in the tails). As $l\approx 100$ is approached the non-Gaussianity in the signal starts to become important, and a Gaussian with appropriately broadened covariance is a good approximation. Corrections to the Gamma distribution not captured by the Gaussian approximation are sub-percent across all angular scales, but can be accounted for by using our expression Equation (22) on large scales and smoothly matching it to a Gaussian around $l\lesssim 100$ .

One question we wish to answer in this work is whether the strength of signal non-Gaussianity grows faster than the CLT is able to Gaussianize the power spectrum distribution. From Figure 7 it appears that the struggle between these two competing effects is delicately balanced, with a very slow growth in the post-Gaussian correction from signal non-Gaussianity (orange curve) visible as $l_{\mathrm{max}}$ increases. One might wonder whether this correction becomes non-negligible at very high $l_{\mathrm{max}}$ . At some point as $l_{\mathrm{max}}$ is increased the effect of the noise will become important, which suppresses $\varepsilon$ by boosting $C_{l}$ while not adding any non-Gaussianity (we assume the noise is Gaussian here). In Figure 8 we show the size of post-Gaussian effects (i.e. corrections to the Gamma distribution not captured by a Gaussian likelihood) on the posterior with shape noise given by $\sigma_{\gamma}^{2}=\sigma_{e}^{2}/\bar{n}$ , i.e. the model power spectrum is $C_{l}=AS_{0,l}+N_{l}$ . We assume a per-component ellipticity r.m.s. of $\sigma_{e}=0.21$ , and set the source number density to $\bar{n}=3\,\mathrm{arcmin}^{-2}$ , as might be expected in a single tomographic bin in a Euclid-like survey with 10 redshift bins Euclid Collaboration et al. (2020). The Gaussianizing effects of noise start to become important at $l\gtrsim 100$ , and ensure that post-Gaussian effects in the likelihood are never more than sub-percent. For $\bar{n}=30\,\mathrm{arcmin}^{-2}$ this turn-over happens around $l_{\mathrm{max}}\approx 1000$ , suggesting that post-Gaussian effects are safely negligible on small scales for Stage-IV surveys.

As discussed above, it will likely be the case that post-Gaussian effects are most relevant when the largest angular scales in the survey are mildly non-linear, i.e. for low-redshift tomographic bins. These carry little statistical weight in a Euclid-like survey where most of the sources are around $z\approx 1$ , but could be relevant in other surveys. In Appendix D we explore the corrections for the lowest source redshift bin of a 10-bin lensing survey having intrinsic source redshifts comparable to that of a Euclid-like survey, showing that corrections are sub-percent once the effects of shape noise are included. This is partly because the amplitude of the shear power spectrum is suppressed for nearby sources due to the lensing geometry, so it may be that effects are larger for projected galaxy clustering. The quantity $\langle\varepsilon\rangle/\sqrt{\sum\nu}$ , or alternatively the dimensionless skewness of the measured power spectrum, provides an easy-to-compute diagnostic of whether one needs to worry about non-Gaussianity in the likelihood when low-redshift bins are important. If the effects are large, one will also need to account for effects of the mask in a full pseudo- $C_{l}$ analysis to build an accurate likelihood model, as in Ref. Hamimeche and Lewis (2008); Benabed et al. (2009); Upham et al. (2020); de Belsunce et al. (2021), or resort to map-level or likelihood-free methods.

IV Conclusions

We have computed, for the first time, the leading-order correction to the distribution of angular power spectrum estimates of a projected field from non-Gaussianity in the underlying signal. Our expressions are given in Equation (22) in the case of a single map, and by the sum of Equation (81) and Equation (102) in the case of multiple correlated maps. These expressions correct the Wishart distribution for the matrix of power spectra, and consist of a term proportional to the signal trispectrum and a term proportional to the square of its bispectrum. The expression can easily be generalised to include non-Gaussian noise.

Our expressions are correctly normalised, and have the correct mean, covariance, and three-point function. We have shown that, in the limit of a large number of degrees of freedom and sufficiently well behaved non-Gaussianity, the main effect of signal non-Gaussianity is to correct the covariance matrix of the power spectra by the standard trispectrum term. The residual ‘post-Gaussian’ effect of non-Gaussianity is to impart some skewness to the likelihood, on top of that present due to the residual effects of finite degrees of freedom. On small scales, the size of the correction to the posterior of a parameter is given approximately by the fractional correction to the Fisher information from the trispectrum, scaled by the inverse square-root of the total degrees of freedom that contribute. For a given $C_{l}$ and $l\gg 1$ the relevant quantity is $\Delta_{l}\equiv\sqrt{l/2}T_{ll}/C_{l}^{2}$ , where $T_{ll}$ is the trispectrum contribution to the variance. When $\Delta_{l}\ll 1$ , corrections to the likelihood of the $C_{l}$ that are not captured by a Gaussian with appropriately modified covariance are suppressed.

We have provided diagnostic statistics, given in Equation (24) and Equation (32) that quantify corrections to likelihoods and posteriors. These can be computed straightforwardly from a model for the non-Gaussian part of the covariance matrix. Such a model is already a requirement when using a likelihood for two-point statistics, so these statistics should be easy to compute for a given observational setup.

We quantified corrections to the likelihood function using mock non-Gaussian weak lensing maps created assuming lognormal statistics, with input power spectra and noise levels representative of an upcoming Stage-IV survey. The bispectrum-squared term is subdominant to the correction from the trispectrum at least to $l\approx 100$ , but we do not rule out stronger effects on small angular scales. Corrections to the likelihood not captured by a Gaussian are strongly suppressed on small scales in all our tests, which provides evidence that a Gaussian likelihood is sufficient on small scales even in the presence of realistic signal non-Gaussianity.

On large angular scales our likelihood correctly tends to a Gamma/Wishart distribution. We have found that the angular scales where the Gamma transitions to a Gaussian are sufficiently well separated from those where the trispectrum becomes important that it is sufficient to approximate the functional form of the likelihood as Gaussian when incorporating signal non-Gaussianity. This is the approach that has been taken traditionally, albeit without rigorous justification. This property remains true for the lowest redshift bin of a 10-bin Euclid-like lensing survey, but may break down for galaxy clustering power spectra at low redshifts. The likelihood function we have derived is unique amongst commonly used likelihoods in that it has the correct behaviour on both large and small angular scales, being non-perturbative in the degrees of freedom.

Our model relies on the signal non-Gaussianity being weak in a certain, well-defined, sense. We have taken care to identify the regime where this approximation is valid, but have argued that requiring $\Delta_{l}\ll 1$ is a good diagnostic for the post-Gaussian effects even beyond the limiting scale where our expansion is valid. Use of our likelihood in a practical situation will require ensuring that $(2l+1)\Delta_{l}\ll 1$ , or more generally that the corrections from signal non-Gaussianity are small in the likelihood or posterior of interest. This can also be checked by ensuring that non-Gaussianity in the sampling distribution of the relevant combinations of the measured power spectrum is small, which can be checked with simulations (which are in any case necessary to validate or produce covariance matrix estimates).

We have elucidated how the Central Limit Theorem competes with signal non-Gaussianity in suppressing non-Gaussian corrections to the likelihood. The balance is surprisingly delicate, but our conclusion is that our Universe is not non-Gaussian enough on the relevant scales that post-Gaussian terms in the likelihood are necessary in the high- $l$ regime – the Central Limit Theorem wins. This conclusion is strengthened once Gaussian noise (e.g. from galaxy intrinsic shapes or shot noise) is present in the data. This essentially explains why previous works such as Ref. Taylor et al. (2019); Lin et al. (2020) have found a Gaussian likelihood to be sufficient for $\Lambda$ CDM parameter inference in the presence of realistic signal non-Gaussianity. Our findings are also consistent with the results of Ref. Diaz Rivero and Dvorkin (2020), who found that by far the dominant source of non-Gaussianity in the power spectrum likelihood comes from large angular scales where signal non-Gaussianity is negligible. Our formalism has allowed us to cleanly separate the effects of non-Gaussianity in the signal with that from the quadratic nature of the power spectrum estimator, improving upon previous numerical works that have been unable to distinguish between these distinct effects.

An important limitation of our likelihood is that it is only valid for full-sky power spectrum estimates. For the standard pseudo- $C_{l}$ estimator even the zero-order Gaussian-field likelihood is difficult to compute Upham et al. (2020), and accounting for the mode coupling entailed by a survey mask in the non-Gaussian case seems intractable at first sight. The main effect of the mask is to reduce the number of degrees of freedom, increasing parameter variances. As the size of the survey is reduced, we can expect that at some point the largest observable scale is non-linear, in which case a suitably modified version of our likelihood might be a good place from which to build an approximation. The effects of non-Gaussianity from the signal in the case of small surveys may be behind the fairly large posterior corrections found in Ref. Hartlap et al. (2009). In the case that the full-sky power spectra are estimated from the data directly (as in quadratic maximum likelihood approaches) our distribution is likely to be more accurate, and bears some resemblance to the approximations developed in Ref. Seljak et al. (2017). For small survey cuts the mode coupling induced by the mask may only be relevant in the regime where the signal is linear, in which case an $f_{\mathrm{sky}}$ correction to the number of degrees of freedom may be sufficient to incorporate the survey mask into our likelihood. However, it may be the case that a map-level likelihood such as that used in cosmic microwave background analyses Wandelt et al. (2004), a likelihood-free approach Alsing et al. (2019), or suitable data transformations Wang et al. (2019) may be the way forward for modelling these effects in the distribution of projected fields.

While this work has studied the power spectrum likelihood, most weak lensing surveys up to date have used the shear correlation functions to derive their cosmological constraints. An analytic expression for the correlation function likelihood is difficult to write down even in the case of linear shear fields, but may be inferred numerically by sampling power spectra and subsequently transforming to real space. We defer an analytic study of this to a future work, but note that the results of the numerical studies Ref. Lin et al. (2020) and Ref. Taylor et al. (2019) suggest that non-Gaussian corrections are likely negligible for $\Lambda$ CDM models.

This work represents the first rigorous analytic study of the use of a Gaussian approximation for likelihood function of the power spectrum of non-Gaussian fields. While residual non-Gaussian effects appear small on small angular scales, we recommend that this be checked on a survey-by-survey basis using the diagnostic statistics defined in Equations (24) and (32). With the huge increase in statistical constraining power offered by near-future wide imaging surveys, it will be increasingly important to ensure that no biases arise from an incorrect statistical description of the data. Since we have found no evidence that signal non-Gaussianity imparts significant non-Gaussianity into the likelihood beyond that coming from the finite degrees of freedom, our recommendation for upcoming Stage-IV lensing surveys is to model the power spectrum likelihood as Gaussian (with the correct non-linear covariance matrix) on small scales $l\gtrsim 30$ . On large scales in the case that mode coupling from the mask is irrelevant the Gamma (or Wishart in the case of multiple maps) distribution should be used, and when mode coupling is relevant one should resort to map-level, likelihood-free, or existing likelihood approximations. The likelihood we have derived in this work may be used as a bridge between these two regimes.

Acknowledgements.

We thank Karim Benabed, Michael Brown, Alan Heavens, and Robin Upham for useful conversations. We thank Eric Tittley for his maintenance of and help with cuillin, the IfA’s computing cluster ( http://cuillin.roe.ac.uk) partially funded by the Science and Technology Facilities Council (STFC) and the European Research Council. Some of the results in this paper have been derived using the HEALPix Górski et al. (2005) package. AH and ANT acknowledge the support of a UK STFC Consolidated Grant.

Appendix A Power spectrum cumulants and the Edgeworth expansion

When considering how non-Gaussianity in a signal affects the distribution of the power spectrum, it is useful to start by studying the first few cumulants. In this section we will keep the assumption of Gaussian noise, and specialise to a single redshift bin ( $p=1$ ) for simplicity.

Firstly, let us compute the covariance of the power spectrum estimates. Non-Gaussianity contributes via the trispectrum of the signal as

\langle\Delta\hat{C}_{l}\Delta\hat{C}_{l^{\prime}}\rangle=\frac{2}{\nu}C_{l}^{2}\delta_{ll^{\prime}}+T_{ll^{\prime}},

(37)

where $T_{ll^{\prime}}$ is an $m$ -averaged version of the signal trispectrum; explicit forms for this are given in Ref. Okamoto and Hu (2002). We remind the reader here that $C_{l}$ is the sum of signal and noise power. The trispectrum term $T_{ll^{\prime}}$ is in general difficult to compute, with current strategies in weak lensing focused around halo models Cooray and Sheth (2002); Krause and Eifler (2017) or $N$ -body simulations Harnois-Déraps et al. (2012); Sato and Nishimichi (2013).

Now consider the three-point function of $\hat{C}_{l}$ . This consists of terms having six $a_{lm}$ factors. We can compute this from the definition Equation (1), using Wick’s theorem to pick out the various contributions. This gives

\langle\Delta\hat{C}_{l_{1}}\Delta\hat{C}_{l_{2}}\Delta\hat{C}_{l_{3}}\rangle=8\frac{\delta_{l_{1}l_{2}}\delta_{l_{2}l_{3}}}{\nu_{1}^{2}}C_{l_{1}}^{3}+4\tilde{B}_{l_{1}l_{2}l_{3}}+2\tilde{S}_{l_{1}l_{2}l_{3}}+4[3]\frac{\delta_{l_{1}l_{2}}}{\nu_{1}}C_{l_{1}}T_{l_{1}l_{3}}+P_{l_{1}l_{2}l_{3}},

(38)

where the notation $[3]$ denotes the $3$ terms resulting from symmetrization over wave numbers in terms to the right, $P_{l_{1}l_{2}l_{3}}$ is an $m$ -averaged connected signal six-point function (‘pentaspectrum’) and the bispectrum-squared quantities $\tilde{B}$ and $\tilde{S}$ are defined as

	$\displaystyle\tilde{B}_{l_{1}l_{2}l_{3}}$	$\displaystyle\equiv\frac{1}{\nu_{1}\nu_{2}\nu_{3}}\sum_{m_{1},m_{2},m_{3}}\langle s_{l_{1}m_{1}}s_{l_{2}m_{2}}s_{l_{3}m_{3}}\rangle\langle s_{l_{1}m_{1}}^{}s_{l_{2}m_{2}}^{}s_{l_{3}m_{3}}^{*}\rangle,$		(39)
	$\displaystyle\tilde{S}_{l_{1}l_{2}l_{3}}$	$\displaystyle\equiv\frac{1}{\nu_{1}\nu_{2}\nu_{3}}\sum_{m_{1},m_{2},m_{3}}\langle s_{l_{1}m_{1}}s_{l_{1}m_{1}}^{}s_{l_{2}m_{2}}\rangle\langle s_{l_{2}m_{2}}^{}s_{l_{3}m_{3}}s_{l_{3}m_{3}}^{*}\rangle+\mathrm{sym.}$		(40)

By statistical isotropy the signal bispectra must be invariant under rotations, which constrains the terms in angle brackets in Equations (39) and (40) to be proportional to $3j$ -symbols that carry all the $m$ -dependence Cooray and Hu (2000). We will further impose that the signal statistics are invariant under a parity inversion, which introduces a second $3j$ -symbol imposing that all the wave numbers sum to an even integer. Explicitly we have

\langle a_{l_{1}m_{1}}a_{l_{2}m_{2}}a_{l_{3}m_{3}}\rangle=\sqrt{\frac{\nu_{1}\nu_{2}\nu_{3}}{4\pi}}\begin{pmatrix}l_{1}&l_{1}&l_{3}\\ 0&0&0\end{pmatrix}\begin{pmatrix}l_{1}&l_{1}&l_{3}\\ m_{1}&m_{2}&m_{3}\end{pmatrix}b_{l_{1}l_{2}l_{3}},

(41)

where $b_{l_{1}l_{2}l_{3}}$ is the reduced bispectrum. Substituting this expression into Equations (39) and (40), we see that $\tilde{S}$ is proportional to terms like

\sum_{m_{1}}(-1)^{m_{1}}\begin{pmatrix}l_{1}&l_{1}&l_{3}\\ m_{1}&-m_{1}&m_{3}\end{pmatrix}=\delta_{l_{3}0}\delta_{m_{3}0}(-1)^{l_{1}}\sqrt{2l_{1}+1},

(42)

where the identity follows from the orthogonality of the $3j$ -symbols. Since the signal $l=0$ mode is unobservable, the overall contribution from terms of this form is zero, and hence $\tilde{S}=0$ . For $\tilde{B}$ we have

\tilde{B}_{l_{1}l_{2}l_{3}}=\frac{1}{4\pi}\begin{pmatrix}l_{1}&l_{2}&l_{3}\\ 0&0&0\end{pmatrix}^{2}b_{l_{1}l_{2}l_{3}}^{2}.

(43)

As mentioned above, the $3j$ -symbol imposes that $l_{1}+l_{2}+l_{3}$ is an even integer, as well as the triangle conditions $\lvert l_{1}-l_{2}\rvert\leq l_{3}\leq l_{1}+l_{2}$ .

The three-point function of power spectrum estimates thus receives contributions from squared bispectra, products of trispectra and power spectra, and the connected six-point function. Similarly, the connected four-point function will receive contributions from squared trispectra, squared bispectra multiplied by power spectra, bispectra multiplied by the connected five-point function, pentaspectra multiplied by power spectra, and the connected eight-point function.

We want to keep our model for non-Gaussianity as general as possible, so in principle we would like to keep track of the whole cumulant hierarchy of the power spectrum estimates. This is clearly going to be computationally unfeasible however, so we will hereafter assume that the signal non-Gaussianity is weak.

A.1 Weak non-Gaussianity

We will assume that non-Gaussianity in the signal is weak in the sense that each term in its cumulant hierarchy is successively smaller than the last. Introducing the order-counting parameter $\lambda$ , we will assume that $C_{l}\sim\mathcal{O}(\lambda)$ , $b_{l_{1}l_{2}l_{3}}\sim\mathcal{O}(\lambda^{2})$ , $T_{l_{1}l_{2}}\sim\mathcal{O}(\lambda^{3})$ , etc., with $\lambda\ll 1$ . This is justified on sufficiently large scales where non-linearity in the underlying field is weak, and perturbation theory is accurate. We will precisely define the regime in which this perturbative approach is valid later on.

In this scenario, bispectrum-squared terms like $\tilde{B}_{l_{1}l_{2}l_{3}}$ are the same perturbative order as power-trispectrum cross terms like $C_{l_{1}}T_{l_{1}l_{3}}$ , which can also be seen by expanding the fields to second order in the linear signal; the leading-order (tree level) bispectrum features two factors of the linear power spectrum Bernardeau et al. (2002), meaning the squared bispectrum contains four such factors. The trispectrum contains three power spectra at leading order, and hence $C_{l_{1}}T_{l_{1}l_{3}}$ also contains four factors of the linear power spectrum. The connected six-point function is suppressed as $P_{l_{1}l_{2}l_{3}}\sim\mathcal{O}(\lambda^{5})$ . We will retain leading-order ‘non-Gamma’ terms throughout. The 3-pt function of $\hat{C}_{l}$ is thus Equation (38) with $\tilde{S}=0$ and the final $P_{l_{1}l_{2}l_{3}}$ term dropped.

The dimensionless 1-pt skewness of $\hat{C}_{l}$ can be derived from Equation (38) as

	$\displaystyle\frac{\langle\Delta\hat{C}_{l}^{3}\rangle}{\langle\Delta C_{l}^{2}\rangle^{3/2}}$	$\displaystyle=\frac{8C_{l}^{3}/\nu^{2}+4\tilde{B}_{lll}+12C_{l}T_{ll}/\nu}{\left(2C_{l}^{2}/\nu+T_{ll}\right)^{3/2}}$
		$\displaystyle=\sqrt{\frac{8}{\nu}}+\sqrt{2\nu^{3}}\frac{\tilde{B}_{lll}}{C_{l}^{3}}+\frac{3\sqrt{2\nu}}{2}\frac{T_{ll}}{C_{l}^{2}}+\mathcal{O}(\lambda^{2}).$		(44)

It is important to note here that we could consistently neglect the $\mathcal{O}(\lambda^{5})$ term $P_{l_{1}l_{2}l_{3}}$ in both the 3-pt function and its dimensionless version, since the variance itself is $\mathcal{O}(\lambda^{2})$ meaning that connected 6-pt terms contribute at $\mathcal{O}(\lambda^{2})$ in the dimensionless skew. This is in contrast to the connected four-point function of the power spectrum estimates. The 1-pt excess kurtosis for example is

\langle\Delta\hat{C}_{l}^{4}\rangle_{c}=48\frac{C_{l}^{4}}{\nu^{3}}+96\frac{C_{l}\tilde{B}_{lll}}{\nu}+144\frac{C_{l}^{2}T_{ll}}{\nu^{2}},

(45)

where we have kept terms at $\mathcal{O}(\lambda^{5})$ . Working to a consistent $\mathcal{O}(\lambda^{4})$ order would mean dropping all but the first term from this expression. In contrast, the dimensionless kurtosis of $\hat{C}_{l}$ is

	$\displaystyle\frac{\langle\Delta\hat{C}_{l}^{4}\rangle_{c}}{\langle\Delta\hat{C}_{l}^{2}\rangle^{2}}$	$\displaystyle=\frac{48C_{l}^{4}/\nu^{3}+96C_{l}\tilde{B}_{lll}/\nu+144C_{l}^{2}T_{ll}/\nu^{2}}{\left(2C_{l}^{2}/\nu+T_{ll}\right)^{2}}$
		$\displaystyle=\frac{12}{\nu}+24\nu\frac{\tilde{B}_{lll}}{C_{l}^{3}}+24\frac{T_{ll}}{C_{l}^{2}}+\mathcal{O}(\lambda^{2}).$		(46)

Therefore, while the raw non-Gaussian cumulant hierarchy of the power spectrum estimates can be truncated consistently in perturbation theory, all of the dimensionless cumulants, i.e. the cumulants of a normalised $\hat{C}_{l}$ estimator with unit variance, receive corrections from a non-Gaussian signal. The cumulant hierarchy of the normalised estimator cannot therefore be truncated consistently, since terms of order $\mathcal{O}(\lambda)$ always contribute.

The higher-order non-Gaussian cumulants of a normalised $\hat{C}_{l}$ estimator thus do not get successively smaller, in contrast to those of the signal. This observation has important consequences, since successively smaller cumulants are a requirement for the Edgeworth expansion to converge. One might be tempted to speculate that a good way of incorporating the effects of a non-Gaussian signal into the $\hat{C}_{l}$ distribution would be to treat the non-Gaussianity as weak and then Edgeworth expand around a Gamma distribution, but the argument above tells us this will not work; we have to instead expand the signal distribution around a Gaussian. Attempts to use an Edgeworth expansion directly on the distribution of $\hat{C}_{l}$ (or a decorrelated version of it, as in Ref. Lin et al. (2020)) in order to incorporate signal non-Gaussianity should be treated with caution.

Appendix B Derivation of the tomographic redshift bin likelihood

In this Section we derive the correction to the likelihood function of tomographic redshift bin power spectra. It is straightforward to write down the correct likelihood for full-sky spectra when the signal and noise are both Gaussian - this is the Wishart distribution for positive-definite matrices. Similarly, it is straightforward to write down a Gaussian approximation to the Wishart distribution. Here we present the derivation for the case of multiple redshift bins.

Our strategy will be to substitute the Edgeworth-expanded signal characteristic function Equation (12) into the mode coupling integral Equation (9) and perform the integrations to derive the characteristic function of the power spectrum. We will then inverse Fourier-transform this to get the distribution of the power spectrum.

We will compute the trispectrum and bispectrum-squared corrections separately, but start with the simple case of Gaussian signal and noise.

B.1 Gaussian signal, Gaussian noise

As a warm-up, consider the situation where both signal and noise are Gaussian. The former should be a good approximation on large scales where the signal is a linearly-evolved version of the (close to) Gaussian initial conditions, and the latter is practically guaranteed for maps made by compression from large numbers of underlying tracers (e.g. shear maps constructed from averaged galaxy shapes, or density maps made by counting many galaxies).

We assume throughout that the underlying signal and noise obey statistical isotropy, such that modes with different wave vectors are uncorrelated. When the signal and noise are both Gaussian, the covariance of different modes captures all the statistical information of the field, implying that modes with different wave vectors are not only uncorrelated but statistically independent. The characteristic functions of signal and noise thus factorize, and since the Fourier transform of a Gaussian is also a Gaussian we have

	$\displaystyle\phi_{s}(\{\mathbf{k}_{lm}\})$	$\displaystyle=e^{-\frac{1}{2}\sum_{lm}\mathbf{k}_{lm}^{\dagger}\mathbf{S}_{l}\mathbf{k}_{lm}},$		(47)
	$\displaystyle\phi_{n}(\{\mathbf{k}_{lm}\})$	$\displaystyle=e^{-\frac{1}{2}\sum_{lm}\mathbf{k}_{lm}^{\dagger}\mathbf{N}_{l}\mathbf{k}_{lm}},$		(48)

where $\mathbf{S}_{l}$ and $\mathbf{N}_{l}$ are the signal and noise covariance matrices.

Equation (9) factorizes in each $l$ and $m$ , with each integral a Gaussian integral. Making these substitutions immediately gives

\phi_{\{\nu\hat{\mathbf{C}}_{l}\}}(\{\mathbf{J}_{l}\})=\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\lvert\mathbf{I}+2i\mathbf{J}_{l}\mathbf{C}_{l}\rvert^{-\frac{\nu}{2}},

(49)

where $\mathbf{C}_{l}$ is the total (signal plus noise) covariance matrix of the map. We recognise this as a product of characteristic functions of Wishart matrices.

The joint distribution of the power spectra is thus

p(\{\nu\hat{\mathbf{C}}_{l}\})=\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}2^{\frac{p(p-1)}{2}}\int\frac{\mathrm{d}\mathbf{J}_{l}}{(2\pi)^{n}}e^{i\mathrm{Tr}(\nu\mathbf{J}_{l}\hat{\mathbf{C}}_{l})}\lvert\mathbf{I}+2i\mathbf{J}_{l}\mathbf{C}_{l}\rvert^{-\frac{\nu}{2}},

(50)

i.e. each $\hat{\mathbf{C}}_{l}$ is independent.

The integral in Equation (50) is almost the standard integral that defines the multivariate Gamma function, with the exception that it is over real symmetric matrices rather than positive definite matrices. Now, let $\mathbf{T}\equiv 2\mathbf{C}_{l}^{\frac{1}{2}}\mathbf{J}_{l}\mathbf{C}_{l}^{\frac{1}{2}}$ and change integration variables from $\mathbf{J}_{l}$ to $\mathbf{T}$ . We’ll assume the total covariance matrix is positive definite. The Jacobian of this transformation is

\mathrm{d}\mathbf{J}_{l}=2^{-\frac{p(p+1)}{2}}\left|\mathbf{C}_{l}\right|^{-\frac{(p+1)}{2}}\mathrm{d}\mathbf{T},

(51)

which follows from a standard result. Each term in the product is therefore

\frac{2^{-p}\left|\mathbf{C}_{l}\right|^{-\frac{(p+1)}{2}}}{(2\pi)^{\frac{p(p+1)}{2}}}\int\mathrm{d}\mathbf{T}\;e^{\frac{i}{2}\mathrm{Tr}(\bm{\Omega}\mathbf{T})}\lvert\mathbf{I}+i\mathbf{T}\rvert^{-\frac{\nu}{2}}

(52)

where $\bm{\Omega}\equiv\nu\mathbf{C}_{l}^{-\frac{1}{2}}\hat{\mathbf{C}}_{l}\mathbf{C}_{l}^{-\frac{1}{2}}$ is symmetric and the integration is over all real symmetric $p\times p$ matrices $\mathbf{T}$ . Our task is therefore to compute the integral

P_{p,\nu}(\bm{\Omega})\equiv\int\mathrm{d}\mathbf{T}\;e^{\frac{i}{2}\mathrm{Tr}(\bm{\Omega}\mathbf{T})}\lvert\mathbf{I}+i\mathbf{T}\rvert^{-\frac{\nu}{2}}.

(53)

There are several ways of solving this integral, as discussed in, e.g. Ref. Janik and Nowak (2003). Making the substitution $\mathbf{Z}=\mathbf{I}+i\mathbf{T}$ in the definition of $P_{p,\nu}(\bm{\Omega})$ gives

P_{p,\nu}(\bm{\Omega})=i^{-\frac{p(p+1)}{2}}e^{-\frac{1}{2}\mathrm{Tr}(\bm{\Omega})}\int_{\mathrm{Re}(\mathbf{Z})=\mathbf{I}}\mathrm{d}\mathbf{Z}\,e^{\frac{1}{2}\mathrm{Tr}(\bm{\Omega}\mathbf{Z})}\lvert\mathbf{Z}\rvert^{-\frac{\nu}{2}},

(54)

where the integral is over a fixed real part and a symmetric imaginary part. The integral is an inverse Laplace transform, and can be derived from the result Herz (1955)

\frac{2^{\frac{1}{2}p(p-1)}}{(2\pi i)^{\frac{1}{2}p(p+1)}}\int_{\mathrm{Re}(Z)=X_{0}>0}\mathrm{d}\mathbf{Z}\,e^{\mathrm{Tr}(\mathbf{Z}\bm{\Lambda})}\lvert\mathbf{Z}\rvert^{-a-\frac{1}{2}(p+1)}=\frac{\lvert\bm{\Lambda}\rvert^{a}}{\Gamma_{p}\left[a+\frac{1}{2}(p+1)\right]},

(55)

where $\bm{\Lambda}>0$ and the integral is over symmetric imaginary parts and a fixed positive definite real part. We require $\bm{\Omega}>0$ , i.e. $\nu\geq p$ . Since $X_{0}=I$ is positive definite, we get, after some rearranging,

P_{p,\nu}(\bm{\Omega})=\frac{2^{\frac{p(p-\nu+3)}{2}}\pi^{\frac{p}{2}(p+1)}}{\Gamma_{p}\left(\frac{\nu}{2}\right)}\left|\bm{\Omega}\right|^{\frac{\nu-p-1}{2}}e^{-\frac{1}{2}\mathrm{Tr}(\bm{\Omega})}.

(56)

Substituting $P_{p,\nu}(\bm{\Omega})$ back in to Equation (50) gives

p(\{\nu\hat{\mathbf{C}}_{l}\})=\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\frac{\lvert\nu\hat{\mathbf{C}}_{l}\rvert^{(\nu-p-1)/2}}{2^{\frac{p\nu}{2}}\Gamma_{p}\left(\frac{\nu}{2}\right)\left|\mathbf{C}_{l}\right|^{\frac{\nu}{2}}}e^{-\frac{\nu}{2}\mathrm{Tr}(\mathbf{C}_{l}^{-1}\hat{\mathbf{C}}_{l})},

(57)

i.e.

	$\displaystyle p(\{\hat{\mathbf{C}}_{l}\})$	$\displaystyle=\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\frac{\lvert\hat{\mathbf{C}}_{l}\rvert^{(\nu-p-1)/2}}{2^{\frac{p\nu}{2}}\Gamma_{p}\left(\frac{\nu}{2}\right)\left\|\mathbf{C}_{l}/\nu\right\|^{\frac{\nu}{2}}}e^{-\frac{\nu}{2}\mathrm{Tr}(\mathbf{C}_{l}^{-1}\hat{\mathbf{C}}_{l})}$
		$\displaystyle\equiv\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}W_{p}(\hat{\mathbf{C}}_{l};\mathbf{C}_{l}/\nu,\nu),$		(58)

where $W_{p}$ is the $p$ -dimensional Wishart density.

B.2 Convergence of the Wishart distribution to Gaussianity

The convergence of probability distributions is most easily studied at the level of the characteristic function. A single $\hat{\mathbf{C}}_{l}$ has the characteristic function

\phi_{\hat{\mathbf{C}}_{l}}(\mathbf{J}_{l})=\left|\mathbf{I}+\frac{2i\mathbf{J}_{l}\mathbf{C}_{l}}{\nu}\right|^{-\frac{\nu}{2}}.

(59)

Consider the characteristic function of $\mathbf{M}_{l}\equiv\mathbf{C}^{-1/2}\hat{\mathbf{C}}_{l}\mathbf{C}_{l}^{-1/2}$ . This is

\phi_{\mathbf{M}_{l}}(\mathbf{J}_{l})=\left|\mathbf{I}+\frac{2i\mathbf{J}_{l}}{\nu}\right|^{-\frac{\nu}{2}}.

(60)

We can rewrite this as

\phi_{\mathbf{M}_{l}}(\mathbf{J}_{l})=\left|\mathbf{I}+\frac{2i\mathbf{R}\mathbf{J}_{l}\mathbf{R}^{T}}{\nu}\right|^{-\frac{\nu}{2}}.

(61)

where $\mathbf{R}$ is any orthogonal matrix. Choosing $\mathbf{R}$ as the matrix that diagonalizes $\mathbf{J}_{l}$ , this gives

\phi_{\mathbf{M}_{l}}(\mathbf{J}_{l})=\prod_{i=1}^{p}\left[1+\frac{2i\lambda_{i}}{\nu}\right]^{-\frac{\nu}{2}}.

(62)

where $\lambda_{i}$ are the eigenvalues of $\mathbf{J}_{l}$ .

Now fixing $\mathbf{J}_{l}$ , taking $\nu\to\infty$ and keeping terms up to $\mathcal{O}(\nu^{-1})$ gives

	$\displaystyle\lim_{\nu\to\infty}\phi_{\mathbf{M}_{l}}(\mathbf{J}_{l})$	$\displaystyle=\exp\left(-i\sum_{i=1}^{p}\lambda_{i}-\sum_{i=1}^{p}\frac{\lambda_{i}^{2}}{\nu}\right)$
		$\displaystyle=\exp\left[-i\mathrm{Tr}(\mathbf{J}_{l})-\frac{1}{\nu}\mathrm{Tr}(\mathbf{J}_{l}\mathbf{J}_{l})\right]$		(63)

This implies that the characteristic function of our original variable is

\lim_{\nu\to\infty}\phi_{\hat{\mathbf{C}}_{l}}(\mathbf{J}_{l})=\exp\left[-i\mathrm{Tr}(\mathbf{J}_{l}\mathbf{C}_{l})-\frac{1}{\nu}\mathrm{Tr}(\mathbf{J}_{l}\mathbf{C}_{l}\mathbf{J}_{l}\mathbf{C}_{l})\right].

(64)

The inverse Fourier transform is a Gaussian integral, giving

\lim_{\nu\to\infty}p(\hat{\mathbf{C}}_{l})=2^{\frac{p(p-1)}{2}}\int\frac{\mathrm{d}\mathbf{J}_{l}}{(2\pi)^{\frac{p(p+1)}{2}}}e^{i\mathrm{Tr}[\mathbf{J}_{l}(\hat{\mathbf{C}}_{l}-\mathbf{C}_{l})]}e^{-\frac{1}{\nu}\mathrm{Tr}(\mathbf{J}_{l}\mathbf{C}_{l}\mathbf{J}_{l}\mathbf{C}_{l})}.

(65)

To do this integral, it is advantageous to move to vectorized notation Gupta and Nagar (2000). For a general $p\times p$ matrix $Y$ define $\mathrm{vec}(Y)$ as the $p^{2}$ -dimensional vector whose elements are $(Y_{1,1},\dots,Y_{p,1},Y_{1,2},\dots,Y_{p,2},\dots)$ . For a symmetric $p\times p$ matrix $X$ , define $\mathrm{vecp}(X)$ as the $\frac{1}{2}p(p+1)$ -dimensional vector whose elements are $(X_{1,1},X_{1,2},\dots,X_{1,p},X_{2,2},X_{2,3},\dots)$ . Then some standard matrix identities give

	$\displaystyle\mathrm{Tr}(\mathbf{X}\mathbf{D}\mathbf{X}\mathbf{E})$	$\displaystyle=\mathrm{vec}^{T}(\mathbf{X})(\mathbf{E}\otimes\mathbf{D})\mathrm{vec}(\mathbf{X})$
		$\displaystyle=\mathrm{vecp}^{T}(\mathbf{X})\mathbf{B}_{p}^{+}(\mathbf{E}\otimes\mathbf{D})\mathbf{B}_{p}^{+T}\mathrm{vecp}(\mathbf{X}),$		(66)

where $\otimes$ is the Kronecker product, and $\mathbf{B}_{p}^{+}=(\mathbf{B}_{p}^{T}\mathbf{B}_{p})^{-1}\mathbf{B}_{p}^{T}$ is the Moore-Penrose inverse of $\mathbf{B}_{p}$ , the $p^{2}\times\frac{1}{2}p(p+1)$ transition matrix. This matrix satisfies $\mathrm{vec}(\mathbf{X})=\mathbf{B}_{p}^{+T}\mathrm{vecp}(\mathbf{X})$ and $\mathrm{vec}(\mathbf{X})=\mathbf{B}_{p}^{T}\mathrm{vecp}(\mathbf{X})$ , and has typical elements given by Gupta and Nagar (2000)

(\mathbf{B}_{p})_{ij,gh}=\frac{1}{2}(\delta_{ig}\delta_{jh}+\delta_{ih}\delta_{jg}),\,i\leq p,\,j\leq p,\,g\leq h\leq p.

(67)

Note that $\mathbf{B}_{p}^{+}$ is a $\frac{1}{2}p(p+1)\times p^{2}$ matrix. We also have

	$\displaystyle\mathrm{Tr}(\mathbf{X}\mathbf{A})$	$\displaystyle=\mathrm{vec}^{T}(\mathbf{X})\mathrm{vec}(\mathbf{A})$
		$\displaystyle=\mathrm{vecp}^{T}(\mathbf{X})\mathbf{B}_{p}^{+}\mathbf{B}_{p}^{+T}\mathrm{vecp}(\mathbf{A}).$		(68)

Making these substitutions, defining $\Delta\hat{\mathbf{C}}_{l}=\hat{\mathbf{C}}_{l}-\mathbf{C}_{l}$ , and using that $\lvert\mathbf{B}_{p}^{T}\mathbf{B}_{p}\rvert=2^{-\frac{1}{2}p(p-1)}$ gives

	$\displaystyle\lim_{\nu\to\infty}p(\hat{\mathbf{C}}_{l})$	$\displaystyle=2^{\frac{p(p-1)}{2}}\int\frac{\mathrm{dvecp}(\mathbf{J}_{l})}{(2\pi)^{\frac{p(p+1)}{2}}}e^{i\mathrm{vecp}^{T}(\mathbf{J}_{l})\mathbf{B}_{p}^{+}\mathbf{B}_{p}^{+T}\mathrm{vecp}(\Delta\hat{\mathbf{C}}_{l})}e^{-\frac{1}{\nu}\mathrm{vecp}^{T}(\mathbf{J}_{l})\mathbf{B}_{p}^{+}(\mathbf{C}_{l}\otimes\mathbf{C}_{l})\mathbf{B}_{p}^{+T}\mathrm{vecp}(\mathbf{J}_{l})}$
		$\displaystyle=\frac{\exp\left\{-\frac{1}{2}\mathrm{vecp}^{T}(\Delta\hat{\mathbf{C}}_{l})\left[\frac{2}{\nu}\mathbf{B}_{p}^{T}(\mathbf{C}_{l}\otimes\mathbf{C}_{l})\mathbf{B}_{p}\right]^{-1}\mathrm{vecp}(\Delta\hat{\mathbf{C}}_{l})\right\}}{\sqrt{(2\pi)^{\frac{1}{2}p(p+1)}\left\|\frac{2}{\nu}\mathbf{B}_{p}^{T}(\mathbf{C}_{l}\otimes\mathbf{C}_{l})\mathbf{B}_{p}\right\|}},$		(69)

i.e. $\mathrm{vecp}(\hat{\mathbf{C}}_{l})$ is $\frac{1}{2}p(p+1)$ -dimensional Gaussian with mean $\mathrm{vecp}(\mathbf{C}_{l})$ and covariance matrix $\frac{2}{\nu}\mathbf{B}_{p}^{T}(\mathbf{C}_{l}\otimes\mathbf{C}_{l})\mathbf{B}_{p}$ . The matrix $\hat{\mathbf{C}}_{l}$ is thus symmetric matrix normal. Note that the mean and covariance matrix do not change compared with their Wishart forms when the limit is taken.

We have shown that the characteristic function of the Wishart distribution tends pointwise to that of a symmetric matrix Gaussian in the limit $\nu\to\infty$ . This is enough to establish that Wishart matrices converge in distribution to symmetric Gaussian matrices, by Lévy’s continuity theorem.

B.3 Signal with non-zero trispectrum

We write the signal characteristic function as

\phi_{s}(\{\mathbf{k}_{lm}\})=\left(1+\frac{1}{24}\kappa^{ijkm}_{\underline{lm}}k^{i}_{l_{1}m_{1}}k^{j}_{l_{2}m_{2}}k^{k}_{l_{3}m_{3}}k^{m}_{l_{4}m_{4}}\right)\phi_{s}^{G}(\{\mathbf{k}_{lm}\}).

(70)

Substituting this into Equation (9) gives a Gaussian integral, as in the Gaussian signal case, but now with four factors of the wave number in the integrand. This integral can be done analytically, which gives the power spectrum characteristic function as

\phi_{\{\nu\hat{\mathbf{C}}_{l}\}}(\{\mathbf{J}_{l}\})=\left(\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\lvert\mathbf{I}+2i\mathbf{J}_{l}\mathbf{C}_{l}\rvert^{-\frac{\nu}{2}}\right)\left[1+\frac{1}{8}\sum_{l_{1},l_{2}}(2l_{1}+1)(2l_{2}+1)\langle\Delta\hat{C}^{ij}_{l_{1}}\Delta\hat{C}^{km}_{l_{2}}\rangle_{\mathrm{NG}}M^{ij}_{l_{1}}M^{km}_{l_{2}}\right].

(71)

A few consequences of the corrected distribution are immediately evident from Equation (71). Firstly, the leading-order correction in the limit $J_{l}\to 0$ is quadratic in $J_{l}$ , which tells us that both the normalization and the mean of $p(\hat{C}_{l})$ are unaffected by our correction (the latter is true by construction), but the covariance does receive a correction, given by $\langle\Delta\hat{C}^{ij}_{l_{1}}\Delta\hat{C}^{km}_{l_{2}}\rangle_{\mathrm{NG}}$ . This is as expected, agreeing with the non-perturbative expression Equation (37).

Inverse Fourier-transforming Equation (71) gives the correction to the probability density as

\Delta p(\{\nu\hat{\mathbf{C}}_{l}\})=\frac{1}{8}2^{\frac{\lambda p(p-1)}{2}}\int\frac{\mathrm{d}\{\mathbf{J}_{l}\}}{(2\pi)^{n\lambda}}e^{i\sum_{l}\mathrm{Tr}(\nu\mathbf{J}_{l}\hat{\mathbf{C}}_{l})}\left(\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\lvert\mathbf{I}+2i\mathbf{J}_{l}\mathbf{C}_{l}\rvert^{-\frac{\nu}{2}}\right)\sum_{l_{1},l_{2}}\nu_{1}\nu_{2}\langle\Delta\hat{C}^{ij}_{l_{1}}\Delta\hat{\mathrm{C}}^{km}_{l_{2}}\rangle_{\mathrm{NG}}(\mathbf{M}_{l_{1}})^{ij}(\mathbf{M}_{l_{2}})^{km},

(72)

with $\mathbf{M}_{l}=[\mathbf{C}_{l}+(2i\mathbf{J}_{l})^{-1}]^{-1}$ and $\lambda\equiv l_{\mathrm{max}}-l_{\mathrm{min}}+1$ . The quantity $\langle\Delta\hat{C}^{ij}_{l_{1}}\Delta\hat{C}^{km}_{l_{2}}\rangle_{\mathrm{NG}}$ is the non-Gaussian contribution to the covariance matrix. In the $p=1$ case we call this $T_{l_{1}l_{2}}$ . The crucial effect of non-Gaussianity here is to couple $\hat{C}_{l}$ with different $l$ , a result of non-zero off-diagonal terms in the non-Gaussian covariance matrix; the probability density no longer factorizes in $l$ .

The integrals in Equation (72) are of the Wishart type when $l_{1}\neq l_{2}$ , and the only new term is that with $l_{1}=l_{2}$ . After some rearranging and relabelling, one can show that it is necessary to compute the integral

H^{ab,cd}_{p,\nu}(\bm{\Omega})\equiv\int\mathrm{d}\mathbf{T}\,e^{\frac{i}{2}\mathrm{Tr}(\bm{\Omega}\mathbf{T})}\,\lvert\mathbf{I}+i\mathbf{T}\rvert^{-\frac{\nu}{2}}\left[(\mathbf{I}+i\mathbf{T})^{-1}\right]_{ab}\left[(\mathbf{I}+i\mathbf{T})^{-1}\right]_{cd}.

(73)

We’ll evaluate this using derivatives, but first we need to establish a few rules for taking derivatives with respect to symmetric matrices. These can be found in Ref. Petersen and Pedersen (2008).

For a symmetric matrix $\mathbf{X}$ , we have

$\displaystyle\frac{\partial\mathbf{X}}{\partial X_{ij}}$	$\displaystyle=\mathbf{J}^{ij}+\mathbf{J}^{ji}-\mathbf{J}^{ij}\mathbf{J}^{ij}$	(74)
$\displaystyle\frac{\partial\mathrm{Tr}(\mathbf{A}\mathbf{X})}{\partial\mathbf{X}}$	$\displaystyle=\mathbf{A}+\mathbf{A}^{T}-(\mathbf{A}\circ\mathbf{I})$	(75)
$\displaystyle\frac{\partial\mathrm{det}\mathbf{X}}{\partial\mathbf{X}}$	$\displaystyle=(\mathrm{det}\mathbf{X})[2\mathbf{X}^{-1}-(\mathbf{X}^{-1}\circ\mathbf{I})],$	(76)

where $\circ$ denotes the Hadamard (element-wise) product, and $\mathbf{J}^{ij}$ is the matrix whose $km$ element is $\delta_{ik}\delta_{jm}$ , i.e. the matrix whose only non-zero values are in cell $ij$ .

We can therefore write the following identity holding for symmetric matrices $\mathbf{T}$ :

\lvert\mathbf{I}+i\mathbf{T}\rvert^{-\frac{\nu}{2}}(\mathbf{I}+i\mathbf{T})^{-1}_{ij}=\frac{i}{\nu}\frac{\partial}{\partial T_{ij}}\lvert\mathbf{I}+i\mathbf{T}\rvert^{-\frac{\nu}{2}}+\frac{1}{2}(\mathbf{I}+i\mathbf{T})^{-1}_{ij}\delta_{ij},

(77)

with no summation over repeated indices. Using this identity one can establish the identity

	$\displaystyle\lvert\mathbf{I}+i\mathbf{T}\rvert^{-\frac{\nu}{2}}(\mathbf{I}+i\mathbf{T})^{-1}_{ij}(\mathbf{I}+i\mathbf{T})^{-1}_{km}$	$\displaystyle=\frac{i}{\nu}\frac{\partial}{\partial T_{ij}}\left[\lvert\mathbf{I}+i\mathbf{T}\rvert^{-\frac{\nu}{2}}(\mathbf{I}+i\mathbf{T})^{-1}_{km}\right]+\frac{1}{2}\lvert\mathbf{I}+i\mathbf{T}\rvert^{-\frac{\nu}{2}}(\mathbf{I}+i\mathbf{T})^{-1}_{ij}\delta_{ij}(\mathbf{I}+i\mathbf{T})^{-1}_{km}$
		$\displaystyle-\frac{1}{\nu}\lvert\mathbf{I}+i\mathbf{T}\rvert^{-\frac{\nu}{2}}\left[2(\mathbf{I}+i\mathbf{T})^{-1}_{i\left(k\right.}(\mathbf{I}+i\mathbf{T})^{-1}_{m\left.\right)j}-\delta_{ij}(\mathbf{I}+i\mathbf{T})^{-1}_{kj}(\mathbf{I}+i\mathbf{T})^{-1}_{mi}\right],$		(78)

where again there is no summation over repeated indices, and parentheses denote symmetrization over enclosed indices. Substituting this in to Equation (73), integrating by parts and noting that the boundary term vanishes (which it must do for the Fourier transforms to converge) gives

H^{ijkm}_{p,\nu}(\bm{\Omega})=\frac{1}{2\nu}(2\bm{\Omega}-\bm{\Omega}\circ\mathbf{I})_{ij}P_{p,\nu}(\bm{\Omega})\frac{\Omega_{km}}{\nu}+\frac{1}{2}\delta_{ij}H^{ijkm}_{p,\nu}(\bm{\Omega})-\frac{2}{\nu}H^{i(km)j}_{p,\nu}(\bm{\Omega})+\frac{1}{\nu}\delta_{ij}H^{kjmi}_{p,\nu}(\bm{\Omega}).

(79)

Permuting indices we can get three simultaneous equations for $H^{ijkm}_{p,\nu}$ . The solution is

H^{ijkm}_{p,\nu}(\bm{\Omega})=\frac{(\nu+1)}{\nu(\nu+2)(\nu-1)}\left[\Omega_{ij}\Omega_{km}-\frac{2}{(\nu+1)}\Omega_{i\left(k\right.}\Omega_{\left.m\right)j}\right]P_{p,\nu}(\bm{\Omega}).

(80)

Putting everything together gives the corrected density for the $\hat{\mathbf{C}}_{l}$ as

	$\displaystyle p(\{\hat{\mathbf{C}}_{l}\})$	$\displaystyle=\left[\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}W_{p}(\hat{\mathbf{C}}_{l};\mathbf{C}_{l}/\nu,\nu)\right]\left\{1+\frac{1}{8}\sum_{l_{1},l_{2}}\nu_{1}\nu_{2}\langle\Delta\hat{C}^{ab}_{l_{1}}\Delta\hat{C}^{cd}_{l_{2}}\rangle_{\mathrm{NG}}C^{-1}_{l_{1},ai}C^{-1}_{l_{1},bj}C^{-1}_{l_{2},ck}C^{-1}_{l_{2},dm}\vphantom{\frac{1}{2}}\right.$
		$\displaystyle\times\left.\left[\Delta\hat{C}_{l_{1},ij}\Delta\hat{C}_{l_{2},km}+\frac{2\delta_{l_{1}l_{2}}}{(\nu+2)(\nu-1)}\left(\hat{C}_{l_{1},ij}\hat{C}_{l_{1},km}-\nu_{1}\hat{C}_{l_{1},i\left(k\right.}\hat{C}_{l_{1},\left.m\right)j}\right)\right]\right\},$		(81)

with implicit summation over repeated indices. One can verify that the distribution Equation (81) is correctly normalised and has a mean given by $\langle\hat{\mathbf{C}}_{l}\rangle=\mathbf{C}_{l}$ , i.e. unchanged from the Wishart case.

We can write the correction in a more compact vectorial notation as follows. We define the Gaussian covariance matrix

\bm{\Sigma}_{l}\equiv\langle\mathrm{vecp}(\Delta\hat{\mathbf{C}}_{l})\mathrm{vecp}^{T}(\Delta\hat{\mathbf{C}}_{l})\rangle_{G}=\frac{2}{\nu}\mathbf{B}_{p}^{T}(\mathbf{C}_{l}\otimes\mathbf{C}_{l})\mathbf{B}_{p},

(82)

which implies that $\bm{\Sigma}^{-1}_{l}=\frac{\nu}{2}\mathbf{B}_{p}^{+}(\mathbf{C}_{l}^{-1}\otimes\mathbf{C}_{l}^{-1})\mathbf{B}_{l}^{+T}$ . After some manipulations, the term quadratic in $\Delta\hat{C}_{l}$ in Equation (17) can be written as

\frac{1}{2}\sum_{l_{1},l_{2}}\mathrm{vecp}^{T}(\Delta\hat{\mathbf{C}}_{l_{1}})\bm{\Sigma}_{l_{1}}^{-1}\langle\mathrm{vecp}(\Delta\hat{\mathbf{C}}_{l_{1}})\mathrm{vecp}(\Delta\hat{\mathbf{C}}_{l_{2}})\rangle_{\mathrm{NG}}\bm{\Sigma}_{l_{2}}^{-1}\mathrm{vecp}^{T}(\Delta\hat{\mathbf{C}}_{l_{2}}),

(83)

which is the next-to-leading-order term in the expansion of

-\frac{1}{2}\sum_{l_{1},l_{2}}\mathrm{vecp}^{T}(\Delta\hat{\mathbf{C}}_{l_{1}})\left[\bm{\Sigma}_{l_{1}}\delta_{l_{1}l_{2}}+\langle\mathrm{vecp}(\Delta\hat{\mathbf{C}}_{l_{1}})\mathrm{vecp}^{T}(\Delta\hat{\mathbf{C}}_{l_{2}})\rangle_{\mathrm{NG}}\right]^{-1}\mathrm{vecp}^{T}(\Delta\hat{\mathbf{C}}_{l_{2}}),

(84)

i.e. an expansion of a chi-squared having the inverse of the total covariance matrix. Once the Gaussian part of this expansion is ‘available’ when the $\nu\to\infty$ limit is taken, we will be left a Gaussian possessing the correct inverse covariance matrix in its exponent.

The first of the two terms quadratic in $\hat{C}_{l}$ in Equation (81) has the same form as the term above, and can be written as a quadratic in $\mathrm{vecp}(\hat{\mathbf{C}}_{l})$ . Note that this term is suppressed with respect to the second term quadratic in $\hat{C}_{l}$ by a factor of $\nu$ . This second term can be written as, with $\mathbf{A}_{l}\equiv\mathbf{C}_{l}^{-1}\hat{\mathbf{C}}_{l}\mathbf{C}_{l}^{-1}$

	$\displaystyle-\frac{1}{4}\sum_{l_{1}}\frac{\nu_{1}^{3}}{(\nu_{1}+2)(\nu_{1}-1)}\mathrm{Tr}\left[(\mathbf{A}_{l_{1}}\otimes\mathbf{A}_{l_{1}})\langle\mathrm{vec}(\Delta\hat{\mathbf{C}}_{l_{1}})\mathrm{vec}^{T}(\Delta\hat{\mathbf{C}}_{l_{1}})\rangle_{\mathrm{NG}}\right]$
	$\displaystyle=-\frac{1}{4}\sum_{l_{1}}\frac{\nu_{1}^{3}}{(\nu_{1}+2)(\nu_{1}-1)}\mathrm{Tr}\left[\mathbf{B}_{p}^{+}(\mathbf{A}_{l_{1}}\otimes\mathbf{A}_{l_{1}})\mathbf{B}_{p}^{+T}\langle\mathrm{vecp}(\Delta\hat{\mathbf{C}}_{l_{1}})\mathrm{vecp}^{T}(\Delta\hat{\mathbf{C}}_{l_{1}})\rangle_{\mathrm{NG}}\right].$		(85)

This looks like the next-to-leading-order term in an expansion of the determinant of the total covariance, with the exception that it depends on the measured power. In the limit $\nu_{1}\to\infty$ we can write $\hat{\mathbf{C}}_{l}=\mathbf{C}_{l}+\mathcal{O}(\nu^{-1})$ , so we set $\hat{\mathbf{C}}_{l}=\mathbf{C}_{l}$ in this term and hence $\mathbf{A}_{l}=\mathbf{C}_{l}^{-1}$ . In that limit, this term dominates over the first term quadratic in $\hat{C}_{l}$ and is precisely the expansion of the determinant of the total covariance, i.e. the prefactor in the asymptotic Gaussian density.

B.4 Signal with non-zero bispectrum

We now consider a signal having non-zero bispectrum. The signal characteristic function reads

\phi_{s}(\{\mathbf{k}_{lm}\})=\left(1-\frac{1}{72}\kappa^{ijk}_{\underline{lm}}\kappa^{lmn}_{\underline{l^{\prime}m^{\prime}}}k^{i}_{l_{1}m_{1}}k^{j}_{l_{2}m_{2}}k^{k}_{l_{3}m_{3}}k^{l}_{l_{1}^{\prime}m_{1}^{\prime}}k^{m}_{l_{2}^{\prime}m_{2}^{\prime}}k^{n}_{l_{3}^{\prime}m_{3}^{\prime}}\right)\phi_{s}^{G}(\{\mathbf{k}_{lm}\}).

(86)

Following the same steps as the trispectrum case, the correction to the characteristic function is

	$\displaystyle\Delta\phi_{\{\nu\hat{\mathbf{C}}_{l}\}}(\{\mathbf{J}_{l}\})=$	$\displaystyle\left(\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\lvert\mathbf{I}+2i\mathbf{J}_{l}\mathbf{C}_{l}\rvert^{-\frac{\nu}{2}}\right)$
		$\displaystyle\times\left[1-\frac{1}{12}\sum_{l_{1},l_{2},l_{3}}\nu_{1}\nu_{2}\nu_{3}\tilde{B}^{ijk,lmn}_{l_{1}l_{2}l_{3}}\left(\mathbf{C}_{l_{1}}-\frac{i}{2}\mathbf{J}_{l_{1}}^{-1}\right)^{-1}_{il}\left(\mathbf{C}_{l_{2}}-\frac{i}{2}\mathbf{J}_{l_{2}}^{-1}\right)^{-1}_{jm}\left(\mathbf{C}_{l_{3}}-\frac{i}{2}\mathbf{J}_{l_{3}}^{-1}\right)^{-1}_{kn}\right].$		(87)

In the limit that $J_{l}\to 0$ this is a cubic correction to the Wishart characteristic function, telling us that the effect of $\tilde{B}$ is to modify the three-point function or skewness of $\hat{C}_{l}$ , as expected. Just as in the trispectrum case, the correction is not a polynomial in $J_{l}$ and hence not an Edgeworth expansion, even though it is perturbatively ‘close’ to the zero-order Wishart distribution¹¹¹¹11Note that to get the characteristic function of $\hat{C}_{l}$ rather than $\nu\hat{C}_{l}$ one has to replace $J_{l}$ with $J_{l}/\nu$ ..

Inverse-Fourier transforming Equation (87) gives the correction to the density as

\Delta p(\{\nu\hat{\mathbf{C}}_{l}\})=-\frac{1}{12}2^{\frac{\lambda p(p-1)}{2}}\int\frac{\mathrm{d}\{\mathbf{J}_{l}\}}{(2\pi)^{n\lambda}}e^{i\sum_{l}\mathrm{Tr}(\nu\mathbf{J}_{l}\hat{\mathbf{C}}_{l})}\left(\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}\lvert\mathbf{I}+2i\mathbf{J}_{l}\mathbf{C}_{l}\rvert^{-\frac{\nu}{2}}\right)\sum_{l_{1},l_{2},l_{3}}\nu_{1}\nu_{2}\nu_{3}\tilde{B}^{ijk,lmn}_{l_{1}l_{2}l_{3}}(\mathbf{M}_{l_{1}})^{il}(\mathbf{M}_{l_{2}})^{jm}(\mathbf{M}_{l_{3}})^{kn},

(88)

with $\mathbf{M}_{l}=[\mathbf{C}_{l}+(2i\mathbf{J}_{l})^{-1}]^{-1}$ .

Once again the only new term is the case $l_{1}=l_{2}=l_{3}$ . The new integral we need is

K^{ab,cd,ef}_{p,\nu}(\bm{\Omega})\equiv\int\mathrm{d}\mathbf{T}\,e^{\frac{i}{2}\mathrm{Tr}(\bm{\Omega}\mathbf{T})}\,\lvert\mathbf{I}+i\mathbf{T}\rvert^{-\frac{\nu}{2}}\left[(\mathbf{I}+i\mathbf{T})^{-1}\right]_{ab}\left[(\mathbf{I}+i\mathbf{T})^{-1}\right]_{cd}\left[(\mathbf{I}+i\mathbf{T})^{-1}\right]_{ef}.

(89)

Our strategy will again be to take derivatives. Letting $\mathbf{A}\equiv(\mathbf{I}+i\mathbf{T})^{-1}$ , we have the identity

	$\displaystyle\lvert\mathbf{A}\rvert^{\frac{\nu}{2}}A_{ab}A_{ij}A_{km}$	$\displaystyle=\frac{i}{\nu}\frac{\partial}{\partial T_{ij}}\left(\lvert\mathbf{A}\rvert^{\frac{\nu}{2}}A_{km}A_{ab}\right)+\frac{1}{2}\delta_{ij}\lvert\mathbf{A}\rvert^{\frac{\nu}{2}}A_{ab}A_{ij}A_{km}$
		$\displaystyle-\frac{2}{\nu}\lvert\mathbf{A}\rvert^{\frac{\nu}{2}}A_{ab}A_{k\left(i\right.}A_{\left.j\right)m}+\frac{1}{\nu}\delta_{ij}\lvert\mathbf{A}\rvert^{\frac{\nu}{2}}A_{ab}A_{ki}A_{mj}-\frac{2}{\nu}\lvert\mathbf{A}\rvert^{\frac{\nu}{2}}A_{km}A_{a\left(i\right.}A_{\left.j\right)b}+\frac{1}{\nu}\delta_{ij}\lvert\mathbf{A}\rvert^{\frac{\nu}{2}}A_{km}A_{ai}A_{jb},$		(90)

with no summation over repeated indices.

Integrating by parts, substituting Equation (80), and defining $K^{ab,ij,km}_{p,\nu}\equiv M^{ab,ij,km}P_{p,\nu}$ (where $P_{p,\nu}$ is defined in Equation (56)) gives

	$\displaystyle M^{ab,ij,km}-\frac{1}{2}\delta_{ij}M^{ab,ij,km}+\frac{2}{\nu}M^{ab,k(i,j)m}-\frac{1}{\nu}\delta_{ij}M^{ab,ki,mj}+\frac{2}{\nu}M^{km,a(i,j)b}-\frac{1}{\nu}\delta_{ij}M^{km,ai,jb}$
	$\displaystyle=\frac{(\nu+1)}{\nu^{2}(\nu+2)(\nu-1)}\left(\Omega_{ij}-\frac{1}{2}\Omega_{ij}\delta_{ij}\right)\left(\Omega_{km}\Omega_{ab}-\frac{2}{\nu+1}\Omega_{k\left(a\right.}\Omega_{\left.b\right)m}\right).$		(91)

Multiplying by $\delta_{ij}$ and subtracting the result gives

M^{ab,ij,km}+\frac{2}{\nu}M^{ab,k(i,j)m}+\frac{2}{\nu}M^{km,a(i,j)b}=\frac{(\nu+1)}{\nu^{2}(\nu+2)(\nu-1)}\left(\Omega_{ij}\Omega_{km}\Omega_{ab}-\frac{2}{\nu+1}\Omega_{ij}\Omega_{k\left(a\right.}\Omega_{\left.b\right)m}\right).

(92)

We can write two further equations like Equation (92) by taking derivatives with respect to $T_{km}$ and $T_{ab}$ . Adding all three equations gives an expression manifestly symmetric in its indices

	$\displaystyle 3M^{ab,ij,km}+\frac{2}{\nu}\left(M^{ab,ki,jm}+M^{ab,kj,im}+M^{km,aj,ib}+M^{km,ai,jb}+M^{ij,ak,mb}+M^{ij,am,kb}\right)$
	$\displaystyle=\frac{(\nu+1)}{\nu^{2}(\nu+2)(\nu-1)}\left[3\Omega_{ij}\Omega_{km}\Omega_{ab}-\frac{\left(\Omega_{ij}\Omega_{ka}\Omega_{bm}+\Omega_{ij}\Omega_{kb}\Omega_{am}+\Omega_{km}\Omega_{ia}\Omega_{bj}+\Omega_{km}\Omega_{ib}\Omega_{aj}+\Omega_{ab}\Omega_{ki}\Omega_{jm}+\Omega_{ab}\Omega_{kj}\Omega_{im}\right)}{1+\nu}\right].$		(93)

From the $p=1$ case we know that the solution must be cubic in the elements of $\bm{\Omega}$ . There are three kinds of term that we can use to build $M^{ab,ij,km}$ dictated by the symmetries in play, given by

$\displaystyle M^{ab,ij,km}=$	$\displaystyle X\Omega_{ij}\Omega_{km}\Omega_{ab}$
$\displaystyle+2$	$\displaystyle Y[\Omega_{ij}\Omega_{k\left(a\right.}\Omega_{\left.b\right)m}+\Omega_{km}\Omega_{i\left(a\right.}\Omega_{\left.b\right)j}+\Omega_{ab}\Omega_{k\left(i\right.}\Omega_{\left.j\right)m}]$
$\displaystyle+$	$\displaystyle 2Z[\Omega_{jm}\Omega_{k\left(a\right.}\Omega_{\left.b\right)i}+\Omega_{ik}\Omega_{j\left(a\right.}\Omega_{\left.b\right)m}+\Omega_{jk}\Omega_{i\left(a\right.}\Omega_{\left.b\right)m}+\Omega_{im}\Omega_{k\left(a\right.}\Omega_{\left.b\right)j}],$	(94)

for unknown coefficients $X,Y,Z$ . The three terms here are closed under the symmetries of $M^{ab,ij,km}$ . Substituting in to Equation (93) and identifying coefficients of these terms gives the following equations for the unknown coefficients

$\displaystyle 3X+\frac{12}{\nu}Y$	$\displaystyle=\frac{3(\nu+1)}{\nu^{2}(\nu+2)(\nu-1)}$	(95)
$\displaystyle 6Y+\frac{2}{\nu}(2X+2Y+8Z)$	$\displaystyle=-\frac{2}{\nu^{2}(\nu+2)(\nu-1)}$	(96)
$\displaystyle 6Z+\frac{2}{\nu}(6Y+6Z)$	$\displaystyle=0.$	(97)

The solution is

$\displaystyle X$	$\displaystyle=\frac{\nu^{2}+3\nu-2}{\nu(\nu-1)(\nu-2)(\nu+2)(\nu+4)}$	(98)
$\displaystyle Y$	$\displaystyle=\frac{-1}{\nu(\nu-1)(\nu-2)(\nu+4)}$	(99)
$\displaystyle Z$	$\displaystyle=\frac{2}{\nu(\nu-1)(\nu-2)(\nu+2)(\nu+4)}.$	(100)

The integral Equation (89) is thus

	$\displaystyle K_{p,nu}^{ab,cd,ef}(\bm{\Omega})=\frac{P_{p,\nu}(\bm{\Omega})}{\nu(\nu+4)(\nu-2)(\nu-1)}\left\{\frac{\nu^{2}+3\nu-2}{\nu+2}\Omega_{ab}\Omega_{cd}\Omega_{ef}-2[\Omega_{cd}\Omega_{e\left(a\right.}\Omega_{\left.b\right)f}+\Omega_{ef}\Omega_{c\left(a\right.}\Omega_{\left.b\right)d}+\Omega_{ab}\Omega_{e\left(c\right.}\Omega_{\left.d\right)f}]\right.$
	$\displaystyle\left.+\frac{4}{\nu+2}[\Omega_{df}\Omega_{e\left(a\right.}\Omega_{\left.b\right)c}+\Omega_{ce}\Omega_{d\left(a\right.}\Omega_{\left.b\right)f}+\Omega_{de}\Omega_{c\left(a\right.}\Omega_{\left.b\right)f}+\Omega_{cf}\Omega_{e\left(a\right.}\Omega_{\left.b\right)d}]\right\},$		(101)

Putting everything together gives the correction to the density as

	$\displaystyle p(\{\hat{\mathbf{C}}_{l}\})=\left[\prod_{l=l_{\mathrm{min}}}^{l_{\mathrm{max}}}W_{p}(\hat{\mathbf{C}}_{l};\mathbf{C}_{l}/\nu,\nu)\right]\left\{1+\frac{1}{12}\sum_{l_{1},l_{2},l_{3}}\nu_{1}\nu_{2}\nu_{3}\tilde{B}_{l_{1}l_{2}l_{3}}^{abc,def}C^{-1}_{l_{1},ai}C^{-1}_{l_{2},bj}C^{-1}_{l_{3},ck}C^{-1}_{l_{1},dl}C^{-1}_{l_{2},em}C^{-1}_{l_{3},fn}\vphantom{\frac{1}{2}}\right.$
	$\displaystyle\times\left[\vphantom{\frac{1}{2}}\Delta\hat{C}_{l_{1},il}\Delta\hat{C}_{l_{2},jm}\Delta\hat{C}_{l_{3},kn}+[3]\frac{2\delta_{l_{1}l_{2}}}{(\nu_{1}+2)(\nu_{1}-1)}\hat{C}_{l_{1},il}\hat{C}_{l_{1},jm}\Delta\hat{C}_{l_{3},kn}-[3]\frac{2\delta_{l_{1}l_{2}}\nu_{1}}{(\nu_{1}+2)(\nu_{1}-1)}\hat{C}_{l_{1},i\left(j\right.}\hat{C}_{l_{1},\left.m\right)l}\Delta\hat{C}_{l_{3},kn}\right.$
	$\displaystyle+\frac{32\delta_{l_{1}l_{2}}\delta_{l_{2}l_{3}}}{(\nu_{1}+2)(\nu_{1}-1)(\nu_{1}+4)(\nu_{1}-2)}\hat{C}_{l_{1},il}\hat{C}_{l_{1},jm}\hat{C}_{l_{1},kn}-[3]\frac{16\nu_{1}\delta_{l_{1}l_{2}}\delta_{l_{2}l_{3}}}{(\nu_{1}+2)(\nu_{1}-1)(\nu_{1}+4)(\nu_{1}-2)}\hat{C}_{l_{1},jm}\hat{C}_{l_{1},k\left(i\right.}\hat{C}_{l_{1},\left.l\right)n}$
	$\displaystyle\left.\left.+[4]\frac{4\nu_{1}^{2}\delta_{l_{1}l_{2}}\delta_{l_{2}l_{3}}}{(\nu_{1}+2)(\nu_{1}-1)(\nu_{1}+4)(\nu_{1}-2)}\hat{C}_{l_{1},mn}\hat{C}_{l_{1},k\left(i\right.}\hat{C}_{l_{1},\left.l\right)j}\right]\vphantom{\frac{1}{2}}\right\},$		(102)

with implicit summation over repeated indices. We have verified that this distribution is correctly normalised. Computing the corrections to the mean, covariance, and three-point function is tedious so we have only checked for the $p=1$ case, confirming that these three cumulants are recovered correctly.

Equation (81) and Equation (102) give the leading order correction to the Wishart distribution from signal non-Gaussian. The expressions are clearly quite cumbersome when $p\neq 1$ , but may be simplified using the symmetries of the bin indices. It may also be possible to write these expressions in a more covariant way, although we were not able to find a more compact expression.

Appendix C Alternative binning choices

The default results of Section III.2 are made with evenly space logarithmically binned power spectra with $\Delta\log l=0.1$ and a weight function $W_{l}=2l+1$ . The mathematical details of the binning are described in Section II.4.

The main use of binning for our purposes is to allow us to easily measure the covariance matrix of our lognormal maps from a finite number of simulations. Binning reduces the Monte Carlo noise in this measurement and increases the relative strength of the non-Gaussian part.

As discussed in Section II.4, the correction to posteriors from signal non-Gaussianity in expected to depend only mildly on the bin width, as we verify numerically in Figure 9 where we plot the corrections to $C_{l}$ and amplitude posteriors for logarithmic bins with $\Delta\log l=0.2$ . This plot should be compared with Figure 7. The impact of these broader bins is very mild. The broader bins roughly preserve the total mode count when $W_{l}=2l+1$ , so the effect on the posterior is naturally expected to be small.

It is beyond the scope of this paper to provide an exhaustive investigation of the different binning choices for the power spectrum, but a few obvious schemes deserve discussion. Instead of logarithmic bins we can also choose linearly-spaced bins. This choice compresses many of the large-scale modes into a single bin and hence does not allow the low- $l$ behaviour of the likelihood to be clearly seen, so we chose not to adopt this choice in our baseline results. Another alternative is to use a weight function $W_{l}=l(l+1)$ , which is a common choice for power spectrum bandpowers. This upweights the smallest scales in each bin and consequently causes our likelihood approximation to break down at smaller $l_{b}$ (the bin barycentre) for logarithmic bins with $\Delta l=0.1$ . An advantage of our formalism is that the likelihood correction diagnostic $\langle\varepsilon\rangle$ can be computed for any binning choice, something which we advocate when analysing power spectra binned with this weight function.

We close by noting that any binning choice may be adopted so long as the bins are sufficiently small that the power spectrum can be approximated as flat over the extent of the bin. This assumption can be avoided entirely by working with $\hat{C}_{l}/C_{l}$ rather than $C_{l}$ , as discussed in Section II.4.

Appendix D Dependence on source redshift

The baseline results in this paper have assumed a broad source redshift bin following a Euclid-like distribution, peaking at $z_{s}\approx 1$ . Lensing surveys typically place galaxies in tomographic redshift bins - the fiducial choice of Ref. Euclid Collaboration et al. (2020) uses 10 redshift bins, which is representative of upcoming Stage-IV surveys. Assuming equipopulated bins, the redshift distribution of the lowest bin is approximately Gaussian (after accounting for photometric redshift errors) with a peak around $z_{s}=0.325$ and width $\sigma_{z}=0.1$ . Non-Gaussianity in the shear field is relatively stronger at a fixed angular scale at lower redshift due to that scale subtending smaller, more non-linear spatial scales and the evolution of structure formation towards lower redshift. This is offset to an extent by the overall signal strength being suppressed by the lensing kernel compared with a bin at higher source redshift. This also makes the relative effect of any Gaussianizing shape noise stronger.

To test our likelihood approximation in what is likely the most extreme case for a Euclid-like survey, we generate an ensemble of lognormal fields for this low-redshift source bin. We scale the ‘shift’ parameter quantifying the strength of the lognormality according to the redshift-scaling formula proposed by Ref. Xavier et al. (2016), confirming this provides a reasonable match to the covariance matrix of the ray-traced N-body simulations of Ref. Takahashi et al. (2017).

Figure 10 shows the leading-order corrections to the posterior and the convergence statistics for this redshift bin, which should be compared with Figure 7. The stronger non-Gaussianity is clear to see, with the blue curves quantifying the correction to the Fisher matrix on an individual $C_{l}$ (left panel) or an amplitude parameter (right panel) passing unity around $l\approx 100$ . Our assumption of perturbative non-Gaussianity in the signal is valid only up to around $l\approx 30$ . At this angular scale the Gamma distribution is reasonably well approximated by a Gaussian, and passing the full non-Gaussian covariance matrix to this Gaussian absorbs most of the correction to the posterior that our approximation implies. Residual post-Gaussian effects are at most 1%, and extrapolating out to $l=1000$ we see that corrections are likely to be no more than 10%. It therefore appears that even for this low source redshift bin the scale of non-Gaussianity ( $l\approx 30$ ) and the scale where Gamma-type corrections are necessary ( $l\approx 10$ ) are separated enough that the likelihood can be approximated as transitioning from a Gamma to a Gaussian with the correct covariance matrix.

When shape noise is added the effects of non-Gaussianity are strongly suppressed. Figure 11 shows leading-order corrections to the posterior for a model in which $C_{l}=A(S_{0,l}+N_{l})$ when applying shape noise with a galaxy number density of $\bar{n}=3\,\mathrm{sq.\,arcmin}^{-1}$ , appropriate for a single redshift bin in a Euclid-like survey. Our approximation is now convergent out to much higher $l$ , and post-Gaussian corrections are negligible. It thus appears that although the strength of the signal non-Gaussianity may be stronger at low redshift, the relatively higher noise level suppresses non-Gaussianity in the likelihood.

We close this section by noting that these conclusions may need to be modified when considering galaxy clustering at low redshift. There the geometric suppression suffered by lensing is smaller, and hence the Gaussianizing effect of noise may be expected to be relatively weaker. We have also neglected Intrinsic Alignments, which may be expected to play a greater role in cross-bin combinations at low redshift. We advocate that the statistics plotted in Figure 7 and Figure 10 be computed whenever a likelihood approximation needs to be made.

References

Kilbinger (2015) M. Kilbinger, Reports on Progress in Physics 78, 086901 (2015), 1411.0115.
Mandelbaum (2018) R. Mandelbaum, ARA&A 56, 393 (2018), 1710.03235.
Hamana et al. (2020) T. Hamana, M. Shirasaki, S. Miyazaki, C. Hikage, M. Oguri, S. More, R. Armstrong, A. Leauthaud, R. Mandelbaum, H. Miyatake, et al., PASJ 72, 16 (2020), 1906.06041.
Heymans et al. (2021) C. Heymans, T. Tröster, M. Asgari, C. Blake, H. Hildebrandt, B. Joachimi, K. Kuijken, C.-A. Lin, A. G. Sánchez, J. L. van den Busch, et al., A&A 646, A140 (2021), 2007.15632.
Abbott et al. (2022) T. M. C. Abbott, M. Aguena, A. Alarcon, S. Allam, O. Alves, A. Amon, F. Andrade-Oliveira, J. Annis, S. Avila, D. Bacon, et al., Phys. Rev. D 105, 023520 (2022), 2105.13549.
Abbott et al. (2018) T. M. C. Abbott, F. B. Abdalla, A. Alarcon, J. Aleksić, S. Allam, S. Allen, A. Amara, J. Annis, J. Asorey, S. Avila, et al., Phys. Rev. D 98, 043526 (2018), 1708.01530.
Hu (1999) W. Hu, ApJ. Lett. 522, L21 (1999), astro-ph/9904153.
Schrabback et al. (2010) T. Schrabback, J. Hartlap, B. Joachimi, M. Kilbinger, P. Simon, K. Benabed, M. Bradač, T. Eifler, T. Erben, C. D. Fassnacht, et al., A&A 516, A63 (2010), 0911.0053.
Heymans et al. (2013) C. Heymans, E. Grocutt, A. Heavens, M. Kilbinger, T. D. Kitching, F. Simpson, J. Benjamin, T. Erben, H. Hildebrandt, H. Hoekstra, et al., Mon. Not. R. Astron. Soc. 432, 2433 (2013), 1303.1808.
Jee et al. (2016) M. J. Jee, J. A. Tyson, S. Hilbert, M. D. Schneider, S. Schmidt, and D. Wittman, Astrophys. J. 824, 77 (2016), 1510.03962.
Percival and Brown (2006) W. J. Percival and M. L. Brown, Mon. Not. R. Astron. Soc. 372, 1104 (2006), astro-ph/0604547.
Hamimeche and Lewis (2008) S. Hamimeche and A. Lewis, Phys. Rev. D 77, 103013 (2008), 0801.0554.
Upham et al. (2020) R. E. Upham, L. Whittaker, and M. L. Brown, Mon. Not. R. Astron. Soc. 491, 3165 (2020), 1908.00795.
Hikage et al. (2019) C. Hikage, M. Oguri, T. Hamana, S. More, R. Mandelbaum, M. Takada, F. Köhlinger, H. Miyatake, A. J. Nishizawa, H. Aihara, et al., PASJ 71, 43 (2019), 1809.09148.
Nicola et al. (2021) A. Nicola, C. García-García, D. Alonso, J. Dunkley, P. G. Ferreira, A. Slosar, and D. N. Spergel, J. Cosmol. Astropart. Phys. 2021, 067 (2021), 2010.09717.
Loureiro et al. (2021) A. Loureiro, L. Whittaker, A. Spurio Mancini, B. Joachimi, A. Cuceu, M. Asgari, B. Stölzner, T. Tröster, A. H. Wright, M. Bilicki, et al., arXiv e-prints arXiv:2110.06947 (2021), 2110.06947.
Sellentin and Heavens (2016) E. Sellentin and A. F. Heavens, Mon. Not. R. Astron. Soc. 456, L132 (2016), 1511.05969.
Hall and Taylor (2019) A. Hall and A. Taylor, Mon. Not. R. Astron. Soc. 483, 189 (2019), 1807.06875.
Scoccimarro (2000) R. Scoccimarro, Astrophys. J. 544, 597 (2000), astro-ph/0004086.
Hartlap et al. (2009) J. Hartlap, T. Schrabback, P. Simon, and P. Schneider, A&A 504, 689 (2009), 0901.3269.
Schneider and Hartlap (2009) P. Schneider and J. Hartlap, A&A 504, 705 (2009), 0905.0577.
Keitel and Schneider (2011) D. Keitel and P. Schneider, A&A 534, A76 (2011), 1105.3672.
Wilking and Schneider (2013) P. Wilking and P. Schneider, A&A 556, A70 (2013), 1304.4781.
Sellentin and Heavens (2018) E. Sellentin and A. F. Heavens, Mon. Not. R. Astron. Soc. 473, 2355 (2018), 1707.04488.
Sellentin et al. (2018) E. Sellentin, C. Heymans, and J. Harnois-Déraps, Mon. Not. R. Astron. Soc. 477, 4879 (2018), 1712.04923.
Hahn et al. (2019) C. Hahn, F. Beutler, M. Sinha, A. Berlind, S. Ho, and D. W. Hogg, Mon. Not. R. Astron. Soc. 485, 2956 (2019), 1803.06348.
Lin et al. (2020) C.-H. Lin, J. Harnois-Déraps, T. Eifler, T. Pospisil, R. Mandelbaum, A. B. Lee, S. Singh, and LSST Dark Energy Science Collaboration, Mon. Not. R. Astron. Soc. 499, 2977 (2020), 1905.03779.
Taylor et al. (2019) P. L. Taylor, T. D. Kitching, J. Alsing, B. D. Wandelt, S. M. Feeney, and J. D. McEwen, Phys. Rev. D 100, 023519 (2019), 1904.05364.
Diaz Rivero and Dvorkin (2020) A. Diaz Rivero and C. Dvorkin, Phys. Rev. D 102, 103507 (2020), 2007.05535.
Friedrich et al. (2021) O. Friedrich, F. Andrade-Oliveira, H. Camacho, O. Alves, R. Rosenfeld, J. Sanchez, X. Fang, T. F. Eifler, E. Krause, C. Chang, et al., Mon. Not. R. Astron. Soc. 508, 3125 (2021), 2012.08568.
Smith et al. (2006) S. Smith, A. Challinor, and G. Rocha, Phys. Rev. D 73, 023517 (2006), astro-ph/0511703.
Durrett (2010) R. Durrett, Probability: theory and examples. Volume 31 of Cambridge Series in Statistical and Probabilistic Mathematics (Cambridge University Press, Cambridge, UK, 2010).
Bouchet et al. (1992) F. R. Bouchet, R. Juszkiewicz, S. Colombi, and R. Pellat, ApJ. Lett. 394, L5 (1992).
Juszkiewicz et al. (1995) R. Juszkiewicz, D. H. Weinberg, P. Amsterdamski, M. Chodorowski, and F. Bouchet, Astrophys. J. 442, 39 (1995), astro-ph/9308012.
Bernardeau and Kofman (1995) F. Bernardeau and L. Kofman, Astrophys. J. 443, 479 (1995), astro-ph/9403028.
Amendola (1996) L. Amendola, Mon. Not. R. Astron. Soc. 283, 983 (1996).
Blinnikov and Moessner (1998) S. Blinnikov and R. Moessner, A&AS 130, 193 (1998), astro-ph/9711239.
Taylor and Watts (2001) A. N. Taylor and P. I. R. Watts, Mon. Not. R. Astron. Soc. 328, 1027 (2001), astro-ph/0010014.
Carron (2013) J. Carron, A&A 551, A88 (2013), 1204.4724.
Ponzano and Regge (1968) G. Ponzano and T. Regge, in Spectroscopy and group theoretical methods in Physics, edited by F. Bloch (Amsterdam: North-Holland, 1968).
Bernardeau et al. (2011) F. Bernardeau, C. Pitrou, and J.-P. Uzan, J. Cosmol. Astropart. Phys. 2011, 015 (2011), 1012.2652.
Munshi et al. (2020) D. Munshi, T. Namikawa, T. D. Kitching, J. D. McEwen, R. Takahashi, F. R. Bouchet, A. Taruya, and B. Bose, Mon. Not. R. Astron. Soc. 493, 3985 (2020), 1910.04627.
Xavier et al. (2016) H. S. Xavier, F. B. Abdalla, and B. Joachimi, Mon. Not. R. Astron. Soc. 459, 3693 (2016), 1602.08503.
Taruya et al. (2002) A. Taruya, M. Takada, T. Hamana, I. Kayo, and T. Futamase, Astrophys. J. 571, 638 (2002), astro-ph/0202090.
Hilbert et al. (2011) S. Hilbert, J. Hartlap, and P. Schneider, A&A 536, A85 (2011), 1105.3980.
Martin et al. (2012) S. Martin, P. Schneider, and P. Simon, A&A 540, A9 (2012), 1109.0944.
Lewis et al. (2000) A. Lewis, A. Challinor, and A. Lasenby, Astrophys. J. 538, 473 (2000), astro-ph/9911177.
Howlett et al. (2012) C. Howlett, A. Lewis, A. Hall, and A. Challinor, J. Cosmol. Astropart. Phys. 2012, 027 (2012), 1201.3654.
Euclid Collaboration et al. (2020) Euclid Collaboration, A. Blanchard, S. Camera, C. Carbone, V. F. Cardone, S. Casas, S. Clesse, S. Ilić, M. Kilbinger, T. Kitching, et al., A&A 642, A191 (2020), 1910.09273.
Planck Collaboration et al. (2020) Planck Collaboration, N. Aghanim, Y. Akrami, M. Ashdown, J. Aumont, C. Baccigalupi, M. Ballardini, A. J. Banday, R. B. Barreiro, N. Bartolo, et al., A&A 641, A6 (2020), 1807.06209.
Mead et al. (2021) A. J. Mead, S. Brieden, T. Tröster, and C. Heymans, Mon. Not. R. Astron. Soc. 502, 1401 (2021), 2009.01858.
Górski et al. (2005) K. M. Górski, E. Hivon, A. J. Banday, B. D. Wandelt, F. K. Hansen, M. Reinecke, and M. Bartelmann, Astrophys. J. 622, 759 (2005), astro-ph/0409513.
Takahashi et al. (2017) R. Takahashi, T. Hamana, M. Shirasaki, T. Namikawa, T. Nishimichi, K. Osato, and K. Shiroyama, Astrophys. J. 850, 24 (2017), 1706.01472.
Benabed et al. (2009) K. Benabed, J. F. Cardoso, S. Prunet, and E. Hivon, Mon. Not. R. Astron. Soc. 400, 219 (2009), 0901.4537.
de Belsunce et al. (2021) R. de Belsunce, S. Gratton, W. Coulton, and G. Efstathiou, Mon. Not. R. Astron. Soc. 507, 1072 (2021), 2103.14378.
Seljak et al. (2017) U. Seljak, G. Aslanyan, Y. Feng, and C. Modi, J. Cosmol. Astropart. Phys. 2017, 009 (2017), 1706.06645.
Wandelt et al. (2004) B. D. Wandelt, D. L. Larson, and A. Lakshminarayanan, Phys. Rev. D 70, 083511 (2004), astro-ph/0310080.
Alsing et al. (2019) J. Alsing, T. Charnock, S. Feeney, and B. Wandelt, Mon. Not. R. Astron. Soc. 488, 4440 (2019), 1903.00007.
Wang et al. (2019) M. S. Wang, W. J. Percival, S. Avila, R. Crittenden, and D. Bianchi, Mon. Not. R. Astron. Soc. 486, 951 (2019), 1811.08155.
Okamoto and Hu (2002) T. Okamoto and W. Hu, Phys. Rev. D 66, 063008 (2002), astro-ph/0206155.
Cooray and Sheth (2002) A. Cooray and R. Sheth, Physics Reports 372, 1 (2002), astro-ph/0206508.
Krause and Eifler (2017) E. Krause and T. Eifler, Mon. Not. R. Astron. Soc. 470, 2100 (2017), 1601.05779.
Harnois-Déraps et al. (2012) J. Harnois-Déraps, S. Vafaei, and L. Van Waerbeke, Mon. Not. R. Astron. Soc. 426, 1262 (2012), 1202.2332.
Sato and Nishimichi (2013) M. Sato and T. Nishimichi, Phys. Rev. D 87, 123538 (2013), 1301.3588.
Cooray and Hu (2000) A. Cooray and W. Hu, Astrophys. J. 534, 533 (2000), astro-ph/9910397.
Bernardeau et al. (2002) F. Bernardeau, S. Colombi, E. Gaztañaga, and R. Scoccimarro, Physics Reports 367, 1 (2002), astro-ph/0112551.
Janik and Nowak (2003) R. A. Janik and M. A. Nowak, Journal of Physics A: Mathematical and General 36, 3629 (2003), URL https://doi.org/10.1088/0305-4470/36/12/343.
Herz (1955) C. S. Herz, Annals of Mathematics 61, 474 (1955), ISSN 0003486X, URL http://www.jstor.org/stable/1969810.
Gupta and Nagar (2000) A. K. Gupta and D. K. Nagar, Matrix Variate Distributions (Chapman and Hall, CRC Press, Boca Raton, FL, USA, 2000).
Petersen and Pedersen (2008) K. B. Petersen and M. S. Pedersen, The matrix cookbook (2008), version 20081110, URL http://www2.imm.dtu.dk/pubdb/p.php?3274.