The PAU Survey: narrowband photometric redshifts using Gaussian processes

John Y. H. Soo^1,2, Benjamin Joachimi², Martin Eriksen³, Małgorzata Siudek^3,4 E-mail: johnsooyh@usm.my Alex Alarcon⁵, Laura Cabayol³, Jorge Carretero^3,6, Ricard Casas^7,8 Francisco J. Castander^7,8, Enrique Fernández³, Juan García-Bellido⁹ Enrique Gaztanaga^7,8, Hendrik Hildebrandt¹⁰, Henk Hoekstra¹¹, Ramon Miquel^3,12 Cristobal Padilla³, Eusebio Sánchez¹³, Santiago Serrano^7,8 and Pau Tallada-Crespí^6,13.
¹School of Physics, Universiti Sains Malaysia (USM), 11800 USM, Pulau Pinang, Malaysia.
²Department of Physics and Astronomy, University College London (UCL), Gower Street, London WC1E 6BT, UK.
³Institut de Física d’Altes Energies (IFAE), The Barcelona Institute of Science and Technology, 08193 Bellaterra (Barcelona), Spain.
⁴National Centre for Nuclear Research, 7 Pasteura str., 02-093 Warsaw, Poland.
⁵High Energy Physics Division, Argonne National Laboratory, Lemont, IL 60439, USA.
⁶Port d’Informació Científica (PIC), Universitat Autònoma de Barcelona, Carrer Albareda S/N, 08193 Bellaterra (Barcelona), Spain.
⁷Institute of Space Sciences (ICE/CSIC), Universitat Autònoma de Barcelona, Carrer de Can Magrans S/N, 08193 Cerdanyola del Vallès (Barcelona), Spain.
⁸Institut d’Estudis Espacials de Catalunya (IEEC), 08034 Barcelona, Spain.
⁹Instituto de Física Teórica (IFT-UAM/CSIC), Universidad Autónoma de Madrid, 28049 Madrid, Spain.
¹⁰German Centre for Cosmological Lensing, Astronomisches Institut, Ruhr-Universität Bochum (AIRUB), Universitätsstr. 150, 44801 Bochum, Germany.
¹¹Leiden Observatory, Leiden University, Niels Bohrweg 2, 2333 CA, Leiden, The Netherlands.
¹²Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain.
¹³Centro de Investigaciones Energeticas, Medioambientales y Tecnologicas (CIEMAT), Avenida Complutense 40, 28040 Madrid, Spain.

(Accepted XXX. Received YYY; in original form ZZZ)

Abstract

We study the performance of the hybrid template-machine-learning photometric redshift (photo- $z$ ) algorithm delight, which uses Gaussian processes, on a subset of the early data release of the Physics of the Accelerating Universe Survey (PAUS). We calibrate the fluxes of the $40$ PAUS narrow bands with $6$ broadband fluxes ( $uBVriz$ ) in the COSMOS field using three different methods, including a new method which utilises the correlation between the apparent size and overall flux of the galaxy. We use a rich set of empirically derived galaxy spectral templates as guides to train the Gaussian process, and we show that our results are competitive with other standard photometric redshift algorithms. delight achieves a photo- $z$ $68$ th percentile error of $\sigma_{68}=0.0081(1+z)$ without any quality cut for galaxies with $i_{\textrm{auto}}<22.5$ as compared to $0.0089(1+z)$ and $0.0202(1+z)$ for the bpz and annz $2$ codes, respectively. delight is also shown to produce more accurate probability distribution functions for individual redshift estimates than bpz and annz $2$ . Common photo- $z$ outliers of delight and bcnz $2$ (previously applied to PAUS) are found to be primarily caused by outliers in the narrowband fluxes, with a small number of cases potentially indicating spectroscopic redshift failures in the reference sample. In the process, we introduce performance metrics derived from the results of bcnz $2$ and delight, allowing us to achieve a photo- $z$ quality of $\sigma_{68}<0.0035(1+z)$ at a magnitude of $i_{\textrm{auto}}<22.5$ while keeping $50$ per cent objects of the galaxy sample.

keywords:

galaxies: distances and redshifts – methods: numerical – methods: statistical

^†^†pubyear: 2020^†^†pagerange: The PAU Survey: narrowband photometric redshifts using Gaussian processes–The PAU Survey: narrowband photometric redshifts using Gaussian processes

1 Introduction

Photometric redshift (photo- $z$ ) estimation continues to be an active research area as it plays a major role in solving the big questions in cosmology. Redshifts provide radial information (distance) to the traditional two dimensional sky maps of galaxies. They are traditionally determined through spectroscopic methods (spectroscopic redshifts, or spec- $z$ ’s). Yet since the process requires long telescope time for high completeness, photo- $z$ ’s are instrumental for the analysis of large surveys containing of order $10^{8-9}$ galaxies. Photo- $z$ methodology has been evolving and improving a lot over the past couple of decades (e.g. Brescia et al., 2018; Salvato et al., 2019), such that it had been sufficiently useful for most recent cosmological researches.

Photo- $z$ , as its name suggests, is often determined through the use of a handful of broadband photometric filters obtained from large sky surveys. Photo- $z$ estimation methods are generally categorised into two different types: the template-based method, which relies on accurate models of spectral energy distribution (SED) templates of different types of galaxies; and the data-driven empirical method, which relies on training sets of galaxies and machine-learning algorithms. Each method however has its own limitations: template-based methods may produce photo- $z$ ’s with large scatter and catastrophic rates without representative templates; while machine-learning methods may perform poorly outside the regions of the parameters covered by the training sample (D’Isanto et al., 2018). As a result, hybrid methods have been implemented to utilise the best of both worlds (Cavuoti et al., 2017; Duncan et al., 2018, 2019).

Many current and upcoming surveys such as the Dark Energy Survey (DES, Abbott et al., 2005), Legacy Survey of Space and Time (LSST, Ivezić et al., 2008), Euclid (Laureijs et al., 2011), Kilo-Degree Survey (KiDS, De Jong et al., 2013), Wide Field Infrared Survey Telescope (WFIRST, Spergel et al., 2013) and Hyper Suprime-Cam (HSC, Aihara et al., 2018) have set stringent photo- $z$ requirements to ensure that they meet their science goals, forcing the quality of photo- $z$ methodology to constantly improve. For example, LSST’s photo- $z$ requirement is to reach a root-mean-square error of $\sigma_{\textrm{RMS}}<0.02(1+z)$ , while the Euclid requirement is $\sigma_{\textrm{RMS}}<0.05(1+z)$ . High quality photo- $z$ ’s are required for a reliable estimation of e.g. weak lensing (Benjamin et al., 2013), angular clustering (Crocce et al., 2016), intrinsic alignment (Johnston et al., 2020), structure formation, galaxy classification and galaxy properties (Jouvel et al., 2017; Laigle et al., 2018; Siudek et al., 2018).

The aforementioned surveys are predominantly broadband surveys which use between $4$ - $9$ broadband filters ranging from infrared to ultraviolet. This work however, explores the estimation of photo- $z$ ’s in narrowband surveys, focusing on the Physics of the Accelerating Universe Survey (PAUS, Padilla et al., 2019), which observes the sky using $40$ narrow bands (see Section 2.1). Producing high quality photo- $z$ ’s for such a survey requires careful optimisation between narrow and broad bands, since machine-learning based methods have to be optimised for a larger number of inputs (Eriksen et al., 2020), while template-based methods require more attention towards the narrow emission line features.

Martí et al. (2014) used simulations to predict that by using PAUS narrowband photometry, the photo- $z$ quality could reach an unprecedentedly low $68$ th percentile error of $\sigma_{68}=0.0035(1+z)$ at a quality cut of $50$ per cent at $i<22.5$ . This has been verified by Eriksen et al. (2019), where they combined the $40$ PAUS narrow bands (early data release) with broad bands $uBVriz$ from the Cosmic Evolution Survey (COSMOS, Laigle et al., 2016), and using their template-based photo- $z$ code bcnz $2$ , they showed that this result is achievable when a $50$ per cent photometric quality cut was imposed on the final testing set. In a more recent work, Eriksen et al. (2020) used deepz, a deep learning algorithm on the same data set and showed that it outperformed bcnz $2$ by reaching $50$ per cent lower in $\sigma_{68}$ . Furthermore, Alarcon et al. (2021) showed that an ever greater precision can be achieved when using additional photometric bands available in the COSMOS field (a total of $66$ bands).

We are motivated by the work of Eriksen et al. (2019), but instead of using purely template-based methods, we attempt to achieve this PAUS photo- $z$ precision by utilising Gaussian processes (GPs, see Section 3.1) to make empirical adjustments to templates, working on the same data set and conditions. We seek to produce an independent method that is competitive, as that will allow us to exploit synergies with bcnz $2$ by Eriksen et al. (2019) as shown in this work, deepz (Eriksen et al., 2020), and photo- $z$ ’s by Alarcon et al. (2021) in the future. Therefore the contents of this paper reflect our findings, putting special emphasis on the performance and application of delight (Leistedt & Hogg, 2017), a hybrid template-machine-learning photo- $z$ code. When carefully calibrated and combined with COSMOS broadband fluxes, delight should achieve equally good results as that of bcnz $2$ . The main aims of this paper are threefold:

1.

to optimise and test the performance of the hybrid template-machine-learning photo- $z$ code delight on a narrowband survey;
2.

to develop an optimal method to calibrate the fluxes between the COSMOS broadbands and the PAUS narrow bands;
3.

to provide an independent photo- $z$ solution for PAUS, enabling the study of photometric and spectroscopic redshift outliers.

This paper is structured as follows. In Section 2 we first introduce PAUS and the sources of photometry and spectroscopic redshifts used in this work. Section 3 describes the algorithms (delight, annz $2$ and bpz) used in this work, together with their optimisation settings and SED templates used. Section 4 describes the full details of how the photometry and spectroscopy from PAUS, COSMOS and zCOSMOS are cross-matched, how the galaxy fluxes are selected, the three methods to calibrate the broadband and narrowband fluxes, and the performance metrics used in this work to compare the results between runs and codes. Section 5 shows the photo- $z$ results obtained by delight, and a thorough analysis is conducted to compare its performance with annz $2$ , bpz and bcnz $2$ . Finally, in Section 6 we study the photo- $z$ outliers of delight and bcnz $2$ , and derive new metrics with improved photo- $z$ outlier identifications. Our work is concluded in Section 7.

2 Photometry and Spectroscopy

In this work, photometric data were obtained from PAUS (Section 2.1) and COSMOS (Section 2.2), while spectroscopic redshifts were obtained from zCOSMOS (Section 2.3). In this section, these surveys will be introduced, together with the selection cuts used to obtain our training and testing sets.

2.1 PAUS

PAUS is a narrowband photometric galaxy survey aimed at mapping the large-scale structure of the Universe up to $i\sim 23.0$ . Using $40$ narrow bands spaced by $100$ Å in the range between $4500$ to $8500$ Å (filter responses visualised in Eriksen et al., 2019, and Fig. 4), PAUS aims to achieve redshifts with a precision of $\sigma_{\textrm{RMS}}<0.0035(1+z)$ for galaxies with $i_{\textrm{auto}}<22.5$ . PAUS uses the PAUCam instrument (Padilla et al., 2019) on the $4$ m William Herschel Telescope (WHT) at Observatorio del Roque de los Muchachos (ORM) in La Palma. It has observed more than $50$ deg² of sky since the beginning of $2016$ , and observations to full depth in all narrow bands for $100$ deg² are planned.

The PAUS forced-aperture coadded photometry has its aperture defined by using the $50$ per cent light radius ( $r_{50}$ ), the point spread function (PSF), ellipticity and Sérsic index of COSMOS morphology, such that the fluxes measure a fixed fraction of light. The reader is referred to Eriksen et al. (2019) for detailed information on how the PAUS fluxes are measured. In this work we used the early data release from PAUS (objects are observed at least five times, using an elliptical aperture with $62.5$ per cent light radius), and select objects with $i_{\textrm{auto}}\leq 22.5$ , entries with no missing measurement, and the COSMOS flag TYPE=0 (extended objects).

2.2 COSMOS

The Cosmic Evolution Survey (COSMOS, Scoville et al., 2007) covers a sky area of $2$ deg² ( $149.47^{\circ}\leq\alpha\leq 150.7^{\circ}$ , $1.62^{\circ}\leq\delta\leq 2.83^{\circ}$ ) and is known for its high sensitivity, depth and an exceptionally low and uniform Galactic extinction ( $E_{B\text{-}V}\sim 0.02$ ).

In this work we used photometry from the COSMOS $2015$ Catalogue (Laigle et al., 2016); it is a highly complete mass-selected sample to very high redshifts, highly optimised for the study of galaxy evolution and environments in the early Universe. The COSMOS $2015$ Catalogue provides $30$ band photometry ranging from near UV to near infrared wavelengths, all these have been observed through multiple facilities, two of which are the Canada-Hawaii-France Telescope (CFHT) and Subaru Telescope (Miyazaki et al., 2002). From this catalogue we only use the CFHT $u^{*}$ -band (Boulade et al., 2003) and Subaru $B$ , $V$ , $r$ , $i^{+}$ and $z^{++}$ bands (Miyazaki et al., 2002), in conjunction with the narrowband photometry of PAUS. For simplicity, these bands will be referred to collectively as the $uBVriz$ bands; the superscripts are dropped for easier reading.

2.3 zCOSMOS

The zCOSMOS Survey (Lilly et al., 2007) targets galaxies in the COSMOS field using the Visible Multi-Object Spectrograph (VIMOS, Le Fèvre et al., 2003). zCOSMOS-Bright observed $20\,689$ galaxies in a sky area of $1.7$ deg², these galaxies have magnitudes $15<i_{\textrm{auto}}<22.5$ and redshifts in the range of $0.1<z<1.2$ , its spectral range is in the red (rest-frame wavelength $5550$ Å to $9650$ Å) to follow strong spectral features around the $4000$ Å break to as high redshifts as possible.

In this work we use data from zCOSMOS-Bright DR $3$ ¹¹1http://www.eso.org/qi/catalog/show/65. Galaxies with redshift confidence class $3$ and $4$ (spectroscopic verification rate of $99$ and $99.8$ per cent, respectively) are selected and cross-matched with PAUS objects.

2.4 Our dataset

Using the aforementioned selection cuts, we cross-matched within $1^{\prime\prime}$ the $40$ -narrowband photometry from PAUS, six broadband photometry ( $uBVriz$ ) from COSMOS, and highly reliable redshifts from zCOSMOS to obtain a data sample of $8406$ galaxies, which is divided randomly into half for training and testing respectively. This sample uses a total of $46$ bands, and flux calibration between the broad and narrow bands is required as they are obtained from different surveys with different flux measurements. The calibration between these fluxes will be discussed in Section 4.

Refer to caption — Figure 1: Colour-magnitude diagram for the PAUS data (red) used in this work in comparison with the COSMOS2015 sample (all objects with TYPE=0 and detected in $r$ and $i$ ). The contours represent the density of objects.

The colour-magnitude diagram of this sample is shown in Fig. 1, in comparison with the COSMOS $2015$ sample (all objects with TYPE=0 and detected in $r$ and $i$ ). The slight incompleteness in $i$ magnitude is due to the selection effects in brightness of the spectroscopic redshifts available.

The sample size may seem small, but is sufficient for the GP to work, since the GP essentially creates $4000+$ flux-redshift ‘templates’ to produce photo- $z$ ’s for objects in the testing set. However, we note that such a small training size has a major effect on the results of annz $2$ as this training size is close to the lower limit threshold suggested by Bonfield et al. (2010). We also note that the sample we have chosen is very similar to that of Eriksen et al. (2019), the only difference being that they have a more relaxed cut in the number of bands (N_BANDS), being 35<N_BANDS<40 (workable for a template code like bcnz $2$ ), while we used N_BANDS=40²²2The relaxed cut resulted in Eriksen et al. (2019) having a larger sample size of $10\,801$ objects.. When comparing results between delight and bcnz $2$ , we will only compare photo- $z$ ’s of the exact same objects. Note that we have used the same broad bands as used by Eriksen et al. (2019).

3 Algorithms and Templates

3.1 Delight and Gaussian processes

delight ³³3https://github.com/ixkael/Delight (Leistedt & Hogg, 2017) is a hybrid template-based and machine learning photo- $z$ algorithm, which was constructed to combine the advantages, and minimise the disadvantages, of both types of algorithms. delight constructs a large collection of latent SED templates (or physical flux-redshift models) from training data, with a template SED library as a guide to the learning of the model. This conceptually novel approach uses Gaussian processes (GPs) operating in flux-redshift space. delight was featured in the results of the LSST Photo- $z$ Data Challenge $1$ (Schmidt et al., 2020), where it was found to have a low photo- $z$ bias but slightly broader PDFs.

A GP is a supervised learning method, which finds a distribution over the possible functions $f(x)$ that are consistent with the observed data $x$ . Consider Fig. 2: suppose we have a set of observed variables $y=f(x)$ , we can fit it using a GP, denoted as $f\sim\mathcal{GP}\left(\mu,k\right)$ , which assumes that the probability of all $f(x)$ is jointly Gaussian and representable by a mean function $\mu(x)$ and a covariance matrix $\Sigma(x)=k(x_{i},x_{j})$ . $k(x_{i},x_{j})$ is the kernel function, which relates one variable $x_{i}$ to another $x_{j}$ . An example case would be $\mu\equiv 0$ and a kernel function that takes the form of a squared exponential,

k(x_{i},x_{j})=\sigma^{2}_{f}\exp\left[\frac{-(x_{i}-x_{j})^{2}}{2l^{2}}\right],

(1)

where $\sigma_{f}^{2}$ is the maximum allowable covariance between data (set by the errors on the observation), and $l$ is the tunable correlation length that determines the smoothness of the GP. In this simplistic case, the GP will try to find a marginalisation of all possible functions, but $\mu$ and $k$ can be modified if an underlying model of the data we want to fit is known. The covariance function is defined such that a smooth function is to be predicted.

Assuming that we have a set of training data $\{x_{i},f(x_{i})\}$ and would like to find the prediction $\{x_{*},f_{*}(x_{*})\}$ , the GP models $f$ and $f_{*}$ as jointly Gaussian, $\mathcal{N}(\mu,\Sigma)$ , and therefore

\left(\begin{array}[]{cc}f(x)\\ f_{*}(x)\end{array}\right)\sim\mathcal{N}\left(\left(\begin{array}[]{cc}\mu\\ \mu_{*}\end{array}\right),\left(\begin{array}[]{cc}\Sigma&\Sigma_{*}\\ \Sigma_{*}^{T}&\Sigma_{**}\end{array}\right)\right),

(2)

where $\Sigma=k(x_{i},x_{j})$ is the covariance between the training data, $\Sigma_{*}=k(x_{*},x_{i})$ the covariance between training and the predicted data (superscript $T$ denotes the transpose of the matrix), while $\Sigma_{**}=k(x_{*},x_{*})$ is the variance of the predicted data.

It follows from the above that the posterior $p(f_{*}|x_{*},x_{i},f_{i})$ is also Gaussian, therefore a predicted point $f_{*}(x_{*})$ is plotted (green dot in Fig. 2) is modelled by a Gaussian function (smooth blue line) which runs across all points, with its $95\%$ confidence interval ( $\pm 1.96\sigma_{f_{*}}$ ) represented by the navy shaded area.

In the context of delight, GPs are used to calculate the predicted fluxes $\hat{F}$ at a certain redshift $z$ for a training object $i$ with fluxes $F_{i}$ and redshift $z_{i}$ . This could be better understood by first defining the posterior photo- $z$ distribution $p(z|\hat{F})$ of an object in the testing set. For machine learning methods, it has the form

p(z|\hat{F})\approx\sum_{i}p(\hat{F}|z,z_{i},F_{i})\,p(z|z_{i},F_{i})p(z_{i},F_{i}),

(3)

where $p(\hat{F}|z,z_{i},F_{i})$ is the prediction for fluxes of the training galaxy at a different redshift $z$ , while $p(z|z_{i},F_{i})$ and $p(z_{i},F_{i})$ are the priors that provide the redshift distributions and abundances, generated from the training data, which are multiplied to give the combined probability $p(z,z_{i},F_{i})$ for a given redshift $z$ and training object with redshift $z_{i}$ and fluxes $F_{i}$ . This is analogous to the one derived from template-based methods,

p(z|\hat{F})\approx\sum_{i}p(\hat{F}|z,t_{i})\,p(z|t_{i})p(t_{i}),

(4)

where $t_{i}$ is the template, $p(z|t_{i})p(t_{i})=p(z,t_{i})$ is the prior and $p(\hat{F}|z,t_{i})$ is the probability of the predicted flux $\hat{F}$ at redshift $z$ and for template $t_{i}$ . Both equations are easily differentiated by the fact that for template-based methods, $p(z|\hat{F})$ is derived using a list of templates $t_{i}$ , while for machine learning methods it is derived using the individual training set objects with fluxes $F_{i}$ and spectroscopic redshift $z_{i}$ .

delight differs a little from the usual machine learning method in the sense that instead of finding a direct empirical relationship between the fluxes and redshifts of the training objects, it uses a GP to model the predicted fluxes of a training galaxy at different redshifts with the help of SED templates. This creates a latent flux-redshift template for each training object, where for a given set of fluxes in the testing set, it could be compared to several training templates to find the best predicted redshift.

The algorithm first fits a best-fit SED template to a particular training object $i$ with redshift $z_{i}$ and fluxes $F_{i}$ (multiple bands); the best-fit SED template is then used to formulate the mean function and kernel of a GP to build a flux-redshift template which could predict the expected fluxes of certain band filters when this object is redshifted to a different $z$ . With each training object now becoming a flux-redshift template, the final photo- $z$ posterior distribution of a testing set object is determined by making a pairwise comparison of every training-testing pair, and a weighted solution is obtained based on the best fits of each pair.

In other words, we are computing the probability that the target galaxy has the same SED as the training galaxy but at a different redshift. delight is thus a hybrid template-machine-learning photo- $z$ algorithm in the sense that SED templates are used to ‘guide’ the creation of flux-redshift templates based on the training objects, or, if seen from another perspective, the GP ‘corrects’ the SED templates by using training data. We refer the reader to Leistedt & Hogg (2017) for more on Gaussian processes, and also for the full expressions of the $\mu$ and $k$ in relation to the filter responses, flux normalisations, linear mixtures of physical SED templates, and the manually configurable SED residual function of emission lines.

delight is advantageous over many other photo- $z$ algorithms as its output is less dependent on representative training data, and it does not strictly require the training set to use the same photometric bands. However, it still requires accurate spectroscopic redshifts, high quality training fluxes and representative templates to produce high quality photo- $z$ probability density functions (PDFs), or $p(z)$ . As such, given a few photometric bands, delight is able to predict missing bands or fluxes in an entirely different set of photometric bands, and this function is utilised in Section 4.1 to predict and calibrate the flux values between two surveys.

3.2 Delight optimisation

The optimisation settings of delight used in this work are as follows. For the GP setup, the number of Gaussians to fit the filter curves (numGpCoeff) was set to $7$ instead of the default $20$ , appropriately selected to accommodate the smaller full width half maximum (FWHM) of the narrowband filters. Other than that, we have mainly used the default hyperparameter settings for delight with the exception of the widths of the luminosity and redshift priors $\sigma_{\ell}$ and $\sigma_{z}$ (ellPriorSigma and zPriorSigma, see Leistedt & Hogg, 2017), which have been lowered to $0.2$ and $0.1$ respectively as they produced better results.

As mentioned earlier, the mean function and the kernel of the GP are modelled after the choice of emission lines and SED template sets. We replaced the $3$ default emission lines in delight with the list provided by Eriksen et al. (2019), although we note that the change in result for this is insignificant. As for the templates, we used the Brown et al. (2014) high-quality templates, which consist of $129$ SEDs derived from real nearby galaxies. These templates have wavelengths covering the ultraviolet to mid-infrared, and encompass a broad range of galaxy types including ellipticals, spirals, merging galaxies, blue compact dwarfs and luminous infrared galaxies. In this work we have also tested the performance of various other template sets (Coleman et al., 1980; Kinney et al., 1996; Bruzual & Charlot, 2003; Ilbert et al., 2006; Polletta et al., 2007); however they do not perform as well as those of Brown et al. (2014): the root-mean-square photo- $z$ errors could range between $21$ to $112$ per cent higher when these templates are used. Therefore, the results from these tests are not shown in this work.

We note that delight requires all magnitudes $m_{i}$ and magnitude errors to be converted into fluxes $F_{i}$ and flux variances, with a zero-point adjustment of $26.4$ in magnitude (i.e. $F_{i}=10^{-0.4(m_{i}-26.4)}$ ). We have also added a $3$ and $6$ per cent flux error in quadrature to the flux variances for the narrow and broad bands, respectively, to account for other flux errors from both the data and the model (values estimated via trial and error). It is also worth mentioning that while delight is capable of processing negative fluxes (non-detections), the reference band (referenceBand) used for flux normalisation only handles fluxes with positive values. In this work we have selected the narrow band $nb625$ as the reference band, or the COSMOS $r$ -band in cases where narrow bands were not used.

Throughout this work, we use $z_{\textrm{map}}$ (the maximum a posteriori of the PDF) to represent the best point estimate photo- $z$ produced by delight. The output photo- $z$ PDF bins were set to be linear instead of logarithmic, with a stepsize of $0.001$ , and a range of $0.02<z<1.65$ , keeping close to the limits of the spectroscopic redshifts.

3.3 Other algorithms

We are also interested in how delight compares to other common template-based or machine-learning-based methods besides bcnz $2$ and deepz. Therefore two other photo- $z$ algorithms, annz $2$ and bpz are also used in this work, using the same training and template sets, to be compared with the performance of delight. In the following paragraphs we briefly introduce the two algorithms and their optimisation settings.

annz $2$ ⁴⁴4https://github.com/IftachSadeh/ANNZ (Sadeh et al., 2016) is a machine-learning-based photo- $z$ algorithm which has been widely used in recent works (Bonnett et al., 2016; Jouvel et al., 2017; Bilicki et al., 2018; Soo et al., 2018; Schmidt et al., 2020) due to its high customisability and its ability to produce PDFs. It uses the Toolkit for Multivariate Data Analysis (TMVA, Hoecker et al., 2007) with root (Brun & Rademakers, 1997), which allows it to run multiple different machine learning algorithms for training, and outputs photo- $z$ ’s based on a weighted average of their performance. In this work we ran annz $2$ with a mixture of $3$ machine learning methods, namely artificial neural networks (ANNs), boosted decision trees (BDTs) and $k$ -nearest neighbours (KNNs), see Hoecker et al. (2007) for detailed descriptions of these machine learning algorithms. An architecture of $N$ : $\frac{2N+1}{3}$ : $\frac{N+2}{3}$ : $1$ was used for the ANN; the bagging method was used to boost the decision trees; a polynomial kernel was used for the KNN; while the other hyperparameters for each method were individually optimised for best performance. annz $2$ version $2.3.1$ was used in this work, and the mean value of the PDF, $z_{\textrm{pdf}}$ was chosen to represent the photo- $z$ point estimate.

bpz ⁵⁵5http://www.stsci.edu/~dcoe/BPZ/ (Benítez, 2000), on the other hand, is one of the longest-standing template-based photo- $z$ algorithms, and still widely used today (Martí et al., 2014; Bundy et al., 2015; Cavuoti et al., 2017; Tanaka et al., 2018; Joudaki et al., 2020; Raihan et al., 2020). Other than sharing the usual attributes of a template-based code, bpz uses Bayesian inference, prior information of redshift distributions and template interpolation to improve photo- $z$ results. bpz version $1.99.3$ was used in this work, and similar to delight the Brown templates were used, with the interpolation parameter set to $2$ . We assumed the same functional form for the Bayesian priors as those used by COSMOS (Laigle et al., 2016). The peak of the PDF, $z_{\textrm{b}}$ , was used as the best photo- $z$ point estimate.

Other than annz $2$ and bpz, the results of delight are also compared to the results of bcnz $2$ , which was developed specifically for the PAUS data (Eriksen et al., 2019). bcnz $2$ is able to compute a linear combination of SED templates and is designed to deal with emission lines, extinction, and adjust zero-points between narrow and broad bands, all of which are crucial in the context of PAUS. The introduction of the code bcnz $2$ and its early demonstration of PAUS photo- $z$ can be found in Eriksen et al. (2019).

4 Flux Calibration

This work utilises fluxes obtained from two different surveys: the PAUS narrowband fluxes are measured using an aperture which covers $62.5$ per cent of light from the galaxy, while COSMOS broadband fluxes are measured using a fixed $3^{\prime\prime}$ aperture. Therefore, calibration is required to ensure that the flux values are consistent with one another. We only calibrate the broadband fluxes, leaving the narrowband fluxes untouched following Eriksen et al. (2019). The calibration process is done in two steps: first we derive empirical corrections to account for differences in the aperture photometry (calibration for each galaxy), then placing all bands at the same flux zero point (calibration for each band). For the correction for differences in flux aperture, we note that ideally this could have been easily done if spec- $z$ ’s are available; however since the evaluation set would not have spec- $z$ ’s available, we present $3$ alternatives in the following sections to calibrate the fluxes photometrically.

4.1 Correction for differences in flux aperture

In the first step we define a parameter $R_{\textrm{g}}$ , a correction factor estimated for each galaxy to be multiplied with all of its six $uBVriz$ broadband fluxes. Ideally, this factor is estimated by first finding the best-fit Brown template for each galaxy using only $40$ narrowband fluxes from PAUS and its true redshift. The best-fit template is then used to generate the predicted $uBVriz$ fluxes, and a weighted mean of the ratios between the predicted flux and the original COSMOS flux $R_{\textrm{g},b}$ is calculated for each band $b$ , given by

R_{\textrm{g}}=\frac{\sum_{b}R_{\textrm{g},b}/\sigma_{R_{\textrm{g}},b}^{2}}{\sum_{b}1/\sigma_{R_{\textrm{g}},b}^{2}},

(5)

where the sum is over the six COSMOS broad bands, and $\sigma_{R_{\textrm{g}},b}^{2}$ is the variance of $R_{\textrm{g},b}$ . Here we have assumed that the Brown templates are sufficiently representative, and therefore the predicted flux derived from it is the true flux of the broad bands. We have also assumed that $R_{\textrm{g},b}$ should be almost the same across each band for each galaxy. This calibration is motivated by the fact that each galaxy requires a calibration between fixed-size and adaptive aperture photometry dependent on its apparent size.

We now explore three different methods to determine $R_{\textrm{g}}$ from the photometric data only.

4.1.1 The photo-z calibration method

The first method, which we call the photo- $z$ calibration method, is very similar to the method above except that we replace the spectroscopic redshifts used to determine the predicted $uBVriz$ flux for the testing set with photometric redshifts. We first use delight and only the $40$ narrow bands to produce photo- $z$ ’s for each object, and then we use these photo- $z$ ’s to estimate the predicted fluxes, and then later $R_{\textrm{g}}$ for each galaxy. This implies that the better the quality of the photo- $z$ ’s produced by only the $40$ narrow bands, the better the calibrated broadband fluxes will be.

4.1.2 The size calibration method

The second method, hereafter the size calibration method, does not require the production of predicted fluxes for the testing set. Instead, this method uses the correlation between the sizes of galaxies with their values of $R_{\textrm{g}}$ in the training set, to predict the values of $R_{\textrm{g}}$ for objects in the testing set. With the predicted fluxes of the training set known, we plot $R_{\textrm{g}}$ against the $50$ per cent light radius $r_{50}$ (measured in pixels) for each object, and obtain a best-fit linear-least-squares regression line in the process,

R_{\textrm{g}}=m\cdot r_{50}+c,

(6)

where the slope and $y$ -intercept are found to be $m=0.0101$ and $c=0.4504$ respectively, with a correlation coefficient of $r=0.8349$ , implying a strong positive correlation between $R_{\textrm{g}}$ and $r_{50}$ .

With this relationship derived, the values of $R_{\textrm{g}}$ for each object in the testing set can be estimated. This method is motivated by the fact that the size of galaxies is a defining factor for the difference in their flux values when measured using a fixed aperture or when measured using a fixed light radius. Fig. 3 shows a scatter plot of $r_{50}$ v.s. $R_{\textrm{g}}$ for the training set, where the correlation equation is determined. The distribution of $R_{\textrm{g}}$ is also tabulated in the figure, it is shown to have a median value of $0.6349$ , implying that on average COSMOS measures more flux for each galaxy than PAUS. We note that in the case when galaxies have undefined values of $r_{50}$ , we substitute them with the mean value of $r_{50}=22.4934$ pixels.

4.1.3 The flux calibration method

The third and final method is the flux calibration method, which is similar to the method used by Eriksen et al. (2019), but simpler in that the Gaussian Process has a larger capacity to accommodate uncertainties. This method makes use of the fact that there are overlaps in wavelength between the COSMOS broad bands and PAUS narrow bands: the $V$ -band overlaps with the narrow bands $nb505$ to $nb585$ ( $9$ bands); the $r$ -band overlaps with $nb565$ to $nb685$ ( $13$ bands); and the $i$ -band overlaps with $nb705$ to $nb835$ ( $14$ bands). This overlap is illustrated in Fig. 4.

Similar to the previous method, no redshift information is required for flux prediction, the $R_{\textrm{g}}$ in this case is estimated by first averaging the narrowband fluxes within the range of the broad band of interest ( $V$ , $r$ or $i$ ), and then taking the ratio between the broadband flux and the averaged narrowband fluxes. This will give us $3$ values of $R_{\textrm{g},b}$ for the $3$ $Vri$ bands, and finally $R_{\textrm{g}}$ for each galaxy is taken as the weighted average of the $3$ values.

This method is simple yet effective: it does not involve the spectroscopic redshift, the photo- $z$ derived by $40$ narrow bands, or even the size of the galaxy. Here we assume that the $R_{\textrm{g}}$ estimated using $Vri$ is applicable for the $uBz$ bands as well. We will compare the overall photo- $z$ quality produced by the three methods above in Section 5.2.

4.2 Correction to flux zero points

After calibrating the COSMOS broadband fluxes for each galaxy, we proceed to calibrate the broadband magnitude offsets within each band. We perform a weighted least-squares fit between the predicted broadband fluxes (produced by delight using $40$ PAUS narrowband fluxes, the respective best fit Brown templates and zCOSMOS spec- $z$ ’s) and the original COSMOS $uBVriz$ fluxes in the training set, by using a simple linear equation,

\ln(F_{\textrm{p},b})=a_{b}\cdot\ln(F_{\textrm{g},b})+c_{b}

(7)

where $F_{\textrm{p},b}$ is the predicted flux for band $b$ , $F_{\textrm{g},b}$ the COSMOS broadband flux after undergoing the per-galaxy calibration, and $a_{b}$ and $c_{b}$ are constants to be optimised. The values of $a_{b}$ and $c_{b}$ estimated for each band using the training set are now used to calibrate the fluxes in the testing set, and these values are tabulated in Table 1. A weighted fit was implemented, with the inverse variances of the fluxes used as the weights, since we expect that objects which are brighter to have relatively lower variances, and by accounting for the variances of objects the fainter objects would be upweighted.

Table 1: List of the best fit parameters

a_{b}

and

c_{b}

for each band

b

when the predicted and original COSMOS fluxes from the training set were fitted with a weighted least-squares fit, using Equation 7.

Bands	$a_{b}$	$c_{b}$
$u$	$1.0007\pm 0.0001$	$0.0354\pm 0.0008$
$B$	$0.9906\pm 0.0002$	$0.2163\pm 0.0009$
$V$	$0.9988\pm 0.0002$	$-0.0830\pm 0.0009$
$r$	$1.0006\pm 0.0002$	$0.0015\pm 0.0009$
$i$	$1.0202\pm 0.0001$	$-0.0875\pm 0.0008$
$z$	$0.9791\pm 0.0001$	$0.0424\pm 0.0007$

As expected from the table, the values of $a_{b}$ and $c_{b}$ are very close to $1$ and $0$ respectively, since the calibrated flux for aperture correction $F_{\textrm{g},b}$ is already very close to the predicted flux $F_{\textrm{p},b}$ . Essentially, this process ‘straightens’ the correlation line, providing minor yet essential improvements to the overall calibration.

4.3 Overall calibration performance

Figure 5 shows the correlation between the broadband fluxes predicted by delight (using spectroscopic redshifts, PAUS $40$ narrow bands and Brown templates) and the COSMOS broadband fluxes for our training set, both before and after calibration (red and blue, respectively). The figure only shows the result of the flux calibration method, as the other two methods look very similar graphically (which translates to a small difference in photo- $z$ results shown later in Section 5).

The RMS values displayed in Figure 5 show that for all bands, the scatter between the original fluxes with respect to the predicted fluxes has reduced by $63$ to $88$ per cent after the two-step calibration was done. The scatter at low fluxes for the $u$ and $B$ -bands remains evident, which originated from the high uncertainty in flux measurements. Despite the large decrease in scatter, we note that the RMS value here is not a metric of improvement for calibration as we do not have the true values of the broadband fluxes in the matched apertures. However, the calibration of the broadband fluxes did translate into an improvement in photo- $z$ scatter and $68$ th percentile error by about $70$ to $80$ per cent, as shown in Section 5.

5 Results and Discussion

Table 2: The root-mean-square error (

\sigma_{\textrm{RMS}}

68

th percentile error (

\sigma_{68}

), outlier fraction (

\eta_{\textrm{out}}

), mean continuous ranked probability score (

\rho_{\textrm{CRPS}}

) and the root-mean-square error in redshift distribution (

n_{\textrm{RMS}}

) for the photo-

z

’s produced in this work, using different algorithms, methods and number of bands. All results are produced using

6

broad bands (BB) and

40

narrow bands (NB) unless stated otherwise.

Photo- $z$ methods	$\sigma_{\textrm{RMS}}$	$\sigma_{68}$	$\eta_{\textrm{out}}$ ( $\%$ )	$\rho_{\textrm{CRPS}}$	$n_{\textrm{RMS}}$
delight ( $6$ BB only)	$0.0514$	$0.0441$	$0.93$	$0.0388$	$0.885$
delight ( $40$ NB only)	$0.0684$	$0.0119$	$4.02$	$0.0298$	$0.637$
delight (no calibration)	$0.1555$	$0.0566$	$9.06$	$0.0887$	$0.895$
delight (photo- $z$ calibration method)	$0.0335$	$0.0083$	$0.71$	$0.0158$	$0.634$
delight (size calibration method)	$0.0341$	$0.0095$	$0.76$	$0.0165$	$0.646$
delight (flux calibration method)	$0.0331$	$0.0081$	$0.86$	$0.0155$	$0.636$
delight (flux calibration method, no GP)	$0.0442$	$0.0089$	$0.98$	$0.0179$	$0.639$
annz $2$	$0.0556$	$0.0396$	$2.66$	$0.0719$	$0.465$
annz $2$ ( $6$ BB only)	$0.0371$	$0.0202$	$1.14$	$0.0522$	$0.432$
bpz	$0.0368$	$0.0089$	$0.86$	$0.0184$	$0.740$
bcnz $2$	$0.0403$	$0.0085$	$1.14$	$-$	$-$

Table 2 summarises the results of this work, it shows all the photo- $z$ metrics we produced, using different algorithms (delight, annz $2$ , bpz), different calibration methods (flux, photo- $z$ and size), and different number of input fluxes ( $6$ broad bands, $40$ narrow bands, or both). We divide the analysis of the results into two sections: Section 5.2 studies the performance between the three calibration methods used in delight, while Section 5.3 compares the best performance of delight with annz $2$ , bpz and bcnz $2$ . In the following section, we briefly introduce the performance metrics we used in this work.

5.1 Performance metrics

In this work we use three metrics to quantify the performance of the photo- $z$ point estimates: the root-mean-square error ( $\sigma_{\textrm{RMS}}$ ), the $68$ th percentile error ( $\sigma_{68}$ ) and the outlier fraction rate ( $\eta_{\textrm{out}}$ ). With $\Delta z\equiv\frac{z_{\textrm{phot}}-z_{\textrm{spec}}}{1+z_{\textrm{spec}}}$ , the above metrics are defined as follows:

\sigma_{\textrm{RMS}}\equiv\sqrt{\frac{1}{N}\sum_{i}^{N}\left|\Delta z_{i}\right|^{2}}\ ,

(8)

\sigma_{68}\equiv\frac{Q_{84.1\%}(\Delta z_{i})-Q_{15.9\%}(\Delta z_{i})}{2}\ ,

(9)

\eta_{\textrm{out}}\equiv\textrm{\% objects where}\left|\Delta z_{i}\right|\geq 0.15\ .

(10)

Here $N$ is the total number of galaxies, while $Q$ is a percentile of the distribution. Since $\sigma_{\textrm{RMS}}$ is calculated without the outliers removed, it measures the overall scatter of the sample, whereas $\sigma_{68}$ measures the scatter with reduced sensitivity to outliers.

With similar motivations as Martí et al. (2014) and Eriksen et al. (2019), we hope to achieve an overall photo- $z$ error of $\sigma_{68}\leq 0.0035(1+z_{\textrm{spec}})$ for at least $50$ per cent of the testing sample after applying an appropriately chosen quality cut. We use the Bayesian ODDS ( $\Theta$ ) parameter (Benítez, 2000) in delight, similar to its implementation in annz $2$ by Soo et al. (2018). $\Theta$ can be estimated from the photo- $z$ PDF, $p(z)$ using the equation

\Theta=\int^{z_{\textrm{p}}+k(1+z_{\textrm{p}})}_{z_{\textrm{p}}-k(1+z_{\textrm{p}})}p(z)\,dz\ ,

(11)

where $z_{\textrm{p}}$ is the peak of $p(z)$ and $k=0.01$ . $\Theta$ ranges between $0$ and $1$ , the higher the value the lower the $p(z)$ width, which implies a more precisely predicted photo- $z$ (though not necessarily accurate). The value of $k$ is arbitrary, appropriately selected such that not too many objects end up having $\Theta=1$ . Therefore, an $x$ per cent quality cut on the sample keeps the top $x$ percent of objects with the highest values of $\Theta$ .

To assess the quality of the $p(z)$ , we use probability integral transform (PIT) plots and the continuous ranked probability score (CRPS). The PIT is the cumulative distribution function (CDF) at $z_{\textrm{spec}}$ while asserting the $p(z)$ to have an area of unity. Since the photo- $z$ CDF is $C(z)=\int^{z}_{0}p(z^{\prime})\ dz^{\prime}$ , PIT is defined to be

\textrm{PIT}=C(z_{\textrm{spec}})=\int_{0}^{z_{\textrm{spec}}}p(z)\,dz\ .

(12)

A PIT distribution tells us on average if the $p(z)$ produced are ‘adequately shaped’: the shape of the PIT distribution can tell us if the $p(z)$ produced are generally too wide/narrow, or if the $p(z)$ are over/under-predicting the true redshift.

The CRPS on the other hand tells us how well the $p(z)$ encapsulates or predicts the true redshift ( $z_{\textrm{spec}}$ ). The CRPS of a $p(z)$ can be expressed as

\textrm{CRPS}=\int^{\infty}_{-\infty}\left|C(z)-\mathcal{H}(z-z_{\textrm{spec}})\right|^{2}\,dz\ ,

(13)

where $\mathcal{H}(z-z_{\textrm{spec}})$ is the Heaviside step function with

\mathcal{H}(z-z_{\textrm{spec}})=\left\{\begin{array}[]{ll}1,&z=z_{\textrm{spec}}\\ 0,&\textrm{otherwise}.\end{array}\right.

(14)

In this work, we use the symbol $\rho_{\textrm{CRPS}}$ to represent the average CRPS value of all galaxies in the testing sample, in which the smaller the value, the better the $p(z)$ are at predicting their true redshifts. We refer the reader to Polsterer et al. (2016) for a detailed description of both PIT and CRPS.

Finally, we also assess the quality of the redshift distribution $n(z)$ . We can find how similar the spec- $z$ distribution $n_{\textrm{spec}}(z)$ is compared to the photo- $z$ distribution $n_{\textrm{phot}}(z)$ by estimating $n_{\textrm{RMS}}$ , the root-mean-square difference between the distributions:

n_{\mathrm{RMS}}=\sqrt{\int\left[n_{\mathrm{phot}}(z)-n_{\mathrm{spec}}(z)\right]^{2}\,dz}.

(15)

$n_{\textrm{RMS}}$ provides us a quantitative measure to compare the performances of photo- $z$ with distributions produced by different codes.

5.2 Performance of Delight

Rows $1$ and $2$ from Table 2 shows the photo- $z$ ’s produced when only trained using the broad and narrow bands individually, and we find that by combining both broad and narrow bands (rows $4$ to $6$ ), we have achieved at least $34$ and $20$ per cent improvement in the photo- $z$ scatter and $\sigma_{68}$ , respectively (visualised in Fig. 6).

Rows $3$ to $7$ proceed to show the metrics for each calibration method, and on average, the performance of each method is quite similar, all within $4$ to $16$ per cent difference in $\sigma_{\textrm{RMS}}$ and $\sigma_{68}$ , respectively. Statistically, the flux calibration method seems to perform slightly better compared to the remaining ones, with the exception of the photo- $z$ calibration method having better values of $\eta_{\textrm{out}}$ and $n_{\textrm{RMS}}$ . This suggests that while the photo- $z$ ’s produced by training with only $40$ narrow bands are not as competitive as when trained with all $46$ bands and calibrated broad bands (see Table 2 and Fig. 6), it is however sufficient to guide the calibration process. Note that we have also included the results of delight run as a pure template code when calibrated using the flux calibration method for comparison, and we see that without the help of the GP, the photo- $z$ results are similar for most metrics except a degradation in scatter of up to $-33.5$ per cent. Therefore the good results of delight shown here are mainly due to the use of the Brown templates, the flux calibration, the combination of broad and narrow bands, and also the work of the GP.

As the three calibration methods presented in Section 4 all result in very similar photo- $z$ performance, we will only show results for the flux calibration method in the following. It is notable however that in all cases, the photo- $z$ requirement of $\sigma_{68}<0.0035(1+z)$ is achievable for all objects at $i_{\textrm{auto}}<20.0$ , or objects with a $40$ per cent $\Theta$ cut at $i_{\textrm{auto}}<22.5$ . All three methods also shows that despite such high percentage $\Theta$ cuts being implemented, a significant number of high photo- $z$ objects still remain in the sample.

5.3 Comparison with Other Algorithms

Since the delight results for each of the three calibration methods are very similar to each other, we decided to select only the flux calibration method to be compared to the results obtained by the two other algorithms used in this work, annz $2$ and bpz. We also include the point estimates from Eriksen et al. (2019). The values of $\sigma_{\textrm{RMS}}$ , $\sigma_{68}$ and other relevant metrics obtained from these algorithms are shown in rows $8$ to $11$ of Table 2, and visualised in Fig. 7.

From the figure, it is found that annz $2$ , being a purely machine-learning based algorithm, is underperforming compared to the other algorithms. This machine-learning method is unable to make full use of the extra information provided by the $40$ narrow bands, and is shown to perform better without them. This is partially due to the problem of the curse of dimensionality (Bellman, 1957), sharply diluting the pattern recognition power of the algorithm as the number of inputs increases. Besides, the very small training sample size may have heavily affected the potential of annz $2$ . Here we note however that the deep learning code deepz is shown to work well on a similar sample (Eriksen et al., 2020), therefore we hope to do follow-up evaluations of annz $2$ on PAUS data in the future when a larger training set is available.

In terms of the quality of the point estimate photo- $z$ ’s, delight is shown to fare well against bcnz $2$ and bpz (Fig. 7), both of which are purely template-based methods. As both delight and bpz used the same template sets in this case (i.e. the Brown templates), we find that the Gaussian process contributed to $25$ and $9$ per cent improvement in the scatter and $\sigma_{68}$ , respectively, as compared to the pure template fit of bpz.

Despite the similarities in the point estimates for the entire sample (Table 2), when we cut the sample in percentages of $\Theta$ (Figs. 8 and 9), we see two major differences. Firstly, the cut in $\Theta$ for bpz does not systematically remove objects with high uncertainties (especially for objects brighter than $i_{\textrm{auto}}=21$ ); and secondly, the cut in $\Theta$ for bpz selectively removes objects with lower photo- $z$ . In both cases, delight is shown to not only perform better in this regard as compared to bpz, but also better than all other algorithms shown.

A selection of sample $p(z)$ produced by each algorithm is shown in Fig. 10, while the overall quality of the $p(z)$ produced are visualised in the PIT plots as shown in Fig. 11. Once again we see delight on average producing superior $p(z)$ compared to annz $2$ and bpz: it is obvious from the PIT plots that the $p(z)$ produced by annz $2$ are too narrow (a U-shaped distribution), while those by bpz are too wide (a significant central peak). In terms of $\rho_{\textrm{CRPS}}$ (see Table 2), delight once again performs better than both bpz and annz $2$ , where the adequate shapes and accurately positioned peaks of the $p(z)$ provide good predictions of the true redshift.

We note that the $p(z)$ produced by annz $2$ are ragged compared to bpz and delight, this is due to the limited training sample size and the low number of network committees used. We intend to look into several methodologies to smoothen machine-learning based $p(z)$ which are limited by such conditions; this is left for future work. The limited testing size has also produced an $n_{\textrm{spec}}(z)$ distribution which is not smooth, thus despite annz $2$ producing an $n(z)$ closest to the spectroscopic distribution (lowest $n_{\textrm{RMS}}$ ), it may have experienced overfitting. Having said that, for the different delight runs shown in Table 2, the values of $n_{\textrm{RMS}}$ are consistent with the other metrics. Therefore, we leave the analysis of $n(z)$ to future work when a large enough testing sample is available.

6 Application: identifying photo-z outliers

6.1 Analysing the photo-z outliers of Delight and BCNz2

As we compared the photo- $z$ results, we discovered that there are some galaxies that have similar delight and bcnz $2$ photo- $z$ values, however these redshift values are far from their respective zCOSMOS spectroscopic redshifts or broadband photo- $z$ ’s. Since both bcnz $2$ and delight utilise the PAUS narrow bands, we expect that the photo- $z$ ’s they produce are more sensitive to emission lines as compared to photo- $z$ ’s produced using only broad bands. Therefore, we suspect that objects that have similar photo- $z$ values for delight and bcnz $2$ but have disagreeing spec- $z$ values to be an indication of either having (1) a catastrophic zCOSMOS spectroscopic redshift⁶⁶6While we have already selected to use only secure spectroscopic redshifts in this work, we still deem this as a possibility, since a $1$ per cent outlier rate in $4000+$ spec- $z$ measurements may still yield $40$ objects, which is within the same order of number of objects being investigated in this section. Our results later in this section however have verified that most of the outliers are not caused by catastrophic spectroscopic redshifts., (2) outlier broadband or narrowband fluxes, or (3) misidentification of close neighbours.

For the purpose of this inquiry, we have selected $30$ objects from the sample which are photo- $z$ outliers in $z_{\textrm{Delight}}$ vs. $z_{\textrm{spec}}$ or $z_{\textrm{BCNz}}$ vs. $z_{\textrm{spec}}$ , yet are not outliers in $z_{\textrm{Delight}}$ vs. $z_{\textrm{BCNz}}$ . Mathematically, they satisfy the following conditions:

1.

$\frac{\left|z_{\textrm{Delight}}-z_{\textrm{spec}}\right|}{1+z_{\textrm{spec}}}\geq 0.15$ or $\frac{\left|z_{\textrm{BCNz}}-z_{\textrm{spec}}\right|}{1+z_{\textrm{spec}}}\geq 0.15$ , and
2.

$\frac{\left|z_{\textrm{Delight}}-z_{\textrm{BCNz}}\right|}{1+\frac{z_{\textrm{Delight}}+z_{\textrm{BCNz}}}{2}}<0.15$ .

Note that the $z_{\textrm{Delight}}$ used here refers to the photo- $z$ produced using the flux calibration method, trained using $46$ bands guided by the Brown templates.

These $30$ objects are visualised in the redshift-redshift plots in Fig. 12. Note that in the following paragraphs, we will define a photo- $z$ to be catastrophic if it is found to be an outlier with respect to its spec- $z$ , as defined mathematically above. These objects are found to have faint magnitudes ( $i_{\textrm{auto}}>19.75$ ) and small angular sizes ( $r_{50}<60$ ACS pixels, or $1.8^{\prime\prime}$ ), which describe most galaxies of interest for PAUS. We study several different attributes of these objects, namely their respective photo- $z$ ’s by delight, bcnz $2$ and lephare, photo- $z$ PDFs, best-fit templates (Brown and GP), spectra and images. We summarise important observations according to their respective attributes below.

Photo- $z$ ’s. While these $30$ objects have been identified as outliers when trained using $46$ bands, we find that two-thirds of these objects have non-catastrophic photo- $z$ ’s when trained with either only the broad or narrow bands, respectively. In other words, only one-third of these objects have catastrophic photo- $z$ ’s regardless of which bands were used in the training or fitting process. This suggests that most of the time, outlier fluxes in the broad or narrow bands may have caused a degradation in photo- $z$ quality when trained together (more on this in the templates paragraph below). We have also made a comparison between delight photo- $z$ ’s with those produced by lephare for the COSMOS2015 catalogue (Laigle et al., 2016), and found that in fact half of the $30$ objects have non-catastrophic lephare photo- $z$ ’s. This suggests that the infrared $yJHK$ bands could have played a role in improving the PAUS photo- $z$ ’s, and could be incorporated in future trainings in case the PAUS photometry is problematic⁷⁷7We note that these additional bands will not be available over most of PAUS, which targets Canada-France-Hawaii Telescope Legacy Survey (CFHTLS) wide fields W1 to W4. There is however some infrared data on these fields provided by the Wide-field InfraRed Camera (WIRCam) and the VISTA Kilo-degree Infrared Galaxy Survey (VIKING)..

Photo- $z$ PDFs. We inspected the secondary/tertiary peaks of the PDFs for all delight runs (trained with $6$ broad bands, $40$ narrow bands, or both), and find that less than $20$ per cent of these secondary/tertiary peaks coincide with their respective spec- $z$ ’s. We deduce that despite the importance of secondary PDF peaks in redshift distributions, they do not significantly influence the photo- $z$ quality of these $30$ objects.

Templates. delight utilises the $129$ Brown et al. (2014) templates and the $4203$ training objects to guide the GP to produce the same number of new flux-redshift templates, which are used to produce photo- $z$ ’s for the objects. In the training process, delight would always choose one best-fit Brown template for each training galaxy to be trained by the GP. Here we inspected two different kinds of best-fit Brown templates to these $30$ outliers: one fixed at the spec- $z$ , and the other with the redshift as a free parameter. In both cases, we examined

1.

if the objects fit to the same templates when trained with only broad bands, only narrow bands, or both, respectively;
2.

if there are any trends in galaxy morphological types, based on the galaxy type classification indicated by the template;
3.

if there is any correlation between the $\chi^{2}$ value of the best-fit templates and the quality of photo- $z$ ’s; and
4.

if any outlier narrowband fluxes can be identified as the cause of the degradation of photo- $z$ .

As expected, we find that $70$ per cent of the outlier objects have different best-fit Brown templates between the fits at fixed photo- $z$ and spec- $z$ , which contrasts with the case for non-outliers at only $35$ per cent. We also find that only slightly more than a third of both the outlier and non-outlier objects were fitted to the same templates when trained using broad bands as compared to trained with all $46$ bands. The high percentage of objects with different template fits at different reference redshifts (photo- $z$ or spec- $z$ ) and flux combinations (broad bands, narrow bands, or both) also resulted in no trend in galaxy morphological types among the outliers.

However, it was found that up to $60$ per cent of the objects have their best-fit template $\chi^{2}$ value correlating with the quality in photo- $z$ , which further affirms the usage of this as a metric to remove unreliable photo- $z$ ’s (see Section 6.2), as also attempted by Eriksen et al. (2019) and Eriksen et al. (2020).

Perhaps a more significant finding from the study of the best-fit templates is the ability to identify outlier narrowband fluxes. Fig. 13 shows an example which highlights the importance of identifying outlier narrowband fluxes, which is shown to significantly affect the photo- $z$ results. It was found that a third of the $30$ objects contained outlier narrowband fluxes, which results in entirely different template fits and photo- $z$ ’s when trained with narrow bands, as compared to when trained with broad bands only. Among these $10$ objects, $8$ of them are shown to have worse photo- $z$ as compared to training without the narrow bands. We find indications for a significant fraction of narrowband flux outliers also for galaxies without catastrophic redshift failures. Forthcoming PAUS data reductions will therefore implement methods to identify and correct flux outliers.

Images. We inspect the individual object images compiled by zCOSMOS DR $3$ , these are $5^{\prime\prime}\times 5^{\prime\prime}$ images observed by the Hubble Space Telescope/Advanced Camera for Surveys (HST/ACS) in the F $814$ W filter (Koekemoer et al., 2007). Among the $30$ outlier objects, we find $63.3$ and $26.7$ per cent of them having bright neighbours within $5^{\prime\prime}$ and $3^{\prime\prime}$ of the primary source, respectively. Having said that, we have not found any correlation between the presence of bright neighbours to the other attributes that we have studied thus far. In fact the opposite is true: we find that $60$ per cent of the objects with outlier narrowband fluxes actually have primary sources without any bright neighbours in vicinity.

Spectra. So far we have assumed that the zCOSMOS spectra obtained are reliable, as only entries with high-confidence quality flags have been selected for training (see Section 2.3). In order to probe further, we examined the one-dimensional spectra obtained by the VIMOS spectrograph, which is processed by the VIMOS Interactive Pipeline and Graphical Interface (VIPGI, Scodeggio et al., 2005) to produce the zCOSMOS spec- $z$ ’s used in this work. The spectra have a range between $5500$ Å and $9450$ Å, measured with a resolution of $R\sim 600$ at $2.5$ Å per pixel (Lilly et al., 2009).

We used the redshift measurement tool ez (Garilli et al., 2010) to inspect the spectra of the $30$ outlier objects, and compared our best fits to the spectroscopic redshift produced by zCOSMOS, and also the photo- $z$ ’s produced by delight, bcnz $2$ , deepz, lephare (COSMOS2015) and those of Alarcon et al. (2021).

Upon inspection, we find that up to $10$ of these objects ( $33$ per cent) have disputable zCOSMOS spec- $z$ (e.g. two possible redshift values, different best-fit redshift values, line confusion, and low signal-to-noise). However, most of these potential spec- $z$ failures could be forced-fitted to the zCOSMOS spec- $z$ and still look satisfactory, which leaves only $2$ ( $6.7$ per cent) of these objects having truly catastrophic spec- $z$ ’s. Both these objects are found to have better ez fits at redshift values within $10$ per cent uncertainty from the photo- $z$ ’s produced by delight and other algorithms. The spectrum of one of these objects is shown in Fig. 14. We have also found one isolated case where the spectra belonged to a bright neighbour and has been mismatched to the PAUS photometry.

Generally, the higher-redshift objects are identified by clear O II ( $3727.1$ Å) emission lines, while the lower-redshift objects are identified by clear H $\alpha$ ( $6564.6$ Å) emission lines. We therefore conclude that although catastrophic spec- $z$ ’s played a role in this situation, our results did not provide enough evidence to say that it is a major cause for catastrophic photo- $z$ ’s produced by bcnz $2$ and delight. This is not surprising since we have only selected secure spectroscopic redshifts from COSMOS to be used in this work. However this highlights the usefulness of multiple PAUS photo- $z$ ’s being used to determine failure rates in insecure spectroscopic redshifts.

To summarise this part, we believe that the potentially important source for catastrophic photo- $z$ ’s in the context of PAUS are the outlier narrowband fluxes, with weak evidence for the existence of a small number of spec- $z$ failures. We leave the tackling of outlier narrowband fluxes to future work, but in the following section, we attempt to improve our process to identify and remove these outlier photo- $z$ ’s.

6.2 New metrics to remove photo-z outliers

In Figs. 8 and 9 we have used the Bayesian odds ( $\Theta$ ) to cut the sample, and the aim of this was to keep as many objects as possible while achieving the goal of $\sigma_{68}\leq 0.0035(1+z)$ . Here, we extend our previous results further towards that goal by introducing several new metrics to better separate the photo- $z$ outliers from the sample. These metrics are motivated by the inspection of the $30$ outliers in Section 6.1, and they are defined as follows:

1.

The Delight-BCNz2 metric ( $\Delta_{\textrm{DB}}$ ),

$\Delta_{\textrm{DB}}\equiv\frac{\left|z_{\textrm{Delight}}-z_{\textrm{BCNz}}\right|}{1+\frac{z_{\textrm{Delight}}+z_{\textrm{BCNz}}}{2}},$ (16)

a metric used to identify the similarity between delight and bcnz $2$ photo- $z$ ’s. It is plausible that, in general, the closer the photo- $z$ ’s between the two algorithms, the more reliable they are;
2.

The Delight photo- $z$ standard deviation ( $\sigma_{\textrm{D}}$ ), which is the standard deviation between all delight photo- $z$ runs regardless of calibration method and number of bands. Smaller deviations could indicate more reliable photo- $z$ ’s;
3.

The chi-squared value of the best-fit Brown template ( $\chi^{2}_{\textrm{t}}$ ), where we identified a trend that the better the fit, the more reliable the photo- $z$ ; and
4.

The broadband-narrowband complementary metric ( $\rho^{2}$ ),

$\rho^{2}\equiv\int p_{\textrm{BB}}(z)p_{\textrm{NB}}(z)\,dz,$ (17)

where $p_{\textrm{BB}}(z)$ and $p_{\textrm{NB}}(z)$ are the $p(z)$ produced by delight when trained with only broad bands and only narrow bands, respectively. By multiplying these two $p(z)$ and summing over the distribution at each step $i$ , we can identify the consistency between the broadband and narrowband $p(z)$ . A higher value of $\rho^{2}$ means a larger overlap, which indicates more reliable photo- $z$ ’s.

Together with $\Theta$ and the delight photo- $z$ error ( $\delta z$ ), we yield a total of $6$ metrics to experiment with. Using the results from the flux calibration method, we generate and test the individual performance for each of these metrics. For each metric, we measure the $\sigma_{\textrm{RMS}}$ and $\sigma_{68}$ after systematically removing objects with the worst metric values, $10$ per cent of the total sample size each time, until we reach a sample size of only $40$ per cent.

We also repeat the exercise by using combined cuts on several metrics, testing all $57$ combinations of the $6$ metrics. We note that we do not combine the metrics by averaging or multiplying them, as it would have diluted the impact of the individual metrics. Instead, we rank the values for each metric individually (from best to worst), and remove objects rank by rank, starting with metric values lying in the worst rank. E.g. for the combination of metrics $\Theta+\Delta_{\textrm{DB}}$ , we first remove all objects which share the worst values of $\Theta$ and $\Delta_{\textrm{DB}}$ , then remove all objects sharing the second worst values of them, and so on, until we reach a required sample size percentile ( $90$ , $80$ , etc), where we output the values of $\sigma_{\textrm{RMS}}$ and $\sigma_{68}$ . We visualise the performance of these metric cuts at several percentiles for $\sigma_{68}$ with respect to $i_{\textrm{auto}}$ (cumulative) in Fig. 15.

We find that each performance metric cuts the sample differently: while metric cuts of $\sigma_{\textrm{D}}$ and $\rho^{2}$ reduce the scatter ( $\sigma_{\textrm{RMS}}$ ) significantly, metric cuts of $\Theta$ and $\Delta_{\textrm{DB}}$ reduce the $\sigma_{68}$ instead. The metric $\chi^{2}_{\textrm{t}}$ , however does not seem to bring any significant improvement to the results. We have also plotted a cut in $\Delta z=\frac{\left|z_{\textrm{phot}}-z_{\textrm{spec}}\right|}{1+z_{\textrm{spec}}}$ (bottom-left panel in Fig. 15), which is the theoretical ‘best metric’, providing an upper limit to be compared with the performance of each of the metrics. Here we noticed that even with the theoretical best metric, a cut of slightly lesser than $70$ per cent (blue line) on the sample is still necessary to fulfil the PAUS target of $\sigma_{68}<0.0035(1+z)$ (dotted line) for delight.

Therefore, we select the $60$ per cent cut (navy line, retaining $60$ per cent of galaxies) as a benchmark to assess the performance of these metrics, we do so by locating where this line cuts the dotted line (i.e., finding the maximum value of $i_{\textrm{auto}}$ where the photo- $z$ ’s achieves the PAUS target at $60$ per cent cut). From Fig. 15, it is clear that cutting in all $6$ metrics does not necessarily outperform the performance when cutting with only $\Theta$ , so we searched for the best combination of metrics for $\sigma_{\textrm{RMS}}$ and $\sigma_{68}$ separately.

For $\sigma_{\textrm{RMS}}$ , the best combination of metrics is $\Delta_{\textrm{DB}}+\sigma_{\textrm{D}}+\rho^{2}$ , and this combination achieves $\sigma_{\textrm{RMS}}<0.0035(1+z)$ at $i_{\textrm{auto}}<19.27$ at $60$ per cent cut, a significant improvement to the case when only $\Theta$ was used, where it did not cut the line at all. For $\sigma_{68}$ , the best combination of metrics is $\Theta+\Delta_{\textrm{DB}}$ where it reached $\sigma_{\textrm{68}}<0.0035(1+z)$ at $i_{\textrm{auto}}<21.25$ at $60$ per cent cut, which is also a significant improvement as compared to $\Theta$ at $i_{\textrm{auto}}<20.88$ . Here we note that in fact using $\Delta_{\textrm{DB}}$ alone, the target can be reached at a higher limit of $i_{\textrm{auto}}<21.50$ , which highlights the significance of a synergy between delight and bcnz $2$ in selecting a high quality photo- $z$ sample.

Finally, we also show the performance of the metrics in terms of the completeness with respect to the photo- $z$ (using delight’s flux calibration method), visualised in Fig. 16. We find that metrics like $\sigma_{\textrm{D}}$ and $\rho^{2}$ tend to selectively remove high photo- $z$ objects, while $\Theta$ , $\chi^{2}_{\textrm{t}}$ and $\Delta_{\textrm{DB}}$ tend to remove mid-ranged photo- $z$ objects. In general, a cut using all $6$ performance metrics at $60$ per cent cut shows a balanced result in the completeness, keeping a sufficient number of high redshift objects in the sample.

To summarise the performance of the individual metrics,

•

$\chi^{2}_{\textrm{t}}$ is the least-performing metric here; it does not bring significant positive impact to the results;
•

Cuts in $\sigma_{\textrm{D}}$ and $\rho^{2}$ help to improve the scatter, however, they tend to selectively remove higher photo- $z$ objects from the sample;
•

$\Theta$ and $\delta z$ show very similar results, however $\Theta$ tends to keep more high photo- $z$ objects in the sample; and
•

$\Delta_{\textrm{DB}}$ is the best-performing metric here, and we recommend the use of such a metric to remove outlier photo- $z$ ’s from a sample.

7 Conclusion and Future Work

In this work we have optimised delight, a hybrid template-machine learning algorithm such that it could be used to obtain photo- $z$ ’s for PAUS, by utilising its $40$ narrowband fluxes combined with $6$ $uBVriz$ COSMOS broadband fluxes. We have shown three distinct methods to calibrate the broadband and narrowband fluxes, and found that all three methods yield comparable results, although the most stable and the one which produces the lowest value of $\sigma_{68}$ is what we defined as the flux calibration method: a method where we calibrate the broadband fluxes with respect to the narrowband fluxes by finding the flux ratio of the filter combinations which overlap. This calibration method is entirely photometric, and it was able to produce photo- $z$ ’s with a scatter reaching as low as $\sigma_{\textrm{RMS}}=0.0331(1+z)$ and $\sigma_{68}=0.0081(1+z)$ for the full PAUS galaxy sample at $i_{\textrm{auto}}<22.5$ .

We have also compared the results of delight with a machine learning algorithm (annz $2$ ) and a template-based algorithm (bpz and bcnz $2$ ). We find that annz $2$ underperforms significantly, indicating that annz $2$ in its basic form is not suitable for narrowband surveys with large number of bands and small number of training objects.

Despite the photo- $z$ performance of bpz being within $9$ per cent difference of that of delight, the latter still stood out in terms of the quality of the photo- $z$ PDF $p(z)$ ( $16$ per cent better in $\rho_{\textrm{CRPS}}$ ) and the effectiveness of its Bayesian odds ( $\Theta$ ) cut in retaining objects with higher quality photo- $z$ without losing too many high-redshift objects. delight is also shown to produce competitive results as compared to bcnz $2$ ( $5$ per cent lower in $\sigma_{68}$ ), the default photo- $z$ produced for the PAUS.

Further investigation on the common photo- $z$ outliers of delight and bcnz $2$ led to the conclusion that outlier narrowband fluxes are the main cause for erroneous photo- $z$ ’s, an insight which will inform improvements in forthcoming PAUS data reductions. We have also inspected the spectra and identified catastrophic spec- $z$ ’s, however the effects are shown to be insignificant in this work. Motivated by the study of $30$ outliers shared between delight and bcnz $2$ , we introduced several new metrics to help improve the identification of photo- $z$ outliers and remove them from the sample to achieve better results. From the $6$ metrics compared, our newly introduced delight-bcnz $2$ metric ( $\Delta_{\textrm{DB}}$ ) is shown to significantly improve our photo- $z$ quality, allowing it to reach the PAUS target of $\sigma_{68}<0.0035(1+z)$ at $i_{\textrm{auto}}<21.5$ while retaining $60$ per cent of the sample objects. These new metrics could be utilised to return more accurate uncertainties in redshift, which are vital in many cosmological studies.

This opens the door to future studies in finding synergies between different photo- $z$ algorithms and between broadband and narrowband photometry. Together with the promising developments of deep learning approaches to deal with narrowband data (Eriksen et al., 2020), these insights will pave the way towards unprecedentedly precise and accurate photometric redshifts for the full PAUS survey and beyond, like the Javalambre Physics of the Accelerating Universe Astrophysical Survey (J-PAS, Benítez et al., 2014).

Acknowledgements

The authors wish to thank the referee for the helpful and constructive comments. JYHS would like to thank Boris Leistedt for fruitful discussions and the setup of delight earlier in this work. JYHS also would like to thank Hwee San Lim and Tiem Leong Yoon for assisting in the setup of equipment in Universiti Sains Malaysia where most of the computational work of this paper was completed. JYHS acknowledges financial support from the MyBrainSc Scholarship by the Ministry of Education, Malaysia, a studentship provided by Ofer Lahav, and the Short Term Research Grant by Universiti Sains Malaysia (304/PFIZIK/6315395). JYHS and BJ acknowledge support by the University College London Cosmoparticle Initiative. MS acknowledges funding from the National Science Centre of Poland (UMO-2016/23/N/ST9/02963) and the Spanish Ministry of Science and Innovation through the Juan de la Cierva Formación programme (FJC2018-038792-I). H. Hildebrandt acknowledges support by a Heisenberg grant of the Deutsche Forschungsgemeinschaft (Hi 1495/5-1) and an ERC Consolidator Grant (no. 770935). H. Hoekstra acknowledges support from the Netherlands Organisation for Scientific Research (NWO) through grant 639.043.512. IEEC and IFAE are partially funded by the Institució Centres de Recerca de Catalunya (CERCA) and Beatriu de Pinós Programme of Generalitat de Catalunya. Work at Argonne National Lab is supported by UChicago Argonne LLC, Operator of Argonne National Laboratory (Argonne). Argonne, a U.S. Department of Energy Office of Science Laboratory, is operated under contract no. DE-AC02-06CH11357.

This project has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under the Marie Skłodowska-Curie Actions, through the following projects: Latin American Chinese European Galaxy (LACEGAL) Formation Network (no. 734374), the Enabling Weak Lensing Cosmology (EWC) Programme (no. 776247), and Barcelona Institute of Science and Technology (PROBIST) Postdoctoral Programme (no. 754510).

PAUS is partially supported by the Ministry of Economy and Competitiveness (MINECO, grants CSD2007-00060, AYA2015-71825, ESP2017-89838, PGC2018-094773, PGC2018-102021, SEV-2016-0588, SEV-2016-0597 and MDM-2015-0509). Funding for PAUS has also been provided by Durham University (ERC StG DEGAS-259586), ETH Zurich, and Leiden University (ERC StG ADULT-279396).

The PAU data centre is hosted by the Port d’Informació Científica (PIC), maintained through a collaboration of CIEMAT and IFAE, with additional support from Universitat Autònoma de Barcelona and the European Research Development Fund (ERDF).

Data availability

The data from PAUS (photometry and photo- $z$ ’s) is currently not yet publicly available. The data from COSMOS were accessed from the ESO Catalogue Facility (https://www.eso.org/qi/, while the data from zCOSMOS (spectra and spec- $z$ ’s) were accessed from the zCOSMOS database (http://cesam.lam.fr/zCosmos/). The derived data generated in this research will be shared on reasonable request to the corresponding author.

References

Abbott et al. (2005) Abbott T., et al., 2005, preprint (arXiv:astro-ph/0510346)
Aihara et al. (2018) Aihara H., et al., 2018, PASJ, 70
Alarcon et al. (2021) Alarcon A., et al., 2021, MNRAS, 501, 6103
Bellman (1957) Bellman R., 1957, JPSJ, 12, 1049
Benítez (2000) Benítez N., 2000, ApJ, 536, 571
Benítez et al. (2014) Benítez N., et al., 2014, preprint (arXiv:1403.5237)
Benjamin et al. (2013) Benjamin J., et al., 2013, MNRAS, 431, 1547
Bilicki et al. (2018) Bilicki M., et al., 2018, A&A, 616, A69
Bonfield et al. (2010) Bonfield D. G., Sun Y., Davey N., Jarvis M. J., Abdalla F. B., Banerji M., Adams R. G., 2010, MNRAS, 405, 987
Bonnett et al. (2016) Bonnett C., et al., 2016, Phys. Rev. D, 94, 042005
Boulade et al. (2003) Boulade O., et al., 2003, Proc. SPIE, 4841, 72
Brescia et al. (2018) Brescia M., Cavuoti S., Amaro V., Riccio G., Angora G., Vellucci C., Longo G., 2018, in Kalinichenko L., Manolopoulos Y., Malkov O., Skvortsov N., Stupnikov S., Sukhomlin V., eds, Data Analytics and Management in Data Intensive Domains. Communications in Computer and Information Science. Springer International Publishing, pp 61–72, doi:10.1007/978-3-319-96553-6_5
Brown et al. (2014) Brown M. J. I., et al., 2014, ApJS, 212, 18
Brun & Rademakers (1997) Brun R., Rademakers F., 1997, Nucl. Instrum. Methods Phys. Res. Sec. A, 389, 81
Bruzual & Charlot (2003) Bruzual A. G., Charlot S., 2003, MNRAS, 344, 1000
Bundy et al. (2015) Bundy K., et al., 2015, ApJS, 221, 15
Cavuoti et al. (2017) Cavuoti S., et al., 2017, MNRAS, 466, 2039
Coleman et al. (1980) Coleman G. D., Wu C.-C., Weedman D. W., 1980, ApJS, 43, 393
Crocce et al. (2016) Crocce M., et al., 2016, MNRAS, 455, 4301
D’Isanto et al. (2018) D’Isanto A., Cavuoti S., Gieseke F., Polsterer K. L., 2018, A&A, 616, A97
De Jong et al. (2013) De Jong J. T. A., Verdoes Kleijn G. A., Kuijken K. H., Valentijn E. A., 2013, Exp. Astron., 35, 25
Duncan et al. (2018) Duncan K. J., et al., 2018, MNRAS, 473, 2655
Duncan et al. (2019) Duncan K., et al., 2019, ApJ, 876, 110
Eriksen et al. (2019) Eriksen M., et al., 2019, MNRAS, 484, 4200
Eriksen et al. (2020) Eriksen M., et al., 2020, MNRAS, 497, 4565
Garilli et al. (2010) Garilli B., Fumana M., Franzetti P., Paioro L., Scodeggio M., Le Fèvre O., Paltani S., Scaramella R., 2010, PASP, 122, 827
Hoecker et al. (2007) Hoecker A., et al., 2007, preprint (arXiv:physics/0703039)
Ilbert et al. (2006) Ilbert O., et al., 2006, A&A, 457, 16
Ivezić et al. (2008) Ivezić Z., et al., 2008, preprint (arXiv:0805.2366)
Johnston et al. (2020) Johnston H., et al., 2020, preprint (arXiv:2010.09696)
Joudaki et al. (2020) Joudaki S., et al., 2020, A&A, 638, L1
Jouvel et al. (2017) Jouvel S., et al., 2017, MNRAS, 469, 2771
Kinney et al. (1996) Kinney A. L., Calzetti D., Bohlin R. C., McQuade K., Storchi-Bergmann T., Schmitt H. R., 1996, ApJ, 467, 38
Koekemoer et al. (2007) Koekemoer A. M., et al., 2007, ApJS, 172, 196
Laigle et al. (2016) Laigle C., et al., 2016, ApJS, 224, 24
Laigle et al. (2018) Laigle C., et al., 2018, MNRAS, 474, 5437
Laureijs et al. (2011) Laureijs R., et al., 2011, preprint (arXiv:1110.3193)
Le Fèvre et al. (2003) Le Fèvre O., et al., 2003, Proc. SPIE, 4841, 1670
Leistedt & Hogg (2017) Leistedt B., Hogg D. W., 2017, ApJ, 838, 5
Lilly et al. (2007) Lilly S. J., et al., 2007, ApJS, 172, 70
Lilly et al. (2009) Lilly S. J., et al., 2009, ApJS, 184, 218
Martí et al. (2014) Martí P., Miquel R., Castander F. J., Gaztanaga E., Eriksen M., Sanchez C., 2014, MNRAS, 442, 92
Miyazaki et al. (2002) Miyazaki S., et al., 2002, PASJ, 54, 833
Padilla et al. (2019) Padilla C., et al., 2019, AJ, 157, 246
Polletta et al. (2007) Polletta M., et al., 2007, ApJ, 663, 81
Polsterer et al. (2016) Polsterer K. L., D’Isanto A., Gieseke F., 2016, preprint (arXiv:1608.08016)
Raihan et al. (2020) Raihan S. F., Schrabback T., Hildebrandt H., Applegate D., Mahler G., 2020, MNRAS, 497, 1404
Sadeh et al. (2016) Sadeh I., Abdalla F. B., Lahav O., 2016, PASP, 128, 104502
Salvato et al. (2019) Salvato M., Ilbert O., Hoyle B., 2019, Nat. Astron., 3, 212
Schmidt et al. (2020) Schmidt S. J., et al., 2020, MNRAS, 499, 1587
Scodeggio et al. (2005) Scodeggio M., et al., 2005, PASP, 117, 1284
Scoville et al. (2007) Scoville N., et al., 2007, ApJS, 172, 1
Siudek et al. (2018) Siudek M., et al., 2018, preprint (arXiv:1805.09905)
Soo et al. (2018) Soo J. Y. H., et al., 2018, MNRAS, 475, 3613
Spergel et al. (2013) Spergel D., et al., 2013, preprint (arXiv:1305.5422)
Tanaka et al. (2018) Tanaka M., et al., 2018, PASJ, 70, S9