
The PAU Survey: narrowband photometric redshifts using Gaussian processes

John Y. H. Soo1,2 (E-mail: johnsooyh@usm.my), Benjamin Joachimi2, Martin Eriksen3, Małgorzata Siudek3,4, Alex Alarcon5, Laura Cabayol3, Jorge Carretero3,6, Ricard Casas7,8, Francisco J. Castander7,8, Enrique Fernández3, Juan García-Bellido9, Enrique Gaztanaga7,8, Hendrik Hildebrandt10, Henk Hoekstra11, Ramon Miquel3,12, Cristobal Padilla3, Eusebio Sánchez13, Santiago Serrano7,8 and Pau Tallada-Crespí6,13.
1School of Physics, Universiti Sains Malaysia (USM), 11800 USM, Pulau Pinang, Malaysia.
2Department of Physics and Astronomy, University College London (UCL), Gower Street, London WC1E 6BT, UK.
3Institut de Física d’Altes Energies (IFAE), The Barcelona Institute of Science and Technology, 08193 Bellaterra (Barcelona), Spain.
4National Centre for Nuclear Research, 7 Pasteura str., 02-093 Warsaw, Poland.
5High Energy Physics Division, Argonne National Laboratory, Lemont, IL 60439, USA.
6Port d’Informació Científica (PIC), Universitat Autònoma de Barcelona, Carrer Albareda S/N, 08193 Bellaterra (Barcelona), Spain.
7Institute of Space Sciences (ICE/CSIC), Universitat Autònoma de Barcelona, Carrer de Can Magrans S/N, 08193 Cerdanyola del Vallès (Barcelona), Spain.
8Institut d’Estudis Espacials de Catalunya (IEEC), 08034 Barcelona, Spain.
9Instituto de Física Teórica (IFT-UAM/CSIC), Universidad Autónoma de Madrid, 28049 Madrid, Spain.
10German Centre for Cosmological Lensing, Astronomisches Institut, Ruhr-Universität Bochum (AIRUB), Universitätsstr. 150, 44801 Bochum, Germany.
11Leiden Observatory, Leiden University, Niels Bohrweg 2, 2333 CA, Leiden, The Netherlands.
12Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Spain.
13Centro de Investigaciones Energeticas, Medioambientales y Tecnologicas (CIEMAT), Avenida Complutense 40, 28040 Madrid, Spain.
(Accepted XXX. Received YYY; in original form ZZZ)
Abstract

We study the performance of the hybrid template-machine-learning photometric redshift (photo-$z$) algorithm delight, which uses Gaussian processes, on a subset of the early data release of the Physics of the Accelerating Universe Survey (PAUS). We calibrate the fluxes of the 40 PAUS narrow bands with 6 broadband fluxes ($uBVriz$) in the COSMOS field using three different methods, including a new method which utilises the correlation between the apparent size and overall flux of the galaxy. We use a rich set of empirically derived galaxy spectral templates as guides to train the Gaussian process, and we show that our results are competitive with other standard photometric redshift algorithms. delight achieves a photo-$z$ 68th percentile error of $\sigma_{68}=0.0081(1+z)$ without any quality cut for galaxies with $i_{\textrm{auto}}<22.5$, as compared to $0.0089(1+z)$ and $0.0202(1+z)$ for the bpz and annz2 codes, respectively. delight is also shown to produce more accurate probability distribution functions for individual redshift estimates than bpz and annz2. Common photo-$z$ outliers of delight and bcnz2 (previously applied to PAUS) are found to be primarily caused by outliers in the narrowband fluxes, with a small number of cases potentially indicating spectroscopic redshift failures in the reference sample. In the process, we introduce performance metrics derived from the results of bcnz2 and delight, allowing us to achieve a photo-$z$ quality of $\sigma_{68}<0.0035(1+z)$ at a magnitude of $i_{\textrm{auto}}<22.5$ while keeping 50 per cent of the objects in the galaxy sample.

keywords:
galaxies: distances and redshifts – methods: numerical – methods: statistical
pubyear: 2020

1 Introduction

Photometric redshift (photo-$z$) estimation continues to be an active research area as it plays a major role in addressing the big questions in cosmology. Redshifts add radial information (distance) to the traditional two-dimensional sky maps of galaxies. They are traditionally determined through spectroscopic methods (spectroscopic redshifts, or spec-$z$'s). However, since spectroscopy requires long telescope time to reach high completeness, photo-$z$'s are instrumental for the analysis of large surveys containing of order $10^{8-9}$ galaxies. Photo-$z$ methodology has improved considerably over the past couple of decades (e.g. Brescia et al., 2018; Salvato et al., 2019), to the point that it is sufficiently reliable for most recent cosmological research.

Photo-$z$, as its name suggests, is often determined through the use of a handful of broadband photometric filters obtained from large sky surveys. Photo-$z$ estimation methods are generally categorised into two different types: the template-based method, which relies on accurate models of spectral energy distribution (SED) templates of different types of galaxies; and the data-driven empirical method, which relies on training sets of galaxies and machine-learning algorithms. Each method, however, has its own limitations: template-based methods may produce photo-$z$'s with large scatter and high catastrophic outlier rates when the templates are not representative, while machine-learning methods may perform poorly outside the regions of parameter space covered by the training sample (D’Isanto et al., 2018). As a result, hybrid methods have been implemented to utilise the best of both worlds (Cavuoti et al., 2017; Duncan et al., 2018, 2019).

Many current and upcoming surveys such as the Dark Energy Survey (DES, Abbott et al., 2005), Legacy Survey of Space and Time (LSST, Ivezić et al., 2008), Euclid (Laureijs et al., 2011), Kilo-Degree Survey (KiDS, De Jong et al., 2013), Wide Field Infrared Survey Telescope (WFIRST, Spergel et al., 2013) and Hyper Suprime-Cam (HSC, Aihara et al., 2018) have set stringent photo-$z$ requirements to ensure that they meet their science goals, forcing the quality of photo-$z$ methodology to constantly improve. For example, LSST’s photo-$z$ requirement is to reach a root-mean-square error of $\sigma_{\textrm{RMS}}<0.02(1+z)$, while the Euclid requirement is $\sigma_{\textrm{RMS}}<0.05(1+z)$. High quality photo-$z$’s are required for a reliable estimation of e.g. weak lensing (Benjamin et al., 2013), angular clustering (Crocce et al., 2016), intrinsic alignment (Johnston et al., 2020), structure formation, galaxy classification and galaxy properties (Jouvel et al., 2017; Laigle et al., 2018; Siudek et al., 2018).

The aforementioned surveys are predominantly broadband surveys which use between 4 and 9 broadband filters ranging from the infrared to the ultraviolet. This work, however, explores the estimation of photo-$z$’s in narrowband surveys, focusing on the Physics of the Accelerating Universe Survey (PAUS, Padilla et al., 2019), which observes the sky using 40 narrow bands (see Section 2.1). Producing high quality photo-$z$’s for such a survey requires careful optimisation between narrow and broad bands, since machine-learning based methods have to be optimised for a larger number of inputs (Eriksen et al., 2020), while template-based methods require more attention towards the narrow emission line features.

Martí et al. (2014) used simulations to predict that by using PAUS narrowband photometry, the photo-$z$ quality could reach an unprecedentedly low 68th percentile error of $\sigma_{68}=0.0035(1+z)$ at a quality cut of 50 per cent at $i<22.5$. This has been verified by Eriksen et al. (2019), who combined the 40 PAUS narrow bands (early data release) with the broad bands $uBVriz$ from the Cosmic Evolution Survey (COSMOS, Laigle et al., 2016). Using their template-based photo-$z$ code bcnz2, they showed that this result is achievable when a 50 per cent photometric quality cut is imposed on the final testing set. In a more recent work, Eriksen et al. (2020) applied deepz, a deep learning algorithm, to the same data set and showed that it outperformed bcnz2, reaching a $\sigma_{68}$ that is 50 per cent lower. Furthermore, Alarcon et al. (2021) showed that an even greater precision can be achieved when using the additional photometric bands available in the COSMOS field (a total of 66 bands).

We are motivated by the work of Eriksen et al. (2019), but instead of using purely template-based methods, we attempt to achieve this PAUS photo-$z$ precision by utilising Gaussian processes (GPs, see Section 3.1) to make empirical adjustments to templates, working on the same data set and under the same conditions. We seek to produce an independent method that is competitive, as that will allow us to exploit synergies with bcnz2 by Eriksen et al. (2019) as shown in this work, and with deepz (Eriksen et al., 2020) and the photo-$z$’s of Alarcon et al. (2021) in the future. The contents of this paper therefore reflect our findings, putting special emphasis on the performance and application of delight (Leistedt & Hogg, 2017), a hybrid template-machine-learning photo-$z$ code. When carefully calibrated and combined with COSMOS broadband fluxes, delight should achieve results as good as those of bcnz2. The main aims of this paper are threefold:

  1. to optimise and test the performance of the hybrid template-machine-learning photo-$z$ code delight on a narrowband survey;

  2. to develop an optimal method to calibrate the fluxes between the COSMOS broad bands and the PAUS narrow bands;

  3. to provide an independent photo-$z$ solution for PAUS, enabling the study of photometric and spectroscopic redshift outliers.

This paper is structured as follows. In Section 2 we first introduce PAUS and the sources of photometry and spectroscopic redshifts used in this work. Section 3 describes the algorithms (delight, annz2 and bpz) used in this work, together with their optimisation settings and the SED templates used. Section 4 describes in full how the photometry and spectroscopy from PAUS, COSMOS and zCOSMOS are cross-matched, how the galaxy fluxes are selected, the three methods to calibrate the broadband and narrowband fluxes, and the performance metrics used in this work to compare the results between runs and codes. Section 5 shows the photo-$z$ results obtained by delight, and a thorough analysis is conducted to compare its performance with annz2, bpz and bcnz2. Finally, in Section 6 we study the photo-$z$ outliers of delight and bcnz2, and derive new metrics with improved photo-$z$ outlier identification. Our work is concluded in Section 7.

2 Photometry and Spectroscopy

In this work, photometric data were obtained from PAUS (Section 2.1) and COSMOS (Section 2.2), while spectroscopic redshifts were obtained from zCOSMOS (Section 2.3). In this section, these surveys will be introduced, together with the selection cuts used to obtain our training and testing sets.

2.1 PAUS

PAUS is a narrowband photometric galaxy survey aimed at mapping the large-scale structure of the Universe up to $i\sim 23.0$. Using 40 narrow bands spaced by 100 Å in the range between 4500 and 8500 Å (filter responses visualised in Eriksen et al., 2019, and Fig. 4), PAUS aims to achieve redshifts with a precision of $\sigma_{\textrm{RMS}}<0.0035(1+z)$ for galaxies with $i_{\textrm{auto}}<22.5$. PAUS uses the PAUCam instrument (Padilla et al., 2019) on the 4 m William Herschel Telescope (WHT) at Observatorio del Roque de los Muchachos (ORM) in La Palma. It has observed more than 50 deg$^2$ of sky since the beginning of 2016, and observations to full depth in all narrow bands for 100 deg$^2$ are planned.

The PAUS forced-aperture coadded photometry has its aperture defined by using the 50 per cent light radius ($r_{50}$), the point spread function (PSF), ellipticity and Sérsic index of COSMOS morphology, such that the fluxes measure a fixed fraction of light. The reader is referred to Eriksen et al. (2019) for detailed information on how the PAUS fluxes are measured. In this work we used the early data release from PAUS (objects are observed at least five times, using an elliptical aperture with 62.5 per cent light radius), and selected objects with $i_{\textrm{auto}}\leq 22.5$, entries with no missing measurement, and the COSMOS flag TYPE=0 (extended objects).

2.2 COSMOS

The Cosmic Evolution Survey (COSMOS, Scoville et al., 2007) covers a sky area of 2 deg$^2$ ($149.47^{\circ}\leq\alpha\leq 150.7^{\circ}$, $1.62^{\circ}\leq\delta\leq 2.83^{\circ}$) and is known for its high sensitivity, depth and an exceptionally low and uniform Galactic extinction ($E_{B\text{-}V}\sim 0.02$).

In this work we used photometry from the COSMOS2015 Catalogue (Laigle et al., 2016); it is a highly complete mass-selected sample to very high redshifts, highly optimised for the study of galaxy evolution and environments in the early Universe. The COSMOS2015 Catalogue provides 30-band photometry ranging from near-ultraviolet to near-infrared wavelengths, observed through multiple facilities, two of which are the Canada-France-Hawaii Telescope (CFHT) and the Subaru Telescope (Miyazaki et al., 2002). From this catalogue we only use the CFHT $u^{*}$-band (Boulade et al., 2003) and the Subaru $B$, $V$, $r$, $i^{+}$ and $z^{++}$ bands (Miyazaki et al., 2002), in conjunction with the narrowband photometry of PAUS. For simplicity, these bands will be referred to collectively as the $uBVriz$ bands; the superscripts are dropped for easier reading.

2.3 zCOSMOS

The zCOSMOS Survey (Lilly et al., 2007) targets galaxies in the COSMOS field using the Visible Multi-Object Spectrograph (VIMOS, Le Fèvre et al., 2003). zCOSMOS-Bright observed 20 689 galaxies in a sky area of 1.7 deg$^2$. These galaxies have magnitudes $15<i_{\textrm{auto}}<22.5$ and redshifts in the range $0.1<z<1.2$. Its spectral range is in the red (rest-frame wavelength 5550 Å to 9650 Å) in order to follow strong spectral features around the 4000 Å break to as high a redshift as possible.

In this work we use data from zCOSMOS-Bright DR3 (http://www.eso.org/qi/catalog/show/65). Galaxies with redshift confidence class 3 and 4 (spectroscopic verification rate of 99 and 99.8 per cent, respectively) are selected and cross-matched with PAUS objects.

2.4 Our dataset

Using the aforementioned selection cuts, we cross-matched within $1^{\prime\prime}$ the 40-narrowband photometry from PAUS, the six-broadband photometry ($uBVriz$) from COSMOS, and highly reliable redshifts from zCOSMOS to obtain a data sample of 8406 galaxies, which is divided randomly into two halves for training and testing, respectively. This sample uses a total of 46 bands, and flux calibration between the broad and narrow bands is required as they are obtained from different surveys with different flux measurements. The calibration between these fluxes will be discussed in Section 4.
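As an illustration of the two operations described above, the following is a minimal sketch of a positional cross-match within $1^{\prime\prime}$ followed by a random split into two halves. It is not the pipeline used in this work: the function name `crossmatch`, the flat-sky approximation, the brute-force distance matrix and the random seed are all assumptions made for the example.

```python
import numpy as np

def crossmatch(ra1, dec1, ra2, dec2, radius_arcsec=1.0):
    """Hypothetical helper: for each object in catalogue 1, return the index
    of its nearest neighbour in catalogue 2, or -1 if none lies within the
    matching radius. Uses a flat-sky approximation, adequate for arcsecond
    separations away from the poles. Coordinates are in degrees."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    mean_dec = 0.5 * (dec1[:, None] + dec2[None, :])
    d_ra = (ra1[:, None] - ra2[None, :]) * np.cos(mean_dec)
    d_dec = dec1[:, None] - dec2[None, :]
    sep_arcsec = np.degrees(np.hypot(d_ra, d_dec)) * 3600.0
    nearest = np.argmin(sep_arcsec, axis=1)
    matched = sep_arcsec[np.arange(len(ra1)), nearest] <= radius_arcsec
    return np.where(matched, nearest, -1)

# Toy coordinates inside the COSMOS footprint (made up for illustration).
ra_a = [150.0, 150.1, 149.6]
dec_a = [2.0, 2.1, 1.7]
ra_b = [150.1 + 0.5 / 3600.0, 150.0, 149.9]
dec_b = [2.1, 2.0 + 0.5 / 3600.0, 2.5]
match_index = crossmatch(ra_a, dec_a, ra_b, dec_b)

# Random division of the 8406 matched galaxies into two equal halves.
rng = np.random.default_rng(0)  # seed chosen arbitrarily
shuffled = rng.permutation(8406)
train_idx, test_idx = shuffled[:4203], shuffled[4203:]
```

The brute-force distance matrix scales with the product of the catalogue sizes, so a tree-based neighbour search would be preferable for full survey catalogues.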

Figure 1: Colour-magnitude diagram for the PAUS data (red) used in this work in comparison with the COSMOS2015 sample (all objects with TYPE=0 and detected in $r$ and $i$). The contours represent the density of objects.

The colour-magnitude diagram of this sample is shown in Fig. 1, in comparison with the COSMOS2015 sample (all objects with TYPE=0 and detected in $r$ and $i$). The slight incompleteness in $i$ magnitude is due to the selection effects in brightness of the spectroscopic redshifts available.

The sample size may seem small, but it is sufficient for the GP to work, since the GP essentially creates more than 4000 flux-redshift ‘templates’ to produce photo-$z$’s for objects in the testing set. However, we note that such a small training size has a major effect on the results of annz2, as this training size is close to the lower limit suggested by Bonfield et al. (2010). We also note that the sample we have chosen is very similar to that of Eriksen et al. (2019), the only difference being that they have a more relaxed cut in the number of bands (N_BANDS), being 35<N_BANDS<40 (workable for a template code like bcnz2), while we used N_BANDS=40 (the relaxed cut resulted in Eriksen et al. (2019) having a larger sample size of 10 801 objects). When comparing results between delight and bcnz2, we will only compare photo-$z$’s of the exact same objects. Note that we have used the same broad bands as used by Eriksen et al. (2019).

3 Algorithms and Templates

3.1 Delight and Gaussian processes

delight (https://github.com/ixkael/Delight; Leistedt & Hogg, 2017) is a hybrid template-based and machine learning photo-$z$ algorithm, which was constructed to combine the advantages, and minimise the disadvantages, of both types of algorithms. delight constructs a large collection of latent SED templates (or physical flux-redshift models) from training data, with a template SED library as a guide to the learning of the model. This conceptually novel approach uses Gaussian processes (GPs) operating in flux-redshift space. delight was featured in the results of the LSST Photo-$z$ Data Challenge 1 (Schmidt et al., 2020), where it was found to have a low photo-$z$ bias but slightly broader PDFs.

Figure 2: Illustration of a Gaussian process (GP). The left panel shows data points (black dots), with a single datum to be predicted (green dot). The GP trains on the given data points to provide a best fit function (blue line) as shown on the right. It also provides a Gaussian confidence interval (blue shaded area) for the prediction.

A GP is a supervised learning method, which finds a distribution over the possible functions $f(x)$ that are consistent with the observed data $x$. Consider Fig. 2: suppose we have a set of observed variables $y=f(x)$; we can fit it using a GP, denoted as $f\sim\mathcal{GP}\left(\mu,k\right)$, which assumes that the values of $f(x)$ are jointly Gaussian and representable by a mean function $\mu(x)$ and a covariance matrix $\Sigma(x)=k(x_{i},x_{j})$. Here $k(x_{i},x_{j})$ is the kernel function, which relates one variable $x_{i}$ to another $x_{j}$. An example case would be $\mu\equiv 0$ and a kernel function that takes the form of a squared exponential,

$$k(x_{i},x_{j})=\sigma^{2}_{f}\exp\left[\frac{-(x_{i}-x_{j})^{2}}{2l^{2}}\right], \qquad (1)$$

where $\sigma_{f}^{2}$ is the maximum allowable covariance between data (set by the errors on the observation), and $l$ is the tunable correlation length that determines the smoothness of the GP. In this simplistic case, the GP marginalises over all possible functions, but $\mu$ and $k$ can be modified if an underlying model of the data we want to fit is known. The covariance function is defined such that a smooth function is predicted.

Assuming that we have a set of training data $\{x_{i},f(x_{i})\}$ and would like to find the prediction $\{x_{*},f_{*}(x_{*})\}$, the GP models $f$ and $f_{*}$ as jointly Gaussian, $\mathcal{N}(\mu,\Sigma)$, and therefore

$$\left(\begin{array}{c}f(x)\\ f_{*}(x_{*})\end{array}\right)\sim\mathcal{N}\left(\left(\begin{array}{c}\mu\\ \mu_{*}\end{array}\right),\left(\begin{array}{cc}\Sigma&\Sigma_{*}\\ \Sigma_{*}^{T}&\Sigma_{**}\end{array}\right)\right), \qquad (2)$$

where $\Sigma=k(x_{i},x_{j})$ is the covariance between the training data, $\Sigma_{*}=k(x_{*},x_{i})$ is the covariance between the training and the predicted data (superscript $T$ denotes the transpose of the matrix), while $\Sigma_{**}=k(x_{*},x_{*})$ is the variance of the predicted data.

It follows from the above that the posterior $p(f_{*}|x_{*},x_{i},f_{i})$ is also Gaussian. A predicted point $f_{*}(x_{*})$ (green dot in Fig. 2) is therefore described by a Gaussian distribution whose mean traces the smooth blue line running through all points, with its 95% confidence interval ($\pm 1.96\sigma_{f_{*}}$) represented by the blue shaded area.
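The generic GP prediction described above can be sketched in a few lines. The example below assumes a zero mean function and the squared-exponential kernel of equation (1), and uses the standard Gaussian conditioning result, with mean $\Sigma_{*}^{T}\Sigma^{-1}f$ and covariance $\Sigma_{**}-\Sigma_{*}^{T}\Sigma^{-1}\Sigma_{*}$. It is not the delight implementation; the function names, the toy data and the small `noise` term are assumptions made for illustration.

```python
import numpy as np

def sq_exp_kernel(xa, xb, sigma_f=1.0, length=1.0):
    """Squared-exponential kernel of equation (1)."""
    diff = xa[:, None] - xb[None, :]
    return sigma_f**2 * np.exp(-diff**2 / (2.0 * length**2))

def gp_predict(x_train, f_train, x_star, sigma_f=1.0, length=1.0, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_star."""
    K = sq_exp_kernel(x_train, x_train, sigma_f, length)   # Sigma
    K = K + noise**2 * np.eye(len(x_train))                # assumed small noise term
    K_s = sq_exp_kernel(x_train, x_star, sigma_f, length)  # Sigma_*
    K_ss = sq_exp_kernel(x_star, x_star, sigma_f, length)  # Sigma_**
    mean = K_s.T @ np.linalg.solve(K, f_train)
    cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))        # guard against round-off
    return mean, std

# Toy data standing in for the black dots of Fig. 2.
x_train = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
f_train = np.sin(x_train)
x_star = np.array([0.5])

mean, std = gp_predict(x_train, f_train, x_star)
lower, upper = mean - 1.96 * std, mean + 1.96 * std        # 95% interval
```

Far from the training points the kernel vanishes, so the prediction falls back to the prior: a mean of zero and a standard deviation of $\sigma_{f}$.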

In the context of delight, GPs are used to calculate the predicted fluxes $\hat{F}$ at a certain redshift $z$ for a training object $i$ with fluxes $F_{i}$ and redshift $z_{i}$. This could be better understood by first defining the posterior photo-$z$ distribution $p(z|\hat{F})$ of an object in the testing set. For machine learning methods, it has the form

$$p(z|\hat{F})\approx\sum_{i}p(\hat{F}|z,z_{i},F_{i})\,p(z|z_{i},F_{i})\,p(z_{i},F_{i}), \qquad (3)$$

where $p(\hat{F}|z,z_{i},F_{i})$ is the prediction for fluxes of the training galaxy at a different redshift $z$, while $p(z|z_{i},F_{i})$ and $p(z_{i},F_{i})$ are the priors that provide the redshift distributions and abundances, generated from the training data, which are multiplied to give the combined probability $p(z,z_{i},F_{i})$ for a given redshift $z$ and training object with redshift $z_{i}$ and fluxes $F_{i}$. This is analogous to the one derived from template-based methods,

$$p(z|\hat{F})\approx\sum_{i}p(\hat{F}|z,t_{i})\,p(z|t_{i})\,p(t_{i}), \qquad (4)$$

where $t_{i}$ is the template, $p(z|t_{i})p(t_{i})=p(z,t_{i})$ is the prior and $p(\hat{F}|z,t_{i})$ is the probability of the predicted flux $\hat{F}$ at redshift $z$ and for template $t_{i}$. The two equations are easily distinguished by the fact that for template-based methods, $p(z|\hat{F})$ is derived using a list of templates $t_{i}$, while for machine learning methods it is derived using the individual training set objects with fluxes $F_{i}$ and spectroscopic redshift $z_{i}$.
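To make the structure of equation (4) concrete, the following toy sketch evaluates the sum over templates on a redshift grid. Everything in it is an assumption made for illustration: the two ‘templates’ are simple Gaussian bumps rather than real SEDs, the likelihood $p(\hat{F}|z,t_{i})$ is a Gaussian in flux with a fixed error, the priors are flat, and no template amplitude is fitted. It is not the likelihood used by delight or bpz.

```python
import numpy as np

band_centres = np.linspace(4500.0, 8500.0, 9)   # toy band centres in Angstrom
z_grid = np.linspace(0.0, 1.0, 101)
rest_peaks = np.array([4000.0, 5000.0])         # one toy template per entry
rest_width = 600.0

def model_fluxes(z, rest_peak):
    """Toy template: a Gaussian bump in the rest frame, observed at redshift z."""
    rest_wave = band_centres / (1.0 + z)
    return np.exp(-0.5 * ((rest_wave - rest_peak) / rest_width) ** 2)

def photoz_posterior(flux, flux_err, p_template=None):
    """Equation (4): sum over templates of likelihood times priors."""
    n_t = len(rest_peaks)
    if p_template is None:
        p_template = np.full(n_t, 1.0 / n_t)    # flat p(t_i), assumed
    p_z_given_t = np.ones(len(z_grid))          # flat p(z | t_i), assumed
    post = np.zeros(len(z_grid))
    for peak, p_t in zip(rest_peaks, p_template):
        model = np.array([model_fluxes(z, peak) for z in z_grid])
        chi2 = np.sum(((flux - model) / flux_err) ** 2, axis=1)
        post += np.exp(-0.5 * chi2) * p_z_given_t * p_t
    dz = z_grid[1] - z_grid[0]
    return post / (post.sum() * dz)             # normalise on the grid

# Noise-free mock observation drawn from the second template.
z_true = z_grid[50]
flux = model_fluxes(z_true, rest_peaks[1])
flux_err = np.full_like(flux, 0.05)

pz = photoz_posterior(flux, flux_err)
z_map = z_grid[np.argmax(pz)]
```

Replacing the loop over templates by a loop over training objects, each with its own GP flux-redshift model, gives the structure of equation (3).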

delight differs a little from the usual machine learning method in the sense that instead of finding a direct empirical relationship between the fluxes and redshifts of the training objects, it uses a GP to model the predicted fluxes of a training galaxy at different redshifts with the help of SED templates. This creates a latent flux-redshift template for each training object, so that a given set of fluxes in the testing set can be compared to many training templates to find the best predicted redshift.

The algorithm first fits a best-fit SED template to a particular training object $i$ with redshift $z_{i}$ and fluxes $F_{i}$ (multiple bands); the best-fit SED template is then used to formulate the mean function and kernel of a GP to build a flux-redshift template which could predict the expected fluxes of certain band filters when this object is redshifted to a different $z$. With each training object now becoming a flux-redshift template, the final photo-$z$ posterior distribution of a testing set object is determined by making a pairwise comparison of every training-testing pair, and a weighted solution is obtained based on the best fits of each pair.

In other words, we are computing the probability that the target galaxy has the same SED as the training galaxy but at a different redshift. delight is thus a hybrid template-machine-learning photo-$z$ algorithm in the sense that SED templates are used to ‘guide’ the creation of flux-redshift templates based on the training objects, or, if seen from another perspective, the GP ‘corrects’ the SED templates by using training data. We refer the reader to Leistedt & Hogg (2017) for more on Gaussian processes, and also for the full expressions of $\mu$ and $k$ in relation to the filter responses, flux normalisations, linear mixtures of physical SED templates, and the manually configurable SED residual function of emission lines.

delight is advantageous over many other photo-$z$ algorithms as its output is less dependent on representative training data, and it does not strictly require the training set to use the same photometric bands. However, it still requires accurate spectroscopic redshifts, high quality training fluxes and representative templates to produce high quality photo-$z$ probability density functions (PDFs), or $p(z)$. As such, given a few photometric bands, delight is able to predict missing bands or fluxes in an entirely different set of photometric bands, and this function is utilised in Section 4.1 to predict and calibrate the flux values between two surveys.

3.2 Delight optimisation

The optimisation settings of delight used in this work are as follows. For the GP setup, the number of Gaussians to fit the filter curves (numGpCoeff) was set to 7 instead of the default 20, appropriately selected to accommodate the smaller full width half maximum (FWHM) of the narrowband filters. Other than that, we have mainly used the default hyperparameter settings for delight with the exception of the widths of the luminosity and redshift priors $\sigma_{\ell}$ and $\sigma_{z}$ (ellPriorSigma and zPriorSigma, see Leistedt & Hogg, 2017), which have been lowered to 0.2 and 0.1 respectively as they produced better results.

As mentioned earlier, the mean function and the kernel of the GP are modelled after the choice of emission lines and SED template sets. We replaced the 3 default emission lines in delight with the list provided by Eriksen et al. (2019), although we note that the resulting change in the results is insignificant. As for the templates, we used the Brown et al. (2014) high-quality templates, which consist of 129 SEDs derived from real nearby galaxies. These templates have wavelengths covering the ultraviolet to mid-infrared, and encompass a broad range of galaxy types including ellipticals, spirals, merging galaxies, blue compact dwarfs and luminous infrared galaxies. In this work we have also tested the performance of various other template sets (Coleman et al., 1980; Kinney et al., 1996; Bruzual & Charlot, 2003; Ilbert et al., 2006; Polletta et al., 2007); however, they do not perform as well as those of Brown et al. (2014): the root-mean-square photo-$z$ errors are between 21 and 112 per cent higher when these templates are used. Therefore, the results from these tests are not shown in this work.

We note that delight requires all magnitudes $m_{i}$ and magnitude errors to be converted into fluxes $F_{i}$ and flux variances, with a zero-point adjustment of 26.4 in magnitude (i.e. $F_{i}=10^{-0.4(m_{i}-26.4)}$). We have also added a 3 and 6 per cent flux error in quadrature to the flux variances for the narrow and broad bands, respectively, to account for other flux errors from both the data and the model (values estimated via trial and error). It is also worth mentioning that while delight is capable of processing negative fluxes (non-detections), the reference band (referenceBand) used for flux normalisation only handles fluxes with positive values. In this work we have selected the narrow band $nb625$ as the reference band, or the COSMOS $r$-band in cases where narrow bands were not used.
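The conversion above can be sketched as follows. The flux formula and the 3 and 6 per cent terms come from the text; the first-order propagation of the magnitude error into a flux error, and the function name `mag_to_flux`, are assumptions made for this example and may differ from the preprocessing actually used.

```python
import numpy as np

ZERO_POINT = 26.4  # magnitude zero point quoted in the text

def mag_to_flux(mag, mag_err, extra_frac_err):
    """Hypothetical helper: convert magnitudes to fluxes and flux variances.

    extra_frac_err is the fractional flux error added in quadrature
    (0.03 for narrow bands, 0.06 for broad bands in the text)."""
    mag = np.asarray(mag, dtype=float)
    mag_err = np.asarray(mag_err, dtype=float)
    flux = 10.0 ** (-0.4 * (mag - ZERO_POINT))
    # Assumed first-order error propagation: dF = 0.4 ln(10) F dm.
    flux_err = 0.4 * np.log(10.0) * flux * mag_err
    flux_var = flux_err**2 + (extra_frac_err * flux) ** 2
    return flux, flux_var

# Toy magnitudes for one narrow band and one broad band.
nb_flux, nb_var = mag_to_flux([22.1], [0.08], extra_frac_err=0.03)
bb_flux, bb_var = mag_to_flux([21.9], [0.02], extra_frac_err=0.06)
```

Because the extra term scales with the flux, it sets a floor on the relative uncertainty of bright objects, whose measured magnitude errors would otherwise be very small.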

Throughout this work, we use $z_{\textrm{map}}$ (the maximum a posteriori of the PDF) to represent the best point estimate photo-$z$ produced by delight. The output photo-$z$ PDF bins were set to be linear instead of logarithmic, with a step size of 0.001 and a range of $0.02<z<1.65$, keeping close to the limits of the spectroscopic redshifts.
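A short sketch of how a point estimate such as $z_{\textrm{map}}$ can be read off a gridded PDF is given below. The grid follows the range and step size quoted above; the bimodal toy PDF and the function name `point_estimates` are assumptions made for illustration, and the PDF mean is included only to show how the two summaries can differ.

```python
import numpy as np

# Linear redshift grid: 0.02 to 1.65 in steps of 0.001 (1631 points).
z_grid = np.linspace(0.02, 1.65, 1631)

def point_estimates(pdf):
    """Return the maximum a posteriori redshift and the mean of a gridded PDF."""
    pdf = pdf / pdf.sum()                  # normalise to unit sum on the grid
    z_map = z_grid[np.argmax(pdf)]
    z_mean = np.sum(z_grid * pdf)
    return z_map, z_mean

def gaussian(z, mu, sigma):
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2)

# Toy bimodal PDF: a dominant peak at z = 0.5 and a secondary one at z = 1.2.
pdf = 0.7 * gaussian(z_grid, 0.5, 0.01) + 0.3 * gaussian(z_grid, 1.2, 0.01)
z_map, z_mean = point_estimates(pdf)
```

For a multimodal PDF like this one, the maximum a posteriori value stays on the dominant peak, whereas the mean is pulled towards the secondary peak and can land where the PDF itself is close to zero.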

3.3 Other algorithms

We are also interested in how delight compares to other common template-based or machine-learning-based methods besides bcnz2 and deepz. Therefore two other photo-$z$ algorithms, annz2 and bpz, are also used in this work, using the same training and template sets, to be compared with the performance of delight. In the following paragraphs we briefly introduce the two algorithms and their optimisation settings.

annz2 (https://github.com/IftachSadeh/ANNZ; Sadeh et al., 2016) is a machine-learning-based photo-$z$ algorithm which has been widely used in recent works (Bonnett et al., 2016; Jouvel et al., 2017; Bilicki et al., 2018; Soo et al., 2018; Schmidt et al., 2020) due to its high customisability and its ability to produce PDFs. It uses the Toolkit for Multivariate Data Analysis (TMVA, Hoecker et al., 2007) with root (Brun & Rademakers, 1997), which allows it to run multiple different machine learning algorithms for training, and outputs photo-$z$’s based on a weighted average of their performance. In this work we ran annz2 with a mixture of 3 machine learning methods, namely artificial neural networks (ANNs), boosted decision trees (BDTs) and $k$-nearest neighbours (KNNs); see Hoecker et al. (2007) for detailed descriptions of these machine learning algorithms. An architecture of $N:\frac{2N+1}{3}:\frac{N+2}{3}:1$ was used for the ANN; the bagging method was used to boost the decision trees; a polynomial kernel was used for the KNN; while the other hyperparameters for each method were individually optimised for best performance. annz2 version 2.3.1 was used in this work, and the mean value of the PDF, $z_{\textrm{pdf}}$, was chosen to represent the photo-$z$ point estimate.

bpz (http://www.stsci.edu/~dcoe/BPZ/; Benítez, 2000), on the other hand, is one of the longest-standing template-based photo-$z$ algorithms, and is still widely used today (Martí et al., 2014; Bundy et al., 2015; Cavuoti et al., 2017; Tanaka et al., 2018; Joudaki et al., 2020; Raihan et al., 2020). Other than sharing the usual attributes of a template-based code, bpz uses Bayesian inference, prior information of redshift distributions and template interpolation to improve photo-$z$ results. bpz version 1.99.3 was used in this work, and similar to delight the Brown templates were used, with the interpolation parameter set to 2. We assumed the same functional form for the Bayesian priors as those used by COSMOS (Laigle et al., 2016). The peak of the PDF, $z_{\textrm{b}}$, was used as the best photo-$z$ point estimate.

Other than annz2 and bpz, the results of delight are also compared to the results of bcnz2, which was developed specifically for the PAUS data (Eriksen et al., 2019). bcnz2 is able to compute a linear combination of SED templates and is designed to deal with emission lines and extinction, and to adjust zero-points between narrow and broad bands, all of which are crucial in the context of PAUS. The introduction of the code bcnz2 and its early demonstration of PAUS photo-$z$ can be found in Eriksen et al. (2019).

4 Flux Calibration

This work utilises fluxes obtained from two different surveys: the PAUS narrowband fluxes are measured using an aperture which covers 62.5 per cent of light from the galaxy, while COSMOS broadband fluxes are measured using a fixed $3^{\prime\prime}$ aperture. Therefore, calibration is required to ensure that the flux values are consistent with one another. We only calibrate the broadband fluxes, leaving the narrowband fluxes untouched, following Eriksen et al. (2019). The calibration process is done in two steps: first we derive empirical corrections to account for differences in the aperture photometry (calibration for each galaxy), and then we place all bands at the same flux zero point (calibration for each band). For the correction for differences in flux aperture, we note that ideally this could be done easily if spec-$z$’s were available; however, since the evaluation set would not have spec-$z$’s available, we present 3 alternatives in the following sections to calibrate the fluxes photometrically.

4.1 Correction for differences in flux aperture

In the first step we define a parameter $R_{\textrm{g}}$, a correction factor estimated for each galaxy to be multiplied with all of its six $uBVriz$ broadband fluxes. Ideally, this factor is estimated by first finding the best-fit Brown template for each galaxy using only the 40 narrowband fluxes from PAUS and its true redshift. The best-fit template is then used to generate the predicted $uBVriz$ fluxes, and a weighted mean of the ratios $R_{\textrm{g},b}$ between the predicted flux and the original COSMOS flux in each band $b$ is calculated, given by

$$R_{\textrm{g}}=\frac{\sum_{b}R_{\textrm{g},b}/\sigma_{R_{\textrm{g}},b}^{2}}{\sum_{b}1/\sigma_{R_{\textrm{g}},b}^{2}}, \tag{5}$$

where the sum is over the six COSMOS broad bands, and $\sigma_{R_{\textrm{g}},b}^{2}$ is the variance of $R_{\textrm{g},b}$. Here we have assumed that the Brown templates are sufficiently representative, so that the predicted flux derived from them is the true flux of the broad bands. We have also assumed that $R_{\textrm{g},b}$ should be almost the same across the bands for each galaxy. This calibration is motivated by the fact that each galaxy requires a calibration between fixed-size and adaptive aperture photometry that depends on its apparent size.
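As an illustration, Eq. 5 could be evaluated as in the following minimal Python sketch. The function name `aperture_correction`, the array layout and the toy numbers are assumptions made for this example; they are not taken from the delight pipeline, and the text does not specify how the variance of each ratio is obtained, so it is simply passed in.

```python
import numpy as np

def aperture_correction(pred_flux, cosmos_flux, ratio_var):
    """Inverse-variance weighted mean of the per-band flux ratios (Eq. 5).

    All inputs are arrays of shape (n_bands,) for a single galaxy.
    ratio_var holds the variance of each ratio R_{g,b} (assumed to be given).
    """
    pred_flux = np.asarray(pred_flux, dtype=float)
    cosmos_flux = np.asarray(cosmos_flux, dtype=float)
    weights = 1.0 / np.asarray(ratio_var, dtype=float)
    ratios = pred_flux / cosmos_flux            # R_{g,b} for each band b
    return np.sum(weights * ratios) / np.sum(weights)

# Toy numbers (not survey data): six uBVriz bands for one galaxy.
pred = [6.0, 6.5, 6.2, 6.4, 6.3, 6.1]
cosmos = [10.0] * 6
var = [0.04, 0.01, 0.01, 0.01, 0.01, 0.04]
r_g = aperture_correction(pred, cosmos, var)
calibrated = r_g * np.asarray(cosmos)           # same factor applied to all six bands
```

With equal variances the expression reduces to a plain mean of the six ratios; noisier bands contribute less as their variance grows.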

We now explore three different methods to determine $R_{\textrm{g}}$ from the photometric data only.

4.1.1 The photo-z calibration method

The first method, which we call the photo-$z$ calibration method, is very similar to the method above, except that we replace the spectroscopic redshifts used to determine the predicted $uBVriz$ fluxes for the testing set with photometric redshifts. We first use delight with only the 40 narrow bands to produce photo-$z$’s for each object, then use these photo-$z$’s to estimate the predicted fluxes and subsequently $R_{\textrm{g}}$ for each galaxy. This implies that the better the quality of the photo-$z$’s produced by the 40 narrow bands alone, the better the calibrated broadband fluxes will be.

4.1.2 The size calibration method

The second method, hereafter the size calibration method, does not require the production of predicted fluxes for the testing set. Instead, this method uses the correlation between the sizes of galaxies and their values of $R_{\textrm{g}}$ in the training set to predict the values of $R_{\textrm{g}}$ for objects in the testing set. With the predicted fluxes of the training set known, we plot $R_{\textrm{g}}$ against the 50 per cent light radius $r_{50}$ (measured in pixels) for each object, and obtain a best-fit linear least-squares regression line,

$$R_{\textrm{g}}=m\cdot r_{50}+c, \tag{6}$$

where the slope and $y$-intercept are found to be $m=0.0101$ and $c=0.4504$ respectively, with a correlation coefficient of $r=0.8349$, implying a strong positive correlation between $R_{\textrm{g}}$ and $r_{50}$.

With this relationship derived, the values of $R_{\textrm{g}}$ for each object in the testing set can be estimated. This method is motivated by the fact that the size of a galaxy is a defining factor for the difference in its flux values when measured using a fixed aperture or a fixed light radius. Fig. 3 shows a scatter plot of $r_{50}$ vs. $R_{\textrm{g}}$ for the training set, from which the correlation equation is determined. The distribution of $R_{\textrm{g}}$ is also shown in the figure; it has a median value of 0.6349, implying that on average COSMOS measures more flux for each galaxy than PAUS. For galaxies with undefined values of $r_{50}$, we substitute the mean value of $r_{50}=22.4934$ pixels.
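The size calibration method can be sketched in a few lines of Python. The function names are hypothetical, and representing an undefined $r_{50}$ as NaN is an assumption of this example; the coefficients and the fill value are the ones quoted in the text.

```python
import numpy as np

def fit_size_relation(r50_train, rg_train):
    """Least-squares line R_g = m * r50 + c fitted on the training set (Eq. 6)."""
    m, c = np.polyfit(r50_train, rg_train, deg=1)
    return m, c

def predict_rg(r50, m, c, r50_fill):
    """Apply Eq. 6, replacing undefined (NaN) sizes with r50_fill."""
    r50 = np.asarray(r50, dtype=float)
    r50 = np.where(np.isnan(r50), r50_fill, r50)
    return m * r50 + c

# Slope, intercept and mean r50 quoted in the text.
M, C, R50_MEAN = 0.0101, 0.4504, 22.4934
rg_test = predict_rg([10.0, np.nan, 40.0], M, C, R50_MEAN)
```

In practice `fit_size_relation` would be run once on the training set, and `predict_rg` would then be applied to every object in the testing set.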

Figure 3: Top: Correlation between $r_{50}$ and $R_{\textrm{g}}$ for the training set, where $R_{\textrm{g}}$ is a calibration correction factor estimated for each galaxy to be multiplied with all of its six $uBVriz$ broadband fluxes. Bottom: The distribution of $R_{\textrm{g}}$ of the training set, estimated using the size calibration method. $N$ is the number of galaxies.

4.1.3 The flux calibration method

The third and final method is the flux calibration method, which is similar to the method used by Eriksen et al. (2019), but simpler in that the Gaussian process has a larger capacity to accommodate uncertainties. This method makes use of the overlaps in wavelength between the COSMOS broad bands and the PAUS narrow bands: the $V$-band overlaps with the narrow bands $nb505$ to $nb585$ (9 bands); the $r$-band overlaps with $nb565$ to $nb685$ (13 bands); and the $i$-band overlaps with $nb705$ to $nb835$ (14 bands). This overlap is illustrated in Fig. 4.

Figure 4: The overlapping wavelengths between 34 PAUS narrowband filters and 3 COSMOS broadband filters: $V$ overlaps with $nb505$-$nb585$ (9 bands); $r$ overlaps with $nb565$-$nb685$ (13 bands); and $i$ overlaps with $nb705$-$nb835$ (14 bands). Note that the filter responses from PAUS and COSMOS are normalised at different values.

Similar to the previous method, no redshift information is required for flux prediction. $R_{\textrm{g}}$ in this case is estimated by first averaging the narrowband fluxes within the range of the broad band of interest ($V$, $r$ or $i$), and then taking the ratio between the broadband flux and the averaged narrowband flux. This gives three values of $R_{\textrm{g},b}$ for the three $Vri$ bands, and finally $R_{\textrm{g}}$ for each galaxy is taken as the weighted average of the three values.

This method is simple yet effective: it does not involve the spectroscopic redshift, the photo-$z$ derived from the 40 narrow bands, or even the size of the galaxy. Here we assume that the $R_{\textrm{g}}$ estimated using $Vri$ is applicable to the $uBz$ bands as well. We will compare the overall photo-$z$ quality produced by the three methods above in Section 5.2.
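The flux calibration method lends itself to a short sketch. Several choices below are assumptions of this example rather than statements about the actual pipeline: the 40 narrow bands are taken to be centred at 455 to 845 nm in 10 nm steps, the narrowband average is a plain unweighted mean, and the ratio is oriented as averaged narrowband flux over broadband flux so that the resulting $R_{\textrm{g}}$ multiplies the broadband fluxes, consistent with its definition in Section 4.1. The function and variable names are hypothetical.

```python
import numpy as np

# Assumed narrowband centres: nb455 ... nb845 in 10 nm steps (40 bands).
NB_CENTRES = np.arange(455, 846, 10)
# Overlap ranges quoted in the text.
OVERLAP = {"V": (505, 585), "r": (565, 685), "i": (705, 835)}

def flux_calibration_rg(nb_flux, bb_flux, ratio_var=None):
    """Estimate R_g from the narrow/broad band overlap, without any redshift.

    nb_flux:   40 narrowband fluxes ordered as NB_CENTRES.
    bb_flux:   dict with the V, r and i broadband fluxes.
    ratio_var: optional dict with the variance of each R_{g,b};
               equal weights are used if it is omitted.
    """
    nb_flux = np.asarray(nb_flux, dtype=float)
    ratios, weights = [], []
    for band, (lo, hi) in OVERLAP.items():
        inside = (NB_CENTRES >= lo) & (NB_CENTRES <= hi)
        ratios.append(nb_flux[inside].mean() / bb_flux[band])   # R_{g,b}
        weights.append(1.0 if ratio_var is None else 1.0 / ratio_var[band])
    return np.average(ratios, weights=weights)

# Toy galaxy with flat narrowband fluxes (not survey data).
r_g = flux_calibration_rg(np.full(40, 6.0), {"V": 10.0, "r": 10.0, "i": 10.0})
```

The masks select 9, 13 and 14 narrow bands for $V$, $r$ and $i$ respectively, matching the counts given in the text.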

4.2 Correction to flux zero points

After calibrating the COSMOS broadband fluxes for each galaxy, we proceed to calibrate the broadband magnitude offsets within each band. We perform a weighted least-squares fit between the predicted broadband fluxes (produced by delight using the 40 PAUS narrowband fluxes, the respective best-fit Brown templates and zCOSMOS spec-$z$’s) and the original COSMOS $uBVriz$ fluxes in the training set, using a simple linear equation,

$$\ln(F_{\textrm{p},b})=a_{b}\cdot\ln(F_{\textrm{g},b})+c_{b}, \tag{7}$$

where $F_{\textrm{p},b}$ is the predicted flux for band $b$, $F_{\textrm{g},b}$ the COSMOS broadband flux after the per-galaxy calibration, and $a_{b}$ and $c_{b}$ are constants to be optimised. The values of $a_{b}$ and $c_{b}$ estimated for each band using the training set, tabulated in Table 1, are then used to calibrate the fluxes in the testing set. The fit is weighted by the inverse variances of the fluxes, since we expect brighter objects to have relatively lower variances; accounting for the variances in this way upweights the fainter objects.

Table 1: List of the best-fit parameters $a_{b}$ and $c_{b}$ for each band $b$ when the predicted and original COSMOS fluxes from the training set were fitted with a weighted least-squares fit, using Equation 7.
Band   $a_{b}$            $c_{b}$
$u$    1.0007 ± 0.0001     0.0354 ± 0.0008
$B$    0.9906 ± 0.0002     0.2163 ± 0.0009
$V$    0.9988 ± 0.0002    -0.0830 ± 0.0009
$r$    1.0006 ± 0.0002     0.0015 ± 0.0009
$i$    1.0202 ± 0.0001    -0.0875 ± 0.0008
$z$    0.9791 ± 0.0001     0.0424 ± 0.0007

As expected, the table shows that the values of $a_{b}$ and $c_{b}$ are very close to 1 and 0 respectively, since the aperture-corrected flux $F_{\textrm{g},b}$ is already very close to the predicted flux $F_{\textrm{p},b}$. Essentially, this process ‘straightens’ the correlation line, providing minor yet essential improvements to the overall calibration.
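The per-band fit of Eq. 7 can be illustrated with a minimal weighted least-squares sketch in Python. The function names are hypothetical, the weights are passed in by the caller (the text uses the inverse variances of the fluxes), and the way the fitted line is applied to the testing set in `apply_zero_point` is an assumption of this example.

```python
import numpy as np

def fit_zero_point(f_pred, f_cal, weights):
    """Weighted least-squares fit of ln(F_p) = a * ln(F_g) + c for one band."""
    x, y = np.log(f_cal), np.log(f_pred)
    w = np.asarray(weights, dtype=float)
    design = np.column_stack([x, np.ones_like(x)])
    # Solve the weighted normal equations (X^T W X) beta = X^T W y.
    xtw = design.T * w
    a, c = np.linalg.solve(xtw @ design, xtw @ y)
    return a, c

def apply_zero_point(f_cal, a, c):
    """Map aperture-corrected fluxes onto the predicted-flux scale."""
    return np.exp(a * np.log(f_cal) + c)

# Toy data (not survey data) generated with a = 1.02 and c = 0.05.
f_cal = np.array([1.0, 2.0, 4.0, 8.0])
f_pred = np.exp(0.05) * f_cal ** 1.02
a_b, c_b = fit_zero_point(f_pred, f_cal, weights=[1.0, 2.0, 3.0, 4.0])
```

Solving the normal equations directly keeps the meaning of the weights explicit: each squared residual is multiplied by its weight.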

4.3 Overall calibration performance

Figure 5: The $uBVriz$ broadband fluxes predicted by delight plotted against their original COSMOS fluxes, both before and after the two-step calibration process (red and blue respectively) for our training set, using the flux calibration method as an example. Based on the root-mean-square errors ($\sigma_{\textrm{RMS}}$) shown in each panel, the broadband fluxes match their prediction much better after calibration.

Figure 5 shows the correlation between the broadband fluxes predicted by delight (using spectroscopic redshifts, the 40 PAUS narrow bands and Brown templates) and the COSMOS broadband fluxes for our training set, both before and after calibration (red and blue, respectively). The figure only shows the result of the flux calibration method, as the other two methods look very similar graphically (which translates to a small difference in the photo-$z$ results shown later in Section 5).

The RMS values displayed in Figure 5 show that, for all bands, the scatter of the original fluxes with respect to the predicted fluxes is reduced by 63 to 88 per cent after the two-step calibration. The scatter at low fluxes for the $u$ and $B$ bands remains evident, and originates from the high uncertainty in the flux measurements. Despite the large decrease in scatter, we note that the RMS value here is not a metric of improvement for the calibration, as we do not have the true values of the broadband fluxes in the matched apertures. However, the calibration of the broadband fluxes did translate into an improvement in photo-$z$ scatter and 68th percentile error of about 70 to 80 per cent, as shown in Section 5.

5 Results and Discussion

Table 2: The root-mean-square error ($\sigma_{\textrm{RMS}}$), 68th percentile error ($\sigma_{68}$), outlier fraction ($\eta_{\textrm{out}}$), mean continuous ranked probability score ($\rho_{\textrm{CRPS}}$) and the root-mean-square error in redshift distribution ($n_{\textrm{RMS}}$) for the photo-$z$’s produced in this work, using different algorithms, methods and number of bands. All results are produced using 6 broad bands (BB) and 40 narrow bands (NB) unless stated otherwise.
Photo-$z$ method                              $\sigma_{\textrm{RMS}}$   $\sigma_{68}$   $\eta_{\textrm{out}}$ (%)   $\rho_{\textrm{CRPS}}$   $n_{\textrm{RMS}}$
delight (6 BB only)                           0.0514   0.0441   0.93   0.0388   0.885
delight (40 NB only)                          0.0684   0.0119   4.02   0.0298   0.637
delight (no calibration)                      0.1555   0.0566   9.06   0.0887   0.895
delight (photo-$z$ calibration method)        0.0335   0.0083   0.71   0.0158   0.634
delight (size calibration method)             0.0341   0.0095   0.76   0.0165   0.646
delight (flux calibration method)             0.0331   0.0081   0.86   0.0155   0.636
delight (flux calibration method, no GP)      0.0442   0.0089   0.98   0.0179   0.639
annz2                                         0.0556   0.0396   2.66   0.0719   0.465
annz2 (6 BB only)                             0.0371   0.0202   1.14   0.0522   0.432
bpz                                           0.0368   0.0089   0.86   0.0184   0.740
bcnz2                                         0.0403   0.0085   1.14   -        -

Table 2 summarises the results of this work. It shows all the photo-$z$ metrics we produced, using different algorithms (delight, annz2, bpz), different calibration methods (flux, photo-$z$ and size), and different numbers of input fluxes (6 broad bands, 40 narrow bands, or both). We divide the analysis of the results into two sections: Section 5.2 compares the performance of the three calibration methods used in delight, while Section 5.3 compares the best performance of delight with annz2, bpz and bcnz2. In the following section, we briefly introduce the performance metrics used in this work.

5.1 Performance metrics

In this work we use three metrics to quantify the performance of the photo-$z$ point estimates: the root-mean-square error ($\sigma_{\textrm{RMS}}$), the 68th percentile error ($\sigma_{68}$) and the outlier fraction ($\eta_{\textrm{out}}$). With $\Delta z\equiv\frac{z_{\textrm{phot}}-z_{\textrm{spec}}}{1+z_{\textrm{spec}}}$, the above metrics are defined as follows:

$$\sigma_{\textrm{RMS}}\equiv\sqrt{\frac{1}{N}\sum_{i}^{N}\left|\Delta z_{i}\right|^{2}}\ , \tag{8}$$
$$\sigma_{68}\equiv\frac{Q_{84.1\%}(\Delta z_{i})-Q_{15.9\%}(\Delta z_{i})}{2}\ , \tag{9}$$
$$\eta_{\textrm{out}}\equiv\textrm{\% objects where}\ \left|\Delta z_{i}\right|\geq 0.15\ . \tag{10}$$

Here $N$ is the total number of galaxies, while $Q$ is a percentile of the distribution. Since $\sigma_{\textrm{RMS}}$ is calculated without the outliers removed, it measures the overall scatter of the sample, whereas $\sigma_{68}$ measures the scatter with reduced sensitivity to outliers.
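The three point-estimate metrics of Eqs. 8 to 10 translate directly into a few lines of NumPy. This is a minimal sketch; the function name `point_metrics` and the toy redshifts are illustrative only.

```python
import numpy as np

def point_metrics(z_phot, z_spec, outlier_threshold=0.15):
    """Return (sigma_RMS, sigma_68, eta_out in per cent) following Eqs. 8-10."""
    z_phot = np.asarray(z_phot, dtype=float)
    z_spec = np.asarray(z_spec, dtype=float)
    dz = (z_phot - z_spec) / (1.0 + z_spec)
    sigma_rms = np.sqrt(np.mean(dz ** 2))            # outliers are not removed
    q_lo, q_hi = np.percentile(dz, [15.9, 84.1])
    sigma_68 = 0.5 * (q_hi - q_lo)                   # half-width of the central 68 per cent
    eta_out = 100.0 * np.mean(np.abs(dz) >= outlier_threshold)
    return sigma_rms, sigma_68, eta_out

# Toy sample (not survey data): three perfect estimates and one outlier.
sigma_rms, sigma_68, eta_out = point_metrics([0.0, 0.0, 0.0, 0.3], [0.0, 0.0, 0.0, 0.0])
```

In this toy sample the single outlier dominates $\sigma_{\textrm{RMS}}$, while $\sigma_{68}$ depends only on the central percentiles and is therefore less affected.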

With similar motivations as Martí et al. (2014) and Eriksen et al. (2019), we hope to achieve an overall photo-$z$ error of $\sigma_{68}\leq 0.0035(1+z_{\textrm{spec}})$ for at least 50 per cent of the testing sample after applying an appropriately chosen quality cut. We use the Bayesian ODDS ($\Theta$) parameter (Benítez, 2000) in delight, similar to its implementation in annz2 by Soo et al. (2018). $\Theta$ can be estimated from the photo-$z$ PDF, $p(z)$, using the equation

$$\Theta=\int^{z_{\textrm{p}}+k(1+z_{\textrm{p}})}_{z_{\textrm{p}}-k(1+z_{\textrm{p}})}p(z)\,dz\ , \tag{11}$$

where $z_{\textrm{p}}$ is the peak of $p(z)$ and $k=0.01$. $\Theta$ ranges between 0 and 1; the higher the value, the narrower the $p(z)$, which implies a more precisely predicted photo-$z$ (though not necessarily an accurate one). The value of $k$ is arbitrary, selected such that not too many objects end up having $\Theta=1$. An $x$ per cent quality cut on the sample therefore keeps the top $x$ per cent of objects with the highest values of $\Theta$.
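A minimal sketch of Eq. 11 and of the quality cut is given below, assuming that each $p(z)$ is tabulated on a redshift grid. The function names, the trapezoidal integration and the linear interpolation of the CDF at the window edges are choices made for this example, not a description of the delight or annz2 implementations.

```python
import numpy as np

def odds(z_grid, pdf, k=0.01):
    """Integrate a gridded p(z) within +/- k(1 + z_p) of its peak (Eq. 11)."""
    z_grid = np.asarray(z_grid, dtype=float)
    pdf = np.asarray(pdf, dtype=float)
    # Cumulative trapezoidal integral, normalised so that p(z) has unit area.
    steps = 0.5 * (pdf[1:] + pdf[:-1]) * np.diff(z_grid)
    cdf = np.concatenate([[0.0], np.cumsum(steps)])
    cdf /= cdf[-1]
    z_peak = z_grid[np.argmax(pdf)]
    half_width = k * (1.0 + z_peak)
    upper = np.interp(z_peak + half_width, z_grid, cdf)
    lower = np.interp(z_peak - half_width, z_grid, cdf)
    return upper - lower

def quality_cut(theta, keep_percent):
    """Boolean mask keeping the keep_percent per cent of objects with the highest odds."""
    theta = np.asarray(theta, dtype=float)
    n_keep = int(round(len(theta) * keep_percent / 100.0))
    mask = np.zeros(len(theta), dtype=bool)
    mask[np.argsort(theta)[::-1][:n_keep]] = True
    return mask

# Toy Gaussian p(z) whose width equals the integration half-width k(1 + z_p).
z = np.linspace(0.0, 1.0, 10001)
theta = odds(z, np.exp(-0.5 * ((z - 0.5) / 0.015) ** 2))
```

For this toy PDF the window spans one standard deviation on either side of the peak, so $\Theta$ comes out close to 0.68; narrower PDFs approach 1.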

To assess the quality of the $p(z)$, we use probability integral transform (PIT) plots and the continuous ranked probability score (CRPS). The PIT is the cumulative distribution function (CDF) evaluated at $z_{\textrm{spec}}$, with the $p(z)$ normalised to unit area. Since the photo-$z$ CDF is $C(z)=\int^{z}_{0}p(z^{\prime})\,dz^{\prime}$, the PIT is defined to be

$$\textrm{PIT}=C(z_{\textrm{spec}})=\int_{0}^{z_{\textrm{spec}}}p(z)\,dz\ . \tag{12}$$

A PIT distribution tells us on average whether the $p(z)$ produced are ‘adequately shaped’: the shape of the PIT distribution can tell us if the $p(z)$ produced are generally too wide or too narrow, or if the $p(z)$ are over- or under-predicting the true redshift.
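For a $p(z)$ tabulated on a grid, Eq. 12 can be evaluated as in the following sketch. The function name `pit` and the trapezoidal integration are assumptions of this example.

```python
import numpy as np

def pit(z_grid, pdf, z_spec):
    """Probability integral transform: the normalised CDF evaluated at z_spec (Eq. 12)."""
    z_grid = np.asarray(z_grid, dtype=float)
    pdf = np.asarray(pdf, dtype=float)
    steps = 0.5 * (pdf[1:] + pdf[:-1]) * np.diff(z_grid)
    cdf = np.concatenate([[0.0], np.cumsum(steps)])
    cdf /= cdf[-1]                                  # enforce unit area
    return float(np.interp(z_spec, z_grid, cdf))

# Toy examples (not survey data).
z = np.linspace(0.0, 1.0, 1001)
pit_uniform = pit(z, np.ones_like(z), 0.25)
pit_centred = pit(z, np.exp(-0.5 * ((z - 0.5) / 0.05) ** 2), 0.5)
```

Collecting the PIT values of all galaxies into a histogram then gives the distribution discussed above: a flat histogram indicates well-calibrated $p(z)$.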

The CRPS, on the other hand, tells us how well the $p(z)$ encapsulates or predicts the true redshift ($z_{\textrm{spec}}$). The CRPS of a $p(z)$ can be expressed as

$$\textrm{CRPS}=\int^{\infty}_{-\infty}\left|C(z)-\mathcal{H}(z-z_{\textrm{spec}})\right|^{2}\,dz\ , \tag{13}$$

where $\mathcal{H}(z-z_{\textrm{spec}})$ is the Heaviside step function with

$$\mathcal{H}(z-z_{\textrm{spec}})=\left\{\begin{array}{ll}1,&z\geq z_{\textrm{spec}}\\ 0,&\textrm{otherwise}.\end{array}\right. \tag{14}$$

In this work, we use the symbol $\rho_{\textrm{CRPS}}$ to represent the average CRPS value of all galaxies in the testing sample; the smaller the value, the better the $p(z)$ are at predicting their true redshifts. We refer the reader to Polsterer et al. (2016) for a detailed description of both PIT and CRPS.
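The CRPS of Eq. 13 can be approximated numerically for a gridded $p(z)$, as in the sketch below. The function name `crps` is hypothetical, and the integral is truncated to the extent of the redshift grid, which is an assumption of this example; the step function is taken to be 1 for $z\geq z_{\textrm{spec}}$ and 0 otherwise.

```python
import numpy as np

def crps(z_grid, pdf, z_spec):
    """Continuous ranked probability score of a gridded p(z) (Eq. 13)."""
    z_grid = np.asarray(z_grid, dtype=float)
    pdf = np.asarray(pdf, dtype=float)
    steps = 0.5 * (pdf[1:] + pdf[:-1]) * np.diff(z_grid)
    cdf = np.concatenate([[0.0], np.cumsum(steps)])
    cdf /= cdf[-1]
    heaviside = (z_grid >= z_spec).astype(float)    # step function of Eq. 14
    integrand = (cdf - heaviside) ** 2
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(z_grid)))

# Toy Gaussian p(z) centred on the true redshift (not survey data).
z = np.linspace(0.0, 1.0, 10001)
score = crps(z, np.exp(-0.5 * ((z - 0.5) / 0.05) ** 2), 0.5)
```

For a Gaussian of width $\sigma$ centred on $z_{\textrm{spec}}$, the known closed form is $\sigma\,(\sqrt{2/\pi}-1/\sqrt{\pi})\approx 0.2337\,\sigma$, which the numerical value above reproduces. Averaging `crps` over all galaxies gives $\rho_{\textrm{CRPS}}$.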

Finally, we also assess the quality of the redshift distribution $n(z)$. We can find how similar the spec-$z$ distribution $n_{\textrm{spec}}(z)$ is to the photo-$z$ distribution $n_{\textrm{phot}}(z)$ by estimating $n_{\textrm{RMS}}$, the root-mean-square difference between the distributions:

$$n_{\mathrm{RMS}}=\sqrt{\int\left[n_{\mathrm{phot}}(z)-n_{\mathrm{spec}}(z)\right]^{2}\,dz}. \tag{15}$$

$n_{\textrm{RMS}}$ provides a quantitative measure for comparing the redshift distributions produced by different photo-$z$ codes.
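Eq. 15 can be evaluated numerically once both distributions are sampled on a common redshift grid. How $n_{\textrm{phot}}(z)$ and $n_{\textrm{spec}}(z)$ are constructed and normalised is not specified here, so the sketch below simply takes them as inputs; the function name `n_rms` is hypothetical.

```python
import numpy as np

def n_rms(z_grid, n_phot, n_spec):
    """Root-mean-square difference between two n(z) on a common grid (Eq. 15)."""
    z_grid = np.asarray(z_grid, dtype=float)
    diff2 = (np.asarray(n_phot, dtype=float) - np.asarray(n_spec, dtype=float)) ** 2
    # Trapezoidal integration of the squared difference.
    return float(np.sqrt(np.sum(0.5 * (diff2[1:] + diff2[:-1]) * np.diff(z_grid))))

# Toy check (not survey data): a constant offset of 1 over a redshift range of 4.
z = np.linspace(0.0, 4.0, 401)
value = n_rms(z, np.ones_like(z), np.zeros_like(z))
```

Identical distributions give zero, and the value grows with the squared mismatch integrated over redshift.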

5.2 Performance of Delight

Figure 6: Plot of photo-$z$ vs. spec-$z$, comparing the photo-$z$’s when trained and tested using only the 6 $uBVriz$ broad bands (BB, left), only the 40 narrow bands (NB, middle), and all 46 bands combined (right). The flux calibration method was used for this plot.

Rows 1 and 2 of Table 2 show the photo-$z$’s produced when trained using only the broad bands and only the narrow bands, respectively. We find that by combining both broad and narrow bands (rows 4 to 6), we achieve at least 34 and 20 per cent improvement in the photo-$z$ scatter and $\sigma_{68}$, respectively (visualised in Fig. 6).

Rows 3 to 7 show the metrics for each calibration method. On average, the performance of each method is quite similar, all within 4 to 16 per cent difference in $\sigma_{\textrm{RMS}}$ and $\sigma_{68}$, respectively. Statistically, the flux calibration method seems to perform slightly better than the others, with the exception of the photo-$z$ calibration method having better values of $\eta_{\textrm{out}}$ and $n_{\textrm{RMS}}$. This suggests that while the photo-$z$’s produced by training with only the 40 narrow bands are not as competitive as those trained with all 46 bands and calibrated broad bands (see Table 2 and Fig. 6), they are nonetheless sufficient to guide the calibration process. Note that we have also included for comparison the results of delight run as a pure template code when calibrated using the flux calibration method. Without the help of the GP, the photo-$z$ results are similar for most metrics, except for a degradation in scatter of up to 33.5 per cent. Therefore the good results of delight shown here are mainly due to the use of the Brown templates, the flux calibration, the combination of broad and narrow bands, and also the work of the GP.

As the three calibration methods presented in Section 4 all result in very similar photo-$z$ performance, we will only show results for the flux calibration method in the following. It is notable however that in all cases, the photo-$z$ requirement of $\sigma_{68}<0.0035(1+z)$ is achievable for all objects at $i_{\textrm{auto}}<20.0$, or for objects with a 40 per cent $\Theta$ cut at $i_{\textrm{auto}}<22.5$. All three methods also show that, despite such high-percentage $\Theta$ cuts being implemented, a significant number of objects with high photo-$z$ still remain in the sample.

5.3 Comparison with Other Algorithms

Since the delight results for each of the three calibration methods are very similar to each other, we select only the flux calibration method to be compared to the results obtained by the two other algorithms used in this work, annz2 and bpz. We also include the point estimates from Eriksen et al. (2019). The values of $\sigma_{\textrm{RMS}}$, $\sigma_{68}$ and other relevant metrics obtained from these algorithms are shown in rows 8 to 11 of Table 2, and visualised in Fig. 7.

Figure 7: Plots of photo-$z$ vs. spec-$z$, comparing the results of the flux calibration method of delight (green), annz2 trained with 46 bands / 6 broad bands (orange), bcnz2 (blue) and bpz (magenta). The same colouring scheme will be used to represent the respective methods in the following plots.

From the figure, it is found that annz2, being a purely machine-learning based algorithm, underperforms compared to the other algorithms. This machine-learning method is unable to make full use of the extra information provided by the 40 narrow bands, and is shown to perform better without them. This is partially due to the curse of dimensionality (Bellman, 1957), which sharply dilutes the pattern recognition power of the algorithm as the number of inputs increases. In addition, the very small training sample size may have heavily limited the potential of annz2. We note however that the deep learning code deepz is shown to work well on a similar sample (Eriksen et al., 2020); we therefore hope to do follow-up evaluations of annz2 on PAUS data in the future when a larger training set is available.

Figure 8: Plots of $\sigma_{\textrm{RMS}}$ (top) and $\sigma_{68}$ (bottom) with respect to $i_{\textrm{auto}}$ (cumulatively), comparing the performance of delight (left) with annz2 (middle), bpz (right) and bcnz2 (left, dashed lines). The coloured lines represent the sample when cut systematically in the Bayesian odds ($\Theta$), keeping only objects with the best 100% (red), 90% (orange), 80% (green), 70% (blue), 60% (navy), 50% (purple) and 40% (magenta) values. The black horizontal dashed line with $\sigma_{68}=0.0035(1+z)$ represents the photo-$z$ quality target of PAUS for 50 per cent of the objects at $i\sim 22.5$.
Figure 9: Plot of the percentage of objects within each photo-$z$ bin with respect to the cut in $\Theta$ value for the results of delight (top left), annz2 (top right), bcnz2 (bottom left) and bpz (bottom right). The lines use the same colour scheme as those in Fig. 8, while the histograms in the background show the photo-$z$ distribution for each method (relative number of objects in each photo-$z$ bin).

In terms of the quality of the point estimate photo-$z$’s, delight is shown to fare well against bcnz2 and bpz (Fig. 7), both of which are purely template-based methods. As both delight and bpz used the same template set in this case (i.e. the Brown templates), we find that the Gaussian process contributed to 25 and 9 per cent improvement in the scatter and $\sigma_{68}$, respectively, as compared to the pure template fit of bpz.

Despite the similarities in the point estimates for the entire sample (Table 2), when we cut the sample in percentages of $\Theta$ (Figs. 8 and 9), we see two major differences. Firstly, the cut in $\Theta$ for bpz does not systematically remove objects with high uncertainties (especially for objects brighter than $i_{\textrm{auto}}=21$); and secondly, the cut in $\Theta$ for bpz selectively removes objects with lower photo-$z$. In both cases, delight is shown to perform better in this regard not only than bpz, but also than all other algorithms shown.

Figure 10: Sample redshift PDFs $p(z)$ for annz2 (orange), delight (green) and bpz (magenta). The black vertical lines show the positions of the spectroscopic redshifts.
Figure 11: Probability integral transform (PIT) distributions for the $p(z)$ produced by the four different algorithms, delight (green), bcnz2 (blue), annz2 (orange) and bpz (magenta). The dashed horizontal line indicates the mean of the distribution, and a flat distribution is ideal. A U-shaped distribution indicates that the $p(z)$ produced are too narrow, while a mountain-peak shaped distribution indicates that the $p(z)$ produced are too wide.

A selection of sample $p(z)$ produced by each algorithm is shown in Fig. 10, while the overall quality of the $p(z)$ produced is visualised in the PIT plots shown in Fig. 11. Once again we see delight on average producing superior $p(z)$ compared to annz2 and bpz: it is obvious from the PIT plots that the $p(z)$ produced by annz2 are too narrow (a U-shaped distribution), while those by bpz are too wide (a significant central peak). In terms of $\rho_{\textrm{CRPS}}$ (see Table 2), delight once again performs better than both bpz and annz2, where the adequate shapes and accurately positioned peaks of the $p(z)$ provide good predictions of the true redshift.

We note that the $p(z)$ produced by annz2 are ragged compared to those of bpz and delight; this is due to the limited training sample size and the low number of network committees used. We intend to look into several methodologies to smooth machine-learning based $p(z)$ which are limited by such conditions; this is left for future work. The limited testing sample size has also produced an $n_{\textrm{spec}}(z)$ distribution which is not smooth; thus, although annz2 produces an $n(z)$ closest to the spectroscopic distribution (lowest $n_{\textrm{RMS}}$), it may have experienced overfitting. Having said that, for the different delight runs shown in Table 2, the values of $n_{\textrm{RMS}}$ are consistent with the other metrics. We therefore leave the analysis of $n(z)$ to future work, when a large enough testing sample is available.

6 Application: identifying photo-z outliers

6.1 Analysing the photo-z outliers of Delight and BCNz2

As we compared the photo-$z$ results, we discovered that some galaxies have similar delight and bcnz2 photo-$z$ values, yet these values are far from their respective zCOSMOS spectroscopic redshifts or broadband photo-$z$’s. Since both bcnz2 and delight utilise the PAUS narrow bands, we expect the photo-$z$’s they produce to be more sensitive to emission lines than photo-$z$’s produced using only broad bands. Therefore, we suspect that objects with similar delight and bcnz2 photo-$z$ values but disagreeing spec-$z$ values indicate either (1) a catastrophic zCOSMOS spectroscopic redshift (see footnote 6), (2) outlier broadband or narrowband fluxes, or (3) misidentification of close neighbours.

Footnote 6: While we have already selected only secure spectroscopic redshifts in this work, we still deem this a possibility, since a 1 per cent outlier rate in 4000+ spec-$z$ measurements may still yield 40 objects, which is of the same order as the number of objects being investigated in this section. Our results later in this section, however, have verified that most of the outliers are not caused by catastrophic spectroscopic redshifts.

For the purpose of this inquiry, we have selected 30 objects from the sample which are photo-$z$ outliers in $z_{\textrm{Delight}}$ vs. $z_{\textrm{spec}}$ or $z_{\textrm{BCNz}}$ vs. $z_{\textrm{spec}}$, yet are not outliers in $z_{\textrm{Delight}}$ vs. $z_{\textrm{BCNz}}$. Mathematically, they satisfy the following conditions:

  1. $\frac{\left|z_{\textrm{Delight}}-z_{\textrm{spec}}\right|}{1+z_{\textrm{spec}}}\geq 0.15$ or $\frac{\left|z_{\textrm{BCNz}}-z_{\textrm{spec}}\right|}{1+z_{\textrm{spec}}}\geq 0.15$, and

  2. $\frac{\left|z_{\textrm{Delight}}-z_{\textrm{BCNz}}\right|}{1+\frac{z_{\textrm{Delight}}+z_{\textrm{BCNz}}}{2}}<0.15$.

Note that the $z_{\textrm{Delight}}$ used here refers to the photo-$z$ produced using the flux calibration method, trained using 46 bands guided by the Brown templates.
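The two selection conditions above can be expressed as a boolean mask over the catalogue, as in the following sketch. The function name and the toy redshifts are illustrative only.

```python
import numpy as np

def consistent_outliers(z_delight, z_bcnz, z_spec, threshold=0.15):
    """Select objects that are outliers w.r.t. spec-z but agree between the two codes."""
    z_d = np.asarray(z_delight, dtype=float)
    z_b = np.asarray(z_bcnz, dtype=float)
    z_s = np.asarray(z_spec, dtype=float)
    # Condition 1: outlier in either code with respect to the spectroscopic redshift.
    out_delight = np.abs(z_d - z_s) / (1.0 + z_s) >= threshold
    out_bcnz = np.abs(z_b - z_s) / (1.0 + z_s) >= threshold
    # Condition 2: the two photo-z estimates are not outliers with respect to each other.
    agree = np.abs(z_d - z_b) / (1.0 + 0.5 * (z_d + z_b)) < threshold
    return (out_delight | out_bcnz) & agree

# Toy objects (not survey data).
mask = consistent_outliers([0.90, 0.50, 0.90], [0.92, 0.51, 0.50], [0.50, 0.50, 0.50])
```

Only the first toy object is selected: the second is not an outlier in either code, and the third is an outlier for one code but the two photo-$z$’s disagree with each other.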

Figure 12: The selected 30 objects (red dots) marked for this outlier analysis. These objects are photo-$z$ outliers of either delight or bcnz2 with respect to $z_{\textrm{spec}}$, but are not outliers with respect to each other.

These 30 objects are visualised in the redshift-redshift plots in Fig. 12. Note that in the following paragraphs, we define a photo-$z$ to be catastrophic if it is found to be an outlier with respect to its spec-$z$, as defined mathematically above. These objects are found to have faint magnitudes ($i_{\textrm{auto}}>19.75$) and small angular sizes ($r_{50}<60$ ACS pixels, or $1.8^{\prime\prime}$), which describe most galaxies of interest for PAUS. We study several different attributes of these objects, namely their respective photo-$z$’s by delight, bcnz2 and lephare, photo-$z$ PDFs, best-fit templates (Brown and GP), spectra and images. We summarise important observations according to their respective attributes below.

Photo-$z$’s. While these 30 objects have been identified as outliers when trained using 46 bands, we find that two-thirds of these objects have non-catastrophic photo-$z$’s when trained with either only the broad or only the narrow bands. In other words, only one-third of these objects have catastrophic photo-$z$’s regardless of which bands were used in the training or fitting process. This suggests that most of the time, outlier fluxes in the broad or narrow bands may have caused a degradation in photo-$z$ quality when trained together (more on this in the templates paragraph below). We have also made a comparison between the delight photo-$z$’s and those produced by lephare for the COSMOS2015 catalogue (Laigle et al., 2016), and found that in fact half of the 30 objects have non-catastrophic lephare photo-$z$’s. This suggests that the infrared $yJHK$ bands could have played a role in improving the PAUS photo-$z$’s, and could be incorporated in future trainings in case the PAUS photometry is problematic (see footnote 7).

Footnote 7: We note that these additional bands will not be available over most of PAUS, which targets the Canada-France-Hawaii Telescope Legacy Survey (CFHTLS) wide fields W1 to W4. There is however some infrared data on these fields provided by the Wide-field InfraRed Camera (WIRCam) and the VISTA Kilo-degree Infrared Galaxy Survey (VIKING).

Photo-$z$ PDFs. We inspected the secondary/tertiary peaks of the PDFs for all delight runs (trained with 6 broad bands, 40 narrow bands, or both), and find that less than 20 per cent of these secondary/tertiary peaks coincide with their respective spec-$z$’s. We deduce that despite the importance of secondary PDF peaks in redshift distributions, they do not significantly influence the photo-$z$ quality of these 30 objects.

Templates. delight utilises the 129 Brown et al. (2014) templates and the 4203 training objects to guide the GP to produce the same number of new flux-redshift templates, which are used to produce photo-$z$’s for the objects. In the training process, delight always chooses one best-fit Brown template for each training galaxy to be trained by the GP. Here we inspected two different kinds of best-fit Brown templates for these 30 outliers: one fixed at the spec-$z$, and the other with the redshift as a free parameter. In both cases, we examined

  1. if the objects fit to the same templates when trained with only broad bands, only narrow bands, or both, respectively;

  2. if there are any trends in galaxy morphological types, based on the galaxy type classification indicated by the template;

  3. if there is any correlation between the $\chi^{2}$ value of the best-fit templates and the quality of photo-$z$’s; and

  4. if any outlier narrowband fluxes can be identified as the cause of the degradation of photo-$z$.

As expected, we find that 70 per cent of the outlier objects have different best-fit Brown templates between the fits at fixed photo-$z$ and spec-$z$, in contrast to only 35 per cent for non-outliers. We also find that only slightly more than a third of both the outlier and non-outlier objects were fitted to the same templates when trained using broad bands as when trained with all 46 bands. The high percentage of objects with different template fits at different reference redshifts (photo-$z$ or spec-$z$) and flux combinations (broad bands, narrow bands, or both) also resulted in no trend in galaxy morphological types among the outliers.

However, it was found that up to 60 per cent of the objects have their best-fit template $\chi^{2}$ value correlating with the quality of the photo-$z$, which further affirms the usage of this as a metric to remove unreliable photo-$z$’s (see Section 6.2), as also attempted by Eriksen et al. (2019) and Eriksen et al. (2020).

Figure 13: A sample of best-fit Brown templates (unfixed redshift) when fit to only broadband fluxes (top), only narrowband fluxes (middle), and both (bottom) for the galaxy with zCOSMOS ID 805216. $L_{v}(\lambda)$ is the rest-frame luminosity density (or SED) of the galaxy. This galaxy has $z_{\textrm{spec}}=0.736$; $z_{\textrm{p}}$ and $t_{\textrm{b}}$ in the figure refer to its photo-$z$ and best-fit Brown template number, respectively. The outlier narrowband flux shown in the middle panel (red circle) has caused a misfit in template type, resulting in erroneous photo-$z$’s for both cases.

Perhaps a more significant finding from the study of the best-fit templates is the ability to identify outlier narrowband fluxes. Fig. 13 shows an example in which an outlier narrowband flux significantly affects the photo-$z$ result. We found that a third of the 30 objects contained outlier narrowband fluxes, which result in entirely different template fits and photo-$z$’s when trained with narrow bands, as compared to when trained with broad bands only. Among these 10 objects, 8 have a worse photo-$z$ than when trained without the narrow bands. We also find indications of a significant fraction of narrowband flux outliers among galaxies without catastrophic redshift failures. Forthcoming PAUS data reductions will therefore implement methods to identify and correct flux outliers.

Images. We inspect the individual object images compiled by zCOSMOS DR3; these are $5^{\prime\prime}\times 5^{\prime\prime}$ images observed by the Hubble Space Telescope/Advanced Camera for Surveys (HST/ACS) in the F814W filter (Koekemoer et al., 2007). Among the 30 outlier objects, we find that 63.3 and 26.7 per cent have bright neighbours within $5^{\prime\prime}$ and $3^{\prime\prime}$ of the primary source, respectively. However, we have not found any correlation between the presence of bright neighbours and the other attributes that we have studied thus far. In fact, the opposite is true: we find that 60 per cent of the objects with outlier narrowband fluxes have primary sources without any bright neighbours in the vicinity.

Figure 14: Spectral line fitting (red) for the original spectrum (black) of the galaxy with zCOSMOS ID 804179. The spec-$z$ given by zCOSMOS is 0.4217 (top), while the best fit using ez (Garilli et al., 2010) gives a spec-$z$ of 0.0847 (bottom), which is closer to the photo-$z$ value of 0.1150 estimated by delight.

Spectra. So far we have assumed that the zCOSMOS spectra are reliable, as only entries with high-confidence quality flags have been selected for training (see Section 2.3). To probe further, we examined the one-dimensional spectra obtained by the VIMOS spectrograph, which are processed by the VIMOS Interactive Pipeline and Graphical Interface (VIPGI, Scodeggio et al., 2005) to produce the zCOSMOS spec-$z$’s used in this work. The spectra span 5500 Å to 9450 Å, measured with a resolution of $R\sim 600$ at 2.5 Å per pixel (Lilly et al., 2009).

We used the redshift measurement tool ez (Garilli et al., 2010) to inspect the spectra of the 30 outlier objects, and compared our best fits to the spectroscopic redshifts produced by zCOSMOS, and also to the photo-$z$’s produced by delight, bcnz2, deepz, lephare (COSMOS2015) and those of Alarcon et al. (2021).

Upon inspection, we find that up to 10 of these objects (33 per cent) have disputable zCOSMOS spec-$z$’s (e.g. two possible redshift values, different best-fit redshift values, line confusion, or low signal-to-noise). However, most of these potential spec-$z$ failures could be force-fitted to the zCOSMOS spec-$z$ and still look satisfactory, which leaves only 2 (6.7 per cent) of these objects having truly catastrophic spec-$z$’s. Both of these objects are found to have better ez fits at redshift values within 10 per cent of the photo-$z$’s produced by delight and other algorithms. The spectrum of one of these objects is shown in Fig. 14. We have also found one isolated case where the spectrum belonged to a bright neighbour and had been mismatched to the PAUS photometry.

Generally, the higher-redshift objects are identified by clear O II (3727.1 Å) emission lines, while the lower-redshift objects are identified by clear H$\alpha$ (6564.6 Å) emission lines. We therefore conclude that although catastrophic spec-$z$’s played a role, our results do not provide enough evidence to say that they are a major cause of the catastrophic photo-$z$’s produced by bcnz2 and delight. This is not surprising, since we have only selected secure spectroscopic redshifts from COSMOS for use in this work. However, this highlights the usefulness of multiple PAUS photo-$z$’s in determining failure rates in insecure spectroscopic redshifts.

To summarise this part, we believe that outlier narrowband fluxes are a potentially important source of catastrophic photo-$z$’s in the context of PAUS, with weak evidence for the existence of a small number of spec-$z$ failures. We leave the tackling of outlier narrowband fluxes to future work, but in the following section we attempt to improve our process to identify and remove these outlier photo-$z$’s.

6.2 New metrics to remove photo-z outliers

In Figs. 8 and 9 we used the Bayesian odds ($\Theta$) to cut the sample, with the aim of keeping as many objects as possible while achieving the goal of $\sigma_{68}\leq 0.0035(1+z)$. Here, we extend our previous results further towards that goal by introducing several new metrics to better separate the photo-$z$ outliers from the sample. These metrics are motivated by the inspection of the 30 outliers in Section 6.1, and they are defined as follows:

  1. The Delight-BCNz2 metric ($\Delta_{\textrm{DB}}$),

    $$\Delta_{\textrm{DB}}\equiv\frac{\left|z_{\textrm{Delight}}-z_{\textrm{BCNz}}\right|}{1+\frac{z_{\textrm{Delight}}+z_{\textrm{BCNz}}}{2}}, \tag{16}$$

    a metric used to quantify the similarity between the delight and bcnz2 photo-$z$’s. It is plausible that, in general, the closer the photo-$z$’s of the two algorithms, the more reliable they are;

  2. The Delight photo-$z$ standard deviation ($\sigma_{\textrm{D}}$), which is the standard deviation between all delight photo-$z$ runs regardless of calibration method and number of bands. Smaller deviations could indicate more reliable photo-$z$’s;

  3. The chi-squared value of the best-fit Brown template ($\chi^{2}_{\textrm{t}}$), where we identified a trend that the better the fit, the more reliable the photo-$z$; and

  4. The broadband-narrowband complementary metric ($\rho^{2}$),

    $$\rho^{2}\equiv\int p_{\textrm{BB}}(z)\,p_{\textrm{NB}}(z)\,dz, \tag{17}$$

    where $p_{\textrm{BB}}(z)$ and $p_{\textrm{NB}}(z)$ are the $p(z)$ produced by delight when trained with only broad bands and only narrow bands, respectively. By multiplying these two $p(z)$ and summing over the distribution at each step $i$, we can quantify the consistency between the broadband and narrowband $p(z)$. A higher value of $\rho^{2}$ means a larger overlap, which indicates more reliable photo-$z$’s.
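As an illustration of Eqs. (16) and (17), the following minimal Python sketch evaluates both metrics for a single object. It is not the code used in this work: the function names, the uniform redshift grid and the toy $p(z)$ values are all assumptions made for the example, and the integral in Eq. (17) is approximated by a simple Riemann sum.

```python
from typing import Sequence


def delta_db(z_delight: float, z_bcnz: float) -> float:
    """Eq. (16): |z_Delight - z_BCNz| divided by 1 + the mean of the two photo-z's."""
    return abs(z_delight - z_bcnz) / (1.0 + 0.5 * (z_delight + z_bcnz))


def rho_squared(p_bb: Sequence[float], p_nb: Sequence[float], dz: float) -> float:
    """Eq. (17) approximated by a Riemann sum over a shared, uniform redshift grid.

    Assumption: both p(z) are sampled at the same grid points with spacing dz
    and are each normalised so that sum(p) * dz == 1.
    """
    if len(p_bb) != len(p_nb):
        raise ValueError("p_bb and p_nb must be sampled on the same grid")
    return sum(a * b for a, b in zip(p_bb, p_nb)) * dz


# Toy p(z) on a four-point grid with spacing 0.5 (hypothetical values).
grid_step = 0.5
p_broad = [0.0, 1.0, 1.0, 0.0]
p_narrow_consistent = [0.0, 1.0, 1.0, 0.0]
p_narrow_shifted = [1.0, 1.0, 0.0, 0.0]

overlap_consistent = rho_squared(p_broad, p_narrow_consistent, grid_step)
overlap_shifted = rho_squared(p_broad, p_narrow_shifted, grid_step)
```

In this toy case the shifted narrowband $p(z)$ gives a smaller overlap than the consistent one, which is the behaviour the metric is meant to capture.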

Together with $\Theta$ and the delight photo-$z$ error ($\delta z$), we have a total of 6 metrics to experiment with. Using the results from the flux calibration method, we generate and test the individual performance of each of these metrics. For each metric, we measure $\sigma_{\textrm{RMS}}$ and $\sigma_{68}$ after systematically removing the objects with the worst metric values, 10 per cent of the total sample size each time, until only 40 per cent of the sample remains.
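The single-metric procedure can be sketched as follows. This is a hedged illustration rather than the pipeline used here: the function names are hypothetical, and the definitions of the two statistics are assumptions, since Section 5.1 is not reproduced in this part of the text. We take $\sigma_{\textrm{RMS}}$ to be the root mean square of the signed residual $(z_{\textrm{phot}}-z_{\textrm{spec}})/(1+z_{\textrm{spec}})$ and $\sigma_{68}$ to be half the difference between its 84.1th and 15.9th percentiles.

```python
import math
from typing import List, Sequence, Tuple


def percentile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty sequence."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def sigma_68(dz: Sequence[float]) -> float:
    """Assumed definition: half of the 15.9-84.1 percentile range of dz."""
    return 0.5 * (percentile(dz, 84.1) - percentile(dz, 15.9))


def sigma_rms(dz: Sequence[float]) -> float:
    """Assumed definition: root mean square of dz."""
    return math.sqrt(sum(d * d for d in dz) / len(dz))


def progressive_cut(
    z_phot: Sequence[float],
    z_spec: Sequence[float],
    metric: Sequence[float],
    higher_is_better: bool,
    fractions: Sequence[float] = (1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4),
) -> List[Tuple[float, float, float]]:
    """Return (kept fraction, sigma_RMS, sigma_68) after each cut on one metric."""
    dz = [(zp - zs) / (1.0 + zs) for zp, zs in zip(z_phot, z_spec)]
    # Sort object indices from best to worst metric value.
    order = sorted(range(len(dz)), key=lambda i: metric[i], reverse=higher_is_better)
    results = []
    for f in fractions:
        n_keep = max(1, int(round(f * len(order))))
        kept = [dz[i] for i in order[:n_keep]]
        results.append((f, sigma_rms(kept), sigma_68(kept)))
    return results
```

Because the residual is already scaled by $1+z_{\textrm{spec}}$, each returned statistic can be compared directly with the value 0.0035.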

We also repeat the exercise using combined cuts on several metrics, testing all 57 combinations of the 6 metrics. We note that we do not combine the metrics by averaging or multiplying them, as this would dilute the impact of the individual metrics. Instead, we rank the values of each metric individually (from best to worst), and remove objects rank by rank, starting with the metric values lying in the worst rank. For example, for the combination $\Theta+\Delta_{\textrm{DB}}$, we first remove all objects which share the worst values of $\Theta$ and $\Delta_{\textrm{DB}}$, then remove all objects sharing the second-worst values, and so on, until we reach a required sample size percentile (90, 80, etc.), where we output the values of $\sigma_{\textrm{RMS}}$ and $\sigma_{68}$. We visualise the performance of these metric cuts at several percentiles for $\sigma_{68}$ with respect to $i_{\textrm{auto}}$ (cumulative) in Fig. 15.
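One possible reading of this rank-by-rank procedure is sketched below in Python. Several points are assumptions made for illustration: the metric labels and function names are hypothetical, an object is removed as soon as any one of the combined metrics places it in the current worst rank, tied metric values are given distinct ranks, and the count of 57 is interpreted as the number of ways to combine two or more of the 6 metrics ($2^{6}-1-6=57$).

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Hypothetical labels for the six metrics discussed in the text.
METRICS = ("Theta", "delta_z", "Delta_DB", "sigma_D", "chi2_t", "rho2")


def multi_metric_combinations(names: Sequence[str]) -> List[Tuple[str, ...]]:
    """All combinations of two or more metrics."""
    return [c for r in range(2, len(names) + 1) for c in combinations(names, r)]


def worst_first_rank(values: Sequence[float], higher_is_better: bool) -> List[int]:
    """rank[i] is 0 for the worst object, 1 for the second worst, and so on."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=not higher_is_better)
    rank = [0] * len(values)
    for r, i in enumerate(order):
        rank[i] = r
    return rank


def combined_cut(metrics: Dict[str, Sequence[float]],
                 higher_is_better: Dict[str, bool],
                 keep_fraction: float) -> List[int]:
    """Indices kept after removing objects rank by rank across all given metrics."""
    n = len(next(iter(metrics.values())))
    ranks = [worst_first_rank(v, higher_is_better[k]) for k, v in metrics.items()]
    # An object leaves the sample at the first rank where ANY metric flags it.
    removal_rank = [min(r[i] for r in ranks) for i in range(n)]
    target = int(round(keep_fraction * n))
    kept = list(range(n))
    for level in range(n):
        if len(kept) <= target:
            break
        kept = [i for i in kept if removal_rank[i] > level]
    return kept
```

Since each rank can remove up to one object per metric, the kept sample may end slightly below the requested fraction; a production implementation would need a rule for that final step.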

Figure 15: Plot of the 68th percentile error ($\sigma_{68}$) vs. $i_{\textrm{auto}}$ (cumulative) when cut using the following metrics: the Bayesian odds ($\Theta$), best-fit Brown template $\chi^{2}_{\textrm{t}}$ value, delight-bcnz2 metric ($\Delta_{\textrm{DB}}$), delight photo-$z$ error ($\delta z$) and standard deviation ($\sigma_{\textrm{D}}$), and the broadband-narrowband complementary metric ($\rho^{2}$). The coloured lines follow the same percentile cuts as shown in Fig. 8, with the dotted coloured lines in the background of the bottom panels depicting the results of $\Theta$ for easier comparison. The bottom-left panel shows the cut in $\Delta z$ (defined in Section 5.1), the unsurpassable theoretical best used for reference. The bottom-middle panel shows the cuts when all the above metrics are combined, while the bottom-right panel shows the combination of metrics which yields the best results.

We find that each performance metric cuts the sample differently: while cuts in $\sigma_{\textrm{D}}$ and $\rho^{2}$ reduce the scatter ($\sigma_{\textrm{RMS}}$) significantly, cuts in $\Theta$ and $\Delta_{\textrm{DB}}$ reduce $\sigma_{68}$ instead. The metric $\chi^{2}_{\textrm{t}}$, however, does not seem to bring any significant improvement to the results. We have also plotted a cut in $\Delta z=\frac{\left|z_{\textrm{phot}}-z_{\textrm{spec}}\right|}{1+z_{\textrm{spec}}}$ (bottom-left panel in Fig. 15), which is the theoretical ‘best metric’, providing an upper limit against which the performance of each metric can be compared. Here we notice that even with the theoretical best metric, a cut at slightly less than 70 per cent (blue line) on the sample is still necessary to fulfil the PAUS target of $\sigma_{68}<0.0035(1+z)$ (dotted line) for delight.

Therefore, we select the 60 per cent cut (navy line, retaining 60 per cent of galaxies) as a benchmark to assess the performance of these metrics. We do so by locating where this line crosses the dotted line (i.e., finding the maximum value of $i_{\textrm{auto}}$ at which the photo-$z$’s achieve the PAUS target at the 60 per cent cut). From Fig. 15, it is clear that cutting in all 6 metrics does not necessarily outperform cutting with only $\Theta$, so we searched for the best combination of metrics for $\sigma_{\textrm{RMS}}$ and $\sigma_{68}$ separately.
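Locating this crossing can be sketched as below, assuming the cumulative curve has already been tabulated as pairs of magnitude limit and statistic. The function name, the linear interpolation between tabulated points, and the example curve are all hypothetical and serve only to illustrate the benchmark; they are not values from this work.

```python
from typing import List, Optional, Tuple


def max_magnitude_meeting_target(
    curve: List[Tuple[float, float]], target: float = 0.0035
) -> Optional[float]:
    """Largest magnitude limit at which the cumulative statistic is below target.

    curve holds (i_auto limit, statistic for all kept objects brighter than the
    limit), sorted by increasing limit. Returns None if the curve never drops
    below the target, and the last limit if it ends below the target.
    """
    if not curve:
        return None
    crossing = None
    for (m0, s0), (m1, s1) in zip(curve, curve[1:]):
        if s0 < target <= s1:
            # Linear interpolation between the two bracketing points.
            crossing = m0 + (target - s0) * (m1 - m0) / (s1 - s0)
    if curve[-1][1] < target:
        return curve[-1][0]
    return crossing


# Hypothetical cumulative curve at a fixed percentile cut.
example_curve = [(19.0, 0.0020), (20.0, 0.0030), (21.0, 0.0040), (22.0, 0.0060)]
example_limit = max_magnitude_meeting_target(example_curve)
```

A return value of None corresponds to a curve that does not cross the target line at all.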

For $\sigma_{\textrm{RMS}}$, the best combination of metrics is $\Delta_{\textrm{DB}}+\sigma_{\textrm{D}}+\rho^{2}$, which achieves $\sigma_{\textrm{RMS}}<0.0035(1+z)$ at $i_{\textrm{auto}}<19.27$ at the 60 per cent cut, a significant improvement over the case when only $\Theta$ was used, which did not cross the line at all. For $\sigma_{68}$, the best combination of metrics is $\Theta+\Delta_{\textrm{DB}}$, which reaches $\sigma_{68}<0.0035(1+z)$ at $i_{\textrm{auto}}<21.25$ at the 60 per cent cut, also a significant improvement over $\Theta$ alone at $i_{\textrm{auto}}<20.88$. We note that using $\Delta_{\textrm{DB}}$ alone, the target can in fact be reached at a higher limit of $i_{\textrm{auto}}<21.50$, which highlights the significance of a synergy between delight and bcnz2 in selecting a high-quality photo-$z$ sample.

Figure 16: Plot of the percentage of objects within each photo-$z$ bin with respect to the cut in the performance metric values listed in Fig. 15. The lines show the percentiles in the same colour scheme as in Fig. 8, while the histograms in the background show the relative number of objects in each photo-$z$ bin. The bottom-right panel shows the case in which the combination of all 6 metrics is used to cut the sample.

Finally, we also show the performance of the metrics in terms of the completeness with respect to the photo-$z$ (using delight’s flux calibration method), visualised in Fig. 16. We find that metrics like $\sigma_{\textrm{D}}$ and $\rho^{2}$ tend to selectively remove high photo-$z$ objects, while $\Theta$, $\chi^{2}_{\textrm{t}}$ and $\Delta_{\textrm{DB}}$ tend to remove mid-range photo-$z$ objects. In general, a cut using all 6 performance metrics at the 60 per cent cut shows a balanced completeness, keeping a sufficient number of high-redshift objects in the sample.
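The completeness per photo-$z$ bin can be computed along the following lines. This is a minimal sketch under assumptions: the function name, the bin edges and the toy sample are hypothetical, and completeness is taken to mean the percentage of objects in each photo-$z$ bin that survive a given cut.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence


def completeness_per_bin(
    z_phot: Sequence[float], kept: Sequence[bool], edges: Sequence[float]
) -> List[Optional[float]]:
    """Percentage of objects surviving the cut in each half-open bin [lo, hi).

    edges must be sorted in increasing order. Objects outside the outermost
    edges are ignored, and empty bins yield None.
    """
    n_bins = len(edges) - 1
    total = [0] * n_bins
    survived = [0] * n_bins
    for z, k in zip(z_phot, kept):
        b = bisect_right(edges, z) - 1  # index of the bin containing z
        if 0 <= b < n_bins:
            total[b] += 1
            survived[b] += 1 if k else 0
    return [100.0 * s / t if t else None for s, t in zip(survived, total)]


# Toy sample in which the cut preferentially removes high photo-z objects.
toy_z = [0.1, 0.2, 0.6, 0.7, 0.8, 1.1]
toy_kept = [True, True, True, False, False, False]
toy_completeness = completeness_per_bin(toy_z, toy_kept, [0.0, 0.5, 1.0, 1.5])
```

A metric that selectively removes high photo-$z$ objects would show up as a completeness that falls towards the last bins, as in this toy sample.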

To summarise the performance of the individual metrics,

  • $\chi^{2}_{\textrm{t}}$ is the least-performing metric here; it does not bring a significant positive impact to the results;

  • Cuts in $\sigma_{\textrm{D}}$ and $\rho^{2}$ help to improve the scatter; however, they tend to selectively remove higher photo-$z$ objects from the sample;

  • $\Theta$ and $\delta z$ show very similar results; however, $\Theta$ tends to keep more high photo-$z$ objects in the sample; and

  • $\Delta_{\textrm{DB}}$ is the best-performing metric here, and we recommend the use of such a metric to remove outlier photo-$z$’s from a sample.

7 Conclusion and Future Work

In this work we have optimised delight, a hybrid template-machine learning algorithm, such that it can be used to obtain photo-$z$’s for PAUS by utilising its 40 narrowband fluxes combined with 6 $uBVriz$ COSMOS broadband fluxes. We have shown three distinct methods to calibrate the broadband and narrowband fluxes, and found that all three yield comparable results, although the most stable method, and the one which produces the lowest value of $\sigma_{68}$, is what we defined as the flux calibration method: a method in which we calibrate the broadband fluxes with respect to the narrowband fluxes by finding the flux ratio of the filter combinations which overlap. This calibration method is entirely photometric, and it was able to produce photo-$z$’s with a scatter reaching as low as $\sigma_{\textrm{RMS}}=0.0331(1+z)$ and $\sigma_{68}=0.0081(1+z)$ for the full PAUS galaxy sample at $i_{\textrm{auto}}<22.5$.

We have also compared the results of delight with a machine learning algorithm (annz2) and two template-based algorithms (bpz and bcnz2). We find that annz2 underperforms significantly, indicating that annz2 in its basic form is not suitable for narrowband surveys with a large number of bands and a small number of training objects.

Despite the photo-$z$ performance of bpz being within 9 per cent of that of delight, the latter still stood out in terms of the quality of the photo-$z$ PDF $p(z)$ (16 per cent better in $\rho_{\textrm{CRPS}}$) and the effectiveness of its Bayesian odds ($\Theta$) cut in retaining objects with higher-quality photo-$z$’s without losing too many high-redshift objects. delight is also shown to produce competitive results compared to bcnz2 (5 per cent lower in $\sigma_{68}$), the default photo-$z$ produced for PAUS.

Further investigation of the common photo-$z$ outliers of delight and bcnz2 led to the conclusion that outlier narrowband fluxes are the main cause of erroneous photo-$z$’s, an insight which will inform improvements in forthcoming PAUS data reductions. We have also inspected the spectra and identified catastrophic spec-$z$’s; however, their effects are shown to be insignificant in this work. Motivated by the study of the 30 outliers shared between delight and bcnz2, we introduced several new metrics to help improve the identification of photo-$z$ outliers and remove them from the sample to achieve better results. Of the 6 metrics compared, our newly introduced delight-bcnz2 metric ($\Delta_{\textrm{DB}}$) is shown to significantly improve our photo-$z$ quality, allowing it to reach the PAUS target of $\sigma_{68}<0.0035(1+z)$ at $i_{\textrm{auto}}<21.5$ while retaining 60 per cent of the sample objects. These new metrics could be utilised to return more accurate uncertainties in redshift, which are vital in many cosmological studies.

This opens the door to future studies in finding synergies between different photo-zz algorithms and between broadband and narrowband photometry. Together with the promising developments of deep learning approaches to deal with narrowband data (Eriksen et al., 2020), these insights will pave the way towards unprecedentedly precise and accurate photometric redshifts for the full PAUS survey and beyond, like the Javalambre Physics of the Accelerating Universe Astrophysical Survey (J-PAS, Benítez et al., 2014).

Acknowledgements

The authors wish to thank the referee for the helpful and constructive comments. JYHS would like to thank Boris Leistedt for fruitful discussions and the setup of delight earlier in this work. JYHS also would like to thank Hwee San Lim and Tiem Leong Yoon for assisting in the setup of equipment in Universiti Sains Malaysia where most of the computational work of this paper was completed. JYHS acknowledges financial support from the MyBrainSc Scholarship by the Ministry of Education, Malaysia, a studentship provided by Ofer Lahav, and the Short Term Research Grant by Universiti Sains Malaysia (304/PFIZIK/6315395). JYHS and BJ acknowledge support by the University College London Cosmoparticle Initiative. MS acknowledges funding from the National Science Centre of Poland (UMO-2016/23/N/ST9/02963) and the Spanish Ministry of Science and Innovation through the Juan de la Cierva Formación programme (FJC2018-038792-I). H. Hildebrandt acknowledges support by a Heisenberg grant of the Deutsche Forschungsgemeinschaft (Hi 1495/5-1) and an ERC Consolidator Grant (no. 770935). H. Hoekstra acknowledges support from the Netherlands Organisation for Scientific Research (NWO) through grant 639.043.512. IEEC and IFAE are partially funded by the Institució Centres de Recerca de Catalunya (CERCA) and Beatriu de Pinós Programme of Generalitat de Catalunya. Work at Argonne National Lab is supported by UChicago Argonne LLC, Operator of Argonne National Laboratory (Argonne). Argonne, a U.S. Department of Energy Office of Science Laboratory, is operated under contract no. DE-AC02-06CH11357.

This project has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under the Marie Skłodowska-Curie Actions, through the following projects: Latin American Chinese European Galaxy (LACEGAL) Formation Network (no. 734374), the Enabling Weak Lensing Cosmology (EWC) Programme (no. 776247), and Barcelona Institute of Science and Technology (PROBIST) Postdoctoral Programme (no. 754510).

PAUS is partially supported by the Ministry of Economy and Competitiveness (MINECO, grants CSD2007-00060, AYA2015-71825, ESP2017-89838, PGC2018-094773, PGC2018-102021, SEV-2016-0588, SEV-2016-0597 and MDM-2015-0509). Funding for PAUS has also been provided by Durham University (ERC StG DEGAS-259586), ETH Zurich, and Leiden University (ERC StG ADULT-279396).

The PAU data centre is hosted by the Port d’Informació Científica (PIC), maintained through a collaboration of CIEMAT and IFAE, with additional support from Universitat Autònoma de Barcelona and the European Research Development Fund (ERDF).

Data availability

The data from PAUS (photometry and photo-$z$’s) are not yet publicly available. The data from COSMOS were accessed from the ESO Catalogue Facility (https://www.eso.org/qi/), while the data from zCOSMOS (spectra and spec-$z$’s) were accessed from the zCOSMOS database (http://cesam.lam.fr/zCosmos/). The derived data generated in this research will be shared on reasonable request to the corresponding author.

References

  • Abbott et al. (2005) Abbott T., et al., 2005, preprint (arXiv:astro-ph/0510346)
  • Aihara et al. (2018) Aihara H., et al., 2018, PASJ, 70
  • Alarcon et al. (2021) Alarcon A., et al., 2021, MNRAS, 501, 6103
  • Bellman (1957) Bellman R., 1957, JPSJ, 12, 1049
  • Benítez (2000) Benítez N., 2000, ApJ, 536, 571
  • Benítez et al. (2014) Benítez N., et al., 2014, preprint (arXiv:1403.5237)
  • Benjamin et al. (2013) Benjamin J., et al., 2013, MNRAS, 431, 1547
  • Bilicki et al. (2018) Bilicki M., et al., 2018, A&A, 616, A69
  • Bonfield et al. (2010) Bonfield D. G., Sun Y., Davey N., Jarvis M. J., Abdalla F. B., Banerji M., Adams R. G., 2010, MNRAS, 405, 987
  • Bonnett et al. (2016) Bonnett C., et al., 2016, Phys. Rev. D, 94, 042005
  • Boulade et al. (2003) Boulade O., et al., 2003, Proc. SPIE, 4841, 72
  • Brescia et al. (2018) Brescia M., Cavuoti S., Amaro V., Riccio G., Angora G., Vellucci C., Longo G., 2018, in Kalinichenko L., Manolopoulos Y., Malkov O., Skvortsov N., Stupnikov S., Sukhomlin V., eds, Data Analytics and Management in Data Intensive Domains. Communications in Computer and Information Science. Springer International Publishing, pp 61–72, doi:10.1007/978-3-319-96553-6_5
  • Brown et al. (2014) Brown M. J. I., et al., 2014, ApJS, 212, 18
  • Brun & Rademakers (1997) Brun R., Rademakers F., 1997, Nucl. Instrum. Methods Phys. Res. Sec. A, 389, 81
  • Bruzual & Charlot (2003) Bruzual A. G., Charlot S., 2003, MNRAS, 344, 1000
  • Bundy et al. (2015) Bundy K., et al., 2015, ApJS, 221, 15
  • Cavuoti et al. (2017) Cavuoti S., et al., 2017, MNRAS, 466, 2039
  • Coleman et al. (1980) Coleman G. D., Wu C.-C., Weedman D. W., 1980, ApJS, 43, 393
  • Crocce et al. (2016) Crocce M., et al., 2016, MNRAS, 455, 4301
  • D’Isanto et al. (2018) D’Isanto A., Cavuoti S., Gieseke F., Polsterer K. L., 2018, A&A, 616, A97
  • De Jong et al. (2013) De Jong J. T. A., Verdoes Kleijn G. A., Kuijken K. H., Valentijn E. A., 2013, Exp. Astron., 35, 25
  • Duncan et al. (2018) Duncan K. J., et al., 2018, MNRAS, 473, 2655
  • Duncan et al. (2019) Duncan K., et al., 2019, ApJ, 876, 110
  • Eriksen et al. (2019) Eriksen M., et al., 2019, MNRAS, 484, 4200
  • Eriksen et al. (2020) Eriksen M., et al., 2020, MNRAS, 497, 4565
  • Garilli et al. (2010) Garilli B., Fumana M., Franzetti P., Paioro L., Scodeggio M., Le Fèvre O., Paltani S., Scaramella R., 2010, PASP, 122, 827
  • Hoecker et al. (2007) Hoecker A., et al., 2007, preprint (arXiv:physics/0703039)
  • Ilbert et al. (2006) Ilbert O., et al., 2006, A&A, 457, 16
  • Ivezić et al. (2008) Ivezić Z., et al., 2008, preprint (arXiv:0805.2366)
  • Johnston et al. (2020) Johnston H., et al., 2020, preprint (arXiv:2010.09696)
  • Joudaki et al. (2020) Joudaki S., et al., 2020, A&A, 638, L1
  • Jouvel et al. (2017) Jouvel S., et al., 2017, MNRAS, 469, 2771
  • Kinney et al. (1996) Kinney A. L., Calzetti D., Bohlin R. C., McQuade K., Storchi-Bergmann T., Schmitt H. R., 1996, ApJ, 467, 38
  • Koekemoer et al. (2007) Koekemoer A. M., et al., 2007, ApJS, 172, 196
  • Laigle et al. (2016) Laigle C., et al., 2016, ApJS, 224, 24
  • Laigle et al. (2018) Laigle C., et al., 2018, MNRAS, 474, 5437
  • Laureijs et al. (2011) Laureijs R., et al., 2011, preprint (arXiv:1110.3193)
  • Le Fèvre et al. (2003) Le Fèvre O., et al., 2003, Proc. SPIE, 4841, 1670
  • Leistedt & Hogg (2017) Leistedt B., Hogg D. W., 2017, ApJ, 838, 5
  • Lilly et al. (2007) Lilly S. J., et al., 2007, ApJS, 172, 70
  • Lilly et al. (2009) Lilly S. J., et al., 2009, ApJS, 184, 218
  • Martí et al. (2014) Martí P., Miquel R., Castander F. J., Gaztanaga E., Eriksen M., Sanchez C., 2014, MNRAS, 442, 92
  • Miyazaki et al. (2002) Miyazaki S., et al., 2002, PASJ, 54, 833
  • Padilla et al. (2019) Padilla C., et al., 2019, AJ, 157, 246
  • Polletta et al. (2007) Polletta M., et al., 2007, ApJ, 663, 81
  • Polsterer et al. (2016) Polsterer K. L., D’Isanto A., Gieseke F., 2016, preprint (arXiv:1608.08016)
  • Raihan et al. (2020) Raihan S. F., Schrabback T., Hildebrandt H., Applegate D., Mahler G., 2020, MNRAS, 497, 1404
  • Sadeh et al. (2016) Sadeh I., Abdalla F. B., Lahav O., 2016, PASP, 128, 104502
  • Salvato et al. (2019) Salvato M., Ilbert O., Hoyle B., 2019, Nat. Astron., 3, 212
  • Schmidt et al. (2020) Schmidt S. J., et al., 2020, MNRAS, 499, 1587
  • Scodeggio et al. (2005) Scodeggio M., et al., 2005, PASP, 117, 1284
  • Scoville et al. (2007) Scoville N., et al., 2007, ApJS, 172, 1
  • Siudek et al. (2018) Siudek M., et al., 2018, preprint (arXiv:1805.09905)
  • Soo et al. (2018) Soo J. Y. H., et al., 2018, MNRAS, 475, 3613
  • Spergel et al. (2013) Spergel D., et al., 2013, preprint (arXiv:1305.5422)
  • Tanaka et al. (2018) Tanaka M., et al., 2018, PASJ, 70, S9