1 email: loredana.prisinzano@inaf.it
2 CEICO, Institute of Physics of the Czech Academy of Sciences, Na Slovance 2, 182 21 Praha 8, Czechia
3 Astrophysics Group, Keele University, Keele, Staffordshire ST5 5BG, United Kingdom
4 INAF - Osservatorio Astronomico di Capodimonte, via Moiariello 16, 80131 Napoli, Italy
Low mass young stars in the Milky Way unveiled by DBSCAN and Gaia EDR3. Mapping the star forming regions within 1.5 Kpc
Abstract
Context. With an unprecedented astrometric and photometric data precision, Gaia EDR3 gives us, for the first time, the opportunity to systematically detect and map, in the optical bands, the low mass populations of the star forming regions (SFRs) in the Milky Way.
Aims. We provide a catalogue of the Gaia EDR3 data (photometry, proper motions and parallaxes) of the young stellar objects (YSOs) identified in the Galactic Plane (|b| < 30∘) within about 1.5 kpc. The catalogue of the SFRs to which they belong is also provided, to study the properties of the very young clusters and put them in the context of the Galaxy structure.
Methods. We applied the unsupervised machine learning clustering algorithm DBSCAN to a sample of Gaia EDR3 data photometrically selected in the region where very young stars (< 10 Myr) are expected to be found, with the aim of identifying co-moving and spatially consistent stellar clusters. A subsample of 52 clusters, selected among the 7 323 found with DBSCAN, has been used as a template data set to identify very young clusters from the pattern of the observed color-absolute magnitude diagrams through a pattern match process.
Results. We find 124 440 candidate YSOs clustered in 354 SFRs and stellar clusters younger than 10 Myr and within about 1.5 kpc. In addition, 65 863 low mass members of 322 stellar clusters located within 500 pc and with ages between 10 Myr and 100 Myr were also found.
Conclusions. The selected YSOs are spatially correlated with the well known SFRs. Most of them are associated with well concentrated regions or complex structures of the Galaxy, and a substantial number of them have been recognized for the first time. The massive SFRs, such as Orion, Sco-Cen and Vela, located within 600-700 pc, trace a very complex three-dimensional pattern, while the farthest ones seem to follow a more regular pattern along the Galactic Plane.
Key Words: methods: data analysis – stars: formation, pre-main sequence – Galaxy: open clusters and associations: general – catalogs – surveys
1 Introduction
It is by now well known that stars originate from the collapse of cold molecular clouds, and mainly form in over-dense structures and clusters usually designated as star forming regions (SFRs). During the very early phases, young stellar objects (YSOs) can be identified in the near, mid and far infrared (IR) and radio wavelengths because of the presence of the optically thick infalling envelope or circumstellar disk around the central star. In the subsequent pre-main sequence phase, they become visible also in the optical bands. However, when the final dispersal of the disk material occurs and non-accreting transition disks form, YSOs can no longer be identified in IR or radio surveys (Ercolano et al., 2021) and a complete census is possible only in the optical bands.
While a clean identification of YSOs is very hard using only optical photometry, an efficient way to systematically single out SFRs is by the identification of kinematical stellar groups having a common space motion. With an unprecedented astrometric precision and sky coverage, Gaia data offer the possibility to recognise the SFRs as common proper motion groups, at least within the Gaia observational limits.
Data from the Gaia mission are revolutionising our capability to map the youngest stellar populations of the Milky Way in the optical bands, which is one of the core science goals for an overall understanding of the Galactic components. The youngest stellar component is crucial to better characterize the Galactic thin disk and its spiral arms, and to understand its origin.
The characterisation of individual SFRs and of their dynamics is also fundamental to understand the local formation, evolution and dispersion of star clusters, as well as the star formation history and the Initial Mass Function. Finally, statistical studies of YSOs during the early years of their formation, when the proto-planetary discs are evolving and planets form, are crucial to shed light on planet formation theory.
With more than 1.3 billion stars with precise proper motions and astrometric (positions and parallaxes) and photometric measurements, Gaia DR2 data allowed several studies aimed at identifying clustered populations of the Milky Way. Some of these studies have been dedicated to SFRs, associations and moving groups. Zari et al. (2018) presented an analysis of the clustered and diffuse young populations within 500 pc, using a combination of photometric and astrometric criteria. Analogously, Kerr et al. (2021) studied the solar neighbourhood by applying the HDBSCAN clustering algorithm (McInnes et al., 2017). They found 27 young groups, associations and significant substructures, associated with known clusters and SFRs, and released a catalogue including Gaia DR2 YSOs within 333 pc.
Cantat-Gaudin et al. (2018) started from a list of known clusters to assign them unsupervised membership and parameters. Other studies have been dedicated to systematically find open clusters in the Galaxy. Castro-Ginard et al. (2018) used the DBSCAN algorithm (Ester et al., 1996) to select a list of candidate open clusters (OC) which they then refined to identify real OCs with a well defined main sequence (MS). Other papers have been recently published to both discover new open clusters and derive their parameters (e.g. Cantat-Gaudin & Anders, 2020; Cantat-Gaudin et al., 2020; Castro-Ginard et al., 2020; Liu & Pang, 2019).
A recent attempt to find Galactic Plane (GP) clustered populations, including SFRs, has been made by Kounkel & Covey (2019) and Kounkel et al. (2020), again using Gaia DR2 data and the HDBSCAN unsupervised algorithm in 5D space (, , , , ). In these works, the first limited to 1 Kpc and the second to 3 Kpc, they found clustered populations but also associations, moving groups and string-like structures, parallel to the GP, spanning hundreds of parsec in length. Clusters aged between 10 Myr and 1 Gyr have been found, with an onion-like approach, i.e. using the entire catalogue with different cutoffs in parallax and progressively merging the different catalogues.
A different approach has been adopted by Bica et al. (2019) who used infrared (IR) data from 2MASS, WISE, VVV, Spitzer, and Herschel surveys to compile a catalog of 10 978 Galactic star clusters, and associations, including 4 234 embedded clusters.
With the advent of Gaia Early Data Release 3 (EDR3), based on 34 months of observations (Gaia DR2 data were based on 22 months of observations), the available photometric and astrometric measurements improved significantly. In particular, photometric improvements have been made in the calibration models, in the different photometric systems and in the treatment of the BP and RP local background flux (Riello et al., 2021).
In this paper, we use Gaia EDR3 data to systematically identify the low mass component of SFRs in the Galaxy, with ages younger than approximately 10 Myr and within a distance limit of 1.5 kpc, imposed by our data selection. We focus our analysis on very young clusters, by exploiting the significant progress achieved with Gaia EDR3 data. A full exploitation of the Gaia data and the results presented here would require further data, such as spectroscopic determinations of individual stellar parameters (effective temperatures, gravities and stellar luminosities, as well as rotational and radial velocities), crucial to derive masses, ages and 3D space velocities. Even though the results presented here cannot be used at this stage to determine the IMF, star formation history and 3D kinematics of the SFRs, they can serve to trace the very young Galactic stellar component within 1.5-2 kpc through a systematic method that homogeneously identifies the bulk population of the SFRs. Such results can be used both for statistical and for individual detailed analyses. The paper is organised as follows: we describe in Sect. 2 the requirements adopted to select the Gaia EDR3 data and in Sect. 3 the photometric selection applied to obtain the starting sample of the YSO candidates. In Sect. 4 we describe the method adopted to identify SFRs and stellar clusters, the criteria adopted to validate them and the age classification. Our results and the discussion are presented in Sect. 5 and 6, respectively; finally, our summary and conclusions are presented in Sect. 7. In Appendix A we show the effects of the reddening in the Gaia color-absolute magnitude diagrams, in Appendix B we estimate the effect of multiplicity in the selection of the YSOs, while in Appendix C we describe the comparison of specific regions with the literature.
2 Gaia data
In this analysis, we use the Gaia EDR3 data (Gaia Collaboration et al., 2016, 2021) which provide precise astrometry and kinematics (, , , , ) as well as excellent photometry in three broad bands (G, G_BP, G_RP). Since our analysis is focused on the Galactic midplane, where most of the YSOs are expected to be found, we selected sources within |b| < 30∘. We limit our selection to magnitudes between 7.5 and 20.5. The bright limit of 7.5 has been chosen to discard objects with magnitudes derived from saturated CCD images, while 20.5 is the faint limit chosen to include most of the objects with magnitude uncertainties lower than 0.2 mag. This range includes the young low mass populations () of the known SFRs within the distance set by the limiting magnitude. In addition, we considered only positive parallax values. This choice does not introduce any bias since we do not expect to investigate stars with very small parallaxes that could have negative values (Luri et al., 2018). Finally, we imposed a relative parallax error lower than 20%, to discard stars with a poorly constrained distance, and, to take into account the Gaia EDR3 systematics, we also considered the renormalized unit weight error (RUWE; Lindegren et al., 2021b), expected to be for sources where the single-star model provides a good fit to the astrometric observations.
To summarise, the data of interest were selected through the Astronomical Data Query Language (ADQL) interface of the ESA Gaia Archive (https://gea.esac.esa.int/archive/) using the following restrictions:
(1) |
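The restrictions described in the text can be summarised as a simple row filter. The following is a minimal sketch in Python, not the authors' actual ADQL query: the dictionary keys mirror Gaia archive column names (`b`, `phot_g_mean_mag`, `parallax`, `parallax_over_error`, `ruwe`), the magnitude limits are assumed to apply to the G band, and the RUWE threshold of 1.4 is assumed from the restriction quoted in Sect. 3.

```python
def passes_basic_cuts(src, b_max=30.0, g_min=7.5, g_max=20.5,
                      max_rel_plx_err=0.2, ruwe_max=1.4):
    """Return True if a source (a dict of Gaia-like columns) passes all cuts.

    All thresholds are taken from the text; the band (G) and the RUWE
    limit are assumptions of this sketch.
    """
    if abs(src["b"]) >= b_max:                      # Galactic latitude cut
        return False
    if not (g_min < src["phot_g_mean_mag"] < g_max):  # saturation / faint limit
        return False
    if src["parallax"] <= 0.0:                      # positive parallaxes only
        return False
    # A relative parallax error below 20% is equivalent to
    # parallax_over_error > 5.
    if src["parallax_over_error"] <= 1.0 / max_rel_plx_err:
        return False
    return src["ruwe"] < ruwe_max                   # well-behaved astrometry


good = {"b": -12.0, "phot_g_mean_mag": 15.2, "parallax": 2.5,
        "parallax_over_error": 20.0, "ruwe": 1.05}
high_latitude = dict(good, b=45.0)
poor_parallax = dict(good, parallax_over_error=3.0)
```

In this toy usage, `good` passes every cut, while the other two sources fail the latitude and the parallax precision cut, respectively.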
We also included in the query a photometric condition aimed at including the Pre-Main Sequence (PMS) region of the vs. color-absolute magnitude diagram (CAMD) where all very young stars (t < 10 Myr) are expected to be found. We split our selection into two samples, namely bright and faint, according to the following criteria:
(2) |
(3) |
These limits are drawn as solid blue and green lines in Fig. 1. We note that in this work, for the reddening uncorrected absolute magnitudes, we adopted the definition , based on the inverted Gaia EDR3 parallaxes, since, as shown in Piecka & Paunzen (2021), within 2 kpc, the inverse-parallax method gives results comparable to distances derived by the Bayesian approach (Bailer-Jones et al., 2021).
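As an illustration of the inverse-parallax approach mentioned above, the sketch below computes a reddening-uncorrected absolute magnitude from an apparent magnitude and a parallax in milliarcseconds. It uses the standard distance-modulus relation; the function name and the choice of units are assumptions of this example, and the exact form of the definition adopted in the paper is not reproduced here.

```python
import math


def absolute_magnitude(app_mag, parallax_mas):
    """Absolute magnitude from the inverted parallax (no reddening correction).

    With d = 1000 / parallax_mas in pc, M = m - 5 log10(d) + 5,
    which simplifies to M = m + 5 log10(parallax_mas) - 10.
    """
    if parallax_mas <= 0.0:
        raise ValueError("the inverse-parallax distance needs a positive parallax")
    return app_mag + 5.0 * math.log10(parallax_mas) - 10.0


# A star with m = 10 mag at 100 pc (parallax of 10 mas)
example_abs_mag = absolute_magnitude(10.0, 10.0)
```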
The minimum value was set to avoid the upper region of the color-absolute magnitude diagram, where the overlap of the Upper MS or PMS stars of the SFRs with giants, MS or Turn-off stars, is expected to be very high, especially if the reddening is not corrected. This implies a cut of the massive population of the SFRs but it does not represent an issue for our investigation since we are mainly interested in the rich low mass component of these populations.
In order to further reduce the fraction of contaminants we also used the condition , that is the minimum expected unreddened color for low mass (M < 1.2 M⊙) PMS (age 10 Myr) stars.
Our photometric selection as well as the subsequent analysis are based on the G − G_RP colors. This choice allows us to avoid the use of the G_BP magnitudes, which at faint magnitudes (around 20 mag and fainter) are strongly affected by the application of the minimum flux threshold, which overestimates the mean BP flux. This issue also affects the RP flux, but with a considerably lower effect in G_RP than in G_BP (Riello et al., 2021).
Once the data had been retrieved from the ESA Gaia Archive, parallax values were corrected for the zero point bias reported in Lindegren et al. (2021a), which is a function of source magnitude, colour, and celestial position, using the Python code available to the community (https://gitlab.com/icc-ub/public/gaiadr3_zeropoint).
In addition, we performed a further data filtering by considering only objects with error in the smaller than 0.14 mag. Standard errors in the magnitudes were computed by propagating the flux errors with the formulas:
(4) |
(5) |
(6) |
where , and are the mean fluxes in the G, G_BP and G_RP bands, respectively, and , and , are the Gaia EDR3 zero point uncertainties (see https://www.cosmos.esa.int/web/gaia/edr3-passbands for further details).
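A minimal sketch of this kind of error propagation is given below. It assumes the usual first-order relation between flux and magnitude errors, with the zero point uncertainty added in quadrature; the function names are hypothetical, the zero point uncertainty is left as an explicit argument rather than hard-coded, and combining two magnitude errors in quadrature to obtain a color error is an assumption of this example.

```python
import math

# Conversion factor between relative flux error and magnitude error
FLUX_TO_MAG = 2.5 / math.log(10.0)   # about 1.0857


def magnitude_error(flux, flux_error, zero_point_error):
    """Standard error of a magnitude from the mean flux and its error."""
    photon_term = FLUX_TO_MAG * flux_error / flux
    return math.hypot(photon_term, zero_point_error)


def color_error(mag_error_1, mag_error_2):
    """Error of a color, assuming independent errors in the two bands."""
    return math.hypot(mag_error_1, mag_error_2)


# A 1% flux error with no zero point term gives about 0.0109 mag
sigma_example = magnitude_error(1000.0, 10.0, 0.0)
```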
3 Photometric selection of the input sample
In this section, we describe and discuss how we performed the final photometric selection of the sample used as input for the subsequent clustering analysis, that is based on the astrometric and kinematic Gaia EDR3 parameters, as will be described in Sec. 4.
Considering the typical complexity of the environment of young stars and the dependence of the reddening law on the stellar effective temperature, due to the large spectral range covered by the Gaia bands (Anders et al., 2019), we do not attempt to correct colors and magnitudes for reddening and absorption but use their observed values. This is certainly one of the main sources of contamination by older stars, which will be overcome as discussed later.
Our goal is to start from a complete sample, including all potential YSOs with ages < 10 Myr, at least in the photometric range set as described in Sect. 2. In particular, we selected the objects with falling on the red side of the solar metallicity 10 Myr isochrone computed using the PISA models (Dell’Omodarme et al., 2012; Randich et al., 2018; Tognelli et al., 2018, 2020), in the vs. diagram shown in Fig. 1. To check if the selected photometric limit is compliant with our requirements, we compared it with the reddening uncorrected CAMD of some SFRs and young clusters for which membership has been recently derived by Jackson et al. (2022), based on the 3D kinematics of the spectroscopic targets. We find that the adopted 10 Myr isochrone delimits the PMS region of clusters, such as NGC 2264, Lambda Ori, Lambda Ori B35 and Rho Ophiuchi, that are in the age range (< 10 Myr) of our main interest. However, members of 20 Myr old clusters, such as Gamma Velorum, also fall completely in the selected photometric region, while members of 50 Myr old clusters, such as NGC 2451b, fall partially in the selected photometric region at . Going to clusters with ages Myr the overlapping region occurs at fainter magnitudes.

Since the adopted isochrone is limited to 0.1 M⊙, corresponding to =10.7, the photometric limit at fainter magnitudes has been obtained by linear extrapolation. To check the position of this extrapolation, we compared it with the empirical sequence by Pecaut & Mamajek (2013), for which mean stellar colors and effective temperatures are given down to M and L spectral types and which can be used as an upper limit to the region we are interested in. Our photometric limit approaches this sequence and crosses it at . This ensures that our photometric selection is inclusive close to the MS at the lowest mass tail. In fact, even though this implies the inclusion of stars older than 10 Myr, it avoids a bias against the selection of very young stars.
We note that for the photometric selection, the minimum and maximum associated with each observed star have been computed by considering the 1σ parallax uncertainties, which dominate over the magnitude uncertainties. The photometric selection with respect to the reference isochrone has been performed by considering the compatibility of the magnitudes with respect to their minimum and maximum values, i.e. stars were selected if either their minimum or maximum value lies inside the selection region. At the end of this selection we are left with a catalogue of 18 057 300 Gaia EDR3 entries.
Performing a photometric selection as inclusive as possible, as we have done, implies the introduction of a significant contamination by old field or open cluster stars, mainly due to the uncorrected reddening, binarity or to the overlapping photometric region, at the low mass range, where the sensitivity of the colors in distinguishing PMS or MS stars becomes very low.
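The compatibility test described above can be sketched as follows. This is only an illustration under stated assumptions: absolute magnitudes are computed from the inverted parallax in milliarcseconds, and `in_region` is a hypothetical user-supplied predicate standing in for the actual selection region, whose boundaries are not reproduced here.

```python
import math


def abs_mag_bounds(app_mag, parallax_mas, parallax_err_mas):
    """Brightest and faintest absolute magnitude allowed by parallax +/- 1 sigma."""
    low_plx = parallax_mas - parallax_err_mas
    high_plx = parallax_mas + parallax_err_mas
    if low_plx <= 0.0:
        raise ValueError("parallax minus its error must stay positive")
    # A smaller parallax means a larger distance, hence a brighter star
    brightest = app_mag + 5.0 * math.log10(low_plx) - 10.0
    faintest = app_mag + 5.0 * math.log10(high_plx) - 10.0
    return brightest, faintest


def is_photometric_candidate(app_mag, color, parallax_mas, parallax_err_mas,
                             in_region):
    """Keep the star if either extreme absolute magnitude lies in the region."""
    brightest, faintest = abs_mag_bounds(app_mag, parallax_mas, parallax_err_mas)
    return in_region(color, brightest) or in_region(color, faintest)


# Toy region used only for this example: everything brighter than M = 5
def toy_region(color, abs_mag):
    return abs_mag < 5.0


bounds = abs_mag_bounds(10.0, 10.0, 1.0)
selected = is_photometric_candidate(10.0, 1.2, 10.0, 1.0, toy_region)
```

Here the nominal absolute magnitude is exactly 5, so the star is kept only because its brightest allowed value falls inside the toy region.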
However, the contamination by field stars does not represent a strong issue for our clustering analysis, since they are not expected to share similar astrometric and kinematic properties. In addition, since we aim at investigating the low mass component of the SFRs, that is also the most dominant (% Lada, 2006), the statistical contrast with respect to field contaminants is expected to be favorable to detect them.
A trickier effect of our inclusive photometric selection is that clusters older than 10 Myr can also partially fall in the selected region and be recognized as candidate clusters in the subsequent analysis. As shown in Fig. 1, at faint absolute magnitudes (), the model-computed isochrones are not very sensitive to stellar ages, and tend to overlap, especially in the vs. diagram. In addition, spectral synthesis of M dwarf stars suffers from the limited accuracy of the adopted atmosphere models and/or from incomplete molecular data. The model-predicted colors of very low mass stars are therefore uncertain. A further complication is the observed discrepancy between radii and colors of low mass stars, likely due to the distorting effects of magnetic activity and starspots on the structure of active stars (Somers et al., 2020; Franciosini et al., 2021). All these effects cause a spread of the low mass MS and can bring magnitudes and colors of Myr old stars to fall in the region selected by us as compatible with stars with Myr. For all these reasons, as discussed for example in Jeffries et al. (2017), the ages judged from "standard" isochrones are almost certainly underestimated due to a systematic bias.
At faint magnitudes, the fraction of old cluster members falling in the adopted photometric region decreases with cluster age. Hence clusters of about 20-30 Myr will be almost completely included in our selected sample, while at the age of 100-500 Myr only the low mass tail will be included. However, because of the adopted photometric limit, the low mass tails will be included only for relatively close clusters (within 500 pc).
As already mentioned before, a partial contamination by old cluster members in our photometric sample can occur also for bright stars () if their reddening or a binary status gives them observed magnitudes and colors compatible with the selected photometric region. As shown in the Appendix A, the effects of using colors and magnitudes uncorrected for reddening are expected to be more severe for reddened stars with spectral types earlier than G, in comparison with later spectral types, in the sense that the selected sample is expected to be contaminated mainly by these objects, falling in the brightest part of the photometric region adopted in this work. The implications of this contingency will be discussed in the following sections.
Finally, we also considered the possible effects due to the metallicity on the selection by considering 10 Myr isochrones for metallicity lower or higher than solar. The comparison shows that while YSOs with over solar (Z=0.020, [Fe/H]=0.2) or sub solar (Z=0.005, [Fe/H]=-0.45) metallicities would fall in the selected photometric region, very metal poor YSOs (Z=0.001, [Fe/H]=-1.10) would remain outside. However, as recently found by Spina et al. (2017) at Galactocentric radii from 6.5 kpc to 8.70 kpc, young open clusters and SFRs have close-to-solar or slightly sub-solar metallicities and therefore we conclude that no SFRs are expected to be missed for metallicity effects with our photometric assumptions.
Based on the adopted photometric selection, our data set encompasses essentially all YSOs with age Myr and observed , including the most reddened () that can be detected with Gaia. Even YSOs with accretion (e.g. Gullbring et al., 1998) or seen in scattered light (Bonito et al., 2013) or flares in M-type stars (e.g. Mitra-Kraev et al., 2005) are expected to be included in our sample. In fact, these phenomena affect the or the colors, causing the stellar colors to become bluer than the photospheric ones, while, on the contrary, their effect on the colors goes in the same direction as the reddening, causing these latter colors to become redder.
We stress, however, that the constraint , adopted to strongly reduce the contamination due to reddened Turn-off or MS stars, makes the selected photometric sample incomplete for the massive stellar component of the SFRs. A further expected missing stellar component is that of binary systems of the clusters, due to the restriction of the Gaia data to RUWE < 1.4 (see Appendix B). In addition, since the available data do not allow us to obtain reliable corrections for the reddening affecting colors and magnitudes of the selected YSOs, accurate stellar parameters such as individual stellar ages and masses will not be derived in the subsequent analysis. However, even though the results we aim to achieve are not suitable for investigations based on complete young populations or accurate stellar parameters, they are expected to trace the dominant component of the SFRs, i.e. their low mass population, and will be crucial to have an overall systematic view of the Galactic SFRs located within 1-2 kpc from the Sun, as well as for detailed individual or statistical investigations of these YSOs.
4 Method
4.1 Clustering with DBSCAN
This section describes the methodology used to search for candidate clusters with an unsupervised algorithm as overdensities in the five-dimensional (5D) Gaia EDR3 astrometric and kinematic parameters (, , , , ).
Starting from the data set selected as described in Sect. 3, we performed a clustering analysis using the DBSCAN code (Ester et al., 1996), within the scikit-learn machine learning package in Python. First of all, we prepared a grid of 5∘ × 5∘ boxes, covering the entire range of the Galactic longitudes and for |b| < 30∘. In this step, we took into account the discontinuity at ∘. To bring variables with different dimensions to comparable values, the five parameters (, , , , ) within each box were first re-scaled using the RobustScaler Python code, which is based on statistics that are robust to outliers, according to the interquartile range.
The DBSCAN algorithm requires only two input parameters (ε, minPts). It identifies candidate clusters as overdensities in a multi-dimensional space (5D in our case) in which the number of sources exceeds the required minimum number of points, minPts, within a neighborhood of a particular linking length, ε, for all the five parameters, using a statistical distance, assumed to be Euclidean. DBSCAN does not require the number of clusters to be known a priori and it is able to detect arbitrarily shaped clusters. This is crucial for our analysis, aimed at finding SFRs that can be characterised by circular, elongated or asymmetric shapes, reminiscent of the native molecular clouds. In order to determine the best input parameters (ε, minPts) for DBSCAN, we experimented with several values in the direction of well known SFRs and we noted that in the same direction more than one combination of the two parameters is needed to reveal different real clusters located at different distances. This is due to the fact that close candidate clusters, such as associations and co-moving groups, can appear spatially (in l and b) sparse while they are definitely clustered in distance and proper motions, while in the same direction it is possible to identify distant, but spatially concentrated candidate clusters. In the two cases the choice of two different values rather than a single one is required to detect these kinds of clusters.
Based on this preliminary empirical analysis, we decided to run the DBSCAN code over the entire Galactic Plane, adopting a total of 900 combinations of (ε, minPts) values, with ε ranging from 0.1 to 9 in steps of 0.1 and minPts ranging from 5 to 50 in steps of 5. In addition, to account for candidate clusters falling at the borders of the defined boxes, we defined another 4 sets of grids, by shifting the original boxes by [0.1∘, 0.2∘, 0.3∘, 0.4∘] in both Galactic longitude and latitude. In the following we will refer to the 5 sets of grids as spatial configurations. In the end, we ran DBSCAN within a total of 360/5 × 60/5 × 5 = 4320 different boxes with 900 combinations of parameter sets (ε, minPts).
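The re-scaling and clustering step for a single box can be sketched with scikit-learn as follows. This is a minimal illustration rather than the authors' pipeline: the helper name `cluster_box` and the toy data are hypothetical, and the five columns simply stand for the five astrometric and kinematic parameters of the sources in one box.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import RobustScaler


def cluster_box(features, eps, min_samples):
    """Rescale the 5D features of one box and run DBSCAN on them.

    RobustScaler removes the median and divides by the interquartile
    range of each column; DBSCAN labels noise points with -1.
    """
    scaled = RobustScaler().fit_transform(features)
    model = DBSCAN(eps=eps, min_samples=min_samples, metric="euclidean")
    return model.fit_predict(scaled)


# Toy box: 30 tightly packed sources plus 30 widely spaced field sources
compact_group = np.array([[0.001 * i] * 5 for i in range(30)])
field_sources = np.array([[10.0 * j] * 5 for j in range(1, 31)])
toy_box = np.vstack([compact_group, field_sources])

labels = cluster_box(toy_box, eps=0.1, min_samples=5)
```

In this toy case the compact group is recovered as a single cluster, while the field sources, which have no neighbours within the linking length after re-scaling, are flagged as noise.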
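The size of the parameter grid and the number of boxes quoted above can be checked with a few lines of Python. The variable names are illustrative only; the ranges and steps are those given in the text.

```python
# Linking lengths from 0.1 to 9.0 in steps of 0.1 (90 values)
eps_values = [round(0.1 * k, 1) for k in range(1, 91)]

# Minimum number of points from 5 to 50 in steps of 5 (10 values)
min_pts_values = list(range(5, 51, 5))

# Every (eps, minPts) pair tried in each box
param_grid = [(eps, min_pts) for eps in eps_values for min_pts in min_pts_values]

# 5-degree boxes over 360 degrees of longitude and 60 degrees of latitude,
# repeated for the 5 spatial configurations
n_boxes = (360 // 5) * (60 // 5) * 5

# Total number of DBSCAN runs implied by these numbers
n_runs = n_boxes * len(param_grid)
```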
4.2 Candidate cluster validation
One of the most challenging phases of this analysis has been the validation of the recognized candidate clusters. In fact, DBSCAN is an unsupervised density-based algorithm and, as a consequence, it picks up not only overdensities which correspond to real OCs, but also overdensities only in statistical terms. For this reason, our a posteriori validation approach has been based on the exploitation of two astrophysical constraints, based on the typical properties of the SFRs, by avoiding the introduction of strong biases.
SFRs are not characterised by well defined age sequences and they are typically observed in the Hertzsprung-Russell (HR) diagrams as an ensemble showing an apparent luminosity spread, often associated with an age spread (e.g. Palla & Stahler, 1999; Palla et al., 2005). On the other hand, such spreads have also been ascribed to complex phenomena affecting their photometry, such as variability, accretion and outflows, extinction, binarity and to our inability to quantify their contribution (Soderblom et al., 2014).
Nevertheless, SFRs are usually observed with a typical mass distribution, that can be shaped by a standard (or closely resembling) Initial Mass Function (IMF), characterised by an increasing fraction of members going towards decreasing masses, at least until masses of M⊙ (e.g. Salpeter, 1955; Scalo, 1998; Chabrier, 2003).
Since we are exploiting the excellent Gaia EDR3 results down to , within reasonable reddening values (A), with our data set, we expect to detect YSOs with spectral type down to M-type, at distance kpc. This is the case, for example, of the cluster NGC 6530, located at around 1.3 kpc, for which the low mass population down to 0.4 M⊙ has been detected at V (Prisinzano et al., 2005), roughly corresponding to our magnitude limit.
Based on these considerations, a physically recognisable candidate cluster should include its tail of low mass members. Hence, we imposed a minimum threshold of 10 objects with , which means requiring candidate clusters to have at least 10 stars with M < 0.5 M⊙, assuming the isochrone of 10 Myr from the Pisa models.
A further parameter that we considered as an indicator of reliability for the candidate cluster validation is the dispersion of the distances of each cluster. The observed total distance dispersion is a combination of the intrinsic dispersion plus the contribution due to the measurement errors. While the intrinsic dispersion does not depend on the distance, the contribution due to the measurement errors becomes dominant at large distances, since Gaia EDR3 parallaxes become much more uncertain. Thus, among the parameters used to find over-densities by DBSCAN, the observed standard deviation of the distances is the most critical parameter to be constrained for the identification of real clusters. To this aim, for the cluster validation, we constrained the maximum allowed observed dispersion. For distances up to 1 kpc, the constraint is set on the ratio between the standard deviation of the distances of the putative members and the derived mean distance for the given candidate cluster. For a valid candidate cluster the above ratio has to be . For more distant candidate clusters, we adopted the more stringent constraint that the standard deviation should be smaller than 200 pc. This limit has been chosen by considering that for NGC 2244 located at kpc, one of the most distant clusters that we detect, the distance dispersion is about 175 pc and therefore we do not expect to find real physical clusters with a distance dispersion larger than this threshold. These choices may limit our ability to detect clusters at distance kpc for which we could, in principle, detect, at the magnitude limit of our data set, the massive component of the clusters down to M⊙ regime. However, since the accuracy of Gaia EDR3 parallaxes and kinematic data beyond this limit becomes very low, we prefer to maintain our constraints at the price of limiting our analysis to smaller distances.
The adopted constraints on the distance dispersion of cluster members have proven to be very effective in rejecting a large number of (unexpected) candidate massive clusters recognized by DBSCAN, typically with more than 1000 members, located at distances kpc, that do not include M-type stars, but only earlier stars and that are characterised by very large dispersions in distance. These structures are likely those identified as strings in Kounkel & Covey (2019) and Kounkel et al. (2020). However, since we do not recognise these structures as standard clusters, any further investigation of them is beyond the scope of this work.
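The two validation constraints can be combined in a small helper such as the one sketched below. It is an illustration under stated assumptions: the function name is hypothetical, the sample standard deviation is used for the distance dispersion, and the maximum allowed ratio for nearby clusters is left as a required argument (`max_ratio`) because its value is not reproduced here.

```python
import statistics


def is_valid_candidate(distances_pc, n_low_mass, max_ratio,
                       min_low_mass=10, near_limit_pc=1000.0,
                       max_sigma_far_pc=200.0):
    """Apply the low-mass-tail and distance-dispersion constraints.

    distances_pc : distances of the putative members, in pc
    n_low_mass   : number of members in the low mass tail
    max_ratio    : maximum sigma/mean allowed for nearby clusters
                   (value to be supplied by the user)
    """
    if n_low_mass < min_low_mass:
        return False
    mean_d = statistics.fmean(distances_pc)
    sigma_d = statistics.stdev(distances_pc)
    if mean_d <= near_limit_pc:
        # Nearby clusters: constrain the relative distance dispersion
        return sigma_d / mean_d < max_ratio
    # Distant clusters: constrain the absolute distance dispersion
    return sigma_d < max_sigma_far_pc


near = [400.0, 410.0, 390.0, 405.0, 395.0]
far_and_wide = [1300.0, 1700.0, 1500.0, 1100.0, 1900.0]
```

For example, with an illustrative `max_ratio` of 0.1, the `near` sample passes when it has at least 10 low mass members, while `far_and_wide` is rejected because its dispersion exceeds 200 pc.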
The final cluster member selection has been performed only for candidate clusters that satisfy the previous constraints. As a result of our choice of the DBSCAN input parameters (see Sect. 4.1) and of the adopted spatial configurations, a given candidate cluster can be identified by adopting similar input parameters, with possible small differences in the cluster membership. In addition, a given candidate cluster can be identified in more than one box, with the same membership result, if the candidate cluster is spatially small enough to be completely identified, for a given couple of input parameters, in two or more overlapping boxes. Alternatively, it can be completely detected within one box and only partially detected in a box where the candidate cluster falls at the borders. In order to assign the most likely membership for a given cluster, we proceeded adopting the following strategy.
We first considered the candidate clusters detected within the same spatial configuration but with different set of parameters (, ). For each of the selected candidate clusters, we computed the median values of the 5 parameters (, , , , ) and then we selected all the candidate clusters that were simultaneously compatible in these 5 parameters i.e. if the two compared distributions of each parameter overlap around the median, within half of the total width. Among the compatible candidate clusters, we selected the most populated and discarded the others. This strategy allowed us to identify the most persistent candidate clusters at different scales.
In the subsequent step, we compared the candidate clusters identified in each of the five spatial configurations to select the best configuration, or likewise, the best box in which the spatial coverage of the candidate cluster is maximised. Since we can have more than one detection of the same cluster, for each member, we selected only the configuration in which it is associated to the most populated candidate cluster and that member was removed from the less populated clusters in which it was identified by DBSCAN. The peripheral members of candidate clusters covering a spatial region larger than the area of the box (5∘5∘), left out from the richest centered candidate cluster, were considered as additional candidate clusters only if they include at least 10 elements,555For this reason, our catalogue includes cases in which a single physical cluster is identified by more than one DBSCAN cluster. as also assumed in other similar works, (e.g. Castro-Ginard et al., 2018; Kerr et al., 2021). This selection strategy allowed us to include also likely members at the candidate cluster’s periphery, providing data for further investigations on the dynamics of these stellar clusters. At the end of this process, we are left with a total of 449 849 detected stars within 14 178 single candidate clusters.
Many SFRs are associated with giant molecular clouds and thus they can have a spatial extension larger than the 5∘ × 5∘ box used for our analysis. In order to merge candidate clusters belonging to the same complex, we proceeded as follows: we computed the median and the 16% and 84% percentiles of the distance and proper motion distributions. Then, we merged all neighboring clusters for which distances and proper motions were compatible within 1σ. The total number of merged clusters is 7 323.
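A minimal sketch of this merging step is given below, using a union-find structure so that chains of compatible neighbours end up in one group. It is illustrative only: the key names (`dist`, `pmra`, `pmdec`), the pre-computed list of neighbouring pairs, and the reading of "compatible within 1σ" as "the 16%-84% intervals intersect" are all assumptions.

```python
import numpy as np

def summary(values):
    """16th, 50th and 84th percentiles of one parameter."""
    lo, med, hi = np.percentile(values, [16, 50, 84])
    return lo, med, hi

def overlap(s1, s2):
    """Assumed 1-sigma compatibility: the [16%, 84%] intervals intersect."""
    return s1[0] <= s2[2] and s2[0] <= s1[2]

def merge_clusters(clusters, neighbours):
    """clusters: list of dicts holding arrays 'dist', 'pmra', 'pmdec'.
    neighbours: iterable of (i, j) index pairs of spatially adjacent clusters.
    Returns one group label per cluster."""
    keys = ("dist", "pmra", "pmdec")
    stats = [{k: summary(c[k]) for k in keys} for c in clusters]
    parent = list(range(len(clusters)))

    def find(i):
        # Follow parent links to the group root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in neighbours:
        if all(overlap(stats[i][k], stats[j][k]) for k in keys):
            parent[find(i)] = find(j)
    return [find(i) for i in range(len(clusters))]
```

Clusters that receive the same label are treated as one merged structure.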
4.3 Cluster age classification
Literature Name | Flag | Reference | l | b | r50 | d | log t | N |
---|---|---|---|---|---|---|---|---|
 | | | [deg] | [deg] | [deg] | [pc] | [yr] | |
Cl10 | 1 | Le Duigou & Knödlseder (2002) | 79.867 | -0.908 | 0.886 | 1557 | | 167 |
65.78-2.61 | 2 | Avedisova (2002) | 66.153 | -3.123 | 1.194 | 1324 | | 134 |
Rosette | 3 | Zucker et al. (2020) | 206.438 | -1.903 | 2.025 | 1571 | 7.1 | 810 |
NGC 6530 | 4 | Dias et al. (2002) | 6.060 | -1.287 | 1.020 | 1364 | | 635 |
NGC 6531 | 5 | Dias et al. (2002) | 7.585 | -0.338 | 1.634 | 1350 | 8.6 | 804 |
UBC 386 | 6 | Cantat-Gaudin & Anders (2020) | 100.562 | 8.694 | 1.147 | 1280 | 6.8 | 193 |
Ass Cyg OB 9 | 7 | Sitnik (2003) | 78.753 | 1.778 | 2.293 | 1339 | 8.1 | 616 |
Serpens South molecular cloud | 8 | Fernández-López et al. (2014) | 29.364 | 2.870 | 0.976 | 920 | | 123 |
CygOB7 CO Complex | 9 | Dutra & Bica (2002) | 92.653 | 2.529 | 0.950 | 1123 | | 46 |
BRC 27 | 10 | Rebull et al. (2013) | 224.621 | -2.244 | 3.027 | 1233 | 6.9 | 1709 |
G352.16+3.07 | 11 | Otrupcek et al. (2000) | -7.866 | 3.002 | 4.764 | 1169 | 7.0 | 2357 |
IC 1396 | 12 | Zucker et al. (2020) | 99.236 | 4.733 | 7.407 | 945 | 7.4 | 3140 |
2399 | 13 | Miville-Deschênes et al. (2017) | 33.890 | 0.643 | 2.543 | 609 | | 130 |
Chamaeleon II | 14 | Zucker et al. (2020) | -56.363 | -14.720 | 2.452 | 200 | | 41 |
Cepheus | 15 | Zucker et al. (2020) | 108.911 | 4.359 | 9.748 | 923 | 8.2 | 11445 |
NGC 7039 | 16 | Cantat-Gaudin & Anders (2020) | 88.350 | -1.717 | 5.322 | 767 | 7.3 | 1048 |
CO 14 | 17 | Yonekura et al. (1997) | 104.508 | 13.950 | 3.039 | 350 | | 124 |
Serpens | 18 | Zucker et al. (2020) | 28.783 | 3.082 | 10.166 | 455 | 7.2 | 2388 |
IC 348 | 19 | Cantat-Gaudin & Anders (2020) | 160.790 | -15.812 | 11.430 | 334 | 7.4 | 2661 |
Chamaeleon I | 20 | Zucker et al. (2020) | -62.781 | -15.444 | 3.099 | 192 | | 156 |
Taurus | 21 | Zucker et al. (2020) | 172.114 | -15.302 | 4.551 | 131 | | 112 |
Ophiuchus | 22 | Zucker et al. (2020) | -8.024 | 18.781 | 12.655 | 144 | | 2398 |
Corona Australis | 23 | Zucker et al. (2020) | -0.132 | -17.592 | 3.291 | 155 | | 107 |
G302.72+4.67 | 24 | Dutra & Bica (2002) | -57.143 | 4.739 | 5.854 | 112 | | 235 |
Pozzo 1 | 25 | Cantat-Gaudin & Anders (2020) | 261.858 | -8.321 | 13.343 | 398 | 8.3 | 6001 |
ASCC 32 | 26 | Cantat-Gaudin & Anders (2020) | 237.327 | -9.186 | 9.878 | 818 | 8.4 | 4416 |
Lac OB1 | 27 | Chen & Lee (2008) | 96.762 | -15.032 | 11.268 | 548 | 7.4 | 2367 |
RSG 8 | 28 | Cantat-Gaudin & Anders (2020) | 109.331 | -1.212 | 12.055 | 468 | 7.4 | 2900 |
NGC 2451B | 29 | Cantat-Gaudin & Anders (2020) | 253.198 | -7.499 | 9.513 | 401 | 7.6 | 2826 |
NGC 2232 | 30 | Cantat-Gaudin & Anders (2020) | 215.533 | -7.983 | 13.427 | 372 | 7.2 | 1703 |
Sco OB2 UCL | 31 | de Zeeuw et al. (1999) | -29.000 | 16.813 | 15.052 | 145 | | 1189 |
IC 2602 | 32 | Cantat-Gaudin & Anders (2020) | -70.259 | -5.011 | 6.825 | 151 | 7.6 | 315 |
NGC 2516 | 33 | Cantat-Gaudin & Anders (2020) | -86.236 | -15.931 | 6.881 | 427 | 7.6 | 1156 |
Melotte 20 | 34 | Cantat-Gaudin & Anders (2020) | 147.504 | -6.461 | 8.867 | 174 | 7.7 | 414 |
Melotte 22 | 35 | Cantat-Gaudin & Anders (2020) | 166.573 | -23.406 | 5.882 | 137 | 7.9 | 296 |
NGC 2422 | 36 | Cantat-Gaudin & Anders (2020) | 230.995 | 3.061 | 6.238 | 500 | 8.0 | 347 |
Alessi 12 | 37 | Cantat-Gaudin & Anders (2020) | 67.678 | -11.723 | 3.977 | 546 | 8.1 | 127 |
NGC 3532 | 38 | Cantat-Gaudin & Anders (2020) | -72.815 | 2.279 | 4.851 | 561 | 8.6 | 88 |
IC 6451 | 39 | Cantat-Gaudin & Anders (2020) | -19.939 | -7.821 | 1.257 | 1068 | 9.2 | 86 |
NGC 6087 | 40 | Cantat-Gaudin & Anders (2020) | -32.077 | -5.426 | 2.532 | 1007 | 8.0 | 77 |
Alessi 62 | 41 | Cantat-Gaudin & Anders (2020) | 53.676 | 8.773 | 3.561 | 622 | 8.4 | 87 |
UPK 33 | 42 | Cantat-Gaudin & Anders (2020) | 27.965 | 0.108 | 3.931 | 518 | 8.4 | 111 |
NGC 1647 | 43 | Cantat-Gaudin & Anders (2020) | 180.355 | -16.861 | 2.141 | 606 | 8.6 | 272 |
NGC 6124 | 44 | Cantat-Gaudin & Anders (2020) | -19.205 | 6.078 | 5.404 | 648 | 8.3 | 1102 |
NGC 6494 | 45 | Cantat-Gaudin & Anders (2020) | 9.714 | 2.980 | 5.537 | 755 | 8.6 | 680 |
IC 4725 | 46 | Cantat-Gaudin & Anders (2020) | 14.022 | -4.595 | 4.807 | 669 | 8.1 | 788 |
Alessi 44 | 47 | Cantat-Gaudin & Anders (2020) | 37.075 | -11.510 | 7.285 | 587 | 8.2 | 637 |
Stock 2 | 48 | Cantat-Gaudin & Anders (2020) | 133.371 | -1.160 | 8.292 | 384 | 8.6 | 727 |
NGC 2168 | 49 | Cantat-Gaudin & Anders (2020) | 186.647 | 2.327 | 2.616 | 928 | 8.2 | 118 |
DSH J2320.1+5821A | 50 | Kronberger et al. (2006) | 111.248 | -2.785 | 2.394 | 1131 | | 243 |
UPK 143 | 51 | Cantat-Gaudin & Anders (2020) | 91.810 | 0.514 | 1.752 | 934 | 8.4 | 262 |
Collinder 421 | 52 | Cantat-Gaudin & Anders (2020) | 79.429 | 2.527 | 1.061 | 1265 | 8.4 | 154 |


From a visual inspection of the photometric properties of the clusters found with this analysis, we noted that, while for most of the recognized clusters the selected members of any mass lie, as expected, in the PMS region of the CAMD, there is a fraction of clusters for which only the low mass members lie in that region. This is, for example, the case of clusters with low or moderate extinction and ages between 10 and 100 Myr, such as IC 2602, Melotte 20, NGC 2451 A and NGC 2451 B, where part of the MS or PMS low mass tail overlaps the photometric region considered here. For clusters with ages of 100-200 Myr, such as Melotte 22 (Pleiades), NGC 2422 and NGC 2516, a smaller fraction of the MS low mass tail, likely composed of reddened members, cluster binaries or PMS members, is selected.
Further reddening effects or poorly constrained magnitudes or parallaxes, can bring colors or magnitudes of members of even older clusters within the PMS photometric region considered in this work. For clusters with extinction , the MS of Myr old clusters in the range fall to the right of the unreddened 10 Myr isochrone. Thus, depending on the cluster age, binaries or reddened members of clusters with ages Myr can also fall in the selected photometric region. Since these objects share the same proper motions and are at the same distance, they are recognized as belonging to a cluster and are therefore included in our catalogue.
To distinguish SFRs from old clusters, we adopted a pattern match procedure based on the extraction of the different patterns that characterise the observed CAMDs of clusters of different ages. Among the clusters identified as described in the previous sections, we selected those listed in Table 1 (52 in total) and used them as a template data set.
In the template data set, we identified 28 clusters, shown in Fig. 2, that we used as proxies for clusters with ages t < 10 Myr. These clusters were selected since most of them show a consistent luminosity spread, typical of SFRs, starting from our brightest limit MG = 5. However, their general shape is also set by the reddening and the distance, with the observed maximum MG limit increasing as the distance decreases. All these cases have been included in the template data set to retrieve all the possible patterns observed in the CAMD due to different ages, distances, reddening and cluster richness. To each of these clusters we assigned an increasing flag from 1 to 28, meant to represent the different shapes of the observed CAMDs shown in Fig. 2.
We also identified 8 clusters as representative of the ages 10 Myr < t < 100 Myr, flagged from 29 to 36, according to the ages given in Cantat-Gaudin & Anders (2020). The observed CAMDs of these clusters are shown in Fig. 3. These clusters show an evident PMS region that is mainly populated in the range M (e.g. NGC 2451B, NGC 2232), due to our photometric selection. This region becomes thinner and thinner for older clusters such as Melotte 20 and Melotte 22. Finally, we selected 16 clusters, flagged from 37 to 52, as proxies for clusters with ages t > 100 Myr, in agreement with Cantat-Gaudin & Anders (2020). Most of these clusters have been included in the template sample to take into account the non-uniform distribution of the absolute magnitudes of their members in the observed CAMD. In fact, while for very young clusters the CAMD is uniformly populated, according to their age and the IMF, for these reddened and old clusters the population is not entirely identified. For example, the clusters with flags from 43 to 52 are characterised in the CAMD by an overdensity of members with M. Most of them are quite distant clusters (d > 500 pc) and thus very likely affected by reddening. As shown in Appendix A, the effect of the reddening in the Gaia bands depends on the stellar effective temperature (Anders et al., 2019), and it is larger for high mass stars than for low mass stars. This would explain the presence of the peak at higher masses in the observed magnitudes of the CAMD for most of these clusters. Depending on the cluster distance, part of the low mass tail is also detected, but the overall non-uniform pattern of their CAMD is different from that expected for young clusters.
Since most of the clusters show asymmetric structures, to evaluate their extension we estimated the radius containing half of the identified members, r50, as done in Cantat-Gaudin & Anders (2020).
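As an illustration, a half-member radius of this kind can be computed from the angular separations of the members from a cluster centre. The sketch below is a minimal example under stated assumptions, not the procedure used for the catalogue: the function name is hypothetical and the centre is assumed to be the mean direction of the members.

```python
import numpy as np

def r50(l_deg, b_deg):
    """Radius (deg) that contains half of the members, measured as the
    median great-circle separation from the mean direction of the members."""
    l = np.radians(np.asarray(l_deg, float))
    b = np.radians(np.asarray(b_deg, float))
    # Unit vectors on the sphere; averaging them is safe across l = 0/360.
    x, y, z = np.cos(b) * np.cos(l), np.cos(b) * np.sin(l), np.sin(b)
    cx, cy, cz = x.mean(), y.mean(), z.mean()
    norm = np.sqrt(cx**2 + cy**2 + cz**2)
    cos_sep = (x * cx + y * cy + z * cz) / norm
    sep = np.degrees(np.arccos(np.clip(cos_sep, -1.0, 1.0)))
    return float(np.median(sep))
```

Using the median separation makes the estimate insensitive to a few distant peripheral members, which is convenient for asymmetric structures.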
In our final catalogue, we also noted the presence of other photometrically unphysical aggregates, mostly including only faint stars with very red colors and a horizontal distribution in the CAMD, i.e. with nearly constant MG, likely compatible with that of giant stars. Since most of these peculiar clusters are in the direction of the Galactic Centre, we infer that they correspond to very distant giants for which the Gaia EDR3 parallaxes are systematically wrong due to the strong effects of crowding and high extinction in that direction.
To separate these aggregates from SFRs and stellar clusters, we included in our template data set a further 27 cases of these peculiar aggregates, flagged from -27 to -1, with median MG from 7.6 to 15.8, covering their observed magnitude values.
According to the known ages of the clusters of the template data set, we defined three age bins, t < 10 Myr, 10 Myr < t < 100 Myr and t > 100 Myr, including the clusters with flags in the ranges [1, 28], [29, 36] and [37, 52], respectively. Then we used a Python implementation (available at https://github.com/syrte/ndtest) of the two-dimensional version of the Kolmogorov-Smirnov (KS) test, developed by Peacock (1983) and generalised by Fasano & Franceschini (1987), to identify, for each of the 7 323 clusters, the most similar among the chosen template clusters in the CAMD, i.e. the one for which the KS statistic is minimum.
The procedure is not intended to derive any best-fitting parameter; it only aims to assign a flag to each cluster and hence the "coarse" age range to which it belongs. At the end, we selected only the 1 450 clusters with more than 20 members (corresponding to 302 730 objects), for which the KS test statistic is .
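The template matching can be illustrated with a compact, self-contained version of the two-sample, two-dimensional KS statistic in the spirit of Fasano & Franceschini (1987). This is a simplified sketch written for this example and not the ndtest code used for the catalogue: it returns only the statistic (no significance level), counts points lying exactly on a quadrant boundary in the lower or left quadrant, and the function names and the dictionary of templates are hypothetical.

```python
import numpy as np

def _quadrant_fractions(x0, y0, pts):
    """Fractions of pts in the four quadrants around the origin (x0, y0)."""
    right = pts[:, 0] > x0
    up = pts[:, 1] > y0
    counts = np.array([np.sum(right & up), np.sum(~right & up),
                       np.sum(~right & ~up), np.sum(right & ~up)])
    return counts / len(pts)

def ks2d_statistic(a, b):
    """Two-sample 2D KS statistic: largest quadrant-fraction difference,
    averaged over the two choices of sample used as origins."""
    a, b = np.asarray(a, float), np.asarray(b, float)

    def dmax(origins):
        return max(np.max(np.abs(_quadrant_fractions(x0, y0, a)
                                 - _quadrant_fractions(x0, y0, b)))
                   for x0, y0 in origins)

    return 0.5 * (dmax(a) + dmax(b))

def best_template(cluster, templates):
    """templates: dict {flag: (n, 2) array of (colour, M_G)}.
    Returns the flag of the most similar template and its statistic."""
    scores = {flag: ks2d_statistic(cluster, t) for flag, t in templates.items()}
    flag = min(scores, key=scores.get)
    return flag, scores[flag]
```

The flag returned by `best_template` then maps onto one of the coarse age bins, or onto the unphysical aggregates when it is negative.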
In conclusion, we classified 124 440 candidate YSOs that belong to 354 structures with t < 10 Myr, distributed within about 1.5 kpc. From now on we will refer to these structures as SFRs, meaning regions that can include at least one very young cluster and that consist mostly of YSOs with t < 10 Myr. In addition, we classified 65 863 low mass members of 322 stellar clusters, mainly located within 500 pc and with ages 10 Myr < t < 100 Myr, and, finally, 43 936 members of 524 clusters with t > 100 Myr. The objects that belong to photometrically unphysical aggregates are 68 491. The results are summarised in Tab. 2. From our catalogue we reject all clusters with ages t > 100 Myr, the photometrically unphysical aggregates and those that remain unclassified, since the latter are mainly poorly populated, with a CAMD that does not allow a proper classification.
Classification | # Stars | # clusters | Flag |
---|---|---|---|
t < 10 Myr | 124 440 | 354 | [1, 28] |
10 Myr < t < 100 Myr | 65 863 | 322 | [29, 36] |
t > 100 Myr | 43 936 | 524 | [37, 52] |
Phot. unphysical aggregates | 68 491 | 250 | [-27, -1] |
Unclassified | 147 119 | 5 887 | |
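
As a quick consistency check, the four classified rows of the table above add up to the 1 450 clusters and 302 730 objects retained after the KS selection. The short sketch below simply repeats that arithmetic; the dictionary keys are labels chosen for this example.

```python
# (stars, clusters) for each classified row of the table
classified = {
    "young (flags 1 to 28)": (124_440, 354),
    "intermediate (flags 29 to 36)": (65_863, 322),
    "old (flags 37 to 52)": (43_936, 524),
    "unphysical (flags -27 to -1)": (68_491, 250),
}

total_stars = sum(stars for stars, _ in classified.values())
total_clusters = sum(clusters for _, clusters in classified.values())
print(f"classified stars: {total_stars}, classified clusters: {total_clusters}")
```
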
SFRs and stellar clusters with ages t < 100 Myr are listed in the catalogue of clusters, while cluster members are given in Tab. 3. Most of the clusters listed in the catalogue are very extended complex regions including several subclusters known in the literature, merged here within single structures. Since the aim here is to detect these Galactic young structures, the literature cluster names given in the catalogue, mainly taken from Cantat-Gaudin & Anders (2020), Zucker et al. (2020) or Simbad, are only indicative of the region.
5 Results
5.1 Photometric completeness
Within the magnitude range explored in this work and assuming the restrictions on Gaia data defined in Sect. 2, the photometric cluster completeness, for clusters with t < 10 Myr, is expected to be near 100% for non-embedded YSOs, since, as shown in Fig. 1, all members detectable in this age range and in the optical bands are expected to lie in the selected photometric region.
Nevertheless, the adopted restriction introduced a bias in the selection of multiple members in the SFRs. To estimate the fraction of missed binary members with the Gaia-based selection used in this paper, we used as reference the Tau-Aur binary-star list by Kraus et al. (2012). Details about the comparison of this list with our catalogue and Gaia EDR3 data are given in Appendix B. This comparison shows that, due to the restriction, in SFRs at distances similar to Tau-Aur, we have lost about 72% of their binary populations. Assuming a binary frequency of 50% (Mathieu, 1994), a loss of 35% of PMS members can be expected. However, at large distances, the projected binary motions become smaller, and therefore we expect a less significant binary member loss for the farther-out SFRs.
For clusters with ages t > 10 Myr, the cluster completeness decreases with age and strongly depends on the cluster distance. In fact, clusters with 10 Myr < t < 100 Myr (flagged from 29 to 36) are mainly in the solar neighborhood ( pc). For these clusters, even though we are not able to detect the entire cluster population, we are able to detect part of the very low mass tail. The detected fraction of this very low mass tail decreases with age and, in fact, for clusters with t > 100 Myr (flagged from 37 to 52), mainly concentrated at pc, the completeness is very low. The latter have been discarded from our final catalogue since they include only a small fraction of the cluster members and are not in the age range of interest for this work. Clusters with ages 10 Myr < t < 100 Myr have been included in our catalogue, since the age transition to the clusters with t < 10 Myr is not sharply defined and, in addition, there are structures such as Sco OB2 that include clusters in both age ranges, which very likely belong to correlated star forming processes.
5.2 Spatial distribution






Gaia EDR3 ID | l | b | ϖ | σ(ϖ) | μα* | σ(μα*) | μδ | σ(μδ) | G | GBP | GRP | Flag | Cl. ID |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
 | [deg] | [deg] | [mas] | [mas] | [mas yr-1] | [mas yr-1] | [mas yr-1] | [mas yr-1] | [mag] | [mag] | [mag] | | |
5307477283092480512 | 276.491 | -2.559 | 1.925 | 0.071 | -13.112 | 0.093 | 8.232 | 0.079 | 17.27 | 18.56 | 16.13 | 30 | 1 |
5310444177789318784 | 275.554 | -3.171 | 1.923 | 0.265 | -12.888 | 0.288 | 8.518 | 0.327 | 19.40 | 20.85 | 18.02 | 30 | 1 |
5310482871168676352 | 276.316 | -2.671 | 1.846 | 0.036 | -14.232 | 0.044 | 8.038 | 0.039 | 16.31 | 17.24 | 15.27 | 30 | 1 |
5310525477245693696 | 275.730 | -2.877 | 1.901 | 0.030 | -12.997 | 0.034 | 8.875 | 0.029 | 15.63 | 16.57 | 14.66 | 30 | 1 |
5310539800934356736 | 275.547 | -2.870 | 2.004 | 0.163 | -13.991 | 0.195 | 9.334 | 0.171 | 18.70 | 20.58 | 17.38 | 30 | 1 |
Figure 4 shows the maps of the 124 440 YSOs associated with the 354 SFRs with ages t < 10 Myr, while Fig. 5 shows the maps of the 65 863 stars associated with the stellar clusters with ages 10 Myr < t < 100 Myr. Each map has been obtained as a two-dimensional histogram smoothed with a Gaussian kernel at 3, adopting a pixel size of 3 pc × 3 pc.
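A map of this kind can be sketched with NumPy alone, as below. The example is illustrative and rests on assumptions: the function name and arguments are hypothetical, the Gaussian width is taken to be 3 pixels (the unit of the kernel width is not specified here), and the kernel is truncated at 4 standard deviations.

```python
import numpy as np

def smoothed_map(x_pc, y_pc, extent=1500.0, pixel=3.0, sigma_pix=3.0):
    """2D histogram of positions (pc) on a square grid with `pixel`-sized
    bins, smoothed with a Gaussian of width `sigma_pix` pixels (assumed)."""
    edges = np.arange(-extent, extent + pixel, pixel)
    hist, _, _ = np.histogram2d(x_pc, y_pc, bins=[edges, edges])
    # 1D Gaussian kernel truncated at 4 sigma and normalised to unit sum.
    half = int(np.ceil(4 * sigma_pix))
    t = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (t / sigma_pix) ** 2)
    kernel /= kernel.sum()
    # A 2D Gaussian is separable: convolve along one axis, then the other.
    sm = np.apply_along_axis(np.convolve, 0, hist, kernel, mode="same")
    sm = np.apply_along_axis(np.convolve, 1, sm, kernel, mode="same")
    return sm, edges
```

Because the kernel is normalised, the smoothed map preserves the total number of stars except for those within a few pixels of the grid border.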
Most of the overdensities in Fig. 4 are associated with known SFRs, some of which are labeled in the figure. With the exception of those within 200-300 pc, all clusters present a radially elongated shape, tracing the increasing uncertainties in the distances.
The clusters with ages 10 Myr < t < 100 Myr are mainly limited within pc (see Fig. 5) and show a much more diffuse spatial distribution. Very rich clusters such as, for example, NGC 2232, NGC 2451B, Gamma Velorum, NGC 2547, NGC 2516 and Alessi 5, at distance of pc, seem to belong to a common giant complex, mostly lying in the third Galactic quadrant.
5.3 Literature comparison
In this section, we compare our results with those previously obtained in the literature for two particular regions, Sco OB2 and NGC 2264. These comparisons will be used to estimate our completeness and contamination level, at least when the completeness of the comparison sample enables us to do so. We note that we consider here each of the merged clusters as a single ensemble. A detailed subclustering analysis, with the identification of possible substructures with age gradients or kinematic subclusters, is deferred to a future paper. A detailed comparison with the literature for other SFRs is presented in Appendix C, where we also compare the whole catalogue with other all-sky catalogues, mainly derived from Gaia DR2 data.
5.3.1 The Sco-OB2 association




The Sco-Cen or Sco-OB2 association is a very extended SFR (120∘ × 60∘) quite close to the Sun ( pc) that, in recent years, has been the subject of several studies focused on its low mass population. By exploiting available all-sky surveys, these studies have finally made it possible to study the entire region and its complexity (e.g. Zari et al., 2018; Damiani et al., 2019; Kounkel & Covey, 2019; Kerr et al., 2021; Luhman, 2022).
To select the members of this region, we performed a spatial selection by considering all stars with ∘∘, ∘∘ and, as assumed in Damiani et al. (2019), distance pc. We end up with a total of 9 663 YSOs with ages t < 100 Myr, distributed as in Fig. 6. In the (l, b) plane, the pattern of the YSOs associated with Sco-OB2 is the one already found in the literature (e.g. de Zeeuw et al., 1999; Damiani et al., 2019; Kerr et al., 2021). Among the selected objects, 4 232 YSOs have been classified in the range t < 10 Myr. Of these, 2 472 are concentrated in the Upper Sco (US) region. They correspond to the youngest subpopulation of Rho Ophiuchi. Another prominent subpopulation, classified in the range 10 Myr < t < 100 Myr (flag 31), includes 3 741 YSOs falling in the Upper Centaurus-Lupus (UCL) and Lower Centaurus-Crux (LCC) regions. It therefore represents the first generation of stars of the Sco OB2 region, in agreement with recent results (e.g. Damiani et al., 2019; Luhman, 2022).
Proper motions, parallaxes and the CAMDs of the different subpopulations detected in the Sco OB2 association are shown in Fig. 7. The proper motions of the YSOs associated to Sco-OB2 show a quite complicated pattern, confirming the complex kinematic structure of this association. The values of parallaxes of YSOs in Sco-OB2 are mostly enclosed between mas and mas, corresponding to a mean distance of 152 pc and standard deviation pc. Finally, in the CAMD, we can recognise a usual distribution of YSOs in the PMS region. As already noted, the census of stars that belong to the first generation of stars in the Sco OB2 association is likely incomplete since it is expected to lie in the region of the CAMD that has not been considered in this work.
To estimate the completeness level of our census, we compared our list of Sco-OB2 YSOs with those recently published by Damiani et al. (2019) and Kerr et al. (2021), based on Gaia DR2 data, and by Luhman (2022), based on Gaia EDR3 data. To perform these comparisons we used the Gaia identification number of the YSOs in our catalogue, retrieved as described in Sect. C.4. We find that the YSOs in common with the Damiani et al. (2019) catalogue, which includes a total of 10 839 members, are 6 492, i.e. about 60% of the Damiani et al. (2019) list. Among the 9 663 YSOs selected by us in the Sco-OB2 association, those falling in the spatial region and magnitude range covered by Damiani et al. (2019) are 7 553. Therefore, the objects in common are 86% of our sample in the same field. Many of the remaining 1 061 YSOs (14%) not selected by Damiani et al. (2019) show a spatial distribution consistent with that of the other members. We thus discard the hypothesis that they are contaminants, and suggest that they are likely YSOs missed by Damiani et al. (2019), whose selection is based on the less complete Gaia DR2 catalogue.
Adopting the same spatial constraints, we retrieved in the Sco-OB2 region 9 083 objects selected as candidate YSOs in the Kerr et al. (2021) catalogue, independently of their clustering classification. Among these, 5 203 are in common with our catalogue, but those classified as YSOs are 5 109, i.e. about 56% of the Kerr et al. (2021) sample. Using the Gaia DR2 number, we also cross-matched the Kerr et al. (2021) and Damiani et al. (2019) lists and found 6 423 objects in common.
The Luhman (2022) catalogue includes a total of 10 509 YSOs but to be consistent with our selection, we selected those with , 7.520.5, that are in total 7 925. Using the Gaia EDR3 id, we found that the YSOs in common with our catalogue are 6 341 representing 80% of the Luhman (2022) catalogue and 85.6% of our list of YSOs in the Sco Cen, by considering the 7 408 counterparts falling in the region covered by Luhman (2022).
These percentages cannot be strictly used to estimate our level of completeness or contamination, since the catalogues have been obtained starting from different initial constraints, both for the photometric and the astrometric selection, which inevitably introduce several biases. However, these comparisons are useful to confirm the membership of 85% of the selected members. The remaining 1 067 objects selected by us as YSOs but not retrieved by Luhman (2022) show a spatial distribution consistent with that of the other members, with two strong concentrations in the US region and around V 1062 Sco. We therefore conclude that they are genuine members rather than contaminants, likely missed by Luhman (2022) in the photometric selection based on the colors.
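The fractions quoted in the Sco-OB2 comparisons follow directly from the counts given in the text, as the short sketch below reproduces. The variable names are labels chosen for this example.

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole

# Counts quoted in the text
common_d19, total_d19, ours_in_d19_field = 6492, 10839, 7553   # Damiani et al. (2019)
common_l22, total_l22, ours_in_l22_field = 6341, 7925, 7408    # Luhman (2022)

print(f"Damiani et al. (2019): {pct(common_d19, total_d19):.1f}% of their list, "
      f"{pct(common_d19, ours_in_d19_field):.1f}% of ours, "
      f"{ours_in_d19_field - common_d19} unmatched")
print(f"Luhman (2022): {pct(common_l22, total_l22):.1f}% of their list, "
      f"{pct(common_l22, ours_in_l22_field):.1f}% of ours, "
      f"{ours_in_l22_field - common_l22} unmatched")
```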
5.3.2 The Monoceros OB1/NGC 2264 complex and the Rosette Nebula




Another well studied region that we used to test our results is the cluster NGC 2264 in the Monoceros OB1 complex. This relatively compact and close ( pc from the Sun) SFR, devoid of background and foreground emission, has been the subject of many detailed studies, including, for example, X-ray observations (Flaccomio et al., 2006), optical and near-IR analysis of its low mass population (Venuti et al., 2019), and coordinated synoptic investigations with optical and infrared light curves from CoRoT and Spitzer (Cody et al., 2014). Flaccomio et al. (2022, in preparation) compiled the most complete data set of NGC 2264, based both on all-sky surveys (Gaia EDR3, 2MASS, VPHAS) and on dedicated observations falling in the region 98.93∘∘ and 8.45∘∘. The young structure identified by us in this field includes a total of 1 916 YSOs, but only 1 062 of them (about 55%) fall in the region investigated by Flaccomio et al. (2022, in preparation). The remaining YSOs are in part (404 YSOs) concentrated in the region corresponding to the cluster IC 446, while a further, previously unknown group of 450 YSOs is sparsely distributed in the region south of NGC 2264. As shown in Fig. 8, a sub-group of the latter forms a visual bridge along a filamentary structure, clearly visible in the IR IRIS image, down to the location of the more distant Rosette Nebula, located at 1.5 Kpc and hosting the SFR NGC 2244. Thus, our finding is that the known cluster NGC 2264 actually belongs to a structure larger than the 2∘∘ region typically considered in the literature for this SFR. The mean distance of the YSOs associated with the complex NGC 2264-IC 446 is 731.86 ± 95.5 pc, even though the proper motion distributions of the three subgroups suggest that they share similar but not equal values. A detailed kinematic analysis of these sub-regions is beyond the aims of this work.
In the same region, we identified a further 5 sub-structures with distance Kpc (this limit has been adopted to avoid the Orion sub-structures), the most populated being the cluster in the CMa OB1 association, centered around RA=106.3∘, Dec=-11.47∘, at a distance of 1250 ± 162 pc, associated with the Reflection Nebula IC 2177 and including 1709 YSOs. In addition, we identified the cluster NGC 2244, including 810 YSOs, centered at RA=98.3∘, Dec=4.9∘, at a distance of 1580 ± 199 pc, and the cluster associated with Mon R2, at a distance of 897 ± 112 pc, including 1272 YSOs. Finally, we detected the cluster indicated in Cantat-Gaudin & Anders (2020) as UPK 436, with 620 members, and a minor sparse cluster in the region of CMa OB1 located at 807 pc.
Figure 9 shows PM, parallaxes and CAMD of all these substructures, where it is clearly visible that they are spatially and kinematically uncorrelated, while in the PMS region of the CAMD they are indistinguishable since they consist of similar age stars.
The membership defined in Flaccomio et al. (2022) includes two levels of confidence. The first (sample C) is based on the combination of several criteria derived from dedicated X-ray, spectroscopic and IR observations; it includes 2263 confirmed members and its fraction of false positives is negligible. The second (sample C-Wide) is based exclusively on all-sky surveys and includes 1542 YSOs; its membership has been deduced from a smaller number of criteria and thus the number of false positives is expected to be higher. We find that the YSOs of sample C (sample C-Wide) in common with our list of YSOs in the NGC 2264 region are 972 (960), corresponding to a fraction of 43% (62%) with respect to the Flaccomio sample. These fractions are considered here as indicators of our level of completeness for the entire SFR population. However, these results are strongly conditioned by the starting photometric selection () and by the restrictions on the Gaia EDR3 data that we adopted in this work. In addition, 497 of the 2263 confirmed members of the Flaccomio et al. sample C do not have a Gaia counterpart.
To estimate the efficiency of our method to recover YSOs, we considered the members selected by Flaccomio et al. with a Gaia counterpart, falling in the photometric region considered in this work, and compliant with our initial data restrictions (i.e. and parallax relative error ). Adopting this sample, the fraction of the YSOs selected by us in common with the Flaccomio et al. membership is 95%-96%, considering both the samples C and C-Wide. We note that this is the efficiency of our clustering method but is not the efficiency of the Gaia data. In fact, if for the two lists, we consider the members falling in the same photometric region but we do not consider the restrictions in and in the parallax error, the fraction of YSOs in common is 72% for sample C and 77% for sample C-wide. This suggests that 23%-28% of genuine YSOs are missed by us due to remaining issues in the Gaia data, at least in the current Gaia EDR3 release.
Finally, we find that among the 1 052 YSOs selected by us in the NGC 2264 region, a total of 1 034 are included in the list of objects collected by Flaccomio et al., but 62 of them are not members in the more complete and less contaminated sample C. This means that about 92%, i.e. (1 034 - 62)/1 052, of the YSOs selected by us are confirmed members. Hence we conclude that the contamination level of the sample that we have selected is 8%.
For comparison, in the same region, Kounkel & Covey (2019) found 637 YSOs belonging to the clusters named Theia 41 and Theia 189, with 548 and 89 objects, respectively. Of these, 420, i.e. about 66%, are in common with our list.
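The completeness and contamination figures quoted for NGC 2264 can be reproduced from the counts given in the text. The sketch below only repeats that arithmetic; the variable names are labels chosen for this example.

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole

# Counts quoted in the text
common_c, total_c = 972, 2263            # sample C
common_cwide, total_cwide = 960, 1542    # sample C-Wide
selected, in_flaccomio_list, not_in_c = 1052, 1034, 62
theia_41, theia_189, common_theia = 548, 89, 420

completeness_c = pct(common_c, total_c)
completeness_cwide = pct(common_cwide, total_cwide)
confirmed = pct(in_flaccomio_list - not_in_c, selected)
contamination = 100.0 - confirmed
theia_fraction = pct(common_theia, theia_41 + theia_189)

print(f"sample C: {completeness_c:.0f}%, sample C-Wide: {completeness_cwide:.0f}%")
print(f"confirmed: {confirmed:.0f}%, contamination: {contamination:.0f}%")
print(f"in common with Kounkel & Covey (2019): {theia_fraction:.0f}%")
```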
6 Discussion
In the previous sections, we described how overdensities in the 5D parameter space (l, b, ϖ, μα*, μδ) have been identified, starting from a photometrically selected sample that covers the expected PMS region of YSOs with ages t < 10 Myr.
Since no attempt has been made to correct for interstellar reddening, the starting sample was also contaminated by older reddened stars. Another possible source of contamination by older stars is the strategy adopted to select the starting sample in the plane vs. , where the sensitivity of the available isochrones to stellar age is quite low for the low mass population. In fact, for faint and very low-mass stars, isochrones get closer and closer for ages larger than about 50-100 Myr and, consequently, it is difficult to separate young populations from older ones. As a result, the DBSCAN clustering algorithm, adopted to resolve spatially concentrated and/or co-moving stellar populations located at the same distance, can also select clusters older than 10 Myr.
A pattern match procedure has been adopted to disentangle SFRs and young clusters from older and photometrically unphysical clusters. We found 354 SFRs with ages t < 10 Myr and 322 young clusters with ages approximately in the range 10-100 Myr. We discuss here these validated findings in the context of the GP structure within 1.5 kpc from the Sun.






The maps of the young stellar clusters recognised by the DBSCAN clustering algorithm, most of them already known in the literature, have been shown in the previous sections, and specific spatial and kinematic details have been presented for some of them.
To identify clusters extended on scales larger than the 5∘ × 5∘ boxes used in the analysis, we merged adjacent clusters with consistent proper motions and distances. This procedure has been applied to identify extended SFRs as a whole, as in the case of the Orion Complex or Sco OB2 UCL, with r50 equal to ∘ and ∘, respectively, which are among the most extended structures resolved in this work. In several cases, it identifies clusters that encompass multiple populations, as in the case of NGC 2264, which has been identified as a single structure that also includes the nearby cluster IC 446 and other YSOs in the surrounding region. A more in-depth analysis of the two clusters shows that their proper motions can be separated into slightly different sub-populations. Thus our overall procedure to define clusters tends to include multiple sub-populations sharing similar properties, likely associated with the progenitor molecular cloud.
The question of cluster and subcluster identification is a very complex issue that can be dealt with at different levels of spatial precision, depending on what is required for a given analysis, as done, for example, for the MYStIX project in Feigelson (2018), where a parametric statistical regression approach, providing hierarchical ellipsoid structures, has been adopted. The evidence of a wide range of central surface densities found in the MYStIX maps is in agreement with the different spatial morphologies of the SFRs identified in this work.
Figure 10 shows the spatial distribution of the young stellar clusters found in this work, in three different bins of distance, [100, 600] pc, [600, 2000] pc and [100, 2000] pc. The young clusters are drawn by distinguishing them in the age bins Myr, 10 Myr Myr and Myr. Note that clusters with 10 Myr Myr have been found only in the solar neighbourhood ( pc) and thus are shown only in the [100, 600] pc distance range.
The distribution of SFRs ( Myr) within 600 pc is dominated by large young structures crossing the GP such as the Orion and Perseus Complexes, Gamma Velorum (Pozzo 1), Lac OB1, below the GP, BH 23 (corresponding to Theia 80 in Kounkel & Covey, 2019) and RSG 8 close to the GP, Serpens, Alessi 62, Collinder 359 and Rho Ophiuchi, above the GP. The clusters with ages 10 Myr Myr in the same distance range appear decidedly more diffuse. Apart from the well-known Sco-Cen association covering 60∘ in longitude, we detected as a single complex the similarly large association in the Vela-Puppis region, including Trumpler 10, Velorum, NGC 2457, NGC 2451B, as well as the associations around NGC 2232, Roslund 5 and Alessi 19.
Their positions appear to be connected to the clusters with Myr since they follow a spatial pattern crossing or very close to that of the SFRs. This suggests that they likely belong to a common star formation process encompassing at least two generations of YSOs, with the first generation including extended populations of dissolving young clusters and associations.
The large Sco-Cen association is connected to the Vela and Orion Complexes, confirming what was already found by Bouy & Alves (2015) with Hipparcos data. These three regions are described there as three large-scale stream-like structures.
Going towards larger distances ( pc), the SFRs show a more regular pattern, approximately parallel to the GP. The most prominent SFRs are ASCC 32 and Cep OB3b in Cepheus, respectively below and above the GP at a distance of 800-900 pc. Among the most distant SFRs with more than 300 members and distance pc, we detected NGC 2244, NGC 6530, NGC 6531, NGC 2362 and FSR 0442.
The overall distribution of YSOs in SFRs with pc traces a complex 3D pattern in the solar neighbourhood. In particular, in the Z vs. X edge-on Galactic projection (see bottom/left panel in Fig. 4 and top/left panel in Fig. 10) we find evidence of a projected inclined structure, mainly traced by the Orion, Vela OB2 and Rho Ophiuchi star forming complexes in the third and fourth Galactic quadrants and by the Serpens, Lacerta OB1 and Perseus complexes in the first and second Galactic quadrants. However, the SFRs falling in the Cepheus region do not follow this pattern. A global view of these structures and their spatial correlation with the surrounding nebular emission suggests a pattern consistent with the results found in Molinari et al. (2010), where massive proto-clusters and entire clusters of YSOs in active SFRs are associated with clouds that collapse into filaments.
As already found in Zari et al. (2018), current data reveal a very complex 3D structure that cannot be simply described with the Gould Belt, i.e. the giant flat structure, inclined by ∘ with respect to the Galactic Plane, first pointed out by Gould (1879). This insight was already suggested by Guillout (2001), who presented the first detection of the Gould Belt late type star population, and who proposed the alternative scenario of a Gould disk.
A more detailed representation of the young Galactic component in the Solar Neighborhood has been recently proposed by Alves et al. (2020), who determined the 3D distribution of all local cloud complexes by deriving accurate distances to about 380 lines of sight. They suggested that such a 3D distribution could be described by a damped sinusoidal wave, which they call the Radcliffe Wave, with an amplitude of pc and a period of kpc. It crosses Orion, around a minimum, Cepheus (crest), North America and Cygnus X. This structure is separated and distinct from a second structure, indicated as the “split”, crossing the Sco-Cen, Aquila and Serpens clouds. They propose that the Gould Belt is a projection effect of two linear cloud complexes. The spatial distribution of YSOs associated with the SFRs identified in our work shows much more complex and diffuse structures, but the two elongated linear structures suggested by Alves et al. (2020) approximately cross the borderline of the two separated structures visible in the X, Y map of Fig. 4, delimited by the SFRs indicated by Alves et al. (2020). This leads us to confirm that the local young Galactic component is very complex. While our data are broadly consistent with the Alves et al. (2020) findings, further investigations, including a more detailed analysis of the kinematics of the structures, based on the 3D space coordinates () and velocities () (e.g. de Zeeuw et al., 1999), are required to confirm the scenario and to gain additional insights into their origin.
To gain further insights into the star formation history of the SFRs, it is crucial to derive more accurate stellar ages. However, we do not attempt to derive stellar ages of the selected YSOs, for several reasons. First, a suitable photometric system is lacking: the broad Gaia EDR3 G and RP photometric bands used for this work are not sensitive to the fundamental stellar parameters (effective temperatures, stellar ages, etc.), especially for low mass stars. However, future Gaia releases, overcoming issues related to the BP bands at faint magnitudes, could be crucial for this purpose. Second, we lack the spectroscopic data needed to derive individual stellar reddening values and thus to place these YSOs appropriately on the HR diagram. Alternative ways, such as the use of 3D reddening maps (e.g. Bovy et al., 2016; Lallement et al., 2019), require careful analysis, since the integrated extinction tends to be underestimated in the molecular clouds, where SFRs are typically located. A detailed analysis is deferred to future works based on the combination of Gaia and spectroscopic data from available surveys, such as Gaia-ESO (Gilmore et al., 2012; Randich et al., 2013), LAMOST (Zhao et al., 2012) or APOGEE (Majewski et al., 2017), or future surveys such as WEAVE (Dalton et al., 2012) and 4MOST (Guiglion et al., 2019).
7 Summary and conclusions
We used the unsupervised machine learning clustering algorithm DBSCAN to systematically identify all SFRs with ages Myr, within 1.5 kpc from the Sun. The density-based clustering algorithm has been applied to the Gaia EDR3 positions, parallaxes and proper motions of a photometrically selected starting sample.
A pattern match procedure based on a template data set including typical clusters detected within the photometric sample has been used to distinguish very young clusters from the contaminant old clusters and from photometrically unphysical clusters. We provide here a catalogue with the main parameters (positions, spatial extent, median distance and number of members) of the 354 SFRs with ages Myr. The parameters of the 322 young clusters with ages 10 Myr Myr are also given. We also provide the list of the 124 440 plus 65 863 YSOs found in the SFRs and the young clusters, respectively, including mainly late type K-M stars. A substantial number of YSOs have been recognized for the first time. Based on the comparison of our list of YSOs in the well known regions Sco-Cen and NGC 2264, we roughly estimate that, within our observational limits, the completeness of the census of cluster members obtained with our analysis is 85%, at least in very rich and concentrated SFRs. For low density regions, such as the Taurus-Auriga association (see Appendix C), this completeness figure is expected to be around 50%. The mass function coverage of each cluster strongly depends on the cluster distance, and is set by the observational limit.
Compact regular clusters, as well as SFRs in large complexes such as, for example, Taurus, Orion, Sco-OB2, Perseus, and Cygnus, have been identified with high efficiency, as estimated from the comparison with other available catalogues (see Appendix C).
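As a reminder of how a density-based clustering of this kind operates, the following is a minimal textbook DBSCAN (Ester et al. 1996) written in plain Python. It is a sketch for illustration only and not the pipeline used for the catalogue: the five-dimensional toy points stand in for suitably scaled positions, parallaxes and proper motions, and the values of `eps` and `min_pts` are arbitrary assumptions.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of point i (the point itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    labels = [None] * len(points)  # None marks a point not yet visited
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be re-labelled as a border point
            continue
        cluster_id += 1  # point i is a core point: start a new cluster
        labels[i] = cluster_id
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also a core point: keep expanding
    return labels


def toy_group(centre, step=0.1):
    """Five points: the centre plus small offsets along the first four axes."""
    group = [tuple(centre)]
    for axis in range(4):
        p = list(centre)
        p[axis] += step
        group.append(tuple(p))
    return group


# Two compact co-moving groups and one isolated field star (invented data).
points = toy_group((0.0,) * 5) + toy_group((5.0,) * 5) + [(20.0,) * 5]
labels = dbscan(points, eps=1.0, min_pts=4)
print(labels)
```

The brute-force neighbour search is quadratic in the number of stars, so for samples of realistic size a library implementation with a spatial index would normally be preferred.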
The overall distribution of these clusters in the Galaxy context shows that they are distributed along a very complex 3D pattern that seems to connect them at least within 500-600 pc. Outside of this distance, the clusters appear to be more regularly and closely distributed along the GP.
As far as we know, the catalogue of YSOs presented in this work is the only all-sky catalogue based on the most recent Gaia EDR3 data, which benefit from major improvements with respect to Gaia DR2. This catalogue represents a step forward in the census of SFRs and can be used, for example, for further detailed interpretations of their spatial distribution in the context of the spiral arm model (Reid et al., 2019), since it covers a substantial region crossed by the Local arm and, marginally, some regions of the Perseus and Sagittarius-Carina arms (Poggio et al., 2021). Future deep photometric surveys, such as the Rubin Legacy Survey of Space and Time (LSST), will allow these limits to be extended.
We note that these results are not at this stage suitable for studies such as star formation history, cluster dynamics based on full 3D space velocities, or the IMF, since the census of the SFRs is not complete and accurate masses and ages, as well as radial velocities, cannot be determined until further data are available. Nevertheless, the dominant component of the SFRs has been detected and thus these results can be used as driving samples for the extraction of complete populations from Gaia data, by relaxing the stringent constraints adopted in this work.
Finally, the SFRs identified in this work are defined well enough to allow detailed studies of circumstellar disk evolution and direct imaging of young giant planets, based on multiband analysis of available or future additional observations (X-rays or IR or radio), targeting some of the individual clusters.
Acknowledgements.
This work has made use of data from the European Space Agency (ESA) mission Gaia (https://www.cosmos.esa.int/gaia), processed by the Gaia Data Processing and Analysis Consortium (DPAC, https://www.cosmos.esa.int/web/gaia/dpac/consortium). Funding for the DPAC has been provided by national institutions, in particular the institutions participating in the Gaia Multilateral Agreement. E.T. acknowledges Czech Science Foundation GAČR (Project: 21-16583M). JMA acknowledges financial support from the project PRIN-INAF 2019 “Spectroscopically Tracing the Disk Dispersal Evolution”. The authors are very grateful to the anonymous referee for providing constructive comments and suggestions, which significantly contributed to improving this publication.
References
- Alves et al. (2020) Alves, J., Zucker, C., Goodman, A. A., et al. 2020, Nature, 578, 237
- Anders et al. (2019) Anders, F., Khalatyan, A., Chiappini, C., et al. 2019, A&A, 628, A94
- Avedisova (2002) Avedisova, V. S. 2002, Astronomy Reports, 46, 193
- Bailer-Jones et al. (2021) Bailer-Jones, C. A. L., Rybizki, J., Fouesneau, M., Demleitner, M., & Andrae, R. 2021, AJ, 161, 147
- Bica et al. (2019) Bica, E., Pavani, D. B., Bonatto, C. J., & Lima, E. F. 2019, AJ, 157, 12
- Bonito et al. (2013) Bonito, R., Prisinzano, L., Guarcello, M. G., & Micela, G. 2013, A&A, 556, A108
- Bouy & Alves (2015) Bouy, H. & Alves, J. 2015, A&A, 584, A26
- Bovy et al. (2016) Bovy, J., Rix, H.-W., Green, G. M., Schlafly, E. F., & Finkbeiner, D. P. 2016, ApJ, 818, 130
- Bressan et al. (2012) Bressan, A., Marigo, P., Girardi, L., et al. 2012, MNRAS, 427, 127
- Cantat-Gaudin & Anders (2020) Cantat-Gaudin, T. & Anders, F. 2020, A&A, 633, A99
- Cantat-Gaudin et al. (2020) Cantat-Gaudin, T., Anders, F., Castro-Ginard, A., et al. 2020, A&A, 640, A1
- Cantat-Gaudin et al. (2018) Cantat-Gaudin, T., Jordi, C., Vallenari, A., et al. 2018, A&A, 618, A93
- Castro-Ginard et al. (2020) Castro-Ginard, A., Jordi, C., Luri, X., et al. 2020, A&A, 635, A45
- Castro-Ginard et al. (2018) Castro-Ginard, A., Jordi, C., Luri, X., et al. 2018, A&A, 618, A59
- Chabrier (2003) Chabrier, G. 2003, PASP, 115, 763
- Chen & Lee (2008) Chen, W. P. & Lee, H. T. 2008, in Handbook of Star Forming Regions, Volume I, ed. B. Reipurth, Vol. 4, 124
- Chen et al. (2014) Chen, Y., Girardi, L., Bressan, A., et al. 2014, MNRAS, 444, 2525
- Cody et al. (2014) Cody, A. M., Stauffer, J., Baglin, A., et al. 2014, AJ, 147, 82
- Dalton et al. (2012) Dalton, G., Trager, S. C., Abrams, D. C., et al. 2012, in Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, Vol. 8446, Ground-based and Airborne Instrumentation for Astronomy IV, ed. I. S. McLean, S. K. Ramsay, & H. Takami, 84460P
- Damiani et al. (2006) Damiani, F., Micela, G., Sciortino, S., et al. 2006, A&A, 460, 133
- Damiani et al. (2019) Damiani, F., Prisinzano, L., Pillitteri, I., Micela, G., & Sciortino, S. 2019, A&A, 623, A112
- de Zeeuw et al. (1999) de Zeeuw, P. T., Hoogerwerf, R., de Bruijne, J. H. J., Brown, A. G. A., & Blaauw, A. 1999, AJ, 117, 354
- Dell’Omodarme et al. (2012) Dell’Omodarme, M., Valle, G., Degl’Innocenti, S., & Prada Moroni, P. G. 2012, A&A, 540, A26
- Dias et al. (2002) Dias, W. S., Alessi, B. S., Moitinho, A., & Lépine, J. R. D. 2002, A&A, 389, 871
- Dutra & Bica (2002) Dutra, C. M. & Bica, E. 2002, A&A, 383, 631
- Ercolano et al. (2021) Ercolano, B., Picogna, G., Monsch, K., Drake, J. J., & Preibisch, T. 2021, MNRAS, 508, 1675
- Ester et al. (1996) Ester, M., Kriegel, H.-P., Sander, J., & Xu, X. 1996, in Second International Conference on Knowledge Discovery and Data Mining, ed. J. Simoudis, E. Han & U. Fayyad (Menlo Park, CA: AAAI Press), 226
- Fasano & Franceschini (1987) Fasano, G. & Franceschini, A. 1987, MNRAS, 225, 155
- Feigelson (2018) Feigelson, E. D. 2018, in Astrophysics and Space Science Library, Vol. 424, The Birth of Star Clusters, ed. S. Stahler, 119
- Fernández-López et al. (2014) Fernández-López, M., Arce, H. G., Looney, L., et al. 2014, ApJ, 790, L19
- Flaccomio et al. (2006) Flaccomio, E., Micela, G., & Sciortino, S. 2006, A&A, 455, 903
- Franciosini et al. (2021) Franciosini, E., Tognelli, E., Degl’Innocenti, S., et al. 2021, arXiv e-prints, arXiv:2111.11196
- Gaia Collaboration et al. (2021) Gaia Collaboration, Brown, A. G. A., Vallenari, A., et al. 2021, A&A, 649, A1
- Gaia Collaboration et al. (2016) Gaia Collaboration, Prusti, T., de Bruijne, J. H. J., et al. 2016, A&A, 595, A1
- Gilmore et al. (2012) Gilmore, G., Randich, S., Asplund, M., et al. 2012, The Messenger, 147, 25
- Gould (1879) Gould, B. A. 1879, Resultados del Observatorio Nacional Argentino, 1, I
- Guiglion et al. (2019) Guiglion, G., Battistini, C., Bell, C. P. M., et al. 2019, The Messenger, 175, 17
- Guillout (2001) Guillout, P. 2001, in Astronomical Society of the Pacific Conference Series, Vol. 243, From Darkness to Light: Origin and Evolution of Young Stellar Clusters, ed. T. Montmerle & P. André, 677
- Gullbring et al. (1998) Gullbring, E., Hartmann, L., Briceno, C., & Calvet, N. 1998, ApJ, 492, 323
- Jackson et al. (2022) Jackson, R. J., Jeffries, R. D., Wright, N. J., et al. 2022, MNRAS, 509, 1664
- Jeffries et al. (2017) Jeffries, R. D., Jackson, R. J., Franciosini, E., et al. 2017, MNRAS, 464, 1456
- Kerr et al. (2021) Kerr, R. M. P., Rizzuto, A. C., Kraus, A. L., & Offner, S. S. R. 2021, ApJ, 917, 23
- Kervella et al. (2022) Kervella, P., Arenou, F., & Thévenin, F. 2022, A&A, 657, A7
- Kounkel & Covey (2019) Kounkel, M. & Covey, K. 2019, AJ, 158, 122
- Kounkel et al. (2020) Kounkel, M., Covey, K., & Stassun, K. G. 2020, AJ, 160, 279
- Kraus et al. (2012) Kraus, A. L., Ireland, M. J., Hillenbrand, L. A., & Martinache, F. 2012, ApJ, 745, 19
- Krolikowski et al. (2021) Krolikowski, D. M., Kraus, A. L., & Rizzuto, A. C. 2021, AJ, 162, 110
- Kronberger et al. (2006) Kronberger, M., Teutsch, P., Alessi, B., et al. 2006, A&A, 447, 921
- Krone-Martins & Moitinho (2014) Krone-Martins, A. & Moitinho, A. 2014, A&A, 561, A57
- Lada (2006) Lada, C. J. 2006, ApJ, 640, L63
- Lallement et al. (2019) Lallement, R., Babusiaux, C., Vergely, J. L., et al. 2019, A&A, 625, A135
- Le Duigou & Knödlseder (2002) Le Duigou, J. M. & Knödlseder, J. 2002, A&A, 392, 869
- Lindegren et al. (2021a) Lindegren, L., Bastian, U., Biermann, M., et al. 2021a, A&A, 649, A4
- Lindegren et al. (2021b) Lindegren, L., Klioner, S. A., Hernández, J., et al. 2021b, A&A, 649, A2
- Liu & Pang (2019) Liu, L. & Pang, X. 2019, ApJS, 245, 32
- Luhman (2022) Luhman, K. L. 2022, AJ, 163, 24
- Luri et al. (2018) Luri, X., Brown, A. G. A., Sarro, L. M., et al. 2018, A&A, 616, A9
- Majewski et al. (2017) Majewski, S. R., Schiavon, R. P., Frinchaboy, P. M., et al. 2017, AJ, 154, 94
- Mathieu (1994) Mathieu, R. D. 1994, ARA&A, 32, 465
- Mayne & Naylor (2008) Mayne, N. J. & Naylor, T. 2008, MNRAS, 386, 261
- McInnes et al. (2017) McInnes, L., Healy, J., & Astels, S. 2017, JOSS, 2, 205
- Megeath et al. (2012) Megeath, S. T., Gutermuth, R., Muzerolle, J., et al. 2012, AJ, 144, 192
- Mitra-Kraev et al. (2005) Mitra-Kraev, U., Harra, L. K., Güdel, M., et al. 2005, A&A, 431, 679
- Miville-Deschênes & Lagache (2005) Miville-Deschênes, M.-A. & Lagache, G. 2005, ApJS, 157, 302
- Miville-Deschênes et al. (2017) Miville-Deschênes, M.-A., Murray, N., & Lee, E. J. 2017, ApJ, 834, 57
- Moitinho et al. (2001) Moitinho, A., Alves, J., Huélamo, N., & Lada, C. J. 2001, ApJ, 563, L73
- Molinari et al. (2010) Molinari, S., Swinyard, B., Bally, J., et al. 2010, A&A, 518, L100
- Montalto et al. (2021) Montalto, M., Piotto, G., Marrese, P. M., et al. 2021, A&A, 653, A98
- Otrupcek et al. (2000) Otrupcek, R. E., Hartley, M., & Wang, J. S. 2000, PASA, 17, 92
- Palla et al. (2005) Palla, F., Randich, S., Flaccomio, E., & Pallavicini, R. 2005, ApJ, 626, L49
- Palla & Stahler (1999) Palla, F. & Stahler, S. W. 1999, ApJ, 525, 772
- Peacock (1983) Peacock, J. A. 1983, MNRAS, 202, 615
- Pecaut & Mamajek (2013) Pecaut, M. J. & Mamajek, E. E. 2013, ApJS, 208, 9
- Piecka & Paunzen (2021) Piecka, M. & Paunzen, E. 2021, arXiv e-prints, arXiv:2107.07230
- Poggio et al. (2021) Poggio, E., Drimmel, R., Cantat-Gaudin, T., et al. 2021, A&A, 651, A104
- Prisinzano et al. (2005) Prisinzano, L., Damiani, F., Micela, G., & Sciortino, S. 2005, A&A, 430, 941
- Randich et al. (2013) Randich, S., Gilmore, G., & Gaia-ESO Consortium. 2013, The Messenger, 154, 47
- Randich et al. (2018) Randich, S., Tognelli, E., Jackson, R., et al. 2018, A&A, 612, A99
- Rebull et al. (2013) Rebull, L. M., Johnson, C. H., Gibbs, J. C., et al. 2013, AJ, 145, 15
- Reid et al. (2019) Reid, M. J., Menten, K. M., Brunthaler, A., et al. 2019, ApJ, 885, 131
- Riello et al. (2021) Riello, M., De Angeli, F., Evans, D. W., et al. 2021, A&A, 649, A3
- Salpeter (1955) Salpeter, E. E. 1955, ApJ, 121, 161
- Scalo (1998) Scalo, J. 1998, in ASP Conf. Ser. 142: The Stellar Initial Mass Function (38th Herstmonceux Conference), 201
- Sitnik (2003) Sitnik, T. G. 2003, Astronomy Letters, 29, 311
- Soderblom et al. (2014) Soderblom, D. R., Hillenbrand, L. A., Jeffries, R. D., Mamajek, E. E., & Naylor, T. 2014, Protostars and Planets VI, 219
- Somers et al. (2020) Somers, G., Cao, L., & Pinsonneault, M. H. 2020, ApJ, 891, 29
- Spina et al. (2017) Spina, L., Randich, S., Magrini, L., et al. 2017, A&A, 601, A70
- Tang et al. (2014) Tang, J., Bressan, A., Rosenfield, P., et al. 2014, MNRAS, 445, 4287
- Tognelli et al. (2018) Tognelli, E., Prada Moroni, P. G., & Degl’Innocenti, S. 2018, MNRAS, 476, 27
- Tognelli et al. (2020) Tognelli, E., Prada Moroni, P. G., Degl’Innocenti, S., Salaris, M., & Cassisi, S. 2020, A&A, 638, A81
- Venuti et al. (2019) Venuti, L., Damiani, F., & Prisinzano, L. 2019, A&A, 621, A14
- Yonekura et al. (1997) Yonekura, Y., Dobashi, K., Mizuno, A., Ogawa, H., & Fukui, Y. 1997, ApJS, 110, 21
- Zari et al. (2018) Zari, E., Hashemi, H., Brown, A. G. A., Jardine, K., & de Zeeuw, P. T. 2018, A&A, 620, A172
- Zhao et al. (2012) Zhao, G., Zhao, Y.-H., Chu, Y.-Q., Jing, Y.-P., & Deng, L.-C. 2012, Research in Astronomy and Astrophysics, 12, 723
- Zucker et al. (2020) Zucker, C., Speagle, J. S., Schlafly, E. F., et al. 2020, A&A, 633, A51
Appendix A Interstellar reddening effects
In this section we show the effects of reddening on the sample selected as described in Section 3. As discussed in Anders et al. (2019), for a generic passband i, the extinction coefficients Ai/AV depend on the stellar effective temperature. The resulting dust-attenuated photometry in very broad passbands, such as the Gaia EDR3 ones, is not a simple linear function of AV, but is also a function of the source spectrum, i.e. its effective temperature.
The PARSEC 1.2S stellar models (Bressan et al. 2012; Chen et al. 2014; Tang et al. 2014) have been implemented to predict tracks and isochrones also at non-zero extinction. As done in Montalto et al. (2021), in order to have an indication of the reddening which affects our data, we used the CMD 3.3 input form web interface, and we constructed a grid of stellar models assuming the 1 Gyr isochrone, and AV=[0.1, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 20.0, 30.0].
Figure 11 shows how the 1 Gyr isochrone changes by increasing extinction AV from 0 to 10, in the CAMD obtained by adopting the different Gaia colors (panels a and b). The 1 Gyr isochrone at A is highlighted by a thick black line while the 1 Gyr isochrone at A is highlighted by symbols with different shades of pink in the different stellar evolution phases.
Note that the reddened isochrone is not linearly shifted along a reddening direction, as usually happens when adopting a reddening vector. For example, for an object at M, corresponding to a star with (GBP-GRP)=0.47, (G-GRP)=0.30 and effective temperature of 6930 K at 1 Gyr (black empty square in the Figure) with extinction A, the reddening E(GBP-GRP) is equal to 1.24 and E(G-GRP) is equal to 0.55 (blue arrows in the Figure). But, for an object at M, corresponding to a star with (GBP-GRP)=1.81, (G-GRP)=0.90 and effective temperature of 3945 K at 1 Gyr (black bullet in the Figure) with extinction A, the reddening E(GBP-GRP) is equal to 1.09 and E(G-GRP) is equal to 0.26 (red arrows in the Figure).
Thus, at different temperatures, and for a fixed AV, the shift in color due to the reddening is smaller for the colder star. This effect is stronger in the G vs. G-GRP diagram, as can be deduced from the different slopes of the blue and red arrows. In this case, for a 4000 K star, the E(G-GRP) value (equal to 0.26) is about half of that (0.55) associated with a 7000 K star.
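The size of the temperature dependence can be checked directly from the four colour excesses quoted above. The short sketch below only reuses those numbers; the dictionary layout and variable names are illustrative.

```python
# Colour excesses quoted in the text for the same extinction A_V.
hot = {"teff": 6930, "E_BP_RP": 1.24, "E_G_RP": 0.55}
cool = {"teff": 3945, "E_BP_RP": 1.09, "E_G_RP": 0.26}

# Ratio (cool star) / (hot star) of the colour shift in each colour index.
ratio_bp_rp = cool["E_BP_RP"] / hot["E_BP_RP"]
ratio_g_rp = cool["E_G_RP"] / hot["E_G_RP"]

print(f"E(GBP-GRP) ratio, cool/hot: {ratio_bp_rp:.2f}")
print(f"E(G-GRP)   ratio, cool/hot: {ratio_g_rp:.2f}")
```

The cool star is shifted by about 0.88 times the hot-star value in GBP-GRP, but only by about 0.47 times in G-GRP, which is the "about half" quoted in the text and shows that the effect is stronger in the G vs. G-GRP diagram.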
This implies that while a reddened 1 Gyr old star with effective temperature of K can be expected to be found in the PMS region and mimic a star younger than 10 Myr, a colder star of 4000 K, of the same age and affected by the same extinction, does not fall in the PMS region (see Fig. 11, panel b). In conclusion, the effect of uncorrected reddening in terms of contamination of our initial photometric sample by old stars is larger for stars with spectral type F and G, than for K and M stars.

Appendix B The effect of binarity or multiplicity on astrometric selections
At the level of astrometric sensitivity offered by Gaia, the orbital motions of binary (or multiple) stars sometimes become measurable, and are also difficult to disentangle from proper motion. This holds both for resolved pairs, and for unresolved unequal-mass pairs where the photocenter displays significant motion (see Kervella et al. 2022). If the binary period resonates with the Gaia sampling frequency, parallax measurements will also be affected. Perhaps the best-studied star-forming region in terms of its binary-star population is Taurus-Auriga, and we may refer to the review by Mathieu (1994) to get an idea of the expected range of system parameters. Taurus is one of the few SFRs where lunar occultation techniques were feasible for detection of close pairs, down to separations of 0.009′′ (Mathieu 1994, Table A1 and references therein). Therefore, the projected binary separations range across a factor of , with no “typical” value. Correspondingly, their orbital periods span a range of a factor of .
We have checked empirically whether the Gaia-based selection used in this paper keeps the binary members of a SFR, by matching the Tau-Aur binary-star list in Table 1 from Kraus et al. (2012) with the Gaia EDR3 catalogue, and with its subset selected in this paper. Out of 156 stars in Kraus et al., we found 142 Gaia EDR3 counterparts within 0.5′′, of which 40 were selected in this work using DBSCAN.
We have then compared the RUWE distributions of the selected vs. the unselected systems, to gain insight into how binary motions impact RUWE and the subsequent selection. Figure 12 shows a diagram of RUWE vs. Gaia color G-GRP. The horizontal line indicates our maximum accepted RUWE value. Filled symbols are stars passing our selection, while empty symbols are the unmatched binaries, i.e. those not retrieved in our catalogue. It should be remembered that the cut in absolute magnitude rejects some stars which would have passed the RUWE constraint. Nevertheless, the vast majority of unmatched stars indeed have RUWE values well above the chosen limiting value, and were likely rejected for this reason. There is little or no dependence of RUWE on Gaia color, and therefore mass (although part of the color spread is also due to high extinction towards Tau-Aur). Also interesting is the diagram shown in Fig. 13, showing RUWE vs. binary separation. Here, the absence of any dependence of RUWE on projected separation (when measurable) is very evident, including unresolved pairs, where only the photocenter is affected by orbital motion. This latter diagram contains only 6 out of 40 stars selected by us, since about one-half of the Kraus et al. pairs are spectroscopic binaries with no measured separation. We also point out that out of the 76 binaries with no measured separation, 32 pass our selection (42%), while only 8 out of the 66 binaries with measured separation pass the selection (12%), probably because photocenter motion has a smaller effect on astrometry compared to the motion of resolved components.
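A positional match of this kind can be sketched as a nearest-neighbour search within a fixed angular radius. The example below is a brute-force illustration in plain Python and not the tool actually used for the match: the source names and coordinates are invented, and `cross_match` is a hypothetical helper.

```python
import math

ARCSEC = 1.0 / 3600.0  # one arcsecond in degrees


def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees, from the haversine formula."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a)))


def cross_match(targets, catalogue, radius_arcsec=0.5):
    """Map each target name to its nearest catalogue source, if within the radius."""
    matches = {}
    for name, ra, dec in targets:
        best = min(catalogue, key=lambda s: angular_sep_deg(ra, dec, s[1], s[2]))
        if angular_sep_deg(ra, dec, best[1], best[2]) <= radius_arcsec * ARCSEC:
            matches[name] = best[0]
    return matches


# Invented (name, RA, Dec) entries in degrees.
targets = [("binary_A", 68.0, 24.0), ("binary_B", 70.0, 26.0)]
catalogue = [
    ("src_1", 68.0 + 0.3 * ARCSEC, 24.0),  # about 0.27 arcsec from binary_A
    ("src_2", 70.0, 26.0 + 2.0 * ARCSEC),  # 2 arcsec from binary_B
]

matches = cross_match(targets, catalogue)
print(matches)
```

Only `binary_A` finds a counterpart within 0.5′′. Note that a right-ascension offset shrinks by cos(Dec) on the sky, which is why the separation is computed on the sphere rather than from coordinate differences.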


Overall, extending this result from Tau-Aur to other SFRs at similar distances, we would expect to have lost 72% of their binary populations due to our selection criteria. Thus, if the binary frequency is as high as 50% (Mathieu 1994), a loss of 35% of PMS members can be expected. However, since this work selects stars at distances up to 1500 pc, this estimate should not be naively extended to our whole sample: the larger the distance, the smaller the projected binary motions and hence the closer they are to our detection limit. We therefore expect a less significant binary member loss for the farther-out SFRs. A more detailed study of these effects would however be far beyond the scope of this paper, also taking into account that the upcoming Gaia data release DR3 (foreseen June 2022) will contain orbital astrometric solutions for 135 760 non-single stars (see https://www.cosmos.esa.int/web/gaia/).
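The percentages in this appendix follow from the counts given above, as the short check below shows. It only reuses numbers from the text; the variable names are illustrative.

```python
# Counts quoted in the text for the Kraus et al. (2012) Tau-Aur binaries.
n_gaia = 142          # binaries with a Gaia EDR3 counterpart
n_selected = 40       # of those, retrieved by our selection
no_sep = (32, 76)     # (selected, total) without a measured separation
with_sep = (8, 66)    # (selected, total) with a measured separation

lost_fraction = (n_gaia - n_selected) / n_gaia
binary_frequency = 0.5  # upper value considered in the text (Mathieu 1994)
member_loss = lost_fraction * binary_frequency

print(f"binaries lost:            {100 * lost_fraction:.0f}%")
print(f"selected, no separation:  {100 * no_sep[0] / no_sep[1]:.0f}%")
print(f"selected, with separation:{100 * with_sep[0] / with_sep[1]:.0f}%")
print(f"expected member loss:     {100 * member_loss:.0f}%")
```

The product of the two fractions is about 0.36, in line with the roughly 35% loss of PMS members quoted above.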
Appendix C Literature comparison
C.1 The Taurus-Auriga association
The Taurus-Auriga complex is one of the nearest active SFRs of low mass stars, to which many works have been dedicated. In this region, we identified several substructures, as can be seen from Fig. 14. In order to identify the YSOs associated with the Taurus-Auriga association, we imposed an upper distance limit of 225 pc, as done in Krolikowski et al. (2021), and restricted the spatial region to the range 58.0∘ < RA < 86.0∘ and 10.5∘ < Dec < 38.5∘. We considered the sub-structures whose members are all within these limits. With these conditions we identified a total of 313 YSOs associated with 6 substructures. The spatial distributions are shown in Fig. 14, while PMs, parallaxes and the CAMD are shown in Fig. 15.
The members in the southwest subregion (light blue plus symbols in the Figures) are distributed quite close to the 10 Myr isochrone, so they could be part of an older population that we did not select because of the photometric cut we used. With this exception, the members of the other substructures show the typical distribution of PMS stars. The PMs of the substructures are quite distinct, as are the parallaxes, suggesting a complex 3D structure, with the known core, including members in the region 63.0∘ < RA < 70.0∘ and 23∘ < Dec < 28∘ (blue star symbols in the Figures), also being on the near side (median distance equal to 132 pc). The easternmost and most populated substructure (red square symbols in the Figures) is, instead, the most distant (median distance equal to 171 pc). Marginal evidence of an age spread, as found in Krolikowski et al. (2021), is also found with our analysis, but our results cannot be considered conclusive, being based on reddening-uncorrected photometry.
Krolikowski et al. (2021) compiled, very recently, the most complete and inclusive census of members of this region found in the literature. Among these, 587 objects have Gaia EDR3 counterparts, with 528 having a full astrometric solution. Using the Gaia EDR3 identification number given in the Krolikowski et al. (2021) table, we matched the list of the 437 Taurus members in the Krolikowski et al. (2021) table that are included in the photometric limits imposed in our work, with the YSOs with Myr (i.e. classified with flag from 1 to 28), and found 202 objects in common, that amount to about 46% (202/437) of the Krolikowski et al. (2021) list and 65% (202/313) of our list of YSOs in this region. We note that the Krolikowski et al. (2021) list has been obtained from the compilation of previous works, including local spectroscopic and IR data surveys that do not homogeneously cover the entire region as we have done with Gaia data. For example, many of the 111 YSOs not included in the Krolikowski et al. (2021) table belong to the clusters 579 and 572 in Table LABEL:printtexclusters that includes 88 and 30 YSOs (red squares and light blue symbols in Fig. 14, top panel), in two sub-regions poorly covered by Krolikowski et al. (2021).
We compared the list of YSOs in Taurus also with the list of members identified as excess of mass (EOM) by Kerr et al. (2021) using Gaia DR2 data. Details on the match with our catalogue are given in Sect. C.4. As in our case, Kerr et al. (2021) found substructures beyond the distance of known members. To perform a consistent comparison, we restricted the Kerr et al. (2021) catalogue in RA, Dec and distance, as done for our catalogue. There are 429 objects identified as EOM in this region. Among these, we considered the ones with and , to match the same photometric region adopted for our catalogue. Of the 409 Kerr et al. (2021) YSOs that meet these conditions, we found that 197 (about 48%) are in common with our list of YSOs. We note that a rigorous consistent comparison is very hard to perform, since it strongly depends not only on the adopted clustering techniques but also on the subsample of Gaia data that is selected as the starting point of the subsequent analysis.
The Taurus region is a well known complex structure for which membership has been very hard to establish due to its large spatial extent and strong obscuration by the nebula. The comparison we have done is sufficient to assert that about 50% of the selected YSOs in this region are already found in other surveys and that they are distributed in sub-structures that are consistent with those found in other works, in particular with the results presented by Kerr et al. (2021), which homogeneously cover the entire region.
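Matches based on a common identifier, such as the Gaia source identification number, reduce to a set intersection, and the overlap fractions follow from the sizes of the sets. The sketch below illustrates this with invented identifiers and then reproduces the Taurus percentages from the counts quoted in this section; `overlap_stats` is a hypothetical helper, not part of any released code.

```python
def overlap_stats(ours, literature):
    """Common objects and the fraction they represent in each of the two lists."""
    common = set(ours) & set(literature)
    return {
        "n_common": len(common),
        "frac_of_literature": len(common) / len(literature),
        "frac_of_ours": len(common) / len(ours),
        "n_only_ours": len(set(ours) - common),
    }


# Toy identifiers (invented), just to show the mechanics.
toy = overlap_stats(ours={101, 102, 103, 104}, literature={103, 104, 105})
print(toy)

# Counts quoted in the text for Taurus-Auriga.
n_ours, n_k21, n_common_k21 = 313, 437, 202   # Krolikowski et al. (2021)
n_kerr, n_common_kerr = 409, 197              # Kerr et al. (2021)

frac_k21 = n_common_k21 / n_k21         # fraction of the literature list recovered
frac_ours = n_common_k21 / n_ours       # fraction of our list already known
n_new = n_ours - n_common_k21           # our YSOs absent from that list
frac_kerr = n_common_kerr / n_kerr

print(f"{100 * frac_k21:.0f}% {100 * frac_ours:.0f}% {n_new} {100 * frac_kerr:.0f}%")
```

The two fractions for the same match differ because they are normalised by lists of different length, which is why the text quotes both 46% and 65% for the 202 objects in common.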






C.2 The Orion Complex





YSOs associated with the Orion complex have been identified by selecting objects with ∘∘ and ∘∘. In this way, we found 18 840 YSOs associated with 7 substructures with Myr. These are shown in Fig. 16 (for a direct visual comparison, the spatial limits of the figure are the same as those used in Fig. 1 of Kounkel & Covey 2019), where we note the presence of already known substructures such as and Ori, ONC, 25 Ori. All the main sub-structures covering the Orion A and B Nebulae have been merged by our procedure in a single complex including 14 832 YSOs and a further 1 576 YSOs in the Ori cluster. The most distant cluster, associated with Monoceros R2 (Mon-R2), is not part of the close Orion Complex and includes 1 272 YSOs with a mean distance of 897 pc (123 pc).
Figure 17 shows proper motions and parallaxes of the substructures found in the Orion area. In particular, the proper motions show a very complex kinematic pattern of the subclusters in this region. However, a detailed analysis of the Orion kinematics is beyond the aims of this work.
Figure 17 also shows the CAMD of the populations associated to Orion. Even though we cannot rigorously interpret it, since our data are not corrected for reddening, we note an apparently large age spread for all the populations.
We compare our findings in the Orion Complex region with the Kounkel & Covey (2019) catalogue. Details on the match between the two catalogues are given in Sect. C.4. To retrieve the YSOs identified by Kounkel & Covey (2019) in the Orion Complex, we considered from their Table 2, the 16 structures (Theia groups) falling in the Orion region as defined above. The YSOs in the Kounkel & Covey (2019) and Kounkel et al. (2020) catalogues associated to the Theia groups of the Orion Complex are 11 882 and 10 373, respectively. Those in common with the list of Orion members found in this work are 7 983 (67%) and 7 822 (75%).
The Orion Complex has been extensively investigated with Spitzer IR data. For example, the Megeath et al. (2012) catalogue includes 3 479 YSOs (retrieved at http://astro1.physics.utoledo.edu/megeath/Orion/The_Spitzer_Orion_Survey.html) that cover a quite extended region of the Orion A and B nebulae.
Using the cross-match service provided by CDS, Strasbourg, and a matching radius of 1′′, we found that 2 612 IR sources from the Megeath et al. (2012) catalogue have a Gaia EDR3 counterpart. From this sample, we considered only those satisfying the photometric and astrometric restrictions given in equation 1, with - and in the range 203∘∘ and ∘∘, that amount to 1 667 YSOs. Of these, 1 561 (94%) are identified by us as members of the Orion Complex. The spatial distributions of our members and those found in Megeath et al. (2012) are shown in Fig. 18. This high percentage indicates that Gaia data have an efficiency in accurately diagnosing membership of YSOs in SFRs comparable to that of IR data.
If we consider the subsample of 2 612 Megeath et al. (2012) objects with Gaia counterparts, and assume that it includes only genuine YSOs (i.e. 0% of contamination), we can conclude that our completeness level is about 60%. This value is the result of the restrictions we imposed on our initial data set to reduce the contamination level. We note that our selection may be significantly biased against binary stars, which are missed. In fact, if we only discard the condition , and retain the other conditions, the Megeath et al. (2012) YSOs with Gaia counterparts are 1 953 and this implies that 286 YSOs (1953-1667), i.e. about 14% of the total sample (very likely binary systems), are missed in our data set.
We do not attempt to estimate the fraction of false positives that could be included in our sample
by considering the Megeath et al. (2012) catalogue since it includes mainly Class II stars, i.e. YSOs with
IR excess emission from the circumstellar disk and it is therefore incomplete for the Class III stars,
that do not show excess emission in the IR.
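The percentages quoted in this comparison follow from simple ratios of the counts given above. The short sketch below reproduces them. The variable names are ours, and the choice of 1 953 as the denominator for the fraction of objects lost to the discarded condition is an assumption, since the text only refers to "the total sample".

```python
# Counts quoted in the text for the Megeath et al. (2012) comparison.
n_gaia_counterparts = 2612   # IR sources with a Gaia EDR3 counterpart
n_within_selection = 1667    # counterparts passing all the adopted restrictions
n_without_one_cut = 1953     # counterparts when one condition is discarded
n_recovered = 1561           # counterparts classified here as Orion members


def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Fraction of the selectable Megeath et al. YSOs that we recover.
recovery_in_selection = percent(n_recovered, n_within_selection)

# Completeness relative to all Megeath et al. YSOs with a Gaia counterpart,
# assuming that sample contains only genuine YSOs.
completeness = percent(n_recovered, n_gaia_counterparts)

# Objects lost because of the single discarded condition.
# Assumption: the fraction is taken relative to the 1 953 objects.
n_lost_to_cut = n_without_one_cut - n_within_selection
lost_fraction = percent(n_lost_to_cut, n_without_one_cut)

print(f"recovered within selection: {recovery_in_selection:.1f}%")
print(f"overall completeness:       {completeness:.1f}%")
print(f"lost to the discarded cut:  {n_lost_to_cut} ({lost_fraction:.1f}%)")
```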
C.3 The interstellar dust free SFR NGC 2362
At a distance of 1 354 ± 192 pc, NGC 2362 is a SFR characterised by a very low and uniform reddening, estimated to be E(B-V)=0.1 (Moitinho et al. 2001). For this reason, the cluster shows a small spread in the optical V vs. V-I diagram, as found by Moitinho et al. (2001) and confirmed by Damiani et al. (2006), and this makes it possible to constrain the duration of the star formation process, which in this region has been about 1-2 Myr (Damiani et al. 2006). This result has been derived on the basis of a Chandra-ACIS X-ray observation pointed at the cluster, from which a list of very likely members has been obtained. As for the case of NGC 2264, this cluster has been found with our procedure in a region more extended than that investigated by Damiani et al. (2006). The 879 YSOs compatible with being members of NGC 2362 are plotted in Fig. 19. Within the nominal cluster center l=238.2∘, b=-5.54∘ (Damiani et al. 2006), we found 150 candidate members while the others are mostly concentrated around the three bumps visible in the IR image. A further subgroup of cluster members shows an aligned spatial distribution roughly going from NGC 2362 to the H ii region LBN 1059.
To compare our data with the list of 387 X-ray members by Damiani et al. (2006), we cross-matched this list with the Gaia EDR3 catalogue, using the cross-match service provided by CDS, Strasbourg, and adopting a matching radius of 0.5′′. We find that 294 of them have a single Gaia EDR3 counterpart, but those that are compliant with our initial data set restrictions and fall in the PMS region of the CAMD compatible with ages Myr are 129. Among these, 118, i.e. about 91%, are in common with our list of YSOs. This fraction confirms that, even though our list of YSOs is incomplete because a significant fraction of members is discarded a priori by the adopted data restrictions, the efficiency of our method in detecting very likely members within the adopted photometric ranges is very high. This holds if we consider that X-ray detections select YSOs without any bias based on the stellar evolutionary status (Class II or III YSOs), with a high efficiency in the spectral types (G to M) we are working on.
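The positional match described above was performed with the CDS cross-match service. Purely as an illustration of the underlying logic, the sketch below pairs two small coordinate lists within a fixed angular radius and keeps only sources with a single counterpart. It is a brute-force toy, not the CDS implementation; the function names and the positions are invented for the example.

```python
import math


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcsec between two positions given in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula, numerically stable for small separations.
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a)))) * 3600.0


def single_counterparts(sources, catalogue, radius_arcsec=0.5):
    """Map each source index to its only catalogue match within the radius.

    Sources with zero or multiple matches are left out of the result.
    """
    matches = {}
    for i, (ra, dec) in enumerate(sources):
        hits = [j for j, (cra, cdec) in enumerate(catalogue)
                if angular_separation_arcsec(ra, dec, cra, cdec) <= radius_arcsec]
        if len(hits) == 1:
            matches[i] = hits[0]
    return matches


# Invented positions (RA, Dec in degrees), for illustration only.
xray = [(109.68, -24.95), (109.70, -24.90), (109.75, -24.99)]
gaia = [
    (109.68, -24.95 + 0.3 / 3600),  # 0.3 arcsec from the first X-ray source
    (109.70, -24.90 + 0.2 / 3600),  # two candidates for the second source
    (109.70, -24.90 - 0.2 / 3600),
    (109.80, -25.10),               # unrelated object
]

matched = single_counterparts(xray, gaia, radius_arcsec=0.5)
print(matched)
```

In this toy case only the first X-ray source is retained: the second has two candidates within the radius and the third has none. For real catalogues a spatially indexed match would be preferable to the double loop used here.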
Within the Chandra-ACIS field of view, we selected a total of 150 YSOs, and 32 of them (21%) are not X-ray detected. X-ray detections found in Damiani et al. (2006) are complete for masses larger than 0.4 M⊙, that, assuming the cluster age of 4-5 Myr (Mayne & Naylor 2008), corresponds to . By considering that more than 50% of these X-ray undetected YSOs are fainter than this limit and that most of them are located far from the cluster center, where the Chandra-ACIS spatial resolution is lower, we are confident that the 32 X-ray undetected YSOs classified by us are likely members.




As for the other clusters, we investigated proper motions, parallaxes and CAMD, which are shown in Fig. 20. The proper motion scatter plot indicates that the distribution of YSOs falling in the Chandra-ACIS field of view is more concentrated than that of the overall cluster, which shows an inclined trend. This confirms that the entire cluster is characterised by a kinematic structure slightly more complex than that of the subgroup of YSOs falling around the known cluster center. The parallax values indicate that all the detected YSOs are located at consistent distances.
We note that to reduce the observed spread in the vs. diagram shown in Fig. 20, in the computation of , we used the median cluster distance, rather than the individual member distances. The residual observed luminosity spread in the vs. diagram is likely due to reddening effects not corrected here and that, on the contrary, are very small in the V vs. V-I diagram, where the reddening vector is almost parallel to the cluster sequence in the low mass range (see Fig. 4 in Damiani et al. (2006)).
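The effect of adopting a single cluster distance can be illustrated with the standard distance modulus. The sketch below compares absolute magnitudes computed with individual inverse-parallax distances and with the median distance. The member values are invented, no extinction correction is applied (as in the text), and taking the median of the inverse parallaxes is our assumption about how the median cluster distance is defined.

```python
import math
import statistics


def absolute_magnitude(apparent_mag, distance_pc):
    """Distance modulus without extinction: M = m - 5 log10(d / 10 pc)."""
    return apparent_mag - 5.0 * math.log10(distance_pc / 10.0)


# Invented members: (apparent magnitude, parallax in mas).
members = [(15.2, 0.70), (16.1, 0.78), (17.0, 0.74), (16.5, 0.66)]

# Individual distances as inverse parallaxes (pc).
individual_distances = [1000.0 / plx for _, plx in members]

# Assumption: the cluster distance is the median of the individual distances.
median_distance = statistics.median(individual_distances)

mag_individual = [absolute_magnitude(m, d)
                  for (m, _), d in zip(members, individual_distances)]
mag_median = [absolute_magnitude(m, median_distance) for m, _ in members]

for (m, plx), mi, mm in zip(members, mag_individual, mag_median):
    print(f"m={m:5.2f}  plx={plx:.2f} mas  "
          f"M(individual)={mi:6.3f}  M(median)={mm:6.3f}")
```

With a common distance, the parallax uncertainty of each member no longer propagates into its absolute magnitude, so the magnitude differences between members equal the differences in their apparent magnitudes.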
C.4 Comparison with literature all-sky star cluster catalogues
Using the gaiaedr3.dr2_neighbourhood table in the Gaia archive, we retrieved the Gaia DR2 identification number of the candidate YSOs selected in our work and thus, using these IDs, we performed the match with the Kerr et al. (2021) list, including 30 518 YSOs within 333 pc and selected with Gaia DR2. We found a total of 9 351 objects in common. Among these, 4 676 are associated to clusters with Myr and 3 914 are associated to clusters with 10 Myr Myr of our catalogue.
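The identifier-based match can be sketched as two steps: translate each EDR3 source into a DR2 identifier, then intersect with the literature list. In the example below the rows mimic the (dr3_source_id, dr2_source_id, angular_distance) columns of the neighbourhood table as we understand it; the column names should be checked against the archive documentation. Choosing the neighbour with the smallest angular distance when several are listed is our assumption, not a rule stated in the text, and all identifiers are invented.

```python
def best_dr2_neighbour(neighbourhood_rows):
    """For each EDR3 source, keep the DR2 neighbour with the smallest angular distance.

    Each row is a tuple (dr3_source_id, dr2_source_id, angular_distance).
    """
    best = {}
    for dr3_id, dr2_id, ang_dist in neighbourhood_rows:
        if dr3_id not in best or ang_dist < best[dr3_id][1]:
            best[dr3_id] = (dr2_id, ang_dist)
    return {dr3_id: dr2_id for dr3_id, (dr2_id, _) in best.items()}


def objects_in_common(our_edr3_ids, literature_dr2_ids, edr3_to_dr2):
    """Return our EDR3 IDs whose DR2 identifier appears in the literature list."""
    literature = set(literature_dr2_ids)
    return [i for i in our_edr3_ids if edr3_to_dr2.get(i) in literature]


# Invented identifiers, for illustration only.
rows = [
    (101, 9001, 12.0),  # two DR2 candidates for EDR3 source 101
    (101, 9002, 1.5),
    (102, 9003, 0.8),
    (103, 9004, 2.1),
]
edr3_to_dr2 = best_dr2_neighbour(rows)

our_ids = [101, 102, 103, 104]          # 104 has no DR2 neighbour
literature_ids = [9002, 9004, 9999]

common = objects_in_common(our_ids, literature_ids, edr3_to_dr2)
print(common)
```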
Using the same procedure for the Kounkel & Covey (2019) and Kounkel et al. (2020) catalogues, which include 288 370 entries up to 1 kpc and 987 376 entries up to 3 kpc, respectively, we find a total of 38 567 and 42 350 YSOs in common. Those associated to SFRs with Myr (young clusters with 10 Myr Myr) are 23 071 (9 494) for the Kounkel & Covey (2019) list, and 25 511 (9 559) for the Kounkel et al. (2020) list. The remaining common stars have been discarded by us since they do not belong to the young age range. We note that, while in the context of the entire all-sky catalogue the fraction of objects in common is very low (13% and 4%), in the region of the Orion Complex it is 67% and 75% (see Sect. C.2). However, we note that our catalogue does not include the string-like massive clusters at kpc, with spatial distribution aligned to the GP, that we discarded in the cluster validation phase (see Sect. 4.2). Instead, the Kounkel & Covey (2019) and Kounkel et al. (2020) lists include many of these objects and this could explain the low fraction of objects in common with respect to the entire catalogue. In addition, the restrictions to the initial data set are very different. For example, we imposed a photometric selection in the extinction uncorrected vs. CAMD, aimed to select mainly objects with age Myr. On the contrary, in the Kounkel & Covey (2019) and Kounkel et al. (2020) catalogues, no photometric selection has been applied and in fact these catalogues include up to Gyr old clusters.
We compared our results also with the list of 2 017 clusters recently published by Cantat-Gaudin & Anders (2020) that includes 234 128 cluster members. They used the most complete list of clusters from the literature and assigned them cluster membership using the UPMASK procedure (Krone-Martins & Moitinho 2014), that is based on the compactness of the groups in the positional space and it is constrained to a fixed field of view. For 1 867 of these clusters, reliable parameters have been derived.
We find that the members presented by Cantat-Gaudin & Anders (2020) in common with our catalogue are 12 438. Those associated to SFRs ( Myr), young (10 Myr Myr) and old ( Myr) clusters are 6 788, 2 519 and 2 109, respectively, corresponding to 66, 38 and 76 clusters in our catalogue, in the same age ranges. They belong to 311 clusters of the Cantat-Gaudin & Anders (2020) list with parallaxes mas, which approximately corresponds to the maximum distance of YSOs identified in our work (the apparent discrepancy in the number of clusters is due to the fact that our catalogue includes merged clusters, each of which can include more than one cluster of the Cantat-Gaudin & Anders (2020) list). In the Cantat-Gaudin & Anders (2020) catalogue, the cluster members with , and are in total 49 074 and therefore we find that only % of YSOs detected by us are in common with Cantat-Gaudin & Anders (2020). Using the ages derived in Cantat-Gaudin & Anders (2020), we find that 226 of the matched clusters are older than 10 Myr.
For the 331 clusters in common, we compared the distances assigned by Cantat-Gaudin & Anders (2020), computed as the inverse of the parallax given for each cluster, with the mean distances obtained by us, computed from the weighted mean parallaxes. Errors on the parallaxes were computed as the error on the mean. The comparison is shown in Fig. 21, where the mean and standard deviation of the residuals between the two measurement sets are also given. The two determinations are consistent, even though there is a bias due to the different Gaia data releases adopted in our work (EDR3) and in Cantat-Gaudin & Anders (2020) (DR2).
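A minimal sketch of this distance comparison for a single cluster is given below. The text does not specify the weights or the exact form of the error on the mean, so inverse-variance weights and the sample standard deviation divided by √N are assumptions made here. All parallax values, including the literature one, are invented.

```python
import math
import statistics


def weighted_mean(values, errors):
    """Inverse-variance weighted mean (weights 1/sigma^2 are an assumption)."""
    weights = [1.0 / e ** 2 for e in errors]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def error_on_mean(values):
    """Standard error of the mean: sample standard deviation / sqrt(N)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def distance_pc(parallax_mas):
    """Distance in pc as the inverse of a parallax given in mas."""
    return 1000.0 / parallax_mas


# Invented member parallaxes and uncertainties (mas) for one cluster.
parallaxes = [2.50, 2.40, 2.60, 2.55, 2.45]
errors = [0.05, 0.10, 0.10, 0.08, 0.08]

mean_plx = weighted_mean(parallaxes, errors)
mean_plx_err = error_on_mean(parallaxes)
our_distance = distance_pc(mean_plx)

# Hypothetical literature parallax (mas) for the same cluster.
literature_distance = distance_pc(2.47)
residual = our_distance - literature_distance

print(f"mean parallax = {mean_plx:.3f} +/- {mean_plx_err:.3f} mas")
print(f"distance = {our_distance:.1f} pc, residual = {residual:.1f} pc")
```

Repeating this for every cluster in common and taking the mean and standard deviation of the residuals gives the kind of summary statistics reported with the comparison figure.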
