Abundance matching analysis of the emission line galaxy sample in the extended Baryon Oscillation Spectroscopic Survey
Abstract
We present the measurements of the small-scale clustering for the emission line galaxy (ELG) sample from the extended Baryon Oscillation Spectroscopic Survey (eBOSS) in the Sloan Digital Sky Survey IV (SDSS-IV). We use conditional abundance matching method to interpret the clustering measurements from to . In order to account for the correlation between properties of emission line galaxies and their environment, we add a secondary connection between star formation rate of ELGs and halo accretion rate. Three parameters are introduced to model the ELG [OII] luminosity and to mimic the target selection of eBOSS ELGs. The parameters in our models are optimized using Markov Chain Monte Carlo (MCMC) method. We find that by conditionally matching star formation rate of galaxies and the halo accretion rate, we are able to reproduce the eBOSS ELG small scale clustering within 1 error level. Our best fit model shows that the eBOSS ELG sample only consists of of all star-forming galaxies, and the satellite fraction of eBOSS ELG sample is 19.3%. We show that the effect of assembly bias is on the two-point correlation function and on the void probability function at scale of .
keywords:
large-scale structure of Universe, (sub)halos, N-body simulation, abundance1 Introduction
In modern cosmology, galaxies form and evolve within dark matter halos. In the early universe, the initial fluctuations of the matter field grow and evolve under gravity. Baryonic matter falls into dark matter halos and forms stars and galaxies. During the evolution of the universe, dark matter halos form through accretion and mergers (e.g., Lacey & Cole, 1993). At the same time, galaxies within dark matter halos form stars and grow through mergers as well. The details of the growth history and merger history depend on cosmological parameters and the physics of galaxy formation. It is important to study the relationship between galaxies and dark matter halos, because it helps us understand the properties of dark matter and constrain cosmological parameters. This paper constructs an empirical model of the newly completed eBOSS emission line galaxy (ELG) sample and its host dark matter halos.
There are many ways to model the relationship between galaxies and dark matter halos (for a recent review, see Wechsler & Tinker 2018). The most physical methods take into consideration the complicated physics of galaxy formation by using hydrodynamic simulations to simulate the galaxy formation process (an example of recent simulations see the EAGLE project; Schaye et al. 2015). Semi-analytical methods (SAMs; White & Frenk, 1991; Kauffmann et al., 1993; Somerville & Primack, 1999; Cole et al., 2000) aim to estimate the evolution of gas and star formation without explicitly incorporating hydrodynamics. Although closer to physical realism, these models are generally very computationally expensive. Fully empirical methods use statistical ways to connect properties of galaxies with properties of dark matter halos. Subhalo abundance matching (SHAM; Kravtsov et al., 2004; Vale & Ostriker, 2004; Conroy et al., 2006), in its most basic form, assumes that each (sub)halo hosts a galaxy and there is a monotonic relation between galaxy mass and (sub)halo mass. A successful SHAM model requires high-resolution dark matter simulations to locate the substructure inside host halos because the subhalos are naturally the local environment for hosting satellite galaxies. Another statistical model, the halo occupation distribution (HOD; Peacock & Smith, 2000; Seljak, 2000; Benson et al., 2000; Zheng, 2004; Scoccimarro et al., 2001), models the probability of finding galaxies of a certain type in a dark matter halo with halo mass . The spatial and velocity distribution of galaxies are determined by the halo mass profiles. HOD modeling also utilizes high-resolution N-body simulations to accurately identify halos and interpret galaxy clustering measurements, but it doesn’t require the identification of substructure inside halos.
In this paper we use abundance matching to model eBOSS ELGs. Compared with physical methods, SHAM circumvents the complex physics of galaxy formation and utilizes dark matter halo properties such as halo mass function, halo bias, halo mass profiles, etc, to interpret galaxy clustering statistics. This method is relatively computationally inexpensive and easy to implement once high-resolution simulations are given. Because of the above reason, SHAM has become a popular tool to study galaxy-halo relation in recent years, along with HOD. They all provide statistical link between dark matter halos and galaxies, and have been widely used in recent spectroscopic surveys (e.g. Avila et al., 2020; Yu et al., 2022). Some efforts are made to add more flavor into classical HOD for more physical understanding of galaxy formation. For example, Hearin et al. (2016) introduced a method to model assembly bias in HOD. Compared with HOD, SHAM naturally provides an opportunity to account for (sub)halo evolution history and model galaxies’ dependency on their environment. This can help us gain insights on physics of galaxy formation.
Despite its simple assumptions about the galaxy-halo connection, abundance matching has proven successful in simultaneously modeling the stellar mass function (SMF) as well as the two-point correlation function (see e.g. Conroy et al., 2006; Chaves-Montero et al., 2016; Rodríguez-Torres et al., 2016, for results on SDSS, DEEP2 and EAGLE simulations). The first abundance matching implementations related the stellar mass monotonically with the halo mass. However, because the masses of subhalos are sensitive to tidal stripping, it is more plausible to relate the stellar mass with properties of subhalos at the time they merged into their host halos. Several proxies are studied and compared with each other later, and it was shown that using the peak maximum circular velocity yields better result than using halo mass (Reddick et al., 2013). Another puzzle piece in the abundance matching model is the scatter in the stellar mass at a given halo property. It describes the variance in stellar mass that cannot be explained by the halo property that we choose. Models with different scatter can produce the same SMF, but with different two-point galaxy clustering statistics as well as the stellar-to-halo mass relations (SHMR). Normally the scatter is assumed to be log-normal and independent of halo mass (Yang et al., 2009; Behroozi et al., 2010), although some recent studies have shown that the scatter may be higher at lower halo masses (Cao et al., 2020; Taylor et al., 2020; Tinker, 2021).
While simple abundance matching models have proven successful, the actual galaxy-halo connection is more complex than the relationship between a single galaxy property with a single halo property. One phenomenon from observations is the galaxy bimodality, which separates the galaxies into a red sequence (usually old galaxies) and a blue sequence (usually young galaxies) in color space. The red galaxies tend to reside in overdense regions while the blue galaxies appear more often in underdense regions. This color-dependent distribution of galaxies can be revealed in the two-point statistics. Hearin & Watson (2013) proposed a method called “conditional abundance matching” for incorporating secondary galaxy properties into such models. At fixed stellar mass (or any other primary halo property) a secondary galaxy property is abundance matched to a secondary halo property. Secondary halo properties, such as halo age and concentration, strongly influence the clustering of halos at fixed mass (see, e.g. Wechsler & Tinker, 2018, and references therein). This is called assembly bias. Using this method, Tinker et al. (2017, 2018a) and Zu & Mandelbaum (2016, 2018) found no strong evidence for secondary correlations between halo properties and galaxy color. This method has also been used to explore various second galaxy properties such as star formation rate (Watson et al., 2015), galaxy morphology (Tinker et al., 2018b) and galaxy size (Hearin et al., 2017). Conditional abundance matching is an extremely useful tool to study the relationship between galaxies and dark matter halos. In this paper we adopt a similar approach as Tinker et al. (2018b) to model star-forming galaxies.
As galaxy redshift surveys push to higher redshifts, ELGs have become the preferred tracer of the dark matter density field. The strong emission lines and the abundance of ELGs at redshift make them suitable to trace dark matter and constrain cosmological models. The SDSS-IV/eBOSS survey (Dawson et al., 2016) is a newly completed spectroscopic survey that measures redshift of 174,000 ELGs at redshift range . It is so far one of the largest ELG sample to measure large-scale clustering. Future surveys that use ELGs as cosmological tracers include DESI (DESI Collaboration et al., 2016a, b), PFS (Takada et al., 2014), 4MOST (de Jong et al., 2014), Euclid (Laureijs et al., 2011), NGRST (formerly known as WFIRST Spergel et al., 2013), etc. Understanding the ELG-halo connection is beneficial to not only future surveys that heavily use ELGs, but also studies of quenching mechanisms of star formation, because the cosmic star formation peaks at around (Madau & Dickinson, 2014). The framework we developed here should be able to be applied to other surveys to inform target selection and large scale structure analysis.
The purpose of this paper is to build a physically-motivated model for the star forming galaxies with the possibility of galaxy assembly bias taken into account. To do so, we try to create physical and realistic ELG mocks with minimum parameters involved. Our model assumes that there are two connections between star forming galaxies and dark matter halos. The first is the monotonic relationship between the peak maximum circular velocity of halos and the stellar mass. The second one is the monotonic relationship between the halo accretion rates and the star formation rates of galaxies. The second relationship which we modulate, is intended to link the physical process of star formation within galaxies with halo assembly history (Watson et al., 2015; Tinker et al., 2018b). This provides physical insights about the galaxy formation to explain the ELG clustering, satellite fractions, etc.
This paper is structured as follows. In §2 we describe the eBOSS/ELG data sample. The methods of measuring small scale clustering and the observed []3727 luminosity function are described in §2.2 and §2.3, respectively. §3 describes the Multidark Planck simulation as well as the details of our abundance matching model. We present our result in §4 and interpret the clustering of eBOSS ELG sample. In §5 we describe our key findings. Throughout the paper we use a spatially flat CDM cosmology with and . It’s also the same cosmology that is used to run the MultiDark Planck 2 simulation.
2 Data
In this section we describe the eBOSS ELG data sample that we use in this paper. Our model will reproduce the distribution of luminosity in the eBOSS ELG sample, as well as the small-scale projected two-point clustering of ELGs.
2.1 eBOSS ELG sample
The extended Baryon Oscillation Spectroscopic Survey (eBOSS) is a newly completed six-year program (2014 – 2020; Dawson et al., 2016) within the Sloan Digital Sky Survey IV (SDSS-IV; Blanton et al., 2017). It uses the Sloan Foundation 2.5-m Telescope at the Apache Point Observatory (Gunn et al., 2006) with the same BOSS fiber spectrographs (Smee et al., 2013) to conduct observations. The goal of the eBOSS is to constrain cosmological models by measuring the large-scale structure of the universe, and it has successfully measured the cosmological distance scale at the percent level at four different redshifts (Alam et al., 2021). The eBOSS survey uses three different tracers to maximize redshift coverage: emission line galaxies (ELGs), luminous red galaxies (LRGs) and quasars. The eBOSS ELG program utilizes the DECam Legacy Survey (DECaLS; Dey et al., 2018) to perform target selection, because it has deeper band photometry than the SDSS imaging (Raichoor et al., 2017). We use the final eBOSS ELG catalog from the SDSS-IV Data Release 16 (DR16) in this paper. It contains 173,736 reliable spectroscopic measurements of redshift between , covering a total effective area of 727 deg2 (Raichoor et al., 2021).
2.2 Observed [] luminosity function
All abundance matching models should begin with observational measurements of the luminosity function or the mass function of the sample being modeled. Although we are modeling a sub-class of galaxies, this is still a critical piece of constructing our model. From the eBOSS ELG sample we measure the []3727 luminosity function, which quantifies the abundance of the ELG at a given [] luminosity. The reason we choose [] luminosity function instead of stellar mass function is that [] flux can be obtained directly from the spectra, while stellar masses of ELGs are measured in indirect ways such as spectral energy distribution (SED) fitting.
The [] flux of each ELG is extracted using the IDLSPEC2D eBOSS pipeline. The [] luminosity of each ELG is then calculated as
(1) |
where is the [] flux and is the luminosity distance at redshift . We show in Fig. 1 the observed [] luminosity of the eBOSS ELG sample, measured in the luminosity range of with logarithmic bins of dex. The errors are estimated using a bootstrap method. The Gaussian shape of the measured luminosity function is a product of the standard schechter form for such data and the eBOSS ELG selection function, which becomes inefficient at low luminosities.
2.3 Small-scale clustering
Besides the one-point statistics of galaxies, i.e., the galaxy luminosity function, another important tool to study the statistics of galaxies is the two-point correlation function (2PCF) . It describes the excess probability to find a galaxy pair with separation over a random galaxy field (Peebles, 1980). Quantifying the 2PCF of galaxies is useful because it distinguishes models with different galaxy bias at scales of and constrains the fraction of satellite galaxies at scales of . Bias is defined as the ratio of clustering relative to clustering of dark matter, . The bias of dark matter halo is not sensitive to the halo mass for small halos, however it is very sensitive to the halo mass for halos with higher mass. The large scale bias provides information on the overall halo mass scale for the galaxy sample, while the small scale clustering constrains the fraction of the sample that are satellite galaxies.
We use the Landy & Szalay estimator (Landy & Szalay, 1993) to measure the projected 2PCF for the eBOSS ELG sample. We correct the systematic effects on the clustering using the weighting scheme described in Raichoor et al. (2021). Each individual ELG is weighted by
(2) |
where and are the FKP weights (Feldman et al., 1994), the imaging systematic weights, the close-pair weights, and the redshift failure weights, respectively. We refer readers to Raichoor et al. (2021) for details of ELG systematic weights. We choose 17 logarithmic bins from to , covering both the one-halo term and two-halo term ranges. The errors are estimated with the jackknife resampling technique using 25 subsamples. Our measurements are presented in Fig. 2.
Estimating the covariance matrix directly from the data is noisy given the limited size of ELG sample. Thus to quantify the uncertainties in ELG clustering, we utilize the GLAM-QPM mocks from Lin et al. (2020). This set of mocks model the linear bias of eBOSS ELG sample at the scales we are interested. We first compute the covariance matrix from 2000 GLAM-QPM mocks, where and are the clustering measurements at the and bin, respectively. Then we up-scale it by the jackknife errorbar of ,
(3) |
where and are the errors of the clustering measurements at the and bin, respectively.
3 Theoretical Model
Abundance matching is a technique to describe the relationship between galaxies and dark matter halos. As described in §1, a scheme that only maps one property of galaxies with one property of halos may not be sufficient, especially for star forming galaxies like ELGs. So we adopted a conditional abundance matching model which not only matches primary properties, but also conditionally matches the star formation rate with the halo accretion rate. Although this model is physically motivated and supported by local observations, we test the impact of removing this correlation as well. We will describe the details of our method in the following sub-sections. Here we briefly summarize our abundance matching procedure of modeling eBOSS ELGs in the following steps:
-
•
In order to generate a mock galaxy catalog for all galaxies, we apply the abundance matching technique with a 0.15 dex scatter in stellar mass at fixed of halo.
-
•
To separate star-forming galaxies from passive galaxies, we utilize the quenched fraction from Tinker et al. (2013) for both central and satellite galaxies. We randomly select actively star-forming galaxies from the overall mock galaxy catalog to match the observed fractions.
-
•
A star forming main sequence fitting model is used to estimate star formation rate (SFR) of each mock galaxy, with a 0.3 dex scatter. We then perform a rank matching of the SFR with the accretion rate of the host halo at fixed stellar mass.
-
•
The luminosity of each galaxy in the mock catalog is estimated using each galaxy’s SFR with an empirical correction. We further use a parameterized selection function based on the luminosity in order to mimic the target selection of eBOSS ELG. This step introduces three free parameters. The description of the parameters is listed in Table 1.
-
•
For each parameter set, we construct the luminosity function and the two-point correlation function from the mock catalogs. The likelihood of the luminosity function as well as the two-point correlation function are used in the MCMC chain to optimize the parameters.
parameter | value |
0.15 dex (Behroozi et al., 2010) | |
0.3 dex (Noeske et al., 2007) | |
0.12 dex (best-fit) | |
41.7 dex (best-fit) | |
0.63 dex (best-fit) |
3.1 Dark matter simulations
In this paper we use the dark matter halo catalogs from the MultiDark Planck 2 simulation (MDPL2111https://www.cosmosim.org/cms/simulations/mdpl2/; Prada et al., 2012; Klypin et al., 2016). The MDPL2 simulation adopts a flat CDM cosmology with cosmological parameters described in §1, and is run by the L-GADGET-2 code, which is a variant of cosmological code GADGET-2 (Springel, 2005). The box size of the simulation is , and the mass resolution is . We choose the simulation output at redshift , roughly equal to the ELG effective redshift . Halos are found using the RockStar algorithm (Behroozi et al., 2013). The peak maximum circular velocity and the peak halo mass are determined throughout the merger history. The attributes in the dark matter halo catalogs that are used in this paper and their definitions are presented in Table 2. The reason we choose MDPL2 simulation is two folds. First it has high mass resolution, allowing us to locate halos at low mass end (around ). This is crucial in our study because ELGs are young galaxies that occupy lower-mass halos. Second, the volume of the simulation is large enough compared with eBOSS ELG footprint for us to get a good comparison to the data.
attribute name | unit | definition |
the peak halo mass over accretion history | ||
the peak maximum circular velocity over accretion history | ||
---|---|---|
halo mass enclosed within overdensity | ||
growth rate of |
3.2 Abundance Matching Model
3.2.1 Stellar Mass Function of All Galaxies
Since our abundance matching approach uses the stellar mass as the galaxy property to match with of halos, it requires the abundance of galaxies as a function of their stellar masses. We use the fitted SMF from Ilbert et al. (2013) at redshift . Note that the SMF from Ilbert et al. (2013) is for all galaxies. The ELG sample from eBOSS that has been selected using color cuts, however, is a subset of the full sample. We will tackle this problem later in §3.2.3.
3.2.2 SubHalo Abundance Matching
We follow the procedure of Behroozi et al. (2010) in order to match and according to their abundance. First we deconvolve the scatter from the true SMF to get a ‘direct’ SMF, which is used to match with directly. We assign a constant 0.15 dex scatter in at a given (Behroozi et al., 2010). Second, we perform the abundance matching using the direct SMF with from the MDPL2 halo catalog, and produce a mock galaxy catalog. At last we re-add the log-normal scatter to the stellar mass in the mock galaxy catalog, yielding a match to the input SMF from Ilbert et al. (2013).
3.2.3 Quenched Fraction
The [OII] doublet emission lines in the spectrum of ELGs indicate that they are strongly star-forming. However, the mock galaxy catalog that we built using SMF from Ilbert et al. (2013) contains all types of galaxies. In order to distinguish the young, blue and star-forming galaxies with the old, red and quiescent galaxies, we apply the quenched fraction at from Tinker et al. (2013). The quenched fraction is defined as the fraction of galaxies that has no star formation (i.e., is quenched). In Tinker et al. (2013), the separation of quenched and star-forming galaxies is based on the standard UVJ diagram. For the galaxy mass range we consider, the quenched fraction monotonically increases with stellar mass. The physical mechanisms of star formation quenching can be different for central galaxies and satellite galaxies. For central galaxies, the quenching is likely due to halo mergers or AGN feedback, whereas for satellite galaxies it is more likely due to tidal stripping from the host halos and ram pressure stripping from intracluster medium. Hence the relationship between the quenched fraction and the stellar mass is different for central galaxies and satellite galaxies. Being able to distinguish the different quenched fractions for centrals and satellites is important, because it could have impacts on the small-scale galaxy clustering. Getting the right amount of satellites helps us constrain the clustering at scales of .
We apply the quenched fraction for both centrals and satellites from Tinker et al. (2013) in the stellar mass range , and we perform a linear interpolation between each pair of adjacent data points ( vs. ) as well as a linear extrapolation for the low mass end and the high mass end . If the linear extrapolation of quenched fraction exceeds 1, we truncate the value to 1. We then take the mock galaxy catalog from §3.2.2 and randomly remove galaxies according to the quenched fraction based on their stellar masses and whether they are central galaxies. In this step we take a random subset of halos to host quenched central galaxies. This matches observations that the quenched fraction of centrals is independent of environment (Tinker et al., 2017, 2018a), it only depends on the stellar mass in our case.
One alternative way to compute the quenched fraction is performing a linear fit of vs. in Tinker et al. (2013) instead of linear interpolation/extrapolation as what we just described. We performed a sensitivity test and did not see a significant difference in the galaxy clustering between these two methods.
3.2.4 Conditional Abundance Matching
So far we have constructed a mock galaxy catalog for star forming galaxies using a single connection between stellar mass and peak maximum circular velocity of halos. However, the properties of star forming galaxies are correlated with their environment, and the assembly bias and galaxy formation has not yet been incorporated in the current model. Here we link the SFR of galaxies with the accretion rate of halos at a fixed stellar mass, using the conditional abundance matching (CAM; Hearin & Watson, 2013; Hearin et al., 2014) approach employed in Tinker et al. (2018b). In this way, for a given stellar mass, the galaxy which has the highest SFR resides in the halo which has the highest growth rate (Tinker et al., 2018b). This is the maximal impact of assembly bias. We also show results when this correlation is removed completely, thus showing the range of possible models. In order to calculate SFR for each galaxy in the mock catalog, we use the star forming main sequence fitting model from Lee et al. (2015):
(4) |
where , , , and . We use the best-fit parameters for redshift range . We add a 0.3 dex scatter to (Noeske et al., 2007), and match SFR with halo mass accretion rate given a stellar mass bin.
3.2.5 Estimating
In §3.2.4 we use CAM to link the SFR of galaxies with the growth rate of host halos, under the assumption that galaxies and halos co-evolve. Generally speaking, a galaxy with a higher SFR will have a stronger emission line in its spectrum, and it is more likely for us to observe. But the eBOSS ELG target selection was conducted using a color cut in the observed frame vs. space, as well as a cut on the -band magnitude. The color and the SFR of a galaxy are not perfectly correlated with each other. For a galaxy with active star formation, it is gas rich and can appear red due to dust attenuation. In addition, the redshifts from eBOSS ELGs were measured using doublet, and the luminosity is strongly affected by dust. Moreover, the doublet emission lines are produced from metal ions by collisional excitation, and is thus correlated with metallicity. Given the reasons above, the SFR of a galaxy is not a direct proxy for us to select eBOSS ELG targets from the mock catalog. Instead, we estimate the luminosity from the SFR in the mock catalog using an empirical correction from Gilbank et al. (2010),
(5) |
where . And we introduce a scatter in given a SFR, , as a free parameter in our model.
To mimic the target selection of eBOSS ELGs, an erf function is used to cut mocks based on . The magnitude and color limits in the ELG target selection will not produce a hard threshold in , and the erf function with a transition width is intended to mimic this effect. Eq. 6 describes the probability of a galaxy in the mock with luminosity being selected as an eBOSS ELG target. This introduces two free parameters: is the luminosity that has 50% probability for an ELG to be selected, and is the transition width of the selection function. The coefficient is added to describe the incompleteness of the eBOSS ELG sample. Note that the coefficient to the erf function is not a free parameter as it is constrained by the eBOSS ELG number density.
(6) |
3.3 The model parameters and MCMC
As described in §3.2, the 3 free parameters in our abundance matching model are , and . Our implementation of the model is able to quickly produce mock galaxy catalogs and accurately compute small scale clustering for a given parameter set, making it ideal to be constrained using MCMC method. For each trial parameter set in the MCMC chain, we calculate a likelihood of the observed luminosity function and . We use the emcee package (Foreman-Mackey et al., 2013), which implements the affine-invariant ensemble sampler (Goodman & Weare, 2010), to perform the MCMC analysis.
For the luminosity function, we use KL-divergence (also called relative entropy) as the metric. Suppose we have the normalized luminosity function from mock and the normalized luminosity function from ELG observation , the KL-divergence of from is defined as:
(7) |
Note that the luminosity function is normalized here, i.e.,
The KL-divergence measures the information gain when using to approximate . With smaller KL-divergence, the mismatch between and is smaller. The KL-divergence is always non-negative, with if and only if everywhere.
We forbid parameter sets that yield too much discrepancy on between model and ELG data by manually restricting . One can interpret this constraint as if we have a prior probability distribution and we observe a posterior probability distribution , then the log-likelihood must be greater than -0.01. The reason we use this approach instead of using to constrain on is that we have a relatively more accurate measurement on the luminosity than the two-point clustering. So if we combine the of with the of as the full log-likelihood in the MCMC, the posterior distribution of parameters will be dominated by matching the luminosity function, which is not what we intended to do. Given the reason above, we choose a reasonable hard threshold cut for constraining the luminosity function and let the MCMC focus on matching the two-point correlation function.
In order to reproduce the small scale clustering of the eBOSS ELG sample, we constrain parameters using the projected correlation function . Let denote the projected correlation function from our model given a given set of parameters and denote the projected correlation function from eBOSS ELG sample. We utilize the covariance matrix from GLAM-QPM mocks to describe the distance between and . Assuming the has a multivariate normal distribution around its expected value, the log-likelihood of obtaining is:
(8) |
where is the dimension of , in our case . This assumption can be validated through 2000 GLAM-QPM mocks, we measure cross- with 12 data points between and and the results indeed follow distribution with 12 degrees of freedom. Since and the covariance matrix are constants, the first term of is a constant and doesn’t change if our model parameters change.
4 Results
4.1 ELG luminosity function and clustering measurements
The main goal of this work is to construct a model for the one-point and two-point statistics of the eBOSS ELG sample with conditional abundance matching. We run 600,000 MCMC iterations to constrain the three free parameters described in §3. In Fig. 1 we present the comparison between the luminosity function of the eBOSS ELG sample and that from our abundance matching model. Within the space of models that pass the constraint of , we find a good match to the data. Note that although the discrepancy between the ELG luminosity function and our model is small, the errorbars of the ELG luminosity function, estimated using bootstrap method, are even smaller. This illustrates the point we made in §3.3 that if we add of in MCMC to optimize the parameters, the model will mainly focus on matching instead of . So our approach focuses on the two-point clustering instead but also has a good constraint on the one-point statistics.

In Fig. 2 we show comparison of projected two-point correlation function between the eBOSS ELG sample and our model result, with the shaded regions as the 1 error. Overall the model and the ELG agree well at the 1 level. The best fit of is with 17 degrees of freedom for . The corresponding p-value is not significant at , indicating that there is no statistical significant between our fitted model and the observed data. The reduced is slightly below 1, which is likely due to the method that we up-scale the errorbar of in Sec. 2.3. At scales of our model produces relatively lower clustering, and at our is higher than the eBOSS ELG sample. This is likely due to the reason that our quenched fraction approach yields a slightly more satellites than the eBOSS ELG sample, so our model predicted is higher at the transition scale between 1-halo term and 2-halo term.

The constraints on the parameters are presented in Fig. 3. We find that the scatter in at fixed SFR is dex. While most of the literature focus on estimating SFR using [OII] flux, there is limited discussion of the scatter in luminosity. However we could utilize the results from Moustakas et al. (2006), where they find that dex for various dust corrections. Since the is proportional to SFR according to Eq. 5, and the empirical correction term is negatively correlated with SFR, the slope of against SFR should be less than 1. Thus the should be less than , which justifies our finding.
We find that the incompleteness coefficient in Eq. 6 is under the best MCMC fit. This means that the eBOSS ELG sample selects 12% of star-forming galaxies at mass range from our mock catalog. This is in accordance with the HOD results of Guo et al. (2019), where they find that the eBOSS ELG target selection only select 1-10% of the star-forming galaxies at high mass end for various redshifts. Guo et al. (2019) also find that the completeness decreases at the low mass end. Accordingly, we use an error function to mimic the target selection at the low mass end and assume a universal incompleteness rate. Our best fit dex, which model the sharpness of the selection due to the ELG target selection at the low end.

4.2 Effect of galaxy assembly bias
In our conditional abundance matching approach we connect the SFR of galaxies with the accretion rate of halos in order to model the galaxy assembly bias. Here we define galaxy assembly bias as the dependence of galaxy properties on a second halo property that is related with halo assembly history. To test the impact of assembly bias, we perform a parallel analysis that repeats the fiducial model, with the exception of removing any correlation between SFR and halo growth rate. In that case, SFR only depends on stellar mass. In Fig. 4 we show the comparison between our best fit model and the model without the assembly bias. The two models have the same number density and satellite fraction. From the left panel in Fig. 4 we can see that both two models can match the luminosity well, and from the right panel we find that the two-point correlation function without assembly bias is generally higher than the model with assembly bias. By measuring the of the model without assembly bias, we find that removing the assembly bias causes 2PCF to deviate from the best fit and fits worse than our fiducial model. This result is expected: if the assembly bias is not included in our model, the star-forming galaxies will more likely reside in older halos given a fixed halo mass bin from to . Thus the two-point clustering of galaxies will be higher because older halos are more clustered.

We also test the impact of galaxy assembly bias on the void probability function (VPF). The VPF is defined as the probability of a randomly placed sphere with radius containing no galaxy of a certain type. Compared with the two-point statistics, it is a useful tool to test whether halo occupation or galaxy bias changes from high to low density environment (Tinker et al., 2006). We make measurements for VPF in the range of with linear binning of . For each radius we place spheres randomly in our mock catalog to obtain a relatively precise measurement of VPF. The errors on VPF are estimated using bootstrap method with 1000 resampling size. In Fig. 5 we show the difference of VPF with and without galaxy assembly bias. The model without assembly bias produces around 5% larger VPF at scale of , as shown in the lower panel of Fig. 5. Note that the difference is more than twice of the errorbar, meaning that it is statistically significant, and can be used as a test with current and upcoming data.

4.3 Halo Occupation
We investigate the mean halo occupation function of our abundance matching model, as shown in Fig. 6. The centrals and the satellites are shown in black dashed line and black dotted line, respectively. The result indicates that ELG central galaxy can be modeled as a log-normal distribution at low mass end and a decreasing power-law at high mass end. According to our model, the occupation numbers of ELG at high halo mass end decreases, but not dramatically. We compare our result with Guo et al. (2019) at redshift , as shown in the red line. Our amplitude of HOD function is higher than Guo et al. (2019) because we only show their results for redshift bin . Guo et al. (2019) finds a significant decrease of the occupation function for centrals at high mass end, likely because they use a higher quenched fraction than we do. We also compare our result with Lin et al. (2020), and we find a similar satellite HOD. The difference in the central HOD is due to Lin et al. (2020) adopting a Gaussian function for central galaxies.
Recently, Avila et al. (2020) explored different forms of HOD model to study the eBOSS ELG sample. They have tested three different formulations for the central HOD: a step-wise erf function, a Gaussian function, and a Gaussian + decaying power-law at high mass end. The difference between those three formulas is mainly at the high mass end. With the same power-law for the satellite HOD, they find that the shape of the central HOD has a less significant impact on the two-point clustering than the satellite fraction. The data marginally rules out the Gaussian formula, and there is negligible difference between the step-wise erf function and the Gaussian + decaying power-law, which is in accordance with our result.
According to our best fit model, the satellite fraction is %, agreeing with an overall percent satellite fraction for all star-forming galaxies at (Tinker et al., 2013). Favole et al. (2016) also found that the satellite fraction of ELGs at redshift 0.8 is .

5 Summary
This paper presents a methodology for constructing subhalo-based models of emission line galaxies that are physically motivated but contain limited number of free parameters that can be constrained by two-point statistics. Our method utilizes conditional abundance matching to model the one-point and two-point statistics of the eBOSS ELG sample. We create a mock galaxy catalog by matching the stellar mass of galaxies with the peak circular velocity of halos. We use quenched fraction from Tinker et al. (2013) to select star-forming galaxies. The eBOSS ELG target selection is modeled as an error function of luminosity. We have demonstrated that by conditionally matching the SFR of galaxies and halo growth rate, we are able to model the one-point and two-point statistics as well as the assembly bias of the eBOSS ELG sample. The free parameters in our model are all related to luminosity, and are determined by the MCMC method.
Our model is able to reproduce the projected two-point correlation function of eBOSS ELG within 1 error. According to our model, we find that the eBOSS ELG sample only selects of the star-forming galaxies at , agreeing with the findings in Guo et al. (2019). The satellite fraction is 19.4%, which is very similar to the satellite fraction for star-forming galaxies in Tinker et al. (2013).
The star formation activity at makes ELGs ideal tracers for spectroscopic surveys. The approach that we develop in this paper can be used for testing galaxy formation physics, as well as producing high-fidelity mock galaxy catalogs for large-scale structure analysis. Although we present an analysis of eBOSS data, the method is flexible and can be applied to any survey with an ELG-type sample selection.
acknowledgments
SL is grateful to support from the CCPP at New York University. SL, JLT and MRB are supported by NSF Award 1615997.
Funding for the Sloan Digital Sky Survey IV has been provided by the Alfred P. Sloan Foundation, the U.S. Department of Energy Office of Science, and the Participating Institutions. SDSS-IV acknowledges support and resources from the Center for High-Performance Computing at the University of Utah. The SDSS web site is www.sdss.org.
SDSS-IV is managed by the Astrophysical Research Consortium for the Participating Institutions of the SDSS Collaboration including the Brazilian Participation Group, the Carnegie Institution for Science, Carnegie Mellon University, the Chilean Participation Group, the French Participation Group, Harvard-Smithsonian Center for Astrophysics, Instituto de Astrofísica de Canarias, The Johns Hopkins University, Kavli Institute for the Physics and Mathematics of the Universe (IPMU) / University of Tokyo, Lawrence Berkeley National Laboratory, Leibniz Institut für Astrophysik Potsdam (AIP), Max-Planck-Institut für Astronomie (MPIA Heidelberg), Max-Planck-Institut für Astrophysik (MPA Garching), Max-Planck-Institut für Extraterrestrische Physik (MPE), National Astronomical Observatory of China, New Mexico State University, New York University, University of Notre Dame, Observatário Nacional / MCTI, The Ohio State University, Pennsylvania State University, Shanghai Astronomical Observatory, United Kingdom Participation Group, Universidad Nacional Autónoma de México, University of Arizona, University of Colorado Boulder, University of Oxford, University of Portsmouth, University of Utah, University of Virginia, University of Washington, University of Wisconsin, Vanderbilt University, and Yale University.
Data availability
The eBOSS ELG catalogs and the GLAM-QPM mock galaxy catalogs are available from the eBOSS DR16 galaxy catalog release.
References
- Alam et al. (2021) Alam S., et al., 2021, Phys. Rev. D, 103, 083533
- Avila et al. (2020) Avila S., et al., 2020, MNRAS, 499, 5486
- Behroozi et al. (2010) Behroozi P. S., Conroy C., Wechsler R. H., 2010, ApJ, 717, 379
- Behroozi et al. (2013) Behroozi P. S., Wechsler R. H., Wu H.-Y., 2013, ApJ, 762, 109
- Benson et al. (2000) Benson A. J., Cole S., Frenk C. S., Baugh C. M., Lacey C. G., 2000, MNRAS, 311, 793
- Blanton et al. (2017) Blanton M. R., et al., 2017, AJ, 154, 28
- Cao et al. (2020) Cao J.-z., Tinker J. L., Mao Y.-Y., Wechsler R. H., 2020, MNRAS, 498, 5080
- Chaves-Montero et al. (2016) Chaves-Montero J., Angulo R. E., Schaye J., Schaller M., Crain R. A., Furlong M., Theuns T., 2016, MNRAS, 460, 3100
- Cole et al. (2000) Cole S., Lacey C. G., Baugh C. M., Frenk C. S., 2000, MNRAS, 319, 168
- Conroy et al. (2006) Conroy C., Wechsler R. H., Kravtsov A. V., 2006, ApJ, 647, 201
- DESI Collaboration et al. (2016a) DESI Collaboration et al., 2016a, arXiv e-prints, p. arXiv:1611.00036
- DESI Collaboration et al. (2016b) DESI Collaboration et al., 2016b, arXiv e-prints, p. arXiv:1611.00037
- Dawson et al. (2016) Dawson K. S., et al., 2016, AJ, 151, 44
- Dey et al. (2018) Dey A., et al., 2018, arXiv e-prints,
- Favole et al. (2016) Favole G., et al., 2016, MNRAS, 461, 3421
- Feldman et al. (1994) Feldman H. A., Kaiser N., Peacock J. A., 1994, ApJ, 426, 23
- Foreman-Mackey et al. (2013) Foreman-Mackey D., Hogg D. W., Lang D., Goodman J., 2013, PASP, 125, 306
- Gilbank et al. (2010) Gilbank D. G., Baldry I. K., Balogh M. L., Glazebrook K., Bower R. G., 2010, MNRAS, 405, 2594
- Goodman & Weare (2010) Goodman J., Weare J., 2010, Communications in Applied Mathematics and Computational Science, 5, 65
- Gunn et al. (2006) Gunn J. E., et al., 2006, AJ, 131, 2332
- Guo et al. (2019) Guo H., et al., 2019, ApJ, 871, 147
- Hearin & Watson (2013) Hearin A. P., Watson D. F., 2013, MNRAS, 435, 1313
- Hearin et al. (2014) Hearin A. P., Watson D. F., Becker M. R., Reyes R., Berlind A. A., Zentner A. R., 2014, MNRAS, 444, 729
- Hearin et al. (2016) Hearin A. P., Zentner A. R., van den Bosch F. C., Campbell D., Tollerud E., 2016, MNRAS, 460, 2552
- Hearin et al. (2017) Hearin A., Behroozi P., Kravtsov A., Moster B., 2017, arXiv e-prints, p. arXiv:1711.10500
- Ilbert et al. (2013) Ilbert O., et al., 2013, A&A, 556, A55
- Kauffmann et al. (1993) Kauffmann G., White S. D. M., Guiderdoni B., 1993, MNRAS, 264, 201
- Klypin et al. (2016) Klypin A., Yepes G., Gottlöber S., Prada F., Heß S., 2016, MNRAS, 457, 4340
- Kravtsov et al. (2004) Kravtsov A. V., Berlind A. A., Wechsler R. H., Klypin A. A., Gottlöber S., Allgood B., Primack J. R., 2004, ApJ, 609, 35
- Lacey & Cole (1993) Lacey C., Cole S., 1993, MNRAS, 262, 627
- Landy & Szalay (1993) Landy S. D., Szalay A. S., 1993, ApJ, 412, 64
- Laureijs et al. (2011) Laureijs R., et al., 2011, arXiv e-prints, p. arXiv:1110.3193
- Lee et al. (2015) Lee N., et al., 2015, ApJ, 801, 80
- Lin et al. (2020) Lin S., et al., 2020, MNRAS, 498, 5251
- Madau & Dickinson (2014) Madau P., Dickinson M., 2014, ARA&A, 52, 415
- Moustakas et al. (2006) Moustakas J., Kennicutt Robert C. J., Tremonti C. A., 2006, ApJ, 642, 775
- Noeske et al. (2007) Noeske K. G., et al., 2007, ApJ, 660, L47
- Peacock & Smith (2000) Peacock J. A., Smith R. E., 2000, MNRAS, 318, 1144
- Peebles (1980) Peebles P. J. E., 1980, The large-scale structure of the universe
- Prada et al. (2012) Prada F., Klypin A. A., Cuesta A. J., Betancort-Rijo J. E., Primack J., 2012, MNRAS, 423, 3018
- Raichoor et al. (2017) Raichoor A., et al., 2017, MNRAS, 471, 3955
- Raichoor et al. (2021) Raichoor A., et al., 2021, MNRAS, 500, 3254
- Reddick et al. (2013) Reddick R. M., Wechsler R. H., Tinker J. L., Behroozi P. S., 2013, ApJ, 771, 30
- Rodríguez-Torres et al. (2016) Rodríguez-Torres S. A., et al., 2016, MNRAS, 460, 1173
- Schaye et al. (2015) Schaye J., et al., 2015, MNRAS, 446, 521
- Scoccimarro et al. (2001) Scoccimarro R., Sheth R. K., Hui L., Jain B., 2001, ApJ, 546, 20
- Seljak (2000) Seljak U., 2000, MNRAS, 318, 203
- Smee et al. (2013) Smee S. A., et al., 2013, AJ, 146, 32
- Somerville & Primack (1999) Somerville R. S., Primack J. R., 1999, MNRAS, 310, 1087
- Spergel et al. (2013) Spergel D., et al., 2013, arXiv e-prints, p. arXiv:1305.5422
- Springel (2005) Springel V., 2005, MNRAS, 364, 1105
- Takada et al. (2014) Takada M., et al., 2014, PASJ, 66, R1
- Taylor et al. (2020) Taylor E. N., et al., 2020, MNRAS, 499, 2896
- Tinker (2021) Tinker J. L., 2021, ApJ, 923, 154
- Tinker et al. (2006) Tinker J. L., Weinberg D. H., Warren M. S., 2006, ApJ, 647, 737
- Tinker et al. (2013) Tinker J. L., Leauthaud A., Bundy K., George M. R., Behroozi P., Massey R., Rhodes J., Wechsler R. H., 2013, ApJ, 778, 93
- Tinker et al. (2017) Tinker J. L., Wetzel A. R., Conroy C., Mao Y.-Y., 2017, MNRAS, 472, 2504
- Tinker et al. (2018a) Tinker J. L., Hahn C., Mao Y.-Y., Wetzel A. R., Conroy C., 2018a, MNRAS, 477, 935
- Tinker et al. (2018b) Tinker J. L., Hahn C., Mao Y.-Y., Wetzel A. R., 2018b, MNRAS, 478, 4487
- Vale & Ostriker (2004) Vale A., Ostriker J. P., 2004, MNRAS, 353, 189
- Watson et al. (2015) Watson D. F., et al., 2015, MNRAS, 446, 651
- Wechsler & Tinker (2018) Wechsler R. H., Tinker J. L., 2018, ARA&A, 56, 435
- White & Frenk (1991) White S. D. M., Frenk C. S., 1991, ApJ, 379, 52
- Yang et al. (2009) Yang X., Mo H. J., van den Bosch F. C., 2009, ApJ, 693, 830
- Yu et al. (2022) Yu J., et al., 2022, MNRAS, 516, 57
- Zheng (2004) Zheng Z., 2004, ApJ, 610, 61
- Zu & Mandelbaum (2016) Zu Y., Mandelbaum R., 2016, MNRAS, 457, 4360
- Zu & Mandelbaum (2018) Zu Y., Mandelbaum R., 2018, MNRAS, 476, 1637
- de Jong et al. (2014) de Jong R. S., et al., 2014, in Ramsay S. K., McLean I. S., Takami H., eds, Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series Vol. 9147, Ground-based and Airborne Instrumentation for Astronomy V. p. 91470M, doi:10.1117/12.2055826