Exploring TeV candidates of Fermi blazars through machine learning

J. T. Zhu Center for Astrophysics, Guangzhou University, Guangzhou 510006, China Department of Physics and Astronomy “G. Galilei”, University of Padova, Padova I-35131, Italy Astronomy Science and Technology Research Laboratory of Department of Education of Guangdong Province, Guangzhou 510006, China Key Laboratory for Astronomical Observation and Technology of Guangzhou, Guangzhou 510006, China C. Lin Center for Astrophysics, Guangzhou University, Guangzhou 510006, China Astronomy Science and Technology Research Laboratory of Department of Education of Guangdong Province, Guangzhou 510006, China Key Laboratory for Astronomical Observation and Technology of Guangzhou, Guangzhou 510006, China H. B. Xiao Shanghai Key Lab for Astrophysics, Shanghai Normal University Shanghai, 200234, China J. H. Fan Center for Astrophysics, Guangzhou University, Guangzhou 510006, China Astronomy Science and Technology Research Laboratory of Department of Education of Guangdong Province, Guangzhou 510006, China Key Laboratory for Astronomical Observation and Technology of Guangzhou, Guangzhou 510006, China D. Bastieri Center for Astrophysics, Guangzhou University, Guangzhou 510006, China Department of Physics and Astronomy “G. Galilei”, University of Padova, Padova I-35131, Italy G. G. Wang Center for Astrophysics, Guangzhou University, Guangzhou 510006, China Astronomy Science and Technology Research Laboratory of Department of Education of Guangdong Province, Guangzhou 510006, China Key Laboratory for Astronomical Observation and Technology of Guangzhou, Guangzhou 510006, China

(Received XXX; Revised YYY; Accepted ZZZ)

Abstract

In this work, we make use of a supervised machine learning algorithm based on Logistic Regression (LR) to select TeV blazar candidates from the 4FGL-DR2 / 4LAC-DR2, 3FHL, 3HSP, and 2BIGB catalogs. LR constructs a hyperplane based on a selection of optimal parameters, named features, and hyper-parameters whose values control the learning process and determine the values of features that a learning algorithm ends up learning, to discriminate TeV blazars from non-TeV blazars. In addition, it gives the probability (or logistic) that a source may be considered as a TeV blazar candidate. Non-TeV blazars with logistics greater than 80% are considered high-confidence TeV candidates. Using this technique, we identify 40 high-confidence TeV candidates from the 4FGL-DR2 / 4LAC-DR2 blazars and we build the feature hyper-plane to distinguish TeV and non-TeV blazars. We also calculate the hyper-planes for the 3FHL, 3HSP, and 2BIGB. Finally, we construct the broadband spectral energy distributions (SED) for the 40 candidates, testing for their detectability with various instruments. We find that 7 of them are likely to be detected by existing or upcoming IACT observatories, while 1 could be observed with EAS particle detector arrays.

AGN; galaxies-active; galaxies-Quasars; galaxies-BL Lacertae objects; data analysis; machine learning

1 Introduction

Blazars, an extreme subclass of active galactic nuclei (AGNs), are known for prominent observation properties, such as high energy $\gamma$ -ray emissions, rapid and significant amplitude variability, high luminosity, high and variable polarization, and superluminal motions, etc. (Wills et al., 1992; Urry & Padovani, 1995; Fan, 2002; Villata et al., 2006; Fan et al., 2014; Xiao et al., 2015; Gupta et al., 2016; Xiao et al., 2019; Abdollahi et al., 2020; Xiao et al., 2020; Fan et al., 2021). Blazars are historically subdivided into two main categories based on the equivalent width (EW) of the optical emission lines: flat spectrum radio quasars (FSRQs) and BL Lacertae objects (BL Lacs). In general, FSRQs show an EW greater than 5 $\mathring{A}$ , while BL Lacs illustrate no or weak emission lines, with EW less than 5 $\mathring{A}$ . Meanwhile, the spectral energy distributions (SEDs) of blazars are generally characterized by two well-separated bumps, a low-energy one is in the infrared to soft X-ray energy range due to synchrotron emission, and a high-energy one is in the region between hard X-ray to $\gamma$ -ray that is associated with an inverse Compton (IC) radiation according to leptonic model (IC; e.g., Sikora et al. 1994a). The seed photons undergoing IC scattering could be from the same electron population producing the synchrotron bump in the so-called self-Compton (SSC) model (Ghisellini et al., 1985; Maraschi et al., 1992; Bloom & Marscher, 1996), e.g. 1ES 0347-121, 1ES 0229+200 (Costamante et al., 2018; Aharonian et al., 2007a, b), or from external regions (External Compton model, EC), e.g., from the accretion disk (Dermer & Schlickeiser, 1993), broad line region (Sikora et al., 1994b), and dust torus (Błażejowski et al., 2000). While the hadronic process could also contribute to the high-energy bump by high-energy cosmic rays via the photohadronic reactions (Dimitrakoudis et al., 2012; Tavecchio, 2014). SEDs are also used to make classifications of blazars. In the original BL Lac classification of Padovani & Giommi (1995), BL Lacs are classified as two subclasses: ‘low synchrotron peaked (LSP) BL Lac’ or LBL and ‘high synchrotron peaked sources (HSP) BL Lac’ or HBL, depending on their broadband radio-to-X-ray spectral index is larger or smaller than 0.75. Abdo et al. (2010) later extended the classification to all blazars. They suggested to group blazars into three subclasses based on synchrotron peak frequency, LSP, $\log\nu_{\rm p}^{\rm s}\leq 14.0$ ; ISP (intermediate synchrotron peaked sources), $14<\log\nu_{\rm p}^{\rm s}\leq 15$ ; and HSP, $\log\nu_{\rm p}^{\rm s}>15$ . While, Fan et al. (2016) proposed slightly different criteria: $14<\log\nu_{\rm p}^{\rm s}\leq 15.3$ for ISP, and $\log\nu_{\rm p}^{\rm s}>15.3$ for HSP. Very recently, Yang et al. (2022a) also gave a similar classification.

Besides, Costamante et al. (2001) proposed a fourth subclass namely extreme HBLs (EHBLs) for BL Lacs having $\log\nu_{\rm p}^{\rm s}>17$ . The EHBLs can be divided into three subclasses according to their IC bump peak (Foffano et al., 2019) with the subclass with IC bump peak between $0.1\>\mathrm{TeV}$ and $1\>\mathrm{TeV}$ being a continuation of HSP, while the subclass with IC bump peak $>10\>\mathrm{TeV}$ named ‘hard-TeV blazars’ is an independent category featuring high power, very stable flux, and hard-TeV spectral behavior at TeV energies, posing a challenge to the SSC. With an IC bump peaking at a few TeV, the remaining subclass behaves as a transition class with a flat TeV spectral slope. Thus, the HSPs and HBLs are often considered as the potential emitters of very high energy (VHE, above 300 GeV) radiations, among which, the sources with small redshift have a significant fraction of TeV photons to be observed. The emissions of VHE photons with energy $>300\>\mathrm{GeV}$ from blazars reveals new phenomena, especially photons in the TeV band raise the challenge of particle acceleration in jets, and they are also essential clues for indirectly measuring the extragalactic background light, estimating the intergalactic magnetic field, and probing the possible origin of high-energy extragalactic neutrinos (Foffano et al., 2019). For instance, the Very Long Baseline Array (VLBA) provided 23 images of 6 TeV blazars and proved that apparent jet bending is a common property of TeV blazars (Piner et al., 2010).

The emitted flux from most astronomical sources is very low in the TeV band. In addition, the extragalactic background light (EBL) absorbs the most energetic $\gamma$ -ray emissions (Primack et al., 1999; Kneiske et al., 2004; Fermi-LAT Collaboration et al., 2018) via the interaction $\gamma$ + $\gamma$ $\rightarrow$ $e^{+}$ + $e^{-}$ , which is relative to many fundamental astrophysics problems (Domínguez et al., 2019). Therefore, observations at TeV energies require large collection areas, which are only affordable for ground-based detectors like atmospheric Cherenkov telescopes (IACTs) and extended arrays of particle detectors (EAS arrays). The $\gamma$ -rays detection techniques of IACTs and EAS arrays are different. IACTs detect Cherenkov photons in the atmosphere generated by the atmospheric extended shower (EAS) of the secondary particles initiated by the primary $\gamma$ -rays, and popular examples are the High Energy Stereoscopic System ¹¹1https://www.mpi-hd.mpg.de/hfm/HESS/ (Aharonian et al. 2004, H.E.S.S), the Major Atmospheric Gamma Imaging Cherenkov Telescopes ²²2https://magic.mpp.mpg.de/ (MAGIC Collaboration 2000, MAGIC), the Very Energetic Radiation Imaging Telescope Array System ³³3https://veritas.sao.arizona.edu/ (VERITAS Collaboration et al. 2005, VERITAS) and the next-generation IACT array: the Cherenkov Telescope Array Observatory ⁴⁴4https://www.cta-observatory.org/ (The CTA Consortium et al. 2011, CTAO); While EAS arrays detect secondary particles from the cosmic ray surviving down to the ground, such as the High Altitude Water Cherenkov Observatory ⁵⁵5https://www.hawc-observatory.org/ (Abeysekara et al. 2017a, HAWC), and the Large High Altitude Air Shower Observatory ⁶⁶6http://english.ihep.cas.cn/lhaaso/ (Cao et al. 2019, LHAASO).

Up to August 2022, only about 90 extragalactic TeV sources (Wakely & Horan, 2008)⁷⁷7http://TeVCat.uchicago.edu, out of which are 81 TeV blazars, have been verified by IACTs and EAS arrays. Finding TeV blazar candidates is an exciting and challenging work since the EBL and the sensitivity of the detector limits the number of TeV sources with high redshift. Therefore, under the constraints of EBL, an improved sensitivity will expand the sample of TeV blazars in terms of the sheer number and will find sources with higher redshift. Shortly, current and next-generation of IACTs and EAS arrays are sensitive to photons at TeV energies should increase the TeV blazar population, helping us to explore this classification further and understand the mechanism of VHE emission. (Costamante & Ghisellini, 2002; Massaro et al., 2013; Chang et al., 2017; Foffano et al., 2019). The current TeV $\gamma$ -ray detection is offered by the large area of ground-based detectors.

Most of the TeV blazars are HSPs (almost all BL Lacs), thus the TeV blazar candidate searches are often limited to HSPs/BL Lacs. (e.g. Costamante & Ghisellini 2002; Massaro et al. 2013; Chang et al. 2017; Foffano et al. 2019; Chiaro et al. 2019). Typically, BL Lacs are HSPs, whereas FSRQs are mostly LSPs and ISPs. LSP/ISP often have EC components where the electrons see a strong photon field that is Doppler-shifted. In principle they are capable of producing strong $\gamma$ -ray emissions, sometimes entering the VHE regime. But in the case of LSP/ISPs, the interaction often happens in the Klein-Nishina regime and the same strong photon field often induces a very strong absorption of the produced $\gamma$ -ray photons. Compared to LSP/ISPs, the VHE emission of HSPs is thought to originate from the low-energy photons produced by synchrotron radiation by ultrarelativistic electrons in the jet according to the SSC process. Besides, HSPs have higher $\log\nu_{\rm p}^{\rm s}$ , making their synchrotron bump more easily scattered into the TeV band by the relativistic electrons. However, it is not obvious that extreme $\log\nu_{\rm p}^{\rm s}$ values lead to the emission of TeV $\gamma$ -rays by itself, as it also depends on many other properties of the emission region, such as electron distribution, magnetic field strength, internal absorption, and the redshift of the source (Nievas Rosillo et al., 2022).

Lin & Fan (2016) gathered 662 Fermi BL Lacs, including 47 TeV sources, and compared the multiwavelength observation properties of the TeVs with those of the non-TeVs. They discovered that TeVs have a smaller average redshift, a higher flux density, and a harder $\gamma$ -ray spectrum. Flux variability in all wavebands is very common among blazars (Fan et al., 2017; Majumder et al., 2019; Yang et al., 2022a, b), with some blazars, such as 1ES 0229+200, exhibiting moderate variability and others displaying violent and high amplitude variability, with timescales ranging from hours to years. In addition, the $\log\nu_{\rm p}^{\rm s}$ is proportional to the variability of blazars. Gupta et al. (2016) analyzed 50 observations of 12 low synchrotron peaked (LSP) blazars from XMM–Newton and discovered that LSP blazars vary more slowly in the X-ray bands than in the IR/optical bands because the IC mechanism dominates the X-ray radiation in this case. In contrast, HSP blazars are predicted to exhibit more extreme X-ray band variability than LSP blazars. According to the SSC model, relativistic electrons upscatter X-rays into the TeV region, and observations have confirmed correlations between X-ray and TeV emissions (Costamante & Ghisellini, 2002; Singh et al., 2019; Osorio et al., 2019). One can therefore anticipate a correlation between X-ray and TeV emissions. Notably, Markarian 501 and 1ES 1959+650’s synchrotron peak frequencies in the X-ray band shifted to the higher energy region when a flare was observed in the TeV band (Sambruna et al., 2000; Kapanadze et al., 2018; Singh et al., 2019). In contrast, there exists an intriguing counterexample. Foffano et al. (2019) observed two groups of EHBLs with the same range of $\log\nu_{\rm p}^{\rm s}$ but opposite spectral slope in the TeV band, implying that $\log\nu_{\rm p}^{\rm s}$ and the TeV emission are independent of one another.

Beyond all doubt, the Fermi Large Area Telescope (Fermi-LAT) brought prosperity to the study of blazars after its launch in 2008, and the progress of high-energy $\gamma$ -ray astronomy of space-borne instruments is gradually dominated by it. Fermi-LAT detected thousands of blazars and published them in their works (e.g.,Abdollahi et al. 2020), which provided a large $\gamma$ -ray blazar sample. The Fermi-LAT 10-year source catalog (4FGL-DR2, Abdollahi et al. 2020; Ballet et al. 2020), and the Fermi-LAT 10-year AGN catalog (4LAC-DR2, Ajello et al. 2020; Lott et al. 2020) listed 80 out of the 81 TeV blazars (hereafter Fermi blazars.) Thus, we have a convincing reason to believe that Fermi catalogs should contain many TeV blazar candidates.

Machine learning (ML) techniques have become popular among astronomers (Ball & Brunner, 2010; Chiaro et al., 2019; Kang et al., 2019; Xiao et al., 2022). Based on the Fermi-LAT third source catalog (3FGL), Chiaro et al. (2019) applied the Artificial Neural Network (ANN) to 573 uncertain types and 559 unassociated catalog sources of the 3FGL blazars and found 80 HSPs candidates, 16 of them are proposed as TeV candidates with the highest confidence. Kang et al. (2019) collected 1312 blazar candidates of uncertain type (BCUs) out of 3137 blazars recorded in Fermi-LAT 4th source catalog, then employed random forest (RF), support vector machine (SVM), and ANN to predict the category of the BCUs. Finally, 724 BL Lac candidates and 332 FSRQ candidates are predicted by combined classification results of the ML methods. Xiao et al. (2022) compiled radio loudness ${\rm log}R$ and radio 6 cm luminosity ${\rm log}L_{6cm}$ of 2943 AGNs, and got ${\rm log}R=\langle 1.37\pm 0.02\rangle$ as the separation of radio-loud and radio-quiet AGNs by the Gaussian Mixture Model. Furthermore, a more advanced double-criterion dividing boundary ${\rm log}L_{\rm 6cm}=\textendash 2.7{\rm log}R+44.3$ was obtained by the SVM.

From a more comprehensive perspective, this work aims to increase the number of TeV blazar candidates, and break through the limitation of searching for candidates only in HSPs. Using the machine learning method, we look for the physical properties that truly distinguish TeV from non-TeV blazars, and quantify the performance of distinguishing boundaries. We also calculate the probability that a source can be called a TeV source based on these physical properties, and find sources with a higher probability of being called a TeV source from non-TeV sources. The data are compiled from four catalogs: 4FGL-DR2 / 4LAC-DR2, the 3rd Catalog of Hard Fermi-LAT Sources (3FHL, Ajello et al. 2017), the 3rd catalog of HSP blazars (3HSP, Chang et al. 2019), and The Second Brazil-ICRANet Gamma-ray Blazars (2BIGB, Arsioli et al. 2020). Then make sure how many candidates could be detected by the ground-based Cherenkov telescopes. The paper is organized as follows: Section 2 introduces the ML method; Section 3 gives the experiments and results; We make discussion and draw the conclusion in Section 4.

2 Supervised machine learning method

We intend to use the so-called supervised machine learning (SML) to select TeV blazar candidates among four blazar catalogs. In ML, samples are often referred to as datasets. SML considers a labeled dataset as a set of features and a target variable, based on example input-output pairs. The SML method constructs an inferred function (often called a model) to map the features to dispersed target variables from the known dataset in a classification task. Moreover, the model can predict labels for the unknown dataset. The dataset will be divided into three sets: a training set acts as known data to train the model, which is further decomposed into a smaller training set and a validation set which is used to improve the model, and, finally, the test set, acts as unclassified data to evaluate the generalization of the model. The dispersed target variables are often called labels, which mark the class of data (Ball & Brunner, 2010; Baron, 2019). Each source catalog can be regarded as a dataset in a matrix. The parameter columns are the features that characterize the source physical properties. Besides, binary labels distinguish TeV blazars from non-TeV blazars. Our work gets the model from training sets and then employs them in the whole dataset. Consequently, the non-TeV sources with the same predicted labels as the TeV sources are considered TeV blazar candidates. It is taken as a high-confidence candidate if the possibility of being a TeV source is higher than 80%.

There are several steps to accomplish this goal, which is demonstrated in Fig. 1.

Refer to caption — Figure 1: Flowchart of SML.

The process of SML classification can be roughly divided into three parts: Input dataset, ML-trained model, and prediction. Next, we dive into details.

2.1 Data preprocessing

Data preprocessing is an essential step in the dataset before starting the training model. By manipulating or deleting the data, preprocessing of data can change natural features into a more available representation for the downstream model. It contains the following steps:

Data Discretization: In some of the datasets, one feature is represented by multiple columns. For example, in 4FGL-DR2, such as $Flux\_band$ , 7 columns represent integral photon flux in each spectral band. We divide $Flux\_band$ as $Flux\_band\ 1$ , $Flux\ band\ 2$ , … $Flux\_band\ 7$ , respectively.
Data cleaning: There may be some useless and missing data, which will reduce model performance. We discard error features, string features, and most missing value features.
Data standardization: To make all features dimensionless, we use sklearn.preprocessing.StandardScaler to convert all features into the standard normal distribution.
Data partition: We randomly divide the dataset into a 4:1 training set and testing set using python package StratifiedKFold.split of sklearn.model_selection, and each training / testing set preserves the percentage of the dataset for each class. We repeat the division 5 times with 5 different random seeds: 0, 1, 2, 3, 4 to get 5 training sets and 5 testing sets. We mark the training / testing sets as training 1 / testing 1, training 2 / testing 2, …, training 5 / testing 5.

2.2 Selecting SML model

We chose the SML method based on the following considerations:

1.

We request a highly interpretable model that is not overly dependent on computer performance, which outputs an explicitly linear boundary to separate TeV blazars and non-TeV blazars in the feature space.
2.

Once we have the physical properties of a blazar, that are, the features, we want to get the conditional probability that the blazar is a TeV one under the features.

There are many highly-developed SML classification methods, such as: Logistic Regression (Korsós et al. 2021, LR) Support Vector Machines (Hearst 1998, SVM), artificial neural network (McCulloch & Pitts 1943, ANN), random forest (Tin Kam Ho, 1995), etc. In this work, we choose the LR model over other methods like SVM, ANN and the likes for two main reasons. Mathematically, we assumed that the binary labels of blazars follow a Bernoulli distribution, and, as the logistic function is the expectation of a Bernoulli distribution, we believe it is the most natural way to map the linear combination of physical properties of the source into a Bernoulli distribution. Algorithmically, SVM, RF, and ANN are more complicated than LR, resulting in higher calculation costs and an overall loss in the ability to interpret the model, moreover the output of SVM needs to go through the LR model to obtain a conditional probability. The same holds true for ANN, although other models may be used to obtain conditional probability. In addition, ANN and RF cannot guarantee a linear boundary.

To illustrate LR, we introduce the odds ratio: $\frac{p}{1-p}$ , where $p$ represents the probability of the positive event labeled 1. Then we further define the logit function (log-odds), which can be written as ${\rm logit}\,(p)={\rm log}\,\frac{p}{1-p}$ . We can express a linear relationship between features and the log-odds:

logit\,(p(y=1|\mathbf{x}))=w_{0}+w_{1}x_{1}+\cdots+w_{m}x_{m}=\mathbf{w}^{T}\mathbf{x},

(1)

here $y$ is the label; the positive event is marked as ‘1’, while ‘0’ for the negative events. $p(y=1|\mathbf{x}$ ) is the conditional probability that an individual labeled as ‘1’, and $\mathbf{w}$ is the weight. $p(y=1|\mathbf{x}$ ) (or $logistic$ ) is the inverse function of the ${\rm logit}$ function called the ${\rm logistic}$ function (or sigmoid function), , which can be expressed as:

logistic\,(logit(p))=\frac{1}{1+e^{-logit(p)}},

(2)

where $logit(p)$ means $logit\,(p(y=1|\mathbf{x}))$ .

For the implementation, we used scikit-learn (sklearn, Pedregosa et al. 2011), an ML library in python, which provides many useful tools. In particular, we used a sklearn API (Buitinck et al., 2013): sklearn.linear_model.LogisticRegression which implements the LR model, to train the training set. Eventually, we used the LR to classify the dataset by evaluating the $logit$ value of each source in the dataset and calculating the conditional probability $logistic$ that a source emits TeV radiation. Note that the number of TeV sources in our sample is small. To increase the weight of TeV sources, we set the weight parameter class_weight to balance in sklearn.linear_model.LogisticRegression.

2.3 Evaluating performance

A confusion matrix is applied to visualize the model performance. True positive (TP) represents the number of positive events that are correctly classified, true negative (TN) represents the number of negative events that are correctly classified, false positive (FP) represents the number of misclassified positive events, false negative (FN) represents the number of misclassified negative events, as shown in Fig. 2.

The accuracy ( $\frac{TP+TN}{FP+FN+TP+TN}$ ) is one of the most common indexes for evaluating model performance. However, we have an extreme imbalance of the two classes, which means non-TeV sources dominate the results. We also want the model to identify as many TeV sources as possible while reducing the number of non-TeVs misjudged as TeVs. To meet these two requirements, we introduce larger Area Under the Curve (Fawcett 2006, $AUC$ ) as the metric to evaluate LR performance, which is based on the receiver operating characteristic curve (Fawcett 2006, ROC) with the true positive rate ( $TPR$ ) on the Y-axis and the 1- false positive rate ( $FPR$ ) on the X-axis, where $TPR=\frac{TP}{TP+FN}$ and $FPR=\frac{FP}{FP+TN}$ . Furthermore, we introduce Youden’s J statistic (or Youden’s index, Youden 1950), which is $TPR-FPR$ . The optimal $p_{\rm thre}$ corresponds to the maximum Youden index, which means that when we fix TPR, FPR reaches a minimum, which indicates that our screening of TeV candidates is very stringent. So we consider a non-TeV blazar as TeV one if its $logistic$ beyond the optimal $p_{\rm thre}$ .

2.4 Training and tuning model

Now we can train the models on the 5 training sets and further optimize the model performance in two ways:

Feature selection: It may degrade model performance if all features are used, which is called the curse of dimensionality (Taylor, 2019). Data becomes sparse in high-dimensional space, resulting in no more extended similarities between data, which hinders efficient organization and data processing.

The sklearn.feature_selection.SequentialFeatureSelector (or SFS for short) of python could filter useless features, where the basic idea is to sequentially remove features from the full feature space until the new feature subspace contains the desired number of features.
Hyper-parameters optimization: Some parameters in the model cannot be obtained from the training model, which is often called hyper-parameters; they are not the $\mathbf{w}$ in Formula (1). Instead, we need to specify them manually before the training model.

The sklearn.grid_search.GridSearchCV (or GS for short) is helpful in the selection of optimal hyper-parameters, which performs a brute-force search on a list of values that we specify for hyper-parameters, and picks out the optimal hyper-parameters. For sklearn.linear_model.LogisticRegression, we optimize 6 hyper-parameters ⁸⁸8https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html: penalty, C, solver, tol, max_iter, l1_ratio.

The optimization strategy is inspired by the k-fold cross-validation (Ball & Brunner, 2010), which randomly divides the training set into k folds without replacement. Then k - 1 folds are used for the training model, and the remaining one fold, namely the validation set, evaluates the model performance. This process was repeated k times.

Based on k-fold cross-validation, we do both feature selection and hyper-parameter optimization at the same time using the nested cross-validation (Raschka, 2015; Varma & Simon, 2006), which is characterized by a nested loop. The dataset in the outer loop is divided into a training set and a testing set. The training set in the inner loop undergoes k-fold cross-validation selecting optimal hyper-parameters, which is performed by GS. Next, the SFS in the outer loop picks out the optimal features,and the testing set evaluates the model generalization. We implement nested cross-validation on 5 partitions of the dataset and get 5 models with optimal features and hyper-parameters.

However, Bias-variance trade-off is one of the fundamental rules of machine learning. The bias measures the deviation between the predicted value and the actual value on the known dataset, and the variance measures the performance of the model on the unknown dataset. Overfitting is a common problem in machine learning, and it is mainly due to high variance, making the performance on the testing set worse than that on the training set. Therefore, for each of the 5 partitions, we calculate the gap that equals $AUC$ of the training set minus $AUC$ of the testing set and selects one optimal model corresponding to the smallest positive gap.

2.5 Constructing linear boundary

Now we could build linear boundaries to distinguish TeV emitting sources from non-TeV emitting sources. When the $logistic$ of a source is greater than a threshold value ( $p_{\rm thre}$ ), the LR determines the source as a TeV source, otherwise as a non-TeV one. A $p_{\rm thre}$ corresponds a $logit$ threshold value: $logit_{thre}$ (See Formula (2)). So we could express the linear boundary as $logit^{\prime}=0$ , where $logit^{\prime}=logit-logit_{thre}$ .

2.6 Predicting labels

To better understand the process of predicting labels for the unknown datasets, we define some transition concepts. According to section 2.1, we call the dataset that has only undergone data discretization as the initial set. In contrast, the dataset that has undergone complete preprocessing is called the learning set. We get the optimal model from the learning set, and the optimal features are also determined. In the initial set, a subset containing the optimal features is called a prediction set. The learning set requires that all features have no missing values, but the prediction set only needs the full optimal parameters, containing more samples that will help to find more TeV candidates. Consequently, we could calculate the $logistic$ for each blazar in the prediction set, and consider the non-TeV blazars with $logistic\geqslant 80\%$ as high-confidence TeV candidates.

3 LR model to distinguish TeVs from non-TeVs

3.1 Samples

We used four catalogs: 4FGL-DR2 / 4LAC-DR2, 3FHL, 3HSP, and 2BIGB to complete our tasks. 4FGL-DR2, based on 10-year Fermi-LAT data, contains over 5700 sources, 3511 of which are AGNs (3436 are blazars). In the observer frame, 4LAC-DR2 includes synchrotron peak frequency ( $\log\nu_{\rm p}^{\rm s}$ ), the corresponding intensity ( $\log\nu_{\rm p}^{\rm s}f_{\nu_{\rm p}}^{\rm s}$ ), and redshift ( $z$ ), which are critical supplements to 4FGL-DR2; 3FHL (Ajello et al., 2017) is a 7-year-based catalog of 1556 VHE sources; 3HSP (Chang et al., 2019) includes 2013 HSP blazars containing a critical feature: the peak flux in units of the faintest HSP blazar peak flux (2.5 $\times$ 10^-12 $\>\mathrm{erg\cdot cm^{-2}\cdot s^{-1}}$ ) that has been detected in the TeVCat(Wakely & Horan, 2008): $FOM$ (Figure of Merit); 2BIGB (Arsioli et al., 2020) is the result of $\gamma$ -ray likelihood analysis for 1160 3HSP sources yielded photon flux from 500 MeV to 500 GeV: $\rm F_{0.5-500GeV}$ ( $F^{\rm ph}$ ).

TeVCat is an online interactive catalog containing VHE sources. As of now, it contains 251 sources detected mainly by IACTs and EAS arrays in the TeV energy region, and at least 91 are extragalactic sources. Among those sources, 80 blazars are detected by Fermi-LAT, thus we assume that blazars emitting TeV radiation are usually detected by Fermi-LAT. We cross-matched the four catalogs with TeVCat, respectively. We marked labels of their blazars in common with TeVCat as ‘1’, while the others were‘ 0’. According to the SML workflow introduced in the previous section, we got four groups of initial, learning, and prediction sets:

4FGL-DR2 / 4LAC-DR2: We supplement 4FGL-DR2 with $\log\nu_{\rm p}^{\rm s}$ , $\log\nu_{\rm p}f_{\nu_{\rm p}}$ and $z$ from 4LAC-DR2, obtaining a new catalog: 4FGL-DR2 / 4LAC-DR2. Initial set: 3436 blazars including 80 TeV ones, with 88 features. Learning set: 861 blazars including 71 TeV ones, with 32 features. 688 / 173 blazars for each training / testing set. Prediction set: 1459 blazars including 73 TeV ones, with 5 features.
3FHL: Initial set: 1207 blazars including 74 TeV ones, with 49 features. Learning set: 506 blazars including 63 TeV blazars, with 18 features. 404 / 102 blazars for each training / testing set. Prediction set: 540 blazars including 67 TeV ones, with 2 features.
3HSP: Initial set: 2013 blazars including 57 TeV ones, with 13 features. Learning set: 1411 blazars including 53 TeV ones, with 5 features. 1128 / 283 blazars for each training / testing set. Prediction set: 1771 blazars including 54 TeV ones, with 2 features.
2BIGB: Initial set: 1160 blazars including 57 TeV ones, with 31 features. Learning set: 1040 blazars including 54 TeV ones, with 7 features. 832 / 208 blazars for each training / testing set. Prediction set: 1040 blazars including 54 TeV ones, with 3 features.

3.2 logit, logistic, and performance of LR

Processing the SML method we ascertain the result meeting condition of $logistic\geqslant$ 80% which gives 40 high-confidence candidates out of 150 4FGL-DR2 / 4LAC-DR2 candidates (see also Fig. 11, Tab. 8). Among the 40 high-confidence candidates, 24 sources are in common with 3FHL candidates, 11 with 3HSP candidates, and 14 with 2BIGB candidates. We report the main results of SML as follows:

4FGL-DR2 / 4LAC-DR2

logit^{\prime}=4.808\Gamma+2.809V_{F}+3.889\log f^{ph}_{\rm 7}-3.34z+0.857\log\nu_{\rm p}^{\rm s}+20.244,

(3)

where $logit^{\prime}=0$ is ideal for $p_{\rm thre}=$ 40%. The optimal LR model is built on training 3, with $AUC$ being 97% on training 3 and 96% on testing 3. 150 TeV blazar candidates are obtained from 1459 blazars in the prediction set, with $AUC$ : 98%, $FPR$ : 11%, and $TPR$ : 99%. Only 1 TeV blazar is mistaken as a non-TeV one.

3FHL

logit^{\prime}=0.116\log f^{\rm ph}_{\rm 2}-0.628z+1.028,

(4)

where $logit^{\prime}=0$ is ideal for $p_{\rm thre}=$ 52%, the optimal LR model is built on training 5, with $AUC$ being 89% on training 5 and 87% on testing 5. 126 TeV candidates are obtained from 540 blazars, with $AUC$ : 88%, $FPR$ : 27%, and $TPR$ : 88%. Only 8 TeV blazars are mistaken as non-TeV ones.

3HSP

logit^{\prime}=0.306FOM-3.861z-0.173,

(5)

where $logit^{\prime}=0$ is ideal for $p_{\rm thre}=$ 61%, the optimal LR model is built on training 5, with $AUC$ being 99% on training 5 and 94% on testing 5, 40 TeV candidates are obtained from 1771 blazars, with $AUC$ : 98%, $FPR$ : 2%, and $TPR$ : 91%, only 5 TeV blazars are mistaken as non-TeV ones.

2BIGB

logit^{\prime}=-0.016E_{\rm P}+0.248FOM-4.395z-0.122

(6)

where $logit^{\prime}=0$ is ideal for $p_{\rm thre}=$ 48%, the optimal LR model is built on training 4, with $AUC$ being 97% on training 4 and 97% on testing 4. 83 TeV candidates are obtained from 1040 blazars, with $AUC$ : 97%, $FPR$ : 8%, and $TPR$ : 96%, only 2 TeV blazars are mistaken as non-TeV ones.

The features of the 4 learning sets are reported in Tab. 1, where Col. (1) indicates the catalog information; Col. (2) gives how many columns the features contain, and each column represents a feature; Col. (3) and Col. (4) show the names and units of the features from the FITS version of the catalogs; Col. (5) is the description of the feature. The performance of the LR model on 4 learning sets is also shown in Tab. LABEL:LR_4FGL to Tab. 5, where Col. (1) shows the datasets name; Col. (2) the $AUC$ , where the top is for training sets while the bottom for testing sets; Col. (3) $AUC$ difference between training sets and testing sets; Col. (4) is the $p_{\rm thre}$ ; Col. (5) is optimal features; Col. (6) is optimal hyper-parameters, note that the bold corresponds to minimum overfitting datasets, which represents the data and parameters of the optimal LR model.

Table 1: Features from 4 catalogs for learning sets.

Catalog	Column	Feature	Unit	Description
(1)	(2)	(3)	(4)	(5)
4FGL-DR2 / 4LAC-DR2	1	Pivot_Energy ( $E_{\rm P}$ )	GeV	Energy at which error on differential flux is minimal
	1	Flux ( $\log F^{\rm ph}$ )	$\mathrm{cm^{-2}\cdot s^{-1}}$	Integral photon flux from 1 to 100 GeV
	1	Energy_Flux ( $\log F$ )	$\mathrm{erg\cdot cm^{-2}\cdot s^{-1}}$	Energy flux from 100 MeV to 100 GeV
	1	PL_Flux_Density ( $\log f_{\rm 1}$ )	$\mathrm{cm^{-2}\cdot MeV^{-1}\cdot s^{-1}}$	Differential flux at Pivot_Energy in PowerLaw fit
	1	PL_Index ( $\alpha_{\rm 1}$ )		Photon index when fitting with PowerLaw
	1	LP_Flux_Density ( $\log f_{\rm 2}$ )	$\mathrm{cm^{-2}\cdot MeV^{-1}\cdot s^{-1}}$	Differential flux at Pivot_Energy in LogParabola fit
	1	LP_Index ( $\alpha_{\rm 2}$ )		Photon index at Pivot_Energy when fitting with LogParabola
	1	LP_beta ( $\beta$ )		Curvature parameter when fitting with LogParabola
	1	PLEC_Flux_Density ( $\log f_{\rm 3}$ )	$\mathrm{cm^{-2}\cdot MeV^{-1}\cdot s^{-1}}$	Differential flux at Pivot_Energy in PLSuperExpCutoff fit
	1	PLEC_Index ( $\Gamma$ )		Low-energy photon index when fitting with PLSuperExpCutoff
	1	PLEC_Expfactor ( $a$ )		Exponential factor when fitting with PLSuperExpCutoff
	1	PLEC_Exp_Index ( $b$ )		Exponential index when fitting with PLSuperExpCutoff
	7	Flux_Band ( $\log f_{\rm 1}\sim\log f_{\rm 7}$ )	$\mathrm{cm^{-2}\cdot s^{-1}}$	Integral photon flux in each spectral band
	7	nuFnu_Band ( $\log f^{\rm ph}_{\rm 1}\sim\log f^{\rm ph}_{\rm 7}$ )	$\mathrm{erg\cdot cm^{-2}\cdot s^{-1}}$	Spectral energy distribution over each spectral band
	1	Variability_Index ( $V$ )		Likelihood difference between the flux fitted in each time interval and the average flux
	1	Frac_Variability ( $V_{\rm F}$ )		Fractional variability computed from the fluxes in each year
	1	Redshift ( $z$ )		Redshift
	1	nu_syn ( $\log\nu_{\rm p}^{\rm s}$ )	Hz	Synchrotron-peak frequency in observer frame
	1	nuFnu_syn ( $\log\nu_{\rm p}^{\rm s}f_{\nu_{\rm p}}^{\rm s}$ )	$\mathrm{erg\cdot cm^{-2}\cdot s^{-1}}$	Spectral energy distribution at synchrotron-peak frequency
	1	Highest_energy ( $E_{\rm H}$ )	GeV	Highest energy among events probably coming from the source
3FHL	1	Pivot Energy ( $E_{\rm P}$ )	GeV	Energy at which error on differential flux is minimal
	1	Flux Density ( $\log f$ )	$\mathrm{cm^{-2}\cdot GeV^{-1}\cdot s^{-1}}$	Differential flux at Pivot Energy
	1	Flux ( $\log F^{\rm ph}$ )	$\mathrm{cm^{-2}\cdot s^{-1}}$	Integral photon flux from 10 GeV to 1 TeV $\mathrm{erg\ cm^{-2}\cdot s^{-1}}$
	1	Energy Flux ( $\log F$ )	$\mathrm{erg\cdot cm^{-2}\cdot s^{-1}}$	Energy flux from 10 GeV to 1 TeV
	1	PowerLaw Index ( $\alpha_{\rm 1}$ )		Photon index when fitting with power law
	1	Spectral Index ( $\alpha_{\rm 2}$ )		Photon index at Pivot Energy when fitting with LogParabola
	1	beta ( $\beta$ )		Curvature parameter when fitting with LogParabola
	4	Flux Band ( $\log f_{\rm 1}\sim\log f_{\rm 4}$ )	$\mathrm{cm^{-2}\cdot s^{-1}}$	Integral photon flux in each spectral band
	4	nuFnu_Band ( $\log f^{\rm ph}_{\rm 1}\sim\log f^{\rm ph}_{\rm 4}$ )	$\mathrm{erg\cdot cm^{-2}\cdot s^{-1}}$	Spectral energy distribution over each spectral band
	1	HEP energy ( $E_{\rm H}$ )	GeV	Highest energy among events probably coming from the source
	1	Redshift ( $z$ )		Redshift
	1	NuPeak obs ( $\log\nu_{\rm p}^{\rm s}$ )	Hz	Synchrotron-peak frequency in observer frame
3HSP	1	radio flux density ( $\log f_{\rm R}$ )	mJy	Radio flux density from the NVSS or FIRST catalog
	1	X-ray flux flux density ( $\log f_{\rm X}$ )	$\mu$ Jy	X-ray flux density at 1keV
	1	nu_syn ( $\log\nu_{\rm p}^{\rm s}$ )	Hz	Synchrotron-peak frequency in observer frame
	1	Redshift ( $z$ )		Redshift
	1	FOM ( $FOM$ )		The figure of merit parameter, which is related to the likelihood of GeV / TeV detectability
2BIGB	1	N0 ( $\log f$ )	$\mathrm{cm^{-2}\cdot MeV^{-1}\cdot s^{-1}}$	Differential flux at Pivot Energy in PowerLaw fit
	1	Gamma ( $\alpha$ )		Photon index when fitting with PowerLaw
	1	$\rm F_{0.5-500GeV}$ ( $F^{\rm ph}$ )		Integrated photon flux from 500 MeV to 500 GeV
	1	E0 ( $E_{\rm P}$ )	GeV	Energy at which error on differential flux is minimal
	1	nu_syn ( $\log\nu_{\rm p}^{\rm s}$ )	Hz	Synchrotron-peak frequency in observer frame from 3HSP
	1	Redshift ( $z$ )		Redshift from 3HSP
	1	FOM ( $FOM$ )		The same as in 3HSP

Table 2: LR model and performance for the learning dataset of 4FGL-DR2 / 4LAC-DR2

Partition	$AUC$	Overfitting	$p_{\rm thre}$	Features	Hyper-parameters
(1)	(2)	(3)	(4)	(5)	(6)
1	96.9%	0.9%	53.6%	$E_{\rm p},{\log}F^{\rm ph}_{\rm 1},{\log}F_{\rm 1},\mathit{\Gamma},V,V_{\rm f},{\log}f^{\rm ph}_{\rm 1},$ ${\log}f^{\rm ph}_{\rm 2},{\log}f^{\rm ph}_{\rm 4},{\log}f^{\rm ph}_{\rm 6},{\log}f^{\rm ph}_{\rm 7},z,{\log}\nu_{\rm p}^{\rm s}$	C: 1, max_iter: 500, penalty: l2, solver: sag, tol: $\mathit{1}0^{-4}$
	95.6%
2	97.2%	1.2%	53.5%	$E_{\rm p},F_{\rm 1},\mathit{\alpha_{\rm 1},\Gamma},V,$ $V_{\rm f},f^{\rm ph}_{\rm 7},z,\log\nu_{\rm p}^{\rm s}$	C: 1, max_iter: 100, penalty: l1, solver: liblinear, tol: $\mathit{10}^{\rm-6}$
	96.0%
3	96.8%	0.8%	53.4%	$\bm{\mathit{\Gamma,V_{\rm f},\log f^{\rm ph}_{\rm 7},z,\log\nu_{\rm p}^{\rm s}}}$	C: 1, max_iter: 500, penalty: l1, solver: saga, tol: ${\bm{\mathit{10}^{\rm-6}}}$
	96.0%
4	96.0%	-1.8%	53.0%	$\log F_{\rm 1},\mathit{\alpha_{\rm 1},\Gamma},a,V_{\rm f},$ ${\log}f_{\rm 7},\log F^{\rm ph}_{\rm 7},z,\log\nu_{\rm p}^{\rm s}$	C: 0.1, max_iter: 100, penalty: l2, solver: newton-cg, tol: $\mathit{10}^{-6}$
	98.0%
5	97.5%	5.0%	52.2%	$E_{\rm p},\mathit{\alpha_{\rm 1},\alpha_{\rm 3}},V_{\rm f},\log f_{\rm 3},\log f_{\rm 5},$ $\log f_{\rm 7},\log f^{ph}_{\rm 7},z,\log\nu_{\rm p}^{\rm s}$	C: 1, max_iter: 100, penalty: l2, solver: newton-cg, tol: $\mathit{10}^{-6}$
	92.5%

Table 3: LR model and performance for the learning set of 3FHL

Partition	$AUC$	Overfitting	$p_{\rm thre}$	Features	Hyper-parameters
(1)	(2)	(3)	(4)	(5)	(6)
1	87.8%	-1.9%	53.6%	$\log f^{ph}_{2},z$	C: 0.01, l1_ratio: 0.4, max_iter: 100, penalty: elasticnet, solver: saga, tol: $\mathit{10}^{-6}$
	89.7%
2	88.0%	-7.4%	53.5%	$\log f^{ph}_{\rm 2},z$	C: 0.01, l1_ratio: 0.3, max_iter: 100, penalty: elasticnet, solver: saga, tol: $\mathit{10}^{-6}$
	95.5%
3	88.7%	5.5%	53.4%	$\log f^{ph}_{\rm 2},z$	C: 0.01, l1_ratio: 0.4, max_iter: 100, penalty: elasticnet, solver: saga, tol: $\mathit{10}^{-6}$
	83.1%
4	89.6%	3.6%	52.9%	$\log f^{ph}_{\rm 2},z$	C: 0.01, l1_ratio: 0.2, max_iter: 100, penalty: elasticnet, solver: saga, tol: $\mathit{10}^{-6}$
	86.0%
5	89.2%	2.3%	52.2%	$\log f^{ph}_{\rm 2},z$	C: 0.01, l1_ratio: 0.4, max_iter: 100, penalty: elasticnet, solver: saga, tol: $\bm{\mathit{10}^{-6}}$
	87.0%

Table 4: LR model and performance for the learning set of 3HSP

Partition	$AUC$	Overfitting	$p_{\rm thre}$	Features	Hyper-parameters
(1)	(2)	(3)	(4)	(5)	(6)
1	98.3%	-1.2%	46.9%	$\log f_{\rm X},\log\nu^{\rm s}_{\rm p},FOM,z$	C: 0.01, l1_ratio: 0.7, max_iter: 100, penalty: elasticnet, solver: saga, tol: $\textit{10}^{-6}$
	99.5%
2	98.4%	-0.5%	60.2%	$FOM,z$	C: 0.01, max_iter: 100, penalty: l2, solver: liblinear, tol: $\textit{10}^{-6}$
	98.9%
3	98.6%	-0.5%	47.6%	$FOM,z$	C: 0.001, max_iter: 100, penalty: l2, solver: newton-cg, tol: $\textit{10}^{-6}$
	99.1%
4	98.4%	-1.4%	58.2%	$FOM,z$	C: 0.1, max_iter: 100, penalty: l2, solver: newton-cg, tol: $\textit{10}^{-6}$
	99.8%
5	99.2%	5.7%	60.8%	$\bm{\mathit{FOM,z}}$	C: 0.01, max_iter: 100, penalty: l2, solver: liblinear, tol: $\textit{10}^{-6}$
	93.5%

Table 5: LR model and performance for the learning set of 2BIGB

Partition	$AUC$	Overfitting	$p_{\rm thre}$	Features	Hyper-parameters
(1)	(2)	(3)	(4)	(5)	(6)
1	98.6%	0.2%	44.7%	$\mathit{\alpha},F,\log\nu^{s}_{p},FOM,z$	C: 0.01, max_iter: 100, penalty: l2, solver: newton-cg, tol: $\textit{10}^{-6}$
	98.4%
2	97.7%	6.0%	52.4%	$E_{\rm p},FOM,z$	C: 0.01, max_iter: 100, penalty: l2, solver: newton-cg, tol: $\textit{10}^{-6}$
	91.7%
3	98.2%	6.3%	49.1%	$FOM,z$	C: 0.01, max_iter: 100, penalty: l2, solver: newton-cg, tol: $\textit{10}^{-6}$
	91.8%
4	97.2%	0.1%	48.1%	$\bm{E_{\rm p},FOM,z}$	C: 0.01, l1_ratio: 0.1, max_iter: 100, penalty: elasticnet, solver: saga, tol: $\textit{10}^{-6}$
	97.1%
5	96.7%	-3.0%	50.3%	$FOM,z$	C: 0.01, max_iter: 100, penalty: l2, solver: newton-cg, tol: $\textit{10}^{-6}$
	99.7%

We also visualize part of the results in figures: Fig. 3 shows feature selection; Fig. 4 is $AUC$ of the training / testing sets; Fig. 5 illustrates ROC curve, $AUC$ , and $p_{\rm thre}$ of the prediction sets; Fig. 6 visualize the linear boundary of 3FHL, 3HSP, and 2BIGB. Fig. 7 is the confuse matrix.

4 Potential targets for ground-based Cherenkov detectors

Among the 40 high-confidence TeV candidates, we consider the potential detectable targets of IACTs and EAS arrays for the whole year of 2023. We employ two criteria to determine whether a source is detectable: (i) Its flux density is higher than the detector sensitivity curve at the energy band $\geqslant$ 1TeV; (ii) It could pass by the detectable sky region of the detectors.

4.1 SEDs considering EBL correction

The tactic is to correct the data by considering EBL first and then fitting SEDs on the intrinsic data. The SED contains two bumps, a low-energy one in the infrared to soft X-ray energy range due to synchrotron emission and a high-energy one in the region between hard X-ray to $\gamma$ -ray that is associated with IC emission. To do so, we use an online tool ⁹⁹9http://tools.asdc.asi.it/SED/ provided by the space science data center (SSDC) of the Italian Space Agency to collect data from radio to TeV from many missions and experiments together with catalogs and archival data (Aharonian et al., 2001; Giommi et al., 2002; Amenomori et al., 2003; Aharonian et al., 2003, 2009; Daniel et al., 2005; Schroedter et al., 2005; Acciari et al., 2008, 2011; Godambe et al., 2008; Chandra et al., 2010, 2012; H. E. S. S. Collaboration et al., 2010; HESS Collaboration et al., 2013; Aliu et al., 2011, 2015; Bartoli et al., 2011, 2012; Archambault et al., 2013, 2014; Arlen et al., 2013; Abramowski et al., 2013, 2015; Biteau & Williams, 2015; Sharma et al., 2015). When we fit the SEDs, we discard the data bins whose flux error is bigger than flux upper limits. Note that there are TeV photons with zero error of the TeV candidates. Those TeV band photons are not certified by IACTs or EAS array (See Fig. 6). Likewise, we also discard data in the low radio energy range ( ${\rm log}\nu(\>\mathrm{Hz})<9$ ), since there is an asymmetry in the synchrotron bump of some sources. In this paper, we do not take into account the simultaneous observation of the data. EBL photons interact with VHE photons via pair production, a process that induces an attenuation of the observed gamma-ray spectra from blazars at cosmological distances, which may have a large impact on SED in IC bump. Assuming the EBL optical depth ( $\tau\ (E,z)$ ) to depend on the photon energy ( $E$ ) and redshift ( $z$ ), Saldana-Lopez et al. (2021) provided a 2D grid of $\tau\ (E,z)$ ¹⁰¹⁰10https://www.ucm.es/blazars/eblin the range of 0.001 TeV $\leq E\leq$ 100 TeV, and 0 $\leq z\leq$ 6. Subsequently, the two bumps of the observed SED were fitted separately with a log-parabola as follows (Fan et al., 2016):

\log(\nu f_{\nu})=c(\log\nu-\log\nu_{\rm p})^{2}+\log\nu_{\rm p}f_{\nu_{\rm p}},

(7)

where $|2c|$ is the spectral curvature, $\log\nu_{\rm p}$ is the logarithm of the peak frequency, $\log\nu_{\rm p}f_{\nu_{\rm p}}$ is the logarithm of peak flux, and $\log(\nu f_{\nu})$ is the flux density.

we successfully fit the synchrotron bump for 40 TeV candidates and the IC bump for 20 TeV candidates and return the errors of the fitting $|2c|$ , $\log\nu_{\rm p}$ (in units of $\mathrm{Hz}$ ) and $\log\nu_{\rm p}f_{\nu_{\rm p}}$ (in units of $\mathrm{erg\ cm^{-2}\ s^{-1}}$ ). The fitting process executes on scipy.optimize.curve_fit of python. The results are also listed in Tab. 6, in which Col. (1) gives the 4FGL name, Col. (2) to Col. (4) are parameters and errors for fitting synchrotron bump, Col. (5) to Col. (7) are parameters and errors for fitting IC bump, while the right side of the slash is the data without EBL-absorbed. For each high-confidence TeV candidate, the SEDs fitting results are shown in Fig. 11.

Table 6: SEDs fitting results in the observed frame for 40 TeV candidates.^∗

4FGL name	$\rm\|c^{s}\|$	$\log\nu^{s}_{p}$	${\log v^{s}_{p}}{f^{\rm s}_{\rm p}}$	$\rm\|c^{IC}\|$	$\rm{log\nu^{IC}_{p}}$	${\log v^{IC}_{p}}{f^{\rm IC}_{\rm p}}$
(1)	(2)	(3)	(4)	(5)	(6)	(7)
J0037.8+1239	0.11 ± 0.02	15.35 ± 0.46	-11.17 ± 0.07	0.02 ± 0.02	23.15 ± 2.42	-11.74 ± 0.1
J0051.2-6242	0.16 ± 0.01	15.72 ± 0.06	-11.23 ± 0.03	0.08 ± 0.07	25.57 ± 1.2	-11.37 ± 0.13
J0110.1+6805	0.1 ± 0.01	14.99 ± 0.29	-10.89 ± 0.09
J0110.7-1254	0.05 ± 0.01	17.76 ± 1.09	-11.81 ± 0.22
J0115.8+2519	0.09 ± 0.01	15.67 ± 0.26	-11.54 ± 0.04
J0123.7-2311	0.05 ± 0.01	17.47 ± 0.88	-11.72 ± 0.19
J0159.5+1046	0.09 ± 0.0	15.63 ± 0.18	-11.66 ± 0.03	0.03 ± 0.02	22.26 ± 0.67	-11.6 ± 0.11
J0209.3-5228	0.11 ± 0.02	15.8 ± 0.09	-11.08 ± 0.05	0.07 ± 0.04	24.99 ± 0.57	-11.43 ± 0.04
J0211.2+1051	0.12 ± 0.0	14.8 ± 0.1	-10.57 ± 0.02	0.06 ± 0.05	22.53 ± 1.04	-11.04 ± 0.07
J0244.6-5819	0.12 ± 0.03	16.17 ± 0.49	-11.15 ± 0.12
…	…	…	…	…	…	…
…	…	…	…	…	…	…
…	…	…	…	…	…	…

*

Only ten items are displayed. A complete listing of this table is available in the online version.

4.2 Visibility of TeV blazar candidates

Being different from the all-sky scanning characteristics of space detectors, ground detectors can only scan part of the sky due to location constraints. IACTs need a dark sky for observations, and the field of view (FoV) of the current IACTs is small (3 ${}^{\circ}\textendash$ 5 ^∘). Besides, IACTs duty-cycle is restricted by the need to observe only during clear-sky, moonless nights, and, in addition, further constraints stem from the zenith angle: while the zenith angle increases, the sensitivity worsens and the energy threshold increases. This results in limiting the portion of the sky available for observation. At the same time, CTAO, as the next generation of IACT, consists of two arrays located in the Northern (28^∘ 45^′ N) and the Southern (24 ^∘ 41^′ S) hemispheres so that the FoV covers most of the sky, but not all, since CTAO suffers the same limitations affecting the current generation IACTs observations at high Zenith Angles ( $Z$ ). Conversely, the EAS arrays can work under all weather conditions and have large FoV ( $\sim{\rm 2}sr$ ), which can continuously monitor a significant fraction of the sky every day.

We compile the sensitivity curves of IACTs and EAS arrays from van Eldik et al. (2015); Aleksić et al. (2016); Cao et al. (2019); Abeysekara et al. (2017b, 2017) and online database: CTAO¹¹¹¹11https://www.cta-observatory.org/science/ctao-performance/, H.E.S.S. ¹²¹²12chrome-extension://efaidnbmnnnibpcajpcglclefindmkaj/https://www.mpi-hd.mpg.de/hfm/HESS/pages/home/proposals/sc_sens.pdf, VERITAS¹³¹³13https://veritas.sao.arizona.edu/about-veritas/veritas-specifications.. The FoV of these sensitivity curves are as follows: CTAO north (north site and $\mathrm{0}^{\circ}\leq Z\leq\mathrm{20}^{\circ}$ ), CTAO south (south site and $\mathrm{0}^{\circ}\leq Z\leq\mathrm{20}^{\circ}$ ), H.E.S.S. zenith ( $Z\sim\mathrm{0}^{\circ}$ ), H.E.S.S. medium ( $\mathrm{12}^{\circ}\leq Z\leq\mathrm{22}^{\circ}$ ), MAGIC low ( $\mathrm{0}^{\circ}\leq Z\leq\mathrm{30}^{\circ}$ ), MAGIC medium ( $\mathrm{30}^{\circ}\leq Z\leq\mathrm{45}^{\circ}$ ), VERITAS ( $\mathrm{0}^{\circ}\leq Z\leq\mathrm{20}^{\circ}$ ), LHAASO ( $Z\leq\mathrm{40}^{\circ}$ or $\mathrm{-11}^{\circ}\leq Dec\leq\mathrm{69}^{\circ}$ ), and HAWC ( $\mathrm{-20}^{\circ}\leq Dec\leq\mathrm{60}^{\circ}$ ); while the exposure times are: 507 days for HAWC, one year for LHAASO, 25 hours for H.E.S.S. near the zenith, and 50 hours for the rests.

Since EAS arrays can observe all sources in the FoV simultaneously according to the source declination, whereas IACTs can only track one source at a time and require darkness sky, we employ two different strategies to infer the visibility of a given source. A source could pass the observation window for EAS arrays if its declination is within the most observable sky. On the other hand, for IACTs, we evaluate observation windows by 2 tools: the H.E.S.S. online visibility tool ¹⁴¹⁴14https://www.mpi-hd.mpg.de/hfm/HESS/pages/home/visibility/ and a python package: astroplan. Input the celestial coordinates of the source and the observation information of IACTs to these two tools. We can get visible months during darkness at a specific elevation angle (1 - $Z$ ), as shown in Fig. 8 and Tab. 7. The 2 tools have slight differences in the specific observable months. The observable months of 40 high-confidence TeV blazar candidates for the 2 tools are also shown in Tab. 7, where Col. (1) gives the source name; Col. (2) and Col. (7) give the observable months for IACTs and EAS arrays; The numbers represent the observable months, i.e. ‘1’ for January, ‘2’ for February, and so on.

4.3 High-confidence TeV blazar candidates

We calculate the $\tau\ (E,z)$ for 40 TeV candidates with $E$ in the range of 0.001TeV to 100 TeV, and get the EBL-absorbed IC bumps. By comparing the 20 IC bumps and the sensitivity curves of IACTs and EAS arrays in the energy range of 1 TeV $\leq E\leq$ 100 TeV, we obtain 7 out of 40 high-confidence TeV blazar candidates that satisfy the two detectable criteria. The source number can be detected by the TeV facilities are (see Tab. 8 for detail): 6 CTAO (north or south) targets, 2 H.E.S.S targets, 5 MAGIC targets, and 1 VERITAS targets, and EAS arrays: 1 LHAASO targets, 0 HAWC targets. The SML and detectability results of the 40 high-confidence TeV blazar candidates are shown in Tab. 8, where Col. (1) gives the 4FGL name; Col. (2) and Col. (3) give the galactic coordinates (in degrees); Col. (4) synchrotron-peak frequency ( $\log\nu^{s}_{p}$ ); Col. (5) redshift (z); Col. (6) flux density EBL-absorbed of sources at $1\>\mathrm{TeV}$ ( $f_{1\>\mathrm{TeV}}$ ) which can be computed using Formula (7) the parameters of the IC bump; Col. (7) to Col. (10) $logistic$ in 4 catalogs, which is the likelihood of a source belonging to TeV sources given its optimal features Col. (11) the SED class in 4FGL-DR2 / 4LAC-DR2, where ‘bll’ stands for BL Lac, ‘fsrq’ for FSRQ, and ‘bcu’ for blazar candidate of uncertain type; Col. (12) to Col. (14) candidates compared with those of 3FHL, 3HSP, and 2BIGB, a candidate in common is marked as ‘Y’; Col. (15) a candidate comparing with those of other literature, ‘C & G’ represents the candidate is the same with Costamante & Ghisellini (2002), while ‘M’ for Massaro et al. (2013), ‘F’ for Foffano et al. (2019) ‘Chi’ for Chiaro et al. (2019); Col. (16) IC bump of candidates compare with the sensitivity of IACTs and particle detectors arrays in the range of $\mathrm{1TeV}\leq E\leq\mathrm{100TeV}$ . ‘CN’, ‘CS’, ‘HZ’, ‘HL’, ‘ML’, ‘MM’, ‘V’, ‘L,’, and ‘H’, stand for CTAO north, CTAO south, H.E.S.S zenith, H.E.S.S low, MAGIC low, MAGIC medium, VERITAS, LHAASO, and HAWC, respectively.

We also want to know how far away the source becomes undetectable in the TeV energy band, i.e. the redshift upper limit of the source beyond which it will become undetectable. For this purpose, in the range of 0 $\leq z\leq$ 6, we compare the EBL-absorbed IC bumps for 20 blazars and the sensitivity curves of detectors, at $E=$ 1 TeV, 10TeV, and 100 TeV, respectively. Redshifts upper limits for the 20 blazars for each IACT and EAS array are listed in Tab. 9, where Col. (1) gives the 4FGL name; Col. (2) to Col. (10) are the redshift upper limits of the 20 blazars for each, where the values in each column are redshift upper limits at $E=$ 1 TeV, $E=$ 10 TeV, and $E=$ 100 TeV, respectively. The value ‘0’ means the IC bump can not be detected at the specific $E$ ; Col. (11) redshifts; Col. (12) is the detectability for Cherenkov detectors. One can see the VHE photons could survive from EBL easier than those with higher redshift. This can also be seen from Formula (3) $\sim$ (6), the negative coefficient of $z$ leads $logit$ to be inversely proportional to $z$ . A larger redshift will result in a smaller logistic, indicating a source to be less likely to be a TeV candidate.

Table 7: Observable months of IACTs and EAS arryas^∗

4FGL name	CTAO North	CTAO South	H.E.S.S. zenith	H.E.S.S. low	MAGIC low	MAGIC medium	VERITAS
(1)	(2)	(3)	(4)	(5)	(6)	(7)	(8)
J0037.8+1239	7, 8, …, 12				1, 7, …, 12	1, 6, …, 12	7, 8, …, 12
J0051.2-6242
J0110.1+6805						1, 7, …, 12
J0110.7-1254		7, 8, …, 12		6^h, 7, …, 12		1, 7, …, 12
J0115.8+2519	1, 7, …, 12				1, 7, …, 12	1, 2, 6^h, …, 12	1, 7^a, …, 12
J0123.7-2311		7, 8, …, 12	7, 8, …, 12	6^h, 7, …, 12
J0159.5+1046	1, 8, …, 12				1, 7, …, 12	1, 2, …, 12
J0209.3-5228
J0211.2+1051	1, 8, …, 12				1, 8, …, 12	1, 2, 7, …, 12
J0244.6-5819
…	…	…	…	…	…	…	…
…	…	…	…	…	…	…	…
…	…	…	…	…	…	…	…

*

Only ten items are displayed. A complete listing of this table is available in the online version.
h

‘h’ indicates the month is only visible for the H.E.S.S. tool.
a

‘a’ indicates the month is only visible for astroplan.

Table 8: Predicting results for 40 TeV blazar candidates^∗

4FGL name	GLON	GLAT	$\log\nu^{\rm s}_{\rm p}$	$z$	$\log f_{\rm 1TeV}$	$\ logistic$				Class	Other catalogs			Common	Detectability
4FGL name	GLON	GLAT	$\log\nu^{\rm s}_{\rm p}$	$z$	$\log f_{\rm 1TeV}$	4FGL	3FHL	3HSP	2BIGB	Class	3FHL	3HSP	2BIGB	Common	Detectability
(1)	(2)	(3)	(4)	(5)	(6)	(7)	(8)	(9)	(10)	(11)	(12)	(13)	(14)	(15)	(16)
J0037.8+1239	117.77	-50.08	15.05	0.09	-12.32	82.50%	54.60%			bll	Y				CN, CS, ML, MM
J0051.2-6242	302.96	-54.43	15.96	0.3	-12.91	84.10%				bll
J0110.1+6805	124.69	5.29	14.85	0.29		81.80%				bll
J0110.7-1254	141.53	-75.08	17	0.23		82.40%				bll
J0115.8+2519	129.85	-37.21	15.75	0.36		97.60%				bll
J0123.7-2311	186.41	-81.7	17.96	0.4		91.50%				bll
J0159.5+1046	148.75	-48.66	15.8	0.2	-13.04	87.70%	53.60%			bll	Y
J0209.3-5228	278.35	-60.77	16.14	0.16	-12.28	95.60%		67.70%	60.20%	bll		Y	Y	M
J0211.2+1051	152.59	-47.37	14.22	0.2	-12.86	88.90%	55.50%			bll	Y
J0244.6-5819	278.45	-53.09	17.03	0.27		88.00%	53.30%		48.50%	bll	Y		Y	F
…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	…
…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	…
…	…	…	…	…	…	…	…	…	…	…	…	…	…	…	…

*

Only ten sources are displayed. A complete listing of this table is available in the online version.
**

The source can be detected in $\mathrm{1TeV}\leq E\leq\mathrm{10TeV}$ and $\mathrm{10TeV}\leq E\leq\mathrm{100TeV}$

Table 9: Redshift uplimit of the 20 blazars for each IACT and EAS array.^∗

4FGL name	$z_{\rm uplimit}$ of CN	$z_{\rm uplimit}$ of CS	$z_{\rm uplimit}$ of HZ	$z_{\rm uplimit}$ of HL	$z_{\rm uplimit}$ of ML	$z_{\rm uplimit}$ of MM	$z_{\rm uplimit}$ of V	$z_{\rm uplimit}$ of L	$z_{\rm uplimit}$ of H	$z$	Detectability
(1)	(2)	(3)	(4)	(5)	(6)	(7)	(8)	(9)	(10)	(11)	(12)
J0037.8+1239		0.2, 0.05, 0.01		0.02, 0.01, 0.01		0.09, 0.01, 0.01		0, 0.01, 0.01	0, 0, 0.01	0.09	CN, CS, ML, MM
J0051.2-6242		0.29, 0.07, 0.01		0.14, 0.03, 0.01	0.20, 0.02, 0.01	0.19, 0.04, 0.01	0.17, 0.01, 0.01	0.1, 0.04, 0.01	0, 0.02, 0.01	0.3
J0110.1+6805										0.29
J0110.7-1254										0.23
J0115.8+2519										0.36
J0123.7-2311										0.4
J0159.5+1046								0, 0, 0.01		0.2
J0209.3-5228		0.27, 0.07, 0.01		0.11, 0.02, 0.01	0.17, 0.01, 0.01	0.17, 0.02, 0.01		0.07, 0.03, 0.01	0, 0.01, 0.01	0.16
J0211.2+1051								0, 0, 0.01		0.2
J0244.6-5819										0.27
…	…	…	…	…	…	…	…	…	…	…	…
…	…	…	…	…	…	…	…	…	…	…	…
…	…	…	…	…	…	…	…	…	…	…	…

*

Only ten items are displayed. A complete listing of this table is available in the online version.

5 Discussions and conclusions

5.1 Discussions

The SFS identified five features in 4FGL-DR2 / 4LAC-DR2 that are most critical to distinguish TeVs from non-TeVs: flux density, FOM, synchrotron peak frequency, redshift, spectral index, and variability, which are compatible with Lin & Fan (2016). Further considering X-ray emission and the $\log\nu_{\rm p}^{\rm s}$ shifts in behavior during a flare should be further analyzed in future work, sinces blazars that are not generally detected at TeV energies could be detected during flaring states. Therefore, it is possible that all blazars can emit the TeV photons. The blazars with TeV emission detected are only because their flux density in the TeV energy band is relatively high in specific periods, such as the flaring period. Note that 4FGL-DR2 includes features such as $Variability\_Index$ , which mirrored the difference between the flux fitted in each time interval and the average flux over the entire catalog interval. Furthermore, $Frac\_Variability$ is the fractional variability computed from the fluxes in each year. The $logit^{\prime}$ of 4FGL-DR2 / 4LAC-DR2 contains the $Frac\_Variability$ making sure the LR is sensitive to the sources with synchrotron peak shifting during flaring states.

According to the classification proposed by Abdo et al. (2010) or Fan et al. (2016), sources with $\log\nu_{\rm p}^{\rm s}>$ 15 (or $\log\nu_{\rm p}^{\rm s}>$ 15.3) are HSPs. In TeVCat, if we considered only HSP/HSP BL Lacs, we would have ignored more than 11% of TeV sources. We presented here an alternative way, where we have not applied many initial cuts by filtering on SED class or other properties. To compare the performance of two criteria: HSPs ( $\log\nu_{\rm p}^{\rm s}\geqslant 15.3$ Hz (Fan et al., 2016)) and LR model, we calculate the metric and confusion matrix on the testing sets of 4FGL-DR2 / 4LAC-DR2 and 3FHL. For 4FGL-DR2 / 4LAC-DR2, the $AUC$ : 73% and $TPR$ : 62% are both lower than those predicted by LR, up to 27 TeV blazars are misclassified as non-TeVs. For 3FHL, take off four sources without $\log\nu_{\rm p}^{\rm s}$ data, the $AUC$ : 68% and $TPR$ : 63% are also lower than those predicted by LR, up to 25 TeV blazars are misclassified as non-TeVs. The relevant confusion matrix is shown in Fig. 9 and Fig. 10. The results illustrated that our selection criterion is more effective.

So far, TeV candidate selection techniques privileged targets with high predicted TeV fluxes. In our analysis, instead, we identified targets that, according to their broadband features, have high chances to become promising TeV targets in specific activity states and that may have been missed so far due to their current status or to the lack of the proper instruments and observing strategies. As a data-driven algorithm, ML is inevitably sensitive to the sample composition, the ratio of the training set and test set, the value of k in cross-validation, etc. Our result is to ensure its generalization based on existing data. There is a potential bias in the fact that a significant fraction of TeVCat sources are Fermi-LAT detected blazars, as Fermi-LAT catalogs are browsed to list potential candidates for IACTs, and because Fermi-LAT monitoring is used to trigger Target of Opportunity programs. Also, our observation proposal does not consider the weather or available time of detectors. A feasible observation proposal should be more reliable and agreed upon by the observatory that considers more influencing factors.

Detecting extragalactic photons in the TeV band needs to face two major limitations, which are EBL and the performance of detectors (sensitivity, effective area, FoV, etc). The relationship between the improvement of detector sensitivity and the detection of more TeV sources is far from simple, because if it is true that an improvement in sensitivity indeed helps detect more sources but, since the attenuation rapidly increases with energy and redshift, the overall effect is rather mild (see for example Gilmore et al. 2012). The redshifts of the 81 blazars listed in TeVCat are taken from the four catalogs and literature, noting that for 4FGL J2243.9+2021 (or RGB J2243+203) we took the upper limit of the redshift range of 0.75 $\leq z\leq$ 1.1 (Sahu et al., 2019). Then we calculate the $\tau\ (E,z)$ fixing $E=$ 1 TeV and we evaluate $\tau\ (E,z)\leq$ 13.1 for all 81 blazars. The results are also listed in Tab. 10, where Col. (1) and Col. (2) give the TeVCat name and 4FGL-DR2 name; Col. (3) and Col. (4) are the celestial equator coordinates (in degrees); Col. (5) the redshift; Col. (6) the $\tau\ (E,z)$ provided by Saldana-Lopez et al. (2021) at $E=$ 1 TeV.

Table 10:

\tau\ (E,z)

E=

1 TeV for the 81 TeV blazars^∗

TeVCat Name	4FGL Name	Ra	Dec	z	$\tau$
(1)	(2)	(3)	(4)	(5)	(6)
J0013-188	J0013.9-1854	3.47	-18.89	0.095	0.92
J0033-193	J0033.5-1921	8.4	-19.35	0.61	7.61
J0035+598	J0035.9+5950	8.82	59.79	0.467	5.7
J0112+227	J0112.1+2245	18.02	22.74	0.265	2.96
J0136+391	J0136.5+3906	24.14	39.1	0.75	9.35
J0152+017	J0152.6+0147	28.14	1.78	0.08	0.76
J0214+517	J0214.3+5145	33.57	51.75	0.049	0.45
J0218+359	J0221.1+3556	35.27	35.94	0.944	11.54
J0222+430	J0222.6+4302	35.67	43.04	0.444	5.39
J0232+202	J0232.8+2018	38.22	20.27	0.139	1.4
…	…	…	…	…	…
…	…	…	…	…	…
…	…	…	…	…	…

*

Only ten items are displayed. A complete listing of this table is available in the online version.

All in all, the validation of TeV candidates is counting on IACTs and EAS arrays, while each of the IACTs and EAS arrays has advantages and disadvantages. IACTs have better energy and angular resolution. Besides, the energy threshold of IACTs could reach the GeV band while EAS arrays are generally sensitive above $\sim$ 10 TeV, but at $\sim$ 30 TeV, LHAASO sensitivity is comparable to that of CTAO, and LHAASO at $\sim$ 100 TeV is the most sensitive instrument (Cao et al., 2019). On the other hand, EAS array observation time is not limited to moonless nights and has a much larger FoV, and the exposure time of IACTs is typically quoted for 50 hr whereas for 1 or 5 years of EAS arrays. Although IACTs are better at capturing transient events, larger FoVs and longer exposure time make up the gap for the EAS array. Overall, for IACTs and EAS arrays, the differences in their operation modes and performance parameters allow them to complement each other. Furthermore, The sensitivity of LHAASO and the next-generation IACT array: CTAO, has been improved by an order of magnitude compared with the current IACTs and EAS arrays. They have pushed the detection limit of the TeV band to higher than 100 TeV (The CTA Consortium et al., 2011; Cao et al., 2019), enabling us to detect the whole energy range of the TeV sky in the future.

5.2 Conclusions

In this paper, we implement a machine learning algorithms called Logistic Regression to search the TeV blazar candidates from 4FGL-DR2 / 4LAC-DR2, 3FHL, 3HSP, and 2BIGB. Furthermore, we filter out potential observation targets for IACTs and EAS arrays. The main conclusions are enumerated below:

For the 4 catalogs, Logistic Regression provides an empirical formula to find blazars that could be detected at TeV energies with $logit^{\prime}\geqslant\rm 0$ .

For 4FGL-DR2 / 4LAC-DR2:

logit^{\prime}=4.808\Gamma+2.809V_{F}+3.889\log f^{ph}_{\rm 7}-3.34z+0.857\log\nu_{\rm p}^{\rm s}+20.244,

with $AUC=$ 98%, and the $FPR=$ 11%, $TPR=$ 99%, and the $p_{\rm thre}=$ 40%. 40 out of 150 non-TeV blazars have $logistic\geqslant$ 80% and are thus expected to be high-confidence candidates.

For 3FHL:

logit^{\prime}=0.116\log f^{ph}_{\rm 2}-0.628z+1.028,

with $AUC=$ 88%, $FPR=$ 27%, $TPR=$ 88%, and the $p_{\rm thre=}$ 52%. 24 out of 126 TeV candidates are common with the high-confidence ones of 4FGL-DR2 / 4LAC-DR2;

For 3HSP:

logit^{\prime}=0.306FOM-3.861z-0.173,

with $AUC=$ 98%, $FPR=$ 2%, $TPR=$ 91%, and $p_{\rm thre=}$ 61%. 11 out of 40 TeV candidates are common with the high-confidence ones of 4FGL-DR2 / 4LAC-DR2;

For 2BIGB,

logit^{\prime}=-0.016E_{\rm P}+0.248FOM-4.395z-0.122,

with $AUC=$ 97%, $FPR=$ 8%, $TPR=$ 96%, and $p_{\rm thre=}$ 48%. 14 out of 83 TeV candidates are common with the high-confidence ones of 4FGL-DR2 / 4LAC-DR2;

2.

For the 40 high-confidence candidates, we independently fit the two bumps of the SEDs, infer the visibility in the sky. For EAS arrays and IACTs in 2023, 7 candidates are potential targets: 6 CTAO targets, 2 H.E.S.S targets, 5 MAGIC targets, 1 VERITAS, 1 LHAASO targets, and 0 HAWC targets, respectively.
3.

We get 2 common sources with Costamante & Ghisellini (2002), 9 ones with Massaro et al. (2013), 4 ones with Foffano et al. (2019), and none with Chiaro et al. (2019), respectively.

Acknowledgement

Thanks are given to the excellent reviewer for the constructive comments and suggestions! The work is partially supported by the National Natural Science Foundation of China (NSFC U1831119, NSFC U2031201, NSFC 11733001, NSFC 12203034), Guangdong Major Project of Basic and Applied Basic Research (2019B030302001), Shanghai science and Technology Fund (22YF1431500), Scientific and Technological Cooperation Projects(2020-2023) between the People’s Republic of China and the Republic of Bulgaria, Astrophysics Key Subjects of Guangdong Province and Guangzhou City, the Large High Altitude Air Shower Observatory collaboration, the Cherenkov Telescope Array, the High Altitude Water Cherenkov Observatory, the Major Atmospheric Gamma Imaging Cherenkov Telescopes, the Very Energetic Radiation Imaging Telescope Array System, and the UK Swift Science Data Centre at the University of Leicester.

References

Abdo et al. (2010) Abdo, A. A., Ackermann, M., Agudo, I., et al. 2010, ApJ, 716, 30, doi: 10.1088/0004-637X/716/1/30
Abdollahi et al. (2020) Abdollahi, S., Acero, F., Ackermann, M., et al. 2020, ApJS, 247, 33, doi: 10.3847/1538-4365/ab6bcb
Abdollahi et al. (2020) Abdollahi, S., Acero, F., Ackermann, M., et al. 2020, The Astrophysical Journal Supplement Series, 247, 33, doi: 10.3847/1538-4365/ab6bcb
Abeysekara et al. (2017a) Abeysekara, A. U., Albert, A., Alfaro, R., et al. 2017a, ApJ, 843, 39, doi: 10.3847/1538-4357/aa7555
Abeysekara et al. (2017b) —. 2017b, ApJ, 843, 39, doi: 10.3847/1538-4357/aa7555
Abeysekara et al. (2017) Abeysekara, A. U., Albert, A., Alfaro, R., et al. 2017, The Astrophysical Journal, 843, 40, doi: 10.3847/1538-4357/aa7556
Abramowski et al. (2013) Abramowski, A., Acero, F., Aharonian, F., et al. 2013, Phys. Rev. D, 88, 102003, doi: 10.1103/PhysRevD.88.102003
Abramowski et al. (2015) Abramowski, A., Aharonian, F., Ait Benkhali, F., et al. 2015, ApJ, 802, 65, doi: 10.1088/0004-637X/802/1/65
Acciari et al. (2008) Acciari, V. A., Aliu, E., Beilicke, M., et al. 2008, ApJ, 684, L73, doi: 10.1086/592244
Acciari et al. (2011) Acciari, V. A., Aliu, E., Arlen, T., et al. 2011, ApJ, 738, 169, doi: 10.1088/0004-637X/738/2/169
Aharonian et al. (2003) Aharonian, F., Akhperjanian, A., Beilicke, M., et al. 2003, A&A, 406, L9, doi: 10.1051/0004-6361:20030838
Aharonian et al. (2004) Aharonian, F., Akhperjanian, A. G., Aye, K. M., et al. 2004, Astroparticle Physics, 22, 109, doi: 10.1016/j.astropartphys.2004.06.006
Aharonian et al. (2007a) Aharonian, F., Akhperjanian, A. G., Barres de Almeida, U., et al. 2007a, A&A, 473, L25, doi: 10.1051/0004-6361:20078412
Aharonian et al. (2007b) —. 2007b, A&A, 475, L9, doi: 10.1051/0004-6361:20078462
Aharonian et al. (2009) Aharonian, F., Akhperjanian, A. G., Anton, G., et al. 2009, ApJ, 696, L150, doi: 10.1088/0004-637X/696/2/L150
Aharonian et al. (2001) Aharonian, F. A., Akhperjanian, A. G., Barrio, J. A., et al. 2001, A&A, 366, 62, doi: 10.1051/0004-6361:20000098
Ajello et al. (2017) Ajello, M., Atwood, W. B., Baldini, L., et al. 2017, ApJS, 232, 18, doi: 10.3847/1538-4365/aa8221
Ajello et al. (2020) Ajello, M., Angioni, R., Axelsson, M., et al. 2020, The Astrophysical Journal, 892, 105, doi: 10.3847/1538-4357/ab791e
Aleksić et al. (2016) Aleksić, J., Ansoldi, S., Antonelli, L. A., et al. 2016, Astroparticle Physics, 72, 76, doi: 10.1016/j.astropartphys.2015.02.005
Aliu et al. (2011) Aliu, E., Aune, T., Beilicke, M., et al. 2011, ApJ, 742, 127, doi: 10.1088/0004-637X/742/2/127
Aliu et al. (2015) Aliu, E., Archer, A., Aune, T., et al. 2015, ApJ, 799, 7, doi: 10.1088/0004-637X/799/1/7
Amenomori et al. (2003) Amenomori, M., Ayabe, S., Cui, S. W., et al. 2003, ApJ, 598, 242, doi: 10.1086/378350
Archambault et al. (2013) Archambault, S., Arlen, T., Aune, T., et al. 2013, ApJ, 776, 69, doi: 10.1088/0004-637X/776/2/69
Archambault et al. (2014) Archambault, S., Aune, T., Behera, B., et al. 2014, ApJ, 785, L16, doi: 10.1088/2041-8205/785/1/L16
Arlen et al. (2013) Arlen, T., Aune, T., Beilicke, M., et al. 2013, ApJ, 762, 92, doi: 10.1088/0004-637X/762/2/92
Arsioli et al. (2020) Arsioli, B., Chang, Y.-L., & Musiimenta, B. 2020, Monthly Notices of the Royal Astronomical Society, 493, 2438–2451, doi: 10.1093/mnras/staa368
Ball & Brunner (2010) Ball, N. M., & Brunner, R. J. 2010, International Journal of Modern Physics D, 19, 1049, doi: 10.1142/S0218271810017160
Ballet et al. (2020) Ballet, J., Burnett, T. H., Digel, S. W., & Lott, B. 2020, arXiv e-prints, arXiv:2005.11208. https://arxiv.org/abs/2005.11208
Baron (2019) Baron, D. 2019, arXiv e-prints, arXiv:1904.07248. https://arxiv.org/abs/1904.07248
Bartoli et al. (2011) Bartoli, B., Bernardini, P., Bi, X. J., et al. 2011, ApJ, 734, 110, doi: 10.1088/0004-637X/734/2/110
Bartoli et al. (2012) —. 2012, ApJ, 758, 2, doi: 10.1088/0004-637X/758/1/2
Biteau & Williams (2015) Biteau, J., & Williams, D. A. 2015, ApJ, 812, 60, doi: 10.1088/0004-637X/812/1/60
Błażejowski et al. (2000) Błażejowski, M., Sikora, M., Moderski, R., & Madejski, G. M. 2000, ApJ, 545, 107, doi: 10.1086/317791
Bloom & Marscher (1996) Bloom, S. D., & Marscher, A. P. 1996, ApJ, 461, 657, doi: 10.1086/177092
Buitinck et al. (2013) Buitinck, L., Louppe, G., Blondel, M., et al. 2013, in ECML PKDD Workshop: Languages for Data Mining and Machine Learning, 108–122
Cao et al. (2019) Cao, Z., Chen, M.-J., Chen, Song-Zhan Hu, H.-B., et al. 2019, Chinese Astron. Astrophys., 43, 457, doi: 10.1016/j.chinastron.2019.11.001
Chandra et al. (2010) Chandra, P., Yadav, K. K., Rannot, R. C., et al. 2010, Journal of Physics G Nuclear Physics, 37, 125201, doi: 10.1088/0954-3899/37/12/125201
Chandra et al. (2012) Chandra, P., Rannot, R. C., Yadav, K. K., et al. 2012, Journal of Physics G Nuclear Physics, 39, 045201, doi: 10.1088/0954-3899/39/4/045201
Chang et al. (2017) Chang, Y. L., Arsioli, B., Giommi, P., & Padovani, P. 2017, A&A, 598, A17, doi: 10.1051/0004-6361/201629487
Chang et al. (2019) Chang, Y.-L., Arsioli, B., Giommi, P., Padovani, P., & Brandt, C. H. 2019, Astronomy & Astrophysics, 632, A77, doi: 10.1051/0004-6361/201834526
Chiaro et al. (2019) Chiaro, G., Meyer, M., Di Mauro, M., et al. 2019, ApJ, 887, 104, doi: 10.3847/1538-4357/ab46ad
Chiaro et al. (2019) Chiaro, G., Meyer, M., Mauro, M. D., et al. 2019, The Astrophysical Journal, 887, 104, doi: 10.3847/1538-4357/ab46ad
Costamante et al. (2018) Costamante, L., Bonnoli, G., Tavecchio, F., et al. 2018, MNRAS, 477, 4257, doi: 10.1093/mnras/sty857
Costamante & Ghisellini (2002) Costamante, L., & Ghisellini, G. 2002, A&A, 384, 56, doi: 10.1051/0004-6361:20011749
Costamante et al. (2001) Costamante, L., Ghisellini, G., Giommi, P., et al. 2001, A&A, 371, 512, doi: 10.1051/0004-6361:20010412
Daniel et al. (2005) Daniel, M. K., Badran, H. M., Bond, I. H., et al. 2005, ApJ, 621, 181, doi: 10.1086/427406
Dermer & Schlickeiser (1993) Dermer, C. D., & Schlickeiser, R. 1993, ApJ, 416, 458, doi: 10.1086/173251
Dimitrakoudis et al. (2012) Dimitrakoudis, S., Mastichiadis, A., Protheroe, R. J., & Reimer, A. 2012, A&A, 546, A120, doi: 10.1051/0004-6361/201219770
Domínguez et al. (2019) Domínguez, A., Wojtak, R., Finke, J., et al. 2019, The Astrophysical Journal, 885, 137, doi: 10.3847/1538-4357/ab4a0e
Fan (2002) Fan, J.-H. 2002, PASJ, 54, L55, doi: 10.1093/pasj/54.4.L55
Fan et al. (2014) Fan, J.-H., Bastieri, D., Yang, J.-H., et al. 2014, Research in Astronomy and Astrophysics, 14, 1135, doi: 10.1088/1674-4527/14/9/004
Fan et al. (2016) Fan, J. H., Yang, J. H., Liu, Y., et al. 2016, ApJS, 226, 20, doi: 10.3847/0067-0049/226/2/20
Fan et al. (2017) Fan, J. H., Kurtanidze, O., Liu, Y., et al. 2017, ApJ, 837, 45, doi: 10.3847/1538-4357/aa5def
Fan et al. (2021) Fan, J. H., Kurtanidze, S. O., Liu, Y., et al. 2021, ApJS, 253, 10, doi: 10.3847/1538-4365/abd32d
Fawcett (2006) Fawcett, T. 2006, Pattern Recognition Letters, 27, 861, doi: https://doi.org/10.1016/j.patrec.2005.10.010
Fermi-LAT Collaboration et al. (2018) Fermi-LAT Collaboration, Abdollahi, S., Ackermann, M., et al. 2018, Science, 362, 1031, doi: 10.1126/science.aat8123
Foffano et al. (2019) Foffano, L., Prandini, E., Franceschini, A., & Paiano, S. 2019, MNRAS, 486, 1741, doi: 10.1093/mnras/stz812
Ghisellini et al. (1985) Ghisellini, G., Maraschi, L., & Treves, A. 1985, A&A, 146, 204
Gilmore et al. (2012) Gilmore, R. C., Somerville, R. S., Primack, J. R., & Domínguez, A. 2012, MNRAS, 422, 3189, doi: 10.1111/j.1365-2966.2012.20841.x
Giommi et al. (2002) Giommi, P., Capalbi, M., Fiocchi, M., et al. 2002, in Blazar Astrophysics with BeppoSAX and Other Observatories, ed. P. Giommi, E. Massaro, & G. Palumbo, 63. https://arxiv.org/abs/astro-ph/0209596
Godambe et al. (2008) Godambe, S. V., Rannot, R. C., Chandra, P., et al. 2008, Journal of Physics G Nuclear Physics, 35, 065202, doi: 10.1088/0954-3899/35/6/065202
Gupta et al. (2016) Gupta, A. C., Kalita, N., Gaur, H., & Duorah, K. 2016, Monthly Notices of the Royal Astronomical Society, 462, 1508
Gupta et al. (2016) Gupta, A. C., Agarwal, A., Bhagwan, J., et al. 2016, MNRAS, 458, 1127, doi: 10.1093/mnras/stw377
H. E. S. S. Collaboration et al. (2010) H. E. S. S. Collaboration, Abramowski, A., Acero, F., et al. 2010, A&A, 520, A83, doi: 10.1051/0004-6361/201014484
Hearst (1998) Hearst, M. A. 1998, IEEE Intelligent Systems, 13, 18, doi: 10.1109/5254.708428
HESS Collaboration et al. (2013) HESS Collaboration, Abramowski, A., Acero, F., et al. 2013, MNRAS, 434, 1889, doi: 10.1093/mnras/stt1081
Kang et al. (2019) Kang, S.-J., Li, E., Ou, W., et al. 2019, ApJ, 887, 134, doi: 10.3847/1538-4357/ab558b
Kapanadze et al. (2018) Kapanadze, B., Dorner, D., Vercellone, S., et al. 2018, MNRAS, 473, 2542, doi: 10.1093/mnras/stx2492
Kneiske et al. (2004) Kneiske, T. M., Bretz, T., Mannheim, K., & Hartmann, D. H. 2004, A&A, 413, 807, doi: 10.1051/0004-6361:20031542
Korsós et al. (2021) Korsós, M. B., Erdélyi, R., Liu, J., & Morgan, H. 2021, Frontiers in Astronomy and Space Sciences, 7, 113, doi: 10.3389/fspas.2020.571186
Lin & Fan (2016) Lin, C., & Fan, J.-H. 2016, Research in Astronomy and Astrophysics, 16, 103, doi: 10.1088/1674-4527/16/7/103
Lott et al. (2020) Lott, B., Gasparrini, D., & Ciprini, S. 2020, The Fourth Catalog of Active Galactic Nuclei Detected by the Fermi Large Area Telescope – Data Release 2. https://arxiv.org/abs/2010.08406
MAGIC Collaboration (2000) MAGIC Collaboration. 2000, in American Institute of Physics Conference Series, Vol. 515, GeV-TeV Gamma Ray Astrophysics Workshop : towards a major atmospheric, ed. B. L. Dingus, M. H. Salamon, & D. B. Kieda, 510–514, doi: 10.1063/1.1291417
Majumder et al. (2019) Majumder, A., Mitra, K., Chatterjee, R., et al. 2019, MNRAS, 490, 124, doi: 10.1093/mnras/stz2557
Maraschi et al. (1992) Maraschi, L., Ghisellini, G., & Celotti, A. 1992, ApJ, 397, L5, doi: 10.1086/186531
Massaro et al. (2013) Massaro, F., Paggi, A., Errando, M., et al. 2013, ApJS, 207, 16, doi: 10.1088/0067-0049/207/1/16
McCulloch & Pitts (1943) McCulloch, W. S., & Pitts, W. 1943, The bulletin of mathematical biophysics, 5, 115
Nievas Rosillo et al. (2022) Nievas Rosillo, M., Domínguez, A., Chiaro, G., et al. 2022, MNRAS, 512, 137, doi: 10.1093/mnras/stac491
Osorio et al. (2019) Osorio, M., Sacahui, R., González, M. M., Fraija, N., & García-González, J. A. 2019, in International Cosmic Ray Conference, Vol. 36, 36th International Cosmic Ray Conference (ICRC2019), 686
Padovani & Giommi (1995) Padovani, P., & Giommi, P. 1995, ApJ, 444, 567, doi: 10.1086/175631
Pedregosa et al. (2011) Pedregosa, F., Varoquaux, G., Gramfort, A., et al. 2011, Journal of Machine Learning Research, 12, 2825
Piner et al. (2010) Piner, B. G., Pant, N., & Edwards, P. G. 2010, ApJ, 723, 1150, doi: 10.1088/0004-637X/723/2/1150
Primack et al. (1999) Primack, J. R., Bullock, J. S., Somerville, R. S., & MacMinn, D. 1999, Astroparticle Physics, 11, 93, doi: 10.1016/S0927-6505(99)00031-6
Raschka (2015) Raschka, S. 2015, Python Machine Learning (Packt Publishing)
Sahu et al. (2019) Sahu, S., López Fortín, C. E., & Nagataki, S. 2019, ApJ, 884, L17, doi: 10.3847/2041-8213/ab43c7
Saldana-Lopez et al. (2021) Saldana-Lopez, A., Domínguez, A., Pérez-González, P. G., et al. 2021, MNRAS, 507, 5144, doi: 10.1093/mnras/stab2393
Sambruna et al. (2000) Sambruna, R. M., Aharonian, F. A., Krawczynski, H., et al. 2000, ApJ, 538, 127, doi: 10.1086/309133
Schroedter et al. (2005) Schroedter, M., Badran, H. M., Buckley, J. H., et al. 2005, ApJ, 634, 947, doi: 10.1086/496968
Sharma et al. (2015) Sharma, M., Nayak, J., Koul, M. K., et al. 2015, Nuclear Instruments and Methods in Physics Research A, 770, 42, doi: 10.1016/j.nima.2014.10.012
Sikora et al. (1994a) Sikora, M., Begelman, M. C., & Rees, M. J. 1994a, ApJ, 421, 153, doi: 10.1086/173633
Sikora et al. (1994b) —. 1994b, ApJ, 421, 153, doi: 10.1086/173633
Singh et al. (2019) Singh, K. K., Bhatt, H., Bhattacharyya, S., et al. 2019, Advances in Space Research, 63, 766, doi: 10.1016/j.asr.2018.08.013
Tavecchio (2014) Tavecchio, F. 2014, Monthly Notices of the Royal Astronomical Society, 438, 3255, doi: 10.1093/mnras/stt2437
Taylor (2019) Taylor, C. 2019, Applications Of Dynamic Programming To Agricultural Decision Problems (CRC Press). https://books.google.it/books?id=71SsDwAAQBAJ
The CTA Consortium et al. (2011) The CTA Consortium, Actis, M., Agnetta, G., Aharonian, F., et al. 2011, Experimental Astronomy, 32, 193, doi: 10.1007/s10686-011-9247-0
Tin Kam Ho (1995) Tin Kam Ho. 1995, in Proceedings of 3rd International Conference on Document Analysis and Recognition, Vol. 1, 278–282 vol.1, doi: 10.1109/ICDAR.1995.598994
Urry & Padovani (1995) Urry, C. M., & Padovani, P. 1995, PASP, 107, 803, doi: 10.1086/133630
van Eldik et al. (2015) van Eldik, C., Holler, M., Berge, D., et al. 2015, in International Cosmic Ray Conference, Vol. 34, 34th International Cosmic Ray Conference (ICRC2015), 847
Varma & Simon (2006) Varma, S., & Simon, R. 2006, BMC bioinformatics, 7, 91, doi: 10.1186/1471-2105-7-91
VERITAS Collaboration et al. (2005) VERITAS Collaboration, Badran, H. M., Blaylock, G., et al. 2005, in American Institute of Physics Conference Series, Vol. 745, High Energy Gamma-Ray Astronomy, ed. F. A. Aharonian, H. J. Völk, & D. Horns, 633–638, doi: 10.1063/1.1878475
Villata et al. (2006) Villata, M., Raiteri, C. M., Balonek, T. J., et al. 2006, A&A, 453, 817, doi: 10.1051/0004-6361:20064817
Wakely & Horan (2008) Wakely, S. P., & Horan, D. 2008, in International Cosmic Ray Conference, Vol. 3, International Cosmic Ray Conference, 1341–1344
Wills et al. (1992) Wills, B. J., Wills, D., Breger, M., Antonucci, R. R. J., & Barvainis, R. 1992, ApJ, 398, 454, doi: 10.1086/171869
Xiao et al. (2022) Xiao, H., Zhu, J., Fu, L., Zhang, S., & Fan, J. 2022, PASJ, 74, 239, doi: 10.1093/pasj/psab121
Xiao et al. (2019) Xiao, H., Fan, J., Yang, J., et al. 2019, Science China Physics, Mechanics, and Astronomy, 62, 129811, doi: 10.1007/s11433-018-9371-x
Xiao et al. (2020) Xiao, H. B., Fan, J. H., Rando, R., Zhu, J. T., & Hu, L. J. 2020, Astronomische Nachrichten, 341, 462, doi: https://doi.org/10.1002/asna.202013733
Xiao et al. (2015) Xiao, H. B., Pei, Z. Y., Xie, H. J., et al. 2015, Ap&SS, 359, 39, doi: 10.1007/s10509-015-2433-1
Yang et al. (2022a) Yang, J. H., Fan, J. H., Liu, Y., et al. 2022a, ApJS, 262, 18, doi: 10.3847/1538-4365/ac7deb
Yang et al. (2022b) Yang, W.-X., Xiao, H.-B., Wang, H.-G., et al. 2022b, Research in Astronomy and Astrophysics, 22, 085002, doi: 10.1088/1674-4527/ac712c
Youden (1950) Youden, W. J. 1950, Cancer, 3, 32, doi: https://doi.org/10.1002/1097-0142(1950)3:1<32::AID-CNCR2820030106>3.0.CO;2-3