
Testing CP properties of the Higgs boson coupling to $\tau$ leptons with heterogeneous graphs

W. Esmail$^{a}$, A. Hammad$^{b}$, M. Nojiri$^{b,c,d}$ and Christiane Scherb$^{e,f}$
$^{a}$ Institut für Kernphysik, Universität Münster, Wilhelm-Klemm-Str. 9, 48149 Münster, Germany.
$^{b}$ Theory Center, IPNS, KEK, 1-1 Oho, Tsukuba, Ibaraki 305-0801, Japan.
$^{c}$ The Graduate University of Advanced Studies (Sokendai), 1-1 Oho, Tsukuba, Japan.
$^{d}$ Kavli IPMU (WPI), University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba 277-8583, Japan.
$^{e}$ Berkeley Center for Theoretical Physics, Department of Physics, University of California, Berkeley, CA 94720, USA.
$^{f}$ Theoretical Physics Group, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.
Abstract

We explore the feasibility of measuring the CP properties of the Higgs boson coupling to $\tau$ leptons at the High Luminosity Large Hadron Collider (HL-LHC). Employing detailed Monte Carlo simulations, we analyze the reconstruction of the angle between $\tau$ lepton planes at the detector level, accounting for various hadronic $\tau$ decay modes. Considering standard model backgrounds and detector resolution effects, we employ three Deep Learning (DL) networks, a Multi-Layer Perceptron (MLP), a Graph Convolution Network (GCN), and a Graph Transformer Network (GTN), to enhance signal-to-background separation. To incorporate CP-sensitive observables into graph networks, we construct heterogeneous graphs capable of integrating nodes and edges with different structures within the same framework. Our analysis demonstrates that the GTN exhibits superior efficiency compared to the GCN and MLP. Under a simplified detector simulation analysis, the MLP can exclude CP mixing angles larger than $20^{\circ}$ at $68\%$ confidence level (CL), while the GCN and GTN can achieve exclusions at $90\%$ CL and $95\%$ CL, respectively, with $\sqrt{s}=14$ TeV and $\mathcal{L}=100~\mathrm{fb}^{-1}$. Furthermore, the DL networks can achieve a significance of approximately $3\sigma$ in excluding the pure CP-odd state.

 

 

1 Introduction

Following the discovery of a scalar resonance with a mass near 125 GeV [1, 2], research has focused on measuring its properties to assess whether it is the Higgs boson predicted by the Standard Model (SM) [3, 4]. A key aspect of this investigation is examining its spin and CP transformation properties [5, 6, 7]. For example, top-associated production is a promising channel to extract such properties [8, 9, 10]. Testing its CP properties is especially significant in light of the observed baryon asymmetry of the Universe. CP violation in the SM was first observed in kaon decays [11] and was established by measurements of direct CP violation in the K system [12] and of CP violation in neutral B meson decays [13, 14]. However, it is insufficient to account for the baryon asymmetry of the Universe [15]. Therefore, additional sources of CP violation are essential ingredients of physics beyond the SM (BSM) to address the origin of matter in our Universe. So far, measurements of the Higgs boson properties by the ATLAS and CMS experiments, for instance of its interactions with gauge bosons [16, 17, 18, 19, 20, 21], show no deviations from the SM predictions. Still, the possibility of an extended scalar sector that includes CP violation, in which the observed scalar resonance is a CP-mixed state, has not been ruled out.

Besides top-associated Higgs production, studying the $\tau$ spin correlations induced by the $h\tau\tau$ coupling yields valuable information about the CP state of the Higgs boson and has been proposed for both the LHC and lepton colliders [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34]. Here, the primary focus is on reconstructing the acoplanarity angle between the decay planes of the $\tau$ lepton pair from the angular correlations of the decay products. A significant challenge for this kind of study at the LHC arises from the presence of neutrinos in $\tau$ lepton decays, which complicates the accurate determination of the $\tau$ momentum vector and, consequently, of the angular distribution between the $\tau$ leptons. However, the one-prong decays of the $\tau$ lepton, $\tau^{\pm}\to\pi^{\pm}\nu_{\tau}$, $\tau^{\pm}\to\rho^{\pm}(\rho^{\pm}\to\pi^{\pm}\pi^{0})\nu_{\tau}$ and $\tau^{\pm}\to a_{1}^{\pm}(a_{1}^{\pm}\to\pi^{\pm}\pi^{0}\pi^{0})\nu_{\tau}$, are promising. For the $\pi^{+}$ final state, the CP information can be reconstructed from the charged pion momentum and impact parameter (IP) information [26]. For the $\rho$ and $a_{1}$ final states, the angle between the decay planes of ($\pi^{+},n\pi^{0}$) and ($\pi^{-},n\pi^{0}$) retains the spin correlation information of the $\tau$ lepton pair. It is essential to combine as many final states as possible: the branching ratio of the $\tau^{+}\to\pi^{+}\nu$ final state is only 10.8% but increases to 45.6% if one includes final states up to $\pi^{+}2\pi^{0}$. Analyses of Higgs boson decays into $\tau$ lepton pairs have been conducted by both ATLAS [35, 16, 36] and CMS [37, 38, 39, 40]. They show agreement with the SM prediction for the Higgs boson but do not rule out a CP admixture.
ATLAS and CMS will each accumulate more than 200 fb$^{-1}$ in Run 3 (2024-2025). In the future, the HL-LHC aims to collect 3000 fb$^{-1}$ to reveal the nature of the Higgs boson.

The hadronic final states of $\tau\bar{\tau}$ suffer from serious backgrounds from various SM processes. Cuts to reduce the background have been developed, based for example on the jet isolation, the jet mass, the charge multiplicity, and the momentum distribution of pions. Deep Learning Neural Networks (DNNs) can efficiently increase the signal-to-background yield. Recently, DNNs have been widely used in collider analyses for various tasks, see [41] and references therein. In this paper, we utilize different sets of DNNs to suppress background events and study the CP properties of the Higgs boson at the HL-LHC. The first DNN we consider is the Multi-Layer Perceptron (MLP), which analyzes high-level kinematic and CP distributions. Although the MLP can achieve high classification performance between signal and background events, it does not provide optimal performance for an analysis of the CP properties of the Higgs boson. This is because the CP information is fully mixed with the event kinematics, so the learned information about the Higgs boson CP properties is diluted and the network performance is degraded. The issue of degraded performance in MLPs due to inputs containing mixed information has been highlighted, for example, in [42].

Alternatively, one can utilize a heterogeneous graph to analyse the $h\rightarrow\tau\bar{\tau}$ decay. A heterogeneous graph consists of nodes and edges with different types of information, which allows the separate encoding of CP and kinematical information. We construct a graph from nodes for the final state pions and the reconstructed $\tau$ pair, with selective connections. Pion nodes are fully connected to recover the kinematic information, while the $\tau$ nodes are connected by an edge weighted by the value of the reconstructed angular distribution between them. This approach allows CP and kinematic information to be separately encoded within a single graph.

To analyze these constructed graphs, we utilize two Graph Neural Networks (GNNs): a Graph Convolutional Network (GCN) and a Graph Transformer Network (GTN). One key advantage of the GTN is its ability to dynamically capture the complex information within the graph: GTNs leverage the attention mechanism to weigh the importance of different nodes irrespective of their position, while GCNs rely on localized neighborhood aggregation. Thanks to the attention mechanism, GTNs can focus on the most relevant parts of the heterogeneous graph. This flexibility, together with the enhanced capability to model intricate patterns, makes GTNs particularly powerful for heterogeneous graph classification tasks. We find that the GTN can improve on the current sensitivity of the ATLAS and CMS analyses to the CP nature of the coupling.

This paper is organized as follows: In Section 2, we describe the parameterization of the effective Lagrangian relevant to this study and the construction of the CP observable in different hadronic decay channels of the $\tau$ lepton. Section 3 details the analysis methodology and the Monte Carlo tools used for event simulation. In Section 4, we discuss the three deep learning methods, MLP, GCN and GTN. The results of this study are presented in Section 5 and our conclusion is drawn in Section 6.

2 Effective Lagrangian and CP observable

The general form of the effective Yukawa interaction between the Higgs boson $h$ and $\tau$ leptons can be written as

\begin{equation}
\mathcal{L}_{h\tau\tau}=-\frac{m_{\tau}}{v}\,\kappa_{\tau}\,\bar{\tau}\left(\cos\theta_{\tau}+i\gamma_{5}\sin\theta_{\tau}\right)\tau h\,. \tag{1}
\end{equation}

Here, $\kappa_{\tau}$ is the reduced Yukawa coupling strength, $v=246$ GeV is the vacuum expectation value (vev) of the Higgs field and $\theta_{\tau}$ is the CP mixing angle, with $\theta_{\tau}$ ranging from $-90^{\circ}$ to $90^{\circ}$. This angle parameterizes the relative contributions of the CP-even and CP-odd components to the $h\tau\tau$ coupling. Specifically, $\theta_{\tau}=0^{\circ}$ represents a purely CP-even state, while $\theta_{\tau}=90^{\circ}$ represents a purely CP-odd state. Values of $\theta_{\tau}$ between these extremes indicate an admixture of both components, signalling CP violation in the Higgs sector. The CP-mixing angle $\theta_{\tau}$ affects the correlations between the transverse spin components of the $\tau$ leptons in $h\to\tau\tau$ decays. These correlations, in turn, influence the directions of the $\tau$ lepton decay products. The acoplanarity angle $\Phi^{\ast}$, defined between the $\tau$ decay planes, is sensitive to these transverse spin correlations and therefore to the CP-mixing angle of the Yukawa coupling. The $\Phi^{\ast}$ angle is directly connected to $\theta_{\tau}$ in the $h\to\tau\tau$ differential decay rate, with the relationship taking the form of a first-order trigonometric polynomial in $2\theta_{\tau}$. The differential decay rate can be obtained as [43, 44]

\begin{equation}
d\Gamma_{h\to\tau^{-}\tau^{+}\to a^{-}a^{\prime +}}\propto\left(1-\frac{\pi^{2}}{16}\,b(E_{+})\,b(E_{-})\cos\left(\Phi^{\ast}-2\theta_{\tau}\right)\right)\,, \tag{2}
\end{equation}

where $a$ and $a^{\prime}$ are the $\tau$ decay products and $b(E_{\pm})$ are the spectral functions defined in [27].
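As an illustration of how Eq. (2) encodes the mixing angle, the following minimal sketch evaluates the normalized $\Phi^{\ast}$ density and the position of its minimum. The function names are hypothetical, and setting the spectral functions $b(E_{\pm})$ to 1 is purely an illustrative assumption, not a physical value.

```python
import math


def acoplanarity_density(phi_star, theta_tau, b_plus=1.0, b_minus=1.0):
    """Normalized dGamma/dPhi* following Eq. (2), for Phi* in [-pi, pi).

    b_plus and b_minus stand in for the spectral functions b(E+-);
    the default value 1 is an illustrative assumption only.
    """
    amplitude = (math.pi ** 2 / 16.0) * b_plus * b_minus
    modulation = amplitude * math.cos(phi_star - 2.0 * theta_tau)
    # The cosine integrates to zero over a full period, so 1/(2 pi) normalizes.
    return (1.0 - modulation) / (2.0 * math.pi)


def minimum_position(theta_tau):
    """Phi* at which the density is smallest (cos = 1), wrapped to [-pi, pi)."""
    return (2.0 * theta_tau + math.pi) % (2.0 * math.pi) - math.pi


if __name__ == "__main__":
    for degrees in (0.0, 45.0, 90.0):
        theta = math.radians(degrees)
        print(degrees, minimum_position(theta), acoplanarity_density(0.0, theta))
```

In this sketch the whole dependence on $\theta_{\tau}$ is a rigid shift of the distribution by $2\theta_{\tau}$, which is why the position of the minimum alone already distinguishes the CP-even, CP-odd and maximally mixed hypotheses.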

The reconstruction of $\Phi^{\ast}$ requires the reconstruction of the $\tau$ decay planes, which is challenging at the detector level due to the missing energy associated with the $\tau$ neutrinos. Various methods have been developed to approximate the acoplanarity angle based on different techniques [23, 45, 25, 27, 28, 46, 31]. These methods are tailored to specific $\tau$ lepton decay modes and are adjusted according to the number of visible particles within the $\tau_{\rm jet}$. Among them, we consider two methods for reconstructing $\Phi^{\ast}$ [36]. For the $\tau^{\pm}\to\pi^{\pm}\nu_{\tau}$ decay, only one visible charged particle is present. In this case, the $\tau$ decay plane can be reconstructed using the IP method, based on the impact parameters of the charged pions from the two $\tau$ leptons. On the other hand, if the $\tau$ decays to $\rho^{\pm}\nu_{\tau}$ or $a^{\pm}_{1}\nu_{\tau}$, both charged and neutral pions are present. In this case we use the transverse momenta of the visible particles to estimate $\Phi^{\ast}$.

There is a significant advantage in having two visible particles from each $\tau$ decay. Although the momentum and decay plane of the $\tau$ lepton cannot be fully reconstructed due to the missing energy from the associated neutrino, the momenta of the charged and neutral pions can still be utilized. This allows for retaining information about the $\tau$ polarization and reconstructing $\Phi^{\ast}$. The acoplanarity angle can be derived from the charged and neutral pions at the LHC as [31, 47]

\begin{equation}
\Phi^{\ast}=\arccos\left(\hat{p}^{0+}_{\perp}\cdot\hat{p}^{0-}_{\perp}\right)\times\text{sgn}\left(\hat{p}^{-}\cdot\left(\hat{p}^{0+}_{\perp}\times\hat{p}^{0-}_{\perp}\right)\right)\,, \tag{3}
\end{equation}

where $\hat{p}^{\pm}$ is the unit vector of the charged pion three-momentum in the zero momentum frame of the $\rho$-meson pair, and $\hat{p}^{0+}_{\perp}$, $\hat{p}^{0-}_{\perp}$ are the normalised three-momentum vectors of the neutral pions transverse to the momentum of the associated charged pion. Another requirement is the discrimination of phase-space regions with different $\tau$ polarization. The sign of the product of the $\tau$ lepton spin analyzing functions, $Y=y_{-}^{\rho}\times y_{+}^{\rho}$, where $y_{\pm}^{\rho}=\frac{E_{\pi^{\pm}}-E_{\pi^{0}}}{E_{\pi^{\pm}}+E_{\pi^{0}}}$ and $E_{\pi}$ is the pion energy in the laboratory frame, appears in the CP-mixing sensitive terms of the squared matrix element. As $Y$ is not positive definite, integrating over the pion momenta for both $Y>0$ and $Y<0$ would average out the CP-mixing sensitive terms in the matrix element. Accordingly, the two classes of events are separated by shifting $\Phi^{\ast}$ by $\pi$ for events with $Y<0$. The acoplanarity angle is thus modified only for $Y<0$, where it is redefined as

\begin{equation}
\Phi^{\ast}=\begin{cases}\Phi^{\ast}-\pi & \text{if } 0<\Phi^{\ast}<\pi\,,\\ \Phi^{\ast}+\pi & \text{if } -\pi<\Phi^{\ast}<0\,.\end{cases} \tag{4}
\end{equation}

With this definition, $\Phi^{\ast}$ has the range $-\pi<\Phi^{\ast}<\pi$.
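The construction in Eqs. (3) and (4) can be sketched as follows. This is a minimal NumPy illustration, not the analysis code: it assumes that all three-momenta have already been boosted to the zero momentum frame of the $\rho$-meson pair, and the function names (`spin_analyzer`, `phi_star_rho`) are hypothetical.

```python
import numpy as np


def unit(v):
    """Return v normalized to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)


def transverse_unit(p_neutral, p_charged):
    """Unit vector of p_neutral transverse to the charged-pion direction."""
    axis = unit(p_charged)
    p_neutral = np.asarray(p_neutral, dtype=float)
    return unit(p_neutral - np.dot(p_neutral, axis) * axis)


def spin_analyzer(e_charged, e_neutral):
    """y = (E_charged - E_neutral) / (E_charged + E_neutral), lab-frame energies."""
    return (e_charged - e_neutral) / (e_charged + e_neutral)


def phi_star_rho(p_plus, p0_plus, p_minus, p0_minus, y_plus, y_minus):
    """Acoplanarity angle of Eq. (3), shifted by pi when Y < 0 as in Eq. (4).

    Inputs are three-momenta assumed to be given in the rho-pair rest frame.
    """
    n_plus = transverse_unit(p0_plus, p_plus)
    n_minus = transverse_unit(p0_minus, p_minus)
    cos_phi = np.clip(np.dot(n_plus, n_minus), -1.0, 1.0)
    sign = np.sign(np.dot(unit(p_minus), np.cross(n_plus, n_minus)))
    phi = float(np.arccos(cos_phi) * sign)
    if y_plus * y_minus < 0.0:
        # Eq. (4): only events with Y < 0 are shifted.
        phi = phi - np.pi if phi > 0.0 else phi + np.pi
    return phi
```

For the $a_{1}$ mode the same function could be reused with the summed momentum of the two neutral pions passed as `p0_plus` and `p0_minus`.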

Analogously to the $\rho$ decay mode, the acoplanarity angle for the decay mode $\tau^{\pm}\to a^{\pm}_{1}\nu_{\tau}\to\pi^{\pm}2\pi^{0}\nu_{\tau}$ can be constructed by using the summed four-momentum of the two neutral pions in place of the neutral pion momentum of the $\rho$ method.

For the decay mode $\tau^{\pm}\to\pi^{\pm}\nu_{\tau}$, the IP method is employed to reconstruct $\Phi^{\ast}$. The impact parameter is defined as the shortest distance between the primary vertex and the pion momentum vector extended in the direction of the $\tau$ decay point. Since it is practically impossible to reconstruct the $\tau$ lepton momentum due to the presence of $\tau$ neutrinos among the decay products, the $\tau$ lepton decay plane is reconstructed from the track momentum and the impact parameter of the charged pion. First, the normalized impact parameters of the charged pions, $\hat{n}^{\ast}=(0,\vec{n}^{\ast\pm})$, are measured in the lab frame and then boosted to the zero momentum frame of the visible $\pi^{\pm}$ pair. The components of the boosted impact parameters transverse to the direction of the associated charged pion momentum, $\hat{n}^{\ast\pm}_{\perp}$, are used to define the acoplanarity angle as follows:

\begin{equation}
\Phi^{\ast}=\arccos\left(\hat{n}^{\ast+}_{\perp}\cdot\hat{n}^{\ast-}_{\perp}\right)\times\text{sgn}\left(\hat{p}^{\ast-}\cdot\left(\hat{n}^{\ast+}_{\perp}\times\hat{n}^{\ast-}_{\perp}\right)\right)\,. \tag{5}
\end{equation}

Considering this setup, we can analyze the CP properties of Higgs boson decays to $\tau$ lepton pairs from three hadronic decay modes of the $\tau$ lepton. In general, the IP method, which requires only one charged pion to reconstruct $\Phi^{\ast}$, is also suitable for analyzing other decay modes such as $\tau^{\pm}\to\rho^{\pm}\nu_{\tau}$ and $\tau^{\pm}\to a^{\pm}_{1}\nu_{\tau}$. However, this method has low efficiency due to the significant uncertainty associated with IP reconstruction. The IP of the $\tau_{\rm jet}$ is small relative to the tracking resolution, which limits the precision of its measurement despite the excellent resolution of the detector tracker. An advantage of the neutral pion method is that it does not rely on the reconstruction of the IP. Instead, it requires determining the direction of the neutral pion. The relatively large distance between the primary interaction point and the electromagnetic calorimeter (ECAL) ($\mathcal{O}(1)$ m), coupled with the fine ECAL granularity, allows the direction of neutral pions to be reconstructed with smaller relative uncertainties than the IP.

3 Analysis methodology

With the theoretical framework established, we now proceed with a phenomenological investigation into the CP properties of the Higgs boson through its decay into a pair of $\tau$ leptons. Our analysis centers on the Higgs boson, with a branching ratio ${\rm BR}(h\to\tau\tau)\sim 6.23\%$, produced via gluon-gluon fusion with a production cross section of $51$ pb at $\sqrt{s}=14$ TeV and $46$ pb at $\sqrt{s}=13.5$ TeV. We consider the hadronic decays of the $\tau$ lepton through three distinct modes: 1) both $\tau$ leptons decay via $\tau^{\pm}\to\pi^{\pm}\nu_{\tau}$, 2) both decay via $\tau^{\pm}\to\rho^{\pm}\nu_{\tau}$, and 3) both decay via $\tau^{\pm}\to a_{1}^{\pm}\nu_{\tau}\to\pi^{\pm}2\pi^{0}\nu_{\tau}$. The corresponding branching fractions of a single $\tau$ lepton are approximately ${\rm Br}(\tau^{-}\rightarrow\pi^{-}\nu)=10.8\%$, ${\rm Br}(\tau^{-}\rightarrow\rho^{-}\nu)=25.49\%$, and ${\rm Br}(\tau^{-}\rightarrow\pi^{-}2\pi^{0}\nu)=9.26\%$ according to Ref. [48]. Our networks analyse these three modes as a single input without changing the network structure, and extending the analysis to mixed final states, such as $\pi\rho$, $\pi a_{1}$ and $\rho a_{1}$, is straightforward.
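To give a feeling for the event rates implied by these numbers, the short sketch below multiplies the quoted cross section, Higgs branching ratio and squared single-$\tau$ branching fractions for the integrated luminosity of $100~\mathrm{fb}^{-1}$ used in the abstract. It is a back-of-the-envelope estimate before any selection cut or reconstruction efficiency; the function and variable names are hypothetical.

```python
SIGMA_GGF_PB = 51.0        # gluon-fusion cross section at 14 TeV, in pb
BR_H_TAUTAU = 0.0623       # BR(h -> tau tau)
LUMINOSITY_FB = 100.0      # integrated luminosity, in fb^-1
TAU_BR = {"pi": 0.108, "rho": 0.2549, "a1": 0.0926}  # single-tau branching fractions


def expected_yields(sigma_pb=SIGMA_GGF_PB, br_h=BR_H_TAUTAU,
                    lumi_fb=LUMINOSITY_FB, tau_br=TAU_BR):
    """Produced h -> tau tau events per same-mode final state, before any cuts."""
    n_h_tautau = sigma_pb * 1000.0 * lumi_fb * br_h  # 1 pb = 1000 fb
    return {mode + mode: n_h_tautau * br ** 2 for mode, br in tau_br.items()}


if __name__ == "__main__":
    for final_state, n_events in expected_yields().items():
        print(f"{final_state}: {n_events:.0f} events")
```

The $\rho\rho$ channel dominates because the yield scales with the square of the single-$\tau$ branching fraction.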

In this section, we discuss the construction of the combined signal and background events across different $\tau$ decay modes. Furthermore, we describe the tools used for the event simulation, which preserve the $\tau$ spin correlations in the final states.

3.1 Signal reconstruction and background estimation

ATLAS [36] and CMS [39] have established the kinematic selection criteria for the CP mixing Higgs decay search in the $h\to\tau\tau$ channel. We follow the selection criteria of the ATLAS analysis in this paper. In our simulation, $\tau$ jets are reconstructed at the detector simulation level of Delphes [49] with a flat identification efficiency of $60\%$ and a $1\%$ fake efficiency from light jets. Two reconstructed $\tau$-tagged jets are required to fulfil the basic selection cut of $P_{T}>20$ GeV, each containing at least one charged track with $P_{T}>1$ GeV inside the jet cone. Moreover, we require the reconstructed missing energy to be $\not{E}_{T}\geq 20$ GeV. Depending on the reconstruction method of the decay mode, different additional selection criteria are applied to enhance the sensitivity.

The IP method is used to reconstruct $\Phi^{\ast}$ for $\tau\to\pi\nu_{\tau}$ decays. For these events only one charged track is required per $\tau$ decay. The impact parameter method is applicable only when the IP is larger than the detector resolution; otherwise the reconstructed $\Phi^{\ast}$ and the extracted CP mixing are diluted. The transverse impact parameter $d_{0}$ of a charged-particle track is the distance of closest approach of the track to the primary vertex in the transverse plane, and the longitudinal impact parameter $z_{0}$ is the corresponding distance along the beam direction. To improve the efficiency of the IP method, selected events have to satisfy $d_{0}\geq 100\ \mu$m and $z_{0}\geq 2$ mm.
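The baseline and IP-quality requirements quoted above can be collected in a small selection sketch. The `TauJet` container and both function names are hypothetical, and applying the $d_{0}$ and $z_{0}$ thresholds to absolute values is an assumption made here, since the text does not state a sign convention.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TauJet:
    pt: float                # jet transverse momentum, GeV
    leading_track_pt: float  # hardest charged track inside the jet cone, GeV
    d0: float                # transverse impact parameter, mm
    z0: float                # longitudinal impact parameter, mm


def passes_baseline(jets: List[TauJet], missing_et: float) -> bool:
    """Two tau-tagged jets with P_T > 20 GeV, a track with P_T > 1 GeV, MET >= 20 GeV."""
    if len(jets) != 2:
        return False
    jets_ok = all(j.pt > 20.0 and j.leading_track_pt > 1.0 for j in jets)
    return jets_ok and missing_et >= 20.0


def passes_ip_quality(jets: List[TauJet]) -> bool:
    """Extra cuts for the IP method: d0 >= 100 micrometres and z0 >= 2 mm.

    Absolute values are used here as an assumption about the sign convention.
    """
    return all(abs(j.d0) >= 0.1 and abs(j.z0) >= 2.0 for j in jets)
```

An event in the $\pi\pi$ channel would be kept only if both functions return `True`, while the $\rho$ and $a_{1}$ channels would use the baseline cuts together with their neutral-pion requirements.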

For events with $\rho$ meson decays, a charged pion track and a reconstructed $\pi^{0}$ are required. Similarly, for events with $a^{\pm}_{1}$ decays, a charged pion track and two reconstructed $\pi^{0}$ are required.

The conditions for the reconstruction of $\pi^{0}$ in $\tau$ decays in the ATLAS and CMS experiments are detailed in [50, 51, 52]. We imitate these conditions by requiring two (four) reconstructed photons in the Delphes simulation with $P_{T}>1$ GeV and $\Delta R>0.05$, following the suggestion in [47], for the $\rho^{\pm}$ ($a_{1}^{\pm}$) final state. To apply this condition, we use the truth information to match the $\tau$ jet to the $\rho$ ($a_{1}$) momentum direction. Note that the purpose of this paper is to show the improvement obtained with our networks relative to a standard DNN, not to estimate the improvement achievable under actual experimental conditions.

For background estimation, we follow an ATLAS search for the Higgs boson decaying to $\tau$ lepton pairs [53]. The dominant irreducible background emerges from the Drell–Yan process $pp\to\gamma^{\ast}/Z\to\tau\tau$, which contributes $90\%$ of the total background events. Other backgrounds stem from $t\bar{t}$ production and from misidentified $\tau$ leptons. They are reducible and can be easily separated from the production of two $\tau$ jets. The $\tau$ misidentification rates range between $0.15$ and $0.25$ for the one-prong $\tau_{\rm jet}$ and between $0.01$ and $0.04$ for the three-prong $\tau_{\rm jet}$ [53].

Figure 1: Normalized distributions of the reconstructed $\Phi^{\ast}$ at the detector level are shown for background events in red and for signal with three different CP mixing angles. The pure CP-even distribution is represented by a dashed blue line, the pure CP-odd distribution by a dashed green line, and the maximally CP-mixed state by a dashed orange line.

Figure 1 shows the normalized distribution of the reconstructed $\Phi^{\ast}$ for signal events with different values of the CP mixing parameter $\theta_{\tau}$, as well as for background events. The signal distributions follow the analytical form of the $\Phi^{\ast}$ distribution described in [31, 23],

\begin{equation}
\alpha-\beta\cos\left(\Phi^{\ast}+2\theta_{\tau}\right)\,, \tag{6}
\end{equation}

where $\alpha$ corresponds to the total cross section and $\beta$ determines the relative magnitude of the asymmetry. As clearly seen, the maximal mixing distribution with $\theta_{\tau}=45^{\circ}$ is shifted by a phase of $2\theta_{\tau}$ from the pure CP-even distribution. This shift allows for effective discrimination of the CP mixing states at the reconstruction level. Such a clear reconstruction of $\Phi^{\ast}$ after accounting for detector effects is expected, as the impact of the detector on the neutral pion energy resolution and the charged pion transverse momentum resolution does not significantly affect the reconstructed $\Phi^{\ast}$ distribution.

An important effect arises from the granularity of the ECAL in the $\eta-\phi$ plane, which impacts the angular resolution of the neutral pion momentum direction. This resolution is crucial for distinguishing between single photon showers and two photons from $\pi^{0}$ decays. This effect can obscure the differences between the reconstructed $\Phi^{\ast}$ distributions for various CP mixing angles. However, as shown in [47], different granularity values do not significantly alter the reconstructed $\Phi^{\ast}$ distribution. Additionally, the positions of the minima and maxima in distributions with different CP mixing angles remain unchanged.

3.2 Event generation

For event simulation, we implement the effective Lagrangian for Higgs boson production from gluon-gluon fusion using FeynRules [54]. The NLO corrections to Higgs production are implemented as detailed in [55], giving a production cross section of $51$ pb at the HL-LHC energy of $\sqrt{s}=14$ TeV. To incorporate the CP-mixing parameters, the effective coupling of the Higgs boson to a $\tau$ lepton pair described by Equation 1 is implemented into the same model files with $\kappa_{\tau}=1$.

We employ MadGraph5 [56, 57] for cross section estimation and for generating parton-level events. Pythia8.3 [58] is utilized to include parton showering and hadronization effects. To maintain spin correlations in $\tau$ lepton decays within the matrix element, we use the TauDecay module [59], which is a part of the MadGraph package. This module, integrated into the taudecayUFO model files, ensures spin correlation preservation by extending the matrix element to $2\to N$, where $N$ represents the number of final-state pions. The factorization and renormalization scales have been kept at the default MadGraph event-by-event dynamical choice. Jets are formed using the FastJet package [60] with the anti-$k_{T}$ algorithm [61] and $R=0.4$. Detector effects are taken into account with the Delphes package [49] using the default ATLAS card. Three datasets for the different decay modes of the $\tau$ lepton are generated separately and combined with ratios according to the branching fraction of each decay mode.

For the deep learning analysis, we use PyTorch Geometric [62] for building the GNN networks, while standard PyTorch [63] is used for the MLP. Finally, the Scikit-Learn package [64] is used to facilitate network training and evaluation.

4 Deep Learning analysis

In this section, we explore the application of various deep learning techniques, including MLP, GCN, and GTN, to investigate the CP properties of the Higgs boson. Each network is designed to handle specific types of input data according to its structure. The MLP is adept at analyzing high-level kinematic distributions, while the GCN and GTN are suited for analyzing heterogeneous graphs constructed from final state particles.

For the MLP study, the kinematical variables of the final state $\tau_{\rm jet}$ pair are reconstructed, and $\Phi^{\ast}$ is calculated for the pair. The kinematical variables and $\Phi^{\ast}$ are fed into the MLP. The kinematical data are normalized to the range between 0 and 1 to ensure effective processing by the neural network. The network is trained to distinguish the background and signal processes. Another method for training a network to distinguish between different CP states is to use a conditional DNN, as described in [10].
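A normalization of the inputs to the range between 0 and 1 can be sketched as below. The text does not specify the exact procedure, so this min-max scaling, with the minima and maxima taken from the training set only, is an assumption; the function names are hypothetical and events are stored as rows, features as columns.

```python
import numpy as np


def fit_min_max(train_features):
    """Per-feature minimum and maximum, computed on the training set only."""
    train_features = np.asarray(train_features, dtype=float)
    return train_features.min(axis=0), train_features.max(axis=0)


def apply_min_max(features, lower, upper):
    """Map each feature to [0, 1] using the training-set range.

    Constant features (upper == lower) are mapped to 0 to avoid dividing by zero.
    """
    span = np.where(upper > lower, upper - lower, 1.0)
    return (np.asarray(features, dtype=float) - lower) / span


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train = rng.normal(size=(1000, 17))  # placeholder data, not physics events
    lower, upper = fit_min_max(train)
    print(apply_min_max(train, lower, upper).min(), apply_min_max(train, lower, upper).max())
```

Reusing the training-set range for validation and test events avoids leaking information from those samples, at the cost that a few of their values may fall slightly outside $[0,1]$.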

GNNs, on the other hand, analyze graph-like structures. In the standard approach, the nodes of the graph represent the final state particles, and all nodes are fully connected. The graph nodes are weighted with the four-momenta of the final state particles and the edges are weighted with the angular distance between each node pair. With this approach it is not easy to incorporate the CP properties of the Higgs boson into a fully connected graph. Instead, we utilize a heterogeneous graph in this paper. A heterogeneous graph comprises multiple types of nodes and edges, each representing different entities and interactions within the experimental setup. Nodes in these graphs can still represent final state particles, but also the reconstructed $\tau_{\rm jet}$ and Higgs boson. Each node type has distinct attributes and properties that define its role within the graph. Edges represent the interactions or relationships between these entities. This method offers enhanced flexibility in representing the physics we focus on.

To efficiently incorporate CP information into a graph-like structure, we consider a fully connected graph of the final state pions with additional heterogeneous nodes representing the reconstructed $\tau_{\rm jet}$ and the Higgs boson. The edges of the fully connected pion graph are weighted with the angular distance between each node pair, while the edge between the two $\tau_{\rm jet}$ nodes is weighted with the value of the reconstructed $\Phi^{\ast}$. By selectively connecting $\tau_{\rm jet}$ nodes to their decay pions, we construct a graph that integrates both the kinematic information of the signal and the CP properties of different Higgs boson states.

4.1 Multi-Layer Perceptron

An MLP is a type of feed-forward neural network consisting of an input layer, one or more hidden layers, and an output layer. The input layer size corresponds to the number of kinematic variables being considered. Hidden layers, which contain a certain number of neurons, are where the model learns to capture the complex relationships in the data. The number of hidden layers and neurons in each layer are hyper-parameters that need to be optimized.

Figure 2: Kinematic distributions before applying selection cuts, used to train the MLP network for a benchmark point with $\theta_{\tau}=90^{\circ}$.

The input of the MLP consists of 17 variables that encompass the kinematic and CP properties of the Higgs boson. Their distributions are shown in Figure 2 for a benchmark point with $\theta_{\tau}=90^{\circ}$. In addition to the transverse momentum $p_{T}$, pseudorapidity $\eta$, azimuthal angle $\phi$ and invariant mass of the leading and second-leading $\tau$ jets, the input distributions include the following:

  • $\not{E}_{T}$: Missing transverse energy, defined as ${\not{E}_{T}}=|-\sum_{v_{i}}\vec{p}_{T}(v_{i})|$, i.e. the magnitude of the negative vector sum of the transverse momenta of the visible particles $v_{i}$.

  • $m_{(\tau_{\rm jet},\tau_{\rm jet})}$: The invariant mass of the $\tau$ jet pair shows a peak around the mass of the SM-like Higgs boson for signal events, while background events peak around the mass of the Z boson. This occurs because the background events are dominated by the $pp\to\gamma^{\ast}/Z\to\tau\tau$ process.

  • $p_{T(\tau_{\rm jet},\tau_{\rm jet})}$: The transverse momentum of the $\tau$ jet pair reflects the slight boost of the $\tau$ jet pair in signal events compared to background events.

  • $E_{(\tau_{\rm jet},\tau_{\rm jet})}$: The energy of the $\tau$ jet pair.

  • $\eta_{(\tau_{\rm jet},\tau_{\rm jet})}$: The pseudorapidity of the $\tau$ jet pair.

  • $\phi_{(\tau_{\rm jet},\tau_{\rm jet})}$: The azimuthal angle of the $\tau$ jet pair.

  • $\Phi^{\ast}$: The acoplanarity angle between the two $\tau$ decay planes.

After reconstructing the kinematic distributions we stack all background events and signal events separately, resulting in data sets with dimensions $d_{\text{distribution}}=(17,N)$, where $N$ is the total number of training events. We use training datasets of equal size, $80000$ events each for signal and background, and keep $20000$ events to evaluate the network performance during training. For the supervised classification problem, we assign a numeric label of $Y=1$ to the signal events and $Y=0$ to the background events.

To find an MLP structure that can effectively analyze the input data, we scan over the MLP hyper-parameters such as the number of hidden layers, the number of neurons in each layer and the initial value of the learning rate. The MLP we use consists of one input layer with a dimension equal to the input dataset dimension, followed by three hidden layers with Rectified Linear Unit (ReLU) activation, with $256$, $128$ and $64$ neurons, respectively. A dropout layer is inserted after each hidden layer with a dropout rate of $10\%$ of the total number of neurons of that hidden layer. A final output layer is inserted with two neurons and softmax activation, so that the output probabilities sum to one. Once the training process is complete, the MLP is evaluated on an independent testing set with $50000$ events each of signal and background, providing an unbiased evaluation of the model’s performance.
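The layer structure described above can be illustrated with a framework-independent forward pass. This NumPy sketch is not the PyTorch implementation used in the analysis and contains no training loop; the weight initialization and the inverted-dropout scaling are assumptions, and the function names are hypothetical.

```python
import numpy as np

LAYER_SIZES = [17, 256, 128, 64, 2]  # input, three hidden layers, two-class output


def init_params(rng, sizes=LAYER_SIZES):
    """Random weights and zero biases; the He-style scale is an assumption."""
    return [
        (rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def forward(x, params, dropout_rate=0.0, rng=None):
    """ReLU hidden layers, optional dropout after each, softmax output."""
    h = np.asarray(x, dtype=float)
    for weights, bias in params[:-1]:
        h = np.maximum(h @ weights + bias, 0.0)
        if dropout_rate > 0.0 and rng is not None:
            # Training-time dropout with inverted scaling (assumed convention).
            keep = rng.random(h.shape) >= dropout_rate
            h = h * keep / (1.0 - dropout_rate)
    weights, bias = params[-1]
    logits = h @ weights + bias
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(rng)
    probabilities = forward(rng.random((4, 17)), params)  # inference: no dropout
    print(probabilities.shape, probabilities.sum(axis=1))
```

With these layer sizes the network has 45890 trainable parameters, which is small compared with the 160000 training events quoted above.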

4.2 Graph Neural Networks

Although the MLP offers a straightforward way to analyze the CP states of the Higgs boson, it suffers from low identification performance. This is because each neuron in the MLP is fully connected to all neurons in the adjacent layers, which dilutes the learned CP state patterns by fully connecting the $\Phi^{\ast}$ distribution to the kinematic distributions. GNNs overcome this issue by analyzing heterogeneous graphs, which incorporate nodes containing different types of information and selectively connected edges. This approach is well-suited for encoding kinematic and CP information.

4.2.1 Heterogeneous Graph construction

As mentioned previously, we utilize a heterogeneous graph, rather than a traditional homogeneous fully connected graph, to capture the comprehensive event characteristics from the final state particles. The primary difference between heterogeneous and homogeneous graphs is that heterogeneous graphs can represent multiple types of nodes and edges, each with different properties and relationships, whereas homogeneous graphs consist of a single type of nodes and edges. The heterogeneous graphs can accurately model complex systems with diverse entities and interactions, such as the decay topology of Higgs boson events.

Figure 3: The constructed heterogeneous graph. Four node types are considered, shown in different colours. Pion nodes are fully connected, while other nodes are connected selectively. The edge connecting the $\tau$ jets (blue edge) is weighted with the value of the reconstructed $\Phi^{\ast}$.

In this study, we construct a heterogeneous graph from the final state particles and the reconstructed decayed particles $\tau_{\rm jet}$ and $h$. We define four node types, each representing a different type of particle and weighted with different information to encode the physical properties of the event. The first node type comprises the charged and neutral pions. We include six pion nodes for each event, representing the decay products of the $\tau$ leptons, up to the 3-prong decay of the $\tau$. For events with a lower number of pions, we pad the remaining nodes with zeros. The features of the pion nodes include pseudorapidity $\eta$, azimuthal angle $\phi$, angular distance from the $\tau$ jet axis $\theta_{j}$, transverse momentum $k_{T}$, the logarithm of the transverse momentum ratio of each $\tau$ constituent to the $\tau$ jet $z\equiv\log(P_{T}/P_{J})$, and the pion energy $E$. The second node type represents the $\tau$ jets, with each event including two nodes. Their features are transverse momentum, pseudorapidity, azimuthal angle, the invariant mass and energy of the reconstructed leading $\tau$ jet. To encode the full event information, we introduce a third node type representing the missing transverse energy, with one node per event. This node has a single feature: the value of the missing energy in the event. The fourth and final node type represents the Higgs boson, with one node per event. The features of the Higgs node include pseudorapidity, azimuthal angle, transverse momentum, the invariant mass of the system of the two $\tau$ jets, and their energy. Table 1 summarizes the node and edge features used for the heterogeneous graph construction.

Node name | Features
$\pi^{\pm,0}$ | $\eta_{\pi}$, $\phi_{\pi}$, $P_{T_{\pi}}$, $\theta_{(\pi,\tau_{\rm jet})}$, $E_{\pi}$, $\log(P_{T_{\pi}}/P_{T_{\tau_{\rm jet}}})$
$\tau^{\pm}$ | $\eta_{\tau_{\rm jet}}$, $\phi_{\tau_{\rm jet}}$, $P_{T_{\tau_{\rm jet}}}$, $m_{\tau_{\rm jet}}$, $E_{\tau_{\rm jet}}$
$\not{E}_{T}$ | $\not{E}_{T}$
$h$ | $\eta_{h}$, $\phi_{h}$, $P_{T_{h}}$, $m_{h}$, where $P_{h}\equiv P_{\tau_{j_{1}}}+P_{\tau_{j_{2}}}$

Edge name | Features
$\pi_{i}$ - $\pi_{j}$ | $\Delta R_{ij}$
$\pi_{i}$ - $\tau_{j}$ | $\log(P_{T_{i}}/P_{T_{\tau_{j}}})$, $\theta_{j}$
$\not{E}_{T}$ - $\tau$ | $\mathbb{I}$
$h$ - $\tau$ | $\log(P_{T_{\tau}}/P_{T_{h}})$
$\tau$ - $\tau$ | $\Phi^{\ast}$

Table 1: Node and edge features of the heterogeneous graphs used.

The graph is constructed to reflect the decay topology of the Higgs boson events, as shown in Figure 3, ensuring that relevant kinematic and CP information is integrated into the graph design. Accordingly, each of the three pion nodes is connected to the corresponding tau node, representing the decay products of the $\tau$ lepton. Both tau nodes are connected to the Higgs node, representing the Higgs boson decaying into two $\tau$ leptons. Additionally, the two tau nodes are connected by an edge weighted by the value of the reconstructed $\Phi^{\ast}$, capturing the relative orientation and interaction between the two $\tau$ leptons.

This graph structure ensures that the connections between nodes represent the physical interactions occurring in the event, embedding essential physics information directly into the graph. This design enhances the model’s ability to learn and infer the decay kinematics and the CP properties of the Higgs boson by leveraging the intrinsic event topology. Node types can be summarized as follows:

  • Pion node: Six pion nodes are fully connected to each other and their edges are weighted with the angular distance between each pair, $\Delta R$. Although the charged pion has the same charge as the parent $\tau$ lepton and plays a different role in the $\Phi^{\ast}$ reconstruction, we require the same type of information for both charged and neutral pions in this framework. This is because the information from both charged and neutral pions is used to reconstruct $\Phi^{\ast}$ before constructing the graph.

  • Tau node: The $\tau$ jet pair is connected with an edge weighted by the value of the reconstructed $\Phi^{\ast}$. Moreover, each $\tau$ jet is connected to its constituent pion nodes, with edges weighted by the ratio of the transverse momenta and the angular distance between the corresponding pion and the $\tau$ jet.

  • Missing energy node: The missing energy node is weighted with the transverse momentum of the visible particles as $\not{E}_{T}=-\left|\sum_{\upsilon_{i}}\vec{p}_{T_{\upsilon_{i}}}\right|$ and connected to each tau node with an edge weighted by a unit vector. Connecting the missing energy node to the tau nodes allows the network to fully recover all information needed to reconstruct the $\tau$ leptons.

  • Higgs node: The Higgs node is weighted by the kinematic properties of the reconstructed four-momentum of the $\tau$ jet pair and connected only to the $\tau$ jet nodes, with edges weighted by the ratio of the transverse momenta of the Higgs boson and the corresponding reconstructed $\tau$ jet. We do not connect the Higgs node to the missing energy node because the $\tau$ jet nodes are already connected to the missing energy node and no information would be gained by adding this edge.
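The node and edge layout summarized in the list above can be written down as a small data structure. The sketch below is a plain-Python illustration, not the authors' implementation: the function names, the dictionary layout and all numerical values are hypothetical, and only the $\Phi^{\ast}$ edge weight is stored, with the other edge features of Table 1 left out for brevity.

```python
from itertools import combinations

PIONS_PER_TAU = 3     # up to 3 pions per tau jet, so 6 pion slots per event
N_PION_FEATURES = 6

def pad(pions):
    """Zero-pad one tau jet's pion list to PIONS_PER_TAU entries."""
    missing = PIONS_PER_TAU - len(pions)
    return [list(p) for p in pions] + [[0.0] * N_PION_FEATURES for _ in range(missing)]

def build_graph(pions_tau0, pions_tau1, taus, met, higgs, phi_star):
    """Return node features and edge lists, keyed by node type and edge type."""
    nodes = {
        "pion": pad(pions_tau0) + pad(pions_tau1),
        "tau": [list(t) for t in taus],
        "met": [[met]],
        "higgs": [list(higgs)],
    }
    n_pions = 2 * PIONS_PER_TAU
    edges = {
        # pion nodes are fully connected to each other
        ("pion", "pion"): list(combinations(range(n_pions), 2)),
        # pion slots 0-2 belong to tau jet 0, slots 3-5 to tau jet 1
        ("pion", "tau"): [(i, i // PIONS_PER_TAU) for i in range(n_pions)],
        # the missing-energy node and the Higgs node connect to both tau jets
        ("met", "tau"): [(0, 0), (0, 1)],
        ("higgs", "tau"): [(0, 0), (0, 1)],
        # single edge between the two tau jets
        ("tau", "tau"): [(0, 1)],
    }
    edge_weight = {("tau", "tau"): [phi_star]}  # other edge features omitted here
    return nodes, edges, edge_weight

# Made-up numbers for an event with two pions per tau jet.
pion = [0.1, 0.2, 0.05, 20.0, -0.7, 25.0]
nodes, edges, edge_weight = build_graph(
    [pion, pion], [pion, pion],
    taus=[[40.0, 0.1, 0.2, 0.8, 45.0], [35.0, -0.3, 2.9, 0.7, 38.0]],
    met=18.0, higgs=[0.0, 1.5, 30.0, 95.0], phi_star=1.2,
)
```

Keeping one feature list per node type avoids forcing nodes with different numbers of features into a single tensor, which is the point of the heterogeneous representation.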

Once the heterogeneous graphs are constructed, we stack all background and signal events separately and assign the labels $Y=0$ and $Y=1$ to the background and signal events, respectively. During the training process, the model tries to minimize the difference between its predictions and the assigned labels using the cross-entropy loss $-\sum Y(x)\log(\hat{Y}(x))$, where $Y$ and $\hat{Y}$ are the true and predicted labels for each class.
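As a small worked illustration of the loss above, the following sketch evaluates the cross-entropy for a single signal event with a one-hot label. The function name and the clipping constant `eps` are choices made for this example only.

```python
import math

def cross_entropy(y_true, y_pred, eps=1e-12):
    """-sum_c Y_c * log(Yhat_c) for one event with a one-hot label y_true."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

signal = [0.0, 1.0]                                   # one-hot encoding of Y = 1
loss_confident = cross_entropy(signal, [0.1, 0.9])    # equals -log(0.9)
loss_undecided = cross_entropy(signal, [0.5, 0.5])    # equals log(2)
```

A confident correct prediction gives a smaller loss than an undecided one, which is what drives the training towards the assigned labels.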

4.2.2 GNN training on heterogeneous graphs

Heterogeneous graphs contain various types of information linked to their nodes and edges, making it impossible for a single feature tensor to represent all node or edge features across the entire graph, due to variations in type and dimensionality. Instead, distinct types must be defined for both nodes and edges, each associated with its own data tensors, as we have done in Sec. 4.2.1. The message-passing framework is modified so that the message computation and update functions are node- and edge-specific. Accordingly, training GNNs on a heterogeneous graph differs from homogeneous GNN training. For the training process we follow the methodology introduced in [65], which is detailed as follows:

Figure 4: A schematic representation of message passing between $\tau^{\pm}_{\rm jet}$ and $h$. This diagram illustrates message passing between two heterogeneous nodes for illustrative purposes only; the actual network includes all the nodes shown in Figure 3.
  • Message Passing and Node Update: Message passing is performed separately for each edge and node type using edge-specific convolution operations. This separation is necessary because nodes and edges containing different information cannot be processed by the same function. Consequently, each edge connection requires an additional GNN layer to adjust the output dimensions before passing information to other nodes in the graph. For example, messages passed from a $\tau^{\pm}_{\rm jet}$ node to an $h$ node involve three edge connections: one from $\tau^{\pm}_{\rm jet}$ to $h$, one from $h$ to $\tau^{\pm}_{\rm jet}$, and one from $\tau^{\pm}_{\rm jet}$ to itself (as the two $\tau^{\pm}_{\rm jet}$ nodes are interconnected). Each of these three edges necessitates its own GNN layer plus an additional layer, as illustrated in Figure 4. The additional layers are incorporated to have the same output dimensions for different inputs across all nodes. In this case, the output of each GNN layer is a vector of fixed size 32.

  • Aggregation and Pooling: Once the node embeddings for all types are updated, a global graph-level representation is needed for graph classification tasks. Because all outputs have the same size, the outputs from the GNN layers are summed to a single vector for pooling, and the ReLU activation function is applied. One hidden layer comprises GNN layers, activation functions, batch normalization, and dropouts. These can be repeated multiple times to capture the complex structure of the input data.

  • Graph-Level Classification: Graph classification typically involves fully connected layers followed by a softmax activation function that outputs two probability values indicating the class, either signal-like or background-like. It is important to note that Figure 4 serves solely as an illustration of how the $\tau^{\pm}_{\rm jet}$ and $h$ nodes are updated, whereas the actual training involves message passing for all node connections, as depicted in Figure 3.
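The edge-specific update described in the steps above can be illustrated with a minimal sketch. It assumes the simplest possible message function, one linear map per edge type, with random untrained weights; the relation names and feature sizes are hypothetical, and the real layers (GCN or GTN) are more elaborate.

```python
import random

OUT_DIM = 32  # common output size quoted in the text

def make_linear(n_in, rng):
    """Random, untrained weights standing in for one edge-specific layer."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(OUT_DIM)]

def apply_linear(W, x):
    """Matrix-vector product mapping a source feature vector to OUT_DIM."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def update_node(incoming, layers):
    """incoming: {relation: [source feature vectors]}; layers: {relation: weights}."""
    total = [0.0] * OUT_DIM
    for relation, sources in incoming.items():
        for x in sources:
            message = apply_linear(layers[relation], x)   # relation-specific map
            total = [t + m for t, m in zip(total, message)]
    return [max(0.0, t) for t in total]                   # ReLU after summation

rng = random.Random(1)
# Hypothetical feature sizes: 4 for the Higgs node, 5 for a tau jet node.
layers = {"higgs->tau": make_linear(4, rng), "tau->tau": make_linear(5, rng)}
tau_embedding = update_node(
    {"higgs->tau": [[0.0, 1.5, 30.0, 95.0]],
     "tau->tau": [[35.0, -0.3, 2.9, 0.7, 38.0]]},
    layers,
)
```

Because every relation maps its source features to the same size, messages from nodes with different numbers of features can be summed directly.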

This training structure allows for capturing the graph’s heterogeneous nature, ensuring that information from different node and edge types is effectively utilized in the classification task.

4.2.3 Graph Convolution Network

GCNs have gathered significant attention in recent years for their ability to learn representations of graph-structured data [66, 67, 68]. The primary goal of a GCN is to learn a function that maps input features to new representations capturing the relationships among the graph nodes [66, 67, 68]. The core concept behind GCN is the generalization of the convolution operation from regular grids to irregular graphs. A graph convolution operation can be seen as a local averaging of features from neighbouring vertices, capturing both the local structure of the graph and the features associated with each node. Given an input graph $G=(V,E)$, the graph convolution operation is defined as

$$H^{(l+1)}=\sigma\left(\hat{D}^{-\frac{1}{2}}\hat{A}\hat{D}^{-\frac{1}{2}}H^{(l)}W^{(l)}\right),$$

where $H^{(l)}\in\mathbb{R}^{N\times F_{l}}$ is the feature matrix at layer $l$, with $N$ being the number of vertices in the graph and $F_{l}$ the dimension of the feature space at layer $l$. $W^{(l)}\in\mathbb{R}^{F_{l}\times F_{l+1}}$ is the learnable weight matrix at layer $l$. Furthermore, $\sigma$ denotes the activation function. The matrix $\hat{A}\in\mathbb{R}^{N\times N}$ is the adjacency matrix of the input graph with added self-connections, defined as $\hat{A}=A+I_{N}$, where $A$ is the adjacency matrix of $G$ and $I_{N}$ is the identity matrix of size $N$. The matrix $\hat{D}\in\mathbb{R}^{N\times N}$ is a diagonal matrix with $\hat{D}_{ii}=\sum_{j}\hat{A}_{ij}$ representing the degree of vertex $i$ in the graph with added self-connections. The graph convolution operation can be interpreted as a message-passing mechanism, where each vertex aggregates information from its neighbours and updates its features according to the learned weights. This process is repeated over several layers, allowing the model to capture higher-order relationships between vertices in the graph. We found that the optimal architecture consists of three hidden layers. Each hidden layer is followed by a rectified linear unit (ReLU) activation function. The model employs sum aggregation to consolidate information from all nodes in the graph, followed by a final classification layer. The model training process was managed by a learning rate scheduler with an initial learning rate of 0.001, a step size of 5 epochs and a decay factor of 0.8. This configuration was empirically determined to yield the best performance in our evaluations.
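To make the propagation rule concrete, the sketch below applies one graph-convolution step to a toy graph with two connected vertices. It is a minimal illustration in plain Python with ReLU chosen as $\sigma$; the graph, features and weight are made up.

```python
import math

def gcn_layer(A, H, W):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(A)
    # adjacency matrix with added self-connections
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]
    # symmetric degree normalization
    norm = [[A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # aggregate neighbour features, then apply the weight matrix
    agg = [[sum(norm[i][k] * H[k][f] for k in range(n)) for f in range(len(H[0]))]
           for i in range(n)]
    out = [[sum(agg[i][f] * W[f][g] for f in range(len(W))) for g in range(len(W[0]))]
           for i in range(n)]
    return [[max(0.0, v) for v in row] for row in out]

A = [[0.0, 1.0], [1.0, 0.0]]   # two vertices joined by one edge
H = [[1.0], [3.0]]             # one feature per vertex
W = [[1.0]]                    # trivial 1x1 weight matrix
H_next = gcn_layer(A, H, W)    # both vertices end up with the average, 2.0
```

With self-connections each vertex has degree 2, so every entry of the normalized adjacency is 0.5 and both vertices receive the mean of the two input features.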

Although GCN has demonstrated considerable success, it also has several key limitations. One major drawback of the traditional GCN model is its inability to perform inductive learning tasks due to its dependence on the graph’s specific adjacency matrix. The GCN model is also constrained by its rigid neighbourhood aggregation method. Moreover, GCN uniformly weights all neighbouring nodes during feature aggregation, which can be ineffective if some neighbours carry more significant information than others, which is often the case for our heterogeneous graph structure. These challenges have inspired the creation of numerous GCN variants, such as GTN [69].

4.2.4 Graph Transformer Network

GTN is an advanced approach to handling graph-structured data, leveraging the power of transformer architectures to process information. Transformers were originally developed for natural language processing and were introduced in LHC analyses in [70, 71, 72, 73, 74, 75, 76]. GTN combines the strengths of traditional GNNs with transformers to provide a more powerful and flexible framework for analyzing graphs. The attention mechanism of the transformer enables the network to selectively focus on different nodes of the input graph, allowing for the modelling of complex relationships and dependencies. Therefore, GTN is suitable for extracting diverse information from nodes with different structures, e.g. pion and $\tau$ jet nodes, enabling the effective extraction of kinematic and CP information about the signal events.

GTN first embeds the graph by passing the node features $\alpha_{i}$, for node $i$, and the edge features $\beta_{ij}$, for each edge between the nodes $i$ and $j$, through a linear projection layer as

$$h_{i}=A\,\alpha_{i}+a_{0}\,,\qquad e_{ij}=B\,\beta_{ij}+b_{0}\,, \tag{7}$$

where $A$, $a_{0}$ and $B$, $b_{0}$ are the trainable matrices and biases of the linear projection layer for the node and edge features, respectively.

After the graph embedding, the self-attention mechanism is applied. This mechanism allows each node to attend to all other nodes in the graph. The attention mechanism calculates attention scores based on the similarities between the feature vectors of nodes. These scores determine the influence of neighbouring nodes on the target node. The attention mechanism works by defining a key $K$ and a query $Q$, and calculating an attention matrix $\alpha$ as follows:

$$\alpha_{ij}=\frac{Q\cdot K^{T}}{\sqrt{d_{k}}}\,, \tag{8}$$

where $Q=W_{Q}\cdot h_{i}$, $K=W_{K}\cdot h_{j}$ with $W_{Q}$ and $W_{K}$ learned weight matrices, and $d_{k}$ is the dimension of the key vectors. For edge-wise attention, the mechanism works as

$$A_{ij}=\frac{\exp(\alpha_{ij}\cdot E^{k}_{e})}{\sum_{k\in\mathcal{N}(i)}\exp(\alpha_{ij}\cdot E^{k}_{e})}\,, \tag{9}$$

where $\mathcal{N}(i)$ denotes the neighbours of node $i$, $E^{k}_{e}=e_{ij}\cdot W_{E}$ is a learnable weight vector for edge type embeddings, and $j$ stands for the other vertex to which edge $k$ is attached.

Figure 5: Schematic representation of a GTN layer. See the text for a detailed description of the GTN.

The attention scores are used to aggregate information from neighbouring nodes. Each node updates its feature vector by taking a weighted sum of its neighbours’ features based on the weights derived from the attention scores. This process is analogous to how traditional GNNs aggregate information from neighbouring nodes, but with the added flexibility of attention weights. The node update has the form

$$h_{i}^{(l+1)}=\sigma\left(\sum_{j\in\mathcal{N}(i)}m_{ij}+h_{i}^{(l)}\right)\,, \tag{10}$$

where $\sigma$ is an activation function and $l$ denotes the layer number. The message passing, $m_{ij}$, is extended to include the attention effect as

$$m_{ij}=A_{ij}(W_{V}h_{j}+e_{ij})\,, \tag{11}$$

where $W_{V}$ is a learnable weight matrix. The edge update has the same form as the node update. Note that $W_{V}h_{j}+e_{ij}$ works as the value of the attention mechanism. To capture different types of relationships and interactions, GTNs employ multi-head attention. Multiple attention mechanisms run in parallel, each focusing on different aspects of the node features and interactions. The results are then concatenated and linearly transformed to produce the final node embeddings. The final node embeddings are processed through additional layers, a feed-forward neural network, with residual connections to produce the desired output in the form

$$h_{i}^{(l+1)}=\mathcal{F}\left(h_{i}^{(l)}\|h_{j}^{(l)}\|e_{ij}^{(l)}\right) \tag{12}$$

where $\mathcal{F}$ is a feed-forward neural network and $\|$ denotes concatenation over all parallel attention heads. A schematic representation of a GTN layer is depicted in Figure 5.
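Equations 8 to 11 can be followed step by step in a single-head sketch. Several choices below are assumptions made for illustration: $E_{e}$ is treated as the scalar product of the edge embedding with a weight vector `w_E`, $\sigma$ is taken to be ReLU, the embedded node and edge vectors share one dimension so that the residual sum is defined, and all weights are hand-picked identity matrices rather than trained parameters. The feed-forward step of equation 12 and multi-head concatenation are not included.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, x):
    return [dot(row, x) for row in W]

def attention_update(h_i, neighbours, W_Q, W_K, W_V, w_E):
    """Single-head update of node i.

    neighbours: list of (h_j, e_ij) pairs, e_ij having the same length as W_V h_j.
    Returns the attention weights and the updated node vector.
    """
    q = matvec(W_Q, h_i)
    d_k = len(q)
    logits = []
    for h_j, e_ij in neighbours:
        alpha = dot(q, matvec(W_K, h_j)) / math.sqrt(d_k)   # eq. 8
        logits.append(alpha * dot(e_ij, w_E))               # exponent of eq. 9
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    norm = sum(exps)
    weights = [e / norm for e in exps]                      # eq. 9

    total = list(h_i)                                       # residual term of eq. 10
    for a, (h_j, e_ij) in zip(weights, neighbours):
        value = [v + e for v, e in zip(matvec(W_V, h_j), e_ij)]
        total = [t + a * v for t, v in zip(total, value)]   # eq. 11
    return weights, [max(0.0, t) for t in total]            # sigma = ReLU (assumed)

identity = [[1.0, 0.0], [0.0, 1.0]]
attn, h_new = attention_update(
    h_i=[1.0, 0.0],
    neighbours=[([1.0, 0.0], [0.5, 0.5]), ([0.0, 1.0], [0.5, 0.5])],
    W_Q=identity, W_K=identity, W_V=identity, w_E=[1.0, 1.0],
)
```

In this toy case the first neighbour is aligned with the query and therefore receives the larger attention weight, in contrast to the uniform weighting of a GCN.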

The optimized structure of the GTN used is determined through empirical evaluation and comprises four GTN layers for each type of graph node, with an additional three layers to adjust the dimensions of the different nodes and edges in the graph. All GTN layers comprise eight attention heads. The output of these layers is fixed to a vector of length 32 and passed through the ReLU activation function to incorporate non-linearity. To enhance training stability, batch normalization is applied after each ReLU activation. Dropout is incorporated to mitigate over-fitting by randomly deactivating $10\%$ of neurons during the training process. This hidden block (GTN layers, ReLU activation, batch normalization and dropout) is repeated three times. We use the sum aggregation function to integrate information from all graph nodes, leading to a final fully connected classification layer with two output neurons. Similar to the GCN case, the training was conducted with an initial learning rate of 0.001, managed by a scheduler with a step size of 5 epochs and a decay factor of 0.8.
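The learning rate schedule quoted above can be written as a one-line rule. The sketch assumes the usual meaning of a step scheduler, namely that the rate is multiplied by the decay factor once every `step_size` epochs; the function name is chosen for this example.

```python
def step_lr(epoch, initial_lr=0.001, step_size=5, decay=0.8):
    """Learning rate in use at a given epoch (counted from 0) under step decay."""
    return initial_lr * decay ** (epoch // step_size)

# Epochs 0-4 use 0.001, epochs 5-9 use 0.0008, epochs 10-14 use 0.00064.
schedule = [step_lr(e) for e in (0, 4, 5, 10)]
```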

5 Results

The discrimination power of each network is measured by the background rejection for a given signal efficiency. The discrimination power is intertwined with various kinematic distributions, including CP information. We utilize three different neural networks, MLP, GCN, and GTN, each trained on eleven benchmark points with $\theta_{\tau}$ ranging from $-90^{\circ}$ to $90^{\circ}$. Each network was trained and tested on each benchmark point individually. We use the area under the curve (AUC) of the receiver operating characteristic (ROC) curve to assess the networks’ performance. The ROC curve shows the True Positive Rate (TPR) as a function of the False Positive Rate (FPR), and the AUC is the area under this curve, bounded by ${\rm FPR}=1$ and ${\rm TPR}>0$. Figure 6 shows the AUC values for all eleven signal points for MLP (green), GCN (orange), and GTN (blue). GTN demonstrates superior performance with an AUC of approximately $88\%$ for all points, while GCN and MLP achieve AUCs of approximately $86\%$ and $84\%$, respectively. Note that the AUC does not depend on $\theta_{\tau}$, so the use of DL does not introduce additional bias to the analysis.
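For readers who want to reproduce the metric, the AUC can be computed without drawing the ROC curve, using the equivalent statement that it is the probability for a randomly chosen signal event to score higher than a randomly chosen background event. The sketch below uses made-up scores and a simple pairwise loop, which is fine for illustration but slow for large samples.

```python
def roc_auc(signal_scores, background_scores):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    wins = 0.0
    for s in signal_scores:
        for b in background_scores:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(signal_scores) * len(background_scores))

# Three of the four signal-background pairs are ordered correctly.
auc = roc_auc([0.9, 0.8], [0.1, 0.85])
```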

Figure 6: Area Under the ROC Curve (AUC) values for the three networks, represented by the filled bullets, for eleven CP mixing angles $\theta_{\tau}$ ranging from $-90^{\circ}$ to $90^{\circ}$. Each network was trained and tested using the signal samples of eleven individual $\theta_{\tau}$ values.

We also list the number of misclassified events for $y>y_{\rm cut}$, where the signal efficiency is 80%, in Table 2 for a benchmark point with $\theta_{\tau}=90^{\circ}$ for an integrated luminosity of 100 fb$^{-1}$ at $\sqrt{s}=14$ TeV (footnote 1: we take the integrated luminosity close to that of the current LHC analysis for comparison). The acceptance of the background events is 1.7% for MLP, 1.0% for GCN and 0.7% for GTN. We also compute the signal significance, following [77, 78]

$$\sigma_{\rm sys}=\sqrt{2\cdot\left[(S+B)\cdot\ln\left(\frac{(S+B)(B+\delta^{2}_{B})}{B^{2}+(S+B)\delta^{2}_{B}}\right)-\frac{B^{2}}{\delta^{2}_{B}}\cdot\ln\left(1+\frac{\delta^{2}_{B}S}{B(B+\delta^{2}_{B})}\right)\right]}\,, \tag{13}$$

where $S$ and $B$ are the numbers of signal and background events, and $\delta_{B}$ represents the systematic uncertainty of the SM background events and is set to $20\%$ [36]. These results demonstrate that the $h\to\tau\tau$ process can be effectively identified at the HL-LHC by using the proposed networks.
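Equation 13 is straightforward to evaluate numerically. The sketch below takes the background uncertainty `delta_b` as an absolute number of events, which is an assumption of this example: the text quotes a 20% systematic uncertainty without spelling out how it enters $\delta_{B}$. The event counts used are made up. The second function is the limit of equation 13 for vanishing uncertainty and serves as a cross-check.

```python
import math

def significance(s, b, delta_b):
    """Evaluate equation 13 for s signal events, b background events and an
    absolute background uncertainty delta_b (in events, must be > 0)."""
    d2 = delta_b ** 2
    first = (s + b) * math.log((s + b) * (b + d2) / (b * b + (s + b) * d2))
    second = (b * b / d2) * math.log1p(d2 * s / (b * (b + d2)))
    return math.sqrt(2.0 * (first - second))

def significance_stat_only(s, b):
    """Limit of equation 13 for delta_b -> 0."""
    return math.sqrt(2.0 * ((s + b) * math.log1p(s / b) - s))

z_small_unc = significance(10.0, 100.0, 1e-3)   # close to the statistics-only value
z_large_unc = significance(10.0, 100.0, 20.0)   # a larger uncertainty lowers it
```

Using `math.log1p` keeps the second term numerically stable when the uncertainty is small.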

 | Selection cuts | MLP (TPR $>0.8$) | GCN (TPR $>0.8$) | GTN (TPR $>0.8$)
Background events | 872554 | 14982 | 8901 | 6169
Signal events | 1102 | 703 | 705 | 708
Signal significance | $2.9\sigma$ | $5.6\sigma$ | $7.2\sigma$ | $8.6\sigma$

Table 2: Number of signal and background events at the HL-LHC with $\sqrt{s}=14$ TeV and integrated luminosity $\mathcal{L}=100\,{\rm fb}^{-1}$ for a benchmark point with $\theta_{\tau}=90^{\circ}$. The first column displays the number of signal and background events after the selection cuts. The subsequent columns present the number of signal and background events for the DNNs used, with a True Positive Rate exceeding 0.8. The last row presents the signal significance, calculated using equation 13.

To better understand the outputs of different networks, particularly the feature regions each network focuses on to achieve its classification performance, we use Shapley Additive Explanations (SHAP) [79]. SHAP is a method to estimate an importance value for each feature using the output of deep learning models. For a given prediction $f(x)$, the SHAP value for a feature $i$ is calculated as:

$$\phi_{i}=\sum_{S}\frac{|S|!(|N|-|S|-1)!}{|N|!}\left[f(S\cup\{i\})-f(S)\right] \tag{14}$$

where $N$ is the set of all features, $S$ is a subset of $N$ excluding feature $i$, $f(S)$ is the prediction based on the features in subset $S$, and $f(S\cup\{i\})$ is the prediction with feature $i$ added to $S$. The SHAP value $\phi_{i}$ thus represents the average contribution of feature $i$ to the prediction over all possible subsets $S$. This method ensures a fair and consistent allocation of feature importance considering the correlation between the input distributions. Since the networks used have a static structure with a fixed input dataset size, feature $i$ is randomly sampled to eliminate its impact on the network output.
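Equation 14 can be evaluated exactly when the number of features is small. The sketch below enumerates all subsets for a toy additive model with made-up weights and hypothetical feature names; for such a model the Shapley value of each feature equals its own weight. Exact enumeration grows exponentially with the number of features, so practical SHAP tools rely on approximations, and the random sampling of a removed feature described in the text is replaced here by an explicit set function.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, f):
    """Exact Shapley values; f maps a frozenset of feature names to a prediction."""
    n = len(features)
    values = {}
    for i in features:
        others = [x for x in features if x != i]
        phi = 0.0
        for size in range(n):
            # combinatorial weight |S|! (|N| - |S| - 1)! / |N|!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi += weight * (f(s | {i}) - f(s))
        values[i] = phi
    return values

# Toy additive model with made-up weights and hypothetical feature names.
toy_weights = {"phi_star": 0.5, "met": 0.3, "pt_lead": 0.2}

def toy_model(subset):
    return sum(toy_weights[name] for name in subset)

phi = shapley_values(list(toy_weights), toy_model)
```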

Figure 7: Average SHAP values for 10000 test events for a signal with $\theta_{\tau}=90^{\circ}$ for MLP (left) and GTN (right). The GTN plot shows the SHAP values for all the graph nodes and the $\Phi^{\ast}$ edge.

Figure 7 shows the SHAP values for 10000 test events for a signal point with $\theta_{\tau}=90^{\circ}$ for the MLP (left) and GTN (right). SHAP values are computed for the inputs of both networks, including the reconstructed distributions shown in Figure 2 and the graph nodes plus the $\Phi^{\ast}$ edge shown in Figure 3. The results indicate that in both networks, $\Phi^{\ast}$ significantly affects the network output. Interestingly, the MLP focuses primarily on the information from $\not{E}_{T}$, $\Phi^{\ast}$, $p_{T_{(\tau_{h}\tau_{h})}}$ and the transverse momentum of the leading $\tau$ jet, while it is less sensitive to all other input distributions. Conversely, the GTN distributes its attention almost equally across all nodes in the input graph, which may explain the improved performance of the GTN over the MLP.

5.1 Shape analysis

In this subsection, we explore the measurement of the CP state of the Higgs boson at the LHC by analyzing the $\Phi^{\ast}$ distribution for all considered $\theta_{\tau}$ values. We focus on the $\Phi^{\ast}$ distribution after maximizing the performance of the DNNs with a TPR $\geq 0.8$. Figure 8 shows normalized $\Phi^{\ast}$ distributions after applying a TPR cut for the three networks, considering two benchmark points with CP-mixing angles $\theta_{\tau}=0^{\circ}$ (left) and $90^{\circ}$ (right). These distributions are obtained from a test sample of 50000 signal and background events, but still peak at the correct location. Note that because of the high rejection efficiency of the background, the distribution is mostly that of the signal. For example, the contribution of the background is less than 1/80 of the signal for GTN. At $\theta_{\tau}=0^{\circ}$, the ratio of the minimum to the maximum of the distribution is 0.13/0.19 = 0.68 for MLP, while it is 0.10/0.22 = 0.45 for GTN, showing that the signal distribution is reconstructed correctly by using GTN.

We then use these distributions for a binned log-likelihood analysis to test the probability of measuring a non-zero CP mixing angle against the CP-even SM case [47].

Figure 8: Acoplanarity angle distributions of the signal for CP mixing $\theta_{\tau}=0^{\circ}$ (left) and $\theta_{\tau}=90^{\circ}$ (right). Dashed histograms represent $\Phi^{\ast}$ for events with True Positive Rate (TPR $>0.8$) for the three networks when tested on 50000 events for signal and background each. The theoretical prediction is represented by the red solid line.

To do so, we start by constructing a likelihood function, $\mathcal{L}(D|\theta_{\tau})$, which represents the probability of observing data $D$ for a given parameter $\theta_{\tau}$. For hypothesis testing, the negative log-likelihood ratio compares the likelihoods of two hypotheses: the null hypothesis $\mathcal{L}(D|0)$ and the alternative hypothesis $\mathcal{L}(D|\theta_{\tau})$. The null hypothesis represents the probability of observing data consistent with a purely CP-even Higgs combined with background events, while the alternative hypothesis represents the probability of observing data with $\theta_{\tau}\neq 0$ combined with background events. The limit on observing a non-CP-even state is determined by rejecting the null hypothesis at a certain confidence level. The binned negative log-likelihood ratio is defined as [47]

$$-\Delta\ln\mathcal{L}=-\sum_{i}\left[n_{i}\log\left(\frac{n_{i}}{\nu_{i}}\right)+\nu_{i}-n_{i}\right]\,, \tag{15}$$

where the sum runs over the bins of the two hypothesis histograms, $n_{i}$ and $\nu_{i}$. Under the null hypothesis, and for large sample size, the test statistic $-\Delta\ln\mathcal{L}$ approximately follows a chi-squared distribution with degrees of freedom equal to the difference in the number of parameters between the two hypotheses (footnote 2: in our case we consider a $\chi^{2}$ distribution with one degree of freedom, which represents the CP-mixing angle). Using the $\chi^{2}$ distribution, the P-value can be computed as $P(\chi^{2}\geq\Lambda)$, where $\Lambda$ is the test statistic value of $-\Delta\ln\mathcal{L}$. The confidence level of rejecting the null hypothesis and obtaining a limit on the non-zero CP-mixing is $1-P\text{-value}$.
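The chain from binned histograms to a confidence level can be sketched as follows. Two conventions are assumptions of this example: the bin-wise sum of equation 15 is used through its non-negative magnitude as $\Lambda$, and $\Lambda$ is compared directly with a $\chi^{2}$ distribution with one degree of freedom, as described in the text. The assignment of `n` and `nu` to the two hypotheses and the histogram contents are made up.

```python
import math

def binned_statistic(n, nu):
    """Sum over bins of n_i*log(n_i/nu_i) + nu_i - n_i, which is non-negative."""
    total = 0.0
    for n_i, nu_i in zip(n, nu):
        total += nu_i - n_i
        if n_i > 0:
            total += n_i * math.log(n_i / nu_i)
    return total

def confidence_level(stat):
    """1 - P(chi2 >= stat) for a chi-squared distribution with one degree of freedom."""
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return 1.0 - p_value

lam = binned_statistic([10.0, 20.0], [12.0, 18.0])   # two-bin toy histograms
cl = confidence_level(lam)
```

For one degree of freedom the tail probability reduces to a complementary error function, so no external statistics library is needed; a statistic of 1 corresponds to the familiar 68.3% level.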

Figure 9: The binned log-likelihood result as a function of $\theta_{\tau}$ for the three networks, MLP (orange), GCN (green), GTN (red) and expected ATLAS results (blue). Expected ATLAS results are extracted from [36] for an analysis with $\sqrt{s}=13$ TeV and integrated luminosity of $139\,{\rm fb}^{-1}$. Horizontal dashed lines represent the corresponding confidence levels.

Distributions for the null and alternative hypotheses are weighted according to the expected number of events at the HL-LHC with $\mathcal{L}=100\,{\rm fb}^{-1}$ after imposing the cut on the network output probability to maximize the signal-to-background ratio. Figure 9 shows the binned log-likelihood for MLP (orange), GCN (green), GTN (red) and ATLAS results (blue) versus the CP mixing angle. We extracted the ATLAS results from a recent analysis with $\sqrt{s}=13$ TeV for an integrated luminosity of $139\,{\rm fb}^{-1}$ [36]. It excludes the CP mixing angle $|\theta_{\tau}|\geq 28^{\circ}$ at $68\%$ C.L. A similar analysis performed by CMS found an expected value of $|\theta_{\tau}|\geq 21^{\circ}$ [40] at $68.3\%$ C.L. with an integrated luminosity of $138\,{\rm fb}^{-1}$. As seen in Figure 9, MLP excludes $|\theta_{\tau}|\geq 43^{\circ}$ at $95\%$ C.L. GTN shows a superior performance, excluding $|\theta_{\tau}|\geq 22^{\circ}$ at $95\%$ C.L., while GCN excludes $|\theta_{\tau}|\geq 31^{\circ}$ at $95\%$ C.L. MLP excludes the pure CP-odd state at nearly $3\sigma$, while GCN and GTN improve this further. The plots indicate that an improved DNN analysis can enhance both the current LHC and future HL-LHC searches. Note that our results are obtained under significant simplification of the event reconstruction for the number of $\pi^{\pm}$ and $\pi^{0}$; therefore, they cannot be used for a direct comparison with the experimental results. The recent studies by ATLAS [36] and CMS [40] constrain the pure CP-odd Higgs at the $2\sigma$ level, as can be seen in the figure. Our MLP results using high-level variables are comparable to the ATLAS ones, indicating that the simplified analysis does not affect the core message of this paper.

6 Conclusion

In this paper, we investigate the CP structure of the $h\tau^{\pm}\tau^{\mp}$ vertex at the HL-LHC with $\sqrt{s}=14$ TeV. We consider three different channels of hadronic $\tau$ lepton decays for the case where both $\tau$ leptons decay into the same final state: $\tau^{\pm}\to\rho^{\pm}(\rho^{\pm}\to\pi^{\pm}\pi^{0})\nu_{\tau}$, $\tau^{\pm}\to a_{1}^{\pm}(a_{1}^{\pm}\to\pi^{\pm}\pi^{0}\pi^{0})\nu_{\tau}$, and $\tau^{\pm}\to\pi^{\pm}\nu_{\tau}$. The CP structure of the $h\tau^{\pm}\tau^{\mp}$ vertex can be determined from the angular correlation of the $\tau$ spins. This correlation can be reconstructed from the angular distribution between the charged and neutral pions of the $\tau$ lepton pair decays, even though there are tau neutrinos in the final state.

To improve the projected reach of measuring the CP mixing angle at the HL-LHC, we consider utilizing advanced deep-learning networks to enhance the signal-to-background yield. For this purpose, we employ three different networks to analyze different data structures. The MLP is used to analyze the kinematic distribution and reconstruct the CP mixing angle. However, due to the fully connected nature of the MLP, it fully mixes the kinematic and CP information, diluting the learned CP information and hindering the overall classification performance. To overcome this, we consider heterogeneous graphs constructed from the information stored in the final and decayed particles. With a selective connection of the nodes we fix the processing of the kinematic and CP information. For heterogeneous graph analysis, we adopt two networks, GCN and GTN. We find that GTN shows superior performance in background rejection, achieving a signal significance of $8.6\sigma$ at the HL-LHC with $\mathcal{L}=100\,{\rm fb}^{-1}$ for a benchmark point with $\theta_{\tau}=90^{\circ}$. GCN and MLP have lower signal significance for the same benchmark point, with $7.2\sigma$ and $5.6\sigma$, respectively.

After improving the background rejection, we perform a shape analysis for the remaining events. We use a binned negative log-likelihood analysis to estimate the probability of seeing $\theta_{\tau}\neq 0$ at the HL-LHC. Keeping in mind the limitation of the simplified analysis for $\tau$ jet tagging, which uses truth information of $\tau\rightarrow\rho$ and $a_{1}$ decays and $\pi^{0}$ reconstruction, our results show that the pure CP-odd state is excluded at nearly $3\sigma$ using the MLP, while GCN and GTN exclude the pure CP-odd state at above the $3\sigma$ level, showing stronger significance than current LHC measurements [36, 40]. Moreover, GTN shows superior performance, excluding $|\theta_{\tau}|\geq 22^{\circ}$ at $95\%$ C.L., while GCN excludes $|\theta_{\tau}|\geq 31^{\circ}$ at $95\%$ C.L. and MLP excludes $|\theta_{\tau}|\geq 43^{\circ}$ at $95\%$ C.L. GTN’s improved performance over GCN is due to the fact that GTN applies an attention mechanism during training. The main advantage of the attention mechanism is that it assigns weights to different elements in the input graph, emphasizing the more relevant parts while downplaying the less relevant ones. Conversely, GCN treats all neighbouring vertices equally during the feature aggregation process, which can lead to suboptimal performance if certain neighbours provide more valuable information than others, as is the case in the considered heterogeneous graphs. To ensure the reproducibility of our results, we have made our codes and files publicly available at https://github.com/wesmail/HiggsCP.

Acknowledgments

MN and AH are funded by grant numbers 22H05113 (“Foundation of Machine Learning Physics”, Grant-in-Aid for Transformative Research Areas) and 22K03626 (Grant-in-Aid for Scientific Research (C)). WE is funded by the ErUM-WAVE project 05D2022 “ErUM-Wave: Antizipation 3-dimensionaler Wellenfelder”, which is supported by the German Federal Ministry of Education and Research (BMBF). CS is supported by the Office of High Energy Physics of the U.S. Department of Energy under contract DE-AC02-05CH11231 and through the Alexander von Humboldt Foundation. CS performed part of this work at the Aspen Center for Physics, which is supported by a grant from the Simons Foundation (1161654, Troyer).

References

  • [1] Georges Aad et al. Observation of a new particle in the search for the Standard Model Higgs boson with the ATLAS detector at the LHC. Phys. Lett. B, 716:1–29, 2012.
  • [2] Serguei Chatrchyan et al. Observation of a New Boson at a Mass of 125 GeV with the CMS Experiment at the LHC. Phys. Lett. B, 716:30–61, 2012.
  • [3] Georges Aad et al. Measurements of the Higgs boson production and decay rates and constraints on its couplings from a combined ATLAS and CMS analysis of the LHC pp collision data at $\sqrt{s}=7$ and 8 TeV. JHEP, 08:045, 2016.
  • [4] Georges Aad et al. Combined measurements of Higgs boson production and decay using up to 80 fb$^{-1}$ of proton-proton collision data at $\sqrt{s}=13$ TeV collected with the ATLAS experiment. Phys. Rev. D, 101(1):012002, 2020.
  • [5] Joseph R. Dell’Aquila and Charles A. Nelson. Distinguishing a Spin 0 Technipion and an Elementary Higgs Boson: V1 V2 Modes With Decays Into $\bar{\ell}$epton (A) $\ell$(B) And/or $\bar{q}$ (A) $q$(B). Phys. Rev. D, 33:93, 1986.
  • [6] Yanyan Gao, Andrei V. Gritsan, Zijin Guo, Kirill Melnikov, Markus Schulze, and Nhan V. Tran. Spin Determination of Single-Produced Resonances at Hadron Colliders. Phys. Rev. D, 81:075022, 2010.
  • [7] S. Y. Choi, D. J. Miller, M. M. Muhlleitner, and P. M. Zerwas. Identifying the Higgs spin and parity in decays to Z pairs. Phys. Lett. B, 553:61–71, 2003.
  • [8] Matthew R. Buckley and Dorival Goncalves. Boosting the Direct CP Measurement of the Higgs-Top Coupling. Phys. Rev. Lett., 116(9):091801, 2016.
  • [9] Rahool Kumar Barman, Dorival Gonçalves, and Felix Kling. Machine learning the Higgs boson-top quark CP phase. Phys. Rev. D, 105(3):035023, 2022.
  • [10] Waleed Esmail, A. Hammad, Adil Jueid, and Stefano Moretti. Boosting probes of CP violation in the top Yukawa coupling with Deep Learning. 2405.16499, 5 2024.
  • [11] J. H. Christenson, J. W. Cronin, V. L. Fitch, and R. Turlay. Evidence for the $2\pi$ Decay of the $K_{2}^{0}$ Meson. Phys. Rev. Lett., 13:138–140, 1964.
  • [12] A. Alavi-Harati et al. Observation of direct CP violation in $K_{S,L}\to\pi\pi$ decays. Phys. Rev. Lett., 83:22–27, 1999.
  • [13] Kazuo Abe et al. Observation of large CP violation in the neutral $B$ meson system. Phys. Rev. Lett., 87:091802, 2001.
  • [14] Bernard Aubert et al. Observation of CP violation in the $B^{0}$ meson system. Phys. Rev. Lett., 87:091801, 2001.
  • [15] A. D. Sakharov. Violation of CP Invariance, C asymmetry, and baryon asymmetry of the universe. Pisma Zh. Eksp. Teor. Fiz., 5:32–35, 1967.
  • [16] Georges Aad et al. Test of CP Invariance in vector-boson fusion production of the Higgs boson using the Optimal Observable method in the ditau decay channel with the ATLAS detector. Eur. Phys. J. C, 76(12):658, 2016.
  • [17] Morad Aaboud et al. Measurement of the Higgs boson coupling properties in the $H\rightarrow ZZ^{*}\rightarrow 4\ell$ decay channel at $\sqrt{s}=13$ TeV with the ATLAS detector. JHEP, 03:095, 2018.
  • [18] Morad Aaboud et al. Measurements of Higgs boson properties in the diphoton decay channel with 36 fb$^{-1}$ of $pp$ collision data at $\sqrt{s}=13$ TeV with the ATLAS detector. Phys. Rev. D, 98:052005, 2018.
  • [19] Vardan Khachatryan et al. Combined search for anomalous pseudoscalar HVV couplings in VH(H $\to b\bar{b}$) production and H $\to$ VV decay. Phys. Lett. B, 759:672–696, 2016.
  • [20] Albert M Sirunyan et al. Measurements of the Higgs boson width and anomalous $HVV$ couplings from on-shell and off-shell production in the four-lepton final state. Phys. Rev. D, 99(11):112003, 2019.
  • [21] Albert M Sirunyan et al. Constraints on anomalous $HVV$ couplings from the production of Higgs bosons decaying to $\tau$ lepton pairs. Phys. Rev. D, 100(11):112002, 2019.
  • [22] Joseph R. Dell’Aquila and Charles A. Nelson. CP Determination for New Spin Zero Mesons by the $\bar{\tau}\tau$ Decay Mode. Nucl. Phys. B, 320:61–85, 1989.
  • [23] G. R. Bower, T. Pierzchala, Z. Was, and M. Worek. Measuring the Higgs boson’s parity using $\tau\to\rho\nu$. Phys. Lett. B, 543:227–234, 2002.
  • [24] K. Desch, Z. Was, and M. Worek. Measuring the Higgs boson parity at a linear collider using the tau impact parameter and $\tau\to\rho\nu$ decay. Eur. Phys. J. C, 29:491–496, 2003.
  • [25] Stefan Berge, Werner Bernreuther, and Jorg Ziethe. Determining the CP parity of Higgs bosons at the LHC in their tau decay channels. Phys. Rev. Lett., 100:171605, 2008.
  • [26] Stefan Berge and Werner Bernreuther. Determining the CP parity of Higgs bosons at the LHC in the tau to 1-prong decay channels. Phys. Lett. B, 671:470–476, 2009.
  • [27] S. Berge, W. Bernreuther, B. Niepelt, and H. Spiesberger. How to pin down the CP quantum numbers of a Higgs boson in its tau decays at the LHC. Phys. Rev. D, 84:116003, 2011.
  • [28] Stefan Berge, Werner Bernreuther, and Hubert Spiesberger. Higgs CP properties using the $\tau$ decay modes at the ILC. Phys. Lett. B, 727:488–495, 2013.
  • [29] Roni Harnik, Adam Martin, Takemichi Okui, Reinard Primulando, and Felix Yu. Measuring CP Violation in $h\to\tau^{+}\tau^{-}$ at Colliders. Phys. Rev. D, 88(7):076009, 2013.
  • [30] Matthew J. Dolan, Philip Harris, Martin Jankowiak, and Michael Spannowsky. Constraining $CP$-violating Higgs Sectors at the LHC using gluon fusion. Phys. Rev. D, 90:073008, 2014.
  • [31] Stefan Berge, Werner Bernreuther, and Sebastian Kirchner. Prospects of constraining the Higgs boson’s CP nature in the tau decay channel at the LHC. Phys. Rev. D, 92:096012, 2015.
  • [32] Andrew Askew, Prerit Jaiswal, Takemichi Okui, Harrison B. Prosper, and Nobuo Sato. Prospect for measuring the CP phase in the $h\tau\tau$ coupling at the LHC. Phys. Rev. D, 91(7):075014, 2015.
  • [33] Kaoru Hagiwara, Kai Ma, and Shingo Mori. Probing CP violation in $h\to\tau^{-}\tau^{+}$ at the LHC. Phys. Rev. Lett., 118(17):171802, 2017.
  • [34] Stefan Antusch, Oliver Fischer, A. Hammad, and Christiane Scherb. Testing CP Properties of Extra Higgs States at the HL-LHC. JHEP, 03:200, 2021.
  • [35] Georges Aad et al. Evidence for the Higgs-boson Yukawa coupling to tau leptons with the ATLAS detector. JHEP, 04:117, 2015.
  • [36] Georges Aad et al. Measurement of the CP properties of Higgs boson interactions with $\tau$-leptons with the ATLAS detector. Eur. Phys. J. C, 83(7):563, 2023.
  • [37] Serguei Chatrchyan et al. Evidence for the 125 GeV Higgs boson decaying to a pair of $\tau$ leptons. JHEP, 05:104, 2014.
  • [38] Albert M Sirunyan et al. Observation of the Higgs boson decay to a pair of $\tau$ leptons with the CMS detector. Phys. Lett. B, 779:283–316, 2018.
  • [39] Armen Tumasyan et al. Analysis of the $CP$ structure of the Yukawa coupling between the Higgs boson and $\tau$ leptons in proton-proton collisions at $\sqrt{s}=13$ TeV. JHEP, 06:012, 2022.
  • [40] Armen Tumasyan et al. Constraints on anomalous Higgs boson couplings to vector bosons and fermions from the production of Higgs bosons using the $\tau\tau$ final state. Phys. Rev. D, 108(3):032013, 2023.
  • [41] Matthew Feickert and Benjamin Nachman. A Living Review of Machine Learning for Particle Physics. 2 2021.
  • [42] Kayoung Ban, Kyoungchul Kong, Myeonghun Park, and Seong Chan Park. Exploring the Synergy of Kinematics and Dynamics for Collider Physics. 11 2023.
  • [43] Stefan Berge, Werner Bernreuther, and Sebastian Kirchner. Determination of the Higgs CP-mixing angle in the tau decay channels. Nucl. Part. Phys. Proc., 273-275:841–845, 2016.
  • [44] M. Kramer, Johann H. Kuhn, M. L. Stong, and P. M. Zerwas. Prospects of measuring the parity of Higgs particles. Z. Phys. C, 64:21–30, 1994.
  • [45] K. Desch, A. Imhof, Z. Was, and M. Worek. Probing the CP nature of the Higgs boson at linear colliders with tau spin correlations: The Case of mixed scalar - pseudoscalar couplings. Phys. Lett. B, 579:157–164, 2004.
  • [46] Stefan Berge, Werner Bernreuther, and Sebastian Kirchner. Determination of the Higgs CP-mixing angle in the tau decay channels at the LHC including the Drell–Yan background. Eur. Phys. J. C, 74(11):3164, 2014.
  • [47] Tao Han, Satyanarayan Mukhopadhyay, Biswarup Mukhopadhyaya, and Yongcheng Wu. Measuring the CP property of Higgs coupling to tau leptons in the VBF channel at the LHC. JHEP, 05:128, 2017.
  • [48] S. Navas et al. Review of particle physics. Phys. Rev. D, 110:030001, Aug 2024.
  • [49] J. de Favereau, C. Delaere, P. Demin, A. Giammanco, V. Lemaître, A. Mertens, and M. Selvaggi. DELPHES 3, A modular framework for fast simulation of a generic collider experiment. JHEP, 02:057, 2014.
  • [50] Georges Aad et al. Reconstruction of hadronic decay products of tau leptons with the ATLAS experiment. Eur. Phys. J. C, 76(5):295, 2016.
  • [51] Vardan Khachatryan et al. Reconstruction and identification of $\tau$ lepton decays to hadrons and $\nu_{\tau}$ at CMS. JINST, 11(01):P01019, 2016.
  • [52] Armen Tumasyan et al. Identification of hadronic tau lepton decays using a deep neural network. JINST, 17:P07023, 2022.
  • [53] Morad Aaboud et al. Cross-section measurements of the Higgs boson decaying into a pair of $\tau$-leptons in proton-proton collisions at $\sqrt{s}=13$ TeV with the ATLAS detector. Phys. Rev. D, 99:072001, 2019.
  • [54] Adam Alloul, Neil D. Christensen, Céline Degrande, Claude Duhr, and Benjamin Fuks. FeynRules 2.0 - A complete toolbox for tree-level phenomenology. Comput. Phys. Commun., 185:2250–2300, 2014.
  • [55] Alexander Belyaev, Neil D. Christensen, and Alexander Pukhov. CalcHEP 3.4 for collider physics within and beyond the Standard Model. Comput. Phys. Commun., 184:1729–1769, 2013.
  • [56] J. Alwall, R. Frederix, S. Frixione, V. Hirschi, F. Maltoni, O. Mattelaer, H. S. Shao, T. Stelzer, P. Torrielli, and M. Zaro. The automated computation of tree-level and next-to-leading order differential cross sections, and their matching to parton shower simulations. JHEP, 07:079, 2014.
  • [57] R. Frederix, S. Frixione, V. Hirschi, D. Pagani, H. S. Shao, and M. Zaro. The automation of next-to-leading order electroweak calculations. JHEP, 07:185, 2018. [Erratum: JHEP 11, 085 (2021)].
  • [58] Christian Bierlich et al. A comprehensive guide to the physics and usage of PYTHIA 8.3. SciPost Phys. Codeb., 2022:8, 2022.
  • [59] Kaoru Hagiwara, Tong Li, Kentarou Mawatari, and Junya Nakamura. TauDecay: a library to simulate polarized tau decays via FeynRules and MadGraph5. Eur. Phys. J. C, 73:2489, 2013.
  • [60] Matteo Cacciari, Gavin P. Salam, and Gregory Soyez. FastJet User Manual. Eur. Phys. J. C, 72:1896, 2012.
  • [61] Matteo Cacciari, Gavin P. Salam, and Gregory Soyez. The anti-$k_{t}$ jet clustering algorithm. JHEP, 04:063, 2008.
  • [62] Matthias Fey and Jan Eric Lenssen. Fast graph representation learning with pytorch geometric. arXiv preprint arXiv:1903.02428, 2019.
  • [63] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32, 2019.
  • [64] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
  • [65] Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne Van Den Berg, Ivan Titov, and Max Welling. Modeling relational data with graph convolutional networks. In The semantic web: 15th international conference, ESWC 2018, Heraklion, Crete, Greece, June 3–7, 2018, proceedings 15, pages 593–607. Springer, 2018.
  • [66] M. Bachlechner, T. Birkenfeld, P. Soldin, A. Stahl, and C. Wiebusch. Partition pooling for convolutional graph network applications in particle physics. JINST, 17(10):P10004, 2022.
  • [67] W. Esmail, A. Hammad, and S. Moretti. Sharpening the A → Z(∗)h signature of the Type-II 2HDM at the LHC through advanced Machine Learning. JHEP, 11:020, 2023.
  • [68] Rameswar Sahu. CapsLorentzNet: Integrating Physics Inspired Features with Graph Convolution. 3 2024.
  • [69] Yunsheng Shi, Zhengjie Huang, Shikun Feng, Hui Zhong, Wenjin Wang, and Yu Sun. Masked label prediction: Unified message passing model for semi-supervised classification. arXiv preprint arXiv:2009.03509, 2020.
  • [70] Vinicius Mikuni and Florencia Canelli. Point cloud transformers applied to collider physics. Mach. Learn. Sci. Tech., 2(3):035027, 2021.
  • [71] Benno Käch, Dirk Krücker, and Isabell Melzer-Pellmann. Point Cloud Generation using Transformer Encoders and Normalising Flows. 11 2022.
  • [72] Francesco Armando Di Bello et al. Reconstructing particles in jets using set transformer and hypergraph prediction networks. Eur. Phys. J. C, 83(7):596, 2023.
  • [73] Huilin Qu, Congqiao Li, and Sitian Qian. Particle Transformer for Jet Tagging. 2 2022.
  • [74] A. Hammad, S. Moretti, and M. Nojiri. Multi-scale cross-attention transformer encoder for event classification. JHEP, 03:144, 2024.
  • [75] A. Hammad and Mihoko M. Nojiri. Streamlined jet tagging network assisted by jet prong structure. JHEP, 06:176, 2024.
  • [76] A. Hammad, P. Ko, Chih-Ting Lu, and Myeonghun Park. Exploring Exotic Decays of the Higgs Boson to Multi-Photons at the LHC via Multimodal Learning Approaches. 2405.18834, 5 2024.
  • [77] Tomohiro Abe et al. LHC Dark Matter Working Group: Next-generation spin-0 dark matter models. Phys. Dark Univ., 27:100351, 2020.
  • [78] Stefan Antusch, Eros Cazzato, Oliver Fischer, A. Hammad, and Kechen Wang. Lepton Flavor Violating Dilepton Dijet Signatures from Sterile Neutrinos at Proton Colliders. JHEP, 10:067, 2018.
  • [79] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017.